The dataset covers 10 years (1999-2008) of clinical care at 130 US hospitals. It contains 101,766 records with 50 features describing diabetes patients and hospital outcomes.
After cleaning, the dataset has 98,052 rows and 21 columns. See my comments in clean_diab_dataset.py
for how I clean the data.
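
To confirm that a cleaned file has the dimensions quoted above, a quick check like the following may help. This is only a sketch: it assumes the cleaned data is saved as a CSV with a single header row, and the helper name `csv_shape` and the file name in the usage comment are hypothetical, not something taken from clean_diab_dataset.py.

```python
import csv

# Dimensions quoted above for the cleaned dataset.
EXPECTED_ROWS = 98052
EXPECTED_COLS = 21


def csv_shape(path):
    """Return (data_rows, columns) for a CSV file with one header row."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)            # first row holds the column names
        n_rows = sum(1 for _ in reader)  # count the remaining data rows
    return n_rows, len(header)


# Example usage (hypothetical file name; use whatever the cleaning script writes):
# shape = csv_shape("diabetic_data_clean.csv")
# print(shape == (EXPECTED_ROWS, EXPECTED_COLS))
```

The check only counts rows and columns; it does not verify which columns were kept or how values were cleaned.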
Dataset source: https://archive.ics.uci.edu/ml/datasets/Diabetes+130-US+hospitals+for+years+1999-2008