# Understanding the diabetes data set

This notebook aims to understand the content of the diabetes data set.


## Acknowledgments

- https://www4.stat.ncsu.edu/~boos/var.select/diabetes.html

- The **sklearn.datasets** Python package

- Image below from https://www.gutmicrobiotaforhealth.com/es/como-contribuye-la-microbiota-intestinal-a-la-diabetes-de-tipo-2-lo-que-ya-sabemos/


# Diabetes data set

![Diabetes.jpg](datasets/diabetes/diabetes.jpg)

1. The dataset description
    - Many observations/measurements/recordings of the characteristics/attributes/variables of persons
    - Variables: age, sex, bmi, bp, tc, ... (10 variables)
    - Total number of observations: 442


2. Description of the predictors/variables/features/attributes
    - age in years
    - sex
    - bmi body mass index
    - bp average blood pressure
    - s1 tc, total serum cholesterol
    - s2 ldl, low-density lipoproteins
    - s3 hdl, high-density lipoproteins
    - s4 tch, total cholesterol / HDL
    - s5 ltg, possibly log of serum triglycerides level
    - s6 glu, blood sugar level


3. Description of the response
    - quantitative measure of disease progression one year after baseline


# Option 1: Importing and inspecting the data from a file on disk (raw data)

In [35]:
# Import the packages that we will be using
import numpy as np                  # For arrays, matrices, and functions to operate on them
import matplotlib.pyplot as plt     # For showing plots

# Dataset url
url = "datasets/diabetes/diabetes.txt"

# Load the dataset
data = np.loadtxt(url)

X    = data[:,:-1]
y    = data[:,-1]

In [36]:
X.shape

(442, 10)

In [37]:
y.shape

(442,)
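
Beyond the shapes, a per-variable summary helps to see the raw scale of each column. The following is a minimal sketch, not part of the original notebook: the helper `summarize` and the list `FEATURE_NAMES` are hypothetical names, and the column order is assumed to match the description of the predictors above. A tiny made-up array stands in for `X` so the example runs without the data file.

```python
import numpy as np

# Assumed column order, taken from the description of the predictors
FEATURE_NAMES = ["age", "sex", "bmi", "bp", "s1", "s2", "s3", "s4", "s5", "s6"]

def summarize(X, names):
    """Return {name: (min, mean, max)} for each column of a 2-D array."""
    X = np.asarray(X, dtype=float)
    if X.ndim != 2 or X.shape[1] != len(names):
        raise ValueError("X must be 2-D with one column per name")
    return {
        name: (float(col.min()), float(col.mean()), float(col.max()))
        for name, col in zip(names, X.T)
    }

# Made-up values standing in for X (3 observations, 2 variables)
demo = np.array([[50.0, 21.5], [61.0, 30.1], [39.0, 26.4]])
demo_summary = summarize(demo, ["age", "bmi"])
for name, (low, mean, high) in demo_summary.items():
    print(f"{name}: min={low:.2f} mean={mean:.2f} max={high:.2f}")
```

With the real data loaded, `summarize(X, FEATURE_NAMES)` would give the same three statistics for each of the 10 variables.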

# Option 2: Importing and inspecting the data from sklearn (normalized data)

In [38]:
# Import the packages that we will be using
from sklearn import datasets

# Load the dataset
X, y = datasets.load_diabetes(return_X_y=True)


In [39]:
X.shape

(442, 10)

Note that each of the 10 variables has been mean centered and scaled by the standard deviation times the square root of n_samples (i.e. the sum of squares of each column totals 1).
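
The scaling in the note can be checked by hand: centering a column and dividing it by its standard deviation times the square root of `n_samples` leaves a column whose squares sum to 1. The sketch below illustrates this on synthetic data. It is not scikit-learn's own code: the helper `center_and_scale` is a hypothetical name, and the population standard deviation (`ddof=0`) is assumed.

```python
import numpy as np

def center_and_scale(X):
    """Mean-center each column, then divide by std * sqrt(n_samples)."""
    X = np.asarray(X, dtype=float)
    n_samples = X.shape[0]
    centered = X - X.mean(axis=0)
    # np.std defaults to the population standard deviation (ddof=0)
    scale = X.std(axis=0) * np.sqrt(n_samples)
    return centered / scale

# Synthetic stand-in with the same shape as the diabetes predictors
rng = np.random.default_rng(0)
raw = rng.normal(loc=50.0, scale=10.0, size=(442, 10))
scaled = center_and_scale(raw)

# Each column should have mean 0 and sum of squares 1
print(np.allclose(scaled.mean(axis=0), 0.0))
print(np.allclose((scaled ** 2).sum(axis=0), 1.0))
```

The same two checks can be applied to the `X` returned by `datasets.load_diabetes` to confirm the normalization described in the note.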

In [40]:
y.shape

(442,)