# Prediction of diabetes using normalized features, a gradient descent model from scikit-learn, and a sigmoid function.

First, we will import:

- <b>NumPy</b> for working with arrays.
- <b>StandardScaler</b> and <b>SGDRegressor</b> from scikit-learn for feature normalization and prediction.

In [69]:
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import SGDRegressor

We will load the diabetes dataset from OpenML using scikit-learn's <b>fetch_openml</b> function, which returns the features and the target separately.

In [70]:
from sklearn.datasets import fetch_openml

X_train, y_train = fetch_openml("diabetes", version=1, as_frame=True, return_X_y=True)

Inspect the first rows of the feature data.

In [3]:
X_train.head()

   preg  plas  pres  skin  insu  mass   pedi  age
0     6   148    72    35     0  33.6  0.627   50
1     1    85    66    29     0  26.6  0.351   31
2     8   183    64     0     0  23.3  0.672   32
3     1    89    66    23    94  28.1  0.167   21
4     0   137    40    35   168  43.1  2.288   33


Inspect the first rows of the target as well.

In [73]:
y_train.head()

0    tested_positive
1    tested_negative
2    tested_positive
3    tested_negative
4    tested_positive
Name: class, dtype: category
Categories (2, object): ['tested_negative', 'tested_positive']

Map the values <b>"tested_negative"</b> and <b>"tested_positive"</b> to the binary labels 0 and 1.

In [5]:
y_bin = y_train.map({"tested_negative": 0, "tested_positive": 1})
y_bin.head()

0    1
1    0
2    1
3    0
4    1
Name: class, dtype: category
Categories (2, int64): [0, 1]
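
As the output above shows, mapping a categorical Series keeps the category dtype. The following is a minimal sketch, not part of the original analysis, of how the labels could be converted to plain integers if a numeric dtype is preferred. The names `y_demo`, `y_demo_bin`, and `y_demo_int` are hypothetical, and the three labels are toy values rather than the real dataset.

```python
import pandas as pd

# Toy categorical target with the same two labels as the dataset (assumed values)
y_demo = pd.Series(
    ["tested_positive", "tested_negative", "tested_positive"],
    dtype="category",
    name="class",
)

# Mapping a categorical Series with a one-to-one dict keeps the category dtype
y_demo_bin = y_demo.map({"tested_negative": 0, "tested_positive": 1})
print(y_demo_bin.dtype)

# Casting to int yields an ordinary integer Series
y_demo_int = y_demo_bin.astype(int)
print(y_demo_int.dtype)
```

Whether the cast is needed depends on what consumes the labels; the cells in this notebook work with the categorical version as shown.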

Normalize the feature data so that each feature has zero mean and unit variance.

In [6]:
scaler = StandardScaler()
X_norm = scaler.fit_transform(X_train)
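
To make the normalization step concrete, here is a small sketch showing that StandardScaler subtracts each column's mean and divides by its population standard deviation. The array `X_demo` is a hypothetical example built from the `preg` and `plas` values of the first four rows shown earlier; the other names are also illustrative.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Two columns (preg, plas) from the first four rows shown above
X_demo = np.array([[6.0, 148.0], [1.0, 85.0], [8.0, 183.0], [1.0, 89.0]])

scaler_demo = StandardScaler()
X_demo_norm = scaler_demo.fit_transform(X_demo)

# Manual z-score: (x - column mean) / column standard deviation (ddof=0)
X_manual = (X_demo - X_demo.mean(axis=0)) / X_demo.std(axis=0)

print(np.allclose(X_demo_norm, X_manual))
print(X_demo_norm.mean(axis=0), X_demo_norm.std(axis=0))
```

After scaling, every column has a mean of approximately 0 and a standard deviation of approximately 1, which helps gradient descent converge in fewer iterations.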

Construct the gradient descent-based prediction model.

In [9]:
sgdr = SGDRegressor(max_iter=1000)
sgdr.fit(X_norm, y_bin)
print(sgdr)
print(f"number of iterations completed: {sgdr.n_iter_}, number of weight updates: {sgdr.t_}")

SGDRegressor()
number of iterations completed: 7, number of weight updates: 5377.0
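
The two numbers in this output are related: scikit-learn documents `t_` as `n_iter_ * n_samples + 1`. Assuming the dataset has 768 samples, 7 × 768 + 1 = 5377, which agrees with the value printed above. The sketch below checks the same relationship on synthetic data; `X_demo`, `y_demo`, and `model` are hypothetical names, and the data is randomly generated rather than taken from the diabetes dataset.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

# Synthetic regression data (assumed for illustration only)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
y_demo = X_demo @ np.array([0.5, -0.2, 0.1]) + 0.3

model = SGDRegressor(max_iter=1000, random_state=0)
model.fit(X_demo, y_demo)

# One weight update per sample per epoch, plus the initial counter value of 1
expected_updates = model.n_iter_ * X_demo.shape[0] + 1
print(model.n_iter_, model.t_, expected_updates)
```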


Check the model’s weights and bias.

In [10]:
b_norm = sgdr.intercept_
w_norm = sgdr.coef_
print(f"model parameters:                   w: {w_norm}, b:{b_norm}")

model parameters:                   w: [ 0.07043269  0.18730111 -0.0440602   0.00020021 -0.01753583  0.10113062
  0.04745718  0.0307373 ], b:[0.35397221]


Implementation of the sigmoid function.

In [11]:
def sigmoid(z):
    """
    Compute the sigmoid of z

    Args:
        z (ndarray): A scalar or NumPy array of any size.

    Returns:
        g (ndarray): sigmoid(z), with the same shape as z
    """
    return 1 / (1 + np.exp(-z))
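
As a quick illustration of how this function behaves, the sketch below evaluates it on a few hypothetical inputs. It maps any real number into the interval (0, 1), returns exactly 0.5 at z = 0, and is symmetric around that point, so thresholding sigmoid(z) at 0.5 gives the same decision as checking whether z is non-negative. The array `z_demo` is an assumed example, not data from the model.

```python
import numpy as np

def sigmoid(z):
    """Compute the sigmoid of z (same definition as above)."""
    return 1 / (1 + np.exp(-z))

z_demo = np.array([-2.0, 0.0, 2.0])
g_demo = sigmoid(z_demo)
print(g_demo)

# sigmoid(z) >= 0.5 exactly when z >= 0
same_decision = (g_demo >= 0.5) == (z_demo >= 0)
print(same_decision.all())
```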

We can make predictions either with the SGDRegressor model's predict method or by applying the linear formula directly. We then check that both approaches give the same result.

In [31]:
y_pred_sgd = sgdr.predict(X_norm)
y_pred = np.dot(X_norm, w_norm) + b_norm

# Compare with a tolerance, since exact float equality can fail on rounding differences
print(f"prediction using np.dot() and sgdr.predict match: {np.allclose(y_pred, y_pred_sgd)}")

prediction using np.dot() and sgdr.predict match: True


Convert the predicted values to binary labels using a threshold of 0.5. We can then compare the first ten predictions with the corresponding true labels.

In [79]:
y_pred_bin = (y_pred >= 0.5).astype(int)
y_pred_bin[:10]

array([1, 0, 1, 0, 1, 0, 0, 1, 1, 0])

In [53]:
y_bin[:10]

0    1
1    0
2    1
3    0
4    1
5    0
6    1
7    0
8    1
9    1
Name: class, dtype: category
Categories (2, int64): [0, 1]

Finally, we can calculate the model’s error on the training data as the number of misclassified samples divided by the total number of samples.

In [81]:
pct_error = np.mean(y_pred_bin != y_bin) * 100
print(f"Error percent of the model: {pct_error}%")

Error percent of the model: 21.875%
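
The same calculation can be followed by hand on the ten samples displayed earlier. This sketch copies those ten predicted and true labels into small arrays (the names `pred_demo` and `true_demo` are hypothetical) and counts the mismatches.

```python
import numpy as np

# First ten predicted and true labels, copied from the outputs shown above
pred_demo = np.array([1, 0, 1, 0, 1, 0, 0, 1, 1, 0])
true_demo = np.array([1, 0, 1, 0, 1, 0, 1, 0, 1, 1])

# Positions where the prediction disagrees with the true label
mismatch = pred_demo != true_demo
n_errors = int(mismatch.sum())
error_rate = n_errors / len(true_demo)

print(f"misclassified positions: {np.flatnonzero(mismatch)}")
print(f"{n_errors} errors out of {len(true_demo)} samples")
```

On this small subset the predictions disagree at positions 6, 7, and 9, so 3 of the 10 samples are misclassified. The figure for the full dataset differs because it averages over all samples.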
