# **Diabetes Dataset Overview**

The Diabetes dataset bundled with scikit-learn contains baseline measurements from 442 diabetes patients. The aim is to predict a quantitative measure of disease progression one year after baseline. The dataset provides 10 features (predictors), each mean-centered and scaled so that the sum of squares of its column equals 1, plus one target variable:

**age:** Age of the patient.

**sex:** Sex of the patient. In the scaled data returned by `load_diabetes()`, this column takes two distinct numeric values rather than a 0/1 encoding.

**bmi:** Body Mass Index, a measure of body fat based on height and weight.

**bp:** Average blood pressure of the patient.

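**s1 to s6:** Six blood serum measurements: total serum cholesterol (s1), low-density lipoproteins (s2), high-density lipoproteins (s3), total cholesterol / HDL ratio (s4), log of serum triglycerides level (s5, marked as "possibly" in the dataset description), and blood sugar level (s6).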

**disease_progression:** The target variable, which is a quantitative measure of the patient's diabetes progression one year after the initial measurements.
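
A minimal sketch for checking how the feature columns are stored. scikit-learn's dataset description states that each feature is mean-centered and scaled so that its column's sum of squares is 1. The sketch assumes the default `load_diabetes()` call, and the variable names are illustrative only.

```python
import numpy as np
from sklearn.datasets import load_diabetes

# Load the default (scaled) version of the feature matrix
features = load_diabetes().data

# Column-wise summary statistics
col_means = features.mean(axis=0)
col_sq_sums = (features ** 2).sum(axis=0)

print("Shape:", features.shape)
print("Means close to 0:", np.allclose(col_means, 0.0, atol=1e-6))
print("Sums of squares close to 1:", np.allclose(col_sq_sums, 1.0, atol=1e-3))

# The 'sex' column (index 1) holds two distinct scaled values, not 0 and 1
print("Distinct values in 'sex':", np.unique(features[:, 1]))
```

Because the columns are already on a common scale, the raw units (years, kg/m², and so on) are not visible in the default data.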

### **Problem Statement: The goal is to predict disease progression from these features using a Support Vector Regression (SVR) model.**

Video introduction to SVR: https://www.youtube.com/watch?v=kPw1IGUAoY8

## **Import the necessary libraries**

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error, r2_score

## **Load the dataset from scikit-learn and display it as a pandas DataFrame:**

In [None]:
# Load the Diabetes dataset
diabetes = load_diabetes()

# Convert to a Pandas DataFrame
diabetes_df = pd.DataFrame(diabetes.data, columns=diabetes.feature_names)

# Display the DataFrame
diabetes_df.head()


### **Print the description of the data:**

In [None]:
# Print dataset description
print(diabetes.DESCR)

### **Segregate the Independent and dependent variables:**

In [None]:
# Convert features (X) to a DataFrame
X = pd.DataFrame(diabetes.data, columns=diabetes.feature_names)

# Convert target (y) to a DataFrame; the column name must match the
# 'disease_progression' key used later when plotting
y = pd.DataFrame(diabetes.target, columns=['disease_progression'])

### **Display the X and y dataframes:**

In [None]:
# Display the DataFrames
print("X (features):")
X.head()

In [None]:
print("\nY (disease progression):")
y.head()

### **Combining the X and y dataframes:**

In [None]:
# Combine features and target into one DataFrame for analysis
data = pd.concat([X, y], axis=1)
data

### **Plot the relationship between BMI and disease progression**

In [None]:
# Example: Plot the relationship between BMI and disease progression
plt.scatter(data['bmi'], data['disease_progression'], alpha=0.5)
plt.title("Relationship between BMI and Disease Progression")
plt.xlabel("BMI (Body Mass Index)")
plt.ylabel("Disease Progression")
plt.show()
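
To put a number on what the scatter plot shows, one option is the Pearson correlation between each feature and the target. The sketch below is self-contained and rebuilds the combined DataFrame; the `correlations` name is an illustrative choice, and correlation only captures linear association.

```python
import pandas as pd
from sklearn.datasets import load_diabetes

# Rebuild the combined DataFrame of features and target
diabetes = load_diabetes()
data = pd.DataFrame(diabetes.data, columns=diabetes.feature_names)
data['disease_progression'] = diabetes.target

# Pearson correlation of every feature with the target,
# ordered by absolute strength (strongest first)
correlations = (
    data.corr()['disease_progression']
    .drop('disease_progression')
    .sort_values(key=lambda s: s.abs(), ascending=False)
)
print(correlations)
```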

### **Split the data into training and testing data**

In [None]:
# Split the data into training (80%) and testing (20%) sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
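
A quick way to confirm what the split produced is to print the shapes of the resulting arrays. This sketch is self-contained and uses `return_X_y=True` (NumPy arrays instead of the DataFrames above) purely for brevity.

```python
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split

# Load features and target as NumPy arrays
X, y = load_diabetes(return_X_y=True)

# Same split settings as in the notebook
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print("Training set:", X_train.shape, y_train.shape)
print("Test set:", X_test.shape, y_test.shape)
```

With 442 rows and `test_size=0.2`, the split yields 353 training rows and 89 test rows.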

### **Create and train the SVR model**

In [None]:
# Create and train the SVR model
from sklearn.svm import SVR
svr = SVR(kernel='rbf', C=100, gamma=0.1, epsilon=0.1)
# SVR expects a 1-D target array, so flatten the single-column DataFrame
svr.fit(X_train, y_train.values.ravel())
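
The `epsilon` parameter sets the width of a tube around the regression function inside which errors are not penalized. The following is a minimal NumPy sketch of the epsilon-insensitive loss that SVR minimizes together with a regularization term. `epsilon_insensitive_loss` is a hypothetical helper written for illustration and is not part of scikit-learn; the input values are toy numbers.

```python
import numpy as np

def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Per-sample loss: zero inside the epsilon tube, linear outside it."""
    residual = np.abs(np.asarray(y_true) - np.asarray(y_pred))
    return np.maximum(0.0, residual - epsilon)

# Toy values chosen for illustration only
y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.05, 2.5, 2.0])

# The first prediction is within 0.1 of the truth, so its loss is zero;
# the other two are penalized by how far they fall outside the tube
losses = epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1)
print(losses)
```

A larger `epsilon` ignores more small errors and typically yields fewer support vectors, while `C` controls how strongly errors outside the tube are penalized.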

### **Make predictions and evaluate the model:**

In [None]:
# Make predictions
y_pred = svr.predict(X_test)

# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error: {mse:.2f}")

# Display predictions and actual values (flatten y_test so each column is 1-D)
results = pd.DataFrame({'Actual': y_test.values.ravel(), 'Predicted': y_pred})
results
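
MSE is expressed in squared target units, which makes it hard to read on its own. As a sketch, RMSE (in the target's own units) and R² can be reported alongside it; the notebook already imports `r2_score`. The block below is self-contained and repeats the split and SVR settings used above, with `model` as an illustrative name.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

# Same data, split, and hyperparameters as in the notebook
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
model = SVR(kernel='rbf', C=100, gamma=0.1, epsilon=0.1)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Three complementary views of the same test-set errors
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)            # same units as the target
r2 = r2_score(y_test, y_pred)  # 1.0 is a perfect fit; 0.0 matches predicting the mean

print(f"MSE:  {mse:.2f}")
print(f"RMSE: {rmse:.2f}")
print(f"R^2:  {r2:.3f}")
```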

### **Use GridSearch to tune the hyperparameters:**

In [None]:
from sklearn.model_selection import GridSearchCV

# Define the parameter grid for SVR
param_grid = {
    'C': [1, 10, 100, 1000],
    'gamma': [0.1, 0.01, 0.001],
    'epsilon': [0.1, 0.2, 0.5]
}

# Create the SVR model
svr = SVR(kernel='rbf')

# Set up GridSearchCV
grid_search = GridSearchCV(estimator=svr, param_grid=param_grid, cv=5, scoring='neg_mean_squared_error', verbose=1)

# Perform GridSearch on the training data (flatten y to the 1-D shape SVR expects)
grid_search.fit(X_train, y_train.values.ravel())

# Best parameters and best score
best_params = grid_search.best_params_
best_score = -grid_search.best_score_  # Convert from negative MSE to positive

print(f"Best Parameters: {best_params}")
print(f"Best Cross-Validated MSE: {best_score:.2f}")

# Use the best model to make predictions
best_model = grid_search.best_estimator_
y_pred = best_model.predict(X_test)

# Evaluate the best model
mse = mean_squared_error(y_test, y_pred)
print(f"Test Set MSE: {mse:.2f}")

# Display predictions and actual values (flatten y_test so each column is 1-D)
results = pd.DataFrame({'Actual': y_test.values.ravel(), 'Predicted': y_pred})
results
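
Beyond the single best combination, the fitted search object keeps the cross-validation scores for every candidate in its `cv_results_` attribute. The sketch below loads them into a DataFrame and lists the top-ranked settings. It is self-contained and reuses the same grid; `cv_table`, `mean_cv_mse`, and `top` are illustrative names.

```python
import pandas as pd
from sklearn.datasets import load_diabetes
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVR

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Same parameter grid as in the notebook: 4 * 3 * 3 = 36 candidates
param_grid = {
    'C': [1, 10, 100, 1000],
    'gamma': [0.1, 0.01, 0.001],
    'epsilon': [0.1, 0.2, 0.5],
}
grid_search = GridSearchCV(
    SVR(kernel='rbf'), param_grid, cv=5, scoring='neg_mean_squared_error'
)
grid_search.fit(X_train, y_train)

# One row per candidate; flip the sign so the score reads as a plain MSE
cv_table = pd.DataFrame(grid_search.cv_results_)
cv_table['mean_cv_mse'] = -cv_table['mean_test_score']

columns = ['param_C', 'param_gamma', 'param_epsilon',
           'mean_cv_mse', 'std_test_score', 'rank_test_score']
top = cv_table.sort_values('rank_test_score')[columns].head(5)
print(top)
```

If the best values sit at the edge of the grid, that is a hint the grid may need to be extended in that direction.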


### **Plot the Actual Values and Predicted Values:**

In [None]:
# Plot predicted vs actual values
plt.figure(figsize=(8, 6))

# Plotting the actual values
plt.plot(y_test.values, label='Actual Values', color='blue', marker='o')

# Plotting the predicted values
plt.plot(y_pred, label='Predicted Values', color='red', linestyle='dashed', marker='x')

# Title and labels
plt.title("Predicted vs Actual Values")
plt.xlabel("Index")
plt.ylabel("Disease Progression")

# Add a legend
plt.legend()

# Display the plot
plt.grid(True)
plt.show()
