In [1]:
import numpy as np 
import pandas as pd 
import os


for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))



/kaggle/input/diabetes/diabetes.csv


**Import Libraries & Explore Input Files**

This cell imports NumPy, Pandas, and the standard-library os module.
It then walks the Kaggle input directory and prints the path of every file it finds. This confirms that the dataset file is attached to the notebook and gives the exact path to pass to pd.read_csv(); the data itself is not loaded yet.

In [2]:

df = pd.read_csv('/kaggle/input/diabetes/diabetes.csv')


print("First 5 rows ")
print(df.head())

print("Info about Columns \n")
df.info()  # info() prints its report directly and returns None, so it is not wrapped in print()

First 5 rows 
   Pregnancies  Glucose  BloodPressure  SkinThickness  Insulin   BMI  \
0            6      148             72             35        0  33.6   
1            1       85             66             29        0  26.6   
2            8      183             64              0        0  23.3   
3            1       89             66             23       94  28.1   
4            0      137             40             35      168  43.1   

   DiabetesPedigreeFunction  Age  Outcome  
0                     0.627   50        1  
1                     0.351   31        0  
2                     0.672   32        1  
3                     0.167   21        0  
4                     2.288   33        1  
Info about Columns 

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 768 entries, 0 to 767
Data columns (total 9 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   Pregnancies               768 non-null    int64  

**Load Dataset & Initial Exploration**

This cell loads the diabetes dataset from the Kaggle input directory into a Pandas DataFrame.
It displays:

- The first five rows of the dataset, to show its structure.

- Column names, data types, and non-null counts from df.info(), from which missing (NaN) values can be inferred.

This step helps verify data quality before preprocessing.
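df.info() only treats NaN as missing. In the rows printed above, Insulin and SkinThickness contain zeros, which df.info() reports as non-null. A minimal sketch of a supplementary check, assuming a zero in these columns means the value was not recorded (the `suspect_columns` list and the small `sample` frame, rebuilt from the five rows shown above, are illustrative names rather than part of the notebook):

```python
import pandas as pd

# The five rows printed by df.head() above, restricted to a few columns.
sample = pd.DataFrame({
    "Glucose": [148, 85, 183, 89, 137],
    "BloodPressure": [72, 66, 64, 66, 40],
    "SkinThickness": [35, 29, 0, 23, 35],
    "Insulin": [0, 0, 0, 94, 168],
    "BMI": [33.6, 26.6, 23.3, 28.1, 43.1],
})

# Assumption: a zero in these columns means "not recorded", not a real reading.
suspect_columns = ["Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI"]

# Count zeros per column; df.info() would report all of these as non-null.
zero_counts = (sample[suspect_columns] == 0).sum()
print(zero_counts)
```

Running the same expression on the full `df` would show how many rows are affected before deciding whether to impute or drop them.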

In [3]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix

X = df.drop('Outcome', axis=1) 
y = df['Outcome']              

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

model = LogisticRegression()
model.fit(X_train_scaled, y_train)


predictions = model.predict(X_test_scaled)
accuracy = accuracy_score(y_test, predictions)

print(f"Accuracy is  {accuracy * 100:.2f}%")

conf_matrix = confusion_matrix(y_test, predictions)
print("Table of Errors\n")
print(conf_matrix)


Accuracy is  75.32%
Table of Errors

[[79 20]
 [18 37]]


**Import Machine Learning Tools & Split Data**

Description:

This cell imports essential scikit-learn modules for:

1. Train-test splitting

2. Feature scaling

3. Logistic Regression modeling

4. Model evaluation metrics

The dataset is divided into:

1. Features (X): all columns except Outcome

2. Target (y): the Outcome column

The data is then split into training (80%) and test (20%) sets, with random_state=42 so the split is reproducible. With 768 rows this gives 614 training rows and 154 test rows.
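The split in the notebook is purely random, so the share of positive cases can differ slightly between the training and test sets. One optional variation, not used in the notebook, is a stratified split via the `stratify` argument of train_test_split. A minimal sketch on synthetic stand-in data (`X_demo` and `y_demo` are made up for illustration):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for features and target; not the real dataset.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 3))
y_demo = np.array([0] * 65 + [1] * 35)  # imbalanced classes

# stratify=y_demo keeps the class proportions the same in both subsets.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

print(X_tr.shape, X_te.shape)
print(y_tr.mean(), y_te.mean())
```

In the notebook this would correspond to adding `stratify=y` to the existing call; whether it changes the reported accuracy is not something this sketch shows.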

**Feature Scaling**

Description:

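This cell applies StandardScaler to standardize each feature to zero mean and unit variance.
Feature scaling matters for Logistic Regression because scikit-learn's default L2 regularization penalizes coefficients by their magnitude, so without scaling the penalty would depend on each feature's units. Scaling also helps the solver converge.

The scaler is fitted on the training data only and then applied to both the training and test sets, so no statistics from the test set leak into training.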
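To make the fit/transform distinction concrete, the sketch below checks that StandardScaler's transform is the same as subtracting the training mean and dividing by the training standard deviation (population form, ddof=0). The arrays reuse the Glucose and BMI values from the five rows shown earlier, with the first four treated as "training" rows purely for illustration:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Glucose and BMI from the rows printed by df.head(); the train/test
# assignment here is illustrative, not the notebook's actual split.
train = np.array([[148.0, 33.6], [85.0, 26.6], [183.0, 23.3], [89.0, 28.1]])
test = np.array([[137.0, 43.1]])

scaler_demo = StandardScaler().fit(train)

# Manual equivalent: statistics come from the training rows only.
mean = train.mean(axis=0)
std = train.std(axis=0)  # NumPy's default ddof=0 matches StandardScaler
manual_test = (test - mean) / std

print(np.allclose(scaler_demo.transform(test), manual_test))
```

The test row is scaled with the training statistics, which is why the notebook calls fit_transform on X_train but only transform on X_test.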

**Train Logistic Regression Model**

Description:

This cell initializes and trains a Logistic Regression model using the scaled training data.
The model learns the relationship between medical attributes and the diabetes outcome.
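What the fitted model stores is one weight per feature plus an intercept. A minimal sketch of where to find them, using synthetic data from make_classification as a stand-in with the same number of predictors (8) as this dataset; `X_demo`, `y_demo`, and `demo_model` are illustrative names:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in with 8 features; not the real diabetes data.
X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=0)
X_demo_scaled = StandardScaler().fit_transform(X_demo)

demo_model = LogisticRegression()
demo_model.fit(X_demo_scaled, y_demo)

# coef_ holds one weight per feature; intercept_ holds the bias term.
print(demo_model.coef_.shape, demo_model.intercept_.shape)

# predict_proba returns the estimated probability of each class.
print(demo_model.predict_proba(X_demo_scaled[:1]))
```

On the notebook's own `model`, pairing `model.coef_[0]` with `X.columns` would show which scaled features push the prediction toward a positive outcome.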

**Model Prediction & Evaluation**

Description:

This cell uses the trained model to:

- Predict diabetes outcomes on the test dataset

- Evaluate performance using the accuracy score

- Generate a confusion matrix to assess classification results

This step measures how well the model performs on unseen data: 75.32% accuracy on the 154 test rows, with 20 false positives and 18 false negatives.
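Accuracy alone hides how the errors are distributed between the two classes. The sketch below derives accuracy, precision, and recall from the confusion matrix printed above, relying on scikit-learn's layout (true labels on rows, predicted labels on columns, classes in sorted order):

```python
import numpy as np

# Confusion matrix printed by the notebook: rows are true labels,
# columns are predicted labels, classes ordered (0, 1).
conf_matrix = np.array([[79, 20],
                        [18, 37]])
tn, fp, fn, tp = conf_matrix.ravel()

accuracy = (tp + tn) / conf_matrix.sum()
precision = tp / (tp + fp)  # share of predicted positives that are correct
recall = tp / (tp + fn)     # share of actual positives that are found

print(f"accuracy={accuracy:.4f} precision={precision:.4f} recall={recall:.4f}")
```

The same figures can be obtained from the predictions with precision_score and recall_score in sklearn.metrics.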