### What is Logistic Regression?
* Logistic Regression is a supervised machine learning algorithm used primarily for classification problems.
#### What it does
* It predicts the probability that a given input belongs to a certain class.
* Unlike Linear Regression (which predicts continuous values), Logistic Regression outputs values between 0 and 1 using the sigmoid (logistic) function.
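A minimal sketch of how the sigmoid squeezes any real-valued score into the range between 0 and 1. The function name `sigmoid` and the sample scores are illustrative choices, not part of any library:

```python
import math

def sigmoid(z: float) -> float:
    """Map a real-valued score z to a value in (0, 1)."""
    # Numerically stable form: avoids overflow in exp() for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

# Illustrative scores (assumed values standing in for w·x + b).
for score in (-4.0, 0.0, 4.0):
    print(score, round(sigmoid(score), 3))
```

A score of 0 maps to exactly 0.5, which is why 0.5 is the usual default decision threshold.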
#### Main Uses in Machine Learning
1. Binary Classification (two possible outcomes):
   * Example: Spam (1) or Not Spam (0) in email filtering
   * Example: Tumor is Malignant (1) or Benign (0) in medical diagnosis

2. Multi-class Classification (extended using One-vs-Rest or Softmax):
   * Example: Classifying types of flowers (iris dataset)
   * Example: Handwritten digit recognition (0–9)

3. Probability Estimation:
   * It gives not only a predicted class, but also the probability score (e.g., "This email has a 90% chance of being spam").

4. Feature Impact Analysis:
   * Since it uses coefficients for features, you can interpret which features increase or decrease the probability of belonging to a certain class.
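To make the coefficient interpretation concrete, here is a small sketch with made-up parameters. The feature names, weights, and intercept are hypothetical and do not come from a trained model:

```python
import math

# Hypothetical fitted parameters for a two-feature spam model
# (illustrative values only, not taken from any trained model).
intercept = -1.0
coefficients = {"num_links": 0.8, "known_sender": -1.5}

def predict_proba(features: dict) -> float:
    """Probability of the positive class for one example."""
    z = intercept + sum(coefficients[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))

# A positive coefficient raises the probability; a negative one lowers it.
base = predict_proba({"num_links": 1, "known_sender": 0})
more_links = predict_proba({"num_links": 2, "known_sender": 0})
known = predict_proba({"num_links": 1, "known_sender": 1})

# exp(coefficient) is the multiplicative change in the odds p / (1 - p)
# for a one-unit increase in that feature.
odds_ratios = {name: math.exp(w) for name, w in coefficients.items()}
print(base, more_links, known, odds_ratios)
```

Coefficients are only comparable across features when the features are on similar scales, which is one reason to standardize inputs before fitting.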

### Understanding the Confusion Matrix in Machine Learning
* A confusion matrix in Machine Learning is a table used to evaluate the performance of a classification model.
* It shows how well the model’s predictions match the actual labels.
* It’s called a “confusion” matrix because it makes it easy to see where the model is confusing one class for another.
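As a rough sketch, the four cells of a binary confusion matrix can be counted by hand. The labels below are made-up data for illustration; the row and column order matches the layout scikit-learn's `confusion_matrix` uses for labels 0 and 1:

```python
from collections import Counter

# Illustrative labels (assumed data): 1 = positive, 0 = negative.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

counts = Counter(zip(y_true, y_pred))
tp = counts[(1, 1)]  # actual positive, predicted positive
fn = counts[(1, 0)]  # actual positive, predicted negative
fp = counts[(0, 1)]  # actual negative, predicted positive
tn = counts[(0, 0)]  # actual negative, predicted negative

# Rows are actual classes, columns are predicted classes.
matrix = [[tn, fp],
          [fn, tp]]
print(matrix)
```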

#### Metrics based on Confusion Matrix Data
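* These metrics summarize different kinds of errors, so you can tell whether a model is better at avoiding false alarms (FP) or at catching true cases (avoiding FN). Here TP, TN, FP, and FN are the counts of true positives, true negatives, false positives, and false negatives.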
🔹 1. Accuracy

How often the model is correct.

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

🔹 2. Precision (Positive Predictive Value)

Of all predicted positives, how many are actually positive?

$$\text{Precision} = \frac{TP}{TP + FP}$$

🔹 3. Recall (Sensitivity / True Positive Rate)

Of all actual positives, how many did the model correctly identify?

$$\text{Recall} = \frac{TP}{TP + FN}$$

🔹 4. Specificity (True Negative Rate)

Of all actual negatives, how many did the model correctly identify?

$$\text{Specificity} = \frac{TN}{TN + FP}$$

🔹 5. F1-Score

Harmonic mean of precision and recall (balances both).
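A short sketch of why the harmonic mean is used here: it stays low when either precision or recall is low, whereas a plain average would hide the weak side. The helper name `harmonic_f1` and the sample values are illustrative:

```python
def harmonic_f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Illustrative values: high precision, low recall.
precision, recall = 0.9, 0.1
arithmetic_mean = (precision + recall) / 2
f1 = harmonic_f1(precision, recall)
print(arithmetic_mean, f1)
```

With these values the arithmetic mean is 0.5, while the F1-score is pulled down toward the weaker recall.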

$$F_1 = \frac{2 \times (\text{Precision} \times \text{Recall})}{\text{Precision} + \text{Recall}}$$

🔹 6. False Positive Rate (FPR)

Of all actual negatives, how many were incorrectly classified as positive?

$$\text{FPR} = \frac{FP}{FP + TN}$$

🔹 7. False Negative Rate (FNR)

Of all actual positives, how many were missed (predicted negative)?

$$\text{FNR} = \frac{FN}{TP + FN}$$

🔹 8. Balanced Accuracy

Average of recall and specificity.

$$\text{Balanced Accuracy} = \frac{\text{Recall} + \text{Specificity}}{2}$$
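The eight metrics in this section can all be derived from the four confusion-matrix counts. The sketch below uses made-up counts and a hypothetical helper name, and assumes every denominator is non-zero:

```python
def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Compute the eight metrics from raw confusion-matrix counts.

    Assumes each denominator is non-zero (at least one actual positive,
    one actual negative, and one predicted positive).
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": 2 * precision * recall / (precision + recall),
        "fpr": fp / (fp + tn),
        "fnr": fn / (tp + fn),
        "balanced_accuracy": (recall + specificity) / 2,
    }

# Illustrative counts (assumed data): 100 examples in total.
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Note that FPR is the complement of specificity and FNR is the complement of recall, so each pair always sums to 1.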




In [15]:
from sklearn.datasets import load_breast_cancer
X, y = load_breast_cancer(return_X_y=True)
X.shape

(569, 30)

In [16]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=42)

In [17]:
from sklearn.linear_model import LogisticRegression
# Default settings (max_iter=100); the features are not scaled here.
model = LogisticRegression()
model.fit(X_train, y_train)


STOP: TOTAL NO. OF ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
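The warning above suggests two remedies: scaling the data or raising `max_iter`. A minimal sketch that applies both on the same split, using a scikit-learn pipeline so the scaler is fitted on the training data only. The variable name `scaled_model` and the value `max_iter=1000` are choices made for this sketch:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=42
)

# Standardize the features, then fit the classifier on the scaled values.
scaled_model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scaled_model.fit(X_train, y_train)

# n_iter_ holds the iterations actually used; staying below max_iter means
# the solver converged instead of hitting the limit.
iterations_used = scaled_model[-1].n_iter_
test_accuracy = scaled_model.score(X_test, y_test)
print(iterations_used, test_accuracy)
```

Because the pipeline applies the scaler inside `predict` and `score`, the test data never needs to be transformed by hand.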


In [18]:
predict = model.predict(X_test)
predict

array([1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1,
       0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1,
       1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1,
       0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0,
       1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1,
       0, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0,
       1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1,
       1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1,
       0, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1,
       0, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1,
       0, 1, 0, 0, 1, 1, 0, 1])

In [19]:
from sklearn.metrics import confusion_matrix, classification_report
# print() renders the report's line breaks instead of showing the raw string.
print(classification_report(y_test, predict))

              precision    recall  f1-score   support

           0       0.96      0.94      0.95        80
           1       0.97      0.98      0.97       148

    accuracy                           0.96       228
   macro avg       0.96      0.96      0.96       228
weighted avg       0.96      0.96      0.96       228

In [20]:
from sklearn.datasets import load_digits
# Load the dataset
digits = load_digits()
# print(digits.keys())
x = digits.data
y = digits.target


In [25]:
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)
x_train

array([[ 0.,  0.,  3., ..., 13.,  4.,  0.],
       [ 0.,  0.,  9., ...,  3.,  0.,  0.],
       [ 0.,  0.,  0., ...,  6.,  0.,  0.],
       ...,
       [ 0.,  0.,  9., ..., 16.,  2.,  0.],
       [ 0.,  0.,  1., ...,  0.,  0.,  0.],
       [ 0.,  0.,  1., ...,  1.,  0.,  0.]])

In [29]:
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

scaler = StandardScaler()
# Fit the scaler on the training data only, then reuse its statistics for the test data.
scaler_x_train = scaler.fit_transform(x_train)
scaler_x_test = scaler.transform(x_test)

model = LogisticRegression()
model.fit(scaler_x_train, y_train)

In [30]:
import joblib
joblib.dump(model,"model_2.pkl")

['model_2.pkl']
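The saved model was trained on standardized inputs, so anything that loads it later also needs the fitted scaler. One possible approach, sketched here under assumptions, is to bundle both steps in a pipeline and save that single object. The file name `digits_pipeline.pkl`, the temporary directory, and `max_iter=1000` are choices made for this sketch:

```python
import os
import tempfile

import joblib
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

x, y = load_digits(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=42
)

# Scaler and classifier travel together in one object.
digits_pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
digits_pipeline.fit(x_train, y_train)

# Hypothetical file name, written to a temporary directory for this sketch.
path = os.path.join(tempfile.mkdtemp(), "digits_pipeline.pkl")
joblib.dump(digits_pipeline, path)

# The reloaded pipeline accepts raw, unscaled pixel values.
restored = joblib.load(path)
same_predictions = bool(
    (restored.predict(x_test) == digits_pipeline.predict(x_test)).all()
)
print(same_predictions)
```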

In [4]:
import numpy as np 


In [34]:
import pandas as pd 
import numpy as np 

In [36]:
pd.read_csv("D:/Ad_py_lib.csv/ml_csv/India_GDP_1960-2022 (1).csv")

| | Unnamed: 0 | India GDP - Historical Data | India GDP - Historical Data.1 | India GDP - Historical Data.2 | India GDP - Historical Data.3 |
|---|---|---|---|---|---|
| 0 | NaN | Year | GDP in (Billion) $ | Per Capita in rupees | Growth % |
| 1 | 0.0 | 2021 | 3173.4 | 182160 | 8.95 |
| 2 | 1.0 | 2020 | 2667.69 | 154640 | -6.6 |
| 3 | 2.0 | 2019 | 2831.55 | 165760 | 3.74 |
| 4 | 3.0 | 2018 | 2702.93 | 159840 | 6.45 |
| ... | ... | ... | ... | ... | ... |
| 58 | 57.0 | 1964 | 56.48 | 9280 | 7.45 |
| 59 | 58.0 | 1963 | 48.42 | 8080 | 5.99 |
| 60 | 59.0 | 1962 | 42.16 | 7200 | 2.93 |
| 61 | 60.0 | 1961 | 39.23 | 6800 | 3.72 |


In [37]:
df=pd.read_csv("D:/Ad_py_lib.csv/ml_csv/India_GDP_1960-2022 (1).csv")

In [43]:
df.columns

Index(['Unnamed: 0', 'India GDP - Historical Data',
       'India GDP - Historical Data.1', 'India GDP - Historical Data.2',
       'India GDP - Historical Data.3'],
      dtype='object')

In [47]:
df.isnull().sum()

Unnamed: 0                       1
India GDP - Historical Data      0
India GDP - Historical Data.1    0
India GDP - Historical Data.2    0
India GDP - Historical Data.3    0
dtype: int64

In [48]:
df.describe().T

| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| Unnamed: 0 | 62.0 | 30.5 | 18.041619 | 0.0 | 15.25 | 30.5 | 45.75 | 61.0 |
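`describe()` reports only one column because the real column names (Year, GDP, and so on) sit in the first data row, so pandas reads the remaining columns as text. A possible fix, sketched on a small in-memory stand-in for the file, is to tell `read_csv` that the header is on the second line. The file layout is assumed from the output above, and the four data rows are copied from it:

```python
import io

import pandas as pd

# In-memory stand-in for the CSV file (layout is an assumption).
raw = io.StringIO(
    ",India GDP - Historical Data,India GDP - Historical Data,"
    "India GDP - Historical Data,India GDP - Historical Data\n"
    ",Year,GDP in (Billion) $,Per Capita in rupees,Growth %\n"
    "0,2021,3173.4,182160,8.95\n"
    "1,2020,2667.69,154640,-6.6\n"
    "2,2019,2831.55,165760,3.74\n"
    "3,2018,2702.93,159840,6.45\n"
)

# header=1 uses the second line as column names and skips the first;
# index_col=0 turns the unnamed counter column into the index.
gdp = pd.read_csv(raw, header=1, index_col=0)

print(gdp.dtypes)
print(gdp.describe().T)
```

With the header in the right place, every column is parsed as a number and appears in the summary statistics.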


In [6]:
# Load the dataset
import numpy as np
import joblib
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

data = load_digits()
x = data.data
y = data.target

# Split into training and test sets
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)

# Train the model (unscaled pixels with the default max_iter trigger the convergence warning below)
model = LogisticRegression()
model.fit(x_train, y_train)

# Save the trained model
joblib.dump(model, "number_recognition.pkl")

STOP: TOTAL NO. OF ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(


['number_recognition.pkl']

In [None]:
import gradio as gr

def greet(name, intensity):
    # Cast to int: string repetition fails if intensity arrives as a float.
    return "Hello, " + name + "!" * int(intensity)