scikit-learn

# 📈 Linear Regression - Summary

'''
✅ What is Linear Regression?
- A supervised learning algorithm used for predicting a continuous output.
- It finds the best-fitting straight line (y = mx + c) through the data points.

🧪 How it Works:
1. Assumes a linear relationship between input (X) and output (y).
2. Tries to minimize the error (difference between actual and predicted values).
3. Uses the **least squares method** to find the best-fit line.

⚙️ Main Code Steps:
1. Import:
   from sklearn.linear_model import LinearRegression
2. Prepare data:
   x = df[['feature']]        # Independent variable(s)
   y = df['target']           # Dependent variable (1-D, so predictions are 1-D too)
3. Train-test split:
   from sklearn.model_selection import train_test_split
   x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)
4. Train model:
   model = LinearRegression()
   model.fit(x_train, y_train)
5. Predict:
   y_pred = model.predict(x_test)
6. Evaluate:
   from sklearn.metrics import mean_squared_error, r2_score
   mse = mean_squared_error(y_test, y_pred)
   r2 = r2_score(y_test, y_pred)

🎯 Key Points:
- y = mx + c → m = slope (coefficient), c = intercept
- Good for predicting numerical values (e.g., price, temperature).
- Sensitive to outliers, because squaring the errors gives large residuals extra weight.
- Best for linearly related data.
- Evaluate using metrics like **R² score**, **Mean Squared Error (MSE)**.

📌 Use Cases:
- Predicting house prices
- Forecasting sales/revenue
- Analyzing relationships (e.g., hours studied vs. exam score)
'''
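
The least squares step in the summary above can be checked by hand. The sketch below uses made-up points that lie exactly on y = 2x + 1 (illustrative data, not taken from this notebook), computes m and c from the closed-form formulas for a single feature, and compares them with what `LinearRegression` reports.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Illustrative data (an assumption, not from the notebook): points on y = 2x + 1
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 * x + 1.0

# Closed-form least squares for one feature:
#   m = sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean)^2)
#   c = y_mean - m * x_mean
x_mean, y_mean = x.mean(), y.mean()
m = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
c = y_mean - m * x_mean

# scikit-learn expects a 2-D feature array, hence the reshape
model = LinearRegression().fit(x.reshape(-1, 1), y)

print(f"manual:  m={m:.3f}, c={c:.3f}")
print(f"sklearn: m={model.coef_[0]:.3f}, c={model.intercept_:.3f}")
```

Both lines should report the same slope and intercept, since `LinearRegression` solves the same least squares problem.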


In [2]:
# Step 1: imports
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Step 2: build the dataset
student_data = {
    'hour_study': [2, 3, 4, 5, 6, 7, 8, 9, 10],
    'exam_score': [50, 60, 70, 75, 80, 85, 90, 93, 95],
}
df = pd.DataFrame(student_data)

# Step 3: separate the feature (x) from the target (y)
x = df[['hour_study']]
y = df[['exam_score']]

# Step 4: split the data 80:20 (80% for training, 20% for testing)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)

# Step 5: create the linear regression model
model = LinearRegression()

# Step 6: train on the 80% training split
model.fit(x_train, y_train)

# Step 7: read the number of study hours from the user
user_input = float(input("enter number of hours you study:"))

# Step 8: predict the score for that input
# (a DataFrame keeps the feature name the model was trained with)
predicted_score = model.predict(pd.DataFrame({'hour_study': [user_input]}))

print(f"predicted exam score:{predicted_score}")

predicted exam score:[[88.80952381]]
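
The cell above stops at prediction, while the summary also lists an evaluation step. A minimal sketch of that step on the same student data follows. It uses a 1-D target (`df['exam_score']`) so the predictions come back as a flat array, and it reads the fitted slope and intercept. With only two test rows, the metrics are a demonstration of the calls rather than a reliable estimate.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

df = pd.DataFrame({
    'hour_study': [2, 3, 4, 5, 6, 7, 8, 9, 10],
    'exam_score': [50, 60, 70, 75, 80, 85, 90, 93, 95],
})
x = df[['hour_study']]   # 2-D features
y = df['exam_score']     # 1-D target

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=42
)
model = LinearRegression().fit(x_train, y_train)

# m (slope) and c (intercept) of the fitted line y = mx + c
slope = model.coef_[0]
intercept = model.intercept_

# Evaluate on the held-out rows
y_pred = model.predict(x_test)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f"slope={slope:.2f}, intercept={intercept:.2f}")
print(f"MSE={mse:.2f}, R2={r2:.2f}")
```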




# 🔐 Logistic Regression - Summary

'''
✅ What is Logistic Regression?
- A supervised classification algorithm.
- Used to predict the probability of a binary (or multi-class) outcome.
- Output is between 0 and 1 using the **sigmoid function**.

🧪 How it Works:
1. Takes input features (X) and applies linear equation: z = wx + b
2. Passes z into the sigmoid function: sigmoid(z) = 1 / (1 + e^(-z))
3. Outputs a probability → class is 1 if prob > 0.5, else 0

⚙️ Main Code Steps:
1. Import:
   from sklearn.linear_model import LogisticRegression
2. Prepare data:
   x = df[['features']]
   y = df['target']       # 0 or 1
3. Split:
   from sklearn.model_selection import train_test_split
   x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)
4. Train:
   model = LogisticRegression()
   model.fit(x_train, y_train)
5. Predict:
   y_pred = model.predict(x_test)
6. Evaluate:
   from sklearn.metrics import accuracy_score, classification_report
   accuracy = accuracy_score(y_test, y_pred)
   report = classification_report(y_test, y_pred)

🎯 Key Points:
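- Used for binary classification (0/1); also handles multi-class targets (scikit-learn fits a multinomial model by default with the default solver, and the `multi_class` parameter is deprecated since version 1.5)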
- Uses sigmoid curve to model probabilities
- Outputs class labels (0/1) and probabilities (with `predict_proba`)
- Works best when the classes can be separated by a roughly linear boundary
- Evaluate using accuracy, precision, recall, F1-score

📌 Use Cases:
- Spam detection
- Disease prediction (e.g., diabetes)
- Customer churn prediction
- Admission or purchase likelihood
'''
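
The three "How it Works" steps above can be traced with plain NumPy. In this sketch the weight `w`, bias `b`, and inputs are hypothetical values chosen for illustration. A real model learns `w` and `b` from data.

```python
import numpy as np

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical parameters and inputs (assumptions for illustration only)
w, b = 0.8, -4.0
x = np.array([2.0, 4.0, 8.0])

z = w * x + b                        # step 1: linear equation z = wx + b
probs = sigmoid(z)                   # step 2: probability of class 1
labels = (probs > 0.5).astype(int)   # step 3: threshold at 0.5

for xi, p, label in zip(x, probs, labels):
    print(f"x={xi:.1f} -> prob={p:.3f} -> class {label}")
```

Inputs with negative z fall below 0.5 and are assigned class 0. Inputs with positive z land above 0.5 and are assigned class 1.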


In [9]:
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

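# Step 1: Input data
# Columns: age, time spent on website (mins), added to cart (1 = yes, 0 = no)
x = np.array([[30, 25, 0], [30, 40, 1], [20, 35, 0], [35, 45, 1]])
y = np.array([0, 1, 0, 1])  # 1 = purchased, 0 = did not purchase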

# Step 2: Split data
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)

# Step 3: Train model
model = LogisticRegression()
model.fit(x_train, y_train)

# Step 4: Check accuracy (with 4 rows and test_size=0.2, the test set holds a single sample)
accuracy = model.score(x_test, y_test)
print(f"Model Accuracy: {accuracy * 100:.2f}%\n")

# Step 5: User input
try:
    user_age = float(input(" Enter user age: "))
    user_time_spent = float(input(" Enter time spent on website (in mins): "))
    user_add_cart = int(input(" Enter 1 if user added to cart, else 0: "))

    user_data = np.array([[user_age, user_time_spent, user_add_cart]])

    # Step 6: Prediction
    prediction = model.predict(user_data)

    # Step 7: Output result
    if prediction[0] == 1:
        print("\n User is most likely to purchase!")
    else:
        print("\n User may not purchase (low chance).")

except ValueError:
    print("\n Invalid input. Please enter numbers only.")


Model Accuracy: 100.00%


 User is most likely to purchase!
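
The summary mentions `predict_proba`, which the cell above does not call. As a rough sketch, the snippet below fits on all four rows of the same toy data (an assumption made to keep the example short, so there is no test split) and prints the probability of each class next to the predicted label.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Same toy data as the cell above: age, time on site (mins), added to cart
x = np.array([[30, 25, 0], [30, 40, 1], [20, 35, 0], [35, 45, 1]])
y = np.array([0, 1, 0, 1])

# Fit on every row (no split) purely to demonstrate the API
model = LogisticRegression().fit(x, y)

proba = model.predict_proba(x)   # shape (n_samples, n_classes)
pred = model.predict(x)

# Columns of proba follow the order of model.classes_ (here [0, 1])
for row, p, label in zip(x, proba, pred):
    print(f"{row} -> P(0)={p[0]:.2f}, P(1)={p[1]:.2f}, predicted={label}")
```

Each row of `proba` sums to 1, and `predict` returns the class with the higher probability.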


# ⚔️ Support Vector Machine (SVM) - Summary

'''
✅ What is SVM?
- A powerful supervised learning algorithm used for **classification** and **regression**.
- Works well on complex and high-dimensional datasets.
- It finds the **best decision boundary (hyperplane)** that separates classes with the **maximum margin**.

🧪 How it Works:
1. Maps input data into a higher-dimensional space (if needed, via a kernel).
2. Finds the **optimal hyperplane** that separates classes.
3. Focuses on the **support vectors** — the closest points to the boundary.
4. Can use different **kernels** to handle non-linear data.

⚙️ Main Code Steps:
1. Import:
   from sklearn.svm import SVC    # for classification
2. Prepare data:
   x = df[['features']]
   y = df['labels']
3. Train-test split:
   from sklearn.model_selection import train_test_split
   x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)
4. Train:
   model = SVC(kernel='linear')   # try 'rbf', 'poly', etc.
   model.fit(x_train, y_train)
5. Predict:
   y_pred = model.predict(x_test)
6. Evaluate:
   from sklearn.metrics import accuracy_score, classification_report
   accuracy = accuracy_score(y_test, y_pred)
   report = classification_report(y_test, y_pred)

🎯 Key Points:
- **Kernel** trick allows handling non-linear data:
  - 'linear', 'rbf' (Gaussian), 'poly', 'sigmoid'
- Good for both small & large feature spaces
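- Fairly robust to outliers that lie far from the boundary; the `C` parameter controls how strongly misclassified points are penalized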
- Focuses only on **support vectors** for decision-making
- Can be slow on very large datasets

📌 Use Cases:
- Image classification
- Bioinformatics (gene classification)
- Text classification (e.g., spam vs. ham)
- Handwriting recognition
'''
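
To see why the kernel choice matters, the sketch below compares a linear and an RBF kernel on data that no straight line can separate. The dataset comes from `sklearn.datasets.make_circles` (two concentric rings), which is a swapped-in synthetic dataset and not part of this notebook.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic, non-linear data: an inner ring (class 1) inside an outer ring (class 0)
X, y = make_circles(n_samples=200, noise=0.05, factor=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

scores = {}
for kernel in ('linear', 'rbf'):
    clf = SVC(kernel=kernel).fit(X_train, y_train)
    scores[kernel] = clf.score(X_test, y_test)
    # support_vectors_ holds the training points that define the boundary
    print(f"{kernel:>6}: accuracy={scores[kernel]:.2f}, "
          f"support vectors={len(clf.support_vectors_)}")
```

The linear kernel should hover near chance level here, while the RBF kernel separates the rings almost perfectly.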



In [10]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, classification_report

# Step 1: Simpler, pattern-based dataset
data = {
    'age': [18, 22, 25, 28, 31, 35, 40, 45, 50, 55, 60, 65],
    'monthly_recharge': [10, 15, 18, 20, 25, 30, 100, 110, 115, 120, 130, 140],
    'churn': [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]  # 0 = stays (low recharge), 1 = churns (high recharge)
}
df = pd.DataFrame(data)

# Step 2: Split features and labels
x = df[['age', 'monthly_recharge']]
y = df['churn']

# Step 3: Scale features
scaler = StandardScaler()
x_scaled = scaler.fit_transform(x)

# Step 4: Train-test split
x_train, x_test, y_train, y_test = train_test_split(x_scaled, y, test_size=0.25, random_state=42)

# Step 5: Train SVM using linear kernel
model = SVC(kernel='linear', C=1.0)
model.fit(x_train, y_train)

# Step 6: Predict and evaluate
y_pred = model.predict(x_test)
accuracy = accuracy_score(y_test, y_pred)

print(f"\n Model Accuracy (linear kernel): {accuracy:.2f}")
print("\n Classification Report:\n", classification_report(y_test, y_pred))

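# Step 7: User input (must be scaled with the same scaler the model was trained on)

input_age = int(input("Enter your age: "))
input_monthly_recharge = float(input("Enter your monthly recharge: "))

user_data = pd.DataFrame([[input_age, input_monthly_recharge]], columns=x.columns)
user_data_scaled = scaler.transform(user_data)

prediction = model.predict(user_data_scaled)

if prediction[0] == 0:
    print("the user will stay")
else:
    print(">>user may leave<<")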


 Model Accuracy (linear kernel): 1.00

 Classification Report:
               precision    recall  f1-score   support

           0       1.00      1.00      1.00         1
           1       1.00      1.00      1.00         2

    accuracy                           1.00         3
   macro avg       1.00      1.00      1.00         3
weighted avg       1.00      1.00      1.00         3

>>user may leave<<
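
One way to make sure new inputs always receive the same scaling as the training data is to bundle the scaler and the classifier in a pipeline. The sketch below reuses the churn data from the cell above. The two `new_users` rows are hypothetical values added for illustration.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

df = pd.DataFrame({
    'age': [18, 22, 25, 28, 31, 35, 40, 45, 50, 55, 60, 65],
    'monthly_recharge': [10, 15, 18, 20, 25, 30, 100, 110, 115, 120, 130, 140],
    'churn': [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1],
})
x = df[['age', 'monthly_recharge']]
y = df['churn']

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.25, random_state=42
)

# The scaler is fitted on the training rows only, then applied
# automatically to anything passed to predict()
pipeline = make_pipeline(StandardScaler(), SVC(kernel='linear', C=1.0))
pipeline.fit(x_train, y_train)

# Hypothetical new users, given in raw (unscaled) units
new_users = pd.DataFrame({'age': [20, 60], 'monthly_recharge': [12, 130]})
preds = pipeline.predict(new_users)
print(preds)
```

Because scaling happens inside the pipeline, raw values can be passed straight to `predict`, and the test rows never influence the scaler's mean and standard deviation.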


# 🔍 K-Nearest Neighbors (KNN) - Summary

'''
✅ What is KNN?
- KNN is a supervised machine learning algorithm.
- Mostly used for classification; also works for regression.
- It predicts the class of a new data point based on the majority class of its k-nearest neighbors.

🧪 How it Works:
1. Choose the number of neighbors (k).
2. Calculate distance (usually Euclidean) between the new point and all training points.
3. Select the k-nearest points.
4. For classification → Majority vote.
   For regression → Take average of neighbors.
5. Output the result.

⚙️ Main Code Steps:
1. Import:
   from sklearn.neighbors import KNeighborsClassifier
2. Scale data:
   from sklearn.preprocessing import StandardScaler
   scaler = StandardScaler()
3. Split data:
   from sklearn.model_selection import train_test_split
4. Train:
   knn = KNeighborsClassifier(n_neighbors=3)
   knn.fit(x_train, y_train)
5. Test & evaluate:
   accuracy = knn.score(x_test, y_test)

🎯 Key Points:
- Small k = flexible but noisy (overfitting risk).
- Large k = smoother but may underfit.
- Distance metric: Default is Euclidean.
- Data scaling is important.
- KNN is a lazy learner: No actual training, stores data for prediction.

📌 Use Cases:
- Customer segmentation (assigning customers to known, labeled segments)
- Pattern recognition (handwriting, face)
- Recommender systems
'''
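
The "How it Works" steps above are short enough to write out directly. The function below is a minimal from-scratch sketch for classification with Euclidean distance. `knn_predict` and the six sample points are hypothetical names and values for illustration, and `KNeighborsClassifier` remains the practical choice.

```python
from collections import Counter

import numpy as np

def knn_predict(x_train, y_train, new_point, k=3):
    """Classify new_point by majority vote among its k nearest neighbors."""
    # Step 2: Euclidean distance from the new point to every training point
    distances = np.sqrt(((x_train - new_point) ** 2).sum(axis=1))
    # Step 3: indices of the k closest training points
    nearest = np.argsort(distances)[:k]
    # Step 4: majority vote over their labels
    votes = Counter(y_train[nearest].tolist())
    return votes.most_common(1)[0][0]

# Illustrative data: two well-separated clusters
x_train = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.0],
                    [8.0, 8.0], [8.5, 9.0], [9.0, 8.0]])
y_train = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(x_train, y_train, np.array([1.2, 1.4])))  # → 0
print(knn_predict(x_train, y_train, np.array([8.4, 8.6])))  # → 1
```

For regression, the vote in step 4 would be replaced by the mean of the neighbors' target values.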



In [None]:
# 1. Imports
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# 2. Data
data = np.array([
    [25, 40000, 2], [40, 70000, 1], [20, 30000, 2], [35, 60000, 3], [25, 50000, 1],
    [20, 70000, 3], [28, 48000, 1], [32, 65000, 2], [22, 35000, 3], [27, 42000, 2],
    [30, 52000, 1], [23, 45000, 2], [33, 62000, 3], [26, 49000, 2], [29, 53000, 1],
    [31, 64000, 2], [21, 37000, 3], [24, 41000, 2], [34, 61000, 3], [36, 59000, 1],
    [37, 58000, 2], [38, 57000, 1], [39, 56000, 2], [41, 54000, 3], [42, 55000, 1],
    [43, 53000, 2], [44, 52000, 1], [45, 51000, 3], [46, 50000, 2], [47, 49000, 1]
])
labels = np.array([1, 0, 1, 2, 0, 2, 1, 2, 0, 1,
                   1, 1, 2, 1, 1, 2, 0, 1, 2, 0,
                   2, 0, 2, 2, 0, 2, 0, 2, 1, 0])  # 0 = low, 1 = medium, 2 = high

# 3. Scale data
scaler = StandardScaler()
data_scaled = scaler.fit_transform(data)

# 4. Train-test split
x_train, x_test, y_train, y_test = train_test_split(data_scaled, labels, test_size=0.2, random_state=42, stratify=labels)

# 5. Train model
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(x_train, y_train)

# 6. Test accuracy
accuracy = knn.score(x_test, y_test)
print(f"Accuracy of the model is:{accuracy * 100}%")

# 7. Predict for a new sample (scaled with the same fitted scaler)
user_input = np.array([[45, 80000, 1]])
user_input_scaled = scaler.transform(user_input)
knn.predict(user_input_scaled)


Accuracy of the model is:83.33333333333334%


array([0])
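
The key points note that a small k risks overfitting and a large k risks underfitting. One common way to pick k is to compare cross-validated accuracy across a few candidates. The sketch below does that on the built-in iris dataset, which is swapped in here because the 30-row dataset above is too small for stable folds. The candidate k values are arbitrary.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

cv_scores = {}
for k in (1, 3, 5, 7, 9, 15):
    # Scaling sits inside the pipeline so each fold is scaled on its own training part
    pipe = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    cv_scores[k] = cross_val_score(pipe, X, y, cv=5).mean()
    print(f"k={k:>2}: mean accuracy={cv_scores[k]:.3f}")

best_k = max(cv_scores, key=cv_scores.get)
print(f"best k among the candidates: {best_k}")
```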