### A machine learning pipeline is a sequence of steps that automates the complete machine learning workflow, from data collection to prediction
* Data Collection
* Feature Engineering (encoding: one-hot encoding, label encoding, ...)
* Model Training (linear regression, logistic regression, decision tree, ...)
* Evaluation (accuracy, MAE, RMSE)
* Prediction
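
To make the idea of "a sequence of steps" concrete, here is a minimal sketch comparing a hand-written chain of steps with scikit-learn's `Pipeline`. The small array below is made up purely for illustration; it is not the dataset used later in this notebook.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Made-up features (one missing value) and target, for illustration only.
X = np.array([[1.0, 60.0], [2.0, np.nan], [3.0, 70.0], [4.0, 80.0]])
y = np.array([40.0, 45.0, 50.0, 55.0])

# Manual version: fit each transformer and feed its output to the next step.
imputer = SimpleImputer(strategy="mean")
scaler = StandardScaler()
model = LinearRegression()
X_imputed = imputer.fit_transform(X)
X_scaled = scaler.fit_transform(X_imputed)
model.fit(X_scaled, y)
manual_pred = model.predict(scaler.transform(imputer.transform(X)))

# Pipeline version: the same three steps chained into a single estimator.
pipe = Pipeline([
    ("imputer", SimpleImputer(strategy="mean")),
    ("scaler", StandardScaler()),
    ("model", LinearRegression()),
])
pipe.fit(X, y)
pipeline_pred = pipe.predict(X)

# Both approaches should give the same predictions.
print(np.allclose(manual_pred, pipeline_pred))
```

The pipeline version has less bookkeeping: one `fit` and one `predict` call apply every step in order, so a preprocessing step cannot be accidentally skipped at prediction time.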

In [1]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.compose import ColumnTransformer

In [2]:
dataset={
    'Study Hours':[1,2,3,4,np.nan,6,7,8,9,10],
    'Attendacne':[60,65,np.nan,75,80,np.nan,90,92,93,98],
    'Score':[40,45,50,55,60,65,70,75,80,85]
}

In [3]:
df=pd.DataFrame(dataset)

In [4]:
df

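| | Study Hours | Attendacne | Score |
|---|---|---|---|
| 0 | 1.0 | 60.0 | 40 |
| 1 | 2.0 | 65.0 | 45 |
| 2 | 3.0 | NaN | 50 |
| 3 | 4.0 | 75.0 | 55 |
| 4 | NaN | 80.0 | 60 |
| 5 | 6.0 | NaN | 65 |
| 6 | 7.0 | 90.0 | 70 |
| 7 | 8.0 | 92.0 | 75 |
| 8 | 9.0 | 93.0 | 80 |
| 9 | 10.0 | 98.0 | 85 |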


In [5]:
X=df.drop('Score',axis=1)
y=df['Score']

In [6]:
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.2,random_state=42)

In [7]:
pipeline=Pipeline([
    ('imputer',SimpleImputer(strategy='mean')),
    ('scaler',StandardScaler()),
    ('model',LinearRegression())
])

In [8]:
pipeline

In [9]:
pipeline.fit(X_train,y_train) #Training
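
After fitting, each step keeps what it learned from the training data. The sketch below rebuilds the same toy dataset and pipeline so that it runs on its own, then looks inside the fitted steps through `named_steps`. The variable names `imputer`, `scaler` and `model` are just local names chosen here.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Same toy data as in the notebook (column spelling kept as-is).
df = pd.DataFrame({
    "Study Hours": [1, 2, 3, 4, np.nan, 6, 7, 8, 9, 10],
    "Attendacne": [60, 65, np.nan, 75, 80, np.nan, 90, 92, 93, 98],
    "Score": [40, 45, 50, 55, 60, 65, 70, 75, 80, 85],
})
X = df.drop("Score", axis=1)
y = df["Score"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

pipeline = Pipeline([
    ("imputer", SimpleImputer(strategy="mean")),
    ("scaler", StandardScaler()),
    ("model", LinearRegression()),
])
pipeline.fit(X_train, y_train)

# Each fitted step is available by the name given in the Pipeline definition.
imputer = pipeline.named_steps["imputer"]
scaler = pipeline.named_steps["scaler"]
model = pipeline.named_steps["model"]

print("Imputation values (training means):", imputer.statistics_)
print("Scaler means:", scaler.mean_)
print("Scaler standard deviations:", scaler.scale_)
print("Coefficients (on scaled features):", model.coef_)
print("Intercept:", model.intercept_)
```

Because the features are standardized to mean zero before the regression, the intercept should equal the mean training score, and the coefficients are expressed per standard deviation of each feature rather than per raw unit.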

In [10]:
y_pred=pipeline.predict(X_test) #Prediction
print('Actual Values: ',y_test.values)
print('Predicted Values: ',y_pred)

Actual Values:  [80 45]
Predicted Values:  [79.51982398 44.43394232]


In [11]:
mae=mean_absolute_error(y_test,y_pred)
mse=mean_squared_error(y_test,y_pred)
rmse=np.sqrt(mse)
r2=r2_score(y_test,y_pred)

In [12]:
print('Mean Absolute Error: ',mae)
print('Mean Squared Error: ',mse)
print('Root Mean Squared Error: ',rmse)
print('R2 Score: ',r2)

Mean Absolute Error:  0.5231168492684759
Mean Squared Error:  0.27549515259839574
Root Mean Squared Error:  0.524876321239962
R2 Score:  0.9991004239915154
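
As a check on what these numbers mean, the metrics can be recomputed by hand from the two actual and predicted values printed earlier. This is a small sketch using only NumPy; the values are copied from the output above, so the results should agree with scikit-learn up to rounding.

```python
import numpy as np

# Actual and predicted values copied from the notebook output above.
y_true = np.array([80.0, 45.0])
y_hat = np.array([79.51982398, 44.43394232])

errors = y_true - y_hat

# MAE: average absolute error.
mae = np.mean(np.abs(errors))
# MSE: average squared error; RMSE is its square root (same unit as the score).
mse = np.mean(errors ** 2)
rmse = np.sqrt(mse)
# R2: 1 minus (residual sum of squares / total sum of squares around the mean).
ss_res = np.sum(errors ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

print("MAE: ", mae)
print("MSE: ", mse)
print("RMSE:", rmse)
print("R2:  ", r2)
```

Note that the test set here contains only two rows, so these scores are a very rough estimate of how the model would perform on new students.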


In [19]:
study_hours=float(input('Enter the study hours: '))
attendance=float(input('Enter the attendance: '))
new_data=pd.DataFrame({'Study Hours':[study_hours],'Attendacne':[attendance]})
predicted_marks=pipeline.predict(new_data)
print('Predicted Marks: ',predicted_marks[0])

Enter the study hours: 7
Enter the attendance: 98
Predicted Marks:  70.16301463039422
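
The `input()` cell above is convenient interactively, but a small function is easier to reuse and test. The sketch below wraps the same prediction in a hypothetical helper, `predict_score` (a name introduced here, not part of scikit-learn), and also shows that the pipeline can handle a missing feature at prediction time because the imputer fills it with the training mean.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Same toy data and pipeline as above, rebuilt so this cell runs on its own.
df = pd.DataFrame({
    "Study Hours": [1, 2, 3, 4, np.nan, 6, 7, 8, 9, 10],
    "Attendacne": [60, 65, np.nan, 75, 80, np.nan, 90, 92, 93, 98],
    "Score": [40, 45, 50, 55, 60, 65, 70, 75, 80, 85],
})
X = df.drop("Score", axis=1)
y = df["Score"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
pipeline = Pipeline([
    ("imputer", SimpleImputer(strategy="mean")),
    ("scaler", StandardScaler()),
    ("model", LinearRegression()),
])
pipeline.fit(X_train, y_train)


def predict_score(fitted_pipeline, study_hours, attendance):
    """Hypothetical helper: predict one student's score from two features."""
    # Column names must match the ones used during training.
    new_data = pd.DataFrame(
        {"Study Hours": [study_hours], "Attendacne": [attendance]}
    )
    return float(fitted_pipeline.predict(new_data)[0])


# Same inputs as the interactive example above.
print("Predicted Marks: ", predict_score(pipeline, 7, 98))
# Attendance unknown: the imputer substitutes the training mean.
print("Predicted Marks (attendance missing): ", predict_score(pipeline, 7, np.nan))
```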


## Use student_performance_linear_regression.csv
* You can download it from GitHub: https://github.com/salman1256/aiml_dec_2025/blob/master/Day-11/
* Create a pipeline to predict a student's final score.
* Evaluate the pipeline and use it to predict the score for a new student.


In [20]:
df=pd.read_csv('/content/sample_data/student_performance_linear_regression.csv')

In [34]:
df

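| | study_hours | attendance_percentage | previous_exam_score | final_exam_score |
|---|---|---|---|---|
| 0 | 7 | 77 | 58 | 87 |
| 1 | 4 | 51 | 87 | 83 |
| 2 | 8 | 91 | 55 | 101 |
| 3 | 5 | 94 | 42 | 74 |
| 4 | 7 | 55 | 59 | 87 |
| ... | ... | ... | ... | ... |
| 115 | 7 | 86 | 65 | 93 |
| 116 | 4 | 81 | 74 | 70 |
| 117 | 7 | 58 | 89 | 105 |
| 118 | 3 | 90 | 64 | 72 |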


In [23]:
X=df.drop('final_exam_score',axis=1)
y=df['final_exam_score']

In [24]:
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.2,random_state=42)

In [25]:
pipeline=Pipeline([
    ('imputer',SimpleImputer(strategy='mean')),
    ('scaler',StandardScaler()),
    ('model',LinearRegression())
])

In [26]:
pipeline
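
`ColumnTransformer` is imported at the top of this notebook but is not needed here, because every feature is numeric. If a dataset also had a categorical column, a sketch like the following could apply different preprocessing per column type. The `school_type` column and all the values below are made up for illustration and are not part of the CSV.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Made-up data: two numeric features plus a hypothetical categorical one.
df_demo = pd.DataFrame({
    "study_hours": [2, 4, np.nan, 6, 8, 5],
    "attendance_percentage": [60, 70, 75, np.nan, 95, 80],
    "school_type": ["public", "private", "public", "public", "private", "public"],
    "final_exam_score": [50, 62, 60, 75, 90, 70],
})
X_demo = df_demo.drop("final_exam_score", axis=1)
y_demo = df_demo["final_exam_score"]

numeric_cols = ["study_hours", "attendance_percentage"]
categorical_cols = ["school_type"]

# Numeric columns: impute missing values, then standardize.
numeric_steps = Pipeline([
    ("imputer", SimpleImputer(strategy="mean")),
    ("scaler", StandardScaler()),
])

# Route each group of columns to its own preprocessing.
preprocessor = ColumnTransformer([
    ("num", numeric_steps, numeric_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])

demo_pipeline = Pipeline([
    ("preprocess", preprocessor),
    ("model", LinearRegression()),
])
demo_pipeline.fit(X_demo, y_demo)

# Two scaled numeric columns plus one one-hot column per category.
n_features_out = demo_pipeline.named_steps["preprocess"].transform(X_demo).shape[1]
demo_pred = demo_pipeline.predict(X_demo)
print("Features after preprocessing:", n_features_out)
print("Predictions:", demo_pred)
```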

In [27]:
pipeline.fit(X_train,y_train) #Training

In [28]:
y_pred=pipeline.predict(X_test) #Prediction
print('Actual Values: ',y_test.values)
print('Predicted Values: ',y_pred)

Actual Values:  [ 93  67  87  78  80  65  96  88  81 112  61  86  91  71  65  90  86  87
  88  69  95  86  87  97]
Predicted Values:  [ 99.3470484   74.94031454  81.93646701  85.40501845  73.48589537
  67.626957    96.91152912  87.38160671  77.40737339 106.97002269
  66.55701052  85.83518004  96.06575288  72.76247995  69.13006728
  96.4392374   88.03016617  87.32951574  86.14598304  70.9930693
  94.90340129  77.96052551  90.73538567 100.49185082]


In [29]:
mae=mean_absolute_error(y_test,y_pred)
mse=mean_squared_error(y_test,y_pred)
rmse=np.sqrt(mse)
r2=r2_score(y_test,y_pred)

In [30]:
print('Mean Absolute Error: ',mae)
print('Mean Squared Error: ',mse)
print('Root Mean Squared Error: ',rmse)
print('R2 Score: ',r2)

Mean Absolute Error:  3.7807895082412935
Mean Squared Error:  20.54950420851446
Root Mean Squared Error:  4.5331560979646905
R2 Score:  0.8568740698415438
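
A single 80/20 split gives one estimate of the error. As a complement, the same pipeline can be passed to `cross_val_score`, which refits the imputer, scaler and model on each training fold so that no information from the held-out fold leaks into preprocessing. The CSV is not bundled with this sketch, so the data below is synthetic, generated with the same column names and an assumed linear relationship; the numbers it prints say nothing about the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the CSV (assumed relationship, for illustration only).
rng = np.random.default_rng(0)
n = 120
X_syn = pd.DataFrame({
    "study_hours": rng.integers(1, 10, size=n).astype(float),
    "attendance_percentage": rng.integers(50, 100, size=n).astype(float),
    "previous_exam_score": rng.integers(40, 95, size=n).astype(float),
})
y_syn = (
    5.0 * X_syn["study_hours"]
    + 0.2 * X_syn["attendance_percentage"]
    + 0.4 * X_syn["previous_exam_score"]
    + rng.normal(0, 4, size=n)
)
# Blank out a few attendance values so the imputer has something to do.
X_syn.loc[::15, "attendance_percentage"] = np.nan

cv_pipeline = Pipeline([
    ("imputer", SimpleImputer(strategy="mean")),
    ("scaler", StandardScaler()),
    ("model", LinearRegression()),
])

# scikit-learn reports errors as negative scores, so flip the sign.
mae_per_fold = -cross_val_score(
    cv_pipeline, X_syn, y_syn, cv=5, scoring="neg_mean_absolute_error"
)
print("MAE per fold:", mae_per_fold)
print("Mean MAE:", mae_per_fold.mean())
```

To try this on the real data, replace `X_syn` and `y_syn` with the `X` and `y` built from the CSV above.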


In [33]:
# Feature columns expected by the pipeline: study_hours, attendance_percentage, previous_exam_score
study_hours=float(input('Enter the study hours: '))
attendance=float(input('Enter the attendance: '))
previous_exam_score=float(input('Enter the previous exam score: '))
new_data=pd.DataFrame({'study_hours':[study_hours],'attendance_percentage':[attendance],'previous_exam_score':[previous_exam_score]})
predicted_marks=pipeline.predict(new_data)
print('Predicted Marks: ',round(predicted_marks[0],2))

Enter the study hours: 8
Enter the attendance: 76
Enter the previous exam score: 45
Predicted Marks:  85.6
