
Task 1:  
-------
Boston House Price Prediction – Create a machine learning regression model that predicts house prices in Boston. Collect the dataset online.

Make sure to try out all the regression models we learnt and choose the best one. The dataset columns are:

- CRIM: Per capita crime rate by town.
- ZN: Proportion of residential land zoned for lots over 25,000 sq.ft.
- INDUS: Proportion of non-retail business acres per town.
- CHAS: Charles River dummy variable (1 if tract bounds river; 0 otherwise).
- NOX: Nitric oxides concentration (parts per 10 million).
- RM: Average number of rooms per dwelling.
- AGE: Proportion of owner-occupied units built before 1940.
- DIS: Weighted distances to five Boston employment centers.
- RAD: Index of accessibility to radial highways.
- TAX: Full-value property-tax rate per $10,000.
- PTRATIO: Pupil-teacher ratio by town.
- LSTAT: % lower status of the population.
- MEDV: Median value of owner-occupied homes in $1000s (target variable).


In [38]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split

#CLASSIFICATION Imports :
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

#REGRESSION Imports : 
from sklearn.linear_model import LinearRegression,Lasso,Ridge
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import  SVR

from sklearn.preprocessing import StandardScaler
from sklearn.metrics import confusion_matrix,accuracy_score,precision_score,recall_score,mean_absolute_error,mean_squared_error,r2_score

In [39]:
df=pd.read_csv('BostonHousePricePrediction.csv')
x=df.iloc[:,:-1]
y=df.iloc[:,-1]
df.head()

Unnamed: 0,CRIM,ZN,INDUS,CHAS,NOX,RM,AGE,DIS,RAD,TAX,PTRATIO,LSTAT,MEDV
0,0.00632,18.0,2.31,0,0.538,6.575,65.2,4.09,1,296,15.3,4.98,24.0
1,0.02731,0.0,7.07,0,0.469,6.421,78.9,4.9671,2,242,17.8,9.14,21.6
2,0.02729,0.0,7.07,0,0.469,7.185,61.1,4.9671,2,242,17.8,4.03,34.7
3,0.03237,0.0,2.18,0,0.458,6.998,45.8,6.0622,3,222,18.7,2.94,33.4
4,0.06905,0.0,2.18,0,0.458,7.147,54.2,6.0622,3,222,18.7,5.33,36.2


In [40]:
X_train,X_test,Y_train,Y_test=train_test_split(x,y,test_size=0.2)
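
The split above draws a different random partition on every run, so the scores reported further down will shift between executions. A minimal sketch of a reproducible split, assuming a fixed seed is acceptable; the toy arrays and the seed value 42 are illustrative and not part of the original notebook:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy stand-in for the feature matrix and target (illustrative data only).
x_demo = np.arange(40).reshape(20, 2)
y_demo = np.arange(20)

# Fixing random_state makes the partition identical on every run.
split_a = train_test_split(x_demo, y_demo, test_size=0.2, random_state=42)
split_b = train_test_split(x_demo, y_demo, test_size=0.2, random_state=42)

# split[1] is the test portion of the features.
same_test_rows = np.array_equal(split_a[1], split_b[1])
print(same_test_rows, split_a[1].shape)
```

Passing the same `random_state` to the notebook's own `train_test_split` call would make its printed metrics repeatable, at the cost of tying them to one particular partition.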

In [41]:
scaler=StandardScaler()
X_train=scaler.fit_transform(X_train)
X_test=scaler.transform(X_test)
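
The scaler is fitted on the training rows only and then applied to the test rows, which keeps test-set statistics out of training. The same behaviour can be bundled with a model in a pipeline. A hedged sketch on synthetic data (`make_regression` and the variable names here are stand-ins, not the notebook's dataset):

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic regression data used only for illustration.
X_demo, y_demo = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=0)

# fit() learns the scaling statistics from X_tr only; score() reuses them on X_te.
scaled_svr = make_pipeline(StandardScaler(), SVR(kernel='rbf', C=20))
scaled_svr.fit(X_tr, y_tr)

train_means = scaled_svr.named_steps['standardscaler'].mean_
r2_demo = scaled_svr.score(X_te, y_te)
print(train_means.shape, round(r2_demo, 2))
```

A pipeline of this kind can also be passed to cross-validation helpers, which then refit the scaler inside each fold.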

In [59]:
#Define Regression Models
regressor_models={
    'Multi-Linear Regression' : LinearRegression(),
    'Polynomial Regression_Lasso' : make_pipeline(PolynomialFeatures(degree=5),Lasso(alpha=0.1,max_iter=10000)),
    'Polynomial Regression_Ridge' : make_pipeline(PolynomialFeatures(degree=5),Ridge(alpha=0.1,max_iter=10000)),
    'KNN-Regressor' : KNeighborsRegressor(n_neighbors=5,metric='minkowski',p=2),
    'Decision Tree Regressor' : DecisionTreeRegressor(criterion='squared_error',max_depth=3) ,
    'Random Forest Regressor' : RandomForestRegressor(n_estimators=100,max_depth=3),
    'Support Vector Regressor' : SVR(kernel='rbf',C=20)
}

In [60]:
#Evaluate the Models
result = {}
for name, model in regressor_models.items():
    model.fit(X_train, Y_train)  # Fit the model
    y_pred = model.predict(X_test)  # Make predictions
    
    # Calculate evaluation metrics
    mae = mean_absolute_error(Y_test, y_pred)
    mse = mean_squared_error(Y_test, y_pred)
    r2 = r2_score(Y_test, y_pred)
    
    result[name] = {'MAE': mae, 'MSE': mse, 'R2_Score': r2}
    
    print(f'{name} - MAE: {mae:.2f}, MSE: {mse:.2f}, R2_Score: {r2:.2f}')

Multi-Linear Regression - MAE: 3.62, MSE: 31.92, R2_Score: 0.68


Polynomial Regression_Lasso - MAE: 3.88, MSE: 65.99, R2_Score: 0.35
Polynomial Regression_Ridge - MAE: 9.75, MSE: 425.58, R2_Score: -3.22
KNN-Regressor - MAE: 3.28, MSE: 35.85, R2_Score: 0.64
Decision Tree Regressor - MAE: 3.75, MSE: 24.05, R2_Score: 0.76
Random Forest Regressor - MAE: 3.37, MSE: 24.31, R2_Score: 0.76
Support Vector Regressor - MAE: 2.52, MSE: 19.87, R2_Score: 0.80
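
MSE is expressed in squared target units, so it is hard to compare with MAE directly. Taking the square root gives RMSE in the same units as MEDV ($1000s). A small worked sketch using the MSE values printed above:

```python
import math

# MSE values copied from the output above.
mse_by_model = {
    'Multi-Linear Regression': 31.92,
    'Polynomial Regression_Lasso': 65.99,
    'Polynomial Regression_Ridge': 425.58,
    'KNN-Regressor': 35.85,
    'Decision Tree Regressor': 24.05,
    'Random Forest Regressor': 24.31,
    'Support Vector Regressor': 19.87,
}

# RMSE = sqrt(MSE), expressed in $1000s like MEDV.
rmse_by_model = {name: math.sqrt(mse) for name, mse in mse_by_model.items()}
for name, rmse in rmse_by_model.items():
    print(f'{name} - RMSE: {rmse:.2f}')
```

For the Support Vector Regressor this comes to roughly 4.46, that is, a root-mean-square error of about $4,460 on this particular test split.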


In [44]:
# Determine the best fit algorithm
best_model_name = None
best_model_score = float('-inf')  # R2 can be negative, so 0 is not a safe starting value

for name in result:
    if result[name]['R2_Score'] > best_model_score:
        best_model_score = result[name]['R2_Score']
        best_model_name = name

print(f'\nThe model with the highest R2_Score is: {best_model_name} with R2_Score: {best_model_score:.2f}')



The model with the highest R2_Score is: Support Vector Regressor with R2_Score: 0.80
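
A single train/test split on a dataset this small can rank models differently from run to run. A sketch of k-fold cross-validation as a steadier comparison, on synthetic data with an assumed 5 folds; the dataset and the two candidate models are illustrative, not the notebook's results:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor

# Synthetic, linearly generated data (illustration only).
X_demo, y_demo = make_regression(n_samples=300, n_features=8, noise=15.0, random_state=0)

candidates = {
    'Multi-Linear Regression': make_pipeline(StandardScaler(), LinearRegression()),
    'Decision Tree Regressor': make_pipeline(
        StandardScaler(), DecisionTreeRegressor(max_depth=3, random_state=0)),
}

# Mean R2 over 5 folds; the pipeline refits the scaler inside each fold.
cv_r2 = {
    name: cross_val_score(model, X_demo, y_demo, cv=5, scoring='r2').mean()
    for name, model in candidates.items()
}

best_cv_name = max(cv_r2, key=cv_r2.get)
print(cv_r2, best_cv_name)
```

Because the synthetic target is linear in the features, the linear model wins here; on real housing data the ranking may differ.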


Task 2:
-------
Titanic Passenger Survival Prediction - The Titanic dataset contains data on passengers aboard the Titanic. The task is to predict whether a passenger survived, based on features such as passenger class, sex, age, and fare.

Implement all the classification algorithms and highlight the best one.

In [45]:
df=pd.read_csv('Titanic_training.csv')
df.head()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S


In [46]:
print("Dataset Shape: ",df.shape)
print("Dataset Size: ",df.size)

Dataset Shape:  (891, 12)
Dataset Size:  10692


In [47]:
df.isnull().sum()

PassengerId      0
Survived         0
Pclass           0
Name             0
Sex              0
Age            177
SibSp            0
Parch            0
Ticket           0
Fare             0
Cabin          687
Embarked         2
dtype: int64

In [48]:
# Fill missing values
# Age: median; Embarked: most frequent value (assign the result back, fillna is not in-place)
df['Age']=df['Age'].fillna(df['Age'].median())
df['Embarked']=df['Embarked'].fillna(df['Embarked'].mode()[0])
df['Cabin']=df['Cabin'].replace(to_replace=np.nan,value='No Data')
df


Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.2500,No Data,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.9250,No Data,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1000,C123,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.0500,No Data,S
...,...,...,...,...,...,...,...,...,...,...,...,...
886,887,0,2,"Montvila, Rev. Juozas",male,27.0,0,0,211536,13.0000,No Data,S
887,888,1,1,"Graham, Miss. Margaret Edith",female,19.0,0,0,112053,30.0000,B42,S
888,889,0,3,"Johnston, Miss. Catherine Helen ""Carrie""",female,28.0,1,2,W./C. 6607,23.4500,No Data,S
889,890,1,1,"Behr, Mr. Karl Howell",male,26.0,0,0,111369,30.0000,C148,C
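
`fillna` returns a new Series rather than modifying the column, so the result has to be assigned back for the imputation to take effect. A minimal sketch on a toy frame (the values are made up for illustration):

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({'Embarked': ['S', 'C', np.nan, 'S', np.nan]})

# Without assignment the column is left unchanged.
toy['Embarked'].fillna(toy['Embarked'].mode()[0])
missing_before = int(toy['Embarked'].isnull().sum())

# Assigning the result back fills the gaps with the most frequent value.
toy['Embarked'] = toy['Embarked'].fillna(toy['Embarked'].mode()[0])
missing_after = int(toy['Embarked'].isnull().sum())

print(missing_before, missing_after)
```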


In [49]:
# Convert categorical variables to numerical
df = pd.get_dummies(df, columns=['Sex', 'Embarked'], drop_first=True)

#Feature Selection
features = ['Pclass', 'Age', 'SibSp', 'Parch', 'Fare', 'Sex_male', 'Embarked_Q', 'Embarked_S']
X = df[features]
Y = df['Survived']
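
With `drop_first=True` the first category of each column becomes the implicit baseline: `Sex_female` and `Embarked_C` are dropped, and a row whose remaining dummies are all 0 is read as the baseline. A missing value is encoded the same way, which is why nulls should be filled before this step. A toy sketch:

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({'Embarked': ['C', 'Q', 'S', np.nan]})
encoded = pd.get_dummies(toy, columns=['Embarked'], drop_first=True)

# 'C' (row 0) and the missing value (row 3) both end up as all zeros.
row_c = encoded.iloc[0].astype(int).tolist()
row_nan = encoded.iloc[3].astype(int).tolist()
print(list(encoded.columns), row_c, row_nan)
```

This also explains why a null check after encoding reports 0 for the dummy columns even if the source column still contained missing values.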

In [50]:
df.isnull().sum()

PassengerId    0
Survived       0
Pclass         0
Name           0
Age            0
SibSp          0
Parch          0
Ticket         0
Fare           0
Cabin          0
Sex_male       0
Embarked_Q     0
Embarked_S     0
dtype: int64

In [51]:
X_Train,X_Test,Y_Train,Y_Test=train_test_split(X,Y,test_size=0.2)
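
The two survival classes are not equally frequent, so a purely random split can leave the test set with a different survivor share than the training set. A hedged sketch of a stratified split; the synthetic labels (40% positives) and the seed are assumptions for illustration:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy features and labels: 40 positives, 60 negatives.
X_demo = np.arange(200).reshape(100, 2)
y_demo = np.array([1] * 40 + [0] * 60)

# stratify keeps the class proportions equal in both partitions.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=0)

train_rate = y_tr.mean()
test_rate = y_te.mean()
print(train_rate, test_rate)
```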

In [52]:
scaler=StandardScaler()
X_Train=scaler.fit_transform(X_Train)
X_Test=scaler.transform(X_Test)

In [53]:
#Define Classification Models
classifier_models={
    'Decision Tree Classifier':DecisionTreeClassifier(criterion='entropy',max_depth=3),
    'Random Forest':RandomForestClassifier(n_estimators=100,criterion='gini',max_depth=5),
    'Support Vector Classifier':SVC(kernel='rbf',C=10,gamma=0.5),
    'LogisticRegression':LogisticRegression(),
    'KNN':KNeighborsClassifier(n_neighbors=5)
}

In [54]:
result_c={}
for name,model in classifier_models.items():
    model.fit(X_Train,Y_Train)
    y_pred_c=model.predict(X_Test)
    acc=accuracy_score(Y_Test,y_pred_c)
    result_c[name] = {'Accuracy': acc}
    print(f'{name} Accuracy: {acc:.2f}')
    print(confusion_matrix(Y_Test,y_pred_c))

Decision Tree Classifier Accuracy: 0.82
[[100  15]
 [ 18  46]]
Random Forest Accuracy: 0.82
[[103  12]
 [ 20  44]]
Support Vector Classifier Accuracy: 0.82
[[100  15]
 [ 18  46]]
LogisticRegression Accuracy: 0.79
[[97 18]
 [20 44]]
KNN Accuracy: 0.80
[[96 19]
 [16 48]]
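
Accuracy alone hides how errors are distributed between the classes. In scikit-learn's confusion matrix, rows are true labels and columns are predictions, so precision and recall for the survived class can be read off directly. A worked sketch using the Random Forest matrix printed above:

```python
import numpy as np

# Random Forest confusion matrix from the output above: [[TN, FP], [FN, TP]].
cm = np.array([[103, 12],
               [20, 44]])
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()   # 147 / 179
precision = tp / (tp + fp)        # 44 / 56
recall = tp / (tp + fn)           # 44 / 64

print(f'Accuracy: {accuracy:.2f}, Precision: {precision:.2f}, Recall: {recall:.2f}')
```

A recall of about 0.69 means that roughly a third of the actual survivors in this test split were predicted as not surviving. `precision_score` and `recall_score`, already imported at the top of the notebook, compute the same quantities from `Y_Test` and the predictions.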


In [55]:
# Determine the best fit algorithm
best_model_name_c = None
best_model_score_c = float('-inf')

for name in result_c:
    if result_c[name]['Accuracy'] > best_model_score_c:
        best_model_score_c = result_c[name]['Accuracy']
        best_model_name_c = name

print(f'\nThe model with the highest Accuracy is: {best_model_name_c} with Accuracy: {best_model_score_c:.2f}')


The model with the highest Accuracy is: Random Forest with Accuracy: 0.82
