## Steps for Building an ML Model

1. Read the dataset
2. Exploratory Data Analysis
    * Initial Inspection
    * Visualization
3. Preprocess the Data
    * Missing Values
    * Feature Scaling
    * Label Encoding
    * Feature Selection / Feature Extraction
4. Split into Train and Test
5. Build the Model
6. Train the Model (Learn from the Data)
7. Test and Evaluate the Model
8. Use the Model to Predict on New Data
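
The steps above can be sketched end to end as follows. This is only a minimal illustration on synthetic data (the `demo` frame and its coefficients are made up, not taken from the housing dataset), and it uses a scikit-learn pipeline so that feature scaling is learned from the training split only.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (NOT the housing dataset): price is a noisy
# linear function of two made-up features.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "lotsize": rng.uniform(2000, 10000, size=200),
    "bathrms": rng.integers(1, 4, size=200),
})
demo["price"] = (5 * demo["lotsize"] + 10000 * demo["bathrms"]
                 + rng.normal(0, 2000, size=200))

# Steps 3-4: pick features and target, then split into train and test.
X = demo[["lotsize", "bathrms"]]
y = demo["price"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1
)

# Steps 5-6: build the model (scaling + regression) and fit it on the training split.
pipeline = make_pipeline(StandardScaler(), LinearRegression())
pipeline.fit(X_train, y_train)

# Step 7: evaluate on the held-out test split.
score = r2_score(y_test, pipeline.predict(X_test))
print(f"R^2 on held-out data: {score:.3f}")
```

Plain linear regression does not need scaling to fit well; the scaler is included only to show where that step would sit in the sequence.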

### Importing Libraries and Reading the Dataset

In [1]:
import pandas as pd
import numpy as np
df=pd.read_csv("E:\\ML\\Housing.csv")
df.head()

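| | Unnamed: 0 | price | lotsize | bedrooms | bathrms | stories | driveway | recroom | fullbase | gashw | airco | garagepl | prefarea |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 42000.0 | 5850 | 3 | 1 | 2 | yes | no | yes | no | no | 1 | no |
| 1 | 2 | 38500.0 | 4000 | 2 | 1 | 1 | yes | no | no | no | no | 0 | no |
| 2 | 3 | 49500.0 | 3060 | 3 | 1 | 1 | yes | no | no | no | no | 0 | no |
| 3 | 4 | 60500.0 | 6650 | 3 | 1 | 2 | yes | yes | no | no | no | 0 | no |
| 4 | 5 | 61000.0 | 6360 | 2 | 1 | 1 | yes | no | no | no | no | 0 | no |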
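
The `Unnamed: 0` column looks like a row counter that was saved into the CSV, not a real feature. Assuming that is the case, one way to keep it out of the feature set is sketched below on a small inline excerpt (`csv_text` is a made-up stand-in for the file).

```python
import io
import pandas as pd

# Two-row excerpt shaped like the CSV is assumed to be: the first column
# has an empty header and holds row numbers.
csv_text = """,price,lotsize,bedrooms
1,42000.0,5850,3
2,38500.0,4000,2
"""

# Default read: pandas keeps the unnamed column and calls it "Unnamed: 0".
raw = pd.read_csv(io.StringIO(csv_text))
print(raw.columns.tolist())

# Option 1: treat the first column as the index while reading.
indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Option 2: drop the column after loading.
dropped = raw.drop(columns=["Unnamed: 0"])
print(dropped.columns.tolist())
```

Either option would also remove the column from `df.corr()`, where a row counter can show misleading correlations.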


### Checking the Variable Types

In [2]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 546 entries, 0 to 545
Data columns (total 13 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   Unnamed: 0  546 non-null    int64  
 1   price       546 non-null    float64
 2   lotsize     546 non-null    int64  
 3   bedrooms    546 non-null    int64  
 4   bathrms     546 non-null    int64  
 5   stories     546 non-null    int64  
 6   driveway    546 non-null    object 
 7   recroom     546 non-null    object 
 8   fullbase    546 non-null    object 
 9   gashw       546 non-null    object 
 10  airco       546 non-null    object 
 11  garagepl    546 non-null    int64  
 12  prefarea    546 non-null    object 
dtypes: float64(1), int64(6), object(6)
memory usage: 55.6+ KB
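
The non-numeric columns reported by `df.info()` are the ones that need encoding later. A small sketch of listing them programmatically, on a made-up four-column `sample` frame:

```python
import pandas as pd

# Hypothetical miniature of the housing frame.
sample = pd.DataFrame({
    "price": [42000.0, 38500.0],
    "lotsize": [5850, 4000],
    "driveway": ["yes", "yes"],
    "airco": ["no", "no"],
})

# Columns that are not numeric are candidates for encoding.
categorical_cols = sample.select_dtypes(exclude="number").columns.tolist()
numeric_cols = sample.select_dtypes(include="number").columns.tolist()

print("categorical:", categorical_cols)
print("numeric:", numeric_cols)
```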


### Checking for Null Values

In [3]:
df.isna().sum()

Unnamed: 0    0
price         0
lotsize       0
bedrooms      0
bathrms       0
stories       0
driveway      0
recroom       0
fullbase      0
gashw         0
airco         0
garagepl      0
prefarea      0
dtype: int64
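
This dataset has no missing values, so nothing needs to be imputed here. For reference, if some were present, one common approach (an assumption, not something this notebook does) is to fill numeric columns with the median and categorical columns with the most frequent value:

```python
import numpy as np
import pandas as pd

# Made-up frame with one missing value in each column.
sample = pd.DataFrame({
    "lotsize": [5850.0, np.nan, 3060.0, 6650.0],
    "driveway": ["yes", "yes", None, "no"],
})
print(sample.isna().sum())

filled = sample.copy()
# Numeric column: fill with the median of the observed values.
filled["lotsize"] = filled["lotsize"].fillna(filled["lotsize"].median())
# Categorical column: fill with the most frequent observed value.
filled["driveway"] = filled["driveway"].fillna(filled["driveway"].mode()[0])

print(filled)
```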

### Treating Categorical Variables

In [4]:
from sklearn.preprocessing import LabelEncoder

In [5]:
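le=LabelEncoder()  # sorts classes alphabetically, so "no" becomes 0 and "yes" becomes 1
cols=["driveway","recroom","fullbase","gashw","airco","prefarea"]
for f in cols:
    df[f]=le.fit_transform(df[f])  # refit the encoder on each yes/no column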

In [6]:
df.head()

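| | Unnamed: 0 | price | lotsize | bedrooms | bathrms | stories | driveway | recroom | fullbase | gashw | airco | garagepl | prefarea |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 42000.0 | 5850 | 3 | 1 | 2 | 1 | 0 | 1 | 0 | 0 | 1 | 0 |
| 1 | 2 | 38500.0 | 4000 | 2 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 3 | 49500.0 | 3060 | 3 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 4 | 60500.0 | 6650 | 3 | 1 | 2 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
| 4 | 5 | 61000.0 | 6360 | 2 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |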
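
`LabelEncoder` assigns integers in sorted class order, which is why "no" became 0 and "yes" became 1. A possible alternative for yes/no columns is an explicit mapping, sketched here on a made-up `sample` frame. The explicit map keeps the meaning fixed even if a column happens to contain only one of the two values.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical yes/no columns in the style of the housing data.
sample = pd.DataFrame({
    "driveway": ["yes", "no", "yes"],
    "airco": ["no", "no", "yes"],
})

# LabelEncoder: classes are stored sorted, and codes follow that order.
le = LabelEncoder()
encoded_le = le.fit_transform(sample["driveway"])
print(list(le.classes_), list(encoded_le))

# Explicit mapping: the same result here, but the yes -> 1 rule is spelled out.
yes_no = {"yes": 1, "no": 0}
mapped = sample.copy()
for col in ["driveway", "airco"]:
    mapped[col] = mapped[col].map(yes_no)
print(mapped)
```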


## Linear Regression

### Determining X and y

In [7]:
df.corr()

| | Unnamed: 0 | price | lotsize | bedrooms | bathrms | stories | driveway | recroom | fullbase | gashw | airco | garagepl | prefarea |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Unnamed: 0 | 1.0 | 0.376007 | 0.374338 | 0.114219 | 0.108395 | 0.231427 | 0.314014 | 0.096721 | -0.002034 | -0.033494 | 0.157111 | 0.125326 | 0.519994 |
| price | 0.376007 | 1.0 | 0.535796 | 0.366447 | 0.516719 | 0.42119 | 0.297167 | 0.25496 | 0.186218 | 0.092837 | 0.453347 | 0.383302 | 0.329074 |
| lotsize | 0.374338 | 0.535796 | 1.0 | 0.151851 | 0.193833 | 0.083675 | 0.288778 | 0.140327 | 0.047487 | -0.009201 | 0.221765 | 0.352872 | 0.234782 |
| bedrooms | 0.114219 | 0.366447 | 0.151851 | 1.0 | 0.373769 | 0.407974 | -0.011996 | 0.080492 | 0.097201 | 0.046028 | 0.160412 | 0.139117 | 0.078953 |
| bathrms | 0.108395 | 0.516719 | 0.193833 | 0.373769 | 1.0 | 0.324066 | 0.041955 | 0.126892 | 0.102791 | 0.067365 | 0.184955 | 0.178178 | 0.064013 |
| stories | 0.231427 | 0.42119 | 0.083675 | 0.407974 | 0.324066 | 1.0 | 0.122499 | 0.042281 | -0.17386 | 0.018261 | 0.296216 | 0.043412 | 0.04294 |
| driveway | 0.314014 | 0.297167 | 0.288778 | -0.011996 | 0.041955 | 0.122499 | 1.0 | 0.091959 | 0.043428 | -0.011942 | 0.10629 | 0.203682 | 0.199378 |
| recroom | 0.096721 | 0.25496 | 0.140327 | 0.080492 | 0.126892 | 0.042281 | 0.091959 | 1.0 | 0.372434 | -0.010119 | 0.136626 | 0.038122 | 0.161292 |
| fullbase | -0.002034 | 0.186218 | 0.047487 | 0.097201 | 0.102791 | -0.17386 | 0.043428 | 0.372434 | 1.0 | 0.004677 | 0.045248 | 0.052524 | 0.228651 |
| gashw | -0.033494 | 0.092837 | -0.009201 | 0.046028 | 0.067365 | 0.018261 | -0.011942 | -0.010119 | 0.004677 | 1.0 | -0.13035 | 0.068144 | -0.05917 |
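
The `price` row of the correlation matrix above can be ranked to choose predictors. The sketch below copies those values into a Series by hand (leaving out `Unnamed: 0`, which appears to be a row counter) and picks the three features with the strongest correlation to price.

```python
import pandas as pd

# Correlations with price, copied from the price row of df.corr() above.
price_corr = pd.Series({
    "lotsize": 0.535796,
    "bedrooms": 0.366447,
    "bathrms": 0.516719,
    "stories": 0.42119,
    "driveway": 0.297167,
    "recroom": 0.25496,
    "fullbase": 0.186218,
    "gashw": 0.092837,
    "airco": 0.453347,
    "garagepl": 0.383302,
    "prefarea": 0.329074,
})

# Rank by absolute correlation so strong negative relationships would count too.
ranked = price_corr.abs().sort_values(ascending=False)
top3 = ranked.head(3).index.tolist()
print(top3)
```

On the full frame the same ranking could be obtained with `df.corr()["price"].drop("price")`. Correlation only captures pairwise linear association, so this is a quick heuristic, not a full feature-selection method.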


In [31]:
x=df[["lotsize","bathrms","airco"]]  # the three features most correlated with price
y=df.price  # target variable

### Dividing into Train and Test

In [32]:
from sklearn.model_selection import train_test_split
xtrain,xtest,ytrain,ytest=train_test_split(x,y,test_size=0.2,random_state=1,shuffle=True)
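
With 546 rows and `test_size=0.2`, the split should put 110 rows in the test set and 436 in the training set, because scikit-learn rounds the test size up. A quick check on placeholder arrays of the same length (not the real data):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays with the same number of rows as the housing data.
X_demo = np.arange(546).reshape(-1, 1)
y_demo = np.arange(546)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=1, shuffle=True
)
print(len(X_tr), len(X_te))

# A fixed random_state makes the split reproducible across runs.
X_tr2, X_te2, _, _ = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=1, shuffle=True
)
same_split = np.array_equal(X_te, X_te2)
```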

### Lasso Regression

In [42]:
from sklearn.linear_model import Lasso
slr=Lasso()  # L1-regularized linear regression; default alpha=1.0
slr.fit(xtrain,ytrain)


Lasso()
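
The cell above fits `Lasso` with its default `alpha=1.0`, which adds an L1 penalty to ordinary least squares. When the target is on the scale of house prices, that penalty tends to be small relative to the squared error, so the coefficients usually end up close to those of plain `LinearRegression`. The sketch below illustrates this on synthetic data (the arrays and coefficients are made up).

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

# Synthetic data: three features and a target on a price-like scale.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
true_coef = np.array([5000.0, 3000.0, 2000.0])
y_demo = 60000 + X_demo @ true_coef + rng.normal(0, 500, size=200)

lasso = Lasso(alpha=1.0).fit(X_demo, y_demo)
ols = LinearRegression().fit(X_demo, y_demo)

# Compare the learned coefficients and intercepts side by side.
print("Lasso:", lasso.coef_, lasso.intercept_)
print("OLS:  ", ols.coef_, ols.intercept_)
```

On the real model, `slr.coef_` and `slr.intercept_` expose the same information. A larger `alpha` would shrink coefficients more strongly and can set some of them to exactly zero.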

### Evaluation

In [34]:
ypred=slr.predict(xtest)

In [35]:
ypred

array([ 52740.1472373 ,  60625.0229595 ,  48590.21264666,  69478.21675286,
        55183.99760734,  82119.69266871,  49281.86841177,  46930.23881041,
        57581.73759304, 100230.00274019,  45915.81035492,  99208.20110526,
        43886.95344394,  87035.05963939,  97463.37967977,  59656.70488835,
        53431.8030024 ,  59226.02941657, 110062.72815185,  96414.0543871 ,
        64959.3990875 ,  73605.09615132,  80452.48822587,  88241.30281169,
        54105.01461377,  46100.25189228,  51495.1668601 ,  70250.87973949,
        95157.86046275,  82539.43973907,  70038.91454439,  50434.62802028,
        74181.47595558,  48912.98533704,  62273.7832486 ,  42572.80749024,
        69992.80416005,  55276.21837602,  48451.88149364,  82708.05669084,
        76863.25142677,  52601.81608427,  75479.93989656,  91007.92587211,
        62008.33448972,  42572.80749024,  71414.85289516,  48820.76456836,
        65151.21380431,  48728.54379968,  83715.11196688,  49466.30994913,
        71399.17088243,  

In [36]:
from sklearn.metrics import mean_absolute_error,r2_score
r2_score(ytest,ypred)

0.540173635683557
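
An R² of about 0.54 means the model explains roughly half of the variance in price on the test set. `mean_absolute_error` is imported above but not used; error metrics in price units can complement R². A minimal sketch on three made-up values (not the notebook's `ytest` and `ypred`):

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Made-up true and predicted values, for illustration only.
y_true = np.array([100.0, 200.0, 300.0])
y_hat = np.array([110.0, 190.0, 330.0])

mae = mean_absolute_error(y_true, y_hat)            # average absolute error
rmse = np.sqrt(mean_squared_error(y_true, y_hat))   # penalizes large errors more
r2 = r2_score(y_true, y_hat)                        # share of variance explained

print(f"MAE={mae:.2f}  RMSE={rmse:.2f}  R^2={r2:.3f}")
```

In the notebook, the same calls would take `ytest` and `ypred` as arguments.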