# **Basic Notebook for Machine Learning using sklearn:**

This notebook covers the steps that recur in nearly every machine learning task with sklearn, from loading the libraries to saving a trained model.

# **Step 1**: Load the required libraries

In [13]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# **Step 2**: Load the dataset

In [2]:
df = sns.load_dataset('tips')
df.head()

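| | total_bill | tip | sex | smoker | day | time | size |
|---|---|---|---|---|---|---|---|
| 0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 |
| 1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 |
| 2 | 21.01 | 3.50 | Male | No | Sun | Dinner | 3 |
| 3 | 23.68 | 3.31 | Male | No | Sun | Dinner | 2 |
| 4 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 |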


# **Step 3**: Data Preprocessing

## Data preprocessing includes:
1. `Data Cleaning:`
    1. Imputing missing values
    2. Handling outliers
    3. Smoothing noisy data
    4. Resolving inconsistencies

2. `Data Integration:`
    1. Merging
    2. Joining
    3. Aggregating
    4. Concatenating
    5. Consolidation
    6. Handling the duplicates

3. `Data Transformation:`
    1. Scaling
    2. Normalization
    3. Aggregation
    4. Generalization

4. `Data Reduction:`
    1. Attribute selection
    2. Dimensionality reduction
    3. Numerosity reduction
    4. Data compression
    5. Data Encoding

5. `Data Discretization:`
    1. Binning
    2. Clustering
    3. Quantization
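
A few of the preprocessing tasks listed above can be sketched on a small, made-up frame. This is only an illustration, not a step the notebook applies to the real data: the `toy` frame and its values are hypothetical, and median imputation is just one possible choice. Note that `LabelEncoder` is designed for target labels; for feature columns sklearn also provides `OrdinalEncoder` and `OneHotEncoder`.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler, LabelEncoder

# Hypothetical toy frame that mimics two 'tips' columns (values are made up).
toy = pd.DataFrame({
    "total_bill": [16.99, 10.34, np.nan, 23.68, 24.59],
    "sex": ["Female", "Male", "Male", "Male", "Female"],
})

# Data cleaning: impute the missing bill with the column median.
toy["total_bill"] = toy["total_bill"].fillna(toy["total_bill"].median())

# Data transformation: scale to zero mean and unit variance.
scaler = StandardScaler()
toy["total_bill_scaled"] = scaler.fit_transform(toy[["total_bill"]]).ravel()

# Data encoding: map each category to an integer (classes are sorted alphabetically).
encoder = LabelEncoder()
toy["sex_encoded"] = encoder.fit_transform(toy["sex"])

print(toy)
```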
    

# **Step 4**: Separate the features `X` and the target/label `y`

In [14]:

X = df[['total_bill']]
y = df['tip']
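
The double brackets around `'total_bill'` matter: sklearn estimators expect a two-dimensional feature matrix and a one-dimensional target. A minimal sketch of the difference, using a hypothetical three-row frame (`toy_df`) rather than the real dataset:

```python
import pandas as pd

# Hypothetical three-row frame standing in for the tips data.
toy_df = pd.DataFrame({"total_bill": [16.99, 10.34, 21.01],
                       "tip": [1.01, 1.66, 3.50]})

X_toy = toy_df[["total_bill"]]  # double brackets -> DataFrame, shape (3, 1)
y_toy = toy_df["tip"]           # single brackets -> Series, shape (3,)

print(type(X_toy).__name__, X_toy.shape)
print(type(y_toy).__name__, y_toy.shape)
```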

# **Step 5**: Train Test Split

In [15]:
# Keep 80% of the rows for training and hold out 20% for testing (the split is random on each run).
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.8)
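
Because the split is shuffled randomly, the metrics reported later change from run to run. A minimal sketch of a repeatable split, assuming stand-in arrays (`X_demo`, `y_demo`) and an arbitrary seed of 42, neither of which comes from the notebook:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 10 rows, one feature.
X_demo = np.arange(10).reshape(-1, 1)
y_demo = np.arange(10) * 2.0

# random_state fixes the shuffle, so repeated calls return the same split.
split_a = train_test_split(X_demo, y_demo, train_size=0.8, random_state=42)
split_b = train_test_split(X_demo, y_demo, train_size=0.8, random_state=42)

X_tr, X_te, y_tr, y_te = split_a
print(X_tr.shape, X_te.shape)  # 8 training rows and 2 test rows
```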

# **Step 6**: Select and instantiate the model

In [16]:
model = LinearRegression()

# **Step 7**: Fit the model

In [17]:
model.fit(X_train, y_train)
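
Fitting a `LinearRegression` estimates a slope and an intercept, which are stored in `coef_` and `intercept_`. The sketch below uses hypothetical noise-free data (`bills`, `tips`) so the recovered values are easy to check; on the real tips data the numbers will differ.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical noise-free data following tip = 1.0 + 0.1 * total_bill.
bills = np.array([[10.0], [20.0], [30.0], [40.0]])
tips = 1.0 + 0.1 * bills.ravel()

demo_model = LinearRegression().fit(bills, tips)

slope = demo_model.coef_[0]         # one coefficient per feature
intercept = demo_model.intercept_   # value predicted when the feature is 0
print(slope, intercept)

# predict() evaluates intercept + slope * x for every row it receives.
manual = intercept + slope * 120.0
print(manual, demo_model.predict([[120.0]])[0])
```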

# **Step 8**: Predict with the model


In [18]:
y_pred = model.predict(X_test)

In [19]:
y_pred

array([3.66151455, 2.51874635, 3.72828086, 2.11911614, 3.0673912 ,
       5.07425087, 2.0823463 , 2.42488647, 5.26003537, 3.56184891,
       2.08911969, 2.77807055, 2.46842971, 2.37263458, 2.57486875,
       2.23716671, 5.76513698, 2.38424611, 3.43702495, 3.26478723,
       2.60680046, 3.08674375, 3.50766176, 3.85116957, 3.41864002,
       2.69291932, 2.61841199, 2.02525627, 2.05041458, 2.33683236,
       2.35908779, 2.54583992, 3.71860458, 2.83419295, 2.21974942,
       2.21200839, 3.89180993, 4.45883971, 2.32328557, 2.31360929,
       2.74033307, 4.00502236, 6.00510863, 2.13072768, 3.28994555,
       2.68034016, 4.83621447, 2.63486166, 2.79064971])

In [36]:
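model.predict(pd.DataFrame({'total_bill': [120]}))  # same column name as in training, which avoids sklearn's feature-name warning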



array([12.70012406])

# **Step 9**: Evaluate the model

In [30]:
from sklearn.metrics import mean_squared_error, r2_score

print('MSE: ', mean_squared_error(y_test, y_pred))
print('R2: ', r2_score(y_test, y_pred))
print('RMSE: ', np.sqrt(mean_squared_error(y_test, y_pred)))  # RMSE is the square root of the MSE


MSE:  1.038345996606986
R2:  0.6076310404026314
RMSE:  1.0189926381515158
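
To make the three metrics concrete, they can be computed by hand with NumPy. The arrays below (`y_true_demo`, `y_pred_demo`) are made-up values for illustration, not the notebook's test set.

```python
import numpy as np

# Hypothetical true and predicted values.
y_true_demo = np.array([3.0, 2.0, 4.0, 5.0])
y_pred_demo = np.array([2.5, 2.0, 4.5, 4.0])

errors = y_true_demo - y_pred_demo

mse = np.mean(errors ** 2)   # mean of the squared errors
rmse = np.sqrt(mse)          # same units as the target

# R2 = 1 - (residual sum of squares) / (total sum of squares)
ss_res = np.sum(errors ** 2)
ss_tot = np.sum((y_true_demo - y_true_demo.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

print(mse, rmse, r2)
```

An R2 of 1 means perfect predictions, while 0 means the model does no better than always predicting the mean of the target.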


# **Step 10**: Save and load a model

In [32]:
import pickle
with open('model_01.pkl', 'wb') as f:  # wb = write binary; the with block closes the file afterwards
    pickle.dump(model, f)


In [33]:
import pickle
with open('model_01.pkl', 'rb') as f:  # rb = read binary; the with block closes the file afterwards
    load_model = pickle.load(f)

In [35]:
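load_model.predict(pd.DataFrame({'total_bill': [120]}))  # same column name as in training, which avoids sklearn's feature-name warning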



array([12.70012406])
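
A quick way to confirm that saving and loading worked is to compare predictions before and after the round trip. The sketch below is self-contained: it fits a tiny hypothetical model (`demo`) and writes to a temporary directory with a made-up file name, so it does not touch `model_01.pkl`.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical model fitted on y = 2 * x.
demo = LinearRegression().fit(np.array([[1.0], [2.0], [3.0]]),
                              np.array([2.0, 4.0, 6.0]))

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo_model.pkl")  # hypothetical file name
    with open(path, "wb") as f:
        pickle.dump(demo, f)
    with open(path, "rb") as f:
        restored = pickle.load(f)

# The restored model should give the same prediction as the original.
same = np.allclose(demo.predict([[10.0]]), restored.predict([[10.0]]))
print(same)
```

Unpickling can execute arbitrary code, so only load pickle files from sources you trust.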

---

### So, these `10 steps` are the basic steps involved in machine learning using scikit-learn:

1. Load the required libraries
2. Load the dataset
3. Data preprocessing
4. Separate the features 'X' and the target/label 'y'
5. Train-test split the data
6. Model selection
7. Model fitting
8. Model prediction
9. Model evaluation
10. Save & load the model
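
Steps 5 to 9 can be wrapped in one small function, sketched below. The function name `run_workflow`, the seed, and the synthetic data are all assumptions made for illustration; the data is generated so that the true slope is 0.1 and the true intercept is 1.0.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split


def run_workflow(X, y, seed=0):
    """Split, fit, predict, and evaluate; returns the model and its scores."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, train_size=0.8, random_state=seed
    )
    model = LinearRegression().fit(X_train, y_train)
    y_pred = model.predict(X_test)
    scores = {
        "mse": mean_squared_error(y_test, y_pred),
        "r2": r2_score(y_test, y_pred),
    }
    return model, scores


# Hypothetical synthetic data: tip = 1.0 + 0.1 * bill + small noise.
rng = np.random.default_rng(0)
X_syn = rng.uniform(5, 50, size=(100, 1))
y_syn = 1.0 + 0.1 * X_syn.ravel() + rng.normal(0, 0.1, size=100)

fitted, scores = run_workflow(X_syn, y_syn)
print(fitted.coef_[0], fitted.intercept_, scores)
```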


---

# About Me:

<img src="https://scontent.flhe6-1.fna.fbcdn.net/v/t39.30808-6/449152277_18043153459857839_8752993961510467418_n.jpg?_nc_cat=108&ccb=1-7&_nc_sid=127cfc&_nc_ohc=6slHzGIxf0EQ7kNvgEeodY9&_nc_ht=scontent.flhe6-1.fna&oh=00_AYCiVUtssn2d_rREDU_FoRbXvszHQImqOjfNEiVq94lfBA&oe=66861B78" width="30%">

**Muhammd Faizan**

3rd-year BS Computer Science student at the University of Agriculture, Faisalabad.\
Contact me for queries, collaborations, or corrections.

[Kaggle](https://www.kaggle.com/faizanyousafonly/)\
[Linkedin](https://www.linkedin.com/in/mrfaizanyousaf/)\
[GitHub](https://github.com/faizan-yousaf/)\
[Email] faizan6t45@gmail.com or faizanyousaf815@gmail.com \
[Phone/WhatsApp]() +923065375389