## Supervised learning using Regression

## Predicting Price

## Objectives

On completing this assignment, you will learn how to write a simple AI application involving supervised learning using regression.

## Description

Write an AI application that, given a diamond's attributes, predicts its price. For training and testing the application, use the labeled data set provided in the file sb_diamonds.csv. The data set contains data on 53,940 diamonds with 10 attributes each, including the price. Use 80% of the data items for training and the remaining 20% for testing. Use sklearn's Linear Regression (LinearRegression) model for training. After the model is trained, test it on the test data and report the Mean Absolute Percentage Error (MAPE, mean_absolute_percentage_error in sklearn.metrics) reflecting its performance. Also report the trained model's coefficient (coef_) and intercept (intercept_) values.
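The workflow described above (80/20 split, LinearRegression, MAPE, coef_ and intercept_) can be sketched on a small synthetic data set. This is only an illustration under an assumption: the two features and the exact linear price formula below are made up and are not sb_diamonds.csv, so the near-zero MAPE says nothing about real diamond prices.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_percentage_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the diamond data (NOT sb_diamonds.csv):
# price is an exact linear function of two made-up features.
rng = np.random.default_rng(0)
X = rng.uniform(0.2, 3.0, size=(500, 2))
y = 4000 * X[:, 0] + 500 * X[:, 1] + 300

# 80% training, 20% testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# sklearn returns MAPE as a fraction (0.05 means 5%)
mape = mean_absolute_percentage_error(y_test, y_pred)
print('MAPE:', mape)
print('coef_:', model.coef_)            # one weight per feature
print('intercept_:', model.intercept_)  # predicted price when all features are 0
```

Because the synthetic price is exactly linear, the fitted coef_ and intercept_ recover the made-up weights; on real data they summarize the best linear fit instead.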

#### Regressor models to be used

Altogether, try out the following regression models from the sklearn library and compare their performance using Mean Absolute Percentage Error (MAPE) values.

- Linear Regressor (LinearRegression) from sklearn.linear_model

- KNeighbors Regressor (KNeighborsRegressor) from sklearn.neighbors (using n_neighbors=5)
  
- Support Vector Regressor (SVR) from sklearn.svm
  
- Random Forest Regressor (RandomForestRegressor) from sklearn.ensemble 
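Comparing the four regressors comes down to fitting each one on the same training split and ranking them by test MAPE. The sketch below assumes a made-up nonlinear data set in place of the diamond features; the model classes and mean_absolute_percentage_error are the real sklearn names listed above.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_percentage_error
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR

# Synthetic, made-up data standing in for the diamond features and prices
rng = np.random.default_rng(1)
X = rng.uniform(0.2, 3.0, size=(300, 3))
y = 1000 * X[:, 0] ** 2 + 200 * X[:, 1] + 50 * X[:, 2] + 300

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

models = {
    'LinearRegression': LinearRegression(),
    'KNeighborsRegressor': KNeighborsRegressor(n_neighbors=5),
    'SVR': SVR(),
    'RandomForestRegressor': RandomForestRegressor(random_state=42),
}

# Fit every model on the same split and record its test MAPE
mape_scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    mape_scores[name] = mean_absolute_percentage_error(y_test, model.predict(X_test))

# Lower MAPE is better
for name, score in sorted(mape_scores.items(), key=lambda item: item[1]):
    print(f'{name}: MAPE = {score:.4f}')

best_model = min(mape_scores, key=mape_scores.get)
print('Best model:', best_model)
```

The ranking on this toy data is not meant to predict the ranking on the diamond data.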

#### Individual Values

Also, try out made-up attribute values of a few diamonds with the best performing model from the above list and report the attribute values used and predicted prices received from the model.

## Implementation 

#### Preprocessing

- Remove rows containing missing or null values
- Remove duplicate rows
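The two preprocessing steps map directly onto pandas' dropna and drop_duplicates. A minimal sketch on a tiny made-up frame (not the real data) with one missing value and one repeated row:

```python
import numpy as np
import pandas as pd

# Tiny made-up frame: one row with a missing value, one exact duplicate row
df = pd.DataFrame({
    'carat': [0.23, 0.21, 0.21, np.nan],
    'cut':   ['Ideal', 'Premium', 'Premium', 'Good'],
    'price': [326, 326, 326, 327],
})

df = df.dropna()           # removes every row that contains a NaN
df = df.drop_duplicates()  # keeps the first of each group of identical rows
print(df)
```

Dropping missing values first means a row that differs from another only by a NaN is removed before duplicates are compared.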

#### Columns Used

Use all columns provided.

#### Column Cleaning

- carat, depth, table, price, x, y, and z column values are already numerical, so leave them as they are.
  
- cut and clarity column values seem to be of ordinal type, so convert them into numerical values using sklearn.preprocessing's label encoder (LabelEncoder).
  
- color column values seem to be of nominal type, so convert them into numerical values using pandas' get_dummies function (one-hot encoding).
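The column-cleaning rules above can be sketched on a few made-up rows that use values from the data set. The sketch assumes one LabelEncoder per ordinal column; note that LabelEncoder numbers labels alphabetically, not by quality grade.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# A few made-up rows using values that appear in sb_diamonds.csv
df = pd.DataFrame({
    'carat':   [0.23, 0.21, 0.29],
    'cut':     ['Ideal', 'Premium', 'Good'],
    'color':   ['E', 'E', 'I'],
    'clarity': ['SI2', 'SI1', 'VS2'],
})

# Ordinal columns: one LabelEncoder per column (integers follow alphabetical order)
for col in ['cut', 'clarity']:
    df[col] = LabelEncoder().fit_transform(df[col])

# Nominal column: one-hot encode with pandas, then drop the original text column
color_dummies = pd.get_dummies(df['color'], dtype=int)
df = pd.concat([df.drop(columns='color'), color_dummies], axis=1)
print(df)
```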

## Discussion

#### Column Data Types

Data values are of either quantitative or qualitative type.

#### Quantitative (Numerical) Values

We can recognize quantitative (numerical) values from the fact that they can be placed along a number line and we can perform mathematical operations (+, -, *, /) on them. Quantitative (numerical) values can be either discrete or continuous.

##### Continuous Values

When quantitative (numerical) values lie along a number line within a range and every value within the range is permitted, they are considered continuous. For example, height and weight values are continuous because any height or weight within a range is possible.

##### Discrete Values
 
When quantitative (numerical) values lie along a number line within a range but some values within the range are not permitted, they are considered discrete. For example, clothing and shoe sizes are discrete because only certain sizes exist within a range.

To differentiate between discrete and continuous values, consider shoe size and foot size. Shoe sizes are discrete because only certain values are permitted (shoe sizes of 8.11, 8.12, etc. do not exist). Foot sizes, on the other hand, are continuous because a foot can measure any value within a range.

#### Regressors versus Classifiers

In our supervised learning problems, if the target (label) values are continuous, such as prices (a price can take any value within a range), we use regressors. When the target can take only certain values or belongs to one of a set of categories, we use classifiers.
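sklearn ships a small helper, type_of_target in sklearn.utils.multiclass, that inspects target values and reports how sklearn would interpret them. The sketch below uses made-up targets to show the regression/classification distinction described above.

```python
from sklearn.utils.multiclass import type_of_target

# Made-up target values
prices = [326.5, 1435.25, 3584.75]   # non-integer floats -> treated as continuous (regression)
cuts = ['Ideal', 'Premium', 'Good']  # three categories -> treated as multiclass (classification)

print(type_of_target(prices))
print(type_of_target(cuts))
```

The helper looks only at the values, so integer-valued prices such as [326, 327, 1435] are reported as 'multiclass'. Choosing a regressor for the diamond prices still rests on knowing that price is a continuous quantity.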


#### Qualitative (Categorical) (Non-numerical) Values

We can recognize qualitative (categorical, non-numerical) values from the fact that they cannot be placed along a number line and we cannot perform mathematical operations (+, -, *, /) on them. Qualitative values can be either nominal or ordinal.

##### Nominal Values
 
When data values are just names without any ranking or order, they are considered nominal. For example, if a hair-color column contains values such as black, brown, and red, these values are nominal because no ranking is attached to them.

##### Ordinal Values

When data values are names but there is an implied ranking or order attached to them, they are considered ordinal values. For example, if a job satisfaction column contains values such as unsatisfied, satisfied, very satisfied etc. then these values are considered ordinal values because there is an implied ranking or order attached to them.
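pandas can record the implied order of ordinal values with an ordered Categorical. A minimal sketch using the job-satisfaction example, assuming the ranking unsatisfied < satisfied < very satisfied:

```python
import pandas as pd

# Job-satisfaction example from the text as an ordered categorical
levels = ['unsatisfied', 'satisfied', 'very satisfied']
answers = pd.Series(
    pd.Categorical(['satisfied', 'very satisfied', 'unsatisfied'],
                   categories=levels, ordered=True)
)

print(answers > 'unsatisfied')  # ordering comparisons are allowed
print(answers.cat.codes)        # integer codes follow the listed order
print(answers.max())            # highest level present
```

Ordering comparisons work, but the values are still not numbers. Arithmetic such as addition is not defined on them, which matches the definition of qualitative data above.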

#### Implementing Nominal and Ordinal Values

In our problem, both nominal and ordinal column values are converted to numerical values. For nominal values, we use pandas' get_dummies function. It creates a separate column for each distinct name value. In the hair-color example above, it creates a column for "black", a column for "brown", a column for "red", and so on, and assigns 0 or 1 in each column to indicate the absence or presence of that color for each individual.
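The hair-color example can be run directly with get_dummies. The sketch also shows drop_first=True, which the notebook uses for the diamond 'color' column: it removes the alphabetically first column, and a row of all zeros then stands for that dropped value.

```python
import pandas as pd

hair = pd.Series(['black', 'brown', 'red', 'brown'], name='hair_color')

# One 0/1 column per distinct value; each row has exactly one 1
full = pd.get_dummies(hair, dtype=int)
print(full)

# drop_first=True drops the first column alphabetically ('black');
# a row of all zeros then means 'black'
reduced = pd.get_dummies(hair, dtype=int, drop_first=True)
print(reduced)
```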

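For ordinal values, on the other hand, we use the sklearn.preprocessing module's label encoder (LabelEncoder). The encoder does not create any new columns; instead, it substitutes the integers 0, 1, 2, 3, etc. for the distinct name values. Note that LabelEncoder numbers the labels in sorted (alphabetical) order rather than in their implied ranking; for example, the cut mapping produced in Part 4 is {'Fair': 0, 'Good': 1, 'Ideal': 2, 'Premium': 3, 'Very Good': 4}.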
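LabelEncoder's alphabetical numbering can be compared with sklearn's OrdinalEncoder, which accepts an explicit category order. This sketch assumes the commonly used cut grading order Fair < Good < Very Good < Premium < Ideal; that ranking is an assumption and is not stated in the data set.

```python
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

cuts = ['Fair', 'Good', 'Very Good', 'Premium', 'Ideal']

# LabelEncoder sorts the labels alphabetically before numbering them
le = LabelEncoder().fit(cuts)
mapping = {str(c): int(i) for c, i in zip(le.classes_, le.transform(le.classes_))}
print(mapping)

# OrdinalEncoder numbers the labels in the order you supply (assumed lowest to highest grade)
order = ['Fair', 'Good', 'Very Good', 'Premium', 'Ideal']
oe = OrdinalEncoder(categories=[order])
codes = oe.fit_transform([[c] for c in cuts])  # expects a 2-D input: one column
print(codes.ravel())
```

With LabelEncoder, 'Ideal' gets a lower code than 'Premium' and 'Very Good', so a linear model sees a ranking that does not match the grade order. OrdinalEncoder keeps the stated order.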


## Implementation Notes


#### Dataset source

The data set was downloaded fity-data-determining-factors


## Submittal

The uploaded submittal should contain the following:

- ipynb file after running the application from start to finish, containing the marked source code, output, and your interaction.
  
- the corresponding html file.

## Keith Yrisarri Stateson
June 23, 2024. Python 3.11.0

## Title: Predicting Diamond Prices Using Various Regression Models - Supervised Learning

## Summary
This program is an AI application that predicts the prices of diamonds based on their attributes. Supervised learning and regression techniques are used to train and evaluate multiple models on the provided dataset, assess model accuracy using MAPE, and predict diamond prices for new, made-up attributes. The goal is to determine which model performs best in predicting diamond prices and to understand the influence of various features on the price.

#### Assumption
Columns x, y, z are length, width, depth (height) of the diamond.

## Table of Contents

Part 1: DataFrame Cleaning

Part 2: Evaluate the Features and Target variable

Part 3: Data Cleaning

Part 4: Feature Engineering

Part 5: Train-Test Split and Feature Scaling

Part 6: Modeling
- Linear Regression Model
- Random Forest Regressor Model
- KNN Model
- SVR Model

Part 7: Identify the best performing model to predict diamond prices

Part 8: Predict Diamond Prices with New Data

## Part 1: DataFrame Cleaning

Evaluate the DataFrame for missing values, empty rows and columns, and duplicate entries.

```python
import seaborn as sns
import pandas as pd
import warnings
import numpy as np

warnings.filterwarnings('ignore')
#warnings.filterwarnings('ignore', category=UserWarning)

df = pd.read_csv('sb_diamonds.csv', index_col=0)

print(df.shape)
df.head(3)
```

```
(53940, 10)
```

| | carat | cut | color | clarity | depth | table | price | x | y | z |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.23 | Ideal | E | SI2 | 61.5 | 55.0 | 326 | 3.95 | 3.98 | 2.43 |
| 1 | 0.21 | Premium | E | SI1 | 59.8 | 61.0 | 326 | 3.89 | 3.84 | 2.31 |
| 2 | 0.23 | Good | E | VS1 | 56.9 | 65.0 | 327 | 4.05 | 4.07 | 2.31 |


```python
# Check for missing values
df.isnull().sum()
```

```
carat      0
cut        0
color      0
clarity    0
depth      0
table      0
price      0
x          0
y          0
z          0
dtype: int64
```

```python
# Sum of missing values across all rows and columns in the entire dataframe. It does not indicate
# the number of empty rows directly, but rather the total number of missing entries in the dataframe.
df.isna().sum().sum()
```

```
np.int64(0)
```

```python
# Find empty rows (rows where all elements are NaN)
empty_rows = df[df.isna().all(axis=1)]
print('Empty Rows: ', empty_rows)

# Find empty columns (columns where all elements are NaN)
empty_columns = df.columns[df.isna().all()].tolist()
print("Empty Columns:", empty_columns)
```

```
Empty Rows:  Empty DataFrame
Columns: [carat, cut, color, clarity, depth, table, price, x, y, z]
Index: []
Empty Columns: []
```


```python
# Drop rows with missing values
df = df.dropna()
print(df.shape)

# Verify that there are no rows with missing values
df.isna().sum().sum()
```

```
(53940, 10)

np.int64(0)
```

```python
# Find duplicate rows
duplicate_rows = df[df.duplicated()]
print('Duplicate Rows:', duplicate_rows)
```

```
Duplicate Rows:        carat    cut color clarity  depth  table  price     x     y     z
1005    0.79  Ideal     G     SI1   62.3   57.0   2898  5.90  5.85  3.66
1006    0.79  Ideal     G     SI1   62.3   57.0   2898  5.90  5.85  3.66
1007    0.79  Ideal     G     SI1   62.3   57.0   2898  5.90  5.85  3.66
1008    0.79  Ideal     G     SI1   62.3   57.0   2898  5.90  5.85  3.66
2025    1.52   Good     E      I1   57.3   58.0   3105  7.53  7.42  4.28
...      ...    ...   ...     ...    ...    ...    ...   ...   ...   ...
47969   0.52  Ideal     D     VS2   61.8   55.0   1919  5.19  5.16  3.20
49326   0.51  Ideal     F    VVS2   61.2   56.0   2093  5.17  5.19  3.17
49557   0.71   Good     F     SI2   64.1   60.0   2130  0.00  0.00  0.00
50079   0.51  Ideal     F    VVS2   61.2   56.0   2203  5.19  5.17  3.17
52861   0.50   Fair     E     VS2   79.0   73.0   2579  5.21  5.18  4.09

[146 rows x 10 columns]
```


```python
# Drop duplicate rows
df = df.drop_duplicates()
print(df.shape)

# Verify that there are no duplicate rows
duplicate_rows = df[df.duplicated()]
print('Duplicate Rows:', duplicate_rows)
```

```
(53794, 10)
Duplicate Rows: Empty DataFrame
Columns: [carat, cut, color, clarity, depth, table, price, x, y, z]
Index: []
```


```python
print(df.head(3))
```

```
   carat      cut color clarity  depth  table  price     x     y     z
0   0.23    Ideal     E     SI2   61.5   55.0    326  3.95  3.98  2.43
1   0.21  Premium     E     SI1   59.8   61.0    326  3.89  3.84  2.31
2   0.23     Good     E     VS1   56.9   65.0    327  4.05  4.07  2.31
```


## Part 2: Evaluate the Features and Target variable

```python
# Evaluate the unique values in the 'price' column
df['price'].value_counts()
```

```
price
605      132
802      126
625      125
776      124
828      124
        ... 
14683      1
14680      1
14675      1
8812       1
9793       1
Name: count, Length: 11602, dtype: int64
```

```python
# Evaluate the feature 'carat'
df['carat'].value_counts()
```

```
carat
0.30    2596
1.01    2240
0.31    2238
0.70    1981
0.32    1827
        ... 
3.02       1
3.65       1
3.50       1
3.22       1
3.11       1
Name: count, Length: 273, dtype: int64
```

```python
# Evaluate the feature 'cut'
df['cut'].value_counts()
```

```
cut
Ideal        21488
Premium      13748
Very Good    12069
Good          4891
Fair          1598
Name: count, dtype: int64
```

```python
# Evaluate the feature 'color'
df.color.value_counts()
```

```
color
G    11262
E     9776
F     9520
H     8272
D     6755
I     5407
J     2802
Name: count, dtype: int64
```

```python
# Evaluate the feature 'clarity'
df.clarity.value_counts()
```

```
clarity
SI1     13032
VS2     12229
SI2      9150
VS1      8156
VVS2     5056
VVS1     3647
IF       1784
I1        740
Name: count, dtype: int64
```

```python
# Evaluate the feature 'depth'
df.depth.value_counts()
```

```
depth
62.0    2233
61.9    2160
61.8    2069
62.2    2033
62.1    2011
        ... 
71.3       1
44.0       1
53.0       1
53.1       1
54.7       1
Name: count, Length: 184, dtype: int64
```

```python
# Evaluate the feature 'table'
df.table.value_counts()
```

```
table
56.0    9851
57.0    9695
58.0    8352
59.0    6562
55.0    6242
        ... 
51.6       1
63.5       1
43.0       1
62.4       1
61.6       1
Name: count, Length: 127, dtype: int64
```

```python
# Evaluate the feature 'x'
df.x.value_counts()
```

```
x
4.37     444
4.34     436
4.33     428
4.38     426
4.32     422
        ... 
10.74      1
9.36       1
9.06       1
8.89       1
9.05       1
Name: count, Length: 554, dtype: int64
```

```python
# Evaluate the feature 'y'
df.y.value_counts()
```

```
y
4.34     436
4.37     432
4.35     424
4.33     420
4.32     411
        ... 
8.89       1
10.16      1
9.46       1
9.63       1
31.80      1
Name: count, Length: 552, dtype: int64
```

```python
# Evaluate the feature 'z'
df.z.value_counts()
```

```
z
2.70     761
2.69     745
2.71     733
2.68     730
2.72     691
        ... 
5.86       1
5.66       1
5.72       1
5.73       1
31.80      1
Name: count, Length: 375, dtype: int64
```

## Part 3: Data Cleaning - Features and Target variable

*Conversion of a pandas Series into a NumPy array.*  
scikit-learn accepts array-like inputs, including pandas Series and DataFrames, and converts them to NumPy arrays internally.  
Converting the target to a NumPy array explicitly makes its type unambiguous and keeps it compatible with libraries that require plain arrays.

```python
# No cleaning required for dataset values
```

```python
# Assign the target variable
target = df.price
print(target.head(3))
target = np.array(target)
print(type(target))
print(type(target[0]))
```

```
0    326
1    326
2    327
Name: price, dtype: int64
<class 'numpy.ndarray'>
<class 'numpy.int64'>
```


```python
# Assign the features
df_features = df.filter(['carat', 'cut', 'color', 'clarity', 'depth', 'table', 'x', 'y', 'z'])
df_features
```

| | carat | cut | color | clarity | depth | table | x | y | z |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.23 | Ideal | E | SI2 | 61.5 | 55.0 | 3.95 | 3.98 | 2.43 |
| 1 | 0.21 | Premium | E | SI1 | 59.8 | 61.0 | 3.89 | 3.84 | 2.31 |
| 2 | 0.23 | Good | E | VS1 | 56.9 | 65.0 | 4.05 | 4.07 | 2.31 |
| 3 | 0.29 | Premium | I | VS2 | 62.4 | 58.0 | 4.20 | 4.23 | 2.63 |
| 4 | 0.31 | Good | J | SI2 | 63.3 | 58.0 | 4.34 | 4.35 | 2.75 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 53935 | 0.72 | Ideal | D | SI1 | 60.8 | 57.0 | 5.75 | 5.76 | 3.50 |
| 53936 | 0.72 | Good | D | SI1 | 63.1 | 55.0 | 5.69 | 5.75 | 3.61 |
| 53937 | 0.70 | Very Good | D | SI1 | 62.8 | 60.0 | 5.66 | 5.68 | 3.56 |
| 53938 | 0.86 | Premium | H | SI2 | 61.0 | 58.0 | 6.15 | 6.12 | 3.74 |


```python
# Note: assigning a NumPy array back to a DataFrame column stores it as a pandas
# Series again, so this cell leaves the column types unchanged. The features are
# converted to a NumPy array later by StandardScaler.fit_transform (Part 5).
df_features.carat = np.array(df_features.carat)
df_features.cut = np.array(df_features.cut)
df_features.color = np.array(df_features.color)
df_features.clarity = np.array(df_features.clarity)
df_features.depth = np.array(df_features.depth)
df_features.table = np.array(df_features.table)
df_features.x = np.array(df_features.x)
df_features.y = np.array(df_features.y)
df_features.z = np.array(df_features.z)
```

## Part 4: Feature Engineering

Transform nominal categorical data to numerical values using pandas.get_dummies, drop the categorical column, and add the newly created numerical columns to the features DataFrame.

Transform ordinal categorical data to numerical values using LabelEncoder.

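```python
# Convert categorical 'color' to numerical values using get_dummies
# (drop_first=True drops the first category, 'D', which becomes the all-zeros baseline)
df_features_color_num = pd.get_dummies(df_features.color, dtype=int, drop_first=True)
df_features_color_num
```

| | E | F | G | H | I | J |
|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 1 | 1 | 0 | 0 | 0 | 0 | 0 |
| 2 | 1 | 0 | 0 | 0 | 0 | 0 |
| 3 | 0 | 0 | 0 | 0 | 1 | 0 |
| 4 | 0 | 0 | 0 | 0 | 0 | 1 |
| ... | ... | ... | ... | ... | ... | ... |
| 53935 | 0 | 0 | 0 | 0 | 0 | 0 |
| 53936 | 0 | 0 | 0 | 0 | 0 | 0 |
| 53937 | 0 | 0 | 0 | 0 | 0 | 0 |
| 53938 | 0 | 0 | 0 | 1 | 0 | 0 |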


```python
# Encode the 'cut' feature using LabelEncoder

from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()

print(df_features.cut.head(3))
print('\n')

df_features.cut = le.fit_transform(df_features.cut)
print(df_features.cut.value_counts())
print('\n')

# Visual mapping of 'cut' values to integers
mapping = dict(zip(le.classes_, range(len(le.classes_))))
print(mapping)
```

```
0      Ideal
1    Premium
2       Good
Name: cut, dtype: object


cut
2    21488
3    13748
4    12069
1     4891
0     1598
Name: count, dtype: int64


{'Fair': 0, 'Good': 1, 'Ideal': 2, 'Premium': 3, 'Very Good': 4}
```


```python
# Encode the 'clarity' feature using LabelEncoder
print(df_features.clarity.head(3))
print('\n')

df_features.clarity = le.fit_transform(df_features.clarity)
print(df_features.clarity.value_counts())
print('\n')

# Visual mapping of 'clarity' values to integers
mapping = dict(zip(le.classes_, range(len(le.classes_))))
print(mapping)
```

```
0    SI2
1    SI1
2    VS1
Name: clarity, dtype: object


clarity
2    13032
5    12229
3     9150
4     8156
7     5056
6     3647
1     1784
0      740
Name: count, dtype: int64


{'I1': 0, 'IF': 1, 'SI1': 2, 'SI2': 3, 'VS1': 4, 'VS2': 5, 'VVS1': 6, 'VVS2': 7}
```


```python
# Drop the categorical 'color' feature and replace it with the numerical df_features_color_num
df_features = df_features.drop('color', axis=1)
df_features = pd.concat([df_features, df_features_color_num], axis=1)
print(df_features)
```

```
       carat  cut  clarity  depth  table     x     y     z  E  F  G  H  I  J
0       0.23    2        3   61.5   55.0  3.95  3.98  2.43  1  0  0  0  0  0
1       0.21    3        2   59.8   61.0  3.89  3.84  2.31  1  0  0  0  0  0
2       0.23    1        4   56.9   65.0  4.05  4.07  2.31  1  0  0  0  0  0
3       0.29    3        5   62.4   58.0  4.20  4.23  2.63  0  0  0  0  1  0
4       0.31    1        3   63.3   58.0  4.34  4.35  2.75  0  0  0  0  0  1
...      ...  ...      ...    ...    ...   ...   ...   ... .. .. .. .. .. ..
53935   0.72    2        2   60.8   57.0  5.75  5.76  3.50  0  0  0  0  0  0
53936   0.72    1        2   63.1   55.0  5.69  5.75  3.61  0  0  0  0  0  0
53937   0.70    4        2   62.8   60.0  5.66  5.68  3.56  0  0  0  0  0  0
53938   0.86    3        3   61.0   58.0  6.15  6.12  3.74  0  0  0  1  0  0
53939   0.75    2        3   62.2   55.0  5.83  5.87  3.64  0  0  0  0  0  0

[53794 rows x 14 columns]
```


## Part 5: Train-Test Split and Feature Scaling

Assign features and target.  
Split the dataset into 80% training and 20% testing.  
Fit a StandardScaler on the training features only.  
Apply the fitted transformation to both the training and test features.
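Standardization replaces each value x with z = (x - mean) / std, where mean and std come from the training data only (StandardScaler uses the population standard deviation, ddof=0). A small sketch on made-up feature values checks that sc.transform matches this formula:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up training and test features (two columns, e.g. carat and depth)
X_train = np.array([[0.23, 61.5], [0.21, 59.8], [0.29, 62.4], [0.31, 63.3]])
X_test = np.array([[0.25, 60.0]])

sc = StandardScaler()
X_train_std = sc.fit_transform(X_train)  # learns mean_ and scale_ from training data only
X_test_std = sc.transform(X_test)        # reuses the training mean_ and scale_

# Same result by hand: z = (x - mean) / std, with the population std (ddof=0)
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)
print(np.allclose(X_test_std, (X_test - mean) / std))

# The scaled training columns have mean 0 and standard deviation 1
print(X_train_std.mean(axis=0).round(6), X_train_std.std(axis=0).round(6))
```

Fitting on the training split alone keeps test-set statistics from leaking into the model; the test features are only transformed.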

```python
# Assign features and the target variable, and split the data into training and test sets

X = df_features
y = target

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```

```python
# Standardize the features: fit the StandardScaler on the training data, then transform both training and test data

from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
print(X_train)
print('\n')
print(X_test)
```

```
[[-0.94345735 -0.54238513  0.67731271 ... -0.42625798 -0.33455838
  -0.23421566]
 [ 2.96422154 -0.54238513 -0.48265118 ... -0.42625798 -0.33455838
  -0.23421566]
 [-0.56529488 -0.54238513  0.67731271 ... -0.42625798 -0.33455838
  -0.23421566]
 ...
 [-1.00648443 -0.54238513 -1.06263312 ... -0.42625798 -0.33455838
   4.26956936]
 [ 0.2120391   0.43196552 -1.06263312 ... -0.42625798 -0.33455838
   4.26956936]
 [ 0.44313838  0.43196552  0.09733077 ... -0.42625798 -0.33455838
  -0.23421566]]


[[-0.1871324  -2.49108644  0.67731271 ... -0.42625798 -0.33455838
   4.26956936]
 [ 0.2120391   1.40631617 -0.48265118 ... -0.42625798 -0.33455838
  -0.23421566]
 [-0.48125877 -0.54238513  0.09733077 ... -0.42625798 -0.33455838
  -0.23421566]
 ...
 [-1.00648443 -0.54238513  1.25729466 ... -0.42625798 -0.33455838
  -0.23421566]
 [ 1.55661678  0.43196552  0.67731271 ... -0.42625798 -0.33455838
  -0.23421566]
 [-0.94345735  0.43196552  1.25729466 ... -0.42625798 -0.33455838
  -0.23421566]]
```


## Part 6: Modeling

Linear Regression  
Random Forest Regressor  
KNN Model  
SVR

```python
# Train Linear Regression, RandomForestRegressor, KNeighborsRegressor, and SVR models, then make predictions

from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR

# Linear Regression
lr_model = LinearRegression()
lr_model.fit(X_train, y_train)

# RandomForestRegressor
rf_model = RandomForestRegressor(random_state=42)
rf_model.fit(X_train, y_train)

# KNeighborsRegressor (n_neighbors=5 is the default; stated explicitly as the assignment requires)
knn_model = KNeighborsRegressor(n_neighbors=5)
knn_model.fit(X_train, y_train)

# SVR
svr_model = SVR()
svr_model.fit(X_train, y_train)

# Prediction using Linear Regression
y_pred_lr = lr_model.predict(X_test)
# y_pred_lr = np.maximum(0, y_pred_lr)  # This only corrects the output after the prediction, but it doesn't address
# the underlying issue with the model training and feature engineering.
print(f'Predicted Prices (Linear Regression): {y_pred_lr[:10]}')
print('\n')

# Prediction using RandomForestRegressor
y_pred_rf = rf_model.predict(X_test)
# y_pred_rf = np.maximum(0, y_pred_rf)
print(f'Predicted Prices (RandomForest): {y_pred_rf[:10]}')
print('\n')

# Prediction using KNeighborsRegressor
y_pred_knn = knn_model.predict(X_test)
# y_pred_knn = np.maximum(0, y_pred_knn)
print(f'Predicted Prices (KNeighbors): {y_pred_knn[:10]}')
print('\n')

# Prediction using SVR
y_pred_svr = svr_model.predict(X_test)
# y_pred_svr = np.maximum(0, y_pred_svr)
print(f'Predicted Prices (SVR): {y_pred_svr[:10]}')
print('\n')

# Print the actual prices of the test set
print(f'Actual Prices: {y_test[:10]}')
```

```
Predicted Prices (Linear Regression): [1387.58056255 4721.46090486 2217.3873344  1800.45752497 5873.31106769
   73.83998816  465.33499263 1977.01980533 1026.68609899 4575.49973261]


Predicted Prices (RandomForest): [1682.78 3696.55 1868.14 1630.1  5284.52  633.48  470.58 2315.18 1058.54
 4801.71]


Predicted Prices (KNeighbors): [1417.2 3952.2 1824.6 1646.4 5240.   668.4  549.6 2200.8 1764.4 4674. ]


Predicted Prices (SVR): [2984.22347221 3664.41342034 1885.32636926 1717.99141832 4447.08151184
  847.76012863  321.75187697 2736.31189265 1781.4912965  4238.27023873]


Actual Prices: [1435 3584 1851 1590 5690  596  492 2063  970 4796]
```


```python
# Calculate MAPE for each model

from sklearn.metrics import mean_absolute_percentage_error

# Calculate MAPE for Linear Regression
mape_lr = mean_absolute_percentage_error(y_test, y_pred_lr)
print(f'Linear Regression MAPE: {mape_lr}')

# Calculate MAPE for RandomForestRegressor
mape_rf = mean_absolute_percentage_error(y_test, y_pred_rf)
print(f'RandomForestRegressor MAPE: {mape_rf}')

# Calculate MAPE for KNeighborsRegressor
mape_knn = mean_absolute_percentage_error(y_test, y_pred_knn)
print(f'KNeighborsRegressor MAPE: {mape_knn}')

# Calculate MAPE for SVR
mape_svr = mean_absolute_percentage_error(y_test, y_pred_svr)
print(f'SVR MAPE: {mape_svr}')
```

```
Linear Regression MAPE: 0.36533548596427373
RandomForestRegressor MAPE: 0.06696369185391982
KNeighborsRegressor MAPE: 0.11579333475616
SVR MAPE: 0.37367955328281194
```
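MAPE is the average of |actual - predicted| / |actual| over all test items. sklearn reports it as a fraction, so a MAPE of 0.067 means predictions are off by about 6.7% on average. A plain-Python sketch of the formula, applied to the first three actual and Random Forest predicted prices printed above:

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error as a fraction (multiply by 100 for %)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError('inputs must be non-empty and the same length')
    # Relative error of each prediction, averaged over all items
    return sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)

# First three actual and Random Forest predicted prices from the output above
actual = [1435, 3584, 1851]
predicted = [1682.78, 3696.55, 1868.14]
print(mape(actual, predicted))
```

This sketch raises ZeroDivisionError if an actual value is 0. sklearn's mean_absolute_percentage_error avoids that by dividing by a tiny epsilon in that case, which is harmless here because every diamond price is positive.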


## Part 7: Identify the best performing model to predict diamond prices

```python
# Best model based on MAPE
mape_values_dict = {'Linear Regression': mape_lr, 'RandomForestRegressor': mape_rf, 'KNeighborsRegressor': mape_knn, 'SVR': mape_svr}
print(f'Best Model is: {min(mape_values_dict, key=mape_values_dict.get)} with MAPE: {min(mape_values_dict.values())}')
```

```
Best Model is: RandomForestRegressor with MAPE: 0.06696369185391982
```


## Part 8: Predict Diamond Prices with New Data

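```python
# Column order must match the training features:
# carat  cut  clarity  depth  table     x     y     z  E  F  G  H  I  J
# cut=2 is 'Ideal' and clarity=3 is 'SI2' (LabelEncoder mappings above); E=1 means color 'E'
new_data = pd.DataFrame([[0.33, 2, 3, 61.5, 55.0, 3, 3, 2, 1, 0, 0, 0, 0, 0]], columns=X.columns)
sc_new_data = sc.transform(new_data)
print(rf_model.predict(sc_new_data))
print(knn_model.predict(sc_new_data))
print(lr_model.predict(sc_new_data))
print(svr_model.predict(sc_new_data))
```

```
[396.15]
[467.6]
[2279.08009674]
[1189.47392309]
```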
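Typing the encoded integers by hand is error-prone, especially since the notebook re-fits a single `le` for both cut and clarity, so `le` only remembers the clarity mapping afterwards. A hedged sketch of one alternative: keep one fitted encoder per column and turn a made-up diamond given with text labels into the same numeric row. The `encoders` dict and `encode_diamond` function are hypothetical helpers, not part of the original notebook; the category lists and column order are taken from the outputs above.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical setup: one fitted encoder per ordinal column, kept in a dict
encoders = {
    'cut': LabelEncoder().fit(['Fair', 'Good', 'Ideal', 'Premium', 'Very Good']),
    'clarity': LabelEncoder().fit(['I1', 'IF', 'SI1', 'SI2', 'VS1', 'VS2', 'VVS1', 'VVS2']),
}
dummy_columns = ['E', 'F', 'G', 'H', 'I', 'J']  # get_dummies(..., drop_first=True) columns
feature_order = ['carat', 'cut', 'clarity', 'depth', 'table', 'x', 'y', 'z'] + dummy_columns

def encode_diamond(raw):
    """Hypothetical helper: turn one diamond with text labels into a numeric feature row."""
    row = {k: raw[k] for k in ['carat', 'depth', 'table', 'x', 'y', 'z']}
    for col, enc in encoders.items():
        row[col] = int(enc.transform([raw[col]])[0])
    for c in dummy_columns:
        row[c] = int(raw['color'] == c)  # color 'D' -> all zeros (dropped baseline)
    return pd.DataFrame([row], columns=feature_order)

# Same made-up diamond as in the cell above, written with its text labels
new_diamond = {'carat': 0.33, 'cut': 'Ideal', 'color': 'E', 'clarity': 'SI2',
               'depth': 61.5, 'table': 55.0, 'x': 3, 'y': 3, 'z': 2}
encoded = encode_diamond(new_diamond)
print(encoded)
```

The resulting row matches the hand-typed values used above and could then be passed through `sc.transform` before calling a model's `predict`.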
