#Context
Airbnb, Inc. is an American vacation rental online marketplace company based in San Francisco, California, United States. Airbnb offers arrangements for lodging, primarily homestays, or tourism experiences. The company does not own any of the real estate listings, nor does it host events; it acts as a broker, receiving commissions from each booking. [Reference](<https://en.wikipedia.org/wiki/Airbnb>)

Since 2008, guests and hosts have used Airbnb to travel in a more unique, personalized way.

#Objective
Imagine you are a data scientist asked to estimate the price of a lodging or homestay from the attributes mentioned in its listing. What is a listing? A listing can include a written description, photographs with captions, and a user profile where potential guests can get to know a bit about the host.

You are given the listings for Amsterdam, one of the most popular cities in Europe. Your job is to build a machine learning model that automatically predicts the price of a lodging or homestay.

#Acknowledgement
This dataset is part of Inside Airbnb, and the original source can be found [here](<http://insideairbnb.com/get-the-data.html>).


In [1]:
!pip install lazypredict



In [2]:
!pip install xgboost



In [3]:
!pip install catboost



In [4]:
!pip install lightgbm



In [5]:
!pip install https://github.com/pandas-profiling/pandas-profiling/archive/master.zip

Collecting https://github.com/pandas-profiling/pandas-profiling/archive/master.zip
  Using cached https://github.com/pandas-profiling/pandas-profiling/archive/master.zip
Building wheels for collected packages: pandas-profiling
  Building wheel for pandas-profiling (setup.py) ... done
  Created wheel for pandas-profiling: filename=pandas_profiling-2.9.0rc1-py2.py3-none-any.whl size=258106 sha256=aa3bc20ce711e75923ed68b08af3cfdadcbee15a5fd7578dd8f6da2749ce295d
  Stored in directory: /tmp/pip-ephem-wheel-cache-pck5j5y7/wheels/56/c2/dd/8d945b0443c35df7d5f62fa9e9ae105a2d8b286302b92e0109
Successfully built pandas-profiling


In [6]:
import numpy as np
import pandas as pd
from pandas_profiling import ProfileReport
#import libraries for visualization
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
pd.options.display.max_columns = 100
# libraries for machine learning
from sklearn.model_selection import train_test_split 
from sklearn.metrics import mean_absolute_error
from sklearn.metrics import mean_squared_error
from sklearn.metrics import r2_score
import lazypredict
from lazypredict.Supervised import LazyRegressor



# New Section

In [7]:
amsterdam_airbnb = pd.read_csv('airbnb_listing_train.csv')

In [8]:
amsterdam_airbnb_profile = ProfileReport(amsterdam_airbnb)
amsterdam_airbnb_profile



In [9]:
amsterdam_airbnb.columns

Index(['id', 'name', 'host_id', 'host_name', 'neighbourhood_group',
       'neighbourhood', 'latitude', 'longitude', 'room_type', 'minimum_nights',
       'number_of_reviews', 'last_review', 'reviews_per_month',
       'calculated_host_listings_count', 'availability_365', 'price'],
      dtype='object')

#Feature Description

|Column Name|Description|
|:----|:----|
|id|ID of the listing (lodge or home)|
|name|Name/description of the listing|
|host_id|ID of the host|
|host_name|Name of the host|
|neighbourhood|Name of the neighbourhood|
|neighbourhood_group|Group the neighbourhood belongs to|
|latitude|Latitude of the location|
|longitude|Longitude of the location|
|room_type|Type of room offered, for example a private room or an entire home|
|minimum_nights|Minimum number of nights a guest must book|
|number_of_reviews|Number of reviews the listing has received|
|last_review|Date of the most recent review of the listing|
|reviews_per_month|Average number of reviews per month|
|calculated_host_listings_count|Number of listings the host has|
|availability_365|Number of days (out of 365) on which the listing is available|
|price|Price of the listing in USD (the target variable)|


In [10]:
# Drop name and host_name: id and host_id already identify the listing and the host.
# Drop neighbourhood_group: every value in this column is missing.
amsterdam_airbnb.drop(['name', 'host_name', 'neighbourhood_group'], axis=1, inplace=True)

In [11]:
amsterdam_airbnb.columns

Index(['id', 'host_id', 'neighbourhood', 'latitude', 'longitude', 'room_type',
       'minimum_nights', 'number_of_reviews', 'last_review',
       'reviews_per_month', 'calculated_host_listings_count',
       'availability_365', 'price'],
      dtype='object')
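
The decision to drop `neighbourhood_group` rests on the column being entirely empty. A minimal sketch of how that could be checked before dropping anything, using a made-up toy frame (`toy`, `all_null_cols` and `cleaned` are illustrative names, not part of the notebook):

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the listings data (values are made up).
toy = pd.DataFrame({
    "id": [1, 2, 3],
    "neighbourhood_group": [np.nan, np.nan, np.nan],
    "last_review": ["15-02-2020", None, "16-03-2020"],
})

# Columns in which every single value is missing carry no information.
all_null_cols = toy.columns[toy.isna().all()].tolist()
print(all_null_cols)

# Drop only those columns, leaving the original frame untouched.
cleaned = toy.drop(columns=all_null_cols)
print(cleaned.columns.tolist())
```

Partially missing columns such as `last_review` are not caught by this check and need a separate decision.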

In [12]:
amsterdam_airbnb.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12901 entries, 0 to 12900
Data columns (total 13 columns):
 #   Column                          Non-Null Count  Dtype  
---  ------                          --------------  -----  
 0   id                              12901 non-null  int64  
 1   host_id                         12901 non-null  int64  
 2   neighbourhood                   12901 non-null  object 
 3   latitude                        12901 non-null  float64
 4   longitude                       12901 non-null  float64
 5   room_type                       12901 non-null  object 
 6   minimum_nights                  12901 non-null  int64  
 7   number_of_reviews               12901 non-null  int64  
 8   last_review                     11305 non-null  object 
 9   reviews_per_month               11305 non-null  float64
 10  calculated_host_listings_count  12901 non-null  int64  
 11  availability_365                12901 non-null  int64  
 12  price                           

In [13]:
amsterdam_airbnb.head()

| |id|host_id|neighbourhood|latitude|longitude|room_type|minimum_nights|number_of_reviews|last_review|reviews_per_month|calculated_host_listings_count|availability_365|price|
|:----|:----|:----|:----|:----|:----|:----|:----|:----|:----|:----|:----|:----|:----|
|0|11602914|3123809|De Pijp - Rivierenbuurt|52.35|4.9|Entire home/apt|3|13|15-02-2020|0.27|1|0|220|
|1|13289321|10259430|Oud-Oost|52.36|4.92|Entire home/apt|4|14|19-06-2019|0.29|1|0|110|
|2|40779315|224969266|Centrum-West|52.38|4.9|Entire home/apt|2|9|16-03-2020|1.65|1|7|100|
|3|7820311|693472|Westerpark|52.38|4.87|Entire home/apt|3|42|17-02-2020|0.72|1|0|130|
|4|27346603|41888346|Westerpark|52.38|4.87|Private room|2|89|26-02-2020|4.02|1|24|90|


In [14]:
amsterdam_airbnb.describe()

| |id|host_id|latitude|longitude|minimum_nights|number_of_reviews|reviews_per_month|calculated_host_listings_count|availability_365|price|
|:----|:----|:----|:----|:----|:----|:----|:----|:----|:----|:----|
|count|12901.0|12901.0|12901.0|12901.0|12901.0|12901.0|11305.0|12901.0|12901.0|12901.0|
|mean|19849225.6|63217002.85|52.37|4.89|3.54|24.8|0.75|2.14|61.39|166.96|
|std|12162414.67|80108144.86|0.02|0.04|15.15|53.41|1.26|6.0|107.99|234.79|
|min|20168.0|3592.0|52.29|4.76|1.0|0.0|0.01|1.0|0.0|6.0|
|25%|9869642.0|8948269.0|52.36|4.86|2.0|2.0|0.18|1.0|0.0|99.0|
|50%|18749386.0|27116014.0|52.36|4.89|2.0|9.0|0.38|1.0|0.0|135.0|
|75%|29142324.0|83376861.0|52.38|4.91|3.0|24.0|0.77|1.0|87.0|190.0|
|max|43709001.0|349017537.0|52.43|5.02|1001.0|843.0|50.0|78.0|365.0|9000.0|


In [15]:
amsterdam_airbnb.isnull().sum()

id                                   0
host_id                              0
neighbourhood                        0
latitude                             0
longitude                            0
room_type                            0
minimum_nights                       0
number_of_reviews                    0
last_review                       1596
reviews_per_month                 1596
calculated_host_listings_count       0
availability_365                     0
price                                0
dtype: int64

In [16]:
# Drop last_review and reviews_per_month: number_of_reviews is kept as the more
# relevant review feature, and these two columns each have 1596 missing values.
amsterdam_airbnb.drop(['last_review','reviews_per_month'], axis=1, inplace =True)
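
Dropping the column is one option. Another, sketched here on toy values, is to fill `reviews_per_month` with 0 wherever the listing has no reviews at all. This assumes the missing rates belong to listings with `number_of_reviews == 0`, which the notebook does not verify, so treat it as something to check rather than a fact about the data:

```python
import numpy as np
import pandas as pd

# Toy values; the real frame is amsterdam_airbnb.
toy = pd.DataFrame({
    "number_of_reviews": [13, 0, 9],
    "reviews_per_month": [0.27, np.nan, 1.65],
})

# Rows where the rate is missing AND the listing has never been reviewed.
no_reviews = toy["reviews_per_month"].isna() & (toy["number_of_reviews"] == 0)

# Fill only those rows with 0; any other missing value would stay missing.
toy.loc[no_reviews, "reviews_per_month"] = 0.0
print(toy)
```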

In [17]:
amsterdam_airbnb.columns

Index(['id', 'host_id', 'neighbourhood', 'latitude', 'longitude', 'room_type',
       'minimum_nights', 'number_of_reviews', 'calculated_host_listings_count',
       'availability_365', 'price'],
      dtype='object')

In [18]:
amsterdam_airbnb.isnull().sum()

id                                0
host_id                           0
neighbourhood                     0
latitude                          0
longitude                         0
room_type                         0
minimum_nights                    0
number_of_reviews                 0
calculated_host_listings_count    0
availability_365                  0
price                             0
dtype: int64

In [19]:
amsterdam_airbnb.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12901 entries, 0 to 12900
Data columns (total 11 columns):
 #   Column                          Non-Null Count  Dtype  
---  ------                          --------------  -----  
 0   id                              12901 non-null  int64  
 1   host_id                         12901 non-null  int64  
 2   neighbourhood                   12901 non-null  object 
 3   latitude                        12901 non-null  float64
 4   longitude                       12901 non-null  float64
 5   room_type                       12901 non-null  object 
 6   minimum_nights                  12901 non-null  int64  
 7   number_of_reviews               12901 non-null  int64  
 8   calculated_host_listings_count  12901 non-null  int64  
 9   availability_365                12901 non-null  int64  
 10  price                           12901 non-null  int64  
dtypes: float64(2), int64(7), object(2)
memory usage: 1.1+ MB


In [20]:
amsterdam_airbnb['room_type'].value_counts()

Entire home/apt    10064
Private room        2622
Hotel room           174
Shared room           41
Name: room_type, dtype: int64

In [21]:
#convert to category dtype
amsterdam_airbnb['neighbourhood'] = amsterdam_airbnb['neighbourhood'].astype('category')
amsterdam_airbnb.dtypes

id                                   int64
host_id                              int64
neighbourhood                     category
latitude                           float64
longitude                          float64
room_type                           object
minimum_nights                       int64
number_of_reviews                    int64
calculated_host_listings_count       int64
availability_365                     int64
price                                int64
dtype: object

In [22]:
# Use .cat.codes to replace the column with its integer-encoded values
amsterdam_airbnb['neighbourhood'] = amsterdam_airbnb['neighbourhood'].cat.codes

In [23]:
amsterdam_airbnb['neighbourhood'].value_counts()

7     2209
8     1600
5     1429
4     1097
20     958
21     924
17     831
2      730
14     644
19     381
16     376
11     321
18     269
13     257
12     171
3      155
10     144
6       88
15      83
9       80
0       78
1       76
Name: neighbourhood, dtype: int64
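
After this step the neighbourhood names are gone and only integer codes remain. A small sketch, on toy names, of how the code-to-name mapping could be kept for later interpretation (`code_to_name` is an illustrative variable, not something the notebook defines):

```python
import pandas as pd

# Toy neighbourhood names; the real column is amsterdam_airbnb['neighbourhood'].
names = pd.Series(["Westerpark", "Oud-Oost", "Centrum-West", "Westerpark"])
as_cat = names.astype("category")

# Categories are stored in sorted order, and each integer code is simply
# the position of the name in that sorted list.
code_to_name = dict(enumerate(as_cat.cat.categories))
codes = as_cat.cat.codes

print(code_to_name)
print(codes.tolist())
```

The mapping has to be captured before the column is overwritten with its codes, because the integers alone cannot be turned back into names.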

In [24]:
amsterdam_airbnb.dtypes

id                                  int64
host_id                             int64
neighbourhood                        int8
latitude                          float64
longitude                         float64
room_type                          object
minimum_nights                      int64
number_of_reviews                   int64
calculated_host_listings_count      int64
availability_365                    int64
price                               int64
dtype: object

In [25]:
#convert categorical variable into dummy/indicator variables for ML
amsterdam_airbnb = pd.get_dummies(amsterdam_airbnb, columns=['room_type'])

In [26]:
amsterdam_airbnb.columns

Index(['id', 'host_id', 'neighbourhood', 'latitude', 'longitude',
       'minimum_nights', 'number_of_reviews', 'calculated_host_listings_count',
       'availability_365', 'price', 'room_type_Entire home/apt',
       'room_type_Hotel room', 'room_type_Private room',
       'room_type_Shared room'],
      dtype='object')

In [27]:
amsterdam_airbnb.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12901 entries, 0 to 12900
Data columns (total 14 columns):
 #   Column                          Non-Null Count  Dtype  
---  ------                          --------------  -----  
 0   id                              12901 non-null  int64  
 1   host_id                         12901 non-null  int64  
 2   neighbourhood                   12901 non-null  int8   
 3   latitude                        12901 non-null  float64
 4   longitude                       12901 non-null  float64
 5   minimum_nights                  12901 non-null  int64  
 6   number_of_reviews               12901 non-null  int64  
 7   calculated_host_listings_count  12901 non-null  int64  
 8   availability_365                12901 non-null  int64  
 9   price                           12901 non-null  int64  
 10  room_type_Entire home/apt       12901 non-null  uint8  
 11  room_type_Hotel room            12901 non-null  uint8  
 12  room_type_Private room          

In [28]:
# Collapse longitude and latitude into a single feature: the great-circle distance
# from the point (0° N, 0° E). The earth is assumed to be a sphere, not an ellipsoid.
R = 6371000  # approximate mean radius of the earth (in m)
# longitudes and latitudes must be in radians
lon, lat = map(np.radians, [amsterdam_airbnb['longitude'], amsterdam_airbnb['latitude']])

# 'Single-point' haversine formula (distance from the origin of the lat/lon grid)
a = np.sin(lat/2)**2 + np.cos(lat) * np.sin(lon/2)**2
distance = 2 * R * np.arcsin(np.sqrt(a))

# Create the new distance column and drop the longitude and latitude columns
amsterdam_airbnb['distance'] = distance
amsterdam_airbnb.drop(['latitude', 'longitude'], axis=1, inplace=True)

In [29]:
amsterdam_airbnb.columns

Index(['id', 'host_id', 'neighbourhood', 'minimum_nights', 'number_of_reviews',
       'calculated_host_listings_count', 'availability_365', 'price',
       'room_type_Entire home/apt', 'room_type_Hotel room',
       'room_type_Private room', 'room_type_Shared room', 'distance'],
      dtype='object')
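
To see what the `distance` column actually measures, here is a minimal sketch of the general two-point haversine formula in plain Python. The function name `haversine_m` and the sample coordinates (the rounded values from the first row of the `head()` output) are illustrative. With the reference point fixed at (0, 0), the general formula reduces to the single-point expression used in the cell above:

```python
import math

EARTH_RADIUS_M = 6371000  # same approximate mean radius as in the notebook

def haversine_m(lat1, lon1, lat2, lon2, radius=EARTH_RADIUS_M):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * radius * math.asin(math.sqrt(a))

# Rounded coordinates of one Amsterdam listing (illustrative).
lat, lon = 52.35, 4.9

# General formula with the reference point at (0, 0).
general = haversine_m(0.0, 0.0, lat, lon)

# Single-point formula as written in the notebook cell.
a = (math.sin(math.radians(lat) / 2) ** 2
     + math.cos(math.radians(lat)) * math.sin(math.radians(lon) / 2) ** 2)
single_point = 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

print(general, single_point)
```

Both values come out at roughly 5.84 million metres, the same order of magnitude as the `distance` values in the notebook. Since every listing is in the same city, the feature varies only slightly from row to row. The same function accepts any other reference point if a different one is preferred.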

In [30]:
amsterdam_airbnb.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12901 entries, 0 to 12900
Data columns (total 13 columns):
 #   Column                          Non-Null Count  Dtype  
---  ------                          --------------  -----  
 0   id                              12901 non-null  int64  
 1   host_id                         12901 non-null  int64  
 2   neighbourhood                   12901 non-null  int8   
 3   minimum_nights                  12901 non-null  int64  
 4   number_of_reviews               12901 non-null  int64  
 5   calculated_host_listings_count  12901 non-null  int64  
 6   availability_365                12901 non-null  int64  
 7   price                           12901 non-null  int64  
 8   room_type_Entire home/apt       12901 non-null  uint8  
 9   room_type_Hotel room            12901 non-null  uint8  
 10  room_type_Private room          12901 non-null  uint8  
 11  room_type_Shared room           12901 non-null  uint8  
 12  distance                        

In [31]:
amsterdam_airbnb.groupby('price').mean()

|price|id|host_id|neighbourhood|minimum_nights|number_of_reviews|calculated_host_listings_count|availability_365|room_type_Entire home/apt|room_type_Hotel room|room_type_Private room|room_type_Shared room|distance|
|:----|:----|:----|:----|:----|:----|:----|:----|:----|:----|:----|:----|:----|
|6|6764212.00|2793078.00|8.00|2.00|14.00|1.00|0.00|1.00|0.00|0.00|0.00|5839451.43|
|10|40588320.00|313839059.00|16.00|1.00|5.00|1.00|195.00|0.00|0.00|1.00|0.00|5843176.75|
|12|4009850.00|1351304.00|16.00|2.00|11.00|1.00|0.00|1.00|0.00|0.00|0.00|5842757.90|
|19|11147683.00|34203893.00|2.00|6.00|4.00|1.00|0.00|0.00|0.00|1.00|0.00|5841069.91|
|20|27910574.75|190250470.62|13.25|7.12|16.62|7.88|311.12|0.12|0.62|0.25|0.00|5841855.19|
|...|...|...|...|...|...|...|...|...|...|...|...|...|
|5555|32355945.00|1464510.00|17.00|1.00|2.00|78.00|0.00|1.00|0.00|0.00|0.00|5840045.85|
|6477|43121886.00|318649852.00|4.00|1.00|0.00|6.00|343.00|0.00|0.00|1.00|0.00|5841218.33|
|7000|41911423.75|316681026.00|5.00|1.00|0.00|5.00|364.75|0.00|0.00|1.00|0.00|5841169.55|
|7550|33842636.00|38425413.00|4.00|2.00|0.00|1.00|89.00|0.00|0.00|1.00|0.00|5840935.66|


In [32]:
X = amsterdam_airbnb.drop('price', axis = 1)
y = amsterdam_airbnb['price']

In [33]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=5)

In [34]:
reg = LazyRegressor(verbose=0,ignore_warnings=True, custom_metric=None)
models,predictions = reg.fit(X_train, X_test, y_train, y_test)
models



|Model|R-Squared|RMSE|Time Taken|
|:----|:----|:----|:----|
|SGDRegressor|0.05|160.75|0.03|
|ElasticNet|0.04|161.19|0.01|
|XGBRegressor|0.04|161.5|0.57|
|Lasso|0.03|161.75|0.02|
|ElasticNetCV|0.03|161.75|0.19|
|OrthogonalMatchingPursuitCV|0.03|161.8|0.04|
|BayesianRidge|0.03|161.85|0.03|
|LassoCV|0.03|161.88|0.18|
|RidgeCV|0.03|161.94|0.02|
|Ridge|0.03|161.94|0.02|


In [35]:
from catboost import CatBoostRegressor
from sklearn.model_selection import GridSearchCV

In [36]:
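model = CatBoostRegressor(random_seed=42)
# Note: CatBoost documents a maximum tree depth of 16, so the depth=21 candidates
# will most likely fail to fit and be scored as NaN (error_score=nan) by GridSearchCV.
parameters = {'depth'         : [6, 8, 10, 21],
              'learning_rate' : [0.01, 0.05, 0.1],
              'iterations'    : [30, 50, 100]
              }
grid = GridSearchCV(estimator=model, param_grid=parameters, cv=2, n_jobs=-1)
grid.fit(X_train, y_train)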


0:	learn: 244.4382825	total: 82.5ms	remaining: 8.17s
1:	learn: 240.1185145	total: 104ms	remaining: 5.09s
2:	learn: 236.1479290	total: 124ms	remaining: 4.01s
3:	learn: 231.9562254	total: 145ms	remaining: 3.48s
4:	learn: 228.1668169	total: 165ms	remaining: 3.14s
5:	learn: 224.2159889	total: 186ms	remaining: 2.91s
6:	learn: 220.7483249	total: 206ms	remaining: 2.74s
7:	learn: 217.3534504	total: 227ms	remaining: 2.61s
8:	learn: 214.1038554	total: 248ms	remaining: 2.5s
9:	learn: 211.1783329	total: 268ms	remaining: 2.41s
10:	learn: 208.5675474	total: 293ms	remaining: 2.37s
11:	learn: 205.9405104	total: 313ms	remaining: 2.3s
12:	learn: 203.5350374	total: 335ms	remaining: 2.24s
13:	learn: 200.9682453	total: 355ms	remaining: 2.18s
14:	learn: 198.8966547	total: 375ms	remaining: 2.13s
15:	learn: 196.8866077	total: 396ms	remaining: 2.08s
16:	learn: 194.9877525	total: 416ms	remaining: 2.03s
17:	learn: 193.2770903	total: 437ms	remaining: 1.99s
18:	learn: 191.5196857	total: 458ms	remaining: 1.95s
19:	

GridSearchCV(cv=2, error_score=nan,
             estimator=<catboost.core.CatBoostRegressor object at 0x7f93afaadc50>,
             iid='deprecated', n_jobs=-1,
             param_grid={'depth': [6, 8, 10, 21], 'iterations': [30, 50, 100],
                         'learning_rate': [0.01, 0.05, 0.1]},
             pre_dispatch='2*n_jobs', refit=True, return_train_score=False,
             scoring=None, verbose=0)
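
The fitted search object does not show which combination won. A minimal sketch of how the result could be inspected, using a lightweight `Ridge` model on synthetic data so that it runs quickly (the data, the `alpha` grid and the name `search` are all illustrative; the same attributes exist on the `grid` object above):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Synthetic regression data, only here to make the sketch self-contained.
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(60, 3))
y_toy = X_toy @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=60)

search = GridSearchCV(Ridge(), param_grid={"alpha": [0.1, 1.0, 10.0]}, cv=2)
search.fit(X_toy, y_toy)

# Best hyperparameter combination and its mean cross-validated score
# (for regressors the default score is R^2).
print(search.best_params_)
print(search.best_score_)

# One mean score per candidate; failed fits would show up here as NaN.
mean_scores = search.cv_results_["mean_test_score"]
print(mean_scores)
```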

In [37]:
pred = grid.predict(X_test)

In [38]:
grid.score(X_test, y_test)

0.11175927808523356

In [39]:
r2_score(y_test, pred)

0.11175927808523356

In [40]:
# Evaluate the model on the held-out test set
# MAE
print(mean_absolute_error(y_test, pred))
# MSE
print(mean_squared_error(y_test, pred))
# RMSE
print(np.sqrt(mean_squared_error(y_test, pred)))

62.54765937124322
24080.109181691554
155.17766972632228
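
For reference, these metrics can be computed by hand. The sketch below uses four made-up prices and predictions (not values from the dataset) to show how MAE, MSE, RMSE and R² relate to each other:

```python
import math

# Made-up true prices and predictions, for illustration only.
y_true = [100.0, 150.0, 200.0, 250.0]
y_hat = [110.0, 140.0, 190.0, 270.0]

n = len(y_true)
errors = [t - p for t, p in zip(y_true, y_hat)]

mae = sum(abs(e) for e in errors) / n   # mean absolute error
mse = sum(e * e for e in errors) / n    # mean squared error
rmse = math.sqrt(mse)                   # same unit as the price

# R^2 compares the model's squared error with that of always predicting the mean.
mean_true = sum(y_true) / n
ss_res = sum(e * e for e in errors)
ss_tot = sum((t - mean_true) ** 2 for t in y_true)
r2 = 1 - ss_res / ss_tot

print(mae, mse, rmse, r2)
```

RMSE is the square root of MSE, which matches the output above: the square root of 24080.11 is about 155.18. Because the errors are squared, a few very expensive listings can inflate RMSE far more than MAE.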


In [41]:
test_new = pd.read_csv('airbnb_listing_validate.csv')

In [42]:
test_new.columns

Index(['id', 'name', 'host_id', 'host_name', 'neighbourhood_group',
       'neighbourhood', 'latitude', 'longitude', 'room_type', 'minimum_nights',
       'number_of_reviews', 'last_review', 'reviews_per_month',
       'calculated_host_listings_count', 'availability_365'],
      dtype='object')

In [43]:
test_new.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6451 entries, 0 to 6450
Data columns (total 15 columns):
 #   Column                          Non-Null Count  Dtype  
---  ------                          --------------  -----  
 0   id                              6451 non-null   int64  
 1   name                            6440 non-null   object 
 2   host_id                         6451 non-null   int64  
 3   host_name                       6448 non-null   object 
 4   neighbourhood_group             0 non-null      float64
 5   neighbourhood                   6451 non-null   object 
 6   latitude                        6451 non-null   float64
 7   longitude                       6451 non-null   float64
 8   room_type                       6451 non-null   object 
 9   minimum_nights                  6451 non-null   int64  
 10  number_of_reviews               6451 non-null   int64  
 11  last_review                     5655 non-null   object 
 12  reviews_per_month               56

In [44]:
test_new.drop(['name','host_name','neighbourhood_group', 'last_review','reviews_per_month'], axis=1, inplace =True)

In [45]:
#convert to category dtype
test_new['neighbourhood'] = test_new['neighbourhood'].astype('category')
test_new.dtypes

id                                   int64
host_id                              int64
neighbourhood                     category
latitude                           float64
longitude                          float64
room_type                           object
minimum_nights                       int64
number_of_reviews                    int64
calculated_host_listings_count       int64
availability_365                     int64
dtype: object

In [46]:
# Use .cat.codes to replace the column with its integer-encoded values
test_new['neighbourhood'] = test_new['neighbourhood'].cat.codes

In [47]:
test_new.dtypes

id                                  int64
host_id                             int64
neighbourhood                        int8
latitude                          float64
longitude                         float64
room_type                          object
minimum_nights                      int64
number_of_reviews                   int64
calculated_host_listings_count      int64
availability_365                    int64
dtype: object
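
One caveat: the codes here are derived from the validation file alone. They match the training codes only if both files contain exactly the same set of neighbourhood names. A sketch on toy names of how the training categories could be reused (`train_names`, `test_names` and the other variable names are illustrative; in the notebook the training categories would have to be saved before that column is overwritten with codes):

```python
import pandas as pd

# Toy names; the last test value does not occur in the toy training data.
train_names = pd.Series(["Westerpark", "Oud-Oost", "Centrum-West"])
test_names = pd.Series(["Westerpark", "Oud-Oost", "Zuid"])

# Independent encoding: codes depend only on the names present in the test data.
independent = test_names.astype("category").cat.codes

# Shared encoding: reuse the training categories so that a name always maps
# to the same code; names unseen in training are encoded as -1.
train_categories = train_names.astype("category").cat.categories
shared = pd.Categorical(test_names, categories=train_categories).codes

print(independent.tolist())
print(shared.tolist())
```

In the toy data, "Westerpark" gets code 1 when encoded independently but code 2 under the training categories, which is exactly the kind of silent mismatch the shared encoding avoids.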

In [48]:
#convert categorical variable into dummy/indicator variables for ML
test_new = pd.get_dummies(test_new, columns=['room_type'])
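
`pd.get_dummies` only creates indicator columns for the room types that actually occur in the frame it is given. If a type were absent from the validation file, the column sets would differ from the training data. A minimal sketch, on toy frames, of aligning the validation columns to the training columns (`train_d`, `test_d` and `test_aligned` are illustrative names):

```python
import pandas as pd

# Toy frames: "Shared room" occurs in training but not in the test data.
train = pd.DataFrame({"room_type": ["Entire home/apt", "Private room", "Shared room"]})
test = pd.DataFrame({"room_type": ["Private room", "Entire home/apt"]})

train_d = pd.get_dummies(train, columns=["room_type"])
test_d = pd.get_dummies(test, columns=["room_type"])

# Reindex the test frame to the training columns: missing indicators are
# added and filled with 0, and the column order matches the training data.
test_aligned = test_d.reindex(columns=train_d.columns, fill_value=0)

print(test_d.columns.tolist())
print(test_aligned.columns.tolist())
```

In this notebook all four room types appear in both files, so the columns already line up. The reindex step is just a safeguard.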

In [49]:
# Collapse longitude and latitude into a single feature: the great-circle distance
# from the point (0° N, 0° E). The earth is assumed to be a sphere, not an ellipsoid.
R = 6371000  # approximate mean radius of the earth (in m)
# longitudes and latitudes must be in radians
lon, lat = map(np.radians, [test_new['longitude'], test_new['latitude']])

# 'Single-point' haversine formula (distance from the origin of the lat/lon grid)
a = np.sin(lat/2)**2 + np.cos(lat) * np.sin(lon/2)**2
distance = 2 * R * np.arcsin(np.sqrt(a))

# Create the new distance column and drop the longitude and latitude columns
test_new['distance'] = distance
test_new.drop(['latitude', 'longitude'], axis=1, inplace=True)

In [50]:
test_new.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6451 entries, 0 to 6450
Data columns (total 12 columns):
 #   Column                          Non-Null Count  Dtype  
---  ------                          --------------  -----  
 0   id                              6451 non-null   int64  
 1   host_id                         6451 non-null   int64  
 2   neighbourhood                   6451 non-null   int8   
 3   minimum_nights                  6451 non-null   int64  
 4   number_of_reviews               6451 non-null   int64  
 5   calculated_host_listings_count  6451 non-null   int64  
 6   availability_365                6451 non-null   int64  
 7   room_type_Entire home/apt       6451 non-null   uint8  
 8   room_type_Hotel room            6451 non-null   uint8  
 9   room_type_Private room          6451 non-null   uint8  
 10  room_type_Shared room           6451 non-null   uint8  
 11  distance                        6451 non-null   float64
dtypes: float64(1), int64(6), int8(1), 

In [52]:
newmodel = CatBoostRegressor(random_seed = 42 )
newparameters = {'depth'         : [6,8,10,21],
                  'learning_rate' : [0.01, 0.05, 0.1],
                  'iterations'    : [30, 50, 100]
              }
newgrid = GridSearchCV(estimator=newmodel, param_grid = newparameters, cv = 2, n_jobs=-1)
newgrid.fit(X, y)

0:	learn: 226.8804979	total: 27ms	remaining: 2.67s
1:	learn: 219.7382604	total: 51.8ms	remaining: 2.54s
2:	learn: 213.6310975	total: 77.6ms	remaining: 2.51s
3:	learn: 207.8563910	total: 103ms	remaining: 2.46s
4:	learn: 202.2634806	total: 127ms	remaining: 2.42s
5:	learn: 197.5928495	total: 152ms	remaining: 2.39s
6:	learn: 194.2531726	total: 177ms	remaining: 2.35s
7:	learn: 191.4235273	total: 209ms	remaining: 2.4s
8:	learn: 187.9726597	total: 243ms	remaining: 2.46s
9:	learn: 184.7710159	total: 269ms	remaining: 2.42s
10:	learn: 182.0696438	total: 295ms	remaining: 2.39s
11:	learn: 179.5887931	total: 320ms	remaining: 2.35s
12:	learn: 176.9082028	total: 345ms	remaining: 2.31s
13:	learn: 174.1498182	total: 370ms	remaining: 2.27s
14:	learn: 171.8225554	total: 395ms	remaining: 2.24s
15:	learn: 170.0824987	total: 420ms	remaining: 2.21s
16:	learn: 168.9389340	total: 447ms	remaining: 2.18s
17:	learn: 166.8862484	total: 471ms	remaining: 2.15s
18:	learn: 165.3678104	total: 496ms	remaining: 2.12s
19:

GridSearchCV(cv=2, error_score=nan,
             estimator=<catboost.core.CatBoostRegressor object at 0x7f93dea68f28>,
             iid='deprecated', n_jobs=-1,
             param_grid={'depth': [6, 8, 10, 21], 'iterations': [30, 50, 100],
                         'learning_rate': [0.01, 0.05, 0.1]},
             pre_dispatch='2*n_jobs', refit=True, return_train_score=False,
             scoring=None, verbose=0)

In [53]:
testnew_pred = newgrid.predict(test_new)

In [54]:
# testnew_pred holds the model's final predictions for the unseen validation listings
res = pd.DataFrame(testnew_pred)
res.index = test_new['id']  # index by listing id, the format expected for the Kaggle submission
res.columns = ['price']
# Save the predictions and download the CSV file locally (works only in Google Colab)
from google.colab import files
res.to_csv('sample_submission.csv')
files.download('sample_submission.csv')
