# Final Project All Groups
### AI-MACHINE LEARNING FOUNDATIONS 2020

![title](img/street.png)

## Short Term Accommodation

Technology and the internet are changing the world and our ideas of how businesses run. Amazon is the biggest retail company that doesn't have any products, just as Airbnb is the biggest hotel without any rooms.

You are considering creating a startup that advises homeowners on setting up their real estate for short-term rental.
You need to create a presentation of approximately 15 minutes for your potential investors, showing them how machine learning, following the __CRISP-DM methodology__, can be used to predict variables. The choice of dependent variable is yours!

The data available to you is public data from Inside Airbnb; however, you may use additional data if you wish.

http://insideairbnb.com/madrid/?neighbourhood=&filterEntireHomes=false&filterHighlyAvailable=false&filterRecentReviews=false&filterMultiListings=false

Description
* Madrid, listings.csv.gz: Detailed Listings data for Madrid
* Madrid, calendar.csv.gz: Detailed Calendar Data for listings in Madrid
* Madrid, reviews.csv.gz: Detailed Review Data for listings in Madrid
* Madrid, listings.csv: Summary information and metrics for listings in Madrid (good for visualisations).
* Madrid, reviews.csv: Summary Review data and Listing ID (to facilitate time based analytics and visualisations linked to a listing).
* Madrid, neighbourhoods.csv: Neighbourhood list for geo filter. Sourced from city or open source GIS files.
* Madrid, neighbourhoods.geojson: GeoJSON file of neighbourhoods of the city.

![title](img/room.png)

In [2]:
import pandas as pd 
import numpy as np

In [63]:

pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)

pd.options.display.max_rows

In [3]:
# Load the listings data into a pandas DataFrame
df = pd.read_csv("Madrid_listings.csv") 
df.head()

Unnamed: 0,id,name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365
0,6369,"Rooftop terrace room with ensuite bathroom, Airc.",13660,Simon,Chamartín,Hispanoamérica,40.45628,-3.67763,Private room,70,3,64,2019-05-14,0.56,1,86
1,20185,CENTRAL MADRID.UP TO 10,76357,Francisco,Centro,Embajadores,40.41043,-3.70156,Entire home/apt,82,1,9,2011-09-13,0.08,8,364
2,21853,Bright and airy room,83531,Abdel,Latina,Cármenes,40.40341,-3.74084,Private room,17,4,33,2018-07-15,0.57,2,0
3,23148,MODERN.AMAZING.COLOURFUL.APARTMENT,76357,Francisco,Centro,Justicia,40.42174,-3.69945,Entire home/apt,76,1,17,2012-01-19,0.15,8,365
4,24805,Gran Via Studio Madrid,101471,Iraido,Centro,Universidad,40.42202,-3.70395,Entire home/apt,85,5,2,2017-07-03,0.04,1,358


In [7]:
df.shape

(21439, 16)

In [11]:
df['id'].unique().shape

(21439,)
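The number of unique ids equals the number of rows, so each listing appears exactly once. The same check can be written more directly. The snippet below is a minimal sketch on a small stand-in frame (the name `listings` and its contents are assumptions for illustration, with ids taken from the preview above):

```python
import pandas as pd

# Toy stand-in for the listings frame; ids copied from the preview above.
listings = pd.DataFrame({"id": [6369, 20185, 21853, 23148, 24805]})

# True when no id appears more than once.
ids_are_unique = listings["id"].is_unique

# Number of rows that repeat an earlier id (0 when ids are unique).
n_duplicates = int(listings["id"].duplicated().sum())

print(ids_are_unique, n_duplicates)
```

On the real data, `df['id'].is_unique` would be expected to return `True`, given the two shapes shown above.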

In [4]:
calendar = pd.read_csv("calendar.csv") 
calendar.head(100)

Unnamed: 0,listing_id,date,available,price,adjusted_price,minimum_nights,maximum_nights
0,336834,2019-09-19,f,$63.00,$63.00,5,250
1,6369,2019-09-19,f,$70.00,$70.00,1,365
2,6369,2019-09-20,f,$75.00,$75.00,1,365
3,6369,2019-09-21,f,$75.00,$75.00,1,365
4,6369,2019-09-22,t,$70.00,$70.00,1,365
...,...,...,...,...,...,...,...
95,6369,2019-12-22,f,$80.00,$80.00,1,365
96,6369,2019-12-23,f,$80.00,$80.00,1,365
97,6369,2019-12-24,f,$80.00,$80.00,1,365
98,6369,2019-12-25,f,$80.00,$80.00,1,365


In [6]:
calendar.shape

(7605505, 7)

In [48]:
calendar.head(50)

Unnamed: 0,listing_id,date,available,price,adjusted_price,minimum_nights,maximum_nights
0,336834,2019-09-19,f,$63.00,$63.00,5,250
1,6369,2019-09-19,f,$70.00,$70.00,1,365
2,6369,2019-09-20,f,$75.00,$75.00,1,365
3,6369,2019-09-21,f,$75.00,$75.00,1,365
4,6369,2019-09-22,t,$70.00,$70.00,1,365
5,6369,2019-09-23,t,$70.00,$70.00,1,365
6,6369,2019-09-24,t,$70.00,$70.00,1,365
7,6369,2019-09-25,f,$70.00,$70.00,1,365
8,6369,2019-09-26,f,$70.00,$70.00,1,365
9,6369,2019-09-27,t,$75.00,$75.00,1,365


In [13]:
calendar['listing_id'].unique().shape

(20837,)
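The two figures above fit together: 7605505 rows divided by 20837 listings is exactly 365, which suggests one calendar row per listing per day for a year. A minimal sketch of that arithmetic, plus a per-listing row count on a toy frame (`toy` is an assumption for illustration):

```python
import pandas as pd

# Figures reported by calendar.shape and the unique listing_id count above.
n_rows, n_listings = 7605505, 20837
print(n_rows / n_listings)  # 365.0

# Toy frame: two listings with three calendar rows each.
toy = pd.DataFrame({"listing_id": [1, 1, 1, 2, 2, 2]})

# Rows per listing; on the real data this would show whether every
# listing really has the same number of calendar days.
rows_per_listing = toy.groupby("listing_id").size()
all_same_length = rows_per_listing.nunique() == 1
print(rows_per_listing.to_dict(), all_same_length)
```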

In [37]:
listingsid = df['id']
calendarid = calendar['listing_id']

In [40]:
print(listingsid.shape)
print(calendarid.shape)


(21439,)
(7605505,)


In [41]:
def returnNotMatches(a, b):
    # Return [values only in b, values only in a] using set differences
    a = set(a)
    b = set(b)
    return [list(b - a), list(a - b)]

In [42]:
returnNotMatches(listingsid, calendarid)

[[36601861,
  37077000,
  34545685,
  36372503,
  38404120,
  38117399,
  37511194,
  37912603,
  38723612,
  36864030,
  36995104,
  16556072,
  37478445,
  38690862,
  35242032,
  36626484,
  37830708,
  38625332,
  38715449,
  38420538,
  35225663,
  37249088,
  36872262,
  37560392,
  36560974,
  36618327,
  18440287,
  38215778,
  38592612,
  38486116,
  36266089,
  7200873,
  27459692,
  38404206,
  36143217,
  36700277,
  27418743,
  18849911,
  38199421,
  37544062,
  38699147,
  38305932,
  38215823,
  37494931,
  38019222,
  38600858,
  36954267,
  37355676,
  37765294,
  37961903,
  37109936,
  37765298,
  37765299,
  37642418,
  37028021,
  29360310,
  37904566,
  37126328,
  37208249,
  36585663,
  37986497,
  37273797,
  36659400,
  8356043,
  25534668,
  37552336,
  37249233,
  28172498,
  31465683,
  37560531,
  36602072,
  37380313,
  35053788,
  7209181,
  34848997,
  38314214,
  36970727,
  37970163,
  35234037,
  36970745,
  38396159,
  36659456,
  37593346,
  22765

In [53]:
listingsid

0            6369
1           20185
2           21853
3           23148
4           24805
           ...   
21434    36576704
21435    36576880
21436    36580017
21437    36580707
21438    36583995
Name: id, Length: 21439, dtype: int64

In [49]:
print((calendar['listing_id']==37077000).unique())
print((df['id']==37077000).unique()) 


[False  True]
[False]
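The check above shows that id 37077000 exists in the calendar but not in the listings. A minimal sketch of the same comparison on toy data, with the two directions of the mismatch kept in separate, named results (all variable names here are assumptions for illustration):

```python
import pandas as pd

# Toy id columns: 37077000 is the id checked above, the rest come from the previews.
listing_ids = pd.Series([6369, 20185, 21853], name="id")
calendar_ids = pd.Series([6369, 6369, 20185, 37077000], name="listing_id")

listing_set = set(listing_ids.tolist())
calendar_set = set(calendar_ids.tolist())

# Ids present in the calendar but missing from the listings.
only_in_calendar = sorted(calendar_set - listing_set)

# Ids present in the listings but missing from the calendar.
only_in_listings = sorted(listing_set - calendar_set)

# A single membership test per frame, equivalent to inspecting unique().
in_calendar = bool((calendar_ids == 37077000).any())
in_listings = bool((listing_ids == 37077000).any())

print(only_in_calendar, only_in_listings, in_calendar, in_listings)
```

Naming the two differences separately avoids having to remember which position of the returned list holds which direction.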


In [18]:
dfmerged = pd.merge(df, calendar, left_on='id', right_on='listing_id', how='left')

In [50]:
dfmerged.head(1)

Unnamed: 0,id,name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price_x,minimum_nights_x,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365,listing_id,date,available,price_y,adjusted_price,minimum_nights_y,maximum_nights
0,6369,"Rooftop terrace room with ensuite bathroom, Airc.",13660,Simon,Chamartín,Hispanoamérica,40.45628,-3.67763,Private room,70,3,64,2019-05-14,0.56,1,86,6369.0,2019-09-19,f,$70.00,$70.00,1.0,365.0
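Because the merge uses `how='left'`, every listing is kept, calendar rows whose id is not in the listings are dropped, and listings without calendar rows get `NaN` in the calendar columns. That is why `listing_id` is displayed as `6369.0`. The sketch below shows, on toy frames, how `indicator=True` and explicit `suffixes` can make this visible; the frame names and the toy prices for ids other than 6369 are assumptions:

```python
import pandas as pd

# Toy listings and calendar frames (values are illustrative only).
listings = pd.DataFrame({"id": [6369, 20185, 21853], "price": [70, 82, 17]})
calendar_sample = pd.DataFrame({
    "listing_id": [6369, 6369, 20185, 37077000],
    "date": ["2019-09-19", "2019-09-20", "2019-09-19", "2019-09-19"],
    "price": ["$70.00", "$75.00", "$82.00", "$50.00"],
})

merged = pd.merge(
    listings,
    calendar_sample,
    left_on="id",
    right_on="listing_id",
    how="left",                               # keep every listing
    suffixes=("_listing", "_calendar"),       # clearer than _x / _y
    indicator=True,                           # adds a _merge column
)

# Listings that found no calendar rows.
unmatched = merged.loc[merged["_merge"] == "left_only", "id"].tolist()
print(len(merged), unmatched)
```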


In [65]:
dfmerged.head(500)

Unnamed: 0,id,name,host_id,host_name,neighbourhood_group,neighbourhood,latitude,longitude,room_type,price_x,minimum_nights_x,number_of_reviews,last_review,reviews_per_month,calculated_host_listings_count,availability_365,listing_id,date,available,price_y,adjusted_price,minimum_nights_y,maximum_nights
0,6369,"Rooftop terrace room with ensuite bathroom, Airc.",13660,Simon,Chamartín,Hispanoamérica,40.45628,-3.67763,Private room,70,3,64,2019-05-14,0.56,1,86,6369.0,2019-09-19,f,$70.00,$70.00,1.0,365.0
1,6369,"Rooftop terrace room with ensuite bathroom, Airc.",13660,Simon,Chamartín,Hispanoamérica,40.45628,-3.67763,Private room,70,3,64,2019-05-14,0.56,1,86,6369.0,2019-09-20,f,$75.00,$75.00,1.0,365.0
2,6369,"Rooftop terrace room with ensuite bathroom, Airc.",13660,Simon,Chamartín,Hispanoamérica,40.45628,-3.67763,Private room,70,3,64,2019-05-14,0.56,1,86,6369.0,2019-09-21,f,$75.00,$75.00,1.0,365.0
3,6369,"Rooftop terrace room with ensuite bathroom, Airc.",13660,Simon,Chamartín,Hispanoamérica,40.45628,-3.67763,Private room,70,3,64,2019-05-14,0.56,1,86,6369.0,2019-09-22,t,$70.00,$70.00,1.0,365.0
4,6369,"Rooftop terrace room with ensuite bathroom, Airc.",13660,Simon,Chamartín,Hispanoamérica,40.45628,-3.67763,Private room,70,3,64,2019-05-14,0.56,1,86,6369.0,2019-09-23,t,$70.00,$70.00,1.0,365.0
5,6369,"Rooftop terrace room with ensuite bathroom, Airc.",13660,Simon,Chamartín,Hispanoamérica,40.45628,-3.67763,Private room,70,3,64,2019-05-14,0.56,1,86,6369.0,2019-09-24,t,$70.00,$70.00,1.0,365.0
6,6369,"Rooftop terrace room with ensuite bathroom, Airc.",13660,Simon,Chamartín,Hispanoamérica,40.45628,-3.67763,Private room,70,3,64,2019-05-14,0.56,1,86,6369.0,2019-09-25,f,$70.00,$70.00,1.0,365.0
7,6369,"Rooftop terrace room with ensuite bathroom, Airc.",13660,Simon,Chamartín,Hispanoamérica,40.45628,-3.67763,Private room,70,3,64,2019-05-14,0.56,1,86,6369.0,2019-09-26,f,$70.00,$70.00,1.0,365.0
8,6369,"Rooftop terrace room with ensuite bathroom, Airc.",13660,Simon,Chamartín,Hispanoamérica,40.45628,-3.67763,Private room,70,3,64,2019-05-14,0.56,1,86,6369.0,2019-09-27,t,$75.00,$75.00,1.0,365.0
9,6369,"Rooftop terrace room with ensuite bathroom, Airc.",13660,Simon,Chamartín,Hispanoamérica,40.45628,-3.67763,Private room,70,3,64,2019-05-14,0.56,1,86,6369.0,2019-09-28,f,$75.00,$75.00,1.0,365.0


In [27]:
dfmerged.dtypes

id                                  int64
name                               object
host_id                             int64
host_name                          object
neighbourhood_group                object
neighbourhood                      object
latitude                          float64
longitude                         float64
room_type                          object
price_x                             int64
minimum_nights_x                    int64
number_of_reviews                   int64
last_review                        object
reviews_per_month                 float64
calculated_host_listings_count      int64
availability_365                    int64
listing_id                        float64
date                               object
available                          object
price_y                            object
adjusted_price                     object
minimum_nights_y                  float64
maximum_nights                    float64
dtype: object
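Several columns are typed as `object` even though they hold dates or flags: `date`, `last_review` and `available`. A minimal sketch of converting them on a toy frame, assuming `available` only contains the values `t` and `f` as in the previews (`sample` is an assumed name):

```python
import pandas as pd

# Toy rows with the same string encodings as the merged frame.
sample = pd.DataFrame({
    "date": ["2019-09-19", "2019-09-22"],
    "available": ["f", "t"],
    "last_review": ["2019-05-14", "2019-05-14"],
})

# Map the 't' / 'f' flags to real booleans.
sample["available"] = sample["available"].map({"t": True, "f": False})

# Parse both date columns into datetime64 values.
for col in ["date", "last_review"]:
    sample[col] = pd.to_datetime(sample[col])

print(sample.dtypes)
```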

In [29]:
dfmerged['date'] = pd.to_datetime(dfmerged['date'])


In [5]:
# Calculate the variable RevPAR (revenue per available room)
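RevPAR is commonly defined as room revenue divided by the number of room-nights available for sale. The sketch below is one possible way to approximate it per listing from the calendar data, not a definitive method. It rests on a strong assumption: a night marked `f` (not available) is treated as booked at the listed price, although hosts can also block nights. The frame `cal` is a toy example using the first four rows of listing 6369:

```python
import pandas as pd

# Toy calendar rows for one listing; values follow the preview of listing 6369.
cal = pd.DataFrame({
    "listing_id": [6369] * 4,
    "available": ["f", "f", "f", "t"],
    "price": ["$70.00", "$75.00", "$75.00", "$70.00"],
})

# Convert '$70.00'-style strings to floats.
cal["price"] = pd.to_numeric(cal["price"].str.replace(r"[$,]", "", regex=True))

# ASSUMPTION: a night that is not available ('f') was booked at the listed price.
cal["booked"] = cal["available"].eq("f")
cal["revenue"] = cal["price"].where(cal["booked"], 0.0)

per_listing = cal.groupby("listing_id").agg(
    revenue=("revenue", "sum"),        # estimated revenue over the period
    nights=("booked", "size"),         # all calendar nights for the listing
    booked_nights=("booked", "sum"),   # nights assumed to be booked
)
per_listing["occupancy"] = per_listing["booked_nights"] / per_listing["nights"]
per_listing["revpar"] = per_listing["revenue"] / per_listing["nights"]

print(per_listing)
```

Aggregating the calendar per listing first, and only then joining the result to the listings, keeps the frame at one row per listing instead of one row per listing per day.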


Sources:

<a href="https://www.freepik.com/free-photos-vectors/people">People photo created by freepik - www.freepik.com</a>


<a href="https://www.freepik.com/free-photos-vectors/background">Background vector created by roserodionova - www.freepik.com</a>