# Scenario
Create a model that predicts whether an Airbnb listing in San Diego, California will receive a perfect 5.0 rating. The model is meant to give Airbnb hosts a way to evaluate their rentals and check that they meet the criteria associated with a perfect rating.

## Questions to Answer

1. How many units have a perfect rating?
2. How long have they had a perfect rating?
3. How many reviews should a unit have before it is considered? (i.e., a single 5.0 review isn't enough)
4. What review metrics have the most impact?
5. What house factors have the most impact?
6. What is the relationship between price and rating?
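
Questions 1 and 3 together amount to defining the target label. The sketch below is one possible way to do that, not a settled definition: the `MIN_REVIEWS` value of 5, the `add_perfect_flag` helper, and the `is_perfect` column name are all assumptions made for illustration, and the toy rows stand in for the real listings data.

```python
import pandas as pd

# Hypothetical threshold: minimum number of reviews a listing needs before
# its rating is treated as meaningful. The value 5 is an assumption.
MIN_REVIEWS = 5


def add_perfect_flag(listings: pd.DataFrame, min_reviews: int = MIN_REVIEWS) -> pd.DataFrame:
    """Return a copy of `listings` with a boolean `is_perfect` column."""
    out = listings.copy()
    enough_reviews = out["number_of_reviews"] >= min_reviews
    # Listings with a missing rating compare as False, so they are never flagged.
    out["is_perfect"] = (out["review_scores_rating"] == 5.0) & enough_reviews
    return out


# Toy rows for illustration only; they are not the real data.
sample = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "review_scores_rating": [5.0, 4.87, 5.0, None],
    "number_of_reviews": [1, 100, 12, 0],
})
flagged = add_perfect_flag(sample)
print(flagged[["id", "is_perfect"]])
```

With this definition, a listing whose only review is a 5.0 is not counted as perfect. Changing `min_reviews` shows how sensitive the number of "perfect" units is to the cutoff.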

In [24]:
import pandas as pd

In [21]:
pd.set_option('display.max_rows', 1000)

In [5]:
reviews_df = pd.read_csv('reviews.csv.gz')

In [6]:
reviews_df.head()

Unnamed: 0,listing_id,id,date,reviewer_id,reviewer_name,comments
0,29967,62788,2010-07-09,151260,Debbie,When I booked our stay in San Diego at Dennis ...
1,29967,64568,2010-07-14,141552,Eric,This was my first experience with using airbnb...
2,29967,67502,2010-07-22,141591,David,We found the house to be very accommodating--e...
3,29967,70466,2010-07-29,125982,Anders,As advertised and more. Dennis was very helpfu...
4,29967,74876,2010-08-07,29835,Miyoko,We had a great time in San Diego. Denis' house...
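
The reviews file shown above has a `date` per review but no per-review score, so it cannot say directly how long a listing has held a perfect rating. As a rough proxy, the sketch below computes how long each listing has been collecting reviews. The toy frame and the output column names (`first_review`, `last_review`, `n_reviews`, `days_active`) are assumptions for illustration.

```python
import pandas as pd

# Toy stand-in for reviews_df; only the two columns used here are included.
reviews = pd.DataFrame({
    "listing_id": [29967, 29967, 29967, 11111],
    "date": ["2010-07-09", "2010-07-14", "2010-08-07", "2021-01-01"],
})
reviews["date"] = pd.to_datetime(reviews["date"])

# First review, last review, and review count per listing.
review_span = reviews.groupby("listing_id")["date"].agg(
    first_review="min", last_review="max", n_reviews="count"
)

# Number of days between a listing's first and last review.
review_span["days_active"] = (
    review_span["last_review"] - review_span["first_review"]
).dt.days
print(review_span)
```

Joining `review_span` to the listings table on the listing id would let the review history sit next to the rating columns.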


In [7]:
listing_df = pd.read_csv('listings.csv.gz')

In [9]:
listing_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10935 entries, 0 to 10934
Data columns (total 74 columns):
 #   Column                                        Non-Null Count  Dtype  
---  ------                                        --------------  -----  
 0   id                                            10935 non-null  int64  
 1   listing_url                                   10935 non-null  object 
 2   scrape_id                                     10935 non-null  int64  
 3   last_scraped                                  10935 non-null  object 
 4   name                                          10935 non-null  object 
 5   description                                   10809 non-null  object 
 6   neighborhood_overview                         7440 non-null   object 
 7   picture_url                                   10935 non-null  object 
 8   host_id                                       10935 non-null  int64  
 9   host_url                                      10935 non-null 

In [17]:
review_score_df = listing_df[['id', 'price', 'review_scores_rating', 'review_scores_accuracy',
                             'review_scores_cleanliness', 'review_scores_checkin', 'review_scores_communication',
                             'review_scores_location', 'review_scores_value', 'number_of_reviews',
                             'number_of_reviews_ltm', 'number_of_reviews_l30d']]

In [18]:
# Work on a copy so later changes do not touch the column slice of listing_df
df = review_score_df.copy()

In [22]:
df.head(1000)

Unnamed: 0,id,price,review_scores_rating,review_scores_accuracy,review_scores_cleanliness,review_scores_checkin,review_scores_communication,review_scores_location,review_scores_value,number_of_reviews,number_of_reviews_ltm,number_of_reviews_l30d
0,53157684,$60.00,5.0,5.0,5.0,5.0,5.0,4.0,5.0,1,1,0
1,4541431,$282.00,4.87,4.91,4.64,4.99,4.98,4.87,4.86,100,16,1
2,41089200,$348.00,4.92,5.0,4.92,5.0,5.0,4.92,4.92,12,7,0
3,43078286,$368.00,4.88,4.96,4.96,5.0,5.0,4.75,4.71,24,12,0
4,51558974,$264.00,5.0,5.0,4.75,4.75,4.75,5.0,4.75,4,4,0
5,38816619,$140.00,5.0,4.0,4.0,5.0,5.0,5.0,4.5,2,0,0
6,29281731,$19.00,4.79,4.85,4.85,4.85,4.84,4.87,4.75,67,23,1
7,53205282,$25.00,5.0,4.67,5.0,5.0,5.0,5.0,5.0,3,3,1
8,38550805,$60.00,5.0,5.0,5.0,5.0,5.0,5.0,5.0,1,0,0
9,53479335,$39.00,5.0,5.0,5.0,5.0,5.0,5.0,5.0,3,3,1


In [23]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10935 entries, 0 to 10934
Data columns (total 12 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   id                           10935 non-null  int64  
 1   price                        10935 non-null  object 
 2   review_scores_rating         9408 non-null   float64
 3   review_scores_accuracy       9385 non-null   float64
 4   review_scores_cleanliness    9385 non-null   float64
 5   review_scores_checkin        9383 non-null   float64
 6   review_scores_communication  9385 non-null   float64
 7   review_scores_location       9383 non-null   float64
 8   review_scores_value          9383 non-null   float64
 9   number_of_reviews            10935 non-null  int64  
 10  number_of_reviews_ltm        10935 non-null  int64  
 11  number_of_reviews_l30d       10935 non-null  int64  
dtypes: float64(7), int64(4), object(1)
memory usage: 1.0+ MB
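
The `info()` output shows `price` stored as `object` (strings such as `$60.00`), so it has to be converted before it can be compared with ratings. The sketch below is a minimal way to do that and to take a first look at questions 4 and 6. The toy rows are copied from the first five rows of the `df.head()` output above. The `price_usd` column name is an assumption, and so is the handling of thousands separators, since the rows shown contain no commas.

```python
import pandas as pd

# Toy rows copied from the first five rows of the df.head() output.
scores = pd.DataFrame({
    "price": ["$60.00", "$282.00", "$348.00", "$368.00", "$264.00"],
    "review_scores_rating": [5.0, 4.87, 4.92, 4.88, 5.0],
    "review_scores_cleanliness": [5.0, 4.64, 4.92, 4.96, 4.75],
    "review_scores_value": [5.0, 4.86, 4.92, 4.71, 4.75],
})

# Strip the currency symbol and any thousands separators, then convert to float.
scores["price_usd"] = (
    scores["price"].str.replace(r"[$,]", "", regex=True).astype(float)
)

# Pearson correlation of each numeric column with the overall rating.
corr_with_rating = (
    scores.drop(columns="price")
    .corr()["review_scores_rating"]
    .drop("review_scores_rating")
    .sort_values(ascending=False)
)
print(corr_with_rating)
```

Five rows are far too few to draw conclusions from. The same steps would need to run on the full frame, ideally after filtering out listings with very few reviews.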


In [10]:
# Note: this rebinds df to the summary listings file (18 columns)
df = pd.read_csv('listings.csv')

In [12]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10946 entries, 0 to 10945
Data columns (total 18 columns):
 #   Column                          Non-Null Count  Dtype  
---  ------                          --------------  -----  
 0   id                              10946 non-null  int64  
 1   name                            10946 non-null  object 
 2   host_id                         10946 non-null  int64  
 3   host_name                       10931 non-null  object 
 4   neighbourhood_group             0 non-null      float64
 5   neighbourhood                   10946 non-null  object 
 6   latitude                        10946 non-null  float64
 7   longitude                       10946 non-null  float64
 8   room_type                       10946 non-null  object 
 9   price                           10946 non-null  int64  
 10  minimum_nights                  10946 non-null  int64  
 11  number_of_reviews               10946 non-null  int64  
 12  last_review                     