# Lab 8: Define and Solve an ML Problem of Your Choosing

In [1]:
import pandas as pd
import numpy as np
import os 
import matplotlib.pyplot as plt
import seaborn as sns

In this lab assignment, you will follow the machine learning life cycle and implement a model to solve a machine learning problem of your choosing. You will select a data set and choose a predictive problem that the data set supports. You will then inspect the data with your problem in mind, formulate a project plan, and implement that plan.

You will complete the following tasks:

1. Build Your DataFrame
2. Define Your ML Problem
3. Perform exploratory data analysis to understand your data.
4. Define Your Project Plan
5. Implement Your Project Plan:
    * Prepare your data for your model.
    * Fit your model to the training data and evaluate your model.
    * Improve your model's performance.

## Part 1: Build Your DataFrame

You will have the option to choose one of four data sets that you have worked with in this program:

* The "census" data set that contains Census information from 1994: `censusData.csv`
* Airbnb NYC "listings" data set: `airbnbListingsData.csv`
* World Happiness Report (WHR) data set: `WHR2018Chapter2OnlineData.csv`
* Book Review data set: `bookReviewsData.csv`

Note that these are variations of the data sets that you have worked with in this program. For example, some do not include some of the preprocessing necessary for specific models. 

#### Load a Data Set and Save it as a Pandas DataFrame

The code cell below contains filenames (path + filename) for each of the four data sets available to you.

<b>Task:</b> In the code cell below, use the same method you have been using to load the data using `pd.read_csv()` and save it to DataFrame `df`. 

You can load each file as a new DataFrame to inspect the data before choosing your data set.

In [2]:
# File names of the four data sets
adultDataSet_filename = os.path.join(os.getcwd(), "data", "censusData.csv")
airbnbDataSet_filename = os.path.join(os.getcwd(), "data", "airbnbListingsData.csv")
WHRDataSet_filename = os.path.join(os.getcwd(), "data", "WHR2018Chapter2OnlineData.csv")
bookReviewDataSet_filename = os.path.join(os.getcwd(), "data", "bookReviewsData.csv")


df = pd.read_csv(airbnbDataSet_filename, header=0)
print(list(df.columns))

df.head()

['name', 'description', 'neighborhood_overview', 'host_name', 'host_location', 'host_about', 'host_response_rate', 'host_acceptance_rate', 'host_is_superhost', 'host_listings_count', 'host_total_listings_count', 'host_has_profile_pic', 'host_identity_verified', 'neighbourhood_group_cleansed', 'room_type', 'accommodates', 'bathrooms', 'bedrooms', 'beds', 'amenities', 'price', 'minimum_nights', 'maximum_nights', 'minimum_minimum_nights', 'maximum_minimum_nights', 'minimum_maximum_nights', 'maximum_maximum_nights', 'minimum_nights_avg_ntm', 'maximum_nights_avg_ntm', 'has_availability', 'availability_30', 'availability_60', 'availability_90', 'availability_365', 'number_of_reviews', 'number_of_reviews_ltm', 'number_of_reviews_l30d', 'review_scores_rating', 'review_scores_cleanliness', 'review_scores_checkin', 'review_scores_communication', 'review_scores_location', 'review_scores_value', 'instant_bookable', 'calculated_host_listings_count', 'calculated_host_listings_count_entire_homes', 'c

Unnamed: 0,name,description,neighborhood_overview,host_name,host_location,host_about,host_response_rate,host_acceptance_rate,host_is_superhost,host_listings_count,...,review_scores_communication,review_scores_location,review_scores_value,instant_bookable,calculated_host_listings_count,calculated_host_listings_count_entire_homes,calculated_host_listings_count_private_rooms,calculated_host_listings_count_shared_rooms,reviews_per_month,n_host_verifications
0,Skylit Midtown Castle,"Beautiful, spacious skylit studio in the heart...",Centrally located in the heart of Manhattan ju...,Jennifer,"New York, New York, United States",A New Yorker since 2000! My passion is creatin...,0.8,0.17,True,8.0,...,4.79,4.86,4.41,False,3,3,0,0,0.33,9
1,"Whole flr w/private bdrm, bath & kitchen(pls r...","Enjoy 500 s.f. top floor in 1899 brownstone, w...",Just the right mix of urban center and local n...,LisaRoxanne,"New York, New York, United States",Laid-back Native New Yorker (formerly bi-coast...,0.09,0.69,True,1.0,...,4.8,4.71,4.64,False,1,1,0,0,4.86,6
2,"Spacious Brooklyn Duplex, Patio + Garden",We welcome you to stay in our lovely 2 br dupl...,,Rebecca,"Brooklyn, New York, United States","Rebecca is an artist/designer, and Henoch is i...",1.0,0.25,True,1.0,...,5.0,4.5,5.0,False,1,1,0,0,0.02,3
3,Large Furnished Room Near B'way,Please don’t expect the luxury here just a bas...,"Theater district, many restaurants around here.",Shunichi,"New York, New York, United States",I used to work for a financial industry but no...,1.0,1.0,True,1.0,...,4.42,4.87,4.36,False,1,0,1,0,3.68,4
4,Cozy Clean Guest Room - Family Apt,"Our best guests are seeking a safe, clean, spa...",Our neighborhood is full of restaurants and ca...,MaryEllen,"New York, New York, United States",Welcome to family life with my oldest two away...,,,True,1.0,...,4.95,4.94,4.92,False,1,0,1,0,0.87,7


## Part 2: Define Your ML Problem

Next you will formulate your ML Problem. In the markdown cell below, answer the following questions:

1. List the data set you have chosen.
2. What will you be predicting? What is the label?
3. Is this a supervised or unsupervised learning problem? Is this a clustering, classification or regression problem? Is it a binary classification or multi-class classification problem?
4. What are your features? (note: this list may change after you explore your data)
5. Explain why this is an important problem. In other words, how would a company create value with a model that predicts this label?

<
The data set I have chosen is the Airbnb NYC "listings" data set.
I will be predicting the overall review score rating of an Airbnb listing. The label is `review_scores_rating`.
This is a supervised learning problem because the model has access to the label for the data it is trained on. It is a regression problem because the label is a continuous numerical value.

My features are:
- `host_location`
- `host_response_rate`
- `host_acceptance_rate`
- `host_is_superhost`
- `neighbourhood_group_cleansed`
- `room_type`
- `accommodates`
- `bathrooms`
- `bedrooms`
- `beds`
- `amenities`
- `price`
- `minimum_nights`
- `maximum_nights`
- `review_scores_value`
- `review_scores_cleanliness`
- `review_scores_checkin`
- `review_scores_communication`
- `review_scores_location`
- `instant_bookable`

This is an important problem because the model can help Airbnb and its hosts understand which host and listing qualities lead to high overall review ratings. Airbnb could create value by helping hosts tailor how they present their listing, or what they offer, based on what highly rated listings have in common; higher ratings can in turn lead to more bookings from guests who read the reviews. The model could also provide insight into how to improve listings with low ratings, which probably also have fewer bookings. >

## Part 3: Understand Your Data

The next step is to perform exploratory data analysis. Inspect and analyze your data set with your machine learning problem in mind. Consider the following as you inspect your data:

1. What data preparation techniques would you like to use? These data preparation techniques may include:

    * addressing missingness, such as replacing missing values with means
    * finding and replacing outliers
    * renaming features and labels
    * performing feature engineering techniques such as one-hot encoding on categorical features
    * selecting appropriate features and removing irrelevant features
    * performing specific data cleaning and preprocessing techniques for an NLP problem
    * addressing class imbalance in your data sample to promote fair AI
    

2. What machine learning model (or models) you would like to use that is suitable for your predictive problem and data?
    * Are there other data preparation techniques that you will need to apply to build a balanced modeling data set for your problem and model? For example, will you need to scale your data?
 
 
3. How will you evaluate and improve the model's performance?
    * Are there specific evaluation metrics and methods that are appropriate for your model?
    

Think of the different techniques you have used to inspect and analyze your data in this course. These include using Pandas to apply data filters, using the Pandas `describe()` method to get insight into key statistics for each column, using the Pandas `dtypes` property to inspect the data type of each column, and using Matplotlib and Seaborn to detect outliers and visualize relationships between features and labels. If you are working on a classification problem, use techniques you have learned to determine if there is class imbalance.

<b>Task</b>: Use the techniques you have learned in this course to inspect and analyze your data. You can import additional packages that you have used in this course that you will need to perform this task.

<b>Note</b>: You can add code cells if needed by going to the <b>Insert</b> menu and clicking on <b>Insert Cell Below</b> in the drop-down menu.
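
Besides `describe()` and `dtypes`, a quick way to see which features move with the label is to correlate each numeric column with it. A minimal sketch on a few hypothetical toy values (not rows from the data set); on the real data, the same chain would start from `df` after the renaming step:

```python
import pandas as pd

# Hypothetical toy values in the same range as the listings data
rows = pd.DataFrame({
    "cleanliness_score": [4.62, 4.49, 5.00, 3.73, 4.82],
    "price": [150.0, 75.0, 275.0, 68.0, 75.0],
    "Overall_rating": [4.70, 4.45, 5.00, 4.21, 4.91],
})

# Pearson correlation of each feature with the label, strongest (by magnitude) first
corr_with_label = (
    rows.corr(numeric_only=True)["Overall_rating"]
    .drop("Overall_rating")
    .sort_values(key=abs, ascending=False)
)
print(corr_with_label)
```

Correlation only captures linear, one-feature-at-a-time relationships, so it is a screening aid rather than a replacement for fitting the model.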

In [3]:
# Keep only the chosen features and the label, then print the number of rows and columns
my_features=['host_location','host_response_rate', 'host_acceptance_rate','host_is_superhost', 'neighbourhood_group_cleansed', 'room_type', 'accommodates', 'bathrooms', 'bedrooms', 'beds','amenities', 'price', 'minimum_nights','maximum_nights','review_scores_value', 'review_scores_cleanliness', 'review_scores_checkin', 'review_scores_communication','review_scores_location','instant_bookable','review_scores_rating']
df=df[my_features].copy()
print(df.shape)
# Rename columns for better understanding
# Note: 'response_rate' holds the boolean host_is_superhost flag, not host_response_rate
df.rename(columns={'review_scores_rating':'Overall_rating','host_location':'location','host_is_superhost':'response_rate','neighbourhood_group_cleansed':'neighbourhood','review_scores_value':'value_money_rate','review_scores_cleanliness':'cleanliness_score','review_scores_checkin':'checkin_score','review_scores_communication':'communication_score','review_scores_location':'location_score'}, inplace=True)
# Check whether there are any null values
df.isnull().values.any()


True

In [4]:
#counting the number of times a missing value occurs in each column
nan_count=np.sum(df.isnull(),axis=0)
nan_count

location                   60
host_response_rate      11843
host_acceptance_rate    11113
response_rate               0
neighbourhood               0
room_type                   0
accommodates                0
bathrooms                   0
bedrooms                 2918
beds                     1354
amenities                   0
price                       0
minimum_nights              0
maximum_nights              0
value_money_rate            0
cleanliness_score           0
checkin_score               0
communication_score         0
location_score              0
instant_bookable            0
Overall_rating              0
dtype: int64

In [5]:
#Storing the names of the columns that have a missing value into a list & checking the type of value each column holds
condition=nan_count !=0
col_names=nan_count[condition].index
nan_columns=list(col_names)
print(nan_columns)
nan_types=df[nan_columns].dtypes
nan_types

['location', 'host_response_rate', 'host_acceptance_rate', 'bedrooms', 'beds']


location                 object
host_response_rate      float64
host_acceptance_rate    float64
bedrooms                float64
beds                    float64
dtype: object

In [6]:
# Fill in missing values for host_response_rate with the column mean
mean_host_response_rate=df['host_response_rate'].mean()
print("The mean value of the host_response_rate column is: " +str(mean_host_response_rate))
df['host_response_rate']=df['host_response_rate'].fillna(mean_host_response_rate)

# Fill in missing values for host_acceptance_rate with the column mean
mean_host_acceptance_rate=df['host_acceptance_rate'].mean()
print("The mean value of the host_acceptance_rate column is: " +str(mean_host_acceptance_rate))
df['host_acceptance_rate']=df['host_acceptance_rate'].fillna(mean_host_acceptance_rate)
# Drop the location, bedrooms and beds columns, which also contain missing values
df=df.drop(columns=['location','beds','bedrooms'])
df

The mean value of the host_response_rate column is: 0.9069009209469064
The mean value of the host_acceptance_rate column is: 0.7919528061978829


Unnamed: 0,host_response_rate,host_acceptance_rate,response_rate,neighbourhood,room_type,accommodates,bathrooms,amenities,price,minimum_nights,maximum_nights,value_money_rate,cleanliness_score,checkin_score,communication_score,location_score,instant_bookable,Overall_rating
0,0.800000,0.170000,True,Manhattan,Entire home/apt,1,1.0,"[""Extra pillows and blankets"", ""Baking sheet"",...",150.0,30,1125,4.41,4.62,4.76,4.79,4.86,False,4.70
1,0.090000,0.690000,True,Brooklyn,Entire home/apt,3,1.0,"[""Extra pillows and blankets"", ""Luggage dropof...",75.0,1,730,4.64,4.49,4.78,4.80,4.71,False,4.45
2,1.000000,0.250000,True,Brooklyn,Entire home/apt,4,1.5,"[""Kitchen"", ""BBQ grill"", ""Cable TV"", ""Carbon m...",275.0,5,1125,5.00,5.00,5.00,5.00,4.50,False,5.00
3,1.000000,1.000000,True,Manhattan,Private room,2,1.0,"[""Room-darkening shades"", ""Lock on bedroom doo...",68.0,2,14,4.36,3.73,4.66,4.42,4.87,False,4.21
4,0.906901,0.791953,True,Manhattan,Private room,1,1.0,"[""Breakfast"", ""Carbon monoxide alarm"", ""Fire e...",75.0,2,14,4.92,4.82,4.97,4.95,4.94,False,4.91
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
28017,1.000000,1.000000,True,Queens,Private room,2,1.0,"[""Lock on bedroom door"", ""Hot water kettle"", ""...",89.0,1,365,1.00,5.00,5.00,5.00,3.00,True,5.00
28018,0.910000,0.890000,True,Brooklyn,Entire home/apt,6,1.0,"[""Kitchen"", ""Carbon monoxide alarm"", ""TV"", ""Wa...",1000.0,1,1,5.00,5.00,5.00,5.00,5.00,False,5.00
28019,0.990000,0.990000,True,Brooklyn,Private room,2,2.0,"[""Hangers"", ""Keypad"", ""Kitchen"", ""Carbon monox...",64.0,1,10,2.00,1.00,1.00,5.00,5.00,True,1.00
28020,0.900000,1.000000,True,Brooklyn,Entire home/apt,3,1.0,"[""Luggage dropoff allowed"", ""Security cameras ...",84.0,7,365,5.00,5.00,5.00,5.00,5.00,False,5.00


In [7]:
# Turn amenities into a boolean
# Note: astype(bool) is True for every non-empty string, so has_amenities is True for every row
df['has_amenities']=df['amenities'].astype(bool)
df.drop(columns=['amenities'],inplace=True)
df

Unnamed: 0,host_response_rate,host_acceptance_rate,response_rate,neighbourhood,room_type,accommodates,bathrooms,price,minimum_nights,maximum_nights,value_money_rate,cleanliness_score,checkin_score,communication_score,location_score,instant_bookable,Overall_rating,has_amenities
0,0.800000,0.170000,True,Manhattan,Entire home/apt,1,1.0,150.0,30,1125,4.41,4.62,4.76,4.79,4.86,False,4.70,True
1,0.090000,0.690000,True,Brooklyn,Entire home/apt,3,1.0,75.0,1,730,4.64,4.49,4.78,4.80,4.71,False,4.45,True
2,1.000000,0.250000,True,Brooklyn,Entire home/apt,4,1.5,275.0,5,1125,5.00,5.00,5.00,5.00,4.50,False,5.00,True
3,1.000000,1.000000,True,Manhattan,Private room,2,1.0,68.0,2,14,4.36,3.73,4.66,4.42,4.87,False,4.21,True
4,0.906901,0.791953,True,Manhattan,Private room,1,1.0,75.0,2,14,4.92,4.82,4.97,4.95,4.94,False,4.91,True
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
28017,1.000000,1.000000,True,Queens,Private room,2,1.0,89.0,1,365,1.00,5.00,5.00,5.00,3.00,True,5.00,True
28018,0.910000,0.890000,True,Brooklyn,Entire home/apt,6,1.0,1000.0,1,1,5.00,5.00,5.00,5.00,5.00,False,5.00,True
28019,0.990000,0.990000,True,Brooklyn,Private room,2,2.0,64.0,1,10,2.00,1.00,1.00,5.00,5.00,True,1.00,True
28020,0.900000,1.000000,True,Brooklyn,Entire home/apt,3,1.0,84.0,7,365,5.00,5.00,5.00,5.00,5.00,False,5.00,True
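
Because every value in `amenities` is a non-empty string, `astype(bool)` makes `has_amenities` True for every listing, so the new column carries no information (it later receives a weight of 0.0). One alternative, sketched below under the assumption that each entry is a JSON-formatted list (as the displayed values suggest), is to count the amenities. The sample strings are hypothetical:

```python
import json

import pandas as pd

# Hypothetical sample strings in the same JSON-list format as the amenities column
amenities_demo = pd.Series([
    '["Kitchen", "BBQ grill", "Cable TV"]',
    '["Hangers", "Keypad"]',
    '[]',
])

# astype(bool) is True for every non-empty string, even "[]", so it cannot separate listings
print(amenities_demo.astype(bool).tolist())

# Counting the parsed list gives a numeric feature that varies from listing to listing
n_amenities = amenities_demo.apply(lambda s: len(json.loads(s)))
print(n_amenities.tolist())
```

On the real data this would be `df['n_amenities'] = df['amenities'].apply(lambda s: len(json.loads(s)))`, provided every entry parses as JSON.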


In [8]:
df.describe()

Unnamed: 0,host_response_rate,host_acceptance_rate,accommodates,bathrooms,price,minimum_nights,maximum_nights,value_money_rate,cleanliness_score,checkin_score,communication_score,location_score,Overall_rating
count,28022.0,28022.0,28022.0,28022.0,28022.0,28022.0,28022.0,28022.0,28022.0,28022.0,28022.0,28022.0,28022.0
mean,0.906901,0.791953,2.874491,1.142174,154.228749,18.689387,78695.41,4.64767,4.613352,4.8143,4.808041,4.750393,4.683482
std,0.172697,0.214963,1.860251,0.421132,140.816605,25.569151,12829730.0,0.518023,0.573891,0.438603,0.464585,0.415717,0.505857
min,0.0,0.0,1.0,0.0,29.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,0.906901,0.791953,2.0,1.0,70.0,2.0,40.0,4.55,4.5,4.81,4.81,4.67,4.6
50%,0.906901,0.791953,2.0,1.0,115.0,30.0,1124.0,4.78,4.8,4.96,4.97,4.88,4.83
75%,1.0,0.95,4.0,1.0,180.0,30.0,1125.0,5.0,5.0,5.0,5.0,5.0,5.0
max,1.0,1.0,16.0,8.0,1000.0,1250.0,2147484000.0,5.0,5.0,5.0,5.0,5.0,5.0
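
`describe()` shows that `maximum_nights` has a 75th percentile of 1125 but a maximum of about 2.1 billion, an extreme outlier that the rest of the notebook leaves in place. One option from the data preparation list, finding and replacing outliers, is to cap values with Tukey's IQR rule. A sketch on hypothetical values; the 1.5 multiplier is the conventional default, not something tuned for this data:

```python
import pandas as pd

# Hypothetical maximum_nights values; the last one mimics the extreme maximum in describe()
max_nights = pd.Series([14, 365, 730, 1125, 1125, 2147483647], dtype="float64")

# Tukey's IQR rule: values more than 1.5 * IQR beyond the quartiles count as outliers
q1, q3 = max_nights.quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Cap (winsorize) instead of dropping rows, so no listings are lost
capped = max_nights.clip(lower=lower, upper=upper)
print(upper, capped.max())
```

Capping keeps every listing while preventing a single value from dominating a linear model's fit.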


In [9]:
# Check again whether there are any null values
df.isnull().values.any()

False

In [10]:
final_list=list(df.columns)
print(final_list)
df.columns

['host_response_rate', 'host_acceptance_rate', 'response_rate', 'neighbourhood', 'room_type', 'accommodates', 'bathrooms', 'price', 'minimum_nights', 'maximum_nights', 'value_money_rate', 'cleanliness_score', 'checkin_score', 'communication_score', 'location_score', 'instant_bookable', 'Overall_rating', 'has_amenities']


Index(['host_response_rate', 'host_acceptance_rate', 'response_rate',
       'neighbourhood', 'room_type', 'accommodates', 'bathrooms', 'price',
       'minimum_nights', 'maximum_nights', 'value_money_rate',
       'cleanliness_score', 'checkin_score', 'communication_score',
       'location_score', 'instant_bookable', 'Overall_rating',
       'has_amenities'],
      dtype='object')

In [11]:
print(df.dtypes)
df=df.drop(columns=['neighbourhood','room_type'])

host_response_rate      float64
host_acceptance_rate    float64
response_rate              bool
neighbourhood            object
room_type                object
accommodates              int64
bathrooms               float64
price                   float64
minimum_nights            int64
maximum_nights            int64
value_money_rate        float64
cleanliness_score       float64
checkin_score           float64
communication_score     float64
location_score          float64
instant_bookable           bool
Overall_rating          float64
has_amenities              bool
dtype: object
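
Instead of dropping `neighbourhood` and `room_type`, the two categorical columns could be one-hot encoded, a technique listed in Part 3. A minimal sketch with `pd.get_dummies` on a hypothetical four-row sample that reuses category values seen in the data:

```python
import pandas as pd

# Hypothetical sample rows using category values that appear in the listings data
listings_demo = pd.DataFrame({
    "neighbourhood": ["Manhattan", "Brooklyn", "Queens", "Brooklyn"],
    "room_type": ["Entire home/apt", "Private room", "Private room", "Entire home/apt"],
    "price": [150.0, 75.0, 89.0, 84.0],
})

# One 0/1 column per category; drop_first avoids dummies that are perfectly
# collinear with the intercept of a linear regression
encoded = pd.get_dummies(
    listings_demo, columns=["neighbourhood", "room_type"], drop_first=True, dtype=int
)
print(encoded.columns.tolist())
```

The dropped first category (here `Brooklyn` and `Entire home/apt`) becomes the reference level that the other dummy weights are measured against.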


In [12]:
print(df.dtypes)

host_response_rate      float64
host_acceptance_rate    float64
response_rate              bool
accommodates              int64
bathrooms               float64
price                   float64
minimum_nights            int64
maximum_nights            int64
value_money_rate        float64
cleanliness_score       float64
checkin_score           float64
communication_score     float64
location_score          float64
instant_bookable           bool
Overall_rating          float64
has_amenities              bool
dtype: object


## Part 4: Define Your Project Plan

Now that you understand your data, in the markdown cell below, define your plan to implement the remaining phases of the machine learning life cycle (data preparation, modeling, evaluation) to solve your ML problem. Answer the following questions:

* Do you have a new feature list? If so, what are the features that you chose to keep and remove after inspecting the data? 
* Explain different data preparation techniques that you will use to prepare your data for modeling.
* What is your model (or models)?
* Describe your plan to train your model, analyze its performance and then improve the model. That is, describe your model building, validation and selection plan to produce a model that generalizes well to new data. 

<Yes, I have a new feature list. I removed the columns with an object data type because the model needs numeric inputs, and converting columns with many different values per entry, such as `location`, into numbers would be complex. Specifically, I dropped `neighbourhood` and `room_type` (object columns) as well as `location`, `bedrooms`, and `beds` (columns with missing values). I kept the features with float or integer data types, and I turned the object column `amenities` into a boolean (`has_amenities`) so the model could use it.

The data preparation techniques I used were aimed at producing a condensed, clear, and non-repetitive data set. First, I looked through all the features and eliminated those that did not seem relevant to predicting the overall rating, keeping the number of features manageable. Next, I renamed features whose names were confusing. I then checked whether the data set had any null values, stored the names of the affected features in a list, and inspected their data types. Some of these features were objects that did not seem relevant to the overall review score. Finally, I replaced the missing values in the float columns `host_response_rate` and `host_acceptance_rate` with the column mean and dropped the remaining unnecessary columns.

My model is a linear regression model, since this is a regression problem. I chose it because its learned weights are easy to interpret and it can be trained on large data sets with minimal resources. This matters because Airbnb has a large number of hosts on its platform, so the model has to handle large quantities of data.



My plan to train the model uses scikit-learn's `LinearRegression` model. I started by separating the label from the features, and then I created the training and test data sets. After that, I fit a multiple linear regression model and printed its parameters: the intercept (which I refer to as alpha) and the weight for each feature. Then I evaluated the model on the test set using the R^2 score and the RMSE. The predictions were not perfect, but they were close to the actual test values, which suggests the model generalizes reasonably well to listings it has not seen. Because the model was only scored on the test set, a more direct check for overfitting is to also score it on the training set: similar training and test scores indicate that the model is not overfitting.>
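
The overfitting check described in the plan can be sketched as follows, on synthetic data from `make_regression` (a stand-in, not the Airbnb data): compare the training and test R^2 of a fitted `LinearRegression`, then use 5-fold cross-validation for an estimate that depends less on one particular split. The sample size, noise level, and fold count are assumptions:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in with 15 features, matching the notebook's feature count
X_demo, y_demo = make_regression(n_samples=500, n_features=15, noise=10.0, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.10, random_state=42)

demo_model = LinearRegression().fit(X_tr, y_tr)

# A large gap between these two scores would signal overfitting
train_r2 = demo_model.score(X_tr, y_tr)
test_r2 = demo_model.score(X_te, y_te)
print(f"train R^2 = {train_r2:.2f}, test R^2 = {test_r2:.2f}")

# 5-fold cross-validation on the training set only, so the test set stays untouched
cv_r2 = cross_val_score(LinearRegression(), X_tr, y_tr, cv=5, scoring="r2")
print(f"CV R^2 = {cv_r2.mean():.2f} +/- {cv_r2.std():.2f}")
```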

## Part 5: Implement Your Project Plan

<b>Task:</b> In the code cell below, import additional packages that you have used in this course that you will need to implement your project plan.

In [13]:
#importing packages
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

<b>Task:</b> Use the rest of this notebook to carry out your project plan. 

You will:

1. Prepare your data for your model.
2. Fit your model to the training data and evaluate your model.
3. Improve your model's performance by performing model selection and/or feature selection techniques to find best model for your problem.

Add code cells below and populate the notebook with commentary, code, analyses, results, and figures as you see fit. 
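
One possible model selection step, not used in this notebook, is to tune a regularized linear model such as ridge regression with `GridSearchCV`. Note that ridge's `alpha` is the regularization strength, unrelated to the intercept this notebook calls alpha. A sketch on synthetic data; the grid values and pipeline are assumptions to adapt:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the prepared features and label
X_demo, y_demo = make_regression(n_samples=500, n_features=15, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.10, random_state=42)

# Scale features so the ridge penalty treats them equally, then fit ridge regression
pipe = make_pipeline(StandardScaler(), Ridge())
param_grid = {"ridge__alpha": [0.01, 0.1, 1.0, 10.0, 100.0]}

# Pick the alpha with the lowest cross-validated RMSE on the training set
search = GridSearchCV(pipe, param_grid, cv=5, scoring="neg_root_mean_squared_error")
search.fit(X_tr, y_tr)

# best_estimator_.score returns R^2; search.score would return the negative RMSE instead
test_r2 = search.best_estimator_.score(X_te, y_te)
print("best alpha:", search.best_params_["ridge__alpha"])
print(f"test R^2 = {test_r2:.2f}")
```

Scaling matters here because the notebook's features range from rates between 0 and 1 to `maximum_nights` values in the billions.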

In [14]:
# Creating labeled examples from the data set
y=df['Overall_rating']
X=df.drop(columns=['Overall_rating'])
print(X)
print(y)


       host_response_rate  host_acceptance_rate  response_rate  accommodates  \
0                0.800000              0.170000           True             1   
1                0.090000              0.690000           True             3   
2                1.000000              0.250000           True             4   
3                1.000000              1.000000           True             2   
4                0.906901              0.791953           True             1   
...                   ...                   ...            ...           ...   
28017            1.000000              1.000000           True             2   
28018            0.910000              0.890000           True             6   
28019            0.990000              0.990000           True             2   
28020            0.900000              1.000000           True             3   
28021            0.906901              0.791953           True             1   

       bathrooms   price  minimum_night

In [15]:
#creating the training & test data sets
X_train,X_test,y_train, y_test= train_test_split(X,y, test_size=0.10, random_state=42)

In [16]:
#creating the model, fitting it and predicting with the model
model=LinearRegression()
model.fit(X_train,y_train)
prediction=model.predict(X_test)

In [17]:
#inspecting parameters
print("Summary for Feature 1: \n")
print("Weight=",model.coef_[0])
print('Alpha=',model.intercept_)

Summary for Feature 1: 

Weight= 0.04725312496105383
Alpha= -0.14276953461543052


In [18]:
print("The model summary:")
# Print the y-intercept
print("Intercept: Alpha= " , model.intercept_)
# Print the weight for each feature, numbered from 1
for i, w in enumerate(model.coef_, start=1):
    print("Feature",i, "weight=" , w )

The model summary:
Intercept: Alpha=  -0.14276953461543052
Feature 1 weight= 0.04725312496105383
Feature 2 weight= -0.02135070354619952
Feature 3 weight= -9.71445146547012e-17
Feature 4 weight= -0.0032301703856688897
Feature 5 weight= 5.228549632961512e-05
Feature 6 weight= 9.462852116214288e-05
Feature 7 weight= 0.0002083893439190083
Feature 8 weight= -6.577370592619758e-12
Feature 9 weight= 0.3856976209853192
Feature 10 weight= 0.27055338014992103
Feature 11 weight= 0.11462654621350064
Feature 12 weight= 0.20655213271157252
Feature 13 weight= 0.04404505989749916
Feature 14 weight= -0.015841751935416974
Feature 15 weight= 0.0
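
The loop above numbers the weights but does not name them. Since `X` was built by dropping `Overall_rating` from `df`, the features follow the final `dtypes` listing, so Feature 15 is `has_amenities`, and its weight of 0.0 is consistent with that column being constant. A small sketch, on hypothetical toy data, that pairs each weight with its column name:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical toy features; has_amenities is constant, like the notebook's column
X_toy = pd.DataFrame({
    "cleanliness_score": [4.62, 4.49, 5.00, 3.73, 4.82, 4.10],
    "checkin_score": [4.76, 4.78, 5.00, 4.66, 4.97, 4.20],
    "has_amenities": [True] * 6,
})
y_toy = pd.Series([4.70, 4.45, 5.00, 4.21, 4.91, 4.00])

toy_model = LinearRegression().fit(X_toy, y_toy)

# Pair each weight with its column name, largest magnitude first
weights = pd.Series(toy_model.coef_, index=X_toy.columns).sort_values(key=abs, ascending=False)
print("Intercept (alpha):", toy_model.intercept_)
print(weights)
```

In the notebook, the equivalent is `pd.Series(model.coef_, index=X.columns)`. Keep in mind that the raw weights are on each feature's own scale, so a small weight on `price` is not directly comparable with a large weight on a 0 to 5 score.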


In [19]:
print('\n Overall Model Summary\n\nRMSE =   %.2f'
      % np.sqrt(mean_squared_error(y_test, prediction)))
print(' R^2 =   %.2f'
      % r2_score(y_test, prediction))


 Overall Model Summary

RMSE =   0.24
 R^2 =   0.77
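
An RMSE is easiest to judge against a baseline. Predicting the training-set mean for every listing gives an RMSE close to the standard deviation of the test labels; the label's standard deviation in `describe()` is about 0.51, so an RMSE of 0.24 is likely around half the baseline error. A sketch with scikit-learn's `DummyRegressor` on hypothetical labels and predictions:

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.metrics import mean_squared_error

# Hypothetical labels and model predictions on the 0-5 rating scale
y_train_demo = np.array([4.70, 4.45, 5.00, 4.21, 4.91, 5.00, 4.60])
y_test_demo = np.array([4.80, 3.90, 5.00, 4.50])
model_pred = np.array([4.70, 4.10, 4.90, 4.60])

# Baseline: always predict the training-set mean (the features are ignored, so zeros suffice)
baseline = DummyRegressor(strategy="mean").fit(np.zeros((len(y_train_demo), 1)), y_train_demo)
baseline_pred = baseline.predict(np.zeros((len(y_test_demo), 1)))

rmse_model = np.sqrt(mean_squared_error(y_test_demo, model_pred))
rmse_baseline = np.sqrt(mean_squared_error(y_test_demo, baseline_pred))
print(f"model RMSE = {rmse_model:.2f}, baseline RMSE = {rmse_baseline:.2f}")
```

In the notebook, the baseline would be fit on `X_train, y_train` and scored on `X_test, y_test` in the same way.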


In [20]:
# Conclusion: on the held-out test set the model has an RMSE of about 0.24. The overall rating
# ranges from 0 to 5 (standard deviation of about 0.51 in the full data set), so the typical
# prediction error is about a quarter of a rating point. The R^2 of 0.77 means that the
# features explain about 77% of the variance in the overall rating on the test set.
# The largest weights belong to the review sub-scores (features 9 to 12: value, cleanliness,
# check-in and communication), which are only known once a listing has been reviewed.
# With additional training data and further feature selection, this model could be used to
# estimate the overall rating of an Airbnb listing from the features used here.