<a href="https://colab.research.google.com/github/shashankmorker/AIR_BNB-EDA/blob/main/Capstone_Project_EDA_Airbnb_Booking_Analysis_Version_2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    -



##### **Project Type**    - EDA/Regression/Classification/Unsupervised
##### **Contribution**    - Individual
##### **Name**            - Shashank Morker


# **Project Summary -**




* The objective of this analysis is to identify trends across all variables and understand the factors affecting Airbnb prices in New York City. The insights derived from our study are beneficial for both travelers and hosts in the city, and also provide valuable information for Airbnb's business strategy.

* The project involves exploring and cleaning a dataset to make it suitable for analysis. The exploration phase includes understanding the data's characteristics such as data types, missing values, and value distributions. The cleaning phase involves rectifying any inconsistencies in the data, such as errors, missing values, or duplicate entries, and removing outliers.

* This process ensures the data is free from errors and ready for further research, which is a crucial step in any data analysis project. It allows us to work with high-quality data and avoid potential biases or errors that could distort the results.

* Once the data is cleaned and prepared, it is explored and summarized by describing the data, creating visualizations, and identifying patterns and trends. This exploration may reveal relationships between variables or the root causes of certain patterns or trends.

* Data visualization is used to understand patterns in Airbnb data. Various graphs and charts are created, with observations and insights noted beneath each one, to help us better understand the data and identify useful insights and patterns.

* The insights gathered from this process will be useful for future Airbnb analysis and decision-making. Furthermore, our analysis provides valuable information for travelers and hosts in the city.
















# **GitHub Link -**

Provide your GitHub Link here.

# **Problem Statement**


**Write Problem Statement Here.**

#### **Define Your Business Objective?**

Answer Here.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure that, for each chart, the following format is followed.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the visualization in a structured way while following the "UBM" rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings("ignore")

### Dataset Loading

In [None]:
# Load Dataset
from google.colab import drive
drive.mount('/content/drive')


In [None]:
data = pd.read_csv('/content/drive/MyDrive/Airbnb csv/Airbnb NYC 2019.csv')

### Dataset First View

In [None]:
# Dataset First Look
df = pd.DataFrame(data)
df

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
df.shape # Returns the total number of rows and columns in the dataframe.

In [None]:
df.columns # These are the columns in our dataframe.

#### **I am renaming a few columns for better readability.**

In [None]:
rename_col = {'id':'listing_id','name':'listing_name','number_of_reviews':'total_reviews','calculated_host_listings_count':'host_listings_count'}

In [None]:
# Use the pandas rename method to rename the selected columns
df = df.rename(columns = rename_col)
df.head(2)

### Dataset Information

In [None]:
# Dataset Info
df.info() # this tells us non-Null count and data type.

#### Duplicate Values

In [None]:
# Count duplicate rows first, then drop them
print(f"Duplicate rows found: {df.duplicated().sum()}")
df = df.drop_duplicates()
df.count()

In [None]:
df.duplicated().sum()

##### **We do not have duplicate rows in our dataframe.**

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
df.isna().sum()

##### **We have very few NA values in listing_name and host_name, so I will remove those rows in the upcoming steps.**
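
To judge how few the missing values really are, it can help to look at the share of missing values per column next to the raw count. The following is a minimal sketch on a made-up frame; the `toy` data and the `na_summary` name are illustrative assumptions and are not taken from the Airbnb file.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the Airbnb data (all values are made up).
toy = pd.DataFrame({
    "listing_name": ["Loft", None, "Studio", "Flat"],
    "host_name": ["Ann", "Bob", None, "Dee"],
    "reviews_per_month": [0.5, np.nan, np.nan, 1.2],
})

# Count and percentage of missing values per column.
na_summary = pd.DataFrame({
    "missing": toy.isna().sum(),
    "percent": (toy.isna().mean() * 100).round(1),
})
print(na_summary)
```

Applied to the real dataframe, the same two expressions would show which columns can be cleaned by dropping a handful of rows and which would cost a large share of the data.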

### What did you know about your dataset?

* This Airbnb dataset from New York contains nearly 49,000 observations and 16 columns.

* The data contains both categorical and numerical values, providing a wide range of information about the listings.

* This dataset may be useful for analyzing trends and patterns in the New York Airbnb market, as well as gaining insights into the preferences and behavior of Airbnb users in the area.

* This dataset contains information about Airbnb listings in New York City in 2019. By analyzing this data, we may be able to understand the trends and patterns of Airbnb use in New York City.





## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
df.columns

In [None]:
# Dataset Describe
df.describe()

### **Variables Description**

* **Listing_id** :- This is a unique identifier for each listing in the dataset.

* **Listing_name**:- This is the name or title of the listing, as it appears on the Airbnb website.

* **Host_id**:- This is a unique identifier for each host in the dataset.

* **Host_name** :- This is the name of the host as it appears on the Airbnb website.

* **Neighbourhood_group**:- This is a grouping of neighborhoods in New York City, such as Manhattan or Brooklyn.

* **Neighbourhood** :- This is the specific neighborhood in which the listing is located.

* **Latitude** :- This is the geographic latitude of the listing.

* **Longitude**:- This is the geographic longitude of the listing.

* **Room_type** :- This is the type of room or property being offered, such as an entire home, a private room, or a shared room.

* **Price** :- This is the nightly price for the listing, in US dollars.

* **Minimum_nights** :- This is the minimum number of nights that a guest must stay at the listing.

* **Total_reviews** :- This is the total number of reviews that the listing has received.

* **Reviews_per_month**:- This is the average number of reviews that the listing receives per month.

* **Host_listings_count**:- This is the total number of listings that the host has on Airbnb.

* **Availability_365** :- This is the number of days in the next 365 days that the listing is available for booking.







### **Check Unique Values for each variable.**

In [None]:
# Check Unique Values for each variable.
# All listing ids are unique, so every row is a distinct listing.
df['listing_id'].nunique()

In [None]:
# There are 221 unique neighbourhoods in the dataset
df['neighbourhood'].nunique()

In [None]:
# There are 5 unique neighbourhood groups in the dataset
df['neighbourhood_group'].nunique()

In [None]:
# There are 11453 distinct host names in Airbnb NYC (different hosts can share a name; host_id identifies a host uniquely)
df['host_name'].nunique()

In [None]:
# Most of the listing names in the dataset are distinct
df['listing_name'].nunique()

In [None]:
# Listings under the host name David cover 402 distinct listing names.
df[df['host_name']=='David']['listing_name'].nunique()


In [None]:
# The same host name can have many listings in one neighbourhood group with different room types, as well as listings in other neighbourhoods.
df.loc[(df['neighbourhood_group']=='Manhattan') & (df['host_name']=='John')]


In [None]:
# There are a few listings where the listing name is the same as the host name.
df[df['listing_name']==df['host_name']].head()

## 3. ***Data Wrangling***

### Data Wrangling Code

### **Note:**
##### **I tried removing all rows that have null values, but that removed about 20% of the data, which is not acceptable. Instead, I removed the last_review column, which I felt was not that useful, so that the rest of the data remains available to work on.**
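
The trade-off described in the note can be measured before committing to either approach. The sketch below uses a made-up frame and a hypothetical helper, `pct_rows_lost`, to compare how many rows `dropna()` removes with and without the sparse column; none of the values come from the real dataset.

```python
import pandas as pd

# Toy frame (made-up values): last_review is missing for listings without reviews.
toy = pd.DataFrame({
    "host_name": ["Ann", "Bob", None, "Dee", "Eve"],
    "last_review": ["2019-05-01", None, "2019-06-10", None, "2019-01-03"],
    "price": [100, 80, 150, 60, 200],
})

def pct_rows_lost(frame: pd.DataFrame) -> float:
    """Percentage of rows that dropna() would remove from the frame."""
    return round((1 - len(frame.dropna()) / len(frame)) * 100, 1)

# Dropping every row that has any missing value.
loss_all = pct_rows_lost(toy)

# Dropping the sparse column first, then the remaining incomplete rows.
loss_after_drop = pct_rows_lost(toy.drop(columns="last_review"))

print(loss_all, loss_after_drop)
```

In this toy case, removing the sparse column first keeps far more rows, which is the same reasoning the note applies to `last_review`.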

In [None]:
# Write your code to make your dataset analysis ready.
df = df.drop(['last_review'], axis=1) # Removed last_review column from dataframe.

In [None]:
# Inspect a random sample of rows to spot-check for missing values
df.sample(15) # We are checking 15 random rows for NA values.

#### **I am filling missing values in reviews_per_month column.**

In [None]:
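# Fill missing values with the column mean; assigning the result back avoids chained in-place modification
df['reviews_per_month'] = df['reviews_per_month'].fillna(df['reviews_per_month'].mean())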
df['reviews_per_month'] = df['reviews_per_month'].round(1)

In [None]:
df.sample(15)

In [None]:
df = df.dropna()

In [None]:
df.isna().sum() # We don't have NA values now.

In [None]:
df.shape # This is the current number of rows and columns in our dataframe.

## **What all manipulations have you done and insights you found?**

### **Manipulations done:**
I was given a dataset of Airbnb listings. I noticed that there are a few missing names in the listing_name and host_name columns, and over 10k missing values in the last_review and reviews_per_month columns. I therefore deleted the last_review column and filled the missing values in reviews_per_month with the column mean. Lastly, I removed the few rows with missing names in listing_name and host_name to finish cleaning the data.

### **Insights found:**

* There are a few listings where the listing name is the same as the host name.

* Listings under the host name David cover 402 distinct listing names.

* Most of the listing names in the dataset are distinct.

* There are 11453 distinct host names in Airbnb NYC.

* There are 5 unique neighbourhood groups in the dataset.

* There are 221 unique neighbourhoods in the dataset.

* All the listing ids are unique, so each listing is distinct.
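
One caveat about the host figures: counting `host_name` counts names, not hosts, because different hosts can share a first name. A minimal sketch of the difference, on a made-up frame whose column names mirror the dataset (the values are illustrative assumptions):

```python
import pandas as pd

# Toy frame (made-up values): two different hosts share the name "David".
toy = pd.DataFrame({
    "host_id": [1, 1, 2, 3],
    "host_name": ["David", "David", "David", "Maria"],
})

distinct_names = toy["host_name"].nunique()   # counts distinct names
distinct_hosts = toy["host_id"].nunique()     # counts distinct host accounts

# Listings per host account, keyed by the unique identifier.
listings_per_host = toy.groupby("host_id").size()

print(distinct_names, distinct_hosts)
print(listings_per_host)
```

Grouping by `host_id` instead of `host_name` would therefore be the safer basis for any per-host statistic.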









### **Columns:**
#### Room_type, neighbourhood, neighbourhood_group are categorical, hence I am changing their type to categorical.

In [None]:
df.room_type = df.room_type.astype("category")

In [None]:
df.neighbourhood = df.neighbourhood.astype("category")

In [None]:
df.neighbourhood_group = df.neighbourhood_group.astype("category")

In [None]:
df.info() # We can see their type has been changed to category.

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

#### **Checking outliers in price column using boxplot.**

In [None]:
# Chart - 1 visualization code
sns.boxplot(x = 'price', data = df)
plt.show() # we can see we have many outliers

#### **Removing outliers using Inter Quartile Range.**

In [None]:
def remove_outliers(df, column_name):
    """
    Remove outliers from a specific column in a DataFrame using the IQR method.

    Parameters:
    - df: pandas DataFrame
    - column_name: Name of the column for outlier removal

    Returns:
    - DataFrame with outliers removed
    """

    # Calculate the first and third quartiles
    Q1 = df[column_name].quantile(0.25)
    Q3 = df[column_name].quantile(0.75)

    # Calculate the IQR (Interquartile Range)
    IQR = Q3 - Q1

    # Define the lower and upper bounds for outlier removal
    lower_bound = Q1 - 1.5 * IQR
    upper_bound = Q3 + 1.5 * IQR

    # Remove outliers
    df_no_outliers = df[(df[column_name] >= lower_bound) & (df[column_name] <= upper_bound)]

    return df_no_outliers

updated_df = remove_outliers(df, 'price')

In [None]:
# Here I am checking how much data is removed of original data.
c = round(((df.shape[0] - updated_df.shape[0]) / df.shape[0])*100 , 2)
print(f'Removed outliers, which made up {c} % of the original data')

#### **This is a new boxplot with fewer outliers.**

In [None]:
sns.boxplot(x = 'price', data = updated_df)
plt.show()

##### **1. Why did you pick the specific chart?**

I picked a boxplot mainly to check how many outliers there are in the price column. After checking, I removed the outliers using the IQR method.

##### **2. What is/are the insight(s) found from the chart?**

The chart shows that the number of outliers in the **"price"** column is very high. After removing them, I found that the removed outliers made up **"6.08 %"** of the original data, which is acceptable considering the total amount of data.
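
To make the IQR rule concrete, here is a small worked example on made-up prices. It follows the same steps as `remove_outliers` above; the numbers are illustrative assumptions and have nothing to do with the real price column.

```python
import pandas as pd

# Made-up prices with one obvious outlier.
prices = pd.Series([50, 60, 70, 80, 90, 100, 110, 1000])

# First and third quartiles, and the interquartile range.
q1 = prices.quantile(0.25)
q3 = prices.quantile(0.75)
iqr = q3 - q1

# Values outside [lower, upper] are treated as outliers.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr
kept = prices[(prices >= lower) & (prices <= upper)]

# Percentage of observations removed by the rule.
removed_pct = round((len(prices) - len(kept)) / len(prices) * 100, 2)
print(q1, q3, lower, upper, removed_pct)
```

Here the quartiles are 67.5 and 102.5, so the bounds are 15 and 155 and only the value 1000 is dropped.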

##### **3. Will the gained insights help creating a positive business impact?**
#### **Are there any insights that lead to negative growth? Justify with specific reason.**

The gained insight can be used to do further analysis and gather more data.

There are no negative insights from this chart.

#### Chart - 2

#### **After removing the outliers from 'Price' column, I used that to find the price distribution of Airbnb.**

In [None]:
# Chart - 2 visualization code

# Create a figure with a custom size
plt.figure(figsize=(12, 6))

# Set the seaborn theme to darkgrid
sns.set_theme(style='darkgrid')

# Create a density histogram of the 'price' column with a KDE overlay
# using sns.histplot (sns.distplot is deprecated) and specifying the color as red
sns.histplot(updated_df['price'], kde=True, stat='density', color='r')

# Add labels to the x-axis and y-axis
plt.xlabel('Price', fontsize=14)
plt.ylabel('Density', fontsize=14)

# Add a title to the plot
plt.title('Distribution of Airbnb Prices',fontsize=15)
plt.show()

##### **1. Why did you pick the specific chart?**

I picked this chart to get an idea of the distribution of Airbnb prices, including the range from minimum to maximum and the peak range based on density.

##### **2. What is/are the insight(s) found from the chart?**

* After outlier removal, the prices charged on Airbnb appear to range from **20 dollars** to **330 dollars**.

* Most listings are priced between **50 dollars** and **150 dollars**, compared to the listings below 50 dollars and above 150 dollars.

* Listings priced **above 250 dollars** are comparatively rare, as the density of listings drops significantly in this range.
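
Statements about price bands like the ones above can be checked numerically instead of being read off the curve. A minimal sketch using `pd.cut` on made-up prices; the band edges and labels are illustrative assumptions.

```python
import pandas as pd

# Made-up prices standing in for the cleaned price column.
prices = pd.Series([30, 55, 75, 120, 149, 180, 260, 320])

# Assign each price to a band; intervals are right-inclusive, e.g. (50, 150].
bands = pd.cut(
    prices,
    bins=[0, 50, 150, 250, 350],
    labels=["<=50", "51-150", "151-250", "251-350"],
)

# Share of listings in each band, in percent, ordered by band.
band_share = (bands.value_counts(normalize=True).sort_index() * 100).round(1)
print(band_share)
```

Running the same binning on `updated_df['price']` would give the exact share of listings in the 50 to 150 dollar range.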

##### **3. Will the gained insights help creating a positive business impact?**
**Are there any insights that lead to negative growth? Justify with specific reason.**

The chart shows that most listings are priced between 50 dollars and 150 dollars, which suggests that this is the range where hosts expect the most demand, so the company should make sure enough rooms are available in this price range. Very few listings are priced above 300 dollars. Part of this drop comes from the outlier removal, so the chart alone does not show that such listings receive no bookings at all.

#### Chart - 3

#### **Checking the highest and lowest listing counts in different neighbourhood groups.**

In [None]:
# Chart - 3 visualization code
sns.countplot(y = df["neighbourhood_group"], order = df["neighbourhood_group"].value_counts().index)
plt.title('Different Neighbourhood Groups')
plt.show()

##### **1. Why did you pick the specific chart?**

I chose a count plot to compare the number of listings in different neighbourhood groups and find the highest and lowest.

##### **2. What is/are the insight(s) found from the chart?**

* The maximum number of listings is in Manhattan and Brooklyn.

* The least number of listings is in Staten Island.
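
The count plot can be complemented with the percentage share of each neighbourhood group. A minimal sketch with made-up group labels; the counts are illustrative assumptions, not the real distribution.

```python
import pandas as pd

# Made-up neighbourhood_group column: 10 listings in total.
groups = pd.Series(
    ["Manhattan"] * 4 + ["Brooklyn"] * 4 + ["Queens", "Staten Island"]
)

# Percentage share of listings per group.
share = (groups.value_counts(normalize=True) * 100).round(1)

# Combined share of the two largest groups.
top_two = share.loc[["Manhattan", "Brooklyn"]].sum()
print(share)
print(top_two)
```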




##### **3. Will the gained insights help creating a positive business impact?**
Are there any insights that lead to negative growth? Justify with specific reason.

Since the maximum number of listings is in Manhattan and Brooklyn, the company should focus on expanding further in these neighbourhood groups. It should also focus on improving the booking rate in the other neighbourhood groups.

#### **Chart - 4**

#### **Checking top 20 neighbourhoods on the basis of no of listings in entire NYC.**

In [None]:
# Chart - 4 visualization code

#checking top 20 neighbourhoods on the basis of no of listings in entire NYC!

top_20_neigbourhood = df['neighbourhood'].value_counts()[:20]
top_20_neigbourhood.plot(kind = 'bar',figsize=(12, 6), color = 'purple')
plt.xlabel('Neighbourhood')
plt.ylabel('Counts in entire NYC')
plt.title('Top neighbourhoods in entire NYC on the basis of count of listings')
plt.show()

##### **1. Why did you pick the specific chart?**

I picked this chart to show which neighbourhoods have the most listings.

##### **2. What is/are the insight(s) found from the chart?**

* The top neighborhoods in New York City in terms of listing counts are Williamsburg, Bedford-Stuyvesant, Harlem, Bushwick, and the Upper West Side.

* The top neighborhoods are primarily located in Brooklyn and Manhattan. This may be due to the fact that these boroughs have a higher overall population and a higher demand for housing.

* The number of listings alone may not be indicative of the overall demand for housing in a particular neighborhood, as other factors such as the cost of living and the availability of housing may also play a significant role.






##### **3. Will the gained insights help creating a positive business impact?**
**Are there any insights that lead to negative growth? Justify with specific reason.**

With the help of this data, we can identify target neighbourhoods in which to improve services. It can also help profit by limiting the advertising area, which minimizes cost. We should also take the cost of living and the availability of housing into consideration, and set prices based on demand and area to increase profit.

#### Chart - 5

#### **Checking top 10 hosts in the Airbnb NYC dataset based on the number of listings each host has.**

In [None]:
# Chart - 5 visualization code

# create a new DataFrame that displays the top 10 hosts in the Airbnb NYC dataset based on the number of listings each host has
top_10_hosts = df['host_name'].value_counts()[:10].reset_index()

# rename the columns of the resulting DataFrame to 'Host_name' and 'Total_listings'
top_10_hosts.columns = ['Host_name', 'Total_listings']

# display the resulting DataFrame
top_10_hosts

In [None]:
top_hosts = df['host_name'].value_counts()[:10]

# Create a bar plot of the top 10 hosts
top_hosts.plot(kind='bar', color='chocolate', figsize=(12, 6))

# Set the x-axis label
plt.xlabel('Top10_hosts', fontsize=14)

# Set the y-axis label
plt.ylabel('Total_NYC_listings', fontsize=14)

# Set the title of the plot
plt.title('Top 10 hosts on the basis of number of listings in entire NYC', fontsize=15)
plt.show()

##### 1. Why did you pick the specific chart?

I chose this graph because this data is categorical and I need to find top 10 hosts on the basis of their listings in entire NYC.

##### 2. What is/are the insight(s) found from the chart?

* The top three host names in terms of total listings are Michael, David and Sounder (NYC), with 417, 403, and 327 listings, respectively. Because the count is grouped by host_name, common first names such as Michael and David combine the listings of every host with that name.

* There is a significant difference even among the top hosts. For example, Sounder (NYC) has 327 listings, compared to Michael's 417 listings.

* Maria has 204 listings in this top 10 list, which is significantly less than Michael's 417 listings. This may indicate that the scale of different Airbnb hosts varies greatly.

* Only a few host names have a very large number of listings, while most have far fewer. This suggests that a handful of hosts operate at a much larger scale than the typical host.
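
To put the top-host counts in perspective, their combined share of all listings can be worked out with plain arithmetic. The sketch below uses the three counts quoted above and, as an assumption, takes the total number of listings to be the sum of the three room-type counts reported under Chart 6.

```python
# Listing counts of the top three host names, as quoted in the insights above.
top_three = [417, 403, 327]

# Assumption: total listings = sum of the room-type counts reported under Chart 6.
total_listings = 25393 + 22306 + 1159

# Combined share of the top three host names, in percent.
top_three_share = round(sum(top_three) / total_listings * 100, 2)
print(sum(top_three), total_listings, top_three_share)
```

Under that assumption the three names account for 1147 of 48858 listings, roughly 2.35%.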



##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

There are 3 main players here with the most listings on Airbnb. The presence of such large players could make it tough for new hosts to enter the market. Airbnb should focus on introducing schemes and benefits for emerging hosts, which would ultimately be useful for the company too.

#### Chart - 6

#### **Checking total counts of each room type.**

In [None]:
# create a new DataFrame that displays the number of listings of each room type in the Airbnb NYC dataset
top_room_type = df['room_type'].value_counts().reset_index()

# rename the columns of the resulting DataFrame to 'Room_Type' and 'Total_counts'
top_room_type.columns = ['Room_Type', 'Total_counts']

# display the resulting DataFrame
top_room_type

In [None]:
# Set the figure size
plt.figure(figsize=(10, 6))

# Get the room type counts
room_type_counts = df['room_type'].value_counts()

# Set the labels and sizes for the pie chart
labels = room_type_counts.index
sizes = room_type_counts.values

# Create the pie chart
plt.pie(sizes, labels=labels, autopct='%1.1f%%')

# Add a legend to the chart
plt.legend(title='Room Type', bbox_to_anchor=(0.8, 0, 0.5, 1), fontsize=12)
plt.title('Percentage use of different room_types')
# Show the plot
plt.show()

##### 1. Why did you pick the specific chart?

To show the percentage use of different room_types.

##### 2. What is/are the insight(s) found from the chart?

* Airbnb has 25393 listings for entire homes or apartments, followed by 22306 listings for private rooms and 1159 listings for shared rooms.

* The number of listings for each room type varies greatly. There are about 22 times more listings for entire homes or apartments than there are for shared rooms.

* According to the data, travelers who use Airbnb have a wide range of accommodation options to choose from, including private rooms and entire homes or apartments.
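
The shares and the ratio between room types follow directly from the three counts listed above. A short arithmetic check; the dictionary keys are illustrative labels for the three room types.

```python
# Listing counts per room type, as reported in the insights above.
counts = {
    "Entire home/apt": 25393,
    "Private room": 22306,
    "Shared room": 1159,
}

total = sum(counts.values())

# Percentage share of each room type, rounded to one decimal place.
percent = {name: round(n / total * 100, 1) for name, n in counts.items()}

# How many entire homes/apartments there are per shared room.
ratio = round(counts["Entire home/apt"] / counts["Shared room"], 1)

print(total, percent, ratio)
```

The percentages come out as 52.0, 45.7 and 2.4, and the ratio of entire homes or apartments to shared rooms is about 21.9.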

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Entire homes/apartments make up 52.0 % of listings, private rooms 45.7 %, and shared rooms just 2.4 %. Since supply is concentrated in entire homes/apartments and private rooms, this suggests that these are the room types most guests look for, so the company should focus on the listing and availability of entire homes/apartments and private rooms.

#### Chart - 7

#### **Checking average prices in different room types.**  

In [None]:
# Chart - 7 visualization code.
avg_price = df.groupby(["room_type"])["price"].mean()
ap = avg_price.plot.bar(figsize = (8,4), fontsize = 10, color = 'm')
ap.set_xlabel("Room Type", fontsize = 12)
ap.set_ylabel("Average Price", fontsize = 12)
ap.set_title("Average price in different room type", fontsize = 14)
plt.show()

##### 1. Why did you pick the specific chart?

To compare the average price of each room type.

##### 2. What is/are the insight(s) found from the chart?

This graph shows that the mean price of an entire home/apt is the highest of the three room types. This points to a profit-oriented strategy of focusing new listings on entire homes/apts.
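
Since the price column contains many outliers, the mean per room type can be pulled upward by a few very expensive listings. A minimal sketch comparing mean and median on made-up prices; the values are illustrative assumptions.

```python
import pandas as pd

# Made-up prices: one extreme value among the entire homes/apartments.
toy = pd.DataFrame({
    "room_type": ["Entire home/apt"] * 4 + ["Private room"] * 4,
    "price": [150, 160, 170, 2000, 60, 70, 80, 90],
})

# Mean and median price per room type, side by side.
summary = toy.groupby("room_type")["price"].agg(["mean", "median"])
print(summary)
```

In this toy case the single extreme price lifts the mean for entire homes/apartments to 620 while the median stays at 165, so reporting both gives a more robust comparison.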

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Because entire homes/apts command the highest average price, the company should focus on their availability and on providing the best service to customers for this room type, in order to earn positive customer feedback and reviews, which will ultimately boost bookings.

#### Chart - 8

#### **Checking average minimum stays in different room types.**

In [None]:
# Chart - 8 visualization code.
df.groupby('room_type')['minimum_nights'].mean().plot(figsize= (4,4), kind='bar', color='peru')
plt.title('Average minimum nights in different room types', fontsize = 14)
plt.xlabel('Room types', fontsize = 12)
plt.ylabel('Average minimum nights', fontsize = 12 )
plt.show()

##### 1. Why did you pick the specific chart?

This chart was chosen to show the average minimum number of nights required for different room types.

##### 2. What is/are the insight(s) found from the chart?

This chart shows the average minimum stay per room type, which helps in planning adequate facilities for a typical stay and in estimating when a room type will be available for booking again.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Using this chart, we can estimate business profit with simple calculations, such as how many times each room type can be booked in a month, and decide on a booking price that takes expenses into account.

#### Chart - 9

#### **Checking location of different neighbourhood groups in the city.**

In [None]:
# Chart - 9 visualization code.

# Chart - visualization of each neighbourhood_group using latitude and longitude
plt.figure(figsize = (14,7))
sns.scatterplot(x = df["longitude"], y = df["latitude"], hue = df["neighbourhood_group"],palette='colorblind')
plt.show()

##### 1. Why did you pick the specific chart?

This chart shows the location with respect to longitude and latitude of different neighbourhood groups in the city.

##### 2. What is/are the insight(s) found from the chart?

The scatter plot shows that Manhattan and Brooklyn sit next to each other and cover a similar range of longitudes, and together they account for nearly 85% of listings. Because Staten Island is on the outskirts, it has far fewer listings.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

This chart shows that locations closer to popular hotspots can attract more bookings, so the company must work hard to list a greater variety of popular room types in order to increase overall revenue by attracting customers.

#### Chart - 10

In [None]:
# Chart - 10 visualization code
plt.figure(figsize = (14,7))

# create a scatter plot that displays the longitude and latitude of the listings in the Airbnb NYC dataset with room_types.
ax = sns.scatterplot(x= df.longitude, y = df.latitude, hue = df.room_type, palette='colorblind')

# set the title of the plot
ax.set_title('Distribution of type of rooms across NYC', fontsize='14')
plt.show()

##### 1. Why did you pick the specific chart?

I chose scatterplot to see the distribution of types of rooms across NYC.

##### 2. What is/are the insight(s) found from the chart?

From the above chart we can see that entire homes/apartments and private rooms are spread across the whole city, while shared rooms appear in very few areas.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Most listings are entire apartments/homes and private rooms, which suggests that most customers prefer these room types. The company should focus on expanding them to maximize profits.

#### Chart - 11

In [None]:
# Chart - 11 visualization code.
# Group the Airbnb data by neighbourhood group
reviews_by_neighbourhood_group = df.groupby("neighbourhood_group")["total_reviews"].max()

# Create a pie chart to visualize the distribution of maximum number of reviews among different neighbourhood groups
plt.pie(reviews_by_neighbourhood_group, labels=reviews_by_neighbourhood_group.index, autopct='%1.1f%%')

# Add a title to the chart
plt.title("Number of maximum Reviews by Neighborhood Group in NYC", fontsize='15')

# Display the chart
plt.show()

##### 1. Why did you pick the specific chart?

To check which neighbourhood group in NYC has the listing with the maximum number of reviews.

##### 2. What is/are the insight(s) found from the chart?

* The chart compares the highest review count of any single listing in each neighbourhood group. By that measure, the most popular neighbourhood groups for reviewing appear to be Queens and Manhattan.

* Queens has the highest share of the chart (26.5%), but the third most listings (after Manhattan and Brooklyn). This suggests that Queens, despite having fewer listings than Manhattan and Brooklyn, may be a particularly popular destination for tourists or visitors.

* Both Manhattan and Brooklyn also have a high share, with 25.5% and 20.5%, respectively. This indicates that they are popular tourist or visitor destinations as well, and they have more listings than Queens.

* Based on these maximum review counts, the data suggests that Queens, Manhattan, and Brooklyn are the most popular neighbourhood groups for tourists or visitors.
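
Because the pie chart is built from `max()`, each slice reflects the single most-reviewed listing in a group, which can differ from the group's total number of reviews. A minimal sketch of the difference on made-up data; the values are illustrative assumptions.

```python
import pandas as pd

# Made-up review counts: Queens has one heavily reviewed listing,
# Manhattan has more listings with moderate review counts.
toy = pd.DataFrame({
    "neighbourhood_group": ["Queens", "Queens", "Manhattan", "Manhattan", "Manhattan"],
    "total_reviews": [600, 10, 300, 250, 200],
})

# Maximum per listing versus total reviews per neighbourhood group.
stats = toy.groupby("neighbourhood_group")["total_reviews"].agg(["max", "sum"])
print(stats)
```

In this toy case Queens leads on the maximum while Manhattan leads on the sum, so comparing both aggregates helps when judging overall popularity.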



##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

From the above pie chart we can see that the most-reviewed listings are in Queens, Manhattan and Brooklyn. The company could apply similar techniques or advertise in the other neighbourhood groups as well, so that listings, bookings and reviews increase there too, which would help maximise profits.

#### Chart - 12

In [None]:
# Chart - 12 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 13

In [None]:
# Chart - 13 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 14 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

#### Chart - 15 - Pair Plot

In [None]:
# Pair Plot visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.

Answer Here.

# **Conclusion**

Write the conclusion here.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***