<a href="https://colab.research.google.com/github/shanmukhareddygali/EDA/blob/main/Airbnb_booking_analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    - **AirBnb Bookings Analysis**



**Project Type**    - EDA
##### **Contribution**    - Individual
**Name** - Shanmukha Reddy


# **Project Summary -**

Airbnb is an online marketplace that connects people who want to rent out their homes with people looking for accommodations in specific locales. The company has come a long way since 2007, when its co-founders first came up with the idea to invite paying guests to sleep on an air mattress in their living room. According to Airbnb's latest data, it now has more than 7 million listings, covering some 100,000 cities and towns in 220-plus countries and regions worldwide.

This project involved exploring and cleaning a dataset to prepare it for analysis. Data exploration meant identifying and understanding the characteristics of the data, such as data types, missing values, and value distributions. Data cleaning meant identifying and addressing issues or inconsistencies in the data, such as errors, missing values, duplicate records, and outliers. Data visualization was then used to explore and understand patterns in the Airbnb data: I created various graphs and charts and wrote observations and insights below each one to help identify useful patterns.



# **GitHub Link -**

https://github.com/shanmukhareddygali/EDA

# **Problem Statement**


To perform exploratory data analysis to understand the behaviour of customers and hosts on the Airbnb platform and support business decisions.

#### **Define Your Business Objective?**

Since 2008, guests and hosts have used Airbnb to expand travel possibilities and to experience the world in a more unique, personalised way. Today, Airbnb is a one-of-a-kind service that is used and recognised worldwide. Analysing the data from millions of listings is crucial for the company. These listings generate a lot of data that can be analysed and used for security, business decisions, understanding customers' and hosts' behaviour and performance on the platform, guiding marketing initiatives, implementing innovative additional services, and much more. This dataset has around 49,000 observations and 16 columns, with a mix of categorical and numeric values. This project explores and analyses the data to discover key insights and support business decisions.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure that each chart answers the following format.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following the "UBM" rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px

### Dataset Loading

In [None]:
#mount google drive
from google.colab import drive
drive.mount('/content/drive')

In [None]:
#Import dataset
df = pd.read_csv("/content/drive/My Drive/Colab Notebooks/Airbnb NYC 2019.csv")    # read_csv reads the CSV file into a DataFrame
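
The guidelines above ask for exception handling, so here is a minimal sketch of a guarded loader. The function name `load_listings` and its error messages are hypothetical and not part of the original notebook; the sketch only assumes that the dataset is a regular CSV file readable by `pd.read_csv`.

```python
from pathlib import Path

import pandas as pd


def load_listings(csv_path):
    """Read a listings CSV, failing with a clear message if it is missing or empty.

    Hypothetical helper: the name and the messages are illustrative only.
    """
    path = Path(csv_path)
    if not path.is_file():
        # Typical cause in Colab: Google Drive has not been mounted yet
        raise FileNotFoundError(f"Dataset not found at {path}. Is Google Drive mounted?")
    frame = pd.read_csv(path)
    if frame.empty:
        # A header-only file parses fine but has nothing to analyse
        raise ValueError(f"{path} was read but contains no rows.")
    return frame
```

It could be called as `df = load_listings("/content/drive/My Drive/Colab Notebooks/Airbnb NYC 2019.csv")` in place of the direct `pd.read_csv` call.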

### Dataset First View

In [None]:
# Dataset First Look
df

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
df.shape    # shape attribute returns (number of rows, number of columns)


### Dataset Information

In [None]:
# Dataset Info
df.info() #info function gives data types and null values information

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
df.duplicated().sum()   # duplicated() flags repeated rows; sum() counts them

There are no duplicate values in dataset.

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
df.isnull().sum()

In [None]:
# Visualizing the missing values
df.isnull().sum().plot.bar(x='index')

The column 'last_review' contains 10052 null values, which would disturb the analysis, so deleting this column is advisable.

In [None]:
del df['last_review']     # the del keyword removes the column in place

The column 'reviews_per_month' also contains 10052 null values. It is a numerical column, and its null values occur exactly where the column 'number_of_reviews' is zero, so replacing the null values with zero is appropriate for the analysis.
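
The claim that the nulls line up with zero-review listings can be checked directly. The sketch below uses a small toy frame (its values are illustrative, not taken from the real file) and compares the two boolean masks; on the real data the same comparison would be run against `df`.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the listings data (illustrative values only)
toy = pd.DataFrame({
    "number_of_reviews": [0, 3, 0, 12],
    "reviews_per_month": [np.nan, 0.5, np.nan, 1.2],
})

# True where reviews_per_month is missing
missing_rate = toy["reviews_per_month"].isnull()
# True where the listing has never been reviewed
no_reviews = toy["number_of_reviews"] == 0

# If both masks are identical, filling with 0 is a faithful imputation
masks_match = bool((missing_rate == no_reviews).all())
print(masks_match)
```

If the masks did not match, some listings would have reviews but no monthly rate, and filling those with zero would understate their activity.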

In [None]:
df['reviews_per_month'] = df['reviews_per_month'].fillna(0)    #fillna is used to fill null values

In [None]:
df

In [None]:
df.isnull().sum()

In [None]:
df = df.dropna()                  #dropna is used to drop null values

In [None]:
df

In [None]:
df.isnull().sum()

In [None]:
#check for outliers
sns.boxplot(x='price', data=df)
plt.title('Box Plot of Price')
plt.show()

These outliers are important for the analysis, so they are kept in the dataset.
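
To put a number on what the box plot shows, one common convention is the 1.5 × IQR rule that box plot whiskers are usually based on. The sketch below is an assumption about how the outliers could be counted, not something the original notebook does; the helper name `iqr_outlier_mask` and the toy prices are hypothetical.

```python
import pandas as pd


def iqr_outlier_mask(values, k=1.5):
    """Return a boolean mask marking values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return (values < lower) | (values > upper)


# Toy prices (illustrative only): one extreme listing among ordinary ones
toy_prices = pd.Series([50, 60, 70, 80, 90, 100, 110, 10000], name="price")
mask = iqr_outlier_mask(toy_prices)
print(int(mask.sum()), "of", len(toy_prices), "prices flagged")
```

The mask only flags rows; nothing is dropped, which matches the decision to keep the outliers.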

### What did you know about your dataset?

The dataset is about Airbnb bookings in New York City, USA. It includes a wide range of information such as property listings, booking details, host information, guest reviews, pricing, and geographical data.

---



## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
df.columns #gives name of all columns

In [None]:
# Dataset Describe
df.describe() #gives basic statistical information

### Variables Description

id - unique listing id

name - name of the listing

host_id - unique host id

host_name - name of the host

neighbourhood_group - location (borough, e.g. Manhattan or Brooklyn)

neighbourhood - area within the neighbourhood group

latitude - latitude coordinate of the listing

longitude - longitude coordinate of the listing

room_type - type of listing

price - price of the listing

minimum_nights - minimum number of nights to be paid for

number_of_reviews - number of reviews

last_review - date of the last review

reviews_per_month - number of reviews per month

calculated_host_listings_count - total number of listings by the host

availability_365 - number of days the listing is available in a year

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
df.nunique()  #Total count of unique values of each variable

In [None]:
df['neighbourhood_group'].unique() #It returns the unique values in neighbourhood_group column

In [None]:
df['neighbourhood'].unique() #It returns the unique values in neighbourhood column

In [None]:
df['room_type'].unique() #It returns the unique values in room type column

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.
df #This is the final dataframe after data cleaning

### What all manipulations have you done and insights you found?

After loading the dataset, I checked it for null values. The 'reviews_per_month' and 'last_review' columns each contain 10052 null values. The 'last_review' column holds the date of the last review, so deleting it does not affect the analysis. In 'reviews_per_month', the null values occur where 'number_of_reviews' is zero, so replacing them with zero is appropriate. After that, I dropped the remaining rows that contained null values. The dataset is now free of null values and ready for exploration.
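
The cleaning steps described above can be collected into one function so that the whole notebook reruns in one go. This is a sketch under the assumption that the frame has the 'last_review' and 'reviews_per_month' columns; the function name `clean_listings` and the toy rows are hypothetical.

```python
import numpy as np
import pandas as pd


def clean_listings(raw):
    """Apply the three cleaning steps from this section and return a new frame."""
    # 1. Drop the review-date column (ignored if it was already removed)
    cleaned = raw.drop(columns=["last_review"], errors="ignore")
    # 2. Listings without reviews get a monthly review rate of 0
    cleaned = cleaned.assign(reviews_per_month=cleaned["reviews_per_month"].fillna(0))
    # 3. Drop any rows that still contain nulls
    return cleaned.dropna()


# Toy rows (illustrative only): row 0 has no reviews, row 1 has no name
toy_raw = pd.DataFrame({
    "name": ["Loft", None, "Studio"],
    "number_of_reviews": [0, 2, 5],
    "last_review": [None, "2019-01-01", "2019-02-01"],
    "reviews_per_month": [np.nan, 0.2, 0.5],
})
toy_clean = clean_listings(toy_raw)
print(toy_clean.isnull().sum().sum())
```

Returning a new frame instead of mutating the input keeps the raw data available for comparison.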

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

**Univariate Analysis**

#### Chart - 1

**What is the most preferred Neighbourhood group in NYC?**

In [None]:
# Chart - 1 visualization code
# value_counts() counts the occurrences of each unique value in the column
fig1 = df['neighbourhood_group'].value_counts()
# plot pie chart
fig1.plot.pie(labels = fig1.index ,autopct = '%.1f%%',fontsize=8,labeldistance=1.1)
plt.show()

##### 1. Why did you pick the specific chart?

Pie charts are a type of data visualization that represent data in a circular graph, where each slice of the pie represents a proportion of the whole. Pie charts are useful for conveying relative proportions of different categories or components within a dataset.


##### 2. What is/are the insight(s) found from the chart?

About 85% of listings are located in the Manhattan and Brooklyn neighbourhood groups, while the Bronx and Staten Island together have just 3% of listings. Manhattan and Brooklyn are therefore the most preferred neighbourhood groups, and Staten Island and the Bronx are the least preferred.
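
The percentages read off the pie chart can also be computed directly. The sketch below uses a toy series (illustrative values, not the real counts) and `value_counts(normalize=True)` to obtain each group's share; the variable names are hypothetical.

```python
import pandas as pd

# Toy neighbourhood groups (illustrative only)
toy_groups = pd.Series(
    ["Manhattan"] * 4 + ["Brooklyn"] * 4 + ["Queens", "Bronx"],
    name="neighbourhood_group",
)

# Share of listings per group, in percent
share = toy_groups.value_counts(normalize=True).mul(100).round(1)

# Combined share of the two largest groups
top_two_share = share.nlargest(2).sum()
print(share)
print(top_two_share)
```

On the real data the same two lines applied to `df['neighbourhood_group']` would give the figures quoted in the insight.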

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the insights gained can definitely help create a positive business impact.

1. By focusing marketing efforts on Manhattan and Brooklyn, it can attract more users to these high-demand areas. Highlighting the unique features and attractions of these neighborhoods can lead to increased bookings and positive customer experiences.

2. Adjusting pricing strategies to reflect the high demand in Manhattan and Brooklyn can lead to improved revenue. Implementing dynamic pricing during peak periods and offering competitive rates can attract more bookings.

3. Neglecting the less preferred neighborhoods like Bronx and Staten Island might result in missing out on potential customers.




#### Chart - 2

**What is the most preferred room type in NYC?**

In [None]:
# Chart - 2 visualization code
fig2 = df['room_type'].value_counts()
#the value_counts() function is used to count the occurrences of unique values in a Series or DataFrame column
fig2



In [None]:
#plot pie chart
fig2.plot.pie(labels = fig2.index,autopct = '%.1f%%',fontsize=8,labeldistance=1.1)
plt.show()

##### 1. Why did you pick the specific chart?

Pie charts are a type of data visualization that represent data in a circular graph, where each slice of the pie represents a proportion of the whole. Pie charts are useful for conveying relative proportions of different categories or components within a dataset

##### 2. What is/are the insight(s) found from the chart?

The most preferred room type in NYC is Entire home/apt, followed by Private room; the least preferred room type is Shared room.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the insights gained can definitely help create a positive business impact.

1. Knowing that guests prefer entire homes or private rooms can guide hosts and property owners to focus on these types of accommodations. This insight allows hosts to adjust their offerings to meet the demand, potentially attracting more bookings and increasing overall revenue.

2. Hosts can strategically price these accommodations to maximize their earnings. They might also consider offering promotions or discounts for longer stays in these preferred room types.

3. Adjust marketing campaigns to highlight the benefits of entire homes/apartments and private rooms, such as privacy, space, and access to kitchens and living rooms.

4. Neglecting shared rooms might result in missing out on potential customers.


#### Chart - 3

**What is the most preferred neighbourhood in NYC?**

In [None]:
# Chart - 3 visualization code
fig3 = df['neighbourhood'].value_counts().head(10)
#the value_counts() function is used to count the occurrences of unique values in a Series or DataFrame column
fig3

In [None]:
# plot horizontal bar chart for Neighbourhood vs bookings count
plt.barh(fig3.index, fig3.values)           # barh plots a horizontal bar chart

# Add labels and title
plt.xlabel('Bookings Count')
plt.ylabel('Neighbourhood')
plt.title('Top 10 Neighbourhoods by Bookings Count')

# Show plot
plt.show()

##### 1. Why did you pick the specific chart?

Horizontal bar charts are typically used to display and compare categorical data. They are particularly useful when you have long category labels or when you want to emphasize the comparison of values across categories.

##### 2. What is/are the insight(s) found from the chart?

1. Williamsburg is the most preferred neighbourhood in NYC, with 3917 bookings.

2. Williamsburg and Bedford-Stuyvesant are the only two neighbourhoods with more than 3500 bookings.

3. The third most preferred neighbourhood is Harlem, with 2655 bookings.

4. Six neighbourhoods have between 1500 and 2000 bookings.


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the insights gained can definitely help create a positive business impact.

1. Knowing that Williamsburg is the most preferred neighborhood with 3917 bookings allows businesses to focus their marketing efforts and resources on this area. They can tailor their services or products to cater to the specific needs and preferences of visitors to Williamsburg.

2. Recognizing that both Williamsburg and Bedford-Stuyvesant have bookings exceeding 3500 suggests that these neighborhoods are in high demand among tourists or visitors. Businesses can capitalize on this information by strategically locating or expanding their operations in these areas to maximize their visibility and potential customer base.

3. While focusing on popular neighborhoods like Williamsburg, Bedford-Stuyvesant, and Harlem can be beneficial, it's essential not to overlook other areas with lower booking numbers. Neglecting these neighborhoods could result in missed opportunities for business growth and expansion, especially if they have untapped potential or unique offerings that appeal to certain demographics.

#### Chart - 4 Bi-Variate Analysis

**Who is the most preferred host?**

In [None]:
# Chart - 4 visualization code
fig4 = df.groupby(['host_id', 'host_name']).size().sort_values(ascending = False).head(10).reset_index(name = 'count')

In [None]:
fig4

In [None]:
# Plot horizontal bar chart for host_name vs count
plt.barh(fig4['host_name'], fig4['count'])

# Add labels and title
plt.xlabel('Bookings Count')
plt.ylabel('Host name')
plt.title('Top 10 Hosts by Bookings Count')

# Show plot
plt.show()

##### 1. Why did you pick the specific chart?

Horizontal bar charts are typically used to display and compare categorical data. They are particularly useful when you have long category labels or when you want to emphasize the comparison of values across categories.

##### 2. What is/are the insight(s) found from the chart?

1. Sonder (NYC) is the most preferred host in NYC, with 327 bookings.

2. The second most preferred host is Blueground, with 232 bookings.

3. The remaining hosts have far fewer bookings than Sonder (NYC).

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the insights gained can definitely help create a positive business impact.

1. Knowing that Sonder (NYC) is the most preferred host can help competitors understand what attributes or services make it popular among customers. They can then work on enhancing their own offerings to compete more effectively.

2. Blueground, as the second most preferred host, can use this information to identify areas where it can improve to potentially surpass Sonder (NYC) in bookings. Analyzing what sets Sonder (NYC) apart can help Blueground refine its own marketing and service strategies.

3. The significantly lower number of bookings for remaining hosts compared to Sonder (NYC) and Blueground suggests that these hosts may struggle to gain traction in the market. This could lead to negative growth or even failure for smaller or less established hosts if they are unable to differentiate themselves or attract customers.

#### Chart - 5

**Which is the most expensive Neighbourhood group?**

In [None]:
# Chart - 5 visualization code
costly_nhg = df.groupby('neighbourhood_group')['price'].mean().reset_index()
costly_nhg

In [None]:
#plot bar plot for Neighbourhood_group vs Mean price
fig5 = costly_nhg.plot(kind = 'bar',x ='neighbourhood_group',y = 'price',color = 'green')
# Add labels and title
plt.xlabel('location')
plt.ylabel('mean price')
plt.title('Mean price in Neighbourhood_groups')
plt.show()

##### 1. Why did you pick the specific chart?

A bar chart is a graphical representation of data where rectangular bars of varying lengths are used to visualize the values associated with different categories or groups. It is particularly effective for comparing and communicating categorical data in a clear and accessible manner.

##### 2. What is/are the insight(s) found from the chart?

**Insights:**

1. Manhattan is clearly the most expensive neighbourhood group, with a mean price of 196.90 dollars, followed by Brooklyn with 124.41 dollars.

2. Staten Island, the area with the fewest bookings, has a higher mean price than Queens and the Bronx.

3. The Bronx has the lowest mean price of all neighbourhood groups, at 87.47 dollars.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the insights gained can definitely help create a positive business impact.

1. Knowing Manhattan and Brooklyn have the highest average price points allows for targeted marketing campaigns emphasizing luxury or premium services specific to those areas.

2. Staten Island's higher prices despite fewer bookings suggests potential overpricing. Adjusting prices competitively could attract more bookings and increase revenue.

3. While targeting high-paying areas like Manhattan is beneficial, relying solely on that might neglect potential customers in other boroughs who are willing to spend within a certain range. Balancing pricing across neighborhoods to capture a wider customer base could drive further growth.

#### Chart - 6

**Which is the most expensive room type?**

In [None]:
# Chart - 6 visualization code
fig6 = df.groupby('room_type')['price'].mean().reset_index()
fig6

In [None]:
#Plot Bar chart for Mean price vs Room types
costly_room_type = fig6.plot.bar(x='room_type',y = 'price',legend = False)
# Add labels and title
plt.xlabel('Room type')
plt.ylabel('mean price')
plt.title('Mean price of Room types')
plt.show()

##### 1. Why did you pick the specific chart?

A bar chart is a graphical representation of data where rectangular bars of varying lengths are used to visualize the values associated with different categories or groups. It is particularly effective for comparing and communicating categorical data in a clear and accessible manner.

##### 2. What is/are the insight(s) found from the chart?

1. Entire home/apt is the most expensive room type.

2. Despite having a mean price of $211, 52% of total bookings are for the Entire home/apt type.

3. The second most expensive room type is Private room, with a mean price of $89. 47% of total bookings are for this room type.

4. The mean price of an Entire home/apt is more than twice that of a Private room, yet bookings of the Private room type are only 5% lower than those of the Entire home/apt type.

5. Shared room is the cheapest room type, but it accounts for just 2.4% of bookings.
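
The insights above combine two quantities per room type: the mean price and the share of listings. A single table holding both makes the comparison easier to verify. The sketch below uses toy rows (illustrative values only) and named aggregation; the column names `mean_price`, `listings`, and `share_pct` are hypothetical.

```python
import pandas as pd

# Toy listings (illustrative only)
toy = pd.DataFrame({
    "room_type": ["Entire home/apt", "Entire home/apt", "Private room", "Shared room"],
    "price": [200, 220, 90, 70],
})

# Mean price and number of listings per room type in one pass
summary = toy.groupby("room_type")["price"].agg(mean_price="mean", listings="count")

# Share of all listings, in percent
summary["share_pct"] = (summary["listings"] / summary["listings"].sum() * 100).round(1)

# Most expensive room type first
summary = summary.sort_values("mean_price", ascending=False)
print(summary)
```

Applied to `df`, the same aggregation would place the price and share figures side by side without switching between two charts.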

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the insights gained can definitely help create a positive business impact.

1. Knowing that the Entire home/apt room type is the most expensive and yet comprises 52% of total bookings suggests that customers value privacy and space, even if it comes at a higher cost. This insight can guide pricing strategies, indicating that customers are willing to pay more for certain amenities.

2. Recognizing that Private rooms are the second most expensive but have slightly lower bookings compared to Entire home/apt suggests that there may be room for optimization. This insight could prompt the business to adjust pricing or marketing strategies for Private rooms to increase their booking percentage.

3. Since Shared rooms are the least costly yet have low bookings (just 2.4%), there's an opportunity to target marketing efforts towards this segment. Understanding why customers are less inclined to book this type of room could lead to adjustments in amenities, pricing, or promotional strategies to increase its appeal.


#### Chart - 7 Multivariate Analysis

**What is the most expensive Neighbourhood in NYC?**

In [None]:
# Chart - 7 visualization code
fig7 = df.groupby('neighbourhood').agg({'price' : ['mean','count']}).reset_index().sort_values(by = ('price',  'mean'),ascending = False).head(10)
fig7

In [None]:
#plot horizontal bar chart for Neighbourhood vs (mean and Count)
fig7.plot(kind = 'barh',y = 'price',x='neighbourhood')
# Add labels and title
plt.xlabel('Values')
plt.ylabel('neighbourhood')
plt.title('Horizontal Bar Chart')

plt.show()

##### 1. Why did you pick the specific chart?

A horizontal bar chart is a graphical representation of data in which bars are plotted horizontally. Horizontal bar charts are used when the category labels are long or when there are many categories to display. They are useful for comparing values across categories, particularly when you want to emphasize the differences between categories.

##### 2. What is/are the insight(s) found from the chart?

1. With only one booking, it is difficult to say that Fort Wadsworth is the most expensive neighbourhood.

2. Taking the number of bookings into account, Tribeca is the most expensive neighbourhood, with a mean price of 499 dollars and 177 bookings.
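
One way to make "taking the number of bookings into account" explicit is to rank by mean price only among groups with enough rows. The sketch below is an assumption about how that could be done, not the method used in the notebook; the helper `rank_by_mean_price`, the `min_listings` threshold, and the toy data are hypothetical.

```python
import pandas as pd


def rank_by_mean_price(frame, group_col, min_listings=1):
    """Rank groups by mean price, keeping only groups with at least min_listings rows."""
    stats = frame.groupby(group_col)["price"].agg(mean_price="mean", listings="count")
    reliable = stats[stats["listings"] >= min_listings]
    return reliable.sort_values("mean_price", ascending=False)


# Toy data (illustrative only): "A" has a single very expensive listing
toy = pd.DataFrame({
    "neighbourhood": ["A", "B", "B", "B", "C", "C", "C"],
    "price": [800, 500, 480, 520, 100, 120, 110],
})

unfiltered_top = rank_by_mean_price(toy, "neighbourhood").index[0]
filtered_top = rank_by_mean_price(toy, "neighbourhood", min_listings=3).index[0]
print(unfiltered_top, filtered_top)
```

Without the threshold the single-listing group ranks first; with it, the top spot goes to a group whose mean rests on several listings. The right threshold for the real data is a judgment call.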


  

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

1. While Tribeca is known for its luxury and amenities, it may not be suitable for everyone because of its high cost. Many customers prefer more affordable neighbourhoods, which can lead to negative growth.

#### Chart - 8

**Who is the most expensive host in NYC?**

In [None]:
# Chart - 8 visualization code
fig8 = df.groupby(['host_id','host_name']).agg({'price':['mean','count']}).reset_index().sort_values(by = ('price','mean'),ascending = False).head(10)
fig8

In [None]:
#plot horizontal bar chart
fig8.plot(kind = 'barh',y = 'price',x='host_name')
# Add labels and title
plt.xlabel('Mean price')
plt.ylabel('Host name')
plt.title('Horizontal Bar Chart')

plt.show()

##### 1. Why did you pick the specific chart?

A horizontal bar chart is a graphical representation of data in which bars are plotted horizontally. Horizontal bar charts are used when the category labels are long or when there are many categories to display. They are useful for comparing values across categories, particularly when you want to emphasize the differences between categories.

##### 2. What is/are the insight(s) found from the chart?

1. Jalena, Erin and Katherine are the most expensive hosts at 10000 dollars, which is far higher than the mean price of 152 dollars.

2. Each of these hosts has only one booking.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

1. The high cost of listings leads to fewer bookings, which can lead to negative growth.

#### Chart - 9

**Host with most reviews?**

In [None]:
# Chart - 9 visualization code
fig9 = df.groupby(['host_id','host_name'])['number_of_reviews'].sum().reset_index().sort_values(by = 'number_of_reviews',ascending = False).head(10)
fig9

In [None]:
#plot horizontal bar chart
fig9.plot(kind = 'barh',y = 'number_of_reviews',x='host_name')
# Add labels and title
plt.xlabel('Number of Reviews')
plt.ylabel('Host_name')
plt.title('Horizontal Bar Chart')

plt.show()

##### 1. Why did you pick the specific chart?

A horizontal bar chart is a graphical representation of data in which bars are plotted horizontally. Horizontal bar charts are used when the category labels are long or when there are many categories to display. They are useful for comparing values across categories, particularly when you want to emphasize the differences between categories.

##### 2. What is/are the insight(s) found from the chart?

1. Maya is the host with the most reviews, followed by Brooklyn& Breakfast -Len-.

2. There is a difference of only 65 reviews between the top two hosts.

3. Sonder (NYC), the host with the most bookings, received fewer reviews than Maya.
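
Comparing the host with the most bookings against the host with the most reviews needs both totals per host. The sketch below computes them together on toy data (illustrative values and placeholder host names only); the derived column `reviews_per_listing` is a hypothetical addition that is not in the original notebook.

```python
import pandas as pd

# Toy listings (illustrative only): Host A has many rows but few reviews
toy = pd.DataFrame({
    "host_id": [1, 1, 1, 2, 3],
    "host_name": ["Host A", "Host A", "Host A", "Host B", "Host C"],
    "number_of_reviews": [2, 1, 0, 50, 10],
})

# Total reviews and number of rows per host in one pass
host_stats = (
    toy.groupby(["host_id", "host_name"])["number_of_reviews"]
    .agg(total_reviews="sum", listings="count")
    .reset_index()
)

# Average reviews per row, to compare hosts of different sizes
host_stats["reviews_per_listing"] = host_stats["total_reviews"] / host_stats["listings"]

most_listings = host_stats.loc[host_stats["listings"].idxmax(), "host_name"]
most_reviews = host_stats.loc[host_stats["total_reviews"].idxmax(), "host_name"]
print(most_listings, most_reviews)
```

A host with many rows but a low per-row review count, like Host A here, mirrors the pattern described in the third insight.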

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the insights gained can definitely help create a positive business impact.

1. Sonder (NYC) has the most bookings despite having fewer reviews than Maya. This could indicate that Sonder (NYC) is successful at attracting guests through other means such as marketing or word-of-mouth referrals.

2. The hosts that received more reviews than Sonder (NYC) but have fewer bookings may be lacking in marketing strategy or listing maintenance.

3. To achieve positive growth, improve the marketing strategies and the maintenance of the listings.

#### Chart - 10

**What are the mean prices of the most preferred neighbourhoods?**

In [None]:
fig10 = df.groupby('neighbourhood').agg({'price' : ['mean','count']}).reset_index().sort_values(by = ('price',  'count'),ascending = False).head(10)
fig10

In [None]:
#plot horizontal bar chart for Neighbourhood vs (mean and Count)
fig10.plot(kind = 'barh',y = 'price',x='neighbourhood')
# Add labels and title
plt.xlabel('Values')
plt.ylabel('neighbourhood')
plt.title('Horizontal Bar Chart')

plt.show()

##### 1. Why did you pick the specific chart?

A horizontal bar chart is a graphical representation of data in which bars are plotted horizontally. Horizontal bar charts are used when the category labels are long or when there are many categories to display. They are useful for comparing values across categories, particularly when you want to emphasize the differences between categories.

##### 2. What is/are the insight(s) found from the chart?

1. Williamsburg, the neighbourhood with the most bookings, has a mean price of just 143 dollars.

2. Among the most preferred neighbourhoods, Midtown has a very high mean price (282 dollars) and Bushwick has a very low mean price (84 dollars). Bushwick has about 1000 more bookings than Midtown.


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

1. Understanding that Midtown has a significantly higher mean price compared to Bushwick, despite having fewer bookings, suggests different market segments. Tailoring marketing strategies and service offerings to cater to these distinct segments could lead to increased customer satisfaction and loyalty.

2. Knowing that the Williamsburg neighborhood, with the most bookings, has a mean price of $143, can guide pricing strategies. If this price point correlates with high demand, it suggests that potential guests find it attractive. Capitalizing on this insight, you could adjust pricing in other neighborhoods to align with this successful pricing model, potentially increasing bookings and revenue.

#### Chart - 11

**How are bookings of room types distributed in each neighbourhood_group?**

In [None]:
# Chart - 11 visualization code
df.groupby(['neighbourhood_group', 'room_type'])['name'].count().unstack()

In [None]:

fig11 = df.groupby(['neighbourhood_group', 'room_type'])['name'].count().unstack(fill_value=0)

# Plot a bar chart for each location
fig11.plot(kind='bar', stacked=True)
# Add labels and title
plt.xlabel('Location')
plt.ylabel('Bookings Count')
plt.title('Distribution of Bookings')
plt.xticks(rotation=45)  # Rotate x-axis labels for better readability
plt.show()








##### 1. Why did you pick the specific chart?

A stacked bar chart is a type of bar chart that represents multiple categories as segments within each bar. Each bar represents a total, and the segments of the bar represent the proportion of that total contributed by different categories. This chart is useful for visualizing the composition of a whole and the contribution of individual components.

##### 2. What is/are the insight(s) found from the chart?

1. Overall, the most preferred room type in NYC is Entire home/apt, but in every neighbourhood group except Manhattan the most preferred room type is Private room.

2. Most customers prefer the Entire home/apt type in Manhattan and the Private room type in Brooklyn.

3. Staten Island has low bookings across all room types.
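
The per-group preference stated above can be read off programmatically rather than by eye. The sketch below builds the same kind of group-by-room-type count table on toy data (illustrative rows only) and uses `idxmax(axis=1)` to pick the most frequent room type in each neighbourhood group; the variable names are hypothetical.

```python
import pandas as pd

# Toy listings (illustrative only)
toy = pd.DataFrame({
    "neighbourhood_group": ["Manhattan", "Manhattan", "Manhattan",
                            "Brooklyn", "Brooklyn", "Brooklyn"],
    "room_type": ["Entire home/apt", "Entire home/apt", "Private room",
                  "Private room", "Private room", "Entire home/apt"],
})

# Rows: neighbourhood group, columns: room type, values: number of listings
counts = toy.groupby(["neighbourhood_group", "room_type"]).size().unstack(fill_value=0)

# Column label with the highest count in each row
top_room_type = counts.idxmax(axis=1)
print(top_room_type)
```

Note that `idxmax` returns the first label when two room types tie, so ties would need a separate check on real data.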

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

1. Knowing that the overall preference in NYC is for Entire home/apt, but Private room is preferred in most neighborhoods except Manhattan, allows for targeted marketing campaigns. For example, you could tailor advertisements to highlight the privacy and intimacy of Private rooms in non-Manhattan neighborhoods, while focusing on the luxury and convenience of Entire home/apt in Manhattan.

2. Understanding the preferences of customers in different neighborhoods enables you to optimize your inventory accordingly. You can allocate resources more effectively by offering more Private room options in Brooklyn and more Entire home/apt options in Manhattan.

3. The insight that bookings are lower in all room types in Staten Island could lead to stagnant growth in that area. If demand is consistently low and doesn't show signs of improvement, it might not be financially viable to invest resources into expanding operations there.

#### Chart - 12

**How are reviews of room types distributed in each neighbourhood_group?**

In [None]:
# Chart - 12 visualization code
df.groupby(['neighbourhood_group', 'room_type'])['number_of_reviews'].sum()


In [None]:
fig12 = df.groupby(['neighbourhood_group', 'room_type'])['number_of_reviews'].sum().unstack(fill_value=0)

# Plot a bar chart for each location
fig12.plot(kind='bar', stacked=False)
# Add labels and title
plt.xlabel('Location')
plt.ylabel('Number of Reviews')
plt.title('Distribution of reviews')
plt.xticks(rotation=45)  # Rotate x-axis labels for better readability
plt.show()

##### 1. Why did you pick the specific chart?

Unstacked bar charts are valuable tools for visualizing and comparing categorical data across multiple groups or categories. They allow for the clear representation of the composition and distribution of data within each category, making them particularly useful for displaying the relationship between different groups and their respective subgroups.

##### 2. What is/are the insight(s) found from the chart?

1. Despite having fewer bookings than Manhattan, Brooklyn is the most reviewed neighbourhood group for the Entire home/apt room type.

2. Brooklyn is the most reviewed neighbourhood group for the Private room type.

3. Manhattan is the most reviewed neighbourhood group for the Shared room type.

4. Staten Island is the least reviewed neighbourhood group for all room types.
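The first insight compares review totals with booking counts, so it can help to put both on one table and add a per-listing rate. The following is a minimal sketch, assuming each row of `df` is one listing; the `review_summary` helper and the `toy` frame are hypothetical, and the toy values are made up purely to show the shape of the output.

```python
import pandas as pd

# Toy stand-in for the notebook's `df`; these values are invented for illustration.
toy = pd.DataFrame({
    "neighbourhood_group": ["Brooklyn", "Brooklyn", "Manhattan",
                            "Manhattan", "Manhattan", "Staten Island"],
    "room_type": ["Entire home/apt", "Private room", "Entire home/apt",
                  "Entire home/apt", "Shared room", "Private room"],
    "number_of_reviews": [40, 10, 12, 8, 5, 3],
})


def review_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Listings, total reviews and reviews per listing for each group/room-type pair."""
    summary = (
        frame.groupby(["neighbourhood_group", "room_type"])["number_of_reviews"]
        .agg(listings="count", total_reviews="sum")
    )
    # Normalising by the number of listings separates "many listings" from
    # "heavily reviewed listings".
    summary["reviews_per_listing"] = summary["total_reviews"] / summary["listings"]
    return summary


summary = review_summary(toy)
print(summary)
```

In the notebook, `review_summary(df)` would give the same table for the real data; a high total with a low per-listing rate would suggest that the total is driven mainly by the number of listings.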

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

1. Focusing on promoting Brooklyn properties, especially entire homes/apartments and private rooms, can attract more attention and bookings. Highlighting Brooklyn's popularity among reviewers can enhance its appeal to potential guests.

2. Tailoring marketing campaigns to emphasize Brooklyn's unique attractions, such as cultural diversity, vibrant neighborhoods, and more affordable accommodation options compared to Manhattan, can attract a broader audience.

3. Ignoring the least-reviewed neighborhood group, Staten Island, entirely could also be a missed opportunity. It might be worth exploring ways to increase visibility and attractiveness for properties in Staten Island, especially if there are untapped markets or unique selling points to leverage.

#### Chart - 13

**What is the mean price of each room type in each neighbourhood_group?**

In [None]:
# Chart - 13 visualization code
df.groupby(['neighbourhood_group', 'room_type'])['price'].mean()

In [None]:
fig13 = df.groupby(['neighbourhood_group', 'room_type'])['price'].mean().unstack(fill_value=0)

# Plot a bar chart for each location
fig13.plot(kind='bar', stacked=False)
# Add labels and title
plt.xlabel('Location')
plt.ylabel('Mean Price')
plt.title('Mean price of room types in each neighbourhood_group')
plt.xticks(rotation=45)  # Rotate x-axis labels for better readability
plt.show()

##### 1. Why did you pick the specific chart?

A grouped (unstacked) bar chart compares a numeric value across two categorical variables at once. Here it places the room types side by side within each neighbourhood group, so the mean prices can be compared both across neighbourhood groups and across room types.

##### 2. What is/are the insight(s) found from the chart?

1. The mean price of every room type is highest in Manhattan.

2. The mean prices of Entire home/apt listings in Brooklyn and Staten Island are in the same range, but Brooklyn has far more bookings than Staten Island.

3. The mean price of Entire home/apt listings is higher in Staten Island than in Queens, yet Queens has almost 10 times as many bookings as Staten Island.

4. The mean price of a Shared room is lower in Brooklyn than in any other neighbourhood group, yet Brooklyn has more Shared room bookings than Queens, the Bronx and Staten Island.

5. Except for Manhattan, all neighbourhood groups have a similar range of mean prices for the Private room type.
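Because a mean can be pulled upward by a few very expensive listings, it may be worth checking these comparisons against the median and the listing count as well. The sketch below is one way to do that; the `price_summary` helper and the `toy` frame are hypothetical, and the prices are invented to show how a single extreme value separates the two statistics.

```python
import pandas as pd

# Toy stand-in for the notebook's `df`; the prices are invented for illustration.
toy = pd.DataFrame({
    "neighbourhood_group": ["Manhattan"] * 4 + ["Brooklyn"] * 3,
    "room_type": ["Entire home/apt"] * 7,
    "price": [150, 200, 250, 5000, 100, 120, 140],
})


def price_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Listing count, mean price and median price per group/room-type pair."""
    return (
        frame.groupby(["neighbourhood_group", "room_type"])["price"]
        .agg(listings="count", mean_price="mean", median_price="median")
    )


prices = price_summary(toy)
print(prices)
```

In the toy data the single 5000 listing lifts the Manhattan mean far above its median, while the Brooklyn mean and median agree. Running `price_summary(df)` in the notebook would show whether the real means are similarly affected by any extreme prices that remain after cleaning.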

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

1. Knowing that the mean price of all room types is higher in Manhattan suggests that there's a potential for higher revenue and profit margins in this area. Businesses can leverage this information to focus marketing efforts on Manhattan or adjust pricing strategies accordingly to maximize returns.

2. Despite the mean price of entire home/apt type being similar in Brooklyn and Staten Island, Brooklyn sees significantly higher bookings. This indicates a higher demand or popularity for accommodations in Brooklyn. Businesses can capitalize on this insight by investing more resources into properties or marketing efforts in Brooklyn to further enhance bookings and revenue.

3. Despite having lower prices for shared rooms compared to other neighborhoods, Brooklyn still sees high bookings in this room type. This suggests that there's a market demand for affordable shared accommodations in Brooklyn. Businesses can use this insight to adjust pricing strategies and potentially increase revenue by offering competitive prices for shared rooms.

4. Except for Manhattan, all neighborhoods have a similar range of mean prices for private room types. This insight can help businesses maintain competitive pricing strategies across different neighborhoods to attract a wider range of customers.

5. Despite having higher mean prices for entire home/apt types compared to Queens, Staten Island sees significantly lower bookings. This indicates a potential issue with demand or attractiveness of accommodations in Staten Island despite higher pricing. Businesses operating in Staten Island may need to reassess their marketing strategies or consider adjusting prices to align with market demand and stimulate bookings.

#### Chart - 14 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code
# Compute the correlation matrix
corr = df[['price', 'number_of_reviews', 'reviews_per_month',
           'minimum_nights', 'availability_365']].corr()

# Create a heatmap
plt.figure(figsize=(8, 6))
sns.heatmap(corr, annot=True, cmap='coolwarm', fmt=".2f")

# Add title
plt.title('Correlation Heatmap')

# Show plot
plt.show()

##### 1. Why did you pick the specific chart?

A correlation heatmap is a graphical representation of the correlation matrix, in which the correlation coefficient between each pair of variables is displayed as a color in a grid. It provides a visually intuitive way to explore relationships between variables.

##### 2. What is/are the insight(s) found from the chart?

1. number_of_reviews and reviews_per_month have a correlation of 0.59, which indicates a moderate positive relationship between them.

2. The remaining numerical columns are only weakly correlated.
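To back up statements like these with numbers rather than colours, the correlation matrix can be flattened into a list of column pairs sorted by absolute correlation. This is a small sketch of that idea; the `ranked_correlations` helper is hypothetical, and the `toy` frame uses the notebook's column names with made-up values.

```python
from itertools import combinations

import pandas as pd

NUMERIC_COLS = ["price", "number_of_reviews", "reviews_per_month",
                "minimum_nights", "availability_365"]

# Toy stand-in for the notebook's `df`; the values are invented for illustration.
toy = pd.DataFrame({
    "price": [100, 80, 120, 90, 110, 95],
    "number_of_reviews": [1, 2, 3, 4, 5, 6],
    "reviews_per_month": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
    "minimum_nights": [1, 3, 2, 5, 1, 2],
    "availability_365": [10, 300, 50, 200, 365, 0],
})


def ranked_correlations(frame: pd.DataFrame, cols: list) -> pd.Series:
    """Pearson correlation for every unordered column pair, strongest first."""
    corr = frame[cols].corr()
    # combinations() yields each pair once and skips the diagonal.
    pairs = pd.Series({(a, b): corr.loc[a, b] for a, b in combinations(cols, 2)})
    return pairs.sort_values(key=abs, ascending=False)


ranked = ranked_correlations(toy, NUMERIC_COLS)
print(ranked)
```

Calling `ranked_correlations(df, NUMERIC_COLS)` in the notebook would list the same ten pairs for the real data, which makes it easy to state exactly how weak the remaining correlations are.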

#### Chart - 15 - Pair Plot

In [None]:
df.columns

In [None]:
# Pair Plot visualization code
# Create pair plot
sns.pairplot(df[['price', 'number_of_reviews', 'reviews_per_month',
                 'minimum_nights', 'availability_365']])

# Show plot
plt.show()

##### 1. Why did you pick the specific chart?

Pair plots, also known as scatterplot matrices, are a useful visualization tool for exploring relationships between multiple variables in a dataset.

##### 2. What is/are the insight(s) found from the chart?

1. The number_of_reviews and reviews_per_month columns show a clear positive correlation.

2. The remaining numerical columns show very weak correlation; no clear relationship between them is visible.
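A weak Pearson coefficient only rules out a linear relationship, so before concluding that two columns are unrelated it can be useful to also look at the Spearman rank correlation, which picks up any monotonic trend. The following sketch illustrates the difference on a made-up example; the `toy` frame reuses two of the notebook's column names, but its values are invented so that one column is a cubic function of the other.

```python
import pandas as pd

# Invented data: availability_365 grows monotonically but non-linearly with minimum_nights.
toy = pd.DataFrame({"minimum_nights": [1, 2, 3, 4, 5, 6]})
toy["availability_365"] = toy["minimum_nights"] ** 3

# Pearson measures linear association; Spearman measures monotonic association.
pearson = toy.corr(method="pearson").loc["minimum_nights", "availability_365"]
spearman = toy.corr(method="spearman").loc["minimum_nights", "availability_365"]

print(f"pearson={pearson:.3f}, spearman={spearman:.3f}")
```

Here Spearman is exactly 1 because the ranks agree perfectly, while Pearson is lower because the trend is curved. In the notebook, passing `method="spearman"` to the existing `.corr()` call would show whether any of the weakly correlated pairs hides a monotonic pattern.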

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.

1. From the above analysis, Manhattan and Brooklyn are the major business areas for Airbnb. Focusing on these neighbourhood groups will increase business.

2. Improving targeted marketing campaigns that highlight the attractions of these locations will help increase business.

3. Improving the facilities of each room type, or modifying rooms based on customers' interests, will help increase business.




# **Conclusion**

The analysis of Airbnb booking data has revealed several key insights into the behaviour of customers and the Airbnb business in NYC. These findings have important implications for both Airbnb hosts and users. Further research on reviews is needed: the dataset provides only the number of reviews, whereas separate counts of positive and negative reviews would be needed to study hosts' maintenance and behaviour. Overall, this analysis underscores the importance of understanding Airbnb booking behaviour and offers valuable insights for improving the Airbnb experience for all stakeholders.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***