<a href="https://colab.research.google.com/github/hussainaarish7/capstone-project-1/blob/main/Sample_EDA_Submission_Template.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    -



##### **Project Type**    - EDA
##### **Contribution**    - Individual
**by**   - MOHD AARISH

# **Project Summary -**

Airbnb, a unique and personalized platform for travelers, has grown exponentially since its inception in 2008. It has become a globally recognized service, offering millions of listings that allow people to experience the world in a more diverse and customized manner. The wealth of data generated by these listings has become invaluable to Airbnb, enabling data analysis for various purposes.

This dataset contains approximately 49,000 observations and encompasses 16 columns comprising both categorical and numeric values. By delving into this dataset, we can gain valuable insights into several aspects. These insights include enhancing security measures, making informed business decisions, understanding customer and host behaviors and performance, shaping marketing strategies, and implementing innovative additional services.

By analyzing this dataset, we can unlock key understandings that will contribute to Airbnb's continued success and growth.

The EDA process followed these steps:

1. **Data Loading:**
   - Load the dataset from the relevant data source, such as a CSV file, Excel sheet, database, or API.

2. **Data Inspection:**
   - Explore the dataset to understand its structure, size, and data types. Check for any missing values or data inconsistencies.

3. **Variable Understanding:**
   - Examine the variables (columns) present in the dataset and understand their meaning and data types.

4. **Data Cleaning:**
   - Clean the data to ensure its quality and accuracy. Handle missing values, duplicates, and any other data issues.

5. **Summary Statistics:**
   - Calculate summary statistics to get a basic understanding of the data's central tendencies, dispersion, and key characteristics.

6. **Univariate, Bivariate, and Multivariate Analysis:**
   - Conduct univariate analysis to explore individual variables' distributions and characteristics.
   - Perform bivariate analysis to study relationships between pairs of variables.
   - Consider multivariate analysis to understand interactions among multiple variables.

7. **Data Visualization and Storytelling:**
   - Create visualizations to present the data in a visually appealing and informative manner. Use various plots, charts, and graphs to convey insights effectively.

8. **Solution to Business Objectives:**
   - Use the analysis results to address the defined business objectives or research questions. Provide actionable insights and recommendations.

9. **Conclusion:**
   - Summarize the key findings from the analysis and restate the implications for the business or research.


# **GitHub Link -**

Provide your GitHub Link here.

# **Problem Statement**


**Q1. What can we learn from different hosts and areas?**

**Q2. What can we learn from room types and their prices according to the area?**

**Q3. What can we learn from the data? (e.g., location, prices, reviews, etc.)**

**Q4. Which hosts are the busiest, and what is the reason for their high activity?**

**Q5. Which hosts are charging higher prices?**

**Q6. Is there any traffic difference among different areas, and what could be the reason for it?**

**Q7. What is the correlation between different variables in the data?**

#### **Define Your Business Objective?**

**Optimize Pricing Strategy**: Analyze the relationship between listing prices, room types, and neighbourhoods to set competitive and attractive prices that align with market demand.

**Identify High-Demand Areas**: Determine the most popular neighbourhoods and areas with high booking demand to focus marketing efforts and enhance listing visibility.

**Improve Customer Satisfaction**: Examine customer reviews and feedback to identify areas of improvement in accommodations and services, aiming to provide a positive guest experience.

**Promote Niche Offerings**: Explore opportunities in niche markets or less-crowded areas to offer unique accommodations and attract guests seeking distinct experiences.

**Expand Business Reach**: Utilize insights from the data to identify potential growth areas and expand the hosting portfolio to new locations with high growth potential.

**Enhance Listing Features**: Analyze the impact of various listing features, such as amenities, photos, and descriptions, to optimize listings and attract more guests.

**Optimize Minimum Nights and Availability**: Determine the optimal minimum nights and availability to maximize occupancy rates while meeting guest preferences.

**Segmentation for Targeting**: Segment guests based on their preferences and booking behavior to tailor marketing efforts and personalized offers.

**Monitor Competitor Analysis**: Keep track of competitor listings and pricing in various neighbourhoods to adjust strategies accordingly and maintain competitiveness.

**Promote Positive Reviews**: Focus on delivering outstanding customer experiences to encourage positive reviews and enhance the reputation of hosts and listings.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure that each chart answers the following:
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the visualization in a structured way while following the "UBM" rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline


### Dataset Loading

In [None]:
# Load Dataset
airbnb = pd.read_csv('/content/Airbnb NYC 2019.csv')

### Dataset First View

In [None]:
# Dataset First Look
airbnb.head(3)

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
airbnb.shape

### Dataset Information

In [None]:
# Dataset Info
airbnb.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
airbnb.duplicated().sum()

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
airbnb.isna().sum()

In [None]:
# Visualizing the missing values
sns.heatmap(data=airbnb.isna())
plt.show()

### What did you know about your dataset?

The dataset has 48,895 rows and 16 columns. It has no duplicate rows, but some columns contain null values.
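To see how large the gaps are relative to the dataset, the null counts can be turned into percentages. The following is a minimal sketch on a hypothetical toy frame (`toy` and `missing_summary` are illustrative names, not part of the original notebook); the same function could be called on `airbnb`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the `airbnb` DataFrame
toy = pd.DataFrame({
    "name": ["A", None, "C", "D"],
    "price": [100, 150, 80, 200],
    "reviews_per_month": [0.5, np.nan, np.nan, 1.2],
})


def missing_summary(df):
    """Return the count and percentage of missing values per column."""
    counts = df.isna().sum()
    percent = (counts / len(df) * 100).round(2)
    summary = pd.DataFrame({"missing": counts, "percent": percent})
    # Keep only the columns that actually have gaps, worst first
    return summary[summary["missing"] > 0].sort_values("missing", ascending=False)


summary = missing_summary(toy)
print(summary)
```

Knowing the share of missing values per column helps decide between dropping rows and filling values before any analysis.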

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
airbnb.columns

In [None]:
# Dataset Describe
airbnb.describe()

### Variables Description

**id**: The unique identifier for each listing.

**host_id**: The unique identifier for each host.

**latitude**: The latitude coordinate of the listing's location.

**longitude**: The longitude coordinate of the listing's location.

**price**: The price of the listing per night.

**minimum_nights**: The minimum number of nights required for booking.

**number_of_reviews**: The total number of reviews received for the listing.

**reviews_per_month**: The average number of reviews received per month.

**calculated_host_listings_count**: The number of listings belonging to the same host.

**availability_365**: The number of days the listing is available for booking in a year.

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
for elem in airbnb:
    print(f'No. of unique values in \033[1m{elem}\033[0m is \033[1m{airbnb[elem].nunique()}\033[0m.')

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.
airbnb.columns

In [None]:
# Drop the last_review column, which is not used in this analysis
airbnb.drop(['last_review'],axis=1,inplace=True)


In [None]:
# Drop every row that contains a null value in any column
airbnb.dropna(inplace=True)

In [None]:
# Count the number of occurrences for each room type in the Airbnb dataset
room_type_count = airbnb.room_type.value_counts().reset_index(name='count')

# Display the resulting DataFrame
room_type_count


Entire home/apt has the highest number of listings, followed by private room.
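The column names produced by `value_counts().reset_index()` differ between pandas versions (older releases label the category column `index`, while pandas 2.x uses the original column name). A hedged sketch of a version-independent pattern, shown on a hypothetical toy frame rather than the real data:

```python
import pandas as pd

# Hypothetical toy data; the real notebook would use airbnb["room_type"]
toy = pd.DataFrame({
    "room_type": [
        "Entire home/apt", "Private room", "Entire home/apt",
        "Shared room", "Entire home/apt", "Private room",
    ]
})

room_counts = (
    toy["room_type"]
    .value_counts()              # counts per category, largest first
    .rename_axis("room_type")    # name the index explicitly
    .reset_index(name="count")   # columns: room_type, count
)
print(room_counts)
```

Naming the axis explicitly means later plotting code can refer to `room_type` and `count` regardless of the installed pandas version.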



In [None]:
#Q1. what can we learn from different hosts and areas?

In [None]:
# Find the maximum calculated_host_listings_count for each host in each neighbourhood group
host_areas = airbnb.groupby(['host_name', 'neighbourhood_group'])['calculated_host_listings_count'].max().reset_index()

# Sort the resulting DataFrame by calculated_host_listings_count in descending order
host_areas=host_areas.sort_values(by='calculated_host_listings_count', ascending=False)
host_areas

We learn that the host Sonder (NYC) has the highest number of listings in Manhattan.

In [None]:
# Q2 what can we learn from room_type and their prices according to area?

In [None]:
# Calculate the mean price for each combination of 'neighbourhood_group' and 'room_type'
price_location_room_type = airbnb.groupby(['neighbourhood_group', 'room_type'])['price'].mean().sort_values(ascending=False).reset_index()

# Display the resulting DataFrame
price_location_room_type


From this DataFrame we learn that Entire home/apt listings in Manhattan have the highest average price, followed by Brooklyn.

In [None]:
# Q3 what can we learn from data ?(ex: locations,prices,reviews etc.)

In [None]:
# Count the listings with a review count in each 'neighbourhood_group' (count() tallies rows; it does not sum the reviews)
area_reviews = airbnb.groupby(['neighbourhood_group'])['number_of_reviews'].count().sort_values(ascending=False).reset_index()

# Display the resulting DataFrame
area_reviews

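Manhattan has the most reviewed listings. Note that `.count()` tallies rows (listings), so this figure is a count of listings rather than a total of reviews.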
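Because `count()` and `sum()` answer different questions, the two can rank areas differently. A small sketch on hypothetical data (the group names `North` and `South` are invented for illustration) shows the difference:

```python
import pandas as pd

# Hypothetical toy data: "North" has more listings, "South" has more reviews
toy = pd.DataFrame({
    "neighbourhood_group": ["North", "North", "North", "South", "South"],
    "number_of_reviews": [1, 2, 3, 10, 20],
})

grouped = toy.groupby("neighbourhood_group")["number_of_reviews"]

# Number of listings (rows) per group
listing_counts = grouped.count().sort_values(ascending=False)

# Total number of reviews per group
review_totals = grouped.sum().sort_values(ascending=False)

print(listing_counts)
print(review_totals)
```

If the goal is to compare how many reviews each area received, `sum()` is the aggregation to use; `count()` only measures how many listings each area has.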

In [None]:
# Count the listings with a review count for each 'room_type' (count() tallies rows; it does not sum or average the reviews)
room_reviews = airbnb.groupby('room_type')['number_of_reviews'].count().sort_values(ascending=False).reset_index()
# Display the resulting DataFrame
room_reviews

Entire home/apt has the most reviewed listings, followed by private room.

In [None]:
# Q4 which hosts are the busiest and what is the reason ?

In [None]:
# Calculate the maximum number of reviews for each host, host name, and room type combination
busy_host = airbnb.groupby(['host_id', 'host_name', 'room_type'])['number_of_reviews'].max().sort_values(ascending=False).reset_index()

# Display the first 10 rows of the resulting DataFrame
busy_host.head(10)

From this we get the ten busiest hosts, ranked by the highest review count of any single listing, together with the room type of that listing.
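Ranking hosts by the maximum review count of a single listing is one possible definition of "busiest". An alternative, sketched below under the assumption that total reviews across all of a host's listings is a better activity measure, uses named aggregation; the toy data and host names are hypothetical.

```python
import pandas as pd

# Hypothetical toy data: Ana has three moderately reviewed listings,
# Ben has one heavily reviewed listing
toy = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "host_id": [10, 10, 10, 20],
    "host_name": ["Ana", "Ana", "Ana", "Ben"],
    "number_of_reviews": [50, 60, 70, 150],
})

host_activity = (
    toy.groupby(["host_id", "host_name"])
    .agg(
        total_reviews=("number_of_reviews", "sum"),  # reviews across all listings
        max_reviews=("number_of_reviews", "max"),    # best single listing
        listings=("id", "count"),                    # number of listings
    )
    .sort_values("total_reviews", ascending=False)
    .reset_index()
)
print(host_activity)

# The same table ranked by the single-listing maximum
by_max = host_activity.sort_values("max_reviews", ascending=False)
```

Here the two rankings disagree: the host with several listings leads on total reviews, while the host with one popular listing leads on the maximum.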

In [None]:
# Q5 which hosts are charging higher price ?

In [None]:
# Calculate the highest price for each host, host name, and neighbourhood group combination
highest_price = airbnb.groupby(['host_id', 'host_name', 'neighbourhood_group'])['price'].max().sort_values(ascending=False).reset_index()

# Display the first 10 rows of the resulting DataFrame
highest_price = highest_price.head(10)
highest_price


These are the hosts who charge the highest prices to their customers.

In [None]:
# Q6 is there any traffic difference among different areas and what could be the reason for it?

In [None]:
# Count the occurrences of each combination of neighbourhood_group and room_type
traffic_areas = airbnb.groupby(['neighbourhood_group', 'room_type'])['minimum_nights'].count().sort_values(ascending=False).reset_index()

# Display the resulting DataFrame
traffic_areas


Manhattan and Brooklyn are the areas that receive the most traffic.
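The same area-by-room-type counts can also be laid out as a two-way table, which is often easier to read than a long list of group pairs. A minimal sketch with `pd.crosstab` on hypothetical toy data (area names invented for illustration):

```python
import pandas as pd

# Hypothetical toy data standing in for the airbnb DataFrame
toy = pd.DataFrame({
    "neighbourhood_group": ["North", "North", "North", "South", "South"],
    "room_type": [
        "Entire home/apt", "Entire home/apt", "Private room",
        "Private room", "Shared room",
    ],
})

# Rows are areas, columns are room types, cells are listing counts;
# margins=True adds row and column totals under the label "Total"
table = pd.crosstab(
    toy["neighbourhood_group"],
    toy["room_type"],
    margins=True,
    margins_name="Total",
)
print(table)
```

Listing counts measure supply; whether they also reflect guest traffic is an assumption that would need booking data to confirm.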

### What all manipulations have you done and insights you found?

1. The most popular listing type is "Entire home/apt," followed by "Private room."

2. The host "Sonder (NYC)" has the highest number of listings in Manhattan, with "Blueground" being the second highest.

3. "Entire home/apt" listings in Manhattan have the highest average price, followed by those in Brooklyn.

4. Among all neighbourhood groups, Manhattan has the highest number of reviewed listings.

5. The top ten busiest hosts are: 'Dona', 'Jj', 'Maya', 'Carol', 'Danielle', 'Asa', 'Wanda', 'Linda', 'Dani', and 'Angela'.

6. The hosts charging the maximum prices from customers are: 'Kathrine', 'Erin', 'Amy', 'Olson', 'Rum', 'Jessica', 'Sandra', 'Jay And Liz', 'Debra', and 'Rasmus.'

7. Manhattan is the most popular area with the highest traffic, followed by Brooklyn as the second most preferred destination. Queens ranks third in terms of visitor preference.

These insights provide valuable information about listing types, host activities, pricing trends, and popular areas for visitors in the dataset.

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# Chart - 1 visualization code
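# Select the category column by position: its name is 'index' in older pandas and 'room_type' in pandas 2.x
plt.bar(room_type_count.iloc[:, 0], room_type_count['count'])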
plt.title('Room types ')
plt.xlabel('room_types')
plt.ylabel('counts')
plt.show()



##### 1. Why did you pick the specific chart?

I chose the bar chart to know the room_type distribution because it allows for a clear visualization of the distribution of listings among different rooms. The bar chart is effective for comparing the number of listings for each room type and identifying any notable differences or patterns.

##### 2. What is/are the insight(s) found from the chart?

This visualization tells us that Entire home/apt has the largest number of listings, followed by private room. This means that most hosts prefer to list their properties as an entire home/apt.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The count of each room type shows which kinds of accommodation hosts list most often, which serves as a rough proxy for what guests book. This information helps focus on the most popular room types and ensures a good supply of the preferred accommodation types.

#### Chart - 2

In [None]:
# Chart - 2 visualization code
sns.barplot(data=host_areas.head(10),x='host_name',y='calculated_host_listings_count')
plt.xticks(rotation=45)  # Rotate host names so the labels do not overlap
plt.show()

##### 1. Why did you pick the specific chart?

I chose the bar chart for the Host Name Distribution because it allows for a clear visualization of the distribution of listings among different host names. The bar chart is effective for comparing the number of listings for each host name and identifying any notable differences or patterns.

##### 2. What is/are the insight(s) found from the chart?

We found that the host Sonder (NYC) has the highest number of listings in Manhattan, followed by Blueground and Kara.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The gained insights can potentially have a positive impact on business decisions. For example:

Identifying hosts with a large number of listings can help understand their market presence and influence on the platform. Recognizing patterns or disparities in the distribution of listings can inform strategies related to host engagement, relationship management, or incentivization.

It is possible that certain insights could highlight issues such as monopolistic behavior or an imbalance in listing distribution, which may require appropriate actions to ensure a fair and competitive marketplace. Addressing such issues can ultimately contribute to positive growth and a more balanced ecosystem.

#### Chart - 3

In [None]:
# Chart - 3 visualization code
# Create a box plot of price by neighbourhood group
sns.boxplot(x='neighbourhood_group', y='price', data=airbnb)
plt.xlabel('Neighbourhood Group')
plt.ylabel('Price')
plt.title('Price Distribution by Neighbourhood Group')
plt.show()


##### 1. Why did you pick the specific chart?

I chose the box plot for Price Distribution by Neighbourhood Group because it is an effective way to compare the price distribution across different Neighbourhood Groups. The box plot provides key statistical measures such as median, quartiles, and outliers, allowing for a comprehensive understanding of the price range and variability within each group.
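The statistics a box plot draws can also be computed directly. The sketch below uses hypothetical prices and assumes the common 1.5 × IQR whisker rule, which is the default in matplotlib and seaborn box plots:

```python
import numpy as np

# Hypothetical nightly prices with one extreme value
prices = np.array([50, 60, 70, 80, 90, 100, 110, 120, 1000], dtype=float)

# Quartiles and interquartile range
q1, median, q3 = np.percentile(prices, [25, 50, 75])
iqr = q3 - q1

# Points beyond 1.5 * IQR from the box are drawn as outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = prices[(prices < lower_fence) | (prices > upper_fence)]

print(f"Q1={q1}, median={median}, Q3={q3}, IQR={iqr}")
print(f"fences=({lower_fence}, {upper_fence}), outliers={outliers}")
```

Running the same calculation per neighbourhood group (for example with `groupby`) gives the numbers behind each box in the chart.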

##### 2. What is/are the insight(s) found from the chart?

I found that hosts with properties in Manhattan charge the highest prices, followed by hosts in Brooklyn.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The gained insights can potentially create a positive business impact. Understanding the price distribution by Neighbourhood Groups can inform various business decisions, such as setting competitive pricing strategies, targeting specific customer segments, or identifying opportunities for growth in certain areas.

#### Chart - 4

In [None]:
# Chart - 4 visualization code
sns.barplot(data=area_reviews,x='neighbourhood_group',y='number_of_reviews')
plt.show()

##### 1. Why did you pick the specific chart?

I picked the bar chart to visualize the relationship between 'neighbourhood_group' and the number of reviews because it is an effective way to showcase the distribution of reviews across different neighbourhood groups. Bar charts are suitable for displaying categorical data, making it easy to compare review counts for each neighbourhood group.

##### 2. What is/are the insight(s) found from the chart?

The chart provides a clear comparison of the number of reviews received by each 'neighbourhood_group.' It shows which neighbourhood groups have higher total review counts, indicating their popularity and overall guest satisfaction.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The gained insights can indeed help create a positive business impact. The client can identify popular neighbourhood groups with higher review counts and leverage this information to strengthen their marketing efforts. By promoting listings in these popular areas, the client can attract more guests and increase bookings, resulting in positive growth.

#### Chart - 5

In [None]:
# Create a box plot of price by room type
sns.boxplot(x='room_type', y='price', data=airbnb)
plt.xlabel('Room Type')
plt.ylabel('Price')
plt.title('Price Distribution by Room Type')
plt.show()


##### 1. Why did you pick the specific chart?

I chose the box plot for Price Distribution by Room Type because it is an effective way to compare the price distribution across different room types. The box plot allows for easy comparison of the median, quartiles, and potential outliers, providing insights into the price ranges and variability within each room type.

##### 2. What is/are the insight(s) found from the chart?

From this visualization, we found that Entire home/apt listings have the highest prices, private rooms the second highest, and shared rooms the lowest.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The gained insights can potentially create a positive business impact. Understanding the price distribution by room types can inform pricing strategies, marketing campaigns, and resource allocation decisions.
For example, identifying room types with higher median prices can help businesses focus on promoting and optimizing those higher-value listings. Understanding the spread and presence of outliers can guide pricing adjustments and identify potential opportunities for revenue growth.
It is crucial to ensure fair pricing practices and to avoid any discriminatory pricing based on room type, as this may lead to negative growth.

#### Chart - 6

In [None]:
# Chart - 6 visualization code
sns.barplot(data=busy_host.head(10),x='host_name',y='number_of_reviews',hue='room_type')
plt.title('Busiest host in terms of reviews')
plt.show()

##### 1. Why did you pick the specific chart?

The bar chart is a popular type of graph used to visually represent categorical data with rectangular bars.

##### 2. What is/are the insight(s) found from the chart?

The chart highlights the top 10 hosts with the highest number of reviews, indicating that these hosts are among the most popular and well-reviewed on Airbnb.
By observing the number of reviews for each host, the client can identify the most successful hosts and acknowledge their excellent performance.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The insights gained from this chart can help the client recognize and reward top-performing hosts. The client can foster positive relationships with these hosts, provide incentives or benefits, and encourage them to continue offering excellent service to guests.
By maintaining strong connections with top hosts, the client can improve guest satisfaction, leading to positive reviews and increased bookings.
To ensure overall growth, the client should consider implementing strategies to attract and encourage new hosts. This could involve offering training, support, and incentives to help new hosts establish a positive reputation and increase the diversity of listings.

#### Chart - 7

In [None]:
# Filter the dataset to include only relevant columns
df_price_host = airbnb[['host_name', 'price']].copy()
# sort the dataset by using sort_values() function
df_price_host=df_price_host.sort_values(by='price',ascending = False)

# Create a box plot of price by host name
plt.figure(figsize=(12, 6))  # Adjust the figure size for better visibility
sns.boxplot(x='host_name', y='price', data=df_price_host.head(10))
plt.xlabel('Host Name')
plt.ylabel('Price')
plt.title('host charges maximum price')
plt.xticks(rotation=45)  # Rotate x-axis labels for better readability
plt.show()

##### 1. Why did you pick the specific chart?

I chose the box plot for Price Distribution by Host Name because it is an effective way to compare the price distribution across different hosts. The box plot allows for easy comparison of the median, quartiles, and potential outliers in price distribution for each host, providing insights into the variation and spread of prices among hosts.

##### 2. What is/are the insight(s) found from the chart?

From this chart, I found that 'Erin', 'Kathrine', 'Olson', 'Amy', 'Jessica', 'Jonathan', 'Jay And Liz', 'Linda', 'Omri', and 'Bianca' are the ten hosts who charge the highest prices.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The gained insights can potentially create a positive business impact. Understanding the price distribution by host names can inform various business decisions, such as identifying hosts with higher-value listings, optimizing pricing strategies, or identifying opportunities for revenue growth.


#### Chart - 8

In [None]:
# Chart - 8 visualization code
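sns.barplot(data=traffic_areas,x='neighbourhood_group',y='minimum_nights',hue='room_type')
# traffic_areas['minimum_nights'] holds a row count per group, so label the axis accordingly
plt.ylabel('Number of listings')
plt.show()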

##### 1. Why did you pick the specific chart?

A bar chart is well suited to comparing, by area and room type, where guests stay the most.

##### 2. What is/are the insight(s) found from the chart?

The chart shows the number of listings for each neighbourhood group and room type (`traffic_areas` stores a row count in its `minimum_nights` column, not an average of minimum nights). It helps identify which neighbourhood groups and room types have the most listings.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The insights gained from this chart can help the client set appropriate minimum night requirements for different neighbourhood groups. Depending on factors like the type of accommodations and guest preferences in each area, the client can adjust the minimum nights to attract more bookings and increase occupancy.

#### Chart - 9

In [None]:
# Chart - 9 visualization code
sns.scatterplot(data=airbnb,x='price',y='minimum_nights',hue='neighbourhood_group')
plt.show()

##### 1. Why did you pick the specific chart?

Scatter plots are useful data visualizations that show the relationship between two numerical variables. They display individual data points as dots on a graph, where one variable is plotted on the x-axis, and the other variable is plotted on the y-axis.

##### 2. What is/are the insight(s) found from the chart?

We found that listings are concentrated at lower prices, which suggests that guests mostly stay where prices are affordable.


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

This helps create positive business growth because it lets us set competitive prices in line with customer satisfaction in each area.

#### Chart - 10

In [None]:
# Create a violin plot of Price Distribution by Neighbourhood Group and Room Type
plt.figure(figsize=(12, 8))
sns.violinplot(x='neighbourhood_group', y='price', hue='room_type', data=airbnb)
plt.xlabel('Neighbourhood Group')
plt.ylabel('Price')
plt.title('Price Distribution by Neighbourhood Group and Room Type')
plt.legend(title='Room Type')
plt.show()


##### 1. Why did you pick the specific chart?

I chose the violin plot for Price Distribution by Neighbourhood Group and Room Type because it provides a concise and informative visualization of the distribution of prices across different neighbourhood groups and room types. The violin plot allows for comparisons of the price distribution between groups and provides insights into the range, central tendency, and variation in prices within each group.

##### 2. What is/are the insight(s) found from the chart?

1. Manhattan has the highest price range for listings, followed by Brooklyn.

2. Queens and Staten Island seem to have very similar distributions.

3. The Bronx is the cheapest.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The gained insights can potentially create a positive business impact. Understanding the Price Distribution by Neighbourhood Group and Room Type can inform pricing strategies, marketing efforts, and resource allocation decisions for Airbnb hosts and property managers.
For example, analyzing the violin plot can help identify variations in price distributions between neighbourhood groups. This insight can guide pricing decisions and help hosts set competitive prices based on the characteristics and demand of each neighbourhood group.

#### Chart - 11

In [None]:
# Chart - 11 visualization code
# Create a scatter plot of Price vs. Number of Reviews
plt.scatter(airbnb['number_of_reviews'], airbnb['price'])
plt.xlabel('Number of Reviews')
plt.ylabel('Price')
plt.title('Price vs. Number of Reviews')
plt.show()


##### 1. Why did you pick the specific chart?

I chose the scatter plot for Price vs. Number of Reviews because it is a suitable chart for examining the relationship between two numerical variables. The scatter plot allows us to visualize the individual data points and identify any patterns or correlations between the Price and the Number of Reviews.

##### 2. What is/are the insight(s) found from the chart?

This visualization tells us that most guests stay where prices are lower: listings with lower prices tend to have more reviews.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The gained insights can potentially create a positive business impact. Understanding the relationship between price and the number of reviews can inform pricing strategies, marketing efforts, and customer satisfaction.
This insight can guide pricing decisions and potentially increase revenue by identifying the optimal price range to maximize customer satisfaction and review engagement.


#### Chart - 12

In [None]:
# Chart - 12 visualization code
#create a histogram of Price Distribution
plt.hist(airbnb['price'], bins=30, edgecolor='black')
plt.xlabel('Price')
plt.ylabel('Frequency')
plt.title('Price Distribution')
plt.show()



##### 1. Why did you pick the specific chart?

I chose the histogram for Price Distribution because it is a suitable chart for visualizing the distribution and frequency of prices in the Airbnb dataset. The histogram allows us to examine the shape, spread, and central tendency of the price values, providing insights into the overall distribution pattern.

##### 2. What is/are the insight(s) found from the chart?

This histogram tells us that the most common price range is 0 to 1000; most hosts set their prices within this range.
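When a few very expensive listings stretch the axis, almost all values fall into the first bin and the shape of the common range is hidden. One hedged option, sketched here on hypothetical prices, is to cap the data at a chosen percentile before binning (the 90th percentile is an arbitrary choice for this small sample):

```python
import numpy as np

# Hypothetical prices: nine ordinary listings and one extreme value
prices = np.array([40, 60, 75, 90, 100, 120, 150, 200, 300, 10000], dtype=float)

# Histogram of the raw data: the extreme value dominates the bin range
counts_raw, edges_raw = np.histogram(prices, bins=5)

# Keep only values at or below the chosen percentile, then bin again
cap = np.percentile(prices, 90)
trimmed = prices[prices <= cap]
counts_trim, edges_trim = np.histogram(trimmed, bins=5)

print("raw bins:    ", counts_raw)
print("trimmed bins:", counts_trim)
```

The trimmed histogram spreads the ordinary listings across several bins, while the raw one places nearly everything in a single bar. The excluded values should still be reported separately rather than discarded silently.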

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Understanding the Price Distribution can inform pricing strategies, identify market trends, and support revenue management decisions.
For example, analyzing the histogram can help identify popular price ranges and price points where there is high demand or customer preference. This insight can guide pricing decisions to optimize revenue and maximize customer satisfaction.
It is important to consider market dynamics and ensure fair pricing practices to avoid negative growth or customer dissatisfaction.

#### Chart - 13

In [None]:
# Chart - 13 visualization code
# Create a histogram of Number of Reviews Distribution
plt.hist(airbnb['number_of_reviews'], bins=30, edgecolor='black')
plt.xlabel('Number of Reviews')
plt.ylabel('Frequency')
plt.title('Number of Reviews Distribution')
plt.show()


##### 1. Why did you pick the specific chart?

I chose the histogram for Number of Reviews Distribution because it is a suitable chart for visualizing the distribution and frequency of the number of reviews in the Airbnb dataset. The histogram allows us to examine the frequency or count of listings falling within different ranges of the number of reviews, providing insights into the overall distribution pattern.

##### 2. What is/are the insight(s) found from the chart?

From this visualization, we found that most listings have between 0 and 100 reviews. This suggests that most hosts are able to meet customer needs, although review count alone does not measure satisfaction.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The gained insights can potentially create a positive business impact. Understanding the Number of Reviews Distribution can inform marketing strategies, customer satisfaction efforts, and reputation management decisions.
For example, analyzing the histogram can help identify popular ranges of the number of reviews, indicating high engagement and customer satisfaction. This insight can guide marketing campaigns, highlight highly reviewed listings, and contribute to building trust and attracting more customers.
It can also indicate potential issues such as limited customer engagement, low review counts, or negative feedback.
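
To see how listings spread across review-count ranges without reading bar heights off the histogram, the counts can be bucketed directly. This is a rough sketch; the bucket edges and the helper name `review_buckets` are assumptions chosen for illustration, and the toy series is made up.

```python
import pandas as pd


def review_buckets(reviews: pd.Series) -> pd.Series:
    """Count listings per review-count bucket (bucket edges are illustrative)."""
    bins = [-1, 0, 10, 100, float("inf")]
    labels = ["0", "1-10", "11-100", ">100"]
    # pd.cut uses right-closed intervals: (-1, 0], (0, 10], (10, 100], (100, inf]
    bucketed = pd.cut(reviews, bins=bins, labels=labels).astype(str)
    return bucketed.value_counts().reindex(labels, fill_value=0)


# Toy review counts for illustration only (not from the Airbnb data).
toy_reviews = pd.Series([0, 0, 3, 7, 25, 60, 150, 400, 0])
counts = review_buckets(toy_reviews)
print(counts)
```

On the real data this would be called as `review_buckets(airbnb['number_of_reviews'])`.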

#### Chart - 14 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code
# Select the numerical columns from the dataset
numerical_columns = ['price', 'minimum_nights', 'number_of_reviews', 'calculated_host_listings_count', 'availability_365']

# Create a correlation matrix of the numerical columns
correlation_matrix = airbnb[numerical_columns].corr()

# Create a heatmap of the correlation matrix
plt.figure(figsize=(10, 8))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm')
plt.title('Correlation between Numerical Variables')
plt.show()


##### 1. Why did you pick the specific chart?

I chose the heatmap for visualizing the correlation between numerical variables because it provides a clear and concise representation of the strength and direction of relationships between variables. The heatmap allows for quick identification of variables that are highly correlated or inversely correlated, enabling insights into patterns and dependencies among the numerical variables.

##### 2. What is/are the insight(s) found from the chart?

The heatmap provides insights into the correlation between numerical variables. By examining the heatmap, we can identify variables that are strongly correlated, moderately correlated, or not correlated at all.
Positive correlations (values close to 1) indicate variables that tend to increase or decrease together, while negative correlations (values close to -1) indicate variables that tend to move in opposite directions.
The heatmap also helps identify variables with weak or no correlation (values close to 0), indicating that changes in one variable have little influence on the other variable.
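
Reading individual values off a heatmap can be tedious, so the variable pairs can also be ranked by the strength of their correlation. The sketch below assumes a purely numeric DataFrame; the helper name `ranked_correlations` and the toy columns `a`, `b`, `c` are hypothetical.

```python
import numpy as np
import pandas as pd


def ranked_correlations(df: pd.DataFrame) -> pd.Series:
    """Return each pair of columns once, sorted by absolute correlation."""
    corr = df.corr()
    # Upper triangle only (k=1 skips the diagonal) so each pair appears once.
    upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(upper).stack().dropna()
    return pairs.sort_values(key=lambda s: s.abs(), ascending=False)


# Toy data for illustration only.
toy = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 10],   # perfectly correlated with a
    "c": [5, 3, 4, 1, 2],    # negatively correlated with a and b
})
ranked = ranked_correlations(toy)
print(ranked)
```

For the notebook's data, the input would be `airbnb[numerical_columns]`.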


#### Chart - 15 - Pair Plot

In [None]:
# Pair Plot visualization code
# Assuming you have loaded the Airbnb dataset into a DataFrame named 'airbnb'

# Selecting a subset of numerical columns for the pair plot
selected_columns = ['price', 'minimum_nights', 'number_of_reviews', 'calculated_host_listings_count','availability_365']

# Create a pair plot using seaborn
sns.pairplot(airbnb[selected_columns])

# Display the plot
plt.show()


##### 1. Why did you pick the specific chart?

The pairplot function in seaborn is a powerful and convenient visualization tool used for exploring relationships between multiple numeric variables in a dataset. It creates a matrix of scatter plots, histograms, or other visualizations, providing a quick and comprehensive view of the interactions between variables.

##### 2. What is/are the insight(s) found from the chart?

**Answer** The insights found from each chart were outlined in the individual chart descriptions. Some of the insights include popular neighborhood groups and room types, price distribution by neighborhood group and room type, host-wise listing counts, correlation between numerical variables, availability trends over time, and more. The insights varied based on the specific chart and the data analyzed.

#### Chart - 16

In [None]:
sns.scatterplot(data=airbnb,x='longitude',y='latitude',hue='neighbourhood_group')
plt.show()

##### 1. Why did you pick the specific chart?

The scatter plot of longitude vs. latitude is chosen to visualize the spatial distribution of data points. As the dataset contains geographical information (longitude and latitude), this type of plot is well-suited to display the data on a map-like representation.

##### 2. What is/are the insight(s) found from the chart?

From the scatter plot, we can observe the locations of various data points based on their longitude and latitude coordinates.
Clusters of data points may indicate popular areas or regions with a higher concentration of listings.
Outliers or isolated data points might represent unique or remote locations.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the gained insights from the scatter plot can have a positive business impact for hosts and businesses on the Airbnb platform.
Hosts can identify popular areas with high concentrations of listings (clusters) and consider focusing their efforts on these locations to attract more guests and increase occupancy rates.
On the other hand, isolated or outlier data points might represent less popular or remote locations. Hosts can use this information to adjust their pricing or marketing strategies in those areas or consider exploring new areas with high potential for growth.
By understanding the spatial distribution of listings, hosts can optimize their offerings and enhance the overall customer experience, potentially leading to positive reviews and increased customer satisfaction.

#### Chart - 17

In [None]:
# Map of listings priced below 100, coloured by price
airbnb[airbnb.price < 100].plot(kind='scatter', x='longitude', y='latitude',
                                label='Map of Price Distribution', c='price',
                                cmap=plt.get_cmap('jet'), colorbar=True, alpha=0.4)
plt.show()

##### 1. Why did you pick the specific chart?

The scatter plot of longitude vs. latitude, coloured by price, is chosen to visualize how prices are distributed spatially. As the dataset contains geographical information (longitude and latitude), this type of plot is well-suited to display the data on a map-like representation.

##### 2. What is/are the insight(s) found from the chart?

In the graph above, the red dots mark the rooms with a higher price among the plotted listings (those priced below 100). We can also see that the Manhattan region has more expensive rooms.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

This graph helps us identify the locations where prices are high and the locations where prices are low, which can create a positive impact on the business.

#### Chart - 18

In [None]:
#availability per year
plt.figure(figsize=(10,6))
plt.title("Neighbourhood Group vs. Availability Room",fontsize=16)
sns.boxplot(data=airbnb, x='neighbourhood_group',y='availability_365')
plt.xlabel('neighbourhood_group',fontsize=16)
plt.ylabel("availability_365",fontsize=16)
plt.show()

##### 1. Why did you pick the specific chart?

##### 2. What is/are the insight(s) found from the chart?

The box plot above shows how room availability over the year (availability_365) is distributed within each neighbourhood group.
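
The quartiles that a box plot draws can also be computed as a table, which makes the neighbourhood groups easier to compare. This is a minimal sketch assuming the `neighbourhood_group` and `availability_365` columns used above; the helper name `availability_quartiles` and the toy rows are illustrative assumptions.

```python
import pandas as pd


def availability_quartiles(df: pd.DataFrame) -> pd.DataFrame:
    """Quartiles of availability_365 per neighbourhood group (one row per group)."""
    grouped = df.groupby("neighbourhood_group")["availability_365"]
    # quantile() with a list returns a (group, quantile) index; unstack to columns.
    return grouped.quantile([0.25, 0.5, 0.75]).unstack()


# Toy rows for illustration only (not from the Airbnb data).
toy = pd.DataFrame({
    "neighbourhood_group": ["Manhattan"] * 4 + ["Brooklyn"] * 3,
    "availability_365": [0, 100, 200, 365, 10, 20, 30],
})
quartiles = availability_quartiles(toy)
print(quartiles)
```

The 0.5 column is the median line of each box, and the 0.25 and 0.75 columns are the box edges.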

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

#### Chart - 19

In [None]:
# Share of listings per neighbourhood group
group_counts = airbnb.neighbourhood_group.value_counts()
plt.figure(figsize=(10,6))
plt.title("Neighbourhood Group",fontsize = 16)
plt.pie(group_counts, labels=group_counts.index, autopct='%1.1f%%')
plt.show()


##### 1. Why did you pick the specific chart?

The pie chart is a circular statistical chart used to display categorical data as proportions or percentages of a whole. It is primarily used to show the distribution of different categories and how each category contributes to the whole.

##### 2. What is/are the insight(s) found from the chart?

The pie chart above shows that Manhattan and Brooklyn have the highest shares of Airbnb listings.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Counting the number of entries (listings) in each area helps hosts and businesses on Airbnb by identifying high-demand locations, setting competitive prices, targeting marketing efforts effectively, and exploring new growth opportunities. It enhances listing visibility, attracts guests, and maximizes bookings and revenue, leading to a positive business impact.
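
The percentages printed on the pie chart can be reproduced as a small table. The sketch below assumes a `neighbourhood_group` column; the helper name `listing_share` and the toy series are hypothetical and only illustrate the calculation.

```python
import pandas as pd


def listing_share(groups: pd.Series) -> pd.Series:
    """Percentage of listings per category, largest first, rounded to 1 decimal."""
    return groups.value_counts(normalize=True).mul(100).round(1)


# Toy labels for illustration only; the proportions are made up.
toy_groups = pd.Series(["Manhattan"] * 5 + ["Brooklyn"] * 3 + ["Queens"] * 2)
share = listing_share(toy_groups)
print(share)
```

On the notebook's data this would be `listing_share(airbnb['neighbourhood_group'])`.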

#### Chart - 20

In [None]:
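# Median price per neighbourhood; groupby sorts the neighbourhood names alphabetically
neigh_price_group = airbnb.groupby(['neighbourhood']).agg({'price':'median'}).reset_index()

plt.figure(figsize=(15,5))
# head(10) plots the first 10 neighbourhoods in that alphabetical order
sns.barplot(x = 'neighbourhood',y = 'price',data = neigh_price_group.head(10))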
plt.title("Median_price Vs Neighbourhood")
plt.show()

##### 1. Why did you pick the specific chart?

The bar chart is a popular type of graph used to visually represent categorical data with rectangular bars. It is used here to compare the median price across neighbourhoods.

##### 2. What is/are the insight(s) found from the chart?

From the bar chart, we can observe the median price for each neighbourhood shown. Some neighbourhoods may have higher median prices, indicating they are more upscale or offer premium accommodations. On the other hand, some neighbourhoods may have lower median prices, making them more budget-friendly or catering to travelers seeking affordable options.
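
Because `groupby` sorts its keys, taking `head(10)` of the grouped result gives the first ten neighbourhoods alphabetically rather than the most expensive ones. If the aim is to see the highest median prices, one possible approach is to sort before taking the top rows. This is a sketch only; the helper name `top_by_median_price` and the toy neighbourhood names are assumptions.

```python
import pandas as pd


def top_by_median_price(df: pd.DataFrame, n: int = 10) -> pd.DataFrame:
    """Return the n neighbourhoods with the highest median price, highest first."""
    return (
        df.groupby("neighbourhood")["price"]
        .median()
        .sort_values(ascending=False)
        .head(n)
        .reset_index()
    )


# Toy rows for illustration only (hypothetical neighbourhood names and prices).
toy = pd.DataFrame({
    "neighbourhood": ["Area A", "Area A", "Area B", "Area C", "Area C", "Area C"],
    "price": [100, 200, 50, 300, 500, 400],
})
top = top_by_median_price(toy, n=2)
print(top)
```

The returned frame has the same `neighbourhood` and `price` columns as `neigh_price_group`, so it could be passed to `sns.barplot` in the same way.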

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The insights gained from the bar chart can have a positive business impact. Hosts and property owners can use this information to adjust their pricing strategies and tailor their offerings to better meet customer demands. By understanding the price distribution across neighbourhoods, hosts can position their listings competitively, attracting guests who prefer specific price ranges or neighbourhoods.

For example, if a neighbourhood has a significantly higher median price, hosts can target guests seeking luxury accommodations or unique experiences. On the other hand, lower-priced neighbourhoods may attract budget-conscious travelers. By catering to different price preferences, hosts can potentially increase bookings and revenue.

However, there might be insights that lead to negative growth if the pricing strategy is not aligned with the target audience. For instance, if a host sets high prices in a neighbourhood with a predominantly budget-conscious guest base, it might result in lower bookings and negative customer feedback.

Hosts should also be cautious not to underprice listings in high-demand neighbourhoods, as it might lead to lower revenue potential. Striking the right balance between pricing and the target market's preferences is essential for a positive business impact.

In conclusion, the bar chart of neighbourhood vs. price provides valuable insights into the price distribution across different areas. Hosts can use this information to adjust their pricing strategies and target specific customer segments effectively, leading to a positive business impact. However, it is essential to understand the preferences of the target audience and align pricing accordingly to avoid potential negative growth.








## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.

Based on the graphs that were drawn and the insights gained from the data visualizations, here are some suggestions for the client to achieve their business objectives:

1. **Price Optimization Strategy:** Analyze the price distribution by neighborhood group and room type using the box plot and bar charts. The client can set competitive prices for popular room types and neighborhood groups to attract more customers. Additionally, they can adjust prices based on seasonal demand and other factors.

2. **Customer Satisfaction Improvement:** Utilize the scatter plot of price vs. number of reviews to understand the relationship between price and customer satisfaction. The client can focus on optimizing the pricing strategy to enhance customer satisfaction, which may lead to positive reviews and increased bookings.

3. **Marketing Initiatives:** The pie chart of the proportion of room types and neighborhood groups can guide marketing initiatives. The client can focus marketing efforts on promoting the most popular room types and target specific neighborhood groups with tailored campaigns to attract potential customers.

4. **Host Performance Management:** The busy_host DataFrame, sorted by the number of reviews, can help the client identify high-performing hosts. The client can incentivize and reward these hosts to encourage them to continue providing excellent service and increase overall customer satisfaction.

5. **Additional Services Implementation:** Analyze the correlation heatmap to identify relationships between different numerical variables. For example, if there is a positive correlation between the number of reviews and calculated_host_listings_count, the client can consider offering additional services or benefits to hosts with multiple listings to boost their performance.

6. **Data-Driven Decision Making:** Encourage data-driven decision-making across all aspects of the business. Use visualizations and insights to identify patterns, trends, and opportunities that inform pricing, marketing, and customer experience strategies.

7. **Focus on High-Traffic Areas:** The traffic_areas DataFrame, which shows the count of minimum_nights for different neighborhood groups and room types, can help the client identify high-traffic areas. They can focus on promoting listings in these areas to maximize occupancy and revenue.

In conclusion, the data visualizations provide valuable insights for the client to make informed decisions. By optimizing pricing, understanding customer preferences, identifying high-performing hosts, and implementing data-driven strategies, the client can enhance their business performance and achieve their business objectives effectively. It is essential to continue analyzing and monitoring data trends to stay competitive in the dynamic Airbnb market.

# **Conclusion**

Based on the analysis of the provided charts and visualizations, the following final conclusions can be drawn:

1. Neighbourhood Group Distribution:
   - Manhattan and Brooklyn are the most popular neighbourhood groups in terms of the number of listings, making them attractive locations for hosts and businesses.
   - Neighbourhood groups with more reviews are likely to be popular tourist destinations, drawing potential customers to those areas.
   - Short minimum nights suggest that many guests are travelers, staying for shorter durations, which could be a valuable insight for hosts and businesses catering to such customers.

2. Room Type Distribution:
   - "Entire home/apt" and "Private room" are the most common room types in the dataset, indicating these are the preferred choices among guests.
   - Guests staying in "Entire home/apt" tend to stay longer in a neighbourhood compared to those staying in "Private room," which can impact pricing and occupancy rates for hosts.

3. Price Distribution:
   - Prices vary across neighbourhood groups, room types, and individual hosts, suggesting that multiple factors influence pricing decisions.
   - "Entire home/apt" and "Private room" listings in Manhattan and Brooklyn command higher prices, possibly due to higher demand and location benefits.
   - Hosts such as 'Erin', 'Kathrine', 'Olson', 'Amy', 'Jessica', 'Jonathan', 'Jay And Liz', 'Linda', 'Omri', 'Bianca' charge higher prices, indicating their premium offerings.

4. Customer Reviews:
   - Positive customer experiences play a significant role in attracting more reviews and enhancing the reputation of hosts and businesses.
   - Hosts should focus on providing excellent accommodations and services to encourage more positive reviews from guests.

5. Correlations:
   - The correlation analysis did not reveal strong relationships between price, reviews, and location, suggesting that other factors may influence these variables independently.

6. Neighbourhood Insights:
   - The dataset contains a total of 221 different areas, with "Williamsburg" having the maximum number of listings, which could be a potential target area for further business development.

7. Host Insights:
   - There are a total of 37,457 hosts, and the host with host ID 219517861, "Sonder," stands out as the top host with 327 listings, showcasing their popularity and success on the platform.

8. Location Insights:
   - Among the five different locations in the dataset, Manhattan emerges as the most crowded location with 44.3% of listings, making it a highly competitive market for hosts and businesses.

9. Top Busiest Hosts:
   - The top five busiest hosts are Dona, Jj, Maya, Carol, and Danielle, indicating their popularity and the high demand for their accommodations.

10. High Price Hosts:
    - Hosts such as 'Kathrine', 'Erin', 'Amy', 'Olson', 'Rum', 'Jessica', 'Sandra', 'Jay And Liz', 'Debra', 'Rasmus' charge the highest prices, possibly offering unique and luxurious accommodations.

Overall, the analysis provides valuable insights for hosts and businesses on the Airbnb platform. Understanding customer preferences, competitive pricing, and popular locations can help hosts optimize their listings and attract more guests. By focusing on positive customer experiences, hosts can enhance their reputation and foster long-term relationships with guests, leading to business growth and success on the platform.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***