
# In-depth Exploratory Data Analysis of Airbnb Dataset

## Introduction
This notebook walks through an exploratory data analysis (EDA) of the Airbnb dataset (`Airbnb_Open_Data.csv`). The objective is to uncover patterns in the data and to understand which factors shape Airbnb listings.

## Data Loading and Preparation

### Import Libraries
First, import the necessary libraries for data manipulation, visualization, and mapping.


In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
import folium
from folium.plugins import MarkerCluster
import zipfile
import requests
from io import BytesIO

### Download and Extract ZIP File from GitHub

This step involves downloading the ZIP file containing the Airbnb dataset from GitHub and extracting the CSV file from it. The `requests` library is used to download the file, and the `zipfile` and `io` libraries are used to handle the extraction. The extracted CSV file is then read into a pandas DataFrame for further analysis.


In [7]:
# URL of the ZIP file on GitHub (direct download link)
url = 'https://github.com/priyadharsh73/airbnb_eda/raw/main/airbnb_dataset.zip'
filename = 'Airbnb_Open_Data.csv'

# Download the ZIP file; the timeout stops the cell from hanging on a stalled connection
response = requests.get(url, timeout=60)

# Continue only if the download succeeded
if response.status_code == 200:
    try:
        # Open the downloaded bytes as an in-memory ZIP archive
        with zipfile.ZipFile(BytesIO(response.content), 'r') as z:
            # List all files in the ZIP
            print(z.namelist())

            # Extract and read the CSV file
            with z.open(filename) as f:
                df = pd.read_csv(f, low_memory=False)

        # Display the first few rows of the DataFrame
        print(df.head())
    except zipfile.BadZipFile:
        print("Error: The file is not a valid ZIP file.")
    except KeyError:
        print(f"Error: {filename} was not found in the ZIP file.")
else:
    print(f"Error: Failed to download the file. Status code: {response.status_code}")


['Airbnb_Open_Data.csv']
        id                                              NAME      host id  \
0  1001254                Clean & quiet apt home by the park  80014485718   
1  1002102                             Skylit Midtown Castle  52335172823   
2  1002403               THE VILLAGE OF HARLEM....NEW YORK !  78829239556   
3  1002755                                               NaN  85098326012   
4  1003689  Entire Apt: Spacious Studio/Loft by central park  92037596077   

  host_identity_verified host name neighbourhood group neighbourhood  \
0            unconfirmed  Madaline            Brooklyn    Kensington   
1               verified     Jenna           Manhattan       Midtown   
2                    NaN     Elise           Manhattan        Harlem   
3            unconfirmed     Garry            Brooklyn  Clinton Hill   
4               verified    Lyndon           Manhattan   East Harlem   

        lat      long        country  ... service fee minimum nights  \
0  40.6
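
The extraction step above can also be factored into a small helper that works on raw bytes, which makes it easy to try out without network access. The following is a minimal sketch using only the standard library; `read_zip_member` and the `example.csv` archive are hypothetical names introduced for illustration and are not part of the notebook.

```python
import zipfile
from io import BytesIO


def read_zip_member(zip_bytes: bytes, member: str) -> bytes:
    """Return the raw bytes of `member` from an in-memory ZIP archive.

    Hypothetical helper for illustration; not part of the original notebook.
    """
    with zipfile.ZipFile(BytesIO(zip_bytes)) as archive:
        if member not in archive.namelist():
            raise FileNotFoundError(
                f"{member!r} not found; archive contains {archive.namelist()}"
            )
        return archive.read(member)


# Build a tiny archive in memory so the sketch runs without a download.
buffer = BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("example.csv", "id,price\n1,100\n")

csv_bytes = read_zip_member(buffer.getvalue(), "example.csv")
print(csv_bytes.decode("utf-8"))
```

In the notebook, `response.content` would take the place of `buffer.getvalue()`, and the returned bytes could be passed on as `pd.read_csv(BytesIO(csv_bytes), low_memory=False)`.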

### Data Cleaning
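Remove rows with missing values and duplicate rows before analysis. Dropping every row that has a missing value in any column is aggressive: the `df.info()` output below shows that only one row survives this step.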

In [8]:
# Drop every row that has a missing value in any column
df.dropna(inplace=True)

# Drop duplicate rows
df.drop_duplicates(inplace=True)
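
Before dropping rows wholesale, it can help to check how much each column contributes to the missing values. The sketch below uses a made-up toy DataFrame (the column names and the 0.5 threshold are assumptions chosen for illustration, not taken from the Airbnb data) to show one possible approach: drop columns that are mostly empty first, then drop the remaining incomplete rows and duplicates.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the real data; all values are made up.
toy = pd.DataFrame({
    "id": [1, 2, 3, 3],
    "price": [100.0, np.nan, 250.0, 250.0],
    "mostly_empty_col": [np.nan, np.nan, np.nan, np.nan],
})

# Share of missing values per column, highest first.
missing_share = toy.isna().mean().sort_values(ascending=False)
print(missing_share)

# Drop columns that are mostly empty (0.5 is an arbitrary threshold).
sparse_columns = missing_share[missing_share > 0.5].index
cleaned = toy.drop(columns=sparse_columns)

# Then drop rows that are still incomplete, and exact duplicate rows.
cleaned = cleaned.dropna().drop_duplicates()

print(len(toy.dropna()), "rows survive a blanket dropna()")
print(len(cleaned), "rows survive the selective approach")
```

On the toy frame, a blanket `dropna()` discards every row because one column is entirely empty, while the selective approach keeps the rows that are complete in the remaining columns. Whether a column is safe to drop depends on the analysis, so the threshold should be treated as a starting point.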


In [9]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 1 entries, 11114 to 11114
Data columns (total 26 columns):
 #   Column                          Non-Null Count  Dtype  
---  ------                          --------------  -----  
 0   id                              1 non-null      int64  
 1   NAME                            1 non-null      object 
 2   host id                         1 non-null      int64  
 3   host_identity_verified          1 non-null      object 
 4   host name                       1 non-null      object 
 5   neighbourhood group             1 non-null      object 
 6   neighbourhood                   1 non-null      object 
 7   lat                             1 non-null      float64
 8   long                            1 non-null      float64
 9   country                         1 non-null      object 
 10  country code                    1 non-null      object 
 11  instant_bookable                1 non-null      object 
 12  cancellation_policy             1 non