## NYC Airbnb Market Analysis
As a consultant working for a real estate start-up, you have collected Airbnb listing data from various sources to investigate the short-term rental market in New York. You'll analyze this data to provide insights on private rooms to the real estate company.



### Aims And Objectives

1. What are the dates of the earliest and most recent reviews? Store these values as two separate variables with your preferred names.
2. How many of the listings are private rooms? Save this into any variable.
3. What is the average listing price? Round to the nearest penny and save into a variable.
4. Combine the new variables into one DataFrame called review_dates with four columns in the following order:
   first_reviewed, last_reviewed, nb_private_rooms, and avg_price. The DataFrame should contain only one row of values.
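
The four objectives can be sketched end to end on toy data. This is only an illustration of the target shape of review_dates: the column names `last_review` and `room_type`, and all values below, are assumptions made for the example rather than details taken from the real files.

```python
import pandas as pd

# Toy stand-ins for the three source files. The column names
# `last_review` and `room_type` are assumptions for this sketch.
reviews = pd.DataFrame({
    "listing_id": [1, 2, 3],
    "last_review": ["2019-01-15", "2019-07-09", "2019-03-02"],
})
rooms = pd.DataFrame({
    "listing_id": [1, 2, 3],
    "room_type": ["Private room", "PRIVATE ROOM", "Entire home/apt"],
})
prices = pd.DataFrame({"listing_id": [1, 2, 3], "price": [100, 50.5, 200]})

# 1. Earliest and most recent review dates
review_dt = pd.to_datetime(reviews["last_review"])
first_reviewed = review_dt.min()
last_reviewed = review_dt.max()

# 2. Number of private rooms, lower-casing first so spelling variants match
nb_private_rooms = int((rooms["room_type"].str.lower() == "private room").sum())

# 3. Average price rounded to two decimal places
avg_price = round(float(prices["price"].mean()), 2)

# 4. One-row DataFrame with the columns in the requested order
review_dates = pd.DataFrame({
    "first_reviewed": [first_reviewed],
    "last_reviewed": [last_reviewed],
    "nb_private_rooms": [nb_private_rooms],
    "avg_price": [avg_price],
})
print(review_dates)
```

Wrapping each scalar in a one-element list is what makes pandas build a single row instead of raising an error about missing an index.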

### Data Gathering
Three files containing data on 2019 Airbnb listings are used to answer these questions: airbnb_price.csv, airbnb_room_type.xlsx, and airbnb_last_review.tsv.
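
Because the data is split across three files, it may help to join them into one table before analysis. The sketch below uses small in-memory stand-ins instead of the real files and assumes that every file carries a `listing_id` key with one row per listing; the `room_type` and `last_review` columns and all values are hypothetical.

```python
import io
import pandas as pd

# In-memory stand-ins for the CSV and TSV files (toy values only)
price_csv = io.StringIO("listing_id,price\n1,100 dollars\n2,50 dollars\n")
review_tsv = io.StringIO("listing_id\tlast_review\n1\t2019-05-21\n2\t2019-07-05\n")

prices = pd.read_csv(price_csv)
reviews = pd.read_csv(review_tsv, sep="\t")  # tab-separated, like a .tsv file

# The Excel sheet is represented by a plain DataFrame here
room_types = pd.DataFrame({
    "listing_id": [1, 2],
    "room_type": ["Private room", "Entire home/apt"],
})

# validate="one_to_one" raises an error if a listing_id is duplicated,
# which guards against silently multiplying rows during the join
merged = (
    prices
    .merge(room_types, on="listing_id", validate="one_to_one")
    .merge(reviews, on="listing_id", validate="one_to_one")
)
print(merged)
```

The default inner join keeps only listings present in all three sources, so comparing the row count before and after the merge shows whether any listings were dropped.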

### Importing the Airbnb Dataset

In [9]:
import pandas as pd
import datetime as dt
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from ydata_profiling import ProfileReport


In [10]:
def Price_dataset():
    """Load the listing prices from the CSV file."""
    data = pd.read_csv('Airbnb/airbnb_price.csv')
    return data

def Room_Type_dataset():
    # Load the data from the file
    data = pd.ExcelFile("Airbnb/airbnb_room_type.xlsx")
    roomtype = data.parse(0)
    return roomtype

def Review_dataset():
    # Load the data from the file
    data = pd.read_csv('Airbnb/airbnb_last_review.tsv', sep='\t')
    return data

In [11]:
Price_dataset()

Unnamed: 0,listing_id,price,nbhood_full
0,2595,225 dollars,"Manhattan, Midtown"
1,3831,89 dollars,"Brooklyn, Clinton Hill"
2,5099,200 dollars,"Manhattan, Murray Hill"
3,5178,79 dollars,"Manhattan, Hell's Kitchen"
4,5238,150 dollars,"Manhattan, Chinatown"
...,...,...,...
25204,36425863,129 dollars,"Manhattan, Upper East Side"
25205,36427429,45 dollars,"Queens, Flushing"
25206,36438336,235 dollars,"Staten Island, Great Kills"
25207,36442252,100 dollars,"Bronx, Mott Haven"


In [5]:
# Remove whitespace and string characters
def replace_values_in_column(dataframe, column_name, string_to_replace, replace_with) -> pd.DataFrame:
    """
    Replace a specific substring in a string column of a dataframe.

    Args:
        dataframe: The dataframe to be cleaned. It must be loaded before calling this function.
        column_name: The name of the column to be cleaned.
        string_to_replace: The substring to be replaced in the column.
        replace_with: The string that replaces string_to_replace.

    Returns:
        The same dataframe, modified in place, with the values replaced in the specified column.
    """
    # regex=False treats string_to_replace as literal text rather than a pattern
    dataframe[column_name] = dataframe[column_name].str.replace(string_to_replace, replace_with, regex=False)
    return dataframe

# Now you can specify the string to replace when calling the function
Price = replace_values_in_column(Price_dataset(), 'price', ' dollars', '')

# Convert the price column to numeric
Price['price'] = pd.to_numeric(Price['price'])
print(Price["price"].describe())

count    25209.000000
mean       141.777936
std        147.349137
min          0.000000
25%         69.000000
50%        105.000000
75%        175.000000
max       7500.000000
Name: price, dtype: float64
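
The summary reports a minimum price of 0, so it can be worth checking the parsed column for zero or unparseable values. The helper below is a hypothetical variant of the cleaning step, not the function used in this notebook: `parse_price` and the toy values are assumptions for illustration.

```python
import pandas as pd

def parse_price(series: pd.Series, suffix: str = " dollars") -> pd.Series:
    """Strip whitespace and a literal suffix, then convert to numbers.

    Values that still cannot be parsed become NaN instead of raising.
    """
    cleaned = series.str.strip().str.replace(suffix, "", regex=False)
    return pd.to_numeric(cleaned, errors="coerce")

# Toy input: one padded value, one unparseable value, one zero price
raw = pd.Series(["225 dollars", " 89 dollars ", "n/a", "0 dollars"])
parsed = parse_price(raw)

n_unparsed = int(parsed.isna().sum())  # entries that could not be converted
n_zero = int((parsed == 0).sum())      # listings with a price of exactly 0
print(parsed.tolist(), n_unparsed, n_zero)
```

Using errors="coerce" trades a hard failure for NaN values, so the NaN count should be inspected before computing an average.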


# Detecting any outliers


In [13]:
#Profile = ProfileReport(Price, title="Pandas Profiling Report")
#display(Profile)


## Outliers
An outlier is a data point that lies far from the other observations in a dataset. Outliers can arise naturally through variability, or they can result from issues such as human error, faulty equipment, or poor sampling. However they enter the data, outliers can strongly affect statistical analysis and machine learning because they distort calculations such as the mean and standard deviation, and they can skew hypothesis tests. In the price summary above, for example, the maximum of 7500 lies far beyond the 75th percentile of 175.
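
One common convention for flagging outliers is Tukey's interquartile range (IQR) rule, which marks values more than 1.5 IQRs outside the quartiles. The sketch below applies it to the quartiles printed in the price summary (69 and 175); the function name `iqr_fences` is hypothetical, and the rule is offered as one possible approach rather than the method used in this analysis.

```python
def iqr_fences(q1: float, q3: float, k: float = 1.5) -> tuple[float, float]:
    """Return the (lower, upper) fences of Tukey's IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Quartiles taken from the describe() output of the price column
lower, upper = iqr_fences(69.0, 175.0)
print(lower, upper)  # -90.0 334.0

# The five summary values from describe(): min, 25%, 50%, 75%, max
summary_values = [0, 69, 105, 175, 7500]
flagged = [v for v in summary_values if v < lower or v > upper]
print(flagged)  # [7500]
```

Because the lower fence is negative, this rule cannot flag the zero-priced listings; those would need a separate domain check.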

In [15]:

def Seaborn_chats(Type, Dataset, x, y=None):
    """
    Generate a seaborn plot of the requested type from a dataset.
    Args:
        Type: The type of seaborn plot to generate, e.g. "barplot", "scatterplot", or "boxplot".
        Dataset: The dataset used to generate the plot.
        x: The column plotted on the x-axis.
        y: The column plotted on the y-axis. If not provided, only the x column is plotted.
    Returns:
        The plot object created by seaborn.
    Raises:
        ValueError: If Type is not one of the supported plot types.
    """
    # Map each supported plot name to the seaborn function that draws it
    plot_functions = {
        "barplot": sns.barplot,
        "scatterplot": sns.scatterplot,
        "boxplot": sns.boxplot,
        "violinplot": sns.violinplot,
        "stripplot": sns.stripplot,
        "swarmplot": sns.swarmplot,
        "countplot": sns.countplot,
        "pointplot": sns.pointplot,
        "lmplot": sns.lmplot,
        "relplot": sns.relplot,
        "catplot": sns.catplot,
    }
    if Type not in plot_functions:
        # Fail loudly instead of returning a string that looks like a result
        raise ValueError(f"Invalid plot type: {Type}")
    sns.set(style="whitegrid")
    if Type == "countplot":
        # countplot counts occurrences of x, so y is not passed
        return sns.countplot(data=Dataset, x=x)
    return plot_functions[Type](data=Dataset, x=x, y=y)

In [23]:
Seaborn_chats("barplot", Price, "price")

