# Group AECA - "Airbnb in NYC: Market Trends & Impact"

## 1. Introduction 

### 1.1 Dataset Description
The Inside Airbnb dataset is a comprehensive collection of Airbnb listings in many major cities around the world. For our project, we focus on the New York City dataset, which includes 37,784 records, each representing an individual listing. The dataset has 75 columns covering essential details such as listing name, host identity verification, neighbourhood, room type, price per night, and availability. It also includes review metrics, such as the number of reviews, last review date, and ratings, that provide insight into customer experiences. This makes the dataset useful for analyzing Airbnb market dynamics, host behaviours, pricing strategies, and customer preferences in New York City.

### 1.2 Data Source
The dataset is publicly available on [Inside Airbnb](https://insideairbnb.com/get-the-data/) under a Creative Commons Attribution 4.0 International License. The data is scraped from publicly available information on the Airbnb site, then analyzed, cleansed, and aggregated by the project's collaborators. The dataset is updated quarterly and includes information for the last twelve months. The version we are using for our project was last updated on March 1, 2025.


### 1.3 Team Members 

**Carol Zhang**: I am a fifth-year Commerce student with a minor in Data Science. I am interested in this dataset because Airbnb is a major player in the travel industry. Through my travels on exchange, I came to realize how important guest satisfaction is and how heavily other guests' reviews influenced my decisions when choosing accommodation. I want to better understand the factors that lead to positive guest experiences and make wiser choices the next time I travel.
  
**Ayuho Negishi**: I am a fifth-year Psychology student minoring in Data Science. My academic background has made me very interested in understanding human behaviour. I am especially curious how host strategies on Airbnb, like being a verified host, having multiple listings, or offering instant booking, influence guest satisfaction, booking frequency, and pricing. I want to find patterns in how these host behaviours affect the success of listings and the overall competitiveness in the Airbnb market.

**Erhan Asad Javed**: I'm a fourth-year Mathematics student with a minor in Data Science. This dataset is particularly interesting to me because it allows me to apply my practical skills in data cleaning, organization, and visualization to explore patterns in short-term rentals and urban housing dynamics. My goal is to uncover underlying trends in Airbnb pricing, host behaviour, guest satisfaction, and location-based factors affecting listings in New York City. Through this analysis, I aim to provide insights that can help travellers make informed decisions, hosts optimize their strategies, and policymakers better understand Airbnb’s impact on the housing market.

**Aaron Ma**: I’m a third-year Computer Science student. This dataset is particularly interesting to me because Airbnb is so prevalent in travel, a very common activity. As someone who relies heavily on public reviews, I would like to see how hosts can use tools such as verification and fast response times to gain an advantage in such a competitive space. I would like to better understand how Airbnb hosts gain advantages over one another through means that are less obvious than price.

### 1.4 Intended Audience 
The intended audience for this project is Airbnb hosts and short-term rental operators. By analyzing location, pricing strategies, host behaviour, and guest satisfaction factors, this project aims to provide hosts with valuable insights into how they can optimize their listings for better performance. Hosts will learn how elements such as neighbourhood, room type, cancellation policies, and review ratings influence pricing and booking frequency. This information will help them refine their strategies, improve guest experiences, and increase bookings to stay competitive in a dynamic market.


## 2. About the Data 

### 2.1 Data Abstraction 

| Attribute Name                      | Attribute Type               | Data Semantics                                        | Cardinality                          |
|-------------------------------------|------------------------------|------------------------------------------------------|--------------------------------------|
| id                                  | Nominal                      | Unique identifier for each listing                   | 37,784                               |
| listing_url                         | Nominal                      | URL for listing on Airbnb                            | 37,784                               |
| scrape_id                           | Nominal                      | Unique identifier for data collection/scrape session | 1                                    |
| last_scraped                        | Temporal                     | Date of last data collection/scrape session          | 1                                    |
| source                              | Nominal                      | Where the data was sourced from                      | 2                                    |
| name                                | Nominal                      | Name of the listing                                  | 36,057                               |
| description                         | Nominal                      | Description of the listing                           | 31,144                               |
| neighborhood_overview               | Nominal                      | Description of the neighbourhood                     | 15,119                               |
| picture_url                         | Nominal                      | URL link to pictures of the property                 | 36,983                               |
| host_id                             | Nominal                      | Unique identifier for each host                      | 22,323                               |
| host_url                            | Nominal                      | URL link to the host’s profile                       | 22,323                               |
| host_name                           | Nominal                      | Name of the host                                     | 8,495                                |
| host_since                          | Temporal                     | Date when the host started listing on Airbnb         | 5,095                                |
| host_location                       | Nominal                      | Location of the host                                 | 987                                  |
| host_about                          | Nominal                      | Host bios                                            | 11,679                               |
| host_response_time                  | Ordinal                      | Time taken by the host to respond                    | 4                                    |
| host_response_rate                  | Quantitative (Continuous)    | The response rate of the host                        | 59                                   |
| host_acceptance_rate                | Quantitative (Continuous)    | Acceptance rate of booking requests by the host      | 100                                  |
| host_is_superhost                   | Binary (Boolean)             | Whether the host is a Superhost                      | 2                                    |
| host_neighbourhood                  | Nominal                      | Neighbourhood of the host's primary location         | 521                                  |
| host_thumbnail_url                  | Nominal                      | URL link to the host’s thumbnail                     | 21,723                               |
| host_picture_url                    | Nominal                      | URL link to the host’s profile picture               | 21,723                               |
| host_listings_count                 | Quantitative (Discrete)      | Number of listings the host has on Airbnb            | 121                                  |
| host_total_listings_count           | Quantitative (Discrete)      | Total number of listings the host has (including other platforms) | 145                                  |
| host_verifications                  | Nominal                      | Types of verification that the host has gone through | 7                                    |
| host_has_profile_pic                | Binary (Boolean)             | Whether the host has a profile pic                   | 2                                    |
| host_identity_verified              | Binary (Boolean)             | Whether the host's identity is verified              | 2                                    |
| neighbourhood                        | Nominal                      | Unclear, no semantic meaning                         | 1                                    |
| neighbourhood_cleansed              | Nominal                      | Cleaned version of the specific neighbourhood location | 223                                  |
| neighbourhood_group_cleansed        | Nominal                      | Cleaned version of the neighbourhood group/area     | 5                                    |
| latitude                            | Quantitative (Continuous)    | Latitude of the listing                              | 23,085                               |
| longitude                           | Quantitative (Continuous)    | Longitude of the listing                             | 20,843                               |
| property_type                       | Nominal                      | Type of the property                                 | 69                                   |
| room_type                           | Nominal                      | Type of the room                                     | 4                                    |
| accommodates                        | Quantitative (Discrete)      | Number of people that the listing can accommodate    | 16                                   |
| bathrooms                           | Quantitative (Discrete)      | Number of bathrooms in the listing                   | 17                                   |
| bathrooms_text                      | Nominal                      | Description of the bathroom type e.g. shared or private bathroom | 31                                   |
| bedrooms                            | Quantitative (Discrete)      | Number of bedrooms in the listing                    | 14                                   |
| beds                                | Quantitative (Discrete)      | Number of beds in the listing                        | 19                                   |
| amenities                           | Nominal                      | List of amenities offered by the listing             | 30,453                               |
| price                               | Quantitative (Continuous)    | Price per night for the listing                      | 897                                  |
| minimum_nights                      | Quantitative (Discrete)      | Minimum number of nights required to book            | 121                                  |
| maximum_nights                      | Quantitative (Discrete)      | Maximum number of nights available for booking       | 255                                  |
| minimum_minimum_nights              | Quantitative (Discrete)      | Minimum value for minimum nights                     | 118                                  |
| maximum_minimum_nights              | Quantitative (Discrete)      | Maximum value for minimum nights                     | 140                                  |
| minimum_maximum_nights              | Quantitative (Discrete)      | Minimum value for maximum nights                     | 241                                  |
| maximum_maximum_nights              | Quantitative (Discrete)      | Maximum value for maximum nights                     | 240                                  |
| minimum_nights_avg_ntm              | Quantitative (Continuous)    | Average minimum nights required over the next twelve months | 429                                  |
| maximum_nights_avg_ntm              | Quantitative (Continuous)    | Average maximum nights allowed over the next twelve months | 989                                  |
| calendar_updated                    | Nominal                      | Unclear, no semantic meaning                         | 0                                    |
| has_availability                    | Binary (Boolean)             | Whether the listing has availability                 | 2                                    |
| availability_30                      | Quantitative (Discrete)      | Availability over the next 30 days                   | 31                                   |
| availability_60                      | Quantitative (Discrete)      | Availability over the next 60 days                   | 61                                   |
| availability_90                      | Quantitative (Discrete)      | Availability over the next 90 days                   | 91                                   |
| availability_365                     | Quantitative (Discrete)      | Availability over the next 365 days                  | 366                                  |
| calendar_last_scraped               | Temporal                     | Date when the availability information was last scraped from the calendar | 1                                    |
| number_of_reviews                   | Quantitative (Discrete)      | Number of reviews for the listing                    | 492                                  |
| number_of_reviews_ltm               | Quantitative (Discrete)      | Number of reviews in the last twelve months          | 175                                  |
| number_of_reviews_l30d              | Quantitative (Discrete)      | Number of reviews in the last 30 days                | 34                                   |
| first_review                         | Temporal                     | Date of the first review for the listing             | 4,284                                |
| last_review                          | Temporal                     | Date of the most recent review                       | 3,204                                |
| review_scores_rating                | Quantitative (Continuous)    | Average rating score from reviews                    | 163                                  |
| review_scores_accuracy              | Quantitative (Continuous)    | Accuracy rating from reviews                         | 152                                  |
| review_scores_cleanliness           | Quantitative (Continuous)    | Cleanliness rating from reviews                      | 180                                  |
| review_scores_checkin               | Quantitative (Continuous)    | Check-in process rating from reviews                 | 133                                  |
| review_scores_communication         | Quantitative (Continuous)    | Communication rating from reviews                    | 144                                  |
| review_scores_location              | Quantitative (Continuous)    | Location rating from reviews                         | 149                                  |
| review_scores_value                 | Quantitative (Continuous)    | Value for money rating from reviews                  | 166                                  |
| license                             | Nominal                      | Property license if available                        | 1,970                                |
| instant_bookable                    | Binary (Boolean)             | Whether the listing can be booked instantly          | 2                                    |
| calculated_host_listings_count      | Quantitative (Discrete)      | Number of listings listed by the host                | 73                                   |
| calculated_host_listings_count_entire_homes | Quantitative (Discrete)  | Number of entire home listings by the host           | 53                                   |
| calculated_host_listings_count_private_rooms | Quantitative (Discrete) | Number of private room listings hosted by the host  | 42                                   |
| calculated_host_listings_count_shared_rooms  | Quantitative (Discrete)  | Number of shared room listings hosted by the host   | 5                                    |
| reviews_per_month                   | Quantitative (Continuous)    | Average number of reviews per month for the listing  | 801                                  |
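
The cardinality column can be checked by counting the distinct non-missing values in each attribute. The snippet below is a minimal sketch using pandas and is not necessarily how the values above were produced. The helper `column_cardinality`, the small `sample` table, and the file name `listings.csv` are illustrative assumptions.

```python
import pandas as pd


def column_cardinality(df: pd.DataFrame) -> pd.Series:
    """Count distinct non-missing values in each column."""
    # nunique() ignores missing values by default, so an entirely
    # empty column (such as calendar_updated) reports 0.
    return df.nunique()


# Tiny stand-in for the real listings table (values are made up for illustration).
sample = pd.DataFrame(
    {
        "id": [1, 2, 3, 4],
        "room_type": ["Entire home/apt", "Private room", "Private room", None],
        "calendar_updated": [None, None, None, None],
    }
)

cardinality = column_cardinality(sample)
print(cardinality)

# With the real data, assuming it was saved locally as "listings.csv":
# listings = pd.read_csv("listings.csv")
# print(column_cardinality(listings))
```

Because missing values are excluded from the count, a cardinality of 0 indicates a column with no recorded values, and a cardinality equal to the number of rows indicates a unique identifier such as `id`.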
