# Final Project Report: Airbnb & Neighborhood Crime
*Spring 2023, HCDE 410: Human Data Interaction*

## Introduction
### Motivation and Problem Statement
My goal is to be more informed about the consequences of homestay properties in local neighborhoods. For this project, I will limit the scope to finding a correlation between Airbnbs and crime reports in Seattle, rather than trying to prove causation. As a stretch goal, I would also like to find a way to rate the severity of the crimes, not just the number of reports.

I plan to create a map visualization of Airbnbs and reported crime in Seattle. At first glance, this can seem useful for people who are worried about safety while planning trips or vacations in the Seattle area. However, it is important to note that the presence of crime reports is not a sufficient measure of safety. The decontextualization of large datasets like the ones used in this project can lead to distorted visualizations. Thus, it is important to be careful and thoughtful when presenting these findings, especially since this topic can instill fear in the public.

### Data Selected
For this project, I plan to use the following two datasets:
* [Seattle Police Department (SPD) Crime Data (2008-present)](https://data.seattle.gov/Public-Safety/SPD-Crime-Data-2008-Present/tazs-3rd5)
    * License: Public Domain
* [Inside Airbnb (skip to Seattle, Washington, United States)](http://insideairbnb.com/get-the-data/)
    * License: [Creative Commons Attribution 4.0 International License](https://creativecommons.org/licenses/by/4.0/)

Potential ethical considerations in using these datasets are anonymity and misinterpretation of crime severity. Making visualizations of reported crime in the same neighborhoods as the Airbnb listings might discourage people from booking and affect their reviews. Another consideration is that the data only contains reported crime. Particular types of crime could be underreported, meaning the data is not completely representative of crime in Seattle.

### Background and Related Work
The effect of homestays has been researched since the rise of companies like Airbnb. A [2021 study about Airbnb and neighborhood crime](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0253315) reports findings that support “the notion that the prevalence of Airbnb listings erodes the natural ability of a neighborhood to prevent crime.” A [different study](https://thecrimereport.org/2021/08/06/is-there-a-link-between-airbnb-and-neighborhood-crime-rates/) focused on the different types of Airbnb listings and their correlation with crime. This inspired me to try to find the geospatial correlation between Airbnb clusters and the different types of crime and their severity.

## Research Questions
The goal of this project is to create a visualization of reported crime and Airbnbs in the Seattle area and confirm the relationship (or lack thereof) between Airbnbs and local crime. The motivation behind this project is to learn which Airbnbs are located in low-crime areas so that I, and other users, can be more informed about historic crime rates and the consequences of homestay properties in local neighborhoods.

**Hypothesis:** There will be more Airbnbs with higher ratings in or near locations with lower rates of reported crime.

**Research Questions:**
* Which areas have the most and least Airbnbs? Crime rates?
* Which areas have the highest and lowest Airbnb ratings?
* Which offense types are the most and least frequent?
* What is the relationship between Airbnbs and crime rates?

## Data-Cleaning and Analysis
First, I import the necessary Python libraries, including [matplotlib](https://matplotlib.org/stable/api/index.html) and the [Plotly API](https://plotly.com/python/getting-started/).

In [None]:
import pandas as pd
import plotly.express as px

import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt

from urllib.request import urlopen
import json
import csv

Then, I can start getting the data ready from my two datasets. The following is my process for ["data wrangling"](https://en.wikipedia.org/wiki/Data_wrangling), or data munging, to make the raw data easier to work with.

### [SPD Crime Data (2008-present)](https://data.seattle.gov/Public-Safety/SPD-Crime-Data-2008-Present/tazs-3rd5)
* License: Public Domain

The *SPD Crime Data* has relevant information such as the report time/date, crime against category (e.g. property or society), offense, and location (longitude and latitude). It also contains other data like the offense ID, sector, and precinct that are less relevant but could be interesting to look at. There was a change in management systems for the SPD crime data in 2019, but since I will only look at the data in 2022-2023, this should not be an issue.

Since the *SPD Crime Data* contains over one million rows and 17 columns of data, the wiser choice is to access the [data via the SODA API](https://dev.socrata.com/foundry/data.seattle.gov/tazs-3rd5) rather than downloading CSV or TSV files. The [Socrata Open Data (SODA) API](https://dev.socrata.com/) allows you to access open data resources from governments.
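
A minimal sketch of how a date-limited slice of the crime data could be requested through the SODA API, using only the standard library. The endpoint is derived from the dataset id in the link above; the column name `report_datetime`, the page size, and the example dates are assumptions that should be checked against the dataset's API documentation.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Resource endpoint built from the dataset id (tazs-3rd5) in the link above.
SODA_ENDPOINT = "https://data.seattle.gov/resource/tazs-3rd5.json"


def build_soda_url(start, end, limit=50000, offset=0,
                   date_field="report_datetime"):
    """Build a query URL for reports with start <= date_field < end.

    `date_field` is an assumed column name, not confirmed by this report.
    """
    params = {
        "$where": f"{date_field} >= '{start}' AND {date_field} < '{end}'",
        "$order": ":id",      # stable ordering so paging does not skip rows
        "$limit": limit,
        "$offset": offset,
    }
    return f"{SODA_ENDPOINT}?{urlencode(params)}"


def fetch_crime_reports(start, end, page_size=50000):
    """Page through the API and return all matching rows as a list of dicts."""
    rows, offset = [], 0
    while True:
        url = build_soda_url(start, end, limit=page_size, offset=offset)
        with urlopen(url) as response:
            page = json.load(response)
        rows.extend(page)
        if len(page) < page_size:   # a short page means no more data
            return rows
        offset += page_size


# Example call (requires network access; the dates are illustrative):
# reports = fetch_crime_reports("2022-03-24T00:00:00", "2023-03-25T00:00:00")
```

Filtering on the server with `$where` keeps the download to roughly one year of reports instead of the full table.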

### [Inside Airbnb (skip to Seattle, Washington, United States)](http://insideairbnb.com/get-the-data/)
* License: [Creative Commons Attribution 4.0 International License](https://creativecommons.org/licenses/by/4.0/)

The *Inside Airbnb* dataset contains data on Airbnb listings, reviews, and neighborhoods in Seattle, WA. The listings data in particular will be useful for map visualizations since it has the location (longitude and latitude), price, reviews, and other identifiers. The Airbnb data only shows quarterly data from the last year, up to March 24th, 2023. This works well: the SPD crime dataset is very large (17 columns and 1.05M rows), and I plan to use only the data from the past year.

In [None]:
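import csv

# Map the first column of each listing to the column at index 17.
with open('airbnb_listings.csv', mode='r', newline='', encoding='utf-8') as f:
    reader = csv.reader(f)
    next(reader, None)  # skip the header row so it is not stored as data
    d = {row[0]: row[17] for row in reader}
print(d)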

## Methodology

**Which areas have the most and least Airbnbs? Crime rates?**
To visualize this answer, I plan to create a [choropleth map using the Plotly API](https://plotly.com/python/mapbox-county-choropleth/). I will use the same date range for the Airbnb listings and the crime reports, and exclude records outside that range.
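
One way the per-area counts behind such a map could be produced is sketched below. The sample rows, the `neighbourhood` key, and the `feature_id_key` argument are hypothetical; a GeoJSON file of Seattle area boundaries would also be needed, and this report does not specify one. The plotting function relies on `px.choropleth_mapbox` and imports Plotly only when it is called.

```python
from collections import Counter


def count_by_area(records, area_key):
    """Count records per area, skipping rows where the area is missing."""
    return Counter(r[area_key] for r in records if r.get(area_key))


def plot_area_counts(counts, geojson, feature_id_key):
    """Draw a choropleth of per-area counts (needs plotly and pandas)."""
    import plotly.express as px  # lazy import so counting works without it

    data = {"area": list(counts.keys()), "count": list(counts.values())}
    return px.choropleth_mapbox(
        data,
        geojson=geojson,
        locations="area",
        featureidkey=feature_id_key,  # hypothetical, e.g. "properties.name"
        color="count",
        mapbox_style="carto-positron",
        center={"lat": 47.61, "lon": -122.33},  # roughly central Seattle
        zoom=10,
        opacity=0.6,
    )


# Made-up rows purely for illustration, not real listings.
sample_listings = [
    {"id": "1", "neighbourhood": "Ballard"},
    {"id": "2", "neighbourhood": "Ballard"},
    {"id": "3", "neighbourhood": "Fremont"},
    {"id": "4", "neighbourhood": ""},  # missing area, ignored by the count
]
listing_counts = count_by_area(sample_listings, "neighbourhood")
print(listing_counts.most_common())
```

The same `count_by_area` helper could be applied to the crime reports, provided both datasets are keyed by the same area names.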

**Which areas have the highest and lowest Airbnb ratings?**
To visualize this answer, I plan to create a representation of the ratings using the Plotly API.

**Which offense types are the most and least frequent?**
I plan to create a [heatmap using matplotlib](https://matplotlib.org/stable/gallery/images_contours_and_fields/image_annotated_heatmap.html) that contains the monthly count of each offense type in the past year in major Seattle areas (variable name MCPP).
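
The monthly counts that would feed such a heatmap could be assembled roughly as follows. This is a sketch under assumptions: the field names `report_datetime` and `offense` and the sample rows are illustrative, and the timestamps are assumed to be ISO-formatted strings beginning with `YYYY-MM`.

```python
from collections import Counter


def monthly_offense_counts(reports, date_key="report_datetime",
                           offense_key="offense"):
    """Return (offenses, months, matrix), where matrix[i][j] is the number
    of reports of offenses[i] during months[j]."""
    counts = Counter()
    for r in reports:
        month = r[date_key][:7]  # "YYYY-MM" prefix of an ISO timestamp
        counts[(r[offense_key], month)] += 1
    offenses = sorted({offense for offense, _ in counts})
    months = sorted({month for _, month in counts})
    # Counter returns 0 for offense/month pairs that never occurred.
    matrix = [[counts[(o, m)] for m in months] for o in offenses]
    return offenses, months, matrix


# Made-up reports purely for illustration.
sample_reports = [
    {"report_datetime": "2022-04-02T10:15:00", "offense": "Theft From Motor Vehicle"},
    {"report_datetime": "2022-04-20T23:40:00", "offense": "Theft From Motor Vehicle"},
    {"report_datetime": "2022-05-05T08:00:00", "offense": "Burglary/Breaking & Entering"},
]
offenses, months, matrix = monthly_offense_counts(sample_reports)
for offense, row in zip(offenses, matrix):
    print(offense, dict(zip(months, row)))
```

The resulting `matrix` is a plain list of rows that can be passed to matplotlib's `ax.imshow`, with `offenses` and `months` as the tick labels. To produce one heatmap per area, the reports could be filtered by MCPP before calling the function.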

**What is the relationship between Airbnbs and crime rates?**
I will analyze the correlation by comparing the number of Airbnb listings with the number of reported crimes in each area.
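
As one possible starting point, the comparison could use a Pearson correlation coefficient over per-area counts. The sketch below assumes both datasets have already been aggregated to the same set of area names, which may not hold without extra work since the two sources define areas differently. The area names and counts are made up. A coefficient like this only describes linear association and says nothing about causation.

```python
import math


def paired_counts(airbnb_counts, crime_counts):
    """Keep only areas present in both dicts and return aligned lists."""
    areas = sorted(set(airbnb_counts) & set(crime_counts))
    return (areas,
            [airbnb_counts[a] for a in areas],
            [crime_counts[a] for a in areas])


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-area counts, not taken from either dataset.
airbnb_counts = {"Area A": 10, "Area B": 20, "Area C": 30}
crime_counts = {"Area A": 5, "Area B": 9, "Area C": 7, "Area D": 4}

areas, xs, ys = paired_counts(airbnb_counts, crime_counts)
r = pearson_r(xs, ys)
print(f"Pearson r across {len(areas)} areas: {r:.2f}")
```

Because raw counts grow with the size of an area, normalizing both counts (for example by population or land area) before correlating may give a fairer comparison.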

I also plan to use [kepler.gl for Jupyter](https://docs.kepler.gl/docs/keplergl-jupyter) for a more complex geospatial data visualization. It will allow me to better customize the map to include the number of Airbnbs and their ratings, and the crime rates and their severity.

## Findings

## Discussion
### Limitations and Implications

## Conclusion