# ETL Report


This project investigates AirBNB housing prices by zipcode, using two data sources:
1. A CSV of NYC AirBNB listings found on Kaggle. This CSV lists the latitude and longitude (lat, lng) of each listing, but not the associated zipcode.
2. A call to the Python USZIPCODE library that performs a geolookup from each listing's lat and lng to get the associated zipcode.
    - This library call also retrieves an object that contains demographic data about the zipcode.
    

## Data Collection & Data Cleaning

1. The first step in the process was downloading the AirBNB dataset from Kaggle. 
2. We read this into a pandas dataframe. The resulting df contained 48,895 listings. 
3. We filtered that df to include only listings with 25 or more reviews. This gave us a dataset of 11,617 listings.
4. Once we had our base dataset, we needed to perform geocode lookups.
    - The AirBNB dataset only had lat & lng coordinates.
    - To get the associated zipcode, we wrote a function that passes each listing's coordinates from the AirBNB file to the USZIPCODE library, which returns a result set containing the zipcode and a range of demographic data for that zipcode.
    - We stored the entire zipcode_obj as a dictionary in a new column on each row, which gave us the complete zip and demographic data for each listing.
    - We wrote another function to pull out particular fields from the full zipcode result set:
        - zipcode
        - Median Home Value 
        - Median Household Income 
    - We could pull out lots of other information if we wanted to expand our analysis. 
5. After running both the zipcode_obj retrieval function and the individual key-value lookups, we stored the resulting dataframe in PostgreSQL.
6. Because the zipcode_obj was stored as a dictionary, we had to tell SQLAlchemy to treat that column as a JSON datatype so that it could properly insert the column into PostgreSQL.
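
The steps above can be sketched roughly as follows. This is an illustration, not the project's actual code: the column names (`number_of_reviews`, `latitude`, `longitude`), the function names, and the table name `airbnb_listings` are all assumptions. The lookup is passed in as a callable so the pipeline can be tried without the USZIPCODE database. The `SearchEngine.by_coordinates` call and the `median_home_value` / `median_household_income` keys reflect the `uszipcode` package as we understand it and may differ between versions.

```python
import pandas as pd

# Column names below are assumptions about the Kaggle CSV layout.
REVIEWS_COL, LAT_COL, LNG_COL = "number_of_reviews", "latitude", "longitude"
ZIP_FIELDS = ("zipcode", "median_home_value", "median_household_income")


def filter_by_reviews(df, min_reviews=25):
    """Keep only listings with at least `min_reviews` reviews."""
    return df[df[REVIEWS_COL] >= min_reviews].copy()


def make_uszipcode_lookup():
    """Return a (lat, lng) -> dict lookup backed by uszipcode.

    Imported lazily so the rest of the sketch runs without the package.
    """
    from uszipcode import SearchEngine

    search = SearchEngine()

    def lookup(lat, lng):
        # Ask for the single closest zipcode to the coordinates.
        matches = search.by_coordinates(lat, lng, returns=1)
        return matches[0].to_dict() if matches else None

    return lookup


def add_zip_columns(df, lookup):
    """Store the full lookup result per row, then pull out selected fields."""
    out = df.copy()
    out["zipcode_obj"] = [
        lookup(lat, lng) for lat, lng in zip(out[LAT_COL], out[LNG_COL])
    ]
    for field in ZIP_FIELDS:
        out[field] = [z.get(field) if z else None for z in out["zipcode_obj"]]
    return out


def store_listings(df, connection_url, table="airbnb_listings"):
    """Write the dataframe to PostgreSQL, mapping the dict column to JSON."""
    from sqlalchemy import create_engine
    from sqlalchemy.types import JSON

    engine = create_engine(connection_url)
    # Without the dtype hint, the dict column cannot be inserted as-is.
    df.to_sql(table, engine, if_exists="replace", index=False,
              dtype={"zipcode_obj": JSON})
```

A typical run would chain `filter_by_reviews`, then `add_zip_columns` with `make_uszipcode_lookup()`, then `store_listings` with a PostgreSQL connection URL. The CSV path and the connection URL are deliberately left out, since neither is given in this report.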

## Possible Reports

1. This dataset allows us to run a series of reports on AirBNB data:
    - Min, Max, Avg Value of AirBNB listings by Zipcode
        - This can be further broken down into separate categories: 
            - Entire Home
            - Private Room
            - Shared Room 
    - We can compare the AirBNB summary statistics against median home value and median household income per zipcode to look for correlations and discrepancies between them
    - Having stored the entire USZIPCODE result set for each listing, we could expand our analysis to look at other variables, such as population density
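
As a rough sketch of how these reports might be produced with pandas, assuming the stored table has been read back into a dataframe with `price`, `room_type`, `zipcode`, `median_home_value`, and `median_household_income` columns (the column and function names here are assumptions, not taken from the project code):

```python
import pandas as pd


def price_summary(df):
    """Min, max, and average listing price per zipcode and room type."""
    return (
        df.groupby(["zipcode", "room_type"])["price"]
        .agg(["min", "max", "mean"])
        .reset_index()
    )


def zipcode_comparison(df):
    """Average price per zipcode next to its demographic values.

    Returns the per-zipcode table and its pairwise correlation matrix.
    """
    per_zip = df.groupby("zipcode").agg(
        avg_price=("price", "mean"),
        # Demographic values repeat on every row of a zipcode, so take the first.
        median_home_value=("median_home_value", "first"),
        median_household_income=("median_household_income", "first"),
    )
    return per_zip, per_zip.corr()
```

`price_summary` covers the per-category breakdown, while the correlation matrix from `zipcode_comparison` gives a first look at how average listing price moves with home value and household income across zipcodes.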