## CitySlip -- Data Analysis Details

## Zipcodes
A user enters a zip code, and the Python zipcodes package provides the corresponding city, state, and lat/long coordinate. We first verify that the zip code represents a real location, not just a PO box or military APO, by looking it up in a free ZIP code database CSV file (free-zipcode-database-Primary.csv).
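A minimal sketch of that check using only the standard library. It assumes the CSV has Zipcode and ZipCodeType columns, and that ordinary geographic zip codes are typed STANDARD while others use types such as PO BOX or MILITARY; confirm this against the header and values in your copy of the file. The function name is_standard_zip is hypothetical.

```python
import csv
from typing import TextIO


def is_standard_zip(zip_code: str, csv_file: TextIO) -> bool:
    """Return True if zip_code appears in the CSV with ZipCodeType "STANDARD".

    Assumption: the file has "Zipcode" and "ZipCodeType" columns, and
    PO boxes / military codes use other types (e.g. "PO BOX", "MILITARY").
    """
    target = zip_code.strip().zfill(5)
    for row in csv.DictReader(csv_file):
        # Leading zeros can be lost if the file was saved from a spreadsheet,
        # so pad both sides to five digits before comparing
        if row["Zipcode"].strip().zfill(5) == target:
            return row["ZipCodeType"].strip().upper() == "STANDARD"
    return False  # zip code not in the database at all
```

In use, open the file with open("free-zipcode-database-Primary.csv", newline="") and pass the file object along with the user's input; a False result means the input should be rejected before any API calls are made.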

Depending on the data we were retrieving, we used either the zip code or the lat/long coordinate as the input.

## Community Data

Onboard Informatics provides an API with access to property, community, and school data. The community data contains demographic information and a wide range of statistics about a location, specified either as a zip code or as a lat/long coordinate.

The data provided in the Community API is aggregated from a variety of public and private sources, including:

the U.S. Census and other government agencies; the USGS, EPA, FBI, and local crime agencies; the American Community Survey, Bureau of Economic Analysis, Bureau of Labor Statistics, Bureau of Transportation Statistics, CDC, Department of Defense, Federal Aviation Administration, Federal Financial Institutions Examination Council, Federal Housing Finance Agency, IRS, NCES, National Center for Health Statistics, National Park Service, Social Security, USPS, Mediamark Consumer Survey, Applied Geographic Solutions, SchoolDigger, and Niche.

We selected the following data points for the zip code:

a) Crime Risk

b) Sales Tax Rate

c) Average Temperature in January

d) Average Temperature in July

e) Age Demographics

We call the Onboard API over HTTPS with Python's http.client module. Its request syntax differed from what we were used to with APIs such as Google's and Twitter's: the URL is formatted in much the same way, but the API key is passed in a dictionary of request headers. Onboard API documentation: https://www.onboardinformatics.com

### Request
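    import http.client
    import json

    target_zip = "10001"  # example value; in the app this is the user's input zip code

    conn = http.client.HTTPSConnection("search.onboard-apis.com")
    headers = {
        'accept': "application/json",
        'apikey': "xxxxxxxxxxxxxxyyyyyyyyyyyyyyyyyyyy",  # replace with your Onboard API key
    }

    # A zip code area is identified as "ZI" followed by the zip code
    community_url = "/communityapi/v2.0.0/Area/Full/?"
    queries = "AreaId=ZI" + target_zip
    query_url = community_url + queries
    conn.request("GET", query_url, headers=headers)
    res = conn.getresponse()
    resp = json.loads(res.read())
    conn.close()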



### Response

The response returned is a very large JSON structure with hundreds of data points. One example of a field we used is:

    crime = resp['response']['result']['package']['item'][0]['crmcytotc']
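
The nested indexing above raises KeyError or IndexError if any level of the structure is missing for an area. A sketch of a small helper (dig, a hypothetical name) that follows the same path of keys and list indexes and returns a default instead of crashing:

```python
from typing import Any, Sequence, Union

Key = Union[str, int]


def dig(data: Any, path: Sequence[Key], default: Any = None) -> Any:
    """Follow a path of dict keys / list indexes; return default if any step is missing."""
    current = data
    for key in path:
        try:
            current = current[key]
        except (KeyError, IndexError, TypeError):
            # Missing key, short list, or a non-container value along the path
            return default
    return current
```

For example, dig(resp, ["response", "result", "package", "item", 0, "crmcytotc"]) returns None, rather than raising, when the crime field is absent for a zip code.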

From the response, we stored the age groups and the number of people in each group in a DataFrame and plotted them in a pie chart. We stored the other fields in a dictionary.
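A sketch of the age-group step that keeps the counts in a plain dict instead of a DataFrame. The Onboard field names for each age bracket are not given here, so the caller supplies a hypothetical label-to-field mapping; look the real field names up in the Community API documentation. matplotlib is imported inside the plotting function so the counting helper works without it.

```python
from typing import Dict, Mapping


def age_group_counts(item: Mapping[str, object], fields: Mapping[str, str]) -> Dict[str, int]:
    """Map each age-group label to its population count.

    `fields` maps a display label to the (hypothetical) API field holding that
    group's count. Missing or blank values are treated as 0.
    """
    counts = {}
    for label, field in fields.items():
        value = item.get(field) or 0
        counts[label] = int(float(value))  # API values may arrive as strings
    return counts


def plot_age_pie(counts: Mapping[str, int]) -> None:
    """Draw a pie chart of the age groups."""
    import matplotlib.pyplot as plt

    plt.pie(list(counts.values()), labels=list(counts.keys()), autopct="%1.1f%%")
    plt.axis("equal")  # keep the pie circular
    plt.title("Age Demographics")
    plt.show()
```

Passing resp['response']['result']['package']['item'][0] as item, together with a mapping you define, yields a dict that can also be loaded into a DataFrame with pd.Series(counts) if preferred.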


## Points of Interest (POIs)

We used the Google Places API to count the points of interest of several types within 5 miles of the lat/long coordinate. We selected the following POI types:

a) Liquor stores (liquor_store)

b) Gyms (gym)

c) Parks (park)

d) Shopping malls (shopping_mall)

e) Grocery stores or supermarkets (grocery_or_supermarket)

f) Movie theaters (movie_theater)

For each POI type, we built the request URL, retrieved the results with the requests library, and counted the places returned. This is similar to a class exercise.

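    import requests as req
    gkey = "YOUR_GOOGLE_API_KEY"  # placeholder API key
    target_area = {"lat": 40.7128, "lng": -74.0060}  # example lat/long
    target_radius = 8047  # 5 miles in meters (the Places API radius is in meters)
    poi_types = ["liquor_store", "gym", "park", "shopping_mall",
                 "grocery_or_supermarket", "movie_theater"]

    poi_counts = {}
    for target in poi_types:
        # Radar Search was deprecated by Google in 2017 and shut down in 2018
        target_url = "https://maps.googleapis.com/maps/api/place/radarsearch/json"
        params = {"types": target, "radius": target_radius, "key": gkey,
                  "location": "%s,%s" % (target_area["lat"], target_area["lng"])}
        places_data = req.get(target_url, params=params).json()
        # use the len function to find the count of results
        poi_counts[target] = len(places_data.get("results", []))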
            
We plotted the POI counts in a pie chart.
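POI types with a count of zero add empty, unlabeled wedges to a pie chart, so it can help to filter them out first. A sketch assuming the counts are kept in a dict keyed by place type (poi_counts, a hypothetical name); plotting uses matplotlib's pie function, imported only when a chart is drawn.

```python
from typing import List, Mapping, Tuple


def pie_inputs(poi_counts: Mapping[str, int]) -> Tuple[List[str], List[int]]:
    """Return (labels, sizes) for a pie chart, dropping zero counts and sorting largest first."""
    items = sorted(((name, count) for name, count in poi_counts.items() if count > 0),
                   key=lambda kv: kv[1], reverse=True)
    labels = [name.replace("_", " ") for name, _ in items]  # "movie_theater" -> "movie theater"
    sizes = [count for _, count in items]
    return labels, sizes


def plot_poi_pie(poi_counts: Mapping[str, int]) -> None:
    """Draw the POI pie chart."""
    import matplotlib.pyplot as plt

    labels, sizes = pie_inputs(poi_counts)
    plt.pie(sizes, labels=labels, autopct="%1.1f%%", startangle=90)
    plt.axis("equal")  # keep the pie circular
    plt.title("Points of Interest Within 5 Miles")
    plt.show()
```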

## Census Data

We used the FCC Census Block Conversions API to get the county and state for the input lat/long coordinate. We then read a Census Bureau CSV file of county population estimates (co-est2016-alldata.csv) to get the county's population for each year from 2010 through 2016. A graph shows the change in population over that period.

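API info (no key required): https://www.fcc.gov/general/census-block-conversions-api

    import pandas as pd
    import requests as req

    lat, lng = 40.7128, -74.0060  # example coordinates from the zip code lookup
    cen_block_url = ('http://data.fcc.gov/api/block/find?format=json'
                     '&latitude=%s&longitude=%s&showall=true' % (lat, lng))
    lat_lon_county = req.get(cen_block_url).json()
    county_name = lat_lon_county['County']['name']
    state_name = lat_lon_county['State']['name']

    # Read the county population estimates and lowercase every column (as strings)
    # so county and state names can be matched case-insensitively
    county_census_pop = pd.read_csv('Resources/co-est2016-alldata.csv',
                                    encoding="ISO-8859-1").apply(lambda x: x.astype(str).str.lower())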
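
Pulling the 2010-2016 estimates for the matched county can be sketched with the standard library's csv module (the code above uses pandas). This assumes the Census Bureau column names STNAME, CTYNAME, and POPESTIMATE2010 through POPESTIMATE2016, and that the county name from the FCC API may or may not include the word "County"; both function names are hypothetical.

```python
from typing import Dict, Iterable, Mapping, Optional

YEARS = range(2010, 2017)  # 2010 through 2016 inclusive


def county_population(rows: Iterable[Mapping[str, str]], county: str,
                      state: str) -> Optional[Dict[int, int]]:
    """Return {year: population estimate} for one county, or None if no row matches.

    The county name is matched case-insensitively, with or without a trailing "county".
    """
    county = county.strip().lower()
    wanted = {county, county + " county"}
    for row in rows:
        if (row["STNAME"].strip().lower() == state.strip().lower()
                and row["CTYNAME"].strip().lower() in wanted):
            return {year: int(row["POPESTIMATE%d" % year]) for year in YEARS}
    return None


def percent_change(pops: Mapping[int, int]) -> float:
    """Percent change from the first to the last year in the series."""
    first, last = pops[min(pops)], pops[max(pops)]
    return (last - first) * 100 / first
```

The rows can come from csv.DictReader over the file opened with encoding="ISO-8859-1" and newline=""; the resulting year-to-population dict plots directly as a line graph.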

## Real Estate Data (Zillow)

Zillow's API returns XML rather than JSON, so instead of using the API we used Zillow's downloadable CSV files:

MarketHealthIndex_Zip.csv -- provides an index of the real estate market health of each zip code.

Zip_Zhvi_AllHomes.csv -- provides home values (Zillow Home Value Index) by zip code by month for many years. We used data from 2014 onward because some zip codes have no data before 2014.

Zip_Zri_AllHomes.csv -- provides monthly rents (Zillow Rent Index) by zip code by month for many years. As with home values, we used data from 2014 onward because some zip codes have no data before 2014.

We read these files into pandas DataFrames. In addition to the home value and monthly rent for the input zip code, we calculated the mean home value and mean monthly rent across all zip codes in the files for the most recent month (September 2017), which we later used in the score calculation.
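A standard-library sketch of the two calculations described above: selecting the month columns from 2014 onward and averaging one month across all zip codes. It assumes each file has one column per month named in YYYY-MM form (for example 2017-09); verify this against the headers of your download. The helper names are hypothetical.

```python
import re
from statistics import mean
from typing import Iterable, List, Mapping

MONTH_COL = re.compile(r"^\d{4}-\d{2}$")  # assumed "YYYY-MM" month column names


def month_columns(fieldnames: Iterable[str], start: str = "2014-01") -> List[str]:
    """Return the month columns from `start` onward in chronological order."""
    # "YYYY-MM" strings sort chronologically, so plain string comparison works
    return sorted(c for c in fieldnames if MONTH_COL.match(c) and c >= start)


def column_mean(rows: Iterable[Mapping[str, str]], column: str) -> float:
    """Mean of one month's values across all zip codes, skipping blank cells."""
    values = [float(r[column]) for r in rows if (r.get(column) or "").strip()]
    return mean(values)
```

With rows read by csv.DictReader, month_columns(reader.fieldnames)[-1] gives the most recent month, and column_mean(rows, that_month) gives the nationwide mean used for comparison.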

If Zillow had no data for the input zip code, we used the Python zipcodes package to find surrounding zip codes and averaged their values.
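The text does not say exactly how the surrounding zip codes were chosen with the zipcodes package (for example, other zip codes in the same city), so this sketch takes the list of nearby zip codes as an input and only shows the fallback averaging. The function name is hypothetical.

```python
from statistics import mean
from typing import Iterable, Mapping, Optional


def value_with_fallback(values_by_zip: Mapping[str, Optional[float]], zip_code: str,
                        nearby_zips: Iterable[str]) -> Optional[float]:
    """Return the zip code's own value, else the mean of nearby zips that have data.

    Returns None when neither the zip code nor any neighbor has a value.
    """
    own = values_by_zip.get(zip_code)
    if own is not None:
        return own
    # Skip neighbors that are missing from the data or have no value
    neighbors = [values_by_zip[z] for z in nearby_zips if values_by_zip.get(z) is not None]
    return mean(neighbors) if neighbors else None
```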

We plotted both home values and monthly rents for the input zip code.




## School Data

We used Onboard's school API to count the schools of each type within a given radius of a lat/long coordinate. Onboard has data for public, private, and Catholic schools.

As with the community data, we call the Onboard API through http.client and pass the API key in the request headers.

### Request

    import http.client
    import json

    lat, lng = 40.7128, -74.0060  # example coordinates
    radius = 5        # example search radius
    page_size = 50    # example number of results per page

    conn = http.client.HTTPSConnection("search.onboard-apis.com")
    school_url = "/propertyapi/v1.0.0/school/snapshot?"
    headers = {
        'accept': "application/json",
        'apikey': "xxxxxxxxxxxxxxyyyyyyyyyyyyyyyyyyyy",  # replace with your Onboard API key
    }

    point = "latitude=" + str(lat) + "&longitude=" + str(lng) + "&radius=" + str(radius)
    query_url = school_url + point + "&pageSize=" + str(page_size)

    # request the first page of school data
    conn.request("GET", query_url, headers=headers)
    res = conn.getresponse()
    resp = json.loads(res.read())
    conn.close()
    
### Response
    
    from collections import Counter
    school_types = Counter(school['School']['Filetypetext'] for school in resp['school'])  # count per school type
    
We plotted the number of schools of each type in a bar chart.
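The request code fetches only the first page of results, so in a large radius some schools could be left out of the counts. A sketch of paging through all results; it assumes the API accepts a page-number parameter alongside pageSize, which should be confirmed in Onboard's documentation. fetch_page is a hypothetical callable that performs one request and returns that page's list of school records.

```python
from typing import Callable, Dict, List


def fetch_all_pages(fetch_page: Callable[[int], List[Dict]], page_size: int,
                    max_pages: int = 50) -> List[Dict]:
    """Collect results page by page until a page comes back short or empty.

    max_pages is a safety cap so a misbehaving API cannot cause an endless loop.
    """
    results: List[Dict] = []
    for page in range(1, max_pages + 1):
        batch = fetch_page(page)
        results.extend(batch)
        if len(batch) < page_size:  # a short page means there is nothing more to fetch
            break
    return results
```

Keeping the HTTP call behind fetch_page lets the paging logic be tested without network access; the combined list can then be counted by school type the same way as a single page.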

## Walkability 

We used the Walk Score API to find out how walkable (pedestrian-friendly) the area around the zip code's lat/long coordinate is.

Walk Score API: https://www.walkscore.com/professional/api.php

### Request
    import requests as req

    lat, lng = 40.7128, -74.0060  # example coordinates
    walk_api_key = "YOUR_WALKSCORE_API_KEY"  # placeholder; keep real keys out of source code
    walk_url = "http://api.walkscore.com/score"
    params = {"format": "json", "lat": lat, "lon": lng,
              "transit": 1, "bike": 1, "wsapikey": walk_api_key}
    walk_response = req.get(walk_url, params=params).json()

### Response
    walk_score = walk_response['walkscore']
    walk_description = walk_response['description']
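
Indexing walk_response['walkscore'] directly raises KeyError when the API returns an error instead of a score (for example, an invalid key or an exhausted daily quota). A sketch of a guarded version, assuming the response's status field is 1 on success as described in the Walk Score API documentation; parse_walk_score is a hypothetical name.

```python
from typing import Any, Mapping, Optional, Tuple


def parse_walk_score(walk_response: Mapping[str, Any]) -> Tuple[Optional[int], Optional[str]]:
    """Return (walkscore, description), or (None, None) if no score was returned.

    Assumption: status 1 means success; other status codes (for example a
    score that is still being calculated) carry no usable score.
    """
    if walk_response.get("status") != 1:
        return None, None
    return walk_response.get("walkscore"), walk_response.get("description")
```

Returning None lets the score calculation skip the walkability component for that run instead of failing outright.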

## CitySlip Score

After retrieving all of the data above, we applied a weighting factor to each component and combined them into a single score from 0 to 100 for the input zip code.
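The actual weights are not listed here, so the sketch below only illustrates one way to build such a score: scale each input to 0-1 (inverting measures where lower is better, such as crime risk), then take a weighted average and multiply by 100. The component names, ranges, and weights are hypothetical.

```python
from typing import Mapping


def scale(value: float, low: float, high: float, higher_is_better: bool = True) -> float:
    """Linearly map value from [low, high] onto [0, 1], clipped; invert when lower is better."""
    frac = (value - low) / (high - low)
    frac = min(max(frac, 0.0), 1.0)
    return frac if higher_is_better else 1.0 - frac


def cityslip_score(components: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Weighted average of 0-1 component scores, returned on a 0-100 scale.

    Components that are missing (e.g. no Walk Score returned) are skipped and
    the remaining weights are renormalized.
    """
    used = {name: w for name, w in weights.items() if name in components}
    total_weight = sum(used.values())
    if total_weight == 0:
        raise ValueError("no weighted components available")
    raw = sum(components[name] * w for name, w in used.items()) / total_weight
    return round(raw * 100, 1)
```

For instance, a walk score maps with scale(walk_score, 0, 100), and a home value could be scaled relative to the September 2017 dataset mean; renormalizing the weights keeps the score on the 0-100 range even when one data source has nothing for a zip code.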

We saved the zip code, city, state, county, and all score components, along with the date of the run, to a CSV file.
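A sketch of appending one result row per run with csv.DictWriter, writing the header only when the file is new or empty. The column names come from the dict passed in (for example zip, city, state, county, and each score component), with the run date added as the first column; append_score is a hypothetical name.

```python
import csv
import os
from datetime import date
from typing import Mapping, Optional


def append_score(path: str, row: Mapping[str, object], run_date: Optional[date] = None) -> None:
    """Append one result row to the CSV at path, adding a header if the file is new."""
    record = dict(row)
    record["date"] = (run_date or date.today()).isoformat()
    fieldnames = ["date"] + [k for k in row if k != "date"]
    write_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        if write_header:
            writer.writeheader()
        writer.writerow(record)
```

This assumes every run passes the same keys in the same order; if the set of score components changes, a new file (or a fixed column list) keeps the rows aligned with the header.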

A future enhancement would be to plot the scores in a scatter plot, a line plot, or both, grouped by state.

If this were a real product, we could track how a score changes over time.