## Plot all households on Google Maps

This notebook visualizes the locations of all households on Google Maps.  
The main task is to look up the latitude and longitude of each zip code and plot a heatmap of all households on the map.  
Due to data confidentiality, cell outputs are not shown.

In [None]:
!pwd

In [None]:
!ls

In [None]:
cd /Users/farahshih/Documents/Codes/Retail_Analytics/Panel_Data/

In [None]:
import datashader as ds
import datashader.transfer_functions as tf

In [None]:
import numpy as np
import pandas as pd

In [None]:
from datetime import datetime
import dateutil.parser as parser

In [None]:
import warnings
import sys
from glob import glob 

In [None]:
import os
from six.moves import cPickle 

In [None]:
from pylab import rcParams
rcParams['figure.figsize'] = 10, 7

In [None]:
from pyzipcode import Pyzipcode as pz

In [None]:
import gmaps
import gmaps.datasets

In [None]:
import itertools

In [None]:
%pylab inline

In [None]:
from matplotlib import pyplot as plt

## Data Exploration

### Load and extract data

In [None]:
## Make a list of file names
pl_path = glob('./2[0-1][0-1][0-9]/Annual_Files/panelists_*.tsv')
pl_path

In [None]:
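print("Beginning to build Panel Dataset: ")
frames = []
for path in pl_path:
    tmp = pd.read_csv(path, sep='\t')
    frames.append(tmp)
    print(path, tmp.shape)
# DataFrame.append was removed in pandas 2.0; concatenate all files once instead
pl = pd.concat(frames)
print(pl.shape)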

### Panel Dataset - Extract longitude and latitude

In [None]:
pl.shape 

The DataFrame has 607,464 rows and 58 columns.

In [None]:
pl.columns

In [None]:
pl[['panelist_zip_code','fips_state_code', 'fips_state_descr', 'fips_county_code','fips_county_descr', 'region_code',
   'scantrack_market_descr']].head(6)

Convert each zip code into a (latitude, longitude) pair.

In [None]:
def get_lat_long(zipcode):
    """Return (lat, lng) for a US zip code using pyzipcode."""
    place = pz.get(zipcode, "US")  # returns a dict containing location details
    loc = place.get('location')
    location = loc.get('lat'), loc.get('lng')
    return location
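`get_lat_long` raises if the lookup returns nothing, which is why the geocoding loop later wraps it in `try`/`except`. A minimal sketch of a defensive variant that returns `None` instead of raising, assuming the result has the same shape used above (`{'location': {'lat': ..., 'lng': ...}}`). The name `safe_lat_long` is hypothetical, and the lookup function is passed in (standing in for `pz.get`) so the sketch can run without network access; the coordinates in the fake table are illustrative, not real geocoder output.

```python
from typing import Callable, Optional, Tuple


def safe_lat_long(zipcode: int,
                  lookup: Callable[[int, str], Optional[dict]]) -> Optional[Tuple[float, float]]:
    """Return (lat, lng) for a zip code, or None if the lookup fails.

    `lookup` stands in for pz.get; it is assumed to return a dict shaped like
    {'location': {'lat': ..., 'lng': ...}}, or None for an unknown zip code.
    """
    try:
        place = lookup(zipcode, "US")
    except Exception:
        # Network or quota errors are treated as "no result"
        return None
    if not place:
        return None
    loc = place.get("location") or {}
    lat, lng = loc.get("lat"), loc.get("lng")
    if lat is None or lng is None:
        return None
    return float(lat), float(lng)


# Usage with a fake lookup table instead of the real API (illustrative coordinates)
fake_places = {94706: {"location": {"lat": 37.89, "lng": -122.30}}}
fake_lookup = lambda z, country: fake_places.get(z)
print(safe_lat_long(94706, fake_lookup))  # (37.89, -122.3)
print(safe_lat_long(99999, fake_lookup))  # None
```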

In [None]:
get_lat_long(94706)

In [None]:
test_zipcode = pl[['panelist_zip_code']]
test_zipcode.shape

In [None]:
zipcode_counts = pd.DataFrame(test_zipcode["panelist_zip_code"].value_counts())
zipcode_counts.head(6)

In [None]:
zipcode_counts.columns = ["counts"] #change column names
zipcode_counts.head(3)

In [None]:
int(zipcode_counts.loc[63125, "counts"])

There are 21,784 unique zip codes (including 4-digit zip codes).

In [None]:
test_zipcode["panelist_zip_code"].nunique()

In [None]:
zipcode_counts.shape

In [None]:
zipcode_counts.loc[zipcode_counts.index < 10000, "counts"].sum()  # There are 40560 households with only a 4-digit zip code
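Zip codes stored as integers lose any leading zero, so a code such as 02138 shows up as the 4-digit number 2138. If that is how the 4-digit values arose here (an assumption this notebook does not check), those households could be kept rather than dropped by formatting the index as zero-padded 5-character strings. A minimal sketch on made-up counts:

```python
import pandas as pd

# Toy counts indexed by integer zip code (made-up values, not panel data)
counts = pd.DataFrame({"counts": [5, 3, 7]}, index=[2138, 94706, 7030])

# Assumption: values below 10000 are 5-digit zip codes whose leading zero was dropped
counts.index = counts.index.astype(str).str.zfill(5)
print(counts.index.tolist())  # ['02138', '94706', '07030']
```

Whether a given geocoder accepts string zip codes is not established here and would need checking.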

There are 19,985 unique 5-digit zip codes.

In [None]:
zipcode_counts_5d = zipcode_counts[zipcode_counts.index >= 10000].copy()  # copy to avoid SettingWithCopyWarning
zipcode_counts_5d.shape

In [None]:
# Placeholder columns, filled in by the geocoding loop below
zipcode_counts_5d["lat"] = np.nan
zipcode_counts_5d["long"] = np.nan
zipcode_counts_5d.head(5)

In [None]:
zipcode_counts_5d.index

In [None]:
error = []

In [None]:
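## pyzipcode's Google API lookups are limited to roughly 2,700 per day,
## so zip codes are geocoded in batches by index position (here 7947 to 9999).
for i in zipcode_counts_5d.index[7947:10000]:
    try:
        lat_long = get_lat_long(i)
        # DataFrame.set_value was removed in pandas 1.0; .at sets a single cell
        zipcode_counts_5d.at[i, "lat"] = lat_long[0]
        zipcode_counts_5d.at[i, "long"] = lat_long[1]
    except Exception:
        # Record zip codes that could not be geocoded
        error.append(i)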
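Because of the daily cap, the slice `[7947:10000]` has to be edited by hand before each run. A minimal sketch of a resumable alternative: each run geocodes only the zip codes whose `lat` is still missing and stops after a daily budget. `geocode_batch`, `daily_limit`, and the fake geocoder are hypothetical; the geocoder is assumed to return a `(lat, long)` tuple or raise.

```python
import numpy as np
import pandas as pd


def geocode_batch(df, geocode, daily_limit=2700):
    """Fill missing lat/long in df (indexed by zip code) using at most daily_limit lookups.

    Returns the list of zip codes whose lookup raised an exception.
    """
    errors = []
    # Only zip codes not geocoded yet, capped at the daily budget
    pending = df.index[df["lat"].isna().to_numpy()][:daily_limit]
    for zipcode in pending:
        try:
            lat, lng = geocode(zipcode)
        except Exception:
            errors.append(zipcode)
            continue
        df.at[zipcode, "lat"] = lat
        df.at[zipcode, "long"] = lng
    return errors


# Usage with a fake geocoder (made-up coordinates) and a budget of 2 lookups per run
df = pd.DataFrame({"counts": [4, 2, 1], "lat": np.nan, "long": np.nan},
                  index=[94706, 63125, 10001])
fake = {94706: (37.89, -122.30), 63125: (38.51, -90.30)}
print(geocode_batch(df, lambda z: fake[z], daily_limit=2))  # []  (first two filled)
print(geocode_batch(df, lambda z: fake[z], daily_limit=2))  # [10001]  (resumes at the third)
```

One design caveat: a zip code that always fails stays missing and is retried on every run, so in practice the returned errors might be saved and excluded from later batches.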

In [None]:
len(error)

In [None]:
## export dataframe as a csv
zipcode_counts_5d.to_csv("/Users/farahshih/Documents/Codes/Retail_Analytics/zipcode_loc4.csv")

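Spot-check the stored household counts. These counts are used later to repeat each (lat, long) pair once per household, after the lat and long columns are converted into a list of tuples for mapping.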

In [None]:
zipcode_counts_5d["counts"].loc[63125]

In [None]:
zipcode_counts_5d["counts"].iloc[0]

### Plot all households on Google Maps

In [None]:
zipcode_counts_5d = pd.read_csv("/Users/farahshih/Documents/Codes/Retail_Analytics/Nielsen_Project/zipcode_loc_all.csv", index_col=0, parse_dates=False)

In [None]:
zipcode_counts_5d.head(3)

In [None]:
sum(pd.notnull(zipcode_counts_5d['lat']))

In [None]:
zipcode_counts_5d_clean = zipcode_counts_5d[pd.notnull(zipcode_counts_5d['lat'])]

In [None]:
sum(pd.notnull(zipcode_counts_5d_clean['lat']))

In [None]:
sum(zipcode_counts_5d_clean['counts'])
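Dropping the zip codes that could not be geocoded also drops their households, so comparing the two household totals shows how much of the panel the map covers. A small sketch on toy data; the column names match the notebook, but the numbers are made up:

```python
import numpy as np
import pandas as pd

# Toy stand-in for zipcode_counts_5d: one zip code failed to geocode (lat is NaN)
zc = pd.DataFrame({"counts": [4, 2, 1],
                   "lat": [37.89, np.nan, 40.75],
                   "long": [-122.30, np.nan, -73.99]},
                  index=[94706, 63125, 10001])

geocoded = zc["lat"].notna()
kept = zc.loc[geocoded, "counts"].sum()   # households in geocoded zip codes
total = zc["counts"].sum()                # all households with a 5-digit zip code
print(f"{kept} of {total} households kept ({kept / total:.1%})")  # 5 of 7 households kept (71.4%)
```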

In [None]:
subset = zipcode_counts_5d_clean[['lat', 'long']]
tuples_loc = [tuple(x) for x in subset.values]

In [None]:
len(tuples_loc)

In [None]:
list(itertools.repeat(tuples_loc[0], 4))  # list() materializes the iterator so its contents display

In [None]:
## Repeat each (lat, long) pair by the household count of its zip code,
## expanding to one point per household for the heatmap.
geo_locations = []
for loc, count in zip(tuples_loc, zipcode_counts_5d_clean["counts"]):
    geo_locations.extend(itertools.repeat(loc, int(count)))
geo_locations[:5]  # preview; the full list has one entry per household
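The loop above builds the expanded list one Python tuple at a time. A vectorized sketch of the same expansion uses NumPy's `np.repeat`, which repeats each row of the coordinate array `counts` times along axis 0. The toy frame stands in for `zipcode_counts_5d_clean`; its values are made up.

```python
import numpy as np
import pandas as pd

# Toy stand-in for zipcode_counts_5d_clean (made-up coordinates and counts)
clean = pd.DataFrame({"counts": [3, 1],
                      "lat": [37.89, 40.75],
                      "long": [-122.30, -73.99]},
                     index=[94706, 10001])

coords = clean[["lat", "long"]].to_numpy()                        # shape (n_zip, 2)
expanded = np.repeat(coords, clean["counts"].to_numpy(), axis=0)  # shape (n_households, 2)
geo_locations = list(map(tuple, expanded.tolist()))               # (lat, long) tuples of Python floats

print(expanded.shape)     # (4, 2)
print(geo_locations[0])   # (37.89, -122.3)
```

Depending on the gmaps version, a weighted heatmap layer that takes one point per zip code with its count as the weight may avoid the expansion altogether; check the library's documentation before relying on that.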

In [None]:
len(geo_locations)

In [None]:
type(geo_locations)

In [None]:
gmaps.configure(api_key="AI...") # Fill in the Google API key  https://github.com/pbugnion/gmaps

In [None]:
households_map = gmaps.Map()
households_map.add_layer(gmaps.Heatmap(data=geo_locations))

In [None]:
households_map