# Part 2 - Mapping Yelp Search Results

## Objective

- For this CodeAlong, we will be working with the Yelp API results from last class. 
- You will load in the .csv.gz of your Yelp results and prepare the data for visualization.
- You will use Plotly Express to create an interactive map with all of the results.

## Tools You Will Use
- Part 1:
    - Yelp API:
        - Getting Started: 
            - https://www.yelp.com/developers/documentation/v3/get_started

    - `YelpAPI` python package
        -  "YelpAPI": https://github.com/gfairchild/yelpapi
- Part 2:

    - Plotly Express: https://plotly.com/python/getting-started/
        - With Mapbox API: https://www.mapbox.com/
        - `px.scatter_mapbox` [Documentation](https://plotly.com/python/scattermapbox/)




### Applying Code From
- [Advanced Transformations with Pandas - Part 1](https://login.codingdojo.com/m/376/12529/88086)
- [Advanced Transformations with Pandas - Part 2](https://login.codingdojo.com/m/376/12529/88088)

### Goal

- We want to create a map with every restaurant plotted as a point, with detailed information that appears when we hover over a business.
- We will use Plotly Express's `px.scatter_mapbox` function to accomplish this.
    - https://plotly.com/python/scattermapbox/
    
    - We will need a Mapbox API token for some of the options:
        - https://studio.mapbox.com/
    

# Loading Data from Part 1

In [1]:
## Plotly is not included in your dojo-env
!pip install plotly



In [2]:
# Standard Imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

import json

## importing plotly 
import plotly.express as px

In [5]:
## Load in csv.gz
df = pd.read_csv('Data/Seattle-pizza.csv.gz')
df.head()


## Required Preprocessing 

- We need to get the latitude and longitude for each business as separate columns.
- We also want to be able to show each restaurant's:
    - name
    - price range
    - address
    - and whether it offers delivery or takeout.

### Separating Latitude and Longitude

In [4]:
## use .apply pd.Series to convert a dict to columns
df['coordinates'].apply(pd.Series)


- Why didn't that work???

In [None]:
## slice out a single test coordinate
test_coord = df.loc[1, 'coordinates']
test_coord

- It's not a dictionary anymore!
    - CSV files can't store iterables (lists, dictionaries), so they get converted to strings.
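A minimal sketch of that round trip. The one-row DataFrame and the in-memory buffer (standing in for the real `.csv.gz` file) are assumptions made up for illustration:

```python
import io
import pandas as pd

# Hypothetical one-row frame; the real data comes from the Yelp API results.
demo = pd.DataFrame({
    "name": ["Example Pizza"],
    "coordinates": [{"latitude": 47.61, "longitude": -122.33}],
})

# Write to an in-memory CSV and read it straight back.
buffer = io.StringIO()
demo.to_csv(buffer, index=False)
buffer.seek(0)
reloaded = pd.read_csv(buffer)

before = type(demo.loc[0, "coordinates"])     # a dictionary going in
after = type(reloaded.loc[0, "coordinates"])  # a string coming out
print(before.__name__, "->", after.__name__)
```

The dictionary is written out using its Python string form, which is why the reloaded value uses single quotes.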

### Fixing the String-Dictionaries

- The json module has another version of load and dump called `json.loads` and `json.dumps`
    - These are designed to process STRINGS instead of files. 
    
- If we use `json.loads` we can convert our string dictionary into an actual dictionary. 
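As a quick illustration of the difference, here is a small sketch using a made-up coordinate string. `io.StringIO` is only a stand-in for an open file:

```python
import io
import json

# Hypothetical JSON text (note the double quotes).
text = '{"latitude": 47.61, "longitude": -122.33}'

# json.loads parses a STRING.
from_string = json.loads(text)

# json.load parses a FILE-like object (io.StringIO stands in for an open file).
from_file = json.load(io.StringIO(text))

# json.dumps goes the other way: dictionary -> string.
back_to_text = json.dumps(from_string)

print(type(from_string).__name__, type(back_to_text).__name__)
```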

In [None]:
## Use json.loads on the test coordinate
json.loads(test_coord)

- JSON requires double quotes!

In [None]:
## replace single ' with " 
test_coord = test_coord.replace("'", '"')
test_coord

In [None]:
## Use json.loads on the test coordinate, again
json.loads(test_coord)
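The failure and the fix can be reproduced in isolation. This sketch uses a made-up coordinate string rather than a row from the real data:

```python
import json

# Hypothetical string-dictionary, as it would look after a CSV round trip.
single_quoted = "{'latitude': 47.61, 'longitude': -122.33}"

# Single-quoted keys are not valid JSON, so this raises JSONDecodeError.
try:
    json.loads(single_quoted)
    parsed_ok = True
except json.JSONDecodeError as err:
    parsed_ok = False
    print("json.loads failed:", err)

# Swapping the quotes makes the same text valid JSON.
fixed = json.loads(single_quoted.replace("'", '"'))
print(fixed["latitude"], fixed["longitude"])
```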

### Now, how can we apply this same process to the entire column??

In [None]:
## replace ' with " (entire column)
df['coordinates'] = df['coordinates'].str.replace("'", '"')
## apply json.loads
df['coordinates'] = df['coordinates'].apply(json.loads)

In [None]:
## slice out a single test coordinate
test_coord = df.loc[5,'coordinates']
type(test_coord)

### Using Apply with pd.Series to convert a dictionary column into multiple columns

In [None]:
## use .apply pd.Series to convert a dict to columns
df['coordinates'].apply(pd.Series)

In [None]:
## Concatenate the 2 new columns and drop the original.
df = pd.concat([df, df['coordinates'].apply(pd.Series)], axis=1)
df = df.drop(columns='coordinates')
df.head(2)
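A self-contained sketch of the same expand-and-concatenate step on a made-up two-row frame. The `pd.DataFrame(...tolist())` variant is an alternative not used in this notebook; it produces the same columns and is often faster on large frames:

```python
import pandas as pd

# Hypothetical rows; the coordinates are already real dictionaries here.
demo = pd.DataFrame({
    "name": ["Example Pizza", "Sample Slice"],
    "coordinates": [
        {"latitude": 47.61, "longitude": -122.33},
        {"latitude": 47.66, "longitude": -122.31},
    ],
})

# Each dictionary becomes one row; each key becomes a column.
expanded = demo["coordinates"].apply(pd.Series)

# Alternative: build the frame directly from the list of dictionaries.
expanded_fast = pd.DataFrame(demo["coordinates"].tolist(), index=demo.index)

# Attach the new columns and drop the original dictionary column.
demo = pd.concat([demo.drop(columns="coordinates"), expanded], axis=1)
print(demo)
```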

## Creating a Simple Map

### Register for the Mapbox API

Mapbox API: https://www.mapbox.com/

In [None]:
## Load in mapbox api credentials from .secret
with open('/Users/scyjt/.secret/mapbox.json') as f:
    login = json.load(f)
login.keys()
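The cell above assumes a JSON file that holds the token under an `api_key` entry (the key name used when setting the token). A sketch of creating and reading such a file; the temporary directory and the placeholder token are hypothetical, so substitute your own path and token:

```python
import json
import tempfile
from pathlib import Path

# Hypothetical location and placeholder token, for illustration only.
secret_dir = Path(tempfile.mkdtemp())
secret_file = secret_dir / "mapbox.json"
secret_file.write_text(json.dumps({"api_key": "YOUR-MAPBOX-TOKEN"}))

# Same pattern as the notebook cell: open the file and parse it with json.load.
with open(secret_file) as f:
    login = json.load(f)

print(login.keys())
```

Keeping the token in a file outside the project folder means it is not accidentally committed along with the notebook.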

- Use the Plotly Express `set_mapbox_access_token` function

In [None]:
## set mapbox token
px.set_mapbox_access_token(login['api_key'])

In [None]:
## use scatter_mapbox for M.V.P map
px.scatter_mapbox(df, lat='latitude', lon='longitude', mapbox_style='open-street-map')

### Adding Hover Data

- We want to show each restaurant's:
    - name
    - price range
    - address
    - and whether it offers delivery or takeout.
    
    
- We can use the `hover_name` and `hover_data` arguments for `px.scatter_mapbox` to add this info!

In [None]:
## add hover_name (name) and hover_data for price,rating,location
px.scatter_mapbox(df, lat='latitude', lon='longitude', mapbox_style='open-street-map',
                  hover_name='name', hover_data=['price', 'rating', 'location'])

### Fixing the Location Column

In [None]:
## slice out a test address
test_addr =df.loc[0, 'location']
test_addr

> Also a string-dictionary...

In [None]:
## replace ' with "
df['location'] = df['location'].str.replace("'", '"')
df

In [None]:
## apply json.loads
df['location'] = df['location'].apply(json.loads)
df

> Ruh roh....

- Hmm, let's slice out a test_address again and let's write a function to accomplish this instead.
    - We can use try and except in our function to get around the errors.

### Fixing Addresses - with a custom function


In [None]:
## slice out test address 
test_addr = df.loc[0, 'location']
test_addr

In [None]:
## write a function to just run json.loads on the address


In [None]:
## test applying our function


- It worked! Now let's save this as a new column (display_location),
and then let's investigate the businesses that had an "ERROR".

In [None]:
### save a new display_location column using our function


In [None]:
## filter for businesses with display_location == "ERROR"


In [None]:
## slice out a new test address and inspect
test_addr = df.loc[0, 'location']
test_addr

> After some more investigation, we would find a few issues with these "ERROR" rows.
1. They contained None.
2. They contained an apostrophe in the name.
3. ...?

### Possible Fixes (if we care to/have the time)


- Use Regular Expressions to find and fix the display addresses with "'" in them
- Use string split to split on the `display_address` key.
    - Then use string methods to clean up
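Another possible fix, not used in this notebook, is `ast.literal_eval` from the standard library, which parses Python literals directly and so tolerates `None` and apostrophes. A sketch on a made-up record shaped like a Yelp location entry (the field values are hypothetical):

```python
import ast
import json

# Hypothetical record with both problem cases: None and an apostrophe.
record = {
    "address1": "123 Example St",
    "address2": None,
    "display_address": ["Joe's Pizza Plaza", "Seattle, WA 98101"],
}
raw = str(record)  # the string form a CSV round trip would leave behind

# The quote-swap approach fails: None is not JSON, and the apostrophe
# becomes a stray double quote.
try:
    json.loads(raw.replace("'", '"'))
    json_worked = True
except json.JSONDecodeError:
    json_worked = False

# ast.literal_eval reads the original string as a Python literal.
parsed = ast.literal_eval(raw)
print(json_worked, parsed["display_address"])
```

Rows with missing values are not strings at all, so they would still need a try/except or a null check before parsing.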

### Moving Forward without those rows (for now)

In [None]:
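## write a function to run json.loads on the address, returning 'ERROR' on failure
def fix_address(test_addr):
    try:
        return json.loads(test_addr)
    ## JSONDecodeError: text is not valid JSON; TypeError: value is not a string (e.g. missing)
    except (json.JSONDecodeError, TypeError):
        return 'ERROR'

In [None]:
## test applying our function
df['location'].apply(fix_address)

In [None]:
## save a new display_location column using our function
df['display_location'] = df['location'].apply(fix_address)

In [None]:
## filter for businesses with display_location == "ERROR"
errors = df[df['display_location']=='ERROR']
errors.head()

In [None]:
## slice out a new test address and inspect
test_addr = df.loc[854, 'location']
test_addr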

In [None]:
## remove any rows where display_location == 'ERROR'
df = df.loc[df['display_location'] != 'ERROR'].copy()
df.head(4)

- We want the "display_address" key from the "display_location" dictionaries.
- We could use a .apply and a lambda to slice out the desired key.

In [None]:
## use apply and lambda to slice correct key
df['display_address'] = df['display_location'].apply(lambda x: x['display_address'])

- Almost done! We want to convert display_address to a string instead of a list of strings.
- We can use the string method .join to do so!

In [None]:
## slice out a test_address
test_add =df.loc[339, 'display_address']
test_add

In [None]:
## test using .join with a "\n"
'\n'.join(test_add)

In [None]:
## apply the join to every row with a lambda
df['Address'] = df['display_address'].apply(lambda x: '\n'.join(x))

### Final Map

In [None]:
## make our final map and save as a variable


#### HTML Uses `<br>` instead of `\n`

In [None]:
## remake the final address column with <br> instead 

## plot the final map

In [None]:
## use fig.write_html to save map
