# Coffee Shops of America
## Where can I find a good coffee shop and how does it compare to the rest of the country?

Like a lot of people, I love coffee. At first, I only drank it to wake myself up in the morning (which I still do). Lately, though, I've started to enjoy well-made coffee when I get the chance. There will always be times when I need to stop by Starbucks quickly because I'm late to class, but when I have the time I love to grab a well-made cup.

I try to find coffee shops that I have never been to before, and I come back to the ones I like the most. However, I don't always have the time to visit every shop I want to, and I would like to drink as much good coffee as I possibly can.

Data analysis can help with this. Using data about coffee shops from the FourSquare API, I can find which coffee shops around me are most likely to have the best cups of joe. This can be done not only for Atlanta (my hometown), but for almost anywhere in the country!

In this project, coffee shops from around Atlanta and other parts of the country will be compiled and analyzed to determine which places have the best coffee according to other people's ratings. Various statistical analyses will be done to get a better idea of which cities and coffee shops are most likely to have the best coffee.

## Data

The data for this project was obtained using the FourSquare API. To achieve the goals of this project, the data collected for each city consists of:

* 50 coffee shops within a 15 km radius of the city center
* The geographical coordinates of each coffee shop
* The average rating of each coffee shop on a 10-point scale
* The number of ratings each coffee shop has received on FourSquare

This data will allow us to do the required statistical analyses to achieve our goals.

## Libraries Required

We will first import the necessary libraries and packages for this project:

In [114]:
import numpy as np
import pandas as pd
import json

!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim

import requests
from pandas.io.json import json_normalize

import matplotlib.pyplot as plt
import matplotlib.cm as cm
import matplotlib.colors as colors

!conda install -c conda-forge folium=0.5.0 --yes
import folium

import math

print('Libraries imported!')

Libraries imported!


Next, we need the account credentials required to obtain data from the FourSquare API:

In [115]:
CLIENT_ID = 'DCRK2QOAWX4FSMDUJPGNPTIJZCQNRRER0I5NX0YKOUGF1GGV' # your Foursquare ID
CLIENT_SECRET = '20TZQPRCMC2ZIEMCPJHLP2RAR4RKCPISIAMRGLNACVSK2U1G' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: DCRK2QOAWX4FSMDUJPGNPTIJZCQNRRER0I5NX0YKOUGF1GGV
CLIENT_SECRET:20TZQPRCMC2ZIEMCPJHLP2RAR4RKCPISIAMRGLNACVSK2U1G


## Data Retrieval

To obtain data from the FourSquare API, the latitude and longitude of the city center are needed, along with a radius (in meters) within which to search for coffee shops. To search for coffee shops specifically, the query in the URL request is set to 'coffee'.

In [133]:
city = 'Atlanta Metropolitan Area, GA'
latitude = 33.74278
longitude = -84.38889
radius = 15000

url = 'https://api.foursquare.com/v2/venues/search?&client_id={}&client_secret={}&v={}&ll={},{}&query={},&radius={}&limit={}&offset={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            latitude, 
            longitude,
            'coffee',
            radius,
            50,
            0)
result = requests.get(url).json()
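
As a possible alternative to formatting the query string by hand, the search URL could be assembled from a dictionary of named parameters using the standard library. This is only a sketch: `build_search_url` is a hypothetical helper name, the credentials passed in the example are made-up placeholders, and the endpoint and parameter names are simply the ones that appear in the cell above.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Same endpoint as the hand-built URL above
SEARCH_ENDPOINT = 'https://api.foursquare.com/v2/venues/search'

def build_search_url(client_id, client_secret, version, lat, lng,
                     query='coffee', radius=15000, limit=50, offset=0):
    """Hypothetical helper: build the venue search URL from named parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),  # "latitude,longitude" of the city center
        'query': query,
        'radius': radius,                # meters
        'limit': limit,
        'offset': offset,
    }
    # urlencode escapes each value, so no separators have to be typed by hand
    return SEARCH_ENDPOINT + '?' + urlencode(params)

# Placeholder credentials, for illustration only
example_url = build_search_url('MY_ID', 'MY_SECRET', '20180605', 33.74278, -84.38889)
example_params = parse_qs(urlsplit(example_url).query)
print(example_params['ll'], example_params['query'])
```

The same dictionary could instead be handed to `requests.get` through its `params` argument, which does the encoding itself.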

In [117]:
results = result['response']['venues']
venues_list = []
venues_list.append([(city, latitude, longitude,v['id'], v['name'],v['location']['lat'],
                   v['location']['lng']) for v in results])
atl_coffee = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
atl_coffee.columns = ['City','City Latitude','City Longitude','Coffee Shop id','Coffee Shop Name',
                     'Coffee Shop Latitude','Coffee Shop Longitude']

atl_coffee

| | City | City Latitude | City Longitude | Coffee Shop id | Coffee Shop Name | Coffee Shop Latitude | Coffee Shop Longitude |
|---|---|---|---|---|---|---|---|
| 0 | Atlanta Metropolitan Area, GA | 33.74278 | -84.38889 | 5dd2ad70eb2f3e00088e6e53 | Resurgens Coffee & News | 33.74857 | -84.39088 |
| 1 | Atlanta Metropolitan Area, GA | 33.74278 | -84.38889 | 4e495987ae6014a2fdc4ccf9 | Revelator Coffee + Little Tart Bakeshop | 33.746074 | -84.372786 |
| 2 | Atlanta Metropolitan Area, GA | 33.74278 | -84.38889 | 5ae5ea6c5d891b002c9afa08 | Rosie's Coffee Cafe | 33.753458 | -84.402501 |
| 3 | Atlanta Metropolitan Area, GA | 33.74278 | -84.38889 | 4cd2349dfd42199c21ffc56e | Condesa Coffee | 33.759731 | -84.371839 |
| 4 | Atlanta Metropolitan Area, GA | 33.74278 | -84.38889 | 4fe220fae4b02702d7d533c0 | Dancing Goats Coffee Bar | 33.771385 | -84.367509 |
| 5 | Atlanta Metropolitan Area, GA | 33.74278 | -84.38889 | 42225f00f964a520b91f1fe3 | Aurora Coffee | 33.767034 | -84.34952 |
| 6 | Atlanta Metropolitan Area, GA | 33.74278 | -84.38889 | 4a524805f964a5207cb11fe3 | Caribou Coffee | 33.762592 | -84.385392 |
| 7 | Atlanta Metropolitan Area, GA | 33.74278 | -84.38889 | 4a871f92f964a520b60220e3 | Drip Coffee Shop | 33.740605 | -84.357204 |
| 8 | Atlanta Metropolitan Area, GA | 33.74278 | -84.38889 | 4a0857f3f964a520b6731fe3 | Caribou Coffee | 33.760351 | -84.386297 |
| 9 | Atlanta Metropolitan Area, GA | 33.74278 | -84.38889 | 4a030291f964a52079711fe3 | Caribou Coffee | 33.782122 | -84.380645 |


Now we're going to do the same thing with some other cities that are well known for coffee. The other cities we will extract data from are:

* New York City
* Seattle
* San Francisco
* Chicago
* Washington, D.C.
* Boston
* Denver

We will repeat the same process below with the other cities:

In [118]:
city = 'New York City, NY'
latitude = 40.7128
longitude = -74.0060

url = 'https://api.foursquare.com/v2/venues/search?&client_id={}&client_secret={}&v={}&ll={},{}&query={},&radius={}&limit={}&offset={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            latitude, 
            longitude,
            'coffee',
            radius,
            50,
            0)
result = requests.get(url).json()

In [119]:
results = result['response']['venues']
venues_list = []
venues_list.append([(city, latitude, longitude,v['id'], v['name'],v['location']['lat'],
                   v['location']['lng']) for v in results])
nyc_coffee = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
nyc_coffee.columns = ['City','City Latitude','City Longitude','Coffee Shop id','Coffee Shop Name',
                     'Coffee Shop Latitude','Coffee Shop Longitude']

In [120]:
city = 'Seattle, WA'
latitude = 47.6062
longitude = -122.3321

url = 'https://api.foursquare.com/v2/venues/search?&client_id={}&client_secret={}&v={}&ll={},{}&query={},&radius={}&limit={}&offset={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            latitude, 
            longitude,
            'coffee',
            radius,
            50,
            0)
result = requests.get(url).json()

In [121]:
results = result['response']['venues']
venues_list = []
venues_list.append([(city, latitude, longitude,v['id'], v['name'],v['location']['lat'],
                   v['location']['lng']) for v in results])
sea_coffee = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
sea_coffee.columns = ['City','City Latitude','City Longitude','Coffee Shop id','Coffee Shop Name',
                     'Coffee Shop Latitude','Coffee Shop Longitude']

In [122]:
city = 'San Francisco, CA'
latitude = 37.7749
longitude = -122.4194

url = 'https://api.foursquare.com/v2/venues/search?&client_id={}&client_secret={}&v={}&ll={},{}&query={},&radius={}&limit={}&offset={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            latitude, 
            longitude,
            'coffee',
            radius,
            50,
            0)
result = requests.get(url).json()

In [123]:
results = result['response']['venues']
venues_list = []
venues_list.append([(city, latitude, longitude,v['id'], v['name'],v['location']['lat'],
                   v['location']['lng']) for v in results])
sf_coffee = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
sf_coffee.columns = ['City','City Latitude','City Longitude','Coffee Shop id','Coffee Shop Name',
                     'Coffee Shop Latitude','Coffee Shop Longitude']

In [124]:
city = 'Chicago, IL'
latitude = 41.8781
longitude = -87.6298

url = 'https://api.foursquare.com/v2/venues/search?&client_id={}&client_secret={}&v={}&ll={},{}&query={},&radius={}&limit={}&offset={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            latitude, 
            longitude,
            'coffee',
            radius,
            50,
            0)
result = requests.get(url).json()

In [125]:
results = result['response']['venues']
venues_list = []
venues_list.append([(city, latitude, longitude,v['id'], v['name'],v['location']['lat'],
                   v['location']['lng']) for v in results])
chi_coffee = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
chi_coffee.columns = ['City','City Latitude','City Longitude','Coffee Shop id','Coffee Shop Name',
                     'Coffee Shop Latitude','Coffee Shop Longitude']

In [126]:
city = 'Washington DC'
latitude = 38.9072
longitude = -77.0369

url = 'https://api.foursquare.com/v2/venues/search?&client_id={}&client_secret={}&v={}&ll={},{}&query={},&radius={}&limit={}&offset={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            latitude, 
            longitude,
            'coffee',
            radius,
            50,
            0)
result = requests.get(url).json()

In [127]:
results = result['response']['venues']
venues_list = []
venues_list.append([(city, latitude, longitude,v['id'], v['name'],v['location']['lat'],
                   v['location']['lng']) for v in results])
dc_coffee = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
dc_coffee.columns = ['City','City Latitude','City Longitude','Coffee Shop id','Coffee Shop Name',
                     'Coffee Shop Latitude','Coffee Shop Longitude']

In [128]:
city = 'Boston, MA'
latitude = 42.3601
longitude = -71.0589

url = 'https://api.foursquare.com/v2/venues/search?&client_id={}&client_secret={}&v={}&ll={},{}&query={},&radius={}&limit={}&offset={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            latitude, 
            longitude,
            'coffee',
            radius,
            50,
            0)
result = requests.get(url).json()

In [129]:
results = result['response']['venues']
venues_list = []
venues_list.append([(city, latitude, longitude,v['id'], v['name'],v['location']['lat'],
                   v['location']['lng']) for v in results])
bos_coffee = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
bos_coffee.columns = ['City','City Latitude','City Longitude','Coffee Shop id','Coffee Shop Name',
                     'Coffee Shop Latitude','Coffee Shop Longitude']

In [130]:
city = 'Denver, CO'
latitude = 39.7392
longitude = -104.9903

url = 'https://api.foursquare.com/v2/venues/search?&client_id={}&client_secret={}&v={}&ll={},{}&query={},&radius={}&limit={}&offset={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            latitude, 
            longitude,
            'coffee',
            radius,
            50,
            0)
result = requests.get(url).json()

In [131]:
results = result['response']['venues']
venues_list = []
venues_list.append([(city, latitude, longitude,v['id'], v['name'],v['location']['lat'],
                   v['location']['lng']) for v in results])
den_coffee = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
den_coffee.columns = ['City','City Latitude','City Longitude','Coffee Shop id','Coffee Shop Name',
                     'Coffee Shop Latitude','Coffee Shop Longitude']
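
The eight near-identical cells above could be folded into one loop. The sketch below shows one way this might look, using only the standard library so it runs without the API: `CITY_CENTERS`, `COLUMNS`, and `venues_to_rows` are hypothetical names introduced here, and `fake_venues` is a made-up venue in the shape the cells above read from `result['response']['venues']`.

```python
# City centers copied from the cells above: name -> (latitude, longitude)
CITY_CENTERS = {
    'Atlanta Metropolitan Area, GA': (33.74278, -84.38889),
    'New York City, NY': (40.7128, -74.0060),
    'Seattle, WA': (47.6062, -122.3321),
    'San Francisco, CA': (37.7749, -122.4194),
    'Chicago, IL': (41.8781, -87.6298),
    'Washington DC': (38.9072, -77.0369),
    'Boston, MA': (42.3601, -71.0589),
    'Denver, CO': (39.7392, -104.9903),
}

# Column names used for every city dataframe above
COLUMNS = ['City', 'City Latitude', 'City Longitude', 'Coffee Shop id',
           'Coffee Shop Name', 'Coffee Shop Latitude', 'Coffee Shop Longitude']

def venues_to_rows(city, venues):
    """Hypothetical helper: flatten a list of venue dicts into table rows."""
    lat, lng = CITY_CENTERS[city]
    return [(city, lat, lng, v['id'], v['name'],
             v['location']['lat'], v['location']['lng']) for v in venues]

# Made-up venue, for illustration only
fake_venues = [{'id': 'abc123', 'name': 'Example Coffee',
                'location': {'lat': 33.75, 'lng': -84.39}}]
rows = venues_to_rows('Atlanta Metropolitan Area, GA', fake_venues)
print(rows)
```

With real responses, `pd.DataFrame(venues_to_rows(city, venues), columns=COLUMNS)` inside a single loop over `CITY_CENTERS` would give one dataframe per city.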

For this project, we will be using folium to map the locations and information of the different coffee shops in these cities:

In [132]:
coffee_map = folium.Map(location = [39.8283, -98.5795], zoom_start=4)
coffee_cities = [atl_coffee, nyc_coffee, sea_coffee, sf_coffee, chi_coffee, dc_coffee, bos_coffee, den_coffee]

for i in range(len(coffee_cities)):
    # add markers to map
    for lat, lng, name in zip(coffee_cities[i]['Coffee Shop Latitude'], 
                              coffee_cities[i]['Coffee Shop Longitude'],
                              coffee_cities[i]['Coffee Shop Name']):
        label = '{}'.format(name)
        label = folium.Popup(label, parse_html=True)
        folium.CircleMarker(
            [lat, lng],
            radius=3,
            popup=label,
            color='blue',
            fill=True,
            fill_color='#3186cc',
            fill_opacity=0.7,
            parse_html=False).add_to(coffee_map)  

    
coffee_map

Finally, we will extract the average rating of each coffee shop as well as the number of ratings it has received. Shops that have no rating on FourSquare are recorded with an average rating of 0.0 and 0 ratings:

In [134]:
for i in range(len(coffee_cities)):
    for j in range(coffee_cities[i].shape[0]):
        id_val = coffee_cities[i].loc[j,'Coffee Shop id']
        url = 'https://api.foursquare.com/v2/venues/{}?&client_id={}&client_secret={}&v={}'.format(
        id_val,
        CLIENT_ID, 
        CLIENT_SECRET, 
        VERSION)
        
        result = requests.get(url).json()
        results = result['response']['venue']
        if 'rating' not in results.keys():
            coffee_cities[i].loc[j,'Avg Rating'] = 0.0
            coffee_cities[i].loc[j,'# of Ratings'] = 0
        else:
            coffee_cities[i].loc[j,'Avg Rating'] = results['rating']
            coffee_cities[i].loc[j,'# of Ratings'] = results['ratingSignals']
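
The fallback for unrated shops in the loop above can be isolated into a small function, which makes it easy to check without spending any API calls. This is a sketch under the assumption that a venue's details arrive as a plain dictionary with optional `rating` and `ratingSignals` keys, as the loop above reads them; `extract_rating` is a hypothetical name.

```python
def extract_rating(venue):
    """Hypothetical helper: return (average rating, number of ratings) for one venue dict.

    Unrated venues fall back to (0.0, 0), mirroring the loop above.
    """
    if 'rating' not in venue:
        return 0.0, 0
    return float(venue['rating']), int(venue.get('ratingSignals', 0))

# Made-up venue dictionaries, for illustration only
rated = extract_rating({'rating': 8.6, 'ratingSignals': 533})
unrated = extract_rating({'name': 'Unrated Cafe'})
print(rated, unrated)
```

Keeping the 0.0 fallback in one place also makes it easier to remember that unrated shops pull down any plain (unweighted) mean of the 'Avg Rating' column.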

Since this kind of call to the FourSquare API is a premium call, I am going to save these results to CSV files so that this block of code does not need to be run again:

In [135]:
coffee_cities[0].to_csv('atl_coffee.csv')
coffee_cities[1].to_csv('nyc_coffee.csv')
coffee_cities[2].to_csv('sea_coffee.csv')
coffee_cities[3].to_csv('sf_coffee.csv')
coffee_cities[4].to_csv('chi_coffee.csv')
coffee_cities[5].to_csv('dc_coffee.csv')
coffee_cities[6].to_csv('bos_coffee.csv')
coffee_cities[7].to_csv('den_coffee.csv')

## Methodology

In this project, we will look at the patterns of coffee shops, and of quality coffee shops, in three ways:

1) We will look at the distribution of coffee shops in Atlanta and the ratings of coffee shops in different areas. This will be done by laying a grid over the metropolitan area and showing the average rating of the coffee shops in each grid cell. The number of ratings each coffee shop has received matters too, since it gives a better idea of how reliable a shop's rating is.

2) Next, we will compile the ratings of coffee shops in each of the cities listed above and determine the average rating of coffee shops in each city.

3) Finally, further statistical analyses will be done using the pandas 'describe' method. Plots of the data will be used to show how the rating and the number of ratings together indicate which coffee shops are most likely to have quality coffee.

## Analysis

### Distribution of Coffee Shops in Atlanta

The first analysis is to look at the geographical distribution of coffee shops in Atlanta. This gives a good idea of where in the metropolitan area it is easy to find a coffee shop.

In [175]:
atl_coffee_map = folium.Map(location = [coffee_cities[0].loc[0,'City Latitude'], coffee_cities[0].loc[0,'City Longitude']], zoom_start=12)

for lat, lng, name, rating in zip(coffee_cities[0]['Coffee Shop Latitude'], 
                          coffee_cities[0]['Coffee Shop Longitude'],
                          coffee_cities[0]['Coffee Shop Name'],
                          coffee_cities[0]['Avg Rating']):
        label = '{}: {}'.format(name, rating)
        label = folium.Popup(label, parse_html=True)
        folium.CircleMarker(
            [lat, lng],
            radius=5,
            popup=label,
            color='blue',
            fill=True,
            fill_color='#3186cc',
            fill_opacity=0.7,
            parse_html=False).add_to(atl_coffee_map)   

    
atl_coffee_map

The map above shows that coffee shops are more concentrated in the center of the city. There may be just as many coffee shops in other parts of the metropolitan area, but they are more spread out. If you want to conveniently find a lot of different coffee shops, the city center is a better bet than the areas a little outside the main part of the city.

Next, we will find the coffee shop furthest from the city center (measured in coordinate degrees) and use that distance as the border of a grid over the city.

In [137]:
def get_distance(lat, lng):
    cent_lat = 33.74278
    cent_lng = -84.38889
    dist = np.sqrt((lat - cent_lat)**2 + (lng - cent_lng)**2)
    return dist
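
`get_distance` measures distance in degrees of latitude and longitude, which is fine for ranking shops but is not a physical distance: at Atlanta's latitude, a degree of longitude covers less ground than a degree of latitude. As a rough sketch of an alternative (not what the analysis here uses), the haversine formula gives a great-circle distance in kilometers. The function name `haversine_km` and the Earth radius constant are assumptions introduced for this example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation

def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# One degree north and one degree east of the Atlanta center used above
one_degree_north = haversine_km(33.74278, -84.38889, 34.74278, -84.38889)
one_degree_east = haversine_km(33.74278, -84.38889, 33.74278, -83.38889)
print(round(one_degree_north, 1), round(one_degree_east, 1))
```

The two printed values differ, which is the distortion a grid built in raw degrees inherits: its cells are somewhat narrower east to west than they are north to south.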

In [142]:
dists = []
for i in range(coffee_cities[0].shape[0]):
    lat = coffee_cities[0].loc[i,'Coffee Shop Latitude']
    lng = coffee_cities[0].loc[i,'Coffee Shop Longitude']
    dists.append(get_distance(lat, lng))

max_dist = max(dists)
furthest = int(np.argmax(dists))  # position of the largest distance
print('max coordinate distance: {}'.format(max_dist))
print('Furthest Coffee Shop: {}'.format(coffee_cities[0].loc[furthest,'Coffee Shop Name']))

max coordinate distance: 0.12741764808765577
Furthest Coffee Shop: Good Karma Coffee House


Looks like Good Karma Coffee House is the furthest shop from the city center. We'll create a 4x4 grid over Atlanta that uses this distance and sort the coffee shops into the grid cells.

First we'll create the boundaries and the increments of the latitude and longitude coordinates:

In [152]:
cent_lat = 33.74278
cent_lng = -84.38889
up_lat = cent_lat + max_dist
down_lat = cent_lat - max_dist
up_lng = cent_lng + max_dist
down_lng = cent_lng - max_dist
lat_inc = (up_lat - down_lat)/4
lng_inc = (up_lng - down_lng)/4

Then we'll create a function that classifies each coffee shop into each respective grid segment:

In [168]:
def grid_classifier(lat, lng):
    if lng < (down_lng + lng_inc):
        lng_class = 1
    elif lng < (down_lng + 2*lng_inc):
        lng_class = 2
    elif lng < (down_lng + 3*lng_inc):
        lng_class = 3
    else:
        lng_class = 4
        
    if lat < (down_lat + lat_inc):
        lat_class = 4
    elif lat < (down_lat + 2*lat_inc):
        lat_class = 3
    elif lat < (down_lat + 3*lat_inc):
        lat_class = 2
    else:
        lat_class = 1
      
    grid_class = (lat_class-1)*4 + lng_class
    return grid_class
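
The chain of comparisons in `grid_classifier` can also be written with integer arithmetic, which generalizes to any grid size. The sketch below is meant to follow the same numbering (cell 1 in the north-west corner, counting across each row), but it takes the center and half-width as arguments instead of reading globals. `grid_cell` and its parameter names are hypothetical.

```python
def grid_cell(lat, lng, cent_lat, cent_lng, half_width, n=4):
    """Hypothetical helper: number the n x n grid cell (1..n*n) containing a point.

    Cell 1 is the north-west corner; numbers increase west to east, then north to south.
    Points outside the grid are clamped to the nearest edge cell.
    """
    inc = 2 * half_width / n
    down_lat = cent_lat - half_width
    down_lng = cent_lng - half_width
    # Zero-based column counted from the west edge
    col = int((lng - down_lng) // inc)
    # Zero-based band counted from the south edge, then flipped so row 0 is north
    band = int((lat - down_lat) // inc)
    col = min(max(col, 0), n - 1)
    band = min(max(band, 0), n - 1)
    row = n - 1 - band
    return row * n + col + 1

# Toy grid centered on (0, 0) with half-width 2, so each cell is 1 degree wide
print(grid_cell(1.5, -1.5, 0.0, 0.0, 2.0),
      grid_cell(0.5, 0.5, 0.0, 0.0, 2.0),
      grid_cell(-1.5, 1.5, 0.0, 0.0, 2.0))
```

Passing the boundaries in as arguments means the classification does not silently change when `max_dist` is recomputed later in the notebook.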

Apply it to the dataframe:

In [169]:
for i in range(coffee_cities[0].shape[0]):
    lat = coffee_cities[0].loc[i,'Coffee Shop Latitude']
    lng = coffee_cities[0].loc[i,'Coffee Shop Longitude']
    coffee_cities[0].loc[i,'Grid Class'] = grid_classifier(lat, lng)

To show how the coffee shops are classified, let's show which grid each is in by color:

In [187]:
atl_coffee_map = folium.Map(location = [coffee_cities[0].loc[0,'City Latitude'], coffee_cities[0].loc[0,'City Longitude']], zoom_start=12)
color_list = ['red','darkred','orange','yellow','lightgreen','green','lightblue','blue', 'darkblue', 'violet','purple','pink','white','lightgray','gray','black']
for lat, lng, grid_class, name in zip(coffee_cities[0]['Coffee Shop Latitude'], 
                          coffee_cities[0]['Coffee Shop Longitude'],
                          coffee_cities[0]['Grid Class'],
                          coffee_cities[0]['Coffee Shop Name']):
        label = '{}'.format(lat)
        label = folium.Popup(label, parse_html=True)
    
        folium.CircleMarker(
            [lat, lng],
            radius=5,
            popup = label,
            color=color_list[int(grid_class) - 1],  # grid classes run 1-16, list indices 0-15
            fill=True,
            fill_color=color_list[int(grid_class) - 1],
            fill_opacity=0.7,
            parse_html=False).add_to(atl_coffee_map)   

    
atl_coffee_map

Looks like Good Karma Coffee House is so far away that it messes up the rest of the grid. To get a better variety in each grid cell, we will drop this shop along with four other outlying shops (the drop lines below are shown commented out):

In [205]:
#coffee_cities[0].drop(np.where(coffee_cities[0]['Coffee Shop Name'] == 'Good Karma Coffee House')[0][0], inplace=True)
#coffee_cities[0].drop(np.where(coffee_cities[0]['Coffee Shop Name'] == 'The Coffee Bean & Tea Leaf')[0][0], inplace=True)
#coffee_cities[0].drop(np.where(coffee_cities[0]['Coffee Shop Latitude'] == 33.80958557128906)[0][0], inplace=True)
#coffee_cities[0].drop(np.where(coffee_cities[0]['Coffee Shop Latitude'] == 33.8439126)[0][0], inplace=True)
#coffee_cities[0].drop(np.where(coffee_cities[0]['Coffee Shop Latitude'] == 33.841867)[0][0], inplace=True)
#coffee_cities[0].drop(['level_0', 'index'], axis=1, inplace=True)
#coffee_cities[0].reset_index(inplace=True)

In [206]:
dists = []
for i in range(coffee_cities[0].shape[0]):
    lat = coffee_cities[0].loc[i,'Coffee Shop Latitude']
    lng = coffee_cities[0].loc[i,'Coffee Shop Longitude']
    dists.append(get_distance(lat, lng))

max_dist = max(dists)
furthest = int(np.argmax(dists))  # position of the largest distance
print('max coordinate distance: {}'.format(max_dist))
print('Furthest Coffee Shop: {}'.format(coffee_cities[0].loc[furthest,'Coffee Shop Name']))

cent_lat = 33.74278
cent_lng = -84.38889
up_lat = cent_lat + max_dist
down_lat = cent_lat - max_dist
up_lng = cent_lng + max_dist
down_lng = cent_lng - max_dist
lat_inc = (up_lat - down_lat)/4
lng_inc = (up_lng - down_lng)/4

for i in range(coffee_cities[0].shape[0]):
    lat = coffee_cities[0].loc[i,'Coffee Shop Latitude']
    lng = coffee_cities[0].loc[i,'Coffee Shop Longitude']
    coffee_cities[0].loc[i,'Grid Class'] = grid_classifier(lat, lng)
    
atl_coffee_map = folium.Map(location = [coffee_cities[0].loc[0,'City Latitude'], coffee_cities[0].loc[0,'City Longitude']], zoom_start=12)
color_list = ['red','darkred','orange','yellow','lightgreen','green','lightblue','blue', 'darkblue', 'violet','purple','pink','white','lightgray','gray','black']
for lat, lng, grid_class, name in zip(coffee_cities[0]['Coffee Shop Latitude'], 
                          coffee_cities[0]['Coffee Shop Longitude'],
                          coffee_cities[0]['Grid Class'],
                          coffee_cities[0]['Coffee Shop Name']):
        label = '{}'.format(name)
        label = folium.Popup(label, parse_html=True)
    
        folium.CircleMarker(
            [lat, lng],
            radius=5,
            popup = label,
            color=color_list[int(grid_class) - 1],  # grid classes run 1-16, list indices 0-15
            fill=True,
            fill_color=color_list[int(grid_class) - 1],
            fill_opacity=0.7,
            parse_html=False).add_to(atl_coffee_map)   

    
atl_coffee_map

max coordinate distance: 0.058851375093132
Furthest Coffee Shop: San Francisco Coffee Roasting Co.


Now we will determine the weighted average rating of the coffee shops in each part of the grid, weighting each shop's rating by its share of all the ratings in that grid piece.

In [239]:
weighted_ratings = []
for i in range(16):
    # Shops that fall in grid piece i+1
    grid_piece = coffee_cities[0][coffee_cities[0]['Grid Class'] == i+1]
    if grid_piece.shape[0] == 0:
        weighted_ratings.append(0.0)
    else:
        total = 0
        for j in range(grid_piece.shape[0]):
            # Share of the grid piece's ratings that belong to this shop
            Rating_per = grid_piece['# of Ratings'].iloc[j] / grid_piece['# of Ratings'].sum()
            # Shop's rating scaled by that share
            Weighted_Rat = grid_piece['Avg Rating'].iloc[j] * Rating_per
            total += Weighted_Rat
        total = round(total, 2)
        weighted_ratings.append(total)



In [240]:
weighted_ratings

[0.0,
 8.73,
 7.8,
 7.8,
 0.0,
 7.3,
 8.67,
 7.74,
 0.0,
 nan,
 0.0,
 8.28,
 0.0,
 0.0,
 0.0,
 0.0]

I noticed the 'nan' value, looked back, and saw that this grid piece has only one coffee shop. That shop has no ratings, so its share of the ratings works out to 0 divided by 0, which gives 'nan'. I manually set the value to that coffee shop's rating:

In [241]:
grid_piece = coffee_cities[0].loc[np.where(coffee_cities[0]['Grid Class'] == 10)]
weighted_ratings[9] = grid_piece.iloc[0,8]
weighted_ratings

[0.0,
 8.73,
 7.8,
 7.8,
 0.0,
 7.3,
 8.67,
 7.74,
 0.0,
 0.0,
 0.0,
 8.28,
 0.0,
 0.0,
 0.0,
 0.0]
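
The weighted average used here is the sum of each rating times its number of ratings, divided by the total number of ratings. A minimal sketch of that calculation with an explicit guard for a total of zero is shown below, so a grid piece whose shops have no ratings yields 0.0 rather than 'nan'. `weighted_rating` is a hypothetical helper, and the numbers in the example are made up.

```python
def weighted_rating(ratings, counts):
    """Hypothetical helper: average of `ratings` weighted by `counts`, rounded to 2 places.

    Returns 0.0 when there are no ratings at all, instead of dividing by zero.
    """
    total = sum(counts)
    if total == 0:
        return 0.0
    return round(sum(r * c for r, c in zip(ratings, counts)) / total, 2)

# Made-up example: one shop rated 8.0 by 1 person, another rated 9.0 by 3 people
print(weighted_rating([8.0, 9.0], [1, 3]))
# A grid piece whose only shop has no ratings
print(weighted_rating([0.0], [0]))
```

The function accepts any pair of equal-length sequences, so the 'Avg Rating' and '# of Ratings' columns of a grid piece could be passed in directly.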

Finally we'll apply this to the map!

In [258]:
cent_lat = 33.74278
cent_lng = -84.38889
up_lat = cent_lat + max_dist
down_lat = cent_lat - max_dist
up_lng = cent_lng + max_dist
down_lng = cent_lng - max_dist
lat_inc = (up_lat - down_lat)/4
lng_inc = (up_lng - down_lng)/4

lat_list = [up_lat, up_lat-lat_inc, up_lat-2*lat_inc, up_lat-3*lat_inc]
lng_list = [down_lng, down_lng+lng_inc, down_lng+2*lng_inc, down_lng+3*lng_inc]
color_list = ['red','darkred','orange','yellow','lightgreen','green','lightblue','blue', 'darkblue', 'violet','purple','pink','white','lightgray','gray','black']

atl_coffee_map = folium.Map(location = [coffee_cities[0].loc[0,'City Latitude'], coffee_cities[0].loc[0,'City Longitude']],tiles = 'Stamen Terrain', zoom_start=12)
for i in range(len(lat_list)):
    for j in range(len(lng_list)):
        grid_class = 4*i+j
        label = '{}'.format(weighted_ratings[grid_class])
        label = folium.Popup(label, parse_html=True)
        
        folium.CircleMarker(
            [lat_list[i], lng_list[j]],
            radius=5,
            popup = label,
            color=color_list[int(grid_class)],
            fill=True,
            fill_color=color_list[int(grid_class)],
            fill_opacity=0.7,
            parse_html=False).add_to(atl_coffee_map)
        
atl_coffee_map
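
In the loop above, each marker sits at `lat_list[i]`, `lng_list[j]`, which is the north-west corner of its grid piece. If markers at the middle of each piece are preferred, the centers can be computed by shifting half an increment south and east. This is a sketch with a hypothetical helper name, `cell_centers`, checked on a toy grid rather than on the Atlanta data.

```python
def cell_centers(cent_lat, cent_lng, half_width, n=4):
    """Hypothetical helper: list (cell number, latitude, longitude) for each cell center.

    Cells are numbered 1..n*n from the north-west corner, west to east, then north to south.
    """
    inc = 2 * half_width / n
    up_lat = cent_lat + half_width
    down_lng = cent_lng - half_width
    centers = []
    for row in range(n):
        for col in range(n):
            cell_id = row * n + col + 1
            lat = up_lat - (row + 0.5) * inc    # half a cell south of the top edge
            lng = down_lng + (col + 0.5) * inc  # half a cell east of the left edge
            centers.append((cell_id, lat, lng))
    return centers

# Toy grid centered on (0, 0) with half-width 2
centers = cell_centers(0.0, 0.0, 2.0)
print(centers[0], centers[-1])
```

Each `(lat, lng)` pair could then be used as the marker location in place of the corner coordinates.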

And now we've got grid pieces of Atlanta showing the average coffee shop rating in each area. Many of the pieces have an average rating of zero, meaning they contain no rated coffee shops. This makes sense because only 50 coffee shops were extracted for this project and most of them come from close to the "city center" I defined.

The two highest ratings are 8.73 and 8.67. Let's zoom into these two specific areas.

In [262]:
grid_class = 2
lat = lat_list[0]
lng = lng_list[1]
atl_coffee_map = folium.Map(location = [lat, lng],tiles = 'Stamen Terrain', zoom_start=13)

grid_piece = coffee_cities[0].loc[np.where(coffee_cities[0]['Grid Class'] == 2)]

for lat_1, lng_1, name, rating in zip(grid_piece['Coffee Shop Latitude'], 
                          grid_piece['Coffee Shop Longitude'],
                          grid_piece['Coffee Shop Name'],
                          grid_piece['Avg Rating']):
        label = '{}: {}'.format(name, rating)
        label = folium.Popup(label, parse_html=True)
        folium.CircleMarker(
            [lat_1, lng_1],
            radius=5,
            popup=label,
            color='blue',
            fill=True,
            fill_color='#3186cc',
            fill_opacity=0.7,
            parse_html=False).add_to(atl_coffee_map)   

    
atl_coffee_map

There are 5 coffee shops here, and 3 have ratings. Octane Coffee and Chattahoochee Coffee Company have very good ratings, backed by a significant number of ratings. Since there are good coffee shops here, there's also a decent chance that the shops without ratings have good coffee too.

In [263]:
grid_class = 7
lat = lat_list[1]
lng = lng_list[2]
atl_coffee_map = folium.Map(location = [lat, lng],tiles = 'Stamen Terrain', zoom_start=13)

grid_piece = coffee_cities[0].loc[np.where(coffee_cities[0]['Grid Class'] == 7)]

for lat_1, lng_1, name, rating in zip(grid_piece['Coffee Shop Latitude'], 
                          grid_piece['Coffee Shop Longitude'],
                          grid_piece['Coffee Shop Name'],
                          grid_piece['Avg Rating']):
        label = '{}: {}'.format(name, rating)
        label = folium.Popup(label, parse_html=True)
        folium.CircleMarker(
            [lat_1, lng_1],
            radius=5,
            popup=label,
            color='blue',
            fill=True,
            fill_color='#3186cc',
            fill_opacity=0.7,
            parse_html=False).add_to(atl_coffee_map)   

    
atl_coffee_map

There are A LOT of coffee shops in this grid piece. They're all very close to each other, so if I wanted to visit multiple places at once, this is a great area to go to! It looks like Dancing Goats Coffee Bar has the highest rating with a 9.1. I have actually been to this place and can confirm it has amazing coffee!

So far, we have looked at coffee shops in Atlanta alone, how they are distributed, and how different areas in Atlanta have different amounts and ratings of coffee shops. So how does Atlanta coffee compare to other cities in the country?

## Comparing Coffee Shops Across the Country

To do this, we will take a similar approach to the previous section, but we won't separate the cities into grid pieces. Instead, we'll use all of the coffee shops in each list to create a weighted average rating for the coffee shops in each city.

We'll first restore the original dataframe for the Atlanta coffee shops so that it is consistent with the other city dataframes:

In [284]:
coffee_cities[0] = pd.read_csv('atl_coffee.csv')

In [286]:
#coffee_cities[0].drop('Unnamed: 0', axis=1,inplace=True)

In [288]:
coffee_cities[0].head()

| | City | City Latitude | City Longitude | Coffee Shop id | Coffee Shop Name | Coffee Shop Latitude | Coffee Shop Longitude | Avg Rating | # of Ratings |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Atlanta Metropolitan Area, GA | 33.74278 | -84.38889 | 5dd2ad70eb2f3e00088e6e53 | Resurgens Coffee & News | 33.74857 | -84.39088 | 0.0 | 0.0 |
| 1 | Atlanta Metropolitan Area, GA | 33.74278 | -84.38889 | 4e495987ae6014a2fdc4ccf9 | Revelator Coffee + Little Tart Bakeshop | 33.746074 | -84.372786 | 8.6 | 533.0 |
| 2 | Atlanta Metropolitan Area, GA | 33.74278 | -84.38889 | 5ae5ea6c5d891b002c9afa08 | Rosie's Coffee Cafe | 33.753458 | -84.402501 | 7.3 | 10.0 |
| 3 | Atlanta Metropolitan Area, GA | 33.74278 | -84.38889 | 4cd2349dfd42199c21ffc56e | Condesa Coffee | 33.759731 | -84.371839 | 8.7 | 270.0 |
| 4 | Atlanta Metropolitan Area, GA | 33.74278 | -84.38889 | 4fe220fae4b02702d7d533c0 | Dancing Goats Coffee Bar | 33.771385 | -84.367509 | 9.1 | 524.0 |


Now we'll do the same weighted rating calculations for each dataframe:

In [289]:
weighted_ratings = []
for df in coffee_cities:
    # Each shop's share of all the ratings in its city
    df['% of Ratings'] = df['# of Ratings'] / df['# of Ratings'].sum()
    # Each shop's rating scaled by its share; these sum to the city's weighted average
    df['Weighted Rating'] = df['Avg Rating'] * df['% of Ratings']
    total = round(df['Weighted Rating'].sum(), 2)
    weighted_ratings.append(total)

Finally, we'll apply this info to a map!

In [290]:
coffee_map = folium.Map(location = [39.8283, -98.5795], zoom_start=4)
city_names = ['Atlanta', 'New York City', 'Seattle', 'San Francisco', 'Chicago', 'Washington DC', 'Boston', 'Denver']
for i in range(len(coffee_cities)):
    lat = coffee_cities[i].loc[0,'City Latitude']
    lng = coffee_cities[i].loc[0,'City Longitude']
    label = '{}: {}'.format(city_names[i], weighted_ratings[i])
    label = folium.Popup(label, parse_html=True)
    folium.Marker([lat, lng], popup=label).add_to(coffee_map)  

    
coffee_map

Displayed above is the weighted average rating of coffee shops in each city. By this measure, San Francisco has the highest average rating at 8.79, whereas Boston has the lowest at 7.83. Atlanta has a relatively high average of 8.4, which is the 4th highest of the cities listed.

One surprise is that Atlanta has a higher average rating than Seattle, which is known for having good coffee. There are many possible explanations for this, which we will explore in the next section:

## Further Statistical Analysis on City Ratings

Having an overall weighted average rating for the coffee shops in each city is great, but what other statistics can give us a better look at which cities have good coffee shops? We'll start by using the ".describe()" method on each dataframe to get some common summary statistics:

In [291]:
describe_tables = []
for i in range(len(coffee_cities)):
    describe_tables.append(coffee_cities[i].describe())

Right now we have a separate stats table for each city. For convenience, let's reorganize these into one table per statistic instead of one table per city:

In [303]:
#Create Table of Means
mean_table = []
for i in range(len(describe_tables)):
    means  = describe_tables[i].loc['mean',:].transpose()
    mean_table.append(means)
    
mean_table = pd.DataFrame(mean_table)
mean_table.index = city_names
mean_table

City,City Latitude,City Longitude,Coffee Shop Latitude,Coffee Shop Longitude,Avg Rating,# of Ratings,% of Ratings,Weighted Rating
Atlanta,33.74278,-84.38889,33.764301,-84.380128,3.95,75.58,0.02,0.16806
New York City,40.7128,-74.006,40.716835,-74.000577,6.064,314.5,0.02,0.170741
Seattle,47.6062,-122.3321,47.611214,-122.330848,6.342,105.16,0.02,0.166603
San Francisco,37.7749,-122.4194,37.774805,-122.419366,6.436,520.48,0.02,0.175891
Chicago,41.8781,-87.6298,41.885681,-87.635718,4.524,110.4,0.02,0.170376
Washington DC,38.9072,-77.0369,38.905513,-77.033848,6.314,89.34,0.02,0.161011
Boston,42.3601,-71.0589,42.357427,-71.062908,3.724,53.02,0.02,0.156515
Denver,39.7392,-104.9903,39.740912,-104.989144,5.042,62.62,0.02,0.162124
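
The 'Weighted Rating' column in this table also lets us recover each city's overall weighted rating: the overall figure is the sum of that column, and a sum is just the mean times the number of rows. The sketch below assumes 50 shops per city, as described in the Data section (and consistent with the 0.02 mean in '% of Ratings'), and copies the means from the table above:

```python
import pandas as pd

# Mean of the 'Weighted Rating' column per city, copied from the table above
mean_weighted = pd.Series({
    'Atlanta': 0.16806, 'New York City': 0.170741, 'Seattle': 0.166603,
    'San Francisco': 0.175891, 'Chicago': 0.170376, 'Washington DC': 0.161011,
    'Boston': 0.156515, 'Denver': 0.162124,
})

shops_per_city = 50  # assumption: every city dataframe holds 50 shops

# sum = mean * count, so this recovers the city-level weighted rating
city_weighted = (mean_weighted * shops_per_city).round(2)
print(city_weighted.sort_values(ascending=False))
```

The results agree with the figures quoted for the map: 8.79 for San Francisco, 7.83 for Boston, and about 8.4 for Atlanta in 4th place.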


In [306]:
#Create Table of Standard Deviations
stdev_table = []
for i in range(len(describe_tables)):
    stdev = describe_tables[i].loc['std',:].transpose()
    stdev_table.append(stdev)
    
stdev_table = pd.DataFrame(stdev_table)
stdev_table.index = city_names
stdev_table

City,City Latitude,City Longitude,Coffee Shop Latitude,Coffee Shop Longitude,Avg Rating,# of Ratings,% of Ratings,Weighted Rating
Atlanta,2.15327e-14,5.742052e-14,0.028563,0.02556,4.028685,149.633306,0.039596,0.346754
New York City,7.177566e-15,4.306539e-14,0.013759,0.013732,3.699739,483.664256,0.030758,0.27122
Seattle,1.435513e-14,8.613079e-14,0.008339,0.010783,3.118941,137.710041,0.026191,0.231007
San Francisco,5.024296e-14,1.579064e-13,0.011035,0.012525,3.390308,845.786972,0.0325,0.291891
Chicago,4.306539e-14,4.306539e-14,0.015424,0.015112,4.097115,244.588451,0.04431,0.388458
Washington DC,4.306539e-14,4.306539e-14,0.007788,0.015042,2.951306,118.015324,0.026419,0.233574
Boston,3.588783e-14,0.0,0.007583,0.012666,3.822199,97.110392,0.036632,0.29365
Denver,7.177566e-15,4.306539e-14,0.016176,0.016058,3.732488,78.678859,0.025129,0.218631
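
Since the mean and standard deviation tables are built the same way, the pattern could be wrapped in a small helper. This is only a sketch: `stat_table` is a hypothetical name, and the two tiny dataframes are made-up stand-ins for the entries of `coffee_cities`:

```python
import pandas as pd

def stat_table(frames, names, stat):
    """Collect one row of .describe() (e.g. 'mean', 'std', 'max') per dataframe."""
    rows = [df.describe().loc[stat] for df in frames]
    table = pd.DataFrame(rows)
    table.index = names
    return table

# Made-up stand-ins for two city dataframes
city_a = pd.DataFrame({'Avg Rating': [8.0, 9.0, 0.0], '# of Ratings': [10, 30, 0]})
city_b = pd.DataFrame({'Avg Rating': [7.0, 7.5, 8.5], '# of Ratings': [5, 15, 40]})

demo_means = stat_table([city_a, city_b], ['City A', 'City B'], 'mean')
demo_maxes = stat_table([city_a, city_b], ['City A', 'City B'], 'max')
print(demo_means)
```

With a helper like this, each statistic gets its own clearly named table instead of reusing one variable for several statistics.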


The mean table and the standard deviation table are the two main tables we'll use for further statistical analysis of the coffee shops in each city.


The first thing that pops out is the Avg. Rating for each city:

In [307]:
mean_table.loc[:,'Avg Rating']

Atlanta          3.950
New York City    6.064
Seattle          6.342
San Francisco    6.436
Chicago          4.524
Washington DC    6.314
Boston           3.724
Denver           5.042
Name: Avg Rating, dtype: float64

These values are very different from the weighted averages we calculated earlier. This is because each dataframe contains coffee shops that do not have any ratings and therefore have an 'Avg Rating' of 0.0. We can show that as the number of coffee shops with no ratings increases, the average rating we see above decreases:

In [331]:
# Count the coffee shops in each city that have no ratings
numzeros = [int((city_df['# of Ratings'] == 0).sum()) for city_df in coffee_cities]
numzeros = pd.Series(numzeros, index=city_names, name='Shops with Zero Ratings')

# Put the counts side by side with each city's unweighted mean rating
avg_ratings = mean_table.loc[:,'Avg Rating']
pd.concat([numzeros, avg_ratings], axis=1)

City,Shops with Zero Ratings,Avg Rating
Atlanta,25,3.95
New York City,13,6.064
Seattle,9,6.342
San Francisco,10,6.436
Chicago,22,4.524
Washington DC,8,6.314
Boston,25,3.724
Denver,17,5.042


These results make sense: the plain mean of 'Avg Rating' gives a shop with no ratings (recorded as 0.0) the same weight as any other shop, whereas the weighted average gives it a weight of zero. This is why we used the weighted average rating previously.
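
To put a number on this relationship, a Pearson correlation coefficient can be computed between the two columns of the table above. The values below are copied from that table, and the variable names are only illustrative:

```python
import numpy as np

# Number of zero-rated shops and unweighted mean 'Avg Rating' per city,
# in the order Atlanta, New York City, Seattle, San Francisco,
# Chicago, Washington DC, Boston, Denver
zero_rated = np.array([25, 13, 9, 10, 22, 8, 25, 17])
avg_rating = np.array([3.95, 6.064, 6.342, 6.436, 4.524, 6.314, 3.724, 5.042])

# Pearson correlation coefficient between the two columns
r_zero = np.corrcoef(zero_rated, avg_rating)[0, 1]
print(round(r_zero, 3))
```

A value close to -1 indicates that more unrated shops go hand in hand with a lower plain mean. With only eight cities this is a rough check rather than a formal test.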

One potential conclusion is that if you want to visit a lot of well-known, quality coffee shops, you should go to cities that have not only a high weighted average rating but also few coffee shops with zero ratings. If you want to be the first to find a hidden gem, go to a city with a high weighted average rating and a decent number of coffee shops with zero ratings. If the rated coffee shops in a city score highly, then maybe the unrated coffee shops in that city will be good too!

The next thing that stands out is the # of Ratings:

In [332]:
mean_table.loc[:,'# of Ratings']

Atlanta           75.58
New York City    314.50
Seattle          105.16
San Francisco    520.48
Chicago          110.40
Washington DC     89.34
Boston            53.02
Denver            62.62
Name: # of Ratings, dtype: float64

San Francisco and New York City have a significantly higher average number of ratings per shop than the other cities. Since these two cities also had relatively high weighted average ratings, those ratings are even more impressive because they rest on many more reviews. The standard deviations of these values also provide some information:

In [333]:
stdev_table.loc[:,'# of Ratings']

Atlanta          149.633306
New York City    483.664256
Seattle          137.710041
San Francisco    845.786972
Chicago          244.588451
Washington DC    118.015324
Boston            97.110392
Denver            78.678859
Name: # of Ratings, dtype: float64

The standard deviations are high, but they make sense: the spread tends to grow as the values themselves grow in magnitude. In every city the standard deviation is larger than the mean, which suggests that a few very popular shops have far more ratings than the rest. Coffee shops with zero ratings widen the spread further, since they extend the range of the distribution down to zero.
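
One way to compare this spread across cities is the ratio of the standard deviation to the mean (the coefficient of variation). The sketch below copies both columns from the outputs above. A ratio above 1 means the spread is larger than the mean itself:

```python
import pandas as pd

city_names = ['Atlanta', 'New York City', 'Seattle', 'San Francisco',
              'Chicago', 'Washington DC', 'Boston', 'Denver']

# Mean and standard deviation of '# of Ratings', copied from the outputs above
mean_ratings = pd.Series(
    [75.58, 314.50, 105.16, 520.48, 110.40, 89.34, 53.02, 62.62],
    index=city_names)
std_ratings = pd.Series(
    [149.633306, 483.664256, 137.710041, 845.786972,
     244.588451, 118.015324, 97.110392, 78.678859],
    index=city_names)

# Coefficient of variation: spread relative to the typical value
cv = (std_ratings / mean_ratings).round(2)
print(cv.sort_values(ascending=False))
```

Every city comes out above 1, with Chicago showing the largest relative spread and Denver the smallest.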

One conclusion this supports is something we would generally expect: the cities and coffee shops with higher ratings tend to have a larger number of ratings as well. This makes sense, since if a coffee shop is known to have good coffee, more people will go there, which leads to more positive reviews.
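
At the city level, this can be checked with a Pearson correlation between the mean number of ratings and the weighted rating. In the sketch below, the weighted ratings are rebuilt as 50 times the mean of the 'Weighted Rating' column, which assumes 50 shops per city as described in the Data section. All numbers are copied from the tables above:

```python
import numpy as np

# Mean '# of Ratings' per city, in the order Atlanta, New York City, Seattle,
# San Francisco, Chicago, Washington DC, Boston, Denver
mean_num_ratings = np.array([75.58, 314.50, 105.16, 520.48,
                             110.40, 89.34, 53.02, 62.62])

# Mean 'Weighted Rating' per city, in the same order
mean_weighted = np.array([0.16806, 0.170741, 0.166603, 0.175891,
                          0.170376, 0.161011, 0.156515, 0.162124])

# Assumption: 50 shops per city, so sum = 50 * mean
weighted_rating = 50 * mean_weighted

r_popularity = np.corrcoef(mean_num_ratings, weighted_rating)[0, 1]
print(round(r_popularity, 2))
```

A positive coefficient is consistent with the conclusion above, though eight data points make this only suggestive. Running the same correlation on the individual shops within each dataframe would be a stronger test.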

Finally, let's find the highest rated coffee shop in each city:

In [335]:
#Create Table of Maxes
max_table = []
for i in range(len(describe_tables)):
    maxes = describe_tables[i].loc['max',:].transpose()
    max_table.append(maxes)
    
max_table = pd.DataFrame(max_table)
max_table.index = city_names
max_table['Avg Rating']

Atlanta          9.1
New York City    9.4
Seattle          9.1
San Francisco    9.3
Chicago          9.2
Washington DC    9.1
Boston           9.0
Denver           9.1
Name: Avg Rating, dtype: float64

The highest rated coffee shops in each city are all pretty close in rating, but one coffee shop in New York City has the highest rating at 9.4. Let's see which one it is:

In [340]:
# Row of the 9.4-rated shop in the New York City dataframe (index 1 in coffee_cities)
max_rate = np.where(coffee_cities[1]['Avg Rating'] == 9.4)[0][0]
coffee_cities[1].loc[max_rate,:]

City                                   New York City, NY
City Latitude                                    40.7128
City Longitude                                   -74.006
Coffee Shop id                  5b8e66ac234724002c927c4b
Coffee Shop Name         Brooklyn Bagel & Coffee Company
Coffee Shop Latitude                             40.7309
Coffee Shop Longitude                           -73.9933
Avg Rating                                           9.4
# of Ratings                                          91
% of Ratings                                  0.00578696
Weighted Rating                                0.0543975
Name: 20, dtype: object

The highest rated coffee shop in the dataset was Brooklyn Bagel & Coffee Company in New York City. Although this coffee shop had only 91 ratings, the people who went there rated it very highly.
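
A lookup like this can also be done without typing in the 9.4 by hand, which avoids comparing floating point values for equality. A minimal sketch using `idxmax` on a small made-up dataframe (the shop names and numbers are hypothetical):

```python
import pandas as pd

# Made-up stand-in for one city's dataframe
demo_city = pd.DataFrame({
    'Coffee Shop Name': ['Shop A', 'Shop B', 'Shop C'],
    'Avg Rating': [8.1, 9.4, 0.0],
    '# of Ratings': [120, 91, 0],
})

# idxmax returns the index label of the first row holding the highest rating
best_row = demo_city['Avg Rating'].idxmax()
best_shop = demo_city.loc[best_row]
print(best_shop['Coffee Shop Name'], best_shop['Avg Rating'])
```

If several shops tie for the top rating, `idxmax` returns only the first one, so ties would need a separate check.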

Let's see where it is in New York City:

In [342]:
# Center the map on the New York City coordinates
nyc = coffee_cities[1]
nyc_coffee_map = folium.Map(location = [nyc.loc[0,'City Latitude'], nyc.loc[0,'City Longitude']], zoom_start=13)

# Mark the highest rated shop (row max_rate, found in the previous cell)
lat = nyc.loc[max_rate,'Coffee Shop Latitude']
lng = nyc.loc[max_rate,'Coffee Shop Longitude']
name = nyc.loc[max_rate,'Coffee Shop Name']
rating = nyc.loc[max_rate,'Avg Rating']
label = '{}: {}'.format(name, rating)
label = folium.Popup(label, parse_html = True)
folium.Marker([lat, lng], popup=label).add_to(nyc_coffee_map)

nyc_coffee_map

Looks like this coffee shop is right in the middle of Manhattan!
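
To quantify how far the shop is from the city-center coordinates used for New York City, the great-circle (haversine) distance between the two points can be computed. The coordinates below are copied from the output above. The function name and the Earth radius of 6371 km are choices made for this sketch:

```python
import math

def haversine_km(lat1, lng1, lat2, lng2, radius_km=6371.0):
    """Great-circle distance in kilometers between two (lat, lng) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))

city_center = (40.7128, -74.006)   # 'City Latitude', 'City Longitude'
shop = (40.7309, -73.9933)         # 'Coffee Shop Latitude', 'Coffee Shop Longitude'

distance_km = haversine_km(*city_center, *shop)
print(round(distance_km, 2))
```

The result is a couple of kilometers, well within the 15 km search radius used to collect the data.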

## Results and Discussion

In the first part of the analysis, coffee shops in Atlanta were divided into grid segments of the city and analyzed. A majority of the coffee shops in Atlanta were found in the center of the city, and the shops there were much closer to one another than in any other part of the city. The average weighted rating of each segment of a 4x4 grid of Atlanta was calculated, and the highest ratings came from the center of the city and just off the split of I-75 and I-85. If I wanted to find good coffee shops in Atlanta, those are the two areas of the city I would most likely go to.

The second part of this analysis involved taking the weighted average rating of the coffee shops obtained for each of 8 major cities and plotting it on a folium map. From this analysis, San Francisco had the highest weighted average rating, whereas Boston had the lowest. Compared to other major cities, Atlanta is fairly average when it comes to quality coffee.

Finally, the data from each city was further evaluated using the '.describe()' method. As expected, a strong negative relationship was found between the unweighted average rating of a city and its number of coffee shops with zero ratings. There also appeared to be a positive relationship between weighted average rating and average number of ratings for each city, which makes sense since better coffee shops will have more customers and thus more ratings. The overall highest rated coffee shop in the entire dataset was found to be in New York City, right in the middle of Manhattan.

## Conclusion

In this project, coffee shops from around Atlanta and other parts of the country were compiled and analyzed to determine which places have the best coffee according to other reviews and ratings. Various statistical analyses were done to get a better idea of which cities and coffee shops have the best chance of having the best coffee.

The distribution of coffee shops in Atlanta was analyzed, and "hotspots" of coffee shops in the city were found. The weighted average rating of coffee shops in each city was found and compared to determine which cities had the highest rated coffee shops. Finally, relationships between ratings and other statistics were examined to gain insight into how ratings and number of ratings are distributed among coffee shops.

Overall, I found some great areas of Atlanta where I can get a good cup of coffee. By comparing these coffee shops to those in other cities, I can see how they stack up against the rest of the country, and if I get the chance to travel to some of these cities, I will know exactly where to go to get a cup of coffee and start my day.