## Summary notes

This **#TidyTuesday** project was posted on 7 May 2018.
Here's the motivating tweet from [@thomas_mock](https://twitter.com/thomas_mock):

> twitter: https://twitter.com/thomas_mock/status/993475998791405569

This week's data gives the global locations of shops for several coffee chains.
The source data comes from [flowingdata.com](https://flowingdata.com/2014/03/18/coffee-place-geography/).


## Dependencies

In [1]:
import pandas as pd
import altair as alt

## Constants

In [2]:
COFFEE_URL = ('https://github.com/rfordatascience/tidytuesday/blob/master/'
              + 'data/2018/2018-05-07/week6_coffee_chains.xlsx?raw=true')

In [3]:
ISO_URL = ('https://raw.githubusercontent.com/lukes/'
           + 'ISO-3166-Countries-with-Regional-Codes/master/all/all.csv')

## Main

### Load the data

In [4]:
coffee = pd.read_excel(COFFEE_URL)

In [1]:
iso = pd.read_csv(ISO_URL)

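One thing worth checking when loading an ISO country table with `pd.read_csv`: by default pandas treats the string `NA` as a missing value, and `NA` is also Namibia's alpha-2 code. The sketch below uses a made-up two-row excerpt (the `name` column and the rows are assumptions for illustration, not the real file) to show the difference `keep_default_na=False` makes. It should not affect the European subset used in this post, but it could matter for a worldwide merge.

```python
import io

import pandas as pd

# Hypothetical two-row excerpt shaped like the ISO table (columns trimmed).
iso_csv = 'name,alpha-2,region\nNamibia,NA,Africa\nFrance,FR,Europe\n'

# Default parsing: the literal string "NA" is converted to a missing value.
default_parse = pd.read_csv(io.StringIO(iso_csv))

# With keep_default_na=False the code is kept as the string "NA".
literal_parse = pd.read_csv(io.StringIO(iso_csv), keep_default_na=False)

print(default_parse['alpha-2'].isna().tolist())
print(literal_parse['alpha-2'].tolist())
```

Note that `keep_default_na=False` also leaves genuinely empty cells as empty strings, so it is a trade-off rather than a free fix.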

### Process the data

In [31]:
#| code-summary: 'Get N Starbucks per city'
v_euro_starbucks = (
    coffee
    .merge(iso, left_on='Country', right_on='alpha-2', how='inner')
    .query("Brand == 'Starbucks' and region == 'Europe'")
    .groupby(['Country', 'City'])['City']
    .count()
    .rename('n_starbucks')
    .sort_values(ascending=False)
    .to_frame()
    .reset_index()
)

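|     | Country | City      | n_starbucks |
|-----|---------|-----------|-------------|
| 0   | GB      | London    | 195         |
| 1   | RU      | Moscow    | 74          |
| 2   | FR      | Paris     | 60          |
| 3   | ES      | Madrid    | 47          |
| 4   | IE      | Dublin    | 46          |
| ... | ...     | ...       | ...         |
| 681 | GB      | Crewe     | 1           |
| 682 | GB      | Craigavon | 1           |
| 683 | GB      | Corley    | 1           |
| 684 | GB      | Coleraine | 1           |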
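To make the merge, filter, and count steps easier to follow, here is a minimal sketch of the same pipeline on toy data. The rows in `toy_coffee` and `toy_iso` are invented for illustration, and `.size()` is swapped in for `['City'].count()`; the two give the same counts here.

```python
import pandas as pd

# Toy stand-ins for the real tables (all values invented for illustration).
toy_coffee = pd.DataFrame({
    'Brand': ['Starbucks', 'Starbucks', 'Starbucks', 'Costa', 'Starbucks'],
    'City': ['London', 'London', 'Paris', 'London', 'Seattle'],
    'Country': ['GB', 'GB', 'FR', 'GB', 'US'],
})
toy_iso = pd.DataFrame({
    'alpha-2': ['GB', 'FR', 'US'],
    'region': ['Europe', 'Europe', 'Americas'],
})

toy_counts = (
    toy_coffee
    # Attach the region to each shop via its two-letter country code.
    .merge(toy_iso, left_on='Country', right_on='alpha-2', how='inner')
    # Keep only Starbucks shops in European countries.
    .query("Brand == 'Starbucks' and region == 'Europe'")
    # Count rows per (Country, City) pair.
    .groupby(['Country', 'City'])
    .size()
    .rename('n_starbucks')
    .sort_values(ascending=False)
    .reset_index()
)

print(toy_counts)
```

The Costa shop and the Seattle shop drop out at the `query` step, leaving London with two shops and Paris with one.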


### Visualise the data

In [36]:
#| code-summary: 'Top 15 European cities by n_starbucks'
alt.Chart(v_euro_starbucks.head(15)).mark_bar().encode(
    x=alt.X('n_starbucks'),
    y=alt.Y('City', sort='-x'),
    color=alt.Color('Country', legend=None)
).properties(
    width=400,
    height=600,
    title='Top 15 European Cities by Number of Starbucks'
)