In [1]:
import altair as alt
import pandas as pd

In [3]:
d = pd.read_json("https://vega.github.io/vega-datasets/data/udistrict.json")

In [4]:
d.head()

|   | key      | lat       |
|---|----------|-----------|
| 0 | bakeries | 47.66887  |
| 1 | bakeries | 47.661781 |
| 2 | bakeries | 47.65998  |
| 3 | bakeries | 47.663373 |
| 4 | bakeries | 47.65821  |


**Part 1**

In [19]:
alt.Chart(d).mark_tick(height=50).encode(
    x=alt.X("lat:Q", scale=alt.Scale(domain=[d["lat"].min() - 1e-3, d["lat"].max() + 1e-3])),
).properties(
    width=600,
    height=100,
)


1. All of the latitude points sit almost on top of each other in a very narrow band near 47.66 degrees, so the ticks overlap heavily.
It would be useful to know what is located in this band and, ideally, each point's longitude as well. More data about each place than a single category label would also help; information such as opening hours or price points would be useful.

**Part 2**

In [14]:
mean_tick = alt.Chart(d).mark_tick().encode(
    alt.X('lat:Q', aggregate='mean', scale=alt.Scale(domain=(47.64, 47.68)), axis=alt.Axis(grid=False))
).properties(
    title='Mean, Min, Max, and St Dev Bars of Latitude'
)
# Avoid naming these `min`/`max`: that would shadow Python's built-ins.
min_tick = alt.Chart(d).mark_tick().encode(
    alt.X('lat:Q', aggregate='min')
)
max_tick = alt.Chart(d).mark_tick().encode(
    alt.X('lat:Q', aggregate='max')
)
# extent='stdev' draws a bar from (mean - 1 sd) to (mean + 1 sd).
stdev_bar = alt.Chart(d).mark_errorbar(extent='stdev').encode(
    alt.X('lat:Q')
)

mean_tick + min_tick + max_tick + stdev_bar


In [15]:
mean_tick = alt.Chart(d).mark_tick().encode(
    alt.X('lat:Q', aggregate='mean', scale=alt.Scale(domain=(47.64, 47.68)), axis=alt.Axis(grid=False))
).properties(
    title='Mean, Min, Max, and IQR Bars of Latitude'
)
# Avoid naming these `min`/`max`: that would shadow Python's built-ins.
min_tick = alt.Chart(d).mark_tick().encode(
    alt.X('lat:Q', aggregate='min')
)
max_tick = alt.Chart(d).mark_tick().encode(
    alt.X('lat:Q', aggregate='max')
)
# extent='iqr' draws a bar from the first quartile to the third quartile.
iqr_bar = alt.Chart(d).mark_errorbar(extent='iqr').encode(
    alt.X('lat:Q')
)

mean_tick + min_tick + max_tick + iqr_bar
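
The two charts above draw a handful of summary statistics. As a cross-check, the sketch below computes them directly with Python's `statistics` module. It assumes the latitudes are available as a plain list (here only the five rows shown by `d.head()`, so the numbers will not match the full dataset). It also assumes that Vega-Lite's `stdev` and quartile aggregates correspond to the sample standard deviation and to `statistics.quantiles(..., method="inclusive")`; we have not verified that the quartile interpolation matches exactly.

```python
import statistics

def lat_summary(lats):
    """Summarize a list of latitudes the way the Part 2 charts do."""
    mean = statistics.mean(lats)
    sd = statistics.stdev(lats)  # sample standard deviation (n - 1 denominator)
    # Quartiles with linear interpolation between order statistics.
    q1, _, q3 = statistics.quantiles(lats, n=4, method="inclusive")
    return {
        "mean": mean,
        "min": min(lats),
        "max": max(lats),
        "stdev_band": (mean - sd, mean + sd),  # extent='stdev': mean +/- 1 sd
        "iqr_band": (q1, q3),                  # extent='iqr': Q1 to Q3
    }

# Only the five rows shown by d.head(); the full dataset will give different numbers.
sample = [47.66887, 47.661781, 47.65998, 47.663373, 47.65821]
for name, value in lat_summary(sample).items():
    print(name, value)
```

Note that the function calls the built-in `min` and `max`, which would fail in a notebook where those names had been reassigned to charts.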


**Part 3**

In [24]:
alt.Chart(d).transform_density(
    'lat',
    as_=['lat','density']
).mark_area().encode(
    alt.X('lat:Q'),
    alt.Y('density:Q')
)


Features of interest:
There is a large concentration of points around 47.657 and 47.663, with the density tapering off on both sides of these two peaks.
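
The two peaks were read off the chart by eye. One way to locate them programmatically, sketched here under the assumption that the density curve has already been sampled on a grid (for example, the `lat` and `density` columns that `transform_density` produces), is to look for interior points that are strictly higher than both neighbours. The toy curve in the example is made up for illustration.

```python
def local_maxima(xs, ys):
    """Return the x positions where ys has a strict local maximum.

    xs and ys are parallel lists describing a sampled curve,
    e.g. latitude and estimated density.
    """
    peaks = []
    for i in range(1, len(ys) - 1):
        # A peak is higher than both of its immediate neighbours.
        if ys[i] > ys[i - 1] and ys[i] > ys[i + 1]:
            peaks.append(xs[i])
    return peaks

# Hypothetical curve with two bumps, not real output from the notebook.
xs = [47.655, 47.656, 47.657, 47.658, 47.660, 47.662, 47.663, 47.664]
ys = [10, 40, 90, 60, 30, 70, 110, 50]
print(local_maxima(xs, ys))  # [47.657, 47.663]
```

Flat-topped peaks (plateaus) are not detected by this strict comparison, and a very small bandwidth will produce many spurious peaks, so this is only meaningful on a reasonably smooth curve.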

In [30]:
alt.Chart(d).transform_density(
    'lat',
    as_=['lat','density'],
    bandwidth=1e-4
).mark_area().encode(
    alt.X('lat:Q'),
    alt.Y('density:Q')
)


In [32]:
alt.Chart(d).transform_density(
    'lat',
    as_=['lat','density'],
    extent=[d['lat'].min()-1e-2,d['lat'].max()+1e-2]
).mark_area().encode(
    alt.X('lat:Q'),
    alt.Y('density:Q')
)


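Changing the bandwidth changes the readability of the graph dramatically. The bandwidth is the standard deviation of the Gaussian kernel placed on each point: the smaller the bandwidth, the more detailed and jagged the curve, whereas a large bandwidth smooths it out until it looks like a rectangle. The result is not truly a uniform distribution; each kernel is simply so much wider than the narrow range of the data that the estimate is nearly constant over the plotted extent, which by default stops at the smallest and largest observed latitude.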
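
To make the role of the bandwidth concrete, here is a minimal pure-Python Gaussian kernel density estimate, written for illustration rather than as Vega's actual implementation. Each data point contributes a normal curve whose standard deviation is the bandwidth `h`, and the estimate is the average of those curves. The points are the five latitudes from `d.head()`, and the curve is evaluated only between their minimum and maximum.

```python
import math

def gaussian_kde(data, h):
    """Return f(x), a Gaussian kernel density estimate with bandwidth h."""
    n = len(data)
    norm = 1.0 / (n * h * math.sqrt(2 * math.pi))

    def f(x):
        # Sum one normal bump (std dev = h) centred on every data point.
        return norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data)

    return f

lats = [47.66887, 47.661781, 47.65998, 47.663373, 47.65821]
lo, hi = min(lats), max(lats)
grid = [lo + i * (hi - lo) / 20 for i in range(21)]

for h in (1e-4, 1e-3, 1e-1):
    f = gaussian_kde(lats, h)
    ys = [f(x) for x in grid]
    # Ratio of tallest to shortest value over the data range:
    # huge for a jagged curve, close to 1 for a nearly flat one.
    print(h, max(ys) / min(ys))
```

With `h = 1e-1` the ratio is below about 1.01, which is the flat, rectangle-like curve; with `h = 1e-4` it is astronomically large because the estimate drops almost to zero between neighbouring points.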

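Changing the extent lets the curve taper smoothly to zero at both ends instead of being cut off abruptly at the smallest and largest latitude. It does not change the estimated density itself, only the range over which it is evaluated and drawn, so the effect is mostly on the "niceness" of the plot.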
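
The sketch below illustrates this with a Gaussian KDE built from `statistics.NormalDist` over the five `d.head()` latitudes, using the same bandwidth (`1e-3`) and padding (`1e-2`) as the faceted chart in Part 4. The 200-point evaluation grid is an assumption made for illustration, not necessarily how Vega samples the curve. What matters is the first and last plotted values: with the default extent the curve starts and ends well above zero (so the area ends in a vertical edge), while with the padded extent it starts and ends at essentially zero.

```python
from statistics import NormalDist

lats = [47.66887, 47.661781, 47.65998, 47.663373, 47.65821]
h = 1e-3  # bandwidth, as in the Part 4 chart

def density(x):
    # Gaussian KDE: average of normal pdfs (std dev h) centred on each point.
    return sum(NormalDist(xi, h).pdf(x) for xi in lats) / len(lats)

def sample_grid(extent, steps=200):
    """Evenly spaced evaluation points across [lo, hi]; steps=200 is an assumption."""
    lo, hi = extent
    return [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]

default_extent = (min(lats), max(lats))
padded_extent = (min(lats) - 1e-2, max(lats) + 1e-2)

for extent in (default_extent, padded_extent):
    grid = sample_grid(extent)
    # First and last plotted density values: where the area chart starts and ends.
    print(extent, density(grid[0]), density(grid[-1]))
```

Because `density` does not depend on the extent, the curve between the data's minimum and maximum is the same in both cases. One caveat: if the number of sample points is fixed, widening the extent spreads them more thinly, so a very wide extent can make the plotted curve slightly coarser.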

**Part 4**

In [35]:
alt.Chart(d).mark_tick(height=50).encode(
    x=alt.X("lat:Q", scale=alt.Scale(domain=[d["lat"].min() - 1e-3, d["lat"].max() + 1e-3])),
    color=alt.Color('key')
).properties(
    width=600,
    height=100,
    title='UNREADABLE'
)


In [38]:
alt.Chart(d).transform_density(
    'lat',
    as_=['lat', 'density'],
    groupby=['key'],
    bandwidth=1e-3,
    extent=[d["lat"].min() - 1e-2, d["lat"].max() + 1e-2],
).mark_area().encode(
    x="lat:Q",
    y='density:Q',
).properties(
    width=200,
    height=100,
).facet(
    'key:N',
    columns=5
)


Questions:

- Are there certain types of restaurants that are very concentrated?

Hawaiian, Vietnamese, and vegetarian restaurants are very tightly concentrated. For Hawaiian, we suspect this is mainly because there are very few Hawaiian restaurants (only 3).
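
To put a number on "concentrated", and to keep the small-sample caveat visible, one could compute the count and the sample standard deviation of latitude for each category. The records below are made up, in the same `(key, lat)` shape as `d`; with pandas, `d.groupby("key")["lat"].agg(["count", "std"])` should give the equivalent table for the real data.

```python
import statistics
from collections import defaultdict

def spread_by_key(records):
    """Map each category to (count, sample std dev of latitude)."""
    groups = defaultdict(list)
    for key, lat in records:
        groups[key].append(lat)
    result = {}
    for key, lats in groups.items():
        # stdev needs at least two points; report None for singletons.
        sd = statistics.stdev(lats) if len(lats) > 1 else None
        result[key] = (len(lats), sd)
    return result

# Hypothetical records, not the real udistrict data.
records = [
    ("hawaiian", 47.6610), ("hawaiian", 47.6612), ("hawaiian", 47.6611),
    ("pizza", 47.6550), ("pizza", 47.6600), ("pizza", 47.6650), ("pizza", 47.6700),
]
for key, (count, sd) in sorted(spread_by_key(records).items()):
    print(key, count, sd)
```

A category with only three points can show a small spread purely by chance, so the count column is worth reading alongside the standard deviation.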

- Are there broader types of food whose distributions are correlated with each other (e.g., all Asian restaurants, beverages, etc.)?

Japanese and Middle Eastern restaurants have very similar shapes and distributions. Breakfast places and bakeries are also very similar to each other, but they seem inversely correlated with bubble tea places: where one is dense, the other tends to be sparse. Overall, however, there does not seem to be a trend across broader types of food.