# Larger library pageviews dataset with big map

- Full 1M row dataset with interactive map
- VegaFusion keeps the interaction reasonably responsive even though the data transforms are defined in Altair

**[Download this dataset](https://www.dropbox.com/scl/fi/wib4ufqxtfdckdch35z06/pageviews_20130403.csv.zip?rlkey=nmyo5efnfl1hq62gncz0vespt&dl=0)**
(10 MB zipped, 148 MB CSV)

In [1]:
import pandas as pd
import altair as alt
from altair import datum

# Enable VegaFusion to avoid Altair's MaxRowsError on datasets over 5,000 rows
import vegafusion as vf
vf.enable_widget()

vegafusion.enable_widget()

## Read in library web site page views data

Also extract longitude and latitude from the `loc` location string

***NOTE: You need to change the file path to where you downloaded the file!***
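One way to handle the path change is to build it in a single place and check that the file exists before reading it. This is only a sketch: the `Downloads` folder and the names `data_dir` and `csv_path` are assumptions for illustration, not part of the original notebook.

```python
from pathlib import Path

# Hypothetical location: adjust to wherever you saved and unzipped the download.
data_dir = Path.home() / "Downloads"
csv_path = data_dir / "pageviews_20130403.csv"

# Fail early with a readable message instead of a long pandas traceback.
if not csv_path.exists():
    print(f"File not found: {csv_path}")
```

`csv_path` can then be passed to `pd.read_csv` in place of the hard-coded string in the next cell.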

In [2]:
pageviews = pd.read_csv('/Users/emonson/Dropbox/Projects/LibraryVis/pageviews_20130403.csv',
                       parse_dates=['timestamp'])
pageviews[['longitude','latitude']] = pageviews['loc'].str.extract(r'\[ ([0-9.-]+), ([0-9.-]+) \]').astype('float')
pageviews.head()

| Unnamed: 0 | timestamp | lcc_first_letter | lcc_category | visitors | visitors_per_lcc_category | visitors_per_lcc_first | loc | city | region | country | longitude | latitude |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2012-01-01 20:00:00+00:00 | D | DT | 1 | 0.00014 | 1.6e-05 | [ -79.792, 36.0726 ] | Greensboro | North Carolina | United States | -79.792 | 36.0726 |
| 1 | 2012-01-01 17:00:00+00:00 | P | PN | 1 | 1.2e-05 | 5e-06 | [ -121.8747, 37.6624 ] | Pleasanton | California | United States | -121.8747 | 37.6624 |
| 2 | 2012-01-01 00:00:00+00:00 | D | DS | 1 | 4.1e-05 | 1.6e-05 | [ -80.84310000000001, 35.2271 ] | Charlotte | North Carolina | United States | -80.8431 | 35.2271 |
| 3 | 2012-01-01 13:00:00+00:00 | J | JK | 2 | 0.000429 | 6.9e-05 | [ -78.8986, 35.994 ] | Durham | North Carolina | United States | -78.8986 | 35.994 |
| 4 | 2012-01-01 10:00:00+00:00 | P | PT | 1 | 0.000211 | 5e-06 | [ -78.8986, 35.994 ] | Durham | North Carolina | United States | -78.8986 | 35.994 |
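To see what the extraction pattern captures, it can be tried on individual `loc` strings with Python's `re` module. The sketch below reuses the pattern from the cell above; `parse_loc` is a hypothetical helper introduced only for this illustration, and the sample strings are taken from the table output.

```python
import re

# Same pattern as in the notebook cell above.
LOC_PATTERN = re.compile(r'\[ ([0-9.-]+), ([0-9.-]+) \]')

def parse_loc(loc):
    """Return (longitude, latitude) as floats, or None if the string does not match."""
    match = LOC_PATTERN.search(loc)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))

print(parse_loc("[ -79.792, 36.0726 ]"))
print(parse_loc("[ -80.84310000000001, 35.2271 ]"))
print(parse_loc("unknown"))  # no match, so None
```

Two things worth keeping in mind: in pandas, `str.extract` yields `NaN` for rows that do not match, and the character class `[0-9.-]` would not match a coordinate written in scientific notation (for example `1e-05`).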


### Dataset is over 1M rows

In [3]:
len(pageviews)

1064922

## Map of non-NC world cities, time of day interaction/filtering

*The interaction helps you see that different regions of the world visit at different times of day!*

### Load in the countries map features from vega-datasets

In [4]:
countries = alt.topo_feature('https://vega.github.io/vega-datasets/data/world-110m.json', 
                             feature='countries')

### Explore visitors by time of day

- Altair aggregates the mean latitude and longitude within each city
- The same [map projection](https://altair-viz.github.io/gallery/world_projections.html) must be applied to both the points and the geographic shapes
- *Note that the symbol sizes adapt to the data that pass through the filter*
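As a rough picture of the filter-then-aggregate step described above, the same computation can be written in pandas on a tiny made-up sample. This is a sketch under stated assumptions, not what Altair or VegaFusion run internally: the sample rows, the brushed range `hour_min`/`hour_max`, and the names `kept` and `per_city` are all hypothetical, and the hour is read directly from the stored timestamp, whereas the chart's `hours()` time unit may apply a different time zone.

```python
import pandas as pd

# Tiny made-up sample using the same column names as the pageviews table.
sample = pd.DataFrame({
    "timestamp": pd.to_datetime(
        ["2012-01-01 09:00", "2012-01-01 10:00",
         "2012-01-01 10:00", "2012-01-01 20:00"],
        utc=True,
    ),
    "visitors": [1, 2, 3, 4],
    "city": ["Pleasanton", "Pleasanton", "Durham", "Pleasanton"],
    "region": ["California", "California", "North Carolina", "California"],
    "country": ["United States"] * 4,
    "longitude": [-121.8747, -121.8747, -78.8986, -121.8747],
    "latitude": [37.6624, 37.6624, 35.994, 37.6624],
})

# Hypothetical brushed range on the hour-of-day axis (start inclusive, end exclusive).
hour_min, hour_max = 8, 12

# Step 1: keep rows inside the brushed hours and outside North Carolina.
hours = sample["timestamp"].dt.hour
in_brush = (hours >= hour_min) & (hours < hour_max)
kept = sample[in_brush & (sample["region"] != "North Carolina")]

# Step 2: one row per city, with summed visitors and mean coordinates.
per_city = (
    kept.groupby(["country", "region", "city"])
        .agg(sum_visitors=("visitors", "sum"),
             mean_longitude=("longitude", "mean"),
             mean_latitude=("latitude", "mean"))
        .reset_index()
)
print(per_city)
```

Because the aggregation runs after the filter, `sum_visitors` changes whenever the brushed range changes, which is why the symbol sizes rescale during interaction.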

In [7]:
proj_type = 'mercator'
width = 600
height = 500
clip_extent = [[0,0.075*height],[width,0.8*height]]

interval_x = alt.selection_interval(encodings=['x'])

line = alt.Chart(pageviews).mark_line().encode(
    x = 'hours(timestamp):T',
    y = 'sum(visitors):Q'
).add_params(
    interval_x
).properties(
    width=400,
    height=100
)

background = alt.Chart(countries).mark_geoshape(
    fill='#e5d8bd',
    stroke='white',
    opacity=0.5
).project(
    type = proj_type, 
    clipExtent = clip_extent
).properties(
    width = width,
    height = height
)

points = alt.Chart(pageviews).mark_circle(
    opacity = 0.5,
    color = '#7570b3'
).encode(
    longitude='mean_longitude:Q',
    latitude='mean_latitude:Q',
    size='sum_visitors:Q',
    tooltip=alt.Tooltip(['country:N', 'region:N', 'city:N', 'sum_visitors:Q']),
).transform_filter(
    (interval_x) & (datum.region != 'North Carolina')
).transform_aggregate(
    sum_visitors = 'sum(visitors)',
    mean_longitude = 'mean(longitude)',
    mean_latitude = 'mean(latitude)',
    groupby=['country','region','city']
).project(
    type = proj_type, 
    clipExtent= clip_extent
).properties(
    width = width,
    height = height
)

(background + points) & line

VegaFusionWidget(spec='{\n  "config": {\n    "view": {\n      "continuousWidth": 300,\n      "continuousHeight…

