# Lecture 28 – Maps

## Data 94, Spring 2021

In [None]:
from datascience import *
import numpy as np

Table.interactive_plots()

## Review: scatter plots and line plots

In [None]:
wm = Table.read_table('data/walmart.csv').select('STREETADDR', 'STRCITY', 'STRSTATE', 'type_store', 'LAT', 'LON', 'YEAR')
wm

In [None]:
wm.group('YEAR')

In [None]:
wm.group('YEAR').plot('YEAR',
                      title = 'Number of Walmarts Opened Per Year')
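`group('YEAR')` counts how many rows share each value of `'YEAR'`. The sketch below illustrates that counting step with NumPy's `np.unique` and `return_counts=True`, which produces the same kind of value/count pairs. It only shows the idea and is not how `datascience` implements `group`. The years are made up.

```python
import numpy as np

# Hypothetical opening years for a handful of stores (not real Walmart data)
years = np.array([1962, 1964, 1964, 1968, 1968, 1968])

# np.unique returns the sorted distinct values and, with return_counts=True,
# how many times each one appears, much like the 'count' column from group
distinct_years, counts = np.unique(years, return_counts=True)

print(distinct_years)  # [1962 1964 1968]
print(counts)          # [1 2 3]
```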

In [None]:
wm_per_year = wm.group('YEAR')
wm_per_year = wm_per_year.with_columns(
    'total', np.cumsum(wm_per_year.column('count'))
)
wm_per_year

In [None]:
wm_per_year.plot('YEAR', 'total',
                 title = 'Total Number of Walmarts Over Time')
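`np.cumsum` replaces each entry with the sum of all entries up to and including it. That is what turns per-year counts into a running total. Here is a small example with made-up counts:

```python
import numpy as np

# Hypothetical number of stores opened in each of four consecutive years
opened = np.array([1, 2, 3, 5])

# Running total: [1, 1+2, 1+2+3, 1+2+3+5]
total = np.cumsum(opened)
print(total)  # [ 1  3  6 11]
```

The last entry of the running total always equals the sum of all the counts.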

## Maps with circles

In [None]:
wm_ca = wm.where('STRSTATE', 'CA')
wm_ca

In [None]:
wm_ca.select('LAT', 'LON')

In [None]:
Circle.map_table(wm_ca.select('LAT', 'LON'))

### Modifying circle appearance

In [None]:
Circle.map_table(wm_ca.select('LAT', 'LON'),
                 area = 200,
                 weight = 1.5, 
                 line_color = 'gold',
                 color = 'purple', 
                 fill_opacity = 0.8
                 )

### `labels`

In [None]:
wm_ca.select('LAT', 'LON', 'STREETADDR')

In [None]:
wm_ca_labeled = wm_ca.select('LAT', 'LON', 'STREETADDR').relabeled('STREETADDR', 'labels')
wm_ca_labeled

In [None]:
Circle.map_table(wm_ca_labeled)

### `color_scale`

In [None]:
wm_ca_scales = wm_ca.select('LAT', 'LON', 'STRCITY', 'YEAR') \
                    .relabeled(['STRCITY', 'YEAR'], ['labels', 'color_scale'])

wm_ca_scales

In [None]:
Circle.map_table(wm_ca_scales,
                 fill_opacity = 0.8,
                 line_color = None,
                 area = 200)

The map above is consistent with [this LA Times article from 1990](https://www.latimes.com/archives/la-xpm-1990-06-11-mn-151-story.html), which says:

> The company plans to open 10 stores in California in 1990 and 1991, with most to be located in the interior sections of the state. This year, it will open stores in Lancaster, Victorville, El Centro, Madera, Modesto, Ridgecrest and Stockton. In 1991, it plans stores in Elk Grove, Hanford and Bakersfield.

### `colors`

In [None]:
wm_ca

In [None]:
def color_from_type(type_store):
    if type_store == 'Wal-Mart':
        return 'blue'
    else:
        return 'red'

In [None]:
wm_ca = wm_ca.with_columns(
    'colors', wm_ca.apply(color_from_type, 'type_store')
)

wm_ca

In [None]:
wm_ca.select('LAT', 'LON', 'colors')

In [None]:
Circle.map_table(wm_ca.select('LAT', 'LON', 'colors'),
                 fill_opacity = 0.6,
                 line_color = None,
                 area = 200)
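With a single column name, `wm_ca.apply(color_from_type, 'type_store')` calls `color_from_type` once per row and collects the results into an array. The sketch below reproduces that element-by-element behavior with a list comprehension. It also shows a vectorized alternative using `np.where`. The store types are a made-up sample. Like `color_from_type`, the sketch treats every store that isn't `'Wal-Mart'` as red.

```python
import numpy as np

# Same function as above
def color_from_type(type_store):
    if type_store == 'Wal-Mart':
        return 'blue'
    else:
        return 'red'

# Hypothetical sample of the 'type_store' column
types = np.array(['Wal-Mart', 'Supercenter', 'Wal-Mart'])

# Element by element, like Table.apply with one column
colors_loop = np.array([color_from_type(t) for t in types])

# Vectorized equivalent: 'blue' where the condition holds, 'red' elsewhere
colors_vec = np.where(types == 'Wal-Mart', 'blue', 'red')

print(colors_loop)  # ['blue' 'red' 'blue']
```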

It seems like most Walmarts in California are standard locations and only a few are Supercenters.

What about in the rest of the country?

In [None]:
wm = wm.with_columns(
    'colors', wm.apply(color_from_type, 'type_store')
)

Circle.map_table(wm.select('LAT', 'LON', 'colors'),
                 fill_opacity = 0.8,
                 line_color = None,
                 area = 20)

In many large metro areas there is a concentration of standard Walmarts (blue). Supercenters are more common in the eastern part of the country.

Remember this data is from 2006; things have changed since then.

### Quick Check 1

In [None]:
wm

In [None]:
# qc = wm.where('STRSTATE', 'AR') \
#        .select(..., ..., 'YEAR', ...) \
#        .relabeled('YEAR', ...)

# Circle.map_table(qc, 
#                  line_color = None, 
#                  fill_opacity = 0.7)

## Maps with markers (pins)

In [None]:
wm_ca.select('LAT', 'LON', 'colors')

In [None]:
Marker.map_table(wm_ca.select('LAT', 'LON', 'colors'))

### `marker_icon`

Most icon names [at this site](https://getbootstrap.com/docs/3.3/components/) work, but drop the `glyphicon-` prefix. For example, use `'shopping-cart'` rather than `'glyphicon-shopping-cart'`.

In [None]:
# Try changing 'shopping-cart' to 'off', 'euro', or 'remove'
Marker.map_table(wm_ca.select('LAT', 'LON', 'colors'), marker_icon = 'shopping-cart')

### `clustered_marker`

In [None]:
Marker.map_table(wm.select('LAT', 'LON'), clustered_marker = True, marker_icon = 'shopping-cart')

## Example: COVID cases

This data was pulled from [Johns Hopkins University's Center for Systems Science and Engineering](https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data/csse_covid_19_time_series) on April 6th, 2021.

It records the **cumulative** number of cases in each county for every day since January 22, 2020.

In [None]:
covid = Table.read_table('data/jhu-covid.csv')

In [None]:
covid

Let's draw a map showing the average number of new cases per day in each county over the last 7 days.

To do this, we take the cumulative number of cases on April 5 and subtract the cumulative number on March 29. That leaves the number of new cases reported over those 7 days, which we then divide by 7.
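Here is a quick numerical check, using made-up cumulative counts, that "last minus first, divided by 7" equals the average of the seven daily increases. The check uses `np.diff`, which subtracts consecutive elements and is covered at the end of this notebook.

```python
import numpy as np

# Hypothetical cumulative case counts for 3/29 through 4/5 (8 days)
cumulative = np.array([100, 104, 109, 109, 115, 120, 126, 135])

# Approach used in the lecture: (last day - first day) / 7
avg_from_endpoints = (cumulative[-1] - cumulative[0]) / 7

# Same quantity computed as the mean of the 7 daily increases
daily_new = np.diff(cumulative)
avg_from_daily = daily_new.mean()

print(avg_from_endpoints, avg_from_daily)  # 5.0 5.0
```

The two agree because the daily increases telescope. Adding up every day-to-day difference leaves only the last count minus the first.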

In [None]:
april = covid.select('Combined_Key', 'Lat', 'Long_', '3/29/21', '4/5/21')
april

In [None]:
april = april.with_columns(
    '7-day avg', np.round((april.column('4/5/21') - april.column('3/29/21')) / 7)
)

april

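`Circle.map_table` reads the first two columns as latitude and longitude and looks for specially named columns such as `'labels'` and `'color_scale'`. So we need to select and relabel our columns before mapping.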

In [None]:
april_for_map = april.select('Lat', 'Long_', 'Combined_Key', '7-day avg') \
     .relabeled(['Combined_Key', '7-day avg'], ['labels', 'color_scale'])

april_for_map

Something looks off: a few counties have a negative 7-day average. Cumulative counts shouldn't decrease, so this is almost certainly due to data logging issues. We'll need to drop these rows before continuing, since they would distort our color scale.
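The sketch below shows how a negative average can arise. In this made-up example, a county's cumulative count is revised downward partway through the week. The endpoint difference then goes negative. A boolean mask, similar in spirit to `where('color_scale', are.above_or_equal_to(0))`, removes such values. All numbers are invented.

```python
import numpy as np

# Hypothetical cumulative counts for one county, 3/29 through 4/5,
# where the running total was revised downward partway through the week
cumulative = np.array([500, 502, 505, 507, 470, 472, 475, 478])

# Endpoint difference divided by 7, as in the lecture
avg = (cumulative[-1] - cumulative[0]) / 7
print(avg)  # negative (about -3.14), even though a true cumulative total shouldn't decrease

# Hypothetical 7-day averages for several counties; keep only the non-negative ones
averages = np.array([12.0, -3.0, 0.0, 45.0])
kept = averages[averages >= 0]
print(kept)
```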

In [None]:
april_for_map.sort('color_scale')

In [None]:
april_for_map = april_for_map.where('color_scale', are.above_or_equal_to(0))

Time to call `Circle.map_table`.

In [None]:
Circle.map_table(april_for_map,
                 area = 50,
                 fill_opacity = 1,
                 line_color = None)

We can take things a step further by creating more informative labels.

In [None]:
april

In [None]:
def make_label(name, avg):
    name_no_us = name.replace(', US', '')
    s = '<b>' + name_no_us + '</b>' + '<br>'
    s += '7-day avg: ' + str(int(avg))
    return s

In [None]:
print(make_label('Autauga, Alabama, US', 6))
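For comparison, here is the same label built with an f-string. `make_label_f` is a hypothetical name used only for this sketch. It should return exactly what `make_label` returns.

```python
def make_label_f(name, avg):
    # Drop the trailing ', US'; <b> bolds the county name and <br> starts a new line
    name_no_us = name.replace(', US', '')
    return f'<b>{name_no_us}</b><br>7-day avg: {int(avg)}'

print(make_label_f('Autauga, Alabama, US', 6))
# <b>Autauga, Alabama</b><br>7-day avg: 6
```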

In [None]:
april.apply(make_label, 'Combined_Key', '7-day avg')

In [None]:
april_for_new_map = april.with_columns(
    'labels', april.apply(make_label, 'Combined_Key', '7-day avg')
).select('Lat', 'Long_', 'labels', '7-day avg') \
 .relabeled('7-day avg', 'color_scale') \
 .where('color_scale', are.above_or_equal_to(0))

april_for_new_map

In [None]:
Circle.map_table(april_for_new_map,
                 area = 50,
                 fill_opacity = 1,
                 line_color = None)

Now each circle shows the county name and the average number of new COVID cases per day over the past 7 days in that county.

## Extra: cumulative cases in Alameda county

**Note**: The exploration here won't be covered in lecture, and includes programming that is slightly more involved than you'll be responsible for. Nevertheless, you may find it interesting, so take a look!

The dataset has one column per date, but `plot` expects the dates in a single column, with one row per date.

In [None]:
# 'Admin2' holds the county name; the date columns start at index 11
alameda = covid.where('Admin2', 'Alameda').select(np.arange(11, covid.num_columns))
alameda

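Rotating the table is straightforward. One column holds the date labels, and the other holds the case counts from Alameda's single row: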

In [None]:
alameda_rotated = Table().with_columns(
    'Date', alameda.labels,
    'Cases', alameda.row(0)
)

alameda_rotated
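Conceptually, this rotation pairs each date label with the value in the same position of the Alameda row. Here is a plain-Python sketch of that pairing, using a few made-up values:

```python
# Hypothetical date labels and one row of cumulative counts
labels = ['1/22/20', '1/23/20', '1/24/20']
row = [0, 0, 1]

# Each (label, value) pair becomes one row of the rotated table
rotated = list(zip(labels, row))
print(rotated)  # [('1/22/20', 0), ('1/23/20', 0), ('1/24/20', 1)]
```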

What *is* a problem is that the dates are strings such as `'1/22/20'`, which `datascience` doesn't recognize as numbers. Converting each string to a Python `datetime` object fixes this; run the following cell to do so.

In [None]:
from datetime import datetime

def convert_date(date):
    return datetime.strptime(date, '%m/%d/%y')

alameda_rotated = alameda_rotated.with_columns(
    'Date', alameda_rotated.apply(convert_date, 'Date')
)

alameda_rotated
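Here is what `convert_date` does to individual labels. The format `'%m/%d/%y'` means month/day/two-digit year, and `strptime` also accepts values without leading zeros, such as `'1/22/20'`. Unlike strings, `datetime` objects compare chronologically and can be subtracted.

```python
from datetime import datetime

def convert_date(date):
    # '%m/%d/%y' = month/day/two-digit year; leading zeros are optional when parsing
    return datetime.strptime(date, '%m/%d/%y')

first = convert_date('1/22/20')
last = convert_date('4/5/21')
print(first)                # 2020-01-22 00:00:00
print((last - first).days)  # 439

# As strings, October sorts before February because '1' < '2' character by character...
print('10/1/20' < '2/1/20')                              # True
# ...but as datetime objects the dates compare chronologically
print(convert_date('10/1/20') < convert_date('2/1/20'))  # False
```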

Great, now run the following cell to draw the line plot:

In [None]:
alameda_rotated.plot('Date',
                     title = 'Total Number of COVID-19 Cases in Alameda County')

Awesome. But what if we want the number of new cases per day? We can compute that too, using `np.diff`, which subtracts each element of an array from the element that follows it. (Notice that calling `np.diff` on an array of length `n` returns an array of length `n-1`.)

In [None]:
np.diff(np.array([5, 4, 9, 1, 8]))

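We can use it on the `'Cases'` column of `alameda_rotated`. The result has one fewer element than the table has rows, so we put a 0 at the front with `np.append`. This treats the first day as having no new cases.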

In [None]:
alameda_rotated = alameda_rotated.with_columns(
    'New Cases', np.append(0, np.diff(alameda_rotated.column('Cases')))
)

alameda_rotated
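`np.diff` also accepts a `prepend` argument (NumPy 1.16 and later). However, it is not a drop-in replacement for `np.append(0, ...)`. With `prepend=0`, the first entry becomes the first cumulative value rather than 0. The upside is that `np.cumsum` then exactly recovers the original cumulative counts. Here is a sketch with made-up counts:

```python
import numpy as np

# Hypothetical cumulative counts
cases = np.array([3, 5, 9, 9, 14])

# Approach used above: the first day gets 0 new cases
new_lecture = np.append(0, np.diff(cases))  # [0 2 4 0 5]

# prepend=0 treats the day before the data as having 0 cases,
# so the first entry equals the first cumulative value
new_prepend = np.diff(cases, prepend=0)     # [3 2 4 0 5]

# A running total of new_prepend undoes the differencing
print(np.cumsum(new_prepend))               # [ 3  5  9  9 14]
```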

In [None]:
alameda_rotated.plot('Date', 'New Cases',
                     title = 'Number of New COVID-19 Cases in Alameda County Per Day')

Hmm, there are a few jumps that don't quite seem right. What do you think happened? 🤔

(Hint: hover over the values for February 5, February 6, and February 7. What happens when you add the values for February 5 and February 6?)