# Create Map Showing Number of Comics Over Time

The code in this notebook creates one file, `date_place_grouped.csv`, which is in a format suitable for input to [flourish.studio](https://flourish.studio/) as a Points Map graph.

This first step imports the pandas library and our dataset.

In [1]:
import pandas as pd

data_frame = pd.read_csv('../comics_as_data_north_america_2020-01-20_reconciled_full.csv')

We will now make a new column for the earliest year listed in the catalog record, which we will use to group the data. 
Since the `date_list` column is already sorted to have the earliest date first, we just need to grab the first 4 characters of that column. 

In [2]:
data_frame.loc[:, 'grouping_date'] = data_frame["date_list"].str[:4]
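
As a quick sanity check on this step, the sketch below applies the same slice to a small made-up Series and flags any result that is not exactly four digits. The toy values and the names `dates`, `is_year`, and `suspect` are illustrative assumptions, not part of the dataset.

```python
import pandas as pd

# Toy stand-in for the `date_list` column (values modelled on the preview in this notebook).
dates = pd.Series(["1976", "1978, 2016", None], name="date_list")

# Slice the first four characters; missing values stay missing.
grouping_date = dates.str[:4]

# True only where the sliced value is exactly four digits; missing values count as False.
is_year = grouping_date.str.fullmatch(r"\d{4}", na=False)

# Anything left here would need a manual look before grouping.
suspect = grouping_date[~is_year]
```

On the real data, a non-empty `suspect` would point to records whose `date_list` is missing or does not begin with a year.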

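Next, set the record number as the index so that each row is identified by its catalog record. Note that `set_index` does not remove repeated record numbers, so each record is counted only once if the record numbers are already unique.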

In [3]:
data_frame = data_frame.set_index("RECORD #")
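
Setting the index does not by itself guarantee that record numbers are unique, so it may be worth checking. The following is a minimal sketch on a made-up frame. The record numbers `b1` and `b2` are hypothetical, and keeping the first occurrence is only one possible policy.

```python
import pandas as pd

# Toy frame with a deliberately repeated record number (hypothetical values).
toy = pd.DataFrame(
    {
        "RECORD #": ["b1", "b2", "b2"],
        "full_location": ["USA", "NY, USA", "NY, USA"],
    }
).set_index("RECORD #")

# set_index keeps repeated labels, so test for them explicitly.
has_duplicates = not toy.index.is_unique

# One possible policy: keep the first row for each record number.
deduplicated = toy[~toy.index.duplicated(keep="first")]
```

If `data_frame.index.is_unique` is already `True` for the real dataset, no deduplication is needed.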

In [4]:
#run this cell to take a look at the data
data_frame

RECORD #,LANG,245|abnp,260|a,City of Publication,State/Province,Country,full_location,latitude,longitude,264|a,...,End Date,264|c,date_list,008 Country,Subj_clean,loc_subj_url,SUBJECT,AUTHOR,ADDITIONAL AUTHOR,grouping_date
b46629439,eng,Robin meets Man-Bat!,[Place of publication not identified] :,0,0,USA,USA,39.783730,-128.000000,,...,1976.0,,1976,xxu,"Superhero--Comic books, strips, etc.",,"Superhero comic books, strips, etc.",,,1976
b125567157,eng,The American Comic Strip,,0,NY,USA,"NY, USA",43.156168,-75.844995,"[Place of publication not identified] :;""New Y...",...,1978.0,"[1978];""2016."";""©1978""","1978, 2016",nyu,"Cartoonists Biography. ; Comic books, strips, ...",http://id.loc.gov/authorities/subjects/sh85020...,Cartoonists Biography. http://id.loc.gov/autho...,,Creative Arts Television (Firm) http://id.loc....,1978
b67985002,eng,Tales from the Stone Troll Café,[Place of publication not identified] :,0,0,USA,USA,39.783730,-128.000000,,...,1986.0,,1986,xxu,"Fantasy--Comic books, strips, etc. ; Fantasy c...",http://id.loc.gov/authorities/subjects/sh97000...,"Fantasy comic books, strips, etc. http://id.lo...",,,1986
b34915230,eng,Peanuts musical storybook : Around the world w...,[Place of publication not identified] :,0,0,USA,USA,39.783730,-128.000000,,...,1988.0,,1988,xxu,Musical books Specimens. ; Funny kid--Comic bo...,http://id.loc.gov/authorities/subjects/sh20081...,Musical books Specimens. http://id.loc.gov/aut...,"Kusina, Patrick. http://id.loc.gov/authorities...","Borchardt, Laure.;""Schulz, Charles M. (Charles...",1988
b23407177,eng,Homo patrol,[Place of publication not identified] :,0,0,USA,USA,39.783730,-128.000000,,...,1989.0,,1989,xxu,"Gay men--Comic books, strips, etc. ; Toleratio...",http://id.loc.gov/authorities/subjects/sh85061...,"Gay men Comic books, strips, etc. http://id.lo...","Roberts, K. L.","Roberts, Tom. http://id.loc.gov/authorities/na...",1989
b23423237,eng,Peanuts classics,[Place of publication not identified] :,0,0,USA,USA,39.783730,-128.000000,,...,1989.0,,1989,xxu,"Funny kid--Comic books, strips, etc. Calendars.",,"Funny kid comic books, strips, etc. Calendars.","Schulz, Charles M. (Charles Monroe), 1922-2000...",,1989
b34792600,eng,Back in time with B.J. and the chef. #3,[Place of publication not identified] :,0,0,USA,USA,39.783730,-128.000000,,...,1990.0,,1990,xxu,"Columbus, Christopher--Comic books, strips, et...",http://id.loc.gov/authorities/names/n78085478;...,"Columbus, Christopher Comic books, strips, etc...",,ConAgra Frozen Foods (Firm) http://id.loc.gov/...,1990
b34972596,eng,Looney tunes vacation activity book,[Place of publication not identified] :,0,0,USA,USA,39.783730,-128.000000,,...,1990.0,,1990,xxu,"Funny animal--Comic books, strips, etc. Miscel...",http://id.loc.gov/authorities/subjects/sh85109...,"Funny animal comic books, strips, etc. Miscell...",,"Holiday Inns, Inc. http://id.loc.gov/authoriti...",1990
b41902154,eng,Ronn Foss fanzine strips,[Place of publication not identified] :,0,WA,USA,"WA, USA",47.286835,-120.212614,,...,1990.0,,1990,wau,"Superhero--Comic books, strips, etc.; Fan maga...",http://id.loc.gov/authorities/subjects/sh85047176,"Superhero comic books, strips, etc.; Fan magaz...","Foss, Ronn.",,1990
b29547830,eng,"Pittsburgh Symposium on Comics, Film and Liter...",[Place of publication not identified] :,0,PA,USA,"PA, USA",40.969989,-77.727883,,...,1992.0,,1992,pau,"Comic books, strips, etc. Congresses.",http://id.loc.gov/authorities/subjects/sh85028863,"Comic books, strips, etc. Congresses. http://i...",,,1992


Now we will group first by date and then by location.

In [5]:
grouped_date_loc = data_frame.groupby(["grouping_date", "full_location"])

We use the `size()` method from pandas to count the records in each group.

In [6]:
#The first row (year 1201) is an anomaly that will need to be deleted later in Flourish.

grouped_date_loc.size()

grouping_date  full_location            
1201           Berkeley, CA, USA             1
1888           Kansas City, MO, USA          1
1897           New York, NY, USA             3
1900           Albany, NY, USA               1
               Boston, MA, USA               1
               Burbank, CA, USA              1
               Chicago, IL, USA              1
               New York, NY, USA             1
               Ridgewood, NJ, USA            1
               USA                           1
1902           Brooklyn, NY, USA             1
               Chicago, IL, USA              1
               New York, NY, USA             1
1903           New York, NY, USA             1
1905           Chicago, IL, USA              4
1906           New York, NY, USA             1
1907           Chicago, IL, USA              1
1908           Chicago, IL, USA              3
               Kansas City, MO, USA          1
               New York, NY, USA             1
1909           Batt
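
As an alternative to deleting the anomalous row in Flourish, implausible years could be dropped in pandas before grouping. This is only a sketch on made-up rows: the range 1800 to 2020 and the names `EARLIEST_YEAR`, `LATEST_YEAR`, and `plausible` are assumptions that would need adjusting to the collection.

```python
import pandas as pd

# Toy records; "1201" mimics the anomalous year seen in the output above.
toy = pd.DataFrame(
    {
        "grouping_date": ["1201", "1897", "1897", "1900"],
        "full_location": [
            "Berkeley, CA, USA",
            "New York, NY, USA",
            "New York, NY, USA",
            "Albany, NY, USA",
        ],
    }
)

# Assumed plausible range; adjust to suit the collection.
EARLIEST_YEAR, LATEST_YEAR = 1800, 2020

# Convert to numbers; unparseable values become NaN and fail the range test.
years = pd.to_numeric(toy["grouping_date"], errors="coerce")
plausible = toy[years.between(EARLIEST_YEAR, LATEST_YEAR)]

# Same grouping as in the notebook, applied to the filtered rows.
counts = plausible.groupby(["grouping_date", "full_location"]).size()
```

Filtering here keeps the exported file clean, at the cost of hiding the bad record rather than fixing it at the source.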

That looks about right, but we need to retain the latitude and longitude for the map to work correctly.
We know that there is only one lat/long pair for each location, so adding these two columns to the grouping keys won't actually create any new groups.

In [7]:
grouped_date_latlong = data_frame.groupby(["grouping_date", "full_location", "latitude", "longitude"])
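
The assumption that each location has a single lat/long pair can be checked directly. The sketch below does so on a made-up frame; the names `pairs_per_location` and `inconsistent` are illustrative.

```python
import pandas as pd

# Toy rows using coordinates copied from the preview in this notebook.
toy = pd.DataFrame(
    {
        "full_location": ["Chicago, IL, USA", "Chicago, IL, USA", "USA"],
        "latitude": [41.875555, 41.875555, 39.783730],
        "longitude": [-87.624421, -87.624421, -128.0],
    }
)

# Count distinct latitude and longitude values per location.
pairs_per_location = toy.groupby("full_location")[["latitude", "longitude"]].nunique()

# Locations with more than one distinct value would split into extra groups.
inconsistent = pairs_per_location[(pairs_per_location > 1).any(axis=1)]
```

An empty `inconsistent` frame supports the assumption. Note also that `groupby` drops rows whose key values are missing by default, so any record without coordinates would be left out of the four-key grouping.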

In [8]:
#preview the data that will be written to the csv
grouped_date_latlong.size()

grouping_date  full_location              latitude   longitude  
1201           Berkeley, CA, USA          37.870839  -122.272864     1
1888           Kansas City, MO, USA       39.100105  -94.578142      1
1897           New York, NY, USA          42.684004  -73.847987      3
1900           Albany, NY, USA            42.651167  -73.754968      1
               Boston, MA, USA            42.360253  -71.058291      1
               Burbank, CA, USA           34.181648  -118.325855     1
               Chicago, IL, USA           41.875555  -87.624421      1
               New York, NY, USA          42.684004  -73.847987      1
               Ridgewood, NJ, USA         40.979186  -74.116576      1
               USA                        39.783730  -128.000000     1
1902           Brooklyn, NY, USA          40.650104  -73.949582      1
               Chicago, IL, USA           41.875555  -87.624421      1
               New York, NY, USA          42.684004  -73.847987      1
1903        

Finally, export the grouped data to a CSV file.

In [9]:
grouped_date_latlong.size().to_csv("date_place_grouped.csv")
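
One optional variation, shown here on made-up rows, is to give the count column an explicit name before exporting so that every column in the file has a header. The column name `count` is an assumption, not something the original notebook or Flourish requires.

```python
import pandas as pd

# Toy rows using values copied from the output in this notebook.
toy = pd.DataFrame(
    {
        "grouping_date": ["1897", "1897", "1900"],
        "full_location": ["New York, NY, USA", "New York, NY, USA", "Albany, NY, USA"],
        "latitude": [42.684004, 42.684004, 42.651167],
        "longitude": [-73.847987, -73.847987, -73.754968],
    }
)

# Turn the grouped sizes into a flat table with a named count column.
counts = (
    toy.groupby(["grouping_date", "full_location", "latitude", "longitude"])
    .size()
    .reset_index(name="count")
)

# Calling to_csv without a path returns the CSV text, which is handy for a quick look.
csv_text = counts.to_csv(index=False)
```

Passing a filename such as `"date_place_grouped.csv"` to `to_csv` instead would write the same table to disk.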

