# How to Update

The website's [JavaScript file](script.js) generates visual content based on three data files: [sanborn-with-fips.json](data/sanborn-with-fips.json), [us-indexed.json](data/us-indexed.json), and [us-cities.json](data/us-cities.json).

Go ahead and import the json module and open the files that you'll be updating:

In [1]:
import json

In [2]:
with open('data/sanborn-with-fips.json') as f:
    sanborn = json.load(f)

In [5]:
with open('data/us-indexed.json') as f:
    us = json.load(f)

In [6]:
with open('data/us-cities.json') as f:
    cities = json.load(f)

## Sanborn Data Structure

For the sanborn-with-fips file, it's important to understand its structure (if you're familiar with it, skip down to the Add New Sanborns section). The file contains a list of dictionaries, where each dictionary contains the data from one state (including DC). The states are organized in alphabetical order. You can tell which state the item represents by accessing the "state" element in that dictionary. We can see this below:

In [8]:
for item in sanborn:
    print(item['state'])

Alabama
Alaska
Arizona
Arkansas
California
Colorado
Connecticut
Delaware
District of Columbia
Florida
Georgia
Hawaii
Idaho
Illinois
Indiana
Iowa
Kansas
Kentucky
Louisiana
Maine
Maryland
Massachusetts
Michigan
Minnesota
Mississippi
Missouri
Montana
Nebraska
Nevada
New Hampshire
New Jersey
New Mexico
New York
North Carolina
North Dakota
Ohio
Oklahoma
Oregon
Pennsylvania
Rhode Island
South Carolina
South Dakota
Tennessee
Texas
Utah
Vermont
Virginia
Washington
West Virginia
Wisconsin
Wyoming


Now, if we look at one specific item, say the dictionary that contains the data for Alabama, we can see that it has the 'state' and 'counties' keys:

In [10]:
sanborn[0].keys()

dict_keys(['state', 'counties'])

The value of the 'counties' key is a list of dictionaries, one per county, where the county name can be accessed using the key 'county':

In [12]:
len(sanborn[0]['counties'])

58

In [13]:
for county in sanborn[0]['counties']:
    print(county['county'])

Henry County
Etowah County
Tallapoosa County
Pickens County
Calhoun County
Limestone County
Lee County
Jefferson County
Escambia County
Jackson County
Pike County
Shelby County
Wilcox County
Mobile County
Chilton County
Barbour County
Houston County
Cullman County
Morgan County
Marengo County
Coffee County
Greene County
Conecuh County
Covington County
Lauderdale County
Baldwin County
Dekalb County
Sumter County
Geneva County
Butler County
Coosa County
Hale County
Madison County
Winston County
Cleburne County
Russell County
Clarke County
Walker County
Chambers County
Crenshaw County
Perry County
Montgomery County
Dale County
Lee And Russell Counties
Autauga County
Franklin County
Randolph County
Dallas County
Colbert County
Saint Clair County
Talladega County
Elmore And Tallapoosa Counties
Tuscaloosa County
Macon County
Bullock County
Bibb County
Elmore County
Marion County


Each county then has a list of cities, and each city in turn has a list of items. Each county also has its associated FIPS code.

In [14]:
alabama = sanborn[0]
henry_county = alabama['counties'][0]
henry_county.keys()

dict_keys(['county', 'cities', 'fips'])

In [17]:
for city in henry_county['cities']:
    print(city['city'])

Abbeville
Headland


In [19]:
abbeville = henry_county['cities'][0]
abbeville.keys()

dict_keys(['city', 'items'])

Each item has the item name, the date of publication, a list of thumbnail URLs for the images, a list of IIIF URLs (not currently used but could be used for creating a gallery on this site), and the item URL in the catalog.

In [20]:
abbeville['items'][0]

{'name': 'Sanborn Fire Insurance Map from Abbeville, Henry County, Alabama.',
 'date': '1907-06',
 'thumbnail_urls': ['https://tile.loc.gov/storage-services/service/gmd/gmd397m/g3974m/g3974am/g3974am_g000011907/00001_1907-0001.gif',
  'https://tile.loc.gov/storage-services/service/gmd/gmd397m/g3974m/g3974am/g3974am_g000011907/00001_1907-0001.gif#h=150&w=126'],
 'iiif_urls': ['https://tile.loc.gov/image-services/iiif/service:gmd:gmd397m:g3974m:g3974am:g3974am_g000011907:00001_1907-0001/full/pct:12.5/0/default.jpg',
  'https://tile.loc.gov/image-services/iiif/service:gmd:gmd397m:g3974m:g3974am:g3974am_g000011907:00001_1907-0001/full/pct:12.5/0/default.jpg'],
 'item_url': 'https://www.loc.gov/item/sanborn00001_001/'}

Overall, the file structure looks like this:
![sanborn file structure](data/sanborn-format.png)
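
As a minimal sketch of walking that nesting, the generator below visits every item along with its state, county, and city. The `sample` list is a made-up stand-in with the same shape as the real file, and `iter_items` is a hypothetical helper name, not something defined in this repository.

```python
# Tiny stand-in with the same shape as sanborn-with-fips.json (values are made up).
sample = [
    {
        "state": "Alabama",
        "counties": [
            {
                "county": "Henry County",
                "fips": "00000",  # placeholder, not a real code
                "cities": [
                    {"city": "Abbeville", "items": [{"name": "a"}, {"name": "b"}]},
                    {"city": "Headland", "items": [{"name": "c"}]},
                ],
            }
        ],
    }
]


def iter_items(data):
    """Yield (state, county, city, item) for every item in the nested structure."""
    for state in data:
        for county in state["counties"]:
            for city in county["cities"]:
                for item in city["items"]:
                    yield state["state"], county["county"], city["city"], item


rows = list(iter_items(sample))
for state, county, city, item in rows:
    print(state, county, city, item["name"])
```

Passing the loaded `sanborn` list instead of `sample` should work the same way, as long as the file keeps this structure.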

## Add New Sanborns

The easiest way to add new Sanborns is to start from a list containing only the items that are not already in the file, largely because that reduces processing time. However, replacing the entire file is also possible and should be a similar process.

You'll need the information for each item as specified above (name, date, URLs). Then, modify and run the scripts below. The modifications will depend on how the new data is currently stored. The scripts currently assume that the dataset is a list of items, where the items are structured as specified above.

Note that this code has not been fully tested. The most important point to remember is to maintain the same data structure as the existing file.

In [22]:
# creates a dictionary mapping from state name to index in sanborn
state_to_index = dict()
for i, state_entry in enumerate(sanborn):
    state_to_index[state_entry['state']] = i

In [32]:
# load your data file, using one element as an example
dataset = [{'name': 'Sanborn Fire Insurance Map from Abbeville, Henry County, Alabama.',
 'date': '1907-06',
 'thumbnail_urls': ['https://tile.loc.gov/storage-services/service/gmd/gmd397m/g3974m/g3974am/g3974am_g000011907/00001_1907-0001.gif',
  'https://tile.loc.gov/storage-services/service/gmd/gmd397m/g3974m/g3974am/g3974am_g000011907/00001_1907-0001.gif#h=150&w=126'],
 'iiif_urls': ['https://tile.loc.gov/image-services/iiif/service:gmd:gmd397m:g3974m:g3974am:g3974am_g000011907:00001_1907-0001/full/pct:12.5/0/default.jpg',
  'https://tile.loc.gov/image-services/iiif/service:gmd:gmd397m:g3974m:g3974am:g3974am_g000011907:00001_1907-0001/full/pct:12.5/0/default.jpg'],
 'item_url': 'https://www.loc.gov/item/sanborn00001_001/'}]

for item in dataset: # where dataset is your data file
    # drop the 32-character prefix 'Sanborn Fire Insurance Map from ', then split the location
    locations = item['name'][32:].split(', ') # where item['name'] accesses the item name
    city = locations[0]
    county = locations[1]
    state = locations[2][:-1] # to remove the period at the end
    
    # go to the state and check if the county exists
    current_state = sanborn[state_to_index[state]]
    for current_county in current_state['counties']:
        if county == current_county['county']: # found the same county
            break
    else: # didn't find the same county so need to add it (its FIPS code is added later)
        current_county = {'county': county, 'cities': []}
        current_state['counties'].append(current_county)
    
    # check if the city exists within the county
    for current_city in current_county['cities']:
        if city == current_city['city']: # found the same city
            break
    else: # didn't find the same city so need to add it
        current_city = {'city': city, 'items': []}
        current_county['cities'].append(current_city)
    
    # only add the item if it isn't a duplicate; this compares item_url because
    # the name alone may be shared by maps of the same city from different dates
    if all(item['item_url'] != current_item['item_url'] for current_item in current_city['items']):
        current_city['items'].append(item)
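
The name parsing in the loop depends on every item name following the pattern "Sanborn Fire Insurance Map from City, County, State." A rough sketch of a check you could run over the new data first is shown here. `parse_name` and `PREFIX` are hypothetical names introduced for illustration, and the pattern is assumed from the single example above.

```python
PREFIX = "Sanborn Fire Insurance Map from "  # 32 characters, matching the [32:] slice


def parse_name(name):
    """Return (city, county, state), or None if the name doesn't fit the expected pattern."""
    if not name.startswith(PREFIX) or not name.endswith("."):
        return None
    parts = name[len(PREFIX):-1].split(", ")
    if len(parts) != 3:
        return None
    return tuple(parts)


good = parse_name("Sanborn Fire Insurance Map from Abbeville, Henry County, Alabama.")
bad = parse_name("Some other title")
print(good)
print(bad)
```

Any item for which this returns None would need to be handled by hand before it goes through the loop.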

You'll then need to run the [connect-fips-sanborn](https://github.com/selenaqian/sanborn-maps-navigator/blob/master/data/sanborn/fips-connection/connect-fips-sanborn.py) script to add the FIPS codes for any new counties. If only a few counties were added, you can also add the FIPS codes manually.
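
For the manual route, a sketch like the following could list the counties that still lack a 'fips' key and fill them in from a hand-written mapping. The sample data, the helper names, and the codes are all placeholders. Check how the existing entries store the code (for example, string versus number) and match that.

```python
# Made-up stand-in with the same shape as the sanborn list.
sample = [
    {"state": "Alabama", "counties": [
        {"county": "Henry County", "fips": "00000", "cities": []},  # placeholder code
        {"county": "Example County", "cities": []},  # newly added, no 'fips' yet
    ]},
]


def counties_missing_fips(data):
    """Return (state, county) pairs for counties without a 'fips' key."""
    return [
        (state["state"], county["county"])
        for state in data
        for county in state["counties"]
        if "fips" not in county
    ]


def apply_manual_fips(data, codes):
    """Fill in 'fips' from a {(state, county): code} mapping where it is missing."""
    for state in data:
        for county in state["counties"]:
            key = (state["state"], county["county"])
            if "fips" not in county and key in codes:
                county["fips"] = codes[key]


missing_before = counties_missing_fips(sample)
# Hypothetical placeholder value; look up the real FIPS code for each county.
apply_manual_fips(sample, {("Alabama", "Example County"): "00000"})
missing_after = counties_missing_fips(sample)
print(missing_before, missing_after)
```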

Then, you'll need to write it back out to the original file.

In [None]:
# the with block closes the file even if the write fails
with open('data/sanborn-with-fips.json', 'w') as f:
    json.dump(sanborn, f)
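
Because the write replaces the original file, it may be worth confirming that the result reloads to the same data. The sketch below does that round trip against a temporary file. `write_and_verify` is a hypothetical helper, and the demo data is made up.

```python
import json
import os
import tempfile


def write_and_verify(data, path):
    """Write data as JSON, then reload it and report whether it matches."""
    with open(path, "w") as f:
        json.dump(data, f)
    with open(path) as f:
        return json.load(f) == data


# Demo against a throwaway file rather than the real data file.
demo_path = os.path.join(tempfile.mkdtemp(), "sanborn-demo.json")
ok = write_and_verify([{"state": "Alabama", "counties": []}], demo_path)
print(ok)
```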

## US Indexed: Updating Counties and States

To update this file, run the [index-counties notebook](https://github.com/selenaqian/sanborn-maps-navigator/blob/master/data/sanborn/fips-connection/index-counties.ipynb) and the [records-counter notebook](https://github.com/selenaqian/sanborn-maps-navigator/blob/master/data/records-counter.ipynb).

If needed, update the color scales in the JavaScript file. The intervals must cover the maximum counts found in the records-counter notebook. The domains and their corresponding labels are defined here:

```
/** Map choropleth colors and legends */
var stateColor = d3.scaleThreshold()
    .domain([0, 500, 1000, 1500, 2000, 2500])
    .range(["#eee", "#D0E5ED", "#71B2CA", "#137FA6", "#0E5F7D", "#0A4053"]);

var stateLegend = d3.legendColor()
    .scale(stateColor)
    .orient("horizontal")
    .shapeWidth(110)
    .labels(["0", "1-500", "501-1000", "1001-1500", "1501-2000", ">2000"]);

var countyColor = d3.scaleThreshold()
    .domain([1, 50, 100, 150])
    .range(["white", "#D0E5ED", "#71B2CA", "#137FA6", "#0E5F7D"]);

var countyLegend = d3.legendColor()
    .scale(countyColor)
    .orient("horizontal")
    .shapeWidth(110)
    .labels(["0", "1-50", "51-100", "100-150", ">150"]);
```
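
To check whether the existing intervals still cover the data, one rough approach is to count items per county and per state directly from the Sanborn structure and compare the maxima against the domains above. This sketch assumes the counts that matter are numbers of items. The records-counter notebook remains the reference for what is actually counted, and the sample data and function names here are made up.

```python
# Made-up stand-in with the same shape as the sanborn list.
sample = [
    {"state": "Alabama", "counties": [
        {"county": "Henry County", "cities": [
            {"city": "Abbeville", "items": [{}, {}]},
            {"city": "Headland", "items": [{}]},
        ]},
        {"county": "Lee County", "cities": [
            {"city": "Auburn", "items": [{}]},
        ]},
    ]},
]


def county_counts(data):
    """Map (state, county) to the number of items in that county."""
    return {
        (state["state"], county["county"]): sum(len(city["items"]) for city in county["cities"])
        for state in data
        for county in state["counties"]
    }


def state_counts(data):
    """Map state name to the number of items in that state."""
    totals = {}
    for (state, _county), count in county_counts(data).items():
        totals[state] = totals.get(state, 0) + count
    return totals


max_county = max(county_counts(sample).values())
max_state = max(state_counts(sample).values())
print("largest county count:", max_county)
print("largest state count:", max_state)
```

If either maximum falls well outside the top interval, that is a sign the domains and labels need to be extended.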

## US Cities

If there are new cities in the dataset, you'll need to add their latitudes and longitudes so that they show up on the map. Run the [geocode-cities notebook](https://github.com/selenaqian/sanborn-maps-navigator/blob/master/data/sanborn/city-coordinates/geocode-cities.ipynb) with modifications to the numbers and file names. Or, if there aren't many cities, you can do this manually, keeping in mind the structure of the us-cities file.

The file is formatted as GeoJSON. You can learn more about that format [here](https://geojson.org). The coordinates live within a FeatureCollection, which holds a list of features. Each feature has a 'type', a 'geometry', an 'id', and 'properties'. The 'type' should always be 'Feature', and 'geometry' is another dictionary with 'type': 'Point' and 'coordinates', listed as longitude then latitude.

In [33]:
cities.keys()

dict_keys(['type', 'features'])

In [34]:
cities['features'][0]

{'type': 'Feature',
 'geometry': {'type': 'Point', 'coordinates': [-85.222965, 31.559402]},
 'id': 'AbbevilleAlabama',
 'properties': {'state': 0, 'county': 0, 'city': 0, 'cityName': 'Abbeville'}}

In properties, the state, county, and city are the zero-based indices of the object within the Sanborn dataset. This allows for connection to that object in the script. For Abbeville, it's in the first state (Alabama), first county within Alabama (Henry County), and is the first city within that county's city list.
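
When adding a city by hand, a feature could be assembled from those indices roughly as follows. `make_feature` is a hypothetical helper, the sample data is a cut-down stand-in, and the 'id' format (city name followed by state name) is an assumption based on the single Abbeville example, so check it against other entries in the file.

```python
# Cut-down stand-in with the same shape as the sanborn list.
sample = [
    {"state": "Alabama", "counties": [
        {"county": "Henry County", "cities": [
            {"city": "Abbeville", "items": []},
            {"city": "Headland", "items": []},
        ]},
    ]},
]


def make_feature(data, state_i, county_i, city_i, lon, lat):
    """Build a GeoJSON point feature for the city at the given indices."""
    state = data[state_i]
    city = state["counties"][county_i]["cities"][city_i]
    return {
        "type": "Feature",
        # GeoJSON lists coordinates as [longitude, latitude]
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        # Assumed id format: city name + state name, as in 'AbbevilleAlabama'
        "id": city["city"] + state["state"],
        "properties": {
            "state": state_i,
            "county": county_i,
            "city": city_i,
            "cityName": city["city"],
        },
    }


feature = make_feature(sample, 0, 0, 0, -85.222965, 31.559402)
print(feature)
```

The resulting dictionary would then be appended to `cities['features']` before writing the us-cities file back out.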

Please feel free to reach out to me if you have any questions or issues with this!