# Automate OpenIndexMaps GeoJSON Creation
## *County Bounding Box*

## Part 1: Introduction

This notebook is the Jupyter Notebook version of the ***indexmap*** script. It converts **county** bounding boxes stored in a CSV file to <a href="https://openindexmaps.org/">OpenIndexMaps GeoJSON</a>.

## Part 2: Preparation

We will use **Jupyter Notebook (Anaconda 3)** to edit and run the script. Anaconda installation instructions can be found <a href='https://docs.anaconda.com/anaconda/install/'>here</a>. Note that this script runs on Python 3.

To run this script you need:
- **code**.csv formatted in the GBL Metadata Template
- **state**.geojson containing the county boundaries of the state the metadata records belong to
- the following directory layout
    - **data** folder > **code** folder > **code**.csv
    - **county** folder > **state**.geojson
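
Before running the cells it can help to confirm that both input files are where the layout above expects them. The following is a minimal sketch, not part of the original script; the helper names `expected_inputs` and `missing_inputs` and the `root` parameter are assumptions made for illustration.

```python
import os

def expected_inputs(code, state, root='.'):
    """Return the two input paths the notebook reads, relative to root."""
    return [
        os.path.join(root, 'data', code, code + '.csv'),
        os.path.join(root, 'county', state + '.geojson'),
    ]

def missing_inputs(code, state, root='.'):
    """Return the expected input paths that do not exist on disk."""
    return [p for p in expected_inputs(code, state, root) if not os.path.isfile(p)]

# Report anything missing for the example values used in this notebook.
for path in missing_inputs('03d-01', 'Iowa'):
    print('> Missing input:', path)
```

An empty result means both files were found.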

The script currently writes one GeoJSON file:
- **code**.geojson

>Original created on Dec 27 2020<br>
@author: Yijing Zhou @YijingZhou33

## Part 3: Get Started

###  Step 1: Import modules

In [1]:
import os
import pandas as pd
import json
import folium
import numpy as np
import geopandas as gpd

### Step 2: Manual items to change

In [2]:
##### Manually changed items #####
## code and title of the metadata 
code = '03d-01'
title = 'Iowa County Atlases - 03d-01'
## state where metadata records belong
state = 'Iowa'

## Part 4: Generate OpenIndexMap Schema

### Step 1: Convert GeoBlacklight metadata CSV file to dataframe

In [3]:
## list of metadata fields from the GBL metadata template for open data portals desired in the final OpenIndexMap geojson.
collist = ['Title', 'Bounding Box', 'Identifier']

## convert the whole csv file to dataframe
df = pd.read_csv(os.path.join('data', code, code+'.csv'))

## check if the metadata contains 'Image' column, if so then add it to the list
## also more properties can be added here!
if 'Image' in df.columns:
    collist.append('Image')

## only extract fields required for OpenIndexMap geojson properties
df = df[collist]

df.head()

|   | Title | Bounding Box | Identifier |
|---|---|---|---|
| 0 | Diagram of the Lead-Bearing Crevices near Dubu... | -91.7206, 40.3745, -91.1122, 40.8135 | c2739cdf-fae5-4cc9-aff4-b5ad866043e4 |
| 1 | Map of Washington County, Iowa, 1859, 1859 | -91.9472, 41.1617, -91.4839, 41.5116 | 896c49c1-578a-453c-864d-88f62321fa8b |
| 2 | Combination Atlas Map of Henry County, Iowa, 1870 | -91.7192, 40.8123, -91.3713, 41.163 | b7939794-3134-4416-9579-fa3f5bac800d |
| 3 | Combination atlas map of Johnson County, Iowa,... | -91.8337, 41.4222, -91.3671, 41.862 | f9dd09a4-7ca5-47fb-9c46-cb3fd3ee4d8e |
| 4 | Atlas of Marshall County Iowa, 1871 | -93.234, 41.8618, -92.7662, 42.2103 | 49a97edd-0988-43b5-8232-ee23a1d35906 |
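
The `Bounding Box` column holds four comma-separated numbers in the order minX, minY, maxX, maxY. A malformed value would only surface later as a conversion error, so a small check can be useful. This is a sketch in plain Python, not part of the original script; `parse_bbox` is a hypothetical helper, and it assumes boxes never cross the antimeridian (true for Iowa, not in general).

```python
def parse_bbox(text):
    """Parse 'minX, minY, maxX, maxY' into a tuple of four floats."""
    parts = [p.strip() for p in text.split(',')]
    if len(parts) != 4:
        raise ValueError(f'expected 4 values, got {len(parts)}: {text!r}')
    # float() raises ValueError on its own for non-numeric parts.
    min_x, min_y, max_x, max_y = (float(p) for p in parts)
    # Assumption: no box crosses the antimeridian, so min must not exceed max.
    if min_x > max_x or min_y > max_y:
        raise ValueError(f'min exceeds max in: {text!r}')
    return min_x, min_y, max_x, max_y

print(parse_bbox('-91.7206, 40.3745, -91.1122, 40.8135'))
```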


In [4]:
## list all records that share a title with another record
## if any are listed, go back to the csv file, delete the redundant ones and save it
## then go to Kernel > Restart & Run All
df[df.duplicated(['Title'], keep=False)].sort_values('Title')

|   | Title | Bounding Box | Identifier |
|---|---|---|---|


### Step 2: Build up OpenIndexMap schema

In [5]:
## create regular bounding box coordinate pairs and round them to 2 decimal places
df = pd.concat([df, df['Bounding Box'].str.split(',', expand=True).astype(float).round(2)], axis=1).rename(
    columns={0:'minX', 1:'minY', 2:'maxX', 3:'maxY'})
df['maxXmaxY'] = df.apply(lambda row: [row.maxX, row.maxY], axis = 1)
df['maxXminY'] = df.apply(lambda row: [row.maxX, row.minY], axis = 1)
df['minXminY'] = df.apply(lambda row: [row.minX, row.minY], axis = 1)
df['minXmaxY'] = df.apply(lambda row: [row.minX, row.maxY], axis = 1)
df['coords'] = df[['maxXmaxY', 'maxXminY', 'minXminY', 'minXmaxY', 'maxXmaxY']].values.tolist()

## concatenate landing page links
df['websiteURL'] = 'https://geo.btaa.org/catalog/' + df['Identifier']

## clean up unnecessary columns
df_clean = df.drop(columns =['minX', 'minY', 'maxX', 'maxY', 'maxXmaxY', 'maxXminY', 'minXminY', 'minXmaxY', 'Bounding Box'])

df_clean.head()

|   | Title | Identifier | coords | websiteURL |
|---|---|---|---|---|
| 0 | Diagram of the Lead-Bearing Crevices near Dubu... | c2739cdf-fae5-4cc9-aff4-b5ad866043e4 | [[-91.11, 40.81], [-91.11, 40.37], [-91.72, 40... | https://geo.btaa.org/catalog/c2739cdf-fae5-4cc... |
| 1 | Map of Washington County, Iowa, 1859, 1859 | 896c49c1-578a-453c-864d-88f62321fa8b | [[-91.48, 41.51], [-91.48, 41.16], [-91.95, 41... | https://geo.btaa.org/catalog/896c49c1-578a-453... |
| 2 | Combination Atlas Map of Henry County, Iowa, 1870 | b7939794-3134-4416-9579-fa3f5bac800d | [[-91.37, 41.16], [-91.37, 40.81], [-91.72, 40... | https://geo.btaa.org/catalog/b7939794-3134-441... |
| 3 | Combination atlas map of Johnson County, Iowa,... | f9dd09a4-7ca5-47fb-9c46-cb3fd3ee4d8e | [[-91.37, 41.86], [-91.37, 41.42], [-91.83, 41... | https://geo.btaa.org/catalog/f9dd09a4-7ca5-47f... |
| 4 | Atlas of Marshall County Iowa, 1871 | 49a97edd-0988-43b5-8232-ee23a1d35906 | [[-92.77, 42.21], [-92.77, 41.86], [-93.23, 41... | https://geo.btaa.org/catalog/49a97edd-0988-43b... |
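
The `coords` column above is a closed ring: four corners followed by a repeat of the first. The same construction can be sketched for a single bounding box in plain Python. `bbox_to_ring` is a hypothetical helper written for illustration; it follows the corner order and the 2-decimal rounding used in the cell above.

```python
def bbox_to_ring(min_x, min_y, max_x, max_y, ndigits=2):
    """Return the closed five-point ring for a bounding box."""
    min_x, min_y, max_x, max_y = (
        round(v, ndigits) for v in (min_x, min_y, max_x, max_y)
    )
    ring = [
        [max_x, max_y],  # upper right
        [max_x, min_y],  # lower right
        [min_x, min_y],  # lower left
        [min_x, max_y],  # upper left
    ]
    # Repeat the first corner so the ring is closed.
    ring.append(list(ring[0]))
    return ring

print(bbox_to_ring(-91.7206, 40.3745, -91.1122, 40.8135))
```

A GeoJSON Polygon's `coordinates` member is a list of rings, so a ring like this would be wrapped in one more list before being used as a Polygon geometry.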


## Part 4: Replace regular bounding box with county bounding box

### Step 1: Load the state GeoJSON into a dataframe

In [6]:
## read the state GeoJSON file into a GeoDataFrame
county_geojson = gpd.read_file(os.path.join('county', state+'.geojson'))

## convert the GeoDataFrame to a GeoJSON-style Python dict
county_json = json.loads(county_geojson.to_json())

## flatten the features into a dataframe with county name and coordinates
df_allCounty = pd.json_normalize(county_json['features'])

df_allCounty.head()

|   | id | type | properties.county | geometry.type | geometry.coordinates |
|---|---|---|---|---|---|
| 0 | 0 | Feature | Johnson | Polygon | [[[-91.48, 41.86, 0.0], [-91.48, 41.86, 0.0], ... |
| 1 | 1 | Feature | Audubon | Polygon | [[[-95.09, 41.85, 0.0], [-95.09, 41.86, 0.0], ... |
| 2 | 2 | Feature | Harrison | Polygon | [[[-96.11, 41.84, 0.0], [-96.11, 41.84, 0.0], ... |
| 3 | 3 | Feature | Sioux | Polygon | [[[-96.55, 43.26, 0.0], [-96.53, 43.26, 0.0], ... |
| 4 | 4 | Feature | Lyon | Polygon | [[[-96.59, 43.49, 0.0], [-96.59, 43.49, 0.0], ... |


### Step 2: Join OpenIndexMap and State dataframe based on county name

In [7]:
## join all county names into one regex pattern separated by |
## each name is included in two forms so that neither is missed
## e.g. Sioux County | Sioux
pat1 = '|'.join(df_allCounty['properties.county'] + ' County')
pat2 = '|'.join(df_allCounty['properties.county'])
pat = pat1 + '|' + pat2

## if title in OpenIndexMap dataframe contains county in State dataframe, 
## add county column to OpenIndexMap dataframe
df_clean.insert(0, 'county', df_clean['Title'].str.extract('(' + pat + ')', expand=False))

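## remove the ' County' suffix and any surrounding whitespace
## so that the names match the State dataframe exactly in the merge
df_clean['county'] = df_clean['county'].str.replace(' County', '', regex=False).str.strip()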

df_clean.head()

|   | county | Title | Identifier | coords | websiteURL |
|---|---|---|---|---|---|
| 0 | Dubuque | Diagram of the Lead-Bearing Crevices near Dubu... | c2739cdf-fae5-4cc9-aff4-b5ad866043e4 | [[-91.11, 40.81], [-91.11, 40.37], [-91.72, 40... | https://geo.btaa.org/catalog/c2739cdf-fae5-4cc... |
| 1 | Washington | Map of Washington County, Iowa, 1859, 1859 | 896c49c1-578a-453c-864d-88f62321fa8b | [[-91.48, 41.51], [-91.48, 41.16], [-91.95, 41... | https://geo.btaa.org/catalog/896c49c1-578a-453... |
| 2 | Henry | Combination Atlas Map of Henry County, Iowa, 1870 | b7939794-3134-4416-9579-fa3f5bac800d | [[-91.37, 41.16], [-91.37, 40.81], [-91.72, 40... | https://geo.btaa.org/catalog/b7939794-3134-441... |
| 3 | Johnson | Combination atlas map of Johnson County, Iowa,... | f9dd09a4-7ca5-47fb-9c46-cb3fd3ee4d8e | [[-91.37, 41.86], [-91.37, 41.42], [-91.83, 41... | https://geo.btaa.org/catalog/f9dd09a4-7ca5-47f... |
| 4 | Marshall | Atlas of Marshall County Iowa, 1871 | 49a97edd-0988-43b5-8232-ee23a1d35906 | [[-92.77, 42.21], [-92.77, 41.86], [-93.23, 41... | https://geo.btaa.org/catalog/49a97edd-0988-43b... |
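
The county lookup relies on a regular expression built from the county names. One pitfall with a plain alternation is that a short name can match inside a longer one (for example `Clay` inside `Clayton`). The sketch below shows one way to guard against that with escaped names and word boundaries, using only the standard `re` module. It is an illustration under those assumptions, not the original script's method, and the helper names are hypothetical.

```python
import re

def build_county_pattern(counties):
    """Compile a regex that matches any county name as a whole word."""
    # re.escape protects names that contain regex metacharacters.
    alternation = '|'.join(re.escape(name) for name in counties)
    # \b on both sides keeps 'Clay' from matching inside 'Clayton'.
    return re.compile(r'\b(' + alternation + r')\b')

def extract_county(title, pattern):
    """Return the first county name found in title, or None."""
    match = pattern.search(title)
    return match.group(1) if match else None

pattern = build_county_pattern(['Clay', 'Clayton', 'Washington'])
print(extract_county('Map of Washington County, Iowa, 1859, 1859', pattern))
```

Because the captured group holds only the bare name, no separate step is needed to remove a trailing `County`.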


In [8]:
## check whether any record has no county name in its title
## if so, go back to the csv file and manually add the county name to the title
## then go to Kernel > Restart & Run All
nan_rows = df_clean[df_clean['county'].isnull()]
if nan_rows.empty:
    print('> No NULL rows')
else:
    print(nan_rows[['Title', 'Identifier']])

> No NULL rows


### Step 3: Merge two dataframes using county name

In [9]:
## merge two dataframes with key attribute 'county'
df_merge = pd.merge(df_clean, df_allCounty, left_on= 'county', right_on='properties.county').drop(
            columns =['county', 'coords', 'id', 'type', 'properties.county', 'geometry.type'])

df_merge.head()

|   | Title | Identifier | websiteURL | geometry.coordinates |
|---|---|---|---|---|
| 0 | Diagram of the Lead-Bearing Crevices near Dubu... | c2739cdf-fae5-4cc9-aff4-b5ad866043e4 | https://geo.btaa.org/catalog/c2739cdf-fae5-4cc... | [[[-90.9, 42.65, 0.0], [-90.9, 42.66, 0.0], [-... |
| 1 | Plat book of Dubuque County, Iowa, 1892 | b8609c7c-8e90-4d2a-8441-7f4075911f1e | https://geo.btaa.org/catalog/b8609c7c-8e90-4d2... | [[[-90.9, 42.65, 0.0], [-90.9, 42.66, 0.0], [-... |
| 2 | Map of Washington County, Iowa, 1859, 1859 | 896c49c1-578a-453c-864d-88f62321fa8b | https://geo.btaa.org/catalog/896c49c1-578a-453... | [[[-91.54, 41.51, 0.0], [-91.54, 41.51, 0.0], ... |
| 3 | Atlas of Washington County, Iowa, 1874 | 2572b511-63e3-439b-b90c-034ee2d93732 | https://geo.btaa.org/catalog/2572b511-63e3-439... | [[[-91.54, 41.51, 0.0], [-91.54, 41.51, 0.0], ... |
| 4 | Atlas of Washington County, Iowa, 1906 | a0a71865-6c2c-49e4-b040-3ec2ae48722b | https://geo.btaa.org/catalog/a0a71865-6c2c-49e... | [[[-91.54, 41.51, 0.0], [-91.54, 41.51, 0.0], ... |
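
The merge above uses `pd.merge` with its default inner join, so a record whose county name has no exact match in the State dataframe is dropped without any message. A quick comparison of the two name lists before merging can catch this. The following is a sketch in plain Python; `unmatched_counties` is a hypothetical helper and the sample names (including the misspelled `Jonson`) are made up for illustration.

```python
def unmatched_counties(record_counties, known_counties):
    """Return record county names that have no exact match among known_counties."""
    # Strip whitespace on both sides so 'Washington ' still matches 'Washington'.
    known = {name.strip() for name in known_counties}
    found = {name.strip() for name in record_counties}
    return sorted(found - known)

print(unmatched_counties(
    ['Dubuque', 'Washington ', 'Jonson'],
    ['Dubuque', 'Washington', 'Johnson'],
))
```

With the notebook's dataframes, the two arguments would be `df_clean['county'].dropna()` and `df_allCounty['properties.county']`.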


## Part 5: Create OpenIndexMap GeoJSON

### Step 1: Create geojson features

In [11]:
## build a GeoJSON FeatureCollection with one feature per metadata record
def create_geojson_features(df):
    print('> Creating GeoJSON features...')
    features = []
    geojson = {
        'type': 'FeatureCollection',
        'title': title,
        'features': features
    }
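    for _, row in df.iterrows():
        ## a Polygon nests its numbers three levels deep, a MultiPolygon four levels deep
        ## accept int as well as float so whole-number coordinates are not misclassified
        if isinstance(row['geometry.coordinates'][0][0][0], (int, float)):
            geometry_type = 'Polygon'
        else:
            geometry_type = 'MultiPolygon'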
        feature = {
            'type': 'Feature',
            'id': row['Identifier'],
            'geometry': {
                'type':geometry_type, 
                'coordinates':row['geometry.coordinates']
            },
            'properties': {
                'label': row['Title'],
                'title': row['Title'],
                'recordIdentifier': row['Identifier'],
                'websiteUrl': row['websiteURL']
            }
        }
        ### add more properties here if applicable
        if 'Image' in df.columns:
            feature['properties']['thumbnailUrl'] = row['Image']

        features.append(feature)
    return geojson

data_geojson = create_geojson_features(df_merge)

> Creating GeoJSON features...

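The function above tells a Polygon from a MultiPolygon by looking at how deeply the coordinate numbers are nested. That rule can be written out explicitly, which also makes unexpected input fail loudly. This is a minimal sketch in plain Python; `nesting_depth` and `geometry_type` are hypothetical helpers, and only these two geometry types are handled.

```python
def nesting_depth(coords):
    """Count how many lists wrap the first number in a coordinates array."""
    depth = 0
    while isinstance(coords, (list, tuple)):
        if not coords:
            raise ValueError('empty coordinates array')
        coords = coords[0]
        depth += 1
    return depth

def geometry_type(coords):
    """Map nesting depth to 'Polygon' (3 levels) or 'MultiPolygon' (4 levels)."""
    depth = nesting_depth(coords)
    if depth == 3:
        return 'Polygon'
    if depth == 4:
        return 'MultiPolygon'
    raise ValueError(f'unexpected nesting depth {depth}')

# One triangular ring with x, y, z positions, as in the State dataframe.
triangle = [[[-91.0, 41.0, 0.0], [-90.0, 41.0, 0.0], [-90.0, 42.0, 0.0], [-91.0, 41.0, 0.0]]]
print(geometry_type(triangle), geometry_type([triangle]))
```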

### Step 2: Generate geojson file

In [12]:
with open(os.path.join('data', code, code+'.geojson'), 'w') as txtfile:
    json.dump(data_geojson, txtfile)
print('> Creating GeoJSON file...')

> Creating GeoJSON file...

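Once the file is written, it can be read back and checked for the members this notebook is expected to produce. The sketch below uses only the standard library. `check_index_map` is a hypothetical helper, and `REQUIRED_PROPERTIES` lists only the properties written by the function above; it is not a statement of what the OpenIndexMaps specification requires.

```python
import json

# Assumption: these are the properties this notebook writes for every feature.
REQUIRED_PROPERTIES = ('label', 'title', 'recordIdentifier', 'websiteUrl')

def check_index_map(path):
    """Return a list of problems found in the GeoJSON file at path (empty if none)."""
    with open(path, 'r') as f:
        data = json.load(f)
    problems = []
    if data.get('type') != 'FeatureCollection':
        problems.append("top-level 'type' is not 'FeatureCollection'")
    for i, feature in enumerate(data.get('features', [])):
        props = feature.get('properties', {})
        for key in REQUIRED_PROPERTIES:
            if not props.get(key):
                problems.append(f'feature {i}: missing property {key!r}')
        if feature.get('geometry', {}).get('type') not in ('Polygon', 'MultiPolygon'):
            problems.append(f'feature {i}: unexpected geometry type')
    return problems
```

In the notebook this could be called as `check_index_map(os.path.join('data', code, code+'.geojson'))`.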

## Part 6: Draw the index maps

In [13]:
print('> Making map...')
## change the location here ([latitude, longitude]) to center the map on your state
m = folium.Map(location = [42.3756, -93.6397], control_scale = True, zoom_start = 7)

## check if the indexmap geojson file can be rendered properly
with open(os.path.join('data', code, code+'.geojson'), 'r') as geojson_file:
    geojson_text = geojson_file.read()
folium.GeoJson(geojson_text,
               tooltip = folium.GeoJsonTooltip(fields=('title', 'websiteUrl'),
                                               aliases=('title','websiteUrl')),
               show = True).add_to(m)
m

> Making map...

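The map center above is typed in by hand. As an alternative, it could be derived from the features themselves by taking the middle of their overall bounding box. The following sketch does this in plain Python; `iter_positions` and `map_center` are hypothetical helpers, and the result is returned as `[latitude, longitude]`, the order `folium.Map` expects for `location`.

```python
def iter_positions(coords):
    """Yield every [x, y, ...] position from arbitrarily nested coordinates."""
    if coords and isinstance(coords[0], (int, float)):
        yield coords
    else:
        for part in coords:
            yield from iter_positions(part)

def map_center(features):
    """Return [lat, lon] at the middle of the bounding box of all features."""
    xs, ys = [], []
    for feature in features:
        for position in iter_positions(feature['geometry']['coordinates']):
            xs.append(position[0])  # longitude
            ys.append(position[1])  # latitude
    if not xs:
        raise ValueError('no coordinates found')
    return [(min(ys) + max(ys)) / 2, (min(xs) + max(xs)) / 2]

# A made-up feature for illustration.
sample = [{'geometry': {'coordinates': [[[-92.0, 41.0, 0.0], [-90.0, 41.0, 0.0],
                                         [-90.0, 43.0, 0.0], [-92.0, 41.0, 0.0]]]}}]
print(map_center(sample))
```

In the notebook, `map_center(data_geojson['features'])` could be passed as the `location` argument.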