# CSV to GeoBlacklight JSON

### This script takes an input CSV of metadata and converts each row to a GeoBlacklight JSON record

#### Look in the repo for an example input CSV, and let's get started
Import necessary modules

In [1]:
import csv
import json
import os
from datetime import datetime

This is a dictionary to translate single-value Dublin Core/GBL fields into GBLJson

In [2]:
single_dict = {
    "Dublin Core:Identifier":["layer_slug_s","dc_identifier_s"],
    "Dublin Core:Provenance":["dct_provenance_s"],
    "Dublin Core:Title":["dc_title_s"],
    "Dublin Core:Date":["solr_year_i"],
    "GeoBlacklight:Geometry Type":["layer_geom_type_s"],
    "Dublin Core:Date Issued":["dct_issued_s"]
    }

And this is a dictionary to translate multivalue Dublin Core/GBL fields into GBLJson

In [3]:
multiple_dict = {
    "Dublin Core:Spatial Coverage":["dct_spatial_sm"],
    "Dublin Core:Temporal Coverage":["dct_temporal_sm"],
    "Dublin Core:Is Part Of":["dct_isPartOf_sm"]
    }
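
As a rough illustration of how the two dictionaries are meant to be used, the sketch below applies trimmed-down copies of them to a single made-up row. The names `sample_row`, `single_fields`, `multi_fields`, and `record` are hypothetical and exist only for this example; the pipe separator matches the one used in the conversion loop later in this notebook.

```python
# Hypothetical one-row example; the values are invented for illustration.
sample_row = {
    "Dublin Core:Identifier": "abc-123",
    "Dublin Core:Spatial Coverage": "Minnesota|Wisconsin",
}

# Trimmed-down copies of the two lookup dictionaries defined above.
single_fields = {"Dublin Core:Identifier": ["layer_slug_s", "dc_identifier_s"]}
multi_fields = {"Dublin Core:Spatial Coverage": ["dct_spatial_sm"]}

record = {}
for column, value in sample_row.items():
    # Single-value columns are copied unchanged into every mapped field.
    for fieldname in single_fields.get(column, []):
        record[fieldname] = value
    # Multivalue columns are split on the pipe character into a list.
    for fieldname in multi_fields.get(column, []):
        record[fieldname] = value.split("|")

print(record)
```

One CSV column can feed several output fields, which is why each dictionary value is a list of field names.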

This statement will create a folder to store the jsons if one does not already exist

In [4]:
if not os.path.exists("json"):
    os.mkdir("json")
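
A possible alternative, shown here only as a sketch, is `os.makedirs` with `exist_ok=True`, which skips the explicit existence check and also creates nested folders in one call. The temporary directory and the `example_code` folder name are assumptions made so the example does not touch your working directory.

```python
import os
import tempfile

# A temporary directory stands in for the working directory in this sketch.
base = tempfile.mkdtemp()
# "example_code" is a made-up collection code.
target = os.path.join(base, "json", "example_code")

# exist_ok=True makes the call a no-op when the folder already exists,
# and the intermediate "json" folder is created as needed.
os.makedirs(target, exist_ok=True)
os.makedirs(target, exist_ok=True)  # calling it again does not raise

print(os.path.isdir(target))
```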

Open the CSV with the GBL data. Change the string inside the open statement to match your file name

In [5]:
csvfile = open('ArcGIS_Reaccession_20190607 - actualNew.csv', 'r', newline='') #newline='' is what the csv module recommends for file objects

Creates a reader that returns each row of the CSV as a dictionary keyed by the column headers, and sets the date modified to today

In [None]:
reader = csv.DictReader(csvfile)
date_modified = datetime.today().strftime('%Y-%m-%d')
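
To see what the reader produces, here is a small sketch that runs `csv.DictReader` over an in-memory CSV instead of a file. The column names follow the dictionaries above, but the row values and the names `sample_csv` and `rows` are invented for this example.

```python
import csv
import io

# In-memory stand-in for the CSV file; the values are made up.
sample_csv = io.StringIO(
    "Code,Dublin Core:Identifier,Dublin Core:Title\n"
    "01a,abc-123,Example Title\n"
)

# Each row comes back as a dictionary keyed by the header line.
rows = list(csv.DictReader(sample_csv))
print(rows[0]["Dublin Core:Title"])
```

Note that a reader can only be iterated once. If you rerun the conversion cell, reopen the file and recreate the reader first.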

Now this is where the work happens. 
* Each row within the reader is a dictionary containing one line of the CSV. <br>
* A starting dictionary is created that has some pre-populated default values. These can change as needed; feel free to modify them. <br>
* Each row is examined for an identifying code. This code separates the records into collections. A folder for each code is created in the json folder so that the jsons can be sorted into their respective collection. <br>
* The script then goes through the single and multiple dictionaries that were defined above and writes them into the starting dictionary. <br>
* Next, the script looks for the coverage field (`Dublin Core:Coverage`) and splits the WSEN values into their own variables. A centroid is calculated, and the geometry and centroid fields are populated accordingly. If the coverage field doesn't have all four values, then the geometry and centroid fields are written as the string "NULL". <br>
* Finally, the unique identifier is pulled out, the output filename is named according to that unique identifier, and the output json file is written. This happens for every row in the CSV, so each record will be written to its own JSON file.
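
The bounding box step can be sketched on its own as a small helper. This is an illustrative rewrite, not the code the notebook runs: the function name `bbox_to_fields` is hypothetical, and the fallback to "NULL" for non-numeric values is an added assumption that the loop below does not include.

```python
def bbox_to_fields(coverage):
    """Return (solr_geom, centroid) strings for a 'W,S,E,N' coverage string."""
    parts = [p.strip() for p in coverage.split(",")]
    if len(parts) != 4:
        return "NULL", "NULL"
    try:
        west, south, east, north = (float(p) for p in parts)
    except ValueError:
        # Assumption for this sketch: treat non-numeric values like a missing box.
        return "NULL", "NULL"
    center_lat = (north + south) / 2
    center_long = (east + west) / 2
    # ENVELOPE takes its arguments in the order west, east, north, south.
    geom = "ENVELOPE({},{},{},{})".format(parts[0], parts[2], parts[3], parts[1])
    centroid = "{},{}".format(center_lat, center_long)
    return geom, centroid


print(bbox_to_fields("-97.5,43.0,-89.5,49.5"))
print(bbox_to_fields("-97.5,43.0"))
```

The centroid is a plain average of the corners, so a box that crosses the antimeridian would need separate handling.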

In [7]:
for row in reader: #each row is a dictionary
    code = ""
    small_dict = {"geoblacklight_version":"1.0","dc_rights_s":"Public","layer_modified_dt":date_modified} #starting dictionary with set values
    for key,val in row.items():
        if key == "Code":
            code = val
            if not os.path.exists("json/" + val): #makes a new folder for each code
                os.mkdir("json/" + val)
        if key in single_dict:
            for fieldname in single_dict[key]:
                small_dict[fieldname] = val
        if key in multiple_dict:
            for fieldname in multiple_dict[key]:
                small_dict[fieldname] = val.split('|') #creates a list with the multiple values
        if key == "Dublin Core:Coverage":
            coords = [c.strip() for c in val.split(',')] #strips stray spaces so "W, S, E, N" also works
            if len(coords) == 4: #takes care of bounding box values and calculates centroid
                west = coords[0]
                south = coords[1]
                east = coords[2]
                north = coords[3]
                centerlat = (float(north)+float(south))/2
                centerlong = (float(east)+float(west))/2
                small_dict["solr_geom"] = "ENVELOPE("+west+","+east+","+north+","+south+")" #ENVELOPE order is W,E,N,S
                small_dict["b1g_centroid_ss"] = str(centerlat) + "," + str(centerlong)
            else: #if the bounding box doesn't have all coordinates, just write values as null
                small_dict["solr_geom"] = "NULL"
                small_dict["b1g_centroid_ss"] = "NULL"
    iden = row['Dublin Core:Identifier']
    filename = iden + ".json"
    with open("json/"+code+"/"+filename, 'w') as jsonfile: #writes to a json with the identifier as the filename
        json.dump(small_dict,jsonfile,indent=2)
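
A quick way to sanity-check the output is to load a written file back and compare it with the dictionary that produced it. The sketch below does this for one made-up record in a temporary folder; `record`, `out_dir`, and `path` are hypothetical names, not variables from the loop above.

```python
import json
import os
import tempfile

# Made-up record standing in for one small_dict from the loop above.
record = {"geoblacklight_version": "1.0", "dc_identifier_s": "abc-123"}

# Write it the same way the loop does, but into a temporary folder.
out_dir = tempfile.mkdtemp()
path = os.path.join(out_dir, record["dc_identifier_s"] + ".json")
with open(path, "w") as jsonfile:
    json.dump(record, jsonfile, indent=2)

# Read it back and confirm nothing was lost in the round trip.
with open(path) as jsonfile:
    loaded = json.load(jsonfile)

print(loaded == record)
```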