# CSV to GeoBlacklight JSON

### This script takes an input CSV of metadata and converts each row to a GeoBlacklight JSON file

Import necessary modules

In [1]:
import csv
import json
import os
from datetime import datetime

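This dictionary maps single-value Dublin Core/GBL columns in the CSV to GeoBlacklight JSON field names. Each key is a CSV column header, and its list holds the JSON field(s) that receive the value.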

In [2]:
single_dict = {
    "b1g_code_s":["b1g_code_s"],
    "dc_identifier_s":["dc_identifier_s"],
    "layer_slug_s":["layer_slug_s"],
    "dc_rights_s":["dc_rights_s"],
    "suppressed_b":["suppressed_b"],
    "dc_title_s":["dc_title_s"],
    "dc_description_s":["dc_description_s"],
    "dct_issued_s":["dct_issued_s"],
    "solr_year_i":["solr_year_i"],
    "dct_provenance_s":["dct_provenance_s"],
    "dc_type_s":["dc_type_s"],
    "dc_format_s":["dc_format_s"],
    "layer_geom_type_s":["layer_geom_type_s"],
    "solr_geom":["solr_geom"],
    "layer_id_s":["layer_id_s"],
    "dct_references_s":["dct_references_s"]
    }

This dictionary maps multivalued Dublin Core/GBL columns in the CSV to GeoBlacklight JSON field names. Multiple values in one CSV cell are separated by a pipe ('|').

In [3]:
multiple_dict = {
    "dct_isPartOf_sm":["dct_isPartOf_sm"],
    "dc_subject_sm":["dc_subject_sm"],
    "dct_temporal_sm":["dct_temporal_sm"],
    "dct_spatial_sm":["dct_spatial_sm"],
    "dc_publisher_sm":["dc_publisher_sm"],
    "dc_creator_sm":["dc_creator_sm"],
    "dc_language_sm":["dc_language_sm"],
    }
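
To illustrate how the two dictionaries are meant to be used, the sketch below applies trimmed-down copies of them to one made-up row. The helper name apply_mappings, the example dictionaries, and the sample values are assumptions for illustration and are not part of the script.

```python
# Trimmed-down copies of the mapping dictionaries (illustration only)
single_example = {"dc_title_s": ["dc_title_s"]}
multiple_example = {"dc_subject_sm": ["dc_subject_sm"]}

def apply_mappings(row, single, multiple):
    """Hypothetical helper: build a record from one CSV row."""
    record = {}
    for key, val in row.items():
        # Single-valued fields are copied as-is
        for fieldname in single.get(key, []):
            record[fieldname] = val
        # Multivalued fields are split on the pipe character
        for fieldname in multiple.get(key, []):
            record[fieldname] = val.split("|")
    return record

# Made-up row, shaped like one row of the input CSV
sample_row = {"dc_title_s": "Example Roads", "dc_subject_sm": "Transportation|Roads"}
record = apply_mappings(sample_row, single_example, multiple_example)
print(record)
```

The single-valued title stays a string, while the pipe-separated subjects become a list.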

This statement creates a folder named json to store the JSON files if one does not already exist.

In [4]:
if not os.path.exists("json"):
    os.mkdir("json")

Open the CSV with the GBL data. Change the string inside the open statement to match your file name.

In [5]:
# newline='' lets the csv module handle line endings inside quoted fields
csvfile = open('b.csv', 'r', newline='')

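Read the CSV so that each row becomes a dictionary keyed by column header, and set the date modified to the current time in UTC.

In [6]:
from datetime import timezone

reader = csv.DictReader(csvfile)
# The trailing "Z" means UTC, so take the time in UTC and use a fixed,
# locale-independent format
date_modified = datetime.now(timezone.utc).strftime('%Y-%m-%dT%H:%M:%SZ')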
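
As a small illustration of what csv.DictReader yields, the sketch below reads a made-up two-column CSV from memory instead of a file. The column values are invented for the example.

```python
import csv
import io

# Made-up CSV content standing in for the real file
sample_csv = io.StringIO(
    "dc_identifier_s,dc_subject_sm\n"
    "abc-123,Transportation|Roads\n"
)

# Each row comes back as a dictionary keyed by the header row
rows = list(csv.DictReader(sample_csv))
first = rows[0]
print(first["dc_identifier_s"])
print(first["dc_subject_sm"].split("|"))
```

The header row is consumed as the keys, so only the data rows are returned.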

### The script creates a Python dictionary and adds values from the CSV.

1. A dictionary is created with default values that are the same for all records.

2. Each row is examined for an identifying code (b1g_code_s). This code separates the records into collections. A folder for each code is created inside the json folder so that the JSON files can be sorted into their respective collections.

3. The script then goes through the single and multiple dictionaries defined above and writes the matching values from the row into the starting dictionary.

4. Finally, the unique identifier (dc_identifier_s) is pulled out, the output file is named after it, and the JSON file is written. This happens for every row in the CSV, so each record is written to its own JSON file.

In [7]:
for row in reader:
    # Starting dictionary with default values shared by every record
    small_dict = {"geoblacklight_version": "1.0", "layer_modified_dt": date_modified}

    # Collection code; records are sorted into one folder per code
    code = row.get("b1g_code_s") or ""
    os.makedirs(os.path.join("json", code), exist_ok=True)

    for key, val in row.items():
        # Copy single-valued fields as-is
        if key in single_dict:
            for fieldname in single_dict[key]:
                small_dict[fieldname] = val

        # Split multivalued fields on the pipe character '|'
        if key in multiple_dict:
            for fieldname in multiple_dict[key]:
                small_dict[fieldname] = val.split('|')

    # Write the record to a JSON file named after its unique identifier
    iden = row['dc_identifier_s']
    filename = iden + ".json"
    with open(os.path.join("json", code, filename), 'w') as jsonfile:
        json.dump(small_dict, jsonfile, indent=2)

# Close the input CSV once every row has been written
csvfile.close()
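
One way to spot-check the output is to load a JSON file back and look for empty or absent fields. The sketch below is a minimal example under stated assumptions: the helper missing_fields and the list CHECK_FIELDS are hypothetical, the field list is chosen only for illustration (consult the GeoBlacklight schema for the fields your records actually require), and the check runs on a throwaway file rather than real output.

```python
import json
import os
import tempfile

# Fields to check for; an assumed list for illustration only
CHECK_FIELDS = ["dc_identifier_s", "dc_title_s", "layer_slug_s"]

def missing_fields(path, fields=CHECK_FIELDS):
    """Hypothetical helper: return the fields that are absent or empty."""
    with open(path) as jsonfile:
        record = json.load(jsonfile)
    return [name for name in fields if not record.get(name)]

# Demonstrate on a throwaway file rather than real output
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "abc-123.json")
    with open(path, "w") as jsonfile:
        json.dump({"dc_identifier_s": "abc-123", "dc_title_s": ""}, jsonfile, indent=2)
    result = missing_fields(path)

print(result)
```

Because an empty CSV cell is written as an empty string, the check treats empty values the same as absent ones.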

*Script authored by Emily Ruetz @ruetz007; Updated by Karen Majewicz @karenmajewicz*