## Update GEOMG Records

This script updates the original GEOMG records to match the Aardvark version of GeoBlacklight metadata.
- For the `Subject` field: if the value is one of the [ISO Terms](https://airtable.com/tblQYoRv5nCaMOgim/viwjVPpqPcIyH6eul?blocks=hide), migrate it into a new field called `ISO Topic Category`; if it is one of the [LOC Genres](https://airtable.com/tblQYoRv5nCaMOgim/viwYc6Rpunc2W4t1h?blocks=hide), migrate it into another new field called `Resource Type`. Values that come from neither list remain in the `Subject` field.

- For the `Geometry Type` field: if the value is one of `Point`, `Polygon`, `Line`, `Vector`, or `Raster`, append the word `data` to it (for example, `Polygon data`) and place it in the very first position of `Resource Type`. If the value is `Image`, ignore it and do not migrate it.

- For the `Is Part Of` field: first check whether it contains multiple values separated by `|`. If it does, split it, store the first value (which is always the `Code`) in a new field called `Member Of`, and store the remaining values in `Is Part Of`. If it holds only a single code, migrate that code into `Member Of` and leave `Is Part Of` blank.
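
The three rules above can be sketched as small standalone helpers. This is only an illustration of the rules as described in the list, not the script's actual code: the helper names are hypothetical, the vocabularies are abbreviated, and the sample values in the usage note are made up.

```python
# Abbreviated vocabularies for illustration only; the script uses the full lists.
ISO_TERMS = {'Boundaries', 'Elevation', 'Transportation'}
LOC_GENRES = {'Aerial photographs', 'Atlases', 'Topographic maps'}
MIGRATE_TYPES = {'Point', 'Polygon', 'Line', 'Vector', 'Raster'}


def classify_subject(subject):
    """Return the name of the field a single Subject value should end up in."""
    if subject in ISO_TERMS:
        return 'ISO Topic Category'
    if subject in LOC_GENRES:
        return 'Resource Type'
    return 'Subject'  # value comes from neither list, so it stays put


def geometry_to_resource_type(geometry_type):
    """Return e.g. 'Polygon data', or '' when the value is not migrated."""
    if geometry_type in MIGRATE_TYPES:
        return geometry_type + ' data'
    return ''  # 'Image' and any other value are not migrated


def split_is_part_of(value):
    """Return (member_of, is_part_of) for a raw 'Is Part Of' value."""
    parts = value.split('|')
    # The first part is assumed to be the code; everything else stays behind.
    return parts[0], '|'.join(parts[1:])
```

For instance, `split_is_part_of('code-01|Collection A|Collection B')` yields `('code-01', 'Collection A|Collection B')`, while a single value such as `'code-01'` yields `('code-01', '')`, which leaves `Is Part Of` blank.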


#### Data Structure
- `script.ipynb`
- `dataBefore` folder -> CSV files downloaded from the GEOMG website by institution
- `dataUpdate` folder -> the updated CSV files written by the script



#### Reminder
**The updated fields `Member Of`, `Is Part Of`, `ISO Topic Category`, and `Resource Type` are all appended as the rightmost columns. After running the whole script, you may need to open the output CSV files in Excel or Google Sheets to delete the unneeded columns and, if needed, reorder the fields.**
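
If you would rather not do this cleanup by hand, the same step could be scripted. The following is a minimal sketch, not part of the original notebook: `reorder_columns` and its parameters are hypothetical names, and it assumes the header names in the output file match the field names used above. Because `csv.DictReader` keeps the last value when a header name appears twice, the appended `Is Part Of` column takes precedence over an emptied original column of the same name.

```python
import csv


def reorder_columns(src_path, dst_path, wanted_fields):
    """Copy src_path to dst_path, keeping only wanted_fields, in that order."""
    with open(src_path, newline='') as fr, open(dst_path, 'w', newline='') as fw:
        reader = csv.DictReader(fr)
        # extrasaction='ignore' silently drops every column not in wanted_fields.
        writer = csv.DictWriter(fw, fieldnames=wanted_fields, extrasaction='ignore')
        writer.writeheader()
        for row in reader:
            writer.writerow(row)
```

A call might look like `reorder_columns('dataUpdate/ohio_updated.csv', 'ohio_clean.csv', ['Member Of', 'Is Part Of', 'ISO Topic Category', 'Resource Type'])`, where the output path and the chosen field list are only examples.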



> Created by Gene Cheng ([@Ziiiiing](https://github.com/Ziiiiing)) on April 20, 2021

In [1]:
import csv
import os

In [2]:
iso_terms = ['Farming','Biota','Boundaries','Climatology,Meteorology and Atmosphere',
             'Economy','Elevation','Environment','Geoscientific Information','Health',
             'Imagery and Base Maps','Intelligence and Military','Inland Waters','Location',
             'Oceans','Planning and Cadastral','Society','Structure','Transportation',
             'Utilities and Communications']

loc_genres = ['Aerial photographs','Aerial views','Aeronautical charts','Armillary spheres',
              'Astronautical charts','Astronomical models','Composite atlases','Atlases',
              'Bathymetric maps','Block diagrams','Bottle-charts','Cadastral maps',
              'Cartographic materials','Cartographic materials for people with visual disabilities',
              'Celestial charts','Celestial globes','Census data', 'Children\'s atlases',
              'Children\'s maps','Comparative maps','Digital elevation models','Digital maps',
              'Early maps','Ephemerides','Ethnographic maps','Fire insurance maps','Flow maps',
              'Gazetteers','Geological cross-sections','Geological maps','Globes','Gores (Maps)',
              'Gravity anomaly maps','Index maps','Linguistic atlases','Loran charts',
              'Manuscript maps','Mappae mundi','Mental maps','Meteorological charts',
              'Military maps','Mine maps','Miniature maps','Nautical charts','Outline maps',
              'Photogrammetric maps','Photomaps','Physical maps','Pictorial maps','Plotting charts',
              'Portolan charts','Quadrangle maps','Relief models','Remote-sensing maps','Road maps',
              'Statistical maps','Stick charts','Strip maps','Thematic maps','Topographic maps',
              'Tourist maps','Upside-down maps','Wall maps','World atlases','World maps',
              'Worm\'s-eye views','Zoning maps']

migrate_types = ['Point', 'Polygon','Line', 'Vector', 'Raster']

In [6]:
def updateRecords(file):
    print("##### Updating", file)
    # read csv fields and records
    fields = []
    records = []
    # newline='' lets the csv module handle line endings inside quoted fields
    with open('dataBefore/' + file, newline='') as fr:
        csv_reader = csv.reader(fr)
        fields = next(csv_reader)
        for row in csv_reader:
            records.append(row)
    
    # add new fields for updated records
    fields.append('Member Of')
    fields.append('Is Part Of')
    fields.append('ISO Topic Category')
    fields.append('Resource Type')

    # update records
    for record in records:
        code = record[39]
        old_isPartOf = record[40]
        old_geometryType = record[17]
        old_subject = record[7]
        
        new_memberOf = ''
        new_isPartOf = ''
        new_resourceType = ''
        new_iso = ''
        
        # Split multi-value 'Is Part Of': the first value is the code,
        # and all remaining values stay in 'Is Part Of'
        if '|' in old_isPartOf:
            parts = old_isPartOf.split('|')
            new_memberOf = parts[0]
            new_isPartOf = '|'.join(parts[1:])
        elif old_isPartOf == code:
            new_memberOf = old_isPartOf
        else:
            new_isPartOf = old_isPartOf
        
        # Migrate 'Geometry Type'
        if old_geometryType in migrate_types:
            new_resourceType = old_geometryType + ' data'
            record[17] = ''
        elif old_geometryType == 'Image':       # FIXME: delete later
            record[17] = ''
        
        # Migrate 'Subject' (the whole cell is matched as a single value)
        if old_subject in iso_terms:
            new_iso = old_subject
            record[7] = ''
        elif (old_subject in loc_genres) and (new_resourceType):
            new_resourceType += '|' + old_subject
            record[7] = ''
        elif (old_subject in loc_genres) and (not new_resourceType):
            new_resourceType = old_subject
            record[7] = ''
        
        record[40] = ''                      
        record.append(new_memberOf)
        record.append(new_isPartOf)
        record.append(new_iso)
        record.append(new_resourceType)

    # write a new CSV file
    # newline='' prevents blank rows between records on Windows
    with open('dataUpdate/' + file[:-4] + '_updated.csv', 'w', newline='') as fw:
        csv_writer = csv.writer(fw)
        csv_writer.writerow(fields)
        csv_writer.writerows(records)


In [7]:
# store existing CSV files under 'dataBefore' folder into a list
files = [ x for x in os.listdir('dataBefore') if x.endswith('.csv') ]

In [8]:
for file in files:
    updateRecords(file)

##### Updating wisconsin.csv
##### Updating none.csv
##### Updating minnesota.csv
##### Updating maryland.csv
##### Updating pennsylvania.csv
##### Updating michigan.csv
##### Updating stanford.csv
##### Updating ohio.csv
##### Updating nebraska.csv
##### Updating Indiana.csv
##### Updating purdue.csv
##### Updating chicago.csv
##### Updating harvard.csv
##### Updating msu.csv
##### Updating indianamaps.csv
##### Updating illinois.csv
##### Updating iowa.csv
