# Update records on GEOMG via script

`update.ipynb` is a script for modifying field values on GEOMG with Python. To modify a large number of datasets, we used to **download CSV files -> modify values locally -> upload them again**. To modify datasets one by one, we would **open each dataset page -> modify values -> update and open the next dataset page**. This script offers a third way to update GEOMG documents.




> Originally created by **Gene Cheng [(@Ziiiiing)](https://github.com/Ziiiiing)** on **Oct 3, 2021**

> Updated by **Gene Cheng [(@Ziiiiing)](https://github.com/Ziiiiing)** on **Oct 24, 2021**

In [None]:
# Uncomment and run this cell if the 'mechanize' module is not installed yet

# %pip install mechanize

In [None]:
import mechanize
import time
import csv

## Step 1: Prepare a CSV File

Store the updated **values**, along with their **field names**, in a local CSV file in the same directory as this notebook.
- The first row should contain the **field names**.
- The first column should contain the **ID** of each document.

Please look at the [README.md](https://github.com/BTAA-Geospatial-Data-Project/geomg-documents-update/blob/main/README.md) for more information.
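If you prefer to generate the CSV from Python, the sketch below writes a file in the layout described above. The document IDs, the field names (`title_field`, `keywords_field`) and the output file name `updates.csv` are hypothetical placeholders. Real field names must match the control names of the GEOMG edit form, since each one is later passed to `br[field]`. List values are written with `str()`, which produces literals such as `['Roads', 'Rivers']` that the loading cell recognizes by their leading `['`.

```python
import csv

# Hypothetical example: replace these IDs and field names with real
# GEOMG document IDs and edit-form control names.
header = ["ID", "title_field", "keywords_field"]
rows = [
    ["doc-001", "Updated Title", ["Roads", "Rivers"]],
    ["doc-002", "Another Title", ["Parcels"]],
]

def to_cell(value):
    """Serialize one value: lists become Python list literals like "['a', 'b']"."""
    return str(value) if isinstance(value, list) else value

with open("updates.csv", "w", newline="", encoding="utf-8") as fw:
    writer = csv.writer(fw)
    writer.writerow(header)                     # first row: field names
    for row in rows:
        writer.writerow([to_cell(v) for v in row])  # first column: document ID
```

One caveat: if a list item contains a single quote, `str()` renders it with double quotes (for example `["O'Brien"]`), which does not start with `['` and would be loaded as a plain string.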


In [None]:
# Edit here: path to your CSV file
csv_file = "<path to your CSV file>"

In [None]:
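import ast

# Read the CSV into a dict: {document ID: {field name: new value}}
data = {}
with open(csv_file, 'r', newline='', encoding='utf-8') as fr:
    reader = csv.reader(fr)
    fields = next(reader)[1:]          # header row, minus the ID column
    for row in reader:
        if not row:                    # skip blank lines
            continue
        ID = row[0]
        dictVal = {}
        for nameAttr, newVal in zip(fields, row[1:]):
            # multi-valued fields are stored as list literals, e.g. "['a', 'b']";
            # ast.literal_eval parses them safely (unlike eval)
            if newVal.startswith("['"):
                newVal = ast.literal_eval(newVal)
            dictVal[nameAttr] = newVal
        data[ID] = dictVal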
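To see the shape of `data` without a real file, the sketch below applies the same parsing rules to an in-memory sample via `io.StringIO`. The function name `parse_updates` and the sample IDs and field names are hypothetical; list literals are converted with `ast.literal_eval`.

```python
import ast
import csv
import io

def parse_updates(fileobj):
    """Return {document ID: {field name: new value}} from a CSV file object."""
    reader = csv.reader(fileobj)
    fields = next(reader)[1:]                  # header row, minus the ID column
    data = {}
    for row in reader:
        if not row:                            # skip blank lines
            continue
        values = {}
        for name, val in zip(fields, row[1:]):
            if val.startswith("['"):           # list literal, e.g. "['a', 'b']"
                val = ast.literal_eval(val)
            values[name] = val
        data[row[0]] = values
    return data

sample = io.StringIO(
    "ID,title_field,keywords_field\n"
    "doc-001,Updated Title,\"['Roads', 'Rivers']\"\n"
)
print(parse_updates(sample))
# {'doc-001': {'title_field': 'Updated Title', 'keywords_field': ['Roads', 'Rivers']}}
```

Note that an empty cell is kept as an empty string, so the update loop will submit that field as blank.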


## Step 2: User Login on GEOMG

With the data prepared, we are ready to interact with GEOMG. First, set `username` and `password` to your own GEOMG login credentials. Make sure these credentials are not exposed online; for example, do not commit or share the notebook with them filled in.

In [None]:
# Edit here: your GEOMG login credentials
username = "<your_username>"
password = "<your_password>"
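To avoid typing your password into the notebook at all, one option is to read the credentials at run time. The sketch below checks two environment variables, `GEOMG_USERNAME` and `GEOMG_PASSWORD` (names chosen here for illustration, not something GEOMG defines), and otherwise prompts for them, using `getpass` so the password is not echoed.

```python
import os
from getpass import getpass

def get_credentials():
    """Read GEOMG credentials from (hypothetical) environment variables, prompting for any that are missing."""
    user = os.environ.get("GEOMG_USERNAME") or input("GEOMG username (email): ")
    pwd = os.environ.get("GEOMG_PASSWORD") or getpass("GEOMG password: ")
    return user, pwd
```

Run `username, password = get_credentials()` in place of the hard-coded assignments.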

In [None]:
# Perform the login
login_url = "https://geomg.lib.umn.edu/users/sign_in"

br = mechanize.Browser()
br.set_handle_robots(False)   # do not fetch or obey robots.txt

# open the login page and select the login form (index 1 on this page)
br.open(login_url)
br.select_form(nr=1)

# fill in and submit the username & password
br["user[email]"] = username
br["user[password]"] = password
br.submit()

# a successful login redirects away from the sign-in page
if br.geturl() == login_url:
    print(">>> Failed to log in.")
else:
    print(">>> Successfully logged in.")
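Both the login cell and the update loop select a form by position (`select_form(nr=1)`). If the page layout changes, that index may no longer point at the right form. The helper below (a hypothetical name, not part of mechanize) lists every form on the currently opened page with its control names, relying only on mechanize's `Browser.forms()` and each form's `controls` list. The control names it reports are also the names to use as column headers in the CSV.

```python
def describe_forms(browser):
    """Return (index, form name, control names) for every form on the current page.

    The index is what select_form(nr=...) expects; the control names are what
    br[field] = value expects.
    """
    summary = []
    for index, form in enumerate(browser.forms()):
        names = [control.name for control in form.controls if control.name]
        summary.append((index, form.name, names))
    return summary
```

For example, after `br.open(item_url)`, run `for index, name, controls in describe_forms(br): print(index, name, controls)`.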


## Step 3: Modify Web Contents Online

In [None]:
# Iterate over the 'data' dictionary and update each record
nonexist = []   # IDs that returned HTTP 404
failed = []     # IDs that failed for any other reason
total = len(data)

for count, ID in enumerate(data, start=1):
    item_url = "https://geomg.lib.umn.edu/documents/{}".format(ID)
    modifies = data[ID]
    status = "√"

    try:
        br.open(item_url)          # open the edit page for this record
        br.select_form(nr=1)       # the edit form is at index 1 (the second form on the page)

        # set each field to its new value
        for field, newval in modifies.items():
            br[field] = newval

        # submit the changes for this document
        br.submit()

    except mechanize.HTTPError as e:
        status = "x"
        if e.code == 404:
            nonexist.append(ID)      # the record does not exist; skip it
        else:
            failed.append(ID)        # store the failed item to retry later
    except Exception:
        status = "x"
        failed.append(ID)            # e.g. a field name not found in the form

    print(">>> [{}/{}] Updating {} .................... {}".format(count, total, ID, status))

# print out the summary
print('\n-------------- Summary --------------')
print('Successful Updates: {}'.format(total - len(nonexist) - len(failed)))
print('Nonexistent Datasets: {}'.format(len(nonexist)))
print('Failed Updates: {}'.format(len(failed)))

if failed:
    print('\n-------------- Manual Edits Needed for Failed Updates --------------')
    for ID in failed:
        print('https://geomg.lib.umn.edu/documents/{}'.format(ID))
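Records in `failed` may have failed for transient reasons such as a network timeout. Before editing them by hand, one option is to retry them a few times with a growing pause between attempts. The sketch below is hypothetical (the function name, attempt count, and delays are illustrative choices); it repeats the same open / select form / fill / submit steps as the main loop and returns the IDs that still fail. Records in `nonexist` are deliberately left out, since retrying a 404 will not help.

```python
import time

def retry_updates(browser, data, ids, attempts=3, delay=2.0):
    """Retry updating the given record IDs; return the IDs that still fail.

    Assumes the same page layout as the main loop (edit form at index 1).
    """
    still_failed = []
    for ID in ids:
        url = "https://geomg.lib.umn.edu/documents/{}".format(ID)
        for attempt in range(1, attempts + 1):
            try:
                browser.open(url)
                browser.select_form(nr=1)
                for field, newval in data[ID].items():
                    browser[field] = newval
                browser.submit()
                break                          # success: stop retrying this ID
            except Exception:
                if attempt == attempts:
                    still_failed.append(ID)    # give up after the last attempt
                else:
                    time.sleep(delay * attempt)  # wait longer after each failure
    return still_failed
```

For example, `failed = retry_updates(br, data, failed)` leaves only the records that still need manual edits.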