# Update Retired Items on GEOMG

This script offers a simpler way to process retired items on GEOMG after the monthly reharvest of the DCAT portals.

Previously, we downloaded all published items from GEOMG, compared them to find the retired ones, modified their `Date Retired` and `Status` fields in a standalone spreadsheet, and then uploaded the spreadsheet to GEOMG again to convert the `Publication State` to **unpublished** for all retired items.

To make the process easier, we now use a Python module called `mechanize` to interact with the existing online data on GEOMG and modify the values for the retired items directly, rather than repeatedly downloading and uploading every month.





> Originally created by **Gene Cheng [(@Ziiiiing)](https://github.com/Ziiiiing)** on **Sep 27, 2021**

> Updated by **Gene Cheng** on **Oct 3, 2021**
- Check login condition
- Skip records that do not exist on GEOMG

## Module Preparation

In [None]:
# uncomment & run this cell if the 'mechanize' module is not installed yet

# pip install mechanize

In [1]:
import mechanize
import time
import csv

## Step 1: User Login

First things first: replace the values of `username` and `password` with your own GEOMG credentials. Make sure your credentials are not exposed online; do not share or commit the notebook with real values filled in.

In [2]:
username = "<your_username>"
password = "<your_password>"
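
One way to keep credentials out of the notebook is to read them from environment variables and prompt only for what is missing. The following is a minimal sketch, not part of the original script: the helper `load_credentials` and the variable names `GEOMG_USERNAME` and `GEOMG_PASSWORD` are hypothetical names chosen for this example.

```python
import getpass
import os


def load_credentials(env=None, prompt_user=input, prompt_password=getpass.getpass):
    """Return (username, password), prompting for any value not found in env.

    GEOMG_USERNAME and GEOMG_PASSWORD are hypothetical variable names used
    only in this example; GEOMG itself does not define them.
    """
    env = os.environ if env is None else env
    username = env.get("GEOMG_USERNAME") or prompt_user("GEOMG username: ")
    password = env.get("GEOMG_PASSWORD") or prompt_password("GEOMG password: ")
    return username, password
```

With this helper, the cell above could become `username, password = load_credentials()`, so the password is never stored in the notebook file.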

In [3]:
# Perform login 

login_url = "https://geomg.lib.umn.edu/users/sign_in"

br = mechanize.Browser()
br.set_handle_robots(False)   # ignore robots

# browse the Login Page and select the right form for login
br.open(login_url)
br.select_form(nr=1)

# input and submit the username & password
br["user[email]"] = username
br["user[password]"] = password
br.submit()

# a successful login redirects away from the login page
if br.geturl() == login_url:
    print(">>> Failed to login.")
else:
    print('>>>> Successfully logged in.')


>>>> Successfully logged in.


## Step 2: Fetch ID for Retired Items

In [4]:
# read the csv file and extract the ID for all retired items from 'reports' folder

retired_items = []
actionDate = time.strftime("%Y%m%d")

with open('reports/allDeletedItems_{}.csv'.format(actionDate), newline='') as fr:
    reader = csv.reader(fr)
    fieldnames = next(reader)  # skip the header row
    for row in reader:
        retired_items.append(row[0])   # extract the retired IDs only (first column)
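
If the report may contain blank rows or repeated IDs, the same read can be wrapped in a small function. This is a sketch under that assumption; `read_retired_ids` is a hypothetical helper, and it only assumes, as the cell above does, that the ID sits in the first column after a header row.

```python
import csv


def read_retired_ids(path):
    """Return the unique, non-empty first-column values of a CSV report, in order."""
    ids = []
    seen = set()
    with open(path, newline="", encoding="utf-8") as fr:
        reader = csv.reader(fr)
        next(reader, None)  # skip the header row, if the file has one
        for row in reader:
            if not row or not row[0].strip():
                continue  # ignore blank rows and rows without an ID
            item_id = row[0].strip()
            if item_id not in seen:
                seen.add(item_id)
                ids.append(item_id)
    return ids
```

It could be called as `retired_items = read_retired_ids('reports/allDeletedItems_{}.csv'.format(actionDate))`.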

## Step 3: Scrape & Modify Fields Online

For retired items, we keep the existing metadata and modify only the following fields:
- `Date Retired`: set today's date as the retired date, in **YYYY-MM-DD** format
- `Status`: change the status from **Active** to **Inactive**
- `Publication State`: convert the publication state from **published** to **unpublished**
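
The three changes can be collected in one place before they are written to the form. The sketch below is illustrative only: `build_retired_updates` is a hypothetical helper, and the control names are the ones this notebook uses in its update loop.

```python
import time


def build_retired_updates(date_retired=None):
    """Map GEOMG form control names to the values set for a retired item."""
    if date_retired is None:
        date_retired = time.strftime("%Y-%m-%d")  # today's date as YYYY-MM-DD
    return {
        "document[b1g_dateRetired_s]": date_retired,     # text control: plain string
        "document[b1g_status_s]": ["Inactive"],          # list control: list of options
        "document[publication_state]": ["unpublished"],  # list control: list of options
    }
```

Inside the loop, the values could then be applied with `for name, value in build_retired_updates().items(): br[name] = value`.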

In [6]:
# iterate over all retired items and update their fields

dateRetired = time.strftime("%Y-%m-%d")
count = 0
nonexist = []
failed = []

for item in retired_items:
    count += 1
    item_url = "https://geomg.lib.umn.edu/documents/{}".format(item)

    try:
        br.open(item_url)          # open the edit page for each item
        br.select_form(nr=1)       # the index of the form is 1
    
        # assign a new text for the Date Retired (TextControl)
        br["document[b1g_dateRetired_s]"] = dateRetired
        # select a new option for the Status & Publication State (ListControl)
        br["document[b1g_status_s]"] = ["Inactive"]
        br["document[publication_state]"] = ["unpublished"]

        # submit the changes
        br.submit()
        print(">>> [{}/{}] Processing {} .................... √".format(count, len(retired_items), item))
    except mechanize.HTTPError as e:
        print(">>> [{}/{}] Processing {} .................... x".format(count, len(retired_items), item))
        if e.code == 404:
            nonexist.append(item)      # the record does not exist on GEOMG; skip it
        else:
            failed.append(item)        # store the failed item and try again later
    except Exception:
        # any other error (e.g. a missing form or control) also counts as a failure
        print(">>> [{}/{}] Processing {} .................... x".format(count, len(retired_items), item))
        failed.append(item)
            
# print out the summary
print('\n-------------- Summary --------------')
print('Successful Updates: {}'.format(len(retired_items)-len(nonexist)-len(failed)))
print('Records Not Exist: {}'.format(len(nonexist)))
print('Failed Updates: {}'.format(len(failed)))

if failed:
    print('\n-------------- Manual Edits Needed for Failed Updates --------------')
    for item in failed:
        item_url = 'https://geomg.lib.umn.edu/documents/{}'.format(item)
        print(item_url)

>>> [1/430] Processing 9b2537e7a6e749328d84ab8d071f5c9f_15 .................... x
>>> [2/430] Processing 9b2537e7a6e749328d84ab8d071f5c9f_36 .................... x
>>> [3/430] Processing 9b2537e7a6e749328d84ab8d071f5c9f_51 .................... x
>>> [4/430] Processing 9b2537e7a6e749328d84ab8d071f5c9f_94 .................... x
>>> [5/430] Processing 9b2537e7a6e749328d84ab8d071f5c9f_114 .................... x
>>> [6/430] Processing 9b2537e7a6e749328d84ab8d071f5c9f_116 .................... x
>>> [7/430] Processing 9b2537e7a6e749328d84ab8d071f5c9f_159 .................... x
>>> [8/430] Processing 9b2537e7a6e749328d84ab8d071f5c9f_0 .................... x
>>> [9/430] Processing 9b2537e7a6e749328d84ab8d071f5c9f_29 .................... x
>>> [10/430] Processing 322ee79b35f748869974ec661bd04bbc_5 .................... √
>>> [11/430] Processing 322ee79b35f748869974ec661bd04bbc_10 .................... √
>>> [12/430] Processing 322ee79b35f748869974ec661bd04bbc_24 .................... x
>>> [13/430]

>>> [101/430] Processing 9b2537e7a6e749328d84ab8d071f5c9f_72 .................... x
>>> [102/430] Processing 9b2537e7a6e749328d84ab8d071f5c9f_89 .................... x
>>> [103/430] Processing 9b2537e7a6e749328d84ab8d071f5c9f_90 .................... x
>>> [104/430] Processing 9b2537e7a6e749328d84ab8d071f5c9f_95 .................... x
>>> [105/430] Processing 9b2537e7a6e749328d84ab8d071f5c9f_104 .................... x
>>> [106/430] Processing 9b2537e7a6e749328d84ab8d071f5c9f_105 .................... x
>>> [107/430] Processing 9b2537e7a6e749328d84ab8d071f5c9f_122 .................... x
>>> [108/430] Processing 9b2537e7a6e749328d84ab8d071f5c9f_46 .................... x
>>> [109/430] Processing 9b2537e7a6e749328d84ab8d071f5c9f_74 .................... x
>>> [110/430] Processing 9b2537e7a6e749328d84ab8d071f5c9f_82 .................... x
>>> [111/430] Processing 9b2537e7a6e749328d84ab8d071f5c9f_143 .................... x
>>> [112/430] Processing 9b2537e7a6e749328d84ab8d071f5c9f_100 ..........

>>> [199/430] Processing 640dbe6747934964b07a1e964108d5a6_7 .................... x
>>> [200/430] Processing 640dbe6747934964b07a1e964108d5a6_81 .................... x
>>> [201/430] Processing 640dbe6747934964b07a1e964108d5a6_85 .................... x
>>> [202/430] Processing 640dbe6747934964b07a1e964108d5a6_95 .................... x
>>> [203/430] Processing 640dbe6747934964b07a1e964108d5a6_57 .................... x
>>> [204/430] Processing 640dbe6747934964b07a1e964108d5a6_1 .................... √
>>> [205/430] Processing 640dbe6747934964b07a1e964108d5a6_6 .................... x
>>> [206/430] Processing 640dbe6747934964b07a1e964108d5a6_12 .................... x
>>> [207/430] Processing 640dbe6747934964b07a1e964108d5a6_22 .................... x
>>> [208/430] Processing 640dbe6747934964b07a1e964108d5a6_32 .................... x
>>> [209/430] Processing 640dbe6747934964b07a1e964108d5a6_37 .................... x
>>> [210/430] Processing 640dbe6747934964b07a1e964108d5a6_42 ..................

>>> [298/430] Processing 640dbe6747934964b07a1e964108d5a6_33 .................... x
>>> [299/430] Processing 640dbe6747934964b07a1e964108d5a6_46 .................... x
>>> [300/430] Processing 640dbe6747934964b07a1e964108d5a6_50 .................... x
>>> [301/430] Processing 640dbe6747934964b07a1e964108d5a6_56 .................... x
>>> [302/430] Processing 640dbe6747934964b07a1e964108d5a6_67 .................... x
>>> [303/430] Processing 640dbe6747934964b07a1e964108d5a6_70 .................... x
>>> [304/430] Processing 640dbe6747934964b07a1e964108d5a6_86 .................... x
>>> [305/430] Processing 640dbe6747934964b07a1e964108d5a6_99 .................... x
>>> [306/430] Processing 93bfc2b040e941e8ad4c41e871d6a892_7 .................... √
>>> [307/430] Processing 640dbe6747934964b07a1e964108d5a6_115 .................... x
>>> [308/430] Processing 640dbe6747934964b07a1e964108d5a6_55 .................... x
>>> [309/430] Processing 640dbe6747934964b07a1e964108d5a6_117 ..............

>>> [397/430] Processing 1095cae768cd4049a2912fc92803f105_0 .................... √
>>> [398/430] Processing d328c47c99a945f9bceb18a549644c6a_2 .................... √
>>> [399/430] Processing 16a463304f9d4be99bed048fa06da1e6_4 .................... √
>>> [400/430] Processing c6c6e051e8b3450c84242a41befd9507_1 .................... √
>>> [401/430] Processing cde2a6d7bed4411993d0b8eefc35406b_5 .................... √
>>> [402/430] Processing d13a5b9fcd9e4a9abb3b459f5355dd61_18 .................... √
>>> [403/430] Processing d328c47c99a945f9bceb18a549644c6a_1 .................... √
>>> [404/430] Processing cf032f548e134d8c80b1d97cbccdf3d9_0 .................... √
>>> [405/430] Processing d328c47c99a945f9bceb18a549644c6a_0 .................... √
>>> [406/430] Processing d328c47c99a945f9bceb18a549644c6a_3 .................... √
>>> [407/430] Processing 47092f1302c54a24b164e94eaf80f033_3 .................... √
>>> [408/430] Processing 40504206193f4bf3aa61f47b8832ae20_1 .................... x
>>>
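
Because the loop stores failed IDs so they can be tried again later, it may help to write them to a file at the end of a run. This is a minimal sketch, not part of the original script: `save_item_ids` is a hypothetical helper, and any output file name is an assumption.

```python
import csv


def save_item_ids(path, item_ids, header="ID"):
    """Write one item ID per row under a header row and return the number written."""
    with open(path, "w", newline="", encoding="utf-8") as fw:
        writer = csv.writer(fw)
        writer.writerow([header])
        for item_id in item_ids:
            writer.writerow([item_id])
    return len(item_ids)
```

For example, `save_item_ids('reports/failedItems_{}.csv'.format(actionDate), failed)` would produce a file in the same single-ID-column layout that Step 2 reads, so a later run could load it as its input.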

## Thanks for your time.