# Update Retired Items on GEOMG

This script offers a simpler way to process retired items on GEOMG after the monthly reharvest of the DCAT portals.

Previously, we downloaded all published items from GEOMG, identified the retired ones, modified their `Date Retired` and `Status` fields in a standalone spreadsheet, and then uploaded the spreadsheet to GEOMG again to convert the `Publication State` to **unpublished** for all retired items.

To make this process easier, we now use a Python module called `mechanize` to interact with the existing online data on GEOMG and modify the values for retired items directly, rather than repeatedly downloading and uploading a spreadsheet every month.

> Originally created by Gene Cheng [(@Ziiiiing)](https://github.com/Ziiiiing) on Sep 27, 2021

## Module Preparation

In [None]:
# uncomment & run this cell if the 'mechanize' module is not installed yet

# pip install mechanize

In [1]:
import mechanize
import time
import csv

## Step 1: User Login

First, replace the values of `username` and `password` with your own GEOMG login credentials. Make sure your credentials are not exposed on the internet.

In [2]:
username = "<your_username>"
password = "<your_password>"
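
One way to keep credentials out of the notebook itself is to read them from environment variables and fall back to an interactive prompt. The sketch below is only an illustration: the function `load_credentials` and the variable names `GEOMG_USERNAME` and `GEOMG_PASSWORD` are hypothetical choices for this example, not something GEOMG defines.

```python
import os
from getpass import getpass


def load_credentials(env=None):
    """Return (username, password), preferring environment variables.

    GEOMG_USERNAME and GEOMG_PASSWORD are hypothetical names chosen for
    this sketch. If either is missing, the user is prompted instead.
    """
    env = os.environ if env is None else env
    username = env.get("GEOMG_USERNAME") or input("GEOMG username: ")
    # getpass hides the typed password instead of echoing it
    password = env.get("GEOMG_PASSWORD") or getpass("GEOMG password: ")
    return username, password
```

With this in place, the assignment cell could become `username, password = load_credentials()`, so no secret is ever saved in the notebook file.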

In [3]:
# Perform login 

login_url = "https://geomg.lib.umn.edu/users/sign_in"

br = mechanize.Browser()
br.set_handle_robots(False)   # ignore robots

# browse the Login Page and select the right form for login
br.open(login_url)
br.select_form(nr=1)

# input and submit the username & password
br["user[email]"] = username
br["user[password]"] = password

try:
    br.submit()
    print('>>> Successfully logged in.')
except mechanize.HTTPError as err:
    # HTTPError carries the status code returned by the server
    print('>>> Failed to log in. [HTTP Status Code: {}]'.format(err.code))


>>> Successfully logged in.


## Step 2: Fetch ID for Retired Items

In [4]:
# read the csv file and extract the ID for all retired items from 'reports' folder

retired_items = []
actionDate = time.strftime("%Y%m%d")

with open('reports/allDeletedItems_{}.csv'.format(actionDate)) as fr:
    reader = csv.reader(fr)
    next(reader)  # skip the header row
    for row in reader:
        retired_items.append(row[0])   # extract the retired IDs only
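
If the report for today's date is missing or contains blank rows, the cell above fails with a less helpful error. A slightly more defensive variant could look like the sketch below; the function name `read_retired_ids` is hypothetical, and it assumes, as the cell above does, that the item ID sits in the first column.

```python
import csv
from pathlib import Path


def read_retired_ids(report_path):
    """Return the first-column values of a report CSV.

    The header row and any blank rows are skipped. A clear error is
    raised if the report file does not exist.
    """
    path = Path(report_path)
    if not path.is_file():
        raise FileNotFoundError("No deletion report found at {}".format(path))
    with path.open(newline="") as fr:
        reader = csv.reader(fr)
        next(reader, None)  # skip the header row (tolerates an empty file)
        return [row[0].strip() for row in reader if row and row[0].strip()]
```

It would be called with the same path as above, for example `read_retired_ids('reports/allDeletedItems_{}.csv'.format(actionDate))`.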

## Step 3: Scrape & Modify Fields Online

For retired items, we keep most of the metadata and modify only the following fields:
- `Date Retired`: set to today's date in YYYY-MM-DD format
- `Status`: change the status from **Active** to **Inactive**
- `Publication State`: convert the publication state from **published** to **unpublished**
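
The three field changes listed above can be wrapped in one small helper so that the first pass and any retry share the same logic. This is a minimal sketch under stated assumptions: `retire_item` and its `form_index` parameter are hypothetical names, while the field names and the form index of 1 are taken from the cells in this notebook. It accepts any object that behaves like the `mechanize.Browser` created at login.

```python
import time


def retire_item(browser, item_url, date_retired=None, form_index=1):
    """Open an item's page, set the three retirement fields, and submit.

    `browser` is expected to behave like a logged-in mechanize.Browser.
    Any exception raised along the way is left for the caller to handle.
    """
    date_retired = date_retired or time.strftime("%Y-%m-%d")
    browser.open(item_url)
    browser.select_form(nr=form_index)
    # Date Retired is a text control, so it takes a plain string
    browser["document[b1g_dateRetired_s]"] = date_retired
    # Status and Publication State are list controls, so they take a list
    browser["document[b1g_status_s]"] = ["Inactive"]
    browser["document[publication_state]"] = ["unpublished"]
    return browser.submit()
```

Keeping the field assignments in one place means a change to a field name on GEOMG only has to be made once.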

In [12]:
# iterate over all retired items and update their fields

dateRetired = time.strftime("%Y-%m-%d")
count = 0
failed = []

for item in retired_items:
    count += 1
    item_url = "https://geomg.lib.umn.edu/documents/{}".format(item)

    try:
        br.open(item_url)          # open the edit page for each item
        br.select_form(nr=1)       # the index of the form is 1
    
        # assign a new text for the Date Retired (TextControl)
        br["document[b1g_dateRetired_s]"] = dateRetired
        # select a new option for the Status & Publication States (ListControl)
        br["document[b1g_status_s]"] = ["Inactive"]     
        br["document[publication_state]"] = ["unpublished"]

        # submit the changes
        br.submit()
        print(">>> [{}/{}] Processing {}: Success".format(count, len(retired_items), item))
    except Exception:              # any error while opening, filling, or submitting the form
        print(">>> [{}/{}] Processing {}: Failed".format(count, len(retired_items), item))
        failed.append(item)        # store failed item and try again later

>>> [1/430] Processing 9b2537e7a6e749328d84ab8d071f5c9f_15: Failed
>>> [2/430] Processing 9b2537e7a6e749328d84ab8d071f5c9f_36: Failed
>>> [3/430] Processing 9b2537e7a6e749328d84ab8d071f5c9f_51: Failed
>>> [4/430] Processing 9b2537e7a6e749328d84ab8d071f5c9f_94: Failed
>>> [5/430] Processing 9b2537e7a6e749328d84ab8d071f5c9f_114: Failed
>>> [6/430] Processing 9b2537e7a6e749328d84ab8d071f5c9f_116: Failed
>>> [7/430] Processing 9b2537e7a6e749328d84ab8d071f5c9f_159: Failed
>>> [8/430] Processing 9b2537e7a6e749328d84ab8d071f5c9f_0: Failed
>>> [9/430] Processing 9b2537e7a6e749328d84ab8d071f5c9f_29: Failed
>>> [10/430] Processing 322ee79b35f748869974ec661bd04bbc_5: Success
>>> [11/430] Processing 322ee79b35f748869974ec661bd04bbc_10: Success
>>> [12/430] Processing 322ee79b35f748869974ec661bd04bbc_24: Failed
>>> [13/430] Processing 322ee79b35f748869974ec661bd04bbc_27: Failed
>>> [14/430] Processing 322ee79b35f748869974ec661bd04bbc_120: Failed
>>> [15/430] Processing 9b2537e7a6e749328d84ab8d071f5

>>> [122/430] Processing 9b2537e7a6e749328d84ab8d071f5c9f_56: Failed
>>> [123/430] Processing 9b2537e7a6e749328d84ab8d071f5c9f_60: Failed
>>> [124/430] Processing 9b2537e7a6e749328d84ab8d071f5c9f_75: Failed
>>> [125/430] Processing 9b2537e7a6e749328d84ab8d071f5c9f_77: Failed
>>> [126/430] Processing 9b2537e7a6e749328d84ab8d071f5c9f_8: Failed
>>> [127/430] Processing 9b2537e7a6e749328d84ab8d071f5c9f_80: Failed
>>> [128/430] Processing 9b2537e7a6e749328d84ab8d071f5c9f_92: Failed
>>> [129/430] Processing 9b2537e7a6e749328d84ab8d071f5c9f_96: Failed
>>> [130/430] Processing 9b2537e7a6e749328d84ab8d071f5c9f_97: Failed
>>> [131/430] Processing 322ee79b35f748869974ec661bd04bbc_164: Failed
>>> [132/430] Processing 322ee79b35f748869974ec661bd04bbc_102: Failed
>>> [133/430] Processing 322ee79b35f748869974ec661bd04bbc_52: Failed
>>> [134/430] Processing 322ee79b35f748869974ec661bd04bbc_73: Failed
>>> [135/430] Processing 322ee79b35f748869974ec661bd04bbc_93: Failed
>>> [136/430] Processing 322ee79b

>>> [241/430] Processing 640dbe6747934964b07a1e964108d5a6_65: Failed
>>> [242/430] Processing 640dbe6747934964b07a1e964108d5a6_79: Failed
>>> [243/430] Processing 640dbe6747934964b07a1e964108d5a6_89: Failed
>>> [244/430] Processing 640dbe6747934964b07a1e964108d5a6_136: Failed
>>> [245/430] Processing 640dbe6747934964b07a1e964108d5a6_138: Failed
>>> [246/430] Processing 640dbe6747934964b07a1e964108d5a6_157: Failed
>>> [247/430] Processing 640dbe6747934964b07a1e964108d5a6_97: Failed
>>> [248/430] Processing 640dbe6747934964b07a1e964108d5a6_0: Failed
>>> [249/430] Processing 640dbe6747934964b07a1e964108d5a6_2: Success
>>> [250/430] Processing 640dbe6747934964b07a1e964108d5a6_4: Failed
>>> [251/430] Processing 640dbe6747934964b07a1e964108d5a6_11: Failed
>>> [252/430] Processing 640dbe6747934964b07a1e964108d5a6_25: Failed
>>> [253/430] Processing 640dbe6747934964b07a1e964108d5a6_54: Failed
>>> [254/430] Processing 640dbe6747934964b07a1e964108d5a6_62: Failed
>>> [255/430] Processing 640dbe67

>>> [360/430] Processing 1d75af0fe6024a578c56266c9f10388f_0: Success
>>> [361/430] Processing 5f9216c69a3e474a9442cbab53235049_10: Success
>>> [362/430] Processing 11941f347a5d40d39dcb4f7db57a6ead_0: Success
>>> [363/430] Processing 75caeadc857e451499ac6b4090179828_0: Success
>>> [364/430] Processing 36248a2eca4c4434bf356cd2221aabdc_9: Success
>>> [365/430] Processing 2531be0a528c4a40833c529871b29d08_4: Success
>>> [366/430] Processing 5abd7dd9e5d84485a1903ba6ec6df6b1_3: Success
>>> [367/430] Processing d7104c4af1134790b8c5c067079564a5_0: Success
>>> [368/430] Processing 2e0e6b2e1a944313a9766612746d27bf_0: Success
>>> [369/430] Processing a2dc345848c14c5d9940acf0ef55cd56_0: Success
>>> [370/430] Processing e289166bf1084ea596bd71a2d47e8ff2_0: Success
>>> [371/430] Processing 20ceeef070734c7cabfda961e8413921_0: Success
>>> [372/430] Processing 2f1efc7bfcd14123aa8c9dba7062cd53_0: Success
>>> [373/430] Processing 8a2d7d329b854c02b0fb173d1fd660a0_0: Success
>>> [374/430] Processing abdb4042

If any retired items fail to update on GEOMG, try a second time to modify them.
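
The second attempt can also be expressed as a generic retry loop, which avoids copying the update code for every extra pass. The sketch below is one possible shape, not the notebook's own method: `retry_failed`, `attempts`, and `delay_seconds` are hypothetical names, and `action` stands for whatever function performs the update for a single item and raises an exception on failure.

```python
import time


def retry_failed(items, action, attempts=2, delay_seconds=0.0):
    """Call action(item) for each item, making up to `attempts` passes.

    Only items that raised an exception are tried again in the next pass.
    Returns the items that still fail after the final pass.
    """
    remaining = list(items)
    for attempt in range(attempts):
        if not remaining:
            break
        if attempt and delay_seconds:
            time.sleep(delay_seconds)  # pause before every pass except the first
        still_failing = []
        for item in remaining:
            try:
                action(item)
            except Exception:
                still_failing.append(item)
        remaining = still_failing
    return remaining
```

The returned list plays the same role as `failedAgain`: whatever is left in it needs manual attention.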

In [16]:
count = 0
failedAgain = []

if failed:
    print('-------------- Summary --------------')
    print('Successful Updates: {}'.format(len(retired_items)-len(failed)))
    print('Failed Updates: {}'.format(len(failed)))
    
    print('\n-------------- Try Again for Failed Updates --------------')
    for item in failed:
        count += 1
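        # NOTE: this pass targets the geomgdev host, while the login and the first pass use geomg;
        # a session is normally tied to the host it was created on, so confirm which host is intended
        item_url = 'https://geomgdev.lib.umn.edu/documents/{}'.format(item)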
        
        try:
            br.open(item_url)          # open the edit page for each item
            br.select_form(nr=1)       # the index of the form is 1


            # assign a new text for the Date Retired (TextControl)
            br["document[b1g_dateRetired_s]"] = dateRetired
            # select a new option for the Status & Publication States (ListControl)
            br["document[b1g_status_s]"] = ["Inactive"]     
            br["document[publication_state]"] = ["unpublished"]

            # submit the changes
            br.submit()
            print(">>> [{}/{}] Processing {}: Success".format(count, len(failed), item))
        except Exception:              # any error while opening, filling, or submitting the form
            print(">>> [{}/{}] Processing {}: Failed".format(count, len(failed), item))
            failedAgain.append(item)
                  


-------------- Summary --------------
Successful Updates: 139
Failed Updates: 291

-------------- Try Again for Failed Updates --------------
>>> [1/291] Processing 9b2537e7a6e749328d84ab8d071f5c9f_15: Failed
>>> [2/291] Processing 9b2537e7a6e749328d84ab8d071f5c9f_36: Failed
>>> [3/291] Processing 9b2537e7a6e749328d84ab8d071f5c9f_51: Failed
>>> [4/291] Processing 9b2537e7a6e749328d84ab8d071f5c9f_94: Failed
>>> [5/291] Processing 9b2537e7a6e749328d84ab8d071f5c9f_114: Failed
>>> [6/291] Processing 9b2537e7a6e749328d84ab8d071f5c9f_116: Failed
>>> [7/291] Processing 9b2537e7a6e749328d84ab8d071f5c9f_159: Failed
>>> [8/291] Processing 9b2537e7a6e749328d84ab8d071f5c9f_0: Failed
>>> [9/291] Processing 9b2537e7a6e749328d84ab8d071f5c9f_29: Failed
>>> [10/291] Processing 322ee79b35f748869974ec661bd04bbc_24: Failed
>>> [11/291] Processing 322ee79b35f748869974ec661bd04bbc_27: Failed
>>> [12/291] Processing 322ee79b35f748869974ec661bd04bbc_120: Failed
>>> [13/291] Processing 9b2537e7a6e749328d84ab8d

>>> [119/291] Processing 9b2537e7a6e749328d84ab8d071f5c9f_50: Failed
>>> [120/291] Processing 9b2537e7a6e749328d84ab8d071f5c9f_56: Failed
>>> [121/291] Processing 9b2537e7a6e749328d84ab8d071f5c9f_60: Failed
>>> [122/291] Processing 9b2537e7a6e749328d84ab8d071f5c9f_75: Failed
>>> [123/291] Processing 9b2537e7a6e749328d84ab8d071f5c9f_77: Failed
>>> [124/291] Processing 9b2537e7a6e749328d84ab8d071f5c9f_8: Failed
>>> [125/291] Processing 9b2537e7a6e749328d84ab8d071f5c9f_80: Failed
>>> [126/291] Processing 9b2537e7a6e749328d84ab8d071f5c9f_92: Failed
>>> [127/291] Processing 9b2537e7a6e749328d84ab8d071f5c9f_96: Failed
>>> [128/291] Processing 9b2537e7a6e749328d84ab8d071f5c9f_97: Failed
>>> [129/291] Processing 322ee79b35f748869974ec661bd04bbc_164: Failed
>>> [130/291] Processing 322ee79b35f748869974ec661bd04bbc_102: Failed
>>> [131/291] Processing 322ee79b35f748869974ec661bd04bbc_52: Failed
>>> [132/291] Processing 322ee79b35f748869974ec661bd04bbc_73: Failed
>>> [133/291] Processing 322ee79b

>>> [238/291] Processing 640dbe6747934964b07a1e964108d5a6_25: Failed
>>> [239/291] Processing 640dbe6747934964b07a1e964108d5a6_54: Failed
>>> [240/291] Processing 640dbe6747934964b07a1e964108d5a6_62: Failed
>>> [241/291] Processing 640dbe6747934964b07a1e964108d5a6_68: Failed
>>> [242/291] Processing 640dbe6747934964b07a1e964108d5a6_92: Failed
>>> [243/291] Processing 640dbe6747934964b07a1e964108d5a6_100: Failed
>>> [244/291] Processing 640dbe6747934964b07a1e964108d5a6_101: Failed
>>> [245/291] Processing 640dbe6747934964b07a1e964108d5a6_102: Failed
>>> [246/291] Processing 640dbe6747934964b07a1e964108d5a6_135: Failed
>>> [247/291] Processing 640dbe6747934964b07a1e964108d5a6_137: Failed
>>> [248/291] Processing 640dbe6747934964b07a1e964108d5a6_141: Failed
>>> [249/291] Processing 640dbe6747934964b07a1e964108d5a6_166: Failed
>>> [250/291] Processing 322ee79b35f748869974ec661bd04bbc_80: Failed
>>> [251/291] Processing 322ee79b35f748869974ec661bd04bbc_122: Failed
>>> [252/291] Processing 3

If some updates still fail after the second attempt, modify those items manually.
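
When the list of remaining items is long, saving it to a file can be easier to work through than a printed list. The following sketch writes one row per item; the function name `write_manual_edit_list`, the output path, and the `ID`/`URL` column names are all hypothetical choices for this example.

```python
import csv


def write_manual_edit_list(item_ids, base_url, out_path):
    """Write one CSV row per item (ID and document URL).

    Returns the number of item rows written, not counting the header.
    """
    with open(out_path, "w", newline="") as fw:
        writer = csv.writer(fw)
        writer.writerow(["ID", "URL"])
        for item_id in item_ids:
            writer.writerow([item_id, "{}/{}".format(base_url.rstrip("/"), item_id)])
    return len(item_ids)
```

Here `base_url` would be the documents URL of whichever GEOMG host the updates were run against, and `item_ids` would be the `failedAgain` list.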

In [17]:
if failedAgain:
    print('-------------- Summary --------------')
    print('Successful Updates: {}'.format(len(retired_items)-len(failedAgain)))
    print('Failed Updates: {}'.format(len(failedAgain)))
                  
    print('\n-------------- Manual Edits Needed --------------')
    for item in failedAgain:
        item_url = 'https://geomgdev.lib.umn.edu/documents/{}'.format(item)
        print(item_url)

-------------- Summary --------------
Successful Updates: 139
Failed Updates: 291

-------------- Manual Edits Needed --------------
https://geomgdev.lib.umn.edu/documents/9b2537e7a6e749328d84ab8d071f5c9f_15
https://geomgdev.lib.umn.edu/documents/9b2537e7a6e749328d84ab8d071f5c9f_36
https://geomgdev.lib.umn.edu/documents/9b2537e7a6e749328d84ab8d071f5c9f_51
https://geomgdev.lib.umn.edu/documents/9b2537e7a6e749328d84ab8d071f5c9f_94
https://geomgdev.lib.umn.edu/documents/9b2537e7a6e749328d84ab8d071f5c9f_114
https://geomgdev.lib.umn.edu/documents/9b2537e7a6e749328d84ab8d071f5c9f_116
https://geomgdev.lib.umn.edu/documents/9b2537e7a6e749328d84ab8d071f5c9f_159
https://geomgdev.lib.umn.edu/documents/9b2537e7a6e749328d84ab8d071f5c9f_0
https://geomgdev.lib.umn.edu/documents/9b2537e7a6e749328d84ab8d071f5c9f_29
https://geomgdev.lib.umn.edu/documents/322ee79b35f748869974ec661bd04bbc_24
https://geomgdev.lib.umn.edu/documents/322ee79b35f748869974ec661bd04bbc_27
https://geomgdev.lib.umn.edu/documents/3

## Thanks for your time.