# Upload files to Science Base

Here we are going to use `sciencebasepy` to programmatically upload MTH5 files to a parent Science Base page.

In [1]:
from pathlib import Path
import sciencebasepy as sb
import time


## Parent Page

You need to know the ID of the parent page you are uploading to.  This is the string of numbers and letters at the end of the Science Base web address you were assigned.

In [2]:
parent_id = "67a3b942d34e63325c2b7229"
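If you only have the web address, the ID can be pulled out of it rather than copied by hand. The following is a minimal sketch, assuming the ID is always the last path segment of the item URL; `item_id_from_url` is a hypothetical helper, not part of `sciencebasepy`.

```python
from urllib.parse import urlparse


def item_id_from_url(url: str) -> str:
    """Return the last non-empty path segment of a Science Base item URL."""
    path = urlparse(url).path
    segments = [part for part in path.split("/") if part]
    if not segments:
        raise ValueError(f"No item ID found in URL: {url!r}")
    return segments[-1]


# Web address of the parent page used in this notebook.
url = "https://www.sciencebase.gov/catalog/item/67a3b942d34e63325c2b7229"
parent_id = item_id_from_url(url)
print(parent_id)
```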


## Start a Session

First, initialize a Science Base session.  Because we no longer have an AD password, we need to log in differently: we get a token and use it to connect to Science Base.

`session.get_token()` will open a web browser.  In the upper right hand corner there will be a profile icon. Click on it and click `Copy API Token`.  You will need to paste it into the next cell.

<div class="alert alert-block alert-info">
<b>Note:</b> If you are creating a Science Base data release, use "Prod" (production) mode, which means passing no arguments to SbSession().  If you are only downloading data from Science Base, you can use SbSession("beta").
</div>



In [3]:
session = sb.SbSession() # use sb.SbSession("beta") if just downloading. 
session.get_token()


A browser window/tab should momentarily open with ScienceBase Manager
Sign in using active directory or login.gov
Click the user icon in the upper right and select 'Copy API token'
This copies the token to your clipboard
Use this value in the add_token function as the token_json parameter


### API Token

Paste the copied API token here.  What you copied is the full dictionary, containing both the access token and the refresh token.

In [4]:
# Replace the placeholders with the dictionary copied from ScienceBase Manager.
# Tokens are credentials: do not save or share a notebook that contains real ones.
token = {
    "access_token": "<paste your access token here>",
    "refresh_token": "<paste your refresh token here>",
}
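An alternative to pasting the token into a cell is to save the copied dictionary to a file and read it from there, which keeps credentials out of the notebook. This is only a sketch: `load_token` and the file name `sb_token.json` are hypothetical, and the demonstration writes placeholder values to a temporary file so that it runs on its own.

```python
import json
import tempfile
from pathlib import Path


def load_token(token_path):
    """Read a token dictionary from a JSON file and check the expected keys."""
    token = json.loads(Path(token_path).read_text())
    missing = {"access_token", "refresh_token"} - set(token)
    if missing:
        raise KeyError(f"Token file is missing keys: {sorted(missing)}")
    return token


# Demonstration with placeholder values written to a temporary file.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = Path(tmp_dir) / "sb_token.json"
    demo_path.write_text(json.dumps({"access_token": "aaa", "refresh_token": "bbb"}))
    token = load_token(demo_path)

print(sorted(token))
```

In practice you would point `load_token` at a file outside any shared or version-controlled folder.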

### Add the API Token
To log in, add the API token to the session.

In [5]:

session.add_token(token)

In [6]:
# be sure you are logged in.
print(f"Is Logged In: {session.is_logged_in()}")

Is Logged In: True
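Being logged in now does not guarantee the session lasts through a long upload; later output in this notebook shows `Is Logged In: False` after a failed upload. If the access token is a standard JSON Web Token (an assumption, based on its three dot-separated segments), its expiry time can be read from the `exp` claim. The sketch below decodes the payload without verifying the signature, and builds a fake token with illustrative timestamps 30 minutes apart so that it runs without real credentials. The helper names are hypothetical.

```python
import base64
import json
import time


def jwt_expiry(access_token: str) -> float:
    """Return the 'exp' claim (seconds since the epoch) without verifying the token."""
    payload_b64 = access_token.split(".")[1]
    # JWT segments are base64url-encoded with the padding stripped; restore it.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    payload = json.loads(base64.urlsafe_b64decode(payload_b64))
    return float(payload["exp"])


def seconds_remaining(access_token: str, now=None) -> float:
    """Seconds until the token expires; negative if it has already expired."""
    now = time.time() if now is None else now
    return jwt_expiry(access_token) - now


def _b64url(data: dict) -> str:
    """Encode a dictionary the way a JWT segment is encoded (for the demo only)."""
    raw = json.dumps(data).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


# Fake token for illustration: issued at 1741898416, expires at 1741900216.
fake_token = ".".join(
    [_b64url({"alg": "none"}), _b64url({"iat": 1741898416, "exp": 1741900216}), "signature"]
)
print(seconds_remaining(fake_token, now=1741898416))  # 1800.0
```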


## Parent Item

Now we are going to get the parent item, which is returned as a JSON-style dictionary that describes the Science Base page.  Editing this dictionary and sending it back to Science Base changes the web page.

In [7]:
item = session.get_item(parent_id)
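The rest of this notebook only uses the `"files"` list of the item, where each entry has a `"name"`. The sketch below shows a small helper for reading those names from a mock item; the mock holds only the keys used here (a real item has many more), and falling back to an empty list when `"files"` is absent is a defensive assumption for pages with nothing uploaded yet.

```python
def file_names(item: dict) -> list:
    """Return the names of the files attached to an item dictionary."""
    # Fall back to an empty list if the item has no "files" key.
    return [entry["name"] for entry in item.get("files", [])]


# Mock item for illustration only.
mock_item = {
    "id": "67a3b942d34e63325c2b7229",
    "files": [{"name": "cl008.h5"}, {"name": "cl016.h5"}],
}
print(file_names(mock_item))
print(file_names({"id": "67a3b942d34e63325c2b7229"}))
```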

## Upload MTH5s

We are going to upload the MTH5 files directly to the parent page, which differs from what we have done in the past and should be easier for the user.  The basic idea is to upload each file and then append it to the parent item.

I keep all the MTH5 files in one directory and suggest doing the same.  Set the path to that directory.

<div class="alert alert-block alert-warning">
<b>Warning:</b> Uploads sometimes time out or otherwise fail, so we catch the error and move on to the next file.  Be sure to check the output for MTH5 files that were not uploaded properly.
</div>

In [9]:
h5_path = Path(r"c:\Users\jpeacock\OneDrive - DOI\MTData\CL2021\archive")

In [10]:
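for h5_file in h5_path.glob("*.h5"):
    # Skip files that are already attached to the parent item.
    # item.get() avoids a KeyError if the page has no files yet.
    if h5_file.name in [d["name"] for d in item.get("files", [])]:
        print(f"Skipping: {h5_file.name} is already uploaded")
        continue
    try:
        # The returned item replaces the old one so the file list stays current.
        item = session.upload_file_to_item(item, h5_file.as_posix(), scrape_file=False)
        print(f"{time.ctime()} Uploaded: {h5_file.name}")
    except Exception as error:
        # Report the failure, refresh the parent item, and move on to the next file.
        print(f"{h5_file.name} Error: {error}")
        print(f"Is Logged In: {session.is_logged_in()}")
        item = session.get_item(parent_id)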

Skipping: cl008.h5 is already uploaded
Skipping: cl016.h5 is already uploaded
Skipping: cl017.h5 is already uploaded
Skipping: cl024.h5 is already uploaded
Skipping: cl028.h5 is already uploaded
Skipping: cl029.h5 is already uploaded
Skipping: cl031.h5 is already uploaded
Skipping: cl032.h5 is already uploaded
Skipping: cl036.h5 is already uploaded
Skipping: cl037.h5 is already uploaded
Skipping: cl043.h5 is already uploaded
Skipping: cl044.h5 is already uploaded
Skipping: cl049.h5 is already uploaded
Skipping: cl051.h5 is already uploaded
Skipping: cl059.h5 is already uploaded
cl060.h5 Error: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
Is Logged In: False
Skipping: cl061.h5 is already uploaded
Skipping: cl070.h5 is already uploaded
Skipping: cl071.h5 is already uploaded
Skipping: cl074.h5 is already uploaded
Skipping: cl077.h5 is already uploaded
Skipping: cl078.h5 is already uploaded
Skipping: cl088.h5 is already uploaded
Skipping: cl0
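Because failures like the one for `cl060.h5` above are often transient, one option is to retry a failed call a few times before giving up. The following is a generic sketch, not part of `sciencebasepy`: `retry` is a hypothetical helper that takes any zero-argument callable, and the demonstration uses a stand-in function that fails twice instead of a real upload.

```python
import time


def retry(func, attempts=3, delay=5.0, backoff=2.0, sleep=time.sleep):
    """Call func() until it succeeds or attempts run out; re-raise the last error."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception as error:
            if attempt == attempts:
                raise
            print(f"Attempt {attempt} failed ({error}); retrying in {delay:.0f} s")
            sleep(delay)
            # Wait longer before each subsequent attempt.
            delay *= backoff


# Stand-in for an upload that fails twice and then succeeds.
calls = {"count": 0}


def flaky_upload():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("Connection aborted.")
    return "uploaded"


result = retry(flaky_upload, attempts=3, delay=0.0)
print(result)
```

In the upload loop, the call could be wrapped as `retry(lambda: session.upload_file_to_item(item, h5_file.as_posix(), scrape_file=False))`. This is untested against Science Base, and retrying will not help if the session has been logged out.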

### Check for failed files

If the log got deleted or you didn't run the previous cell, check for files that are not on the parent page.

In [10]:
fns = [d["name"] for d in item["files"]] 
for a in h5_path.glob("*.h5"):
    if a.name not in fns:
        print(a.name)


cl060.h5


## Reorder File List

For organization, we will reorder the files alphanumerically; otherwise they are listed in the order they were uploaded.

In [9]:
fns = [d["name"] for d in item["files"]]
print(fns)

['cl008.h5', 'cl016.h5', 'cl017.h5', 'cl029.h5', 'cl031.h5', 'cl024.h5', 'cl028.h5', 'cl032.h5', 'cl036.h5', 'cl037.h5', 'cl043.h5', 'cl044.h5', 'cl049.h5', 'cl051.h5', 'cl059.h5', 'cl061.h5', 'cl070.h5', 'cl071.h5', 'cl074.h5', 'cl077.h5', 'cl088.h5', 'cl090.h5', 'cl099.h5', 'cl101.h5', 'cl103.h5', 'cl104.h5', 'cl108.h5', 'cl111.h5', 'cl112.h5', 'cl113.h5', 'cl126.h5', 'cl128.h5', 'cl129.h5', 'cl140.h5', 'cl141.h5', 'cl160.h5', 'cl161.h5', 'cl162.h5', 'cl163.h5', 'cl200.h5', 'cl201.h5', 'cl203.h5', 'cl204.h5', 'cl205.h5', 'cl207.h5', 'cl208.h5', 'cl211.h5', 'cl221.h5', 'cl230.h5', 'cl302.h5', 'cl303.h5', 'cl304.h5', 'cl305.h5', 'cl306.h5', 'cl3101.h5', 'cl3116.h5', 'cl3121.h5', 'cl3129.h5', 'cl321.h5', 'cl326.h5', 'cl340.h5', 'cl341.h5', 'cl355.h5', 'cl362.h5', 'cl381.h5', 'cl393.h5', 'cl394.h5', 'cl395.h5', 'cl396.h5', 'cl397.h5', 'cl407.h5', 'cl408.h5', 'cl409.h5', 'cl431.h5', 'cl433.h5', 'cl434.h5', 'cl436.h5', 'cl455.h5', 'cl461.h5', 'cl463.h5', 'cl471.h5', 'cl472.h5', 'cl473.h5',

In [11]:

# Keep the metadata (.xml) and archive (.zip) files first, followed by the sorted MTH5 files.
initial_files = []
h5_files = []
for fn in fns:
    if fn.endswith((".xml", ".zip")):
        initial_files.append(fn)
    else:
        h5_files.append(fn)

ordered_fns = initial_files + sorted(h5_files)
for fn in ordered_fns:
    print(fn)

clearlake_2022_metadata.xml
clearlake2022_mt_edis_and_pngs.zip
cl008.h5
cl016.h5
cl017.h5
cl024.h5
cl028.h5
cl029.h5
cl031.h5
cl032.h5
cl036.h5
cl037.h5
cl043.h5
cl044.h5
cl049.h5
cl051.h5
cl059.h5
cl060A.h5
cl061.h5
cl070.h5
cl071.h5
cl074.h5
cl077.h5
cl078.h5
cl088.h5
cl090.h5
cl099.h5
cl101.h5
cl103.h5
cl104.h5
cl108.h5
cl110.h5
cl111.h5
cl112.h5
cl113.h5
cl126.h5
cl128.h5
cl129.h5
cl140.h5
cl141.h5
cl160.h5
cl161.h5
cl162.h5
cl163.h5
cl200.h5
cl201.h5
cl203.h5
cl204.h5
cl205.h5
cl207.h5
cl208.h5
cl211.h5
cl221.h5
cl230.h5
cl302.h5
cl303.h5
cl304.h5
cl305.h5
cl306.h5
cl3101.h5
cl3116.h5
cl3121.h5
cl3129.h5
cl321.h5
cl326.h5
cl340.h5
cl341.h5
cl355.h5
cl362.h5
cl381.h5
cl393.h5
cl394.h5
cl395.h5
cl396.h5
cl397.h5
cl407.h5
cl408.h5
cl409.h5
cl431.h5
cl433.h5
cl434.h5
cl435.h5
cl436.h5
cl455.h5
cl461.h5
cl463.h5
cl471.h5
cl472.h5
cl473.h5
cl478.h5
cl479.h5
cl484.h5
cl485.h5
cl486.h5
cl491.h5
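Note that `sorted` compares names character by character, which is why `cl3101.h5` is listed before `cl321.h5` above. If the station numbers are meant to be read as integers (an assumption about the naming scheme), a natural sort key gives numeric order instead. `natural_key` below is a hypothetical helper shown on a few names from the list.

```python
import re


def natural_key(name: str):
    """Split a name into text and integer chunks so that 321 sorts before 3101."""
    return [
        int(chunk) if chunk.isdigit() else chunk.lower()
        for chunk in re.split(r"(\d+)", name)
    ]


names = ["cl3101.h5", "cl321.h5", "cl306.h5", "cl3116.h5", "cl326.h5"]

# Character-by-character order, as produced by the cell above.
print(sorted(names))

# Numeric order of the station numbers.
print(sorted(names, key=natural_key))
```

To use it in the cell above, `sorted(h5_files)` would become `sorted(h5_files, key=natural_key)`.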


## Reorder file list by name

In [12]:
new_file_list = []

for fn in ordered_fns:
    for file_item in item["files"]:
        if file_item["name"] == fn:
            new_file_list.append(file_item)
            break

item["files"] = new_file_list
for d in item["files"]:
    print(d["name"])

clearlake_2022_metadata.xml
clearlake2022_mt_edis_and_pngs.zip
cl008.h5
cl016.h5
cl017.h5
cl024.h5
cl028.h5
cl029.h5
cl031.h5
cl032.h5
cl036.h5
cl037.h5
cl043.h5
cl044.h5
cl049.h5
cl051.h5
cl059.h5
cl060A.h5
cl061.h5
cl070.h5
cl071.h5
cl074.h5
cl077.h5
cl078.h5
cl088.h5
cl090.h5
cl099.h5
cl101.h5
cl103.h5
cl104.h5
cl108.h5
cl110.h5
cl111.h5
cl112.h5
cl113.h5
cl126.h5
cl128.h5
cl129.h5
cl140.h5
cl141.h5
cl160.h5
cl161.h5
cl162.h5
cl163.h5
cl200.h5
cl201.h5
cl203.h5
cl204.h5
cl205.h5
cl207.h5
cl208.h5
cl211.h5
cl221.h5
cl230.h5
cl302.h5
cl303.h5
cl304.h5
cl305.h5
cl306.h5
cl3101.h5
cl3116.h5
cl3121.h5
cl3129.h5
cl321.h5
cl326.h5
cl340.h5
cl341.h5
cl355.h5
cl362.h5
cl381.h5
cl393.h5
cl394.h5
cl395.h5
cl396.h5
cl397.h5
cl407.h5
cl408.h5
cl409.h5
cl431.h5
cl433.h5
cl434.h5
cl435.h5
cl436.h5
cl455.h5
cl461.h5
cl463.h5
cl471.h5
cl472.h5
cl473.h5
cl478.h5
cl479.h5
cl484.h5
cl485.h5
cl486.h5
cl491.h5
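The nested loop above scans the whole file list once per name. For longer lists, the same reordering can be sketched with a dictionary lookup. `reorder_files` is a hypothetical helper, and the file entries below are mock dictionaries with only a `"name"` key; real entries carry more fields, which are passed through untouched.

```python
def reorder_files(files, ordered_names):
    """Return file entries in the order given by ordered_names, skipping unknown names."""
    by_name = {}
    for entry in files:
        # Keep the first entry for each name, matching the break in the nested loop.
        by_name.setdefault(entry["name"], entry)
    return [by_name[name] for name in ordered_names if name in by_name]


# Mock file entries in upload order.
mock_files = [{"name": "cl029.h5"}, {"name": "cl024.h5"}, {"name": "cl008.h5"}]
mock_order = ["cl008.h5", "cl024.h5", "cl029.h5"]

for entry in reorder_files(mock_files, mock_order):
    print(entry["name"])
```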


In [14]:
session.update_item({"id": parent_id, "files": new_file_list})

{'link': {'rel': 'self',
  'url': 'https://www.sciencebase.gov/catalog/item/67a3b942d34e63325c2b7229'},
 'relatedItems': {'link': {'url': 'https://www.sciencebase.gov/catalog/itemLinks?itemId=67a3b942d34e63325c2b7229',
   'rel': 'related'}},
 'id': '67a3b942d34e63325c2b7229',
 'identifiers': [{'type': 'IPDS',
   'scheme': 'https://www.sciencebase.gov/vocab/category/item/identifier',
   'key': 'IP-175657'},
  {'type': 'DOI',
   'scheme': 'https://www.sciencebase.gov/vocab/category/item/identifier',
   'key': 'doi:10.5066/P14KAQ3M'},
  {'type': 'USGS_ScienceCenter',
   'scheme': 'https://www.sciencebase.gov/vocab/category/item/identifier',
   'key': 'Geology, Minerals, Energy, and Geophysics Science Center'},
  {'type': 'USGS_MissionArea',
   'scheme': 'https://www.sciencebase.gov/vocab/category/item/identifier',
   'key': 'Natural Hazards'},
  {'type': 'USGS_keywords',
   'scheme': 'https://www.sciencebase.gov/vocab/category/item/identifier',
   'key': 'Energy Resources, Geophysics, Min

In [15]:
session.logout()

You have now been logged out.
