# Practice: Data Submission to ESS-DIVE SANDBOX Using API
### Dataset API Version 1.7 (Package Service API)

The ESS-DIVE Dataset API is a service that enables projects to programmatically submit and manage datasets with ESS-DIVE. It is an alternative to the ESS-DIVE online form for data uploads. The service encodes metadata using the JSON-LD specification. JSON-LD is a format for encoding linked data in JSON, and in the future will be used by Google to index metadata for searches. Using the standardized JSON-LD schema will increase the visibility of datasets and lets projects write code once and reuse it for periodic uploads of datasets to ESS-DIVE.


---

⭐ Current Maximum Upload Limit: **500 GB per upload attempt**
Please contact ess-dive-support@lbl.gov to submit datasets above this upload limit.

While learning the expected metadata schema, use https://api-sandbox.ess-dive.lbl.gov, the domain shown in the examples in this documentation. Once you are familiar with ESS-DIVE's metadata and dataset JSON-LD schema, use the production domain https://api.ess-dive.lbl.gov/ to submit datasets to ESS-DIVE for publishing and review.

For additional information about the API, review the documentation at https://api-sandbox.ess-dive.lbl.gov.


---

ESS-DIVE Test API URL: https://api-sandbox.ess-dive.lbl.gov
ESS-DIVE Production API URL: https://api.ess-dive.lbl.gov/
Help Desk: ess-dive-support@lbl.gov
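
Because the two environments differ only in their base URL, one way to keep a script portable is to select the URL by name. The sketch below is only an illustration: the `API_BASE_URLS` dictionary and the `get_base_url` function are hypothetical names and not part of the ESS-DIVE API. The URLs are the two listed above.

```python
# Base URLs copied from the list above; the names used here are illustrative only.
API_BASE_URLS = {
    "sandbox": "https://api-sandbox.ess-dive.lbl.gov/",
    "production": "https://api.ess-dive.lbl.gov/",
}


def get_base_url(environment="sandbox"):
    """Return the API base URL for the named environment (hypothetical helper)."""
    try:
        return API_BASE_URLS[environment]
    except KeyError:
        raise ValueError(
            f"Unknown environment {environment!r}; expected one of {sorted(API_BASE_URLS)}"
        ) from None


# Default to the sandbox while practicing.
base = get_base_url("sandbox")
print(base)
```

Defaulting to the sandbox means a submission only reaches production when that is requested explicitly.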

## 1. Get Authentication Token


1. Go to https://data-sandbox.ess-dive.lbl.gov
2. Sign in with ORCID
3. Click your name in the right-hand corner and select My Profile
4. Click Settings > Authentication Token
5. Scroll down and click Copy on the “Token” tab to get your authentication token

---
When you're ready to publish your dataset on production:
1. If you are not already registered to submit data with ESS-DIVE, follow the steps in the New Contributor Registration guide.





## 2. Setup

In [None]:
pip install requests

Enter your Authentication Token below. See step 1 for instructions to access your token through ESS-DIVE.

*Always re-run this cell when you update your token. Tokens expire every 24 hours.*

In [None]:
import requests
import os
import json
from ipywidgets import widgets, interact

token_text = widgets.Text("", description="Token:")
display(token_text)

In [None]:
token = token_text.value
base = "https://api-sandbox.ess-dive.lbl.gov/"
header_authorization =  "bearer {}".format(token)
endpoint = "packages"
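
A pasted token can pick up stray whitespace, and an empty token only fails once a request is sent. The following is a minimal sketch of a guard, assuming the same `bearer <token>` header format used in the cell above. `build_auth_headers` is a hypothetical helper name, and the token value is a placeholder.

```python
def build_auth_headers(token):
    """Return the Authorization header dict for a pasted token (hypothetical helper)."""
    cleaned = token.strip()
    if not cleaned:
        raise ValueError("Token is empty; copy it from your profile settings first.")
    return {"Authorization": "bearer {}".format(cleaned)}


# Example with a placeholder value, not a real token.
headers = build_auth_headers("  example-token \n")
print(headers)
```

The returned dictionary can be passed directly as the `headers` argument of a `requests` call.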

## 3. Create Metadata 
The following cells build the JSON-LD metadata for a single dataset.


In [None]:
provider_spruce = {
   "name": "SPRUCE",
   "member": {
     "@id": "http://orcid.org/0000-0001-7293-3561",
     "givenName": "Paul J",
     "familyName": "Hanson",
     "email": "hansonpj@ornl.gov",
     "jobTitle": "Principal Investigator"
   }
 }

**Coming soon: Use project identifier instead of manually entering project metadata.**

In [None]:
#provider_spruce = {
#            "identifier": {
#            "@type": "PropertyValue",
#                "propertyID": "ess-dive",
#                "value": "1e6d50d3-9532-43fb-a63f-bdcb4350bf0c"
#   }
# }

Prepare the dataset authors in the order that you would like them to appear in the citation. Please add the ORCID for all authors, especially the first author, if possible. 

In [None]:
creators =  [
   {
     "@id": "http://orcid.org/0000-0001-7293-3561",
     "givenName": "Paul J",
     "familyName": "Hanson",
     "affiliation": "Oak Ridge National Laboratory",
     "email": "hansonpj@ornl.gov"
   },
   {
     "givenName": "Jeffrey",
     "familyName": "Riggs",
     "affiliation": "Oak Ridge National Laboratory"
   },
   {
     "givenName": "C",
     "familyName": "Nettles",
     "affiliation": "Oak Ridge National Laboratory"
   },
   {
     "givenName": "William",
     "familyName": "Dorrance",
     "affiliation": "Oak Ridge National Laboratory"
   },
   {
     "givenName": "Les",
     "familyName": "Hook",
     "affiliation": "Oak Ridge National Laboratory"
   }
 ]
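
Since ORCIDs are requested for all authors, it can help to list which entries still lack one before submitting. The sketch below assumes ORCIDs are stored under the `"@id"` key as in the example above. `creators_missing_orcid` and `sample_creators` are hypothetical names used only for illustration.

```python
def creators_missing_orcid(creators):
    """Return the full names of creators that have no ORCID in their "@id" entry."""
    return [
        "{} {}".format(c.get("givenName", ""), c.get("familyName", "")).strip()
        for c in creators
        if "orcid.org/" not in c.get("@id", "")
    ]


# Shortened copy of the author list above, used only for this demonstration.
sample_creators = [
    {
        "@id": "http://orcid.org/0000-0001-7293-3561",
        "givenName": "Paul J",
        "familyName": "Hanson",
    },
    {"givenName": "Jeffrey", "familyName": "Riggs"},
]

print(creators_missing_orcid(sample_creators))
```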

Add dataset title

In [None]:
dataset_title = "title_here"

Create the rest of the JSON-LD object


In [None]:
json_ld = {
 "@context": "http://schema.org/",
 "@type": "Dataset",
 "@id": "http://dx.doi.org/10.3334/CDIAC/spruce.001",
 "name": dataset_title,
 "description": [
   "This data set reports selected ambient environmental monitoring data from the S1 bog in Minnesota for the period June 2010 through December 2016. Measurements of the environmental conditions at these stations will serve as a pre-treatment baseline for experimental treatments and provide driver data for future modeling activities.",
   "The site is the S1 bog, a Picea mariana [black spruce] - Sphagnum spp. bog forest in northern Minnesota, 40 km north of Grand Rapids, in the USDA Forest Service Marcell Experimental Forest (MEF). There are/were three monitoring sites located in the bog: Stations 1 and 2 are co-located at the southern end of the bog and Station 3 is located north central and adjacent to an existing U.S. Forest Service monitoring well.",
   "There are eight data files with selected results of ambient environmental monitoring in the S1 bog for the period June 2010 through December 2016. One file has the ",
   "other seven have the available data for a given calendar year. Not all measurements started in June 2010 and EM3 measurements ended in May 2014.",
   "Further details about the data package are in the attached pdf file (SPRUCE_EM_DATA_2010_2016_20170620)."
 ],
 "creator": creators,
 "datePublished": "2015",
 "keywords": [
   "EARTH SCIENCE > BIOSPHERE > VEGETATION",
   "Climate Change"
 ],
 "variableMeasured": [
   "EARTH SCIENCE > ATMOSPHERE > ATMOSPHERIC TEMPERATURE > SURFACE TEMPERATURE > AIR TEMPERATURE",
   "EARTH SCIENCE > ATMOSPHERE > ATMOSPHERIC WATER VAPOR > WATER VAPOR INDICATORS > HUMIDITY > RELATIVE HUMIDITY",
   "EARTH SCIENCE > ATMOSPHERE > ATMOSPHERIC PRESSURE > SEA LEVEL PRESSURE",
   "EARTH SCIENCE > ATMOSPHERE > ATMOSPHERIC TEMPERATURE > SURFACE TEMPERATURE > DEW POINT TEMPERATURE > DEWPOINT DEPRESSION",
   "EARTH SCIENCE > ATMOSPHERE > ATMOSPHERIC WINDS > SURFACE WINDS > WIND SPEED",
   "EARTH SCIENCE > ATMOSPHERE > ATMOSPHERIC WINDS > SURFACE WINDS > WIND DIRECTION",
   "EARTH SCIENCE > BIOSPHERE > VEGETATION > PHOTOSYNTHETICALLY ACTIVE RADIATION",
   "EARTH SCIENCE > ATMOSPHERE > ATMOSPHERIC RADIATION > NET RADIATION",
   "EARTH SCIENCE > LAND SURFACE > SURFACE RADIATIVE PROPERTIES > ALBEDO",
   "EARTH SCIENCE > LAND SURFACE > SOILS > SOIL TEMPERATURE",
   "Precipitation (Total)",
   "Irradiance",
   "Groundwater Temperature",
   "Groundwater Level",
   "Volumetric Water Content",
   "surface_albedo"
 ],
 "license": "http://creativecommons.org/licenses/by/4.0/",
 "spatialCoverage": [
   {
     "description": "Site ID: S1 Bog Site name: S1 Bog, Marcell Experimental Forest Description: The site is the 8.1-ha S1 bog, a Picea mariana [black spruce] - Sphagnum spp. ombrotrophic bog forest in northern Minnesota, 40 km north of Grand Rapids, in the USDA Forest Service Marcell Experimental Forest (MEF). The S1 bog was harvested in successive strip cuts in 1969 and 1974 and the cut areas were allowed to naturally regenerate. Stations 1 and 2 are located in a 1974 strip that is characterized by a medium density of 3-5 meter black spruce and larch trees with an open canopy. The area was suitable for siting a monitoring station for representative meteorological conditions on the S1 bog. Station 3 is located in a 1969 harvest strip that is characterized by a higher density of 3-5 meter black spruce and larch trees with a generally closed canopy. Measurements at this station represent conditions in the surrounding stand. Site Photographs are in the attached document",
     "geo": [
       {
         "name": "Northwest",
         "latitude": 47.50285,
         "longitude": -93.48283
       },
       {
         "name": "Southeast",
         "latitude": 47.50285,
         "longitude": -93.48283
       }
     ]
   }
 ],
 "funder": {
   "@id": "http://dx.doi.org/10.13039/100006206",
   "name": "U.S. DOE > Office of Science > Biological and Environmental Research (BER)"
 },
 "temporalCoverage": {
   "startDate": "2010-07-16",
   "endDate": "2016-12-31"
 },
 "editor": {
   "@id": "http://orcid.org/0000-0001-7293-3561",
   "givenName": "Paul J",
   "familyName": "Hanson",
   "email": "hansonpj@ornl.gov"
 },
 "provider": provider_spruce,
 "measurementTechnique": [
   "The stations are equipped with standard sensors for measuring meteorological parameters, solar radiation, soil temperature and moisture, and groundwater temperature and elevation. Note that some sensor locations are relative to nearby vegetation and bog microtopographic features (i.e., hollows and hummocks). See Table 1 in the attached pdf (SPRUCE_EM_DATA_2010_2016_20170620) for a list of measurements and further details. Sensors and data loggers were initially installed and became operational in June, July, and August of 2010. Additional sensors were added in September 2011. Station 3 was removed from service on May 12, 2014.",
   "These data are considered at Quality Level 1. Level 1 indicates an internally consistent data product that has been subjected to quality checks and data management procedures. Established calibration procedures were followed."
 ]
}
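
Before sending the object, a quick local check can catch obvious slips such as an empty field or reversed dates. This is only a sketch and does not replace the validation performed by the API. The list of expected keys is an assumption drawn from the example above, not the official set of required fields, and `check_metadata` is a hypothetical helper. It also assumes dates are written as `YYYY-MM-DD`.

```python
from datetime import date


def check_metadata(metadata, expected_keys):
    """Return a list of human-readable problems found in a JSON-LD dict."""
    problems = [
        "missing key: {}".format(key) for key in expected_keys if not metadata.get(key)
    ]
    coverage = metadata.get("temporalCoverage") or {}
    start, end = coverage.get("startDate"), coverage.get("endDate")
    # Compare only when both dates are present (assumed YYYY-MM-DD format).
    if start and end and date.fromisoformat(start) > date.fromisoformat(end):
        problems.append("temporalCoverage: startDate is after endDate")
    return problems


# Assumed key list taken from the example above, not from the API specification.
EXPECTED_KEYS = ["@context", "@type", "name", "description", "creator", "license"]

# Deliberately flawed sample: no description, and the dates are swapped.
sample = {
    "@context": "http://schema.org/",
    "@type": "Dataset",
    "name": "title_here",
    "creator": [{"givenName": "Paul J", "familyName": "Hanson"}],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "temporalCoverage": {"startDate": "2016-12-31", "endDate": "2010-07-16"},
}

for problem in check_metadata(sample, EXPECTED_KEYS):
    print(problem)
```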

## Submit your dataset
There are three options for creating a new dataset:

*   submit metadata only
*   submit metadata and a single data file
*   submit metadata and multiple data files 



### Metadata Only
Use the following cell to submit only your JSON-LD object.

In [None]:
post_packages_url = "{}{}".format(base,endpoint)
post_package_response = requests.post(post_packages_url,
                                      headers={"Authorization":header_authorization},
                                      json=json_ld)

if post_package_response.status_code == 201:
    # Success
    response=post_package_response.json()
    print(f"View URL:{response['viewUrl']}")
    print(f"Name:{response['dataset']['name']}")
else:
    # There was an error
    print(post_package_response.text)
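
The same success-or-error check recurs after every request in this notebook, so it could be factored into one function. The sketch below uses a stand-in object rather than a live request. `summarize_package_response` is a hypothetical helper, and the only response keys it relies on are `viewUrl` and `dataset.name`, which appear in the cell above.

```python
from types import SimpleNamespace


def summarize_package_response(response, expected_status=201):
    """Return (view_url, name) on success, or raise RuntimeError with the server's message."""
    if response.status_code != expected_status:
        raise RuntimeError(
            "Request failed ({}): {}".format(response.status_code, response.text)
        )
    body = response.json()
    return body["viewUrl"], body["dataset"]["name"]


# Stand-in object mimicking the attributes used above; not a real API response.
fake_response = SimpleNamespace(
    status_code=201,
    text="",
    json=lambda: {
        "viewUrl": "https://example.org/view/123",
        "dataset": {"name": "title_here"},
    },
)

view_url, name = summarize_package_response(fake_response)
print(f"View URL:{view_url}")
print(f"Name:{name}")
```

For update requests, which report success with status 200 in this notebook, the same function would be called with `expected_status=200`.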

### Metadata and Single Data File
To submit the JSON-LD object along with a data file, use the following cell block. Replace "file_path" with the path to your file.

In [None]:
upload_file = "file_path"

post_packages_url = "{}{}".format(base, endpoint)

# Open the data file in a with-block so it is closed once the upload finishes
with open(upload_file, "rb") as data_file:
    files_tuples_array = [
        ("json-ld", json.dumps(json_ld)),
        ("data", data_file),
    ]
    post_package_response = requests.post(post_packages_url,
                                          headers={"Authorization": header_authorization},
                                          files=files_tuples_array)

if post_package_response.status_code == 201:
    # Success
    response=post_package_response.json()
    print(f"View URL:{response['viewUrl']}")
    print(f"Name:{response['dataset']['name']}")
else:
    # There was an error
    print(post_package_response.text)


### Metadata and Multiple Data Files
If you have many files to upload, place them all in one directory and set `files_upload_directory` to its path:

In [None]:
files_tuples_array = []
files_upload_directory = "path/to/your_directory"
files = os.listdir(files_upload_directory)

files_tuples_array.append(("json-ld", json.dumps(json_ld)))

for filename in files:
   file_path = os.path.join(files_upload_directory, filename)
   # Skip subdirectories; only regular files can be opened for upload
   if os.path.isfile(file_path):
      files_tuples_array.append(("data", open(file_path, 'rb')))

post_packages_url = "{}{}".format(base,endpoint)
post_package_response = requests.post(post_packages_url,
                                    headers={"Authorization":header_authorization},
                                    files= files_tuples_array)

if post_package_response.status_code == 201:
   # Success
   response=post_package_response.json()
   print(f"View URL:{response['viewUrl']}")
   print(f"Name:{response['dataset']['name']}")
else:
   # There was an error
   print(post_package_response.text)
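
The loop above leaves its file handles open after the request returns. One possible way to close them reliably is `contextlib.ExitStack`, sketched below with a throwaway directory so that it runs without real data or a network call. `open_data_files` is a hypothetical helper, and the `requests.post` call is shown only as a comment.

```python
import json
import os
import tempfile
from contextlib import ExitStack


def open_data_files(stack, directory):
    """Open every regular file in `directory`; the ExitStack closes them later."""
    parts = []
    for filename in sorted(os.listdir(directory)):
        path = os.path.join(directory, filename)
        if os.path.isfile(path):
            parts.append(("data", stack.enter_context(open(path, "rb"))))
    return parts


# Demonstration with a temporary directory instead of real data files.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ("a.csv", "b.csv"):
        with open(os.path.join(demo_dir, name), "w") as handle:
            handle.write("col\n1\n")
    os.mkdir(os.path.join(demo_dir, "subdir"))  # skipped: not a regular file

    with ExitStack() as stack:
        multipart = [("json-ld", json.dumps({"name": "title_here"}))]
        multipart += open_data_files(stack, demo_dir)
        part_names = [os.path.basename(f.name) for _, f in multipart[1:]]
        # requests.post(url, headers=..., files=multipart) would be called here.

    # Leaving the ExitStack block closes every opened file.
    all_closed = all(f.closed for _, f in multipart[1:])

print(part_names, all_closed)
```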

## Revise Existing Datasets
You can update the metadata, the data files, or both for an existing dataset. The following update scenarios are possible:

*   update metadata only
*   replace/add files only
*   both update metadata and replace/add files. 

These examples will demonstrate both updating metadata and adding new files to the dataset created in previous sections.

### Update metadata only
Use the PUT method to update the metadata of a dataset. This example updates the title (name) of a dataset. You will need the ESS-DIVE identifier of the dataset that you want to revise.

In [None]:
dataset_id = input('Enter an ESS-DIVE Identifier here: ')

In [None]:
put_package_url = "{}{}/{}".format(base,endpoint, dataset_id)

metadata_update_dict = {"name": "Updated Dataset Name"}

put_package_response = requests.put(put_package_url,
                                    headers={"Authorization":header_authorization},
                                    json=metadata_update_dict)

Check the results for the changed metadata attribute

In [None]:
# Check for errors
if put_package_response.status_code == 200:
   # Success
   response=put_package_response.json()
   print(f"View URL:{response['viewUrl']}")
   print(f"Name:{response['dataset']['name']}")
else:
   # There was an error
   print(put_package_response.text)
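
To keep a local copy of the metadata in step with what was sent, the same change can be applied to the local dictionary. This sketch assumes the service replaces only the top-level keys included in the PUT body, which the single-key example above suggests but which should be confirmed against the API documentation. `apply_update_locally` is a hypothetical helper.

```python
def apply_update_locally(metadata, update):
    """Return a new dict in which top-level keys from `update` override `metadata`."""
    merged = dict(metadata)
    merged.update(update)
    return merged


# Shortened stand-in for the full JSON-LD object built earlier.
local_metadata = {"name": "title_here", "datePublished": "2015"}
updated = apply_update_locally(local_metadata, {"name": "Updated Dataset Name"})
print(updated)
```

The original dictionary is left unchanged, so it can still be compared with the response returned by the server.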

### Metadata plus a new data file
Use the PUT method to update a dataset. This example updates the dataset's publication date to 2019 and adds a new data file.

In [None]:
dataset_id = input('Enter an ESS-DIVE Identifier here: ')

In [None]:
upload_file = "path/to/your_file"

# Metadata change for this example: set the publication date to 2019
metadata_update_dict = {"datePublished": "2019"}

put_package_url = "{}{}/{}".format(base, endpoint, dataset_id)

# Open the data file in a with-block so it is closed once the upload finishes
with open(upload_file, "rb") as data_file:
    files_tuples_array = [
        ("json-ld", json.dumps(metadata_update_dict)),
        ("data", data_file),
    ]
    put_package_response = requests.put(put_package_url,
                                        headers={"Authorization": header_authorization},
                                        files=files_tuples_array)

Check the results for the changed metadata attribute.


In [None]:
# Check for errors
if put_package_response.status_code == 200:
    # Success
    response=put_package_response.json()
    print(f"View URL:{response['viewUrl']}")
    print(f"Date Published:{response['dataset']['datePublished']}")
    print(f"Files In Dataset:{response['dataset']['distribution']}")
else:
   # There was an error
   print(put_package_response.text)

Retrieve the list of datasets with a GET request to confirm the update.

In [None]:
get_packages_url = "{}{}".format(base,endpoint)
get_packages_response = requests.get(get_packages_url, 
    headers={"Authorization":header_authorization})

if get_packages_response.status_code == 200:
   #Success
   print(get_packages_response.json())
else:
   # There was an error
   print(get_packages_response.text)