# About

This notebook tests the Dataverse API and documents a curation workflow for depositing data into a repository (covering only the preparation and deposit steps of that workflow). The tests in this notebook are run against Dataverse v5.13.

See [_about_dataverseTest.md](./_about_dataverseTest.md) for instructions on configuring and running this notebook and for its technical details. Those instructions live in a separate file so the notebook stays uncluttered for readers who already know Python.

See the `CHANGELOG.md` file for issues needing to be addressed and recent changes.

In [1]:
# *** RESTART THE NOTEBOOK KERNEL IF YOU MAKE EDITS TO THE _worker_dataverseTest.py script or configuration ***

# run the _installer_dataverseTest.py script and import our _worker_dataverseTest.py script
import _installer_dataverseTest
# %load_ext autoreload  # do not use this with the 'logger' plugin otherwise duplicate logging messages will appear
# %autoreload all
from _worker_dataverseTest import Worker
# we need the 'autoreload' above if we are actively making changes to the worker.py module and want to reload any changes to the module without restarting the notebook kernel
# NOTE: if we make changes to the worker script or configuration we need to rerun this code block for the notebook to use the new edits
objWorker = Worker("_config_dataverseTest.json") # initialize our Worker object; we should only need to call this once for the notebook session (working with 'demo' configuration)

Collecting git+https://github.com/kuhlaid/DvApiMod5.13
  Cloning https://github.com/kuhlaid/DvApiMod5.13 to /tmp/pip-req-build-x1iqihy4


  Running command git clone --filter=blob:none --quiet https://github.com/kuhlaid/DvApiMod5.13 /tmp/pip-req-build-x1iqihy4


  Resolved https://github.com/kuhlaid/DvApiMod5.13 to commit fb7cf1a5d78819b0b4bef8bf3200e42c4e968c19
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
  Preparing metadata (pyproject.toml): started
  Preparing metadata (pyproject.toml): finished with status 'done'


2024-09-16 16:57:27 EST DvApiMod_pip_package [INFO] Finished ObjDvApi init
2024-09-16 16:57:27 EST _worker_dataverseTest [INFO] Finished installing and importing modules for the _config_dataverseTest.json environment


In [None]:
! python --version

## About the notebook code

The code blocks in this notebook are intentionally brief because most users are not concerned, at least initially, with what the code looks like. If you want to know what the scripts do, review the .py files that we imported into this notebook. We will, however, briefly describe one line of code so you have a general idea of what is happening behind the scenes.

The `objWorker.ObjDvApi.DvCreateCollection()` command, for example, runs the `DvCreateCollection()` method of the `ObjDvApi` object, which makes a Dataverse API request to create a new repository/collection. `ObjDvApi` is defined in an external Python file that contains reusable methods for working with the Dataverse API. We use this same class for all of our datasets, so keeping the methods in a single file is better than copying them into the code for each dataset and making our working scripts longer than they need to be.

### The objWorker

The `objWorker` is the object that we customize for each dataset; it acts as a template to which we attach the other classes/objects we need. For instance, we attach `ObjDvApi` to `objWorker` so that whatever functionality exists in the `ObjDvApi` class can be reached through `objWorker`. The `.` in `objWorker.ObjDvApi` means that `ObjDvApi` is an attribute of `objWorker`. An analogy would be adding a dustpan to a broom (or `broom.dustpan`) to extend the functionality of the broom, so the broom can now be used to pick up dust and not simply push it around.
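
The attachment described above is ordinary object composition. Below is a minimal sketch of the idea using made-up stand-in classes; `FakeDvApi`, `SketchWorker`, and `describe_create_collection` are hypothetical names for illustration and are not the real `ObjDvApi` or `Worker` code.

```python
class FakeDvApi:
    """Hypothetical stand-in for a reusable API helper class."""

    def __init__(self, server_url):
        self.server_url = server_url

    def describe_create_collection(self, alias):
        # A real helper would send an HTTP request here; this sketch only
        # returns a description of the call so it runs without a server.
        return f"would create collection '{alias}' on {self.server_url}"


class SketchWorker:
    """Hypothetical worker that holds a helper object as an attribute."""

    def __init__(self, server_url):
        # Attach the helper; afterwards it is reachable as worker.ObjDvApi
        self.ObjDvApi = FakeDvApi(server_url)


worker = SketchWorker("https://dataverse.example.org")
message = worker.ObjDvApi.describe_create_collection("demo-alias")
print(message)
```

Each `.` simply steps from one object to something stored on it: first from the worker to its helper, then from the helper to one of its methods.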

Below are some simple code commands to set up a Dataverse collection.

## Create a Dataverse Collection

### Configuration

The `_config_dataverseTest.json` file contains a Dataverse starter object, `objDvApi_COLLECTION_START`, used to create a new Dataverse collection. Luckily, we do not need to follow the API documentation that instructs users to create a separate JSON file for use with the API endpoint. Since we added the JSON to our main configuration file, we can simply pass the object in the `json` parameter of our request. We will place this collection under the root 'parent' collection.
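
As a rough illustration of sending a configuration object as the request body, here is a sketch that builds (but does not send) a collection-creation request. It uses only the standard library's `urllib` rather than whatever HTTP client `ObjDvApi` uses internally, and the server URL, API token, function name, and collection values are all placeholder assumptions, not values from our configuration.

```python
import json
import urllib.request


def build_create_collection_request(server_url, api_token, parent_alias, collection):
    """Build (but do not send) a POST request that creates a collection."""
    url = f"{server_url.rstrip('/')}/api/dataverses/{parent_alias}"
    body = json.dumps(collection).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "X-Dataverse-Key": api_token,
            "Content-Type": "application/json",
        },
    )


# Hypothetical configuration; a real file would hold your own values.
config = {
    "objDvApi_COLLECTION_START": {
        "name": "Example Collection",
        "alias": "exampleAlias",
        "dataverseContacts": [{"contactEmail": "curator@example.org"}],
    }
}

request = build_create_collection_request(
    "https://dataverse.example.org",
    "YOUR-API-TOKEN",
    "root",  # parent collection alias
    config["objDvApi_COLLECTION_START"],
)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen()` would actually submit it; that step is left out so the sketch runs without a server.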

### Retrieving our collection info

Since our starter collection information is already defined in our main `_config_dataverseTest.json` file, there is no need to save the collection information returned when the collection is created. We can always use the `viewCollection()` method in our worker script to retrieve the collection information, as long as we know our collection alias.

In [None]:
objWorker.createCollection()  # initialize a new collection

In [None]:
objWorker.viewCollection()  # view information on our dataverse collection

In [None]:
objWorker.deleteCollection()  # delete our dataverse collection

In [None]:
objWorker.viewCollectionContents()  # view dataverse collection contents

## Create a dataset

Our dataset template is the JSON file https://guides.dataverse.org/en/5.13/_downloads/4e04c8120d51efab20e480c6427f139c/dataset-create-new-all-default-fields.json, which is referenced in https://guides.dataverse.org/en/5.13/api/native-api.html#create-a-dataset-in-a-dataverse-collection. We simply add this JSON object to our `_config_dataverseTest.json` file under the `DATAVERSE_DATASET` constant.


In [2]:
objWorker.createDataset("objDvApi_DATASET_INIT_PART")  # create a dataset

2024-09-16 16:57:27 EST _worker_dataverseTest [INFO] start createDataset
2024-09-16 16:57:27 EST DvApiMod_pip_package [INFO] start createDataset
2024-09-16 16:57:27 EST DvApiMod_pip_package [INFO] making request: https://demo-dataverse.rdmc.unc.edu/api/dataverses/jocoknowfd3/datasets
2024-09-16 16:57:27 EST DvApiMod_pip_package [INFO] ----------------------------------------
2024-09-16 16:57:27 EST DvApiMod_pip_package [INFO] response status=201
2024-09-16 16:57:27 EST DvApiMod_pip_package [INFO] headers={'Date': 'Mon, 16 Sep 2024 20:57:34 GMT', 'Server': 'Apache/2.4.37 (Rocky Linux) OpenSSL/1.1.1k', 'Access-Control-Allow-Origin': '*', 'Access-Control-Allow-Methods': 'PUT, GET, POST, DELETE, OPTIONS', 'Access-Control-Allow-Headers': 'Accept, Content-Type, X-Dataverse-Key, Range', 'Access-Control-Expose-Headers': 'Accept-Ranges, Content-Range, Content-Encoding', 'Location': 'https://demo-dataverse.rdmc.unc.edu/datasets/505', 'Content-Type': 'application/json;charset=UTF-8', 'Content-Len

In [None]:
objWorker.deleteDataset()  # delete dataset draft

## Create fake data

Next we need to create some files to test the API.

In [None]:
objWorker.createTestFiles("lstTEST_FILES")

## Adding files to the dataset

The Dataverse API guide is confusing when it comes to handling files, but we have designed the `ObjDvApi` class to handle this for you. However, if you want to know how it works, read on.

### Adding a file that does not exist in the dataset

If you are adding a new file (based on file name) that does not currently exist in the dataset, use the `add file` API endpoint.

### Replacing a file that exists in the dataset

#### A file with the same content exists in the dataset (regardless of metadata)

Dataverse will not allow you to upload a file that currently exists in the dataset with the same MD5 checksum (same content); however, you can replace the metadata for the file. To do this you must use the .

#### File with differing content (regardless of metadata)

If you are uploading a file that already exists in the dataset, use the `file replace` API endpoint; otherwise, the `add file` endpoint will create a duplicate file in your dataset (which you do not want).

When uploading a file to a dataset, it is advisable to check the MD5 hash of the file first. If the MD5 hash is the same and you upload the file to the dataset, then a new file will be added to the dataset with a file name ending in a number. Thus you will end up with two duplicate files in the dataset under two different names (which you should avoid). Our `ObjDvApi` class handles this for you: it includes an MD5 hash checking method that looks for matching MD5 hashes and uses the `file replace` API if the file already exists in the dataset.

**Note: uploading new files (different MD5 hashes) to a dataset draft with existing files of the same names will result in duplicate files being added, so we need to use the `file replace` API instead for existing files.**
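
To make the decision concrete, here is a minimal sketch of an MD5-based check, not the actual `ObjDvApi` implementation. The function names and the `existing_md5_by_name` dictionary are hypothetical; a caller would build that dictionary from the dataset's file listing. The "skip" outcome for unchanged content is our own simplification.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def choose_upload_action(path, existing_md5_by_name):
    """Decide how to send a local file, given the dataset's current files.

    existing_md5_by_name is a hypothetical dict of {file name: MD5 hex digest}.
    """
    name = os.path.basename(path)
    if name not in existing_md5_by_name:
        return "add"      # new file name: use the add file endpoint
    if existing_md5_by_name[name] == md5_of_file(path):
        return "skip"     # same name and content: nothing new to upload
    return "replace"      # same name, new content: use file replace


with tempfile.TemporaryDirectory() as folder:
    sample = os.path.join(folder, "test1.csv")
    with open(sample, "w", encoding="utf-8") as handle:
        handle.write("id,value\n1,a\n")
    current = md5_of_file(sample)
    actions = [
        choose_upload_action(sample, {}),
        choose_upload_action(sample, {"test1.csv": current}),
        choose_upload_action(sample, {"test1.csv": "0" * 32}),
    ]
print(actions)
```

The three calls cover a file that is new to the dataset, a file whose content is unchanged, and a file whose content differs from the copy already in the dataset.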

In [None]:
objWorker.uploadTestFiles("lstTEST_FILES") # initial list of files to upload

## Update dataset metadata

In [3]:
objWorker.updateDatasetMetadata("objDvApi_DATASET_UPDATE")  # update dataset metadata

2024-09-16 16:57:27 EST _worker_dataverseTest [INFO] start updateDatasetMetadata
2024-09-16 16:57:27 EST DvApiMod_pip_package [INFO] start updateDatasetMD
2024-09-16 16:57:27 EST DvApiMod_pip_package [INFO] making request: https://demo-dataverse.rdmc.unc.edu/api/datasets/:persistentId/versions/:draft?persistentId=doi:10.33563/DMO/ZM0DF7
2024-09-16 16:57:27 EST DvApiMod_pip_package [INFO] ----------------------------------------
2024-09-16 16:57:27 EST DvApiMod_pip_package [INFO] response status=200
2024-09-16 16:57:27 EST DvApiMod_pip_package [INFO] headers={'Date': 'Mon, 16 Sep 2024 20:58:04 GMT', 'Server': 'Apache/2.4.37 (Rocky Linux) OpenSSL/1.1.1k', 'Access-Control-Allow-Origin': '*', 'Access-Control-Allow-Methods': 'PUT, GET, POST, DELETE, OPTIONS', 'Access-Control-Allow-Headers': 'Accept, Content-Type, X-Dataverse-Key, Range', 'Access-Control-Expose-Headers': 'Accept-Ranges, Content-Range, Content-Encoding', 'Content-Type': 'application/json;charset=UTF-8', 'Keep-Alive': 'timeout

## Publish dataset

https://guides.dataverse.org/en/5.13/api/native-api.html#publish-a-dataset

In [None]:
objWorker.publishDatasetDraft("major") # we need to determine if the dataverse is published before trying to publish it again

## Dataset version test

Next we will create another set of test files and use them to update the dataset version.

In [None]:
objWorker.createTestFiles("lstTEST_FILES2")

## Create empty draft dataset

This is the best action to take when you want a clean dataset draft, with no files from the last published version that would need to be replaced by incoming files. The `createEmptyDatasetDraft` method uploads an empty file to the dataset to start a new draft (if one does not already exist), queries the dataset for the files currently in the draft, and then removes all of them. This leaves you with an empty dataset draft into which you can import the newest files for your dataset.

As background, this method was created to work around the problem of some types of files (such as zip files) failing to replace existing zip files in the dataset. This problem may occur with other file types as well, so using this method avoids the need for file replacement and the issues with that API endpoint.
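
The three steps can be sketched against an in-memory stand-in so the logic is visible without a server. `InMemoryDataset`, `create_empty_draft`, and the placeholder file name are hypothetical and only mimic the behavior described above; the real method works through the Dataverse API.

```python
class InMemoryDataset:
    """Hypothetical stand-in for a dataset; no network calls are made."""

    def __init__(self, published_files):
        self.published_files = list(published_files)
        self.draft_files = None  # None means no draft exists yet

    def upload(self, name):
        # Uploading to a published dataset opens a draft that starts
        # with a copy of the published files.
        if self.draft_files is None:
            self.draft_files = list(self.published_files)
        self.draft_files.append(name)

    def list_draft_files(self):
        return list(self.draft_files or [])

    def delete_from_draft(self, name):
        self.draft_files.remove(name)


def create_empty_draft(dataset, placeholder="_placeholder.txt"):
    """Apply the three steps: open a draft, list its files, remove them all."""
    if dataset.draft_files is None:
        dataset.upload(placeholder)          # step 1: force a draft to exist
    for name in dataset.list_draft_files():  # step 2: query the draft's files
        dataset.delete_from_draft(name)      # step 3: remove each one
    return dataset.list_draft_files()


dataset = InMemoryDataset(["a.csv", "b.zip"])
remaining = create_empty_draft(dataset)
print(remaining, dataset.published_files)
```

In this sketch the published file list is left untouched; only the draft is emptied.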

In [None]:
objWorker.createEmptyDatasetDraft()

In [None]:
objWorker.uploadTestFiles("lstTEST_FILES2")

## List files in a dataset

Once we have added our new files to the dataset we want to see a list of all files in the draft (make sure to use one of the version specifiers listed in https://guides.dataverse.org/en/latest/api/native-api.html#dataset-version-specifiers).
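
For orientation, here is a sketch of how a file-listing URL with a version specifier might be assembled. The function name, server URL, and DOI are placeholders, and the set of accepted specifiers reflects our reading of the linked guide, so check it against the guide for your Dataverse version.

```python
import re
from urllib.parse import quote

# Named specifiers plus numeric forms such as "1" or "1.0".
_NAMED_VERSIONS = {":draft", ":latest", ":latest-published"}
_NUMERIC_VERSION = re.compile(r"^\d+(\.\d+)?$")


def dataset_files_url(server_url, persistent_id, version=":draft"):
    """Build the URL that lists the files of one dataset version."""
    if version not in _NAMED_VERSIONS and not _NUMERIC_VERSION.match(version):
        raise ValueError(f"unsupported version specifier: {version!r}")
    # Keep ':' and '/' readable in the DOI, as in the request logs above.
    pid = quote(persistent_id, safe=":/")
    return (
        f"{server_url.rstrip('/')}/api/datasets/:persistentId"
        f"/versions/{version}/files?persistentId={pid}"
    )


url = dataset_files_url("https://dataverse.example.org", "doi:10.5072/FK2/EXAMPLE")
print(url)
```

Rejecting unknown specifiers before any request is made turns a typo such as `draft` (without the leading colon) into an immediate, readable error.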

In [None]:
objWorker.viewDatasetFiles(":draft")  # show dataset contents of our draft since this is the version we are interested in for testing