# About

The purpose of this document is to create a Dataverse API testing notebook. These tests are being run against Dataverse v5.13.

See [_about_dataverseTest.md](./_about_dataverseTest.md) for information about configuring and running this notebook and for its technical details. We kept those instructions out of the notebook so it does not get bogged down for readers who already know Python.

See the `CHANGELOG.md` file for issues needing to be addressed.

## Create a Dataverse Collection

### Configuration

Using the Dataverse starter object `DATAVERSE_COLLECTION_START` in our configuration file, we will create a new collection through the API (https://guides.dataverse.org/en/5.13/api/native-api.html#create-a-dataverse-collection). The API documentation instructs users to create a separate JSON file for use with this endpoint. We do not need to do that: the JSON is already in our main configuration file, so we can pass the object directly in the `json` parameter of our request. We will place this collection under the root 'parent' collection.
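
The request behind this step can be sketched with the standard library alone. This is a minimal sketch, not the `ObjDvApi` implementation. It builds, but does not send, a POST to `/api/dataverses/{parent}` with the collection JSON as the body. The server URL, token, and the sample collection dictionary are hypothetical placeholders loosely shaped like `DATAVERSE_COLLECTION_START`.

```python
import json
import urllib.request


def build_create_collection_request(server_url, api_token, parent_alias, collection_json):
    """Build (but do not send) a POST request that creates a collection under parent_alias."""
    url = f"{server_url.rstrip('/')}/api/dataverses/{parent_alias}"
    body = json.dumps(collection_json).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"X-Dataverse-key": api_token, "Content-Type": "application/json"},
        method="POST",  # must be POST; a GET only returns the parent collection
    )


# Hypothetical starter object standing in for DATAVERSE_COLLECTION_START
collection = {
    "name": "Test Collection",
    "alias": "testcollection",
    "dataverseContacts": [{"contactEmail": "someone@example.org"}],
    "dataverseType": "LABORATORY",
}
req = build_create_collection_request("https://demo.example.org", "xxxx-token", "root", collection)
print(req.get_method(), req.full_url)  # POST https://demo.example.org/api/dataverses/root
# urllib.request.urlopen(req) would actually send the request
```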

### Retrieving our collection info

Since our starter collection information is already defined in our main `_config_dataverseTest.json` file, there is no need to save the collection information sent back when the collection is created. We can always use the `viewCollection()` method to retrieve the collection information, as long as we know the collection alias.
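
Retrieving a collection only needs its alias, which is why saving the creation response is unnecessary. The sketch below builds the GET request for `/api/dataverses/{alias}`, which matches the URL in the `viewCollection()` log further down. It can optionally build the `/contents` variant instead. The server URL is a placeholder. The assumption that a token is only needed for unpublished collections is ours.

```python
import urllib.request


def build_view_collection_request(server_url, collection_alias, api_token=None, contents=False):
    """Build (but do not send) a GET request for a collection's info or its contents."""
    url = f"{server_url.rstrip('/')}/api/dataverses/{collection_alias}"
    if contents:
        url += "/contents"  # list datasets and sub-collections instead of the collection info
    # assumption: a token is typically only required for unpublished collections
    headers = {"X-Dataverse-key": api_token} if api_token else {}
    return urllib.request.Request(url, headers=headers, method="GET")


req = build_view_collection_request("https://demo.example.org", "jocoknow")
print(req.get_method(), req.full_url)  # GET https://demo.example.org/api/dataverses/jocoknow
```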

### Issue

Note: If you send a GET request instead of a POST request to this API endpoint, the call may appear to succeed. However, it simply returns the information of the parent collection and does NOT create a new collection for you.
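
Because a mistaken GET still returns `'status': 'OK'`, a status check alone will not catch it. One hedged safeguard is to confirm that the alias in the response is the alias we asked for. The helper name is hypothetical. The response shape (`status`, `data.alias`) follows the log output shown later in this notebook.

```python
def looks_like_new_collection(response_json, expected_alias):
    """Return True only if the response describes the collection we asked to create."""
    if response_json.get("status") != "OK":
        return False
    data = response_json.get("data", {})
    # a GET against the parent endpoint would return the parent's alias (e.g. "root")
    return data.get("alias") == expected_alias


print(looks_like_new_collection({"status": "OK", "data": {"alias": "root"}}, "jocoknow"))  # False
```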

In [1]:
# *** RESTART THE NOTEBOOK KERNEL IF YOU MAKE EDITS TO THE _worker_dataverseTest.py script or configuration ***

# run the _installer_dataverseTest.py script and import our _worker_dataverseTest.py script
import _installer_dataverseTest
# %load_ext autoreload  # do not use this with the 'logger' plugin otherwise duplicate logging messages will appear
# %autoreload all
from _worker_dataverseTest import Worker
# enable the 'autoreload' lines above only if we are actively editing the worker module and want changes reloaded without restarting the notebook kernel
# NOTE: if we make changes to the worker script or configuration we need to rerun this code block for the notebook to use the new edits
objWorker = Worker("_config_dataverseTest.json") # initialize our Worker object; we should only need to call this once for the notebook session (working with 'demo' configuration)

Collecting DvApiMod_pip_package@ git+https://github.com/kuhlaid/DvApiMod5.13.git
  Cloning https://github.com/kuhlaid/DvApiMod5.13.git to /tmp/pip-install-d32hu4ca/dvapimod-pip-package_a56923c3d5e9489b812a70e0edac6176


  Running command git clone --filter=blob:none --quiet https://github.com/kuhlaid/DvApiMod5.13.git /tmp/pip-install-d32hu4ca/dvapimod-pip-package_a56923c3d5e9489b812a70e0edac6176


  Resolved https://github.com/kuhlaid/DvApiMod5.13.git to commit fdb481afb1262b9de98221b314cf29365eb44554
  Preparing metadata (setup.py): started
  Preparing metadata (setup.py): finished with status 'done'
Building wheels for collected packages: DvApiMod_pip_package
  Building wheel for DvApiMod_pip_package (setup.py): started
  Building wheel for DvApiMod_pip_package (setup.py): finished with status 'done'
  Created wheel for DvApiMod_pip_package: filename=DvApiMod_pip_package-1.0-py3-none-any.whl size=1993 sha256=4b2a84a8668d7e55086ef018dc68456d90823abd626a8ab8878fee9fd3b1f94e
  Stored in directory: /tmp/pip-ephem-wheel-cache-tv_4ubu2/wheels/2d/2b/6a/e26e442182023f1df598097e62088495e948911c707b25907b
Successfully built DvApiMod_pip_package
Installing collected packages: DvApiMod_pip_package
Successfully installed DvApiMod_pip_package-1.0


2024-08-27 20:34:46 EST DvApiMod [INFO] Finished ObjDvApi init
2024-08-27 20:34:46 EST _worker_dataverseTest [INFO] Finished installing and importing modules for the _config_dataverseTest.json environment


## About the notebook code

The code blocks in this notebook are intentionally brief because most users are not concerned with what the code looks like (at least initially). If you want to know what the scripts do, review the .py files that we imported into this notebook. However, we will briefly describe one line of code so you have a general idea of what is happening behind the scenes.

For example, the `objWorker.ObjDvApi.createCollection()` command runs the `createCollection()` method of the `ObjDvApi` object, which makes a Dataverse API request to create a new repository/collection. `ObjDvApi` is defined in an external Python file that contains reusable methods for working with the Dataverse API. We use this same class for all of our datasets. Keeping the methods in a single file for reuse is better than copying them into the code for each dataset, which would make each working script denser than it needs to be.

### The objWorker

The `objWorker` is the object that we customize for each dataset. It acts as a template to which we attach the different classes/objects we need. For instance, we attach the `ObjDvApi` object to `objWorker`, so whatever functionality exists in the `ObjDvApi` class can be used through `objWorker`. The `.` in `objWorker.ObjDvApi` means that `ObjDvApi` is an attribute of `objWorker`: a component attached to it that extends what it can do. An analogy would be adding a dustpan to a broom (`broom.dustpan`) to extend the broom's functionality, so the broom can now pick up dust and not simply push it around.
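
The broom-and-dustpan idea is ordinary object composition. Below is a minimal sketch with hypothetical stand-in classes (`DvApi`, `Worker`), not the real `ObjDvApi` or `Worker`. The worker stores an API helper object as an attribute, and the dotted call reaches through it.

```python
class DvApi:
    """Hypothetical stand-in for ObjDvApi: reusable Dataverse API helpers."""

    def __init__(self, server_url):
        self.server_url = server_url.rstrip("/")

    def collection_url(self, alias):
        return f"{self.server_url}/api/dataverses/{alias}"


class Worker:
    """Hypothetical stand-in for the per-dataset Worker."""

    def __init__(self, server_url):
        # attach the reusable API object as an attribute (the "dustpan")
        self.ObjDvApi = DvApi(server_url)


objWorker = Worker("https://demo.example.org")
print(objWorker.ObjDvApi.collection_url("jocoknow"))  # https://demo.example.org/api/dataverses/jocoknow
```

Keeping the API helpers in their own class means every dataset's worker can reuse them unchanged.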

Below are some simple code commands to set up a Dataverse collection.

In [None]:
objWorker.ObjDvApi.createCollection()  # initialize a new collection

In [2]:
objWorker.ObjDvApi.viewCollection()  # view information on our dataverse collection

2024-08-27 20:34:46 EST DvApiMod [INFO] start viewCollection
2024-08-27 20:34:46 EST DvApiMod [INFO] making request: https://demo-dataverse.rdmc.unc.edu/api/dataverses/jocoknow
2024-08-27 20:34:46 EST DvApiMod [INFO] ----------------------------------------
2024-08-27 20:34:46 EST DvApiMod [INFO] response status=200
2024-08-27 20:34:46 EST DvApiMod [INFO] headers={'Date': 'Wed, 28 Aug 2024 00:36:20 GMT', 'Server': 'Apache/2.4.37 (Rocky Linux) OpenSSL/1.1.1k', 'Access-Control-Allow-Origin': '*', 'Access-Control-Allow-Methods': 'PUT, GET, POST, DELETE, OPTIONS', 'Access-Control-Allow-Headers': 'Accept, Content-Type, X-Dataverse-Key, Range', 'Access-Control-Expose-Headers': 'Accept-Ranges, Content-Range, Content-Encoding', 'Content-Type': 'application/json;charset=UTF-8', 'Content-Length': '424', 'Keep-Alive': 'timeout=5, max=100', 'Connection': 'Keep-Alive'}
2024-08-27 20:34:46 EST DvApiMod [INFO] json={'status': 'OK', 'data': {'id': 391, 'alias': 'jocoknow', 'name': 'JoCoKnow', 'affilia

In [None]:
# objWorker.ObjDvApi.deleteCollection()  # delete our dataverse collection

In [None]:
objWorker.ObjDvApi.viewCollectionContents()  # view dataverse collection contents

## Create a dataset

Our dataset template is the JSON file https://guides.dataverse.org/en/5.13/_downloads/4e04c8120d51efab20e480c6427f139c/dataset-create-new-all-default-fields.json, which is referenced in https://guides.dataverse.org/en/5.13/api/native-api.html#create-a-dataset-in-a-dataverse-collection. We simply add this JSON object to our `_config_dataverseTest.json` file under the `DATAVERSE_DATASET` key.
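
Before sending the template, it can help to list each citation field along with its `multiple` flag. Mismatches with the server schema surface as parse errors, as in the issue at the end of this notebook. This is a hypothetical helper run on a trimmed-down template. It assumes the `datasetVersion.metadataBlocks.citation.fields` layout used by the Dataverse sample file.

```python
import json


def summarize_citation_fields(dataset_json):
    """Return (typeName, multiple) pairs for the citation block of a dataset template."""
    citation = dataset_json["datasetVersion"]["metadataBlocks"]["citation"]
    return [(field["typeName"], field["multiple"]) for field in citation["fields"]]


# Hypothetical, trimmed-down template; in the notebook this would come from
# json.load(...)["DATAVERSE_DATASET"] on _config_dataverseTest.json
template_text = """
{"datasetVersion": {"metadataBlocks": {"citation": {"fields": [
  {"typeName": "title", "multiple": false, "typeClass": "primitive", "value": "Replication Data"},
  {"typeName": "productionPlace", "multiple": false, "typeClass": "primitive", "value": "ProductionPlace"}
]}}}}
"""
template = json.loads(template_text)
for type_name, multiple in summarize_citation_fields(template):
    print(type_name, multiple)
```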


In [None]:
objWorker.createDataset()  # create a partial dataset
# NOTE: we need the JSON object of the full dataset initialization (waiting for RDMC response)

In [None]:
objWorker.deleteDataset()  # delete dataset draft

## Create fake data

Next we need to create some files to test the API.
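
A minimal sketch of generating throwaway test files follows. It is not the actual `createTestFiles()` method. The file names are hypothetical stand-ins for the `lstTEST_FILES` list in our configuration, and writing random hex text is our own choice. Random content almost certainly gives each file a different MD5, which matters for the upload checks below.

```python
import os
import tempfile


def create_test_files(directory, file_names, n_bytes=32):
    """Write one small file of random hex text per name and return the paths."""
    paths = []
    for name in file_names:
        path = os.path.join(directory, name)
        with open(path, "w", encoding="utf-8") as fh:
            fh.write(os.urandom(n_bytes).hex())  # 2 hex characters per random byte
        paths.append(path)
    return paths


# Hypothetical file names standing in for "lstTEST_FILES"
with tempfile.TemporaryDirectory() as tmp_dir:
    created = create_test_files(tmp_dir, ["test_1.txt", "test_2.txt"])
    sizes = [os.path.getsize(p) for p in created]
print([os.path.basename(p) for p in created], sizes)  # ['test_1.txt', 'test_2.txt'] [64, 64]
```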

In [None]:
objWorker.createTestFiles("lstTEST_FILES")

## Adding files to the dataset

The Dataverse API guide is confusing when it comes to handling files, but we have designed the `ObjDvApi` class to handle this for you. However, if you want to know how it works, read on.

### Adding a file that does not exist in the dataset

If you are adding a new file (identified by file name) that does not yet exist in the dataset, use the `add file` API endpoint.
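
The `add file` endpoint in the 5.13 native API takes the dataset's persistent ID as a query parameter. It expects a multipart upload with a `file` part and an optional `jsonData` part for metadata. The sketch below only builds the URL and metadata. The DOI, server URL, and metadata values are placeholders. The commented `requests` call assumes that third-party package is installed.

```python
import json
from urllib.parse import urlencode


def add_file_url(server_url, persistent_id):
    """URL of the native 'add file' endpoint for the dataset with this persistent ID."""
    query = urlencode({"persistentId": persistent_id})
    return f"{server_url.rstrip('/')}/api/datasets/:persistentId/add?{query}"


# Optional file metadata, sent as the multipart form field "jsonData"
json_data = json.dumps({"description": "Test file", "restrict": "false"})
url = add_file_url("https://demo.example.org", "doi:10.5072/FK2/ABCDEF")
print(url)
# With the third-party 'requests' package the upload would look roughly like:
# requests.post(url, headers={"X-Dataverse-key": token},
#               files={"file": open(path, "rb")}, data={"jsonData": json_data})
```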

### Replacing a file that exists in the dataset

#### A file with the same content exists in the dataset (regardless of metadata)

Dataverse will not allow you to upload a file with the same MD5 checksum (the same content) as a file that already exists in the dataset; however, you can replace the metadata for the file. To do this you must use the .

#### File with differing content (regardless of metadata)

If you are uploading a file that already exists in the dataset, use the `file replace` API endpoint. Using the `add file` endpoint instead will create a duplicate file in your dataset (which you do not want).
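
Unlike `add file`, the `file replace` endpoint addresses the existing file by its database ID (`/api/files/{id}/replace`). It takes the same `file` and `jsonData` multipart parts. The sketch below only builds the URL. The ID and server URL are placeholders.

```python
import json


def replace_file_url(server_url, file_id):
    """URL of the native 'file replace' endpoint for an existing file's database ID."""
    return f"{server_url.rstrip('/')}/api/files/{int(file_id)}/replace"


# Metadata for the replacement, sent as the multipart form field "jsonData"
# (the new file content goes in the "file" form field)
json_data = json.dumps({"description": "Updated test file"})
print(replace_file_url("https://demo.example.org", 1234))  # https://demo.example.org/api/files/1234/replace
```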

When we upload a file to a dataset, it is advisable to check the MD5 hash of the file first. Our `ObjDvApi` class handles this for you. If a file with the same name but a different MD5 hash already exists and you upload the new file with the `add file` endpoint, a new file will be added to the dataset with a number appended to its name. You will then end up with two versions of the file under two different names, which you should avoid. We have added an MD5 hash checking method to our `ObjDvApi` class that compares hashes and uses the `file replace` API for files that already exist in the dataset.
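
The decision described above can be sketched as: add new names, skip identical content, replace changed content. This is a hypothetical outline, not the actual `ObjDvApi` method. It assumes we have already collected each existing file's ID and MD5 from the dataset's file listing into a plain dictionary.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=65536):
    """Compute a file's MD5 hex digest without reading it all into memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def choose_upload_action(local_name, local_md5, dataset_files):
    """dataset_files maps file name -> (file id, md5) for files already in the dataset."""
    if local_name not in dataset_files:
        return ("add", None)  # new name: use the 'add file' endpoint
    file_id, remote_md5 = dataset_files[local_name]
    if remote_md5 == local_md5:
        return ("skip", file_id)  # same content: at most update metadata
    return ("replace", file_id)  # same name, new content: use 'file replace'


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "a.txt")
    with open(path, "wb") as fh:
        fh.write(b"hello")
    local_md5 = md5_of_file(path)
print(local_md5)  # 5d41402abc4b2a76b9719d911017c592
```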

**Note: uploading new files (different MD5 hashes) to a dataset draft with existing files of the same names will result in duplicate files being added, so we need to use the `file replace` API instead for existing files.**

In [None]:
objWorker.uploadTestFiles("lstTEST_FILES") # initial list of files to upload

## Publish dataset

https://guides.dataverse.org/en/5.13/api/native-api.html#publish-a-dataset
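
Publishing is a POST to the dataset's `actions/:publish` endpoint. The `type` parameter selects a `major` or `minor` release. The sketch below only builds the URL and accepts just those two values. The DOI and server URL are placeholders.

```python
from urllib.parse import urlencode


def publish_dataset_url(server_url, persistent_id, release_type="major"):
    """URL for publishing a dataset draft (send it as a POST with an API token)."""
    if release_type not in ("major", "minor"):
        raise ValueError(f"unsupported release type: {release_type}")
    query = urlencode({"persistentId": persistent_id, "type": release_type})
    return f"{server_url.rstrip('/')}/api/datasets/:persistentId/actions/:publish?{query}"


print(publish_dataset_url("https://demo.example.org", "doi:10.5072/FK2/ABCDEF"))
```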

In [None]:
objWorker.publishDatasetDraft("major") # we need to determine if the dataverse is published before trying to publish it again

## Dataset version test

Next we will create another set of test files and use them to update the dataset version.

In [None]:
objWorker.createTestFiles("lstTEST_FILES2")

In [None]:
# before we upload the replacement files we need to delete the files we do not want https://guides.dataverse.org/en/latest/api/native-api.html#deleting-files
# or delete after we upload

objWorker.uploadTestFiles("lstTEST_FILES2") # see if we can create a new version of the dataset with a different set of files

## List files in a dataset

Once we have added our new files to the dataset we want to see a list of all files in the draft (make sure to use one of the version specifiers listed in https://guides.dataverse.org/en/latest/api/native-api.html#dataset-version-specifiers).
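
Below is a minimal sketch that checks a version specifier before building the file-listing URL. The specifiers accepted here are `:draft`, `:latest`, `:latest-published`, and numeric `x` or `x.y` versions. The helper names, DOI, and server URL are hypothetical.

```python
import re
from urllib.parse import urlencode

SPECIAL_VERSIONS = {":draft", ":latest", ":latest-published"}


def is_valid_version_specifier(version):
    """Accept the special specifiers or a numeric 'x' / 'x.y' version."""
    return version in SPECIAL_VERSIONS or re.fullmatch(r"\d+(\.\d+)?", version) is not None


def list_files_url(server_url, persistent_id, version=":draft"):
    """URL listing the files of one dataset version."""
    if not is_valid_version_specifier(version):
        raise ValueError(f"unsupported version specifier: {version}")
    query = urlencode({"persistentId": persistent_id})
    return f"{server_url.rstrip('/')}/api/datasets/:persistentId/versions/{version}/files?{query}"


print(list_files_url("https://demo.example.org", "doi:10.5072/FK2/ABCDEF"))
```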

In [None]:
objWorker.viewDatasetFiles(":draft")  # show dataset contents of our draft since this is the version we are interested in for testing

## Deleting files

Deleting files through the native API is supported in later versions of Dataverse, but not in v5.13.
https://guides.dataverse.org/en/latest/api/native-api.html#deleting-files

One alternative that might work for v5.13 is to replace the files (https://guides.dataverse.org/en/5.13/api/native-api.html#replacing-files) with a dummy text file and give it an extension of `.empty` and no content.

Next we want to make sure we can remove any old dataset files we no longer need. To do this, we first upload the new set of files we want. Then we delete any files that are not in the latest list of files we want saved to our dataset.
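
Deciding which files to drop is a set difference between the files in the draft and the names we want to keep. The hypothetical helper below only computes which file IDs to remove. On v5.13 the removal itself would have to use a workaround such as the dummy-file replacement described above.

```python
def files_to_remove(wanted_names, dataset_file_ids):
    """dataset_file_ids maps file name -> file id; return ids of files not in wanted_names."""
    wanted = set(wanted_names)
    return sorted(fid for name, fid in dataset_file_ids.items() if name not in wanted)


print(files_to_remove(["a.txt", "c.txt"], {"a.txt": 1, "b.txt": 2, "d.txt": 5}))  # [2, 5]
```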

In [None]:
objWorker.removeUnusedFiles("lstTEST_FILES2",":draft")

## Issue UNCDVSUP-38 (submitted on 8/17)

I’m trying to use the JSON from https://guides.dataverse.org/en/5.13/_downloads/4e04c8120d51efab20e480c6427f139c/dataset-create-new-all-default-fields.json to create a new dataset in http://demo-dataverse.rdmc.unc.edu. However, I am receiving an error that suggests the JSON properties are incorrectly defined. Below is the response information (the same error message appears in https://github.com/IQSS/dataverse.harvard.edu/issues/172):

json= {'status': 'ERROR', 'message': 'Error parsing Json: incorrect multiple   for field productionPlace'}
headers= {'Date': 'Sat, 17 Aug 2024 16:09:07 GMT', 'Server': 'Apache/2.4.37 (Rocky Linux) OpenSSL/1.1.1k', 'Access-Control-Allow-Origin': '*', 'Access-Control-Allow-Methods': 'PUT, GET, POST, DELETE, OPTIONS', 'Access-Control-Allow-Headers': 'Accept, Content-Type, X-Dataverse-Key, Range', 'Access-Control-Expose-Headers': 'Accept-Ranges, Content-Range, Content-Encoding', 'Content-Type': 'application/json;charset=UTF-8', 'Content-Length': '97', 'Connection': 'close'}
response status= 400

The JSON in question seems to be:

{
  "typeName": "productionPlace",
  "multiple": false,
  "typeClass": "primitive",
  "value": "ProductionPlace"
},

The release notes for 5.13 state: 

Edit the following line to your schema.xml (to indicate that productionPlace is now multiValued="true"):

So I can’t tell whether the UNC Dataverse schema simply needs updating or something else is going on. If I set `"multiple": true` in the JSON, the response is:

json= {'status': 'ERROR', 'message': 'Error parsing Json: Invalid values submitted for productionPlace. It should be an array of values.'}

…but I do not know how to format the JSON for multiple values.
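
Here is a minimal sketch of the array format, assuming the server treats productionPlace as multi-valued, as the second error message indicates. Other multi-valued primitive fields in the same template, such as `dataSources`, appear to carry `"multiple": true` with a JSON array of strings as `value`. The fix would therefore be to wrap the value in a list. The helper name and the trimmed template are hypothetical.

```python
import json


def make_field_multiple(dataset_json, type_name):
    """Mark a primitive citation field as multi-valued and wrap its value in a list."""
    fields = dataset_json["datasetVersion"]["metadataBlocks"]["citation"]["fields"]
    for field in fields:
        if field["typeName"] == type_name:
            field["multiple"] = True
            if not isinstance(field["value"], list):
                field["value"] = [field["value"]]  # multi-valued primitives take a JSON array
            return field
    raise KeyError(f"no citation field named {type_name!r}")


template = {"datasetVersion": {"metadataBlocks": {"citation": {"fields": [
    {"typeName": "productionPlace", "multiple": False, "typeClass": "primitive", "value": "ProductionPlace"}
]}}}}
print(json.dumps(make_field_multiple(template, "productionPlace")))
```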

My Python method for creating the dataset is using POST so that should not be the issue.

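import requests

def DvCreateDataset(self):
    """Create a new dataset in our collection from the DATAVERSE_DATASET template."""
    print("start DvCreateDataset")
    # POST /api/dataverses/{alias}/datasets creates a dataset inside the collection
    strApiEndpoint = '%s/api/dataverses/%s/datasets' % (self.strDATAVERSE_DOMAIN, self._config["DATAVERSE_COLLECTION_START"]["alias"])
    print('making request: %s' % strApiEndpoint)
    objHeaders = {
        "Content-Type": "application/json",
        "X-Dataverse-Key": self.strDATAVERSE_API_TOKEN  # API token of the account creating the dataset
    }
    # send the dataset template from our configuration file as the JSON body
    r = requests.request("POST", strApiEndpoint, json=self._config["DATAVERSE_DATASET"], headers=objHeaders)
    self.printResponseInfo(r)
    print("end DvCreateDataset")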