# Dataverse search API

*As a next step, we need to learn how to query the complete DCCD dataverse...* 

## Listing all records

As Laura explained, dendro records in a given period (here: records whose `dccd-periodEnd` lies between 1400 and 1800) can be queried with `curl` using the terminal command below. Piping the JSON response through `jq` pretty-prints it, and the result is captured in the file `output.txt`:

    $ curl "https://dataverse.nl/api/search?q=dccd-periodEnd%3A%5B1400%20TO%201800%5D&start=0&per_page=100&subtree=dccd&type=dataset&metadata_fields=dccd:*" | jq . > output.txt

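This works well and returns the metadata of the first 100 records (`start=0&per_page=100`); further records require repeating the request with a larger `start` value. Instead of the `curl` command-line tool, we could also call the Search API from Python.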
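A standard-library sketch of the same request in Python, assuming nothing beyond `urllib` and `json`. The helper names `build_search_url` and `save_search` are hypothetical; `urlencode()` does the percent-escaping that is written out by hand in the curl URL, and `json.dump(..., indent=2)` stands in for `jq`.

```python
import json
import urllib.parse
import urllib.request

SEARCH_URL = "https://dataverse.nl/api/search"


def build_search_url(query, start=0, per_page=100, **params):
    """Return a Search API URL; urlencode() does the percent-escaping."""
    query_params = {"q": query, "start": start, "per_page": per_page, **params}
    return SEARCH_URL + "?" + urllib.parse.urlencode(query_params)


def save_search(url, path="output.txt"):
    """Fetch one page of results and pretty-print the JSON to a file,
    roughly what `curl ... | jq . > output.txt` does."""
    with urllib.request.urlopen(url) as response:
        data = json.load(response)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2, ensure_ascii=False)
    return data


# Same parameters as the curl command: DCCD datasets whose dccd-periodEnd
# lies between 1400 and 1800, first 100 hits, with all dccd metadata fields.
url = build_search_url(
    "dccd-periodEnd:[1400 TO 1800]",
    subtree="dccd",
    type="dataset",
    metadata_fields="dccd:*",
)
```

Calling `save_search(url)` performs the request and writes `output.txt`; building the URL again with `start=100` would fetch the next page.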

Let's explore the Search API documentation at https://guides.dataverse.org/en/6.0/api/search.html. At the bottom of that page there is an example, written for Python 2, that iterates through all search results page by page:

```python
#!/usr/bin/env python
import urllib2
import json
base = 'https://demo.dataverse.org'
rows = 10
start = 0
page = 1
condition = True # emulate do-while
while (condition):
    url = base + '/api/search?q=*' + "&start=" + str(start)
    data = json.load(urllib2.urlopen(url))
    total = data['data']['total_count']
    print "=== Page", page, "==="
    print "start:", start, " total:", total
    for i in data['data']['items']:
        print "- ", i['name'], "(" + i['type'] + ")"
    start = start + rows
    page += 1
    condition = start < total
```

Running this example in a Python 3 notebook, with the `print` statements converted to function calls, fails immediately:

```python
#!/usr/bin/env python
import urllib2
import json
base = 'https://demo.dataverse.org'
rows = 10
start = 0
page = 1
condition = True # emulate do-while
while (condition):
    url = base + '/api/search?q=*' + "&start=" + str(start)
    data = json.load(urllib2.urlopen(url))
    total = data['data']['total_count']
    print("=== Page", page, "===")
    print("start:", start, " total:", total)
    for i in data['data']['items']:
        print("- ", i['name'], "(" + i['type'] + ")")
    start = start + rows
    page += 1
    condition = start < total
```

    ModuleNotFoundError: No module named 'urllib2'

The `urllib2` module only exists in Python 2; in Python 3, `urlopen` lives in `urllib.request`.
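A minimal Python 3 port of the same loop, as a sketch: `urllib2.urlopen` becomes `urllib.request.urlopen`, `per_page` is passed explicitly instead of relying on the server default, and the HTTP fetch is a parameter so the paging logic can be tried without network access. The names `fetch_json` and `iter_search_items` are our own, not part of any library.

```python
import json
import urllib.parse
import urllib.request


def fetch_json(url):
    """GET a URL and decode the JSON response body."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def iter_search_items(base, query="*", per_page=10, fetch=fetch_json, **params):
    """Yield every item of a Dataverse Search API query, one page at a time."""
    start = 0
    while True:
        query_string = urllib.parse.urlencode(
            {"q": query, "start": start, "per_page": per_page, **params}
        )
        data = fetch(base + "/api/search?" + query_string)["data"]
        yield from data["items"]
        start += per_page
        # Same stop condition as the original (start < total), plus a guard
        # against an empty page, which would otherwise loop forever.
        if start >= data["total_count"] or not data["items"]:
            break
```

For example, `iter_search_items("https://dataverse.nl", "dccd-periodEnd:[1400 TO 1800]", per_page=100, subtree="dccd", type="dataset")` would page through all matching DCCD datasets instead of stopping after the first 100.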

## How about the `pyDataverse` Python package?

**This seems outdated and no longer supported.**

Let's also try to use the `pyDataverse` package for downloading data. 

See: https://pydataverse.readthedocs.io/en/latest/user/basic-usage.html#download-and-save-a-dataset-to-disk

    %pip install -U pyDataverse

```python
from pyDataverse.api import NativeApi, DataAccessApi
from pyDataverse.models import Dataverse
```

```python
#base_url = 'https://dataverse.harvard.edu'
base_url = 'https://dataverse.nl'
api = NativeApi(base_url, api_version='v1')
data_api = DataAccessApi(base_url)
```

```python
#DOI = "doi:10.7910/DVN/KBHLOD"
DOI = 'doi:10.34894/MSBW8A'
dataset = api.get_dataset(DOI)
```
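As far as we can tell, `api.get_dataset(DOI)` wraps the Native API request `GET /api/datasets/:persistentId/?persistentId=<DOI>`, whose JSON response contains the `latestVersion` key read from `dataset.json()`. Given the doubts about pyDataverse, here is a standard-library sketch of the same lookup. `dataset_url`, `fetch_json` and `list_files` are hypothetical helper names, and a published dataset is assumed, so no API token is sent.

```python
import json
import urllib.parse
import urllib.request


def dataset_url(base_url, persistent_id):
    """Native API URL for a dataset's JSON representation, looked up by DOI."""
    query = urllib.parse.urlencode({"persistentId": persistent_id})
    return f"{base_url}/api/datasets/:persistentId/?{query}"


def fetch_json(url):
    """GET a URL and decode the JSON response body."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def list_files(base_url, persistent_id, fetch=fetch_json):
    """Return the file entries of the dataset's latest version."""
    data = fetch(dataset_url(base_url, persistent_id))
    return data["data"]["latestVersion"]["files"]
```

With network access, `list_files("https://dataverse.nl", "doi:10.34894/MSBW8A")` should return the same list as `files_list` below.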


```python
files_list = dataset.json()['data']['latestVersion']['files']
```

```python
files_list
```

    [{'description': 'Lab logbook',
      'label': '2016003 Dorestad D16 Logboek.pdf',
      'restricted': False,
      'version': 1,
      'datasetVersionId': 29311,
      'dataFile': {'id': 376393,
       'persistentId': '',
       'filename': '2016003 Dorestad D16 Logboek.pdf',
       'contentType': 'application/pdf',
       'friendlyType': 'Adobe PDF',
       'filesize': 2951413,
       'description': 'Lab logbook',
       'storageIdentifier': 'file://1894e4bf037-544cfdd49456',
       'rootDataFileId': -1,
       'md5': '3ceeb18fe19526d0823e4128c805f14c',
       'checksum': {'type': 'MD5', 'value': '3ceeb18fe19526d0823e4128c805f14c'},
       'tabularData': False,
       'creationDate': '2023-07-13',
       'publicationDate': '2023-07-13',
       'fileAccessRequest': False}},
     {'description': 'Measurement series in stacked Heidelberg format',
      'label': '2016003 Dorestad meetreeksen 1 tot en met 10.fh',
      'restricted': False,
      'version': 1,
      'datasetVersionId': 29311,
      'dataFile': {'id': 376394,
       'persistentId': '',
       'filename': '2016003 Dorestad meetree
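The fields needed for downloading and checking each file sit inside the nested `dataFile` dictionary. A small sketch that flattens them into one tuple per file, assuming every entry has the `id`, `filename`, `filesize` and `checksum` keys shown in the output above; `summarize_files` is a hypothetical helper.

```python
def summarize_files(files_list):
    """Return (id, filename, size in bytes, checksum type, checksum value)
    for every file entry of a dataset version."""
    summary = []
    for entry in files_list:
        data_file = entry["dataFile"]
        checksum = data_file["checksum"]
        summary.append(
            (
                data_file["id"],
                data_file["filename"],
                data_file["filesize"],
                checksum["type"],
                checksum["value"],
            )
        )
    return summary
```

For example, `for row in summarize_files(files_list): print(row)` prints one line per file in the dataset.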

```python
for file in files_list:
    filename = file["dataFile"]["filename"]
    file_id = file["dataFile"]["id"]
    print("File name {}, id {}".format(filename, file_id))
    response = data_api.get_datafile(file_id)
    with open(filename, "wb") as f:
        f.write(response.content)
```

    File name 2016003 Dorestad D16 Logboek.pdf, id 376393
    File name 2016003 Dorestad meetreeksen 1 tot en met 10.fh, id 376394


Almost there. Unfortunately, the downloaded files are just text files containing an error message, for example:

    {"status":"ERROR","code":404,"message":"API endpoint does not exist on this server. Please check your code for typos, or consult our API guide at http://guides.dataverse.org.","requestUrl":"https://dataverse.nl/api/v1/access/datafile/:persistentId/?persistentId=376393","requestMethod":"GET"}

The `requestUrl` shows that the numeric file id `376393` was sent to the `:persistentId` endpoint, which expects a persistent identifier such as a DOI instead.
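The 404 occurs because pyDataverse's `DataAccessApi.get_datafile` treats its `identifier` argument as a persistent identifier by default (`is_pid=True`). A sketch of the download loop with `is_pid=False`, so that the request goes to `/access/datafile/<id>` instead; it also calls `raise_for_status()` on the returned HTTP response so that an error raises an exception rather than being written to disk. `download_files` is a hypothetical helper that takes the `data_api` and `files_list` objects created above.

```python
import os


def download_files(data_api, files_list, dest="."):
    """Download every file of a dataset version by its numeric file id."""
    saved = []
    for file in files_list:
        filename = file["dataFile"]["filename"]
        file_id = file["dataFile"]["id"]
        print("File name {}, id {}".format(filename, file_id))
        # is_pid=False: file_id is the numeric database id (e.g. 376393),
        # not a DOI, so the :persistentId endpoint must not be used.
        response = data_api.get_datafile(file_id, is_pid=False)
        # Fail loudly instead of saving an error message as the file.
        response.raise_for_status()
        path = os.path.join(dest, filename)
        with open(path, "wb") as f:
            f.write(response.content)
        saved.append(path)
    return saved
```

Usage: `download_files(data_api, files_list)` saves both files into the current directory and returns their paths.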

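Each entry in `files_list` also carries a checksum (`'checksum': {'type': 'MD5', ...}` in the output above), so once the downloads succeed they can be verified locally. A small sketch using `hashlib`; `file_checksum` and `verify_download` are hypothetical helpers, and the mapping from Dataverse's checksum names (`MD5`, `SHA-1`, ...) to `hashlib` names is our assumption.

```python
import hashlib


def file_checksum(path, algorithm="md5", chunk_size=1 << 20):
    """Hash a file in chunks so large files need not fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_download(path, file_entry):
    """Compare a downloaded file against the checksum in its dataFile metadata."""
    checksum = file_entry["dataFile"]["checksum"]
    # Assumed mapping: "MD5" -> "md5", "SHA-1" -> "sha1", "SHA-256" -> "sha256".
    algorithm = checksum["type"].lower().replace("-", "")
    return file_checksum(path, algorithm) == checksum["value"]
```

A `False` result means the saved file differs from the one stored in the dataverse, as would be the case for the error-message files above.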