## Why is it important to use Web APIs for research?

Web APIs help automate access to research data and metadata. This enables reproducibility, automation of data pipelines, and programmatic interaction with repositories like 4TU.ResearchData.

## REST APIs in a nutshell

A REST API is a web service that uses HTTP methods (GET, POST, etc.) to allow communication between clients and servers. Responses are usually in JSON format, making them easy to parse and reuse.
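
The request/response cycle described above can be sketched in Python with only the standard library. This is a minimal illustration, not the notebook's own method: `fetch_json` is a hypothetical helper name, and `sample_body` is a hand-written stand-in for a response, not real API output.

```python
import json
import urllib.request


def fetch_json(url):
    """Send an HTTP GET request and decode the JSON body of the response."""
    with urllib.request.urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


# Hand-written stand-in for a JSON response body (not real API output).
sample_body = '[{"title": "Example dataset", "doi": "10.4121/example-doi"}]'

# Parsing turns the JSON text into ordinary Python lists and dictionaries.
records = json.loads(sample_body)
print(records[0]["title"])  # → Example dataset

# To query the live API instead (requires network access):
# records = fetch_json("https://data.4tu.nl/v2/articles?limit=2")
```

Once parsed, each record is a plain dictionary, which is what makes JSON responses easy to reuse in later steps.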

## 1. REUSE: Search and Download Datasets

### List published datasets (via `curl`)

In [None]:
!curl "https://data.4tu.nl/v2/articles" | jq

## What is curl?

curl stands for **Client URL**. 

It’s a command-line tool that allows you to transfer data to or from a server using various internet protocols, most commonly HTTP and HTTPS.

It is especially useful for making API requests — you can send GET, POST, PUT, DELETE requests, upload or download files, send headers or authentication tokens, and more.

## Why curl works for APIs

REST APIs are based on the HTTP protocol, just like websites. When you visit a webpage, your browser sends a GET request and displays the HTML it gets back. When you use curl, you do the same thing, but in your terminal. For example, `curl https://data.4tu.nl/v2/articles` sends an HTTP GET request to the 4TU.ResearchData API.

## Key reasons why curl is used:

- It is built into most Linux/macOS systems and easily installable on Windows.

- It is scriptable: usable in bash scripts, notebooks, and automation.

- It supports headers, query parameters, tokens, POST data, etc.

- It can write output to files (`>`, `-o`, `-O`) or pipe it to processors like `jq`.

In [None]:

!curl "https://data.4tu.nl/v2/articles?limit=2&published_since=2024-07-25" > data.json

In [None]:
!curl "https://data.4tu.nl/v2/articles?limit=2&published_since=2024-07-25" | jq

### Exercise: Request datasets published since January 1st, 2025 and show them on the screen

In [None]:
!curl "https://data.4tu.nl/v2/articles?item_type=3&limit=10&published_since=2025-01-01" | jq

### Get 10 software records published after 01-01-2025 (via `curl`)

In [None]:
!curl "https://data.4tu.nl/v2/articles?item_type=9&limit=10&published_since=2025-01-01" | jq
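
The query strings used in the cells above can also be assembled programmatically. The sketch below is only an illustration: `build_articles_url` is a hypothetical helper, and the parameter names (`item_type`, `limit`, `published_since`) are the ones already used in the `curl` commands.

```python
from urllib.parse import urlencode

BASE_URL = "https://data.4tu.nl/v2/articles"


def build_articles_url(**params):
    """Return the articles endpoint URL with the given query parameters."""
    # Drop parameters that were not set so they do not appear in the URL.
    query = {key: value for key, value in params.items() if value is not None}
    return f"{BASE_URL}?{urlencode(query)}" if query else BASE_URL


# item_type=3 is used for datasets and item_type=9 for software in the cells above.
software_url = build_articles_url(item_type=9, limit=10, published_since="2025-01-01")
print(software_url)
```

Building the URL from named parameters avoids typos in hand-written query strings and makes it easy to change one filter at a time.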

### Save dataset titles and DOIs to file (via `curl`)

In [12]:
!curl "https://data.4tu.nl/v2/articles?item_type=3&limit=10&published_since=2025-01-01" | jq -r '.[] | "* " + .title + " (" + .doi + ")"' > datasets.md

# Output format: * Mechanical failure in wind turbines (10.4121/example-doi)


  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 10593  100 10593    0     0   172k      0 --:--:-- --:--:-- --:--:--  175k


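### Anatomy of the `jq` filter

`jq '.[] | "* " + .title + " (" + .doi + ")"'`

- `jq` is a lightweight command-line tool for processing JSON.

- `.[]` iterates over each element of the returned JSON array.

- `|` passes each element on to the expression that follows.

- By default `jq` prints strings with JSON quotes; the `-r` (raw output) option prints them without quotes.

For each item, the expression:

- Adds a bullet point (`* `),

- Appends the title of the dataset/article (`.title`),

- Appends the DOI in parentheses (`.doi`).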
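
The same formatting step can be written in Python once the JSON has been parsed. This is a sketch for comparison only: `format_record` is a hypothetical helper, and the sample record reuses the placeholder title and DOI from the output model above rather than real API data.

```python
def format_record(record):
    """Mirror the jq filter: "* " + .title + " (" + .doi + ")"."""
    return "* " + record["title"] + " (" + record["doi"] + ")"


# Hand-written sample record (placeholder values, not real API output).
records = [
    {"title": "Mechanical failure in wind turbines", "doi": "10.4121/example-doi"},
]

# The list comprehension plays the role of jq's `.[]` iteration.
lines = [format_record(record) for record in records]
print("\n".join(lines))  # → * Mechanical failure in wind turbines (10.4121/example-doi)
```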

### Exercise: Save dataset title, DOI, and publication date (via `curl`)

#### Tips for Customizing the Output

- Use `+` to concatenate strings in jq.

- Wrap literal characters like (), [], — in quotes.

In [13]:
!curl "https://data.4tu.nl/v2/articles?item_type=3&limit=10&published_since=2025-01-01" | jq -r '.[] | "* " + .title + " (" + .doi + ") - " + .published_date' > datasets.md



  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 10593  100 10593    0     0  17544      0 --:--:-- --:--:-- --:--:-- 17538


## Search Datasets by Keyword

In [14]:
!curl --request POST --header "Content-Type: application/json" --data '{ "search_for": "mechanical engineering" }' https://data.4tu.nl/v2/articles/search | jq

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 16449  100 16407  100    42   6587     16  0:00:02  0:00:02 --:--:--  6603
[
  {
    "id": 21708260,
    "uuid": "2883a6e8-c3ed-44b8-9138-0a45c98bcbb9",
    "title": "Data underlying the publication: Engineering Complexity Beyond the surface - Discerning the Viewpoints, the Drivers and The Challenges",
    "doi": "10.4121/21708260.v1",
    "handle": null,
    "url": "https://data.4tu.nl/v2/articles/2883a6e8-c3ed-44b8-9138-0a45c98bcbb9",
    "published_date": "2024-12-30T11:27:28",
    "thumb": ...

In [15]:
!curl --request POST --header "Content-Type: application/json" --data '{ "search_for": "Nanomechanical String Resonators" }' https://data.4tu.nl/v2/articles/search | jq

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 14668  100 14616  100    52   6551     23  0:00:02  0:00:02 --:--:--  6574
[
  {
    "id": null,
    "uuid": "467cfeab-a657-4508-922f-26b7acc031ee",
    "title": "Data underlying the publication Turbulence and added drag over acoustic liners",
    "doi": "10.4121/467cfeab-a657-4508-922f-26b7acc031ee.v1",
    "handle": null,
    "url": "https://data.4tu.nl/v2/articles/467cfeab-a657-4508-922f-26b7acc031ee",
    "published_date": "2024-03-22T11:05:46",
    "thumb": null,

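The keyword search above is a POST request with a JSON body. A rough Python equivalent, using only the standard library, might look like the sketch below. `build_search_request` is a hypothetical helper; it only constructs the request so the example runs offline, and the commented lines show how it could be sent.

```python
import json
import urllib.request

SEARCH_URL = "https://data.4tu.nl/v2/articles/search"


def build_search_request(keywords):
    """Build (but do not send) a POST request with a JSON search body."""
    body = json.dumps({"search_for": keywords}).encode("utf-8")
    return urllib.request.Request(
        SEARCH_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_search_request("mechanical engineering")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))

# To send the request (requires network access):
# with urllib.request.urlopen(request) as response:
#     results = json.loads(response.read().decode("utf-8"))
```

Separating request construction from sending makes it possible to inspect the method, URL, and body before anything goes over the network.
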
## Using a Token to Access Author Info (via `curl`)

#### Create the `.env` file in Binder and paste in the token (for demonstration purposes)

`echo 'API_TOKEN_NEXT="your_token_here"' > ~/.env`

`source ~/.env`

`echo "Token loaded: ${API_TOKEN_NEXT:0:5}..."`


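### Troubleshooting

- Each `!` command in a notebook runs in its own subshell, so a variable loaded with `source ~/.env` in one cell is not visible in later cells. If the token is not picked up, run these commands in the Binder terminal instead.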
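
An alternative that stays inside the notebook is to read the `.env` file from Python. The sketch below is an assumption-laden illustration, not part of the original workflow: `load_env_file` is a hypothetical helper that handles only simple `KEY="value"` lines, and a temporary file stands in for `~/.env`.

```python
import os
import tempfile


def load_env_file(path):
    """Parse simple KEY="value" lines from a .env-style file into a dict."""
    values = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            # Skip blank lines, comments, and lines without an assignment.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # Remove surrounding single or double quotes from the value.
            values[key.strip()] = value.strip().strip("'\"")
    return values


# Demonstration with a throwaway file; in practice the path would be ~/.env.
with tempfile.NamedTemporaryFile("w", suffix=".env", delete=False) as tmp:
    tmp.write('API_TOKEN_NEXT="your_token_here"\n')
env = load_env_file(tmp.name)
os.remove(tmp.name)

# Show only the first characters so the full token never appears in the output.
print("Token loaded:", env["API_TOKEN_NEXT"][:5] + "...")  # → Token loaded: your_...
```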

In [16]:
# Requires a token loaded from a sourced .env file (this step can be skipped in the demo, but is worth mentioning)
!curl --request POST https://next.data.4tu.nl/v2/account/authors/search --header "Authorization: token ${API_TOKEN_NEXT}" --header "Content-Type: application/json" --data '{ "search": "Aleksandra" }'  | jq > author_info.md

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   623  100   597  100    26   4356    189 --:--:-- --:--:-- --:--:--  4580


## Upload Datasets (POST Requests)

### Basic Upload

In [17]:
!curl -X POST https://next.data.4tu.nl/v2/account/articles  --header "Authorization: token ${API_TOKEN_NEXT}" --header "Content-Type: application/json" --data '{ "title": "Example dataset" }' | jq

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   143  100   113  100    30   1546    410 --:--:-- --:--:-- --:--:--  1986
{
  "location": "https://next.data.4tu.nl/v2/account/articles/07137385-5e91-4ee6-8abd-5d8e435f5c45",
}


### Upload with Author Metadata

In [18]:
!curl -X POST https://next.data.4tu.nl/v2/account/articles --header "Authorization: token ${API_TOKEN_NEXT}" --header "Content-Type: application/json" --data '{ "title": "Example dataset 2", "authors": [{ "first_name": "John", "full_name": "John Doe", "last_name": "Doe", "orcid_id": "0000-0003-4324-5350" }] }'| jq

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   262  100   113  100   149   2483   3274 --:--:-- --:--:-- --:--:--  5822
{
  "location": "https://next.data.4tu.nl/v2/account/articles/d7cc06fa-2c87-4f19-a438-ce9f9dfaf29f",
}


### Upload Using YAML Metadata

In [None]:
!yq '.' example_metadata.yaml | curl -X POST https://next.data.4tu.nl/v2/account/articles -H "Authorization: token ${API_TOKEN_NEXT}" -H "Content-Type: application/json" -d @-

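#### Command explanation:

`yq '.' example_metadata.yaml` converts `example_metadata.yaml` into JSON.

- `yq` is a command-line tool to read and manipulate YAML (like `jq` is for JSON).

- `'.'` means "read the full YAML structure as-is".

- This assumes the Python `yq` wrapper around `jq`, which prints JSON by default; the Go implementation of `yq` prints YAML unless JSON output is requested with `-o=json`.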


`-d @-`

- `-d` sends data in the body of the POST request.

- `@-` means: read the request body from stdin (standard input), i.e., the piped-in JSON from yq.
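
For comparison, the authenticated upload requests in this section could be built in Python as sketched below. This is an illustration under stated assumptions: `build_upload_request` is a hypothetical helper, the metadata mirrors the "Upload with Author Metadata" cell, and a placeholder token is used when `API_TOKEN_NEXT` is not set so that the example runs without credentials. The request is constructed but not sent.

```python
import json
import os
import urllib.request

UPLOAD_URL = "https://next.data.4tu.nl/v2/account/articles"


def build_upload_request(metadata, token):
    """Build (but do not send) an authenticated POST carrying metadata as JSON."""
    return urllib.request.Request(
        UPLOAD_URL,
        data=json.dumps(metadata).encode("utf-8"),
        headers={
            "Authorization": f"token {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


metadata = {
    "title": "Example dataset 2",
    "authors": [
        {
            "first_name": "John",
            "full_name": "John Doe",
            "last_name": "Doe",
            "orcid_id": "0000-0003-4324-5350",
        }
    ],
}

# Falls back to a placeholder so the sketch runs without a real token.
token = os.environ.get("API_TOKEN_NEXT", "your_token_here")
request = build_upload_request(metadata, token)
print(request.get_method(), request.full_url)
```

Keeping the metadata as a Python dictionary means it can be loaded from any source (a YAML file, a spreadsheet, a form) before being serialized to JSON.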


## Motivation for Using Python

Use case: a researcher wants the descriptions and categories of the datasets uploaded in April 2025.

Challenge: the description and categories are only exposed when a specific dataset is queried, so every dataset has to be requested individually.

In [23]:
!curl -s "https://data.4tu.nl/v2/articles/fb26fd3f-ba3c-4cf0-8926-14768a256933" | jq

{
  "files": [
    {
      "id": null,
      "uuid": "2d0f97eb-12a5-4f6f-97e5-64cec92fb355",
      "name": "01_readme.txt",
      "size": 14741,
      "is_link_only": false,
      "is_incomplete": false,
      "download_url": "https://data.4tu.nl/file/fb26fd3f-ba3c-4cf0-8926-14768a256933/2d0f97eb-12a5-4f6f-97e5-64cec92fb355",
      "supplied_md5": null,
      "computed_md5": "90832c0ffb862cf2a1eba32017efd543"
    },
    {
      "id": null,
      "uuid": ...

### Get the description and categories of a single dataset

In [24]:
!curl -s "https://data.4tu.nl/v2/articles/fb26fd3f-ba3c-4cf0-8926-14768a256933" | jq -r '"Description: " + .description + "\nCategories: " + (.categories | map(.title) | join(", "))' > datasets_description_categories.md

### Bash Script: Loop Through UUIDs to Collect Metadata

In [None]:
!curl -s "https://data.4tu.nl/v2/articles?published_since=2025-04-01&item_type=3&limit=10" | jq '.[] | {uuid: .uuid}' > article_ids.json
!jq -r '.uuid' article_ids.json | while read -r uuid; do curl -s "https://data.4tu.nl/v2/articles/$uuid" | jq -r '"Description: " + .description + "\nCategories: " + (.categories | map(.title) | join(", "))' >> articles_full_metadata.md; done

### Limitations of Bash Scripts

- Harder to debug or extend
- Tricky to structure or merge data
- Not ideal for large-scale automation
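
To give a sense of how the same loop reads in Python, here is a minimal standard-library sketch. It is not the example from the companion notebook: `summarize`, `fetch_json`, and `collect_summaries` are hypothetical helpers, and the sample record is hand-written so the code runs without network access.

```python
import json
import urllib.request

API_URL = "https://data.4tu.nl/v2/articles"


def summarize(record):
    """Format one full dataset record like the jq filter in the bash loop."""
    description = record.get("description") or ""
    categories = ", ".join(item["title"] for item in record.get("categories") or [])
    return f"Description: {description}\nCategories: {categories}"


def fetch_json(url):
    """Send a GET request and decode the JSON response body."""
    with urllib.request.urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


def collect_summaries(uuids, fetch=fetch_json):
    """Fetch each dataset by UUID and return the formatted summaries."""
    return [summarize(fetch(f"{API_URL}/{uuid}")) for uuid in uuids]


# Offline demonstration with a hand-written record (not real API output).
sample = {
    "description": "Example description",
    "categories": [{"title": "Engineering"}, {"title": "Physics"}],
}
print(summarize(sample))

# With network access, the UUIDs from the list endpoint could be passed in:
# summaries = collect_summaries(["fb26fd3f-ba3c-4cf0-8926-14768a256933"])
```

Because each step is a small function, the fetch step can be swapped for a stub when testing, and the results stay in Python data structures that are easy to merge or extend.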

## Using the API with Python

See `get_description_categories_datasets_example.ipynb` for a full example using `requests`.

## Bonus: Using `connect4tu` Python Package

You can also use the [connect4tu](https://github.com/leilaicruz/connect4tu) package for a cleaner Python interface to the 4TU API.