# APIs and Datasets for Scholarly Publications 

Using the [Lens.org](https://www.lens.org/) API to retrieve scholarly data.

## Prerequisites

1. Clone the [GitHub repository](https://github.com/kaust-library/using_lens_org).
1. Create your virtual environment: `python -m venv venv`.
1. Activate your environment (Windows): `. .\venv\Scripts\activate`.
1. Install the required packages: `pip install -r requirements.txt`.

## Loading the Packages and Env File

Load the packages:

In [1]:
import dotenv as DE
import os as OS
import requests as RQ
import pprint as PP
import json as JN
import csv as CSV

If you do not have one yet, create a `.env` file containing your API _token_ in the _root_ directory of your project:

```
(venv) PS C:\Users\garcm0b\Work\lens_org> cat .env
MY_TOKEN=(...)
(venv) PS C:\Users\garcm0b\Work\lens_org>
```

Make sure that `.env` is listed in your `.gitignore` file, so that you do not commit your credentials by accident.

In [2]:
DE.load_dotenv()
api_passwd = OS.environ['MY_TOKEN']
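
If `MY_TOKEN` is not defined, `OS.environ['MY_TOKEN']` fails with a bare `KeyError`. A minimal sketch of a friendlier check follows; `read_token` is a hypothetical helper, not part of the repository, and it accepts the environment mapping as a parameter so it can be tried without a real token.

```python
import os


def read_token(env=os.environ, name="MY_TOKEN"):
    """Return the API token stored under `name`, or raise a readable error.

    `read_token` is a hypothetical helper used for illustration only.
    """
    token = env.get(name, "").strip()
    if not token:
        raise RuntimeError(
            f"{name} is not set. Add it to your .env file and call load_dotenv() first."
        )
    return token


# Demonstration with a stand-in mapping instead of the real environment.
print(read_token({"MY_TOKEN": "example-token"}))
```

In the notebook, `api_passwd = read_token()` after `DE.load_dotenv()` would play the same role as the cell above.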

## Examples

The examples below use the `requests` library to query the Lens.org API. We start with a simple example from the Swagger API test page.

In [3]:
headers = {"Authorization": api_passwd, "Content-Type": "application/json"}

payload = '''
{
  "query": {
    "match": {
      "title": "Malaria"
    }
  },
  "size": 5,
  "from": 0,
  "include": [
    "title",
    "lens_id",
    "patent_citations_count"
  ],
  "sort": [
    {
      "created": "desc"
    },
    {
      "year_published": "asc"
    }
  ],
  "exclude": null,
  "scroll": null,
  "scroll_id": null
}
'''

rr = RQ.post('https://api.lens.org/scholarly/search', data=payload, headers=headers)

After the query, we check whether the request succeeded by inspecting the `status_code`. The value `200` means the server returned a valid answer, and [any other value](https://docs.api.lens.org/getting-started.html#http-responses) indicates an error. Next we print the result of the query:

In [4]:
if rr.status_code == 200:
    print("Your request was successful")
    PP.pprint(rr.text)
else:
    print(f"Something went wrong. The return code was '{rr.status_code}'")

Your request was successful
('{"total":107394,"data":[{"lens_id":"065-560-633-197-40X","title":"A '
 'COMPARATIVE STUDY OF THE DIFFERENT DIAGNOSTICS OF DETECTING MALARIA AND '
 'TYPHOID FEVER"},{"lens_id":"191-605-211-943-569","title":"The effectiveness '
 'of the prevention and control methods applied towards the elimination of '
 'malaria in Botswana"},{"lens_id":"111-730-442-279-483","title":"Quantifying '
 'the impact of interventions against Plasmodium vivax malaria: a model for '
 'country-specific use"},{"lens_id":"006-242-871-547-58X","title":"Malaria '
 'Control by Mass Drug Administration with Artemisinin plus Piperaquine on '
 'Grande Comore Island, Union of '
 'Comoros"},{"lens_id":"040-464-556-289-935","title":"X marks the shot against '
 'malaria"}],"results":5}')
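
`rr.text` is a raw JSON string, which is why `pprint` shows it as a series of wrapped string fragments. Parsing it first gives a dictionary that is easier to inspect. The sketch below uses a shortened copy of the response above as a stand-in (two records instead of five), so it runs without a token; with a live response you would pass `rr.text` to `json.loads`, or call `rr.json()`.

```python
import json

# Shortened stand-in for rr.text, adapted from the output above.
sample_text = (
    '{"total":107394,"data":['
    '{"lens_id":"065-560-633-197-40X","title":"A COMPARATIVE STUDY OF THE '
    'DIFFERENT DIAGNOSTICS OF DETECTING MALARIA AND TYPHOID FEVER"},'
    '{"lens_id":"040-464-556-289-935","title":"X marks the shot against malaria"}'
    '],"results":2}'
)

response = json.loads(sample_text)

# "total" counts all matches on the server, "results" the records in this page.
print(f"{response['results']} of {response['total']} records returned")
for record in response["data"]:
    print(f"{record['lens_id']}: {record['title']}")
```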


In [5]:
payload = '''{
     "query": {
        "bool": {
            "must": [
                {
                    "query_string": {
                        "query": "catalyzed",
                        "fields": [
                            "title",
                            "abstract",
                            "full_text"
                        ],
                        "default_operator": "or"
                    }
                }
            ],
            "filter": [
                {
                    "term": {
                        "has_abstract": true
                    }
                }
            ]
        }
    },
     "size": 10
}
'''

In [6]:
rr = RQ.post('https://api.lens.org/scholarly/search', data=payload, headers=headers)
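
Writing the payload as a triple-quoted string makes it easy to break the JSON with a stray comma or quote. As a sketch of an alternative, the same query can be built as a Python dictionary and serialized with `json.dumps`. `build_query` and its parameters are hypothetical names introduced for this example.

```python
import json


def build_query(term, fields=("title", "abstract", "full_text"), size=10):
    """Build the boolean query from the cell above as a Python dictionary.

    `build_query` is a hypothetical helper; the structure mirrors the
    hand-written payload.
    """
    return {
        "query": {
            "bool": {
                "must": [
                    {
                        "query_string": {
                            "query": term,
                            "fields": list(fields),
                            "default_operator": "or",
                        }
                    }
                ],
                # Python's True is serialized as JSON true by json.dumps.
                "filter": [{"term": {"has_abstract": True}}],
            }
        },
        "size": size,
    }


payload = json.dumps(build_query("catalyzed"))
print(payload)
```

The resulting string can be passed as `data=payload` in the same way as the hand-written version.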

The function below extracts the first and last name of each author. In the response, each author is a [structure with several fields](https://docs.api.lens.org/response-scholar.html#author), such as affiliations, ids, and initials. Here we want only the name. When there is more than one author, we separate the names with a semicolon (`;`) so that they are not confused with the commas separating the CSV fields.

In [7]:
def get_authors(count: int, aulist: list) -> str:
    """
    Return the authors' first and last names, separated by '; '.

    `count` is kept for compatibility with existing callers; the names are
    taken from `aulist` alone.
    """
    names = []
    for aa in aulist:
        # Some author records may lack one of the name fields.
        full_name = f"{aa.get('first_name', '')} {aa.get('last_name', '')}".strip()
        names.append(full_name)

    return "; ".join(names)



We use the function [`loads`](https://docs.python.org/3/library/json.html#json.loads) to parse the JSON text returned by our request into a Python dictionary. Next we save the records under its `data` key as a CSV file.

In [8]:
text = JN.loads(rr.text)
data = text['data']

fields = ['lens_id', 'title', 'year_published', 'authors', 'abstract']
row_csv = []

for dd in data:
    # Records do not all carry the same fields, so fall back to an empty string.
    row = {ff: dd.get(ff, "") for ff in fields}
    row['authors'] = get_authors(dd.get('author_count', 0), dd.get('authors', []))
    row_csv.append(row)

with open('metadata.csv', "w", newline="", encoding='utf-8') as csvfile:
    writer = CSV.DictWriter(csvfile, fieldnames=fields)
    writer.writeheader()
    writer.writerows(row_csv)

The set of fields varies from record to record, so we export only a few of them.
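
Because a record may lack some of the selected fields, indexing with `dd[ff]` can raise a `KeyError`. A minimal sketch of one way to handle this, using `dict.get` with an empty-string default and an in-memory buffer instead of a file. The records are made-up placeholders, not real Lens.org data.

```python
import csv
import io

fields = ["lens_id", "title", "year_published", "abstract"]

# Hypothetical records: the second one has no year or abstract.
records = [
    {
        "lens_id": "000-000-000-000-001",
        "title": "First example",
        "year_published": 2020,
        "abstract": "Some text, with a comma.",
    },
    {"lens_id": "000-000-000-000-002", "title": "Second example"},
]

# Missing fields become empty strings instead of raising KeyError.
rows = [{ff: rec.get(ff, "") for ff in fields} for rec in records]

buffer = io.StringIO(newline="")
writer = csv.DictWriter(buffer, fieldnames=fields)
writer.writeheader()
writer.writerows(rows)

# Read the CSV back to confirm that every row has every column.
buffer.seek(0)
for row in csv.DictReader(buffer):
    print(row["lens_id"], repr(row["abstract"]))
```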
