# WoS Citation Network Generation
Eploring how to use the WoS to recapitulate the citation network we made with Semantic Scholar's API. Based off of a [WoS API code snippet](https://github.com/clarivate/wos_api_usecases/blob/main/citation_report_for_larger_datasets/main.py)

In [20]:
import requests
from bs4 import BeautifulSoup

In [3]:
# Import API key. This must be requested from https://www.semanticscholar.org/product/api#api-key; we save ours in an untracked file in data and import here
import sys
sys.path.append('../data/')
from wos_api_key import API_KEY
header = {'X-ApiKey': API_KEY}

In [4]:
SEARCH_QUERY = 'TS=desiccation tolerance'

Took me a bit to find what the allowable search abbreviations are, got them in an error message, pasting them here for future reference: `Allowed tags are AI, AU, CS, DO, DT, IS, OG, PG, PMID, PY, SO, SUR, TI, TS, UT, VL.` I have no idea what most of these stand for (an educated guess says that `TS` is Topic Search, but it doesn't return as many results as the WoS GUI does for the same search term). **UPDATE:** Found the [documentation](https://api.clarivate.com/swagger-ui/?url=https%3A%2F%2Fdeveloper.clarivate.com%2Fapis%2Fwos-starter%2Fswagger%3FforUser%3D7bfe29c46eafaa67d592cdb3e556cd4e2c6618e8)! It contains definitions of these abbreviations.

Playing around with different request strings to try and get abstracts. Since the string from the code snippet I've been working off for this API looks relatively similar to the one from Semantic Scholar, I'm going to just try using some of the same keywords.

In [5]:
initial_request = requests.get(f'https://api.clarivate.com/apis/wos-starter/v1/documents?q={SEARCH_QUERY}'
                               f'&limit=50&page=1&db=WOS&fields=title,abstract,references',
                               headers=header).json()

When I tried the above request, I got the following error message:

```
The 'fields' parameter is not valid for the response. Possible parameters: db, detail, limit, modifiedTimeSpan, page, q, sortField, tcModifiedTimeSpan.
```
Maybe `detail` is what I want? But I'm not sure what I'm allowed to pass there.

In [6]:
initial_request = requests.get(f'https://api.clarivate.com/apis/wos-starter/v1/documents?q={SEARCH_QUERY}'
                               f'&limit=50&page=1&db=WOS&detail=title,abstract,references',
                               headers=header).json()

Tried passing the same fields but with the keyword `detail` instead, got this:
```
The value 'title,abstract,references' is not valid for the selected filter 'detail'. Allowed values are 'short' and 'full'.
```
I wonder what the default is? Maybe full will give me the full abstract and references as part of the response.

In [7]:
initial_request = requests.get(f'https://api.clarivate.com/apis/wos-starter/v1/documents?q={SEARCH_QUERY}'
                               f'&limit=50&page=1&db=WOS&detail=full',
                               headers=header).json()

It definitely does not, and some googling makes it seem like you need the Expanded API to be able to access any of the text of a paper, including the abstract. I've requested access, but that took several weeks previously and I also am not sure if MSU's subscription includes it. Since I only really need the references from WoS, as a workaround, I'm going to pull the DOI's from WoS, and use Semantic Scholar to get the abstracts to be able to do the classification. I also need to figure out how to get the DOI's of the references out of the URL provided.

In [18]:
ref_request = requests.get('https://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=desiccation-network&SrcAuth=WosAPI&KeyUT=WOS:A1970F829300010&DestLinkType=CitedReferences&DestApp=WOS',
                          headers=header)

Unfortunately, the documentation is not any help in figuring out what to do with the reference URLs that are provided in the response.

In [21]:
soup = BeautifulSoup(ref_request.text, 'html.parser')

In [23]:
soup.get_text()

'\n\nClarivate\n\n\n\n\n\n\n\n\n'

Well, this looks like a dead end. From the documentation for the Extended API, it looks like you can get references & citations easily with the Extended API, and from this result, it sort of looks like I'm not going to be able to simply scrape the references from the provided link from the Starter API. This is super disappointing; I'll have to wait to hear back from Clarviate to see if I'm approved for the Extended API.

In [27]:
initial_request = requests.get(f'https://api.clarivate.com/apis/wos-starter/v1/documents?q={SEARCH_QUERY}'
                               f'&limit=50&page=1&db=WOS',
                               headers=header)
initial_json = initial_request.json()
print(initial_json)
documents_found = initial_json['metadata']['total']
print(f'Total Web of Science documents found: {documents_found}')
requests_required = ((documents_found - 1) // 50) + 1
print(f'Web of Science Starter API requests required to retrieve them: {requests_required}')
for wos_document in initial_json['hits']:
    print(wos_document.keys())
#     retrieve_key_fields(wos_document)

# If the number of required API requests is more than 1, subsequent API requests are being sent
if requests_required > 1:
    for i in range(1, requests_required):
        subsequent_request = requests.get(f'https://api.clarivate.com/apis/wos-starter/v1/documents?q={SEARCH_QUERY}'
                                          f'&limit=50&page={i+1}&db=WOS',
                                          headers={'X-ApiKey': STARTER_APIKEY})
        subsequent_json = subsequent_request.json()
        for wos_document in subsequent_json['hits']:
            print(wos_document.keys())
#             retrieve_key_fields(wos_document)
        print(f'{i+1} of {requests_required} processed')

{'metadata': {'total': 6680, 'page': 1, 'limit': 50}, 'hits': [{'uid': 'WOS:A1956WX61800036', 'title': 'THE TOLERANCE OF THE DRYWOOD TERMITE, KALOTERMES-MINOR HAGEN, TO DESICCATION', 'types': ['Article'], 'sourceTypes': ['Note'], 'source': {'sourceTitle': 'JOURNAL OF ECONOMIC ENTOMOLOGY', 'publishYear': 1956, 'volume': '49', 'issue': '4', 'pages': {'range': '553-554', 'begin': '553', 'end': '554', 'count': 2}}, 'names': {'authors': [{'wosStandard': 'PENCE, RJ', 'researcherId': 'DNO-0499-2022'}]}, 'links': {'record': 'https://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=desiccation-network&SrcAuth=WosAPI&KeyUT=WOS:A1956WX61800036&DestLinkType=FullRecord&DestApp=WOS', 'references': 'https://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=desiccation-network&SrcAuth=WosAPI&KeyUT=WOS:A1956WX61800036&DestLinkType=CitedReferences&DestApp=WOS', 'related': 'https://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=desiccation-network&SrcAuth=WosAPI&KeyUT=WOS:A1956WX61800036&DestLi

NameError: name 'STARTER_APIKEY' is not defined