<a href="https://colab.research.google.com/github/Jask-Code/Research-Analytics/blob/master/WOS_API_Automation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Simple tutorial on Web of Science API
For more information and help, please refer to the following [link](http://help.incites.clarivate.com/wosWebServicesLite/WebServicesLiteOverviewGroup/Introduction.html)


# Install XMLTODICT Converter
The WOS API returns XML, so we install `xmltodict` to convert the responses into Python dictionaries, which are much easier to work with.



In [8]:
 
!pip install xmltodict

Collecting xmltodict
  Downloading https://files.pythonhosted.org/packages/28/fd/30d5c1d3ac29ce229f6bdc40bbc20b28f716e8b363140c26eff19122d8a5/xmltodict-0.12.0-py2.py3-none-any.whl
Installing collected packages: xmltodict
Successfully installed xmltodict-0.12.0


# Import requirements

In [0]:
import json
import requests
import base64
import xmltodict


# Authentication
Web of Knowledge (ISI) provides subscribers with a unique username and password for authentication. These credentials are dedicated to one or more WOS services. In the following, we present the steps for using a WOS Lite subscription.

In [10]:
# get a username and password in the following format   [ USERNAME:PASSWORD ]
userdata = raw_input("Authentication in this format [USERNAME:Password]->")

Authentication in this format [USERNAME:Password]->KING_HG:Welcome#21


# Encoding [USERNAME:PASSWORD]

The WOS API expects the client to encode the username and password as a Base64 string and send it in the HTTP header. Therefore, we need to carry out this encoding before sending our authentication request to the WOS server.

The encoded username:password is added to the header, together with the content type of the SOAP request. The header is built as follows:
`headers = {'content-type': 'application/soap+xml', 'Authorization': 'Basic ' + str(endata)}`

1.   content-type  : application/soap+xml
2.   Authorization : Basic XXXXXXXXXXXXXXX

where XXXXXXXXXXXXXXX is the encoded username:password
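The header construction described above can also be wrapped in a small helper. This is only a sketch: the function name `basic_auth_header` and the placeholder credentials are assumptions made for illustration, and the header names are the two listed above.

```python
import base64

def basic_auth_header(username, password):
    """Return request headers carrying the Base64-encoded 'username:password' pair."""
    raw = "{}:{}".format(username, password).encode("utf-8")
    # b64encode returns bytes; decode so the token concatenates as plain text
    token = base64.b64encode(raw).decode("ascii")
    return {
        "content-type": "application/soap+xml",
        "Authorization": "Basic " + token,
    }

# Placeholder credentials, not a real account
headers_example = basic_auth_header("demo_user", "demo_pass")
print(headers_example["Authorization"])
```

Decoding the Base64 bytes before concatenation keeps the header value free of a `b'...'` wrapper when the notebook is run on Python 3.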



In [0]:
# prepare authentication using base64 encoding;
# decode the resulting bytes so the value is plain text when added to the header
endata = base64.b64encode(userdata.encode("utf-8")).decode("ascii")

# Authentication Request Component

## Client steps
1.   Endpoint link (WSDL)
2.   Set the request header
3.   Send the authentication request (fixed)

## Server expected response
1. A session ID should be sent back from the server


More information is available [here](http://help.incites.clarivate.com/wosWebServicesLite/AuthenticationGroup/Authentication/Username_PW_Authentication.html)




In [0]:
#Authentication Endpoint
Aurl="http://search.webofknowledge.com/esti/wokmws/ws/WOKMWSAuthenticate?WSDL"

In [0]:
# Header content
headers = {'content-type': 'application/soap+xml', 'Authorization': 'Basic ' + str(endata)}


In [0]:
# Authentication request SOAP Envelope (XML)
body = """<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:auth="http://auth.cxf.wokmws.thomsonreuters.com">
   <soapenv:Header/>
   <soapenv:Body>
      <auth:authenticate/>
   </soapenv:Body>
</soapenv:Envelope>"""



# Ready to send the request
Once all of the above is set, you are ready to send the authentication request. This should establish a session, and the server should return a session ID that can then be used to access WOS research data.

In [49]:
# Send an authentication request and check
response = requests.post(Aurl,data=body,headers=headers)

# Check successful: status_code is always truthy, so compare it with 200 explicitly
if response.status_code == 200:
  print('response received successfully')
  flag = True
else:
  flag = False
  print("Authentication failed, check your username and password. Remember they're case sensitive")

response received successfully
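Besides the HTTP status code, a failed SOAP call normally carries a fault element in its body. The following is a minimal sketch of how such a fault could be detected with the standard library. The helper name `soap_fault_message` and the sample fault text are assumptions written for illustration, not output captured from the WOS server.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"

def soap_fault_message(xml_text):
    """Return the faultstring of a SOAP 1.1 fault, or None if the body has no fault."""
    root = ET.fromstring(xml_text)
    fault = root.find(".//{%s}Fault" % SOAP_NS)
    if fault is None:
        return None
    return fault.findtext("faultstring", default="unknown SOAP fault")

# Hypothetical fault body, written by hand for illustration only
sample_fault = (
    '<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">'
    "<soap:Body><soap:Fault>"
    "<faultcode>soap:Server</faultcode>"
    "<faultstring>example fault text</faultstring>"
    "</soap:Fault></soap:Body></soap:Envelope>"
)
print(soap_fault_message(sample_fault))
```

In the notebook, the same helper could be called as `soap_fault_message(response.content)` after each request.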


# Session ID
The response will contain a session ID. This ID will be used for sending data requests and for ending the current session.

More information is available [here](http://help.incites.clarivate.com/wosWebServicesLite/AuthenticationGroup/Authentication/2685-TRS.html), and a complete example of a request-response can be found [here](http://help.incites.clarivate.com/wosWebServicesLite/WebServiceOperationsGroup/WebServiceOperations/g1/authenticate.html)

In [50]:
# if the authentication was successful
if flag:
  for itm in response.headers.items():
      if 'Cookie' in itm[0]:
        SID = itm[1]
        print ('Session ID:', SID[4:])

('Session ID:', '7FPLUQcqTl3HHucVsLy')
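The cell above slices off the first four characters of the cookie value to get the session ID. As an alternative sketch, the cookie can be parsed by name, which also copes with quoted values or trailing attributes. The helper name `extract_sid` is an assumption, and the code is written for Python 3 (`http.cookies`).

```python
from http.cookies import SimpleCookie

def extract_sid(set_cookie_value):
    """Return the value of the SID cookie from a Set-Cookie header, or None if absent."""
    cookie = SimpleCookie()
    cookie.load(set_cookie_value)
    morsel = cookie.get("SID")
    return morsel.value if morsel is not None else None

# Cookie value in the same shape as the session ID printed above
print(extract_sid("SID=7FPLUQcqTl3HHucVsLy"))
```

The full `SID=...` string is still what the later cells place in the `Cookie` header, so this helper is only needed when the bare ID is wanted.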


# Ready to get research data from Web of Knowledge

## Client side
This requires three steps:
1. Set the endpoint URL
2. Set the header of the request. Make sure to include a Cookie entry with the SID value (session ID)
3. Prepare your data request. In this example, we're trying to get data related to our institution (King Fahd University of Petroleum and Minerals)

## Server side
The response will be a set of records (maximum 100 records) if our request was successful. Otherwise, we should receive a server error.




In [0]:
# prepare the service endpoint URL
Qrul = 'http://search.webofknowledge.com/esti/wokmws/ws/WokSearchLite?wsdl'

In [0]:
# set header with session id (SID) value
headers = {'content-type': 'application/json', 'Cookie': SID}

In [0]:
# Write a proper query: This is an institutional level query
Dquery = """<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
 <soap:Header/>
 <soap:Body>
 <ns2:search xmlns:ns2="http://woksearchlite.v3.wokmws.thomsonreuters.com">
 <queryParameters>
 <databaseId>WOS</databaseId>
 <editions>
 <collection>WOS</collection>
 <edition>SCI</edition>
 </editions>
 <editions>
 <collection>WOS</collection>
 <edition>ISTP</edition>
 </editions>
  <editions>
 <collection>WOS</collection>
 <edition>SSCI</edition>
 </editions>
 <editions>
 <collection>WOS</collection>
 <edition>AHCI</edition>
 </editions>
 <editions>
 <collection>WOS</collection>
 <edition>ISSHP</edition>
 </editions>
  <editions>
 <collection>WOS</collection>
 <edition>ESCI</edition>
 </editions>
 <queryLanguage>en</queryLanguage>
 <timeSpan>
 <begin>1900-01-01</begin>
 <end>2019-10-26</end>
 </timeSpan>
 <userQuery>OG=King Fahd Univ*</userQuery>
 </queryParameters>
 <retrieveParameters>
 <count>100</count>
 <firstRecord>1</firstRecord>
 </retrieveParameters>
 </ns2:search>
 </soap:Body>
</soap:Envelope>"""

# Web of Science Core Collection Editions
The database ID should be 'WOS', and there are several editions, as shown in the [table](http://help.incites.clarivate.com/wosWebServicesLite/dbEditionsOptionsGroup/databaseEditionsWos.html)
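Because the search envelope repeats one `<editions>` block per edition, it can be generated from a list instead of being written by hand. The sketch below assumes a hypothetical helper `build_search_envelope`. The element names and namespaces are copied from the query above, and the values are XML-escaped before insertion.

```python
from xml.sax.saxutils import escape

def build_search_envelope(user_query, editions, begin, end, count=100, first_record=1):
    """Build a WokSearchLite search envelope for the WOS database."""
    # One <editions> block per requested edition, all in the WOS collection
    edition_xml = "".join(
        "<editions><collection>WOS</collection><edition>{}</edition></editions>".format(escape(e))
        for e in editions
    )
    template = (
        '<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">'
        "<soap:Header/><soap:Body>"
        '<ns2:search xmlns:ns2="http://woksearchlite.v3.wokmws.thomsonreuters.com">'
        "<queryParameters><databaseId>WOS</databaseId>"
        "{editions}"
        "<queryLanguage>en</queryLanguage>"
        "<timeSpan><begin>{begin}</begin><end>{end}</end></timeSpan>"
        "<userQuery>{query}</userQuery></queryParameters>"
        "<retrieveParameters><count>{count}</count>"
        "<firstRecord>{first}</firstRecord></retrieveParameters>"
        "</ns2:search></soap:Body></soap:Envelope>"
    )
    return template.format(
        editions=edition_xml, begin=escape(begin), end=escape(end),
        query=escape(user_query), count=int(count), first=int(first_record),
    )

example_search = build_search_envelope(
    "OG=King Fahd Univ*", ["SCI", "SSCI"], "1900-01-01", "2019-10-26"
)
print(example_search.count("<editions>"))
```

The returned string can be passed as `data=` to `requests.post` in the same way as `Dquery`.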


In [64]:
# send a query and check if it is successful
response = requests.post(Qrul,data=Dquery,headers=headers)

# check successful: status_code is always truthy, so compare it with 200 explicitly
if response.status_code == 200:
  print('response received')
  flag = True
else:
  flag = False
  print('Query failed')


response received


# parse response using xmltodict


In [0]:
# Convert successful response to dictionary
if flag:
  x= xmltodict.parse(response.content)
else:
  print('Not successful')

In [0]:
# serialize the parsed dictionary to a JSON string
jdumps = json.dumps(x,indent=1)

In [0]:
# load the JSON string back into plain Python dicts and lists
jDict= json.loads(jdumps)

# Response Fields
Since each query returns at most 100 records, we need to send multiple requests to get all of our data. For this, the WSDL describes a retrieve request that carries the queryId returned by the search and sets the 'firstRecord' tag to the start of the next 100 records.

To extract the queryId, we can execute the following code:



In [69]:
# get query Id:
QueryId = jDict["soap:Envelope"]["soap:Body"]["ns2:searchResponse"]["return"]["queryId"]
print(QueryId)

1


# Retrieve Request
This request is used to retrieve subsequent data from the previous query. It should therefore contain the queryId, the firstRecord, and the number of records to retrieve next.

In [0]:
# Retrieve envelope: the queryId from the search response is spliced into the XML
Dquery= """<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
<soap:Body>
  <ns2:retrieve xmlns:ns2="http://woksearchlite.v3.wokmws.thomsonreuters.com">
    
    <queryId>""" + str(QueryId) + """</queryId>
    
    <retrieveParameters>
       <firstRecord>101</firstRecord>
       <count>100</count>
    </retrieveParameters>
    
  </ns2:retrieve>
</soap:Body>
</soap:Envelope>"""
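Since the retrieve request changes only in its queryId and firstRecord, it can be produced by a small function. This is a sketch under that assumption. `build_retrieve_envelope` is a hypothetical name, and the XML layout follows the envelope above.

```python
def build_retrieve_envelope(query_id, first_record, count=100):
    """Build a WokSearchLite retrieve envelope for one page of an earlier search."""
    template = (
        '<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">'
        "<soap:Body>"
        '<ns2:retrieve xmlns:ns2="http://woksearchlite.v3.wokmws.thomsonreuters.com">'
        "<queryId>{query_id}</queryId>"
        "<retrieveParameters>"
        "<firstRecord>{first_record}</firstRecord>"
        "<count>{count}</count>"
        "</retrieveParameters>"
        "</ns2:retrieve>"
        "</soap:Body>"
        "</soap:Envelope>"
    )
    # int() accepts the string values produced by xmltodict and rejects anything non-numeric
    return template.format(
        query_id=int(query_id), first_record=int(first_record), count=int(count)
    )

example_retrieve = build_retrieve_envelope("1", 101)
print(example_retrieve)
```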


In [29]:
# send the retrieve request and check if it is successful
response = requests.post(Qrul,data=Dquery,headers=headers)

# check successful: status_code is always truthy, so compare it with 200 explicitly
if response.status_code == 200:
  print('response received')
  flag = True
else:
  flag = False
  print('Retrieve request failed')

response received


In [70]:
# We can get the number of records found. A loop can therefore be developed to retrieve
# all data belonging to an institution using retrieve.
foundrecords = jDict["soap:Envelope"]["soap:Body"]["ns2:searchResponse"]["return"]["recordsFound"]
print(foundrecords)

19310
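The loop mentioned above needs the `firstRecord` value for every page. A minimal sketch, assuming pages of 100 records and a hypothetical helper name `page_starts`:

```python
def page_starts(records_found, page_size=100):
    """Return the 1-based firstRecord value of every page of results."""
    # recordsFound arrives as a string from the parsed XML, hence the int() cast
    return range(1, int(records_found) + 1, page_size)

starts = list(page_starts("19310"))
print(len(starts), starts[:3], starts[-1])
```

The search request itself already returned the page starting at record 1, so a retrieve loop would begin with the second value, 101.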


In [71]:
# This is auxiliary information: the number of records the search process looked through
searchedrecords = jDict["soap:Envelope"]["soap:Body"]["ns2:searchResponse"]["return"]["recordsSearched"]
print(searchedrecords)


71592216


In [72]:
# You can access a record at a time using the following format:
jDict["soap:Envelope"]["soap:Body"]["ns2:searchResponse"]["return"]["records"][1]

{u'authors': {u'label': u'Authors',
  u'value': [u'KHONDAKER, AN', u'ALLAYLA, RI', u'HUSAIN, T']},
 u'doctype': {u'label': u'Doctype', u'value': u'Review'},
 u'keywords': {u'label': u'Keywords',
  u'value': [u'AQUIFER PROPERTIES',
   u'CONTAMINANT TRANSPORT',
   u'GROUNDWATER CONTAMINATION',
   u'POROUS MEDIA',
   u'SOLUTE TRANSPORT',
   u'SUBSURFACE POLLUTION',
   u'TRANSPORT PARAMETERS',
   u'TRANSPORT THEORY']},
 u'other': [{u'label': u'Identifier.Ids', u'value': u'EQ660'},
  {u'label': u'Identifier.Issn', u'value': u'1040-838X'},
  {u'label': u'Identifier.Xref_Doi', u'value': u'10.1080/10643389009388399'},
  {u'label': u'ResearcherID.Disclaimer',
   u'value': u'ResearcherID data provided by Clarivate Analytics'}],
 u'source': [{u'label': u'Issue', u'value': u'4'},
  {u'label': u'Pages', u'value': u'231-256'},
  {u'label': u'Published.BiblioYear', u'value': u'1990'},
  {u'label': u'SourceTitle',
   u'value': u'CRITICAL REVIEWS IN ENVIRONMENTAL CONTROL'},
  {u'label': u'Volume', u'va

In [39]:
# Parse the content and get only author names
jDict["soap:Envelope"]["soap:Body"]["ns2:searchResponse"]["return"]["records"][1]["authors"]["value"]

[u'KHONDAKER, AN', u'ALLAYLA, RI', u'HUSAIN, T']

In [40]:
# Parse the content and get only research title
jDict["soap:Envelope"]["soap:Body"]["ns2:searchResponse"]["return"]["records"][1]["title"]["value"]

u'GROUNDWATER CONTAMINATION STUDIES - THE STATE-OF-THE-ART'
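Fields such as `source` and `other` are returned as lists of label/value pairs rather than as direct keys. The sketch below shows one way to turn such a list into a dictionary for easier lookup. The helper name `labelled_values` is an assumption, and the sample data is copied from the record shown above.

```python
def labelled_values(items):
    """Turn [{'label': ..., 'value': ...}, ...] into a {label: value} dictionary."""
    # xmltodict yields a single dict instead of a list when only one element is present
    if isinstance(items, dict):
        items = [items]
    return {item["label"]: item["value"] for item in items}

# Entries taken from the 'source' field of the record printed earlier
sample_source = [
    {"label": "Issue", "value": "4"},
    {"label": "Pages", "value": "231-256"},
    {"label": "Published.BiblioYear", "value": "1990"},
    {"label": "SourceTitle", "value": "CRITICAL REVIEWS IN ENVIRONMENTAL CONTROL"},
]
source_info = labelled_values(sample_source)
print(source_info["Published.BiblioYear"])
```

In the notebook, the same call could be applied to `record["source"]` or `record["other"]` for any entry of the `records` list.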

# Close session
Once you're done with the data, you should close the session. To close an active session, we need to send a close request with the SID added to the header.

In [0]:
# Endpoint 
Aurl="http://search.webofknowledge.com/esti/wokmws/ws/WOKMWSAuthenticate?WSDL"

# Header content
headers = {'content-type': 'application/json', 'Cookie': SID}
# request data
Closerequest="""<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:auth="http://auth.cxf.wokmws.thomsonreuters.com">
   <soapenv:Header/>
   <soapenv:Body>
      <auth:closeSession/>
   </soapenv:Body>
</soapenv:Envelope>"""

In [74]:
# send the close request and check if it is successful
response = requests.post(Aurl,data=Closerequest,headers=headers)

# check successful: status_code is always truthy, so compare it with 200 explicitly
if response.status_code == 200:
  print("session closed: ", response.text)
else:
  print('close request failed, the session may already be closed')

('session closed: ', u'<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"><soap:Body><ns2:closeSessionResponse xmlns:ns2="http://auth.cxf.wokmws.thomsonreuters.com"/></soap:Body></soap:Envelope>')
