### Notes from Edwin

Henneken, Edwin

May 3, 2022
	

Hi Ryan

I created a sample script that queries the ADS API. All you need to do is replace TOKEN by your ADS API token in this line

APItoken = 'TOKEN'

You can find your token here (when you're logged in to your ADS account): https://ui.adsabs.harvard.edu/user/settings/token

The file takes an input ASCII file (I attached an example) which has on each line a file name and an ADS query, separated by a tab. The file name will be the name of the TSV file that will hold the results of the query; a date string will be included in the name of the TSV file, so if you execute this script, say, once a month, previous output won't be overwritten. In its current version, the TSV file has 6 columns:

1: bibcode
2: DOI
3: first author
4: article title
5: refereed status ("1" is refereed, "0" is nonrefereed)
6: open access status ("1" is open access, "0" is not open access)

The script is executed like

  python doADS_API_Query.py queries.txt

I hope this is useful.

--Edwin



### Open Questions

 1. How do we form queries over terms not yet in UAT?
 2. 
 
 
 
 

### Imports

In [1]:
# Import standard Python libraries
# import urllib2
import urllib
import urllib3
from urllib.parse import urlencode
import requests
import json
import sys
import math
import csv
from datetime import datetime

import pandas as pd


### Helper functions


In [88]:


# Execute a search query
    # Ryan method
def do_query(URL, params):
    qparams = urlencode(params)    
    data = requests.get("{}?{}".format(URL,qparams),\
                headers={'Content-type': 'application/json',
                         'Accept': 'text/plain',
                         'Authorization': 'Bearer ' + APItoken})
    
    data = data.json()
    return data

#     # Edwin method
# def do_query(URL, params):
#     qparams = urllib.parse.urlencode(params)
#     req = urllib.request.Request("%s?%s"%(URL, qparams))
#     # and add the correct header information
#     req.add_header('Content-type', 'application/json')
#     req.add_header('Accept', 'text/plain')
#     req.add_header('Authorization', 'Bearer %s' % APItoken)
#     # do the actual request
#     resp = urllib.request.urlopen(req)
#     # and retrieve the data to work with
#     data = json.load(resp)
#     return data

# Get records from Solr
def get_records(token, query_string, return_fields):
    start = 0
    results = []
    params = {
        'q':query_string,
        'fl': return_fields,
        'rows': rows,
        'start': start
    }
    data = do_query(QUERY_URL, params)
    try:
        results = data['response']['docs']
    except:
        raise Exception('Solr returned unexpected data!')
    num_documents = int(data['response']['numFound'])
    num_paginates = int(math.ceil((num_documents) / (1.0*rows))) - 1
    start += rows
    for i in range(num_paginates):
        params['start'] = start
        data = do_query(QUERY_URL, params)
        try:
            results += data['response']['docs']
        except:
            raise Exception('Solr returned unexpected data!')
        start += rows
    return results



### Code to run queries in 'queries.txt'

In [89]:
## Input parameters

# NOTE: 
#     you must enter your Bumblebee API token here
#     You can find your token here (when you're logged in to your ADS account): https://ui.adsabs.harvard.edu/user/settings/token
APItoken = 

# Address of API
API_URL = 'https://api.adsabs.harvard.edu/v1'
QUERY_URL = "{}/search/query".format(API_URL)


# Query parameters
# The number of records to be returned in Solr query
rows = 300


# What data do we need back from Solr
fields = "bibcode,doi,first_author_norm,title,property"


# date string for output file
dstring = datetime.today().strftime('%Y%m%d')


# ## Get the query for which we want the data
# try:
#     query_file = sys.argv[1]
# except:
#     sys.exit('Please provide name for file with queries as argument...')

query_file = 'queries.txt'
# query_input = 'active_solar_chromosphere    keyword_schema:UAT keyword:1980' #RMM: manually input a sample query here
    

### Main part of script
# get the entries in the input file
try:
    queries = open(query_file).read().strip().split('\n')
except Exception:
    sys.exit('Failed to get queries from file')

    
# Now execute the queries
for entry in queries:
    # ignore comment lines
    if entry.startswith('#'):
        continue
    # get file name and query
    fname, query = entry.split('\t')
    # retrieve the records found by the query
    try:
        pubdata = get_records(APItoken, query, fields)
    except Exception:
        sys.exit('Failed to get results for query provided')
    # determine output file
    ofile = "{0}_{1}.tsv".format(fname, dstring)
    # save some data in the records retrieved to the TSV file
    with open(ofile, 'w') as out_file:
        tsv_writer = csv.writer(out_file, delimiter='\t')
        row = []
        row.append('bibcode')
        row.append('doi')
        row.append('first_author_norm')
        row.append('title')
        row.append('refstatus')
        row.append('openaccess')
        tsv_writer.writerow(row)
        for entry in pubdata:
            properties = entry['property']
            refstatus = 1
            if 'REFEREED' not in properties:
                refstatus = 0
            openaccess = 1
            if 'OPENACCESS' not in properties:
                openaccess = 0
            row = []
            row.append(entry['bibcode'])
            row.append(entry['doi'][0])
            row.append(entry['first_author_norm'])
            row.append(entry['title'][0].encode('utf-8'))
            row.append(refstatus)
            row.append(openaccess)
            tsv_writer.writerow(row)

In [90]:
# Read the output file
query_outputs = pd.read_csv(ofile,delimiter='\t')#,header=0)#columns=['bibcode','doi','first_author_norm','title'])

In [91]:
query_outputs

Unnamed: 0,bibcode,doi,first_author_norm,title,refstatus,openaccess
0,2021ApJ...912..153K,10.3847/1538-4357/abf42d,"Kerr, G",b'He I 10830 \xc3\x85 Dimming during Solar Fla...,1,1
1,2020ApJ...904...15Y,10.3847/1538-4357/abba81,"Yan, X",b'Dynamics Evolution of a Solar Active-region ...,1,1
2,2020ApJ...889...65Z,10.3847/1538-4357/ab621f,"Zuccarello, F","b'Continuum Enhancements, Line Profiles, and M...",1,1
3,2020ApJ...898..144K,10.3847/1538-4357/aba117,"Kontogiannis, I",b'High-resolution Spectroscopy of an Erupting ...,1,1
4,2020ApJ...891...91D,10.3847/1538-4357/ab6bc9,"del Pino Aleman, T",b'The Magnetic Sensitivity of the Resonance an...,1,1
5,2020ApJ...890...96M,10.3847/1538-4357/ab6664,"Murabito, M",b'Penumbral Brightening Events Observed in AR ...,1,1
6,2019ApJ...885..119K,10.3847/1538-4357/ab48ea,"Kerr, G",b'Modeling Mg II during Solar Flares. II. None...,1,1
7,2019ApJ...883...57K,10.3847/1538-4357/ab3c24,"Kerr, G",b'Modeling Mg II During Solar Flares. I. Parti...,1,1
8,2020ApJ...890...32S,10.3847/1538-4357/ab65ec,"Schad, A",b'Inference of Solar Rotation from Perturbatio...,1,1
9,2020ApJ...904...95Z,10.3847/1538-4357/abb77c,"Zhou, Y",b'Spectroscopic Observations of High-speed Dow...,1,1


### Compiling queries

[running list of queries for Heliophysics](https://docs.google.com/spreadsheets/d/1R1flxY5j8MrdMzPZVPrwx6hDJ4nm63jDw1gZnGZwQ5w/edit?usp=sharing)

In [None]:
# NOTE: below should be put in a queries.txt file to be run

# TODO: the URLs for the queries should be update with the appropriate SOLR syntax to ensure consistency (e.g., if we query a specific UAT keyword)

# each line: two entries separated by tab
# first column: name of output file
# second column: ADS query
coronal_mass_ejection    keyword_schema:UAT keyword:310
        #https://ui.adsabs.harvard.edu/search/fl=identifier%2C%5Bcitations%5D%2Cabstract%2Cauthor%2Cbook_author%2Corcid_pub%2Corcid_user%2Corcid_other%2Cbibcode%2Ccitation_count%2Ccomment%2Cdoi%2Cid%2Ckeyword%2Cpage%2Cproperty%2Cpub%2Cpub_raw%2Cpubdate%2Cpubnote%2Cread_count%2Ctitle%2Cvolume%2Clinks_data%2Cesources%2Cdata%2Ccitation_count_norm%2Cemail%2Cdoctype&q=keyword_schema%3AUAT%20keyword%3A310&rows=25&sort=date%20desc%2C%20bibcode%20desc&start=0&p_=0
coronal_mass_ejection_full    full:"coronal mass ejection"    
        #https://ui.adsabs.harvard.edu/search/fl=identifier%2C%5Bcitations%5D%2Cabstract%2Cauthor%2Cbook_author%2Corcid_pub%2Corcid_user%2Corcid_other%2Cbibcode%2Ccitation_count%2Ccomment%2Cdoi%2Cid%2Ckeyword%2Cpage%2Cproperty%2Cpub%2Cpub_raw%2Cpubdate%2Cpubnote%2Cread_count%2Ctitle%2Cvolume%2Clinks_data%2Cesources%2Cdata%2Ccitation_count_norm%2Cemail%2Cdoctype&q=full%3A%22coronal%20mass%20ejection%22&rows=25&sort=date%20desc%2C%20bibcode%20desc&start=0&p_=0
    
solar_wind    keyword_schema:UAT keyword:1534
        #https://ui.adsabs.harvard.edu/search/fl=identifier%2C%5Bcitations%5D%2Cabstract%2Cauthor%2Cbook_author%2Corcid_pub%2Corcid_user%2Corcid_other%2Cbibcode%2Ccitation_count%2Ccomment%2Cdoi%2Cid%2Ckeyword%2Cpage%2Cproperty%2Cpub%2Cpub_raw%2Cpubdate%2Cpubnote%2Cread_count%2Ctitle%2Cvolume%2Clinks_data%2Cesources%2Cdata%2Ccitation_count_norm%2Cemail%2Cdoctype&q=keyword_schema%3AUAT%20keyword%3A1534&rows=25&sort=date%20desc%2C%20bibcode%20desc&start=0&p_=0
solar_wind_full    full:"solar wind"    
        #https://ui.adsabs.harvard.edu/search/fl=identifier%2C%5Bcitations%5D%2Cabstract%2Cauthor%2Cbook_author%2Corcid_pub%2Corcid_user%2Corcid_other%2Cbibcode%2Ccitation_count%2Ccomment%2Cdoi%2Cid%2Ckeyword%2Cpage%2Cproperty%2Cpub%2Cpub_raw%2Cpubdate%2Cpubnote%2Cread_count%2Ctitle%2Cvolume%2Clinks_data%2Cesources%2Cdata%2Ccitation_count_norm%2Cemail%2Cdoctype&q=full%3A%22solar%20wind%22%20&rows=25&sort=date%20desc%2C%20bibcode%20desc&start=0&p_=0

ionospheric_conductivity_full    full:"ionospheric_conductivity"  # not in UAT, closest is 'Earth Ionosphere (860)' 
        #https://ui.adsabs.harvard.edu/search/fl=identifier%2C%5Bcitations%5D%2Cabstract%2Cauthor%2Cbook_author%2Corcid_pub%2Corcid_user%2Corcid_other%2Cbibcode%2Ccitation_count%2Ccomment%2Cdoi%2Cid%2Ckeyword%2Cpage%2Cproperty%2Cpub%2Cpub_raw%2Cpubdate%2Cpubnote%2Cread_count%2Ctitle%2Cvolume%2Clinks_data%2Cesources%2Cdata%2Ccitation_count_norm%2Cemail%2Cdoctype&q=full%3A%22ionospheric_conductivity%22&rows=25&sort=date%20desc%2C%20bibcode%20desc&start=0&p_=0

space_weather    keyword_schema:UAT keyword:2037 
        #https://ui.adsabs.harvard.edu/search/q=keyword_schema%3AUAT%20keyword%3A2037%20&sort=date%20desc%2C%20bibcode%20desc&p_=0
        #NOTE: stand to gain much from this query - at the start, space weather is poorly indexed in ADS
        
geomagnetically_induced_current_full    full:"geomagnetically induced current" # not in UAT, closest is 'Geomagnetic fields(646)'
        #https://ui.adsabs.harvard.edu/search/fl=identifier%2C%5Bcitations%5D%2Cabstract%2Cauthor%2Cbook_author%2Corcid_pub%2Corcid_user%2Corcid_other%2Cbibcode%2Ccitation_count%2Ccomment%2Cdoi%2Cid%2Ckeyword%2Cpage%2Cproperty%2Cpub%2Cpub_raw%2Cpubdate%2Cpubnote%2Cread_count%2Ctitle%2Cvolume%2Clinks_data%2Cesources%2Cdata%2Ccitation_count_norm%2Cemail%2Cdoctype&q=full%3A%22geomagnetically%20induced%20current%22&rows=25&sort=date%20desc%2C%20bibcode%20desc&start=0&p_=0
        
compound_swmc    full:("solar wind" AND magnetosphere AND coupling)
        #https://ui.adsabs.harvard.edu/search/fl=identifier%2C%5Bcitations%5D%2Cabstract%2Cauthor%2Cbook_author%2Corcid_pub%2Corcid_user%2Corcid_other%2Cbibcode%2Ccitation_count%2Ccomment%2Cdoi%2Cid%2Ckeyword%2Cpage%2Cproperty%2Cpub%2Cpub_raw%2Cpubdate%2Cpubnote%2Cread_count%2Ctitle%2Cvolume%2Clinks_data%2Cesources%2Cdata%2Ccitation_count_norm%2Cemail%2Cdoctype&q=full%3A%22geomagnetically%20induced%20current%22&rows=25&sort=date%20desc%2C%20bibcode%20desc&start=0&p_=0
        
compound_mic    full:(magnetosphere AND ionosphere AND coupling)
        #https://ui.adsabs.harvard.edu/search/fl=identifier%2C%5Bcitations%5D%2Cabstract%2Cauthor%2Cbook_author%2Corcid_pub%2Corcid_user%2Corcid_other%2Cbibcode%2Ccitation_count%2Ccomment%2Cdoi%2Cid%2Ckeyword%2Cpage%2Cproperty%2Cpub%2Cpub_raw%2Cpubdate%2Cpubnote%2Cread_count%2Ctitle%2Cvolume%2Clinks_data%2Cesources%2Cdata%2Ccitation_count_norm%2Cemail%2Cdoctype&q=full%3A(magnetosphere%20AND%20ionosphere%20AND%20coupling)&rows=25&sort=date%20desc%2C%20bibcode%20desc&start=0&p_=0
        # NOTE: this one might be particularly useful to gauge change because at the outset this does not return good results in ADS
        
compound_imfr    full:("interplanetary magnetic field" AND reconnection) 
        #https://ui.adsabs.harvard.edu/search/fl=identifier%2C%5Bcitations%5D%2Cabstract%2Cauthor%2Cbook_author%2Corcid_pub%2Corcid_user%2Corcid_other%2Cbibcode%2Ccitation_count%2Ccomment%2Cdoi%2Cid%2Ckeyword%2Cpage%2Cproperty%2Cpub%2Cpub_raw%2Cpubdate%2Cpubnote%2Cread_count%2Ctitle%2Cvolume%2Clinks_data%2Cesources%2Cdata%2Ccitation_count_norm%2Cemail%2Cdoctype&q=full%3A(%22interplanetary%20magnetic%20field%22%20AND%20reconnection)&rows=25&sort=date%20desc%2C%20bibcode%20desc&start=0&p_=0


reconnection    keyword_schema:UAT keyword:1504 # not in UAT, closest is 'Solar magnetic reconnection(1504)''
        #https://ui.adsabs.harvard.edu/search/fl=identifier%2C%5Bcitations%5D%2Cabstract%2Cauthor%2Cbook_author%2Corcid_pub%2Corcid_user%2Corcid_other%2Cbibcode%2Ccitation_count%2Ccomment%2Cdoi%2Cid%2Ckeyword%2Cpage%2Cproperty%2Cpub%2Cpub_raw%2Cpubdate%2Cpubnote%2Cread_count%2Ctitle%2Cvolume%2Clinks_data%2Cesources%2Cdata%2Ccitation_count_norm%2Cemail%2Cdoctype&q=keyword_schema%3AUAT%20keyword%3A1504&rows=25&sort=date%20desc%2C%20bibcode%20desc&start=0&p_=0
reconnection_full    full:"reconnection" # not in UAT, closest is 'Solar magnetic reconnection(1504)''
        #https://ui.adsabs.harvard.edu/search/fl=identifier%2C%5Bcitations%5D%2Cabstract%2Cauthor%2Cbook_author%2Corcid_pub%2Corcid_user%2Corcid_other%2Cbibcode%2Ccitation_count%2Ccomment%2Cdoi%2Cid%2Ckeyword%2Cpage%2Cproperty%2Cpub%2Cpub_raw%2Cpubdate%2Cpubnote%2Cread_count%2Ctitle%2Cvolume%2Clinks_data%2Cesources%2Cdata%2Ccitation_count_norm%2Cemail%2Cdoctype&q=full%3A%22reconnection%22&rows=25&sort=date%20desc%2C%20bibcode%20desc&start=0&p_=0

substorm    full:"substorm" # not in UAT, nothing close
        #https://ui.adsabs.harvard.edu/search/fl=identifier%2C%5Bcitations%5D%2Cabstract%2Cauthor%2Cbook_author%2Corcid_pub%2Corcid_user%2Corcid_other%2Cbibcode%2Ccitation_count%2Ccomment%2Cdoi%2Cid%2Ckeyword%2Cpage%2Cproperty%2Cpub%2Cpub_raw%2Cpubdate%2Cpubnote%2Cread_count%2Ctitle%2Cvolume%2Clinks_data%2Cesources%2Cdata%2Ccitation_count_norm%2Cemail%2Cdoctype&q=full%3A%22substorm%22&rows=25&sort=date%20desc%2C%20bibcode%20desc&start=0&p_=0
        #NOTE: this is a critical query as it is important and the results are not relevant in the current ADS
    
particle_acceleration    keyword_schema:UAT keyword:826 # not fully covered in UAT, closest is 'Interplanetary particle acceleration(826)'
        #https://ui.adsabs.harvard.edu/search/q=keyword_schema%3AUAT%20keyword%3A826&sort=date%20desc%2C%20bibcode%20desc&p_=0
particle_acceleration_full    full:"particle acceleration" 
        #https://ui.adsabs.harvard.edu/search/q=full%3A%22particle%20acceleration%22&sort=date%20desc%2C%20bibcode%20desc&p_=0

network_similar    similar(bibcode:2015AdSpR..55.2745S)
        #https://ui.adsabs.harvard.edu/search/q=similar(bibcode%3A2015AdSpR..55.2745S)&sort=score%20desc%2C%20bibcode%20desc&p_=0
        #NOTE: the 'similar' second order operator: The collated text of all the abstracts returned by the inner query (if this is more than 200, only the top 200 are taken, according to score) are compared with each abstract in the ADS database

network_useful_Parker    useful(topn(200,similar(1958ApJ...128..664P)))
        #https://ui.adsabs.harvard.edu/search/q=useful(topn(200%2Csimilar(1958ApJ...128..664P)))&sort=score%20desc%2C%20bibcode%20desc&p_=0
    
network_useful_Dungey    useful(topn(200,similar(1961PhRvL...6...47D)))
        #https://ui.adsabs.harvard.edu/search/q=useful(topn(200%2Csimilar(1961PhRvL...6...47D)))&sort=score%20desc%2C%20bibcode%20desc&p_=0

network_trending    trending(keyword_schema:UAT keyword:2037)
        #https://ui.adsabs.harvard.edu/search/q=trending(keyword_schema%3AUAT%20keyword%3A2037)&sort=score%20desc%2C%20bibcode%20desc&p_=0

Evaluation notes

1. Sheer volume returned
2. How to subset the results (top hits? some random selection?)
3. Recall and precision
4. Expert review (how would we make this consitent?)
5. ...?
    