# <span style="color:darkblue"> Lecture 5b - APIs and JSON </span>

<font size = "5">

In this lecture we will cover the basics of dictionaries and <br>
interacting with web-based APIs

<font size = "5">

Import packages

In [1]:
import requests      # Requests data from an API
import json          # Handles JSON objects
import jmespath      # Allows you to more easily navigate JSON objects
import pandas as pd  # Handles data frames

# <span style="color:darkblue"> I. Best Coding Practices </span>

<font size = "5">

Splitting a command into multiple lines

<font size = "3">

- Sometimes strings are very long, e.g. URLs
- Best practice: keep each line of code to at most 80 characters
- It is easier to read code if you scroll vertically <br>
rather than sideways

In [2]:
# You can split commands into multiple lines with a backslash
example_backslash = "This is a very long string " \
                    + "and I would like to break it down into multiple lines"

# You can also split commands into multiple lines by wrapping them in parentheses
example_parenthesis = ("This is a very long string "
                       + "and I would like to break it down into multiple lines")
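
<font size = "3">

Python also joins adjacent string literals automatically when they sit inside parentheses, which avoids both the backslash and the "+". The following is a minimal sketch of that idea; the variable name `example_adjacent` is only illustrative.

```python
# Adjacent string literals inside parentheses are joined automatically,
# so no "+" or backslash is needed. Note the trailing space in the first
# piece: Python does not add one for you.
example_adjacent = (
    "This is a very long string "
    "and I would like to break it down into multiple lines"
)

print(example_adjacent)
print(len(example_adjacent))   # number of characters in the joined string
```

This style is often convenient for long URLs, since each piece can stay under 80 characters.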


<font size = "5">

Examples of Dictionary Formatting

In [3]:
# Define dictionary
dictionary_student = \
{
    "first_name": "John",
    "last_name" : "Smith",
    "location"  : "Atlanta",
    "university": "Emory",
}
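
<font size = "3">

Once a dictionary is defined, values are retrieved by key. The sketch below shows three common ways to read from the dictionary above; the key `"major"` is a made-up example of a key that is not present.

```python
dictionary_student = {
    "first_name": "John",
    "last_name" : "Smith",
    "location"  : "Atlanta",
    "university": "Emory",
}

# Square brackets look up a value by its key
print(dictionary_student["first_name"])

# .get() returns a default value instead of raising a KeyError
# when the key is missing ("major" is a hypothetical missing key)
print(dictionary_student.get("major", "not recorded"))

# .items() yields (key, value) pairs, in insertion order
for key, value in dictionary_student.items():
    print(f"{key}: {value}")
```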


<font size = "5">
Nested dictionaries + Lists

In [4]:
dictionary_student_wgrades =\
{
    "first_name": "John",
    "last_name" : "Smith",
    "location"  : "Atlanta",
    "university": "Emory",
    "grades":{
        "computing":"A",
        "reasoning":"B",
        "applied":"A-"
    }
}

dictionary_liststudents = \
[{
    "first_name": "Ned",
    "last_name" : "Stark",
    "location"  : "Atlanta",
    "university": "Emory",
    
 },
 {
    "first_name": "Cersei",
    "last_name" : "Lannister",
    "location"  : "Decatur",
    "university": "Georgia Tech",
 }
]
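
<font size = "3">

Nested structures are read by chaining lookups: one pair of square brackets per level. The following sketch uses shortened copies of the objects above (the names `student` and `students` are illustrative).

```python
# Shortened stand-ins for the nested dictionary and the list of dictionaries
student = {
    "first_name": "John",
    "grades": {"computing": "A", "reasoning": "B", "applied": "A-"},
}
students = [
    {"first_name": "Ned", "university": "Emory"},
    {"first_name": "Cersei", "university": "Georgia Tech"},
]

# Dictionary inside a dictionary: first "grades", then "computing"
grade = student["grades"]["computing"]
print(grade)

# Dictionary inside a list: first the position, then the key
second_university = students[1]["university"]
print(second_university)
```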

<font size = "5">

Convert dictionaries to JSON

In [5]:
# Convert dictionary to JSON and back
## json.dumps() converts the dictionary to a JSON-formatted string
## json.loads() parses this string back into Python objects (dictionaries and lists)

json_student         = json.loads(json.dumps(dictionary_student ))
json_student_wgrades = json.loads(json.dumps(dictionary_student_wgrades))
json_liststudents    = json.loads(json.dumps(dictionary_liststudents))
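
<font size = "3">

To see what each step of the round trip produces, it can help to inspect the types. This is a small sketch with a shortened dictionary; the `"scores"` example is an illustrative addition showing that JSON has no tuple type.

```python
import json

dictionary_student = {"first_name": "John", "last_name": "Smith"}

# Step 1: dictionary -> JSON-formatted text
json_text = json.dumps(dictionary_student)
print(type(json_text))    # a Python string

# Step 2: JSON-formatted text -> Python objects
round_trip = json.loads(json_text)
print(type(round_trip))   # a Python dictionary again

# For plain dictionaries the round trip gives back an equal object
print(round_trip == dictionary_student)

# Tuples come back as lists, because JSON only has arrays
with_tuple = json.loads(json.dumps({"scores": (1, 2)}))
print(with_tuple)
```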


<font size = "5">

Navigating a JSON file by name

In [6]:
# The search command works down the hierarchy: first "grades", then
# "computing". The "." separates the keys at different levels.

extract_info1 = jmespath.search('grades.computing', json_student_wgrades)
print(extract_info1)

A


<font size = "5">

Navigating a JSON list by position

In [7]:
extract_info2 = jmespath.search('[0]', json_liststudents)
print(extract_info2)

extract_info3 = jmespath.search('[0].first_name', json_liststudents)
print(extract_info3)


{'first_name': 'Ned', 'last_name': 'Stark', 'location': 'Atlanta', 'university': 'Emory'}
Ned


<font size = "5">

Extracting elements at all sublevels

In [8]:
# This command extracts the value of "first_name" for all students.
# The [*] wildcard applies the rest of the query to every element of the list.

extract_info3 = jmespath.search('[*].first_name', json_liststudents)
print(extract_info3)


['Ned', 'Cersei']
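
<font size = "3">

For comparison, the three queries above can be written in plain Python. This sketch uses shortened copies of the same objects and assumes every key is present; one difference worth knowing is that `jmespath.search()` returns `None` for a missing key, while square brackets raise a `KeyError`.

```python
# Shortened stand-ins for the objects used above
json_student_wgrades = {
    "first_name": "John",
    "grades": {"computing": "A", "reasoning": "B", "applied": "A-"},
}
json_liststudents = [
    {"first_name": "Ned", "last_name": "Stark"},
    {"first_name": "Cersei", "last_name": "Lannister"},
]

# Equivalent of the query 'grades.computing'
info_by_name = json_student_wgrades["grades"]["computing"]

# Equivalent of the query '[0].first_name'
info_by_position = json_liststudents[0]["first_name"]

# Equivalent of the query '[*].first_name'
info_all = [student["first_name"] for student in json_liststudents]

print(info_by_name, info_by_position, info_all)
```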


# <span style="color:darkblue"> II. Accessing an API </span>

<font size = "5">

About the example

- OpenAlex is one of the largest open-access repositories <br>
of bibliographic information
- Contains millions of references and metadata on <br>
books, papers, authors, and citation networks

https://docs.openalex.org

<font size = "5">

Define a URL to access the API

``` https://website.com/option=12345```

- Usually it's a domain followed by a field and its value (here, ```option=12345```)
- Many search engines use this type of format <br>
for search queries

In [9]:
# Define a new string variable called "url_openalex_api"
# Notice that we used the backslash to split the command, since
# the URL is quite long.

# In this case we are searching for all the works published 
# by an author whose ID is "A5023888391"

url_openalex_api = \
      "https://api.openalex.org/works?filter=author.id:A5023888391"

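
<font size = "3">

Instead of typing the whole URL by hand, the query part can be assembled from a dictionary with the standard library. The sketch below is one possible way to do this; the names `base_url`, `query`, and `url_built` are illustrative, and the `safe=":"` argument is used so that the colon in the OpenAlex filter is left as-is.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# The part of the URL before the "?" and the field/value pairs after it
base_url = "https://api.openalex.org/works"
query = {"filter": "author.id:A5023888391"}

# urlencode() turns the dictionary into "field=value" text
url_built = base_url + "?" + urlencode(query, safe=":")
print(url_built)

# urlparse() splits a URL back into its components
parts = urlparse(url_built)
print(parts.netloc)            # the domain
print(parts.path)              # the resource being requested
print(parse_qs(parts.query))   # the fields and their values
```

Building the URL from a dictionary makes it easier to change the author ID or add more fields later.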

<font size = "5">

Request data


In [10]:
# Obtain the JSON data of records from the OpenAlex API
# Note: you can break the command across lines using a backslash

search_results = requests.\
    get(url_openalex_api).json()
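
<font size = "3">

Requests can fail (for example, a mistyped URL or a server that is down), so it is often worth checking the response before parsing it. The following is a sketch, not the method used in the lecture: `fetch_json` and its `http_get` argument are hypothetical names, and the ten-second timeout is an arbitrary choice.

```python
def fetch_json(url, timeout=10, http_get=None):
    """Return the parsed JSON body of `url`, raising an error on HTTP failures."""
    if http_get is None:
        # Imported here so the function can be tried offline with a stand-in
        import requests
        http_get = requests.get

    response = http_get(url, timeout=timeout)   # give up after `timeout` seconds
    response.raise_for_status()                 # raise an error for 4xx/5xx codes
    return response.json()                      # parse the body as JSON

# Example call (requires an internet connection):
# search_results = fetch_json(
#     "https://api.openalex.org/works?filter=author.id:A5023888391")
```

The optional `http_get` argument exists only so the function can be exercised without network access by passing in a stand-in function.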


<font size = "5">

Convert to a JSON file for easier readability

In [11]:
# This command opens the file for writing ('w').
# The code in the parentheses of file.write(...) formats the JSON
#     nicely, with four-character indentation.
# Note: the "json_files" folder must already exist.

with open('json_files/convert_search_results.json', 'w') as file:
    file.write(json.dumps(search_results, indent=4))
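
<font size = "3">

The `json` module can also write to and read from a file directly with `json.dump()` and `json.load()`. The sketch below uses a small stand-in for `search_results` and writes to a temporary folder (the folder name `json_files_demo` is made up) so that it runs anywhere.

```python
import json
import os
import tempfile

# Small stand-in for the API response
search_results = {"meta": {"count": 62, "page": 1}, "results": []}

# Create the output folder if it does not exist yet
folder = os.path.join(tempfile.gettempdir(), "json_files_demo")
os.makedirs(folder, exist_ok=True)
path = os.path.join(folder, "convert_search_results.json")

# json.dump() writes the object straight to the open file
with open(path, "w", encoding="utf-8") as file:
    json.dump(search_results, file, indent=4)

# json.load() reads the file back into Python objects
with open(path, "r", encoding="utf-8") as file:
    reloaded = json.load(file)

print(reloaded == search_results)
```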

<font size = "5">

Browsing with the JSON data viewer

<font size = "3">

- Open the ```convert_search_results.json``` file from the ```json_files``` folder in VS Code
- Click on the "View" menu at the top of VS Code and select the <br>
 "Command Palette" option. This will open a pop-up window. <br>
<br>

<img src="figures/command_palette.png" alt="drawing" width="150"/>
<img src="figures/open_jsonviewer.png" alt="drawing" width="350"/>

<br>

- Install the JSON viewer extension for VS Code, if you have not already
- In the Command Palette, search for "Open in JSON viewer" and press Enter. This <br>
should open a new tab with the following layout, which has <br>
convenient drop-down menus: <br>
<br>

<img src="figures/json_viewer.png" alt="drawing" width="400"/>

<font size = "5">

Try it yourself! Experiment with the JSON data viewer. <br>
Diagnose how the data is structured and the different levels.

# <span style="color:darkblue"> III. Navigating the nested structure </span>

<font size = "5">

Browse keys in the "first level"


In [12]:
# The "len" function counts the number of "first-level" keys
print(len(search_results))

# The "keys" method extracts the names of the "first-level" objects
print(search_results.keys())

# Store the keys in a data frame for easy access and visualization
search_results_keys = pd.DataFrame(search_results.keys())


3
dict_keys(['meta', 'results', 'group_by'])
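
<font size = "3">

Beyond the key names, it is useful to know what kind of object sits under each key and how large it is. This sketch loops over a small stand-in for `search_results` (the values are shortened and illustrative) and records the type and length of each first-level entry.

```python
# Small stand-in with the same three first-level keys
search_results = {
    "meta": {"count": 62, "page": 1, "per_page": 25},
    "results": [{"title": "altmetrics: a manifesto"}],
    "group_by": [],
}

summary = {}
for key, value in search_results.items():
    # Only dictionaries and lists have a meaningful number of elements here
    size = len(value) if isinstance(value, (dict, list)) else None
    summary[key] = (type(value).__name__, size)
    print(f"{key}: {type(value).__name__} with {size} element(s)")
```

Dictionaries are browsed by key and lists by position, so knowing the type tells you which kind of lookup to use next.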


<font size = "5">

Browse subdictionary within a particular key

In [13]:
search_results["meta"]

{'count': 62,
 'db_response_time_ms': 111,
 'page': 1,
 'per_page': 25,
 'groups_count': None}
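
<font size = "3">

The "meta" entry shows that the author has 62 works but each page holds only 25, so a single request does not return everything. A quick sketch of the arithmetic, using the values shown above (how to request later pages is described in the OpenAlex documentation):

```python
import math

# Values copied from the "meta" output above
meta = {"count": 62, "db_response_time_ms": 111, "page": 1,
        "per_page": 25, "groups_count": None}

# Number of pages needed to retrieve all records, rounding up
n_pages = math.ceil(meta["count"] / meta["per_page"])
print(n_pages)

# Records not yet retrieved after the current page
remaining = meta["count"] - meta["page"] * meta["per_page"]
print(remaining)
```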

<font size = "5">

Convert dictionary to pandas

In [14]:
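# Convert the "meta" dictionary to a pandas data frame.
# Because every value is a single number (not a list), pandas needs
# an explicit index: here, one row labeled 0.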

data = pd.DataFrame(search_results["meta"],index = [0])

data

| | count | db_response_time_ms | page | per_page | groups_count |
|---|---|---|---|---|---|
| 0 | 62 | 111 | 1 | 25 | None |


<font size = "5">

Count elements at a particular level

In [58]:
# Count number of elements in list
len(search_results["results"])

25

<font size = "5">

You can access different sub elements

In [59]:
print(search_results["results"][0])
print(search_results["results"][1])

search_results["results"][0].keys()

{'id': 'https://openalex.org/W2741809807', 'doi': 'https://doi.org/10.7717/peerj.4375', 'title': 'The state of OA: a large-scale analysis of the prevalence and impact of Open Access articles', 'display_name': 'The state of OA: a large-scale analysis of the prevalence and impact of Open Access articles', 'publication_year': 2018, 'publication_date': '2018-02-13', 'ids': {'openalex': 'https://openalex.org/W2741809807', 'doi': 'https://doi.org/10.7717/peerj.4375', 'mag': '2741809807', 'pmid': 'https://pubmed.ncbi.nlm.nih.gov/29456894', 'pmcid': 'https://www.ncbi.nlm.nih.gov/pmc/articles/5815332'}, 'language': 'en', 'primary_location': {'is_oa': True, 'landing_page_url': 'https://doi.org/10.7717/peerj.4375', 'pdf_url': 'https://peerj.com/articles/4375.pdf', 'source': {'id': 'https://openalex.org/S1983995261', 'display_name': 'PeerJ', 'issn_l': '2167-8359', 'issn': ['2167-8359'], 'is_oa': True, 'is_in_doaj': True, 'is_core': True, 'host_organization': 'https://openalex.org/P4310320104', 'ho

dict_keys(['id', 'doi', 'title', 'display_name', 'publication_year', 'publication_date', 'ids', 'language', 'primary_location', 'type', 'type_crossref', 'indexed_in', 'open_access', 'authorships', 'countries_distinct_count', 'institutions_distinct_count', 'corresponding_author_ids', 'corresponding_institution_ids', 'apc_list', 'apc_paid', 'fwci', 'has_fulltext', 'fulltext_origin', 'cited_by_count', 'citation_normalized_percentile', 'cited_by_percentile_year', 'biblio', 'is_retracted', 'is_paratext', 'primary_topic', 'topics', 'keywords', 'concepts', 'mesh', 'locations_count', 'locations', 'best_oa_location', 'sustainable_development_goals', 'grants', 'datasets', 'versions', 'referenced_works_count', 'referenced_works', 'related_works', 'abstract_inverted_index', 'cited_by_api_url', 'counts_by_year', 'updated_date', 'created_date'])

<font size ="5">

Gradually work through the nested structure!
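
<font size = "3">

One way to work through the levels safely is a small helper that follows a path of keys and positions and stops if a level is missing. This is a sketch: `dig` is a hypothetical helper, and `record` is a shortened stand-in built from the first result printed above.

```python
def dig(obj, *path, default=None):
    """Follow a sequence of keys/positions; return `default` if a step is missing."""
    current = obj
    for step in path:
        try:
            current = current[step]
        except (KeyError, IndexError, TypeError):
            return default
    return current

# Shortened stand-in for search_results["results"][0]
record = {
    "title": "The state of OA: a large-scale analysis of the prevalence "
             "and impact of Open Access articles",
    "primary_location": {
        "is_oa": True,
        "source": {"display_name": "PeerJ", "issn": ["2167-8359"]},
    },
}

# Level by level: primary_location -> source -> display_name
print(dig(record, "primary_location", "source", "display_name"))

# Keys and list positions can be mixed
print(dig(record, "primary_location", "source", "issn", 0))

# A missing level gives the default instead of an error
print(dig(record, "best_oa_location", "pdf_url"))
```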

<font size = "5">

Try it yourself!

<font size = "3">

- Navigate the "results" list using the JSON Data Viewer
- Use ```jmespath.search()``` to extract the "title" of all the <br>
search results.


In [15]:
# Write your own code
titles = jmespath.search('results[*].title', search_results)
print(titles)



['The state of OA: a large-scale analysis of the prevalence and impact of Open Access articles', 'Scientometrics 2.0: New metrics of scholarly impact on the social Web', 'Altmetrics in the wild: Using social media to explore scholarly impact', 'altmetrics: a manifesto', 'The Altmetrics Collection', 'How and why scholars cite on Twitter', 'Coverage and adoption of altmetrics sources in the bibliometric community', 'OpenAlex: A fully-open index of scholarly works, authors, venues, institutions, and concepts', 'Beyond the paper', "Beyond citations: Scholars' visibility on the social Web", 'The State of OA: A large-scale analysis of the prevalence and impact of Open Access articles', 'The Future of OA: A large-scale analysis projecting Open Access publication and readership', 'Data for free: Using LMS activity logs to measure community in online courses', 'The power of altmetrics on a CV', 'Decoupling the scholarly journal', 'Riding the crest of the altmetrics wave: How librarians can help

<font size = "5">

Try it yourself!

<font size = "3">

- Convert ```search_results["results"]``` to a Pandas dataframe
- Are there any columns with conversion issues?

Note: Pandas conversion can run into issues for variables with <br>
several levels of nesting, and more data cleaning may be required <br>
to properly code these cases.

In [16]:
# Write your own code

df_results = pd.DataFrame(search_results["results"])
df_results.info()
df_results.head()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 25 entries, 0 to 24
Data columns (total 51 columns):
 #   Column                          Non-Null Count  Dtype  
---  ------                          --------------  -----  
 0   id                              25 non-null     object 
 1   doi                             21 non-null     object 
 2   title                           25 non-null     object 
 3   display_name                    25 non-null     object 
 4   publication_year                25 non-null     int64  
 5   publication_date                25 non-null     object 
 6   ids                             25 non-null     object 
 7   language                        25 non-null     object 
 8   primary_location                24 non-null     object 
 9   type                            25 non-null     object 
 10  type_crossref                   25 non-null     object 
 11  indexed_in                      25 non-null     object 
 12  open_access                     25 non

Unnamed: 0,id,doi,title,display_name,publication_year,publication_date,ids,language,primary_location,type,...,versions,referenced_works_count,referenced_works,related_works,abstract_inverted_index,abstract_inverted_index_v3,cited_by_api_url,counts_by_year,updated_date,created_date
0,https://openalex.org/W2741809807,https://doi.org/10.7717/peerj.4375,The state of OA: a large-scale analysis of the...,The state of OA: a large-scale analysis of the...,2018,2018-02-13,{'openalex': 'https://openalex.org/W2741809807...,en,"{'is_oa': True, 'landing_page_url': 'https://d...",article,...,[],45,"[https://openalex.org/W1560783210, https://ope...","[https://openalex.org/W3203790917, https://ope...","{'Despite': [0], 'growing': [1], 'interest': [...",,https://api.openalex.org/works?filter=cites:W2...,"[{'year': 2025, 'cited_by_count': 30}, {'year'...",2025-06-11T00:03:37.532706,2017-08-08
1,https://openalex.org/W2122130843,https://doi.org/10.5210/fm.v15i7.2874,Scientometrics 2.0: New metrics of scholarly i...,Scientometrics 2.0: New metrics of scholarly i...,2010,2010-07-02,{'openalex': 'https://openalex.org/W2122130843...,en,"{'is_oa': False, 'landing_page_url': 'https://...",article,...,[],0,[],"[https://openalex.org/W4385506752, https://ope...","{'The': [0], 'growing': [1], 'flood': [2], 'of...",,https://api.openalex.org/works?filter=cites:W2...,"[{'year': 2025, 'cited_by_count': 4}, {'year':...",2025-05-29T05:33:52.392876,2016-06-24
2,https://openalex.org/W1553564559,https://doi.org/10.48550/arxiv.1203.4745,Altmetrics in the wild: Using social media to ...,Altmetrics in the wild: Using social media to ...,2012,2012-01-01,{'openalex': 'https://openalex.org/W1553564559...,en,"{'is_oa': True, 'landing_page_url': 'https://a...",preprint,...,[],0,[],"[https://openalex.org/W3211611723, https://ope...","{'In': [0], 'growing': [1], 'numbers,': [2], '...",,https://api.openalex.org/works?filter=cites:W1...,"[{'year': 2025, 'cited_by_count': 2}, {'year':...",2025-05-16T23:01:40.514294,2016-06-24
3,https://openalex.org/W3130540911,,altmetrics: a manifesto,altmetrics: a manifesto,2011,2011-01-01,{'openalex': 'https://openalex.org/W3130540911...,ca,"{'is_oa': False, 'landing_page_url': 'https://...",article,...,[],0,[],"[https://openalex.org/W2963475133, https://ope...",,,https://api.openalex.org/works?filter=cites:W3...,"[{'year': 2021, 'cited_by_count': 14}, {'year'...",2025-05-24T20:43:35.778470,2021-03-01
4,https://openalex.org/W2396414759,https://doi.org/10.1371/journal.pone.0048753,The Altmetrics Collection,The Altmetrics Collection,2012,2012-11-01,{'openalex': 'https://openalex.org/W2396414759...,en,"{'is_oa': True, 'landing_page_url': 'https://d...",article,...,[],28,"[https://openalex.org/W1559646899, https://ope...","[https://openalex.org/W3150373829, https://ope...","{'What': [0], 'paper': [1], 'should': [2, 7, 1...",,https://api.openalex.org/works?filter=cites:W2...,"[{'year': 2024, 'cited_by_count': 14}, {'year'...",2025-05-21T08:44:21.949114,2016-06-24
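
<font size = "3">

In the table above, columns such as `ids` and `primary_location` still hold whole dictionaries inside each cell. One common cleaning step is to flatten each record so that every nested key becomes its own column. The sketch below does this in plain Python with a hypothetical helper `flatten` and a shortened stand-in record; pandas offers `pd.json_normalize()` for a similar purpose.

```python
def flatten(record, parent_key="", sep="."):
    """Turn nested dictionaries into one flat dictionary with dotted keys."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Go one level deeper and merge the result
            flat.update(flatten(value, full_key, sep))
        else:
            # Scalars and lists are kept as they are
            flat[full_key] = value
    return flat

# Shortened stand-in for one element of search_results["results"]
record = {
    "publication_year": 2018,
    "primary_location": {
        "is_oa": True,
        "source": {"display_name": "PeerJ", "issn": ["2167-8359"]},
    },
}

flat_record = flatten(record)
for key, value in flat_record.items():
    print(f"{key}: {value}")
```

Lists are left untouched here; they usually need a separate decision, such as keeping only the first element or joining the elements into one string.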


# <span style="color:darkblue"> IV. Hands-on Experience </span>

The Open Library also contains bibliographic information.

You can search for sports books with the API link

http://openlibrary.org/search.json?title=sports

1. Get information from this website using requests
2. Store the result in a JSON file and open it with the JSON data viewer
3. Find out the main keys
4. How many search results are there?
5. What data can you obtain? Think carefully about <br>
how to convert this into a workable pandas format

In [17]:
# Write your own code

# 1. Get information from the Open Library API
openlibrary_url = "http://openlibrary.org/search.json?title=sports"
openlibrary_results = requests.get(openlibrary_url).json()

# 2. Store the result in a JSON file
with open('json_files/openlibrary_sports.json', 'w') as f:
    f.write(json.dumps(openlibrary_results, indent=4))

# 3. Find out the main keys
print(openlibrary_results.keys())

# 4. How many search results are there?
print("Number of search results:", openlibrary_results.get('numFound', 0))

# 5. Convert the 'docs' list to a pandas DataFrame for further analysis
df_openlibrary = pd.DataFrame(openlibrary_results['docs'])
df_openlibrary.info()
df_openlibrary.head()





dict_keys(['numFound', 'start', 'numFoundExact', 'num_found', 'documentation_url', 'q', 'offset', 'docs'])
Number of search results: 41912
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 19 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   author_key            100 non-null    object 
 1   author_name           100 non-null    object 
 2   cover_i               93 non-null     float64
 3   ebook_access          100 non-null    object 
 4   edition_count         100 non-null    int64  
 5   first_publish_year    100 non-null    int64  
 6   has_fulltext          100 non-null    bool   
 7   ia                    69 non-null     object 
 8   ia_collection_s       68 non-null     object 
 9   key                   100 non-null    object 
 10  language              98 non-null     object 
 11  lending_edition_s     41 non-null     object 
 12  lending_identifier_s  41 non-null     

Unnamed: 0,author_key,author_name,cover_i,ebook_access,edition_count,first_publish_year,has_fulltext,ia,ia_collection_s,key,language,lending_edition_s,lending_identifier_s,public_scan_b,title,cover_edition_key,subtitle,id_project_gutenberg,id_librivox
0,"[OL22258A, OL10447468A]","[James Patterson, Valentina de Angelis]",6450803.0,borrowable,34,1998,True,"[savingworld0000patt, isbn_9780756983536, maxi...",cnusd-ol;delawarecountydistrictlibrary-ol;denv...,/works/OL5337360W,[eng],OL26443392M,savingworld0000patt,False,Saving the World and Other Extreme Sports,,,,
1,[OL68744A],[Francie Alexander],277661.0,borrowable,48,2003,True,"[isbn_9780439406789, phonicsfun00fran_0, phoni...",inlibrary;internetarchivebooks;printdisabled,/works/OL816016W,[eng],OL10251173M,isbn_9780439406789,False,Good Sports (Clifford the Big Red Dog Phonics ...,OL7512703M,,,
2,[OL241533A],[Robert S. Weinberg],1364042.0,borrowable,7,1995,True,"[foundationsofspo0003wein, foundationsofspo000...",americana;barryuniversity-ol;binghamton-ol;inl...,/works/OL2005230W,[eng],OL19133813M,foundationsofspo0003wein,False,Foundations of sport and exercise psychology,OL3556484M,,,
3,[OL27064A],[Dick Francis],6688382.0,borrowable,45,1967,True,"[bloodsport0000dick_r1f0, bloodsport0000dick, ...",americana;delawarecountydistrictlibrary;inlibr...,/works/OL463330W,[eng],OL47837916M,bloodsport0000dick_r1f0,False,Blood sport,OL20941658M,,,
4,[OL8222957A],[Robert Smith Surtees],11946684.0,no_ebook,57,1984,False,,,/works/OL25030921W,[eng],,,False,Mr. Sponge's Sporting Tour,OL33279842M,,,
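
<font size = "3">

Several Open Library columns, such as `author_name` and `language`, contain lists, and some entries are missing. One possible cleaning step before building a data frame is to join each list into a single string. The sketch below uses two stand-in records based on the output above; `join_list` is a hypothetical helper, and `language` is deliberately left out of the second record to show the missing-value case.

```python
def join_list(value, sep="; "):
    """Join a list into one string; return '' for a missing value."""
    if isinstance(value, list):
        return sep.join(str(item) for item in value)
    return "" if value is None else str(value)

# Stand-in records with a few of the fields shown above
docs = [
    {"title": "Blood sport", "author_name": ["Dick Francis"],
     "first_publish_year": 1967, "language": ["eng"]},
    {"title": "Saving the World and Other Extreme Sports",
     "author_name": ["James Patterson", "Valentina de Angelis"],
     "first_publish_year": 1998},
]

# Keep only a few fields and make every cell a plain string or number
rows = [
    {
        "title": doc.get("title"),
        "authors": join_list(doc.get("author_name")),
        "languages": join_list(doc.get("language")),
        "year": doc.get("first_publish_year"),
    }
    for doc in docs
]

for row in rows:
    print(row)
```

A list of flat dictionaries like `rows` can be passed to `pd.DataFrame()` in the same way as `openlibrary_results['docs']` above.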


# <span style="color:darkblue"> V. Further Reading </span>

<font size = "5">

Learn more about how to use advanced JSON queries with JMESPath at the link below

- This can be very useful to navigate the nested structure
- Examples of syntax and advanced search terms

https://jmespath.org/tutorial.html

<font size = "5">

Learn more about JSON at <br>

https://www.w3schools.com/js/js_json_intro.asp

<font size = "5">

Learn more about converting nested JSON to a pandas data frame at <br>

https://saturncloud.io/blog/how-to-convert-nested-json-to-pandas-dataframe-with-specific-format/


<font size = "5">

Check the following textbook on our reading list, Chapter 7:

https://www.amazon.com/Introduction-Programming-Business-Science-Applications/dp/1544377444

<font size = "5">

There are several online resources for further practice at:

https://study.sagepub.com/researchmethods/statistics/kaefer-intro-to-python