# <span style="color:darkblue"> Lecture 5 - HTML, JSON, Dictionaries </span>

<font size = "5">

In this lecture we will cover the basics of dictionaries and <br>
interacting with web-based APIs

<font size = "5">

Import packages

In [32]:
import requests      # Requests data from an API
import json          # Handles JSON objects
import pandas as pd  # Handles data frames

# <span style="color:darkblue"> I. Best Coding Practices </span>

<font size = "5">

Splitting a command into multiple lines

- Sometimes strings are very long, e.g. URLs
- It is best practice to keep each line of code to at most 80 characters
- Code is easier to read if you scroll vertically <br>
rather than sideways

In [3]:
# You can split a command into multiple lines with a backslash.
# Note the space at the end of the first piece: without it the two
# pieces would be glued together as "...stringand I ..."
example_backslash = "This is a very long string " \
                    + "and I would like to break it down into multiple lines"

# You can also split a command by wrapping it in parentheses
example_parenthesis = ("This is a very long string "
                    + "and I would like to break it down into multiple lines")
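
A related option, sketched here as an aside: Python joins adjacent string literals automatically when they sit inside parentheses, so the `+` can be dropped. The variable name `example_implicit` is made up for this illustration.

```python
# Adjacent string literals inside parentheses are joined automatically,
# so neither "+" nor a backslash is needed. Python does not insert a space
# between the pieces, so the first piece ends with one.
example_implicit = ("This is a very long string "
                    "and I would like to break it down into multiple lines")

print(example_implicit)
```

This form is often convenient for long URLs, since each piece can sit on its own short line.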


<font size = "5">

Examples of JSON/Dictionary Formatting

In [4]:
dictionary_student = \
{
    "first_name": "John",
    "last_name" : "Smith",
    "location"  : "Atlanta",
    "university": "Emory",
}
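
A minimal sketch of how values can be read back out of such a dictionary. The dictionary is repeated so the block runs on its own; the key `"major"` is hypothetical and only there to show what happens when a key is missing.

```python
# Same dictionary as above, repeated so this block is self-contained
dictionary_student = {
    "first_name": "John",
    "last_name" : "Smith",
    "location"  : "Atlanta",
    "university": "Emory",
}

# Square brackets look up the value stored under a key
print(dictionary_student["first_name"])

# .get() returns a default value instead of raising a KeyError
# when the key does not exist ("major" is a made-up key)
print(dictionary_student.get("major", "not recorded"))

# .items() loops over key-value pairs
for key, value in dictionary_student.items():
    print(key, "->", value)
```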

<font size = "5">

Nested dictionaries + Lists

In [5]:
dictionary_student_wgrades = \
{
    "first_name": "John",
    "last_name" : "Smith",
    "location"  : "Atlanta",
    "university": "Emory",
    "grades":{
        "computing":"A",
        "reasoning":"B",
        "applied":"A-"
    }
}

dictionary_liststudents = \
[{
    "first_name": "Ned",
    "last_name" : "Stark",
    "location"  : "Atlanta",
    "university": "Emory",
 },
 {
    "first_name": "Cersei",
    "last_name" : "Lannister",
    "location"  : "Decatur",
    "university": "Georgia Tech",
 }
]
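
The nested objects above are read by chaining lookups, one level at a time. The sketch below uses shortened copies of the two structures; the names `student_with_grades` and `list_students` are illustrative and not part of the lecture code.

```python
# Shortened copies of the structures above, so the block runs on its own
student_with_grades = {
    "first_name": "John",
    "grades": {"computing": "A", "reasoning": "B", "applied": "A-"},
}
list_students = [
    {"first_name": "Ned", "university": "Emory"},
    {"first_name": "Cersei", "university": "Georgia Tech"},
]

# Dictionary inside a dictionary: chain two keys
print(student_with_grades["grades"]["computing"])

# List of dictionaries: first a position, then a key
print(list_students[1]["university"])

# Loop over the list to collect one field from every student
first_names = [student["first_name"] for student in list_students]
print(first_names)
```

The rule of thumb is that dictionaries are indexed by key and lists by position, which is why it matters to know which one you are holding at each level.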

# <span style="color:darkblue"> II. Accessing an API </span>

<font size = "5">

About the example

- OpenAlex is one of the largest open access repositories <br>
of bibliographic information
- Contains millions of references and metadata on <br>
books, papers, authors, and citation networks

https://docs.openalex.org

<font size = "5">

Define a URL to access the API

``` https://website.com/option=12345```

- Usually it's a domain followed by a field and its value
- Many search engines use this type of format <br>
for search queries
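
In many APIs the field and its value follow a `?` at the end of the address. As a rough sketch, the standard library module `urllib.parse` can assemble and take apart such a URL. The address and filter are the OpenAlex ones used later in this lecture; the variable names are illustrative.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

base_url = "https://api.openalex.org/works"
parameters = {"filter": "author.id:A5023888391"}

# urlencode turns the dictionary into "field=value" text.
# safe=":" keeps the colon readable instead of encoding it as %3A
url = base_url + "?" + urlencode(parameters, safe=":")
print(url)

# Take the URL apart again to see its pieces
pieces = urlsplit(url)
print(pieces.netloc)            # the domain
print(pieces.path)              # the resource being requested
print(parse_qs(pieces.query))   # the fields and their values
```

Building the URL from a dictionary makes it easier to change one field later without retyping the whole address.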

In [26]:
# Define a new string variable called "url_openalex_api"
# Notice that we used the backslash to split the command, since
# the URL is quite long.

# In this case we are searching for all the works published
# by an author whose ID is "A5023888391"

url_openalex_api = \
      "https://api.openalex.org/works?filter=author.id:A5023888391"


<font size = "5">

Request data


In [27]:
# Obtain the JSON data of records from the OpenAlex API
# Note: you can break the command across lines using a backslash

search_results = requests.\
    get(url_openalex_api).json()
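
The `.json()` step parses the text of the response into Python dictionaries and lists, roughly what `json.loads` does to a string. The sketch below imitates that step offline; `sample_text` is a small made-up string, not real API output.

```python
import json

# Made-up JSON text standing in for the body of an API response
sample_text = (
    '{"meta": {"count": 2, "groups_count": null}, '
    '"results": [{"title": "Paper A", "is_oa": true}], '
    '"group_by": []}'
)

# json.loads converts JSON text into Python objects
parsed = json.loads(sample_text)

print(type(parsed))                      # a JSON object becomes a dict
print(parsed["meta"]["groups_count"])    # JSON null becomes None
print(parsed["results"][0]["is_oa"])     # JSON true becomes True
```

This is why the lecture treats JSON and dictionaries together: once parsed, the data is handled with ordinary dictionary and list operations.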


<font size = "5">

(Optional) Convert to a file for easier readability

In [22]:
# This command opens the file for writing.
# The "json_files" folder must already exist; open() does not create it.
# The code in the parentheses of file.write(...)
#     formats the JSON nicely, with four-space indentation.

with open('json_files/convert_search_results.txt', 'w') as file:
    file.write(json.dumps(search_results, indent=4))
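
A saved file can be read back into a dictionary with `json.load`. The sketch below is a self-contained round trip under stated assumptions: `search_results_sample` is a made-up stand-in for the real search results, and a temporary folder replaces `json_files` so no files are left behind.

```python
import json
import os
import tempfile

# Made-up stand-in for the real search results
search_results_sample = {
    "meta": {"count": 1},
    "results": [{"title": "Paper A"}],
}

# A temporary folder stands in for "json_files"
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "convert_search_results.txt")

    # Write the dictionary as indented JSON text
    with open(path, "w") as file:
        file.write(json.dumps(search_results_sample, indent=4))

    # Read the text back and parse it into a dictionary again
    with open(path, "r") as file:
        reloaded = json.load(file)

# The reloaded object matches what was written
print(reloaded == search_results_sample)
```

If the target folder might not exist yet, `os.makedirs("json_files", exist_ok=True)` creates it before the file is opened.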

<font size = "5">

Open the file from the folder:
- Manually browsing for
- Getting a visual sense of the main features of the JSON <br>
 can save time in the automation process later


<font size = "5">

Browse keys in the "first level"


In [28]:
# The "len" function counts the number of "first-level" objects
print(len(search_results))

# The "keys" method extracts the names of the "first-level" objects
print(search_results.keys())


3
dict_keys(['meta', 'results', 'group_by'])
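
Beyond the key names, it helps to know what kind of object sits under each key. A minimal sketch, using a made-up dictionary `sample_results` with the same three first-level keys (the values are placeholders):

```python
# Made-up stand-in with the same first-level keys as the real response
sample_results = {
    "meta": {"count": 54, "page": 1},
    "results": [{"title": "Paper A"}, {"title": "Paper B"}],
    "group_by": [],
}

# Print the type and size of the object stored under each key
for key, value in sample_results.items():
    print(key, type(value).__name__, len(value))

# Store the type names for later reference
value_types = {key: type(value).__name__
               for key, value in sample_results.items()}
print(value_types)
```

Knowing whether a value is a `dict` or a `list` tells you whether to index it by key or by position in the next step.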


<font size = "5">

Browse the subdictionary stored under a particular key

In [35]:
search_results["meta"]

{'count': 54,
 'db_response_time_ms': 40,
 'page': 1,
 'per_page': 25,
 'groups_count': None}
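
The values above suggest that the response holds only part of the matches. The sketch below assumes that `count` is the total number of matching works and `per_page` is the number returned in one response; under that reading, a little arithmetic gives the number of pages needed.

```python
import math

# Values copied from the "meta" output above
meta = {
    "count": 54,
    "db_response_time_ms": 40,
    "page": 1,
    "per_page": 25,
    "groups_count": None,
}

# Number of pages needed to see every work (assumes "count" is the total)
pages_needed = math.ceil(meta["count"] / meta["per_page"])
print(pages_needed)

# Works not yet covered after the current page
remaining = meta["count"] - meta["page"] * meta["per_page"]
print(remaining)
```

The OpenAlex documentation linked earlier describes how further pages can be requested.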

<font size = "5">

Convert dictionary to pandas

In [41]:
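# This code converts the "meta" dictionary to a one-row pandas data frame.
# index = [0] is needed because every value in the dictionary is a single
# value rather than a list.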

data = pd.DataFrame(search_results["meta"],index = [0])

data

|   | count | db_response_time_ms | page | per_page | groups_count |
|---|-------|---------------------|------|----------|--------------|
| 0 | 54    | 40                  | 1    | 25       | None         |
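
An alternative that avoids the `index` argument, shown as a sketch: wrapping the dictionary in a list tells pandas to treat it as one row. The name `data_from_list` is illustrative, and the dictionary is copied from the "meta" output so the block runs on its own.

```python
import pandas as pd

# Values copied from the "meta" output
meta = {
    "count": 54,
    "db_response_time_ms": 40,
    "page": 1,
    "per_page": 25,
    "groups_count": None,
}

# A list of dictionaries becomes one row per dictionary,
# with the keys as column names
data_from_list = pd.DataFrame([meta])

print(data_from_list.shape)
print(list(data_from_list.columns))
```

The same pattern extends naturally to a list with many dictionaries, which is the shape of the search results themselves.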


# <span style="color:darkblue"> III. Working with the nested structure </span>

<font size = "5">

Browse the objects and check their formatting

- The following object is actually a list
- This is very important for how you access the information

In [43]:
# Count number of elements in list
len(search_results["results"])

25
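
Why the list-versus-dictionary distinction matters can be seen in a small sketch. `sample_results` is a made-up stand-in with two placeholder records; only the key names `title` and `publication_year` come from the real output.

```python
# Made-up stand-in for the search results
sample_results = {
    "results": [
        {"title": "Paper A", "publication_year": 2018},
        {"title": "Paper B", "publication_year": 2020},
    ]
}

works = sample_results["results"]

# Check what kind of object we are holding at each level
print(isinstance(works, list))      # the container is a list
print(isinstance(works[0], dict))   # each element is a dictionary

# A list has no keys, so asking for one raises a TypeError
try:
    works["title"]
    error_name = None
except TypeError as error:
    error_name = type(error).__name__
    print("Lists are indexed by position:", error)

# The correct order is position first, then key
print(works[0]["title"])
```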

In [44]:
search_results["results"]

[{'id': 'https://openalex.org/W2741809807',
  'doi': 'https://doi.org/10.7717/peerj.4375',
  'title': 'The state of OA: a large-scale analysis of the prevalence and impact of Open Access articles',
  'display_name': 'The state of OA: a large-scale analysis of the prevalence and impact of Open Access articles',
  'publication_year': 2018,
  'publication_date': '2018-02-13',
  'ids': {'openalex': 'https://openalex.org/W2741809807',
   'doi': 'https://doi.org/10.7717/peerj.4375',
   'mag': '2741809807',
   'pmid': 'https://pubmed.ncbi.nlm.nih.gov/29456894',
   'pmcid': 'https://www.ncbi.nlm.nih.gov/pmc/articles/5815332'},
  'language': 'en',
  'primary_location': {'is_oa': True,
   'landing_page_url': 'https://doi.org/10.7717/peerj.4375',
   'pdf_url': 'https://peerj.com/articles/4375.pdf',
   'source': {'id': 'https://openalex.org/S1983995261',
    'display_name': 'PeerJ',
    'issn_l': '2167-8359',
    'issn': ['2167-8359'],
    'is_oa': True,
    'is_in_doaj': True,
    'host_organizat

<font size = "5">

You can access individual elements of the list by their position

In [47]:
print(search_results["results"][0])
print(search_results["results"][1])

{'id': 'https://openalex.org/W2741809807', 'doi': 'https://doi.org/10.7717/peerj.4375', 'title': 'The state of OA: a large-scale analysis of the prevalence and impact of Open Access articles', 'display_name': 'The state of OA: a large-scale analysis of the prevalence and impact of Open Access articles', 'publication_year': 2018, 'publication_date': '2018-02-13', 'ids': {'openalex': 'https://openalex.org/W2741809807', 'doi': 'https://doi.org/10.7717/peerj.4375', 'mag': '2741809807', 'pmid': 'https://pubmed.ncbi.nlm.nih.gov/29456894', 'pmcid': 'https://www.ncbi.nlm.nih.gov/pmc/articles/5815332'}, 'language': 'en', 'primary_location': {'is_oa': True, 'landing_page_url': 'https://doi.org/10.7717/peerj.4375', 'pdf_url': 'https://peerj.com/articles/4375.pdf', 'source': {'id': 'https://openalex.org/S1983995261', 'display_name': 'PeerJ', 'issn_l': '2167-8359', 'issn': ['2167-8359'], 'is_oa': True, 'is_in_doaj': True, 'host_organization': 'https://openalex.org/P4310320104', 'host_organization_n

<font size ="5">

Gradually work through the nested structure
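
One way to do this, sketched under stated assumptions: step down one level at a time, then collect the fields of interest into a flat list of rows. The first record below is a trimmed copy of the first search result shown above; the second record is hypothetical and only included to show how a missing value can be handled.

```python
results = [
    # Trimmed copy of the first record in the output above
    {
        "id": "https://openalex.org/W2741809807",
        "publication_year": 2018,
        "primary_location": {
            "is_oa": True,
            "source": {"display_name": "PeerJ", "issn": ["2167-8359"]},
        },
    },
    # Hypothetical record with no source, to illustrate missing data
    {
        "id": "hypothetical-id",
        "publication_year": 2020,
        "primary_location": {"is_oa": False, "source": None},
    },
]

# Step down one level at a time
first = results[0]                      # list -> dictionary
location = first["primary_location"]    # dictionary -> dictionary
source = location["source"]             # dictionary -> dictionary
print(source["display_name"])
print(source["issn"][0])                # dictionary -> list -> string

# Flatten every record into a simple row
rows = []
for work in results:
    # "or {}" replaces a missing (None) level with an empty dictionary
    source = (work.get("primary_location") or {}).get("source") or {}
    rows.append({
        "id": work["id"],
        "publication_year": work["publication_year"],
        "journal": source.get("display_name"),
    })

print(rows)
```

A flat list of dictionaries like `rows` can be passed directly to `pd.DataFrame(rows)` to obtain one row per work.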

# <span style="color:darkblue"> IV. Hands-on Experience </span>

The Open Library also contains bibliographic information.

You can search for sports books with the API link

http://openlibrary.org/search.json?title=sports

1. Get information from this website using requests
2. Find out the main keys
3. How many search results are there?
4. What data can you obtain? Think carefully about <br>
how to convert this into a workable pandas format

In [20]:
# Write your own code







# <span style="color:darkblue"> V. Further Reading </span>

<font size = "5">

Learn more about JSON at <br>

https://www.w3schools.com/js/js_json_intro.asp

<font size = "5">

https://saturncloud.io/blog/how-to-convert-nested-json-to-pandas-dataframe-with-specific-format/


<font size = "5">

Check the following textbook on our reading list, Chapter 7:

https://www.amazon.com/Introduction-Programming-Business-Science-Applications/dp/1544377444

<font size = "5">

There are several online resources for further practice at:

https://study.sagepub.com/researchmethods/statistics/kaefer-intro-to-python