# Application Programming Interfaces
Application Programming Interfaces (APIs) are a way for computer programs and tools to communicate with each other. A typical example is a web API, which lets one website, or a program running on a computer, request data from another website. The API resides on the source website, and its developer or owner usually provides instructions and documentation on how to submit requests, which query parameters are available, how to download data, and other features. Many APIs require registration with the owner's website as well as acceptance of its terms of service. An API may expose only some of the data available on a website, not all of it. An API endpoint is a specific URL at which an API accepts requests; for example, `http://api.open-notify.org/astros.json` is one endpoint of the Open Notify API.
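To make the idea of an endpoint concrete, here is a small sketch that uses Python's standard `urllib.parse` module. It splits an endpoint URL into its parts and reads the query parameters from a request URL. The Federal Register URL is a shortened version of the one used later in this notebook.

```python
from urllib.parse import parse_qs, urlsplit

# An endpoint URL: scheme + host (the API's base address) + path (the specific resource).
endpoint = urlsplit("http://api.open-notify.org/astros.json")
print(endpoint.scheme)  # http
print(endpoint.netloc)  # api.open-notify.org
print(endpoint.path)    # /astros.json

# Query parameters follow the "?" and are separated by "&".
request_url = "https://www.federalregister.gov/api/v1/documents.json?per_page=1000&order=newest"
params = parse_qs(urlsplit(request_url).query)
print(params)  # {'per_page': ['1000'], 'order': ['newest']}
```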

One example of an open API is [Open Notify](http://api.open-notify.org), an open source project that provides a simple API for some National Aeronautics and Space Administration (NASA) data. This Jupyter Notebook uses this API to show how data downloaded from an API can be converted into a pandas DataFrame.

Websites that offer an API typically have an API or Developers section with specific instructions and documentation on how to use it.

References:
- [Python API Tutorial: Getting Started with APIs – Dataquest](https://www.dataquest.io/blog/python-api-tutorial/)

Public APIs (some require registration and an API key):
- [Open Notify API](http://open-notify.org/)
- [U.S. Federal Government Federal Register](https://www.federalregister.gov/reader-aids/developer-resources/rest-api)
- [USAJobs: U.S. Federal Government Job Postings Website API](https://developer.usajobs.gov/API-Reference)
- [U.S. Federal Government Data.gov Website API](https://data.gov/developers/apis/)
- [U.S. Federal Government Open Source Software By Agency](https://code.gov/agencies)

#### API Status Codes
A web server returns a status code with every response. The status code indicates what happened with the request. Here are some codes that are relevant to GET requests:

- 200: Everything went okay, and the result has been returned (if any).
- 301: The server is redirecting you to a different endpoint. This can happen when a company switches domain names, or an endpoint name is changed.
- 400: The server thinks you made a bad request. This can happen when you don’t send along the right data, among other things.
- 401: The server thinks you’re not authenticated. Many APIs require login credentials, so this happens when you don’t send the right credentials to access an API.
- 403: The resource you’re trying to access is forbidden: you don’t have the right permissions to see it.
- 404: The resource you tried to access wasn’t found on the server.
- 503: The server is not ready to handle the request.
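Python's standard library already knows the standard names of these codes. The sketch below uses `http.HTTPStatus` to turn a code into a short description and to group it by class (2xx success, 3xx redirection, 4xx client error, 5xx server error). The helper name `describe_status` is hypothetical.

```python
from http import HTTPStatus

def describe_status(code: int) -> str:
    """Return a short description such as '404 Not Found (client error)'."""
    # The class of a status code is given by its first digit.
    classes = {2: "success", 3: "redirection", 4: "client error", 5: "server error"}
    category = classes.get(code // 100, "other")
    try:
        phrase = HTTPStatus(code).phrase  # standard reason phrase, e.g. 'Not Found'
    except ValueError:
        phrase = "Unknown"  # code not defined in http.HTTPStatus
    return f"{code} {phrase} ({category})"

for code in (200, 301, 400, 401, 403, 404, 503):
    print(describe_status(code))
```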

In [2]:
# Library Loading
import requests
import json
import pandas as pd

from datetime import date
from datetime import timedelta

# API Requests
The `requests` library's `get()` function sends a request to a web address. An API exposes one or more addresses (endpoints) that accept requests, and it returns data if the request is successful.

In [3]:
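# This request fails: the server refused the HTTPS connection (port 443), so requests raises a ConnectionError and no status code is returned.
# Over plain http://, a nonexistent endpoint would typically return a 404 status code instead.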
response = requests.get("https://api.open-notify.org/this-api-doesnt-exist")

ConnectionError: HTTPSConnectionPool(host='api.open-notify.org', port=443): Max retries exceeded with url: /this-api-doesnt-exist (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x000002C6E7D23DF0>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it'))
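A failed connection raises an exception instead of returning a status code, so scripts that call an API usually wrap the request. Here is a minimal sketch that uses the `requests` library as above. The helper name `safe_get` and the 10-second timeout are illustrative choices, not part of any API.

```python
import requests

def safe_get(url, timeout=10):
    """Return the parsed JSON for url, or None if the request fails."""
    try:
        response = requests.get(url, timeout=timeout)
        # raise_for_status() raises requests.HTTPError for 4xx and 5xx status codes.
        response.raise_for_status()
        return response.json()
    except (requests.exceptions.RequestException, ValueError) as error:
        # RequestException covers ConnectionError, Timeout and HTTPError;
        # ValueError covers a response body that is not valid JSON.
        print(f"Request to {url} failed: {error}")
        return None
```

For example, `safe_get("https://api.open-notify.org/this-api-doesnt-exist")` prints the error message and returns `None`, so the notebook keeps running.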

### API Request: [Open Notify](http://open-notify.org/)
As of 12/4/2022, Open Notify has two APIs: one for the current location of the International Space Station (ISS) (http://open-notify.org/Open-Notify-API/ISS-Location-Now/) and another for the number of people in space right now (http://open-notify.org/Open-Notify-API/People-In-Space/). The ISS pass times API has been removed (http://open-notify.org/Open-Notify-API/ISS-Pass-Times/).
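This notebook does not use the ISS location endpoint. The sketch below assumes the response shape described on the ISS-Location-Now documentation page: a `timestamp` in Unix time and an `iss_position` object whose latitude and longitude are strings. Under that assumption, it flattens a response into a plain dictionary. The coordinates in `sample` are made-up placeholder values, not real data.

```python
from datetime import datetime, timezone

def parse_iss_position(payload):
    """Convert an iss-now.json style payload into a flat dictionary."""
    position = payload["iss_position"]
    return {
        # timestamp is assumed to be Unix time (seconds since 1970-01-01 UTC).
        "time_utc": datetime.fromtimestamp(payload["timestamp"], tz=timezone.utc),
        # latitude and longitude are assumed to arrive as strings, so convert them.
        "latitude": float(position["latitude"]),
        "longitude": float(position["longitude"]),
    }

# Hypothetical payload with placeholder values, shaped like the documented response.
sample = {
    "message": "success",
    "timestamp": 1670112000,
    "iss_position": {"latitude": "12.3456", "longitude": "-45.6789"},
}
print(parse_iss_position(sample))
```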

In [4]:
# The Open Notify API base address is http://api.open-notify.org; the astros.json endpoint returns JSON data.
# requests.get() documentation: https://requests.readthedocs.io/en/latest/user/quickstart/
response = requests.get("http://api.open-notify.org/astros.json")

In [5]:
print(response.status_code)
# The status codes are described in the API Status Codes section above.
# Status code 200 means the request was successful.

200


In [6]:
# Let's see the data. It returns the people currently in space.
response.json()

{'people': [{'craft': 'Tiangong', 'name': 'Cai Xuzhe'},
  {'craft': 'Tiangong', 'name': 'Chen Dong'},
  {'craft': 'Tiangong', 'name': 'Liu Yang'},
  {'craft': 'ISS', 'name': 'Sergey Prokopyev'},
  {'craft': 'ISS', 'name': 'Dmitry Petelin'},
  {'craft': 'ISS', 'name': 'Frank Rubio'},
  {'craft': 'ISS', 'name': 'Nicole Mann'},
  {'craft': 'ISS', 'name': 'Josh Cassada'},
  {'craft': 'ISS', 'name': 'Koichi Wakata'},
  {'craft': 'ISS', 'name': 'Anna Kikina'},
  {'craft': 'Shenzhou 15', 'name': 'Fei Junlong'},
  {'craft': 'Shenzhou 15', 'name': 'Deng Qingming'},
  {'craft': 'Shenzhou 15', 'name': 'Zhang Lu'}],
 'number': 13,
 'message': 'success'}

In [7]:
# We can assign a variable to the data.
json_data = response.json()
# The data is a dictionary.
type(json_data)

dict
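`response.json()` parses the body of the response into Python objects. For a JSON response, `json.loads(response.text)` gives essentially the same result. The notebook imports the `json` module but never uses it. As a small sketch, the conversion below is done by hand with `json.loads`, and `json.dumps` pretty-prints the result. The string is a shortened, hand-written copy of the response above that keeps only the first person.

```python
import json

# Shortened copy of the astros.json response text, kept to one person for brevity.
raw_text = '{"people": [{"craft": "Tiangong", "name": "Cai Xuzhe"}], "number": 13, "message": "success"}'

parsed = json.loads(raw_text)        # same kind of result as response.json()
print(type(parsed))                  # <class 'dict'>
print(parsed["people"][0]["name"])   # Cai Xuzhe

# indent=2 prints nested JSON in a readable layout; sort_keys orders the keys alphabetically.
print(json.dumps(parsed, indent=2, sort_keys=True))
```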

In [8]:
# We can convert a dictionary to a DataFrame.
pd.DataFrame(json_data)
# The "people" column holds one dictionary per astronaut with the craft and name.
# The "number" column holds the total number of people, repeated on every row.
# The "message" column holds the request status, also repeated on every row.

Unnamed: 0,people,number,message
0,"{'craft': 'Tiangong', 'name': 'Cai Xuzhe'}",13,success
1,"{'craft': 'Tiangong', 'name': 'Chen Dong'}",13,success
2,"{'craft': 'Tiangong', 'name': 'Liu Yang'}",13,success
3,"{'craft': 'ISS', 'name': 'Sergey Prokopyev'}",13,success
4,"{'craft': 'ISS', 'name': 'Dmitry Petelin'}",13,success
5,"{'craft': 'ISS', 'name': 'Frank Rubio'}",13,success
6,"{'craft': 'ISS', 'name': 'Nicole Mann'}",13,success
7,"{'craft': 'ISS', 'name': 'Josh Cassada'}",13,success
8,"{'craft': 'ISS', 'name': 'Koichi Wakata'}",13,success
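9,"{'craft': 'ISS', 'name': 'Anna Kikina'}",13,success
10,"{'craft': 'Shenzhou 15', 'name': 'Fei Junlong'}",13,success
11,"{'craft': 'Shenzhou 15', 'name': 'Deng Qingming'}",13,success
12,"{'craft': 'Shenzhou 15', 'name': 'Zhang Lu'}",13,success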


In [9]:
# We can drop the "number" and "message" values by building the DataFrame from the "people" list only.
# The data now looks like a typical DataFrame and is easier to work with.
pd.DataFrame(json_data['people'])

Unnamed: 0,craft,name
0,Tiangong,Cai Xuzhe
1,Tiangong,Chen Dong
2,Tiangong,Liu Yang
3,ISS,Sergey Prokopyev
4,ISS,Dmitry Petelin
5,ISS,Frank Rubio
6,ISS,Nicole Mann
7,ISS,Josh Cassada
8,ISS,Koichi Wakata
9,ISS,Anna Kikina
10,Shenzhou 15,Fei Junlong
11,Shenzhou 15,Deng Qingming
12,Shenzhou 15,Zhang Lu


In [10]:
# We can assign the DataFrame to a variable.
df_open_notify_astros = pd.DataFrame(json_data['people'])

In [11]:
# Each row is one astronaut, so the row count should match the total number of people in space, 13.
print(f'Total number of astronauts in space is {len(df_open_notify_astros)}.')

Total number of astronauts in space is 13.


In [12]:
# We can filter for astronauts at the ISS.
df_open_notify_astros[df_open_notify_astros['craft'] == 'ISS']

Unnamed: 0,craft,name
3,ISS,Sergey Prokopyev
4,ISS,Dmitry Petelin
5,ISS,Frank Rubio
6,ISS,Nicole Mann
7,ISS,Josh Cassada
8,ISS,Koichi Wakata
9,ISS,Anna Kikina


In [13]:
# Once the API data has been converted to a dataframe, Pandas can be used to filter, plot, and transform the data.
# Some APIs also accept input parameters.
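Here are two quick pandas checks on the astronaut data. This sketch rebuilds the DataFrame from the 13 records returned above, copied by hand so the cell runs offline. It confirms that the row count matches the API's `number` field, then counts people per spacecraft with `value_counts()`. The variable names `astros_json` and `df_astros` are new names chosen so the notebook's existing variables are not overwritten.

```python
import pandas as pd

# The 13 records returned by astros.json when this notebook was run.
people = [
    ("Tiangong", "Cai Xuzhe"), ("Tiangong", "Chen Dong"), ("Tiangong", "Liu Yang"),
    ("ISS", "Sergey Prokopyev"), ("ISS", "Dmitry Petelin"), ("ISS", "Frank Rubio"),
    ("ISS", "Nicole Mann"), ("ISS", "Josh Cassada"), ("ISS", "Koichi Wakata"),
    ("ISS", "Anna Kikina"), ("Shenzhou 15", "Fei Junlong"),
    ("Shenzhou 15", "Deng Qingming"), ("Shenzhou 15", "Zhang Lu"),
]
astros_json = {
    "people": [{"craft": craft, "name": name} for craft, name in people],
    "number": 13,
    "message": "success",
}

df_astros = pd.DataFrame(astros_json["people"])

# Sanity check: one row per person, matching the API's own count.
assert len(df_astros) == astros_json["number"]

# Number of people aboard each spacecraft.
crew_per_craft = df_astros["craft"].value_counts()
print(crew_per_craft)
```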

# API Requests: U.S. Department of Energy
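[Code.gov](https://code.gov/agencies) lists the source code inventories of U.S. Federal Government agencies. The U.S. Department of Energy publishes its inventory as a JSON file, which can be requested the same way as an API endpoint.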

In [14]:
response = requests.get("https://www.energy.gov/sites/default/files/2022-10/code-10-03-2022.json")

In [15]:
print(response.status_code)

200


In [16]:
response.json().keys() # Provides the top keys of the json/dictionary

dict_keys(['agency', 'measurementType', 'releases', 'version'])

In [17]:
response.json()['agency']

'DOE'

In [18]:
response.json()['measurementType']

{'ifOther': '', 'method': 'other'}

In [19]:
response.json()['version']

'2.0.0'

In [20]:
# The releases appear to be a list of open source software projects published by the U.S. Department of Energy.
df_doe_releases = pd.DataFrame(response.json()['releases'])
print(df_doe_releases.shape)
df_doe_releases.head(5)

(4569, 14)


Unnamed: 0,contact,date,description,laborHours,name,organization,permissions,repositoryURL,status,tags,vcs,languages,homepageURL,version
0,{'email': 'jcrouch@sandia.gov'},"{'created': '2017-10-25', 'metadataLastUpdated...","Teuchos is designed to provide portable, objec...",8344830.4,Teuchos Utility Package,Sandia National Laboratories (SNL),"{'exemptionText': None, 'licenses': [{'URL': '...",https://github.com/trilinos/Trilinos,Production,"[DOE CODE, Sandia National Laboratories (SNL)]",git,,,
1,{'email': 'jcrouch@sandia.gov'},"{'created': '2017-10-25', 'metadataLastUpdated...",Amesos is the Direct Sparse Solver Package in ...,8344830.4,Amesos Solver Package,Sandia National Laboratories (SNL),"{'exemptionText': None, 'licenses': [{'URL': '...",https://github.com/trilinos/Trilinos,Production,"[DOE CODE, Sandia National Laboratories (SNL)]",git,[],,
2,{'email': 'holdensanchez2@llnl.gov'},"{'created': '2017-10-25', 'metadataLastUpdated...",The MRSH project is a collection of the follow...,24213.6,MRSH Version V2.0,Lawrence Livermore National Laboratory (LLNL),"{'exemptionText': None, 'licenses': [{'URL': '...",https://github.com/chaos/mrsh,Production,"[DOE CODE, Lawrence Livermore National Laborat...",git,,,
3,{'email': 'holdensanchez2@llnl.gov'},"{'created': '2017-10-25', 'metadataLastUpdated...",The Lustre Administrative Tools (LAT) is a set...,5639.2,Lustre Administrative Tool,Lawrence Livermore National Laboratory (LLNL),"{'exemptionText': None, 'licenses': [{'URL': '...",https://github.com/cea-hpc/shine,Production,"[DOE CODE, Lawrence Livermore National Laborat...",git,,,
4,{'email': 'holdensanchez2@llnl.gov'},"{'created': '2017-10-25', 'metadataLastUpdated...",WHATSUP determines which nodes in a cluster ar...,58793.6,WHATSUP Version1.3,Lawrence Livermore National Laboratory (LLNL),"{'exemptionText': None, 'licenses': [{'URL': '...",https://github.com/chaos/whatsup,Production,"[DOE CODE, Lawrence Livermore National Laborat...",git,,,


# API Requests: [U.S. Federal Government Federal Register](https://www.federalregister.gov/reader-aids/developer-resources/rest-api)
The Federal Register is where U.S. Federal Government agencies publish rules, proposed rules, notices, and other documents for public awareness.

In [21]:
# The address was generated from the RESTful API interactive documentation,
# using the "/documents.{format}" endpoint (search all Federal Register documents published since 1994) with:
# fields[] = abstract, agencies, dates, document_number, page_length, pdf_url, publication_date, title
# per_page = 1000 (how many documents to return at once; 1000 is the maximum)
# order = newest
# conditions[publication_date][is] = 2022-12-01
response = requests.get("https://www.federalregister.gov/api/v1/documents.json?fields[]=abstract&fields[]=agencies&fields[]=dates&fields[]=document_number&fields[]=page_length&fields[]=pdf_url&fields[]=publication_date&fields[]=title&per_page=1000&order=newest&conditions[publication_date][is]=2022-12-01")
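The long URL above can also be built from a dictionary of parameters, which is easier to read and edit. `requests.get(url, params=...)` accepts such a dictionary and repeats a key once for each item in a list value. The sketch below uses the standard library's `urlencode(..., doseq=True)`, which encodes the query the same way, so the URL can be inspected without sending a request. The names `BASE_URL`, `FIELDS` and `federal_register_params` are hypothetical. The square brackets are percent-encoded as `%5B%5D`, which web servers decode back to `[]`.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "https://www.federalregister.gov/api/v1/documents.json"
FIELDS = ["abstract", "agencies", "dates", "document_number",
          "page_length", "pdf_url", "publication_date", "title"]

def federal_register_params(publication_date):
    """Query parameters for documents published on one date (YYYY-MM-DD)."""
    return {
        "fields[]": FIELDS,  # a list value becomes one fields[]=... pair per item
        "per_page": 1000,
        "order": "newest",
        "conditions[publication_date][is]": publication_date,
    }

url = f"{BASE_URL}?{urlencode(federal_register_params('2022-12-01'), doseq=True)}"
print(url)
print(parse_qs(urlsplit(url).query)["per_page"])  # ['1000']

# With requests, the equivalent call would be:
# response = requests.get(BASE_URL, params=federal_register_params("2022-12-01"))
```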

In [22]:
print(response.status_code)

200


In [23]:
response.json().keys() # Provides the top keys of the json/dictionary

dict_keys(['count', 'description', 'total_pages', 'results'])
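The response includes `count` and `total_pages` keys. In this query, all of the day's documents fit on one page (`per_page=1000`), but a larger search would need one request per page. The sketch below assumes the API accepts a `page` query parameter, so check the interactive documentation before relying on it. `fetch_page` stands for any function that requests one page and returns its JSON, which means the loop can be tried with fake data.

```python
def fetch_all_results(fetch_page):
    """Collect 'results' from every page; fetch_page(page_number) returns one page's JSON."""
    first = fetch_page(1)
    results = list(first.get("results", []))
    # total_pages tells us how many more requests are needed.
    for page_number in range(2, first.get("total_pages", 1) + 1):
        results.extend(fetch_page(page_number).get("results", []))
    return results

# Fake two-page response for trying the loop offline (hypothetical data).
fake_pages = {
    1: {"count": 3, "total_pages": 2, "results": [{"document_number": "a"}, {"document_number": "b"}]},
    2: {"count": 3, "total_pages": 2, "results": [{"document_number": "c"}]},
}
all_results = fetch_all_results(lambda page: fake_pages[page])
print(len(all_results))  # 3

# With the real API this might look like (assumes a "page" query parameter):
# fetch_all_results(lambda page: requests.get(base_url, params={**params, "page": page}).json())
```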

In [24]:
response.json()

{'count': 98,
 'description': 'Documents published on 12/01/2022',
 'total_pages': 1,
 'results': [{'abstract': "We, the U.S. Fish and Wildlife Service (Service), list the Puerto Rican harlequin butterfly (Atlantea tulita), a species from Puerto Rico, as a threatened species with a rule issued under section 4(d) of the Endangered Species Act of 1973 (Act), as amended. We also designate critical habitat for this species under the Act. In total, approximately 41,266 acres (16,699.8 hectares) in six units in the municipalities of Isabela, Quebradillas, Camuy, Arecibo, Utuado, Florida, Ciales, Maricao, San Germ[aacute]n, Sabana Grande, and Yauco are within the boundaries of the critical habitat designation. This rule extends the Act's protections to the species and its designated critical habitat.",
   'agencies': [{'raw_name': 'DEPARTMENT OF THE INTERIOR',
     'name': 'Interior Department',
     'id': 253,
     'url': 'https://www.federalregister.gov/agencies/interior-department',
     ...

In [25]:
# Converting the whole response repeats count, description, and total_pages on every row; only the results column differs.
print(pd.DataFrame(response.json()).shape)
pd.DataFrame(response.json()).head(5)

(98, 4)


Unnamed: 0,count,description,total_pages,results
0,98,Documents published on 12/01/2022,1,"{'abstract': 'We, the U.S. Fish and Wildlife S..."
1,98,Documents published on 12/01/2022,1,"{'abstract': None, 'agencies': [{'raw_name': '..."
2,98,Documents published on 12/01/2022,1,"{'abstract': None, 'agencies': [{'raw_name': '..."
3,98,Documents published on 12/01/2022,1,"{'abstract': None, 'agencies': [{'raw_name': '..."
4,98,Documents published on 12/01/2022,1,"{'abstract': None, 'agencies': [{'raw_name': '..."


In [26]:
# Selecting only the 'results' key gives one row per published document.
print(pd.DataFrame(response.json()['results']).shape)
pd.DataFrame(response.json()['results']).head(5)
# This includes the 98 documents published on December 1, 2022.
# Because the publication date is part of the URL, we can store the date in a variable and insert it into the URL.

(98, 8)


Unnamed: 0,abstract,agencies,dates,document_number,page_length,pdf_url,publication_date,title
0,"We, the U.S. Fish and Wildlife Service (Servic...","[{'raw_name': 'DEPARTMENT OF THE INTERIOR', 'n...","This rule is effective January 3, 2023.",2022-25805,28,https://www.govinfo.gov/content/pkg/FR-2022-12...,2022-12-01,Endangered and Threatened Wildlife and Plants;...
1,,"[{'raw_name': 'DEPARTMENT OF COMMERCE', 'name'...",,2022-26153,4,https://www.govinfo.gov/content/pkg/FR-2022-12...,2022-12-01,"Antidumping or Countervailing Duty Order, Find..."
2,,"[{'raw_name': 'DEPARTMENT OF COMMERCE', 'name'...",,2022-26155,2,https://www.govinfo.gov/content/pkg/FR-2022-12...,2022-12-01,"Antidumping or Countervailing Duty Order, Find..."
3,,"[{'raw_name': 'DEPARTMENT OF ENERGY', 'name': ...",11/23/22.,2022-26139,2,https://www.govinfo.gov/content/pkg/FR-2022-12...,2022-12-01,Combined Notice of Filings
4,,"[{'raw_name': 'DEPARTMENT OF ENERGY', 'name': ...",,2022-26136,1,https://www.govinfo.gov/content/pkg/FR-2022-12...,2022-12-01,"AES CE Solutions, LLC; Supplemental Notice Tha..."
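The `agencies` column holds a list of agency dictionaries for each document. One way to get one row per (document, agency) pair is `pd.json_normalize` with `record_path` and `meta`. The two records below are trimmed copies of rows in the output above and keep only `document_number` and each agency's `raw_name`. In the notebook, one would pass `response.json()['results']`.

```python
import pandas as pd

# Trimmed copies of two results records (only a few fields kept).
results = [
    {"document_number": "2022-25805",
     "agencies": [{"raw_name": "DEPARTMENT OF THE INTERIOR"}]},
    {"document_number": "2022-26153",
     "agencies": [{"raw_name": "DEPARTMENT OF COMMERCE"}]},
]

# One row per agency listed on each document; document_number is copied onto each row.
df_agencies = pd.json_normalize(results, record_path="agencies", meta=["document_number"])
print(df_agencies)

# Number of documents per agency.
print(df_agencies["raw_name"].value_counts())
```

A document listing several agencies would produce several rows that share the same `document_number`.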


#### Automating API Requests

In [28]:
# Using today's date, let's see how many documents were published on the previous weekday.
# The API expects dates in the YYYY-MM-DD format.

def previous_weekday(a_date):  # Returns the previous weekday as a YYYY-MM-DD string.
    a_date = a_date - timedelta(days=1)
    while a_date.weekday() > 4:  # Monday to Friday are 0 to 4; Saturday and Sunday are 5 and 6.
        a_date = a_date - timedelta(days=1)  # Keep subtracting a day until we reach a weekday.
    return a_date.strftime('%Y-%m-%d')  # Date format expected by the API.

previous_weekday_date = previous_weekday(date.today())

# Insert the date at the end of the API URL.
response = requests.get(f"https://www.federalregister.gov/api/v1/documents.json?fields[]=abstract&fields[]=agencies&fields[]=dates&fields[]=document_number&fields[]=page_length&fields[]=pdf_url&fields[]=publication_date&fields[]=title&per_page=1000&order=newest&conditions[publication_date][is]={previous_weekday_date}")

print(f'Status code: {response.status_code}')
data = response.json()

# Start with an empty DataFrame so the title filter below still works when nothing was published.
df_published_documents = pd.DataFrame(columns=['title'])

# If documents were published on that date, enter this branch.
if data['count'] != 0:
    # Convert the results to a DataFrame.
    df_published_documents = pd.DataFrame(data['results'])
    published_documents = df_published_documents.shape[0]
    print(f'On {previous_weekday_date}, {published_documents} documents were published in the Federal Register.')

    # List of columns in the data.
    print(f'Columns in the results data: {list(df_published_documents.columns)}.')

# If no documents were published (e.g., on a federal holiday), enter this branch.
else:
    print(f'No documents were published on {previous_weekday_date}.')

# If we are interested in documents with the word "stock" in the title, we can filter as follows.
# na=False treats a missing title as a non-match.
df_documents_stocks = df_published_documents[df_published_documents['title'].str.contains('stock', case=False, na=False)]

Status code: 200
On 2022-12-02, 125 documents were published in the Federal Register.
Columns in the results data: ['abstract', 'agencies', 'dates', 'document_number', 'page_length', 'pdf_url', 'publication_date', 'title'].


In [29]:
# This is the resulting dataframe of the code block above.
print(df_documents_stocks.shape)
df_documents_stocks

(1, 8)


Unnamed: 0,abstract,agencies,dates,document_number,page_length,pdf_url,publication_date,title
65,,[{'raw_name': 'SECURITIES AND EXCHANGE COMMISS...,,2022-26232,3,https://www.govinfo.gov/content/pkg/FR-2022-12...,2022-12-02,Self-Regulatory Organizations; The Nasdaq Stoc...


# NOTEBOOK END