# Connecting to the Congress and Open Secrets APIs
First, register for an API key at the following websites:

  * Congress.gov API: https://api.congress.gov/sign-up/
  * Open Secrets API: https://www.opensecrets.org/api/admin/index.php?function=signup

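Next, go to your project folder, run `git pull origin main`, open your `.env` file, and add your keys on lines of the form `congressapi=YOURKEY` and `opensecretsapi=YOURKEY`, matching the variable names read by `os.getenv()` below. Remember: no quotes or spaces.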

Then run `docker compose up` and open the containerized Jupyter Lab. Create a new notebook called `APIs.ipynb` and follow along here.

First, load the following libraries and bring in your API keys:

```python
import numpy as np
import pandas as pd
import requests
# json and os are part of base Python, so they don't need to be in our requirements.txt file
import json
import os
congressapi = os.getenv('congressapi')
opensecretsapi = os.getenv('opensecretsapi')
```

Check that your keys loaded correctly (but don't leave in any code that displays your keys).
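One way to check without printing the keys themselves is to report only whether each variable is set and how many characters it holds. This is a minimal sketch; the helper name `key_status` is hypothetical, and it assumes the variable names `congressapi` and `opensecretsapi` used above.

```python
import os

def key_status(name: str) -> str:
    """Report whether an environment variable is set, without revealing its value."""
    value = os.getenv(name)
    if not value:
        return f"{name}: MISSING"
    # Only the length is shown, never the key itself
    return f"{name}: loaded ({len(value)} characters)"

for name in ("congressapi", "opensecretsapi"):
    print(key_status(name))
```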

## Accessing the Congress.gov API

An API (application programming interface) is a system for transferring data over the internet. As users, we supply up to four things:

  * a "root": the beginning of the URL, shared by all of the APIs that one data provider maintains
  * an "endpoint": the later part of the URL, leading to one specific API
  * parameters: arguments, sent along with the root and endpoint, that control features of the data the API sends back to us (such as a date range)
  * headers: additional information that we supply to the API to let the provider know who we are and what system we are using
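To see how these pieces fit together, the sketch below assembles a full request URL by hand with the standard library's `urllib.parse.urlencode`. It only illustrates roughly what `requests` does with its `params` argument; the parameter values are placeholders and `YOUR_KEY` stands in for a real key. Headers never appear in the URL: they travel separately with the request.

```python
from urllib.parse import urlencode

root = "https://api.congress.gov/v3"    # root: shared by every endpoint from this provider
endpoint = "/bill/118"                  # endpoint: one specific API
params = {"format": "json", "limit": 5, "api_key": "YOUR_KEY"}  # parameters (placeholder key)
headers = {"From": "you@example.edu"}   # headers: sent alongside the request, not in the URL

# Parameters are encoded as a query string after a "?"
full_url = root + endpoint + "?" + urlencode(params)
print(full_url)
# https://api.congress.gov/v3/bill/118?format=json&limit=5&api_key=YOUR_KEY
```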

Many APIs require credentials/keys, and different APIs ask that we provide the key as either a parameter or a header.

But by far the most important skill for working with APIs is finding and using the API documentation. Your goal in reading the documentation is to find out what the API requires with respect to the root, endpoint, parameters, and headers.

For the Congress.gov API, start with the documentation here: https://api.congress.gov

There are many endpoints, but (if you look at the examples) they all begin with https://api.congress.gov/v3. Let's set that as the root:

```python
root = 'https://api.congress.gov/v3'
```

To choose the right endpoint, we have to decide what data we want to obtain. We'll come back to this API many times this semester, but let's start with the bill endpoint for the 118th Congress: `/bill/{congress}`

When an endpoint includes curly braces, it means to fill in that part of the URL with the value you want. So for the 118th Congress the endpoint is `/bill/118`.

To make the congress number easy to swap out, we can build the endpoint with an "f-string". A Python f-string starts with an `f` before the opening quote, then uses curly braces to insert Python variables defined elsewhere. We can construct the endpoint as follows:

```python
congress = '118'
endpoint = f'/bill/{congress}'
endpoint
```

```
'/bill/118'
```
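The same f-string approach works for endpoints with several placeholders. The bill URLs in the results later in this notebook, such as `https://api.congress.gov/v3/bill/118/hr/5517`, add the bill type and number after the congress. A sketch of building such an endpoint; `bill_type` and `bill_number` are illustrative variable names, not names from the API documentation:

```python
congress = '118'
bill_type = 'hr'       # bill type as it appears in the URL (lowercase)
bill_number = '5517'   # bill number

# Each {...} is replaced by the value of the matching Python variable
bill_endpoint = f'/bill/{congress}/{bill_type}/{bill_number}'
print(bill_endpoint)   # /bill/118/hr/5517
```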

We can combine the root and endpoint by adding them together:

```python
root + endpoint
```

```
'https://api.congress.gov/v3/bill/118'
```

Next, consider the parameters listed in the documentation:

  * congress: The congress number. For example, the value can be 117.
  * format: The data format. Value can be xml or json.
  * offset: The starting record returned. 0 is the first record.
  * limit: The number of records returned. The maximum limit is 250.
  * fromDateTime: The starting timestamp to filter by update date. Use format: YYYY-MM-DDT00:00:00Z.
  * toDateTime: The ending timestamp to filter by update date. Use format: YYYY-MM-DDT00:00:00Z.
  * sort: Sort by update date in Congress.gov. Value can be updateDate+asc or updateDate+desc.
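Rather than typing timestamps by hand, you can generate them in the `YYYY-MM-DDT00:00:00Z` format with `datetime.strftime()`. A small sketch; the helper name `to_api_timestamp` is hypothetical, and it assumes the dates you pass are already in UTC (the trailing `Z` denotes UTC).

```python
from datetime import datetime

def to_api_timestamp(dt: datetime) -> str:
    """Format a datetime as YYYY-MM-DDTHH:MM:SSZ (assumes dt is already in UTC)."""
    return dt.strftime('%Y-%m-%dT%H:%M:%SZ')

print(to_api_timestamp(datetime(2023, 1, 1)))   # 2023-01-01T00:00:00Z
print(to_api_timestamp(datetime(2023, 9, 18)))  # 2023-09-18T00:00:00Z
```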

It is usually not necessary to supply every parameter; only parameters specifically marked as "required", like `congress`, must be supplied. But in this case, let's specify them all. Parameters go in a Python dictionary: a pair of curly braces containing "key-value" pairs, in which each parameter's name is joined to its value by a colon and the pairs are separated by commas.

Because the congress parameter is part of the endpoint, it does not need to be part of the parameters.

Also, note that the documentation's examples include our API key as the `api_key` parameter, so we add it to the dictionary as well.

Here, I want JSON (not XML) data in response. I want all bills updated between January 1 and September 18, 2023 (note that `fromDateTime` and `toDateTime` filter by update date, not by the date a bill was introduced). Each request returns at most 250 bills, so I set the limit to that maximum value; if I max out, I can call the API again with offset=250 to obtain the next 250 bills on the list. Finally, I want the results sorted in descending order by the date each bill was last updated.

The following example will clarify: 


```python
mypars = {'format': 'json',
          'offset': 0,
          'limit': 250,
          'fromDateTime': '2023-01-01T00:00:00Z',
          'toDateTime': '2023-09-18T00:00:00Z',
          'sort': 'updateDate+desc',
          'api_key': congressapi}
```
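Because `limit` caps each response at 250 records, larger result sets have to be collected page by page, raising `offset` by `limit` each time. The sketch below shows only the loop logic. `fetch_all` and `get_page` are hypothetical names: in practice `get_page(offset, limit)` would call `requests.get(root + endpoint, params={**mypars, 'offset': offset}, headers=headers)` (with the `headers` dictionary built in the next step) and return the list under `'bills'`. Here it is exercised with a fake in-memory list so it runs without a network connection.

```python
def fetch_all(get_page, limit=250, max_pages=100):
    """Collect records page by page until a page comes back short (or empty).

    get_page(offset, limit) must return a list of at most `limit` records.
    max_pages is a safety cap so a misbehaving source can't cause an endless loop.
    """
    records = []
    offset = 0
    for _ in range(max_pages):
        page = get_page(offset, limit)
        records.extend(page)
        if len(page) < limit:   # a short page means there is nothing left
            break
        offset += limit         # the next page starts where this one ended
    return records

# Fake data source standing in for the API: 600 numbered records
fake_records = list(range(600))

def fake_get_page(offset, limit):
    return fake_records[offset:offset + limit]

all_records = fetch_all(fake_get_page)
print(len(all_records))  # 600
```

With 600 records the loop requests offsets 0, 250, and 500, and stops after the third page returns only 100 records.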

Finally, we need to create the headers. Headers are not always necessary, depending on the API, but using them to tell the provider who we are is always good etiquette. There are two fields we should define. "From" gives us a place to provide our email, in case our request causes a problem for the data provider. "User-Agent" holds the user-agent string: a small piece of text that describes the system we are using to connect to the API.

The easiest way to find the user-agent string is to connect to https://httpbin.org/user-agent, a website that reflects our user-agent back to us. Use this code to get the user-agent string:

```python
r = requests.get('https://httpbin.org/user-agent')
useragent = json.loads(r.text)['user-agent']
useragent
```

```
'python-requests/2.31.0'
```
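If you'd rather not make a network request just to learn this string, `requests` can report the User-Agent it sends by default through `requests.utils.default_user_agent()`. It should match what httpbin echoes back when no custom User-Agent is set; the version number depends on the version of `requests` installed in your container.

```python
import requests

# The User-Agent string requests sends by default, e.g. 'python-requests/2.31.0'
useragent = requests.utils.default_user_agent()
print(useragent)
```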

Then our headers dictionary is:

```python
headers = {'User-Agent': useragent,
           'From': 'jkropko@virginia.edu'}  # use your own email address here
```

Finally, to put it all together, we use the `requests.get()` method, passing the root+endpoint, the parameters to the `params` argument, and the headers to the `headers` argument:

```python
r = requests.get(root + endpoint,
                 params=mypars,
                 headers=headers)
```

If the request was successful, the response shows status code 200. Anything other than 200 is cause for worry:

```python
r
```

```
<Response [200]>
```
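To interpret a code other than 200, the standard library's `http.HTTPStatus` maps each code to its standard name. A small sketch; `describe_status` is a hypothetical helper. With a live response you would pass it `r.status_code`, or call `r.raise_for_status()`, which raises an exception for 4xx and 5xx responses.

```python
from http import HTTPStatus

def describe_status(code: int) -> str:
    """Return a human-readable description of an HTTP status code."""
    try:
        return f"{code} {HTTPStatus(code).phrase}"
    except ValueError:
        return f"{code} (non-standard status code)"

# Codes you might see from an API. A 403 often points to a missing or invalid key,
# and a 429 means you have sent too many requests in a short period.
for code in (200, 403, 404, 429, 500):
    print(describe_status(code))
```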

To extract the data in raw format, use the `text` attribute of the response object we just defined (the line is commented out here because its output is very long):

```python
# r.text
```

The output is long and mostly unreadable. It is a string in JSON format (we'll talk more about JSON later). The important thing is to get Python to understand that the curly braces in this text represent dictionaries and the square brackets represent lists. To convert the string in this way, use `json.loads()` (commented out here because of its length):

```python
# json.loads(r.text)
```
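To see what `json.loads()` does without scrolling through the full response, here is a made-up JSON string (not real API output) with similar nesting. Curly braces become a Python `dict` and square brackets a `list`, so individual records can be reached with ordinary indexing. With a `requests` response, `r.json()` performs the same conversion.

```python
import json

# A tiny made-up JSON string: an object holding a list of objects
raw = '{"bills": [{"number": "5517", "type": "HR"}, {"number": "5497", "type": "HR"}]}'

data = json.loads(raw)
print(type(data))                  # <class 'dict'>
print(type(data['bills']))         # <class 'list'>
print(data['bills'][0]['number'])  # 5517
```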

The most useful way to present the data, however, is as a data frame. For that, use the `pd.json_normalize()` function. To work, we have to specify the `record_path` argument: the key (or sequence of keys) that leads to the list containing all of the records we want as rows in the data frame:

```python
mydf = pd.json_normalize(json.loads(r.text),
                         record_path=['bills'])
mydf
```

| | congress | number | originChamber | originChamberCode | title | type | updateDate | updateDateIncludingText | url | latestAction.actionDate | latestAction.text | latestAction.actionTime |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 118 | 5517 | House | H | To reauthorize programs of the Economic Develo... | HR | 2023-09-16 | 2023-09-16T07:14:01Z | https://api.congress.gov/v3/bill/118/hr/5517?f... | 2023-09-15 | Referred to the Subcommittee on Economic Devel... | |
| 1 | 118 | 5497 | House | H | To amend the Homeland Security Act of 2002 imp... | HR | 2023-09-16 | 2023-09-16T07:14:01Z | https://api.congress.gov/v3/bill/118/hr/5497?f... | 2023-09-15 | Referred to the Subcommittee on Economic Devel... | |
| 2 | 118 | 5473 | House | H | Promoting Resilient Buildings Act of 2023 | HR | 2023-09-17 | 2023-09-17T13:30:21Z | https://api.congress.gov/v3/bill/118/hr/5473?f... | 2023-09-15 | Referred to the Subcommittee on Economic Devel... | |
| 3 | 118 | 5465 | House | H | To require the head of each agency to allow me... | HR | 2023-09-16 | 2023-09-16T07:15:31Z | https://api.congress.gov/v3/bill/118/hr/5465?f... | 2023-09-15 | Referred to the Subcommittee on Economic Devel... | |
| 4 | 118 | 5457 | House | H | To support carbon dioxide removal research and... | HR | 2023-09-16 | 2023-09-16T07:14:00Z | https://api.congress.gov/v3/bill/118/hr/5457?f... | 2023-09-15 | Referred to the Subcommittee on Water Resource... | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 245 | 118 | 5398 | House | H | Advancing Tech Startups Act | HR | 2023-09-14 | 2023-09-14T11:15:19Z | https://api.congress.gov/v3/bill/118/hr/5398?f... | 2023-09-12 | Referred to the House Committee on Energy and ... | |
| 246 | 118 | 5421 | House | H | EITC Modernization Act | HR | 2023-09-14 | 2023-09-14T22:15:22Z | https://api.congress.gov/v3/bill/118/hr/5421?f... | 2023-09-12 | Referred to the House Committee on Ways and Me... | |
| 247 | 118 | 679 | House | H | Expressing support for the designation of the ... | HRES | 2023-09-15 | 2023-09-15T07:15:28Z | https://api.congress.gov/v3/bill/118/hres/679?... | 2023-09-12 | Referred to the House Committee on Oversight a... | |
| 248 | 118 | 5414 | House | H | Improving Mental Health Access from the Emerge... | HR | 2023-09-14 | 2023-09-14T22:15:22Z | https://api.congress.gov/v3/bill/118/hr/5414?f... | 2023-09-12 | Referred to the House Committee on Energy and ... | |
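`record_path` picks out the list of records, and `pd.json_normalize()` also flattens any nested dictionaries inside each record into dotted column names, which is why the table above has columns such as `latestAction.actionDate` and `latestAction.text`. A minimal sketch on a made-up two-record response (the values are placeholders, not real API output):

```python
import pandas as pd

# Made-up response with a nested dictionary inside each record
response = {
    'bills': [
        {'number': '1', 'type': 'HR',
         'latestAction': {'actionDate': '2023-09-15', 'text': 'Referred to committee.'}},
        {'number': '2', 'type': 'HRES',
         'latestAction': {'actionDate': '2023-09-12', 'text': 'Referred to committee.'}},
    ],
    'pagination': {'count': 2},  # keys outside record_path are not turned into rows
}

df = pd.json_normalize(response, record_path=['bills'])
print(df.shape)              # (2, 4)
print(df.columns.tolist())   # includes 'latestAction.actionDate' and 'latestAction.text'
```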


Now that we've successfully used this endpoint, try to get data from the `/member` endpoint. Then try 