In the world of programming, especially for social scientists exploring data-rich environments, Application Programming Interfaces (APIs) serve as crucial tools. An API can be thought of as a set of rules and protocols that allows one software application to interact with another. It defines the methods and data formats that developers should use when programming their software to interact with other software, whether it be web-based services, operating systems, database systems, computer hardware, or software libraries.

# What is an API?

APIs are like intermediaries that allow applications to communicate with each other. For example, when you use a social media analytics tool to gather data about trends or engagement, this tool likely uses an API to retrieve the data directly from the social media platform. This direct communication between software through APIs can simplify complex processes, making them more manageable and standardized.

In the context of Python programming, an API could allow your Python script to send a request to a web service. This web service then processes the request, retrieves the necessary data, and sends it back to your script. This is commonly seen in tasks like accessing Twitter data, where Python libraries such as Tweepy abstract the API usage into simple functions that a social scientist can use without needing to understand the underlying technical complexities.

## Advantages of Using APIs

APIs allow you to directly access the data you need in a format that’s easy to integrate with your applications, significantly reducing the time and effort required for data processing and cleaning. This makes APIs highly efficient for retrieving data, which is especially valuable in research settings where time and accuracy are critical. The ability to access data directly from the source also reduces the risk of errors that can occur during data transcription.

APIs are maintained by service providers, offering a reliable and stable source of data. This reliability is essential for longitudinal studies and ongoing research projects that require consistent data inputs over time. Furthermore, the stability provided by APIs can help in planning and executing research projects without the need to continuously adjust for data sourcing issues.

APIs can handle large amounts of data efficiently, which is crucial for conducting extensive social science research. This scalability allows researchers to access larger datasets than would be feasible with manual collection methods, supporting more comprehensive analyses and insights. The ability to scale data requests also helps in accommodating the needs of larger research teams or projects with broader scope.

Once you are comfortable with a service's API, accessing data is much quicker than using that service's web interface.  (Many times an organization will go further and make less data available via its web interface than via its API.)  While searching via a graphical interface feels easier to a novice programmer, it is much slower than querying an API once you figure out how to structure your queries.  For example, if you learn about a new variable to download, you do not have to navigate through menus and wait for the website to reload as you navigate backwards; you can simply change the variable name in your query and send it to the API.  In this way you can get more data more quickly with APIs than by searching via web interfaces.

## Disadvantages of Using APIs

APIs may have limitations set by providers, such as rate limits or restricted access to certain types of data. These restrictions can impact how much data can be retrieved and how often requests can be made, potentially limiting the scope of research projects. It is important for researchers to be aware of these limitations and plan their data collection accordingly.
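
One way to plan around a rate limit is to space requests out. The sketch below assumes a hypothetical limit expressed as requests per minute; the function name `throttled` and the limit of 30 are illustrative and not taken from any provider's documentation.

```python
import time

def throttled(items, max_per_minute, sleep=time.sleep):
    """Yield items one at a time, pausing between them so that
    roughly max_per_minute items are yielded per minute."""
    delay = 60.0 / max_per_minute  # seconds to wait between requests
    for i, item in enumerate(items):
        if i > 0:
            sleep(delay)  # pause before every item except the first
        yield item

# Usage: each URL would be requested as it is yielded.
# for url in throttled(urls, max_per_minute=30):
#     ...send the request...
```

Passing `sleep` as a parameter makes the pacing easy to check without actually waiting.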

Some APIs, especially those offering high-quality or extensive data services, might be costly. This cost can be prohibitive for academic institutions or individual researchers with limited funding. The financial aspect of using APIs must be considered when planning projects that require large volumes of data or frequent data updates.

Using an API means relying on an external provider for data. If the API is discontinued or its terms of service change, it might affect the availability of data or the conditions under which it can be used. This dependence on external providers requires researchers to have contingency plans in case their access to the API is compromised.

# Why APIs Exist
Organizations like application programming interfaces because they reduce the external load on internal hardware and software resources.  Example organizations include private companies like Meta or government services like the United States Census or states' Departments of Motor Vehicles.  There is so much traffic on the internet that exposing internal databases to it could overwhelm a company, and the difficulty only intensifies as more data are exposed.  APIs allow organizations to limit requests for data by requiring requestors to verify themselves, and with this verification organizations can limit how much data can be retrieved.  These ceilings render potential digital chaos into more predictable streams of requests, making it easier to predict how much hardware and human resources an organization will have to budget for.

APIs offer a level of predictability in interactions with external software, ensuring that data exchanges and functionality are executed reliably and consistently according to predefined rules. This predictability aids in maintaining system integrity and reducing the likelihood of errors during data transactions. Additionally, APIs grant organizations more control over how their data and services are accessed and used by third parties. By defining clear parameters and protocols, they can safeguard their systems and manage the load on their resources more effectively.  They can even charge for providing more data to requestors, allowing the API to fund its maintenance or even contribute revenue to the organization.

Furthermore, by making APIs available, organizations encourage the development of a vibrant ecosystem of third-party applications. This ecosystem can lead to increased usage of their platforms, as developers create innovative applications that enhance the core offerings, thereby driving user engagement and expanding the organization's reach and impact in the market. These third-party applications add value for both users and the organization, fostering a symbiotic relationship that stimulates continuous growth and innovation.

For all these reasons, APIs are highly useful to organizations that provide them.  They are also useful to groups that take advantage of them; these groups range from data science teams in large organizations to small app developer companies to academics and non-profits using them for research purposes.  

# APIs Are Not Web Scraping
Using an application programming interface is not equivalent to scraping a website.  Scraping typically consists of _unauthenticated requests_.  Since it is more difficult for an organization to monitor scraping, a scraper can theoretically retrieve much more data from an organization than an API requestor.  Until the rise of smartphones and social media, scraping was the dominant method of retrieving data from the internet, and organizations often did not understand the value of their data.  As a result, organizations have often been shocked upon seeing use cases they had not envisioned.

Such was the case when LinkedIn served hiQ, a data analytics company, a cease-and-desist order for scraping its data, claiming doing so violated the Computer Fraud and Abuse Act.  In response, hiQ sued LinkedIn in _hiQ Labs, Inc. v. LinkedIn Corp._  The [case was an odyssey](https://natlawreview.com/article/hiq-and-linkedin-reach-proposed-settlement-landmark-scraping-case) and even reached the Supreme Court, which in June 2021 sent it back down to the 9th Circuit Court of Appeals.  ([See the case text here.](https://cdn.ca9.uscourts.gov/datastore/opinions/2022/04/18/17-16783.pdf))  The parties settled in December 2022, by which point hiQ was bankrupt.  As _The National Law Review_ explains:

    Practically speaking, though, the dispute had essentially reached its logical end with the last court ruling in November – hiQ had prevailed on the Computer Fraud and Abuse Act (CFAA) “unauthorized access” issue related to public website data but was facing a ruling that it had breached LinkedIn’s User Agreement due to its scraping and creation of fake accounts (subject to its equitable defenses).  

Under this ruling, scraping publicly available data does not by itself violate the CFAA, although it can still breach a website's user agreement.  In the last few years more and more of organizations' websites have gone behind user login walls, vastly reducing the amount of public-facing data.  As of this writing we have seen no confirmation that the two trends are related, but the possibility makes sense.

Web scraping involves programmatically navigating and extracting data from websites. Unlike APIs, which provide data in a structured format (such as JSON or XML), web scraping requires parsing the HTML content of the web page itself, extracting the needed data embedded in the page's code.

APIs are designed to be accessed by programs and provide data in a predictable, structured format. In contrast, web scraping involves extracting data from the HTML and is susceptible to failure if the website's layout changes. This makes APIs a more reliable method for data retrieval, ensuring that data is consistently delivered in a usable format.
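
The difference can be seen in a small sketch that reads the same two records from a JSON string and from an HTML table. The two rent figures come from the B25064 table shown later in this chapter; the JSON layout, the HTML snippet, and the class name `CellCollector` are made up for illustration.

```python
import json
from html.parser import HTMLParser

# The same two records, first as an API might return them (JSON) ...
api_response = '[{"state": "Alabama", "rent": 792}, {"state": "Alaska", "rent": 1244}]'
# ... and then as they might appear inside a web page (HTML).
web_page = """
<table>
  <tr><td>Alabama</td><td>792</td></tr>
  <tr><td>Alaska</td><td>1244</td></tr>
</table>
"""

# API: one call turns the text into Python objects.
from_api = [(row["state"], row["rent"]) for row in json.loads(api_response)]

# Scraping: walk the HTML and rely on its layout staying the same.
class CellCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_cell = False
        self.cells = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell:
            self.cells.append(data.strip())

parser = CellCollector()
parser.feed(web_page)
# Pair up the cells two at a time: (state, rent).
from_web = [(parser.cells[i], int(parser.cells[i + 1]))
            for i in range(0, len(parser.cells), 2)]

print(from_api == from_web)
```

If the website added a third column to the table, the pairing step would silently produce wrong records, while the JSON version would be unaffected.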

API calls are generally more efficient because they are intended for programmatic access and return data quickly in a consistent format. Web scraping can be slower and more resource-intensive, as it often involves downloading entire web pages and searching through large amounts of HTML code to find the necessary data. This efficiency makes APIs particularly useful for handling large datasets or performing frequent data updates.

Using APIs provided by a service is typically sanctioned by the service provider, making it a more secure and ethically sound method of data collection. In contrast, web scraping can violate the terms of service of a website and may raise legal and ethical concerns. This distinction is crucial for researchers who need to ensure that their data collection methods are compliant with legal standards.


# Understanding Key API Protocols: REST, RPC, and SOAP
APIs can utilize various protocols, with REST, RPC, and SOAP being among the most common. Each protocol has its unique approach to enabling communication between systems.

REST (Representational State Transfer) is, strictly speaking, an architectural style rather than a protocol; it uses standard HTTP methods like GET, POST, PUT, and DELETE for communication. Its strength lies in its simplicity and how closely it aligns with the architecture of the web, making it intuitive for web developers to use. RESTful APIs are stateless, meaning that each request from a client to the server must contain all the information needed to understand the request, and session state is held on the client side only. This makes REST scalable and efficient for internet applications, although it might be too inflexible for more complex querying operations.
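
As a rough sketch of the idea, the same resource URL combined with different HTTP methods expresses different actions. The URL below is a placeholder on `example.com`, not a real API, and the requests are only constructed, never sent.

```python
from urllib.request import Request

# Hypothetical resource URL used only for illustration.
resource = "https://api.example.com/surveys/42"

# One URL, three different actions, distinguished by the HTTP method.
read_it = Request(resource, method="GET")
replace_it = Request(resource, data=b'{"title": "Rent survey"}', method="PUT")
delete_it = Request(resource, method="DELETE")

for req in (read_it, replace_it, delete_it):
    print(req.get_method(), req.full_url)
```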

RPC (Remote Procedure Call) enables a program to execute a procedure on a different address space (commonly on another computer on a shared network), which is made to appear as if it were a local procedure call, without the programmer explicitly coding the details for the remote interaction. This makes it very effective for actions that are tightly coupled to a procedure call that returns a specific result. However, RPC can become complicated when dealing with the broader requirements and flexibilities of a distributed system, as it tends to be more tightly coupled to the application it serves.

SOAP (Simple Object Access Protocol) is a protocol based on XML for accessing web services. It provides a robust mechanism for ensuring that the communication is strongly typed and compliant with formal standards and enterprise-level security features. SOAP can handle complex transactions and offers considerable flexibility in terms of transport (can be used over HTTP, FTP, SMTP, etc.), but it also involves a considerable amount of overhead which can lead to slower performance and more complexity compared to REST.

REST is the most commonly used of these approaches, primarily because of its simplicity and compatibility with the web. Most modern web APIs are designed around REST principles because REST handles multiple types of calls efficiently, can return different data formats, and is stateless, which scales well for web environments.

# Understanding API Authentication
API authentication is a critical component of API design, ensuring that only authorized users can access the API. APIs make traffic flow manageable by tracking who makes requests, and they do that tracking by requiring requestors to authenticate.  This section briefly discusses the most common methods of authentication.

HTTP Authentication: This is a simple challenge-response mechanism where the server challenges a request, and the client provides authentication credentials. HTTP Basic Authentication is easy to implement but not very secure unless combined with an encrypted connection (SSL/TLS), as credentials are only base64 encoded, not encrypted.
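
A short sketch shows why base64 encoding offers no protection on its own. The username and password are made up for illustration.

```python
import base64

# Made-up credentials for illustration only.
username, password = "researcher", "s3cret"

# HTTP Basic Authentication joins the two with a colon and base64-encodes them.
encoded = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
header = {"Authorization": f"Basic {encoded}"}

# Anyone who sees the header can reverse the encoding, which is why
# Basic Authentication should only travel over an encrypted connection.
decoded = base64.b64decode(encoded).decode("utf-8")
print(decoded)
```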

API Tokens: In this method, a user or client application receives a token after providing their authentication credentials. The client must include this token in the API requests to access the API. Tokens are typically a secure way of handling authentication as they can be easily revoked, have limited lifetimes, and ensure that the original credentials are not sent with each request.
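
How the token is attached varies by service. Many services expect it in an `Authorization` header, often with the `Bearer` scheme, while others, including the Census API used later in this chapter, accept a key as a URL parameter. The sketch below assumes the header approach; the URL and token are placeholders, and the request is constructed but not sent.

```python
from urllib.request import Request

# Placeholder values: neither the URL nor the token is real.
url = "https://api.example.com/data"
token = "<your_token>"

# Build (but do not send) a request that carries the token in a header.
req = Request(url, headers={"Authorization": f"Bearer {token}"})
print(req.get_header("Authorization"))
```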

OAuth 2.0: This is a more complex, secure, and versatile standard for authorization. OAuth 2.0 allows third-party services to exchange web resources on behalf of a user. It's particularly useful for scenarios where you need to allow users to access resources without exposing their credentials to the third-party service. OAuth 2.0 uses access tokens for authentication and provides various flows for different types of client applications, making it highly effective for large-scale API access in web and mobile applications.

Each authentication method has its specific use cases and trade-offs between security and ease of implementation. Choosing the right method depends on the particular needs of the application, the security requirements, and the expected scale of API usage.

# Using the United States Census API
Now we turn to actually using an API, in this case that of the United States Census.  By using the API you can access any of the datasets the United States Census makes available via the web interface.  (Some APIs let you access more data than is provided via a web interface; it all depends on the API owner's preferences.) For an overview of how Census data is structured and what datasets are available, see [this training video from the Census](https://www.census.gov/data/academy/webinars/2022/getting-started-with-american-community-survey-data.html).

## Orientation
Before using any organization's API, orient yourself with key features of the API.  Start orientation by finding the developers' portal, a comprehensive resource offered by many organizations.  Spend at least an hour there learning about the API. [This link goes to the United States Census'](https://www.census.gov/data/developers.html) and below is a screenshot of that page, both as of May 2024.

![The US Census' developer portal](./figures/census_developerHome.png)

A developers' portal typically houses all the tools, resources, and information necessary to start using the API effectively. As you can see at the Census' developer portal home, one-click access to the API documentation, a Slack channel, and a mailing list is available, and there are two places to request a key (discussed in the next subsection).

API documentation is the cornerstone of successfully using an API. It includes detailed descriptions of various endpoints, data types, authentication methods, and error codes that one might encounter. It also provides sample requests and responses that are especially useful for understanding how to interact with the API.  Developers should pay special attention to the sections concerning rate limits, which define how many requests can be made in a given period. Adhering to these limits is essential to avoid service interruptions or API access issues.
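
Many error codes are standard HTTP status codes, which Python's standard library can describe. The sketch below is a minimal helper, with the hypothetical name `describe_status`, that groups codes into broad families. An individual API may attach its own meanings to these codes, so its documentation remains the authority.

```python
from http import HTTPStatus

def describe_status(code):
    """Return a short, human-readable description of an HTTP status code."""
    status = HTTPStatus(code)
    if code < 400:
        outcome = "not an error"
    elif code < 500:
        outcome = "problem with the request"
    else:
        outcome = "problem on the server"
    return f"{code} {status.phrase}: {outcome}"

for code in (200, 401, 404, 429, 500):
    print(describe_status(code))
```

Code 429 ("Too Many Requests") is the one commonly associated with exceeding a rate limit.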

Most developers' portals feature a Frequently Asked Questions (FAQ) section and a developer forum. The FAQ addresses common challenges and queries that other developers have encountered, offering solutions and best practices that can save time and prevent common mistakes.  The forum is where developers ask questions they could not answer from the documentation.  It often allows for interaction with parts of an organization's engineering team and can be the best place to go for answers after an organization modifies its API.  Forums are only useful when there are enough developers that there is frequent activity on the forum.

While orienting yourself, you may find that an organization maintains multiple APIs.  For example, the [US Census maintains](https://www.census.gov/data/developers/guidance/api-user-guide.html):

    three underlying services: Census Data API, TIGERweb REST Services and the Geocoder REST Services:

    *Census Data API*

    The Census Data Application Programming Interface (API) is an API that gives the public access to raw statistical data from various Census Bureau data programs. In terms of space, we aggregate the data and usually associate them with a certain Census geographic boundary/area defined by a FIPS code. In terms of time, we associate the data with a specific vintage (reference year).

    *TIGERweb*

    TIGERweb GeoServices REST API  provides Census area boundaries/shapes referenced by FIPS codes. This service can take two types of parameters to return one or more Census boundaries: a FIPS code or a latitude/longitude pair. FIPS codes are 12-digit codes that are hierarchical in code so that the higher numbers define higher-level geographies and lower numbers define lower-level geographies.

    *Geocoder*

    Our publicly available Geocoding Services API  translates addresses and other location formats into latitude/longitude parameters, which are then fed into the TIGERweb REST services to request a Census boundary.

This chapter focuses on the Census Data API, but notice how useful the other two are for geospatial analysis.

### Other Developer Portals
Here is what Twitter's developer portal looks like as of May 2024:

![Twitter developer portal](./figures/twitter_developerHome.png)

And here is a section of Meta's developer portal for Instagram as of May 2024:

![Instagram developer portal](./figures/instagram_developerHome.png)

## Authentication
Authenticating requires identification, usually in the form of a developer key.  This subsection provides screenshots of that process for the US Census as of May 2024.

In the earlier screenshot, you may have noticed a big yellow button with the text "Request a Key".  In case you did not, here is that screen zoomed in:

![Request a key button](./figures/census_developerHome_requestKey.png)

After clicking "Request a key", you will be asked to provide an organization and an email address.  There is no documentation indicating that access is restricted to particular organizations, so it appears that you can put whatever name you want in there.  Be nice.

![Provide information](./figures/census_requestKey_info.png)

The email address you provide will soon receive a confirmation email with your API key (a long sequence of numbers and letters) and an activation link.  Copy and paste the API key into a text file and save it somewhere you will remember.  Click the activation link and you should see the following screen:

![Success](./figures/census_requestKey_success.png)

Once you have saved the key and activated it, delete the email to make it harder for others to find the key information.  Your future self is part of others, so make sure the key is saved somewhere you will remember.
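
Keeping the key in its own file also keeps it out of any script you share. Below is a minimal sketch of a helper for reading it back; the function name `load_api_key` is hypothetical, and the default file name `census_key.txt` is the one used later in this chapter.

```python
from pathlib import Path

def load_api_key(path="census_key.txt"):
    """Read an API key from a text file, ignoring surrounding whitespace."""
    key = Path(path).read_text().strip()
    if not key:
        raise ValueError(f"{path} is empty; paste your API key into it.")
    return key
```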

One quirk of the US Census is that a key is technically not required to use the API.  Without a key, one can only make 500 requests per IP address per day, plenty for most research projects.  Nonetheless, the steps and code shown in this chapter are worthwhile because most APIs require a key for use.

## Download Data Directly
Now it is time to download data, a process often called "making requests". For the rest of this lesson, we will use the American Community Survey API to download the data provided with this book.  There are many other datasets the Census provides via its API. 

Downloading data from an API is as simple as creating a URL that tells the service what data to return.  In this way, it is like web scraping, but creating a URL for an API is still different because it requires authentication and therefore allows the service to monitor usage.

The structure of the request URLs is `https://api.census.gov/data/year/dataset_name/acronym/profile?get=<variable1,variable2,...variable50>&for=geography&key=<your_key>`.  The example call the Census provides is:

    https://api.census.gov/data/2019/acs/acs1?get=NAME,B02015_009E,B02015_009M&for=state:*&key=<your_key>
    
This URL requests 2019 ACS 1-year data for the total number of Hmong people (B02015_009E) with a corresponding margin of error (B02015_009M) for all states.  (The dataset is technically the ACS and "acs1" is the acronym for the one-year version.)  The "\*" is the same wildcard character introduced in the qualitative analysis chapter, so in this case it stands for all states.
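
To make the parts of the URL explicit, the sketch below reassembles the example call from its components using only the standard library. The variable names (`dataset`, `variables`, `query`) are illustrative, and `<your_key>` is a placeholder.

```python
from urllib.parse import urlencode

year = "2019"
dataset = "acs/acs1"            # the ACS, one-year version
variables = ["NAME", "B02015_009E", "B02015_009M"]
api_key = "<your_key>"          # placeholder; substitute your own key

query = urlencode(
    {"get": ",".join(variables), "for": "state:*", "key": api_key},
    safe=",:*<>",               # leave these characters unescaped for readability
)
url = f"https://api.census.gov/data/{year}/{dataset}?{query}"
print(url)
```

Building the query from a list of variables makes it easy to add or remove a variable without retyping the URL.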

_As of May 2024, this call is wrong because it does not contain "profile?"._

To see what this call returns in terms of data, paste the URL into a web browser search bar.  The browser will display the returned data.  You could copy and paste those data into a spreadsheet, but it is better to retrieve the data programmatically.

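To return to data closer to what we have analyzed, use the variable DP04_0132E, median rent.  This variable has been provided to you at the place level DP02, but for this call we will use the state level and ask only for 2019 to save space.  (The values returned below range from the hundreds to the hundreds of thousands, which look more like counts than dollar amounts, so confirm the variable's label in the Census variable list before interpreting them.)  The call is: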

    https://api.census.gov/data/2019/acs/acs5/profile?get=DP04_0132E&for=state:*&key=<your_key>

The result is:

    [["DP04_0132E","state"],
    ["1427","01"],
    ["2417","02"],
    ["7942","04"],
    ["886","05"],
    ["391290","06"],
    ["17752","08"],
    ["711","10"],
    ["13347","11"],
    ["9972","09"],
    ["48996","12"],
    ["8221","13"],
    ["623","16"],
    ["15363","15"],
    ["27987","17"],
    ["1997","18"],
    ["1262","19"],
    ["1177","20"],
    ["1423","21"],
    ["2022","22"],
    ["785","23"],
    ["24265","24"],
    ["43353","25"],
    ["5628","26"],
    ["5894","27"],
    ["691","28"],
    ["2559","29"],
    ["404","30"],
    ["752","31"],
    ["3540","32"],
    ["1453","33"],
    ["40586","34"],
    ["645","35"],
    ["145134","36"],
    ["6106","37"],
    ["626","38"],
    ["4612","39"],
    ["1347","40"],
    ["7373","41"],
    ["14976","42"],
    ["1197","44"],
    ["3707","45"],
    ["379","46"],
    ["4038","47"],
    ["38715","48"],
    ["653","50"],
    ["2121","49"],
    ["35307","51"],
    ["32579","53"],
    ["275","54"],
    ["2921","55"],
    ["330","56"],
    ["435","72"]]


Notice that there is no state name, only codes.  When that happens, add "NAME" to the comma-separated list of variables after "get=".  To retrieve the same data but with state names, the call is:

    https://api.census.gov/data/2019/acs/acs5/profile?get=DP04_0132E,NAME&for=state:*&key=<your_key>

The result is:

    [["DP04_0132E","NAME","state"],
    ["1427","Alabama","01"],
    ["2417","Alaska","02"],
    ["7942","Arizona","04"],
    ["886","Arkansas","05"],
    ["391290","California","06"],
    ["17752","Colorado","08"],
    ["711","Delaware","10"],
    ["13347","District of Columbia","11"],
    ["9972","Connecticut","09"],
    ["48996","Florida","12"],
    ["8221","Georgia","13"],
    ["623","Idaho","16"],
    ["15363","Hawaii","15"],
    ["27987","Illinois","17"],
    ["1997","Indiana","18"],
    ["1262","Iowa","19"],
    ["1177","Kansas","20"],
    ["1423","Kentucky","21"],
    ["2022","Louisiana","22"],
    ["785","Maine","23"],
    ["24265","Maryland","24"],
    ["43353","Massachusetts","25"],
    ["5628","Michigan","26"],
    ["5894","Minnesota","27"],
    ["691","Mississippi","28"],
    ["2559","Missouri","29"],
    ["404","Montana","30"],
    ["752","Nebraska","31"],
    ["3540","Nevada","32"],
    ["1453","New Hampshire","33"],
    ["40586","New Jersey","34"],
    ["645","New Mexico","35"],
    ["145134","New York","36"],
    ["6106","North Carolina","37"],
    ["626","North Dakota","38"],
    ["4612","Ohio","39"],
    ["1347","Oklahoma","40"],
    ["7373","Oregon","41"],
    ["14976","Pennsylvania","42"],
    ["1197","Rhode Island","44"],
    ["3707","South Carolina","45"],
    ["379","South Dakota","46"],
    ["4038","Tennessee","47"],
    ["38715","Texas","48"],
    ["653","Vermont","50"],
    ["2121","Utah","49"],
    ["35307","Virginia","51"],
    ["32579","Washington","53"],
    ["275","West Virginia","54"],
    ["2921","Wisconsin","55"],
    ["330","Wyoming","56"],
    ["435","Puerto Rico","72"]]


The Census API distinguishes between variables and tables, a collection of variables.  Each call can request up to 50 distinct variables.  To request more than 50 one has to request a table.  The key is to add "group(\<table\>)" after "get=" to the URL.  Here is an example call that retrieves the B25064 table, median gross rent.  

    https://api.census.gov/data/2019/acs/acs5?get=group(B25064)&for=state:*&key=<your_key>

The result is:

    [["B25064_001E","B25064_001EA","B25064_001M","B25064_001MA","GEO_ID","NAME","state"],
    ["792",null,"5",null,"0400000US01","Alabama","01"],
    ["1244",null,"13",null,"0400000US02","Alaska","02"],
    ["1052",null,"4",null,"0400000US04","Arizona","04"],
    ["745",null,"4",null,"0400000US05","Arkansas","05"],
    ["1503",null,"4",null,"0400000US06","California","06"],
    ["1271",null,"6",null,"0400000US08","Colorado","08"],
    ["1130",null,"8",null,"0400000US10","Delaware","10"],
    ["1541",null,"19",null,"0400000US11","District of Columbia","11"],
    ["1180",null,"6",null,"0400000US09","Connecticut","09"],
    ["1175",null,"4",null,"0400000US12","Florida","12"],
    ["1006",null,"4",null,"0400000US13","Georgia","13"],
    ["853",null,"7",null,"0400000US16","Idaho","16"],
    ["1617",null,"18",null,"0400000US15","Hawaii","15"],
    ["1010",null,"4",null,"0400000US17","Illinois","17"],
    ["826",null,"3",null,"0400000US18","Indiana","18"],
    ["789",null,"4",null,"0400000US19","Iowa","19"],
    ["850",null,"4",null,"0400000US20","Kansas","20"],
    ["763",null,"4",null,"0400000US21","Kentucky","21"],
    ["866",null,"4",null,"0400000US22","Louisiana","22"],
    ["853",null,"8",null,"0400000US23","Maine","23"],
    ["1392",null,"5",null,"0400000US24","Maryland","24"],
    ["1282",null,"6",null,"0400000US25","Massachusetts","25"],
    ["871",null,"3",null,"0400000US26","Michigan","26"],
    ["977",null,"4",null,"0400000US27","Minnesota","27"],
    ["780",null,"5",null,"0400000US28","Mississippi","28"],
    ["830",null,"4",null,"0400000US29","Missouri","29"],
    ["810",null,"10",null,"0400000US30","Montana","30"],
    ["833",null,"5",null,"0400000US31","Nebraska","31"],
    ["1107",null,"5",null,"0400000US32","Nevada","32"],
    ["1111",null,"9",null,"0400000US33","New Hampshire","33"],
    ["1334",null,"5",null,"0400000US34","New Jersey","34"],
    ["844",null,"6",null,"0400000US35","New Mexico","35"],
    ["1280",null,"3",null,"0400000US36","New York","36"],
    ["907",null,"4",null,"0400000US37","North Carolina","37"],
    ["826",null,"8",null,"0400000US38","North Dakota","38"],
    ["808",null,"3",null,"0400000US39","Ohio","39"],
    ["810",null,"4",null,"0400000US40","Oklahoma","40"],
    ["1110",null,"5",null,"0400000US41","Oregon","41"],
    ["938",null,"3",null,"0400000US42","Pennsylvania","42"],
    ["1004",null,"10",null,"0400000US44","Rhode Island","44"],
    ["894",null,"5",null,"0400000US45","South Carolina","45"],
    ["747",null,"8",null,"0400000US46","South Dakota","46"],
    ["869",null,"3",null,"0400000US47","Tennessee","47"],
    ["1045",null,"3",null,"0400000US48","Texas","48"],
    ["985",null,"12",null,"0400000US50","Vermont","50"],
    ["1037",null,"8",null,"0400000US49","Utah","49"],
    ["1234",null,"5",null,"0400000US51","Virginia","51"],
    ["1258",null,"6",null,"0400000US53","Washington","53"],
    ["725",null,"5",null,"0400000US54","West Virginia","54"],
    ["856",null,"2",null,"0400000US55","Wisconsin","55"],
    ["855",null,"13",null,"0400000US56","Wyoming","56"],
    ["478",null,"4",null,"0400000US72","Puerto Rico","72"]]
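
A table request can return columns that are empty for every row, such as the two `null` columns above, whose names ending in "EA" and "MA" suggest they are annotation columns. The sketch below parses two rows copied from that result and drops any column that is entirely `null`; the variable names are illustrative.

```python
import json

# Two rows copied from the result above, kept as raw JSON text.
raw = '''[["B25064_001E","B25064_001EA","B25064_001M","B25064_001MA","GEO_ID","NAME","state"],
["792",null,"5",null,"0400000US01","Alabama","01"],
["1244",null,"13",null,"0400000US02","Alaska","02"]]'''

rows = json.loads(raw)          # JSON null becomes Python None
header, records = rows[0], rows[1:]

# Keep only the columns that hold a value in at least one row.
keep = [i for i in range(len(header))
        if any(rec[i] is not None for rec in records)]
clean_header = [header[i] for i in keep]
clean_records = [[rec[i] for i in keep] for rec in records]

print(clean_header)
```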

For an example of a table with more than 50 variables, see S2507: Financial Characteristics for Housing Units Without a Mortgage.  For an easy-to-use list and description of available tables, see [the Census Reporter website](https://censusreporter.org/topics/).  Clicking links from there will take you to the corresponding table at the Census' data explorer.
    
    
## Accessing the API Through Python
Retrieving data from an API programmatically requires submitting a URL via a library.  In Python, a widely used library for this is `requests`.

    # pip install requests # Uncomment this line if you have not yet installed requests
    import requests

    ## URL parts
    year = '2019'
    name = 'acs'
    acronym = 'acs5'
    cols = 'DP04_0132E,NAME'
    state = '*'
    keyfile = 'census_key.txt'  # Change this name to reflect the name of your file. Make sure it is saved in the same directory as your script.

    ## Read api key in from file
    with open(keyfile) as key:
        api_key = key.read().strip()

    ## Retrieve data
    base_url = f'https://api.census.gov/data/{year}/{name}/{acronym}/'
    data_url = f'{base_url}profile?get={cols}&for=state:{state}&key={api_key}'
    response = requests.get(data_url)


`response` is a `Response` object rather than the data themselves.  It carries the status code of the request along with the returned content, so one more step is needed to get the data that are expected.

    response  # <Response [200]>, good

    data = response.json()  # The Census returns the data with JSON formatting.

    data

    len(data)

    [['DP04_0132E', 'NAME', 'state'],
     ['1427', 'Alabama', '01'],
     ['2417', 'Alaska', '02'],
     ['7942', 'Arizona', '04'],
     ['886', 'Arkansas', '05'],
     ['391290', 'California', '06'],
     ['17752', 'Colorado', '08'],
     ['711', 'Delaware', '10'],
     ['13347', 'District of Columbia', '11'],
     ['9972', 'Connecticut', '09'],
     ['48996', 'Florida', '12'],
     ['8221', 'Georgia', '13'],
     ['623', 'Idaho', '16'],
     ['15363', 'Hawaii', '15'],
     ['27987', 'Illinois', '17'],
     ['1997', 'Indiana', '18'],
     ['1262', 'Iowa', '19'],
     ['1177', 'Kansas', '20'],
     ['1423', 'Kentucky', '21'],
     ['2022', 'Louisiana', '22'],
     ['785', 'Maine', '23'],
     ['24265', 'Maryland', '24'],
     ['43353', 'Massachusetts', '25'],
     ['5628', 'Michigan', '26'],
     ['5894', 'Minnesota', '27'],
     ['691', 'Mississippi', '28'],
     ['2559', 'Missouri', '29'],
     ['404', 'Montana', '30'],
     ['752', 'Nebraska', '31'],
     ['3540', 'Nevada', '32'],
     ['1453', 'New Hampshire', '33'],
     ['40586', 'New Jersey', '34'],
     ['645', 'New Mexico', '35'],
     ['145134', 'New York', 

Notice two things.  First, `data` is actually a list of lists, not a dictionary like most JSON objects.  Second, the first list in the list is the header and the subsequent lists are the data for each geographic entity, in this case each state.
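
If you prefer to work with named fields, the header can be paired with each row using plain Python. The sketch below uses a shortened copy of the response, stored under the illustrative name `sample` so that it does not overwrite `data`.

```python
# A shortened copy of the response, using the first rows shown above.
sample = [
    ["DP04_0132E", "NAME", "state"],
    ["1427", "Alabama", "01"],
    ["2417", "Alaska", "02"],
]

header, rows = sample[0], sample[1:]

# Pair each value with its column name.
records = [dict(zip(header, row)) for row in rows]

# Values arrive as strings, so convert them before doing arithmetic.
total = sum(int(rec["DP04_0132E"]) for rec in records)
print(records[0]["NAME"], total)
```

The state codes are best left as strings, since converting them to integers would drop the leading zeros.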

You could save `data` as a text file, but it is easier to convert it to a dataframe and then save it.  Fortunately, `pandas` makes it easy to convert a list to a dataframe.  The one wrinkle is that when one of the lists in the list is a header, it needs to be separated.

In [31]:
import pandas as pd # Do not forget to load pandas

header = data.pop(0)  # Removes the first entry
df = pd.DataFrame(data, columns=header)

df.head()  # Inspect the result

df.to_csv(acronym + '_' + 'medianRent' + '.csv') # medianRent is how I describe the variable. 'medianRent.csv' also works but I like separating the two parts to emphasize the variable name.

# cols.split(',')[0]  # Replace medianRent with this line to get the actual variable name.  Careful if you want to use multiple cols values in the filename, the filename could get quite long and you will have to modify this code.
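One thing to watch in the cell above is that `data.pop(0)` changes `data` itself, so running the cell a second time would remove a real row and treat it as the header.  The sketch below shows one possible alternative that leaves the original list untouched, along with a way to build the filename from `cols`.  The function names `split_header` and `make_filename` are hypothetical helpers written for this example.

```python
def split_header(table):
    """Return (header, rows) without changing the original list."""
    header, *rows = table
    return header, rows


def make_filename(acronym, cols, extension='.csv'):
    """Build a filename from the dataset acronym and the first requested variable."""
    first_variable = cols.split(',')[0]
    return f'{acronym}_{first_variable}{extension}'


# A two-state stand-in for the list of lists returned by the Census.
table = [['DP04_0132E', 'NAME', 'state'],
         ['1427', 'Alabama', '01'],
         ['2417', 'Alaska', '02']]

header, rows = split_header(table)

print(len(table))  # The original list still includes its header row
print(header)
print(make_filename('acs5', 'DP04_0132E,NAME'))
```

The resulting `header` and `rows` can be passed to `pd.DataFrame(rows, columns=header)` in the same way as before.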

Now that you understand the basics of requesting data using `requests`, let us learn how to develop more complicated requests.  For the following three sets of code, the structure is the same as what was just shown above, but notice when geographic object names are changed, e.g. `state` is now `county` in the first example.

In [None]:
### DP04_0132E by county
## URL parts
year='2019'
name= 'acs'
acronym='acs5' 
cols= 'DP04_0132E,NAME' 
county='*' 
keyfile= 'census_key.txt'  # Change this name to reflect the name of your file. Make sure it is saved in the same directory as your script.

## Read api key in from file
with open (keyfile) as key:
    api_key=key.read().strip()

## Retrieve data, print output to screen
base_url = f'https://api.census.gov/data/{year}/{name}/{acronym}/' 
data_url = f'{base_url}profile?get={cols}&for=county:{county}&key={api_key}'
response = requests.get(data_url)

response # <Response [200]>, good

data_county = response.json()  # The Census returns the data with JSON formatting.

len(data_county)  # Much longer than data

## Make a dataframe
header_county = data_county.pop(0)  # Removes the first entry
df_county = pd.DataFrame(data_county, columns=header_county)

df_county.head()  # Inspect the result

df_county.shape # Same number of rows as data_county

df_county.head() # Looks good



Notice that the code above has the same structure as the code used to access `DP04_0132E` by state.  When you see the same code structure multiple times in a script, that is a sign that you should write a function.  Below is a function that takes the Census' response, processes it, and returns a dataframe.  Once it is defined, it is tested on the county response, and subsequent responses from the Census are processed with the function instead of repeatedly pasting the same lines.

In [None]:
### Make the function
def response_to_df(from_census):
    data = from_census.json()  # Use the argument, not the global `response`
    header = data.pop(0)  # The first entry holds the column names
    df = pd.DataFrame(data, columns=header)
    print(f'The returned data contain {df.shape[0]} rows and {df.shape[1]} columns.')
    return df

df_county2 = response_to_df(from_census=response)

df_county2.shape == df_county.shape  # True, meaning the function returns the same number of rows and columns 
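The URL-building lines are repeated in every cell as well, so the same reasoning suggests a second function.  The sketch below is one way to do it, assuming the URL pattern used throughout this section.  The function name `build_census_url`, its parameters, and the placeholder key are all made up for this illustration.

```python
from urllib.parse import urlencode


def build_census_url(year, name, acronym, cols, geography, value, api_key):
    """Assemble a data profile URL in the same format used in this section."""
    base_url = f'https://api.census.gov/data/{year}/{name}/{acronym}/'
    # safe=',:*' keeps commas, colons, and asterisks readable instead of
    # replacing them with percent codes.
    query = urlencode(
        {'get': cols, 'for': f'{geography}:{value}', 'key': api_key},
        safe=',:*',
    )
    return f'{base_url}profile?{query}'


url = build_census_url(
    year='2019',
    name='acs',
    acronym='acs5',
    cols='DP04_0132E,NAME',
    geography='county',
    value='*',
    api_key='PLACEHOLDER_KEY',  # Not a real key
)
print(url)
```

With a helper like this, switching from counties to places means changing a single argument rather than editing an f-string.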

In [None]:
### DP04_0132E by place
## URL parts
year='2019'
name= 'acs'
acronym='acs5' 
cols= 'DP04_0132E,NAME' 
place='*' 
keyfile= 'census_key.txt'  # Change this name to reflect the name of your file. Make sure it is saved in the same directory as your script.

## Read api key in from file
with open (keyfile) as key:
    api_key=key.read().strip()

## Retrieve data
base_url = f'https://api.census.gov/data/{year}/{name}/{acronym}/' 
data_url = f'{base_url}profile?get={cols}&for=place:{place}&key={api_key}'
response = requests.get(data_url)

df_place = response_to_df(from_census=response)


### Multiple variables by place
## URL parts
year='2019'
name= 'acs'
acronym='acs5' 
cols= 'DP04_0001E,DP04_0108E,DP04_0132E,DP04_0134E,NAME' # Notice the lack of spaces after the commas
place='*' 
keyfile= 'census_key.txt'  # Change this name to reflect the name of your file. Make sure it is saved in the same directory as your script.

## Read api key in from file
with open (keyfile) as key:
    api_key=key.read().strip()

## Retrieve data
base_url = f'https://api.census.gov/data/{year}/{name}/{acronym}/' 
data_url = f'{base_url}profile?get={cols}&for=place:{place}&key={api_key}'
response = requests.get(data_url)

df_multiple = response_to_df(from_census=response)

df_multiple.head()  # Look at the dataframe to confirm multiple variables.

You will often want to conduct longitudinal analysis, that is, analysis of a variable over time.  The Census API does not provide a way to download multiple years of data with one call, so acquiring longitudinal data requires extra work.  The code below provides an example.

In [None]:
### DP04_0132E by state, 2010-2023
## URL parts
name = 'acs'
acronym = 'acs5'
cols = 'DP04_0132E,NAME'
state = '*'
keyfile = 'census_key.txt'  # Ensure this is the correct file name and path.

## Read the API key from the file
with open(keyfile) as key:
    api_key = key.read().strip()

## Define the range of years
#years = [2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022]
years = range(2010, 2024)  # From 2010 to 2023

## Initialize a list to collect responses
data_frames = []  # Will make a list of dataframes

## Loop through each year and fetch the data
for year in years:
    print('Starting on ' + str(year) + '.')
    base_url = f'https://api.census.gov/data/{year}/{name}/{acronym}/' 
    data_url = f'{base_url}profile?get={cols}&for=state:{state}&key={api_key}'
    response = requests.get(data_url)

    if response.status_code == 200:
        # Convert the response to a dataframe; the function expects the response itself, not parsed JSON
        temp = response_to_df(from_census=response)
        temp['Year'] = year
        data_frames.append(temp)
        print(f"Data for {year} retrieved successfully.")
    else:
        print(f"Failed to retrieve data for {year}: {response.status_code}")

df = pd.concat(data_frames, ignore_index=True)
        
df.shape  # Looks good
df.head() # Looks good
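The loop above prints a message when a year fails, but those messages are easy to miss in a long run.  The sketch below shows one possible way to keep a record of failed years alongside the collected rows.  It makes no real requests: `fake_fetch` is a made-up stand-in that pretends one year is unavailable, its values are placeholders rather than real Census figures, and `collect_years` is a hypothetical helper written for this example.

```python
def collect_years(years, fetch):
    """Call fetch(year) for each year and return (records, failed_years).

    fetch must return a list of lists whose first entry is the header,
    or None when the data could not be retrieved.
    """
    records = []
    failed_years = []
    for year in years:
        table = fetch(year)
        if table is None:
            failed_years.append(year)
            continue
        header, *rows = table
        for row in rows:
            record = dict(zip(header, row))  # Pair each column name with its value
            record['Year'] = year            # Tag the row with its year
            records.append(record)
    return records, failed_years


def fake_fetch(year):
    """Stand-in for a real request; pretends that 2023 is unavailable."""
    if year == 2023:
        return None
    return [['DP04_0132E', 'NAME', 'state'],
            ['0', 'Alabama', '01']]  # Placeholder value, not real data


records, failed_years = collect_years(range(2021, 2024), fake_fetch)
print(len(records), 'rows collected')
print('Years with no data:', failed_years)
```

A list of dictionaries like `records` can be turned into a dataframe with `pd.DataFrame(records)`.  Keeping `failed_years` also makes it possible to check for an empty result before combining anything, since `pd.concat` raises an error when given an empty list.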



## Using a Package
While `requests` is a package, it is not designed for the Census or any other API, so formatting URLs can become a source of frustration.  Fortunately, many APIs have packages developed for them, some by the API provider and some by unaffiliated individuals.  These packages handle URL formatting and other common tasks, making it easier to work with an API.  Their drawback is that one has to learn how a package is designed to work with an API in addition to how the API works.  There are two packages for working with United States Census data in Python, [`census`](https://pypi.org/project/census/) and [`cenpy`](https://github.com/cenpy-devs/cenpy). 


# Assignment Questions

## The `df_multiple` dataframe has several variables in addition to DP04_0132E.  In words, what are they?  The answer needs to be descriptive, i.e. not the variable name but what the variable measures.

## When making requests for multiple years, no data were retrieved for 2023.  Why?

## Change the column names in `df_multiple` to be easier to understand.

## Download table S2507 and convert it to a dataframe.  Do not use the `requests` library.


## Get 50 variables

Unfortunately, the Census API only provides tables starting with the letters B and C.  The datasets used for this textbook are DP02 and DP04, known as data profiles. Write code that requests the first 50 Estimate variables from DP02 for 2015.  Do not use the `requests` library. Make a dataframe and save it as `DP02E_first50.csv`.  _Hint: Use one of the files from this book's Data/ACS/ folder for column names._ 


## One variable, requests library

## Four variables, requests library

## Six variables, five years, requests library

## Modify function to also save a file out. Make sure the filename changes based on arguments passed to the function.