<a href="https://colab.research.google.com/github/mnfurey25/data_science_tutorials/blob/main/Using_Google_Collab_and_Making_Reports_with_Public_Information.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Welcome!

This is Google Colab, a workspace for coding and data analysis. The goal of this tutorial is to discuss some of Google Colab's features and then discuss ways to access public information through an API.

## About Google Colab

Google Colab uses Jupyter notebooks. Jupyter notebooks allow users to write descriptions in text with [markdown](https://www.markdownguide.org/cheat-sheet/) and use [Python](https://docs.python.org/3/tutorial/introduction.html) to work with data, generate files, and publish reports with analysis. Because Google Colab is a cloud-based service, one does not need to download software onto their computer. However, Google may have a license over this content, so one should be cautious about storing any sensitive information in Google Colab.

# Practice Using Markdown

Webpages are built with markup languages, which add tags to text so that it can be styled and formatted. You may have heard of [HTML](https://www.w3schools.com/html/), which is the standard markup language for internet content.

## Exercises

You can style and edit content in Google Colab using [markdown](https://www.markdownguide.org/cheat-sheet/). Try referring to this sheet to do the following.

(1) Make this line an H2 heading.

(2) Bold this line

(3) Italicize this line

(4) Underline this line

(5) Put the following items in an ordered list:

Bread

Eggs

Milk

(6) Make a [link] to the following webpage: https://www.census.gov/

(7) Add this image to the bottom of the text: https://www.villanova.edu/content/dam/villanova/logos/university-main-logo-1.png with an alternative text.

# Advanced Exercises

Users are capable of doing a lot of things with markdown. Try doing the following with the [documentation](https://www.markdownguide.org/cheat-sheet/).

(1) Organize the following into a table.

Day | Task

Tuesday | Pick up groceries

Wednesday | Drop off mail

Thursday | Meet with friends

(2) Footnote this line with the following: "This is a footnote."




# Introduction to Python

Python is a programming language that allows users to do many things. Python can be used to take in input, define variables or functions, and create output.

Before one uses a function in Python, one may need to import a library. A library contains pre-built functions for doing things in Python. There is probably a library for most things you would like to do in Python. [PyPI.org](https://pypi.org/) is a site for finding custom Python libraries.

**NOTE** Don't worry too much about knowing all of the following syntax! This is just an example of what Python can do.

In [None]:
# This is a comment block! Comment blocks are used to document one's code.
# If you run this block, nothing happens.

### Searching Text for Patterns


In [None]:
# The following is a library for using advanced methods to search through text
import re

phone_number = re.search(r"\d{3}-\d{3}-\d{4}", "Hi my name is Michael and my phone number is 555-555-5555.")
print(phone_number.group())
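
The pattern above reads as three digits, a hyphen, three digits, a hyphen, and four digits. As a small sketch (the sample sentence and variable names below are made up for illustration), `re.findall` returns every match rather than only the first, and `re.search` returns `None` when nothing matches, so it can help to check before calling `.group()`.

```python
import re

# \d{3} means "exactly three digits"; the r"" prefix keeps the backslashes literal.
PHONE_PATTERN = r"\d{3}-\d{3}-\d{4}"

# Made-up sample text for illustration.
text = "Call 555-555-5555 during the day or 555-123-4567 in the evening."

# findall returns a list with every non-overlapping match.
all_numbers = re.findall(PHONE_PATTERN, text)
print(all_numbers)

# search returns None when there is no match, so check before using .group().
match = re.search(PHONE_PATTERN, "No phone number here.")
first_number = match.group() if match else None
print(first_number)
```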


### Make a Bar Chart (e.g. Experience in Law School)


In [None]:
import matplotlib.pyplot as plt

plt.bar(["1L", "2L", "3L"], [1, 2, 3], width=0.8, align='center')
#Adds a title above the chart
plt.title("Experience in Law School")

## Functions

Python can be used to define functions that take in input and create output. A function is defined like this:

```
def add(num1, num2):
  return num1 + num2
```
Calling this function would return:

```
add(1,2)
3
```

# Creating Output



Python can create output in multiple forms such as documents and tables of data. In Google Colab, this output is stored in the Files tab. For further information, see **Al Sweigart, Automate the Boring Stuff with Python** 430 (3d ed. 2025).


## Creating a word document



In [None]:
#Creates a word document with someone's name and favorite color.
!pip install python-docx
import docx

def CreateWordDoc(doc_title):
  first = input("First Name: ")
  color = input("Favorite Color: ")

  #Creates a new word document
  doc = docx.Document()
  #Adds paragraphs based on input
  doc.add_paragraph(first)
  doc.add_paragraph(color)

  #Creates a document saved with the name in the function
  print(doc_title + ".docx")
  doc.save(doc_title + ".docx")



In [None]:
CreateWordDoc("mydoc")

## What is an API?

An API, or Application Programming Interface, is the means by which computers talk to each other and request services from a server. Web APIs use HTTP (Hypertext Transfer Protocol) to make requests to a website's server. We use HTTP to make GET requests all the time when we use web browsers.
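
To make the idea of a GET request a little more concrete, the sketch below uses Python's built-in `urllib.parse` module to take apart the kind of URL a browser requests. The URL is a made-up example on the reserved `example.com` domain, and nothing is sent over the network.

```python
from urllib.parse import urlsplit, parse_qs

# A made-up example URL, used only to show the parts of a GET request.
example_url = "https://www.example.com/search?q=census&page=2"

parts = urlsplit(example_url)

print(parts.scheme)   # the protocol used to talk to the server
print(parts.netloc)   # the server that receives the request
print(parts.path)     # the resource being requested

# The options sent along with the request, as a dictionary of lists.
query = parse_qs(parts.query)
print(query)
```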

### External API

An example of an external API is a website that contains a database of information and makes its API endpoints available to users. A number of public websites have external APIs, such as: [Census.gov](https://www.census.gov/data/developers/data-sets.html). They are particularly useful for pulling updated information without navigating the website and clicking through relevant results.

In [None]:
! pip install census
! pip install us


#Imports Python's Beautiful Soup library, which parses HTML and XML files.
from bs4 import BeautifulSoup
#Imports a function for listing the files in a local directory
from os import listdir
#Imports the library for making requests (GET requests) to the Congress.gov API
import requests
#Imports libraries for creating datasets and charting
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
#Imports the library for creating regular expressions (regex) to search through data for relevant keywords.
import re
#Imports libraries for parsing JSON and YAML (YAML Ain't Markup Language)
import yaml
import json
import math
import census
import us

# Requesting an API Key

We are going to request an API key for the following site:

1. [Congress.gov](https://api.congress.gov/sign-up/)

# Loading an API Key

Google Colab has a way to use your API key without listing it directly in the file. Otherwise, everyone who can view the notebook can see it! Go to the key tab on the left. Under Secrets, add a "name" for the API key ("cong_gov_api") and enter the API key as the "value." Toggle the checkmark for notebook access.
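
As a hedged sketch of how the secret might be read defensively, the helper below (the name `load_api_key` and the environment-variable fallback are assumptions for illustration, not part of Colab) tries Colab's `userdata.get` first and returns `None` instead of stopping with an error when the key cannot be found.

```python
import os

def load_api_key(name):
    """Return the secret called `name`, or None if it cannot be found."""
    try:
        # This module is only available inside Google Colab.
        from google.colab import userdata
    except ImportError:
        # Outside Colab, fall back to an environment variable of the same name.
        return os.environ.get(name)
    try:
        return userdata.get(name)
    except Exception:
        # Colab raises an error if the secret is missing or notebook access is off.
        return None

api_key = load_api_key("cong_gov_api")
if api_key is None:
    print("No API key found. Add it under Secrets and allow notebook access.")
```

Checking for `None` up front tends to give a clearer message than a failed request later on.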

## API Format

Most GET requests for an API can be made directly with a URL. The URL is usually structured like a terms and connectors search where one needs to define variables:

For example:

`https://api.congress.gov/v3/bill/{congress}?limit={num_results}&sort={sort}&api_key={api_key}`

`congress` = session of congress

`limit` = number of bills we would like to return

`sort` = how we would like the data ordered

`api_key` = the API key for making the request.
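
As an alternative sketch, Python's built-in `urllib.parse.urlencode` can assemble the part of the URL after the `?` from a dictionary. The helper name `build_bill_url` and the `DEMO_KEY` value are placeholders for illustration. Note that `urlencode` writes a space as `+`, so the sort order is passed here as `"updateDate desc"`, which produces the `updateDate+desc` form used in this tutorial.

```python
from urllib.parse import urlencode

def build_bill_url(congress, limit, sort, api_key):
    """Build a Congress.gov bill-list URL from its variables."""
    base = f"https://api.congress.gov/v3/bill/{congress}"
    # urlencode joins the pairs with & and escapes special characters.
    query = urlencode({"limit": limit, "sort": sort, "api_key": api_key})
    return f"{base}?{query}"

# DEMO_KEY is a placeholder, not a working API key.
url = build_bill_url(118, 10, "updateDate desc", "DEMO_KEY")
print(url)
```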

The following is an example of the URL for making a request from [Congress.gov's API](https://api.congress.gov/) for bills.

In [None]:
from google.colab import userdata

#The session of congress
congress = "118"
#Limit is the variable for the number of bills we would like to return
limit = "10"
#Sort is the way we would like to order the information returned to us.
sort = "updateDate+desc"

#API Key: The API key that we get from Congress.gov to make the request work.
#We use the Google Colab userdata.get function to return the API key
api_key =  userdata.get('cong_gov_api')


#You may notice that we format the url with the variables we specify above.
request_url = f"https://api.congress.gov/v3/bill/{congress}?limit={limit}&sort={sort}&api_key={api_key}"

## JSON

APIs work with a data format called JSON (JavaScript Object Notation), which is a nested structure for organizing data. JSON is the means by which most web applications transmit, store, and return public information.

The following is an example of what is returned when we make the API request above.

In [None]:
import re
import json
import requests

json_data = requests.get(request_url).json()
json_data

You may see that JSON represents blocks of data with curly braces (`{}`), with elements separated by commas. Each element inside a block is a `key: value` pair. In the following block of data, the bill number is 9833 and the last action for the bill was that it was 'Referred to the House Committee on Armed Services.'

For example:

```
{
  'bills': [
    {'congress': 118,
     'latestAction': {'actionDate': '2024-09-25',
                      'text': 'Referred to the House Committee on Armed Services.'},
     'number': '9833',
     'originChamber': 'House',
     'originChamberCode': 'H',
     'title': 'Restore Military Families’ Voices Act',
     'type': 'HR',
     'updateDate': '2026-02-02',
     'updateDateIncludingText': '2026-02-02',
     'url': 'https://api.congress.gov/v3/bill/118/hr/9833?format=json'},

    ...
  ]
}
```
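
To see how nested values are reached, here is a minimal sketch using a trimmed, hand-copied sample rather than a live response. Each pair of square brackets steps one level deeper into the structure.

```python
import json

# A trimmed sample in JSON text form (JSON itself uses double quotes).
raw = """
{"bills": [
  {"congress": 118,
   "number": "9833",
   "latestAction": {"actionDate": "2024-09-25",
                    "text": "Referred to the House Committee on Armed Services."}}
]}
"""

data = json.loads(raw)          # turns the text into Python dictionaries and lists
first_bill = data["bills"][0]   # the first block in the list of bills

print(first_bill["number"])
print(first_bill["latestAction"]["text"])
```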

If we would like to put this data into a readable format, like a table, we can read each block of data to access and store its elements. We use a loop below to do this. (Don't worry too much about how the syntax works here; we can cover this another time!)

In [None]:
import pandas as pd

#Each bill becomes one row (a dictionary) in the table
rows = []

#https://www.geeksforgeeks.org/python/generating-word-cloud-python/

for bill in json_data['bills']:
  rows.append({
      'congress': bill['congress'],
      'bill_number': bill['number'],
      'origin_chamber': bill['originChamber'],
      'action_date': bill['latestAction']['actionDate'],
      'last_action': bill['latestAction']['text'],
      'title': bill['title'],
      'url': bill['url']})

df = pd.DataFrame(rows)

df
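
As an aside, pandas also provides `pd.json_normalize`, which can flatten nested blocks into columns without a hand-written loop. The sketch below uses a small made-up sample (the titles and actions are placeholders); in the notebook one would presumably pass `json_data['bills']` instead. Nested names are joined with a dot, such as `latestAction.actionDate`.

```python
import pandas as pd

# Made-up sample shaped like the bills returned by the API.
sample_bills = [
    {"congress": 118, "number": "9833", "title": "Example Bill A",
     "latestAction": {"actionDate": "2024-09-25", "text": "Referred to committee."}},
    {"congress": 118, "number": "1234", "title": "Example Bill B",
     "latestAction": {"actionDate": "2024-10-01", "text": "Introduced in House."}},
]

# Each bill becomes a row; nested keys become dotted column names.
flat = pd.json_normalize(sample_bills)

print(flat.columns.tolist())
print(flat[["number", "latestAction.actionDate"]])
```

The loop approach keeps full control over column names, while `json_normalize` is quicker when the default names are acceptable.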


## Exporting Data to a File

One can even export this data to a file. Above we defined a variable for this table called `df`. We can write this variable to the following file: `bill_data.csv`.



In [None]:
df.to_csv("bill_data.csv", index=False)

# Using Census Information

Now, we can discuss using Census information, including the American Community Survey. The American Community Survey is a comprehensive survey conducted by the U.S. Census Bureau. It is separate from the decennial census and includes estimates of demographic information. For the most granular data, one can use the 5-Year American Community Survey.

The five-year survey contains multiple tables with demographic information and statistics. You can search for these tables on [data.census.gov](https://data.census.gov/).

Once you find data that is useful to you, you can look up the codes for those variables. For example, this is the list of variables for the [2024 5-Year ACS Survey](https://api.census.gov/data/2024/acs/acs5/variables.html).

We can use the Census Bureau's API and a Python library for mapping census information.

# Shape Files and Geopandas

We are going to use a library called `geopandas` for reading in a shape file. [Shape files](https://www.census.gov/geographies/mapping-files/time-series/geo/tiger-line-file.html) are provided by the Census Bureau to support mapping with different geographic areas (e.g. states, counties, census tracts, blocks). We are going to practice mapping census tracts.

The following code installs the `geodatasets` library and imports `geopandas`. The 2025 shape file for census tracts is read in further below, which may take a few minutes.

In [None]:
!pip install geodatasets

import geopandas as gpd
import geodatasets

# Making Maps with Census Information

There are some great guides for making maps for census information including the following articles that serve as background.

- [Documentation on Geopandas](https://geopandas.org/en/stable/gallery/plotting_basemap_background.html), which is the library for creating interactive maps.
- [Article on Making Maps With the Census API](https://www.natekratzer.com/posts/census_map/). This last article was a particularly helpful tutorial on using the Census API to create a map by census tract. We are going to build on this tutorial to discuss how to make a map for any variable in the ACS 5-Year Survey using the Census.gov API.

The [Census API](https://www2.census.gov/data/api-documentation/api-user-guide.pdf) has a number of options for making a customized request. I am going to highlight a few here that are helpful.

The API endpoint for retrieving data from the 2024 5-Year ACS Survey is:

```
https://api.census.gov/data/2024/acs/acs5?
```

**Variables**

One can list [many variables](https://api.census.gov/data/2024/acs/acs5/variables.html) in the GET request to the endpoint, including the geographic area's `NAME` and a code for the specific geographic area called a [GEO_ID](https://www.census.gov/programs-surveys/geography/guidance/geo-identifiers.html).

```
https://api.census.gov/data/2024/acs/acs5?get=NAME,GEO_ID,{variable}

```

**Geography**

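We can filter any table by a specific geographic area using the `ucgid` parameter. For example, the following GEO ID calls for data in all census tracts in Philadelphia: `0500000US42101$1400000`. The URL below uses `0400000US42$1400000` instead, which calls for all census tracts in Pennsylvania.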

```
https://api.census.gov/data/2024/acs/acs5?get=NAME,GEO_ID,{variable}&ucgid=pseudo(0400000US42$1400000)

```
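
The two GEO IDs above follow a pattern that can be sketched in a few lines. This reading is based on the examples in this tutorial and the Census Bureau's geographic identifier guidance: the leading `040` marks a state, `050` marks a county, and the trailing `$1400000` asks for every census tract (level `140`) inside that area. The helper name `tracts_within` is hypothetical.

```python
def tracts_within(state_fips, county_fips=None):
    """Build the value for `ucgid=` that requests all census tracts in a state or county."""
    if county_fips is None:
        parent = f"0400000US{state_fips}"                # 040 = state level
    else:
        parent = f"0500000US{state_fips}{county_fips}"   # 050 = county level
    return f"pseudo({parent}$1400000)"                   # 140 = census tract level

# 42 is the code for Pennsylvania; 101 is Philadelphia County.
print(tracts_within("42"))
print(tracts_within("42", "101"))
```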

**Output Format**

We can also specify the output format. Instead of JSON, we can ask for a csv file of data: `&outputFormat=csv`

```
https://api.census.gov/data/2024/acs/acs5?get=NAME,GEO_ID,{variable}&ucgid=pseudo(0400000US42$1400000)&outputFormat=csv
```

## Function for Creating Maps With Statistics

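The following code defines a function for creating maps that plot Census [datapoints](https://api.census.gov/data/2024/acs/acs5/variables.html) onto Philadelphia census tracts, using the shape file for PA census tracts. The function takes a `variable`, the `total` it is a share of, a `title` for the resulting percentage, and the shape file. Just to note, some datapoints may not be available by census tract.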

In [None]:
import numpy as np

shp = gpd.read_file('https://www2.census.gov/geo/tiger/TIGER2025/TRACT/tl_2025_42_tract.zip')

def map_philly_census_tracts_2024(variable, total, title, shape_file):

  url = f"https://api.census.gov/data/2024/acs/acs5?get=NAME,GEO_ID,{variable},{total}&ucgid=pseudo(0500000US42101$1400000)&outputFormat=csv"
  data = requests.get(url)
  if (data.status_code == 200):
    with open("data.csv", "w+") as f:
      f.write(data.text)

    census_data = pd.read_csv("data.csv").rename(
        columns={"NAME":"Tract Name",
                "GEO_ID":"GEOIDFQ"}
        )

    #Using a percentage calculation as: https://www.natekratzer.com/posts/census_map/
    census_data[f'{title}'] = np.round(census_data[f'{variable}'] / census_data[f'{total}'] * 100, 2)

    all_data = shape_file.merge(census_data, on="GEOIDFQ")

    return all_data.explore(
        column = f'{title}',
        tooltip = ['Tract Name', f'{title}'],
        #allows you to view streets and locations.
        tiles = 'OpenStreetMap')
  else:
    print("Sorry, this request was not available")
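
The merge above works because both tables carry the same fully qualified identifier (`GEO_ID` from the API, renamed to match `GEOIDFQ` in the shape file). As a rough sketch of what that identifier contains, the hypothetical helper below splits a sample tract ID into its pieces: the text before `US` describes the geographic level, and the digits after it are the state, county, and tract codes.

```python
def split_tract_geo_id(geo_id):
    """Split a fully qualified tract GEO_ID into its component codes."""
    prefix, geoid = geo_id.split("US")
    return {
        "summary_level": prefix[:3],  # 140 = census tract
        "geoid": geoid,               # the short 11-digit tract identifier
        "state": geoid[:2],
        "county": geoid[2:5],
        "tract": geoid[5:],
    }

# A sample ID in the format used for Philadelphia census tracts.
parts = split_tract_geo_id("1400000US42101000100")
print(parts)
```

If a merge ever returns an empty table, comparing a few IDs from each side this way is a quick check that both use the same format.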



In [None]:
map_philly_census_tracts_2024(variable = "B19058_002E", total ="B19058_001E", title = "Percentage Receiving Food Assistance Benefits", shape_file = shp)
