# Learning goals
After today's lesson you should be able to:
- Get Census data from the U.S. Census API
- Use the Socrata API


Some of today's lessons borrow from: 
- [PyGIS - Open Source Spatial Programming and Remote Sensing book](https://pygis.io/docs/d_access_census.html)
- [The Socrata SODA API documentation](https://dev.socrata.com/consumers/getting-started.html)

In [None]:
## You might need to run these or manually add the libraries to your environment in Anaconda
# !pip install census
# !pip install us

In [None]:
# We are going to start importing the libraries we need
# all in one cell. 
# It is a good practice to keep all the imports in one cell so that
# we can easily see what libraries we are using in the notebook.
import pandas as pd
import numpy as np
import geopandas as gpd

import matplotlib.pyplot as plt
import seaborn as sns

## The set_context() function is really useful!
## It allows us to set the size of the fonts in our plots based on whether 
## we are making a poster, a talk, a notebook, etc.

## If you are only presenting these figures in your jupyter notebook, 
## there is no need to set the context to be "talk" or "poster"
## But, I sometimes set my context to be "talk" or "poster" even for articles
## because I like the fonts to be bigger.
sns.set_context(context='paper')

# we use the inline backend to generate the plots within the browser
%matplotlib inline

from census import Census
from us import states



# 0. Census Data: Census survey and statistical boundaries

## 0.1 Census Surveys
The United States has been collecting information on its residents through the Census since 1790, through surveys sent by mail (since 2020, you can submit your survey by phone, mail, or online). Census data is used for a variety of governmental purposes, including provision of housing, infrastructure, and public amenities; making districting decisions for schools, precincts, and elections; and, more generally, understanding the population, socio-economic, and demographic characteristics of residents in the country. [Did you know that the punch card machine (a prototype for the computer) was created for the 1890 Census?](https://en.wikipedia.org/wiki/Tabulating_machine)

The US Census has historically been taken every 10 years. Every household in the U.S. is sent a Census survey (and you are legally required to respond.) In 2005, the Census Bureau created the American Community Survey (ACS), which is collected every month on a sample of households.

Since 2010, the decennial Census has contained only about 10 questions (historically called the "short form census"), covering topics such as age, sex, race, Hispanic origin, and owner/renter status. The ACS contains a larger set of questions on topics such as employment, education, and transportation.

Because the ACS is more frequent, it is often used for more current census needs; however, because it is also a sample, we generally need a longer time span to get a robust sample. This is why we will often use the **5-yr ACS** (for example, the 2012-2016 ACS) to represent its middle year (here, 2014).

Census data is often the baseline survey dataset in the area of urban planning because it provides racial, socio-economic, housing, etc. information that is often the highlight or backdrop of a study.

## 0.2 Census Geographies
There are different, often nested Census geographic regions used for different administrative scales. The most commonly used regions are statistical areas, typically nested within each other, whose boundaries are defined by certain physical, administrative, and population constraints. For instance, a **Census block** is bounded by physical features such as streets and administrative boundaries such as city limits and school districts. **Block groups**, the smallest unit of analysis that is still mostly statistically robust, are collections of Census blocks (hence the name) that generally have between 600 and 3,000 people. **Census tracts** are collections of block groups that generally have between 1,200 and 8,000 people. [Here's more information](https://pitt.libguides.com/uscensus/understandinggeography) about Census geographies if you're curious.

See the image below for how these regions nest within one another.

<figure>
<img src="https://www.dropbox.com/s/8w69pibhwffgoc0/qgis_censusgeography.png?dl=1" alt="drawing" width="500" style="display: block; margin: 0 auto"/>
</figure>


## 0.3 [Social Explorer](https://www-socialexplorer-com.proxy.library.cornell.edu/ezproxy)
This is a great tool for looking at Census and ACS data visually. They also have datasets beyond just Census Bureau data. You can also output images and shareable links to the map. I encourage you to sign up (through Cornell it's free) and explore this tool on your own time.


## 0.4 What is an API?
APIs are tools that allow different software applications to communicate with one another. In particular, the Census API allows us to access data from the US Census Bureau.
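
As a rough illustration, a web API request is often just a URL with parameters attached. The sketch below assembles (but does not send) a request to the Census API's 2019 ACS 5-year endpoint by hand; the `census` package used later in this notebook builds requests like this for us. The key is a placeholder and the variable names are only for illustration.

```python
from urllib.parse import urlencode

# Base endpoint for the 2019 ACS 5-year detailed tables
base_url = "https://api.census.gov/data/2019/acs/acs5"

# Query parameters: which columns to get, and for which geography
params = {
    "get": "NAME,B01003_001E",          # place name and total population
    "for": "county:*",                  # every county...
    "in": "state:36",                   # ...in New York (FIPS code 36)
    "key": "YOUR_CENSUS_API_KEY_HERE",  # placeholder, not a real key
}

# Keep commas, colons and asterisks readable instead of percent-encoding them
request_url = base_url + "?" + urlencode(params, safe=",:*")
print(request_url)
```

Pasting a URL like this into a browser (with a real key) should return the raw data as text, which is what the Python wrapper turns into a list of records.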

# 1. U.S. Census 
The Census makes data publicly available directly from their website `census.gov`. They have a bunch of APIs on their website that allow you to access various datasets: 
<figure>
<img src="https://www.dropbox.com/scl/fi/glfgqu8evzpqql70eq08w/Screen-Shot-2024-02-11-at-11.27.10-AM.png?rlkey=e6gio2bjp7w1eadjf2v3tecp2&dl=1" alt="drawing" width="1000" style="display: block; margin: 0 auto"/>
</figure>


## 1.1 Census Python Package
The `census` Python library is a wrapper for the US Census API. We are also going to use a helper library called `us` that helps us navigate FIPS codes and other US state metadata like capitals, time zones, postal codes, etc. 


In [None]:
states.VA.fips

In [None]:
states.NY.fips

In [None]:
states.NY.capital

You will need to create and keep track of your Census API key, which can be obtained [here](http://api.census.gov/data/key_signup.html)

In [None]:
# Set API key
c = Census("YOUR CENSUS API KEY HERE")




## 1.2 Getting the ACS 5-year
We can get the ACS 5-year tables at various geographies. Here are the functions and their inputs: 

* state(fields, state_fips)
* state_county(fields, state_fips, county_fips)
* state_county_blockgroup(fields, state_fips, county_fips, blockgroup)
* state_county_subdivision(fields, state_fips, county_fips, subdiv_fips)
* state_county_tract(fields, state_fips, county_fips, tract)
* state_place(fields, state_fips, place)
* state_congressional_district(fields, state_fips, congressional_district)
* state_legislative_district_upper(fields, state_fips, legislative_district)
* state_legislative_district_lower(fields, state_fips, legislative_district)
* us(fields)
* state_zipcode(fields, state_fips, zip5)

You can consult the [documentation](https://pypi.org/project/census/) to see which vintages the library has. It looks like they only have up to the 2021 5YR (2017-2021). 


Going on the [ACS 5Yr page on the census website](https://www.census.gov/data/developers/data-sets/acs-5year.html) (make sure to select the correct year!), we can see the different types of tables that exist. 

We are interested in columns from the "[Detailed Tables](https://api.census.gov/data/2021/acs/acs5/variables.html)" here. 


We can also use the [Table Shells and Table List](https://www.census.gov/programs-surveys/acs/technical-documentation/table-shells.2021.html#list-tab-79594641) to more quickly look for the columns we need. "The ACS table list contains columns with the table IDs, table titles, table universes, and 1-year/5-year availability for all Detailed Tables, Supplemental Estimate Tables, Comparison Profiles, Data Profiles, and Subject Tables in one spreadsheet."

You will have to download the `XXXX ACS Detailed Table Shells` for the ACS 1/5 YR if you want to use the table shells. 

<figure>
<img src="https://www.dropbox.com/scl/fi/gx1q7o27byz9lt83o6ekr/Screen-Shot-2024-02-12-at-11.17.28-AM.png?rlkey=s4y1r0gathbpnu0vqs4z4zrtf&dl=1" alt="drawing" width="1000" style="display: block; margin: 0 auto"/>
</figure>

In [None]:
# B16010_041E: is the total number of people with an educational attainment of a bachelor's degree or higher
# B01003_001E: total population
ny_census = c.acs5.state_county_tract(fields = ('NAME', 'B16010_041E','B01003_001E'),
                                      state_fips = states.NY.fips,
                                      county_fips = "*",
                                      tract = "*",
                                      year = 2019)

We do need to create a `GEOID` column that's the actual FIPS code. 

In [None]:
ny_df = pd.DataFrame(ny_census)

In [None]:
ny_df["GEOID"] = ny_df["state"] + ny_df["county"] + ny_df["tract"]
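
The concatenation above works because the API returns the state, county, and tract codes as strings that keep their leading zeros. If the same codes were ever read in as numbers (for example, from a CSV), the zeros would be lost and the GEOID would no longer match the shapefile. Here is a minimal sketch of zero-padding the pieces back to their fixed widths; `make_tract_geoid` is a hypothetical helper, not part of any library.

```python
def make_tract_geoid(state, county, tract):
    """Build an 11-character tract GEOID: 2-digit state + 3-digit county + 6-digit tract."""
    return str(state).zfill(2) + str(county).zfill(3) + str(tract).zfill(6)

# Codes that kept their leading zeros (as strings)
geoid_from_strings = make_tract_geoid("36", "001", "000100")

# The same codes after being read as integers (leading zeros lost)
geoid_from_ints = make_tract_geoid(36, 1, 100)

print(geoid_from_strings, geoid_from_ints)
```

Both calls give the same 11-character code, so the merge key stays consistent regardless of how the pieces were stored.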


In [None]:
ny_df.shape

We can also translate the number of people with a college degree or higher into a share of the total population (a proportion between 0 and 1; multiply by 100 for a percentage).

In [None]:
ny_df['college_ed_perc'] = ny_df['B16010_041E'] / ny_df['B01003_001E'] 
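
Some tracts (parks, airports, water) have a total population of zero, and dividing by zero does not give a meaningful share. Here is a small sketch, on made-up numbers, of one way to guard against that; the toy values and the extra `college_ed_share` column name are assumptions for illustration only.

```python
import pandas as pd

# Toy data standing in for the Census columns (values are invented)
toy = pd.DataFrame({
    "B16010_041E": [250.0, 0.0, 90.0],    # bachelor's degree or higher
    "B01003_001E": [1000.0, 0.0, 300.0],  # total population
})

# Turn zero populations into missing values so the division yields NaN
total_pop = toy["B01003_001E"].where(toy["B01003_001E"] > 0)

toy["college_ed_share"] = toy["B16010_041E"] / total_pop
toy["college_ed_perc"] = (toy["college_ed_share"] * 100).round(1)
print(toy)
```

Rows left as missing are typically drawn as gaps (or in a separate "no data" style) on a map rather than distorting the color scale.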

In [None]:
ny_df.head()

## 1.3 Get the shapefiles 
The Census also maintains a [set of shapefiles](https://www.census.gov/geographies/mapping-files/time-series/geo/tiger-line-file.html) that has the geometries by state, counties, tracts, block groups, and more. 

When you go to the Tiger/Line Shapefiles, make sure to select the year you are looking for: 
<figure>
<img src="https://www.dropbox.com/scl/fi/qbbj4x6jer4sjtldb228u/Screen-Shot-2024-02-12-at-10.29.16-AM.png?rlkey=i6t2k6zr8e83ofy4rjh58pou8&dl=1" alt="drawing" width="1000" style="display: block; margin: 0 auto"/>
</figure>

You can use the **FTP Archive** to find the particular boundary and state you need (you'll have to know the FIPS code for the state):

<figure>
<img src="https://www.dropbox.com/scl/fi/wj3ewuazhzx84c5bbtnjm/Screen-Shot-2024-02-12-at-10.40.24-AM.png?rlkey=zxresvebgfods8p69yjqolde4&dl=1" alt="drawing" width="1000" style="display: block; margin: 0 auto"/>
</figure>


Once you have all this information, you can read the shapefile directly from the URL link: 

In [None]:
ny_tract = gpd.read_file("https://www2.census.gov/geo/tiger/TIGER2019/TRACT/tl_2019_36_tract.zip")
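
The link above appears to follow a pattern: the year shows up twice and the two-digit state FIPS code once. Here is a sketch of building the same kind of URL for another state, assuming the pattern holds; `tiger_tract_url` is a hypothetical helper, and other years or layers may be organized differently, so it is worth confirming the link in the FTP archive.

```python
def tiger_tract_url(year, state_fips):
    """Return a TIGER/Line tract shapefile URL, following the pattern of the 2019 New York link."""
    pattern = "https://www2.census.gov/geo/tiger/TIGER{0}/TRACT/tl_{0}_{1}_tract.zip"
    return pattern.format(year, state_fips)

ny_url = tiger_tract_url(2019, "36")  # New York
or_url = tiger_tract_url(2019, "41")  # Oregon
print(ny_url)
print(or_url)
```

The resulting string can be passed to `gpd.read_file()` in the same way as the hard-coded link.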


In [None]:
ny_tract.shape

In [None]:
ny_tract.head()

Finally, we can merge the table we created with the shapefile, using the shared `GEOID` column as the key.

In [None]:
ny_census_geo = ny_tract.merge(ny_df, left_on = 'GEOID', right_on = 'GEOID')

In [None]:
ny_census_geo.shape
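
Comparing shapes before and after a merge is one sanity check. Another is to ask pandas which rows failed to find a match. The sketch below uses plain DataFrames with invented GEOIDs to show the idea with `how="left"` and `indicator=True`; the toy tables are assumptions, not the New York data.

```python
import pandas as pd

# Toy "shapefile" table with three tracts and a data table covering only two
shapes = pd.DataFrame({"GEOID": ["36001000100", "36001000200", "36001000300"]})
table = pd.DataFrame({
    "GEOID": ["36001000100", "36001000200"],
    "college_ed_perc": [0.25, 0.40],
})

# A left merge keeps every shape; the _merge column records where each row came from
checked = shapes.merge(table, on="GEOID", how="left", indicator=True)
unmatched = checked[checked["_merge"] == "left_only"]

print(len(shapes), len(checked), len(unmatched))
print(unmatched["GEOID"].tolist())
```

If the unmatched list is long, the usual suspects are mismatched years between the table and the shapefile or GEOIDs that lost their leading zeros.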

In [None]:
ny_census_geo.plot('college_ed_perc', legend = True, figsize = (10,10))

## Q.1
In a markdown cell, list at least one reason why the figure above is not clear. 

INSERT YOUR TEXT HERE.

## Q.2 
Use the [ACS table](https://www.census.gov/programs-surveys/acs/technical-documentation/table-shells.2019.html#list-tab-LO1F1MU1CQP3YOHD2T) lookup page to download the "2019 ACS Detailed Table Shells", then:
- Find the table ID (in the format of "B00000") for the **HISPANIC OR LATINO ORIGIN BY RACE** table 
- Plot the percentage Hispanic or Latino Origin by Race for Oregon using the method we described above. 


In [None]:
### INSERT YOUR CODE HERE
or_census_geo = 

In [None]:
fig, ax = plt.subplots(figsize=(15,15))

or_census_geo.plot(### INSERT YOUR CODE HERE)
ax.set_axis_off()

## Use tight_layout to remove the white space around the plot
plt.tight_layout()

## I forgot to show you all how to save down your plots!
fig.savefig('OR_perc_hispanic.png')   # save the figure to file

# 2. Socrata and Socrata APIs
Many government open data portals were built by the same company, Socrata (acquired a few years back by Tyler Technologies), which created the infrastructure and front-end interface to access open government data. 

We are going to look at Mandatory Inclusionary Housing zones in New York City [here](https://data.cityofnewyork.us/Housing-Development/Mandatory-Inclusionary-Housing-MIH-/bw8v-wzdr).


You may have noticed that, when we go to export data, there is a **SODA API** section: 
<figure>
<img src="https://www.dropbox.com/s/0ewtgsg8lc4sl3j/Screen%20Shot%202024-02-12%20at%2011.30.04%20AM.png?dl=1" alt="drawing" width="1000" style="display: block; margin: 0 auto"/>
</figure>

SODA is Socrata's API, which lets users, from researchers to (more often) people building tools and applications, access open-portal data. This is most useful when you have to programmatically connect your data export to something else. For instance, if you're running a website that needs to update data in real time, or if you don't want to download an updated dataset each time, you can connect your notebook or app to this API. Click to expand the **SODA API** section.


**Copy the API endpoint URL**. 

## 2.1 API endpoint to GeoDataFrame

We can pretty easily turn this JSON file into a geodataframe. FYI, JSON stands for "JavaScript Object Notation"; it is a file format that was designed for the JavaScript language but is easily translated to other formats that we know well. 

The good thing is that pandas has a `pd.read_json()` function that will allow us to read this JSON as a DF and eventually turn it into a geodataframe. 
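
To get a feel for what a JSON response looks like, here is a tiny hand-written example parsed with Python's built-in `json` module. The records are invented (and use simple points to stay short), but they mimic the layout of a SODA response: a list of records, each with ordinary fields plus a nested geometry object.

```python
import json

# A made-up JSON document: a list of two records with a nested geometry
json_text = '''
[
  {"boro": "1", "project_nam": "Example Project A",
   "the_geom": {"type": "Point", "coordinates": [-73.94, 40.79]}},
  {"boro": "2", "project_nam": "Example Project B",
   "the_geom": {"type": "Point", "coordinates": [-73.90, 40.85]}}
]
'''

records = json.loads(json_text)  # JSON array -> Python list of dicts
first = records[0]

print(type(records).__name__, type(first).__name__)
print(first["the_geom"]["type"], first["the_geom"]["coordinates"])
```

Each record becomes a row in a dataframe, and the nested geometry object ends up as a dictionary inside a single cell.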

In [None]:
# mih = pd.read_json('https://data.cityofnewyork.us/resource/m79g-k9r4.json')
mih = pd.read_json('INSERT_YOUR_API_ENDPOINT_HERE')



In [None]:
mih.head()

In [None]:
mih.shape

Notice that there is a **the_geom** column that looks like it might have geometry information. 

In [None]:
## Ignore the warnings 
mih['the_geom'].head(1)

We are going to turn these entries (GeoJSON-like dictionaries) into Shapely geometries, which is the only piece missing before we can turn this table into a GeoDataFrame. 

In [None]:
from shapely.geometry import shape

## the apply method applies the function to each row of the dataframe
mih['the_geom'] = mih['the_geom'].apply(shape)

## I'm going to use the GeoDataFrame method to create a GeoDataFrame
## I figured the CRS is 4326 looking at the lat/longs in the geometries, but unfortunately 
## we are not given the CRS in the data documentation!
## We can look at the shapefile .prj file to see what the CRS is.
mih_geo = gpd.GeoDataFrame(mih,geometry='the_geom',crs='epsg:4326')

In [None]:
mih_geo.head()

In [None]:
## Faint, but these are our MIH zones

mih_geo.explore()

## 2.2 Filtering
The SODA API allows us to filter data from the endpoint url. Why might we want to do this? For one, there are very large datasets such as the [311 Service Requests dataset](https://data.cityofnewyork.us/Social-Services/311-Service-Requests-from-2010-to-Present/erm2-nwe9) (with 32 million rows) or the [Open Parking and Camera Violations](https://data.cityofnewyork.us/City-Government/Open-Parking-and-Camera-Violations/nc67-uf89) (with 93 million rows!) that are difficult to work with due to their size. 

There are two ways to filter data using the SODA API: 
- [Simple Filters](https://dev.socrata.com/docs/filtering.html)
- [SoQL Queries](https://dev.socrata.com/docs/queries/)


**Both of these filters are text we append to the original endpoint URL.**

### 2.2.1 Simple Filters
Any column in the dataset can be used to filter for specific values within that column, in the format:

`http://yourendpointurl.json?col_name=element_name`

In [None]:
mih_url_orig = "https://data.cityofnewyork.us/resource/m79g-k9r4.json"

## Note, this query is CASE-SENSITIVE! 
## The column name must match the API field name exactly (here, lowercase `boro`)
## If the value of interest is in all caps, it must be in all caps here
mih_url_mh = "https://data.cityofnewyork.us/resource/m79g-k9r4.json?boro=1"

In [None]:
mih_mh = pd.read_json(mih_url_mh)
mih_mh['the_geom'] = mih_mh['the_geom'].apply(shape)
mih_mh_geo = gpd.GeoDataFrame(mih_mh,geometry='the_geom',crs='epsg:4326')

In [None]:
mih_mh_geo.head()

In [None]:
mih_mh_geo.explore()

You can join multiple queries with an `&`.

**One key formatting difference is white space: spaces must be translated into `%20`, since no white space is allowed in a URL.** I am using the `.replace("to_be_replaced_str","new_str")` function to replace spaces with `%20`.
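
As an alternative sketch, Python's standard library can build and encode the filter part of the URL from a dictionary, which avoids typing `&` and `%20` by hand. The variable names here are only for illustration.

```python
from urllib.parse import urlencode, quote

endpoint = "https://data.cityofnewyork.us/resource/m79g-k9r4.json"

# One entry per simple filter: column name -> value to match
filters = {"boro": 1, "project_nam": "East Harlem Neighborhood Rezoning"}

# quote_via=quote encodes spaces as %20 (the default would use "+")
filtered_url = endpoint + "?" + urlencode(filters, quote_via=quote)
print(filtered_url)
```

The result is the same string that the `.replace(' ', '%20')` approach produces for these two filters.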


In [None]:
nycha_url_mh_eh = "https://data.cityofnewyork.us/resource/m79g-k9r4.json?boro=1&project_nam=East Harlem Neighborhood Rezoning".replace(' ','%20')
nycha_url_mh_eh = pd.read_json(nycha_url_mh_eh)
nycha_url_mh_eh['the_geom'] = nycha_url_mh_eh['the_geom'].apply(shape)
nycha_url_mh_eh_geo = gpd.GeoDataFrame(nycha_url_mh_eh,geometry='the_geom',crs='epsg:4326')

In [None]:
nycha_url_mh_eh_geo

In [None]:
nycha_url_mh_eh_geo.explore()

### 2.2.2 SoQL Queries
The “Socrata Query Language” (SoQL) is a simple, SQL-like query language specifically designed for making it easy to work with data on the web. If you're familiar with SQL, the following may be familiar. And even if you're not, this will seem pretty intuitive. 

Here are all the different parameters that you can use in this query: 
<figure>
<img src="https://www.dropbox.com/s/r4edgdtyzm2vrxn/Screen%20Shot%202023-02-19%20at%2010.09.27%20AM.png?dl=1" alt="drawing" width="800" style="display: block; margin: 0 auto"/>
</figure>



The same filtering for Manhattan and the East Harlem Neighborhood Rezoning we did above would look like this: 

(Note that the values we need to filter by need single quotes if they are strings now.)


In [None]:
## Note the use of single vs double quotes here, since I need to include a single quote in the query
nycha_url_mh_eh_sql = "https://data.cityofnewyork.us/resource/m79g-k9r4.json?$where=boro='1' and project_nam='East Harlem Neighborhood Rezoning'".replace(" ", "%20")

nycha_url_mh_eh2 = pd.read_json(nycha_url_mh_eh_sql)
nycha_url_mh_eh2['the_geom'] = nycha_url_mh_eh2['the_geom'].apply(shape)
nycha_url_mh_eh2_geo = gpd.GeoDataFrame(nycha_url_mh_eh2,geometry='the_geom',crs='epsg:4326')

In [None]:
nycha_url_mh_eh2_geo.plot()

### 2.2.3 A more complex SoQL query

Let's say we wanted to look at the [311 Service Requests data](https://data.cityofnewyork.us/Social-Services/311-Service-Requests-from-2010-to-Present/erm2-nwe9). Here are the ways I want to filter the dataset based on the columns available: 
- **Created Date** is between Feb 1 and Feb 21, 2023
- **Complaint Type**  is `Noise - Residential`
- **Descriptor** is `Loud Music/Party` 

Looking at the [311 API docs](https://dev.socrata.com/foundry/data.cityofnewyork.us/erm2-nwe9) will give you some example queries and will also show you the correct column names for the API. You can also find the column names when you click on each column in the "Columns in the Dataset" section of the data homepage. 

<figure>
<img src="https://www.dropbox.com/s/wlrh8jzes9dcsvv/Screen%20Shot%202023-02-19%20at%2011.55.08%20AM.png?dl=1" alt="drawing" width="1000" style="display: block; margin: 0 auto"/>
</figure>


In [None]:
servicereq_url = "https://data.cityofnewyork.us/resource/erm2-nwe9.json?$where=created_date between '2023-02-01T0:00:00.000' and '2023-02-21T0:00:00.000'   and complaint_type='Noise - Residential' and descriptor='Loud Music/Party'".replace(" ", "%20")
servicereq = pd.read_json(servicereq_url)
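
Long query strings like the one above are easy to mistype. Here is a sketch of assembling the same kind of `$where` clause from smaller pieces, with the timestamps generated by `datetime` rather than typed by hand. The variable names are illustrative, and the timestamps come out zero-padded (`T00:00:00.000`).

```python
from datetime import datetime
from urllib.parse import quote

endpoint = "https://data.cityofnewyork.us/resource/erm2-nwe9.json"

# ISO-style timestamps with millisecond precision
start = datetime(2023, 2, 1).isoformat(timespec="milliseconds")
end = datetime(2023, 2, 21).isoformat(timespec="milliseconds")

# One condition per filter, joined with "and"
conditions = [
    "created_date between '{}' and '{}'".format(start, end),
    "complaint_type='Noise - Residential'",
    "descriptor='Loud Music/Party'",
]
where_clause = " and ".join(conditions)

# Percent-encode the spaces but leave quotes, "=", ":" and "/" readable
query_url = endpoint + "?$where=" + quote(where_clause, safe="'=:/")
print(query_url)
```

Keeping each condition on its own line makes it simpler to add, remove, or comment out a filter while exploring a dataset.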

In [None]:
servicereq

Let's turn this into a GeoDataFrame

In [None]:
servicereq_geo = gpd.GeoDataFrame(servicereq, 
                                  geometry=gpd.points_from_xy(servicereq['longitude'], 
                                                              servicereq['latitude']),
                                  crs='epsg:4326')

In [None]:
## Show only the rows whose geometry is valid
servicereq_geo[servicereq_geo.geometry.is_valid]

In [None]:
servicereq_geo.plot()

## 2.3 `offset` and `limit`
The issue with using this endpoint is that, by default, we are limited to 1000 rows per query. You will see the documentation refer to this as "pages" sometimes.


In [None]:
servicereq.shape

What to do? 

One way to get around this is to use the `$limit` and `$offset` parameters. From the SODA documentation: 

>The `$offset` parameter is most often used in conjunction with $limit to page through a dataset. The `$offset` is the number of records into a dataset that you want to start, indexed at 0. For example, to retrieve the “4th page” of records (records 151 - 200) where you are using `$limit` to page 50 records at a time, you’d ask for an `$offset` of 150.
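
The arithmetic in that quote can be written out as a small sketch. The two helper functions below are hypothetical (they are not part of SODA or any library); they just convert a page number into the matching `$offset` and the range of records that page covers.

```python
def page_to_offset(page, limit):
    """Offset for a 1-indexed page number when each page holds `limit` records."""
    return (page - 1) * limit

def records_on_page(page, limit):
    """First and last record numbers (counting from 1) covered by a page."""
    offset = page_to_offset(page, limit)
    return offset + 1, offset + limit

fourth_page_offset = page_to_offset(4, 50)
fourth_page_records = records_on_page(4, 50)
print(fourth_page_offset, fourth_page_records)
```

For the example in the quote (page 4, 50 records per page) this gives an offset of 150 and records 151 through 200.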

In [None]:
servicereq_url_offset = "https://data.cityofnewyork.us/resource/erm2-nwe9.json?$limit=50&$offset=150&$where=created_date between '2023-02-01T0:00:00.000' and '2023-02-21T0:00:00.000' and complaint_type='Noise - Residential' and descriptor='Loud Music/Party'".replace(" ", "%20")
servicereq_offset = pd.read_json(servicereq_url_offset)

This is now 50 entries of the "4th page".

In [None]:
servicereq_offset

So, to get all the data, what we can do is run a loop to change that offset amount iteratively. 

OR

If we are getting the data just once, we can use the filter function, accessible through the  "View Data" button on the dataset's home page. 

<figure>
<img src="https://www.dropbox.com/s/oz26ti7y164pm8r/Screen%20Shot%202023-02-19%20at%2012.35.21%20PM.png?dl=1" alt="drawing" width="1000" style="display: block; margin: 0 auto"/>
</figure>



### 2.3.1 A short review of loops

In [None]:
my_counter = np.arange(0,1000,50)
print(my_counter)

In [None]:
# The for loop will iterate through each value in the list
# The {} is a placeholder for the value in the list within a string
for i in my_counter:
    print("Current Counter is now at {}".format(i))

In [None]:
## reset i to 0
i = 0
## The while loop will continue to run until the condition is no longer true
while i < 1000:
    print("Current Counter is now at {}".format(i))
    
    ## This is an example of an incrementer
    ## An incrementer is a variable that is used to increment a value
    ## After each iteration, the value of i will increase by 50
    i = i + 50

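In [None]:
## A for loop can also be stopped early with `break`
for i in np.arange(0,100000,50):
    print("Current Counter is now at {}".format(i))

    ## Stop once the counter reaches 1000, even though the range goes up to 100,000
    if i >= 1000:
        print("We are done")
        break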

To programmatically run different queries, I am just going to loop over a list of offsets and insert each one into the endpoint URL.

This might take a while to run and might not work at all given our limit of 1000 requests an hour. :/

In [None]:
## I actually don't know what the upper range is for my dataset, but I will just use 100,000
# offset_list = np.arange(0,100000,50)

# I'm actually going to use a smaller list for demo and not overloading the API
offset_list_smaller = np.arange(0,200,50)

list_of_dfs = []

for offset in offset_list_smaller:
    servicereq_url_offset = "https://data.cityofnewyork.us/resource/erm2-nwe9.json?$limit=50&$offset={}&$where=created_date between '2023-02-01T0:00:00.000' and '2023-02-19T0:00:00.000' and complaint_type='Noise - Residential' and descriptor='Loud Music/Party'".replace(" ", "%20").format(offset)
    servicereq_offset = pd.read_json(servicereq_url_offset)

    ## Here I am creating a list of dataframes by appending each dataframe to the list
    list_of_dfs.append(servicereq_offset)

I now have a list of dataframes.

In [None]:
len(list_of_dfs)

In [None]:
## pd.concat will concatenate the dataframes in the list
## to create a single dataframe
servicereq_final = pd.concat(list_of_dfs)

In [None]:
servicereq_final.shape

If I were to really try and get all this data, I'd add a `sleep()` call from the `time` library to pause my code for a certain amount of time before it runs the next line. 

In [None]:
import time

list_of_dfs = []

for offset in offset_list_smaller:
    servicereq_url_offset = "https://data.cityofnewyork.us/resource/erm2-nwe9.json?$limit=50&$offset={}&$where=created_date between '2023-02-01T0:00:00.000' and '2023-02-19T0:00:00.000' and complaint_type='Noise - Residential' and descriptor='Loud Music/Party'".replace(" ", "%20").format(offset)
    servicereq_offset = pd.read_json(servicereq_url_offset)

    ## Stop as soon as a page comes back empty: there is nothing left to fetch
    if servicereq_offset.shape[0] == 0:
        print("We are done")
        break

    ## Here I am creating a list of dataframes by appending each dataframe to the list
    list_of_dfs.append(servicereq_offset)
    
    ## I am adding a sleep timer to avoid overloading the API
    ## The sleep timer will pause the code for 10 seconds
    ## This gives me 10 seconds /run for each 50 records
    time.sleep(10)

servicereq_final = pd.concat(list_of_dfs)
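
The paging logic can also be separated from the download step, which makes it possible to test the loop without touching the API. The sketch below is one way to do that under stated assumptions: `fetch_all_pages` and `fake_fetch_page` are hypothetical names, and the "API" here is just a list of 120 fake records served in slices.

```python
def fetch_all_pages(fetch_page, limit=50, max_pages=1000):
    """Collect rows page by page until a page comes back short or empty."""
    all_rows = []
    for page_number in range(max_pages):
        offset = page_number * limit
        rows = fetch_page(limit, offset)
        all_rows.extend(rows)
        # A page with fewer rows than the limit means we reached the end
        if len(rows) < limit:
            break
    return all_rows

# Stand-in for the API: 120 fake records served in slices
fake_records = list(range(120))
calls = []

def fake_fetch_page(limit, offset):
    calls.append(offset)  # remember which offsets were requested
    return fake_records[offset:offset + limit]

result = fetch_all_pages(fake_fetch_page, limit=50)
print(len(result), calls)
```

In the notebook, the stand-in function could be swapped for one that formats the endpoint URL with the given limit and offset, calls `pd.read_json`, and sleeps between requests. As far as I know, the SODA documentation also suggests adding an `$order` clause when paging so that rows do not shift between requests.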

In [None]:
servicereq_final.head()

Lastly! Don't think this means we can just get all the data at once. Each query we make "costs" the API provider resources. To ensure that everyone is able to use the API, the provider will limit your capacity to query. Here's their language on it:

>## Throttling and Application Tokens
>Hold on a second! Before you go storming off to make the next great open data app, you should understand how SODA handles throttling. You can make a certain number of requests without an application token, but they come from a shared pool and you’re eventually going to get cut off.
>
>If you want more requests, sign up for a Socrata account, then register for an application token and your application will be granted up to 1000 requests per rolling hour period. If you need even more than that, special exceptions are made by request. You can contact our support team here.
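
Once you have an application token, it needs to travel with each request. One option described in the SODA documentation is the `X-App-Token` HTTP header. Here is a minimal sketch using only the standard library that builds (but does not send) such a request; the token is a placeholder and the `$limit=5` query is just an example.

```python
import urllib.request

endpoint = "https://data.cityofnewyork.us/resource/erm2-nwe9.json?$limit=5"
app_token = "YOUR_APP_TOKEN_HERE"  # placeholder, not a real token

# Build (but do not send) a request that carries the token in a header
request = urllib.request.Request(endpoint, headers={"X-App-Token": app_token})

print(request.full_url)
# urllib stores header names capitalized as "X-app-token"
print(request.get_header("X-app-token"))
```

Passing this object to `urllib.request.urlopen()` would actually send the request; as with the Census key, it is better to keep a real token out of notebooks you share.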

## Q.3 Querying and Concatenating 
- Use the [Film Permits](https://data.cityofnewyork.us/City-Government/Film-Permits/tg4x-b46p) dataset to retrieve two dataframes: 
    1. The **StartDateTime** should be after July 1, 2022
    2. The **StartDateTime** should be after July 1, 2022 & The **Category** should be `Television`. 
- Create a list of two dataframes with 50 rows per "page"
- Concatenate these two dataframes together into one dataframe
- Show the first 5 rows of the new dataframe.



Use the [Film Permits](https://data.cityofnewyork.us/City-Government/Film-Permits/tg4x-b46p) dataset to retrieve two dataframes: 
1. The **StartDateTime** should be after July 1, 2022
2. The **StartDateTime** should be after July 1, 2022 & The **Category** should be `Television`. 

In [None]:
film_url1 = ## INSERT YOUR CODE HERE
film1 = pd.read_json(film_url1)

film_url2 = ## INSERT YOUR CODE HERE
film2 = pd.read_json(film_url2)

Concatenate these two dataframes together into one dataframe

In [None]:
## INSERT YOUR CODE HERE

Show the first 5 rows of the new dataframe.


In [None]:
## INSERT YOUR CODE HERE