# ETL Project
---

## Extract

U.S. border-crossing data was extracted from the Bureau of Transportation Statistics (BTS) Border Crossing Entry Data dataset (https://data.transportation.gov/Research-and-Statistics/Border-Crossing-Entry-Data/keg4-3bc2). Because the dataset is served through the Socrata Open Data API (SODA), the sodapy client library was used to query it instead of requesting and parsing the JSON endpoint directly.
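For comparison, the direct JSON route that sodapy replaces can be sketched as below. This only builds the request URL and does not send it. The helper name `build_soda_url` is hypothetical, and the sketch assumes the standard SODA resource path (`/resource/<dataset id>.json`) with `$`-prefixed SoQL parameters.

```python
from urllib.parse import urlencode

# SODA resource endpoint for dataset keg4-3bc2 (JSON output); assumed standard path
BASE_URL = "https://data.transportation.gov/resource/keg4-3bc2.json"


def build_soda_url(where, limit):
    """Return the request URL for a filtered query (hypothetical helper)."""
    params = {"$where": where, "$limit": limit}
    # safe="$" keeps the '$' prefix of the SoQL parameter names readable
    return f"{BASE_URL}?{urlencode(params, safe='$')}"


url = build_soda_url("date >= '2009-01-01' and value > 0", 20000)
print(url)
```

With sodapy, the same parameters are passed as keyword arguments (`where=`, `limit=`) and the library handles the URL encoding and JSON decoding.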

In [3]:
# Dependencies
import pandas as pd
from sodapy import Socrata

In [4]:
# Create an unauthenticated Socrata client for the transportation data portal
# (None = no app token; public datasets still work, but requests are throttled)
client = Socrata("data.transportation.gov", None)

# Example authenticated client (needed for non-public datasets):
# client = Socrata("data.transportation.gov",
#                  MyAppToken,
#                  username="user@example.com",
#                  password="AFakePassword")



In [52]:
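# SoQL filters: records from 2009 onward with a positive count.
# test_conditions also restricts rows to personal-vehicle, bus, and train measures. The value
# counts below show no 'Personal Vehicles Passengers' rows, so that label may be misspelled.
test_conditions = (
    "date >= '2009-01-01' and value > 0 and "
    "(measure = 'Personal Vehicles' or measure = 'Personal Vehicles Passengers' "
    "or measure = 'Bus Passengers' or measure = 'Train Passengers')"
)

conditions = "date >= '2009-01-01' and value > 0"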
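A chain of `measure = ... or measure = ...` comparisons can also be written with the SoQL `in` operator. The following is a minimal sketch, not the code used in this project: `build_where` is a hypothetical helper, and it assumes the measure labels contain no single quotes.

```python
def build_where(start_date, measures):
    """Compose a SoQL filter: start date, positive values, and a set of measures."""
    # Assumption: labels contain no single quotes, so no escaping is performed
    quoted = ", ".join(f"'{m}'" for m in measures)
    return f"date >= '{start_date}' and value > 0 and measure in ({quoted})"


where = build_where(
    "2009-01-01",
    ["Personal Vehicles", "Bus Passengers", "Train Passengers"],
)
print(where)
```

Keeping the labels in a Python list makes it easier to add, remove, or correct a measure without editing a long string by hand.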

In [59]:
# Request up to 20,000 US-Canada border entry records that match the filter
results = client.get("keg4-3bc2", limit=20000, border="US-Canada Border", where=test_conditions)
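The request above caps the result at 20,000 rows, which is enough here but would silently truncate a larger query. One way to avoid a fixed cap is to page through the results with sodapy's `limit` and `offset` arguments. The sketch below is an illustration under that assumption: `fetch_all` is a hypothetical helper, and `FakeClient` is a stand-in so the example runs without network access.

```python
def fetch_all(client, dataset_id, page_size=5000, **query):
    """Collect every matching record by requesting successive pages."""
    records = []
    offset = 0
    while True:
        page = client.get(dataset_id, limit=page_size, offset=offset, **query)
        records.extend(page)
        if len(page) < page_size:  # a short page means no rows remain
            return records
        offset += page_size


class FakeClient:
    """Hypothetical stand-in for sodapy.Socrata so the sketch runs offline."""

    def __init__(self, rows):
        self.rows = rows

    def get(self, dataset_id, limit, offset=0, **query):
        return self.rows[offset:offset + limit]


rows = fetch_all(FakeClient(list(range(12))), "keg4-3bc2", page_size=5)
print(len(rows))
```

With the real client, the call would look like `fetch_all(client, "keg4-3bc2", border="US-Canada Border", where=test_conditions)`. Passing an `order` argument as well is likely advisable so that pages do not overlap between requests.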

In [60]:
# Create a pandas DataFrame from the returned records
results_df = pd.DataFrame.from_records(results)

In [66]:
# Preview the DataFrame, sorted by date
results_df.sort_values(by=['date'])

| | border | date | location | measure | port_code | port_name | state | value |
|---|---|---|---|---|---|---|---|---|
| 0 | US-Canada Border | 2009-01-01T00:00:00.000 | {'type': 'Point', 'coordinates': [-102.55, 49]} | Bus Passengers | 3403 | Portal | North Dakota | 715 |
| 78 | US-Canada Border | 2009-01-01T00:00:00.000 | {'type': 'Point', 'coordinates': [-83.04, 42.32]} | Bus Passengers | 3801 | Detroit | Michigan | 26165 |
| 2 | US-Canada Border | 2009-01-01T00:00:00.000 | {'type': 'Point', 'coordinates': [-67.93, 47.16]} | Personal Vehicles | 108 | Van Buren | Maine | 16882 |
| 3 | US-Canada Border | 2009-01-01T00:00:00.000 | {'type': 'Point', 'coordinates': [-71.79, 45.01]} | Personal Vehicles | 211 | Norton | Vermont | 2692 |
| 4 | US-Canada Border | 2009-01-01T00:00:00.000 | {'type': 'Point', 'coordinates': [-72.09, 45.01]} | Bus Passengers | 209 | Derby Line | Vermont | 4385 |
| 5 | US-Canada Border | 2009-01-01T00:00:00.000 | {'type': 'Point', 'coordinates': [-118.22, 49]} | Bus Passengers | 3016 | Laurier | Washington | 63 |
| 6 | US-Canada Border | 2009-01-01T00:00:00.000 | {'type': 'Point', 'coordinates': [-75.98, 44.35]} | Personal Vehicles | 708 | Alexandria Bay | New York | 31018 |
| 7 | US-Canada Border | 2009-01-01T00:00:00.000 | {'type': 'Point', 'coordinates': [-72.66, 45.02]} | Bus Passengers | 203 | Richford | Vermont | 102 |
| 17 | US-Canada Border | 2009-01-01T00:00:00.000 | {'type': 'Point', 'coordinates': [-103.81, 49]} | Personal Vehicles | 3417 | Fortuna | North Dakota | 354 |
| 18 | US-Canada Border | 2009-01-01T00:00:00.000 | {'type': 'Point', 'coordinates': [-111.96, 49]} | Personal Vehicles | 3310 | Sweetgrass | Montana | 15823 |
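The preview shows the date as an ISO timestamp string and the location as a nested GeoJSON-style point. A minimal sketch of normalising a single record before it is loaded into a DataFrame follows. `clean_record` is a hypothetical helper, and the sample assumes the API delivers `value` and `port_code` as strings, which is not confirmed by the output above.

```python
from datetime import datetime


def clean_record(record):
    """Return a copy with a parsed date, an integer value, and flat coordinates."""
    lon, lat = record["location"]["coordinates"]  # GeoJSON order: longitude, latitude
    cleaned = dict(record)
    cleaned["date"] = datetime.strptime(record["date"], "%Y-%m-%dT%H:%M:%S.%f")
    cleaned["value"] = int(record["value"])
    cleaned["longitude"], cleaned["latitude"] = lon, lat
    del cleaned["location"]
    return cleaned


# First row of the preview; string-typed numbers are an assumption
sample = {
    "border": "US-Canada Border",
    "date": "2009-01-01T00:00:00.000",
    "location": {"type": "Point", "coordinates": [-102.55, 49]},
    "measure": "Bus Passengers",
    "port_code": "3403",
    "port_name": "Portal",
    "state": "North Dakota",
    "value": "715",
}
print(clean_record(sample))
```

The cleaned records could then be passed to `pd.DataFrame.from_records` in place of the raw results.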


In [64]:
results_df['measure'].value_counts()

Personal Vehicles    10307
Bus Passengers        5771
Train Passengers      2981
Name: measure, dtype: int64
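The counts above list three measures although the filter names four, and their sum (19,059) is below the 20,000-row limit, so the missing label matched no rows rather than being cut off. A small check like the following can flag such labels automatically. It is a sketch with a hypothetical helper name and made-up sample records, not part of the original notebook.

```python
from collections import Counter


def missing_measures(records, requested):
    """Return the requested measure labels that appear in no record."""
    counts = Counter(r["measure"] for r in records)
    return [m for m in requested if counts[m] == 0]


requested = [
    "Personal Vehicles",
    "Personal Vehicles Passengers",
    "Bus Passengers",
    "Train Passengers",
]
# Tiny illustrative sample standing in for the API results
sample_records = [
    {"measure": "Personal Vehicles"},
    {"measure": "Bus Passengers"},
    {"measure": "Train Passengers"},
    {"measure": "Bus Passengers"},
]
print(missing_measures(sample_records, requested))
```

Run against the real `results` list, a non-empty return value would point to a label that needs checking against the dataset's own measure names.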