# CBP Encounters Scraping 

## Purpose 

This notebook "scrapes" (extracts) all data from the first Tableau dashboard on the [CBP Southwest Land Border Encounters](https://www.cbp.gov/newsroom/stats/southwest-land-border-encounters) page. CBP does not provide this data in a spreadsheet format, nor does it enable download of the data through the embedded Tableau chart. This code was therefore developed to pull down all the data, including every data point that exists for every possible combination of filters.

**Please note:** When using the data output by this notebook for analysis or exploration, you must filter the data on the filter columns [Citizenship Grouping, Component, Demographic, Title of Authority]. These columns correspond to the dashboard filters.

## Approach

We use the Python programming language along with some Python libraries that simplify the process of extracting data from the dashboard.

The [TableauScraper](https://github.com/bertrandmartel/tableau-scraping) library (an open source project) provides the primary functionality for extracting data from Tableau.

In [1]:
from google.colab import drive
drive.mount('/content/drive/')

Drive already mounted at /content/drive/; to attempt to forcibly remount, call drive.mount("/content/drive/", force_remount=True).


In [3]:
%cd drive/Shareddrives/Data\ Products\ Team/Products/Immigration\ Data\ Hub/DataRepo/

/content/drive/Shareddrives/Data Products Team/Products/Immigration Data Hub/DataRepo


## The Code !

In [3]:
!pip install TableauScraper
!pip install pandas



In [1]:
%load_ext autoreload
%autoreload 2

In [5]:
# Import helper python libraries
from tableauscraper import TableauScraper as TS
import itertools
import logging
import time
import pprint
import pandas as pd

pp = pprint.PrettyPrinter(indent=4)
import logging.config

from cbp_tableau_scraping import find_filters_worksheet, unpack_filter_information, get_dashboard_data

logging.config.dictConfig(
    {
        "version": 1,
        "disable_existing_loggers": True,
    }
)


### First We Grab the CBP Tableau Data

The CBP dashboard URLs are defined below. See [link](todo) for how to find the URL.

TODO --> Maybe see if there is an automated way to grab this

In [6]:
dashboard1_url = (
    "https://publicstats.cbp.gov/t/PublicFacing/views/"
    "CBPSBOEnforcementActionsDashboardsAUGFY21/"
    "SBOEncounters8076?:isGuestRedirectFromVizportal=y&:embed=y"
)

dashboard2_url =  (
    "https://publicstats.cbp.gov/t/PublicFacing/views/"
    "CBPSBOEnforcementActionsDashboardsAUGFY21/"
    "SBObyMonthDemo8076?:isGuestRedirectFromVizportal=y&:embed=y"
)
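
The two URLs above differ only in the view name at the end of the path. As a minimal sketch (the helper name `build_view_url` and the constants are illustrative, not part of the notebook's helper module), the shared parts can be factored out so that switching dashboards only requires changing the view name:

```python
# Shared pieces copied from the two dashboard URLs in this notebook
BASE = "https://publicstats.cbp.gov/t/PublicFacing/views"
WORKBOOK = "CBPSBOEnforcementActionsDashboardsAUGFY21"
EMBED_PARAMS = ":isGuestRedirectFromVizportal=y&:embed=y"


def build_view_url(view, workbook=WORKBOOK, base=BASE):
    """Assemble a dashboard URL from its base, workbook name and view name."""
    return f"{base}/{workbook}/{view}?{EMBED_PARAMS}"


print(build_view_url("SBOEncounters8076"))
print(build_view_url("SBObyMonthDemo8076"))
```

The workbook name contains a month and fiscal year (`AUGFY21`), so it is kept as a parameter here in case it differs for other releases.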




Here we activate (instantiate is the technical term) the TableauScraper library and then load data from the dashboard url. 

**Dashboard to load**

In [7]:
current_dashboard = dashboard2_url

In [8]:
# Create a tableau scraper object
ts = TS()

# We then pass the selected URL to that object and it grabs data from the CBP dashboard
ts.loads(current_dashboard)

The tableau dashboard has the filters linked to a specific visualization. On the CBP page we have a line chart, table and a bar chart in the first embedded dashboard. We want to extract the table or line data as they contain the same information. We need to find the visualization that has the filters that will update all three charts. 

TODO - Explain more about filters, worksheets and workbooks ...

In [44]:
??find_filters_worksheet

In [10]:
filters_ws, wb_names = find_filters_worksheet(ts)

Filters Presesnt on worksheet name --> All MoM Change Podium
[   {   'column': 'Citizenship Grouping',
        'globalFieldName': '[federated.1xhccc00nlacbx14ajs101w1uee1].[none:Citizenship '
                           'Grouping:nk]',
        'ordinal': 0,
        'selection': [   'El Salvador',
                         'Guatemala',
                         'Honduras',
                         'Mexico',
                         'Other',
                         'all'],
        'selectionAlt': [   {   'columnFullNames': ['[Citizenship Grouping]'],
                                'domainTables': [   {   'isSelected': True,
                                                        'label': 'El '
                                                                 'Salvador'}],
                                'fn': '[federated.1xhccc00nlacbx14ajs101w1uee1].[none:Citizenship '
                                      'Grouping:nk]'}],
        'values': ['El Salvador', 'Guatemala', 'Honduras', 'Mexic

Above we can see which worksheet has the filters. If we are using the first URL, the filters are on the `SBO Line Graph` worksheet; if using the second URL, the filters are on the `All MoM Change Podium` worksheet.

The `filters_ws` variable holds the name of the worksheet that has the filters.

In [11]:
print(filters_ws)

All MoM Change Podium
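
The source of `find_filters_worksheet` lives in the local `cbp_tableau_scraping` module and is not shown here. The sketch below is one way such a lookup could work, not the module's actual implementation. It assumes a workbook object shaped like the one TableauScraper returns from `ts.getWorkbook()`: a `worksheets` list whose items have a `name` attribute and a `getFilters()` method. The stand-in objects and the worksheet name `Some Other Sheet` are made up so the example runs without network access.

```python
from types import SimpleNamespace


def first_worksheet_with_filters(workbook):
    """Return (name of the first worksheet exposing filters, all worksheet names)."""
    names = [ws.name for ws in workbook.worksheets]
    for ws in workbook.worksheets:
        if ws.getFilters():  # an empty list means this worksheet has no filters
            return ws.name, names
    return None, names


def fake_worksheet(name, filters):
    """Stand-in that mimics only the parts of a worksheet used above."""
    return SimpleNamespace(name=name, getFilters=lambda: filters)


demo_workbook = SimpleNamespace(
    worksheets=[
        fake_worksheet("Some Other Sheet", []),
        fake_worksheet(
            "All MoM Change Podium",
            [{"column": "Title of Authority", "values": ["Title 8", "Title 42"]}],
        ),
    ]
)

found, all_names = first_worksheet_with_filters(demo_workbook)
print(found)
print(all_names)
```

With a live scraper, the same function could be called as `first_worksheet_with_filters(ts.getWorkbook())`, assuming the installed TableauScraper version exposes those methods.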


In [26]:
data_element_target = 'Demo FYTD by Month (2)'

### Next, we build out all possible combinations of filters 

In [12]:
??unpack_filter_information

In [13]:
filter_data = unpack_filter_information(ts, filters_ws, skip_filters = ['Demographic'])

{   'Citizenship Grouping': [   'El Salvador',
                                'Guatemala',
                                'Honduras',
                                'Mexico',
                                'Other'],
    'Demographic': [   'Accompanied Minors',
                       'FMUA',
                       'Single Adults',
                       'UC / Single Minors'],
    'Title of Authority': ['Title 8', 'Title 42']}
Fiscal year not present


In [14]:
# See the various combinations
pp.pprint(filter_data['filter_combinations'])

[   (None, None, None),
    (None, None, 'Title 8'),
    (None, None, 'Title 42'),
    (None, 'Accompanied Minors', None),
    (None, 'Accompanied Minors', 'Title 8'),
    (None, 'Accompanied Minors', 'Title 42'),
    (None, 'FMUA', None),
    (None, 'FMUA', 'Title 8'),
    (None, 'FMUA', 'Title 42'),
    (None, 'Single Adults', None),
    (None, 'Single Adults', 'Title 8'),
    (None, 'Single Adults', 'Title 42'),
    (None, 'UC / Single Minors', None),
    (None, 'UC / Single Minors', 'Title 8'),
    (None, 'UC / Single Minors', 'Title 42'),
    ('El Salvador', None, None),
    ('El Salvador', None, 'Title 8'),
    ('El Salvador', None, 'Title 42'),
    ('El Salvador', 'Accompanied Minors', None),
    ('El Salvador', 'Accompanied Minors', 'Title 8'),
    ('El Salvador', 'Accompanied Minors', 'Title 42'),
    ('El Salvador', 'FMUA', None),
    ('El Salvador', 'FMUA', 'Title 8'),
    ('El Salvador', 'FMUA', 'Title 42'),
    ('El Salvador', 'Single Adults', None),
    ('El Salvador', 'S
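
The listing above has the shape of a Cartesian product in which each filter can also be left unset (`None`). The sketch below reproduces that ordering with `itertools.product`. It illustrates the idea and is not necessarily how `unpack_filter_information` builds its list; the function name `build_filter_combinations` is hypothetical, and the filter values are copied from the output above.

```python
import itertools

filter_values = {
    "Citizenship Grouping": ["El Salvador", "Guatemala", "Honduras", "Mexico", "Other"],
    "Demographic": ["Accompanied Minors", "FMUA", "Single Adults", "UC / Single Minors"],
    "Title of Authority": ["Title 8", "Title 42"],
}


def build_filter_combinations(values_by_column):
    """Return the filter column names and every combination of their values.

    None is prepended to each filter's options to represent "no selection".
    """
    columns = list(values_by_column)
    options = [[None] + list(values_by_column[col]) for col in columns]
    return columns, list(itertools.product(*options))


filter_columns, combos = build_filter_combinations(filter_values)
print(len(combos))  # 6 * 5 * 3 options
print(combos[:4])
```

Because the combinations grow multiplicatively (6 × 5 × 3 = 90 here), every additional filter multiplies the number of requests sent to the dashboard.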

### Now let's pull down the data

**Data Extraction Function**

We will use a function to pull down the data and parameterize some of the arguments.

This may take about 20 minutes.

In [32]:
dataset, failed_combination = get_dashboard_data(
    current_dashboard,
    filter_data['filter_columns'],
    filter_data['filter_combinations'],
    filters_ws,
    data_element_target,
)

Attempting Fitler Combination (None, None, None)
Attempting Fitler Combination (None, None, 'Title 8')
Attempting Fitler Combination (None, None, 'Title 42')
Attempting Fitler Combination (None, 'Accompanied Minors', None)
Attempting Fitler Combination (None, 'Accompanied Minors', 'Title 8')
Attempting Fitler Combination (None, 'Accompanied Minors', 'Title 42')
Attempting Fitler Combination (None, 'FMUA', None)
Attempting Fitler Combination (None, 'FMUA', 'Title 8')
Attempting Fitler Combination (None, 'FMUA', 'Title 42')
Attempting Fitler Combination (None, 'Single Adults', None)
Attempting Fitler Combination (None, 'Single Adults', 'Title 8')
Attempting Fitler Combination (None, 'Single Adults', 'Title 42')
Attempting Fitler Combination (None, 'UC / Single Minors', None)
Attempting Fitler Combination (None, 'UC / Single Minors', 'Title 8')
Attempting Fitler Combination (None, 'UC / Single Minors', 'Title 42')
Attempting Fitler Combination ('El Salvador', None, None)
Attempting Fitler
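
The internals of `get_dashboard_data` are not shown in this notebook. As a rough sketch of the general pattern (loop over the combinations, tag each batch of rows with the filters that produced it, and record failures instead of stopping), the code below uses a hypothetical `fetch_rows` callable in place of the real Tableau requests. In an actual run that callable would apply the filters to the dashboard and read the target worksheet; here a fake version simulates one failure.

```python
def collect_rows(filter_columns, combinations, fetch_rows):
    """Fetch rows for every filter combination and label them with the filters used.

    fetch_rows is a hypothetical callable: it receives a dict of
    {filter column: value or None} and returns a list of row dicts.
    """
    dataset, failed = [], []
    for combo in combinations:
        selection = dict(zip(filter_columns, combo))
        try:
            rows = fetch_rows(selection)
        except Exception:
            # Remember the combination so it can be inspected or retried later
            failed.append(combo)
            continue
        # An unset filter (None) is recorded as "all", as in the dataset below
        labels = {col: "all" if val is None else val for col, val in selection.items()}
        dataset.extend({**row, **labels} for row in rows)
    return dataset, failed


def fake_fetch(selection):
    """Stand-in for the dashboard request; fails for one value on purpose."""
    if selection["Title of Authority"] == "Title 42":
        raise RuntimeError("simulated request failure")
    return [{"Month": "AUG", "Encounters": 1}]


columns = ["Demographic", "Title of Authority"]
demo_combos = [(None, None), (None, "Title 8"), ("FMUA", "Title 42")]
rows, failed = collect_rows(columns, demo_combos, fake_fetch)
print(rows)
print(failed)
```

The resulting list of dicts can be passed to `pd.DataFrame(rows)` to get a table with one column per filter.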

**Check if anything failed**

In [33]:
print(len(failed_combination))
print(failed_combination)

0
[]


## Review Data

In [34]:
dataset.shape

(5376, 11)

In [35]:
dataset

Unnamed: 0,Component-value,Component-alias,Demographic-value,Demographic-alias,Month (abbv)-value,Month (abbv)-alias,SUM(Encounter Count)-alias,ATTR(Demographic (copy))-alias,Citizenship Grouping,Demographic,Title of Authority
0,Office of Field Operations,Office of Field Operations,%all%,%all%,%all%,%all%,68996,%many-values%,all,all,all
1,Office of Field Operations,Office of Field Operations,%all%,%all%,AUG,AUG,13329,%many-values%,all,all,all
2,Office of Field Operations,Office of Field Operations,%all%,%all%,JUL,JUL,12935,%many-values%,all,all,all
3,Office of Field Operations,Office of Field Operations,%all%,%all%,JUN,JUN,10385,%many-values%,all,all,all
4,Office of Field Operations,Office of Field Operations,%all%,%all%,MAY,MAY,7943,%many-values%,all,all,all
...,...,...,...,...,...,...,...,...,...,...,...
8,U.S. Border Patrol,U.S. Border Patrol,UC / Single Minors,UC / Single Minors,NOV,NOV,1,Unaccompanied Children (UC) / Single Minors,Other,UC / Single Minors,Title 42
9,U.S. Border Patrol,U.S. Border Patrol,UC / Single Minors,UC / Single Minors,OCT,OCT,18,Unaccompanied Children (UC) / Single Minors,Other,UC / Single Minors,Title 42
10,%all%,%all%,%all%,%all%,%all%,%all%,21,Unaccompanied Children (UC) / Single Minors,Other,UC / Single Minors,Title 42
11,%all%,%all%,%all%,%all%,NOV,NOV,1,Unaccompanied Children (UC) / Single Minors,Other,UC / Single Minors,Title 42


In [43]:
dataset[(dataset['Citizenship Grouping'] == 'El Salvador')
      & (dataset['Demographic'] == 'all')
      & (dataset['Title of Authority'] == 'all')
      & (dataset['Component-value'] == 'Office of Field Operations')
]

Unnamed: 0,Component-value,Component-alias,Demographic-value,Demographic-alias,Month (abbv)-value,Month (abbv)-alias,SUM(Encounter Count)-alias,ATTR(Demographic (copy))-alias,Citizenship Grouping,Demographic,Title of Authority
0,Office of Field Operations,Office of Field Operations,%all%,%all%,%all%,%all%,2665,%many-values%,El Salvador,all,all
1,Office of Field Operations,Office of Field Operations,%all%,%all%,AUG,AUG,718,%many-values%,El Salvador,all,all
2,Office of Field Operations,Office of Field Operations,%all%,%all%,JUL,JUL,562,%many-values%,El Salvador,all,all
3,Office of Field Operations,Office of Field Operations,%all%,%all%,JUN,JUN,527,%many-values%,El Salvador,all,all
4,Office of Field Operations,Office of Field Operations,%all%,%all%,MAY,MAY,411,%many-values%,El Salvador,all,all
5,Office of Field Operations,Office of Field Operations,%all%,%all%,APR,APR,200,%many-values%,El Salvador,all,all
6,Office of Field Operations,Office of Field Operations,%all%,%all%,MAR,MAR,52,%many-values%,El Salvador,all,all
7,Office of Field Operations,Office of Field Operations,%all%,%all%,FEB,FEB,37,%many-values%,El Salvador,all,all
8,Office of Field Operations,Office of Field Operations,%all%,%all%,JAN,JAN,47,%many-values%,El Salvador,all,all
9,Office of Field Operations,Office of Field Operations,%all%,%all%,DEC,DEC,39,%many-values%,El Salvador,all,all


# End