## CBP Entry Ports Data

The CBP directory of ports of entry lists the land ports, seaports, and airports in the four southwest land border states (California, Arizona, New Mexico, and Texas).

df_ca = https://www.cbp.gov/about/contact/ports/ca

df_az = https://www.cbp.gov/about/contact/ports/az

df_nm = https://www.cbp.gov/about/contact/ports/nm

df_tx = https://www.cbp.gov/about/contact/ports/tx


In [2]:
from bs4 import BeautifulSoup

import requests
import re

import pandas as pd

from geopy.geocoders import Photon
from geopy.geocoders import Nominatim

### Step 1: Data Scraping

In [3]:
# URLs and their corresponding state abbreviations
urls = {
    'ca': 'https://www.cbp.gov/about/contact/ports/ca',
    'az': 'https://www.cbp.gov/about/contact/ports/az',
    'nm': 'https://www.cbp.gov/about/contact/ports/nm',
    'tx': 'https://www.cbp.gov/about/contact/ports/tx'
}

# Function to scrape and create a DataFrame from a given URL
def scrape_to_df(url):
    response = requests.get(url)
    soup = BeautifulSoup(response.content, 'html.parser')
    table = soup.find('table')
    data = []
    for row in table.find_all('tr'):
        cols = row.find_all('td')
        if len(cols) > 0:
            col_data = [ele.text.strip().replace('\n', ' ') for ele in cols]
            data.append(col_data)
    return pd.DataFrame(data, columns=['Port Name', 'Location Address', 'Field Operation Office'])

# Dictionary to hold each state's DataFrame
state_dfs = {}

# Scrape each URL and store the resulting DataFrame
for state_abbr, url in urls.items():
    df_temp = scrape_to_df(url)
    # Add a state column to each DataFrame
    df_temp['State'] = state_abbr.upper()
    state_dfs[f'df_{state_abbr}'] = df_temp

# Merge all state DataFrames into df_sb_all
df_sb_all = pd.concat(state_dfs.values(), ignore_index=True)
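
`scrape_to_df` assumes that every page contains a table and that every data row has exactly three cells; if either assumption fails, it raises an error. A minimal sketch of a more defensive parsing step follows. The helper name `parse_port_rows` and the constant `EXPECTED_COLS` are hypothetical, and the sketch joins cell text with `get_text(' ', strip=True)` rather than the `.text.strip().replace(...)` chain used above, so whitespace may differ slightly.

```python
from bs4 import BeautifulSoup

# Assumed layout: Port Name, Location Address, Field Operation Office
EXPECTED_COLS = 3

def parse_port_rows(html):
    """Return the 3-cell data rows of the first <table> found in html."""
    soup = BeautifulSoup(html, 'html.parser')
    table = soup.find('table')
    if table is None:
        # No table on the page (layout change, blocked request, etc.)
        return []
    rows = []
    for tr in table.find_all('tr'):
        # Header rows use <th>, so they yield no <td> cells and are skipped
        cells = [td.get_text(' ', strip=True) for td in tr.find_all('td')]
        if len(cells) == EXPECTED_COLS:
            rows.append(cells)
    return rows
```

The returned list could be passed to `pd.DataFrame(rows, columns=[...])` in the same way as `data` in `scrape_to_df`.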


In [11]:
# Remove the airports
# Another option: filter by ZIP code, i.e. build a list of the ZIP codes that apply and drop the rest

In [14]:
# Filter the DataFrame to exclude rows that contain the word "airport" in any column
df_sb_all = df_sb_all[~df_sb_all.apply(lambda x: x.astype(str).str.contains('airport', case=False, na=False)).any(axis=1)]
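
The ZIP-code alternative mentioned in the comment above could look roughly like the sketch below. It assumes addresses follow the "City, ST 12345" pattern seen in the scraped data. The names `extract_zip`, `is_border_port`, and `BORDER_ZIPS` are hypothetical, and the two ZIP codes in the allow-list are only illustrative values taken from the Andrade and Calexico addresses, not a complete list.

```python
import re

# Two-letter state code followed by a 5-digit ZIP (optional +4 suffix)
ZIP_RE = re.compile(r'\b[A-Z]{2}\s+(\d{5})(?:-\d{4})?\b')

def extract_zip(address):
    """Return the 5-digit ZIP code found after the state code, or None."""
    match = ZIP_RE.search(address)
    return match.group(1) if match else None

# Illustrative allow-list only; a real list would have to be compiled by hand
BORDER_ZIPS = {'92283', '92231'}

def is_border_port(address, allowed=BORDER_ZIPS):
    """True when the address carries a ZIP code from the allow-list."""
    return extract_zip(address) in allowed
```

Applied to the DataFrame, this would be something like `df_sb_all[df_sb_all['Location Address'].apply(is_border_port)]`.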


In [13]:
num_rows = df_sb_all.shape[0]
print(f"Total number of rows: {num_rows}")

Total number of rows: 47


### CBP Entry Ports
#### Step 2: Geocode Addresses to Coordinates

In [15]:
# Initialize the Photon geocoder
geolocator = Photon(user_agent="geoapiExercises")

# Return (latitude, longitude) for an address, or (None, None) on failure
def get_lat_long(address):
    try:
        location = geolocator.geocode(address)
        if location:
            return (location.latitude, location.longitude)
        else:
            return (None, None)
    except Exception:
        # Network errors, timeouts, and geocoder errors all end up here
        return (None, None)

# Apply the function to the 'Location Address' column
df_sb_all['Coordinates'] = df_sb_all['Location Address'].apply(lambda x: get_lat_long(x))

# Split the coordinates into two new columns: 'Latitude' and 'Longitude'
df_sb_all[['Latitude', 'Longitude']] = pd.DataFrame(df_sb_all['Coordinates'].tolist(), index=df_sb_all.index)
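
Each call to `geolocator.geocode` is a network request, so repeated addresses and rapid successive calls can be wasteful. The sketch below wraps any geocoding callable with a small in-memory cache and a pause between uncached lookups. The name `make_cached_geocoder` and the one-second default delay are assumptions, not something the notebook or the geocoding service prescribes.

```python
import time

def make_cached_geocoder(geocode, delay_seconds=1.0):
    """Wrap a geocode(address) callable with a cache and a per-request pause."""
    cache = {}

    def lookup(address):
        if address in cache:
            return cache[address]  # no network call for repeated addresses
        try:
            location = geocode(address)
        except Exception:
            location = None  # treat any geocoder error as "not found"
        if location:
            coords = (location.latitude, location.longitude)
        else:
            coords = (None, None)
        cache[address] = coords  # failures are cached too
        time.sleep(delay_seconds)  # pause before the next uncached request
        return coords

    return lookup

# Possible usage with the geolocator defined above:
#   lookup = make_cached_geocoder(geolocator.geocode)
#   df_sb_all['Coordinates'] = df_sb_all['Location Address'].apply(lookup)
```

Because failed lookups are cached as well, a fresh wrapper would be needed to retry them.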


In [16]:
df_sb_all.head()

Unnamed: 0,Port Name,Location Address,Field Operation Office,State,Coordinates,Latitude,Longitude
0,"Andrade - Class A, California - 2502","235 Andrade Road Winterhaven, CA 92283 United ...",San Diego,CA,"(32.7184216, -114.727919)",32.718422,-114.727919
1,"Calexico East - Class A, California - 2507","1699 East Carr, Rd Bldg A Calexico, CA 92231 U...",San Diego,CA,"(32.6753413, -115.3887997)",32.675341,-115.3888
2,"Calexico West - Class A, California - 2503","200 East First Street Calexico, CA 92231 Unite...",San Diego,CA,"(38.88981295000001, -77.00902077737487)",38.889813,-77.009021
3,"Eureka, California - 2802","317 3rd Street, Suite 6 Eureka, CA 95501 Unite...",San Francisco,CA,"(40.8035929, -124.16821893938675)",40.803593,-124.168219
4,"Fresno (2803/2882), California - 2803","5177 E. Clinton Way Fresno, CA 93727 United St...",San Francisco,CA,"(36.7722997, -119.7316224)",36.7723,-119.731622
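
In the output above, row 2 (Calexico West) was geocoded to roughly (38.89, -77.01), which is far outside California, so some results need checking. One simple sanity check is to flag coordinates that fall outside a bounding box around the four states. The limits below are rough, assumed values chosen to enclose CA, AZ, NM, and TX with some margin, and `in_southwest_box` is a hypothetical helper name.

```python
# Rough bounding box around CA, AZ, NM and TX (approximate, assumed limits)
LAT_MIN, LAT_MAX = 25.5, 42.5
LON_MIN, LON_MAX = -125.0, -93.0

def in_southwest_box(lat, lon):
    """True when (lat, lon) lies inside the rough four-state bounding box."""
    if lat is None or lon is None:
        return False
    # Comparisons with NaN are False, so missing values are rejected as well
    return LAT_MIN <= lat <= LAT_MAX and LON_MIN <= lon <= LON_MAX
```

Rows where `in_southwest_box(row['Latitude'], row['Longitude'])` is false could then be reviewed and geocoded again by hand.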


In [17]:
# Path of the output CSV file
path = "/Users/pablouriarte/Documents/1. Expediente Tec de Monterrey/1.Tesis/Mapa_Migracion_Irregular_Mexico/1. distribution_Institucional/3. CBP Entry Ports/cbp_entry_ports_loc.csv"

# Save the DataFrame as CSV
df_sb_all.to_csv(path, index=False)

print("File saved successfully to:", path)


File saved successfully to: /Users/pablouriarte/Documents/1. Expediente Tec de Monterrey/1.Tesis/Mapa_Migracion_Irregular_Mexico/1. distribution_Institucional/3. CBP Entry Ports/cbp_entry_ports_loc.csv
