# Ocean Carrier Alliances - Data Analysis for USDA Report

This notebook analyses maritime container shipping data to assess how strategic alliances between ocean freight carriers (Ocean Carrier Alliances, OCAs) affect the containerized agricultural export market in the US. Data are pre-processed in the oca_data_prep notebook; see the [project repo](https://github.com/epistemetrica/Ocean-Carrier-Alliances-Project) for full details.

In [1]:
#preliminaries
import numpy as np
import pandas as pd
import polars as pl
import plotly.express as px
import statsmodels.api as sm

#enable string cache for polars categoricals
pl.enable_string_cache()
#display settings
pd.set_option('display.max_columns', None)

In [2]:
#load data
main_lf = (
    #load parquet files to polars lazyframe 
    pl.scan_parquet('../data/main/*.parquet')
    #choose only exports
    .filter(pl.col('direction')=='export')
    #drop unneeded columns
    .drop('origin_territory', 'origin_region', 'direction', 'carrier_name', 
          'carrier_scac', 'vessel_name', 'voyage_number')
    #rename cols for consistency/sanity
    .rename({'unified_carrier_name':'carrier_name',
             'unified_carrier_scac':'scac',
             'arrival_port_name':'dest_port_name',
             'arrival_port_code':'dest_port_code',
             'departure_port_name':'origin_port_name',
             'departure_port_code':'origin_port_code',
             'pc_alliance':'owner_alliance'})
    #reorder columns for sanity
    .select('bol_id', 'date', 'month', 'year', 'teus', 'hs_code', 'lane_id', 
            'lane_name', 'origin_port_name', 'origin_port_code', 'coast_region', 
            'dest_port_name', 'dest_port_code', 'dest_territory', 'dest_region',  
            'drewery_lane', 'dist', 'rate_40', 'rate_20', 'scac', 'carrier_name', 
            'alliance', 'alliance_member', 'vessel_id', 'vessel_capacity', 
            'vessel_owner', 'owner_alliance', 'primary_cargo', 'shared_teus', 
            'cargo_source')
)

## Data Summary

The main dataset holds a unique bill of lading (BOL) on each row and includes data from various sources in the columns described below.  

### Column Definitions

- bol_id - unique identifier of each BOL
- date, month, year - departure date of the export
- teus - volume of the shipment in twenty-foot equivalent units (TEUs)
- hs_code - list of commodity codes included in the shipment
- lane_id - unique identifier of the lane (a combination of origin and destination port codes)
- lane_name - name of the lane
- origin_port_name - name of the US port of origin
- origin_port_code - CBP code for the port of origin
- coast_region - US coastal region of origin
- dest_port_name - name of the foreign destination port
- dest_port_code - CBP code for the foreign destination port
- dest_territory - name of country or territory of destination
- dest_region - global region of destination
- drewery_lane - Drewry lane matched to the lane_id using summed haversine distances between the origin/destination port pairs
- dist - summed haversine distance between the origin/destination port pairs
- rate_40 - Drewry CFRI rate index for 40-ft containers at the time the cargo was carried
- rate_20 - Drewry CFRI rate index for 20-ft containers (TEU) at the time the cargo was carried
- scac - Standard Carrier Alpha Code (SCAC) of the firm that carried the cargo
- carrier_name - name of the carrier
- alliance - the alliance to which the carrier belonged when the cargo was sold (NOTE this needs to be updated when we input regional alliance data)
- alliance_member - boolean for whether or not the carrier was a member of any alliance when that cargo was carried
- vessel_id - International Maritime Organization (IMO) code uniquely identifying the vessel
- vessel_capacity - metric of how many TEU can be carried by the vessel at any given time (computed from Net Register Tonnage)
- vessel_owner - the carrier accounting for the most TEUs carried by that vessel during the month in which the BOL was carried
- owner_alliance - the OCA to which the vessel_owner belonged at the time
- primary_cargo - boolean indicating whether the cargo came from the vessel_owner or not
- shared_teus - number of TEUs when the cargo did not come from the vessel_owner (identical to 'teus' when primary_cargo is False, and 0 otherwise)
- cargo_source - indicator for whether the cargo came from a member of the vessel_owner's alliance or not. 
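
The `drewery_lane` and `dist` definitions rely on haversine (great-circle) distance. The matching itself is done in the data prep notebook and is not shown here, so the following is only a minimal sketch of the distance formula together with one plausible reading of "summed": the origin-to-origin distance plus the destination-to-destination distance between a lane and a candidate reference lane. The function names, the Earth radius, the kilometre unit, and all coordinates are assumptions for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; the unit used in the dataset is an assumption

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # clamp to 1.0 to guard against floating-point overshoot before asin
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))

def summed_distance(lane, ref_lane):
    """Origin-to-origin plus destination-to-destination distance between two lanes.

    Each lane is ((origin_lat, origin_lon), (dest_lat, dest_lon)).
    """
    (o, d), (ro, rd) = lane, ref_lane
    return haversine_km(*o, *ro) + haversine_km(*d, *rd)

def closest_lane(lane, ref_lanes):
    """Name of the reference lane with the smallest summed distance."""
    return min(ref_lanes, key=lambda name: summed_distance(lane, ref_lanes[name]))

# made-up coordinates, not taken from the dataset
lane = ((29.7, -95.0), (18.0, -76.8))
ref_lanes = {
    "ref_a": ((29.7, -95.3), (10.4, -75.5)),
    "ref_b": ((29.7, -95.3), (44.4, 8.9)),
}
best = closest_lane(lane, ref_lanes)
```

Under this reading, a small summed distance means both ends of the lane sit close to the corresponding ends of the reference lane, which is why the sum can serve as a matching score.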


### Summary stats and head

In [3]:
desc = main_lf.describe()
desc.write_csv('../data/misc/usda_maindata_summarystat.csv')
display(desc)
main_lf.limit(5).collect()

statistic,bol_id,date,month,year,teus,hs_code,lane_id,lane_name,origin_port_name,origin_port_code,coast_region,dest_port_name,dest_port_code,dest_territory,dest_region,drewery_lane,dist,rate_40,rate_20,scac,carrier_name,alliance,alliance_member,vessel_id,vessel_capacity,vessel_owner,owner_alliance,primary_cargo,shared_teus,cargo_source
str,str,str,str,f64,f64,str,str,str,str,str,str,str,str,str,str,str,f64,f64,f64,str,str,str,f64,f64,f64,str,str,f64,f64,str
"""count""","""63737316""","""63737242""","""63737317""",63737317.0,63737317.0,"""63735980""","""63737317""","""63737317""","""63737317""","""63737317""","""63737173""","""63737317""","""63737317""","""63734881""","""63734881""","""63734875""",63734875.0,35720688.0,35720688.0,"""63737317""","""63696714""","""63737317""",63737317.0,63737317.0,59237959.0,"""63737317""","""63737317""",63737317.0,63737317.0,"""63737317"""
"""null_count""","""1""","""75""","""0""",0.0,0.0,"""1337""","""0""","""0""","""0""","""0""","""144""","""0""","""0""","""2436""","""2436""","""2442""",2442.0,28016629.0,28016629.0,"""0""","""40603""","""0""",0.0,0.0,4499358.0,"""0""","""0""",0.0,0.0,"""0"""
"""mean""",,"""2015-12-20 00:00:26.571000""",,2015.465141,3.220729,,,,,,,,,,,,1436.760648,1671.20566,1279.975398,,,,0.301567,9231900.0,2160.482837,,,0.702031,1.040772,
"""std""",,,,4.741281,5.982663,,,,,,,,,,,,1194.145132,793.815448,541.86681,,,,,474503.072275,1499.188461,,,,3.692474,
"""min""","""079A_26004878070""","""2007-01-01""","""200701""",2007.0,0.01,"""-1""",,,,,,,,,,,5.93014,400.0,310.0,,,"""2M Alliance""",0.0,196.0,0.0,,"""2M Alliance""",0.0,0.0,"""ally"""
"""25%""",,"""2012-02-27""",,2012.0,2.0,,,,,,,,,,,,425.931559,1100.0,880.0,,,,,9218686.0,905.147059,,,,0.0,
"""50%""",,"""2016-02-20""",,2016.0,2.533158,,,,,,,,,,,,1261.373246,1590.0,1230.0,,,,,9315202.0,2036.911765,,,,0.0,
"""75%""",,"""2019-12-13""",,2019.0,2.533158,,,,,,,,,,,,2377.521062,2020.0,1530.0,,,,,9430868.0,3253.161765,,,,2.0,
"""max""","""zzzz_ZZZZ""","""2023-12-31""","""202312""",2023.0,3729.25,"""ddedo""",,,,,,,,,,,8769.910108,15900.0,12070.0,,,"""The Alliance""",1.0,9979125.0,17889.705882,,"""The Alliance""",1.0,1123.25,"""non-ally"""
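
The `count` and `null_count` rows above can be turned into missing-value shares, which makes the coverage gaps easier to compare across columns. The sketch below copies a few figures from the table by hand; the dictionary and function names are illustrative only.

```python
# (count, null_count) pairs copied from the summary table above
coverage = {
    "rate_40": (35720688, 28016629),
    "vessel_capacity": (59237959, 4499358),
    "dist": (63734875, 2442),
}

def null_share(count: int, null_count: int) -> float:
    """Fraction of rows that are null for a column."""
    return null_count / (count + null_count)

for col, (count, nulls) in coverage.items():
    print(f"{col}: {null_share(count, nulls):.2%} missing")
```

For each of these columns, `count` plus `null_count` equals the total number of rows, 63,737,317.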


bol_id,date,month,year,teus,hs_code,lane_id,lane_name,origin_port_name,origin_port_code,coast_region,dest_port_name,dest_port_code,dest_territory,dest_region,drewery_lane,dist,rate_40,rate_20,scac,carrier_name,alliance,alliance_member,vessel_id,vessel_capacity,vessel_owner,owner_alliance,primary_cargo,shared_teus,cargo_source
str,date,str,i32,f64,str,cat,cat,cat,cat,cat,cat,cat,cat,cat,cat,f64,f64,f64,cat,cat,str,bool,i32,f64,cat,str,bool,f64,str
"""ZIML_IMUORF177673""",2007-01-01,"""200701""",2007,2.533158,"""390120""","""5301_24128""","""Houston — Kingston""","""HOUSTON""","""5301""","""GULF""","""KINGSTON""","""24128""","""JAMAICA""","""CARIBBEAN""","""US Gulf Coast (Houston) to Col…",860.905655,,,"""ZIMU""","""ZIM CONTAINER""","""Non-alliance Carriers""",False,9008562,,"""ZIMU""","""Non-alliance Carriers""",True,0.0,"""non-ally"""
"""MLSL_800184208""",2007-01-01,"""200701""",2007,2.533158,"""901890""","""5301_47563""","""Houston — Cagliari""","""HOUSTON""","""5301""","""GULF""","""CAGLIARI""","""47563""","""ITALY""","""MEDITERRANEAN""","""US Gulf Coast (Houston) to Wes…",585.05394,,,"""MAEU""","""MAERSK LINE""","""Non-alliance Carriers""",False,8212702,,"""MAEU""","""Non-alliance Carriers""",True,0.0,"""non-ally"""
"""MLSL_511755218""",2007-01-01,"""200701""",2007,2.533158,"""080290""","""5301_47563""","""Houston — Cagliari""","""HOUSTON""","""5301""","""GULF""","""CAGLIARI""","""47563""","""ITALY""","""MEDITERRANEAN""","""US Gulf Coast (Houston) to Wes…",585.05394,,,"""MAEU""","""MAERSK LINE""","""Non-alliance Carriers""",False,8212702,,"""MAEU""","""Non-alliance Carriers""",True,0.0,"""non-ally"""
"""MLSL_851964547""",2007-01-01,"""200701""",2007,2.533158,"""732690""","""5301_47536""","""Houston — Gioia Tauro""","""HOUSTON""","""5301""","""GULF""","""GIOIA TAURO""","""47536""","""ITALY""","""MEDITERRANEAN""","""US Gulf Coast (Houston) to Wes…",890.01468,,,"""MAEU""","""MAERSK LINE""","""Non-alliance Carriers""",False,8212702,,"""MAEU""","""Non-alliance Carriers""",True,0.0,"""non-ally"""
"""MLSL_HOUM30858""",2007-01-01,"""200701""",2007,2.533158,"""401199""","""5301_47536""","""Houston — Gioia Tauro""","""HOUSTON""","""5301""","""GULF""","""GIOIA TAURO""","""47536""","""ITALY""","""MEDITERRANEAN""","""US Gulf Coast (Houston) to Wes…",890.01468,,,"""MAEU""","""MAERSK LINE""","""Non-alliance Carriers""",False,8212702,,"""MAEU""","""Non-alliance Carriers""",True,0.0,"""non-ally"""
