In [1]:
import pandas as pd
import plotly.express as px

mapbox_token = "pk.eyJ1IjoiemhvbmdqdW5tYSIsImEiOiJja2pwbzA3c2YwNjd3MnJxaDFrcHh5NmNyIn0.6mzTX-clHuP5h9lUO17TCw"
BORDER_SIZE = 10
DATA_DIR = '../blog-data'

# Port Congestion

We define two congestion indices:

Average Congestion Rate (ACR) --- A weighted average of the congestion rates for the top 50 container ports worldwide, with the weights determined by the number of ship visits to each port:

$$
ACR_t = \sum_{p \in \mathcal{P}} \left[ \frac{Delayed_{pt} + Undelayed_{pt}}{\sum_{p \in \mathcal{P}} \left( Delayed_{pt} + Undelayed_{pt} \right)} \cdot Congestion_{pt} \right],
$$ (acr)

where the congestion rate for each port $p \in \mathcal{P}$ is computed by dividing the number of delayed ship visits by the total number of ship visits:

$$
Congestion_{pt} \equiv \frac{Delayed_{pt}}{Delayed_{pt} + Undelayed_{pt}}, \ \forall p \in \mathcal{P}. 
$$ (congestion_rate_for_a_port)

$Delayed_{pt}$ and $Undelayed_{pt}$ represent the number of delayed and undelayed ship visits at port $p$ in month $t$, respectively. The full list of the top 50 container ports can be found using the following link: https://www.worldshipping.org/top-50-ports.
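As a small worked example of the ACR formula above, the sketch below uses three hypothetical ports with made-up visit counts (the port names and numbers are illustrative assumptions, not data from this project). It also shows that, because the weights are visit shares, the weighted average coincides with delayed visits divided by all visits across the ports.

```python
# Hypothetical monthly counts per port: (delayed visits, undelayed visits).
visits = {
    "Port A": (30, 70),
    "Port B": (10, 40),
    "Port C": (20, 30),
}

# Denominator of the weights: all ship visits across the ports in the month.
total_visits = sum(d + u for d, u in visits.values())


def congestion_rate(delayed, undelayed):
    """Share of a port's ship visits that were delayed."""
    return delayed / (delayed + undelayed)


# ACR: visit-share-weighted average of the port-level congestion rates.
acr = sum(
    (d + u) / total_visits * congestion_rate(d, u)
    for d, u in visits.values()
)

# Equivalent pooled form: delayed visits over all visits.
pooled = sum(d for d, _ in visits.values()) / total_visits

print(f"ACR = {acr:.2%}, pooled = {pooled:.2%}")
```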

Average Congestion Time (ACT) --- The average number of hours a containership waits in an anchorage area of a port before docking at a berth, weighted by the number of ship visits to the top 50 container ports worldwide:

$$
ACT_t = \sum_{p \in \mathcal{P}} \left[ \frac{Delayed_{pt} + Undelayed_{pt}}{\sum_{p \in \mathcal{P}} \left( Delayed_{pt} + Undelayed_{pt} \right)} \cdot \frac{DelayHours_{pt}}{Delayed_{pt} + Undelayed_{pt}} \right], 
$$ (ACT)

where $DelayHours_{pt}$ represents the total number of hours that containerships spend in the anchorage areas of port $p$ in month $t$.
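
Because each port's weight is its share of ship visits, the port-level visit counts cancel term by term in the ACT formula. The index therefore reduces to total anchorage hours divided by total ship visits across the ports:

$$
ACT_t = \frac{\sum_{p \in \mathcal{P}} DelayHours_{pt}}{\sum_{p \in \mathcal{P}} \left( Delayed_{pt} + Undelayed_{pt} \right)}.
$$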


Berth and anchorage areas in each port are identified with the IMA-DBSCAN algorithm (Iterative, Multi-Attribute, Density-Based Spatial Clustering of Applications with Noise). Details and pseudocode can be found in Bai *et al*. (2023){cite}`bai2023imadbscan` and Bai *et al*. (2023){cite}`bai2023supply`.
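
For readers unfamiliar with density-based clustering, the following is a minimal sketch of plain DBSCAN only. It is not IMA-DBSCAN, whose iterative and multi-attribute steps are described in the papers cited above. The coordinates and the `eps` and `min_pts` values are hypothetical, chosen purely to illustrate how dense groups of positions are separated from isolated ones.

```python
from math import hypot

NOISE = -1


def dbscan(points, eps, min_pts):
    """Plain DBSCAN on 2-D points; returns one label per point (-1 = noise)."""
    labels = [None] * len(points)

    def neighbours(i):
        # Indices of all points within eps of point i (including i itself).
        xi, yi = points[i]
        return [
            j for j, (xj, yj) in enumerate(points)
            if hypot(xi - xj, yi - yj) <= eps
        ]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not core
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbours(j)
            if len(nbrs) >= min_pts:  # j is a core point, so keep expanding
                queue.extend(nbrs)
    return labels


# Hypothetical positions: two dense groups and one isolated point.
offsets = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]
group_a = [(x, y) for x, y in offsets]
group_b = [(5.0 + x, 5.0 + y) for x, y in offsets]
points = group_a + group_b + [(10.0, 0.0)]

labels = dbscan(points, eps=0.2, min_pts=3)
print(labels)
```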

The monthly series of the ACR and ACT indices can be downloaded here: [Aggregated Congestion Indices](./data/congestion_indices.xlsx). Additionally, the disaggregated monthly series for the top 50 container ports are visualized in the following plot, where marker size scales with each port's ACT and the hover box reports its ACR, ACT, and AACT:

In [2]:
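# Load the port-level monthly congestion data (pipe-delimited file).
df = pd.read_csv(f"{DATA_DIR}/congestion_new_with_coords.csv", sep="|", header=0)
# Keep observations from 2017 onward.
df = df.loc[df["year"] >= 2017].copy()
df = df.round({"total_congestion_time": 1, "act": 2, "aact": 2, "acr": 4})
df = df.rename(columns={"act": "ACT", "aact": "AACT", "acr": "ACR"})
# Format ACR as a percentage string for display in the hover box.
df["ACR"] = df["ACR"].apply(lambda x: format(x, ".2%"))
# Build a "YYYY-MM" label per row; it drives the animation frames.
df["Time"] = pd.to_datetime(df[["year", "month"]].assign(day=1)).dt.strftime("%Y-%m")
# Rescale ACT into a marker-size column (the largest ACT maps to 40**2 / 2).
sizeref = 2.0 * df["ACT"].max() / (40**2)
df["size"] = df["ACT"] / sizeref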

In [3]:
df

,port_name,year,month,num_of_ship,num_of_delay,total_congestion_time,ACT,AACT,ACR,longitude,latitude,Time,size
1,Algeciras,2017,1,247,66,1477.9,5.98,22.39,26.72%,-5.436667,36.126111,2017-01,25.701085
2,Algeciras,2017,2,193,45,1526.6,7.91,33.92,23.32%,-5.436667,36.126111,2017-02,33.995917
3,Algeciras,2017,3,223,27,603.2,2.71,22.34,12.11%,-5.436667,36.126111,2017-03,11.647147
4,Algeciras,2017,4,249,28,692.7,2.78,24.74,11.24%,-5.436667,36.126111,2017-04,11.947996
5,Algeciras,2017,5,288,73,1756.4,6.10,24.06,25.35%,-5.436667,36.126111,2017-05,26.216826
...,...,...,...,...,...,...,...,...,...,...,...,...,...
3895,Yingkou,2023,6,39,16,401.7,10.30,25.11,41.03%,122.233333,40.683333,2023-06,44.267755
3896,Yingkou,2023,7,39,17,273.1,7.00,16.07,43.59%,122.233333,40.683333,2023-07,30.084882
3897,Yingkou,2023,8,36,13,163.0,4.53,12.54,36.11%,122.233333,40.683333,2023-08,19.469217
3898,Yingkou,2023,9,36,18,249.3,6.93,13.85,50.00%,122.233333,40.683333,2023-09,29.784034


In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 3854 entries, 1 to 3899
Data columns (total 13 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   port_name              3854 non-null   object 
 1   year                   3854 non-null   int64  
 2   month                  3854 non-null   int64  
 3   num_of_ship            3854 non-null   int64  
 4   num_of_delay           3854 non-null   int64  
 5   total_congestion_time  3854 non-null   float64
 6   ACT                    3854 non-null   float64
 7   AACT                   3785 non-null   float64
 8   ACR                    3854 non-null   object 
 9   longitude              3854 non-null   float64
 10  latitude               3854 non-null   float64
 11  Time                   3854 non-null   object 
 12  size                   3854 non-null   float64
dtypes: float64(6), int64(4), object(3)
memory usage: 421.5+ KB
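
The rows printed above appear consistent with simple per-port ratios: ACT as `total_congestion_time / num_of_ship`, AACT as `total_congestion_time / num_of_delay`, and ACR as `num_of_delay / num_of_ship`. The text does not define AACT, so this reading is an inference from the printed values rather than a documented definition. The sketch below rechecks it against three of the printed rows, using a small tolerance because `total_congestion_time` is itself rounded to one decimal place.

```python
# Rows copied from the printed table:
# (port, year, month, num_of_ship, num_of_delay, total_congestion_time, ACT, AACT, ACR)
rows = [
    ("Algeciras", 2017, 1, 247, 66, 1477.9, 5.98, 22.39, 0.2672),
    ("Algeciras", 2017, 2, 193, 45, 1526.6, 7.91, 33.92, 0.2332),
    ("Yingkou", 2023, 6, 39, 16, 401.7, 10.30, 25.11, 0.4103),
]


def port_metrics(num_of_ship, num_of_delay, total_congestion_time):
    """Recompute the three per-port ratios (assumed definitions, see above)."""
    act = total_congestion_time / num_of_ship
    # AACT is undefined when no visit was delayed; None stands in for a missing value.
    aact = total_congestion_time / num_of_delay if num_of_delay else None
    acr = num_of_delay / num_of_ship
    return act, aact, acr


checks = []
for port, year, month, ships, delays, hours, act, aact, acr in rows:
    r_act, r_aact, r_acr = port_metrics(ships, delays, hours)
    ok = (
        abs(r_act - act) < 0.01
        and abs(r_aact - aact) < 0.01
        and abs(r_acr - acr) < 0.0001
    )
    checks.append(ok)
    print(
        f"{port} {year}-{month:02d}: "
        f"ACT={r_act:.2f} AACT={r_aact:.2f} ACR={r_acr:.2%} match={ok}"
    )
```

Under this reading, AACT would be missing for any port-month with no delayed visits, which would also be consistent with `df.info()` reporting fewer non-null values for AACT (3785) than for the other columns (3854).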


In [5]:
fig = px.scatter_geo(
    df,
    lat="latitude",
    lon="longitude",
    # text=congestion_df['act'].round(1).astype(int),
    size="size",
    hover_name="port_name",
    hover_data={
        "ACR": True,
        "ACT": True,
        "AACT": True,
        "Time": False,
        "size": False,
        "latitude": False,
        "longitude": False,
    },
    animation_frame="Time",
    projection="natural earth",
    # scope='asia',
    # width=800,
    # height=600,
    opacity=1,
    size_max=40,
    title="Congestion of top 50 container ports from 2017-01 to 2023-10",
)
# fig.update_layout(
#     margin={"l": BORDER_SIZE, "r": BORDER_SIZE},
# )
fig.show()

## References

```{bibliography}
:filter: docname in docnames
```