# Project Group - 25

Members: Yun-An LIN (Jackie), Rohan Menezes, John Kuttikat, Muhammad Rizki Ziarieputra (Kiki), Ian Trout 

Student numbers: 5841682, 5850908, 5765382, 5848113, 5851483

# Research Objective

*Requires data modeling and quantitative research in Transport, Infrastructure & Logistics*

Vessel time spent in ports by country before and during COVID: an analysis by ship category showing the impacts of COVID.

# Contribution Statement

*Be specific. Some of the tasks can be coding (expect everyone to do this), background research, conceptualisation, visualisation, data analysis, data modelling*

**Author 1**: coding, background research, conceptualisation

**Author 2**: coding, visualisation

**Author 3**: coding, data analysis
    
**Author 4**: coding, data modelling

**Author 5**: coding, visualisation

# Data Used

COVID data (https://data.humdata.org/dataset/coronavirus-covid-19-cases-and-deaths)

Port data (https://unctadstat.unctad.org/wds/TableViewer/tableView.aspx?ReportId=170027)

Total cargo loaded/unloaded by region from 1970 to 2020 (https://www.kaggle.com/datasets/illiaparfeniuk/maritime-trading-volumes)

Total amount of goods imported and exported by ship per EU country (https://ec.europa.eu/eurostat/databrowser/view/ttr00009/default/map?lang=en)

# Data Pipeline

Take only the last 6 months of each year (a limitation of the maritime data).

Convert the maritime data:

1) to a common volume
2) calculate the average volume for all cargo types
3) consolidate the data into regions of the world

Convert the COVID data:

1) calculate the average vaccination count for each country that reports it
2) calculate the average COVID cases per country for the last 6 months of every year (July to December)

Analyze port call times for 2018 and 2019 against 2020 to see the difference with COVID, and calculate the differences.

Compare the 2020 and 2021 port call times to see whether they have improved or whether port calls are still slow.

Visually show the change in port call times by region of the world by year.
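The comparison steps above can be sketched as follows. This is a minimal illustration on made-up numbers, not the project's data: the toy values, the names `toy`, `baseline`, `covid_years` and `comparison`, and the choice to average 2018 and 2019 into a single pre-COVID baseline are all assumptions.

```python
import pandas as pd

# Toy data (hypothetical values): median time in port per region and year
toy = pd.DataFrame({
    "Region": ["Asia"] * 4 + ["Europe"] * 4,
    "Year": [2018, 2019, 2020, 2021] * 2,
    "Median time in port (days)": [1.0, 1.2, 1.5, 1.3, 0.8, 0.8, 1.0, 1.1],
})
col = "Median time in port (days)"

# Pre-COVID baseline: mean of 2018 and 2019 per region (an assumed definition)
baseline = toy[toy["Year"].isin([2018, 2019])].groupby("Region")[col].mean()

# One column per COVID year, indexed by region
covid_years = toy[toy["Year"].isin([2020, 2021])].pivot(
    index="Region", columns="Year", values=col
)

# Differences: 2020 against the baseline, and 2021 against 2020
comparison = pd.DataFrame({
    "baseline": baseline,
    "diff_2020": covid_years[2020] - baseline,
    "diff_2021_vs_2020": covid_years[2021] - covid_years[2020],
})
print(comparison.round(2))
```

A positive `diff_2020` means ships spent longer in port in 2020 than before COVID; a negative `diff_2021_vs_2020` means port times recovered in 2021.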



# Part I

In [67]:
import datetime
import itertools
import json
import math

import chardet
import geopandas as gpd
import geoplot
import geoplot.crs as gcrs
import numpy as np
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio
import scipy
from plotly.offline import init_notebook_mode
from plotly.subplots import make_subplots
from scipy.signal import find_peaks

init_notebook_mode(connected=True)
pio.renderers.default = "plotly_mimetype+notebook"

In [68]:
file_path = r"/Users/iantrout/TIL6022-group_project/Data/Maritime data/US_PortCalls_S_ST202209220924_v1.csv"
with open(file_path, 'rb') as rawdata:
    result = chardet.detect(rawdata.read(100000))
result

{'encoding': 'utf-8', 'confidence': 0.99, 'language': ''}

In [69]:
# Import and tidy the port-call data

df_ports = pd.read_csv(file_path, encoding='utf-8')
# Replace the three-space separator in the period labels with a hyphen (giving e.g. 'S1-2018')
df_ports['Period Label'] = df_ports['Period Label'].str.replace('   ', '-')
# Drop the code columns and the footnote columns, which are not used in the analysis
df_ports = df_ports.drop(columns=['Period', 'Frequency', 'Frequency Label', 'Economy', 
                                      'CommercialMarket', 'Median time in port (days) Footnote',
                                      'Average age of vessels Footnote', 'Average size (GT) of vessels Footnote',
                                      'Maximum size (GT) of vessels Footnote', 'Average cargo carrying capacity (dwt) per vessel Footnote',
                                      'Maximum cargo carrying capacity (dwt) of vessels Footnote','Average container carrying capacity (TEU) per container ship Footnote',
                                      'Maximum container carrying capacity (TEU) of container ships Footnote'])
# Use shorter names for the two label columns
df_ports.rename(columns={'Economy Label': 'Location', 'CommercialMarket Label': 'Vessel_Type'}, inplace=True)

df_ports.head()

Unnamed: 0,Period Label,Year,Location,Vessel_Type,Median time in port (days),Average age of vessels,Average size (GT) of vessels,Maximum size (GT) of vessels,Average cargo carrying capacity (dwt) per vessel,Maximum cargo carrying capacity (dwt) of vessels,Average container carrying capacity (TEU) per container ship,Maximum container carrying capacity (TEU) of container ships
0,S1-2018,2018,World,All ships,0.97,18,15222,234006,24074.0,441561.0,3526.0,21413.0
1,S1-2018,2018,World,Passenger ships,,21,8978,228081,,,,
2,S1-2018,2018,World,Liquid bulk carriers,0.94,13,15470,234006,26871.0,441561.0,,
3,S1-2018,2018,World,Container ships,0.69,13,38405,217673,,,3526.0,21413.0
4,S1-2018,2018,World,Dry breakbulk carriers,1.12,19,5455,91784,7413.0,138743.0,,


In [70]:
df_ports.to_csv(r'/Users/iantrout/TIL6022-group_project/updated_port_info.csv')

In [71]:
geodata = gpd.read_file("/Users/iantrout/TIL6022-group_project/Data/countries.geojson") # geojson file
geodata.rename(columns = {'ADMIN': 'Location', }, inplace=True)
geodata.head()

Unnamed: 0,Location,ISO_A3,geometry
0,Aruba,ABW,"POLYGON ((-69.99694 12.57758, -69.93639 12.531..."
1,Afghanistan,AFG,"POLYGON ((71.04980 38.40866, 71.05714 38.40903..."
2,Angola,AGO,"MULTIPOLYGON (((11.73752 -16.69258, 11.73851 -..."
3,Anguilla,AIA,"MULTIPOLYGON (((-63.03767 18.21296, -63.09952 ..."
4,Albania,ALB,"POLYGON ((19.74777 42.57890, 19.74601 42.57993..."


In [72]:
geodata.to_file("/Users/iantrout/TIL6022-group_project/Data/countries.geojson", driver="GeoJSON")


In [73]:
# Merge the two dataframes, using the 'Location' column as key
geo_port = pd.merge(geodata, df_ports, on = 'Location')

geo_port.head()


Unnamed: 0,Location,ISO_A3,geometry,Period Label,Year,Vessel_Type,Median time in port (days),Average age of vessels,Average size (GT) of vessels,Maximum size (GT) of vessels,Average cargo carrying capacity (dwt) per vessel,Maximum cargo carrying capacity (dwt) of vessels,Average container carrying capacity (TEU) per container ship,Maximum container carrying capacity (TEU) of container ships
0,Australia,AUS,"MULTIPOLYGON (((158.86573 -54.74993, 158.83823...",S1-2018,2018,All ships,1.49,19,25686,168666,78572.0,299688.0,4263.0,8084.0
1,Australia,AUS,"MULTIPOLYGON (((158.86573 -54.74993, 158.83823...",S1-2018,2018,Passenger ships,,27,5105,168666,,,,
2,Australia,AUS,"MULTIPOLYGON (((158.86573 -54.74993, 158.83823...",S1-2018,2018,Liquid bulk carriers,1.34,7,23585,85496,40187.0,166447.0,,
3,Australia,AUS,"MULTIPOLYGON (((158.86573 -54.74993, 158.83823...",S1-2018,2018,Container ships,1.19,12,46778,90449,,,4263.0,8084.0
4,Australia,AUS,"MULTIPOLYGON (((158.86573 -54.74993, 158.83823...",S1-2018,2018,Dry breakbulk carriers,1.69,12,15417,54529,21345.0,80500.0,,
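Because the merge above is an inner join on `Location`, any row whose name does not appear in both tables is dropped silently. For example, the `World` aggregate in the port data has no polygon in the country file. A minimal sketch of how such rows could be listed, using pandas' `indicator=True` option on toy frames; the names `geo_toy` and `ports_toy` and the spelling mismatch for the United States are hypothetical and only serve the illustration.

```python
import pandas as pd

# Toy frames (hypothetical): the port table spells one country differently
geo_toy = pd.DataFrame({
    "Location": ["Australia", "United States of America"],
    "ISO_A3": ["AUS", "USA"],
})
ports_toy = pd.DataFrame({
    "Location": ["Australia", "United States", "World"],
    "Median time in port (days)": [1.49, 1.46, 0.97],
})

# An outer merge with indicator=True adds a '_merge' column saying
# whether each row was found in the left table, the right table, or both
check = pd.merge(geo_toy, ports_toy, on="Location", how="outer", indicator=True)
unmatched = check.loc[check["_merge"] != "both", ["Location", "_merge"]]
print(unmatched)
```

Rows marked `left_only` or `right_only` are the ones an inner merge would discard, so they are worth checking before plotting.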


In [74]:
geo_port["Vessel_Type"].unique()

array(['All ships', 'Passenger ships', 'Liquid bulk carriers',
       'Container ships', 'Dry breakbulk carriers', 'Dry bulk carriers',
       'Roll-on/ roll-off ships', 'Liquefied petroleum gas carriers',
       'Liquefied natural gas carriers'], dtype=object)

In [75]:
geo_port_all_vessels= geo_port[
    (geo_port.Vessel_Type == 'All ships')
]

geo_port_all_vessels



Unnamed: 0,Location,ISO_A3,geometry,Period Label,Year,Vessel_Type,Median time in port (days),Average age of vessels,Average size (GT) of vessels,Maximum size (GT) of vessels,Average cargo carrying capacity (dwt) per vessel,Maximum cargo carrying capacity (dwt) of vessels,Average container carrying capacity (TEU) per container ship,Maximum container carrying capacity (TEU) of container ships
0,Australia,AUS,"MULTIPOLYGON (((158.86573 -54.74993, 158.83823...",S1-2018,2018,All ships,1.4900,19,25686,168666,78572.0,299688.0,4263.0,8084.0
9,Australia,AUS,"MULTIPOLYGON (((158.86573 -54.74993, 158.83823...",S2-2018,2018,All ships,1.4600,19,26220,168666,78398.0,299688.0,4454.0,8530.0
18,Australia,AUS,"MULTIPOLYGON (((158.86573 -54.74993, 158.83823...",S1-2019,2019,All ships,1.4014,20,24755,168666,78064.0,299688.0,4465.0,8500.0
27,Australia,AUS,"MULTIPOLYGON (((158.86573 -54.74993, 158.83823...",S2-2019,2019,All ships,1.4583,20,25938,168666,79711.0,299688.0,4553.0,9971.0
36,Australia,AUS,"MULTIPOLYGON (((158.86573 -54.74993, 158.83823...",S1-2020,2020,All ships,1.4889,21,25790,169379,81834.0,299688.0,4653.0,9572.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1264,United States of America,USA,"MULTIPOLYGON (((-155.60652 20.13796, -155.5863...",S1-2020,2020,All ships,1.4625,24,16172,232618,47874.0,323183.0,5237.0,22000.0
1273,United States of America,USA,"MULTIPOLYGON (((-155.60652 20.13796, -155.5863...",S2-2020,2020,All ships,1.4438,25,15199,228741,46430.0,322829.0,5508.0,21237.0
1282,United States of America,USA,"MULTIPOLYGON (((-155.60652 20.13796, -155.5863...",S1-2021,2021,All ships,1.5639,26,15219,228741,47527.0,322829.0,5503.0,21237.0
1291,United States of America,USA,"MULTIPOLYGON (((-155.60652 20.13796, -155.5863...",S2-2021,2021,All ships,1.5861,26,15853,228081,47932.0,321300.0,5328.0,19273.0


In [28]:
# Animated world map of the median time in port for all ships, one frame per half-year
fig = px.choropleth(geo_port_all_vessels, locations="ISO_A3",
                    color="Median time in port (days)", 
                    hover_name="Location",
                    range_color=(0, 2),
                    animation_frame="Period Label",
                    color_continuous_scale=px.colors.sequential.Plasma)
fig.show()


In [None]:
# Merge the port data with a second dataset ('xx' is a placeholder path)
data1 = df_ports
data2 = pd.read_csv('xx')
  
# Outer merge keeps the locations that appear in only one of the two tables
output4 = pd.merge(data1, data2, 
                   on='Location', 
                   how='outer')
  
# displaying result
print(output4)

# Split 'Period Label' (e.g. 'S1-2018') into semester and year;
# the separator was already changed to '-' when the data was imported
new = df_ports['Period Label'].str.split('-', n = 1, expand = True)
df_ports['Semester'] = new[0]
df_ports['Year'] = new[1].astype(int)
# Put the period columns first; 'Period Label' is kept because later cells use it,
# and the code and footnote columns were already dropped at import
leading = ['Period Label', 'Year', 'Semester', 'Location', 'Vessel_Type']
df_ports = df_ports[leading + [c for c in df_ports.columns if c not in leading]]
df_ports

# Part II

In [56]:
# Variables from COVID data 
activity_1 = 'new cases'
activity_2 = 'new deaths'
activity_3 = 'cumulative cases'
activity_4 = 'cumulative deaths'

# Variables from Maritime data 
activity_5 = 'Median time in port (days)'
#activity_6 = 'port index value'
#activity_7 = 'port calls'
activity_8 = 'Average age of vessels'
activity_10 = 'Average size (GT) of vessels'
activity_9 = 'Vessel_Type'

# Common variables
region_1 = 'Asia'
region_2 = 'Oceania'
region_3 = 'Europe'
region_4 = 'Africa'
region_5 = 'North America'
region_6 = 'South America'
world_story = [region_1, region_2, region_3, region_4, region_5, region_6]


activities_story_1 = [activity_8, activity_5, activity_10]
#activities_story_2 = [activity_1, activity_2, activity_6]
#activities_story_3 = [activity_5, activity_2]


In [57]:
# First, I define a function that returns the positions of the local peaks of a column,
# i.e. the rows whose value is higher than both the previous and the next row
def data_highs(data, activity, **kwargs):

    diff_1 = data[activity].diff(periods = -1)  # value minus the next value
    diff_2 = data[activity].diff(periods = 1)   # value minus the previous value
    
    peaks = []
    for i in range(len(diff_1)):
        if diff_1.iloc[i] > 0 and diff_2.iloc[i] > 0:
            peaks.append(int(i))          
            
    return peaks

# And do the same for the valleys (value lower than both neighbours)
def data_lows(data, activity, **kwargs):

    diff_1 = data[activity].diff(periods = -1)
    diff_2 = data[activity].diff(periods = 1)

    valleys = []
    for i in range(len(diff_1)):
        if diff_1.iloc[i] < 0 and diff_2.iloc[i] < 0:
            valleys.append(int(i))          
            
    return valleys
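The neighbour-difference test used in the two functions above can be cross-checked against `scipy.signal.find_peaks`, which is already imported in this notebook. The sketch below uses a made-up series; `toy` and `local_peaks` are hypothetical names, and `local_peaks` is a compact restatement of the same idea, not the notebook's own function.

```python
import pandas as pd
from scipy.signal import find_peaks

# Toy series (hypothetical values) with strict peaks at positions 1 and 3
toy = pd.DataFrame({"value": [1.0, 3.0, 2.0, 5.0, 4.0]})

def local_peaks(data, column):
    """Positions whose value is strictly greater than both neighbours."""
    to_next = data[column].diff(periods=-1)  # x[i] - x[i+1]
    to_prev = data[column].diff(periods=1)   # x[i] - x[i-1]
    # Comparisons with NaN are False, so the first and last rows are never peaks
    return [i for i in range(len(data))
            if to_next.iloc[i] > 0 and to_prev.iloc[i] > 0]

manual = local_peaks(toy, "value")

# find_peaks returns (indices, properties); valleys are the peaks of the negated series
scipy_peaks, _ = find_peaks(toy["value"].to_numpy())
scipy_valleys, _ = find_peaks(-toy["value"].to_numpy())

print(manual, scipy_peaks.tolist(), scipy_valleys.tolist())
```

On series without flat stretches the two approaches agree. With plateaus they can differ, because the strict comparison above rejects equal neighbours.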

In [63]:
# Then I start the figure and create several dictionaries that are necessary. The peaks and valleys dictionaries are for the graphs and the date dictionaries are for the next steps
fig_1 = go.Figure()

peaks_dict_1 = {}
valleys_dict_1 = {}
peaks_date_dict_1 = {}
valleys_date_dict_1 = {}

# I create a dataframe that contains only the data for the selected country and reset the indices for it
geo_port_all_vessels = geo_port[(geo_port.Vessel_Type == 'All ships')]
geo_port_all_vessels = geo_port_all_vessels[(geo_port_all_vessels.Location == 'Australia')]
geo_port_all_vessels.reset_index(inplace=True)

# I find the peaks and valleys and add them to the dictionaries
for activity in activities_story_1:
    max_ind = data_highs(geo_port_all_vessels, activity)
    peaks_dict_1[activity]=max_ind

    min_ind = data_lows(geo_port_all_vessels,activity)
    valleys_dict_1[activity]=min_ind
    
    # Then I turn them into dataframes to be able to use the dates for the graphs, and for the date dictionaries
    df_max_1 = geo_port_all_vessels.iloc[max_ind]
    df_min_1 = geo_port_all_vessels.iloc[min_ind]

    # The date dictionaries are filled with the dates of the peaks and the valleys
    peaks_date_dict_1[activity] = df_max_1['Period Label']
    valleys_date_dict_1[activity] = df_min_1['Period Label']
    
    #The graphs are formatted 
    x1 = geo_port_all_vessels['Period Label']
    y1 = geo_port_all_vessels[activity]
    x2 = df_max_1['Period Label']
    y2 = df_max_1[activity]
    x3 = df_min_1['Period Label']
    y3 = df_min_1[activity]
    fig_1.add_trace(go.Scatter(x=x1,y=y1,name=activity))
    fig_1.add_trace(go.Scatter(x=x2,y=y2,mode='markers',name='peaks ' + activity))
    fig_1.add_trace(go.Scatter(x=x3,y=y3,mode='markers',name='valleys ' + activity))

fig_1.update_layout(title='Vessel port time, age, and size during COVID times in Australia')
fig_1.show()

In [66]:
# One subplot per variable; the markers use the peak and valley indices stored per variable,
# so each row is marked with its own extrema
fig_2 = make_subplots(rows=3,cols=1)
x_all = geo_port_all_vessels['Period Label']

# (variable, index dictionary, marker label) for rows 1 to 3
panels = [(activity_5, peaks_dict_1, 'peaks '),
          (activity_8, valleys_dict_1, 'valleys '),
          (activity_10, peaks_dict_1, 'peaks ')]

for row, (column, index_dict, label) in enumerate(panels, start=1):
    marked = geo_port_all_vessels.iloc[index_dict[column]]
    fig_2.add_trace(go.Scatter(x=x_all,y=geo_port_all_vessels[column],name=column),row=row,col=1)
    fig_2.add_trace(go.Scatter(x=marked['Period Label'],y=marked[column],mode='markers',name=label + column),row=row,col=1)

fig_2.update_layout(title='Trends in vessel port time, age, and size through the years')

fig_2.show()

The rate of change of the lines in Part II is the comparison factor: we compare the period before COVID with the period after it began.
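One way to turn this rate of change into a number is the percentage change between consecutive half-years, averaged separately for the periods before 2020 and from 2020 onward. The sketch below is only an illustration on made-up values; the names `toy`, `rate`, `pre_covid` and `covid`, and the choice of 2020 as the cut-off year, are assumptions.

```python
import pandas as pd

# Toy series (hypothetical values): median time in port for one location
toy = pd.Series(
    [1.0, 1.0, 1.0, 1.0, 1.1, 1.21],
    index=["S1-2018", "S2-2018", "S1-2019", "S2-2019", "S1-2020", "S2-2020"],
    name="Median time in port (days)",
)

# Percentage change from one half-year to the next (the first value is NaN)
rate = toy.pct_change() * 100

# Take the year from labels such as 'S1-2018' and split at 2020
years = rate.index.str[-4:].astype(int)
pre_covid = rate[years < 2020].mean()   # mean() skips the leading NaN
covid = rate[years >= 2020].mean()

print(round(pre_covid, 2), round(covid, 2))
```

A clearly higher average rate in the COVID period than before would indicate that port times were rising faster once the pandemic started.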

# Part III - Data visualisation

For this last part, we visually show the effect that COVID had on vessel times, so that users can see how ports, and through them the logistics system as a whole, have been impacted. We do this by:

1) looking at regions and the semi-annual trend by vessel type
2) a pie chart showing the proportions of the commodities shipped
3) a world map showing the change in port call times over the years
4) comparing COVID high periods and low periods against port call times
5) interpreting the results

First, we show our variables for this part.

We want to show the COVID data together with the port time (worldwide).

In [None]:
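# First, I'll make a graph of the worldwide median time in port for all ships
fig_5 = go.Figure()

world_all_ships = df_ports[(df_ports.Location == 'World') & (df_ports.Vessel_Type == 'All ships')]
x1 = world_all_ships['Period Label']
y1 = world_all_ships['Median time in port (days)']
fig_5.add_trace(go.Scatter(x=x1,y=y1, name=activity_5))


fig_5.update_layout(title='Median time in port worldwide')
fig_5.show()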
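To put COVID data on the same axis as the port data, the daily case counts first have to be aggregated to the half-year labels used in `Period Label` (for example `S2-2020`). A minimal sketch on made-up numbers follows; the frame `covid_toy`, its column names, and the assumption that S1 covers January to June and S2 covers July to December are hypothetical.

```python
import pandas as pd

# Toy daily COVID data (hypothetical values and column names)
covid_toy = pd.DataFrame({
    "date": pd.to_datetime(
        ["2020-03-01", "2020-06-30", "2020-07-01", "2020-12-31", "2021-01-15"]
    ),
    "new cases": [10, 30, 100, 300, 50],
})

# Assumed convention: S1 = January to June, S2 = July to December
semester = covid_toy["date"].dt.month.le(6).map({True: "S1", False: "S2"})
covid_toy["Period Label"] = semester + "-" + covid_toy["date"].dt.year.astype(str)

# Average new cases per half-year, keeping chronological order
covid_by_period = (
    covid_toy.groupby("Period Label", sort=False)["new cases"].mean().reset_index()
)
print(covid_by_period)
```

The resulting table has one row per half-year and could then be merged with the port data on `Period Label`.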

Next we show, for each half-year from 2018 onward, the median time in port by location.

In [11]:
fig = px.histogram(df_ports, y="Location", x="Median time in port (days)", orientation="h",
                   animation_frame="Period Label",
                   color="Location")
fig.update_yaxes(categoryorder='sum ascending')

fig.show()

In [None]:
# create figure
fig = go.Figure()

# Add surface trace
fig.add_trace(go.Surface(z=.values.tolist(), colorscale="Viridis"))

# Update plot sizing
fig.update_layout(
    width=800,
    height=900,
    autosize=False,
    margin=dict(t=0, b=0, l=0, r=0),
    template="plotly_white",
)

# Add dropdown
fig.update_layout(
    updatemenus=[
        dict(
            buttons=list([
                dict(
                    args=["type", "surface"],
                    label="Asia",
                    method="restyle"
                ),
                dict(
                    args=["type", "heatmap"],
                    label="America",
                    method="restyle"
                ),
                dict(
                    args=["type", "heatmap"],
                    label="Africa",
                    method="restyle"
                )
            ]),
            direction="down",
            pad={"r": 10, "t": 10},
            showactive=True,
            x=0.1,
            xanchor="left",
            y=1.1,
            yanchor="top"
        ),
    ]
)

# Add annotation
fig.update_layout(
    annotations=[
        dict(text="Countries:", showarrow=False,
        x=0, y=1.085, yref="paper", align="left")
    ]
)

fig.show()

In [None]:
pie = px.pie(df_new, values="occurance", names="Sectors", title="sector wise composition")
pie.show()
#https://www.youtube.com/watch?v=s_iEvTBSBfA
sunburst=px.sunburst(df_new, path=['Sectors', 'regions'],values='volume transported')
sunburst.show()