# Project Group - 25

Members: Yun-An LIN (Jackie), Rohan Menezes, John Kuttikat, Muhammad Rizki Ziarieputra (Kiki), Ian Trout 

Student numbers: 5841682, 5850908, 5765382, 5848113, 5851483

# Research Objective

*Requires data modeling and quantitative research in Transport, Infrastructure & Logistics*

Vessel time spent in ports by country before and during COVID: an analysis by ship category showing the impacts of COVID.

# Contribution Statement

*Be specific. Some of the tasks can be coding (expect everyone to do this), background research, conceptualisation, visualisation, data analysis, data modelling*

**Author 1**: coding, background research, conceptualisation

**Author 2**: coding, visualisation

**Author 3**: coding, data analysis

**Author 4**: coding, data modelling

**Author 5**: coding, visualisation

# Data Used

- COVID data (https://data.humdata.org/dataset/coronavirus-covid-19-cases-and-deaths)

- Port data (https://unctadstat.unctad.org/wds/TableViewer/tableView.aspx?ReportId=170027)

- Total cargo loaded/unloaded by region from 1970 to 2020 (https://www.kaggle.com/datasets/illiaparfeniuk/maritime-trading-volumes)

- Total amount of goods imported and exported by ship per EU country (https://ec.europa.eu/eurostat/databrowser/view/ttr00009/default/map?lang=en)

# Data Pipeline

Take only the last 6 months of each year (a limitation of the maritime data).

Convert the maritime data:

1) to a common volume

2) calculate the average volume for all cargo types

3) consolidate the data into regions of the world

Convert the COVID cases:

1) calculate the average vaccination count per country that has reported it

2) calculate the average COVID cases per country for the last 6 months of every year (July to December)

Analyze port call times for 2018 and 2019 compared to 2020 to see the difference with COVID, and calculate the differences.

Compare the 2020 and 2021 port call times to see whether improvements have been made or port calls are still slow.

Visually show the change in port call times by region of the world by year.
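As a rough illustration of the COVID step in the pipeline, the sketch below keeps only July to December and averages new cases per country and year. The column names (`country`, `date`, `new_cases`), the helper `second_half_average` and the toy values are assumptions for illustration, not the project's actual data.

```python
import pandas as pd

# Toy daily records; values are made up for illustration only
covid = pd.DataFrame({
    "country": ["A", "A", "A", "B", "B", "B"],
    "date": pd.to_datetime(["2020-03-01", "2020-08-01", "2020-11-01",
                            "2020-06-30", "2020-07-01", "2020-12-31"]),
    "new_cases": [5, 10, 30, 100, 20, 40],
})

def second_half_average(df):
    """Average new cases per country and year, using July to December only."""
    second_half = df[df["date"].dt.month >= 7]
    year = second_half["date"].dt.year.rename("year")
    return (second_half
            .groupby(["country", year])["new_cases"]
            .mean()
            .reset_index())

avg_cases = second_half_average(covid)
print(avg_cases)
```

Records from the first half of the year are dropped before averaging, which mirrors the six-month limitation of the maritime data.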



First, we import the necessary libraries.



In [2]:
import pandas as pd
import chardet
from plotly.offline import init_notebook_mode
import numpy as np
import plotly.io as pio
import plotly.express as px
from plotly.subplots import make_subplots
import plotly.graph_objects as go
import json
import itertools
import geopandas as gpd
# import geoplot
# import geoplot.crs as gcrs
import math
import scipy
from scipy.signal import find_peaks
from datetime import datetime
from scipy.stats import spearmanr

init_notebook_mode(connected=True)
pio.renderers.default = "plotly_mimetype+notebook"

## Part I

First, we import and combine dataframes for the four types of data we found:

- COVID data from the WHO at country level, giving cases, hospitalizations and casualties per day (absolute and cumulative)

- Port data from UNCTADstat at country level, giving tonnage, median time in port, and other information (from 2018 to 2022)

- Port performance index data for several ports within a country (data ranging from 2020 to 2021)

- GeoJSON file of all the countries in the world


We start with the port and geocoding datasets.

In [96]:
#file_path = r"C:\Users\user\OneDrive - Delft University of Technology\Desktop\TIL\Q1\TIL6022\TIL6022-group_project\Data\Maritime data\US_PortCalls_S_ST202209220924_v1.csv"
#with open(file_path, 'rb') as rawdata:
#    result = chardet.detect(rawdata.read(100000))
#result

# Delete this one and reuse the one above when Ian is not using this
file_path = "/Users/iantrout/TIL6022-group_project/Data/Maritime data/US_PortCalls_S_ST202209220924_v1.csv"
with open(file_path, 'rb') as rawdata:
    result = chardet.detect(rawdata.read(100000))
result

{'encoding': 'utf-8', 'confidence': 0.99, 'language': ''}
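Because the absolute paths in these cells differ per team member, one option is to build every path from a single project root. A minimal sketch, assuming a hypothetical environment variable `TIL6022_ROOT` and a hypothetical helper `data_path`, neither of which is part of the original notebook:

```python
import os
from pathlib import Path

# Hypothetical: each team member sets TIL6022_ROOT to their local checkout;
# without it, the current working directory is used.
PROJECT_ROOT = Path(os.environ.get("TIL6022_ROOT", Path.cwd()))

def data_path(*parts):
    """Join path components onto the project root."""
    return PROJECT_ROOT.joinpath(*parts)

portcalls_csv = data_path("Data", "Maritime data", "US_PortCalls_S_ST202209220924_v1.csv")
print(portcalls_csv)
```

With this in place, the commented-out alternative paths would no longer need to be swapped by hand.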

In [97]:
# Importing and Touching-up the Data

df_ports = pd.read_csv(file_path,encoding='utf-8')
df_ports['Period Label'] = df_ports['Period Label'].str.replace('   ','-')
df_ports = df_ports.drop(columns=['Period', 'Frequency', 'Frequency Label', 'Economy', 
                                      'CommercialMarket', 'Median time in port (days) Footnote',
                                      'Average age of vessels Footnote', 'Average size (GT) of vessels Footnote',
                                      'Maximum size (GT) of vessels Footnote', 'Average cargo carrying capacity (dwt) per vessel Footnote',
                                      'Maximum cargo carrying capacity (dwt) of vessels Footnote','Average container carrying capacity (TEU) per container ship Footnote',
                                      'Maximum container carrying capacity (TEU) of container ships Footnote'])
df_ports.rename(columns = {'Economy Label': 'country', 'CommercialMarket Label': 'Vessel_Type', }, inplace=True)
date_change=[]
for row in df_ports['Period Label']:
    if row == 'S1-2018' :   date_change.append('2018-07-31')
    elif row == 'S2-2018':   date_change.append('2019-01-31')
    elif row == 'S1-2019':  date_change.append('2019-07-31')
    elif row == 'S2-2019':  date_change.append('2020-01-31')
    elif row == 'S1-2020':  date_change.append('2020-07-31')
    elif row == 'S2-2020':  date_change.append('2021-01-31')
    elif row == 'S1-2021':  date_change.append('2021-07-31')
    elif row == 'S2-2021':  date_change.append('2022-01-31')
    elif row == 'S1-2022':  date_change.append('2022-07-31')
    else:           date_change.append('Not_Rated')

df_ports = df_ports.drop(columns=['Period Label'])
df_ports['date'] = date_change
df_ports
df_ports.head()

Unnamed: 0,Year,country,Vessel_Type,Median time in port (days),Average age of vessels,Average size (GT) of vessels,Maximum size (GT) of vessels,Average cargo carrying capacity (dwt) per vessel,Maximum cargo carrying capacity (dwt) of vessels,Average container carrying capacity (TEU) per container ship,Maximum container carrying capacity (TEU) of container ships,date
0,2018,World,All ships,0.97,18,15222,234006,24074.0,441561.0,3526.0,21413.0,2018-07-31
1,2018,World,Passenger ships,,21,8978,228081,,,,,2018-07-31
2,2018,World,Liquid bulk carriers,0.94,13,15470,234006,26871.0,441561.0,,,2018-07-31
3,2018,World,Container ships,0.69,13,38405,217673,,,3526.0,21413.0,2018-07-31
4,2018,World,Dry breakbulk carriers,1.12,19,5455,91784,7413.0,138743.0,,,2018-07-31
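The label-to-date conversion is repeated for several tables in this notebook, so it could be wrapped in one small function. The sketch below is one possible version; `period_to_date` is a hypothetical helper, and unlike the explicit `if`/`elif` chain it also converts years outside 2018 to 2022.

```python
import pandas as pd

def period_to_date(label):
    """Map a label such as 'S1-2018' to the date string used in this notebook.

    S1 of a year maps to 31 July of that year, S2 to 31 January of the next year.
    Anything that does not match the pattern maps to 'Not_Rated'.
    """
    try:
        semester, year = str(label).split("-")
        year = int(year)
    except ValueError:
        return "Not_Rated"
    if semester == "S1":
        return f"{year}-07-31"
    if semester == "S2":
        return f"{year + 1}-01-31"
    return "Not_Rated"

labels = pd.Series(["S1-2018", "S2-2018", "S1-2022", "unknown"])
dates = labels.map(period_to_date)
print(dates.tolist())
```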


In [98]:
# df_ports.to_csv (r'/Users/iantrout/TIL6022-group_project/updated_port_info.csv')

In [99]:
# geodata = gpd.read_file("/Users/iantrout/TIL6022-group_project/Data/countries.geojson") # geojson file
# geodata.rename(columns = {'ADMIN': 'Location', }, inplace=True)
# geodata.head()

In [100]:
# geodata.to_file ("/Users/iantrout/TIL6022-group_project/Data/countries.geojson", driver="GeoJSON")


In [101]:
# # Merge the two dataframes, using _ID column as key
# geo_port = pd.merge(geodata, df_ports, on = 'Location')

# geo_port.rename(columns = {'Location': 'country', }, inplace=True)

# geo_port.head()

Now we merge the other port performance index data with the table above.

In [102]:
# port_2021_path = "/Users/iantrout/TIL6022-group_project/Data/The productivity of the ports/Container-Port-Performance-Index-2021 copy.csv"
# data_call_path = "/Users/iantrout/TIL6022-group_project/Data/Maritime data/US_PortCalls_S_ST202209220924_v1.csv"


# port_2021 = pd.read_csv(port_2021_path)
# data_call = pd.read_csv(data_call_path)
  
# # using merge function by setting how='outer'
# output = pd.merge(port_2021, data_call, 
#                    on='Economy Label', 
#                    how='outer')
  
# # displaying result
# print(output)

Now we merge the COVID data with the port data.

In [103]:
# file_path = r'C:\Users\user\OneDrive - Delft University of Technology\Desktop\TIL\Q1\TIL6022\TIL6022-group_project\JOHN_FILES\covid_data.csv'
# df = pd.read_csv(file_path)
# df = df.rename({
#     'Date_reported': 'date',
#     'Country': 'country',
#     'New_cases': 'new_cases',
#     'Cumulative_cases': 'cumulative_cases'
# }, axis=1) 
# df = df.drop(labels=[
#     'New_deaths', 
#     'Cumulative_deaths', 
#     'Country_code', 
#     'WHO_region'
# ], axis=1)
# df.head()

# for i in range(len(df)):
#     k=df.iloc[i,0].split('-')
#     df.iloc[i,0]=datetime(int(k[0]),int(k[1]),int(k[2]))

# df_new = (df.groupby(['country', pd.Grouper(key='date', freq='6M')])
#         .max()
#         .reset_index())
# df_new.head()
# #fig = px.line(df_new, x='date', y='cumulative_cases', markers='True', color="country")
# #fig.show()

file_path2 = "/Users/iantrout/TIL6022-group_project/JOHN_FILES/covid_data_new.csv"
with open(file_path2, 'rb') as rawdata:
    result = chardet.detect(rawdata.read(100000))
result

df_new = pd.read_csv(file_path2,encoding='utf-8')
df_new.head()

#use this file_path code for when Ian is not using it:
#r"C:\Users\user\OneDrive - Delft University of Technology\Desktop\TIL\Q1\TIL6022\Final Project\covid_data_new.csv"

Unnamed: 0.1,Unnamed: 0,country,date,new_cases,cumulative_cases
0,0,Afghanistan,2020-01-31,0,0
1,1,Afghanistan,2020-07-31,36628,2080134
2,2,Afghanistan,2021-01-31,18395,8092870
3,3,Afghanistan,2021-07-31,92131,14105445
4,4,Afghanistan,2022-01-31,14986,28667488


In [104]:
# df_ports_world = geo_port 
# df_ports_world.head()
# date_change=[]
# for row in df_ports_world['Period Label']:
#     if row == 'S1-2018' :   date_change.append('2018-07-31')
#     elif row == 'S2-2018':   date_change.append('2019-01-31')
#     elif row == 'S1-2019':  date_change.append('2019-07-31')
#     elif row == 'S2-2019':  date_change.append('2020-01-31')
#     elif row == 'S1-2020':  date_change.append('2020-07-31')
#     elif row == 'S2-2020':  date_change.append('2021-01-31')
#     elif row == 'S1-2021':  date_change.append('2021-07-31')
#     elif row == 'S2-2021':  date_change.append('2022-01-31')
#     elif row == 'S1-2022':  date_change.append('2022-07-31')
#     else:           date_change.append('Not_Rated')

# df_ports_world = df_ports_world.drop(columns=['Period Label'])
# df_ports_world['date'] = date_change
# df_ports_world

In [105]:
# df_combined=pd.merge(df_new,df_ports_world,on=['country','date'])
# df_combined.head()

df_combined = pd.merge(df_ports, df_new, on=['country','date'], how='outer')
df_combined = df_combined.drop(['Unnamed: 0', 'cumulative_cases'], axis=1)
df_combined

Unnamed: 0,Year,country,Vessel_Type,Median time in port (days),Average age of vessels,Average size (GT) of vessels,Maximum size (GT) of vessels,Average cargo carrying capacity (dwt) per vessel,Maximum cargo carrying capacity (dwt) of vessels,Average container carrying capacity (TEU) per container ship,Maximum container carrying capacity (TEU) of container ships,date,new_cases
0,2018.0,World,All ships,0.97,18.0,15222.0,234006.0,24074.0,441561.0,3526.0,21413.0,2018-07-31,
1,2018.0,World,Passenger ships,,21.0,8978.0,228081.0,,,,,2018-07-31,
2,2018.0,World,Liquid bulk carriers,0.94,13.0,15470.0,234006.0,26871.0,441561.0,,,2018-07-31,
3,2018.0,World,Container ships,0.69,13.0,38405.0,217673.0,,,3526.0,21413.0,2018-07-31,
4,2018.0,World,Dry breakbulk carriers,1.12,19.0,5455.0,91784.0,7413.0,138743.0,,,2018-07-31,
...,...,...,...,...,...,...,...,...,...,...,...,...,...
3179,,"occupied Palestinian territory, including east...",,,,,,,,,,2021-01-31,163678.0
3180,,"occupied Palestinian territory, including east...",,,,,,,,,,2021-07-31,167062.0
3181,,"occupied Palestinian territory, including east...",,,,,,,,,,2022-01-31,179138.0
3182,,"occupied Palestinian territory, including east...",,,,,,,,,,2022-07-31,157380.0
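The last rows of the table have empty port columns, which is what an outer merge produces when a country name or date exists in only one of the two sources. A quick way to list such rows is the `indicator` option of `pd.merge`. The sketch below uses toy frames with made-up names and values:

```python
import pandas as pd

# Toy frames; names and values are illustrative only
ports = pd.DataFrame({"country": ["Netherlands", "Bolivia"],
                      "date": ["2020-07-31", "2020-07-31"],
                      "median_days": [0.5, 1.2]})
covid = pd.DataFrame({"country": ["Netherlands", "Bolivia (Plurinational State of)"],
                      "date": ["2020-07-31", "2020-07-31"],
                      "new_cases": [1000, 2000]})

merged = pd.merge(ports, covid, on=["country", "date"], how="outer", indicator=True)

# Rows found in only one source point at names or dates that did not match
unmatched = merged[merged["_merge"] != "both"]
print(unmatched[["country", "_merge"]])
```

Country names that differ only in spelling between the two sources would show up here as `left_only` and `right_only` pairs.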


In [106]:
# Importing and Touching-up the Port Calls Data

#r"C:\Users\user\OneDrive - Delft University of Technology\Desktop\TIL\Q1\TIL6022\TIL6022-group_project\Data\Maritime data\US_PortCallsArrivals_S_ST202209220927_v1.csv"
#Note that the above path is for everyone except Ian; please replace it when you need to use the file path

file_path3 = "/Users/iantrout/TIL6022-group_project/Data/Maritime data/US_PortCallsArrivals_S_ST202209220927_v1.csv"
with open(file_path3, 'rb') as rawdata:
    result = chardet.detect(rawdata.read(100000))
result

df_port_calls = pd.read_csv(file_path3,encoding='utf-8')
df_port_calls['Period Label'] = df_port_calls['Period Label'].str.replace('   ','-')
df_port_calls = df_port_calls.drop(columns=['Period', 'Frequency', 'Frequency Label', 'Economy', 
                                       'CommercialMarket', 'Number of port calls Footnote',])
df_port_calls.rename(columns = {'Economy Label': 'country', 'CommercialMarket Label': 'Vessel_Type', }, inplace=True)
date_change=[]
for row in df_port_calls['Period Label']:
    if row == 'S1-2018' :   date_change.append('2018-07-31')
    elif row == 'S2-2018':   date_change.append('2019-01-31')
    elif row == 'S1-2019':  date_change.append('2019-07-31')
    elif row == 'S2-2019':  date_change.append('2020-01-31')
    elif row == 'S1-2020':  date_change.append('2020-07-31')
    elif row == 'S2-2020':  date_change.append('2021-01-31')
    elif row == 'S1-2021':  date_change.append('2021-07-31')
    elif row == 'S2-2021':  date_change.append('2022-01-31')
    elif row == 'S1-2022':  date_change.append('2022-07-31')
    else:           date_change.append('Not_Rated')

df_port_calls = df_port_calls.drop(columns=['Period Label'])
df_port_calls['date'] = date_change
df_port_calls

Unnamed: 0,Year,country,Vessel_Type,Number of port calls,date
0,2018,World,All ships,1984908,2018-07-31
1,2018,World,Passenger ships,1053697,2018-07-31
2,2018,World,Liquid bulk carriers,245147,2018-07-31
3,2018,World,Container ships,226063,2018-07-31
4,2018,World,Dry breakbulk carriers,211031,2018-07-31
...,...,...,...,...,...
15697,2022,United Kingdom,Dry breakbulk carriers,7967,2022-07-31
15698,2022,United Kingdom,Dry bulk carriers,956,2022-07-31
15699,2022,United Kingdom,Roll-on/ roll-off ships,7983,2022-07-31
15700,2022,United Kingdom,Liquefied petroleum gas carriers,632,2022-07-31


In [107]:
df_combined2 = pd.merge(df_port_calls, df_new, on=['country','date'], how='outer')
# df_combined2 = df_combined2.drop(['Unnamed: 0', 'cumulative_cases'], axis=1)
df_combined2

Unnamed: 0.1,Year,country,Vessel_Type,Number of port calls,date,Unnamed: 0,new_cases,cumulative_cases
0,2018.0,World,All ships,1984908.0,2018-07-31,,,
1,2018.0,World,Passenger ships,1053697.0,2018-07-31,,,
2,2018.0,World,Liquid bulk carriers,245147.0,2018-07-31,,,
3,2018.0,World,Container ships,226063.0,2018-07-31,,,
4,2018.0,World,Dry breakbulk carriers,211031.0,2018-07-31,,,
...,...,...,...,...,...,...,...,...
17248,,"occupied Palestinian territory, including east...",,,2021-01-31,1654.0,163678.0,15204283.0
17249,,"occupied Palestinian territory, including east...",,,2021-07-31,1655.0,167062.0,52958731.0
17250,,"occupied Palestinian territory, including east...",,,2022-01-31,1656.0,179138.0,79971265.0
17251,,"occupied Palestinian territory, including east...",,,2022-07-31,1657.0,157380.0,117827259.0


In [108]:
port_covid = pd.merge(df_combined, df_port_calls, on=['country','date', 'Vessel_Type'], how='outer')
port_covid

Unnamed: 0,Year_x,country,Vessel_Type,Median time in port (days),Average age of vessels,Average size (GT) of vessels,Maximum size (GT) of vessels,Average cargo carrying capacity (dwt) per vessel,Maximum cargo carrying capacity (dwt) of vessels,Average container carrying capacity (TEU) per container ship,Maximum container carrying capacity (TEU) of container ships,date,new_cases,Year_y,Number of port calls
0,2018.0,World,All ships,0.97,18.0,15222.0,234006.0,24074.0,441561.0,3526.0,21413.0,2018-07-31,,2018.0,1984908.0
1,2018.0,World,Passenger ships,,21.0,8978.0,228081.0,,,,,2018-07-31,,2018.0,1053697.0
2,2018.0,World,Liquid bulk carriers,0.94,13.0,15470.0,234006.0,26871.0,441561.0,,,2018-07-31,,2018.0,245147.0
3,2018.0,World,Container ships,0.69,13.0,38405.0,217673.0,,,3526.0,21413.0,2018-07-31,,2018.0,226063.0
4,2018.0,World,Dry breakbulk carriers,1.12,19.0,5455.0,91784.0,7413.0,138743.0,,,2018-07-31,,2018.0,211031.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
17248,,"Europe, Northern America, Australia and New Ze...",Dry breakbulk carriers,,,,,,,,,2022-07-31,,2022.0,108618.0
17249,,"Europe, Northern America, Australia and New Ze...",Dry bulk carriers,,,,,,,,,2022-07-31,,2022.0,39257.0
17250,,"Europe, Northern America, Australia and New Ze...",Roll-on/ roll-off ships,,,,,,,,,2022-07-31,,2022.0,46544.0
17251,,"Europe, Northern America, Australia and New Ze...",Liquefied petroleum gas carriers,,,,,,,,,2022-07-31,,2022.0,8430.0


In [109]:
# port_covid.to_csv (r'C:\Users\user\OneDrive - Delft University of Technology\Desktop\TIL\Q1\TIL6022\Final Project\updated.csv')

In [110]:
# Geo data
df_geo = gpd.read_file("/Users/iantrout/TIL6022-group_project/Data/countries.geojson") # geojson file
df_geo.rename(columns = {'ADMIN': 'country', }, inplace=True)
df_geo.head()

Unnamed: 0,country,ISO_A3,geometry
0,Aruba,ABW,"POLYGON ((-69.99694 12.57758, -69.93639 12.531..."
1,Afghanistan,AFG,"POLYGON ((71.04980 38.40866, 71.05714 38.40903..."
2,Angola,AGO,"MULTIPOLYGON (((11.73752 -16.69258, 11.73851 -..."
3,Anguilla,AIA,"MULTIPOLYGON (((-63.03767 18.21296, -63.09952 ..."
4,Albania,ALB,"POLYGON ((19.74777 42.57890, 19.74601 42.57993..."


## Part II

We start by checking how many countries we have data for; to do that, we plot a world map for all ship types combined.

In [111]:
# #df['text'] = geo_port['Location'] + '<br>' + \
#    # 'Passenger ships ' + geo_port['Passenger ships'] + ' Dairy ' + geo_port['dairy'] + '<br>' + \
#    # 'Fruits ' + geo_port['total fruits'] + ' Veggies ' + geo_port['total veggies'] + '<br>' + \
#    # 'Wheat ' + geo_port['wheat'] + ' Corn ' + geo_port['corn']
# geo_port_all_vessels= geo_port[
#     (geo_port.Vessel_Type == 'All ships')
# ]
# fig = px.choropleth(geo_port_all_vessels, locations="ISO_A3",
#                     color="Median time in port (days)", 
#                     hover_name="Location",
#                     range_color=(0, 2),
#                     animation_frame="Period Label",
#                     #text=df['text'], # hover text
#                     color_continuous_scale=px.colors.sequential.Plasma)
# fig.show()


Graphs alone are not enough to draw conclusions, so we calculate the peaks and valleys of the COVID data and the port data to see whether they match, based on several values of the port data (average age of the vessels, average size of the vessels, median time in port).

In [112]:
# Variables from COVID data 
activity_1 = 'new cases'
activity_2 = 'new deaths'
activity_3 = 'cumulative cases'
activity_4 = 'cumulative deaths'

# Variables from Maritime data 
activity_5 = 'Median time in port (days)'
#activity_6 = 'port index value'
#activity_7 = 'port calls'
activity_8 = 'Average age of vessels'
activity_10 = 'Average size (GT) of vessels'
activity_9 = 'Vessel_Type'

# Common variables
region_1 = 'Asia'
region_2 = 'Oceania'
region_3 = 'Europe'
region_4 = 'Africa'
region_5 = 'North America'
region_6 = 'South America'
world_story = [region_1, region_2, region_3, region_4, region_5, region_6]


activities_story_1 = [activity_8, activity_5, activity_10]
#activities_story_2 = [activity_1, activity_2, activity_6]
#activities_story_3 = [activity_5, activity_2]


In [113]:
# First, define a function that returns the positions of the local peaks in a column:
# points whose value is higher than both the previous and the next value
def data_highs(data, activity, **kwargs):

    diff_1 = data[activity].diff(periods = -1)
    diff_2 = data[activity].diff(periods = 1)
    
    peaks = []
    for i in range(len(diff_1)):
        if diff_1.iloc[i] > 0 and diff_2.iloc[i] > 0:
            peaks.append(int(i))          
            
    return peaks

# And do the same for the valleys (points lower than both neighbours)
def data_lows(data, activity, **kwargs):

    diff_1 = data[activity].diff(periods = -1)
    diff_2 = data[activity].diff(periods = 1)

    valleys = []
    for i in range(len(diff_1)):
        if diff_1.iloc[i] < 0 and diff_2.iloc[i] < 0:
            valleys.append(int(i))          
            
    return valleys
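The same neighbour comparison can be written without an explicit loop. The sketch below is an alternative formulation, not the notebook's own code; `local_extrema` is a hypothetical helper and the series values are made up.

```python
import pandas as pd

def local_extrema(series):
    """Return positions of strict local peaks and valleys (end points excluded)."""
    prev_diff = series.diff(periods=1)    # value minus previous value
    next_diff = series.diff(periods=-1)   # value minus next value
    peaks = ((prev_diff > 0) & (next_diff > 0)).to_numpy().nonzero()[0].tolist()
    valleys = ((prev_diff < 0) & (next_diff < 0)).to_numpy().nonzero()[0].tolist()
    return peaks, valleys

# Illustrative values only, not taken from the dataset
toy = pd.Series([1.0, 3.0, 2.0, 0.5, 2.5, 2.0])
peaks, valleys = local_extrema(toy)
print(peaks, valleys)
```

The first and last points never count as extrema because one of their differences is missing, which matches the behaviour of the loop-based functions.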

In [114]:
# # Then I start the figure and create several dictionaries that are necessary. The peaks and valleys dictionaries are for the graphs and the date dictionaries are for the next steps
# fig_1 = go.Figure()

# peaks_dict_1 = {}
# valleys_dict_1 = {}
# peaks_date_dict_1 = {}
# valleys_date_dict_1 = {}

# # I create a dataframe that contains only the data for the selected province and reset the indices for it
# geo_port_all_vessels = geo_port[(geo_port.Vessel_Type == 'All ships')]
# geo_port_all_vessels = geo_port_all_vessels[(geo_port_all_vessels.Location == 'Australia')]
# geo_port_all_vessels.reset_index(inplace=True)

# # I find the peaks and valleys and add them to the dictionaries
# for activity in activities_story_1:
#     max_ind = data_highs(geo_port_all_vessels, activity)
#     peaks_dict_1[activity]=max_ind

#     min_ind = data_lows(geo_port_all_vessels,activity)
#     valleys_dict_1[activity]=min_ind
    
#     # Then I turn them into dataframes to be able to use the dates for the graphs, and for the date dictionaries
#     df_max_1 = geo_port_all_vessels.iloc[max_ind]
#     df_min_1 = geo_port_all_vessels.iloc[min_ind]

# # The date dictionaries are filled with the dates of the peaks and the valleys
#     peaks_date_dict_1[activity] = df_max_1['Period Label']
#     valleys_date_dict_1[activity] = df_min_1['Period Label']
    
#     #The graphs are formatted 
#     x1 = geo_port_all_vessels['Period Label']
#     y1 = geo_port_all_vessels[activity]
#     x2 = df_max_1['Period Label']
#     y2 = df_max_1[activity]
#     x3 = df_min_1['Period Label']
#     y3 = df_min_1[activity]
#     fig_1.add_trace(go.Scatter(x=x1,y=y1,name=activity))
#     fig_1.add_trace(go.Scatter(x=x2,y=y2,mode='markers',name='peaks ' + activity))
#     fig_1.add_trace(go.Scatter(x=x3,y=y3,mode='markers',name='valleys ' + activity))

# fig_1.update_layout(title= activity_5 + ' and ' + activity_8 + ' during covid times in ' + activity_10)
# fig_1.show()

This graph is hard to read because the three series are on very different scales, so we make subplots to compare the three port parameters more clearly.

In [115]:
# fig_2 = go.Figure()
# fig_2 = make_subplots(rows=3,cols=1)
# x1 = geo_port_all_vessels['Period Label']
# y1 = geo_port_all_vessels[activity_5]
# x2 = df_max_1['Period Label']
# y2 = df_max_1[activity_5]
# x3 = geo_port_all_vessels['Period Label']
# y3 = geo_port_all_vessels[activity_8]
# x4 = df_min_1['Period Label']
# y4 = df_min_1[activity_8]
# x5 = geo_port_all_vessels['Period Label']
# y5 = geo_port_all_vessels[activity_10]
# x6 = df_max_1['Period Label']
# y6 = df_max_1[activity_10]

# fig_2.append_trace(go.Scatter(x=x1,y=y1,name=activity_5),row=1,col=1)
# fig_2.append_trace(go.Scatter(x=x2,y=y2,mode='markers',name='peaks ' + activity_5),row=1,col=1)
# fig_2.append_trace(go.Scatter(x=x3,y=y3,name=activity_8),row=2,col=1)
# fig_2.append_trace(go.Scatter(x=x4,y=y4,mode='markers',name='valleys ' + activity_8),row=2,col=1)
# fig_2.append_trace(go.Scatter(x=x5,y=y5,name=activity_10),row=3,col=1)
# fig_2.append_trace(go.Scatter(x=x6,y=y6,mode='markers',name='valleys ' + activity_10),row=3,col=1)

# fig_2.update_layout(title='Trends in vessel port time, age, and size thru the years')

# fig_2.show()

The rate of change of the lines in Part II is the comparison factor: we compare the periods before COVID with the periods after it started.
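One way to put a number on that comparison is the period-to-period percentage change, plus the relative difference between the mean before and the mean during COVID. The sketch below uses illustrative values and an assumed cut-off date of 2020-01-31; neither comes from the dataset.

```python
import pandas as pd

# Illustrative semi-annual values, not taken from the dataset
port_time = pd.DataFrame({
    "date": ["2019-07-31", "2020-01-31", "2020-07-31", "2021-01-31"],
    "median_days": [1.00, 1.00, 1.10, 1.21],
})

# Percentage change from one period to the next
port_time["pct_change"] = port_time["median_days"].pct_change() * 100

# Assumed cut-off: periods ending after 2020-01-31 count as "during COVID"
during = port_time["date"] > "2020-01-31"
before_mean = port_time.loc[~during, "median_days"].mean()
during_mean = port_time.loc[during, "median_days"].mean()
relative_change = (during_mean - before_mean) / before_mean * 100
print(round(relative_change, 1))
```

The ISO date strings sort chronologically, so a plain string comparison is enough for the split.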

In addition, we do a world-to-world comparison of COVID cases versus port times:

In [116]:
# file_path2 = r"C:\Users\user\OneDrive - Delft University of Technology\Desktop\TIL\Q1\TIL6022\Final Project\covid_data_new.csv"
# with open(file_path, 'rb') as rawdata:
#     result = chardet.detect(rawdata.read(100000))
# result

In [117]:
df_ports_world = df_ports[df_ports.country == 'World']
df_covid_world = df_new.groupby('date').sum(numeric_only=True)
df_covid_world = df_covid_world.drop(['2020-01-31','2023-01-31'])

df_combined_world =pd.merge(df_ports_world, df_covid_world, on=['date'], how='outer')
df_combined_world = df_combined_world.drop(['Unnamed: 0', 'cumulative_cases'], axis=1)
df_combined_world

Unnamed: 0,Year,country,Vessel_Type,Median time in port (days),Average age of vessels,Average size (GT) of vessels,Maximum size (GT) of vessels,Average cargo carrying capacity (dwt) per vessel,Maximum cargo carrying capacity (dwt) of vessels,Average container carrying capacity (TEU) per container ship,Maximum container carrying capacity (TEU) of container ships,date,new_cases
0,2018,World,All ships,0.9700,18,15222,234006,24074.0,441561.0,3526.0,21413.0,2018-07-31,
1,2018,World,Passenger ships,,21,8978,228081,,,,,2018-07-31,
2,2018,World,Liquid bulk carriers,0.9400,13,15470,234006,26871.0,441561.0,,,2018-07-31,
3,2018,World,Container ships,0.6900,13,38405,217673,,,3526.0,21413.0,2018-07-31,
4,2018,World,Dry breakbulk carriers,1.1200,19,5455,91784,7413.0,138743.0,,,2018-07-31,
...,...,...,...,...,...,...,...,...,...,...,...,...,...
76,2022,World,Dry breakbulk carriers,1.1938,21,5571,91784,7604.0,138743.0,,,2022-07-31,198097947.0
77,2022,World,Dry bulk carriers,2.2306,14,32735,204014,58640.0,404389.0,,,2022-07-31,198097947.0
78,2022,World,Roll-on/ roll-off ships,,17,25706,100430,10319.0,55828.0,,,2022-07-31,198097947.0
79,2022,World,Liquefied petroleum gas carriers,1.0292,16,10726,60784,11986.0,64220.0,,,2022-07-31,198097947.0
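In the table above, every vessel type in a given period carries the same `new_cases` value, because the world total per date is merged onto each port row. The aggregation step itself can be sketched as follows, with toy values and assuming one row per country and date:

```python
import pandas as pd

# Toy per-country records; values are illustrative only
covid = pd.DataFrame({
    "country": ["A", "B", "A", "B"],
    "date": ["2020-07-31", "2020-07-31", "2021-01-31", "2021-01-31"],
    "new_cases": [10, 20, 30, 40],
})

# Sum only the numeric columns so the text column 'country' is left out
world = covid.groupby("date").sum(numeric_only=True).reset_index()
print(world)
```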


In [118]:
# df_combined_world.to_csv(r"C:\Users\user\OneDrive - Delft University of Technology\Desktop\TIL\Q1\TIL6022\Final Project\result.csv")

In [119]:
df_combined_world = df_combined_world[df_combined_world['Vessel_Type'] == 'All ships']

# Create figure with secondary y-axis
fig = make_subplots(specs=[[{"secondary_y": True}]])

# Add traces
fig.add_trace(
    go.Scatter(x=df_combined_world['date'], y=df_combined_world['Median time in port (days)'], name="Median time in port (days)"),
    secondary_y=False,
)

fig.add_trace(
    go.Scatter(x=df_combined_world['date'], y=df_combined_world['new_cases'], name="New covid cases"),
    secondary_y=True,
)

# Add figure title
fig.update_layout(
    title_text="World - all vessel types"
)

# Set x-axis title
fig.update_xaxes(title_text="Date")

# Set y-axes titles
fig.update_yaxes(title_text="<b>primary</b> Median time in port (days)", secondary_y=False)
fig.update_yaxes(title_text="<b>secondary</b> New covid cases", secondary_y=True)

fig.show()

In [120]:
# data1 = df_combined_world['new_cases']
# data2 = df_combined_world['Median time in port (days)']
# coef, p = spearmanr(data1, data2)
# print('Spearmans correlation coefficient: %.3f' % coef)
# # interpret the significance
# alpha = 0.05
# if p > alpha:
# 	print('Samples are uncorrelated (fail to reject H0) p=%.3f' % p)
# else:
# 	print('Samples are correlated (reject H0) p=%.3f' % p)

In [121]:
from scipy import stats

df_combined_world=df_combined_world.dropna(subset=['new_cases'])
data1 = df_combined_world['new_cases']
data2 = df_combined_world['Median time in port (days)']
print("The pearson correlation data for the whole world is (Pearson's correlation coefficient r, P-value):")
stats.pearsonr(data1, data2)

The pearson correlation data for the whole world is (Pearson's correlation coefficient r, P-value):


PearsonRResult(statistic=0.9139174003700252, pvalue=0.029923900661430667)
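This correlation appears to rest on only a handful of semi-annual observations, so it is worth cross-checking with a rank-based measure, as the commented-out Spearman cell suggests. A minimal sketch with made-up numbers, computing Spearman as Pearson on ranks with pandas only:

```python
import pandas as pd

# Made-up values for illustration; they are not the project's data
cases = pd.Series([1, 2, 3, 4, 5])
port_days = pd.Series([2, 4, 5, 4, 5])

pearson_r = cases.corr(port_days)                  # linear association
spearman_r = cases.rank().corr(port_days.rank())   # Pearson on ranks = Spearman

print(round(pearson_r, 3), round(spearman_r, 3))
```

If the two coefficients differ a lot, the linear correlation is probably driven by one or two extreme periods.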

In [122]:
df_port_calls_world = df_port_calls[df_port_calls.country == 'World']
df_covid_world = df_new.groupby('date').sum(numeric_only=True)
df_covid_world = df_covid_world.drop(['2020-01-31','2023-01-31'])

df_combined_world_calls = pd.merge(df_port_calls_world, df_covid_world, on=['date'], how='outer')
df_combined_world_calls = df_combined_world_calls.drop(['Unnamed: 0', 'cumulative_cases'], axis=1)
df_combined_world_calls

Unnamed: 0,Year,country,Vessel_Type,Number of port calls,date,new_cases
0,2018,World,All ships,1984908,2018-07-31,
1,2018,World,Passenger ships,1053697,2018-07-31,
2,2018,World,Liquid bulk carriers,245147,2018-07-31,
3,2018,World,Container ships,226063,2018-07-31,
4,2018,World,Dry breakbulk carriers,211031,2018-07-31,
...,...,...,...,...,...,...
76,2022,World,Dry breakbulk carriers,210158,2022-07-31,198097947.0
77,2022,World,Dry bulk carriers,138318,2022-07-31,198097947.0
78,2022,World,Roll-on/ roll-off ships,88572,2022-07-31,198097947.0
79,2022,World,Liquefied petroleum gas carriers,29257,2022-07-31,198097947.0


In [123]:
df_combined_world_calls = df_combined_world_calls[df_combined_world_calls['Vessel_Type'] == 'All ships']

# Create figure with secondary y-axis
fig = make_subplots(specs=[[{"secondary_y": True}]])

# Add traces
fig.add_trace(
    go.Scatter(x=df_combined_world_calls['date'], y=df_combined_world_calls['Number of port calls'], name="Number of port calls"),
    secondary_y=False,
)

fig.add_trace(
    go.Scatter(x=df_combined_world_calls['date'], y=df_combined_world_calls['new_cases'], name="New covid cases"),
    secondary_y=True,
)

# Add figure title
fig.update_layout(
    title_text="World - all vessel types"
)

# Set x-axis title
fig.update_xaxes(title_text="Date")

# Set y-axes titles
fig.update_yaxes(title_text="<b>primary</b> Number of port calls", secondary_y=False)
fig.update_yaxes(title_text="<b>secondary</b> New covid cases", secondary_y=True)

fig.show()

In [124]:
# from scipy import stats

# df_combined_world_calls = df_combined_world_calls.dropna(subset=['new_cases'])
# data3 = df_combined_world_calls['new_cases']
# data4 = df_combined_world_calls['Number of port calls']
# print("The pearson correlation data for the whole world is (Pearson's correlation coefficient r, P-value):")
# stats.pearsonr(data3, data4)

## Part III - Data visualisation

For this last part, we visually show the effect that COVID had on vessel times, so that users can see how ports, and therefore the logistics system as a whole, have been impacted by COVID. We do this by:

- looking at regions and the semi-annual trend by vessel type

- a pie chart showing the proportions of the commodities shipped

- a world map showing the change in port call times over the years

- comparing COVID high periods versus low periods with port call times

- interpreting the results

First, we show our variables for this part.

We want to show the COVID data together with the port time (worldwide).
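For the semi-annual trend by vessel type, it helps to reshape the long table so that each vessel type becomes its own column. A minimal sketch with toy values, using the column names from the port table:

```python
import pandas as pd

# Toy long-format records; values are illustrative only
ports = pd.DataFrame({
    "date": ["2020-01-31", "2020-01-31", "2020-07-31", "2020-07-31"],
    "Vessel_Type": ["Container ships", "Dry bulk carriers"] * 2,
    "Median time in port (days)": [0.7, 2.0, 0.8, 2.2],
})

# One row per period, one column per vessel type
trend = ports.pivot_table(index="date", columns="Vessel_Type",
                          values="Median time in port (days)")
print(trend)
```

A frame in this shape can be passed to a line plot directly, with one line per vessel type.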

In [125]:
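# First, plot the worldwide median time in port for all ship types
# ('Period Label' was dropped earlier, so the 'date' column is used for the x-axis)
fig_5 = go.Figure()

df_ports_all = df_ports[(df_ports.country == 'World') & (df_ports.Vessel_Type == 'All ships')]
x1 = df_ports_all['date']
y1 = df_ports_all['Median time in port (days)']
fig_5.add_trace(go.Scatter(x=x1,y=y1, name=activity_5))


fig_5.update_layout(title='Median time in port (world, all ships)')
fig_5.show()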

Now we show, for each period since 2018, the median time in port by country.

In [126]:
# 'Location' and 'Period Label' are not columns of df_ports; use 'country' and 'date' instead
fig = px.histogram(df_ports, y="country", x="Median time in port (days)", orientation="h",
             animation_frame="date",
             #range_x=[0,4000000000],
                color="country",)
fig.update_yaxes(categoryorder='sum ascending')

fig.show()

In [127]:
# load dataset
df = pd.read_csv("https://raw.githubusercontent.com/plotly/datasets/master/volcano.csv")

# create figure
fig = go.Figure()

# Add surface trace
fig.add_trace(go.Surface(z=df.values.tolist(), colorscale="Viridis"))

# Update plot sizing
fig.update_layout(
    width=800,
    height=900,
    autosize=False,
    margin=dict(t=0, b=0, l=0, r=0),
    template="plotly_white",
)


# Add dropdown
fig.update_layout(
    updatemenus=[
        dict(
            buttons=list([
                dict(
                    args=["type", "surface"],
                    label="Asia",
                    method="restyle"
                ),
                dict(
                    args=["type", "heatmap"],
                    label="America",
                    method="restyle"
                ),
                dict(
                    args=["type", "heatmap"],
                    label="Africa",
                    method="restyle"
                )
            ]),
            direction="down",
            pad={"r": 10, "t": 10},
            showactive=True,
            x=0.1,
            xanchor="left",
            y=1.1,
            yanchor="top"
        ),
    ]
)

# Add annotation
fig.update_layout(
    annotations=[
        dict(text="Countries:", showarrow=False,
        x=0, y=1.085, yref="paper", align="left")
    ]
)

fig.show()

In [128]:
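# NOTE: df_new only holds COVID case columns (see the error below), so this cell needs a
# commodity table with 'Sectors', 'occurance', 'regions' and 'volume transported' columns.
pie = px.pie(df_new, values="occurance", names="Sectors", title="Sector-wise composition")
pie.show()
# https://www.youtube.com/watch?v=s_iEvTBSBfA
# px.sunburst takes the DataFrame first and the hierarchy through `path`
sunburst = px.sunburst(df_new, path=['Sectors', 'regions'], values='volume transported')
sunburst.show()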

ValueError: Value of 'names' is not the name of a column in 'data_frame'. Expected one of ['Unnamed: 0', 'country', 'date', 'new_cases', 'cumulative_cases'] but received: Sectors

In [129]:
import streamlit as st

# Create separate horizontal sections on the web page
header = st.container()
data = st.container()


with header:  # section that presents the title
    st.title('Impact of Covid on Vessel waiting time')  # shown as the page title

with data:  # section in which the data is merged, filtered and plotted

    # Merge the port call data with the COVID data
    df_combined2 = pd.merge(df_port_calls, df_new, on=['country', 'date'], how='outer')

    # Display the first rows of both tables on the web page
    st.write(port_covid.head(25))
    st.write(df_combined2.head(25))
    
    
    
    port_col, covid_col = st.columns(2)  # split the page into two columns so the port and COVID graphs sit side by side

    # Build the options for a multiselect box from which the user chooses countries
    country_options = port_covid['country'].unique()  # unique() returns an array of the distinct country names
    # The array is passed to st.multiselect as it is; it is not converted with tolist() first.
    # These options let the user choose which countries to display.

    # Multiselect for the countries to display, with 'Netherlands' as the default selection
    country = st.multiselect('Which country data would you like to see', country_options, ['Netherlands'])
    
    
    #CREATING a dynamic dropdown box for user to choose vessel types from
    vessel_options = port_covid['Vessel_Type'].unique()
    vessel = st.selectbox('Which Vessel data would you like to see',options =vessel_options,index=0)
    # index sets the default value at the index of the list that will be displayed if nothing is selected.
    
    
    
    # Filter the data according to the user's choices in both widgets
    df=port_covid[(port_covid['country'].isin(country)) & (port_covid['Vessel_Type']==vessel)]
    #period_options = df['Period Label'].unique()
    #period = st.selectbox('Which period data would you like to see',options=period_options,index=0)
    
    dp=df_combined2[(df_combined2['country'].isin(country)) & (df_combined2['Vessel_Type']==vessel)]
    
    # Access the port_col column of the web page and plot the port graphs
    with port_col:
        fig = px.line(df,x='date',y='Median time in port (days)',color='country',markers=True)
        fig.update_layout(width=400)
        st.write(fig)
        
        fig = px.line(df,x='date',y='Average age of vessels',color='country',markers=True)
        fig.update_layout(width=400)
        st.write(fig)
        
        fig = px.line(df,x='date',y='Average cargo carrying capacity (dwt) per vessel',color='country',markers=True)
        fig.update_layout(width=400)
        st.write(fig)
        
        fig = px.line(df,x='date',y='Average size (GT) of vessels',color='country',markers=True)
        fig.update_layout(width=400)
        st.write(fig)
        
        fig = px.line(dp,x='date',y='Number of port calls',color='country',markers=True)
        fig.update_layout(width=400)
        st.write(fig)
        
    # Access the covid_col column of the web page and plot the COVID graphs next to the port graphs
    with covid_col:
        fig = px.line(df,x='date',y='new_cases',color='country',markers=True)
        fig.update_layout(width=400)
        st.write(fig)
        
        fig = px.line(df,x='date',y='new_cases',color='country',markers=True)
        fig.update_layout(width=400)
        st.write(fig)
        
        fig = px.line(df,x='date',y='new_cases',color='country',markers=True)
        fig.update_layout(width=400)
        st.write(fig)
        
        fig = px.line(df,x='date',y='new_cases',color='country',markers=True)
        fig.update_layout(width=400)
        st.write(fig)
        
        fig = px.line(dp,x='date',y='new_cases',color='country',markers=True)
        fig.update_layout(width=400)
        st.write(fig)
        
        
# To run the web page: install Streamlit (pip install streamlit), open a terminal,
# cd to the folder that contains this file, then run: streamlit run port1.1.py

ModuleNotFoundError: No module named 'streamlit'

LOCATION SPECIFIC

EAST VS WEST

In [130]:
from scipy import stats
# df_combined = pd.merge(df_ports, df_new, on=['country','date'], how='outer')
# df_combined = df_combined.drop(['Unnamed: 0', 'cumulative_cases'], axis=1)
# df_combined
# df_combined_world = df_combined_world[df_combined_world['Vessel_Type'] == 'All ships']

port_covid = pd.merge(df_ports, df_new, on=['country','date'], how='outer')
port_covid_us = port_covid[port_covid["country"] == 'United States of America']
port_covid_west_allships = port_covid_us[port_covid_us["Vessel_Type"] == 'All ships']
# port_covid_west_allships = port_covid_us[port_covid_us["Vessel_Type"] == 'Container ships']
port_covid_west_allships

port_covid_west_allships=port_covid_west_allships.dropna(subset=['new_cases'])
data1 = port_covid_west_allships['new_cases']
data2 = port_covid_west_allships['Median time in port (days)']
print("The pearson correlation data for the US is (Pearson's correlation coefficient r, P-value):")
stats.pearsonr(data1, data2)

The pearson correlation data for the US is (Pearson's correlation coefficient r, P-value):


PearsonRResult(statistic=0.4509116455455507, pvalue=0.36947250538584875)
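
For reference, Pearson's r is the sum of the products of the deviations from the mean, divided by the square root of the product of the two sums of squared deviations. The sketch below reproduces that calculation in plain Python so the reported coefficient is easier to interpret. The helper name `pearson_r` and the sample numbers are made up for illustration and are not taken from the project data.

```python
import math


def pearson_r(xs, ys):
    """Return Pearson's correlation coefficient for two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Numerator: sum of the products of the deviations from the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator: square root of the product of the sums of squared deviations
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)


# Illustrative numbers only (NOT the project data)
cases = [100, 400, 250, 900, 700, 300]
days = [1.00, 1.05, 1.02, 1.20, 1.10, 1.04]
r = pearson_r(cases, days)
print(round(r, 3))
```

A value near +1 means the two series rise together, a value near -1 means one falls as the other rises, and a value near 0 means there is no linear relationship.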

In [131]:
port_covid = pd.merge(df_ports, df_new, on=['country','date'], how='outer')
port_covid_china = port_covid[port_covid["country"] == 'China']
port_covid_east_allships = port_covid_china[port_covid_china["Vessel_Type"] == 'All ships']
# port_covid_east_allships = port_covid_china[port_covid_china["Vessel_Type"] == 'Container ships']
port_covid_east_allships

port_covid_east_allships=port_covid_east_allships.dropna(subset=['new_cases'])
data1 = port_covid_east_allships['new_cases']
data2 = port_covid_east_allships['Median time in port (days)']
print("The pearson correlation data for China is (Pearson's correlation coefficient r, P-value):")
stats.pearsonr(data1, data2)

The pearson correlation data for China is (Pearson's correlation coefficient r, P-value):


PearsonRResult(statistic=0.43361944525590984, pvalue=0.3903366587135776)

In [132]:
port_covid = pd.merge(df_ports, df_new, on=['country','date'], how='outer')
port_covid_indo = port_covid[port_covid["country"] == 'Indonesia']
port_covid_indo_allships = port_covid_indo[port_covid_indo["Vessel_Type"] == 'All ships']
# port_covid_indo_allships = port_covid_indo[port_covid_indo["Vessel_Type"] == 'Container ships']
port_covid_indo_allships

port_covid_indo_allships=port_covid_indo_allships.dropna(subset=['new_cases'])
data1 = port_covid_indo_allships['new_cases']
data2 = port_covid_indo_allships['Median time in port (days)']
print("The pearson correlation data for Indonesia is (Pearson's correlation coefficient r, P-value):")
stats.pearsonr(data1, data2)

The pearson correlation data for Indonesia is (Pearson's correlation coefficient r, P-value):


PearsonRResult(statistic=0.5796336610430461, pvalue=0.22792077053085105)

In [133]:
port_covid = pd.merge(df_ports, df_new, on=['country','date'], how='outer')
port_covid_nl = port_covid[port_covid["country"] == 'Netherlands']
port_covid_nl_allships = port_covid_nl[port_covid_nl["Vessel_Type"] == 'All ships']
# port_covid_nl_allships = port_covid_nl[port_covid_nl["Vessel_Type"] == 'Container ships']
port_covid_nl_allships

port_covid_nl_allships=port_covid_nl_allships.dropna(subset=['new_cases'])
data1 = port_covid_nl_allships['new_cases']
data2 = port_covid_nl_allships['Median time in port (days)']
print("The pearson correlation data for Netherlands is (Pearson's correlation coefficient r, P-value):")
stats.pearsonr(data1, data2)

The pearson correlation data for Netherlands is (Pearson's correlation coefficient r, P-value):


PearsonRResult(statistic=0.862706470520963, pvalue=0.026980311754594966)
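
With only a handful of half-year observations per country, the p-value depends strongly on the sample size. The usual test converts r into a t statistic with n - 2 degrees of freedom, t = r * sqrt((n - 2) / (1 - r^2)). The sketch below applies this to the Netherlands coefficient. The sample size of 6 is an assumption (the number of half-year rows believed to remain after `dropna`), and `t_statistic` is a hypothetical helper name.

```python
import math


def t_statistic(r, n):
    """t statistic for testing zero correlation, given sample r and n data pairs."""
    if n < 3 or not -1 < r < 1:
        raise ValueError("need n >= 3 and -1 < r < 1")
    return r * math.sqrt((n - 2) / (1 - r ** 2))


r_nl = 0.862706470520963  # coefficient reported for the Netherlands
n_assumed = 6             # ASSUMPTION: rows left after dropping missing new_cases
t_nl = t_statistic(r_nl, n_assumed)

T_CRIT_DF4 = 2.776        # two-sided 5% critical value of Student's t with 4 degrees of freedom
print(round(t_nl, 2), abs(t_nl) > T_CRIT_DF4)
```

Under that assumption the statistic exceeds the critical value, which is consistent with the reported p-value being below 0.05. With so few points, a single outlying half-year can still change the result noticeably.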

In [134]:
port_covid = pd.merge(df_ports, df_new, on=['country','date'], how='outer')
port_covid_uk = port_covid[port_covid["country"] == 'United Kingdom']
port_covid_uk_allships = port_covid_uk[port_covid_uk["Vessel_Type"] == 'All ships']
# port_covid_uk_allships = port_covid_uk[port_covid_uk["Vessel_Type"] == 'Container ships']
port_covid_uk_allships

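# Correlation skipped for the UK: the merged table has no new_cases values for any UK row
# (see the output below), so dropna would leave nothing to correlate.
# port_covid_uk_allships=port_covid_uk_allships.dropna(subset=['new_cases'])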
# data1 = port_covid_uk_allships['new_cases']
# data2 = port_covid_uk_allships['Median time in port (days)']
# print("The pearson correlation data for the UK is (Pearson's correlation coefficient r, P-value):")
# stats.pearsonr(data1, data2)

Unnamed: 0.1,Year,country,Vessel_Type,Median time in port (days),Average age of vessels,Average size (GT) of vessels,Maximum size (GT) of vessels,Average cargo carrying capacity (dwt) per vessel,Maximum cargo carrying capacity (dwt) of vessels,Average container carrying capacity (TEU) per container ship,Maximum container carrying capacity (TEU) of container ships,date,Unnamed: 0,new_cases,cumulative_cases
171,2018.0,United Kingdom,All ships,1.09,17.0,13546.0,217673.0,11828.0,320051.0,3549.0,21413.0,2018-07-31,,,
352,2018.0,United Kingdom,All ships,1.1,17.0,13784.0,217673.0,12258.0,323183.0,3382.0,21413.0,2019-01-31,,,
532,2019.0,United Kingdom,All ships,1.0799,17.0,13625.0,219775.0,12722.0,319994.0,3400.0,21413.0,2019-07-31,,,
713,2019.0,United Kingdom,All ships,1.0764,17.0,13755.0,232618.0,12295.0,320926.0,3366.0,23756.0,2020-01-31,,,
894,2020.0,United Kingdom,All ships,1.1014,17.0,15309.0,235500.0,12920.0,321300.0,3359.0,23964.0,2020-07-31,,,
1076,2020.0,United Kingdom,All ships,1.1125,17.0,14194.0,236583.0,11945.0,321300.0,3564.0,23964.0,2021-01-31,,,
1257,2021.0,United Kingdom,All ships,1.1625,18.0,14350.0,236583.0,11973.0,321300.0,3162.0,23964.0,2021-07-31,,,
1440,2021.0,United Kingdom,All ships,1.1639,18.0,14244.0,235579.0,12269.0,320785.0,3064.0,23992.0,2022-01-31,,,
1624,2022.0,United Kingdom,All ships,1.2132,19.0,13975.0,235579.0,12913.0,319778.0,3309.0,23992.0,2022-07-31,,,
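
Every `new_cases` value for the United Kingdom is empty after the merge. One possible cause (an assumption, not verified here) is that the two datasets spell the country name differently, so the outer merge never pairs the rows. A minimal sketch of a check that lists names present in only one dataset follows. The helper `unmatched_names` and the spelling "The United Kingdom" are hypothetical. In the notebook the inputs would come from `df_ports['country'].unique()` and `df_new['country'].unique()`.

```python
def unmatched_names(left_names, right_names):
    """Return the names that appear in only one of the two collections."""
    left, right = set(left_names), set(right_names)
    return sorted(left - right), sorted(right - left)


# Hypothetical name lists, for illustration only
port_names = ["Netherlands", "Japan", "United Kingdom"]
covid_names = ["Netherlands", "Japan", "The United Kingdom"]

only_in_ports, only_in_covid = unmatched_names(port_names, covid_names)
print(only_in_ports)   # names that would get no COVID data in the merge
print(only_in_covid)   # names that would get no port data in the merge
```

Any name reported here could then be renamed in one of the tables before merging.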


In [135]:
port_covid = pd.merge(df_ports, df_new, on=['country','date'], how='outer')
port_covid_jpn = port_covid[port_covid["country"] == 'Japan']
port_covid_jpn_allships = port_covid_jpn[port_covid_jpn["Vessel_Type"] == 'All ships']
# port_covid_jpn_allships = port_covid_jpn[port_covid_jpn["Vessel_Type"] == 'Container ships']
port_covid_jpn_allships

port_covid_jpn_allships=port_covid_jpn_allships.dropna(subset=['new_cases'])
data1 = port_covid_jpn_allships['new_cases']
data2 = port_covid_jpn_allships['Median time in port (days)']
print("The pearson correlation data for Japan is (Pearson's correlation coefficient r, P-value):")
stats.pearsonr(data1, data2)

The pearson correlation data for Japan is (Pearson's correlation coefficient r, P-value):


PearsonRResult(statistic=0.6416277642089109, pvalue=0.16963299780473085)

In [136]:
port_covid_new = port_covid.dropna(subset=['new_cases'])
port_covid_new = port_covid_new.dropna(subset=['Vessel_Type'])
port_covid_new = port_covid_new[port_covid_new['Vessel_Type'] == 'All ships']
port_covid_new

# unq_country = port_covid_new['country'].unique()
# unq_country

# Countries for which the correlation is computed
countries = ['Australia', 'Canada', 'China', 'Croatia', 'Denmark', 'France',
             'Germany', 'Greece', 'Indonesia', 'Italy', 'Japan', 'Netherlands',
             'Norway', 'Russian Federation', 'Spain', 'Sweden', 'Türkiye',
             'United States of America']

# Compute Pearson's r and its p-value once per country
rows = []
for name in countries:
    subset = port_covid_new[port_covid_new['country'] == name]
    r, p = stats.pearsonr(subset['new_cases'], subset['Median time in port (days)'])
    rows.append({'country': name, 'Correlation Coeff.': r, 'P-value': p})

# Create DataFrame
pearson_covid_port = pd.DataFrame(rows)

# Show the output.
pearson_covid_port

# port_covid_new.to_csv (r'C:\Users\user\OneDrive - Delft University of Technology\Desktop\TIL\Q1\TIL6022\Final Project\correlation data.csv')

Unnamed: 0,country,Correlation Coeff.,P-value
0,Australia,0.719339,0.107102
1,Canada,-0.252046,0.629937
2,China,0.433619,0.390337
3,Croatia,0.188726,0.720272
4,Denmark,-0.62331,0.186118
5,France,0.332295,0.519903
6,Germany,0.344195,0.504096
7,Greece,0.514415,0.296441
8,Indonesia,0.579634,0.227921
9,Italy,0.945915,0.004309
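
To read the table quickly, the rows can be filtered by p-value. The sketch below does this for the ten rows displayed above, using a significance level of 0.05. That threshold is an assumption, since the notebook does not state one.

```python
# (country, correlation coefficient, p-value) for the rows displayed above
rows = [
    ("Australia", 0.719339, 0.107102),
    ("Canada", -0.252046, 0.629937),
    ("China", 0.433619, 0.390337),
    ("Croatia", 0.188726, 0.720272),
    ("Denmark", -0.62331, 0.186118),
    ("France", 0.332295, 0.519903),
    ("Germany", 0.344195, 0.504096),
    ("Greece", 0.514415, 0.296441),
    ("Indonesia", 0.579634, 0.227921),
    ("Italy", 0.945915, 0.004309),
]

ALPHA = 0.05  # ASSUMED significance level

# Keep only the countries whose p-value falls below the threshold
significant = [country for country, r, p in rows if p < ALPHA]
print(significant)
```

Among the displayed rows only Italy passes this threshold. The Netherlands, computed in an earlier cell with a p-value of about 0.027, would also pass but is not among the ten rows shown.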


In [137]:
geo_pearson = pd.merge(df_geo, pearson_covid_port, on = 'country')
geo_pearson.head()

Unnamed: 0,country,ISO_A3,geometry,Correlation Coeff.,P-value
0,Australia,AUS,"MULTIPOLYGON (((158.86573 -54.74993, 158.83823...",0.719339,0.107102
1,Canada,CAN,"MULTIPOLYGON (((-65.61059 43.42817, -65.62881 ...",-0.252046,0.629937
2,China,CHN,"MULTIPOLYGON (((111.20460 15.77924, 111.19654 ...",0.433619,0.390337
3,Germany,DEU,"MULTIPOLYGON (((6.74220 53.57836, 6.74952 53.5...",0.344195,0.504096
4,Denmark,DNK,"MULTIPOLYGON (((11.25603 54.95458, 11.30348 54...",-0.62331,0.186118


In [139]:
# World map of the P-value per country; the correlation coefficient is shown on hover

fig = px.choropleth(geo_pearson, locations="ISO_A3",
                    color="P-value",
                    hover_name="country",
                    range_color=(0, 1),
                    hover_data={'Correlation Coeff.': ':.4f'},  # show the coefficient with 4 decimal places
                    color_continuous_scale=px.colors.sequential.Plasma)
fig.show()

In [48]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sb

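# Requires port_covid_new from the earlier cell; the NameError below shows this cell ran before it was defined
# numeric_only=True (pandas >= 1.5) restricts the correlation matrix to the numeric columns
pearsoncorr = port_covid_new.corr(method='pearson', numeric_only=True)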
pearsoncorr

NameError: name 'port_covid_new' is not defined