# Contents

- [1. Preparation](#1)
- [2. Define Imports and Variables](#2)
- [3. Read Most Recent Ship Register](#3)
- [4. Read Ship Register from archive](#4)
- [5. Spark Stop](#5)

# 1. Preparation <a class="anchor" id="1"></a>

## 1.1 Kernel
- Choose config template *ais-tt* in Ocean Spark.
- Advanced users may request their own config template, provided the following configuration is added:
    - If configuring through an Ocean Spark config template, add "spark.sql.parquet.enableVectorizedReader": "false". The configuration should be requested from 
    - If configuring through a Jupyter Notebook, add spark.conf.set("spark.sql.parquet.enableVectorizedReader", "false")
    
    
- After choosing your kernel, wait for the kernel to turn "Idle" (empty circle)

## 1.2 Spark Session
A Spark session named ```spark``` is already built for you, based on the configuration of your chosen template.

In [1]:
spark

## 1.3 Ship Register Data Structure

Please see: https://code.officialstatistics.org/trade-task-team-phase-1/samplecode/-/wikis/4.-Ships-Register-Data

# 2. Define Imports and Variables <a class="anchor" id="2"></a>

In [2]:
import pyspark.sql.functions as F
import pandas as pd

In [3]:
basepath = "s3a://ungp-ais-data-historical-backup/register/"
basepath_archive = "s3a://ungp-ais-data-historical-backup/register-archive/"

# 3. Read Most Recent Ship Register <a class="anchor" id="3"></a>

## 3.1 Get Version

In [4]:
df = spark.read.load(basepath+"version.csv", 
                     format="csv", sep=",", inferSchema="true")
df.show()

+--------+
|     _c0|
+--------+
|20231030|
+--------+
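
The version value looks like a date in `yyyymmdd` form. The following is a minimal sketch, assuming that reading is correct, for turning it into a `datetime.date`. The helper name `parse_register_version` is hypothetical and not part of the notebook.

```python
from datetime import date, datetime


def parse_register_version(raw):
    """Parse a register version such as 20231030 (assumed yyyymmdd) into a date."""
    # inferSchema reads the value as an integer, so normalise it to a string first.
    return datetime.strptime(str(raw).strip(), "%Y%m%d").date()


# In the notebook, the raw value could be taken from the DataFrame with df.first()[0].
version_date = parse_register_version(20231030)
print(version_date.isoformat())  # 2023-10-30
```

Parsing the value this way also fails loudly with a `ValueError` if the file ever contains something that is not a valid date.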



## 3.2 Read Ship Data (Fact) Table

In [5]:
df = spark.read.load(basepath+ "ShipData.CSV", 
                     format="csv", sep=",", inferSchema="true", header="true")

In [7]:
df.printSchema()

root
 |-- LRIMOShipNo: integer (nullable = true)
 |-- StatCode5: string (nullable = true)
 |-- AlterationsDescriptiveNarrative: string (nullable = true)
 |-- PropulsionTypeCode: string (nullable = true)
 |-- ShipName: string (nullable = true)
 |-- ExName: string (nullable = true)
 |-- MaritimeMobileServiceIdentityMMSINumber: integer (nullable = true)
 |-- RegisteredOwnerCode: integer (nullable = true)
 |-- RegisteredOwnerCountryOfRegistration: string (nullable = true)
 |-- RegisteredOwnerCountryofDomicile: string (nullable = true)
 |-- ShipManagerCompanyCode: integer (nullable = true)
 |-- ShipManagerCountryOfRegistration: string (nullable = true)
 |-- ShipManagerCountryofDomicileName: string (nullable = true)
 |-- GroupBeneficialOwnerCompanyCode: integer (nullable = true)
 |-- GroupBeneficialOwnerCountryOfRegistration: string (nullable = true)
 |-- GroupBeneficialOwnerCountryofDomicile: string (nullable = true)
 |-- OperatorCompanyCode: integer (nullable = true)
 |-- OperatorCountryOf

In [6]:
df.count()

254235

In [8]:
ship_data = df.toPandas()
ship_data.head()

Unnamed: 0,LRIMOShipNo,StatCode5,AlterationsDescriptiveNarrative,PropulsionTypeCode,ShipName,ExName,MaritimeMobileServiceIdentityMMSINumber,RegisteredOwnerCode,RegisteredOwnerCountryOfRegistration,RegisteredOwnerCountryofDomicile,...,PropulsionType,ShipStatus,ShiptypeLevel5,TotalBunkerCapacity,TotalHorsepowerofAuxiliaryGenerators,TotalHorsepowerofMainEngines,TotalHorsepowerofMainGenerators,TotalKilowattsofMainEngines,TotalPowerOfAllEngines,TotalPowerOfAuxiliaryEngines
0,1000019,X11A2YP,,DD,LADY K II,Princess Tanya,,5976406,Netherlands,Netherlands,...,"Oil Engine(s), Direct Drive",In Service/Commission,Yacht,,700.0,1680.0,,1236.0,1236,
1,1000021,X11A2YP,,DG,MONTKAJ,,,6012336,Cayman Islands,Cayman Islands,...,"Oil Engine(s), Geared Drive",In Service/Commission,Yacht,,872.0,5052.0,,3716.0,4466,750.0
2,1000033,X11A2YP,,DG,ASTRALIUM,,234028000.0,3019278,Jersey,Jersey,...,"Oil Engine(s), Geared Drive",In Service/Commission,Yacht,,120.0,1034.0,,760.0,850,90.0
3,1000045,X11A2YP,,DG,OKTANA,,239488000.0,5019446,Greece,Greece,...,"Oil Engine(s), Geared Drive",In Service/Commission,Yacht,,188.0,4568.0,,3360.0,3510,150.0
4,1000057,X11A2YP,,DD,LIMA,,,3015933,Saudi Arabia,Saudi Arabia,...,"Oil Engine(s), Direct Drive",Broken Up,Yacht,,,2196.0,,1616.0,1616,
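
In the pandas preview, `MaritimeMobileServiceIdentityMMSINumber` appears as a float (for example `234028000.0`) even though Spark inferred an integer, because the column contains missing values. One possible clean-up, assuming pandas' nullable `Int64` dtype is acceptable downstream, is sketched below. The sample frame is a stand-in built from rows shown above, and `mmsi_as_nullable_int` is a hypothetical helper.

```python
import pandas as pd

MMSI_COL = "MaritimeMobileServiceIdentityMMSINumber"

# Stand-in for ship_data, using values from the preview above.
sample = pd.DataFrame({
    "LRIMOShipNo": [1000019, 1000033, 1000045],
    MMSI_COL: [None, 234028000.0, 239488000.0],
})


def mmsi_as_nullable_int(frame, column=MMSI_COL):
    """Return a copy with the MMSI column stored as nullable integers."""
    out = frame.copy()
    out[column] = out[column].astype("Int64")  # missing values become <NA>
    return out


cleaned = mmsi_as_nullable_int(sample)
print(cleaned[MMSI_COL].tolist())
```

The same call could be applied to the real frame as `mmsi_as_nullable_int(ship_data)`.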


## 3.3 Read Ship Type Codes (Dimension) Table

In [9]:
df = spark.read.load(basepath + "tblShipTypeCodes.CSV", 
                     format="csv", sep=",", inferSchema="true", header="true")

In [10]:
df.printSchema()

root
 |-- StatCode5: string (nullable = true)
 |-- ShiptypeLevel5: string (nullable = true)
 |-- Level4Code: string (nullable = true)
 |-- ShipTypeLevel4: string (nullable = true)
 |-- Level3Code: string (nullable = true)
 |-- ShipTypeLevel3: string (nullable = true)
 |-- Level2Code: string (nullable = true)
 |-- ShipTypeLevel2: string (nullable = true)
 |-- ShipTypeLevel1Code: string (nullable = true)
 |-- ShiptypeLevel1: string (nullable = true)
 |-- HullType: string (nullable = true)
 |-- SubGroup: string (nullable = true)
 |-- SubType: string (nullable = true)



In [None]:
df.count()

In [None]:
ship_type_codes = df.toPandas()
ship_type_codes.head()
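
The fact and dimension tables share the `StatCode5` column, so the higher ship type levels can be attached to each ship with a left join. The sketch below assumes the join is done in pandas after `toPandas()`. It uses small hypothetical stand-in frames rather than real register rows, and `add_ship_type_levels` is an illustrative helper name. Only a few dimension columns are selected, since the ship data preview already includes `ShiptypeLevel5`.

```python
import pandas as pd

# Hypothetical stand-ins for ship_data and ship_type_codes (not real register rows).
ship_data_demo = pd.DataFrame({
    "LRIMOShipNo": [1, 2, 3],
    "StatCode5": ["A11A2TN", "A11B2TG", "UNKNOWN"],
})
ship_type_codes_demo = pd.DataFrame({
    "StatCode5": ["A11A2TN", "A11B2TG"],
    "ShipTypeLevel2": ["Tankers", "Tankers"],
    "ShiptypeLevel1": ["Cargo Carrying", "Cargo Carrying"],
})


def add_ship_type_levels(ships, type_codes, levels=("ShipTypeLevel2", "ShiptypeLevel1")):
    """Left-join selected ship type levels onto the ship table via StatCode5."""
    lookup = type_codes[["StatCode5", *levels]].drop_duplicates("StatCode5")
    # validate="many_to_one" raises if StatCode5 is not unique in the lookup.
    return ships.merge(lookup, on="StatCode5", how="left", validate="many_to_one")


enriched = add_ship_type_levels(ship_data_demo, ship_type_codes_demo)
print(enriched)
```

A left join keeps every ship, so codes missing from the dimension table show up as empty level columns instead of silently dropping rows.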

# 4. Read Ship Register from archive <a class="anchor" id="4"></a>

## 4.1 Get Version

In [None]:
df = spark.read.load(basepath_archive+"versions.csv", 
                     format="csv", sep=",", inferSchema="true")
df.show()
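
Section 4.2 addresses an archived snapshot through a folder of the form `yyyy/yyyymmdd`. A small helper can build that folder from a version value instead of typing it by hand. This is a sketch under the assumption that every archived version follows that layout; `archive_version_path` is a hypothetical name.

```python
from datetime import date

BASEPATH_ARCHIVE = "s3a://ungp-ais-data-historical-backup/register-archive/"


def archive_version_path(version, base=BASEPATH_ARCHIVE):
    """Build '<base>/yyyy/yyyymmdd/' for a version given as a date, int or str."""
    if isinstance(version, date):
        stamp = version.strftime("%Y%m%d")
    else:
        stamp = str(version).strip()
    if len(stamp) != 8 or not stamp.isdigit():
        raise ValueError(f"expected yyyymmdd, got {version!r}")
    # Strip the trailing slash so the result never contains '//'.
    return f"{base.rstrip('/')}/{stamp[:4]}/{stamp}/"


print(archive_version_path(20210609))
# s3a://ungp-ais-data-historical-backup/register-archive/2021/20210609/
```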

## 4.2 Read Ship Data (Fact) Table from a specific version

In [15]:
# Format is yyyy/yyyymmdd
basepath_archive_version = basepath_archive + '2021/20210609/'
df = spark.read.load(basepath_archive_version + "ShipData.CSV", 
                     format="csv", sep=",", inferSchema="true", header="true")
ship_data = df.toPandas()
ship_data.head()

Unnamed: 0,LRIMOShipNo,StatCode5,AlterationsDescriptiveNarrative,PropulsionTypeCode,ShipName,ExName,MaritimeMobileServiceIdentityMMSINumber,RegisteredOwnerCode,RegisteredOwnerCountryOfRegistration,RegisteredOwnerCountryofDomicile,...,PropulsionType,ShipStatus,ShiptypeLevel5,TotalBunkerCapacity,TotalHorsepowerofAuxiliaryGenerators,TotalHorsepowerofMainEngines,TotalHorsepowerofMainGenerators,TotalKilowattsofMainEngines,TotalPowerOfAllEngines,TotalPowerOfAuxiliaryEngines
0,1000019,X11A2YP,,DD,LADY K II,Princess Tanya,,5976406,Netherlands,Netherlands,...,"Oil Engine(s), Direct Drive",In Service/Commission,Yacht,,700.0,1680.0,,1236.0,1236,
1,1000021,X11A2YP,,DG,MONTKAJ,,,6012336,Cayman Islands,Cayman Islands,...,"Oil Engine(s), Geared Drive",In Service/Commission,Yacht,,872.0,5052.0,,3716.0,4466,750.0
2,1000033,X11A2YP,,DG,ASTRALIUM,,234028000.0,3019278,Jersey,Jersey,...,"Oil Engine(s), Geared Drive",In Service/Commission,Yacht,,120.0,1034.0,,760.0,850,90.0
3,1000045,X11A2YP,,DG,OKTANA,,239488000.0,5019446,Greece,Greece,...,"Oil Engine(s), Geared Drive",In Service/Commission,Yacht,,188.0,4568.0,,3360.0,3510,150.0
4,1000057,X11A2YP,,DD,LIMA,,403070000.0,3015933,Saudi Arabia,Saudi Arabia,...,"Oil Engine(s), Direct Drive",Laid-Up,Yacht,,,2196.0,,1616.0,1616,
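
Comparing this archived preview with the most recent one in section 3.2 shows at least one change: ship 1000057 (LIMA) is `Laid-Up` here and `Broken Up` in the later version. The sketch below shows how such changes could be listed with pandas, assuming both snapshots have been converted with `toPandas()`. The two frames are small stand-ins built from the previews, and `status_changes` is a hypothetical helper.

```python
import pandas as pd

# Stand-ins for the archived and the most recent ship_data frames.
old = pd.DataFrame({
    "LRIMOShipNo": [1000019, 1000057],
    "ShipStatus": ["In Service/Commission", "Laid-Up"],
})
new = pd.DataFrame({
    "LRIMOShipNo": [1000019, 1000057],
    "ShipStatus": ["In Service/Commission", "Broken Up"],
})


def status_changes(old_frame, new_frame, key="LRIMOShipNo", column="ShipStatus"):
    """List ships present in both snapshots whose value in `column` differs."""
    merged = old_frame[[key, column]].merge(
        new_frame[[key, column]], on=key, how="inner", suffixes=("_old", "_new")
    )
    changed = merged[merged[f"{column}_old"] != merged[f"{column}_new"]]
    return changed.reset_index(drop=True)


changes = status_changes(old, new)
print(changes)
```

An inner join restricts the comparison to ships that appear in both versions; ships added or removed between snapshots would need a separate outer join.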


## 4.3 Read Ship Type Codes (Dimension) Table

In [16]:
df = spark.read.load(basepath_archive_version + "tblShipTypeCodes.CSV", 
                     format="csv", sep=",", inferSchema="true", header="true")

ship_type_codes = df.toPandas()
ship_type_codes.head()

Unnamed: 0,StatCode5,ShiptypeLevel5,Level4Code,ShipTypeLevel4,Level3Code,ShipTypeLevel3,Level2Code,ShipTypeLevel2,ShipTypeLevel1Code,ShiptypeLevel1,HullType,SubGroup,SubType
0,A11A2TN,LNG Tanker,A11A,LNG Tanker,A11,Liquefied Gas,A1,Tankers,A,Cargo Carrying,Ship Shape Including Multi-Hulls,Petroleum Products,LNG Tanker
1,A11A2TQ,CNG Tanker,A11A,LNG Tanker,A11,Liquefied Gas,A1,Tankers,A,Cargo Carrying,Ship Shape Including Multi-Hulls,Petroleum Products,CNG Tanker
2,A11A2TZ,Combination Gas Tanker (LNG/LPG),A11A,LNG Tanker,A11,Liquefied Gas,A1,Tankers,A,Cargo Carrying,Ship Shape Including Multi-Hulls,Petroleum Products,Combination Gas Tanker
3,A11B2TG,LPG Tanker,A11B,LPG Tanker,A11,Liquefied Gas,A1,Tankers,A,Cargo Carrying,Ship Shape Including Multi-Hulls,Petroleum Products,LPG Tanker
4,A11B2TH,LPG/Chemical Tanker,A11B,LPG Tanker,A11,Liquefied Gas,A1,Tankers,A,Cargo Carrying,Ship Shape Including Multi-Hulls,Petroleum Products,LPG/Chemical Tanker


# 5. Spark Stop <a class="anchor" id="5"></a>

In [20]:
spark.stop()