<a href="https://colab.research.google.com/github/shiftkey-labs/ProofsToPrograms-Workshop/blob/main/Activity.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 🚢 Maritime Data Taskforce
**Phase 2: The Implementation**

**The Challenge:**
You have just received a raw telemetry dataset of 1,000 global maritime voyages. The fleet is bleeding capital through inefficient sailing speeds, and looming environmental regulations threaten to ground vessels.

Your mandate is to mathematically model fleet dynamics, identify which ships are failing their regulatory efficiency targets, and quantify the financial impact of hull maintenance on the syndicate's bottom line.

## 🟢 Level 1: The Data Engineer (Easy)
**Focus:** Data Hygiene, Basic Vectorization, Boolean Logic.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# --- TASK 1: INGESTION & INSPECTION ---
# TODO: Load 'fleet_data.csv' into a variable named df
df = ...

# TODO: Print the shape of the dataframe
print("Data Shape:", ...)

# TODO: Print the number of missing values in the 'Cargo_Weight_Tons' column
print("Missing Cargo Weights:", ...)


# --- TASK 2: DATA HYGIENE (IMPUTATION) ---
# TODO: Calculate the median of 'Cargo_Weight_Tons'
median_cargo = ...

# TODO: Fill the NaN values in 'Cargo_Weight_Tons' with this median
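# Hint: Assign the result of .fillna(...) back to the column.
# Calling .fillna(..., inplace=True) on a selected column may not update df in recent pandas versions.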
...
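
The imputation pattern from Tasks 1 and 2 can be sketched on a small toy frame. This is only an illustration: the values below are made up, and `toy`, `missing_before`, and `missing_after` are hypothetical names that do not appear in `fleet_data.csv` or the workshop code.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the real values come from fleet_data.csv.
toy = pd.DataFrame({"Cargo_Weight_Tons": [100.0, np.nan, 300.0, 200.0, np.nan]})

# Count the NaN entries before imputation.
missing_before = toy["Cargo_Weight_Tons"].isna().sum()

# .median() skips NaN values by default.
median_cargo = toy["Cargo_Weight_Tons"].median()

# Assign the filled column back instead of relying on in-place modification.
toy["Cargo_Weight_Tons"] = toy["Cargo_Weight_Tons"].fillna(median_cargo)
missing_after = toy["Cargo_Weight_Tons"].isna().sum()

print(missing_before, median_cargo, missing_after)
```

The median is used rather than the mean because it is less sensitive to a few extreme cargo weights.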

In [None]:
# --- TASK 3: VECTORIZED LOGIC (BIOFOULING) ---
# We need to assign the Drag Factor (k) based on the Hull Status.
# TODO: Create a new column 'Drag_k' using np.where()
# Clean = _______, Fouled = ________
df['Drag_k'] = ...


# --- TASK 4: THE PHYSICS ENGINE (MASTER FORMULAS) ---
# Extracting columns as NumPy arrays for speed
# TODO: Replace each ... with the matching column name
k = df[...].values
D = df[...].values
v = df[...].values
W = df[...].values

# TODO: Calculate Total_Fuel and CII_Rating using the Math Team's formulas.
# Formula 1: F =
# Formula 2: CII =
df['Total_Fuel'] = ...
df['CII_Rating'] = ...

print(df[['Ship_ID', 'Total_Fuel', 'CII_Rating']].head())
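
As a rough sketch of the `np.where()` pattern in Task 3, the snippet below maps a status label to a number on toy data. The column name `Hull_Status` and the constants `K_CLEAN` and `K_FOULED` are assumptions made for illustration; the placeholder values are not the drag factors the exercise asks for.

```python
import numpy as np
import pandas as pd

# Hypothetical column name and labels, assumed for illustration.
toy = pd.DataFrame({"Hull_Status": ["Clean", "Fouled", "Clean"]})

# Placeholder values only, NOT the workshop's drag factors.
K_CLEAN, K_FOULED = 1.0, 2.0

# np.where(condition, value_if_true, value_if_false) evaluates the whole column at once.
toy["Drag_k"] = np.where(toy["Hull_Status"] == "Clean", K_CLEAN, K_FOULED)

print(toy)
```

Because the condition is evaluated for every row in one call, no Python-level loop over the ships is needed.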

In [None]:
# --- TASK 5: STATISTICAL ANOMALY DETECTION ---
# TODO: Calculate the Mean (mu) and Standard Deviation (sigma) of the 'CII_Rating' column.
mu = ...
sigma = ...

# TODO: Create a Boolean column called 'Is_Offender'.
# A ship is an offender if CII_Rating > mu + 1.5 * sigma.
df['Is_Offender'] = ...


# TODO: Print the total number of Offenders in the fleet.
# Hint: You can use .sum() on a boolean column.
print("Total Offenders:", ...)
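
The threshold rule in Task 5 can be tried out on a handful of made-up ratings. This is a minimal sketch with hypothetical values; `sigma_pop` and `n_offenders` are illustrative names. It also shows that pandas and NumPy use different defaults for the standard deviation, which can shift the threshold slightly.

```python
import numpy as np
import pandas as pd

# Hypothetical ratings, chosen so that one value stands out.
toy = pd.DataFrame({"CII_Rating": [1.0, 2.0, 3.0, 4.0, 10.0]})

mu = toy["CII_Rating"].mean()
sigma = toy["CII_Rating"].std()                # pandas default: sample std (ddof=1)
sigma_pop = np.std(toy["CII_Rating"].values)   # NumPy default: population std (ddof=0)

# Boolean column: True where the rating exceeds the threshold.
toy["Is_Offender"] = toy["CII_Rating"] > mu + 1.5 * sigma

# True counts as 1, so .sum() gives the number of offenders.
n_offenders = int(toy["Is_Offender"].sum())

print(mu, sigma, sigma_pop, n_offenders)
```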

## 🟡 Level 2: The Computational Scientist (Medium)
**Focus:** Advanced NumPy Math, Aggregation, Matrices, Visualization.

In [None]:
# --- TASK 6: THE OPTIMIZATION (CALCULUS IMPLEMENTATION) ---
Fuel_Price = 600
C_h = df['Hourly_Cost_USD'].values

# TODO: Calculate 'Optimal_Speed' using the Math Team's derivative formula.
# Formula: v_opt = CubeRoot( C_h / (2 * Fuel_Price * k) )
# Hint: Use np.cbrt()
df['Optimal_Speed'] = ...

# TODO: Calculate 'Financial_Waste'
# 1. Calculate what the fuel WOULD be at the Optimal_Speed.
fuel_opt = ...

# 2. Subtract 'fuel_opt' from 'Total_Fuel' (current fuel), and multiply by Fuel_Price.
df['Financial_Waste'] = ...

print(f"Total Fleet Financial Waste: ${df['Financial_Waste'].sum():,.2f}")
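
To see how `np.cbrt()` behaves on whole arrays, the optimal-speed formula quoted in Task 6 can be evaluated on two made-up ships. The `C_h` and `k` values below are hypothetical and were picked so the cube roots come out as round numbers.

```python
import numpy as np

Fuel_Price = 600                    # same constant as in the Task 6 cell
C_h = np.array([9600.0, 4050.0])    # hypothetical hourly costs (USD)
k = np.array([1.0, 0.125])          # hypothetical drag factors

# Formula from Task 6: v_opt = CubeRoot( C_h / (2 * Fuel_Price * k) ), applied elementwise.
v_opt = np.cbrt(C_h / (2 * Fuel_Price * k))

print(v_opt)
```

`np.cbrt()` is generally preferable to `x ** (1/3)` because it also handles negative inputs, where the power form returns `nan`.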

In [None]:
# --- TASK 7: ADVANCED FILTERING & SORTING ---
# TODO: Create a new dataframe containing ONLY ships where Type is "Tanker".
tankers = ...

# TODO: Sort this Tanker dataframe by 'Financial_Waste' in DESCENDING order.
tankers_sorted = ...

# TODO: Print the Ship_ID of the single most wasteful Tanker.
print("Most Wasteful Tanker ID:", ...)
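
The filter-then-sort pattern in Task 7 looks roughly like this on toy data. The ship IDs, the `"Bulk"` type, and the waste figures are invented for the example, and `worst_id` is a hypothetical name.

```python
import pandas as pd

# Hypothetical rows; the real data comes from the fleet dataframe.
toy = pd.DataFrame({
    "Ship_ID": ["S1", "S2", "S3", "S4"],
    "Type": ["Tanker", "Bulk", "Tanker", "Tanker"],
    "Financial_Waste": [500.0, 900.0, 1200.0, 300.0],
})

# A boolean mask keeps only the rows where the condition is True.
tankers = toy[toy["Type"] == "Tanker"]

# ascending=False puts the largest waste first.
tankers_sorted = tankers.sort_values("Financial_Waste", ascending=False)

# .iloc[0] selects the first row by position, regardless of the index labels.
worst_id = tankers_sorted.iloc[0]["Ship_ID"]

print(worst_id)
```

Sorting keeps the original index labels, which is why the first row is taken by position with `.iloc[0]` rather than by label.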




In [None]:
# --- TASK 8: MATRIX OPERATIONS (ROUTE PLANNING) ---
adj = np.load('network_matrix.npy')

# TODO: Perform Matrix Multiplication (Dot Product) of 'adj' with itself to find 2-stop routes.
two_stop_routes = ...

print("2-Stop Route Matrix:\n", two_stop_routes)
# Look at the printed matrix: the entry in row 0, column 3 counts the routes from Port 0 to Port 3.
# How many ways are there from Port 0 to Port 3?
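
To illustrate why multiplying an adjacency matrix by itself counts routes, here is a small made-up network of four ports. The matrix `adj_toy` is hypothetical and unrelated to `network_matrix.npy`; it assumes that entry `[i, j]` is 1 when there is a direct leg from port `i` to port `j`.

```python
import numpy as np

# Hypothetical network: 0 -> 1, 0 -> 2, 1 -> 3, 2 -> 3.
adj_toy = np.array([
    [0, 1, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
])

# Entry [i, j] of the product sums adj_toy[i, m] * adj_toy[m, j] over every
# intermediate port m, so it counts the routes from i to j made of two legs.
two_leg = adj_toy @ adj_toy

print(two_leg)
print("Routes from Port 0 to Port 3:", two_leg[0, 3])
```

The `@` operator and `np.dot()` give the same result for 2-D arrays; elementwise `*` would not, since it only multiplies matching entries.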

In [None]:
# --- TASK 9: VISUAL INTELLIGENCE (MATPLOTLIB) ---
plt.figure(figsize=(12, 5))

# Plot 1: Scatter Plot (Speed vs CII_Rating)
plt.subplot(1, 2, 1)
# TODO: Create an array of colors: 'red' if Is_Offender is True, 'blue' if False using np.where.
point_colors = ...
# TODO: Plot a scatter plot using Speed_Knots as X, CII_Rating as Y, and point_colors for the c= parameter.
plt.scatter(...)
plt.title('Speed vs Efficiency')
plt.xlabel('Speed (Knots)')
plt.ylabel('CII Rating')

# Plot 2: Histogram of Financial Waste
plt.subplot(1, 2, 2)
# TODO: Create a histogram of the 'Financial_Waste' column with 30 bins.
plt.hist(...)
plt.title('Financial Waste Distribution')
plt.xlabel('Waste (USD)')

plt.tight_layout()
plt.show()