# 🚚 Smart Logistics: IoT Data Simulation

This notebook simulates **IoT-generated data** for a **smart logistics tracking system**, inspired by how platforms like **Shopee**, **Lazada**, and **Lalamove** monitor their deliveries in the Philippines.

The simulation captures real-world shipment behavior by generating random, yet realistic, sensor data and delivery event details. The goal is to understand how logistics systems track environmental conditions and shipment events to maintain product quality and delivery performance.

---

## 📦 Objectives

✅ Simulate realistic shipment tracking and delivery records using Python.  
✅ Understand how **IoT sensors** monitor environmental data like temperature and humidity.  
✅ Include business-relevant details such as delay detection, alert flags, and shipment statuses.  
✅ Generate datasets for use in **visualization**, **analytics**, and potential smart contract or blockchain integrations.

In [4]:
import pandas as pd
import numpy as np
from datetime import datetime, timedelta
import random
import os
import json

Create the folder that will hold the output files:

In [5]:
os.makedirs("../data", exist_ok=True)

Each shipment record includes:

- **Timestamps**:
  - `timestamp`: when the data was recorded
  - `pickup_datetime`: when the item was picked up
  - `estimated_arrival`: when the item was expected to arrive
  - `actual_arrival`: when the item actually arrived

- **Identifiers**:
  - `shipment_id`: a tracking ID (`SHIP` plus four random digits; the simulation does not guarantee uniqueness)
  - `order_id`: the customer's original order number

- **Logistics Routing**:
  - `origin_hub`: where the item was dispatched from (e.g., Cebu Hub, QC Hub)
  - `destination_city`: delivery city (e.g., Taguig, Baguio)

- **Status & Delivery**:
  - `status`: current status of the shipment (e.g., Delivered, Delayed); drawn at random, independently of `logistics_delay`
  - `logistics_delay`: flag if actual arrival is later than estimated
  - `logistics_delay_reason`: reason for delay (if any)
  - `customer_notified`: whether the customer has been updated

- **Environmental Monitoring (IoT Simulation)**:
  - `temperature_c`: recorded temperature in °C (cold-chain packages)
  - `humidity_percent`: humidity level in %
  - `temperature_alert`: flagged if outside acceptable range (2.5–7.5°C)
  - `humidity_alert`: flagged if outside acceptable range (55–75%)

- **Package Details**:
  - `package_type`: type of item (e.g., Pouch, Small Box, Frozen Pack)
  - `delivery_rating`: customer rating after delivery (3, 4, or 5; empty if no rating was given)
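
The alert rules in the list above can be sketched as a small helper. The function name `flag_alerts` and the two constant names are illustrative and not part of the notebook; the thresholds are the ones stated in the field list.

```python
# Acceptable ranges taken from the field list above.
TEMP_RANGE_C = (2.5, 7.5)
HUMIDITY_RANGE_PCT = (55.0, 75.0)


def flag_alerts(temperature_c, humidity_percent):
    """Return both alert flags; a reading exactly on a boundary is not flagged."""
    temp_low, temp_high = TEMP_RANGE_C
    hum_low, hum_high = HUMIDITY_RANGE_PCT
    return {
        "temperature_alert": not (temp_low <= temperature_c <= temp_high),
        "humidity_alert": not (hum_low <= humidity_percent <= hum_high),
    }


print(flag_alerts(2.23, 53.55))  # both readings fall below their ranges
print(flag_alerts(6.28, 57.19))  # both readings are inside their ranges
```

Keeping the thresholds in named constants means the documentation and the check cannot drift apart silently.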

In [9]:
num_records = 100
data = []

for _ in range(num_records):
    # Generate timestamps
    timestamp = datetime.now() - timedelta(minutes=np.random.randint(0, 1440))
    pickup_datetime = timestamp - timedelta(hours=np.random.randint(1, 6))
    estimated_arrival = timestamp + timedelta(hours=np.random.randint(24, 49))
    actual_arrival = estimated_arrival + timedelta(minutes=np.random.randint(-60, 120))

    # Simulated IoT sensor readings
    temperature_c = round(np.random.uniform(2.0, 8.0), 2)
    humidity_percent = round(np.random.uniform(50.0, 80.0), 2)

    # Alert logic using native bool
    temperature_alert = bool(temperature_c < 2.5 or temperature_c > 7.5)
    humidity_alert = bool(humidity_percent < 55 or humidity_percent > 75)

    # Delay detection
    logistics_delay = actual_arrival > estimated_arrival
    logistics_delay_reason = (
        np.random.choice(["Heavy Traffic", "Flooded Roads", "Mechanical Issue", "Checkpoint Delay"])
        if logistics_delay else "None"
    )

    record = {
        "timestamp": timestamp.strftime("%Y-%m-%d %H:%M:%S"),
        "pickup_datetime": pickup_datetime.strftime("%Y-%m-%d %H:%M:%S"),
        "estimated_arrival": estimated_arrival.strftime("%Y-%m-%d %H:%M:%S"),
        "actual_arrival": actual_arrival.strftime("%Y-%m-%d %H:%M:%S"),
        "shipment_id": f"SHIP{np.random.randint(1000, 9999)}",
        "order_id": f"ORD{np.random.randint(100000, 999999)}",
        "origin_hub": np.random.choice(["Cebu Hub", "QC Hub", "Laguna Warehouse", "Davao Center"]),
        "destination_city": np.random.choice(["Taguig", "Cebu City", "Baguio", "Davao", "Naga", "Iloilo"]),
        "status": np.random.choice(["In Transit", "Delivered", "Delayed", "Out for Delivery"]),
        "temperature_c": temperature_c,
        "humidity_percent": humidity_percent,
        "temperature_alert": temperature_alert,
        "humidity_alert": humidity_alert,
        "package_type": np.random.choice(["Pouch", "Small Box", "Medium Box", "Large Box", "Envelope", "Frozen Pack"]),
        "logistics_delay": bool(logistics_delay),
        "logistics_delay_reason": logistics_delay_reason,
        "delivery_rating": random.choice([None, 3, 4, 5]),
        "customer_notified": bool(np.random.choice([True, False]))
    }

    data.append(record)

# Convert to DataFrame
df = pd.DataFrame(data)
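
The loop above draws each `shipment_id` independently and without a seed, so IDs can repeat and every run produces different data. If unique, repeatable IDs are wanted, one possible approach is to sample without replacement from a seeded generator. This is a sketch only: the helper name `make_shipment_ids` and the default seed are assumptions, and it uses the standard-library `random` module rather than the notebook's `np.random` calls.

```python
import random


def make_shipment_ids(count, seed=42):
    """Draw `count` distinct four-digit IDs; the same seed gives the same IDs."""
    rng = random.Random(seed)  # local generator, global random state is untouched
    # sample() draws without replacement, so no number can appear twice.
    numbers = rng.sample(range(1000, 10000), k=count)
    return [f"SHIP{n}" for n in numbers]


ids = make_shipment_ids(100)
print(len(ids), len(set(ids)))  # 100 100
```

Note that four digits allow at most 9,000 distinct IDs, so `count` must not exceed that.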

We’ll export the dataset in two formats:
- CSV for spreadsheet tools
- JSON for API and web usage

In [10]:
# Save to CSV and JSON (`json` is already imported in the first cell)
df.to_csv("../data/logistics_data.csv", index=False)

with open("../data/logistics_data.json", "w") as json_file:
    json.dump(data, json_file, indent=4)
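
One thing worth checking after export is how the `"None"` placeholder in `logistics_delay_reason` survives a CSV round trip. Recent pandas versions list the string `None` among the default NA markers of `read_csv`, so the placeholder may come back as a missing value. The sketch below uses a small in-memory CSV (not the notebook's files) and the `keep_default_na` and `na_values` parameters to keep the string intact; the variable names are illustrative.

```python
import io

import pandas as pd

csv_text = (
    "shipment_id,logistics_delay_reason\n"
    "SHIP1802,None\n"
    "SHIP7734,Flooded Roads\n"
)

# keep_default_na=False turns off the built-in NA markers;
# na_values=[""] still treats truly empty cells as missing.
df_check = pd.read_csv(io.StringIO(csv_text), keep_default_na=False, na_values=[""])
print(df_check["logistics_delay_reason"].tolist())
```

An alternative is to write an explicit label such as `"No Delay"` instead of `"None"`, which avoids the ambiguity without special read options.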

In [11]:
df.head()

| | timestamp | pickup_datetime | estimated_arrival | actual_arrival | shipment_id | order_id | origin_hub | destination_city | status | temperature_c | humidity_percent | temperature_alert | humidity_alert | package_type | logistics_delay | logistics_delay_reason | delivery_rating | customer_notified |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2025-05-18 15:46:54 | 2025-05-18 11:46:54 | 2025-05-20 14:46:54 | 2025-05-20 15:28:54 | SHIP7734 | ORD171327 | Davao Center | Naga | Delivered | 2.23 | 53.55 | True | True | Medium Box | True | Flooded Roads | 5.0 | False |
| 1 | 2025-05-18 09:41:54 | 2025-05-18 06:41:54 | 2025-05-19 09:41:54 | 2025-05-19 10:57:54 | SHIP4309 | ORD162097 | QC Hub | Taguig | Delivered | 6.28 | 57.19 | False | False | Envelope | True | Flooded Roads | 5.0 | False |
| 2 | 2025-05-18 13:45:54 | 2025-05-18 10:45:54 | 2025-05-19 20:45:54 | 2025-05-19 20:15:54 | SHIP1802 | ORD279145 | Davao Center | Baguio | Out for Delivery | 6.14 | 71.31 | False | False | Envelope | False | | 3.0 | False |
| 3 | 2025-05-18 05:57:54 | 2025-05-18 01:57:54 | 2025-05-19 06:57:54 | 2025-05-19 07:38:54 | SHIP9967 | ORD542396 | Cebu Hub | Iloilo | In Transit | 3.09 | 70.9 | False | False | Pouch | True | Heavy Traffic | 4.0 | False |
| 4 | 2025-05-18 01:22:54 | 2025-05-17 22:22:54 | 2025-05-19 16:22:54 | 2025-05-19 18:07:54 | SHIP3088 | ORD752230 | QC Hub | Baguio | Out for Delivery | 4.69 | 61.32 | False | False | Medium Box | True | Heavy Traffic | 3.0 | False |
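
The `logistics_delay` flag only says whether a shipment was late. The size of the delay can be recovered from the exported timestamp strings. A minimal sketch, assuming the `%Y-%m-%d %H:%M:%S` format used in the generation cell; the helper name `delay_minutes` is hypothetical, and the inputs are rows 0 and 2 of the preview above.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"  # same format string as in the generation cell


def delay_minutes(estimated_arrival, actual_arrival):
    """Minutes late (positive) or early (negative) relative to the estimate."""
    estimated = datetime.strptime(estimated_arrival, FMT)
    actual = datetime.strptime(actual_arrival, FMT)
    return (actual - estimated).total_seconds() / 60


print(delay_minutes("2025-05-20 14:46:54", "2025-05-20 15:28:54"))  # 42.0
print(delay_minutes("2025-05-19 20:45:54", "2025-05-19 20:15:54"))  # -30.0
```

A positive result corresponds to `logistics_delay` being `True`; a negative one means the shipment arrived ahead of the estimate.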


In [11]:
# Count the lines in the exported CSV file (header + data rows)
with open("../data/logistics_data.csv", "r") as file:
    line_count = sum(1 for line in file)

print(f"Total number of lines (including header): {line_count}")



Total number of lines (including header): 101
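
Counting physical lines matches the number of records only when no field contains an embedded newline. The simulated data has none, but a more robust count parses the file as CSV. The sketch below uses only the standard library and an in-memory sample (an assumption for illustration, not the notebook's file) in which one quoted field spans two lines.

```python
import csv
import io

sample = (
    "shipment_id,logistics_delay_reason\n"
    'SHIP1,"Flooded\nRoads"\n'  # quoted field containing a line break
    "SHIP2,None\n"
)

# Physical lines, as counted in the cell above.
line_count = sum(1 for _ in io.StringIO(sample))
# Parsed records; DictReader consumes the header row itself.
record_count = sum(1 for _ in csv.DictReader(io.StringIO(sample)))

print(line_count, record_count)  # 4 2
```

When reading a real file this way, open it with `newline=""` as the `csv` module documentation recommends.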


In [10]:
import json
import glob

# Count number of CSV files (optional)
csv_files = glob.glob("../data/*.csv")
print(f"Number of CSV files: {len(csv_files)}")

# Count number of records (data entries) in the JSON file
with open("../data/logistics_data.json", "r") as file:
    data = json.load(file)

print(f"Total number of data entries in JSON: {len(data)}")


Number of CSV files: 1
Total number of data entries in JSON: 100
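
Beyond counting entries, it can help to confirm that every JSON record carries the expected fields. A minimal sketch under stated assumptions: `REQUIRED_FIELDS` is a hypothetical subset of the 18 fields, `find_incomplete` is an illustrative helper, and the second sample record is deliberately cut short to trigger the check.

```python
import json

# Hypothetical subset of the schema; extend it to cover all fields as needed.
REQUIRED_FIELDS = {"shipment_id", "order_id", "status", "temperature_c", "humidity_percent"}


def find_incomplete(records):
    """Return {record index: sorted missing field names} for incomplete records."""
    problems = {}
    for index, record in enumerate(records):
        missing = REQUIRED_FIELDS - set(record)
        if missing:
            problems[index] = sorted(missing)
    return problems


sample_json = """[
  {"shipment_id": "SHIP7734", "order_id": "ORD171327", "status": "Delivered",
   "temperature_c": 2.23, "humidity_percent": 53.55},
  {"shipment_id": "SHIP4309", "order_id": "ORD162097", "status": "Delivered"}
]"""

print(find_incomplete(json.loads(sample_json)))
# {1: ['humidity_percent', 'temperature_c']}
```

An empty result means every record contains all required fields.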


In [17]:
# Show CSV Data Preview

import glob
import pandas as pd

# Count the number of CSV files in the ../data/ directory
csv_files = glob.glob("../data/*.csv")
print(f"Number of CSV files: {len(csv_files)}")

# Read CSV file
csv_path = "../data/logistics_data.csv"
df = pd.read_csv(csv_path)

# Count rows including header (rows + 1)
with open(csv_path, "r") as file:
    total_lines = sum(1 for line in file)

print(f"Total number of lines in '{csv_path}' (including header): {total_lines}")
print(f"Total number of data rows (excluding header): {len(df)}")

# Display the entire dataset
print("Entire CSV Data:")
print(df.to_string(index=False))  # Removes row numbers from output


Number of CSV files: 1
Total number of lines in '../data/logistics_data.csv' (including header): 101
Total number of data rows (excluding header): 100
Entire CSV Data:
          timestamp     pickup_datetime   estimated_arrival      actual_arrival shipment_id  order_id       origin_hub destination_city           status  temperature_c  humidity_percent  temperature_alert  humidity_alert package_type  logistics_delay logistics_delay_reason  delivery_rating  customer_notified
2025-05-18 15:46:54 2025-05-18 11:46:54 2025-05-20 14:46:54 2025-05-20 15:28:54    SHIP7734 ORD171327     Davao Center             Naga        Delivered           2.23             53.55               True            True   Medium Box             True          Flooded Roads              5.0              False
2025-05-18 09:41:54 2025-05-18 06:41:54 2025-05-19 09:41:54 2025-05-19 10:57:54    SHIP4309 ORD162097           QC Hub           Taguig        Delivered           6.28             57.19              False        

This dataset simulates a smart logistics system using IoT sensor inputs and shipment tracking logic.
You can now use this data for:

- Visualization (e.g., delays, alert rates)
- Machine learning experiments
- Blockchain or smart contract storage
- Business analysis of delivery performance and IoT alerting
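
As a starting point for the alert-rate and delay analysis mentioned above, the share of flagged records can be computed directly from the list of dictionaries. This is a sketch: the helper name `flag_rate` is an assumption, and the flags are copied from the five preview rows shown earlier in this notebook rather than from the full dataset.

```python
# Flags copied from the five preview rows shown earlier in this notebook.
preview = [
    {"temperature_alert": True, "humidity_alert": True, "logistics_delay": True},
    {"temperature_alert": False, "humidity_alert": False, "logistics_delay": True},
    {"temperature_alert": False, "humidity_alert": False, "logistics_delay": False},
    {"temperature_alert": False, "humidity_alert": False, "logistics_delay": True},
    {"temperature_alert": False, "humidity_alert": False, "logistics_delay": True},
]


def flag_rate(records, field):
    """Share of records whose boolean `field` is true (0.0 for an empty list)."""
    if not records:
        return 0.0
    return sum(1 for r in records if r[field]) / len(records)


for field in ("temperature_alert", "humidity_alert", "logistics_delay"):
    print(f"{field}: {flag_rate(preview, field):.0%}")
```

Five rows are far too few to draw conclusions from; the same function can be applied to the full list loaded from the JSON file.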