# Ontology #3: Supply Chain    
    

## Data generation approach

### Main Classes    
    
1. **Supplier**      
2. **Manufacturer**      
3. **Warehouse**      
4. **Retailer**      
5. **Product**      
6. **Order**      
7. **Shipment**      
8. **Invoice**      

Relationships link these classes: Supplier→Manufacturer, Manufacturer→Product, Warehouse→Product; Orders reference a Retailer and a seller (a Warehouse or a Manufacturer), etc.
    
### Generation Order & Rationale    
    
1. **Supplier**      
   - Independent. Later, we create a link “suppliesTo → Manufacturer.”    
    
2. **Manufacturer**      
   - Also a top-level entity. We link it back to suppliers.    
    
3. **Warehouse**      
   - Another independent entity. We link it to the products it stores.    
    
4. **Retailer**      
   - Also fairly standalone. Places orders with a warehouse or a manufacturer.
    
5. **Product**      
   - Created after Manufacturer (or at least alongside it), since the code stores a manufacturer→product link.
   - In practice we generate Manufacturers first, then Products, so that the “manufactures → Product” link can be set.
    
6. **Linking**: Supplier→Manufacturer, Manufacturer→Product, Warehouse→Product      
   - In code, we often do these “relationship builds” after the main lists are created.    
    
7. **Order**      
   - References a `RetailerID` and either a `WarehouseID` or a `ManufacturerID` as the seller.      
   - Hence we must have Retailers, Warehouses, and Manufacturers ready first.    
    
8. **Shipment**      
   - References the `OrderID`, plus a `shipperID` (Warehouse or Manufacturer).      
   - Therefore we create shipments after we have Orders in place.    
    
9. **Invoice**      
   - References the same parties as the order (billedBy = seller, billedTo = retailer).      
   - So it must come last, after we know which orders are valid and what the order amounts might be.    
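
The ordering above amounts to a dependency graph, so it can be derived rather than hand-maintained. The following is a minimal sketch using the standard library's `graphlib` (Python 3.9+). The `dependencies` dict is an illustrative restatement of the rationale, not part of the notebook, and the "Linking" node is a hypothetical stand-in for the relationship-building step.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each key maps to the set of steps that must be finished before it.
# These edges restate the rationale above; the dict itself is illustrative.
dependencies = {
    "Supplier": set(),
    "Manufacturer": set(),
    "Warehouse": set(),
    "Retailer": set(),
    "Product": {"Manufacturer"},
    "Linking": {"Supplier", "Manufacturer", "Warehouse", "Product"},
    "Order": {"Retailer", "Warehouse", "Manufacturer", "Linking"},
    "Shipment": {"Order"},
    "Invoice": {"Order"},
}

# static_order() yields every node after all of its prerequisites.
generation_order = list(TopologicalSorter(dependencies).static_order())
print(generation_order)
```

The graph only requires Shipment and Invoice to follow Order, so either may come first in the result; generating invoices last, as described above, is a convention rather than a hard dependency.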

## Implementation

In [1]:
import random
import pandas as pd
from faker import Faker
from datetime import datetime

In [2]:
data_path = "./data/"

In [3]:
fake = Faker()

In [4]:
# Configurable quantities
NUM_SUPPLIERS = 30
NUM_MANUFACTURERS = 20
NUM_WAREHOUSES = 40
NUM_RETAILERS = 50
NUM_PRODUCTS = 250
NUM_ORDERS = 1500
NUM_SHIPMENTS = 1200
NUM_INVOICES = 800

# Seed (optional): uncomment both lines for reproducible output
# random.seed(42)
# Faker.seed(42)

In [5]:
# 1. Generate Suppliers
suppliers = []
for i in range(NUM_SUPPLIERS):
    suppliers.append({
        "id": f"supplier_{i}",
        "supplierName": fake.company(),
        "location": fake.city(),
        "rating": round(random.uniform(2.0, 5.0), 2)  # rating out of 5
    })

In [6]:
# 2. Generate Manufacturers
manufacturers = []
for i in range(NUM_MANUFACTURERS):
    manufacturers.append({
        "id": f"manufacturer_{i}",
        "manufacturerName": fake.company() + " Manufacturing",
        "location": fake.city(),
        "capacity": random.randint(1000, 10000)  # monthly capacity
    })

In [7]:
# 3. Generate Warehouses (Distribution Centers)
warehouses = []
for i in range(NUM_WAREHOUSES):
    warehouses.append({
        "id": f"warehouse_{i}",
        "warehouseName": fake.company() + " Distribution",
        "location": fake.city(),
        "capacity": random.randint(5000, 20000)
    })

In [8]:
# 4. Generate Retailers
retailers = []
retailer_types = ["Online", "Brick-and-mortar", "Mixed"]
for i in range(NUM_RETAILERS):
    retailers.append({
        "id": f"retailer_{i}",
        "retailerName": fake.company() + " Retail",
        "location": fake.city(),
        "retailerType": random.choice(retailer_types)
    })

In [9]:
# 5. Generate Products
products = []
product_types = ["RawMaterial", "Component", "FinishedGood"]
for i in range(NUM_PRODUCTS):
    ptype = random.choice(product_types)
    products.append({
        "id": f"product_{i}",
        "productName": fake.catch_phrase(),
        "sku": f"SKU-{i}-{random.randint(100, 999)}",
        "productType": ptype,
        "unitPrice": round(random.uniform(1.0, 200.0), 2)
    })

In [10]:
# 6. Link Supplier -> Manufacturer
#    We'll say each supplier "suppliesTo" a few random manufacturers
for s in suppliers:
    num_mf = random.randint(1, 3)  # each supplier might supply 1-3 manufacturers
    s["manufacturerIDs"] = [m["id"] for m in random.sample(manufacturers, k=num_mf)]

# Only non-raw products ("Component" or "FinishedGood") can be manufactured or stored
non_raw_products = [p for p in products if p["productType"] != "RawMaterial"]

# 7. Link Manufacturer -> Product (they "manufacture" finished goods, maybe some components)
for m in manufacturers:
    # Suppose each manufacturer makes 5-15 products
    num_prod = random.randint(5, 15)
    m["productIDs"] = [prod["id"] for prod in random.sample(non_raw_products, k=num_prod)]

# 8. Link Warehouse -> Product (they "store" finished goods or components)
for w in warehouses:
    # Each warehouse stores 10-20 random products
    num_prod = random.randint(10, 20)
    w["productIDs"] = [prod["id"] for prod in random.sample(non_raw_products, k=num_prod)]
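
One caveat with the linking step: `random.sample` raises `ValueError` when `k` is larger than the population, which could happen if `NUM_PRODUCTS` is set very low. A minimal sketch of a guard follows; `sample_up_to` is a hypothetical helper name, not something the notebook defines.

```python
import random

def sample_up_to(population, k):
    """Return k distinct items, or every item (in random order) if fewer than k exist."""
    return random.sample(population, k=min(k, len(population)))

# Asking for 10 items from a 3-item list returns all 3 instead of raising.
candidates = ["product_0", "product_1", "product_2"]
picked = sample_up_to(candidates, 10)
print(picked)
```

With the default quantities in this notebook the guard should rarely matter; it is mainly useful when experimenting with small configurations.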

In [11]:
# 9. Generate Orders (Retailer -> Warehouse or Manufacturer)
orders = []
for i in range(NUM_ORDERS):
    order_id = f"order_{i}"
    # randomly choose if ordering from a Warehouse or a Manufacturer
    if random.random() < 0.7:
        seller = random.choice(warehouses)
        sellerType = "warehouse"
        sellerID = seller["id"]
    else:
        seller = random.choice(manufacturers)
        sellerType = "manufacturer"
        sellerID = seller["id"]

    # pick a random retailer
    buyer = random.choice(retailers)

    # each order will have 1-5 different products, drawn from the seller's
    # own product list (warehouses and manufacturers both carry "productIDs")
    possible_products = seller["productIDs"]

    if not possible_products:
        # fallback if the seller has no productIDs for some reason
        possible_products = [p["id"] for p in products if p["productType"] != "RawMaterial"]

    num_line_items = random.randint(1, min(5, len(possible_products)))
    line_items = random.sample(possible_products, k=num_line_items)

    # compute a total amount
    total_amount = 0.0
    for pid in line_items:
        product_obj = next((x for x in products if x["id"] == pid), None)
        if product_obj:
            # random quantity 1-50
            qty = random.randint(1, 50)
            total_amount += (product_obj["unitPrice"] * qty)

    order_data = {
        "id": order_id,
        "orderNumber": f"ON-{i}-{random.randint(1000, 9999)}",
        "orderDate": fake.date_between(start_date='-2y', end_date='today').isoformat(),
        "status": random.choice(["Pending", "Shipped", "Delivered", "Cancelled"]),
        "totalAmount": round(total_amount, 2),
        "sellerType": sellerType,
        "sellerID": sellerID,
        "retailerID": buyer["id"],
        "productIDs": line_items,  # simplistic: no separate qty, but you could store it
    }
    orders.append(order_data)
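
The order cell draws a quantity per product but only keeps the total. If per-line quantities are needed downstream, one option is to store them alongside each product ID. This is a sketch under assumptions: the `lineItems` structure, the `build_line_items` helper and the `price_by_id` index are hypothetical names, and the two-product catalogue is a stand-in for the generated `products` list.

```python
import random

# Toy catalogue standing in for the `products` list built earlier.
products = [
    {"id": "product_0", "unitPrice": 2.50},
    {"id": "product_1", "unitPrice": 10.00},
]

# One-time index so each price lookup is a dict access instead of a list scan.
price_by_id = {p["id"]: p["unitPrice"] for p in products}

def build_line_items(product_ids, max_qty=50):
    """Pair each product ID with a random quantity and the resulting line total."""
    items = []
    for pid in product_ids:
        qty = random.randint(1, max_qty)
        items.append({
            "productID": pid,
            "quantity": qty,
            "lineTotal": round(price_by_id[pid] * qty, 2),
        })
    return items

line_items = build_line_items(["product_0", "product_1"])
total_amount = round(sum(item["lineTotal"] for item in line_items), 2)
print(line_items, total_amount)
```

The dict index also replaces the `next(...)` scan over all products, which the order cell repeats for every line item.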

In [12]:
# 10. Generate Shipments (each shipment -> one non-cancelled order; the shipper is the order's seller)
shipments = []
# orders eligible for shipping: anything not "Cancelled" (computed once; the list does not change)
valid_orders = [o for o in orders if o["status"] != "Cancelled"]
for i in range(NUM_SHIPMENTS):
    shipment_id = f"shipment_{i}"
    if not valid_orders:
        break  # if all are cancelled, no shipments
    order_obj = random.choice(valid_orders)

    # the seller ships the order, whether it is a warehouse or a manufacturer
    shipperID = order_obj["sellerID"]
    # orderDate is stored as an ISO string, so parse it into a datetime
    # before passing it to fake.date_between()
    ship_date = fake.date_between(start_date=datetime.fromisoformat(order_obj["orderDate"]), end_date=datetime.today())
    shipments.append({
        "id": shipment_id,
        "shipmentID": f"SHIP-{i}-{random.randint(1000, 9999)}",
        "shipDate": ship_date.isoformat(),
        "carrier": random.choice(["UPS", "FedEx", "DHL", "USPS"]),
        "trackingNumber": f"TRK-{random.randint(1000000, 9999999)}",
        "orderID": order_obj["id"],
        "shipperID": shipperID
    })
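
The shipment cell has to parse the stored ISO string back into a date object before asking for a date between the order date and today. As a minimal standard-library sketch of the same idea (not Faker's implementation; `random_date_between` is a hypothetical helper):

```python
import random
from datetime import date, timedelta

def random_date_between(start, end):
    """Pick a uniformly random date in [start, end]; return start if end is not later."""
    span_days = (end - start).days
    if span_days <= 0:
        return start
    return start + timedelta(days=random.randint(0, span_days))

# Orders store dates as ISO strings, so parse first, then sample.
order_date = date.fromisoformat("2024-01-15")
ship_date = random_date_between(order_date, date(2024, 3, 1))
print(ship_date.isoformat())
```

Keeping the parse step explicit makes it clear that the value written to the records is a string while the value used for date arithmetic is a `date`.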

In [13]:
# 11. Generate Invoices (Warehouse or Manufacturer -> Retailer)
invoices = []
for i in range(NUM_INVOICES):
    invoice_id = f"invoice_{i}"
    # pick a random order to invoice; only "Shipped" or "Delivered" orders qualify
    valid_orders = [o for o in orders if o["status"] in ["Shipped", "Delivered"]]
    if not valid_orders:
        break

    order_obj = random.choice(valid_orders)
    # orderDate is stored as an ISO string, so parse it into a datetime
    # before passing it to fake.date_between()
    invoice_date = fake.date_between(start_date=datetime.fromisoformat(order_obj["orderDate"]), end_date=datetime.today())

    amount = order_obj["totalAmount"] * (1 + random.uniform(0.0, 0.2))  # add shipping or tax

    invoices.append({
        "id": invoice_id,
        "invoiceNumber": f"INV-{i}-{random.randint(1000, 9999)}",
        "invoiceDate": invoice_date.isoformat(),
        "amountDue": round(amount, 2),
        "dueDate": fake.date_between(start_date=invoice_date, end_date='+30d').isoformat(),
        # who billed it? same as the seller in the order
        "billedByID": order_obj["sellerID"],
        "billedByType": order_obj["sellerType"],
        "billedToID": order_obj["retailerID"]
    })
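
In the invoice cell, the `'+30d'` upper bound for `dueDate` appears to be measured from the current date rather than from the invoice date, so an old invoice can receive a due date many months after it was issued. If payment terms should be tied to the invoice itself, a sketch along these lines may be closer to the intent. The `due_date_for` helper and the 30-day default are assumptions for illustration.

```python
import random
from datetime import date, timedelta

def due_date_for(invoice_date, max_terms_days=30):
    """Return a due date between 1 and max_terms_days days after the invoice date."""
    return invoice_date + timedelta(days=random.randint(1, max_terms_days))

invoice_date = date(2024, 2, 10)
due_date = due_date_for(invoice_date)
print(invoice_date.isoformat(), "->", due_date.isoformat())
```

Whether this is preferable depends on how the downstream ontology interprets `dueDate`; the original behaviour is still valid synthetic data, just with looser payment terms.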

In [14]:
# Summary: Show counts
print("Generated:")
print(" Suppliers:", len(suppliers))
print(" Manufacturers:", len(manufacturers))
print(" Warehouses:", len(warehouses))
print(" Retailers:", len(retailers))
print(" Products:", len(products))
print(" Orders:", len(orders))
print(" Shipments:", len(shipments))
print(" Invoices:", len(invoices))

Generated:
 Suppliers: 30
 Manufacturers: 20
 Warehouses: 40
 Retailers: 50
 Products: 250
 Orders: 1500
 Shipments: 1200
 Invoices: 800


In [15]:
# Print a few sample records
print("\nSample Supplier:", suppliers[0])
print("Sample Manufacturer:", manufacturers[0])
print("Sample Warehouse:", warehouses[0])
print("Sample Retailer:", retailers[0])
print("Sample Product:", products[0])
print("Sample Order:", orders[0])
print("Sample Shipment:", shipments[0] if shipments else "No shipments generated.")
print("Sample Invoice:", invoices[0] if invoices else "No invoices generated.")


Sample Supplier: {'id': 'supplier_0', 'supplierName': 'Lee Ltd', 'location': 'East Annette', 'rating': 4.13, 'manufacturerIDs': ['manufacturer_1']}
Sample Manufacturer: {'id': 'manufacturer_0', 'manufacturerName': 'Elliott and Sons Manufacturing', 'location': 'Taylorchester', 'capacity': 8902, 'productIDs': ['product_65', 'product_208', 'product_202', 'product_38', 'product_58', 'product_189', 'product_191', 'product_32', 'product_249', 'product_197', 'product_147', 'product_215', 'product_70']}
Sample Warehouse: {'id': 'warehouse_0', 'warehouseName': 'Ross-Jones Distribution', 'location': 'Kyleview', 'capacity': 8892, 'productIDs': ['product_246', 'product_2', 'product_97', 'product_238', 'product_169', 'product_227', 'product_188', 'product_214', 'product_103', 'product_176', 'product_151', 'product_133', 'product_190', 'product_8', 'product_126']}
Sample Retailer: {'id': 'retailer_0', 'retailerName': 'Smith, Neal and Nunez Retail', 'location': 'Wattshaven', 'retailerType': 'Online'}

In [16]:
# make sure the output directory exists, then persist the data; use " as the escape char
import os
os.makedirs(data_path, exist_ok=True)
pd.DataFrame(suppliers).to_csv(data_path+"suppliers.csv", encoding = "utf-8", escapechar = "\"", index=False)
pd.DataFrame(manufacturers).to_csv(data_path+"manufacturers.csv", encoding = "utf-8", escapechar = "\"", index=False)
pd.DataFrame(warehouses).to_csv(data_path+"warehouses.csv", encoding = "utf-8", escapechar = "\"", index=False)
pd.DataFrame(retailers).to_csv(data_path+"retailers.csv", encoding = "utf-8", escapechar = "\"", index=False)
pd.DataFrame(products).to_csv(data_path+"products.csv", encoding = "utf-8", escapechar = "\"", index=False)
pd.DataFrame(orders).to_csv(data_path+"orders.csv", encoding = "utf-8", escapechar = "\"", index=False)
pd.DataFrame(shipments).to_csv(data_path+"shipments.csv", encoding = "utf-8", escapechar = "\"", index=False)
pd.DataFrame(invoices).to_csv(data_path+"invoices.csv", encoding = "utf-8", escapechar = "\"", index=False)
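
Several records carry list-valued fields (`manufacturerIDs`, `productIDs`). When written to CSV, a list is stored as its text form, so reading the file back yields a string such as `"['product_1', 'product_2']"` rather than a list. The sketch below shows the round trip using only the standard library, with an in-memory buffer standing in for the CSV file so that it runs without the generated data.

```python
import ast
import csv
import io

# Write one record whose productIDs value is a Python list, as in `orders`.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["id", "productIDs"])
writer.writeheader()
writer.writerow({"id": "order_0", "productIDs": ["product_1", "product_2"]})

# Reading it back yields the list's text form, not a list.
buffer.seek(0)
row = next(csv.DictReader(buffer))
raw_value = row["productIDs"]

# ast.literal_eval parses that text back into a list without executing code.
product_ids = ast.literal_eval(raw_value)
print(type(raw_value).__name__, "->", product_ids)
```

When loading with pandas, the same parse can be applied per column through the `converters` argument of `pd.read_csv`, for example `converters={"productIDs": ast.literal_eval}`.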