# Schema Analysis of Supply Chain Tables

This notebook documents the schemas of the tables stored under the S3 prefix `s3://altdatasetexfil/dnet/backfill/`.

## Available Tables

1. `gmp_shipment_events_na`
2. `induct_events_na`
3. `o_slam_packages_leg_live`
4. `o_slam_packages_live`
5. `package_systems_event_na`

Each table schema is detailed below.

## 1. gmp_shipment_events_na

This table contains shipment tracking events.

```sql
CREATE TABLE gmp_shipment_events_na (
  shipment_type VARCHAR,
  sender_id VARCHAR,
  standard_carrier_alpha_code VARCHAR,
  tracking_id VARCHAR,
  old_tracking_id VARCHAR,
  sub_tracking_id VARCHAR,
  ship_track_event_code VARCHAR,
  ship_track_normalized_carrier_name VARCHAR,
  supplement_code VARCHAR,
  pick_up_date TIMESTAMP,
  pick_up_by_date TIMESTAMP,
  tcda_container_id VARCHAR,
  parent_container_id VARCHAR,
  parent_container_type VARCHAR,
  available_payment_methods VARCHAR,
  dimension_uom VARCHAR,
  width DECIMAL,
  length DECIMAL,
  height DECIMAL,
  pickup_location_name VARCHAR,
  pickup_location_id VARCHAR,
  pickup_id VARCHAR,
  pickup_due_date TIMESTAMP,
  pickup_due_date_timezone VARCHAR,
  pickup_address_city VARCHAR,
  pickup_address_state VARCHAR,
  pickup_address_country_code VARCHAR,
  pickup_address_id VARCHAR,
  pickup_location_directions VARCHAR,
  pickup_location_open_hrs VARCHAR,
  actual_tax_deducted VARCHAR,
  clearance_id VARCHAR,
  declare_number VARCHAR,
  return_code VARCHAR,
  delivery_location_code VARCHAR,
  status_node_id VARCHAR,
  load_id VARCHAR,
  store_chain_store_id VARCHAR,
  store_chain_owner_id VARCHAR,
  recipient_address_city VARCHAR,
  recipient_address_state VARCHAR,
  recipient_address_country_code VARCHAR,
  recipient_address_id VARCHAR,
  customer_receipt_token VARCHAR,
  status_date TIMESTAMP,
  status_date_timezone VARCHAR,
  carrier_url VARCHAR,
  current_address_city VARCHAR,
  current_address_state VARCHAR,
  current_address_country_code VARCHAR,
  current_address_id VARCHAR,
  edi_standard_name VARCHAR,
  current_facility_id VARCHAR,
  status_code VARCHAR,
  reason_code VARCHAR,
  status_classification VARCHAR,
  estimated_pickup_date TIMESTAMP,
  is_status_duplicate VARCHAR,
  is_carrier_api_scan VARCHAR,
  fulfillment_reference_id VARCHAR,
  marketplace_id VARCHAR,
  amazon_bar_code VARCHAR,
  fulfillment_shipment_id VARCHAR,
  unnormalized_ship_method VARCHAR,
  estimated_arrival_date TIMESTAMP,
  promised_arrival_date TIMESTAMP,
  tss_ship_date TIMESTAMP,
  package_id VARCHAR,
  business_unit VARCHAR,
  destination_address_city VARCHAR,
  destination_address_state VARCHAR,
  destination_address_country_code VARCHAR,
  destination_address_id VARCHAR,
  fulfillment_center_id VARCHAR,
  customer_id VARCHAR,
  order_id VARCHAR,
  business_transit_time VARCHAR,
  ship_option VARCHAR,
  ship_method VARCHAR,
  actual_delivery_date TIMESTAMP,
  attempted_delivery_date TIMESTAMP,
  is_export_charge_prepaid VARCHAR,
  is_virtual_scan VARCHAR,
  access_point_id VARCHAR,
  seller_id VARCHAR,
  transport_shipment_id VARCHAR,
  bill_of_lading_number VARCHAR,
  destination_fc VARCHAR,
  amazon_reference_number VARCHAR,
  additional_reference_number VARCHAR,
  tender_id VARCHAR,
  pallet_quantity VARCHAR,
  carton_quantity VARCHAR,
  original_label_name VARCHAR,
  trailer_number VARCHAR,
  status_file_name VARCHAR,
  status_ref_target_type VARCHAR,
  predicted_delivery_date TIMESTAMP,
  estimated_delivery_date TIMESTAMP,
  posting_date TIMESTAMP,
  transport_shipment_items VARCHAR,
  origin_address_city VARCHAR,
  origin_address_state VARCHAR,
  origin_address_country_code VARCHAR,
  origin_address_id VARCHAR,
  sub_carrier VARCHAR,
  comments VARCHAR,
  appt_window_start_date TIMESTAMP,
  appt_window_end_date TIMESTAMP,
  manifest_id VARCHAR,
  service_type VARCHAR,
  dw_created_time TIMESTAMP,
  unloaded_at TIMESTAMP
);
```

### Key Fields
- `tracking_id`: Primary tracking identifier for shipments
- `ship_track_event_code`: Type of tracking event
- `status_code`: Shipment status reported by this event
- `status_date`: Timestamp when the status was recorded
- `package_id`: Package identifier
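
A common first step with an event table like this is to reduce it to the latest event per shipment. The sketch below does that in pandas by sorting on `status_date` and keeping the last row per `tracking_id`. It assumes the table is already loaded into a DataFrame with `status_date` parsed as a datetime; the function name and the toy rows (including the status codes) are hypothetical, and it does not try to interpret `is_status_duplicate`, whose values are not documented here.

```python
import pandas as pd


def latest_status_per_tracking_id(events: pd.DataFrame) -> pd.DataFrame:
    """Return the most recent event (by status_date) for each tracking_id."""
    cols = ["tracking_id", "status_code", "status_date", "package_id"]
    ordered = (
        events[cols]
        .dropna(subset=["status_date"])  # undated events cannot be ordered
        .sort_values(["tracking_id", "status_date"])
    )
    # After an ascending sort, the last row per tracking_id is its latest event.
    return ordered.drop_duplicates(subset="tracking_id", keep="last").reset_index(drop=True)


# Toy rows for illustration only; identifiers and status codes are hypothetical.
events = pd.DataFrame(
    {
        "tracking_id": ["T1", "T1", "T2"],
        "status_code": ["PICKED_UP", "DELIVERED", "IN_TRANSIT"],
        "status_date": pd.to_datetime(
            ["2025-06-01 08:00", "2025-06-02 14:30", "2025-06-01 09:15"]
        ),
        "package_id": ["P1", "P1", "P2"],
    }
)
print(latest_status_per_tracking_id(events))
```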

## 2. induct_events_na

This table contains package induction events.

```sql
CREATE TABLE induct_events_na (
  entity_id VARCHAR,
  entity_id_type VARCHAR,
  entity_type VARCHAR,
  event_type VARCHAR,
  event_reason VARCHAR,
  destination_id VARCHAR,
  destination_type VARCHAR,
  operation_node_id VARCHAR,
  operator_id VARCHAR,
  operator_type VARCHAR,
  event_time TIMESTAMP,
  is_routing_update_req VARCHAR,
  is_conveyed VARCHAR,
  chute_id VARCHAR,
  route_id VARCHAR,
  cycle_id VARCHAR,
  route_code VARCHAR,
  package_category VARCHAR,
  dw_created_time TIMESTAMP,
  tracking_id VARCHAR,
  node_id VARCHAR,
  label_type VARCHAR,
  delivery_assist_marker VARCHAR,
  sort_location VARCHAR,
  cycle_name VARCHAR,
  unloaded_at TIMESTAMP
);
```

### Key Fields
- `entity_id`: The primary identifier for the inducted entity
- `event_type`: Type of induction event
- `event_time`: When the induction event occurred
- `destination_id`: Target destination for the package
- `tracking_id`: Package tracking identifier
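
For volume analysis, induction events can be bucketed by day. The sketch below counts events per `node_id`, `event_type` and calendar day with pandas. It assumes `event_time` is a datetime column and that the day boundary of whatever timezone it is stored in is acceptable (the schema does not say). It also groups on `node_id`; `operation_node_id` may be the better choice depending on what each column actually holds, which this schema does not document. The function name and toy rows are hypothetical.

```python
import pandas as pd


def daily_induct_counts(induct: pd.DataFrame) -> pd.DataFrame:
    """Count induction events per node_id, event_type and calendar day."""
    df = induct.dropna(subset=["event_time"]).copy()
    # Truncate to midnight in whatever timezone event_time is stored in.
    df["event_day"] = df["event_time"].dt.floor("D")
    return (
        df.groupby(["node_id", "event_type", "event_day"])
        .size()
        .reset_index(name="event_count")
    )


# Toy rows for illustration only; node IDs and event types are hypothetical.
induct = pd.DataFrame(
    {
        "node_id": ["N1", "N1", "N1", "N2"],
        "event_type": ["INDUCT", "INDUCT", "INDUCT", "INDUCT"],
        "event_time": pd.to_datetime(
            ["2025-06-02 08:00", "2025-06-02 21:00", "2025-06-03 07:30", "2025-06-02 10:00"]
        ),
        "tracking_id": ["T1", "T2", "T3", "T4"],
    }
)
print(daily_induct_counts(induct))
```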

## 3. o_slam_packages_leg_live

This table contains details about individual legs of package shipments.

```sql
CREATE TABLE o_slam_packages_leg_live (
  row_id INTEGER,
  slam_leg_pk VARCHAR,
  region_id INTEGER,
  request_id VARCHAR,
  shipment_id INTEGER,
  package_id INTEGER,
  route_id VARCHAR,
  route_warehouse_id VARCHAR,
  route_ship_method VARCHAR,
  route_internal_sort_code VARCHAR,
  route_external_sort_code VARCHAR,
  processing_date TIMESTAMP,
  leg_sequence_id DECIMAL,
  leg_id VARCHAR,
  leg_warehouse_id VARCHAR,
  leg_ship_method VARCHAR,
  leg_internal_sort_code VARCHAR,
  leg_external_sort_code VARCHAR,
  leg_destination_warehouse_id VARCHAR,
  ship_option VARCHAR,
  pickup_date TIMESTAMP,
  estimated_arrival_date TIMESTAMP,
  transit_time_in_hours DECIMAL,
  zone VARCHAR,
  ship_cost DECIMAL,
  ship_cost_uom VARCHAR,
  ranking_cost DECIMAL,
  ranking_cost_uom VARCHAR,
  pickup_troe_offset DECIMAL,
  pickup_troe_offset_uom VARCHAR,
  delivery_troe_offset DECIMAL,
  delivery_troe_offset_uom VARCHAR,
  bill_weight DECIMAL,
  bill_weight_uom VARCHAR,
  dim_weight DECIMAL,
  dim_weight_uom VARCHAR,
  girth DECIMAL,
  girth_calculation_method VARCHAR,
  request_timestamp TIMESTAMP,
  request_date TIMESTAMP,
  sunk_cost DECIMAL,
  sunk_cost_uom VARCHAR,
  coincidence_discount DECIMAL,
  coincidence_discount_uom VARCHAR,
  dw_creation_date TIMESTAMP,
  unloaded_at TIMESTAMP
);
```

### Key Fields
- `slam_leg_pk`: Primary key for the leg record
- `shipment_id`: ID of the shipment containing this leg
- `package_id`: ID of the package
- `leg_id`: Identifier for this specific leg
- `leg_sequence_id`: Sequence number of the leg within the route
- `transit_time_in_hours`: Expected transit time for this leg
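
The leg table can be rolled up into one row per planned route. The sketch below orders legs by `leg_sequence_id` and aggregates them in pandas. Several points are assumptions rather than documented facts: that `leg_warehouse_id` is a leg's origin and `leg_destination_warehouse_id` its destination (inferred from the names), that grouping by `request_id` keeps separate planning requests for the same package apart, and that summing `transit_time_in_hours` approximates end-to-end transit (it ignores any dwell time between legs). The function name and toy rows are hypothetical.

```python
from decimal import Decimal

import pandas as pd


def summarize_legs(legs: pd.DataFrame) -> pd.DataFrame:
    """Order legs by leg_sequence_id and summarize each (request_id, package_id) plan."""
    # DECIMAL columns may arrive from pyarrow as Python Decimal objects; use floats.
    df = legs.astype({"leg_sequence_id": float, "transit_time_in_hours": float})
    ordered = df.sort_values(["request_id", "package_id", "leg_sequence_id"])
    # groupby keeps the sorted row order within each group, so first/last follow the sequence.
    return ordered.groupby(["request_id", "package_id"], as_index=False).agg(
        leg_count=("leg_id", "count"),
        origin=("leg_warehouse_id", "first"),
        final_destination=("leg_destination_warehouse_id", "last"),
        total_transit_hours=("transit_time_in_hours", "sum"),
    )


# Toy rows for illustration only; IDs and durations are hypothetical.
legs = pd.DataFrame(
    {
        "request_id": ["R1", "R1", "R2"],
        "package_id": [1, 1, 2],
        "leg_id": ["L2", "L1", "L3"],
        "leg_sequence_id": [Decimal("2"), Decimal("1"), Decimal("1")],
        "leg_warehouse_id": ["W2", "W1", "W1"],
        "leg_destination_warehouse_id": ["W3", "W2", "W4"],
        "transit_time_in_hours": [Decimal("10"), Decimal("5.5"), Decimal("24")],
    }
)
print(summarize_legs(legs))
```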

## 4. o_slam_packages_live

This table contains details about package shipments.

```sql
CREATE TABLE o_slam_packages_live (
  row_id INTEGER,
  slam_pk VARCHAR,
  region_id INTEGER,
  request_id VARCHAR,
  shipment_id INTEGER,
  address_id VARCHAR,
  package_id INTEGER,
  route_id VARCHAR,
  warehouse_id VARCHAR,
  route_delivery_group VARCHAR,
  ship_method VARCHAR,
  ship_option VARCHAR,
  carrier_name VARCHAR,
  zone VARCHAR,
  ship_cost DECIMAL,
  ship_cost_uom VARCHAR,
  ranking_cost DECIMAL,
  ranking_cost_uom VARCHAR,
  internal_sort_code VARCHAR,
  external_sort_code VARCHAR,
  processing_date TIMESTAMP,
  pickup_date TIMESTAMP,
  estimated_arrival_date TIMESTAMP,
  transit_time_in_hours DECIMAL,
  promised_arrival_date TIMESTAMP,
  promised_ship_date TIMESTAMP,
  unit_count INTEGER,
  package_value DECIMAL,
  package_value_uom VARCHAR,
  cod_balance DECIMAL,
  cod_balance_uom VARCHAR,
  girth DECIMAL,
  girth_calculation_method VARCHAR,
  length DECIMAL,
  length_uom VARCHAR,
  width DECIMAL,
  width_uom VARCHAR,
  height DECIMAL,
  height_uom VARCHAR,
  scale_weight DECIMAL,
  scale_weight_uom VARCHAR,
  dim_weight DECIMAL,
  dim_weight_uom VARCHAR,
  bill_weight DECIMAL,
  bill_weight_uom VARCHAR,
  is_soft_capped VARCHAR,
  destination_city VARCHAR,
  destination_state VARCHAR,
  destination_district VARCHAR,
  destination_country_code VARCHAR,
  destination_postal_code VARCHAR,
  delivery_group VARCHAR,
  request_timestamp TIMESTAMP,
  control_group_cost DECIMAL,
  control_group_cost_uom VARCHAR,
  is_slam_capped VARCHAR,
  external_estimated_arrival_date TIMESTAMP,
  loaded_at TIMESTAMP,
  unloaded_at TIMESTAMP
);
```

### Key Fields
- `slam_pk`: Primary key for the package record
- `shipment_id`: ID of the shipment
- `package_id`: ID of the package
- `route_id`: ID of the shipping route
- `ship_method`: Method of shipment
- `pickup_date`: When the package was picked up
- `estimated_arrival_date`: Expected arrival date
- `promised_arrival_date`: Committed arrival date
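
Because this table carries both an estimated and a promised arrival date, one quick check is how much slack each plan has against its promise. The sketch below computes that difference in hours with pandas. It assumes both columns are parsed as datetimes in the same timezone (the schema does not state the timezone); the function name, the sign convention and the toy rows are hypothetical.

```python
import pandas as pd


def promise_slack(packages: pd.DataFrame) -> pd.DataFrame:
    """Compare estimated_arrival_date with promised_arrival_date per package."""
    cols = ["package_id", "ship_method", "estimated_arrival_date", "promised_arrival_date"]
    df = packages[cols].copy()
    # Positive slack: the estimate lands before the promise; negative: it misses it.
    delta = df["promised_arrival_date"] - df["estimated_arrival_date"]
    df["slack_hours"] = delta.dt.total_seconds() / 3600
    # Rows with a missing date get NaN slack and are not flagged.
    df["misses_promise"] = df["slack_hours"] < 0
    return df


# Toy rows for illustration only; values are hypothetical.
packages = pd.DataFrame(
    {
        "package_id": [1, 2],
        "ship_method": ["METHOD_A", "METHOD_B"],
        "estimated_arrival_date": pd.to_datetime(["2025-06-03 12:00", "2025-06-05 06:00"]),
        "promised_arrival_date": pd.to_datetime(["2025-06-04 00:00", "2025-06-04 00:00"]),
    }
)
print(promise_slack(packages)[["package_id", "slack_hours", "misses_promise"]])
```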

## 5. package_systems_event_na

This table contains package event data from operational systems.

```sql
CREATE TABLE package_systems_event_na (
  type VARCHAR,
  package_id VARCHAR,
  package_id_type VARCHAR,
  forward_amazon_barcode VARCHAR,
  forward_tracking_id VARCHAR,
  forward_tcda_container_id VARCHAR,
  state_location_type VARCHAR,
  state_location_id VARCHAR,
  state_location_destination_id VARCHAR,
  state_location_source_id VARCHAR,
  state_status VARCHAR,
  state_time TIMESTAMP,
  triggerer_id VARCHAR,
  triggerer_id_type VARCHAR,
  dw_created_time TIMESTAMP,
  state_sub_status VARCHAR,
  comp_type VARCHAR,
  comp_reason VARCHAR,
  comp_state VARCHAR,
  reverse_amazon_barcode VARCHAR,
  reverse_tcda_container_id VARCHAR,
  reverse_tracking_id VARCHAR,
  execution_id VARCHAR,
  execution_id_type VARCHAR,
  state_location_destination_type VARCHAR,
  state_location_source_type VARCHAR,
  unloaded_at TIMESTAMP
);
```

### Key Fields
- `package_id`: Primary identifier for the package
- `state_status`: Package status recorded by this event
- `state_time`: Timestamp of the status event
- `forward_tracking_id`: Forward tracking number
- `state_location_id`: Location where the event occurred
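
Since each row records a `state_status` at a `state_time`, consecutive rows for the same package can be paired to see how packages move between states. The sketch below does this in pandas with a grouped `shift`. It assumes a package is identified by the combination of `package_id_type` and `package_id` (a guess based on the column names) and that rows sharing the same `state_time` may be ordered arbitrarily; the function name and the status values in the toy rows are hypothetical.

```python
import pandas as pd


def status_transitions(events: pd.DataFrame) -> pd.DataFrame:
    """Count (previous_status, state_status) pairs within each package, ordered by state_time."""
    keys = ["package_id_type", "package_id"]
    df = events.dropna(subset=["state_time"]).sort_values(keys + ["state_time"])
    # shift(1) within each package gives the status of the preceding event.
    df = df.assign(previous_status=df.groupby(keys)["state_status"].shift(1))
    # Each package's first event has no predecessor, so it is dropped here.
    transitions = df.dropna(subset=["previous_status"])
    return (
        transitions.groupby(["previous_status", "state_status"])
        .size()
        .reset_index(name="transition_count")
    )


# Toy rows for illustration only; types and statuses are hypothetical.
state_events = pd.DataFrame(
    {
        "package_id_type": ["TYPE_A"] * 5,
        "package_id": ["P1", "P1", "P1", "P2", "P2"],
        "state_status": ["RECEIVED", "SORTED", "DELIVERED", "RECEIVED", "SORTED"],
        "state_time": pd.to_datetime(
            ["2025-06-01 08:00", "2025-06-01 12:00", "2025-06-02 09:00",
             "2025-06-01 10:00", "2025-06-01 15:00"]
        ),
    }
)
print(status_transitions(state_events))
```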

## Relationships Between Tables

The tables are related through several common fields:

1. Package Identification:
   - `gmp_shipment_events_na.package_id` relates to `o_slam_packages_live.package_id`
   - `o_slam_packages_leg_live.package_id` relates to `o_slam_packages_live.package_id`
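   - `package_systems_event_na.package_id` may relate to the `package_id` columns in the other tables; check `package_id_type`, which appears to indicate the kind of identifier each row carries
   - `package_id` is `VARCHAR` in `gmp_shipment_events_na` and `package_systems_event_na` but `INTEGER` in both `o_slam_*` tables, so joins across these groups need a cast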

2. Tracking Information:
   - `gmp_shipment_events_na.tracking_id` relates to `induct_events_na.tracking_id`
   - `gmp_shipment_events_na.tracking_id` relates to `package_systems_event_na.forward_tracking_id`

3. Routing Information:
   - `o_slam_packages_leg_live.route_id` relates to `o_slam_packages_live.route_id`
   - `o_slam_packages_leg_live.route_id` relates to `induct_events_na.route_id`

4. Time-based Information:
   - All tables contain timestamp fields that can be used for temporal analysis
   - `unloaded_at` field exists in all tables
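
The `package_id` relationship between `gmp_shipment_events_na` and `o_slam_packages_live` crosses a type boundary (`VARCHAR` on one side, `INTEGER` on the other), so a join has to normalize the key first. The sketch below does this in pandas by converting both sides to strings. It assumes the `VARCHAR` values are plain decimal renderings of the integer IDs (values with leading zeros or prefixes would not match) and that both tables are already loaded as DataFrames; the function name and toy rows are hypothetical. `indicator=True` adds a `_merge` column that shows which events found no matching package row.

```python
import pandas as pd


def join_events_to_packages(events: pd.DataFrame, packages: pd.DataFrame) -> pd.DataFrame:
    """Left-join gmp_shipment_events_na rows to o_slam_packages_live rows on package_id."""
    # VARCHAR side: normalize to pandas' string dtype and trim stray whitespace.
    ev = events.assign(package_id=events["package_id"].astype("string").str.strip())
    # INTEGER side: go through nullable Int64 first so IDs read as floats
    # (e.g. 101.0 when the column has nulls) still render as "101".
    pk = packages.assign(package_id=packages["package_id"].astype("Int64").astype("string"))
    return ev.merge(pk, on="package_id", how="left", suffixes=("_event", "_slam"), indicator=True)


# Toy rows for illustration only; values are hypothetical.
events = pd.DataFrame({"package_id": ["101", "102", "999"], "ship_method": ["M1", "M2", "M3"]})
packages = pd.DataFrame({"package_id": [101, 102], "ship_method": ["M1", "M2"]})
joined = join_events_to_packages(events, packages)
print(joined[["package_id", "ship_method_event", "ship_method_slam", "_merge"]])
```

Since `o_slam_packages_live` may hold several rows per `package_id` (for example one per `request_id`), a left join can return more rows than there are events; deduplicate the package side first if that matters.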

## Sample Usage

To load and analyze these tables with pandas and pyarrow:

```python
import pandas as pd
import pyarrow.parquet as pq

# Example: load a sample from one of the tables
file_path = "/local/home/admsia/parquet_analysis/tables/sample_table.parquet"

# Read only the schema (stored in the file footer) first
schema = pq.read_schema(file_path)
print("Table Schema:")
for field in schema:
    print(f"  {field.name}: {field.type}")

# Load only the first batch of rows instead of the whole file
parquet_file = pq.ParquetFile(file_path)
first_batch = next(parquet_file.iter_batches(batch_size=1000))
df = first_batch.to_pandas()
print(f"\nSample data ({len(df)} rows):")
df.head()
```

## Accessing Data from S3

These tables are available in the S3 bucket at `s3://altdatasetexfil/dnet/backfill/` with the following partitioning structure:

```
s3://altdatasetexfil/dnet/backfill/<table_name>/partition_date=YYYY-MM-DD 00:00:00/
```

For example:
```
s3://altdatasetexfil/dnet/backfill/gmp_shipment_events_na/partition_date=2025-06-02 00:00:00/0000_part_00.parquet
```
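
Given this layout, the prefix for any table and day can be built from the table name and a date. The sketch below does that and reads one partition with `pandas.read_parquet`, which should pick up every Parquet file under the prefix. It is written under assumptions: the `s3fs` package must be installed for pandas to open `s3://` paths, read credentials must already be configured, and the helper names are hypothetical. Passing `columns=` limits the read to the fields you need, which matters for wide tables such as `gmp_shipment_events_na`.

```python
from datetime import date

import pandas as pd

# Base prefix as documented above.
BASE_URI = "s3://altdatasetexfil/dnet/backfill"


def partition_uri(table_name: str, day: date) -> str:
    """Build the S3 prefix of one daily partition of a table."""
    # Partition values use the literal "YYYY-MM-DD 00:00:00" form, space included.
    return f"{BASE_URI}/{table_name}/partition_date={day:%Y-%m-%d} 00:00:00/"


def read_partition(table_name: str, day: date, columns=None) -> pd.DataFrame:
    """Read one daily partition into pandas.

    Assumes s3fs is installed and AWS credentials with read access are configured.
    """
    return pd.read_parquet(partition_uri(table_name, day), engine="pyarrow", columns=columns)


print(partition_uri("gmp_shipment_events_na", date(2025, 6, 2)))
```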