## Advanced Consistency Check with Hierarchical Data

**Description**: You have two datasets, `orders.csv` and `order_items.csv`. Perform a consistency check to ensure that each order in `orders.csv` has at least one corresponding item in `order_items.csv`.

In [1]:
# Write your code from here
import pandas as pd

# ----------------------------
# Simulate Data (Orders and Order Items)
# ----------------------------
# Orders dataset (orders.csv)
orders_data = {
    'order_id': [101, 102, 103, 104, 105],
    'customer_id': ['C001', 'C002', 'C003', 'C004', 'C005'],
    'order_date': ['2025-05-01', '2025-05-02', '2025-05-03', '2025-05-04', '2025-05-05']
}
orders_df = pd.DataFrame(orders_data)

# Order Items dataset (order_items.csv)
order_items_data = {
    'order_id': [101, 102, 103, 106, 107],  # Note: 106 and 107 have no matching order in orders.csv
    'product_id': ['P001', 'P002', 'P003', 'P004', 'P005'],
    'quantity': [2, 1, 3, 2, 1]
}
order_items_df = pd.DataFrame(order_items_data)

# ----------------------------
# Perform Consistency Check
# ----------------------------

# Orders whose order_id never appears in order_items_df (orders with no items)
missing_orders = orders_df[~orders_df['order_id'].isin(order_items_df['order_id'])]

# Order items whose order_id never appears in orders_df (orphan items)
extra_order_items = order_items_df[~order_items_df['order_id'].isin(orders_df['order_id'])]

# Results
if not missing_orders.empty:
    print("Orders with missing items in order_items.csv:")
    print(missing_orders)
else:
    print("All orders in orders.csv have corresponding items in order_items.csv.")

if not extra_order_items.empty:
    print("\nOrder items with no corresponding order in orders.csv:")
    print(extra_order_items)
else:
    print("All order items in order_items.csv have corresponding orders in orders.csv.")

# ----------------------------
# Optional: Combine Data to Validate Consistency (Visual Check)
# ----------------------------

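# Left-merge on order_id: orders without items show NaN in the item columns,
# and quantity is upcast to float because of those NaN values.
# A left join keeps only rows from orders_df, so orphan items (106, 107) do not appear here.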
merged_df = pd.merge(orders_df, order_items_df, on='order_id', how='left')

print("\nMerged DataFrame to visually check consistency:")
print(merged_df)

Orders with missing items in order_items.csv:
   order_id customer_id  order_date
3       104        C004  2025-05-04
4       105        C005  2025-05-05

Order items with no corresponding order in orders.csv:
   order_id product_id  quantity
3       106       P004         2
4       107       P005         1

Merged DataFrame to visually check consistency:
   order_id customer_id  order_date product_id  quantity
0       101        C001  2025-05-01       P001       2.0
1       102        C002  2025-05-02       P002       1.0
2       103        C003  2025-05-03       P003       3.0
3       104        C004  2025-05-04        NaN       NaN
4       105        C005  2025-05-05        NaN       NaN
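
The left merge above surfaces only one side of the problem. As a hedged sketch (not part of the original exercise), an outer merge with pandas' `indicator=True` option can flag both kinds of inconsistency in a single pass. The variable names `order_keys`, `item_keys`, and `audit` are illustrative assumptions, and the sample data is repeated so the snippet runs on its own.

```python
import pandas as pd

# Same sample data as above, repeated so this sketch is self-contained.
orders_df = pd.DataFrame({
    "order_id": [101, 102, 103, 104, 105],
    "customer_id": ["C001", "C002", "C003", "C004", "C005"],
})
order_items_df = pd.DataFrame({
    "order_id": [101, 102, 103, 106, 107],
    "product_id": ["P001", "P002", "P003", "P004", "P005"],
})

# Compare unique keys only, so an order with several items is counted once.
order_keys = orders_df[["order_id"]].drop_duplicates()
item_keys = order_items_df[["order_id"]].drop_duplicates()

# indicator=True adds a "_merge" column with the values
# "left_only", "right_only", or "both" for each key.
audit = order_keys.merge(item_keys, on="order_id", how="outer", indicator=True)

# .tolist() converts to plain Python ints for clean printing.
orders_without_items = sorted(
    audit.loc[audit["_merge"] == "left_only", "order_id"].tolist()
)
items_without_order = sorted(
    audit.loc[audit["_merge"] == "right_only", "order_id"].tolist()
)

print("Orders without items:", orders_without_items)
print("Order IDs in items without an order:", items_without_order)
```

With the sample data, this reports orders 104 and 105 as lacking items and order IDs 106 and 107 as orphaned, matching the two `isin` checks above. Merging on deduplicated keys is a design choice: it keeps the audit table at one row per `order_id` regardless of how many items an order has.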
