## Advanced Consistency Check with Hierarchical Data

**Description**: You have two datasets, `orders.csv` and `order_items.csv`. Perform a consistency check to ensure each order in `orders.csv` has corresponding items in `order_items.csv`, and flag any `order_id` in `order_items.csv` that does not exist in `orders.csv`.

In [1]:
import pandas as pd

# Simulated orders data
df_orders = pd.DataFrame({
    'order_id': [1001, 1002, 1003, 1004]
})

# Simulated order_items data
df_order_items = pd.DataFrame({
    'order_id': [1002, 1003, 1005]  # 1005 is extra; 1001 and 1004 are missing
})

# Coerce order_id to numeric; values that cannot be parsed become NaN
df_orders['order_id'] = pd.to_numeric(df_orders['order_id'], errors='coerce')
df_order_items['order_id'] = pd.to_numeric(df_order_items['order_id'], errors='coerce')

# Drop rows whose order_id could not be parsed
df_orders = df_orders.dropna(subset=['order_id'])
df_order_items = df_order_items.dropna(subset=['order_id'])

# Convert to sets for comparison
order_ids = set(df_orders['order_id'])
item_order_ids = set(df_order_items['order_id'])

# Identify missing and extra order_ids
missing_in_items = order_ids - item_order_ids
extra_in_items = item_order_ids - order_ids

# Output the results
print(f"Orders missing in order_items: {missing_in_items}")
print(f"Extra orders in order_items: {extra_in_items}")

Orders missing in order_items: {1001, 1004}
Extra orders in order_items: {1005}
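
The set comparison above reports only the ids. When the mismatched rows need to stay in a DataFrame for further inspection, an outer merge with `indicator=True` is one possible alternative. The following is a minimal sketch on the same simulated data, not part of the original exercise; the names `merged`, `orders_without_items`, and `orphaned_item_orders` are illustrative assumptions.

```python
import pandas as pd

# Same simulated data as in the cell above
df_orders = pd.DataFrame({'order_id': [1001, 1002, 1003, 1004]})
df_order_items = pd.DataFrame({'order_id': [1002, 1003, 1005]})

# Outer-join the unique order_ids from both tables.
# indicator=True adds a '_merge' column recording where each id was found.
merged = df_orders.drop_duplicates(subset='order_id').merge(
    df_order_items[['order_id']].drop_duplicates(),
    on='order_id',
    how='outer',
    indicator=True,
)

# 'left_only'  -> order exists but has no items
# 'right_only' -> items reference an order that does not exist
orders_without_items = sorted(merged.loc[merged['_merge'] == 'left_only', 'order_id'])
orphaned_item_orders = sorted(merged.loc[merged['_merge'] == 'right_only', 'order_id'])

print(f"Orders without items: {orders_without_items}")
print(f"Items referencing unknown orders: {orphaned_item_orders}")
```

For the simulated data, this should flag the same ids as the set-based check. Dropping duplicates before the merge keeps one row per `order_id`, so an order with many items is not repeated in `merged`.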
