# 🛠️ Adventurer Mart: ML Data Preparation - Part 1

## 📦 1. Data Loading & Preview

This notebook handles the initial data loading and preview phase of the ML data preparation pipeline.

### 🎯 Objectives
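- Discover the tables in the SQLite database and load each one into a pandas DataFrame
- Preview each table's rows, dimensions, data types, memory usage, and null counts
- Export the loaded (not yet cleaned) DataFrames for the next phase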

### 🗂️ Dataset Overview
This notebook works with `adventurer_mart.db`, a fantasy e-commerce SQLite database containing customer, product, and sales tables.

In [1]:
# Import Required Libraries
import sqlite3
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime
import warnings
import pickle
import os

warnings.filterwarnings('ignore')

# Set display options for better output
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('display.max_colwidth', 50)

print("📦 Libraries imported successfully!")
print(f"🐼 Pandas version: {pd.__version__}")
print(f"📊 NumPy version: {np.__version__}")

# Create data directory for intermediate files
os.makedirs('data_intermediate', exist_ok=True)
print("📁 Created data_intermediate directory for file persistence")

📦 Libraries imported successfully!
🐼 Pandas version: 2.2.2
📊 NumPy version: 1.26.4
📁 Created data_intermediate directory for file persistence


In [2]:
# Connect to database and discover tables
db_path = "adventurer_mart.db"

# Function to get all table names
def get_table_names(db_path):
    conn = sqlite3.connect(db_path)
    cursor = conn.cursor()
    cursor.execute("SELECT name FROM sqlite_master WHERE type='table';")
    tables = cursor.fetchall()
    conn.close()
    return [table[0] for table in tables]

# Get all available tables
table_names = get_table_names(db_path)
print("🗂️ Available tables in the database:")
for i, table in enumerate(table_names, 1):
    print(f"   {i}. {table}")

print(f"\n📊 Total tables found: {len(table_names)}")

🗂️ Available tables in the database:
   1. details_adventure_gear
   2. details_magic_items
   3. details_weapons
   4. details_armor
   5. details_potions
   6. details_poisons
   7. all_products
   8. customers
   9. sales

📊 Total tables found: 9
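The `get_table_names` helper above opens and closes its own connection. A minimal sketch of the same discovery step, assuming only the standard-library `sqlite3` module and pandas, uses `contextlib.closing` so the connection is closed even if the query raises. It also filters out SQLite's internal `sqlite_*` tables (for example `sqlite_sequence`, which appears once any table uses `AUTOINCREMENT`). The name `list_user_tables` is hypothetical and not part of the original notebook.

```python
import sqlite3
from contextlib import closing

import pandas as pd


def list_user_tables(db_path: str) -> list[str]:
    """Return user-defined table names, skipping SQLite's internal sqlite_* tables."""
    query = (
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
    )
    # closing() guarantees conn.close() runs even if the query fails;
    # using the connection itself as a context manager only commits or rolls back.
    with closing(sqlite3.connect(db_path)) as conn:
        return pd.read_sql_query(query, conn)["name"].tolist()
```

On `adventurer_mart.db` this should list the same nine tables, provided none of them are SQLite internals.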


In [3]:
# Load all tables into DataFrames
def load_table(db_path, table_name):
    """Load a table from SQLite database into a pandas DataFrame"""
    conn = sqlite3.connect(db_path)
    # Quote the table name as an SQL identifier so names with spaces or reserved words still work
    df = pd.read_sql_query(f'SELECT * FROM "{table_name}"', conn)
    conn.close()
    return df

# Load all tables into a dictionary of DataFrames
dataframes = {}
for table in table_names:
    dataframes[table] = load_table(db_path, table)
    print(f"✅ Loaded {table}: {dataframes[table].shape[0]} rows, {dataframes[table].shape[1]} columns")

print(f"\n🎯 Successfully loaded {len(dataframes)} tables into DataFrames")

✅ Loaded details_adventure_gear: 106 rows, 6 columns
✅ Loaded details_magic_items: 199 rows, 6 columns
✅ Loaded details_weapons: 37 rows, 8 columns
✅ Loaded details_armor: 13 rows, 9 columns
✅ Loaded details_potions: 22 rows, 5 columns
✅ Loaded details_poisons: 16 rows, 6 columns
✅ Loaded all_products: 393 rows, 4 columns
✅ Loaded customers: 1423 rows, 6 columns
✅ Loaded sales: 57915 rows, 7 columns

🎯 Successfully loaded 9 tables into DataFrames
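`load_table` opens a new connection for every table. A sketch of an alternative, assuming the table list comes from the discovery step, reuses one connection for every `pd.read_sql_query` call and quotes each name as an SQL identifier (doubling any embedded `"`). `load_all_tables` is a hypothetical helper name.

```python
import sqlite3
from contextlib import closing

import pandas as pd


def load_all_tables(db_path: str, table_names: list[str]) -> dict[str, pd.DataFrame]:
    """Load each named table into a DataFrame over a single shared connection."""
    frames = {}
    with closing(sqlite3.connect(db_path)) as conn:
        for name in table_names:
            # Double-quote the identifier and escape embedded quotes so names
            # containing spaces or reserved words still form valid SQL.
            quoted = '"' + name.replace('"', '""') + '"'
            frames[name] = pd.read_sql_query(f"SELECT * FROM {quoted}", conn)
    return frames
```

Usage would mirror the cell above: `dataframes = load_all_tables(db_path, table_names)`. For nine small tables the saving is negligible; the main gain is that a failed query cannot leave a connection open.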


In [4]:
# Preview the first few rows of each table
print("👀 PREVIEWING FIRST FEW ROWS OF EACH TABLE")
print("=" * 60)

for table_name, df in dataframes.items():
    print(f"\n📋 Table: {table_name}")
    print("-" * 40)
    print(df.head())
    print(f"Shape: {df.shape}")
    print("\n" + "="*60)

👀 PREVIEWING FIRST FEW ROWS OF EACH TABLE

📋 Table: details_adventure_gear
----------------------------------------
  item_id                      name  price weight    category            type
0  01-Ars                    Abacus   2 gp  2 lb.      Others  adventure_gear
1  02-Ars               Acid (vial)  25 gp  1 lb.      Others  adventure_gear
2  03-Ars  Alchemist's Fire (flask)  50 gp  1 lb.      Others  adventure_gear
3  04-Aon               Arrows (20)   1 gp  1 lb.  Ammunition  adventure_gear
4  05-Bon       Blowgun Needle (50)   1 gp  1 lb.  Ammunition  adventure_gear
Shape: (106, 6)


📋 Table: details_magic_items
----------------------------------------
   item_id                 name     price    rarity          category  \
0  001-ACo  Ammunition +1 (Per)     15 gp  Uncommon  Consumable Items   
1  002-ACo  Ammunition +2 (Per)     50 gp      Rare  Consumable Items   
2  005-BCo        Bead of Force  1,000 gp      Rare  Consumable Items   
3  006-CCo     Chime of Opening    4

In [5]:
# Check dimensions (df.shape) and data types (df.dtypes)
print("📊 DIMENSIONS AND DATA TYPES ANALYSIS")
print("=" * 60)

for table_name, df in dataframes.items():
    print(f"\n📋 Table: {table_name}")
    print("-" * 40)
    print(f"🔢 Dimensions: {df.shape[0]} rows × {df.shape[1]} columns")
    print(f"💾 Memory usage: {df.memory_usage(deep=True).sum() / 1024:.2f} KB")
    print("\n📝 Data Types:")
    
    # Group data types for cleaner display
    dtype_counts = df.dtypes.value_counts()
    for dtype, count in dtype_counts.items():
        print(f"   {dtype}: {count} columns")
    
    print("\n🔍 Column Details:")
    for col in df.columns:
        print(f"   {col}: {df[col].dtype}")
    
    print("\n" + "="*60)

📊 DIMENSIONS AND DATA TYPES ANALYSIS

📋 Table: details_adventure_gear
----------------------------------------
🔢 Dimensions: 106 rows × 6 columns
💾 Memory usage: 35.49 KB

📝 Data Types:
   object: 6 columns

🔍 Column Details:
   item_id: object
   name: object
   price: object
   weight: object
   category: object
   type: object


📋 Table: details_magic_items
----------------------------------------
🔢 Dimensions: 199 rows × 6 columns
💾 Memory usage: 69.67 KB

📝 Data Types:
   object: 6 columns

🔍 Column Details:
   item_id: object
   name: object
   price: object
   rarity: object
   category: object
   type: object


📋 Table: details_weapons
----------------------------------------
🔢 Dimensions: 37 rows × 8 columns
💾 Memory usage: 17.22 KB

📝 Data Types:
   object: 8 columns

🔍 Column Details:
   item_id: object
   name: object
   price: object
   damage: object
   weight: object
   properties: object
   category: object
   type: object


📋 Table: details_armor
----------------------
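Every column in the tables shown above loads as `object`, because values such as `2 gp`, `1,000 gp`, and `2 lb.` are stored as text. One way to see dtypes, null counts, distinct values, and memory side by side is a small per-column profile; the sketch below is illustrative, and `profile_columns` is a hypothetical name.

```python
import pandas as pd


def profile_columns(df: pd.DataFrame) -> pd.DataFrame:
    """One row per column: dtype, non-null count, null count, distinct values, memory."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "non_null": df.notna().sum(),
        "nulls": df.isna().sum(),
        "unique": df.nunique(dropna=True),
        # deep=True counts the Python string objects behind object columns,
        # not just the 8-byte pointers; index=False leaves the index out.
        "memory_kb": df.memory_usage(index=False, deep=True) / 1024,
    })
```

For example, `profile_columns(dataframes["details_armor"])` would show which of that table's columns hold its 17 null values.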

In [6]:
# Get basic info about each table
print("ℹ️ BASIC INFORMATION SUMMARY")
print("=" * 60)

table_info = {}
for table_name, df in dataframes.items():
    info = {
        'rows': df.shape[0],
        'columns': df.shape[1],
        'memory_kb': df.memory_usage(deep=True).sum() / 1024,
        'dtypes': dict(df.dtypes.value_counts()),
        'null_counts': df.isnull().sum().sum()
    }
    table_info[table_name] = info
    
    print(f"\n📋 {table_name}:")
    print(f"   📏 Size: {info['rows']:,} rows × {info['columns']} columns")
    print(f"   💾 Memory: {info['memory_kb']:.2f} KB")
    print(f"   🕳️ Null values: {info['null_counts']:,}")

print("\n📊 Total dataset summary:")
total_rows = sum(info['rows'] for info in table_info.values())
total_cols = sum(info['columns'] for info in table_info.values())
total_memory = sum(info['memory_kb'] for info in table_info.values())
total_nulls = sum(info['null_counts'] for info in table_info.values())

print(f"   📏 Total rows: {total_rows:,}")
print(f"   📊 Total columns: {total_cols}")
print(f"   💾 Total memory: {total_memory:.2f} KB")
print(f"   🕳️ Total null values: {total_nulls:,}")

ℹ️ BASIC INFORMATION SUMMARY

📋 details_adventure_gear:
   📏 Size: 106 rows × 6 columns
   💾 Memory: 35.49 KB
   🕳️ Null values: 1

📋 details_magic_items:
   📏 Size: 199 rows × 6 columns
   💾 Memory: 69.67 KB
   🕳️ Null values: 0

📋 details_weapons:
   📏 Size: 37 rows × 8 columns
   💾 Memory: 17.22 KB
   🕳️ Null values: 0

📋 details_armor:
   📏 Size: 13 rows × 9 columns
   💾 Memory: 6.06 KB
   🕳️ Null values: 17

📋 details_potions:
   📏 Size: 22 rows × 5 columns
   💾 Memory: 6.37 KB
   🕳️ Null values: 0

📋 details_poisons:
   📏 Size: 16 rows × 6 columns
   💾 Memory: 4.68 KB
   🕳️ Null values: 1

📋 all_products:
   📏 Size: 393 rows × 4 columns
   💾 Memory: 90.08 KB
   🕳️ Null values: 0

📋 customers:
   📏 Size: 1,423 rows × 6 columns
   💾 Memory: 416.95 KB
   🕳️ Null values: 0

📋 sales:
   📏 Size: 57,915 rows × 7 columns
   💾 Memory: 21148.70 KB
   🕳️ Null values: 455

📊 Total dataset summary:
   📏 Total rows: 60,124
   📊 Total columns: 57
   💾 Total memory: 21795.22 KB
   🕳️ Total null 
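The summary cell keeps its metrics in a nested dict and sums each field with a separate generator expression. A sketch of the same bookkeeping as a pandas DataFrame, assuming the `dataframes` dict built earlier, puts one row per table and appends a `TOTAL` row from the column sums; `summarize_tables` is a hypothetical helper.

```python
import pandas as pd


def summarize_tables(dataframes: dict[str, pd.DataFrame]) -> pd.DataFrame:
    """Per-table rows, columns, deep memory (KB), and nulls, plus a TOTAL row."""
    frames = list(dataframes.values())
    summary = pd.DataFrame(
        {
            "rows": [df.shape[0] for df in frames],
            "columns": [df.shape[1] for df in frames],
            "memory_kb": [df.memory_usage(deep=True).sum() / 1024 for df in frames],
            "null_counts": [int(df.isna().sum().sum()) for df in frames],
        },
        index=list(dataframes.keys()),
    )
    # Column-wise sums reproduce the totals computed by the generator expressions.
    summary.loc["TOTAL"] = summary.sum()
    return summary
```

If the tables are unchanged, `summarize_tables(dataframes)` should report the same 60,124 rows, 57 columns, and roughly 21,795 KB shown in the printed summary.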

In [7]:
# Export data for next phase
print("💾 EXPORTING DATA FOR NEXT PHASE")
print("=" * 50)

# Save the dataframes dictionary as pickle file
with open('data_intermediate/01_dataframes.pkl', 'wb') as f:
    pickle.dump(dataframes, f)
print("✅ Saved dataframes to data_intermediate/01_dataframes.pkl")

# Save table information for reference
with open('data_intermediate/01_table_info.pkl', 'wb') as f:
    pickle.dump(table_info, f)
print("✅ Saved table info to data_intermediate/01_table_info.pkl")

# Save table names for reference
with open('data_intermediate/01_table_names.pkl', 'wb') as f:
    pickle.dump(table_names, f)
print("✅ Saved table names to data_intermediate/01_table_names.pkl")

print("\n🎯 DATA LOADING PHASE COMPLETE!")
print(f"   📊 Loaded {len(dataframes)} tables with {total_rows:,} total rows")
print("   📁 Data exported for Phase 2: EDA Analysis")
print("   ➡️ Next: Run 02_exploratory_data_analysis.ipynb")

💾 EXPORTING DATA FOR NEXT PHASE
✅ Saved dataframes to data_intermediate/01_dataframes.pkl
✅ Saved table info to data_intermediate/01_table_info.pkl
✅ Saved table names to data_intermediate/01_table_names.pkl

🎯 DATA LOADING PHASE COMPLETE!
   📊 Loaded 9 tables with 60,124 total rows
   📁 Data exported for Phase 2: EDA Analysis
   ➡️ Next: Run 02_exploratory_data_analysis.ipynb


## 🎉 Phase 1 Complete!

**What we accomplished:**
- ✅ Connected to the SQLite database
- ✅ Discovered and loaded all tables
- ✅ Previewed data structure and basic information
- ✅ Analyzed data types, memory usage, and null counts
- ✅ Exported the data for the next phase

**Next Steps:**
- Run `02_exploratory_data_analysis.ipynb` for comprehensive EDA

**Data Files Created:**
- `data_intermediate/01_dataframes.pkl` - All loaded DataFrames
- `data_intermediate/01_table_info.pkl` - Table metadata
- `data_intermediate/01_table_names.pkl` - Table names list
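Phase 2 has to read these three files back. A minimal sketch, assuming the next notebook runs from the same working directory so that `data_intermediate/` resolves the same way; `load_phase1_outputs` is a hypothetical name. Because `pickle.load` can execute arbitrary code, it should only be used on files this pipeline wrote itself.

```python
import pickle
from pathlib import Path


def load_phase1_outputs(directory: str = "data_intermediate"):
    """Reload the Phase 1 pickles and return (dataframes, table_info, table_names)."""
    base = Path(directory)
    with open(base / "01_dataframes.pkl", "rb") as f:
        dataframes = pickle.load(f)
    with open(base / "01_table_info.pkl", "rb") as f:
        table_info = pickle.load(f)
    with open(base / "01_table_names.pkl", "rb") as f:
        table_names = pickle.load(f)
    # Cheap consistency check: every listed table should have a DataFrame.
    missing = [t for t in table_names if t not in dataframes]
    if missing:
        raise ValueError(f"Tables missing from 01_dataframes.pkl: {missing}")
    return dataframes, table_info, table_names
```

Typical use at the top of the next notebook: `dataframes, table_info, table_names = load_phase1_outputs()`.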