# LEGO Instruction Manual Analysis - Data Loading

This notebook loads LEGO instruction manual data into our database and then validates the result.

## Project Overview
- **Goal**: Develop a machine learning model to identify LEGO pieces in instruction manuals
- **Current Phase**: Data Infrastructure (Sprint 1)
- **Task**: Set up database tables and load initial data

In [1]:
import os
from dotenv import load_dotenv
from pathlib import Path
from src.data_loader import LegoDataLoader

# Load environment variables
load_dotenv()

# Database connection string
db_url = os.getenv('DATABASE_URL')

In [2]:
db_url

'sqlite:///C:\\Users\\idany\\OneDrive\\Documents\\Projects\\BrickMapper\\data\\brickmapper.db'
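The value above uses SQLAlchemy's SQLite URL form: `sqlite:///` followed by a filesystem path. If `DATABASE_URL` is missing from `.env`, `os.getenv` returns `None`, and the problem only surfaces later inside `LegoDataLoader`. Below is a minimal fail-fast sketch. It assumes a hypothetical helper `resolve_db_url` and an optional fallback database file such as `../data/brickmapper.db`. Neither is part of `src.data_loader`.

```python
import os
from pathlib import Path
from typing import Optional


def resolve_db_url(fallback_db: Optional[Path] = None) -> str:
    """Return DATABASE_URL, or build a SQLite URL from a fallback file.

    Hypothetical helper for illustration; not part of src.data_loader.
    """
    url = os.getenv("DATABASE_URL")
    if url:
        return url
    if fallback_db is None:
        # Fail here instead of passing None on to the loader
        raise RuntimeError("DATABASE_URL is not set; check your .env file")
    # sqlite:/// + absolute path; as_posix() gives forward slashes on Windows too
    return "sqlite:///" + fallback_db.resolve().as_posix()


# Usage (fallback path is an assumption):
# db_url = resolve_db_url(Path("../data/brickmapper.db"))
```

On POSIX systems an absolute path starts with `/`, so the resulting URL has four slashes (`sqlite:////home/...`), which is the form SQLAlchemy expects for absolute SQLite paths.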

## Initialize Data Loader
Create an instance of our data loader class and set up the database tables.

In [3]:
# Initialize loader
loader = LegoDataLoader(db_url)

# Create tables
loader.create_tables()

2025-01-09 00:23:37,653 - INFO - Created tables and indexes
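The log confirms that `create_tables` ran, but the notebook does not show the schema it creates. The sketch below is a hypothetical stand-in that would produce the two tables queried later. It uses only the column names visible in that query output (without the stray `Unnamed: 0` column). The column types, the composite primary key `(inventory_id, booklet_number, step_number)`, and the index names are assumptions. It also talks to SQLite through the stdlib `sqlite3` module rather than through `LegoDataLoader`.

```python
import sqlite3

# Hypothetical DDL: column names come from the query output in this notebook;
# types, keys, and index names are assumptions.
SCHEMA = """
CREATE TABLE IF NOT EXISTS set_steps (
    inventory_id   INTEGER NOT NULL,
    booklet_number INTEGER NOT NULL,
    step_number    INTEGER NOT NULL,
    page_number    INTEGER,
    PRIMARY KEY (inventory_id, booklet_number, step_number)
);
CREATE TABLE IF NOT EXISTS step_elements (
    inventory_id   INTEGER NOT NULL,
    booklet_number INTEGER NOT NULL,
    step_number    INTEGER NOT NULL,
    element_id     TEXT    NOT NULL,
    quantity       INTEGER NOT NULL,
    FOREIGN KEY (inventory_id, booklet_number, step_number)
        REFERENCES set_steps (inventory_id, booklet_number, step_number)
);
CREATE INDEX IF NOT EXISTS idx_step_elements_step
    ON step_elements (inventory_id, booklet_number, step_number);
CREATE INDEX IF NOT EXISTS idx_step_elements_element
    ON step_elements (element_id);
"""


def create_tables(conn: sqlite3.Connection) -> None:
    """Create both tables and their indexes; safe to call repeatedly."""
    conn.executescript(SCHEMA)
    conn.commit()
```

`IF NOT EXISTS` makes the function safe to rerun when the notebook is re-executed. SQLite only enforces the foreign key when `PRAGMA foreign_keys = ON` has been set on the connection.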


## Load Data
Load the step and element data from our CSV files.

In [4]:
# Define paths to CSV files
data_dir = Path('../data')
steps_csv = data_dir / 'set_steps.csv'
elements_csv = data_dir / 'step_elements.csv'

# Load data
loader.load_data(steps_csv, elements_csv)

2025-01-05 21:22:28,174 - INFO - Successfully loaded 0 steps and 0 elements
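The log reports 0 rows for both files. The query output further down also shows an extra `Unnamed: 0` column in both tables. pandas gives that name to a CSV column whose header is empty, which is exactly what `DataFrame.to_csv` writes for the index under its default `index=True`. `LegoDataLoader.load_data` is not shown here, so the following is only a sketch of a loader step that skips such index columns and reports how many rows it inserted. It assumes a SQLite database and CSV headers that match the table columns. `read_rows` and `insert_rows` are hypothetical names, and the sketch uses the stdlib `csv` and `sqlite3` modules rather than pandas.

```python
import csv
import sqlite3
from pathlib import Path


def read_rows(csv_path: Path) -> list[dict[str, str]]:
    """Read a CSV into dicts, dropping columns with a blank header.

    A leading blank-header column (shown by pandas as 'Unnamed: 0') is
    usually a saved DataFrame index, not real data.
    """
    if not csv_path.exists():
        raise FileNotFoundError(csv_path)
    with csv_path.open(newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        # Keep only keys that are non-empty header names
        return [{k: v for k, v in row.items() if k and k.strip()} for row in reader]


def insert_rows(conn: sqlite3.Connection, table: str, rows: list[dict[str, str]]) -> int:
    """Insert rows into `table` (a trusted name) and return how many were inserted."""
    if not rows:
        return 0
    cols = list(rows[0])
    placeholders = ", ".join("?" * len(cols))
    sql = f"INSERT INTO {table} ({', '.join(cols)}) VALUES ({placeholders})"
    conn.executemany(sql, [tuple(r[c] for c in cols) for r in rows])
    conn.commit()
    return len(rows)
```

Returning the count from `insert_rows` lets the caller treat a zero-row load as an error instead of a success.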


## Verify Data Loading
Let's check that the data was loaded correctly by querying the first five rows of each new table.

In [6]:
import pandas as pd
from IPython.display import display  # built into Jupyter; explicit import for other runners
from sqlalchemy import create_engine

# Create engine
engine = create_engine(db_url)

# Query and display sample data
with engine.connect() as conn:
    # Check set_steps table
    steps_query = "SELECT * FROM set_steps LIMIT 5"
    steps_sample = pd.read_sql(steps_query, conn)
    
    # Check step_elements table
    elements_query = "SELECT * FROM step_elements LIMIT 5"
    elements_sample = pd.read_sql(elements_query, conn)

print("Sample from set_steps table:")
display(steps_sample)

print("\nSample from step_elements table:")
display(elements_sample)

Sample from set_steps table:

| Unnamed: 0 | inventory_id | booklet_number | step_number | page_number |
|---|---|---|---|---|

(0 rows)

Sample from step_elements table:

| Unnamed: 0 | inventory_id | booklet_number | step_number | element_id | quantity |
|---|---|---|---|---|---|

(0 rows)
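Both samples came back with column headers and no rows, which matches the `0 steps and 0 elements` log line. A `LIMIT 5` sample cannot tell an empty table apart from a small one. The sketch below runs explicit checks instead. It assumes the tables live in SQLite, as the `db_url` above indicates. It also assumes that `step_elements` rows point to `set_steps` rows through `(inventory_id, booklet_number, step_number)`; this link is inferred from the shared column names and is not confirmed by the loader. The helper names are hypothetical. The sketch uses the stdlib `sqlite3` module, and the same SQL could be run through the SQLAlchemy `engine` instead.

```python
import sqlite3


def table_counts(conn: sqlite3.Connection, tables: tuple[str, ...]) -> dict[str, int]:
    """Return the row count of each table (table names must be trusted)."""
    return {t: conn.execute(f"SELECT COUNT(*) FROM {t}").fetchone()[0] for t in tables}


def orphan_elements(conn: sqlite3.Connection) -> int:
    """Count step_elements rows whose step has no matching set_steps row."""
    sql = """
        SELECT COUNT(*)
        FROM step_elements AS e
        LEFT JOIN set_steps AS s
          ON  s.inventory_id   = e.inventory_id
          AND s.booklet_number = e.booklet_number
          AND s.step_number    = e.step_number
        WHERE s.inventory_id IS NULL
    """
    return conn.execute(sql).fetchone()[0]


def verify_load(conn: sqlite3.Connection) -> list[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    for table, n in table_counts(conn, ("set_steps", "step_elements")).items():
        if n == 0:
            problems.append(f"{table} is empty")
    orphans = orphan_elements(conn)
    if orphans:
        problems.append(f"{orphans} step_elements rows reference missing steps")
    return problems
```

Returning a list of messages rather than raising lets the notebook print every failed check at once.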
