# MapleStory Item Price Prediction - Google Colab

This notebook runs the complete pipeline for training a price prediction model.

## Setup Instructions

1. **Upload your project files**: Upload the entire `src/` directory and `requirements.txt` to Colab
2. **Upload your data**: Upload raw data (`.jsonl`) or preprocessed data (`.parquet`) files
3. **Run the cells in order**: In Step 3, run only the option (A, B, or C) that matches how your data is stored

## Data Options
- **Option 1**: Upload raw `.jsonl` files (will be preprocessed)
- **Option 2**: Upload preprocessed `.parquet` files (faster, skips preprocessing)
- **Option 3**: Connect to a database (requires credentials; see Database Connection under Troubleshooting)


## Step 1: Install Dependencies


In [None]:
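# Install required packages
# Each version specifier is quoted so the shell does not treat ">" as an output redirect
%pip install -q "joblib>=1.5.2" "numpy>=2.3.4" "pandas>=2.3.3" "pymysql>=1.1.2" "python-dotenv>=1.2.1" "scikit-learn>=1.7.2" "pyarrow>=18.0.0" "lightgbm>=4.0.0"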

print("✓ Dependencies installed successfully!")
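
To confirm what the install step actually produced, the installed versions can be listed with the standard library. This is a minimal sketch: `installed_versions` and `REQUIRED` are illustrative names, not part of the project, and the package list is simply copied from the install command above.

```python
from importlib.metadata import PackageNotFoundError, version

# Package names copied from the install command; adjust if your requirements differ.
REQUIRED = ["joblib", "numpy", "pandas", "pymysql", "python-dotenv",
            "scikit-learn", "pyarrow", "lightgbm"]

def installed_versions(packages):
    """Return {package name: installed version string, or None if not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found

for name, ver in installed_versions(REQUIRED).items():
    print(f"{name:15s} {ver or 'NOT INSTALLED'}")
```

If a package shows an older version than requested, restarting the Colab runtime after installation may be needed before the new version is picked up.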


## Step 2: Setup Project Structure


In [None]:
import os
import sys

# Create necessary directories
os.makedirs('data/processed', exist_ok=True)
os.makedirs('data/raw', exist_ok=True)
os.makedirs('models', exist_ok=True)

# Add current directory to path
sys.path.insert(0, os.getcwd())

print("✓ Project structure created!")


## Step 3: Upload Data Files

Choose one of the following options:


### Option A: Upload Raw Data Files (.jsonl)


In [None]:
from google.colab import files
import shutil

# Upload raw data files
print("Please upload your .jsonl data files:")
uploaded = files.upload()

# Move uploaded files to data/raw directory
for filename in uploaded.keys():
    if filename.endswith('.jsonl'):
        shutil.move(filename, f'data/raw/{filename}')
        print(f"✓ Moved {filename} to data/raw/")
    else:
        print(f"⚠ Skipping {filename} (not a .jsonl file)")

print("\n✓ File upload complete!")
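
Before running the pipeline on a raw upload, it can help to check that every line of the file parses as JSON. The helper below is a sketch using only the standard library; `check_jsonl` is a hypothetical name and makes no assumption about which fields the pipeline expects.

```python
import json

def check_jsonl(path, max_errors=5):
    """Count parseable records in a JSON Lines file and collect the first few bad lines."""
    good, errors = 0, []
    with open(path, encoding="utf-8") as fh:
        for lineno, line in enumerate(fh, start=1):
            if not line.strip():
                continue  # blank lines are ignored
            try:
                json.loads(line)
                good += 1
            except json.JSONDecodeError as exc:
                # Keep only the first max_errors problems to limit output
                if len(errors) < max_errors:
                    errors.append((lineno, str(exc)))
    return good, errors

# Example (path is whatever you uploaded):
# good, errors = check_jsonl('data/raw/raw_data.jsonl')
# print(f"{good} valid records, first problems: {errors}")
```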


### Option B: Upload Preprocessed Data (.parquet)


In [None]:
from google.colab import files
import shutil

# Upload preprocessed data files
print("Please upload your .parquet data files:")
uploaded = files.upload()

# Move uploaded files to data/processed directory
for filename in uploaded.keys():
    if filename.endswith('.parquet'):
        shutil.move(filename, f'data/processed/{filename}')
        print(f"✓ Moved {filename} to data/processed/")
    else:
        print(f"⚠ Skipping {filename} (not a .parquet file)")

print("\n✓ File upload complete!")


### Option C: Mount Google Drive (Alternative)


In [None]:
from google.colab import drive

# Mount Google Drive
drive.mount('/content/drive')

# Example: Copy files from Drive
# !cp /content/drive/MyDrive/path/to/your/data.jsonl /content/data/raw/

print("✓ Google Drive mounted!")
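
The commented `!cp` line above can also be done in Python, which makes it easier to copy several files at once. This is a sketch; `copy_matching` is an illustrative helper and the Drive path in the usage comment is a placeholder to replace with your own folder.

```python
import shutil
from pathlib import Path

def copy_matching(src_dir, dst_dir, pattern="*.jsonl"):
    """Copy files in src_dir that match pattern into dst_dir; return the copied names."""
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sorted(Path(src_dir).glob(pattern)):
        if src.is_file():
            shutil.copy2(src, dst / src.name)
            copied.append(src.name)
    return copied

# Example (placeholder Drive folder):
# copy_matching('/content/drive/MyDrive/path/to/your', 'data/raw', '*.jsonl')
# copy_matching('/content/drive/MyDrive/path/to/your', 'data/processed', '*.parquet')
```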


## Step 4: Upload Project Source Files

Zip the `src/` directory on your machine, then upload the archive with the cell below, which extracts it into the working directory:


In [None]:
from google.colab import files
import os
import zipfile

# Upload zip file containing src/ directory
print("Please upload a zip file containing the src/ directory:")
uploaded = files.upload()

# Extract zip file
for filename in uploaded.keys():
    if filename.endswith('.zip'):
        with zipfile.ZipFile(filename, 'r') as zip_ref:
            zip_ref.extractall('.')
        print(f"✓ Extracted {filename}")
        os.remove(filename)  # Clean up zip file
    else:
        print(f"⚠ Skipping {filename} (not a zip file)")

print("\n✓ Project files extracted!")
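
For the extraction above to produce an importable `src/` package, the archive entries need to keep the leading `src/` prefix. The sketch below shows one way to build such an archive locally and to check an archive before uploading it; `make_src_zip` and `zip_has_top_level_src` are hypothetical helper names.

```python
import shutil
import zipfile
from pathlib import Path

def make_src_zip(project_root, out_dir):
    """Zip <project_root>/src so that archive entries keep the 'src/' prefix."""
    base_name = str(Path(out_dir) / "src")  # make_archive appends '.zip'
    return shutil.make_archive(base_name, "zip", root_dir=project_root, base_dir="src")

def zip_has_top_level_src(zip_path):
    """Return True if at least one archive entry sits under 'src/'."""
    with zipfile.ZipFile(zip_path) as zf:
        return any(name.startswith("src/") for name in zf.namelist())

# Example, run on your own machine from the project root:
# archive = make_src_zip('.', '/tmp')
# print(archive, zip_has_top_level_src(archive))
```

If the check returns `False`, the archive was probably created from inside `src/`, and extracting it would scatter the modules into the working directory instead of `src/`.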


## Step 5: Configure Data Source

Set the path to your data file. Modify the variables below based on your upload:


In [None]:
# Configuration
USE_PREPROCESSED = False  # Default; switched to True below if a .parquet file is found in data/processed/
JSONL_FILE_PATH = 'data/raw/raw_data.jsonl'  # Path to your raw JSONL file
PREPROCESSED_FILE_PATH = 'data/processed/preprocessed_data.parquet'  # Path to preprocessed file
DATA_LIMIT = None  # Set to a number (e.g., 10000) to limit rows, or None for all data

# Check what files are available
import os

print("Available files:")
if os.path.exists('data/raw'):
    # Sorted so the file picked below is deterministic (os.listdir order is arbitrary)
    raw_files = sorted(f for f in os.listdir('data/raw') if f.endswith('.jsonl'))
    print(f"  Raw files: {raw_files}")
    if raw_files:
        JSONL_FILE_PATH = f"data/raw/{raw_files[0]}"
        print(f"  → Using: {JSONL_FILE_PATH}")

if os.path.exists('data/processed'):
    # Sorted so the file picked below is deterministic (os.listdir order is arbitrary)
    processed_files = sorted(f for f in os.listdir('data/processed') if f.endswith('.parquet'))
    print(f"  Preprocessed files: {processed_files}")
    if processed_files:
        PREPROCESSED_FILE_PATH = f"data/processed/{processed_files[0]}"
        print(f"  → Using: {PREPROCESSED_FILE_PATH}")
        USE_PREPROCESSED = True

print("\nConfiguration:")
print(f"  Use preprocessed: {USE_PREPROCESSED}")
print(f"  Data limit: {DATA_LIMIT}")


## Step 6: Run Pipeline

Choose your model type and run the training pipeline:


In [None]:
# Import pipeline
from src.pipeline import run_pipeline
import pandas as pd

# Model configuration
MODEL_TYPE = 'lightgbm'  # Options: 'random_forest', 'gradient_boosting', 'lightgbm'
TEST_SIZE = 0.2
VAL_SIZE = 0.1

# Run pipeline
if USE_PREPROCESSED:
    # Use preprocessed data
    print(f"Using preprocessed data from: {PREPROCESSED_FILE_PATH}")
    results = run_pipeline(
        data_limit=DATA_LIMIT,
        model_type=MODEL_TYPE,
        test_size=TEST_SIZE,
        val_size=VAL_SIZE,
        preprocessed_data_path=PREPROCESSED_FILE_PATH,
        save_processed=True
    )
else:
    # Load from JSONL file
    print(f"Loading from JSONL file: {JSONL_FILE_PATH}")
    results = run_pipeline(
        data_limit=DATA_LIMIT,
        model_type=MODEL_TYPE,
        test_size=TEST_SIZE,
        val_size=VAL_SIZE,
        jsonl_path=JSONL_FILE_PATH,
        save_processed=True
    )

print("\n✓ Pipeline completed successfully!")
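
As a rough guide to what `TEST_SIZE` and `VAL_SIZE` mean for row counts, the arithmetic below assumes both values are fractions of the full dataset. That is an assumption: `run_pipeline` may instead carve the validation set out of the training portion, so check `src/pipeline` for the actual behaviour. `split_sizes` is an illustrative helper, not a project function.

```python
def split_sizes(n_rows, test_size=0.2, val_size=0.1):
    """Row counts for train/val/test, ASSUMING both fractions apply to the full dataset."""
    n_test = round(n_rows * test_size)
    n_val = round(n_rows * val_size)
    n_train = n_rows - n_test - n_val  # whatever remains is used for training
    return n_train, n_val, n_test

n_train, n_val, n_test = split_sizes(10000)
print(f"train={n_train}, val={n_val}, test={n_test}")  # train=7000, val=1000, test=2000
```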


## Step 7: Download Results

Download your trained models and results:


In [None]:
from google.colab import files
import zipfile
import os

# Create a zip file with models and results
output_zip = 'maple_meso_results.zip'

with zipfile.ZipFile(output_zip, 'w', zipfile.ZIP_DEFLATED) as zipf:
    # Add models directory
    if os.path.exists('models'):
        for root, dirs, files_list in os.walk('models'):
            for file in files_list:
                file_path = os.path.join(root, file)
                zipf.write(file_path, os.path.relpath(file_path, '.'))
    
    # Add reports
    for report_file in ['TRAINING_REPORT.md', 'TRAINING_REPORT_lightgbm.md', 'MODEL_COMPARISON.md']:
        if os.path.exists(report_file):
            zipf.write(report_file)

print(f"✓ Created {output_zip}")
files.download(output_zip)
print("\n✓ Download started!")


## Optional: Make Predictions

Use the trained model to make predictions on new data:


In [None]:
from src.predict import predict_price

# Example prediction
example_item = {
    "name": "앱솔랩스 메이지크라운",
    "item_id": 1004423,
    "star_force": 22,
    "potential_grade": 4,
    "additional_grade": 4,
    "payload_json": {
        "detail_json": "{}",  # Add your detail_json here
        "summary_json": "{}",  # Add your summary_json here
    }
}

# Make prediction
result = predict_price(example_item)
print(f"Predicted price: {result['predicted_price_formatted']} mesos")
print(f"Confidence: {result.get('confidence', 'N/A')}")
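
In the example item, `detail_json` and `summary_json` are strings that contain JSON rather than nested dictionaries. If you have the item details as Python dictionaries, a small helper can encode them. This is a sketch: `build_payload` is a hypothetical name, and the key in the usage comment is made up for illustration.

```python
import json

def build_payload(detail, summary):
    """Encode two dicts as JSON strings in the shape used by payload_json above."""
    return {
        # ensure_ascii=False keeps Korean item text readable in the encoded string
        "detail_json": json.dumps(detail, ensure_ascii=False),
        "summary_json": json.dumps(summary, ensure_ascii=False),
    }

# Example (the key below is illustrative, not a documented field):
# example_item["payload_json"] = build_payload({"some_stat": 10}, {})
```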


## Troubleshooting

### Import Errors
- Make sure the `src/` directory was uploaded and extracted into the working directory (Step 4)
- Check that the modules the notebook imports (`src.pipeline`, `src.predict`) exist under `src/`
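
A quick way to run these checks is to list what is actually inside `src/`. The helper below is a minimal sketch using only the standard library; `describe_src` is an illustrative name.

```python
from pathlib import Path

def describe_src(root="."):
    """Report whether <root>/src exists and which .py files it contains."""
    src = Path(root) / "src"
    if not src.is_dir():
        return {"exists": False, "modules": []}
    modules = sorted(p.relative_to(src).as_posix() for p in src.rglob("*.py"))
    return {"exists": True, "modules": modules}

print(describe_src("."))
```

An empty module list, or a missing directory, usually means the zip was extracted somewhere other than the working directory.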

### File Not Found
- Verify file paths match your uploaded files
- Check the file names in `data/raw/` or `data/processed/` directories

### Memory Issues
- Set `DATA_LIMIT` to a number (e.g., 10000) to load fewer rows
- Use preprocessed data instead of raw data
- Consider using Colab Pro for more RAM
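
If the raw file itself is too large to work with comfortably, one option is to cut a smaller sample file and point `JSONL_FILE_PATH` at it. The sketch below streams the first lines of a file without loading it into memory; `head_jsonl` is a hypothetical helper, and it is independent of however `DATA_LIMIT` is applied inside the pipeline.

```python
from itertools import islice

def head_jsonl(src_path, dst_path, n_rows):
    """Copy the first n_rows lines of src_path to dst_path; return the number written."""
    written = 0
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in islice(src, n_rows):  # reads line by line, never the whole file
            dst.write(line)
            written += 1
    return written

# Example:
# head_jsonl('data/raw/raw_data.jsonl', 'data/raw/sample.jsonl', 10000)
```

Note that the first rows of a file are not necessarily a representative sample, so metrics from a truncated file are only a smoke test.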

### Database Connection (if needed)
If you need to connect to a database, set environment variables:
```python
import os
os.environ['MAPLE_DB_HOST'] = 'your-host'
os.environ['MAPLE_DB_USER'] = 'your-user'
os.environ['MAPLE_DB_PASSWORD'] = 'your-password'
os.environ['MAPLE_DB_NAME'] = 'your-database'
```
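
To see which of these variables are still unset before attempting a connection, a small check like the following can help. It is a sketch: `missing_db_vars` and `DB_VARS` are illustrative names, and the variable list is copied from the snippet above.

```python
import os

# Variable names taken from the troubleshooting snippet above
DB_VARS = ["MAPLE_DB_HOST", "MAPLE_DB_USER", "MAPLE_DB_PASSWORD", "MAPLE_DB_NAME"]

def missing_db_vars(env=None):
    """Return the names of database variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in DB_VARS if not env.get(name)]

print("Missing:", missing_db_vars())

# In a shared notebook, the password can be read interactively instead of
# being typed into a cell where it would be saved with the notebook:
# from getpass import getpass
# os.environ["MAPLE_DB_PASSWORD"] = getpass("Database password: ")
```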
