# 🚕 Taxi Tip Predictor - Google Colab Setup

This notebook sets up the environment for running the Taxi Tip Predictor on Google Colab with GPU support.

## Step 1: Change Runtime to GPU

1. Go to **Runtime** → **Change runtime type**
2. Set **Hardware accelerator** to **T4 GPU**
3. Click **Save**

## Step 2: Install Dependencies

In [None]:
# Install required packages
!pip install xgboost scikit-learn matplotlib seaborn tqdm pyyaml -q

## Step 3: Install RAPIDS cuDF (GPU-accelerated Pandas)

Note: Installing RAPIDS on Colab can be complex, so this notebook uses pandas on the CPU for data processing. XGBoost training still runs on the GPU.
For GPU-accelerated data processing as well, you may need to install RAPIDS manually.

In [None]:
# Optional: try to install RAPIDS (may not work on all Colab versions)
# Uncomment the lines below if you want GPU-accelerated data processing
# !bash <(curl -s https://raw.githubusercontent.com/rapidsai/rapidsai-csp-utils/colab/rapids-colab.sh)
# import sys, os
# Adjust python3.10 below to match the Python version of your Colab runtime
# sys.path.append('/usr/local/lib/python3.10/site-packages/')

# For now, we'll use pandas (CPU) - still works great!
print("Using pandas for data processing (CPU mode)")
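The CPU fallback described above can be sketched as a small import helper. This is only an illustration: `load_dataframe_backend` is a hypothetical name, not part of the project, and it assumes that a failed cuDF install surfaces as an `ImportError`.

```python
import importlib


def load_dataframe_backend(preferred="cudf", fallback="pandas"):
    """Return the first importable module among (preferred, fallback)."""
    for name in (preferred, fallback):
        try:
            return importlib.import_module(name)
        except ImportError:
            # Module is not installed; try the next candidate.
            continue
    raise ImportError(f"Neither {preferred!r} nor {fallback!r} is installed")


# Usage (hypothetical):
#   xdf = load_dataframe_backend()
#   print(f"Data processing backend: {xdf.__name__}")
```

On a runtime without RAPIDS this would return pandas, which matches the CPU mode used in this notebook.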

## Step 4: Upload Project Files

Upload all Python files from the project to Colab, or clone from GitHub if available.

In [None]:
# Create project structure
!mkdir -p data models results
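After uploading, a quick check can confirm that the files are where the later cells expect them. The sketch below only looks for `config.py` and `main.py`, since those are the two modules this notebook imports; the helper name `missing_files` is an assumption, and the list should be extended with the project's other modules.

```python
from pathlib import Path


def missing_files(names, root="."):
    """Return the names that do not exist as regular files under root."""
    return [name for name in names if not (Path(root) / name).is_file()]


# config.py and main.py are imported later in this notebook.
missing = missing_files(["config.py", "main.py"])
print("Missing files:", missing if missing else "none")
```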

## Step 5: Upload Dataset

Upload your NYC Taxi dataset CSV file to the `data/` folder.

On Colab, work with a smaller subset (about 5 million rows) to avoid running out of memory.
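If the full CSV is too large to upload, it can be trimmed locally first. The following is a minimal sketch using only the standard library; `write_csv_head` and the file names in the usage comment are hypothetical. Note that it keeps the first rows rather than a random sample, so the subset may not be representative if the file is ordered by date.

```python
import csv
from itertools import islice


def write_csv_head(src, dst, n_rows):
    """Copy the header plus the first n_rows data rows of src into dst.

    Returns the number of data rows written.
    """
    with open(src, newline="") as fin, open(dst, "w", newline="") as fout:
        reader = csv.reader(fin)
        writer = csv.writer(fout)
        header = next(reader, None)
        if header is None:
            return 0  # empty input file, nothing to copy
        writer.writerow(header)
        count = 0
        for row in islice(reader, n_rows):
            writer.writerow(row)
            count += 1
        return count


# Usage (hypothetical file names):
#   write_csv_head("taxi_full.csv", "data/taxi_subset.csv", 5_000_000)
```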

In [None]:
# Update config for Colab
import config
config.USE_CUDF = False  # Use pandas instead of cuDF
config.SAMPLE_SIZE = 5_000_000  # Sample 5M rows for Colab
config.XGBOOST_PARAMS['device'] = 'cuda'  # Use GPU for XGBoost (XGBoost 2.0+)
config.XGBOOST_PARAMS['tree_method'] = 'hist'  # 'hist' runs on the GPU when device is 'cuda'

print("Config updated for Colab environment")
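The `device` parameter used above was introduced in XGBoost 2.0; older releases selected the GPU with `tree_method='gpu_hist'` instead. If the installed version is uncertain, the parameters could be chosen from the version string, roughly as follows. The helper `xgboost_gpu_params` is a hypothetical name, not part of the project.

```python
def xgboost_gpu_params(version):
    """Return GPU training parameters suited to an XGBoost version string."""
    major = int(version.split(".")[0])
    if major >= 2:
        # XGBoost 2.0+: pick the device separately from the tree method.
        return {"device": "cuda", "tree_method": "hist"}
    # XGBoost 1.x: GPU training is selected through the tree method.
    return {"tree_method": "gpu_hist"}


# Usage (assumes xgboost is imported as xgb):
#   config.XGBOOST_PARAMS.update(xgboost_gpu_params(xgb.__version__))
```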

## Step 6: Verify GPU Availability

In [None]:
# Check GPU
!nvidia-smi
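The same check can be made from Python, which is handy if later cells should react to a missing GPU instead of failing partway through training. This is a sketch under the assumption that a working GPU runtime exposes `nvidia-smi` on the `PATH`; `gpu_visible` is a hypothetical helper.

```python
import shutil
import subprocess


def gpu_visible():
    """Return True if nvidia-smi is installed and lists at least one GPU."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return False  # no NVIDIA driver tools on this runtime
    try:
        # "-L" prints one line per GPU, e.g. "GPU 0: ..."
        result = subprocess.run(
            [exe, "-L"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0 and "GPU" in result.stdout


print("GPU visible:", gpu_visible())
```

If this prints `False`, revisit Step 1 and confirm that the runtime type is set to a GPU.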

In [None]:
# Verify XGBoost GPU support
import xgboost as xgb
print(f"XGBoost version: {xgb.__version__}")

# Test GPU availability (CuPy is optional)
try:
    import cupy as cp
    print(f"CUDA available: {cp.cuda.is_available()}")
except ImportError:
    print("CuPy not available, but XGBoost can still use GPU")

## Step 7: Run Training

Now you're ready to run the main training script!

In [None]:
# Run the training pipeline
from main import main
main()