# Generate Odoo Dataset with AI Descriptions (Kaggle Edition)

This notebook runs a background job that generates an Odoo 19.0 dataset with AI-enhanced descriptions using a local LLM (Llama 3 via Ollama).

### **Prerequisites (Before Running)**
1. **Internet Access**: Enable "Internet" in the Notebook Settings (Right Panel).
2. **GPU Accelerator**: Select **GPU T4 x2** (or GPU P100) in the Notebook Settings. 
   * *Do NOT use TPU.*
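
To confirm the accelerator setting took effect before starting a long run, a quick check along these lines may help. This is a minimal sketch, not part of the original notebook: the helper name `gpu_available` is hypothetical, and it assumes the NVIDIA driver tool `nvidia-smi` is on the `PATH` whenever a GPU is attached.

```python
import shutil
import subprocess


def gpu_available():
    """Return True if nvidia-smi is installed and lists at least one GPU."""
    # Assumption: GPU sessions expose nvidia-smi on PATH; CPU/TPU sessions do not.
    if shutil.which("nvidia-smi") is None:
        return False
    # "nvidia-smi -L" prints one line per device, e.g. "GPU 0: Tesla T4 (UUID: ...)".
    result = subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True)
    return result.returncode == 0 and "GPU" in result.stdout


if __name__ == "__main__":
    print("GPU detected" if gpu_available() else "No GPU detected; check Notebook Settings")
```

If this reports no GPU, the generation still runs but Llama 3 inference on CPU is likely to be much slower.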

### **How to Run Unattended (Background Mode)**
1. Click **"Save Version"** (Top Right).
2. Select **"Save & Run All (Commit)"**.
3. Click **Save**.

This will start a background session that runs for up to **12 hours**. You can close the browser and check back later.

In [None]:
# Step 1: Install Dependencies & Setup Ollama
# We install Ollama (AI Engine) and Python libraries

!curl -fsSL https://ollama.com/install.sh | sh
!pip install pandas openpyxl requests tqdm openai

In [None]:
# Step 2: Start Ollama Server (Background Process)
import subprocess
import time

print("Starting Ollama Server...")
process = subprocess.Popen(["ollama", "serve"], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
time.sleep(10)  # Give it time to boot up

print("Pulling Llama 3 model (this may take a few minutes)...")
!ollama pull llama3
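
A fixed `time.sleep(10)` can be too short on a slow start and wasteful on a fast one. One alternative is to poll the server until it answers. The sketch below assumes Ollama is listening on its default address `http://localhost:11434`; the helper names `server_is_up` and `wait_for_server` are hypothetical and not part of the original notebook.

```python
import time
import urllib.error
import urllib.request


def server_is_up(url="http://localhost:11434", timeout=2.0):
    """Return True if an HTTP GET to `url` answers with status 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or HTTP error: treat as not ready yet.
        return False


def wait_for_server(probe=server_is_up, max_wait=60.0, interval=1.0):
    """Call `probe` repeatedly until it returns True or `max_wait` seconds pass."""
    deadline = time.monotonic() + max_wait
    while time.monotonic() < deadline:
        if probe():
            return True
        time.sleep(interval)
    return False
```

Calling `wait_for_server()` after the `subprocess.Popen` line and before `ollama pull` would replace the fixed sleep; a `False` return means the server never came up within the wait budget.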

In [None]:
# Step 3: Clone Resources
# 1. Clone Odoo 19.0 Source Code
print("Cloning Odoo 19.0 source code...")
!rm -rf odoo-19.0 # Cleanup if exists, so re-running this cell does not fail
!git clone --depth 1 --branch 19.0 https://github.com/odoo/odoo.git odoo-19.0

# 2. Clone the Generation Script from your Repo
print("Cloning generation script...")
!rm -rf odoo_dataset_creation_technical # Cleanup if exists
!git clone https://github.com/midhlaj-nk/odoo_dataset_creation_technical.git odoo_dataset_creation_technical
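
Because `!git clone` does not stop the notebook when a clone fails, it can be worth confirming that both checkouts exist before the long-running step begins. A minimal sketch follows; the helper name `missing_paths` is hypothetical, and the example paths simply mirror the ones used in Step 4.

```python
from pathlib import Path


def missing_paths(paths):
    """Return the subset of `paths` (as strings) that do not exist on disk."""
    return [str(p) for p in map(Path, paths) if not p.exists()]


if __name__ == "__main__":
    required = [
        "/kaggle/working/odoo-19.0",
        "/kaggle/working/odoo_dataset_creation_technical/generate_odoo_dataset.py",
    ]
    missing = missing_paths(required)
    if missing:
        raise FileNotFoundError(f"Clone step incomplete, missing: {missing}")
    print("All required resources are present.")
```

Raising here makes a background run fail within minutes instead of hours into the job.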

In [None]:
# Step 4: Run the Generation Script
# This is the main process that will run for hours.

import os

# Define Paths
working_dir = "/kaggle/working"
script_path = f"{working_dir}/odoo_dataset_creation_technical/generate_odoo_dataset.py"
odoo_source = f"{working_dir}/odoo-19.0"
output_file = f"{working_dir}/odoo_19_full_dataset_ai.xlsx"

print("Starting Dataset Generation...")
print(f"Script: {script_path}")
print(f"Source: {odoo_source}")
print(f"Output: {output_file}")

# Run python script
# We use 'llama3' as the model. Remove '--ollama-model llama3' if you want CPU-only docstring extraction.
!python3 "$script_path" "$odoo_source" "$output_file" --ollama-model llama3

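# Only report success if the script actually produced the output file
if os.path.exists(output_file):
    print("DONE! Check the Output section of this notebook version to download your file.")
else:
    print("Output file not found. Check the log above for errors.")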
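
The `!python3` shell escape does not raise when the script exits with an error, so later lines in the cell run regardless. One way to make a failure stop the run is to launch the script through `subprocess.run` with `check=True`. The sketch below is an assumption-laden alternative, not the notebook's original method: `build_command` and `run_generation` are hypothetical helper names, and the only script option used is the `--ollama-model` flag already shown in Step 4.

```python
import subprocess
import sys


def build_command(script_path, odoo_source, output_file, ollama_model=None):
    """Assemble the argument list for the generation script."""
    cmd = [sys.executable, script_path, odoo_source, output_file]
    if ollama_model:
        # Omitting the model leaves out the flag, matching the note in Step 4.
        cmd += ["--ollama-model", ollama_model]
    return cmd


def run_generation(cmd):
    """Run the script; raises subprocess.CalledProcessError on a non-zero exit."""
    subprocess.run(cmd, check=True)
```

For example, `run_generation(build_command(script_path, odoo_source, output_file, "llama3"))` would mirror the shell command in Step 4 while marking the notebook version as failed if the script crashes.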