# 🎯 SEMA VOC Analysis - Simple Google Colab Interface

This notebook provides a simple interface for Korean Voice of Customer (VOC) sentiment analysis.

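## How to use:
1. Run the setup cell (Step 1)
2. Run the initialization cell (Step 2)
3. Run the upload cell (Step 3) and select your Excel files when prompted
4. Results download automatically once processing finishes; use Step 5 to download them again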

## Step 1: Setup Environment

In [ ]:
# Install required packages and setup environment
print("🔧 Setting up environment...")

# Install system dependencies
!apt-get update -qq
!apt-get install -y openjdk-8-jdk -qq

# Set Java environment for Korean language processing
import os
os.environ['JAVA_HOME'] = '/usr/lib/jvm/java-8-openjdk-amd64'

print("✅ Java installed")

# Install Python packages (except konlpy)
!pip install -q huggingface_hub torch transformers torchmetrics lightning

# Install and test KoNLPy separately
print("📦 Installing KoNLPy...")
!pip install konlpy -q

# Test KoNLPy installation
try:
    from konlpy.tag import Kkma
    kkma = Kkma()
    test_result = kkma.morphs("테스트")
    print("✅ KoNLPy working correctly")
except Exception as e:
    print(f"⚠️ KoNLPy test failed: {e}")
    print("Retrying installation...")
    !pip install --upgrade konlpy -q
    from konlpy.tag import Kkma
    kkma = Kkma()
    print("✅ KoNLPy working after retry")

print("✅ Python packages installed")

# Clone repository (skipped if it already exists, so this cell can be re-run safely)
%cd /content
![ -d sema_inf ] || git clone -q https://github.com/your-username/sema_inf.git
%cd sema_inf
!pip install -q -e .

print("✅ Repository cloned and installed")
print("🎉 Setup complete! Ready to process VOC data.")

## Step 2: Initialize SEMA Processor

In [None]:
# Import and initialize the SEMA CLI
from colab_cli import SemaColabCLI

print("🚀 Initializing SEMA VOC Analysis...")
print("This will download the AI model and data files (may take a few minutes)")

# Initialize the processor
sema = SemaColabCLI()

print("✅ SEMA processor ready!")

## Step 3: Upload and Process Files

In [None]:
# Upload your Excel files
print("📤 Please upload your Excel files:")
print("Your files should have a VOC1 column (and optionally VOC2) with Korean text")

uploaded_files = sema.upload_files()

if uploaded_files:
    print(f"\n🔄 Processing {len(uploaded_files)} files...")
    print("This may take several minutes depending on file size...")
    
    # Process all uploaded files
    success_count = sema.process_all_files()
    
    if success_count > 0:
        print(f"\n🎉 Successfully processed {success_count} files!")
        print("📥 Downloading results...")
        sema.download_results()
        print("\n✅ All done! Check your downloads folder for the results.")
    else:
        print("❌ No files were successfully processed. Please check your input files.")
else:
    print("❌ No files uploaded. Please run this cell again and select files.")

## Step 4: Check Status (Optional)

In [None]:
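# Check processing status
import os
import torch

def list_xlsx(folder):
    # Return the .xlsx files in folder, or an empty list if the folder does not exist yet
    if not os.path.isdir(folder):
        return []
    return sorted(f for f in os.listdir(folder) if f.endswith('.xlsx'))

input_files = list_xlsx('data/input')
output_files = list_xlsx('data/output')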

print("📊 Processing Status:")
print(f"📁 Input files: {len(input_files)}")
for f in input_files:
    print(f"   - {f}")

print(f"\n📁 Output files: {len(output_files)}")
for f in output_files:
    print(f"   - {f}")

print(f"\n🖥️ GPU available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU name: {torch.cuda.get_device_name(0)}")

## Step 5: Download Results Again (If Needed)

In [None]:
# Download results again if needed (requires the `sema` object from Step 2)
sema.download_results()

## Interactive Mode (Advanced)

In [None]:
# Run interactive CLI mode (requires the `sema` object from Step 2)
sema.run_interactive()

---

## 📋 Instructions for Your Clients

### Input File Format
Your Excel files should have these columns:
- **VOC1**: First voice of customer text (Korean)
- **VOC2**: Second voice of customer text (Korean), optional
- Other columns will be preserved in the output
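
Before uploading, it can help to confirm that a file has the expected column headers. The sketch below is only an illustration: `missing_voc_columns` is a hypothetical helper, not part of the SEMA package, and it assumes the column names must match `VOC1` and `VOC2` exactly (including case).

```python
def missing_voc_columns(headers, required=("VOC1",), optional=("VOC2",)):
    """Return (missing_required, missing_optional) for a list of header names.

    Hypothetical helper: assumes exact, case-sensitive column names.
    """
    present = set(headers)
    missing_required = [name for name in required if name not in present]
    missing_optional = [name for name in optional if name not in present]
    return missing_required, missing_optional


# Made-up header row for illustration
headers = ["ID", "VOC1", "Date"]
missing_required, missing_optional = missing_voc_columns(headers)

if missing_required:
    print(f"Missing required columns: {missing_required}")
else:
    print("Required columns present")
if missing_optional:
    print(f"Optional columns not found: {missing_optional}")
```

With pandas installed, the header row of a real file could be read with `pd.read_excel(path, nrows=0).columns` and passed to the helper as a list.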

### Output File Format
The processed files will have additional columns:
- **VOC**: Cleaned text
- **pred**: Predicted labels
- **topic**: Extracted topic
- **sentiment**: Sentiment analysis result
- **keyword**: Extracted keywords
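
Once a result file is downloaded, a quick tally of the **sentiment** column gives a first overview. The following is a minimal sketch, not part of the SEMA package: `sentiment_counts` is a hypothetical helper, and the label values in the sample rows are invented for illustration, since the actual labels depend on the model.

```python
from collections import Counter


def sentiment_counts(rows):
    """Count rows per value of the 'sentiment' column, skipping blank values."""
    return Counter(row["sentiment"] for row in rows if row.get("sentiment"))


# Made-up rows; the real label names may differ
rows = [
    {"VOC": "sample text 1", "sentiment": "positive"},
    {"VOC": "sample text 2", "sentiment": "negative"},
    {"VOC": "sample text 3", "sentiment": "positive"},
    {"VOC": "sample text 4", "sentiment": ""},
]

for label, count in sentiment_counts(rows).most_common():
    print(f"{label}: {count}")
```

With pandas installed, the rows of a real output file could be obtained with `pd.read_excel(path).to_dict("records")`.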

### Tips
- 🚀 **GPU**: Make sure to use GPU runtime (Runtime → Change runtime type → GPU)
- 📊 **File Size**: Larger files will take longer to process
- 💾 **Memory**: If you get memory errors, restart the runtime and run the cells again from Step 1
- 📱 **Multiple Files**: You can upload and process multiple files at once
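
If a single large file keeps causing memory errors, one workaround that may help is to split it into several smaller files and upload those instead. This is a sketch of the row-splitting logic only, under the assumption that rows can be processed independently; `chunk_ranges` is a hypothetical helper and not something the notebook itself provides.

```python
def chunk_ranges(n_rows, chunk_size):
    """Yield (start, stop) row-index pairs that cover n_rows in order."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, n_rows, chunk_size):
        yield start, min(start + chunk_size, n_rows)


# Example: 2500 rows split into pieces of at most 1000 rows
for part, (start, stop) in enumerate(chunk_ranges(2500, 1000), start=1):
    print(f"part {part}: rows {start} to {stop - 1}")
```

With pandas installed, each piece could be written out with `df.iloc[start:stop].to_excel(f"part_{part}.xlsx", index=False)`.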

### Troubleshooting
- If setup fails, restart the runtime and run the cells again from Step 1
- If processing fails, check that your Excel files have a VOC1 column (and VOC2, if you use it)
- If download fails, run the download cell again
