# 🎯 SEMA VOC Analysis - Client Interface

## Korean Voice of Customer Sentiment Analysis

### Instructions:
1. **Run the Setup cell** in Step 1 (one time per session)
2. **Run the Processing cell** in Step 2
3. **Upload your Excel files** when prompted
4. **Download the results**, which starts automatically when processing finishes

### File Requirements:
- Excel files (.xlsx) with **VOC1** and **VOC2** columns
- Korean text in the VOC columns
- No special formatting required
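
Before uploading, it can save time to confirm that a workbook has the expected columns. The following is a minimal sketch and not part of the SEMA package: `missing_voc_columns` and `check_workbook` are hypothetical helper names, and the pandas call assumes `pandas` and `openpyxl` are installed wherever the check runs.

```python
REQUIRED_COLUMNS = ("VOC1", "VOC2")  # column names taken from the requirements above


def missing_voc_columns(columns, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from `columns`."""
    present = {str(c) for c in columns}
    return [name for name in required if name not in present]


def check_workbook(path):
    """Read only the header row of an .xlsx file and report missing columns."""
    import pandas as pd  # imported lazily so the helper above has no dependencies

    header = pd.read_excel(path, nrows=0)  # nrows=0 loads the column names only
    return missing_voc_columns(header.columns)


print(missing_voc_columns(["ID", "VOC1", "date"]))  # ['VOC2']
```

An empty list from `check_workbook("my_file.xlsx")` would mean both columns are present; the file name here is only a placeholder.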

---

## 🔧 Step 1: Setup (Run Once)

**Important**: Set your runtime to **GPU** before running this cell. Changing the runtime type later resets the session, so setup would have to be run again.
- Go to: **Runtime** → **Change runtime type** → **GPU**

In [ ]:
# ===== SETUP ENVIRONMENT =====
print("🔧 Setting up SEMA VOC Analysis environment...")
print("This may take 2-3 minutes on first run.")

# Install system dependencies
!apt-get update -qq
!apt-get install -y openjdk-8-jdk -qq

# Set Java environment for Korean language processing
import os
os.environ['JAVA_HOME'] = '/usr/lib/jvm/java-8-openjdk-amd64'
print("✅ Java installed")

# Install Python packages with compatible versions
!pip install -q "huggingface_hub>=0.16.0" "torch>=2.0.0" "transformers>=4.30.0,<5.0.0" "torchmetrics>=0.11.0" "lightning>=2.0.0"

# Install and test KoNLPy for Korean text processing
print("📦 Installing Korean language processor...")
!pip install konlpy -q

# Test Korean language processing
try:
    from konlpy.tag import Kkma
    kkma = Kkma()
    test_result = kkma.morphs("테스트")
    print("✅ Korean language processor working")
except Exception as e:
    print(f"⚠️ Retrying Korean language setup: {e}")
    !pip install --upgrade konlpy -q
    from konlpy.tag import Kkma
    kkma = Kkma()
    print("✅ Korean language processor ready")

print("✅ Python packages installed")

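# Clone the SEMA repository and enter it (safe to re-run: an existing clone is reused)
if os.path.basename(os.getcwd()) != 'sema_inf':
    if not os.path.isdir('sema_inf'):
        !git clone -q https://github.com/shc443/sema_inf.git
    %cd sema_inf
!pip install -q -e .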

print("✅ Repository installed")
print("🎉 Setup complete! Ready to process VOC data.")

# Check GPU availability
import torch
if torch.cuda.is_available():
    print(f"🚀 GPU available: {torch.cuda.get_device_name(0)}")
else:
    print("⚠️ GPU not available - please change runtime to GPU for faster processing")

## 📤 Step 2: Process Your Files

Run this cell to upload and process your Excel files:

In [None]:
# ===== PROCESS VOC FILES =====
from colab_cli import SemaColabCLI

print("🚀 Initializing SEMA VOC Analysis...")
print("Downloading AI model (this may take a few minutes on first run)...")

# Initialize the processor (auto-downloads model from HuggingFace)
sema = SemaColabCLI()

print("\n📤 Please upload your Excel files:")
print("Your files should have VOC1 and VOC2 columns with Korean text")
print("You can select multiple files at once.")

# Upload files
uploaded_files = sema.upload_files()

if uploaded_files:
    print(f"\n🔄 Processing {len(uploaded_files)} files...")
    print("⏳ This may take several minutes depending on file size...")
    
    # Process all uploaded files
    success_count = sema.process_all_files()
    
    if success_count > 0:
        print(f"\n🎉 Successfully processed {success_count} files!")
        print("📥 Downloading results...")
        sema.download_results()
        print("\n✅ COMPLETE! Check your downloads folder for the results.")
        print("\n📋 Output files contain:")
        print("   - VOC: Cleaned text")
        print("   - topic: Extracted topic")
        print("   - sentiment: Sentiment analysis")
        print("   - keyword: Extracted keywords")
    else:
        print("❌ Processing failed. Please check that your input files have VOC1 and VOC2 columns.")
else:
    print("❌ No files uploaded. Please run this cell again and select files.")

## 📊 Step 3: Check Status (Optional)

Run this cell to list the input and output files and to confirm GPU availability:

In [None]:
# ===== CHECK STATUS =====
import os
import torch

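# Paths are relative to the sema_inf folder entered during setup.
# A missing folder is reported as empty instead of raising FileNotFoundError.
def list_xlsx(folder):
    if not os.path.isdir(folder):
        return []
    return sorted(f for f in os.listdir(folder) if f.endswith('.xlsx'))

input_files = list_xlsx('data/input')
output_files = list_xlsx('data/output')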

print("📊 Processing Status:")
print(f"📁 Input files: {len(input_files)}")
for f in input_files:
    print(f"   - {f}")

print(f"\n📁 Output files: {len(output_files)}")
for f in output_files:
    print(f"   - {f}")

print(f"\n🖥️ GPU available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")

## 📥 Step 4: Download Results Again (If Needed)

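If you need to download the results again, run this cell. It reuses the `sema` object created in Step 2, so Step 2 must already have run in the current session: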

In [None]:
# ===== DOWNLOAD RESULTS AGAIN =====
sema.download_results()

---

## 🆘 Troubleshooting

### Common Issues:

1. **"No GPU available"**
   - Go to Runtime → Change runtime type → GPU
   - Restart runtime and run setup again

2. **"Processing failed"**
   - Check that your Excel files have VOC1 and VOC2 columns
   - Make sure the columns contain Korean text

3. **"Memory error"**
   - Your file might be too large
   - Try processing smaller files first
   - Restart runtime and try again

4. **"Setup failed"**
   - Restart runtime (Runtime → Restart runtime)
   - Run the setup cell again
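
For the memory error case above, one workaround is to split a large workbook into smaller files before uploading. This is a rough sketch under stated assumptions: `chunk_ranges` and `split_workbook` are hypothetical helpers, the default of 5000 rows per file is an arbitrary starting point rather than a tested limit, and the pandas calls assume `pandas` and `openpyxl` are installed.

```python
def chunk_ranges(total_rows, chunk_size):
    """Return (start, stop) row ranges that cover total_rows in chunk_size steps."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [(start, min(start + chunk_size, total_rows))
            for start in range(0, total_rows, chunk_size)]


def split_workbook(path, chunk_size=5000):
    """Write the workbook at `path` as several smaller .xlsx files."""
    import pandas as pd  # lazy import; needs pandas and openpyxl

    df = pd.read_excel(path)
    stem = path[:-len(".xlsx")] if path.endswith(".xlsx") else path
    written = []
    for i, (start, stop) in enumerate(chunk_ranges(len(df), chunk_size), start=1):
        out_name = f"{stem}_part{i}.xlsx"
        # Every part keeps the header row, so VOC1/VOC2 stay in place
        df.iloc[start:stop].to_excel(out_name, index=False)
        written.append(out_name)
    return written


print(chunk_ranges(12000, 5000))  # [(0, 5000), (5000, 10000), (10000, 12000)]
```

Each part can then be uploaded on its own, and the outputs combined afterwards if needed.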

### Need Help?
- Make sure you've set runtime to GPU
- Try with a smaller test file first
- Restart runtime if you encounter errors

---

## 📋 File Format Guide

### Input Files Should Have:
- **VOC1**: Korean customer feedback text
- **VOC2**: Additional Korean feedback text (optional)
- Other columns will be preserved in output
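
To try the pipeline on a small test file first, a workbook in this layout can be generated in a few lines. The sketch below is illustrative only: the `id` column, the sample sentences, and the file name `sample_voc.xlsx` are made up, and the second row leaves VOC2 empty on the assumption that an optional column may be blank.

```python
def build_sample_rows():
    """Return a few rows in the documented input layout (values are invented)."""
    return [
        {"id": 1, "VOC1": "배송이 빨라서 좋았어요", "VOC2": "포장도 깔끔했습니다"},
        {"id": 2, "VOC1": "상담 연결이 너무 오래 걸렸어요", "VOC2": ""},
    ]


def write_sample_workbook(path="sample_voc.xlsx"):
    """Save the sample rows as an .xlsx file and return its path."""
    import pandas as pd  # lazy import; needs pandas and openpyxl

    pd.DataFrame(build_sample_rows()).to_excel(path, index=False)
    return path


print(list(build_sample_rows()[0]))  # ['id', 'VOC1', 'VOC2']
```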

### Output Files Will Have:
- **All original columns**
- **VOC**: Cleaned and processed text
- **pred**: AI predictions
- **topic**: Extracted topic categories
- **sentiment**: Sentiment analysis (positive/negative)
- **keyword**: Key words extracted from text
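
Once a result file is downloaded, the `topic` and `sentiment` columns can be tallied for a quick overview. This is a minimal sketch that assumes each row holds a single topic value and a single sentiment value; the real cell format may differ, and `sentiment_by_topic` and `load_rows` are hypothetical helpers rather than part of SEMA.

```python
from collections import Counter


def sentiment_by_topic(rows):
    """Count (topic, sentiment) pairs in a list of row dictionaries."""
    counts = Counter()
    for row in rows:
        counts[(row["topic"], row["sentiment"])] += 1
    return counts


def load_rows(path):
    """Load an output workbook as a list of dictionaries, one per row."""
    import pandas as pd  # lazy import; needs pandas and openpyxl

    return pd.read_excel(path).to_dict(orient="records")


# Invented example rows; real topic labels come from the model output
example = [
    {"topic": "delivery", "sentiment": "positive"},
    {"topic": "delivery", "sentiment": "negative"},
    {"topic": "delivery", "sentiment": "positive"},
]
print(sentiment_by_topic(example)[("delivery", "positive")])  # 2
```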

---

*Powered by SEMA AI - Korean VOC Analysis System*