# 🎯 SEMA VOC Analysis

Sentiment analysis of Korean Voice of Customer (VOC) text

## Instructions:
1. Set the runtime to **GPU** (Runtime → Change runtime type → GPU)
2. Run the Setup cell (rerun it after every runtime restart)
3. Upload Excel files to the input folder
4. **Click ONE button** to process all files and download the results

## File Requirements:
- Excel files with **VOC1** and **VOC2** columns
- Korean text content
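
A quick check before uploading can catch a missing column ahead of a long run. The sketch below is not part of SEMA: `missing_voc_columns` and `check_excel` are hypothetical helper names, and the exact-match comparison on column names is an assumption about how the columns are expected to appear.

```python
REQUIRED_COLUMNS = ("VOC1", "VOC2")  # column names from the requirements above


def missing_voc_columns(columns):
    """Return the required column names that are absent from `columns`."""
    present = {str(c) for c in columns}
    return [name for name in REQUIRED_COLUMNS if name not in present]


def check_excel(path):
    """Read only the header row of an Excel file and report missing columns."""
    import pandas as pd  # imported lazily; reading .xlsx also needs openpyxl

    header = pd.read_excel(path, nrows=0)  # nrows=0 loads column names only
    return missing_voc_columns(header.columns)


print(missing_voc_columns(["ID", "VOC1", "VOC2"]))  # []
print(missing_voc_columns(["ID", "VOC1"]))          # ['VOC2']
```

An empty list means both required columns are present.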

## 🔧 Setup (Run after restarts)

In [None]:
print("🔧 Setting up SEMA environment...")

# Install system dependencies
!apt-get update -qq && apt-get install -y openjdk-8-jdk -qq

# Set Java environment
import os
os.environ['JAVA_HOME'] = '/usr/lib/jvm/java-8-openjdk-amd64'

# Install packages (including psutil for monitoring)
!pip install -q "huggingface_hub>=0.16.0" "torch>=2.0.0" "transformers>=4.30.0,<5.0.0" "torchmetrics>=0.11.0" "lightning>=2.0.0" konlpy psutil

# Setup repository (skip the clone if it already exists from a previous run)
![ -d sema_inf ] || git clone -q https://github.com/shc443/sema_inf.git
%cd sema_inf
!pip install -q -e .

# Check GPU
import torch
if torch.cuda.is_available():
    print(f"🚀 GPU: {torch.cuda.get_device_name(0)}")
    print(f"   Memory: {torch.cuda.get_device_properties(0).total_memory/1024**3:.1f}GB")
else:
    print("⚠️ No GPU - change runtime to GPU for faster processing")

# Test system monitoring
try:
    import psutil
    print(f"🖥️ System: CPU={psutil.cpu_count()} cores, RAM={psutil.virtual_memory().total/1024**3:.1f}GB")
    print("✅ Monitoring system ready")
except Exception:
    print("⚠️ Monitoring system not available")

print("✅ Setup complete!")

## 📁 Upload Files to Input Folder

In [None]:
from colab_cli import SemaColabCLI
import os

print("🚀 Initializing SEMA...")
sema = SemaColabCLI()

# Create input/output directories if they don't exist
os.makedirs('data/input', exist_ok=True)
os.makedirs('data/output', exist_ok=True)

print("📁 Upload your Excel files (with VOC1/VOC2 columns):")
print("Files will be saved to input folder for processing.")
uploaded_files = sema.upload_files()

if uploaded_files:
    print(f"✅ Uploaded {len(uploaded_files)} files to input folder!")
    print("📂 Files ready for processing. Run the next cell to process all files.")
else:
    print("❌ No files uploaded.")

## 🚀 Process All Files

In [None]:
# Import safe processing system
from colab_safe_processor import run_safe_processing

# Run complete processing with timeout and error monitoring
print("🚀 Starting Safe Automated Processing...")
print("⏰ Each file has 15-minute timeout protection")
print("📊 Progress will be monitored and logged")
print("🚨 Errors will be saved to logs/errors/ folder")
print()

# Process all files with full safety monitoring
success = run_safe_processing()

if success:
    print("\n🎉 ALL PROCESSING COMPLETED SUCCESSFULLY!")
    print("📥 Results have been automatically downloaded")
    print("✅ Check your Downloads folder for the output files")
else:
    print("\n❌ Processing failed or incomplete")
    print("🔍 Check the logs above for details")
    print("💡 Try restarting runtime if GPU issues occurred")

## 📊 Check Status & Files

In [None]:
import os
import torch

input_files = [f for f in os.listdir('data/input') if f.endswith('.xlsx')]
output_files = [f for f in os.listdir('data/output') if f.endswith('.xlsx')]

print(f"📁 Input: {len(input_files)} files")
print(f"📁 Output: {len(output_files)} files")
print(f"🖥️ GPU: {torch.cuda.is_available()}")

## 📥 Download Results Again

In [None]:
# Download results again (if needed)
try:
    sema  # reuse the instance created in the upload cell, if it exists
except NameError:
    # sema is not defined (e.g. after a runtime restart), so create it
    from colab_cli import SemaColabCLI
    sema = SemaColabCLI()

sema.download_results()
print("✅ All output files downloaded!")

## 🆘 Troubleshooting & Safety Features

**New Safety Features:**
- ⏰ **15-minute timeout** per file (prevents hanging)
- 📊 **Real-time monitoring** of CPU, RAM, and GPU usage
- 🚨 **Automatic error reporting** saved to logs/errors/
- 🛡️ **Process recovery** with detailed status logging
- 🧹 **Automatic cleanup** of GPU memory on errors
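
How `run_safe_processing` enforces its timeout is not shown in this notebook. As a rough illustration of the idea only, one common approach is to run each unit of work in a child process and stop it when the deadline passes. In the sketch below, `run_with_timeout` and `FILE_TIMEOUT_SECONDS` are hypothetical names, and the demo commands are placeholders rather than SEMA commands.

```python
import subprocess
import sys

FILE_TIMEOUT_SECONDS = 15 * 60  # the 15-minute limit described above


def run_with_timeout(command, timeout=FILE_TIMEOUT_SECONDS):
    """Run `command` in a child process; return 'ok', 'failed', or 'timeout'."""
    try:
        # subprocess.run kills the child and raises if the deadline passes.
        result = subprocess.run(command, timeout=timeout)
    except subprocess.TimeoutExpired:
        return "timeout"
    return "ok" if result.returncode == 0 else "failed"


# Demonstration with a short timeout: the slow child sleeps longer than allowed.
fast = [sys.executable, "-c", "pass"]
slow = [sys.executable, "-c", "import time; time.sleep(5)"]
print(run_with_timeout(fast, timeout=30))  # ok
print(run_with_timeout(slow, timeout=1))   # timeout
```

Running the work in a separate process means a hung file can be stopped without restarting the whole notebook runtime.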

**Common Issues:**
1. **No GPU**: Runtime → Change runtime type → GPU
2. **Processing hangs**: The 15-minute per-file timeout stops the file and reports the issue
3. **Memory error**: GPU memory is cleared automatically after the error
4. **Setup failed**: Restart runtime and run setup again
5. **Timeout occurred**: Check logs/errors/ folder for detailed report
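
If a memory error persists and you want to release GPU memory by hand before retrying, the following sketch may help. It is not SEMA's own cleanup routine; `free_gpu_memory` is a hypothetical helper that uses only standard PyTorch calls.

```python
import gc


def free_gpu_memory():
    """Release unused cached GPU memory; return True if a GPU was cleaned."""
    gc.collect()  # drop unreachable Python objects that may still hold tensors
    try:
        import torch
    except ImportError:
        return False  # PyTorch is not installed, nothing to clean
    if not torch.cuda.is_available():
        return False  # no GPU attached to this runtime
    torch.cuda.empty_cache()  # return cached, unused blocks to the driver
    return True


cleaned = free_gpu_memory()
print("GPU cache cleared" if cleaned else "No GPU cache to clear")
```

`torch.cuda.empty_cache()` only releases memory that no live tensor is using, so delete large variables (or restart the runtime) if usage stays high.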

**Error Logs Location:**
- 📁 **logs/**: Processing logs with timestamps
- 🚨 **logs/errors/**: Detailed error reports in JSON format
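
To read the error reports inside the notebook instead of opening each file, something like the sketch below could be used. The report schema is not documented here, so the code prints whatever JSON it finds; `load_error_reports` is a hypothetical helper, and the `.json` file extension is an assumption.

```python
import json
from pathlib import Path


def load_error_reports(error_dir="logs/errors"):
    """Return {file name: parsed JSON} for every report in `error_dir`."""
    reports = {}
    for path in sorted(Path(error_dir).glob("*.json")):
        try:
            reports[path.name] = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, ValueError) as err:
            # Keep going if one report is unreadable or not valid JSON.
            reports[path.name] = {"unreadable": str(err)}
    return reports


for name, report in load_error_reports().items():
    print(name)
    print(json.dumps(report, ensure_ascii=False, indent=2))
```

`ensure_ascii=False` keeps any Korean text in the reports readable when printed.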

**Output Format:**
- Original columns preserved
- **VOC**: Cleaned text
- **topic**: Extracted topics
- **sentiment**: Positive/negative analysis
- **keyword**: Key extracted words

**If Problems Persist:**
1. Restart Colab runtime (Runtime → Restart runtime)
2. Process files in smaller batches
3. Check error reports in logs/errors/ folder
4. Verify Excel files have VOC1/VOC2 columns with Korean text
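
For step 2, a small helper can plan which files go into each batch. This is only a sketch: `batches` and `list_input_files` are hypothetical helpers, the batch size is an arbitrary example, and this notebook does not show a way to pass a batch to SEMA directly, so the plan is meant for uploading and processing one group at a time.

```python
import os


def batches(items, size):
    """Split `items` into consecutive lists of at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    items = list(items)
    return [items[i:i + size] for i in range(0, len(items), size)]


def list_input_files(input_dir="data/input"):
    """Return the sorted .xlsx file names in `input_dir` (empty if missing)."""
    if not os.path.isdir(input_dir):
        return []
    return sorted(f for f in os.listdir(input_dir) if f.endswith(".xlsx"))


print(batches(["a.xlsx", "b.xlsx", "c.xlsx", "d.xlsx", "e.xlsx"], 2))
# [['a.xlsx', 'b.xlsx'], ['c.xlsx', 'd.xlsx'], ['e.xlsx']]
```

Calling `batches(list_input_files(), 2)` would plan batches for the files currently in the input folder.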