# SEMA Analytics - Docker Edition for Google Colab

This notebook runs SEMA inference in a Docker container for consistent, reproducible results across any environment.

**No more environment setup issues!**

## Option 1: Run Pre-built Docker Image (Fastest)

If the Docker image is already published to Docker Hub, use this method.

In [None]:
# Mount Google Drive (optional; only needed if you copy files to or from Drive)
from google.colab import drive
drive.mount('/content/drive')

In [None]:
# Install Docker in Colab
print("Installing Docker...")
!curl -fsSL https://get.docker.com -o get-docker.sh
!sh get-docker.sh
!systemctl start docker 2>/dev/null || service docker start
print("✅ Docker installed!")

# Verify installation
!docker --version

In [None]:
# Setup directories
import os
os.makedirs('/content/data/input', exist_ok=True)
os.makedirs('/content/data/output', exist_ok=True)

print("📁 Directories created")
print("📤 Upload your Excel files to /content/data/input/")

In [None]:
# Upload files
from google.colab import files

print("📤 Please select your Excel files to upload...")
uploaded = files.upload()

# Move uploaded files to input directory
import shutil
for filename in uploaded.keys():
    if filename.endswith('.xlsx') and not filename.startswith('~'):
        shutil.move(filename, f'/content/data/input/{filename}')
        print(f"✅ Uploaded: {filename}")

# List files
!ls -lh /content/data/input/

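The upload filter above keeps `.xlsx` files and skips names that start with `~` (Excel lock files such as `~$report.xlsx`). The same rule can be factored into a reusable helper. The sketch below, `is_excel_input`, is a hypothetical name; it also skips hidden files and compares the extension case-insensitively. Which files the SEMA container itself ignores is not documented here, so treat these rules as assumptions.

```python
import os


def is_excel_input(filename: str) -> bool:
    """Hypothetical helper: True for .xlsx workbooks worth uploading.

    Skips Excel lock files (``~$name.xlsx``) and hidden files, and
    compares the extension case-insensitively.
    """
    name = os.path.basename(filename)
    if name.startswith(("~", ".")):
        return False
    return name.lower().endswith(".xlsx")


# Usage in the upload cell (assumes `uploaded` from files.upload()):
# inputs = [f for f in uploaded if is_excel_input(f)]
```
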
In [None]:
# Pull pre-built Docker image (REPLACE WITH YOUR DOCKER HUB USERNAME)
DOCKER_IMAGE = "your-dockerhub-username/sema-inference:latest"

print(f"📥 Pulling Docker image: {DOCKER_IMAGE}")
!docker pull {DOCKER_IMAGE}
print("✅ Image pulled successfully!")

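The pull cell prints its success message even if `docker pull` fails, which is easy to miss when `DOCKER_IMAGE` still holds the placeholder. A small guard run before pulling can fail fast instead. `check_image_name` below is a hypothetical helper and only a rough sanity check (no whitespace, a `namespace/repository[:tag]` shape); it does not implement Docker's full image-reference grammar.

```python
PLACEHOLDER_PREFIX = "your-dockerhub-username/"


def check_image_name(image: str) -> str:
    """Raise ValueError if `image` is the placeholder or looks malformed.

    Rough check only: expects `namespace/repository[:tag]` with no whitespace.
    """
    if image.startswith(PLACEHOLDER_PREFIX):
        raise ValueError("Replace 'your-dockerhub-username' with your Docker Hub username")
    if not image or any(ch.isspace() for ch in image):
        raise ValueError(f"Invalid image reference: {image!r}")
    # A tag can only appear after ':' in the last path component
    # (a ':' earlier would be a registry port, e.g. localhost:5000/...).
    last = image.split("/")[-1]
    repo = image[: image.rfind(":")] if ":" in last else image
    if "/" not in repo:
        raise ValueError(f"Expected 'namespace/repository[:tag]', got {image!r}")
    return image


# Usage, before `!docker pull {DOCKER_IMAGE}`:
# check_image_name(DOCKER_IMAGE)
```
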
In [None]:
# Run Docker container
print("🚀 Running SEMA inference in Docker...")
print("This may take 5-10 minutes depending on the number of files.")
print("")

!docker run --rm \
  --gpus all \
  -v /content/data/input:/workspace/data/input \
  -v /content/data/output:/workspace/data/output \
  {DOCKER_IMAGE}

print("")
print("✅ Processing complete!")

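The `docker run` call above mounts the local input and output folders at `/workspace/data/input` and `/workspace/data/output` inside the container and requests all GPUs. If you prefer to drive it from Python, for example to drop `--gpus all` on a CPU runtime as described under Troubleshooting, here is a sketch using `subprocess`. `build_run_command` and `run_inference` are hypothetical helpers; the flags and container paths are copied from the command above.

```python
import subprocess
from typing import List


def build_run_command(image: str, input_dir: str, output_dir: str,
                      use_gpu: bool = True) -> List[str]:
    """Return the argv for the same `docker run` call used in this notebook."""
    cmd = ["docker", "run", "--rm"]
    if use_gpu:
        cmd += ["--gpus", "all"]  # requires the NVIDIA Container Toolkit on the host
    cmd += [
        "-v", f"{input_dir}:/workspace/data/input",
        "-v", f"{output_dir}:/workspace/data/output",
        image,
    ]
    return cmd


def run_inference(image: str, input_dir: str, output_dir: str,
                  use_gpu: bool = True) -> int:
    """Run the container, print its captured output after it exits, return the exit code."""
    result = subprocess.run(build_run_command(image, input_dir, output_dir, use_gpu),
                            capture_output=True, text=True)
    print(result.stdout)
    print(result.stderr)
    return result.returncode


# Example (in Colab, after the setup cells):
# code = run_inference(DOCKER_IMAGE, "/content/data/input", "/content/data/output")
# print("✅ Processing complete!" if code == 0 else f"❌ docker run failed (exit code {code})")
```

Unlike the `!docker run` cell, this version only reports success when Docker returns exit code 0.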
In [None]:
# Check output files
!ls -lh /content/data/output/

In [None]:
# Download results
import os
from google.colab import files

output_dir = '/content/data/output'
output_files = [f for f in os.listdir(output_dir) if f.endswith('.xlsx')]

if output_files:
    print(f"📥 Downloading {len(output_files)} result files...")
    for filename in output_files:
        files.download(os.path.join(output_dir, filename))
    print("✅ All files downloaded!")
else:
    print("❌ No output files found")

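Calling `files.download` once per file can trigger the browser's multiple-download prompt, and the extra downloads may be blocked. An alternative is to bundle every `.xlsx` result into one zip archive with Python's `zipfile` module and download that single file. `bundle_results` and the archive name `sema_results.zip` are illustrative choices, not part of the SEMA project.

```python
import os
import zipfile


def bundle_results(output_dir: str, archive_path: str) -> str:
    """Write every .xlsx file in output_dir into one zip archive.

    Returns archive_path. Raises FileNotFoundError if there is nothing to bundle.
    """
    results = sorted(f for f in os.listdir(output_dir)
                     if f.lower().endswith(".xlsx"))
    if not results:
        raise FileNotFoundError(f"No .xlsx files found in {output_dir}")
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name in results:
            # Store files at the archive root, without the Colab directory prefix.
            zf.write(os.path.join(output_dir, name), arcname=name)
    return archive_path


# In Colab (Option 1 paths):
# from google.colab import files
# files.download(bundle_results('/content/data/output', '/content/sema_results.zip'))
```
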
---

## Option 2: Build Docker Image from Source (Slower, More Flexible)

Use this option if you need to build the Docker image from source or modify it.

In [None]:
# Mount Google Drive (optional; only needed if you copy files to or from Drive)
from google.colab import drive
drive.mount('/content/drive')

In [None]:
# Install Docker
print("Installing Docker...")
!curl -fsSL https://get.docker.com -o get-docker.sh
!sh get-docker.sh
!systemctl start docker 2>/dev/null || service docker start
print("✅ Docker installed!")

!docker --version

In [None]:
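# Clone repository (or update an existing clone)
import os

if os.path.exists('/content/sema_inf'):
    !cd /content/sema_inf && git fetch origin
else:
    !git clone https://github.com/shc443/sema_inf /content/sema_inf

# WARNING: this discards any local edits in /content/sema_inf.
# Skip this line if you have modified the source and want to keep your changes.
!cd /content/sema_inf && git reset --hard origin/main
print("✅ Repository ready")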

In [None]:
# Build Docker image
print("🔨 Building Docker image...")
print("This will take 10-15 minutes on first build.")
print("")

!cd /content/sema_inf && docker build -t sema-inference:latest .

print("")
print("✅ Docker image built successfully!")

In [None]:
# Setup data directories
!mkdir -p /content/sema_inf/data/input
!mkdir -p /content/sema_inf/data/output

print("📁 Directories ready")

In [None]:
# Upload files
from google.colab import files
import shutil

print("📤 Please select your Excel files to upload...")
uploaded = files.upload()

for filename in uploaded.keys():
    if filename.endswith('.xlsx') and not filename.startswith('~'):
        shutil.move(filename, f'/content/sema_inf/data/input/{filename}')
        print(f"✅ Uploaded: {filename}")

!ls -lh /content/sema_inf/data/input/

In [None]:
# Run Docker container
print("🚀 Running SEMA inference in Docker...")
print("This may take 5-10 minutes.")
print("")

!cd /content/sema_inf && docker run --rm \
  --gpus all \
  -v $(pwd)/data/input:/workspace/data/input \
  -v $(pwd)/data/output:/workspace/data/output \
  sema-inference:latest

print("")
print("✅ Processing complete!")

In [None]:
# Check results
!ls -lh /content/sema_inf/data/output/

In [None]:
# Download results
import os
from google.colab import files

output_dir = '/content/sema_inf/data/output'
output_files = [f for f in os.listdir(output_dir) if f.endswith('.xlsx')]

if output_files:
    print(f"📥 Downloading {len(output_files)} result files...")
    for filename in output_files:
        files.download(os.path.join(output_dir, filename))
    print("✅ All files downloaded!")
else:
    print("❌ No output files found")

---

## Troubleshooting

### Docker installation fails
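- Restart the runtime and try again
- Colab may restrict Docker: the install script can succeed while the Docker daemon still fails to start, in which case `docker --version` works but `docker pull` and `docker run` cannot connect to the daemon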

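`docker --version` only shows that the client binary is installed; it says nothing about whether the Docker daemon is running, which is the part most likely to fail in Colab. `docker info` has to contact the daemon, so its exit code is a better signal. A minimal check, assuming the `docker` CLI is on `PATH` (`docker_daemon_running` is a hypothetical helper name):

```python
import shutil
import subprocess


def docker_daemon_running(timeout: float = 20.0) -> bool:
    """Return True if `docker info` can reach a running Docker daemon."""
    if shutil.which("docker") is None:
        return False  # Docker CLI is not installed at all
    try:
        result = subprocess.run(["docker", "info"], capture_output=True,
                                text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return False  # daemon socket exists but is not responding
    return result.returncode == 0


print("✅ Docker daemon is running" if docker_daemon_running()
      else "❌ Docker daemon is not reachable")
```
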
### GPU not available
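- Change the runtime type to GPU: Runtime → Change runtime type → GPU
- `--gpus all` also requires the NVIDIA Container Toolkit on the host; the `get.docker.com` script installs Docker Engine only, not the toolkit
- Remove the `--gpus all` flag to run on CPU (slower)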

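To decide whether to keep `--gpus all`, you can first check whether the runtime exposes an NVIDIA GPU at all. The sketch below uses a rough heuristic: look for `nvidia-smi` and check that it runs successfully. It does not tell you whether the NVIDIA Container Toolkit is installed for Docker. `host_has_nvidia_gpu` and `gpu_flags` are hypothetical names.

```python
import shutil
import subprocess


def host_has_nvidia_gpu() -> bool:
    """Heuristic: True if `nvidia-smi` exists and exits successfully."""
    if shutil.which("nvidia-smi") is None:
        return False
    try:
        return subprocess.run(["nvidia-smi"], capture_output=True,
                              timeout=30).returncode == 0
    except subprocess.TimeoutExpired:
        return False


# Empty string means "run on CPU"; substitute into a `!docker run` cell as {gpu_flags}
gpu_flags = "--gpus all" if host_has_nvidia_gpu() else ""
print(f"GPU flags for docker run: {gpu_flags or '(none, running on CPU)'}")
```

In a notebook cell, IPython substitutes the variable the same way it substitutes `{DOCKER_IMAGE}`, e.g. `!docker run --rm {gpu_flags} -v ... {DOCKER_IMAGE}`.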
### Build takes too long
- Use Option 1 with pre-built image from Docker Hub
- Colab sessions have time limits; if the build doesn't finish in time, build the image on a local machine instead

### Files not processing
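- Check that your Excel files contain `VOC1` and `VOC2` columns
- Check the container logs: they print directly in the cell output because the container runs in the foreground. Because of `--rm`, the container is deleted when it exits, so `!docker logs <container-id>` only works if you remove `--rm` from the `docker run` command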
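To catch missing columns before starting the container, you can read just the header row of each workbook. The sketch below uses pandas (`pd.read_excel(..., nrows=0)`, which needs `openpyxl` installed for `.xlsx`) and checks only the first sheet. Whether SEMA reads a different sheet or needs columns beyond `VOC1` and `VOC2` is not stated here, so adjust the hypothetical `REQUIRED_COLUMNS` constant as needed.

```python
import os
from typing import Dict, Iterable, List

REQUIRED_COLUMNS = ("VOC1", "VOC2")


def missing_columns(columns: Iterable[str],
                    required: Iterable[str] = REQUIRED_COLUMNS) -> List[str]:
    """Return the required column names absent from `columns` (exact, case-sensitive)."""
    present = {str(c) for c in columns}
    return [col for col in required if col not in present]


def check_input_dir(input_dir: str) -> Dict[str, List[str]]:
    """Map each .xlsx file in input_dir to its list of missing columns."""
    import pandas as pd  # imported here so missing_columns works without pandas

    report = {}
    for name in sorted(os.listdir(input_dir)):
        if name.lower().endswith(".xlsx") and not name.startswith("~"):
            header = pd.read_excel(os.path.join(input_dir, name), nrows=0)
            report[name] = missing_columns(header.columns)
    return report


# Example (Option 1 paths):
# for name, missing in check_input_dir('/content/data/input').items():
#     print(("✅ " if not missing else "❌ ") + name, missing or "")
```
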