# 🎯 Welcome to LDaCA - Language Data Commons of Australia

## 📋 What is LDaCA?

The **Language Data Commons of Australia (LDaCA)** is a comprehensive platform for discovering, accessing, and working with language and speech data. This BinderHub deployment provides an interactive environment where you can:

- 🔍 Explore language datasets and collections
- 📊 Analyze linguistic data using the DocFrame and DocWorkspace libraries
- 🚀 Work with the LDaCA web application and API
- 📝 Create and share research notebooks

## 🚀 Getting Started in BinderHub

### LDaCA Web Application

The LDaCA web application is running in this container and accessible through Jupyter's server proxy:

**🌐 [Access LDaCA Web App →](../proxy/443/)**

This will open the LDaCA interface where you can:
- Browse language data collections
- Search and filter datasets
- View metadata and documentation
- Access data files

### API Access

The LDaCA FastAPI backend is also available:

**🔧 [LDaCA API Documentation →](../proxy/8001/docs)**
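
The relative links above resolve against the notebook's own URL. If you need the same paths inside code, the sketch below shows one way to build them. It assumes the session was launched through JupyterHub/BinderHub, which sets the `JUPYTERHUB_SERVICE_PREFIX` environment variable, and that the server proxy exposes each port under `proxy/<port>/` as the links above suggest. `proxy_url` is a hypothetical helper written for this example, not part of any LDaCA library.

```python
import os


def proxy_url(port, path="", prefix=None):
    """Build the server-proxy path for a service listening on `port`.

    `prefix` defaults to JUPYTERHUB_SERVICE_PREFIX (assumed to be set by
    the hub); outside a hub it falls back to "/".
    """
    if prefix is None:
        prefix = os.environ.get("JUPYTERHUB_SERVICE_PREFIX", "/")
    # Normalise to exactly one leading and one trailing slash.
    prefix = "/" + prefix.strip("/")
    if prefix != "/":
        prefix += "/"
    return f"{prefix}proxy/{port}/{path.lstrip('/')}"
```

For example, `proxy_url(8001, "docs", prefix="/user/demo/")` gives `/user/demo/proxy/8001/docs`; append that to your Binder host name to get a full URL.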

## 📚 Available Libraries

This environment includes the core LDaCA libraries:

### DocFrame
A powerful library for working with document collections:
- Load and manipulate text corpora
- Extract metadata and content
- Perform text analysis operations

### DocWorkspace  
Workspace management for document collections:
- Organize and structure data
- Create reproducible workflows
- Manage file operations

## 🏃‍♂️ Quick Start Examples

The example notebooks in the `examples/` directory cover the following topics:

- **Basic Usage**: Introduction to DocFrame and DocWorkspace
- **Data Exploration**: Working with language datasets
- **Text Analysis**: Common text processing tasks
- **API Integration**: Using the LDaCA web API

## 💡 Tips for BinderHub Users

1. **Saving Your Work**: Files are kept only for the current session and are not preserved once the container stops, so download anything you want to keep
2. **File Management**: Use the file browser on the left to navigate and create files
3. **Multiple Services**: The LDaCA web app runs alongside Jupyter - both are available simultaneously
4. **Help Resources**: Check the `examples/` directory for sample notebooks and data
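
Because files do not outlive the session, it can help to bundle a folder into a single archive before downloading it from the file browser. The sketch below uses only the standard library. `archive_work` is a hypothetical helper for illustration, and the folder names in the usage note are placeholders.

```python
import shutil
from pathlib import Path


def archive_work(source_dir, archive_stem):
    """Zip everything under `source_dir` into `<archive_stem>.zip`.

    Returns the path of the archive that was written. Keep the archive
    outside `source_dir` so it does not try to include itself.
    """
    archive = shutil.make_archive(str(archive_stem), "zip", root_dir=str(source_dir))
    return Path(archive)
```

A call such as `archive_work("my_notebooks", "my_notebooks_backup")` writes `my_notebooks_backup.zip` next to the folder, ready to download.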

## 🔧 Technical Details

- **Python Environment**: Python 3.10 with LDaCA libraries pre-installed
- **Services**: nginx (port 443), FastAPI backend (port 8001), Jupyter Lab
- **Data Location**: Sample data available in `examples/data/`
- **Service Management**: All services managed via supervisor daemon

---

**Ready to explore? Start with the [LDaCA Web App](../proxy/443/) or create a new notebook below!**

In [None]:
# Quick Library Check
import sys
import os

print("🐍 Python Environment:")
print(f"Python version: {sys.version}")
print(f"Working directory: {os.getcwd()}")

# Check if LDaCA libraries are available
try:
    import docframe
    print("✅ DocFrame library loaded successfully")
    print(f"DocFrame version: {getattr(docframe, '__version__', 'Unknown')}")
except ImportError as e:
    print(f"⚠️  DocFrame library not found: {e}")

try:
    import docworkspace
    print("✅ DocWorkspace library loaded successfully")
    print(f"DocWorkspace version: {getattr(docworkspace, '__version__', 'Unknown')}")
except ImportError as e:
    print(f"⚠️  DocWorkspace library not found: {e}")

print("\n🌐 Service Status:")
print("LDaCA Web App: https://[your-binder-url]/proxy/443/")
print("LDaCA API: https://[your-binder-url]/proxy/8001/docs")
print("\n🎉 Ready to start exploring LDaCA!")

In [None]:
# LDaCA Service Startup (Run this first!)
import subprocess
import time
import os
import requests
from pathlib import Path

print("🚀 Starting LDaCA Services...")

# Check if services are already running
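def check_service(port, name):
    """Return True if the service on `port` answers with a non-5xx status (`name` is only a label)."""
    try:
        response = requests.get(f"http://localhost:{port}", timeout=2)
        return response.status_code < 500
    except requests.RequestException:
        # Connection refused, timeout, etc.: treat the service as not running
        return False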

# Check current service status
ldaca_running = check_service(443, "LDaCA Web App")
api_running = check_service(8001, "LDaCA API")

if ldaca_running and api_running:
    print("✅ All LDaCA services are already running!")
else:
    print("🔄 Starting LDaCA services...")
    
    # Start the service launcher in background
    if os.path.exists("/usr/local/bin/start-ldaca-services.sh"):
        try:
            # Run the startup script; discard its output so an unread pipe cannot fill up and stall it
            subprocess.Popen(["/usr/local/bin/start-ldaca-services.sh"],
                             stdout=subprocess.DEVNULL,
                             stderr=subprocess.DEVNULL)
            print("🔄 Service launcher started...")
            
            # Wait a moment for services to start
            time.sleep(5)
            
            # Re-check services
            ldaca_running = check_service(443, "LDaCA Web App")
            api_running = check_service(8001, "LDaCA API")
            
            if ldaca_running:
                print("✅ LDaCA Web App is running!")
            else:
                print("⚠️  LDaCA Web App may still be starting...")
                
            if api_running:
                print("✅ LDaCA API is running!")
            else:
                print("⚠️  LDaCA API may still be starting...")
                
        except Exception as e:
            print(f"⚠️  Error starting services: {e}")
            print("🔧 You can manually start services later if needed")
    else:
        print("⚠️  Service launcher script not found")

print("\n🌐 Quick Links:")
print("  • LDaCA Web App: ../proxy/443/")
print("  • LDaCA API Docs: ../proxy/8001/docs")
print("  • API Health: ../proxy/8001/api/health")

# Check for startup marker from postBuild
if Path("/home/jovyan/.ldaca-start-marker").exists():
    print("\n✅ PostBuild completed successfully")

## ⏰ Service Startup Information

**Important**: The LDaCA services (nginx and the FastAPI backend) start in the background and may take 10-15 seconds to become fully available after Jupyter loads.

### Service Status Indicators:
- ✅ **Jupyter**: Ready immediately  
- 🔄 **LDaCA Services**: Starting in background (check `/tmp/supervisor.log` for details)
- 🌐 **Web App Access**: Available at [/proxy/443/](../proxy/443/) once services are ready
- 📚 **API Access**: Available at [/proxy/8001/docs](../proxy/8001/docs) once services are ready

If LDaCA web app links don't work immediately, please wait a moment and try again!
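
The "wait and try again" advice can also be automated with a small polling loop. The sketch below is one way to do it using only the standard library (`urllib` stands in for `requests` to keep it dependency-free). `is_up` and `wait_until_ready` are hypothetical helper names, and the 30-second default timeout is an assumption chosen to sit comfortably above the 10-15 second startup window mentioned above.

```python
import time
import urllib.error
import urllib.request


def is_up(url, timeout=2.0):
    """Return True if `url` answers with a non-5xx HTTP status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status < 500
    except urllib.error.HTTPError as err:
        # The server answered, just with an error status (e.g. 404).
        return err.code < 500
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, and similar: not up yet.
        return False


def wait_until_ready(probe, timeout=30.0, interval=1.0):
    """Call `probe()` until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

In a notebook cell you could then run `wait_until_ready(lambda: is_up("http://localhost:8001/api/health"))` and open the links only once it returns `True`. Passing the check in as a callable keeps the retry logic independent of any particular service.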

In [None]:
# Check LDaCA Service Status
import os

import requests

def check_service_status():
    """Check if LDaCA services are ready"""
    services = {
        "LDaCA Web App (nginx)": "http://localhost:443",
        "LDaCA API (FastAPI)": "http://localhost:8001/api/health"
    }
    
    print("🔍 Checking LDaCA service status...\n")
    
    for service_name, url in services.items():
        try:
            response = requests.get(url, timeout=3)
            if response.status_code == 200:
                print(f"✅ {service_name}: Ready")
            else:
                print(f"🔄 {service_name}: Starting (HTTP {response.status_code})")
        except requests.exceptions.ConnectionError:
            print(f"🔄 {service_name}: Starting (not yet available)")
        except requests.exceptions.Timeout:
            print(f"⏳ {service_name}: Starting (timeout)")
        except Exception as e:
            print(f"❓ {service_name}: {str(e)}")
    
    # Check supervisor log if available
    log_file = "/tmp/supervisor.log"
    if os.path.exists(log_file):
        print("\n📋 Recent supervisor log entries:")
        with open(log_file, 'r') as f:
            lines = f.readlines()
            for line in lines[-5:]:  # Show last 5 lines
                print(f"   {line.strip()}")
    else:
        print("\n⏳ Supervisor log not yet available - services still starting")
    
    print("\n🌐 Access links:")
    print("   • LDaCA Web App: /proxy/443/")
    print("   • LDaCA API: /proxy/8001/docs")

# Run the check
check_service_status()