A comprehensive web-based automation tool for managing GPU containers and Jupyter notebooks on the Hertie School GPU server. This application provides an intuitive interface for container lifecycle management, automatic GPU optimization, and seamless Jupyter notebook access.
- 🔐 Secure Authentication: SSH-based authentication to the GPU server
- 📦 Complete Container Management: Create, start, stop, and remove ML containers
- 🚀 Smart Jupyter Integration: Launch Jupyter notebooks with automatic port forwarding and no authentication required
- 🎯 Intelligent GPU Selection: Automatically selects the least loaded GPU based on utilization and memory usage
- 🌐 Modern Web Interface: Responsive, animated UI with real-time progress tracking
- 🔄 Robust Session Management: Persistent sessions with automatic cleanup and port management
- 🧹 Advanced Cleanup Tools: Port cleanup and session management utilities
- ⚡ Real-time Progress Tracking: Visual progress indicators for container creation and Jupyter launches
- ✅ Container Removal: Interactive container removal with confirmation
- ✅ Loading Animations: Visual feedback during container creation and operations
- ✅ Enhanced UI: Improved layout with better session ID visibility
- ✅ Progress Tracking: Real-time progress modal for Jupyter launches
- ✅ Smart GPU Selection: Automatically finds GPU with lowest utilization
- ✅ GPU Information Display: Shows which specific GPU is being used
- ✅ Resource Monitoring: Tracks GPU utilization and memory usage
- ✅ No Authentication Required: Jupyter notebooks launch without token/password
- ✅ Auto-expanding Progress Modal: Dynamic UI that adapts to operation steps
- ✅ Session Persistence: Maintains connections across browser sessions
- ✅ Error Handling: Comprehensive error messages and recovery
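The real-time progress modal needs a server-side record of how far each operation has got, which the browser can poll. Below is a minimal sketch of such a record. `ProgressTracker`, `Progress` and the operation IDs are hypothetical names, and the step list reuses the Jupyter launch steps described below (container startup, GPU selection, environment setup, port forwarding). The app's actual implementation may differ.

```python
import threading
from dataclasses import dataclass
from typing import Dict, List, Optional

# Step names taken from the Jupyter launch steps in this README
JUPYTER_STEPS = ["Container startup", "GPU selection", "Environment setup", "Port forwarding"]


@dataclass
class Progress:
    steps: List[str]
    completed: int = 0           # number of finished steps
    detail: str = ""             # extra info for the UI, e.g. "Using GPU 3"
    error: Optional[str] = None  # set when a step fails

    @property
    def percent(self) -> int:
        return round(100 * self.completed / len(self.steps)) if self.steps else 100


class ProgressTracker:
    """Thread-safe store that a JSON endpoint could expose to the progress modal."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._ops: Dict[str, Progress] = {}

    def start(self, op_id: str, steps: List[str]) -> None:
        with self._lock:
            self._ops[op_id] = Progress(steps=list(steps))

    def advance(self, op_id: str, detail: str = "") -> None:
        with self._lock:
            p = self._ops[op_id]
            p.completed = min(p.completed + 1, len(p.steps))
            p.detail = detail

    def fail(self, op_id: str, error: str) -> None:
        with self._lock:
            self._ops[op_id].error = error

    def snapshot(self, op_id: str) -> dict:
        with self._lock:
            p = self._ops[op_id]
            current = p.steps[p.completed] if p.completed < len(p.steps) else "Done"
            return {"percent": p.percent, "current_step": current,
                    "detail": p.detail, "error": p.error}


tracker = ProgressTracker()
tracker.start("op1", JUPYTER_STEPS)
tracker.advance("op1")                        # container is up
tracker.advance("op1", detail="Using GPU 3")   # GPU chosen
print(tracker.snapshot("op1"))
# {'percent': 50, 'current_step': 'Environment setup', 'detail': 'Using GPU 3', 'error': None}
```

Keeping the step list as data is what lets the modal "auto-expand": the UI renders however many steps the server reports.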
- Python 3.8 or higher
- SSH access to the Hertie GPU server (10.1.23.20)
- Network access to the server
- Modern web browser with JavaScript enabled
1. Clone or download the project files.
2. Install dependencies:
   ```bash
   pip install -r requirements.txt
   ```
3. Configure the application (optional):
   - Edit `config.py` to modify server settings, ports, or timeouts
   - The default configuration is optimized for the Hertie GPU server
This app is optimized for Railway deployment: apart from setting a few environment variables, no extra configuration is required.
Quick Deploy:
- Fork this repository to your GitHub account
- Sign up at railway.app (free, no credit card)
- Create new project → "Deploy from GitHub repo"
- Select your forked repository
- Set environment variables (see DEPLOYMENT.md)
- Deploy! 🚀
Benefits of Railway:
- ✅ Free tier: 500 hours/month
- ✅ Automatic deployments from Git
- ✅ SSL certificates included
- ✅ Global CDN for fast access
- ✅ No server management required
See DEPLOYMENT.md for detailed deployment instructions.
1. Run the Flask app:
   ```bash
   python app.py
   ```
2. Access the web interface:
   - Open your browser and go to http://localhost:2344
   - If port 2344 is busy, the app automatically picks another available port
   - The port in use is displayed in the console output
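Finding an available port (2344 for the web app, 9000-9099 for Jupyter forwarding) comes down to trying to bind each candidate in turn. A minimal sketch using only the standard `socket` module; the function names and the 2345-2399 fallback range for Flask are hypothetical, not taken from the app.

```python
import socket
from typing import Iterable, Optional


def is_port_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if (host, port) can be bound right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


def find_free_port(preferred: int, fallbacks: Iterable[int],
                   host: str = "127.0.0.1") -> Optional[int]:
    """Try the preferred port first, then each fallback in order; None if all are taken."""
    for port in [preferred, *fallbacks]:
        if is_port_free(port, host):
            return port
    return None


# Flask: prefer 2344 on all interfaces (the 2345-2399 fallback range is hypothetical)
flask_port = find_free_port(2344, range(2345, 2400), host="0.0.0.0")
# Jupyter forwarding: first free port in LOCAL_PORT_RANGE = range(9000, 9100)
jupyter_port = find_free_port(9000, range(9001, 9100))
print(flask_port, jupyter_port)
```

A check like this is inherently racy: another process can take the port between the check and the real bind, so the caller should still handle a bind failure.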
- Enter your Hertie School email and password
- Click "Authenticate" to establish SSH connection
- Session ID is displayed in the header for reference
- View Containers: See all your containers with their status, framework, and version
- Create Containers:
  - Choose from TensorFlow, PyTorch, or MXNet
  - Select a specific version
  - A loading animation is shown in real time during creation
- Start/Stop Containers: Manage container states
- Remove Containers: Interactive removal with a confirmation dialog
- Click "🌐 Launch Jupyter" on any running container
- Watch real-time progress through each step:
  - Container startup
  - GPU selection (shows the specific GPU number)
  - Environment setup
  - Port forwarding
- Jupyter opens automatically in a new tab
- No authentication is required; you get direct access
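The launch steps pin Jupyter to the selected GPU and start it without a token or password. A minimal sketch of building such a command, assuming JupyterLab on Jupyter Server inside the container (classic Notebook below version 7 uses `--NotebookApp.token=` instead) and GPU pinning via `CUDA_VISIBLE_DEVICES`. The function name, remote port 8888 and `jupyter.log` are hypothetical, and how the command is run inside the container is not shown.

```python
import shlex
from typing import List


def build_jupyter_command(gpu_index: int, port: int = 8888) -> str:
    """Return a shell command that starts Jupyter on one GPU with auth disabled."""
    args: List[str] = [
        "jupyter", "lab",
        "--ip=0.0.0.0",           # listen inside the container; reached via the SSH tunnel
        f"--port={port}",
        "--no-browser",           # the web app opens the browser tab itself
        "--allow-root",           # containers often run as root
        "--ServerApp.token=",     # empty token: no token required
        "--ServerApp.password=",  # empty password: no password prompt
    ]
    # CUDA_VISIBLE_DEVICES limits the process to the chosen GPU;
    # nohup and '&' keep Jupyter running after the SSH command returns.
    return (f"CUDA_VISIBLE_DEVICES={int(gpu_index)} nohup "
            + " ".join(shlex.quote(a) for a in args)
            + " > jupyter.log 2>&1 &")


print(build_jupyter_command(3))
```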
- Session ID: Visible in header for reference
- Cleanup Ports: Clean up SSH tunnels while keeping session
- Logout: Complete session cleanup
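Port forwarding and the "Cleanup Ports" action amount to opening and closing SSH local forwards while leaving the login session alone. The app lists Paramiko as its SSH library and may forward ports differently; this sketch swaps in the OpenSSH `ssh` client purely to illustrate the mechanism. It assumes key-based login (otherwise `ssh` would prompt for a password) and that the container's Jupyter port is reachable on the server's `localhost`. `ssh_forward_args` and `TunnelRegistry` are hypothetical names.

```python
import subprocess
from typing import Dict, List


def ssh_forward_args(local_port: int, remote_port: int, user: str, host: str) -> List[str]:
    """OpenSSH arguments for a local port forward that runs no remote command."""
    return [
        "ssh", "-N",                                           # forward only
        "-L", f"127.0.0.1:{local_port}:localhost:{remote_port}",
        "-o", "ExitOnForwardFailure=yes",                      # fail fast if the port is taken
        "-o", "ServerAliveInterval=30",                        # keepalive
        f"{user}@{host}",
    ]


class TunnelRegistry:
    """Tracks tunnel processes by local port so they can be cleaned up later."""

    def __init__(self) -> None:
        self._procs: Dict[int, subprocess.Popen] = {}

    def open(self, local_port: int, args: List[str]) -> None:
        self.close(local_port)  # replace any stale tunnel on this port
        self._procs[local_port] = subprocess.Popen(args)

    def close(self, local_port: int) -> None:
        proc = self._procs.pop(local_port, None)
        if proc is not None and proc.poll() is None:
            proc.terminate()
            try:
                proc.wait(timeout=5)
            except subprocess.TimeoutExpired:
                proc.kill()
                proc.wait()

    def cleanup(self) -> None:
        """Close every tunnel but keep the user's session."""
        for port in list(self._procs):
            self.close(port)

    def active_ports(self) -> List[int]:
        return sorted(p for p, proc in self._procs.items() if proc.poll() is None)


# Example (not run here): registry.open(9000, ssh_forward_args(9000, 8888, "username", "10.1.23.20"))
```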
Default settings in `config.py`:
```python
# Server Configuration
SERVER_HOST = "10.1.23.20"  # GPU server IP
SERVER_PORT = 22  # SSH port

# Local Port Configuration
LOCAL_PORT_RANGE = range(9000, 9100)  # Ports for Jupyter forwarding

# Flask App Configuration
FLASK_HOST = "0.0.0.0"
FLASK_PORT = 2344
FLASK_DEBUG = False
```

Supported frameworks and versions:
- TensorFlow: 2.11.0, 2.10.0, 2.9.2-jlab, 2.9.0, 2.8.0, 2.7.0, 2.6.1, 2.5.0, 2.4.1, 2.4.0, 2.3.1-nvidia, 1.15.4-nvidia
- PyTorch: 2.1.0-aime, 2.1.0, 2.0.1-aime, 2.0.1, 2.0.0, 1.14.0a-nvidia, 1.13.1-aime, 1.13.0a-nvidia, 1.12.1-aime
- MXNet: 1.8.0-nvidia
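Several container-creation failures come from an unsupported framework/version pair or a duplicate name, which can be caught before any command reaches the server. A minimal sketch, assuming the version table above is complete and using a Docker-style name pattern; `validate_request` and `NAME_PATTERN` are hypothetical, and the server's own naming rule may differ.

```python
import re
from typing import Dict, List

# Supported versions as listed in this README
SUPPORTED_VERSIONS: Dict[str, List[str]] = {
    "TensorFlow": ["2.11.0", "2.10.0", "2.9.2-jlab", "2.9.0", "2.8.0", "2.7.0", "2.6.1",
                   "2.5.0", "2.4.1", "2.4.0", "2.3.1-nvidia", "1.15.4-nvidia"],
    "PyTorch": ["2.1.0-aime", "2.1.0", "2.0.1-aime", "2.0.1", "2.0.0", "1.14.0a-nvidia",
                "1.13.1-aime", "1.13.0a-nvidia", "1.12.1-aime"],
    "MXNet": ["1.8.0-nvidia"],
}

# Docker-style container-name pattern (an assumption; the server's rule may differ)
NAME_PATTERN = re.compile(r"[A-Za-z0-9][A-Za-z0-9_.-]+")


def validate_request(framework: str, version: str, name: str,
                     existing_names: List[str]) -> List[str]:
    """Return a list of problems; an empty list means the request looks valid."""
    problems: List[str] = []
    versions = SUPPORTED_VERSIONS.get(framework)
    if versions is None:
        problems.append(f"unknown framework {framework!r}; choose one of {sorted(SUPPORTED_VERSIONS)}")
    elif version not in versions:
        problems.append(f"{framework} {version} is not available; choose one of {versions}")
    if not NAME_PATTERN.fullmatch(name):
        problems.append(f"invalid container name {name!r}")
    elif name in existing_names:
        problems.append(f"a container named {name!r} already exists")
    return problems


print(validate_request("PyTorch", "2.1.0-aime", "my-torch", ["other"]))  # []
```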
Project structure:
- `app.py`: Main Flask application with all routes and business logic
- `GPUServerManager`: Class handling SSH connections, container operations, and GPU optimization
- `templates/`: HTML templates with JavaScript for the interactive UI
- `config.py`: Configuration settings and server parameters
- Test files: Test suite covering all functionality
1. SSH Connection Management:
   - Secure connection to the GPU server with keepalive
   - Interactive command support (for container removal)
   - Automatic connection cleanup and error handling
2. Container Operations:
   - Container creation with framework/version selection
   - Start/stop container management
   - Interactive container removal with confirmation
   - Real-time status monitoring
3. Jupyter Integration:
   - Automatic Jupyter startup in containers
   - Port forwarding setup with automatic port discovery
   - Authentication disabled for seamless access
   - Progress tracking with detailed steps
4. GPU Optimization:
   - Automatic GPU selection based on utilization and memory
   - Real-time GPU usage monitoring
   - Display of the selected GPU
5. Session Management:
   - Persistent user sessions with timeout
   - Automatic session cleanup
   - Port management and cleanup utilities
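GPU selection "based on utilization and memory" can be reduced to parsing per-GPU statistics and taking the minimum of a combined score. A minimal sketch, assuming the statistics come from `nvidia-smi` in CSV form (the query flags shown are standard `nvidia-smi` options) and using a hypothetical 50/50 weighting of utilization and memory use; the app's actual scoring rule is not documented here.

```python
from typing import List, NamedTuple, Optional

# Query assumed to be run on the GPU server over SSH; prints one CSV line per GPU:
# index, utilization (%), memory used (MiB), memory total (MiB).
QUERY_CMD = ("nvidia-smi --query-gpu=index,utilization.gpu,memory.used,memory.total "
             "--format=csv,noheader,nounits")


class GPUStat(NamedTuple):
    index: int
    util_pct: float
    mem_used: float
    mem_total: float

    @property
    def mem_pct(self) -> float:
        return 100.0 * self.mem_used / self.mem_total if self.mem_total else 100.0


def parse_gpu_stats(csv_text: str) -> List[GPUStat]:
    """Parse nvidia-smi CSV output, skipping lines that cannot be read (e.g. '[N/A]')."""
    stats: List[GPUStat] = []
    for line in csv_text.strip().splitlines():
        fields = [f.strip() for f in line.split(",")]
        if len(fields) != 4:
            continue
        try:
            stats.append(GPUStat(int(fields[0]), float(fields[1]),
                                 float(fields[2]), float(fields[3])))
        except ValueError:
            continue
    return stats


def pick_least_loaded(stats: List[GPUStat], mem_weight: float = 0.5) -> Optional[GPUStat]:
    """Pick the GPU with the lowest weighted mix of utilization and memory use."""
    if not stats:
        return None
    # Blend of two percentages; ties go to the lowest GPU index.
    return min(stats, key=lambda g: ((1 - mem_weight) * g.util_pct + mem_weight * g.mem_pct,
                                     g.index))


sample = """0, 85, 30000, 40960
1, 5, 2000, 40960
2, 0, 38000, 40960"""
print(pick_least_loaded(parse_gpu_stats(sample)).index)  # 1
```

Weighting memory as well as utilization avoids picking an idle GPU whose memory is almost full (GPU 2 in the sample).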
Run the comprehensive test suite:
```bash
# Main functionality tests
python test_app.py

# Container removal tests
python test_container_removal.py

# SSH connection tests
python test_ssh_manual.py
```

The test suite includes:
- SSH connection and authentication tests
- Container management (create, start, stop, remove) tests
- Jupyter launch and GPU selection tests
- Utility function tests
- Flask app integration tests
1. SSH Connection Failed:
   - Verify your credentials (e.g. N.Thing@students.hertie-school.org)
   - Check network connectivity to 10.1.23.20
   - Ensure SSH access is enabled for your account
2. Port Already in Use:
   - The app automatically finds available ports (9000-9099)
   - Check whether another instance of the app is running
   - Use the "Cleanup Ports" button to clear orphaned connections
3. Container Creation Failed:
   - Verify the framework and version combination
   - Check server resources
   - Ensure the container name is unique
   - Watch the loading animation and error messages
4. Jupyter Not Starting:
   - Check that the container is running
   - Verify the port forwarding setup
   - Check the progress modal for the step that failed
   - Ensure no firewall is blocking local ports
5. Container Removal Issues:
   - Containers must be stopped before removal
   - Use the interactive confirmation (Y/N)
   - Check for running processes in the container
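Removal relies on an interactive Y/N prompt, and answering it from code means writing the answer to the command's standard input. With Paramiko, `SSHClient.exec_command` returns `(stdin, stdout, stderr)` file objects, so the remote equivalent is writing `"y\n"` to that `stdin`. Because the server's actual removal command is not documented here, the runnable sketch below shows the technique locally with `subprocess` and a hypothetical stand-in prompt script. It also refuses to proceed while the container is running, as the troubleshooting notes above require.

```python
import subprocess
import sys

# Hypothetical stand-in for the server's interactive removal command
FAKE_REMOVE_CMD = [
    sys.executable, "-c",
    "ans = input('Really remove container? [y/N] ');"
    "print('removed' if ans.strip().lower() == 'y' else 'aborted')",
]


def answer_prompt(cmd, answer: str, timeout: float = 30.0) -> str:
    """Run cmd, send `answer` plus a newline to its stdin, and return its stdout."""
    result = subprocess.run(cmd, input=answer + "\n", capture_output=True,
                            text=True, timeout=timeout, check=True)
    return result.stdout


def remove_container(status: str, confirm: bool) -> str:
    """Refuse while running; otherwise answer the confirmation prompt."""
    if status.lower() == "running":
        raise RuntimeError("Stop the container before removing it")
    return answer_prompt(FAKE_REMOVE_CMD, "y" if confirm else "n")


print(remove_container("exited", confirm=True))
```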
Enable debug mode in `config.py`:
```python
FLASK_DEBUG = True
```
This provides detailed error messages and auto-reload on code changes.
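On a deployed instance it is convenient to let environment variables override `config.py`, so debug mode or the port can change without editing code. A minimal sketch, assuming hypothetical variable names `SERVER_HOST`, `FLASK_HOST`, `FLASK_PORT` and `FLASK_DEBUG`, plus the `PORT` variable that platforms such as Railway inject; `env.example` and DEPLOYMENT.md list the names the app actually reads.

```python
import os
from typing import Mapping, Optional


def _as_bool(value: Optional[str], default: bool) -> bool:
    if value is None:
        return default
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Defaults mirror config.py; environment variables (if set) take precedence."""
    env = os.environ if env is None else env
    return {
        "SERVER_HOST": env.get("SERVER_HOST", "10.1.23.20"),
        "FLASK_HOST": env.get("FLASK_HOST", "0.0.0.0"),
        # A platform-provided PORT wins over FLASK_PORT, then the 2344 default
        "FLASK_PORT": int(env.get("PORT", env.get("FLASK_PORT", "2344"))),
        # Keep False on public deployments: the Werkzeug debugger can execute arbitrary code
        "FLASK_DEBUG": _as_bool(env.get("FLASK_DEBUG"), default=False),
    }


print(load_settings({"FLASK_DEBUG": "True"}))
```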
- SSH passwords stored in memory only during active sessions
- Sessions automatically timeout after 1 hour
- All connections to the GPU server, including Jupyter port forwarding, use SSH
- Jupyter authentication disabled for convenience (use only on trusted networks)
- Interactive container removal requires confirmation
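The session rules above (in-memory only, one-hour timeout, cleanup on expiry) can be sketched as a small store. This assumes the timeout is measured from the last request (a sliding timeout) and that each session holds an SSH client object with a `close()` method, as `paramiko.SSHClient` has; `SessionStore` and its methods are hypothetical names. The injectable clock exists only to make expiry testable.

```python
import secrets
import threading
import time
from typing import Callable, Dict, Optional

SESSION_TIMEOUT = 3600  # seconds (1 hour)


class SessionStore:
    """In-memory sessions; nothing is written to disk."""

    def __init__(self, timeout: float = SESSION_TIMEOUT,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._timeout = timeout
        self._clock = clock
        self._lock = threading.Lock()
        self._sessions: Dict[str, dict] = {}

    def create(self, username: str, ssh_client: object = None) -> str:
        sid = secrets.token_urlsafe(16)  # unguessable session ID
        with self._lock:
            self._sessions[sid] = {"user": username, "ssh": ssh_client,
                                   "last_seen": self._clock()}
        return sid

    def get(self, sid: str) -> Optional[dict]:
        """Return the session and refresh its timer, or None if missing or expired."""
        with self._lock:
            sess = self._sessions.get(sid)
            if sess is None:
                return None
            if self._clock() - sess["last_seen"] > self._timeout:
                self._close(self._sessions.pop(sid))
                return None
            sess["last_seen"] = self._clock()
            return sess

    def purge_expired(self) -> int:
        """Close and drop all expired sessions; return how many were removed."""
        now = self._clock()
        with self._lock:
            expired = [s for s, v in self._sessions.items()
                       if now - v["last_seen"] > self._timeout]
            for sid in expired:
                self._close(self._sessions.pop(sid))
        return len(expired)

    @staticmethod
    def _close(sess: dict) -> None:
        ssh = sess.get("ssh")
        if ssh is not None and hasattr(ssh, "close"):
            ssh.close()  # e.g. paramiko.SSHClient.close()
```

A background thread or a per-request hook could call `purge_expired()` to implement the automatic cleanup.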
- Flask: Web framework for the application
- Paramiko: SSH client library with interactive support
- Werkzeug: WSGI utilities
- Cryptography: Security utilities for SSH connections
- Flask-SocketIO: WebSocket support for interactive shell
- Gunicorn: Production WSGI server (for deployment)
- Eventlet: Async networking library
The following files are included for Railway deployment:
- `Procfile`: Tells Railway how to run the app
- `runtime.txt`: Specifies the Python version (3.9.18)
- `railway.json`: Railway-specific configuration
- `nixpacks.toml`: Build configuration with SSH support
- `requirements.txt`: Python dependencies
- `env.example`: Environment variables template
- `DEPLOYMENT.md`: Comprehensive deployment guide
This project is developed for internal use at the Hertie School.
For issues or questions:
- Check the troubleshooting section above
- Review the test output for specific errors
- Check server logs for detailed error messages
- Verify network connectivity to the GPU server
- Ensure proper credentials and SSH access
- ✅ Container Removal: Interactive removal with confirmation
- ✅ GPU Selection: Automatic selection with specific GPU display
- ✅ Loading Animations: Visual feedback for all operations
- ✅ Progress Tracking: Real-time progress for Jupyter launches
- ✅ UI Improvements: Better layout and session management
- ✅ Error Handling: Comprehensive error messages and recovery