The first Linux Docker container fully tested and optimized for NVIDIA RTX 5090 and RTX 5060 Blackwell GPUs, providing native support for both PyTorch and TensorFlow with CUDA 12.8. Run machine learning and deep learning workloads on the latest NVIDIA Blackwell architecture (RTX 50-series) with zero configuration on Ubuntu 24.04.2 LTS. Also supports RTX 40-series (4090, 4080, 4070) for mixed multi-GPU environments.
Fork Note: This project is an enhanced fork of wutzebaer/tensorflow-5090, extended to include PyTorch support, improved documentation, and additional utilities for multi-framework ML development.
- Why This Container?
- Features
- Blackwell GPU Support Status
- Quick Start
- Prerequisites
- Installation
- Using the Container
- GPU Control
- Common Tasks
- Troubleshooting
- Project Structure
- System Requirements
- What's Included
- Author
If you've just bought an RTX 5090, RTX 5060, or other Blackwell GPU and discovered that:
- ❌ PyTorch doesn't work out-of-the-box (kernel errors)
- ❌ TensorFlow requires manual compilation
- ❌ Existing Docker images don't support SM 12.0 architecture
- ❌ Mixed RTX 50/40 series setups are complicated
This container solves all these problems. Get your Blackwell GPU running with PyTorch and TensorFlow in minutes, not days.
- ✅ RTX 5090 & RTX 5060 Tested - Fully validated on actual Blackwell hardware
- ✅ NVIDIA Blackwell Support - SM 12.0 architecture with CUDA 12.8
- ✅ PyTorch Nightly - Latest builds with experimental Blackwell support
- ✅ TensorFlow Nightly - JIT compilation for optimal Blackwell performance
- ✅ Multi-GPU Ready - Mix RTX 50-series with RTX 40-series (4090, 4080, 4070)
- ✅ Zero Configuration - Works out-of-the-box with RTX 5090/5060/5080/5070
- ✅ Ubuntu 24.04 + Python 3.11 - Modern stack for latest hardware
- ✅ Jupyter Lab Included - Start training immediately
- RTX 5090 ✅ Fully tested and operational
- RTX 5060 ✅ Fully tested and operational
- RTX 5080/5070 ⚡ Compatible (same architecture)
| Framework | Blackwell Support | Status |
|---|---|---|
| TensorFlow | Full support via JIT | ✅ Production ready |
| PyTorch | Experimental (nightly) | 🔄 Functional, awaiting PyTorch 2.7 |
| CUDA/cuDNN | Native 12.8 support | ✅ Full compatibility |
Note: First-run TensorFlow operations compile PTX to native Blackwell code (one-time ~30s delay). PyTorch uses pre-compiled kernels where available and falls back to JIT compilation.
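If that first-run delay matters for your workflow, you can pay the JIT cost up front with a small warm-up before the real workload. A minimal sketch (op and tensor sizes are arbitrary; compilation is cached per operation type):

```python
import time
import tensorflow as tf

x = tf.random.normal([1024, 1024])

start = time.time()
tf.matmul(x, x)  # first call triggers PTX -> sm_120 compilation
print(f"first matmul:  {time.time() - start:.1f}s")

start = time.time()
tf.matmul(x, x)  # cached kernel, runs at full speed
print(f"second matmul: {time.time() - start:.3f}s")
```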
For users with RTX 5090/5060 or other Blackwell GPUs who already have Docker and NVIDIA drivers:
# Clone the repository
git clone https://github.com/dconsorte/pytorch-tensorflow-gpu.git
cd pytorch-tensorflow-gpu
# Create workspace
mkdir -p ~/ai_training_env
# Build and run
chmod +x build.sh
./build.sh
nvidia-smi
If this command works and shows your GPU, skip to step 2. If not, install drivers:
- Open Software & Updates
- Go to Additional Drivers tab
- Select the NVIDIA driver metapackage nvidia-driver-570-open
- The "open kernel" variant is the one tested here; version 570 is validated with RTX 50-series GPUs
- Click Apply Changes and restart
# Check available drivers
ubuntu-drivers devices
# Install recommended driver
sudo ubuntu-drivers autoinstall
# Restart
sudo reboot
# Verify after restart
nvidia-smi
Why Docker Buildx? Modern Docker installations include Buildx by default, but some older or minimal installations might not have it. Our installation script ensures both Docker and Buildx are properly installed.
# Check whether Docker is already installed
docker --version
If Docker is not installed or you're unsure about Buildx:
# Our script handles everything - Docker Engine, Buildx, and NVIDIA Container Toolkit
chmod +x install_docker_and_buildx.sh
./install_docker_and_buildx.sh
# After installation, log out and back in, or run:
newgrp docker
# Option A: Clone from GitHub
git clone https://github.com/dconsorte/pytorch-tensorflow-gpu.git
cd pytorch-tensorflow-gpu
# Option B: Download and extract ZIP
# Extract the downloaded ZIP and navigate to the folder
cd pytorch-tensorflow-gpu
# Check NVIDIA driver
nvidia-smi # Should show your GPU(s)
# Check Docker
docker run hello-world # Should work without 'sudo'
# Check GPU access in Docker
docker run --rm --gpus all nvidia/cuda:12.8.1-base-ubuntu24.04 nvidia-smi
If any of these fail, go back to Prerequisites.
# This is where your projects and data will be stored
mkdir -p ~/ai_training_env
# Make build script executable
chmod +x build.sh
# Build and start container (15-30 minutes first time)
./build.sh
The script will:
- Check your NVIDIA driver
- Build the Docker image with all frameworks
- Create and start the container
- Mount your workspace at /workspace inside the container
# Start existing container (after first build)
docker start -ai PYTORCH_TENSORFLOW_GPU
# Stop container (preserves your work)
docker stop PYTORCH_TENSORFLOW_GPU
# Remove container (⚠️ WARNING: Only if rebuilding - this deletes the container!)
docker rm PYTORCH_TENSORFLOW_GPU
When you enter the container, you'll see a summary of installed versions. Test everything works:
# Run comprehensive GPU test
python /workspace/test_gpu.py
# Note: Warnings during the test are normal and can be ignored
# Look for the ✅ checkmarks at the end - these confirm everything is working
# Quick checks
python -c "import torch; print(f'PyTorch CUDA: {torch.cuda.is_available()}')"
python -c "import tensorflow as tf; print(f'TF GPUs: {len(tf.config.list_physical_devices(\"GPU\"))}')"
# Inside container, start Jupyter
jupyter lab --ip=0.0.0.0 --no-browser --allow-root
# Access from browser at: http://localhost:8888
# Token will be shown in terminal
Sometimes you want to use CPU only (debugging, comparison, GPU busy):
# Method 1: Environment variable (before imports!)
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '-1'
import torch
import tensorflow as tf
# Method 2: PyTorch specific
device = torch.device('cpu')
model = model.to(device)
# Method 3: TensorFlow specific
tf.config.set_visible_devices([], 'GPU')
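Whichever method you use, confirm the GPU is actually hidden before training starts. A quick check, assuming Method 1 was applied before the imports:

```python
import torch
import tensorflow as tf

print(torch.cuda.is_available())               # expect False
print(tf.config.list_physical_devices('GPU'))  # expect []
```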
If you have multiple GPUs:
# Use only GPU 0
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
# Use only GPU 1
os.environ['CUDA_VISIBLE_DEVICES'] = '1'
# Use GPUs 0 and 2
os.environ['CUDA_VISIBLE_DEVICES'] = '0,2'
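Note that CUDA_VISIBLE_DEVICES must be set before the framework first touches CUDA, and the visible devices are renumbered from zero inside the process. A short illustration:

```python
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '1'   # set before torch/tf initialize CUDA

import torch
print(torch.cuda.device_count())   # 1 -- only physical GPU 1 is visible
device = torch.device('cuda:0')    # cuda:0 now refers to physical GPU 1
```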
# Check GPU memory
def check_gpu_memory():
    import torch
    for i in range(torch.cuda.device_count()):
        free, total = torch.cuda.mem_get_info(i)
        print(f"GPU {i}: {free/1e9:.1f}GB free / {total/1e9:.1f}GB total")

# Clear GPU memory
def clear_gpu_memory():
    import torch, gc
    gc.collect()
    torch.cuda.empty_cache()

# Auto-select GPU with enough memory
def auto_device(min_free_gb=5):
    import torch
    if not torch.cuda.is_available():
        return 'cpu'
    for i in range(torch.cuda.device_count()):
        free_gb = torch.cuda.mem_get_info(i)[0] / 1e9
        if free_gb >= min_free_gb:
            return f'cuda:{i}'
    return 'cpu'

# Usage
device = auto_device(min_free_gb=10)
model = model.to(device)
# Inside container
pip install transformers datasets accelerate
# For a specific project, consider using requirements.txt
pip install -r /workspace/your_project/requirements.txt
# Your ~/ai_training_env folder is mounted at /workspace
# Simply copy files to ~/ai_training_env on host
# From host:
cp mydata.csv ~/ai_training_env/
# Inside container:
ls /workspace/mydata.csv
All files in /workspace (container) are automatically saved to ~/ai_training_env (host).
- Install "Remote - Containers" extension in VS Code
- Open VS Code in your project folder
- Click "Reopen in Container" when prompted
- Select the running container
This is expected with RTX 50-series GPUs until PyTorch 2.7:
# Current workaround - the nightly build helps but isn't perfect
# Full native support expected in PyTorch 2.7+
Normal behavior for Blackwell GPUs:
# You'll see: "Running PTX compilation for sm_120"
# This happens once per operation type (~30 seconds)
# Subsequent runs are fast
# Enable TF32 for PyTorch (Blackwell optimized)
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True
# Use mixed precision (Blackwell has enhanced Tensor Cores)
from torch.cuda.amp import autocast, GradScaler
scaler = GradScaler()
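For reference, here is roughly how autocast and GradScaler fit into a training step. model, optimizer, criterion, and dataloader are placeholders for your own setup:

```python
from torch.cuda.amp import autocast, GradScaler

scaler = GradScaler()
for inputs, targets in dataloader:              # your DataLoader
    inputs, targets = inputs.cuda(), targets.cuda()
    optimizer.zero_grad()
    with autocast():                            # ops run in reduced precision where safe
        loss = criterion(model(inputs), targets)
    scaler.scale(loss).backward()               # scale loss to avoid FP16 gradient underflow
    scaler.step(optimizer)                      # unscales gradients, then steps the optimizer
    scaler.update()                             # adjusts the scale factor for the next step
```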
You need to install NVIDIA drivers on your host system first. See Prerequisites.
Run the Docker installation script:
./install_docker_and_buildx.sh
Either:
- Log out and back in after installation
- Run: newgrp docker
- Check that you're in the docker group: groups | grep docker
# Remove old container
docker stop PYTORCH_TENSORFLOW_GPU
docker rm PYTORCH_TENSORFLOW_GPU
# Run build.sh again
./build.sh
- Check host GPU:
nvidia-smi
- Test Docker GPU:
docker run --rm --gpus all nvidia/cuda:12.8.1-base-ubuntu24.04 nvidia-smi
- Ensure NVIDIA Container Toolkit is installed:
./install_docker_and_buildx.sh
PyTorch support for Blackwell is experimental. If you see:
no kernel image is available for execution on the device
This is expected with current PyTorch versions. The nightly builds have the best support, but full Blackwell optimization is coming in PyTorch 2.7+. TensorFlow should work fine (uses JIT compilation).
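To see what your installed PyTorch build actually supports, compare the GPU's compute capability with the architectures compiled into the wheel. A quick diagnostic sketch:

```python
import torch

print(torch.cuda.get_device_name(0))
print(torch.cuda.get_device_capability(0))  # expect (12, 0) on RTX 50-series

# Architectures the installed wheel ships binary kernels for; if sm_120
# is missing, ops fall back to PTX JIT or raise the error above
print(torch.cuda.get_arch_list())
```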
# Clear memory before running
import gc
import torch
gc.collect()
torch.cuda.empty_cache()
# Use smaller batch sizes
batch_size = 16 # Instead of 64
# Use mixed precision
from torch.cuda.amp import autocast
with autocast():
    output = model(input)
TensorFlow compiles kernels on first use. You'll see PTX compilation messages - this is normal and only happens once.
pytorch-tensorflow-gpu/
├── Dockerfile # Container definition
├── build.sh # Build and run script
├── test_gpu.py # GPU testing utility
├── startup.sh # Container startup script
├── install_docker_and_buildx.sh # Docker installation helper
├── .dockerignore # Build optimization
└── README.md # This file
| Component | Minimum | Recommended | Optimal |
|---|---|---|---|
| GPU | RTX 4070 | RTX 5060 | RTX 5090 |
| Architecture | Ada Lovelace (SM 8.9) | Blackwell (SM 12.0) | Blackwell (SM 12.0) |
| GPU Driver | 525+ | 560+ | 560.35.03+ |
| RAM | 16GB | 32GB | 64GB+ |
| Storage | 30GB | 50GB | 100GB+ |
| OS | Ubuntu 20.04 | Ubuntu 22.04 | Ubuntu 24.04 |
- Blackwell (Primary): RTX 5090, RTX 5080, RTX 5070, RTX 5060
- Ada Lovelace (Legacy): RTX 4090, RTX 4080, RTX 4070 Ti/Super, RTX 4070, RTX 4060 Ti, RTX 4060
| Component | Version | Blackwell Optimization |
|---|---|---|
| Ubuntu | 24.04 LTS | Latest kernel for GPU support |
| CUDA | 12.8.1 | Native Blackwell SM 12.0 |
| cuDNN | 9.x | Optimized for Blackwell |
| Python | 3.11 | Virtual environment |
| PyTorch | 2.6.0.dev (nightly) | CUDA 12.8 PTX support |
| TensorFlow | 2.19.0.dev (nightly) | JIT compilation for Blackwell |
| Jupyter Lab | Latest | Pre-configured |
| Key Libraries | Latest | NumPy, Pandas, Matplotlib, scikit-learn |
- PyTorch: Nightly builds include experimental Blackwell kernels not in stable releases
- TensorFlow: Nightly builds have the latest XLA optimizations for new architectures
Created by Dennis Consorte
🔗 https://dennisconsorte.com
This project is released under the MIT License. See LICENSE file for details.
Contributions are welcome! Please feel free to submit a Pull Request.
This project builds on @wutzebaer's pioneering work in tensorflow-5090, which first enabled TensorFlow on RTX 5090 GPUs. This fork extends that foundation with:
- Blackwell-Optimized PyTorch - Added nightly builds with CUDA 12.8 for RTX 50-series
- Multi-GPU Support - Mix Blackwell (RTX 5090/5060) with Ada Lovelace (RTX 4090)
- Production Testing - Validated on actual RTX 5090 and RTX 5060 hardware
- Enhanced Documentation - Specific guidance for Blackwell architecture
- Automated Setup - One-command installation for Blackwell GPU users
- Performance Optimizations - TF32 and mixed precision configs for Blackwell
If this project helps you run ML workloads on your RTX 5090, RTX 5060, or other Blackwell GPUs, please consider giving it a star on GitHub!
Keywords: RTX 5090, RTX 5060, RTX 5080, RTX 5070, Blackwell GPU, NVIDIA Blackwell, PyTorch Blackwell, TensorFlow Blackwell, CUDA 12.8, RTX 50-series, RTX 5090 Docker, RTX 5060 Docker, Machine Learning, Deep Learning, GPU Docker, RTX 4090, Multi-GPU, Blackwell Architecture, SM 12.0