This repo provides a nix flake that can be used to run the NVIDIA GPT-2B-001 model (https://huggingface.co/nvidia/GPT-2B-001) under WSL (Windows Subsystem for Linux).

The flake currently hard-codes support for the RTX 3090 (Compute Capability 8.6, per https://en.wikipedia.org/wiki/CUDA#GPUs_supported) and will not work on a vanilla Linux install, as it assumes the CUDA drivers are in the default location for WSL.
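To check whether your GPU matches, you can compare its reported compute capability against the RTX 3090's value of 8.6. The following is a minimal sketch, assuming a recent NVIDIA driver whose nvidia-smi supports the compute_cap query field. The helper name check_compute_cap is hypothetical and is not part of this repo.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repo): compare a reported compute
# capability against the one the flake is built for.
EXPECTED_CC="8.6"   # RTX 3090

check_compute_cap() {
  # $1: compute capability string, e.g. "8.6"
  if [ "$1" = "$EXPECTED_CC" ]; then
    echo "supported"
  else
    echo "unsupported"
  fi
}

# Query the first GPU if nvidia-smi is on the PATH; otherwise skip the check.
if command -v nvidia-smi >/dev/null 2>&1; then
  cc="$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader | head -n 1)"
  echo "GPU compute capability ${cc}: $(check_compute_cap "$cc")"
fi
```

If the check reports "unsupported", the flake would presumably need to be edited to target your GPU's compute capability.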

Quick Start

Nix flakes allow you to run an app without the associated repo; however, for simplicity these instructions assume you have cloned the repo.

The flake provides two apps: a server for hosting the model (megatron-gpt-eval, which is also the default app) and a script for sending prompts to the model (chat).

Download the GPT-2B-001 model from https://huggingface.co/nvidia/GPT-2B-001/blob/main/GPT-2B-001_bf16_tp1.nemo and save it locally.
Run the server using the following:

$ nix run --impure --option sandbox false .#megatron-gpt-eval -- \
  gpt_model_file=<absolute path to GPT-2B-001_bf16_tp1.nemo> \
  trainer.precision=bf16 \
  server=True \
  tensor_model_parallel_size=1 \
  trainer.devices=1

The path to GPT-2B-001_bf16_tp1.nemo must be absolute, as the server will change directories before starting.
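One way to avoid typing the absolute path by hand is to resolve it with realpath first. This is a sketch only: the relative location ./GPT-2B-001_bf16_tp1.nemo and the run_server wrapper are assumptions for illustration, not something the repo provides.

```shell
#!/usr/bin/env bash
# Resolve the model file to an absolute path. The relative location used here
# is an assumption; adjust it to wherever you saved the download.
MODEL_FILE="$(realpath ./GPT-2B-001_bf16_tp1.nemo)"
echo "$MODEL_FILE"

# Hypothetical wrapper around the server command, using the resolved path.
run_server() {
  nix run --impure --option sandbox false .#megatron-gpt-eval -- \
    gpt_model_file="$MODEL_FILE" \
    trainer.precision=bf16 \
    server=True \
    tensor_model_parallel_size=1 \
    trainer.devices=1
}
```

Calling run_server from the root of the cloned repo then starts the server with the resolved path.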

Note that this step will take quite a while the first time, as CUDA, pytorch and other dependencies are downloaded and built. You may be able to speed up your build using the CUDA maintainers' cachix cache (cachix use cuda-maintainers); see the instructions at https://nixos.wiki/wiki/CUDA#Building_CUDA_packages_with_Nix. I have also set up a cache at m4dc4p-nix that you can use by running cachix use m4dc4p-nix.

Once the server is running, you can send it prompts using the included chat app:

$ nix run .#chat
Enter something:

Notes

By default, nixpkgs does not build CUDA artifacts, as they have an unfree license. That means pytorch is not built with CUDA support either. This flake builds pytorch with CUDA support. However, under WSL, the CUDA libraries are mounted in a special location (/usr/lib/wsl/lib), and in all likelihood actually run Windows code.
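Before running the flake, it can help to confirm that the driver library is actually present in that location. The sketch below assumes the library is named libcuda.so with an optional version suffix; the has_libcuda helper is hypothetical and not part of this repo.

```shell
#!/usr/bin/env bash
# Default WSL location for the CUDA driver libraries.
WSL_LIB_DIR="/usr/lib/wsl/lib"

has_libcuda() {
  # $1: directory to search; succeeds if any libcuda.so* file is present.
  ls "$1"/libcuda.so* >/dev/null 2>&1
}

if has_libcuda "$WSL_LIB_DIR"; then
  echo "libcuda found in $WSL_LIB_DIR"
else
  echo "libcuda not found in $WSL_LIB_DIR"
fi
```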

When building a nix package on Linux (and thus WSL), the "sandbox" option defaults to true, which means that the build cannot see the CUDA libraries. That is why the --option sandbox false argument must be passed to the nix run command above.

If your CUDA libraries are in some other location, you can set the LIBCUDA_PATH environment variable to point to them.
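For example, the variable can be exported in the shell before starting the server. The path below is purely hypothetical, and whether the flake expects a directory or a path to the library file itself is not stated here, so treat the value as an assumption and check flake.nix if in doubt.

```shell
#!/usr/bin/env bash
# Hypothetical location; replace it with wherever your CUDA driver
# libraries actually live.
export LIBCUDA_PATH="/opt/cuda/lib64"
echo "LIBCUDA_PATH=$LIBCUDA_PATH"
```

Because the server is started with --impure, environment variables set this way are visible when nix evaluates the flake.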

NeMo Shell

The flake provides a shell (nix develop .#) which should be sufficient to run the megatron_gpt_eval.py script directly.
