MinText

Welcome to MinText! This repository contains hands-on tutorials for the workshop "Scaling Large Language Models: Getting Started with Large-Scale Parallel Training of LLMs".

Note: 🚧 MinText (the library) is currently under construction. Once ready, it will provide a minimal implementation of various parallelism strategies for transformer training in JAX/Flax. In the meantime, please follow the tutorials in this repository to learn about distributed training techniques for large language models. 🚧

Workshop Overview

As large language models (LLMs) continue to grow, it is no longer possible to train them effectively within the memory capacity of a single GPU, or even a small number of GPUs. This workshop covers the fundamental parallelism dimensions—data, tensor, and pipeline parallelism—and how to compose them effectively for training billion-parameter models. The content is organized into the following Jupyter notebooks:

  1. Parallelization Basics
  2. Data Parallelism and FSDP
  3. Tensor Parallelism and Transformers
  4. Up Next
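As a taste of what the notebooks cover, here is a minimal sketch of data parallelism in JAX: the batch is split along its leading axis, each device computes on its own shard, and `jax.pmap` replicates the computation across devices. This is an illustrative example written for these tutorials, not code from the MinText library; it simulates 8 devices on CPU via the `XLA_FLAGS` trick described below.

```python
# Simulate 8 devices on CPU (must be set before importing jax).
import os
os.environ["XLA_FLAGS"] = "--xla_force_host_platform_device_count=8"

import jax
import jax.numpy as jnp

# Data parallelism: pmap replicates this function on every device,
# and each device receives one slice of the leading batch axis.
@jax.pmap
def per_device_loss(x):
    return jnp.mean(x ** 2)

# Leading axis (8) must match the device count: 2 examples per device.
batch = jnp.arange(16.0).reshape(8, 2)
losses = per_device_loss(batch)
print(losses.shape)  # one loss value per device: (8,)
```

In a real data-parallel training step, each device would also compute gradients on its shard and average them across devices (e.g. with `jax.lax.pmean`) before updating the replicated parameters.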

Running the Notebooks

The notebooks are designed to run on Google Colab with the v2-8 TPU runtime, which provides 8 TPU devices for free. This lets you experiment with multi-device parallelization at no cost. Before running the notebooks, select the TPU runtime by going to Runtime > Change runtime type and choosing v2-8 TPU as the hardware accelerator.

[Screenshot: selecting the v2-8 TPU runtime in Colab]

If you prefer to run the notebooks locally, you can simulate 8 devices by adding the following at the beginning of each notebook:

import os
os.environ["XLA_FLAGS"] = '--xla_force_host_platform_device_count=8'  # use 8 CPU devices (must run before importing jax)
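One way to confirm the simulated devices are visible (a quick sanity check for these tutorials, not part of the notebooks themselves): the flag must be set before JAX is first imported, or it has no effect.

```python
import os
os.environ["XLA_FLAGS"] = "--xla_force_host_platform_device_count=8"

import jax  # import only after setting the flag

print(jax.device_count())   # should report 8
print(jax.devices()[0].platform)  # 'cpu' when simulating locally
```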

You can also view the pre-executed notebooks with all outputs at: https://mintext.readthedocs.io

Acknowledgments

Most of the explanations and code in these tutorials are adapted from:

License

This project is licensed under the MIT License.

About

Minimalistic 4D-parallelism distributed training and inference framework in JAX
