Remove FSDP (#473)
* remove fsdp

* rm fsdp in tests

* Readme

---------

Co-authored-by: Pascal Pfeiffer <pascal.pfeiffer@h2o.ai>
haqishen and pascal-pfeiffer authored Oct 31, 2023
1 parent 511b4a8 commit b86709a
Showing 21 changed files with 31 additions and 83 deletions.
12 changes: 12 additions & 0 deletions CONTRIBUTING.md
@@ -53,3 +53,15 @@ Please make sure your pull request fulfills the following checklist:
☐ If your contribution is still a work in progress, change the PR to draft mode.
☐ Ensure that the existing tests pass by running `make test`.
☐ Make sure `make style` passes to maintain consistent code style.

## Installing custom packages

If you need to install additional Python packages into the environment, you can do so using pip after activating your virtual environment via ```make shell```. For example, to install flash-attention, you would use the following commands:

```bash
make shell
pip install flash-attn --no-build-isolation
pip install git+https://github.com/HazyResearch/flash-attention.git#subdirectory=csrc/rotary
```

For a PR, update the Pipfile and the Pipfile.lock via ```pipenv install package_name```.
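
For instance, a minimal sketch of that flow (using `einops` purely as a placeholder package name):

```bash
pipenv install einops           # placeholder package; updates Pipfile and Pipfile.lock
git add Pipfile Pipfile.lock    # include both updated files in the PR
```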
31 changes: 14 additions & 17 deletions README.md
@@ -54,9 +54,10 @@ Using CLI for fine-tuning LLMs:

## What's New

- [PR 288](https://github.com/h2oai/h2o-llmstudio/pull/288) Introduced Deepspeed for sharded training, allowing larger models to be trained on machines with multiple GPUs. Requires NVLink (see the quick check sketched after this list). This feature replaces FSDP and offers more flexibility. Deepspeed requires a system installation of the CUDA toolkit, and we recommend using version 11.8. See [Recommended Install](#recommended-install).
- [PR 449](https://github.com/h2oai/h2o-llmstudio/pull/449) New problem type for Causal Classification Modeling allows training binary and multiclass models using LLMs.
- [PR 364](https://github.com/h2oai/h2o-llmstudio/pull/364) User secrets are now handled more securely and flexibly. Support for storing secrets with the 'keyring' library was added, and existing user settings are migrated automatically where possible.
- [PR 328](https://github.com/h2oai/h2o-llmstudio/pull/328) RLHF is now a separate problem type. Note that starting a new RLHF experiment from an old experiment that used RLHF is no longer supported. To continue from a previous experiment, please start a new experiment and enter the settings from the previous experiment manually.
- [PR 308](https://github.com/h2oai/h2o-llmstudio/pull/308) Sequence to sequence models have been added as a new problem type.
- [PR 152](https://github.com/h2oai/h2o-llmstudio/pull/152) Add RLHF functionality for fine-tuning LLMs.
- [PR 131](https://github.com/h2oai/h2o-llmstudio/pull/131) Add 4-bit training that allows training of larger LLM backbones with less GPU memory. See [here](https://huggingface.co/blog/4bit-transformers-bitsandbytes) for a comprehensive summary of this method.
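
The Deepspeed entry above assumes NVLink between the GPUs. A quick way to check for it (a sketch, assuming the NVIDIA driver and `nvidia-smi` are already installed) is to inspect the GPU topology matrix:

```bash
# GPU pairs connected via NVLink show up as NV1, NV2, ... in the matrix;
# PIX/PXB/SYS entries indicate PCIe-only connections.
nvidia-smi topo -m
```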
@@ -90,13 +91,21 @@ If deploying on a 'bare metal' machine running Ubuntu, one may need to install t
```bash
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin
sudo mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/11.4.3/local_installers/cuda-repo-ubuntu2004-11-4-local_11.4.3-470.82.01-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu2004-11-4-local_11.4.3-470.82.01-1_amd64.deb
sudo apt-key add /var/cuda-repo-ubuntu2004-11-4-local/7fa2af80.pub
sudo apt-get -y update
wget https://developer.download.nvidia.com/compute/cuda/11.8.0/local_installers/cuda-repo-ubuntu2004-11-8-local_11.8.0-520.61.05-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu2004-11-8-local_11.8.0-520.61.05-1_amd64.deb
sudo cp /var/cuda-repo-ubuntu2004-11-8-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda
```
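
Once the toolkit is installed, a quick sanity check can be run (a sketch; the `nvcc` path assumes the default CUDA 11.8 install location and may differ on your system):

```bash
# Verify the toolkit version and that the driver sees the GPUs
/usr/local/cuda-11.8/bin/nvcc --version
nvidia-smi
```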

Alternatively, one can install the CUDA toolkit in a conda environment:

```bash
conda create -n llmstudio python=3.10
conda activate llmstudio
conda install -c "nvidia/label/cuda-11.8.0" cuda-toolkit
```

#### Create virtual environment (pipenv)

The following command will create a virtual environment using pipenv and install the dependencies:
@@ -113,18 +122,6 @@ If you wish to use conda or another virtual environment, you can also install th
pip install -r requirements.txt
```
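
For instance, a minimal sketch of that route with conda (the environment name and Python version simply mirror the toolkit example above):

```bash
conda create -n llmstudio python=3.10
conda activate llmstudio
pip install -r requirements.txt
```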

### Installing custom packages

If you need to install additional Python packages into your environment, you can do so using pip after activating your virtual environment via ```make shell```. For example, to install flash-attention, you would use the following commands:

```bash
make shell
pip install flash-attn --no-build-isolation
pip install git+https://github.com/HazyResearch/flash-attention.git#subdirectory=csrc/rotary
```

Alternatively, you can also directly install via ```pipenv install package_name```.

## Run H2O LLM Studio GUI

You can start H2O LLM Studio using the following command:
5 changes: 0 additions & 5 deletions documentation/docs/guide/experiments/experiment-settings.md
@@ -76,7 +76,6 @@ import PStopp from '../../tooltips/experiments/_top-p.mdx';
import ESgpus from '../../tooltips/experiments/_gpus.mdx';
import ESmixedprecision from '../../tooltips/experiments/_mixed-precision.mdx';
import EScompilemodel from '../../tooltips/experiments/_compile-model.mdx';
import ESusefsdp from '../../tooltips/experiments/_use-fsdp.mdx';
import ESfindunusedparameters from '../../tooltips/experiments/_find-unused-parameters.mdx';
import EStrustremotecode from '../../tooltips/experiments/_trust-remote-code.mdx';
import ESnumofworkers from '../../tooltips/experiments/_number-of-workers.mdx';
@@ -430,10 +429,6 @@ The settings under each category are listed and described below.

<EScompilemodel/>

### Use FSDP

<ESusefsdp/>

### Find unused parameters

<ESfindunusedparameters/>
1 change: 0 additions & 1 deletion documentation/docs/tooltips/experiments/_use-fsdp.mdx

This file was deleted.

7 changes: 0 additions & 7 deletions llm_studio/python_configs/cfg_checks.py
@@ -100,13 +100,6 @@ def check_for_common_errors(cfg: DefaultConfigProblemBase) -> dict:
"Please use LORA or set Backbone Dtype to float32."
]

# deepspeed related checks
if cfg.environment.use_deepspeed and cfg.environment.use_fsdp:
errors["title"] += ["Deepspeed and FSDP cannot be used at the same time."]
errors["message"] += [
"Deepspeed and FSDP are mutually exclusive. "
"We recommend to disable FSDP which will be deprecated."
]
if cfg.environment.use_deepspeed and cfg.architecture.backbone_dtype in [
"int8",
"int4",
@@ -342,7 +342,6 @@ class ConfigNLPCausalLMEnvironment(DefaultConfig):
mixed_precision: bool = True

compile_model: bool = False
use_fsdp: bool = False
use_deepspeed: bool = False
deepspeed_reduce_bucket_size: int = int(1e6)
deepspeed_stage3_prefetch_bucket_size: int = int(1e6)
@@ -167,7 +167,6 @@ class ConfigRLHFLMEnvironment(ConfigNLPCausalLMEnvironment):

def __post_init__(self):
super().__post_init__()
self._visibility["use_fsdp"] = -1
self._visibility["compile_model"] = -1


27 changes: 1 addition & 26 deletions llm_studio/src/utils/modeling_utils.py
@@ -14,10 +14,6 @@
from deepspeed.utils.zero_to_fp32 import get_fp32_state_dict_from_zero_checkpoint
from peft import LoraConfig, get_peft_model
from torch.cuda.amp import autocast
from torch.distributed.fsdp.fully_sharded_data_parallel import (
FullyShardedDataParallel,
MixedPrecision,
)
from torch.nn.parallel import DistributedDataParallel
from tqdm import tqdm
from transformers import (
@@ -264,28 +260,7 @@ def wrap_model_distributed(
val_dataloader: torch.utils.data.DataLoader,
cfg: Any,
):
if cfg.environment.use_fsdp:
auto_wrap_policy = None

mixed_precision_policy = None
dtype = None
if cfg.environment.mixed_precision:
dtype = torch.float16
if dtype is not None:
mixed_precision_policy = MixedPrecision(
param_dtype=dtype, reduce_dtype=dtype, buffer_dtype=dtype
)
model = FullyShardedDataParallel(
model,
# sharding_strategy=ShardingStrategy.SHARD_GRAD_OP,
# cpu_offload=CPUOffload(offload_params=True),
auto_wrap_policy=auto_wrap_policy,
mixed_precision=mixed_precision_policy,
device_id=cfg.environment._local_rank,
# use_orig_params=False
limit_all_gathers=True,
)
elif cfg.environment.use_deepspeed:
if cfg.environment.use_deepspeed:
ds_config = get_ds_config(cfg)
model, optimizer, train_dataloader, lr_scheduler = deepspeed.initialize(
model=model,
@@ -49,7 +49,6 @@ environment:
seed: -1
trust_remote_code: true
use_deepspeed: false
use_fsdp: false
experiment_name: solid-spaniel
llm_backbone: facebook/opt-125m
logging:
@@ -49,7 +49,6 @@ environment:
seed: -1
trust_remote_code: true
use_deepspeed: false
use_fsdp: false
experiment_name: solid-spaniel
llm_backbone: MaxJeblick/llama2-0b-unit-test
logging:
@@ -44,7 +44,6 @@ environment:
number_of_workers: 8
seed: -1
trust_remote_code: true
use_fsdp: false
experiment_name: test-causal-language-modeling-oasst
llm_backbone: h2oai/h2ogpt-4096-llama2-7b
logging:
@@ -44,7 +44,6 @@ environment:
number_of_workers: 8
seed: -1
trust_remote_code: true
use_fsdp: false
experiment_name: test-causal-language-modeling-oasst-cpu
llm_backbone: MaxJeblick/llama2-0b-unit-test
logging:
@@ -49,7 +49,6 @@ environment:
seed: -1
trust_remote_code: true
use_deepspeed: false
use_fsdp: false
experiment_name: solid-spaniel
llm_backbone: facebook/opt-125m
logging:
@@ -49,7 +49,6 @@ environment:
seed: -1
trust_remote_code: true
use_deepspeed: false
use_fsdp: false
experiment_name: solid-spaniel
llm_backbone: MaxJeblick/llama2-0b-unit-test
logging:
@@ -44,7 +44,6 @@ environment:
number_of_workers: 8
seed: -1
trust_remote_code: true
use_fsdp: false
experiment_name: test-rlhf-language-modeling-oasst
llm_backbone: facebook/opt-125m
logging:
@@ -44,7 +44,6 @@ environment:
number_of_workers: 8
seed: -1
trust_remote_code: true
use_fsdp: false
experiment_name: test-rlhf-language-modeling-oasst
llm_backbone: MaxJeblick/llama2-0b-unit-test
logging:
@@ -44,7 +44,6 @@ environment:
number_of_workers: 8
seed: -1
trust_remote_code: true
use_fsdp: false
experiment_name: test-sequence-to-sequence-modeling-oasst
llm_backbone: t5-small
logging:
@@ -44,7 +44,6 @@ environment:
number_of_workers: 8
seed: -1
trust_remote_code: true
use_fsdp: false
experiment_name: test-sequence-to-sequence-modeling-oasst
llm_backbone: t5-small
logging:
1 change: 0 additions & 1 deletion tests/src/test_data/cfg.yaml
@@ -31,7 +31,6 @@ environment:
mixed_precision: true
number_of_workers: 8
seed: -1
use_fsdp: false
experiment_name: test
llm_backbone: EleutherAI/pythia-12b-deduped
logging:
1 change: 0 additions & 1 deletion tests/src/utils/test_load_yaml_file.py
@@ -39,7 +39,6 @@ def test_load_config_yaml():
assert cfg.environment.mixed_precision is True
assert cfg.environment.number_of_workers == 8
assert cfg.environment.seed == -1
assert cfg.environment.use_fsdp is False

assert cfg.logging.logger == "None"
assert cfg.logging.neptune_project == ""
17 changes: 4 additions & 13 deletions train.py
@@ -20,7 +20,6 @@
import pandas as pd
import torch
from torch.cuda.amp import GradScaler, autocast
from torch.distributed.fsdp.sharded_grad_scaler import ShardedGradScaler
from torch.utils.data import DataLoader
from tqdm import tqdm
from transformers.deepspeed import HfDeepSpeedConfig
@@ -178,12 +177,9 @@ def run_train(
Last train batch
"""

scaler: GradScaler | ShardedGradScaler | None = None
scaler: GradScaler | None = None
if cfg.environment.mixed_precision:
if cfg.environment.use_fsdp:
scaler = ShardedGradScaler()
else:
scaler = GradScaler()
scaler = GradScaler()

optimizer.zero_grad(set_to_none=True)

@@ -427,12 +423,9 @@ def run_train_rlhf(
Last train batch
"""

scaler: GradScaler | ShardedGradScaler | None = None
scaler: GradScaler | None = None
if cfg.environment.mixed_precision:
if cfg.environment.use_fsdp:
scaler = ShardedGradScaler()
else:
scaler = GradScaler()
scaler = GradScaler()

optimizer.zero_grad(set_to_none=True)

@@ -754,8 +747,6 @@ def run(cfg: Any) -> None:
else:
cfg.environment._seed = cfg.environment.seed

if cfg.environment.use_deepspeed and cfg.environment.use_fsdp:
raise ValueError("Deepspeed and FSDP cannot be used at the same time.")
if (
cfg.architecture.backbone_dtype in ["int8", "int4"]
and cfg.environment.use_deepspeed