
Conversation

@kwen2501 (Contributor) commented Mar 31, 2023

Motivation

To support the following code using PiPPy's pipelined forward method:

# OPT generate (model and args are set up earlier in the example script)
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(args.model_name)
prompt = "Hey, are you consciours? Can you talk to me?"
inputs = tokenizer(prompt, return_tensors="pt")
input_ids = inputs["input_ids"].to(args.device)
outputs = model.generate(input_ids, max_length=30)
response = tokenizer.batch_decode(outputs, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]

Enabler

We added a util that injects the pipeline driver's forward function back into the original model:

# Inject pipeline driver's forward function back into the original model,
# so that model.generate() transparently uses the pipelined forward
inject_pipeline_forward(model, pipe_driver)
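
As a rough illustration, an injection helper along these lines could monkey-patch the model's forward to delegate to the pipeline driver. This is a minimal sketch for illustration only; the actual util added in this PR may differ:

import types

def inject_pipeline_forward(model, pipe_driver):
    # Replace the model's forward with one that delegates to the pipeline
    # driver, so generate() (which calls self.forward internally) runs the
    # pipelined computation instead of the local one.
    def pipelined_forward(self, *args, **kwargs):
        return pipe_driver(*args, **kwargs)
    model.forward = types.MethodType(pipelined_forward, model)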

Test

$ cd examples/inference

# This runs OPT inference on 4 GPUs
$ torchrun --nproc_per_node=4 opt_generate.py

Hey, are you consciours? Can you talk to me?
I'm not consciours, but I can talk to you.

Major changes:

  • Add OPT generate example
  • Add safety net to filter unexpected runtime arguments (see the sketch after this list)
  • Add inject_pipeline_forward util
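
For the safety net, a minimal sketch of how runtime-argument filtering could work, assuming it matches incoming keyword arguments against the target forward signature (the helper name filter_unexpected_kwargs is hypothetical, not the actual PiPPy API):

import inspect

def filter_unexpected_kwargs(forward_fn, kwargs):
    # Keep only keyword arguments that forward() declares, dropping anything
    # extra that generate() may pass along at runtime.
    accepted = inspect.signature(forward_fn).parameters
    return {k: v for k, v in kwargs.items() if k in accepted}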

kwen2501 merged commit 89c7c2c into main on Apr 4, 2023
