Inf2 example #2399

Merged
merged 11 commits into master from inf2-example
Jun 16, 2023

Conversation

namannandan
Collaborator

@namannandan namannandan commented Jun 6, 2023

Description

Adds an Inferentia2 (Inf2) example based on the opt-6.7b model.

Type of change

  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

Feature/Issue validation/testing

  • Successful test for the 125m-parameter variant of the opt model
$ cat sample_text.txt 
Today the weather is really nice and I am planning on
$ curl http://127.0.0.1:8080/predictions/opt-125m -T sample_text.txt 
Today the weather is really nice and I am planning on
going through some camping next weekend. I thought if I didn’t
wear rain gear on Saturday maybe I would have to wear a coat but
I am using my wet shoes
  • Successful test for the 6.7b-parameter variant of the opt model
$ cat sample_text.txt 
Today the weather is really nice and I am planning on
$ curl http://127.0.0.1:8080/predictions/opt-6.7b -T sample_text.txt 
Today the weather is really nice and I am planning on
spending the day in the park and riding my bike to the
store for ice cream. Then I will come home and study
more Spanish. It is raining and sunny here,
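
The curl commands above upload the prompt file to the TorchServe inference endpoint. A rough Python equivalent follows, as a sketch only: the helper names `build_prediction_request` and `predict` are hypothetical, and the host, port, and model names are taken from the commands above. `curl -T` issues an HTTP PUT, which the sketch mirrors.

```python
import urllib.request


def build_prediction_request(
    prompt: bytes,
    model_name: str,
    host: str = "127.0.0.1",
    port: int = 8080,
) -> urllib.request.Request:
    """Build a PUT request that uploads the prompt, mirroring `curl -T`."""
    url = f"http://{host}:{port}/predictions/{model_name}"
    return urllib.request.Request(url, data=prompt, method="PUT")


def predict(prompt: bytes, model_name: str, timeout: float = 120.0) -> str:
    """Send the prompt to a running TorchServe instance and return the reply text."""
    request = build_prediction_request(prompt, model_name)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the server running and the model registered, calling `predict` with the bytes of sample_text.txt and the model name "opt-6.7b" should return a completion like the ones shown above. The timeout value is an arbitrary choice for illustration.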

Ubuntu and others added 5 commits April 29, 2023 05:19
* fix INF2 example handler

* Add logging for padding in inf2 handler

* update response timeout and model

* Update documentation to show opt-6.7b as the example model

* Update model batch log

---------

Co-authored-by: Naman Nandan <namannan@amazon.com>

codecov bot commented Jun 7, 2023

Codecov Report

Merging #2399 (897c05c) into master (679b33d) will not change coverage.
The diff coverage is n/a.

❗ Current head 897c05c differs from pull request most recent head e7559e7. Consider uploading reports for the commit e7559e7 to get more accurate results

@@           Coverage Diff           @@
##           master    #2399   +/-   ##
=======================================
  Coverage   72.01%   72.01%           
=======================================
  Files          78       78           
  Lines        3648     3648           
  Branches       58       58           
=======================================
  Hits         2627     2627           
  Misses       1017     1017           
  Partials        4        4           

@namannandan namannandan marked this pull request as ready for review June 7, 2023 21:27
Collaborator

@HamidShojanazeri HamidShojanazeri left a comment

Thanks very much, @namannandan. LGTM.

model_name = ctx.model_yaml_config["handler"]["model_name"]

# allocate "tp_degree" number of neuron cores to the worker process
os.environ["NEURON_RT_NUM_CORES"] = str(tp_degree)
Collaborator

How do you make sure that enough neuron cores are available to support tp_degree?

Collaborator Author

I believe torch-neuronx currently does not have an API that provides the number of available (unallocated) neuron cores. Here, if the required number of neuron cores, i.e. tp_degree, is not available, model loading will fail with an error of the form:

ERROR  TDRV:db_vtpb_get_mla_and_tpb                 Could not find VNC id 1

Collaborator Author

It turns out that torch-neuronx does have a method to query the number of available unallocated cores: torch_neuronx.xla_impl.data_parallel.device_count(). I updated the handler to verify that the necessary number of cores is available before proceeding with model loading.
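
A minimal sketch of such a pre-load check, not the handler code merged in this PR. `torch_neuronx.xla_impl.data_parallel.device_count()` is the call named above; the helper names `check_core_availability` and `allocate_neuron_cores` are hypothetical, and the import is deferred so the pure check can be exercised on a machine without the Neuron SDK.

```python
import os


def check_core_availability(required: int, available: int) -> None:
    """Raise if fewer neuron cores are available than tp_degree requires."""
    if required < 1:
        raise ValueError(f"tp_degree must be a positive integer, got {required}")
    if available < required:
        raise RuntimeError(
            f"Required number of neuron cores (tp_degree={required}) exceeds "
            f"the number available ({available})"
        )


def allocate_neuron_cores(tp_degree: int) -> None:
    """Verify core availability, then reserve tp_degree cores for this worker."""
    # Deferred import: torch_neuronx is only installed on Neuron hosts.
    import torch_neuronx

    # Call path as quoted in the comment above; assumed reachable after the import.
    available = torch_neuronx.xla_impl.data_parallel.device_count()
    check_core_availability(tp_degree, available)
    # Same environment variable the handler sets before loading the model.
    os.environ["NEURON_RT_NUM_CORES"] = str(tp_degree)
```

Failing early with an explicit message is easier to diagnose than the low-level runtime error quoted earlier in this thread, which only surfaces once model loading has already started.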

@namannandan
Collaborator Author

Successfully tested the example:

  • on EC2 with Deep Learning AMI Neuron PyTorch 1.13.0 and
  • in docker using DLC 763104351884.dkr.ecr.us-west-2.amazonaws.com/pytorch-inference-neuronx:1.13.1-neuronx-py38-sdk2.10.0-ubuntu20.04

@lxning lxning merged commit 4e21262 into master Jun 16, 2023
12 of 13 checks passed
@namannandan namannandan deleted the inf2-example branch November 9, 2023 18:09