Adds neuron support #486
PR Summary
This PR adds AWS Neuron support to the Infinity project, enabling deployment on AWS Inferentia hardware through Docker containers and ECS tasks with optimized model inference.
- Added `NeuronOptimumEmbedder` in `/libs/infinity_emb/infinity_emb/transformer/embedder/neuron.py` with dynamic batch size support and proper core detection
- Added AWS Neuron base Dockerfile `/infra/aws_neuron/Dockerfile.base` with neuronx packages and runtime configuration
- Added deployment instructions in `/infra/aws_neuron/README.md` for EC2 and ECS with Huggingface AMI
- Added proper device mounting and IPC configuration in ECS task definition for Neuron accelerator access
- Added version-pinned dependencies in `reqs_frozen.txt` for reproducible Neuron builds
7 file(s) reviewed, 19 comment(s)
| # RUN pip3 install \ | ||
| # neuronx-cc==2.15.143.0 \ | ||
| # torch-neuronx==2.1.2.2.3.2 \ | ||
| # transformers-neuronx==0.12.313 \ | ||
| # libneuronxla==2.0.5347.0 \ | ||
| # --extra-index-url=https://pip.repos.neuron.amazonaws.com |
style: Remove the commented-out code block. It is obsolete and could cause confusion.
| RUN pip3 install --upgrade \ | ||
| neuronx-cc==2.* \ | ||
| libneuronxla==2.0.5347.0 \ | ||
| torch-neuronx==2.1.2.2.2.0 \ |
logic: torch-neuronx version 2.1.2.2.2.0 is older than the commented-out version 2.1.2.2.3.2 above. Verify that this downgrade was intentional.
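A quick way to confirm the ordering this comment describes is to compare the two version strings component by component. The sketch below assumes every dot-separated component is a plain integer, which holds for both strings quoted here; `parse_version` is a hypothetical helper written for illustration, not part of any Neuron tooling.

```python
from __future__ import annotations


def parse_version(version: str) -> tuple[int, ...]:
    """Split a dotted version string into a tuple of ints for comparison."""
    return tuple(int(part) for part in version.split("."))


installed = parse_version("2.1.2.2.2.0")  # version in the active RUN line
commented = parse_version("2.1.2.2.3.2")  # version in the commented-out block

# Tuples compare element by element, so this is True when the active pin is older.
is_downgrade = installed < commented
print(is_downgrade)
```

Version strings with suffixes such as `rc1` or `.post1` would need a real version parser instead of this integer-only comparison.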
| # COPY reqs_frozen.txt reqs_frozen.txt | ||
| # RUN pip3 install -r reqs_frozen.txt | ||
| # Install optimum-neuron | ||
| #14 19.70 Successfully installed aiohappyeyeballs-2.4.4 aiohttp-3.11.9 aiosignal-1.3.1 async-timeout-5.0.1 attrs-24.2.0 coloredlogs-15.0.1 datasets-3.1.0 dill-0.3.8 frozenlist-1.5.0 fsspec-2024.9.0 humanfriendly-10.0 multidict-6.1.0 multiprocess-0.70.16 optimum-1.18.0 optimum-neuron-0.0.1 pandas-2.2.3 propcache-0.2.1 pyarrow-18.1.0 pytz-2024.2 requests-2.32.3 sentencepiece-0.2.0 tokenizers-0.15.2 transformers-4.39.3 tzdata-2024.2 xxhash-3.5.0 yarl-1.18.3 | ||
| # RUN pip3 install optimum[neuronx] --extra-index-url=https://pip.repos.neuron.amazonaws.com |
style: Remove the commented-out installation commands and build output logs.
| ENV PATH="/opt/bin/:/opt/aws/neuron/bin:${PATH}" | ||
|
|
||
| FROM neuron AS infinity | ||
| RUN apt-get update -y && apt-get install -y nano |
style: Installing the nano editor increases image size unnecessarily. Remove it unless it is needed in production.
| # Is an mirror of | ||
| # 763104351884.dkr.ecr.us-west-2.amazonaws.com/huggingface-pytorch-inference-neuronx:2.1.2-transformers4.43.2-neuronx-py310-sdk2.20.0-ubuntu20.04 |
syntax: typo in comment: 'Is an mirror' should be 'Is a mirror'
| optimum[neuronx]==1.22.0 | ||
| orjson==3.10.7 | ||
| overrides==7.7.0 | ||
| packaging |
logic: The packaging package is missing a version pin, which could cause dependency conflicts.
| hf-transfer | ||
| httptools==0.6.4 ; python_version >= "3.9" and python_version < "4" | ||
| huggingface-hub |
logic: The hf-transfer and huggingface-hub packages are missing version constraints. Add specific versions to ensure reproducible builds.
| mpmath==1.3.0 ; python_version >= "3.9" and python_version < "4" | ||
| multidict==6.1.0 ; python_version >= "3.9" and python_version < "4" | ||
| multiprocess==0.70.15 ; python_version >= "3.9" and python_version < "4" | ||
| numpy |
logic: The numpy package is missing a version constraint. It should be pinned to ensure compatibility with other dependencies.
| scikit-learn | ||
| scipy |
logic: The scikit-learn and scipy packages are missing version constraints. They should be pinned for reproducibility.
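The missing-pin findings in the comments above could also be caught automatically. The following is a minimal sketch, assuming a plain requirements-style file with one requirement per line; `find_unpinned` is a hypothetical helper, and the sample lines are copied from the diff fragments quoted in this review. It flags any line that carries no version specifier at all, so a line using `>=` would pass even though it is not an exact pin.

```python
import re

# Matches any version operator that may follow a package name.
_VERSION_OP = re.compile(r"(==|>=|<=|~=|!=|<|>)")


def find_unpinned(lines):
    """Return requirements that carry no version specifier at all."""
    unpinned = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        requirement = line.split(";", 1)[0].strip()  # drop environment markers
        if not _VERSION_OP.search(requirement):
            unpinned.append(requirement)
    return unpinned


sample = [
    "optimum[neuronx]==1.22.0",
    "packaging",
    "hf-transfer",
    'httptools==0.6.4 ; python_version >= "3.9" and python_version < "4"',
    "huggingface-hub",
    "numpy",
    "scikit-learn",
    "scipy",
]
print(find_unpinned(sample))
# ['packaging', 'hf-transfer', 'huggingface-hub', 'numpy', 'scikit-learn', 'scipy']
```

The environment marker is stripped before the check so that the `>=` inside `python_version >= "3.9"` is not mistaken for a version specifier on the package itself.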
| aiohappyeyeballs==2.4.3 ; python_version >= "3.9" and python_version < "4" | ||
| aiohttp==3.10.10 ; python_version >= "3.9" and python_version < "4" | ||
| aiosignal==1.3.1 ; python_version >= "3.9" and python_version < "4" | ||
| async-timeout==4.0.3 ; python_version >= "3.9" and python_version < "3.11" |
style: async-timeout is restricted to Python < 3.11, while other packages support up to Python 4. This may cause issues in Python 3.11+ environments.
Codecov Report
All modified and coverable lines are covered by tests ✅
@@ Coverage Diff @@
## main #486 +/- ##
==========================================
- Coverage 79.54% 79.48% -0.06%
==========================================
Files 41 41
Lines 3422 3422
==========================================
- Hits 2722 2720 -2
- Misses 700 702 +2