Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ML] Better memory estimation for NLP models #568

Merged
merged 11 commits into from
Nov 6, 2023

Conversation

valeriy42
Copy link
Contributor

@valeriy42 valeriy42 commented Jul 20, 2023

This PR adds an ability to estimate per deployment and per allocation memory usage of NLP transformer models. It uses torch.profiler and performs logs the peak memory usage during the inference.

This information is then used in Elasticsearch to provision models with sufficient memory (elastic/elasticsearch#98874).

@valeriy42 valeriy42 changed the title Nlp-memory-estimation-2 [ML] Better memory estimation for NLP models Jul 20, 2023
@valeriy42 valeriy42 requested a review from qherreros July 20, 2023 11:11
@valeriy42
Copy link
Contributor Author

@qherreros , since I ported your code for memory estimation, would you mind looking at the relevant part in transformers.py?

Copy link

@qherreros qherreros left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Just a minor comment.

eland/ml/pytorch/transformers.py Outdated Show resolved Hide resolved
@valeriy42 valeriy42 marked this pull request as ready for review August 7, 2023 13:45
@valeriy42 valeriy42 self-assigned this Oct 26, 2023
@valeriy42
Copy link
Contributor Author

@davidkyle could you please look at the PR since you have the best overview of the transformers.py. 🙏

Copy link
Member

@davidkyle davidkyle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Please add an assertion to the model config creation tests that these new settings are present and have sensible values

https://github.com/elastic/eland/blob/main/tests/ml/pytorch/test_pytorch_model_config_pytest.py#L149

.buildkite/run-elasticsearch.sh Show resolved Hide resolved
eland/ml/pytorch/transformers.py Outdated Show resolved Hide resolved
tests/ml/pytorch/test_pytorch_model_upload_pytest.py Outdated Show resolved Hide resolved
@valeriy42
Copy link
Contributor Author

Thank you for the review @davidkyle . I addressed your comments, it would be great if you could take another look.

Copy link
Member

@davidkyle davidkyle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@davidkyle
Copy link
Member

@pquentin do you know why the read the docs build has started failing

I notice the build is using Python 3.12.0 which could have something to do with it

Getting requirements to build wheel did not run successfully.
  │ exit code: 1
  ╰─> [33 lines of output]
      Traceback (most recent call last):
        File "/home/docs/checkouts/readthedocs.org/user_builds/eland/envs/568/lib/python3.12/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
          main()
        File "/home/docs/checkouts/readthedocs.org/user_builds/eland/envs/568/lib/python3.12/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
          json_out['return_val'] = hook(**hook_input['kwargs'])
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/home/docs/checkouts/readthedocs.org/user_builds/eland/envs/568/lib/python3.12/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 112, in get_requires_for_build_wheel
          backend = _build_backend()
                    ^^^^^^^^^^^^^^^^
        File "/home/docs/checkouts/readthedocs.org/user_builds/eland/envs/568/lib/python3.12/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 77, in _build_backend
          obj = import_module(mod_path)
                ^^^^^^^^^^^^^^^^^^^^^^^
        File "/home/docs/.asdf/installs/python/3.12.0/lib/python3.12/importlib/__init__.py", line 90, in import_module
          return _bootstrap._gcd_import(name[level:], package, level)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "<frozen importlib._bootstrap>", line 1381, in _gcd_import
        File "<frozen importlib._bootstrap>", line 1354, in _find_and_load
        File "<frozen importlib._bootstrap>", line 1304, in _find_and_load_unlocked
        File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
        File "<frozen importlib._bootstrap>", line 1381, in _gcd_import
        File "<frozen importlib._bootstrap>", line 1354, in _find_and_load
        File "<frozen importlib._bootstrap>", line 1325, in _find_and_load_unlocked
        File "<frozen importlib._bootstrap>", line 929, in _load_unlocked
        File "<frozen importlib._bootstrap_external>", line 994, in exec_module
        File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
        File "/tmp/pip-build-env-4u84dhia/overlay/lib/python3.12/site-packages/setuptools/__init__.py", line 16, in <module>
          import setuptools.version
        File "/tmp/pip-build-env-4u84dhia/overlay/lib/python3.12/site-packages/setuptools/version.py", line 1, in <module>
          import pkg_resources
        File "/tmp/pip-build-env-4u84dhia/overlay/lib/python3.12/site-packages/pkg_resources/__init__.py", line 2172, in <module>
          register_finder(pkgutil.ImpImporter, find_on_path)
                          ^^^^^^^^^^^^^^^^^^^
      AttributeError: module 'pkgutil' has no attribute 'ImpImporter'. Did you mean: 'zipimporter'?
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

@pquentin
Copy link
Member

pquentin commented Nov 5, 2023

@pquentin do you know why the read the docs build has started failing

I notice the build is using Python 3.12.0 which could have something to do with it

Yes, we were using the latest Python version supported by Read the Docs and it recently became Python 3.12. It was failing to build numpy because we pin it to an older version that does not support Python 3.12. #627 fixes this by asking Python 3.10 explicitly.

@davidkyle
Copy link
Member

Thanks for fixing the docs build @pquentin

@valeriy42 valeriy42 merged commit 6cecb45 into elastic:main Nov 6, 2023
3 of 4 checks passed
@valeriy42 valeriy42 deleted the nlp-memory-estimation-2 branch November 6, 2023 11:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants