[Bug] Set resource limit throws error #190

Closed
cceyda opened this issue Apr 27, 2023 · 6 comments · Fixed by #195
Labels
bug Something isn't working

Comments

@cceyda

cceyda commented Apr 27, 2023

Describe the bug
This is more of a Python error, honestly.
It happens because the call below doesn't work in some cases (I am root).

import resource
resource.setrlimit(resource.RLIMIT_STACK, (2**29, -1))

But I couldn't find a recommended Python version, and other people might also run into it.
This happens during import hidet.

Setting resource limit throws

resource.setrlimit(resource.RLIMIT_STACK, (2**29, -1))

ValueError: not allowed to raise maximum limit

/opt/conda/lib/python3.10/site-packages/hidet/__init__.py:15 in <module>                         │
│                                                                                                  │
│   12 """                                                                                         │
│   13 Hidet is an open-source DNN inference framework based on compilation.                       │
│   14 """                                                                                         │
│ ❱ 15 from . import option                                                                        │
│   16 from . import ir                                                                            │
│   17 from . import backend                                                                       │
│   18 from . import utils                                                                         │
│                                                                                                  │
│ /opt/conda/lib/python3.10/site-packages/hidet/option.py:112 in <module>                          │
│                                                                                                  │
│   109 │   )                                                                                      │
│   110                                                                                            │
│   111                                                                                            │
│ ❱ 112 register_hidet_options()                                                                   │
│   113                                                                                            │
│   114                                                                                            │
│   115 class OptionContext:                                                                       │
│                                                                                                  │
│ /opt/conda/lib/python3.10/site-packages/hidet/option.py:61 in register_hidet_options             │
│                                                                                                  │
│    58                                                                                            │
│    59                                                                                            │
│    60 def register_hidet_options():                                                              │
│ ❱  61 │   from hidet.utils import git_utils                                                      │
│    62 │                                                                                          │
│    63 │   register_option(                                                                       │
│    64 │   │   name='bench_config',                                                               │
│                                                                                                  │
│ /opt/conda/lib/python3.10/site-packages/hidet/utils/__init__.py:18 in <module>                   │
│                                                                                                  │
│   15 from . import netron                                                                        │
│   16 from . import transformers_utils                                                            │
│   17 from . import structure                                                                     │
│ ❱ 18 from . import stack_limit                                                                   │
│   19                                                                                             │
│   20 from .py import prod, Timer, repeat_until_converge, COLORS, get_next_file_index, factori    │
│   21 from .py import same_list, strict_zip, index_of, initialize, gcd, lcm, error_tolerance,     │
│                                                                                                  │
│ /opt/conda/lib/python3.10/site-packages/hidet/utils/stack_limit.py:19 in <module>                │
│                                                                                                  │
│   16 import resource                                                                             │
│   17                                                                                             │
│   18 # allow up to 128MB stack space                                                             │
│ ❱ 19 resource.setrlimit(resource.RLIMIT_STACK, (2**29, -1))                                      │
│   20                                                                                             │
│   21 # allow up to 10^5 recursive python calls, increase this when needed                        │
│   22 sys.setrecursionlimit(100000)                         

To Reproduce
pip install hidet
import hidet

Expected behavior
It may be better to wrap the call in try/except and emit a warning.
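A minimal sketch of that suggestion, not hidet's actual code: the helper name `try_raise_stack_limit` is hypothetical, and it keeps the existing hard limit instead of passing -1, which is an assumption about what a safe default would be.

```python
import resource
import warnings


def try_raise_stack_limit(soft_bytes: int) -> bool:
    """Try to set the soft stack limit; warn instead of raising on failure.

    Hypothetical helper for illustration. The hard limit is left unchanged,
    so the call cannot fail with "not allowed to raise maximum limit".
    """
    try:
        _, hard = resource.getrlimit(resource.RLIMIT_STACK)
        # setrlimit raises ValueError if soft_bytes exceeds the hard limit.
        resource.setrlimit(resource.RLIMIT_STACK, (soft_bytes, hard))
        return True
    except (ValueError, OSError) as exc:
        warnings.warn(f"could not set stack limit to {soft_bytes} bytes: {exc}")
        return False
```

With this shape, `import hidet` would still succeed in a restricted container, and the warning tells the user why deep recursion might fail later.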

Environment
Using the latest AWS container with amazonaws.com/pytorch-training:2.0.0-gpu-py310

  • OS: Ubuntu 20.04.6 LTS (Focal Fossa)
  • Python 3.10.8
  • GPU: Not relevant
  • Others: Not relevant

Additional context

@cceyda cceyda added the bug Something isn't working label Apr 27, 2023
@cceyda cceyda changed the title from "[Bug]" to "[Bug] Set resource limit throws error" Apr 27, 2023
@austinmw

I got the same error.

@yaoyaoding
Member

Hi @cceyda , thanks for bringing this up!

Hidet uses recursion to traverse its IR, so we need a large stack size to process large computation graphs and tensor programs. This error seems to indicate that your operating system has set the hard limit for the stack size to a concrete value instead of infinity (run ulimit -H -s to check).

I have submitted #195 to deal with this case. Now, hidet will no longer force the limit when the hard limit is less than the recommended size; it will emit a warning instead.
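The behavior described above could look roughly like the following. This is a sketch under stated assumptions, not the code from #195: the names `RECOMMENDED_STACK_BYTES`, `choose_soft_limit`, and `set_stack_limit_if_allowed` are hypothetical, and the recommended size is taken from the `2**29` in the traceback.

```python
import resource
import warnings
from typing import Optional

# Assumption: the value from the failing call in the traceback (2**29 bytes = 512 MiB).
RECOMMENDED_STACK_BYTES = 2**29


def choose_soft_limit(hard: int, recommended: int = RECOMMENDED_STACK_BYTES) -> Optional[int]:
    """Return the soft limit to request, or None if the hard limit is too small."""
    if hard == resource.RLIM_INFINITY or hard >= recommended:
        return recommended
    return None


def set_stack_limit_if_allowed(recommended: int = RECOMMENDED_STACK_BYTES) -> bool:
    """Raise the soft stack limit only when the hard limit permits it."""
    _, hard = resource.getrlimit(resource.RLIMIT_STACK)
    target = choose_soft_limit(hard, recommended)
    if target is None:
        warnings.warn(
            f"hard stack limit ({hard} bytes) is below the recommended "
            f"{recommended} bytes; leaving the stack limit unchanged"
        )
        return False
    # Keep the existing hard limit; only the soft limit changes.
    resource.setrlimit(resource.RLIMIT_STACK, (target, hard))
    return True
```

Separating the decision (`choose_soft_limit`) from the system call keeps the policy testable without changing the limits of the running process.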

@yaoyaoding
Member

yaoyaoding commented Apr 28, 2023

Let me know if this PR does not solve the problem @cceyda @austinmw, thanks!

@cceyda
Author

cceyda commented Apr 28, 2023

Indeed, ulimit -H -s is set to a value. I was running on AWS SageMaker (so inside a blind container).
I guess this is something to keep in mind for people using Docker containers: they may want to set their ulimits to unlimited (or another large number).
I'll test when the nightly package drops.

@yaoyaoding yaoyaoding reopened this Apr 28, 2023
@yaoyaoding
Member

Hi @cceyda, the nightly build of hidet should include this PR. Could you give it a try?

$ pip install --pre --extra-index-url https://download.hidet.org/whl hidet

@cceyda
Author

cceyda commented Apr 29, 2023

Just tested, it is working. Thank you ~

@cceyda cceyda closed this as completed Apr 29, 2023