Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make simple-parsing faster #278

Closed
anivegesana opened this issue Aug 3, 2023 · 4 comments · Fixed by #279 or #285
Closed

Make simple-parsing faster #278

anivegesana opened this issue Aug 3, 2023 · 4 comments · Fixed by #279 or #285

Comments

@anivegesana
Copy link
Contributor

anivegesana commented Aug 3, 2023

Is your feature request related to a problem? Please describe.
Thank you for your work on #276! In the parser that our team wrote, we noticed that most of the time needed to generate the --help menu was spent inside of the inspect.getsource function. I think the issue is that the issue is that caching is done on _get_attribute_docstring, but the majority of the time is spent in inspect.getsource, which has to be done multiple times for the same dataclass since the key of the cache is (dataclass, field_name).

Describe the solution you'd like
Constructing caches for the inspect.getsource, inspect.getdoc, and dp.parse functions dramatically speeds up the construction of the parser. I ran a simple experiment where I didn't change any code of the simple-parsing library and simply prepended the following lines of code to my script:

import functools
import inspect

import docstring_parser as dp

inspect.getsource = functools.lru_cache(2048)(inspect.getsource)
inspect.getdoc = functools.lru_cache(2048)(inspect.getdoc)
dp.parse = functools.lru_cache(2048)(dp.parse)
Construction without monkey patch: 0.902 s
Parsing without monkey patch: 0.323 s
Construction with monkey patch: 0.020 s
Parsing with monkey patch: 0.021 s

Also, you import numpy even if you don't use it. It is possible to import numpy lazily so that --help menus will be more snappy. (I haven't checked to see if this is a major source of unnecessary waiting, but, on my machine, it seems to take up ~0.105s/0.211s to construct the parser and parse the arguments.This may not may not also apply to yaml.)

To lazily load numpy, you can create a module called lazy_numpy.py and use the following code:

def __getattr__(attr):
    import numpy as np
    globals().update(vars(np))
    return getattr(np, attr)

def __dir__():
    import numpy as np
    globals().update(vars(np))
    return dir(np)

Importing lazy_numpy does not import numpy but accessing any attribute within lazy_numpy does.

Describe alternatives you've considered
I haven't tried other locations that caching could be improved. I just think that, if three lines of code can speed things up this dramatically, might as well add them.

Additional context
I can open a PR, but I do not have a good idea on how to implement test cases for this.

@lebrice
Copy link
Owner

lebrice commented Aug 3, 2023

Thanks a lot @anivegesana , You're more than welcome to submit a PR for the caching, I think the current tests are sufficient in convering the docstring creation, I dont think additional tests are needed.

lebrice added a commit that referenced this issue Aug 3, 2023
Fixes #278

Signed-off-by: Fabrice Normandin <normandf@mila.quebec>
lebrice added a commit that referenced this issue Aug 3, 2023
Fixes #278

Signed-off-by: Fabrice Normandin <normandf@mila.quebec>
lebrice added a commit that referenced this issue Aug 3, 2023
Fixes #278

Signed-off-by: Fabrice Normandin <normandf@mila.quebec>
lebrice added a commit that referenced this issue Aug 3, 2023
* Add pytest-benchmark test dependency

Signed-off-by: Fabrice Normandin <normandf@mila.quebec>

* Add performance files for before the changes

Signed-off-by: Fabrice Normandin <normandf@mila.quebec>

* Use cached versions of inspect.getdoc/getsource

Fixes #278

Signed-off-by: Fabrice Normandin <normandf@mila.quebec>

* Make numpy import lazy

Signed-off-by: Fabrice Normandin <normandf@mila.quebec>

* Simplify the benchmark code a bit

Signed-off-by: Fabrice Normandin <normandf@mila.quebec>

* Remove .benchmarks file from git history

Signed-off-by: Fabrice Normandin <normandf@mila.quebec>

* Playing around with GitHub actions for benchmark

Signed-off-by: Fabrice Normandin <normandf@mila.quebec>

* Remove upload workflow, add benchmark

Signed-off-by: Fabrice Normandin <normandf@mila.quebec>

* Tweak benchmark.yml and build.yml

Signed-off-by: Fabrice Normandin <normandf@mila.quebec>

* Only run workflow on push to master

Signed-off-by: Fabrice Normandin <normandf@mila.quebec>

---------

Signed-off-by: Fabrice Normandin <normandf@mila.quebec>
@lebrice
Copy link
Owner

lebrice commented Aug 8, 2023

Hey @anivegesana , just FYI, I did make the numpy import lazy, and added the lru_cache to all the inspect functions in #279

@anivegesana
Copy link
Contributor Author

Thank you! I noticed that my --help menu was much faster with yesterday's release. I think that there is a place that you forgot to remove an import for numpy: anivegesana@153eb03

Thank you for your awesome work!

@lebrice lebrice reopened this Aug 8, 2023
@lebrice
Copy link
Owner

lebrice commented Aug 8, 2023

Oops. Thanks for pointing it out

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants