Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance issue #39

Closed
alexespencer opened this issue Sep 30, 2020 · 8 comments
Closed

Performance issue #39

alexespencer opened this issue Sep 30, 2020 · 8 comments
Assignees
Labels
bug Something isn't working

Comments

@alexespencer
Copy link

Python version
3.7

Package version
0.18.0 vs 0.21.0

Current behavior (bug description)
0.21.0 is up to 30 times slower in some instances than 0.18.0 for "large" dictionaries

Expected behavior
Performance is equal or better in 0.21.0

MicrosoftTeams-image (2)

Code to reproduce:
import benedict
from benedict import benedict as ben
import logging
import os
logger = logging.getLogger(os.path.basename(file))
logger.setLevel(logging.INFO)

formatter = logging.Formatter('%(asctime)s : [%(levelname)s] : %(name)s : %(message)s')
log_file_handler = logging.StreamHandler()
log_file_handler.setFormatter(formatter)
logger.addHandler(log_file_handler)

Create large nested dictionary

test = ben({})

logger.info(f"Starting test with python-benedict version: {benedict.version}")
for i in range(0, 500):
for j in range(0, 100):
test.set(f"{i}.{j}", f"text-{i}-{j}")

Access multiple elements with a few missing element paths

for i in range(0, 550):
for j in range(0, 110):
test.get(f"{i}.{j}", None)

logger.info("End")

@alexespencer alexespencer added the bug Something isn't working label Sep 30, 2020
@alexespencer
Copy link
Author

In the screenshot above, you can see that 0.21.0 took 60 seconds ish to run. The 0.18.0 code took just 2 seconds. Code to reproduce this is attached above. I just used python 3.7, and in two different environs had 0.21.0 and 0.18.0. This was spotted during a huge performance slow down in our kubernetes cluster - I have now locked the version (should have done that before, that's on me) - but thought you would want to know about this in case it's fixable.

@alexespencer
Copy link
Author

I removed the "get" of missing elements and it still takes a long time - I wondered if it might just be that bit that's slow, but it still took the same time. Ideally the timer would only start after the dictionary has been created too - so I reran the test and 0.21.0 took 30 seconds - so perhaps both setting and getting is affected?

@fabiocaccamo
Copy link
Owner

@alexespencer thank you for reporting this, I'm confident that this slowdown is caused by the base layer introduced here.

@alexespencer
Copy link
Author

@fabiocaccamo was that from 0.20.0? I just double checked 0.19.0 and that's all good, the slow down starts from 0.20.0

@fabiocaccamo
Copy link
Owner

fabiocaccamo commented Sep 30, 2020

Yes, it has been released in 0.20.0 
Do you pass an existing dict to the constructor or you start with an empty dict?

@alexespencer
Copy link
Author

@fabiocaccamo in the test code above I start with an empty one, passing "benedict({})" to initialise it, then populate via .set. However, in our production code we usually read JSON into a dict, then pass that in.

@fabiocaccamo
Copy link
Owner

@alexespencer I figured out the problem, it is the automatic casting to benedict of retrieved dict values.

The problem occurs because some internal utilities call methods like __getitem__ or get (for example this one) that potentially could return dict values that will be casted to benedict.

fabiocaccamo added a commit that referenced this issue Sep 30, 2020
@fabiocaccamo
Copy link
Owner

@alexespencer You can upgrade to 0.21.1 version.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants