Complexity in validate SPDX ID causing slowdown #742

Closed
lumjjb opened this issue Aug 1, 2023 · 7 comments

@lumjjb
Contributor

lumjjb commented Aug 1, 2023

There is quite a bit of slowdown in the validation routine for SPDX documents. One potential offender seems to be this function, which rebuilds the entire list of IDs in the document on every call and then does a linear search for the ID each time.

def is_spdx_id_present_in_document(spdx_id: str, document: Document) -> bool:
    all_spdx_ids_in_document: List[str] = get_list_of_all_spdx_ids(document)
    return spdx_id in all_spdx_ids_in_document

This came up due to a slowdown when running ntia-checker:

  133911027 function calls (133748641 primitive calls) in 31.696 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    585/1    0.007    0.000   31.700   31.700 {built-in method builtins.exec}
        1    0.000    0.000   31.700   31.700 ntia-checker:1(<module>)
        1    0.000    0.000   31.454   31.454 main.py:42(main)
        1    0.000    0.000   31.446   31.446 sbom_checker.py:19(__init__)
        1    0.000    0.000   30.759   30.759 document_validator.py:19(validate_full_spdx_document)
        1    0.007    0.007   25.766   25.766 relationship_validator.py:12(validate_relationships)
     3996    0.035    0.000   25.758    0.006 relationship_validator.py:22(validate_relationship)
    15428    0.080    0.000   25.724    0.002 spdx_id_validators.py:46(validate_spdx_id)
     7992    0.363    0.000   25.589    0.003 spdx_id_validators.py:25(is_spdx_id_present_in_document)
     7993    0.067    0.000   25.229    0.003 spdx_id_validators.py:31(get_list_of_all_spdx_ids)
     7993    0.032    0.000   24.981    0.003 document_utils.py:11(get_contained_spdx_element_ids)
     7993    8.097    0.001   23.264    0.003 document_utils.py:12(<listcomp>)
 59639927    9.758    0.000   16.273    0.000 dataclass_with_properties.py:46(get_field)
 60112853    6.556    0.000    6.556    0.000 {built-in method builtins.getattr}
        1    0.000    0.000    4.771    4.771 package_validator.py:22(validate_packages)
      435    0.002    0.000    4.771    0.011 package_validator.py:36(validate_package_within_document)
     7871    0.019    0.000    4.748    0.001 license_expression_validator.py:26(validate_license_expression)
      148    0.054    0.000    3.700    0.025 __init__.py:812(get_spdx_licensing)
      148    0.001    0.000    3.004    0.020 __init__.py:860(build_spdx_licensing)
<TRUNCATED>
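
For anyone wanting to gather a comparable profile, here is a minimal sketch using the standard library's `cProfile` and `pstats`. The listing above looks like `cProfile` output sorted by cumulative time, but that is an assumption. `slow_lookup` is a hypothetical stand-in workload, not code from this project.

```python
import cProfile
import io
import pstats


def slow_lookup(n: int) -> int:
    # Hypothetical workload: rebuilds a list and searches it linearly on every iteration,
    # mimicking the pattern described in this issue.
    hits = 0
    for i in range(n):
        ids = [f"SPDXRef-{j}" for j in range(n)]
        if f"SPDXRef-{i}" in ids:
            hits += 1
    return hits


profiler = cProfile.Profile()
profiler.enable()
result = slow_lookup(200)
profiler.disable()

# Sort by cumulative time and print the ten most expensive entries.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(10)
report = stream.getvalue()
print(report)
```

A whole script can also be profiled from the command line with `python -m cProfile -s cumulative <script>`.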

Ask:

Could there be a function that builds a dictionary of the IDs once and reuses it across multiple invocations?
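
A rough sketch of what this could look like, assuming the IDs are collected once per document and kept in a hash-based container. It uses a `frozenset` rather than a dictionary, since only membership is needed. `Element`, `Document`, and `SpdxIdIndex` are hypothetical stand-ins for illustration and not the library's actual classes.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List


@dataclass
class Element:
    # Hypothetical stand-in for any SPDX element that carries an ID.
    spdx_id: str


@dataclass
class Document:
    # Hypothetical stand-in for the real Document class; it only holds elements.
    elements: List[Element] = field(default_factory=list)


class SpdxIdIndex:
    """Collects all SPDX IDs of a document once; lookups are then O(1) on average."""

    def __init__(self, document: Document) -> None:
        self._ids: FrozenSet[str] = frozenset(e.spdx_id for e in document.elements)

    def __contains__(self, spdx_id: str) -> bool:
        return spdx_id in self._ids


document = Document([Element("SPDXRef-DOCUMENT"), Element("SPDXRef-Package")])
index = SpdxIdIndex(document)  # built once, before validating relationships

print("SPDXRef-Package" in index)
print("SPDXRef-Missing" in index)
```

The index is a snapshot, so it would need to be rebuilt if the document changes between validations.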

@maxhbr
Member

maxhbr commented Aug 11, 2023

Thanks for this analysis and identification of the cause. This helps!

@armintaenzertng
Collaborator

Hey @lumjjb, I just tried replacing the offending list with a set (that additionally only gets computed once instead of every time the function is called), but could not observe any significant speed-up in the validation process.

Do you have any other ideas which parts of the code could be sped up? I'll keep you updated if I find anything! :)

@armintaenzertng
Collaborator

Also, can you share your SBOM that you used? Your relationship validation seems to take a lot of time, which it does not for me.

@armintaenzertng
Collaborator

@lumjjb, I found the cause that led to massive performance issues on my end and fixed it in #749.
I'm not sure, though, if this is related to your issue which seems to lie more in the relationship validation. Please test/review #749 to see if this helps and send me your SBOM if not! :)

@lumjjb
Contributor Author

lumjjb commented Aug 22, 2023

Awesome! Let me try this out and let you know! Thanks for diving into that! Sorry, I can't share the SBOM :(. Let me do some testing again and report back!

@armintaenzertng
Collaborator

Hey @lumjjb, how did your testing go? Is there still an issue or can this be closed?

@lumjjb
Contributor Author

lumjjb commented Sep 28, 2023

Yeah, this is good now! Can close this up! Thanks @armintaenzertng!

@maxhbr closed this as completed Sep 28, 2023