License Detection takes forever to complete #3245
This gives an idea of the runtime using the latest develop branch.
@pombredanne Looking into this more, the scan also hangs for a similar time in my case, and the issue looks like it is here: https://github.com/nexB/scancode-toolkit/blob/develop/src/licensedcode/plugin_license.py#L183 See the timing log for reference:
The issue seems to be here: scancode-toolkit/src/licensedcode/detection.py Lines 603 to 612 in 6358a4b
See the time logs:
For each unique license detection identifier, we loop through all detections twice. The cost therefore grows with the number of unique identifiers times the number of detections, and when there are many different detections (as in a license repository like this one) this takes a lot of time. I am looking into alternatives that can fix this issue now.
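To make the cost concrete, here is a minimal Python sketch of the pattern described above. It is a hypothetical reconstruction, not the actual code in detection.py: the Detection class, its identifier and path fields, and the function name are assumptions for illustration. Each unique identifier triggers two full passes over the detection list, so the work grows as (unique identifiers) x (detections), which approaches quadratic when most detections are distinct.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Detection:
    # Hypothetical stand-in for a license detection: identical detections
    # share an identifier; path is the file where it was found.
    identifier: str
    path: str


def unique_detections_slow(
    detections: List[Detection],
) -> Dict[str, Tuple[Detection, List[str]]]:
    """Group detections by identifier using two full scans per identifier.

    Cost is O(U * N) for U unique identifiers and N detections.
    """
    # Unique identifiers in first-seen order.
    identifiers = list(dict.fromkeys(d.identifier for d in detections))
    result: Dict[str, Tuple[Detection, List[str]]] = {}
    for identifier in identifiers:
        # First pass over ALL detections: find a representative detection.
        first = next(d for d in detections if d.identifier == identifier)
        # Second pass over ALL detections: collect every matching file path.
        paths = [d.path for d in detections if d.identifier == identifier]
        result[identifier] = (first, paths)
    return result
```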
Using a dict/hashmap here would fix this issue. See the time taken for the same scan before and after the modifications below: before: refer to #3245 (comment)
We were iterating over all license detections for each unique identifier, which was taking forever to complete; this approach uses a dict/hashmap instead, which fixes the issue. Reference: #3245 Signed-off-by: Ayan Sinha Mahapatra <ayansmahapatra@gmail.com> Reported-by: Philippe Ombredanne <pombredanne@nexb.com>
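A dict-based grouping can be sketched as follows. This is an illustrative assumption, not the code from the actual fix: the Detection class and unique_detections_fast are hypothetical names. A single pass builds a map from each identifier to its first detection and to the list of paths where it occurs.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Detection:
    # Hypothetical stand-in for a license detection (see lead-in).
    identifier: str
    path: str


def unique_detections_fast(
    detections: List[Detection],
) -> Dict[str, Tuple[Detection, List[str]]]:
    """Group detections by identifier in a single O(N) pass."""
    first_seen: Dict[str, Detection] = {}
    paths_by_id: Dict[str, List[str]] = defaultdict(list)
    for d in detections:
        # setdefault keeps only the first detection seen for each identifier.
        first_seen.setdefault(d.identifier, d)
        # Average O(1) dict lookup instead of rescanning the whole list.
        paths_by_id[d.identifier].append(d.path)
    # Dicts preserve insertion order, so output follows first-seen order.
    return {ident: (det, paths_by_id[ident]) for ident, det in first_seen.items()}
```

Each detection is visited once and dict lookups are O(1) on average, so the total cost stays linear in the number of detections no matter how many distinct identifiers there are.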
@pombredanne Do you think we should have a CI test checking for these kinds of issues in the main package/license/copyright scans, to guard all PRs against scans that take much longer or choke? (Say, a CI check that fails if the scan time increases by more than 5-10% compared to the same scan without the change?)
Yes! But it should be possible to do this using timings rather than running long tests.
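One way to follow the suggestion of using timings instead of long tests is to record a baseline duration for a representative scan and fail the check when a new measurement exceeds it by more than a tolerance such as the 5% or 10% mentioned above. This is a sketch under assumptions, not an existing scancode-toolkit feature: time_call and is_regression are hypothetical helpers, and how the baseline is stored and which scan is timed are left open.

```python
import time
from typing import Callable


def time_call(fn: Callable[[], object], repeat: int = 3) -> float:
    """Return the best wall-clock time in seconds over `repeat` runs of fn."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def is_regression(baseline: float, current: float, tolerance: float = 0.10) -> bool:
    """True if `current` exceeds `baseline` by more than `tolerance` (0.10 = 10%)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return current > baseline * (1.0 + tolerance)
```

Taking the best of several runs reduces, but does not remove, timing noise on shared CI runners, so a very tight tolerance may produce flaky failures.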
Fix choking license detection post-processing #3245
The attached zip with licenses from https://open.windriver.com/info/uni-license-list/index.html takes forever to scan:
unilic-licenses.zip
$ scancode -l --license-text --license-text-diagnostics --yaml - --json-pp ~/tmp/unilic.json --csv ~/tmp/unilic.csv -n6 ~/tmp/unilic/
The scan starts and scans all files, then likely chokes when post-processing the codebase. This is a blocker for v32.