Create a code matcher for approximate files matching #342

Closed
1 task done
Tracked by #239
pombredanne opened this issue Mar 12, 2024 · 1 comment
Labels
high priority High Priority

Comments

@pombredanne
Member

pombredanne commented Mar 12, 2024

See also:

@pombredanne pombredanne mentioned this issue Mar 12, 2024
3 tasks
@pombredanne pombredanne added the high priority High Priority label Apr 3, 2024
JonoYang added a commit that referenced this issue Apr 24, 2024
Signed-off-by: Jono Yang <jyang@nexb.com>
JonoYang added a commit that referenced this issue Apr 24, 2024
Signed-off-by: Jono Yang <jyang@nexb.com>
JonoYang added a commit that referenced this issue Apr 25, 2024
    * Fix tests

Signed-off-by: Jono Yang <jyang@nexb.com>
JonoYang added a commit that referenced this issue May 10, 2024
Signed-off-by: Jono Yang <jyang@nexb.com>
JonoYang added a commit that referenced this issue May 10, 2024
Signed-off-by: Jono Yang <jyang@nexb.com>
JonoYang added a commit that referenced this issue May 10, 2024
    * Fix tests

Signed-off-by: Jono Yang <jyang@nexb.com>
JonoYang added a commit that referenced this issue May 10, 2024
    * Move match.py to match_test_utils.py, as we now use those functions only for testing match functionality rather than for actual matching

Signed-off-by: Jono Yang <jyang@nexb.com>
JonoYang added a commit that referenced this issue May 11, 2024
Signed-off-by: Jono Yang <jyang@nexb.com>
JonoYang added a commit that referenced this issue May 14, 2024
Signed-off-by: Jono Yang <jyang@nexb.com>
JonoYang added a commit that referenced this issue May 14, 2024
    * Use new test data

Signed-off-by: Jono Yang <jyang@nexb.com>
JonoYang added a commit that referenced this issue May 14, 2024
Signed-off-by: Jono Yang <jyang@nexb.com>
JonoYang added a commit that referenced this issue May 14, 2024
Signed-off-by: Jono Yang <jyang@nexb.com>
JonoYang added a commit that referenced this issue May 14, 2024
Signed-off-by: Jono Yang <jyang@nexb.com>
JonoYang added a commit that referenced this issue May 14, 2024
Signed-off-by: Jono Yang <jyang@nexb.com>
JonoYang added a commit that referenced this issue May 14, 2024
Signed-off-by: Jono Yang <jyang@nexb.com>
JonoYang added a commit that referenced this issue May 15, 2024
@JonoYang
Contributor

JonoYang commented May 16, 2024

This has been merged into main. We created a new table for storing approximate file fingerprints and updated the indexing functions to index those values from scans. We also updated the fingerprinting functions in scancode-toolkit to generate fingerprints for text files. With these changes, we are now able to perform approximate file matching in the matching pipeline run by matchcode: https://github.com/nexB/purldb/blob/main/matchcode_pipeline/pipelines/matching.py#L109
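
For readers unfamiliar with the idea, here is a minimal sketch of how approximate file matching by fingerprint can work in general. It is a generic SimHash-style illustration, not the scancode-toolkit or matchcode implementation; the function names `simhash` and `hamming_distance`, the tokenization, and the 64-bit size are all assumptions made for this example.

```python
import hashlib
import re


def tokens(text):
    """Split text into lowercase word tokens (an assumed, simplistic tokenizer)."""
    return re.findall(r"\w+", text.lower())


def simhash(text, bits=64):
    """Return a SimHash-style fingerprint of `text` as an integer of `bits` bits."""
    counts = [0] * bits
    for tok in tokens(text):
        # Hash each token to a fixed-width integer.
        digest = hashlib.md5(tok.encode("utf-8")).digest()
        h = int.from_bytes(digest[: bits // 8], "big")
        # Each token votes +1 or -1 on every bit position.
        for i in range(bits):
            counts[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i, count in enumerate(counts):
        if count > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a, b):
    """Count the bit positions where two fingerprints differ."""
    return bin(a ^ b).count("1")
```

With a scheme like this, two files are treated as approximate matches when the Hamming distance between their fingerprints falls under some chosen threshold, so a lightly modified file can still be matched to its indexed original.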

Test instructions, after installing and running PurlDB with its accompanying ScanCode.io worker and MatchCode.io:

  1. Index the package pkg:npm/deep-equal@1.0.1 using the api/collect/index_packages endpoint
  2. Using the api/matching/ endpoint, upload the test file https://github.com/nexB/purldb/blob/main/matchcode/tests/testfiles/match/approximate-file-matching/index-modified.js
  3. The package data for pkg:npm/deep-equal@1.0.1 should be present at api/matching/<uuid>/results
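
The three steps above can be sketched with a few helpers. This is only a sketch under stated assumptions: the base URLs are placeholders for a local install, the request payloads are not shown because their shape is not given here, and `contains_value` is a hypothetical helper that searches a parsed JSON response for the expected purl without assuming any particular response layout.

```python
from urllib.parse import urljoin

# Assumed placeholder hosts; replace with the addresses of your own install.
PURLDB_URL = "http://localhost:8001/"
MATCHCODE_URL = "http://localhost:8002/"

EXPECTED_PURL = "pkg:npm/deep-equal@1.0.1"


def index_packages_url(base_url=PURLDB_URL):
    """URL of the endpoint used in step 1 to index a package."""
    return urljoin(base_url, "api/collect/index_packages")


def matching_url(base_url=MATCHCODE_URL):
    """URL of the endpoint used in step 2 to upload the test file."""
    return urljoin(base_url, "api/matching/")


def results_url(run_uuid, base_url=MATCHCODE_URL):
    """URL checked in step 3 for the results of a matching run."""
    return urljoin(base_url, f"api/matching/{run_uuid}/results")


def contains_value(data, needle):
    """Return True if `needle` occurs in any string nested inside `data`."""
    if isinstance(data, dict):
        return any(contains_value(value, needle) for value in data.values())
    if isinstance(data, (list, tuple)):
        return any(contains_value(item, needle) for item in data)
    return isinstance(data, str) and needle in data
```

After fetching the JSON from `results_url(<uuid>)` with any HTTP client, `contains_value(parsed_json, EXPECTED_PURL)` gives a quick check that the package data for pkg:npm/deep-equal@1.0.1 is present.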

Projects
Status: Done

2 participants