Document matching algorithm interface #593

Open
tschmidtb51 opened this issue Nov 21, 2022 · 0 comments
tschmidtb51 commented Nov 21, 2022

We need to document the matching algorithm interface for CSAF asset matching systems and CSAF SBOM matching systems. Both work similarly:

Input

match(product_tree, asset_database_connection, matching_threshold)

 - resp. - 
 
match(product_tree, sbom_database_connection, matching_threshold)

Output

for each product_id in product_tree:
   a list of tuples (asset_id, probability, matching_reason)

 - resp. -

for each product_id in product_tree:
   a list of tuples (sbom_component_id, probability, matching_reason)
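A minimal Python sketch of the result shape described above, together with one possible way to enumerate the products in a `product_tree`. The names `Match`, `MatchResult`, `iter_products` and `example_tree` are hypothetical and only illustrate the idea. The sketch assumes the product tree is the parsed CSAF JSON and, for brevity, ignores products defined in `relationships`.

```python
from typing import Any, Dict, Iterator, List, NamedTuple, Tuple


class Match(NamedTuple):
    """One result tuple; target_id is an asset_id or an sbom_component_id."""
    target_id: str
    probability: float  # assumed to lie between 0.0 and 1.0
    matching_reason: str


# Output shape: for each product_id, a list of result tuples.
MatchResult = Dict[str, List[Match]]


def iter_products(
    product_tree: Dict[str, Any],
) -> Iterator[Tuple[Dict[str, Any], Dict[str, str]]]:
    """Yield (product, {branch category: branch name}) for each product."""
    # Products listed directly have no branch path.
    for product in product_tree.get("full_product_names", []):
        yield product, {}

    def walk(branches, path):
        for branch in branches:
            here = {**path, branch["category"]: branch["name"]}
            if "product" in branch:
                yield branch["product"], here
            yield from walk(branch.get("branches", []), here)

    yield from walk(product_tree.get("branches", []), {})


# Made-up example data, not taken from a real advisory.
example_tree = {
    "branches": [{
        "category": "vendor", "name": "Example Corp",
        "branches": [{
            "category": "product_name", "name": "Controller",
            "branches": [{
                "category": "product_version", "name": "1.0",
                "product": {
                    "product_id": "CSAFPID-0001",
                    "name": "Example Corp Controller 1.0",
                },
            }],
        }],
    }],
}
products = list(iter_products(example_tree))
```

Collecting the branch names along the path keeps the categorized strings (`vendor`, `product_name`, `product_version`) available next to each product, which the matching step needs.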

Matching

Algorithm - priorities:

  1. Match based on product_identification_helper. Different ones might imply a different confidence: An sbom_url or serial_number might be stronger than a cpe.
  2. Match based on the categorized strings (value of name) in the branches (e.g. vendor, product_name, product_version).
  3. Match on the human-readable full_product_name_t/name.

The algorithm may stop as soon as it has produced a sufficient result: it can, but does not have to, go through all steps.
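The priorities above could be sketched as a cascade that returns at the first step producing a hit. This is only an illustration under several assumptions: the function `score`, the confidence numbers, and the flat asset record layout (`identifiers`, `vendor`, `product_name`, `product_version`, `name`) are all hypothetical, and the string similarity uses Python's `difflib` simply as a stand-in for whatever fuzzy matching an implementation chooses.

```python
from difflib import SequenceMatcher
from typing import Any, Dict, Tuple

# Hypothetical confidences: stronger identifiers are checked first.
HELPER_CONFIDENCE = {"serial_numbers": 1.0, "sbom_urls": 0.95, "cpe": 0.8}
BRANCH_CATEGORIES = ("vendor", "product_name", "product_version")


def score(product: Dict[str, Any], branch_names: Dict[str, str],
          asset: Dict[str, Any]) -> Tuple[float, str]:
    """Return (probability, matching_reason) for one product/asset pair."""
    # Step 1: product_identification_helper.
    helper = product.get("product_identification_helper", {})
    identifiers = asset.get("identifiers", {})
    for key, confidence in HELPER_CONFIDENCE.items():
        value = helper.get(key)
        if not value:
            continue
        values = set(value) if isinstance(value, list) else {value}
        if values & set(identifiers.get(key, [])):
            return confidence, f"product_identification_helper/{key}"

    # Step 2: categorized branch strings, compared case-insensitively.
    compared = [c for c in BRANCH_CATEGORIES if c in branch_names and c in asset]
    if compared and all(
        branch_names[c].casefold() == asset[c].casefold() for c in compared
    ):
        return 0.7, "branches: " + ", ".join(compared)

    # Step 3: similarity of the human-readable name, scaled down because
    # it is the weakest signal.
    ratio = SequenceMatcher(
        None, product.get("name", "").casefold(), asset.get("name", "").casefold()
    ).ratio()
    return 0.5 * ratio, "full_product_name_t/name similarity"


# Made-up example data.
product = {
    "product_id": "CSAFPID-0001",
    "name": "Example Corp Controller 1.0",
    "product_identification_helper": {"serial_numbers": ["SN-42"]},
}
branch_names = {"vendor": "Example Corp", "product_name": "Controller",
                "product_version": "1.0"}
asset_by_serial = {"identifiers": {"serial_numbers": ["SN-42"]}, "name": "plc-7"}
asset_by_branch = {"vendor": "example corp", "product_name": "Controller",
                   "product_version": "1.0", "name": "ctrl"}
```

Here `score(product, branch_names, asset_by_serial)` stops at step 1, while `asset_by_branch` has no identifiers and is matched at step 2. An implementation could equally run all steps and combine the evidence; the early return is just the simplest reading of "may stop".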

Edit: Experience shows that we also want a matching_threshold and a matching_reason. The matching_threshold lets us fine-tune the lowest probability for which we get results: a matching_threshold of 0 would give, for each asset/SBOM component, the probability with which it matched (which might be 0 if the two are completely different). The matching_reason provides insight into the confidence and helps with debugging (a direct match on a serial number would potentially be better than a match on the human-readable string).
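To illustrate the effect of the threshold, here is a small sketch that filters already scored candidates. The helper name `apply_threshold`, the inclusive comparison (`>=`) and the sort order are assumptions, and the candidate values are made up.

```python
from typing import List, Tuple

Candidate = Tuple[str, float, str]  # (asset_id, probability, matching_reason)


def apply_threshold(candidates: List[Candidate],
                    matching_threshold: float) -> List[Candidate]:
    """Keep candidates at or above the threshold, best match first."""
    kept = [c for c in candidates if c[1] >= matching_threshold]
    return sorted(kept, key=lambda c: c[1], reverse=True)


candidates = [
    ("asset-2", 0.35, "full_product_name_t/name similarity"),
    ("asset-1", 1.0, "product_identification_helper/serial_numbers"),
    ("asset-3", 0.0, "full_product_name_t/name similarity"),
]

everything = apply_threshold(candidates, 0)    # all three, including the 0.0 entry
confident = apply_threshold(candidates, 0.5)   # only asset-1
```

With a threshold of 0 every asset is reported together with its probability, which is useful for debugging; a higher threshold keeps only the matches an operator is likely to act on.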
