Implement a Heimdall HDF <-> SARIF converter #2286

Open
michaelcfanning opened this issue Feb 9, 2021 · 26 comments

Comments

@michaelcfanning
Member

We should investigate a converter both to send SARIF to HDF and to drop HDF back down to SARIF.

https://heimdall-lite.mitre.org/
https://saf.cms.gov/#/
https://www.youtube.com/watch?v=Vgr5wR1SFuA

@michaelcfanning
Member Author

@eddynaka @yongyan-gh

@josepalafox

These may also be useful links that were shared:

https://github.com/mitre/inspecjs/tree/master/schemas

is their schema.

Their suggestion is to write the mapping into this action:

https://github.com/mitre/inspec_tools_action

@aaronlippold

I would say your first targets for HDF -> SARIF should be limited to the current heimdall-tools static analysis tools.

@michaelcfanning
Member Author

My first question for everyone, which direction serves everyone's interests best? HDF -> SARIF? or SARIF -> HDF? Both?

On the HDF repo, I note they reference a small ecosystem of converters from existing tool formats, including at least one (Fortify) for which we have some SARIF support. A SARIF -> HDF converter contributed here could help build out their ecosystem considerably, so that looks very useful. That is, any tool which produces SARIF directly or for which we have a native format to SARIF converter could in theory be transformed and sent to the Heimdall viewer/comparison tools and other tech.

This repo doesn't cover the direct HDF producers, which includes InSpec and who else? I think your comment above is to note that we should focus on converting these first? Is the idea that by implementing a HDF -> SARIF conversion we will accelerate getting these into GHAS?

@aaronlippold

SARIF->HDF

@aaronlippold

We get all the value from the tools that support SARIF, and we add the value of aligning to 800-53 controls, which makes all those tools even more valuable to all government customers, and ... we have a nice pretty viewer already made for it :)

@aaronlippold

Actually, we suggest adding a SARIF converter into heimdall-tools.mitre.org, which that action consumes.

@aaronlippold

https://github.com/mitre/heimdall_tools/blob/master/lib/heimdall_tools/ provides many examples of our source -> hdf pattern.

@josepalafox

josepalafox commented Feb 9, 2021

> My first question for everyone, which direction serves everyone's interests best? HDF -> SARIF? or SARIF -> HDF? Both?

My understanding is our first target is SARIF -> HDF. This unblocks the opportunity we're focused on as it sends GHAS security data to the HDF data visualization tool.

> On the HDF repo, I note they reference a small ecosystem of converters from existing tool formats, including at least one (Fortify) for which we have some SARIF support. A SARIF -> HDF converter contributed here could help build out their ecosystem considerably, so that looks very useful. That is, any tool which produces SARIF directly or for which we have a native format to SARIF converter could in theory be transformed and sent to the Heimdall viewer/comparison tools and other tech.
>
> This repo doesn't cover the direct HDF producers, which includes InSpec and who else? I think your comment above is to note that we should focus on converting these first? Is the idea that by implementing a HDF -> SARIF conversion we will accelerate getting these into GHAS?

These two links were provided because we asked for a reference to the HDF schema to understand what it looks like and what we may be missing in SARIF. If you have another source for this, all good.

They mentioned they have a Gem for https://github.com/mitre/inspec_tools that already has a GH action. The explanation of the tool is that it just munges data, and that could be an appropriate place to add SARIF -> HDF.

@aaronlippold

Once we can go that way, we will have a much better picture of what makes sense going HDF -> SARIF.

@michaelcfanning
Member Author

michaelcfanning commented Feb 9, 2021

I definitely like your Heimdall viewer and profile differ! Great stuff. The SARIF value, of course, is directed more towards the developer experience. i.e., Visual Studio, and VS Code and the GHAS UX are pretty nice viewers, too. :) The SARIF format emphasis here (not sure the degree to which HDF allows this) is to transport additional context, like code snippets, complete source files, or references to enlistment/branch details to seamlessly allow developers to definitively diagnose and then 'jump into' a remediation experience (i.e., actually start coding a fix). The HDF experience seems more strongly aligned around reporting out, i.e., conformance to compliance standards, progress since last profile, etc. And so, it does look like SARIF -> HDF is the right direction, as we can drop the supporting diagnostics related to individual issues and get to the bucketing/filtering/etc. in the HDF visualizer.

Please correct any of my worldview above that requires fixing. :)

In case it isn't clear from my replies so far, I've got the link to the JSON schema, thanks for that. I also took a look at your converter repo and it answers an open question of mine: how to map a native tool's rule ids to CWE/HDF-compliant controls. It looks like you maintain these mappings in the repo as CSV consumed at conversion time.

One interesting thing about SARIF is that it supports expressing those mappings as SARIF files (which only hold this kind of classification/mapping data, which SARIF calls 'taxonomies'). These mappings can also be referenced within a SARIF log indirectly via a URL (reflecting the fact that this data might be maintained by someone other than the tool provider). So, a SARIF file could reference a remote description of CWEs, or your NIST codes or any other organizational schema and then decorate its own rules with statements like 'this rule id maps directly to XXX'. GrammaTech contributed this feature based on its prior work and so it has some sophistication to it. E.g., you can say, 'this rule of my tool is a superset of this other taxonomy's rule XXX' or simply note that two things relate. All of this is to say that what you are capturing in CSV here might be useful to formalize in a subset of SARIF JSON that is published on the web. The spec examples are mostly oriented around Mitre's CWE schema.
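As a very rough sketch (the tool name, rule id, and the specific CWE entry below are invented for illustration; only the overall shape matters), a run that declares a taxonomy and relates one of its rules to a taxon looks roughly like this:

    {
      "version": "2.1.0",
      "runs": [
        {
          "tool": {
            "driver": {
              "name": "ExampleAnalyzer",
              "rules": [
                {
                  "id": "EX1001",
                  "relationships": [
                    {
                      "target": { "id": "327", "toolComponent": { "name": "CWE", "index": 0 } },
                      "kinds": [ "superset" ]
                    }
                  ]
                }
              ]
            }
          },
          "taxonomies": [
            {
              "name": "CWE",
              "version": "4.3",
              "organization": "MITRE",
              "informationUri": "https://cwe.mitre.org/",
              "taxa": [
                { "id": "327", "name": "Use of a Broken or Risky Cryptographic Algorithm" }
              ]
            }
          ],
          "results": []
        }
      ]
    }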

This feature allows direct SARIF producers to emit relevant mappings such that we could perform the SARIF->HDF conversion strictly from the data in the SARIF log. Without it, someone would need to maintain this mapping externally, as you appear to do today, in whatever format is appropriate (CSV or web-hosted SARIF).

This is likely getting into more detail on a single topic than serves this thread, though. :) Maybe I can take a day or two to continue to explore, and if I could get on a call with an appropriate audience, we can plan a path forward?

@aaronlippold

> I definitely like your Heimdall viewer and profile differ! Great stuff. The SARIF value, of course, is directed more towards the developer experience. i.e., Visual Studio, and VS Code and the GHAS UX are pretty nice viewers, too. :) The SARIF format emphasis here (not sure the degree to which HDF allows this) is to transport additional context, like code snippets, complete source files, or references to enlistment/branch details to seamlessly allow developers to definitively diagnose and then 'jump into' a remediation experience (i.e., actually start coding a fix). The HDF experience seems more strongly aligned around reporting out, i.e., conformance to compliance standards, progress since last profile, etc. And so, it does look like SARIF -> HDF is the right direction, as we can drop the supporting diagnostics related to individual issues and get to the bucketing/filtering/etc. in the HDF visualizer.

At least for the first iteration.

> Please correct any of my worldview above that requires fixing. :)

> In case it isn't clear from my replies so far, I've got the link to the JSON schema, thanks for that. I also took a look at your converter repo and it answers an open question of mine: how to map a native tool's rule ids to CWE/HDF-compliant controls. It looks like you maintain these mappings in the repo as CSV consumed at conversion time.

Right, so for the first iteration we go SARIF -> HDF, which allows us to get a handle on an agreed mapping of 800-53 controls to known issues (CWE, etc.), which @ejaronne can help with.

> One interesting thing about SARIF is that it supports expressing those mappings as SARIF files (which only hold this kind of classification/mapping data, which SARIF calls 'taxonomies'). These mappings can also be referenced within a SARIF log indirectly via a URL (reflecting the fact that this data might be maintained by someone other than the tool provider). So, a SARIF file could reference a remote description of CWEs, or your NIST codes or any other organizational schema and then decorate its own rules with statements like 'this rule id maps directly to XXX'. GrammaTech contributed this feature based on its prior work and so it has some sophistication to it. E.g., you can say, 'this rule of my tool is a superset of this other taxonomy's rule XXX' or simply note that two things relate. All of this is to say that what you are capturing in CSV here might be useful to formalize in a subset of SARIF JSON that is published on the web. The spec examples are mostly oriented around Mitre's CWE schema.

Yes, MITRE maintains the CWE and CVE databases, and we could engage that team to help us with the alignment. I have also talked to them before about putting the 800-53 mapping directly in those data sources, so perhaps we can circle back around to that.

> This feature allows direct SARIF producers to emit relevant mappings such that we could perform the SARIF->HDF conversion strictly from the data in the SARIF log. Without it, someone would need to maintain this mapping externally, as you appear to do today, in whatever format is appropriate (CSV or web-hosted SARIF).

> This is likely getting into more detail on a single topic than serves this thread, though. :) Maybe I can take a day or two to continue to explore, and if I could get on a call with an appropriate audience, we can plan a path forward?

So the breakdown in my thought would be:

phase 0: Control mappings, data element mapping between the formats, and a first-cut HDF output
phase 1: inverse mappings and data element alignment, then mapping from HDF to SARIF - informed by phase 0
phase 2: iteration and adjustment on phases 0 and 1

In addition, we have to continue the conversation about how we communicate to users the 'relationships and buckets' of what this data shows and informs, and what it can't.

What do we think?

@shaopeng-gh
Collaborator

Update: I have created a PR to the Heimdall repo with the code for the "SARIF -> HDF converter".
mitre/heimdall_tools#93

The basic test I have done uses the sample Flawfinder CSV: convert it to SARIF, then use this new tool to convert the SARIF to HDF; the resulting HDF file can be loaded in https://heimdall-lite.mitre.org/

@eddynaka
Collaborator

eddynaka commented Jun 10, 2021

Hello,

just a quick update:

  1. we implemented the SARIF->HDF converter in the heimdall repository:
  2. we implemented the HDF->SARIF converter in the sarif-sdk repository:

With both of those in place, we now have the complete round trip: HDF -> SARIF and SARIF -> HDF.

@aaronlippold

aaronlippold commented Jun 10, 2021 via email

@eddynaka
Collaborator

Hi @aaronlippold ,

(1) We added it to the README just like the other tools:

sarif_mapper - static analysis results interchange format

(2) For the sarif-sdk, I will check.
(3) I didn't understand your point about check text vs. fix text. Can you explain?

thanks for the reply :)

@Bialogs

Bialogs commented Jun 10, 2021

@eddynaka He means linking to the HDF->SARIF converter in this repo from the HDF README. I will take care of it.

@michaelcfanning
Member Author

Great progress, everyone, and we've roughly knocked out our early proposed work. I wonder, should we get on a call and discuss how to build on it? I'd be glad to set that up.

@aaronlippold

aaronlippold commented Jun 11, 2021 via email

@aaronlippold

aaronlippold commented Jun 11, 2021 via email

@eddynaka
Collaborator

Hello,

Thanks for everyone's time!
Below is the workflow converting HDF -> SARIF -> upload to GitHub:
https://github.com/eddynaka/hdf-sarif-github/blob/main/.github/workflows/hdf-to-github.yml

@candrews
Collaborator

candrews commented Mar 2, 2023

> Below is the workflow converting HDF -> SARIF -> upload to GitHub:
> https://github.com/eddynaka/hdf-sarif-github/blob/main/.github/workflows/hdf-to-github.yml

This link is broken :(

How does this workflow work? I'd really like to do HDF->SARIF->Upload to GitHub, but I've been unable to find out how to do so.

@aaronlippold

I'm not sure where that code ran off to, but the SAF CLI tool should still support converting HDF to SARIF.

We also have a SAF CLI GitHub Action.

The actual upload into GitHub Advanced Security would be something we'd likely have to work on together, basically finding the right API to push to.

Hopefully the author will respond and save us the effort :-)

@yongyan-gh
Collaborator

You can add the steps below to a GitHub workflow to convert HDF to SARIF and upload it to GHAS; a consolidated sketch of the whole job follows the steps.

  1. Install SARIF Multitool (CLI):
      - name: Install Sarif Multitool package
        run: dotnet tool install --global Sarif.Multitool
  2. Run SARIF Multitool to convert the HDF log to a SARIF log:
      - name: Convert HDF to SARIF
        run: sarif convert <HDF_LOG_FILE> -tool Hdf -output converted.sarif
  3. Upload the SARIF log to GHAS (via the code scanning upload action):
      - name: Upload SARIF log
        uses: github/codeql-action/upload-sarif@v2
        with:
          sarif_file: converted.sarif
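
Putting those steps together, one possible job (just a sketch: the checkout step, runner image, and security-events permission are assumptions about a typical setup, and <HDF_LOG_FILE> remains a placeholder for your own report) would look roughly like:

      name: hdf-to-ghas
      on: [push]
      permissions:
        contents: read
        security-events: write   # needed to upload SARIF to code scanning
      jobs:
        convert-and-upload:
          runs-on: ubuntu-latest  # assumption: hosted runner with the .NET SDK preinstalled
          steps:
            - uses: actions/checkout@v3
            - name: Install Sarif Multitool package
              run: dotnet tool install --global Sarif.Multitool
            - name: Convert HDF to SARIF
              run: sarif convert <HDF_LOG_FILE> -tool Hdf -output converted.sarif
            - name: Upload SARIF log
              uses: github/codeql-action/upload-sarif@v2
              with:
                sarif_file: converted.sarif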

Please let us know if you have any questions.

@aaronlippold

aaronlippold commented Mar 3, 2023 via email

@candrews
Collaborator

I implemented @yongyan-gh's approach in #2286 (comment) and found that it comes close, but unfortunately doesn't work.

The SARIF is generated and sent to GitHub, but GitHub fails to parse it due to missing location data (which it requires):

Error: Code Scanning could not process the submitted SARIF file:
locationFromSarifResult: expected at least one location, locationFromSarifResult: expected at least one location, locationFromSarifResult: expected at least one location, locationFromSarifResult: expected at least one location, locationFromSarifResult: expected at least one location, locationFromSarifResult: expected at least one location, locationFromSarifResult: expected at least one location, locationFromSarifResult: expected at least one location, locationFromSarifResult: expected at least one location, locationFromSarifResult: expected at least one location, locationFromSarifResult: expected at least one location, locationFromSarifResult: expected at least one location, locationFromSarifResult: expected at least one location, locationFromSarifResult: expected at least one location, locationFromSarifResult: expected at least one location, locationFromSarifResult: expected at least one location, locationFromSarifResult: expected at least one location, locationFromSarifResult: expected at least one location, locationFromSarifResult: expected at least one location, locationFromSarifResult: expected at least one location, locationFromSarifResult: expected at least one location
Error: Code Scanning could not process the submitted SARIF file:
locationFromSarifResult: expected at least one location, locationFromSarifResult: expected at least one location, locationFromSarifResult: expected at least one location, locationFromSarifResult: expected at least one location, locationFromSarifResult: expected at least one location, locationFromSarifResult: expected at least one location, locationFromSarifResult: expected at least one location, locationFromSarifResult: expected at least one location, locationFromSarifResult: expected at least one location, locationFromSarifResult: expected at least one location, locationFromSarifResult: expected at least one location, locationFromSarifResult: expected at least one location, locationFromSarifResult: expected at least one location, locationFromSarifResult: expected at least one location, locationFromSarifResult: expected at least one location, locationFromSarifResult: expected at least one location, locationFromSarifResult: expected at least one location, locationFromSarifResult: expected at least one location, locationFromSarifResult: expected at least one location, locationFromSarifResult: expected at least one location, locationFromSarifResult: expected at least one location

-- https://github.com/candrews/jumpstart/actions/runs/5603707977/jobs/10250839746?pr=884#step:10:22

GitHub indicates this requirement in their documentation at https://docs.github.com/en/code-security/code-scanning/integrating-with-code-scanning/sarif-support-for-code-scanning#physicallocation-object

Trivy had the same problem a while back (see aquasecurity/trivy#1038); they solved it by adding location/region information to the SARIF: AndreyLevchenko/trivy@a8ec7ec
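
For reference, my understanding from that documentation is that each result needs at least a minimal location along these lines (the rule id, path, and line number here are made up):

    {
      "ruleId": "V-12345",
      "message": { "text": "Example finding" },
      "locations": [
        {
          "physicalLocation": {
            "artifactLocation": { "uri": "path/to/offending/file" },
            "region": { "startLine": 1 }
          }
        }
      ]
    }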

Perhaps this tool could similarly add this information when it converts HDF->SARIF?
