Scorecards collector/certifier #249

Closed
lumjjb opened this issue Nov 28, 2022 · 14 comments · Fixed by #459
Labels: data-sources, good first issue, help wanted, priority

Comments

@lumjjb
Contributor

lumjjb commented Nov 28, 2022

Build a scorecards collector/certifier that, given a list of repos or git commit IDs, obtains scorecards information and emits scorecard documents. It should:

  • Be configured with a list of repos/commit IDs to collect from
  • Either on a poll or watch basis, collect information from git repos
  • Emit scorecard information

This issue is to create an implementation of the Certifier interface that finds artifacts within the graph, runs scorecards on their repo/hash targets, and emits the scorecards data for ingestion.

This is similar to the OSV certifier, with the difference that:

  • Instead of querying for packages, we query for artifact nodes whose name property starts with "git+". This will be part of the QueryComponents interface, and involves sub-steps including:
    • Changing the QueryComponents interface to be able to return assembler.ArtifactNode as well
    • Implementing a Components interface similar to the one for root packages, but one that returns the artifact nodes which represent source repos (heuristically, by name starting with "git+")
  • Instead of querying OSV for details of the vulnerabilities, invoke the scorecards binary to get the scorecards output in JSON, construct the document for scorecards (example), and emit it to the doc channel as the implementation of the Certify function
  • Create a guacone command called scorecards, which will be almost an exact copy of the one for the OSV certifier and can be used for testing purposes.
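As a rough illustration of the "git+" heuristic in the first bullet, the filtering step could look something like the sketch below. The ArtifactNode type here is a hypothetical stand-in for assembler.ArtifactNode, and sourceRepoArtifacts is an illustrative name, not an existing GUAC function.

```go
package main

import (
	"fmt"
	"strings"
)

// ArtifactNode is a hypothetical stand-in for assembler.ArtifactNode,
// reduced to the two fields used in this issue.
type ArtifactNode struct {
	Name   string
	Digest string
}

// sourceRepoArtifacts keeps only the nodes whose name starts with "git+",
// the heuristic this issue proposes for identifying source repos.
func sourceRepoArtifacts(nodes []ArtifactNode) []ArtifactNode {
	var repos []ArtifactNode
	for _, n := range nodes {
		if strings.HasPrefix(n.Name, "git+") {
			repos = append(repos, n)
		}
	}
	return repos
}

func main() {
	nodes := []ArtifactNode{
		{Name: "git+github.com/guacsec/guac", Digest: "sha1:fe83870c942cbe905c948bae46344f5d15b0622f"},
		// Hypothetical non-source artifact, included only to show it being skipped.
		{Name: "some-binary", Digest: "sha256:0000"},
	}
	for _, n := range sourceRepoArtifacts(nodes) {
		fmt.Println(n.Name, n.Digest)
	}
}
```

In the real certifier these nodes would come from a graph query rather than a slice, so this only shows the selection rule.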

As an example of the expected outcome, suppose the graph has 3 artifact nodes:

Artifact{ name:"git+github.com/guacsec/guac", digest: "sha1:fe83870c942cbe905c948bae46344f5d15b0622f"}
Artifact{ name:"git+github.com/guacsec/guac", digest: "sha1:7985af2e2375fee1f5d9cb28dff97ce8a1e146a5"}
Artifact{ name:"git+github.com/kubernetes/kubernetes", digest: "sha1:afe936fee5229cfd3d9831f439e8feab98a3fcad"}

The certifier should end up running scorecards on the 3 targets:

scorecard --repo=github.com/guacsec/guac --commit=fe83870c942cbe905c948bae46344f5d15b0622f
...

It should then emit the scorecards JSON documents to the doc channel, from which they will be ingested into GUAC.
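The mapping from an artifact node to a scorecard invocation could be sketched roughly as follows. The helper name scorecardArgs is hypothetical; the only things taken from this issue are the "git+" name prefix, the "sha1:" digest prefix, and the --repo and --commit flags.

```go
package main

import (
	"fmt"
	"strings"
)

// scorecardArgs turns an artifact name and digest, in the form used in this
// issue, into the --repo and --commit flags for the scorecard CLI.
func scorecardArgs(name, digest string) ([]string, error) {
	if !strings.HasPrefix(name, "git+") {
		return nil, fmt.Errorf("artifact %q is not a git source repo", name)
	}
	if !strings.HasPrefix(digest, "sha1:") {
		return nil, fmt.Errorf("digest %q is not a sha1 commit ID", digest)
	}
	repo := strings.TrimPrefix(name, "git+")
	commit := strings.TrimPrefix(digest, "sha1:")
	return []string{"--repo=" + repo, "--commit=" + commit}, nil
}

func main() {
	args, err := scorecardArgs("git+github.com/guacsec/guac", "sha1:fe83870c942cbe905c948bae46344f5d15b0622f")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// Prints: scorecard --repo=github.com/guacsec/guac --commit=fe83870c942cbe905c948bae46344f5d15b0622f
	fmt.Println("scorecard " + strings.Join(args, " "))
}
```

Returning an error for anything that does not match the expected prefixes keeps non-source artifacts from being passed to scorecard by accident.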

The PRs should closely follow something like #245, with the changes mentioned above.

lumjjb mentioned this issue Nov 28, 2022
lumjjb added this to the GUAC Beta v0.1 milestone Nov 29, 2022
lumjjb added the good first issue and help wanted labels Dec 7, 2022
@lumjjb
Contributor Author

lumjjb commented Jan 18, 2023

@nathannaveen Thanks for all your PRs and for helping out the project! Would this issue be of interest to you? If so, I can spend some time fleshing out some details around a design.

@nathannaveen
Contributor

> @nathannaveen Thanks for all your PRs and helping out the project! Would this issue be of interest to you? If so, i can spend some time to flesh out some details around a design.

Yes, that would be interesting. I'm new to this project, so I might need some guidance. Thanks!

@lumjjb
Contributor Author

lumjjb commented Jan 21, 2023

Hi @nathannaveen! I've added more details to the issue. Let me know if you have more questions or if there is anything else I can do to clarify what the issue is asking for!

@nathannaveen
Contributor

Hi @lumjjb, thanks for the details! I have a few clarifying questions.


When you say:

"Changing the QueryComponents interface to be able to return assembler.ArtifactNode as well"

Do you mean to include another function in QueryComponents that returns an assembler.ArtifactNode? Would it be like GetAssembler(ctx context.Context, compChan chan<- *processor.Document) error?

Adding this function to QueryComponents would introduce breaking changes in root_package.go and certify_test.go.

When you say:

"Implement a Components interface similar to the one for root packages, but instead returns the artifact nodes which represent source repos (heuristically by name starting with "git+")"

Do you mean to create something like:

func NewScorecardQuery(repo string) certifier.QueryComponents {
	return &packageQuery{
		client: repo,
	}
}

Do we want to get scorecard results using a library or a binary? As far as I can see, the library isn't supported: #127 (comment). If we are using the binary, where do we store it?

Scorecard also has a REST API and a BigQuery data set. Have we considered using these as well? If there is a reason not to, what is it?

It looks like there is work being done to query the BigQuery data set in #214. We could potentially use some of that work to query scorecard data.


What does the certifier file do (https://github.com/guacsec/guac/blob/main/cmd/guacone/cmd/certifier.go)? I don't see the certifier file using OSV.

@pxp928
Collaborator

pxp928 commented Jan 23, 2023

> Do you mean to include another function in QueryComponents that returns a assembler.ArtifactNode? Would it be like GetAssembler(ctx context.Context, compChan chan<- *processor.Document) error?
> Including this additional function to QueryComponents will create breaking changes in root_package.go and certify_test.go.

One option is to change the interfaces to use interface{} instead:

type Certifier interface {
	// CertifyComponent takes the type Component and recursively scans each dependency,
	// aggregating the results for the top/root level artifact. As attestation documents are generated,
	// they are pushed to the docChannel to be ingested
	CertifyComponent(ctx context.Context, rootComponent interface{}, docChannel chan<- *processor.Document) error
}

type QueryComponents interface {
	// GetComponents runs as a goroutine to get the components that will be certified by the Certifier interface
	GetComponents(ctx context.Context, compChan chan<- interface{}) error
}

and use a type assertion in osv.go:

// CertifyComponent takes in the root component from the guac database and does a recursive scan
// to generate vulnerability attestations
func (o *osvCertifier) CertifyComponent(ctx context.Context, rootComponent interface{}, docChannel chan<- *processor.Document) error {
	// Use the comma-ok form so an unexpected type returns an error instead of panicking.
	// This requires importing "fmt".
	component, ok := rootComponent.(*certifier.Component)
	if !ok {
		return fmt.Errorf("rootComponent is %T, expected *certifier.Component", rootComponent)
	}
	o.rootComponents = component
	_, err := o.certifyHelper(ctx, o.rootComponents, docChannel)
	return err
}

This will make it generic and can be used for anything.

@pxp928
Collaborator

pxp928 commented Jan 23, 2023

> When you say:
>
> "Implement a Components interface similar to the one for root packages, but instead returns the artifact nodes which represent source repos (heuristically by name starting with "git+")"
>
> Do you mean to create something like:
>
> func NewScorecardQuery(repo string) certifier.QueryComponents {
> 	return &packageQuery{
> 		client: repo,
> 	}
> }

Yes that is correct.

@nathannaveen
Contributor

Thanks for your suggestions!

Scorecard also needs a GitHub API token, which can be passed to NewScorecardQuery().
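A rough sketch of what passing the token through the constructor might look like is below. The scorecardQuery type, its fields, and the lowercase newScorecardQuery helper are all hypothetical, and reading the token from the GITHUB_AUTH_TOKEN environment variable is an assumption about how it would be supplied.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// scorecardQuery is a hypothetical query type; the fields are illustrative only.
type scorecardQuery struct {
	repo  string
	token string
}

// newScorecardQuery is a hypothetical constructor that refuses to build a
// query without a GitHub API token, so the failure shows up early.
func newScorecardQuery(repo, token string) (*scorecardQuery, error) {
	if token == "" {
		return nil, errors.New("a GitHub API token is required to run scorecard")
	}
	return &scorecardQuery{repo: repo, token: token}, nil
}

func main() {
	// Assumption: the token is supplied through the GITHUB_AUTH_TOKEN environment variable.
	q, err := newScorecardQuery("github.com/guacsec/guac", os.Getenv("GITHUB_AUTH_TOKEN"))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("querying", q.repo)
}
```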

@pxp928
Collaborator

pxp928 commented Jan 24, 2023

I created a PR to make the certifier generic, which should make it easier for you to integrate scorecard. I will fix up the tests and merge soon.

@pxp928
Collaborator

pxp928 commented Jan 24, 2023

PR #340 is merged. It should make it easier to integrate the scorecard collector :)

@nathannaveen
Contributor

@pxp928 Thanks!

@lumjjb Could you please answer these questions because I am blocked?

Do we want to get scorecard results using a library or a binary? As far as I can see, the library isn't supported: #127 (comment). If we are using the binary, where do we store it?

Scorecard also has a REST API and a BigQuery data set. Have we considered using these as well? If there is a reason not to, what is it?

It looks like there is work being done to query the BigQuery data set in #214. We could potentially use some of that work to query scorecard data.

@pxp928
Collaborator

pxp928 commented Jan 26, 2023

> Do we want to get scorecard using a library, or a binary? As far as I can see the library isn't supported, #127 (comment). If we are using binary where do we store the scorecard binary?
>
> Scorecard also has a REST API and a BigQuery data set, have we considered using these as well? Is there a reason not to, if so, why?

@naveensrinivasan any thoughts on the best way to implement this for GUAC? Has there been more work done on the library side for scorecard?

@lumjjb
Contributor Author

lumjjb commented Jan 30, 2023

Hi @naveensrinivasan sorry for the delayed response!

> Scorecard also has a REST API and a BigQuery data set, have we considered using these as well? Is there a reason not to, if so, why?

Unfortunately, the BigQuery data set is not complete. For the releases we looked at (Kubernetes, for example), the release commits do not have a row in the data set. GUAC needs results for the exact commit.

You're right... having it use the binary may be a bit tricky... For now, perhaps let's use the library, and I will follow up with scorecard folks to see if they will be able to keep supporting it going forward.

@lumjjb
Contributor Author

lumjjb commented Feb 7, 2023

Hi @nathannaveen, I've assigned you the issue! Let us know if you have any more questions!

lumjjb added the priority label Feb 7, 2023
@nathannaveen
Contributor

@lumjjb Thank you, I have started working on it!


3 participants