BUG: Binary-Artifact and Pinned-Dependencies kill Scorecard in a repo with large files #3831
Comments
Note: there's also some checks that use
Similarly, we check the first 1024 bytes in another part. scorecard/checks/raw/binary_artifact.go Lines 172 to 177 in 83ff808
I can't say I've been a fan of the read-it-all-at-once aspect, instead of using scorecard/clients/repo_client.go Lines 31 to 39 in 83ff808
When profiling the weekly cron with Not every Additionally, GetFileContent(filename string) (io.ReadCloser, error) There are a lot of existing usages of
type fileReader interface {
	ReadFile(string) (io.ReadCloser, error)
}

fr, ok := client.(fileReader)
if ok {
	// use the new method fr.ReadFile
} else {
	// use client.GetFileContent
}
@pnacht I didn't see a crash on my machine for the repo, but my VM may have more resources. Does this prototype branch eliminate the crash for https://github.com/spencerschrock/scorecard/tree/reader-partial-interface
Nope. But I noticed you added the new behavior to the
doh! pushed another commit. I'm noticing a significant speedup now
Yep, just ran it on my localdir and it runs!
The breaking change was already made in #3912, so removing from v5 milestone, but there are still some callers to
Describe the bug
Running Scorecard with Binary-Artifact and/or Pinned-Dependencies on a repo with large files crashes entirely.
Reproduction steps
I stumbled on this while trying to run Scorecard on a local clone of a HuggingFace model repository.
Steps to reproduce the behavior:
After deleting the very large files (including the .git folder), the checks pass. (There may be other checks that would also fail; I only tested those that run with --local.)
Expected behavior
The checks should work even with large files.
As described below, Binary-Artifacts doesn't need to load the entire file, and it's unlikely an actual script will ever be big enough to be a problem.
Additional context
I believe I understand why these checks are failing: both have at least one function (BinaryArtifacts and collectShellScriptInsecureDownloads) that runs fileparser.OnMatchingFileContentDo with Pattern: "*" (i.e. all files).
As the function name implies, this function sequentially opens and loads all matching files. I assume one of the files was simply too large.
This should be fixable, though:
- BinaryArtifacts uses fileparser.OnMatchingFileContentDo to call checkBinaryFileContent. That loads the file and then uses https://github.com/h2non/filetype to determine the file's type. This can be replaced by: fileparser.OnMatchingFileContentDo with a function that only loads the first 262 bytes h2non/filetype needs
- collectShellScriptInsecureDownloads could be set to only run on files with common script extensions (i.e. .sh, .bash, .ps, and no extension).