Skip to content
This repository has been archived by the owner on Jan 10, 2024. It is now read-only.

Using regex to scan for specific credentials #209

Open
ericwb opened this issue Apr 26, 2019 · 13 comments
Open

Using regex to scan for specific credentials #209

ericwb opened this issue Apr 26, 2019 · 13 comments
Assignees
Labels
enhancement New feature or request
Milestone

Comments

@ericwb
Copy link
Collaborator

ericwb commented Apr 26, 2019

Is your feature request related to a problem? Please describe.
See https://github.com/Yelp/detect-secrets for reference, but it would be useful for Precaution to also warn on secrets, Confidential code headers, etc. So level of detection is done through GitHub token scanning, but not all.

Note that also detect-secrets operates on all files (json, txt, etc), and not just languages files. There are definitely cases where API keys are stuffed into shell scripts, json files, yaml, etc.

Describe the solution you'd like
A new, slightly different type of scanner. Not really a linter. And it would scan all types of file extensions, not just language code, so the download of files would need to include everything.

Describe alternatives you've considered
Token scanning provided by GitHub does some things by not all. I don't think it would find a Confidential header whereas detect-secrets could be configured to do so. But it would be good to research and do a comparison analysis.

There are also probably many other CLI tools to do similar scans. This was one recommended from a blog article.

Additional context

@ericwb ericwb added the enhancement New feature or request label Apr 26, 2019
@ericwb
Copy link
Collaborator Author

ericwb commented Apr 26, 2019

This integration could also add some differentiation as compared to similar competing Apps

@joshuagl joshuagl modified the milestone: Initial release Apr 29, 2019
@ericwb
Copy link
Collaborator Author

ericwb commented May 9, 2019

Note; detect-secrets overlaps with language specific linters somewhat. It's KeywordDetector plugin will detect variables named password or secrets, same as Bandit.

And many of the other plugins it has could be integrated into language specific linters, since they would be better at detecting using an AST rather than regex.

However, non-language files (*.py, *.go, *.js, *.ts) would be best detected by detect-secrets. And there are examples of people putting secret keys in json files, yaml files, etc.

@ericwb
Copy link
Collaborator Author

ericwb commented May 9, 2019

So should detect-secrets only be used on files not scanned by other linters?

@joshuagl
Copy link
Contributor

joshuagl commented May 9, 2019

I like the idea of detect-secrets only scanning files not scanned by other linters, but I don't think it's as cut and dry as not scanning files scanned by other linters:

  1. linters may not support secret detection (i.e. I don't think tslint does)
  2. a linters secret detection may not be as complete as detect-secret's , for example I don't think many language security linters do high entropy string detection?

There's a few approaches we could take here:

  1. try to ensure (through contributions, where necessary) that all linters we use have comprehensive secret detection:
    i. we could ramp up to this by having detect-secrets exclude not run against a list of file types for which we have confidence the language linter handles secret detection. I think this list would start out at zero and increase as we work with the upstream linters?
  2. have some logic to prevent reporting the same/similar issues for the same line of code. This might mean:
    a. having a model to map similar functionality across linters/scanners which will be run against the same code
    b. assigning priorities to the linters
    c. filtering out duplicate reports from lower priority scanners based on the functionality map and priorities

@MVrachev
Copy link
Contributor

MVrachev commented May 16, 2019

There are a few particularities in detect-secrets:

We should discuss what to do with those.

@MVrachev
Copy link
Contributor

I think that we can try and see how an option 2

have some logic to prevent reporting the same/similar issues for the same line of code.

implemented with the logic from subpoint c

c. filtering out duplicate reports from lower priority scanners based on the functionality map and priorities

@MVrachev
Copy link
Contributor

MVrachev commented May 16, 2019

Here is the list of the pull requests/features we should add to fully integrate detect-secrets into Precaution:

  • Spawn functionality + unit tests for it
  • Report functionality + unit tests for it
  • Add detect-secrets in the README.md and in the doc pages:
    • initial_setup
    • false-positives
    • setting up for manual deployment?

@MVrachev
Copy link
Contributor

MVrachev commented May 21, 2019

Currently, detect-secrets doesn't give you the option to use it with multiple parameters like that:
detect-secrets scan test.js test.yaml Yelp/detect-secrets#180

That's why to achieve scanning multiple parameters we have to call it like that:
detect-secrets scan .

The problem is that currently, we are using two folders for our temporary files which will be processed
cache/go/src/<repo_id>/<pr_id>/ for all Go files and cache/<repo_id>/<pr_id>/ for all other files.
If you want to read why we use two folders read the discussions here:

What happens because of that is that when we start detect-secrets from the cache/go/src/<repo_id>/<pr_id>/ directory it will scan all go files with ease but it will miss all other files in the cache/<repo_id>/<pr_id>/ folder.

If we start detect-secrets from cache/<repo_id>/<pr_id>/ folder we would scan all Go files below and if there are not deleted files from an old scan we would report for their errors too.

Because of all those problems, I believe that first, we should create a Docker image which will allow us to use Go modules in Gosec (as described here: #227) and then we would continue with the detect-secret spawn functionality.

@MVrachev
Copy link
Contributor

I made a research for other tools besides detect-secrets and here are my findings:

Tools using the Shannon entropy for minimizing the false-positives:

There are a lot of tools focused on finding secrets which are working on Git repositories.
Those tools won't work for us because we are scanning files under the cache/ folder.
But cache/ is in our gitignore file so those files will be missed.

@nishakm
Copy link

nishakm commented May 28, 2019

How about scanning one at a time?

@MVrachev
Copy link
Contributor

MVrachev commented May 28, 2019

How about scanning one at a time?

detect-secrets doesn't support multiple parameters right now: Yelp/detect-secrets#180 which means that we should give one parameter - a folder to detect-secrets which contains all files to be scanned

As I mentioned in this comment #209 (comment) we have a problem with that which will be resolved if we have Dockerfile.

@MVrachev
Copy link
Contributor

MVrachev commented Jun 13, 2019

We reevaluated the benefits of using detect-secrets HighEntropyString plugin.
We found out that this plugin won't work for us because we used the default entropy configuration and then detect-secrets didn't catch an obvious hardcoded credentials.
@joshuagl decided to configure it a little bit and he got a lot of false positives.

It's meaningless to use detect-secrets without the HighEntropyString and that's why we decided that we would write our own implementation using regular expressions to scan for specific tokens.
Because that would be a nontrivial feature to implement we decided to move this feature to the Near future milestone.

@MVrachev MVrachev changed the title Integrate detect-secrets Using regex to scan for specific credentials Jun 13, 2019
@MVrachev MVrachev reopened this Jun 13, 2019
@MVrachev MVrachev modified the milestones: Initial release, Near Future Jun 13, 2019
@MVrachev
Copy link
Contributor

It's important when implementing the regex scan how we would store the files for scan in the memory because Heroku provides us with limited RAM https://devcenter.heroku.com/articles/limits#dyno-memory.
One of the ways to process the files will be by fixing the maximum amount of memory we want our files to use and then before opening a file:

  1. check the size of the file
  2. if the currentUsedMemory + the size of the file > maximumRAMusage then we have to use 4096 bytes (or another size) of blocks to process that file
  3. if we can load the file in memory and use less or equal memory of the defined maximumRAMusage memory then load the whole file in the memory

PS: Closed the issue by mistake

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

4 participants