Fix crash when starting with no arguments
sirwart committed Apr 17, 2022
1 parent 0c281de commit 1c48cac
Showing 6 changed files with 42 additions and 14 deletions.
5 changes: 5 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,8 @@
## 0.1.1 (2022-04-16)

- Fix crash when running with no arguments
- Notarize binaries for macOS properly

## 0.1 (2022-04-15)

Initial release
2 changes: 1 addition & 1 deletion Cargo.lock

2 changes: 1 addition & 1 deletion Cargo.toml
@@ -1,6 +1,6 @@
[package]
name = "secrets"
version = "0.1.0"
version = "0.1.1"
edition = "2021"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
32 changes: 21 additions & 11 deletions README.md
@@ -14,18 +14,30 @@

## Usage

By default running `secrets` will recursively search source files in your current directory for secrets. For every secret it finds it will print out the file, line number, and the secret that was found. If it finds any secrets it will exit with a non-zero status code.
By default running `secrets` will recursively search source files in your current directory for secrets.

You can optionally pass a list of files and directories to search as arguments. This is most commonly used to search files that are about to be committed to source control for accidentally included secrets. For example, to use `secrets` as a git pre-commit hook you can add the following command to your `pre-commit` script:
```
$ secrets
```

For every secret it finds it will print out the file, line number, and the secret that was found. If it finds any secrets it will exit with a non-zero status code.

You can optionally pass a list of files and directories to search as arguments.

```
secrets `git diff --cached --name-only --diff-filter=ACM`
$ secrets file1 file2 dir1
```

This is most commonly used to search files that are about to be committed to source control for accidentally included secrets. For example, to use `secrets` as a git pre-commit hook you can add the following command to your `pre-commit` script:

```
$ secrets `git diff --cached --name-only --diff-filter=ACM`
```

This command will fail if `secrets` detects any secrets in the files modified by the commit. You can install `secrets` as a pre-commit hook automatically in your current git repository using the following command:

```
secrets --install-pre-commit
$ secrets --install-pre-commit
```

## Installation
@@ -52,23 +64,21 @@ test_secret = "pAznMW3DsrnVJ5TDWwBVCA" # pragma: allowlist secret

## Performance

There were a few core decisions made in the design of `secrets` to optimize performance:

1. Written in a compiled language. Interpreted programs have longer startup time than compiled ones, which becomes a high percentage of total runtime for commands that only run for a fraction of a second.
The slowest part of secret scanning is looking for potential secrets in a large number of files. To do this quickly `secrets` does a couple of things:

2. Uses the fastest regex engine. The biggest bottleneck for secret scanning is looking for secret-like strings using a regex in a large number of files. This is a job that the excellent [ripgrep](https://github.com/BurntSushi/ripgrep) is specifically optimized to be the best at, so `secrets` was designed around using the `ripgrep` engine for parallel file walking and regex searching, down to being written in rust.
1. All the secret patterns are compiled into a single regex, so each file only needs to be processed once.

3. A single pass on files. Other scanners use N different regexes for N different secret patterns that need to be checked, which means you have to run N passes on every file you're checking. This is very flexible and modular, but comes at a large cost to performance. `secrets` compiles all patterns as a single regex to get the most performance out of the underlying regex engine.
2. This regex is fed to [ripgrep](https://github.com/BurntSushi/ripgrep), which is specifically optimized for running a regex against a large number of files quickly.
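
As a rough illustration of the first point, several pattern sources can be joined into one alternation so that a file is scanned once rather than once per pattern. This is only a sketch using the standard library: the `combine_patterns` helper is hypothetical and the two pattern strings are illustrative examples, not necessarily what `secrets` ships.

```rust
/// Hypothetical helper: joins several regex sources into one alternation.
/// Each pattern is wrapped in a non-capturing group so that a `|` inside one
/// pattern cannot bleed into its neighbours.
fn combine_patterns(patterns: &[&str]) -> String {
    patterns
        .iter()
        .map(|p| format!("(?:{})", p))
        .collect::<Vec<_>>()
        .join("|")
}

fn main() {
    // Example patterns for illustration only.
    let patterns = ["AKIA[0-9A-Z]{16}", "ghp_[0-9A-Za-z]{36}"];
    let combined = combine_patterns(&patterns);
    // The combined source would then be handed to a regex engine once.
    println!("{}", combined);
}
```

The trade-off is that a single combined regex reports that something matched, so working out which pattern matched needs an extra step after the search.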

To compare real world performance, here's the runtime of a few different scanning tools to search for secrets in the [Sentry repo](https://github.com/getsentry/sentry) on an M1 air laptop:
Additionally `secrets` is written in Rust, which means there's no interpreter startup time. To compare real world performance, here's the runtime of a few different scanning tools to search for secrets in the [Sentry repo](https://github.com/getsentry/sentry) on an M1 air laptop:

| tool | avg. runtime | vs. baseline |
| -------------- | ------------ | ------------ |
| secrets | 0.32s | 1x |
| trufflehog | 31.2s | 95x |
| detect-secrets | 73.5s | 226x |

Most of the time your pre-commit will be running on a small number of files, so that runtime is not typical, but when working with large commits that touch a lot of files the runtime becomes very noticeable.
Most of the time your pre-commit will be running on a small number of files, so that runtime is not typical, but when working with large commits that touch a lot of files the runtime can become noticeable.

## Alternative tools

2 changes: 1 addition & 1 deletion src/main.rs
@@ -127,7 +127,7 @@ fn main() {
let args: Vec<&str> = args.iter().map(|x| &**x).collect();
let mut path = ".";

if args[1] == "--install-pre-commit" {
if args.len() > 1 && args[1] == "--install-pre-commit" {
if args.len() > 2 {
eprintln!("Usage: secrets --install-pre-commit");
process::exit(1);
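
For context on the fix: indexing a `Vec` with `args[1]` panics when the vector only holds the program name, which is the case when `secrets` is started with no arguments. The commit guards the index with a length check. An equivalent approach, sketched below, is to use `get`, which returns `None` for an out-of-bounds index. The `wants_install` helper is hypothetical and not part of this commit.

```rust
/// Hypothetical helper: true when the first real argument (index 1, after
/// the program name) is the `--install-pre-commit` flag. `get` returns
/// `None` instead of panicking when the index is out of bounds.
fn wants_install(args: &[&str]) -> bool {
    args.get(1).copied() == Some("--install-pre-commit")
}

fn main() {
    let owned: Vec<String> = std::env::args().collect();
    let args: Vec<&str> = owned.iter().map(|s| s.as_str()).collect();
    if wants_install(&args) {
        println!("would install the pre-commit hook");
    } else {
        println!("would scan for secrets");
    }
}
```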
13 changes: 13 additions & 0 deletions src/p_random.rs
@@ -3,6 +3,19 @@ use std::collections::hash_set::HashSet;

use memoize::memoize;

/*
When we get a potential secret that doesn't match any known secret patterns, we need to make some determination of
whether it's a random string or not. To do that we assume it's random, and then calculate the probability that a few
metrics came about by chance:
1. Number of distinct values. Non-random text is generally going to have far fewer distinct values than random text.
2. Number of numbers. It's very common to have very few numbers in non-random text.
3. Number of bigrams. If we take a sample of roughly 10% of possible bigrams that are common in source code, we should
expect that a random string should have about 10% of those bigrams.
This math is probably not perfect, but it should be in the right ballpark and it's ultimately a heuristic so it should
be judged on how well it's able to distinguish random from non-random text.
*/
pub fn p_random(s: &[u8]) -> f64 {
return p_random_distinct_values(s) * p_random_char_class(s) * p_random_bigrams(s);
}
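
To make the second metric in the comment concrete, here is one way such a probability could be computed. It is a sketch under stated assumptions, not the code in `p_random.rs`: it assumes characters are drawn uniformly from a 64-symbol alphabet in which 10 symbols are digits, and the names `choose`, `p_at_most_digits`, and `count_digits` are hypothetical. It asks how likely a truly random string of the same length would be to contain this few digits, using a binomial tail.

```rust
/// Binomial coefficient C(n, k) as f64, computed multiplicatively.
fn choose(n: u32, k: u32) -> f64 {
    if k > n {
        return 0.0;
    }
    let k = k.min(n - k);
    (0..k).fold(1.0, |acc, i| acc * f64::from(n - i) / f64::from(i + 1))
}

/// Probability of seeing at most `digits` digit characters in `len`
/// uniformly random characters, where each character is a digit with
/// probability `p_digit` (an assumption about the alphabet).
fn p_at_most_digits(len: u32, digits: u32, p_digit: f64) -> f64 {
    (0..=digits.min(len))
        .map(|k| {
            choose(len, k) * p_digit.powi(k as i32) * (1.0 - p_digit).powi((len - k) as i32)
        })
        .sum()
}

/// Counts ASCII digits in a byte string.
fn count_digits(s: &[u8]) -> u32 {
    s.iter().filter(|b| b.is_ascii_digit()).count() as u32
}

fn main() {
    // Assumed alphabet: 64 symbols, 10 of which are digits.
    let p_digit = 10.0 / 64.0;
    for s in ["configuration_value_name", "pAznMW3DsrnVJ5TDWwBVCA"] {
        let bytes = s.as_bytes();
        let p = p_at_most_digits(bytes.len() as u32, count_digits(bytes), p_digit);
        println!("{s}: {p:.4}");
    }
}
```

A small value means a random string would rarely have so few digits, which counts as evidence that the input is ordinary text rather than a secret.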