Fix crash when starting with no arguments

sirwart · Apr 17, 2022 · 1c48cac
1 parent 0c281de
commit 1c48cac

Showing 6 changed files with 42 additions and 14 deletions.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -1,3 +1,8 @@
+## 0.1.1 (2022-04-16)
+
+- Fix crash when running with no arguments
+- Notarize binaries for macOS properly
+
 ## 0.1 (2022-04-15)
 
 Initial release
diff --git a/Cargo.lock b/Cargo.lock
diff --git a/Cargo.toml b/Cargo.toml
@@ -1,6 +1,6 @@
 [package]
 name = "secrets"
-version = "0.1.0"
+version = "0.1.1"
 edition = "2021"
 
 # See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

diff --git a/README.md b/README.md
@@ -14,18 +14,30 @@
 
 ## Usage
 
-By default running `secrets` will recursively search source files in your current directory for secrets. For every secret it finds it will print out the file, line number, and the secret that was found. If it finds any secrets it will exit with a non-zero status code.
+By default running `secrets` will recursively search source files in your current directory for secrets.
 
-You can optionally pass a list of files and directories to search as arguments. This most commonly used to search files that are about to be committed to source control for accidentically included secrets. For example, to use `secrets` as a git pre-commit hook you can add the following command to your `pre-commit` script:
+```
+$ secrets
+```
+
+For every secret it finds it will print out the file, line number, and the secret that was found. If it finds any secrets it will exit with a non-zero status code.
+
+You can optionally pass a list of files and directories to search as arguments.
 
 ```
-secrets `git diff --cached --name-only --diff-filter=ACM`
+$ secrets file1 file2 dir1
+```
+
+This is most commonly used to search files that are about to be committed to source control for accidentally included secrets. For example, to use `secrets` as a git pre-commit hook you can add the following command to your `pre-commit` script:
+
+```
+$ secrets `git diff --cached --name-only --diff-filter=ACM`
 ```
 
 This command will fail if `secrets` detects any secrets in the files modified by the commit. You can install `secrets` as a pre-commit hook automatically in your current git repository using the following command:
 
 ```
-secrets --install-pre-commit
+$ secrets --install-pre-commit
 ```
 
 ## Installation
@@ -52,23 +64,21 @@ test_secret = "pAznMW3DsrnVJ5TDWwBVCA" # pragma: allowlist secret
 
 ## Performance
 
-There were a few core decisions made in the design of `secrets` to optimize performance:
-
-1. Written in a compiled language. Interpreted programs have longer startup time than compiled ones, which becomes a high percentage of total runtime for commands that only run for a fraction of a second.
+The slowest part of secret scanning is looking for potential secrets in a large number of files. To do this quickly `secrets` does a couple of things:
 
-2. Uses the fastest regex engine. The biggest bottleneck for secret scanning is looking for secret-like strings using a regex in a large number of files. This is a job that the excellent [ripgrep](https://github.com/BurntSushi/ripgrep) is specifically optimized to be the best at, so `secrets` was designed around using the `ripgrep` engine for parallel file walking and regex searching, down to being written in rust.
+1. All the secret patterns are compiled into a single regex, so each file only needs to be processed once.
 
-3. A single pass on files. Other scanners use N different regexes for N different secret patterns that need to be checked, which means you have to run N passes on every file you're checking. This is very flexible and modular, but comes at a large cost to performance. `secrets` compiles all patterns as a single regex to get the most performance out of the underlying regex engine.
+2. This regex is fed to [ripgrep](https://github.com/BurntSushi/ripgrep), which is specifically optimized for running a regex against a large number of files quickly.
 
-To compare real world performance, here's the runtime of a few different scanning tools to search for secrets in the [Sentry repo](https://github.com/getsentry/sentry) on an M1 air laptop:
+Additionally, `secrets` is written in Rust, which means there's no interpreter startup time. To compare real world performance, here's the runtime of a few different scanning tools to search for secrets in the [Sentry repo](https://github.com/getsentry/sentry) on an M1 MacBook Air:
 
 | tool           | avg. runtime | vs. baseline |
 | -------------- | ------------ | ------------ |
 | secrets        | 0.32s        | 1x           |
 | trufflehog     | 31.2s        | 95x          |
 | detect-secrets | 73.5s        | 226x         |
 
-Most of the time your pre-commit will be running on a small number of files, so that runtime is not typical, but when working with large commits that touch a lot of files the runtime becomes very noticeable.
+Most of the time your pre-commit will be running on a small number of files, so that runtime is not typical, but when working with large commits that touch a lot of files the runtime can become noticeable.
 
 ## Alternative tools
 

diff --git a/src/main.rs b/src/main.rs
@@ -127,7 +127,7 @@ fn main() {
     let args: Vec<&str> = args.iter().map(|x| &**x).collect();
     let mut path = ".";
 
-    if args[1] == "--install-pre-commit" {
+    if args.len() > 1 && args[1] == "--install-pre-commit" {
         if args.len() > 2 {
             eprintln!("Usage: secrets --install-pre-commit");
             process::exit(1);
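
A possible alternative to the explicit length check, sketched here as an assumption rather than as what this commit does: `slice::get` returns `None` for a missing index instead of panicking, so the comparison is safe even when only the program name is present. The helper name `wants_install` is hypothetical and does not appear in `src/main.rs`.

```rust
/// Returns true when the first user-supplied argument is `--install-pre-commit`.
/// `args[0]` is the program name, so index 1 may be absent.
/// Hypothetical helper for illustration only.
fn wants_install(args: &[&str]) -> bool {
    // `get(1)` yields `None` when there is no such element, so nothing
    // panics on an invocation with no arguments.
    args.get(1).copied() == Some("--install-pre-commit")
}
```

With only the program name in `args`, the helper returns `false` and the program can fall through to the default scan of `.`.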

diff --git a/src/p_random.rs b/src/p_random.rs
@@ -3,6 +3,19 @@ use std::collections::hash_set::HashSet;
 
 use memoize::memoize;
 
+/*
+    When we get a potential secret that doesn't match any known secret patterns, we need to make some determination of
+    whether it's a random string or not. To do that we assume it's random, and then calculate the probability that a few
+    metrics came about by chance:
+
+    1. Number of distinct values. Non-random text is generally going to have far fewer distinct values than random text.
+    2. Number of numbers. It's very common to have very few numbers in non-random text.
+    3. Number of bigrams. If we take a sample of roughly 10% of possible bigrams that are common in source code, we should
+       expect that a random string should have about 10% of those bigrams.
+
+    This math is probably not perfect, but it should be in the right ballpark and it's ultimately a heuristic so it should
+    be judged on how well it's able to distinguish random from non-random text.
+*/
 pub fn p_random(s: &[u8]) -> f64 {
     return p_random_distinct_values(s) * p_random_char_class(s) * p_random_bigrams(s);
 }
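
To make the three metrics concrete, here is a minimal sketch that only counts them for a byte string. It does not reproduce the probability math in `p_random_distinct_values`, `p_random_char_class`, or `p_random_bigrams`, whose bodies are not shown in this diff. The `Metrics` struct, the `metrics` function, and the six-entry `SAMPLE_BIGRAMS` list are hypothetical and chosen only for illustration.

```rust
use std::collections::HashSet;

/// Raw counts behind the three metrics described in the comment.
/// Hypothetical type for illustration only.
#[derive(Debug, PartialEq)]
struct Metrics {
    distinct_values: usize,
    digits: usize,
    common_bigrams: usize,
}

// Assumption: a tiny stand-in for the sample of common source-code bigrams.
const SAMPLE_BIGRAMS: [&[u8]; 6] = [b"er", b"te", b"an", b"en", b"in", b"re"];

fn metrics(s: &[u8]) -> Metrics {
    // 1. Number of distinct byte values in the string.
    let distinct_values = s.iter().collect::<HashSet<_>>().len();
    // 2. Number of ASCII digits.
    let digits = s.iter().filter(|b| b.is_ascii_digit()).count();
    // 3. Number of adjacent byte pairs that appear in the sample list.
    let common_bigrams = s
        .windows(2)
        .filter(|w| SAMPLE_BIGRAMS.iter().any(|b| *b == *w))
        .count();
    Metrics { distinct_values, digits, common_bigrams }
}
```

For `b"internet"` this gives 5 distinct values, 0 digits, and 3 sampled bigrams (`in`, `te`, `er`), the kind of profile the comment associates with non-random text.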