Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added regex for new npm secret format #588

Merged
merged 21 commits into from
Jan 14, 2022
Merged

Conversation

marmegh
Copy link
Collaborator

@marmegh marmegh commented Dec 14, 2021

Changes

Added regex for new npm secret format

For significant contributions please make sure you have completed the following items:

  • ReleaseHistory.md updated for non-trivial changes
  • Added unit tests

@@ -25,6 +25,9 @@
- SDK: Exposing `automationId`, `automationGuid`, and `postUri` in the
`analyze` command.
[#586](https://github.com/microsoft/sarif-pattern-matcher/pull/586)
- NFD: Adding additional regex for 'SEC101/017.NpmAuthorToken'
Copy link
Collaborator

@eddynaka eddynaka Dec 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NFD

what is NFD? should this be FNC? #Closed

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a typo. Thank you

@@ -85,6 +85,7 @@
$SEC101/015.AkamaiCredentials=(?si)https:\/\/(?P<host>[\w\-\.]+)\.akamaiapis\.net.{0,150}(?:(?:client_token.{0,10}(?:[^a]|^)(?P<id>akab[\w\-]+).{0,50})|(?:access_token.{0,10}(?:[^\w\-]|^)(?P<resource>akab[\w\-]+).{0,200})|(?:(?:client_secret).{0,10}(?:[^0-9a-z\/\+]|^)(?P<secret>[0-9a-z\/\+]{43}=))){3}
$SEC101/016.StripeApiKey=(?:[^rs]|^)(?P<secret>(?:r|s)k_(?:live|test)_(?i)[0-9a-z]{24,99})(?:[^0-9a-z]|$)
$SEC101/017.NpmAuthorToken=(?i)npm.{0,100}[^0-9a-z](?-i)(?P<secret>[0-9a-z]{8}-[0-9a-z]{4}-[0-9a-z]{4}-[0-9a-z]{4}-[0-9a-z]{12})[^0-9a-z]
$SEC101/017.NpmAuthorTokenV2=(?:[^s]|^)(?P<secret>(?:npm)_(?i)[a-zA-Z0-9]{36})
Copy link
Collaborator

@eddynaka eddynaka Dec 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[a-zA-Z0-9]

since you added ?i, you can change this to:
0-9a-z (we normally add numbers first and letters last) #Closed

@@ -85,6 +85,7 @@
$SEC101/015.AkamaiCredentials=(?si)https:\/\/(?P<host>[\w\-\.]+)\.akamaiapis\.net.{0,150}(?:(?:client_token.{0,10}(?:[^a]|^)(?P<id>akab[\w\-]+).{0,50})|(?:access_token.{0,10}(?:[^\w\-]|^)(?P<resource>akab[\w\-]+).{0,200})|(?:(?:client_secret).{0,10}(?:[^0-9a-z\/\+]|^)(?P<secret>[0-9a-z\/\+]{43}=))){3}
$SEC101/016.StripeApiKey=(?:[^rs]|^)(?P<secret>(?:r|s)k_(?:live|test)_(?i)[0-9a-z]{24,99})(?:[^0-9a-z]|$)
$SEC101/017.NpmAuthorToken=(?i)npm.{0,100}[^0-9a-z](?-i)(?P<secret>[0-9a-z]{8}-[0-9a-z]{4}-[0-9a-z]{4}-[0-9a-z]{4}-[0-9a-z]{12})[^0-9a-z]
$SEC101/017.NpmAuthorTokenV2=(?:[^s]|^)(?P<secret>(?:npm)_(?i)[a-zA-Z0-9]{36})
Copy link
Collaborator

@eddynaka eddynaka Dec 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(?:npm)

you don't need to create a group.
#Closed

@@ -85,6 +85,7 @@
$SEC101/015.AkamaiCredentials=(?si)https:\/\/(?P<host>[\w\-\.]+)\.akamaiapis\.net.{0,150}(?:(?:client_token.{0,10}(?:[^a]|^)(?P<id>akab[\w\-]+).{0,50})|(?:access_token.{0,10}(?:[^\w\-]|^)(?P<resource>akab[\w\-]+).{0,200})|(?:(?:client_secret).{0,10}(?:[^0-9a-z\/\+]|^)(?P<secret>[0-9a-z\/\+]{43}=))){3}
$SEC101/016.StripeApiKey=(?:[^rs]|^)(?P<secret>(?:r|s)k_(?:live|test)_(?i)[0-9a-z]{24,99})(?:[^0-9a-z]|$)
$SEC101/017.NpmAuthorToken=(?i)npm.{0,100}[^0-9a-z](?-i)(?P<secret>[0-9a-z]{8}-[0-9a-z]{4}-[0-9a-z]{4}-[0-9a-z]{4}-[0-9a-z]{12})[^0-9a-z]
$SEC101/017.NpmAuthorTokenV2=(?:[^s]|^)(?P<secret>(?:npm)_(?i)[a-zA-Z0-9]{36})
Copy link
Collaborator

@eddynaka eddynaka Dec 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(?:[^s]|^)

instead of ^s, you could just use ^n, because anything that isn't the 'n' could be a match. #Closed

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with that, we would have:
(?:[^n]|^)

@@ -85,6 +85,7 @@
$SEC101/015.AkamaiCredentials=(?si)https:\/\/(?P<host>[\w\-\.]+)\.akamaiapis\.net.{0,150}(?:(?:client_token.{0,10}(?:[^a]|^)(?P<id>akab[\w\-]+).{0,50})|(?:access_token.{0,10}(?:[^\w\-]|^)(?P<resource>akab[\w\-]+).{0,200})|(?:(?:client_secret).{0,10}(?:[^0-9a-z\/\+]|^)(?P<secret>[0-9a-z\/\+]{43}=))){3}
$SEC101/016.StripeApiKey=(?:[^rs]|^)(?P<secret>(?:r|s)k_(?:live|test)_(?i)[0-9a-z]{24,99})(?:[^0-9a-z]|$)
$SEC101/017.NpmAuthorToken=(?i)npm.{0,100}[^0-9a-z](?-i)(?P<secret>[0-9a-z]{8}-[0-9a-z]{4}-[0-9a-z]{4}-[0-9a-z]{4}-[0-9a-z]{12})[^0-9a-z]
$SEC101/017.NpmAuthorTokenV2=(?:[^s]|^)(?P<secret>(?:npm)_(?i)[a-zA-Z0-9]{36})
Copy link
Collaborator

@eddynaka eddynaka Dec 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we normally add the end border as well:
(?:[^0-9a-z]|$) #Closed

@@ -165,6 +165,13 @@
"MessageArguments": { "secretKind": "NPM API key" },
Copy link
Member

@michaelcfanning michaelcfanning Dec 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

API key

why does this say 'API key' but the rule name is author token? #Closed

{
"Id": "SEC101/017",
"Name": "DoNotExposePlaintextSecrets/NpmAuthorToken",
"ContentsRegex": "$SEC101/017.NpmAuthorTokenV2",
Copy link
Member

@michaelcfanning michaelcfanning Dec 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NpmAuthorTokenV2

we should add a checksum validation for this secret kind, that's the substance of the token change (moving to v2).

We need to separate this into a brand new rule, as we are likely to handle the 'identifiable' secrets differently (or someone needs to propose a convention for marking these as identifiable).

For now, try allocating a new rule id, SEC101/050 it appears? We should call the rule 'IdentifiableNpmAuthorToken' or 'NpmIdentifiableAuthorToken'. The former would sort all identifiable secret types together, the latter would keep the secret type bundled with the platform.

We could consider adding a new property denoting these as identifiable in adding to making it clear in the rule name. This would be valuable if we special-case behaviors in the engine itself (e.g., the engine could automatically elevate all 'identifiable' secrets to errors in the static analysis phase.

A future change, though, no need to worry about it now. Let's do handle the checksum. I just noticed we don't do this in the GH PAT either, so you'll need to follow up with me offline on what to do.

#Resolved

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello, we do the checksum in the internal version.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've split this out into a separate rule and added the supporting unit test and functional test files. Let me know what you think. I tried to organize the commits for easy reversal/changes.


namespace Microsoft.CodeAnalysis.Sarif.PatternMatcher.Plugins.Security
{
public class IdentifiableNpmAuthorTokenValidator : DynamicValidatorBase
Copy link
Collaborator

@eddynaka eddynaka Dec 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IdentifiableNpmAuthorTokenValidator

create a helper that has:

  • uri
  • checkinformation
  • dynamic validation logic
  • internal classes

change this validator + npmtokenvalidator to use that helper

this will reduce the code duplication #Closed

@@ -13,6 +13,11 @@
</EmbeddedResource>
</ItemGroup>

<ItemGroup>
<None Remove="TestData\SecurePlaintextSecrets\ExpectedOutputs\SEC101_050.IdentifiableNpmAuthorToken.sarif" />
<None Remove="TestData\SecurePlaintextSecrets\Inputs\SEC101_050.IdentifiableNpmAuthorToken.ps1" />
Copy link
Collaborator

@eddynaka eddynaka Dec 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can delete this, since we automatically embed everything under TestData #Closed


namespace Microsoft.CodeAnalysis.Sarif.PatternMatcher.Plugins.Security.Validators
{
public class IdentifiableNpmAuthorTokenValidatorTests
Copy link
Collaborator

@eddynaka eddynaka Dec 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IdentifiableNpmAuthorTokenValidatorTests

once u create the helper, u might be able to merge the test logic for both rules. #Closed

{
public class IdentifiableNpmAuthorTokenValidatorTests
{

Copy link
Collaborator

@eddynaka eddynaka Dec 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can remove this empty line #Closed


namespace Microsoft.CodeAnalysis.Sarif.PatternMatcher.Plugins.Security.Validators
{
public class IdentifiableNpmAuthorTokenValidatorTests
Copy link
Collaborator

@eddynaka eddynaka Dec 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add the summary (check other test validators, pls) #Closed

Content = new StringContent(publishResponseJson)
};

var ValidEmptyContentResponse = new HttpResponseMessage(HttpStatusCode.OK)
Copy link
Collaborator

@eddynaka eddynaka Dec 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ValidEmptyContentResponse

verify casing. for local variables we normally start with lower case #Closed

ValidationState currentState = identifiableNpmAuthorTokenValidator.IsValidDynamic(ref fingerprint,
ref message,
keyValuePairs,
ref resultLevelKind);
Copy link
Collaborator

@eddynaka eddynaka Dec 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

review indentation. we normally indent based on the first parameter #Closed

@eddynaka
Copy link
Collaborator

eddynaka commented Dec 15, 2021

The change looks good!
Just a few refactors to prevent code duplication and some style nits.


In reply to: 994572554


In reply to: 994572554


protected override IEnumerable<ValidationResult> IsValidStaticHelper(IDictionary<string, FlexMatch> groups)
{
FlexMatch secret = groups["secret"];
Copy link
Collaborator

@eddynaka eddynaka Dec 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

secret

to reduce false-positives, we should validate the checksum:
https://github.blog/changelog/2021-09-23-npm-has-a-new-access-token-format/ #Closed

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check sec101/102 or sec101/006 (internal version)

{
FlexMatch secret = groups["secret"];

if (groups.TryGetNonEmptyValue("checksum", out FlexMatch checksum))
Copy link
Collaborator

@eddynaka eddynaka Dec 17, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if (groups.TryGetNonEmptyValue("checksum", out FlexMatch checksum))

since this is required, you can follow what we did for the secret. If we don't have that key, it will throw an exception because that is not expected. #Closed

client);
}

private static string Base62EncodeUint32(uint value, int minimumLength = 6)
Copy link
Collaborator

@eddynaka eddynaka Dec 17, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Base62EncodeUint32

should we move this to crc class, make it static and add more tests to complete its coverage? what are ur thoughts? #Resolved

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This kind of code should exist in Microsoft.Security.Utilities. That package should have API that accepts an arbitrary alphabet (such as a base62 encoding scheme) and provides encoding and decoding capabilities.

We will need this API for many classes of secret.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like that functionality is still pending. May I proceed with Eddy's suggestion and ensure that a task to incorporate this is added to the backlog?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're now using the CustomAlphabetEncoder class to achieve this.

@@ -74,73 +39,9 @@ protected override IEnumerable<ValidationResult> IsValidStaticHelper(IDictionary
ref ResultLevelKind resultLevelKind)
{
string secret = fingerprint.Secret;
Copy link
Collaborator

@eddynaka eddynaka Jan 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

string secret = fingerprint.Secret;

you can remove this. #Closed

// Validate checksum to avoid false positives.
string randomPart = secret.Value.String.Substring(4, 30);
uint checksumValue = Crc32.Calculate(randomPart);
var encoder = new CustomAlphabetEncoder();
Copy link
Collaborator

@eddynaka eddynaka Jan 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

var encoder = new CustomAlphabetEncoder();

should we create only one encoder and re-use it? #Closed

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this also helped me catch some other cleanup items.


[JsonProperty("total")]
public int Total { get; set; }
return NpmAuthorTokenHelper.ValidateTokens(ref fingerprint, ref message, ref resultLevelKind, client);
Copy link
Collaborator

@eddynaka eddynaka Jan 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ref fingerprint, ref message, ref resultLevelKind, client

following the identifiablenpm, can you wrap the parameters? #Closed

public class IdentifiableNpmAuthorTokenValidatorTests
{
[Fact]
public void IdentifiableNpmAuthorTokenValidatorTests_MockHttpTests()
Copy link
Collaborator

@eddynaka eddynaka Jan 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests

nit: remove Tests from this.
The idea that we follow:
ClassThatWeAreTesting_Something #Closed

@@ -0,0 +1,3 @@
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

"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"
Copy link
Collaborator

@eddynaka eddynaka Jan 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

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

here, you created a test that pass the checksum.
Can you also add a new one that do not pass?
Also, add a comment. Something like:

# This is a pattern that matches the regex but the checksum is invalid
npm_abcd....

#Closed

{
public class IdentifiableNpmAuthorTokenValidator : DynamicValidatorBase
{
private readonly CustomAlphabetEncoder encoder = new CustomAlphabetEncoder();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new CustomAlphabetEncoder();

can you move this to the constructor of this class?
we try to move initializations to the constructor.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good note, please provide the rationale for the standard. 'The reason is' that it's harder to understand initialization behavior when assignments are inlined like this: instance variables can be declared anywhere in the class. You really notice the problem with this pattern when debugging a ctor with initializations of this kind that are scattered around: the debugger hopes from place to place. With an explicit constructor, everything is sequential.

Copy link
Collaborator

@eddynaka eddynaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

Copy link
Member

@michaelcfanning michaelcfanning left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@michaelcfanning michaelcfanning merged commit d35e876 into main Jan 14, 2022
@eddynaka eddynaka deleted the users/marmegh/npmUpdate branch March 8, 2022 00:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants