Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Detect the lack of a pattern in the target project #7363

Open
gand3lf opened this issue Mar 20, 2023 · 12 comments
Open

Detect the lack of a pattern in the target project #7363

gand3lf opened this issue Mar 20, 2023 · 12 comments
Labels

Comments

@gand3lf
Copy link

gand3lf commented Mar 20, 2023

Hi guys!
In my opinion the main lack of you fantastic tool is that it is not possible to write a rule to say:

WARNING: The analyzed project does not implement this protection.

To implement negative matching can be tricky, but I'm asking myself: Why not implement it programmatically through the CLI.
For example:

semgrep -xc myrule.yaml ./target

where "xc" does the following:

  1. execute the rule in each relevant file, as always
  2. merge the results to verify that the output is not empty
  3. if the output is empty then print the message specified in the yaml rule

It could be so useful in all that situations where someone have to check the lack of something. Think about security engineers, it is fundamental to detect the lack of a specific protection or flag in the source code.

Of course, the best way to implement this feature is to define a specific mode (mode: negative) but maybe this solution could be more challenging to develop.
I understand that it is possible to work around the problem by using a wrapper of Semgrep but it sounds like a short term solution.

Thank you so much for your great work!

@aryx
Copy link
Collaborator

aryx commented Mar 22, 2023

interesting idea

@gand3lf
Copy link
Author

gand3lf commented Apr 27, 2023

Do you have some updates about this proposal?
I'm working on a project with my company about that will be publicly released and our biggest issue is the aforementioned.
Moreover, I would like to add that the categorization "interfile" could be related only to Pro version, but in this case Semgrep does not have to follow the flow among files but only verify the presence of an aggregated result.
Thank you

@aryx
Copy link
Collaborator

aryx commented May 22, 2023

cc @IagoAbal

@IagoAbal
Copy link
Collaborator

I would say it's not that hard to write a Semgrep rule to check for the absence of something. In some cases may be more tricky than others. But before we jump into something like mode: negative one would need to collect at least a few examples that justify it.

@gand3lf
Copy link
Author

gand3lf commented May 22, 2023

Hi @IagoAbal, thank you for the answer.
Currently, it is simple to implement a rule to check:

The target file does not use this protection.

but it is not possible to check:

The target project does not use this protection.

Please consider this example:
I want to implement a check to verify that an Android application implements root detection mechanisms. In other words, I want that somewhere in the target project, composed by N classes, the application calls the method "isRooted()". This function call can be located in one of the N classes. My goal, as security engineer, is to use Semgrep to detect the lack of this check in the entire project. The result I would like obtain is something like:

Semgrep result: the target project does not call the method isRooted() anywhere.

Suppose you implement this rule now. In the case where the protection is present in only one file you will always obtain N-1 false positives, this because each rule runs separately in each single file without grouping results.
I hope that the example was clear, in the case you need a more specific example please let me know.

Another point of view to see this:
You have implemented the join mode that is inspired to the "SQL join". What I'm proposing is inspired to the "SQL group by" since I want to check the number of final results.

Thank you!

@IagoAbal
Copy link
Collaborator

You have implemented the join mode that is inspired to the "SQL join". What I'm proposing is inspired to the "SQL group by" since I want to check the number of final results.

I agree, this should be handled by "sql mode" or something that allows to combine results of multiple rules in arbitrary ways.

@aryx
Copy link
Collaborator

aryx commented May 22, 2023

cc @kurt-r2c @LewisArdern @p4p3r

@ievans ievans added the enhancement New feature or request label May 22, 2023
@emjin
Copy link
Contributor

emjin commented May 22, 2023

cc @spencerdrak

@emjin
Copy link
Contributor

emjin commented May 22, 2023

I'm not yet convinced that the best way to solve this is via a SQL post processing step. That's definitely potentially very powerful! But my concern is that, if 80% of the use cases for this step are "find the absence of a pattern", while 20% are very disparate cases, we could end up with something more complex than necessary for the 80% case, while not providing enough for the 20% cases.

That's just a hypothetical. My point is, I want to understand better what the post-processing use cases are.

I'm also not sure that all the negative pattern cases are nicely solvable with post-processing. For example, it seems like a common need in IaC is to look for, say, "all containers without resources specified". We can do this now using existing Semgrep, but given how common it is in IaC, it might be nice to provide some syntactic sugar for the use case.

Very happy to be having this conversation! We've had this friction for a long time.

@gand3lf
Copy link
Author

gand3lf commented May 22, 2023

Hi @emjin, I agree with you when you say "something more complex than necessary for the 80% case".
As Semgrep user, currently I think that a simple negative mode could be enough. The hard way, the group-by-style mode, maybe requires too much effort in absence of concrete examples.
The SQL group by uses these aggregate functions: COUNT, MAX, MIN, SUM, AVG. These functions, except the COUNT one, don't seem to fit with Semgrep purposes.
Moreover, with a group-by mode probably you need to introduce a new syntax for using conditions on aggregated results. It is definitely "too much".

As Semgrep user I would be very happy to have a simple negative mode that does not introduce new syntax and that perform the 3 steps I describe in the original issue:

  1. execute the rule on each file
  2. count the results
  3. if the count is 0 then print message

Fast, easy and useful! 🥇

@LewisArdern
Copy link
Contributor

LewisArdern commented Jun 14, 2023

@aryx @IagoAbal

I don't want us to go down a SQL route, but I could see value with some conditions:

Specific files

  1. file type or file name exists
  2. condition does not match (0 findings)

mark as finding

Based on more than one condition

  • rule one (the requirement) condition matches (one or more findings)
  • the negative pattern (the other condition) does not match (0 findings)

mark as a finding.

I have seen cases where we may want to look for the existence of one thing, and based on that report another. E.g. find the use of Express.js but the application is missing a specific middleware in the app.use.

@IagoAbal
Copy link
Collaborator

IagoAbal commented Jun 15, 2023

I don't want us to go down a SQL route, but I could see value with some conditions:

Let's change "SQL" to "something more general". What I wouldn't like is dozens of special modes that basically just handle specific post-processing cases of Semgrep results. I would rather have a language of "post-processing operators" that can be composed, same way as we have "matching operators".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Development

No branches or pull requests

7 participants
@aryx @ievans @LewisArdern @IagoAbal @emjin @gand3lf and others