Detect the lack of a pattern in the target project #7363

gand3lf · 2023-03-20T20:36:35Z

Hi guys!
In my opinion the main lack of you fantastic tool is that it is not possible to write a rule to say:

WARNING: The analyzed project does not implement this protection.

To implement negative matching can be tricky, but I'm asking myself: Why not implement it programmatically through the CLI.
For example:

semgrep -xc myrule.yaml ./target

where "xc" does the following:

execute the rule in each relevant file, as always
merge the results to verify that the output is not empty
if the output is empty then print the message specified in the yaml rule

It could be so useful in all that situations where someone have to check the lack of something. Think about security engineers, it is fundamental to detect the lack of a specific protection or flag in the source code.

Of course, the best way to implement this feature is to define a specific mode (mode: negative) but maybe this solution could be more challenging to develop.
I understand that it is possible to work around the problem by using a wrapper of Semgrep but it sounds like a short term solution.

Thank you so much for your great work!

The text was updated successfully, but these errors were encountered:

aryx · 2023-03-22T09:46:14Z

interesting idea

gand3lf · 2023-04-27T14:58:22Z

Do you have some updates about this proposal?
I'm working on a project with my company about that will be publicly released and our biggest issue is the aforementioned.
Moreover, I would like to add that the categorization "interfile" could be related only to Pro version, but in this case Semgrep does not have to follow the flow among files but only verify the presence of an aggregated result.
Thank you

aryx · 2023-05-22T11:43:54Z

cc @IagoAbal

IagoAbal · 2023-05-22T11:53:16Z

I would say it's not that hard to write a Semgrep rule to check for the absence of something. In some cases may be more tricky than others. But before we jump into something like mode: negative one would need to collect at least a few examples that justify it.

gand3lf · 2023-05-22T12:50:19Z

Hi @IagoAbal, thank you for the answer.
Currently, it is simple to implement a rule to check:

The target file does not use this protection.

but it is not possible to check:

The target project does not use this protection.

Please consider this example:
I want to implement a check to verify that an Android application implements root detection mechanisms. In other words, I want that somewhere in the target project, composed by N classes, the application calls the method "isRooted()". This function call can be located in one of the N classes. My goal, as security engineer, is to use Semgrep to detect the lack of this check in the entire project. The result I would like obtain is something like:

Semgrep result: the target project does not call the method isRooted() anywhere.

Suppose you implement this rule now. In the case where the protection is present in only one file you will always obtain N-1 false positives, this because each rule runs separately in each single file without grouping results.
I hope that the example was clear, in the case you need a more specific example please let me know.

Another point of view to see this:
You have implemented the join mode that is inspired to the "SQL join". What I'm proposing is inspired to the "SQL group by" since I want to check the number of final results.

Thank you!

IagoAbal · 2023-05-22T13:55:48Z

You have implemented the join mode that is inspired to the "SQL join". What I'm proposing is inspired to the "SQL group by" since I want to check the number of final results.

I agree, this should be handled by "sql mode" or something that allows to combine results of multiple rules in arbitrary ways.

aryx · 2023-05-22T13:58:02Z

cc @kurt-r2c @LewisArdern @p4p3r

emjin · 2023-05-22T16:56:57Z

cc @spencerdrak

emjin · 2023-05-22T17:09:13Z

I'm not yet convinced that the best way to solve this is via a SQL post processing step. That's definitely potentially very powerful! But my concern is that, if 80% of the use cases for this step are "find the absence of a pattern", while 20% are very disparate cases, we could end up with something more complex than necessary for the 80% case, while not providing enough for the 20% cases.

That's just a hypothetical. My point is, I want to understand better what the post-processing use cases are.

I'm also not sure that all the negative pattern cases are nicely solvable with post-processing. For example, it seems like a common need in IaC is to look for, say, "all containers without resources specified". We can do this now using existing Semgrep, but given how common it is in IaC, it might be nice to provide some syntactic sugar for the use case.

Very happy to be having this conversation! We've had this friction for a long time.

gand3lf · 2023-05-22T20:39:20Z

Hi @emjin, I agree with you when you say "something more complex than necessary for the 80% case".
As Semgrep user, currently I think that a simple negative mode could be enough. The hard way, the group-by-style mode, maybe requires too much effort in absence of concrete examples.
The SQL group by uses these aggregate functions: COUNT, MAX, MIN, SUM, AVG. These functions, except the COUNT one, don't seem to fit with Semgrep purposes.
Moreover, with a group-by mode probably you need to introduce a new syntax for using conditions on aggregated results. It is definitely "too much".

As Semgrep user I would be very happy to have a simple negative mode that does not introduce new syntax and that perform the 3 steps I describe in the original issue:

execute the rule on each file
count the results
if the count is 0 then print message

Fast, easy and useful! 🥇

LewisArdern · 2023-06-14T01:11:22Z

@aryx @IagoAbal

I don't want us to go down a SQL route, but I could see value with some conditions:

Specific files

file type or file name exists
condition does not match (0 findings)

mark as finding

Based on more than one condition

rule one (the requirement) condition matches (one or more findings)
the negative pattern (the other condition) does not match (0 findings)

mark as a finding.

I have seen cases where we may want to look for the existence of one thing, and based on that report another. E.g. find the use of Express.js but the application is missing a specific middleware in the app.use.

IagoAbal · 2023-06-15T14:43:35Z

I don't want us to go down a SQL route, but I could see value with some conditions:

Let's change "SQL" to "something more general". What I wouldn't like is dozens of special modes that basically just handle specific post-processing cases of Semgrep results. I would rather have a language of "post-processing operators" that can be composed, same way as we have "matching operators".

aryx added exotic funky stuff feature:interfile priority:low labels Mar 22, 2023

ievans added the enhancement New feature or request label May 22, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Detect the lack of a pattern in the target project #7363

Detect the lack of a pattern in the target project #7363

gand3lf commented Mar 20, 2023 •

edited

aryx commented Mar 22, 2023

gand3lf commented Apr 27, 2023

aryx commented May 22, 2023

IagoAbal commented May 22, 2023

gand3lf commented May 22, 2023

IagoAbal commented May 22, 2023

aryx commented May 22, 2023

emjin commented May 22, 2023

emjin commented May 22, 2023 •

edited

gand3lf commented May 22, 2023 •

edited

LewisArdern commented Jun 14, 2023 •

edited

IagoAbal commented Jun 15, 2023 •

edited

Detect the lack of a pattern in the target project #7363

Detect the lack of a pattern in the target project #7363

Comments

gand3lf commented Mar 20, 2023 • edited

aryx commented Mar 22, 2023

gand3lf commented Apr 27, 2023

aryx commented May 22, 2023

IagoAbal commented May 22, 2023

gand3lf commented May 22, 2023

IagoAbal commented May 22, 2023

aryx commented May 22, 2023

emjin commented May 22, 2023

emjin commented May 22, 2023 • edited

gand3lf commented May 22, 2023 • edited

LewisArdern commented Jun 14, 2023 • edited

IagoAbal commented Jun 15, 2023 • edited

gand3lf commented Mar 20, 2023 •

edited

emjin commented May 22, 2023 •

edited

gand3lf commented May 22, 2023 •

edited

LewisArdern commented Jun 14, 2023 •

edited

IagoAbal commented Jun 15, 2023 •

edited