Python: Add Gradio models #16135

sylwia-budzynska · 2024-04-05T12:24:53Z

Gradio is the go-to web framework for creating machine learning demos. This pull request adds models for the Gradio framework.

Looking forward to your feedback!

python/ql/lib/semmle/python/frameworks/Gradio.qll

sylwia-budzynska · 2024-04-08T13:41:28Z

Hi team - I added example tests. Please let me know if anything could be improved!

yoff

Sorry we have taken so long to get to this. Here is a first round of comments. Since this is targeting the main tree directly, I should also go through and check the modelling properly, so there may be a second round.

I included a way to convert one of your tests to inline; we could probably do the other one also.

python/ql/lib/semmle/python/frameworks/Gradio.qll

yoff · 2024-04-26T15:55:52Z

python/ql/lib/semmle/python/frameworks/Gradio.qll

+            this = call.getKeywordParameter("inputs").getASuccessor().getAValueReachingSink()
+            or
+            this = call.getParameter(1).getASuccessor().getAValueReachingSink()


getASuccessor seems overly general, I think you would always want getASubscript. Did you observe other relevant labels here?

I changed getASuccessor to getASubscript in 944f884. What do you mean by relevant labels?

getASucessor is asking for any successor in the directed labelled graph that is the API graph. That is, follow an edge no matter what label it has. I believe that, in your situation, you only want the edges labelled subscript. I was asking if you had seen a need to follow an edge with a different label.

I tested the query with both getASucessor and getASubscript, and I see the results are the same. Like you said, I think I need only the edges labelled subscript, so let's go with that

python/ql/lib/semmle/python/frameworks/Gradio.qll

Co-authored-by: yoff <lerchedahl@gmail.com>

sylwia-budzynska · 2024-05-08T13:17:43Z

Thank you for the review @yoff! I addressed your comments.

I included a way to convert one of your tests to inline; we could probably do the other one also.

Where did you make this change?

yoff

Thank you for making the simplifications. They have now enabled new ones..

yoff · 2024-05-08T21:52:13Z

python/ql/test/library-tests/frameworks/gradio/source_test.ql

+import python
+import semmle.python.dataflow.new.RemoteFlowSources
+
+from RemoteFlowSource rfs
+select rfs


This test can be made inline in the following way:

Suggested change

import python

import semmle.python.dataflow.new.RemoteFlowSources

from RemoteFlowSource rfs

select rfs

import python

import semmle.python.dataflow.new.RemoteFlowSources

import TestUtilities.InlineExpectationsTest

private import semmle.python.dataflow.new.internal.PrintNode

module SourceTest implements TestSig {

string getARelevantTag() { result = "source" }

predicate hasActualResult(Location location, string element, string tag, string value) {

exists(location.getFile().getRelativePath()) and

exists(RemoteFlowSource rfs |

location = rfs.getLocation() and

element = rfs.toString() and

value = prettyNode(rfs) and

tag = "source"

)

}

}

import MakeTest<SourceTest>

You will then have to adjust the test files by adding annotations (eg. line 15 of source_test.py becomes def greet(name): # $ source=name, the test failures will tell you what to add where) and making source_test.expected empty.

Thank you! I added the annotations and fixed the test files in d6acea1. I'm unsure how to test the taint step though - is it fine as it is now?

Yes that looks good. Testing the taint step is a bit more difficult because it is private. We could add provenance to it and test it that way, but we can leave this for later.

python/ql/lib/semmle/python/frameworks/Gradio.qll

yoff · 2024-05-08T21:59:25Z

I included a way to convert one of your tests to inline; we could probably do the other one also.

Where did you make this change?

In source_test.ql, but I could not find it now(!?) It must have not made it into the review, so I have added it again...

yoff · 2024-05-08T22:00:48Z

Compilation fails because you need to format the code: https://github.com/github/codeql/actions/runs/9002043454/job/24729510202?pr=16135#step:5:8

Co-authored-by: yoff <lerchedahl@gmail.com>

sylwia-budzynska · 2024-05-10T11:46:31Z

Fixed the test files and formatting 🚀

yoff · 2024-05-13T18:39:29Z

The failing tests suggest that you need to merge in main (this is partly because we were so long getting to review this, sorry).

yoff

One question about the modelling, otherwise this looks fine.

yoff · 2024-05-13T19:16:31Z

python/ql/lib/semmle/python/frameworks/Gradio.qll

+                "Button", "Textbox", "UploadButton", "Slider", "JSON", "HTML", "Markdown", "File",
+                "AnnotatedImage", "Audio", "BarPlot", "Chatbot", "Checkbox", "CheckboxGroup",
+                "ClearButton", "Code", "ColorPicker", "Dataframe", "Dataset", "DownloadButton",
+                "Dropdown", "DuplicateButton", "FileExplorer", "Gallery", "HighlightedText",
+                "Image", "ImageEditor", "Label", "LinePlot", "LoginButton", "LogoutButton",
+                "Model3D", "Number", "ParamViewer", "Plot", "Radio", "ScatterPlot", "SimpleImage",


Are all these vulnerable? I quickly checked a few of them and it looks like:

HighlightedText only allows the user to specify substrings of a given text

ColorPicker only passes on hex strings

AnnotatedImage does not accept user input at all

We should probably only include actually vulnerable inputs here in order to avoid the query getting noisy.

It’s the event listeners that take untrusted user input, so, for example, gradio.AnnotatedImage.select(fn, inputs, outputs) (see docs) and that's what we model here.

To clarify - with the above models, we look for any event listeners of the listed classes. For example, gr.Button has only one, click() (see this example, which asks for a user's name and displays it). Side note that if an event listener does not exist for a given class, CodeQL doesn't (can't) match it (for example there is no gr.Button.select()). I found this way of modeling to be the most succinct and it doesn't slow performance, but I'm open to feedback around it.

I think it is fine to model situations that will never match, I was more worried about false positives. But on reflection, I think this is fine to start; we can refine it if we actually encounter any problems..

I've already found a few vulnerabilities using these models, including in stable-diffusion-webui (120k stars):
https://securitylab.github.com/advisories/GHSL-2024-010_stable-diffusion-webui/
https://securitylab.github.com/advisories/GHSL-2024-019_GHSL-2024-024_kohya_ss/
To be fair, I tested the AnnotatedImage.select() event listener (and several other event listeners on the list), and it does take user input, but it's not very popular to use, so I haven't actually seen vulnerabilities around it. Most of the vulnerabilities happen with Button.click() (including the 7 vulns I linked to above). I think there shouldn't be many false positives here, but I'll be happy to help if anything needs refining.

sylwia-budzynska · 2024-05-14T11:30:15Z

@yoff I just updated my branch to main and rerun the tests, but both return ERROR: No higher-order predicate with name doublyBoundedFastTC (DataFlowImpl.qll:3890,7-26). I'm not sure what that means, requesting assistance :)

yoff · 2024-05-14T11:43:32Z

@yoff I just updated my branch to main and rerun the tests, but both return ERROR: No higher-order predicate with name doublyBoundedFastTC (DataFlowImpl.qll:3890,7-26). I'm not sure what that means, requesting assistance :)

Is your local CLI up-to-date? The tests run on our CI infra structure reporting the expected diff.

sylwia-budzynska · 2024-05-14T12:33:26Z

You're right, I needed to update the CLI. There was a slight change in the results of the taint step test, likely due to the new CLI, so I updated the results in 34c4479.

Is it okay that the test results contain testFailures failures? Or does something need to be done so these do not appear in the results?

yoff · 2024-05-15T09:54:49Z

Is it okay that the test results contain testFailures failures? Or does something need to be done so these do not appear in the results?

Nothing to be done, this is what it should look like :-)

yoff

Just make the modelling classes private so that we can change them later without the whole deprecation dance (I guess none of these are meant to be depended on by users), then it looks good to go :-)

python/ql/lib/semmle/python/frameworks/Gradio.qll

Co-authored-by: yoff <lerchedahl@gmail.com>

sylwia-budzynska · 2024-05-15T13:16:48Z

@yoff Added your proposed changes 🚀

yoff

LGTM

Add Gradio models

bed0d56

sylwia-budzynska requested a review from a team as a code owner April 5, 2024 12:24

github-actions bot added documentation Python labels Apr 5, 2024

github-advanced-security bot found potential problems Apr 5, 2024

View reviewed changes

Fix QLdoc

ca7789d

github-advanced-security bot found potential problems Apr 5, 2024

View reviewed changes

python/ql/lib/semmle/python/frameworks/Gradio.qll Fixed Show fixed Hide fixed

sylwia-budzynska added 3 commits April 5, 2024 14:51

Fix decorator QLdoc

84d6956

Add change note

1129925

Add tests

5d94658

yoff requested changes Apr 26, 2024

View reviewed changes

yoff self-assigned this Apr 26, 2024

sylwia-budzynska and others added 3 commits May 8, 2024 14:26

Apply suggestions from code review

eaba798

Co-authored-by: yoff <lerchedahl@gmail.com>

Put GradioInterface models into GradioInput

8bb4193

Change getASuccessor() to getASubscript()

944f884

sylwia-budzynska requested a review from yoff May 8, 2024 13:17

yoff requested changes May 8, 2024

View reviewed changes

sylwia-budzynska and others added 2 commits May 10, 2024 12:07

Apply suggestions from code review

52ceb7f

Co-authored-by: yoff <lerchedahl@gmail.com>

Fix tests

d6acea1

sylwia-budzynska requested a review from yoff May 10, 2024 11:47

yoff reviewed May 13, 2024

View reviewed changes

Merge branch 'main' into gradio-model

f72afdc

Update test results

34c4479

yoff requested changes May 15, 2024

View reviewed changes

Change classes to private

72493a6

Co-authored-by: yoff <lerchedahl@gmail.com>

yoff approved these changes May 16, 2024

View reviewed changes

yoff merged commit 5076b1a into github:main May 16, 2024
14 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Python: Add Gradio models #16135

Python: Add Gradio models #16135

sylwia-budzynska commented Apr 5, 2024

sylwia-budzynska commented Apr 8, 2024

yoff left a comment

yoff Apr 26, 2024

sylwia-budzynska May 8, 2024

yoff May 8, 2024

sylwia-budzynska May 10, 2024

sylwia-budzynska commented May 8, 2024

yoff left a comment

yoff May 8, 2024

sylwia-budzynska May 10, 2024

yoff May 15, 2024

yoff commented May 8, 2024 •

edited

yoff commented May 8, 2024

sylwia-budzynska commented May 10, 2024 •

edited

yoff commented May 13, 2024

yoff left a comment

yoff May 13, 2024

sylwia-budzynska May 14, 2024

yoff May 15, 2024

sylwia-budzynska May 15, 2024

sylwia-budzynska commented May 14, 2024

yoff commented May 14, 2024

sylwia-budzynska commented May 14, 2024

yoff commented May 15, 2024

yoff left a comment

sylwia-budzynska commented May 15, 2024

yoff left a comment

-import python
-import semmle.python.dataflow.new.RemoteFlowSources
-from RemoteFlowSource rfs
-select rfs
+import python
+import semmle.python.dataflow.new.RemoteFlowSources
+import TestUtilities.InlineExpectationsTest
+private import semmle.python.dataflow.new.internal.PrintNode
+module SourceTest implements TestSig {
+  string getARelevantTag() { result = "source" }
+  predicate hasActualResult(Location location, string element, string tag, string value) {
+    exists(location.getFile().getRelativePath()) and
+    exists(RemoteFlowSource rfs |
+      location = rfs.getLocation() and
+      element = rfs.toString() and
+      value = prettyNode(rfs) and
+      tag = "source"
+    )
+  }
+}
+import MakeTest<SourceTest>

Python: Add Gradio models #16135

Python: Add Gradio models #16135

Conversation

sylwia-budzynska commented Apr 5, 2024

sylwia-budzynska commented Apr 8, 2024

yoff left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sylwia-budzynska commented May 8, 2024

yoff left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

yoff commented May 8, 2024 • edited

yoff commented May 8, 2024

sylwia-budzynska commented May 10, 2024 • edited

yoff commented May 13, 2024

yoff left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sylwia-budzynska commented May 14, 2024

yoff commented May 14, 2024

sylwia-budzynska commented May 14, 2024

yoff commented May 15, 2024

yoff left a comment

Choose a reason for hiding this comment

sylwia-budzynska commented May 15, 2024

yoff left a comment

Choose a reason for hiding this comment

yoff commented May 8, 2024 •

edited

sylwia-budzynska commented May 10, 2024 •

edited