
Module boto3 - Taint Propagation Issue #795

Closed
giusepperaffa opened this issue Sep 21, 2023 · 3 comments

@giusepperaffa
Description
I have been trying to use Pysa (Ubuntu 20.04 + virtual environment + Python 3.8) to perform a data flow analysis of the following code (stored in a file called httphandler.py). The expected result is a data flow that has event (defined in the parameter list of the handler onHTTPPostEvent) as source and bucket.objects.all().delete() as sink.

from mypy_boto3_s3 import S3ServiceResource
import boto3

def onHTTPPostEvent(event, context) -> None:
    s3: S3ServiceResource = boto3.resource('s3')
    bucket = s3.Bucket(event['body'])
    bucket.objects.all().delete()

Pysa does not detect any data flow and I am struggling to understand how this can be solved, if at all. Details on used Pysa models, debugging and resolution attempts are provided below.

Pysa models
These are the models that I tried initially. All of them are considered valid by Pysa. Note that none of the models focuses on the Bucket object, as I was hoping to rely on automatic propagation of the taint attached to event['body']. Models 2 and 3 below, which were written using the information available here, are alternatives to each other.

  1. def httphandler.onHTTPPostEvent(event: TaintSource[Test]): ...
  2. def mypy_boto3_s3.service_resource.BucketObjectsCollection.all() -> TaintSink[Test]: ...
  3. def mypy_boto3_s3.service_resource.ObjectSummary.delete() -> TaintSink[Test]: ...
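For completeness, the Test source and sink referenced by these models also need to be declared in taint.config along with a rule connecting them. A minimal sketch (the rule name, code, and message are assumptions, not taken from the thread):

```json
{
  "sources": [
    { "name": "Test", "comment": "test source used in the models above" }
  ],
  "sinks": [
    { "name": "Test", "comment": "test sink used in the models above" }
  ],
  "features": [],
  "rules": [
    {
      "name": "Test flow",
      "code": 5001,
      "sources": ["Test"],
      "sinks": ["Test"],
      "message_format": "Data from [{$sources}] source(s) may reach [{$sinks}] sink(s)"
    }
  ]
}
```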

Debugging steps
As suggested here, I have instrumented my code with the functions reveal_taint and reveal_type. Results:

  • event['body']: forward taint Test / No backward taint ({}). This is what I expected.
  • bucket: No forward taint ({}) / No backward taint ({}). As for the type, it is correctly identified as mypy_boto3_s3.service_resource.Bucket

The conclusion is that the taint of event['body'] is not propagated to the object bucket, which explains why the expected data flow is not found.

Attempts to propagate the taint
Taking into account the information on TITO, I have attempted the following:

Attempt 1
I have attempted to propagate the taint by modelling the constructor of Bucket as follows:

  • def mypy_boto3_s3.service_resource.Bucket.__init__(self, name: TaintInTaintOut[LocalReturn]): ...
  • def mypy_boto3_s3.service_resource.Bucket.__init__(self, *args: TaintInTaintOut[LocalReturn]): ...

Neither of the above models was considered valid by Pysa, which suggested modelling the constructor of the base class (object) instead. I have therefore tried the following models, but none of them was considered valid by Pysa because the name parameter was unexpected.

  • def object.__init__(self, name: TaintInTaintOut[LocalReturn]): ...
  • def object.__init__(self, *args: TaintInTaintOut[LocalReturn]): ...
  • def object.__init__(self, **kwargs: TaintInTaintOut[LocalReturn]): ...

Attempt 2
The boto3-stubs documentation shows here that name is an attribute of Bucket. The following model is, in fact, considered valid by Pysa:

  • mypy_boto3_s3.service_resource.Bucket.name: TaintSource[Test]

However, the above model does not achieve the propagation of the taint from name to the Bucket object. Consequently, I have tried forcing the propagation with the models below, but neither of them worked because TaintInTaintOut is not supported in models that contain only attributes.

  • mypy_boto3_s3.service_resource.Bucket.name: TaintInTaintOut[Updates[self]]
  • mypy_boto3_s3.service_resource.Bucket.name: TaintInTaintOut[Updates[mypy_boto3_s3.service_resource.Bucket]]

Conclusion
The boto3.resource object is rather complex, and relying on automatic taint propagation is not an option. I would be keen to know whether there is a way of detecting the expected data flow in the above case or if this is a Pysa limitation / bug.

Please let me know if you need any additional information.

Thank you very much.

@arthaud
Contributor

arthaud commented Sep 22, 2023

Hi @giusepperaffa,

You are definitely on the right track. The problem is that bucket is not tainted when it should be.
The first thing I would do is check the call graph for onHTTPPostEvent. This can be found in the call-graph.json file in the result directory (use --save-results-to).
Then, depending on what that line bucket = s3.Bucket(event['body']) calls, I would check the model for the callee in the taint-output.json file. This can be done with the model explorer: https://pyre-check.org/docs/pysa-explore/
If you want me to take a look, please send me all the files in the result directory (--save-results-to) as an archive/zip/whatever.

Note that there is another problem that you will hit next. Those models:

def mypy_boto3_s3.service_resource.BucketObjectsCollection.all() -> TaintSink[Test]: ...
def mypy_boto3_s3.service_resource.ObjectSummary.delete() -> TaintSink[Test]: ...

You are marking the result of a function as a sink (something we call "return sinks"). This will only impact the analysis of the bodies of delete and all themselves:

# within mypy_boto3_s3.service_resource
class ObjectSummary:
  def delete(self):
     return something # here the return is considered a sink

This is not what you want. What you want is to mark self as flowing into a sink:

def mypy_boto3_s3.service_resource.ObjectSummary.delete(self: TaintSink[Test]): ...
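The distinction can be sketched with a toy taint tracker in plain Python (purely illustrative; this is not how Pysa works internally, and the function names are hypothetical):

```python
# Toy taint tracker: a value carries a set of taint labels.
# It shows why a "return sink" on delete() misses the flow,
# while a sink on the receiver (self/obj) catches it.
findings = []

class Value:
    def __init__(self, labels=()):
        self.labels = set(labels)

def delete_with_return_sink(obj):
    result = Value()           # untainted return value
    if result.labels:          # a return sink only inspects what delete() returns
        findings.append("return sink hit")
    return result

def delete_with_param_sink(obj):
    if "Test" in obj.labels:   # a parameter sink inspects the tainted receiver
        findings.append("param sink hit")

tainted = Value({"Test"})      # stands in for the tainted bucket
delete_with_return_sink(tainted)
delete_with_param_sink(tainted)
print(findings)                # only the parameter sink fires
```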

@giusepperaffa
Author

Hi @arthaud,

Thank you very much for your help. I have now solved the problem, and this issue can be closed. A few details for future reference.

Analysis of the call graph
There are two ways of printing the call graph, which rely on the functions pyre_dump() and pyre_dump_call_graph() mentioned in the Pysa documentation here. Note: in my case, neither of them generates the above-mentioned call-graph.json file in the results directory, despite executing Pysa with the --save-results-to option. This might be due to the particular version of Pysa that I am using (installed in March 2022).

Case 1 - pyre_dump()
This is the function that provides the most verbose output. I simply called it within the function body as shown below:

from mypy_boto3_s3 import S3ServiceResource
import boto3

def onHTTPPostEvent(event, context) -> None:
    pyre_dump()
    s3: S3ServiceResource = boto3.resource('s3')
    bucket = s3.Bucket(event['body'])
    bucket.objects.all().delete()

Case 2 - pyre_dump_call_graph()
This is the function that provides the least verbose output. I simply called it within the function body as shown below:

from mypy_boto3_s3 import S3ServiceResource
import boto3

def onHTTPPostEvent(event, context) -> None:
    pyre_dump_call_graph()
    s3: S3ServiceResource = boto3.resource('s3')
    bucket = s3.Bucket(event['body'])
    bucket.objects.all().delete()

Model for the source
The analysis of the call graph was enough to understand how to propagate the taint. This was the missing model:

def mypy_boto3_s3.service_resource.S3ServiceResource.Bucket(self, name: TaintInTaintOut[LocalReturn]): ...

Model for the sink
The model for the sink was also corrected, as suggested above. The following models are both considered valid by Pysa:

  • def mypy_boto3_s3.service_resource.ObjectSummary.delete(self: TaintSink[Test]): ...
  • def mypy_boto3_s3.service_resource.BucketObjectsCollection.all(self: TaintSink[Test]): ...

However, it is the model for BucketObjectsCollection that identifies the expected sink, which again can be understood by looking at the call graph. Pysa does not complain if the model for ObjectSummary is used as well, but that model does not detect the expected sink.
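Putting the pieces together, the complete model file implied by this thread would look roughly as follows (a sketch assembled from the models quoted above; the file name is an assumption):

```
# httphandler.pysa (sketch)

# Source: the handler's event parameter
def httphandler.onHTTPPostEvent(event: TaintSource[Test]): ...

# TITO: taint on `name` flows into the returned Bucket
def mypy_boto3_s3.service_resource.S3ServiceResource.Bucket(self, name: TaintInTaintOut[LocalReturn]): ...

# Sink: the tainted bucket's objects collection
def mypy_boto3_s3.service_resource.BucketObjectsCollection.all(self: TaintSink[Test]): ...
```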

@arthaud arthaud closed this as completed Sep 22, 2023
@arthaud
Contributor

arthaud commented Sep 22, 2023

Note: in my case, neither of them generates the above-mentioned call-graph.json file in the results directory, despite executing Pysa with the --save-results-to option. This might be due to the particular version of Pysa that I am using (installed in March 2022).

Just for clarification: this is most likely because call-graph.json is a somewhat new feature. The latest pyre-check package might not include it yet.
