feat: Automatic sanitization of sensitive data in the output (#1842)
* feat: Automatic output sanitization to obscure sensitive data by default

Ref: #1794

* test: more stable test

* test: CLI test

* test: reuse setup code

* test: more tests

* chore: support pytest

* chore: mask config

* chore: mask config

* chore: do not use hooks

* docs: update

* chore: naming
Stranger6667 authored Oct 18, 2023
1 parent 34f2bcd commit bde2ec7
Showing 23 changed files with 926 additions and 98 deletions.
10 changes: 10 additions & 0 deletions docs/api.rst
@@ -107,6 +107,16 @@ Loaders
.. autofunction:: schemathesis.graphql.from_url
.. autofunction:: schemathesis.graphql.from_wsgi

Sanitizing Output
~~~~~~~~~~~~~~~~~

.. autoclass:: schemathesis.sanitization.Config()

.. automethod:: with_keys_to_sanitize
.. automethod:: without_keys_to_sanitize
.. automethod:: with_sensitive_markers
.. automethod:: without_sensitive_markers

Schema
~~~~~~

4 changes: 3 additions & 1 deletion docs/changelog.rst
@@ -13,8 +13,9 @@ Changelog
- Automatic FastAPI fixup injecting for ASGI loaders, eliminating the need for manual setup. `#1797`_
- Support for ``body`` hooks in GraphQL schemas, enabling custom filtering or modification of queries and mutations. `#1464`_
- New ``filter_operations`` hook to conditionally include or exclude specific API operations from being tested.
- Introduced a new CLI option ``--experimental=openapi-3.1`` for experimental support of OpenAPI 3.1. This enables compatible JSON Schema validation for responses, while data generation remains OpenAPI 3.0-compatible. `#1820`_
- Added ``contains`` method to ``ParameterSet`` for easier parameter checks in hooks. `#1789`_
- Automatic sanitization of sensitive data in the output is now enabled by default. This feature can be disabled using the ``--sanitize-output=false`` CLI option. For more advanced customization, use ``schemathesis.sanitization.configure()``. `#1794`_
- ``--experimental=openapi-3.1`` CLI option for experimental support of OpenAPI 3.1. This enables compatible JSON Schema validation for responses, while data generation remains OpenAPI 3.0-compatible. `#1820`_

**Note**: Experimental features can change or be removed in any minor version release.

@@ -3476,6 +3477,7 @@ Deprecated
.. _#1802: https://github.com/schemathesis/schemathesis/issues/1802
.. _#1801: https://github.com/schemathesis/schemathesis/issues/1801
.. _#1797: https://github.com/schemathesis/schemathesis/issues/1797
.. _#1794: https://github.com/schemathesis/schemathesis/issues/1794
.. _#1789: https://github.com/schemathesis/schemathesis/issues/1789
.. _#1788: https://github.com/schemathesis/schemathesis/issues/1788
.. _#1783: https://github.com/schemathesis/schemathesis/issues/1783
1 change: 1 addition & 0 deletions docs/index.rst
@@ -217,6 +217,7 @@ User's Guide
contrib
stateful
how
sanitizing
compatibility
examples
graphql
45 changes: 45 additions & 0 deletions docs/sanitizing.rst
@@ -0,0 +1,45 @@
.. _sanitizing-output:

Sanitizing Output
=================

Schemathesis automatically sanitizes sensitive data in both the generated test case and the received response to prevent accidental exposure of sensitive information.
This feature replaces certain headers, cookies, and other fields that could contain sensitive data with the string ``[Filtered]``.

.. note::

    Schemathesis does not sanitize sensitive data in response bodies due to the challenge of preserving the original formatting of the payload.

You can control this feature through the ``--sanitize-output`` CLI option:

.. code-block:: bash

    schemathesis run --sanitize-output=false ...

Or in Python tests:

.. code-block:: python

    schema = schemathesis.from_dict({...}, sanitize_output=False)

Disabling this option will turn off the automatic sanitization of sensitive data in the output.

For more advanced customization of the sanitization process, you can define your own sanitization configuration and pass it to the ``configure`` function.
Here's how you could do it:

.. code-block:: python

    import schemathesis

    # Create a custom config
    custom_config = (
        schemathesis.sanitization.Config(replacement="[Custom]")
        .with_keys_to_sanitize("X-Customer-ID")
        .with_sensitive_markers("address")
    )

    # Configure Schemathesis to use your custom sanitization configuration
    schemathesis.sanitization.configure(custom_config)

This will sanitize the ``X-Customer-ID`` headers (case-insensitive), and any fields containing the substring "address" (case-insensitive) in their names, with the string "[Custom]" in the generated test case and the received response.
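
To make the matching rules concrete, here is a minimal, self-contained sketch of exact-key and substring-marker sanitization. It is not the Schemathesis implementation: ``SketchConfig``, its default keys and markers, and ``sanitize_mapping`` are hypothetical names chosen for illustration, and only the chained ``with_*`` style and the ``replacement`` argument mirror the documented API.

```python
from dataclasses import dataclass, replace
from typing import Dict, FrozenSet


@dataclass(frozen=True)
class SketchConfig:
    """Hypothetical stand-in for a sanitization config; not the Schemathesis class."""

    # Assumed defaults for this sketch only
    keys_to_sanitize: FrozenSet[str] = frozenset({"authorization", "cookie"})
    sensitive_markers: FrozenSet[str] = frozenset({"token", "secret"})
    replacement: str = "[Filtered]"

    def with_keys_to_sanitize(self, *keys: str) -> "SketchConfig":
        # Return a new config instead of mutating, so calls can be chained
        lowered = {key.lower() for key in keys}
        return replace(self, keys_to_sanitize=self.keys_to_sanitize | lowered)

    def with_sensitive_markers(self, *markers: str) -> "SketchConfig":
        lowered = {marker.lower() for marker in markers}
        return replace(self, sensitive_markers=self.sensitive_markers | lowered)


def sanitize_mapping(data: Dict[str, str], config: SketchConfig) -> None:
    """Replace sensitive values in place, matching names case-insensitively."""
    for name in list(data):
        lowered = name.lower()
        exact = lowered in config.keys_to_sanitize
        marked = any(marker in lowered for marker in config.sensitive_markers)
        if exact or marked:
            data[name] = config.replacement


config = (
    SketchConfig(replacement="[Custom]")
    .with_keys_to_sanitize("X-Customer-ID")
    .with_sensitive_markers("address")
)
headers = {"x-customer-id": "42", "Billing-Address": "1 Main St", "Accept": "*/*"}
sanitize_mapping(headers, config)
```

In this sketch ``x-customer-id`` matches an exact key, ``Billing-Address`` matches the ``address`` marker, and ``Accept`` is left untouched.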
34 changes: 26 additions & 8 deletions docs/service.rst
@@ -122,14 +122,32 @@ Each failure is accompanied by a cURL snippet you can use to reproduce the issue

.. image:: https://raw.githubusercontent.com/schemathesis/schemathesis/master/img/service_server_error.png

Alternatively, you can use the **Replay** button on the failure page.

What data is sent?
What Data is Sent?
------------------

CLI sends info to Schemathesis.io in the following cases:
The following data is included in the reports sent to Schemathesis.io by the CLI:

- **Metadata**:

- Information about your host machine to help us understand our users better.
- Collected data includes your Python interpreter version, implementation, system/OS name, and release.

- **Test Runs**:

- Most of the Schemathesis runner's events are included, encompassing all generated data and explicitly passed headers.
- Sensitive data within the generated test cases and received responses is automatically sanitized by default, replaced with the string ``[Filtered]`` to prevent accidental exposure.
- Further information on what is considered sensitive and how it is sanitized can be found at :ref:`Sanitizing Output <sanitizing-output>`.

- **Environment Variables**:

- Some environment variables specific to CI providers are included.
- These are used to comment on pull requests.

- **Command-Line Options**:

- Command-line options without free-form values are sent to help us understand how you use the CLI.
- Rest assured, any sensitive data passed through command-line options is sanitized by default.

For more details on our data handling practices, please refer to our `Privacy Policy <https://schemathesis.io/legal/privacy>`_. If you have further questions or concerns about data handling, feel free to contact us at `support@schemathesis.io <mailto:support@schemathesis.io>`_.

- Authentication. Metadata about your host machine, that helps us to understand our users better. We collect your Python interpreter version, implementation, system/OS name and release. For more information look at ``service/metadata.py``
- Test runs. Most of Schemathesis runner's events, including all generated data and explicitly passed headers. For more information look at ``service/serialization.py``
- Some environment variables specific to CI providers. We use them to comment on pull requests.
- Command-line options without free-form values. It helps us to understand how you use the CLI.
For information on data access, retention, and deletion, please refer to the `FAQ section <https://docs.schemathesis.io/faq>`_ in our SaaS documentation.
13 changes: 13 additions & 0 deletions src/schemathesis/cli/__init__.py
@@ -55,6 +55,7 @@
from .handlers import EventHandler
from .junitxml import JunitXMLHandler
from .options import CsvChoice, CsvEnumChoice, CustomHelpMessageChoice, NotSet, OptionalInt
from .sanitization import SanitizationHandler

try:
from yaml import CSafeLoader as SafeLoader
@@ -501,6 +502,13 @@ class ReportToService:
help="Force Schemathesis to parse the input schema with the specified spec version.",
type=click.Choice(["20", "30"]),
)
@click.option(
    "--sanitize-output",
    type=bool,
    default=True,
    show_default=True,
    help="Enable or disable automatic output sanitization to obscure sensitive data.",
)
@click.option(
"--contrib-unique-data",
"contrib_unique_data",
@@ -665,6 +673,7 @@ def run(
stateful: Optional[Stateful] = None,
stateful_recursion_limit: int = DEFAULT_STATEFUL_RECURSION_LIMIT,
force_schema_version: Optional[str] = None,
sanitize_output: bool = True,
contrib_unique_data: bool = False,
contrib_openapi_formats_uuid: bool = False,
hypothesis_database: Optional[str] = None,
@@ -838,6 +847,7 @@ def run(
code_sample_style=code_sample_style,
data_generation_methods=data_generation_methods,
debug_output_file=debug_output_file,
sanitize_output=sanitize_output,
host_data=host_data,
client=client,
report=report,
@@ -1137,6 +1147,7 @@ def execute(
code_sample_style: CodeSampleStyle,
data_generation_methods: Tuple[DataGenerationMethod, ...],
debug_output_file: Optional[click.utils.LazyFile],
sanitize_output: bool,
host_data: service.hosts.HostData,
client: Optional[service.ServiceClient],
report: Optional[Union[ReportToService, click.utils.LazyFile]],
@@ -1190,6 +1201,8 @@ def execute(
cassettes.CassetteWriter(cassette_path, preserve_exact_body_bytes=cassette_preserve_exact_body_bytes)
)
handlers.append(get_output_handler(workers_num))
if sanitize_output:
    handlers.insert(0, SanitizationHandler())
execution_context = ExecutionContext(
hypothesis_settings=hypothesis_settings,
workers_num=workers_num,
15 changes: 15 additions & 0 deletions src/schemathesis/cli/sanitization.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,15 @@
from dataclasses import dataclass

from ..runner import events
from ..sanitization import sanitize_serialized_check, sanitize_serialized_interaction
from .handlers import EventHandler, ExecutionContext


@dataclass
class SanitizationHandler(EventHandler):
    def handle_event(self, context: ExecutionContext, event: events.ExecutionEvent) -> None:
        if isinstance(event, events.AfterExecution):
            for check in event.result.checks:
                sanitize_serialized_check(check)
            for interaction in event.result.interactions:
                sanitize_serialized_interaction(interaction)
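
The handler above mutates events in place, which is presumably why the CLI registers it with ``handlers.insert(0, ...)``: every handler that runs later then sees already-sanitized data. The following sketch illustrates that ordering under simplified assumptions. ``Event``, ``Handler``, ``SanitizingHandler``, and ``RecordingHandler`` are hypothetical stand-ins, not Schemathesis classes, and the rule that only ``Authorization`` is sensitive is assumed for brevity.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Event:
    """Hypothetical event carrying request headers; stands in for runner events."""

    headers: Dict[str, str]


class Handler:
    def handle_event(self, event: Event) -> None:
        raise NotImplementedError


class SanitizingHandler(Handler):
    def handle_event(self, event: Event) -> None:
        # Mutates the event in place, so later handlers see the sanitized data
        for name in event.headers:
            if name.lower() == "authorization":
                event.headers[name] = "[Filtered]"


@dataclass
class RecordingHandler(Handler):
    seen: List[Dict[str, str]] = field(default_factory=list)

    def handle_event(self, event: Event) -> None:
        # Snapshot what an output or reporting handler would display
        self.seen.append(dict(event.headers))


recorder = RecordingHandler()
handlers: List[Handler] = [recorder]
handlers.insert(0, SanitizingHandler())  # must run before any output handler

event = Event(headers={"Authorization": "Bearer secret", "Accept": "*/*"})
for handler in handlers:
    handler.handle_event(event)
```

If the sanitizing handler were appended instead of inserted at index 0, the recording handler in this sketch would capture the raw ``Authorization`` value.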
4 changes: 4 additions & 0 deletions src/schemathesis/lazy.py
@@ -50,6 +50,7 @@ class LazySchema:
data_generation_methods: Union[DataGenerationMethodInput, NotSet] = NOT_SET
code_sample_style: CodeSampleStyle = CodeSampleStyle.default()
rate_limiter: Optional[Limiter] = None
sanitize_output: bool = True

def hook(self, hook: Union[str, Callable]) -> Callable:
return self.hooks.register(hook)
@@ -116,6 +117,7 @@ def wrapped_test(request: FixtureRequest) -> None:
code_sample_style=_code_sample_style,
app=self.app,
rate_limiter=self.rate_limiter,
sanitize_output=self.sanitize_output,
)
fixtures = get_fixtures(test, request, given_kwargs)
# Changing the node id is required for better reporting - the method and path will appear there
@@ -276,6 +278,7 @@ def get_schema(
data_generation_methods: Union[DataGenerationMethodInput, NotSet] = NOT_SET,
code_sample_style: CodeSampleStyle,
rate_limiter: Optional[Limiter],
sanitize_output: bool,
) -> BaseSchema:
"""Loads a schema from the fixture."""
schema = request.getfixturevalue(name)
@@ -296,6 +299,7 @@ def get_schema(
data_generation_methods=data_generation_methods,
code_sample_style=code_sample_style,
rate_limiter=rate_limiter,
sanitize_output=sanitize_output,
)


4 changes: 4 additions & 0 deletions src/schemathesis/models.py
@@ -57,6 +57,7 @@
)
from .hooks import GLOBAL_HOOK_DISPATCHER, HookContext, HookDispatcher, dispatch
from .parameters import Parameter, ParameterSet, PayloadAlternatives
from .sanitization import sanitize_request, sanitize_response
from .serializers import Serializer, SerializerContext
from .types import Body, Cookies, FormData, Headers, NotSet, PathParameters, Query
from .utils import (
@@ -471,6 +472,9 @@ def validate_response(
else self.operation.schema.code_sample_style
)
verify = getattr(response, "verify", True)
if self.operation.schema.sanitize_output:
    sanitize_request(response.request)
    sanitize_response(response)
code_message = self._get_code_message(code_sample_style, response.request, verify=verify)
payload = get_response_payload(response)
raise exception_cls(
Expand Down
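
In the ``validate_response`` hunk, sanitization runs before the reproduction code sample is built, so the failure message never embeds the raw secret. A rough sketch of that order of operations follows. ``SketchRequest``, ``sanitize_headers``, and ``to_curl`` are hypothetical helpers rather than Schemathesis functions, and treating only ``Authorization`` as sensitive is an assumption of this example.

```python
import shlex
from dataclasses import dataclass
from typing import Dict


@dataclass
class SketchRequest:
    """Hypothetical minimal request; not the `requests` or Schemathesis type."""

    method: str
    url: str
    headers: Dict[str, str]


def sanitize_headers(request: SketchRequest, replacement: str = "[Filtered]") -> None:
    # Assumed rule for this sketch: only the Authorization header is sensitive
    for name in request.headers:
        if name.lower() == "authorization":
            request.headers[name] = replacement


def to_curl(request: SketchRequest) -> str:
    # Build a shell-safe cURL command from the (already sanitized) request
    parts = ["curl", "-X", request.method]
    for name, value in request.headers.items():
        parts.extend(["-H", f"{name}: {value}"])
    parts.append(request.url)
    return " ".join(shlex.quote(part) for part in parts)


request = SketchRequest("GET", "http://127.0.0.1/api/users", {"Authorization": "Bearer secret"})
sanitize_headers(request)  # sanitize first ...
command = to_curl(request)  # ... then render the reproduction snippet
```

Because the request object is modified in place, anything derived from it afterwards, including the rendered command, contains the placeholder instead of the original value.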