Skip to content
This repository has been archived by the owner on Apr 6, 2023. It is now read-only.

feat: add SuggestionRequest to tracing logs #179

Merged
merged 1 commit into from Oct 26, 2021

Conversation

Dexterp37
Copy link
Contributor

@Dexterp37 Dexterp37 commented Oct 22, 2021

While the object is always logged in the tracing logs, the log_full_request setting controls whether or not to log the query string as well or replace it with an empty string.

Important note: this PR additionally adds request to the skip list of the tracing instrument, since the Http request would end up logging the raw query string from there.

With the option off (default):

 Oct 22 14:46:55.263  INFOmerino_adm::remote_settings: Completed syncing quicksuggest records from Remote Settings, type: "adm.remote-settings.sync-done"
    at merino-adm\src\remote_settings\mod.rs:192
    in merino_adm::remote_settings::sync
    in merino_web::providers::suggestion_provider_setup
    in merino_web::endpoints::suggest::suggest with query_parameters: Query(SuggestQueryParameters { client_variants: [], providers: None }), suggestion_request: SuggestionRequest { accepts_english: true, city: None, country: None, device_info: DeviceInfo { os_family: Windows, form_factor: Desktop, browser: Firefox(100) }, dma: None, query: "", region: None }
    in tracing_actix_web_mozlog::middleware::request with method: GET, path: /api/v1/suggest, rid: 8835f9df-8c53-418b-b0d7-9bc145e24315, agent: "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:100.0) Gecko/20100101 Firefox/100.0"

With the option on:

  Oct 22 14:33:44.468  INFOmerino_adm::remote_settings: Completed syncing quicksuggest records from Remote Settings, type: "adm.remote-settings.sync-done"
    at merino-adm\src\remote_settings\mod.rs:192
    in merino_adm::remote_settings::sync
    in merino_web::providers::suggestion_provider_setup
    in merino_web::endpoints::suggest::suggest with query_parameters: Query(SuggestQueryParameters { client_variants: [], providers: None }), suggestion_request: SuggestionRequest { accepts_english: true, city: None, country: None, device_info: DeviceInfo { os_family: Windows, form_factor: Desktop, browser: Firefox(100) }, dma: None, query: "banana", region: None }
    in tracing_actix_web_mozlog::middleware::request with method: GET, path: /api/v1/suggest, rid: 1078df3e-9ad1-4294-8bcf-4df7fa8ae3ba, agent: "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:100.0) Gecko/20100101 Firefox/100.0"

Fixes #171

@Dexterp37 Dexterp37 requested a review from a team as a code owner October 22, 2021 12:48
@Dexterp37 Dexterp37 self-assigned this Oct 22, 2021
@jrconlin
Copy link
Member

I'm a little skiddish about this, because search terms are VERY sensitive information, but I understand the need for it for testing.
Is there any way we can either enforce that this is not enabled in production or add some reporting flag to __heartbeat__ to indicate that it's turned off?

@Dexterp37
Copy link
Contributor Author

Dexterp37 commented Oct 22, 2021

I'm a little skiddish about this, because search terms are VERY sensitive information, but I understand the need for it for testing.

I totally understand it and share the concern! It would be interesting to hear from @mythmon how he thought about using this feature.

Also please note that search queries are already leaked (if I'm not wrong).

Is there any way we can either enforce that this is not enabled in production or add some reporting flag to __heartbeat__ to indicate that it's turned off?

I believe we can do either or both! :-D Maybe we could only allow this if we're in 'debug' mode, or add a check in the deployment logic to fail deployment on prod if this is turned on?

@mythmon , what do you think of this feature? Did you expect it to behave differently when writing the ticket?

@mythmon
Copy link
Contributor

mythmon commented Oct 25, 2021

I'm a little skiddish about this, because search terms are VERY sensitive information, but I understand the need for it for testing.
Is there any way we can either enforce that this is not enabled in production or add some reporting flag to heartbeat to indicate that it's turned off?

This is not a testing feature. We are intentionally planning to log these in production for real users. They are very sensitive, but we are going to be treating all of Merino's logs as sensitive, and handling them appropriately. In short, the data pipeline is going to consume these logs, sanitize them, and aggregate them in a very tightly controlled environment.

@jrconlin
Copy link
Member

Ah. In that case, we might want to add that as a comment above the option description, ideally with pointers to the code where we sanitize the logs. We're building trust, so we should be certain to provide a way to verify.

@Dexterp37
Copy link
Contributor Author

I'm a little skiddish about this, because search terms are VERY sensitive information, but I understand the need for it for testing.
Is there any way we can either enforce that this is not enabled in production or add some reporting flag to heartbeat to indicate that it's turned off?

... but we are going to be treating all of Merino's logs as sensitive...

Ah, I thought we settled for the all-but-search queries as sensitive, so that engineers could use non sensitive logs to help debug?

@mythmon
Copy link
Contributor

mythmon commented Oct 25, 2021

It's still up in the air, but I expect we will have sanitized versions of these logs that engineers can use to debug that are still representative of user behavior.

Copy link
Contributor

@mythmon mythmon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This new setting should be added to the Operational documentation in docs/ops.md.

While the object is always logged in the tracing logs,
the log_full_request setting controls whether or not
to log the query string as well or replace it with an
empty string.
@Dexterp37 Dexterp37 merged commit b56833a into mozilla-services:main Oct 26, 2021
@Dexterp37 Dexterp37 deleted the consvc-1317 branch October 26, 2021 16:10
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add option to configurably log suggestion requests from Merino
3 participants