Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add sanitize_field_names spec #334

Merged
merged 1 commit into from Sep 15, 2020

Conversation

felixbarny
Copy link
Member

@felixbarny felixbarny commented Aug 26, 2020

Supersedes #319

Note that the implementation of this spec also includes adding the option to Kibana.

Agent Milestone Link to agent implementation issue
.NET n/a already implemented according to spec
Go ? elastic/apm-agent-go#821
Java n/a already implemented according to spec
Node.js 7.11 elastic/apm-agent-nodejs#1871
PHP ? elastic/apm-agent-php#161
Python 7.11 elastic/apm-agent-python#904 (central config missing)
Ruby 7.10 elastic/apm-agent-ruby#857
RUM n/a

@apmmachine
Copy link
Collaborator

apmmachine commented Aug 26, 2020

💔 Build Failed

Pipeline View Test View Changes Artifacts preview

Expand to view the summary

Build stats

  • Build Cause: [Started by timer]

  • Start Time: 2020-09-14T05:06:00.762+0000

  • Duration: 3 min 31 sec

Steps errors

Expand to view the steps failures

  • Name: Shell Script
    • Description: [2020-09-14T05:08:30.627Z] + git diff --name-only ae241d3759ec5e6b8c82e85f8ffac6b70a59fb53...cc3043d

    • Duration: 0 min 0 sec

    • Start Time: 2020-09-14T05:08:30.338+0000

    • log

Log output

Expand to view the last 100 lines of log output

[2020-09-14T05:06:05.482Z] To remove this log message, move the test code outside of src/. To restore the previous behavior that allowed access to files in src/test/, pass -Dorg.jenkinsci.plugins.workflow.libs.SCMSourceRetriever.INCLUDE_SRC_TEST_IN_LIBRARIES=true to the java command used to start Jenkins.
[2020-09-14T05:06:21.088Z] Still waiting to schedule task
[2020-09-14T05:06:21.088Z] Waiting for next available executor on ‘linux&&immutable’
[2020-09-14T05:07:54.083Z] Running on apm-ci-immutable-ubuntu-1804-1600059969297340462 in /var/lib/jenkins/workspace/ared_apm-update-specs-mbp_PR-334
[2020-09-14T05:07:54.214Z] �[39;49m[INFO] Override default checkout�[0m
[2020-09-14T05:07:54.271Z] Sleeping for 10 sec
[2020-09-14T05:08:06.934Z] using credential f6c7695a-671e-4f4f-a331-acdce44ff9ba
[2020-09-14T05:08:06.957Z] Wiping out workspace first.
[2020-09-14T05:08:06.987Z] Cloning the remote Git repository
[2020-09-14T05:08:06.987Z] Using shallow clone with depth 4
[2020-09-14T05:08:06.987Z] Avoid fetching tags
[2020-09-14T05:08:07.011Z] Cloning repository git@github.com:elastic/apm.git
[2020-09-14T05:08:07.069Z]  > git init /var/lib/jenkins/workspace/ared_apm-update-specs-mbp_PR-334 # timeout=10
[2020-09-14T05:08:07.127Z] Fetching upstream changes from git@github.com:elastic/apm.git
[2020-09-14T05:08:07.127Z]  > git --version # timeout=10
[2020-09-14T05:08:07.134Z]  > git --version # 'git version 2.17.1'
[2020-09-14T05:08:07.135Z] using GIT_SSH to set credentials GitHub user @elasticmachine SSH key
[2020-09-14T05:08:07.163Z]  > git fetch --no-tags --progress -- git@github.com:elastic/apm.git +refs/heads/*:refs/remotes/origin/* # timeout=15
[2020-09-14T05:08:07.927Z] Cleaning workspace
[2020-09-14T05:08:07.954Z] Using shallow fetch with depth 4
[2020-09-14T05:08:07.954Z] Pruning obsolete local branches
[2020-09-14T05:08:08.541Z] Merging remotes/origin/master commit c51d0084047222398a517d7b02814ebb142cfa00 into PR head commit cc3043d1fb2a97c306250b017e3b615c0ca30da6
[2020-09-14T05:08:08.667Z] Merge succeeded, producing 6f1f38cc223f48aaf6a648e06a6958b4344e4374
[2020-09-14T05:08:08.667Z] Checking out Revision 6f1f38cc223f48aaf6a648e06a6958b4344e4374 (PR-334)
[2020-09-14T05:08:07.903Z]  > git config remote.origin.url git@github.com:elastic/apm.git # timeout=10
[2020-09-14T05:08:07.907Z]  > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
[2020-09-14T05:08:07.920Z]  > git config remote.origin.url git@github.com:elastic/apm.git # timeout=10
[2020-09-14T05:08:07.930Z]  > git rev-parse --verify HEAD # timeout=10
[2020-09-14T05:08:07.940Z] No valid HEAD. Skipping the resetting
[2020-09-14T05:08:07.941Z]  > git clean -fdx # timeout=10
[2020-09-14T05:08:07.959Z] Fetching upstream changes from git@github.com:elastic/apm.git
[2020-09-14T05:08:07.960Z] using GIT_SSH to set credentials GitHub user @elasticmachine SSH key
[2020-09-14T05:08:07.966Z]  > git fetch --no-tags --progress --prune -- git@github.com:elastic/apm.git +refs/pull/334/head:refs/remotes/origin/PR-334 +refs/heads/master:refs/remotes/origin/master # timeout=15
[2020-09-14T05:08:08.549Z]  > git config core.sparsecheckout # timeout=10
[2020-09-14T05:08:08.569Z]  > git checkout -f cc3043d1fb2a97c306250b017e3b615c0ca30da6 # timeout=15
[2020-09-14T05:08:08.603Z]  > git remote # timeout=10
[2020-09-14T05:08:08.621Z]  > git config --get remote.origin.url # timeout=10
[2020-09-14T05:08:08.635Z] using GIT_SSH to set credentials GitHub user @elasticmachine SSH key
[2020-09-14T05:08:08.640Z]  > git merge c51d0084047222398a517d7b02814ebb142cfa00 # timeout=10
[2020-09-14T05:08:08.657Z]  > git rev-parse HEAD^{commit} # timeout=10
[2020-09-14T05:08:08.670Z]  > git config core.sparsecheckout # timeout=10
[2020-09-14T05:08:08.681Z]  > git checkout -f 6f1f38cc223f48aaf6a648e06a6958b4344e4374 # timeout=15
[2020-09-14T05:08:12.285Z] Commit message: "Merge commit 'c51d0084047222398a517d7b02814ebb142cfa00' into HEAD"
[2020-09-14T05:08:12.301Z] First time build. Skipping changelog.
[2020-09-14T05:08:12.301Z] Cleaning workspace
[2020-09-14T05:08:12.289Z]  > git rev-list --no-walk 101fe3d82a65476e0b632980f4391061e3d426bd # timeout=10
[2020-09-14T05:08:12.303Z]  > git rev-parse --verify HEAD # timeout=10
[2020-09-14T05:08:12.312Z] Resetting working tree
[2020-09-14T05:08:12.312Z]  > git reset --hard # timeout=10
[2020-09-14T05:08:12.330Z]  > git clean -fdx # timeout=10
[2020-09-14T05:08:12.951Z] Masking supported pattern matches of $JOB_GCS_BUCKET or $NOTIFY_TO
[2020-09-14T05:08:12.984Z] Timeout set to expire in 3 hr 0 min
[2020-09-14T05:08:12.993Z] The timestamps step is unnecessary when timestamps are enabled for all Pipeline builds.
[2020-09-14T05:08:13.191Z] [INFO] 'shallow' is forced to be disabled when running on PullRequests
[2020-09-14T05:08:13.199Z] Running in /var/lib/jenkins/workspace/ared_apm-update-specs-mbp_PR-334/src/github.com/elastic/apm
[2020-09-14T05:08:13.209Z] [INFO] gitCheckout: Checkout master from git@github.com:elastic/apm.git with credentials f6c7695a-671e-4f4f-a331-acdce44ff9ba
[2020-09-14T05:08:13.224Z] [INFO] Override default checkout
[2020-09-14T05:08:13.248Z] Sleeping for 10 sec
[2020-09-14T05:08:23.388Z] using credential f6c7695a-671e-4f4f-a331-acdce44ff9ba
[2020-09-14T05:08:23.415Z] Cloning the remote Git repository
[2020-09-14T05:08:23.432Z] Cloning repository git@github.com:elastic/apm.git
[2020-09-14T05:08:23.462Z]  > git init /var/lib/jenkins/workspace/ared_apm-update-specs-mbp_PR-334/src/github.com/elastic/apm # timeout=10
[2020-09-14T05:08:23.470Z] Fetching upstream changes from git@github.com:elastic/apm.git
[2020-09-14T05:08:23.470Z]  > git --version # timeout=10
[2020-09-14T05:08:23.481Z]  > git --version # 'git version 2.17.1'
[2020-09-14T05:08:23.481Z] using GIT_SSH to set credentials GitHub user @elasticmachine SSH key
[2020-09-14T05:08:23.486Z]  > git fetch --tags --progress -- git@github.com:elastic/apm.git +refs/heads/*:refs/remotes/origin/* # timeout=10
[2020-09-14T05:08:24.123Z]  > git config remote.origin.url git@github.com:elastic/apm.git # timeout=10
[2020-09-14T05:08:24.127Z]  > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
[2020-09-14T05:08:24.135Z]  > git config remote.origin.url git@github.com:elastic/apm.git # timeout=10
[2020-09-14T05:08:24.143Z] Fetching upstream changes from git@github.com:elastic/apm.git
[2020-09-14T05:08:24.143Z] using GIT_SSH to set credentials GitHub user @elasticmachine SSH key
[2020-09-14T05:08:24.152Z]  > git fetch --tags --progress -- git@github.com:elastic/apm.git +refs/heads/*:refs/remotes/origin/* +refs/pull/*/head:refs/remotes/origin/PR/* # timeout=10
[2020-09-14T05:08:25.027Z] Checking out Revision c51d0084047222398a517d7b02814ebb142cfa00 (origin/master)
[2020-09-14T05:08:25.063Z] Commit message: "[CI] compare with the calculated SHA commit (#336)"
[2020-09-14T05:08:25.063Z] First time build. Skipping changelog.
[2020-09-14T05:08:25.022Z]  > git rev-parse origin/master^{commit} # timeout=10
[2020-09-14T05:08:25.030Z]  > git config core.sparsecheckout # timeout=10
[2020-09-14T05:08:25.041Z]  > git checkout -f c51d0084047222398a517d7b02814ebb142cfa00 # timeout=10
[2020-09-14T05:08:25.836Z] Masking supported pattern matches of $GIT_USERNAME or $GIT_PASSWORD
[2020-09-14T05:08:26.412Z] + git fetch https://****:****@github.com/elastic/apm.git +refs/pull/*/head:refs/remotes/origin/pr/*
[2020-09-14T05:08:26.722Z] Archiving artifacts
[2020-09-14T05:08:27.390Z] + git rev-parse HEAD
[2020-09-14T05:08:27.748Z] + git rev-parse HEAD
[2020-09-14T05:08:28.051Z] + git rev-parse origin/pr/334
[2020-09-14T05:08:28.083Z] [INFO] githubEnv: Found Git Build Cause: pr
[2020-09-14T05:08:28.327Z] Masking supported pattern matches of $GITHUB_TOKEN
[2020-09-14T05:08:29.348Z] [INFO] githubPrCheckApproved: Title: Add sanitize_field_names spec - User: felixbarny - Author Association: MEMBER
[2020-09-14T05:08:29.907Z] Stashed 387 file(s)
[2020-09-14T05:08:30.267Z] Running in /var/lib/jenkins/workspace/ared_apm-update-specs-mbp_PR-334/src/github.com/elastic/apm
[2020-09-14T05:08:30.627Z] + git diff --name-only ae241d3759ec5e6b8c82e85f8ffac6b70a59fb53...cc3043d1fb2a97c306250b017e3b615c0ca30da6
[2020-09-14T05:08:30.627Z] fatal: Invalid symmetric difference expression ae241d3759ec5e6b8c82e85f8ffac6b70a59fb53...cc3043d1fb2a97c306250b017e3b615c0ca30da6
[2020-09-14T05:08:30.678Z] Stage "Send Pull Request for BDD specs" skipped due to earlier failure(s)
[2020-09-14T05:08:30.697Z] Stage "Send Pull Request for JSON specs" skipped due to earlier failure(s)
[2020-09-14T05:08:30.903Z] Running on Jenkins in /var/lib/jenkins/workspace/ared_apm-update-specs-mbp_PR-334
[2020-09-14T05:08:30.990Z] [INFO] getVaultSecret: Getting secrets
[2020-09-14T05:08:31.061Z] Masking supported pattern matches of $VAULT_ADDR or $VAULT_ROLE_ID or $VAULT_SECRET_ID
[2020-09-14T05:08:31.632Z] + chmod 755 generate-build-data.sh
[2020-09-14T05:08:31.632Z] + ./generate-build-data.sh https://apm-ci.elastic.co/blue/rest/organizations/jenkins/pipelines/apm-shared/apm-update-specs-mbp/PR-334/ https://apm-ci.elastic.co/blue/rest/organizations/jenkins/pipelines/apm-shared/apm-update-specs-mbp/PR-334/runs/7 FAILURE 150610
[2020-09-14T05:08:32.183Z] INFO: curl https://apm-ci.elastic.co/blue/rest/organizations/jenkins/pipelines/apm-shared/apm-update-specs-mbp/PR-334/runs/7/steps/?limit=10000 -o steps-info.json

Copy link
Contributor

@basepi basepi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

@basepi
Copy link
Contributor

basepi commented Sep 1, 2020

This will be a breaking change in python (adding globbing, which is implicit in the current implementation), so we'll have to do a major version. But I think it will be worth it to standardize across agents.

@felixbarny
Copy link
Member Author

Another option that doesn't require a new major version right now would be to do the change in two steps:

Align the default with the other agents but retain the auto-glob behavior by automatically adding wildcards to the beginning and end of each pattern (unless there already is a wildcard).

Examples:

foo  -> *foo*
foo* -> *foo*

That behavior can be removed in the next major version to fully align.

@mikker
Copy link
Contributor

mikker commented Sep 2, 2020

Another thing is that �Python, �Ruby, Node.js agents currently check some values too for strings looking like credit card numbers.

Do we just keep this check as is, separate from the config option, or do we stop checking?

@felixbarny
Copy link
Member Author

Oh, that's interesting, I didn't know about these checks.

I think it definitely makes sense to leave them in but they seem to be orthogonal to this config option anyway. Could you create a follow-up spec PR to make sure we do that check across agents?
Regex checks on each value can be expensive but we could do some lightweight pre-checking. For example, only check the regex if the value has a length of at 16-20 and the 4 first characters are a digit.

@mikker
Copy link
Contributor

mikker commented Sep 2, 2020

We could also consider removing the check. I'm skeptical that it provides any real world benefit? Compared to checking every value ever and how few may be sending credit card numbers in this way, under another key than those already checked? That's all just speculation, of course.

@basepi
Copy link
Contributor

basepi commented Sep 2, 2020

I'd be fine with removing the credit card checks, their value seems dubious to me.

@mikker
Copy link
Contributor

mikker commented Sep 3, 2020

So upgrade path is

  1. add current defaults but with *s around each key
  2. keep value checking outside of config option
  3. on next major, remove *s and value checking?

@felixbarny
Copy link
Member Author

With add current defaults you mean password, passwd, pwd, secret, *key, *token*, *session*, *credit*, *card*, authorization, set-cookie? Any of the values, including defaults and custom additions, will have * added around them.

This would apply to Python and Ruby. Other agents don't add the additional wildcards.

@mikker mikker marked this pull request as ready for review September 7, 2020 09:05
@mikker mikker requested review from a team as code owners September 7, 2020 09:05
@beniwohli
Copy link
Contributor

Why don't we sanitize the query string? Sure, the querystring shouldn't be used for sensitive data, but the interweb isn't really known for following best practices...

I'm asking because in the Python agent we do currently sanitize the query string. Removing that would be a breaking change.

@gregkalapos
Copy link
Contributor

Why don't we sanitize the query string? Sure, the querystring shouldn't be used for sensitive data, but the interweb isn't really known for following best practices...

I'm asking because in the Python agent we do currently sanitize the query string. Removing that would be a breaking change.

Both in the .NET and in the Java doc we quote owasp.org:

Data in the query string is considered non-sensitive, as sensitive information should not be sent in the query string. 
See https://www.owasp.org/index.php/Information_exposure_through_query_strings_in_url for more information

I can't give a very specific answer on the Why don't we sanitize.... question other than what we state in the doc, but I was told we definitely don't wanna do this when I implemented the sanitize_field_names setting for .NET.

One quote I remember is (not tagging the person :D ) :

People who put sensitive info there deserve that it's captured

:trollface:

Not sure how big the overhead would be to parse it... I'd guess it's not that bad.

I personally think that not sanitizing query string is fine.

@SylvainJuge
Copy link
Member

While checking within the agent is definitely expensive with apparent low value for now, there is definitely a huge cost implied by any leak of sensitive data (credit card numbers, SSN, ...), and often implied deleting and/or re-indexing.

However, we could apply those checks/regexes at apm-server level, or even with an ingest pipeline to provide the same benefit to the user, while data would still be technically captured by agents, it won't be stored .

Security-minded people won't do stupid things, thus they won't use query string. However, we can't assume that everyone is one of them. If we could prevent users from shooting in their own foot with a reasonable cost (thus probably not within the agents), that would still be valuable.

Copy link
Member

@vigneshshanmugam vigneshshanmugam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RUM agent will not be introducing a new config option as there is no use-case ATM and the agent does not capture HTTP headers (set-cookie), post body, form encoded data. The only place we do sanitisation is on the URL where we already redact most of the parameters by default.

We will align with the agents on this config once we see any new use cases.

@mikker
Copy link
Contributor

mikker commented Sep 15, 2020

Declared good to merge by @graphaelli over Slack 😊

@mikker mikker merged commit d66d75f into elastic:master Sep 15, 2020
beniwohli pushed a commit to felixbarny/apm that referenced this pull request Sep 21, 2020
@graphaelli graphaelli added this to the 7.11 milestone Sep 23, 2020
@mikker
Copy link
Contributor

mikker commented Sep 29, 2020

@elastic/apm-agent-devs Please update the table above with links to issues if applicable (some already have this behaviour.)

@mikker
Copy link
Contributor

mikker commented Sep 29, 2020

If your agent already works this way (or the setting is n/a) please state so in the table.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet