Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KETTLE-73: Allow censoring of sensitive information which may be present in URL of DataSource #49

Merged
merged 7 commits into from
Dec 5, 2018

Conversation

amb26
Copy link
Member

@amb26 amb26 commented Oct 17, 2018

No description provided.

@idrc-cms-bot
Copy link

@idrc-cms-bot
Copy link

@simonbates simonbates self-requested a review October 17, 2018 17:03
var censorURL = function (url) {
sensitiveValues.forEach(function (sensitiveValue) {
if (sensitiveValue) {
url = url.replace(sensitiveValue, "[CENSORED]");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This straightforward string replacement will fail whenever there is percent-encoding in the URL. For example, if we change the test URL in file tests/DataSourceSimpleTests.js from "https://secret-user:secret-password@thing.available:997/path" to "https://secret-user:secret-%25-password@thing.available:997/path" (that is, a password with value "secret-%-password"), the test will fail.

In this case, the full URL is logged as "https://secret-user:secret-%25-password@thing.available:997/path".

Inside the censorURL function, sensitiveValue is decoded and has the value "secret-user:secret-%-password". Which will not be found in the URL for replacement.

Copy link
Contributor

@simonbates simonbates Oct 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it feasible to never log the URL?

An approach might be to encode the value we want to remove and do replacement with the encoded value.

But I fear that this may be difficult to get right as there are multiple possible notations that result in the same decoded value.

It looks like some reserved characters are processed leniently even if they are not escaped. For example, the following URL passes the test:

  • https://secret-user:secret-=-password@thing.available:997/path

Even though "=" is a reserved character. A URL with another reserved character fails:

  • https://secret-user:secret-#-password@thing.available:997/path

It's also possible (though less likely) to percent encode unreserved characters. For example the following are all equivalent:

  • AB
  • A%42
  • %41B
  • %41%42

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for spotting this risk, @simonbates - I decided the readability advantages of logging the URL warranted keeping it, and changed the workflow to regenerate the URL by re-encoding the already censored broken-down fields rather than attempting to censor it in place. Ready for another look - cheers

@idrc-cms-bot
Copy link

@@ -54,12 +54,13 @@ fluid.defaults("kettle.tests.middleware.verifyingUnmarked", {
gradeNames: ["kettle.plainAsyncMiddleware"],
middleware: kettle.tests.verifyingUnmarkedMiddleware
});

fluid.logObjectRenderChars = 10240;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This debug line can be removed.

@amb26
Copy link
Member Author

amb26 commented Dec 1, 2018

@cindyli, @simonbates - ready for another look

@idrc-cms-bot
Copy link

@cindyli
Copy link
Member

cindyli commented Dec 3, 2018

With the current censoring implementation, https://secret-user:secret-password@thing.available:997/path will be logged as https://thing.available:997/path. Also, when {auth: true} is set in censorRequestOptionsLog, the request option auth, along with headers information, will be removed from the log. This feels to me causes a loss of some useful information because having or without having these info will result in the same logged information.

I wonder if it's worthwhile to keep the presence of these options in the log but replacing their values with a string like [SENSITIVE] so we don't leak sensitive values but are still aware these info have been sent.

@amb26
Copy link
Member Author

amb26 commented Dec 5, 2018

Thanks for this good suggestion, @cindyli - ready for another look

@idrc-cms-bot
Copy link

fluid.defaults("kettle.tests.KETTLE34dataSource", {
gradeNames: "kettle.dataSource.URL",
url: "https://user:password@thing.available:997/path",
headers: {
"x-custom-header": "x-custom-value"
},
censorRequestOptionsLog: {
auth: false
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a test for auth: true with sensitive values being replaced.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But that is the default value - is the idea just to test that options merging works correctly?

@cindyli
Copy link
Member

cindyli commented Dec 5, 2018

I understand auth: true is the default value. But I don't find a test that tests it with censoring sensitive options such as "auth" or "header.Authorization". [The only relevant test|https://github.com//pull/49/files#diff-f338ac10c905127921ee7d0c4ab2f01bR73] doesn't have options that need to be censored.

@amb26
Copy link
Member Author

amb26 commented Dec 5, 2018

I couldn't follow the link in your comment. This test https://github.com/fluid-project/kettle/pull/49/files#diff-f338ac10c905127921ee7d0c4ab2f01bR142 shows the sensitive option "auth" being censored from the logs

@cindyli
Copy link
Member

cindyli commented Dec 5, 2018

Right. Was thinking to enhance this test to have sensitive options, which doesn't seem necessary with the test you pointed out. Thanks.

Wonder if "header.Authorization" request field should be censored by default in kettle too. In universal, this field holds access tokens when requests are sent to the cloud based flow manager for fetching or saving user settings. See the get endpoint doc for your reference.

@amb26
Copy link
Member Author

amb26 commented Dec 5, 2018

@cindyli - good idea, implementation and tests enhanced to allow "deep censoring" - ready for another look

@idrc-cms-bot
Copy link

@@ -144,6 +144,15 @@ We document these configuration options in the next section:
nonexistent file, or an HTTP resource giving a 404) will result in a <code>resolve</code> with an empty
payload rather than a <code>reject</code> response.</td>
</tr>
<tr>
<td><code>censorRequestOptionsLog</code></td>
<td><code>Object</code> (map of <code>String</code> to <code>Boolean</code>) (default: <code>{auth: true}</code>)</td>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add "headers.Authorization": true into the default.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well caught. Ready for another look

@cindyli
Copy link
Member

cindyli commented Dec 5, 2018

@simonbates, this pull request looks good to me.

@idrc-cms-bot
Copy link

@cindyli cindyli merged commit 1fbf663 into fluid-project:master Dec 5, 2018
@cindyli
Copy link
Member

cindyli commented Dec 5, 2018

Merged at 281d9aa

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants