Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Redact URL to hide password #6295

Merged
merged 10 commits into from
Mar 1, 2019
Merged

Redact URL to hide password #6295

merged 10 commits into from
Mar 1, 2019

Conversation

expobrain
Copy link
Contributor

When the verbose flag is used in pip install and in ~/.pip/pip.conf you setup the access to you private registry with basic auth, i.e. extra-index-url = https://username:pwd!@pypi.example.com/simple/, when using pip install -v the password is printed in clear to stdout.

This PR redact the URL to hide original password.

@@ -155,7 +166,7 @@ def _get_html_response(url, session):
if _is_url_like_archive(url):
_ensure_html_response(url, session=session)

logger.debug('Getting page %s', url)
logger.debug('Getting page %s', _redacted_url(url))
Copy link
Member

@cjerdonek cjerdonek Feb 24, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should use the redact_password_from_url() function which has this purpose.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to know!, Fixed in the next commit

Copy link
Member

@cjerdonek cjerdonek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple more comments.

"Getting page https://user:<pwd>@example.com/simple/initools"
in result.stdout
)
virtualenv.clear()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this needed? If so, should it be in a finally block?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used test_command_line_options_override_env_vars as a template and I assumed that you need to cleanup manually. Removed.

assert (
"Getting page https://user:<pwd>@example.com/simple/initools"
in result.stdout
)
Copy link
Member

@cjerdonek cjerdonek Feb 24, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition, you might as well assert that my_password is not in the output. (And I would assign my_password to a variable so that that portion of the test doesn't pass because of a typo.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

"""
password = "my_password"

script.environ['PIP_INDEX_URL'] = (
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the environ options instead of passing '--index-url', 'https://user:{}@example.com/simple/'.format(password) to script.pip ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xavfernandez Can you confirm if this test hits the network as written? It would be a lot better IMO if it didn't as I think such cases should be reduced to a minimum.

@expobrain It might be better to write a unit test (so in the unit directory rather than functional) of _get_html_response() to avoid hitting the network. You can look at test_get_legacy_build_wheel_path__no_names() and nearby functions for examples of using the caplog fixture (since the code wouldn't be running in a separate subprocess).

Copy link
Member

@cjerdonek cjerdonek Feb 24, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PS - it looks like there are already a number of unit tests of that function (search for test_get_html_response_), and you can follow the examples there of mocking the session response.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you confirm if this test hits the network as written? It would be a lot better IMO if it didn't as I think such cases should be reduced to a minimum.

I confirm it should (I was actually thinking about suggesting to add the network marker to the test) and I don't see a way around that for a functional test.
An unit test could indeed also remove the need for network.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, here's an example of a test failing because of flakiness due to installing INITools in the test: https://dev.azure.com/pypa/pip/_build/results?buildId=5274&view=logs&jobId=c897e773-a04c-5d18-b461-2c6584d92c5a

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cjerdonek I've moved the test into unit, please have a look at it

@cjerdonek
Copy link
Member

@cjerdonek cjerdonek added type: bug A confirmed bug or unintended behavior type: bugfix labels Feb 24, 2019
@expobrain
Copy link
Contributor Author

Btw, this needs a news file: https://pip.pypa.io/en/stable/development/contributing/#news-entries

Can I use this PR for the news file or better create a new issue?

@cjerdonek
Copy link
Member

Can I use this PR for the news file or better create a new issue?

Thanks for asking. You can use this PR.

@@ -118,6 +118,29 @@ def test_get_html_response_no_head(url):
]


@mock.patch("pip._internal.index.logger")
def test_get_html_response_dont_log_clear_text_password(m_logger):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You shouldn't need to mock logging. I'll repeat my suggestion from before:

You can look at test_get_legacy_build_wheel_path__no_names() and nearby functions for examples of using the caplog fixture (since the code wouldn't be running in a separate subprocess).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgot about it, and fixed in the next commit

Copy link
Member

@cjerdonek cjerdonek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking a lot better now. Thanks for sticking it out! Just a couple more minor comments..

def test_get_html_response_dont_log_clear_text_password(caplog):
"""
`_get_html_response()` shouldn't send a HEAD request if the URL does not
look like an archive, only the GET request that retrieves data.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this docstring was copied without updating.


caplog.set_level(logging.DEBUG)

resp = _get_html_response(url_template.format(password), session=session)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of using url_template, it would be simpler and more direct just to pass https://user:my_password@example.com/simple/.

record = caplog.records[0]
assert record.levelname == 'DEBUG'
assert record.message.splitlines() == [
"Getting page {}".format(url_template.format("****")),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, you can just check that actual string instead of using nested format strings, so "Getting page https://user:****@example.com/simple/"... (It's generally better to be more explicit in test code.)

look like an archive, only the GET request that retrieves data.
"""
password = "my_password"
url_template = "https://user:{}@example.com/simple/"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You won't need these two variables (see below).

@cjerdonek cjerdonek added the type: security Has potential security implications label Feb 27, 2019
@expobrain expobrain force-pushed the redact_url branch 2 times, most recently from 1a2af7b to 9e13fea Compare March 1, 2019 08:49
Copy link
Member

@cjerdonek cjerdonek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple wording comments.

news/6295.bugfix Outdated
@@ -0,0 +1 @@
Redacts the URL of the extra index when `pip -v` to hide the user's password
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the URL isn't being removed but only the password, it would be better to say something like, "Redact the password from the extra index URL when using pip -v." Also, you want to use double backticks before and after "pip -v" rather than single, and end the sentence with a period.

Copy link
Contributor Author

@expobrain expobrain Mar 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean like ``pip -v``?

Copy link
Member

@cjerdonek cjerdonek Mar 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. I couldn't get it to render in GitHub's viewer when I first tried. The reason for double is because the contents are reStructuredText and not, say, Markdown: https://pip.pypa.io/en/stable/development/contributing/#contents-of-a-news-entry

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used a backslash before the backtick to escape it and let Markdown ignore it

@@ -118,6 +118,34 @@ def test_get_html_response_no_head(url):
]


def test_get_html_response_dont_log_clear_text_password(caplog):
"""
`_get_html_response()` shouldn't log the full index's URL includoing the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: including

Also, this could be a little clearer. How about: "_get_html_response() should redact the password from the index URL in its DEBUG log message."

@cjerdonek cjerdonek merged commit 729404d into pypa:master Mar 1, 2019
@cjerdonek
Copy link
Member

Thank you for all your work on this, @expobrain!

@expobrain expobrain deleted the redact_url branch March 1, 2019 12:23
@expobrain
Copy link
Contributor Author

Thank you @cjerdonek too for the code review!

@lock
Copy link

lock bot commented May 28, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot added the auto-locked Outdated issues that have been locked by automation label May 28, 2019
@lock lock bot locked as resolved and limited conversation to collaborators May 28, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
auto-locked Outdated issues that have been locked by automation type: bug A confirmed bug or unintended behavior type: security Has potential security implications
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants