1 change: 1 addition & 0 deletions news/9973.bugfix.rst
@@ -0,0 +1 @@
+Mask the user when the password is empty.
3 changes: 2 additions & 1 deletion src/pip/_internal/utils/misc.py
@@ -629,12 +629,13 @@ def redact_netloc(netloc):

     For example:
         - "user:pass@example.com" returns "user:****@example.com"
+        - "user:@example.com" returns "****@example.com"
         - "accesstoken@example.com" returns "****@example.com"
     """
     netloc, (user, password) = split_auth_from_netloc(netloc)
     if user is None:
         return netloc
-    if password is None:
+    if not password:
Member

I feel we should treat user:@example.com and user@example.com differently (notice the latter lacks a colon). The two are different: a colon indicates there is a password, while its absence indicates the entire auth section is one piece (a token). So the former should be masked as

user:****@example.com

instead.
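
The distinction drawn in this comment can be observed with Python's standard `urllib.parse`, which reports an empty password as `""` and a missing colon as `None`. This is only a sketch: `auth_parts` is a hypothetical helper for illustration, and pip itself parses the netloc with `split_auth_from_netloc`, whose behaviour is not shown here.

```python
from urllib.parse import urlsplit


def auth_parts(url):
    """Return (username, password) as parsed by the standard library."""
    parts = urlsplit(url)
    return parts.username, parts.password


# Colon present: the password segment exists but is empty.
with_colon = auth_parts("https://user:@example.com")
# No colon: there is no password segment at all.
without_colon = auth_parts("https://user@example.com")

print(with_colon)
print(without_colon)
```

A check such as `if not password:` treats both results the same, whereas `if password is None:` only matches the second.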

Author

@ioggstream ioggstream May 29, 2021

Practically, treating them differently discloses that a user has an empty password, or that the actual token is the username (e.g. an API token).
IMHO that's not a secure behaviour in practice.

Member

That’s not correct from my understanding. Basic auth encodes everything before @ as one string, so knowing an auth is made with an API key provides zero advantage to an attacker. In fact, rendering an empty password the same as non-empty improves security (ever so slightly) since the attacker now has a wider character range (the password field’s) to take into consideration.
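
For reference, the Basic scheme (RFC 7617) joins the user-id and password with a colon and Base64-encodes the result as a single string. A minimal sketch of that encoding, where `basic_auth_header` is a hypothetical helper and not part of pip:

```python
import base64


def basic_auth_header(username, password=""):
    """Build a Basic Authorization header value from a username and password."""
    # The two fields are joined with a colon before encoding, so the
    # encoded value is one opaque string.
    credentials = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


header = basic_auth_header("user", "pass")
# Decode the header again to show what was actually encoded.
decoded = base64.b64decode(header.split(" ", 1)[1]).decode("utf-8")

print(header)
print(decoded)
```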

Author

While it's OK to render an empty password with ***, the issue was filed because some services use the username field to convey credentials, together with an empty password.

Rendering an empty password as *** will still disclose information.

To be clear, I am not sure whether it is OK to log the username either :) I think credentials should be logged only when explicitly requested.

My 2c, R

Member

Again, an empty password is different from a token, which in Basic Auth is a username without a password segment. If the service uses the username field without a password to store credentials, that is not technically a username, but an authentication key. That auth should be written as scheme://token@domain/path (which would have its auth part entirely masked), not scheme://token:@domain/path (which would only have the part after : masked).

Member

I think that standards are unclear here. For example, the Wikipedia article on URLs (admittedly not a standard) states "Applications should not render as clear text any data after the first colon (:) found within a userinfo subcomponent unless the data after the colon is the empty string (indicating no password)". But RFC 1738 claims (in section 3.3) that HTTP URLs may not contain username or password information at all. Mozilla Developer Network says that credentials in URLs are deprecated. I couldn't find any relevant standards on passing tokens via username/password fields. I'd be happy if someone were to point me to a reference.

I find @uranusjr's arguments somewhat persuasive, in a theoretical sense, but given that we're trying to avoid disclosing sensitive information, I feel that erring on the side of caution is probably better. Maybe we should simply mask everything, as *****:*****, never disclosing username or password information? What's the value in making any of this information visible?

Member

To me, the value of partial obscuring is in diagnostics/diagnosis based on the logs.

Obscuring both parts would mean that it's difficult (impossible?) to know if a user has their pip installation's user details configured correctly, if you're just looking at build logs from a failed run in some automated pipeline (e.g. a deployment on a cloud platform).

TBH, it's marginal and given that there's increasing amounts of complexity here, I'm definitely on board for simplifying this down to if username or password: return hidden_creds_url

Member

I vaguely remember there was a discussion on this when the masking thing was first implemented, and debuggability was the main reason behind the current design.

Author

@ioggstream ioggstream Jun 25, 2021

> simply mask everything, as *****:*****

+1

> debuggability was the main reason

Could if username or password: return username[:3]+"***" work?

Member

Again, I think we’ll need to distinguish between user: (masked into *****:*****) and a token without a colon (masked into *****). That extra colon in the middle is the most confusing part when debugging; users will be sent down the wrong road if they didn’t supply it but see it in the logs. So…

  • http://user:pass@example.com → username="user", password="pass" → http://*****:*****@example.com
  • http://user:@example.com → username="user", password="" → http://*****:*****@example.com
  • http://:pass@example.com → username="", password="pass" → http://*****:*****@example.com (very weird, so as long as we handle everything else right this should fit into whatever rule makes most sense)
  • http://token@example.com → username="token", password=None → http://*****@example.com
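
The four cases listed above could be sketched roughly as follows. This is an illustration of the proposed rule only, not the code that was merged: `redact_netloc_sketch` and `MASK` are hypothetical names, and the parsing uses plain string partitioning rather than pip's `split_auth_from_netloc`.

```python
MASK = "*****"


def redact_netloc_sketch(netloc):
    """Mask credentials in a netloc, keeping the colon only if one was supplied."""
    userinfo, has_auth, host = netloc.rpartition("@")
    if not has_auth:
        # No "@", so there are no credentials to hide.
        return netloc
    # Only the presence of the colon matters; the values are never shown.
    _, has_colon, _ = userinfo.partition(":")
    masked = f"{MASK}:{MASK}" if has_colon else MASK
    return f"{masked}@{host}"


for example in (
    "user:pass@example.com",
    "user:@example.com",
    ":pass@example.com",
    "token@example.com",
    "example.com",
):
    print(example, "->", redact_netloc_sketch(example))
```

Because the output depends only on whether a colon was present, the log shows the same shape the user typed without revealing either field.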

         user = "****"
         password = ""
     else: