Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sanitize peer client names #20788

Merged
merged 1 commit into from
May 11, 2024
Merged

Conversation

HanabishiRecca
Copy link
Contributor

@HanabishiRecca HanabishiRecca commented May 1, 2024

Clients can send arbitrary UTF-8 strings as their client identificators. qBittorrent displays that strings as is, which could lead to unexpected results and affect GUI layout in case of junk or intentionally malicious strings.

This PR introduces basic sanitization for client strings.

  • Source string is simplified.
  • Rest non-printable characters are removed.

Fixes #20010

@glassez
Copy link
Member

glassez commented May 1, 2024

qBittorrent displays that strings as is, which could lead to unexpected results and affect GUI layout in case of junk or intentionally malicious strings.

If the problem is displaying of such a data then it should be sanitized at the layer where this (displaying) is done.

@HanabishiRecca
Copy link
Contributor Author

HanabishiRecca commented May 1, 2024

If the problem is displaying of such a data then it should be sanitized at the layer where this (displaying) is done.

Are you sure? There is no way to reliably do that in JS for WebUI. For regualr GUI it probably will require extra overhead with string copies. Consider that the value is never used in code, only for displaying in UI, so keeping original value is pointless.

We already do sanitization in-place e.g. for peer id. Libtorrent also does similar things on its side.

@glassez
Copy link
Member

glassez commented May 2, 2024

Consider that the value is never used in code, only for displaying in UI, so keeping original value is pointless.

I disagree with you in general. But considering that in this particular case, this field is unlikely to be intended for anything other than a visual representation of the client name, I agree to leave it as it is. (If we really need something more unique, then we should use the raw peer ID.)

@glassez glassez added this to the 5.0 milestone May 2, 2024
src/base/bittorrent/peerinfo.cpp Outdated Show resolved Hide resolved
src/base/bittorrent/peerinfo.cpp Outdated Show resolved Hide resolved
@glassez
Copy link
Member

glassez commented May 2, 2024

It uses QChar::isPrint() to replace all non-printable characters with spaces. I'm not sure if it accounts for all possible dangerous Unicode sequences, feel free to advice improvements.

Fixes #20010

Can you confirm that it actually fixes #20010?

@glassez glassez changed the title Sanitize client strings Sanitize peer client names May 2, 2024
@Chocobo1
Copy link
Member

Chocobo1 commented May 2, 2024

qBittorrent displays that strings as is, which could lead to unexpected results and affect GUI layout in case of junk or intentionally malicious strings.

https://www.libtorrent.org/reference-Core.html#peer_info states:

client
A human readable string describing the software at the other end of the connection. In some cases this information is not available, then it will contain a string that may give away something about which software is running in the other end. In the case of a web seed, the server type and version will be a part of this string. This is UTF-8 encoded.

IMO libtorrent do try to make it human readable and if that is the case then shouldn't this PR applied at libtorrent side?

@HanabishiRecca
Copy link
Contributor Author

HanabishiRecca commented May 2, 2024

this field is unlikely to be intended for anything other than a visual representation of the client name

Exactly. It's an arbitrary string sent by a remote client. We should never ever rely or make assumptions about it.

Can you confirm that it actually fixes #20010?

Yes, I tested it artifically. It sanitizes \n characters.

shouldn't this PR applied at libtorrent side?

Can't say. Some other clients using libtorrent may want to see the original string. Who knows.

@Chocobo1
Copy link
Member

Chocobo1 commented May 2, 2024

const QString client = peer.client().toHtmlEscaped();

The GUI side has escaping for HTML entities. If sanitizing are to be done at PeerInfo::client() then you should move it (toHtmlEscaped()) over. IIRC WebUI has escaping too so it would need to be removed to avoid double escaping.

@HanabishiRecca
Copy link
Contributor Author

If sanitizing are to be done at PeerInfo::client() then you should move it (toHtmlEscaped()) over.

It will be kinda inconsistient with the rest of the code. https://github.com/search?q=repo%3Aqbittorrent%2FqBittorrent%20toHtmlEscaped&type=code

We probably should do it for all fields then in a separate PR?

@Chocobo1
Copy link
Member

Chocobo1 commented May 2, 2024

If sanitizing are to be done at PeerInfo::client() then you should move it (toHtmlEscaped()) over.

It will be kinda inconsistient with the rest of the code. https://github.com/search?q=repo%3Aqbittorrent%2FqBittorrent%20toHtmlEscaped&type=code

We are only talking about the client field in this PR. It comes back to this #20788 (comment)

We probably should do it for all fields then in a separate PR?

I would prefer changes to client field is finished in this PR. Can't say my opinion for other fields now.

@HanabishiRecca
Copy link
Contributor Author

HanabishiRecca commented May 2, 2024

Ok. Changes:

  • Source string is simplified().
  • Space-like characters (\t, \n, \v, \f, \r ...) are replaced with regular whitespaces.
  • Rest non-printable characters are omitted completely.
  • toHtmlEscaped() interned into the method.

@glassez
Copy link
Member

glassez commented May 2, 2024

The GUI side has escaping for HTML entities. If sanitizing are to be done at PeerInfo::client() then you should move it (toHtmlEscaped()) over.

But this certainly shouldn't be in the base layer. The reasoning that it is not (yet) used without toHtmlEscaped() is invalid in terms of the correct application architecture. The need for HTML escaping is an attribute of some display methods, not the model. Although in this case the purpose of this aspect of the model is to provide a readable value, HTML escaping is not part of this generally.

@HanabishiRecca
Copy link
Contributor Author

Although in this case the purpose of this aspect of the model is to provide a readable value, HTML escaping is not part of this generally.

I tend to agree with that.

@glassez glassez requested a review from a team May 5, 2024 06:17
@glassez glassez added the Core label May 11, 2024
@glassez glassez merged commit 2c47f09 into qbittorrent:master May 11, 2024
10 of 14 checks passed
@glassez
Copy link
Member

glassez commented May 11, 2024

@HanabishiRecca
Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Formatting Thunder (Xun Lei) Client Name
4 participants