
MSC3062: Bot verification #3062

Draft
wants to merge 4 commits into base: old_master

Conversation

uhoreg (Member) commented Mar 12, 2021

uhoreg changed the title from MSCxxxx: Bot verification to MSC3062: Bot verification on Mar 12, 2021
uhoreg added the e2e, kind:feature (MSC for not-core and not-maintenance stuff), and proposal (A matrix spec change proposal) labels on Mar 12, 2021
@@ -0,0 +1,106 @@
# MSC3062: Bot verification

What is the purpose of this? What is the motivation on a security model level?

Contributor

If you can verify the bot is being run by a trusted party (and therefore trust the device), you can detect if the homeserver admin creates a malicious device to decrypt messages to the bot.

Basically the bot authenticates by saying "I own the content on this domain", and then I trust the bot if I trust that domain? In that case, the "Proposal" section might benefit from being split into a user-centric description and the low-level protocol messages.

identity given in this way matches the expected identity, or record that the
given identity is associated with the human's Matrix ID.

## Potential issues

Contributor

Perhaps note that Punycode has caused security issues relating to humans verifying URLs, and that clients need to be careful when decoding it.

Member Author

Yes, that is sort of implied by the part of the "Security considerations" section that says it depends on "the human being able to distinguish a trusted URL from an untrusted URL", but it may be worth calling this out as an example, along with other similar tricks (such as "paypaI", where the last character is an upper-case "i" rather than a lower-case "L").
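
For illustration only, here is a minimal sketch of the kind of care a client might take when displaying such a URL: decode any Punycode labels and show both forms rather than silently rendering a lookalike. The helper name and warning format are invented here, and a real client would want a maintained IDNA/UTS-46 library plus a confusable-character check rather than Python's built-in (IDNA 2003) codec.

```python
from urllib.parse import urlsplit

def display_host(url: str) -> str:
    """Decode a Punycode (xn--) host for display, flagging the decoded form."""
    host = urlsplit(url).hostname or ""
    try:
        # Python's built-in codec implements IDNA 2003; enough for a sketch.
        decoded = host.encode("ascii").decode("idna")
    except UnicodeError:
        return host  # leave hosts we cannot decode untouched
    if decoded != host:
        # The host contained encoded labels: show both forms so the human can
        # see that e.g. "xn--bcher-kva.example" is really "bücher.example".
        return f"{decoded} (Punycode: {host})"
    return decoded

print(display_host("https://xn--bcher-kva.example/verify"))
# -> bücher.example (Punycode: xn--bcher-kva.example)
```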

turt2live added the needs-implementation label (This MSC does not have a qualifying implementation for the SCT to review. The MSC cannot enter FCP.) on Jun 8, 2021
turt2live self-requested a review on July 31, 2021 05:58
Comment on lines +24 to +28
The human's client displays the URL to the human, to allow them to verify that
the URL looks legitimate (e.g. that it belongs to a domain that the human
trusts to be associated with the bot's operators). If the human accepts the
URL, the client makes a POST request to the URL with the request body being a
JSON object with the following properties:
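
Purely to make the shape of this step concrete (the quoted range cuts off before the property list), here is a rough sketch of such a POST, using the body fields that appear in the curl example further down this thread (`transaction_id`, `nonce`, `from_device`, `keys`); the actual schema is whatever the MSC text defines, not this sketch.

```python
import requests  # third-party HTTP client, used here only for brevity

def post_attestation(url: str, transaction_id: str, nonce: int,
                     from_device: str, keys: dict) -> int:
    """POST the verification payload to the bot-supplied URL (after the human
    has accepted it) and return the HTTP status code for the client to act on."""
    body = {
        "transaction_id": transaction_id,
        "nonce": nonce,
        "from_device": from_device,
        # map of key ID -> public key that the client wants to attest to
        "keys": keys,
    }
    resp = requests.post(url, json=body, timeout=10)
    return resp.status_code
```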
Member

There are a few issues with using webservers:

  1. Most bots don't have them today.
  2. They can be load balanced or serverless (AWS Lambda), and trying to direct that request through to the right backend involves either breaking the load balancing/underlying tech or exposing internal infrastructure externally (i.e. returning https://bot.nyc3.i2.example.org for a bot running in a NYC3 DC as instance 2). This then means exposing DDoS endpoints that can bring down targeted parts of the infrastructure, or allowing scraping for infrastructure details.
  3. Because the backend can't be guaranteed to be the bot, the user is actually verifying a provider/website with an arbitrary backend. For larger providers (like if Discord were to switch wholesale to Matrix) they may very well end up with a backend service that handles all of these verification requests without ever actually talking to the bot. This means the bot could still be malicious while the service provider hides that detail.

I don't really have alternatives at the moment, but the use of HTTP for verification doesn't feel like a safe route. Possibly for bots it might be sane enough to verify based purely off the ability to establish an Olm session?

  1. This is an annoying requirement. But it seems like the only widely-available trust source today is DNS names and the web PKI. The only alternatives I can think of are email signing certs and DNSSEC. Both of the others seem harder for the average bot to manage.
  2. I'm not really sure I understand your point. If you are load-balanced, you likely want to be able to do this verification on any backend, so this seems reasonable. If you are sharding in a way that makes this impossible, you have likely already solved this problem in order to route incoming events.
  3. I'm not sure what you mean by this. You "know" it is the bot because it gave you the URL. If you were given a URL that the bot doesn't control, then they have no way to get the provided information and do the verification. (Some exceptions may be using a pastebin to receive the "upload" and retrieve it later; I've raised another comment about this.)

One alternative would be:

The bot sticks a public key in DNS (not web-compatible) or at a well-known HTTP endpoint. This key is downloaded and used to encrypt data sent to the bot. If the bot can read the data, it is assumed to control that domain, and that can be used as an identity.

The benefit here is that the data hosted on HTTP is static, which makes it far easier to host. You just need to keep it up with a fresh TLS cert.
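
A minimal sketch of that alternative, assuming an invented well-known path (`/.well-known/matrix-bot-key`), a Curve25519 key published as base64 in JSON, and libsodium sealed boxes; none of these details come from the MSC, they only illustrate the idea of encrypting something that only the domain's key holder can read.

```python
import base64
import os
import requests
from nacl.public import PublicKey, SealedBox  # PyNaCl (libsodium)

WELL_KNOWN_PATH = "/.well-known/matrix-bot-key"  # invented for this sketch

def fetch_bot_key(domain: str) -> PublicKey:
    """Download the public key that the bot's operator published on their domain."""
    resp = requests.get(f"https://{domain}{WELL_KNOWN_PATH}", timeout=10)
    resp.raise_for_status()
    return PublicKey(base64.b64decode(resp.json()["curve25519_key"]))

def make_challenge(domain: str) -> tuple[bytes, bytes]:
    """Encrypt a random nonce to the published key. Only whoever controls the
    corresponding private key (and therefore, presumably, the domain) can
    decrypt it and echo the nonce back, which is what the client then checks."""
    nonce = os.urandom(32)
    sealed = SealedBox(fetch_bot_key(domain)).encrypt(nonce)
    return nonce, sealed
```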

- `keys`: a map of key ID to public key for each key that the client wants to
attest to

The HTTPS server responds with an HTTP code of:

We should require more than just a 200. Otherwise the bot could point its verification URL at a pastebin service or similar, as long as it finds a service that:

  1. Returns a 200 for JSON posts.
  2. Makes that data available.

For example, this can almost be done with pastebin.com. The only problem is that it requires `Content-Type: application/x-www-form-urlencoded`. (Although I don't actually see an explicit requirement for `application/json` in this MSC.) This works by using the bot-controlled `transaction_id` to pass the required parameters. You can imagine that this would be even easier if the endpoint just accepted raw content.

```
% curl -iX POST --data-binary '{"transaction_id":"=bar&api_dev_key=REDACTED&api_option=paste&api_paste_code=","nonce":123,"from_device":"devid","keys":{"a":1}}' -HContent-Type:application/x-www-form-urlencoded "https://pastebin.com/api/api_post.php"
HTTP/2 200
date: Sat, 28 Aug 2021 16:37:43 GMT
content-type: text/html; charset=UTF-8
x-custom-api-dev-id: 362667
set-cookie: pastebin_posted=REDACTED; expires=Sat, 28-Aug-2021 17:37:43 GMT; Max-Age=3600; path=/; HttpOnly
cf-cache-status: DYNAMIC
expect-ct: max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"
server: cloudflare
cf-ray: 685ef7407fc02962-ORD

https://pastebin.com/SrNSQGBV
```

Member Author

The user is supposed to verify that the URL looks legitimate.

Also, the attacker wouldn't be able to get the data that was POSTed without the URL that the pastebin returned.

- the Matrix ID of the human, followed by `|`,
- the device ID of the human, followed by `|`,
- the Matrix ID of the bot, followed by `|`,
- the `transaction_id`, followed by `|`,

IIUC the `transaction_id` is not forbidden from including the `|` character; however, I don't think this is exploitable. But maybe it would be a good idea to require escaping anyway, just to be extra sure.
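
A small sketch of what such escaping could look like (the backslash scheme below is just one obvious choice, not something the MSC specifies): escape the escape character first, then the separator, so that joining with `|` stays unambiguous regardless of what the `transaction_id` contains.

```python
def escape_field(value: str) -> str:
    """Escape backslashes before the separator so the escaping is reversible."""
    return value.replace("\\", "\\\\").replace("|", "\\|")

def join_fields(*fields: str) -> str:
    """Build the |-separated string from escaped fields."""
    return "|".join(escape_field(f) for f in fields)

# A transaction_id of "abc|def" contributes "abc\|def" to the joined string,
# so it can no longer masquerade as an extra field boundary.
print(join_fields("@alice:example.org", "ALICEDEVICE", "@bot:example.com", "abc|def"))
```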
