MSC3062: Bot verification #3062

Draft: wants to merge 4 commits into `old_master`
106 changes: 106 additions & 0 deletions proposals/3062-bot-verification.md
# MSC3062: Bot verification

Review comment:

What is the purpose of this? What is the motivation on a security model level?

Contributor:

If you can verify that the bot is being run by a trusted party (and therefore trust the device), you can detect if the homeserver admin creates a malicious device to decrypt messages to the bot.

Review comment:

Basically the bot authenticates by saying "I own the content on this domain", and then I trust the bot if I trust that domain? In that case, the "Proposal" section might benefit from being split into a user-centric description and the low-level protocol messages.


It is generally recommended that key verification should be done in person.
However, it is usually difficult to meet a bot in person for the purposes of
verifying its keys. This proposal introduces a mechanism for verifying a bot
via HTTPS.

## Proposal

A new verification method, `m.bot_verification.v1`, is introduced.

With this verification method, the human initiates the verification (we do not
support two bots verifying each other using this method) by sending a
`m.key.verification.request` that includes `m.bot_verification.v1` in the
`methods` property.

The bot then responds with `m.key.verification.ready`, offering
`m.bot_verification.v1` as the only option. The bot then immediately sends an
`m.key.verification.start` message with `m.bot_verification.v1` as the method.
The `m.key.verification.start` message also contains a `url` property that
indicates a URL that can be used to verify the bot. The URL must be an HTTPS
URL.
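
To make this concrete, here is a purely illustrative (non-normative) sketch of
the `m.key.verification.start` content the bot might send, written as a Python
dict; the device ID, transaction ID, and URL are made up:

```python
# Illustrative m.key.verification.start content for this method (to-device
# flavour). All values are made-up examples, not mandated by this proposal.
start_content = {
    "from_device": "BOTDEVICE",                        # the bot's device ID
    "transaction_id": "S0meTxnId",                     # matches the request/ready messages
    "method": "m.bot_verification.v1",
    "url": "https://bot.example.org/_matrix/verify",   # must be an HTTPS URL
}
```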

The human's client displays the URL to the human, to allow them to verify that
the URL looks legitimate (e.g. that it belongs to a domain that the human
trusts to be associated with the bot's operators). If the human accepts the
URL, the client makes a POST request to the URL with the request body being a
JSON object with the following properties:
Comment on lines +24 to +28
Member:

There are a few issues with using webservers:

  1. Most bots don't have them today.
  2. They can be load balanced or serverless (AWS Lambda), and trying to direct that request through to the right backend involves either breaking the load balancing/underlying tech or exposing internal infrastructure externally (i.e. returning https://bot.nyc3.i2.example.org for a bot running in a NYC3 DC as instance 2). This then means exposing DDoS endpoints that can bring down targeted parts of the infrastructure, or enabling scraping for infrastructure details.
  3. Because the backend can't be guaranteed to be the bot, the user is actually verifying a provider/website with an arbitrary backend. For larger providers (like if Discord were to switch wholesale to Matrix), they may very well end up with a backend service that handles all of these verification requests without ever actually talking to the bot. This means the bot could still be malicious, but the service provider is hiding the details of that.

I don't really have alternatives at the moment, but the use of HTTP for verification doesn't feel like a safe route. Possibly for bots it might be sane enough to verify based purely on the ability to establish an Olm session?

Reply:

  1. This is an annoying requirement. But it seems like the only widely-available trust source today is DNS names and the web PKI. The only alternatives I can think of are email signing certs and DNSSEC. Both of those seem harder for the average bot to manage.
  2. I'm not really sure I understand your point. If you are load balanced, you likely want to be able to do this verification on any backend, so this seems reasonable. If you are sharding in a way that makes this impossible, you have likely already solved this problem in order to manage incoming events.
  3. I'm not sure what you mean by this. You "know" it is the bot because it gave you the URL. If you were given a URL that the bot doesn't control, then it has no way to get the provided information and complete the verification. (Some exceptions may be using a pastebin to receive the "upload" and retrieve it later; I've raised another comment about this.)

One alternative would be:

The bot sticks a public key in DNS (not web-compatible) or at a well-known HTTP endpoint. This key is downloaded and used to send encrypted data to the bot. If the bot can read the data, it is assumed to control that domain, and that can be used as an identity.

The benefit here is that the data hosted on HTTP is static, which makes it far easier to host. You just need to keep it up with a fresh TLS cert.


- `transaction_id`: the transaction ID from the `m.key.verification.start`
message
- `nonce`: a random nonce
- `from_device`: the device ID that the human is using
- `keys`: a map of key ID to public key for each key that the client wants to
attest to
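
As a rough sketch of the client side of this step (assuming Python with the
`requests` library; the URL, device ID, and keys below are placeholders, not
part of the proposal), the POST might look like:

```python
import secrets

import requests

# The URL comes from the bot's m.key.verification.start message; the keys are
# whichever device/cross-signing keys the client wants the bot to attest to.
url = "https://bot.example.org/_matrix/verify"
body = {
    "transaction_id": "S0meTxnId",
    "nonce": secrets.token_urlsafe(32),    # random nonce, reused later as the HKDF salt
    "from_device": "HUMANDEVICE",
    "keys": {
        "ed25519:HUMANDEVICE": "<base64 device key>",
        "ed25519:<master key id>": "<base64 master key>",
    },
}

# Sent as JSON (Content-Type: application/json); redirects are not followed so
# a 303 "verify the human" response can be detected and handled explicitly.
response = requests.post(url, json=body, allow_redirects=False)
if response.status_code == 303:
    next_url = response.headers["Location"]  # open this in a browser
```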

The HTTPS server responds with an HTTP code of:

Review comment:

We should require more than just a 200. Otherwise the bot could point the URL at a pastebin service or similar, as long as it finds a service that:

  1. Returns a 200 for JSON posts.
  2. Makes that data available.

For example, this can almost be done with pastebin.com. The only problem is that it requires `Content-Type: application/x-www-form-urlencoded`. (Although I don't actually see an explicit requirement for `application/json` in this MSC.) This works by using the bot-controlled `transaction_id` to pass the required parameters. You can imagine that this would be even easier if the endpoint just accepted raw content.

```
% curl -iX POST --data-binary '{"transaction_id":"=bar&api_dev_key=REDACTED&api_option=paste&api_paste_code=","nonce":123,"from_device":"devid","keys":{"a":1}}' -HContent-Type:application/x-www-form-urlencoded "https://pastebin.com/api/api_post.php"
HTTP/2 200
date: Sat, 28 Aug 2021 16:37:43 GMT
content-type: text/html; charset=UTF-8
x-custom-api-dev-id: 362667
set-cookie: pastebin_posted=REDACTED; expires=Sat, 28-Aug-2021 17:37:43 GMT; Max-Age=3600; path=/; HttpOnly
cf-cache-status: DYNAMIC
expect-ct: max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"
server: cloudflare
cf-ray: 685ef7407fc02962-ORD
```

https://pastebin.com/SrNSQGBV

Member Author:

The user is supposed to verify that the URL looks legitimate.

Also, the attacker wouldn't be able to get the data that was POSTed without the URL that the pastebin returned.


- `200` if the keys match the expected values
- `404` if the `transaction_id` is unknown
- `400` if the keys do not match the expected values
- `303` if the server wants the human to perform additional steps to verify
their identity (see "[Verifying the human](#verifying-the-human)" below)

Upon successful completion of this step, the bot sends a
`m.key.verification.mac` message to the human's client. The format is the same
as the format of the message used in SAS verification, but the MAC keys are
produced by using HKDF with the salt equal to the nonce given in the HTTPS
request, and the info parameter composed by concatenating:

- the string `MATRIX_KEY_VERIFICATION_MAC|`,
- the Matrix ID of the human, followed by `|`,
- the device ID of the human, followed by `|`,
- the Matrix ID of the bot, followed by `|`,
- the `transaction_id`, followed by `|`,

Review comment:

IIUC the `transaction_id` is not forbidden from including the `|` character; however, I don't think this is exploitable. But maybe it would be a good idea to require escaping anyway, just to be extra sure.

- the Key ID of the key being MAC-ed, or the string `KEY_IDS` if the item being
MAC-ed is the list of key IDs.

The bot also sends a `m.key.verification.done` message.

The human's client calculates the MAC keys and verifies that the MACs for the
keys given in the `m.key.verification.mac` match the expected values. If they
do, the human's client marks the keys as being verified and sends a
`m.key.verification.done` message. Otherwise, the human's client displays an
error to the human and sends a `m.key.verification.cancel` message with
`m.key_mismatch` as the `reason`.
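
As a small, non-normative illustration of the info string described above (the
user IDs, device ID, and transaction ID in the usage comment are made up):

```python
def mac_info(human_user_id: str, human_device_id: str,
             bot_user_id: str, transaction_id: str, key_id: str) -> str:
    """Build the HKDF info parameter described above.

    Pass the literal string "KEY_IDS" as key_id when MAC-ing the list of
    key IDs rather than an individual key.
    """
    return (
        "MATRIX_KEY_VERIFICATION_MAC|"
        + human_user_id + "|"
        + human_device_id + "|"
        + bot_user_id + "|"
        + transaction_id + "|"
        + key_id
    )

# e.g. mac_info("@alice:example.org", "HUMANDEVICE", "@bot:example.org",
#               "S0meTxnId", "ed25519:HUMANDEVICE")
```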

### Verifying the human

The above steps allow the human to verify the bot. However, they do not allow
the bot to verify the human. In general, there is no way for a bot to verify a
human unless the human has some other account that the bot can use, such as an
account with the organization that operates the bot. For example, a GitLab bot
could verify the human by having them log into their GitLab account.

If the bot wishes to do this, then it must respond to the HTTPS request with a
status code of `303` and a `Location` header pointing to a URL. The human's
client then opens the given URL in a browser, allowing the human to perform any
steps necessary to verify their identity. The bot should ensure that the
identity given in this way matches the expected identity, or record that the
given identity is associated with the human's Matrix ID.
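
A hypothetical sketch of the bot-side endpoint, tying together the status codes
listed earlier and this 303 flow (assuming Flask; the endpoint path, the
pending-transaction table, and the login URL are illustrative, not part of the
proposal):

```python
from flask import Flask, redirect, request

app = Flask(__name__)

# transaction_id -> {"keys": {...expected keys...}, "needs_login": bool},
# populated by the bot when it sends m.key.verification.start.
PENDING = {}

@app.post("/_matrix/verify")
def verify():
    body = request.get_json(silent=True) or {}
    pending = PENDING.get(body.get("transaction_id"))
    if pending is None:
        return "", 404      # unknown transaction_id
    if body.get("keys") != pending["keys"]:
        return "", 400      # keys do not match the expected values
    if pending["needs_login"]:
        # Ask the human to prove their identity (e.g. log in to the operator's
        # service); the URL here is purely illustrative.
        return redirect(
            "https://bot.example.org/login?txn=" + body["transaction_id"],
            code=303,
        )
    return "", 200          # keys match; the bot proceeds to m.key.verification.mac
```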

## Potential issues
Contributor:

Perhaps note that Punycode has caused security issues relating to humans verifying URLs, and that clients need to be careful when decoding it.

Member Author:

Yes, that is sort of implied by the part in the "Security considerations" section that says that it depends on "the human being able to distinguish a trusted URL from an untrusted URL", but it may be worth calling this out as an example, as well as other similar things (such as "paypaI", where the last character is an upper-case "i" rather than a lower-case "L").


TODO

## Alternatives

TODO

## Security considerations

The security of this verification method depends on:

- HTTPS,
- the human being able to distinguish a trusted URL from an untrusted URL,
- the bot's operator's ability to secure their web server.

When the human's client makes the HTTPS request, this will expose the human's
IP address to the bot's operators.

## Unstable prefix

Until this feature lands in the spec, the verification method name should be
`org.matrix.msc3062.bot_verification`.
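
For example (illustrative only), a client supporting the unstable version would
advertise it in the `m.key.verification.request` content:

```python
# Illustrative request content while the proposal is unstable; other fields of
# m.key.verification.request (from_device, timestamp, etc.) are omitted here.
request_content = {
    "methods": ["org.matrix.msc3062.bot_verification"],
}
```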