This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Add admin API to get some information about federation status #11407

Merged
merged 11 commits into from Dec 6, 2021
1 change: 1 addition & 0 deletions changelog.d/11407.feature
@@ -0,0 +1 @@
Add admin API to get some information about federation status with remote servers.
1 change: 1 addition & 0 deletions docs/SUMMARY.md
@@ -64,6 +64,7 @@
- [Statistics](admin_api/statistics.md)
- [Users](admin_api/user_admin_api.md)
- [Server Version](admin_api/version_api.md)
- [Federation](usage/administration/admin_api/federation.md)
- [Manhole](manhole.md)
- [Monitoring](metrics-howto.md)
- [Request log format](usage/administration/request_log.md)
113 changes: 113 additions & 0 deletions docs/usage/administration/admin_api/federation.md
@@ -0,0 +1,113 @@
# Federation API

This API allows a server administrator to manage Synapse's federation with other homeservers.

Note: This API is new, experimental and "subject to change".

## List of destinations

This API gets the current destination retry timing info for all remote servers.

The list contains all the servers with which the server federates,
regardless of whether an error occurred or not.
If an error occurs, it may take up to 20 minutes for the error to be displayed here,
as a complete retry must have failed.

The API is:

A standard request with no filtering:

```
GET /_synapse/admin/v1/federation/destinations
```

A response body like the following is returned:

```json
{
  "destinations":[
    {
      "destination": "matrix.org",
      "retry_last_ts": 1557332397936,
      "retry_interval": 3000000,
      "failure_ts": 1557329397936,
      "last_successful_stream_ordering": null
    }
  ],
  "total": 1
}
```

To paginate, check for `next_token` and if present, call the endpoint again
with `from` set to the value of `next_token`. This will return a new page.

If the endpoint does not return a `next_token` then there are no more destinations
to paginate through.
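The pagination loop described above can be sketched as follows. This is a minimal illustration, not part of the PR: `fetch_page` is a hypothetical callable that performs one request with the given query parameters and returns the decoded JSON body, so the sketch stays independent of any particular HTTP client.

```python
from typing import Any, Callable, Dict, Iterator, Optional

# Hypothetical callable: takes the query parameters for one request and returns
# the decoded JSON body of GET /_synapse/admin/v1/federation/destinations.
FetchPage = Callable[[Dict[str, str]], Dict[str, Any]]


def iter_destinations(
    fetch_page: FetchPage, limit: int = 100
) -> Iterator[Dict[str, Any]]:
    """Yield every destination, following `next_token` until it is absent."""
    params = {"limit": str(limit)}
    while True:
        body = fetch_page(params)
        yield from body["destinations"]
        next_token: Optional[str] = body.get("next_token")
        if next_token is None:
            # No `next_token` means there are no more pages.
            return
        # Pass the token back unchanged as the `from` parameter.
        params = {"limit": str(limit), "from": next_token}
```

Passing the token back verbatim, rather than computing an offset on the client, keeps the caller working if the token format changes.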

**Parameters**

The following query parameters are available:

- `from` - Offset in the returned list. Defaults to `0`.
- `limit` - Maximum number of destinations to return. Defaults to `100`.
- `order_by` - The method in which to sort the returned list of destinations.
  Valid values are:
  - `destination` - Destinations are ordered alphabetically by remote server name.
    This is the default.
Comment on lines +55 to +56
Contributor

This doesn't sound very stable to me to use as the initial sorting / pagination scheme. This might not matter usually, but is something to note.

Contributor Author

The reason for this default sort order is performance: `destination` is the only indexed column, and all other columns have no indexes. The idea is to avoid generating a high load by default.

  - `retry_last_ts` - Destinations are ordered by time of last retry attempt in ms.
  - `retry_interval` - Destinations are ordered by how long until next retry in ms.
  - `failure_ts` - Destinations are ordered by when the server started failing in ms.
  - `last_successful_stream_ordering` - Destinations are ordered by the stream ordering
    of the most recent successfully-sent PDU.
- `dir` - Direction of the sort order. Either `f` for forwards or `b` for backwards. Setting
  this value to `b` will reverse the above sort order. Defaults to `f`.

*Caution:* The database only has an index on the column `destination`.
Sorting by any other column can therefore put a heavy load on the database,
especially in large environments.
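As an illustration of how the query parameters above combine, here is a small sketch that builds the request URL. The helper name `destinations_url` and the example base URL are assumptions made for this example; only the path and the parameter names and values come from the documentation above.

```python
from urllib.parse import urlencode

# Sort keys listed in the documentation above.
ORDER_BY_VALUES = (
    "destination",
    "retry_last_ts",
    "retry_interval",
    "failure_ts",
    "last_successful_stream_ordering",
)


def destinations_url(
    base_url: str,
    *,
    start: int = 0,
    limit: int = 100,
    order_by: str = "destination",
    direction: str = "f",
) -> str:
    """Build the list-destinations URL (hypothetical helper, not part of Synapse)."""
    if order_by not in ORDER_BY_VALUES:
        raise ValueError(f"unsupported order_by: {order_by!r}")
    if direction not in ("f", "b"):
        raise ValueError(f"unsupported dir: {direction!r}")
    query = urlencode(
        {"from": start, "limit": limit, "order_by": order_by, "dir": direction}
    )
    return f"{base_url.rstrip('/')}/_synapse/admin/v1/federation/destinations?{query}"
```

For example, `destinations_url("https://hs.example", limit=10, order_by="failure_ts", direction="b")` requests the ten entries with the largest `failure_ts` values first. Per the caution above, that sort runs on an unindexed column.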

**Response**

The following fields are returned in the JSON response body:

Contributor

A couple of questions:

  • What happens if I'm federating with a server and federation has never failed? Does it appear in this list?
  • What happens if federation with a problematic server recovers---does it appear in this list?
  • Does the data returned persist across Synapse restarts?
  • I think someone told me that Synapse has some kind of "blocking" mechanism, where if a remote homeserver fails too many federation requests, we block all federation requests to that server. Is that reflected in this output at all?

Sorry---I appreciate that some of these questions are about how Synapse itself works, rather than this API. But I think they're the kind of thing that'd be useful to consider in the documentation.

Contributor Author

A few answers:

  • What happens if I'm federating with a server and federation has never failed? Does it appear in this list?

Every federated server should be in the list.

Marks that we have successfully sent the PDUs up to and including the
one specified.

What happens if federation with a problematic server recovers---does it appear in this list?

Yes.

"""Sets the current retry timings for a given destination.
Both timings should be zero if retrying is no longer occurring.

Does the data returned persist across Synapse restarts?

Yes.

I think someone told me that Synapse has some kind of "blocking" mechanism, where if a remote homeserver fails too many federation requests, we block all federation requests to that server. Is that reflected in this output at all?

I don't know.


- `destinations` - An array of objects, each containing information about a destination.
  Destination objects contain the following fields:
  - `destination` - string - Name of the remote server to federate with.
  - `retry_last_ts` - integer - The last time Synapse tried and failed to reach the
    remote server, in ms. This is `0` if no further retrying is occurring.
  - `retry_interval` - integer - How long Synapse waits after its last attempt to reach
    the remote server before trying again, in ms. This is `0` if no further retrying is occurring.
  - `failure_ts` - integer - The first time Synapse tried and failed to reach the
    remote server, in ms. This is `null` if no error has occurred.
  - `last_successful_stream_ordering` - integer - The stream ordering of the most
    recent successfully-sent [PDU](understanding_synapse_through_grafana_graphs.md#federation)
    to this destination, or `null` if this information has not been tracked yet.
- `next_token` - string representing a positive integer - Indication for pagination. See above.
- `total` - integer - Total number of destinations.
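To make the timing fields concrete, the sketch below derives two values from a single destination object. It rests on two assumptions that the documentation above does not state outright: the timestamps are milliseconds since the Unix epoch, and the next attempt happens no earlier than `retry_last_ts + retry_interval`. The helper names are hypothetical.

```python
from typing import Any, Dict, Optional


def is_failing(dest: Dict[str, Any]) -> bool:
    """A destination counts as failing while `failure_ts` is set."""
    return dest["failure_ts"] is not None


def next_retry_ts(dest: Dict[str, Any]) -> Optional[int]:
    """Earliest time of the next attempt in ms, or None when no retry is pending.

    Assumes the next attempt is due `retry_interval` ms after `retry_last_ts`.
    """
    if dest["retry_interval"] == 0:
        return None
    return dest["retry_last_ts"] + dest["retry_interval"]


example = {
    "destination": "matrix.org",
    "retry_last_ts": 1557332397936,
    "retry_interval": 3000000,
    "failure_ts": 1557329397936,
    "last_successful_stream_ordering": None,
}
```

With the example response shown earlier, `next_retry_ts(example)` evaluates to `1557335397936`, which is 50 minutes (3,000,000 ms) after the last failed attempt.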

## Destination Details API

This API gets the retry timing info for a specific remote server.

The API is:

```
GET /_synapse/admin/v1/federation/destinations/<destination>
```

A response body like the following is returned:

```json
{
  "destination": "matrix.org",
  "retry_last_ts": 1557332397936,
  "retry_interval": 3000000,
  "failure_ts": 1557329397936,
  "last_successful_stream_ordering": null
}
```

**Response**

The response fields are the same as those in the `destinations` array in the
[List of destinations](#list-of-destinations) response.
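A request for a single destination could be prepared as sketched below. Two things here are assumptions rather than part of this PR: that the admin access token is sent as a `Bearer` token in the `Authorization` header, and that percent-encoding the server name is a sensible precaution for names containing characters such as `:`. The helper name is hypothetical, and the request is only constructed, not sent.

```python
from urllib.parse import quote
from urllib.request import Request


def destination_details_request(
    base_url: str, destination: str, access_token: str
) -> Request:
    """Prepare (but do not send) a GET request for one destination."""
    # Percent-encode the server name so it stays a single path segment.
    path = "/_synapse/admin/v1/federation/destinations/" + quote(destination, safe="")
    return Request(
        base_url.rstrip("/") + path,
        # Assumption: admin token passed as a Bearer token.
        headers={"Authorization": f"Bearer {access_token}"},
        method="GET",
    )
```

The returned object can be passed to `urllib.request.urlopen`. According to the servlet in this PR, an unknown destination results in a "not found" error rather than an empty object.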
6 changes: 6 additions & 0 deletions synapse/rest/admin/__init__.py
@@ -39,6 +39,10 @@
    EventReportDetailRestServlet,
    EventReportsRestServlet,
)
from synapse.rest.admin.federation import (
    DestinationsRestServlet,
    ListDestinationsRestServlet,
)
from synapse.rest.admin.groups import DeleteGroupAdminRestServlet
from synapse.rest.admin.media import ListMediaInRoom, register_servlets_for_media_repo
from synapse.rest.admin.registration_tokens import (
@@ -256,6 +260,8 @@ def register_servlets(hs: "HomeServer", http_server: HttpServer) -> None:
    ListRegistrationTokensRestServlet(hs).register(http_server)
    NewRegistrationTokenRestServlet(hs).register(http_server)
    RegistrationTokenRestServlet(hs).register(http_server)
    DestinationsRestServlet(hs).register(http_server)
    ListDestinationsRestServlet(hs).register(http_server)

    # Some servlets only get registered for the main process.
    if hs.config.worker.worker_app is None:
135 changes: 135 additions & 0 deletions synapse/rest/admin/federation.py
@@ -0,0 +1,135 @@
# Copyright 2021 The Matrix.org Foundation C.I.C.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import logging
from http import HTTPStatus
from typing import TYPE_CHECKING, Tuple

from synapse.api.errors import Codes, NotFoundError, SynapseError
from synapse.http.servlet import RestServlet, parse_integer, parse_string
from synapse.http.site import SynapseRequest
from synapse.rest.admin._base import admin_patterns, assert_requester_is_admin
from synapse.storage.databases.main.transactions import DestinationSortOrder
from synapse.types import JsonDict

if TYPE_CHECKING:
    from synapse.server import HomeServer

logger = logging.getLogger(__name__)


class ListDestinationsRestServlet(RestServlet):
    """Get request to list all destinations.
    This needs user to have administrator access in Synapse.

    GET /_synapse/admin/v1/federation/destinations?from=0&limit=10

    returns:
        200 OK with list of destinations if success otherwise an error.

    The parameters `from` and `limit` are required only for pagination.
    By default, a `limit` of 100 is used.
    The parameter `destination` can be used to filter by destination.
Contributor

I wonder if this might be served better by a separate endpoint for querying the status of one specific destination?

Contributor Author

In the future, an API for a single destination can certainly be added. However, the aim is to design the API in a similar way to the rooms and users APIs.

Contributor

I agree with @DMRobertson that this sounds like it should be a separate API, maybe /federation/destinations/{destination}$? That feels like the way rooms are handled currently.

Contributor

I had in mind the use cases #7982 (comment) outlined here.

But I don't mind there being a generic means to answer "what's the state of federation on my homeserver?", and this seems like a start to that. It's flagged as experimental and "subject to change" in the docs so I think we should get this in, see if it's helpful to people and iterate from there.

Contributor Author

It is not easy for me to work on it. Information on the subject is hard to find.

    The parameter `order_by` can be used to order the result.
    """

    PATTERNS = admin_patterns("/federation/destinations$")

    def __init__(self, hs: "HomeServer"):
        self._auth = hs.get_auth()
        self._store = hs.get_datastore()

    async def on_GET(self, request: SynapseRequest) -> Tuple[int, JsonDict]:
        await assert_requester_is_admin(self._auth, request)

        start = parse_integer(request, "from", default=0)
        limit = parse_integer(request, "limit", default=100)

        if start < 0:
            raise SynapseError(
                HTTPStatus.BAD_REQUEST,
                "Query parameter from must be a string representing a positive integer.",
                errcode=Codes.INVALID_PARAM,
            )

        if limit < 0:
            raise SynapseError(
                HTTPStatus.BAD_REQUEST,
                "Query parameter limit must be a string representing a positive integer.",
                errcode=Codes.INVALID_PARAM,
            )

        destination = parse_string(request, "destination")

        order_by = parse_string(
            request,
            "order_by",
            default=DestinationSortOrder.DESTINATION.value,
            allowed_values=[dest.value for dest in DestinationSortOrder],
        )

        direction = parse_string(request, "dir", default="f", allowed_values=("f", "b"))

        destinations, total = await self._store.get_destinations_paginate(
            start, limit, destination, order_by, direction
        )
        response = {"destinations": destinations, "total": total}
        if (start + limit) < total:
            response["next_token"] = str(start + len(destinations))

        return HTTPStatus.OK, response
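
The `next_token` rule in `on_GET` can be checked in isolation. The sketch below lifts that rule into a standalone function (the name `build_page_response` is made up for this illustration) and walks through five destinations with `limit=2`.

```python
from typing import Any, Dict, List


def build_page_response(
    destinations: List[Dict[str, Any]], total: int, start: int, limit: int
) -> Dict[str, Any]:
    """Mirror of the servlet's response assembly, for illustration only."""
    response: Dict[str, Any] = {"destinations": destinations, "total": total}
    # A token is only emitted while rows remain beyond this page.
    if start + limit < total:
        response["next_token"] = str(start + len(destinations))
    return response


rows = [{"destination": f"s{i}.example"} for i in range(5)]
tokens = []
for start in (0, 2, 4):
    page = build_page_response(rows[start : start + 2], len(rows), start, 2)
    tokens.append(page.get("next_token"))
```

Here `tokens` ends up as `["2", "4", None]`: the last page starts at offset 4, and since `4 + 2` is not below the total of 5, no token is returned.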


class DestinationsRestServlet(RestServlet):
    """Get details of a destination.
    This needs user to have administrator access in Synapse.

    GET /_synapse/admin/v1/federation/destinations/<destination>

    returns:
        200 OK with details of a destination if success otherwise an error.
    """

    PATTERNS = admin_patterns("/federation/destinations/(?P<destination>[^/]+)$")

    def __init__(self, hs: "HomeServer"):
        self._auth = hs.get_auth()
        self._store = hs.get_datastore()

    async def on_GET(
        self, request: SynapseRequest, destination: str
    ) -> Tuple[int, JsonDict]:
        await assert_requester_is_admin(self._auth, request)

        destination_retry_timings = await self._store.get_destination_retry_timings(
            destination
        )

        if not destination_retry_timings:
            raise NotFoundError("Unknown destination")

        last_successful_stream_ordering = (
            await self._store.get_destination_last_successful_stream_ordering(
                destination
            )
        )

        response = {
            "destination": destination,
            "failure_ts": destination_retry_timings.failure_ts,
            "retry_last_ts": destination_retry_timings.retry_last_ts,
            "retry_interval": destination_retry_timings.retry_interval,
            "last_successful_stream_ordering": last_successful_stream_ordering,
        }

        return HTTPStatus.OK, response
70 changes: 70 additions & 0 deletions synapse/storage/databases/main/transactions.py
@@ -14,6 +14,7 @@

import logging
from collections import namedtuple
from enum import Enum
from typing import TYPE_CHECKING, Iterable, List, Optional, Tuple

import attr
@@ -44,6 +45,16 @@
)


class DestinationSortOrder(Enum):
    """Enum to define the sorting method used when returning destinations."""

    DESTINATION = "destination"
    RETRY_LAST_TS = "retry_last_ts"
    RETRY_INTERVAL = "retry_interval"
    FAILURE_TS = "failure_ts"
    LAST_SUCCESSFUL_STREAM_ORDERING = "last_successful_stream_ordering"


@attr.s(slots=True, frozen=True, auto_attribs=True)
class DestinationRetryTimings:
    """The current destination retry timing info for a remote server."""
@@ -480,3 +491,62 @@ def _get_catch_up_outstanding_destinations_txn(

        destinations = [row[0] for row in txn]
        return destinations

    async def get_destinations_paginate(
        self,
        start: int,
        limit: int,
        destination: Optional[str] = None,
        order_by: str = DestinationSortOrder.DESTINATION.value,
        direction: str = "f",
    ) -> Tuple[List[JsonDict], int]:
        """Function to retrieve a paginated list of destinations.
        This will return a json list of destinations and the
        total number of destinations matching the filter criteria.

        Args:
            start: start number to begin the query from
            limit: number of rows to retrieve
            destination: search string in destination
            order_by: the sort order of the returned list
            direction: sort ascending or descending
        Returns:
            A tuple of a list of mappings from destination to information
            and a count of total destinations.
        """

        def get_destinations_paginate_txn(
            txn: LoggingTransaction,
        ) -> Tuple[List[JsonDict], int]:
            order_by_column = DestinationSortOrder(order_by).value

            if direction == "b":
                order = "DESC"
            else:
                order = "ASC"

            args = []
            where_statement = ""
            if destination:
                args.extend(["%" + destination.lower() + "%"])
                where_statement = "WHERE LOWER(destination) LIKE ?"

            sql_base = f"FROM destinations {where_statement} "
            sql = f"SELECT COUNT(*) as total_destinations {sql_base}"
            txn.execute(sql, args)
            count = txn.fetchone()[0]

            sql = f"""
                SELECT destination, retry_last_ts, retry_interval, failure_ts,
                last_successful_stream_ordering
                {sql_base}
                ORDER BY {order_by_column} {order}, destination ASC
                LIMIT ? OFFSET ?
            """
            txn.execute(sql, args + [limit, start])
            destinations = self.db_pool.cursor_to_dict(txn)
            return destinations, count

        return await self.db_pool.runInteraction(
            "get_destinations_paginate_txn", get_destinations_paginate_txn
        )
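
The `ORDER BY ..., destination ASC` tie-breaker in the query above is what keeps pages stable when many rows share the same sort value. A minimal, self-contained sketch with the standard-library `sqlite3` module shows the effect. The two-column table is a simplified stand-in for illustration, not Synapse's real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in table, not the real Synapse schema.
conn.execute(
    "CREATE TABLE destinations (destination TEXT PRIMARY KEY, retry_interval INTEGER)"
)
conn.executemany(
    "INSERT INTO destinations VALUES (?, ?)",
    [("c.example", 0), ("a.example", 0), ("b.example", 3000), ("d.example", 0)],
)


def page(start: int, limit: int, order: str = "ASC") -> list:
    """Return one page of server names sorted by retry_interval, then name."""
    # Only two known keywords are ever interpolated into the SQL text;
    # the limit and offset are passed as bound parameters.
    if order not in ("ASC", "DESC"):
        raise ValueError(order)
    sql = (
        "SELECT destination FROM destinations "
        f"ORDER BY retry_interval {order}, destination ASC "
        "LIMIT ? OFFSET ?"
    )
    return [row[0] for row in conn.execute(sql, (limit, start))]
```

Three rows share `retry_interval = 0`, so without the secondary sort key their relative order would be unspecified and could differ between page requests. With it, `page(0, 2)` and `page(2, 2)` together cover every row exactly once.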