Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Draft: Migrate to OpenTelemetry tracing #13400

Closed
Closed
Show file tree
Hide file tree
Changes from 5 commits
Commits
Show all changes
75 commits
Select commit Hold shift + click to select a range
0cc610e
Migrate to OpenTelemetry tracing
MadLittleMods Jul 26, 2022
2fe6911
Some shim and some new
MadLittleMods Jul 27, 2022
6984cef
Progress towards OTEL
MadLittleMods Jul 27, 2022
6406fd5
Server running
MadLittleMods Jul 27, 2022
2428172
Export to Jaeger (things are showing up)
MadLittleMods Jul 27, 2022
0d7a2b9
Revert changes to Sentry scopes (not OTEL)
MadLittleMods Jul 27, 2022
9e1de86
We use the config for the Jaeger exporter now
MadLittleMods Jul 27, 2022
f6c3b22
Fix some lints
MadLittleMods Jul 27, 2022
3a25996
Fixup some todos
MadLittleMods Jul 28, 2022
1b0840e
Fix some lints
MadLittleMods Jul 29, 2022
1d208fa
Fix invalid attribute type
MadLittleMods Jul 29, 2022
2011ac2
Fix using wrong type of context (`Context` vs `SpanContext`)
MadLittleMods Jul 29, 2022
19d20b5
Record exception
MadLittleMods Jul 29, 2022
786dd9b
Explain weird function
MadLittleMods Jul 29, 2022
7c135b9
Easier to follow local vs remote span tracing
MadLittleMods Jul 30, 2022
d29a4af
Move to start_active_span
MadLittleMods Jul 30, 2022
041acdf
Working second test although it's a bit pointless testing whether ope…
MadLittleMods Jul 30, 2022
d848156
Passing tests and context manager doesn't seem to be needed
MadLittleMods Jul 30, 2022
070195a
Use correct type for what start_as_current_span returns
MadLittleMods Jul 30, 2022
7772f50
Use HTTP_HOST attribute
MadLittleMods Jul 30, 2022
322da51
Fix some lints
MadLittleMods Aug 1, 2022
33fd24e
todos
MadLittleMods Aug 1, 2022
a9fb504
Implement start_active_span_from_edu for OTEL
MadLittleMods Aug 1, 2022
8e902b8
Remove what's left of scopemanager
MadLittleMods Aug 1, 2022
00be06c
Try to align read from edu content
MadLittleMods Aug 1, 2022
6255a1a
Fix tests and some lints
MadLittleMods Aug 2, 2022
b3cdbad
PoC force tracing
MadLittleMods Aug 2, 2022
d15fa45
Non-working try baggage to inherit force tracing/sampling
MadLittleMods Aug 2, 2022
6bb7cb7
Revert "Non-working try baggage to inherit force tracing/sampling"
MadLittleMods Aug 2, 2022
dbd9005
Revert crazy custom sampler and span process to try force tracing for…
MadLittleMods Aug 2, 2022
0f93ec8
Fix lints
MadLittleMods Aug 2, 2022
36d6648
Remove type ignore comments
MadLittleMods Aug 2, 2022
fb0e820
More clear method names
MadLittleMods Aug 2, 2022
b09651a
Always return config path for config error
MadLittleMods Aug 2, 2022
da396a2
Add test for what happens when side by side spans in with statement
MadLittleMods Aug 2, 2022
ad71bc3
End on exit is already the default expected behavior
MadLittleMods Aug 2, 2022
59facea
Restore logging current_context (not sure why removed
MadLittleMods Aug 2, 2022
9d6fcf3
Clean up some opentracing text references
MadLittleMods Aug 2, 2022
fcc4220
Update docs
MadLittleMods Aug 2, 2022
d72cacf
Add changelog
MadLittleMods Aug 2, 2022
ba4a46a
Seems to (see test_side_by_side_spans)
MadLittleMods Aug 2, 2022
72c718d
Merge branch 'develop' into madlittlemods/11850-migrate-to-opentelemetry
MadLittleMods Aug 2, 2022
c26fa2d
Move to 72 schema version
MadLittleMods Aug 2, 2022
5999132
Fix lints
MadLittleMods Aug 2, 2022
2491665
Fix remnant
MadLittleMods Aug 2, 2022
16d17f7
Fix table missing column
MadLittleMods Aug 3, 2022
b6f5665
Use latested Twisted from source to fix contextvar issues causing OTE…
MadLittleMods Aug 3, 2022
699dad0
Merge branch 'develop' into madlittlemods/11850-migrate-to-opentelemetry
MadLittleMods Aug 3, 2022
270db42
Update treq to match minimum Twisted Python versions
MadLittleMods Aug 3, 2022
f5da762
Revert "Update treq to match minimum Twisted Python versions"
MadLittleMods Aug 3, 2022
ccd4752
Fix tracing imports after merging in develop
MadLittleMods Aug 3, 2022
d7166a0
Update docs/tracing.md
MadLittleMods Aug 3, 2022
7566375
Try fix Twisted/treq problems
MadLittleMods Aug 4, 2022
7024d7b
Merge branch 'develop' into madlittlemods/11850-migrate-to-opentelemetry
MadLittleMods Aug 9, 2022
8def7e4
Merge branch 'develop' into madlittlemods/11850-migrate-to-opentelemetry
MadLittleMods Aug 18, 2022
50f0342
Merge branch 'develop' into madlittlemods/11850-migrate-to-opentelemetry
MadLittleMods Sep 9, 2022
f73bc59
Try to resolve poetry deps
MadLittleMods Sep 9, 2022
a15592d
Poetry install again
MadLittleMods Sep 9, 2022
32b9d16
poetry update
MadLittleMods Sep 9, 2022
6c40dfa
Merge branch 'develop' into madlittlemods/11850-migrate-to-opentelemetry
MadLittleMods Sep 12, 2022
ad3e324
Install otel deps from develop
MadLittleMods Sep 12, 2022
15e242e
OTEL install with DMR
MadLittleMods Sep 13, 2022
d730a46
Update Twisted to lastest
MadLittleMods Sep 13, 2022
ed11237
Remove linting from CI for now
MadLittleMods Sep 13, 2022
19c6f6e
Merge branch 'develop' into madlittlemods/11850-migrate-to-opentelemetry
MadLittleMods Sep 13, 2022
b77d49f
Hopefully fix problem when OTEL not installed with non recording span
MadLittleMods Sep 13, 2022
a027c6e
Maybe fix positional argument mismatch for DummyLink
MadLittleMods Sep 14, 2022
84f91e3
Merge branch 'develop' into madlittlemods/11850-migrate-to-opentelemetry
MadLittleMods Sep 14, 2022
b86869f
Merge branch 'develop' into madlittlemods/11850-migrate-to-opentelemetry
MadLittleMods Sep 20, 2022
e4b9898
Merge branch 'develop' into madlittlemods/11850-migrate-to-opentelemetry
MadLittleMods Sep 26, 2022
4a495ac
Merge branch 'develop' into madlittlemods/11850-migrate-to-opentelemetry
MadLittleMods Oct 1, 2022
7d70acd
Merge branch 'develop' into madlittlemods/11850-migrate-to-opentelemetry
MadLittleMods Oct 20, 2022
627951e
Fix poetry.lock conflicts
MadLittleMods Oct 20, 2022
d993cb0
Merge branch 'develop' into madlittlemods/11850-migrate-to-opentelemetry
MadLittleMods Nov 18, 2022
7acb365
Merge branch 'develop' into madlittlemods/11850-migrate-to-opentelemetry
MadLittleMods Nov 21, 2022
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 0 additions & 3 deletions mypy.ini
Original file line number Diff line number Diff line change
Expand Up @@ -164,9 +164,6 @@ ignore_missing_imports = True
[mypy-pympler.*]
ignore_missing_imports = True

[mypy-rust_python_jaeger_reporter.*]
ignore_missing_imports = True

[mypy-saml2.*]
ignore_missing_imports = True

Expand Down
337 changes: 212 additions & 125 deletions poetry.lock

Large diffs are not rendered by default.

12 changes: 6 additions & 6 deletions pyproject.toml
Original file line number Diff line number Diff line change
Expand Up @@ -175,13 +175,14 @@ authlib = { version = ">=0.14.0", optional = true }
systemd-python = { version = ">=231", optional = true }
lxml = { version = ">=4.2.0", optional = true }
sentry-sdk = { version = ">=0.7.2", optional = true }
opentracing = { version = ">=2.2.0", optional = true }
jaeger-client = { version = ">=4.0.0", optional = true }
txredisapi = { version = ">=1.4.7", optional = true }
hiredis = { version = "*", optional = true }
Pympler = { version = "*", optional = true }
parameterized = { version = ">=0.7.4", optional = true }
idna = { version = ">=2.5", optional = true }
opentelemetry-api = {version = "^1.11.1", optional = true}
opentelemetry-sdk = {version = "^1.11.1", optional = true}
opentelemetry-exporter-jaeger = {version = "^1.11.1", optional = true}

[tool.poetry.extras]
# NB: Packages that should be part of `pip install matrix-synapse[all]` need to be specified
Expand All @@ -196,7 +197,7 @@ oidc = ["authlib"]
systemd = ["systemd-python"]
url_preview = ["lxml"]
sentry = ["sentry-sdk"]
opentracing = ["jaeger-client", "opentracing"]
MadLittleMods marked this conversation as resolved.
Show resolved Hide resolved
opentelemetry = ["opentelemetry-api", "opentelemetry-sdk", "opentelemetry-exporter-jaeger"]
jwt = ["authlib"]
# hiredis is not a *strict* dependency, but it makes things much faster.
# (if it is not installed, we fall back to slow code.)
Expand Down Expand Up @@ -229,8 +230,8 @@ all = [
"lxml",
# sentry
"sentry-sdk",
# opentracing
"jaeger-client", "opentracing",
# opentelemetry
"opentelemetry-api", "opentelemetry-sdk", "opentelemetry-exporter-jaeger",
# redis
"txredisapi", "hiredis",
# cache_memory
Expand All @@ -254,7 +255,6 @@ mypy-zope = "*"
types-bleach = ">=4.1.0"
types-commonmark = ">=0.9.2"
types-jsonschema = ">=3.2.0"
types-opentracing = ">=2.4.2"
types-Pillow = ">=8.3.4"
types-psycopg2 = ">=2.9.9"
types-pyOpenSSL = ">=20.0.7"
Expand Down
10 changes: 5 additions & 5 deletions synapse/api/auth.py
Original file line number Diff line number Diff line change
Expand Up @@ -30,7 +30,7 @@
from synapse.appservice import ApplicationService
from synapse.http import get_request_user_agent
from synapse.http.site import SynapseRequest
from synapse.logging.opentracing import active_span, force_tracing, start_active_span
from synapse.logging.tracing import active_span, force_tracing, start_active_span
from synapse.storage.databases.main.registration import TokenLookupResult
from synapse.types import Requester, UserID, create_requester

Expand Down Expand Up @@ -147,14 +147,14 @@ async def get_user_by_req(
# so we also force it on for that.
force_tracing()
force_tracing(parent_span)
parent_span.set_tag(
parent_span.set_attribute(
"authenticated_entity", requester.authenticated_entity
)
parent_span.set_tag("user_id", requester.user.to_string())
parent_span.set_attribute("user_id", requester.user.to_string())
if requester.device_id is not None:
parent_span.set_tag("device_id", requester.device_id)
parent_span.set_attribute("device_id", requester.device_id)
if requester.app_service is not None:
parent_span.set_tag("appservice_id", requester.app_service.id)
parent_span.set_attribute("appservice_id", requester.app_service.id)
return requester

async def _wrapped_get_user_by_req(
Expand Down
8 changes: 4 additions & 4 deletions synapse/app/_base.py
Original file line number Diff line number Diff line change
Expand Up @@ -62,7 +62,7 @@
from synapse.events.third_party_rules import load_legacy_third_party_event_rules
from synapse.handlers.auth import load_legacy_password_auth_providers
from synapse.logging.context import PreserveLoggingContext
from synapse.logging.opentracing import init_tracer
from synapse.logging.tracing import init_tracer
from synapse.metrics import install_gc_manager, register_threadpool
from synapse.metrics.background_process_metrics import wrap_as_background_process
from synapse.metrics.jemalloc import setup_jemalloc_stats
Expand Down Expand Up @@ -549,16 +549,16 @@ def setup_sentry(hs: "HomeServer") -> None:

# We set some default tags that give some context to this instance
with sentry_sdk.configure_scope() as scope:
scope.set_tag("matrix_server_name", hs.config.server.server_name)
scope.set_attribute("matrix_server_name", hs.config.server.server_name)

app = (
hs.config.worker.worker_app
if hs.config.worker.worker_app
else "synapse.app.homeserver"
)
name = hs.get_instance_name()
scope.set_tag("worker_app", app)
scope.set_tag("worker_name", name)
scope.set_attribute("worker_app", app)
scope.set_attribute("worker_name", name)
MadLittleMods marked this conversation as resolved.
Show resolved Hide resolved


def setup_sdnotify(hs: "HomeServer") -> None:
Expand Down
31 changes: 17 additions & 14 deletions synapse/config/tracer.py
Original file line number Diff line number Diff line change
Expand Up @@ -24,41 +24,44 @@ class TracerConfig(Config):
section = "tracing"

def read_config(self, config: JsonDict, **kwargs: Any) -> None:
opentracing_config = config.get("opentracing")
if opentracing_config is None:
opentracing_config = {}
opentelemetry_config = config.get("opentelemetry")
if opentelemetry_config is None:
opentelemetry_config = {}

self.opentracer_enabled = opentracing_config.get("enabled", False)
self.opentelemetry_enabled = opentelemetry_config.get("enabled", False)

self.jaeger_config = opentracing_config.get(
"jaeger_config",
{"sampler": {"type": "const", "param": 1}, "logging": False},
# TODO: no more
self.jaeger_exporter_config = opentelemetry_config.get(
"jaeger_exporter_config",
{},
)

self.force_tracing_for_users: Set[str] = set()

if not self.opentracer_enabled:
if not self.opentelemetry_enabled:
return

check_requirements("opentracing")
check_requirements("opentelemetry")

# The tracer is enabled so sanitize the config

self.opentracer_whitelist: List[str] = opentracing_config.get(
self.opentelemetry_whitelist: List[str] = opentelemetry_config.get(
"homeserver_whitelist", []
)
if not isinstance(self.opentracer_whitelist, list):
if not isinstance(self.opentelemetry_whitelist, list):
raise ConfigError("Tracer homeserver_whitelist config is malformed")

force_tracing_for_users = opentracing_config.get("force_tracing_for_users", [])
force_tracing_for_users = opentelemetry_config.get(
"force_tracing_for_users", []
)
if not isinstance(force_tracing_for_users, list):
raise ConfigError(
"Expected a list", ("opentracing", "force_tracing_for_users")
"Expected a list", ("opentelemetry", "force_tracing_for_users")
)
for i, u in enumerate(force_tracing_for_users):
if not isinstance(u, str):
raise ConfigError(
"Expected a string",
("opentracing", "force_tracing_for_users", f"index {i}"),
("opentelemetry", "force_tracing_for_users", f"index {i}"),
)
self.force_tracing_for_users.add(u)
2 changes: 1 addition & 1 deletion synapse/federation/federation_server.py
Original file line number Diff line number Diff line change
Expand Up @@ -61,7 +61,7 @@
nested_logging_context,
run_in_background,
)
from synapse.logging.opentracing import log_kv, start_active_span_from_edu, trace
from synapse.logging.tracing import log_kv, start_active_span_from_edu, trace
from synapse.metrics.background_process_metrics import wrap_as_background_process
from synapse.replication.http.federation import (
ReplicationFederationSendEduRestServlet,
Expand Down
4 changes: 2 additions & 2 deletions synapse/federation/sender/per_destination_queue.py
Original file line number Diff line number Diff line change
Expand Up @@ -32,7 +32,7 @@
from synapse.federation.units import Edu
from synapse.handlers.presence import format_user_presence_state
from synapse.logging import issue9533_logger
from synapse.logging.opentracing import SynapseTags, set_tag
from synapse.logging.tracing import SynapseTags, set_attribute
from synapse.metrics import sent_transactions_counter
from synapse.metrics.background_process_metrics import run_as_background_process
from synapse.types import ReadReceipt
Expand Down Expand Up @@ -596,7 +596,7 @@ async def _get_to_device_message_edus(self, limit: int) -> Tuple[List[Edu], int]
if not message_id:
continue

set_tag(SynapseTags.TO_DEVICE_MESSAGE_ID, message_id)
set_attribute(SynapseTags.TO_DEVICE_MESSAGE_ID, message_id)

edus = [
Edu(
Expand Down
8 changes: 4 additions & 4 deletions synapse/federation/sender/transaction_manager.py
Original file line number Diff line number Diff line change
Expand Up @@ -21,11 +21,11 @@
from synapse.events import EventBase
from synapse.federation.persistence import TransactionActions
from synapse.federation.units import Edu, Transaction
from synapse.logging.opentracing import (
from synapse.logging.tracing import (
extract_text_map,
set_tag,
set_status,
start_active_span_follows_from,
tags,
StatusCode,
whitelisted_homeserver,
)
from synapse.types import JsonDict
Expand Down Expand Up @@ -166,7 +166,7 @@ def json_data_cb() -> JsonDict:
except HttpResponseException as e:
code = e.code

set_tag(tags.ERROR, True)
set_status(StatusCode.ERROR)

logger.info("TX [%s] {%s} got %d response", destination, txn_id, code)
raise
Expand Down
6 changes: 3 additions & 3 deletions synapse/federation/transport/server/_base.py
Original file line number Diff line number Diff line change
Expand Up @@ -25,9 +25,9 @@
from synapse.http.servlet import parse_json_object_from_request
from synapse.http.site import SynapseRequest
from synapse.logging.context import run_in_background
from synapse.logging.opentracing import (
from synapse.logging.tracing import (
active_span,
set_tag,
set_attribute,
span_context_from_request,
start_active_span,
start_active_span_follows_from,
Expand Down Expand Up @@ -309,7 +309,7 @@ async def new_func(
raise

# update the active opentracing span with the authenticated entity
set_tag("authenticated_entity", str(origin))
set_attribute("authenticated_entity", str(origin))

# if the origin is authenticated and whitelisted, use its span context
# as the parent.
Expand Down
28 changes: 14 additions & 14 deletions synapse/handlers/device.py
Original file line number Diff line number Diff line change
Expand Up @@ -36,7 +36,7 @@
RequestSendFailed,
SynapseError,
)
from synapse.logging.opentracing import log_kv, set_tag, trace
from synapse.logging.tracing import log_kv, set_attribute, trace
from synapse.metrics.background_process_metrics import (
run_as_background_process,
wrap_as_background_process,
Expand Down Expand Up @@ -86,7 +86,7 @@ async def get_devices_by_user(self, user_id: str) -> List[JsonDict]:
info on each device
"""

set_tag("user_id", user_id)
set_attribute("user_id", user_id)
device_map = await self.store.get_devices_by_user(user_id)

ips = await self.store.get_last_client_ip_by_device(user_id, device_id=None)
Expand Down Expand Up @@ -118,8 +118,8 @@ async def get_device(self, user_id: str, device_id: str) -> JsonDict:
ips = await self.store.get_last_client_ip_by_device(user_id, device_id)
_update_device_from_client_ips(device, ips)

set_tag("device", str(device))
set_tag("ips", str(ips))
set_attribute("device", str(device))
set_attribute("ips", str(ips))

return device

Expand Down Expand Up @@ -169,8 +169,8 @@ async def get_user_ids_changed(
joined a room, that `user_id` may be interested in.
"""

set_tag("user_id", user_id)
set_tag("from_token", str(from_token))
set_attribute("user_id", user_id)
set_attribute("from_token", str(from_token))
now_room_key = self.store.get_room_max_token()

room_ids = await self.store.get_rooms_for_user(user_id)
Expand Down Expand Up @@ -461,8 +461,8 @@ async def delete_devices(self, user_id: str, device_ids: List[str]) -> None:
except errors.StoreError as e:
if e.code == 404:
# no match
set_tag("error", True)
set_tag("reason", "User doesn't have that device id.")
set_attribute("error", True)
set_attribute("reason", "User doesn't have that device id.")
else:
raise

Expand Down Expand Up @@ -794,8 +794,8 @@ async def incoming_device_list_update(
for parsing the EDU and adding to pending updates list.
"""

set_tag("origin", origin)
set_tag("edu_content", str(edu_content))
set_attribute("origin", origin)
set_attribute("edu_content", str(edu_content))
user_id = edu_content.pop("user_id")
device_id = edu_content.pop("device_id")
stream_id = str(edu_content.pop("stream_id")) # They may come as ints
Expand All @@ -815,7 +815,7 @@ async def incoming_device_list_update(
origin,
)

set_tag("error", True)
set_attribute("error", True)
log_kv(
{
"message": "Got a device list update edu from a user and "
Expand All @@ -830,7 +830,7 @@ async def incoming_device_list_update(
if not room_ids:
# We don't share any rooms with this user. Ignore update, as we
# probably won't get any further updates.
set_tag("error", True)
set_attribute("error", True)
log_kv(
{
"message": "Got an update from a user for which "
Expand Down Expand Up @@ -1027,12 +1027,12 @@ async def user_device_resync(
# eventually become consistent.
return None
except FederationDeniedError as e:
set_tag("error", True)
set_attribute("error", True)
log_kv({"reason": "FederationDeniedError"})
logger.info(e)
return None
except Exception as e:
set_tag("error", True)
set_attribute("error", True)
log_kv(
{"message": "Exception raised by federation request", "exception": e}
)
Expand Down
8 changes: 4 additions & 4 deletions synapse/handlers/devicemessage.py
Original file line number Diff line number Diff line change
Expand Up @@ -19,11 +19,11 @@
from synapse.api.errors import SynapseError
from synapse.api.ratelimiting import Ratelimiter
from synapse.logging.context import run_in_background
from synapse.logging.opentracing import (
from synapse.logging.tracing import (
SynapseTags,
get_active_span_text_map,
log_kv,
set_tag,
set_attribute,
)
from synapse.replication.http.devices import ReplicationUserDevicesResyncRestServlet
from synapse.types import JsonDict, Requester, StreamKeyType, UserID, get_domain_from_id
Expand Down Expand Up @@ -217,10 +217,10 @@ async def send_device_message(
sender_user_id = requester.user.to_string()

message_id = random_string(16)
set_tag(SynapseTags.TO_DEVICE_MESSAGE_ID, message_id)
set_attribute(SynapseTags.TO_DEVICE_MESSAGE_ID, message_id)

log_kv({"number_of_to_device_messages": len(messages)})
set_tag("sender", sender_user_id)
set_attribute("sender", sender_user_id)
local_messages = {}
remote_messages: Dict[str, Dict[str, Dict[str, JsonDict]]] = {}
for user_id, by_device in messages.items():
Expand Down
Loading