Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Draft: Migrate to OpenTelemetry tracing #13400

Closed
Closed
Show file tree
Hide file tree
Changes from 16 commits
Commits
Show all changes
75 commits
Select commit Hold shift + click to select a range
0cc610e
Migrate to OpenTelemetry tracing
MadLittleMods Jul 26, 2022
2fe6911
Some shim and some new
MadLittleMods Jul 27, 2022
6984cef
Progress towards OTEL
MadLittleMods Jul 27, 2022
6406fd5
Server running
MadLittleMods Jul 27, 2022
2428172
Export to Jaeger (things are showing up)
MadLittleMods Jul 27, 2022
0d7a2b9
Revert changes to Sentry scopes (not OTEL)
MadLittleMods Jul 27, 2022
9e1de86
We use the config for the Jaeger exporter now
MadLittleMods Jul 27, 2022
f6c3b22
Fix some lints
MadLittleMods Jul 27, 2022
3a25996
Fixup some todos
MadLittleMods Jul 28, 2022
1b0840e
Fix some lints
MadLittleMods Jul 29, 2022
1d208fa
Fix invalid attribute type
MadLittleMods Jul 29, 2022
2011ac2
Fix using wrong type of context (`Context` vs `SpanContext`)
MadLittleMods Jul 29, 2022
19d20b5
Record exception
MadLittleMods Jul 29, 2022
786dd9b
Explain weird function
MadLittleMods Jul 29, 2022
7c135b9
Easier to follow local vs remote span tracing
MadLittleMods Jul 30, 2022
d29a4af
Move to start_active_span
MadLittleMods Jul 30, 2022
041acdf
Working second test although it's a bit pointless testing whether ope…
MadLittleMods Jul 30, 2022
d848156
Passing tests and context manager doesn't seem to be needed
MadLittleMods Jul 30, 2022
070195a
Use correct type for what start_as_current_span returns
MadLittleMods Jul 30, 2022
7772f50
Use HTTP_HOST attribute
MadLittleMods Jul 30, 2022
322da51
Fix some lints
MadLittleMods Aug 1, 2022
33fd24e
todos
MadLittleMods Aug 1, 2022
a9fb504
Implement start_active_span_from_edu for OTEL
MadLittleMods Aug 1, 2022
8e902b8
Remove what's left of scopemanager
MadLittleMods Aug 1, 2022
00be06c
Try to align read from edu content
MadLittleMods Aug 1, 2022
6255a1a
Fix tests and some lints
MadLittleMods Aug 2, 2022
b3cdbad
PoC force tracing
MadLittleMods Aug 2, 2022
d15fa45
Non-working try baggage to inherit force tracing/sampling
MadLittleMods Aug 2, 2022
6bb7cb7
Revert "Non-working try baggage to inherit force tracing/sampling"
MadLittleMods Aug 2, 2022
dbd9005
Revert crazy custom sampler and span process to try force tracing for…
MadLittleMods Aug 2, 2022
0f93ec8
Fix lints
MadLittleMods Aug 2, 2022
36d6648
Remove type ignore comments
MadLittleMods Aug 2, 2022
fb0e820
More clear method names
MadLittleMods Aug 2, 2022
b09651a
Always return config path for config error
MadLittleMods Aug 2, 2022
da396a2
Add test for what happens when side by side spans in with statement
MadLittleMods Aug 2, 2022
ad71bc3
End on exit is already the default expected behavior
MadLittleMods Aug 2, 2022
59facea
Restore logging current_context (not sure why removed
MadLittleMods Aug 2, 2022
9d6fcf3
Clean up some opentracing text references
MadLittleMods Aug 2, 2022
fcc4220
Update docs
MadLittleMods Aug 2, 2022
d72cacf
Add changelog
MadLittleMods Aug 2, 2022
ba4a46a
Seems to (see test_side_by_side_spans)
MadLittleMods Aug 2, 2022
72c718d
Merge branch 'develop' into madlittlemods/11850-migrate-to-opentelemetry
MadLittleMods Aug 2, 2022
c26fa2d
Move to 72 schema version
MadLittleMods Aug 2, 2022
5999132
Fix lints
MadLittleMods Aug 2, 2022
2491665
Fix remnant
MadLittleMods Aug 2, 2022
16d17f7
Fix table missing column
MadLittleMods Aug 3, 2022
b6f5665
Use latested Twisted from source to fix contextvar issues causing OTE…
MadLittleMods Aug 3, 2022
699dad0
Merge branch 'develop' into madlittlemods/11850-migrate-to-opentelemetry
MadLittleMods Aug 3, 2022
270db42
Update treq to match minimum Twisted Python versions
MadLittleMods Aug 3, 2022
f5da762
Revert "Update treq to match minimum Twisted Python versions"
MadLittleMods Aug 3, 2022
ccd4752
Fix tracing imports after merging in develop
MadLittleMods Aug 3, 2022
d7166a0
Update docs/tracing.md
MadLittleMods Aug 3, 2022
7566375
Try fix Twisted/treq problems
MadLittleMods Aug 4, 2022
7024d7b
Merge branch 'develop' into madlittlemods/11850-migrate-to-opentelemetry
MadLittleMods Aug 9, 2022
8def7e4
Merge branch 'develop' into madlittlemods/11850-migrate-to-opentelemetry
MadLittleMods Aug 18, 2022
50f0342
Merge branch 'develop' into madlittlemods/11850-migrate-to-opentelemetry
MadLittleMods Sep 9, 2022
f73bc59
Try to resolve poetry deps
MadLittleMods Sep 9, 2022
a15592d
Poetry install again
MadLittleMods Sep 9, 2022
32b9d16
poetry update
MadLittleMods Sep 9, 2022
6c40dfa
Merge branch 'develop' into madlittlemods/11850-migrate-to-opentelemetry
MadLittleMods Sep 12, 2022
ad3e324
Install otel deps from develop
MadLittleMods Sep 12, 2022
15e242e
OTEL install with DMR
MadLittleMods Sep 13, 2022
d730a46
Update Twisted to lastest
MadLittleMods Sep 13, 2022
ed11237
Remove linting from CI for now
MadLittleMods Sep 13, 2022
19c6f6e
Merge branch 'develop' into madlittlemods/11850-migrate-to-opentelemetry
MadLittleMods Sep 13, 2022
b77d49f
Hopefully fix problem when OTEL not installed with non recording span
MadLittleMods Sep 13, 2022
a027c6e
Maybe fix positional argument mismatch for DummyLink
MadLittleMods Sep 14, 2022
84f91e3
Merge branch 'develop' into madlittlemods/11850-migrate-to-opentelemetry
MadLittleMods Sep 14, 2022
b86869f
Merge branch 'develop' into madlittlemods/11850-migrate-to-opentelemetry
MadLittleMods Sep 20, 2022
e4b9898
Merge branch 'develop' into madlittlemods/11850-migrate-to-opentelemetry
MadLittleMods Sep 26, 2022
4a495ac
Merge branch 'develop' into madlittlemods/11850-migrate-to-opentelemetry
MadLittleMods Oct 1, 2022
7d70acd
Merge branch 'develop' into madlittlemods/11850-migrate-to-opentelemetry
MadLittleMods Oct 20, 2022
627951e
Fix poetry.lock conflicts
MadLittleMods Oct 20, 2022
d993cb0
Merge branch 'develop' into madlittlemods/11850-migrate-to-opentelemetry
MadLittleMods Nov 18, 2022
7acb365
Merge branch 'develop' into madlittlemods/11850-migrate-to-opentelemetry
MadLittleMods Nov 21, 2022
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 0 additions & 3 deletions mypy.ini
Original file line number Diff line number Diff line change
Expand Up @@ -164,9 +164,6 @@ ignore_missing_imports = True
[mypy-pympler.*]
ignore_missing_imports = True

[mypy-rust_python_jaeger_reporter.*]
ignore_missing_imports = True

[mypy-saml2.*]
ignore_missing_imports = True

Expand Down
337 changes: 212 additions & 125 deletions poetry.lock

Large diffs are not rendered by default.

12 changes: 6 additions & 6 deletions pyproject.toml
Original file line number Diff line number Diff line change
Expand Up @@ -175,13 +175,14 @@ authlib = { version = ">=0.14.0", optional = true }
systemd-python = { version = ">=231", optional = true }
lxml = { version = ">=4.2.0", optional = true }
sentry-sdk = { version = ">=0.7.2", optional = true }
opentracing = { version = ">=2.2.0", optional = true }
jaeger-client = { version = ">=4.0.0", optional = true }
txredisapi = { version = ">=1.4.7", optional = true }
hiredis = { version = "*", optional = true }
Pympler = { version = "*", optional = true }
parameterized = { version = ">=0.7.4", optional = true }
idna = { version = ">=2.5", optional = true }
opentelemetry-api = {version = "^1.11.1", optional = true}
opentelemetry-sdk = {version = "^1.11.1", optional = true}
opentelemetry-exporter-jaeger = {version = "^1.11.1", optional = true}

[tool.poetry.extras]
# NB: Packages that should be part of `pip install matrix-synapse[all]` need to be specified
Expand All @@ -196,7 +197,7 @@ oidc = ["authlib"]
systemd = ["systemd-python"]
url_preview = ["lxml"]
sentry = ["sentry-sdk"]
opentracing = ["jaeger-client", "opentracing"]
MadLittleMods marked this conversation as resolved.
Show resolved Hide resolved
opentelemetry = ["opentelemetry-api", "opentelemetry-sdk", "opentelemetry-exporter-jaeger"]
jwt = ["authlib"]
# hiredis is not a *strict* dependency, but it makes things much faster.
# (if it is not installed, we fall back to slow code.)
Expand Down Expand Up @@ -229,8 +230,8 @@ all = [
"lxml",
# sentry
"sentry-sdk",
# opentracing
"jaeger-client", "opentracing",
# opentelemetry
"opentelemetry-api", "opentelemetry-sdk", "opentelemetry-exporter-jaeger",
# redis
"txredisapi", "hiredis",
# cache_memory
Expand All @@ -254,7 +255,6 @@ mypy-zope = "*"
types-bleach = ">=4.1.0"
types-commonmark = ">=0.9.2"
types-jsonschema = ">=3.2.0"
types-opentracing = ">=2.4.2"
types-Pillow = ">=8.3.4"
types-psycopg2 = ">=2.9.9"
types-pyOpenSSL = ">=20.0.7"
Expand Down
12 changes: 6 additions & 6 deletions synapse/api/auth.py
Original file line number Diff line number Diff line change
Expand Up @@ -30,7 +30,7 @@
from synapse.appservice import ApplicationService
from synapse.http import get_request_user_agent
from synapse.http.site import SynapseRequest
from synapse.logging.opentracing import active_span, force_tracing, start_active_span
from synapse.logging.tracing import force_tracing, get_active_span, start_active_span
from synapse.storage.databases.main.registration import TokenLookupResult
from synapse.types import Requester, UserID, create_requester

Expand Down Expand Up @@ -132,7 +132,7 @@ async def get_user_by_req(
is invalid.
AuthError if access is denied for the user in the access token
"""
parent_span = active_span()
parent_span = get_active_span()
with start_active_span("get_user_by_req"):
requester = await self._wrapped_get_user_by_req(
request, allow_guest, allow_expired
Expand All @@ -147,14 +147,14 @@ async def get_user_by_req(
# so we also force it on for that.
force_tracing()
force_tracing(parent_span)
parent_span.set_tag(
parent_span.set_attribute(
"authenticated_entity", requester.authenticated_entity
)
parent_span.set_tag("user_id", requester.user.to_string())
parent_span.set_attribute("user_id", requester.user.to_string())
if requester.device_id is not None:
parent_span.set_tag("device_id", requester.device_id)
parent_span.set_attribute("device_id", requester.device_id)
if requester.app_service is not None:
parent_span.set_tag("appservice_id", requester.app_service.id)
parent_span.set_attribute("appservice_id", requester.app_service.id)
return requester

async def _wrapped_get_user_by_req(
Expand Down
2 changes: 1 addition & 1 deletion synapse/app/_base.py
Original file line number Diff line number Diff line change
Expand Up @@ -62,7 +62,7 @@
from synapse.events.third_party_rules import load_legacy_third_party_event_rules
from synapse.handlers.auth import load_legacy_password_auth_providers
from synapse.logging.context import PreserveLoggingContext
from synapse.logging.opentracing import init_tracer
from synapse.logging.tracing import init_tracer
from synapse.metrics import install_gc_manager, register_threadpool
from synapse.metrics.background_process_metrics import wrap_as_background_process
from synapse.metrics.jemalloc import setup_jemalloc_stats
Expand Down
30 changes: 16 additions & 14 deletions synapse/config/tracer.py
Original file line number Diff line number Diff line change
Expand Up @@ -24,41 +24,43 @@ class TracerConfig(Config):
section = "tracing"

def read_config(self, config: JsonDict, **kwargs: Any) -> None:
opentracing_config = config.get("opentracing")
if opentracing_config is None:
opentracing_config = {}
opentelemetry_config = config.get("opentelemetry")
if opentelemetry_config is None:
opentelemetry_config = {}

self.opentracer_enabled = opentracing_config.get("enabled", False)
self.opentelemetry_enabled = opentelemetry_config.get("enabled", False)

self.jaeger_config = opentracing_config.get(
"jaeger_config",
{"sampler": {"type": "const", "param": 1}, "logging": False},
self.jaeger_exporter_config = opentelemetry_config.get(
"jaeger_exporter_config",
{},
)

self.force_tracing_for_users: Set[str] = set()

if not self.opentracer_enabled:
if not self.opentelemetry_enabled:
return

check_requirements("opentracing")
check_requirements("opentelemetry")

# The tracer is enabled so sanitize the config

self.opentracer_whitelist: List[str] = opentracing_config.get(
self.opentelemetry_whitelist: List[str] = opentelemetry_config.get(
"homeserver_whitelist", []
)
if not isinstance(self.opentracer_whitelist, list):
if not isinstance(self.opentelemetry_whitelist, list):
raise ConfigError("Tracer homeserver_whitelist config is malformed")

force_tracing_for_users = opentracing_config.get("force_tracing_for_users", [])
force_tracing_for_users = opentelemetry_config.get(
"force_tracing_for_users", []
)
if not isinstance(force_tracing_for_users, list):
raise ConfigError(
"Expected a list", ("opentracing", "force_tracing_for_users")
"Expected a list", ("opentelemetry", "force_tracing_for_users")
)
for i, u in enumerate(force_tracing_for_users):
if not isinstance(u, str):
raise ConfigError(
"Expected a string",
("opentracing", "force_tracing_for_users", f"index {i}"),
("opentelemetry", "force_tracing_for_users", f"index {i}"),
)
self.force_tracing_for_users.add(u)
2 changes: 1 addition & 1 deletion synapse/federation/federation_server.py
Original file line number Diff line number Diff line change
Expand Up @@ -61,7 +61,7 @@
nested_logging_context,
run_in_background,
)
from synapse.logging.opentracing import log_kv, start_active_span_from_edu, trace
from synapse.logging.tracing import log_kv, start_active_span_from_edu, trace
from synapse.metrics.background_process_metrics import wrap_as_background_process
from synapse.replication.http.federation import (
ReplicationFederationSendEduRestServlet,
Expand Down
4 changes: 2 additions & 2 deletions synapse/federation/sender/per_destination_queue.py
Original file line number Diff line number Diff line change
Expand Up @@ -32,7 +32,7 @@
from synapse.federation.units import Edu
from synapse.handlers.presence import format_user_presence_state
from synapse.logging import issue9533_logger
from synapse.logging.opentracing import SynapseTags, set_tag
from synapse.logging.tracing import SynapseTags, set_attribute
from synapse.metrics import sent_transactions_counter
from synapse.metrics.background_process_metrics import run_as_background_process
from synapse.types import ReadReceipt
Expand Down Expand Up @@ -596,7 +596,7 @@ async def _get_to_device_message_edus(self, limit: int) -> Tuple[List[Edu], int]
if not message_id:
continue

set_tag(SynapseTags.TO_DEVICE_MESSAGE_ID, message_id)
set_attribute(SynapseTags.TO_DEVICE_MESSAGE_ID, message_id)

edus = [
Edu(
Expand Down
16 changes: 10 additions & 6 deletions synapse/federation/sender/transaction_manager.py
Original file line number Diff line number Diff line change
Expand Up @@ -21,11 +21,12 @@
from synapse.events import EventBase
from synapse.federation.persistence import TransactionActions
from synapse.federation.units import Edu, Transaction
from synapse.logging.opentracing import (
from synapse.logging.tracing import (
Link,
StatusCode,
extract_text_map,
set_tag,
start_active_span_follows_from,
tags,
set_status,
start_active_span,
whitelisted_homeserver,
)
from synapse.types import JsonDict
Expand Down Expand Up @@ -94,7 +95,10 @@ async def send_new_transaction(
if keep_destination:
edu.strip_context()

with start_active_span_follows_from("send_transaction", span_contexts):
with start_active_span(
"send_transaction",
links=[Link(context) for context in span_contexts],
):
logger.debug("TX [%s] _attempt_new_transaction", destination)

txn_id = str(self._next_txn_id)
Expand Down Expand Up @@ -166,7 +170,7 @@ def json_data_cb() -> JsonDict:
except HttpResponseException as e:
code = e.code

set_tag(tags.ERROR, True)
set_status(StatusCode.ERROR, e)

logger.info("TX [%s] {%s} got %d response", destination, txn_id, code)
raise
Expand Down
92 changes: 49 additions & 43 deletions synapse/federation/transport/server/_base.py
Original file line number Diff line number Diff line change
Expand Up @@ -15,7 +15,6 @@
import functools
import logging
import re
import time
from http import HTTPStatus
from typing import TYPE_CHECKING, Any, Awaitable, Callable, Dict, Optional, Tuple, cast

Expand All @@ -25,12 +24,13 @@
from synapse.http.servlet import parse_json_object_from_request
from synapse.http.site import SynapseRequest
from synapse.logging.context import run_in_background
from synapse.logging.opentracing import (
active_span,
set_tag,
from synapse.logging.tracing import (
Link,
create_non_recording_span,
get_active_span,
set_attribute,
span_context_from_request,
start_active_span,
start_active_span_follows_from,
whitelisted_homeserver,
)
from synapse.types import JsonDict
Expand Down Expand Up @@ -309,59 +309,65 @@ async def new_func(
raise

# update the active opentracing span with the authenticated entity
set_tag("authenticated_entity", str(origin))
set_attribute("authenticated_entity", str(origin))

# if the origin is authenticated and whitelisted, use its span context
# as the parent.
context = None
origin_span_context = None
if origin and whitelisted_homeserver(origin):
context = span_context_from_request(request)

if context:
servlet_span = active_span()
# a scope which uses the origin's context as a parent
processing_start_time = time.time()
scope = start_active_span_follows_from(
origin_span_context = span_context_from_request(request)

if origin_span_context:
local_servlet_span = get_active_span()
# Create a span which uses the `origin_span_context` as a parent
# so we can see how the incoming payload was processed while
# we're looking at the outgoing trace. Since the parent is set
# to a remote span (from the origin), it won't show up in the
# local trace which is why we create another span below for the
# local trace. A span can only have one parent so we have to
# create two separate ones.
remote_parent_span = start_active_span(
"incoming-federation-request",
child_of=context,
contexts=(servlet_span,),
start_time=processing_start_time,
context=origin_span_context,
# Cross-link back to the local trace so we can jump
# to the incoming side from the remote origin trace.
links=[Link(local_servlet_span.get_span_context())],
)

# Create a local span to appear in the local trace
local_parent_span = start_active_span(
"process-federation-request",
# Cross-link back to the remote outgoing trace so we jump over
# there.
links=[Link(remote_parent_span.get_span_context())],
MadLittleMods marked this conversation as resolved.
Show resolved Hide resolved
)

else:
# just use our context as a parent
scope = start_active_span(
"incoming-federation-request",
# Otherwise just use our local context as a parent
local_parent_span = start_active_span(
"process-federation-request",
)

try:
with scope:
if origin and self.RATELIMIT:
with ratelimiter.ratelimit(origin) as d:
await d
if request._disconnected:
logger.warning(
"client disconnected before we started processing "
"request"
)
return None
response = await func(
origin, content, request.args, *args, **kwargs
# Don't need to record anything for the remote
remote_parent_span = create_non_recording_span()

with remote_parent_span, local_parent_span:
if origin and self.RATELIMIT:
with ratelimiter.ratelimit(origin) as d:
await d
if request._disconnected:
logger.warning(
"client disconnected before we started processing "
"request"
)
else:
return None
response = await func(
origin, content, request.args, *args, **kwargs
)
finally:
# if we used the origin's context as the parent, add a new span using
# the servlet span as a parent, so that we have a link
if context:
scope2 = start_active_span_follows_from(
"process-federation_request",
contexts=(scope.span,),
start_time=processing_start_time,
else:
response = await func(
origin, content, request.args, *args, **kwargs
)
scope2.close()

return response

Expand Down
Loading