Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Filter out unwanted user_agents from udv. #16124

Merged
merged 5 commits into from
Aug 23, 2023
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions changelog.d/16124.bugfix
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
Filter out user agent references to the sliding sync proxy and rust-sdk from the user_daily_visits table to ensure that Element X can be represented fully.
5 changes: 4 additions & 1 deletion synapse/storage/databases/main/metrics.py
Original file line number Diff line number Diff line change
Expand Up @@ -411,7 +411,10 @@ def _generate_user_daily_visits(txn: LoggingTransaction) -> None:
ON u.user_id = udv.user_id AND u.device_id=udv.device_id
INNER JOIN users ON users.name=u.user_id
WHERE ? <= last_seen AND last_seen < ?
AND udv.timestamp IS NULL AND users.is_guest=0
AND udv.timestamp IS NULL
AND u.user_agent !='sync-v3-proxy-'
DMRobertson marked this conversation as resolved.
Show resolved Hide resolved
AND u.user_agent !='matrix-rust-sdk'
AND users.is_guest=0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the definition of the user_ips table it appears that there is one row per user/device/ip, and as such we only store one user agent per device. The upshot of this is that we won't necessarily pick up e.g. Element X user agents if its been clobbered by a sliding sync user agent, which I think will materially skew the stats here?

I wonder if we should instead be filtering out these user agents on insertion instead?

I'm still happy to merge this as is for a first cut if it'd be useful?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, good catch. No point in merging as is then. I'll edit to remove the UAs from insertion.

Note the 'matrix-rust-sdk' entries generated from element x should reduce to zero following element-hq/element-x-ios#1507

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FTR I think you want to do the filtering here:

async def insert_client_ip(
self,
user_id: str,
access_token: str,
ip: str,
user_agent: str,
device_id: Optional[str],
now: Optional[int] = None,
) -> None:
if not now:
now = int(self._clock.time_msec())
key = (user_id, access_token, ip)
try:
last_seen = self.client_ip_last_seen.get(key)
except KeyError:
last_seen = None
# Rate-limited inserts
if last_seen is not None and (now - last_seen) < LAST_SEEN_GRANULARITY:
return
self.client_ip_last_seen.set(key, now)
if self._update_on_this_worker:
await self.populate_monthly_active_users(user_id)
self._batch_row_update[key] = (user_agent, device_id, now)
else:
# We are not the designated writer-worker, so stream over replication
self.hs.get_replication_command_handler().send_user_ip(
user_id, access_token, ip, user_agent, device_id, now
)

AND users.appservice_id IS NULL
GROUP BY u.user_id, u.device_id
"""
Expand Down
Loading