
ref(ch-upgrades): update query_comparer #5584

Merged: 4 commits into master on Mar 5, 2024

Conversation

MeredithAnya (Member) commented Feb 23, 2024

Updating the comparer to use the file_manager and grab the files to compare from GCS. Corresponding ops PR https://github.com/getsentry/ops/pull/9471

What the comparer does:

  1. Looks at the results-* directories (e.g. results-21-8, results-22-8) to see whether we have pairs of results ready to be compared, and groups them into matched_pairs, e.g.
     [('results-21-8/2024_02_15/meredith_test_1.csv', 'results-22-8/2024_02_15/meredith_test_1.csv')]
  2. Iterates through those pairs to compare the results from those files. Before doing so, it checks the compared-data/ and compared-perf/ directories to make sure we haven't already done the comparison (this check is skipped when --override is used).
  3. Compares the results and generates mismatches for the data and perf categories based on the result data. The thresholds for what counts as a mismatch live in the corresponding _THRESHOLDS dicts.
  4. Saves the comparison results for each category separately, and sends the Slack overview plus a CSV file per category. Mismatch percentages are calculated against the total rows of the files, so adding the percentages of both categories can exceed 100%: the same query_id can mismatch in both performance and data consistency.
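The pairing in step 1 can be sketched as follows. This is a hypothetical illustration, not the actual snuba code: match_pairs and the path-splitting logic are assumptions, and only the example paths come from the description above.

```python
# Hypothetical sketch of step 1: a file is "ready to compare" only when
# the same relative path (date/name.csv) exists under both version
# directories. The helper name and splitting logic are illustrative.
from typing import Dict, Sequence, Tuple


def match_pairs(
    base_files: Sequence[str], upgrade_files: Sequence[str]
) -> Sequence[Tuple[str, str]]:
    # Key each file by its path relative to the results-* directory.
    base: Dict[str, str] = {f.split("/", 1)[1]: f for f in base_files}
    upgrade: Dict[str, str] = {f.split("/", 1)[1]: f for f in upgrade_files}
    # Only relative paths present on both sides form a matched pair.
    return [(base[k], upgrade[k]) for k in sorted(base.keys() & upgrade.keys())]


pairs = match_pairs(
    ["results-21-8/2024_02_15/meredith_test_1.csv"],
    ["results-22-8/2024_02_15/meredith_test_1.csv"],
)
# pairs == [('results-21-8/2024_02_15/meredith_test_1.csv',
#            'results-22-8/2024_02_15/meredith_test_1.csv')]
```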

@MeredithAnya MeredithAnya requested a review from a team as a code owner February 23, 2024 17:24
codecov bot commented Feb 23, 2024

Codecov Report

Attention: Patch coverage is 51.79487%, with 94 lines in your changes missing coverage. Please review.

Project coverage is 90.02%. Comparing base (9693ca2) to head (b135959).
Report is 22 commits behind head on master.

Files                                       Patch %   Lines
snuba/cli/query_comparer.py                 33.88%    80 Missing ⚠️
snuba/clickhouse/upgrades/comparisons.py    68.18%    14 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5584      +/-   ##
==========================================
+ Coverage   89.94%   90.02%   +0.07%     
==========================================
  Files         900      900              
  Lines       43624    43809     +185     
  Branches      288      299      +11     
==========================================
+ Hits        39236    39437     +201     
+ Misses       4346     4330      -16     
  Partials       42       42              


snuba/cli/query_comparer.py: outdated review comments (resolved)
for v1_row, v2_row in itertools.zip_longest(base_reader, upgrade_reader):
    if v1_row[0] == "query_id":
        # csv header row

def get_matched_pairs() -> Sequence[Tuple[str, str]]:
Reviewer (Member):
Can you document what this function does and what it returns? Similar to what you have in the PR description.

Reviewer (Member):
Nit: Still missing a small description of what this does :-)
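The zip_longest loop in the review context above walks both result CSVs in lockstep. A self-contained sketch of that pattern, with hypothetical file contents and a hypothetical count_row_mismatches helper (the real comparer applies per-column thresholds rather than whole-row equality):

```python
import csv
import itertools


def count_row_mismatches(base_path: str, upgrade_path: str) -> int:
    """Count rows that differ between two result CSVs (illustrative sketch)."""
    mismatches = 0
    with open(base_path) as f1, open(upgrade_path) as f2:
        base_reader = csv.reader(f1)
        upgrade_reader = csv.reader(f2)
        # zip_longest pads the shorter file with None, so a row present in
        # only one file counts as a mismatch instead of being skipped.
        for v1_row, v2_row in itertools.zip_longest(base_reader, upgrade_reader):
            if v1_row is not None and v1_row[0] == "query_id":
                # csv header row
                continue
            if v1_row != v2_row:
                mismatches += 1
    return mismatches
```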


PERF_THRESHOLDS = {
    "query_duration_ms": 0,
    "read_rows": 0,
Reviewer (Member):
Why have read_rows and read_bytes in perf thresholds?

MeredithAnya (Member, Author):

Do we care to track whether a query is reading more data than before? That would probably be reflected in the duration, I guess.

Reviewer (Member):
For query performance, those shouldn't really matter as the first pass for finding mismatches; you should only use query duration. If you want to drill down into mismatches, those additional fields may give us hints.
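The thresholds being discussed gate which deltas count as mismatches. A minimal sketch of how a _THRESHOLDS dict might be applied per column; the perf_mismatches helper, its signature, and the read_bytes entry are assumptions, not the actual comparer code:

```python
from typing import Dict

# Columns and threshold values are illustrative (taken from or modeled on
# the snippet under review); a delta must exceed its threshold to count.
PERF_THRESHOLDS: Dict[str, float] = {
    "query_duration_ms": 0,
    "read_rows": 0,
    "read_bytes": 0,  # hypothetical entry, per the discussion above
}


def perf_mismatches(
    base: Dict[str, float], upgrade: Dict[str, float]
) -> Dict[str, float]:
    """Return the columns whose absolute delta exceeds the threshold,
    mapped to the signed delta (upgrade minus base)."""
    return {
        col: upgrade[col] - base[col]
        for col, threshold in PERF_THRESHOLDS.items()
        if abs(upgrade[col] - base[col]) > threshold
    }


perf_mismatches(
    {"query_duration_ms": 100, "read_rows": 50, "read_bytes": 1024},
    {"query_duration_ms": 120, "read_rows": 50, "read_bytes": 1024},
)
# -> {"query_duration_ms": 20}
```

With a threshold of 0, any nonzero delta is flagged; raising a column's threshold filters out noise for that metric, which is why the review suggests leaning on query_duration_ms alone for the first pass.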

MeredithAnya merged commit 0a1f79b into master on Mar 5, 2024 (31 of 32 checks passed).
MeredithAnya deleted the meredith/update-comparer branch on March 5, 2024 at 23:51.