Conversation

lukel97
Contributor

@lukel97 lukel97 commented Sep 25, 2025

Something changed in SQLAlchemy 1.3 after we upgraded (maybe in 5b02472?) that caused the graph and matrix queries to start erroring with something like:

> InvalidRequestError("Can't determine which FROM clause to join from, there are multiple FROMS which can join to this entity. Please use the .select_from() method to establish an explicit left side, as well as providing an explicit ON clause if not present already to help resolve the ambiguity.")

This fixes the errors by explicitly specifying which table to use as the FROM clause. From what I understand, the error only occurs when there are multiple joins in the same query.

The query in load_geomean_data also seemed to be missing a join on the samples table; I guess SQLAlchemy must have previously inferred the join, since the relations go run -> sample -> test.
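
To illustrate the pattern outside of LNT, here is a minimal self-contained sketch (hypothetical Order/Run/Sample models and table names, not the real LNT schema or the exact queries in lnt/server/ui/views.py) of what pinning the left side with .select_from() and explicit ON clauses looks like in SQLAlchemy 1.3/1.4:

```python
# Minimal sketch with hypothetical models (Order -> Run -> Sample) standing in
# for the real LNT schema; written against the SQLAlchemy 1.3/1.4 ORM APIs.
from sqlalchemy import Column, ForeignKey, Integer, create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import Session, relationship

Base = declarative_base()

class Order(Base):
    __tablename__ = "run_order"
    id = Column(Integer, primary_key=True)

class Run(Base):
    __tablename__ = "run"
    id = Column(Integer, primary_key=True)
    order_id = Column(Integer, ForeignKey("run_order.id"))
    order = relationship(Order)

class Sample(Base):
    __tablename__ = "sample"
    id = Column(Integer, primary_key=True)
    run_id = Column(Integer, ForeignKey("run.id"))
    run = relationship(Run)

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
session = Session(bind=engine)

# When a query selects columns from several of these entities, a bare .join()
# can leave SQLAlchemy unable to pick a unique left side, which is the
# InvalidRequestError quoted above. Pinning the left side with .select_from()
# and spelling out the ON clause for each hop removes the ambiguity.
values = (
    session.query(Sample.id, Run.id, Order.id)
    .select_from(Sample)                      # explicit left side of the joins
    .join(Run, Sample.run_id == Run.id)       # each sample belongs to one run
    .join(Order, Run.order_id == Order.id)    # each run belongs to one order
)
print(values.all())  # empty here, since no rows were inserted
```

The rendered SQL then reads FROM sample JOIN run ON ... JOIN run_order ON ..., so there is only one possible left side for each join.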

This affected both the sqlite and postgres backends, and I've tested this fixes the error with both.

@DavidSpickett
Contributor

I see this fix is similar to the common ones suggested online for this same error, and:

> This affected both the sqlite and postgres backends, and I've tested this fixes the error with both.

I'll take your word for it, but I've got zero knowledge of how the fix actually works.

So give it a few days to see if the other reviewers do, but if not, it's probably safe to go in.

@DavidSpickett
Contributor

I'll see if I can find someone who knows what they're talking about in the meantime :)

@lukel97
Contributor Author

lukel97 commented Sep 30, 2025

> I'll see if I can find someone who knows what they're talking about in the meantime :)

I appreciate it, SQL is not my area of expertise either :)

@ldionne
Member

ldionne commented Sep 30, 2025

Thanks for fixing this! I was looking at this earlier today before I found your PR. I tried pulling down the patch, and I'm seeing something strange though: the graph doesn't seem to work properly:

[Screenshot: the graph with every data point plotted for every order]

It seems that all points are displayed for all orders. This could be because I am importing data into my local instance with lnt importreport, which might have other problems. If you don't see this behavior, your fix is probably good and I should look into this other issue.

Also, out of curiosity, do you use Python 2 or Python 3? How did you manage to run the tests?

Edit: After inspecting the database, I don't think lnt importreport is misbehaving. I think there's a genuine issue with the query as written in the patch; see my next comment.

@ldionne
Member

ldionne commented Oct 2, 2025

I played around with this some more, and here's the query that got things working for me:

diff --git a/lnt/server/ui/views.py b/lnt/server/ui/views.py
index 25947dd..9731f87 100644
--- a/lnt/server/ui/views.py
+++ b/lnt/server/ui/views.py
@@ -876,15 +876,13 @@ def load_graph_data(plot_parameter, show_failures, limit, xaxis_date, revision_c
     ts = request.get_testsuite()
 
     # Load all the field values for this test on the same machine.
-    #
-    # FIXME: Don't join to Order here, aggregate this across all the tests
-    # we want to load. Actually, we should just make this a single query.
-    values = session.query(plot_parameter.field.column, ts.Order,
-                           ts.Run.start_time, ts.Run.id) \
-                    .select_from(ts.Run).join(ts.Order) \
+    values = session.query(plot_parameter.field.column, ts.Run, ts.Order) \
+                    .join(ts.Run, ts.Sample.run_id == ts.Run.id) \
+                    .join(ts.Order, ts.Run.order_id == ts.Order.id) \
                     .filter(ts.Run.machine_id == plot_parameter.machine.id) \
                     .filter(ts.Sample.test == plot_parameter.test) \
                     .filter(plot_parameter.field.column.isnot(None))
+
     # Unless all samples requested, filter out failing tests.
     if not show_failures:
         if plot_parameter.field.status_field:
@@ -895,16 +893,18 @@ def load_graph_data(plot_parameter, show_failures, limit, xaxis_date, revision_c
 
     if xaxis_date:
         # Aggregate by date.
+        # TODO: Why do we use the start_time of the Run for this ordering?
+        #       Shouldn't we use the date associated to the Order (i.e. the commit date) instead?
         data = list(multidict.multidict(
-            (date, (val, order, date, run_id))
-            for val, order, date, run_id in values).items())
+            (run.start_time, (val, order, run.start_time, run.id))
+            for val, run, order in values).items())
         # Sort data points according to date.
         data.sort(key=lambda sample: sample[0])
     else:
         # Aggregate by order (revision).
         data = list(multidict.multidict(
-            (order.llvm_project_revision, (val, order, date, run_id))
-            for val, order, date, run_id in values).items())
+            (order.llvm_project_revision, (val, order, run.start_time, run.id))
+            for val, run, order in values).items())
         # Sort data points according to order (revision).
         data.sort(key=lambda sample: convert_revision(sample[0], cache=revision_cache))
 

@lukel97
Contributor Author

lukel97 commented Oct 3, 2025

@ldionne whoops, thanks for pointing that out! The dataset I'm testing this on is all over the place, so I didn't even notice the query was bogus. It looks like the table we need to explicitly select from is NT_Sample; I pushed a fix in 8b789bb. So the hierarchy is that one order has many runs, each of which has many samples.
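
To make the symptom concrete (a sketch reusing the hypothetical models from the earlier example, not the exact code in the PR or in 8b789bb): if the query is anchored at the runs table while still selecting a samples column, the samples table ends up in the FROM list with no join condition, which gives the kind of cross-product result where every point shows up for every order.

```python
# Sketch reusing the hypothetical Order/Run/Sample models above; not the exact
# LNT code. Anchoring at Run while selecting a Sample column leaves "sample"
# in the FROM list with no condition linking it to "run", i.e. a cartesian
# product (SQLAlchemy 1.4 typically warns about this when the query runs).
cross_product = (
    session.query(Sample.id, Order.id)
    .select_from(Run)
    .join(Order, Run.order_id == Order.id)
)
print(cross_product)  # rendered SQL has both "sample" and "run JOIN run_order",
                      # with nothing tying sample rows to run rows

# Anchoring at Sample instead (as in the earlier sketch) joins every hop of the
# order <- run <- sample hierarchy explicitly, giving one row per measurement.
```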

> Also, out of curiosity, do you use Python 2 or Python 3? How did you manage to run the tests?

I've been using Python 3; I'm not sure if it still runs on Python 2. For testing I've just been running lit on the tests folder, i.e. lit tests, but most tests seem to be failing on Python 3.13.

Member

@ldionne ldionne left a comment

Ack, thanks! I tested the latest version of your patch locally and indeed, that seems to work for me.

@ldionne ldionne merged commit 17a374c into llvm:main Oct 3, 2025
2 checks passed