Fix "Can't determine which FROM clause to join from" SQLAlchemy error #38
Something changed in SQLAlchemy 1.3 after we upgraded (maybe in 5b02472?) which caused the graph and matrix queries to start erroring with something like:

> InvalidRequestError("Can't determine which FROM clause to join from, there are multiple FROMS which can join to this entity. Please use the .select_from() method to establish an explicit left side, as well as providing an explicit ON clause if not present already to help resolve the ambiguity.")

This fixes the errors by explicitly specifying which table to use as the FROM clause. From what I understand, we only hit this error when there are multiple joins in the same query.

The query in `load_geomean_data` also seemed to be missing a join on the samples table; I guess SQLAlchemy must have previously inferred the join, since the relations go run -> sample -> test.

This affected both the sqlite and postgres backends, and I've tested that this fixes the error with both.
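For anyone else hitting this, here's a minimal, self-contained sketch of the pattern. The `Order`/`Run`/`Sample` models below are made-up stand-ins for the `ts.*` entities, not LNT's actual schema. With several entities in the columns clause, SQLAlchemy 1.3+ refuses to guess the left side of the first `join()`; pinning it with `select_from()` and spelling out the ON clauses resolves the ambiguity:

```python
from sqlalchemy import Column, ForeignKey, Integer, create_engine
from sqlalchemy.orm import Session

try:
    from sqlalchemy.orm import declarative_base  # SQLAlchemy >= 1.4
except ImportError:
    from sqlalchemy.ext.declarative import declarative_base  # SQLAlchemy 1.3

Base = declarative_base()


# Hypothetical stand-ins for LNT's order -> run -> sample hierarchy.
class Order(Base):
    __tablename__ = "order_"
    id = Column(Integer, primary_key=True)


class Run(Base):
    __tablename__ = "run"
    id = Column(Integer, primary_key=True)
    order_id = Column(Integer, ForeignKey("order_.id"))


class Sample(Base):
    __tablename__ = "sample"
    id = Column(Integer, primary_key=True)
    run_id = Column(Integer, ForeignKey("run.id"))
    value = Column(Integer)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
session = Session(engine)

# With three entities in query(), SQLAlchemy 1.3+ can no longer infer which
# one is the left side of the first join(); select_from() makes it explicit,
# and the explicit ON clauses remove any remaining ambiguity.
q = (
    session.query(Sample.value, Run, Order)
    .select_from(Sample)
    .join(Run, Sample.run_id == Run.id)
    .join(Order, Run.order_id == Order.id)
)
print(q.count())  # 0 rows in this empty in-memory database
```

Without the `select_from(Sample)` line, 1.3 raises the same `InvalidRequestError` as above; older versions silently picked a left side.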
I see this fix is similar to the common ones suggested online for this same error.
I'll take your word for it, but I've got zero knowledge of how the fix actually works. So give it a few days to see if the others chime in, but if not, it's probably safe to go in.
I'll see if I can find someone who knows what they're talking about in the meantime :)
I appreciate it, SQL is not my area of expertise either :)
I played around with this some more, and here's the query that got things working for me:

```diff
diff --git a/lnt/server/ui/views.py b/lnt/server/ui/views.py
index 25947dd..9731f87 100644
--- a/lnt/server/ui/views.py
+++ b/lnt/server/ui/views.py
@@ -876,15 +876,13 @@ def load_graph_data(plot_parameter, show_failures, limit, xaxis_date, revision_c
     ts = request.get_testsuite()
     # Load all the field values for this test on the same machine.
-    #
-    # FIXME: Don't join to Order here, aggregate this across all the tests
-    #        we want to load. Actually, we should just make this a single query.
-    values = session.query(plot_parameter.field.column, ts.Order,
-                           ts.Run.start_time, ts.Run.id) \
-        .select_from(ts.Run).join(ts.Order) \
+    values = session.query(plot_parameter.field.column, ts.Run, ts.Order) \
+        .join(ts.Run, ts.Sample.run_id == ts.Run.id) \
+        .join(ts.Order, ts.Run.order_id == ts.Order.id) \
         .filter(ts.Run.machine_id == plot_parameter.machine.id) \
         .filter(ts.Sample.test == plot_parameter.test) \
         .filter(plot_parameter.field.column.isnot(None))
+
     # Unless all samples requested, filter out failing tests.
     if not show_failures:
         if plot_parameter.field.status_field:
@@ -895,16 +893,18 @@ def load_graph_data(plot_parameter, show_failures, limit, xaxis_date, revision_c
     if xaxis_date:
         # Aggregate by date.
+        # TODO: Why do we use the start_time of the Run for this ordering?
+        # Shouldn't we use the date associated to the Order (i.e. the commit date) instead?
         data = list(multidict.multidict(
-            (date, (val, order, date, run_id))
-            for val, order, date, run_id in values).items())
+            (run.start_time, (val, order, run.start_time, run.id))
+            for val, run, order in values).items())
         # Sort data points according to date.
         data.sort(key=lambda sample: sample[0])
     else:
         # Aggregate by order (revision).
         data = list(multidict.multidict(
-            (order.llvm_project_revision, (val, order, date, run_id))
-            for val, order, date, run_id in values).items())
+            (order.llvm_project_revision, (val, order, run.start_time, run.id))
+            for val, run, order in values).items())
         # Sort data points according to order (revision).
         data.sort(key=lambda sample: convert_revision(sample[0], cache=revision_cache))
```
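On the aggregation side of that diff: `multidict.multidict` groups `(key, value)` pairs into a list per key. As a rough sketch of what the view does, a plain `defaultdict(list)` behaves the same way (the revision keys and sample values below are made up for illustration):

```python
from collections import defaultdict

# Stand-in for lnt.util's multidict: group (key, payload) pairs by key.
# Keys here play the role of order.llvm_project_revision; payloads are
# made-up (value, revision) tuples.
rows = [
    ("r100", (3.1, "r100")),
    ("r101", (2.9, "r101")),
    ("r100", (3.0, "r100")),
]
grouped = defaultdict(list)
for key, payload in rows:
    grouped[key].append(payload)

# Sort data points according to key, as the view sorts by revision.
data = sorted(grouped.items(), key=lambda sample: sample[0])
print(data)
# → [('r100', [(3.1, 'r100'), (3.0, 'r100')]), ('r101', [(2.9, 'r101')])]
```

Insertion order within each group is preserved, which is why the view still sorts the outer list afterwards.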
@ldionne whoops, thanks for pointing that out! The dataset I'm testing this on is all over the place, so I didn't even notice the query was bogus. It looks like the table we need to explicitly select from is NT_Sample; I pushed a fix in 8b789bb. So the hierarchy is that one order has many runs, and each run has many samples.
I've been using Python 3; I'm not sure if it still runs on Python 2. For testing I've just been running lit in the tests folder.
Ack, thanks! I tested the latest version of your patch locally and indeed, that seems to work for me.