Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a new sequential flag for a graph #2744

Closed
wants to merge 9 commits into from

Conversation

ghalliday
Copy link
Member

Occasionally (e.g., for APPLY) the actions in a graph should be
executed SEQUENTIALly, rather than in PARALLEL. This code adds
an attribute onto the graph to indicate if this is true.
applyaction.ecl is an example that requires it.

Will also require changes in roxie/thor. See issue #2351.

Signed-off-by: Gavin Halliday gavin.halliday@lexisnexis.com

@ghalliday
Copy link
Member Author

@richardkchapman I think this is what you need to ensure roxie behaviour is consistent.
@jakesmith I'm not sure if you have similar issues, but I suspect you may
@rengolin please review.

This change also cleans up the code a bit - there is more that could be done.

@ghost ghost assigned ghalliday Jul 3, 2012
@@ -579,7 +579,7 @@ class CQueryFactory : public CInterface, implements IQueryFactory, implements IR
ActivityArray *loadChildGraph(IPropertyTree &graph)
{
// MORE - this is starting to look very much like loadGraph (on Roxie server side)
ActivityArray *activities = new ActivityArray(true, graph.getPropBool("@delayed"), graph.getPropBool("@library"));
ActivityArray *activities = new ActivityArray(true, graph.getPropBool("@delayed"), graph.getPropBool("@library"), graph.getPropBool("@sequential"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll end up with a long list of booleans... It's more effective to create bitfields here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll refactor when it gets to 5....

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

5 is the charm... ;)

@rengolin
Copy link
Contributor

rengolin commented Jul 3, 2012

Looks good

@jakesmith
Copy link
Member

Change is fine afaict, but

@jakesmith I'm not sure if you have similar issues, but I suspect you may

yes, ones I was unaware of.

a) Thor executes a group of child queries like the ones generated in applyaction.ecl in sequence always at the moment.
b) The order is well defined, the child graphs are in a HT, keyed by sub graph Id. The execution iterates the HT table, so the order although deterministic, is not what you might be expecting...

What order do you expect them to be executed in , ascending subgraph id order ?

@rengolin
Copy link
Contributor

rengolin commented Jul 3, 2012

What order do you expect them to be executed in , ascending subgraph id order ?

If the subgraphs are in the same order as the statements in the ECL code, then yes.

The manual says:

"The APPLY action performs all the specified actions in the actionlist on each record of the nominated dataset. The actions execute in the order they appear in the actionlist."

@jakesmith
Copy link
Member

If the subgraphs are in the same order as the statements in the ECL code, then yes.

that's the key question, I don't even see it as an 'apply' or rather, I don't see any differentiation between a child query set being executed from an apply, as opposed to from any other generated helper.

So the flag will differentiate, i.e. tell me should be ordered, but just clarifying that a) the order is preserved somehow in the generated graph and b) the order = subgraph id order, and/or order generated (and are those 2 things the same thing).
Currently Thor is oblivious, and holds the child queries in a hash table keyed by graph id, and iterates the ht to execute, so it is neither of these orders.

@rengolin
Copy link
Contributor

rengolin commented Jul 3, 2012

Currently Thor is oblivious, and holds the child queries in a hash table keyed by graph id, and iterates the ht to execute, so it is neither of these orders.

I think the engines should know about this. Declaration order and APPLY order can be quite different. You can have multiple APPLYs, each one with a different order. Today, they'll all execute identically.

I don't see the point of an apply if the order is not kept.

Btw, how does SEQUENTIAL work? The semantics is very similar...

@jakesmith
Copy link
Member

It's not in debate that the engines should know, I just need to be clear what deterministic order to follow..

@rengolin
Copy link
Contributor

rengolin commented Jul 3, 2012

Maybe we need another attribute/atom: the order in the apply/sequential list. @ghalliday ?

@richardkchapman
Copy link
Member

The order they should be executed in is the same as the order in the XML, which I believe is also ascending subgraph id order.

I wouldn't have thought there were normally enough subgraphs - or enough calls to lookup a subgraph by id - to justify a hash table.

@richardkchapman
Copy link
Member

(it's a little surprising that thor always passes the applyaction test at present, if what you say about the execution order is correct)

@jakesmith
Copy link
Member

@ghalliday had some doubt I think whether the subgraph id order was always correct.

I doubt ever many subgraphs in this situation either, but it's same structure as elsewhere where there can be 100's.
Can easily be changed to have a ordered array too of course, just need clarification on order.

@rengolin
Copy link
Contributor

rengolin commented Jul 4, 2012

it's a little surprising that thor always passes the applyaction test at present,

The test results were probably created from the output of Thor (or similar), so not that surprising. All is needed is that Thor never changes its order and the test/results never change.

I wouldn't have thought there were normally enough subgraphs - or enough calls to lookup a subgraph by id - to justify a hash table.

I agree. Even if there are hundreds, it's still hundreds.

just need clarification on order.

The manual says the order is what's on apply, Thor does it by ID, Roxie not sure. I believe there is no reason for APPLY other than apply actions in order, so its order is all that matters.

Not sure what to do if one item in the list has a dependency on a further item in the list, requiring the latter to execute first, though. Maybe the APPLY needs some clarification on its purpose before we decide to change the manual or the code.

@richardkchapman
Copy link
Member

Roxie does it by order in XML.

@ghalliday
Copy link
Member Author

Thinking out loud...

I'm not sure I can guarantee that the id order matches the order of statements in the apply. E .g., if there are 3 actions, and actions 1 and 3 share a significant piece of code then they might end up in the same subgraph, and 2 in a different one. If that were the case I would need to add an attribute onto each root activity to indicate the sequence.

However I imagine that is going to cause significant problems for thor if they are in different subgraphs, but need to be executed in order. If this is a child graph does that change the picture?

If so I think that might mean that if the actions need to be executed sequentially then each needs to be in a separate subgraph. The engines can then assume they can be executed in sequence number order.

@rengolin
Copy link
Contributor

rengolin commented Jul 5, 2012

If so I think that might mean that if the actions need to be executed sequentially then each needs to be in a separate subgraph. The engines can then assume they can be executed in sequence number order.

I don't think it's that simple. Suppose you have two applies with the same set of actions, but in different order:

apply(a1, a2, a3);
apply(a3, a1, a2);

The first apply will run "correctly", but the second won't.

So, either the compiler duplicate the actions in different subgraphs, to make Thor run them in the right order, or the compiler marks the actions with special ids that will tell Thor in which order to run them on each apply.

@jakesmith
Copy link
Member

either Gavin wouldn't be able common up in that case, which would be bad news, or perhaps need a containing graph for each, defining dependencies [somehow] to the action subgraphs, that defined order.

However I imagine that is going to cause significant problems for thor if they are in different subgraphs, but need to be executed in order.

Ordered execution of [some?] sinks in different subgraphs.. I'd like to avoid that if possible.
Not sure what all the implications would be.. might have to execute a dependency, but not execute a sink, then revisit that subgraph.. Sounds nasty.

If this is a child graph does that change the picture?

Not entirely sure what you mean, all in a childgraph? in same way is now in e.g. applyactions.ecl ?

@richardkchapman
Copy link
Member

I think we should not worry too much about the efficiency implications of apply not being able to common up.

@rengolin
Copy link
Contributor

rengolin commented Jul 5, 2012

might have to execute a dependency, but not execute a sink, then revisit that subgraph.. Sounds nasty.

That was my initial idea on how to optimise without reordering. Crop all functionality to a separate subgraph and make all sinks depend on them.

@rengolin
Copy link
Contributor

Irrespective on how this will execute in Thor or Roxie, I think this pull request is good to go. Thor and Roxie can later decide what to do with the flag that is being provided.

@ghalliday
Copy link
Member Author

I think I need to ensure the actions are in separate subgraphs before it is ok to pull.

@richardkchapman
Copy link
Member

Please amend commit message to meet guidelines in CONTRIBUTORS file and ensure there is an associated issue.

@richardkchapman
Copy link
Member

This pull request seems to have stalled...

@ghalliday
Copy link
Member Author

Yes. I'm going to switch back to looking at it soon. I think I'll give up on my pure/volatile changes for the moment - at least until I can decided what they should do!

Kevin Wang and others added 5 commits November 14, 2012 15:19
The existing code does not do anything (just breaks the HTTP
connect) if a dropzone does not have access right for uploading
file. As a result, the connection reset error is displayed on
browser. This fix catches all of exceptions for uploading file
and display the exception message in a html page.

Signed-off-by: Kevin Wang <kevin.wang@lexisnexis.com>
Signed-off-by: Kevin Wang <kevin.wang@lexisnexis.com>
Signed-off-by: Gavin Halliday <gavin.halliday@lexisnexis.com>
Signed-off-by: Kevin Wang <kevin.wang@lexisnexis.com>
HPCC-8183 Display a better message when uploading file fails

Reviewed-By: Anthony Fishbeck <anthony.fishbeck@lexisnexis.com>
Reviewed-By: Richard Chapman <rchapman@hpccsystems.com>
richardkchapman and others added 4 commits November 16, 2012 01:05
HPCC-8286 Release example for issue8286

Reviewed-By: Richard Chapman <rchapman@hpccsystems.com>
Occasionally (e.g., for APPLY) the actions in a graph should be
executed SEQUENTIALly, rather than in PARALLEL.  This code adds
an attribute onto the graph to indicate if this is true.
applyaction.ecl is an example that requires it.

Will also require changes in roxie/thor.  See issue #2351.

Signed-off-by: Gavin Halliday <gavin.halliday@lexisnexis.com>
If codegen indicates that a graph should be executed sequentially, do so.
Re-enable the applyaction.ecl test that was (sometimes) failing without
this fix.

Signed-off-by: Richard Chapman <rchapman@hpccsystems.com>
This is required so that roxie/thor can execute the apply actions sequentially,
since they generally execute all sinks in a subgraph in parallel.

Signed-off-by: Gavin Halliday <gavin.halliday@lexisnexis.com>
@ghalliday
Copy link
Member Author

closing this pull request, since this now needs to be targeted to 3.10.x

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants