Skip to content
Branch: master
Find file Copy path
Find file Copy path
Fetching contributors…
Cannot retrieve contributors at this time
1417 lines (1021 sloc) 87.5 KB
Every DSnode needs a size
The size needs to be the INPUT size, i.e. the size covered by DSnode.istart -> DSnode.iend
DSnode.istart and DSnode.iend need to be maintained by the MachineNode update!
And at each update, we should re-link all the inodes in that range to the node (for LEAFY nodes only! not ORs... ugh.)
Each INode needs TWO fwd links. "Middle" INodes have one link, INodes at the
boundary of two ONodes get two. ONodes like to point to the token that makes
them bad just past a done state.
Insert: INode is peer-linked but no fwd/back links yet. We follow the FIRST of the fwdlinks; update at that ONode, which WAS pointing to our new inode's right. If the ONode's iend changes
No fwd links. Each update starts at the top of the otree. Each node answers: do I care about this inode?
Each parent cares about every inode that its children care about.
Depth-first updates.
There's something different between:
- Create with this inode as the start
- Update with this inode at the start or middle
Hmmmm, there oughtn't be... updates at initial pos should still be able to re-use later stuff that doesn't move...
Rename MachineNode -> Rule. ...check.
Leave child-creation up to the rule, which will interpret/use the rule's children as appropriate.
- that's a Reposition().
It asks the connector to construct the specific child it wants.
- if inode.left: Update(root, inode.left)
- if inode.right and inode.left != inode.right: Update(root, inode.right)
- else: Reposition(root, inode)
- if inode.left: Update(root, inode.left)
- if inode.right and inode.left != inode.right: Update(root, inode.right)
- else if inode.right: Reposition(root, inode.right)
- else: Clear(root)
Connector::Update(node, inode):
- if I don't care for the node: ignore
- node.Update(*this, inode) # will create and Connector::Reposition and/or Connector::Update
# children as necessary, then determine its own state flags
- my care nodes are the union of my children's care nodes
Connector::Reposition(node, inode):
- node.Reposition(*this, inode)
-- even if we go bad, we will always "accept" (care about) this node. Since it's the first one, it's both our begin and "end" iter at worst.
- Connector::Reposition all children
- set flags like any Update
- Connector::Update all children
- set flags
- Connector::Repo first child
- Connector::Repo subsequent state(s) as appropriate
- set overall flags
- Connector::Update children sequentially until one changes its end-care, and that's when we start Connector::Repo-walking subsequent states
- iend points to the node that makes us bad, or to NULL if we're at the end
- we're subscribed to all the nodes in that span including the end that makes us bad
- Reposition, have a single ok node, and we're at the end of input: size is 1, istart is inode, iend is null
- Reposition, have a single bad node, and we're at the end of input: size is 0, istart is inode, iend is null
- we should recurse up and *something* should own the end, otherwise it's a root failure
HOW are we doing updates again?
- depth-ordering?
- when do we "move up"?? Are non-leaf nodes subscribed? or do children add their parents to the "needs update" set while on-the-go?
= Connector's responsibility. IF a node changes state we live-add its parent to the being-updated set.
1. Clear, and calling this at the right times. Does this also Unlisten? What about Destructor?
- Connector::ClearNode, Connector decides to clear the DS/State if it's the root and return to zero input.
- Rules can choose to clear their x DS/State if they want to, e.g. at RepositionNode.
2. Or, Keyword update returns bool indicating if changed
3. Or, Keyword reposition and updates, including Listening and *Unlistening*, and node deletion
4. Listener set updates need to do proper depth-based ordering including adding parents of changed nodes to the order set
Connector::Insert() and Delete() need to return the topmost parent TreeDS that changed.
- NO, we want the FULL set of Insertions/Deletions that were made to the TreeDS. Yes, Inserts and Deletes.
Implement "Tree" recognizer rules. OrRule public getter to give you the
"winning" child. Seq will probably have to give you the "winning" sequence
too. Maybe every Rule has a way to give you its winners, if that makes sense.
Insert<ListDS>() and Delete<ListDS>() populate and return a depth-ordered (really) set of TreeUpdates.
-- Consumer wants to iterate over depths, and each depth get a list of TreeUpdates in the order at which those updates were added to the map.
-- map<int, vector<TreeUpdate> >
-- maybe that's what our listener_set should have been, for the consumer anyway.
Note that before the Insert/Delete calls, inserted nodes were already connected and deleted nodes were disconnected (in the DS, but they keep their outward connections) and alive. So those nodes are still alive and pending deletion which won't happen until the Connector gets back to the inode-maintainer: yes, thanks for the Delete, we're done with it now.
This update-set is passed to the next Connector. (Connector now has Update(UpdateBatch) which just Insert()s and Delete()s each of the updates in depth-order.) Insert<TreeDS>() and Delete<TreeDS>() look very similar to the ListDS counterparts. We start by taking a leaf-inode but we don't have any onode graph root yet, so we do a Reposition(), which builds out much of the tree parent-first, and along the way subscribes some nodes. We keep going through the update batch, all of them are relevant, don't do any pruning and it won't have been worth much to try. Oh, the Lexer root Star was updated? OK sure, we update the Parser-tree root Tree:Star, and not much changes, we're not looking at any children when that happens, just updating state-flags, so it's almost a no-op. No problemo.
There *might* be optimizations in here, but they're not worth trying at this point, this algo is right, and actually pretty decent. What we did with ListDS's already was pretty good about updating the right stuff in the right order.
Forget the Insert/Update/Delete TreeChangeset rubbish.
# vv TODO: maybe necessarily complete? meh!
When any rule updates and is done (not necessarily complete), it has a list of winners. These are its children that are themselves winners. Some leaf winners know that they themselves are winners if they are done. Or has a single winner, Seq and Star have a list.
We only need to Insert() or Delete() these LEAF nodes -- the winners. Insert() looks like this:
- If we're not the first child of our parent, UpdateListeners the parent (regarding us).
- the parent will have to include this guy, and probably change its iend
meaning it's changed, and the changes bubble up.
- parent should Listen() for this child if it accepts it, otherwise, as we
bubble-up someone else had better do it :)
- Otherwise, keep popping up to find the oldest ancestor where we're not the
first child. Reposition the one where this chain is the first child, giving
it the LEAF node.
RepositionNode() always takes a LEAF. Analogous to the ListDS case, we'll push down Repositions (bouncing between Rule::Reposition() and Connector::RepositionNode()) until we get to the leaf of the rule/machine, which presumably accepts the inode, or errors it. Its acceptance starts an upwards Update() chain. Each rule will Update() and just needs to validate its current position along the tree: "I'm an Or<TreeDS>" or "I expect a Tree:Star that has no parent because I'm the root" or "I expect a Tree:Keyword("new") with no children because I'm a leaf" or whatever.
So again, the root of the inode tree (Lexer:Star top) has a LIST of winners. This is an ordered list of leaves!! But they're part of a tree! We onlyt need to Insert() or Delete() those leaves, but we get all the behaviour of recognizing a whole tree from that.
It's not clear this will scale to the ParserAST<->Compiler recognition.... but cross that bridge when we get there. Likely it will :)
Because EVERY change involves a leaf change. Even though we COULD subscribe higher-level nodes to their respective inode counterparts, there WILL be a leaf change for everything, and those changes need to occur and then bubble-up.
Rules need to maintain TreeDS.hotlist properly:
- They do NOT clear it -- that's done by the Connector
- They simply add to it -- add any child's hotlist!
- Preferably just take hotlists from the children that we know actually changed. Others are irrelevant.
- To be clean, clear the child's hotlist once we take from it.
- So if we do that right, we won't need to clear anyone in the Connector; just the root!
Connector maps a List to a Tree. Tree-Leaf nodes contain a ListDS, representing the view that the subsequent Connector will see. Its next/prev pointers are maintained by Updates() across the Tree. Non-Leaf Tree nodes have pointers to the start and end Leaf-List nodes that they use to help maintain these extremities.
We do bubble up a "hot list" of which nodes have been inserted/removed from the list.
Compiler: will know that when, say, the type of an object changes, this bubbles updates up (towards the root). Should be cool :-)
Within a rule: use iconnection to determine our state, set oconnection based on that state (the result of what our ending children tell us).
iconnections should always be maintained for any rule:
istart is the inode at which you were Reposition()ed
iend is the inode that makes you bad, or NULL if you get to end of input first
you should be Listening to istart, iend, and everything in-between
oconnections for leaf nodes can be set permanently:
- oleaf is auto_ptr<IList> with leaf data
- ostart and oend just refer to oleaf
- osize is 1
- hotlist is only hot if we went to done from not-done, or vice-versa (note I said done, not complete).
For non-leaf nodes, oconnections should NOT be set unless we're done:
- oleaf is always NULL
- ostart is our "winning starting child"'s ostart, or NULL if we're not done
- oend is our "winning ending child"'s oend, or NULL if we're not done
- osize is the sum of the osize of all the winning children, or 0 if we're not done
- hotlist is the hotlists of ALL children
Are we dealing with the complete->incomplete transition properly? e.g. deleting olist nodes?? (hotlist)
bad input: newxdelx
causes bad bug to manifest.
But we appear to have Seq working I guess? We could either plow ahead and see what a Compiler would look like.
Or go back and try to fix our latent horrible bugs -- either by brute-force, or by autotesting, or by visualization.
When should Seq set its iend? Should I even be doing special stuff to mark it "finished" ever -- I mean, sure I'm complete and at my end, but I need to eat one more input to determine it's my "bad" iend node. Oh I think I just need to logic it that way -- try to consume a next bad.... wait. no. my last child's x is complete means it already knows the iend that makes it bad, so that's my final iend, end of story. This should be working correctly.
Ah! Listening nodes (leaves of Otree listen to IList inputs, themselves likely but not necessarily from leaves of ITree) should only Listen to the nodes they actually like, i.e. that keep them OK/Done. NOT the IList node that makes them bad. Connector::UpdateListeners() does the job of updating the node to the *left* of an inserted/deleted node, so we don't need the listeners themselves to care about updates to their end; they'll get the appropriate update anyway.
Also, we can make the Connector decide (based on State) how to make each Rule know when to Listen. Yaaaay :)
- hm, not so easy. Connector doesn't know about each IList that the ONode iterates over. Need to Listen(x, inode) but don't know which inodes; Connector just does UpdateNode(x, child[=NULL]). Would need a more structured interface where Connector really controls the iteration that the Rules want to do. Maybe that's ok, maybe not, need to try it.
Let's just fix the Rules for now to listen to the right things.
Oh, and remember that it's only OLeafs that will ever Listen.
Ah, and we ARE only listening to the "good" (accepted, not bad-making) ILists in Keyword and KeywordMeta. OK.
IsEmitting: true the node is ready to provide output (done || complete).
IsHot == IsEmitting ^ wasEmitting;
Let's wrap up Rule things like this, and push shared functionality into the Connector.
Rules should keep their "partial" state in State; rely much more deeply on State.
The way State::Lock/Unlock is implemented, I trust that Rules will *actually* Reposition themselves (err at least update all their state including State.IsLocked and iconnections/oconnections) when we tell them to, and not do a cute "oh I'm already at that position I don't need to repo" check.
--- uhhhhhhh I'm doing this in Connector.cpp::RepositionNode() !!!!
Reposition() starts with iconnection.istart = &inode, iconnection.iend=NULL, and iconnection.size=0
Reposition() and Update() MUST set iconnection.iend and iconnection.size.
Update() starts with iconnection.istart correct, iconnection.iend=NULL, and iconnection.size=0.
NONONONO Update() is using the initial iconnection.iend in order to compare if it changed state, for hotlist.
We need to re-do the hotlist / output list stuff. What is the right way?
- definitely not what we're doing now! Hotlist items aren't sorted properly.
Simplification: Strategies for OList nodes:
1. Amalgomate the child OLists in order (Seq, Star)
2. Pick a winning child (Or)
3. Constructs its own single OList (Keyword, Regex, parser nodes)
We actually DO need to allow an INode to say: Hey, I'm updated! My OData contents are updated! But I was neither inserted nor deleted from the IList!
- easy, just another Hotlist_OP.
Got it:
- Nodes know their "previous" set of OList nodes; which ones are ALIVE (emitting).
- that's whatever's between ostart and oend
- There's a OList helper fw class to help nodes get this right
- OConnector methods
- On Update(), a node decides what set of OList hot changes there are (deletes, inserts). It compares against its "previous" OList:
- Constructs its own single OList? easy: Can choose if the old one should be deleted, or simply updated with the new state.
- OConnector::GetONode(), then OConnector::SetONode() AND EITHER put a Delete+Insert into the Hotlist, or just an Update.
- Amalgomates its child OLists in order? Any child that is newly-ignored (cleared or whatever), we delete all of their previous-alive nodes, and any child that is newly entered into the mix, we link-up its ILists and take its hotlist.
- Use OConnector::ClearChildren() on the previous-alive child. That clears its hotlist, then walks ostart..oend and adds a Hotlist Delete for each.
- Make sure we take that hotlist immediately! or at least AFTER we do that to the children.
- Any newly added child, link up the ILists (one child's oconnector oend's right matches the next child's oconnector ostart's left), and take its hotlist.
- Call this OConnector::LinkChildONodes() which also sets our ostart and oend to the first child's ostart and last child's oend (no reason to check if each actually changed first rly)
- Make sure to handle special case: we don't have any ostart set yet
- Then take the hotlist(s)
- Pick a winning child? If we're changing winners, then delete all the prev-ILists from the old winner-child, and insert the OLists of the new winner-child. If we're not changing winners, just take the hotlist of the winner child.
- OConnector::ClearChildren() on the child that lost
- OConnector::LinkChildONodes() on the child that won
- Then take the hotlist(s)
Let's try using an API that does what we need, then finish implementing OConnector to meet that.
First, the amalgoms: Star and Seq
Then, the winner: Or
Finally, the leaves: Keyword and Regexp
- nah, keyword/regexp first :)
Wait, hold up...
For single-ONodes (Keyword, Regexp): The OData is all the "State" you need, let's give it methods:
- KeywordData holds nothing.
- RegexpData holds the amount of matched text; don't care if we're Done or Complete or w/e.
The OData has *methods* for changing its data. For Regexp this means when you RegexpData::SetMatched("abc"), internally it checks if abc == whatever it had before. If it's different, then it knows it *would*
When the State is Done or Complete, then
Getting there.
KeywordData holds nothing. RegexpData holds the matched text.
If you change a Data then its isUpdated flag gets set, necessarily.
It gets unset by the Connector after everything has yielded, the whole tree, so any updates will need to get queued up for Flipping later.
When a parent asks the Data for its info, it tells it its Data IF it is in Done or Complete state, and nothing otherwise. If it's emitting then it will also pass along the "updated" flag, which is only useful if "updated" makes it to the top of the tree but that path was not also newly inserted or ignored along the way.
A composite node needs to know which children were winners before, and which ones are winners now. The ONLY thing that matters here is tracking the winner children. Any children that WERE winners but are NO LONGER winners, we'll enqueue all their PREVIOUS data for removal. Similarly, for any children that are NOW winners (but were not before), we'll enqueue their set of NEW data. (We do NOT take any "deletions" from those NEWLY-winner children.) For any children that were previously winners and are still winners, we incorporate all of their changes (delete anything they say we should delete, add anything they say we should add, continue to notice "just updated" children somehow...)
Note that tree Updates come in depth-ordered, so that should halp.
OConnection construction either takes a starting ONode, or it never gets one.
- If it has an ONode, it can be updated but we need to tell the OConnection about the update
- If it doesn't, then it instead will get updates which are a set of children that we are deferring to instead.
- The OConnection will link up the children for its context
- You tell it which children are deleted and which are added
- But how does it know which "neighbours" to link that child's OConnection list against?
- Because the child nodes are in a list! And if it's the type of node that links its children, presumably they should be ordered in the child ordering! :)
- The child interface kinda sucks so we might need to supply: here are my left and right child *s for any child that is updated lol
- Or the FWTree needs left/right pointers in all its children, that's ok..........
- And we just tell the OConnection: This child is now ignored. This child is now relevant.
- The OConnection can figure out which children are still relevant, and pull their hotlist updates if any.
Now just need OConnection methods to:
* Remove all a child's previous onodes (ignore state and hotlist)
* Accept all a child's actual onodes (ignore hotlist, but only Done/Complete states should be counted here, might get that for free)
- Starting at a specific child, walk along accepting it and all subsequent children, which means taking their hotlists and linking their onodes.
- if any are not Done|Complete, error out, we shouldn't be doing that.... though we *could* just skip them.
- we could do this "just start at the first child and walk across them only taking the done|complete ones" strategy, except that's def slower than what we're doing instead :)
- THIS SEEMS BROKEN. Two things:
1. What if we receive one child update, that brings us bad... then we get another child update, and it makes us good?
- NOT a concern! Silly bear, the children both update before we get their updates.
- YES a concern. We'll receive BOTH those updates from the children as separate Update events, even though we have the same child state from each.
- OConnection will track set<emitting>, set<was_emitting, ostart, oend, auto_ptr<onode>, and hotlist. yup!
2. What happens when we go bad, then? Presumably we still need to tell our parent about our updates (e.g. Del Me Plz!)? Are we managing this State properly?? I don't think I implemented any state checks in OConnection yet...
- The parent takes whatever it likes! This includes recognizing that we used to matter, and have now gone bad.
We'll also have to enqueue all the OConnector updates with the Connector so that it can clear them out between runs. Otherwise specifically the "walking along" updates could pull the same hotlist (with "updates" not just insertions/deletions) twice.
We NEED to deal with the Third Way (Connector): When we delete the only inode, and thus Clear Out the whole Tree.
Seq and Star are in disrepair; both have interesting insights but are both mid-hack.
When does the OConnector incorporate the State of a node? Emitting()ness matters when?????
- think: Keyword, Regexp. The *parent* needs to know when to take the child, and that's based on whether or not the child is Emitting.
Update() should not indicate the child that caused the update. We shouldn't update a node twice. Update all its children that were updated, then Update the parent a single time. This will be faster in the end, less code and complexity, and make it simpler to implement the ONode updates.
- The gain we had wanted was especially that the Lexer's Or should not have to loop every child every character-change. But we're doing that now *anyway*. We'd make much better gains by special-caseing that one node to receive, in a single Update() call, the whole list of children that updated that round, but *even* then, every new token means that *every* child will have updated.. n'est-ce pas?
- NO, I don't think it was for the Lexer's Or... I think it was for Seq's, because it seemed silly not to, perhaps. And then we carried it too far or something?
Let's do that simplification now. It's nearly impossible to implement Or's DeclareWinner() stuff right.
There are a few different OConnection strategies, as we know:
- ONode: we have a node that's always emit. We might flag updates to it, including restarting it.
- Or: There's a winning child. Maybe we should know what that actual *child* is, and we just take from them every time, and delete whatever was winning before had. The winning child tracks its m_wasEmitting.... ya, that's ok.
- Seq/Star: There's a sequence of winning children which starts at our first child and continues through all of our children up to some point that we define. We'll define the end child. DeclareLastChild() should be all it takes.
- The question is whether to INSERT or APPROVE a child. Insert means, we take all its m_emitting nodes and put them in our hotlist as Inserts, i.e. we were NOT emitting this child before. APPROVE means we just take the child's hotlist, because we were already emitting it.
Hmm. So every node is either emitting an ONode, or emitting a single child, or emitting a vector of children. We just wanted to avoid the top node from having to walk that whole tree, by doing all these little merges. Maybe we can just leave that as an optimization for later...
- But we also need to know which items are NEW vs. UPDATED.
- OrState is tracking the LastWinner, I think that's fine.
- SeqState, then, should track LastWinners, a *set*. That way when we DeclareLastChild(), we can just ask: each child, did we already have it.
- That should give us proper "Merging", no delayed walking work for the root node.
- Note that this is definitely slower than the Seq/Star, which is already walking its children, adding the ones it likes onto the list. It's doing more work. But it's kinda simpler. Optimize later.
- Now I'm not sure of any of these changes.
Connector still needs to Flip the updated OConnections. It's doing that correct, right?
FWTree: Tree.
- Has State which knows its Rule, and stores info relating to an instance of Rule processing; set (not read) by the Rule to provide information to parent Rules.
- IConnection
- OConnection
IList: List. Owns some data, and also now knows the FWTree* that owns it (could be NULL -- input char list). That's only for display purposes.
- Has its owning Rule and FWTree node. That's redundant; the Tree node('s State) has its Rule.
- Only wants the Tree node so it can walk the input list.
- Wants the Rule so that IF it's an OConnection that owns a single ONode, it can create the ONode.
- Instead, we clearly have different types of OConnection. Let's parameterize them somehow, or at least subclass maybe!!
OConnectionSingle: owns a single ONode that it constructs. Only receives updates. Up to the Rule to tell us when the node is updated.
OConnectionWinner: Starts with a NULL FWTree* winner. Is given a winner; we notice if it's different than the one before (in which case we Delete the old's old and Insert the new winner's new), or if it's the same (in which case, we fill out our hotlist with the winner's hotlist).
OConnectionSequence: We're given either StartChild(FWTree& child), UpdateNext(FWTree& child), InsertNext(FWTree& child), or NoChildren(). Each time we put ourselves into good m_ostart / m_oend state.
I can kill Hotlist's m_inserted and m_deleted (yay).
One problem: Nodes below an Or can over-run. e.g. keyword OR regexp, allow the regexp to win for many characters, then cause the keyword to win instead. The regex will still be listening to all those future characters! should probably clear it out.
More serious problem perhaps: Let's make sure that when a node goes bad, we don't clear out the state of future nodes (e.g. in a Seq). We want to keep those with their IConnection intact. If the bad node goes good again, we might not have to re-update many of those later nodes.
We want the following operations on Rule:
- Hush the hotlist (ignored for output)
- Prepend a "text" node to our OList (i.e. whenever we're Emitting, start with a "fake" IList that contains text for the next machine to recognize)
- Append a "text" node at the end of our OList
The Compiler's root node will need to contain the Scope globalScope. Nodes will need to inherit some stuff from their parents, e.g. parent scopes.
Compiler doesn't have a super tough job. It receives stuff like:
Its goal is to transform this into bytecode. So is the bytecode our OData? How do we override what it should be?
It needs to reorder expressions, do Scope lookups, type-check, etc.
The actual Compiler Machine tree will look a lot like the Parser's. But we can ditch all the nodes we set as Ignored For Output in the parser, and attach extra activities in the nodes which will produce the bytecode that we (automatically somehow!?) emit as our OList ONodes.
Parser emits:
[cmd blah blah blah]
[(new (type int) (exp (plus 12 17)))]
Compiler constructs tree:
[ Cmd | New ]
| |
cmd... new (type? ..) (exp? ..)
Each Rule provides a default OutputDataType (enum value) and OutputStrategyType (enum value).
- you can provide an override for the default, in the Rule constructor.
OutputStrategy (was: OConnection) has an Update() function that is called by Connector::UpdateNode() after the Rule::Update().
Each FWTree owns an auto_ptr<IList> which might be NULL. The OutputStrategy can only look at it, in its Update(), to decide how to update its info (m_emitting, m_ostart, m_oend) and Hotlist.
The FWTree owns an IList to hold its OData if it indeed has any. An IList MUST have an OData. Either the FWTree has that IList or it doesn't.
The OutputStrategy never owns the ILists, it just points to them. So even an OutputStrategySingle just refers to a single FWTree's IList.... but that means it can easily also get a PrependNode() and AppendNode(), or arbitrarily many of each.
Silent strategy: NULL data
Single strategy: must have data
Winner strategy: data is ignored
Sequence strategy: data is ignored
. Remove Data. Every IList has a Name and a Value. For now, they're both strings, but eventually the Name will be replaced by an Enum value (the "Token IDs" of that stage of the compiler process.)
. Meta currently looks at each IList's Owner's Rule Name. Forget that. Instead it will switch on the IList's Name. It could even check the Value if we want.
Rule has a default Strategy, which you can override.
Leaves are looking at their INodes to determine what to output.
Non-leaves are looking at their children to determine what to output.
Leaves: we want different ways to ignore or accept the INodes.
Non-leaves: we want to "surround" the children with synthetic IList nodes, or go silent, or w/e.
Different strategies for listening to the children.
FWTree does not have its ONode IList. The OutputStategy might have a "main" ONode IList (Single strategy) with varying parameters. It might have start and/or end "caps" with varying parameters (e.g. give each their own IList.Name).
A Rule::Update() might signal to its OutputStrategy: UpdateValue("foo"). The Strategy might care, or it might ignore it, or it might throw an exception ("Rules of type x do not support strategies of type y").
OutputStrategy should know if it takes updates or not. When UpdateValue() is called, if it takes updates and the provided value != current value, take it and add a Hotlist update. Otherwise ignore it. Only the Single Strategy takes updates, and that's separate from the SilentStrategy I guess.
A Star goes bad in the middle, after it was emitting lots of later nodes, what happens to the OutputStrategy of those later nodes? Am I Deleting them out of my hotlist? Note that the only relevant case is where a middle node goes Bad but my whole Star is still in IsEmitting. That's totally a thing that can happen.
OutputStrategySingle defines bool m_wasEmitting, which overrides OutputStrategy's emitting_set m_wasEmitting. Oops.
The bool thing should NOT be necessary. WHY does OutputStrategySingle think it should only put its Insert in the Hotlist when it changes from not emitting to emitting? That's wrong. The parent should detect when it's actually using this child for the first time, and Insert its Emitting() set at that time. The Keyword doesn't need to know when to Insert at all, really.
-- ah, it's because the parent doesn't even get an update if the child doesn't "change".
Here, a keyword:"new" went from 'ne' to 'new'. Its hotlist doesn't change. Its iend doesn't change (NULL -> NULL). Maybe that iend should really just point to the last ichar we care about. We can still be listening to the thing past the iend, just consider iend the last valid INode that we span.
Oops. I made: iend always points to the last INode of interest; never NULL.
PROBLEM: 'ne' to 'new', yay I get the update to the parent. But now I insert make the input: 'new;'. The ; did not change "new"'s iend, so we don't update its parent (the Sequence that has the ; after).
NEW ATTEMPT: iend is always the last INode of interest; either the node that makes us bad, or the last node of interest (if we're at the end of input). Then watching to see if iend changes should handle every case:
- 'ne' to 'new': yep, update, iend changed from 'e' to 'w'.
- 'new' to 'new;': yep, update, iend changed from 'w' to ';'.
- 'new;' to 'new': yep, update, iend changed from ';' to 'w'.
istart and iend should never be NULL. Interesting.... Let's enforce that :)
- We don't need IConnection::Clear().
- When we create a FWTree, it should do the first RepositionNode() immediately, so IConnection can init with a start node.
- After that it takes a Restart(inode), so it's always reasonable. The IEnd might be dumb if it goes bad, but even then, we can make up a reasonable meaning for that and keep it consistent.
- Rename Reposition() to Restart(). We can maybe use the OutputStrategy-style pattern for repositions; seems like only a few possible things could happen here, and while you /can/ implement your own, you'll typically want the Rule default.
- Eventually: One Rule, with different Restart, Update, and Output strategies for them. Cool!
So let's sort that out. Then implement an OS_SILENT.
Then caps for Sequence/Winner strategies (better: for strategies in general?).
Then we can implement our Compiler :)
Between parser_35 and parser_36, why doesn't the "del stmt" get a hotlist from the del and ;? If it does, why does the Or above it (or the * above that?) not get a hotlist?
Something isn't right... what?
Ah. Need to Meta on either the name, or value, or both, of a node.
Also want an easy way to say: output this string (value) from this node.
Regardless of what type of node it is.
- done: OutputFunc_Single.
Rules need to be recursive...... doh!
- Our cute l'il rule trees are not trees; they have cycles.
- That should be easy to fix; just let Rule hold its children by * instead of pointer_vec, and give them all bools: own or no? if owned, delete that child on destructor.
- When we make rules, we'll provide optional bool: recursive. which sets 'owned'.
ditch hotlist resets, just give Connector a hotlist. Insert, Delete, and UpdateListeners too should probably return a hotlist. Insert and Delete need to be wrappers, because UpdateHotlist() needs to insert/delete without clearing the hotlist, so we can accumulate all the updates to give back to main.cpp.
Finally we have a good example of this premonition:
One problem: Nodes below an Or can over-run. e.g. keyword OR regexp, allow the regexp to win for many characters, then cause the keyword to win instead. The regex will still be listening to all those future characters! should probably clear it out.
More serious problem perhaps: Let's make sure that when a node goes bad, we don't clear out the state of future nodes (e.g. in a Seq). We want to keep those with their IConnection intact. If the bad node goes good again, we might not have to re-update many of those later nodes.
We can see this happening now. A Seq updates, and one of its middle nodes in fact is only OK.
- Fixed!
- We still have the Regexp-Overrun problem of course.
Connector will have an OListPool. OutputFunc_Basic and OutputFunc_IValues will insert to the pool. Maybe the pool provides convenience methods to link up list nodes. It has a ptr_set<IList> active, and ptr_set<IList> unlinked. When Cleanup() is called, the unlinked nodes are deleted. Whoever's using the Connector should call Cleanup() after it's done with any hotlist it gets returned from Insert()/Delete()/UpdateWithHotlist() calls.
CHANGESET: Once we have ListPool, we do NOT need OutputFunc::ClearChild(). Get rid of that. But keep the cute l'il change where we re-ordered some stuff in OutputFunc_Sequence().
What is linked to their own last, vs. the "complete" thing that makes them bad? Need to clear out state properly when things go bad and/or when they're actually cleared/deleted.
We absolutely need to pool the STree nodes, not just the OList nodes. The reason to pool the STrees: Say I'm a Seq/Star and I'm clearing out all the nodes past my last-bad-node. I do want to remove them. Removing them means removing all their children as well. Now my OutputFunc needs to know abotu them. Either I can tell it right now during the Compute() (ugh) which children are going away, and let it update some internal state about "oh look here's everything I need to know about them, let's record it somehow before they disappear for reals by the time we actually call our outputFunc()", OR I need the outputFunc() to be able to refer to these cleared nodes (figure out for itself which children are gone and so we need to Delete from the Hotlist), which means the STree lifetimes need to last long enough.
Now that we Pool'd the ONodes, we have every ability to do it with the STrees as well. In fact, we might not even need to pool ONodes if we're pooling STrees. LOL. We can go back to OutputFuncs owning their output IList nodes... but the whole STree will stick around long enough for us to use them for any purposes we could have wanted! STree node is Cleared, it unlinks itself from the Connector's pool. b00m.
When we reposition a node, remove it from listening to its inodes; they've been replaced.
at 26-27, we approved "new", and inserted ";". now eventually, we'll wrongly say that ";" was not emitting, so we don't bother deleting it. That's wrong. Why do we do that...
between parser_18->lexer_28 is where we do a SEQ REPORT and show that our Or:done child was emitting 0 items. hmmm.
- that's its WAS EMITTING. We should be checking its EMITTING instead!!
well maybe. Typically we'd use DeleteChild() to say, let's remove whatever it HAD told us to emit.
So confused about the difference right now...
The IDEA of ClearNode() screws with the emitting/wasEmitting distinction.
Node computes necessarily come bottom-up. Even if you Restart a node, the Compute()s are done bottom-up.
After an OutputFunc(), a node has set its Emitting() from scratch for the set<INodes> it's currently emitting. Its wasEmitting is the set that it was emitting "last time", i.e. last time that OutputFunc() was called. That is the ONLY statefulness in the OutputFuncs(). Maybe we should get rid of it, and allow parents to track which nodes it actually cared to receive from each child. That seems way easier and less sketchy.
We were also relying on the idea that whether you InsertChild() or AcceptChild() a child, in both cases you're taking its FULL SET of Emitting() nodes. Only DeleteChild() cared about the WasEmitting(). Instead let's just get rid of WasEmitting(). The parent will keep a map<Child, set<IList>> which are the nodes it actually took from each child. Then each node is only maintaining its OWN STATE. w00!
How do we ClearNode() a child then? Well, the parent will know how to compute itself properly, taking anything it wants from "live" children, and Deleting all the nodes that it HAD emit (from the map<child, set<ilist>>) but aren't current.
At lexer_38 I eliminate my whole olist. Then at lexer_44 I add the whole thing back. My parser is screwing up (IMO) when I tell it to Insert the node "new" again even though it already has it. Shouldn't I only have to tell it about the del which we've inserted at the end?
This is all just with straight-up input: "new;del" no funniness.
Two things:
1. Cleaning out STreePool causes horrific crash. Why? What is still using the items that I think we should be deleting?
2. Ignore that clear out pool. Insert: new; Then go to the start and insert: del;. We're screwing up the olist somehow. Like the left/right pointers of our lexer's OList are all wrong.
- except log.log's window0 olist looks right. Just our graph looks wrong. hmmm...
- That's our input, silly! Window1 is our lexer's output which is wrong.
- it's unlinking the hotlist that's causing the problem. It screws it up.
adding that "w" at the end of "new" triggers baaaadcrash. Why?
- because our list is messed up again. hmmm...
- we fail *drawing* the parser.
bug repro:
The problem is once again, that we're trying to draw a list that doesn't terminate when it ought. This time I think we're exposing a flaw in the previous "solution". We thought it was ok to band-aid OutputFunc_Sequence() to say, if it noticed that it was deleting the node to the right of its newly-determined OEnd(), then it could just wipe out the oend->right. But that's not sufficient. We need to clear the pointers to any node that we're CLEARING, at the time that we decide to clear it. Just do that, it will solve things.
- might require some care to determine when and how to do that.
Ah! In Seq/Star, when we decide to ClearNode()s, we can just set some stuff right there. The last "sane" child's OEnd()->right = NULL, and (before that) if our laast node's OEnd()->right exists, then set ITS left to NULL.
We don't need to do anything else. No bubbling down work at all.
OK, still have a problem. I'm wiping out those connections in the ComputeFunc(), but it needs to happen in the OutputFunc(). In the Compute() I determine who my last "real" child is. However, the OutputFunc will update all the connections.......hmm I see how the current form is ugly but ultimately it should be doing the right thing.... so I don't know where the problem is.
- That problem seems to be solved now. Let's look at why Moderate fails, when new/del keywords and ID regexp are both present.
We now have a bunch of hacks in place to make Seqs/Stars keel alive and continue to emit from children beyond one that goes bad. This seems to have no effect.
For Star, given input "newdel" and then insert a semi between new and del... we COULD say: hey, I COULD reposition my second child, but instead let me try to make a child in-between to handle whatever there is in between them. I think that's what I want, but I'm not sure it's general enough. I suppose we'd also want to just delete THAT child if its INodes all disappear, too.
But what about Seq? It can't do any of that. If you screw up the sequence, it's all ogre. That's probably okay.
Seems like there IS desire to implement the Star new-nodes-in-the-middle coolness.
Of course, if we want lots of cool error checking, we can just put lots of low-priority possible-sequences everywhere to catch lots of strange possible entries. That's probably what most compiler error handling looks like. Ideally it wouldn't be necessary...
What is the philosophy of OutputFunc()?
- give Connector a hotlist of what insertions, deletions, updates have happened
- Hotlist nodes need to be "connected" correctly (left/right pointers of main list are correct)
- OutputFunc() does not care about ITS state. It might care about its children's IsEmitting().
- It should correctly describe what it's doing.
- We can't know if we were emitting for real. BUT we do know that, if we weren't emitting into the final list, and then we ARE taken, we'll automatically be Inserting everything. AND if we were not emitting for real, then all our stuff will be deleted.
- So we might record state and refer to it (is this our first time outputting, or what nodes were we emitting before) in order to... hmm is that right?
- If we say we're Updating a node that was not Inserted.. that can't happen, it WILL have been inserted when our node was chosen.
- If we say we're Inserting a node that's already in the emitted list... that's bad. How can that happen? it can't!
- When we we're not emitting (to the final list), we're not in the final list, so it doesn't matter what our OutputFunc() does.
- When we're transitioning to being accepted, as long as our Emitting() list is correct, we're good, it should all be inserted by the parent.
- Same with transitioning to not being accepted, it's up to the parent to Delete all the nodes that we WERE emitting (requires care in Winner and Sequence)
- If we were emitting to the final list, and we are still, then we need to be Accepted: Insert new nodes, Update existing nodes, Delete now-gone nodes. THIS is the scenario where we might refer to our previous computation. So storing state in an OutputFunc(), we don't need to worry about other cases. This is the situation of the parents making the calls "oh Insert this child because he's new, Delete that child because he's gone" described in the two points above.
Sequence Compute() might have left us with multiple bad children, OutputFunc() don't care, it don't care if we're emitting or not, we just chain up all the neighbouring children's outputs, and emit everyone we have.
/// Not sure about this:
//When a Sequence Compute() restarts a child, it should tell its previous child: You cannot listen to this INode! (or implicitly, anything to its right.) Different nodes will have to do this differently. Call this SetLastINode(). Winner will have to tell all its children;
Current bug:
insert: new^H^H^Hd
We call Connector->GetFirstONode(). Fails. Why? Because OutputFunc didn't know to clear its olist yet, it hasn't been called.
The basic conflict is that OutputFunc_Sequence links nodes when they're added, and doesn't think it needs to know about nodes that are removed (it DOES break the winner-sequence's OStart()->left and OEnd()->right).. but here we never called it and so it's giving us bad data.
- Can we just call OutputFunc() after the Restart? like always? this will clear it out.
- It WILL, after we DRAW THE GRAPH first. Are we failing when we draw the graph? I think we fail after it if we don't output the graph, right?
Yes, if we don't draw the graph, we fail later.
Um, yeah. We run a Connector::UpdateWithHotlist(), and then ask for its GetFirstONode(), and it gives us junk. So a full Compute() should have happened, including OutputFunc(), right?
- So we add 'd' to Lexer. It takes it, and comes back with a Hotlist of "DELETE:new".
- Then we pass this to the Parser. THAT'S where we mess up.... somewhere.
- Parser takes DELETE:new. First thing it does, is walks the root node's OutputFunc().Emitting(), and from the Connector hotlist, deletes everything that it had ever emit before. Kinda sketch...
- Then it tells the root to clear itself, but it replies: I was already cleared! (really?)
- So we report: Delete done, here's a hotlist of size 2.
- AH! So we had cleared the node, and never ran its OutputFunc() so never reset its OStart()/OEnd(). Broken.
- We can just run an OutputFunc() there after we've cleared all our state; that should be one way to clear out its OStart()/OEnd()/etc. :)
Ah one more thing
Star is working now..... for the Insert case. But if you Delete a child in-between others... how does it know not to *insert* another attempt in-between instead of deleting it out?
Solution: Not great but ok for now. If we find an incomplete middle element, before constructing something new right there, walk across all the remaining children and see if their IConnection CONTAINS the INode. If one matches (does contain the INode), then eliminate all the nodes in-between, and if necessary, restart the guy.
Right meowz: we're only looking at the in-between-nodes' IStart(), instead of doing a CONTAINS check. This might be what's causing the Parser to crash.
Test case: new;del;
new;del; when you type "del" again it crashes.
The crash is in STAR:ComputeFunc. Last thing in the log is investigating child nodes in Star...
OK tried the proposal above. That seemed to... help? Not crashing now I think. But still seeing weirdness.
Test case: new;del;
Maybe solved now by even more complexity/awkwardness in Star. Now the problems are more fringe, can't find a solid testcase though. Is the parser really updating properly?
- We're leaving junk at the end of a Star. Of course, that was "by design"...
2 things:
1. Star should be correct in its final decided State. We're not good if someone's bad.
2. There might only be 1 thing. Subsequent states... should be emit........ but maybe the Lexer should know not to feed them to the Parser? huh. idk.... yeah, like just go ahead, but why are we *growing*?? That was a bug somehow.
Most recent hack is the KeepAll RestartFunc. Stop using that for Star, and then commit.
We're not handling StartOfInput properly. Like at all. Stars are NOT properly able to handle arbitrary child insertions/deletions... they get mixed up INode orderings across children. We need a real solution to that. Hopefully we can just rely on the IStart()/IEnd() boundary of each child.
Nope, let's do it the obnoxious way. Walk across each child's IList, adding each node to a set<IList>. If we find two children whose boundaries don't line up, then keep looking across the rest for the prev's IEnd(). If we find it then clear up to that point, and continue. If we don't, insert a node, and then keep going.
Do we ever clear trailing stuff? Sure. If we go Complete() at some node, and the next node isn't accepted (i.e. we've determined our IEnd()), then in the Compute() we can ........ hmm, it's not clear that we want to be doing this. But let's try it!
... clear out the nodes and let the OutputFunc() do away with their ONodes.
Restart(): implement Restart_KeepAny(). Restarting with an istart, we walk across all our childrens' ILists to try and find it. If it exists somewhere, delete all the children up to that point, and reposition that node if necessary. If it doesn't exist anywhere, then fallback to Restart_FirstChildOfNode().
No, we don't fallback to Restart_FirstChildOfNode(). If we don't find any of our children wants this new istart, then: is the new node BEFORE them all, or AFTER? We can't know. Well maybe if Restart() took a flag hint saying if it's before or after, maaybe we'd know how to populate that correctly.
If the new istart is before all the nodes, then we kinda want to keep them.
If it's after all the nodes, we should blow them away. Otherwise our INodes are no longer in order (probably an algorithm problem), and we might never notice that, our computes will all just be sluggish if they can work at all.
We COULD special-case the root node. By giving it a different Restart function! Little stars throughout can wipe themselves for restarts, I care less. The root: if it's a Restart, we KNOW if it's from an Insert or a Delete at the start of the IList. So we can tell it that in the restart.
Non-root: Restart is caused by a Star or Seq. When we restart a child, just blow it away. See how far that gets us.
ROOT restart: we can tell if we're caused by a left insert or delete! Just check the first child, if it exists and its IStart()->right is the new istart, then we deleted, otherwise assume we inserted :D
- This makes us DEPEND on the idea that we're receiving character-by-character at the start of input. OK for now.
Can we fix OutputFunc first? And do the restarts properly afterwards.
- done. RunStairs: Well, except OutputFunc is a little too optimistic in how it tries to avoid doing too much IList() walking. We're only looking at the node prior to a break to see if it contains the next node's IStart(). However we might have swallowed the next node's IStart() by an earlier child. Not sure of an answer.... but maybe we can somehow delay caring about this next node, until such time as we've actually created a node behind it that looks just like it? That might fall out of what we have now anyway! Or maybe we're left with trailing garbage we can't get rid of.
bug: new;del;
parser. urrrrk everything is wrong..
Are we in a dangerous state when an INode is deleted? When is it REALLY gone? Because we're leaving nodes around that could point to it I think..
INodes: Flag the ones that are being deleted.
Leaf nodes: Need to check which INodes are being deleted. That includes end nodes! Deleted nodes will have "valid" left/rights. So you can get the nodes we should respect. You can get the new IEnd that way, but if your IStart changes, Clear the node!
Non-leaf nodes: Need to check if any of your children have been cleared!! If they've been cleared, better avoid 'em. If all your children are cleared, then you're cleared I guess. If your first child is cleared (and you're a Seq/Star), then declare yourself Cleared as well. Root shouldn't have to worry, Connector deals with that (it doesn't go through UpdateListeners()).
hm if any of an Or's children are clear, uhhh just clear it and wait for further instruction. lol
Changing the interface.
Connector receives a SINGLE Insert, Delete, or Update to the IList.
It finds the relevant listening nodes, and tells them what happened: IList changed, I/D/U.
The node does its Compute() which is aware of what the change was.
This Compute() may enqueue one or more actions onto other nodes. It tells the Connector: Restart child x at IList y. Or update my Parent that I've accepted INode q, or deleted INode r. One at a time. We're just enqueueing those, the Connector will determine who to update next. Might even tell a Parent: Clear me, Delete me, I'm useless.
What about OutputFunc()s?
- get compute-specific type of data from the node (e.g. winner)
- but these are generally working.
- OutputFunc determines an OutputState. OutputState is the set of ONodes, or set of children, that a node is emitting.
- We can compare two OutputStates to get the Hotlist differential between them. For any child, that means doing the same thing recursing down the tree.
- The Connector will store the OutputState of all relevant nodes, so we have the "previous" to compare against. This lets us tell which nodes are newly considered or not.
- So this way the OutputFunc() itself is stateless, and doesn't even need to be called when we're updating the tree.
- When we're done a full tree update (Connector::I/D/U), we ask the root to give us its OutputDiff. We know which of its children have actually changed (we know that from the I/D/U that occurred). For any of those we ask for their OutputDiff. Eventually we find nodes .....
Recursive. Each node gives us an ordered list of things to do:
- Insert(ONode)
- Insert(child)
- Approve(ONode)
- Approve(child) -- recurse down,
- Delete(ONode)
- Delete(child)
Note: let's see if we don't need to look at a deleted inode's left and right!!
that would be nice. We already know which nodes are listening to it, so...
just update those nodes, we tell them that it's being deleted, and that should
be it!
Connector::Insert(const INode& inode) {
if (m_root->IsClear()) {
Enqueue(RestartAction(m_root, inode));
// Note: maybe enqueueing a Restart automatically sets the node's IStart (before), and runs a Compute (after)?
// Or whatever Connector/STree does already for a Restart!
// - it's STree that does it, it does a few things we'll want to do
} else {
listener_set listeners = m_listeners.GetListeners(inode);
for (listener_iter i = listeners.begin(); i != listeners.end(); ++i) {
Enqueue(INodeInsertAction(inode, *i));
Connector::Delete(const INode& inode) {
if (m_root->IsClear()) {
throw SError("Cannot delete INode; root is already clear");
} else {
listener_set listeners = m_listeners.GetListeners(inode);
for (listener_iter i = listeners.begin(); i != listeners.end(); ++i) {
Enqueue(INodeDeleteAction(inode, *i));
Keyword::Restart(const ConnectorAction& action) {
Clear(); // Wasn't even necessary before.. but let's do it here now, for simple.
Keyword::Compute(const ConnectorAction& action) {
// IStart is set appropriate (precondition)
// The incoming action will be about an INode we may or may not care about.
// We DO care about the INode that makes us bad.
// We need to set our State, set our IEnd, Listen to INodes,
// and allow our OutputFunc() to determine which ONodes to emit.
Star gets update from a child. Does it get them all at once?
No, just a single update, does its full update, then gets the next.
OK, but in the Star update it's just gonna decide to reposition one or two of the children, so then their update was unnecessary.
That's an optimization which we CAN solve later, and don't need to. We might recompute it multiple times saying "x child changed" when it didn't (i.e. it already updated in a way that dealt with or oviated that claim). No problem for now.
- IF there's no left or right, just clear the root (optimization) :)
- Tell listeners of the deleted INode
- Tell listeners of the node preceding the deleted INode
- Tell listeners to the left
- WHY did we used to tell listeners to the right? Let's try not doing that for now :)
- IF THERE'S NO LEFT: just Restart the root. Restart() should know how to intelligently handle restarts. And if a Seq/Star wants to actually really restart a child node, it can Clear and then Restart it.
ConnectorActions can all be a single non-polymorphic struct, use the enum.
walk through Restarts, then Computefuncs, bringing everybody up to speed.
It is now the node's RestartFunc's responsibility to connector.UnlistenAll(*this), IF that's what it wants to do!!!
ProcessActions() is confused. Someone's going to be Enqueueing while we're still processing an action, and deleting it, from the priority queue.
When we Restart() a node, we want to also insert it for Compute(). I want EVERYTHING to go through the queue.
- Solved now. It just pops.
We can cleverly avoid computing children of a Seq/Star needlessly. Suppose we have multiple children that received an update. We figure out which one is the first-est of the children (this is tricky), and update that one only, then Compute the parent, and if it does a Clear and/or Restart of a child that would otherwise have Computed, we do the Clear/Restart/Compute instead. This is an optimization we can do later, it doesn't save us that much and is quite a bit tricky.
Sequence Restart:
- If we have any children, then we're putting the provided IStart as preceding all of them.
- If we're clear, then create the first child by starting it at this IStart.
- When we go to restart a child, Clear it first and then Restart it, unless we want to provide it a node preceding the nodes it already has.
- isClear = false
- IConnection.Restart(inode)
- restartFunc()
RestartFunc must:
- Set state as appropriate (no! Ideally, let the Compute() do this!!)
- Create/Restart/Update children as appropriate (err, Enqueue those requests)
- UnlistenAll() and/or listen/unlisten to inodes as appropriate
- Enqueue its own Compute when appropriate (this is a guess; maybe depends on the rule?)
Seq/Star Restart/Compute, once creating a child, will abort their routine early to let the child compute happen, which will then flip back to them.
Or, however, can tell ALL of its children to do something (e.g. Restart) before giving control back to the Connector (to update the children all before calling the Or's Compute).
Added IConnection.Size(). ComputeFunc() needs to determine if, in terms of IConnection, it grew or shrank. Tell this to the parent if we changed, so that it can do something intelligent.
Initial Insert:
- Connector::Insert()
- MakeRootNode()
- Enqueue(Restart, m_root, inode)
- Connector::ProcessActions()
- (root)->RestartNode(inode)
- STree::RestartNode()
- isClear = false
- IConnection.Restart(inode)
- restartFunc()
- Rule::RestartFunc()
- Implement for everyone!
- Default RestartFunc:
- assert: Rule has no children
- assert: Node has no children
- UnlistenAll()
- Enqueue its own Compute()
- Rule::Compute(inode, initiator)
- Implement for everyone!
- Keyword: probably fine as-is!
Tracking IConnection size kinda sucks. But we want to tell a Seq/Star: did the child shrink or grow. Do we? Is that really necessary? It would be cool if we can know that. And we CAN know that. Still.....
The assumption we were going to make is that knowing if an INode was Inserted or Deleted would be helpful way up the chain. Not so much. Inserting an INode could cause a node to shrink in terms of IConnection size. Then up the chain, its Seq shrinks on that child, so it needs to know that. Cool, I think we're doing the right thing by tracking IConnection size.
Now it's time to program the rule ComputeFuncs() for simple Rules. We need to fix all ComputeFunc()s, and also the RestartFunc()s of Or/Seq/Star.
To deal with complexity, it would be nice if each node had 4 functions:
- Restart
- Process (structure)
- Compute (state)
- Output
Or at least, conceptually arranged itself like that. We're back to keeping state in the node, about its processing (its # children and what states they're in generally, to help us determine what state we're in with only looking at the child that notified us of an update). That state can be cleanly wrapped up as going from Process to Compute.
The ProcessState should have an "unknown" state which it can use when it knows it's not done figuring itself out. Just because. That will help us render things correctly in graphs at least.
Or's ProcessState keeps those four vecs, and maps each child into which one it's in.
- actually, they don't need to be vecs at all. Make them sets. And we lookup which one our updated child is in, and make sure it's only in the right one(s).
Seq/Star: Their ProcessState might just be "what's the last child node that is complete"; not exactly sure though.
Let's try to structure the Computes to do the right thing, and then see what reasonable "ProcessState" abstraction might exist, if any.
It's sooooooo sketchy that we don't enqueue the MakeNodes(). The separation of Process vs. Compute is REQUIRED now, and that the Compute works ok even though other later computes will be happening as soon as we say MakeNode..... uhhyyyyy.
- so just abort that compute early! we only need it to Process, not Compute in this case.
Now fix OutputFuncs:
- Each node determines its OutputState, which is the set of nodes and/or children that it wants to emit
- These aren't stored by the node/OutputFunc, they're stored by the Connector:
map: SNode -> [Set of ONodes and/or other SNodes]
- At the end of an Insert/Delete/Update (or whenever we want, e.g. after a full UpdateWithHotlist), we have the ONode list we previously emit. And we know which SNodes we've actually looked at as part of the I/D/U. Recursively from the root:
- if node is not in the list of SNodes we looked at, abort.
- diff the ONodes it's emitting against the list of previously-emit-ONodes-for-that-SNode that the Connector has.
- likewise for child SNodes: Delete any that were prev emit and aren't anymore, Insert any new ones, and recurse down the Updates.
Removed Connector's m_hotlist. It doesn't need that. Individual OutputFuncs don't have hotlists anymore either.
Call Connector::Insert(), Delete(), Update(), or UpdateWithHotlist(). It will do all the processing. You can call them multiple times if you like.
Sometime, call Connector::ExtractOutput(root, UPDATE). This will:
- set m_prevOutputByNode = m_outputByNode;
- DetermineOutputRecursive(m_prevOutputPerNode, root, UPDATE);
- recurse down the tree, only looking at nodes that are in m_touchedNodes (including the root -- no-op if it was not touched).
-- unless we're in INSERT or DELETE mode, in which case we don't care if it was touched or not.
- Call OutputFunc() on every node we examine. This updates m_outputByNode for that node.
- for ONode changes, insert/delete the hotlist as appropriate. What about UPDATE???? seems like only IValues() does this. Might need to put something weird into the OutputState...
- for STree changes, decide how to approach that child node. If it was removed, then we call a different function that will walk down all the m_prevOutputByNode(that top deleted node), deleting everything we find and recursing down children, note that we do NOT call OutputFuncs for these.
- If it was inserted, then we again call a magical function to walk down those nodes, this time on the m_outputByNode (not prev), inserting everything we find. Except this time we DO call OutputFunc(). So really it's just this function with an INSERT instead of UPDATE param.
- If a child was just updated, then keep recursing down the normal way.
- Now we can tell which ONodes changed, and which SNode children changed too. If an SNode child is newly added, we recurse down it with INSERT instead of UPDATE flag, meaning all nodes under that subtree will be inserted into the hotlist. Similarly for DELETE.
- set m_needsCleanup = true
In Connector::ComputeOutput(), I had to copy the OutputState into newState early, and not at the end of the fx when I insert into the map. WHY??? That's pretty weird. OutputFuncs aren't supposed to be doing a whole lot, so how could they have invalidated our state? Or is it because of the recursive calls? Or is a node getting cleared? We crash just with "new;" input if we copy newState late instead of early.
Why the awkward spurious graphs during computes? Maybe let's pick better times to draw graphs, and also determine that we're doing the right sequence of computes/restarts across the board.
Definitely some bugs with OutputFunc. We're not linking up ONodes properly (we get a second ";" inserted which doesn't have a left anymore apparently I guess? so we just restart the parser for no reason...)
Updates are weird because: first compute on keyword new, does not change its state (the node starts ok and is ok after its first update). Since it doesn't change, we don't notify the parent to recompute. Only the del and ; will cause it to compute. That's sketchy.
Not sure what's the right way to do this. Maybe we put isClear into State instead of STree. Clear is a State (station), and we start there even if a node starts as done instead of ok. Because it should still need a Compute() to get to that. I guess? Seems weird.
In Connector::ExtractHotlist() we ComputeOutput() which will have called m_root->GetOutputFunc(). So why did this not set an OStart?? (Can tell because the DrawGraph should have drawn the OList/hotlist in that case, but it didn't)
- Or is not calling Basic output func at right time it seems??
OutputFunc::operator() needs to populate the OutputState: the OList nodes and STree children that would be output by that node. The Connector will recurse down as appropriate to call the OutputFunc() on those children. Then ConnectONodes() gets called, which can just link/set OList node starts/ends.
Stop Connector::Delete()'s new fancy attempts to just outright clear stuff if inode.right is null. actually, don't do that. We might not even need to update the inode's directly listeners anyway (just inode left... n'est-ce pas? in what situations otherwise?). Instead, the Seq or Star up above should notice when one node's IEnd().right doesn't exist, and clear everything after that point just because. I thought we had that? Was there something wrong with that?
- nurr, NO, I'm uncomfortable with leaving nodes with bad IStarts. Let's just clear those nodes k..
IConnection has a tentative start.
Restart sets the tentative start (IConnection::Restart() does that). Computes should call ConfirmStart() IF they're in a not-bad state.
IStart/IEnd must only be set by nodes that are in a not-bad state; going bad should wipe it out.
- IConnection Clear: means no IStart/IEnd. But it could have a Restart.
- call Restart to set a restart start, and clear the start/end. You can't ask for start/end, but you can ask for its TentativeStart().
- In STree::ComputeNode(), if it is clear but has a TentativeStart, sets the start/end/size=1. Then calls Compute().
- If it does not have a TentativeStart nor existing start/end, then it's cleared, and the ComputeFunc() is not called.
- warn in this case, because I don't think we call ComputeNode() on cleared nodes, but we could if we want to.
- ComputeFunc(): Has a start necessarily. Sets an End if it finds a good one. Don't need to call SetEnd() or anything if it goes Bad.
- ComputeNode(): After ComputeFunc(), if the state is Bad, then Clear the IConnection. Just outright Clear it, not even a Tentative anymore. I think.
Bad nodes need to know their start at least. Because otherwise, on this sequence:
there's a del node that's bad with "d;" but now it's "de;" and it needs to know it started at d and is trying to match something.
Pending state is probably a good idea.
- Pending means: I'm waiting for my children, before I can compute my state.
- A parent should NEVER see a child in Pending state, it means the child is borked.
- So then what about when a Child gets cleared? then it should be delibrately in a Cleared state.
- Nodes can start as Cleared, then Restart moves them from Cleared to Pending!!
ComputeFunc: CLEAR semantics regarding clearing of children etc. plz
- and if multiple children are updated, we don't know which order they'll make us Compute. That's worrisome.
Initiator child was cleared. Star and Seq should have different behaviours to deal with that I think.
Sanity check: When we unlink an INode, warn about all the nodes that refer to it, and ERROR if we do a full UpdateWithHotlist() and still have anything point to an unlinked INode
CURIOUS: Connector::Delete(), checks IF the deleted INode is a listener's IStart, then it Clears the node. Otherwise (if it's not the listener's IStart), it just sends the listener an IDelete message.
- I'm not sure this is consistent with anything else, but it makes sense.
STree clear means: the node is totally cleared out. No children, no state info, jamais.
IConnection clear means: I have no IStart or IEnd. Restart sets both. IConnection::Clear() will unset them.
State Clear is not a thing. You should have checked the STree node if it was clear before looking into its state.
State Pending means: My state is waiting on a computation that my children must be performing. A parent should NOT see this in a child; that's an error.
We could share: EVERY node's Compute, when it discovers that its first child is clear, clears itself. We could wrap that out if we're sure it's universal.
- NO. That's not what we want from a top-level Lexer *. We delete the first character, it deletes the first node, then the Star says: I don't care, I start at the new list start now kthx.
- Is that actually how we want things to be handled when it's not the start of the whole IList?
RISK: when Computing a Seq/Star, we look at an Initiator node, and possibly its prev/next, and assume that those are in an OK shape, including their IStarts and IEnds. That's a bit aggressive.
OK, we're not in terrible shape.
- eliminate RestartFunc. I think it's better, simpler, cleaner.
- Determine all the scenarios for a Star.
- would be nice if a node's children's updates coming in, had a deterministic ordering
- Are we still ok telling Star/Seq just that a child grew/shrank? Do we also want to know if it was an INode Insert/Delete that caused it? Do we want to just have INode-order-comparable children?
Yay we're in good shape!
Node types: Keyword, Regexp, Meta, Or, Star, Seq
Any node receives:
- Restart
- RestartBehind (param: amount)
- RestartAhead (param: amount)
Leaf nodes receive:
- INodeInsert
These update the root and/or leaves (listeners):
- Root is clear: Restart (top-down)
- INode has left, no right: INodeInsert on the left's listeners
- INode has right, no left: RestartPrepend the root (top-down, arbitrarily)
- INode has left and right: INodeInsert on listeners of both left and right
Intermediate nodes receive:
- Child update, no size change:
- Child grows:
- Child shrinks:
nurr, start/restart:
Connector decides to Start a node, by enqueueing a Start action
STree::Start initializes some stuff, and enqueues a Restart action, with size of 0
Restart actions are just a ComputeFunc(). The node should check what children it has, if any, and correct them, maybe particularly if action==Restart and resize==0 (meaning we're newly Started).
Nodes might Clear themselves in response to INode updates. STree::ClearNode calls Connector::ClearNode which auto-enqueues a parent update. A node cannot be Started unless it is Clear, so that leaves the possibility of re-Starting a node (e.g. the root). All good.
Seq/Star nodes may Restart children. They can just enqueue a Restart with the appropriate size, which shouldn't be 0.
Star ComputeFunc:
1. If we have no children, that should mean it's a Restart action with resize==0. MakeNode first child.
2. Restart with resize<0: Prepend a new node that starts at the provided inode.
3. Restart with resize>0: Kill nodes up until the node that contains the mentioned size. Restart that node with an appropriate resize amount.
4. ChildUpdate: if resize==0, no worries.
if resize<0, child shrank, so create a new child after it.
if resize>0, child grew, so kill nodes up until the node that contains the initiator's new IEnd, and restart it with an appropriate resize amount.
Real bugs. Real hacks. Let's go back to the drawing board with the new Seq/Star actions.
Ah! IConnection has ISize. But ConnectorAction resize amount is: How much did a node's IEnd change location? That's an entirely different number. And all nodes need to report it accurately.
Keyword/Regexp/Meta: Need to report this number aware of the INode Insert/Delete/Update they received.
- wrong: need to think this reeally through
- if we didn't have a winner, then our resize-export is the winner's ISize.
- if we changed winner, then our resize-export is the size of the new winner minus the size of the old winner, + their . (But what if it grew from an Insert? uhy...)
Gets multiple updates for multiple children... so how to report this number accurately? Doesn't each node update multiple times, so then..... what do we do?
Suppose we get a batch of INode inserts/deletes. We need to be careful about which one(s) we're inserting into the OrderList at a time, since we're relying on their neighbours to ascertain their position in the tree.
- the tree knows the old ordering. the node knows the new ordering. trust the tree. REALLY?
- YES. An UpdateHotlist has nodes that are newly-connected; any INode's left or right might be another new INode that we haven't Inserted yet. They won't point to newly-deleted nodes; those will be conspicuously absent.
problem is not OutputList. On this input:
it fails because the Star's IStart is still "e" for some weird reason. "ew" are still floating around, and "d" is not yet our IStart. why?
- no, Star's IStart is not still "e". We're just drawing the list starting at "e" because that's the INode in question when we draw the OutputList, and while the root has no IStart.
- The picture looks funny AGAIN when "d" is inserted, for the same reason!
Bad nodes have IStarts/IEnds. They're not supposed to? But how are we supposed to clear them? They don't Listen either, so they won't necessarily get the update when their INodes are deleted.
1. Decide who needs to receive updates. I think there's a distinction between "strong listening" and "weak listening". Weak listening: a Complete node's IEnd, or a Bad node's IEnd. If a node is inserted or deleted to the right of that INode, we don't care.
- is that right? isn't it just, if an INode's left listener is bad or complete, don't update it with an INodeInsert?
- When a node is inserted in-place, tell the STree node that it can Unlisten from the subsequent nodes it doesn't care about anymore.
So we're updating too many nodes. Not a huge crime.
2. The "top node" may have different output behaviour than intermediary nodes.
This has to do with what the consumer wants: do they care to know that our output is "bad", and which parts of it are questionable? Presumably they do.
Other work:
3. Tests
Root isn't outputting
IncParser::ExtractBatch()... can we clean that up
- IncParser has m_outputPerNode, a map<const STree*, OutputState>.
- ComputeOutput_Update (_Insert, _Delete) destructively changes m_outputPerNode -- just the parts with nodes we've cared to look at this round. We'll leave a bunch of OutputStates alone in that "tree".
Batches: Satisfy the contract please. Going into the batch, need each item to pretend like it exists against the list that actually would exist at that point.
- Coming out of the batch, see if we can UpdateWithBatch() (rename to ApplyBatch()) one-by-one, then.
I think we can eliminate ConnectONodes(), and just do the "connecting" implicitly when we add things to the batch.
OutputState has a set of onodes and a set of children. Replace this with a vector of either-onode-or-child.
Batch should probably have Insert-pos on each element (or at least the Inserts).
INode Insert(const List& inode, INode pos)
- clones+owns the inode, and returns the INode handle of the clone
- wires up our clone as necessary to InsertNode()
void Delete(INode inode)
- wires up the INode (we already own it) as necessary to DeleteNode()
void Update(INode inode, const string& value)
ApplyBatch(const Batch&)
- Let's let these be the raw nodes that we don't own, coming out of an IncParser
- but... then we can't apply it.
Design 1: BatchAdaptor
- who owns it? Either the output IncParser, or the application, or the input IncParser
- It knows const List* of output, and maps those to the INodes of input (so it has to supervise the Apply or at least Inserts/Deletes on the inbound).
When we output a Batch, it's: here are the incremental updates, using their own addresses.
When we actually Insert a node, it gets a new address.
Maybe an OutputBatch and an InputBatch are different objects!
OutputBatch we have. Input Batch: you have to transform it by replacing nodes, doing a mapping as we apply the updates.
OK, we can do that. In ApplyBatch we maintain that table. Uuuuugh, kinda sucks. Also we'll do it for Inserts too. When you say "Insert this node" and we clone it, we'll also keep a map from that inserted node to our version.
But then why have our own version at all? Memory ownership. murrrr....
We don't need to give back the INode at all, then. We'll just make our internal clone, but maintain the mapping ourselves. That's.... ok!
Because I didn't like those INodes anyway.
Why do nodes output things they've never output before, when they turn bad. Shouldn't we just preserve things they were already emitting, but not output things that are new, if we're bad.
Let's mark each OutputItem with the state of its owning STree node regarding that item: good, ok, bad, etc. That way the OutputFunc itself can say: ah, this item, it WAS ok but now it's bad, we'll mark it as such. Or: we've never emit it before and it's bad, so we won't emit it now.
This might get expensive, and the consumer might not care. Oh well, let's give it a shot.
Order the State::Stations: Pending, Bad, OK, Done, Complete. The OItem's owning STree node slaps on the station it attributes to it. Then going up the tree, we might overrule the item's station (on the Child instead of the ONode) with the min of the two.
We seem to be in a hilarious situation:
- compute CO_Update no matter what the worst_station is
- abort from CO_Insert before computing OutputFunc() if the worst_station is bad enough (including that node)
Both are right and both are wrong !?!?
after the Compute pass, there should be no concerns about lingering memory. We own all our Output nodes.
- uhhhh but we delete things, so make sure we tell people if we've deleted some output node. nurrrr
- maybe we stay in the ownership pool *until* we tell people about it? but that leaves the problem, it will just stay in the pool forever.
- are Removes really more "important" than adds? if a thing goes bad, we should still traverse down into it but just removing anything it wants removed??????
Yes, Removes are stronger than Adds/Updates. There are two ways to go about it:
1. Traverse down the whole tree, and add/update are the only ones affected by worst_station
2. When we clear out a node, push ownership of its output nodes up to the parent. ugh, "indefinitely in the future" this.
-- uhh I think that's what the code is doing!? so let's investigate.
How to deal with "overdones" (err, when we have extra stuff at end of input that isn't listened to by anything?) Well, the Root Node should listen directly. If the Root Node's actual real one child isn't handling it, then the root node takes the rest of the INodes in, and so long as there are any, it's Bad. If they get cleared out, then its state is the main child's state.
- but we're not using RootNodes in any of the tests. Sooooo... don't test a grammar going past the end?
- we NEED to test Root node though, because I bet its behaviour is wrong; its code has an awkward "this is wrong" message!! And I bet it doesn't handle going way past end of input with nonsense characters properly..... and what happens if we get into that state just by changing some stuff in an otherwise-decent node?? humph.
Ambiguous Star:
grammar: a*
a # done at a, or complete at (nil)
aa # done at aa, or complete at a
OutputFunc needs to respond to specifically the parts of the Compute that changed.
- Reason: (cat, _d_, car). We just want to say "cool a new bad node in between cat and car. don't eject car.".
- But this means OutputFunc is not stateless, or rather, its "state" is the children that actually changed.
- That's pretty happy for Or too. Maybe just "what children decided they're
relevant" is what we want here, a set of that.
- but if a child of a Seq/Star changed, then all the subsequent children might have changed.
- ah but if that child has changed and is bad, leave the subsequent output nodes.
- but if the child changed and is ok........ well, we should still have the subsequent nodes in our output, right? so keep them?
I'm not sure I'm comfortable with any of this. Because what if it's Complete with only a subset of its children?
Let's write more test cases first.
I need a good way to express test cases for this whole system.
Note: We're determining *state* correctly. Everyone knows if they're pending/bad/ok/done/complete just fine, at least I think it's fine so far.
The only problem is who outputs when.
Known problem case: Star last testcase, a -> aa -> a. Maybe this is only because we aren't using a real Root.
But we need to review all that logic. ugh
I think there's something different about being bad and owning the end of input, vs. being bad in the middle. At the end, more tolerant to keeping bad stuff in the output list.
Star test cases:
_ done
b complete
bc error: should not go over-complete (how is this handled?)
_ done
b complete
_ done
a done, emit ins:1:a
ab complete
a done
_ done, emit del:1:a
a done, emit ins:1:a
ba can this happen? I don't think so??
_ done
a done, emit ins:1:a
ab complete
aab complete, emit ins:2:a
ab complete, emit del:1:a (say we del'd the first one)
b complete, emit del:1:a
_ done
a done, emit ins:1:a
aa done, emit ins:1:a (we inserted at the beginning; CAN we do that?)
aaa done, emit ins:3:a
aaba complete, emit del:3:a (unless we're at the end??)
aaa done, emit ins:3:a (if we removed it above??)
aba ?
abaa ??
a a
a ?
and now tests where we just change the state of children (rather than inserting/removing them):
_ done
a:ok ok
a:bad complete
a:done done, ins:1:a a
a:bad complete, del:1:a _
a:complete done, ins:1:a a
a:co a:co done, ins:2:a aa
a:bad a:co done, del:1:a, del:2:a _
a:ok a:co ok
a:done a:co ??? error, should not happen
dgcat del dog cat
gcat emits cat (why?)
and more importantly, it leaves the root ("Splash") with IStart at the dead node, "d".
The reason is that this is the case where we've deleted the start node of a Star (that had multiple nodes), so it moves its IStart forward, but that "restart from the bottom" does not propagate upwards.
Let's make that an event that can propagate upward: hey I really want to start here now.
- shouldn't that just be a "hey Restart me please"? Is "Restart" sufficient, or do we need to StartNode?
- Let's review the whole Start/Restart logic, I'm not sure our separation there is good enough
- e.g. not clear what should happen when this Star-that-moved-itself-forward is a child of another Star/Seq.
- done by IncParser::Insert on the Root, if the IList is empty
- done by Rule::MakeNode to start up an STree child node
- performed by STree::StartNode which just does IConnection.Restart(istart) and then enqueues a Restart
- setting the IStart here is just cosmetic, i.e. it's for DrawGraph. The Restart will also set the IStart.
Root test cases: Root -- a*
More Star test cases
Splash test cases
Lexer should be perfect.
- Then try a C lexer!
Problem case:
new;del; 117
new;dnel; 118-130
new;dneel; 131-134
new;dnewel; 135-136
new;denewel; 137-143
new;delnewel; 144-184
new;del;newel; 185-212
new;del;new;el; 213-
it dies there: the lexer is emitting the latter 'new;del;' but the pos of the
"new" is a node that we previously deleted, it seems.
- what is that pos? Is it the ""-Basic "silent" ONode of the ; from the first del; ?
It's the del that was emit in the beginning, then destroyed by bonking in the "new".
The "del" Or goes bad and deletes its ONode. Then it gets fixed later and emits the del ONode again.
- Note that that's the same ONode that's emit! I think. But the listener creates its own recipient node for it? And that becomes a different thing maybe?
- not quite...
- I think we're just not updating the (cached vertically) ONode list.
After we delete 0x1508818, the next time we use it is as a behind_node for an or:complete.
AH! When we set behind_node, it's processing "lexer" (the Star), and then its first Or. That Or sets its OEnd first to the OEnd of "new", then the OEnd of ";", then the OEnd of "del". Even though those ONodes aren't even being emit!
That strategy works only for Seq/Star, but not for Or nodes. Not sure how anything worked rly.
Only set behind_node after the latest ONode we've found. We could compare all those ONodes, if we have an OrderList for them (do we?). Or we can think of something more legit....
- Get last ONode from an STree node
- Get the last OutputItem from an STree node
- which is semantically last, not just last in the list! wait... no..... hmmmm
- semantically last STree node, then actual last OutputItem
First/Last ActiveChild? (winner, last of seq/star) ugh
nonono. that's not what's happening at all.
We were emitting just "new ;"
Then when we change to:
We have: Root -> lexer(*) -> (or:new) (or:;) (or:del) (or:new) (..bad..)
So the lexer(*) changes from emitting 2 to 4 things
But its parent, the root, is still bad, so we don't actually output the "del new".
new x;del y; new WS ID ; del WS ID ;
new xdel y; new WS ID WS ID ;
segfault. did we double-remove ';' for no reason or smthg?
why do we delete EIGHT things when the user removes the first semi?
- those are from the PARSER. In response to the LEXER giving us 5 deletes, 3 inserts.
lexer_300: emits ID y
lexer_337: emits terminal ;
lexer_392: emits del(semi1) del(semi2) del(del) WS(@new) del ID del SEMI
OP_DELETE 0x216b3d8 semi1
OP_DELETE 0x2171058 semi2
OP_DELETE 0x216d768 del
OP_INSERT 0x216bf18 (List WS:) @ 0x21682c8 (List ID:)
OP_DELETE 0x2177748
OP_INSERT 0x2177eb8 (List ID:) @ 0x216bf18 (List WS:)
OP_DELETE 0x2178728
OP_INSERT 0x216f988 (List ;:) @ 0x2177eb8 (List ID:)
Parser looks bad in several ways. Let's try without that wonky new* in its middle, although that can't be the entire problem.
at parser_181, we're deleting "del" INode. But some things don't tell their parent to update right when I would expect...
new x = 5 then backspace. get invalid pos for insert (pos was recently deleted).
kousu case:
Need a Star to say: if a child doesn't advance anywhere along the input, then we're bad! No holds barred.... do not go past a child that goes nowhere.
1. Display: keep bad nodes?
- ParserWindow.cpp: implement Update
- I'm definitely corrupting values somehow...
2. Nick case, at least understand what we should do
You can’t perform that action at this time.