-
Notifications
You must be signed in to change notification settings - Fork 0
/
uLanguageDefinition2.h
322 lines (268 loc) · 14.7 KB
/
uLanguageDefinition2.h
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
//---------------------------------------------------------------------------
#ifndef uLanguageDefinition2H
#define uLanguageDefinition2H
#include <list>
#include <set>
//#include <vcl.h> //why?
#include "config.h"
#define DRGXTOKENIZER_TEMPLATED
#include "tokenizer/tokenizer.h"
//#include "uStack.h"
using namespace drgxtokenizer;
//---------------------------------------------------------------------------
namespace SHEdit
{
/*!
* LanguageDefinition
* ==================
* Parser is not a good name for this class because what is does is basically just a lexical analysis.
* Parser's task is to parse lines and draw them onto screen. Perser keeps an instance of ParserState at each endline, which specifies in which state parser was at that point. Each time buffer is altered, the component is supposed to call Parser::ParseFromLine function, to tell the Parser to reparse text from given line. Parser then parses code and updates ParserStates of lines it goes over until it reaches a ParserState that is same as the Parser's own. Parser does not parse text right away, but pushes the items into 2 queues (normal queue and priority queue). Parsing is done when Parser::Execute method is called.
* Parser::Execute() first sorts both queues, and then picks the tasks one by one (first from the priority (redraw) queue). Each time it picks a new task, it checkes its screen position to find out whether to paint the line or just parse. In the former case it also initializes formats (reconstructs it from markupStack and searches through the Iterator handled markup. Then it copies the line's parser State into it's own, calls Parser::ParseLine and continues to do so until some conditions are met. On Each line, the queues are checked and corresponding tasks removed. If priority queue is not empty, and parser has gone out of visible area of screen, then current line is pushed onto non-priority queue, and next task from priority queue is picked. When all priority tasks are reparsed, Parser calls Paint, to actually refresh all what has been painted. If non-priority queue is non empty, parser places its callback into Application's OnIdle event handler and parsing of non-priority queue continues later. Such parsing is stopped each PARSEINONEGO (currently 100) lines, to allow Application to get from "Idle event loop". Non-priority queue is parsed until all tasks under a specified Parser::upperbound are parsed. Upperbound is now set to be component's line Iter + 1000 lines. Reason is mainly so that inserting of " followed by another " few seconds later does not throttle CPU on chasing the already reparsed results of such edits.
* Parser::ParseLine() parses line char by char using its LanguageDefinition. Each char is given to the LanguageDefinition with current lexer position, and is returned new lexer position altogether with state. Upon the state depends what is done next - if special state was matched, then it is taken cared of appropriatelly - every time current char is pushed into String Parser::actText, where it waits until some "matched" state is returned, when it is drawn. ParseLine() also checks changes in markup. Parser ensures that no word is broken in half or just a substring is matched as entire word using langdef's IsAl and IsAlNum functions (by standard convention - if you do not want it to behave standartly, then override the 2 functions). Also by default Parser allows skipping of white characters during search for matching entries in the dictionaries. The white-character is defined by IsWhite function of langdef. White character skipping can be turned of directly by a bool in LanguageDefinition.
* 7. LanguageDefinition
* =====================
* Describes a sort of a deterministic PEG/LL(1) grammar. That is - everything is greedy and without any backtracking. Left recursion is not accepted. Duplicit prefixes are accepted and automatically merged, but there's no guarantee that one version wont shadow another.
*
*
* Node ----------- Node --- ... Every node is either of type nonterminal, lambda, tail or terminal.
* nonterm term On a terminal node we just look at the leave index, and jump to the next node...
* | On a nonterminal node we do the same, but on the recursive index and also we push the current
* Node -------- Node --- ... Node onto stack. Every nonterminal node owns a lambda node, which starts the rule.
* lambda \ nonterm Lambda node is just an auxiliary node - practically it is never encountered during parsing.
* \ | x End - after all we dont need this type...
* \ Node --- ... x
* \ lambda Tail recursion works as the nonterm type, but does not preserve the current position.
* \ ... Indexes are constructed to contain the lambda closures.
* \ / That means, that it maps tokens to jump locations (Nodes).
* Node ------- Node --- Node --- Node
* term nonterm term nonterm Every nonterminal node represents a rule of a grammar, and
* SELECT what_clause FROM from_clause thus may be referred to by other rules.
* | | Nonterminal node can be imagined as a graph of references to
* Node --- ... ... other Nonterminals and simple terminal Nodes (which represent
* lambda ... words (keywords, ...) of a grammar)
*
* Whenever there is no possible jump, we just return one level higher. Thus if you want to highlight an unparsed
* code, you can do something like (lowercase denotes nonterminals, uppercase terminals):
* command_block -> command error COMMAND_SEPARATOR
* command -> <your correct grammar>
* error -> EVERY | SINGLE | TERMINAL | OF | YOUR | GRAMMAR <except for the COMMAND_SEPARATOR
*
* Lexical analysis is performed by the tokenizer, which is a member object of this class. The token ids are used to uniquely identify the tokens. The tokenizer accepts basic algebraic regular expressions, always preferres the longest match and highest token id.
*
*
* */
class FontStyle;
//---------------------------------------------------------------------------
class LanguageDefinition
{
public:
class NTree;
private:
WTokenizer tokenizer;
std::locale loc;
int ids;
enum NType {ntTerm, ntNTerm, ntLambda, ntEnd, ntTailrec};
enum TokType {tS, tC, tE}; //string, choice, end?
class Node;
friend class Node;
struct Term
{
bool remember;
int tokid;
bool getstyle;
bool gather;
bool call;
FontStyle* fs;
std::wstring name;
Term(const std::wstring& n, FontStyle* fs_, int i, bool g, bool c) : name(n), fs(fs_), tokid(i), getstyle(g), call(c){} ;
bool Eq(const Term& n){return tokid == n.tokid;};
bool Cq(const Term& n){return tokid == n.tokid && remember == n.remember && getstyle == n.getstyle && call == n.call && fs == n.fs ;};
};
struct NTerm
{
int ruleid;
std::wstring name;
FontStyle* fs;
bool gather;
bool call;
Node* node;
NTerm(const std::wstring& n, FontStyle* f, int i, bool g, bool c);
bool Eq(const NTerm& n){return ruleid == n.ruleid;};
bool Cq(const NTerm& n){return ruleid == n.ruleid && gather == n.gather && call == n.call && fs == n.fs;};
};
struct Rec
{
Term* t;
NTerm* nt;
bool term;
const std::wstring& GetName()const{return term? t->name : nt->name;};
bool operator<(const Rec& r)const{return GetName() < r.GetName();};
Rec(Term* t_) : t(t_), nt(NULL), term(true){};
Rec(NTerm* t_) : nt(t_), t(NULL), term(false){};
Rec() : t(NULL), nt(NULL), term(false){};
bool GatherFlg(){ return term ? t->gather : nt->gather;};
bool CallFlg(){return term ? t->call : nt->call;};
int Id(){ return term ? t->tokid : nt->ruleid;};
};
struct Node
{
NType type;
Rec r;
std::vector<Node*> nextnodes;
std::map<int, Node*> lftidx; //leave index
std::map<int, Node*> recidx; //recursive index (for nonterminals)
Node() : type(NType::ntLambda), r(), nextnodes(), lftidx(), recidx(){};
Node(NType t) : type(t), r(), nextnodes(), lftidx(), recidx(){};
Node(NTerm* t) : type(NType::ntNTerm), r(t), nextnodes(), lftidx(), recidx(){};
Node(Term* t) : type(NType::ntTerm), r(t), nextnodes(), lftidx(), recidx(){};
Node* Add(const Node& n); //returns a pointer to a used node (either a new node, or an already existing done)
bool Eq(const Node& n); //equal (indistinguishable by perser)
bool Cq(const Node& n); //congruent; more power needed here - we should check the rest of the tree too
bool operator==(const Node& n);
void Finalize();
void ExpandLambda(std::map<int, Node*>& index, Node* next);
};
std::set<Node*> nodes; //temporary - for construction (and maybe for destruction?)
std::vector<Term*> terms;
std::vector<NTerm*> nonterms;
std::map<std::wstring, Rec> index;
std::set<std::wstring> symtab;
Node* global;
Node* entering;
std::wstring emptstr;
void ParseString(std::wstring& rule, std::vector<Node*>& endnodes);
TokType GetToken(std::wstring& str, std::wstring& val, bool eat);
TokType GetToken(std::wstring& str, std::wstring& val);
void Construct(std::wstring& rule, std::vector<Node*>& endnodes);
void DestroyTree(NTree* n);
public:
struct NTree
{
typedef std::vector<NTree*> container_type;
container_type childs;
const std::wstring* value;
int id;
NTree(int ident, const std::wstring* v = NULL) : childs(), value(v), id(ident){};
const NTree& operator[](int i) const;
};
struct StackItem
{
Node* ptr;
NTree* tree;
StackItem(Node* n, NTree* t = NULL) : ptr(n), tree(t){};
void AddValue(NTree*);
};
struct PState
{
PState() : st(){};
std::vector<StackItem> st;
};
const static int Auto = -1;
enum flags {tfCall = 1, tfGetStyle = 2, tfGather = 4, tfGlobal = 8, tfCaseSens = 16, tfEntering = 32};
void AddTerm(const std::wstring& name,FontStyle*,const std::wstring& rgx, int id, int flags);
void AddNonTerm(const std::wstring& name, FontStyle* fs, int id, int flags);
void AddNonTerm(const std::wstring& name, FontStyle* fs, const std::wstring& rule, int id, int flags);
void AddRule(const std::wstring& name, std::wstring rule);
virtual void GetStyle(int id, const std::wstring& val, bool& draw, PState & s, FontStyle *& fs){};
virtual void Call(int id, PState & s){};
virtual void EraseState(int identif){};
void Finalize();
template<class IT>
void Parse(IT& from, IT& to, PState&, bool& stylechanged, FontStyle*&fs, bool& draw);
LanguageDefinition(const std::locale& loc = std::locale());
~LanguageDefinition();
private:
void Push(PState& s, Node* ptr);
void Pop(PState& s);
};
//---------------------------------------------------------------------------
template<class IT>
void LanguageDefinition::Parse(IT& from, IT& to, PState& s, bool& stylechanged, FontStyle*&fs, bool& draw)
//the stylechanged refers to the inherited styles; the fs of the terminals is returned with false (even when set)
{
if(s.st.empty())
// s.st.push_back(StackItem(entering->r.nt->node)); //not working (we are not prepared (to encounter Lambda))
s.st.push_back(StackItem(entering));
IT a(from);
int tokid;
tokenizer.NextToken(from, to, tokid);
bool goingup = false; //signalizes to search leaveindexes instead of recindexes
bool retried = false; //signalizes that we do second pass from the entering clause
while(true)
{
const std::map<int, Node*>& ref = s.st.back().ptr->type == ntNTerm && !goingup ? s.st.back().ptr->recidx: s.st.back().ptr->lftidx;
std::map<int, Node*>::const_iterator itr = ref.find(tokid);
bool skip = false; //goto or not goto, that's the question
if(itr == ref.end())
{
if(!goingup && !retried)
{
//if we are here, we are pointing either on the entering nonterminal or on a terminal
itr = global->recidx.find(tokid);
if(itr != global->recidx.end())
{
if(itr->second->type == ntTerm && itr->second->lftidx.empty() && itr->second->recidx.empty())
return;
Push(s, global);
skip = true;
}
}
if(!skip)
{
goingup = true;
stylechanged |= s.st.back().ptr->type == ntNTerm && s.st.back().ptr->r.nt->fs != NULL;
Pop(s);
if(s.st.empty())
{
s.st.push_back(StackItem(entering));
std::wcout << "popped from the stack!" << std::endl;
if(retried)
return;
goingup = false;
retried = true;
continue;
}
else
{
continue;
}
}
}
if(!skip)
{
if(s.st.back().ptr->type == ntNTerm && !goingup)
{
Push(s, s.st.back().ptr);
stylechanged |= s.st.back().ptr->r.nt->fs != NULL;
}
}
if(itr->second->type == ntNTerm)
{
s.st.back().ptr = itr->second;
continue;
}
else
{
s.st.back().ptr = itr->second;
fs = s.st.back().ptr->r.t->fs;
const std::wstring* sym = itr->second->r.t->getstyle || itr->second->r.t->gather ? &*symtab.insert(std::wstring(a,from)).first : NULL;
if(itr->second->r.t->getstyle)
{
FontStyle * tmp = fs;
GetStyle(tokid, *sym, draw, s, fs);
stylechanged |= tmp != fs;
}
if(s.st.back().ptr->r.CallFlg())
Call(tokid, s);
if(itr->second->r.t->gather && s.st.size() > 1 && (s.st.end()-2)->ptr->r.GatherFlg())
{
(s.st.end()-2)->AddValue(new NTree(tokid, sym));
}
return;
}
}
}
}
//---------------------------------------------------------------------------
#endif