Permanent syntax tree representation in background #800

Open
codemanyak opened this issue Jan 20, 2020 · 1 comment

Comments

@codemanyak
Collaborator

codemanyak commented Jan 20, 2020

A consistent syntax tree representation of all element texts (or at least of the expressions they contain) ought to be provided.
This would allow us to concentrate on a clean and consistent syntax analysis and also to improve performance. Parts of it are ready but need more elaboration, and it must be tested with regard to memory and time complexity. It would avoid the repeated tokenizations and concatenations, as well as the frantic search for the point where certain matches and replacements should ideally take place without spoiling what has been transformed before or will have to be transformed afterwards.
If this eventually leads to a central point in all generators where built-in functions are handled, it will be a big achievement and allow the requested clear documentation. (At least for a while...)
Some of the benefits would be:

  • Built-in functions and procedures as well as operators can be identified unambiguously, which would improve Executor performance and generator functionality (e.g. addressing demands like those in BASH-Code-Generator - improvement #237).
  • As the syntax trees are to be derived (or updated) only when an element is changed, or on the first Analyser inspection, execution, or code generation (a lazy initialization approach), syntax analysis time is expected to drop dramatically (similar to the diagram drawing speed-up already achieved by permanently caching the highlighting patterns).
  • Instead of the current guesswork labyrinth there would be a structurally consistent and transparent operation.
  • It might be much easier to add new generators.
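
The lazy initialization mentioned in the second point could look roughly like the following sketch. It is only an illustration under assumptions: `CachedLine`, its methods and the `parser` function are hypothetical names and not part of the Structorizer code base, and the parse counter exists only to make the caching visible.

```java
import java.util.function.Function;

/**
 * Hypothetical holder for one line of element text whose syntax tree is
 * derived lazily and cached until the text changes. Not Structorizer's API.
 */
final class CachedLine<T> {
    private final Function<String, T> parser; // stand-in for a real line parser
    private String text;
    private T tree;          // cached result, meaningful only while valid
    private boolean valid;   // false until the first request and after each edit
    private int parseCount;  // counts actual parser runs, for illustration only

    CachedLine(String text, Function<String, T> parser) {
        this.text = text;
        this.parser = parser;
    }

    /** Changing the text drops the cached tree; nothing is parsed yet. */
    void setText(String newText) {
        if (!newText.equals(text)) {
            text = newText;
            tree = null;
            valid = false;
        }
    }

    /** Parses on first use (Analyser, Executor or generator), then reuses the result. */
    T getTree() {
        if (!valid) {
            tree = parser.apply(text);
            valid = true;
            parseCount++;
        }
        return tree;
    }

    int getParseCount() {
        return parseCount;
    }
}
```

With this shape, repeated Analyser or generator passes over an unchanged element would cost one parse in total. The sketch does not cover invalidation caused by changes in other elements.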

There are of course several challenges, too:

  • Nassi-Shneiderman diagrams are by design meant to be syntax-free, so there will always be texts that cannot (or only partially) be converted into syntax trees of some kind. (What to do here? Handle them as completely untranslatable text or try a partial analysis? Maintain the old complex and unstructured analysis approaches in parallel for such cases?)
  • Structorizer in particular supports several syntactical flavours, especially regarding explicit and implicit declarations. Shall we reflect them in the syntax trees or canonicalize them on this occasion? (The latter might cause irritating Analyser reports and Executor or generator error messages.)
  • How to deal with the influence of context on type inference and so on? (The declaration or initialization of a variable already used in some existing element text might be inserted later, which may e.g. change the meaning of some operator symbol.) How to identify such impacts and their scope?
  • Is it better to represent an entire text line (including commands, separating keywords, and declaration stuff) or just the expressions strewn into a line?

Originally posted by @codemanyak in #462 (comment)

This also relates to several internal issues.

@codemanyak codemanyak self-assigned this Jan 20, 2020
@codemanyak codemanyak added this to the Release 3.31 milestone Jan 20, 2020
codemanyak added a commit to codemanyak/Structorizer.Desktop that referenced this issue Apr 14, 2020
codemanyak added a commit to codemanyak/Structorizer.Desktop that referenced this issue Oct 20, 2020
codemanyak added a commit to codemanyak/Structorizer.Desktop that referenced this issue Oct 30, 2020
codemanyak added a commit to codemanyak/Structorizer.Desktop that referenced this issue Nov 1, 2020
codemanyak added a commit to codemanyak/Structorizer.Desktop that referenced this issue Nov 1, 2020
codemanyak added a commit to codemanyak/Structorizer.Desktop that referenced this issue Nov 1, 2020
codemanyak added a commit to codemanyak/Structorizer.Desktop that referenced this issue Nov 3, 2020
codemanyak added a commit to codemanyak/Structorizer.Desktop that referenced this issue Dec 10, 2020
@codemanyak codemanyak modified the milestones: Release 3.31, Release 3.32 Mar 1, 2021
@codemanyak codemanyak modified the milestones: Release 3.32, Release 3.33 Sep 19, 2021
codemanyak added a commit to codemanyak/Structorizer.Desktop that referenced this issue Dec 5, 2021
codemanyak added a commit to codemanyak/Structorizer.Desktop that referenced this issue Dec 7, 2021
codemanyak added a commit to codemanyak/Structorizer.Desktop that referenced this issue Dec 13, 2021
…tegrated in Line, first steps to type retrieval in Root
codemanyak added a commit to codemanyak/Structorizer.Desktop that referenced this issue Dec 13, 2021
codemanyak added a commit to codemanyak/Structorizer.Desktop that referenced this issue Dec 22, 2021
codemanyak added a commit to codemanyak/Structorizer.Desktop that referenced this issue Jan 4, 2022
codemanyak added a commit to codemanyak/Structorizer.Desktop that referenced this issue May 23, 2022
codemanyak added a commit to codemanyak/Structorizer.Desktop that referenced this issue Nov 8, 2023
@codemanyak
Collaborator Author

codemanyak commented Nov 10, 2023

Remark: Possibly it was not the best idea to hold parsed lines permanently on the elements, in particular since not every element text can be parsed, and even small modifications in other elements or diagrams can invalidate a syntax tree derived from an earlier diagram state. A very reasonable compromise seems to be to store the element text as lexically split token lists in which the whitespace isn't mixed in among the tokens but managed separately. This makes a lot of to-and-fro conversions superfluous, preserves the original spacing without affecting token indices, and accelerates parsing a lot. On this occasion, the user-configurable "key phrases" (parser preferences), which may consist of several lexical tokens (like jusqu'à), can be represented by fixed internal tokens. This makes refactoring obsolete, since the internal key will always be the same; the user-specified keywords need to be shown only on display and editing. A parser preference modification will then only require a drawing refresh (as with controller routine aliases). The tasks of Executor, Analyser and code generators will be facilitated a lot: they can make use of (ephemeral) syntax trees where convenient and may concentrate on the expressions embedded in the element text lines.
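
A minimal sketch of such a token list with separately managed whitespace, purely for illustration: the class `TokenLine` and its methods are hypothetical, and the lexer is deliberately naive (runs of letters, digits and underscores form one token, every other non-blank character is a token of its own, so `<-` comes out as two tokens here, which a real lexer would combine).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical token list that keeps whitespace apart from the tokens,
 * so token indices do not depend on spacing. Not Structorizer's API.
 */
final class TokenLine {
    private final List<String> tokens = new ArrayList<>();
    // padding.get(i) is the whitespace that preceded tokens.get(i)
    private final List<String> padding = new ArrayList<>();
    private String trailing = ""; // whitespace after the last token

    TokenLine(String text) {
        final int n = text.length();
        int i = 0;
        while (i < n) {
            int start = i;
            while (i < n && Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            String ws = text.substring(start, i);
            if (i == n) {
                trailing = ws;
                break;
            }
            start = i;
            if (isWordChar(text.charAt(i))) {
                while (i < n && isWordChar(text.charAt(i))) {
                    i++;
                }
            } else {
                i++; // naive: any other character is a single-character token
            }
            padding.add(ws);
            tokens.add(text.substring(start, i));
        }
    }

    private static boolean isWordChar(char c) {
        return Character.isLetterOrDigit(c) || c == '_';
    }

    /** Tokens only; identical for differently spaced but otherwise equal lines. */
    List<String> getTokens() {
        return Collections.unmodifiableList(tokens);
    }

    /** Restores the original text including the original spacing. */
    String getText() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < tokens.size(); i++) {
            sb.append(padding.get(i)).append(tokens.get(i));
        }
        return sb.append(trailing).toString();
    }
}
```

In this sketch `new TokenLine("  x  <-  foo(1) ")` and `new TokenLine("x<-foo(1)")` yield the same token list, while `getText()` still returns each original string unchanged. Multi-token key phrases and their fixed internal keys are not modelled here.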

codemanyak added a commit to codemanyak/Structorizer.Desktop that referenced this issue Nov 14, 2023
codemanyak added a commit to codemanyak/Structorizer.Desktop that referenced this issue Dec 18, 2023
codemanyak added a commit to codemanyak/Structorizer.Desktop that referenced this issue Feb 27, 2024
codemanyak added a commit to codemanyak/Structorizer.Desktop that referenced this issue Apr 21, 2024