## Installing the necessary dependencies

Lark is a library for defining formal grammars and parsing text against them. Anvil Uplink connects Python code running outside Anvil (here, this notebook) to a web application built with Anvil. Colab-Env loads environment variables from a file on Google Drive into a Google Colaboratory session.

In [2]:
from IPython.display import clear_output

!pip install -q lark
!pip install anvil-uplink
!pip install colab-env --upgrade
clear_output()

## Environment settings

In [3]:
import anvil.server
import colab_env
import os

Mounted at /content/gdrive


To add an environment variable to the `vars.env` file on Google Drive:

```python
colab_env.envvar_handler.add_env("TEST", "HELLO WORLD!", overwrite=True)
```

In [4]:
!more gdrive/My\ Drive/vars.env
clear_output()

In [5]:
anvil.server.connect(os.getenv("KEY_ANVIL_PL"))

Connecting to wss://anvil.works/uplink
Anvil websocket open
Connected to "Private link to branch master" as SERVER

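If `KEY_ANVIL_PL` is missing from `vars.env`, `os.getenv` returns `None` and `anvil.server.connect` is called with `None`. A minimal sketch of an earlier, clearer check follows. `require_env` is a hypothetical helper written for this illustration and is not part of colab-env or Anvil.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error.

    Hypothetical helper for illustration; not part of colab-env or Anvil.
    """
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name!r} is not set")
    return value


# Demonstration with a throwaway variable so the snippet runs anywhere.
os.environ["DEMO_UPLINK_KEY"] = "not-a-real-key"
print(require_env("DEMO_UPLINK_KEY"))  # prints: not-a-real-key
```

In the notebook, the call would then read `anvil.server.connect(require_env("KEY_ANVIL_PL"))`.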

## Parser Grammar

1. **?start:** This rule sets the entry point of the grammar. It is set to `command`, so every input must parse as a single command. The leading `?` asks Lark to inline a rule into its parent when it has a single child.

2. **?command:** This rule defines the two possible commands: `imprimirResultado(...)` (print result) and assignment. The `->` syntax gives an alternative an alias, which becomes the name of the node that alternative produces in the parse tree (here `imprimir_resultado` and `asignacion`).

3. **?expression:** This rule lists the forms an expression can take: sums and subtractions, concatenations, repetitions, strings, numbers, variable names and parenthesized expressions. Operator precedence comes from how the rules below are nested (`sum_rest` contains `concatenacion`, which contains `repeticion`, which contains `atom`), so `*` binds tighter than `+` and `-`. Several alternatives overlap (a bare `NAME` is also a valid `sum_rest`, for example), so the same input can have more than one derivation.

4. **?sum_rest:** This rule defines the addition and subtraction of expressions. Additions and subtractions can be chained, and because the rule is left-recursive they group from left to right.

5. **?concatenacion:** This rule defines the concatenation of expressions with the `*` operator. Concatenations can be chained, and because the rule is left-recursive they group from left to right.

6. **?repeticion:** This rule defines the repetition of expressions, also with the `*` operator. Repetitions can be chained, but the rule is right-recursive (`atom "*" repeticion`), so they group from right to left. Because concatenation and repetition share the same operator, an input such as `a * b` is ambiguous.

7. **?atom:** This rule defines the atoms of the grammar: strings, numbers, variable names and parenthesized expressions.

8. **%import:** This directive imports terminals (tokens) from Lark's `common` grammar. Here `CNAME` is imported as `NAME` (variable names), `ESCAPED_STRING` as `STRING` (strings) and `SIGNED_INT` as `NUMBER` (integers), together with `WS`.

9. **%ignore:** This directive tells the parser to skip whitespace (`WS`) in the input.

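To make the difference between left-recursive and right-recursive rules concrete, the sketch below folds the same list of operands both ways. It uses subtraction on plain integers rather than the notebook's grammar, and `fold_left` and `fold_right` are hypothetical helper names.

```python
from functools import reduce
import operator


def fold_left(op, operands):
    """Group as ((a op b) op c), the shape a left-recursive rule such as
    `sum_rest: sum_rest "-" concatenacion` gives to a chain."""
    return reduce(op, operands)


def fold_right(op, operands):
    """Group as (a op (b op c)), the shape a right-recursive rule such as
    `repeticion: atom "*" repeticion` gives to a chain."""
    *head, last = operands
    result = last
    for value in reversed(head):
        result = op(value, result)
    return result


numbers = [10, 4, 3]
print(fold_left(operator.sub, numbers))   # (10 - 4) - 3 = 3
print(fold_right(operator.sub, numbers))  # 10 - (4 - 3) = 9
```

The grouping only changes the result for operators that are not associative, which is why it matters which rule a chained `*` is parsed by.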

In [6]:
grammar = r"""
    ?start: command

    ?command: "imprimirResultado" "(" expression ")"    -> imprimir_resultado
           | NAME "=" expression                        -> asignacion

    ?expression: sum_rest                             -> expression
               | concatenacion                        -> expression
               | repeticion                           -> expression
               | STRING                               -> expression
               | NUMBER                               -> expression
               | NAME                                 -> expression
               | "(" expression ")"                   -> expression

    ?sum_rest: sum_rest "+" concatenacion             -> suma
             | sum_rest "-" concatenacion             -> resta
             | concatenacion                         -> suma_resta

    ?concatenacion: concatenacion "*" repeticion      -> concatenar
                  | repeticion                        -> concatenacion

    ?repeticion: atom "*" repeticion                  -> repetir
               | atom                               -> repeticion

    ?atom: STRING                                    -> cadena
         | NUMBER                                    -> numero
         | NAME                                      -> variable
         | "(" expression ")"                        -> parentesis

    %import common.CNAME -> NAME
    %import common.ESCAPED_STRING -> STRING
    %import common.SIGNED_INT -> NUMBER
    %import common.WS
    %ignore WS
"""

## Tree construction

This function parses a line of input with Lark according to the given grammar, keeping every derivation of an ambiguous input (`ambiguity='explicit'`). `CollapseAmbiguities` then expands the result into a list of unambiguous trees. The first tree is rendered to a PNG file with pydot and returned to the web app as an Anvil media object.

Example:
```python
plot_trees('imprimirResultado(test*1)')
```

In [7]:
from lark import Lark, tree
from lark.visitors import CollapseAmbiguities
import anvil.media

# Location of the rendered tree image on the mounted drive
TREE_PNG = '/content/gdrive/MyDrive/server_programming/img/tree.png'

@anvil.server.callable
def plot_trees(line_code:str, grammar:str = grammar, start:str='command'):
    # Keep every derivation of an ambiguous input as an _ambig node
    parser = Lark(grammar=grammar, start=start, ambiguity='explicit')
    parsed = parser.parse(line_code)
    # Expand the _ambig nodes into a list of unambiguous trees
    trees = CollapseAmbiguities().transform(parsed)
    if not trees:
        return None
    # Render only the first tree; the web app receives a single image
    tree.pydot__tree_to_png(trees[0], filename=TREE_PNG, rankdir='TB')
    # Load the generated image into memory
    return anvil.media.from_file(TREE_PNG, 'image/png')

## Lexical analyzer

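The lexical analyzer identifies and classifies the tokens in a line of input code. It returns a list of `(token_name, value)` pairs, or an empty list if it reaches text that matches no pattern.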

Example:
```python
# Use a name other than `tokens`, which already holds the pattern list
found = lexical_analyzer('imprimirResultado(test*1)')
if found:
    print(found)
```

In [8]:
import re

# Definition of regular expression patterns for tokens.
# Patterns are tried in order, so a longer operator must come before its prefix.
tokens = [
    ('KEYWORD', r'imprimirResultado\b'),  # \b keeps longer identifiers from matching
    ('ASSIGNMENT', r'='),
    ('PLUS', r'\+'),
    ('CONCATENATION', r'\-\>'),  # before MINUS, otherwise '-' always matches first
    ('MINUS', r'\-'),
    ('MULTIPLICATION', r'\*'),
    ('INTEGER', r'\d+'),
    ('IDENTIFIER', r'[A-Za-z_][A-Za-z0-9_]*'),
    ('STRING', r'"[^"]*"'),
    ('OPEN_PARENTHESIS', r'\('),
    ('CLOSE_PARENTHESIS', r'\)'),
    ('WHITESPACE', r'\s+'),
]

# Function for performing lexical analysis
@anvil.server.callable
def lexical_analyzer(line_code:str):
    tokens_found = []
    position = 0

    while position < len(line_code):
        match = None

        # Try to match with each regular expression pattern
        for token_name, pattern in tokens:
            regex = re.compile(pattern)
            match = regex.match(line_code, position)
            if match:
                value = match.group(0)
                tokens_found.append((token_name, value))
                position = match.end(0)
                break

        # If no valid token is found, display an error message
        if not match:
            print(f"Error: Invalid token at position {position}: '{line_code[position:]}'")
            return []

    return tokens_found

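A common alternative to trying each pattern in turn is to join them into one alternation with named groups, so that `re` does the dispatch. The sketch below shows that variant and is not the notebook's implementation. `TOKEN_SPEC`, `MASTER_PATTERN` and `tokenize` are hypothetical names, and unlike the function above it drops whitespace tokens and raises an exception on invalid input.

```python
import re

# Order matters: the first alternative that matches wins.
TOKEN_SPEC = [
    ("KEYWORD", r"imprimirResultado\b"),
    ("CONCATENATION", r"->"),  # must precede MINUS, or "-" wins
    ("ASSIGNMENT", r"="),
    ("PLUS", r"\+"),
    ("MINUS", r"-"),
    ("MULTIPLICATION", r"\*"),
    ("INTEGER", r"\d+"),
    ("IDENTIFIER", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("STRING", r'"[^"]*"'),
    ("OPEN_PARENTHESIS", r"\("),
    ("CLOSE_PARENTHESIS", r"\)"),
    ("WHITESPACE", r"\s+"),
]
# One named group per token type, joined into a single compiled pattern.
MASTER_PATTERN = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_SPEC))


def tokenize(line_code: str):
    """Return (token_name, value) pairs, skipping whitespace."""
    found = []
    position = 0
    while position < len(line_code):
        match = MASTER_PATTERN.match(line_code, position)
        if match is None:
            raise ValueError(
                f"Invalid token at position {position}: {line_code[position:]!r}"
            )
        # lastgroup is the name of the alternative that matched.
        if match.lastgroup != "WHITESPACE":
            found.append((match.lastgroup, match.group()))
        position = match.end()
    return found


print(tokenize('x = "ab" -> y'))
```

Compiling once at import time also avoids calling `re.compile` for every pattern at every position.
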
#### Keeping the notebook available to the web app

In [None]:
anvil.server.wait_forever()

