# Forgather

A notebook for experimenting with Forgather's syntax.

In [1]:
import sys, os
modules_path = os.path.join('..', 'src')
if modules_path not in sys.path: sys.path.insert(0, modules_path)

from pprint import pp, pformat

from IPython import display as ds

from forgather.latent import Latent
from forgather.config import ConfigEnvironment
from forgather.preprocess import PPEnvironment
from forgather.codegen import generate_code
from forgather.yaml_encoder import to_yaml
import forgather.nb.notebooks as nb

# Show common syntax definition.
with open(os.path.join('..', 'docs', 'syntax.md'), 'r') as f:
    display(ds.Markdown(f.read()))

# Forgather Syntax Reference

Forgather defines a domain-specific language for the dynamic construction of Python objects using a combination of Jinja2, YAML, and a few extensions.

This guide will focus on the extensions to these languages. For details on YAML and Jinja2, see:

- [Jinja2 Template Designer Documentation](https://jinja.palletsprojects.com/en/3.1.x/templates/)
- [YAML 1.1](https://yaml.org/spec/1.1/)
- [PyYAML Documentation](https://pyyaml.org/wiki/PyYAMLDocumentation)

## Jinja2 Extensions
---
### The Preprocessor

There is a custom Jinja2 preprocessor which implements an extended version of Jinja2's [Line Statements](https://jinja.palletsprojects.com/en/3.1.x/templates/#line-statements). These are implemented via regex substitution, where the match is converted to normal Jinja syntax.


- \#\# : Line Comment
- \-\- : Line Statement
- << : Line Statement w/ left-trim
- \>> : Line Statement w/ right-trim
- == : Print Command
- '=>' : Print Command w/ right-trim

Example Input:

```jinja2
## If 'do_loop' is True, then output a list of numbers.
-- if do_loop:
    -- for i in range(how_many): ## Loop 'how_many' times.
        == '- ' + i|string
    -- endfor
<< endif
```

Is translated to:

```jinja2
{# If 'do_loop' is True, then output a list of numbers. #}
{% if do_loop: %}
{% for i in range(how_many): %}
{{ '- ' + i|string }}
{% endfor %}
{%- endif %}
```

Output, when passed: do_loop=True, how_many=3
```yaml
- 0
- 1
- 2

```


Normal Jinja2 syntax works just fine too. I just find that the normal syntax is visually difficult to parse (without syntax-highlighting) and is awkward to type.

More Formally

```python
line_comment = r'(.*)\s+#{2,}.*'
line_statement = r'\s*(--|<<|>>|==|=>)\s(.*)'

# Substitutions, keyed on the marker captured in re_match[1]:
{
    '--': r"{% " + re_match[2] + r" %}",
    '<<': r"{%- " + re_match[2] + r" %}",
    '>>': r"{% " + re_match[2] + r" -%}",
    '==': r"{{ " + re_match[2] + r" }}",
    '=>': r"{{ " + re_match[2] + r" -}}",
}
```
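
The substitution rules above can be sketched in a few lines of plain Python. This is a minimal illustration built only from the two patterns shown, not Forgather's actual preprocessor; `preprocess_line` and `DELIMITERS` are hypothetical names. How a whole-line `##` comment becomes `{# ... #}` is not covered by these two patterns, so the sketch leaves that case out.

```python
import re

# Patterns copied from the definition above.
LINE_COMMENT = re.compile(r'(.*)\s+#{2,}.*')
LINE_STATEMENT = re.compile(r'\s*(--|<<|>>|==|=>)\s(.*)')

# Opening and closing Jinja2 delimiters for each marker.
DELIMITERS = {
    '--': ('{% ', ' %}'),
    '<<': ('{%- ', ' %}'),
    '>>': ('{% ', ' -%}'),
    '==': ('{{ ', ' }}'),
    '=>': ('{{ ', ' -}}'),
}


def preprocess_line(line: str) -> str:
    """Translate one line of extended syntax (hypothetical helper)."""
    # Drop a trailing '## comment', keeping the text before it.
    comment = LINE_COMMENT.fullmatch(line)
    if comment:
        line = comment[1]
    statement = LINE_STATEMENT.fullmatch(line)
    if not statement:
        return line  # Not a line statement; pass it through unchanged.
    opening, closing = DELIMITERS[statement[1]]
    return opening + statement[2].rstrip() + closing


print(preprocess_line("    -- for i in range(how_many): ## Loop 'how_many' times."))
print(preprocess_line("<< endif"))
```

Leading indentation is discarded along with the marker, which is consistent with the translated example earlier in this section.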

---
### Jinja2 Globals

A number of globals have been introduced to the Jinja2 environment to assist with pre-processing.

- isotime() : Returns ISO formatted local-time, with 1-second resolution ("%Y-%m-%dT%H:%M:%S")
- utcisotime() : As with isotime(), but UTC time.
- filetime(): Generates a local-time string suitable to be concatenated with a file-name. ("%Y-%m-%dT%H-%M-%S")
- utcfiletime() : As filetime(), but in UTC time.
- now() : Get datetime.datetime.now()
- utcnow() : Get datetime.datetime.utcnow()
- joinpath(*names) : Join a list of file-path segments via os.path.join()
- normpath(path) : Normalize a file path; os.path.normpath()
- abspath(path) : Convert path to absolute path; os.path.abspath()
- relpath(path) : Convert a path to a relative path; os.path.relpath()
- repr(obj) : Get Python representation of object; repr()
- modname_from_path(module_name) : Given a module file path, return the module name
- user_home_dir() : Return absolute path of user's home directory  
- getcwd() : Get the current working directory
- forgather_config_dir() : Get the platform-specific config directory for Forgather.
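
The two time formats quoted above differ only in the separator used for the time fields. The sketch below reproduces them with `datetime.strftime`; the optional `now` parameter is a hypothetical addition for repeatable output and is not part of the globals listed here.

```python
import datetime

ISO_FORMAT = "%Y-%m-%dT%H:%M:%S"   # format quoted for isotime()
FILE_FORMAT = "%Y-%m-%dT%H-%M-%S"  # format quoted for filetime()


def isotime(now=None):
    """Local time with 1-second resolution ('now' is a hypothetical parameter)."""
    now = now or datetime.datetime.now()
    return now.strftime(ISO_FORMAT)


def filetime(now=None):
    """Local time without ':' so it is safe to use in a file name."""
    now = now or datetime.datetime.now()
    return now.strftime(FILE_FORMAT)


fixed = datetime.datetime(2024, 1, 2, 3, 4, 5)
print(isotime(fixed))   # 2024-01-02T03:04:05
print(filetime(fixed))  # 2024-01-02T03-04-05
```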

The following functions from [platformdirs](https://pypi.org/project/platformdirs/) are also available:
- user_data_dir()
- user_cache_dir()
- user_config_dir()
- site_data_dir()
- site_config_dir()

---
### Custom File Loader

A custom loader, derived from the FileSystemLoader, is defined. This loader has a syntax for splitting a single loaded template file into multiple sub-templates.

The primary use-case for this syntax is [template inheritance](https://jinja.palletsprojects.com/en/3.1.x/templates/#template-inheritance), which disallows multiple-inheritance. If you inherit from a template and include a template which is derived from another, Jinja2 does not allow you to directly override blocks from the included template. You can get around this by creating another template, which overrides the desired blocks, and is included by the top-level template.

Normally, this would require creating another template file, but who needs that!? That's much more difficult to work with.

```jinja2
## This is the main template
-- extends 'base_template.jinja'

## Override block 'foo' from 'base_template.jinja'
-- block foo
    -- include 'foo.bar' ## Include the sub-template
-- endblock


#--------------------- foo.bar ---------------------
## This is a sub-template named 'foo.bar'
-- extends 'some_other_base_template.jinja'

## Override block 'bar' from 'some_other_base_template.jinja'
-- block bar
    ## ... stuff
-- endblock
```

More formally, the syntax for splitting a document is:

```python
split_on = r"\n#\s*-{3,}\s*([\w./]+)\s*-{3,}\n"
```

Note: You can't split a template defined via a Python string, as this bypasses the Loader; only file templates may be split like this.
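
To make the splitting rule concrete, here is a small sketch that applies the `split_on` pattern exactly as written above with `re.split`. It only illustrates the pattern, not the loader itself; `split_templates` and the `"main"` key for the leading part are hypothetical. Note that the divider line is written with a single `#`, which is what the pattern requires.

```python
import re

# Pattern copied from the definition above.
SPLIT_ON = re.compile(r"\n#\s*-{3,}\s*([\w./]+)\s*-{3,}\n")


def split_templates(source: str, main_name: str = "main") -> dict:
    """Split one template file into {name: body} (hypothetical helper)."""
    # With one capture group, re.split returns
    # [head, name1, body1, name2, body2, ...].
    parts = SPLIT_ON.split(source)
    templates = {main_name: parts[0]}
    for name, body in zip(parts[1::2], parts[2::2]):
        templates[name] = body
    return templates


source = (
    "-- extends 'base_template.jinja'\n"
    "-- block foo\n"
    "    -- include 'foo.bar'\n"
    "-- endblock\n"
    "\n"
    "#--------------------- foo.bar ---------------------\n"
    "-- extends 'some_other_base_template.jinja'\n"
)
templates = split_templates(source)
print(list(templates))  # ['main', 'foo.bar']
```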

---
## YAML

### Dot-Name Elision
YAML does not have a way of defining an object without also constructing it. This can be inconvenient, as it may not be known ahead of time where the first use of an object will be, and YAML requires that the definition occur at this point.

To work around this, if the root-node is a mapping, we delete all keys containing strings starting with a dot. Once the object has been defined, YAML does not care if we delete the original definition/instance. My convention is to use ".define", but any name, starting with a dot, will work.

By convention, the primary output object of such a mapping is named "main".

```yaml
# Define points
.define: &pt1 { x: 0, y: 0 }
.define: &pt2 { x: 5, y: 0 }
.define: &pt3 { x: 0, y: 5 }

main:
    # A list of lines, each defined by a pair of points.
    - [ *pt1, *pt2 ]
    - [ *pt2, *pt3 ]
    - [ *pt3, *pt1 ]
```

Constructed graph...

```python
Latent.materialize(graph)

{'main': [[{'x': 0, 'y': 0}, {'x': 5, 'y': 0}],
          [{'x': 5, 'y': 0}, {'x': 0, 'y': 5}],
          [{'x': 0, 'y': 5}, {'x': 0, 'y': 0}]]}
```

While not apparent from the representation, the points in the lines are not copies; they are all references to the original three points from the definition. There are only three point objects present in the graph!
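
The same idea can be shown with ordinary Python objects. This is a sketch of the behaviour described above, not Forgather's loader: `strip_dot_names` is a hypothetical helper, and the `loaded` dictionary stands in for what a YAML parser would produce from the example.

```python
def strip_dot_names(root):
    """Drop top-level keys that start with a dot (hypothetical helper)."""
    if not isinstance(root, dict):
        return root
    return {k: v for k, v in root.items() if not str(k).startswith(".")}


# The three anchored points; aliases refer to these same objects.
pt1 = {"x": 0, "y": 0}
pt2 = {"x": 5, "y": 0}
pt3 = {"x": 0, "y": 5}

loaded = {
    ".define": pt3,  # stands in for the '.define' entries
    "main": [[pt1, pt2], [pt2, pt3], [pt3, pt1]],
}
graph = strip_dot_names(loaded)

# Collect the identity of every point that appears in the lines.
unique_points = {id(point) for line in graph["main"] for point in line}
print(list(graph), len(unique_points))  # ['main'] 3
```

Removing the `.define` key does not affect the points, because the lines still hold references to them.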

---
### YAML Types

Of the standard YAML 1.1 types, only those which can be resolved implicitly (without specifying the tag) are supported.

YAML 1.1 Tag : Python Type / Examples
- !!null : None
    - null
- !!bool : bool
    - True
    - False
- !!int : int
    - 2
    - -6
- !!float : float
    - 2.0
    - 1.2e-4
- !!str : str
    - "Hello"
    -  world
- !!seq : list
    - \[ 1, 2, 3 \] 
- !!map : dict
    - { x: 1, y: 12 }

The following standard types are presently unsupported:
- !!binary
- !!timestamp
- !!omap, !!pairs
- !!set -- TODO: Implement me!

---
Complex types are instead supported through Forgather-specific tags:

#### !tuple : Named Tuple

Syntax: !tuple\[:@name\] \<sequence\>

Construct a named Python tuple from a YAML sequence

```yaml
!tuple:@my_tuple [ 1, 2, 3 ]
```

---

#### !list : Named List

Syntax: !list\[:@name\] \<sequence\>

Construct a named Python list from a YAML sequence

```yaml
!list:@my_list [ 1, 2, 3 ]
```

---

#### !dict : Named Dictionary

Syntax: !dict\[:@\<name\>\] \<mapping\>

Construct a named Python dict from a YAML mapping

```yaml
!dict:@my_dict
    foo: 1
    bar: 2
    baz: 3
```

---
#### !var

Syntax: !var "\<var-name\>" | { name: \<var-name\>, default: \<default-value\> }

This declares a variable, which can be substituted when the graph is constructed.

```yaml
point:
    x: !var "x" # Define a variable named 'x'
    y: !var # Define a variable named 'y' with a default value of 16
        name: y
        default: 16
```
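
As a rough model of what substitution means here, the sketch below walks a nested structure and replaces each variable with a supplied value or its default. The `Var` class and `substitute` function are hypothetical stand-ins, and raising `KeyError` for a variable with neither a value nor a default is an assumption rather than documented Forgather behaviour.

```python
from dataclasses import dataclass
from typing import Any

_MISSING = object()  # sentinel meaning "no default was given"


@dataclass
class Var:
    """Hypothetical stand-in for a node created by the !var tag."""
    name: str
    default: Any = _MISSING


def substitute(node, values):
    """Recursively replace Var objects with values or defaults."""
    if isinstance(node, Var):
        if node.name in values:
            return values[node.name]
        if node.default is _MISSING:
            # Assumption: an unresolved variable is an error.
            raise KeyError(f"no value for variable {node.name!r}")
        return node.default
    if isinstance(node, dict):
        return {k: substitute(v, values) for k, v in node.items()}
    if isinstance(node, list):
        return [substitute(v, values) for v in node]
    return node


graph = {"point": {"x": Var("x"), "y": Var("y", default=16)}}
print(substitute(graph, {"x": 3}))  # {'point': {'x': 3, 'y': 16}}
```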

---
#### !singleton

Syntax: !singleton:\<import-spec\>\[@\<name\>\] (\<sequence\> | \<mapping\> | ({ args: \<sequence\>, kwargs: \<mapping\> }))

This is a callable object with only a single instance; any alias refers to the same object instance.

```yaml
# Construct three random ints, all having the same value.
- &random_int !singleton:random:randrange:@random_int [ 1000 ]
- *random_int
- *random_int
```

Constructed...
```python
Latent.materialize(graph)

[247, 247, 247]
```

The "SingletonNode" will generally be your 'go-to' for constructing objects, as the semantics mirror what is expected for YAML anchors and aliases.

However, there are a few exceptions...

---
#### !factory

Syntax: !factory:\<import-spec\>\[@\<name\>\] (\<sequence\> | \<mapping\> | ({ args: \<sequence\>, kwargs: \<mapping\> }))

This is a callable object which instantiates a new instance everywhere it appears in the graph.

```yaml
# Construct three random ints, all (probably) having different values.
- &random_int !factory:random:randrange [ 1000 ]
- *random_int
- *random_int
```

Constructed...
```python
Latent.materialize(graph)

[99, 366, 116]
```
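
The difference between the two tags comes down to whether the result of the call is cached. The classes below are a hypothetical illustration of that distinction and are not Forgather's SingletonNode or FactoryNode; a counter replaces `random.randrange` so that the output is repeatable.

```python
import itertools


class Factory:
    """Calls the wrapped callable every time it is materialized."""

    def __init__(self, func, *args, **kwargs):
        self.func, self.args, self.kwargs = func, args, kwargs

    def materialize(self):
        return self.func(*self.args, **self.kwargs)


class Singleton(Factory):
    """Calls the wrapped callable once, then reuses the result."""

    _UNSET = object()

    def __init__(self, func, *args, **kwargs):
        super().__init__(func, *args, **kwargs)
        self._value = self._UNSET

    def materialize(self):
        if self._value is self._UNSET:
            self._value = super().materialize()
        return self._value


counter = itertools.count()
once = Singleton(next, counter)
each = Factory(next, counter)

singles = [once.materialize() for _ in range(3)]
fresh = [each.materialize() for _ in range(3)]
print(singles)  # [0, 0, 0]
print(fresh)    # [1, 2, 3]
```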

---
#### !lambda

Syntax: !lambda:\<import-spec\>\[@\<name\>\] (\<sequence\> | \<mapping\> | ({ args: \<sequence\>, kwargs: \<mapping\> }))

This returns the entire sub-graph as a callable. This can be used when a callable needs to be passed as an argument.

```yaml
# Compute the square of each item in a list, returning a list.
!singleton:list
    - !singleton:map
        # The generated object is equivalent to: "lambda arg0: pow(arg0, 2)"
        - !lambda:pow [ !var "arg0", 2 ]
        - [ 1, 2, 3, 4 ]
```

Constructed...
```python
Latent.materialize(graph)

[1, 4, 9, 16]
```

Note that any positional arguments are implicitly converted to the variables named \[ 'arg0', 'arg1', 'arg2', ... \]
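
For comparison, this is what the graph above amounts to in plain Python, along with a sketch of the positional-to-named mapping. `squares` and `name_positional` are hypothetical names used only for illustration.

```python
def squares(values):
    """Plain-Python equivalent of the YAML graph above."""
    return list(map(lambda arg0: pow(arg0, 2), values))


def name_positional(args):
    """Map positional arguments to 'arg0', 'arg1', ... (hypothetical helper)."""
    return {f"arg{i}": value for i, value in enumerate(args)}


print(squares([1, 2, 3, 4]))        # [1, 4, 9, 16]
print(name_positional(("a", "b")))  # {'arg0': 'a', 'arg1': 'b'}
```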

---
### CallableNodes


SingletonNode, FactoryNode, and LambdaNode are all subclasses of the abstract base class "CallableNode." A CallableNode can call any Python function, including class constructors. As Python differentiates between positional args and kwargs, making use of both requires the following syntax:

```yaml
!singleton:random:sample
    args:
        - ['red', 'blue']
        - 5
    kwargs:
        counts: [4, 2]
```

Generally speaking, you can omit the explicit 'args' and 'kwargs' names, as long as the syntax is unambiguous.

```yaml
- !singleton:torch:tensor
    - 2
    - 2
- !singleton:random:binomialvariate { n: 1, p: 0.5 }
```
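
One way to picture the three argument forms is a dispatcher that inspects the shape of the node's value. This is a sketch under the assumption that a mapping containing only the keys 'args' and 'kwargs' is treated as the explicit form; `call_node` is a hypothetical helper, not part of Forgather.

```python
import random


def call_node(func, spec):
    """Call func with a sequence, a mapping, or explicit args/kwargs."""
    if isinstance(spec, (list, tuple)):
        return func(*spec)  # a sequence means positional arguments
    if isinstance(spec, dict) and set(spec) <= {"args", "kwargs"}:
        # Assumed rule for recognising the explicit form.
        return func(*spec.get("args", ()), **spec.get("kwargs", {}))
    return func(**spec)  # any other mapping means keyword arguments


# Explicit form, matching the random.sample example above.
picked = call_node(
    random.sample,
    {"args": [["red", "blue"], 5], "kwargs": {"counts": [4, 2]}},
)
total = call_node(pow, [2, 10])
point = call_node(dict, {"x": 1, "y": 12})
print(len(picked), total, point)  # 5 1024 {'x': 1, 'y': 12}
```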

---
#### CallableNode Tag Syntax

The part of the YAML tag after the first ':' provides the information required to locate and import the requested Callable.

In the simplest case, a [built-in](https://docs.python.org/3/library/functions.html) Python callable just needs to specify the name of the built-in.

```yaml
!singleton:tuple [ 1, 2, 3 ]
```

When the Callable is defined in a module, a second ':' is used to separate the module name from the name within the module.

```yaml
# See: https://docs.python.org/3/library/operator.html
!singleton:operator:mod [ 365, 7 ]
```
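
The two forms shown so far, a bare built-in name and a 'module:name' pair, can be resolved with the standard library alone. The sketch below is an illustration of that lookup, not Forgather's resolver; `resolve` is a hypothetical name, and file-path imports are deliberately not handled.

```python
import builtins
import importlib


def resolve(import_spec: str):
    """Resolve 'name' or 'module:name' to a Python object (hypothetical)."""
    module_name, sep, attr = import_spec.rpartition(":")
    if not sep:
        # No ':' present, so look the name up among the built-ins.
        return getattr(builtins, attr)
    return getattr(importlib.import_module(module_name), attr)


print(resolve("tuple")([1, 2, 3]))      # (1, 2, 3)
print(resolve("operator:mod")(365, 7))  # 1
```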

You can also dynamically import a name from a file.

```yaml
# Import 'MyClass' from a Python source file.
!singleton:/path/to/my/pymodule.py:MyClass [ "foo", "bar" ]
```

When using a file-import, which itself has relative imports, you will need to specify which directories to search for relative imports:

```yaml
# Import 'MyClass' from a file which itself uses relative imports.
!singleton:/path/to/my/pymodule.py:MyClass
    args: [ "foo", "bar" ]
    kwargs:
        submodule_searchpath:
            - "/path/to/my/"
            - "/path/to/shared/modules/"
```

By specifying multiple locations, the import system treats all of the directories in the list as a union, thus you can perform a relative import from any of these directories.
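
Importing a name from a file path can be done with `importlib.util` from the standard library. The sketch below shows only the basic file import; `import_from_file` is a hypothetical helper, the temporary module is created just for the demonstration, and the search-path handling for relative imports described above is not reproduced.

```python
import importlib.util
import pathlib
import tempfile


def import_from_file(path, name):
    """Load `name` from the Python file at `path` (hypothetical helper)."""
    path = pathlib.Path(path)
    spec = importlib.util.spec_from_file_location(path.stem, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # run the module's top-level code
    return getattr(module, name)


with tempfile.TemporaryDirectory() as tmp:
    # Write a throwaway module so the example is self-contained.
    module_path = pathlib.Path(tmp) / "pymodule.py"
    module_path.write_text(
        "class MyClass:\n"
        "    def __init__(self, *args):\n"
        "        self.args = args\n"
    )
    MyClass = import_from_file(module_path, "MyClass")
    obj = MyClass("foo", "bar")

print(obj.args)  # ('foo', 'bar')
```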

---
#### Named Callable Nodes

CallableNodes may be given an explicit name. The name serves the same purpose as the YAML anchor/alias, but PyYAML does not make this information available through the tag API. While it is feasible to hack PyYAML, doing so is risky. For now, there is a somewhat redundant interface for specifying node names.

When a node has been assigned an explicit name, it will always be rendered as an explicit definition in the Python and YAML code generators, to improve readability. Naming a node is entirely optional.

A callable node's tag may end with '@\<name\>' which will assign a name to the node.

```yaml
.define: &foobar !singleton:dict@foobar
    foo: 1
    bar: 2
    baz: |
        She sells sea shells
        by the sea shore
main:
    - *foobar
```

When rendered as Python:

```python
def construct(
):
    foobar = {
        'foo': 1,
        'bar': 2,
        'baz': (
                'She sells sea shells\n'
                'by the sea shore\n'
            ),
    }
    
    return {
        'main': [
            foobar,
        ],
    }
```

And without the name, the object definition becomes anonymous:

```yaml
.define: &foobar !singleton:dict
...
```

```python
def construct(
):
    return {
        'main': [
            {
                'foo': 1,
                'bar': 2,
                'baz': (
                        'She sells sea shells\n'
                        'by the sea shore\n'
                    ),
            },
        ],
    }
```

---
## Create Config Environment

A configuration environment is required to construct configurations from YAML/Jinja2 inputs; it contains the information needed to locate Jinja2 templates by name and defines the global variables available to templates.

```python
from forgather.config import ConfigEnvironment
...
ConfigEnvironment(
    searchpath: Iterable[str | os.PathLike] | str | os.PathLike = tuple("."),
    pp_environment: Environment = None,
    global_vars: Dict[str, Any] = None,
):
```

- searchpath: A list of directories to search for templates in.
- pp_environment: Override the default Jinja2 environment class with another implementation.
- global_vars: Jinja2 global variables visible to all templates.

In [None]:
env = ConfigEnvironment()

## Define Input

A configuration document consists of a combination of YAML and Jinja2 syntax. Typically, a config template would be loaded from a file, but for testing we can create a template directly from a Python string.

Both the Jinja2 template and the configuration may accept variables.

In [None]:
document = """
-- set model_src = '../model_src/bits/'

main: !singleton:{{model_src}}causal_layer_stack.py:CausalLayerStack
    layer_factory: !lambda:{{model_src}}pre_ln_layer.py:PreLNLayer
        feedforward: !factory:{{model_src}}feedforward_layer.py:FeedforwardLayer@feedforward_factory
            d_model: !var "hidden_size"
            d_feedforward: !var "dim_feedforward"
        attention: !factory:torch.nn:Identity []
        norm1: &layer_norm_factory !factory:torch.nn:LayerNorm [!var "hidden_size"]
        norm2: *layer_norm_factory
    post_norm: *layer_norm_factory
    num_hidden_layers: 2
"""

# Keyword args to pass to the template
pp_kwargs = {
}
    
# Positional args to pass to factory
factory_args = [
]

# Keyword args to pass to factory
factory_kwargs = dict(
    hidden_size=64,
    dim_feedforward=256,
)

In [None]:
document = """
main: &foobar !list@foobar
    - !singleton:map
        # The generated object is equivalent to: "lambda arg0: pow(arg0, 2)"
        - !lambda:pow [ !var "arg0", !var "power" ]
        - !singleton:range [ 4 ]
"""

# Keyword args to pass to the template
pp_kwargs = {
}

# Positional args to pass to factory
factory_args = [
]

# Keyword args to pass to factory
factory_kwargs = dict(
    power=3,
)

In [None]:
document = """
.define: &foobar !dict@foobar
    foo: 1
    bar: 2
    baz: |
        She sells sea shells
        by the sea shore
main:
    - *foobar
"""

# Keyword args to pass to the template
pp_kwargs = {
}

# Positional args to pass to factory
factory_args = [
]

# Keyword args to pass to factory
factory_kwargs = dict(
)

## Convert Document to Graph

```python
class ConfigEnvironment:
... 
    def load(
        self,
        config_path: os.PathLike | str,
        /,
        **kwargs,
    ) -> Config:
...
    def load_from_string(
        self,
        config: str,
        /,
        **kwargs,
    ) -> Config:
```

- load: Load a template from a path; each directory in 'searchpath' is searched for the template.
    - config_path: The template path, relative to 'searchpath'.
    - kwargs: These are passed into the context of the template.
- load_from_string: As with load, but a Python string defines the template body; note that this bypasses the template loader.
    - config: A Python string with a Jinja2 template.
    - kwargs: Passed to the template.

In [None]:
graph = env.load_from_string(document, **pp_kwargs).config
nb.show_codeblock("python", pformat(graph), "### Node Graph")

## Convert Graph to YAML

Convert the node-graph to a YAML representation. This may not be exactly the same as it was in the source template, but should be semantically equivalent.

```python
from forgather.yaml_encoder import to_yaml
...
def to_yaml(obj: Any):
```


In [None]:
nb.show_codeblock("yaml", to_yaml(graph))

## Convert Graph to Python

This function takes the output from Latent.to_py(graph) and uses it to render Python code using a Jinja2 template. If the template is unspecified, an implicit "built-in" template is used, which will generate appropriate import and dynamic import statements, where required.

```python
from forgather.codegen import generate_code
...
def generate_code(
    obj,
    template_name: Optional[str] = None,
    template_str: Optional[str] = None,
    searchpath: Optional[List[str | os.PathLike] | str | os.PathLike] = ".",
    env=None,  # jinja2 environment or compatible API
    **kwargs,
) -> Any:
```

The default template accepts the following additional kwargs:

    factory_name: Optional[str]="construct", ; The name of the generated factory function.
    relaxed_kwargs: Optional[bool]=Undefined, ; if defined, **kwargs is added to the arg list
    
See 'help(generate_code)' for details.

In [None]:
generated_code = generate_code(graph, name_policy=None)
nb.show_codeblock("python", generated_code, "### Generated Code", )

## Materialize Graph

Construct the objects directly from the graph.

```python
from forgather.latent import Latent
...
def materialize(obj: Any, /, *args, **kwargs):
```

Construct all objects in the graph, returning the constructed root-node.

The positional and keyword arguments will be substituted for matching VarNodes in the graph.

In [None]:
obj = Latent.materialize(
    graph,
    *factory_args,
    **factory_kwargs,
)
nb.show_codeblock("python", pformat(obj), "### Constructed Graph via Latent.materialize()")

## Execute Generated Code

Execute the generated code, then call the generated 'construct' function to construct the objects.

Note: Lambda nodes with args are not working at present (although Latent.materialize() works)

In [None]:
exec(generated_code)
obj = construct(*factory_args, **factory_kwargs,)
nb.show_codeblock("python", pformat(obj), "### Constructed Graph via Execution of Generated Code", )