# Example 01: Matrix Multiplication

In [None]:
import sys

STELLAR_CONTAINER = True
if STELLAR_CONTAINER:
    sys.path.append('/home/stellar/git/phyfleaux/')

def printout(*args):
    for arg in args:
        print(arg)
        print()

from phyfleaux.api import numpy as np
from phyfleaux.api.directives import task

list_a = [1, 2, 3, 4]
list_b = [1, 0, 0, 1]

N = 2
a = np.array(list_a).reshape((N, N))
b = np.array(list_b).reshape((N, N))


@task
def matmul(a, b, N):
    c = np.zeros((N, N))
    for i in range(N):
        for j in range(N):
            for k in range(N):
                c[i][j] += a[i][k] * b[k][j]
    return c


print(matmul(a, b, N))
print(matmul(a, b, N))
print(matmul(a, b, N))
print(matmul(a, b, N))
print(matmul.called)

import inspect

# print(':', inspect.findsource(matmul))

In [None]:
# Numpy Code

import numpy as np

N = 2

# data buffers
list_a = [1, 2, 3, 4]
list_b = [1, 0, 0, 1]

# build NumPy arrays from the lists above (np.array copies the list data)
a = np.array(list_a).reshape((N, N))
b = np.array(list_b).reshape((N, N))

# naive implementation of matrix multiplication
def matmul(a, b, N):
    c = np.zeros((N,N), dtype=np.int64)
    for i in range(N):
        for j in range(N):
            for k in range(N):
                c[i][j] += a[i][k] * b[k][j]
    return c

# calculate the result
c = matmul(a, b, N)

# print arrays
printout(a, b, c)
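
As a sanity check on the naive triple loop, the sketch below compares it with NumPy's built-in `@` operator. It uses plain NumPy only; the name `matmul_naive` and the shape check are additions for illustration and are not part of Phyfleaux.

```python
import numpy as np


def matmul_naive(a, b):
    """Triple-loop matrix product; shapes are read from the inputs."""
    n, m = a.shape
    m2, p = b.shape
    if m != m2:
        raise ValueError("inner dimensions do not match")
    # Use a result dtype that can hold products of both inputs.
    c = np.zeros((n, p), dtype=np.result_type(a, b))
    for i in range(n):
        for j in range(p):
            for k in range(m):
                c[i, j] += a[i, k] * b[k, j]
    return c


a = np.array([1, 2, 3, 4]).reshape((2, 2))
b = np.array([1, 0, 0, 1]).reshape((2, 2))  # identity matrix

c = matmul_naive(a, b)
# Multiplying by the identity returns `a`, and both agree with `a @ b`.
print(np.array_equal(c, a @ b))  # True
```

Because `b` is the identity matrix here, the result equals `a`, which makes a wrong loop index easy to spot.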

In [None]:
# Phyfleaux Code

from phyfleaux.api import numpy as np     # <== changed
from phyfleaux.api.directives import task # <== added

N = 2

list_a = [1, 2, 3, 4]
list_b = [1, 0, 0, 1]

a = np.array(list_a).reshape((N, N))
b = np.array(list_b).reshape((N, N))


@task                                    # <== added
def matmul(a, b):
    c = np.zeros((N,N), dtype=np.int64)
    for i in range(N):
        for j in range(N):
            for k in range(N):
                c[i][j] += a[i][k] * b[k][j]
    return c

c = matmul(a, b)
printout(a, b, c)


In [None]:
from functools import partial

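class memoize(object):
    """Method decorator that caches results per instance and per argument set."""
    def __init__(self, func):
        self.func = func
    def __get__(self, obj, objtype=None):
        # Accessed on the class: return the raw, uncached function.
        if obj is None:
            return self.func
        # Accessed on an instance: bind the instance as the first argument.
        return partial(self, obj)
    def __call__(self, *args, **kw):
        obj = args[0]
        try:
            # Name mangling stores this attribute as obj._memoize__cache.
            cache = obj.__cache
        except AttributeError:
            cache = obj.__cache = {}
        key = (self.func, args[1:], frozenset(kw.items()))
        try:
            res = cache[key]
        except KeyError:
            res = cache[key] = self.func(*args, **kw)
        return res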


# example usage
class Test(object):
    v = 0
    @memoize
    def inc_add(self, arg):
        self.v += 1
        return self.v + arg

t = Test()
print(t.inc_add(2))
print(t.inc_add(2))
print(Test.inc_add(t, 2) != Test.inc_add(t, 2))
assert t.inc_add(2) == t.inc_add(2)
assert Test.inc_add(t, 2) != Test.inc_add(t, 2)

## overview
# scratchpad
Phyfleaux constructs an execution-tree accompanied by callbacks returning data for each context, i.e., a _symbol-table_ returning the _view_ of data for each namespace.
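
To make the idea of a symbol table that returns a _view_ of data per namespace more concrete, here is a minimal sketch. It is not Phyfleaux code: the class `SymbolTable`, its methods, and the namespace names are hypothetical, and NumPy views stand in for whatever view mechanism the framework uses.

```python
import numpy as np


class SymbolTable:
    """Hypothetical table: (namespace, name) -> callback returning a view."""

    def __init__(self):
        self._callbacks = {}

    def register(self, namespace, name, callback):
        self._callbacks[(namespace, name)] = callback

    def view(self, namespace, name):
        # The callback is evaluated on every lookup, so the caller
        # always sees the current contents of the underlying buffer.
        return self._callbacks[(namespace, name)]()


buffer = np.arange(4)  # one shared data buffer

table = SymbolTable()
# Two namespaces look at the same buffer through different views.
table.register("global", "a", lambda: buffer.reshape((2, 2)))
table.register("matmul", "row0", lambda: buffer[:2])

buffer[0] = 10  # a write to the buffer is visible through both views
print(table.view("global", "a")[0, 0])  # 10
print(table.view("matmul", "row0")[0])  # 10
```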

__Identify the problem__


### Example 01: Matrix Multiplication
```py
import numpy as np

data_a = [1, 2, 3, 4]
data_b = [1, 0, 1, 0]

a = np.array(data_a).reshape((2, 2))
b = np.array(data_b).reshape((2, 2))
```

- PhySL ⇒ 
- Code representations:
  - AST (PAST)
  - PhySL (in memory (Python objects), human readable, …)
  - ISL (Polyhedral / PyTiramisu)
- Python program => PhySL
- Code analyses
- Code transformations
- Code generation

## Input
- PhySL expression-trees
- Pointers to buffers containing the input data
- Context or context configurations
  - CPU: SIMD, MIMD, (…?)
  - GPU: SIMT, (…?)
  - Distributed
  - Any combination of the above three contexts

---> Phyfleaux has parsers for both PhySL and Python code

---

# Overview
The Phyfleaux framework runs PhySL expression-tree(s) within one or more _contexts_. Its high-productivity Python API can be used to transform PhySL expression-trees and to generate "optimized" code from them just-in-time (JIT).

- **Context** is any PhySL execution-engine with hooks for code transformation and generation, scheduling, and profiling of PhySL expressions. Phyfleaux's default execution-engine is [Phylanx](https://github.com/stellar-group/phylanx).
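
The hooks named in this definition can be sketched as an abstract interface. Everything below is an assumption made for illustration: the class names `Context` and `CPythonContext`, the method names, and the use of a Python `ast` expression in place of a PhySL expression-tree. It is not the Phyfleaux or Phylanx API.

```python
import ast
import time
from abc import ABC, abstractmethod


class Context(ABC):
    """Hypothetical execution engine with the four kinds of hooks."""

    @abstractmethod
    def transform(self, tree):
        """Return a (possibly rewritten) expression tree."""

    @abstractmethod
    def generate(self, tree):
        """Return a callable built from the expression tree."""

    def schedule(self, executable, **names):
        # Trivial scheduler: run immediately in the calling thread.
        return executable(**names)

    def profile(self, executable, **names):
        # Time one scheduled run and return (result, seconds).
        start = time.perf_counter()
        result = self.schedule(executable, **names)
        return result, time.perf_counter() - start


class CPythonContext(Context):
    """Stand-in engine: a Python `ast` expression plays the role of PhySL."""

    def transform(self, tree):
        return tree  # identity transformation

    def generate(self, tree):
        code = compile(tree, "<sketch>", "eval")
        return lambda **names: eval(code, {"__builtins__": {}}, names)


ctx = CPythonContext()
tree = ast.parse("a * b + 1", mode="eval")
executable = ctx.generate(ctx.transform(tree))
result, seconds = ctx.profile(executable, a=2, b=3)
print(result)  # 7
```

A real context would replace the identity `transform` with polyhedral or rule-based rewrites and would hand the generated code to its own runtime.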


## Architecture
- _data_: memory model
- _computation_: SIMD SIMT MIMD
- _communication_:  

## Who is it for?
Big data application developers.
- __big-data:__ when the amount of data (number of elements) is "much" larger than memory capacity
- mesh-based (people dealing with data on a mesh field, ...)


## What does it do?
1. provides an abstract(?) data-object :class:`phyfleaux.tensor` on top of NumPy's :class:`numpy.ndarray` to transparently manage communication and portability between CPU(s) and GPU(s).
   - _tensors_ are  of data-flow analysis
2. includes a suite of _static_ (compile-time) transformations
    - Polyhedral
    - Rule-Based
3. code generation
    - Phylanx

_Code_ is machine (_context_) specific while _program_ is **NOT**!

---

```cpp
phyfleaux::task::cpu::
phyfleaux::data::cpu::

phyfleaux::task::gpu::
phyfleaux::data::gpu::
```

or 

```cpp
phyfleaux::cpu::task::
phyfleaux::cpu::data::
phyfleaux::gpu::task::
phyfleaux::gpu::data::
```

---

<!--

Phyfleaux generates "optimized" execution-tree of *task-graph* where *tasks* are memory-bound subtrees   executables for the *contexts* running PhySL *tasks*. Executables may include extra statements, and 

. Code generation usually preceded by series of code transformations based on the architecture, data layout, data access pattern, and runtime environment of the context. performance-metrics, .

... cost measurement handles  
*task-graph*. 

...Nodes of the graph are *tasks*, i.e., identified Python functions. 

builds the task-graph from the Phylanx expression-tree (PhySL). *Task* is   order to improve (minimize) the execution time, Phyfleaux transforms s (a.k.a., PhySL) into task-graphs. . application program run as a set of *tasks* on one or more execution environment(s) which we call *context*. A context consists of a task-graph, target architecture(s) (MIMD, SIMD, SIMT, ...), and also one or more runtime systems (interpreters, executors) running the expressions on target architecture(s).

In short, Phyfleaux:

1. generates task-graph 
   - generate the expression-tree (PhySL) 
   - merging expression-trees of Python functions

2. allocates subexpressions to available architecture(s)
   - Minimizing application's execution time by maximizing throughputs of its (Python) functions.

3. generates executable(s), JIT, for context's runtime

4. deploys code along with its data to Agave and Kubernetes
   - through [JetLag](https://github.com/STEllAR-GROUP/JetLag.git), as needed.

5. performance data is collected both at thread-level, through [APEX](https://github.com/khuck/xpress-apex), and task-level for 
an expression-tree (PhySL),

where each node is a [function definition](https://greentreesnakes.readthedocs.io/en/latest/nodes.html#FunctionDef) and edges represent data dependency between two tasks.
- There is a cost function associated with each task
- Measurements returned by the cost functions are used to maximize the application's computational throughput

Phyfleaux:

1. *statically* generates program's expression-tree (PhySL) __(,  a.k.a., *task-graph/tree*)?__- here by statically we mean any transformation applied after function definition but before its first invocation. or enforced code-generation ("compile-time transformations"?).

2. based on context's tasks and resources, Phyfleaux applies series of transformations to improve program's throughput. Transformation may be triggered by the user, implicitly or explicitly (), . selected by framework after analyzing data (type, layout, ...) and tasks (parallel, iterative, ...) of the application,  

3. For each runtime, Phyfleaux generates executable(-code?) of the invoked tasks based on target contexts.

There are two classes of transformations:

1. Data transformation
2. Computation transformation

Two classes of transformation are available. First, task transformations

### Create Tasks
`@task` is used to instantiate a task from a Python function


## Context
- architecture: parallel
CUDA
OPENMP
SIMD

GPU
CPU 


## Task
- wraps Python functions in *tasks*
- builds the expression-tree of these tasks (from Python AST)
- if necessary, applies transformations based on (a) invocation context and/or (b) dynamic performance measurements
- generates the executable JIT: either once the task (Python function) is called, or the user triggered the cod
To Do:
- deployment
   * kubernetes
   * Agave
- performance
   * APEX/Phylanx
   * adaptivity
- visualization
   * performance data
   * task graph
-->

In short, Phyfleaux aims to maximize the throughput of a Python application by exploiting the application's explicit and implicit concurrency and parallelism, running as many massively-parallel tasks as possible.

# Getting Started
Let's first make sure `phyfleaux` is on $PYTHONPATH:
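
One way to check this with the standard library alone is sketched below. The helper names `is_importable` and `ensure_importable` are made up for this example; the fallback directory in the last line is the container path used at the top of this notebook and may differ on other machines.

```python
import importlib
import importlib.util
import sys


def is_importable(module_name):
    """Return True if a top-level module can be found on the import path."""
    return importlib.util.find_spec(module_name) is not None


def ensure_importable(module_name, fallback_dir):
    """Append `fallback_dir` to sys.path if the module is not found yet."""
    if is_importable(module_name):
        return True
    if fallback_dir not in sys.path:
        sys.path.append(fallback_dir)
    # Make the import system notice the newly added directory.
    importlib.invalidate_caches()
    return is_importable(module_name)


# Prints True only if the package is found, possibly via the fallback path.
print(ensure_importable("phyfleaux", "/home/stellar/git/phyfleaux/"))
```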

In [None]:
# Alternative 3: Not quite sure how to get this working.

import phyfleaux as pfl

list_a = [1, 2, 3, 4] # buffer_a
list_b = [1, 0, 0, 1] # buffer_b

a = pfl.array(list_a).reshape((2, 2))
b = pfl.array(list_b).reshape((2, 2))
c = pfl.zeros((2,2))

with pfl.task as t:
    for i in range(2):
        for j in range(2):
            for k in range(2):
                c[i][j] += a[i][k] * b[k][j]

print(a)
print(b)
print(c)

## Task

<!-- Code blocks (essentially Python functions) with their associated Python AST (PAST) and corresponding expression-tree (PhySL). An easy way to create a :class:`Task` object is to use `@task`. Here is a task created from :func:`nothing` which does nothing. -->

In [None]:
from phyfleaux.api.directives import task

@task
def nothing():
    pass

type(nothing)

Tasks are Python functions associated with a memory-independent ID `id`, an AST `tree`, input data (to initiate the local symbol table) `input`, and a callback function (quantifying execution cost) `cost` as well as its derivative `derivative` (value, or yet another callback function).
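
The attributes listed above can be pictured with a small stand-in class. This is a sketch under assumptions, not the real `Task` type: `SketchTask` is a hypothetical name, the ID is assumed to be a hash of the source text, the cost is assumed to be the last measured wall-clock time, and `derivative` is left unset.

```python
import ast
import hashlib
import inspect
import textwrap
import time


class SketchTask:
    """Hypothetical stand-in for a Phyfleaux task; not the real class."""

    def __init__(self, func, source=None):
        self.func = func
        # Fall back to inspect when the caller does not pass the source text.
        source = textwrap.dedent(source or inspect.getsource(func))
        # `id`: derived from the source text, so it does not depend on where
        # the function object lives in memory (unlike the built-in id()).
        self.id = hashlib.sha1(source.encode("utf-8")).hexdigest()
        self.tree = ast.parse(source)  # Python AST of the function
        self.input = {}                # filled with arguments on each call
        self.last_seconds = None
        self.derivative = None         # placeholder; not modeled here

    def cost(self):
        """Assumed cost callback: wall-clock time of the last call."""
        return self.last_seconds

    def __call__(self, *args, **kwargs):
        bound = inspect.signature(self.func).bind(*args, **kwargs)
        bound.apply_defaults()
        self.input = dict(bound.arguments)  # seeds the local symbol table
        start = time.perf_counter()
        result = self.func(*args, **kwargs)
        self.last_seconds = time.perf_counter() - start
        return result


def add(x, y=1):
    return x + y


# The source is passed explicitly so the sketch also runs in environments
# where inspect cannot locate the defining file.
add_task = SketchTask(add, source="def add(x, y=1):\n    return x + y\n")
print(add_task(2))     # 3
print(add_task.input)  # {'x': 2, 'y': 1}
```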

In [None]:
def nothing():
    pass

# attributes of the pure Python function
python_attributes = nothing.__dir__()

@task
def nothing():
    pass

# Task attributes
fleaux_attributes = nothing.__dir__()

print(set(fleaux_attributes) - set(python_attributes))

print(nothing.__weakref__)

t = task(nothing)

In [None]:
@task
def foo(name='World'):
    print(f"Hello {name}!")

In [None]:
print(type(foo))

# Phyfleaux

## Context
### Runtime
+ Phylanx
   - run PhySL
+ CPython
   - run Python-3.8.3+
+ Numba
   - run LLVM IR


### Architecture
+ SIMT
   - GPU
 
+ SIMD and MIMD
   - CPU

+ Hybrid
   - CPU and GPU

Also, all the above in distributed settings.

## Task

<!-- FleCSI -->
## Data 

1. data associated with mesh elements (so-called mesh fields);
2. particle data;
3. data associated with physical models (equation of state, for example);
4. simulation state data that is set at launch time, but does not change afterward (physics configuration, for example);
5. simulation state data that varies during the run (cycle number, for instance).

## Kernel
A kernel is a pure function that accomplishes a single, well-defined job.
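
As a small illustration of what "pure" means here, the sketch below contrasts a pure kernel with an impure variant. Both function names are hypothetical and are not taken from FleCSI or Phyfleaux.

```python
def scale_kernel(values, factor):
    """Pure: the result depends only on the arguments; nothing is mutated."""
    return [factor * v for v in values]


def scale_in_place(values, factor):
    """Not pure: modifies the caller's list as a side effect."""
    for i in range(len(values)):
        values[i] *= factor


data = [1, 2, 3]

scaled = scale_kernel(data, 2)
print(scaled)  # [2, 4, 6]
print(data)    # [1, 2, 3]  (input left untouched)

scale_in_place(data, 2)
print(data)    # [2, 4, 6]  (input changed by the call)
```

Because a pure kernel never touches shared state, two calls on different inputs can run in any order or in parallel without changing the result.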
<!-- End FleCSI -->


# Related work
https://www.sciencedirect.com/topics/computer-science/execution-engine