Initial Functions Module Thread. Do not merge #17

Closed
wants to merge 21 commits into from
176 changes: 176 additions & 0 deletions README.md
@@ -14,3 +14,179 @@

https://en.wikipedia.org/wiki/Bagua
```

  1. A one-liner, even before the tutorial, stating what xun is in general would be good.

  2. Also, add a note about technical requirements, at least exactly which Python versions are supported (I think some of the attributes you use only appear from a certain version onward).

# Tutorial

## Quick Start

A standalone example xun project file for computing Fibonacci numbers:

```python
import xun


context = xun.context(
    driver=xun.functions.driver.Sequential(),
    store=xun.functions.store.DiskCache('store'),
)


@context.function()
def fibonacci_number(n):
    return f_n_1 + f_n_2
    with ...:
        f_n_1 = (
            0 if n == 0 else
            1 if n == 1 else
            fibonacci_number(n - 1)
        )
        f_n_2 = fibonacci_number(n - 2) if n > 1 else 0


@context.function()
def fibonacci_sequence(n):
    return sequence
    with ...:
        sequence = [fibonacci_number(i) for i in range(n)]


def main():
    """
    Compute and print the first 10 fibonacci numbers
    """
    program = context.fibonacci_sequence.compile(10)
    sequence = program()
    for num in sequence:
        print(num)


if __name__ == '__main__':
    main()
```

Note that the `main` function defined here is optional. The project can either be run as a regular Python script (`python examples/fibonacci.py`, which calls `main`), or be run by xun, in which case xun executes the call given on the command line and `main` is not used. To run the program using xun, run the following:

This is one of the very first lines users will read, so maybe explicitly mark the difference between running the project as is and running it with xun, so that it is clear what xun is.

Also, just calling xun without main won't print anything. As a user, where could I find the results of my computations?


```bash
xun exec examples/fibonacci.py "fibonacci_sequence(10)"
```

To see a visualization of the call graph:

```bash
xun graph examples/fibonacci.py "fibonacci_sequence(10)"
```

## A closer look

Let's break down the code from `fibonacci_number` in the example above into four parts:

in()to 4 parts


```python
@context.function()
```
The decorator `@context.function()` marks this function as a context function, or a job. Context functions are meant to be executed in parallel, possibly on remote workers.

  1. Maybe add notes here about heavy loads and real-world examples (data retrieval, counting occurrences of the expression "python is cool" in the Library of Alexandria, etc.)

  2. Are all the context functions expected to be executed in parallel, or all the context functions but the main one? Can the main one be run in parallel? Maybe it actually deserves its own marker, @context.entry_function?


```python
def fibonacci_number(n):
```
The function definition is just a normal Python function definition.

  1. just (a) normal

  2. is it worth separating it into its own category then?

```python
return f_n_1 + f_n_2
```
The body of the function is regular Python. As expected, it has access to the function arguments, but it also has access to the variables defined in the special with constants statement.

  1. with constants or with constants or with_constants?
    Actually you use that all over the readme. Marking it everywhere would increase readability very much.

  2. ... statement. Note that because of this, your IDE, if you use one, will mark the code as incorrect.

```python
with ...:
    f_n_1 = (
        0 if n == 0 else
        1 if n == 1 else
        fibonacci_number(n - 1)
    )
    f_n_2 = fibonacci_number(n - 2) if n > 1 else 0
```
Statements of the form `with ...:` are what we refer to as with constants statements. They introduce new syntax and rules, which we'll go into in the next section. What is important to note here is that the recursive calls to `fibonacci_number(n)` are memoized in the context store (each result is saved in the store, so a call with the same arguments is computed only once) and can, after scheduling, be run in parallel.
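To make "memoized" concrete, here is a minimal plain-Python sketch that does not use xun. A hypothetical `memoize_in` decorator saves each result in a dictionary standing in for the context store, keyed on the function name and its arguments (which is also why arguments must be hashable). It only illustrates the idea and is not how xun builds its store keys.

```python
def memoize_in(store):
    """Hypothetical decorator: cache each result in `store`, keyed on the call."""
    def decorator(func):
        def wrapper(*args):
            key = (func.__name__, args)  # args must be hashable to form a key
            if key not in store:
                store[key] = func(*args)  # computed only on the first call
            return store[key]
        return wrapper
    return decorator


store = {}   # stands in for the context store
calls = []   # records which n were actually computed


@memoize_in(store)
def fibonacci_number(n):
    calls.append(n)
    return n if n < 2 else fibonacci_number(n - 1) + fibonacci_number(n - 2)


print(fibonacci_number(30))  # 832040
print(len(calls))            # 31: each n from 0 to 30 is computed once
```

Without the cache, the same recursion would evaluate `fibonacci_number(n)` for small `n` an exponential number of times.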


  1. as 'with constant(s)'
  • so mark once again the naming
  • decide if it is with constant() or with constant(s)
  1. context store, and can after scheduling, be run in parallel - something seems weird with commas.
    I would say:
  • context store and can after scheduling be run in parallel
  • context store and can, after scheduling, be run in parallel
  1. that we'll () move into in the...

  2. they introduce new synt(a)x

  3. we refer() to as

But important to note... be run in parallel.

Is this sentence really so important for the user? Does this happen only for recursive calls? What is "memoized"? Why do I care that they have to be "in the context store"? And you already promised me parallel execution of the whole method earlier.


In fact, `xun` works by first figuring out all the calls that will happen to context functions, building a call graph, and scheduling the calls such that any previous call that we may depend on is executed before we evaluate the current call. This requires the call graph to be a DAG, or directed acyclic graph.
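The scheduling described above can be sketched in plain Python with the standard library's `graphlib` module (Python 3.9+). This is an illustration under stated assumptions, not xun's scheduler: calls are represented as hypothetical `('fib', n)` and `('seq', n)` tuples, the call graph for `fibonacci_sequence(n)` is built by hand, and a dictionary plays the role of the store.

```python
from graphlib import TopologicalSorter


def fib_dependencies(n):
    """Calls made by the hypothetical call ('fib', n)."""
    return [] if n <= 1 else [('fib', n - 1), ('fib', n - 2)]


def build_call_graph(n):
    """Map each call reachable from ('seq', n) to the set of calls it depends on."""
    root = ('seq', n)
    graph = {root: {('fib', i) for i in range(n)}}
    pending = list(graph[root])
    while pending:
        call = pending.pop()
        if call in graph:
            continue  # already explored: each call appears once in the graph
        graph[call] = set(fib_dependencies(call[1]))
        pending.extend(graph[call])
    return graph


def run(n):
    graph = build_call_graph(n)
    results = {}  # plays the role of the store: call -> result
    # static_order() yields each call after all the calls it depends on, and
    # raises graphlib.CycleError if the graph is not a DAG.
    for call in TopologicalSorter(graph).static_order():
        kind, k = call
        if kind == 'fib':
            results[call] = k if k <= 1 else results[('fib', k - 1)] + results[('fib', k - 2)]
        else:
            results[call] = [results[('fib', i)] for i in range(k)]
    return results[('seq', n)]


print(run(10))  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```

Because every call appears only once in the graph, each Fibonacci number is evaluated exactly once, and calls with no path between them (for example `('fib', 0)` and `('fib', 1)`) could in principle run at the same time.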

## With Constants Statement


How will "with constants" help me? How will I, as a user of xun, know what to put into with constants?
Why can't I just do

@context.function()
def do_some_work(some_values):
    fixed_values = [fix(v) for v in some_values]
    data = dependency(fixed_values)
    result = expensive_computation(data)        

Python already knows that all the results must be available when dependent computations are done!
And you say yourself that heavy stuff should not be put into "with". Why can't I put everything outside of with then?

Or, conversely, why can't I put everything into "with"? You say that "@context.function" marks a job. But some jobs are called in the "with_constants" statement, and some are called outside of it!

To summarize:

  1. please describe in more detail how I should know how to divide all of that.
  2. what is the benefit of all of that? It looks like you only introduce new syntax, with no value added.
  3. show the "bad" examples (put everything into with / outside of with / no xun at all) and describe what the user will lose.


```python
@context.function()
def do_some_work(some_values):
    # `fix`, `dependency` and `expensive_computation` stand in for your own
    # code; `dependency` is another context function
    result = expensive_computation(data)
    return result
    with ...:
        data = dependency(fixed_values)
        fixed_values = [fix(v) for v in some_values]
```

depen(d)ency

In the above example, a job takes an iterable, `some_values`, as its argument, polishes the values in it, and calls another context function, `dependency`, that it depends on. Note that the order of the statements inside the with constants statement does not matter: `data` is assigned before `fixed_values`, even though it uses `fixed_values`. The syntax of with constants statements is similar to `where` clauses in Haskell and has rules that differ from standard Python. In general, the following rules apply to with constants statements:

  1. polishe(s) the values in it...

  2. similar to "where" clauses in Haskell


* Order of statements is arbitrary

Note (because one more comment, one less - who cares anymore?):
I still think that you might be the only person in the world who would want non-sequential order of statements 😹

* Calling context functions is only allowed within with constants statements


  1. And any functions in the module that you are calling from within "with constants" statements must be context functions or belong to a module (or be Python built-ins).

  2. Add tests for all of that? Calling context functions/free functions/imported functions/make_shared functions from inside and outside of the with statement.

* Only assignments and free expressions are allowed

  1. I am not familiar with term "free expression". What are those? What are the opposite?

  2. No reassignments allowed

* There can only be one with constants statement per context function

... one "with constants" statement()

* Any code in with constants statements will be executed during scheduling, so the heavy lifting should be done fully in the function body, not inside the with constants statement

With constants statements allow xun to figure out the order of calls needed to execute a xun program.
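The "order of statements is arbitrary" and "no reassignments" rules can be illustrated with a small sketch, assuming a block made only of single-name assignments (free expressions are not handled). It parses the block with the standard `ast` module, orders the assignments so that each one comes after the names it reads (using `graphlib`, Python 3.9+), and then executes them. The helpers `order_assignments` and `evaluate` are hypothetical; this is not how xun itself processes with constants statements.

```python
import ast
from graphlib import TopologicalSorter


def order_assignments(source):
    """Return the assignments in `source`, each placed after those it reads."""
    assignments = {}
    for stmt in ast.parse(source).body:
        if not (isinstance(stmt, ast.Assign) and len(stmt.targets) == 1
                and isinstance(stmt.targets[0], ast.Name)):
            raise SyntaxError('this sketch only supports `name = expression`')
        name = stmt.targets[0].id
        if name in assignments:
            raise SyntaxError(f'{name!r} is assigned more than once')
        assignments[name] = stmt
    # Each name depends on the other assigned names used in its expression.
    graph = {
        name: {node.id for node in ast.walk(stmt.value)
               if isinstance(node, ast.Name) and node.id in assignments}
        for name, stmt in assignments.items()
    }
    return [assignments[name] for name in TopologicalSorter(graph).static_order()]


def evaluate(source, namespace):
    """Execute the reordered assignments with `namespace` as globals."""
    module = ast.Module(body=order_assignments(source), type_ignores=[])
    exec(compile(module, '<with constants>', 'exec'), namespace)
    return namespace


block = '''
data = sum(fixed_values)
fixed_values = [v * 2 for v in some_values]
'''
print(evaluate(block, {'some_values': [1, 2, 3]})['data'])  # 12
```

Forbidding reassignment is what makes the reordering well defined: each name has exactly one defining expression, so the dependency graph between names is unambiguous.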

"With constants" statements allow()
or
"With constants" statement() allows


## Stores


For Stores and Drivers maybe refer to lines from original script (where you create them), so it's obvious what you are talking about


```python
import diskcache

from xun.functions.store import StoreMeta


class DiskCache(metaclass=StoreMeta):
    def __init__(self, cache_dir):
        self.cache_dir = cache_dir

    # Every method opens the on-disk cache in `cache_dir`, so all instances
    # created with the same directory see the same data.
    def __contains__(self, key):
        with diskcache.Cache(self.cache_dir) as cache:
            return key in cache

    def __delitem__(self, key):
        with diskcache.Cache(self.cache_dir) as cache:
            del cache[key]

    def __getitem__(self, key):
        with diskcache.Cache(self.cache_dir) as cache:
            return cache[key]

    def __iter__(self):
        with diskcache.Cache(self.cache_dir) as cache:
            return iter(cache)

    def __len__(self):
        with diskcache.Cache(self.cache_dir) as cache:
            return len(cache)

    def __setitem__(self, key, value):
        with diskcache.Cache(self.cache_dir) as cache:
            cache[key] = value
```

As calls to context functions finish executing, their results are saved in the store of the context. Stores are classes that satisfy the requirements of `collections.abc.MutableMapping`, are picklable, and whose state is shared between all instances (for `DiskCache`, every instance created with the same `cache_dir` sees the same data). Users can define their own stores by writing a class with the metaclass `xun.functions.store.StoreMeta`. The code above is in fact the entire implementation of the `xun.functions.store.DiskCache` store.
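As a sketch of those requirements, here is a hypothetical in-memory store, `SharedMemoryStore`, built only on `collections.abc.MutableMapping`. It keeps its data in a class attribute so that all instances share state, and it holds no unpicklable attributes. It deliberately leaves out `StoreMeta`, since this README does not spell out what that metaclass adds, so treat it as an illustration rather than a drop-in xun store.

```python
import pickle
from collections.abc import MutableMapping


class SharedMemoryStore(MutableMapping):
    """Hypothetical store whose instances all read and write one dict."""

    _data = {}  # class attribute, deliberately shared by every instance

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        self._data[key] = value

    def __delitem__(self, key):
        del self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


a = SharedMemoryStore()
b = pickle.loads(pickle.dumps(a))  # a copy, as a worker might receive it
a['fibonacci_number(10)'] = 55
print(b['fibonacci_number(10)'])  # 55
```

A class attribute is only shared within one Python process. A store used by workers in other processes needs a shared backend, such as the directory that `DiskCache` writes to.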


  1. and whos(e) state

  2. are pickl()able - at least they use this spelling in the python docs

  3. shared between all instances - instances of what? Do all StoreMeta objects share the same state, or all DiskCache objects?
    Are you talking about instances from only one execution (why are there many of them then? shouldn't there only be one "database" all the time?) or forever?

  4. How permanent is this? Can I access the state of a job finished a month ago from my new job? Can I get incorrect results if I name a job "flower" and a job "flower" was already executed before for something totally different?


## Drivers

Drivers are the classes responsible for executing programs. This includes scheduling the calls in the call graph and managing any concurrency. The quick start example above uses `xun.functions.driver.Sequential()`.
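To illustrate what scheduling and managing concurrency involve, here is a sketch of a hypothetical `run_concurrently` function; it is not xun's driver interface. It uses the incremental API of `graphlib.TopologicalSorter` (Python 3.9+) together with a `concurrent.futures` thread pool, submitting a call as soon as every call it depends on has finished.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter


def run_concurrently(graph, evaluate, max_workers=4):
    """Run every call in `graph` (call -> calls it depends on), starting each
    call as soon as all of its dependencies have finished."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not a DAG
    results = {}
    running = {}  # future -> call
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            for call in sorter.get_ready():
                running[pool.submit(evaluate, call, results)] = call
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for future in done:
                call = running.pop(future)
                results[call] = future.result()
                sorter.done(call)  # may make dependent calls ready
    return results


def evaluate_fibonacci(n, results):
    # The results of n - 1 and n - 2 are guaranteed to exist by the scheduling.
    return n if n < 2 else results[n - 1] + results[n - 2]


graph = {n: ({n - 1, n - 2} if n >= 2 else set()) for n in range(10)}
print(run_concurrently(graph, evaluate_fibonacci)[9])  # 34
```

With `max_workers=1`, the same loop runs one call at a time, which is the behaviour one would expect from a sequential driver.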


  1. concur(r)ency


## The `@xun.make_shared` decorator

```python
from math import radians

import numpy as np
import xun

# `context` is assumed to be defined as in the quick start example


def not_installed():
    pass


@xun.make_shared
def not_installed_but_shared():
    pass


@context.function()
def context_function():
    not_installed()  # Not OK
    not_installed_but_shared()  # OK
    radians(180)  # OK because the function is builtin
    np.array([1, 2, 3])  # OK because the function is defined in an installed module
```

Because context functions are pickled, any function they reference must either be importable from a module installed on the system (including builtins), or be represented in some other serializable way. `xun` comes with a decorator, `@xun.make_shared`, which can make many functions serializable and which you need to use if you wish to call functions defined in your project file from a context function.
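The reason functions need special treatment is how `pickle` serializes them: a function is stored by reference, as its module and qualified name, and the unpickling side imports it again. The standard-library sketch below shows both cases; it does not involve xun or `make_shared`.

```python
import math
import pickle

# math.radians lives in an installed (builtin) module, so pickle stores only a
# reference to it ("math" + "radians") and re-imports it when loading.
restored = pickle.loads(pickle.dumps(math.radians))
print(restored is math.radians)  # True

# A lambda has no importable name, so pickle cannot serialize it by reference.
square = lambda x: x * x
try:
    pickle.dumps(square)
    lambda_picklable = True
except (pickle.PicklingError, AttributeError):
    lambda_picklable = False
print(lambda_picklable)  # False
```

A function defined with `def` at the top level of your project file is also pickled by reference, so it can only be unpickled in a process that can import that same module.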


  1. either be installed ... (or) be represented differently. (what is "installed" on the system? Should I call Bill Gates because my Windows doesn't come up with "install function" module?)

  2. comes with a decorator, xun_make_shared, which can make many functions serializable and which you need to use... (otherwise it seems like which you need to use refers to serializable functions)

  3. "make many functions serializable" - maybe describe a bit more which functions can be made serializable and which probably not.

  4. Make it clear that this decorator refers to all the functions which are called from context_function body and does not apply to the functions referred to from within "with constants"

  5. And mark that this is important only when calling from "xun exec", but everything works without it when calling from "python file.py" (don't know why btw)

41 changes: 41 additions & 0 deletions examples/fibonacci.py
@@ -0,0 +1,41 @@
#!/usr/bin/env python3
import xun


context = xun.context(
    driver=xun.functions.driver.Sequential(),
    store=xun.functions.store.DiskCache('store'),
)


@context.function()
def fibonacci_number(n):
    return f_n_1 + f_n_2
    with ...:
        f_n_1 = (
            0 if n == 0 else
            1 if n == 1 else
            fibonacci_number(n - 1)
        )
        f_n_2 = fibonacci_number(n - 2) if n > 1 else 0


@context.function()
def fibonacci_sequence(n):
    return sequence
    with ...:
        sequence = [fibonacci_number(i) for i in range(n)]


def main():
    """
    Compute and print the first 10 fibonacci numbers
    """
    program = context.fibonacci_sequence.compile(10)
    sequence = program()
    for num in sequence:
        print(num)


if __name__ == '__main__':
    main()
59 changes: 59 additions & 0 deletions examples/quicksort.py
@@ -0,0 +1,59 @@
#!/usr/bin/env python3
import xun


"""
WARNING: in this example almost all the computations happen within the with
constants statement and will be run during scheduling. It cannot be
parallelized, and is intended only to show the syntax of with constants
statements.
"""


context = xun.context(
    driver=xun.functions.driver.Sequential(),
    store=xun.functions.store.Memory(),
)


@context.function()
def quicksort(iterable):
    result = []

    result.extend(lt_sorted)

    if len(pivot) == 1:
        result.append(pivot[0])

    result.extend(gt_sorted)

    return tuple(result)
    with ...:
        # Tuples are used because arguments must be hashable and lists are not
        lt_sorted = quicksort(lt) if len(lt) > 0 else tuple()
        gt_sorted = quicksort(gt) if len(gt) > 0 else tuple()

        # Workaround: generators can't be pickled, so build a list before
        # converting it to a tuple
        lt = tuple([item for item in L[1:] if item <= pivot[0]])
        gt = tuple([item for item in L[1:] if item > pivot[0]])

        pivot = L[:1]
        L = list(iterable)


def main():
    """
    Sort and print a tuple of numbers
    """
    input = (8, 4, 7, 5, 6, 0, 9, 2, 3, 1)

    print('input:', input)

    program = context.quicksort.compile(input)
    output = program()

    print('output:', output)


if __name__ == '__main__':
    main()
34 changes: 34 additions & 0 deletions examples/simple.py
@@ -0,0 +1,34 @@
import xun


context = xun.context(
    driver=xun.functions.driver.Sequential(),
    store=xun.functions.store.Memory(),
)


v = 3


def add(a, b):
    return a + b


@context.function()
def three():
    return v


@context.function()
def add3(a):
    return add(a, thr)
    with ...:
        thr = three()


@context.function()
def script(value):
    print("Result:", result)
    return result
    with ...:
        result = add3(value)
37 changes: 37 additions & 0 deletions examples/simple_issue.py
@@ -0,0 +1,37 @@
import xun


context = xun.context(
    driver=xun.functions.driver.Sequential(),
    store=xun.functions.store.Memory(),
)


v = 3


def add(a, b):
    return a + b


@context.function()
def three():
    return v


@context.function()
def add3(a):
    return add(a, three)
    with ...:
        # Unlike in simple.py, the constant here has the same name as the
        # context function `three` that it calls
        three = three()

Could you mark explicitly somewhere what's wrong with this example?
Took me a while to see :D

Contributor Author


I get that, will make an issue



@context.function()
def script():
    return result
    with ...:
        result = add3(2)


program = context.script.compile()
print(program())
8 changes: 5 additions & 3 deletions requirements-dev.txt
@@ -1,6 +1,8 @@
setuptools >=28
setuptools_scm
astor
astunparse
pyshd
pytest
pytest-runner
pyshd
setuptools >=28
setuptools_scm
sphinx
3 changes: 3 additions & 0 deletions requirements.txt
@@ -1,2 +1,5 @@
camille
diskcache
fastavro
matplotlib
networkx
2 changes: 2 additions & 0 deletions xun/__init__.py
@@ -9,6 +9,8 @@
from .core import SchemaError
from .core import args_hash
from .core import filename_from_args
from .functions import context
from .functions import make_shared
from .memoized import memoized

