# Torcharrow: Tracing

Torcharrow programs are executed eagerly: every expression is evaluated bottom-up and statements are executed one after another. While this is fast and makes programs easy to debug, it does not allow the executed code to be inspected for analysis, optimization, or platform retargeting.

To get the best of both worlds, fast execution and ease of analysis, torcharrow introduces tracing. To create a torcharrow trace, follow three steps:
 * first, turn on tracing by calling
   
   * ```Trace.turn(on=True, types=(AbstractColumn, GroupedDataFrame))```; also call ```AbstractColumn.reset()``` so that symbolic variables always start at zero
 
 * next, run the program without any column data; you do so by calling the column and dataframe factory methods with types only, e.g.
   
   * ```Column(dtype = int64)``` or ```DataFrame(dtype = Struct([Field('a', int64), Field('b', string)]))```
 
 * finally, capture the trace as a list of single-assignment statements plus a variable name; the variable refers to the resulting object defined by this trace. To do so, call
   
   * ```stms = Trace.statements(); res = Trace.result()``` and then process the statements and the result name as needed.
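
To make the idea behind these steps concrete, here is a minimal sketch of how a tracer can record single-assignment statements while a program runs on symbolic, data-free columns. It is a toy model only: `ToyTrace` and `ToyColumn` are hypothetical names invented for this illustration and are not torcharrow's `Trace` or `Column` classes.

```python
import itertools


class ToyTrace:
    """Hypothetical stand-in for a tracer; NOT torcharrow's Trace class."""

    def __init__(self):
        self._ids = itertools.count()  # fresh ids: 0, 1, 2, ...
        self.statements = []           # recorded single-assignment statements
        self.last = None               # name bound by the most recent statement

    def record(self, expr):
        """Bind `expr` to a fresh name c<i> and remember the statement."""
        name = f"c{next(self._ids)}"
        self.statements.append(f"{name} = {expr}")
        self.last = name
        return name


class ToyColumn:
    """Symbolic column: carries a dtype and an id, but no data."""

    def __init__(self, trace, dtype, name=None):
        self.trace = trace
        self.dtype = dtype
        # A column created by the user records its own construction;
        # a column produced by an operation reuses the name recorded there.
        if name is None:
            name = trace.record(f"ToyColumn(dtype={dtype!r})")
        self.name = name

    def __add__(self, other):
        name = self.trace.record(f"ToyColumn.__add__({self.name}, {other.name})")
        return ToyColumn(self.trace, self.dtype, name)


trace = ToyTrace()
a = ToyColumn(trace, "int64")
b = ToyColumn(trace, "int64")
c = a + b

for stmt in trace.statements:
    print(stmt)
print("result:", trace.last)
```

The recorded statements here only illustrate the shape of a trace (one fresh name per statement, operations written as explicit method calls); they are not meant to be replayed.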
 
## Example: 100% semantics-preserving traces

Let's see this in practice. First we turn on tracing and run the program:


In [1]:
from torcharrow import Column, AbstractColumn, DataFrame, GroupedDataFrame, Struct, Field, int64, Trace, me

# turn on tracing
Trace.turn(on=True, types=(AbstractColumn, GroupedDataFrame))
AbstractColumn.reset()

# run the program on symbolic (data-free) dataframes
d0 = DataFrame(dtype=Struct([Field(i, int64) for i in ['a', 'b', 'c']]))
d1 = d0.select('*', e=me['a'] + me['b'])
str(d1)


"DataFrame({'a':Column([], id = c1), 'b':Column([], id = c2), 'c':Column([], id = c3), 'e':Column([], id = c5), id = c4})"

The result is an empty dataframe, but one with particular object ids. Next we capture the trace:

In [2]:
# capture the trace: the name of the result and the list of statements
d1_result = Trace.result()
d1_stms = Trace.statements()
(d1_result, d1_stms)

('c4',
 ["c0 = DataFrame(dtype=Struct([Field('a', int64), Field('b', int64), Field('c', int64)]))",
  "c4 = DataFrame.select(c0, '*', e=me.__getitem__('a').__add__(me.__getitem__('b')))"])

The right-hand side of each statement is a fully resolved and type-checked expression in normal form. Each statement has a unique identifier, namely c*i*, where *i* is the id of the column or dataframe produced by the statement's right-hand side.
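
Because every statement is plain Python source, this shape can be checked mechanically. The sketch below uses only the standard-library `ast` and `re` modules to verify that each statement is a single assignment to a distinct name of the form c*i*. The helper `check_single_assignment` is a hypothetical name for this illustration, not part of torcharrow, and the statement strings are copied from the output above.

```python
import ast
import re


def check_single_assignment(statements):
    """Return the assigned names; raise ValueError if the shape is violated."""
    seen = []
    for stmt in statements:
        tree = ast.parse(stmt)
        # Exactly one top-level statement, and it must be an assignment.
        if len(tree.body) != 1 or not isinstance(tree.body[0], ast.Assign):
            raise ValueError(f"not a single assignment: {stmt!r}")
        targets = tree.body[0].targets
        # Exactly one plain-name target (no tuple unpacking, no attributes).
        if len(targets) != 1 or not isinstance(targets[0], ast.Name):
            raise ValueError(f"target is not a plain name: {stmt!r}")
        name = targets[0].id
        if not re.fullmatch(r"c\d+", name):
            raise ValueError(f"unexpected identifier {name!r}")
        if name in seen:
            raise ValueError(f"{name!r} is assigned more than once")
        seen.append(name)
    return seen


stms = [
    "c0 = DataFrame(dtype=Struct([Field('a', int64), Field('b', int64), Field('c', int64)]))",
    "c4 = DataFrame.select(c0, '*', e=me.__getitem__('a').__add__(me.__getitem__('b')))",
]
print(check_single_assignment(stms))  # ['c0', 'c4']
```

Note that the check only parses the statements; it never executes them, so it does not need torcharrow to be installed.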

What can we do with such a trace? We can
 * analyze it for type correctness or for privacy flows
 * optimize and rewrite it
 * capture it, ship it to another machine, and re-execute it with or without data.
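
As a small illustration of the analysis use case, the sketch below walks the syntax tree of a trace statement and lists which columns it reads through `me.__getitem__(...)`, the kind of information a privacy-flow analysis might start from. It assumes the statement format shown in the output above; `columns_read` is a hypothetical helper written for this example, not a torcharrow function.

```python
import ast


def columns_read(statement):
    """Return the sorted column names accessed via me.__getitem__('<name>')."""
    names = set()
    for node in ast.walk(ast.parse(statement)):
        is_me_getitem = (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "__getitem__"
            and isinstance(node.func.value, ast.Name)
            and node.func.value.id == "me"
        )
        if not is_me_getitem:
            continue
        # Keep only string-literal arguments, i.e. statically known column names.
        for arg in node.args:
            if isinstance(arg, ast.Constant) and isinstance(arg.value, str):
                names.add(arg.value)
    return sorted(names)


stmt = "c4 = DataFrame.select(c0, '*', e=me.__getitem__('a').__add__(me.__getitem__('b')))"
print(columns_read(stmt))  # ['a', 'b']
```

This works on the text of the trace alone, which is what makes a trace easier to analyze than the eagerly executed program.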

For simplicity, we just rerun the trace:

In [3]:
# turn tracing off
Trace.turn(on=False)
AbstractColumn.reset()

# execute the statements
exec(';'.join(d1_stms))

# evaluate the result variable
str(eval(d1_result))

"DataFrame({'a':Column([], id = c1), 'b':Column([], id = c2), 'c':Column([], id = c3), 'e':Column([], id = c5), id = c4})"

Hurrah! `d1` and `eval(d1_result)` are structurally identical, including their object ids. This torcharrow trace preserved 100% of the original semantics.
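
The rerun above uses `exec` and `eval` in the notebook's global namespace. When a trace is replayed elsewhere, it can be tidier to run it in a dedicated namespace. The following sketch shows that pattern with toy statements; `replay`, `make`, and `double` are hypothetical names for this illustration. For a real trace, the namespace would instead hold the names the statements refer to (such as `DataFrame`, `Struct`, `Field`, `int64`, and `me`).

```python
def replay(statements, result, namespace):
    """Execute trace statements in their own namespace and return the result."""
    env = dict(namespace)  # copy, so the caller's dict is not mutated
    for stmt in statements:
        exec(stmt, env)    # each statement binds exactly one new name
    return env[result]


# Hypothetical stand-ins for the operations a trace would reference.
def make(values):
    return list(values)


def double(xs):
    return [2 * x for x in xs]


toy_stms = ["c0 = make([1, 2, 3])", "c1 = double(c0)"]
out = replay(toy_stms, "c1", {"make": make, "double": double})
print(out)  # [2, 4, 6]
```

Since `exec` runs arbitrary code, a trace received from another machine should only be replayed if its source is trusted.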

Next we will discuss the constraints a torcharrow program has to obey so that its traces are 100% semantics preserving...

## What are the constraints for 100% faithful traces?  (TO BE CONT'D)