In [1]:
from stepfunctions.steps import states
from stepfunctions.workflow import Workflow
from disdat_step_function.caching_wrapper import Caching, PipelineCaching, LambdaGenerator

## Create lambda code and layer 
The code has to run somewhere, and the cheapest way to host disdat-step-function is AWS Lambda. Use the following code to generate the Lambda function.

**Warning**  
The following command takes some time because it builds a Docker image and compresses the libraries.

In [2]:
LambdaGenerator.generate(root='generated_code', force_rerun=True)

Behind the scenes, this started a Docker container with a volume mount and ran `pip install disdat` inside it. As a result, the installed C binaries are compatible with the AWS Lambda runtime.
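To make the idea concrete, here is a minimal sketch of what such a containerized install could look like. It only builds the `docker run` command and prints it. The image name, the `/var/task` mount point, and the helper `build_layer_command` are assumptions for illustration, not necessarily what `LambdaGenerator` does internally.

```python
import os
import shlex

# Assumption: a public build image that matches the Lambda Python 3.8 runtime.
BUILD_IMAGE = "public.ecr.aws/sam/build-python3.8"


def build_layer_command(root, package="disdat", image=BUILD_IMAGE):
    """Return a `docker run` command that pip-installs `package` into root/python."""
    host_dir = os.path.abspath(root)
    return [
        "docker", "run", "--rm",
        # Mount the output folder so the installed files outlive the container.
        "-v", f"{host_dir}:/var/task",
        image,
        # Lambda layers expect Python packages under a top-level `python/` folder.
        "pip", "install", package, "-t", "/var/task/python",
    ]


command = build_layer_command("generated_code")
print(shlex.join(command))
```

To actually run the install you could pass `command` to `subprocess.run(command, check=True)`, which requires a working Docker installation.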

### Create Lambda on AWS
Now that we have the Lambda code and the layer zip, it is time to create a Lambda function on AWS. Keep a few things in mind:
1. The generated code is in the `generated_code` folder.
2. Make sure the Lambda function has read and write permission to the S3 bucket you are using for data storage.
3. Make sure the Lambda function uses the Python 3.8 runtime.
4. Don't forget to upload the layer zip file, since disdat is not available on AWS Lambda by default!
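For the S3 permission in particular, a policy along the following lines could be attached to the Lambda execution role. This is a sketch under assumptions: the helper name `s3_read_write_policy` is made up for this tutorial, and the exact set of S3 actions disdat needs is not confirmed here, so you may have to add more (for example `s3:DeleteObject`).

```python
import json


def s3_read_write_policy(bucket_name):
    """Build an IAM policy document granting read/write access to one bucket."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [bucket_arn],
            },
            {
                # Reading and writing apply to the objects inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": [f"{bucket_arn}/*"],
            },
        ],
    }


policy = s3_read_write_policy("YOUR_BUCKET")
print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into the IAM console as an inline policy for the role your Lambda function runs under.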

## Option 1 - state-level caching

In [5]:
# create the caching object 
caching = Caching(caching_lambda_name="YOUR_LAMBDA_NAME", # for instance 'cache_lambda_worker'
                  s3_bucket_url='s3://YOUR_BUCKET',
                  context_name='tutorial_context',
                  verbose=True)

In [6]:
state_1 = states.Pass(state_id='state_1')
state_2 = states.Pass(state_id='state_2')
state_3 = states.Pass(state_id='state_3')

state_1 = caching.cache_step(state_1)
state_2 = caching.cache_step(state_2)
state_3 = caching.cache_step(state_3)

definition = states.Chain([state_1, state_2, state_3])

### Execute
Go to the Submission section at the end of this tutorial to submit and execute the workflow. You can then inspect the modified state machine in the AWS console:
![state-level](docs/state_level.png)

## Option 2 - pipeline-level caching 
If you have a long state machine with many states, wrapping each state by hand gets tedious, and it would be nice to add caching to the whole pipeline with one line of code.
Good news: disdat-step-function supports exactly that.

In [8]:
# the same state machine 
state_1 = states.Pass(state_id='state_1')
state_2 = states.Pass(state_id='state_2')
state_3 = states.Pass(state_id='state_3')
# normal definition 
definition = states.Chain([state_1, state_2, state_3])


caching = Caching(caching_lambda_name="YOUR_LAMBDA_NAME", # for instance 'cache_lambda_worker'
                  s3_bucket_url='s3://YOUR_BUCKET',
                  context_name='tutorial_context',
                  verbose=True)
# cache the pipeline annnd done!
PipelineCaching(definition, caching).cache()

### Execute
Go to the Submission section at the end of this tutorial to submit and execute the workflow. You can then inspect the modified state machine in the AWS console. This is what I obtained:
![pipeline](docs/pipeline_level.png)

Wait a second, why isn't the pipeline cached? By design, `PipelineCaching(definition, caching).cache()` only replaces `states.Task` states, which is where the real work happens. Replacing states such as `Wait` or `Pass` would make no sense and would add a lot of overhead!

If you replace `Pass` with proper `Task` objects, `PipelineCaching` will kick in and replace them!
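The selection rule can be pictured with a toy model. The classes below are stand-ins written for this tutorial, not the real `stepfunctions` or disdat classes, and the `cached_` prefix is a made-up naming choice. The sketch only illustrates the "wrap `Task`, leave everything else alone" behaviour described above.

```python
class State:
    """Toy stand-in for a Step Functions state."""

    def __init__(self, state_id):
        self.state_id = state_id


class Pass(State):
    pass


class Wait(State):
    pass


class Task(State):
    pass


class CachedTask(State):
    """Toy wrapper standing in for a Task surrounded by cache logic."""

    def __init__(self, inner):
        super().__init__(f"cached_{inner.state_id}")
        self.inner = inner


def cache_tasks_only(chain):
    """Wrap every Task in the chain and return all other states unchanged."""
    return [CachedTask(s) if isinstance(s, Task) else s for s in chain]


chain = [Pass("state_1"), Task("state_2"), Wait("state_3")]
cached = cache_tasks_only(chain)
print([type(s).__name__ for s in cached])
# prints ['Pass', 'CachedTask', 'Wait']
```

A chain made only of `Pass` states, like the one above in Option 2, comes back unchanged, which matches what the console showed.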

### Pipeline-level caching fixed

In [9]:
# the same state machine 
state_1 = states.Task(state_id='state_1')
state_2 = states.Task(state_id='state_2')
state_3 = states.Task(state_id='state_3')

# normal definition 
definition = states.Chain([state_1, state_2, state_3])

caching = Caching(caching_lambda_name="YOUR_LAMBDA_NAME", # for instance 'cache_lambda_worker'
                  s3_bucket_url='s3://YOUR_BUCKET',
                  context_name='tutorial_context',
                  verbose=True)
# cache the pipeline annnd done!
PipelineCaching(definition, caching).cache()

### Submit the workflow but DON'T execute it!
`states.Task` needs more configuration, such as Lambda name or EC2 ARN, which obviously we don't have. If you execute the state machine, you'll get an error.   

However, you should be able to see the augmented graph on AWS. Here's what I obtained:
![task](docs/task.png)

## Submission

Executing the following commands requires valid AWS credentials. Please make sure your IAM role has access to Step Functions.
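Before submitting, it can help to confirm which identity your credentials resolve to. The sketch below is one way to do that with the STS `get_caller_identity` call from boto3. The helper names `caller_arn` and `default_sts_client` are made up for this tutorial. Note that this only checks that the credentials are valid; it does not verify Step Functions permissions.

```python
def caller_arn(sts_client):
    """Return the ARN of the identity behind the given STS client."""
    return sts_client.get_caller_identity()["Arn"]


def default_sts_client():
    """Create a real STS client; needs boto3 and configured AWS credentials."""
    import boto3  # imported lazily so caller_arn can be used without boto3

    return boto3.client("sts")
```

With credentials configured, `print(caller_arn(default_sts_client()))` should print the ARN of your user or assumed role; an exception here usually means the credentials are missing or expired.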

In [16]:
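# NOTE: `config` is not defined anywhere in this notebook. It is assumed to be
# your own module exposing EXECUTION_ROLE, the ARN of the IAM role used to
# create and run the workflow.
workflow_name = 'simple_cached_workflow'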

target_flow = [flow for flow in Workflow.list_workflows() if flow['name'] == workflow_name]

# if the same workflow name is used, update the definition 
if len(target_flow) > 0:
    workflow = Workflow.attach(target_flow[0]['stateMachineArn'])
    workflow.update(definition=definition, role=config.EXECUTION_ROLE)
# otherwise create a new one 
else:
    workflow = Workflow(workflow_name, definition=definition, role=config.EXECUTION_ROLE)
    workflow.create()

### Execute
Supply some inputs to the state machine you just created and see what happens!

In [17]:
inputs = {'Hello': 'world'}
execution = workflow.execute(inputs=inputs)
result = execution.get_output(wait=True)
# the state machine doesn't do anything, it just passes the data along 
assert result == inputs