# Lightweight component exercise

The goal of this exercise is to build a pipeline with two components created using ```kfp.components.func_to_container_op```.

The first one, **WRITE**,
- takes as input a string and a GCS path
- writes the input string to the GCS path
- outputs the GCS path.

The second one, **READ**,
- takes as input a GCS path
- reads its content and prints it.

## Import kfp modules

In [None]:
import kfp.components as comp
import kfp.dsl as dsl
import kfp.gcp as gcp
from kfp import Client as KfpClient

## Component **WRITE**

1. The easiest way to write directly to GCS is to use ```tf.io.gfile.GFile```. The price to pay is that the container image is quite heavy, because it has to include TensorFlow.
2. The second hint is that the function can be run on its own, outside a pipeline, if all its dependencies are installed.

### Start with the function ```write_to_gcs```

In [None]:
def write_to_gcs(content: str, output_path: str) -> str:
    """Simple function to write content to file in GCS"""
    from tensorflow.io import gfile
    with gfile.GFile(output_path, 'w') as f_out:
        f_out.write(content)
        
    return output_path

### Create the op ```write_to_gcs_op```

In [None]:
write_to_gcs_op = comp.func_to_container_op(write_to_gcs)

## Component **READ**

### Start with the function ```read_from_gcs```

In [None]:
def read_from_gcs(input_path: str) -> None:
    """Simple function to read content from a file on GCS"""
    from tensorflow.io import gfile
    with gfile.GFile(input_path, 'r') as f_in:
        for line in f_in.readlines():
            # each line keeps its trailing newline, so do not let print add another
            print(line, end='')

### Create the op ```read_from_gcs_op```

In [None]:
read_from_gcs_op = comp.func_to_container_op(read_from_gcs)
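
Both functions are plain Python, so they can be tried on a local file before being wrapped into ops. The sketch below is one possible way to do that and is not part of the exercise solution: `write_text`, `read_text` and `round_trip` are hypothetical helper names, and the fallback to the built-in `open` when TensorFlow is not installed is an assumption made only so that the snippet runs anywhere. `GFile` accepts local paths as well as `gs://` paths.

```python
import os
import tempfile

try:
    # Same import as in the exercise; GFile also works on local paths.
    from tensorflow.io import gfile
    open_file = gfile.GFile
except ImportError:
    # Assumption for this sketch only: use the built-in open() as a stand-in.
    open_file = open


def write_text(content: str, output_path: str) -> str:
    """Write content to output_path and return the path (mirrors write_to_gcs)."""
    with open_file(output_path, 'w') as f_out:
        f_out.write(content)
    return output_path


def read_text(input_path: str) -> str:
    """Return the whole content of input_path as a string."""
    with open_file(input_path, 'r') as f_in:
        return f_in.read()


def round_trip(content: str) -> str:
    """Write content to a temporary local file and read it back."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = write_text(content, os.path.join(tmp_dir, 'data'))
        return read_text(path)


print(repr(round_trip('0\n1\n')))
```

If the round trip returns the original string, the file logic is sound and any later failure in the pipeline is more likely to come from the image or the credentials.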

## Create the pipeline ```write_and_read```

- Use your user name in the pipeline name to make it unique
- Remember to apply the gcp secret ```'user-gcp-sa'```

In [None]:
@dsl.pipeline(
    name='Read and write',
    description='A pipeline that writes to a file in GCS and reads back the content'
)
def write_and_read(
    content: str='',
    gcs_path: dsl.types.GCSPath=''
):
    write_to_gcs_task = write_to_gcs_op(
        content=content, output_path=gcs_path).apply(gcp.use_gcp_secret('user-gcp-sa'))
    write_to_gcs_task.set_display_name('Write file to GCS')
    
    read_from_gcs_task = read_from_gcs_op(
        write_to_gcs_task.output).apply(gcp.use_gcp_secret('user-gcp-sa'))
    read_from_gcs_task.set_display_name('Read from file on GCS')

## Create and connect the client

If you run this notebook outside the cluster that hosts Kubeflow, set `GOOGLE_APPLICATION_CREDENTIALS` so that the client can authenticate. The service account needs to have the role `IAP-secured Web App User`.

In [None]:
# import os
# os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = '' # path to the JSON key file of the service account used to log in; it needs the role IAP-secured Web App User
# HOST = '' # url of the cluster e.g. https://demo-kubeflow.endpoints.lf-ml-demo.cloud.goog/pipeline
# CLIENT_ID = '' # The client ID used by Identity-Aware Proxy
# NAMESPACE = '' # user namespace (the name of a Kubernetes namespace, not a URL)

In [None]:
client = KfpClient(
# we are running inside the same Kubeflow cluster, so no arguments are needed
#     host=HOST,
#     client_id=CLIENT_ID,
#     namespace=NAMESPACE  
)
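
To use the same cell both inside and outside the cluster, one option is to pass only the settings that are actually filled in. The sketch below assumes that an empty string means "not set"; `client_kwargs` is a hypothetical helper name, not part of `kfp`.

```python
def client_kwargs(host: str = '', client_id: str = '', namespace: str = '') -> dict:
    """Return only the non-empty settings, keyed by kfp.Client argument name."""
    candidates = {'host': host, 'client_id': client_id, 'namespace': namespace}
    return {name: value for name, value in candidates.items() if value}


# Inside the cluster nothing is set, so the client would get no arguments.
print(client_kwargs())
# Outside the cluster the filled-in values are forwarded (placeholder values here).
print(client_kwargs(host='https://example.invalid/pipeline', client_id='my-client-id'))
```

The client would then be created with `KfpClient(**client_kwargs(HOST, CLIENT_ID, NAMESPACE))`, which falls back to the defaults of `kfp.Client` when everything is empty.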

## Run the pipeline

Run the pipeline using the method ```create_run_from_pipeline_func``` of the class ```kfp.Client```.

To make your GCS path unique, use the templates ```{{workflow.uid}}``` and ```{{pod.name}}```.
1. Why are we doing it?
2. Why can we do it?

In [None]:
client.create_run_from_pipeline_func(
    pipeline_func=write_and_read,
    arguments={'content': '0\n1\n', 
               'gcs_path': 'gs://lf-ml-demo-eu-w1/{{workflow.uid}}/{{pod.name}}/data'},
    experiment_name='01_single_write_and_read',
    run_name='001'
)
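
As a hint for the two questions about the templates, the toy sketch below imitates the placeholder substitution with plain string replacement. It is not how the workflow engine is implemented: `render` is a hypothetical helper, and the bucket name, UIDs and pod names are made-up values.

```python
def render(template: str, values: dict) -> str:
    """Replace each {{key}} placeholder with its value (toy stand-in for the engine)."""
    for key, value in values.items():
        template = template.replace('{{' + key + '}}', value)
    return template


# Hypothetical bucket; the placeholders are the ones used in the exercise.
TEMPLATE = 'gs://my-bucket/{{workflow.uid}}/{{pod.name}}/data'

# Two runs get different made-up identifiers, hence different paths.
run_a = render(TEMPLATE, {'workflow.uid': 'uid-aaaa', 'pod.name': 'write-and-read-1'})
run_b = render(TEMPLATE, {'workflow.uid': 'uid-bbbb', 'pod.name': 'write-and-read-2'})

print(run_a)
print(run_b)
```

Because the identifiers differ from run to run, two runs started with the same arguments never write to the same object.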