# Get Data Snapshot
 --------------------------------------------------------------------
This function takes a snapshot of the feature table for use in model training.

## Create and Test a Function 

In [None]:
import mlrun

The following code uses the `# nuclio: start-code` marker to instruct Nuclio to start processing code only from this location. It then performs basic Nuclio function configuration: it defines the function's base container image (`mlrun/mlrun`) and the function type (`job`), and adds a build command that installs the `v3io-frames` package.

> **Note:** You can add code to define function dependencies and perform additional configuration after the `# nuclio: start-code` marker.

In [None]:
# nuclio: start-code

In [None]:
%nuclio config spec.build.baseImage = "mlrun/mlrun"
%nuclio config kind = "job"
%nuclio cmd -c pip install v3io-frames==0.8.*

### Define a Data-Snapshot Function <a id="gs-step-ingest-data-define-function"></a>


In [None]:
from os import path, getenv, getcwd
import pandas as pd
import v3io_frames as v3f


# Read a snapshot of a feature table and log it as a dataset artifact
def snapshot_data(context, container, table_path, columns, format='csv'):
    
    client = v3f.Client("framesd:8081", container=container)
    client.execute(backend="kv", table=table_path, command="infer")
    df = client.read('kv', table_path, columns=columns)
    
    target_path = path.join(context.artifact_path, 'data')
    # Optionally print data to your logger
    context.logger.info('Saving snapshot data set to {} ...'.format(target_path))
    
    # Store the data set in your artifacts database
    context.log_dataset('snapshot_dataset', df=df, format=format,
                        index=False, artifact_path=target_path)
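
Before running on a cluster, it can help to check the handler's call sequence without a Frames service. The following is a minimal sketch, not part of the original notebook: `StubContext`, `StubLogger`, `StubClient`, and `snapshot_with_client` are hypothetical names, the stub client returns a list of dictionaries instead of a pandas DataFrame, and the client is passed in as an argument rather than created inside the function.

```python
from os import path


class StubLogger:
    """Hypothetical logger stand-in that only records messages."""
    def __init__(self):
        self.messages = []

    def info(self, message):
        self.messages.append(message)


class StubContext:
    """Hypothetical stand-in for the MLRun context; records calls only."""
    def __init__(self, artifact_path):
        self.artifact_path = artifact_path
        self.logger = StubLogger()
        self.logged = []

    def log_dataset(self, key, df=None, format='csv', index=False,
                    artifact_path=None):
        self.logged.append({'key': key, 'df': df, 'format': format,
                            'index': index, 'artifact_path': artifact_path})


class StubClient:
    """Hypothetical stand-in for the Frames client; returns canned rows."""
    def __init__(self, rows):
        self.rows = rows
        self.commands = []

    def execute(self, backend, table, command):
        self.commands.append((backend, table, command))

    def read(self, backend, table, columns=None):
        # Keep only the requested columns, as a column filter would
        return [{col: row[col] for col in columns} for row in self.rows]


def snapshot_with_client(context, client, table_path, columns, format='csv'):
    # Same steps as snapshot_data, with the client injected for testing
    client.execute(backend='kv', table=table_path, command='infer')
    df = client.read('kv', table_path, columns=columns)
    target_path = path.join(context.artifact_path, 'data')
    context.logger.info('Saving snapshot data set to {} ...'.format(target_path))
    context.log_dataset('snapshot_dataset', df=df, format=format,
                        index=False, artifact_path=target_path)


rows = [{'label': 1, 'bet_sum': 10.0, 'extra': 'x'},
        {'label': 0, 'bet_sum': 2.5, 'extra': 'y'}]
context = StubContext('/tmp/artifacts')
client = StubClient(rows)
snapshot_with_client(context, client, 'admin/feature-table', ['label', 'bet_sum'])
print(context.logged[0]['artifact_path'])
```

The stubs only verify the order of calls and the arguments passed; they do not validate anything about the real Frames or MLRun behavior.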

The following cell uses the `# nuclio: end-code` marker to mark the end of a Nuclio code section and instruct Nuclio to stop parsing the notebook at this point.<br>
> **IMPORTANT:** Do not remove the end-code cell.

In [None]:
# nuclio: end-code
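
To illustrate what the two markers delimit, here is a simplified sketch of selecting only the cells between `# nuclio: start-code` and `# nuclio: end-code`. This is not Nuclio's actual parser; `extract_marked_code` and the sample cell list are hypothetical and exist only to show the idea.

```python
START_MARKER = "# nuclio: start-code"
END_MARKER = "# nuclio: end-code"


def extract_marked_code(cells):
    """Return the source of the cells located between the two markers.

    `cells` is a list of code-cell strings, in notebook order.
    """
    collected = []
    inside = False
    for cell in cells:
        stripped = cell.strip()
        if stripped.startswith(START_MARKER):
            inside = True   # start collecting from the next cell
            continue
        if stripped.startswith(END_MARKER):
            break           # ignore everything after the end marker
        if inside:
            collected.append(cell)
    return "\n\n".join(collected)


cells = [
    "import mlrun",
    "# nuclio: start-code",
    "def handler(context):\n    return 'ok'",
    "# nuclio: end-code",
    "print('not part of the function')",
]
function_source = extract_marked_code(cells)
print(function_source)
```

In this sketch, the cells before the start marker and after the end marker are excluded from the extracted source, which is why test and deployment code in the notebook does not end up in the function.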

### Convert the Code to a Function

In [None]:
from mlrun import code_to_function, mlconf, mount_v3io

mlconf.dbpath = mlconf.dbpath or 'http://mlrun-api:8080'
mlconf.artifact_path = mlconf.artifact_path or f'{getenv("HOME")}/artifacts'


# Convert the notebook code between the Nuclio markers into an MLRun function
snapshot_data_func = code_to_function(name='snapshot-data')

In [None]:
# Set the source container and table path, and a local path for test artifacts
container = 'users'
test_path = path.join(getcwd(), 'test')
table_path = path.join(getenv('V3IO_USERNAME'), 'examples/model-deployment-pipeline/data/feature-table')

columns = ['label', 'socioeconomic_idx',
           'purchase_sum', 'purchase_mean', 'purchase_count', 'purchase_var',
           'bet_sum', 'bet_mean', 'bet_count', 'bet_var',
           'win_sum', 'win_mean', 'win_count', 'win_var']
format = 'csv'

envs = {'V3IO_USERNAME': getenv('V3IO_USERNAME'),
       'V3IO_ACCESS_KEY': getenv('V3IO_ACCESS_KEY')}
snapshot_data_func.set_envs(envs)
snapshot_data_func.apply(mount_v3io())
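
Note that `getenv` returns `None` when a variable is not set, in which case `path.join(getenv('V3IO_USERNAME'), ...)` raises a `TypeError` and the `envs` dictionary carries `None` values. The following sketch shows one way to fail early with a clearer message. The helper `collect_required_envs` and the sample values are hypothetical and are not part of MLRun or the original notebook.

```python
import os


def collect_required_envs(names, environ=None):
    """Return a dict of the named variables, or raise if any is missing."""
    environ = os.environ if environ is None else environ
    missing = [name for name in names if not environ.get(name)]
    if missing:
        raise EnvironmentError(
            'Missing required environment variables: ' + ', '.join(missing))
    return {name: environ[name] for name in names}


# Sample values for illustration only; in the notebook you would omit
# the second argument so that the helper reads from os.environ.
sample_environ = {'V3IO_USERNAME': 'admin', 'V3IO_ACCESS_KEY': 'example-key'}
checked_envs = collect_required_envs(['V3IO_USERNAME', 'V3IO_ACCESS_KEY'],
                                     sample_environ)
print(sorted(checked_envs))
```

The returned dictionary has the same shape as `envs` above, so it could be passed to `set_envs` in the same way.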

#### Run the Function on a Cluster <a id="gs-run-ingest-func-cluster"></a>


In [None]:
# Build the function's container image
snapshot_data_func.deploy()

##### Run the Function on the Cluster <a id="gs-run-ingest-func-on-the-cluster-run-function"></a>


In [None]:
snapshot_data_run = snapshot_data_func.run(name='snapshot_data',
                                           handler='snapshot_data',
                                           params={'container': container, 'table_path': table_path,
                                                   'columns': columns, 'format': format},
                                           artifact_path=test_path)

In [None]:
# Clean up the test artifacts
!rm -rf test/data

## Done