# Describe Function
Compute summary statistics and generate summary plots for a table.
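As a rough illustration of what "summary statistics" means for a single numeric column, here is a minimal sketch that uses only the Python standard library. The helper `summarize_column` and the sample values are hypothetical and are not part of the describe function, which operates on a pandas dataframe.

```python
import statistics


def summarize_column(values):
    """Return basic summary statistics for one numeric column (illustrative only)."""
    # Quartiles with linear interpolation between the closest ranks.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "count": len(values),
        "mean": statistics.fmean(values),
        "std": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "25%": q1,
        "50%": median,
        "75%": q3,
        "max": max(values),
    }


# Hypothetical sample column.
column = [1.0, 2.0, 3.0, 4.0, 5.0]
stats = summarize_column(column)
for name, value in stats.items():
    print(f"{name}: {value}")
```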

## Required parameters

- `context`: the function context
- `table`: MLRun input pointing to a pandas dataframe (CSV or Parquet file path)
- `label_column`: name of the ground-truth label column
- `class_labels`: label for each class, used in tables and plots
- `plot_hist`: whether to plot histograms (default `True`); set this to `False` for large tables
- `plots_dest`: destination folder for the summary plots (relative to `artifact_path`)
- `update_dataset`: when the table is a registered dataset, update its charts in place

## Output artifacts

The function outputs the following artifacts, depending on the data types of the dataframe's columns:
1. histogram chart
2. violin chart
3. imbalance chart
4. correlation-matrix chart
5. correlation-matrix csv
6. imbalance-weights-vec csv
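
To make the last three artifacts more concrete, the sketch below shows one common way to derive class-imbalance weights and a correlation matrix. It is an assumption-laden illustration, not the function's actual implementation: the weighting uses the "balanced" heuristic `n_samples / (n_classes * class_count)`, the correlation is plain Pearson, and the helper names and toy data are hypothetical.

```python
from collections import Counter
from math import sqrt


def imbalance_weights(labels):
    """Return {class: weight} using the 'balanced' heuristic n / (k * count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Return {(col_a, col_b): r} for every pair of numeric columns."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}


# Hypothetical toy table: two numeric features plus a label column.
table = {
    "sepal_length": [5.1, 4.9, 6.3, 5.8, 7.1],
    "petal_length": [1.4, 1.4, 6.0, 5.1, 5.9],
}
labels = [0, 0, 1, 1, 1]

weights = imbalance_weights(labels)  # rarer classes get larger weights
corr = correlation_matrix(table)

print(weights)
print(round(corr[("sepal_length", "petal_length")], 3))
```

With this heuristic a perfectly balanced label column yields a weight of 1.0 for every class, so values far from 1.0 indicate imbalance.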


### MLRun configuration

In [1]:
import os
import mlrun
mlrun.set_environment(api_path='http://mlrun-api:8080',
                      artifact_path=os.path.abspath('./'))

('default', '/User/functions/describe')

## Save

In [2]:
import yaml

# load the function metadata; safe_load avoids constructing arbitrary Python objects
with open('item.yaml') as item_file:
    items = yaml.safe_load(item_file)
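
The `code_to_function` call in this section reads nine keys from the parsed `item.yaml`, so a missing key surfaces as a `KeyError`. The sketch below checks for those keys up front. The helper `missing_item_keys` is hypothetical, and the example values (image name, category, label) are placeholders rather than the real contents of `item.yaml`.

```python
# Keys that the export cell reads from the parsed item.yaml.
TOP_LEVEL_KEYS = ("name", "description", "categories", "labels")
SPEC_KEYS = ("kind", "handler", "filename", "image", "requirements")


def missing_item_keys(items):
    """Return the dotted names of the keys the export cell would fail to find."""
    missing = [key for key in TOP_LEVEL_KEYS if key not in items]
    spec = items.get("spec") or {}  # tolerate an absent or empty spec section
    missing += [f"spec.{key}" for key in SPEC_KEYS if key not in spec]
    return missing


# Hypothetical parsed content; every value here is a placeholder.
example_items = {
    "name": "describe",
    "description": "summary statistics and plots for a table",
    "categories": ["analysis"],
    "labels": {"author": "example"},
    "spec": {
        "kind": "job",
        "handler": "summarize",
        "filename": "describe.py",
        "image": "example/image",
        "requirements": [],
    },
}

incomplete_items = {"name": "describe", "spec": {"kind": "job"}}

print(missing_item_keys(example_items))     # []
print(missing_item_keys(incomplete_items))
```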

In [3]:
# create job function object from notebook code
fn = mlrun.code_to_function(items["name"],
                            kind=items["spec"]["kind"],
                            handler=items["spec"]["handler"],
                            filename=items["spec"]["filename"],
                            image=items["spec"]["image"],
                            description=items["description"],
                            categories=items["categories"],
                            labels=items["labels"],
                            requirements=items["spec"]["requirements"])

fn.export("describe.yaml")

> 2021-02-15 15:33:08,282 [info] function spec saved to path: describe.yaml


<mlrun.runtimes.kubejob.KubejobRuntime at 0x7f10d7ae9d90>

## Examples

In [4]:
fn.apply(mlrun.platforms.auto_mount())

<mlrun.runtimes.kubejob.KubejobRuntime at 0x7f10d7ae9d90>

In [5]:
from describe import summarize

DATA_URL = 'https://s3.wasabisys.com/iguazio/data/iris/iris_dataset.csv'

# new_task replaces the deprecated NewTask alias
task = mlrun.new_task(name="tasks-describe",
                      handler=summarize,
                      inputs={"table": DATA_URL},
                      params={'update_dataset': True,
                              'label_column': 'label'})

### Run Locally

In [6]:
run = mlrun.run_local(task)

> 2021-02-15 15:33:08,857 [info] starting run tasks-describe uid=e5b0dc5a8a32440fb38febf1e32b8d4f DB=http://mlrun-api:8080


| project | uid | iter | start | state | name | labels | inputs | parameters | results | artifacts |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| default | ...e32b8d4f | 0 | Feb 15 15:33:08 | completed | tasks-describe | v3io_user=eyals, kind=handler, owner=eyals, host=jupyter-eyals-666bf556fc-5v7bf | table | update_dataset=True, label_column=label | | histograms, violin, imbalance, imbalance-weights-vec, correlation-matrix, correlation |


to track results use .show() or .logs() or in CLI: 
!mlrun get run e5b0dc5a8a32440fb38febf1e32b8d4f --project default , !mlrun logs e5b0dc5a8a32440fb38febf1e32b8d4f --project default
> 2021-02-15 15:33:13,417 [info] run executed, status=completed


### Run Remotely

In [7]:
fn.run(task, inputs={"table": DATA_URL})

> 2021-02-15 15:33:13,424 [info] starting run tasks-describe uid=33a83c87ccea4920ac2832cfc6f09f32 DB=http://mlrun-api:8080
> 2021-02-15 15:33:13,569 [info] Job is running in the background, pod: tasks-describe-k8pbk
> 2021-02-15 15:33:22,392 [info] run executed, status=completed
final state: completed


| project | uid | iter | start | state | name | labels | inputs | parameters | results | artifacts |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| default | ...c6f09f32 | 0 | Feb 15 15:33:17 | completed | tasks-describe | v3io_user=eyals, kind=job, owner=eyals, host=tasks-describe-k8pbk | table | update_dataset=True, label_column=label | | histograms, violin, imbalance, imbalance-weights-vec, correlation-matrix, correlation |


to track results use .show() or .logs() or in CLI: 
!mlrun get run 33a83c87ccea4920ac2832cfc6f09f32 --project default , !mlrun logs 33a83c87ccea4920ac2832cfc6f09f32 --project default
> 2021-02-15 15:33:22,767 [info] run executed, status=completed


<mlrun.model.RunObject at 0x7f10c6d03110>