# Generate classification data

Use this function to generate sample data sets. It wraps scikit-learn's **[make_classification](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_classification.html#sklearn-datasets-make-classification)**; see the link for a description of all parameters.
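To show roughly what such a wrapper does, here is a minimal sketch that calls `make_classification` directly and writes the result to CSV. The helper name `make_classification_frame`, the `feat_N` column names, the `labels` column name, and `random_state=1` are assumptions for illustration. This is not the MLRun function's actual source.

```python
import pandas as pd
from sklearn.datasets import make_classification


def make_classification_frame(
    n_samples=10_000,
    n_features=5,
    n_classes=2,
    weights=(0.5, 0.5),
    random_state=1,
    label_column="labels",
    **sk_params,
):
    """Hypothetical helper: build a labeled DataFrame with make_classification."""
    features, labels = make_classification(
        n_samples=n_samples,
        n_features=n_features,
        n_classes=n_classes,
        weights=list(weights),
        random_state=random_state,
        **sk_params,  # any other make_classification keyword, e.g. n_informative
    )
    # One column per feature, plus the ground-truth label column.
    columns = [f"feat_{i}" for i in range(n_features)]
    df = pd.DataFrame(features, columns=columns)
    df[label_column] = labels
    return df


df = make_classification_frame(n_informative=2)
df.to_csv("classifier-data.csv", index=False)
print(df.shape)  # (10000, 6)
```

Extra keyword arguments are forwarded unchanged, which mirrors how the task below passes `n_informative` through `sk_params`.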

### save

```python
from mlrun import code_to_function
from mlrun.platforms.other import auto_mount
import yaml

# Read the function metadata (name, handler, image, ...) from item.yaml
with open("item.yaml") as item_file:
    items = yaml.load(item_file, Loader=yaml.FullLoader)

# Set to True to use the GPU image instead of the one listed in item.yaml
gpus = False

fn_params = {
    "name": items["name"],
    "handler": items["spec"]["handler"],
    "kind": items["spec"]["kind"],
    "filename": items["spec"]["filename"],
    "image": items["spec"]["image"] if not gpus else "mlrun/ml-models-gpu",
    "description": items["description"],
    "categories": items["categories"],
    "labels": items["labels"],
}

# Create the MLRun function from the source file
fn = code_to_function(**fn_params)

# Save the function spec, then mount storage for runs in this environment
fn.export("gen_class_data.yaml")
fn.apply(auto_mount())
```

> 2021-02-17 09:48:16,970 [info] function spec saved to path: gen_class_data.yaml

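The save cell reads every `code_to_function` argument from `item.yaml`. The plain-Python sketch below shows which keys that file must provide and how the `gpus` flag swaps the image. `build_fn_params` is a hypothetical helper, and the dictionary values (including the CPU image `mlrun/ml-models`) are placeholders, not the contents of the real `item.yaml`.

```python
def build_fn_params(items, gpus=False, gpu_image="mlrun/ml-models-gpu"):
    """Hypothetical helper: collect code_to_function kwargs from a parsed item.yaml."""
    spec = items["spec"]
    return {
        "name": items["name"],
        "handler": spec["handler"],
        "kind": spec["kind"],
        "filename": spec["filename"],
        # Use the GPU image only when requested; otherwise keep the one in item.yaml.
        "image": gpu_image if gpus else spec["image"],
        "description": items["description"],
        "categories": items["categories"],
        "labels": items["labels"],
    }


# Placeholder for the dictionary that yaml.load returns for item.yaml.
items = {
    "name": "gen_class_data",
    "description": "placeholder description",
    "categories": ["placeholder-category"],
    "labels": {"author": "placeholder"},
    "spec": {
        "handler": "gen_class_data",
        "kind": "job",
        "filename": "gen_class_data.py",
        "image": "mlrun/ml-models",  # assumed CPU image
    },
}

print(build_fn_params(items)["image"])  # mlrun/ml-models
print(build_fn_params(items, gpus=True)["image"])  # mlrun/ml-models-gpu
```

A missing key raises `KeyError` immediately, which is usually easier to debug than a half-configured function.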

### example function

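```python
from mlrun import NewTask, mlconf

task_params = {
    "name": "tasks-generate-classification-data",
    "params": {
        "n_samples": 10_000,  # number of rows
        "m_features": 5,  # number of feature columns
        "k_classes": 2,  # number of classes
        "weight": [0.5, 0.5],  # class proportions
        "sk_params": {"n_informative": 2},  # extra make_classification arguments
        "file_ext": "csv",  # output file format
    },
}
```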
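`make_classification` rejects some parameter combinations, so it can help to check task parameters before launching a run. This sketch maps the task names onto `make_classification` names (the renaming `m_features -> n_features`, `k_classes -> n_classes`, `weight -> weights` is an assumption based on the parameter names) and checks three constraints scikit-learn enforces, using its defaults `n_redundant=2`, `n_repeated=0`, and `n_clusters_per_class=2`. `check_make_classification_params` is a hypothetical helper.

```python
def check_make_classification_params(params):
    """Hypothetical helper: map task params to make_classification kwargs and list violations."""
    kwargs = {
        "n_samples": params["n_samples"],
        "n_features": params["m_features"],
        "n_classes": params["k_classes"],
        "weights": params.get("weight"),
        **params.get("sk_params", {}),
    }
    # make_classification defaults for values the task does not set.
    n_informative = kwargs.get("n_informative", 2)
    n_redundant = kwargs.get("n_redundant", 2)
    n_repeated = kwargs.get("n_repeated", 0)
    n_clusters = kwargs.get("n_clusters_per_class", 2)

    errors = []
    if n_informative + n_redundant + n_repeated > kwargs["n_features"]:
        errors.append("informative + redundant + repeated features exceed n_features")
    if kwargs["n_classes"] * n_clusters > 2 ** n_informative:
        errors.append("n_classes * n_clusters_per_class exceeds 2 ** n_informative")
    weights = kwargs["weights"]
    if isinstance(weights, (list, tuple)) and len(weights) not in (kwargs["n_classes"], kwargs["n_classes"] - 1):
        errors.append("weights length must equal n_classes or n_classes - 1")
    return kwargs, errors


example_params = {
    "n_samples": 10_000,
    "m_features": 5,
    "k_classes": 2,
    "weight": [0.5, 0.5],
    "sk_params": {"n_informative": 2},
    "file_ext": "csv",
}
kwargs, errors = check_make_classification_params(example_params)
print(errors)  # []

# Four classes with 2 clusters each need more than 2 informative features.
_, errors = check_make_classification_params({**example_params, "k_classes": 4, "weight": None})
print(errors)  # ['n_classes * n_clusters_per_class exceeds 2 ** n_informative']
```

With the parameters used in this notebook all three checks pass: 2 + 2 + 0 = 4 ≤ 5 features, and 2 × 2 = 4 ≤ 2² clusters.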

### local

```python
from mlrun import run_local
from gen_class_data import gen_class_data

# Run the handler inside the notebook process
run_local(NewTask(**task_params), handler=gen_class_data)
```

> 2021-02-17 09:48:17,397 [info] starting run tasks-generate-classification-data uid=45226aa0fac845cb9418d043a9de0e75 DB=http://mlrun-api:8080


| project | uid | iter | start | state | name | labels | inputs | parameters | results | artifacts |
|---|---|---|---|---|---|---|---|---|---|---|
| default | ...a9de0e75 | 0 | Feb 17 09:48:17 | completed | tasks-generate-classification-data | v3io_user=eyals, kind=handler, owner=eyals, host=jupyter-eyals-666bf556fc-5v7bf | | n_samples=10000, m_features=5, k_classes=2, weight=[0.5, 0.5], sk_params={'n_informative': 2}, file_ext=csv | | classifier-data |


To track results, use `.show()` or `.logs()`, or in the CLI: `!mlrun get run 45226aa0fac845cb9418d043a9de0e75 --project default` or `!mlrun logs 45226aa0fac845cb9418d043a9de0e75 --project default`
> 2021-02-17 09:48:17,749 [info] run executed, status=completed


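After a run, a quick sanity check is to confirm that the class balance of the generated labels is close to the requested `weight`. This is a minimal pure-Python sketch. `class_proportions`, `matches_weights`, and the 0.02 tolerance are hypothetical choices. Loading the label column from the `classifier-data` artifact is left out because its storage path depends on your environment.

```python
from collections import Counter


def class_proportions(labels):
    """Return the fraction of samples in each class, keyed by class label."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cls: counts[cls] / total for cls in sorted(counts)}


def matches_weights(labels, weights, tol=0.02):
    """Check observed class fractions against the requested weights.

    Assumes classes are labeled 0..k-1, as make_classification produces.
    make_classification also flips a small share of labels at random
    (flip_y=0.01 by default), so an exact match is not expected.
    """
    observed = class_proportions(labels)
    return all(abs(observed.get(cls, 0.0) - w) <= tol for cls, w in enumerate(weights))


labels = [0, 1, 1, 0, 1, 0, 0, 1]
print(class_proportions(labels))  # {0: 0.5, 1: 0.5}
print(matches_weights(labels, [0.5, 0.5]))  # True
```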

### remote

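```python
# Run the same task as a Kubernetes job, storing artifacts under the configured artifact path
run = fn.run(NewTask(**task_params), artifact_path=mlconf.artifact_path)
```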

> 2021-02-17 09:48:17,755 [info] starting run tasks-generate-classification-data uid=2464662532594f7ead10d6f2e14ecceb DB=http://mlrun-api:8080
> 2021-02-17 09:48:17,905 [info] Job is running in the background, pod: tasks-generate-classification-data-n9bgh
> 2021-02-17 09:48:22,321 [info] run executed, status=completed
final state: completed


| project | uid | iter | start | state | name | labels | inputs | parameters | results | artifacts |
|---|---|---|---|---|---|---|---|---|---|---|
| default | ...e14ecceb | 0 | Feb 17 09:48:21 | completed | tasks-generate-classification-data | v3io_user=eyals, kind=job, owner=eyals, host=tasks-generate-classification-data-n9bgh | | n_samples=10000, m_features=5, k_classes=2, weight=[0.5, 0.5], sk_params={'n_informative': 2}, file_ext=csv | | classifier-data |


To track results, use `.show()` or `.logs()`, or in the CLI: `!mlrun get run 2464662532594f7ead10d6f2e14ecceb --project default` or `!mlrun logs 2464662532594f7ead10d6f2e14ecceb --project default`
> 2021-02-17 09:48:24,058 [info] run executed, status=completed
