# More accurate ML example

Here we work with the famous Iris dataset. Our goal is to predict the species of a flower.

I won't try to build a perfect model; I will just show how easydags can be used for ML tasks.
Let's suppose we have two ideas for a machine learning model that are both good enough, and we want the final prediction to be the mean of the class probabilities predicted by those two models.

In this notebook we will run this task the easydags way!
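
To make the ensembling rule concrete before building the DAG, here is a minimal sketch in plain Python. The probability values and the helper names `average_probabilities` and `predicted_class` are made up for illustration; the notebook itself does the same averaging with NumPy arrays returned by `predict_proba`.

```python
def average_probabilities(probs_a, probs_b):
    """Element-wise mean of two tables of per-class probabilities."""
    return [
        [0.5 * (a + b) for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(probs_a, probs_b)
    ]


def predicted_class(row):
    """Index of the class with the highest probability."""
    return max(range(len(row)), key=row.__getitem__)


# Hypothetical probabilities for two flowers over three species
probs_model0 = [[0.8, 0.2, 0.0], [0.1, 0.5, 0.4]]
probs_model1 = [[0.6, 0.4, 0.0], [0.1, 0.3, 0.6]]

ensemble = average_probabilities(probs_model0, probs_model1)
labels = [predicted_class(row) for row in ensemble]
print(labels)
```

Note that the two models disagree on the second flower, and the averaged probabilities decide which species wins.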


# As a DAG


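We will need these nodes:

- read: loads the Iris data
- pre_pro: scales the features with MinMaxScaler
- model0: random forest trained on the raw data
- model1: random forest trained on the preprocessed data
- final: averages the probabilities predicted by both models
- save: writes the probabilities to probas.csv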

The steps to build and run are the following:

1. Define the function that each node will run (the usual first step before defining a DAG)
2. Create the nodes (note that in this example we do not declare the dependencies when creating them)
3. Define the dependencies using >> (that's the hard dependency operator)
4. Create the list of nodes from all the ExecNodes available in the environment. If you do not want to include every created node, build the list yourself as usual
5. Create the DAG with the list of nodes
6. Run the DAG
7. Check the HTML output in an iframe
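
To see what the hard dependencies of step 3 imply for execution order, the sketch below rebuilds this tutorial's graph with the standard library's `graphlib` (Python 3.9+). It is only an illustration of the dependency structure and not how easydags schedules nodes internally; the `dependencies` dictionary is transcribed by hand from the `>>` statements in the notebook, assuming a chain `a >> b >> c` means that `b` waits for `a` and `c` waits for `b`.

```python
from graphlib import TopologicalSorter

# node -> set of nodes it has to wait for (hard dependencies)
dependencies = {
    "read": set(),
    "pre_pro": {"read"},
    "model0": {"read"},
    "model1": {"pre_pro"},
    "final": {"read", "pre_pro", "model0", "model1"},
    "save": {"final"},
}

sorter = TopologicalSorter(dependencies)
sorter.prepare()

# Group nodes into batches whose members could run at the same time
batches = []
while sorter.is_active():
    ready = sorted(sorter.get_ready())
    batches.append(ready)
    sorter.done(*ready)

for step, batch in enumerate(batches):
    print(step, batch)
```

In this simplified view `pre_pro` and `model0` land in the same batch because both only wait for `read`. A real scheduler may start `model1` as soon as `pre_pro` finishes, without waiting for `model0`.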


In [1]:
from easydags import ExecNode, DAG, search_nodes
import time
from sklearn import datasets
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from sklearn.ensemble import RandomForestClassifier
from IPython.display import HTML

t = time.time()  # start time, used to report the total run time at the end




def read_data():
    iris = datasets.load_iris()
    X = iris.data
    y = iris.target
    return (X,y)

def pre_pro(**kwargs):
    X = kwargs['data'][0]
    y = kwargs['data'][1]
    scaler = MinMaxScaler()
    scaler.fit(X)
    X = scaler.transform(X)
    return (X,y)



def model0(**kwargs):
    # Trained on the raw data produced by the 'read' node
    X = kwargs['data'][0]
    y = kwargs['data'][1]
    clf = RandomForestClassifier(max_depth=4, random_state=0)
    clf.fit(X, y)

    return clf


def model1(**kwargs):
    # Trained on the scaled data produced by the 'pre_pro' node
    X = kwargs['pre_pro_data'][0]
    y = kwargs['pre_pro_data'][1]
    clf = RandomForestClassifier(max_depth=2, random_state=0)
    clf.fit(X, y)

    return clf

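def predict_ensemble(**kwargs):
    X_pre_pro = kwargs['pre_pro_data'][0]
    X = kwargs['data'][0]
    model0 = kwargs['model0']
    model1 = kwargs['model1']

    # Each model predicts on the same kind of features it was trained on
    res0 = model0.predict_proba(X)          # model0 was fit on the raw data
    res1 = model1.predict_proba(X_pre_pro)  # model1 was fit on the scaled data
    res2 = 0.5 * (res0 + res1)              # ensemble: mean of both probability arrays

    # Columns: model0 probabilities, model1 probabilities, then their mean
    res = np.concatenate((res0, res1, res2), axis=1)

    return res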


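def save_results(**kwargs):
    # Output of the 'final' node, which is created without an explicit output_name
    preds = kwargs['final_result']

    pd.DataFrame(preds).to_csv('probas.csv')

    # Nothing to pass downstream (the original `return _` only works inside IPython)
    return None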

node_read = ExecNode('read', output_name='data', exec_function=read_data)

node_pre_pro = ExecNode('pre_pro', output_name='pre_pro_data', exec_function=pre_pro)

node_model0 = ExecNode('model0', output_name='model0', exec_function=model0)

node_model1 = ExecNode('model1', output_name='model1', exec_function=model1)

node_model_esemble = ExecNode('final', exec_function=predict_ensemble)

node_write = ExecNode('save', exec_function=save_results)



node_read >> node_pre_pro >> node_model1

node_read >> node_model0

node_read >> node_model_esemble

node_pre_pro >> node_model_esemble

node_model0 >> node_model_esemble

node_model1 >> node_model_esemble >> node_write


# Collect every ExecNode currently defined in the global namespace
nodes = [obj for obj in list(globals().values()) if isinstance(obj, ExecNode)]

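dag = DAG(nodes, name='Real ML toy', max_concurrency=3, debug=False)

dag.execute()

from IPython.display import IFrame, display

# Wrap in display(): a bare IFrame(...) is only rendered when it is the last expression of a cell
display(IFrame(src=f"{dag.name}_states_run.html", width='100%', height=600))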



print(f'time: {int(time.time() - t)} seconds')

2023-06-15 11:11:43.742 | INFO     | easydags.node:execute:146 - Start executing read at 2023-06-15, 11:11:43
2023-06-15 11:11:43.745 | INFO     | easydags.node:execute:146 - Start executing pre_pro at 2023-06-15, 11:11:43
2023-06-15 11:11:43.745 | INFO     | easydags.node:execute:146 - Start executing model0 at 2023-06-15, 11:11:43
2023-06-15 11:11:43.747 | INFO     | easydags.node:execute:146 - Start executing model1 at 2023-06-15, 11:11:43
2023-06-15 11:11:43.889 | INFO     | easydags.node:execute:146 - Start executing final at 2023-06-15, 11:11:43
2023-06-15 11:11:43.900 | INFO     | easydags.node:execute:146 - Start executing save at 2023-06-15, 11:11:43


drawing
time: 0 seconds
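
The functions in the cell above all take `**kwargs` and look up their inputs by name (`kwargs['data']`, `kwargs['pre_pro_data']`, `kwargs['final_result']`). The following sketch imitates that convention with a tiny sequential runner. `run_in_order` and the three toy functions are hypothetical and not part of easydags; the `'<id>_result'` default name is an assumption inferred from `save_results` reading `kwargs['final_result']` for the node `'final'`, which sets no `output_name`.

```python
def run_in_order(steps):
    """Run (node_id, output_name, function) steps one after another.

    Hypothetical helper: each return value is stored under its output name
    and every stored value is handed to the later steps as keyword arguments.
    """
    outputs = {}
    for node_id, output_name, func in steps:
        # Assumed default, mirroring the tutorial: '<id>_result'
        key = output_name or f"{node_id}_result"
        outputs[key] = func(**outputs)
    return outputs


def read(**kwargs):
    return [1.0, 2.0, 4.0]


def scale(**kwargs):
    data = kwargs["data"]  # published by the 'read' step
    top = max(data)
    return [value / top for value in data]


def total(**kwargs):
    return sum(kwargs["scaled"])  # published by the 'scale' step


results = run_in_order([
    ("read", "data", read),
    ("scale", "scaled", scale),
    ("final", None, total),
])
print(results["final_result"])
```

Naming outputs explicitly keeps each function independent of the node ids upstream: a function only needs to know the key it reads from.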


# Final DAG

If you run this tutorial, you will get the DAG HTML yourself. Here I add a PNG version so you can check it out without running the tutorial:

![Motivation](https://raw.githubusercontent.com/magralo/easydags/main/resource_readme/dag_tut_ml_toy.png)
              