# Profiling

To collect metrics on task runs, set the `profile` flag to `True` in the `Dag` constructor.

In [None]:
import time

import daglib

dag = daglib.Dag(name="example", description="This is an example DAG", profile=True)


@dag.task()
def task_1():
    """Do some stuff"""
    time.sleep(1)
    return [1, 2, 3]


@dag.task(final=True)
def task_2(task_1):
    """Do some other stuff"""
    return list(map(lambda x: x * 2, task_1))

In [None]:
dag.run()

Profiling data for the tasks executed in a DAG run is written as Avro records. The files are saved under a path matching the following pattern:

```
meta/profiling/{Dag.name}/{Dag.run_id}.avro
```
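
As a minimal sketch of how this pattern maps to concrete paths, the snippet below builds and parses such a path with `pathlib`. The helpers `profiling_path` and `split_profiling_path` are hypothetical names introduced here for illustration and are not part of daglib; the run ID `run-0001` is also made up.

```python
from pathlib import Path

# Root directory taken from the pattern above (assumes the default metadata location).
PROFILING_ROOT = Path("meta/profiling")


def profiling_path(dag_name: str, run_id: str) -> Path:
    """Hypothetical helper: build the expected Avro path for one DAG run."""
    return PROFILING_ROOT / dag_name / f"{run_id}.avro"


def split_profiling_path(path: Path) -> tuple[str, str]:
    """Hypothetical helper: recover (dag_name, run_id) from a path that follows the pattern."""
    return path.parent.name, path.stem


example = profiling_path("example", "run-0001")  # "run-0001" is a made-up run ID
print(example.as_posix())
print(split_profiling_path(example))
```

Parsing the path this way can be handy for grouping the Avro files by DAG name before loading them.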

In [None]:
from pathlib import Path

list(Path("meta/profiling/").rglob("*.avro"))

## Query Profiling Data

To access profiling records, query the `MetaDB`. The records are available in the `profiling` table.

In [None]:
from pathlib import Path

from daglib.metadata import MetaDB

db = MetaDB()

In [None]:
db.query("""
SELECT *
FROM profiling
""")

### Drop All Data from the Metadata DB

In [None]:
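db.drop()  # drops all files and directories in the metadata directory

# Check whether any Avro files remain on disk after the drop
list(Path("meta/profiling/").rglob("*.avro"))

# Query the profiling table again after the drop
db.query("""
SELECT *
FROM profiling
""")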

### Conducting Analytics on Profiling Data

Records for every run with profiling enabled are saved to the metadata directory, and all of them are loaded into the `profiling` table.

In [None]:
import time
import random

import daglib

for _ in range(5):  # create and run the DAG 5 times
    dag = daglib.Dag(name="example2", description="This is another example DAG", profile=True)


    @dag.task()
    def task_1():
        """Do some stuff"""
        time.sleep(random.randint(1, 3))
        return [1, 2, 3]


    @dag.task(final=True)
    def task_2(task_1):
        """Do some other stuff"""
        return list(map(lambda x: x * random.randint(1, 10), task_1))


    print(dag.run())

In [None]:
from daglib.metadata import MetaDB

db = MetaDB()

db.query("""
SELECT AVG(task_runtime) AS avg_task_runtime
FROM profiling
""")
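
To illustrate the kind of aggregate used above without depending on daglib, here is a small sketch that uses the standard library's `sqlite3` as a stand-in. It is not `MetaDB`, and its SQL dialect may differ from the one `MetaDB` uses. The table below has only the `task_runtime` column named in the query, and the runtime values are made up for the example.

```python
import sqlite3

# Stand-in database: an in-memory SQLite table, NOT daglib's MetaDB.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE profiling (task_runtime REAL)")

# Made-up runtimes in seconds, for illustration only.
conn.executemany(
    "INSERT INTO profiling (task_runtime) VALUES (?)",
    [(1.0,), (2.0,), (3.0,)],
)

# Same AVG aggregate as above, plus a few other common summary statistics.
row = conn.execute(
    """
    SELECT COUNT(*)          AS n_records,
           AVG(task_runtime) AS avg_task_runtime,
           MIN(task_runtime) AS min_task_runtime,
           MAX(task_runtime) AS max_task_runtime
    FROM profiling
    """
).fetchone()

print(row)
conn.close()
```

Whether the same `MIN`, `MAX`, and `COUNT` aggregates can be passed to `db.query` depends on the SQL engine behind `MetaDB`, which is not stated here.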

In [None]:
db.drop()