In [172]:
import warnings
import os
os.environ.pop('DJANGO_SETTINGS_MODULE', '') 
os.environ['INPROGRESS'] = '0'
warnings.filterwarnings("ignore")

# Model Tracking

Model tracking means recording all inputs and outputs of a model or a service.
Because every input and output is stored automatically, we can track our model's performance over time.

Note:

    We can also use this feature for experiment tracking (to find the best model) and for model monitoring (to detect drift).
    For now, let's focus on just tracking inputs and outputs.

## Define a tracker

To track a model or a service, use

* `om.runtime.experiment()` - to create an *experiment*, which is just an append-only container for data (aka *tracker*, technically a *tracking provider*)
* `exp.track()` - to attach a tracker and automatically log all calls, inputs and outputs sent via the runtime (including via Python, the REST API and the command line)
* `exp.clear(force=True)` - to reset the experiment and delete all data. Note that this cannot be undone (hence `force=True` is required)

In [185]:
import omegaml as om

with om.runtime.experiment('myservice') as exp:
    exp.clear(force=True)
    exp.track('myservice')

experiment experiments/myservice may previously have logged to data/myservice, now using data/myservice


## Call the model

In [189]:
# generate some arbitrary calls, thus collecting tracking data
for i in range(10):
    model = om.runtime.model('myservice')
    result = model.predict({'rooms': 10})
    result.get()

## Check the recorded data

Using the `exp` tracker instance that we created in the `with ...` statement above,
we can get all our data.

```python
exp.data(run='*')
```

Why do we specify `run='*'`? This means we want to see all data ever stored in this experiment.

Each call to our model is recorded as a *run*, so each run represents exactly one call.
A run consists of several events:

* *start* - the start event, this starts the run
* *task_call* - the start of *model.predict()* by the runtime
* *task_success* - the result of *model.predict()* by the runtime
* *stop* - the stop event, this ends the run

(In case of a failure, there would be a `task_error` event instead of `task_success`.)
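
To make the event structure concrete, here is a minimal sketch that groups tracking rows by run and derives a status for each run. It is plain Python on hand-written sample rows that mirror the `run` and `event` columns of the tracking data. The helper name `summarize_runs` and the sample rows (including the failed run 3) are illustrative assumptions, not part of omegaml.

```python
from collections import defaultdict

# Hand-written sample rows mirroring the `run` and `event` columns of exp.data().
# Run 3 is a made-up failure to illustrate the task_error case.
rows = [
    {'run': 1, 'event': 'start'},
    {'run': 1, 'event': 'task_call'},
    {'run': 1, 'event': 'task_success'},
    {'run': 1, 'event': 'stop'},
    {'run': 2, 'event': 'start'},
    {'run': 2, 'event': 'task_call'},
    {'run': 2, 'event': 'task_success'},
    {'run': 2, 'event': 'stop'},
    {'run': 3, 'event': 'start'},
    {'run': 3, 'event': 'task_call'},
    {'run': 3, 'event': 'task_error'},
    {'run': 3, 'event': 'stop'},
]


def summarize_runs(rows):
    """Map each run number to 'success', 'error' or 'incomplete'."""
    events_by_run = defaultdict(set)
    for row in rows:
        events_by_run[row['run']].add(row['event'])
    summary = {}
    for run, events in sorted(events_by_run.items()):
        if 'task_error' in events:
            summary[run] = 'error'
        elif 'task_success' in events:
            summary[run] = 'success'
        else:
            # neither outcome recorded, e.g. the call is still in progress
            summary[run] = 'incomplete'
    return summary


print(summarize_runs(rows))
# {1: 'success', 2: 'success', 3: 'error'}
```

With real data, the rows could be obtained with `exp.data(run='*').to_dict('records')`, since `exp.data()` returns a pandas dataframe.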

In [190]:
exp.data(run='*')

,experiment,run,step,event,key,value,dt,node,userid,taskid,name
0,myservice,12,,stop,stop,,2024-11-27 10:49:48.746561,varda,patrick,,
1,myservice,1,,start,start,,2024-11-27 10:55:22.880591,varda,patrick,017db6c3-7864-4141-a44d-45e41cd63f62,
2,myservice,1,,task_call,omegaml.tasks.omega_predict,"{'args': ['myservice', [{'rooms': 10}]], 'kwar...",2024-11-27 10:55:22.891448,varda,patrick,017db6c3-7864-4141-a44d-45e41cd63f62,
3,myservice,1,,artifact,related,"{'name': 'related', 'data': '{""_id"": {""$oid"": ...",2024-11-27 10:55:22.893284,varda,patrick,017db6c3-7864-4141-a44d-45e41cd63f62,related
4,myservice,1,,stop,stop,,2024-11-27 10:55:22.927547,varda,patrick,017db6c3-7864-4141-a44d-45e41cd63f62,
5,myservice,1,,task_success,omegaml.tasks.omega_predict,"{'result': {'price': [20.700640412703976]}, 't...",2024-11-27 10:55:22.937261,varda,patrick,017db6c3-7864-4141-a44d-45e41cd63f62,
6,myservice,2,,start,start,,2024-11-27 10:55:22.984283,varda,patrick,1fdf7b24-5712-4887-8b57-c177f690c943,
7,myservice,2,,task_call,omegaml.tasks.omega_predict,"{'args': ['myservice', [{'rooms': 10}]], 'kwar...",2024-11-27 10:55:22.997963,varda,patrick,1fdf7b24-5712-4887-8b57-c177f690c943,
8,myservice,2,,artifact,related,"{'name': 'related', 'data': '{""_id"": {""$oid"": ...",2024-11-27 10:55:23.000065,varda,patrick,1fdf7b24-5712-4887-8b57-c177f690c943,related
9,myservice,2,,stop,stop,,2024-11-27 10:55:23.025295,varda,patrick,1fdf7b24-5712-4887-8b57-c177f690c943,


We can subset the data in various ways:

* `exp.data(..., event='<event name>')` - to select specific events (pass a list for multiple event names)
* `exp.data(..., since=<datetime>|'relative time')` - to select a specific time period
* `exp.data(..., <column>=<value>)` - to select some other column (pass a list of values for multiple matching values)

The data is returned as a pandas dataframe.

In [191]:
exp.data(run='*', event='task_call').iloc[0].value

{'args': ['myservice', [{'rooms': 10}]],
 'kwargs': {'rName': None, 'pure_python': False}}
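
Since the purpose of tracking is to keep inputs and outputs together, a natural next step is to pair each run's `task_call` input with its `task_success` result. The following is a minimal sketch in plain Python on hand-written sample rows. The helper name `pair_inputs_outputs` is hypothetical, and the reading of `args` as `[model name, model input]` is inferred from the output shown above. Only the keys visible in the notebook output are included, the truncated parts are left out.

```python
# Hand-written sample rows mirroring the run, event and value columns of exp.data().
# Only keys visible in the notebook output are used; truncated parts are omitted.
rows = [
    {'run': 1, 'event': 'task_call',
     'value': {'args': ['myservice', [{'rooms': 10}]],
               'kwargs': {'rName': None, 'pure_python': False}}},
    {'run': 1, 'event': 'task_success',
     'value': {'result': {'price': [20.700640412703976]}}},
    {'run': 2, 'event': 'task_call',
     'value': {'args': ['myservice', [{'rooms': 10}]],
               'kwargs': {'rName': None, 'pure_python': False}}},
    {'run': 2, 'event': 'task_success',
     'value': {'result': {'price': [20.700640412703976]}}},
]


def pair_inputs_outputs(rows):
    """Return {run: {'input': ..., 'output': ...}} from task_call/task_success rows."""
    pairs = {}
    for row in rows:
        entry = pairs.setdefault(row['run'], {'input': None, 'output': None})
        if row['event'] == 'task_call':
            # assumption: args is [model name, model input]
            entry['input'] = row['value']['args'][1]
        elif row['event'] == 'task_success':
            entry['output'] = row['value']['result']
    return pairs


for run, pair in pair_inputs_outputs(rows).items():
    print(run, pair['input'], '->', pair['output'])
```

Runs that failed or have not finished keep `output` as `None`, which makes them easy to filter out.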

In [221]:
exp.data(run='*', since='10m')

,experiment,run,step,event,key,value,dt,node,userid,taskid,name
0,myservice,1,,start,start,,2024-11-27 10:55:22.880591,varda,patrick,017db6c3-7864-4141-a44d-45e41cd63f62,
1,myservice,1,,task_call,omegaml.tasks.omega_predict,"{'args': ['myservice', [{'rooms': 10}]], 'kwar...",2024-11-27 10:55:22.891448,varda,patrick,017db6c3-7864-4141-a44d-45e41cd63f62,
2,myservice,1,,artifact,related,"{'name': 'related', 'data': '{""_id"": {""$oid"": ...",2024-11-27 10:55:22.893284,varda,patrick,017db6c3-7864-4141-a44d-45e41cd63f62,related
3,myservice,1,,stop,stop,,2024-11-27 10:55:22.927547,varda,patrick,017db6c3-7864-4141-a44d-45e41cd63f62,
4,myservice,1,,task_success,omegaml.tasks.omega_predict,"{'result': {'price': [20.700640412703976]}, 't...",2024-11-27 10:55:22.937261,varda,patrick,017db6c3-7864-4141-a44d-45e41cd63f62,
5,myservice,2,,start,start,,2024-11-27 10:55:22.984283,varda,patrick,1fdf7b24-5712-4887-8b57-c177f690c943,
6,myservice,2,,task_call,omegaml.tasks.omega_predict,"{'args': ['myservice', [{'rooms': 10}]], 'kwar...",2024-11-27 10:55:22.997963,varda,patrick,1fdf7b24-5712-4887-8b57-c177f690c943,
7,myservice,2,,artifact,related,"{'name': 'related', 'data': '{""_id"": {""$oid"": ...",2024-11-27 10:55:23.000065,varda,patrick,1fdf7b24-5712-4887-8b57-c177f690c943,related
8,myservice,2,,stop,stop,,2024-11-27 10:55:23.025295,varda,patrick,1fdf7b24-5712-4887-8b57-c177f690c943,
9,myservice,2,,task_success,omegaml.tasks.omega_predict,"{'result': {'price': [20.700640412703976]}, 't...",2024-11-27 10:55:23.031458,varda,patrick,1fdf7b24-5712-4887-8b57-c177f690c943,


## Calculating runtime statistics

Trackers also provide convenient access to some summary statistics:

* `exp.stats.latency()` - to calculate the latency of calls
* `exp.stats.throughput()` - to calculate the throughput (number of calls per time unit, defaults to 60 seconds)
* `exp.stats.utilization()` - to calculate the utilization (percent of theoretical throughput, measured as the maximum number of calls possible per time unit, given the latency)
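
As a rough sketch of what these statistics measure, the latency of a run can be reproduced by hand as the time between its *start* and *stop* events. The example below uses the timestamps of runs 1 and 2 from the tracking data shown earlier. The `utilization` function is only one possible reading of the definition in the list above, not necessarily how omegaml computes it, and both helper names are hypothetical.

```python
from datetime import datetime

# start and stop timestamps of runs 1 and 2, copied from the exp.data() output
events = [
    (1, 'start', '2024-11-27 10:55:22.880591'),
    (1, 'stop', '2024-11-27 10:55:22.927547'),
    (2, 'start', '2024-11-27 10:55:22.984283'),
    (2, 'stop', '2024-11-27 10:55:23.025295'),
]


def run_latencies(events):
    """Return {run: seconds between the run's start and stop events}."""
    stamps = {}
    for run, event, dt in events:
        stamps.setdefault(run, {})[event] = datetime.fromisoformat(dt)
    return {run: (s['stop'] - s['start']).total_seconds()
            for run, s in stamps.items()
            if 'start' in s and 'stop' in s}


def utilization(calls_per_unit, mean_latency, time_unit=60.0):
    """Assumed definition: observed calls / maximum back-to-back calls per time unit."""
    max_calls = time_unit / mean_latency
    return calls_per_unit / max_calls


latencies = run_latencies(events)
print(latencies)
# {1: 0.046956, 2: 0.041012}

mean_latency = sum(latencies.values()) / len(latencies)
print(utilization(calls_per_unit=len(latencies), mean_latency=mean_latency))
```

The two latencies agree with the `latency` column that `exp.stats.latency(run='*')` reports for runs 1 and 2.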

In [222]:
exp.stats.latency(run='*')

,run,experiment,step,event,key,value,dt,node,userid,taskid,name,latency
0,1,myservice,,start,start,"{'args': ['myservice', [{'rooms': 10}]], 'kwar...",2024-11-27 10:55:22.880591,varda,patrick,017db6c3-7864-4141-a44d-45e41cd63f62,related,0.046956
1,2,myservice,,start,start,"{'args': ['myservice', [{'rooms': 10}]], 'kwar...",2024-11-27 10:55:22.984283,varda,patrick,1fdf7b24-5712-4887-8b57-c177f690c943,related,0.041012
2,3,myservice,,start,start,"{'args': ['myservice', [{'rooms': 10}]], 'kwar...",2024-11-27 10:55:23.065562,varda,patrick,694921d4-85a2-4a85-ad49-e559850f2f8b,related,0.021865
3,4,myservice,,start,start,"{'args': ['myservice', [{'rooms': 10}]], 'kwar...",2024-11-27 10:55:23.117013,varda,patrick,b1892b07-c26c-4e58-af72-42fad28dc1b6,related,0.025702
4,5,myservice,,start,start,"{'args': ['myservice', [{'rooms': 10}]], 'kwar...",2024-11-27 10:55:23.172720,varda,patrick,83886313-2ee4-4c79-b977-8345ec9a4a19,related,0.033605
5,6,myservice,,start,start,"{'args': ['myservice', [{'rooms': 10}]], 'kwar...",2024-11-27 10:55:23.268160,varda,patrick,c1d336b1-2a8f-4476-8306-046bd8a4676f,related,0.042339
6,7,myservice,,start,start,"{'args': ['myservice', [{'rooms': 10}]], 'kwar...",2024-11-27 10:55:23.366372,varda,patrick,b82d2c67-1371-4af9-8706-49e13af75bf9,related,0.044595
7,8,myservice,,start,start,"{'args': ['myservice', [{'rooms': 10}]], 'kwar...",2024-11-27 10:55:23.465653,varda,patrick,baedea9a-279d-4bd1-aaa4-cc53c5faf79f,related,0.052719
8,9,myservice,,start,start,"{'args': ['myservice', [{'rooms': 10}]], 'kwar...",2024-11-27 10:55:23.564656,varda,patrick,fc4ebf43-4d86-4139-86fe-aa6a3a1582b1,related,0.038827
9,10,myservice,,start,start,"{'args': ['myservice', [{'rooms': 10}]], 'kwar...",2024-11-27 10:55:23.632611,varda,patrick,68147b63-5077-4576-bede-3ab75d36678a,related,0.033669


In [223]:
exp.stats.latency(run='*', percentiles=[.5, .9])

event,key,count,mean,std,min,50%,90%,max
metric,latency,11.0,0.034663,0.014629,0.0,0.038827,0.046956,0.052719


In [224]:
exp.stats.utilization(run='*')

event,key,count,mean,std,min,25%,50%,75%,max
metric,utilization,0.181818,0.001768,inf,0.0,0.000884,0.001768,0.002652,0.003535


In [225]:
exp.stats.throughput(run='*')

event,key,count,mean,std,min,25%,50%,75%,max
metric,group_latency,11.0,60.0,0.0,60.0,60.0,60.0,60.0,60.0
