# Virtual Drift

Computes drift magnitude metrics between a base dataset `t` and a second dataset `u`.

Metrics:
- TVD (Total Variation Distance)
- Hellinger distance
- KL divergence (Kullback-Leibler)
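
For two discrete distributions `p` and `q` over the same bins, the three metrics take only a few lines of NumPy. This is a minimal sketch, not the function's own code. The Hellinger form `sqrt(0.5 * sum((sqrt(p) - sqrt(q))**2))` and the symmetric KL form `KL(p||q) + KL(q||p)` are assumptions. They were chosen because they reproduce the `class_shift_*` values the run reports for the class distributions of `t` and `u`. The helper names `tvd`, `hellinger`, and `symmetric_kld` are hypothetical.

```python
import numpy as np


def tvd(p, q):
    """Total variation distance: half the L1 distance between two pmfs, in [0, 1]."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(0.5 * np.abs(p - q).sum())


def hellinger(p, q):
    """Hellinger distance: sqrt(0.5 * sum((sqrt(p) - sqrt(q))**2)), in [0, 1]."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sqrt(0.5 * ((np.sqrt(p) - np.sqrt(q)) ** 2).sum()))


def symmetric_kld(p, q):
    """Assumed symmetric KL: KL(p||q) + KL(q||p) = sum((p - q) * log(p / q)).

    A bin that is empty in one distribution but not the other makes this infinite.
    """
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    keep = (p > 0) | (q > 0)  # bins empty in both distributions contribute nothing
    p, q = p[keep], q[keep]
    with np.errstate(divide="ignore"):  # log(0) and x/0 yield inf instead of warning
        return float(np.sum((p - q) * np.log(p / q)))


# Class distributions of t and u, as reported by the class_t_pdf / class_u_pdf artifacts
class_t = [0.331461, 0.398876, 0.269663]
class_u = [0.348315, 0.382022, 0.269663]
print(tvd(class_t, class_u))            # close to class_shift_tvd (0.016854)
print(hellinger(class_t, class_u))      # close to class_shift_helinger (0.01398)
print(symmetric_kld(class_t, class_u))  # close to class_shift_kld (0.001564)
```

TVD and Hellinger distance are bounded in [0, 1]. KL divergence is unbounded and becomes infinite when a bin is empty in only one of the two distributions.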

### **Steps**

1. [Data exploration](#Data-exploration)
2. [Importing the function](#Importing-the-function)
3. [Running the function locally](#Running-the-function-locally)

### **Data exploration**

```python
# Scikit-learn's wine dataset
from sklearn.datasets import load_wine

wine = load_wine()
print(wine["DESCR"])
```

```
.. _wine_dataset:

Wine recognition dataset
------------------------

**Data Set Characteristics:**

    :Number of Instances: 178 (50 in each of three classes)
    :Number of Attributes: 13 numeric, predictive attributes and the class
    :Attribute Information:
        - Alcohol
        - Malic acid
        - Ash
        - Alcalinity of ash
        - Magnesium
        - Total phenols
        - Flavanoids
        - Nonflavanoid phenols
        - Proanthocyanins
        - Color intensity
        - Hue
        - OD280/OD315 of diluted wines
        - Proline

    - class:
            - class_0
            - class_1
            - class_2

    :Summary Statistics:

                                   Min   Max   Mean     SD
    Alcohol:                      11.0  14.8    13.0   0.8
    Malic Acid:                   0.74  5.80    2.34  1.12
    Ash:                          1.36  3.23    2.36  0.27
    Alcalinity of Ash:            10.6  30.0    19.5   3.3
    Magnesium:                    70.0 162.0    99.7  14.3
    Total Phenols:                0
```

```python
import pandas as pd

wine_t_path = 'https://s3.wasabisys.com/iguazio/data/function-marketplace-data/virtual_drift/wine_t.pq'
wine_u_path = 'https://s3.wasabisys.com/iguazio/data/function-marketplace-data/virtual_drift/wine_u.pq'

# Base dataset t and the dataset u that is compared against it
wine_t = pd.read_parquet(wine_t_path)
wine_u = pd.read_parquet(wine_u_path)
print(f'wine_t and wine_u are generated from the wine dataset, where wine_t is the entire dataset while wine_u is a sample (50%) of the entire dataset. \n\
wine_t shape is {wine_t.shape[0]} and wine_u shape is {wine_u.shape[0]} \n\n')
wine_t.head()
```

```
wine_t and wine_u are generated from the wine dataset, where wine_t is the entire dataset while wine_u is a sample (50%) of the entire dataset. 
wine_t shape is 178 and wine_u shape is 89 
```

|   | alcohol | malic_acid | ash | alcalinity_of_ash | magnesium | total_phenols | flavanoids | nonflavanoid_phenols | proanthocyanins | color_intensity | hue | od280/od315_of_diluted_wines | proline | y | prediction |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 14.23 | 1.71 | 2.43 | 15.6 | 127.0 | 2.8 | 3.06 | 0.28 | 2.29 | 5.64 | 1.04 | 3.92 | 1065.0 | 0 | 0 |
| 1 | 13.2 | 1.78 | 2.14 | 11.2 | 100.0 | 2.65 | 2.76 | 0.26 | 1.28 | 4.38 | 1.05 | 3.4 | 1050.0 | 0 | 0 |
| 2 | 13.16 | 2.36 | 2.67 | 18.6 | 101.0 | 2.8 | 3.24 | 0.3 | 2.81 | 5.68 | 1.03 | 3.17 | 1185.0 | 0 | 0 |
| 3 | 14.37 | 1.95 | 2.5 | 16.8 | 113.0 | 3.85 | 3.49 | 0.24 | 2.18 | 7.8 | 0.86 | 3.45 | 1480.0 | 0 | 0 |
| 4 | 13.24 | 2.59 | 2.87 | 21.0 | 118.0 | 2.8 | 2.69 | 0.39 | 1.82 | 4.32 | 1.04 | 2.93 | 735.0 | 0 | 0 |
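
If the hosted Parquet files are not reachable, you can build a comparable pair locally from scikit-learn. This sketch relies only on what the cell output states: `t` is the full dataset and `u` is a 50% sample of it. The names `local_t` and `local_u` and the `random_state` value are arbitrary. The sampled rows will not match the hosted `wine_u.pq`, and the sketch does not reproduce the `prediction` column of the hosted files.

```python
from sklearn.datasets import load_wine

# Features plus the class label, renamed to "y" to match the label_col used later
wine_df = load_wine(as_frame=True).frame.rename(columns={"target": "y"})

# t: the full dataset; u: a 50% random sample of it (random_state is arbitrary)
local_t = wine_df
local_u = wine_df.sample(frac=0.5, random_state=42)

print(local_t.shape[0], local_u.shape[0])  # 178 89
```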


### **Importing the function**

```python
import mlrun

# Importing the function
mlrun.set_environment(project='function-marketplace')

fn = mlrun.import_function("hub://virtual_drift")
fn.apply(mlrun.auto_mount())
```

```
> 2021-10-26 13:45:22,345 [info] created and saved project function-marketplace
```

```
<mlrun.runtimes.kubejob.KubejobRuntime at 0x7ff54a864dd0>
```

### **Running the function locally**

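```python
import os

# Data container: first component of V3IO_HOME, with a leading '/' (e.g. '/users')
container = os.path.join('/', os.environ['V3IO_HOME'].split('/')[0])
user = os.environ["V3IO_USERNAME"]
# Working directory relative to the Jupyter home mount (drops the 6-character '/User/' prefix)
rel_path = os.getcwd()[6:] + '/artifacts'
# TSDB table that the function writes its drift results to
tsdb_path = os.path.join(user, rel_path) + "/output_tsdb"
```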
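
The path arithmetic in this cell is easier to follow with concrete values. The hypothetical helper `tsdb_location` mirrors the cell as a pure function. Its inputs are assumptions: `V3IO_HOME="users/dani"`, username `dani`, and working directory `/User/test/functions/virtual_drift`. They were chosen because they yield the `results_tsdb_container` and `results_tsdb_table` values recorded in the run summary. The helper returns the container without the leading `/`, which matches `container[1:]`.

```python
import posixpath


def tsdb_location(v3io_home, username, cwd):
    """Hypothetical helper: derive (container, tsdb_table) the same way the cell does."""
    # First path component of V3IO_HOME is the data container, e.g. "users"
    container = v3io_home.split("/")[0]
    # Drop the "/User/" prefix (6 characters) from the working directory
    rel_path = cwd[len("/User/"):] + "/artifacts"
    table = posixpath.join(username, rel_path) + "/output_tsdb"
    return container, table


# Assumed environment values (not read from a real environment)
container, table = tsdb_location("users/dani", "dani", "/User/test/functions/virtual_drift")
print(container)  # users
print(table)      # dani/test/functions/virtual_drift/artifacts/output_tsdb
```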

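```python
virtual_drift_run = fn.run(params={'label_col': 'y',                         # label column in both datasets
                                   'results_tsdb_container': container[1:],  # container name without the leading '/'
                                   'results_tsdb_table': tsdb_path},
                           inputs={'t': wine_t_path,   # base dataset
                                   'u': wine_u_path},  # dataset compared against t
                           artifact_path=os.getcwd(),
                           local=True)  # run in the notebook process rather than as a cluster job
```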

```
> 2021-10-26 14:00:41,020 [info] starting run virtual-drift-drift_magnitude uid=28ec7f08ce7c4c528114e2590ff49325 DB=http://mlrun-api:8080
> 2021-10-26 14:00:43,469 [info] Fitting discretizer for alcohol
> 2021-10-26 14:00:43,471 [info] Fitting discretizer for malic_acid
> 2021-10-26 14:00:43,471 [info] Fitting discretizer for ash
> 2021-10-26 14:00:43,472 [info] Fitting discretizer for alcalinity_of_ash
> 2021-10-26 14:00:43,473 [info] Fitting discretizer for magnesium
> 2021-10-26 14:00:43,474 [info] Fitting discretizer for total_phenols
> 2021-10-26 14:00:43,475 [info] Fitting discretizer for flavanoids
> 2021-10-26 14:00:43,476 [info] Fitting discretizer for nonflavanoid_phenols
> 2021-10-26 14:00:43,477 [info] Fitting discretizer for proanthocyanins
> 2021-10-26 14:00:43,477 [info] Fitting discretizer for color_intensity
> 2021-10-26 14:00:43,478 [info] Fitting discretizer for hue
> 2021-10-26 14:00:43,479 [info] Fitting discretizer for od280/od315_of_diluted_wines
> 2021-10-26 14:00:43,480 [info] Fitting discretizer for proline
> 2021-10-26 14:00:43,531 [info] Discretizing featuers
> 2021-10-26 14:00:43,752 [info] C
```

```
divide by zero encountered in log
casting datetime64[ns] values to int64 with .astype(...) is deprecated and will raise in a future version. Use .view(...) instead.
```

| Field | Value |
|---|---|
| project | function-marketplace |
| uid | ...0ff49325 |
| iter | 0 |
| start | Oct 26 14:00:41 |
| state | completed |
| name | virtual-drift-drift_magnitude |
| labels | v3io_user=dani, kind=, owner=dani, host=jupyter-dani-6bfbd76d96-zxx6f |
| inputs | t, u |
| parameters | label_col=y, results_tsdb_container=users, results_tsdb_table=dani/test/functions/virtual_drift/artifacts/output_tsdb |
| results | prior_tvd=0.5, prior_helinger=0.541, prior_kld=10, class_shift_tvd=0.017, class_shift_helinger=0.014, class_shift_kld=0.002 |
| artifacts | discritizers, t_discrete, u_discrete, features_t_pdf, features_u_pdf, class_t_pdf, class_u_pdf |

```
> 2021-10-26 14:00:44,153 [info] run executed, status=completed
```
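
The log shows the function fitting one discretizer per feature, then discretizing the features and producing per-feature probability tables (the `features_t_pdf` and `features_u_pdf` artifacts). The sketch below shows that pipeline under one assumption: equal-width bins whose edges are learned from `t` alone. The helper `feature_pdfs`, the choice of `n_bins=10`, and the synthetic data are hypothetical, and the function's actual discretizer may bin differently.

```python
import numpy as np
import pandas as pd


def feature_pdfs(t, u, n_bins=10):
    """Hypothetical helper: bin each column on edges fitted to t, return pmfs for t and u."""
    pdfs_t, pdfs_u = {}, {}
    for col in t.columns:
        t_vals = t[col].to_numpy(dtype=float)
        # "Fit": equal-width bin edges computed from the base dataset t only
        edges = np.histogram_bin_edges(t_vals, bins=n_bins)
        # Clip u so that values outside t's range land in the outermost bins
        u_vals = np.clip(u[col].to_numpy(dtype=float), edges[0], edges[-1])
        counts_t, _ = np.histogram(t_vals, bins=edges)
        counts_u, _ = np.histogram(u_vals, bins=edges)
        pdfs_t[col] = counts_t / counts_t.sum()
        pdfs_u[col] = counts_u / counts_u.sum()
    # Rows are bins, columns are features
    return pd.DataFrame(pdfs_t), pd.DataFrame(pdfs_u)


# Synthetic stand-ins drawn from the wine summary statistics (mean, SD)
rng = np.random.default_rng(0)
t = pd.DataFrame({
    "alcohol": rng.normal(13.0, 0.8, 178),
    "magnesium": rng.normal(99.7, 14.3, 178),
})
u = t.sample(frac=0.5, random_state=0)  # 50% sample, like wine_u

pdf_t, pdf_u = feature_pdfs(t, u)
# Per-feature TVD between the binned distributions
per_feature_tvd = 0.5 * (pdf_t - pdf_u).abs().sum()
print(per_feature_tvd)
```

Fitting the edges on `t` only keeps both datasets on the same bins, so their probability vectors can be compared bin by bin.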


```python
virtual_drift_run.artifact('class_u_pdf').show()
virtual_drift_run.artifact('class_t_pdf').show()
```

|   | u |
|---|---|
| 0 | 0.348315 |
| 1 | 0.382022 |
| 2 | 0.269663 |

|   | t |
|---|---|
| 0 | 0.331461 |
| 1 | 0.398876 |
| 2 | 0.269663 |
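
The class tables match plain label frequencies. Multiplying them by the row counts (178 and 89) gives whole-number class counts of 59/71/48 in `t` and 31/34/24 in `u`. The sketch below recomputes them with pandas, assuming `class_*_pdf` is the normalized count of the `label_col` column. `class_pdf` is a hypothetical helper, and the label lists are rebuilt from those back-calculated counts.

```python
import pandas as pd


def class_pdf(labels):
    """Hypothetical helper: share of rows per class label, ordered by label."""
    return pd.Series(labels).value_counts(normalize=True).sort_index()


# Label columns rebuilt from the class counts (59/71/48 in t, 31/34/24 in u)
y_t = [0] * 59 + [1] * 71 + [2] * 48
y_u = [0] * 31 + [1] * 34 + [2] * 24
print(class_pdf(y_t).round(6).tolist())  # [0.331461, 0.398876, 0.269663]
print(class_pdf(y_u).round(6).tolist())  # [0.348315, 0.382022, 0.269663]
```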


```python
import v3io_frames as v3f

# Read back the drift metrics that the run wrote to the TSDB table
client = v3f.Client(os.environ["V3IO_FRAMESD"], container=container[1:])
client.read(backend='tsdb', table=tsdb_path)
```



| time | class_shift_helinger | class_shift_kld | class_shift_tvd | prior_helinger | prior_kld | prior_tvd | stream |
|---|---|---|---|---|---|---|---|
| 2021-10-26 13:58:04.445000+00:00 | 0.01398 | 0.001564 | 0.016854 | 0.541196 | 10.0 | 0.5 | some_stream |
| 2021-10-26 14:00:44.008000+00:00 | 0.01398 | 0.001564 | 0.016854 | 0.541196 | 10.0 | 0.5 | some_stream |


[Back to the top](#Virtual-Drift)