# Run ML Insights Using Config File

# Use Case

This notebook shows how to use the ML Insights low-code option, which lets you define and customise all of its core features (data schema, data ingestion, data transformation, metric calculation, and post-processing of metric output) using a JSON-based configuration.

### About Dataset
The Iris flower data set, or Fisher's Iris data set, is a multivariate data set. It consists of 50 samples from each of three species of Iris (Iris setosa, Iris virginica, and Iris versicolor). Four features were measured from each sample: the length and the width of the sepals and petals, in centimeters.

Dataset source: https://archive.ics.uci.edu/dataset/53/iris

# Install ML Observability Insights Library SDK

- Prerequisites
    - Linux/Mac (Intel CPU)
    - Python 3.8 and 3.9 only
    
    
- Installation 
    - ML Insights is distributed as a Python package (via Artifactory) and can be installed with pip as shown below. Depending on the execution engine you want to run on, install the corresponding scoped package: for Dask, use `oracle-ml-insights[dask]`; for Spark, use `oracle-ml-insights[spark]`; for the native engine, use `oracle-ml-insights`. To install all dependencies, use `oracle-ml-insights[all]`.
    
      !pip install oracle-ml-insights
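The engine-to-package mapping above can be captured in a small helper. This is a hypothetical convenience sketch (the names `ENGINE_EXTRAS`, `pip_requirement`, `pip_install_command`, and `install` are not part of ML Insights). It only builds the pip requirement string and command for the current interpreter, and it runs pip only if you call `install()` explicitly.

```python
import subprocess
import sys

# Execution engine -> pip extra, per the installation notes above.
# The native engine uses the base package without an extra.
ENGINE_EXTRAS = {"native": None, "dask": "dask", "spark": "spark", "all": "all"}


def pip_requirement(engine: str) -> str:
    """Return the pip requirement string for the chosen execution engine."""
    if engine not in ENGINE_EXTRAS:
        raise ValueError(f"Unknown engine {engine!r}; expected one of {sorted(ENGINE_EXTRAS)}")
    extra = ENGINE_EXTRAS[engine]
    return "oracle-ml-insights" if extra is None else f"oracle-ml-insights[{extra}]"


def pip_install_command(engine: str) -> list:
    """Build a pip command that installs into the interpreter running this code."""
    return [sys.executable, "-m", "pip", "install", pip_requirement(engine)]


def install(engine: str) -> None:
    """Run pip for the chosen engine; nothing is installed unless you call this."""
    subprocess.run(pip_install_command(engine), check=True)


# Show the command that would be run for the Dask engine.
print(" ".join(pip_install_command("dask")))
```

Using `sys.executable -m pip` targets the same interpreter that runs the notebook kernel, which is also why the install cell uses `python3 -m pip`. If you type an extra directly in a shell such as zsh, quote it (`pip install "oracle-ml-insights[dask]"`), because square brackets are otherwise treated as glob characters.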


Refer to [Installation and Setup](https://docs.oracle.com/en-us/iaas/tools/ml-insights-docs/latest/ml-insights-documentation/html/user_guide/tutorials/install.html).

This example notebook shows how to use the Insights config reader to run the evaluation based on a monitor config. A sample monitor config JSON and sample data are available under `monitor_configs/monitor_config.json` and `input_data/iris-dataset`, respectively.

In [None]:
!python3 -m pip install oracle-ml-insights

# 1 ML Insights Imports

In [8]:
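# Standard library
import json

# Third-party
import pandas as pd

# ML Insights: the builder assembles the workflow, the config reader creates it from JSON
from mlm_insights.builder.insights_builder import InsightsBuilder
from mlm_insights.config_reader.insights_config_reader import InsightsConfigReader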

# 2 What is JSON-based configuration?

ML Insights provides a low-code option to define and customise all of its core features, such as data schema, data ingestion, data transformation, metric calculation, and post-processing of metric output, using a JSON-based configuration.

Refer to [ML Insights Config File Documentation](https://docs.oracle.com/en-us/iaas/tools/ml-insights-docs/latest/ml-insights-documentation/html/user_guide/tutorials/config_reader.html).
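Because the configuration is plain JSON, you can inspect it with Python's standard `json` module before handing it to the config reader. The sketch assumes nothing about the ML Insights config schema: it only loads the file at the notebook's path (`monitor_configs/monitor_config.json`) and lists each top-level section with the type of its value. The helper names `load_config` and `summarize_sections` are hypothetical.

```python
import json
from pathlib import Path

# Path used by this notebook; adjust if your config lives elsewhere.
CONFIG_PATH = Path("monitor_configs/monitor_config.json")


def load_config(path: Path) -> dict:
    """Load the monitor config and make sure its top level is a JSON object."""
    with path.open(encoding="utf-8") as f:
        config = json.load(f)
    if not isinstance(config, dict):
        raise ValueError(f"{path} does not contain a JSON object at the top level")
    return config


def summarize_sections(config: dict) -> dict:
    """Map each top-level key to the Python type name of its value."""
    return {key: type(value).__name__ for key, value in config.items()}


if CONFIG_PATH.exists():
    for section, kind in summarize_sections(load_config(CONFIG_PATH)).items():
        print(f"{section}: {kind}")
else:
    print(f"No config found at {CONFIG_PATH}; run this from the notebook's directory.")
```

For the actual section names and what each one controls, rely on the config file documentation linked above.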



# 3 Compute the profile 

Initialize the Insights builder by specifying the location of the monitor config JSON using the Insights config reader.

The `run()` method executes the internal workflow. It also manages the life cycle of each component passed to it: creating the component (if required), invoking its interface functions, and destroying it. In addition, the runner handles more advanced operations such as thread pooling and compute engine abstraction.
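To make the life-cycle description concrete, here is a toy runner. It is not the ML Insights implementation, and `Component` and `run_components` are hypothetical names. It only illustrates the pattern the paragraph describes: each component is created, invoked, and destroyed, and the components run concurrently on a thread pool.

```python
from concurrent.futures import ThreadPoolExecutor


class Component:
    """Hypothetical pipeline component with an explicit life cycle."""

    def __init__(self, name):
        self.name = name
        self.events = []  # records the life-cycle steps, for inspection

    def create(self):
        self.events.append("create")

    def invoke(self, data):
        self.events.append("invoke")
        return f"{self.name}({data})"

    def destroy(self):
        self.events.append("destroy")


def run_components(components, data, max_workers=4):
    """Create, invoke, and destroy each component on a thread pool.

    Results are returned in the same order as `components`.
    """

    def lifecycle(component):
        component.create()
        try:
            return component.invoke(data)
        finally:
            # Cleanup runs even if invoke() raises.
            component.destroy()

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lifecycle, components))


print(run_components([Component("ingest"), Component("metrics")], "iris"))
```

The `try`/`finally` guarantees that `destroy()` runs even when `invoke()` fails, and `ThreadPoolExecutor.map` preserves input order, so results line up with the components passed in.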



In [9]:
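# Read the monitor config and get an Insights builder configured from it
insights_builder = InsightsConfigReader(config_location="monitor_configs/monitor_config.json").get_builder()
# Build the runner and execute the workflow defined in the config
run_result = insights_builder.build().run()
# The profile holds the computed dataset-level and feature-level metrics
profile = run_result.profile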


# 4 Result

## 4.1 Visualize the Profile in tabular format

In [10]:
profile.to_pandas()

| Feature | Skewness | StandardDeviation | Min | IsConstantFeature | IQR | Mode | Range | TypeMetric.string_type_count | TypeMetric.integral_type_count | TypeMetric.fractional_type_count | ... | Count.missing_count_percentage | Max | DistinctCount | Sum | IsQuasiConstantFeature | Quartiles.q1 | Quartiles.q2 | Quartiles.q3 | Mean | Kurtosis |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| sepal length (cm) | 0.311753 | 0.825301 | 4.3 | False | 1.3 | [5.0] | 3.6 | 0 | 0 | 600 | ... | 0.0 | 7.9 | 35 | 3506.0 | False | 5.1 | 5.8 | 6.4 | 5.843333 | -0.573568 |
| sepal width (cm) | 0.315767 | 0.434411 | 2.0 | False | 0.5 | [3.0] | 2.4 | 0 | 0 | 600 | ... | 0.0 | 4.4 | 23 | 1834.4 | False | 2.8 | 3.0 | 3.3 | 3.057333 | 0.180976 |
| petal length (cm) | -0.272128 | 1.759404 | 1.0 | False | 3.5 | [1.4, 1.5] | 5.9 | 0 | 0 | 600 | ... | 0.0 | 6.9 | 43 | 2254.8 | False | 1.6 | 4.3 | 5.1 | 3.758 | -1.395536 |
| petal width (cm) | -0.101934 | 0.759693 | 0.1 | False | 1.5 | [0.2] | 2.4 | 0 | 0 | 600 | ... | 0.0 | 2.5 | 22 | 719.6 | False | 0.3 | 1.3 | 1.8 | 1.199333 | -1.336067 |
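Several columns in the profile are related arithmetically, which gives a quick sanity check: Range should equal Max minus Min, IQR should equal Quartiles.q3 minus Quartiles.q1, and Mean should equal Sum divided by the row count (for example, 3506.0 / 600 = 5.8433...). The sketch applies those checks to values copied from the table above, using the total count of 600 reported by the `Count` metric in the JSON output (and matching `TypeMetric.fractional_type_count`). The function name `check_profile_consistency` is hypothetical; with the live profile you could pass `profile.to_pandas()` instead, assuming its columns carry the names shown in the table.

```python
import math

import pandas as pd


def check_profile_consistency(profile_df: pd.DataFrame, total_count: int, tol: float = 1e-5) -> dict:
    """Cross-check derived statistics in a profile table, one row per feature.

    Assumes columns named Min, Max, Range, Sum, Mean, IQR, Quartiles.q1 and
    Quartiles.q3. Returns {feature: {check_name: passed}}.
    """
    results = {}
    for feature, row in profile_df.iterrows():
        results[feature] = {
            "range == max - min": math.isclose(row["Range"], row["Max"] - row["Min"], abs_tol=tol),
            "iqr == q3 - q1": math.isclose(row["IQR"], row["Quartiles.q3"] - row["Quartiles.q1"], abs_tol=tol),
            "mean == sum / count": math.isclose(row["Mean"], row["Sum"] / total_count, abs_tol=tol),
        }
    return results


# Values copied from the profile table above (Mean is rounded to 6 decimals there).
observed = pd.DataFrame(
    {
        "Min": [4.3, 2.0, 1.0, 0.1],
        "Max": [7.9, 4.4, 6.9, 2.5],
        "Range": [3.6, 2.4, 5.9, 2.4],
        "Sum": [3506.0, 1834.4, 2254.8, 719.6],
        "Mean": [5.843333, 3.057333, 3.758, 1.199333],
        "IQR": [1.3, 0.5, 3.5, 1.5],
        "Quartiles.q1": [5.1, 2.8, 1.6, 0.3],
        "Quartiles.q3": [6.4, 3.3, 5.1, 1.8],
    },
    index=["sepal length (cm)", "sepal width (cm)", "petal length (cm)", "petal width (cm)"],
)

for feature, checks in check_profile_consistency(observed, total_count=600).items():
    print(feature, "consistent" if all(checks.values()) else checks)
```

The tolerance absorbs both the 6-decimal rounding in the displayed table and ordinary floating-point error (for instance, `7.9 - 4.3` is not exactly `3.6` in binary floating point).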


## 4.2 Visualize the Profile in JSON format

In [11]:
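# Convert the profile into a nested, JSON-serializable dict
profile_json = profile.to_json()
# Despite its name, this holds both "dataset_metrics" and "feature_metrics"
dataset_metrics = profile_json
print(json.dumps(dataset_metrics, sort_keys=True, indent=4))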


{
    "dataset_metrics": {},
    "feature_metrics": {
        "petal length (cm)": {
            "Count": {
                "metadata": {},
                "metric_data": [
                    600.0,
                    0.0,
                    0.0
                ],
                "metric_description": "Feature metric that returns total count, missing count and missing count percentage",
                "metric_name": "Count",
                "variable_count": 3,
                "variable_dimensions": [
                    0,
                    0,
                    0
                ],
                "variable_dtypes": [
                    "INTEGER",
                    "INTEGER",
                    "FLOAT"
                ],
                "variable_names": [
                    "total_count",
                    "missing_count",
                    "missing_count_percentage"
                ],
                "variable_types": [
                    "CONTINUOUS",
            
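Each metric entry in the JSON output stores its values in `metric_data` and the matching labels in `variable_names`, in the same order. A small helper can pair them up. The sketch relies only on the structure visible in the output above; `metric_values` is a hypothetical name, and the `sample` dict is a trimmed excerpt of that output so the block runs on its own.

```python
def metric_values(profile_json: dict, feature: str, metric: str) -> dict:
    """Pair each variable name of a feature metric with its value.

    Follows the layout shown in the output above:
    feature_metrics -> <feature> -> <metric> -> variable_names / metric_data.
    """
    entry = profile_json["feature_metrics"][feature][metric]
    names = entry["variable_names"]
    values = entry["metric_data"]
    if len(names) != len(values):
        raise ValueError(f"{metric} for {feature!r}: {len(names)} names but {len(values)} values")
    return dict(zip(names, values))


# Trimmed excerpt of the JSON output; in the notebook, pass `profile_json` instead.
sample = {
    "dataset_metrics": {},
    "feature_metrics": {
        "petal length (cm)": {
            "Count": {
                "metric_data": [600.0, 0.0, 0.0],
                "variable_names": ["total_count", "missing_count", "missing_count_percentage"],
            }
        }
    },
}

print(metric_values(sample, "petal length (cm)", "Count"))
# {'total_count': 600.0, 'missing_count': 0.0, 'missing_count_percentage': 0.0}
```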

In [12]:
# Flatten nested metrics into dotted keys, transpose to one row per key, and drop empty entries
pd.json_normalize(dataset_metrics).T.dropna()

| Key | Value |
|---|---|
| feature_metrics.sepal length (cm).Skewness.metric_name | Skewness |
| feature_metrics.sepal length (cm).Skewness.metric_description | Feature Metric to compute Skewness |
| feature_metrics.sepal length (cm).Skewness.variable_count | 1 |
| feature_metrics.sepal length (cm).Skewness.variable_names | [skewness] |
| feature_metrics.sepal length (cm).Skewness.variable_types | [CONTINUOUS] |
| ... | ... |
| feature_metrics.petal width (cm).Kurtosis.variable_names | [kurtosis] |
| feature_metrics.petal width (cm).Kurtosis.variable_types | [CONTINUOUS] |
| feature_metrics.petal width (cm).Kurtosis.variable_dtypes | [FLOAT] |
| feature_metrics.petal width (cm).Kurtosis.variable_dimensions | [0] |
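The dotted keys in the table come from `pd.json_normalize`, which joins nested dictionary keys with `.` by default. The stdlib sketch reproduces that flattening for nested dicts so you can see where the keys come from; it is an illustration, not pandas' implementation. Lists are kept as values (matching entries such as `[skewness]`), and in this sketch `None` values are skipped, playing a role similar to the `.dropna()` call. The names `flatten` and `example` are hypothetical.

```python
def flatten(obj: dict, sep: str = ".", prefix: str = "") -> dict:
    """Flatten nested dicts into joined keys, keeping lists and scalars as values.

    None values are skipped, and empty nested dicts contribute no keys.
    """
    flat = {}
    for key, value in obj.items():
        full_key = f"{prefix}{sep}{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten(value, sep=sep, prefix=full_key))
        elif value is not None:
            flat[full_key] = value
    return flat


# Trimmed excerpt shaped like the profile JSON.
example = {
    "dataset_metrics": {},
    "feature_metrics": {
        "sepal length (cm)": {
            "Skewness": {
                "metric_name": "Skewness",
                "variable_count": 1,
                "variable_names": ["skewness"],
            }
        }
    },
}

for key, value in flatten(example).items():
    print(key, value)
```

Because the separator is `.`, a feature or metric name that itself contains a dot would make the keys ambiguous; both this sketch and `pd.json_normalize` accept a `sep` argument to choose a different separator.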
