# Improved Insights Configuration Authoring Experience

This notebook demonstrates features that simplify authoring an Insights JSON configuration. It shows how to author the configuration programmatically using the InsightsBuilder and InsightsConfigWriter APIs.

The InsightsBuilder class is used to define and customise the core components of a run: data schema, data ingestion, data transformation, metric calculation, and post-processing of metric output.

The InsightsConfigWriter class from the ML Insights library builds a config JSON file from an InsightsBuilder instance.

This notebook contains the following examples:

- Generate an Insights configuration JSON from an InsightsBuilder instance
- Detect an approximate input_schema from a sample dataset, then generate the Insights configuration JSON

# Install ML Observability Insights Library SDK

- Prerequisites
    - Linux/Mac (Intel CPU)
    - Python 3.8 and 3.9 only


- Installation
    - ML Insights is distributed as a Python package (via Artifactory) and can be installed with pip, as shown below. Choose the scoped package that matches the execution engine: oracle-ml-insights[dask] for Dask, oracle-ml-insights[spark] for Spark, or plain oracle-ml-insights for the native engine. To install all dependencies, use oracle-ml-insights[all].

      !pip install oracle-ml-insights

Refer to [Installation and Setup](https://docs.oracle.com/en-us/iaas/tools/ml-insights-docs/latest/ml-insights-documentation/html/user_guide/tutorials/install.html) for details.
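
The choice of scoped package described above can be sketched as a small helper. This is only an illustration: `pip_requirement` and `python_version_supported` are hypothetical names that are not part of ML Insights, and the extras and Python versions are taken directly from the notes above.

```python
import sys

# Extras named in the installation notes above; "native" means no extra.
ENGINE_EXTRAS = {"native": "", "dask": "[dask]", "spark": "[spark]", "all": "[all]"}


def pip_requirement(engine: str = "native") -> str:
    """Return the pip requirement string for the chosen execution engine."""
    try:
        extra = ENGINE_EXTRAS[engine]
    except KeyError:
        raise ValueError(
            f"unknown engine {engine!r}; expected one of {sorted(ENGINE_EXTRAS)}"
        ) from None
    return f"oracle-ml-insights{extra}"


def python_version_supported(version_info=sys.version_info) -> bool:
    """True when the interpreter is Python 3.8 or 3.9, per the prerequisites."""
    return tuple(version_info[:2]) in {(3, 8), (3, 9)}


print(pip_requirement("dask"))
print("supported interpreter:", python_version_supported())
```

The returned string is what would be passed to `python3 -m pip install`.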

In [None]:
!python3 -m pip install oracle-ml-insights

# 1 ML Insights Imports 

In [1]:
# imports

import json

# Import Data Quality metrics 
from mlm_insights.core.metrics.mean import Mean
from mlm_insights.core.metrics.standard_deviation import StandardDeviation

# Import Data Integrity metrics
from mlm_insights.core.metrics.rows_count import RowCount

from mlm_insights.builder.builder_component import MetricDetail
from mlm_insights.constants.types import FeatureType, DataType, VariableType
from mlm_insights.core.metrics.metric_metadata import MetricMetadata
from mlm_insights.core.post_processors.local_writer_post_processor import LocalWriterPostProcessor

# import data reader
from mlm_insights.core.data_sources import LocalDatePrefixDataSource
from mlm_insights.mlm_native.readers import CSVNativeDataReader

# import InsightsBuilder
from mlm_insights.builder.insights_builder import InsightsBuilder

# import InsightsConfigWriter
from mlm_insights.config_writer.insights_config_writer import InsightsConfigWriter

## 2 Generate Insights Configuration JSON using InsightsConfigWriter 

The section below shows how the InsightsBuilder class is used to define and customise core components such as the data schema, data ingestion, metric calculation, and post-processing of metric output.

The InsightsConfigWriter class from the ML Insights library builds a config file from an InsightsBuilder instance using its to_json() method.

The config can also be saved to Object Storage using the save_config_to_object_storage() method of InsightsConfigWriter.

In [None]:
def get_input_schema():
    return {
        "Pregnancies": FeatureType(data_type=DataType.FLOAT, variable_type=VariableType.CONTINUOUS),
        "BloodPressure": FeatureType(data_type=DataType.FLOAT, variable_type=VariableType.CONTINUOUS)
    }

def get_metrics_input():
    metrics = [
               MetricMetadata(klass=Mean),
               MetricMetadata(klass=StandardDeviation)
              ]
    uni_variate_metrics = {
        "BloodPressure": metrics
    }
    metric_details = MetricDetail(univariate_metric=uni_variate_metrics,
                                  dataset_metrics=[MetricMetadata(klass=RowCount)])
    return metric_details

def get_reader():
    data = {
        "file_type": "csv",
        "date_range": {"start": "2023-06-26", "end": "2023-06-27"}
    }
    base_location ="input_data/diabetes_prediction"
    ds = LocalDatePrefixDataSource(base_location, **data)
    csv_reader = CSVNativeDataReader(data_source=ds)
    return csv_reader

def write_config(config_json, file_name):
    """
    Writes the configuration dictionary to a JSON file.
    """
    with open(file_name, "w") as f:
        json.dump(config_json, f, indent=4)  # Indent for readability
    print("Configuration file created")

def main():
    # Set up the insights builder by passing: input schema, metrics, reader and post processors
    runner = InsightsBuilder(). \
        with_input_schema(get_input_schema()). \
        with_metrics(metrics=get_metrics_input()). \
        with_reader(reader=get_reader()). \
        with_post_processors(post_processors=[LocalWriterPostProcessor(file_location="output_data/profiles", file_name="classification_metrics_profile.bin")])


    # Build the configuration JSON from the builder (no evaluation is run here)
    config_writer = InsightsConfigWriter(insights_builder=runner)
    config_json_from_builder = config_writer.to_json()
    return config_json_from_builder
    
config_json = main()
config_json_1 = json.loads(config_json)
print(config_json_1)
write_config(config_json_1, "config_json_1.json")
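
The reader in `get_reader()` combines a base location with a date range. The sketch below lists the date-named folders such a range would cover, which can help confirm that the input files are in place before authoring the config. It rests on two assumptions: that the data sits in `<base_location>/<YYYY-MM-DD>/` folders (suggested by the sample path used in this notebook), and that the end date is inclusive. `date_prefix_dirs` is a hypothetical helper, not an ML Insights API, and the library's actual resolution logic may differ.

```python
from datetime import date, timedelta
from pathlib import PurePosixPath


def date_prefix_dirs(base_location: str, start: str, end: str) -> list:
    """List one <base_location>/<YYYY-MM-DD> folder per day from start to end.

    Assumption: the end date is treated as inclusive here.
    """
    first, last = date.fromisoformat(start), date.fromisoformat(end)
    if last < first:
        raise ValueError("end date must not precede start date")
    days = (last - first).days
    return [
        str(PurePosixPath(base_location) / (first + timedelta(days=offset)).isoformat())
        for offset in range(days + 1)
    ]


for folder in date_prefix_dirs("input_data/diabetes_prediction", "2023-06-26", "2023-06-27"):
    print(folder)
```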

## 2.1 Generate Configuration with Automatic approximate input_schema detection 

The section above showed how to define the input schema feature by feature, alongside the other components, using InsightsBuilder. To simplify authoring, this section shows how to detect an approximate input_schema automatically from a sample dataset. The auto-generation feature infers the data_type and variable_type of each feature and creates the input schema.

Here we use the with_input_schema_using_dataset() method of InsightsBuilder, which takes the sample dataset and the column-type feature details (target, prediction, and prediction score features) and generates an approximate input_schema, so each feature's schema does not have to be defined by hand.

Note: The auto-generated input_schema is an approximation and may not be 100% correct. Validate it and make any necessary changes.
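
To illustrate why an inferred schema can only be approximate, here is a minimal sketch of a type-guessing heuristic over a small CSV sample. It is not the algorithm used by ML Insights: the function names, the `max_categories` threshold, the plain-string labels (not the library's enums), and the sample rows are all made up for illustration.

```python
import csv
import io


def infer_column_kind(values, max_categories=10):
    """Guess a (data_type, variable_type) pair for one column of strings.

    Hypothetical heuristic for illustration only; not the library's algorithm.
    """
    non_empty = [v for v in values if v.strip()]

    def all_parse(cast):
        try:
            for value in non_empty:
                cast(value)
            return True
        except ValueError:
            return False

    if non_empty and all_parse(int):
        data_type = "integer"
    elif non_empty and all_parse(float):
        data_type = "float"
    else:
        data_type = "string"

    distinct = len(set(non_empty))
    if data_type == "float" or (data_type == "integer" and distinct > max_categories):
        variable_type = "continuous"
    else:
        variable_type = "categorical"
    return data_type, variable_type


def infer_schema(csv_text, max_categories=10):
    """Infer a column -> (data_type, variable_type) mapping from CSV text."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return {}
    return {
        name: infer_column_kind([(row[name] or "") for row in rows], max_categories)
        for name in rows[0]
    }


# Made-up sample rows, used only to exercise the heuristic.
SAMPLE = """Pregnancies,BloodPressure,Outcome
6,72.0,1
1,66.5,0
8,64.0,1
"""

for column, kind in infer_schema(SAMPLE).items():
    print(column, kind)
```

With this small sample, the heuristic labels `Pregnancies` as an integer categorical column, whereas the hand-written schema earlier in this notebook declares it as a continuous float. Disagreements of this kind are the reason the generated input_schema should be reviewed.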


In [None]:
def config_authoring_using_auto_generated_input_schema():  
    data_set_location = "input_data/diabetes_prediction/2023-06-26/2023-06-26.csv"
    target_features = ["Outcome"]
    prediction_features = ["Prediction"]
    prediction_score_features = ["Prediction_Score"]
    # Set up the insights builder by passing: dataset location (to generate the approximate input_schema), column-type feature names, metrics, reader and post processors
    runner = InsightsBuilder(). \
        with_input_schema_using_dataset(data_set_location,target_features,prediction_features,prediction_score_features). \
        with_metrics(metrics=get_metrics_input()). \
        with_reader(reader=get_reader()). \
        with_post_processors(post_processors=[LocalWriterPostProcessor(file_location="output_data/profiles", file_name="classification_metrics_profile.bin")])


    # Build the configuration JSON from the builder (no evaluation is run here)
    config_writer = InsightsConfigWriter(insights_builder=runner)
    config_json_from_builder = config_writer.to_json()
    print(config_json_from_builder)
    return config_json_from_builder

config_json = config_authoring_using_auto_generated_input_schema()
config_json_2 = json.loads(config_json)

write_config(config_json_2,"config_json_2.json")
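
One way to validate the auto-generated schema is to compare the resulting configuration against a hand-written one. The sketch below is a generic recursive diff over two parsed JSON dictionaries. `diff_json` is a hypothetical helper, and the two small dictionaries are placeholders that do not reflect the real layout of an Insights configuration.

```python
MISSING = "<missing>"


def diff_json(left, right, path=""):
    """Yield (path, left_value, right_value) for every entry that differs.

    Dictionaries are compared key by key; any other value (including a list)
    is compared as a whole.
    """
    if isinstance(left, dict) and isinstance(right, dict):
        for key in sorted(set(left) | set(right)):
            child = f"{path}.{key}" if path else key
            yield from diff_json(left.get(key, MISSING), right.get(key, MISSING), child)
    elif left != right:
        yield path, left, right


# Placeholder dictionaries; the real configuration layout may differ.
hand_written = {"schema": {"BloodPressure": "float", "Pregnancies": "float"}}
auto_detected = {
    "schema": {"BloodPressure": "float", "Pregnancies": "integer", "Outcome": "integer"}
}

for path, left, right in diff_json(hand_written, auto_detected):
    print(f"{path}: {left} -> {right}")
```

In this notebook, `list(diff_json(config_json_1, config_json_2))` would list the places where the hand-written and auto-detected configurations differ.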
