# Run ML Insights Unified Builder API to push Notifications to OCI Monitoring

# Use Case

This notebook demonstrates how to use the ML Insights SDK to push Insights test results to OCI Monitoring, so that threshold breaches can be communicated to users through OCI Monitoring. For information on how to configure alarms or visualize threshold breaches in OCI Monitoring, refer to the [OCI Monitoring Notifications Documentation](https://objectstorage.us-ashburn-1.oraclecloud.com/p/52qrFSNgCH85OWPBGIfTgNm-KeibRU8oPSSBdDg_t90gZ89r5qXrQFpTfdvQ9ear/n/bigdatadatasciencelarge/b/ml-insight-doc/o/user_guide/tutorials/notifications.html).

## Prerequisites
The OCI Monitoring post-processor component (used to push test results to OCI Monitoring) will fail unless the user is authenticated to push to OCI Monitoring. These authentication details can be set by arguments passed to the post-processor. Additionally, the user will need to set IAM policies to allow pushing metrics to OCI Monitoring.

For more details on the arguments passed to this post-processor, as well as the IAM policy requirements, refer to the [OCI Monitoring Post-processor Documentation](https://docs.oracle.com/en-us/iaas/tools/ml-insights-docs/latest/ml-insights-documentation/html/user_guide/getting_started/post_processor_component.html#ocimonitoringpostprocessor).

## About Dataset
The data was collected and made available by the “National Institute of Diabetes and Digestive and Kidney Diseases” as part of the Pima Indians Diabetes Database. Several constraints were placed on the selection of these instances from a larger database. In particular, all patients here are women of Pima Indian heritage (a subgroup of Native Americans) aged 21 or older.

The dataset contains medical and demographic data of patients. It includes the following columns: Pregnancies, Glucose, BloodPressure, SkinThickness, Insulin, BMI, DiabetesPedigreeFunction, Age, Outcome, Prediction, BMICategory, Prediction_Score.

Dataset source: https://www.kaggle.com/datasets/kandij/diabetes-dataset


# Install ML Observability Insights Library SDK

- Prerequisites
  - Linux/Mac (Intel CPU)
  - Python 3.8 and 3.9 only

- Installation
  - ML Insights is available as a Python package (via Artifactory) that can be installed with pip, as shown below. Depending on the execution engine used for the run, install the matching scoped package: oracle-ml-insights[dask] for Dask, oracle-ml-insights[spark] for Spark, or oracle-ml-insights for the native engine. To install all dependencies, use oracle-ml-insights[all].

!pip install oracle-ml-insights

Refer to [Installation and Setup](https://docs.oracle.com/en-us/iaas/tools/ml-insights-docs/latest/ml-insights-documentation/html/user_guide/tutorials/install.html)
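
The choice of package spec described above can be written down as a small lookup. The sketch below is only illustrative: `ENGINE_EXTRAS`, `package_spec` and `python_is_supported` are hypothetical helper names and are not part of the ML Insights SDK. The extras and the supported Python versions are taken from the notes in this section.

```python
import sys

# Engine names and pip extras as listed in the installation note above.
ENGINE_EXTRAS = {"native": "", "dask": "[dask]", "spark": "[spark]", "all": "[all]"}


def package_spec(engine="native"):
    """Return the pip requirement string for the chosen execution engine."""
    if engine not in ENGINE_EXTRAS:
        raise ValueError(f"unknown engine {engine!r}; expected one of {sorted(ENGINE_EXTRAS)}")
    return "oracle-ml-insights" + ENGINE_EXTRAS[engine]


def python_is_supported(version_info=sys.version_info):
    """True when the interpreter is Python 3.8 or 3.9, the versions listed above."""
    return tuple(version_info[:2]) in {(3, 8), (3, 9)}


# Quote the spec so that shells do not interpret the square brackets.
print(f'python3 -m pip install "{package_spec("dask")}"')
print("supported interpreter:", python_is_supported())
```

Quoting the requirement string matters mainly in shells such as zsh, where unquoted square brackets are treated as glob patterns.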

In [None]:
!python3 -m pip install oracle-ml-insights

In [None]:
!python3 -m pip install matplotlib

# ML Insights Imports

In [12]:
# Imports

# Import Data Quality metrics
from mlm_insights.core.metrics.count import Count
from mlm_insights.core.metrics.mean import Mean
from mlm_insights.core.metrics.quartiles import Quartiles

# Import Data Integrity metrics
from mlm_insights.core.metrics.rows_count import RowCount

from mlm_insights.builder.builder_component import MetricDetail
from mlm_insights.constants.types import FeatureType, DataType, VariableType, ColumnType
from mlm_insights.core.metrics.metric_metadata import MetricMetadata
from mlm_insights.builder.insights_builder import InsightsBuilder

# Import Data reader
from mlm_insights.core.data_sources import LocalDatePrefixDataSource
from mlm_insights.mlm_native.readers import CSVNativeDataReader

# Import Profile reader
from mlm_insights.core.profile_readers.local_profile_reader import LocalProfileReader

# Import Test config
from mlm_insights.builder.builder_component import TestConfig
from mlm_insights.tests.selectors.dataset_metric_selector import DatasetMetricSelector
from mlm_insights.tests.test_types.metric_based_tests.test_is_complete import TestIsComplete
from mlm_insights.tests.test_types.predicate_based_tests.test_less_than import TestLessThan
from mlm_insights.tests.profile_source import ProfileSource
from mlm_insights.tests.selectors.feature_metric_selector import FeatureMetricSelector
from mlm_insights.tests.test_types.predicate_based_tests.test_greater_than import TestGreaterThan

# Import OCI Monitoring post-processor
from mlm_insights.core.post_processors.oci_monitoring_post_processor import OCIMonitoringPostProcessor

# Configure feature schema, metrics and data reader

For additional context on these components, please refer to the previous sample notebooks in this series.

In [13]:
def get_input_schema():
    return {
        "Pregnancies": FeatureType(data_type=DataType.FLOAT, variable_type=VariableType.CONTINUOUS),
        "BloodPressure": FeatureType(data_type=DataType.FLOAT, variable_type=VariableType.CONTINUOUS),
        "SkinThickness": FeatureType(data_type=DataType.FLOAT, variable_type=VariableType.CONTINUOUS),
        "Insulin": FeatureType(data_type=DataType.FLOAT, variable_type=VariableType.CONTINUOUS),
        "BMI": FeatureType(data_type=DataType.FLOAT, variable_type=VariableType.CONTINUOUS),
        "Age": FeatureType(data_type=DataType.FLOAT, variable_type=VariableType.CONTINUOUS),
        "DiabetesPedigreeFunction": FeatureType(data_type=DataType.FLOAT, variable_type=VariableType.CONTINUOUS),
        "Outcome": FeatureType(data_type=DataType.FLOAT, variable_type=VariableType.CONTINUOUS, column_type=ColumnType.TARGET),
        "Prediction": FeatureType(data_type=DataType.FLOAT, variable_type=VariableType.CONTINUOUS, column_type=ColumnType.PREDICTION),
        "BMICategory": FeatureType(data_type=DataType.STRING, variable_type=VariableType.NOMINAL)
    }
        

def get_metrics():
    metrics = [
        MetricMetadata(klass=Mean),
        MetricMetadata(klass=Count),
        MetricMetadata(klass=Quartiles)
    ]
    uni_variate_metrics = {
        "BloodPressure": metrics
    }
    metric_details = MetricDetail(univariate_metric=uni_variate_metrics,
                                  dataset_metrics=[MetricMetadata(klass=RowCount)])
    return metric_details

def get_reader():
    data = {
        "file_type": "csv",
        "date_range": {"start": "2023-06-26", "end": "2023-06-27"}
    }
    base_location ="input_data/diabetes_prediction"
    ds = LocalDatePrefixDataSource(base_location, **data)
    print(ds.get_data_location())
    csv_reader = CSVNativeDataReader(data_source=ds)
    return csv_reader
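
The data locations printed by the final run of this notebook sit in one folder per date under the base location, for both the start and the end date. The sketch below enumerates such folders with the standard library only. It is an assumption about the folder layout drawn from that printed output, not the implementation of LocalDatePrefixDataSource, and `date_prefixed_folders` is a hypothetical helper name.

```python
from datetime import date, timedelta


def date_prefixed_folders(base_location, start, end):
    """List one folder per day from start to end (both inclusive), named YYYY-MM-DD."""
    first, last = date.fromisoformat(start), date.fromisoformat(end)
    if last < first:
        raise ValueError("end date must not be earlier than start date")
    day_count = (last - first).days + 1
    return [
        f"{base_location}/{(first + timedelta(days=offset)).isoformat()}"
        for offset in range(day_count)
    ]


for folder in date_prefixed_folders("input_data/diabetes_prediction", "2023-06-26", "2023-06-27"):
    print(folder)
```

A listing like this can help confirm that the expected input folders exist before the reader is built.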

# 1 Configure Tests

The Tests component allows users to configure threshold-based tests on metrics. Tests can be supplied via the Insights Config Reader or Builder API.


Refer to the [Test/Test Suites Documentation](https://objectstorage.us-ashburn-1.oraclecloud.com/p/52qrFSNgCH85OWPBGIfTgNm-KeibRU8oPSSBdDg_t90gZ89r5qXrQFpTfdvQ9ear/n/bigdatadatasciencelarge/b/ml-insight-doc/o/user_guide/getting_started/test_suites_component.html)

In [14]:
def get_test_config():
    test = []
    test.append(TestGreaterThan(
        lhs=FeatureMetricSelector(
            profile_source=ProfileSource.CURRENT,
            feature_name="BloodPressure",
            metric_key="Mean"),
        rhs=100))
    test.append(TestLessThan(
        lhs=FeatureMetricSelector(
            profile_source=ProfileSource.CURRENT,
            feature_name="BloodPressure",
            metric_key="Quartiles.q1"),
        rhs=200))
    test.append(TestGreaterThan(
        lhs=FeatureMetricSelector(
            profile_source=ProfileSource.CURRENT,
            feature_name="BloodPressure",
            metric_key="Quartiles.q1"),
        rhs=50000))
    test.append(TestIsComplete(feature_name='BloodPressure'))
    test.append(TestGreaterThan(
        lhs=DatasetMetricSelector(
            profile_source=ProfileSource.CURRENT,
            metric_key="RowCount"),
        rhs=5000))

    test_config = TestConfig(tests=test)
    return test_config
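
To see how these five thresholds play out, the checks can be replayed in plain Python against the profile values that this notebook prints after its final run. This is only an illustrative sketch and not the SDK's evaluation logic: `profile_values` and `checks` are hypothetical names, RowCount is assumed to equal the printed Count.total_count (938), TestIsComplete is assumed to mean "no missing values", and the SDK's `strictly` option is not modeled.

```python
import operator

# Metric values copied from the profile printed at the end of this notebook.
# ASSUMPTION: RowCount is not printed there, so Count.total_count (938) is reused.
profile_values = {
    "Mean": 69.134328,
    "Quartiles.q1": 62.0,
    "Count.missing_count": 0.0,
    "RowCount": 938.0,
}

# (label, metric key, comparison, threshold), in the order the tests are configured.
# ASSUMPTION: TestIsComplete is modeled as "missing count equals zero".
checks = [
    ("Mean > 100", "Mean", operator.gt, 100),
    ("Quartiles.q1 < 200", "Quartiles.q1", operator.lt, 200),
    ("Quartiles.q1 > 50000", "Quartiles.q1", operator.gt, 50000),
    ("no missing values", "Count.missing_count", operator.eq, 0),
    ("RowCount > 5000", "RowCount", operator.gt, 5000),
]

results = {
    label: compare(profile_values[key], threshold)
    for label, key, compare, threshold in checks
}
passed = sum(results.values())

for label, ok in results.items():
    print(f"{'PASSED' if ok else 'FAILED'}: {label}")
print(f"passed={passed} failed={len(results) - passed}")
```

Under these assumptions the tally is two passes and three failures, which agrees with the test summary printed by the run.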

# 2 Configure Profile Reader

The Profile Reader component allows for reading an Insights profile into the framework. This optional component is primarily used to pass the reference profile to the Insights Tests component.


Refer to the [Profile Reader Documentation](https://objectstorage.us-ashburn-1.oraclecloud.com/p/52qrFSNgCH85OWPBGIfTgNm-KeibRU8oPSSBdDg_t90gZ89r5qXrQFpTfdvQ9ear/n/bigdatadatasciencelarge/b/ml-insight-doc/o/user_guide/getting_started/profile_reader_component.html)

In [15]:
def get_profile_reader():
    base_location ="input_data/profiles/profile_diabetes_reference.bin"
    profile_reader = LocalProfileReader(base_location)
    return profile_reader

# 3 Configure OCI Monitoring Post-processor

The OCI Monitoring post-processor component allows users to push the test results to OCI Monitoring Service.

NOTE: The post-processor component will fail unless the user is authenticated to push to OCI Monitoring. Before running the rest of the notebook, make sure that the arguments configured in the oci_monitoring_params dictionary in the code snippet below are valid for your use case.

For more details on the arguments passed to this post-processor, refer to the [OCI Monitoring Post-processor Documentation](https://objectstorage.us-ashburn-1.oraclecloud.com/p/52qrFSNgCH85OWPBGIfTgNm-KeibRU8oPSSBdDg_t90gZ89r5qXrQFpTfdvQ9ear/n/bigdatadatasciencelarge/b/ml-insight-doc/o/user_guide/getting_started/post_processor_component.html#ocimonitoringpostprocessor)

In [28]:
def get_post_processors():
    post_processors = []
    oci_monitoring_params = {
        "compartment_id": "ocid1.compartment.oc1..<ocid>", # Update with the compartment OCID you wish to use
        "namespace": "sample_ml_insights_tests",
        "dimensions": {"key": "value"},
        "is_critical": False,
        "auth": { # Update with the auth details you wish to configure
            "file_location": "~/.oci/config",
            "profile_name": "DEFAULT"
        }
    }
    post_processors.append(OCIMonitoringPostProcessor(
        compartment_id=oci_monitoring_params["compartment_id"],
        namespace=oci_monitoring_params["namespace"],
        dimensions=oci_monitoring_params["dimensions"],
        is_critical=oci_monitoring_params["is_critical"],
        auth=oci_monitoring_params["auth"]))
    return post_processors
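
Because the post-processor fails when authentication is not set up, it can be useful to inspect the config file named in the auth block before running the builder. The sketch below uses only the standard library and does not call the OCI SDK or the post-processor. `find_config_problems` and `REQUIRED_KEYS` are hypothetical names, and the key list assumes API-key authentication; other auth styles use different entries.

```python
import configparser
from pathlib import Path

# ASSUMPTION: entries expected in a profile that uses API-key authentication.
REQUIRED_KEYS = ("user", "fingerprint", "tenancy", "region", "key_file")


def find_config_problems(file_location="~/.oci/config", profile_name="DEFAULT"):
    """Return a list of human-readable problems; an empty list means none were found."""
    path = Path(file_location).expanduser()
    if not path.is_file():
        return [f"config file not found: {path}"]

    parser = configparser.ConfigParser(interpolation=None)
    parser.read(path)

    # configparser keeps [DEFAULT] separate from ordinary sections.
    if profile_name == parser.default_section:
        entries = parser.defaults()
    elif parser.has_section(profile_name):
        entries = parser[profile_name]
    else:
        return [f"profile {profile_name!r} not found in {path}"]

    return [f"missing key: {key}" for key in REQUIRED_KEYS if key not in entries]


problems = find_config_problems("~/.oci/config", "DEFAULT")
print(problems if problems else "no problems found in the config file")
```

This only checks that the file and its entries are present. It does not verify that the credentials are valid or that the IAM policies allow pushing metrics.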

# 4 Run Unified Insights Builder API

Create the InsightsBuilder object, which provides the core set of APIs for configuring monitoring behavior. The code snippet below introduces the unified InsightsBuilder API, which lets users configure tests alongside the components that were previously configurable via InsightsBuilder. The InsightsBuilder computes the profile, runs tests on the metric results based on the test config, and pushes the test results to OCI Monitoring via the OCI Monitoring post-processor.

Refer to the [Builder Object Documentation](https://objectstorage.us-ashburn-1.oraclecloud.com/p/52qrFSNgCH85OWPBGIfTgNm-KeibRU8oPSSBdDg_t90gZ89r5qXrQFpTfdvQ9ear/n/bigdatadatasciencelarge/b/ml-insight-doc/o/user_guide/getting_started/builder_object.html)


In [29]:
def main():
    # Set up the insights builder by passing: input schema, metrics, reader, reference profile reader, test config and post-processors
    # NOTE: The post-processor component will fail unless the user is authenticated to push to OCI Monitoring
    runner = InsightsBuilder(). \
        with_input_schema(get_input_schema()). \
        with_metrics(metrics=get_metrics()). \
        with_reader(reader=get_reader()). \
        with_reference_profile(profile=get_profile_reader()). \
        with_test_config(test_config=get_test_config()). \
        with_post_processors(post_processors=get_post_processors()). \
        build()

    # Run the evaluation
    run_result = runner.run()
    profile = run_result.profile
    test_results = run_result.test_results
    return profile, test_results

profile, test_results = main()
print("Profile data: ")
print(profile.to_pandas())
print("Test results data: ")
print(test_results)

['input_data/diabetes_prediction/2023-06-26/2023-06-26.csv', 'input_data/diabetes_prediction/2023-06-27/2023-06-27.csv']
Profile data: 
               Quartiles.q1  Quartiles.q2  Quartiles.q3       Mean  \
BloodPressure          62.0          72.0          80.0  69.134328   

               Count.total_count  Count.missing_count  \
BloodPressure              938.0                  0.0   

               Count.missing_count_percentage  
BloodPressure                             0.0  
Test results data: 
TestResults(test_summary=TestSummary(total_tests=5, passed_tests=2, failed_tests=3, error_test=0), test_results=[TestResult(name='TestGreaterThan', description='The Mean of feature BloodPressure is 69.13432835820896. Test condition : 69.13432835820896 > 100', status=<TestStatus.FAILED: 1>, test_config={'strictly': False}, test_assertion_info=TestAssertionInfo(expected=100, actual=69.13432835820896), error=TestError(description='', has_error=False), user_defined_tags={}, system_tags={'met

# 5 Verify Pushed Test Results in OCI Monitoring

At this stage, the test results have been sent to OCI Monitoring. For information on how these pushed test results can be visualized in OCI Monitoring and then used to configure alarms or dashboards, refer to the [OCI Monitoring Notifications Documentation](https://objectstorage.us-ashburn-1.oraclecloud.com/p/52qrFSNgCH85OWPBGIfTgNm-KeibRU8oPSSBdDg_t90gZ89r5qXrQFpTfdvQ9ear/n/bigdatadatasciencelarge/b/ml-insight-doc/o/user_guide/tutorials/notifications.html)
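
When looking up the pushed data in the Metrics Explorer or in an alarm definition, OCI Monitoring expects a Monitoring Query Language (MQL) expression. The sketch below formats one from the dimensions configured earlier. `build_mql` is a hypothetical helper, and the metric name is left as a placeholder because this notebook does not state the names that the post-processor emits. The namespace (sample_ml_insights_tests) and the compartment are selected separately from the expression.

```python
def build_mql(metric_name, dimensions, interval="1m", statistic="max"):
    """Format an MQL expression: metric[interval]{dimension = "value"}.statistic()."""
    selector = ", ".join(
        f'{key} = "{value}"' for key, value in sorted(dimensions.items())
    )
    # The dimension selector is omitted when no dimensions are given.
    dimension_part = f"{{{selector}}}" if selector else ""
    return f"{metric_name}[{interval}]{dimension_part}.{statistic}()"


# "<metric_name>" is a placeholder; replace it with a metric shown in the console.
print(build_mql("<metric_name>", {"key": "value"}))
```

The interval and statistic shown here are arbitrary choices for illustration. Pick the ones that match how often the notebook or job is run.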