# Writing Custom Metrics
Let's start by generating a simple synthetic dataset.

In [1]:
import warnings

warnings.filterwarnings("ignore")

In [2]:
from sdv import load_demo, SDV

sdv = SDV()
metadata, real_tables = load_demo(metadata=True)

sdv.fit(metadata, real_tables)
synthetic_tables = sdv.sample_all(20)

Next, we'll create an empty `MetricsReport` object to hold our custom metrics.

In [3]:
from sdmetrics.report import MetricsReport

report = MetricsReport()

## Generic Metric
The simplest way to create a custom metric is to use the generic metric API: write a generator function that yields a sequence of `Metric` objects, then pass its output to the report's `add_metrics` method.

In [4]:
from sdmetrics.report import Metric

def my_custom_metrics(metadata, real_tables, synthetic_tables):
    name = "abs-diff-in-number-of-rows"

    for table_name in real_tables:

        # Absolute difference in number of rows
        nb_real_rows = len(real_tables[table_name])
        nb_synthetic_rows = len(synthetic_tables[table_name])
        value = float(abs(nb_real_rows - nb_synthetic_rows))

        # Specify some useful tags for the user
        tags = set([
            "priority:high",
            "table:%s" % table_name
        ])

        yield Metric(name, value, tags)
        
report.add_metrics(my_custom_metrics(metadata, real_tables, synthetic_tables))
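The generator pattern above can be tried without installing anything. The following is a minimal sketch, not the library's implementation: the `Metric` namedtuple is a hypothetical stand-in for `sdmetrics.report.Metric` with only the three fields used above, and each table is assumed to be a plain list of rows rather than a DataFrame.

```python
from collections import namedtuple

# Hypothetical stand-in for sdmetrics' Metric (name, value, tags only).
Metric = namedtuple("Metric", ["name", "value", "tags"])


def row_count_metrics(real_tables, synthetic_tables):
    """Yield one Metric per table: the absolute difference in row counts."""
    for table_name, real_rows in real_tables.items():
        synthetic_rows = synthetic_tables[table_name]
        value = float(abs(len(real_rows) - len(synthetic_rows)))
        yield Metric("abs-diff-in-number-of-rows", value, {"table:%s" % table_name})


# Toy data: each table is just a list of row dicts (an assumption for this sketch).
real_tables = {
    "users": [{"id": i} for i in range(10)],
    "sessions": [{"id": i} for i in range(5)],
}
synthetic_tables = {
    "users": [{"id": i} for i in range(7)],
    "sessions": [{"id": i} for i in range(5)],
}

# Nothing is computed until the generator is consumed.
metrics = list(row_count_metrics(real_tables, synthetic_tables))
for metric in metrics:
    print(metric.name, metric.value, sorted(metric.tags))
```

Because the function is a generator, the consumer can pull metrics one at a time instead of building the full list up front.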

## Statistic Metric
Alternatively, if you're looking to create a statistical metric that looks at univariate or bivariate distributions, you can subclass the `UnivariateMetric` class (used here for the univariate case) and fill in a single function, `metric`. The base class identifies the columns whose data type is listed in `dtypes` and traverses the tables, so you can focus on the math.

In [5]:
from scipy.stats import chisquare

from sdmetrics.report import Goal
from sdmetrics.multivariate.statistical.univariate import UnivariateMetric
from sdmetrics.multivariate.statistical.utils import frequencies

class CSTest(UnivariateMetric):

    name = "chisquare"
    dtypes = ["object", "bool"]

    @staticmethod
    def metric(real_column, synthetic_column):
        """This function uses the Chi-squared test to compare the distributions
        of the two categorical columns. It returns the resulting p-value: a
        small value indicates that we can reject the null hypothesis, i.e. it
        suggests that the distributions are different.

        Arguments:
            real_column (np.ndarray): The values from the real database.
            synthetic_column (np.ndarray): The values from the fake database.

        Returns:
            (float, Goal, str, tuple): A tuple containing (value, goal, unit, domain)
            which corresponds to the fields in a Metric object.
        """
        f_obs, f_exp = frequencies(real_column, synthetic_column)
        statistic, pvalue = chisquare(f_obs, f_exp)
        return pvalue, Goal.MAXIMIZE, "p-value", (0.0, 1.0)

report.add_metrics(CSTest().metrics(metadata, real_tables, synthetic_tables))
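To make the math concrete, here is a minimal pure-Python sketch of the two steps the `metric` function relies on: aligning category frequencies across the two columns, then computing the Chi-squared statistic. The helper names `aligned_frequencies` and `chi_squared_statistic` are hypothetical, and the real `frequencies` helper in `sdmetrics` may align or normalize the counts differently. The p-value step is left to `scipy.stats.chisquare`, as in the cell above.

```python
from collections import Counter


def aligned_frequencies(real_column, synthetic_column):
    """Return (categories, f_obs, f_exp) as relative frequencies over a shared category list."""
    real_counts = Counter(real_column)
    synthetic_counts = Counter(synthetic_column)
    # Use the union of categories so both frequency lists have the same order and length.
    categories = sorted(set(real_counts) | set(synthetic_counts))
    f_obs = [synthetic_counts[c] / len(synthetic_column) for c in categories]
    f_exp = [real_counts[c] / len(real_column) for c in categories]
    return categories, f_obs, f_exp


def chi_squared_statistic(f_obs, f_exp):
    """Sum of (observed - expected)^2 / expected.

    Categories with zero expected frequency are skipped to avoid division
    by zero; this is a simplification made for the sketch.
    """
    return sum((o - e) ** 2 / e for o, e in zip(f_obs, f_exp) if e > 0)


real = ["a", "a", "b", "b"]
synthetic = ["a", "a", "a", "b"]

categories, f_obs, f_exp = aligned_frequencies(real, synthetic)
statistic = chi_squared_statistic(f_obs, f_exp)
print(categories, f_obs, f_exp, statistic)
```

Here the synthetic column treats the real column's frequencies as the expected distribution, which matches the order of the `f_obs, f_exp` arguments passed to `chisquare` above.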

## Detection Metric
Similarly, if you're looking to create a detection metric, you can subclass the `TabularDetector` class and fill in the `fit` and `predict_proba` functions. The base class will handle denormalizing parent-child relationships, etc. so you can focus on the machine learning.

In [6]:
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import RobustScaler
from sklearn.svm import SVC

from sdmetrics.multivariate.detection.tabular import TabularDetector


class SVCDetector(TabularDetector):

    name = "svc"

    def fit(self, X, y):
        """This function trains a sklearn pipeline with a robust scaler
        and a support vector classifier.

        Arguments:
            X (np.ndarray): The numerical features (i.e. transformed rows).
            y (np.ndarray): The binary classification target.
        """
        self.model = Pipeline([
            ('scaler', RobustScaler()),
            ('classifier', SVC(probability=True, gamma='scale')),
        ])
        self.model.fit(X, y)

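    def predict_proba(self, X):
        """This function returns, for each row, the probability assigned to the
        positive class (the second column of the pipeline's `predict_proba`).

        Arguments:
            X (np.ndarray): The numerical features (i.e. transformed rows).
        """
        return self.model.predict_proba(X)[:, 1]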

report.add_metrics(SVCDetector().metrics(metadata, real_tables, synthetic_tables))
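A detection metric measures how well a classifier can tell real rows from synthetic ones, and the report lists these results with the unit `auroc`. As a rough illustration of how scores like those returned by `predict_proba` reduce to a single number, here is a minimal pure-Python AUROC computation. The function name `auroc` and the labelling convention (0 and 1 for the two classes) are assumptions for this sketch; it does not show how `TabularDetector` computes its score internally.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive example
    receives a higher score than a randomly chosen negative one,
    counting ties as half a win.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one example of each class.")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy labels and classifier scores (made up for this sketch).
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

result = auroc(labels, scores)
print(result)
```

A value near 0.5 means the classifier does no better than chance at separating the two classes, while a value near 1.0 means they are easy to tell apart.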

## Collecting Metrics
Now that we've generated all the metrics, we can explore the value of each metric using the standard `MetricsReport` interface which allows users to summarize, visualize, and explore the metrics at various levels of granularity.

In [7]:
report.details()

| | Name | Value | Goal | Unit | Tables | Columns | Misc. Tags |
|---|---|---|---|---|---|---|---|
| 0 | abs-diff-in-number-of-rows | 10.0 | Goal.IGNORE | | table:users | | priority:high |
| 1 | abs-diff-in-number-of-rows | 9.0 | Goal.IGNORE | | table:sessions | | priority:high |
| 2 | abs-diff-in-number-of-rows | 11.0 | Goal.IGNORE | | table:transactions | | priority:high |
| 3 | chisquare | 0.999283 | Goal.MAXIMIZE | p-value | table:users | column:country | statistic:univariate |
| 4 | chisquare | 0.0 | Goal.MAXIMIZE | p-value | table:users | column:gender | statistic:univariate, priority:high |
| 5 | chisquare | 0.698933 | Goal.MAXIMIZE | p-value | table:sessions | column:device | statistic:univariate |
| 6 | chisquare | 0.714906 | Goal.MAXIMIZE | p-value | table:sessions | column:os | statistic:univariate |
| 7 | chisquare | 0.859781 | Goal.MAXIMIZE | p-value | table:transactions | column:approved | statistic:univariate |
| 8 | svc | 0.619048 | Goal.MINIMIZE | auroc | table:transactions | | detection:auroc |
| 9 | svc | 0.672619 | Goal.MINIMIZE | auroc | table:sessions | | detection:auroc |
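As one way to read these results, the sketch below copies the `chisquare` rows from the output above into plain tuples and flags the columns whose p-value falls under a threshold. The threshold `ALPHA = 0.05` is an assumed convention, not something the report sets, and the tuple layout is made up for this example rather than taken from the `MetricsReport` interface.

```python
# Rows copied from the report output above: (name, value, table, column).
rows = [
    ("chisquare", 0.999283, "users", "country"),
    ("chisquare", 0.0, "users", "gender"),
    ("chisquare", 0.698933, "sessions", "device"),
    ("chisquare", 0.714906, "sessions", "os"),
    ("chisquare", 0.859781, "transactions", "approved"),
]

ALPHA = 0.05  # assumed significance threshold for this sketch

# A small p-value suggests the real and synthetic distributions differ.
flagged = [
    (table, column)
    for name, value, table, column in rows
    if name == "chisquare" and value < ALPHA
]
print(flagged)
```

With the values shown in the report, only the `gender` column of the `users` table is flagged.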
