# CauseMe

The CauseMe platform, hosted at https://causeme.uv.es/, is an online benchmarking tool for causal discovery methods, that is, methods that detect causal associations in time series datasets. Such datasets may come from complex systems like the Earth's ecology or the human brain, where determining causality is difficult. CauseMe offers ground truth benchmark datasets: synthetic models that emulate real-world challenges and real-world datasets with known causal structure, varying in complexity and dimensionality.

Method developers can contribute in two ways: by developing new methods and assessing their performance on the available datasets, or by providing multivariate time series data with known causal ground truth. Developers upload their predictions of causal connections, and the platform evaluates and ranks the methods using several performance metrics. In this way CauseMe gives researchers access to data and evaluation techniques that support progress in causal discovery across fields.

## CauseMe and D2C

D2CPY can benefit from CauseMe's benchmark datasets for causal discovery. Using CauseMe's ground truth datasets and evaluation metrics, developers and data scientists working with D2CPY can validate and refine their causal discovery methods. The platform's datasets, covering both real-world and synthetic challenges, can help test the robustness of the D2C algorithm implemented in D2CPY.

With the code presented in the following section, library users can connect their D2CPY-based causal discovery methods to CauseMe. This allows them to run their algorithms on CauseMe's real and synthetic datasets, compare the outcomes with the ground truth causal structures, and assess the effectiveness and robustness of their approaches. The code bridges D2CPY and CauseMe's evaluation environment, streamlining testing and validation.
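
To make "comparing the outcomes with ground truth" concrete, here is a minimal sketch of scoring a matrix of link scores against a known binary adjacency matrix with the area under the ROC curve. It is only an illustration for local checks: the function name `auc_from_scores`, the choice of AUC, the exclusion of the diagonal, and the convention that entry `[i, j]` refers to the link from variable `i` to variable `j` are assumptions of this sketch, not a description of how CauseMe computes its metrics.

```python
import numpy as np


def auc_from_scores(val_matrix, truth_matrix):
    """ROC AUC of link scores against a binary ground-truth adjacency matrix.

    Only off-diagonal entries are compared (an assumption of this sketch).
    The AUC is computed as the fraction of (causal, non-causal) link pairs
    in which the causal link has the higher score; ties count as one half.
    """
    val = np.asarray(val_matrix, dtype=float)
    truth = np.asarray(truth_matrix).astype(bool)
    if val.shape != truth.shape or val.shape[0] != val.shape[1]:
        raise ValueError("both inputs must be square matrices of equal shape")

    off_diagonal = ~np.eye(val.shape[0], dtype=bool)
    scores = val[off_diagonal]
    labels = truth[off_diagonal]

    positives = scores[labels]
    negatives = scores[~labels]
    if positives.size == 0 or negatives.size == 0:
        raise ValueError("AUC needs at least one causal and one non-causal link")

    # Pairwise differences between every causal and every non-causal score
    diff = positives[:, None] - negatives[None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)


# Toy example with N = 3 variables and true links 0 -> 1 and 1 -> 2
truth = np.array([[0, 1, 0],
                  [0, 0, 1],
                  [0, 0, 0]])
scores = np.array([[0.0, 0.9, 0.2],
                   [0.1, 0.0, 0.8],
                   [0.3, 0.4, 0.0]])
print(auc_from_scores(scores, truth))
```

In the toy example both true links receive higher scores than every absent link, so the ranking is perfect. A matrix of constant scores would give 0.5.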

## The code

First, the user must register their method on the CauseMe platform to receive a `method_sha`, which must be included in the submission.
For submitting to the platform, a wrapper around a function `my_method()` is provided, so the user can focus on implementing that function. A minimal example is shown below.

```python
"""
This file must contain a function called my_method that triggers all the steps
required in order to obtain

 *val_matrix: mandatory, (N, N) matrix of scores for links
 *p_matrix: optional, (N, N) matrix of p-values for links; if not available,
            None must be returned
 *lag_matrix: optional, (N, N) matrix of time lags for links; if not available,
              None must be returned

Zip this file (together with other necessary files if you have further handmade
packages) to upload as a code.zip. You do NOT need to upload files for packages
that can be imported via pip or conda repositories. Once you upload your code,
we are able to validate results including runtime estimates on the same machine.
These results are then marked as "Validated" and users can use filters to only
show validated results.

Shown here is a D2C-based method: D2C descriptors are computed for the input
time series and classified by a random forest trained on precomputed descriptors.
"""
import sys
sys.path.append("..")  # make the local d2c package importable

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

from d2c.D2C import D2C


# Your method must be called 'my_method'
# Describe all parameters (except for 'data') in the method registration on CauseMe
def my_method(data, maxlags=1, correct_pvalues=True):
    # maxlags and correct_pvalues are kept so the wrapper can call this function
    # unchanged; this example does not use them

    # Input data is of shape (time, variables)
    T, N = data.shape

    data_df = pd.DataFrame(data)

    # Compute D2C descriptors for every candidate edge of the test series
    d2c_test = D2C([None], [data_df])
    X_test = d2c_test.compute_descriptors_no_dags()

    # Precomputed descriptors with known labels, used as training set
    training_data = pd.read_csv('./timeseries_training.csv')

    X_train = training_data.drop(['graph_id', 'edge_source', 'edge_dest', 'is_causal'], axis=1)
    y_train = training_data['is_causal']

    test_df = pd.DataFrame(X_test).drop(['graph_id', 'edge_source', 'edge_dest'], axis=1)
    clf = RandomForestClassifier(n_estimators=100, max_depth=2, random_state=0)
    clf.fit(X_train, y_train)

    # Hard 0/1 prediction for each candidate edge
    y_pred = clf.predict(test_df)

    returned = pd.concat([pd.DataFrame(X_test), pd.DataFrame(y_pred, columns=['is_causal'])], axis=1)
    of_interest = returned[['edge_source', 'edge_dest', 'is_causal']]

    val_matrix = np.zeros((N, N), dtype='float32')

    for _, row in of_interest.iterrows():
        # iterrows may upcast values to float, so cast the indices back to int
        source = int(row['edge_source'])
        dest = int(row['edge_dest'])
        val_matrix[source, dest] = row['is_causal']

    # No p-values or lags are produced by this method
    return val_matrix, None, None
```
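
The loop above fills `val_matrix` one edge at a time from hard 0/1 predictions. As a hedged alternative, the sketch below does the same with NumPy integer-array indexing and accepts any per-edge score. The helper name `edges_to_val_matrix` and its arguments are hypothetical and not part of D2CPY or CauseMe.

```python
import numpy as np


def edges_to_val_matrix(edge_source, edge_dest, scores, n_vars):
    """Build an (n_vars, n_vars) score matrix from a list of scored edges.

    Entry [source, dest] receives the score of the edge source -> dest;
    all edges that are not listed keep a score of zero.
    """
    src = np.asarray(edge_source).astype(int)
    dst = np.asarray(edge_dest).astype(int)
    values = np.asarray(scores, dtype="float32")
    if not (src.shape == dst.shape == values.shape):
        raise ValueError("edge_source, edge_dest and scores must have equal length")
    if src.size and (min(src.min(), dst.min()) < 0 or max(src.max(), dst.max()) >= n_vars):
        raise ValueError("edge index out of range for n_vars")

    val_matrix = np.zeros((n_vars, n_vars), dtype="float32")
    val_matrix[src, dst] = values  # one assignment for all edges
    return val_matrix


# Three scored edges between N = 3 variables
example = edges_to_val_matrix([0, 1, 2], [1, 2, 0], [0.5, 0.25, 0.0], n_vars=3)
print(example)
```

One possible use, assuming the labels in `is_causal` are 0 and 1, is to pass class probabilities such as `clf.predict_proba(test_df)[:, 1]` instead of hard predictions, so that links are ranked by a graded score rather than a binary one.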


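The wrapper code, which requires only minor modifications (method hash, method name, parameter values, and experiment), can be found below. It imports `my_method` from a module named `causeme_my_method`, so the file above should be saved as `causeme_my_method.py`. Datasets must be downloaded from the platform as needed and placed in the `experiments` folder. The wrapper produces an output file in the `results` folder, which must then be uploaded to the CauseMe platform.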

```python
"""
This script can be used to iterate over the datasets of a particular experiment.
Below you import your function "my_method" stored in the module causeme_my_method.

Importantly, you need to first register your method on CauseMe.
Then CauseMe will return a hash code that you use below to identify which method
you used. Of course, we cannot check how you generated your results, but we can
validate a result if you upload code. Users can filter the Ranking table to only
show validated results.
"""


# Imports
import numpy as np
import json
import zipfile
import bz2
import time

from causeme_my_method import my_method

# Setup a python dictionary to store method hash, parameter values, and results
results = {}

################################################
# Identify method and used parameters
################################################

# Method name just for file saving
method_name = 'varmodel-python'

# Insert method hash obtained from CauseMe after method registration
results['method_sha'] = "e182a71f4e1645a1b9ede10f615df88a"

# The only parameter here is the maximum time lag
maxlags = 1

# Parameter values: These are essential to validate your results
# provided that you also uploaded code
results['parameter_values'] = "maxlags=%d" % maxlags

#################################################
# Experiment details
#################################################
# Choose model and experiment as downloaded from causeme
results['model'] = 'linear-VAR'

# Here we choose the setup with N=3 variables and time series length T=150
experimental_setup = 'N-3_T-150'
results['experiment'] = results['model'] + '_' + experimental_setup

# Adjust save name if needed
save_name = '{}_{}_{}'.format(method_name,
                              results['parameter_values'],
                              results['experiment'])

# Setup directories (adjust to your needs)
experiment_zip = 'experiments/%s.zip' % results['experiment']
results_file = 'results/%s.json.bz2' % (save_name)

#################################################

# Start of script
scores = []
pvalues = []
lags = []
runtimes = []

# (Note that runtimes on causeme are only shown for validated results, this is more for
# your own assessment here)

# Loop over all datasets within an experiment
# Important note: The datasets need to be stored in the order of their filename
# extensions, hence they are sorted here
print("Load data")
with zipfile.ZipFile(experiment_zip, "r") as zip_ref:
    for name in sorted(zip_ref.namelist()):

        print("Run {} on {}".format(method_name, name))
        data = np.loadtxt(zip_ref.open(name))

        # Runtimes for your own assessment
        start_time = time.time()

        # Run your method (adapt parameters if needed)
        val_matrix, p_matrix, lag_matrix = my_method(data, maxlags)
        runtimes.append(time.time() - start_time)

        # Now we convert the matrices to the required format
        # and write the results file
        scores.append(val_matrix.flatten())

        # pvalues and lags are recommended for a more comprehensive method evaluation,
        # but not required. Then you can leave the dictionary field empty
        if p_matrix is not None: pvalues.append(p_matrix.flatten())
        if lag_matrix is not None: lags.append(lag_matrix.flatten())

# Store arrays as lists for json
results['scores'] = np.array(scores).tolist()
if len(pvalues) > 0: results['pvalues'] = np.array(pvalues).tolist()
if len(lags) > 0: results['lags'] = np.array(lags).tolist()
results['runtimes'] = np.array(runtimes).tolist()

# Save data
print('Writing results ...')
results_json = bytes(json.dumps(results), encoding='latin1')
with bz2.BZ2File(results_file, 'w') as mybz2:
    mybz2.write(results_json)
```
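
Before uploading, it can be useful to read the results file back and check it locally. The sketch below assumes only what the wrapper itself writes: a bz2-compressed JSON dictionary with the keys `method_sha`, `parameter_values`, `model`, `experiment` and `scores`, where each entry of `scores` is a flattened (N, N) matrix. The function names `load_results` and `find_problems` are hypothetical, and the checks are a local sanity test, not the validation performed by CauseMe.

```python
import bz2
import json

# Keys that the wrapper always writes
REQUIRED_KEYS = ("method_sha", "parameter_values", "model", "experiment", "scores")


def load_results(path):
    """Read a results file written by the wrapper (bz2-compressed JSON)."""
    with bz2.open(path, "rt", encoding="latin1") as handle:
        return json.load(handle)


def find_problems(results, n_vars):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = ["missing key: %s" % key for key in REQUIRED_KEYS if key not in results]

    scores = results.get("scores", [])
    expected = n_vars * n_vars
    for i, row in enumerate(scores):
        if len(row) != expected:
            problems.append("scores[%d] has %d entries, expected %d" % (i, len(row), expected))

    # Optional fields must cover the same number of datasets as the scores
    for key in ("pvalues", "lags"):
        if key in results and len(results[key]) != len(scores):
            problems.append("%s covers %d datasets, scores cover %d"
                            % (key, len(results[key]), len(scores)))
    return problems
```

For the experiment `linear-VAR_N-3_T-150` used above, a call such as `find_problems(load_results(results_file), n_vars=3)` should return an empty list if the file has the expected layout.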


For further instructions, see the CauseMe documentation.