<a id='top'></a>

# Scaling Up


<br>

 <center> <img src=img/scaling_up.png  width=65%> </center> 


## Notebook Overview
 
### <a href='#import'> 1) Imports </a>

### <a href='#config'> 2) Measurement and metric configuration file </a>

### <a href='#taskrunner'> 3) TaskRunner </a>

### <a href='#metrics'> 4) Metrics </a>

<a id='import'></a>

### Imports
[Jump to top](#top)

So far, we have explored the available measurements and analyses by running individual measurements of each type. Because the code was designed for the automated evaluation of information spread simulations during SocialSim challenge events, it also includes features for running many measurements in batch mode.

In [4]:
import socialsim as ss
import warnings
warnings.filterwarnings("ignore")

import pandas as pd
import numpy  as np
import pprint

import matplotlib.pyplot as plt

<a id='config'></a>

### The SocialSim config file format
[Jump to top](#top)

To run multiple measurements, the package uses a configuration file that specifies which measurements to include. Each measurement is nested under its platform, measurement type, and measurement scale. The config file has the following format:

In [5]:
config_schema = {
    "platform_name": {
        "measurement_type": {
            "measurement_scale": {
                "measurement_name": {
                    "measurement": "function_name",
                    "measurement_args": {
                        "arg1" : "arg1_value",
                        "arg2" : "arg2_value",
                    },
                    "metrics": {
                        "metric_name": {
                            "metric": "function_name",
                            "metric_args":{
                                "arg1": "arg1_value",
                                "arg2": "arg2_value"
                            }
                        }
                    }
                }
            }
        }
    }
}



For example, if we want to measure several network properties for Twitter and several multi-platform measurements for all of the input platforms, we could use the following configuration:

In [6]:
config_example = {
    "twitter": {
        "social_structure": {
            "population": {
                "number_of_nodes": {
                    "measurement": "number_of_nodes",
                },
                "degree_distribution": {
                    "measurement": "degree_distribution"
                }
            }
        }
    },
    "multi_platform": {
        "multi_platform": {
            "population": {
                "top_audience_reach":{
                    "measurement":"top_audience_reach",
                    "measurement_args":{
                        "k":5
                    }
                }
            },
            "node" : {
                "unique_users_over_time":{
                    "measurement":"unique_users_over_time",
                    "measurement_args":{
                        "node_level": True
                    }
                }
            }
        }
    }
}
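Because every measurement sits at the same depth (platform, type, scale, name), a config like the one above can be enumerated with four nested loops. The following is a minimal sketch and not part of the `socialsim` API: `iter_measurements` and `demo_config` are hypothetical names introduced only for illustration.

```python
def iter_measurements(config):
    """Yield (platform, measurement_type, scale, name, spec) for each config entry."""
    for platform, types in config.items():
        for measurement_type, scales in types.items():
            for scale, measurements in scales.items():
                for name, spec in measurements.items():
                    yield platform, measurement_type, scale, name, spec


# Hypothetical, trimmed-down config that follows the schema shown earlier.
demo_config = {
    "twitter": {
        "social_structure": {
            "population": {
                "number_of_nodes": {"measurement": "number_of_nodes"},
            }
        }
    },
    "multi_platform": {
        "multi_platform": {
            "population": {
                "top_audience_reach": {
                    "measurement": "top_audience_reach",
                    "measurement_args": {"k": 5},
                }
            }
        }
    },
}

# Collect one row per measurement; missing measurement_args default to {}.
measurement_rows = [
    (platform, mtype, scale, name, spec.get("measurement_args", {}))
    for platform, mtype, scale, name, spec in iter_measurements(demo_config)
]
for row in measurement_rows:
    print(row)
```

A listing like this can be a quick sanity check that each entry sits at the intended level before a long batch run.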

<a id='taskrunner'></a>

### The TaskRunner
[Jump to top](#top)

In [7]:
dataset_path = '../data/tutorial_multi-platform.json'
dataset = ss.load_data(dataset_path, verbose=False)

In [8]:
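# Build a runner from the dataset, the measurement config, and a MetaData object
task_runner = ss.TaskRunner(dataset, config_example, ss.MetaData())

# Run every measurement listed in the config; returns the outputs and a per-measurement log
results, logs = task_runner.get_results()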

SOCIALSIM TASKRUNNER   | Subsetting twitter data... Done.
SOCIALSIM TASKRUNNER   | Instantiating social_structure... Done.
SOCIALSIM MEASUREMENTS | Running social_structure population number_of_nodes... Done. (1.7e-05 seconds.)
SOCIALSIM MEASUREMENTS | Running social_structure population degree_distribution... Done. (0.009787 seconds.)
SOCIALSIM TASKRUNNER   | Subsetting multi_platform data... Done.
SOCIALSIM TASKRUNNER   | Instantiating multi_platform... Done.
SOCIALSIM MEASUREMENTS | Running multi_platform population top_audience_reach... Done. (0.121839 seconds.)
SOCIALSIM MEASUREMENTS | Running multi_platform node unique_users_over_time... Done. (1.350649 seconds.)


We can now access the measurement outputs from the results dictionary, which has the same nested structure as the input config file. For example, to look at the number of nodes in the network:

In [9]:
num_nodes = results['twitter']['social_structure']['population']['number_of_nodes']
print('{} Twitter nodes'.format(num_nodes))

2082 Twitter nodes


Or if we want to look at the time series of audience size for a specific piece of information:

In [10]:
time_series = results['multi_platform']['multi_platform']['node']['unique_users_over_time']
time_series['CVE-2015-6620'].head()

|   | nodeTime | value |
|---|---|---|
| 0 | 2015-01-01 | 0.0 |
| 1 | 2015-01-02 | 0.0 |
| 2 | 2015-01-03 | 0.0 |
| 3 | 2015-01-04 | 0.0 |
| 4 | 2015-01-05 | 0.0 |


We can also inspect the logs to check whether each measurement ran successfully.

In [11]:
pprint.pprint(logs)

{'multi_platform': {'multi_platform': {'node': {'unique_users_over_time': {'run_time': 1.350649,
                                                                           'status': 'success'}},
                                       'population': {'top_audience_reach': {'run_time': 0.121839,
                                                                             'status': 'success'}}}},
 'twitter': {'social_structure': {'population': {'degree_distribution': {'run_time': 0.009787,
                                                                         'status': 'success'},
                                                 'number_of_nodes': {'run_time': 1.7e-05,
                                                                     'status': 'success'}}}}}
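For larger configs, scanning the printed log by eye becomes tedious. The sketch below walks a log dictionary of the shape shown above and collects every entry whose status is not `'success'`. It is an illustration, not a `socialsim` function: `find_unsuccessful` and `demo_logs` are hypothetical, and the `'failed'` status string is made up for the demo, since only `'success'` appears in the output above.

```python
def find_unsuccessful(logs, path=()):
    """Return (path, entry) pairs for every log entry whose status is not 'success'."""
    failures = []
    for key, value in logs.items():
        if not isinstance(value, dict):
            continue  # skip scalar fields such as run_time
        if "status" in value:
            # Leaf entry: one measurement's log record.
            if value["status"] != "success":
                failures.append((path + (key,), value))
        else:
            # Intermediate level (platform, type, or scale): recurse.
            failures.extend(find_unsuccessful(value, path + (key,)))
    return failures


# Hypothetical log with one made-up failure for demonstration.
demo_logs = {
    "twitter": {
        "social_structure": {
            "population": {
                "number_of_nodes": {"run_time": 1.7e-05, "status": "success"},
                "degree_distribution": {"run_time": 0.0, "status": "failed"},
            }
        }
    }
}

unsuccessful = find_unsuccessful(demo_logs)
for entry_path, entry in unsuccessful:
    print(" > ".join(entry_path), entry["status"])
```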


<a id='metrics'></a>

### Running Metrics
[Jump to top](#top)

This tutorial has focused on *measuring* properties of information spread on social platforms, but one of the primary purposes of the package is to enable the *evaluation* of social simulations against known ground truth observations. We can update our previous configuration to specify which metrics to use when comparing the simulation with the ground truth.

<img src="img/socialsim_evaluation_approach.png?1" width="800"/>

In [12]:
config_example = {
    "twitter": {
        "social_structure": {
            "population": {
                "number_of_nodes": {
                    "measurement": "number_of_nodes",
                    "metrics": {
                        "absolute_difference": {
                            "metric": "absolute_difference",
                        },
                        "absolute_percentage_error": {
                            "metric": "absolute_percentage_error",
                        }
                    }
                },
                "degree_distribution": {
                    "measurement": "degree_distribution",
                    "metrics": {
                        "js_divergence": {
                            "metric": "js_divergence",
                            "metric_args": {
                                "discrete": True
                            }
                        }
                    }
                }
            }
        }
    },
    "multi_platform": {
        "multi_platform": {
            "population": {
                "top_audience_reach":{
                    "measurement":"top_audience_reach",
                    "measurement_args":{
                        "k":5
                    },
                    "metrics": {
                        "rbo": {
                            "metric": "rbo_score",
                        }
                    }
                }
            },
            "node": {
                "unique_users_over_time": {
                    "measurement": "unique_users_over_time",
                    "measurement_args": {
                        "node_level": True
                    },
                    "metrics": {
                        "rmse": {
                            "metric": "rmse",
                            "metric_args": {
                                "join": "outer"
                            }
                        },
                        "dtw": {
                            "metric": "fast_dtw",
                            "metric_args": {
                                "join": "outer"
                            }
                        }
                    }
                }
            }
        }
    }
}
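The `"join": "outer"` argument on the time-series metrics suggests that the two series are aligned on their timestamps before they are compared. The sketch below shows one plausible reading of that idea in plain Python. It is an assumption, not the package's documented behavior: `rmse_outer`, the zero `fill_value`, and the sample series are all hypothetical.

```python
import math


def rmse_outer(ground_truth, simulation, fill_value=0.0):
    """RMSE between two {timestamp: value} series after an outer join on timestamps."""
    # Outer join: keep every timestamp that appears in either series.
    timestamps = sorted(set(ground_truth) | set(simulation))
    if not timestamps:
        raise ValueError("both series are empty")
    # Timestamps missing from one side are filled with fill_value (assumed to be 0).
    squared_errors = [
        (ground_truth.get(t, fill_value) - simulation.get(t, fill_value)) ** 2
        for t in timestamps
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical daily series keyed by date string.
gt_series = {"2015-01-01": 0.0, "2015-01-02": 3.0}
sim_series = {"2015-01-02": 3.0, "2015-01-03": 4.0}

identical_rmse = rmse_outer(gt_series, gt_series)   # a series compared with itself
mismatch_rmse = rmse_outer(gt_series, sim_series)   # series covering different dates
print(identical_rmse, mismatch_rmse)
```

With an inner join, only the shared date would be compared and the extra simulated activity on the third day would go unpenalized, which is one reason an outer join may be preferred for this kind of comparison.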

In [13]:
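# Same dataset as before, now paired with the metrics-enabled config
task_runner = ss.TaskRunner(dataset, config_example, ss.MetaData())

# Pass the same dataset again so that it is compared against itself
results, logs = task_runner(dataset)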

SOCIALSIM TASKRUNNER   | Subsetting twitter data... Done.
SOCIALSIM TASKRUNNER   | Instantiating social_structure... Done.
SOCIALSIM MEASUREMENTS | Running social_structure population number_of_nodes... Done. (1.4e-05 seconds.)
SOCIALSIM MEASUREMENTS | Running social_structure population degree_distribution... Done. (0.010068 seconds.)
SOCIALSIM TASKRUNNER   | Subsetting multi_platform data... Done.
SOCIALSIM TASKRUNNER   | Instantiating multi_platform... Done.
SOCIALSIM MEASUREMENTS | Running multi_platform population top_audience_reach... Done. (0.07717 seconds.)
SOCIALSIM MEASUREMENTS | Running multi_platform node unique_users_over_time... Done. (1.314047 seconds.)


Our results should now contain the simulation measurement outputs, the ground truth measurement outputs, and the results of the metric comparisons. Because we are comparing the same data against itself, we expect perfect agreement.

In [14]:
sim_result = results['simulation_results']['twitter']['social_structure']['population']['number_of_nodes']
ground_truth_result = results['ground_truth_results']['twitter']['social_structure']['population']['number_of_nodes']
print('Simulation: {} nodes\nGround Truth: {} nodes'.format(sim_result,ground_truth_result))

Simulation: 2082 nodes
Ground Truth: 2082 nodes
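To make "perfect agreement" concrete for scalar measurements such as the node count, here is a minimal sketch of the two scalar metrics named in the config, using their textbook definitions. These functions are written for illustration only; the package's own implementations may differ in scaling, argument order, or edge-case handling.

```python
def absolute_difference(ground_truth, simulation):
    """Textbook absolute difference between two scalar measurements."""
    return abs(simulation - ground_truth)


def absolute_percentage_error(ground_truth, simulation):
    """Textbook absolute percentage error, expressed in percent."""
    if ground_truth == 0:
        raise ZeroDivisionError("percentage error is undefined for a zero ground truth")
    return 100.0 * abs(simulation - ground_truth) / abs(ground_truth)


# Identical values, as in the run above: both metrics are zero.
same_diff = absolute_difference(2082, 2082)
same_ape = absolute_percentage_error(2082, 2082)
print(same_diff, same_ape)  # 0 0.0

# A hypothetical simulation that undercounts nodes gives nonzero values.
off_diff = absolute_difference(2082, 2000)
off_ape = absolute_percentage_error(2082, 2000)
print(off_diff, off_ape)
```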


In [15]:
pprint.pprint(results['metrics'])

{'multi_platform': {'multi_platform': {'node': {'unique_users_over_time': {'dtw': {'CVE-2015-0235': 0.0,
                                                                                   'CVE-2015-1805': 0.0,
                                                                                   'CVE-2015-3864': 0.0,
                                                                                   'CVE-2015-6620': 0.0,
                                                                                   'CVE-2016-0777': 0.0,
                                                                                   'CVE-2016-10033': 0.0,
                                                                                   'CVE-2017-0037': 0.0,
                                                                                   'CVE-2017-0059': 0.0,
                                                                                   'CVE-2017-0199': 0.0,
                                                      

In [16]:
pprint.pprint(logs,indent=0)

{'ground_truth_logs': {'multi_platform': {'multi_platform': {'node': {'unique_users_over_time': {'run_time': 1.314047,
                                                                                           'status': 'success'}},
                                                         'population': {'top_audience_reach': {'run_time': 0.07717,
                                                                                             'status': 'success'}}}},
                     'twitter': {'social_structure': {'population': {'degree_distribution': {'run_time': 0.010068,
                                                                                         'status': 'success'},
                                                                  'number_of_nodes': {'run_time': 1.4e-05,
                                                                                     'status': 'success'}}}}},
'metrics_logs': {'multi_platform': {'multi_platform': {'node': {'unique_users_over_time':