# Compare Randomised Datasets to Observed
Here we assess the significance of the transition probabilities observed in our recorded combis by comparing them with the transition probabilities obtained from 10,000 random simulations of combi arrays (i.e. Monte Carlo simulations). Significance is calculated from the relative rank of each observed statistic (transition probability) among the sample values from the Monte Carlo randomization.

An observed transition probability is considered significantly different from random if it falls in the extreme 5% of either the lower or the upper tail of the Monte Carlo randomizations.

For example, in our dataset of recorded combis, the probability of transitioning from a call in group 0 to another call in the same group is 0.33. This statistic is deemed significant if it is either greater or lower than at least 95% of the probabilities for the same transition in the Monte Carlo randomizations.
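
The ranking described above can be sketched as follows. This is a minimal illustration, not the notebook's own cells: `tail_p_values` is a hypothetical helper name, the simulated values are synthetic stand-ins drawn from a uniform distribution, and the 0.05 threshold applied to each tail is an assumption based on the 95% criterion stated above.

```python
import numpy as np

def tail_p_values(observed, simulated):
    """Return (lower, upper) Monte Carlo tail probabilities for one statistic."""
    simulated = np.asarray(simulated, dtype=float)
    n = simulated.size
    # The +1 in numerator and denominator counts the observed value as one
    # member of the reference set, so a tail probability is never exactly 0.
    lower = (np.sum(simulated <= observed) + 1) / (n + 1)
    upper = (np.sum(simulated >= observed) + 1) / (n + 1)
    return lower, upper

# Synthetic stand-in for 10,000 simulated transition probabilities
# (illustrative values only, NOT the study's data).
rng = np.random.default_rng(0)
simulated = rng.uniform(0.2, 0.5, size=10_000)

lower, upper = tail_p_values(0.33, simulated)
alpha = 0.05  # assumed per-tail threshold
is_significant = (lower <= alpha) or (upper <= alpha)
print(f"lower tail = {lower:.4f}, upper tail = {upper:.4f}, significant = {is_significant}")
```

A small lower-tail value means the observed probability is lower than almost all simulated ones; a small upper-tail value means it is higher.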

In [1]:
from tqdm.auto import tqdm

In [2]:
import avgn

In [3]:
import pandas as pd
import numpy as np
from avgn.utils.paths import DATA_DIR, ensure_dir, FIGURE_DIR

In [4]:
from scipy.stats import kruskal

In [5]:
DATASET_ID = "git_repos_call"

In [6]:
DT_ID = '2022-03-12_17-46-00'

In [7]:
randdfs = pd.read_pickle(DATA_DIR / DATASET_ID / 'Monte_Carlo_Calls' /  'random_combi_simulations.pickle')
randdfs[:3]

Unnamed: 0,0-0,0-1,0-2,1-0,1-1,1-2,2-0,2-1,2-2,rand_run
0,0.353448,0.068966,0.577586,0.428571,0.0,0.571429,0.392344,0.038278,0.569378,0
0,0.362069,0.060345,0.577586,0.470588,0.0,0.529412,0.368932,0.043689,0.587379,1
0,0.293651,0.047619,0.65873,0.538462,0.0,0.461538,0.36,0.06,0.58,2


In [8]:
len(randdfs)

10000

# Original Transition Probs

In [9]:
trans_origdf = pd.read_pickle(DATA_DIR / DATASET_ID / 'Monte_Carlo_Calls' /  'observed_call_transitions.pickle')
trans_origdf

Unnamed: 0,0-0,0-1,0-2,1-0,1-1,1-2,2-0,2-1,2-2
0,0.333333,0.0,0.666667,0.0,0.0,1.0,0.648387,0.009677,0.341935


## 0-0
- Transition between calls within the same group (calls containing an LH segment)

In [10]:
trans_origdf["0-0"].values

array([0.33333333])

In [11]:
x = 0.33333333
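
Because `x` is typed by hand, it is a truncated copy of the stored value. If a simulated probability happens to equal the observed one exactly, a truncated literal can move that tie from one tail to the other. The sketch below uses toy values (hypothetical, not taken from `randdfs`) to show the effect. Whether it changes any count reported in this notebook cannot be determined from the outputs shown. Reading the value directly, for instance with `trans_origdf["0-0"].values[0]`, would avoid retyping it.

```python
import numpy as np

# Toy simulated values (hypothetical); one of them ties the observed 1/3 exactly.
simulated = np.array([0.2, 1 / 3, 0.5])

x_exact = 1 / 3          # the float64 value a stored 1/3 would hold
x_rounded = 0.33333333   # a hand-typed, truncated copy

n_le_exact = int(np.sum(simulated <= x_exact))      # the tie is counted
n_le_rounded = int(np.sum(simulated <= x_rounded))  # the tie is missed
print(n_le_exact, n_le_rounded)  # prints: 2 1
```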

In [12]:
rand_prob = np.array(randdfs["0-0"].values)

In [13]:
min(rand_prob)

0.21666666666666667

In [14]:
max(rand_prob)

0.5225225225225225

In [15]:
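def min_monte_carlo(x, rand_prob, tot_samples):
    # Lower-tail probability: share of simulated values <= the observed value x
    r = rand_prob <= x
    # +1 in numerator and denominator counts the observed value itself
    return np.divide((np.sum(r)+1), (tot_samples + 1))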

In [16]:
##0-0
results_min = min_monte_carlo(x=x, rand_prob=rand_prob, tot_samples = 10000)
results_min

0.22257774222577742

In [17]:
np.sum(rand_prob <= x)

2225

In [18]:
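def max_monte_carlo(x, rand_prob, tot_samples):
    # Upper-tail probability: share of simulated values >= the observed value x
    r = rand_prob >= x
    # +1 in numerator and denominator counts the observed value itself
    return np.divide((np.sum(r)+1), (tot_samples + 1))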

In [19]:
##0-0
results_max = max_monte_carlo(x=x, rand_prob=rand_prob, tot_samples = 10000)
results_max

0.7775222477752225

In [20]:
np.sum(rand_prob >= x)

7775

In [21]:
### Original Transition Probability not significantly different from random

In [22]:
sig = {"Transition":["0-0"], "Orig_Prob": [x], "Lower Tail": [results_min], "Upper Tail": [results_max]}
sigdf = pd.DataFrame(sig)
sigdf

Unnamed: 0,Transition,Orig_Prob,Lower Tail,Upper Tail
0,0-0,0.333333,0.222578,0.777522


## 0-1
- Transition from calls containing an LH segment to lone NL segment calls 

In [23]:
trans_origdf["0-1"]

0    0.0
Name: 0-1, dtype: float64

In [24]:
x = 0.0
rand_prob = np.array(randdfs["0-1"].values)

In [25]:
min(rand_prob)

0.0

In [26]:
max(rand_prob)

0.11570247933884298

In [27]:
##0-1
results_min = min_monte_carlo(x=x, rand_prob=rand_prob, tot_samples = 10000)
results_min

0.0020997900209979003

In [28]:
np.sum(rand_prob <= x)

20

In [29]:
##0-1
results_max = max_monte_carlo(x=x, rand_prob=rand_prob, tot_samples = 10000)
results_max

1.0

In [30]:
np.sum(rand_prob >= x)

10000

The observed 0-1 transition probability (0.0) falls in the extreme 5% of the lower tail. Only 20 of the 10,000 randomly generated transition probabilities are equal to the observed value, and none can be lower, giving a lower-tail probability of about 0.0021.

In [31]:
sig = pd.DataFrame({"Transition":["0-1"], "Orig_Prob": [x], "Lower Tail": [results_min], "Upper Tail": [results_max]})
sigdf = pd.concat([sigdf, sig], axis = 0)
sigdf

Unnamed: 0,Transition,Orig_Prob,Lower Tail,Upper Tail
0,0-0,0.333333,0.222578,0.777522
0,0-1,0.0,0.0021,1.0


## 0-2
- Transition from calls containing an LH segment to calls comprising lone DS or SH segments, or those segments and/or NL in any combination 

In [32]:
trans_origdf["0-2"]

0    0.666667
Name: 0-2, dtype: float64

In [33]:
min(randdfs["0-2"])

0.43333333333333335

In [34]:
max(randdfs["0-2"])

0.7355371900826446

In [35]:
x = 0.666667
rand_prob = np.array(randdfs["0-2"].values)

In [36]:
##Lower tail probability
results_min = min_monte_carlo(x=x, rand_prob=rand_prob, tot_samples = 10000)
results_min

0.9771022897710229

In [37]:
np.sum(rand_prob <= x)

9771

9771 simulations with a transition probability below or equal to the original

In [38]:
##Upper tail probability
results_max = max_monte_carlo(x=x, rand_prob=rand_prob, tot_samples = 10000)
results_max

0.022997700229977002

In [39]:
np.sum(rand_prob >= x)

229

229 simulations with a transition probability greater than or equal to the original

In [40]:
sig = pd.DataFrame({"Transition":["0-2"], "Orig_Prob": [x], "Lower Tail": [results_min], "Upper Tail": [results_max]})
sigdf = pd.concat([sigdf, sig], axis = 0)
sigdf

Unnamed: 0,Transition,Orig_Prob,Lower Tail,Upper Tail
0,0-0,0.333333,0.222578,0.777522
0,0-1,0.0,0.0021,1.0
0,0-2,0.666667,0.977102,0.022998


## 1-0
- Transition from lone NL segments to calls containing an LH segment 

In [41]:
trans_origdf["1-0"]

0    0.0
Name: 1-0, dtype: float64

In [42]:
min(randdfs["1-0"])

0.0

In [43]:
max(randdfs["1-0"])

0.8181818181818182

In [44]:
x = 0.0
rand_prob = np.array(randdfs["1-0"].values)

In [45]:
##Lower tail probability
results_min = min_monte_carlo(x=x, rand_prob=rand_prob, tot_samples = 10000)
results_min

0.001999800019998

In [46]:
np.sum(rand_prob <= x)

19

In [47]:
##Upper tail probability
results_max = max_monte_carlo(x=x, rand_prob=rand_prob, tot_samples = 10000)
results_max

1.0

In [48]:
np.sum(rand_prob >= x)

10000

In [49]:
# 19 simulations have a transition probability equal to 0 (the original transition probability for 1-0)

In [50]:
sig = pd.DataFrame({"Transition":["1-0"], "Orig_Prob": [x], "Lower Tail": [results_min], "Upper Tail": [results_max]})
sigdf = pd.concat([sigdf, sig], axis = 0)
sigdf

Unnamed: 0,Transition,Orig_Prob,Lower Tail,Upper Tail
0,0-0,0.333333,0.222578,0.777522
0,0-1,0.0,0.0021,1.0
0,0-2,0.666667,0.977102,0.022998
0,1-0,0.0,0.002,1.0


## 1-1
- Transition from lone NL segments to calls within the same group 

In [51]:
trans_origdf["1-1"]

0    0.0
Name: 1-1, dtype: float64

In [52]:
min(randdfs["1-1"])

0.0

In [53]:
max(randdfs["1-1"])

0.42857142857142855

In [54]:
x = 0
rand_prob = np.array(randdfs["1-1"].values)

In [55]:
##Lower tail probability
results_min = min_monte_carlo(x=x, rand_prob=rand_prob, tot_samples = 10000)
results_min

0.48525147485251474

In [56]:
np.sum(rand_prob <= x)

4852

In [57]:
##Upper tail probability
results_max = max_monte_carlo(x=x, rand_prob=rand_prob, tot_samples = 10000)
results_max

1.0

In [58]:
np.sum(rand_prob >= x)

10000

In [59]:
# Almost half of the simulations show a transition probability equal to 0 for the NL to NL transition

In [59]:
sig = pd.DataFrame({"Transition":["1-1"], "Orig_Prob": [x], "Lower Tail": [results_min], "Upper Tail": [results_max]})
sigdf = pd.concat([sigdf, sig], axis = 0)
sigdf

Unnamed: 0,Transition,Orig_Prob,Lower Tail,Upper Tail
0,0-0,0.333333,0.222578,0.777522
0,0-1,0.0,0.0021,1.0
0,0-2,0.666667,0.977102,0.022998
0,1-0,0.0,0.002,1.0
0,1-1,0.0,0.485251,1.0


## 1-2
- Transition from lone NL segments to calls comprising lone DS or SH segments, or those segments and/or NL in any combination 

In [60]:
trans_origdf["1-2"]

0    1.0
Name: 1-2, dtype: float64

In [61]:
min(randdfs["1-2"])

0.14285714285714285

In [62]:
max(randdfs["1-2"])

1.0

In [63]:
x = 1
rand_prob = np.array(randdfs["1-2"].values)

In [64]:
##Lower tail probability
results_min = min_monte_carlo(x=x, rand_prob=rand_prob, tot_samples = 10000)
results_min

1.0

In [65]:
np.sum(rand_prob <= x)

10000

In [66]:
##Upper tail probability
results_max = max_monte_carlo(x=x, rand_prob=rand_prob, tot_samples = 10000)
results_max

0.0006999300069993001

In [67]:
np.sum(rand_prob >= x)

6

In [68]:
# 6 simulations show transition prob equal to 1

In [69]:
sig = pd.DataFrame({"Transition":["1-2"], "Orig_Prob": [x], "Lower Tail": [results_min], "Upper Tail": [results_max]})
sigdf = pd.concat([sigdf, sig], axis = 0)
sigdf

Unnamed: 0,Transition,Orig_Prob,Lower Tail,Upper Tail
0,0-0,0.333333,0.222578,0.777522
0,0-1,0.0,0.0021,1.0
0,0-2,0.666667,0.977102,0.022998
0,1-0,0.0,0.002,1.0
0,1-1,0.0,0.485251,1.0
0,1-2,1.0,1.0,0.0007


## 2-0
- Transition from calls comprising lone DS or SH segments, or those segments and/or NL in any combination, to calls containing an LH segment

In [70]:
trans_origdf["2-0"]

0    0.648387
Name: 2-0, dtype: float64

In [71]:
min(randdfs["2-0"])

0.26666666666666666

In [72]:
max(randdfs["2-0"])

0.46798029556650245

In [73]:
x = 0.648387
rand_prob = np.array(randdfs["2-0"].values)

In [74]:
##Lower tail probability
results_min = min_monte_carlo(x=x, rand_prob=rand_prob, tot_samples = 10000)
results_min

1.0

In [75]:
np.sum(rand_prob <= x)

10000

In [76]:
##Upper tail probability
results_max = max_monte_carlo(x=x, rand_prob=rand_prob, tot_samples = 10000)
results_max

9.999000099990002e-05

In [77]:
np.sum(rand_prob >= x)

0

In [78]:
## All random probs less than observed for 2-0

In [79]:
sig = pd.DataFrame({"Transition":["2-0"], "Orig_Prob": [x], "Lower Tail": [results_min], "Upper Tail": [results_max]})
sigdf = pd.concat([sigdf, sig], axis = 0)
sigdf

Unnamed: 0,Transition,Orig_Prob,Lower Tail,Upper Tail
0,0-0,0.333333,0.222578,0.777522
0,0-1,0.0,0.0021,1.0
0,0-2,0.666667,0.977102,0.022998
0,1-0,0.0,0.002,1.0
0,1-1,0.0,0.485251,1.0
0,1-2,1.0,1.0,0.0007
0,2-0,0.648387,1.0,0.0001


## 2-1
- Transition from calls comprising lone DS or SH segments, or those segments and/or NL in any combination, to lone NL segment calls

In [80]:
trans_origdf["2-1"]

0    0.009677
Name: 2-1, dtype: float64

In [81]:
min(randdfs["2-1"])

0.00980392156862745

In [82]:
max(randdfs["2-1"])

0.09852216748768473

In [83]:
x = 0.009677
rand_prob = np.array(randdfs["2-1"].values)

In [84]:
##Lower tail probability
results_min = min_monte_carlo(x=x, rand_prob=rand_prob, tot_samples = 10000)
results_min

9.999000099990002e-05

In [85]:
np.sum(rand_prob <= x)

0

In [86]:
##Upper tail probability
results_max = max_monte_carlo(x=x, rand_prob=rand_prob, tot_samples = 10000)
results_max

1.0

In [87]:
np.sum(rand_prob >= x)

10000

In [88]:
### All random probs greater than observed for 2-1

In [89]:
sig = pd.DataFrame({"Transition":["2-1"], "Orig_Prob": [x], "Lower Tail": [results_min], "Upper Tail": [results_max]})
sigdf = pd.concat([sigdf, sig], axis = 0)
sigdf

Unnamed: 0,Transition,Orig_Prob,Lower Tail,Upper Tail
0,0-0,0.333333,0.222578,0.777522
0,0-1,0.0,0.0021,1.0
0,0-2,0.666667,0.977102,0.022998
0,1-0,0.0,0.002,1.0
0,1-1,0.0,0.485251,1.0
0,1-2,1.0,1.0,0.0007
0,2-0,0.648387,1.0,0.0001
0,2-1,0.009677,0.0001,1.0


## 2-2
- Transition from calls comprising lone DS or SH segments, or those segments and/or NL in any combination, to calls within the same group

In [90]:
trans_origdf["2-2"]

0    0.341935
Name: 2-2, dtype: float64

In [91]:
min(randdfs["2-2"])

0.48792270531400966

In [92]:
max(randdfs["2-2"])

0.6871794871794872

In [93]:
x = 0.341935
rand_prob = np.array(randdfs["2-2"].values)

In [94]:
##Lower tail probability
results_min = min_monte_carlo(x=x, rand_prob=rand_prob, tot_samples = 10000)
results_min

9.999000099990002e-05

In [95]:
np.sum(rand_prob <= x)

0

In [96]:
##Upper tail probability
results_max = max_monte_carlo(x=x, rand_prob=rand_prob, tot_samples = 10000)
results_max

1.0

In [97]:
np.sum(rand_prob >= x)

10000

In [98]:
### All random probs higher than observed for 2-2

In [99]:
sig = pd.DataFrame({"Transition":["2-2"], "Orig_Prob": [x], "Lower Tail": [results_min], "Upper Tail": [results_max]})
sigdf = pd.concat([sigdf, sig], axis = 0)
sigdf

Unnamed: 0,Transition,Orig_Prob,Lower Tail,Upper Tail
0,0-0,0.333333,0.222578,0.777522
0,0-1,0.0,0.0021,1.0
0,0-2,0.666667,0.977102,0.022998
0,1-0,0.0,0.002,1.0
0,1-1,0.0,0.485251,1.0
0,1-2,1.0,1.0,0.0007
0,2-0,0.648387,1.0,0.0001
0,2-1,0.009677,0.0001,1.0
0,2-2,0.341935,0.0001,1.0
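
The summary table above can be turned into an explicit verdict per transition. The sketch below rebuilds the table from the values printed above and adds a `Result` column. The column name, the `direction` helper and the 0.05 per-tail threshold are assumptions made for illustration, following the 95% criterion stated in the introduction.

```python
import pandas as pd

# Tail probabilities copied from the summary table above.
sigdf = pd.DataFrame({
    "Transition": ["0-0", "0-1", "0-2", "1-0", "1-1", "1-2", "2-0", "2-1", "2-2"],
    "Orig_Prob": [0.333333, 0.0, 0.666667, 0.0, 0.0, 1.0, 0.648387, 0.009677, 0.341935],
    "Lower Tail": [0.222578, 0.0021, 0.977102, 0.002, 0.485251, 1.0, 1.0, 0.0001, 0.0001],
    "Upper Tail": [0.777522, 1.0, 0.022998, 1.0, 1.0, 0.0007, 0.0001, 1.0, 1.0],
})

ALPHA = 0.05  # assumed per-tail threshold

def direction(row):
    """Label a transition by the tail (if any) in which it is extreme."""
    if row["Lower Tail"] <= ALPHA:
        return "lower than random"
    if row["Upper Tail"] <= ALPHA:
        return "higher than random"
    return "not significant"

sigdf["Result"] = sigdf.apply(direction, axis=1)
print(sigdf[["Transition", "Result"]].to_string(index=False))
```

Under this threshold, 0-0 and 1-1 are not distinguishable from random, 0-2, 1-2 and 2-0 occur more often than in the randomizations, and 0-1, 1-0, 2-1 and 2-2 occur less often.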


In [None]:
#Contains LH Segment: 0 (MS = 2)
#NL Segment Alone:1 (MS = 3)
#Other Calls:2 (MS = 1)
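
The per-transition cells above repeat the same calculation nine times. A compact alternative is sketched below: it computes both tails for a list of transition columns in one pass. `monte_carlo_tails` is a hypothetical helper, and the toy data are synthetic. With the notebook's objects, one would presumably pass `trans_origdf.iloc[0]` and `randdfs` with the nine transition column names, but that usage is an assumption about their layout.

```python
import numpy as np
import pandas as pd

def monte_carlo_tails(observed, simulated, columns):
    """Lower/upper Monte Carlo tail probabilities for each listed column.

    observed  : pd.Series indexed by transition name
    simulated : pd.DataFrame with one row per randomization
    columns   : transition names to evaluate
    """
    sims = simulated[columns].to_numpy(dtype=float)  # shape (n_runs, n_transitions)
    obs = observed[columns].to_numpy(dtype=float)    # shape (n_transitions,)
    n = sims.shape[0]
    # Broadcasting compares every simulated row against the observed values.
    lower = ((sims <= obs).sum(axis=0) + 1) / (n + 1)
    upper = ((sims >= obs).sum(axis=0) + 1) / (n + 1)
    return pd.DataFrame({
        "Transition": columns,
        "Orig_Prob": obs,
        "Lower Tail": lower,
        "Upper Tail": upper,
    })

# Synthetic toy data (NOT the study's simulations).
rng = np.random.default_rng(1)
toy_sims = pd.DataFrame({
    "0-0": rng.uniform(0.2, 0.5, 1000),
    "0-1": rng.uniform(0.0, 0.1, 1000),
    "rand_run": np.arange(1000),
})
toy_obs = pd.Series({"0-0": 0.33, "0-1": 0.0})

summary = monte_carlo_tails(toy_obs, toy_sims, ["0-0", "0-1"])
print(summary)
```

Building the table in one call also gives it a clean 0..n-1 index, whereas repeated `pd.concat` without `ignore_index=True` leaves every row labelled 0.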