Goal: To be able to pull apart the influence of independent and correlated decisions.

Method: Track each variable separated by k (number of timesteps with independent observations). 

Produced: 3x3 plot with 
    Row: exact, over, both; 
    Col: First decider accuracy, percent trials, avgTime
    
    In the legends, the accuracy with all trials averaged together regardless of k value is black and is labeled 'all K'.
    
Variables:
- n: number of agents
- c: the probability of having a correlated observation on a single timestep
- p: probability of a positive observation. Coded as pc (the p for correlated observations) and p1c (the p for independent observations); p1c is short for p_(1-c), where (1-c) is the probability of an independent observation
- q: the probability of a negative observation. Here, we restrict to either q = p/e or q = p/e^2
- k: the number of timesteps with independent observations. 
- th: short for theta. The size of the threshold. 


Dependent variables:
- update size: for an observation $x$, the update is the log-likelihood ratio $\log \frac{P(x | H^+)}{P(x | H^-)}$, which equals $\log \frac{p}{q}$ (positive observation) or $\log \frac{q}{p}$ (negative observation). In the code, mc is the (positive) update size for correlated observations and m1c is the (positive) update size for independent observations
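
As a quick check of how the two allowed choices of q relate to the update sizes used below, here is a minimal sketch. The helper name `update_size` is hypothetical and not part of the notebook's code; it only evaluates $\log \frac{p}{q}$ for the two restrictions listed under Variables.

```python
import math

def update_size(p, q):
    """Positive update size log(p/q); hypothetical helper for illustration."""
    return math.log(p / q)

p = 0.3  # one of the p values used in the graphs
for label, q in [("q = p/e", p / math.e), ("q = p/e^2", p / math.e**2)]:
    # Rounded only to hide floating-point noise in the printout.
    print(label, "-> update size", round(update_size(p, q), 6))
```

So q = p/e corresponds to an update size of 1 and q = p/e^2 to an update size of 2, independent of the value of p.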

Sections:
1) Because we pulled apart the results by the size of k, some parameter sets have very few trials. How many is too few? What are the effects of thresholding a percentage of total trials? How does this affect the appearance of the data? Is anything meaningful lost in the process?

2) Keeping to the original update size (updates in increments of 1) and keeping p the same for both correlated and independent observations, what is the effect of the size of p on behavior with respect to k? Does this vary with n? 

3) What is the effect of varying update size? 

4) But what if we have an even threshold? Now, section 2 repeated for threshold = 6.

5) Section 3, for threshold = 6

# 1) Should we show all data? What about data with very few trials?
Graphs:
- 1a: no data points removed
- 1b: data points with fewer than 17 trials removed (0.01 %)
- 1c: fewer than 175 removed (0.1 %)
- 1d: fewer than 437 removed (0.25 %)
- 1e: fewer than 875 removed (0.5 %)
- 1f: fewer than 1750 removed (1.0 %)

Question: In displaying later graphs, should we remove datapoints for clarity? If so, how should we threshold?
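
The trial counts in the graph list follow from the percentages once a total number of trials is fixed. The sketch below assumes a total of 175,000 trials, which is inferred from "fewer than 1750 removed (1.0 %)" and is not stated explicitly in this notebook; `trial_cutoff` is a hypothetical helper name.

```python
from fractions import Fraction
import math

TOTAL_TRIALS = 175_000  # assumption: inferred from "1750 trials = 1.0 %"

def trial_cutoff(fraction, total=TOTAL_TRIALS):
    """Whole-trial cutoff for a fractional threshold, rounded down.

    Fraction is built from a string so the arithmetic is exact.
    """
    return math.floor(Fraction(fraction) * total)

thresholds = [("1b", "0.0001"), ("1c", "0.001"), ("1d", "0.0025"),
              ("1e", "0.005"), ("1f", "0.01")]
for graph, frac in thresholds:
    print(graph, frac, "->", trial_cutoff(frac), "trials")
```

Under that assumption the cutoffs come out as 17, 175, 437, 875, and 1750 trials, matching the list above.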

In [61]:
import numpy as np
n = 100; mc = 1; m1c = 2; pc = .3; th = 5; which = ['exact']
tK = 10
cls = getcolors(sns.color_palette('Paired',tK))

plotKsplit(n,th, mc, m1c , pc, pc, topK=tK, colors =cls , which = which, 
           graphNum = "1a", heightAdd = .75)
plotKsplit(n,th, mc, m1c , pc, pc, topK=tK, colors = cls, which = which, 
           remove = 10**(-4), graphNum = "1b", heightAdd = .75)
plotKsplit(n,th, mc, m1c , pc, pc, topK=tK, colors = cls, which = which, 
          remove = 10**(-3), graphNum = "1c", heightAdd = .75)
plotKsplit(n,th, mc, m1c , pc, pc, topK=tK, colors = cls, which = which, 
          remove = .0025, graphNum = "1d", heightAdd = .75)
plotKsplit(n,th, mc, m1c , pc, pc, topK=tK, colors = cls, which = which,
          remove = .005, graphNum = "1e", heightAdd = .75)
plotKsplit(n,th, mc, m1c , pc, pc, topK=tK, colors = cls, which = which,
          remove = 10**(-2), graphNum = "1f", heightAdd = .75)

# According to reported preferences, set color palette, threshold percentage for min num trials

In [67]:
cls = getcolors(sns.color_palette('Paired',tK))
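# rm = .0025  # alternative: a fraction, i.e. 0.25 % of all trials
rm = 500  # value actually used: an absolute trial count (values >= 1 are treated as counts)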

# 2) For the original case (update size = 1 for both correlated and independent observations), compare effects of n, p on the behavior of k for odd threshold = 5.

Graph list:
- 2a: n = 100, p = .3
- 2b: n = 100, p = .7

Note about graphs 2a, 2b: for small p = .3, accuracy decreases with the number of independent observations (k) and with c. For larger p = .7, this relationship does not entirely hold: the few cases with the largest and smallest numbers of independent observations (k >= 5, k <= 2) have better accuracy than those with middling values of k (k = 3, 4).

- 2c: n = 1000, p = .3
- 2d: n = 1000, p = .7

Note about graphs 2c, 2d: In 2c, even for the smaller value of p, we see suggestions of the same trend that we observed in 2b, with the low point being k = 6; however, with so few data points it is difficult to be certain. In 2d, we see similar behavior to 2b, with the lower accuracies again belonging to k = 3, 4. 

Question: Why is there a change in behavior with respect to k when moving to larger values of n and p? 

In [70]:
n = 100; th = 5; mc = 1; m1c = 1; pc = .3; p1c = .3  ; tK = 10; which = ['exact']
cls = getcolors(sns.color_palette('Paired',tK))
plotKsplit(n,th, mc, m1c , pc, pc, topK=tK, colors = cls, which = which, remove = rm, 
           graphNum = "2a", heightAdd = 1.35)
pc = .7
plotKsplit(n,th, mc, m1c , pc, pc, topK=tK, colors = cls, which = which, remove = rm, 
           graphNum = "2b", heightAdd = 1)

n = 1000; pc = .3
plotKsplit(n,th, mc, m1c , pc, pc, topK=tK, colors = cls, which = which, remove = rm, 
           graphNum = "2c", heightAdd = 1)
pc = .7
plotKsplit(n,th, mc, m1c , pc, pc, topK=tK, colors = cls, which = which, remove = rm, 
           graphNum = "2d", heightAdd = 1)

# 3) What is the effect of varying update size? Does this also change with p?
Note: the colors are changing here.
Holding n fixed at n = 1000, threshold fixed at th = 5:

Graphs at p = 0.3:
- 3a: update size = 1 for both ind, corr observations (this is a repeat of earlier graph 2c for reference)
- 3b: Correlated update size = 1, Independent update size = 2
- 3c: Correlated update size = 2, Independent update size = 1

Notes for graphs 3a, 3b, 3c: 

While having equal update sizes as in 3a seems to go with accuracy decreasing as the number of independent observations increases, inflating the update size of either the independent (3b) or correlated (3c) observations results in trials with an intermediate value of k (here, k = 2 and k = 4 respectively) having much lower accuracy than those with primarily independent or primarily correlated observations.

Graphs at p = 0.7:
- 3d: update size = 1 for both ind, corr observations (this, also, is a repeat of earlier graph 2d for reference)
- 3e: Correlated update size = 1, Independent update size = 2
- 3f: Correlated update size = 2, Independent update size = 1

Notes for graphs 3d, 3e, 3f: 

In 3e, we see k = 2 again being far less accurate than the other k lines. (k = 3 does not appear in the legend because it is not in the top row, 'exact threshold hits': with independent observations worth 2, k = 3 corresponds to a belief of 6. It is the dark green line.) 
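
The arithmetic behind k = 3 missing from the 'exact' row can be sketched as below. This is a simplification, not the simulation itself: it assumes every update of the first decider pushes the belief in the same direction, so the belief is just the sum of k independent updates of size m1c and some number j >= 0 of correlated updates of size mc. The helper name `can_hit_exactly` is hypothetical.

```python
def can_hit_exactly(k, th, mc, m1c):
    """True if k independent updates (size m1c) plus some j >= 0 correlated
    updates (size mc) can sum to exactly th.

    Assumption: all updates have the same sign (no reversals).
    """
    remainder = th - k * m1c  # what the correlated updates must supply
    return remainder >= 0 and remainder % mc == 0

th, mc, m1c = 5, 1, 2  # the setting of graph 3e
for k in range(5):
    print("k =", k, "| belief from independent updates =", k * m1c,
          "| exact hit possible:", can_hit_exactly(k, th, mc, m1c))
```

Under this assumption, k = 0, 1, 2 can land exactly on 5, while k = 3 already gives a belief of 6 and can only overshoot. With the update sizes swapped (mc = 2, m1c = 1), the same function returns True only for odd k.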

In 3f, I think it is fun how closely the average number of timesteps lines up between even and odd numbers of independent observations.

In [71]:
n = 1000; th = 5; mc = 1; m1c = 1; pc = .3; p1c = .3  ; tK = 10; which = ['exact']
cls = getcolors(sns.color_palette('Paired',tK))

plotKsplit(n,th, mc, m1c , pc, pc, topK=tK, colors = cls, which = which, remove = rm, 
           graphNum = '3a', heightAdd = .75 )
mc = 1; m1c = 2
plotKsplit(n,th, mc, m1c , pc, pc, topK=tK, colors = cls, remove = rm, graphNum = "3b")
mc = 2; m1c = 1
plotKsplit(n,th, mc, m1c , pc, pc, topK=tK, colors = cls, remove = rm, graphNum = "3c")

pc = .7; mc = 1; m1c = 1
plotKsplit(n,th, mc, m1c , pc, pc, topK=tK, colors = cls, which = which, remove = rm, 
           graphNum = "3d", heightAdd = .75)
mc = 1; m1c = 2
plotKsplit(n,th, mc, m1c , pc, pc, topK=tK, colors = cls, remove = rm, graphNum = "3e")
mc = 2; m1c = 1
plotKsplit(n,th, mc, m1c , pc, pc, topK=tK, colors = cls, remove = rm, graphNum = "3f")

# 4) For the original case (update size = 1 for both correlated and independent observations), compare effects of n, p on the behavior of k when we have an even threshold = 6

Graph list:
- 4a: n = 100, p = .3
- 4b: n = 100, p = .7

Note about graphs 4a, 4b: 4a demonstrates a very similar pattern to 2a with its rainbow of falling accuracies as k and c increase. For 4b, increasing values of c behave similarly to 2a, but lower values of c behave more like 2b. Neat!

- 4c: n = 1000, p = .3
- 4d: n = 1000, p = .7

Question: Why is there a change in behavior with respect to k when moving to larger values of n and p? 

In [79]:
n = 100; th = 6; mc = 1; m1c = 1; pc = .3; p1c = .3  ; tK = 10; which = ['exact']
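# rm = .005  # alternative: a fraction, i.e. 0.5 % of all trials
rm = 500  # value actually used: an absolute trial count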
cls = getcolors(sns.color_palette('Paired',tK))
plotKsplit(n,th, mc, m1c , pc, pc, topK=tK, colors = cls, which = which, remove = rm, 
           graphNum = "4a", heightAdd = 1)
pc = .7
plotKsplit(n,th, mc, m1c , pc, pc, topK=tK, colors = cls, which = which, remove = rm, 
           graphNum = "4b", heightAdd = 1)

n = 1000; pc = .3
plotKsplit(n,th, mc, m1c , pc, pc, topK=tK, colors = cls, which = which, remove = rm, 
           graphNum = "4c", heightAdd = 1)
pc = .7
plotKsplit(n,th, mc, m1c , pc, pc, topK=tK, colors = cls, which = which, remove = rm, 
           graphNum = "4d", heightAdd = 1)

# 5) What is the effect of varying update size? Does this also change with p? Now for even threshold = 6
Note: the colors are changing here.
Holding n fixed at n = 1000:

Graphs at p = 0.3:
- 5a: update size = 1 for both ind, corr observations (this is a repeat of earlier graph 4c for reference)
- 5b: Correlated update size = 1, Independent update size = 2
- 5c: Correlated update size = 2, Independent update size = 1

Notes for graphs 5a, 5b, 5c: 

The same trends as before seem to hold, with the lowest accuracy points shifting to slightly larger k values (here, k = 5 rather than k = 4 as before). The exception is the value k = 8 in graph 5c: graph 3c has no line with equivalent behavior.


Graphs at p = 0.7:
- 5d: update size = 1 for both ind, corr observations (this, also, is a repeat of earlier graph 4d for reference)
- 5e: Correlated update size = 1, Independent update size = 2
- 5f: Correlated update size = 2, Independent update size = 1



In [81]:
n = 1000; th = 6; mc = 1; m1c = 1; pc = .3; p1c = .3  ; tK = 10; which = ['exact']
cls = getcolors(sns.color_palette('Paired',tK))
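# rm = .0025  # alternative: a fraction, i.e. 0.25 % of all trials
rm = 500  # value actually used: an absolute trial count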
plotKsplit(n,th, mc, m1c , pc, pc, topK=tK, colors = cls, which = which, remove = rm, 
           graphNum = '5a', heightAdd = 1 )
mc = 1; m1c = 2
plotKsplit(n,th, mc, m1c , pc, pc, topK=tK, colors = cls, remove = rm, graphNum = "5b")
mc = 2; m1c = 1
plotKsplit(n,th, mc, m1c , pc, pc, topK=tK, colors = cls, remove = rm, graphNum = "5c")

pc = .7; mc = 1; m1c = 1
plotKsplit(n,th, mc, m1c , pc, pc, topK=tK, colors = cls, which = which, remove = rm, 
           graphNum = "5d", heightAdd = 1)
mc = 1; m1c = 2
plotKsplit(n,th, mc, m1c , pc, pc, topK=tK, colors = cls, remove = rm, graphNum = "5e")
mc = 2; m1c = 1
plotKsplit(n,th, mc, m1c , pc, pc, topK=tK, colors = cls, remove = rm, graphNum = "5f")

In [4]:
import plotly.express as px
import plotly.colors as PC
import seaborn as sns

def getcolors(colorlist):
    cols = []
    for k in colorlist:
        aa,bb,cc = k
        cols.append(PC.label_rgb((aa*255, bb*255, cc*255)))
    
    return cols


In [66]:
#            f.writerow(["n","th","c","pc","mc","p1c","m1c",
#            "avgAcc","avgTime","avgK","overTh", "numTrials", "numK"])

        
import plotly.express as px
from plotly.subplots import make_subplots
import plotly.graph_objects as go

# print(px.colors.sequential.Plasma)

# to convert a seaborn color palette
def getcolors(colorlist):
    cols = []
    for k in colorlist:
        aa,bb,cc = k
        cols.append(PC.label_rgb((aa*255, bb*255, cc*255)))
    
    return cols

def plotKsplit(n,th, mc, m1c , pc, p1c, topK = 9, 
               colors = px.colors.sequential.solar, 
               which = ('exact', 'over', 'both'), 
              remove = 0, 
              graphNum = "",
              heightAdd = 1.25):
    #print(topK)
    
    
    kLabels = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    args2 = [n, th, pc, p1c, mc, m1c]    
    #print(kLabels[0:topK])
    # get a df which splits by the characters in args2,
    # but which still spans the 'c' , 'overTh', and 'numK' values.
    
    df = getSmall(args2)
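    totTrials = list(df[df['overTh'] == 'both']['numTrials'])[0]
    # Trial-count cutoff implied by `remove`: a fraction of all trials if
    # 0 < remove < 1, otherwise an absolute count. Set here so the title
    # below also works when remove = 0.
    removeNum = remove*totTrials if 0 < remove < 1 else remove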
    which = list(which)
    #if which[-1] == " ":
    if len(which) == 1:
#         fig = make_subplots(rows = len(which), cols = 3, 
#                            subplot_titles = ['Accuracy',
#                                             'Percent Trials',
#                                             'Average # Timesteps'],
#                             row_heights = [.9, .1],
#                            shared_xaxes = True,
#                            horizontal_spacing = .035, vertical_spacing = .05) 
        fig = make_subplots(rows = len(which), cols = 3, 
                           subplot_titles = ['Accuracy',
                                            'Percent Trials',
                                            'Average # Timesteps'],
                           horizontal_spacing = .035, vertical_spacing = .05) 
        figHeight = 200*(len(which)+heightAdd)
        fig.update_xaxes(title_text = "c",
                    row = 1)
    else:
        fig = make_subplots(rows = len(which), cols = 3, 
                           subplot_titles = ['Accuracy',
                                            'Percent Trials',
                                            'Average # Timesteps'],
                           shared_xaxes = True,
                           horizontal_spacing = .035, vertical_spacing = .05)
        figHeight = 200*(len(which))
    
    # populate graphs
    rowNum = 1
    for entry in which:
        df2 = df[df['overTh'] == entry]
        #print("Dealing with " + entry)
        #print(df2.head())
        
        fig.update_yaxes(title_text = entry + " threshold",
                        row = rowNum, col = 1)
        
        alltogether = df2[df2['numK'] == 'all']
        alltogether = alltogether.sort_values(by = 'c')
        
        if rowNum == 1:
            fig.add_trace( go.Scatter(x = alltogether['c'], y = alltogether['avgAcc'],
                                     name = 'all K',
                                     line = dict(color = 'black')),
                         row = rowNum, col = 1)
        else:
            fig.add_trace( go.Scatter(x = alltogether['c'], y = alltogether['avgAcc'],
                                      name = 'all K',
                                 showlegend = False,
                                 line = dict(color = 'black')),
                     row = rowNum, col = 1)
            
        for numK in kLabels[0:topK]:
            df3 = df2[df2['numK'] == str(numK)]
            df3 = df3[df3['numTrials'] > 0]
            df3 = df3.sort_values(by = 'c')
            
            #print(df3['c'])
            
            color = colors[numK]
            #print('Dealing with k: ' + str(numK))
            #print(df3.head())
            # get percent of trials. Remove those not significant.
            percentC = []
            numC = []
            for c in df3['c']:
                hi = df3[df3['c'] == c]
                #print(ex['numTrials'])
                num = int(list(hi['numTrials'])[0])
                per = num/totTrials
                if remove > 0 and remove < 1:
                    removeNum = remove*totTrials
                    if per > remove:
                        percentC.append(round(per,5))
                        numC.append(num)
                    else:
                        df3 = df3[(df3.c != c)]
                elif remove > 0:
                    removeNum = remove
                    if num > remove:
                        percentC.append(round(per,5))
                        numC.append(num)
                    else:
                        df3 = df3[(df3.c != c)]
                else:
                    percentC.append(round(per,5))
                    numC.append(num)
            # plot accuracy
            if rowNum == 1:
                fig.add_trace( go.Scatter(x = df3['c'], y = df3['avgAcc'], 
                                      name = "k = " + str(numK), 
                                     line = dict(color = color)),
                             row = rowNum, col = 1)
            else:
                fig.add_trace( go.Scatter(x = df3['c'], y = df3['avgAcc'], 
                                    name = "k = " + str(numK),
                                  showlegend = False, 
                                 line = dict(color = color)),
                         row = rowNum, col = 1)
                
            
            # plot percent of trials


            fig.add_trace( go.Scatter(x = df3['c'], y = percentC, 
                                      name = "k = " + str(numK),
                                      showlegend = False,
                                     line = dict(color = color),
                                         hovertext = numC),
                         row = rowNum, col = 2)
            
            # plot average time
            fig.add_trace( go.Scatter(x = df3['c'], y = df3['avgTime'],
                                      name = "k = " + str(numK),
                                     showlegend = False,
                                     line = dict(color = color)),
                         row = rowNum, col = 3)
        
        rowNum += 1
    
    fig.update_xaxes(title_text = "c",
                    row = len(which))
    fig.update_layout(height = figHeight,
          title_text= graphNum + ": n = " + str(n)
                      + "; th = " + str(th)
                      + "; Corr.  p = " + str(pc) + ", update = " + str(mc)
                    +  "; Ind. p = " + str(p1c) + ", update = " + str(m1c)
                     + "; " + str(totTrials) + " trials" + 
                     " ( < " + str(int(removeNum)) + " rem)")
    fig.update_layout(hovermode="x unified")
    fig.show()
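
The two meanings of `remove` inside plotKsplit (a fraction of all trials when it lies strictly between 0 and 1, an absolute trial count otherwise) can be isolated in a small standalone function. This is a sketch that mirrors the branching above rather than replacing it; `keeps_point` is a hypothetical name, and the total of 175,000 trials in the usage lines is an assumption inferred from the percentages in section 1.

```python
def keeps_point(num_trials, total_trials, remove):
    """Return True if a data point survives the `remove` threshold.

    Mirrors plotKsplit: comparisons are strict, so a point sitting exactly
    on the cutoff is dropped.
    """
    if 0 < remove < 1:   # fractional threshold
        return num_trials / total_trials > remove
    if remove > 0:       # absolute trial count (remove >= 1)
        return num_trials > remove
    return True          # remove = 0: keep everything

total = 175_000  # assumption, see lead-in
print(keeps_point(500, total, 500))    # exactly at the absolute cutoff
print(keeps_point(501, total, 500))    # one trial above it
print(keeps_point(437, total, .0025))  # just under 0.25 % of all trials
```

One consequence of the strict comparison is that rm = 500 drops points with 500 or fewer trials, not only those with fewer than 500.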

In [44]:
n = 100; mc = 1; m1c = 2; pc = .3; th = 5; which = ['exact']; tK = 10
cls = getcolors(sns.color_palette('Paired',tK))

plotKsplit(n,th, mc, m1c , pc, pc, topK=tK, colors =cls , which = which, graphNum = "1a")

In [2]:
folder = 'March_5_2021/'
import pandas as pd

def getSmall(args2):
    
    n, th, pc, p1c, mc, m1c = args2
    
    fileName = "altProp.csv"
    apple = pd.read_csv(folder + fileName)
    
    shrink = apple[apple['n'] == n]
    shrink = shrink[shrink['th'] == th]
    shrink = shrink[shrink['pc'] == pc]
    shrink = shrink[shrink['p1c'] == p1c]
    shrink = shrink[shrink['mc'] == mc]
    shrink = shrink[shrink['m1c'] == m1c]
    shrink = shrink[shrink['avgAcc'] > 0]
    
    return shrink