# Dev New Design Matrix Generator (DMG)

**Goal**: Use this notebook to develop a new, more flexible DMG class.

**Motivation**: The previous version had a `generate_base_matrix` function, and other classes then operated on that base matrix (e.g. filtering the history). The structure of the base matrix was not flexible at all, and changing it required adding lots of if/else logic. Also, adding violation features to a binary model was not possible. Now that I am entering a phase of wanting to change features readily, this needs to be updated.


In [1]:
%load_ext autoreload
%autoreload 2

import operator

import numpy as np
import pandas as pd
from multiglm.data.dataset_loader import DatasetLoader
from multiglm.features.design_matrix_generator import DesignMatrixGenerator
from multiglm.features.exp_filter import ExpFilter


In [7]:
df = DatasetLoader(animal_ids=["W051"], data_type="new_trained").load_data()

Loading data for animal ids:  ['W051']


In [9]:
df.head()

Unnamed: 0,animal_id,session_date,session_file_counter,rig_id,training_stage,s_a,s_b,hit,violation,trial_not_started,...,l_water_vol,r_water_vol,antibias_beta,antibias_right_prob,using_psychometric_pairs,choice,session,session_relative_to_old,n_prev_trial_not_started,trial
0,W051,2014-07-17,327,19,4,60.0,68.0,,1,False,...,18.0,18.0,3,0.5,,2,326,326.0,0.0,1
1,W051,2014-07-17,327,19,4,60.0,68.0,,1,False,...,18.000001,18.000001,3,0.5,,2,326,326.0,0.0,2
2,W051,2014-07-17,327,19,4,76.0,68.0,,1,False,...,18.000004,18.000004,3,0.5,,2,326,326.0,0.0,3
3,W051,2014-07-17,327,19,4,68.0,76.0,1.0,0,False,...,18.000014,18.000014,3,0.5,,0,326,326.0,0.0,4
4,W051,2014-07-17,327,19,4,84.0,76.0,1.0,0,False,...,18.000033,18.000033,3,0.5,,1,326,326.0,0.0,5


In [40]:
from pandas import Series


def normalize_column(df_col: Series) -> Series:
    # z-score the column: zero mean, unit (sample) standard deviation
    return (df_col - df_col.mean()) / df_col.std()


def prev_trial_value_column(
    df_col: Series,
    method=None,
    mask_violations=True,
    **kwargs,
) -> Series:
    # shift by one trial so each row holds the previous trial's value;
    # the first trial has no history and is filled with 0
    df_col = df_col.shift().fillna(0)

    if method is not None:
        df_col = DesignMatrixGenerator.apply_custom_method(df_col, method, **kwargs)
    else:
        print("Raw values used for previous history")

    # NOTE: create_mask reads the notebook-global `df`, not an argument
    mask = DesignMatrixGenerator.create_mask(df, mask_violations=mask_violations)

    return df_col * mask


def binarize(col_data: Series, comparison, value) -> Series:
    """
    Convert a column to a binary 0/1 int column
    given comparison logic and a value.

    Possible comparison options from the operator module:
        - eq : == equal
        - ne : != not equal
        - gt : > greater than
        - lt : < less than
        - ge : >= greater than or equal to
        - le : <= less than or equal to
        - and_ : bitwise AND
        - or_ : bitwise OR
        - xor : bitwise XOR
    """
    return comparison(col_data, value).astype(int)


def chain(funs, df_col: Series) -> Series:
    for fn in funs:
        df_col = fn(df_col)
    return df_col


chain([normalize_column, prev_trial_value_column], df["s_a"])

Raw values used for previous history


0        0.000000
1       -0.000000
2       -0.000000
3        0.000000
4       -0.815273
           ...   
98918    0.019421
98919    1.584472
98920   -1.649967
98921   -0.000000
98922    1.584472
Length: 98923, dtype: float64
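
As written, `chain` calls each function with only the series, so per-step keyword arguments (e.g. `mask_violations`) cannot be passed. A minimal sketch of one way around this, using `functools.partial` on toy data. `zscore` and `shift_prev` are hypothetical stand-ins for the functions above (no violation masking), not part of `multiglm`:

```python
from functools import partial

import pandas as pd


def zscore(col: pd.Series) -> pd.Series:
    # Hypothetical stand-in for normalize_column
    return (col - col.mean()) / col.std()


def shift_prev(col: pd.Series, fill_value: float = 0.0) -> pd.Series:
    # Hypothetical stand-in for prev_trial_value_column, without masking
    return col.shift().fillna(fill_value)


def chain(funs, col: pd.Series) -> pd.Series:
    # Apply each function in order, feeding each output into the next step
    for fn in funs:
        col = fn(col)
    return col


toy = pd.Series([60.0, 68.0, 76.0, 84.0])

# partial() binds the keyword argument up front, so chain() still sees
# a one-argument callable
result = chain([zscore, partial(shift_prev, fill_value=-1.0)], toy)
```

Binding arguments with `partial` keeps `chain` itself trivial; the alternative of passing `(fn, kwargs)` tuples would work too but makes the config more verbose.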

In [None]:
dmg.prev_trial_avg(df, col_names=["s_a", "s_b"], mask_violations=True, normalize=True)

In [43]:
config = {
    "bias": (lambda df: dmg.add_bias_column(df)),
    "s_a_norm": (lambda df: dmg.normalize_column(df, "s_a")),
    "s_b_norm": (lambda df: dmg.normalize_column(df, "s_b")),
    "prev_sound_avg": (
        lambda df: dmg.prev_trial_avg(
            df, col_names=["s_a", "s_b"], mask_violations=True, normalize=True
        )
    ),
    "s_a": (lambda df: dmg.copy(df, col_name="s_a")),
    "prev_sa": (
        lambda df: dmg.prev_trial_value(
            df,
            col_name="s_a",
            method=DesignMatrixGenerator.implement_map,
            mapping={60: 2},
        )
    ),
    "chained_column": lambda df: chain(
        [
            normalize_column,
            prev_trial_value_column,
            lambda series: binarize(series, operator.eq, 10),
        ],
        df["s_a"],
    ),
}
dmg = DesignMatrixGenerator(df, config)
dmg.config["bias"](dmg.df)

array([1, 1, 1, ..., 1, 1, 1])
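
The idea behind the config is that each key names an output column and each value is a callable that builds that column from the raw frame. A minimal sketch of how such a config could be turned into a matrix, on toy data. `build_design_matrix` is a hypothetical helper for illustration and is not claimed to match what `DesignMatrixGenerator.create()` does internally:

```python
import numpy as np
import pandas as pd


def build_design_matrix(df: pd.DataFrame, config: dict) -> pd.DataFrame:
    # Hypothetical helper: call each column builder on the raw frame and
    # collect the results under the config key, leaving `df` unmodified
    columns = {name: builder(df) for name, builder in config.items()}
    return pd.DataFrame(columns, index=df.index)


toy_df = pd.DataFrame({"s_a": [60.0, 68.0, 76.0], "s_b": [68.0, 60.0, 68.0]})

toy_config = {
    "bias": lambda df: np.ones(len(df), dtype=int),
    "s_a_norm": lambda df: (df["s_a"] - df["s_a"].mean()) / df["s_a"].std(),
    "prev_s_b": lambda df: df["s_b"].shift().fillna(0),
}

X = build_design_matrix(toy_df, toy_config)
```

Because every builder reads from the raw frame and writes only to the output, the same source column (e.g. `s_a`) can feed several features without name collisions.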

Now set up to make a filtered column. The one complication is that this filtered column needs to act on the temp column... hmm. I guess maybe putting it back into the df would have been the right call? But weird copy things were happening (e.g. `s_a` exists in both). I want to be able to chain things together, so filtering can happen on a column that has already been normalized, etc., in a separate operation. I want to be able to filter "prev_choice" after I explicitly make prev choice, and the same for violations. Let's start by writing a method that takes a col and a tau and applies the filter to it; then we can worry about where the column is coming from.
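
A rough sketch of what such a method could look like, assuming a simple exponential kernel with time constant `tau` (in trials). `exp_filter_column` is a hypothetical name, the recursion is an assumption rather than the behavior of the existing `ExpFilter` class, and session boundaries are ignored here:

```python
import numpy as np
import pandas as pd


def exp_filter_column(col: pd.Series, tau: float) -> pd.Series:
    # Assumed recursion: y[t] = exp(-1 / tau) * y[t - 1] + x[t], with y[-1] = 0
    decay = np.exp(-1.0 / tau)
    filtered = np.zeros(len(col), dtype=float)
    running = 0.0
    for i, x in enumerate(col.to_numpy(dtype=float)):
        running = decay * running + x
        filtered[i] = running
    return pd.Series(filtered, index=col.index, name=col.name)


# Toy previous-violation column; works on any Series, so it can be
# applied after other steps (shift, normalize, ...) have already run
prev_violation = pd.Series([0, 1, 0, 0, 1], name="prev_violation")
filtered = exp_filter_column(prev_violation, tau=2.0)
```

Since the function maps a Series to a Series, it can be dropped into a list of chained steps once `tau` is bound.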


In [44]:
dmg.create()
dmg.X

KeyError: "None of [Index(['s_a', 's_b'], dtype='object')] are in the [index]"

In [54]:
(dmg.X.prev_sa == 27).sum()

0

In [19]:
x = {"comparison": operator.eq, "value": 2}

x

{'comparison': <function _operator.eq(a, b, /)>, 'value': 2}

In [16]:
import operator

col = dmg.X.s_a.copy()
value = 60

operator.eq(col, value).astype(int)

0        1
1        1
2        0
3        0
4        0
        ..
98918    0
98919    1
98920    0
98921    0
98922    0
Name: s_a, Length: 98923, dtype: int64
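
Putting the two experiments above together: a comparison spec stored as a dict can be unpacked straight into a binarize function. A small self-contained sketch on toy data; the `spec` layout mirrors the `x` dict above, and the toy values are made up for illustration:

```python
import operator

import pandas as pd


def binarize(col: pd.Series, comparison, value) -> pd.Series:
    # comparison is any two-argument callable from the operator module
    return comparison(col, value).astype(int)


# Spec dict with the same keys as binarize's keyword arguments
spec = {"comparison": operator.ge, "value": 68}

toy = pd.Series([60.0, 68.0, 76.0], name="s_a")
result = binarize(toy, **spec)
```

Keeping the keys identical to the parameter names means the spec can live in a config and be applied with `**spec`, without any if/else on the comparison type.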