# Assignment 13 draft (NOT FINALIZED)

Please fill in the *Modified* sections of this notebook. To check your work for a problem, run its Setup (if present), Original, and Modified sections. Do not modify Setup or Original cells. See the [README](https://github.com/mortonne/datascipsych) for instructions on setting up a Python environment to run this notebook.

Write your answers for each problem. Then restart the kernel, run all cells, and save the notebook. Finally, upload your notebook to Canvas.

If you get stuck, read through the other notebooks in this directory, ask us for help in class, or ask other students for help in class or on the weekly discussion board.

## Problem: code style (2 points)

Change the following code to meet the guidelines described in the [Coding Best Practices](https://mortonne.github.io/datascipsych/assignments/assignment13/coding_best_practices.html#use-consistent-code-style) lecture. Edit the code in the Modified section.

### Standardize import statements (1 point)

Move the import statements to the top of the code cell (0.5 points) and use standard names for each of the packages (0.5 points).

### Apply Black formatting (1 point)

Apply Black-style formatting to the code.

### Original

In [1]:
from datascipsych import datasets
file = datasets.get_dataset_file('Osth2019')

import polars as pl
data = pl.read_csv( file )
targets = data.filter(pl.col("type") == "intact").group_by("subj").agg(pl.col("response").mean())
import numpy as N
x = N.arange(10)
y = N.sum(x)

### Modified

In [2]:
# your modified version of the code here

## Problem: variable names (2 points)

The code below uses generic variable names that say nothing about what they hold. Rename the variables to describe what they refer to. There is no single right answer; aim for names that are descriptive but not too long.

### Original

In [3]:
import polars as pl
from datascipsych import datasets

a = datasets.get_dataset_file("Osth2019")
b = datasets.clean_osth(pl.read_csv(a))
c = b.filter(pl.col("phase") == "test")
d = c.drop_nulls().group_by("response").agg(pl.col("RT").mean())

### Modified

In [4]:
# your modified version of the code here

## Problem: code comments (2 points)

The code below has good variable names, but could still use a little clarification. Add one comment to describe what is happening in each of the two code blocks.

### Original

In [5]:
import numpy as np

trial_number = np.array([1, 2, 3, 4, 5, 6])
trial_type = np.array(["target", "lure", "lure", "lure", "target", "target"])
old_response = np.array([1, 0, 1, 0, 0, 1])

hit_rate = np.mean(old_response[trial_type == "target"])
false_alarm_rate = np.mean(old_response[trial_type == "lure"])

### Modified

In [6]:
# your modified version of the code here

## Problem: using loops (2 points)

The code below reads in a CSV file for each subject, calculates the total number of correct trials for that subject, and then sums the correct trials across subjects.

Change the code to calculate the same sum using a `for` loop instead.
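
As a general illustration of the accumulation pattern (not the solution to this problem), the sketch below uses a `for` loop over made-up in-memory data. The names `scores_by_subject`, `labels`, and `total_correct` and all values are hypothetical; the f-string shows one way to build zero-padded labels such as `sub-01`.

```python
# Hypothetical per-subject trial accuracy (1 = correct, 0 = incorrect).
scores_by_subject = {
    1: [1, 0, 1, 1],
    2: [0, 0, 1, 1],
    3: [1, 1, 1, 0],
}

total_correct = 0
labels = []
for subject_number, scores in scores_by_subject.items():
    # Format the subject number with two digits, e.g. 1 -> "sub-01".
    labels.append(f"sub-{subject_number:02d}")
    # Add this subject's number of correct trials to the running total.
    total_correct += sum(scores)

print(labels)
print(total_correct)
```

The running total starts at zero before the loop and is updated once per iteration, so adding another subject only requires adding data, not another line of code.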

### Original

In [7]:
import polars as pl

n1 = pl.read_csv("data/sub-01_beh.csv")["correct"].sum()
n2 = pl.read_csv("data/sub-02_beh.csv")["correct"].sum()
n3 = pl.read_csv("data/sub-03_beh.csv")["correct"].sum()
n4 = pl.read_csv("data/sub-04_beh.csv")["correct"].sum()
n5 = pl.read_csv("data/sub-05_beh.csv")["correct"].sum()
n6 = pl.read_csv("data/sub-06_beh.csv")["correct"].sum()
n7 = pl.read_csv("data/sub-07_beh.csv")["correct"].sum()
n8 = pl.read_csv("data/sub-08_beh.csv")["correct"].sum()
total = n1 + n2 + n3 + n4 + n5 + n6 + n7 + n8
print(total)

48


### Modified

In [8]:
# your modified version of the code here

## Problem: using functions (2 points)

The code in the Original section uses Polars expressions to calculate the mean and the standard error of the mean (SEM) for conditions in two DataFrames. Write two functions called `mean_expr` and `sem_expr` that each take a column name and return a Polars expression to calculate the given statistic. Rewrite the original code to use your functions.
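
For reference, the SEM expression in the Original code divides the standard deviation by the square root of the number of observations. The sketch below shows the same arithmetic in plain Python with made-up values; the helper name `standard_error` is hypothetical and is not one of the functions you are asked to write. It assumes the sample standard deviation (`ddof=1`), which is the default for Polars `std` and is what `statistics.stdev` computes.

```python
import math
import statistics


def standard_error(values):
    """Return the sample standard deviation divided by sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))


# Made-up mean responses for four hypothetical subjects.
example_means = [0.2, 0.4, 0.6, 0.8]
sem = standard_error(example_means)
print(round(sem, 4))
```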

### Setup

In [9]:
import polars as pl
from datascipsych import datasets

dataset_file = datasets.get_dataset_file("Osth2019")
df_osth = datasets.clean_osth(pl.read_csv(dataset_file))
mean_response_type = (
    df_osth.filter(pl.col("phase") == "test")
    .group_by("subj", "probe_type")
    .agg(pl.col("response").mean())
    .sort("subj", "probe_type")
)
mean_response_lag = (
    df_osth.filter((pl.col("phase") == "test") & (pl.col("probe_type") == "lure"))
    .group_by("subj", "lag")
    .agg(pl.col("response").mean())
    .sort("subj", "lag")
)

### Original

In [10]:
stats_response_type = (
    mean_response_type.group_by("probe_type")
    .agg(
        mean=pl.col("response").mean(),
        sem=pl.col("response").std() / pl.col("response").len().sqrt()
    )
)
stats_response_lag = (
    mean_response_lag.group_by("lag")
    .agg(
        mean=pl.col("response").mean(),
        sem=pl.col("response").std() / pl.col("response").len().sqrt()
    )
)

### Modified

In [11]:
# your modified version of the code here

## Problem: make a function more flexible (2 points)

The `subject_mean_response` function defined in the Original section calculates the mean response for each combination of subject and probe type. Rewrite the function to be more flexible. Your modified version should take a `subject` input, which gives the name of the column with subject labels, and a `condition` input, which gives the name of the column with the condition labels to group by. It should work the same as the original function when `subject="subj"` and `condition="probe_type"`.
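
The general idea of replacing hard-coded names with function inputs can be sketched in plain Python, without Polars. Everything here is hypothetical and for illustration only: the function `group_means`, its parameters `group_key` and `value_key`, and the example rows.

```python
from collections import defaultdict


def group_means(rows, group_key, value_key):
    """Average the values under value_key within each value of group_key."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[group_key]].append(row[value_key])
    # Sort the groups so the output order does not depend on the input order.
    return {group: sum(vals) / len(vals) for group, vals in sorted(grouped.items())}


# Made-up rows with two possible grouping columns.
rows = [
    {"condition": "a", "block": 1, "score": 1.0},
    {"condition": "a", "block": 2, "score": 0.0},
    {"condition": "b", "block": 1, "score": 1.0},
    {"condition": "b", "block": 2, "score": 1.0},
]

# The same function handles either grouping because the key is an input.
by_condition = group_means(rows, "condition", "score")
by_block = group_means(rows, "block", "score")
print(by_condition)
print(by_block)
```

Because the grouping column is passed in rather than written into the function body, one definition covers both analyses.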

### Setup

In [12]:
import polars as pl
from datascipsych import datasets

dataset_file = datasets.get_dataset_file("Osth2019")
df_osth = datasets.clean_osth(pl.read_csv(dataset_file))
df_test = df_osth.filter(pl.col("phase") == "test")
df_test_lures = df_osth.filter((pl.col("phase") == "test") & (pl.col("probe_type") == "lure"))

### Original

In [13]:
def subject_mean_response(df):
    means = (
        df.group_by("subj", "probe_type")
        .agg(pl.col("response").mean())
        .sort("subj", "probe_type")
    )
    return means


mean_probe_type = subject_mean_response(df_test)

### Modified

In [14]:
# your modified version of the code here

# uncomment the lines below to test your function
# mean_probe_type = subject_mean_response(df_test, "subj", "probe_type")
# mean_lure_lag = subject_mean_response(df_test_lures, "subj", "lag")