# Python Data Analysis Cheat Sheet

## Saving and loading data

Use `np.savetxt` and `np.save` to save data from a NumPy array to a file. Use `np.savez` to save multiple arrays to one file. Use `np.loadtxt` and `np.load` to load arrays from files.

In [1]:
import numpy as np
trial_type = np.array(["target", "lure", "lure", "target", "lure", "target", "target", "lure"])
response = np.array(["old", "old", "new", "old", "new", "new", "old", "new"])
response_time = np.array([5.4, 3.4, 8.4, 3.2, 3.9, 5.2, 6.1, 7.1])

Use `np.savetxt` to save an array to a human-readable text file.

In [2]:
np.savetxt("data/response_time.txt", response_time)

Use `np.save` to save an array to a NumPy-format file that is smaller and faster to read and write.

In [3]:
np.save("data/response_time.npy", response_time)

Use `np.savez` to save multiple arrays to one NumPy-format file.

In [4]:
np.savez(
    "data/trials.npz", 
    trial_type=trial_type, 
    response=response,
    response_time=response_time,
)

Use `np.loadtxt` to load an array from a text file (run `help(np.loadtxt)` to see options for reading from files with different formatting).

In [5]:
rt1 = np.loadtxt("data/response_time.txt")

Use `np.load` to load data from a NumPy-formatted file (`.npy` or `.npz`).

In [6]:
rt2 = np.load("data/response_time.npy")  # load one array from an npy file
trial_data = np.load("data/trials.npz")  # load multiple arrays from an npz file
print(list(trial_data.keys()))           # display arrays in a loaded npz file
trial_type = trial_data["trial_type"]    # access a variable from a loaded npz file

['trial_type', 'response', 'response_time']


## Analyzing data in different files

When data are stored in separate files, we can use a `for` loop to iterate over them and run analysis code for each file.

In [7]:
subjects = ["01", "02", "03", "04", "05", "06", "07", "08"]  # subjects to analyze
hr = []  # list that will hold results
for subject in subjects:
    file = f"data/sub-{subject}_beh.npz"     # data file for this subject
    data = np.load(file, allow_pickle=True)  # use allow_pickle=True to load string data
    trial_type = data["trial_type"]          # access variable from the loaded data
    response = data["response"]
    subject_hr = np.mean(response[trial_type == "target"])
    hr.append(subject_hr)                    # add result to the list

The `for` loop produces a list of results, with one hit rate for each subject. We can then analyze the subject hit rates further.

In [8]:
mean_hr = np.mean(hr)
std_hr = np.std(hr)
print(f"Hit rate: mean={mean_hr:.2f}, sd={std_hr:.2f}")

Hit rate: mean=0.59, sd=0.06
