## Step 1: Import some packages 

As is often the case, we start our code by importing some Python modules. 

In this case, we import the ebbinghaus module, which holds a number of helper functions that I prepared. We also import numpy so you can use it later.

In [None]:
# install tools for getting data off osf, and for fitting the data
!pip install psignifit osfclient
import sys
import shutil
# grab ebbinghaus code from github (make sure you have latest version)
shutil.rmtree("NRSC2200", ignore_errors=True)
!git clone https://github.com/pjkohler/NRSC2200
# add ebbinghaus code to path
sys.path.append('NRSC2200/pphys_ebbinghaus')
# import python modules to use for analysis
import ebbinghaus
import numpy as np

## Step 2: Get the data from the Open Science Framework
The data files are shared in a public repository at the [Open Science Framework](https://osf.io/s6mxd/) (OSF). 

The code below calls a function

    ebbinghaus.fetch_data()

that grabs the data and makes it accessible to the current workbook. 

If you inspect the function (look under "fetch_data" in ebbinghaus.py) you will notice the structure of the code

    !osf --project s6mxd clone
    
- "!" indicates that we are running something from the command line, outside Python.
- "s6mxd" is the ID of the project on OSF.
- "clone" simply means we want to make a local copy of the files from OSF

The command will place the files in the local folder 

    s6mxd/osfstorage/data
    
which is where we grab the data from in the next step. The code also checks if files have already been downloaded, and if they have, it will not waste time redownloading.

A limitation that I am still trying to work out is that you cannot specify a subset of files to grab; you have to get the whole thing.
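The "skip the download if the files are already there" check described above could look roughly like the sketch below. This is not the actual code in ebbinghaus.py: the helper name `fetch_if_missing` and its `runner` argument are hypothetical, and the only things taken from the text are the `osf --project s6mxd clone` command and the `s6mxd/osfstorage/data` folder.

```python
import os
import subprocess


def fetch_if_missing(project_id, runner=subprocess.run):
    """Clone an OSF project unless its data folder already contains files.

    Hypothetical helper for illustration, not the real ebbinghaus.fetch_data.
    Returns True if a download was started and False if it was skipped.
    """
    # Folder where the clone command is expected to place the data files
    data_dir = os.path.join(project_id, "osfstorage", "data")

    # If the folder exists and is not empty, assume the data is already local
    if os.path.isdir(data_dir) and os.listdir(data_dir):
        return False

    # Same command as in the text, without the notebook "!" prefix
    runner(["osf", "--project", project_id, "clone"], check=True)
    return True
```

Calling `fetch_if_missing("s6mxd")` twice in a row would only contact OSF the first time, provided the first call placed files in the data folder.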

In [None]:
ebbinghaus.fetch_data("s6mxd")

## Step 3: Load and organize the data from each participant

This code uses the function
    
    ebbinghaus.load_data()

to load in the individual data files in csv format that each participant, including you, created when they did the experiment. When running an experiment online, you would normally have some sort of back-end that saves the files automatically somewhere, so you do not have to trust your participant to send you the raw data. 

Note that the function takes two additional inputs, grab_course and grab_term, which allow you to specify the course and term that you want to get data from. Here, we just grab data from our course, which should be plenty. 

In [None]:
data_dir = ["s6mxd/osfstorage/data"]
info_df = ebbinghaus.load_data(data_dir, grab_course = "nrsc2200", grab_term = "W2025")

## Step 5: Fit psychometric functions to the data

This code uses the function
    
    ebbinghaus.fit_data()
    
to fit psychometric functions to the data from each participant. This takes a few minutes. 

After fitting, the data from all participants are stored in a variable type called a *Pandas dataframe*. This variable type is useful in many ways, including that it makes it easy to save the combined data in a new csv file that can be used for statistical analysis or figure making outside of Python. You will not be working with this new csv file in this course. Instead, you will be grabbing the data directly from the dataframe. 
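To illustrate the point about csv files, here is a minimal sketch of saving a dataframe and reading it back. The dataframe is a toy stand-in with made-up values for three hypothetical participants (only the column names are borrowed from this notebook), and the file name `toy_pse.csv` is likewise just an example.

```python
import os
import tempfile

import pandas as pd

# Toy stand-in for info_df: made-up values, hypothetical participant IDs
toy_df = pd.DataFrame({
    "ID": ["p01", "p02", "p03"],
    "pse-same": [25.1, 24.8, 25.3],
    "pse-small": [23.9, 24.2, 23.5],
})

# Write the dataframe to a csv file in a temporary folder
out_path = os.path.join(tempfile.mkdtemp(), "toy_pse.csv")
toy_df.to_csv(out_path, index=False)  # index=False leaves out the row numbers

# Read the file back in to confirm that it holds the same table
reloaded = pd.read_csv(out_path)
print(reloaded)
```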

In [None]:
info_df, same_params = ebbinghaus.fit_data(info_df, "same", False)
info_df, small_params = ebbinghaus.fit_data(info_df, "small", False)

# combine the parameters into a single variable
fit_params = [same_params, small_params]

## Assignment 2
### Question 1 (4 pts): 

The code 

        ebbinghaus.plot_ps(info_df, fit_params, participant_id)
    
creates two plots, one of the reaction time and one of the responses. The input arguments info_df and fit_params should already be in the workspace if you have run steps 1-5 of the code, and the argument participant_id is a string that indicates the participant ID you want to plot. You can also pass "means" to participant_id and the function will plot the average across all participants.

Reaction times and responses are plotted separately for each of the seven physical sizes used for the inner *test disc* (in pixels): 20, 23, 24, 25, 26, 27, 30. The physical size of the inner *reference disc* was always 25 pixels. The same inducers condition is shown in blue, and the small inducers in orange. 

**(A)** Please use the function <span style="color: green; font-weight:bold">plot_ps</span> to plot the averages across all participants. 

**(B)** Then use the function <span style="color: green; font-weight:bold">plot_ps</span> to plot your own data - input your own participant ID to the function. If your data was excluded you may plot someone else's ID. Use the command:

        print(info_df.ID)

to get a list of IDs in the experiment. Try a few, and find someone with reasonable looking data. 

**(C)** Then draw, by hand, S-shaped *Psychometric functions* through the data, and indicate the approximate *Point of Subjective Equality* (PSE). Do this separately for the same and small inducer conditions, in different colors. Do this for both the average data and your own data.  

**(D)** Based on what you did in A-C, is your effect size, measured using the PSEs, bigger or smaller than the average? Explain why. 

**Please share the code used for A and B (2 lines per question, at most) and screenshots of the *Responses* part of the plot with your hand drawn Psychometric functions, as well as your written response to D, in your submitted assignment.**
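If it helps to see what an S-shaped psychometric function and its PSE look like numerically, here is a small sketch. It assumes the response axis shows the proportion of trials on which the test disc was judged larger, and it uses a plain logistic curve with made-up parameter values (`pse=24.0`, `slope=1.5`). It is not the curve fitted by ebbinghaus.fit_data and it does not replace the hand-drawn functions asked for in (C).

```python
import numpy as np


def logistic(size, pse, slope):
    """Proportion of 'test looks larger' responses predicted for a test size."""
    return 1.0 / (1.0 + np.exp(-slope * (size - pse)))


# The seven test disc sizes used in the experiment (pixels)
test_sizes = np.array([20, 23, 24, 25, 26, 27, 30])

# Made-up parameters for illustration only
p_larger = logistic(test_sizes, pse=24.0, slope=1.5)

# The PSE is the size at which the curve crosses 0.5.
# Here we read it off a finely sampled version of the same curve.
fine_sizes = np.linspace(20, 30, 1001)
fine_p = logistic(fine_sizes, pse=24.0, slope=1.5)
pse_estimate = fine_sizes[np.argmin(np.abs(fine_p - 0.5))]

print(np.round(p_larger, 2))
print(pse_estimate)
```

In this toy example the curve crosses 0.5 at a test size smaller than the 25-pixel reference, which is what a shifted PSE looks like on the plot.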


In [None]:
# work on your answer here:


### Question 2 (2 pts): 

The code 

        pse_data = np.array(info_df[["pse-same","pse-small"]])

grabs data from the Pandas dataframe to create an n x 2 numpy array (one row per participant, one column per condition) that contains the PSEs for the same and small inducers condition, respectively. 
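To see how the dataframe columns end up in the array, here is the same conversion applied to a toy dataframe. The values and the three participant IDs are made up. Only the column names match the real data.

```python
import numpy as np
import pandas as pd

# Made-up values for three hypothetical participants
toy_df = pd.DataFrame({
    "ID": ["p01", "p02", "p03"],
    "pse-same": [25.1, 24.8, 25.3],
    "pse-small": [23.9, 24.2, 23.5],
})

# Same conversion as in the text, applied to the toy dataframe
toy_pse = np.array(toy_df[["pse-same", "pse-small"]])
print(toy_pse)
```

The printed array has three rows (one per participant) and two columns (same inducers first, small inducers second).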

**(A)** Please write a function <span style="color: orange; font-weight:bold">subject_count</span> that takes **pse_data** as input and uses the *shape* attribute to get the number of participants in your dataset and return that as an integer variable.
    
**(B)** Please write a function <span style="color: orange; font-weight:bold">summary_stats</span> that takes **pse_data** as input and uses the methods *mean* and *std* or their corresponding numpy commands to compute the **mean** and **standard deviation**, separately for the two conditions, and returns them as two separate variables. Note that both mean and standard deviation can be computed in one line of code.

**(C)** Please write a function <span style="color: orange; font-weight:bold">compute_err</span> that takes **pse_data** as input and computes the **standard error** and 95% confidence interval, separately for the two conditions, and returns them as two separate variables. Watch this [video](https://www.youtube.com/watch?v=AQy11Hfp_dU) (also on eClass) to learn how to compute **standard error** using the sample size (=number of subjects), and how to convert standard error to the 95% confidence interval. You can use the command np.sqrt to take the square root of a number. Your function should return two variables named:
    
        pse_stderr # standard error
        pse_ci # 95% confidence interval

**(D)** Then use the following code to plot your data as a bar plot with error bars. Try replacing pse_stderr with pse_ci in the code below and observe how the error bars change:
    
        import matplotlib.pyplot as plt
        plt.bar((0,1), pse_mean, yerr=pse_stderr, capsize=5)
        plt.ylim([15,30])

**Please submit the functions created for A, B and C for checking using VPL. Please share a screenshot of your bar plot created in D in your submitted assignment.**
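As a reminder for (C), the usual textbook relationships are written out below. This is a sketch under the assumption that the confidence interval is reported as the half-width of the error bar and that a normal approximation (the 1.96 multiplier) is acceptable. If the linked video uses a different convention, follow the video.

$$
SE = \frac{s}{\sqrt{n}}, \qquad CI_{95\%} \approx 1.96 \times SE
$$

Here $s$ is the standard deviation across participants for one condition and $n$ is the number of participants.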


In [None]:
# work on your answer here:

# (A)
def subject_count(pse_data):
    # your code here
    return(num_subs)

num_subs = subject_count(pse_data)

# (B)
def summary_stats(pse_data):
    # your code here
    return(pse_mean, pse_stdev)

pse_mean, pse_stdev = summary_stats(pse_data)

# (C)
def compute_err(pse_data):
    # your code here
    return(pse_stderr, pse_ci)

pse_stderr, pse_ci = compute_err(pse_data)

# (D)

### Question 3 (2 pts): 

The size of the *Ebbinghaus effect* for each participant can be computed as the difference between pse_same and pse_small. 

**(A)** Write a function <span style="color: orange; font-weight:bold">effect_sizes</span> that takes pse_data as input, subtracts pse_small from pse_same, and assigns the output to a new variable. This can be done in one line by selecting the different columns of pse_data. Now use the methods *max* and *min* or the corresponding numpy commands to compute the maximum and minimum effect sizes. Have the function return them as two distinct variables. 

**(B)** Write a similar function <span style="color: orange; font-weight:bold">effect_ids</span> that again takes pse_data as input and subtracts pse_small from pse_same to create a new variable, but instead uses the methods *argmax* and *argmin* or the corresponding numpy commands to get the indices of the participants with the maximum and minimum effect sizes. Have the function return them as two distinct variables. 

You can then get the ID of the participants with the minimum or maximum effect sizes using this command
    
    print(np.array(info_df[["ID"]])[max_id,0])

where max_id is the index. Same approach for min_id.

**Please submit the functions created for A and B, for checking using VPL. Please report the ID of the participant with the maximum and minimum effects in your submitted assignment.**
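If you have not used *argmax* and *argmin* before, the toy example below shows the difference between a value and its index, using unrelated made-up data (temperatures for four cities) rather than the PSE data.

```python
import numpy as np

# Made-up data: one temperature per city, in matching order
cities = np.array(["Toronto", "Halifax", "Calgary", "Victoria"])
temps = np.array([21.5, 18.0, 24.5, 19.0])

# max/min return the values themselves ...
warmest_temp = temps.max()
coolest_temp = temps.min()

# ... whereas argmax/argmin return the positions of those values
warmest_idx = int(np.argmax(temps))
coolest_idx = int(np.argmin(temps))

# The positions can then be used to look up the matching labels
warmest_city = cities[warmest_idx]
coolest_city = cities[coolest_idx]
print(warmest_city, coolest_city)
```

The same idea applies when the index is used to look up a participant ID instead of a city name.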


In [None]:
# work on your answer here:

# (A)
def effect_sizes(pse_data):
    # your code here
    return(pse_max, pse_min)

pse_max, pse_min = effect_sizes(pse_data)

# (B)
def effect_ids(pse_data):
    # your code here
    return(max_id, min_id)

max_id, min_id = effect_ids(pse_data)

### Question 4 (2 pts):

Please read the article “The surface area of human V1 predicts the subjective experience of object size”, linked on eClass. 

Based on the findings presented in the article, what would you expect to be true about primary visual cortex (area V1) of those participants who have the largest Ebbinghaus effects in your experiment?

**Please share your answer to this question in your submitted assignment.**