## Step 1: Import some packages 
As is often the case, we start our code by importing some Python modules.

Remember: **Your code will not work** unless you run the cell in which the modules are imported.

In [10]:
# import python modules to use for analysis
import ebbinghaus

## Step 2: Get the data from the Open Science Framework
The data files are shared in a public repository at the [Open Science Framework](https://osf.io/s6mxd/) (OSF).

The code below calls a function that grabs the data and makes it accessible to the current workbook. 

If you inspect the function ("fetch_data" in ebbinghaus.py), you will see that it runs the command

    !osf --project s6mxd clone
    
- "!" indicates that we are running something from the command line, outside Python.
- "s6mxd" is the ID of the project on OSF.
- "clone" simply means we want to make a local copy of the files from OSF

The command will place the files in the local folder 

    s6mxd/osfstorage/data
    
which is where we grab the data from in the next step.

A limitation that I am still trying to work out is that you cannot specify a subset of files to grab; you have to get the whole project.
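
The "download only if needed" behaviour can be sketched as follows. This is a minimal illustration, not the actual contents of ebbinghaus.py: the function name `fetch_project` and the `runner` argument are hypothetical, and only the `osf --project ... clone` command and the folder layout are taken from the text above.

```python
import subprocess
from pathlib import Path


def fetch_project(project_id, runner=subprocess.run):
    """Clone an OSF project unless a local copy is already present.

    Hypothetical stand-in for ebbinghaus.fetch_data; the real function may differ.
    """
    # Folder layout described above: <project id>/osfstorage/data
    data_dir = Path(project_id) / "osfstorage" / "data"
    if data_dir.is_dir():
        print("data already exists, not downloading")
        return data_dir
    # Same command as "!osf --project s6mxd clone", run from Python
    runner(["osf", "--project", project_id, "clone"], check=True)
    return data_dir
```

Calling `fetch_project("s6mxd")` a second time skips the download, because the data folder is already there.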

In [11]:
ebbinghaus.fetch_data("s6mxd")

66
data already exists, not downloading


## Step 3: Load and organize the data from each participant

The code below loads the individual data files, in csv format, that each participant (including you) created when they did the experiment. When running an experiment online, you would normally have some sort of back-end that saves the files automatically somewhere, so you do not have to trust your participants to send you the raw data.
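
As a rough idea of what loading a folder of csv files involves, here is a minimal sketch using pandas. It is not the implementation of ebbinghaus.load_data, which also applies the exclusion rules printed below; the function name `load_participant_files` and the `source_file` column are hypothetical.

```python
import glob
import os

import pandas as pd


def load_participant_files(data_dir):
    """Read every csv file in data_dir and stack them into one dataframe."""
    frames = []
    for path in sorted(glob.glob(os.path.join(data_dir, "*.csv"))):
        df = pd.read_csv(path)
        # Record which file each row came from (hypothetical bookkeeping column)
        df["source_file"] = os.path.basename(path)
        frames.append(df)
    if not frames:
        raise FileNotFoundError(f"no csv files found in {data_dir}")
    return pd.concat(frames, ignore_index=True)
```

Keeping the file name next to each row makes it possible to trace any trial back to the participant file it came from.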

In [12]:
data_dir = ["s6mxd/osfstorage/data"]
info_df = ebbinghaus.load_data(data_dir)

Excluding NRSC2200_S2023 : 1456, chance performance
Excluding NRSC2200_S2023 : 7823, chance performance
Excluding NRSC2200_S2023 : 9092, chance performance
Excluding NRSC2200_S2023 : 1493, chance performance
Excluding NRSC2200_S2023 : 2543, chance performance
Excluding NRSC2200_S2023 : 4733, chance performance
Excluding NRSC2200_S2023 : 6160, chance performance
Excluding PSYC4260_F2021 : 0972, less than 10 trials with RT < 3 secs
Excluding PSYC4260_F2021 : 6555, chance performance
Excluding PSYC4260_F2021 : 8491, less than 10 trials with RT < 3 secs


## Step 5: Fit psychometric functions to the data

This code uses the function
    
    ebbinghaus.fit_data()
    
to fit psychometric functions to the data from each participant. This takes a few minutes. 

After fitting, the data from all participants are stored in a variable type called a *Pandas dataframe*. This variable type is useful in many ways; for example, it makes it easy to save the combined data in a new csv file that can be used for statistical analysis or figure making outside of Python. You will not be working with this new csv file in this course. Instead, you will be grabbing the data directly from the dataframe.
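
To make "fitting a psychometric function" concrete, here is a minimal sketch that fits an S-shaped logistic curve to made-up response proportions and reads off the PSE as the size at which the curve crosses 50%. Everything in it is an assumption for illustration: ebbinghaus.fit_data may use a different curve and a different fitting procedure, the proportions are invented, and the brute-force grid search is chosen only because it is easy to follow.

```python
import numpy as np


def logistic(size, pse, slope):
    """S-shaped curve: assumed probability of judging the test disc as larger."""
    return 1.0 / (1.0 + np.exp(-slope * (size - pse)))


# Seven test-disc sizes (pixels) and made-up proportions of "larger" responses
sizes = np.array([20, 23, 24, 25, 26, 27, 30], dtype=float)
p_larger = np.array([0.02, 0.15, 0.30, 0.50, 0.70, 0.85, 0.98])

# Brute-force least squares: try every (pse, slope) pair on a grid
pse_grid = np.linspace(20.0, 30.0, 201)
slope_grid = np.linspace(0.1, 3.0, 59)
best_error, pse_hat, slope_hat = min(
    (np.sum((logistic(sizes, pse, slope) - p_larger) ** 2), pse, slope)
    for pse in pse_grid
    for slope in slope_grid
)
print(f"estimated PSE: {pse_hat:.2f} pixels, slope: {slope_hat:.2f}")
```

The PSE parameter shifts the curve left or right, while the slope parameter controls how steeply responses change around it.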

In [13]:
info_df, fit_params = ebbinghaus.fit_data(info_df)

Fitting psychometric functions for "same" condition ... finished!
Fitting psychometric functions for "small" condition ... finished!


## Assignment 2
### Question 1 (4 pts): 

The code 

    ebbinghaus.plot_ps(info_df, fit_params, participant_id)
    
creates two plots, one of the reaction times and one of the responses. The input arguments info_df and fit_params should already be in the workspace if you have run steps 1-5 of the code, and the argument participant_id is a string that indicates the ID of the participant you want to plot. You can also pass "means" to participant_id, and the function will plot the average across all participants.

Reaction times and responses are plotted separately for each of the seven physical sizes used for the inner *test disc* (in pixels): 20, 23, 24, 25, 26, 27, 30. The physical size of the inner *reference disc* was always 25 pixels. The same inducers condition is shown in blue, and the small inducers condition in orange.

(a) Please use the function plot_ps to plot the averages across all participants. 

(b) Then use the function plot_ps to plot your own data by passing your own participant ID to the function. If your data were excluded, you may use someone else's ID. Participant "999" has very reasonable data.

(c) Then draw, by hand, S-shaped *Psychometric functions* through the data, and indicate the approximate *Point of Subjective Equality* (PSE). Do this separately for the same and small inducer conditions, in different colors. Do this for both the average data and your own data.  

(d) Based on what you did in 1a-1c, is your effect size, measured using the PSEs, bigger than the average or smaller than the average? Explain why. 

Your answer should include the code used in a and b (2 lines per question, at most) and screenshots of the *Responses* part of the plot with your hand-drawn functions.



In [34]:
# work on your answer here:


### Question 2 (4 pts): 

The code 

    pse_data = np.array(info_df[["pse-same","pse-small"]])

grabs data from the Pandas dataframe to create an n x 2 numpy array (one row per participant) whose two columns contain the PSEs for the same and small inducers conditions, respectively.
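
The orientation of this array matters for the steps below, so here is a small sketch on made-up data. The dataframe `toy_df` and its three participants are hypothetical; only the column names come from the code above.

```python
import numpy as np
import pandas as pd

# Made-up PSEs for three hypothetical participants
toy_df = pd.DataFrame(
    {
        "ID": ["001", "002", "003"],
        "pse-same": [25.1, 24.8, 25.3],
        "pse-small": [23.9, 24.2, 23.5],
    }
)

toy_pse = np.array(toy_df[["pse-same", "pse-small"]])
print(toy_pse.shape)  # (3, 2): rows are participants, columns are conditions
print(toy_pse[:, 0])  # the "pse-same" column for every participant
```

Because participants run along the rows, anything computed separately per condition has to work down the rows, that is, along axis 0.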

(a) Please use the *shape* method to get the **number of participants in your dataset** and save that as an integer variable named
    
    num_subs

(b) Please use the methods *mean* and *std* or their corresponding numpy commands to compute the **mean** and **standard deviation**, separately for the two conditions, and save them as two variables named:

    pse_means
    pse_stdev

Note that both mean and standard deviation can be computed in one line of code.

(c) Watch this [video](https://www.youtube.com/watch?v=AQy11Hfp_dU) (also on eClass) to learn how to compute **standard error** using the sample size, and how to convert standard error to the 95% confidence interval. You can use the command *np.sqrt* to take the square root of a number. Save as two variables named:
    
    pse_stderr # standard error
    pse_ci # interval

Use the following code to plot your data as a bar plot with error bars:
    
    plt.bar((0,1), pse_means, yerr=pse_ci, capsize=5)
    plt.ylim([15,30])
    
Try replacing pse_ci with pse_stderr in the above and observe how the error bars change. 
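
For reference, the standard textbook relationships behind part (c) are given here. They are stated as general statistics, not taken from the video, which may use a slightly different multiplier:

$$
\mathrm{SE} = \frac{s}{\sqrt{n}}, \qquad \mathrm{CI}_{95\%} \approx \bar{x} \pm 1.96 \, \mathrm{SE}
$$

where $s$ is the sample standard deviation, $n$ is the number of participants, and $\bar{x}$ is the sample mean. The factor 1.96 comes from the normal distribution; for small samples, the corresponding value from the t-distribution is somewhat larger.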
    
(d) The size of the *Ebbinghaus effect* for each participant can be computed as the difference between pse_same and pse_small. Use pse_data to subtract pse_small from pse_same and assign the output to a new variable called:

    pse_diff

This can be done in one line by selecting the different columns of pse_data. Now use the method *max* or its corresponding numpy command to assign the maximum effect size to a new variable called

    pse_max
    
Bonus: Use the method *argmax* or its corresponding numpy command to get the index of the participant that has the maximum effect size. You can then get the ID of that participant using this command
    
    print(np.array(info_df[["ID"]])[max_id,0])

where max_id is the index. 

Your answer should include the code used in a-d and a screenshot of the bar plot you created in (c). 


In [None]:
# work on your answer here:

### Question 3 (2 pts):

According to the article “The surface area of human V1 predicts the subjective experience of object size”, linked on eClass, what would you expect to be true about primary visual cortex (area V1) of those participants who have the largest Ebbinghaus effects in your experiment?