# Lab 4: Review of loops and lists; writing functions; `pandas`
---

# Practice with loops and lists.

## 1. Loop through the provided list `my_list`.

For each element, check if it is an even number. If the element is an even number, append the **index** of that element to the list `inds`.  

Note that you are adding the index to `inds`, **not the element itself.** 

Hints:  
1. To check if a number is even, you can use the modulo `%` operator.
2. To loop through an iterable while keeping track of the index, you can use the `enumerate` function. In a `for` loop, `enumerate` yields both the index and the value of each element in the list. Example usage:

```python
for idx, val in enumerate(some_list):
    print(idx, val)  # idx is the position, val is the element at that position
```
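
As a rough illustration of both hints, here is a minimal sketch on a made-up list (`words` and `short_inds` are hypothetical names, not part of the exercise). It shows what `%` returns and how `enumerate` lets you collect indices rather than elements.

```python
# Hypothetical list, unrelated to `my_list`.
words = ["apple", "kiwi", "banana", "fig"]

# % gives the remainder after division, so n % 2 is 0 exactly when n is even.
print(10 % 2)  # 0
print(7 % 2)   # 1

# enumerate yields (index, value) pairs; here we keep the indices of short words.
short_inds = []
for idx, word in enumerate(words):
    if len(word) <= 4:
        short_inds.append(idx)

print(short_inds)  # [1, 3]
```

Note that `short_inds` holds positions in `words`, not the words themselves.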

In [None]:
# These variables are provided to you.
my_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
inds = []

# Your answer here



---
## 2. Looping through Dictionaries.

Using the `students` dictionary provided below, write a for loop that loops across the dictionary and collects all subject numbers (e.g., 'S2') where the dictionary value is `False`. 

Imagine, for example, the dictionary indicates whether a student has completed an assignment, and we wanted to get a list of the students who had *not yet completed the assignment*. 

To answer this question, write a `for` loop over the `students` dictionary. In each iteration, get the associated value and check whether it is `True`. If it is `True`, you can use `continue` to skip ahead to the next iteration. Otherwise, append the subject number (e.g., 'S2') to a list called `incomplete`.
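
The pattern described above (loop, check the value, `continue` or append) might look like the following sketch on an unrelated, made-up dictionary. The names `stock` and `sold_out` are assumptions for illustration only; `.items()` is one of several ways to get both the key and the value.

```python
# Hypothetical dictionary mapping item names to how many are in stock.
stock = {"pens": 12, "paper": 0, "ink": 3, "tape": 0}

sold_out = []
for item, count in stock.items():
    if count > 0:
        continue  # still in stock, skip to the next key-value pair
    sold_out.append(item)  # store the key, not the value

print(sold_out)  # ['paper', 'tape']
```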

In [None]:
# Run this cell
students = {
    "S1": True,
    "S2": False,
    "S3": True,
    "S4": False
}

In [None]:
# Your answer here



---
# Writing a function

In the next several problems you will need to write little functions that accomplish various goals. ***Be sure to include example usages of your custom-written functions (i.e., test them).***

## 3. Defining a function.

What does this function do? Write a new cell below the function definition that calls the function with five different inputs, and verify that the outputs match your expectations.

In [None]:
def my_function(x, y):
    return x + y

---
## 4. What is not great about this function?

This function works but is not ideal. Why?

In [None]:
def my_function_2(x, y, z):
    return x + y

---
## 5. Write your own function.

Write a function named `catchy_name()` that takes two arguments, both assumed to be strings, and returns a new string formed by concatenating the first two letters of each input. For example:
```
catchy_name("South", "Houston") -> "SoHo"
catchy_name("Jennifer", "Lopez") -> "JeLo"
```

Test it on your full name. 
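
If string slicing is unfamiliar, this small sketch (with a made-up string, `city`) shows how a slice pulls out the leading characters of a string and how `+` joins two strings.

```python
# Hypothetical string used only to demonstrate slicing.
city = "Tribeca"

first_two = city[:2]            # characters at positions 0 and 1
print(first_two)                # Tr

joined = city[:3] + city[-2:]   # first three characters plus the last two
print(joined)                   # Trica
```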

In [None]:
# These variables are provided to you.
first_name = "Jennifer"
last_name = "Lopez"

# Your answer here



---
## 6. Cat-Dog.

Write a function called `cat_dog` that returns `True` if the input string contains the words "cat" and "dog" the same number of times. For example,
```
cat_dog('catdog') # should return True
cat_dog('catcat') # should return False
cat_dog('1cat1cadodog') # should return True
```

Google 'python count occurrences in string' to find an existing function or method that counts the number of times a substring appears in a string to get started.
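
One method such a search is likely to turn up is the built-in `str.count`. The sketch below uses a made-up string (`text`) to show its behavior, including the fact that it counts non-overlapping occurrences.

```python
# Hypothetical string used only to demonstrate str.count.
text = "banana bandana"

n_an = text.count("an")      # "an" appears twice in each word
n_xyz = text.count("xyz")    # a substring that never appears gives 0
n_aa = "aaaa".count("aa")    # matches do not overlap, so this is 2, not 3

print(n_an, n_xyz, n_aa)     # 4 0 2
```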

In [None]:
# Your answer here



---
## 7. Summing function.

Write a function called `summing()` that takes a list of numbers as input and returns the sum of all the numbers.  Use a `for` loop to do this.

```
summing([1, 2, 3]) → 6
summing([5, 11, 2]) → 18
summing([7, 0, 0]) → 7
```
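
The general idea here is the accumulator pattern: start from an initial value and update it on every pass through the loop. As a sketch of that pattern on a different operation, the hypothetical function `product` below multiplies the elements instead of adding them.

```python
def product(numbers):
    """Multiply all numbers in a list using an accumulator (illustrative only)."""
    total = 1                 # starting value for a running product
    for n in numbers:
        total = total * n     # update the accumulator on each iteration
    return total

print(product([1, 2, 3]))     # 6
print(product([5, 11, 2]))    # 110
```

For a sum, the accumulator would start from a different initial value.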

In [None]:
# Your answer here



---
## 8. Biggest difference function. 

Write a function called `big_diff` that, given a list of one or more ints, returns the difference between the largest and smallest values in the list. Hint: You can use the built-in `min` and `max` functions.

```
big_diff([10, 3, 5, 6])  # Should return 7
big_diff([7, 2, 10, 9])  # Should return 8
big_diff([2, 10, 7, 2])  # Should return 8
```
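
For reference, this is how the two built-ins behave on a made-up list (`temps` is a hypothetical name). Both raise a `ValueError` on an empty list, which is why the problem assumes at least one element.

```python
# Hypothetical list used only to demonstrate min and max.
temps = [12, 7, 19, 15]

lowest = min(temps)
highest = max(temps)

print(lowest)   # 7
print(highest)  # 19
```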

In [None]:
# Your answer here



---
## 9. Simulating a dynamical system 

In mathematics, a [dynamical system](https://en.wikipedia.org/wiki/Dynamical_system) is a system
in which a function describes the time dependence of a point in a geometrical space. A canonical
example of a dynamical system is the [logistic map](https://en.wikipedia.org/wiki/Logistic_map),
a growth model that computes a new population density (between  0 and 1) based on the current
density. In the model, time takes discrete values 0, 1, 2, ...


1) Define a function called `logistic_map` that takes two inputs: `x`, representing the current
population (at time `t`), and a parameter `r` with a default value of 1. This function should return a value
representing the state of the system (population) at the next time step `t + 1`, using the mapping function:
$$f(t+1) = r \cdot f(t) \cdot [1 - f(t)]$$
Note that the function itself does not need to track the time index `t`, because you will use a loop in the next steps.

2) Using a `for` loop, iterate the `logistic_map` function defined in part 1, starting
from an initial population of 0.5, for a period of time `t_final = 10`. Store the intermediate
results in a list so that after the loop terminates you have accumulated a sequence of values
representing the state of the logistic map at times `t = [0,1,...,t_final]` (11 values in total).
Print this list to see the evolution of the population.

3) Encapsulate the logic of your loop into a function called `iterate` that takes the initial
population as its first input, the parameter `t_final` as its second input and the parameter
`r` as its third input. The function should return the list of values representing the state of
the logistic map at times `t = [0,1,...,t_final]`. Run this function for periods `t_final = 100`
and `1000` and print some of the values. Is the population trending toward a steady state?
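
The loop in part 2 follows a common pattern: apply an update rule repeatedly and store every intermediate state. The sketch below shows that pattern with a deliberately different, made-up update rule (`halve`, with a hypothetical default parameter `factor`), so the names and numbers here are assumptions rather than part of the exercise.

```python
def halve(x, factor=0.5):
    """Hypothetical update rule: scale x by factor (0.5 unless overridden)."""
    return factor * x

n_steps = 3
values = [8.0]                       # state at time 0
for _ in range(n_steps):
    values.append(halve(values[-1])) # next state depends on the latest one

print(values)                        # [8.0, 4.0, 2.0, 1.0]
```

Because the initial state is stored before the loop starts, the list ends up with `n_steps + 1` entries.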

In [None]:
# Your answer here (Part 1)


In [None]:
# Your answer here (Part 2)



In [None]:
# Your answer here (Part 3)




In [None]:
# Your answer here (use this cell to call "iterate" and run for different
# periods).


---
# `pandas` DataFrames

In this portion of the lab we will be working with pseudo data from a motor control experiment involving reaching movements to visual targets. The goal is to familiarize yourselves with processing and analyzing data using the `pandas` library. **Hint: Every answer should be possible with one line of code. The only exceptions are when you want to view your data frame after doing the task, i.e., call `df.head()` or something similar.** 

## Preliminaries
Import the `NumPy` and `pandas` libraries using appropriate aliases.

In [None]:
# Your answer here


Execute the next cell so that the `data` dictionary is created. Apart from `participantID` and `sex`, the values in each cell represent the corresponding subject's average for that variable across the 100 trials of the experiment.
- `sex` represents the biological sex of the subject 
- `RT` stands for "reaction time", the elapsed time (in ms) between visual target presentation and movement onset
- `MT` stands for "movement time", the elapsed time (in ms) between movement onset and movement end
- `Error` refers to the Euclidean distance (in mm) between the reach endpoint and the center of the target

In [None]:
# Run this cell
data = {
    "participantID": ["s01", "s02", "s03", "s04", "s05", 
                      "s06", "s07", "s08", "s09", "s10"],
    "sex": ["M", "M", "F", "M", "F", "F", "F", "M", "F", "M"], 
    "RT": [432, 501, 498, 1399, 359, 444, 442, 491, 508, 380], 
    "MT": [195, 233, 201, 176, 240, 300, 205, 223, 191, 366],
    "Error": [23.2, 15.6, 10.9, 11.3, 19.0, 10.1, 24.2, 11.7, 9.8, 8.3]
}

---
## 10. Convert `data` to a pandas DataFrame called `df`. Use the `.head()` method to view your new data frame.

In [None]:
# Your answer here


---
## 11. Use the `participantID` variable as indices for the DataFrame. 

In [None]:
# Your answer here


---
## 12. Edit the DataFrame.

A mistake was made when recording the `MT` for participant `s03`. After cross-referencing the values in the original data, you realize the value should actually be 374 ms. Replace the incorrect value with the correct one.
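
One way to change a single cell is label-based assignment with `.loc`, which takes a row label and a column label. The sketch below uses a made-up data frame (`toy`, with hypothetical columns `name` and `height_cm`) rather than the lab data.

```python
import pandas as pd

# Hypothetical data, unrelated to the lab's `data` dictionary.
toy = pd.DataFrame({"name": ["ann", "bob", "cy"], "height_cm": [160, 175, 182]})

# Use the `name` column as the row labels.
toy = toy.set_index("name")

# .loc[row_label, column_label] selects one cell, which can then be overwritten.
toy.loc["bob", "height_cm"] = 171

print(toy)
```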

In [None]:
# Your answer here


---
## 13. Filter the data.

Create a Boolean mask to differentiate between valid and invalid data (or "outliers"). Here, invalid data are defined as an `Error` greater than 20 mm *or* an `RT` greater than 1000 ms. (Note: these are arbitrary thresholds; soon, you will learn much more principled ways of determining what is or is *not* an outlier value.) It is typically up to you whether `True` values in a Boolean mask represent valid or invalid data. For now, however, make sure `True` represents valid (i.e., non-outlier) values.
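
As a sketch of how Boolean masks combine in `pandas`, the example below uses a made-up data frame (`toy`, with hypothetical columns `temp` and `humidity`) and arbitrary thresholds. Element-wise "or" is written `|`, and `~` flips a mask.

```python
import pandas as pd

# Hypothetical data, unrelated to the lab's `data` dictionary.
toy = pd.DataFrame({"temp": [18, 35, 22, 40], "humidity": [40, 45, 95, 99]})

# Each comparison needs its own parentheses when combined with |.
too_extreme = (toy["temp"] > 30) | (toy["humidity"] > 90)

# ~ inverts every Boolean, so True now marks the rows to keep.
keep = ~too_extreme

print(toy[keep])  # only the first row satisfies neither condition
```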

In [None]:
# Your answer here


#### Use your mask to display only valid rows from `df`. 

In [None]:
# Your answer here


#### Assign the processed (filtered) data to a new DataFrame `df_filt`. 

In [None]:
# Your answer here


---
## 14. Analyze the filtered data.

Calculate mean `RT`s, `MT`s, and `Error`s across the remaining subjects and print the results. Do this in one line. 

In [None]:
# Your answer here


#### Create a new column `PerfIndex` that represents each participant's overall performance on the task (a lower score means better performance). The formula is: $$PerfIndex_i = \frac{1}{3}RT_i + \frac{2}{3}MT_i + Error_i^2$$  

Here, $i$ indexes the subject whose data is being converted. Note that you may receive a "SettingWithCopyWarning" message; however, this is one case where you can ignore the message for now. If you want to get rid of it, you can explicitly create a copy of the subsetted data frame you created in problem 13. (**Warning:** This is a rather arbitrary performance index; its use here is simply as part of the exercise and is not meant to provide any insight into actual motor control.)
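
The explicit-copy approach mentioned above can be sketched as follows on a made-up data frame (`toy`, `subset`, and `combo` are hypothetical names, and the formula is arbitrary). Calling `.copy()` on the filtered result makes it independent of the original, so adding a column does not trigger the warning.

```python
import pandas as pd

# Hypothetical data, unrelated to the lab's `data` dictionary.
toy = pd.DataFrame({"a": [1, 2, 3, 4], "b": [10, 20, 30, 40]})

# Filter, then copy, so `subset` is its own data frame.
subset = toy[toy["a"] > 2].copy()

# Column arithmetic is element-wise, so this computes one value per row.
subset["combo"] = 0.5 * subset["a"] + subset["b"] ** 2

print(subset)
```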

In [None]:
# Your answer here


#### Using a `pandas` method, determine who performed best in this experiment.
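
One candidate method is `idxmin`, which returns the index label of the smallest value in a column. The sketch below uses made-up data (`laps` and `lap_time` are hypothetical names).

```python
import pandas as pd

# Hypothetical data with row labels r1..r3.
laps = pd.DataFrame({"lap_time": [61.2, 58.9, 60.4]}, index=["r1", "r2", "r3"])

# idxmin returns the row label, not the value itself.
fastest = laps["lap_time"].idxmin()

print(fastest)  # r2
```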

In [None]:
# Your answer here


**Split-apply-combine.** Were there any salient differences between males and females on this task? Apply the **split-apply-combine** paradigm to find out. There's no single right answer, but there are some logical starting points--e.g., comparing some measure of central tendency. By the way, I'm not looking for a rigorous statistical inference; I just want you to think and practice using the **split-apply-combine** technique. 
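
In `pandas`, split-apply-combine is usually expressed with `groupby`: split the rows by a grouping column, apply a summary to each group, and combine the results into one object. The sketch below uses made-up data (`toy`, `group`, and `score` are hypothetical names) and the mean as the summary.

```python
import pandas as pd

# Hypothetical data, unrelated to the lab's `data` dictionary.
toy = pd.DataFrame({
    "group": ["a", "b", "a", "b"],
    "score": [1.0, 2.0, 3.0, 6.0],
})

# Split by `group`, apply mean to `score`, combine into a Series indexed by group.
group_means = toy.groupby("group")["score"].mean()

print(group_means)
```

Other summaries, such as the median or the count, can be swapped in for `mean` in the same position.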

In [None]:
# Your answer here


---
This notebook has been adapted from materials from NYU's [Lab in Cognition and Perception](https://cims.nyu.edu/~brenden/courses/labincp/course-content/syllabus.html), the [Data Science in Practice textbook](https://datascienceinpractice.github.io/docs/index.html), and [Software Carpentry's Plotting and Programming in Python workshop](https://swcarpentry.github.io/python-novice-gapminder/index.html).