# Functions

Have a look at this program (no, don't run it -- it won't run):



In [None]:
# Load the raw data using their helper function and use just a single ROI as a mask
# Note, I cleaned up how the path here works from their example
print('Loading the data')
tutorial_data_path='/mnt/mridata2/pymvpa_tutorials'
ds = load_tutorial_data(path=tutorial_data_path, roi='vt')

# Do some typical pre-processing.
print('Pre-processing')
# Remove 1st order (linear) drift by making separate drift estimate per 'chunk' 
# (scan run here)
poly_detrend(ds, polyord=1, chunks_attr='chunks')
# Convert the EPI values into z-scores using the rest samples to come up with 
# the mean and standard deviation that we use to calculate z-scores.  This is
# again by chunk.  
zscore(ds, param_est=('targets', ['rest']))

# Filter out the rest blocks
print('Removing rest blocks - started at {0} samples'.format(ds.shape))
ds = ds[ds.sa.targets != 'rest']
print('Now {0} samples'.format(ds.shape))

# Average timepoints within each target type and chunk
print('Averaging within each target type and chunk')
run_averager = mean_group_sample(['targets', 'chunks'])
ds = ds.get_mapped(run_averager)
print('Averaged within-blocks -- now have {0} samples'.format(ds.shape))


That's almost readable.  There's some function `load_tutorial_data()` that takes a path and some _roi_ parameter. It loads the data however it loads it, and it must return the loaded data, since the _ds_ variable gets set to whatever comes back.  So, that function does the heavy lifting of reading stuff from the disk, and all I as a user need to do is hold onto whatever it returns.

Next what happens ... well there's some kind of trend in the data that we seem to want to remove.  We pass the data into `poly_detrend()` and then there's some kind of normalizing that must be going on inside of `zscore()`.  See how you can read the code reasonably and understand it?

One of the main purposes of functions is to make your code more readable.

A second purpose is to make it reusable.  Everything in SciPy, NumPy, and Pandas that you're using here came down to someone writing that function.  So, you're using their work.  They used each other's work as well to make all these packages and make them more useful.  **By writing functions (and doing it well), you're adding new tools into the language that you, or someone else, can use.**

Here are a few sample functions:


```
def funcA(i):
    j=100
    return i+j

def funcB(i=10):
    j=100
    return i+j

def funcC(i, j=100):
    return i+j

def funcD():
    i=10
    j=100
    return i+j

def funcE(i):
    j=100
```


All start out the same with the `def` keyword, the name of the function, some parentheses, and a colon.  That's mandatory.  Past that, though, they're a touch different.



*   In `funcA`, _i_ is a mandatory input argument and it returns _i_+100 (as _j_ is always 100).


*   In `funcB`, _i_ is now an optional argument that has a default value of 10.  If I say `funcB()` and don't specify something for _i_, the return value will be 110 as _i_ takes its default value of 10.  If I say `funcB(150)` the return value will be 250.


*   In `funcC`, _i_ is mandatory, _j_ is optional and has a default value of 100.  I could call `funcC(50)` and get 150 back.  I could call `funcC(50,50)` and get 100 back.


*   In `funcD`, no input is given.  It will always return 110.


*   `funcE` is stupid. It takes a parameter _i_, doesn't do anything with it, and doesn't return anything either (so calling it just gives back `None`).  Don't write `funcE`.  At the very least, do something inside on some known globals, fixed-location files, etc.  But, do know that functions don't need inputs and don't need to have return values.

Keep in mind, of course, that in each of these examples, _i_ and _j_ are local to those functions.  Your outside code doesn't know about those variable names.  So, I might have code elsewhere that says:


```
d=load_data()
proc_d=funcA(d)
```


What this does is call `load_data()`, which must know where to get the data from (or prompt the user). That function will have a `return` that sends some data back.  Once back, we assign it to the variable _d_.  Now, we call `funcA` and pass in whatever is in _d_ (what got returned from `load_data()`) and internally, we know it calls this _i_.  We don't really care though as that is all inside of `funcA`.  When it comes back, we assign this to _proc_d_.

Now, if we never need the raw _d_ again, we could have also said:


```
proc_d=funcA(load_data())
```
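
To make the sample functions concrete, here's a small runnable sketch that calls each of them. The `load_data()` here is a made-up stand-in that just returns a number (the real one would read from disk or prompt the user), and the outer `j` is only there to show that the _j_ inside the functions is a separate, local variable.

```python
def funcA(i):
    j = 100
    return i + j

def funcB(i=10):
    j = 100
    return i + j

def funcC(i, j=100):
    return i + j

def funcD():
    i = 10
    j = 100
    return i + j

def funcE(i):
    j = 100  # no return statement, so a call evaluates to None

def load_data():
    # Hypothetical stand-in for a real loader
    return 5

j = 'outside'                    # a different j, living outside the functions

print(funcA(1))                  # 101
print(funcB(), funcB(150))       # 110 250
print(funcC(50), funcC(50, 50))  # 150 100
print(funcD())                   # 110
print(funcE(3))                  # None

proc_d = funcA(load_data())
print(proc_d)                    # 105
print(j)                         # outside -- the functions never touched it
```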


So - think about writing useful bits of code that either help break up a big problem into small problems or would be useful in several places in your code.  In general, if a block of code is going beyond a page or so, you should really be looking at breaking it up into different functions.


## Returning multiple values

In the above, we covered multiple input parameters and optional parameters, but what about returning multiple values?  Technically, you can only return one thing (which suggests you're up a creek), but remember that that one thing can be something like a tuple or a list.  So we can do things like:


```
def UniteMiddleEast(number_of_cat_memes_to_deploy=1000):
   ...  # This part is an exercise for the reader
   return (phase1,phase2)

ph1,ph2=UniteMiddleEast(np.inf)
```


So here, we take an optional parameter, do lots of computation, and return two results. We do this by bundling them in a tuple. So, we're returning just one thing - a tuple. On the calling end, we have a tuple consisting of _(ph1,ph2)_ getting the return value (you don't actually need the parentheses here).  Python then maps the two elements of the tuple being returned to these two variables.
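
Since that example can't actually run, here's a minimal runnable sketch of the same pattern. The helper `min_max()` is made up for illustration (it isn't from any library); it bundles two results in a tuple and the caller unpacks them.

```python
def min_max(values):
    """Return the smallest and largest value, bundled as one tuple."""
    return (min(values), max(values))

# Keep the tuple as a single thing...
result = min_max([3, 1, 4, 1, 5])
print(result)      # (1, 5)

# ...or unpack it straight into two variables (no parentheses needed)
lo, hi = min_max([3, 1, 4, 1, 5])
print(lo, hi)      # 1 5
```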


# Flow control

Our programs need to be able to branch and execute certain lines under certain conditions.  We need to loop code over and over some number of times or for every element on a list, however long that list may be.  We need to be able to break out of these or bail out of functions when things go badly.  All this is flow control.


## If / else / elif

If / else-if / else -- the classic branching kind of statement is in every programming language.  Each one has a little different syntax (_else if_ vs. _elseif_ vs. _elsif_ vs. _elif_ for starters...) but the concept is always the same.  Have a look at the following:


```
if size > 100:
    mode = 1
elif size < 50:
    mode = 2
elif size < 70:
    mode = 3
else:
    mode = 4
```


Now, if _size_=110 our resulting mode will be 1. If it's 40, the first `if` fails and we get to the first `elif`. That succeeds, so our mode is now 2. If size were 60 though, the first `if` fails, the first `elif` fails, but the second `elif` is true so our mode is now 3. Now, if it's 80, our mode gets to be 4. So this code is saying "above 100 is mode 1; below 50 is mode 2; between 50 and 70 is 3; between 70 and 100 is 4".

Think about that for a second - in particular that last bit about 70-100 being 4. Because the first `if` failed, we know the value of _size_ is <= 100. None of the other `elif` lines or the `else` is ever even looked at if _size_ > 100. So, we know by virtue of hitting that `else` that _size_ must be under or equal to 100.  Since the first `elif` failed, we know it's not < 50.  So at this point, it's 50-100.  But we also know that the second `elif` failed and that the value isn't < 70.  So, it must be between 70 and 100 (note, I'm being loose here with my ranges and the exact values of 50, 70, and 100 to try to keep the text less awkward).

You might accomplish the same kind of thing with code like this: 


```
if size > 100:
    mode = 1
if size < 50:
    mode = 2
if 50 <= size < 70:
    mode = 3
if 70 <= size <= 100:
    mode = 4
```


The reason for doing the former, though, is that you can have some more interesting control.  In the latter version, each `if` is evaluated, whereas in the former, we only hit an `elif` when the things above it have failed.  This gives some nice flexibility.  We also get to have a catch-all `else` to make sure that something happens each time, no matter what.
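
To see why the `elif` ordering matters, here's a sketch that wraps the chain in a function and compares it to the same conditions written as independent `if` statements. The function names `mode_elif()` and `mode_ifs()` are made up for this illustration.

```python
def mode_elif(size):
    # Only the first matching branch runs
    if size > 100:
        mode = 1
    elif size < 50:
        mode = 2
    elif size < 70:
        mode = 3
    else:
        mode = 4
    return mode

def mode_ifs(size):
    # Same conditions, but every `if` is checked, so later ones can overwrite
    mode = 4
    if size > 100:
        mode = 1
    if size < 50:
        mode = 2
    if size < 70:
        mode = 3
    return mode

for size in [110, 40, 60, 80]:
    print(size, mode_elif(size), mode_ifs(size))
# 110 1 1
# 40 2 3   <- 40 is also < 70, so the third `if` overwrites mode
# 60 3 3
# 80 4 4
```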


## While loops

Want to keep doing something over and over until a condition is satisfied?  That's a while-loop.


```
converged=False
while not converged:
    converged=fit_data(data)
```


Here, we set a flag, _converged_, to be `False` and so when we hit this `while` loop and we evaluate _not converged_ that evaluates to `True` so we enter the loop.  We keep calling `fit_data()` assigning its output to _converged_.  We do this as many times as it takes to have it return `True`.  Once it's `True`, _not converged_ is false, so we don't enter the loop again.

The astute among you may have noticed a potential problem.  What if `fit_data()` never returns `True`? We have an infinite loop and the program will hang.  

You can get around this in a few ways demo'ed here:


```
converged = False
count = 0
while not converged:
    converged = fit_data()
    count += 1
    if count >= 10:
        break    
print ('ran {0} times'.format(count))
  
converged = False
count = 0
while (not converged) and (count < 10):
    converged = fit_data()
    count += 1  
print ('ran {0} times'.format(count))
```


In the first one, we make use of the `break` statement.  `break` will bail out of the innermost loop you're in - in this case, jumping to that `print` line.  In the second version, we have two checks on entering the loop each time -- if we've not converged and if we've not run it 10 times.  If both of those are true (the `and` operator), we enter the loop.
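
If you'd like to actually run a capped loop like this, you need some `fit_data()` to call. The sketch below fakes one: `make_fit_data()` and `run_capped()` are made-up helpers, and the fake fit simply starts returning `True` on a chosen call number.

```python
import itertools

def make_fit_data(converges_on_call):
    """Build a fake fit_data() that returns True from the Nth call onward."""
    calls = itertools.count(1)
    def fit_data():
        return next(calls) >= converges_on_call
    return fit_data

def run_capped(fit_data, max_tries=10):
    """Loop until fit_data() reports convergence or we hit max_tries."""
    converged = False
    count = 0
    while (not converged) and (count < max_tries):
        converged = fit_data()
        count += 1
    return converged, count

print(run_capped(make_fit_data(3)))    # (True, 3)   converged on the 3rd call
print(run_capped(make_fit_data(50)))   # (False, 10) gave up after 10 tries
```

Returning both the flag and the count lets the caller tell "converged" apart from "gave up", which the plain `print` of the count can't do.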

Note, Python has a syntax you don't see very often:


```
while CONDITION:
    stuff
else:
    other_stuff
```


In this case, _other_stuff_ gets executed when the _condition_ evaluates to `False`.  That happens either when we never enter the loop (_condition_ starts off as `False`) or when the loop finishes normally.  Note, leaving the loop with a `break` will skip the `else`, as the _condition_ never went `False`.
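
Here's a small sketch of all three cases. The function `find_first_multiple()` is made up for illustration; it records which path the loop took.

```python
def find_first_multiple(of, below):
    """Look for the first multiple of `of` among 1 .. below-1."""
    n = 1
    while n < below:
        if n % of == 0:
            outcome = 'break at {0}, else skipped'.format(n)
            break
        n += 1
    else:
        outcome = 'condition went False, else ran'
    return outcome

print(find_first_multiple(3, 10))  # break at 3, else skipped
print(find_first_multiple(7, 5))   # condition went False, else ran
print(find_first_multiple(7, 1))   # condition went False, else ran (loop never entered)
```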


## For loops

If you want to go through a loop N times or iterate over every element of something like a list, a `for` loop is your answer.  For example:


```
for animal in ['dingo','puma','okapi']:
    print (animal)
```


This will print out 'dingo', 'puma', and 'okapi' in that order. Likewise, both of these will do the same thing:


```
animals = ['dingo','puma','okapi']
for animal in animals:
    print (animal)

animals = ['dingo','puma','okapi']
for i in range(len(animals)):
    print (animals[i])
```


With for loops, we have a few commands we've seen before and one new one: 



*   `break` - bail out of the loop (just like with _while_ loops)
*   `else` - similar to the while-loop - this gets done when the loop is exhausted
*   `continue` - jump to next iteration
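
As a sketch of how these three behave together (the lists `kept` and `visited` are just names made up for this example):

```python
animals = ['dingo', 'puma', 'okapi', 'llama']

kept = []
for animal in animals:
    if animal == 'puma':
        continue            # skip the rest of this iteration
    if animal == 'llama':
        break               # leave the loop entirely; the else is skipped
    kept.append(animal)
else:
    kept.append('else ran')
print(kept)     # ['dingo', 'okapi']

visited = []
for animal in animals:
    visited.append(animal)
else:
    visited.append('else ran')   # no break happened, so this runs
print(visited)  # ['dingo', 'puma', 'okapi', 'llama', 'else ran']
```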

There's one more handy format in for-loop syntax to know.  While being very Pythonic has you avoid things like `for i in range...`, in scientific work, we often need to keep track of an index.  Python lets you have the best of both worlds, getting the index and the item itself in one loop:


```
animals = ['dingo','puma','okapi']
for ind,animal in enumerate(animals):
   print (ind,animal,animals[ind])
```

## pass

Sometimes, you just want to do nothing, amIright?  Just nothing... But, in Python, you'll at times need to have a line of code somewhere to follow the syntax, acting as if you're doing something.  But again, sometimes you want to do nothing.  The solution?

```
pass
```

The `pass` command will do just that.  It's a placeholder for when you need to say you're doing something but really want to do nothing.
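
Two typical spots for it, as a sketch (the function name `todo_later()` and the list `cleaned` are made up here): a function you haven't written yet, and a branch where you deliberately do nothing.

```python
def todo_later():
    pass  # placeholder body so the def is valid; calling it returns None

cleaned = []
for value in [1, -999, 3]:
    if value == -999:
        pass  # deliberately ignore the bad-value marker
    else:
        cleaned.append(value)

print(todo_later(), cleaned)  # None [1, 3]
```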



***In the space below,*** write a for-loop or while loop that takes `i` going from 1-10 (inclusive) and:
- If `i` < 5, print "small" and add 1 to `sum` 
- If `i` is between 5 and 7 (inclusive), print "medium" and add 2 to `sum`
- If `i` is between 8 and 10 (inclusive), print "large" and add 3 to `sum`

In [6]:
sum=0
# Your code
for i in range(1,11):
    if i < 5:
        print("small")
        sum +=1 
    elif 5 <= i <= 7:
        print("medium")
        sum += 2
    elif 8 <= i <= 10:
        print("large")
        sum += 3


print(sum)

small
small
small
small
medium
medium
medium
large
large
large
19


# List comprehensions
Let's say you've got a list that contains some data, but bad values in the data have been marked as -999.  Good data are anything other than -999 and we want to filter out those -999 values and have a list with just the good data. A new-to-Python programmer might write code that looks like this:
```
data=[1,8,0,-999,10,4,-999,-2,12]
gooddata=[]
for elem in data:
    if elem != -999:
        gooddata.append(elem)
``` 

It works, sure. But, it's not very Pythonic, and an explicit loop with repeated `append()` calls is usually slower than the alternative.  Python has a concept called a *list comprehension* that is very powerful and that you'll see a ton.  It can be a bit awkward to unpack until you get the hang of it, but here's that same code using a comprehension:
```
data=[1,8,0,-999,10,4,-999,-2,12]
gooddata=[ elem for elem in data if elem != -999 ]
```

Let's unpack that a bit.  The general syntax is: `newlist = [expression for item in iterable if condition ]`.  So, we're saying:
- Loop over `data` and each time in the loop, set `elem` to be the current value in `data`.  Aka, `for elem in data:`
- Inside the loop, check to see if `elem` isn't -999. Aka, `if elem != -999`
- If that evaluates to `True`, make the list you're building up take on the value `elem` (the initial [ *elem* for ...] part)

Here, we just built up the list based on the actual value of `elem`. But, that's not at all necessary.  Let's filter the list and make each element 2x the original:

```
gooddata=[ 2*elem for elem in data if elem != -999 ]
```

Note, if you want things to happen to every element, you can drop the `if`:

```
gooddata=[ 2*elem for elem in data]
```

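You can also have an `if`/`else` in your "expression" part (a conditional expression).  Note that it goes before the `for`, unlike the filtering `if`, which goes at the end.  Here, we turn the -999 into 'NA':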

```
gooddata=[ elem if elem != -999 else "NA" for elem in data ]
```
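
To see what each of these variants actually produces, here's a runnable sketch that uses a separate, made-up variable name for each result instead of reusing `gooddata`:

```python
data = [1, 8, 0, -999, 10, 4, -999, -2, 12]

filtered = [elem for elem in data if elem != -999]
doubled_filtered = [2 * elem for elem in data if elem != -999]
doubled_all = [2 * elem for elem in data]
with_na = [elem if elem != -999 else "NA" for elem in data]

print(filtered)          # [1, 8, 0, 10, 4, -2, 12]
print(doubled_filtered)  # [2, 16, 0, 20, 8, -4, 24]
print(doubled_all)       # [2, 16, 0, -1998, 20, 8, -1998, -4, 24]
print(with_na)           # [1, 8, 0, 'NA', 10, 4, 'NA', -2, 12]
```

Notice that the filtering `if` changes the length of the list, while the `if`/`else` in the expression part keeps every element and only changes what gets stored.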


***In the space below,*** write a list comprehension that uses `abs` to make a new list that has the absolute value of each element. So, the output should show `[1, 8, 0, 999, 10, 4, 999, 2, 12]`

In [9]:
data=[1,8,0,-999,10,4,-999,-2,12]
absdata= [abs(elem) for elem in data]
print(absdata)

[1, 8, 0, 999, 10, 4, 999, 2, 12]


***In the space below,*** write a single list comprehension that uses `abs` to make a new list that has the absolute value of each element and also removes all the -999 values. So, the output should show `[1, 8, 0, 10, 4, 2, 12]`

In [10]:
data=[1,8,0,-999,10,4,-999,-2,12]
absdata= [abs(elem) for elem in data if elem!= -999]
print(absdata)

[1, 8, 0, 10, 4, 2, 12]



# Raising errors

Every time you write a function, you should really check to make sure the inputs are valid.  If they're not, you should raise an error.  This allows other bits of code (be it yours or someone else's) to know that things didn't go as planned.  For example:


```
import numpy as np

def MyFunc(data):
    """Remember to write your description"""
    # isinstance() also accepts subclasses of np.ndarray
    if not isinstance(data, np.ndarray):
        raise TypeError('Must pass in np.array')
    if data.ndim != 1:
        raise TypeError('Must be a 1D array')
    if len(data) < 2:
        raise ValueError('Must be at least 2 elements long')


The two kinds of errors we raise here, _TypeError_ and _ValueError_, are the most common we'll use.  There are a number more, but for most of your checking, these will suffice.

You may be wondering at this point, why go through all this?  Why not just print an error out to the screen?  Well, if your program is running for hours and generating thousands of lines of output on the screen or in a log file, are you really going to see that every time? Also, other parts of your code can check to see if these errors have been raised and either bail, or handle them gracefully.   For example:


```
try:
    SuperCoolComplexFilter(data)
except Exception:
    print("badness happened - using the simpler filter")
    SimplerReliableFilter(data)
...
```


Here, the program would try to use the `SuperCoolComplexFilter()`, but if that had an issue and did a `raise`, the exception would get passed up and the `try` block here would stop at that point. Control then goes to the `except`, where we alert the user (via the print) and fall back to some simpler processing of the data.
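
Here's a runnable sketch of raising and catching together. To keep it free of dependencies, it checks for a plain list rather than a NumPy array, and `check_data()` and `describe()` are made-up names. It also catches each error type by name, so the caller can react differently to each one.

```python
def check_data(data):
    """Raise an error if data isn't a list with at least 2 elements."""
    if not isinstance(data, list):
        raise TypeError('Must pass in a list')
    if len(data) < 2:
        raise ValueError('Must be at least 2 elements long')

def describe(data):
    """Report what happened when validating data."""
    try:
        check_data(data)
    except TypeError as err:
        return 'TypeError: {0}'.format(err)
    except ValueError as err:
        return 'ValueError: {0}'.format(err)
    return 'ok'

print(describe('abc'))    # TypeError: Must pass in a list
print(describe([1]))      # ValueError: Must be at least 2 elements long
print(describe([1, 2]))   # ok
```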

# Static typing
Back in Class 1.2, we mentioned that Python is _dynamically typed_. That is, the type that a variable holds can change on the fly. It's a big part of what makes Python so easy to use. You don't need to pre-declare what type of info a variable holds and you can change this later on without issue. Great!  What's the downside?

The downside is that while it makes some things easier, it makes some things tougher for you as the programmer.  By now, you should know that strings have methods like `upper()`. It sure would be nice if your Python environment were smart enough to know that your variable is a string, so that when you type the variable name and then a period, all the available methods would appear, wouldn't it?

Python gives you the ability to do this via [type hints](https://mypy.readthedocs.io/en/stable/cheat_sheet_py3.html). One huge thing to note is that **Python itself doesn't care about these hints**. They are only there for your working environment. They're here to help you and do not affect the actual running of the code at all.  Here's a simple example:

In [2]:
foo='Llamas are bigger than frogs'
foo2: str='The llama is a quadruped which lives in big rivers like the Amazon'
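
As a quick check that Python itself really ignores the hints when running, this sketch (with made-up names `count` and `double`) deliberately breaks its own hints. It runs without complaint; only a type checker in your editor would flag it.

```python
count: int = 'definitely not an int'   # hint says int, value is a str
print(type(count).__name__)            # str

def double(x: int) -> int:
    return x * 2

print(double(4))      # 8
print(double('ab'))   # abab -- no error, the hint isn't enforced

# The hints are still stored on the function for tools to read
print(double.__annotations__['x'].__name__)  # int
```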

Now, it can get a bit more useful still. You can also say what a function should take as input and what to expect it to return, and you can use this to help debug your code. Look at the cell below.  Do you see errors highlighted with a red underline?  If not, you'll need to enable the feature. For this to work in VSCode, you'll need to make sure that the Pylance extension is installed (probably is) and you'll need to make sure type checking is enabled. For that, go to your settings (Ctrl-, or Cmd-,) and start typing "typecheck".  You should see an entry there that may be set to "off".  Set it to at least "basic".

In [1]:
def foo(a: int, b: float, c: str) -> str:
    return f'{a} {b} {c}'

def foo2(a) -> int:
    b: float=float(a)
    return b

s1=foo(1,2.2,'Nobody')
s2: str = foo(1,2.2,'expects')
s3: int = foo(1,2.2,'the')
s4 = foo('Spanish',1,2.2)

def foo3(a) -> float:
    b: int=int(a)
    return b


It's flagging a few errors.  First, `foo2()` says it returns an int, but actually returns a float.  So, that return line highlights an error.

Second, s1 and s2 are fine, but s3 and s4 aren't.  ***Why***?

Finally, foo3() doesn't flag an error but looks a lot like foo2().  ***Why not?***  For a hint, why is it that in s4, only the first and third arguments are bad?