# Basic Python Programming

## Table of Contents

* [Task 1: Compute the Mean](#Task-1:-Compute-the-Mean)
    * [Description of the Mean](#Description-of-the-Mean)
    * [Summing a List of Numbers](#Summing-a-List-of-Numbers)
        * [For Loop With an Index](#For-Loop-With-an-Index)
        * [For Loop Without an Index](#For-Loop-Without-an-Index)
        * [Python's Built-In Sum](#Python's-Built-In-Sum)
    * [Your Mean Function](#Your-Mean-Function)
* [Task 2: Compute the Variance](#Task-2:-Compute-the-Variance)
    * [Description of Variance](#Description-of-Variance)
    * [Creating a New List From an Old One](#Creating-a-New-List-From-an-Old-One)
        * [Creating a New List With a For Loop](#Creating-a-New-List-With-a-For-Loop)
        * [Creating a New List With a List Comprehension](#Creating-a-New-List-With-a-List-Comprehension)
    * [Your Variance Function](#Your-Variance-Function)
* [Task 3: Compute the Skewness](#Task-3:-Compute-the-Skewness)
    * [Description of Skewness](#Description-of-Skewness)
    * [Your Skewness Function](#Your-Skewness-Function)
* [Task 4: Compute Arbitrary Central Moments](#Task-4:-Compute-Arbitrary-Central-Moments)
    * [Description of Central Moments](#Description-of-Central-Moments)
    * [Your Central Moment Function](#Your-Central-Moment-Function)
* [Bonus Task 4.1: A Function to Make Central Moment Functions](#Bonus-Task-4.1:-A-Function-to-Make-Central-Moment-Functions)
    * [Example of a Function-Returning Function](#Example-of-a-Function-Returning-Function)
    * [Your Central Moment Function Function](#Your-Central-Moment-Function-Function)
* [Task 5: Assemble Your Statistics Module](#Task-5:-Assemble-Your-Statistics-Module)
* [Summary](#Summary)

## Task 1: Compute the Mean

### Description of the Mean

Consider the dataset of numbers $X = (1, 4, 3, -5, 2)$. For convenience we refer to the individual values with a subscript, counting from 0, so $x_0 = 1$ and $x_3 = -5$. The [mean](https://en.wikipedia.org/wiki/Expected_value) of a dataset, written $\overline{x}$, $\mu$, or $\text{E}(X)$, is one way to measure its centre (the other common ways being the median and the mode). To compute the mean we add up the numbers in the dataset and divide by how many numbers there are. That is

$$
\mu = \frac{1+4+3-5+2}{5} = 1 .
$$

For a general dataset $(x_0, x_1, \dots, x_{n-1})$ of $n$ numbers, the mean is

$$
\mu = \frac{x_0+x_1+\dots+x_{n-1}}{n}.
$$

We can write this a bit more compactly in big-sigma notation, as

$$
\mu = \frac{1}{n} \sum_{i=0}^{n-1} x_i .
$$

You will write a function that computes the mean of a list of numbers. However, first we must work out how to sum a list of numbers. We'll use the list `[1, 4, 3, -5, 2]` for experiments.

In [None]:
data = [1, 4, 3, -5, 2]

### Summing a List of Numbers

#### For Loop With an Index

First I'll write some pseudocode for summing a list. Remember, pseudocode is just a rough statement of the steps, not real Python code.

```
For each index, i, from 0 to the last
    Add the number at index i to the sum
```

There are three parts to this: iterating over all indices from 0 to $n-1$ (where $n$ is the length of the list), looking up the number at each index, and adding that number to a sum variable.

Python's `range` function will generate the indices from 0 to $n-1$, but how do I get $n$? $n$ is the number of numbers in the list, which is returned by Python's `len` function. So `range(len(data))` produces the indices `0, 1, 2, ..., n-1` (remember that `range` stops just before the ending value, so `n` itself is not included).

`i` is the usual name for an index, so I can iterate over the indices of the list with the line

```python
for i in range(len(data)):
```

Inside the `for` loop we have to look up the number at index `i` and add it to the sum. `sum` is the name of a built-in Python function, so I'll name the sum `s` to avoid hiding it. To look up something in a list in Python (and most programming languages) you use square brackets, like `data[i]`. To add something to a variable you use the `+=` operator. The `for` loop will then look like

```python
for i in range(len(data)):
    s += data[i]
```

However, if you run this code, you get a `NameError`.

In [None]:
for i in range(len(data)):
    s += data[i]

I've told Python to add to `s`, but I didn't give it a value to start with (called "initializing"). That's easily fixed by assigning 0 to `s`.

In [None]:
s = 0

for i in range(len(data)):
    s += data[i]
    
print(s)

Next, I can wrap this in a function which takes a list and returns the sum. Again, `sum` is a built-in name in Python, so I'll call my function `my_sum`.

In [None]:
def my_sum(lst):
    """Add up the values in a list.
    
    Arguments:
        lst: A list of numbers.
        
    Returns:
        The sum of the numbers in the list.
    """
    s = 0
    
    for i in range(len(lst)):
        s += lst[i]
        
    return s

`list` is another built-in name in Python, so `lst` is commonly used to name a list variable (but only when a more specific name isn't appropriate!).

Notice that I've included a docstring. Every function you ever write should have a docstring. Yours don't have to be as formal as mine, but a stranger should be able to read it and understand what your function does. Remember that you can read a function's docstring with `help` or `?`.

In [None]:
help(my_sum)

In [None]:
my_sum?

Finally, I will test my function. Many books have been written on how to test code, but I'll make do by checking a few different lists with sums I know.

In [None]:
my_sum(data)

In [None]:
my_sum([1] * 10)

In [None]:
my_sum(range(10))

In [None]:
my_sum([-1.1, 2.2, 3.3])

#### For Loop Without an Index

The previous code works perfectly well, but Python provides an easier way to iterate over a list. Notice that I didn't really care about the index `i` in my code. I only used `i` to get a value from the list, and Python's `for` loop lets me skip that middle step. I can iterate over the values in the list directly like so

```python
for value in data:
```

I can now rewrite my earlier loop a bit more simply.

In [None]:
s = 0

for value in data:
    s += value
    
print(s)

Notice that the `for` loop knows when to stop, so I can't get the final index of the list wrong.
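If you ever need both the index and the value, the built-in `enumerate` function hands you both at once, so you still don't have to manage the index yourself. A small sketch:

```python
data = [1, 4, 3, -5, 2]

# enumerate yields (index, value) pairs, starting from index 0.
pairs = []
for i, value in enumerate(data):
    pairs.append((i, value))

print(pairs)  # [(0, 1), (1, 4), (2, 3), (3, -5), (4, 2)]
```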

I can rewrite my function as well.

In [None]:
def my_sum(lst):
    """Add up the values in a list.
    
    Arguments:
        lst: A list of numbers.
        
    Returns:
        The sum of the numbers in the list.
    """
    s = 0
    
    for value in lst:
        s += value
        
    return s

I'll run a few tests, just to make sure I *actually* didn't change anything.

In [None]:
my_sum(data)

In [None]:
my_sum([-1.1, 2.2, 3.3])

#### Python's Built-In Sum

Summing a list is such a common task that Python has a built-in function for it: `sum`.

In [None]:
sum(data)

In [None]:
sum([-1.1, 2.2, 3.3])

Notice that `sum` is coloured green in a code cell. Jupyter notebooks highlight Python's built-in names (and its keywords) this way, which is a handy warning that a name is already taken.

In [None]:
sum, list, abs, min, int
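Here is why that warning matters. The sketch below deliberately assigns to `sum`, which hides the built-in function in the current namespace until the assignment is removed with `del`. It also uses the standard `builtins` module to check whether a name is already taken.

```python
import builtins

# Every built-in function lives in the standard `builtins` module.
print(hasattr(builtins, "sum"))  # True: `sum` is taken
print(hasattr(builtins, "lst"))  # False: `lst` is free to use

sum = 0  # a bad idea: this hides the built-in `sum`
try:
    sum([1, 2, 3])
except TypeError as err:
    print("TypeError:", err)     # 'int' object is not callable

del sum  # remove our variable, so the built-in is visible again
print(sum([1, 2, 3]))            # 6
```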

### Your Mean Function

Now you will write a function to calculate the mean of a list of numbers. You can write your own code to sum the list, or use Python's built-in `sum` function. I have provided a template to fill in, as well as some tests.

In [None]:
def mean(lst):
    """docstring"""
    return 0

In [None]:
# Correct value is 1.
mean(data)

In [None]:
# Correct value is 2.
mean([1, 2, 3])

In [None]:
# Correct value is 1.
mean([1, 1, 1])

In [None]:
# Correct value is -3.
mean([-2, -1, -6])

## Task 2: Compute the Variance

### Description of Variance

The [variance](https://en.wikipedia.org/wiki/Variance) of a dataset, written $\sigma^2$ or $\text{Var}(X)$, is a measure of how spread out its values are. It is never negative, and a larger variance means the data are, on the whole, further away from the mean. To compute the variance we first compute how far each value is from the mean, square those distances, then average the squares. Our example data, $X = (1, 4, 3, -5, 2)$, has mean 1, so this looks like

$$
\sigma^2 = \frac{(1-1)^2 + (4-1)^2 + (3-1)^2 + (-5-1)^2 + (2-1)^2}{5} = 10 .
$$

For a general dataset with mean $\mu$ this becomes

$$
\sigma^2 = \frac{(x_0-\mu)^2 + (x_1-\mu)^2 + \dots + (x_{n-1}-\mu)^2}{n} ,
$$

or, in big-sigma notation,

$$
\sigma^2 = \frac{1}{n} \sum_{i=0}^{n-1} (x_i - \mu)^2 .
$$

Note that this is the *biased* (or population) variance. There are several slightly different forms of variance, the next most common being the unbiased (or sample) variance, which divides by $n-1$ instead of $n$.
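To see the difference on the example data: both forms start from the same sum of squared deviations, 50, and only the divisor changes. Writing the unbiased variance as $s^2$,

$$
s^2 = \frac{1}{n-1} \sum_{i=0}^{n-1} (x_i - \mu)^2 = \frac{50}{4} = 12.5 , \qquad \sigma^2 = \frac{1}{n} \sum_{i=0}^{n-1} (x_i - \mu)^2 = \frac{50}{5} = 10 .
$$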

The variance is a bit more complicated to compute. You still need to sum a list, but not the list you're given. Instead you must build a new list by subtracting the mean from each value and squaring the result.

### Creating a New List From an Old One

#### Creating a New List With a For Loop

I can do this with an explicit `for` loop. Consider the following pseudocode.

```
Compute the mean of the list
Create a new empty list
For each value in the data list
    Subtract the mean from the value
    Square the result
    Append the result to the new list
```

I've already used the name `mean` for a function, so I'll name this variable `data_mean`. Since the `mean` function isn't filled in as I'm writing this, I'll fill in the value by hand.

In [None]:
data_mean = 1
modified_data = []

for value in data:
    modified_data.append((value - data_mean)**2)
    
modified_data

#### Creating a New List With a List Comprehension

Python makes working with lists easy. I can simplify my code using a *list comprehension*. I specify what to do with each element, then where the elements are coming from. A list comprehension looks like

```python
[code_to_compute_new_value for name_for_old_value in name_of_list]
```

My code can be rewritten as

In [None]:
data_mean = 1
modified_data = [(value - data_mean)**2 for value in data]
modified_data
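As a quick sanity check, the sketch below builds the list both ways and compares them. It also shows the optional `if` clause a comprehension can take, which keeps only the values that pass a test.

```python
data = [1, 4, 3, -5, 2]
data_mean = 1  # filled in by hand, as above

# The loop version and the comprehension version build the same list.
looped = []
for value in data:
    looped.append((value - data_mean)**2)
comprehended = [(value - data_mean)**2 for value in data]
print(looped == comprehended)  # True

# An `if` clause at the end keeps only the values that pass the test.
above_mean = [value for value in data if value > data_mean]
print(above_mean)  # [4, 3, 2]
```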

### Your Variance Function

Now you will write a function to calculate the variance of a list of numbers. Use the `mean` function you wrote earlier. I have provided a template to fill in, as well as some tests.

In [None]:
def variance(lst):
    """docstring"""
    return 0

In [None]:
# Correct value is 10.
variance(data)

In [None]:
# Correct value is 0.
variance([1, 1, 1])

In [None]:
# Correct value is 2.0.
variance([0, 1, 2, 3, 4])

## Task 3: Compute the Skewness

### Description of Skewness

The [skewness](https://en.wikipedia.org/wiki/Skewness) of a dataset, written $\gamma_1$, is a measure of how asymmetric the values are about the mean. Negative skew means the data have a longer tail to the left of the mean (so the bulk of the data usually sits to the right), positive skew means a longer tail to the right, and data that are symmetric about the mean have zero skew. Skewness is computed similarly to variance, but with a cube instead of a square, and each deviation from the mean is divided by the standard deviation to normalize it (remember that the standard deviation $\sigma$ is the square root of the variance). In big-sigma notation, this is

$$
\gamma_1 = \frac{1}{n} \sum_{i=0}^{n-1} \left( \frac{x_i - \mu}{\sigma} \right)^3 .
$$

Our example data, $X = (1, 4, 3, -5, 2)$, has mean 1 and variance 10, so this looks like

$$
\gamma_1 = \frac{1}{5} \left( \left( \frac{1-1}{\sqrt{10}} \right)^3 + \left( \frac{4-1}{\sqrt{10}} \right)^3 + \left( \frac{3-1}{\sqrt{10}} \right)^3 + \left( \frac{-5-1}{\sqrt{10}} \right)^3 + \left( \frac{2-1}{\sqrt{10}} \right)^3 \right) = \frac{-180}{50 \sqrt{10}} \approx -1.1384.
$$
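The arithmetic can be checked directly in Python. This sketch hard-codes the deviations $x_i - \mu$ from the worked example and $\sigma = \sqrt{10}$, so it only checks these particular numbers rather than computing skewness in general (that's your job below).

```python
import math

# Deviations x_i - mu for X = (1, 4, 3, -5, 2), whose mean is 1.
deviations = [0, 3, 2, -6, 1]
sigma = math.sqrt(10)  # standard deviation = square root of the variance, 10

gamma_1 = sum((d / sigma) ** 3 for d in deviations) / len(deviations)
closed_form = -180 / (50 * math.sqrt(10))

print(gamma_1)      # approximately -1.1384
print(closed_form)  # approximately -1.1384
```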

Once again there is a distinction between biased and unbiased skewness, but we won't get into it here.

### Your Skewness Function

You will write a function to compute the skewness of a list of numbers. Use the `mean` and `variance` functions you wrote earlier.

In [None]:
def skewness(lst):
    """docstring"""
    return 0

In [None]:
# Correct value is -1.1384199576606167.
skewness(data)

In [None]:
# Correct value is 0.
skewness([1,2,3])

In [None]:
# Correct value is approximately 1.4783.
skewness([-1, 2, 6, -3, 100])

## Task 4: Compute Arbitrary Central Moments

### Description of Central Moments

Notice that the formulas for variance and skewness both contain something like $\frac{1}{n} \sum (x_i-\mu)^k$. These are called the [central moments](https://en.wikipedia.org/wiki/Central_moment) of the distribution, and they show up a lot. The $k$th central moment, written $\mu_k$, is

$$
\mu_k = \frac{1}{n} \sum_{i=0}^{n-1} (x_i - \mu)^k .
$$

With this definition, the variance is the second central moment, $\sigma^2 = \mu_2$, and the skewness is $\gamma_1 = \mu_3 / \mu_2^{3/2}$. Many important statistical values have very simple formulas in terms of central moments.
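As a check on these formulas, pulling the constant $\sigma = \mu_2^{1/2}$ out of the skewness sum gives the moment form of $\gamma_1$, and the same kind of step shows that the first central moment is always zero. For the example data, $\mu_2 = 10$ and $\mu_3 = -36$.

$$
\gamma_1 = \frac{1}{n} \sum_{i=0}^{n-1} \left( \frac{x_i - \mu}{\sigma} \right)^3
= \frac{1}{\sigma^3} \cdot \frac{1}{n} \sum_{i=0}^{n-1} (x_i - \mu)^3
= \frac{\mu_3}{\mu_2^{3/2}}
= \frac{-36}{10^{3/2}} \approx -1.1384 ,
$$

$$
\mu_1 = \frac{1}{n} \sum_{i=0}^{n-1} (x_i - \mu) = \frac{1}{n} \sum_{i=0}^{n-1} x_i - \frac{1}{n} \cdot n\mu = \mu - \mu = 0 .
$$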

### Your Central Moment Function

You will write a function to compute the $k$th central moment of a list of numbers, for any $k$. Your function doesn't need to check that $k$ is a valid power just yet. Use the `mean` function you wrote earlier.

In [None]:
def central_moment(lst, k):
    """docstring"""
    return 0

In [None]:
# Correct value is 0.
central_moment(data, 1)

In [None]:
# Correct value is 10.
central_moment(data, 2)

In [None]:
# Correct value is -36.
central_moment(data, 3)

## Bonus Task 4.1: A Function to Make Central Moment Functions

### Example of a Function-Returning Function

A Python function doesn't have to return a simple value like `3` or `'asdf'`. A function can create and return a function. For example, the following function returns a function which appends a string to its input.

In [None]:
def appender_factory(suffix):
    def appender(string):
        return string + suffix
    
    return appender

The function `appender` is defined inside the body of `appender_factory` and returned. The name `appender` isn't visible outside of `appender_factory`, but you can assign the returned function to a name you choose and use it like any other function.

In [None]:
append_asdf = appender_factory('asdf')

`append_asdf` is a function which appends `'asdf'` to a string.

In [None]:
append_asdf('qwer')

You can create any number of different appender functions.

In [None]:
append_foo = appender_factory('foo')
append_foo('qwer')

In [None]:
append_asdf(append_foo('qwer'))

You don't technically need to assign the returned function before you use it. This is almost certainly a bad idea, though. Readability counts!

In [None]:
appender_factory('the very model of a modern Major General')('I am the ')
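The same pattern works for numbers. In this sketch, `power_factory` (a hypothetical name, not part of the assignment) takes a number `k` and returns a function that raises its input to the `k`th power. Each returned function remembers the `k` it was created with, which is what makes it a *closure*.

```python
def power_factory(k):
    """Return a function that raises its argument to the k-th power."""
    def power(x):
        # `k` is remembered from the power_factory call that created this function.
        return x ** k

    return power

square = power_factory(2)
cube = power_factory(3)

print(square(4))                       # 16
print(cube(-2))                        # -8
print([square(x) for x in [1, 2, 3]])  # [1, 4, 9]
```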

### Your Central Moment Function Function

You will write a function which creates a function which computes the $k$th central moment of a list, for any $k$.

In [None]:
def central_moment_factory(k):
    """docstring"""
    def central_moment(lst):
        return 1
    
    return central_moment

With $k = 2$, your factory should return a function which does exactly the same thing as your `variance` function, and with $k = 3$ it should return a function that, combined with the variance, reproduces your `skewness` function. The tests assign the returned functions to names before using them, for readability.

In [None]:
# Correct value is True.
variance_from_factory = central_moment_factory(2)
variance_from_factory(data) == variance(data)

In [None]:
# Correct value is True.
import math

third_moment = central_moment_factory(3)

def skewness_from_factory(lst):
    return third_moment(lst) / variance_from_factory(lst)**(3/2)

# Compare with a tolerance, since the two formulas can round differently.
math.isclose(skewness_from_factory(data), skewness(data))

## Task 5: Assemble Your Statistics Module

Now that you have a few general-purpose functions, you should package them up into a module. A module can be imported into any other Python code, so you don't have to copy and paste your code.

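You will first create a file called `statistics.py` in the same directory as this notebook and copy your `mean`, `variance`, `skewness`, and `central_moment` function definitions (and `central_moment_factory` if you wrote one) into it. You can do this by creating a new text file and changing its name. Creating a new Python 3 file won't work because that creates a Python 3 Jupyter notebook. Be aware that Python's standard library also has a module called `statistics` (its `variance` divides by $n-1$, and it has no `skewness` function). If `import statistics` picks up the standard library module instead of your file, `statistics.skewness` will raise an `AttributeError`; in that case, restart the kernel, or give your file a more distinctive name and import it under that name.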

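Next you will create an empty file called `__init__.py`. Strictly speaking, this file marks its directory as a Python *package*; you don't need it to import a single module from the notebook's own directory, but it does no harm and becomes useful if you later turn this directory into a package. You can do more with it, but that's beyond the scope of this assignment.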

You can import the functions from a module with

In [None]:
import statistics

The functions in `statistics.py` are now accessible through the name `statistics`.

In [None]:
statistics.variance(data)

In [None]:
statistics.skewness(data)

You have to be a bit careful with modules and Jupyter notebooks. Once you've imported a module, running `import` again won't reload it, so the notebook won't notice if you change the module's code. If you're working on a module and changes aren't showing up in a notebook, restart the notebook's kernel or reload the module with `importlib.reload`.
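If you'd rather not restart the kernel, the standard library's `importlib.reload` re-runs an already-imported module's code. The sketch below demonstrates it on a throwaway module (`demo_stats` is a hypothetical name) written to a temporary directory; with your own module you would save your changes and then run `importlib.reload(statistics)`.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Work in a throwaway directory so none of your real files are touched.
module_dir = Path(tempfile.mkdtemp())
module_file = module_dir / "demo_stats.py"  # hypothetical module name
sys.path.insert(0, str(module_dir))

# Write a first version of the module and import it.
module_file.write_text("def answer():\n    return 1\n")
import demo_stats
print(demo_stats.answer())  # 1

# Change the file. Running `import demo_stats` again would NOT pick this up,
# because the module is already cached in sys.modules.
module_file.write_text("def answer():\n    return 2  # edited\n")
importlib.reload(demo_stats)
print(demo_stats.answer())  # 2
```

One caveat: names you pulled in with `from statistics import variance` are not updated by a reload, so re-run that import too (or refer to the function as `statistics.variance`).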

## Summary

By the end of this assignment you should be comfortable with:

* iterating over a list with a `for` loop,
* creating a list with a list comprehension,
* writing a simple function to compute a value,
* writing docstrings, and
* putting code into a reusable module.