# Basic Python Programming

## Table of Contents

* [Task 1: Compute the Mean](#Task-1:-Compute-the-Mean)
    * [Description of the Mean](#Description-of-the-Mean)
    * [Summing a List of Numbers](#Summing-a-List-of-Numbers)
        * [For Loop With an Index](#For-Loop-With-an-Index)
        * [For Loop Without an Index](#For-Loop-Without-an-Index)
        * [Python Built-In](#Python-Built-In)
    * [Your Mean Function](#Your-Mean-Function)
* [Task 2: Compute the Variance](#Task-2:-Compute-the-Variance)
    * [Description of Variance](#Description-of-Variance)
* [Task 3: Compute the Skewness](#Task-3:-Compute-the-Skewness)
* [Task 4: Compute Arbitrary Central Moments](#Task-4:-Compute-Arbitrary-Central-Moments)
* [Task 5: Assemble Your Statistics Module](#Task-5:-Assemble-Your-Statistics-Module)

## Task 1: Compute the Mean

### Description of the Mean

Consider the dataset of numbers $X = (1, 4, 3, -5, 2)$. For convenience we refer to the individual values with a subscript, so $x_0 = 1$ and $x_3 = -5$. The mean of a dataset, written $\overline{x}$ or $\mu$, is one way to measure its centre (the other common ways being the median and the mode). To compute the mean we add up the numbers in the set and divide by how many numbers the set contains. That is

$$
\overline{x} = \frac{1+4+3-5+2}{5} = 1 .
$$

For a general dataset $(x_0, x_1, \dots, x_{n-1})$, the mean is

$$
\overline{x} = \frac{x_0+x_1+\dots+x_{n-1}}{n}.
$$

We can write this a bit more compactly with big-sigma notation, like

$$
\overline{x} = \frac{1}{n} \sum_{i=0}^{n-1} x_i .
$$

You will write a function that computes the mean of a list of numbers. However, first we must work out how to sum a list of numbers. We'll use the list `[1, 4, 3, -5, 2]` for experiments.

In [1]:
data = [1, 4, 3, -5, 2]

### Summing a List of Numbers

#### For Loop With an Index

First I'll write pseudocode for summing a list. Remember, pseudocode is just a rough statement of the steps, not real Python code.

```
For each index, i, from 0 to the last
    Add the number at index i to the sum
```

There are three parts to this: iterating over all indices from 0 to $n-1$, looking up the number at each index, and adding that number to a sum variable.

Python's `range` function will generate the indices from 0 to $n-1$, but how do I get $n$? $n$ is the number of numbers in the list, which is returned by Python's `len` function. So `range(len(data))` will produce the indices `0, 1, 2, ..., n-1` (remember that `range` stops just before the ending value).
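A quick way to see this is to print the indices. This is only a small sketch: in Python 3, `range` gives back a lazy range object rather than a list, so the example wraps it in `list` to display the values. The variable names `n` and `indices` are mine.

```python
data = [1, 4, 3, -5, 2]

# len gives the number of values in the list.
n = len(data)

# range(n) covers 0 up to, but not including, n.
indices = list(range(n))

print(n)        # 5
print(indices)  # [0, 1, 2, 3, 4]
```

The last index is `n - 1`, which is 4 here, so every index produced is safe to use with `data[i]`.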

`i` is the usual name for an index, so I can iterate over the indices of the list with the line

```python
for i in range(len(data)):
```

Inside the `for` loop we have to look up the number at index `i` and add it to the sum. `sum` is a special name in Python (it is the name of a built-in function), so I'll name the sum `s`. To look up something in a list in Python (and in most programming languages) you use square brackets, like `data[i]`. To add something to a variable you use the `+=` operator. The for loop will then look like

```python
for i in range(len(data)):
    s += data[i]
```

However, if you run this code, you get an error.

In [2]:
for i in range(len(data)):
    s += data[i]

NameError: name 's' is not defined

I've told Python to add to `s`, but I didn't give `s` a value to start with (giving a variable its first value is called "initializing" it). That's easily fixed by assigning 0 to `s`.

In [3]:
s = 0

for i in range(len(data)):
    s += data[i]
    
print(s)

5


Finally, I can wrap this in a function which takes a list and returns the sum. Again, `sum` is a special name in Python, but I can use `my_sum`.

In [4]:
def my_sum(lst):
    """Add up the values in a list.
    
    Arguments
        lst: A list of numbers.
        
    Returns:
        The sum of the numbers in the list.
    """
    s = 0
    
    for i in range(len(lst)):
        s += lst[i]
        
    return s

`list` is another special name in Python, so `lst` is commonly used to name a list variable (only if a more specific name isn't appropriate!).

Notice that I've included a docstring. Every function you ever write should have a docstring. Yours don't have to be as formal as mine, but a stranger should be able to read one and understand what your function does. Remember that you can read a function's docstring with `help` or `?`.

In [5]:
help(my_sum)

Help on function my_sum in module __main__:

my_sum(lst)
    Add up the values in a list.
    
    Arguments
        lst: A list of numbers.
        
    Returns:
        The sum of the numbers in the list.



In [6]:
my_sum?
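The `?` syntax belongs to IPython and Jupyter, so it will not work in a plain Python script. As a rough sketch of the alternatives, the docstring is also stored on the function itself as `__doc__`, and the standard library's `inspect.getdoc` returns it with the indentation cleaned up. The shortened docstring below is only for illustration.

```python
import inspect


def my_sum(lst):
    """Add up the values in a list."""
    s = 0

    for i in range(len(lst)):
        s += lst[i]

    return s


# The raw docstring, exactly as written in the function.
print(my_sum.__doc__)

# The same text, with leading indentation tidied by inspect.
print(inspect.getdoc(my_sum))
```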

Finally, I will test my function. Many books have been written on how to test code, but I'll make do by checking a few different lists with sums I know.

In [7]:
my_sum(data)

5

In [8]:
my_sum([1] * 10)

10

In [9]:
my_sum(range(10))

45

In [10]:
my_sum([-1.1, 2.2, 3.3])

4.4

In [12]:
my_sum([1j, 2+2j, -1])

(1+3j)
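The checks above can also be written so that Python compares the results for me. The following is a minimal sketch using `assert`, which raises an error when its condition is false. The use of `math.isclose` for the floating-point case and the extra empty-list check are my own choices, not a required testing style.

```python
import math


def my_sum(lst):
    """Add up the values in a list."""
    s = 0

    for i in range(len(lst)):
        s += lst[i]

    return s


data = [1, 4, 3, -5, 2]

# Each assert stays silent when the check passes.
assert my_sum(data) == 5
assert my_sum([1] * 10) == 10
assert my_sum(range(10)) == 45
# Floats are compared with a tolerance rather than with ==.
assert math.isclose(my_sum([-1.1, 2.2, 3.3]), 4.4)
assert my_sum([1j, 2+2j, -1]) == 1+3j
# An empty list should sum to the starting value, 0.
assert my_sum([]) == 0

print("all checks passed")
```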

#### For Loop Without an Index

The previous code works perfectly well, but Python provides an easier way to iterate over a list. Notice that I didn't really care about the index `i` in my code. I only used `i` to get a value from the list, and Python's `for` loop lets me skip that middle step. I can iterate over the values in the list directly like so

```python
for value in data:
```

I can now rewrite my earlier loop a bit more simply.

In [13]:
s = 0

for value in data:
    s += value
    
print(s)

5


Notice that the `for` loop knows when to stop, so I can't mix up the final index of the list.
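To illustrate the kind of mistake an index loop allows, here is a deliberately broken sketch that runs one index too far. The `try`/`except` wrapper and the name `caught` are only there so the example can report the error instead of stopping.

```python
data = [1, 4, 3, -5, 2]

s = 0
caught = None

try:
    # Deliberate mistake: range goes up to index 5, but the last valid index is 4.
    for i in range(len(data) + 1):
        s += data[i]
except IndexError as error:
    caught = error

print("IndexError:", caught)  # IndexError: list index out of range
```

Iterating over the values directly leaves no index to get wrong.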

I can rewrite my function as well.

In [14]:
def my_sum(lst):
    """Add up the values in a list.
    
    Arguments
        lst: A list of numbers.
        
    Returns:
        The sum of the numbers in the list.
    """
    s = 0
    
    for value in lst:
        s += value
        
    return s

I'll run a few tests, just to make sure I didn't *actually* change anything.

In [15]:
my_sum(data)

5

In [16]:
my_sum([-1.1, 2.2, 3.3])

4.4

#### Python Built-In

Summing a list is such a common task that Python has a built-in function for it: `sum`.

In [17]:
sum(data)

5

In [18]:
sum([-1.1, 2.2, 3.3])

4.4

Notice that `sum` is coloured green in a code cell. Jupyter notebooks highlight Python's built-in names like that, which is a handy warning not to reuse them for your own variables.

In [19]:
sum, list, abs, min, int

(<function sum>, list, <function abs>, <function min>, int)
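This is why I avoided naming my own variable `sum` earlier. As a small sketch of what goes wrong, the hypothetical function below reuses the name `sum` for a number, which hides the built-in function inside that function.

```python
def shadowed_sum(values):
    # Deliberate mistake: this local variable hides the built-in sum.
    sum = 0

    try:
        # sum is now the number 0, so it cannot be called.
        return sum(values)
    except TypeError as error:
        return "TypeError: " + str(error)


print(shadowed_sum([1, 4, 3, -5, 2]))  # TypeError: 'int' object is not callable

# Outside that function the built-in is untouched.
print(sum([1, 4, 3, -5, 2]))  # 5
```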

### Your Mean Function

Now you will write a function to calculate the mean of a list of numbers. You can write your own code to sum the list, or use Python's built-in `sum` function. I have provided a template to fill in, as well as some tests.

In [20]:
def mean(lst):
    """docstring"""
    return 0

In [21]:
mean(data)

0

In [22]:
mean([1,2,3])

0

In [23]:
mean([1, 1, 1])

0

In [24]:
mean([-2, -1, -6])

0
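The template returns 0 for everything, so these outputs will change once you fill it in. One possible way to find the values to expect is the standard library's `statistics.mean`, sketched below. It is called through the module name so that it does not clash with your own `mean` function.

```python
import statistics

data = [1, 4, 3, -5, 2]

# Reference answers from the standard library. Compare these with the
# values your own mean function returns for the same lists.
for lst in (data, [1, 2, 3], [1, 1, 1], [-2, -1, -6]):
    print(lst, statistics.mean(lst))
```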

## Task 2: Compute the Variance

### Description of Variance

The variance of a dataset, written $\sigma^2$ or $\operatorname{Var}(X)$, is a measure of how spread out its values are. To compute the variance we first compute how far each value is from the mean, square those distances, then average the squares. Our example data, $X = (1, 4, 3, -5, 2)$, has mean 1, so this looks like

$$
\operatorname{Var}(X) = \frac{(1-1)^2 + (4-1)^2 + (3-1)^2 + (-5-1)^2 + (2-1)^2}{5} = 10 .
$$

For a general dataset with mean $\mu$ this becomes

$$
\operatorname{Var}(X) = \frac{(x_0-\mu)^2 + (x_1-\mu)^2 + \dots + (x_{n-1}-\mu)^2}{n} ,
$$

or, in big-sigma notation,

$$
\operatorname{Var}(X) = \frac{1}{n} \sum_{i=0}^{n-1} (x_i - \mu)^2 .
$$
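The arithmetic for the example data from Task 1 can be checked step by step. This is only a sketch of the three steps described above, not the function you are asked to write: the mean is typed in by hand, and the names `mu`, `squared_distances` and `variance` are mine. The standard library's `statistics.pvariance` also divides by $n$, so it is used as a cross-check.

```python
import statistics

data = [1, 4, 3, -5, 2]
mu = 1  # the mean of data, as worked out in Task 1

# Steps 1 and 2: distance of each value from the mean, squared.
squared_distances = []
for value in data:
    squared_distances.append((value - mu) ** 2)

print(squared_distances)  # [0, 9, 4, 36, 1]

# Step 3: average the squares.
variance = sum(squared_distances) / len(data)
print(variance)  # 10.0

# Cross-check against the standard library's population variance.
print(statistics.pvariance(data))
```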

## Task 3: Compute the Skewness

## Task 4: Compute Arbitrary Central Moments

## Task 5: Assemble Your Statistics Module