# Introduction to `Python`

Though learning Python in any meaningful sense is far beyond what we can tackle in the last week of the semester, it will be useful to examine a few aspects of the language before diving into data analysis. (If you are motivated and have the time, I highly recommend [the Software Carpentries "Programming with Python" lesson](https://swcarpentry.github.io/python-novice-inflammation/).)

First, you should get familiar with the Jupyter Notebook environment. Find the buttons to run a chunk of code, change the format of a particular chunk (from Code—i.e., Python 3—to Markdown or Raw, which are used for writing), and save your work. Look also at the directory browser on the left-hand side of the notebook, and orient yourself to its structure.

Next, we'll import the Python libraries `numpy` and `math`, which we will use to demonstrate simple calculations: 

In [1]:
import numpy as np
import math

We can start with some simple calculations, as any Python interpreter (i.e., an interface for entering Python code, like a Jupyter Notebook chunk) can perform mathematical operations:

In [2]:
0.4**2 # the double-asterisk operator raises the preceding number to the power of the following one

0.16000000000000003

We can also assign numbers to variables using the "=" operator:

In [3]:
p = 0.2
q = 0.8
p**2 + 2*p*q + q**2

1.0000000000000002
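
The trailing digits in results like `0.16000000000000003` and `1.0000000000000002` are not mistakes in the math: Python stores decimals as binary floating-point numbers, which cannot represent values like 0.2 exactly. Here is a minimal sketch of how one might check such a result, using `math.isclose()` from the standard library and the built-in `round()` (the variable name `total` is used only for illustration):

```python
import math

p = 0.2
q = 0.8
total = p**2 + 2*p*q + q**2  # genotype frequencies should sum to 1

# An exact comparison can fail because of tiny floating-point rounding error
print(total == 1.0)

# math.isclose() compares within a small tolerance instead
print(math.isclose(total, 1.0))

# round() trims the result to a chosen number of decimal places for display
print(round(total, 6))
```

For most calculations in this course the rounding error is far too small to matter, but it is worth remembering when comparing two numbers for equality.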

Unlike some other languages you might be familiar with (e.g., R), Python allows you to assign values to several variables in a single line by separating them with commas:

In [4]:
p, q = 0.4, 0.6
p**2 + 2*p*q + q**2

1.0

Python also includes a set of built-in functions (i.e., functions you don't need to load via an external library). Useful ones to know include `print()`, `range()`, `list()`, `len()` and `type()`:

In [5]:
p = 0.5
print('The frequency of allele A_1 is now', p)

The frequency of allele A_1 is now 0.5


In [6]:
nums = list(range(0,10,1))
nums

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [7]:
len(nums)

10

Note that the object `nums` has a length of 10. This is because it is something known as a list, a data structure that can hold more than one value at a time. We can confirm this using the `type()` function:

In [8]:
type(nums)

list
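
The three arguments passed to `range()` earlier are the start, the stop, and the step. As a rough sketch (the name `evens` is hypothetical and only for illustration), changing the step shows how they interact, and square brackets pull individual values out of a list:

```python
# range(start, stop, step) counts from start up to, but not including, stop
nums = list(range(0, 10, 1))
evens = list(range(0, 10, 2))

print(nums)
print(evens)

# Lists are indexed from zero, so position 0 holds the first value;
# position -1 is a shorthand for the last value
print(nums[0], nums[-1])

# len() and type() work on any list
print(len(evens), type(evens))
```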

We can also use `type()` to determine what kind of numerical data is encoded by an object. Typically, this will be either a `float` (i.e., a floating-point number, with a decimal point) or an `int` (an integer, i.e., a number without a decimal point). We can convert from a `float` to an `int` and back using the functions `int()` and `float()`.
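
A short sketch of these conversions, with variable names (`x`, `y`, `z`) chosen only for illustration. Note that `int()` discards the decimal part rather than rounding:

```python
x = 7.9
print(type(x))

# int() drops everything after the decimal point (it truncates; it does not round)
y = int(x)
print(y, type(y))

# float() converts the integer back to a floating-point number
z = float(y)
print(z, type(z))
```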

Text can be represented by a type of object known as a string (`str` in Python):

In [9]:
text = "Conservation Genetics"

In [10]:
print(text)

Conservation Genetics


In [11]:
type(text)

str

Specialized libraries (equivalent to packages in R) include their own set of functions, which can be viewed by pressing tab after typing the library name and a period. We imported `numpy` with the nickname `np`. Here is one of its many functions, which takes the average of a list of numbers.

In [12]:
np.average(nums)

np.float64(4.5)

**QUESTION 1**: Choose another `numpy` function and explain its purpose. (You may use other sources.)

Next, we'll introduce the `help()` function. Its output is a little cryptic, but once you get oriented, it can be useful for debugging:

In [13]:
help(np.average)

Help on _ArrayFunctionDispatcher in module numpy:

average(a, axis=None, weights=None, returned=False, *, keepdims=<no value>)
    Compute the weighted average along the specified axis.
    
    Parameters
    ----------
    a : array_like
        Array containing data to be averaged. If `a` is not an array, a
        conversion is attempted.
    axis : None or int or tuple of ints, optional
        Axis or axes along which to average `a`.  The default,
        `axis=None`, will average over all of the elements of the input array.
        If axis is negative it counts from the last to the first axis.
        If axis is a tuple of ints, averaging is performed on all of the axes
        specified in the tuple instead of a single axis or all the axes as
        before.
    weights : array_like, optional
        An array of weights associated with the values in `a`. Each value in
        `a` contributes to the average according to its associated weight.
        The array of weights must be
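
The `weights` parameter described in the help text above can be tried directly. The sketch below uses made-up allele frequencies and subpopulation sizes (`freqs` and `sizes` are hypothetical names and values) to contrast a plain mean with a weighted one:

```python
import numpy as np

# Hypothetical allele frequencies in two subpopulations
freqs = [0.2, 0.6]
# Hypothetical subpopulation sizes, used as weights
sizes = [30, 10]

# Unweighted mean: both subpopulations count equally
print(np.average(freqs))

# Weighted mean: (0.2*30 + 0.6*10) / (30 + 10)
print(np.average(freqs, weights=sizes))
```

The weighted result is pulled toward the frequency in the larger subpopulation.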

Though scripting languages like Python have a vast range of sophisticated uses in population genetics, a simple but powerful use is defining functions to streamline the calculation of simple quantities. For example, our unit on genetic drift has required use of the binomial probability distribution to determine the expected distribution of allele frequencies in the next generation, given a particular population size and parental allele frequencies. While this is not overly onerous with a graphing calculator or Google, defining a function in Python (or R) can make the task much more efficient:

In [14]:
def binomial_probability(N, p, k):
    # Calculate q as the complement of p
    q = 1 - p
    
    # Calculate the binomial coefficient using factorials: C(2N, k)
    binomial_coeff = math.factorial(2 * N) // (math.factorial(k) * math.factorial(2 * N - k))
    
    # Calculate the probability using the binomial distribution formula
    probability = binomial_coeff * (p ** k) * (q ** (2 * N - k))
    
    return probability

In [15]:
N = 10 # e.g., the diploid population size; the function multiplies it by 2 to get the number of allele copies in the next generation
p = 0.2 # e.g., the frequency of the focal allele
k = 1 # i.e., what is the probability the next generation only has 1 focal allele?
binomial_probability(N, p, k)

0.05764607523034241

(In other words, the probability of having exactly 1 copy of an allele with initial frequency 0.2 in a generation with diploid population size $N=10$, and therefore $2N=20$ allele copies, is about 6%.)
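
As a sanity check, the same function can be evaluated for every possible allele count. The sketch below restates the calculation with `math.comb()` (available in Python 3.8 and later) so that it runs on its own; the names `probs` and `most_likely_k` are introduced here only for illustration:

```python
import math

def binomial_probability(N, p, k):
    # Same formula as before; math.comb(n, k) returns the binomial coefficient
    return math.comb(2 * N, k) * p**k * (1 - p)**(2 * N - k)

N = 10   # number of diploid individuals, so 2N = 20 allele copies
p = 0.2  # frequency of the focal allele in the parental generation

# Probability of each possible count of the focal allele: k = 0, 1, ..., 2N
probs = [binomial_probability(N, p, k) for k in range(2 * N + 1)]

# The probabilities over all possible outcomes should sum to (approximately) 1
print(sum(probs))

# The most likely count should sit near the expectation, 2N * p = 4
most_likely_k = probs.index(max(probs))
print(most_likely_k)
```

If the sum is far from 1, that is usually a sign that the population size or the range of `k` has been entered incorrectly.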

**QUESTION 2**: Add a new chunk of code and use the `binomial_probability()` function to determine the probability of having exactly 25 copies of an allele with a starting frequency of 0.4 if the next generation has a *haploid* population size of 50. 

Functions can be as complex or as simple as needed. For example, here is one that calculates the selection coefficient against a genotype given its relative fitness:

In [5]:
def dominance_selection(w_rec): 

    # Calculate the selection coefficient:
    s = 1 - w_rec

    return s

In [8]:
w_rec = 0.5
dominance_selection(w_rec)

0.5
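
The same idea extends to the dominance coefficient. The sketch below is one possible version, assuming the common parameterization in which the fittest homozygote has relative fitness 1, the heterozygote has fitness $1 - hs$, and the other homozygote has fitness $1 - s$. The function name `dominance_and_selection` and the argument `w_het` are hypothetical and not part of the original notebook:

```python
def dominance_and_selection(w_het, w_rec):
    # Assumes fitnesses are scaled so the fittest homozygote has w = 1,
    # with w_het = 1 - h*s and w_rec = 1 - s

    # Selection coefficient against the less fit homozygote
    s = 1 - w_rec
    if s == 0:
        raise ValueError("h is undefined when there is no selection (s = 0)")

    # Dominance coefficient: the fraction of s experienced by heterozygotes
    h = (1 - w_het) / s

    return h, s

# Python's comma assignment unpacks the two returned values
h, s = dominance_and_selection(w_het=0.75, w_rec=0.5)
print(h, s)
```

With these example values the heterozygote sits exactly halfway between the two homozygotes in fitness, so `h` comes out to 0.5.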

**QUESTION 3**: Write a function to calculate expected heterozygosity at a triallelic, diploid locus (one with allele frequencies $p$, $q$, and $r$). 