# Python bridge 1

*John Pinney*

## Hello!

Welcome to the python bridge.

This course follows on directly from *Introduction to Python*, aiming to bridge the gap between a beginner-level course and the specific computing resources that you need for your research.

We have three sessions to work together, with the following aims:

* Improve your confidence in writing python code.
* Introduce three essential packages for scientific computing in python: `numpy`, `matplotlib` and `pandas`.
* Practice working with a data set to carry out some basic analysis and visualisation tasks.

At the end of these sessions, we hope that you will feel better prepared for further training in scientific computing (e.g. machine learning, statistical modelling or simulation).

This is a new course, and we are very grateful for your questions and feedback on the content and delivery so that we can continue to improve the training that we offer. Please email j.pinney@imperial.ac.uk with your comments and suggestions.

## How to use this course

A couple of ideas to get the most from your learning:

### Vocabulary list
It's a good idea to keep your own record of the various modules, functions, classes and methods that you will encounter. This can make it easier when you come to apply what you have learned in your own code. Perhaps make a new jupyter notebook (including working examples of code) that you can keep as a personal reference.

### Documentation
This course is a good opportunity to familiarise yourself with the online documentation for the packages we look at. Python documentation tends to be quite standardised, so any other packages you work with in the future are likely to be documented in a similar way.


## Packages

### Packages for scientific computing

You have already learned how to import python code from external *modules*. In scientific computing, packages give us access to an enormous range of modules containing algorithms and data structures that are written and maintained by domain experts. Choosing packages that are appropriate to your needs &mdash; and taking the time to learn how to use them effectively &mdash; can save a huge amount of time and effort in writing your own code.

A number of very useful packages are collected under the `scipy` umbrella, and linked from https://scipy.org . In this course, we will introduce three of the most widely used scipy packages, starting with the numerical computing tools in the `numpy` package.

### Using a package manager

To work with modules that are not part of the core python distribution, we need a framework that will deal with downloading the external code and ensuring that different modules are compatible with each other. 

With Anaconda, the easiest way to do this is using the Anaconda Navigator GUI. Go to *Environments* and use the search facility to find the packages that you want to install or uninstall. The package manager will attempt to install these from the internet, and you can then import the corresponding modules within your jupyter notebook.

The command-line utility `conda` gives access to the same package management system, e.g. the command

```
conda install numpy
```

will install numpy in the current environment.

Anaconda/conda is highly recommended as the most straightforward way to manage your python environments. If you have a different python install, you will need to use a different package manager to download packages (usually `pip`, e.g. `pip install numpy`). 

**NB** If you have any difficulties loading packages during the session, we recommend that you switch to using the online Binder versions of these notebooks - see https://github.com/johnpinney/python_bridge


Let's see if the `numpy` package is available in your notebook's environment. If you're using the default 'base(root)' environment in Anaconda, it is probably already there. 

In [86]:
import numpy as np

If it isn't found (you will get a *ModuleNotFoundError*), use your package manager to install it and try again.

Remember that the `as np` instruction means that we will refer to the `numpy` module in our code using the shorthand `np`.

Before we can get started with using `numpy`, we'll need to revise some of the basic ideas around python objects.

## Python object essentials

You might already be aware that python is an *object-oriented* language, but the details of what this means are not usually addressed in a beginners' course. This is because it is possible to write plenty of useful python code without thinking about objects at all.

However, as your projects become more complex and we incorporate code from external packages, it becomes important to understand how to handle objects to get them to do what you want. Specifically, we need to understand the concepts of **class**, **instance**, **attribute** and **method**.


Object-oriented languages encourage us to organise our code so that data structures are grouped together with the functions that operate on them.
Let's look at what this means in practice, using a kind of object that you have already encountered: the python `list`.

We can generate a new list:

In [64]:
fruits = ['pear', 'orange', 'apple', 'pear', 'banana']
fruits

['pear', 'orange', 'apple', 'pear', 'banana']

We can make changes to the list:

In [65]:
fruits.append('kiwi')
fruits.reverse()
fruits

['kiwi', 'banana', 'pear', 'apple', 'orange', 'pear']

We can retrieve items from the list:

In [66]:
fruits[1]

'banana'

And we can count items in the list:

In [67]:
fruits.count('pear')

2

Let's examine in a little more detail what is happening here.

### Classes

We can think of a *class* as a blueprint for an object. Essentially, we need to define two things: 

* How the data associated with the object is handled in memory.
* The ways in which the object can interact with the rest of the program.

This means that every object of the same kind will behave in the same way. For example, if I have a `list`, I can check [the documentation](https://python-reference.readthedocs.io/en/latest/docs/list/) to find out all the things that I can do with it.

By convention, names for user-defined classes usually start with a capital letter.


### Instances

Once a class has been defined, we can create multiple *instance objects* from it, which are independent of each other. An instance of a class `C` is an object that has been made according to the blueprint defined by that class. 

We usually create new instances of a class `C` using a *constructor* `C()`. For a `list`, the equivalent function is `list()`:


In [78]:
x = list()
print(x)

[]


In [79]:
x.append(100)
x.append(101)
print(x)

[100, 101]


Sometimes new instances can be created in other ways, for example by obtaining a copy of an existing object:

In [81]:
y = x.copy()
y.append(200)
print(x)
print(y)

[100, 101]
[100, 101, 200]
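By contrast, plain assignment does not create a new instance; it only gives a second name to the same object. A minimal sketch of the difference (the variable names `original`, `alias` and `duplicate` are illustrative only):

```python
original = [100, 101]

alias = original             # no new object: both names refer to the same list
duplicate = original.copy()  # a new, independent list with the same contents

alias.append(200)            # changes the list that `original` also refers to
duplicate.append(300)        # changes only the copy

print(original)               # [100, 101, 200]
print(duplicate)              # [100, 101, 300]
print(alias is original)      # True
print(duplicate is original)  # False
```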


### Attributes

An *attribute* is a variable that is attached to an object, which exposes some information about the internal state of that object. Depending on the way that the attribute is defined, this might be something fixed or something that can change during the lifetime of the object.

We can access an attribute `a` of an object `x` using a dot: `x.a`

A `list` doesn't have many attributes, but here is one, which simply records the class that the object is derived from:

In [82]:
fruits.__class__

list

Note that this gives the same output as the `type` function:

In [218]:
type(fruits)

list

### Methods

Lastly, a *method* is a function that is attached to an object. The methods that are available to a particular object are defined by its class.

We can invoke the method `f` of an object `x` using `x.f()` &mdash; the parentheses allow us to send arguments to the function, just as we would when using a normal function.
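To see class, instance, attribute and method together, here is a minimal sketch of a user-defined class. The class `FruitBowl` and all of its names are invented for illustration; they are not part of python or of any package.

```python
class FruitBowl:
    """A toy class: a blueprint for objects that hold a list of fruit names."""

    def __init__(self, owner):
        # Attributes: variables attached to each instance
        self.owner = owner
        self.contents = []

    def add(self, fruit):
        # A method that changes the state of the object
        self.contents.append(fruit)

    def how_many(self, fruit):
        # A method that only reads the state of the object
        return self.contents.count(fruit)


# Two independent instances made from the same class
bowl_a = FruitBowl("Ada")
bowl_b = FruitBowl("Ben")

bowl_a.add("pear")
bowl_a.add("pear")
bowl_b.add("kiwi")

print(bowl_a.owner)             # attribute access: Ada
print(bowl_a.how_many("pear"))  # method call: 2
print(bowl_b.contents)          # ['kiwi']
```

Note that adding fruit to `bowl_a` has no effect on `bowl_b`: each instance keeps its own data.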

#### Exercise

Which methods of `fruits` have we already used?

Which of those methods cause the object to change in some way?

Find out what the methods `pop`, `index`, `insert` and `extend` do. Try them out below.

## `numpy`

`numpy` provides a set of general data structures and utilities to support numerical computing in python. It is one of the most widely used packages in scientific computing. 

### `ndarray`

A major feature of `numpy` is the *n-dimensional array* (`ndarray`) data type that it provides. This is similar to a `list`, but has three advantages for numerical computing:

* Every element must be of the same data type (e.g. `float` or `int`).
* Operations are much faster and more memory-efficient using `ndarray` than using `list`.
* The resulting code is easier to read and write.
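The first point can be seen directly: if a list mixing integers and floats is passed to `np.array`, numpy chooses a single data type that can hold all of the values. A short sketch (the variable name `mixed` is illustrative):

```python
import numpy as np

mixed = np.array([1, 2, 3.5])  # two ints and one float
print(mixed)
print(mixed.dtype)  # float64: the integers have been converted to floats
```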

In [199]:
n = 10

In [251]:
a = list()
for i in range(n):
    a.append(1.0)
print(a)
type(a)

[1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]


list

In [252]:
b = np.array(a)
print(b)
type(b)

[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]


numpy.ndarray

Notice that `np.array()` is a constructor, making a new `ndarray` object using data from the `list` provided. When we refer to "an array" in python, we usually mean an object of type `ndarray`.

We can consider `a` and `b` as different representations of the same column vector of length `n`. 

Now go back and increase `n` to 1,000,000 in the cell above, re-run the cells that define `a` and `b`, and then compare the computation times for `a+a` and `b+b`:

In [204]:
import time

In [207]:
start_time = time.time()
result = list()
for i in range(n):
    result.append(a[i] + a[i])
print("--- %s seconds ---" % (time.time() - start_time))

--- 0.00013399124145507812 seconds ---


In [208]:
start_time = time.time()
result = np.zeros(n) # makes a new ndarray filled with 0's
for i in range(n):
    result[i] = b[i] + b[i]
print("--- %s seconds ---" % (time.time() - start_time))

--- 0.0001957416534423828 seconds ---


### Vectorised operations

Wait a minute &mdash; didn't I just say that calculations using `ndarray` are supposed to be *faster*..?

This is true, but we have to work with the objects in the right way. 

Arithmetic operations like addition should be done directly (i.e. literally as `b + b`), which `numpy` understands as a vector operation:

In [210]:
start_time = time.time()
result = b + b
print(result)
print("--- %s seconds ---" % (time.time() - start_time))

[2. 2. 2. 2. 2. 2. 2. 2. 2. 2.]
--- 0.00043892860412597656 seconds ---


For large vectors, this is many times faster than using a loop to do the addition, and the code is simpler to write and easier to read! `numpy` handles the vectorised operation for us, so we can concentrate on the calculation itself.
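To check this on your own machine, the comparison can be sketched as below for a vector of one million elements. This version uses `time.perf_counter` from the standard library, which is intended for timing short pieces of code; the variable names are illustrative, and the timings will differ between computers, so none are shown here.

```python
import time
import numpy as np

n = 1_000_000
b = np.ones(n)

# Element-by-element addition in a python loop
start = time.perf_counter()
loop_result = np.zeros(n)
for i in range(n):
    loop_result[i] = b[i] + b[i]
loop_seconds = time.perf_counter() - start

# The same calculation as a single vectorised operation
start = time.perf_counter()
vector_result = b + b
vector_seconds = time.perf_counter() - start

print(f"loop:       {loop_seconds:.4f} seconds")
print(f"vectorised: {vector_seconds:.4f} seconds")
print(np.array_equal(loop_result, vector_result))  # True: same answer either way
```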

#### Notes

The length of an `ndarray` is fixed when it is created, so it has no `append` method.
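If you do need a longer array, the function `np.append` builds a new array rather than changing the existing one. A brief sketch (the variable names are illustrative):

```python
import numpy as np

short = np.array([1.0, 2.0, 3.0])
longer = np.append(short, 4.0)   # returns a new array; `short` is unchanged

print(short.size)                # 3
print(longer.size)               # 4
print(hasattr(short, "append"))  # False: there is no append method
```

Because a whole new array is made each time, calling `np.append` repeatedly inside a loop is usually slow; it is generally better to create an array of the right size first.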

You can check the data type of an `ndarray` using the `dtype` attribute, and the number of elements it contains using `size`:

In [213]:
b.dtype

dtype('float64')

In [214]:
b.size

10


You can find all the attributes and methods available for an `ndarray` [here](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.html).

#### Exercise

`x` is a vector of numerical data:

In [262]:
x = np.array([4, 20, 33, 10, 24, 36, 23, 61, 9, 53, 3, 66, 31])

Use the attributes and methods of `x` to answer the following questions:

How many values does `x` contain?

What are the mean and standard deviation of the values in `x`?

What is the range (maximum - minimum) of the values in `x`?

What is the index of the largest value in `x`? 
*Hint: use the `argmax()` method.*

### Array generators

* `zeros`
* `ones`
* `arange`
* `linspace`
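These four functions are among the most common ways of creating a numpy array from scratch. A minimal sketch of each, with arbitrary example arguments:

```python
import numpy as np

z = np.zeros(4)               # four zeros (floats by default)
o = np.ones(3, dtype=int)     # three ones, stored as integers
r = np.arange(2, 11, 3)       # start at 2, step by 3, stop before 11
s = np.linspace(0.0, 1.0, 5)  # five evenly spaced values from 0 to 1 inclusive

print(z)  # [0. 0. 0. 0.]
print(o)  # [1 1 1]
print(r)  # [2 5 8]
print(s)  # [0.   0.25 0.5  0.75 1.  ]
```

Note the difference between the last two: `arange` is given a step size and excludes the stop value, whereas `linspace` is given the number of points and includes the end value.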


Create a null vector of size 10, but make the fifth value 1.

Create a vector with values ranging from 10 to 49.

### Array slices

Create the reverse of a given vector `x` (first element becomes last).
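Slicing an array uses the same `start:stop:step` notation as slicing a `list`. A short sketch with an arbitrary example vector `v` (it deliberately does not solve the exercise above):

```python
import numpy as np

v = np.arange(10, 20)  # [10 11 12 13 14 15 16 17 18 19]

print(v[2:5])   # elements at indices 2, 3 and 4: [12 13 14]
print(v[:3])    # the first three elements: [10 11 12]
print(v[-2:])   # the last two elements: [18 19]
print(v[::3])   # every third element: [10 13 16 19]
```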

### Arrays in higher dimensions
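
A two-dimensional array can be built from a list of lists, and is indexed with one index per dimension. A minimal sketch (the array `m` is an arbitrary example):

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])

print(m.ndim)    # number of dimensions: 2
print(m.shape)   # (rows, columns): (2, 3)
print(m[1, 2])   # row 1, column 2: 6
print(m[0])      # the whole of row 0: [1 2 3]
print(m[:, 1])   # the whole of column 1: [2 5]
```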

### Stacking arrays

In [7]:

data = np.vstack([x_vals, y_vals])
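
The cell above relies on `x_vals` and `y_vals`, which are only defined in a later cell. As a self-contained sketch with two short example vectors, `np.vstack` stacks them as the rows of a 2D array:

```python
import numpy as np

x_vals = np.array([0.0, 1.0, 2.0])
y_vals = np.array([10.0, 20.0, 30.0])

data = np.vstack([x_vals, y_vals])  # each input vector becomes one row
print(data.shape)  # (2, 3)
print(data[1])     # the second row is y_vals: [10. 20. 30.]
```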

## Arithmetic

### Vector arithmetic
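
Arithmetic operators act element by element on arrays of the same length. A minimal sketch with two arbitrary example vectors, `u` and `w`:

```python
import numpy as np

u = np.array([1.0, 2.0, 3.0])
w = np.array([10.0, 20.0, 30.0])

print(u + w)   # [11. 22. 33.]
print(w - u)   # [ 9. 18. 27.]
print(u * w)   # element-wise product: [10. 40. 90.]
print(2 * u)   # scaling by a number: [2. 4. 6.]
print(u ** 2)  # element-wise square: [1. 4. 9.]
```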

### Matrix multiplication
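
For 2D arrays, `*` is still an element-wise product; the matrix product is written with the `@` operator. A short sketch with arbitrary example matrices:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 1],
              [1, 0]])
v = np.array([1, 1])

print(A * B)  # element-wise product, NOT matrix multiplication
print(A @ B)  # matrix product
print(A @ v)  # matrix-vector product: [3 7]
```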

### Broadcasting
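
Broadcasting is the set of rules numpy uses to combine arrays of different shapes, by repeating the smaller array along the missing or length-1 dimensions. A minimal sketch with arbitrary example arrays:

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])    # shape (2, 3)
row = np.array([10, 20, 30])    # shape (3,)
col = np.array([[100], [200]])  # shape (2, 1)

print(grid + 1)    # the number is applied to every element
print(grid + row)  # `row` is added to each row of `grid`
print(grid + col)  # `col` is added to each column of `grid`
```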

## Mathematical functions

In [None]:
x_vals = np.arange(0,30,step=np.pi/16)
y_vals = np.sin(x_vals)
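
Functions such as `np.sin` are applied to every element of an array at once, so no loop is needed. A brief sketch using a few angles whose sines are well known (the names `angles` and `sines` are illustrative):

```python
import numpy as np

angles = np.array([0.0, np.pi / 6, np.pi / 2])
sines = np.sin(angles)  # one call computes the sine of every element

print(np.round(sines, 3))                  # [0.  0.5 1. ]
print(np.sqrt(np.array([1.0, 4.0, 9.0])))  # [1. 2. 3.]
```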

## Random numbers

Make a 5x5 random matrix.

Normalize a 5x5 matrix.

Write a function to add Gaussian noise to a monochrome image provided as a 2D matrix.

In [11]:
np.random.rand(100, 1)

array([[0.80686535],
       [0.30328787],
       [0.13306068],
       [0.15469299],
       [0.41084397],
       [0.71247185],
       [0.56453821],
       [0.55491044],
       [0.52682772],
       [0.7833095 ],
       [0.25496814],
       [0.61563354],
       [0.54095577],
       [0.63078542],
       [0.66919827],
       [0.82111056],
       [0.73312421],
       [0.05824746],
       [0.24362309],
       [0.61493779],
       [0.25579933],
       [0.36036643],
       [0.3672504 ],
       [0.98426743],
       [0.98179395],
       [0.48488024],
       [0.88648762],
       [0.25874002],
       [0.53595738],
       [0.99383515],
       [0.85149674],
       [0.37716698],
       [0.09421315],
       [0.00225724],
       [0.19833786],
       [0.26411833],
       [0.35035939],
       [0.63688029],
       [0.48747792],
       [0.48902147],
       [0.07615082],
       [0.87341296],
       [0.8589914 ],
       [0.97751873],
       [0.63202915],
       [0.2332867 ],
       [0.35355492],
       [0.296

In [12]:
np.random.randn(100, 1)

array([[-1.58420427],
       [-0.76629332],
       [-0.0644615 ],
       [ 0.88991891],
       [-0.10433141],
       [ 0.18487157],
       [-0.24278505],
       [ 0.8794294 ],
       [-0.40480024],
       [-0.85780344],
       [ 0.30740613],
       [ 0.0231382 ],
       [-1.16536909],
       [-0.71124196],
       [-0.10178581],
       [-0.52250639],
       [ 1.18034906],
       [-1.08299902],
       [-1.40752803],
       [ 0.94697262],
       [-0.07097913],
       [ 0.50362712],
       [ 1.33084443],
       [ 0.48433515],
       [ 0.25572598],
       [-1.39613864],
       [ 1.16382679],
       [-0.53796989],
       [-0.38351483],
       [ 0.50894806],
       [ 2.03089117],
       [-1.40936462],
       [-1.79005372],
       [ 1.13858506],
       [ 0.3934622 ],
       [ 0.39725946],
       [-1.31820388],
       [-1.26622257],
       [-1.35176058],
       [-0.90446045],
       [ 0.30605255],
       [-1.382709  ],
       [ 1.07552728],
       [ 0.93405072],
       [ 0.98696315],
       [ 0
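
`np.random.rand` draws from a uniform distribution on [0, 1), while `np.random.randn` draws from a standard normal distribution. For new code, the numpy documentation recommends the `Generator` interface obtained from `np.random.default_rng`. A minimal sketch, using an arbitrary seed so that the same numbers are produced on every run:

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # arbitrary seed, for reproducibility

uniform = rng.random((3, 2))          # uniform on [0, 1), shape (3, 2)
normal = rng.standard_normal((3, 2))  # mean 0, standard deviation 1

print(uniform.shape)  # (3, 2)
print(normal.shape)   # (3, 2)
print(uniform.min() >= 0 and uniform.max() < 1)  # True
```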

## Homework: Connect Four

Here's an extended exercise using 2D arrays. The idea is to get a bit more practice with writing functions and loops, and thinking about array indexing. 

Nothing will be marked, it's just for fun. Do as much as you like.

### The scenario

The game [Connect Four](https://en.wikipedia.org/wiki/Connect_Four) is played on a vertical grid with 7 columns and 6 rows.

We can represent the state of the game using an integer matrix, where 1 is a red counter, 2 is a yellow counter and 0 is an empty cell.

At the start of the game, the board looks like this:


In [266]:
board_0 = np.zeros((7,6),int)
print(board_0)

[[0 0 0 0 0 0]
 [0 0 0 0 0 0]
 [0 0 0 0 0 0]
 [0 0 0 0 0 0]
 [0 0 0 0 0 0]
 [0 0 0 0 0 0]
 [0 0 0 0 0 0]]


(Notice that when the array is printed like this, the board is shown rotated by 90 degrees clockwise: each printed row is one column of the board, with the bottom of the column on the left.)

Red goes first, placing a counter in the fifth column:

In [270]:
board_1 = board_0.copy()
board_1[4][0] = 1
print(board_1)

[[0 0 0 0 0 0]
 [0 0 0 0 0 0]
 [0 0 0 0 0 0]
 [0 0 0 0 0 0]
 [1 0 0 0 0 0]
 [0 0 0 0 0 0]
 [0 0 0 0 0 0]]
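
For a numpy array, the two indices can also be written inside a single pair of brackets, which is the more usual style. A small sketch confirming that both forms address the same cell, and that `copy()` has left the original board untouched:

```python
import numpy as np

board_0 = np.zeros((7, 6), int)  # 7 columns, each with 6 cells
board_1 = board_0.copy()
board_1[4, 0] = 1                # same cell as board_1[4][0]

print(board_1[4][0] == board_1[4, 0])  # True
print(board_0.sum())  # 0: the original board is still empty
print(board_1[4])     # the fifth column: [1 0 0 0 0 0]
```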


After seven moves, the board looks like this:

In [271]:
board_7 = np.array([[0, 0, 0, 0, 0, 0],
                    [1, 0, 0, 0, 0, 0],
                    [2, 1, 1, 0, 0, 0],
                    [1, 2, 0, 0, 0, 0],
                    [2, 0, 0, 0, 0, 0],
                    [0, 0, 0, 0, 0, 0],
                    [0, 0, 0, 0, 0, 0]])
print(board_7)

[[0 0 0 0 0 0]
 [1 0 0 0 0 0]
 [2 1 1 0 0 0]
 [1 2 0 0 0 0]
 [2 0 0 0 0 0]
 [0 0 0 0 0 0]
 [0 0 0 0 0 0]]


### Task 1

It's already annoying having to strain my neck to look at these boards. 

Please write a function that prints a representation of the board in the correct orientation, without all of those square brackets.

In [None]:
def display(board):
    pass  # replace this line with your own code

### Task 2

We could make it easier for a player to make a move.

Complete the function `do_move(board, player, column)`, which returns the new state of the board after a move is made in the column specified:

In [265]:
def do_move(board, player, column):
    """Returns the new board configuration after the specified move.

    Parameters:
        board (numpy.ndarray): The current board configuration.
        player (int): The player who is moving (1 or 2).
        column (int): The column in which they play (0-6).

    Returns:
        numpy.ndarray: The board configuration after the move. """
    
    new_board = board.copy()
    
    # do some things here...

    return new_board
    
    

### Task 3

Write a function `get_move(board, player)` that returns a legal move (column index) for the given player.

### Task 4 (harder)

Write a function `winner(board)` that returns an integer:

* -1 if the game is not yet over.
* 0 if the game is a draw.
* 1 if red has won.
* 2 if yellow has won.



### Task 5

You have *almost* made a Connect Four simulation. 
Can you finish it so that I can play against the computer? 


In [None]:
# Might be useful...
response = input("Please enter a column number:")
col = int(response)
print(col)
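
`int()` raises a `ValueError` if the text entered is not a whole number, and even a valid number may not correspond to a column on the board. One possible way to guard against both is sketched below; the helper `parse_column` and its `n_columns` parameter are hypothetical names, not part of the exercise.

```python
def parse_column(response, n_columns=7):
    """Return the column index as an int, or None if the text is not valid."""
    try:
        col = int(response)
    except ValueError:
        return None  # not a whole number, e.g. "three"
    if 0 <= col < n_columns:
        return col
    return None      # a number, but outside the board


print(parse_column("3"))      # 3
print(parse_column("three"))  # None
print(parse_column("9"))      # None
```

Returning `None` for bad input lets the calling code decide what to do next, for example asking the player to try again.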

### Task 6 (optional)

Can you improve your `get_move` function to make a more strategic move?

## Further reading and exercises

Lots of useful tutorials are collected at [numpy.org](https://numpy.org/devdocs/user/index.html), including [this beginners' guide](https://numpy.org/devdocs/user/absolute_beginners.html).

[From Python to Numpy](https://www.labri.fr/perso/nrougier/from-python-to-numpy/) is an in-depth guide to using `ndarray` and vectorisation effectively, with examples from fluid dynamics to maze-building. 

The same author has collected [100 numpy exercises](https://github.com/rougier/numpy-100) with hints and solutions.

