# Python Crash Course for "Machine Learning and Pattern Recognition"

<center>
    <b>
        KV 344.009, 2020<br/>
        Jan Schlüter
    </b><br/>
    Some parts borrowed from Hamid Eghbal-Zadeh, Khaled Koutini, Filip Korzeniowski, Matthias Dorfer, Rainer Kelz, Andreas Arzt<br/><br/>
    <img src="http://imgs.xkcd.com/comics/python.png" />
</center>

## Why Python?

There are several aspects to consider when choosing a programming language for a project:

* [**Language features:**](https://docs.python.org/3/reference)
  Python is a general-purpose language supporting object-oriented, imperative and functional programming. It has a dynamic type system and automatic memory management. It is usually executed by an interpreter, rather than compiled, but it is easy to integrate parts written in C/C++.

* [**Builtin libraries:**](https://docs.python.org/3/library/)
  Python comes with a pretty comprehensive standard library, in a "batteries included" philosophy. For example, with `python3 -m http.server` you can start a web server.

* **Ecosystem:**
  There are over [8 million Python users](https://www.zdnet.com/article/programming-languages-python-developers-now-outnumber-java-ones/), so most problems you run into [have already been solved](https://stackoverflow.blog/2017/09/06/incredible-growth-python/) by others, and most libraries you can think of [have already been implemented](https://pypi.org) better than you could.

* **Costs and Openness:**
  Python is free as in free beer (*gratis*), and as in free speech (*libre*).

As an interpreted language, it is well-suited for prototyping and experimentation, and it has built a healthy ecosystem of scientific users and libraries.

## Installation

### Linux (Ubuntu, Debian)

Python and its package manager are included in the repositories:
```bash
sudo apt install python3 python3-pip
```
Once installed, you can install additional packages with `pip3`:
```bash
sudo pip3 install numpy scipy matplotlib jupyterlab
```
(Notes: Make sure to use `pip3` or `python3 -m pip` -- `pip` without `3` would install a package for Python 2, which Debian/Ubuntu includes as well. Many packages are also included in the Debian repositories, such as `python3-numpy`, but they are older versions. If you work on a machine without root access, you can install packages into your home directory with `pip3 install --user numpy scipy matplotlib jupyterlab`. If you want to install packages for a particular project only, look into [virtual environments](https://docs.python.org/3/library/venv.html).)

### Windows

The easiest way to install Python 3 on Windows is Miniconda, available at https://docs.conda.io/en/latest/miniconda.html. Make sure to choose Python 3.x, not Python 2.7.

After installing Miniconda, open a terminal and run:
```
conda install numpy scipy matplotlib jupyterlab
```

### Mac OS

Follow the Windows guide.

## Running Python

There are three ways to use the Python interpreter.

### 1. Interactive console

Open a terminal and run:
```bash
ipython3
```

This will open a command prompt where you can enter statements and see results:
```
In [1]: x = 7

In [2]: x * 3
Out[2]: 21
```

It supports tab completion (i.e., hit the "tabulator" key any time to complete a partial name). You can also quickly read the documentation for something by appending a question mark and hitting the return key:
```
In [3]: print?
```

`ipython3` uses a module that has been installed along with `jupyterlab`. You can also just run `python3` to get a similar command prompt with fewer features.

The interactive console is the way to go for quickly trying something, or rough prototyping.

### 2. Python script

Create a file `test.py`:
```
#!/usr/bin/env python3
# -*- coding: utf-8 -*-

x = 7
print(x * 3)
```

Run it:
```
python3 test.py
```
or, on Linux/Mac OS, make it executable and run it (this uses the `#!` line):
```
chmod +x test.py
./test.py
```

This is the way to go for writing Python programs. If you have not found your favorite text editor yet, I can recommend [VS Code](https://code.visualstudio.com).

### 3. Notebook

Launch [Jupyter Lab](https://jupyterlab.readthedocs.io/en/stable/getting_started/overview.html):
```
jupyter lab
```

This should open a browser window with a view of the current directory on the left, and a launcher on the right. Choose "Python 3" in the category "Notebooks", or go via "File", "New", "Notebook". Right-click the tab title to rename the file if you want.

A [Jupyter Notebook](https://jupyter-notebook.readthedocs.io/en/stable/examples/Notebook/Notebook%20Basics.html#Modal-editor) is a file that mixes markdown text (such as this one) with code cells (such as the following):

In [None]:
print('Hello world!')

Cells can be run by selecting them and pressing "Shift + Return". To edit a cell, press "Return". Use "Tab" for code completion, and "Shift+Tab" to access the documentation for where the cursor is. To stop editing without running, press "Esc". To switch between markdown and code, press "m" and "y", and to add a cell above or below, press "a" or "b" (all while not in edit mode).

Note that cells can be run interactively in any order. With great power comes great responsibility: Write your notebook such that it works when running from top to bottom, like a Python script.

Notebooks are the way to go for demonstrations and not-too-complicated experiments.

## Learning Python

You probably don't want to learn a new language just for some random course. 

*Lucky you:* Python is so simple that you already understand it!

In [None]:
def print_vowels(text):
    vowels = 'aeiouy'
    for char in text:
        if char in vowels:
            print("Found " + char)

What do you think the following will do?

In [None]:
print_vowels('testing')

Furthermore, there are many online resources, e.g.:
* [Scipy Lecture Notes](https://scipy-lectures.org/): highly recommended, same goals as this crash course!
* [The Python Tutorial](https://docs.python.org/3/tutorial/index.html): for learning more about Python

Enough, let's start the crash course! We will go through the language features, builtin libraries and the scientific ecosystem, in that order.

# Language features

## Variables and basic types

### Declaration

Variables get created by assigning to them. They are not declared first.

In [None]:
x = 3

Variable names can include letters, underscores and numbers (but not start with a number). The convention is to use lower case letters and underscores:

In [None]:
last_name = 'Müller'

### Types

Python supports several basic types:

In [None]:
x = 'asdf'
type(x)

In [None]:
x = "asdf"
type(x)

In [None]:
x = 3
type(x)

In [None]:
x = 3.0
type(x)

In [None]:
x = True
type(x)

In [None]:
x = None
type(x)

The last one is used to represent nothing, similar to `nil` or `null` in other languages.

### Conversion

Some common type conversions:

In [None]:
int('42')

In [None]:
float('42')

In [None]:
str(1.41421)

In [None]:
str(True)

Almost anything can be interpreted as a bool. Empty and zero things are *false-ish*, everything else is *true-ish*.

In [None]:
bool(0)

In [None]:
bool(123)

In [None]:
bool('')

In [None]:
bool('asdf')

In [None]:
bool(None)

### Scopes

Variables live until explicitly deleted with `del`, or until exiting a function or method. The objects they refer to are deleted when their last reference is deleted.
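
As a minimal sketch of these lifetime rules (the function name `make_local` and the variable names are made up for illustration):

```python
def make_local():
    inner = 'only visible inside the function'
    return len(inner)

length = make_local()
try:
    print(inner)  # the local name is gone once the function has returned
except NameError as error:
    message_after_return = str(error)
    print('After return:', message_after_return)

x = [1, 2, 3]
y = x      # second reference to the same list
del x      # removes the name `x`; the list survives through `y`
print(y)
try:
    print(x)
except NameError as error:
    message_after_del = str(error)
    print('After del:', message_after_del)
```

Note that `del` removes a name, not necessarily the object: the list is only freed once `y` is gone as well.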

## Operators

### Basic arithmetics:

In [None]:
7 + 3

In [None]:
7 - 3

In [None]:
7 / 3

In [2]:
7 // 3  # floor division (rounds towards negative infinity)

2

In [None]:
7 % 3  # modulo operator

In [None]:
7 * 3

In [None]:
7 ** 3  # exponentiation

In [1]:
(3*3 + 4*4) / 5  # usual operator precedence

5.0

### Comparisons:

In [None]:
3 == 3

In [None]:
3 != 3

In [None]:
3 <= 3

In [None]:
3 < 3

In [None]:
3 > 2

In [None]:
3 >= 2

### Logical:

In [None]:
True or False

In [None]:
True and False

In [None]:
not False

In [3]:
0 or 'asdf'

'asdf'

### Binary:

In [None]:
2 | 6  # or

In [None]:
2 & 6  # and

In [None]:
~2  # not

In [None]:
3 ^ 1  # xor

### Ternary:

In [None]:
3 if 5 == 5 else 'asdf'

This would be written `(5 == 5) ? 3 : 'asdf'` in C or Java.

### String operations:

In [5]:
x = 'asdf'

In [None]:
x + x

In [None]:
x * 3

Accessing elements and substrings:

In [6]:
x[0]

'a'

In [7]:
x[0:2]

'as'

In [8]:
x[:2]

'as'

In [9]:
x[2:]

'df'

The general syntax is `x[start:stop:step]`. To get every second element:

In [10]:
x[::2]

'ad'

To reverse:

In [12]:
x[::-1]

'fdsa'

If `start` or `stop` is negative, it will be counted from the end:

In [None]:
x[-1]

In [11]:
x[:-2]

'as'

Getting the length:

In [None]:
len(x)

String formatting:

In [None]:
'The string "%s" has %d characters.' % (x, len(x))
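
The `%` operator is only one of several formatting styles. As a sketch, the same sentence can also be built with `str.format` or with an f-string (available from Python 3.6 on); the variable names here are chosen for illustration:

```python
x = 'asdf'

old_style = 'The string "%s" has %d characters.' % (x, len(x))
with_format = 'The string "{}" has {} characters.'.format(x, len(x))
f_string = f'The string "{x}" has {len(x)} characters.'

print(f_string)
print(old_style == with_format == f_string)
```

All three produce the same string, so which one to use is mostly a matter of readability.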

To discover more string operations, enter `x.` and hit the "Tab" key for autocompletion.

## Data structures:

We can collect values of basic types (or any other objects) in the typical data structures you would find in other languages.

### list

This is a dynamic array of Python objects. Types can be mixed freely.

In [13]:
x = [1, 2, 3, 'vier']

Elements can be accessed as in a string:

In [None]:
len(x)

In [None]:
x[:2]

Addition and multiplication are defined as for a string:

In [14]:
x + x

[1, 2, 3, 'vier', 1, 2, 3, 'vier']

In [15]:
x * 3

[1, 2, 3, 'vier', 1, 2, 3, 'vier', 1, 2, 3, 'vier']

Unlike a string, lists can be modified in place:

In [None]:
x.append(None)
x

In [16]:
x[:2] = ['a', 'b']
x

['a', 'b', 3, 'vier']

### tuple

Like a list, but cannot be modified after creating it. Often used to return multiple values at once from a function.

In [17]:
x = (1, 2, 3, 'vier')

In [None]:
x + x

In [18]:
x[0] = 5

TypeError: 'tuple' object does not support item assignment

In many places, the round brackets can be omitted:

In [None]:
x = 1, 2, 3, 'vier'
print(x)

A tuple of one element still needs a comma, though, otherwise it's just a value with superfluous brackets:

In [None]:
(1)

In [None]:
(1,) 

### dict

An associative array, or hash table. Values can be of any type, but keys must be hashable: immutable objects such as numbers, strings and tuples (of hashable items) work, while mutable ones such as lists do not.

In [22]:
x = {1: 'eins', 2: 'zwei'}

In [23]:
x[1]

'eins'

In [24]:
x[3] = 'three'
x

{1: 'eins', 2: 'zwei', 3: 'three'}

In [25]:
del x[1]
x

{2: 'zwei', 3: 'three'}

In [26]:
len(x)

2
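
To illustrate the hashability requirement for keys, here is a small sketch that uses made-up grid coordinates as keys. A tuple works, while a list is rejected:

```python
grid = {(0, 0): 'origin', (0, 1): 'north'}
print(grid[(0, 1)])

try:
    grid[[1, 0]] = 'east'  # a list is mutable and therefore not hashable
except TypeError as error:
    message = str(error)
    print('TypeError:', message)

# .get() returns a fallback instead of raising KeyError for missing keys
print(grid.get((5, 5), 'unknown'))
```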

### set

An unordered collection of unique hashable objects.

In [None]:
x = {1, 2, 'vier'}

In [None]:
2 in x

In [None]:
4 not in x

In [None]:
x.add(3)
x

In [None]:
x.add(1)
x

In [None]:
x & {2, 3, 4, 5}

### Packing/unpacking

Lists and tuples can be packed and unpacked:

In [None]:
x = (2, 3)
a, b = x
print(a, b)

In [None]:
b, a = a, b
print(a, b)

In [None]:
x = [1, 2, 3, 4]
a, b, *c = x
print(a, b, c)

## Control structures

Almost everything you know from other languages also exists in Python.

### Conditions

In [None]:
x = 3
if x > 3:
    y = 'Greater than 3'
    x = x - 2
elif x > 2:
    y = 'Greater than 2'
    x = x - 1
else:
    y = 'Not greater than 2'
print(y)

**Look ma, no brackets!** The end of each block is indicated by the indentation (the amount of white space / blanks before the first character). Convention is to use 4 spaces for each nested block.

Conditions in an `if` clause are implicitly converted with `bool()`. So the following clauses are equivalent:

In [None]:
x = []
if x:
    print(x)
if len(x):
    print(x)
if len(x) != 0:
    print(x)

### Loops

In [None]:
for x in 0, 1, 2:
    print(2**x)

In [None]:
for x in range(3):
    print(2**x)

In [None]:
for x in 'asdf':
    print(x)

In [None]:
x = {1: 'one', 2: 'two', 3: 'three'}
for k, v in x.items():
    print('%d --- %s' % (k, v))

With `x.keys()` or `x` you can iterate over just the keys, and with `x.values()` over just the values.

In [None]:
x = {1, 4, 8}
while x:  # as long as x is not empty
    print(x.pop())  # remove some item

To exit a loop early, use `break`. To skip the rest of a loop body and continue with the next iteration, use `continue`. You can append an `else` clause that is executed if you did not `break`:

In [27]:
x = [2, 4, 8]
for item in x:
    if item % 3 == 0:
        print("Found item divisible by 3!")
        break
else:
    print("Did not find any.")

Did not find any.


### Catching exceptions

Simple try/except/else:

In [None]:
divisor = 0
try:
    y = 12 / divisor
except IOError:
    print("Input/output problem")
except Exception as e:
    print("Oh no: " + e.args[0])
else:
    print("It worked!")

Try/finally:

In [None]:
f = open('Python Intro.ipynb', 'r')
try:
    print(f.readline())
finally:
    f.close()

Some objects have context managers so you do not need to write a suitable try/finally:

In [None]:
with open('Python Intro.ipynb', 'r') as f:
    print(f.readline())

## Comprehensions and Generators

Python allows constructing lists in a very compact way:

In [None]:
x = [2**exponent for exponent in range(5)]
print(x)

This is basically the same as:

In [None]:
x = []
for exponent in range(5):
    x.append(2**exponent)

This also works for dictionaries:

In [None]:
x = {key: 2**key for key in range(5)}
print(x)

If all you want to do with a list comprehension is to iterate over it, you can use a generator instead:

In [None]:
x = (2**exponent for exponent in range(5))

This can be iterated over like a list, but it is evaluated lazily, i.e., the elements are constructed one by one when we ask for them:

In [None]:
for value in x:
    print(value)

This concept is especially useful for building data processing pipelines.
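
A minimal sketch of such a pipeline, chaining generator expressions over some made-up input lines. Nothing is computed until the final `list()` call pulls the elements through all stages:

```python
lines = ['3', '', '4', '10', ' 7 ']

stripped = (line.strip() for line in lines)
non_empty = (line for line in stripped if line)
numbers = (int(line) for line in non_empty)
squares = (n * n for n in numbers)
large = (s for s in squares if s > 10)   # still nothing computed

# Consuming the last generator runs every stage, one element at a time
result = list(large)
print(result)
```

Because each stage only holds one element at a time, the same structure also works for inputs that do not fit into memory, such as the lines of a large file.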

## Functions

**Don’t Repeat Yourself (DRY).**
Every piece of knowledge must have a single, unambiguous, authoritative representation within a system.
This holds for constants, algorithms, etc.

For algorithms, just as in any other language, you can define functions:

In [None]:
def hypotenuse(a, b):
    return (a*a + b*b)**.5

In [None]:
hypotenuse(3, 4)

Arguments can be passed by position or by name:

In [None]:
hypotenuse(b=4, a=3)

Arguments can be defined as optional, by giving them a default value:

In [None]:
def ascii_shift(text, shift=13):
    return ''.join(chr(ord(c) + shift) for c in text)

In [None]:
ascii_shift('abcde')

In [None]:
ascii_shift('abcde', 3)

The common idiom for returning multiple values is to use tuples:

In [None]:
def div_mod(a, b):
    return a // b, a % b

In [None]:
x, y = div_mod(7, 3)
print(x)
print(y)

Note that `return x, y` is the short form of `return (x, y)`, which creates a tuple and returns it. On the receiving side, `x, y =` unpacks the tuple.

(The example was just for illustration; Python has a builtin `divmod` function.)

We can also create unnamed functions using `lambda arg1, arg2: expression`, e.g., to pass them to another function.

In [None]:
def find_first(x, condition):
    for elem in x:
        if condition(elem):
            return elem

In [None]:
find_first(range(10), lambda v: v**2 > 25)

If we want to support an arbitrary number of positional (unnamed) arguments, and an arbitrary number of keyword (named) arguments, we use `*args` and `**kwargs` in the argument list:

In [None]:
def print_everything(*args, prefix='- ', **kwargs):
    for arg in args:
        print(prefix + str(arg))
    for k, v in kwargs.items():
        print('%s%s: %s' % (prefix, k, v))

In [None]:
print_everything(1, 2, 3, conchita='wurst')

When *calling* a function, `*` and `**` can be used to unpack an iterable or dict into function arguments:

In [None]:
x = range(5)
y = {'hello': 'world'}
print_everything(*x, **y)

Functions should begin with a docstring, such as:

In [None]:
def find_first(x, condition, default=None):
    """
    Returns the first element of `x` for which
    the function `condition` returns a true-ish
    value, otherwise returns `default`.
    
    Parameters
    ----------
    x : iterable
        The elements to search in.
    condition : callable
        The condition to evaluate.
    default
        Value to return if no element matches.

    Returns
    -------
    The first element of x that fulfills the
    condition. If no element matches, returns
    the default value.
    """
    for elem in x:
        if condition(elem):
            return elem
    else:
        return default

Here we followed the [numpy docstring conventions](https://numpydoc.readthedocs.io/en/latest/format.html#docstring-standard). This is what will be displayed when pressing "Shift+Tab", or appending a question mark.

## Classes and objects

In Python, **everything is an object**. Even the most basic types. Remember the beginning:

In [None]:
x = 3
type(x)

`int` is actually a class, a subclass of `object`:

In [None]:
isinstance(x, int)

In [None]:
isinstance(x, object)

In [None]:
issubclass(int, object)

And it has attributes and methods we can see by typing `x.` and hitting "Tab".

### References

A variable does not hold a value, it holds an object reference. For immutable types, you don't need to think about this:

In [None]:
x = 3  # ints are immutable
y = x  # y references the same Python object
x += 2  # same as x = x + 2, because int does not implement +=
print(x)
print(y)

For mutable types, in-place operations will be visible through all references to an object:

In [None]:
x = [1]  # lists are mutable
y = x   # y references the same Python object
x += [5]  # modifies that object in place
print(x)
print(y)

We can check that `x` and `y` refer to the same object:

In [None]:
print(id(x))
print(id(y))
print(x is y)

Plain assignment (of the form `x = `, not `x += `, or `x[0] = `) evaluates the expression on the right and binds a name to the result, which may be a new object.

In [None]:
x = [1]
y = x
x = x + [5]  # creates a new object "x + [5]"
print(x, id(x))
print(y, id(y))
print(x is y)

When passing arguments to a function, only the object reference is passed, no copy of the object is made. So the same rules apply.
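
A short sketch of what this means in practice, using two made-up helper functions. An in-place operation inside a function is visible to the caller, while a plain assignment only rebinds the local name:

```python
def append_in_place(items):
    items += [5]          # modifies the caller's list object

def append_to_copy(items):
    items = items + [5]   # binds the local name to a new list
    return items

x = [1]
append_in_place(x)
print(x)

y = [1]
z = append_to_copy(y)
print(y, z)
```

Here `x` ends up with two elements, while `y` is left untouched and only `z` contains the appended value.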

### Defining classes

In [None]:
class Animal(object):
    """
    An animal with a given `name` and `sound`.
    """
    def __init__(self, name, sound='...'):
        self.name = name
        self.sound = sound

    def talk(self):
        print(self.name + ': ' + self.sound)
        
    def shout(self):
        print(self.name + ': ' + self.sound.upper())

Inheritance:

In [None]:
class Duck(Animal):
    """
    A duck with a given `name`.
    """
    def __init__(self, name):
        super().__init__(name, 'quack')

class Dog(Animal):
    """
    A dog with a given `name`.
    """
    def __init__(self, name):
        super().__init__(name, 'woof')

### Using objects

Instantiation:

In [None]:
animals = [Animal('alf', 'grunt'), Duck('daisy'),
           Duck('dagobert'), Dog('fiffy')]

Accessing attributes:

In [None]:
animals[-1].name

Calling methods:

In [None]:
for animal in animals:
    animal.shout()

In [None]:
[isinstance(animal, Duck) for animal in animals]

In [None]:
[isinstance(animal, Animal) for animal in animals]

### Operator overloading

All operators in Python correspond to a special method being called on the underlying objects, the ["underscore methods"](https://docs.python.org/3/reference/datamodel.html#emulating-callable-objects). For example, for our animals to support addition, we would implement the `__add__` method.

We can even tack on a method implementation in hindsight. Sorry for the hack, I did not want to copy all the class definitions again.

In [None]:
Animal.__add__ = lambda self, other: Animal(self.name + other.name,
                                            self.sound + other.sound)

In [None]:
(animals[0] + animals[-1]).talk()
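
The more usual way is to define such methods directly in the class body. As a sketch, here is a made-up `Vector2` class (not part of the animal example) that supports `+`, `==` and a readable printout:

```python
class Vector2:
    """A hypothetical 2D vector, only for illustration."""
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __add__(self, other):      # called for `self + other`
        return Vector2(self.x + other.x, self.y + other.y)

    def __eq__(self, other):       # called for `self == other`
        return (self.x, self.y) == (other.x, other.y)

    def __repr__(self):            # used by print() and the console
        return 'Vector2(%r, %r)' % (self.x, self.y)

v = Vector2(1, 2) + Vector2(3, 4)
print(v)
print(v == Vector2(4, 6))
```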

## Modules

Python comes with a lot of builtin types and functions, but others are defined in libraries that need to be imported. These are called modules in Python. In their most basic form, modules are Python scripts with function definitions and class definitions. They can be loaded with the `import` statement.

In [None]:
import os

Everything in Python is an object, and so are modules. Their attributes are the functions and classes (and submodules) defined within.

In [None]:
os

In [None]:
os.listdir('.')

You can also import some attribute directly:

In [None]:
from os import listdir
listdir('.')

Or change the name on importing it:

In [None]:
from os import listdir as ls
ls('.')

Use this sparingly, as it makes your code harder to follow for others. If you want to fully confuse readers of your code, you can also import everything in a module into your global namespace with `from os import *`. (This is a no-go.)

### Defining your own module

Even for small projects, it often makes sense to divide your code into multiple files. If you place them in the same directory as your main script or notebook, they can be imported by their name. For example, we put the three classes from above in a file `animals.py`. Now we can do:

In [None]:
import animals

In [None]:
dog = animals.Dog('rex')

In [None]:
dog.talk()

# The Python Standard Library

Python comes with a large collection of modules. Here we will just briefly mention the ones you will most probably need. For details on their use, check their respective documentation at https://docs.python.org/3/library/.

## File browsing

The modern take on browsing directories and working with file names is [`pathlib`](https://docs.python.org/3/library/pathlib.html):

In [None]:
import pathlib
p = pathlib.Path('.')

In [None]:
p.is_dir()

In [None]:
for item in p.iterdir():
    print(item.name + ' - ' + item.suffix)

In [None]:
list(p.glob('**/*.py'))

You will also find [`os`](https://docs.python.org/3/library/os.html) and [`glob`](https://docs.python.org/3/library/glob.html) being used.

## File reading/writing

Input/output is handled by the [`io`](https://docs.python.org/3/library/io.html) module. It supports text files (reading and writing `str` with specified encoding, can be line-based) and binary files (reading and writing `bytes`).

In [None]:
import io
with io.open('animals.py', 'r') as f:
    for line in f:
        if 2 < len(line) < 10:
            print(line.rstrip())
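
The difference between text and binary mode can be sketched as follows. The example writes to a temporary directory so it does not touch any existing files; the file name `example.txt` is arbitrary:

```python
import io
import os
import tempfile

folder = tempfile.mkdtemp()  # scratch directory for this example
path = os.path.join(folder, 'example.txt')

with io.open(path, 'w', encoding='utf-8') as f:   # text mode: write str
    f.write('Müller')

with io.open(path, 'r', encoding='utf-8') as f:   # text mode: read str
    text = f.read()

with io.open(path, 'rb') as f:                    # binary mode: read bytes
    raw = f.read()

print(type(text), len(text))
print(type(raw), len(raw))
```

The text has 6 characters, but the file holds 7 bytes, because UTF-8 encodes the `ü` as two bytes.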

## Serializing/deserializing Python objects

Many Python objects support the "pickle" protocol to be serialized into a sequence of bytes:

In [None]:
import pickle
with io.open('dog.pkl', 'wb') as f:
    pickle.dump(dog, f)

In [None]:
with io.open('dog.pkl', 'rb') as f:
    x = pickle.load(f)
x.shout()

## Command line arguments

When writing a Python script, you may want to pass arguments to it on the command line. The most basic way to access them is via the [`sys`](https://docs.python.org/3/library/sys.html) module:

In [None]:
import sys
print('We got %d arguments:' % len(sys.argv))
for arg in sys.argv:
    print(arg)

You will want to use the [`argparse`](https://docs.python.org/3/library/argparse.html) module instead, though. See `script.py` for the template I use.
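
For orientation, a minimal `argparse` sketch could look as follows. This is not the course's `script.py`; the argument names (`infile`, `--epochs`, `--verbose`) are invented for illustration:

```python
import argparse

def build_parser():
    parser = argparse.ArgumentParser(description='Process a data file.')
    parser.add_argument('infile', help='input file to read')
    parser.add_argument('--epochs', type=int, default=10,
                        help='number of passes over the data (default: 10)')
    parser.add_argument('--verbose', action='store_true',
                        help='print progress information')
    return parser

# In a script, parse_args() without arguments reads sys.argv.
# Here we pass a list explicitly so the example runs anywhere.
args = build_parser().parse_args(['data.csv', '--epochs', '3'])
print(args.infile, args.epochs, args.verbose)
```

Compared to reading `sys.argv` by hand, this gives type conversion, default values and a `--help` text for free.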

# The Scientific Ecosystem

There are three open-source modules not built into Python that are often referred to as the "scientific software stack":

1. [numpy](http://www.numpy.org/), for all things vectors and matrices
2. [scipy](http://www.scipy.org/), for more advanced statistics and optimization
3. [matplotlib](http://www.matplotlib.org/), for data visualization and plots

## numpy

Numpy provides a class for representing vectors, matrices and higher-order tensors, and functions for performing linear algebra operations.

### Why linear algebra?

Many algorithms and models can be represented using matrix operations:

* Probabilistic Models (factor analysis, bayesian networks, ...)
* Neural Networks
* Signal / Image Processing
* ...

Matrix operations are easily vectorisable and thus scale well. Representing problems using matrices might be unintuitive at the beginning, but once you get the hang of it, you can apply your skills to various problems.

### Baby steps

The convention is to import numpy under the shorthand name `np`:

In [None]:
import numpy as np

All tensors use the `ndarray` class:

In [None]:
x = np.array([1, 2, 3])
x

In [None]:
x + 2

### Why numpy?

We *could* just use Python lists of (lists of ...) numbers to represent tensors. But this would be slow:

* every single number would be an object, taking more space than needed
* arithmetic operations on these objects incur quite some overhead
* iteration in Python also incurs quite some overhead

Numpy stores tensors like arrays in C, and uses highly-optimized BLAS libraries for tensor operations. Anything performance-critical (especially loops!) is implemented in C rather than Python. So it is as fast as C, with the convenience of Python.

Let's compare performance:

In [None]:
x = range(10000)
%timeit [v + 2 for v in x]

In [None]:
x = np.arange(10000)
%timeit x + 2

Of course this requires us to express our algorithms without resorting to `for` loops. Looping over a numpy array in Python is at least as slow as looping over a list:

In [None]:
x = np.arange(10000)
%timeit np.array([v + 2 for v in x])  # bad idea

### Vectors, matrices, and other tensors

Let's start with a simple vector again:

In [None]:
x = np.array([1, 2, 3])

Every numpy array has a dimensionality, shape and data type:

In [None]:
x.ndim

In [None]:
x.shape

In [None]:
x.dtype

Indexing works as for lists and strings:

In [None]:
x[0]

In [None]:
x[::-1]

A matrix can be constructed from a list of rows:

In [None]:
x = np.array([[1, 2, 1], [2, 3, 2]])
x

In [None]:
print(x.ndim, x.shape, x.dtype)

Indexing can now take a specification for each dimension:

In [None]:
x[0, 0]

First row:

In [None]:
x[0]

First column (`:` is shorthand for "everything"):

In [None]:
x[:, 0]

Transposing a matrix:

In [None]:
x.T

We can also create tensors just by giving their shape:

In [None]:
np.zeros((1, 2))

In [None]:
np.ones((2, 2, 2))

In [None]:
np.random.rand(3, 2)

Transposing a higher-order tensor:

In [None]:
x = np.zeros((3, 4, 5))
x = x.transpose(0, 2, 1)  # new order of the three dimensions
x.shape

### Data types

You may have noted that sometimes, numbers end with a dot (`.`) -- this denotes floating point numbers.

In [None]:
np.array([1, 1])

In [None]:
np.array([1, 1], dtype=np.float32)

The data type can be specified in any array-constructing function. Important data types include:

* floating point: np.float32, np.float64
* integers: np.int64, np.uint8
* boolean: np.bool_

Conversions are done using `.astype` (which creates a copy even if the type stays the same) or `asarray` (which only copies if needed):

In [None]:
x = np.zeros(3, dtype=np.bool_)
x

In [None]:
x.astype(np.double)

In [None]:
np.asarray(x, dtype=np.int64)

### Shapes

Shapes can be changed if the total number of elements stays the same.

In [None]:
x = np.arange(6).reshape(2, 3)
x

In [None]:
x.ravel()

In [None]:
x.reshape(3, 2)

In [None]:
x.reshape(-1, 2, 1)  # -1 is inferred to match

In [None]:
x[:, np.newaxis, :].shape  # insert a dimension

### Views

Slicing, reshaping and transposition create *views* into the same underlying memory, not copies of the data. Changing data through a view affects the original array:

In [None]:
x = np.zeros(5)
y = x[::2]  # a view of every second element
y += 3
print(x)
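
If in doubt whether two arrays share memory, `np.shares_memory` can tell, and `.copy()` gives an independent array. A small sketch with arbitrary variable names:

```python
import numpy as np

x = np.arange(6)
view = x[::2]          # slicing gives a view
copy = x[::2].copy()   # .copy() allocates independent memory

print(np.shares_memory(x, view))
print(np.shares_memory(x, copy))

copy += 100            # does not touch x
view += 10             # changes x[0], x[2] and x[4]
print(x)
```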

### Fancy indexing

We can also select elements from an array using a boolean mask, or an array of indices. This is called *fancy indexing*. It creates a copy, not a view:

In [None]:
x = np.arange(10)
y = x[x > 5]
print(y)

In [None]:
y *= 0
print(x)

However, using slice assignment, we can modify elements selected with fancy indexing:

In [None]:
x[x > 5] = 1
print(x)

(If it helps: This statement calls [`x.__setitem__`](https://docs.python.org/3/reference/datamodel.html#object.__setitem__), it does not construct `x[x > 5]` and then assign to it.)

### Pointwise operations and broadcasting

Most operations on arrays are applied elementwise:

In [None]:
x = np.arange(6).reshape(2, 3)

In [None]:
x * 5

They work as long as the two operands have compatible shapes. That is, either the same shape:

In [None]:
x + np.ones((2, 3))

Or shapes that differ only in dimensions where one of the operands has size `1`:

In [None]:
x + np.arange(3).reshape(1, 3)

In [None]:
x + np.arange(2).reshape(2, 1)

Dimensions of size 1 ("singleton dimensions") are implicitly replicated as needed. This is called broadcasting. If the operands do not have the same number of dimensions, dimensions of size 1 are prepended on the left of the smaller shape. (Same as using `bsxfun` in Matlab/Octave, if this helps anybody.)

In [None]:
x

In [None]:
y = np.array([10, 20, 30])

In [None]:
x + y  # broadcasting: add y to every row

Since missing dimensions are always added on the left, a one-dimensional array is always interpreted as a row vector. To get a column vector, we need to make it two-dimensional.

In [None]:
y = np.array([10, 20])
y[:, np.newaxis]  # view of y with a new dimension at the end

In [None]:
x + y[:, np.newaxis]  # add column vector to every column

We can also add a row vector and a column vector to get a matrix:

In [None]:
x = np.arange(3)
y = np.arange(2)
x + y[:, np.newaxis]
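
To make the broadcasting rule concrete, here is a sketch that computes the resulting shape by hand and compares it against what numpy does. The helper `broadcast_shape` is written for this illustration only and is not a numpy function:

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Shape resulting from combining two arrays of the given shapes."""
    ndim = max(len(shape_a), len(shape_b))
    # Pad the shorter shape with ones on the left
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)
    result = []
    for size_a, size_b in zip(a, b):
        if size_a != size_b and 1 not in (size_a, size_b):
            raise ValueError('incompatible shapes %r and %r'
                             % (shape_a, shape_b))
        # A singleton dimension is replicated to match the other operand
        result.append(size_b if size_a == 1 else size_a)
    return tuple(result)

print(broadcast_shape((2, 3), (3,)))
print(broadcast_shape((3,), (2, 1)))
print((np.zeros((2, 3)) + np.zeros(3)).shape)
```

Shapes such as `(2, 3)` and `(2,)` are rejected, because `(2,)` is padded to `(1, 2)` and `3` does not match `2`.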

### Matrix multiplication

We saw that `*` just does an elementwise multiplication. For a dot product, we use `np.dot` or the `@` operator:

In [None]:
x = np.ones((2, 3))
y = np.ones((4, 3))
x @ y.T
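
As a sketch of what the matrix product computes, the following compares `@` and `np.dot` with an explicit double loop over rows. The loop version is for illustration only and would be far too slow for real data:

```python
import numpy as np

x = np.arange(6).reshape(2, 3)
y = np.arange(12).reshape(4, 3)

product = x @ y.T        # shape (2, 4)
same = np.dot(x, y.T)    # identical for two-dimensional arrays

# Entry (i, j) is the dot product of row i of x and row j of y
manual = np.zeros((2, 4), dtype=x.dtype)
for i in range(2):
    for j in range(4):
        manual[i, j] = (x[i] * y[j]).sum()

print(product.shape)
print(np.array_equal(product, same), np.array_equal(product, manual))
```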

### Reduction

Often we need to compute multiple sums or other statistics. This is called a *reduction* operation, and can be applied to a subset of dimensions:

In [None]:
x = np.random.randn(10, 20, 30)
y = x.mean(axis=0)
y.shape

Negative axes are counted from the end, as in array indexing:

In [None]:
y = x.mean(axis=-1)
y.shape

We can keep the dimension we reduce over as a singleton dimension:

In [None]:
y = x.mean(axis=1, keepdims=True)
y.shape

This is especially useful when combined with broadcasting.

In [None]:
x -= x.mean(axis=(1, 2), keepdims=True)  # subtract mean over last two dims
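
A typical use in machine learning is standardizing features. The following sketch assumes a made-up data matrix with one example per row and one feature per column:

```python
import numpy as np

rng = np.random.RandomState(0)       # fixed seed for reproducibility
data = rng.randn(100, 5) * 3 + 7     # 100 examples, 5 features

mean = data.mean(axis=0, keepdims=True)   # shape (1, 5)
std = data.std(axis=0, keepdims=True)     # shape (1, 5)
standardized = (data - mean) / std        # broadcast over the 100 rows

print(mean.shape, standardized.shape)
print(np.allclose(standardized.mean(axis=0), 0))
print(np.allclose(standardized.std(axis=0), 1))
```

Thanks to `keepdims=True`, the statistics keep a leading singleton dimension and line up with the rows of `data` without any reshaping.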

## Matplotlib

The most widespread module for plotting data and creating figures for papers in Python is `matplotlib`. It nicely integrates with notebooks when entering the `%matplotlib inline` magic function. Let's do so and import it:

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt

`pyplot` is an easy interface for `matplotlib`, and imported as `plt` by convention. It relies on `numpy` for the data representation.

### Line graphs and legends:

In [None]:
x = np.linspace(0, 10, 100)
y = np.cos(x)
plt.plot(x, y, label='low')
plt.plot(x, 2 * y, color='r', linestyle=':', label='high')
plt.legend(loc='upper right')

### Bar charts:

In [None]:
data = {'apples': 2, 'pasta': 0, 'fennel': 30}
xpos = [0, 1, 2]
plt.bar(x=xpos, height=list(data.values()))  # list() turns the dict view into a plain sequence
plt.xticks(xpos, list(data.keys()))
plt.suptitle('remaining stock')

### Matrices and color bars:

In [None]:
x = np.random.randn(40, 50)
x += np.arange(50)
plt.matshow(x)
plt.colorbar()

The matplotlib tutorial includes a [guide on choosing color maps](https://matplotlib.org/tutorials/colors/colormaps.html) for a particular purpose.

### Multiple plots:

In [None]:
fig, axs = plt.subplots(1, 2, sharey=True, figsize=(8, 3))
axs[0].plot(20 - np.random.rand(20) - np.arange(20))
axs[1].plot(20 - np.random.rand(20) - np.arange(20))
axs[0].set_title('orange dataset')
axs[1].set_title('apple dataset')

### Saving figures

In [None]:
plt.plot(np.random.rand(20))
plt.suptitle('randomness')
plt.savefig('myplot.pdf', transparent=True, bbox_inches='tight')

### Outside notebooks

If you start the interactive console with `ipython --matplotlib`, you will get *interactive plotting*: The first plotting command opens a figure in a separate window, and all other plotting commands update it.

In a Python script, call `plt.show()` whenever you want to display the figure(s) you created. Or just call `plt.savefig()` and do not show anything on screen.
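
A sketch of the second variant, as it might appear in a script that only writes a figure to disk. Selecting the non-interactive `Agg` backend is optional, but it avoids needing a display, e.g. on a remote server; the output location is a temporary directory chosen for this example:

```python
import os
import tempfile

import matplotlib
matplotlib.use('Agg')  # non-interactive backend: render to files only
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)
fig, ax = plt.subplots(figsize=(4, 3))
ax.plot(x, np.cos(x), label='cos')
ax.legend(loc='upper right')

outfile = os.path.join(tempfile.mkdtemp(), 'myplot.png')
fig.savefig(outfile, bbox_inches='tight')
plt.close(fig)  # release the figure's memory
print('saved to', outfile)
```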

### More plots

These were just the basics -- see the [matplotlib gallery](https://matplotlib.org/gallery/index.html) or the [matplotlib reference](https://matplotlib.org/api/pyplot_summary.html) for more plotting goodness.