# Lecture 1

In this lecture we start exploring the Python language,
covering the following topics:
1. Basic syntax
2. Built-in data types
3. NumPy arrays

## Basic syntax

- You don't need to add a character to terminate a line, unlike in some languages.
- Everything after a # character (until the end of the line) is a comment and will be ignored.
- Whitespace characters matter (unlike in most languages)!
    Python uses indentation (usually 4 spaces) to group statements,
    for example loop bodies, functions, etc.
- You can use the `print()` function to inspect almost any object.
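As a minimal sketch of how indentation groups statements, the loop below uses the hypothetical names `total` and `k`, which are chosen only for illustration. The indented lines form the loop body, and the first unindented line ends it.

```python
# Sum the integers 0, 1, 2 using a loop
total = 0
for k in range(3):       # k takes the values 0, 1, 2
    total = total + k    # indented: part of the loop body
    print(k)             # indented: also part of the loop body
print(total)             # not indented: runs once, after the loop
```

Removing the indentation of the lines in the loop body would raise an `IndentationError`.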

In [103]:
# First example

# create a variable named 'text' that stores the string 'Hello, world!'
text = 'Hello, world!'

# print contents of 'text'
print(text)

Hello, world!


In IPython notebooks and command-line Python interpreters,
we can also display a value by simply writing the variable name.

In [104]:
text

'Hello, world!'

This does not do anything in *proper* Python script files
that are run through the interpreter, though.

## Built-in data types

Python is a dynamically-typed language:
- Unlike in C or Fortran, you don't need to declare a variable or its type.
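The following sketch illustrates dynamic typing using the built-in `type()` function. The names `value`, `kind_before` and `kind_after` are hypothetical and used only for this illustration.

```python
value = 1                    # 'value' refers to an integer
kind_before = type(value)

value = 'one'                # the same name now refers to a string
kind_after = type(value)

print(kind_before, kind_after)
```

No declaration is needed in either case: the type belongs to the object, not to the name.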

We now look at the most useful built-in data types:
- integers
- floating-point numbers (floats)
- strings
- tuples
- lists
- dictionaries

### Integers and floats

In [105]:
# Integer variables
i = 1

# Floating-point variables
x = 1.0

# A name can reference any data type:
# Previously, x was a float, now it's an integer!
x = 1

# It is good programming practice to specify floating-point
# literals using a decimal point. It makes a difference in
# a few cases (especially when using Numba or Cython).
x = 1.0 # instead of x = 1

### Strings

The string data type stores sequences of characters.

In [106]:
# Strings need to be surrounded by single (') or double (") quotes!
institution = 'University of Glasgow'
institution = "University of Glasgow"
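As a small sketch, the example below checks that both kinds of quotes create the same string, and shows one reason for having two kinds of quotes. The names `single`, `double`, `same` and `phrase` are chosen for illustration only.

```python
single = 'University of Glasgow'
double = "University of Glasgow"

# Both literals create the same string
same = (single == double)

# One kind of quote can appear inside a string delimited by the other kind
phrase = "It's a string"

print(same, phrase)
```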


### Tuples

Tuples represent a collection of several items which can
have different data types. They are created using parentheses.

In [107]:
# A tuple containing a string, an integer and a float
items = ('foo', 1, 1.0)

Use [] to access an element in a tuple (or any other
list-like object):

In [108]:
first = items[0] # contains 'foo'
print(first)

foo


Python indices are 0-based, so 0 references the first element,
1 the second element, etc.

In [109]:
second = items[1] # contains second element


Tuples are immutable, which means that the references
stored in the tuple cannot be changed!

In [110]:
# This raises an error!
items[0] = 123

TypeError: 'tuple' object does not support item assignment
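A minimal sketch of how to deal with immutability: the failed assignment is caught with `try`/`except`, and a modified *new* tuple is built instead. The names `message` and `new_items` are hypothetical.

```python
items = ('foo', 1, 1.0)

message = None
try:
    items[0] = 123                 # not allowed for tuples
except TypeError as err:
    message = str(err)
    print('Assignment failed:', message)

# Instead of modifying the tuple, create a new one:
# a one-element tuple (123,) joined with all items after the first
new_items = (123,) + items[1:]
print(new_items)
```

The original tuple `items` is left untouched.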

### Lists

Lists are like tuples, except that they can be modified.
We create lists using brackets [].

In [None]:
lst = ['foo', 1, 1.0]

Accessing list items works the same as with tuples:

In [None]:
print(lst[0]) # print first item

List items can be modified:

In [None]:
lst[0] = 'bar'
lst

Lists are full-fledged objects that support various operations,
for example

In [None]:
lst.insert(0, 'abc') # insert element at position 0
lst.append(2.0) # append element at the end
del lst[3] # delete the 4th element
lst

The built-in function `len()` returns the number
of elements in a list (and any other collection object):

In [None]:
n = len(lst)
print(n)
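To make the effect of each list operation explicit, the following sketch repeats the steps above in a single block, with the state of the list after each step given in the comments.

```python
lst = ['foo', 1, 1.0]

lst[0] = 'bar'          # ['bar', 1, 1.0]
lst.insert(0, 'abc')    # ['abc', 'bar', 1, 1.0]
lst.append(2.0)         # ['abc', 'bar', 1, 1.0, 2.0]
del lst[3]              # removes 1.0 -> ['abc', 'bar', 1, 2.0]

n = len(lst)
print(lst, n)
```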

### Dictionaries
Dictionaries are container objects that map keys to values.
- Values can be any Python object. Keys can be any hashable object
  (for example strings or numbers), even though usually, we use strings as keys.
- Dictionaries are created using curly braces {}.
- Since Python 3.7, dictionaries preserve the order in which elements
  were inserted, but elements are retrieved by key, not by position.

In [None]:
dct = {'institution': 'University of Glasgow',
       'course': 'Python bootcamp'}
dct

Specific values are accessed using the syntax `dict[key]`:

In [None]:
value = dct['institution']
value
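A few other common dictionary operations are sketched below. The key `'lecture'` and the missing key `'room'` are made up for this illustration.

```python
dct = {'institution': 'University of Glasgow',
       'course': 'Python bootcamp'}

dct['lecture'] = 1                # add a new key-value pair
has_course = 'course' in dct      # test whether a key is present
missing = dct.get('room')         # get() returns None instead of raising KeyError
keys = list(dct)                  # keys, in insertion order (Python 3.7+)

print(has_course, missing, keys)
```

Accessing a missing key with `dct['room']` would raise a `KeyError`, which is why `get()` can be convenient.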

## NumPy arrays

NumPy is a library that allows us to efficiently store and access
(mainly) numerical data and apply numerical operations
similar to those available in Matlab.
- NumPy is not part of the core Python project
- Python itself has an array type, but there is really no
  reason to use it. Use NumPy!
- NumPy types and functions are not built-in, so we must
  first import them to make them visible.
  We do this using the `import` statement.

In [None]:
# The convention is to make NumPy functionality available
# using the 'np' namespace
import numpy as np

### Creating arrays
NumPy offers a multitude of functions to create arrays.

In [None]:
# Create a 1-dimensional array with 10 elements, initialise values to 0
# We need to prefix the NumPy function zeros() with 'np'
arr = np.zeros(10)
arr

In [None]:
arr1 = np.ones(5) # vector of five ones
arr1

We can also create sequences of integers

In [None]:
arr2 = np.arange(5) # vector [0,1,2,3,4]
arr2

We can explicitly specify initial values and increments.
The end value is NOT included.

In [None]:
start = 2
end = 10
step = 2
arr3 = np.arange(start, end, step)
arr3

We can create arrays without initialisation. This
only allocates a chunk of memory, which most likely
contains arbitrary garbage.

In [None]:
arr4 = np.empty(3)
arr4 # values are unpredictable

As in Matlab, there is a `linspace()` function that
creates a vector of uniformly-spaced real values.

In [None]:
# Create 11 elements, equally spaced on the interval [0.0, 1.0]
arr5 = np.linspace(0.0, 1.0, 11)
arr5
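The difference between `arange()` and `linspace()` can be sketched as follows: `arange()` takes a step size and excludes the end value, whereas `linspace()` takes the number of points and includes both end points. The names `a` and `b` are chosen for illustration.

```python
import numpy as np

# start=2, end=10 (excluded), step=2
a = np.arange(2, 10, 2)
print(a)                 # [2 4 6 8]

# 11 points on [0.0, 1.0], both end points included
b = np.linspace(0.0, 1.0, 11)
print(b[0], b[-1], len(b))
```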

We can create arrays of higher dimension by specifying
the desired shape

In [None]:
mat = np.ones((2,2)) # Create 2x2 matrix of ones
mat

#### Creating arrays from other Python objects
You can create arrays from other objects such as lists
and tuples by calling `array()`

In [None]:
# Create array from list [1,2,3]
arr = np.array([1,2,3])
arr

In [None]:
# Create array from tuple
arr = np.array((1.0,2.0,3.0))
arr

In [None]:
# Create two-dimensional array from nested list
arr = np.array([[1,2,3],[4,5,6]])
arr

### Reshaping arrays
The `reshape()` method of an array object can be used
to reshape it to some other (conformable) shape.

In [None]:
mat = np.arange(4).reshape((2,2))
mat

In [None]:
# reshape back to vector of 4 elements
vec = mat.reshape(4)
vec

We can use -1 to let NumPy automatically compute
the size of one remaining dimension.

In [None]:
# with 2 dimensions, second dimension must have size 2
mat = np.arange(4).reshape((2,-1))
mat

If you just want to convert an arbitrary array to a vector,
use the `flatten()` method.

In [None]:
mat.flatten()

**Important**: the reshaped array must have the same
number of elements!

In [None]:
mat = np.arange(6).reshape((2,3))
mat.reshape((2,2)) # Raises ValueError: cannot reshape 6 into 4 elements!
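As a hedged sketch of both rules, the example below lets NumPy infer one dimension using -1 and then catches the `ValueError` raised by a non-conformable shape. The names `shape` and `failed` are hypothetical.

```python
import numpy as np

vec = np.arange(6)

# 6 elements, 2 rows: NumPy infers that the second dimension has size 3
mat = vec.reshape((2, -1))
shape = mat.shape
print(shape)

# 6 elements do not fit into a 2x2 array
failed = False
try:
    vec.reshape((2, 2))
except ValueError as err:
    failed = True
    print('Reshape failed:', err)
```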


### Indexing

#### Scalar indices

To retrieve a single element, specify one scalar index
for each axis
(*axis* is the NumPy terminology for an array dimension).

In [None]:
mat = np.arange(6).reshape((3,2))
mat

In [None]:
mat[1,1] # returns element in row 2, column 2

#### Range indices

There are numerous ways to retrieve a subset of
elements from an array. The most common way is to
specify a triplet of values
`start:end:step` for some axis:

In [None]:
# Create a 2x3 matrix
mat = np.arange(6).reshape((2,3))
mat

In [None]:
# Retrieve only the first and third columns:
mat[0:2,0:3:2]

Indexing in NumPy can get quite intricate.
Some basic rules:
- All elements of `start:end:step` are optional. For a positive `step`,
  `start` defaults to 0, `end` to the length of the axis, and `step` to 1.
  We could therefore write `::` to include all indices,
  which is the same as `:`
- The end value is NOT included. Writing
  `vec[0:n]` does not include the *n*-th element!
- Any of the elements of `start:end:step` can be negative.
    - If `start` or `end` are negative, elements are counted
        backwards:
        `vec[:-1]` retrieves the vector except for the last element.
         (the end index is never included!)
    - If `step` is negative, the order of elements is inverted.

In [None]:
vec = np.arange(5)
# These are equivalent ways to return the whole vector
vec[0:5]
vec[::]
vec[:]
vec[-5:]

You can reverse the order like this

In [None]:
vec[::-1]
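The slicing rules above can be illustrated with a few examples on a small vector. The variable names on the left-hand side are chosen for illustration only, and the results are given in the comments.

```python
import numpy as np

vec = np.arange(5)           # [0 1 2 3 4]

middle = vec[1:4]            # [1 2 3]: end index 4 is not included
head = vec[:-1]              # [0 1 2 3]: everything except the last element
last_two = vec[-2:]          # [3 4]: negative start counts from the end
every_other = vec[::2]       # [0 2 4]: step of 2
reversed_vec = vec[::-1]     # [4 3 2 1 0]: negative step inverts the order

print(middle, head, last_two, every_other, reversed_vec)
```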

With multi-dimensional arrays, these rules apply for each
dimension.
- We can, however, omit explicit indices
    for trailing dimensions if all elements should
    be included.

In [None]:
mat[1] # includes all columns of row 2; same as mat[1,:]

We cannot omit the indices for preceding axes, though!

In [None]:
mat[:, 1] # includes all rows of column 2

#### Logical indices
Alternatively, we can pass logical arrays as indices.
- Logical (or boolean) arrays consist of elements that
  can only take on values `True` and `False`

In [None]:
vec = np.arange(5)
mask = (vec > 1)
mask

In [None]:
vec[mask] # use mask to retrieve only elements greater than 1

We can even apply boolean indexing to multi-dimensional
arrays. The result will be flattened to a 1-dimensional array,
though.

In [None]:
mat = np.arange(6).reshape((2,3))
mat[mat > 1]
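For reference, the sketch below repeats the logical indexing examples with the resulting values spelled out in the comments. The names `picked` and `flat` are hypothetical.

```python
import numpy as np

vec = np.arange(5)                    # [0 1 2 3 4]
mask = (vec > 1)                      # [False False  True  True  True]
picked = vec[mask]                    # [2 3 4]

mat = np.arange(6).reshape((2, 3))    # [[0 1 2], [3 4 5]]
flat = mat[mat > 1]                   # [2 3 4 5], a 1-dimensional array

print(mask, picked, flat)
```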

Some of these indexing rules also apply to the built-in
`tuple` and `list` types, but NumPy indexing goes far beyond
that.

In [None]:
# Apply start:end:step indexing to tuple
tpl = (1,2,3)
tpl[:3:2]

Logical indexing does not work with `tuple` and `list`:

In [None]:
tpl[tpl > 1] # This raises a TypeError!
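One possible way to get a similar result with built-in types is to filter the elements explicitly, as sketched below with a generator expression. This is one option among several, and the names `failed` and `filtered` are hypothetical.

```python
tpl = (1, 2, 3)

# Comparing a tuple to an integer is not supported
failed = False
try:
    tpl[tpl > 1]
except TypeError as err:
    failed = True
    print('Logical indexing failed:', err)

# Keep only the elements greater than 1
filtered = tuple(v for v in tpl if v > 1)
print(filtered)
```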

### Other array methods
You can transpose an array using `arr.T`:

In [None]:
mat = np.arange(6).reshape((2,3))
mat

In [None]:
mat.T
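As a brief sketch of what transposition does, the example below checks that the shape is swapped and that element `[i, j]` of the transpose equals element `[j, i]` of the original array. The name `t` is chosen for illustration.

```python
import numpy as np

mat = np.arange(6).reshape((2, 3))   # shape (2, 3)
t = mat.T                            # shape (3, 2)

print(mat.shape, t.shape)
print(t[2, 0] == mat[0, 2])          # same element, indices swapped
```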

### Numerical data types (advanced)

You can explicitly specify the numerical data type when
creating NumPy arrays.

So far we haven't done so, in which case NumPy does the following:
- Functions such as `zeros()`, `ones()` and `empty()`
    default to using `np.float64`, i.e. a 64-bit floating-point
    data type (this is also called *double precision*)
- Other constructors such as `arange()` and `array()`
    inspect the input data and return a corresponding array.
- Most array creation routines accept a `dtype` argument
  which allows you to explicitly set the data type.

For example:

In [None]:
# Floating-point arguments return array of type np.float64
arr = np.arange(1.0, 5.0, 1.0)
arr.dtype

In [None]:
# Integer arguments return an integer array (np.int64 on most platforms)
arr = np.arange(1,5,1)
arr.dtype
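The `dtype` argument mentioned above can be sketched as follows for a few creation routines. The variable names are chosen for illustration only.

```python
import numpy as np

z = np.zeros(3)                              # default: np.float64
zi = np.zeros(3, dtype=np.int64)             # explicitly request integers
a = np.array([1, 2, 3], dtype=np.float64)    # integer input, float array
r = np.arange(5, dtype=np.float32)           # single-precision floats

print(z.dtype, zi.dtype, a.dtype, r.dtype)
```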

Often we don't care about the data type too much, but keep in
mind that
- Floating-point numbers have limited precision: a 64-bit float cannot
  exactly represent every integer larger than $2^{53}$ (approximately $10^{16}$)
- Integer values cannot represent fractional numbers and
  (often) have a more limited range.

This might lead to surprising consequences:

In [None]:
# Create integer array
arr = np.ones(5, dtype=np.int64)
# Add floating-point value 0.234 to second element in place
arr[1] += 0.234
arr

The array is unchanged because it's impossible to represent
1.234 as an integer value: the fractional part is discarded
when the result is stored in the integer array!

The take-away is to always explicitly write floating-point
literal values when you want data to be interpreted as
floating-point values, e.g. always write 1.0 instead of 1,
unless you really want an integer!
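Both limitations can be demonstrated with a short sketch. The names `big`, `same`, `iarr` and `farr` are hypothetical. The value $2^{53}$ is used because it is the point beyond which a 64-bit float can no longer represent every integer.

```python
import numpy as np

# Limited floating-point precision: adding 1.0 has no effect here
big = 2.0**53
same = (big + 1.0 == big)
print(same)

# Integer array: the fractional part is discarded on assignment
iarr = np.ones(3, dtype=np.int64)
iarr[1] += 0.234
print(iarr)

# Floating-point array: the update behaves as expected
farr = np.ones(3)
farr[1] += 0.234
print(farr)
```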

### Array storage order (advanced)
Computer memory is linear, so a multi-dimensional
array is mapped to a one-dimensional block in memory.
This can be done in two ways:
1. NumPy by default uses the so-called *row-major order* (also called
*C order*, because it's the same as in the *C* programming language)
2. Matlab does exactly the opposite and uses
  *column-major order* (also called *F order*, because it's the same
  as in the *Fortran* programming language)

In [None]:
mat = np.arange(6).reshape((2,3))
# The matrix mat is stored in memory like this
mat.reshape(-1, order='C')

In [None]:
# ... and NOT like this
mat.reshape(-1, order='F')

While this is not particularly important initially,
as an advanced user you should remember that you generally
want to avoid performing operations on non-contiguous blocks of memory.
This can have devastating effects on performance!

In [None]:
# Avoid operations on non-contiguous array sections such as
mat[:, 1]
# Contiguous array sections are fine
mat[1]
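Whether an array section is contiguous can be checked using the `flags` attribute, as in the sketch below. The names `row`, `col` and `col_copy` are hypothetical. `np.ascontiguousarray()` is one way to obtain a contiguous copy if a strided section is needed repeatedly.

```python
import numpy as np

mat = np.arange(6).reshape((2, 3))

row = mat[1]         # one row: adjacent elements in memory
col = mat[:, 1]      # one column: elements are 3 items apart in memory

row_contig = row.flags['C_CONTIGUOUS']
col_contig = col.flags['C_CONTIGUOUS']
print(row_contig, col_contig)

# Copy the column into its own contiguous block of memory
col_copy = np.ascontiguousarray(col)
print(col_copy, col_copy.flags['C_CONTIGUOUS'])
```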