# Recap of Week 3: Code structuring

Last week we continued to learn the basic concepts of Python. In particular, we learned how to organise code into smaller components for easier re-use, testing and sharing. We covered the following topics:

- How to make your life easier with string formatting
- Using the Python standard library to find and use useful functions
- Ways of bundling up your code into reusable units with functions
- Making it possible to share your code with others by moving code into modules
- How to produce custom errors
- How to compactly generate lists with list comprehensions

We covered a wide range of the core concepts of the Python programming language, including data types, slicing, loops, branching, functions, modules and errors, in the last two weeks. Now, we can finally apply all of this to start processing scientific data. We will still learn about new ideas and tools along the way, but you will see that nearly all of these concepts - and the syntax to apply them - will be based upon the fundamental building blocks we learned about over the last two weeks.

# Week 4: Numerical Python

This week we will learn how to work easily and efficiently with large collections of numerical data. This could be a time series of observations, output from your environmental model or a distributed collection of multi-layer satellite data. The premier tool for working with any of these datasets is called NumPy.

## What is NumPy?

[NumPy](https://numpy.org/doc/stable/user/absolute_beginners.html) (short for Numerical Python and pronounced *num-pea* or *num-pie*) is a third-party Python module for numerical programming. Any time you have collections of numbers, whether one-, two- or higher-dimensional, NumPy is likely to help you out. It provides you with a large suite of tools, algorithms and techniques. It is one of the most commonly used Python packages around and is used in the majority of Python-based scientific software.
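
As a first glimpse of what such collections look like, here is a minimal sketch of a one-dimensional and a two-dimensional NumPy array. The variable names and values are made up for illustration; arrays are introduced properly in the first chapter.

```python
import numpy as np

# Hypothetical values, for illustration only
time_series = np.array([1.5, 2.0, 2.5, 3.0])  # one-dimensional: 4 values
grid = np.array([[1, 2, 3], [4, 5, 6]])        # two-dimensional: 2 rows, 3 columns

# ndim is the number of dimensions, shape is the size along each dimension
print(time_series.ndim, time_series.shape)  # 1 (4,)
print(grid.ndim, grid.shape)                # 2 (2, 3)
```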

## Motivation: the speed of NumPy

Before we actually start introducing NumPy, let's use an easy example to demonstrate the main reason why NumPy is so popular and widely used in scientific computing: it is extremely fast!

Suppose we want to double every value in a large list of 1 million values:

In [60]:
import numpy as np

# The same one million integers, as a Python list and as a NumPy array
large_python_list = list(range(1_000_000))
large_numpy_array = np.arange(1_000_000)

In plain Python, we could do this with a list comprehension (Note: `%%timeit` is an [IPython magic command](https://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-timeit) that measures the time it takes to execute a line or a cell of code):

In [61]:
%%timeit

[i*2 for i in large_python_list]

33.7 ms ± 343 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)


But NumPy allows us to do:

In [62]:
%%timeit

large_numpy_array * 2

524 µs ± 89.7 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
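
The `%%timeit` magic only works inside IPython or Jupyter. If you want to try the comparison in a plain Python script, the following is a minimal sketch using the standard library's `timeit` module instead. The two helper functions and the choice of 10 runs are assumptions made for this illustration, and the script also checks that both approaches give the same values.

```python
import timeit

import numpy as np

large_python_list = list(range(1_000_000))
large_numpy_array = np.arange(1_000_000)


def double_with_list_comprehension():
    """Double every value using plain Python."""
    return [i * 2 for i in large_python_list]


def double_with_numpy():
    """Double every value using a single NumPy operation."""
    return large_numpy_array * 2


# Both approaches should produce the same values
assert double_with_list_comprehension() == double_with_numpy().tolist()

# Call each function 10 times and compute the average time per call
n_runs = 10
python_time = timeit.timeit(double_with_list_comprehension, number=n_runs) / n_runs
numpy_time = timeit.timeit(double_with_numpy, number=n_runs) / n_runs

print(f"List comprehension: {python_time * 1e3:.2f} ms per run")
print(f"NumPy:              {numpy_time * 1e3:.2f} ms per run")
print(f"Speedup:            {python_time / numpy_time:.0f}x")
```

The printed numbers depend on your machine and on what else it is doing, so treat them as a rough indication rather than a precise benchmark.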


You might see different results on your computer, but speedups of anything from 10 to 100 times are common on an example like this. There are plenty of operations which might see speedups of 1000 times or more. The amounts of data you will deal with when running environmental models or analysing geographical datasets in the future can get very big, very fast. So today, let's learn more about NumPy and how we can use it to unlock completely new ways of working with large, multi-dimensional datasets.

You can jump ahead to any chapter with the navigation below:


## Table of Contents

1. #### [NumPy arrays](./chapters/01-numpy_arrays.ipynb)
2. #### [Operations on NumPy arrays](./chapters/02-operations.ipynb)
3. #### [Multi-dimensional arrays](./chapters/03-multi-dimensional_arrays.ipynb)
4. #### [Filtering data](./chapters/04-filtering_data.ipynb)
5. #### [Combining arrays](./chapters/05-combining_arrays.ipynb)
6. #### [Final exercise](./chapters/06-final_exercise.ipynb)
7. #### [Summary](./chapters/07-summary.ipynb)


## [[Next](./chapters/01-numpy_arrays.ipynb)]