# Recap of Week 3: Code structuring

Last week we continued to learn basic concepts of Python. In particular, we learned how to organise code into smaller components for easier re-use, testing and sharing. We covered the following topics:

- How to make your life easier with string formatting
- Using the Python standard library to find and use useful functions
- Ways of bundling up your code into reusable units with functions
- Making it possible to share your code with others by moving code into modules
- How to produce custom errors
- How to compactly generate lists with list comprehensions
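
As a quick refresher, the sketch below combines several of these ideas in one place: a function, an error raised with a custom message, a list comprehension and string formatting. The function name `celsius_to_kelvin` and the sample values are made up for illustration and are not taken from last week's notebooks.

```python
def celsius_to_kelvin(temp_c):
    """Convert a temperature from degrees Celsius to kelvin."""
    if temp_c < -273.15:
        # raise an error with our own message for physically impossible input
        raise ValueError(f"{temp_c} °C is below absolute zero")
    return temp_c + 273.15


# hypothetical sample data
temps_c = [-5.0, 0.0, 21.5]

# list comprehension: apply the function to every element
temps_k = [celsius_to_kelvin(temp_c) for temp_c in temps_c]

# string formatting: fixed width, two decimal places
for temp_k in temps_k:
    print(f"{temp_k:8.2f} K")
```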

We covered a wide range of the core concepts of the Python programming language, including data types, slicing, loops, branching, functions, modules and errors, in the last two weeks. Now, we can finally apply all of this to start processing scientific data. We will still learn about new ideas and tools along the way, but you will see that nearly all of these concepts - and the syntax to apply them - will be based upon the fundamental building blocks we learned about over the last two weeks.

# Week 4: Numerical Python

This week we will learn how to work easily and efficiently with large collections of numerical data. This could be a time series of observations, output from your environmental model or a distributed collection of multi-layer satellite data. The premier tool for working with any of these datasets is called NumPy.

## What is NumPy?

NumPy (short for Numerical Python and pronounced *num-pea* or *num-pie*) is a third-party Python package for numerical programming. Any time you have collections of numbers, whether in one, two or more dimensions, NumPy is likely to help you out. It provides you with a large suite of tools, algorithms and techniques. It is one of the most commonly used Python packages around and is used in the majority of Python-based scientific software.
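
As a first glimpse, here is a minimal sketch of what one-, two- and three-dimensional collections of numbers look like as NumPy arrays. The variable names and values are invented for illustration; how arrays are created and used is covered properly later on.

```python
import numpy as np

# 1-D: e.g. a short time series of observations (made-up values)
series = np.array([12.1, 12.4, 11.9, 13.0])

# 2-D: e.g. a small grid with 2 rows and 3 columns
grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

# 3-D: e.g. 4 layers of a 2 x 3 grid, filled with zeros
stack = np.zeros((4, 2, 3))

for name, arr in [("series", series), ("grid", grid), ("stack", stack)]:
    print(f"{name}: ndim={arr.ndim}, shape={arr.shape}")
```

The `ndim` and `shape` attributes report the number of dimensions and the size along each of them.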

## Motivation: the speed of NumPy

Before we start actually introducing NumPy, let's use an easy example to demonstrate the main reason why NumPy is so popular and widely used in scientific computing: it is extremely fast! 

Suppose we want to add together all the integers from 1 to 1000000. We could do this with the tools we have learned so far:

### So far: data processing with lists
We can loop over the numbers and add each of them to a running total like this:

```python
# we load the time module here to measure the time our code needs to run
import time
start_time = time.time()

total = 0
for num in range(1000001):
    total += num

elapsed_time_list = time.time() - start_time
print(f"time in seconds: {elapsed_time_list}")
```


time in seconds: 0.07615804672241211


### New: the NumPy way
Let's do the same calculation with NumPy:

```python
import numpy as np

start_time = time.time()

numpy_total = np.sum(np.arange(1000001))

elapsed_time_numpy = time.time() - start_time

print(f"time in seconds: {elapsed_time_numpy}")
```

time in seconds: 0.0029807090759277344


```python
print(f"NumPy speed-up: {elapsed_time_list / elapsed_time_numpy}")
```

NumPy speed-up: 26.032074868021116


We see that NumPy can be more than an order of magnitude faster than a plain Python loop when processing large collections of data: about 26 times faster in the run above, although the exact numbers will differ between machines and runs. We will see why later on. The amounts of data you will deal with when running environmental models or analysing geographical data sets in the future can get very big, very fast. So today, let's learn more about NumPy and how we can use it to unlock completely new ways of working with large, multi-dimensional datasets.
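
A single measurement with `time.time()` can be noisy, so it is worth repeating it a few times. The sketch below is one possible way to do that with the standard library's `timeit` module; the helper names `python_loop_sum` and `numpy_sum` are made up for this example. It also checks both results against the closed-form sum $n(n+1)/2$, and it requests `dtype=np.int64` explicitly, on the assumption that some platforms may otherwise default to 32-bit integers, which are too small to hold the total.

```python
import timeit

import numpy as np

N = 1_000_000


def python_loop_sum(n):
    """Add the integers 0..n with a plain Python loop."""
    total = 0
    for num in range(n + 1):
        total += num
    return total


def numpy_sum(n):
    """Add the integers 0..n with NumPy, using 64-bit integers explicitly."""
    return int(np.sum(np.arange(n + 1, dtype=np.int64)))


# both approaches should agree with the closed-form result n * (n + 1) / 2
expected = N * (N + 1) // 2
print(f"results agree: {python_loop_sum(N) == expected == numpy_sum(N)}")

# run each version 3 times and keep the fastest run to reduce noise
loop_time = min(timeit.repeat(lambda: python_loop_sum(N), number=1, repeat=3))
numpy_time = min(timeit.repeat(lambda: numpy_sum(N), number=1, repeat=3))

print(f"loop:  {loop_time:.4f} s")
print(f"NumPy: {numpy_time:.4f} s")
print(f"NumPy speed-up: {loop_time / numpy_time:.1f}x")
```

The measured speed-up depends on your machine, so treat the printed factor as a rough indication rather than a fixed property of NumPy.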

You can jump ahead to any chapter with the navigation below:


## Table of Contents

1. #### [String formatting](./chapters/01-string_formatting.ipynb)


## [[Next](./chapters/01-string_formatting.ipynb)]