# Functions, modules, packages and libraries

There are many different types of programmers.
Take, for example, the area of self-driving cars.
We can think of a variety of people who write different types of programs in that industry.

There are programmers who work at the low levels of the car's technology, such as the ABS brakes that prevent skidding.
ABS brakes require small, special-purpose chips that read the outputs of sensors related to the brakes in your car and make decisions based on them.
Somebody must write the code to run on those chips.
In fact, there won't be just one person but a team of people, a team of specialised programmers.

A different team will be responsible for automating the systems that drive the car.
They will write the higher-level software that makes softer decisions like whether a stray dog is likely to run in front of the car or not.
The software will decide what angle to turn the steering wheel or how hard to push the accelerator.
This team will rely at least partly on the team that programs the low-level devices - their software will be based on a range of data coming from the lower levels.
They'll have a different skillset and a different focus.
Because of this, it's essential that the interface between the teams is well managed.
That is, there must be clear lines of responsibility and communication between the teams.

There are likely to be many other teams too.
The automation team's work will be based on research.
In companies, researchers tend to be a different group of people to the software developers.
Programming for research tends to be bespoke, quick and rough-and-ready.
Research goals are typically scientific, statistical results.
Researchers are typically not too concerned with writing reusable, user-friendly software.
When a researcher writes code, they are usually the only person to ever run it and they essentially only run it once.

Yet another team might take data from the car and combine it with data from all the other cars to analyse it.
They'll be looking for anomalies that might be dangerous, or for performance metrics that show whether they can improve the car.

All these teams must work together in an environment where they have different aims, constraints and issues.
Over many decades the computing industry has settled on some basic principles for writing code.
These make programming a little easier, a bit more efficient, and a lot safer for everyone involved.
In what follows we'll touch on a couple of these principles.

## Reusability

The basic principle guiding most programming work is reusability.
There are numerous buzzwords and phrases related to reusability, such as Don't Repeat Yourself (DRY).
The idea is that you avoid re-writing the same or similar code in different parts of your program.
Rather, you write the code once, give it a name, and then use it by name from then on.

For example, let's say you have many occasions to calculate the factorial of a positive integer. That's not functionality that's built in to the core Python language, so you must write a few statements to do it.

In [1]:
# Calculate 10 factorial.
factorial10 = 1
for i in range(1, 11):
    factorial10 = factorial10 * i
print(factorial10)

3628800


This code calculates only the factorial of 10. If you want to calculate the factorial of 11, then you need to write code that is highly similar.

In [2]:
# Calculate 11 factorial.
factorial11 = 1
for i in range(1, 12):
    factorial11 = factorial11 * i
print(factorial11)

39916800


## Functions

Programmers hate this kind of duplication for many reasons.
One is that if you find a bug in your factorial-calculating code then you must change the code everywhere it's written.
The same applies should you find a better, more efficient way to calculate the factorial of a number.
To avoid this, we write a function with clearly defined inputs and output.

In [3]:
def factorial(n):
    """Return the factorial of n."""
    ans = 1
    for i in range(1, n + 1):
        ans = ans * i
    return ans

Now you can call the code using its name and use it to calculate any factorial.

In [4]:
print(factorial(10))
print(factorial(11))

3628800
39916800


Let's say you use this factorial function lots of times in your code, and then realise you could make the function more efficient.
For instance, our function multiplies `ans` by 1 in the first iteration of the `for` loop, which has no effect.
We can change `range(1, n + 1)` to `range(2, n + 1)`, which will give the same result with one fewer iteration.
Since we've written it as a function, we can just change the code in one place.
It will automatically filter through to all the places where we have called the function.

In [5]:
def factorial(n):
    """Return the factorial of n."""
    ans = 1
    for i in range(2, n + 1):
        ans = ans * i
    return ans

The following code, which is the same code from before, gives the same result but it now works more efficiently.

In [6]:
print(factorial(10))
print(factorial(11))

3628800
39916800
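One way to gain confidence that a change like this hasn't altered the results is to keep the old version around for a moment and compare the two. The following is only a sketch: the names `factorial_v1` and `factorial_v2` are made up here to hold the two versions of the function from above side by side.

```python
def factorial_v1(n):
    """Return the factorial of n (original version, loop starts at 1)."""
    ans = 1
    for i in range(1, n + 1):
        ans = ans * i
    return ans


def factorial_v2(n):
    """Return the factorial of n (revised version, loop starts at 2)."""
    ans = 1
    for i in range(2, n + 1):
        ans = ans * i
    return ans


# Compare the two versions for n = 0, 1, ..., 19.
all_match = all(factorial_v1(n) == factorial_v2(n) for n in range(20))
print(all_match)
```

If this prints `True`, the two versions agree on every value tested, and the old one can be deleted.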


## Over-abstraction

When you get the hang of functions, it's natural to start turning everything into a function.
There is a temptation to start adding bells and whistles too.
It turns out that you can take it too far.

Let's write a simple (and largely unnecessary) function to square a number.
Note that Python has a power operator built-in: `10**2` gives `100`.
We won't use that here - imagine it doesn't exist for now.

In [7]:
def square(x):
    """Return the square of x."""
    return x * x

In [8]:
print(square(11))

121


Now, let's say you also need a function to cube a number.

In [9]:
def cube(x):
    """Return the cube of x."""
    return x * x * x

In [10]:
print(cube(11))

1331


Here's an idea: let's write a power function that not only squares and cubes but can raise any number to any (positive integer) power.

In [11]:
def power(x, y):
    """Return x to the power of y."""
    ans = x
    for i in range(y - 1):
        ans = ans * x
    return ans

In [12]:
print(power(11, 2))
print(power(11, 3))

121
1331


In some ways this is better.
We now have one function instead of two and we are looking super-DRY since we have removed some duplication of code.
However, there are trade-offs to consider.

First, the `power` function is a little more complex to use than each of the `square` and `cube` functions.
We might be more likely to get confused when using it.
Say I've had too much coffee (quite likely) and I incorrectly use the function as `power(2, 10)` when what I meant to write was `power(10, 2)` to get `100`.
That wouldn't have happened if I was using `square` instead, as it only takes one argument.
Of course, you can write `square` and `cube` in terms of `power` if you want.

In [13]:
def square(x):
    """Returns the square of x."""
    return power(x, 2)

def cube(x):
    """Returns the cube of x."""
    return power(x, 3)

print(square(10))
print(cube(10))

100
1000


Another trade-off is the efficiency of the code.
Loops are typically costly operations in programming - they take a little while to get going and complete.
The original `square` and `cube` functions don't use loops.
The `power` function does, and therefore the second versions of `square` and `cube` that are based on it do too.
You should consider how many times you will likely call the functions.
If you use them once every so often, then the (possible) slight inefficiency won't matter.
On the other hand, if you're calling them 1,000 times a second it might.
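If you want to know whether the difference matters in your case, you can measure it rather than guess. The sketch below uses the `timeit` module from Python's standard library. The names `square_direct` and `square_via_power` are made up here to keep the two versions of `square` apart, and the timings will vary between machines and between runs.

```python
import timeit


def square_direct(x):
    """Return the square of x with a single multiplication."""
    return x * x


def power(x, y):
    """Return x to the power of y, where y is a positive integer."""
    ans = x
    for i in range(y - 1):
        ans = ans * x
    return ans


def square_via_power(x):
    """Return the square of x by calling power."""
    return power(x, 2)


# Time 100,000 calls of each version.
t_direct = timeit.timeit(lambda: square_direct(11), number=100_000)
t_power = timeit.timeit(lambda: square_via_power(11), number=100_000)

print(f"direct:    {t_direct:.4f} seconds")
print(f"via power: {t_power:.4f} seconds")
```

Both versions return the same answer, so the only question is whether the extra time is noticeable for the number of calls you expect to make.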

The keyword here is abstraction.
When we wrote our `factorial` function, we abstracted the idea of multiplying a number by all the numbers less than it.
The factorial function is a high-level concept, an abstraction.
Likewise, when we wrote our `power` function we abstracted the idea of multiplying a number by itself several times.
We added another layer of abstraction when we re-wrote the square and cube functions to use the `power` function.

We've seen a couple of downsides of using abstractions - the complexity and the possible inefficiency.
Unfortunately, there's no one-size-fits-all rule as to when you should and shouldn't abstract.
It might help to avoid considering whether abstraction is good or bad.
Rather, think of it as a tool that can be used when it helps.
This is where programming becomes a bit of an art.

## Modules and packages

Another benefit of writing our code in functions is that we can share them with our collaborators.
Modern programming is fundamentally based on this idea.
There are not too many people who know how to program everything from the ground up.
Rather, people specialise in one aspect of programming and share their work with others.

Functions enable this by hiding the details under their hood.
To use a function, it's enough to know what it does, what inputs it expects and what output it gives.
How it does it can often be left to someone else.
This is sometimes called the *black box* view of functions.

This is a useful concept and modern programming is largely based on it.
A typical program will be built from lots of functions, often written by lots of different people.
We can write a bunch of useful functions in a single file and pass the file around to our friends so that they can use our functions.
Python calls these kinds of files modules.

Modules are normal Python scripts that can be run through Python just like any other script.
The difference is in their intended use.
Modules are scripts that don't really do anything by themselves - they're meant for use in other programs.
Remember all the programmers and teams involved in self-driving cars?
Modules allow them to each write individual parts of the code, which can then be joined up into one final automated driving program.
Modules themselves can be organised in packages, which are essentially folders containing modules.

Modules and packages turn out to be useful for lots of reasons.
When working collaboratively, it's convenient that different programmers can work on small parts of the code organised in separate files.
This helps to avoid problems (sometimes called conflicts) where two programmers edit the same part of the same file at the same time and the computer doesn't know which version of the code to keep.

Modules also provide an easy solution to re-use the same code in different programs.
There are usually some parts of programs that we write that can be re-used in other programs, while there are some parts that are unlikely to be re-used.
If we separate the re-usable parts into a module, we can include just those parts in both programs.
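As a rough illustration, here is what using a module of your own might look like. The module name `mymaths` is made up for this example. Normally you would simply save the function in a file called `mymaths.py` beside your main script; the sketch writes the file to a temporary folder instead, only so that it can be run as a single piece of code.

```python
import importlib
import pathlib
import sys
import tempfile

# Source code for a tiny module. Normally you would just save this as
# mymaths.py (a hypothetical name) next to your main script.
module_source = '''
def factorial(n):
    """Return the factorial of n."""
    ans = 1
    for i in range(2, n + 1):
        ans = ans * i
    return ans
'''

# Write the module to a temporary folder and tell Python to look there.
folder = tempfile.mkdtemp()
pathlib.Path(folder, "mymaths.py").write_text(module_source)
sys.path.insert(0, folder)
importlib.invalidate_caches()

# From here on, this is all a program that uses the module needs to do.
import mymaths

print(mymaths.factorial(10))
```

Any other program that can see `mymaths.py` can use the same two lines at the end to get the `factorial` function without copying its code.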

To find out more about modules, including how to write your own, you can consult [part 6 of the Python tutorial](https://docs.python.org/3/tutorial/modules.html).

## Libraries

Over time the programming community at large have realised that there are vast swathes of re-usable functionality.
This has led to the creation of libraries of packages, modules and functions that are freely available for incorporation into your own programs.

One important library is (nearly) always installed alongside Python itself, and it is called the standard library.
It contains functions that are commonly used in programs, but not often enough to be included directly as part of Python itself.
Generally, Python's developers try to keep the core language lean, leaving out extra functions unless they're needed.

To use modules from the standard library you must first tell Python that you plan to use them.
You do this using the `import` keyword.
Importing takes a little time and memory, but you get the extra functionality.
It turns out there's a function to calculate factorials in the standard library.

In [14]:
import math

print(math.factorial(10))

3628800


Note the use of the `math` name in front of the `factorial` function.
This tells Python that the function is in the `math` module, rather than the current file we are writing in.
We must have imported `math` somewhere previously in the current file, otherwise Python will give us an error to tell us it doesn't know what `math` is.
When Python was installed on our computer it installed the math module and configured itself to know where it is.
It can be in a different location depending on your own system, but we don't have to worry about it because the Python installer took care of it.
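To see the error mentioned above for yourself, you can try using a module before importing it. The sketch below uses `statistics`, another standard library module, purely because it has not been imported anywhere earlier in this document. The `try` and `except` lines are only there so that the error is printed rather than stopping the program.

```python
# The statistics module is in the standard library but has not been imported yet.
try:
    print(statistics.mean([1, 2, 3]))
except NameError as error:
    print("Python says:", error)

# After importing it, the same call works.
import statistics

print(statistics.mean([1, 2, 3]))
```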

(I'm glossing over quite a few technicalities here, but they're not important for this discussion. The math module is, in fact, a special module that isn't even written in Python, but that idea isn't relevant here. The only reason I mention it is that if you go looking for math.py on your system you won't find it. A module you can go looking for on your own system is `os`. It helps you access underlying Windows/MacOS/Linux functionality on your system. You can view the file online [here](https://svn.python.org/projects/python/trunk/Lib/os.py), just to convince you that it's a bunch of Python code that someone else wrote.)
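If you are curious where a module lives on your own system, most modules record their location in an attribute called `__file__`. This is a small sketch; the paths printed will differ from machine to machine, and on some systems `math` has no file at all, which is why the second line falls back to a message.

```python
import math
import os

# Where the os module lives on this system; the path differs between machines.
print(os.__file__)

# Modules that aren't written in Python may have no .py file, or no file at all.
print(getattr(math, "__file__", "math is built in to this Python; no file"))
```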

## Other useful modules

It turns out that while it's important to know the Python fundamentals, most programmers rarely write code from scratch. Rather they use other people's code as their building blocks. Aside from the modules provided in the standard library, there are many useful ones available online. They come pre-installed with some distributions of Python, such as Anaconda. If they aren't pre-installed, you can use programs like `pip` to install them. See [here](https://packaging.python.org/tutorials/installing-packages/) for information about pip.
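A quick way to check whether a package is already installed is to ask Python to look for it without actually importing it. This is a minimal sketch using `importlib.util.find_spec` from the standard library. The install command in the message assumes that `pip` is available for the same Python you are running.

```python
import importlib.util

# Names of the packages used in the sections that follow.
for name in ["numpy", "matplotlib"]:
    if importlib.util.find_spec(name) is None:
        print(name, "is not installed; try: python -m pip install", name)
    else:
        print(name, "is installed")
```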

### numpy

Numpy provides functions for dealing with numerical data efficiently in Python. While Python does already provide good mathematical functionality out of the box, numpy is highly efficient at things like multiplying matrices and dealing with huge arrays of data.

In [15]:
import numpy

# Create a matrix.
A = numpy.array([[5,2,9],[3,1,2],[8,8,3]])

print(A)

[[5 2 9]
 [3 1 2]
 [8 8 3]]


In [16]:
# Access the first row of A.
print(A[0])

# Access the first column of A.
print(A[:,0])

# Access the second element of the third row of A.
print(A[2][1])

# Square A.
print(numpy.matmul(A,A))

[5 2 9]
[5 3 8]
8
[[103  84  76]
 [ 34  23  35]
 [ 88  48  97]]
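Note that "squaring" a matrix in this sense is different from squaring each of its entries. As a short sketch of the difference: the `@` operator is shorthand for matrix multiplication on numpy arrays, while `**` works on each element separately.

```python
import numpy

A = numpy.array([[5, 2, 9], [3, 1, 2], [8, 8, 3]])

# Matrix product of A with itself; this matches numpy.matmul(A, A).
print(A @ A)

# Element-wise square: each entry is multiplied by itself.
print(A**2)
```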


In [17]:
# Create an array of ten random values between 0 (inclusive) and 1 (exclusive).
r = numpy.random.rand(10)
print(r)

# Create an array of ten random normal values with mean 5 and standard deviation 0.1.
r = numpy.random.normal(5, 0.1, 10)
print(r)

[0.16016564 0.93871216 0.90648604 0.70680759 0.14447783 0.75262064
 0.99745915 0.23329621 0.60226221 0.57846507]
[4.97477229 4.85849842 5.09728682 4.97306761 4.9631231  5.16656597
 4.95747996 5.04085282 5.01458421 4.94272473]
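Because the values are random, you will see different numbers each time you run the code above. If you need the same numbers on every run, for example to share a result with a collaborator, you can start from a fixed seed. The sketch below assumes a reasonably recent version of numpy, one that provides `numpy.random.default_rng`. The seed 42 is an arbitrary choice.

```python
import numpy

# 42 is an arbitrary seed; any integer will do.
rng = numpy.random.default_rng(42)

# Ten uniform values in [0, 1), then ten normal values with mean 5
# and standard deviation 0.1.
print(rng.random(10))
print(rng.normal(5, 0.1, 10))

# Two fresh generators with the same seed produce the same values.
first = numpy.random.default_rng(42).random(10)
second = numpy.random.default_rng(42).random(10)
same = bool((first == second).all())
print(same)
```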


Numpy is usually used as the basis for other modules, like matplotlib.pyplot which plots data for us.

### matplotlib.pyplot

`matplotlib` is the most popular plotting (graphing) package for Python.
Here we see an example of using it to plot the curve $ y = x^2 $.

In [18]:
import matplotlib.pyplot as p

# Create a numpy array containing the numbers from 0 to 99 inclusive.
x = numpy.array(range(100))
# Create another numpy array from x, by squaring each element in turn.
y = x**2

# Plot x versus y.
p.plot(x, y)

[<matplotlib.lines.Line2D at 0x1e919cf9a90>]

There are a couple of things to notice here.
First, there's a dot in `matplotlib.pyplot`.
The dot means that `matplotlib` is a package, and `pyplot` is a module within that package.
It's not important for us to dwell on this.
In use, we treat it much the same as if `matplotlib.pyplot` was the module.

Secondly, after the `import matplotlib.pyplot` we add `as p`.
This is a handy way of avoiding having to type `matplotlib.pyplot` every time you want to use a function in it.
It lets us, for example, type `p.plot()` instead of `matplotlib.pyplot.plot()`.
It basically gives us a nickname for the module.
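The `as` keyword works with any module, not just `matplotlib.pyplot`. As a small sketch using the `math` module from earlier, here are three ways of reaching the same function. The nickname `m` is an arbitrary choice for this example; in other people's code you will often see `np` used for `numpy` and `plt` for `matplotlib.pyplot`.

```python
import math
import math as m
from math import factorial

print(math.factorial(5))  # using the full module name
print(m.factorial(5))     # using the nickname given with "as"
print(factorial(5))       # using a name imported directly from the module
```

All three lines call the same function and print the same value.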

## Further reading

As you learn more about Python, and begin to apply it to real-world problems, you will find yourself relying on modules and libraries written by other people.
It's often best not to try to write much code from scratch yourself, as packages like `numpy` and `matplotlib` have been written by many people with a good deal of programming and mathematical expertise.
They've been built up over several years, sometimes decades, and are usually heavily informed by research in these areas.
Rather, for a given programming task, you should try to use packages like these as your building blocks.
In future you might consider contributing to their development, as they are open source.
For now, it's easy to get started with them, as there is a huge amount of beginner's literature available online, such as the following.

1. [Pyplot tutorial (https://matplotlib.org/users/pyplot_tutorial.html)](https://matplotlib.org/users/pyplot_tutorial.html)
2. [Numpy Quickstart tutorial (https://docs.scipy.org/doc/numpy-dev/user/quickstart.html)](https://docs.scipy.org/doc/numpy-dev/user/quickstart.html)
3. [Python numpy tutorial (http://cs231n.github.io/python-numpy-tutorial/)](http://cs231n.github.io/python-numpy-tutorial/)
4. [Scipy lecture notes (http://www.scipy-lectures.org/intro/matplotlib/matplotlib.html)](http://www.scipy-lectures.org/intro/matplotlib/matplotlib.html)