# Numerics and Numpy

Python is increasingly used within the scientific communities. One long-standing
issue is that pure Python is not very efficient when doing numeric calculations.
Luckily, Python’s design makes it relatively easy to extend its functionality.
The core module that helps in scientific calculations is the Numpy
module. Numpy takes the most inefficient parts of numerical calculations
and hands them off to external libraries that are written in C. It uses the same standard
open source libraries that are used in other applications written specifically to do heavy
number-crunching. <br>
The core of the Numpy functionality is provided by a new object called an array. An
array is a multi-dimensional object that contains elements of one datatype. This means
that the functions within the Numpy module are free to make assumptions about what
can be done with the data without having to check every element as it is being accessed.
### 9-1. Creating Arrays

You want to create arrays to use in other Numpy functions.
<br>
The simplest way to create an array is to use the supplied creation function to take
existing data within a list and convert it into a new array object. You can also use the
empty function to create a new empty array object.
<br>
The simplest form of the array function simply takes a list of values and returns a new
array object, as in Listing 9-1.
#### Listing 9-1. Basic Array Creation

In [1]:
import numpy as np
list1 = [1, 2, 3.0, 4]

In [2]:
array1 = np.array(list1)
array1

array([1., 2., 3., 4.])

This returns a one-dimensional array in which every element is a floating-point
number, because the original list contains one float. The default behavior of the
array function is to select the smallest datatype that will hold each of the elements
in the original list. You can explicitly select the datatype
you wish to use with code such as Listing 9-2.
#### Listing 9-2. Creating an Array of Complex Numbers

In [3]:
complex1 = np.array(list1, dtype=complex)
complex1

array([1.+0.j, 2.+0.j, 3.+0.j, 4.+0.j])

If you have a matrix of data that you need to work with, you can simply hand in a list
of lists, where each list is a row of your matrix, as in Listing 9-3.
#### Listing 9-3. Creating a Matrix

In [4]:
matrix1 = np.array([[1, 2], [3, 4]])
matrix1

array([[1, 2],
       [3, 4]])

If you don’t have data ready, but want somewhere to store data, there is a function
to create an empty array of some fixed size and of a particular datatype. For example,
Listing 9-4 shows how to make an empty two-dimensional array of integers.
#### Listing 9-4. Creating an Empty Array of Integers

In [5]:
empty1 = np.empty([2, 2], dtype=int)
empty1

array([[4607182418800017408, 4611686018427387904],
       [4613937818241073152, 4616189618054758400]])

The issue with this function is that it does not initialize the values in any way,
so what you see depends on the state of memory on your system. You just end up with whatever
data exists in those memory locations. While this is slightly faster, it does mean that you
need to be aware that the initial values in your new array are junk data. If you need to start
with some initial values, you can start with either zeroes or ones, as in Listing 9-5.
#### Listing 9-5. Creating Arrays of Zeroes and Ones

In [6]:
zero1 = np.zeros((2, 3), dtype=float)
zero1

array([[0., 0., 0.],
       [0., 0., 0.]])

In [7]:
ones1 = np.ones((3, 2), dtype=int)
ones1

array([[1, 1],
       [1, 1],
       [1, 1]])

Be aware that these two functions take the dimensions of the newly created array as a
single sequence, such as a tuple or a list, rather than as separate arguments.
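
As a quick check of the creation functions above, the following sketch inspects the datatype that the array function infers and passes the shape both as a tuple and as a list. The variable names are illustrative only and are not used elsewhere in this chapter.

```python
import numpy as np

# Mixed ints and one float: the whole array is promoted to float64
mixed = np.array([1, 2, 3.0, 4])
# Only ints: an integer type is chosen (the exact width depends on the platform)
ints = np.array([1, 2, 3, 4])

# The shape is a single argument; a tuple and a list both work
zeros_from_tuple = np.zeros((2, 3))
zeros_from_list = np.zeros([2, 3])

print(mixed.dtype)             # float64
print(ints.dtype.kind)         # i
print(zeros_from_tuple.shape)  # (2, 3)
```

Checking the dtype attribute is usually the quickest way to confirm what the array function decided for your data.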
### 9-2. Copying an Array

You need to make a copy of an array for further processing.
<br>
There are three ways of sharing data across different parts of your program: no-copy
access, shallow copying, and deep copying.
<br>
You can make your arrays accessible to different parts of your program by using more
than one variable at a time. In Listing 9-6, you can see how to assign the same array to
two different variables.
#### Listing 9-6. Using No-Copy Sharing

In [8]:
a = np.ones((6,), dtype=int)
a

array([1, 1, 1, 1, 1, 1])

In [9]:
b = a

As with the rest of Python in general, these two variables point to the same actual
object in memory. You can use either to affect the actual object.
The second type of access sharing is through a shallow copy, where the data itself
isn’t copied, only information about the data. This is possible because an array object
consists of two parts. The first is the data that is being stored within the array, while the
second contains metadata about the array, such as the shape of the array. Listing 9-7
shows how to create a shallow copy by creating a view.
#### Listing 9-7. Shallow Copies

In [10]:
view1 = ones1.view()
# Do these variables point to the same object?
view1 is ones1

False

In [11]:
view1.base is ones1

True

You can access the original object by using the base property of the new view. You
can change the metadata through the view, as in Listing 9-8.
#### Listing 9-8. Changing the Shape of a View

In [12]:
view1.shape = 2,3
ones1

array([[1, 1],
       [1, 1],
       [1, 1]])

In [13]:
view1

array([[1, 1, 1],
       [1, 1, 1]])

This changes the shape through which the view presents the data (the number of
rows and columns), while ones1 keeps its original shape. The third form of copy is a
deep copy, where all parts of an array are
copied over. This is handled by the copy method, as in Listing 9-9.
#### Listing 9-9. Deep Copy of an Array

In [14]:
copy1 = a.copy()
a is copy1

False

In [15]:
a is copy1.base

False
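
To make the difference between the three kinds of sharing concrete, here is a minimal sketch that changes one element of an array and then looks at it through an alias, a view, and a deep copy. The names original, alias, view, and deep are illustrative.

```python
import numpy as np

original = np.ones((6,), dtype=int)

alias = original          # no copy: the same object under a second name
view = original.view()    # shallow copy: new array object, shared data
deep = original.copy()    # deep copy: new array object, separate data

original[0] = 99

print(alias[0])  # 99
print(view[0])   # 99
print(deep[0])   # 1
print(np.shares_memory(original, view))  # True
print(np.shares_memory(original, deep))  # False
```

Only the deep copy is insulated from later changes to the original data.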

### 9-3. Accessing Array Data

You need to access individual elements or subsections of an array.
<br>
You can access individual elements with multidimensional list indexing, and subsections
can be accessed with slices.
<br>
For a one-dimensional array, you can access individual elements with the same indexing
that is used for lists. Listing 9-10 shows a simple example.
#### Listing 9-10. Changing the Value of an Array Element

In [16]:
a[1] = 2
a

array([1, 2, 1, 1, 1, 1])

Slices also work the same way, as in Listing 9-11.
#### Listing 9-11. Getting a Slice of an Array

In [17]:
a[1:3]

array([2, 1])

One thing to note is that a slice actually returns a view (a shallow copy) of the original
array, so no copy of the original data is made, and changes made through the slice
also change the original.
When dealing with multi-dimensional arrays, you simply need to extend the
indexing by adding one extra value for each additional dimension. For example,
Listing 9-12 shows how to get a single element from a matrix.
#### Listing 9-12. Accessing One Element from a Matrix

In [18]:
ones1[1, 1] = 2
ones1

array([[1, 1],
       [1, 2],
       [1, 1]])

If you were interested in getting a single row, you could do so with the example in
Listing 9-13.
#### Listing 9-13. Selecting a Row from a Matrix

In [19]:
ones1[1, :]

array([1, 2])
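
The following sketch illustrates the point about slices being views. It selects a column rather than a row, writes through the slice, and then takes an explicit copy when an independent array is wanted. The array m and the other names are illustrative.

```python
import numpy as np

m = np.arange(6).reshape(3, 2)   # [[0, 1], [2, 3], [4, 5]]

column = m[:, 1]     # second column: a view onto m
column[0] = 10       # writes through to m

independent = m[1, :].copy()  # explicit copy of the second row
independent[0] = -1           # m is not affected

print(m[0, 1])   # 10
print(m[1, 0])   # 2
```

If you need to modify a subsection without touching the original, call copy on the slice first.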

### 9-4. Manipulating a Matrix

You need to manipulate a given matrix. This includes inversion, transposing, and
calculating the norm.
<br>
Numpy includes a full suite of linear algebra tools to handle matrix manipulations.
<br>
If you start with a simple 2-by-2 matrix, you can invert it with the code in Listing 9-14.
#### Listing 9-14. Inverting a Matrix

In [20]:
a = np.array([[1.0, 2.0], [3.0, 4.0]])
np.linalg.inv(a)

array([[-2. ,  1. ],
       [ 1.5, -0.5]])

The linalg submodule also provides a function to calculate the norm, as in
Listing 9-15 .
#### Listing 9-15. Finding a Norm

In [21]:
np.linalg.norm(a)

5.477225575051661

If you want to find the trace of a matrix, this is actually a method of the array object,
as in Listing 9-16.
#### Listing 9-16. Finding the Trace of a Matrix

In [22]:
a.trace()

5.0

The transpose of a matrix is also a method of the array, as in Listing 9-17.
#### Listing 9-17. Finding the Transpose of a Matrix

In [23]:
a.transpose()

array([[1., 3.],
       [2., 4.]])
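
As a sanity check on these operations, the sketch below verifies the inverse by multiplying it with the original matrix, and compares the norm against a direct calculation. It relies on the fact that norm, called with no extra arguments on a matrix, computes the Frobenius norm. The names a_inv, product, and frobenius are illustrative.

```python
import numpy as np

a = np.array([[1.0, 2.0], [3.0, 4.0]])
a_inv = np.linalg.inv(a)

# A matrix times its inverse should be the identity, up to rounding error
product = a @ a_inv
print(np.allclose(product, np.eye(2)))  # True

# Frobenius norm: square root of the sum of the squared elements
frobenius = np.sqrt(np.sum(a ** 2))
print(np.isclose(np.linalg.norm(a), frobenius))  # True

# .T is shorthand for .transpose()
print(np.array_equal(a.T, a.transpose()))  # True
```

Comparing with allclose rather than == is the safer choice here, because floating-point results rarely match exactly.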

### 9-5. Calculating Fast Fourier Transforms

You need to calculate a Fast Fourier Transform to look at the frequency spectrum of some
collection of data.

Numpy provides a suite of different types of FFT (Fast Fourier Transform) functions.
<br>
The discrete FFT can be used with one-dimensional, two-dimensional, or n-dimensional
data. The math for each of these cases is very different, however. So Numpy provides
separate functions for each of the cases, as you can see in Listing 9-18.
#### Listing 9-18. Discrete FFTs


In [24]:
# a is the 2-by-2 matrix from Listing 9-14, so fft() transforms each row
np.fft.fft(a)

array([[ 3.+0.j, -1.+0.j],
       [ 7.+0.j, -1.+0.j]])

In [None]:
# b is assumed to be a 2-dimensional array
np.fft.fft2(b)

In [None]:
# c is assumed to be a 3-dimensional array
np.fft.fftn(c)

As you can see, all of the FFT functions are arranged within a submodule
of Numpy called fft. If you pass an array with more dimensions than the chosen FFT
function transforms, the last axes are used: the last axis for fft and the last two
for fft2. For example, if you use a three-dimensional array c in
the one-dimensional FFT, it will use the last axis as the input for the calculation. If you
wish to, you can specify a different axis with the axis parameter, as in Listing 9-19.
#### Listing 9-19. FFT Over Other Axes

In [None]:
np.fft.fft(c, axis=1)
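
To show what a frequency spectrum looks like in practice, here is a minimal sketch that builds a 5 Hz sine wave and recovers that frequency from the FFT. The sample rate and signal are assumptions chosen so that the wave fits the sampling window exactly. The helper np.fft.fftfreq, which is not covered above, returns the frequency that corresponds to each output bin.

```python
import numpy as np

sample_rate = 64                            # samples per second (assumed)
t = np.arange(sample_rate) / sample_rate    # one second of sample times
signal = np.sin(2 * np.pi * 5 * t)          # 5 Hz sine wave

spectrum = np.fft.fft(signal)
freqs = np.fft.fftfreq(signal.size, d=1 / sample_rate)

# Look only at the non-negative frequencies and find the strongest one
half = signal.size // 2
peak = freqs[np.argmax(np.abs(spectrum[:half]))]
print(peak)  # 5.0
```

The second half of the spectrum holds the negative frequencies, which mirror the first half for real-valued input.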

### 9-6. Loading File Data into an Array

You want to load data from a file directly into an array.
<br>
Numpy can read and write plain text files, as well as its own special binary format.
<br>
To read data from a plain text file, you can use the function loadtxt(), as in Listing 9-20.
#### Listing 9-20. Reading in a Text File

In [None]:
txt1 = np.loadtxt('mydata.txt')

This function assumes that your data is laid out in columns and rows, where each
line is a row. Columns are defined by delimiting the separate values with some
other character. By default, this is whitespace. A common format for scientific
data is comma-separated values (CSV). If this is the case, you can load your data with the
code given in Listing 9-21.
#### Listing 9-21. Loading a CSV File

In [None]:
txt2 = np.loadtxt('mydata.txt', delimiter=',')
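
If you want to try this without a file on disk, the sketch below feeds loadtxt an in-memory text buffer instead of a filename. The buffer csv_text and its contents are made up for illustration. It stands in for a file such as mydata.txt.

```python
import io
import numpy as np

# Stand-in for a file on disk; loadtxt also accepts file-like objects
csv_text = io.StringIO("1.0,2.0,3.0\n4.0,5.0,6.0\n")

table = np.loadtxt(csv_text, delimiter=",")

print(table.shape)  # (2, 3)
print(table[1, 2])  # 6.0
```

Each line of the text becomes a row, and each comma-separated value becomes a column.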

If you have data that has been saved in Numpy’s special binary format, you can use a
simple load command to load it back into memory, as in Listing 9-22.
#### Listing 9-22. Loading Binary Data

In [None]:
data = np.load('mydata.npy')

### 9-7. Saving Arrays

You have data in an array that you want to save to disk.
<br>
As with loading data, you have a few options when saving data. You can either save it to
Numpy’s binary format or save it to some raw text format.
<br>
To save data using Numpy’s binary format, you can simply use the save function, as in
Listing 9-23.
#### Listing 9-23. Saving Data Using Numpy’s Binary Format

In [None]:
np.save('mydata.npy', data)

If the filename you give it in the above call doesn’t have an .npy file extension, one
will be added to it. If, instead, you want to save the data to a plain text file so that it can be
used by other programs, you can use the savetxt function call, as in Listing 9-24.
#### Listing 9-24. Saving a CSV File

In [None]:
np.savetxt('mydata.csv', data, delimiter=',')

In this case, you explicitly set the delimiter as the comma, giving you a CSV file. If you
don’t set a delimiter, the default is a single space character.
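
The saving and loading functions can be checked together with a round trip. The sketch below writes a small array in both formats to a temporary directory and reads it back. The temporary directory and the sample values are assumptions made so that the example cleans up after itself.

```python
import os
import tempfile
import numpy as np

data = np.array([[1.5, 2.5], [3.5, 4.5]])

with tempfile.TemporaryDirectory() as tmpdir:
    npy_path = os.path.join(tmpdir, "mydata.npy")
    csv_path = os.path.join(tmpdir, "mydata.csv")

    # Write the same array in binary and in CSV form
    np.save(npy_path, data)
    np.savetxt(csv_path, data, delimiter=",")

    # Read both files back before the directory is removed
    from_npy = np.load(npy_path)
    from_csv = np.loadtxt(csv_path, delimiter=",")

print(np.array_equal(data, from_npy))  # True
print(np.allclose(data, from_csv))     # True
```

The binary format also preserves the datatype and shape, whereas a text file always comes back as whatever loadtxt infers, which is floating-point by default.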
### 9-8. Generating Random Numbers

You need to generate good quality random numbers.
<br>
Numpy provides a Mersenne Twister pseudo-random number generator, which provides
very good quality random numbers. It can provide random numbers based on several
distributions, like binomial, chisquare, gamma, and exponential.

<br>
If you need random numbers using a particular distribution, you can use methods
provided by RandomState to generate them. Listing 9-25 shows how to generate a
random value from the geometric distribution.
#### Listing 9-25. Generating Random Numbers from a Geometric Distribution

In [27]:
rand1 = np.random.geometric(p=0.5)

Most of these generators include parameters that control the details for each
distribution. They usually also include a size parameter, which you can use to ask for an
array of random values rather than just a single one.
If you want to have a repeatable sequence of random numbers (if you are testing
code, for example), you can explicitly set a seed with the code in Listing 9-26.
#### Listing 9-26. Setting a Seed for Random Number Generation

In [28]:
np.random.seed(42)

This seed also gets initialized when RandomState is created. If you don’t hand one
in, then RandomState will either try to read a value from the operating system random
number generator (for example, /dev/urandom on Linux), or it will set the seed based on
the clock.
In most cases, you simply need uniformly distributed random numbers in the interval
from 0.0 up to, but not including, 1.0. You can get these with the code in
Listing 9-27.
#### Listing 9-27. Generating Random Numbers

In [29]:
rand2 = np.random.random()
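
The size parameter and the effect of seeding can be seen together in a short sketch. It creates two separate RandomState objects with the same seed, so that neither touches the module-wide generator, and draws arrays of values from each. The names rng_a, rng_b, and the seed value are illustrative.

```python
import numpy as np

# Two independent generators started from the same seed
rng_a = np.random.RandomState(42)
rng_b = np.random.RandomState(42)

# size asks for an array of values instead of a single one
draws_a = rng_a.geometric(p=0.5, size=5)
draws_b = rng_b.geometric(p=0.5, size=5)

# Same seed, same sequence
print(np.array_equal(draws_a, draws_b))  # True

# size can also be a shape for multi-dimensional output
uniform = rng_a.random_sample(size=(2, 3))
print(uniform.shape)  # (2, 3)
```

Keeping a dedicated RandomState per test is one way to get repeatable results without resetting the seed that other code may depend on.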

### 9-9. Calculating Basic Statistics

You need to do basic statistics on data stored in arrays.
<br>
Numpy provides a series of statistical functions that operate on arrays of various
dimensions. You can do all of the standard simple statistical analyses that you are likely to
need.
<br>
Given a set of data stored in a one-dimensional array, you can find the mean, median,
variance, and standard deviation with the code in Listing 9-28.
#### Listing 9-28. Basic Statistics

In [30]:
a = np.array([1, 2, 3, 4, 5])
np.mean(a)

3.0

In [31]:
np.median(a)

3.0

In [32]:
np.var(a)

2.0

In [33]:
np.std(a)

1.4142135623730951

If you have multi-dimensional data, you can select which axis to calculate these
statistics along by using the axis parameter.
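
As a small illustration of per-axis statistics, the sketch below computes the mean of a 2-by-3 array over everything, down the columns, and across the rows. It also uses the ddof parameter, which is not covered above. Setting ddof=1 gives the sample standard deviation rather than the population value that var and std return by default. The array data is made up for the example.

```python
import numpy as np

data = np.array([[1, 2, 3],
                 [4, 5, 6]])

print(np.mean(data))          # 3.5, over all elements
print(np.mean(data, axis=0))  # [2.5 3.5 4.5], one value per column
print(np.mean(data, axis=1))  # [2. 5.], one value per row

# Sample standard deviation of each row (divides by n - 1)
print(np.std(data, axis=1, ddof=1))  # [1. 1.]
```

The axis you name is the one that gets collapsed, so axis=0 on a matrix leaves one result per column.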
### 9-10. Computing Histograms

You have a series of data that you need to group into bins and calculate a histogram.
<br>
Numpy contains a handful of related functions to work with histograms, both
one-dimensional and multi-dimensional.
<br>
Assuming you have your data series stored in the variable b, you can generate a histogram
with the code in Listing 9-29.
#### Listing 9-29. Generating a Simple Histogram

In [34]:
b = np.array([1, 2, 1, 2, 3, 1, 2, 3, 3, 2, 1])
np.histogram(b)

(array([4, 0, 0, 0, 0, 4, 0, 0, 0, 3]),
 array([1. , 1.2, 1.4, 1.6, 1.8, 2. , 2.2, 2.4, 2.6, 2.8, 3. ]))

By default, Numpy will try to group your data into 10 bins. The first array tells you how
many values are in each bin and the second array tells you the boundaries of each bin.
You can set the number of bins by adding in a second parameter, as in Listing 9-30.
#### Listing 9-30. Histograms with a Set Bin Count

In [35]:
np.histogram(b, 3)

(array([4, 4, 3]), array([1.        , 1.66666667, 2.33333333, 3.        ]))
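
The second parameter does not have to be a bin count. As a sketch under that assumption, the example below passes explicit bin edges centered on the integer values, and then uses the density parameter to normalize the result. The edge values are chosen for this particular data set.

```python
import numpy as np

b = np.array([1, 2, 1, 2, 3, 1, 2, 3, 3, 2, 1])

# Explicit edges give the bins [0.5, 1.5), [1.5, 2.5), and [2.5, 3.5]
counts, edges = np.histogram(b, bins=[0.5, 1.5, 2.5, 3.5])
print(counts)  # [4 4 3]

# density=True normalizes so that the area under the histogram is 1
density, _ = np.histogram(b, bins=[0.5, 1.5, 2.5, 3.5], density=True)
print(np.isclose(np.sum(density * np.diff(edges)), 1.0))  # True
```

Explicit edges are useful when you want bins that line up with meaningful values rather than an even split of the data range.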