Python lists are very flexible, but they are slow for large numerical calculations.
NumPy arrays store numerical data of a single type in much less space, and they are simpler and faster to calculate with.
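Here is a rough, CPython-specific illustration of the space saving. This sketch compares a list of 1000 floats with the equivalent NumPy array. It assumes 64-bit floats and counts each list element as a separate float object, which is how CPython stores list items. Exact byte counts vary by platform and Python version.

```python
import sys
import numpy as np

values = [float(i) for i in range(1000)]

# A list holds pointers to separate float objects, so count the list itself
# plus every float object it refers to.
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)

# A NumPy array keeps the raw 8-byte doubles in one contiguous buffer.
arr = np.array(values)
array_bytes = arr.nbytes

print(arr.dtype, arr.itemsize)   # float64 8
print(list_bytes, array_bytes)   # the list uses several times more memory
```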

We can calculate the mean with a NumPy array instead of a list:

In [2]:
import numpy as np
fluxes = np.array([23.3, 42.1, 2.0, -3.2, 55.6])
m = np.mean(fluxes)
print(m)

23.96


You should get the same answer as you did before. This may not look simpler yet, but the benefits will become clear later.
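The sketch below computes the same mean from a plain Python list for comparison. It assumes the list version used `sum` and `len`, which may differ from the earlier code. Floating-point rounding can change the last digits, so the two results are compared with a tolerance rather than `==`.

```python
import numpy as np

fluxes_list = [23.3, 42.1, 2.0, -3.2, 55.6]
fluxes = np.array(fluxes_list)

list_mean = sum(fluxes_list) / len(fluxes_list)  # pure Python
numpy_mean = np.mean(fluxes)                      # NumPy; fluxes.mean() is equivalent

print(np.isclose(list_mean, numpy_mean))  # True
```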

NumPy has a wide range of numerical functions. For example, to calculate the number of elements in an array and its standard deviation:

In [3]:
import numpy as np
fluxes = np.array([23.3, 42.1, 2.0, -3.2, 55.6])
print(np.size(fluxes)) # number of elements in the array
print(np.std(fluxes))  # standard deviation

5
22.5853580888
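By default `np.std` computes the population standard deviation, which divides by N. The sketch below checks this against the formula written out with whole-array operations. It also shows the `ddof` parameter, which makes NumPy divide by N - ddof. Setting `ddof=1` gives the sample standard deviation.

```python
import numpy as np

fluxes = np.array([23.3, 42.1, 2.0, -3.2, 55.6])
n = fluxes.size          # 5, the same as np.size(fluxes)
mean = fluxes.mean()

# Population standard deviation: sqrt(sum((x - mean)**2) / N)
manual_std = np.sqrt(np.sum((fluxes - mean) ** 2) / n)
print(np.isclose(manual_std, np.std(fluxes)))   # True

# Sample standard deviation divides by N - 1 instead
sample_std = np.std(fluxes, ddof=1)
print(np.isclose(sample_std, np.sqrt(np.sum((fluxes - mean) ** 2) / (n - 1))))  # True
```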


The NumPy website has a full list of functions.

Tables are often stored in comma-separated values (CSV) format. You can use Python's built-in string methods to read a CSV file into a list and process it.

The following examples read this data.csv file:

In [4]:
# data.csv

# 8.84,17.22,13.22,3.84
# 3.99,11.73,19.66,1.27
# 16.14,18.72,7.43,11.09
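The cell above lists the file's contents as comments. If you don't have data.csv, you can create it with a small hypothetical helper like the one below, which is not part of the original material. It writes the same three rows. It also appends two blank lines, because the output of the next cell suggests the original file ends with them.

```python
# Hypothetical helper: recreate the example data.csv in the current directory.
rows = [
    "8.84,17.22,13.22,3.84",
    "3.99,11.73,19.66,1.27",
    "16.14,18.72,7.43,11.09",
]

with open("data.csv", "w") as f:
    f.write("\n".join(rows) + "\n")
    f.write("\n\n")  # two blank lines at the end, matching the original file's apparent layout
```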

Our file has several rows and columns. We want to store each row in a list and the whole file as a list of these rows.

The program loops over each line in the file, splits it into a list of values, and appends that list to data:


In [5]:
data = []
with open('data.csv') as f:    # the file is closed automatically afterwards
    for line in f:
        data.append(line.strip().split(','))

print(data)

[['8.84', '17.22', '13.22', '3.84'], ['3.99', '11.73', '19.66', '1.27'], ['16.14', '18.72', '7.43', '11.09'], [''], ['']]


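The strip method removes whitespace (including the newline) from both ends of line. The split method then creates a list of strings, using ',' as the separator between items. The two [''] entries at the end come from blank lines at the end of data.csv. A blank line strips to the empty string '', and ''.split(',') returns [''] rather than an empty list.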
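The behaviour of strip and split is easiest to see on one line at a time. This sketch applies them to a single line typed in by hand, and to a blank line, to show where the [''] entries come from.

```python
line = "8.84,17.22,13.22,3.84\n"        # one line as read from the file
fields = line.strip().split(',')
print(line.strip())    # 8.84,17.22,13.22,3.84  (newline removed)
print(fields)          # ['8.84', '17.22', '13.22', '3.84']

blank = "\n"                            # a blank line
blank_fields = blank.strip().split(',')
print(blank_fields)    # ['']  (one empty string, not an empty list)
```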

Each value is a string!
 
The split method returns a list of strings, so every value in each row is a string. Now that we can read the data into lists, we need to convert each item from a string to a float before we can do any calculations with it. We could do this using nested for loops:


In [6]:
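data = []
with open('data.csv') as f:
    for line in f:
        line = line.strip()
        if not line:        # skip blank lines, such as those at the end of the file
            continue
        row = []
        for col in line.split(','):
            row.append(float(col))   # convert each string to a float
        data.append(row)

print(data)

[[8.84, 17.22, 13.22, 3.84], [3.99, 11.73, 19.66, 1.27], [16.14, 18.72, 7.43, 11.09]]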
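The same conversion can be written more compactly as a nested list comprehension. So that this sketch runs on its own, `io.StringIO` stands in for data.csv, including the trailing blank lines. With the real file you would iterate over `open('data.csv')` instead.

```python
import io

# Stand-in for data.csv, including two blank lines at the end.
f = io.StringIO("8.84,17.22,13.22,3.84\n"
                "3.99,11.73,19.66,1.27\n"
                "16.14,18.72,7.43,11.09\n"
                "\n\n")

# Outer comprehension: one row per non-blank line.
# Inner comprehension: convert each comma-separated field to a float.
data = [[float(col) for col in line.strip().split(',')]
        for line in f if line.strip()]

print(data)
# [[8.84, 17.22, 13.22, 3.84], [3.99, 11.73, 19.66, 1.27], [16.14, 18.72, 7.43, 11.09]]
```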

NumPy has a simpler asarray function to do this conversion:

In [7]:
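import numpy as np

data = []
with open('data.csv') as f:
    for line in f:
        if line.strip():     # skip blank lines so every row has four values
            data.append(line.strip().split(','))

data = np.asarray(data, float)   # convert every string to a float in one call
print(data)

[[  8.84  17.22  13.22   3.84]
 [  3.99  11.73  19.66   1.27]
 [ 16.14  18.72   7.43  11.09]]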

Most NumPy functions operate on the whole array at once rather than individual items.
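As an illustration, this sketch applies operations to the whole 3×4 array from data.csv without writing any loops. The values are typed in directly here rather than read from the file. The `axis` argument controls the direction of a reduction: `axis=0` works down the columns and `axis=1` works across the rows.

```python
import numpy as np

data = np.array([[8.84, 17.22, 13.22, 3.84],
                 [3.99, 11.73, 19.66, 1.27],
                 [16.14, 18.72, 7.43, 11.09]])

doubled = data * 2               # every element multiplied by 2
col_means = data.mean(axis=0)    # one mean per column
row_sums = data.sum(axis=1)      # one sum per row

print(data.shape, col_means.shape, row_sums.shape)  # (3, 4) (4,) (3,)

# The equivalent row sums with an explicit Python loop, for comparison
loop_sums = [sum(row) for row in data.tolist()]
print(np.allclose(row_sums, loop_sums))  # True
```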
 
The NumPy loadtxt function can read a CSV file straight into a NumPy array, converting the strings to numbers automatically.
 
Using the same data.csv file as before:

In [8]:
# data.csv

# 8.84,17.22,13.22,3.84
# 3.99,11.73,19.66,1.27
# 16.14,18.72,7.43,11.09

Reading and converting to floats becomes a single statement:

In [9]:
import numpy as np
data = np.loadtxt('data.csv', delimiter=',')
print(data)

[[  8.84  17.22  13.22   3.84]
 [  3.99  11.73  19.66   1.27]
 [ 16.14  18.72   7.43  11.09]]
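loadtxt also accepts a file-like object, and it skips blank lines, so the blank lines at the end of data.csv need no special handling here. This sketch uses `io.StringIO` as a stand-in for the file so it runs on its own.

```python
import io
import numpy as np

# Stand-in for data.csv, including blank lines at the end.
text = ("8.84,17.22,13.22,3.84\n"
        "3.99,11.73,19.66,1.27\n"
        "16.14,18.72,7.43,11.09\n"
        "\n\n")

data = np.loadtxt(io.StringIO(text), delimiter=',')
print(data.shape, data.dtype)   # (3, 4) float64
```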


The NumPy loadtxt function is simpler, faster, and less error-prone than our previous solution. Use it!
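loadtxt has further parameters that can save more work. For example, `usecols` reads only selected columns. With `unpack=True`, loadtxt returns the columns as separate arrays. The variable names `a`, `b`, `c`, `d` below are arbitrary, since the file has no header. As before, `io.StringIO` stands in for data.csv.

```python
import io
import numpy as np

text = ("8.84,17.22,13.22,3.84\n"
        "3.99,11.73,19.66,1.27\n"
        "16.14,18.72,7.43,11.09\n")

# Read only the first and third columns (indices 0 and 2).
subset = np.loadtxt(io.StringIO(text), delimiter=',', usecols=(0, 2))
print(subset.shape)   # (3, 2)

# unpack=True transposes the result, giving one 1-D array per column.
a, b, c, d = np.loadtxt(io.StringIO(text), delimiter=',', unpack=True)
print(a)              # the first column
```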