In [1]:
import numpy as np

Imagine that we have several categories of data on a number of people (say, name, age, and weight), and we'd like to store these values for use in a Python program. It would be possible to store these in three separate arrays:

In [2]:
name   = ['Alice', 'Bob', 'Cathy', 'Doug']
age    = [25, 45, 37, 19]
weight = [55.0, 85.5, 68.0, 61.5]
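
Nothing in this layout ties the three lists together, so they can drift out of sync. The following is a minimal sketch of that problem, using only the lists above; the names `age_sorted`, `mismatched`, `order`, and `aligned` are illustrative and do not appear in the original notebook.

```python
name = ['Alice', 'Bob', 'Cathy', 'Doug']
age = [25, 45, 37, 19]
weight = [55.0, 85.5, 68.0, 61.5]

# Sorting one list on its own breaks the implicit row alignment:
# index 0 now pairs 'Alice' with 19, which is Doug's age.
age_sorted = sorted(age)
mismatched = list(zip(name, age_sorted, weight))
print(mismatched[0])

# To keep rows aligned, every list has to be reordered with the same indices.
order = sorted(range(len(age)), key=lambda i: age[i])
aligned = [(name[i], age[i], weight[i]) for i in order]
print(aligned[0])
```

A structured array avoids this bookkeeping because each record carries all of its fields.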

In [3]:
x = np.zeros(4, dtype=int)

In [4]:
data = np.zeros  # binds the function itself; no array is created until it is called

In [5]:
data = np.zeros(4, dtype={'names':('name','age','weight'),
                          'formats':('U10','i4','f8')} )

In [6]:
print(data)
data.dtype

[('', 0, 0.) ('', 0, 0.) ('', 0, 0.) ('', 0, 0.)]


dtype([('name', '<U10'), ('age', '<i4'), ('weight', '<f8')])

In [7]:
dataOne = np.ones(4, dtype={'names':('name','age','weight'),
                          'formats':('U10','i4','f8')} )
dataOne

array([('1', 1, 1.), ('1', 1, 1.), ('1', 1, 1.), ('1', 1, 1.)],
      dtype=[('name', '<U10'), ('age', '<i4'), ('weight', '<f8')])

Both np.zeros and np.ones produce structured data as an array of tuples, each representing one element (row) laid out according to the specified dtype. Note how the Unicode string column is initialized: np.zeros sets it to the empty string '' rather than '0', while np.ones sets it to '1'.
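
The difference in how the string field is initialized can be checked directly. This is a small sketch that reuses the dtype from the cells above; the variable names `zeros` and `ones` are chosen here for illustration.

```python
import numpy as np

dt = np.dtype({'names': ('name', 'age', 'weight'),
               'formats': ('U10', 'i4', 'f8')})

zeros = np.zeros(2, dtype=dt)
ones = np.ones(2, dtype=dt)

# The Unicode field starts out empty with np.zeros ...
print(zeros['name'][0] == '')
# ... and holds the one-character string '1' with np.ones
print(ones['name'][0] == '1')
# The numeric fields are filled with 0 and 1 as usual
print(zeros['age'][0], ones['age'][0])
```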

In [8]:
# dt = np.dtype({'names':('name','age','weight'),'formats':('U10','i4','f8')})
{'names':('name', 'age', 'weight'),
          'formats':('U10', 'i4', 'f8')}
#print(type(dt))
#dt

{'names': ('name', 'age', 'weight'), 'formats': ('U10', 'i4', 'f8')}

In [13]:
dt = np.dtype( {'names':('name','age','weight'), 'formats':('U10','i4','f8')} )
print(dt)
print(type(dt))

[('name', '<U10'), ('age', '<i4'), ('weight', '<f8')]
<class 'numpy.dtype'>


In [14]:
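# Note: the first argument is a prototype, so the scalar 4 yields a
# 0-dimensional array holding a single record, not an array of 4 records.
# The contents of an empty array are uninitialized and may differ per run.
dataSame = np.empty_like(4, dtype=dt )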
dataSame

array(('', 0, 0.),
      dtype=[('name', '<U10'), ('age', '<i4'), ('weight', '<f8')])

As we had hoped, the data can be arranged together in one convenient block of memory.
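
One way to see this single block of memory is to inspect the dtype's record size and field offsets. The sketch below assumes the default packed (unaligned) layout, which is what the dictionary form produces when no offsets are given.

```python
import numpy as np

dt = np.dtype({'names': ('name', 'age', 'weight'),
               'formats': ('U10', 'i4', 'f8')})
data = np.zeros(4, dtype=dt)

# Bytes per record: 10 four-byte characters (40) + int32 (4) + float64 (8)
print(dt.itemsize)

# Byte offset of each field inside one record
for field in dt.names:
    print(field, dt.fields[field][1])

# The whole array is one buffer of (number of records) * itemsize bytes
print(data.nbytes == data.size * dt.itemsize)
```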

The handy thing with structured arrays is that you can now refer to values either by index or by name:

In [16]:
data['name']   = name
data['age']    = age
data['weight'] = weight
print(data)

[('Alice', 25, 55. ) ('Bob', 45, 85.5) ('Cathy', 37, 68. )
 ('Doug', 19, 61.5)]


In [17]:
# Get all names
data['name']

array(['Alice', 'Bob', 'Cathy', 'Doug'], dtype='<U10')

In [18]:
# Get first row of data
data[0]

('Alice', 25, 55.)

In [19]:
# Get the name from the last row
data[-1]['name']

'Doug'
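
Access by field name also works for assignment, because indexing a single field returns a view onto the original records rather than a copy. The following sketch rebuilds the same array so that it runs on its own; the variable `ages` is introduced here for illustration.

```python
import numpy as np

data = np.zeros(4, dtype=[('name', 'U10'), ('age', 'i4'), ('weight', 'f8')])
data['name'] = ['Alice', 'Bob', 'Cathy', 'Doug']
data['age'] = [25, 45, 37, 19]
data['weight'] = [55.0, 85.5, 68.0, 61.5]

ages = data['age']         # a view onto the 'age' field, not a copy
ages += 1                  # so this also changes the records in data
data['weight'][0] = 56.0   # update one field of one record

print(data[0])
print(np.shares_memory(ages, data))
```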

Using Boolean masking, this even allows you to do some more sophisticated operations such as filtering on age:

In [22]:
# Get names where age is under 30
print( data[data['age'] < 30]['name'] )
print( data[data['age'] < 30] )

['Alice' 'Doug']
[('Alice', 25, 55. ) ('Doug', 19, 61.5)]
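
Masks on different fields can be combined with the element-wise operators `&` and `|`. This sketch filters on both age and weight; the thresholds and the names `mask` and `selected` are chosen only for illustration.

```python
import numpy as np

data = np.zeros(4, dtype=[('name', 'U10'), ('age', 'i4'), ('weight', 'f8')])
data['name'] = ['Alice', 'Bob', 'Cathy', 'Doug']
data['age'] = [25, 45, 37, 19]
data['weight'] = [55.0, 85.5, 68.0, 61.5]

# Parentheses are required: & binds more tightly than < and >
mask = (data['age'] < 40) & (data['weight'] > 60)
selected = data[mask]
print(selected['name'])

# Boolean indexing returns a copy, so editing it leaves data untouched
selected['age'] = 0
print(data['age'])
```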


Note that if you'd like to do operations more complicated than these, you should probably consider the Pandas package, covered in the next chapter. As we'll see, Pandas provides a DataFrame object, which is a structure built on NumPy arrays that offers a variety of useful data manipulation functionality similar to what we've shown here, as well as much, much more.

## Creating structured arrays

Structured array data types can be specified in a number of ways. Earlier, we saw the dictionary method:


In [None]:
np.dtype({'names':('name', 'age', 'weight'),
          'formats':('U10', 'i4', 'f8')})

For clarity, numerical types can be specified using Python types or NumPy dtypes instead:


In [23]:
np.dtype({'names':('name', 'age', 'weight'),
          'formats':((np.str_, 10), int, np.float32)})

dtype([('name', '<U10'), ('age', '<i8'), ('weight', '<f4')])
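
Note that the Python type `int` maps to NumPy's default integer, whose width can depend on the platform and NumPy version, which is why the output above shows `<i8` rather than the `<i4` used earlier. A small sketch of the difference follows; `dt_fixed` is a hypothetical name for a variant that pins the width explicitly.

```python
import numpy as np

dt = np.dtype({'names': ('name', 'age', 'weight'),
               'formats': ((np.str_, 10), int, np.float32)})

# Width of the default integer on this machine (commonly 4 or 8 bytes)
print(dt['age'].itemsize)

# Pinning the width with an explicit NumPy type makes the layout portable
dt_fixed = np.dtype({'names': ('name', 'age', 'weight'),
                     'formats': ((np.str_, 10), np.int64, np.float32)})
print(dt_fixed['age'].itemsize)
```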

A compound type can also be specified as a list of tuples:

In [24]:
np.dtype([('name', 'S10'), ('age', 'i4'), ('weight', 'f8')])

dtype([('name', 'S10'), ('age', '<i4'), ('weight', '<f8')])

If the names of the types do not matter to you, you can specify the types alone in a comma-separated string: 

In [25]:
np.dtype('S10,i4,f8')

dtype([('f0', 'S10'), ('f1', '<i4'), ('f2', '<f8')])
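
The different spellings describe the same kind of object, and can be compared directly. In this sketch, which uses 'U10' throughout so that the field types match, the dictionary and list-of-tuples forms compare equal, while the comma-separated string differs only in its automatically generated field names.

```python
import numpy as np

from_dict = np.dtype({'names': ('name', 'age', 'weight'),
                      'formats': ('U10', 'i4', 'f8')})
from_list = np.dtype([('name', 'U10'), ('age', 'i4'), ('weight', 'f8')])
from_string = np.dtype('U10,i4,f8')

# Same field names, types, and order: the two dtypes are equal
print(from_dict == from_list)

# The string form falls back to the default names f0, f1, f2
print(from_string.names)
print(from_dict == from_string)
```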

## More advanced compound types

It is possible to define even more advanced compound types. For example, you can create a type where each element contains an array or matrix of values. Here, we'll create a data type with a mat component consisting of a 3×3 floating-point matrix:

In [26]:
tp = np.dtype([('id', 'i8'), ('mat', 'f8', (3, 3))])
X = np.zeros(1, dtype=tp)
print(X[0])
print(X['mat'][0])

(0, [[0., 0., 0.], [0., 0., 0.], [0., 0., 0.]])
[[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]
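
With more than one record, the mat field behaves like a stack of matrices that ordinary NumPy functions can operate on. The sketch below assumes two records and arbitrary example values; `traces` is an illustrative name.

```python
import numpy as np

tp = np.dtype([('id', 'i8'), ('mat', 'f8', (3, 3))])
X = np.zeros(2, dtype=tp)

X['id'] = [1, 2]
X['mat'][0] = np.eye(3)                    # identity matrix for record 0
X['mat'][1] = np.arange(9).reshape(3, 3)   # values 0..8 for record 1

# The field is exposed as an array of shape (records, 3, 3)
print(X['mat'].shape)

# Each record occupies 8 bytes for id plus 9 * 8 bytes for mat
print(tp.itemsize)

# Trace of every record's matrix in one call
traces = np.trace(X['mat'], axis1=1, axis2=2)
print(traces)
```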
