## Structured Arrays

<b>Structured arrays</b> or <b>record arrays</b> are useful when you want to perform computations while keeping closely related data together. Structured arrays provide efficient storage for compound, heterogeneous data.

NumPy lets you create arrays of records, so values of several data types can live in one NumPy array. However, one NumPy principle still needs to be honored: the data type within each field (think of a field as a column in the records) must be homogeneous.

In [1]:
import numpy as np

Imagine that we have several categories of data on a number of students: say, name, student ID, and aggregate marks (in %).

In [2]:
name  = ["Andy","Rosy","Susan","Rick"]

studentID  = [106,102,139,167]

scores = [85.5,90.34,87.65,78.25]

There's nothing here that tells us that the three lists are related; it would be more natural if we could use a single structure to store all of this data.

##### Define the np array with the names of the 'columns' and the data format for each
* U10 represents a 10-character Unicode string
* i4 is short for int32 (i for int, 4 for 4 bytes)
* f8 is shorthand for float64

In [3]:
student_data = np.zeros(4, dtype={'names':('name', 'studentID', 'scores'),
                          'formats':('U10', 'i4', 'f8')})
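As a small aside, the same compound dtype can also be written as a list of `(name, format)` tuples. The sketch below compares the two spellings and inspects the result; the variable names (`dict_dtype`, `tuple_dtype`, `record_size`) are illustrative and not part of the original notebook.

```python
import numpy as np

# Dictionary form, as used in the cell above
dict_dtype = np.dtype({'names': ('name', 'studentID', 'scores'),
                       'formats': ('U10', 'i4', 'f8')})

# List-of-tuples form: one (field name, format) pair per field
tuple_dtype = np.dtype([('name', 'U10'), ('studentID', 'i4'), ('scores', 'f8')])

same_dtype = dict_dtype == tuple_dtype   # both describe the same layout
field_names = tuple_dtype.names          # field names, in order
record_size = tuple_dtype.itemsize       # bytes per record: 10 chars * 4 + 4 + 8

print(same_dtype, field_names, record_size)
```

Which form to use is mostly a matter of taste; the tuple form keeps each field name next to its format.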

#### np.zeros() for a string sets it to an empty string

In [4]:
student_data

array([('', 0, 0.), ('', 0, 0.), ('', 0, 0.), ('', 0, 0.)],
      dtype=[('name', '<U10'), ('studentID', '<i4'), ('scores', '<f8')])

In [5]:
print(student_data.dtype)

[('name', '<U10'), ('studentID', '<i4'), ('scores', '<f8')]


Now that we've created an empty container array, we can fill the array with our lists of values:

In [6]:
student_data['name'] = name
student_data['studentID'] = studentID
student_data['scores'] = scores

In [7]:
print(student_data)

[('Andy', 106, 85.5 ) ('Rosy', 102, 90.34) ('Susan', 139, 87.65)
 ('Rick', 167, 78.25)]
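Instead of allocating with `np.zeros()` and filling field by field, the array can also be built in one step from a list of tuples. This is a minimal sketch assuming the same three lists as above; `student_dtype`, `records` and `student_data_direct` are names introduced here for illustration.

```python
import numpy as np

name = ["Andy", "Rosy", "Susan", "Rick"]
studentID = [106, 102, 139, 167]
scores = [85.5, 90.34, 87.65, 78.25]

student_dtype = [('name', 'U10'), ('studentID', 'i4'), ('scores', 'f8')]

# Each record must be a tuple (not a list) so NumPy treats it as one row
records = list(zip(name, studentID, scores))
student_data_direct = np.array(records, dtype=student_dtype)

print(student_data_direct)
```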


The handy thing with structured arrays is that you can now refer to values either by index or by name.

In [9]:
student_data['name']

array(['Andy', 'Rosy', 'Susan', 'Rick'], dtype='<U10')

In [10]:
student_data['studentID']

array([106, 102, 139, 167], dtype=int32)

In [11]:
student_data['scores']

array([85.5 , 90.34, 87.65, 78.25])
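Selecting a single field by name returns a view onto the structured array rather than a copy, so the field behaves like an ordinary NumPy array and writes through to the records. The sketch below rebuilds the student array so it can run on its own; `scores_view` and `mean_score` are illustrative names.

```python
import numpy as np

student_data = np.zeros(4, dtype=[('name', 'U10'), ('studentID', 'i4'), ('scores', 'f8')])
student_data['name'] = ["Andy", "Rosy", "Susan", "Rick"]
student_data['studentID'] = [106, 102, 139, 167]
student_data['scores'] = [85.5, 90.34, 87.65, 78.25]

# A single-field selection is a view that shares memory with the parent array
scores_view = student_data['scores']
scores_view[0] = 88.0            # also changes Andy's record in student_data

# Regular array operations work on the field
mean_score = student_data['scores'].mean()

print(student_data[0], mean_score)
```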

If you index student_data at position 1 you get a structure:

In [12]:
student_data[1]

('Rosy', 102, 90.34)

##### Get the name attribute from the last row

In [13]:
student_data[-1]['name']

'Rick'
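If attribute-style access is preferred over string keys, a structured array can be viewed as a `np.recarray`. This is a hedged sketch, not something the notebook itself does; `student_records`, `last_name` and `all_ids` are names made up for the example.

```python
import numpy as np

student_data = np.array(
    [("Andy", 106, 85.5), ("Rosy", 102, 90.34),
     ("Susan", 139, 87.65), ("Rick", 167, 78.25)],
    dtype=[('name', 'U10'), ('studentID', 'i4'), ('scores', 'f8')],
)

# View the same data as a record array; no data is copied
student_records = student_data.view(np.recarray)

last_name = student_records[-1].name   # field of a single record
all_ids = student_records.studentID    # whole field as an array

print(last_name, all_ids)
```

Attribute access is convenient for interactive work, while dictionary-style indexing such as `student_data['name']` remains the more explicit option.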

##### Get names where score is above 86

In [14]:
student_data[student_data['scores'] > 86]['name']

array(['Rosy', 'Susan'], dtype='<U10')
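Boolean masks can be combined across fields, and `np.sort` accepts an `order` argument naming the field to sort by. The sketch below assumes the same four students; the thresholds and the names `selected` and `ranked_names` are chosen only for illustration.

```python
import numpy as np

student_data = np.array(
    [("Andy", 106, 85.5), ("Rosy", 102, 90.34),
     ("Susan", 139, 87.65), ("Rick", 167, 78.25)],
    dtype=[('name', 'U10'), ('studentID', 'i4'), ('scores', 'f8')],
)

# Combine conditions on two fields; each comparison needs its own parentheses
mask = (student_data['scores'] > 80) & (student_data['studentID'] < 110)
selected = student_data[mask]['name']

# Sort records by the 'scores' field, then reverse for highest score first
ranked = np.sort(student_data, order='scores')[::-1]
ranked_names = ranked['name'].tolist()

print(selected, ranked_names)
```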

Note that if you'd like to do any operations more complicated than these, you should probably consider the Pandas package, which provides a powerful data structure called the DataFrame.