In [2]:
import numpy as np

## Structured ndarray
- A structured array is an ndarray in which each element can be thought of as representing a struct in C (hence the “structured” name)
- Structured arrays allow you to define a custom data structure with multiple fields (columns), each with its own name and data type.
- Creation: define a data type (dtype) with named fields, where each field has a specific type.

Benefits: 
- efficiently store data with multiple types in one array
- suitable for handling complex datasets like rows in a CSV file, where each column can have a different type
- can act as a lightweight alternative to a DataFrame if you're dealing with simpler datasets in NumPy 

Ideal for: 
- writing and reading data to/from disk (including memory maps)
- transporting data over a network
- interfacing with C-style binary data formats
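
The binary-format use case can be sketched as a round trip through raw bytes. This is a minimal illustration, not a recommended file format: the `sensor_dtype` layout and its field names are hypothetical, and the explicit `<` prefixes assume a little-endian layout is wanted.

```python
import numpy as np

# Hypothetical record layout, chosen only for this illustration
sensor_dtype = np.dtype([('id', '<i4'), ('temp', '<f4'), ('ok', '?')])

readings = np.array([(1, 21.5, True), (2, 19.0, False)], dtype=sensor_dtype)

# Each record has a fixed size: 4 + 4 + 1 bytes (packed, no padding by default)
print(sensor_dtype.itemsize)  # 9

# Serialize to raw bytes, as you might before writing to disk or a socket
raw = readings.tobytes()

# Reinterpret the bytes with the same dtype to recover the records
restored = np.frombuffer(raw, dtype=sensor_dtype)
print(restored['temp'])  # [21.5 19. ]
```

Because the dtype fully describes the byte layout, the reader only needs the same dtype to decode the buffer. Note that `np.frombuffer` returns a read-only view of `raw`; call `.copy()` if the records need to be modified.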

In [5]:
dtype = [('name', 'U10'), ('age', 'i4'), ('weight', 'f4')]
# name: Unicode string of up to 10 characters
# age: 32-bit signed integer
# weight: 32-bit floating-point number
data = np.array([('Alice', 25, 55.5), ('Bob', 30, 85.2), ('Cathy', 28, 68.3)], dtype=dtype)
print(data)

# The array now stores each record in a structured format; each field can be accessed by name.

[('Alice', 25, 55.5) ('Bob', 30, 85.2) ('Cathy', 28, 68.3)]


In [7]:
# accessing fields
print(data['name'])
print(data['age'])

print(data[0])

['Alice' 'Bob' 'Cathy']
[25 30 28]
('Alice', 25, 55.5)
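
Since each field behaves like an ordinary array, field access combines naturally with boolean masks and sorting. A short sketch, reusing the same `dtype` and `data` as above (the threshold of 26 is arbitrary):

```python
import numpy as np

dtype = [('name', 'U10'), ('age', 'i4'), ('weight', 'f4')]
data = np.array([('Alice', 25, 55.5), ('Bob', 30, 85.2), ('Cathy', 28, 68.3)], dtype=dtype)

# A boolean mask built from one field selects whole records
older = data[data['age'] > 26]
print(older['name'])      # ['Bob' 'Cathy']

# Sort the records by a named field
by_weight = np.sort(data, order='weight')
print(by_weight['name'])  # ['Alice' 'Cathy' 'Bob']

# Index with a list of field names to keep only those fields
print(data[['name', 'age']])
```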


### Nested Data Types

In [9]:
dtype = [
    ('person', [('name', 'U10'), ('age', 'i4')]),
    ('weight', 'f4')
]

data = np.array([
    (('apple', 24), 56.3),
    (('Bob', 45), 89.5)
], dtype=dtype)
print(data)

[(('apple', 24), 56.3) (('Bob', 45), 89.5)]


In [11]:
print(data['person'])
print(data['person']['name'])
print(data['person']['age'])
print(data[0])

[('apple', 24) ('Bob', 45)]
['apple' 'Bob']
[24 45]
(('apple', 24), 56.3)


### Multidimensional Fields
Useful when one or more fields naturally hold a fixed-size array of values; the field's shape is part of the dtype.

In [12]:
dtype = [
    ('name', 'U10'),
    ('scores', 'f4', (3,)) # Multidimensional field: a 1D array with 3 elements
]

data = np.array([
    ('Alice', [85.0, 90.5, 78.0]),
    ('Bob', [72.0, 88.5, 91.0])
], dtype=dtype)

print(data)

[('Alice', [85. , 90.5, 78. ]) ('Bob', [72. , 88.5, 91. ])]


In [13]:
print(data['name'])
print(data['scores'])

print(data['scores'][0])
print(data['scores'][0][1])

['Alice' 'Bob']
[[85.  90.5 78. ]
 [72.  88.5 91. ]]
[85.  90.5 78. ]
90.5


- nested data type: allows fields to contain subfields, enabling hierarchical data structures
- multidimensional fields: allow fields to store arrays as part of structured data 
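
Both features can be combined in one dtype, and the dtype object itself can be inspected to see the layout. A minimal sketch; `combo_dtype` is a hypothetical name used only here:

```python
import numpy as np

# Hypothetical dtype combining a nested field and a subarray field
combo_dtype = np.dtype([
    ('person', [('name', 'U10'), ('age', 'i4')]),
    ('scores', 'f4', (3,)),
])

# Start from zero-filled records and assign through field views
data = np.zeros(2, dtype=combo_dtype)
data['person']['name'] = ['Alice', 'Bob']
data['scores'][0] = [85.0, 90.5, 78.0]

print(combo_dtype.names)            # ('person', 'scores')
print(combo_dtype['person'].names)  # ('name', 'age')
print(combo_dtype['scores'].shape)  # (3,)

# The subarray shape is appended to the array shape: 2 records x 3 scores
print(data['scores'].shape)         # (2, 3)
print(data['scores'].mean(axis=1))  # per-record mean score
```

Field access returns views into the same memory, which is why the assignments above modify `data` in place.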

## Record Arrays
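- `np.recarray` is a subclass of `ndarray` that adds the ability to access the fields of a structured array as attributes, which is convenient for interactive use
- To create a record array, view an existing structured array as `np.recarray` (or pass it to `np.rec.array`), or build one from separate per-field arrays with `np.rec.fromarrays`. The older spelling `np.core.records.fromarrays` goes through the private `np.core` namespace, which is deprecated as of NumPy 2.0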

In [16]:
dtype = [('name', 'U10'), ('age', 'i4'), ('weight', 'f4')]
data = np.array([('Alice', 25, 55.5), ('Bob', 30, 85.2), ('Cathy', 28, 68.3)], dtype=dtype)

# Convert the structured array `data` to a record array
record_data = data.view(np.recarray)

# Access fields as attributes
print(record_data.name)  # Output: ['Alice' 'Bob' 'Cathy']
print(record_data.age)   # Output: [25 30 28]


# With a record array, record_data.name is equivalent to record_data['name']; the attribute form is shorter and often easier to read.

['Alice' 'Bob' 'Cathy']
[25 30 28]
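
For comparison, here is a sketch of the other construction routes: `np.rec.fromarrays` builds a record array from separate per-field arrays, and `np.rec.array` converts an existing structured array. The sample values are made up for illustration.

```python
import numpy as np

names = np.array(['Alice', 'Bob', 'Cathy'])
ages = np.array([25, 30, 28], dtype='i4')

# Build a record array from one array per field
rec = np.rec.fromarrays([names, ages], names='name,age')
print(rec.age)      # [25 30 28]
print(rec[0].name)  # Alice

# Convert an existing structured array into a record array
structured = np.array([('Dan', 41)], dtype=[('name', 'U10'), ('age', 'i4')])
rec2 = np.rec.array(structured)
print(rec2.name)    # ['Dan']
```

One caveat: attribute access resolves regular `ndarray` attributes first, so a field named like an existing attribute (for example `shape` or `mean`) is only reachable with the `['field']` syntax.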
