## Structured Data in ``NumPy``

In [1]:
name = ['Alice', 'Bob', 'Cathy', 'Doug']
age = [25, 45, 37, 19]
weight = [55.0, 85.5, 68.0, 61.5]

Create an ordinary (unstructured) array:

In [2]:
import numpy as np

x = np.zeros(4, dtype=int)  # the np.int alias was removed in NumPy 1.24; use the built-in int
x

array([0, 0, 0, 0])

Create a structured array. Here the dtype is given as a dictionary with the keys `names` and `formats`:

In [3]:
import numpy as np

data = np.zeros(4, dtype={'names':('name', 'age', 'weight'),
                          'formats':('U10', 'i4', 'f8')})
print(data.dtype)

[('name', '<U10'), ('age', '<i4'), ('weight', '<f8')]
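The format strings are NumPy's compact type codes: `U10` is a Unicode string of at most 10 characters, `i4` a 4-byte integer, and `f8` an 8-byte float. The `<` prefix in the printed dtype marks little-endian byte order. A minimal sketch for inspecting such a dtype follows; the variable name `person_dtype` is illustrative and does not appear in the original notebook.

```python
import numpy as np

# Same field specification as above, built as a standalone dtype object.
person_dtype = np.dtype({'names': ('name', 'age', 'weight'),
                         'formats': ('U10', 'i4', 'f8')})

print(person_dtype.names)      # field names, in declaration order
print(person_dtype['age'])     # dtype of a single field
print(person_dtype.itemsize)   # bytes per record: 10 * 4 + 4 + 8
```

Each Unicode character occupies 4 bytes, which is why the string field accounts for most of the record size.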


In [4]:
data['name'] = name
data['age'] = age
data['weight'] = weight 
print(data)

[('Alice', 25,  55. ) ('Bob', 45,  85.5) ('Cathy', 37,  68. )
 ('Doug', 19,  61.5)]


In [5]:
data['name']

array(['Alice', 'Bob', 'Cathy', 'Doug'],
      dtype='<U10')

In [6]:
data['age'].mean()

31.5

In [7]:
data['weight'].max()

85.5

In [8]:
data[data['age'] < 30]['name']

array(['Alice', 'Doug'],
      dtype='<U10')
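Besides selecting by field name or by Boolean mask, a structured array can also be indexed by row and sorted by a field. The sketch below rebuilds the same four records so that it runs on its own; the names `people`, `first`, `last_name`, and `by_age` are illustrative.

```python
import numpy as np

people = np.zeros(4, dtype={'names': ('name', 'age', 'weight'),
                            'formats': ('U10', 'i4', 'f8')})
people['name'] = ['Alice', 'Bob', 'Cathy', 'Doug']
people['age'] = [25, 45, 37, 19]
people['weight'] = [55.0, 85.5, 68.0, 61.5]

first = people[0]                       # a single record
last_name = people[-1]['name']          # one field of the last record
by_age = np.sort(people, order='age')   # sorted copy, youngest first

print(first)
print(last_name)
print(by_age['name'])
```

`np.sort` returns a new array, so `people` keeps its original order.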

### Specifying the dtype as a list of tuples

In [10]:
data2 = np.zeros(4, dtype=[('name', 'U10'), ('age', 'i4'), ('weight', 'f8')])
print(data2.dtype)

[('name', '<U10'), ('age', '<i4'), ('weight', '<f8')]


In [11]:
data2['name'] = name
data2['age'] = age
data2['weight'] = weight 
print(data2)

[('Alice', 25,  55. ) ('Bob', 45,  85.5) ('Cathy', 37,  68. )
 ('Doug', 19,  61.5)]
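With the same dtype, the array can also be built in one step from a list of record tuples instead of filling a zeroed array field by field. This is a sketch under that assumption; `person_dtype` and `data3` are illustrative names.

```python
import numpy as np

name = ['Alice', 'Bob', 'Cathy', 'Doug']
age = [25, 45, 37, 19]
weight = [55.0, 85.5, 68.0, 61.5]

person_dtype = [('name', 'U10'), ('age', 'i4'), ('weight', 'f8')]

# Each tuple is one record; zip() pairs the three lists row by row.
data3 = np.array(list(zip(name, age, weight)), dtype=person_dtype)
print(data3)
```

The records need to be tuples here; NumPy treats nested lists as extra array dimensions rather than as records.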


### RecordArray

A record array behaves like a structured array, but its fields (columns) can also be accessed as attributes, using `.field_name`:

In [12]:
data_rec = data.view(np.recarray)
data_rec.age

array([25, 45, 37, 19], dtype=int32)
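A record array can also be created directly rather than as a view of an existing structured array. The sketch below uses `np.rec.fromarrays`, which infers each field's type from the column it is given; the variable name `rec` is illustrative.

```python
import numpy as np

# One list per column; field names are given as a comma-separated string.
rec = np.rec.fromarrays(
    [['Alice', 'Bob', 'Cathy', 'Doug'],
     [25, 45, 37, 19],
     [55.0, 85.5, 68.0, 61.5]],
    names='name,age,weight',
)

print(rec.name)         # whole column as an attribute
print(rec.age.mean())   # ordinary array methods still work
print(rec[0].weight)    # attribute access on a single record
```

A field whose name collides with an existing array attribute (for example `shape` or `mean`) resolves to that attribute, so such a field is only reachable with the bracket syntax.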

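Attribute access is convenient, but it carries overhead. In the timings below, reading a field from the record array is tens of times slower than reading it from the plain structured array: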

In [15]:
%timeit data['age']
%timeit data_rec['age']
%timeit data_rec.age

228 ns ± 13.2 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
6.01 µs ± 193 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
8.85 µs ± 1.32 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)
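`%timeit` is an IPython magic and is not available in a plain Python script. A rough equivalent with the standard-library `timeit` module is sketched below; the repetition count `n` and the label strings are arbitrary choices, and the absolute numbers will differ between machines and NumPy versions.

```python
import timeit
import numpy as np

data = np.zeros(4, dtype=[('name', 'U10'), ('age', 'i4'), ('weight', 'f8')])
data['age'] = [25, 45, 37, 19]
data_rec = data.view(np.recarray)

n = 10_000  # number of repetitions per measurement (arbitrary)
timings = {
    "data['age']": timeit.timeit(lambda: data['age'], number=n),
    "data_rec['age']": timeit.timeit(lambda: data_rec['age'], number=n),
    "data_rec.age": timeit.timeit(lambda: data_rec.age, number=n),
}

for label, total in timings.items():
    # Convert total seconds to nanoseconds per single access.
    print(f"{label:>16}: {total / n * 1e9:8.0f} ns per access")
```

The lambda wrapper adds a small constant cost to every measurement, so the comparison between the three rows is more meaningful than the individual values.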
