# NumPy Data I/O and Functions

## Storing data in CSV files

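**np.savetxt(frame, array, fmt='%.18e', delimiter=' ')**

- frame : a file object or file name; a name ending in .gz is written gzip-compressed
- array : the array to write to the file
- fmt : the format used for each value, e.g. %d, %.2f, %.18e
- delimiter : the string separating columns; the default is a single space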

In [1]:
import numpy as np
a = np.arange(100).reshape(5,20)

In [2]:
np.savetxt('a.csv', a, fmt='%d', delimiter=',')

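**np.loadtxt(frame, dtype=float, delimiter=None, unpack=False)**

- frame : a file, file name, or generator; may be a .gz or .bz2 compressed file
- dtype : data type of the result, optional
- delimiter : the string separating columns; the default is any whitespace
- unpack : if True, the columns are returned separately so that each can be assigned to its own variable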

In [7]:
b = np.loadtxt('a.csv', dtype=int, delimiter=',', unpack=False)

In [8]:
b

array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
        17, 18, 19],
       [20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36,
        37, 38, 39],
       [40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56,
        57, 58, 59],
       [60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76,
        77, 78, 79],
       [80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96,
        97, 98, 99]])
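
The `unpack` flag is not exercised in the example above. The following is a minimal sketch of its effect; the in-memory `io.StringIO` buffer and the sample values are assumptions made for illustration and stand in for a real file. With `unpack=True` the result is transposed, so each column can be assigned to its own variable.

```python
import io
import numpy as np

# A tiny 3x2 CSV held in memory; the values are illustrative only.
text = "1,10\n2,20\n3,30\n"

# unpack=True transposes the result so each column lands in its own variable.
x, y = np.loadtxt(io.StringIO(text), dtype=int, delimiter=',', unpack=True)

print(x)  # first column
print(y)  # second column
```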

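**Limitations of CSV files**

- CSV can only store one- and two-dimensional arrays effectively
- np.savetxt() and np.loadtxt() can therefore only save and load one- and two-dimensional arrays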
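
The limitation can be observed directly. The sketch below writes to an in-memory buffer instead of a file (an assumption made to keep it self-contained) and shows that `np.savetxt` rejects a three-dimensional array. The reshape workaround that follows is one possible approach, not something the original notes prescribe, and it only works when the reader already knows the original shape.

```python
import io
import numpy as np

a3 = np.arange(24).reshape(2, 3, 4)

# savetxt rejects arrays with more than two dimensions.
try:
    np.savetxt(io.StringIO(), a3, fmt='%d', delimiter=',')
    raised = False
except ValueError:
    raised = True

# Possible workaround (an assumption): flatten to 2-D for writing and
# restore the known shape after reading.
buf = io.StringIO()
np.savetxt(buf, a3.reshape(2, -1), fmt='%d', delimiter=',')
buf.seek(0)
restored = np.loadtxt(buf, dtype=int, delimiter=',').reshape(2, 3, 4)

print(raised)
print(np.array_equal(restored, a3))
```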

## Storing and loading multidimensional data

**a.tofile(frame, sep='', format='%s')**

- frame : a file or file name
- sep : the string separating values; if empty, the file is written as binary
- format : the format used for each value

In [9]:
a = np.arange(100).reshape(5, 10, 2)
a.tofile('b.dat', sep=',', format='%d')

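**np.fromfile(frame, dtype=float, count=-1, sep='')**

- frame : a file or file name
- dtype : the data type to read
- count : the number of elements to read; -1 reads the whole file
- sep : the string separating values; if empty, the file is read as binary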

In [10]:
b = np.fromfile('b.dat', dtype=int, count=-1, sep=',')

In [11]:
b

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
       34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
       51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67,
       68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84,
       85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99])

In [12]:
c = np.fromfile('b.dat', dtype=int, count=-1, sep=',').reshape(5, 10, 2)

In [13]:
c

array([[[ 0,  1],
        [ 2,  3],
        [ 4,  5],
        [ 6,  7],
        [ 8,  9],
        [10, 11],
        [12, 13],
        [14, 15],
        [16, 17],
        [18, 19]],

       [[20, 21],
        [22, 23],
        [24, 25],
        [26, 27],
        [28, 29],
        [30, 31],
        [32, 33],
        [34, 35],
        [36, 37],
        [38, 39]],

       [[40, 41],
        [42, 43],
        [44, 45],
        [46, 47],
        [48, 49],
        [50, 51],
        [52, 53],
        [54, 55],
        [56, 57],
        [58, 59]],

       [[60, 61],
        [62, 63],
        [64, 65],
        [66, 67],
        [68, 69],
        [70, 71],
        [72, 73],
        [74, 75],
        [76, 77],
        [78, 79]],

       [[80, 81],
        [82, 83],
        [84, 85],
        [86, 87],
        [88, 89],
        [90, 91],
        [92, 93],
        [94, 95],
        [96, 97],
        [98, 99]]])

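> This method requires knowing the array's dimensions and element type at read time, because the file stores neither.  
> a.tofile() and np.fromfile() therefore need to be used together.  
> The extra information can be stored in a separate metadata file.  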
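
One way to realize the metadata-file idea is sketched below. The JSON sidecar, its file name `b.json`, and its two keys are hypothetical choices made for this illustration; the notes do not specify any particular metadata format. A temporary directory is used so the sketch leaves no files behind.

```python
import json
import os
import tempfile
import numpy as np

a = np.arange(100).reshape(5, 10, 2)

with tempfile.TemporaryDirectory() as tmp:
    data_path = os.path.join(tmp, 'b.dat')
    meta_path = os.path.join(tmp, 'b.json')   # hypothetical sidecar file name

    # An empty sep (the default) writes raw binary with no shape or dtype.
    a.tofile(data_path)
    with open(meta_path, 'w') as f:
        json.dump({'shape': a.shape, 'dtype': a.dtype.str}, f)

    # Read the metadata first, then use it to interpret the raw bytes.
    with open(meta_path) as f:
        meta = json.load(f)
    c = np.fromfile(data_path, dtype=np.dtype(meta['dtype'])).reshape(meta['shape'])

print(c.shape, c.dtype)
```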

### Convenient NumPy file I/O



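**np.save(fname, array) or np.savez(fname, array)**

- fname : the file name; np.save uses the .npy extension, and np.savez writes a .npz archive (uncompressed; np.savez_compressed writes a compressed one)
- array : the array variable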


**np.load(fname)**

- fname : the file name, with the .npy or .npz extension

In [14]:
a = np.arange(100).reshape(5, 10, 2)

In [15]:
np.save('a.npy', a)

In [16]:
b = np.load('a.npy')

In [17]:
b

array([[[ 0,  1],
        [ 2,  3],
        [ 4,  5],
        [ 6,  7],
        [ 8,  9],
        [10, 11],
        [12, 13],
        [14, 15],
        [16, 17],
        [18, 19]],

       [[20, 21],
        [22, 23],
        [24, 25],
        [26, 27],
        [28, 29],
        [30, 31],
        [32, 33],
        [34, 35],
        [36, 37],
        [38, 39]],

       [[40, 41],
        [42, 43],
        [44, 45],
        [46, 47],
        [48, 49],
        [50, 51],
        [52, 53],
        [54, 55],
        [56, 57],
        [58, 59]],

       [[60, 61],
        [62, 63],
        [64, 65],
        [66, 67],
        [68, 69],
        [70, 71],
        [72, 73],
        [74, 75],
        [76, 77],
        [78, 79]],

       [[80, 81],
        [82, 83],
        [84, 85],
        [86, 87],
        [88, 89],
        [90, 91],
        [92, 93],
        [94, 95],
        [96, 97],
        [98, 99]]])
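
The example above covers `np.save` only. As a complement, here is a minimal sketch of `np.savez`, which can bundle several arrays into one .npz archive. The keyword names `x` and `y` and the in-memory `io.BytesIO` buffer are assumptions chosen for illustration; a file name ending in .npz works the same way.

```python
import io
import numpy as np

x = np.arange(6).reshape(2, 3)
y = np.linspace(0.0, 1.0, 5)

# Keyword names (the illustrative "x" and "y") become keys in the archive.
buf = io.BytesIO()
np.savez(buf, x=x, y=y)
buf.seek(0)

# np.load returns a dictionary-like object for .npz archives.
with np.load(buf) as data:
    keys = sorted(data.files)
    x2 = data['x']
    y2 = data['y']

print(keys)
```

Unlike `a.tofile()`, both formats record the shape and element type, so no reshape is needed after loading.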

## NumPy random number functions

### The np.random submodule

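|Function |Description|
|:--:|:---|
|rand(d0,d1,..,dn) |Creates a random array of shape (d0, ..., dn); floats uniformly distributed over [0,1)|
|randn(d0,d1,..,dn) |Creates a random array of shape (d0, ..., dn) drawn from the standard normal distribution|
|randint(low[,high,size]) |Creates a random integer or an integer array of shape size, in the range [low, high)|
|seed(s) |Sets the random number seed; s is the given seed value|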

In [20]:
a = np.random.rand(3, 4, 5)

In [21]:
a

array([[[ 0.58563049,  0.85261608,  0.16804246,  0.28158164,  0.2784136 ],
        [ 0.18745988,  0.11360339,  0.00509407,  0.98202047,  0.39977068],
        [ 0.6832677 ,  0.47902946,  0.47265638,  0.24517086,  0.43835245],
        [ 0.5149072 ,  0.13447436,  0.75539091,  0.30414486,  0.62595929]],

       [[ 0.91600183,  0.89534044,  0.29193375,  0.35974638,  0.48345238],
        [ 0.0966452 ,  0.98205555,  0.60104178,  0.20584068,  0.87897145],
        [ 0.76815448,  0.75606502,  0.91255528,  0.28787284,  0.05832752],
        [ 0.50979921,  0.6548834 ,  0.71815205,  0.86219037,  0.88033228]],

       [[ 0.47257381,  0.77055345,  0.95687207,  0.09894681,  0.94413114],
        [ 0.67350119,  0.30483295,  0.15581034,  0.9187968 ,  0.71142273],
        [ 0.88817527,  0.02431243,  0.57481908,  0.40234218,  0.93238656],
        [ 0.78205143,  0.31177679,  0.05236487,  0.2861462 ,  0.3644832 ]]])

In [22]:
b = np.random.randn(3, 4, 5)

In [23]:
b

array([[[ 1.25082927, -2.0024305 , -0.4996733 ,  0.85118598,  0.39426479],
        [-0.58900946,  0.79159322,  0.64239492,  0.8182633 , -1.12626777],
        [ 0.0903343 , -1.36351025, -1.47622067,  0.26584168,  1.8403392 ],
        [ 1.78170647,  0.51395299,  0.24554429,  0.05416128, -1.75584357]],

       [[-0.61212369, -0.67236653,  0.22644777,  2.37582119, -0.14531585],
        [-0.82506039, -0.15899379,  1.24405302, -0.61233669, -0.83793272],
        [ 2.78588117, -1.0817449 , -0.39994801,  0.51771132,  0.17092246],
        [ 0.08312971, -1.38331675,  0.95171506, -0.00622721,  0.88496462]],

       [[-1.02097919, -0.48436356,  1.3923381 ,  1.19380588,  0.06840044],
        [-0.11790914,  1.17484778,  0.1173336 ,  1.75073489,  0.37758789],
        [-0.24962329,  0.60586089,  1.48144684,  0.33171584,  1.06744775],
        [-1.47755598,  1.200496  ,  0.53757848,  0.47831703, -0.87701125]]])

In [24]:
c = np.random.randint(100,200,(3,4))

In [25]:
c

array([[185, 163, 101, 137],
       [103, 186, 124, 139],
       [154, 126, 149, 195]])

In [26]:
np.random.seed(10)

In [27]:
np.random.randint(100,200,(3,4))

array([[109, 115, 164, 128],
       [189, 193, 129, 108],
       [173, 100, 140, 136]])

In [28]:
np.random.seed(10)

In [29]:
np.random.randint(100,200,(3,4))

array([[109, 115, 164, 128],
       [189, 193, 129, 108],
       [173, 100, 140, 136]])

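|Function |Description|
|:---:|:----|
|shuffle(a)| Randomly reorders array a along its first axis (axis 0), modifying a in place|
|permutation(a) |Returns a new array shuffled along the first axis (axis 0) of a, leaving a unchanged|
|choice(a[,size,replace,p]) |Draws elements from the one-dimensional array a with probabilities p to form a new array of shape size; replace controls whether an element may be drawn more than once and defaults to True|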

In [30]:
a = np.random.randint(100,200,(3,4))

In [31]:
a

array([[116, 111, 154, 188],
       [162, 133, 172, 178],
       [149, 151, 154, 177]])

In [32]:
np.random.shuffle(a)

In [33]:
a

array([[116, 111, 154, 188],
       [149, 151, 154, 177],
       [162, 133, 172, 178]])

In [34]:
np.random.shuffle(a)

In [35]:
a

array([[162, 133, 172, 178],
       [116, 111, 154, 188],
       [149, 151, 154, 177]])

In [42]:
a = np.random.randint(100,200,(3,4))
a

array([[188, 111, 117, 146],
       [107, 175, 128, 133],
       [184, 196, 188, 144]])

In [43]:
np.random.permutation(a)

array([[184, 196, 188, 144],
       [188, 111, 117, 146],
       [107, 175, 128, 133]])

In [44]:
a

array([[188, 111, 117, 146],
       [107, 175, 128, 133],
       [184, 196, 188, 144]])
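
The difference between the two functions can be checked programmatically. This is a small sketch with an illustrative array and seed (both assumptions): `permutation` returns a shuffled copy, while `shuffle` works in place and returns `None`. In both cases whole rows are moved, so the contents of each row stay intact.

```python
import numpy as np

np.random.seed(0)                 # illustrative seed for repeatability
a = np.arange(12).reshape(3, 4)
original = a.copy()

p = np.random.permutation(a)      # returns a shuffled copy
unchanged = np.array_equal(a, original)

result = np.random.shuffle(a)     # shuffles in place and returns None

# Rows are moved as whole units: the set of rows is the same before and after.
rows_before = sorted(original.tolist())
rows_after = sorted(a.tolist())

print(unchanged, result is None, rows_before == rows_after)
```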

In [45]:
b = np.random.randint(100,200,(8,))

In [46]:
b

array([171, 188, 188, 150, 154, 134, 115, 177])

In [48]:
np.random.choice(b,(3, 2))

array([[171, 177],
       [134, 115],
       [115, 134]])

In [49]:
np.random.choice(b,(3, 2), replace=False)

array([[188, 177],
       [134, 154],
       [171, 150]])

In [50]:
np.random.choice(b,(3, 2), p = b/np.sum(b))

array([[150, 188],
       [154, 188],
       [188, 188]])
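
To make the role of `p` and `replace` more concrete, here is a rough sketch under assumed values (the array `values`, the seed, and the sample size are illustrative). The probabilities must sum to 1, which is why the example above divides by `np.sum(b)`; larger entries are then drawn more often.

```python
import numpy as np

np.random.seed(0)                         # illustrative seed
values = np.array([1, 2, 3, 4])           # illustrative values
p = values / values.sum()                 # weights normalized to sum to 1

sample = np.random.choice(values, size=10000, p=p)

# Empirical frequencies should be close to p for a sample this large.
freq = np.array([(sample == v).mean() for v in values])

# Without replacement, every element can appear at most once.
unique_draw = np.random.choice(values, size=4, replace=False)

print(p)
print(freq)
```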

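|Function |Description|
|:---:|:---|
|uniform(low,high,size)| Creates an array drawn from a uniform distribution; low is the lower bound, high the upper bound, size the shape|
|normal(loc,scale,size)| Creates an array drawn from a normal distribution; loc is the mean, scale the standard deviation, size the shape|
|poisson(lam,size) |Creates an array drawn from a Poisson distribution; lam is the rate at which the random event occurs, size the shape|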

In [51]:
u = np.random.uniform(0, 10,(3,4))

In [52]:
u

array([[ 6.37951564,  3.72519952,  0.02406761,  5.48816356],
       [ 1.26971841,  0.79792681,  2.35038596,  6.59964947],
       [ 2.14953192,  2.03046616,  3.82865111,  2.24872802]])

In [53]:
n = np.random.normal(10, 5, (3,4))

In [54]:
n

array([[ 11.02692287,   5.23319662,  11.60758976,   2.39530663],
       [ -0.80726459,  11.72656647,  14.39484692,  12.81381503],
       [ 11.44217668,   4.96759248,  10.2698921 ,   0.86638368]])
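
The table also lists `poisson`, which the examples above do not cover. A minimal sketch follows; the rate `lam = 4.0`, the seed, and the sample sizes are assumptions picked for illustration. For a Poisson distribution both the mean and the variance equal `lam`, which a large sample should reflect approximately.

```python
import numpy as np

np.random.seed(0)                            # illustrative seed
lam = 4.0                                    # illustrative event rate

small = np.random.poisson(lam, (3, 4))       # small array, like the examples above
big = np.random.poisson(lam, 100000)         # large sample to check the statistics

print(small)
print(big.mean(), big.var())                 # both should be close to lam
```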

## NumPy statistical functions

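|Function| Description|
|:----:|:---|
|sum(a, axis=None) |Sum of the elements of array a along the given axis; axis is an integer or a tuple|
|mean(a, axis=None) |Arithmetic mean of the elements of array a along the given axis; axis is an integer or a tuple|
|average(a,axis=None,weights=None) |Weighted average of the elements of array a along the given axis|
|std(a, axis=None) |Standard deviation of the elements of array a along the given axis|
|var(a, axis=None) |Variance of the elements of array a along the given axis|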

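> axis=None is the standard default for the statistical functions: the result is computed over all elements of the array.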

In [61]:
a = np.arange(15).reshape(3, 5)
a

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

In [62]:
np.sum(a)

105

In [64]:
np.sum(a, axis = 1)

array([10, 35, 60])

In [65]:
np.mean(a, axis = 1)

array([  2.,   7.,  12.])

In [66]:
np.mean(a, axis = 0)

array([ 5.,  6.,  7.,  8.,  9.])

In [68]:
np.average(a, axis=0, weights=[10, 5, 1])  # third column: (2*10 + 7*5 + 12*1)/(10+5+1) = 4.1875

array([ 2.1875,  3.1875,  4.1875,  5.1875,  6.1875])
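
The weighted average above can be reproduced by hand, which also makes the role of `weights` explicit. The sketch below reuses the same array and weights; the names `manual`, `builtin`, and `var_manual` are illustrative. It also checks the relationship between `var` and `std`, which the table lists but the examples do not exercise.

```python
import numpy as np

a = np.arange(15).reshape(3, 5)
w = np.array([10, 5, 1])

# Weighted average along axis 0: each row is scaled by its weight,
# the rows are summed, and the result is divided by the total weight.
manual = (a * w[:, None]).sum(axis=0) / w.sum()
builtin = np.average(a, axis=0, weights=w)

# Variance is the mean squared deviation from the mean;
# the standard deviation is its square root.
var_manual = ((a - a.mean()) ** 2).mean()

print(manual)
print(var_manual, np.var(a), np.std(a))
```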

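|Function| Description|
|:--:|:---|
|min(a) max(a) |Minimum and maximum of the elements of array a|
|argmin(a) argmax(a) |Index of the minimum and maximum element of array a, counted in the flattened (one-dimensional) array|
|unravel_index(index, shape) |Converts the one-dimensional index into a multidimensional index for the given shape|
|ptp(a) |Difference between the maximum and minimum of the elements of array a|
|median(a) |Median of the elements of array a|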

In [69]:
b = np.arange(15,0,-1).reshape(3, 5)

In [70]:
b

array([[15, 14, 13, 12, 11],
       [10,  9,  8,  7,  6],
       [ 5,  4,  3,  2,  1]])

In [71]:
np.max(b)

15

In [72]:
np.argmax(b)

0

In [73]:
np.unravel_index(np.argmax(b), b.shape)

(0, 0)

In [74]:
np.ptp(b)

14

In [75]:
np.median(b)

8.0
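
The examples above use `argmax`; the same pattern applies to `argmin`. The sketch below rebuilds the array `b` from above and shows that the unraveled index points at the minimum. The `axis` argument of `argmax` is an addition not mentioned in the table, included here as a side note.

```python
import numpy as np

b = np.arange(15, 0, -1).reshape(3, 5)

flat_idx = np.argmin(b)                        # index into the flattened array
idx = np.unravel_index(flat_idx, b.shape)      # (row, column) tuple

# argmax/argmin also accept an axis, giving one index per row or column.
per_row = np.argmax(b, axis=1)

print(flat_idx, idx, b[idx])
print(per_row)
```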

## NumPy gradient function

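|Function |Description|
|:---:|:--|
|np.gradient(f)| Computes the gradient of the elements of array f; when f is multidimensional, returns the gradient along each dimension|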

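> Gradient: the rate of change between consecutive values, i.e. the slope.  
> Given the Y values a, b, c at three consecutive X coordinates, the gradient at b is (c-a)/2. At the two ends of the array, where only one neighbor exists, the difference between the adjacent pair of values is used instead.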

In [76]:
a = np.random.randint(0, 20, (5))

In [77]:
a

array([14, 17, 13,  9,  1])

In [78]:
np.gradient(a)

array([ 3. , -0.5, -4. , -6. , -8. ])
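
The output above can be reproduced by hand from the definition. This is a minimal sketch that hard-codes the random array shown above and assumes unit spacing between samples, which is the default for `np.gradient`.

```python
import numpy as np

a = np.array([14, 17, 13, 9, 1], dtype=float)

g = np.empty_like(a)
g[1:-1] = (a[2:] - a[:-2]) / 2   # interior points: (c - a) / 2
g[0] = a[1] - a[0]               # first element: difference to the next value
g[-1] = a[-1] - a[-2]            # last element: difference to the previous value

print(g)
print(np.gradient(a))
```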

In [79]:
b = np.random.randint(0, 20, (3, 5))

In [80]:
b

array([[ 4,  7,  7,  9,  7],
       [ 0,  3,  9, 12,  4],
       [ 6, 14, 10,  9,  2]])

In [81]:
np.gradient(b)

[array([[ -4. ,  -4. ,   2. ,   3. ,  -3. ],
        [  1. ,   3.5,   1.5,   0. ,  -2.5],
        [  6. ,  11. ,   1. ,  -3. ,  -2. ]]),
 array([[ 3. ,  1.5,  1. ,  0. , -2. ],
        [ 3. ,  4.5,  4.5, -2.5, -8. ],
        [ 8. ,  2. , -2.5, -4. , -7. ]])]
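
For a two-dimensional input the result contains one array per axis: the first holds the gradient down the rows (axis 0), the second the gradient across the columns (axis 1). The sketch below hard-codes the array `b` shown above; the names `g_rows`, `g_cols`, and `g0` are illustrative. Passing `axis` restricts the computation to a single direction.

```python
import numpy as np

b = np.array([[4, 7, 7, 9, 7],
              [0, 3, 9, 12, 4],
              [6, 14, 10, 9, 2]])

g_rows, g_cols = np.gradient(b)      # one array per axis
g0 = np.gradient(b, axis=0)          # only the gradient along axis 0

print(g_rows[1])   # middle row, differences between the rows above and below
print(g_cols[0])   # first row, differences between left and right neighbors
```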