# Introduction to NumPy

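NumPy is a powerful toolkit for performing mathematical operations on arrays of numbers. It is faster than operating on plain Python lists, and it can also handle multi-dimensional arrays.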

For more help, see:

 - http://wiki.scipy.org/Tentative_NumPy_Tutorial
 - http://docs.scipy.org/doc/numpy/reference/

> SciPy (pronounced “Sigh Pie”) is a Python-based ecosystem of open-source software for mathematics, science, and engineering.
>
>  *http://www.scipy.org/*

So NumPy, built around the performance-optimized ndarray, is one part of the larger Python data-science ecosystem.

The ecosystem includes these core packages:

<table>
<tr>
    <td style="background:Lavender;"><img src="http://www.scipy.org/_static/images/numpylogo_med.png"  style="width:50px;height:50px;" /></td>
    <td style="background:Lavender;"><h4>NumPy</h4> Base N-dimensional array package </td>
    <td><img src="http://www.scipy.org/_static/images/scipy_med.png" style="width:50px;height:50px;" /></td>
    <td><h4>SciPy</h4> Fundamental library for scientific computing </td>
    <td><img src="http://www.scipy.org/_static/images/matplotlib_med.png" style="width:50px;height:50px;" /></td>
    <td><h4>Matplotlib</h4> Comprehensive 2D Plotting </td>
</tr>
<tr>
    <td><img src="http://www.scipy.org/_static/images/ipython.png" style="width:50px;height:50px;" /></td>
    <td><h4>IPython</h4> Enhanced Interactive Console </td>
    <td><img src="http://www.scipy.org/_static/images/sympy_logo.png" style="width:50px;height:50px;" /></td>
    <td><h4>SymPy</h4> Symbolic mathematics </td>
    <td><img src="http://www.scipy.org/_static/images/pandas_badge2.jpg" style="width:50px;height:50px;" /></td>
    <td><h4>Pandas</h4> Data structures &amp; analysis </td>
</tr>
</table>





## Loading the NumPy library

### Import numpy library as np
The short alias makes code easier to write, and importing NumPy this way is a near-universal convention.

In [1]:
import numpy as np

## Working with ndarray

Use np.arange to generate an ndarray.

#### np.arange([start,] stop[, step,], dtype=None)

In [2]:
np.arange?

In [3]:
np.arange(10)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [4]:
np.arange(1,10)

array([1, 2, 3, 4, 5, 6, 7, 8, 9])

In [5]:
np.arange(1,10, 0.5)

array([ 1. ,  1.5,  2. ,  2.5,  3. ,  3.5,  4. ,  4.5,  5. ,  5.5,  6. ,
        6.5,  7. ,  7.5,  8. ,  8.5,  9. ,  9.5])

In [6]:
np.arange(1,10, 3)

array([1, 4, 7])

In [7]:
np.arange(1,10, 2, dtype=np.float64)

array([ 1.,  3.,  5.,  7.,  9.])
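The calls above suggest two rules: `stop` is never included, and the number of elements is `ceil((stop - start) / step)`. A minimal sketch that checks both, with illustrative variable names:

```python
import math
import numpy as np

start, stop, step = 1, 10, 3
values = np.arange(start, stop, step)

# stop is excluded, so the length is ceil((stop - start) / step)
expected_len = math.ceil((stop - start) / step)
print(values, len(values) == expected_len)

# A fractional step produces a floating-point array
halves = np.arange(1, 10, 0.5)
print(halves.dtype, halves[-1])
```

For non-integer steps, the NumPy documentation recommends np.linspace, because floating-point rounding can make the length of an arange result hard to predict.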

### Exploring an ndarray

In [2]:
ds = np.arange(1,10,2)
ds.ndim  # number of dimensions

1

In [8]:
ds.shape  # another way to inspect the dimensions: the length along each axis

(5,)

In [9]:
ds.size  # total number of elements

5

In [10]:
ds.dtype  # data type of the elements

dtype('int32')

In [11]:
ds.itemsize  # memory used by one element, in bytes

4

In [12]:
x = ds.data  # buffer object over the data stored in memory
x

<memory at 0x000002076F715A08>

In [13]:
list(x)  # read the elements back out of the buffer

[1, 3, 5, 7, 9]

In [14]:
# total number of bytes used by the elements
ds.size * ds.itemsize

20
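The same attributes can be read from a multi-dimensional array. The sketch below uses a hypothetical 3x4 array and sets the dtype explicitly, because the default integer type (the `int32` shown above) depends on the platform and NumPy version. `nbytes` is the built-in equivalent of `size * itemsize`.

```python
import numpy as np

# dtype is explicit because the default integer type is platform dependent
grid = np.arange(12, dtype=np.int32).reshape(3, 4)

print(grid.ndim)      # 2
print(grid.shape)     # (3, 4)
print(grid.size)      # 12
print(grid.itemsize)  # 4
print(grid.nbytes)    # 48
```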

## Why use NumPy?

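We will create two sequences of numbers, perform some basic arithmetic on them, and compare how long this takes with plain Python and with NumPy.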

### Basic operations

In [3]:
python_list_1 = list(range(1, 1000))
python_list_2 = list(range(1, 1000))

In [4]:
%%capture timeit_python
%%timeit
# Regular Python
[(x + y) for x, y in zip(python_list_1, python_list_2)]
[(x - y) for x, y in zip(python_list_1, python_list_2)]
[(x * y) for x, y in zip(python_list_1, python_list_2)]
[(x / y) for x, y in zip(python_list_1, python_list_2)];

In [5]:
print(timeit_python)

1000 loops, best of 3: 330 us per loop



In [6]:
numpy_list_1=np.arange(1,1000)
numpy_list_2=np.arange(1,1000)

In [7]:
%%capture timeit_numpy
%%timeit
# NumPy
numpy_list_1 + numpy_list_2
numpy_list_1 - numpy_list_2
numpy_list_1 * numpy_list_2
numpy_list_1 / numpy_list_2

In [8]:
print(timeit_numpy)

The slowest run took 42.85 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 8.06 us per loop
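In the timings recorded above, 330 us against 8.06 us is roughly a 40-fold difference on that machine. Outside IPython, a similar comparison can be sketched with the standard-library `timeit` module. The helper names below are illustrative, only addition is measured, and the absolute numbers will vary from machine to machine.

```python
import timeit
import numpy as np

python_a = list(range(1, 1000))
python_b = list(range(1, 1000))
numpy_a = np.arange(1, 1000)
numpy_b = np.arange(1, 1000)

def add_lists():
    # element-wise addition with a list comprehension
    return [x + y for x, y in zip(python_a, python_b)]

def add_arrays():
    # element-wise addition in a single vectorized call
    return numpy_a + numpy_b

# Both approaches must produce the same values
assert add_lists() == add_arrays().tolist()

list_time = timeit.timeit(add_lists, number=1000)
array_time = timeit.timeit(add_arrays, number=1000)
print(f"lists: {list_time:.4f} s, arrays: {array_time:.4f} s")
```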



## Commonly used NumPy functions

## Creating ndarrays

### array(object, dtype=None, copy=True, order=None, subok=False, ndmin=0)

```
Parameters
----------
object : array_like
    An array, any object exposing the array interface, an
    object whose __array__ method returns an array, or any
    (nested) sequence.
dtype : data-type, optional
    The desired data-type for the array.  If not given, then
    the type will be determined as the minimum type required
    to hold the objects in the sequence.  This argument can only
    be used to 'upcast' the array.  For downcasting, use the
    .astype(t) method.
copy : bool, optional
    If true (default), then the object is copied.  Otherwise, a copy
    will only be made if __array__ returns a copy, if obj is a
    nested sequence, or if a copy is needed to satisfy any of the other
    requirements (`dtype`, `order`, etc.).
order : {'C', 'F', 'A'}, optional
    Specify the order of the array.  If order is 'C' (default), then the
    array will be in C-contiguous order (last-index varies the
    fastest).  If order is 'F', then the returned array
    will be in Fortran-contiguous order (first-index varies the
    fastest).  If order is 'A', then the returned array may
    be in any order (either C-, Fortran-contiguous, or even
    discontiguous).
subok : bool, optional
    If True, then sub-classes will be passed-through, otherwise
    the returned array will be forced to be a base-class array (default).
ndmin : int, optional
    Specifies the minimum number of dimensions that the resulting
    array should have.  Ones will be pre-pended to the shape as
    needed to meet this requirement.
```

In [28]:
np.array([1,2,3,4,5])   

array([1, 2, 3, 4, 5])

#### Multi-dimensional array

In [29]:
np.array([[1,2],[3,4],[5,6]])

array([[1, 2],
       [3, 4],
       [5, 6]])
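The `dtype` and `ndmin` parameters described in the docstring can be tried on small inputs. This is a minimal sketch with made-up values. It follows the docstring's advice and uses `.astype` for the downcast.

```python
import numpy as np

as_float = np.array([1, 2, 3], dtype=np.float64)  # upcast ints to floats
print(as_float)       # [1. 2. 3.]

as_int = np.array([1.7, 2.2, 3.9]).astype(np.int64)  # downcast truncates toward zero
print(as_int)         # [1 2 3]

row = np.array([1, 2, 3], ndmin=2)  # an axis of length 1 is prepended to the shape
print(row.shape)      # (1, 3)
```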

### zeros(shape, dtype=float, order='C')

```
Parameters
----------
shape : int or sequence of ints
    Shape of the new array, e.g., ``(2, 3)`` or ``2``.
dtype : data-type, optional
    The desired data-type for the array, e.g., `numpy.int8`.  Default is
    `numpy.float64`.
order : {'C', 'F'}, optional
    Whether to store multidimensional data in C- or Fortran-contiguous
    (row- or column-wise) order in memory.
```

In [13]:
np.zeros((3,3))

array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])

In [11]:
np.zeros((3,5), dtype=np.int64)

array([[0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0]], dtype=int64)
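The `order` parameter changes how the values are laid out in memory, not the values or the shape. One way to see this is to compare the contiguity flags and strides of two otherwise identical arrays. The sketch assumes the default `float64` dtype, so each element takes 8 bytes.

```python
import numpy as np

c_order = np.zeros((2, 3), order='C')  # rows stored one after another
f_order = np.zeros((2, 3), order='F')  # columns stored one after another

# Same shape and values, different memory layout
print(c_order.flags['C_CONTIGUOUS'], c_order.strides)  # True (24, 8)
print(f_order.flags['F_CONTIGUOUS'], f_order.strides)  # True (8, 16)
print(np.array_equal(c_order, f_order))                # True
```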

### np.linspace(start, stop, num=50, endpoint=True, retstep=False)

```
Parameters
----------
start : scalar
    The starting value of the sequence.
stop : scalar
    The end value of the sequence, unless `endpoint` is set to False.
    In that case, the sequence consists of all but the last of ``num + 1``
    evenly spaced samples, so that `stop` is excluded.  Note that the step
    size changes when `endpoint` is False.
num : int, optional
    Number of samples to generate. Default is 50.
endpoint : bool, optional
    If True, `stop` is the last sample. Otherwise, it is not included.
    Default is True.
retstep : bool, optional
    If True, return (`samples`, `step`), where `step` is the spacing
    between samples.
```

In [32]:
np.linspace(1,5)

array([ 1.        ,  1.08163265,  1.16326531,  1.24489796,  1.32653061,
        1.40816327,  1.48979592,  1.57142857,  1.65306122,  1.73469388,
        1.81632653,  1.89795918,  1.97959184,  2.06122449,  2.14285714,
        2.2244898 ,  2.30612245,  2.3877551 ,  2.46938776,  2.55102041,
        2.63265306,  2.71428571,  2.79591837,  2.87755102,  2.95918367,
        3.04081633,  3.12244898,  3.20408163,  3.28571429,  3.36734694,
        3.44897959,  3.53061224,  3.6122449 ,  3.69387755,  3.7755102 ,
        3.85714286,  3.93877551,  4.02040816,  4.10204082,  4.18367347,
        4.26530612,  4.34693878,  4.42857143,  4.51020408,  4.59183673,
        4.67346939,  4.75510204,  4.83673469,  4.91836735,  5.        ])

In [33]:
np.linspace(0,2,num=4)

array([ 0.        ,  0.66666667,  1.33333333,  2.        ])

In [34]:
np.linspace(0,2,num=4,endpoint=False)

array([ 0. ,  0.5,  1. ,  1.5])
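The two results above differ because the step size changes with `endpoint`. Passing `retstep=True` returns the spacing alongside the samples, which makes the difference explicit. A short sketch using the same arguments as the cells above:

```python
import numpy as np

closed, closed_step = np.linspace(0, 2, num=4, retstep=True)
half_open, open_step = np.linspace(0, 2, num=4, endpoint=False, retstep=True)

print(closed_step)  # (stop - start) / (num - 1)
print(open_step)    # (stop - start) / num
```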

### random_sample(size=None)

```
Parameters
----------
size : int or tuple of ints, optional
    Defines the shape of the returned array of random floats. If None
    (the default), returns a single float.
```

In [35]:
np.random.random((2,3))

array([[ 0.54853722,  0.66725888,  0.33680362],
       [ 0.32335562,  0.24570423,  0.6398063 ]])

In [36]:
np.random.random_sample((2,3))

array([[ 0.7731598 ,  0.35296059,  0.58136254],
       [ 0.68026162,  0.17260933,  0.87473458]])
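The two cells above call the same generator, since `np.random.random` is an alias of `np.random.random_sample`. Both draw floats from the half-open interval [0.0, 1.0). The sketch below fixes a seed to show this. The seed value is an illustrative choice and was not used in the notebook.

```python
import numpy as np

np.random.seed(0)  # illustrative seed, not part of the original notebook
first = np.random.random_sample((2, 3))

np.random.seed(0)  # reset so the second call sees the same random stream
second = np.random.random((2, 3))

print(first.shape)                            # (2, 3)
print(np.array_equal(first, second))          # True
print(first.min() >= 0.0, first.max() < 1.0)  # True True
```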

## Statistical analysis with NumPy

In [37]:
data_set = np.random.random((2,3))
data_set

array([[ 0.51930738,  0.58462712,  0.99079741],
       [ 0.06460546,  0.35145993,  0.76202154]])

### np.max(a, axis=None, out=None, keepdims=False)

```
Parameters
----------
a : array_like
    Input data.
axis : int, optional
    Axis along which to operate.  By default, flattened input is used.
out : ndarray, optional
    Alternative output array in which to place the result.  Must
    be of the same shape and buffer length as the expected output.
    See `doc.ufuncs` (Section "Output arguments") for more details.
keepdims : bool, optional
    If this is set to True, the axes which are reduced are left
    in the result as dimensions with size one. With this option,
    the result will broadcast correctly against the original `arr`.
```

In [38]:
np.max(data_set)

0.99079741386293785

In [39]:
np.max(data_set, axis=0)

array([ 0.51930738,  0.58462712,  0.99079741])

In [40]:
np.max(data_set, axis=1)

array([ 0.99079741,  0.76202154])
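Because `data_set` is random, the effect of `axis` and `keepdims` is easier to follow on fixed values. A small sketch with a made-up array: `axis=0` reduces down the columns, `axis=1` reduces across each row, and `keepdims=True` keeps the reduced axis so the result broadcasts against the input.

```python
import numpy as np

scores = np.array([[1, 8, 3],
                   [7, 2, 9]])

print(np.max(scores, axis=0))   # [7 8 9]
print(np.max(scores, axis=1))   # [8 9]

row_max = np.max(scores, axis=1, keepdims=True)
print(row_max.shape)            # (2, 1)

scaled = scores / row_max       # each row divided by its own maximum
print(scaled.max(axis=1))       # [1. 1.]
```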

### np.min(a, axis=None, out=None, keepdims=False)

In [41]:
np.min(data_set)

0.064605459463033088

### np.mean(a, axis=None, dtype=None, out=None, keepdims=False)

In [42]:
np.mean(data_set)

0.54546980564625047

### np.median(a, axis=None, out=None, overwrite_input=False)

In [43]:
np.median(data_set)

0.55196724742830794

### np.std(a, axis=None, dtype=None, out=None, ddof=0, keepdims=False)

In [44]:
np.std(data_set)

0.29334264736691923

### np.sum(a, axis=None, dtype=None, out=None, keepdims=False)

In [45]:
np.sum(data_set)

3.2728188338775026
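These functions accept the same `axis` argument as `np.max`, and `np.std` additionally takes `ddof`. The following sketch uses fixed, made-up values so the results can be checked by hand.

```python
import numpy as np

samples = np.array([[1.0, 2.0, 3.0],
                    [4.0, 5.0, 6.0]])

print(np.sum(samples, axis=0))    # [5. 7. 9.]
print(np.mean(samples, axis=1))   # [2. 5.]
print(np.median(samples))         # 3.5

population_std = np.std(samples)       # ddof=0: divides by N
sample_std = np.std(samples, ddof=1)   # ddof=1: divides by N - 1
print(population_std, sample_std)
```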

## Reshaping data

### np.reshape(a, newshape, order='C')

In [46]:
data_set

array([[ 0.51930738,  0.58462712,  0.99079741],
       [ 0.06460546,  0.35145993,  0.76202154]])

In [47]:
np.reshape(data_set, (3,2))

array([[ 0.51930738,  0.58462712],
       [ 0.99079741,  0.06460546],
       [ 0.35145993,  0.76202154]])

In [48]:
np.reshape(data_set, (6,1))

array([[ 0.51930738],
       [ 0.58462712],
       [ 0.99079741],
       [ 0.06460546],
       [ 0.35145993],
       [ 0.76202154]])

In [49]:
np.reshape(data_set, (6))

array([ 0.51930738,  0.58462712,  0.99079741,  0.06460546,  0.35145993,
        0.76202154])
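A few more properties of `np.reshape` show up clearly on a small integer array: one dimension can be given as `-1` and is then inferred, `order='F'` fills columns first, and a shape whose total size does not match raises `ValueError`. A minimal sketch with illustrative values:

```python
import numpy as np

values = np.arange(6)

inferred = np.reshape(values, (2, -1))   # -1 lets NumPy infer the length
print(inferred.shape)                    # (2, 3)

column_first = np.reshape(values, (3, 2), order='F')
print(column_first.tolist())             # [[0, 3], [1, 4], [2, 5]]

try:
    np.reshape(values, (4, 2))  # 8 slots for 6 elements
except ValueError as err:
    print("reshape failed:", err)
```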

### np.ravel(a, order='C')

In [50]:
data_set

array([[ 0.51930738,  0.58462712,  0.99079741],
       [ 0.06460546,  0.35145993,  0.76202154]])

In [51]:
np.ravel(data_set)   # flatten the array to one dimension

array([ 0.51930738,  0.58462712,  0.99079741,  0.06460546,  0.35145993,
        0.76202154])

In [52]:
np.ravel?
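`np.ravel` returns a view of the original data whenever possible, so writing to the flattened result can change the source array. The sketch below contrasts it with the `flatten` method, which always copies. The array is a made-up example.

```python
import numpy as np

matrix = np.arange(6).reshape(2, 3)

flat_view = np.ravel(matrix)    # a view, since the data is already contiguous
flat_copy = matrix.flatten()    # always an independent copy

flat_view[0] = 100
print(matrix[0, 0])                          # 100
print(flat_copy[0])                          # 0
print(np.shares_memory(matrix, flat_view))   # True
print(np.shares_memory(matrix, flat_copy))   # False
```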

### Slicing

In [54]:
data_set = np.random.random((5,10))
data_set

array([[ 0.59418707,  0.3485941 ,  0.73377769,  0.10649973,  0.11470625,
         0.58037787,  0.10049875,  0.34082015,  0.91530599,  0.16553292],
       [ 0.2117392 ,  0.48241083,  0.13690005,  0.0530091 ,  0.15767604,
         0.44046033,  0.46562058,  0.65375921,  0.32870468,  0.91519441],
       [ 0.6596611 ,  0.70576365,  0.26100397,  0.17555837,  0.93397552,
         0.66885039,  0.28742459,  0.81639524,  0.05278959,  0.83406543],
       [ 0.42488971,  0.12220396,  0.08628983,  0.87785502,  0.26907759,
         0.25024491,  0.86436533,  0.22141363,  0.86238346,  0.62504678],
       [ 0.01765955,  0.48565731,  0.92533196,  0.71136182,  0.90043605,
         0.91703742,  0.74368159,  0.36985674,  0.06553571,  0.78600482]])

In [55]:
data_set[1]

array([ 0.2117392 ,  0.48241083,  0.13690005,  0.0530091 ,  0.15767604,
        0.44046033,  0.46562058,  0.65375921,  0.32870468,  0.91519441])

In [56]:
data_set[1][0]

0.2117391998914816

In [57]:
data_set[1,0]

0.2117391998914816

#### Slicing a range

In [58]:
data_set[2:4]

array([[ 0.6596611 ,  0.70576365,  0.26100397,  0.17555837,  0.93397552,
         0.66885039,  0.28742459,  0.81639524,  0.05278959,  0.83406543],
       [ 0.42488971,  0.12220396,  0.08628983,  0.87785502,  0.26907759,
         0.25024491,  0.86436533,  0.22141363,  0.86238346,  0.62504678]])

In [59]:
data_set[2:4,0]

array([ 0.6596611 ,  0.42488971])

In [60]:
data_set[2:4,0:2]

array([[ 0.6596611 ,  0.70576365],
       [ 0.42488971,  0.12220396]])

In [61]:
data_set[:,0]

array([ 0.59418707,  0.2117392 ,  0.6596611 ,  0.42488971,  0.01765955])
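The same indexing patterns are easier to verify on an array whose values encode their position. In this sketch the hypothetical `table` holds `10 * row + column` in each cell.

```python
import numpy as np

table = np.arange(50).reshape(5, 10)  # element value = 10 * row + column

print(table[1, 0] == table[1][0])   # True; both forms select row 1, column 0
print(table[2:4].shape)             # (2, 10); rows 2 and 3
print(table[2:4, 0].tolist())       # [20, 30]
print(table[2:4, 0:2].tolist())     # [[20, 21], [30, 31]]
print(table[:, 0].tolist())         # [0, 10, 20, 30, 40]
```

The tuple form `table[1, 0]` indexes in one step, while `table[1][0]` first builds the intermediate row `table[1]` and then indexes into it.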

#### Stepping

In [62]:
data_set[2:4:1]

array([[ 0.6596611 ,  0.70576365,  0.26100397,  0.17555837,  0.93397552,
         0.66885039,  0.28742459,  0.81639524,  0.05278959,  0.83406543],
       [ 0.42488971,  0.12220396,  0.08628983,  0.87785502,  0.26907759,
         0.25024491,  0.86436533,  0.22141363,  0.86238346,  0.62504678]])

In [74]:
data_set[::]

array([[ 0.59418707,  0.3485941 ,  0.73377769,  0.10649973,  0.11470625,
         0.58037787,  0.10049875,  0.34082015,  0.91530599,  0.16553292],
       [ 0.2117392 ,  0.48241083,  0.13690005,  0.0530091 ,  0.15767604,
         0.44046033,  0.46562058,  0.65375921,  0.32870468,  0.91519441],
       [ 0.6596611 ,  0.70576365,  0.26100397,  0.17555837,  0.93397552,
         0.66885039,  0.28742459,  0.81639524,  0.05278959,  0.83406543],
       [ 0.42488971,  0.12220396,  0.08628983,  0.87785502,  0.26907759,
         0.25024491,  0.86436533,  0.22141363,  0.86238346,  0.62504678],
       [ 0.01765955,  0.48565731,  0.92533196,  0.71136182,  0.90043605,
         0.91703742,  0.74368159,  0.36985674,  0.06553571,  0.78600482]])

In [63]:
data_set[::2]

array([[ 0.59418707,  0.3485941 ,  0.73377769,  0.10649973,  0.11470625,
         0.58037787,  0.10049875,  0.34082015,  0.91530599,  0.16553292],
       [ 0.6596611 ,  0.70576365,  0.26100397,  0.17555837,  0.93397552,
         0.66885039,  0.28742459,  0.81639524,  0.05278959,  0.83406543],
       [ 0.01765955,  0.48565731,  0.92533196,  0.71136182,  0.90043605,
         0.91703742,  0.74368159,  0.36985674,  0.06553571,  0.78600482]])

In [64]:
data_set[2:4]

array([[ 0.6596611 ,  0.70576365,  0.26100397,  0.17555837,  0.93397552,
         0.66885039,  0.28742459,  0.81639524,  0.05278959,  0.83406543],
       [ 0.42488971,  0.12220396,  0.08628983,  0.87785502,  0.26907759,
         0.25024491,  0.86436533,  0.22141363,  0.86238346,  0.62504678]])

In [65]:
data_set[2:4,::2]

array([[ 0.6596611 ,  0.26100397,  0.93397552,  0.28742459,  0.05278959],
       [ 0.42488971,  0.08628983,  0.26907759,  0.86436533,  0.86238346]])

# Thanks