# Computation on NumPy Arrays: Universal Functions

## The Slowness of Loops

> The relative sluggishness of Python generally manifests itself in situations where many small operations are being repeated, for instance looping over arrays to operate on each element.
For example, imagine we have an array of values and we'd like to compute the reciprocal of each.
A straightforward approach is the ``compute_reciprocals`` function defined in the first cell below.
This implementation probably feels fairly natural to someone from, say, a C or Java background. But if we measure its execution time for a large input, we see that this operation is very slow, perhaps surprisingly so. The second cell below benchmarks it with IPython's ``%timeit`` magic (discussed in [Profiling and Timing Code](01.07-Timing-and-Profiling.ipynb)):

In [None]:
import numpy as np
np.random.seed(0)

def compute_reciprocals(values):
    output = np.empty(len(values))
    for i in range(len(values)):
        output[i] = 1.0 / values[i]
    return output
        
values = np.random.randint(1, 10, size=5)
compute_reciprocals(values)

In [None]:
big_array = np.random.randint(1, 100, size=1000000)
%timeit compute_reciprocals(big_array)

> It takes several seconds to compute these million operations and to store the result!
When even cell phones have processing speeds measured in gigaflops (i.e., billions of numerical operations per second), this seems almost absurdly slow.
It turns out that the bottleneck here is not the operations themselves, but the type checking and function dispatch that CPython must do at each cycle of the loop.
Each time the reciprocal is computed, Python first examines the object's type and does a dynamic lookup of the correct function to use for that type.
If we were working in compiled code instead, this type specification would be known before the code executes and the result could be computed much more efficiently.

## Introducing UFuncs

> For many types of operations, NumPy provides a convenient interface into just this kind of statically typed, compiled routine. This is known as a *vectorized* operation.
You perform the operation on the array as a whole, and it is applied to each element.
The vectorized approach pushes the loop into the compiled layer that underlies NumPy, leading to much faster execution. The first cell below confirms that the loop and the vectorized expression give the same results; the second times the vectorized version on our big array, which completes orders of magnitude faster than the Python loop:

In [None]:
print(compute_reciprocals(values))
print(1.0 / values)

In [None]:
%timeit (1.0 / big_array)
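The ``%timeit`` magic only works inside IPython or Jupyter. As a rough sketch of the same comparison in plain Python, the standard-library ``timeit`` module can be used instead. The sample size and repeat count below are arbitrary choices made to keep the run short, not values from the text above, and the timings will vary by machine.

```python
import timeit

import numpy as np


def compute_reciprocals(values):
    """Element-by-element reciprocal using an explicit Python loop."""
    output = np.empty(len(values))
    for i in range(len(values)):
        output[i] = 1.0 / values[i]
    return output


rng = np.random.default_rng(0)
# Assumption: a smaller sample than big_array, chosen only to keep the run short.
sample = rng.integers(1, 100, size=50_000)

repeats = 3
loop_time = timeit.timeit(lambda: compute_reciprocals(sample), number=repeats) / repeats
ufunc_time = timeit.timeit(lambda: 1.0 / sample, number=repeats) / repeats

print(f"python loop: {loop_time:.6f} s per call")
print(f"ufunc:       {ufunc_time:.6f} s per call")

# Both approaches should produce the same numbers.
same_result = np.allclose(compute_reciprocals(sample), 1.0 / sample)
print("same result:", same_result)
```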

> Vectorized operations in NumPy are implemented via *ufuncs*, whose main purpose is to quickly execute repeated operations on values in NumPy arrays.
Ufuncs are extremely flexible: above we saw an operation between a scalar and an array, but we can also operate between two arrays. Ufunc operations are not limited to one-dimensional arrays either; they also act on multi-dimensional arrays, as the second cell below shows:

In [None]:
np.arange(5) / np.arange(1, 6)

In [5]:
x = np.arange(9).reshape((3, 3))
2 ** x


### Array arithmetic

> NumPy's ufuncs feel very natural to use because they make use of Python's native arithmetic operators.
The standard addition, subtraction, multiplication, and division can all be used, along with floor division:

In [None]:
x = np.arange(4)
print("x     =", x)
print("x + 5 =", x + 5)
print("x - 5 =", x - 5)
print("x * 2 =", x * 2)
print("x / 2 =", x / 2)
print("x // 2 =", x // 2)  # floor division

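> There is also a unary ufunc for negation, a ``**`` operator for exponentiation, and a ``%`` operator for modulus. These can be strung together however you wish, and the standard order of operations is respected. Each of these arithmetic operations is simply a convenient wrapper around a specific function built into NumPy; for example, the ``+`` operator is a wrapper for the ``add`` function: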

In [None]:
print("-x     = ", -x)
print("x ** 2 = ", x ** 2)
print("x % 2  = ", x % 2)

In [None]:
-(0.5*x + 1) ** 2

In [None]:
np.add(x, 2)

In [None]:
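import numpy as np
arr = np.array([1, 3, 5, 2, 6])
print(np.cos(arr))     # element-wise cosine (a ufunc)
print(np.median(arr))  # median, mean, and std are aggregations, not ufuncs
print(np.mean(arr))
print(np.std(arr))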

> The following table lists the arithmetic operators implemented in NumPy and their equivalent ufuncs. Additionally there are Boolean/bitwise operators; we will explore these in [Comparisons, Masks, and Boolean Logic](02.06-Boolean-Arrays-and-Masks.ipynb).

| Operator      | Equivalent ufunc    | Description                           |
|---------------|---------------------|---------------------------------------|
|``+``          |``np.add``           |Addition (e.g., ``1 + 1 = 2``)         |
|``-``          |``np.subtract``      |Subtraction (e.g., ``3 - 2 = 1``)      |
|``-``          |``np.negative``      |Unary negation (e.g., ``-2``)          |
|``*``          |``np.multiply``      |Multiplication (e.g., ``2 * 3 = 6``)   |
|``/``          |``np.divide``        |Division (e.g., ``3 / 2 = 1.5``)       |
|``//``         |``np.floor_divide``  |Floor division (e.g., ``3 // 2 = 1``)  |
|``**``         |``np.power``         |Exponentiation (e.g., ``2 ** 3 = 8``)  |
|``%``          |``np.mod``           |Modulus/remainder (e.g., ``9 % 4 = 1``)|
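As a quick check of the table, the sketch below applies each operator and its listed ufunc to the same pair of arrays and compares the results. The arrays ``x`` and ``y`` and the ``pairs`` dictionary are illustrative names chosen here.

```python
import numpy as np

x = np.arange(1, 5)          # [1 2 3 4]
y = np.array([2, 2, 3, 3])

# Each entry pairs the operator form with the explicit ufunc call.
pairs = {
    "+":       (x + y,  np.add(x, y)),
    "-":       (x - y,  np.subtract(x, y)),
    "unary -": (-x,     np.negative(x)),
    "*":       (x * y,  np.multiply(x, y)),
    "/":       (x / y,  np.divide(x, y)),
    "//":      (x // y, np.floor_divide(x, y)),
    "**":      (x ** y, np.power(x, y)),
    "%":       (x % y,  np.mod(x, y)),
}

for name, (via_operator, via_ufunc) in pairs.items():
    print(f"{name:8s} matches ufunc: {np.array_equal(via_operator, via_ufunc)}")
```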

### Absolute value

> Just as NumPy understands Python's built-in arithmetic operators, it also understands Python's built-in absolute value function. The corresponding NumPy ufunc is ``np.absolute``, which is also available under the alias ``np.abs``:

In [None]:
x = np.array([-2, -1, 0, 1, 2])
abs(x)

In [None]:
np.absolute(x)  # NumPy's ufunc

In [None]:
np.abs(x)  # np.abs is a shorter alias for the same ufunc

In [None]:
# This ufunc can also handle complex data, returning the magnitude of each value:
x = np.array([3 - 4j, 4 - 3j, 2 + 0j, 0 + 1j])
np.abs(x)
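To make the complex case concrete, the magnitude returned by ``np.abs`` can be compared against the Pythagorean formula applied to the real and imaginary parts. This is a small check, with ``z``, ``magnitude``, and ``by_hand`` as names chosen here.

```python
import numpy as np

z = np.array([3 - 4j, 4 - 3j, 2 + 0j, 0 + 1j])

magnitude = np.abs(z)
# Magnitude computed by hand from the real and imaginary parts.
by_hand = np.sqrt(z.real ** 2 + z.imag ** 2)

print(magnitude)
print("matches sqrt(re^2 + im^2):", np.allclose(magnitude, by_hand))
```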

### Trigonometric functions

> NumPy provides a large number of useful ufuncs, and some of the most useful for the data scientist are the trigonometric functions.
We'll start by defining an array of angles and computing some trigonometric functions on them; the inverse trigonometric functions are available as well:

In [None]:
theta = np.linspace(0, np.pi, 3)
print("theta      = ", theta)
print("sin(theta) = ", np.sin(theta)) # sine
print("cos(theta) = ", np.cos(theta)) # cosine
print("tan(theta) = ", np.tan(theta)) # tangent

In [None]:
x = [-1, 0, 1]
print("x         = ", x)
print("arcsin(x) = ", np.arcsin(x)) # inverse sine
print("arccos(x) = ", np.arccos(x)) # inverse cosine
print("arctan(x) = ", np.arctan(x)) # inverse tangent
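The trigonometric values are computed to within machine precision, so results that should be exactly zero may come out as tiny nonzero numbers. A minimal sketch of how one might compare such results with a tolerance instead of ``==`` (the variable names are chosen here for illustration):

```python
import numpy as np

theta = np.linspace(0, np.pi, 3)   # 0, pi/2, pi
sin_theta = np.sin(theta)

# sin(pi) is mathematically 0, but np.pi is only an approximation of pi,
# so the floating-point result is a very small nonzero number.
print("sin(pi) == 0:       ", sin_theta[2] == 0.0)
print("sin(pi) close to 0: ", np.isclose(sin_theta[2], 0.0))

# The inverse functions undo the forward ones on their principal range.
roundtrip = np.arcsin(np.sin(theta[:2]))
print("arcsin(sin(theta)) recovers theta:", np.allclose(roundtrip, theta[:2]))

# Angles given in degrees can be converted with np.deg2rad.
from_degrees = np.deg2rad([0, 90, 180])
print("deg2rad matches theta:", np.allclose(from_degrees, theta))
```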

### Exponents and logarithms

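> Another common type of operation available as NumPy ufuncs is the exponentials. The cells that follow also cover the logarithms (the inverses of the exponentials) and ``np.expm1`` and ``np.log1p``, specialized versions that maintain precision for very small inputs: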

In [None]:
x = [1, 2, 3]
print("x     =", x)
print("e^x   =", np.exp(x))
print("2^x   =", np.exp2(x))
print("3^x   =", np.power(3, x))

In [None]:
x = [1, 2, 4, 10]
print("x        =", x)
print("ln(x)    =", np.log(x))
print("log2(x)  =", np.log2(x))
print("log10(x) =", np.log10(x))

In [None]:
x = [0, 0.001, 0.01, 0.1]
print("exp(x) - 1 =", np.expm1(x))
print("log(1 + x) =", np.log1p(x))
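The benefit of ``np.expm1`` and ``np.log1p`` shows up when ``x`` is very small. The sketch below compares them with the naive expressions for a single tiny input; the value ``1e-10`` is an arbitrary choice for illustration. For such an input the true results are both extremely close to ``x`` itself, so the distance from ``x`` serves as a rough error measure.

```python
import numpy as np

x = 1e-10  # assumption: an arbitrary tiny input chosen for illustration

naive_exp = np.exp(x) - 1      # loses digits: exp(x) is rounded near 1.0 first
precise_exp = np.expm1(x)      # computes exp(x) - 1 without that rounding step

naive_log = np.log(1 + x)      # loses digits: 1 + x is rounded first
precise_log = np.log1p(x)      # computes log(1 + x) directly

print(f"exp(x) - 1 : {naive_exp:.17e}")
print(f"expm1(x)   : {precise_exp:.17e}")
print(f"log(1 + x) : {naive_log:.17e}")
print(f"log1p(x)   : {precise_log:.17e}")
```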

### Specialized ufuncs

> NumPy has many more ufuncs available, including hyperbolic trig functions, bitwise arithmetic, comparison operators, conversions from radians to degrees, rounding and remainders, and much more.
A look through the NumPy documentation reveals a lot of interesting functionality.

> Another excellent source for more specialized ufuncs is the submodule ``scipy.special``.
If you want to compute some obscure mathematical function on your data, chances are it is implemented in ``scipy.special``.
There are far too many functions to list them all, but the following snippets show a few that might come up in a statistics context:

In [None]:
from scipy import special

In [None]:
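# Gamma functions (generalized factorials) and related functions
x = [1, 5, 10]
print("gamma(x)     =", special.gamma(x)) # gamma function
print("ln|gamma(x)| =", special.gammaln(x)) # natural log of the gamma function
print("beta(x, 2)   =", special.beta(x, 2)) # beta function (Euler integral of the first kind)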

In [None]:
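# Error function (integral of the Gaussian), its complement, and its inverse
x = np.array([0, 0.3, 0.7, 1.0])
print("erf(x)  =", special.erf(x)) # error function
print("erfc(x) =", special.erfc(x)) # complementary error function
print("erfinv(x) =", special.erfinv(x)) # inverse error function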
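A few standard identities tie these special functions together, and checking them is one way to get a feel for what each one computes. The sketch below assumes SciPy is installed; the variable names are chosen here for illustration.

```python
import math

import numpy as np
from scipy import special

n = np.array([1, 5, 10])

# gamma(n) equals (n - 1)! for positive integers n.
factorials = np.array([math.factorial(k - 1) for k in n.tolist()], dtype=float)
gamma_ok = np.allclose(special.gamma(n), factorials)

# gammaln is the natural log of |gamma|, computed without overflow.
gammaln_ok = np.allclose(special.gammaln(n), np.log(special.gamma(n)))

# beta(a, b) = gamma(a) * gamma(b) / gamma(a + b).
beta_from_gamma = special.gamma(n) * special.gamma(2) / special.gamma(n + 2)
beta_ok = np.allclose(special.beta(n, 2), beta_from_gamma)

x = np.array([0, 0.3, 0.7, 1.0])
# erfc is the complement of erf, and erfinv inverts erf.
erfc_ok = np.allclose(special.erf(x) + special.erfc(x), 1.0)
erfinv_ok = np.allclose(special.erfinv(special.erf(x)), x)

print(gamma_ok, gammaln_ok, beta_ok, erfc_ok, erfinv_ok)
```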

## Advanced Ufunc Features

> Many NumPy users make use of ufuncs without ever learning their full set of features.
We'll outline a few specialized features of ufuncs here.

### Specifying output

> For large calculations, it is sometimes useful to be able to specify the array where the result of the calculation will be stored.
Rather than creating a temporary array, this can be used to write computation results directly to the memory location where you'd like them to be.
For all ufuncs, this can be done using the ``out`` argument of the function. The output can also be an array view, as in the second cell below, which writes the results to every other element of an array:

In [None]:
x = np.arange(5)
y = np.empty(5)
np.multiply(x, 10, out=y) # store the result directly in y
print(y)

In [None]:
y = np.zeros(10)
np.power(2, x, out=y[::2]) # store the result in every other element of y
print(y)

> If we had instead written ``y[::2] = 2 ** x``, this would have resulted in the creation of a temporary array to hold the results of ``2 ** x``, followed by a second operation copying those values into the ``y`` array.
This doesn't make much of a difference for such a small computation, but for very large arrays the memory savings from careful use of the ``out`` argument can be significant.
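One way to see that ``out`` really writes in place is to keep a reference to the target view and inspect what the ufunc returns. The following is a small sketch; ``target`` and ``result`` are names chosen here.

```python
import numpy as np

x = np.arange(5)
y = np.zeros(10)

target = y[::2]                       # a view onto every other element of y
result = np.power(2, x, out=target)   # the ufunc writes straight into the view

# The returned array is the very object passed as `out`,
# and it shares its memory with y, so y itself was updated.
print("returns the out array:", result is target)
print("shares memory with y: ", np.shares_memory(result, y))
print(y)
```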

### Aggregates

> For binary ufuncs, there are some interesting aggregates that can be computed directly from the object.
For example, if we'd like to *reduce* an array with a particular operation, we can use the ``reduce`` method of any ufunc. A reduce repeatedly applies a given operation to the elements of an array until only a single result remains. For example, calling ``reduce`` on the ``add`` ufunc returns the sum of all elements in the array, and calling it on ``multiply`` returns their product. If we'd like to store all the intermediate results of the computation, we can use ``accumulate`` instead:

In [None]:
x = np.arange(1, 6)
np.add.reduce(x)

In [None]:
np.multiply.reduce(x)  # product of all elements

In [None]:
np.add.accumulate(x)  # keeps the intermediate result of each step

In [None]:
np.multiply.accumulate(x)
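For these particular cases NumPy also has dedicated functions (``np.sum``, ``np.prod``, ``np.cumsum``, ``np.cumprod``) that give the same results. The sketch below checks that equivalence and also shows one difference worth knowing: on a multi-dimensional array, ``reduce`` works along axis 0 by default, whereas ``np.sum`` adds up every element. The array ``m`` is an illustrative example chosen here.

```python
import numpy as np

x = np.arange(1, 6)

# The ufunc methods agree with the dedicated aggregation functions.
print(np.add.reduce(x) == np.sum(x))
print(np.multiply.reduce(x) == np.prod(x))
print(np.array_equal(np.add.accumulate(x), np.cumsum(x)))
print(np.array_equal(np.multiply.accumulate(x), np.cumprod(x)))

m = np.arange(6).reshape(2, 3)   # [[0 1 2], [3 4 5]]

col_sums = np.add.reduce(m)             # default axis=0: one sum per column
row_sums = np.add.reduce(m, axis=1)     # one sum per row
total = np.add.reduce(m, axis=None)     # reduce over every element

print(col_sums, row_sums, total)
```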

### Outer products

> Finally, any binary ufunc can compute the output of all pairs of two different inputs using the ``outer`` method.
This allows you, in one line, to do things like create a multiplication table:

In [None]:
x = np.arange(1, 6)
np.multiply.outer(x, x)
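The ``outer`` method is available on other binary ufuncs as well, and for one-dimensional inputs it gives the same result as applying the operator to a column and a row. A brief sketch, with names chosen here for illustration:

```python
import numpy as np

x = np.arange(1, 6)

table = np.multiply.outer(x, x)
# The same table written with explicit column and row shapes.
via_broadcast = x[:, np.newaxis] * x[np.newaxis, :]

print(table.shape)
print("same as column * row:", np.array_equal(table, via_broadcast))

# Any binary ufunc has an outer method, e.g. an addition table.
sums = np.add.outer(x, x)
print(sums[0])   # first row: 1 + each element of x
```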

> The ``ufunc.at`` and ``ufunc.reduceat`` methods, which we'll explore in [Fancy Indexing](02.07-Fancy-Indexing.ipynb), are very helpful as well. Another extremely useful feature of ufuncs is the ability to operate between arrays of different sizes and shapes, a set of operations known as *broadcasting*.
This subject is important enough that we will devote a whole section to it (see [Computation on Arrays: Broadcasting](02.05-Computation-on-arrays-broadcasting.ipynb)).
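As a brief preview of ``ufunc.at`` and ``ufunc.reduceat``, here is a minimal sketch of what each does; the index arrays are arbitrary examples chosen here, and the Fancy Indexing section remains the place for the full discussion.

```python
import numpy as np

idx = np.array([0, 0, 1])

# With ordinary fancy-index assignment, a repeated index is only applied once.
a = np.zeros(3, dtype=int)
a[idx] += 1

# ufunc.at applies the operation once per index, repeats included.
b = np.zeros(3, dtype=int)
np.add.at(b, idx, 1)

print(a)
print(b)

# reduceat reduces over slices that start at the given offsets:
# here elements 0..3 and elements 4..7.
segments = np.add.reduceat(np.arange(8), [0, 4])
print(segments)
```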

### Data Types for ndarrays

In [11]:
arr1 = np.array([1, 2, 3], dtype=np.float64)
arr2 = np.array([1, 2, 3], dtype=np.int32)
arr1.dtype
arr2.dtype

dtype('int32')

In [12]:
arr = np.array([1, 2, 3, 4, 5])
arr.dtype
float_arr = arr.astype(np.float64)
float_arr.dtype

dtype('float64')

In [13]:
arr = np.array([3.7, -1.2, -2.6, 0.5, 12.9, 10.1])
arr
arr.astype(np.int32)

array([ 3, -1, -2,  0, 12, 10], dtype=int32)

In [14]:
numeric_strings = np.array(['1.25', '-9.6', '42'], dtype=np.bytes_)  # np.string_ was removed in NumPy 2.0; np.bytes_ is the equivalent
numeric_strings.astype(float)

array([ 1.25, -9.6 , 42.  ])

In [15]:
int_array = np.arange(10)
calibers = np.array([.22, .270, .357, .380, .44, .50], dtype=np.float64)
int_array.astype(calibers.dtype)

array([0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])

In [16]:
empty_uint32 = np.empty(8, dtype='u4')
empty_uint32

array([         0, 2952790016, 2834399368, 4026533885, 2834366466,
       1610614781,          0,     131072], dtype=uint32)
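Two details of the ``astype`` cells above are easy to miss: casting floats to integers truncates toward zero and does not round, and ``astype`` returns a new array and leaves the original untouched. The sketch below illustrates both, along with what happens when a string cannot be parsed as a number. The sample values are illustrative.

```python
import numpy as np

arr = np.array([3.7, -1.2, -2.6, 0.5])

truncated = arr.astype(np.int32)   # drops the fractional part (toward zero)
floored = np.floor(arr)            # rounds down, toward negative infinity
rounded = np.rint(arr)             # rounds to nearest, ties go to the even value

print(truncated)
print(floored)
print(rounded)

# astype makes a copy by default, so the original array keeps its data and dtype.
print("shares memory:", np.shares_memory(arr, truncated))
print("original dtype:", arr.dtype)

# A string that is not a valid number makes the conversion fail.
conversion_failed = False
try:
    np.array(["1.25", "not a number"]).astype(float)
except ValueError as exc:
    conversion_failed = True
    print("conversion failed:", exc)
```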