# 1.0 Introduction to Numpy

<div class="alert alert-block alert-info" style="margin-top: 20px">
<font size = 3><strong>In this notebook we give an overview of the basics of the NumPy package</strong></font>
<br>
<h2>Table of Contents</h2>
<ol>
    <li><a href="#ref2">Understanding data types in Python</a></li>
    <li><a href="#ref3">The basics of Numpy Arrays</a></li>
    <li><a href="#ref4">Computation on Numpy Arrays: Universal Functions</a></li>
    <li><a href="#ref5">Aggregations: Min, Max, and Everything In Between</a></li>
    <li><a href="#ref6">Computation on Arrays: Broadcasting</a></li>
    <li><a href="#ref7">Comparisons, Masks, and Boolean Logic</a></li>
    <li><a href="#ref8">Fancy Indexing</a></li>
    <li><a href="#ref9">Sorting Arrays</a></li>
    <li><a href="#ref10">Structured Data: NumPy's Structured Arrays</a></li>
</ol>
<p></p>
</div>
<br>

<hr>

Datasets come from many sources and in many formats, but fundamentally they can all be thought of as arrays of numbers:
<ul>
<li>Images (particularly digital images): two-dimensional arrays of numbers representing pixel brightness across the area</li>
<li>Sound clips: one-dimensional arrays of intensity versus time</li>
<li>Text: converted in various ways into numerical representations, perhaps binary digits representing the frequency of certain words or pairs of words</li>
</ul>

So, efficient storage and manipulation of numerical arrays is absolutely fundamental to the process of doing data science.

Compared with Python's built-in lists, NumPy arrays provide much more efficient storage and data operations, especially as the arrays grow large.

In [1]:
import numpy
numpy.__version__

'1.16.5'

In [7]:
import numpy as np

In [8]:
# IPython/Jupyter only: display the docstring of the numpy module
np?

In [9]:
# example: show the documentation of a single function
help(np.sort)

Help on function sort in module numpy:

sort(a, axis=-1, kind='quicksort', order=None)
    Return a sorted copy of an array.
    
    Parameters
    ----------
    a : array_like
        Array to be sorted.
    axis : int or None, optional
        Axis along which to sort. If None, the array is flattened before
        sorting. The default is -1, which sorts along the last axis.
    kind : {'quicksort', 'mergesort', 'heapsort', 'stable'}, optional
        Sorting algorithm. Default is 'quicksort'.
    order : str or list of str, optional
        When `a` is an array with fields defined, this argument specifies
        which fields to compare first, second, etc.  A single field can
        be specified as a string, and not all fields need be specified,
        but unspecified fields will still be used, in the order in which
        they come up in the dtype, to break ties.
    
    Returns
    -------
    sorted_array : ndarray
        Array of the same type and shape as `a`.
    

The docstring displayed by `np?` lists NumPy's subpackages and utilities for the installed version:

Available subpackages
---------------------
<ul>
<li><b>doc</b>: Topical documentation on broadcasting, indexing, etc.</li>
<li><b>lib</b>: Basic functions used by several sub-packages.</li>
<li><b>random</b>: Core Random Tools</li>
<li><b>linalg</b>: Core Linear Algebra Tools</li>
<li><b>fft</b>: Core FFT routines</li>
<li><b>polynomial</b>: Polynomial tools</li>
<li><b>testing</b>: NumPy testing tools</li>
<li><b>f2py</b>: Fortran to Python Interface Generator.</li>
<li><b>distutils</b>: Enhancements to distutils with support for Fortran compilers support and more.</li>
</ul>

Utilities
---------
<ul>
<li><b>test</b>: Run numpy unittests</li>
<li><b>show_config</b>: Show numpy build configuration</li>
<li><b>dual</b>: Overwrite certain functions with high-performance Scipy tools</li>
<li><b>matlib</b>: Make everything matrices.</li>
</ul>

<a id="ref2"></a>
# 1.1 Understanding data types in Python

<div class="alert alert-block alert-info" style="margin-top: 20px">
<h2>Table of Contents</h2>
<ol>
    <li><a href="#ref1.1">Python has dynamic typing style</a></li>
    <li><a href="#ref1.2">A Python integer is more than just an integer</a></li>
    <li><a href="#ref1.3">A Python list is more than just a list</a></li>
    <li><a href="#ref1.4">Fixed-Type arrays in Python</a></li>
    <li><a href="#ref1.5">Creating Arrays from Python Lists</a></li>
    <li><a href="#ref1.6">Creating Arrays from Scratch</a></li>
    <li><a href="#ref1.7">Numpy Standard Data Types</a></li>
</ol>
</div>

<hr>    

<a id="ref1.1"></a>
## 1.1.1 Python has dynamic typing style

A statically-typed language like C or Java requires each variable to be explicitly declared; a dynamically-typed language like Python skips this declaration.

In [None]:
/* C code */
int result = 0;
for(int i=0; i<100; i++){
    result += i;
}

In [None]:
# Python code
result = 0
for i in range(100):
    result += i

In C, the data types of each variable are explicitly declared, while in Python the types are dynamically inferred.

This means we can assign any kind of data to any variable:

In [None]:
# Python code
x = 4
x = "four"

In [None]:
/* C code */
int x = 4;
x = "four";  // FAILS

<hr>

<a id="ref1.2"></a>
## 1.1.2 A Python integer is more than just an integer

The key point is that the standard Python implementation (CPython) is written in C, so every Python object is simply a cleverly-disguised C structure.

A single integer in Python actually contains four pieces:
<li><b>ob_refcnt</b>: a reference count that helps Python silently handle memory allocation and deallocation</li>
<li><b>ob_type</b>: which encodes the type of the variable</li>
<li><b>ob_size</b>: which specifies the size of the following data members</li>
<li><b>ob_digit</b>: which contains the actual integer value that we expect the Python variable to represent.</li>
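To see this overhead concretely, here is a small sketch assuming the standard CPython interpreter on a 64-bit platform: it compares `sys.getsizeof` of a Python integer with the per-element size of a fixed-type `int64` NumPy array. The exact Python byte count is version-dependent, so treat the printed numbers as illustrative.

```python
import sys

import numpy as np

# Total size of a Python int object: header fields (ob_refcnt, ob_type, ob_size)
# plus the digit storage. Commonly 28 bytes on 64-bit CPython, but version-dependent.
py_int_size = sys.getsizeof(1)

# Size of one element of an int64 NumPy array: only the 8 bytes of the value itself.
np_int_size = np.dtype(np.int64).itemsize

print("Python int object:", py_int_size, "bytes")
print("int64 array element:", np_int_size, "bytes")
```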

### ---Important---
* A C integer is essentially a label for a position in memory whose bytes encode an integer value. 
* A Python integer is a pointer to a position in memory containing all the Python object information, including the bytes that contain the integer value.

<strong>Conclusion: Python is therefore less efficient than C, but it lets you write code much more freely.</strong>

<hr>

<a id="ref1.3"></a>
## 1.1.3 A Python list is more than just a list

### A single Python list can hold items of different data types

In [14]:
# A list of integers

L = list(range(10))
print(L, type(L[0]))

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9] <class 'int'>


In [15]:
# A list of strings

L2 = [str(c) for c in L]
print(L2, type(L2[0]))

['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'] <class 'str'>


In [17]:
# A list of mixed data types

L3 = [True, "2", 3.0, 4]
[type(item) for item in L3]

[bool, str, float, int]

<strong>Cost</strong>: To allow these flexible types, each item in the list must contain its own type info, reference count, and other information; that is, <strong>each item is a complete Python object</strong>.

When every item in the array has the same type, much of this information is redundant.

In that case, a fixed-type (NumPy-style) array is far more efficient than a dynamically-typed list.

<img
src = 'https://res.cloudinary.com/dhk2edkft/image/upload/v1552287711/nwythfwzfpi7iurgvjzb.jpg'>

<strong>Conclusion: The advantage of the list is flexibility: because each list element is a full structure containing both data and type information, the list can be filled with data of any desired type. 
    
Fixed-type NumPy-style arrays lack this flexibility, but are much more efficient for storing and manipulating data.</strong>
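A rough, illustrative way to measure this difference is to store the same 1000 integers both ways. This sketch assumes CPython; `sys.getsizeof` reports only each object's own size, so the list total is an estimate, and `nbytes` counts only the array's data buffer, not its small header.

```python
import sys

import numpy as np

values = list(range(1000))

# A list stores one pointer per item, and each item is a separate Python int object,
# so count the list container plus every element object.
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)

# A NumPy array stores the raw int64 values contiguously in a single buffer.
arr = np.array(values, dtype=np.int64)
array_bytes = arr.nbytes

print("list:", list_bytes, "bytes")
print("ndarray data:", array_bytes, "bytes")
```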

<hr>

<a id="ref1.4"></a>
## 1.1.4 Fixed-Type arrays in Python

Python's built-in <strong>array</strong> module lets you create arrays of a single, uniform type.

In [18]:
import array
L = list(range(10))
A = array.array('i', L)
A

array('i', [0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

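So why use NumPy at all?

Because, beyond efficient storage, NumPy also provides efficient operations on that data, which makes numerical computation much more convenient.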
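One example of what NumPy adds, sketched with arbitrary values: `array.array` behaves like a plain sequence, so `*` means repetition, whereas a NumPy array applies arithmetic element by element.

```python
import array

import numpy as np

A = array.array('i', range(5))
# array.array behaves like a sequence: "* 2" repeats it instead of doubling each value
repeated = A * 2
print(repeated)  # array('i', [0, 1, 2, 3, 4, 0, 1, 2, 3, 4])

# A NumPy array applies the arithmetic to every element at once
N = np.arange(5)
doubled = N * 2
print(doubled)  # [0 2 4 6 8]
```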

<hr>

<a id="ref1.5"></a>
## 1.1.5 Creating Arrays from Python Lists

In [19]:
# integer array:
np.array([1, 4, 2, 5, 3])

array([1, 4, 2, 5, 3])
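Unlike a list, every element of a NumPy array shares one type. As a small illustration with arbitrary values, if the input mixes integers and floats, NumPy upcasts everything to a common type, here floating point:

```python
import numpy as np

# Mixing one float with integers: every element is stored as float64
mixed = np.array([3.14, 4, 2, 3])
print(mixed)
print(mixed.dtype)  # float64
```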

To set the data type explicitly, use the <strong>dtype</strong> keyword:

In [20]:
np.array([1, 2, 3, 4], dtype='float32')

array([1., 2., 3., 4.], dtype=float32)

Unlike Python lists, NumPy arrays can explicitly be multi-dimensional; here's one way of initializing a multidimensional array using a list of lists:

In [21]:
# nested lists result in multi-dimensional arrays
np.array([range(i, i + 3) for i in [2, 4, 6]])

array([[2, 3, 4],
       [4, 5, 6],
       [6, 7, 8]])

<hr>

<a id="ref1.6"></a>
## 1.1.6 Creating Arrays from Scratch

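<ol>
    <li>np.zeros(): shape, dtype</li>
    <li>np.ones(): shape, dtype</li>
    <li>np.full(): shape, fill value</li>
    <li>np.arange(0, 20, 2): start, stop, step; the stop value is excluded</li>
    <li>np.linspace(0, 1, 5): start, stop, number of evenly spaced values; both endpoints are included</li>
    <li>np.random.random(): shape; uniform distribution on [0, 1)</li>
    <li>np.random.normal(): mean, standard deviation, shape; normal distribution</li>
    <li>np.random.randint(0, 10, (3, 3)): low, high (excluded), shape; random integers drawn from [low, high)</li>
    <li>np.eye(3): 3x3 identity matrix</li>
    <li>np.empty(3): uninitialized array of length 3 (its contents are whatever was already in memory)</li>
</ol>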

### Practice

In [24]:
# Create a length-10 integer array filled with zeros

np.zeros(10)  # default dtype is float
# so remember to set dtype if you want integers
np.zeros(10, dtype=int)

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

In [25]:
# Create a 3x5 floating-point array filled with ones

np.ones((3,5), dtype = float)

array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])

In [26]:
# Create a 3x5 array filled with 3.14

np.full((3,5), 3.14)

array([[3.14, 3.14, 3.14, 3.14, 3.14],
       [3.14, 3.14, 3.14, 3.14, 3.14],
       [3.14, 3.14, 3.14, 3.14, 3.14]])

In [27]:
# Create an array filled with a linear sequence
# Starting at 0, ending at 20, stepping by 2
# (this is similar to the built-in range() function)

np.arange(0,20,2)

array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])

In [28]:
# Create an array of five values evenly spaced between 0 and 1

np.linspace(0,1,5)

array([0.  , 0.25, 0.5 , 0.75, 1.  ])

In [29]:
# Create a 3x3 array of uniformly distributed
# random values between 0 and 1

np.random.random((3,3))

array([[0.70190658, 0.07760237, 0.48158627],
       [0.27533599, 0.74030971, 0.42572769],
       [0.7356823 , 0.29902731, 0.38655517]])

In [30]:
# Create a 3x3 array of normally distributed random values
# with mean 0 and standard deviation 1

np.random.normal(0,1,(3,3))

array([[ 0.38466322,  1.42381144,  0.71740105],
       [-0.1375204 ,  0.24656463, -0.65037409],
       [-0.69619562,  0.52855822,  0.75293831]])

In [31]:
# Create a 3x3 array of random integers in the interval [0, 10)

np.random.randint(0, 10, (3, 3))

array([[0, 1, 2],
       [2, 9, 4],
       [0, 0, 0]])

In [32]:
# Create a 3x3 identity matrix

np.eye(3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

In [33]:
# Create an uninitialized array of three integers
# The values will be whatever happens to already exist at that memory location

np.empty(3)

array([1., 1., 1.])

<a id="ref1.7"></a>
## 1.1.7 Numpy Standard Data Types


| Data type | Description |
|-----------|-------------|
| `bool_` | Boolean (True or False) stored as a byte |
| `int_` | Default integer type (same as C long; normally either int64 or int32) |
| `intc` | Identical to C int (normally int32 or int64) |
| `intp` | Integer used for indexing (same as C ssize_t; normally either int32 or int64) |
| `int8` | Byte (-128 to 127) |
| `int16` | Integer (-32768 to 32767) |
| `int32` | Integer (-2147483648 to 2147483647) |
| `int64` | Integer (-9223372036854775808 to 9223372036854775807) |
| `uint8` | Unsigned integer (0 to 255) |
| `uint16` | Unsigned integer (0 to 65535) |
| `uint32` | Unsigned integer (0 to 4294967295) |
| `uint64` | Unsigned integer (0 to 18446744073709551615) |
| `float_` | Shorthand for float64 |
| `float16` | Half precision float: sign bit, 5 bits exponent, 10 bits mantissa |
| `float32` | Single precision float: sign bit, 8 bits exponent, 23 bits mantissa |
| `float64` | Double precision float: sign bit, 11 bits exponent, 52 bits mantissa |
| `complex_` | Shorthand for complex128 |
| `complex64` | Complex number, represented by two 32-bit floats |
| `complex128` | Complex number, represented by two 64-bit floats |
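A dtype can be requested either by its name as a string or by the corresponding NumPy type object, and the ranges in the table can be checked with `np.iinfo` (integers) and `np.finfo` (floats). A small sketch:

```python
import numpy as np

# Two equivalent ways to request a 16-bit integer dtype
a = np.zeros(10, dtype='int16')
b = np.zeros(10, dtype=np.int16)
print(a.dtype, b.dtype)  # int16 int16
print(a.itemsize)        # 2 (bytes per element)

# Integer ranges can be checked with np.iinfo
print(np.iinfo(np.int8).min, np.iinfo(np.int8).max)  # -128 127
print(np.iinfo(np.uint16).max)                       # 65535

# Float layouts can be checked with np.finfo (nmant = number of mantissa bits)
print(np.finfo(np.float32).nmant)  # 23
```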

<a id="ref3"></a>
# 1.2 The basics of Numpy Arrays

This section covers five topics: attributes, indexing, slicing, reshaping, and joining & splitting.

<div class="alert alert-block alert-info" style="margin-top: 20px">
<h2>Table of Contents</h2>
<ol>
    <li><a href="#ref2.1">NumPy Array Attributes</a></li>
    <li><a href="#ref2.2">Array Indexing: Accessing Single Elements</a></li>
    <li><a href="#ref2.3">Array Slicing: Accessing Subarrays</a></li>
    <li><a href="#ref2.4">Reshaping of Arrays</a></li>
    <li><a href="#ref2.5">Array Concatenation and Splitting</a></li>
</ol>
</div>

<hr>    

<a id="ref2.1"></a>
## 1.2.1 NumPy Array Attributes (the four main ones)

<ul>
<li><b>.ndim</b>: number of dimensions</li>
<li><b>.shape</b>: size of each dimension (rows, columns, ...)</li>
<li><b>.size</b>: total number of elements</li>
<li><b>.dtype</b>: data type of the elements</li>
</ul>

In [34]:
import numpy as np
np.random.seed(0)  # seed for reproducibility

x1 = np.random.randint(10, size=6)  # One-dimensional array
x2 = np.random.randint(10, size=(3, 4))  # Two-dimensional array
x3 = np.random.randint(10, size=(3, 4, 5))  # Three-dimensional array

In [39]:
print(x3)

[[[8 1 5 9 8]
  [9 4 3 0 3]
  [5 0 2 3 8]
  [1 3 3 3 7]]

 [[0 1 9 9 0]
  [4 7 3 2 7]
  [2 0 0 4 5]
  [5 6 8 4 1]]

 [[4 9 8 1 1]
  [7 9 9 3 6]
  [7 2 0 3 5]
  [9 4 4 6 4]]]


In [40]:
print("x3 ndim: ", x3.ndim)
print("x3 shape:", x3.shape)
print("x3 size: ", x3.size)
print("x3 data type:", x3.dtype)

x3 ndim:  3
x3 shape: (3, 4, 5)
x3 size:  60
x3 data type: int32
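Two further attributes are also handy: `itemsize` (bytes per element) and `nbytes` (total bytes of the data, equal to `itemsize * size`). The sketch below uses an explicit `int32` dtype because the default integer type is platform-dependent (the `int32` shown above is typical of Windows with NumPy 1.x, while Linux and macOS usually default to `int64`).

```python
import numpy as np

# Explicit int32 so the numbers don't depend on the platform's default integer
x = np.zeros((3, 4, 5), dtype=np.int32)

print("itemsize:", x.itemsize, "bytes")  # 4: size of one element
print("nbytes:", x.nbytes, "bytes")      # 240: total size of the data buffer
```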


<a id="ref2.2"></a>
## 1.2.2 Array Indexing: Accessing Single Elements
How to read individual elements:

In [42]:
x1

array([5, 0, 3, 3, 7, 9])

In [41]:
# one-dimensional
x1[0]

5

In [43]:
x1[1]

0

In [44]:
# one-dimensional: negative indices count from the end, so -1 is the last element
x1[-1]

9

In [45]:
# one-dimensional: the second-to-last element
x1[-2]

7

In [46]:
x2

array([[3, 5, 2, 4],
       [7, 6, 8, 8],
       [1, 6, 7, 7]])

In [47]:
# two-dimensional
x2[0, 0]

3

In [48]:
x2[2, -1]

7
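The same index notation can also modify values. A sketch on a fresh array holding the same values as `x1` (so `x1` itself is left alone); because the array has an integer dtype, assigning a float silently truncates it:

```python
import numpy as np

x = np.array([5, 0, 3, 3, 7, 9])

# Index notation also assigns values
x[0] = 12
x[-1] = 1

# x has an integer dtype, so the float is truncated toward zero
x[1] = 3.14159
print(x)  # [12  3  3  3  7  1]
```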

<a id="ref2.3"></a>
## 1.2.3 Array Slicing: Accessing Subarrays (modifying elements of a slice also modifies the original)

In [None]:
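# General slice syntax (a template, not runnable on its own): x[start:stop:step]
# Defaults: start=0, stop=size of the dimension, step=1;
# with a negative step, the defaults for start and stop are swapped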

### One-dimensional subarrays

In [49]:
x = np.arange(10)
x

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [50]:
x[:5]

array([0, 1, 2, 3, 4])

In [51]:
x[::-1]

array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])

In [52]:
x[::2]

array([0, 2, 4, 6, 8])

In [53]:
x[1::2]

array([1, 3, 5, 7, 9])

### Multi-dimensional subarrays

In [54]:
x2

array([[3, 5, 2, 4],
       [7, 6, 8, 8],
       [1, 6, 7, 7]])

In [56]:
x2[:2, :3]   # 前兩rows, 前3 columns

array([[3, 5, 2],
       [7, 6, 8]])

In [57]:
print(x2[:, 0])  # first column (all rows)

[3 7 1]


In [59]:
print(x2[0, :])  # first row (all columns)
print(x2[0])     # same result: shorthand for x2[0, :]

[3 5 2 4]
[3 5 2 4]


### <font color = 'red'>No-copy views</font>

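Slicing a built-in list returns a copy; slicing an ndarray returns a view, which shares its data with the original array instead of copying it.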

In [60]:
print(x2)

[[3 5 2 4]
 [7 6 8 8]
 [1 6 7 7]]


In [61]:
x2_sub = x2[:2, :2]
print(x2_sub)

[[3 5]
 [7 6]]


In [62]:
x2_sub[0, 0] = 99
print(x2_sub)

[[99  5]
 [ 7  6]]


In [63]:
print(x2)

[[99  5  2  4]
 [ 7  6  8  8]
 [ 1  6  7  7]]


#### <font color = 'red'>x2_sub is a slice of x2, so changing x2_sub also changed x2: modifying a slice modifies the original array.</font>
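The difference between list slices and array slices can be checked side by side. A minimal sketch with made-up values, using `np.shares_memory` to confirm that the array slice points at the same data:

```python
import numpy as np

# Built-in list: slicing makes a copy, so the original is untouched
lst = [1, 2, 3, 4]
lst_part = lst[:2]
lst_part[0] = 99
print(lst)  # [1, 2, 3, 4]

# ndarray: slicing makes a view that shares memory with the original
arr = np.array([1, 2, 3, 4])
arr_part = arr[:2]
arr_part[0] = 99
print(arr)                              # [99  2  3  4]
print(np.shares_memory(arr, arr_part))  # True
```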

### Creating copies of arrays

In [65]:
x2_sub_copy = x2[:2, :2].copy()
print(x2_sub_copy)

[[99  5]
 [ 7  6]]


In [66]:
x2_sub_copy[0, 0] = 42
print(x2_sub_copy)

[[42  5]
 [ 7  6]]


In [67]:
print(x2)

[[99  5  2  4]
 [ 7  6  8  8]
 [ 1  6  7  7]]


This time x2 is unchanged, because we modified a copy of the slice rather than a view.

<a id="ref2.4"></a>
## 1.2.4 Reshaping of Arrays (the original array keeps its shape)

### EX1

In [69]:
grid = np.arange(1, 10)
grid

array([1, 2, 3, 4, 5, 6, 7, 8, 9])

In [72]:
grid.reshape((3,3))

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [73]:
grid  # note: reshape does not change the shape of grid itself

array([1, 2, 3, 4, 5, 6, 7, 8, 9])

In [75]:
# to keep the reshaped result, assign it to a new name
new_grid = grid.reshape((3,3))
new_grid

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
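Keeping the original shape does not mean the data is copied. According to the NumPy documentation, `reshape` returns a view whenever possible, and for a contiguous array such as `grid` it does, so a write through the reshaped array shows up in the original. A sketch:

```python
import numpy as np

grid = np.arange(1, 10)
new_grid = grid.reshape((3, 3))

# grid keeps its shape...
print(grid.shape)  # (9,)

# ...but for a contiguous array like this, reshape returns a view,
# so writing through new_grid also changes grid
new_grid[0, 0] = 100
print(grid[0])                           # 100
print(np.shares_memory(grid, new_grid))  # True
```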

### EX2

In [84]:
x=[1,2,3]
x = np.array(x)
x

array([1, 2, 3])

#### Turn into 2-dimensional (row vector)

In [85]:
x.reshape(1,3) 

array([[1, 2, 3]])

In [86]:
x[np.newaxis, :]

array([[1, 2, 3]])

#### Turn into 2-dimensional (column vector)

In [87]:
x.reshape((3, 1))

array([[1],
       [2],
       [3]])

In [88]:
x[:, np.newaxis]

array([[1],
       [2],
       [3]])

<a id="ref2.5"></a>
## 1.2.5 Array Concatenation and Splitting

### np.split(array, [split point, split point, ...])

In [91]:
x = [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15]
x1,x2,x3,x4,x5 = np.split(x, [2,5,9,14])
# x is the array (or list) to split
# split at indices 2, 5, 9 and 14: each index starts the next subarray

In [92]:
print(x1,x2,x3,x4,x5)

[0 1] [2 3 4] [5 6 7 8] [ 9 10 11 12 13] [14 15]


<font color = 'red'>Note: N split points produce N+1 subarrays.</font>
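`np.split` also accepts a single integer instead of a list of split points, in which case it divides the array into that many equal pieces (and raises an error if an equal division is not possible). A short sketch contrasting the two forms:

```python
import numpy as np

x = np.arange(16)

# An integer splits into that many equal pieces;
# np.split raises ValueError if the length isn't evenly divisible
parts = np.split(x, 4)
print(parts[0], parts[-1])  # [0 1 2 3] [12 13 14 15]

# A list of 4 split points gives 4 + 1 = 5 pieces
pieces = np.split(x, [2, 5, 9, 14])
print(len(pieces))  # 5
```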

### np.vsplit & np.hsplit

In [93]:
grid = np.arange(16).reshape((4, 4))
grid

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])

In [94]:
upper, lower = np.vsplit(grid, [2])
print(upper)
print(lower)

[[0 1 2 3]
 [4 5 6 7]]
[[ 8  9 10 11]
 [12 13 14 15]]


In [95]:
left, right = np.hsplit(grid, [2])
print(left)
print(right)

[[ 0  1]
 [ 4  5]
 [ 8  9]
 [12 13]]
[[ 2  3]
 [ 6  7]
 [10 11]
 [14 15]]
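Joining is the inverse of splitting. A minimal sketch with made-up arrays, using `np.concatenate`, `np.vstack` and `np.hstack`:

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([3, 2, 1])

# np.concatenate joins 1-D arrays end to end (along axis 0 by default)
joined = np.concatenate([x, y])
print(joined)  # [1 2 3 3 2 1]

grid = np.array([[9, 8, 7],
                 [6, 5, 4]])

# np.vstack stacks vertically: x becomes a new first row -> shape (3, 3)
stacked_rows = np.vstack([x, grid])

# np.hstack stacks horizontally: append a column of 99s -> shape (2, 4)
stacked_cols = np.hstack([grid, np.array([[99], [99]])])

print(stacked_rows)
print(stacked_cols)
```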


<hr>

<a id="ref4"></a>
# 1.3 Computation on Numpy Arrays: Universal Functions

<a id="ref5"></a>
# 1.4 Aggregations: Min, Max, and Everything In Between

<a id="ref6"></a>
# 1.5 Computation on Arrays: Broadcasting

<a id="ref7"></a>
# 1.6 Comparisons, Masks, and Boolean Logic

<a id="ref8"></a>
# 1.7 Fancy Indexing

<a id="ref9"></a>
# 1.8 Sorting Arrays

<a id="ref10"></a>
# 1.9 Structured Data: NumPy's Structured Arrays