In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Numpy

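### Creating an ndarray
|Method|Description|
|---|---|
|`np.array`(array)|Convert an array-like object to an ndarray|
|`np.linspace`(start, end, num)|Create a 1D ndarray of num evenly spaced values from start to end|
|`np.zeros`(shape)|Create an array of the given shape filled with 0|
|`np.ones`(shape)|Create an array of the given shape filled with 1|
|`np.full`(shape, num)|Create an array of the given shape filled with num|
|`np.random`|Module for creating arrays from random distributions|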
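
The constructors in the table above can be tried with a minimal sketch like the following. The shapes, the fill value 7, and the seed 0 are arbitrary choices for illustration, and `np.random.default_rng` is only one of several ways to draw random arrays.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])   # convert a nested list to a 2x3 ndarray
b = np.linspace(0, 1, 5)               # 5 evenly spaced values, both endpoints included
z = np.zeros((2, 3))                   # 2x3 array filled with 0.0
o = np.ones((2, 3))                    # 2x3 array filled with 1.0
f = np.full((2, 3), 7)                 # 2x3 array filled with 7

# np.random is a module, not a function; the seed 0 is an arbitrary choice
rng = np.random.default_rng(0)
r = rng.random((2, 3))                 # uniform samples in [0, 1)

print(a.shape)   # (2, 3)
print(b[1])      # 0.25
```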

### ndarray datatype
| Data type     | Description |
|---------------|-------------|
| ``bool_``     | Boolean (True or False) stored as a byte |
| ``int_``      | Default integer type (same as C ``long``; normally either ``int64`` or ``int32``)| 
| ``intc``      | Identical to C ``int`` (normally ``int32`` or ``int64``)| 
| ``intp``      | Integer used for indexing (same as C ``ssize_t``; normally either ``int32`` or ``int64``)| 
| ``int8``      | Byte (-128 to 127)| 
| ``int16``     | Integer (-32768 to 32767)|
| ``int32``     | Integer (-2147483648 to 2147483647)|
| ``int64``     | Integer (-9223372036854775808 to 9223372036854775807)| 
| ``uint8``     | Unsigned integer (0 to 255)| 
| ``uint16``    | Unsigned integer (0 to 65535)| 
| ``uint32``    | Unsigned integer (0 to 4294967295)| 
| ``uint64``    | Unsigned integer (0 to 18446744073709551615)| 
| ``float_``    | Shorthand for ``float64`` (removed in NumPy 2.0; use ``float64``).| 
| ``float16``   | Half precision float: sign bit, 5 bits exponent, 10 bits mantissa| 
| ``float32``   | Single precision float: sign bit, 8 bits exponent, 23 bits mantissa| 
| ``float64``   | Double precision float: sign bit, 11 bits exponent, 52 bits mantissa| 
| ``complex_``  | Shorthand for ``complex128`` (removed in NumPy 2.0; use ``complex128``).| 
| ``complex64`` | Complex number, represented by two 32-bit floats| 
| ``complex128``| Complex number, represented by two 64-bit floats| 
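
As a small sketch of how the data types in the table behave, the example below checks the size and range of a few of them. The sample values are made up for illustration.

```python
import numpy as np

x = np.array([1, 2, 3], dtype=np.int8)
print(x.dtype, x.itemsize)           # int8 1

info = np.iinfo(np.int8)             # integer limits for a dtype
print(info.min, info.max)            # -128 127

y = x.astype(np.float32)             # astype returns a converted copy
print(y.dtype, y.itemsize)           # float32 4

c = np.array([1 + 2j], dtype=np.complex64)
print(c.real.dtype)                  # float32: complex64 holds two 32-bit floats
```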

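### Basic ndarray attributes
|Attribute|Description|
|---|---|
|ndim|Number of dimensions|
|shape|Size of each dimension (the shape)|
|size|Total number of elements|
|dtype|Data type of the elements|
|itemsize|Size of one element in bytes (determined by the dtype)|
|nbytes|Total size of the array data in bytes|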

In [2]:
arr = np.ones((3, 4, 5))
arr.dtype

dtype('float64')

In [3]:
print(arr.ndim)
print(arr.shape)
print(arr.size)
print(arr.dtype)
print(arr.itemsize)
print(arr.nbytes)

3
(3, 4, 5)
60
float64
8
480


### ndarray operations

* `arr.reshape()` change the shape
* `arr.astype()` convert the datatype
* `arr[]` index or slice; basic slicing creates a view
* `arr.copy()` make a copy
* `np.concatenate, vstack, hstack` join arrays
* `np.split, vsplit, hsplit` split arrays
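
The operations listed above can be sketched as follows. The array contents and shapes are arbitrary; the point to notice is that a basic slice is a view that shares memory with the original, while `copy()` does not.

```python
import numpy as np

arr = np.arange(12)                  # 0..11
m = arr.reshape(3, 4)                # same data, new shape
f = m.astype(np.float32)             # converted copy

v = m[:2, :2]                        # basic slicing gives a view
v[0, 0] = 100                        # also changes m (and arr)
c = m[:2, :2].copy()                 # independent copy
c[0, 0] = -1                         # m is unchanged

joined = np.concatenate([m, m], axis=0)    # shape (6, 4); same as np.vstack([m, m])
wide = np.hstack([m, m])                   # shape (3, 8)
left, right = np.hsplit(m, 2)              # two (3, 2) arrays
top, bottom = np.split(m, [1], axis=0)     # rows [0:1] and rows [1:3]

print(m[0, 0], arr[0])   # 100 100
```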

### ndarray numeric operations

|Function Name    | NaN-safe (ignores NaN) | Description                           |
|:----------------|:-------------------|:------------------------------------------|
| `np.sum`        | `np.nansum`        | Compute sum of elements                   |
| `np.prod`       | `np.nanprod`       | Compute product of elements               |
| `np.mean`       | `np.nanmean`       | Compute mean of elements                  |
| `np.std`        | `np.nanstd`        | Compute standard deviation                |
| `np.var`        | `np.nanvar`        | Compute variance                          |
| `np.min`        | `np.nanmin`        | Find minimum value                        |
| `np.max`        | `np.nanmax`        | Find maximum value                        |
| `np.argmin`     | `np.nanargmin`     | Find index of minimum value               |
| `np.argmax`     | `np.nanargmax`     | Find index of maximum value               |
| `np.median`     | `np.nanmedian`     | Compute median of elements                |
| `np.percentile` | `np.nanpercentile` | Compute rank-based statistics of elements |
| `np.any`        | N/A                | Evaluate whether any elements are true    |
| `np.all`        | N/A                | Evaluate whether all elements are true    |
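
A short sketch of the difference between the plain and NaN-safe aggregations in the table, using made-up data with one missing value:

```python
import numpy as np

data = np.array([[1.0, 2.0, np.nan],
                 [4.0, 5.0, 6.0]])

total = np.sum(data)                 # nan: a single NaN propagates
safe_total = np.nansum(data)         # 18.0: the NaN is skipped
safe_mean = np.nanmean(data)         # mean of the 5 non-NaN values
col_max = np.nanmax(data, axis=0)    # column-wise maxima, NaN skipped
has_nan = np.any(np.isnan(data))     # True

print(total, safe_total)   # nan 18.0
```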

### [Mathematical functions](https://docs.scipy.org/doc/numpy/reference/routines.math.html)

* Trigonometric functions
    * `np.sin`
    * `np.cos`
    * `np.tan`
    * `np.deg2rad`
    * `np.rad2deg`
    
* Handling complex numbers
    * `np.angle`
    * `np.real`
    * `np.imag`
    
* Miscellaneous
    * `np.clip` limit values to a given range
    * `np.sqrt`
    * `np.sign`
    * `np.maximum`
    * `np.minimum`
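
A few of the functions listed above, applied element-wise to small made-up arrays, as a minimal sketch:

```python
import numpy as np

deg = np.array([0.0, 90.0, 180.0])
rad = np.deg2rad(deg)                # degrees to radians
s = np.sin(rad)                      # approximately [0, 1, 0], up to rounding error

z = np.array([1 + 1j])
ang = np.angle(z, deg=True)          # about 45 degrees

x = np.array([-3, 0, 5])
clipped = np.clip(x, -1, 1)          # values limited to the range [-1, 1]
signs = np.sign(x)                   # -1, 0 or 1 per element
hi = np.maximum(x, 0)                # element-wise maximum with 0
lo = np.minimum(x, 0)                # element-wise minimum with 0

print(clipped.tolist(), hi.tolist(), lo.tolist())   # [-1, 0, 1] [0, 0, 5] [-3, 0, 0]
```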


# [Pandas](https://pandas.pydata.org/pandas-docs/stable/getting_started/10min.html?highlight=concat)
### Series
* Can be used like a dictionary
* `pd.Series(1D array)`
* `pd.Series(1D array, index)`
* `pd.Series(dict)`
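
To illustrate the dictionary-like behavior, here is a minimal sketch that builds a Series from a dict. The name `ser_d` and the keys and values are made up for illustration.

```python
import pandas as pd

# The labels and values here are made-up illustration data
ser_d = pd.Series({'a': 1, 'b': 2, 'c': 3})

print(ser_d['b'])          # 2: look up a value by its label
print('a' in ser_d)        # True: membership tests the index, like dict keys

ser_d['d'] = 4             # assigning to a new label adds an entry
print(list(ser_d.keys()))  # ['a', 'b', 'c', 'd']
```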

### Attributes
* `ser.index` analogous to `dict.keys()`
* `ser.values`

In [4]:
pd.Series(np.linspace(0, 1, 5))

0    0.00
1    0.25
2    0.50
3    0.75
4    1.00
dtype: float64

In [5]:
ser = pd.Series(np.linspace(0, 1, 5), index=[5,4,3,2,1])
ser

5    0.00
4    0.25
3    0.50
2    0.75
1    1.00
dtype: float64

In [6]:
ser.index

Int64Index([5, 4, 3, 2, 1], dtype='int64')

In [7]:
ser.values # ndarray

array([0.  , 0.25, 0.5 , 0.75, 1.  ])

### DataFrame
* `pd.DataFrame(2D array)`
* `pd.DataFrame(dict(values=array))`
* `pd.DataFrame([1D arrays, ])`

### Attributes
* `df.index`
* `df.columns` analogous to `dict.keys()`
* `df.values`
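
The three ways of building a DataFrame and its attributes can be sketched as below. The variable names and the column names `x` and `y` are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Column names 'x' and 'y' are illustrative assumptions
df_a = pd.DataFrame(np.arange(6).reshape(3, 2), columns=['x', 'y'])   # from a 2D array
df_b = pd.DataFrame(dict(x=np.zeros(3), y=np.ones(3)))                # from a dict of arrays
df_c = pd.DataFrame([np.arange(3), np.arange(3, 6)])                  # each 1D array becomes a row

print(list(df_a.index))     # [0, 1, 2]
print(list(df_a.columns))   # ['x', 'y']
print(df_a.values.shape)    # (3, 2)
print(df_c.shape)           # (2, 3)
```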

In [8]:
ser.value_counts()

1.00    1
0.75    1
0.50    1
0.25    1
0.00    1
dtype: int64

### Combining DataFrame & Series
* [pd.merge](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.merge.html?highlight=merge#pandas.merge)
* [pd.concat](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.concat.html?highlight=concat#pandas.concat)
* [pd.DataFrame.join](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.join.html)
* [pd.DataFrame.insert](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.insert.html?highlight=insert#pandas.DataFrame.insert)
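
The four combining tools linked above can be compared with a minimal sketch. The frames `left` and `right` and their columns are hypothetical illustration data.

```python
import pandas as pd

# Hypothetical illustration data
left = pd.DataFrame({'key': ['a', 'b', 'c'], 'x': [1, 2, 3]})
right = pd.DataFrame({'key': ['b', 'c', 'd'], 'y': [20, 30, 40]})

merged = pd.merge(left, right, on='key')                       # inner join on 'key': rows b and c
stacked = pd.concat([left, left], ignore_index=True)           # 6 rows stacked vertically
joined = left.set_index('key').join(right.set_index('key'))    # left join on the index

left.insert(1, 'flag', True)   # modifies left in place: new column at position 1
print(list(left.columns))      # ['key', 'flag', 'x']
```

Note that `insert` changes the DataFrame in place, while `merge`, `concat`, and `join` each return a new object.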

## `pandas` vs `re`

In [9]:
name = pd.Series(['Pool Chen', 'Allison Yu', 'John Claeese', 'Eric Idle', 'Micheal Palin', 'Terry Gilliam'])
name

0        Pool Chen
1       Allison Yu
2     John Claeese
3        Eric Idle
4    Micheal Palin
5    Terry Gilliam
dtype: object

In [10]:
# Like re.match(): the pattern must match at the start of the string
name.str.match(r'ol')

0    False
1    False
2    False
3    False
4    False
5    False
dtype: bool

In [11]:
# Like re.search(): the pattern may match anywhere in the string
name.str.contains(r'o')

0     True
1     True
2     True
3    False
4    False
5    False
dtype: bool

In [12]:
# Like re.search() with a capture group: extracts the first match
name.str.extract(r'([A-Za-z]+)')

         0
0     Pool
1  Allison
2     John
3     Eric
4  Micheal
5    Terry
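
To make the correspondence with `re` explicit, the sketch below runs each pandas string method next to a hand-written `re` equivalent on two of the names above. The result variable names are illustrative.

```python
import re
import pandas as pd

name = pd.Series(['Pool Chen', 'Eric Idle'])

# str.match anchors at the start of each string, like re.match
m_pd = name.str.match(r'Po').tolist()
m_re = [bool(re.match(r'Po', s)) for s in name]

# str.contains looks anywhere in each string, like re.search
c_pd = name.str.contains(r'ol').tolist()
c_re = [bool(re.search(r'ol', s)) for s in name]

# str.extract returns the capture groups of the first match
e_pd = name.str.extract(r'([A-Za-z]+)')[0].tolist()
e_re = [re.search(r'([A-Za-z]+)', s).group(1) for s in name]

print(m_pd == m_re, c_pd == c_re, e_pd == e_re)   # True True True
```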
