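### NumPy statistics functions

- `mean`: computes the arithmetic mean of the array's elements, over the whole array or along a given `axis`.
- `average`: computes the weighted average. Unlike `mean`, `average` accepts a `weights` argument; without weights it returns the same value as `mean`.
- `var`: computes the variance, a measure of how far the data spreads around its mean. The larger the variance, the wider the spread. By default NumPy returns the population variance (`ddof=0`).
- `std`: computes the standard deviation, the square root of the variance. It measures dispersion in the same units as the data.
- `min`, `max`: return the smallest and largest values in the array, respectively.
- `cumsum`: computes the cumulative sum, a running total that starts at the first element and adds each element in turn.
- `cumprod`: computes the cumulative product, a running product that starts at the first element and multiplies by each element in turn.
- `ptp` (peak to peak): computes the difference between the largest and smallest values, that is, the "peak to peak" distance.
- `median`: computes the median, the middle value once the data is sorted in ascending order (the mean of the two middle values when the count is even).
- `quantile`: computes quantiles, the cut points below which a given fraction of the sorted data falls. The fraction is given as a number from 0 to 1.
- `percentile`: the same as `quantile`, but the fraction is given as a percentage from 0 to 100.
- `corrcoef`: computes the Pearson correlation coefficient matrix, which measures the strength of the linear relationship between variables.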


In [2]:
import yfinance as yf

In [16]:
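# Download daily prices for TSMC (2330.TW) for October 2023; `end` is exclusive.
tsmc = yf.download("2330.tw", start="2023-10-01", end="2023-10-31")
# Keep the Open through Close columns and cast to integers (fractional parts are dropped).
tsmc_array = tsmc.loc[:, "Open":"Close"].to_numpy(dtype="int")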

[*********************100%%**********************]  1 of 1 completed


In [18]:
print(tsmc_array)
print(tsmc_array.shape)

[[530 534 528 533]
 [528 533 528 529]
 [521 523 519 520]
 [523 529 523 528]
 [530 533 529 532]
 [542 544 540 544]
 [545 550 544 550]
 [550 554 548 553]
 [546 547 542 545]
 [550 552 548 551]
 [549 549 540 540]
 [540 548 540 546]
 [549 556 546 556]
 [552 553 543 544]
 [543 546 540 544]
 [544 551 544 544]
 [530 535 530 531]
 [534 536 532 533]
 [531 534 528 532]]
(19, 4)


In [19]:
import numpy as np

In [21]:
np.savetxt("tsmc.csv", tsmc_array, delimiter=",", fmt="%d")

In [23]:
np.loadtxt("tsmc.csv", delimiter=",", dtype="int")

array([[530, 534, 528, 533],
       [528, 533, 528, 529],
       [521, 523, 519, 520],
       [523, 529, 523, 528],
       [530, 533, 529, 532],
       [542, 544, 540, 544],
       [545, 550, 544, 550],
       [550, 554, 548, 553],
       [546, 547, 542, 545],
       [550, 552, 548, 551],
       [549, 549, 540, 540],
       [540, 548, 540, 546],
       [549, 556, 546, 556],
       [552, 553, 543, 544],
       [543, 546, 540, 544],
       [544, 551, 544, 544],
       [530, 535, 530, 531],
       [534, 536, 532, 533],
       [531, 534, 528, 532]])

In [25]:
np.mean(tsmc_array, axis=0)

array([538.78947368, 542.47368421, 536.42105263, 539.73684211])
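The `axis` argument decides which dimension is collapsed. As a small sketch, using only the first three rows of the table printed above (the variable names here are illustrative, not from the notebook):

```python
import numpy as np

# First three rows of the price table printed above (4 price columns per day).
prices = np.array([[530, 534, 528, 533],
                   [528, 533, 528, 529],
                   [521, 523, 519, 520]])

per_column = np.mean(prices, axis=0)  # collapses rows: one mean per column, shape (4,)
per_day = np.mean(prices, axis=1)     # collapses columns: one mean per row, shape (3,)
overall = np.mean(prices)             # no axis: a single mean over all 12 values

print(per_column.shape, per_day.shape)
print(per_day)  # 531.25, 529.5, 520.75
```

With `axis=0` the result has one entry per column, which is why the cell above returns four values for the 19-row array.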

In [32]:
np.average(np.arange(10, 14))

11.5

In [37]:
np.average(np.linspace(10, 100, num=4), weights=[0.1, 0.3, 0.3, 0.3])

64.0

In [38]:
np.linspace(10, 100, num=4).mean()

55.0
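The gap between 64.0 and 55.0 comes from the weights. A minimal sketch that writes the weighted average out as sum(w * x) / sum(w) and checks it against `np.average` (the names `values`, `weights`, `manual` and `builtin` are illustrative):

```python
import numpy as np

values = np.linspace(10, 100, num=4)       # 10, 40, 70, 100
weights = np.array([0.1, 0.3, 0.3, 0.3])

# Weighted average written out by hand: sum(w_i * x_i) / sum(w_i)
manual = np.sum(values * weights) / np.sum(weights)
builtin = np.average(values, weights=weights)

print(manual, builtin)   # both are 64.0 up to floating-point rounding
print(values.mean())     # 55.0, every element counts equally
```

The smallest value, 10, carries a weight of only 0.1, so the weighted result is pulled toward the three larger values.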

In [39]:
np.var(tsmc_array, axis=0)

array([93.6398892 , 91.40720222, 74.34903047, 91.66759003])

In [40]:
np.std(tsmc_array, axis=0)

array([9.6767706 , 9.56071139, 8.62258839, 9.5743193 ])
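By default `np.var` and `np.std` divide by N (population statistics). When the data is a sample, dividing by N - 1 is usually preferred, and that is what `ddof=1` does. A small sketch on a made-up array (the data is hypothetical, chosen so the numbers come out round):

```python
import numpy as np

# Hypothetical data: mean is 5, squared deviations sum to 32.
data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

pop_var = np.var(data)            # divides by N = 8      -> 4.0
pop_std = np.std(data)            # sqrt of the above     -> 2.0
samp_var = np.var(data, ddof=1)   # divides by N - 1 = 7  -> 32 / 7
samp_std = np.std(data, ddof=1)

print(pop_var, pop_std)
print(samp_var, samp_std)
```

For comparison, pandas uses `ddof=1` by default, so `DataFrame.std()` and `np.std` on the same data will differ slightly unless `ddof` is set explicitly.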

In [41]:
np.min(tsmc_array, axis=0)

array([521, 523, 519, 520])

In [44]:
np.ptp(tsmc_array, axis=1)

array([ 6,  5,  4,  6,  4,  4,  6,  6,  5,  4,  9,  8, 10, 10,  6,  7,  5,
        4,  6])
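With `axis=1`, `np.ptp` returns one range per row, which for this table is the spread between the highest and lowest price of each day. A sketch confirming that it equals `max - min`, using the first three rows shown earlier (variable names are illustrative):

```python
import numpy as np

# First three rows of the price table printed earlier.
prices = np.array([[530, 534, 528, 533],
                   [528, 533, 528, 529],
                   [521, 523, 519, 520]])

ranges = np.ptp(prices, axis=1)                       # one value per row
by_hand = prices.max(axis=1) - prices.min(axis=1)     # the same thing, spelled out

print(ranges)    # 6, 5, 4
print(by_hand)
```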

In [45]:
np.cumsum(tsmc_array, axis=0)

array([[  530,   534,   528,   533],
       [ 1058,  1067,  1056,  1062],
       [ 1579,  1590,  1575,  1582],
       [ 2102,  2119,  2098,  2110],
       [ 2632,  2652,  2627,  2642],
       [ 3174,  3196,  3167,  3186],
       [ 3719,  3746,  3711,  3736],
       [ 4269,  4300,  4259,  4289],
       [ 4815,  4847,  4801,  4834],
       [ 5365,  5399,  5349,  5385],
       [ 5914,  5948,  5889,  5925],
       [ 6454,  6496,  6429,  6471],
       [ 7003,  7052,  6975,  7027],
       [ 7555,  7605,  7518,  7571],
       [ 8098,  8151,  8058,  8115],
       [ 8642,  8702,  8602,  8659],
       [ 9172,  9237,  9132,  9190],
       [ 9706,  9773,  9664,  9723],
       [10237, 10307, 10192, 10255]])

In [46]:
np.cumprod(tsmc_array, axis=0)

array([[                 530,                  534,                  528,
                         533],
       [              279840,               284622,               278784,
                      281957],
       [           145796640,            148857306,            144688896,
                   146617640],
       [         76251642720,          78745514874,          75672292608,
                 77414113920],
       [      40413370641600,       41971359427842,       40030642789632,
              41184308605440],
       [   21904046887747200,    22832419528746048,    21616547106401280,
           22404263881359360],
       [-6509038519887327616, -5888913332899225216, -6687342447827255296,
        -6124398938961903616],
       [-1302835638377175296,  2615714620419866368,  6238409258864869376,
         7408296316624797696],
       [ 8074760320734801408, -8050140379678122752,  5463652815911256064,
        -2315459581877059584],
       [-4547145359861165056,  1987832181678180352,  57
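The negative numbers in the output above are not real products. From the seventh row on, the running product exceeds the largest value a 64-bit integer can hold (about 9.22e18) and silently wraps around. One possible workaround, sketched here on the first seven values of the first column, is to accumulate in floating point by passing `dtype` to `np.cumprod`; this trades exactness for range:

```python
import numpy as np

# First seven values of the first column of the table printed earlier.
opens = np.array([530, 528, 521, 523, 530, 542, 545], dtype=np.int64)

wrapped = np.cumprod(opens)                       # int64: overflows on the 7th step
as_float = np.cumprod(opens, dtype=np.float64)    # float64: approximate, but no wrap-around

print(wrapped[-1])    # negative, because the true product does not fit in int64
print(as_float[-1])   # roughly 1.19e19
```

NumPy raises no error or warning for this kind of integer overflow in array operations, so a sign flip in a product of positive numbers is the main hint that it happened.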

In [47]:
np.median(tsmc_array, axis=0)

array([542., 546., 540., 544.])

In [49]:
np.quantile(tsmc_array, 0.25, axis=0)

array([530. , 534. , 528.5, 532. ])

In [51]:
np.quantile(tsmc_array, 0.70, axis=0)

array([545.6, 549.6, 542.6, 544.6])

In [52]:
np.percentile(tsmc_array, 70, axis=0)

array([545.6, 549.6, 542.6, 544.6])
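The two cells above return the same array because `np.percentile(a, p)` is `np.quantile(a, p / 100)`. A short sketch on a made-up array that also shows how the default linear interpolation arrives at a value between two data points:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])   # hypothetical, already sorted

q70 = np.quantile(x, 0.70)
p70 = np.percentile(x, 70)

# Default (linear) method: position = q * (n - 1) = 0.7 * 4 = 2.8,
# i.e. 80% of the way from x[2] = 3 to x[3] = 4.
by_hand = x[2] + 0.8 * (x[3] - x[2])

print(q70, p70, by_hand)                    # all approximately 3.8
print(np.quantile(x, 0.5), np.median(x))    # the 0.5 quantile is the median
```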

In [55]:
import numpy.random as nr

In [64]:
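# Reseed before each draw so both arrays are reproducible but come from different seeds.
nr.seed(0)
a1 = nr.randint(100, size=10)   # 10 integers in [0, 100)
nr.seed(10)
a2 = nr.randint(100, size=10)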

In [65]:
print(a1)
print(a2)

[44 47 64 67 67  9 83 21 36 87]
[ 9 15 64 28 89 93 29  8 73  0]


In [66]:
np.corrcoef(a1, a2)

array([[ 1.       , -0.2857349],
       [-0.2857349,  1.       ]])
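The off-diagonal entry is the Pearson correlation between `a1` and `a2`; the diagonal is each array's correlation with itself, which is always 1. As a sketch, the same number can be rebuilt from the definition, with the two arrays copied from the printed output above rather than regenerated:

```python
import numpy as np

# Values copied from the printed arrays above.
a1 = np.array([44, 47, 64, 67, 67, 9, 83, 21, 36, 87])
a2 = np.array([9, 15, 64, 28, 89, 93, 29, 8, 73, 0])

# Pearson r = sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2)),
# where dx and dy are deviations from each array's mean.
dx = a1 - a1.mean()
dy = a2 - a2.mean()
r_manual = np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))

r_builtin = np.corrcoef(a1, a2)[0, 1]

print(r_manual, r_builtin)   # both about -0.2857
```

A value near -0.29 indicates a weak negative linear relationship, which is unsurprising for two independently generated random arrays of only ten elements.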