# Business Programming: Chapter 6 Pandas DataFrame
## Chap 6 (109/05/05)
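* Pandas is a widely used Python library for data analysis. Open-sourced at the end of 2009, it provides a high-performance, easy-to-use data structure (the DataFrame) that lets users manipulate and analyze data quickly.
* It makes reading, converting, and processing heterogeneous data easier for analysts, for example when looking up a specific value by row and column.
* The DataFrame is designed for structured (table-like) data, such as tables from a relational database or CSV files. Once data is loaded into a pandas data structure, the methods that object provides make preprocessing quick, for example filling in missing values, or dropping or replacing null values.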
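
As a minimal sketch of the loading and lookup described above, the following reads a small CSV into a DataFrame and picks out one value by row and column. The CSV text and the names `csv_text`, `df`, and `cloud_count` are made up for illustration; `io.StringIO` stands in for a real file on disk.

```python
import io

import pandas as pd

# Hypothetical CSV content, inlined so the example is self-contained.
csv_text = """groups,ironman
DevOps,9
Cloud,19
Security,6
"""

# read_csv accepts any file-like object, so StringIO replaces a file path here.
df = pd.read_csv(io.StringIO(csv_text))
print(df)

# Look up one value: the row whose groups is "Cloud", in the "ironman" column.
cloud_count = df.loc[df["groups"] == "Cloud", "ironman"].iloc[0]
print(cloud_count)
```

With a real file, the only change would be passing its path to `pd.read_csv`.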

In [4]:
import pandas as pd

In [16]:
groups = ['Modem Web', 'DevOps', 'Cloud', 'Big Data', 'Security', '自我挑戰組']
ironman = [59, 9, 19, 14, 6, 77]
ironman_dict = {"groups":groups, "ironman":ironman}
ironman_df = pd.DataFrame(ironman_dict)
ironman_df

      groups  ironman
0  Modem Web       59
1     DevOps        9
2      Cloud       19
3   Big Data       14
4   Security        6
5      自我挑戰組       77
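
Once the DataFrame exists, it is often useful to check its size and column types and to pull out individual rows or cells. The sketch below rebuilds the same DataFrame and shows one way to do this with `shape`, `dtypes`, `iloc`, and `loc`; the variable names `first_row` and `devops_count` are illustrative only.

```python
import pandas as pd

groups = ['Modem Web', 'DevOps', 'Cloud', 'Big Data', 'Security', '自我挑戰組']
ironman = [59, 9, 19, 14, 6, 77]
ironman_df = pd.DataFrame({"groups": groups, "ironman": ironman})

print(ironman_df.shape)   # (number of rows, number of columns)
print(ironman_df.dtypes)  # data type of each column

# iloc selects by integer position: here, the first row as a Series.
first_row = ironman_df.iloc[0]
print(first_row)

# loc selects by index label and column name: here, a single cell.
devops_count = ironman_df.loc[1, "ironman"]
print(devops_count)
```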


In [21]:
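# numeric_only=True restricts each statistic to the numeric columns.
# Without it, sum() concatenates the strings in groups, and newer pandas
# versions raise a TypeError for mean() and median() on a string column.
print("Ironman total:")
print(ironman_df.sum(numeric_only=True)) # total number of ironman participants
print("------------------------------")
print("Ironman mean:")
print(ironman_df.mean(numeric_only=True)) # mean number of participants per group
print("------------------------------")
print("Ironman median:")
print(ironman_df.median(numeric_only=True)) # median number of participants per group
print("------------------------------")
print("Descriptive statistics:")
print(ironman_df.describe()) # all descriptive statistics for the numeric columns

Ironman total:
ironman    184
dtype: int64
------------------------------
Ironman mean:
ironman    30.666667
dtype: float64
------------------------------
Ironman median:
ironman    16.5
dtype: float64
------------------------------
Descriptive statistics:
         ironman
count   6.000000
mean   30.666667
std    29.803803
min     6.000000
25%    10.250000
50%    16.500000
75%    49.000000
max    77.000000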
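
Because `groups` holds strings, statistics computed on the whole DataFrame also touch that column. One alternative, sketched here, is to select the numeric column first so that each statistic is a single number. The names `counts` and `summary` are illustrative only.

```python
import pandas as pd

groups = ['Modem Web', 'DevOps', 'Cloud', 'Big Data', 'Security', '自我挑戰組']
ironman = [59, 9, 19, 14, 6, 77]
ironman_df = pd.DataFrame({"groups": groups, "ironman": ironman})

# Selecting one column returns a Series, so sum/mean/median are scalars.
counts = ironman_df["ironman"]
print(counts.sum())
print(counts.mean())
print(counts.median())

# agg computes several statistics at once and returns them as a Series.
summary = counts.agg(["sum", "mean", "median"])
print(summary)
```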


In [26]:
import numpy as np
groups = ['Modem Web', 'DevOps', np.nan, 'Big Data', 'Security', '自我挑戰組']
ironman = [59, 9, 19, 14, 6, np.nan]
ironman_dict = {"groups":groups, "ironman":ironman}
ironman_df = pd.DataFrame(ironman_dict)
print(ironman_df)
print("------------------------------")
print(ironman_df.loc[:, "groups"]) # show only the groups column
print("------------------------------")
print(ironman_df.loc[:, "groups"].isnull()) # True where groups is missing, False otherwise
print("------------------------------")
print(ironman_df.loc[:, "ironman"]) # show only the ironman column
print("------------------------------")
print(ironman_df.loc[:, "ironman"].notnull()) # True where ironman has a value, False where it is missing

      groups  ironman
0  Modem Web     59.0
1     DevOps      9.0
2        NaN     19.0
3   Big Data     14.0
4   Security      6.0
5      自我挑戰組      NaN
------------------------------
0    Modem Web
1       DevOps
2          NaN
3     Big Data
4     Security
5        自我挑戰組
Name: groups, dtype: object
------------------------------
0    False
1    False
2     True
3    False
4    False
5    False
Name: groups, dtype: bool
------------------------------
0    59.0
1     9.0
2    19.0
3    14.0
4     6.0
5     NaN
Name: ironman, dtype: float64
------------------------------
0     True
1     True
2     True
3     True
4     True
5    False
Name: ironman, dtype: bool
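
The boolean results of `isnull()` can also be aggregated. As a small sketch, summing them counts the missing values per column, and `any(axis=1)` flags the rows that contain at least one missing value. The names `missing_per_column` and `rows_with_missing` are illustrative only.

```python
import numpy as np
import pandas as pd

groups = ['Modem Web', 'DevOps', np.nan, 'Big Data', 'Security', '自我挑戰組']
ironman = [59, 9, 19, 14, 6, np.nan]
ironman_df = pd.DataFrame({"groups": groups, "ironman": ironman})

# True counts as 1 when summed, so this gives missing values per column.
missing_per_column = ironman_df.isnull().sum()
print(missing_per_column)

# any(axis=1) is True for each row with at least one missing value.
rows_with_missing = ironman_df.isnull().any(axis=1)
print(ironman_df[rows_with_missing])
```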


In [27]:
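# Use the boolean Series as a mask: keep only the groups entries that are missing.
print(ironman_df.groups[ironman_df.loc[:, "groups"].isnull()])
# Keep only the ironman entries that are not missing.
print(ironman_df.ironman[ironman_df.loc[:, "ironman"].notnull()])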

2    NaN
Name: groups, dtype: object
0    59.0
1     9.0
2    19.0
3    14.0
4     6.0
Name: ironman, dtype: float64
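
The same masking idea applies to whole rows rather than a single column. The sketch below keeps every row that has an `ironman` value, then combines two masks with `&` to keep only rows complete in both columns. The names `has_count` and `complete` are illustrative only.

```python
import numpy as np
import pandas as pd

groups = ['Modem Web', 'DevOps', np.nan, 'Big Data', 'Security', '自我挑戰組']
ironman = [59, 9, 19, 14, 6, np.nan]
ironman_df = pd.DataFrame({"groups": groups, "ironman": ironman})

# Indexing the DataFrame with a boolean Series keeps the rows where it is True.
has_count = ironman_df[ironman_df["ironman"].notnull()]
print(has_count)

# Combine masks element-wise with & (each operand is already a boolean Series).
complete = ironman_df[ironman_df["groups"].notnull() & ironman_df["ironman"].notnull()]
print(complete)
```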


In [33]:
print(ironman_df)
print("-----------------------------------------")
ironman_df_dropped = ironman_df.dropna() # drop every row that contains a missing value
print(ironman_df_dropped)
print("-----------------------------------------")
ironman_df_filled = ironman_df.fillna(0) # replace every missing value with 0
print(ironman_df_filled)
print("-----------------------------------------")
ironman_df_fillvalue = ironman_df.fillna({"groups":"Cloud", "ironman":71}) # fill missing values with a specified value per column
print(ironman_df_fillvalue)

      groups  ironman
0  Modem Web     59.0
1     DevOps      9.0
2        NaN     19.0
3   Big Data     14.0
4   Security      6.0
5      自我挑戰組      NaN
-----------------------------------------
      groups  ironman
0  Modem Web     59.0
1     DevOps      9.0
3   Big Data     14.0
4   Security      6.0
-----------------------------------------
      groups  ironman
0  Modem Web     59.0
1     DevOps      9.0
2          0     19.0
3   Big Data     14.0
4   Security      6.0
5      自我挑戰組      0.0
-----------------------------------------
      groups  ironman
0  Modem Web     59.0
1     DevOps      9.0
2      Cloud     19.0
3   Big Data     14.0
4   Security      6.0
5      自我挑戰組     71.0
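
Two common variations on the calls above, shown as a sketch rather than a recommendation: filling a numeric column with its own mean instead of a fixed number, and dropping rows only when a particular column is missing. The placeholder `"Unknown"` and the names `mean_count`, `filled`, and `only_named` are assumptions made for this example.

```python
import numpy as np
import pandas as pd

groups = ['Modem Web', 'DevOps', np.nan, 'Big Data', 'Security', '自我挑戰組']
ironman = [59, 9, 19, 14, 6, np.nan]
ironman_df = pd.DataFrame({"groups": groups, "ironman": ironman})

# mean() skips NaN by default, so this averages the five known values.
mean_count = ironman_df["ironman"].mean()

# Fill each column with its own replacement; "Unknown" is a made-up placeholder.
filled = ironman_df.fillna({"groups": "Unknown", "ironman": mean_count})
print(filled)

# subset limits the check to the listed columns: only rows missing groups are dropped.
only_named = ironman_df.dropna(subset=["groups"])
print(only_named)
```

Both `fillna` and `dropna` return a new DataFrame here, so `ironman_df` itself still contains its missing values.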
