# Sorting in pandas

In [1]:
from pandas import Series, DataFrame
import pandas as pd
import numpy as np

## Series

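A Series is a one-dimensional array-like object containing an array of data and an associated array of data labels. The data can be of any NumPy data type, and the labels form the Series index.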

Creating a Series:

A Series behaves like a fixed-length, ordered dict. Passing a dict creates a Series:

In [2]:
dict_1 = {'apple' : 100, 'ball' : 200, 'car' : 300}
ser_3 = Series(dict_1)
ser_3

apple    100
ball     200
car      300
dtype: int64
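The dict analogy can be illustrated with a minimal sketch. The variable names here (`ser`, `has_ball`, and so on) are chosen for illustration only and do not appear in the original notebook.

```python
import pandas as pd

# Same data as dict_1 in the text.
ser = pd.Series({'apple': 100, 'ball': 200, 'car': 300})

has_ball = 'ball' in ser      # membership tests the index labels, like dict keys
ball_value = ser['ball']      # label lookup, like dict[key]
order = list(ser.index)       # insertion order of the dict is preserved
round_trip = ser.to_dict()    # back to a plain dict

print(has_ball, ball_value, order)
```

Unlike a plain dict, the Series also carries a dtype and supports vectorized arithmetic over its values.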

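Passing an index reorders the Series, and labels that are not found in the dict become NaN. Here none of the requested labels exist in dict_1, so every value is NaN: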

In [3]:
index = ['foo', 'bar', 'baz', 'donkey']
ser_4 = Series(dict_1, index=index)
ser_4

foo      NaN
bar      NaN
baz      NaN
donkey   NaN
dtype: float64
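Because none of the labels above match, the reordering itself is not visible. The following sketch, which assumes an index that only partially overlaps the dict keys (the `labels` list is made up for illustration), shows both effects at once.

```python
import pandas as pd

prices = {'apple': 100, 'ball': 200, 'car': 300}

# 'car' and 'apple' exist in the dict; 'donkey' does not.
labels = ['car', 'donkey', 'apple']
ser = pd.Series(prices, index=labels)

# Values follow the requested order; the missing label holds NaN,
# which also forces the dtype from int64 to float64.
print(ser)
```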

Use pandas to detect NaN values. The following two forms are equivalent:

In [4]:
pd.isnull(ser_4)

foo       True
bar       True
baz       True
donkey    True
dtype: bool

In [5]:
ser_4.isnull()

foo       True
bar       True
baz       True
donkey    True
dtype: bool
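The equivalence of the two forms can be checked directly. This is a small sketch on made-up data with one missing value; `notnull` is included as the logical inverse, which is often what filtering code needs.

```python
import numpy as np
import pandas as pd

ser = pd.Series([1.0, np.nan, 3.0], index=['a', 'b', 'c'])

mask_func = pd.isnull(ser)     # top-level function
mask_method = ser.isnull()     # Series method
same = mask_func.equals(mask_method)

# Keep only the entries that are present.
present = ser[ser.notnull()]

print(same)
print(present)
```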

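Series automatically align differently indexed data in arithmetic operations. The result is indexed by the union of the two indexes. Because ser_4 holds only NaN, every label comes out as NaN here: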

In [6]:
ser_3 + ser_4

apple    NaN
ball     NaN
bar      NaN
baz      NaN
car      NaN
donkey   NaN
foo      NaN
dtype: float64
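Alignment is easier to see when the two indexes partially overlap. The sketch below uses hypothetical `left` and `right` Series, and also shows `Series.add` with `fill_value`, one way to treat a label missing on one side as 0 instead of producing NaN.

```python
import pandas as pd

left = pd.Series({'apple': 100, 'ball': 200, 'car': 300})
right = pd.Series({'ball': 1, 'car': 2, 'donkey': 3})

# Labels present on both sides are added; the rest become NaN.
total = left + right

# Treat a label missing on one side as 0 before adding.
filled = left.add(right, fill_value=0)

print(total)
print(filled)
```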

Name the Series:

In [7]:
ser_4.name = 'appleballcardonkey'

Name the Series index:

In [8]:
ser_4.index.name = 'label'

In [9]:
ser_4

label
foo      NaN
bar      NaN
baz      NaN
donkey   NaN
Name: appleballcardonkey, dtype: float64

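Relabel the Series index by assigning a new list of labels in place (this replaces the index, so its name is lost):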

In [10]:
ser_4.index = ['ap', 'ba', 'ca', 'do']
ser_4

ap   NaN
ba   NaN
ca   NaN
do   NaN
Name: appleballcardonkey, dtype: float64
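Assigning to `index` replaces every label in place. As an alternative sketch, `Series.rename` with a mapping returns a new Series and changes only the labels listed; the data and names below are made up for illustration.

```python
import pandas as pd

ser = pd.Series([1, 2, 3], index=['foo', 'bar', 'baz'], name='demo')

# Only 'foo' and 'bar' are remapped; 'baz' is left untouched.
relabeled = ser.rename(index={'foo': 'ap', 'bar': 'ba'})

print(relabeled)
print(list(ser.index))   # the original Series is unchanged
```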

## Sorting and Ranking

In [11]:
ser_4

ap   NaN
ba   NaN
ca   NaN
do   NaN
Name: appleballcardonkey, dtype: float64

Sort a Series by its index:

In [12]:
ser_4.sort_index()

ap   NaN
ba   NaN
ca   NaN
do   NaN
Name: appleballcardonkey, dtype: float64

Sort a Series by its values (ser_4 contains only NaN, so the order does not change):

In [13]:
ser_4.sort_values()

ap   NaN
ba   NaN
ca   NaN
do   NaN
Name: appleballcardonkey, dtype: float64
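Since ser_4 holds only NaN, the outputs above cannot show any reordering. A minimal sketch on made-up data with one missing value makes the behavior of `sort_index` and `sort_values` visible, including where NaN ends up.

```python
import numpy as np
import pandas as pd

ser = pd.Series([3.0, np.nan, 1.0, 2.0], index=['d', 'a', 'c', 'b'])

by_label = ser.sort_index()                        # labels a, b, c, d
by_value = ser.sort_values()                       # 1.0, 2.0, 3.0, NaN
descending = ser.sort_values(ascending=False)      # 3.0, 2.0, 1.0, NaN
nan_first = ser.sort_values(na_position='first')   # NaN, 1.0, 2.0, 3.0

print(by_value)
```

NaN is placed last by default in both ascending and descending order; `na_position='first'` moves it to the front.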

In [14]:
df_12 = DataFrame(np.arange(12).reshape((3, 4)),
                  index=['three', 'one', 'two'],
                  columns=['c', 'a', 'b', 'd'])
df_12

       c  a   b   d
three  0  1   2   3
one    4  5   6   7
two    8  9  10  11


Sort a DataFrame by its index:

In [15]:
df_12.sort_index()

       c  a   b   d
one    4  5   6   7
three  0  1   2   3
two    8  9  10  11


Sort a DataFrame by its column labels in descending order:

In [16]:
df_12.sort_index(axis=1, ascending=False)

        d  c   b  a
three   3  0   2  1
one     7  4   6  5
two    11  8  10  9


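Sort the rows of a DataFrame by the values in column d, then column c (df_12 is already in that order, so the output is unchanged):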

In [17]:
df_12.sort_values(by=['d', 'c'])

       c  a   b   d
three  0  1   2   3
one    4  5   6   7
two    8  9  10  11
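Because df_12 is already sorted by d and c, a sketch on a small hypothetical frame may show multi-column sorting more clearly. It assumes a `group` and a `score` column (not from the original) and passes a list to `ascending` to mix sort directions.

```python
import pandas as pd

df = pd.DataFrame({'group': ['b', 'a', 'b', 'a'],
                   'score': [1, 3, 2, 4]},
                  index=['w', 'x', 'y', 'z'])

# Sort by group ascending, then by score descending within each group.
ordered = df.sort_values(by=['group', 'score'], ascending=[True, False])

# Sort the column labels themselves in descending order.
cols_desc = df.sort_index(axis=1, ascending=False)

print(ordered)
print(cols_desc)
```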


Ranking is similar to numpy.argsort, except that ties are broken by assigning each group of tied values the average rank:

In [18]:
ser_11 = Series([7, -5, 7, 4, 2, 0, 4, 7])
ser_11 = ser_11.sort_values()
ser_11

1   -5
5    0
4    2
3    4
6    4
0    7
2    7
7    7
dtype: int64

In [19]:
ser_11.rank()

1    1.0
5    2.0
4    3.0
3    4.5
6    4.5
0    7.0
2    7.0
7    7.0
dtype: float64
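The relationship to numpy.argsort can be sketched on the same values, here left in their original order rather than pre-sorted. `argsort` returns the positions that would sort the data, while `rank` returns each element's rank, with ties averaged.

```python
import numpy as np
import pandas as pd

ser = pd.Series([7, -5, 7, 4, 2, 0, 4, 7])

# Positions that would sort the values; a stable sort keeps ties in input order.
positions = np.argsort(ser.to_numpy(), kind='stable')

# Rank of each element; the three 7s share (6 + 7 + 8) / 3 = 7.0
# and the two 4s share (4 + 5) / 2 = 4.5.
avg_rank = ser.rank()

print(positions)
print(avg_rank)
```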

Rank tied values in the order in which they appear in the data:

In [20]:
ser_11.rank(method='first')

1    1.0
5    2.0
4    3.0
3    4.0
6    5.0
0    6.0
2    7.0
7    8.0
dtype: float64

Rank in descending order, assigning each group of tied values its maximum rank:

In [21]:
ser_11.rank(ascending=False, method='max')

1    8.0
5    7.0
4    6.0
3    5.0
6    5.0
0    3.0
2    3.0
7    3.0
dtype: float64
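To compare tie-breaking rules side by side, the following sketch collects several values of the `method` parameter into one DataFrame. The `comparison` frame is only an illustration device; `min` and `dense` are additional methods not used in the text above.

```python
import pandas as pd

ser = pd.Series([7, -5, 7, 4, 2, 0, 4, 7])

methods = ['average', 'min', 'max', 'first', 'dense']

# One column per tie-breaking method, plus the original values for reference.
comparison = pd.DataFrame({m: ser.rank(method=m) for m in methods})
comparison.insert(0, 'value', ser)

print(comparison)
```

With `dense`, ranks increase by exactly 1 between distinct values, so the largest rank equals the number of distinct values.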

A DataFrame can be ranked over its rows or over its columns:

In [22]:
df_13 = DataFrame({'apple' : [7, -5, 7, 4, 2, 0, 4, 7],
                   'ball' : [-5, 4, 2, 0, 4, 7, 7, 8],
                   'car' : [-1, 2, 3, 0, 5, 9, 9, 5]})
df_13

   apple  ball  car
0      7    -5   -1
1     -5     4    2
2      7     2    3
3      4     0    0
4      2     4    5
5      0     7    9
6      4     7    9
7      7     8    5


Rank the values within each column (the default, axis=0):

In [23]:
df_13.rank()

   apple  ball  car
0    7.0   1.0  1.0
1    1.0   4.5  3.0
2    7.0   3.0  4.0
3    4.5   2.0  2.0
4    3.0   4.5  5.5
5    2.0   6.5  7.5
6    4.5   6.5  7.5
7    7.0   8.0  5.5


Rank the values within each row (axis=1):

In [24]:
df_13.rank(axis=1)

   apple  ball  car
0    3.0   1.0  2.0
1    1.0   3.0  2.0
2    3.0   1.0  2.0
3    3.0   1.5  1.5
4    1.0   2.0  3.0
5    1.0   2.0  3.0
6    1.0   2.0  3.0
7    2.0   3.0  1.0
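The effect of the `axis` argument is easier to follow on a smaller frame. This sketch uses a three-row subset of made-up values so each rank can be checked by hand.

```python
import pandas as pd

df = pd.DataFrame({'apple': [7, -5, 4],
                   'ball': [-5, 4, 4],
                   'car': [-1, 2, 0]})

# axis=0 (default): each column is ranked independently, top to bottom.
within_columns = df.rank()

# axis=1: each row is ranked independently, left to right.
within_rows = df.rank(axis=1)

print(within_columns)
print(within_rows)
```

In the last row, apple and ball tie at 4, so with axis=1 they share the average rank 2.5.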


## Summarizing and Computing Descriptive Statistics

Unlike the equivalent NumPy array methods, pandas descriptive statistics automatically exclude missing data (NaN values).
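The contrast with NumPy can be sketched on three made-up values, one of them missing. `np.nansum` is shown as the NumPy function that matches the pandas default.

```python
import numpy as np
import pandas as pd

values = [1.0, np.nan, 3.0]

numpy_sum = np.array(values).sum()      # NaN propagates through the sum
pandas_sum = pd.Series(values).sum()    # NaN is skipped by default
numpy_nansum = np.nansum(values)        # NumPy's NaN-ignoring variant

print(numpy_sum, pandas_sum, numpy_nansum)
```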

In [26]:
df_12

       c  a   b   d
three  0  1   2   3
one    4  5   6   7
two    8  9  10  11


Sum down each column (the default, axis=0):

In [28]:
df_12.sum()

c    12
a    15
b    18
d    21
dtype: int64

Sum across each row:

In [29]:
df_12.sum(axis=1)

three     6
one      22
two      38
dtype: int64

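Disable NaN skipping with skipna=False. This does not count NaN values; instead, any row that contains NaN sums to NaN. df_12 has no missing values, so the result matches the previous one: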

In [30]:
df_12.sum(axis=1, skipna=False)

three     6
one      22
two      38
dtype: int64