# Mod09 Series Indexing and Selection

## Data Selection in Series

### Series as dictionary

In [1]:
import numpy as np
import pandas as pd

In [2]:
pd.__version__

'1.0.5'

In [3]:
np.__version__

'1.19.1'

In [4]:
data = pd.Series([0.25, 0.5, 0.75, 1.0],
                 index=['a', 'b', 'c', 'd'])
data

a    0.25
b    0.50
c    0.75
d    1.00
dtype: float64

In [5]:
data['b']

0.5

In [6]:
data[['a','c']]   #fancy index

a    0.25
c    0.75
dtype: float64

In [7]:
data.index    # an auto-generated index would display as a RangeIndex

Index(['a', 'b', 'c', 'd'], dtype='object')

In [8]:
data>0.5   # yields a boolean Series

a    False
b    False
c     True
d     True
dtype: bool

In [9]:
type(data>0.5)  

pandas.core.series.Series

In [10]:
data[data>0.5]

c    0.75
d    1.00
dtype: float64

We can also use dictionary-like Python expressions and methods to examine the keys/indices and values:

In [9]:
'a' in data    # treated as a dictionary: is 'a' one of the keys in data?

True

In [10]:
data.index   

Index(['a', 'b', 'c', 'd'], dtype='object')

In [12]:
data

a    0.25
b    0.50
c    0.75
d    1.00
dtype: float64

In [19]:
data.keys()

Index(['a', 'b', 'c', 'd'], dtype='object')
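
The dictionary-like behaviour described above can be sketched a little further. This is a minimal illustration that rebuilds the same four-element Series; the variable names `has_label`, `has_value`, `pairs`, and `missing` are only for this example.

```python
import pandas as pd

data = pd.Series([0.25, 0.5, 0.75, 1.0], index=['a', 'b', 'c', 'd'])

# Membership tests look at the index labels (the "keys"), not the values.
has_label = 'a' in data      # 'a' is a label
has_value = 0.25 in data     # 0.25 is a value, not a label

# items() yields (label, value) pairs, much like dict.items().
pairs = list(data.items())

# get() returns a default instead of raising KeyError for a missing label.
missing = data.get('z', default=None)
```

Note that `in` never inspects the values; to test for a value, something like `(data == 0.25).any()` would be needed instead.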

Extend a ``Series`` by assigning to a new index value:

In [13]:
data['e'] = 1.25
data

a    0.25
b    0.50
c    0.75
d    1.00
e    1.25
dtype: float64
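
As a small sketch of the same idea, assignment behaves like a dictionary update: an existing label is overwritten and a new label enlarges the Series in place. The two-element starting Series here is an assumption made to keep the example short.

```python
import pandas as pd

data = pd.Series([0.25, 0.5], index=['a', 'b'])

data['c'] = 0.75       # new label: the Series grows by one element
data['a'] = 0.0        # existing label: the value is overwritten
data.loc['d'] = 1.0    # loc can enlarge the Series in the same way

labels = list(data.index)
values = list(data)
```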

### Series as one-dimensional array

In [16]:
data['b']

0.5

In [17]:
# masking
data[(data > 0.3) & (data < 0.8)]

b    0.50
c    0.75
dtype: float64

In [20]:
data[(data < 0.3) | (data < 0.8)]   # element-wise OR is written | in pandas

a    0.25
b    0.50
c    0.75
dtype: float64
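
The masking cells above combine conditions with `&` and `|`. A minimal sketch of the three element-wise operators follows; each comparison needs its own parentheses because `&`, `|`, and `~` bind more tightly than `<` and `>`. The names `inside`, `outside`, and `either` are illustrative.

```python
import pandas as pd

data = pd.Series([0.25, 0.5, 0.75, 1.0, 1.25],
                 index=['a', 'b', 'c', 'd', 'e'])

# AND: values strictly between 0.3 and 0.8
inside = data[(data > 0.3) & (data < 0.8)]

# NOT: everything the previous mask rejected
outside = data[~((data > 0.3) & (data < 0.8))]

# OR: values below 0.3 or above 1.0
either = data[(data < 0.3) | (data > 1.0)]
```

Python's `and`, `or`, and `not` do not work here, since they try to reduce the whole Series to a single truth value.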

In [21]:
# fancy indexing
data[['a', 'e']]

a    0.25
e    1.25
dtype: float64

In [11]:
# slicing by explicit index
data['b':'c']    # slicing by explicit label includes the end point

b    0.50
c    0.75
dtype: float64

In [23]:
data[2]    # positional fallback on a label index; data.iloc[2] is the explicit form

0.75

In [14]:
data

a    0.25
b    0.50
c    0.75
d    1.00
e    1.25
dtype: float64

In [24]:
# slicing by implicit integer index
data[0:2]   # slicing by position excludes the end point

a    0.25
b    0.50
dtype: float64

In [15]:
data[::2]

a    0.25
c    0.75
e    1.25
dtype: float64

In [26]:
data[0:4:2]

a    0.25
c    0.75
dtype: float64

In [27]:
data[:-1:2]

a    0.25
c    0.75
dtype: float64
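
The difference between the two slicing styles above can be summarised in one short sketch. It uses `loc` and `iloc` to make the intent explicit rather than relying on plain `[]`; the variable names are illustrative.

```python
import pandas as pd

data = pd.Series([0.25, 0.5, 0.75, 1.0, 1.25], index=list('abcde'))

# Label-based slice: both end points are included.
by_label = data.loc['b':'d']

# Position-based slice: the stop position is excluded, as in Python lists.
by_position = data.iloc[1:3]

# Step slicing works by position as well.
every_other = data.iloc[::2]
```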

### Indexers: loc, iloc

In [16]:
data = pd.Series(['a', 'b', 'c'], index=[1, 3, 5])    # integer labels that differ from positions: a recipe for confusion
data

1    a
3    b
5    c
dtype: object

In [21]:
# explicit index when indexing
data[1]

'a'

In [44]:
# implicit index when slicing
data[1:3]    # the slice bounds are positions

3    b
5    c
dtype: object

In [45]:
data[0:2]

1    a
3    b
dtype: object

The ``loc`` attribute allows indexing and slicing that always references the explicit index; ``at`` is its scalar-only counterpart:

In [24]:
data.at[1]    # the value whose key (label) is 1

'a'

In [49]:
%timeit data.at[1]    # fetch a single value

4.92 µs ± 58.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


In [25]:
data

1    a
3    b
5    c
dtype: object

In [53]:
data.loc[1]    # the key is the label; loc handles more general lookups than at, so it takes longer

'a'

In [52]:
%timeit data.loc[1]   

9.77 µs ± 99.9 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


In [54]:
data.loc[1:3]   # selects the elements by label, end point included

1    a
3    b
dtype: object

The ``iloc`` attribute allows indexing and slicing that always references the implicit Python-style index:

In [55]:
data.iloc[1]     # position

'b'

In [56]:
data.iat[1]

'b'

In [57]:
data.iloc[1:3]

3    b
5    c
dtype: object
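
To make the contrast concrete, here is a minimal side-by-side sketch of the four indexers on the same integer-labelled Series. The result names (`label_scalar` and so on) are only for illustration.

```python
import pandas as pd

data = pd.Series(['a', 'b', 'c'], index=[1, 3, 5])

# Label-based access: the number is matched against the index.
label_scalar = data.loc[1]          # label 1
label_slice = data.loc[1:3]         # labels 1 through 3, end point included
label_fast = data.at[3]             # scalar-only access by label

# Position-based access: the number counts from 0.
position_scalar = data.iloc[1]      # second element
position_slice = data.iloc[1:3]     # positions 1 and 2, stop excluded
position_fast = data.iat[2]         # scalar-only access by position
```

Using `loc`/`at` or `iloc`/`iat` states the intent explicitly, which avoids guessing whether `data[1]` means a label or a position.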

## Lab

<b>Given a Series ser, retrieve the following data:
* Use indexing to get the value 13
* Use slicing to get the four values 36, 13, 35, 17
</b>

In [31]:
np.random.seed(41)
ser = pd.Series(np.random.randint(1,50,size=7), index=list('abcdefg'))
ser

a     1
b    36
c    13
d    35
e    17
f     2
g    26
dtype: int64

In [36]:
ser['b':'e']

b    36
c    13
d    35
e    17
dtype: int64

In [37]:
ser[1:5]

b    36
c    13
d    35
e    17
dtype: int64

<b> Use loc and iloc to get the four values 36, 13, 35, 17</b>

In [39]:
ser.loc['b':'e']

b    36
c    13
d    35
e    17
dtype: int64

In [40]:
ser.iloc[1:5]

b    36
c    13
d    35
e    17
dtype: int64

<b> Use at and iat to get the value 35</b>

In [42]:
ser.at['d']

35

In [43]:
ser.iat[3]

35
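
As a recap of the lab, the sketch below checks that the label-based and position-based routes agree. To stay independent of the random seed, it hard-codes the seven values printed above instead of regenerating them; the variable names are illustrative.

```python
import pandas as pd

# Values copied from the Series printed in the lab above.
ser = pd.Series([1, 36, 13, 35, 17, 2, 26], index=list('abcdefg'))

# Single value 13: by label and by position.
thirteen_label = ser.loc['c']
thirteen_position = ser.iloc[2]

# The four values 36, 13, 35, 17: label slice vs. position slice.
four_by_label = ser.loc['b':'e']     # end label included
four_by_position = ser.iloc[1:5]     # stop position excluded

# Single value 35 through the scalar accessors.
thirty_five_at = ser.at['d']
thirty_five_iat = ser.iat[3]
```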

### Importing a file

In [30]:
df = pd.read_csv('data/tips.csv')

In [31]:
df.shape

(244, 7)

In [32]:
df.info    # without parentheses this returns the bound method; call df.info() for the column summary

<bound method DataFrame.info of      total_bill   tip     sex smoker   day    time  size
0         16.99  1.01  Female     No   Sun  Dinner     2
1         10.34  1.66    Male     No   Sun  Dinner     3
2         21.01  3.50    Male     No   Sun  Dinner     3
3         23.68  3.31    Male     No   Sun  Dinner     2
4         24.59  3.61  Female     No   Sun  Dinner     4
..          ...   ...     ...    ...   ...     ...   ...
239       29.03  5.92    Male     No   Sat  Dinner     3
240       27.18  2.00  Female    Yes   Sat  Dinner     2
241       22.67  2.00    Male    Yes   Sat  Dinner     2
242       17.82  1.75    Male     No   Sat  Dinner     2
243       18.78  3.00  Female     No  Thur  Dinner     2

[244 rows x 7 columns]>

In [33]:
df.head()

   total_bill   tip     sex smoker  day    time  size
0       16.99  1.01  Female     No  Sun  Dinner     2
1       10.34  1.66    Male     No  Sun  Dinner     3
2       21.01  3.50    Male     No  Sun  Dinner     3
3       23.68  3.31    Male     No  Sun  Dinner     2
4       24.59  3.61  Female     No  Sun  Dinner     4


In [36]:
ser = df['total_bill']
ser

0      16.99
1      10.34
2      21.01
3      23.68
4      24.59
       ...  
239    29.03
240    27.18
241    22.67
242    17.82
243    18.78
Name: total_bill, Length: 244, dtype: float64

In [37]:
ser.head(10)

0    16.99
1    10.34
2    21.01
3    23.68
4    24.59
5    25.29
6     8.77
7    26.88
8    15.04
9    14.78
Name: total_bill, dtype: float64

In [38]:
ser[3]

23.68

In [39]:
ser[1:4]

1    10.34
2    21.01
3    23.68
Name: total_bill, dtype: float64
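
Because the column has a default integer index, labels and positions coincide, which can hide the difference between the two slicing rules. The sketch below rebuilds only the first five rows shown above in memory (an assumption, so that it runs without `data/tips.csv`) and compares the explicit indexers.

```python
import pandas as pd

# First five rows of the tips data, copied from the head() output above.
df = pd.DataFrame({
    'total_bill': [16.99, 10.34, 21.01, 23.68, 24.59],
    'tip': [1.01, 1.66, 3.50, 3.31, 3.61],
    'size': [2, 3, 3, 2, 4],
})

ser = df['total_bill']    # selecting one column gives a Series

fourth = ser.iloc[3]            # element at position 3
by_label = ser.loc[1:3]         # labels 1, 2, 3 (end point included)
by_position = ser.iloc[1:4]     # positions 1, 2, 3 (stop excluded)
```

Here `ser.loc[1:3]` and `ser.iloc[1:4]` return the same three rows, even though their slice bounds differ.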

In [34]:
df.tail()

     total_bill   tip     sex smoker   day    time  size
239       29.03  5.92    Male     No   Sat  Dinner     3
240       27.18  2.00  Female    Yes   Sat  Dinner     2
241       22.67  2.00    Male    Yes   Sat  Dinner     2
242       17.82  1.75    Male     No   Sat  Dinner     2
243       18.78  3.00  Female     No  Thur  Dinner     2
