# Subset Variables (Columns)
0. [Preparation](#0)
1. [Columns with Specific Names](#1)
   1. [Multiple Columns](#1-1)
   2. [Single Column](#1-2)
2. [Regular Expressions](#2)
3. [Columns between A and B (inclusive)](#3)
4. [Columns in Specific Position](#4)
5. [Columns of Specific Rows Meeting Logical Condition](#5)

## 0. Preparation<a name="0"></a>

```python
import pandas as pd
import seaborn as sns  # provides the iris dataset
```

```python
df = sns.load_dataset('iris')
df.head()
```

| |sepal_length|sepal_width|petal_length|petal_width|species|
|---|---|---|---|---|---|
|0|5.1|3.5|1.4|0.2|setosa|
|1|4.9|3.0|1.4|0.2|setosa|
|2|4.7|3.2|1.3|0.2|setosa|
|3|4.6|3.1|1.5|0.2|setosa|
|4|5.0|3.6|1.4|0.2|setosa|

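`sns.load_dataset` downloads the data the first time it is called, so it needs network access. As an offline stand-in (an assumption, not part of the original notebook), the sketch below rebuilds only the five rows printed by `df.head()` and lists each column's 0-based position, which the position-based selections in section 4 rely on.

```python
import pandas as pd

# Offline stand-in: the five rows shown by df.head() above, typed in by hand
df = pd.DataFrame({
    'sepal_length': [5.1, 4.9, 4.7, 4.6, 5.0],
    'sepal_width':  [3.5, 3.0, 3.2, 3.1, 3.6],
    'petal_length': [1.4, 1.4, 1.3, 1.5, 1.4],
    'petal_width':  [0.2, 0.2, 0.2, 0.2, 0.2],
    'species':      ['setosa'] * 5,
})

# Map each column name to its 0-based position
positions = {name: i for i, name in enumerate(df.columns)}
print(positions)
# {'sepal_length': 0, 'sepal_width': 1, 'petal_length': 2, 'petal_width': 3, 'species': 4}
```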

---
## 1. Columns with Specific Names<a name="1"></a>
### A. Multiple Columns<a name="1-1"></a>
Select multiple columns by passing a list of names inside the brackets; the result is a DataFrame.
```python
dataframe[['column1', 'column2']]
```

```python
df[['sepal_length', 'sepal_width', 'species']]
```

| |sepal_length|sepal_width|species|
|---|---|---|---|
|0|5.1|3.5|setosa|
|1|4.9|3.0|setosa|
|2|4.7|3.2|setosa|
|3|4.6|3.1|setosa|
|4|5.0|3.6|setosa|
|...|...|...|...|
|145|6.7|3.0|virginica|
|146|6.3|2.5|virginica|
|147|6.5|3.0|virginica|
|148|6.2|3.4|virginica|


```python
columns = ['sepal_length', 'sepal_width', 'species']
df[columns].head()
```

| |sepal_length|sepal_width|species|
|---|---|---|---|
|0|5.1|3.5|setosa|
|1|4.9|3.0|setosa|
|2|4.7|3.2|setosa|
|3|4.6|3.1|setosa|
|4|5.0|3.6|setosa|

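Two details of list-based selection are easy to miss: the returned columns follow the order of the list, not the original order, and a single unknown name makes the whole selection fail. A minimal sketch, assuming a three-row stand-in built from the `df.head()` values and a deliberately misspelled, hypothetical column name `sepal_size`:

```python
import pandas as pd

# Stand-in using the first three iris rows shown by df.head()
df = pd.DataFrame({
    'sepal_length': [5.1, 4.9, 4.7],
    'sepal_width':  [3.5, 3.0, 3.2],
    'petal_length': [1.4, 1.4, 1.3],
    'petal_width':  [0.2, 0.2, 0.2],
    'species':      ['setosa', 'setosa', 'setosa'],
})

# A list of names returns a DataFrame whose columns follow the list order
subset = df[['species', 'sepal_length']]
print(type(subset).__name__)   # DataFrame
print(list(subset.columns))    # ['species', 'sepal_length']

# A name that is not in df.columns raises KeyError
# ('sepal_size' is a hypothetical, misspelled name)
try:
    df[['sepal_length', 'sepal_size']]
    missing_raises = False
except KeyError:
    missing_raises = True
print(missing_raises)          # True
```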

### B. Single Column<a name="1-2"></a>
Select a single column by name; the result is a Series.
```python
dataframe['column']
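dataframe.column  # attribute access: only works if the name is a valid Python identifier (no spaces or special characters) and does not clash with an existing DataFrame attribute or method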
```

```python
df['sepal_length']
```

```
0      5.1
1      4.9
2      4.7
3      4.6
4      5.0
      ... 
145    6.7
146    6.3
147    6.5
148    6.2
149    5.9
Name: sepal_length, Length: 150, dtype: float64
```

```python
df.sepal_length
```

```
0      5.1
1      4.9
2      4.7
3      4.6
4      5.0
      ... 
145    6.7
146    6.3
147    6.5
148    6.2
149    5.9
Name: sepal_length, Length: 150, dtype: float64
```

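The sketch below contrasts a single name (Series) with a one-element list (DataFrame) and shows where attribute access breaks down. The columns `petal length` (with a space) and `count` (which clashes with the `DataFrame.count` method) are hypothetical names chosen only to illustrate the limitation.

```python
import pandas as pd

df = pd.DataFrame({
    'sepal_length': [5.1, 4.9, 4.7],
    'petal length': [1.4, 1.4, 1.3],   # hypothetical name containing a space
    'count':        [1, 2, 3],         # hypothetical name clashing with DataFrame.count
})

s = df['sepal_length']            # single name -> Series
one_col = df[['sepal_length']]    # list with one name -> DataFrame
print(type(s).__name__, type(one_col).__name__)   # Series DataFrame

# Attribute access works for a valid identifier...
print(df.sepal_length.equals(df['sepal_length']))  # True
# ...but a name with a space is reachable only with brackets
print(df['petal length'].tolist())                  # [1.4, 1.4, 1.3]
# ...and df.count is the DataFrame.count method, not the 'count' column
print(callable(df.count), df['count'].tolist())     # True [1, 2, 3]
```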
---
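## 2. Regular Expressions<a name="2"></a>
Select columns whose names match the regular expression <i>regex</i>. The pattern may match anywhere in a name (pandas keeps labels for which `re.search` finds a match), so use `^` and `$` to anchor it.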

```python
df.filter(regex='regex')
```

|regex|meaning|
|:---|:---|
|`'_'`|Matches strings containing an underscore '_'|
|`'\.'`|Matches strings containing a literal period '.' (escaped, because an unescaped '.' matches any character)|
|`'length$'`|Matches strings ending with 'length'|
|`'^sepal'`|Matches strings beginning with 'sepal'|
|`'^x[1-5]$'`|Matches strings consisting of 'x' followed by a single digit from 1 to 5 (x1 to x5)|
|`'^(?!species$).*'`|Matches every string except exactly 'species'|

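The iris columns cannot demonstrate the `'\.'` and `'^x[1-5]$'` rows of the table, so this sketch applies every pattern to a DataFrame with hypothetical column names chosen for that purpose. It also shows why the period must be escaped.

```python
import pandas as pd

# Hypothetical column names chosen to exercise each pattern in the table
cols = ['x1', 'x5', 'x6', 'x10', 'a.b', 'sepal_length', 'species']
df = pd.DataFrame([list(range(len(cols)))], columns=cols)

escaped = list(df.filter(regex=r'\.').columns)      # literal period only
unescaped = list(df.filter(regex='.').columns)      # '.' matches any character
x_range = list(df.filter(regex='^x[1-5]$').columns)
not_species = list(df.filter(regex='^(?!species$).*').columns)

print(escaped)      # ['a.b']
print(unescaped)    # every column name
print(x_range)      # ['x1', 'x5']
print(not_species)  # every column name except 'species'
```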
```python
# Matches strings containing an underscore '_'
df.filter(regex='_')
```

| |sepal_length|sepal_width|petal_length|petal_width|
|---|---|---|---|---|
|0|5.1|3.5|1.4|0.2|
|1|4.9|3.0|1.4|0.2|
|2|4.7|3.2|1.3|0.2|
|3|4.6|3.1|1.5|0.2|
|4|5.0|3.6|1.4|0.2|
|...|...|...|...|...|
|145|6.7|3.0|5.2|2.3|
|146|6.3|2.5|5.0|1.9|
|147|6.5|3.0|5.2|2.0|
|148|6.2|3.4|5.4|2.3|


```python
# Matches strings ending with 'length'
df.filter(regex='length$')
```

| |sepal_length|petal_length|
|---|---|---|
|0|5.1|1.4|
|1|4.9|1.4|
|2|4.7|1.3|
|3|4.6|1.5|
|4|5.0|1.4|
|...|...|...|
|145|6.7|5.2|
|146|6.3|5.0|
|147|6.5|5.2|
|148|6.2|5.4|


```python
# Matches strings beginning with 'sepal'
df.filter(regex='^sepal')
```

| |sepal_length|sepal_width|
|---|---|---|
|0|5.1|3.5|
|1|4.9|3.0|
|2|4.7|3.2|
|3|4.6|3.1|
|4|5.0|3.6|
|...|...|...|
|145|6.7|3.0|
|146|6.3|2.5|
|147|6.5|3.0|
|148|6.2|3.4|


```python
# Matches every string except exactly 'species'
df.filter(regex='^(?!species$).*')
```

| |sepal_length|sepal_width|petal_length|petal_width|
|---|---|---|---|---|
|0|5.1|3.5|1.4|0.2|
|1|4.9|3.0|1.4|0.2|
|2|4.7|3.2|1.3|0.2|
|3|4.6|3.1|1.5|0.2|
|4|5.0|3.6|1.4|0.2|
|...|...|...|...|...|
|145|6.7|3.0|5.2|2.3|
|146|6.3|2.5|5.0|1.9|
|147|6.5|3.0|5.2|2.0|
|148|6.2|3.4|5.4|2.3|


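pandas documents `filter(regex=...)` as keeping the labels for which `re.search(regex, label)` finds a match. The sketch below checks that description against a plain-Python equivalent on the iris column names; `select_by_regex` is a hypothetical helper written only for this comparison.

```python
import re
import pandas as pd

# Empty stand-in with the iris column names (no rows are needed to select columns)
df = pd.DataFrame(columns=['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species'])

def select_by_regex(frame, pattern):
    """Hypothetical helper: keep column names where re.search finds a match."""
    return [name for name in frame.columns if re.search(pattern, name)]

# The two approaches agree for every pattern used in this section
for pattern in ['_', 'length$', '^sepal', '^(?!species$).*', 'width']:
    assert list(df.filter(regex=pattern).columns) == select_by_regex(df, pattern)

# Unanchored patterns match anywhere in the name
print(select_by_regex(df, 'width'))    # ['sepal_width', 'petal_width']
print(select_by_regex(df, 'length$'))  # ['sepal_length', 'petal_length']
```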
---
## 3. Columns between A and B (inclusive)<a name="3"></a>
Select all columns from sepal_width through petal_width. Label-based slicing with `.loc` includes both endpoints, unlike Python's positional slicing.

```python
df.loc[:, 'sepal_width':'petal_width']
```

| |sepal_width|petal_length|petal_width|
|---|---|---|---|
|0|3.5|1.4|0.2|
|1|3.0|1.4|0.2|
|2|3.2|1.3|0.2|
|3|3.1|1.5|0.2|
|4|3.6|1.4|0.2|
|...|...|...|...|
|145|3.0|5.2|2.3|
|146|2.5|5.0|1.9|
|147|3.0|5.2|2.0|
|148|3.4|5.4|2.3|


```python
df.loc[:5, 'sepal_width':'petal_width']  # .loc slices by label: rows up to and including index 5
```

| |sepal_width|petal_length|petal_width|
|---|---|---|---|
|0|3.5|1.4|0.2|
|1|3.0|1.4|0.2|
|2|3.2|1.3|0.2|
|3|3.1|1.5|0.2|
|4|3.6|1.4|0.2|
|5|3.9|1.7|0.4|

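To make the inclusive/exclusive difference concrete, the sketch below reproduces a `.loc` label slice with `.iloc` by converting the column labels to positions with `Index.get_loc` and adding 1 to the stop. It assumes a six-row stand-in built from the values printed above; with the default integer row index, row labels and positions coincide.

```python
import pandas as pd

# Stand-in: the first six iris rows, using values printed in this notebook
df = pd.DataFrame({
    'sepal_length': [5.1, 4.9, 4.7, 4.6, 5.0, 5.4],
    'sepal_width':  [3.5, 3.0, 3.2, 3.1, 3.6, 3.9],
    'petal_length': [1.4, 1.4, 1.3, 1.5, 1.4, 1.7],
    'petal_width':  [0.2, 0.2, 0.2, 0.2, 0.2, 0.4],
    'species':      ['setosa'] * 6,
})

# Label-based: both endpoints included (rows 0..3, columns sepal_width..petal_width)
by_label = df.loc[:3, 'sepal_width':'petal_width']

# Position-based: stop is excluded, so add 1 to reach the same columns and rows
start = df.columns.get_loc('sepal_width')   # 1
stop = df.columns.get_loc('petal_width')    # 3
by_position = df.iloc[:4, start:stop + 1]

print(by_label.shape)                # (4, 3)
print(by_label.equals(by_position))  # True
```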

---
## 4. Columns in Specific Position<a name="4"></a>
Select columns in positions 1, 2 and 4 (first column is 0).

```python
df.iloc[:, [1, 2, 4]]
```

| |sepal_width|petal_length|species|
|---|---|---|---|
|0|3.5|1.4|setosa|
|1|3.0|1.4|setosa|
|2|3.2|1.3|setosa|
|3|3.1|1.5|setosa|
|4|3.6|1.4|setosa|
|...|...|...|...|
|145|3.0|5.2|virginica|
|146|2.5|5.0|virginica|
|147|3.0|5.2|virginica|
|148|3.4|5.4|virginica|


```python
df.iloc[:5, [1, 2, 4]]  # .iloc slices by position: rows up to, but not including, position 5
```

| |sepal_width|petal_length|species|
|---|---|---|---|
|0|3.5|1.4|setosa|
|1|3.0|1.4|setosa|
|2|3.2|1.3|setosa|
|3|3.1|1.5|setosa|
|4|3.6|1.4|setosa|

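A few related position tricks, sketched on a three-row stand-in (an assumption, built from the `df.head()` values): negative positions count from the last column, and `Index.get_indexer` turns known names into the positions `.iloc` expects.

```python
import pandas as pd

# Stand-in using the first three iris rows
df = pd.DataFrame({
    'sepal_length': [5.1, 4.9, 4.7],
    'sepal_width':  [3.5, 3.0, 3.2],
    'petal_length': [1.4, 1.4, 1.3],
    'petal_width':  [0.2, 0.2, 0.2],
    'species':      ['setosa', 'setosa', 'setosa'],
})

picked = list(df.iloc[:, [1, 2, 4]].columns)
print(picked)                   # ['sepal_width', 'petal_length', 'species']

# Negative positions count from the end: -1 is the last column
last = list(df.iloc[:, [-1]].columns)
print(last)                     # ['species']

# Convert names to positions when only the names are known
positions = df.columns.get_indexer(['sepal_width', 'species']).tolist()
print(positions)                # [1, 4]

# .iloc row slices exclude the stop position, like Python lists
print(len(df.iloc[:2]))         # 2
```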

---
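## 5. Columns of Specific Rows Meeting Logical Condition<a name="5"></a>
Select the rows that meet a logical condition, keeping only the specified columns: pass a boolean mask as the row selector of `.loc` and a list of names as the column selector.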

```python
df.loc[df['sepal_length'] > 5, ['sepal_length', 'sepal_width']]
```

| |sepal_length|sepal_width|
|---|---|---|
|0|5.1|3.5|
|5|5.4|3.9|
|10|5.4|3.7|
|14|5.8|4.0|
|15|5.7|4.4|
|...|...|...|
|145|6.7|3.0|
|146|6.3|2.5|
|147|6.5|3.0|
|148|6.2|3.4|

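Conditions can be combined with `&` (and), `|` (or), and negated with `~`; each comparison needs its own parentheses because these operators bind more tightly than `>` and `==`. A minimal sketch, assuming a four-row stand-in made of rows 0, 5, 145 and 148 as printed in the outputs above:

```python
import pandas as pd

# Stand-in: iris rows 0, 5, 145 and 148, keeping their original index labels
df = pd.DataFrame({
    'sepal_length': [5.1, 5.4, 6.7, 6.2],
    'sepal_width':  [3.5, 3.9, 3.0, 3.4],
    'species':      ['setosa', 'setosa', 'virginica', 'virginica'],
}, index=[0, 5, 145, 148])

# Both conditions must hold; parentheses around each comparison are required
mask = (df['sepal_length'] > 5) & (df['species'] == 'virginica')
result = df.loc[mask, ['sepal_length', 'sepal_width']]
print(result.index.tolist())              # [145, 148]

# ~ inverts the mask
print(df.loc[~mask, 'species'].tolist())  # ['setosa', 'setosa']
```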