# BigData, the Core of Industry 4.0

<div align='right'><font size=2 color='gray'>Data Processing Based Python @ <font color='blue'><a href='https://www.facebook.com/jskim.kr'>FB / jskim.kr</a></font>, [김진수](bigpycraft@gmail.com)</font></div>
<hr>

# Pandas Advanced 

> ### INDEX
> - Object Creation
> - Viewing Data
> - Selection
> - Missing Data
> - Operation

## 1. Object Creation

> - See the Data Structure section.
> - Passing a list of values creates a Series; pandas assigns an integer index by default.

In [1]:
import numpy as np
import pandas as pd

In [2]:
s = pd.Series([1, 3, 5, np.nan, 6, 8])
s

0    1.0
1    3.0
2    5.0
3    NaN
4    6.0
5    8.0
dtype: float64

In [3]:
dates = pd.date_range("20220601", periods=6)
dates

DatetimeIndex(['2022-06-01', '2022-06-02', '2022-06-03', '2022-06-04',
               '2022-06-05', '2022-06-06'],
              dtype='datetime64[ns]', freq='D')

In [4]:
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list("ABCD"))
df

Unnamed: 0,A,B,C,D
2022-06-01,-0.795554,-0.638544,-0.432497,-0.339563
2022-06-02,0.526383,-3.130227,-0.347309,1.203816
2022-06-03,-2.484921,-0.767825,-0.153918,0.858601
2022-06-04,0.918227,-0.280356,1.916277,-1.074678
2022-06-05,-0.90548,1.25357,0.254604,0.192972
2022-06-06,0.196703,0.285102,1.328384,-1.410239


In [5]:
df2 = pd.DataFrame(
    {
        "A": 1.0,
        "B": pd.Timestamp("20220601"),
        "C": pd.Series(1, index=list(range(4)), dtype="float32"),
        "D": np.array([3] * 4, dtype="int32"),
        "E": pd.Categorical(["JB_Bank", "KJ_Bank", "JB_Bank", "KJ_Bank"]),
        "F": "BigpyCraft",
    }
)
df2

Unnamed: 0,A,B,C,D,E,F
0,1.0,2022-06-01,1.0,3,JB_Bank,BigpyCraft
1,1.0,2022-06-01,1.0,3,KJ_Bank,BigpyCraft
2,1.0,2022-06-01,1.0,3,JB_Bank,BigpyCraft
3,1.0,2022-06-01,1.0,3,KJ_Bank,BigpyCraft


In [6]:
df2.dtypes

A           float64
B    datetime64[ns]
C           float32
D             int32
E          category
F            object
dtype: object

In [7]:
df2.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 4 entries, 0 to 3
Data columns (total 6 columns):
 #   Column  Non-Null Count  Dtype         
---  ------  --------------  -----         
 0   A       4 non-null      float64       
 1   B       4 non-null      datetime64[ns]
 2   C       4 non-null      float32       
 3   D       4 non-null      int32         
 4   E       4 non-null      category      
 5   F       4 non-null      object        
dtypes: category(1), datetime64[ns](1), float32(1), float64(1), int32(1), object(1)
memory usage: 288.0+ bytes


### Tab key: autocompletion
> df2.\<TAB\> 

```python
df2.<TAB>  # noqa: E225, E999
df2.A                  df2.bool
df2.abs                df2.boxplot
df2.add                df2.C
df2.add_prefix         df2.clip
df2.add_suffix         df2.columns
df2.align              df2.copy
df2.all                df2.count
df2.any                df2.combine
df2.append             df2.D
df2.apply              df2.describe
df2.applymap           df2.diff
df2.B                  df2.duplicated
```
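
Outside an interactive shell, a similar list can be inspected with Python's built-in `dir()`. This is a minimal sketch; the frame `demo` is a hypothetical stand-in, not the notebook's `df2`.

```python
import pandas as pd

# Hypothetical stand-in frame; its columns are reachable as attributes (demo.A).
demo = pd.DataFrame({"A": [1.0, 2.0], "B": ["x", "y"]})

# Public attribute names starting with "a" or "A", similar to what <TAB> offers.
names = [n for n in dir(demo) if n.lower().startswith("a") and not n.startswith("_")]
print(names[:10])
```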

## 2. Viewing Data

In [8]:
df.head()

Unnamed: 0,A,B,C,D
2022-06-01,-0.795554,-0.638544,-0.432497,-0.339563
2022-06-02,0.526383,-3.130227,-0.347309,1.203816
2022-06-03,-2.484921,-0.767825,-0.153918,0.858601
2022-06-04,0.918227,-0.280356,1.916277,-1.074678
2022-06-05,-0.90548,1.25357,0.254604,0.192972


In [9]:
df.tail()

Unnamed: 0,A,B,C,D
2022-06-02,0.526383,-3.130227,-0.347309,1.203816
2022-06-03,-2.484921,-0.767825,-0.153918,0.858601
2022-06-04,0.918227,-0.280356,1.916277,-1.074678
2022-06-05,-0.90548,1.25357,0.254604,0.192972
2022-06-06,0.196703,0.285102,1.328384,-1.410239


In [10]:
df.tail(3)

Unnamed: 0,A,B,C,D
2022-06-04,0.918227,-0.280356,1.916277,-1.074678
2022-06-05,-0.90548,1.25357,0.254604,0.192972
2022-06-06,0.196703,0.285102,1.328384,-1.410239


In [11]:
df.index

DatetimeIndex(['2022-06-01', '2022-06-02', '2022-06-03', '2022-06-04',
               '2022-06-05', '2022-06-06'],
              dtype='datetime64[ns]', freq='D')

In [12]:
df.columns

Index(['A', 'B', 'C', 'D'], dtype='object')

In [13]:
df.values

array([[-0.79555384, -0.63854385, -0.43249749, -0.33956308],
       [ 0.52638284, -3.13022737, -0.34730942,  1.20381557],
       [-2.48492146, -0.76782514, -0.15391762,  0.85860061],
       [ 0.91822694, -0.28035637,  1.91627677, -1.07467808],
       [-0.90548023,  1.2535703 ,  0.25460371,  0.19297158],
       [ 0.19670285,  0.28510249,  1.3283838 , -1.41023871]])

In [14]:
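# Note: without parentheses this only displays the bound method; call df.value_counts() to get counts.
df.value_counts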

<bound method DataFrame.value_counts of                    A         B         C         D
2022-06-01 -0.795554 -0.638544 -0.432497 -0.339563
2022-06-02  0.526383 -3.130227 -0.347309  1.203816
2022-06-03 -2.484921 -0.767825 -0.153918  0.858601
2022-06-04  0.918227 -0.280356  1.916277 -1.074678
2022-06-05 -0.905480  1.253570  0.254604  0.192972
2022-06-06  0.196703  0.285102  1.328384 -1.410239>
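
The output above is the repr of the method object, because `value_counts` was referenced without being called. The sketch below contrasts the two forms on a small hypothetical frame (`demo` is an assumption for illustration).

```python
import pandas as pd

# Hypothetical frame with one duplicated row.
demo = pd.DataFrame({"x": [1, 1, 2], "y": ["a", "a", "b"]})

method = demo.value_counts      # no call: only a reference to the method
counts = demo.value_counts()    # call: a Series counting each unique row

print(callable(method))
print(counts)
```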

In [15]:
df

Unnamed: 0,A,B,C,D
2022-06-01,-0.795554,-0.638544,-0.432497,-0.339563
2022-06-02,0.526383,-3.130227,-0.347309,1.203816
2022-06-03,-2.484921,-0.767825,-0.153918,0.858601
2022-06-04,0.918227,-0.280356,1.916277,-1.074678
2022-06-05,-0.90548,1.25357,0.254604,0.192972
2022-06-06,0.196703,0.285102,1.328384,-1.410239


- describe(): a quick statistical summary of the data

In [16]:
df.describe()

Unnamed: 0,A,B,C,D
count,6.0,6.0,6.0,6.0
mean,-0.424107,-0.54638,0.42759,-0.094849
std,1.241559,1.465316,0.973233,1.041982
min,-2.484921,-3.130227,-0.432497,-1.410239
25%,-0.877999,-0.735505,-0.298961,-0.890899
50%,-0.299425,-0.45945,0.050343,-0.073296
75%,0.443963,0.143738,1.059939,0.692193
max,0.918227,1.25357,1.916277,1.203816


In [17]:
# Transpose the data
df.T

Unnamed: 0,2022-06-01,2022-06-02,2022-06-03,2022-06-04,2022-06-05,2022-06-06
A,-0.795554,0.526383,-2.484921,0.918227,-0.90548,0.196703
B,-0.638544,-3.130227,-0.767825,-0.280356,1.25357,0.285102
C,-0.432497,-0.347309,-0.153918,1.916277,0.254604,1.328384
D,-0.339563,1.203816,0.858601,-1.074678,0.192972,-1.410239


In [18]:
df.sort_index(axis=1, ascending=False)

Unnamed: 0,D,C,B,A
2022-06-01,-0.339563,-0.432497,-0.638544,-0.795554
2022-06-02,1.203816,-0.347309,-3.130227,0.526383
2022-06-03,0.858601,-0.153918,-0.767825,-2.484921
2022-06-04,-1.074678,1.916277,-0.280356,0.918227
2022-06-05,0.192972,0.254604,1.25357,-0.90548
2022-06-06,-1.410239,1.328384,0.285102,0.196703


In [19]:
df.sort_values(by='B')

Unnamed: 0,A,B,C,D
2022-06-02,0.526383,-3.130227,-0.347309,1.203816
2022-06-03,-2.484921,-0.767825,-0.153918,0.858601
2022-06-01,-0.795554,-0.638544,-0.432497,-0.339563
2022-06-04,0.918227,-0.280356,1.916277,-1.074678
2022-06-06,0.196703,0.285102,1.328384,-1.410239
2022-06-05,-0.90548,1.25357,0.254604,0.192972
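
`sort_values` also accepts several columns and a per-column sort direction. A small sketch on made-up data (the frame `demo` is hypothetical):

```python
import pandas as pd

demo = pd.DataFrame({"B": [0.3, -1.2, 0.3], "C": [5, 7, 1]})

# Sort by B descending, breaking ties by C ascending.
ordered = demo.sort_values(by=["B", "C"], ascending=[False, True])
print(ordered)
```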


## 3. Selection

- Getting

In [20]:
df['A']

2022-06-01   -0.795554
2022-06-02    0.526383
2022-06-03   -2.484921
2022-06-04    0.918227
2022-06-05   -0.905480
2022-06-06    0.196703
Freq: D, Name: A, dtype: float64

In [21]:
df[1:4]

Unnamed: 0,A,B,C,D
2022-06-02,0.526383,-3.130227,-0.347309,1.203816
2022-06-03,-2.484921,-0.767825,-0.153918,0.858601
2022-06-04,0.918227,-0.280356,1.916277,-1.074678


In [22]:
df['20220601' :'20220605']

Unnamed: 0,A,B,C,D
2022-06-01,-0.795554,-0.638544,-0.432497,-0.339563
2022-06-02,0.526383,-3.130227,-0.347309,1.203816
2022-06-03,-2.484921,-0.767825,-0.153918,0.858601
2022-06-04,0.918227,-0.280356,1.916277,-1.074678
2022-06-05,-0.90548,1.25357,0.254604,0.192972
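
The two slices above behave differently at the end point: a positional slice excludes the stop position, while a label slice includes the stop label. A minimal sketch with a hypothetical frame `demo`:

```python
import numpy as np
import pandas as pd

idx = pd.date_range("20220601", periods=6)
demo = pd.DataFrame({"A": np.arange(6)}, index=idx)

by_position = demo[1:4]                   # rows 1, 2, 3 (position 4 excluded)
by_label = demo["20220601":"20220605"]    # June 1 through June 5, inclusive

print(len(by_position), len(by_label))
```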


- Selection by Label

In [23]:
df.loc[dates[0]]

A   -0.795554
B   -0.638544
C   -0.432497
D   -0.339563
Name: 2022-06-01 00:00:00, dtype: float64

In [24]:
df.loc[dates[0],'A']

-0.7955538361468891

In [25]:
df.loc['20220601',['A','B']]

A   -0.795554
B   -0.638544
Name: 2022-06-01 00:00:00, dtype: float64

In [26]:
df.loc[:,['A','B']]

Unnamed: 0,A,B
2022-06-01,-0.795554,-0.638544
2022-06-02,0.526383,-3.130227
2022-06-03,-2.484921,-0.767825
2022-06-04,0.918227,-0.280356
2022-06-05,-0.90548,1.25357
2022-06-06,0.196703,0.285102


In [27]:
df.loc['20220601':'20220603', ['A','B']]

Unnamed: 0,A,B
2022-06-01,-0.795554,-0.638544
2022-06-02,0.526383,-3.130227
2022-06-03,-2.484921,-0.767825


- Selection by Position

In [28]:
df.iloc[3]

A    0.918227
B   -0.280356
C    1.916277
D   -1.074678
Name: 2022-06-04 00:00:00, dtype: float64

In [29]:
df.iloc[[1,2,4],[0,2]]

Unnamed: 0,A,C
2022-06-02,0.526383,-0.347309
2022-06-03,-2.484921,-0.153918
2022-06-05,-0.90548,0.254604


In [30]:
df.iloc[1:3,1:3]

Unnamed: 0,B,C
2022-06-02,-3.130227,-0.347309
2022-06-03,-0.767825,-0.153918


In [31]:
df.iloc[1,1]

-3.1302273684679567

In [32]:
df.iat[1,1]     # Fast access to a single scalar value (same result as the iloc call above).

-3.1302273684679567

In [33]:
# ? df.iat

- Boolean Indexing

In [34]:
df

Unnamed: 0,A,B,C,D
2022-06-01,-0.795554,-0.638544,-0.432497,-0.339563
2022-06-02,0.526383,-3.130227,-0.347309,1.203816
2022-06-03,-2.484921,-0.767825,-0.153918,0.858601
2022-06-04,0.918227,-0.280356,1.916277,-1.074678
2022-06-05,-0.90548,1.25357,0.254604,0.192972
2022-06-06,0.196703,0.285102,1.328384,-1.410239


In [35]:
df[df.A > 0]

Unnamed: 0,A,B,C,D
2022-06-02,0.526383,-3.130227,-0.347309,1.203816
2022-06-04,0.918227,-0.280356,1.916277,-1.074678
2022-06-06,0.196703,0.285102,1.328384,-1.410239


In [36]:
df[df > 0]

Unnamed: 0,A,B,C,D
2022-06-01,,,,
2022-06-02,0.526383,,,1.203816
2022-06-03,,,,0.858601
2022-06-04,0.918227,,1.916277,
2022-06-05,,1.25357,0.254604,0.192972
2022-06-06,0.196703,0.285102,1.328384,
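
Indexing with a boolean frame keeps the shape and blanks out cells where the condition is false. `DataFrame.where` does the same and can also supply a replacement value. A sketch on hypothetical data:

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame({"A": [-1.0, 2.0], "B": [3.0, -4.0]})

masked = demo[demo > 0]            # non-positive cells become NaN
same = demo.where(demo > 0)        # equivalent spelling
filled = demo.where(demo > 0, 0)   # use 0 instead of NaN
```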


In [37]:
df2 = df.copy()
df2

Unnamed: 0,A,B,C,D
2022-06-01,-0.795554,-0.638544,-0.432497,-0.339563
2022-06-02,0.526383,-3.130227,-0.347309,1.203816
2022-06-03,-2.484921,-0.767825,-0.153918,0.858601
2022-06-04,0.918227,-0.280356,1.916277,-1.074678
2022-06-05,-0.90548,1.25357,0.254604,0.192972
2022-06-06,0.196703,0.285102,1.328384,-1.410239


In [38]:
df2['E'] = ['one', 'one', 'two', 'three', 'four', 'three']

In [39]:
df2

Unnamed: 0,A,B,C,D,E
2022-06-01,-0.795554,-0.638544,-0.432497,-0.339563,one
2022-06-02,0.526383,-3.130227,-0.347309,1.203816,one
2022-06-03,-2.484921,-0.767825,-0.153918,0.858601,two
2022-06-04,0.918227,-0.280356,1.916277,-1.074678,three
2022-06-05,-0.90548,1.25357,0.254604,0.192972,four
2022-06-06,0.196703,0.285102,1.328384,-1.410239,three


In [40]:
df2[df2['E'].isin(['two','four'])]

Unnamed: 0,A,B,C,D,E
2022-06-03,-2.484921,-0.767825,-0.153918,0.858601,two
2022-06-05,-0.90548,1.25357,0.254604,0.192972,four
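
The boolean mask returned by `isin` can be stored and negated with `~` to select the complement. A short sketch (the frame `demo` is hypothetical):

```python
import pandas as pd

demo = pd.DataFrame({"E": ["one", "two", "three", "four"]})

keep = demo["E"].isin(["two", "four"])
selected = demo[keep]     # rows whose E is in the list
excluded = demo[~keep]    # all remaining rows
```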


- Setting

In [41]:
# s1 = pd.Series([1,2,3,4,5,6], index=pd.date_range('20220601', periods=6))
s1 = pd.Series(range(1,7), index=pd.date_range('20220601', periods=6))
s1

2022-06-01    1
2022-06-02    2
2022-06-03    3
2022-06-04    4
2022-06-05    5
2022-06-06    6
Freq: D, dtype: int64

In [42]:
df['F'] = s1

In [43]:
df

Unnamed: 0,A,B,C,D,F
2022-06-01,-0.795554,-0.638544,-0.432497,-0.339563,1
2022-06-02,0.526383,-3.130227,-0.347309,1.203816,2
2022-06-03,-2.484921,-0.767825,-0.153918,0.858601,3
2022-06-04,0.918227,-0.280356,1.916277,-1.074678,4
2022-06-05,-0.90548,1.25357,0.254604,0.192972,5
2022-06-06,0.196703,0.285102,1.328384,-1.410239,6
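
Assigning a Series as a new column aligns it on the index labels rather than on position. The sketch below uses a hypothetical Series that covers only part of the index, in a different order:

```python
import pandas as pd

idx = pd.date_range("20220601", periods=3)
demo = pd.DataFrame({"A": [10, 20, 30]}, index=idx)

# The Series covers only the last two dates, listed in reverse order.
partial = pd.Series([300, 200], index=[idx[2], idx[1]])

demo["F"] = partial   # matched by label; the uncovered date gets NaN
print(demo)
```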


In [44]:
dates[0]

Timestamp('2022-06-01 00:00:00', freq='D')

In [45]:
# Set a value by label
df.at[dates[0],'A'] = 0
df

Unnamed: 0,A,B,C,D,F
2022-06-01,0.0,-0.638544,-0.432497,-0.339563,1
2022-06-02,0.526383,-3.130227,-0.347309,1.203816,2
2022-06-03,-2.484921,-0.767825,-0.153918,0.858601,3
2022-06-04,0.918227,-0.280356,1.916277,-1.074678,4
2022-06-05,-0.90548,1.25357,0.254604,0.192972,5
2022-06-06,0.196703,0.285102,1.328384,-1.410239,6


In [46]:
# Set a value by position
df.iat[0,1] = 0
df

Unnamed: 0,A,B,C,D,F
2022-06-01,0.0,0.0,-0.432497,-0.339563,1
2022-06-02,0.526383,-3.130227,-0.347309,1.203816,2
2022-06-03,-2.484921,-0.767825,-0.153918,0.858601,3
2022-06-04,0.918227,-0.280356,1.916277,-1.074678,4
2022-06-05,-0.90548,1.25357,0.254604,0.192972,5
2022-06-06,0.196703,0.285102,1.328384,-1.410239,6


In [47]:
# Set values by assigning a NumPy array
df.loc[:,'D'] = np.array([5] * len(df))
df

Unnamed: 0,A,B,C,D,F
2022-06-01,0.0,0.0,-0.432497,5,1
2022-06-02,0.526383,-3.130227,-0.347309,5,2
2022-06-03,-2.484921,-0.767825,-0.153918,5,3
2022-06-04,0.918227,-0.280356,1.916277,5,4
2022-06-05,-0.90548,1.25357,0.254604,5,5
2022-06-06,0.196703,0.285102,1.328384,5,6


In [48]:
# Set values with a where operation
df2 = df.copy()
df2

Unnamed: 0,A,B,C,D,F
2022-06-01,0.0,0.0,-0.432497,5,1
2022-06-02,0.526383,-3.130227,-0.347309,5,2
2022-06-03,-2.484921,-0.767825,-0.153918,5,3
2022-06-04,0.918227,-0.280356,1.916277,5,4
2022-06-05,-0.90548,1.25357,0.254604,5,5
2022-06-06,0.196703,0.285102,1.328384,5,6


In [49]:
df2[df2 > 0] = -df2
df2

Unnamed: 0,A,B,C,D,F
2022-06-01,0.0,0.0,-0.432497,-5,-1
2022-06-02,-0.526383,-3.130227,-0.347309,-5,-2
2022-06-03,-2.484921,-0.767825,-0.153918,-5,-3
2022-06-04,-0.918227,-0.280356,-1.916277,-5,-4
2022-06-05,-0.90548,-1.25357,-0.254604,-5,-5
2022-06-06,-0.196703,-0.285102,-1.328384,-5,-6
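
The assignment above flips the sign of every positive cell, so every value ends up non-positive. For an all-numeric frame the result should match negating the absolute values. A sketch on hypothetical data, offered as an assumed equivalent rather than the notebook's method:

```python
import pandas as pd

demo = pd.DataFrame({"A": [1.5, -2.0], "D": [5, 5]})

flipped = demo.copy()
flipped[flipped > 0] = -flipped   # same pattern as the cell above

# Assumed equivalent for numeric data: negate the absolute value of each cell.
alt = -demo.abs()
```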


## 4. Missing Data

> - pandas primarily uses np.nan to represent missing data. <br/>By default, missing values are excluded from computations. <br/>See the Missing Data section.
> - Reindexing lets you change, add, or delete the index on a specified axis. <br/>Reindexing returns a copy of the data.
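
A minimal sketch of both points, using a hypothetical frame `demo`: reindexing can drop and add labels in one call, new cells are filled with NaN, and the original object is left unchanged.

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame({"A": [1.0, 2.0, 3.0]}, index=["a", "b", "c"])

# Drop row "c", add row "z" and column "E"; new cells are filled with NaN.
reshaped = demo.reindex(index=["a", "b", "z"], columns=["A", "E"])

print(reshaped)
print(demo.shape)   # the original frame keeps its shape
```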

In [50]:
df

Unnamed: 0,A,B,C,D,F
2022-06-01,0.0,0.0,-0.432497,5,1
2022-06-02,0.526383,-3.130227,-0.347309,5,2
2022-06-03,-2.484921,-0.767825,-0.153918,5,3
2022-06-04,0.918227,-0.280356,1.916277,5,4
2022-06-05,-0.90548,1.25357,0.254604,5,5
2022-06-06,0.196703,0.285102,1.328384,5,6


In [51]:
columns = list(df.columns) + ['E']
columns

['A', 'B', 'C', 'D', 'F', 'E']

In [52]:
df1 = df.reindex(index=dates[0:4], columns=columns)
df1

Unnamed: 0,A,B,C,D,F,E
2022-06-01,0.0,0.0,-0.432497,5,1,
2022-06-02,0.526383,-3.130227,-0.347309,5,2,
2022-06-03,-2.484921,-0.767825,-0.153918,5,3,
2022-06-04,0.918227,-0.280356,1.916277,5,4,


In [53]:
df1.loc[dates[0]:dates[1],'E'] = 1
df1

Unnamed: 0,A,B,C,D,F,E
2022-06-01,0.0,0.0,-0.432497,5,1,1.0
2022-06-02,0.526383,-3.130227,-0.347309,5,2,1.0
2022-06-03,-2.484921,-0.767825,-0.153918,5,3,
2022-06-04,0.918227,-0.280356,1.916277,5,4,


- Handling rows with missing data: Drop / Fill

In [54]:
df1.dropna(how='any')

Unnamed: 0,A,B,C,D,F,E
2022-06-01,0.0,0.0,-0.432497,5,1,1.0
2022-06-02,0.526383,-3.130227,-0.347309,5,2,1.0


In [55]:
df1.fillna(value=5)

Unnamed: 0,A,B,C,D,F,E
2022-06-01,0.0,0.0,-0.432497,5,1,1.0
2022-06-02,0.526383,-3.130227,-0.347309,5,2,1.0
2022-06-03,-2.484921,-0.767825,-0.153918,5,3,5.0
2022-06-04,0.918227,-0.280356,1.916277,5,4,5.0


In [56]:
pd.isna(df1)

Unnamed: 0,A,B,C,D,F,E
2022-06-01,False,False,False,False,False,False
2022-06-02,False,False,False,False,False,False
2022-06-03,False,False,False,False,False,True
2022-06-04,False,False,False,False,False,True


## 5. Operation

> Stats
> - Operations generally exclude missing data.
> - The cells below compute descriptive statistics.
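
The exclusion of missing values is controlled by the `skipna` parameter of the reduction methods. A short sketch with a hypothetical Series:

```python
import numpy as np
import pandas as pd

values = pd.Series([1.0, np.nan, 3.0])

default_mean = values.mean()              # NaN skipped: (1 + 3) / 2
strict_mean = values.mean(skipna=False)   # NaN propagates to the result
```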

In [57]:
df

Unnamed: 0,A,B,C,D,F
2022-06-01,0.0,0.0,-0.432497,5,1
2022-06-02,0.526383,-3.130227,-0.347309,5,2
2022-06-03,-2.484921,-0.767825,-0.153918,5,3
2022-06-04,0.918227,-0.280356,1.916277,5,4
2022-06-05,-0.90548,1.25357,0.254604,5,5
2022-06-06,0.196703,0.285102,1.328384,5,6


In [58]:
df.mean()

A   -0.291515
B   -0.439956
C    0.427590
D    5.000000
F    3.500000
dtype: float64

In [59]:
df.mean(1)

2022-06-01    1.113501
2022-06-02    0.809769
2022-06-03    0.918667
2022-06-04    2.310829
2022-06-05    2.120539
2022-06-06    2.562038
Freq: D, dtype: float64

In [60]:
s = pd.Series([1,3,5,np.nan,6,8], index=dates).shift(2)
s

2022-06-01    NaN
2022-06-02    NaN
2022-06-03    1.0
2022-06-04    3.0
2022-06-05    5.0
2022-06-06    NaN
Freq: D, dtype: float64

In [61]:
df.sub(s, axis='index')

Unnamed: 0,A,B,C,D,F
2022-06-01,,,,,
2022-06-02,,,,,
2022-06-03,-3.484921,-1.767825,-1.153918,4.0,2.0
2022-06-04,-2.081773,-3.280356,-1.083723,2.0,1.0
2022-06-05,-5.90548,-3.74643,-4.745396,0.0,0.0
2022-06-06,,,,,
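
With `axis='index'`, the Series is matched against the row labels and its value is subtracted from every column of that row. A NaN in the Series turns the whole row into NaN. A sketch on hypothetical data:

```python
import numpy as np
import pandas as pd

idx = pd.date_range("20220601", periods=3)
demo = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [10.0, 20.0, 30.0]}, index=idx)
offsets = pd.Series([1.0, np.nan, 3.0], index=idx)

# Row labels are aligned with the Series labels before subtracting.
result = demo.sub(offsets, axis="index")
print(result)
```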


> Apply
> - Applies a function to the data.

In [62]:
df

Unnamed: 0,A,B,C,D,F
2022-06-01,0.0,0.0,-0.432497,5,1
2022-06-02,0.526383,-3.130227,-0.347309,5,2
2022-06-03,-2.484921,-0.767825,-0.153918,5,3
2022-06-04,0.918227,-0.280356,1.916277,5,4
2022-06-05,-0.90548,1.25357,0.254604,5,5
2022-06-06,0.196703,0.285102,1.328384,5,6


In [63]:
df.apply(np.cumsum)

Unnamed: 0,A,B,C,D,F
2022-06-01,0.0,0.0,-0.432497,5,1
2022-06-02,0.526383,-3.130227,-0.779807,10,3
2022-06-03,-1.958539,-3.898053,-0.933725,15,6
2022-06-04,-1.040312,-4.178409,0.982552,20,10
2022-06-05,-1.945792,-2.924839,1.237156,25,15
2022-06-06,-1.749089,-2.639736,2.56554,30,21


In [64]:
df.apply(lambda x: x.max() - x.min())

A    3.403148
B    4.383798
C    2.348774
D    0.000000
F    5.000000
dtype: float64
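
By default `apply` passes each column to the function. With `axis=1` it passes each row instead. A sketch on a hypothetical frame:

```python
import pandas as pd

demo = pd.DataFrame({"A": [1, 4], "B": [3, 10]})

col_range = demo.apply(lambda x: x.max() - x.min())           # per column (default axis=0)
row_range = demo.apply(lambda x: x.max() - x.min(), axis=1)   # per row
```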

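> Histogramming
> - A histogram is a graphical representation of a frequency table.
> - That is, it presents a tabulated frequency distribution as a chart; value_counts() computes the frequency table itself.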

In [65]:
s = pd.Series(np.random.randint(1, 7, size=10))
s

0    1
1    2
2    5
3    5
4    1
5    5
6    4
7    6
8    4
9    5
dtype: int32

In [66]:
s.value_counts()

5    4
1    2
4    2
2    1
6    1
dtype: int64
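
`value_counts` orders the result by frequency. Sorting by the index gives the order a histogram axis would use, and `normalize=True` returns relative frequencies. The sketch below hard-codes the ten values displayed above so that it does not depend on the random draw:

```python
import pandas as pd

rolls = pd.Series([1, 2, 5, 5, 1, 5, 4, 6, 4, 5])

by_value = rolls.value_counts().sort_index()   # ordered by value, not by count
shares = rolls.value_counts(normalize=True)    # fractions that sum to 1
```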

> String Methods
> - Series comes with a set of string processing methods in its `str` attribute, which make it easy to operate on each element of the array, as in the code below.
> - Note that pattern matching in `str` generally uses regular expressions by default, and in some cases always uses them.
> - See the Vectorized String Methods section for more details.
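
The regular-expression default matters when a pattern contains metacharacters. A minimal sketch with `str.contains`, using a hypothetical Series `words`:

```python
import pandas as pd

words = pd.Series(["a.b", "axb"])

as_regex = words.str.contains("a.b")                  # "." matches any character
as_literal = words.str.contains("a.b", regex=False)   # "." must be a literal dot
```

Passing `regex=False` is one way to match the pattern as plain text.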

In [67]:
s = pd.Series(['A', 'B', 'C', 'Aaba', 'Baca', np.nan, 'CABA', 'dog', 'cat'])
s

0       A
1       B
2       C
3    Aaba
4    Baca
5     NaN
6    CABA
7     dog
8     cat
dtype: object

In [68]:
s.str.lower()

0       a
1       b
2       c
3    aaba
4    baca
5     NaN
6    caba
7     dog
8     cat
dtype: object

In [69]:
s.str.upper()

0       A
1       B
2       C
3    AABA
4    BACA
5     NaN
6    CABA
7     DOG
8     CAT
dtype: object

In [70]:
s.str.swapcase()

0       a
1       b
2       c
3    aABA
4    bACA
5     NaN
6    caba
7     DOG
8     CAT
dtype: object

<hr>
<marquee><font size=3 color='brown'>The BigpyCraft find the information to design valuable society with Technology & Craft.</font></marquee>
<div align='right'><font size=2 color='gray'> &lt; The End &gt; </font></div>