# <a id='toc1_'></a>[Data Transformation](#toc0_)
---

**Table of contents**<a id='toc0_'></a>    
- [Data Transformation](#toc1_)    
    - [`astype`](#toc1_1_1_)    
  - [Converting Date Data](#toc1_2_)    
    - [`datetime.strptime`](#toc1_2_1_)    
    - [`total_seconds`](#toc1_2_2_)    
  - [Rescaling Data Ranges](#toc1_3_)    
    - [Normalization](#toc1_3_1_)    
      - [Min-Max Normalization](#toc1_3_1_1_)    
    - [Standardization](#toc1_3_2_)    
      - [Z-Score](#toc1_3_2_1_)    

<!-- vscode-jupyter-toc-config
	numbering=false
	anchor=true
	flat=false
	minLevel=1
	maxLevel=6
	/vscode-jupyter-toc-config -->
<!-- THIS CELL WILL BE REPLACED ON TOC UPDATE. DO NOT WRITE YOUR TEXT IN THIS CELL -->

---

### <a id='toc1_1_1_'></a>[`astype`](#toc0_)

In [1]:
import pandas as pd

df = pd.DataFrame({'Class': [1, 2, 3, 1, 2, 1],
                    'Age': [1, 3.2, 11.8, 33.2, 42.9, 33.2],
                    'Part': [3, 7, 2, 1, 3, 5]})

print(df.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6 entries, 0 to 5
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   Class   6 non-null      int64  
 1   Age     6 non-null      float64
 2   Part    6 non-null      int64  
dtypes: float64(1), int64(2)
memory usage: 276.0 bytes
None


In [2]:
df['Class'] = df['Class'].astype('category')    # convert to categorical
df['Age'] = df['Age'].astype(int)      # convert to int (truncates the fractional part)
df['Part'] = df['Part'].astype(float)   # convert to float

print(df.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6 entries, 0 to 5
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype   
---  ------  --------------  -----   
 0   Class   6 non-null      category
 1   Age     6 non-null      int32   
 2   Part    6 non-null      float64 
dtypes: category(1), float64(1), int32(1)
memory usage: 342.0 bytes
None


In [3]:
print(df)

  Class  Age  Part
0     1    1   3.0
1     2    3   7.0
2     3   11   2.0
3     1   33   1.0
4     2   42   3.0
5     1   33   5.0
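
The output above shows that `astype(int)` drops the fractional part (11.8 becomes 11, 42.9 becomes 42). If rounding to the nearest integer is what you want, one option is to call `round()` first. The sketch below is illustrative only; the `ages` Series simply reuses the `Age` values from the example.

```python
import pandas as pd

# Same values as the 'Age' column in the example above
ages = pd.Series([1, 3.2, 11.8, 33.2, 42.9, 33.2])

truncated = ages.astype(int)          # drops the fractional part: 11.8 -> 11
rounded = ages.round().astype(int)    # rounds to the nearest integer first: 11.8 -> 12

print(truncated.tolist())   # [1, 3, 11, 33, 42, 33]
print(rounded.tolist())     # [1, 3, 12, 33, 43, 33]
```

Which behavior is appropriate depends on the data; for ages, truncation is often the intended meaning.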


## <a id='toc1_2_'></a>[Converting Date Data](#toc0_)

### <a id='toc1_2_1_'></a>[`datetime.strptime`](#toc0_)

In [4]:
from datetime import datetime

a = datetime.strptime("1999 March 1", "%Y %B %d")
y = a.year
m = a.month
d = a.day

print(f"{y} {m} {d}")

1999 3 1


In [5]:
b = datetime.strptime("3-1 1999", "%m-%d %Y")
y = b.year
m = b.month
d = b.day

print(f"{y} {m} {d}")

1999 3 1
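
`datetime.strptime` raises `ValueError` when the string does not match the format, so real input usually needs a guard. The following is a minimal sketch; `parse_date` is a hypothetical helper name introduced here for illustration, not part of the standard library. It also shows `strftime`, which goes the other way (datetime to string).

```python
from datetime import datetime

def parse_date(text, fmt):
    """Return a datetime, or None if `text` does not match `fmt`.

    Hypothetical helper for illustration only.
    """
    try:
        return datetime.strptime(text, fmt)
    except ValueError:
        return None

parsed = parse_date("1999 March 1", "%Y %B %d")
print(parsed.strftime("%Y-%m-%d"))           # 1999-03-01

# The string from In [5] does not match this format, so None is returned
print(parse_date("3-1 1999", "%Y %B %d"))    # None
```

Note that `%B` matches the full month name in the current locale, so "March" is assumed to be parsed under an English locale.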


### <a id='toc1_2_2_'></a>[`total_seconds`](#toc0_)

In [6]:
from datetime import datetime

x = datetime(1999, 3, 1, 12, 1, 0)
y = datetime(1999, 3, 1, 12, 2, 0)
z = (x - y).total_seconds()   # total elapsed seconds (negative because x is earlier than y)

print(z)

-60.0
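
A common source of confusion is the difference between `total_seconds()` and the `seconds` attribute of a `timedelta`. The sketch below uses made-up timestamps (`start`, `end`) plus the same one-minute difference as the cell above to show how the two differ.

```python
from datetime import datetime

start = datetime(1999, 3, 1, 12, 0, 0)
end = datetime(1999, 3, 2, 12, 1, 0)    # one day and one minute later

delta = end - start
print(delta.days)             # 1
print(delta.seconds)          # 60       (only the part left over after whole days)
print(delta.total_seconds())  # 86460.0  (the whole duration in seconds)

# Negative durations are stored as negative days plus positive seconds
back = datetime(1999, 3, 1, 12, 1, 0) - datetime(1999, 3, 1, 12, 2, 0)
print(back.days, back.seconds)   # -1 86340
print(back.total_seconds())      # -60.0
```

For elapsed-time calculations, `total_seconds()` is usually the value you want.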


## <a id='toc1_3_'></a>[Rescaling Data Ranges](#toc0_)

### <a id='toc1_3_1_'></a>[Normalization](#toc0_)

#### <a id='toc1_3_1_1_'></a>[Min-Max Normalization](#toc0_)

$$X_{\text{scaled}} = \frac{X - X_{\min}}{X_{\max} - X_{\min}}$$

In [7]:
import numpy as np
from sklearn.preprocessing import MinMaxScaler

data = np.array([1, 3, 5, 7, 9])
x = data.reshape(-1, 1)
scaled_data = MinMaxScaler().fit_transform(x)

print(scaled_data.flatten())    # flatten to 1-D before printing

[0.   0.25 0.5  0.75 1.  ]


In [8]:
import numpy as np

data = np.array([1, 3, 5, 7, 9])

def normalize(arr):
    return (arr - np.min(arr)) / (np.max(arr) - np.min(arr))

print(normalize(data))

[0.   0.25 0.5  0.75 1.  ]
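
The hand-written `normalize` above divides by zero when every value is the same. The following sketch adds a guard for that case and an optional target range. `normalize_safe`, `new_min`, and `new_max` are hypothetical names introduced here, and mapping a constant array to `new_min` is an arbitrary choice, not a rule.

```python
import numpy as np

def normalize_safe(arr, new_min=0.0, new_max=1.0):
    """Min-max scale `arr` into [new_min, new_max] (illustrative helper)."""
    arr = np.asarray(arr, dtype=float)
    span = arr.max() - arr.min()
    if span == 0:
        # All values identical: the formula would divide by zero,
        # so map everything to new_min (an arbitrary choice).
        return np.full_like(arr, new_min)
    scaled = (arr - arr.min()) / span           # values now lie in [0, 1]
    return scaled * (new_max - new_min) + new_min

print(normalize_safe([1, 3, 5, 7, 9]).tolist())         # [0.0, 0.25, 0.5, 0.75, 1.0]
print(normalize_safe([1, 3, 5, 7, 9], -1, 1).tolist())  # [-1.0, -0.5, 0.0, 0.5, 1.0]
print(normalize_safe([4, 4, 4]).tolist())               # [0.0, 0.0, 0.0]
```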


### <a id='toc1_3_2_'></a>[Standardization](#toc0_)

#### <a id='toc1_3_2_1_'></a>[Z-Score](#toc0_)
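
For reference, the z-score computed in the cells of this section can be written as follows, where $\bar{X}$ is the mean and $s$ is the sample standard deviation (the code uses `ddof=1`, i.e. the $n - 1$ denominator). The symbols are chosen here for illustration.

$$Z = \frac{X - \bar{X}}{s}, \qquad s = \sqrt{\frac{1}{n - 1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2}$$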

In [9]:
import numpy as np

data = np.array([1, 3, 5, 7, 9])

def standardize(a):
    return (a - np.mean(a)) / np.std(a, ddof=1)   # ddof=1: sample standard deviation (divides by n - 1)

data_zscore = standardize(data)

print(np.mean(data_zscore))

4.4408920985006264e-17


In [10]:
print(np.std(data_zscore, ddof=1))

1.0


In [11]:
print(data_zscore)

[-1.26491106 -0.63245553  0.          0.63245553  1.26491106]
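
The choice of `ddof` changes the result. The sketch below compares the sample form (`ddof=1`, used above) with the population form (`ddof=0`), which is NumPy's default and, per its documentation, the form scikit-learn's `StandardScaler` uses. The variable names `z_sample` and `z_population` are introduced here for illustration.

```python
import numpy as np

data = np.array([1, 3, 5, 7, 9])

# Sample standard deviation (n - 1 denominator), as in the cells above
z_sample = (data - data.mean()) / data.std(ddof=1)

# Population standard deviation (n denominator), NumPy's default
z_population = (data - data.mean()) / data.std(ddof=0)

print(data.std(ddof=1))   # about 3.1623 (square root of 10)
print(data.std(ddof=0))   # about 2.8284 (square root of 8)
print(z_sample)
print(z_population)
```

With only five values the two versions differ noticeably; the gap shrinks as the sample size grows.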
