# Review and Summary

> - 1. ColumnTransformer
> - 2. One Hot Encoding
> - 3. Multiple linear regression
> - 4. Various evaluation metrics (regression models)


### 1. ColumnTransformer



Definition: a class that combines the outputs of multiple transformer objects, each applied to a subset of the data's columns, into a single feature space.

<br>
<br>

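> `make_column_transformer`, the helper function for building a ColumnTransformer: <br>
> - is shorthand for the ColumnTransformer constructor <br>
> - does not require, and does not permit, naming the transformers (names are generated automatically from their types) <br>
> - does not allow weighting with transformer_weights <br>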
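
The shorthand described above corresponds to scikit-learn's `make_column_transformer`. A minimal sketch, assuming the same two `Normalizer` steps and the same toy array used later in these notes (the variable names `ct_auto`, `X_out`, and `auto_names` are only illustrative):

```python
import numpy as np
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import Normalizer

# Each tuple is (transformer, columns); no name is given.
# Names are generated automatically from the transformer class names.
ct_auto = make_column_transformer(
    (Normalizer(norm='l1'), [0, 1]),
    (Normalizer(norm='l1'), slice(2, 4)),
)

X = np.array([[0., 1., 2., 2.],
              [1., 1., 0., 1.]])
X_out = ct_auto.fit_transform(X)

# Inspect the automatically generated names.
auto_names = [name for name, _, _ in ct_auto.transformers]
print(auto_names)
print(X_out)
```

The transformed values should match what the explicit `ColumnTransformer` constructor produces for the same steps; only the naming differs.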

<br>
<br>

### parameters
> 1. transformers : list of (name, transformer, columns) tuples <br>
> - name : str <br>
> - transformer : {'drop', 'passthrough'} or estimator <br>
> - columns : str, array-like of str, int, array-like of int, slice, array-like of bool or callable <br>
> 2. remainder : {'drop', 'passthrough'} or estimator, default='drop' <br>
> 3. sparse_threshold : float, default=0.3 <br>
> 4. n_jobs : int, default=None <br>
> 5. transformer_weights : dict, default=None <br>
> 6. verbose : bool, default=False <br>
> 7. verbose_feature_names_out : bool, default=True <br>
> Returns a ColumnTransformer object.
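
To make the `remainder` parameter concrete, here is a small sketch comparing the default `'drop'` with `'passthrough'`. The array is the same toy data used in these notes; `ct_drop`, `ct_pass`, `out_drop`, and `out_pass` are illustrative names only.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import Normalizer

X = np.array([[0., 1., 2., 2.],
              [1., 1., 0., 1.]])

# Default remainder='drop': columns 2 and 3 are not listed, so they are discarded.
ct_drop = ColumnTransformer([("norm1", Normalizer(norm='l1'), [0, 1])])
out_drop = ct_drop.fit_transform(X)

# remainder='passthrough': the unlisted columns are appended unchanged.
ct_pass = ColumnTransformer([("norm1", Normalizer(norm='l1'), [0, 1])],
                            remainder='passthrough')
out_pass = ct_pass.fit_transform(X)

print(out_drop.shape)
print(out_pass.shape)
print(out_pass)
```

Passed-through columns are placed after the transformed ones, so the output column order can differ from the input order when the listed columns are not at the front.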

<br>
<br>

### attributes
> 1. transformers_ : list<br>
> 2. sparse_output_ : bool<br>
> 3. output_indices_ : dict<br>
> 4. n_features_in_ : int<br>

In [36]:
'''
https://scikit-learn.org/stable/modules/generated/sklearn.compose.ColumnTransformer.html#sklearn.compose.ColumnTransformer
'''

import numpy as np
from sklearn.compose import ColumnTransformer
# Normalizer: study this further
from sklearn.preprocessing import Normalizer


# slice(): slicing written as a function call; slice(2, 4) selects columns 2 and 3

ct = ColumnTransformer([("norm1", Normalizer(norm='l1'), [0, 1]),
                       ("norm2", Normalizer(norm='l1'), slice(2, 4))])

X = np.array([[0., 1., 2., 2.],
             [1., 1., 0., 1.,]])
print(X)



[[0. 1. 2. 2.]
 [1. 1. 0. 1.]]


In [37]:
ct.fit_transform(X)

array([[0. , 1. , 0.5, 0.5],
       [0.5, 0.5, 0. , 1. ]])

In [4]:
ct.transformers_

[('norm1', Normalizer(norm='l1'), [0, 1]),
 ('norm2', Normalizer(norm='l1'), slice(2, 4, None))]

In [5]:
ct.sparse_output_

False

In [6]:
ct.output_indices_

{'norm1': slice(0, 2, None),
 'norm2': slice(2, 4, None),
 'remainder': slice(0, 0, None)}

In [7]:
ct.n_features_in_

4

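- Background: the more variables a dataset has, the more varied the preprocessing becomes, and splitting and re-merging the original dataset across several cells to apply each method is cumbersome.
  <br>

- Usage: it takes a list of tuples as its argument; each tuple pairs one encoding or normalization method with the columns it should be applied to.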
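
The usage described above can be sketched on a small mixed-type table. The DataFrame, its column names (`city`, `age`, `income`), and the step names are hypothetical and exist only for illustration; `sparse_threshold=0` is used so the result is always a dense array.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical data: one categorical column and two numeric columns.
df = pd.DataFrame({
    "city": ["Seoul", "Busan", "Seoul", "Daegu"],
    "age": [25, 32, 47, 51],
    "income": [3000.0, 4200.0, 5100.0, 3900.0],
})

# One tuple per preprocessing method: (name, transformer, columns).
preprocess = ColumnTransformer(
    [("onehot", OneHotEncoder(), ["city"]),
     ("scale", StandardScaler(), ["age", "income"])],
    sparse_threshold=0,  # always return a dense array
)

# A single call replaces the manual split / transform / concatenate steps.
features = preprocess.fit_transform(df)
print(features.shape)
```

The three one-hot columns for `city` come first, followed by the two scaled numeric columns, in the order the tuples were listed.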

### 2. One Hot Encoding

In [None]:
class sklearn.preprocessing.OneHotEncoder(*, categories='auto', drop=None, sparse=True, dtype=<class 'numpy.float64'>, handle_unknown='error', min_frequency=None, max_categories=None)
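
A minimal sketch of how this encoder behaves, assuming a hypothetical single-column array of color labels. The `sparse` argument shown in the signature above was renamed `sparse_output` in later scikit-learn releases, so the sketch avoids it and converts the default sparse result with `.toarray()` instead.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical categorical column with three distinct values.
colors = np.array([["red"], ["green"], ["blue"], ["green"]])

# handle_unknown='ignore' encodes unseen categories as all zeros
# instead of raising an error (the default is 'error').
encoder = OneHotEncoder(handle_unknown='ignore')
encoded = encoder.fit_transform(colors).toarray()

# Categories are learned from the data and stored in sorted order.
print(encoder.categories_)
print(encoded)

# A category that was not seen during fit.
unseen = encoder.transform(np.array([["purple"]])).toarray()
print(unseen)
```

Each row of `encoded` has exactly one 1, in the column of its category; the unseen value yields a row of zeros because of `handle_unknown='ignore'`.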