# Help

## OneHotEncoder

[scikit-learn: OneHotEncoder](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html)

[pandas: categorical data](https://pandas.pydata.org/docs/user_guide/categorical.html#controlling-behavior)

[Towards Data Science: one-hot encoding](https://towardsdatascience.com/the-best-methods-for-one-hot-encoding-your-data-c29c78a153fd)

In [7]:
import pandas as pd
import numpy as np
from sklearn.preprocessing import OneHotEncoder

In [8]:
# np.array coerces every element to a string here, so np.nan becomes the string 'nan'
df = pd.DataFrame(np.array([[1, 'A', 'X'], [2, np.nan, 'Y'], [3, 'B', 'X']]),
                  columns=['id', 'c_1', 'c_2'])
df

|   | id | c_1 | c_2 |
|---|----|-----|-----|
| 0 | 1  | A   | X   |
| 1 | 2  | nan | Y   |
| 2 | 3  | B   | X   |


In [9]:
X = df[['c_1', 'c_2']].to_numpy()
X

array([['A', 'X'],
       ['nan', 'Y'],
       ['B', 'X']], dtype=object)
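The `'nan'` entry above is a string, not a missing value. A minimal sketch of why, assuming only NumPy and pandas: a mixed list passed to `np.array` is coerced to a single string dtype, whereas building the frame column by column keeps `np.nan` as a real missing value. The name `df_typed` is introduced here for illustration only.

```python
import numpy as np
import pandas as pd

# Mixed ints, strings and a float in one array: NumPy falls back to a string dtype
mixed = np.array([[1, 'A', 'X'], [2, np.nan, 'Y'], [3, 'B', 'X']])
print(mixed.dtype.kind)    # 'U' means unicode strings
print(repr(mixed[1, 1]))   # the former np.nan, now text

# Building the frame from per-column lists keeps np.nan as a missing value
df_typed = pd.DataFrame({
    'id': [1, 2, 3],
    'c_1': ['A', np.nan, 'B'],
    'c_2': ['X', 'Y', 'X'],
})
missing_mask = df_typed['c_1'].isna().tolist()
print(missing_mask)
```

With the string version, the encoder sees `'nan'` as just another category label, which is why it matters whether that label appears in the `categories` list.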

In [15]:
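# Fix the categories per column up front; 'C' never occurs in X but still gets a column
unique_values = [['A', 'B', 'C'], ['X', 'Y']]
# handle_unknown='ignore': values outside the lists (here the string 'nan') encode as all zeros
enc = OneHotEncoder(categories=unique_values, handle_unknown='ignore')
enc.fit(X)
X_dummy = enc.transform(X)  # one-hot encoded feature matrix (sparse)
X_dummy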


<3x5 sparse matrix of type '<class 'numpy.float64'>'
	with 5 stored elements in Compressed Sparse Row format>

In [16]:
print(X_dummy)
print(X_dummy.toarray())

  (0, 0)	1.0
  (0, 3)	1.0
  (1, 4)	1.0
  (2, 1)	1.0
  (2, 3)	1.0
[[1. 0. 0. 1. 0.]
 [0. 0. 0. 0. 1.]
 [0. 1. 0. 1. 0.]]
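To read the dense output, it helps to label each of the five columns. A small sketch, assuming the same data and categories as above: the fitted encoder's `categories_` attribute lists the categories per input column in output order. The prefixes `c_1` and `c_2` and the variable `column_labels` are chosen here for illustration.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

X = np.array([['A', 'X'], ['nan', 'Y'], ['B', 'X']], dtype=object)
unique_values = [['A', 'B', 'C'], ['X', 'Y']]

enc = OneHotEncoder(categories=unique_values, handle_unknown='ignore')
dense = enc.fit_transform(X).toarray()

# One label per output column: input column name + category value
feature_names = ['c_1', 'c_2']
column_labels = [
    f"{name}_{cat}"
    for name, cats in zip(feature_names, enc.categories_)
    for cat in cats
]

print(column_labels)
for row in dense:
    # Show which labelled columns are switched on in each row
    print([label for label, value in zip(column_labels, row) if value == 1.0])
```

The second row only activates `c_2_Y`: the string `'nan'` is not in the `c_1` category list, so all three `c_1` columns stay at zero.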


In [20]:
X_test = np.array([['A', 'Y']])
enc.transform(X_test).toarray()

array([[1., 0., 0., 0., 1.]])
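The test row above only uses known categories. As a sketch of what happens otherwise, the example below transforms a row whose first value, `'D'`, is in neither the training data nor the category list, and compares `handle_unknown='ignore'` with `handle_unknown='error'`. The training array and the names `lenient` and `strict` are made up for this illustration.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

categories = [['A', 'B', 'C'], ['X', 'Y']]
X_train = np.array([['A', 'X'], ['B', 'Y']], dtype=object)
unseen = np.array([['D', 'Y']], dtype=object)  # 'D' is not a listed category

# 'ignore': the unknown value encodes as all zeros for its column group
lenient = OneHotEncoder(categories=categories, handle_unknown='ignore').fit(X_train)
lenient_row = lenient.transform(unseen).toarray()
print(lenient_row)

# 'error': the same input is rejected at transform time
strict = OneHotEncoder(categories=categories, handle_unknown='error').fit(X_train)
try:
    strict.transform(unseen)
    strict_raised = False
except ValueError as exc:
    strict_raised = True
    print("rejected:", exc)
```

Silently encoding unknowns as zeros is convenient for prediction pipelines, but it can hide data-quality problems, so which mode fits depends on the use case.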

In [1]:
[['A', 'B'] for i in range(2)] + [['C', 'D'] for i in range(2)]

[['A', 'B'], ['A', 'B'], ['C', 'D'], ['C', 'D']]
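If this pattern is meant for building a `categories` argument with the same list repeated for several columns (an assumption about the intent of the cell above), the comprehension has one advantage over list multiplication: each inner list is a separate object. A short pure-Python sketch:

```python
# List multiplication repeats a reference to the same inner list
shared = [['A', 'B']] * 2
# A comprehension builds a fresh inner list on every iteration
independent = [['A', 'B'] for _ in range(2)]

shared[0].append('C')
independent[0].append('C')

print(shared)       # both entries changed, they are one object
print(independent)  # only the first entry changed
```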

In [10]:
X_dummy = enc.fit_transform(X)  # fit and encode in one step; X_dummy is the one-hot encoded matrix

## Adding a column to a sparse matrix

In [24]:
X_id = [[1], [2], [3]]
X_id

[[1], [2], [3]]

In [22]:

# Prepend a column to a SciPy sparse matrix with scipy.sparse.hstack
# https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.hstack.html
from scipy.sparse import coo_matrix, hstack
X_id = [[1], [2], [3]]
X_id = coo_matrix(X_id)  # wrap the id column as a sparse 3x1 matrix
hstack([X_id, X_dummy]).toarray()  # ids first, then the 5 one-hot columns


array([[1., 1., 0., 0., 1., 0.],
       [2., 0., 0., 0., 0., 1.],
       [3., 0., 1., 0., 1., 0.]])
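Calling `.toarray()` is fine for inspection, but for larger data the stacked result would normally stay sparse. A minimal sketch, assuming the one-hot values shown above are rebuilt by hand so the example runs on its own: `hstack` accepts a `format` argument, and `'csr'` gives a matrix that supports efficient row slicing. The name `X_full` is introduced here for illustration.

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack

# Stand-in for the encoder output shown earlier (3 rows, 5 one-hot columns)
X_dummy = csr_matrix(np.array([
    [1, 0, 0, 1, 0],
    [0, 0, 0, 0, 1],
    [0, 1, 0, 1, 0],
], dtype=float))
X_id = csr_matrix(np.array([[1], [2], [3]], dtype=float))

# Keep the stacked matrix sparse and request CSR instead of the default COO
X_full = hstack([X_id, X_dummy], format='csr')

print(X_full.shape)   # rows, columns
print(X_full.format)  # storage format of the result
print(X_full.nnz)     # number of stored non-zero entries
print(X_full[1].toarray())  # densify a single row only when needed
```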