## One-Hot Encoding of Transaction Data

One-hot encoder class for transaction data in Python lists

> from mlxtend.preprocessing import OnehotTransactions

## Overview

Encodes database transaction data in the form of a Python list of lists into a one-hot encoded NumPy integer array.

## Example 1

Suppose we have the following transaction data:

In [1]:
from mlxtend.preprocessing import OnehotTransactions

dataset = [['Apple', 'Beer', 'Rice', 'Chicken'],
           ['Apple', 'Beer', 'Rice'],
           ['Apple', 'Beer'],
           ['Apple', 'Bananas'],
           ['Milk', 'Beer', 'Rice', 'Chicken'],
           ['Milk', 'Beer', 'Rice'],
           ['Milk', 'Beer'],
           ['Apple', 'Bananas']]

Using an `OnehotTransactions` object, we can transform this dataset into a one-hot encoded format suitable for typical machine learning APIs. Via the `fit` method, the `OnehotTransactions` encoder learns the unique labels in the dataset, and via the `transform` method, it transforms the input dataset (a Python list of lists) into a one-hot encoded NumPy integer array:

In [2]:
oht = OnehotTransactions()
oht_ary = oht.fit(dataset).transform(dataset)
oht_ary

array([[1, 0, 1, 1, 0, 1],
       [1, 0, 1, 0, 0, 1],
       [1, 0, 1, 0, 0, 0],
       [1, 1, 0, 0, 0, 0],
       [0, 0, 1, 1, 1, 1],
       [0, 0, 1, 0, 1, 1],
       [0, 0, 1, 0, 1, 0],
       [1, 1, 0, 0, 0, 0]])
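To make the two steps more concrete, the following is a minimal sketch of the logic that `fit` and `transform` describe. It is not mlxtend's implementation: the helper names `fit_columns` and `transform_rows` are hypothetical, and the sketch returns plain nested lists instead of a NumPy array so that it runs without any dependencies.

```python
def fit_columns(transactions):
    """Collect the unique item labels across all transactions, sorted."""
    unique_items = set()
    for transaction in transactions:
        unique_items.update(transaction)
    return sorted(unique_items)


def transform_rows(transactions, columns):
    """Encode each transaction as a row of 0/1 flags, one flag per column."""
    position = {item: i for i, item in enumerate(columns)}
    rows = []
    for transaction in transactions:
        row = [0] * len(columns)
        for item in transaction:
            row[position[item]] = 1  # mark the item as present
        rows.append(row)
    return rows


dataset = [['Apple', 'Beer', 'Rice', 'Chicken'],
           ['Apple', 'Beer', 'Rice'],
           ['Apple', 'Beer'],
           ['Apple', 'Bananas'],
           ['Milk', 'Beer', 'Rice', 'Chicken'],
           ['Milk', 'Beer', 'Rice'],
           ['Milk', 'Beer'],
           ['Apple', 'Bananas']]

columns = fit_columns(dataset)
encoded = transform_rows(dataset, columns)
```

Each row has one entry per unique label, and an entry is 1 only if the corresponding item occurs in that transaction.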

After fitting, the unique column names that correspond to the data array shown above can be accessed via the `columns_` attribute:

In [3]:
oht.columns_

['Apple', 'Bananas', 'Beer', 'Chicken', 'Milk', 'Rice']

For our convenience, we can turn the one-hot encoded array into a pandas DataFrame:

In [4]:
import pandas as pd

pd.DataFrame(oht_ary, columns=oht.columns_)

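|   | Apple | Bananas | Beer | Chicken | Milk | Rice |
|---|-------|---------|------|---------|------|------|
| 0 | 1 | 0 | 1 | 1 | 0 | 1 |
| 1 | 1 | 0 | 1 | 0 | 0 | 1 |
| 2 | 1 | 0 | 1 | 0 | 0 | 0 |
| 3 | 1 | 1 | 0 | 0 | 0 | 0 |
| 4 | 0 | 0 | 1 | 1 | 1 | 1 |
| 5 | 0 | 0 | 1 | 0 | 1 | 1 |
| 6 | 0 | 0 | 1 | 0 | 1 | 0 |
| 7 | 1 | 1 | 0 | 0 | 0 | 0 |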


If desired, we can turn the one-hot encoded array back into a transaction list of lists via the `inverse_transform` method:

In [5]:
first4 = oht_ary[:4]
oht.inverse_transform(first4)

[['Apple', 'Beer', 'Chicken', 'Rice'],
 ['Apple', 'Beer', 'Rice'],
 ['Apple', 'Beer'],
 ['Apple', 'Bananas']]
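The reverse mapping can be sketched in a few lines as well. This is an illustration under the same assumptions as the rest of this example, not the library's code: `decode_rows` is a hypothetical helper, and the input is written as nested lists rather than a NumPy array.

```python
def decode_rows(onehot_rows, columns):
    """Map each 0/1 row back to the labels whose flag is set."""
    return [[columns[i] for i, flag in enumerate(row) if flag]
            for row in onehot_rows]


columns = ['Apple', 'Bananas', 'Beer', 'Chicken', 'Milk', 'Rice']
first4 = [[1, 0, 1, 1, 0, 1],
          [1, 0, 1, 0, 0, 1],
          [1, 0, 1, 0, 0, 0],
          [1, 1, 0, 0, 0, 0]]

decoded = decode_rows(first4, columns)
```

Because the encoding stores only presence or absence, the items come back in column order rather than in their original order: the first transaction was entered as `['Apple', 'Beer', 'Rice', 'Chicken']` but decodes to `['Apple', 'Beer', 'Chicken', 'Rice']`.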

## API


## OnehotTransactions

*OnehotTransactions()*

One-hot encoder class for transaction data in Python lists

**Parameters**

None

**Attributes**

columns_: list
List of unique names in the `X` input list of lists

### Methods

<hr>

*fit(X)*

Learn the unique column names from the transaction data

**Parameters**

- `X` : list of lists

    A Python list of lists, where the outer list stores the
    n transactions and the inner list stores the items in each
    transaction.

    For example,
    [['Apple', 'Beer', 'Rice', 'Chicken'],
    ['Apple', 'Beer', 'Rice'],
    ['Apple', 'Beer'],
    ['Apple', 'Bananas'],
    ['Milk', 'Beer', 'Rice', 'Chicken'],
    ['Milk', 'Beer', 'Rice'],
    ['Milk', 'Beer'],
    ['Apple', 'Bananas']]

<hr>

*fit_transform(X)*

Fit an OnehotTransactions encoder and transform a dataset.
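As the name suggests, `fit_transform` combines the two calls that Example 1 chains as `fit(dataset).transform(dataset)`. The sketch below illustrates that call pattern with a hypothetical stand-in class, `SketchEncoder`, which is an assumption made for illustration and not the mlxtend class; it returns nested lists rather than a NumPy array.

```python
class SketchEncoder:
    """Hypothetical stand-in that only illustrates the call pattern."""

    def fit(self, X):
        # Learn the sorted unique labels and return self so calls can be chained.
        self.columns_ = sorted({item for transaction in X for item in transaction})
        return self

    def transform(self, X):
        # One row per transaction, one 0/1 flag per learned column.
        return [[1 if column in transaction else 0 for column in self.columns_]
                for transaction in X]

    def fit_transform(self, X):
        # Shorthand for fitting and transforming the same dataset.
        return self.fit(X).transform(X)


X = [['Milk', 'Beer'], ['Apple', 'Bananas']]

chained = SketchEncoder().fit(X).transform(X)
combined = SketchEncoder().fit_transform(X)
```

Both calls produce the same encoding, so the combined method is only a convenience for the common case where the encoder is fitted and applied on the same data.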

<hr>

*inverse_transform(onehot)*

Transforms a one-hot encoded NumPy array back into transactions.

**Parameters**

- `onehot` : NumPy array [n_transactions, n_unique_items]

   