## TransactionEncoder: Convert item lists into transaction data for frequent itemset mining

Encoder class for transaction data in Python lists

> from mlxtend.preprocessing import TransactionEncoder

## Overview

Encodes database transaction data, given as a Python list of lists, into a one-hot encoded NumPy array.
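To make the encoding concrete, here is a minimal from-scratch sketch of the idea in plain NumPy. It is not mlxtend's implementation, and the helper names `fit_columns` and `encode` are hypothetical. Consistent with the outputs shown later in this guide, it assumes the columns are the unique items in sorted order.

```python
import numpy as np

def fit_columns(transactions):
    """Hypothetical helper: collect the unique items, sorted alphabetically."""
    return sorted({item for transaction in transactions for item in transaction})

def encode(transactions, columns):
    """Hypothetical helper: one-hot encode transactions into a boolean array."""
    col_index = {name: i for i, name in enumerate(columns)}
    array = np.zeros((len(transactions), len(columns)), dtype=bool)
    for row, transaction in enumerate(transactions):
        for item in transaction:
            # Mark the item as present; unseen items raise KeyError in this sketch
            array[row, col_index[item]] = True
    return array

transactions = [['Apple', 'Beer'], ['Milk', 'Beer', 'Rice'], ['Apple']]
columns = fit_columns(transactions)
encoded = encode(transactions, columns)
print(columns)              # ['Apple', 'Beer', 'Milk', 'Rice']
print(encoded.astype(int))  # one row per transaction, one column per item
```

Each row is one transaction and each column one unique item, which is the layout the `TransactionEncoder` examples below produce.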

## Example 1

Suppose we have the following transaction data:

```python
from mlxtend.preprocessing import TransactionEncoder

dataset = [['Apple', 'Beer', 'Rice', 'Chicken'],
           ['Apple', 'Beer', 'Rice'],
           ['Apple', 'Beer'],
           ['Apple', 'Bananas'],
           ['Milk', 'Beer', 'Rice', 'Chicken'],
           ['Milk', 'Beer', 'Rice'],
           ['Milk', 'Beer'],
           ['Apple', 'Bananas']]
```

Using a `TransactionEncoder` object, we can transform this dataset into an array format suitable for typical machine learning APIs. Via the `fit` method, the `TransactionEncoder` learns the unique labels in the dataset, and via the `transform` method, it transforms the input dataset (a Python list of lists) into a one-hot encoded NumPy boolean array:

```python
te = TransactionEncoder()
te_ary = te.fit(dataset).transform(dataset)
te_ary
```

```
array([[ True, False,  True,  True, False,  True],
       [ True, False,  True, False, False,  True],
       [ True, False,  True, False, False, False],
       [ True,  True, False, False, False, False],
       [False, False,  True,  True,  True,  True],
       [False, False,  True, False,  True,  True],
       [False, False,  True, False,  True, False],
       [ True,  True, False, False, False, False]])
```

The NumPy array is boolean for the sake of memory efficiency when working with large datasets. If a classic integer representation is desired instead, we can just convert the array to the appropriate type: 

```python
te_ary.astype("int")
```

```
array([[1, 0, 1, 1, 0, 1],
       [1, 0, 1, 0, 0, 1],
       [1, 0, 1, 0, 0, 0],
       [1, 1, 0, 0, 0, 0],
       [0, 0, 1, 1, 1, 1],
       [0, 0, 1, 0, 1, 1],
       [0, 0, 1, 0, 1, 0],
       [1, 1, 0, 0, 0, 0]])
```
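The memory argument can be checked with NumPy's `nbytes` attribute. This sketch only rebuilds an array of the same 8 × 6 shape (the values do not affect the byte count) and uses an explicit `np.int64`, because the size of a plain `"int"` can depend on the platform. The larger dataset size in the second half is a hypothetical example.

```python
import numpy as np

# Same shape as te_ary above: 8 transactions x 6 unique items
bool_ary = np.zeros((8, 6), dtype=bool)
int_ary = bool_ary.astype(np.int64)

print(bool_ary.nbytes)  # 48  -> 1 byte per element
print(int_ary.nbytes)   # 384 -> 8 bytes per element

# Hypothetical larger dataset, computed without allocating it
n_transactions, n_items = 1_000_000, 1_000
print(np.dtype(bool).itemsize * n_transactions * n_items)      # 1000000000 (~1 GB)
print(np.dtype(np.int64).itemsize * n_transactions * n_items)  # 8000000000 (~8 GB)
```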

After fitting, the unique column names that correspond to the data array shown above can be accessed via the `columns_` attribute, or the `get_feature_names_out` method:

```python
te.columns_  # list of strings
```

```
['Apple', 'Bananas', 'Beer', 'Chicken', 'Milk', 'Rice']
```

```python
te.get_feature_names_out()  # NumPy array of strings (dtype=object)
```

```
array(['Apple', 'Bananas', 'Beer', 'Chicken', 'Milk', 'Rice'],
      dtype=object)
```
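Because the column order matches the array, `columns_` works as a lookup table from item names to column positions. The sketch below hard-codes the array and column list shown above (so it runs without mlxtend) and counts how often items occur and co-occur; `col_index` is a hypothetical helper name.

```python
import numpy as np

columns = ['Apple', 'Bananas', 'Beer', 'Chicken', 'Milk', 'Rice']  # te.columns_
te_ary = np.array([
    [1, 0, 1, 1, 0, 1],
    [1, 0, 1, 0, 0, 1],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 1],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 1, 0, 1, 0],
    [1, 1, 0, 0, 0, 0],
], dtype=bool)

# Map each item name to its column position
col_index = {name: i for i, name in enumerate(columns)}

# Number of transactions containing 'Beer'
beer_count = int(te_ary[:, col_index['Beer']].sum())
print(beer_count)  # 6

# Number of transactions containing both 'Beer' and 'Rice'
both = te_ary[:, col_index['Beer']] & te_ary[:, col_index['Rice']]
print(int(both.sum()))  # 4
```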

If desired, we can turn the one-hot encoded array back into a transaction list of lists via the `inverse_transform` method:

```python
first4 = te_ary[:4]
te.inverse_transform(first4)
```

```
[['Apple', 'Beer', 'Chicken', 'Rice'],
 ['Apple', 'Beer', 'Rice'],
 ['Apple', 'Beer'],
 ['Apple', 'Bananas']]
```
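Conceptually, inverting the encoding means reading off, for each row, the column names whose entries are `True`. The following minimal sketch of that idea is not necessarily how mlxtend implements `inverse_transform`, and `decode` is a hypothetical name. It is applied to the first four rows from above:

```python
import numpy as np

columns = ['Apple', 'Bananas', 'Beer', 'Chicken', 'Milk', 'Rice']
first4 = np.array([
    [True, False, True, True, False, True],
    [True, False, True, False, False, True],
    [True, False, True, False, False, False],
    [True, True, False, False, False, False],
])

def decode(array, columns):
    """Hypothetical helper: return the item names flagged True in each row."""
    return [[columns[j] for j in np.flatnonzero(row)] for row in array]

decoded = decode(first4, columns)
print(decoded)
```

Note that items come back in column (alphabetical) order, so the original within-transaction order is not preserved: the first transaction was entered as `['Apple', 'Beer', 'Rice', 'Chicken']` but decodes as `['Apple', 'Beer', 'Chicken', 'Rice']`.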

For convenience, we can set the default output to a pandas `DataFrame` with the `set_output` method:

```python
te = TransactionEncoder().set_output(transform="pandas")
te_df = te.fit(dataset).transform(dataset)
te_df
```

|   | Apple | Bananas | Beer | Chicken | Milk | Rice |
|---|-------|---------|------|---------|------|------|
| 0 | True | False | True | True | False | True |
| 1 | True | False | True | False | False | True |
| 2 | True | False | True | False | False | False |
| 3 | True | True | False | False | False | False |
| 4 | False | False | True | True | True | True |
| 5 | False | False | True | False | True | True |
| 6 | False | False | True | False | True | False |
| 7 | True | True | False | False | False | False |
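An equivalent table can also be built by wrapping the boolean array in a `pandas.DataFrame` with `columns_` as the column labels, which may be handy in code that does not use `set_output`. Taking the column means then gives each item's support (the fraction of transactions containing it), the quantity that frequent itemset algorithms such as Apriori compare against a threshold. The array and column list are hard-coded from the outputs above so the sketch runs without mlxtend.

```python
import numpy as np
import pandas as pd

columns = ['Apple', 'Bananas', 'Beer', 'Chicken', 'Milk', 'Rice']  # te.columns_
te_ary = np.array([
    [1, 0, 1, 1, 0, 1],
    [1, 0, 1, 0, 0, 1],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 1],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 1, 0, 1, 0],
    [1, 1, 0, 0, 0, 0],
], dtype=bool)

te_df = pd.DataFrame(te_ary, columns=columns)

# Support of each single item: fraction of transactions containing it
support = te_df.mean()
print(support)
```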


## API


## TransactionEncoder

*TransactionEncoder()*

Encoder class for transaction data in Python lists

**Parameters**

None

**Attributes**

- `columns_` : list

    List of unique names in the `X` input list of lists

**Examples**

For usage examples, please see
[https://rasbt.github.io/mlxtend/user_guide/preprocessing/TransactionEncoder/](https://rasbt.github.io/mlxtend/user_guide/preprocessing/TransactionEncoder/)

### Methods

<hr>

*fit(X)*

Learn unique column names from the transaction data (a list of lists)

**Parameters**

- `X` : list of lists

    A Python list of lists, where the outer list stores the
    n transactions and the inner list stores the items in each
    transaction.

    For example,
    [['Apple', 'Beer', 'Rice', 'Chicken'],
    ['Apple', 'Beer', 'Rice'],
    ['Apple', 'Beer'],
    ['Apple', 'Bananas'],
    ['Milk', 'Beer', 'Rice', 'Chicken'],
    ['Milk', 'Beer', 'Rice'],
    ['Milk', 'Beer'],
    ['Apple', 'Bananas']]

<hr>

*fit_transform(X, sparse=False)*

Fit a TransactionEncoder enco