# Multiple Categorical Encoding Problem

In machine learning, features are often categorical rather than continuous. We can use `LabelEncoder` and `OneHotEncoder` from the scikit-learn library to encode categorical features as numerical values before feeding the data into an ML algorithm. See the [scikit-learn guide](http://scikit-learn.org/stable/modules/preprocessing.html#encoding-categorical-features) for more details.
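As a point of reference for the single-category case, here is a minimal sketch of both encoders. The `color` column and its values are made up for illustration and are not part of the task data.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Hypothetical single-valued categorical feature (illustration only).
colors = pd.DataFrame({'color': ['red', 'blue', 'green', 'blue']})

# LabelEncoder maps each distinct value to an integer code (classes are sorted).
labels = LabelEncoder().fit_transform(colors['color'])

# OneHotEncoder expects 2-D input and returns a sparse matrix by default;
# toarray() converts it to a dense array with one column per category.
encoder = OneHotEncoder()
onehot = encoder.fit_transform(colors[['color']]).toarray()
categories = list(encoder.categories_[0])
```

Each row of `onehot` contains exactly one `1`, which is why this approach does not directly cover a cell holding several categories at once.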

In a more complex scenario, each value in a feature column might contain a delimited list of categories. In the following example, the `industry` column contains comma-separated industry categories.


In [3]:
import pandas as pd
data = pd.DataFrame([['SaaS,Health Care'], ['SaaS, Enterprise Software'], ['Health Care, Enterprise Software'], ['Finance Software']], columns=['industry'])
data

Unnamed: 0,industry
0,"SaaS,Health Care"
1,"SaaS, Enterprise Software"
2,"Health Care, Enterprise Software"
3,Finance Software


Ideally, we would like to encode such a feature in a way similar to one-hot encoding, so that each category gets its own column, but a sample can have a '1' in several columns, one for each category in its list. Here is an example of such an encoding.

In [4]:
encoding = pd.DataFrame([
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 1]
], columns=['industry_SaaS', 'industry_Health Care', 'industry_Enterprise Software', 'industry_Finance Software'])
encoding

Unnamed: 0,industry_SaaS,industry_Health Care,industry_Enterprise Software,industry_Finance Software
0,1,1,0,0
1,1,0,1,0
2,0,1,1,0
3,0,0,0,1
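
For a quick look at how the table above can be derived rather than typed by hand, here is a minimal sketch using pandas' `Series.str.get_dummies`. It assumes that whitespace around the delimiter is not significant, which the sample data suggests but the task does not state. Note that the resulting columns come out in alphabetical order, not in the order shown above.

```python
import pandas as pd

data = pd.DataFrame(
    [['SaaS,Health Care'], ['SaaS, Enterprise Software'],
     ['Health Care, Enterprise Software'], ['Finance Software']],
    columns=['industry'],
)

# Assumption: spaces around the delimiter are noise, so "a, b" becomes "a,b".
cleaned = data['industry'].str.strip().str.replace(r'\s*,\s*', ',', regex=True)

# str.get_dummies splits each string on `sep` and builds one 0/1 column per category.
derived = cleaned.str.get_dummies(sep=',').add_prefix('industry_')
```

This is only a one-off transformation: it has no `fit` step, so it cannot remember the categories seen during training or collapse rare ones.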


# Task

scikit-learn does not provide a built-in transformer that handles delimited category strings like these directly. Please create a scikit-learn compatible transformer for this encoding task.

Here are some other requirements:
1. The code needs to be wrapped in a scikit-learn compatible transformer class
1. The transformer should be able to handle different delimiters
1. The transformer should be able to handle a large number of categories, with an option to collapse less frequent categories
1. Given a pandas Series as input, the result of the transformation should be a pandas DataFrame with meaningful column names indicating the categories
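
One possible shape for a transformer meeting the requirements above is sketched below. It is a starting point under stated assumptions, not a reference solution: the class name `DelimitedCategoryEncoder` and the parameters `delimiter`, `min_count`, and `other_label` are hypothetical names chosen for illustration. "Less frequent" is interpreted here as "appears in fewer than `min_count` training samples", and whitespace around each category is stripped.

```python
from collections import Counter

import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class DelimitedCategoryEncoder(BaseEstimator, TransformerMixin):
    """Hypothetical multi-hot encoder for a Series of delimited category strings."""

    def __init__(self, delimiter=',', min_count=1, other_label='other'):
        # Store parameters unchanged so sklearn's get_params/clone keep working.
        self.delimiter = delimiter
        self.min_count = min_count
        self.other_label = other_label

    def _split(self, value):
        # Missing values contribute no categories; empty tokens are dropped.
        if pd.isna(value):
            return []
        tokens = (t.strip() for t in str(value).split(self.delimiter))
        return [t for t in tokens if t]

    def fit(self, X, y=None):
        X = pd.Series(X)
        # Count the number of samples in which each category appears.
        counts = Counter(c for v in X for c in set(self._split(v)))
        self.categories_ = sorted(c for c, n in counts.items() if n >= self.min_count)
        # A catch-all column is added only when some category was collapsed.
        self.has_other_ = len(self.categories_) < len(counts)
        self.prefix_ = X.name if X.name is not None else 'category'
        return self

    def transform(self, X):
        X = pd.Series(X)
        known = set(self.categories_)
        columns = list(self.categories_)
        if self.has_other_:
            columns.append(self.other_label)
        rows = []
        for value in X:
            present = set(self._split(value))
            row = {c: int(c in present) for c in self.categories_}
            if self.has_other_:
                # Rare or unseen categories all map to the catch-all column.
                row[self.other_label] = int(bool(present - known))
            rows.append(row)
        out = pd.DataFrame(rows, index=X.index, columns=columns)
        out.columns = [f'{self.prefix_}_{c}' for c in columns]
        return out


industry = pd.Series(
    ['SaaS,Health Care', 'SaaS, Enterprise Software',
     'Health Care, Enterprise Software', 'Finance Software'],
    name='industry',
)
encoded = DelimitedCategoryEncoder(delimiter=',', min_count=2).fit_transform(industry)
```

With `min_count=2`, `Finance Software` occurs in only one sample and is folded into `industry_other`. When no category is collapsed during `fit`, categories that first appear at transform time are silently ignored; whether that is the desired behavior is a design decision the task leaves open.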