Original source is [here](https://www.kaggle.com/rhodiumbeng/classifying-multi-label-comments-ngrams) 

If you need additional explanation, above

In [1]:
import pandas as pd
train_df = pd.read_csv("./data/train.csv")
train_df.head()

Unnamed: 0,id,comment_text,toxic,severe_toxic,obscene,threat,insult,identity_hate
0,0000997932d777bf,Explanation\nWhy the edits made under my usern...,0,0,0,0,0,0
1,000103f0d9cfb60f,D'aww! He matches this background colour I'm s...,0,0,0,0,0,0
2,000113f07ec002fd,"Hey man, I'm really not trying to edit war. It...",0,0,0,0,0,0
3,0001b41b1c6bb37e,"""\nMore\nI can't make any real suggestions on ...",0,0,0,0,0,0
4,0001d958c54c6e35,"You, sir, are my hero. Any chance you remember...",0,0,0,0,0,0


In [11]:
cols_target = ['obscene','insult','toxic','severe_toxic','identity_hate','threat']

In [2]:
X = train_df.comment_text
X.head()

0    Explanation\nWhy the edits made under my usern...
1    D'aww! He matches this background colour I'm s...
2    Hey man, I'm really not trying to edit war. It...
3    "\nMore\nI can't make any real suggestions on ...
4    You, sir, are my hero. Any chance you remember...
Name: comment_text, dtype: object

In [6]:
test_df = pd.read_csv("./data/test.csv")
test_df.head()

Unnamed: 0,id,comment_text
0,00001cee341fdb12,Yo bitch Ja Rule is more succesful then you'll...
1,0000247867823ef7,== From RfC == \n\n The title is fine as it is...
2,00013b17ad220c46,""" \n\n == Sources == \n\n * Zawe Ashton on Lap..."
3,00017563c3f7919a,":If you have a look back at the source, the in..."
4,00017695ad8997eb,I don't anonymously edit articles at all.


In [7]:
Y = test_df.comment_text
Y.head()

0    Yo bitch Ja Rule is more succesful then you'll...
1    == From RfC == \n\n The title is fine as it is...
2    " \n\n == Sources == \n\n * Zawe Ashton on Lap...
3    :If you have a look back at the source, the in...
4            I don't anonymously edit articles at all.
Name: comment_text, dtype: object

In [3]:
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfVectorizer
vect = TfidfVectorizer(analyzer='char', ngram_range=(1,4), max_features=50000, min_df=2)
vect

TfidfVectorizer(analyzer='char', binary=False, decode_error='strict',
        dtype=<class 'numpy.int64'>, encoding='utf-8', input='content',
        lowercase=True, max_df=1.0, max_features=50000, min_df=2,
        ngram_range=(1, 4), norm='l2', preprocessor=None, smooth_idf=True,
        stop_words=None, strip_accents=None, sublinear_tf=False,
        token_pattern='(?u)\\b\\w\\w+\\b', tokenizer=None, use_idf=True,
        vocabulary=None)

In [4]:
X_dtm = vect.fit_transform(X)
X_dtm

<159571x50000 sparse matrix of type '<class 'numpy.float64'>'
	with 105511344 stored elements in Compressed Sparse Row format>

In [13]:
test_X_dtm = vect.fit_transform(Y)
test_X_dtm

<153164x50000 sparse matrix of type '<class 'numpy.float64'>'
	with 91065914 stored elements in Compressed Sparse Row format>

## Solving a multi-label classification problem
One way to approach a multi-label classification problem is to transform the problem into separate single-class classifier problems. This is known as 'problem transformation'. There are three methods:

- **Binary Relevance**. This is probably the simplest which treats each label as a separate single classification problems. The key assumption here though, is that there are no correlation among the various labels.
    각각의 classification label이 독립된 것으로 생각한다. 이 경우 여러 가지 라벨간의 correlation은 없다고 생각한다.
- **Classifier Chains**. In this method, the first classifier is trained on the input X. Then the subsequent classifiers are trained on the input X and all previous classifiers' predictions in the chain. This method attempts to draw the signals from the correlation among preceding target variables.
    일단 첫 모델을 첫 번째 class에 대해 train한다. 그 다음 subsequent 모델은 input X에 대해 첫 번째 모델에서 예측한 prediction을 함께 사용하여 train한다. 이러한 모델은 서로 다른 라벨간의 correlation을 모델링할 수 있다.
- **Label Powerset**. This method transforms the problem into a multi-class problem where the multi-class labels are essentially all the unique label combinations. In our case here, where there are six binary labels, Label Powerset would in effect turn this into a 2^6 or 64-class problem! [Thanks Joshua.]
    n개의 라벨끼리의 조합을 항상 서로 다른 것으로 간주한다. 이 경우 6개의 class가 있으므로 2^6, 즉 64-class 문제로 변환하여 푼다.

In [9]:
# import and instantiate the Logistic Regression model
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
logreg = LogisticRegression(C=9.0)

In [15]:
submission_chains = pd.read_csv('data/sample_submission.csv')

In [5]:
# Classifier Chains - logistic regression

def add_feature(X, feature_to_add):
    '''
    Returns sparse feature matrix with added feature.
    feature_to_add can also be a list of features.
    '''
    from scipy.sparse import csr_matrix, hstack
    return hstack([X, csr_matrix(feature_to_add).T], 'csr')

In [16]:
for label in cols_target:
    print('... Processing {}'.format(label))
    y = train_df[label]
    # train the model using X_dtm & y
    logreg.fit(X_dtm,y)
    # compute the training accuracy
    y_pred_X = logreg.predict(X_dtm)
    print('Training Accuracy is {}'.format(accuracy_score(y,y_pred_X)))
    # make predictions from test_X
    test_y = logreg.predict(test_X_dtm)
    test_y_prob = logreg.predict_proba(test_X_dtm)[:,1]
    submission_chains[label] = test_y_prob
    # chain current label to X_dtm
    X_dtm = add_feature(X_dtm, y)
    print('Shape of X_dtm is now {}'.format(X_dtm.shape))
    # chain current label predictions to test_X_dtm
    test_X_dtm = add_feature(test_X_dtm, test_y)
    print('Shape of test_X_dtm is now {}'.format(test_X_dtm.shape))

... Processing obscene
Training Accuracy is 0.9878048016243554
Shape of X_dtm is now (159571, 50001)
Shape of test_X_dtm is now (153164, 50001)
... Processing insult
Training Accuracy is 0.987691999172782
Shape of X_dtm is now (159571, 50002)
Shape of test_X_dtm is now (153164, 50002)
... Processing toxic
Training Accuracy is 0.9772452387965231
Shape of X_dtm is now (159571, 50003)
Shape of test_X_dtm is now (153164, 50003)
... Processing severe_toxic
Training Accuracy is 0.9956445720086983
Shape of X_dtm is now (159571, 50004)
Shape of test_X_dtm is now (153164, 50004)
... Processing identity_hate
Training Accuracy is 0.9966785944814534
Shape of X_dtm is now (159571, 50005)
Shape of test_X_dtm is now (153164, 50005)
... Processing threat
Training Accuracy is 0.9987654398355591
Shape of X_dtm is now (159571, 50006)
Shape of test_X_dtm is now (153164, 50006)


In [17]:
submission_chains.to_csv('submission_chains.csv', index=False)