# Intro

Problem: Given sentences sourced from the comment text of posts that contain certain tags on r/OffMyChest and r/CasualConversations, determine whether each sentence is supportive and/or disclosive.

Define inputs/outputs  
Model choices  
- Classical Models  
    - Logistic
    - Support Vector Machine
    - Stochastic Gradient Descent
    - Naive Bayesian Inference

These models take in vectors created from the words: tfidf vectors, LSA vectors derived from the tfidf vectors, and LIWC vectors, which are just vectors of frequencies for every category in LIWC.

The outputs are just the probabilities for each label.

I used 10-fold cross-validation for these, and measured the per-label and micro-averaged precision, recall and F1 score.

- Neural Network based Models
    - Simple Feedforward
    - AWD-LSTM
    - Transformer based networks
        - (Distil) RoBERTa (the distilled and full versions are basically interchangeable here)
        - XLNet

The simple NN takes the feature vectors as input (excluding tfidf, as those are too big), while the other models take in text. More specifically, the text is first tokenised into the tokens that the model was trained on, then numericalised according to the model's vocabulary.

The outputs of the models are logits, which need to be passed through a sigmoid function in order to get the actual probabilities.
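As a minimal sketch of that last step, the snippet below turns one logit per label into an independent probability and then into a 0/1 decision. The example logits and the 0.5 cut-off are assumptions for illustration, not values taken from the notebooks.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function mapping a logit to (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_labels(logits, threshold=0.5):
    """Apply the sigmoid to each label's logit independently, then threshold.

    The 0.5 threshold is an assumed default, not a tuned value.
    """
    probs = [sigmoid(z) for z in logits]
    return probs, [int(p >= threshold) for p in probs]


# Hypothetical logits for (supportive, disclosive) on a single sentence.
probs, labels = predict_labels([-1.2, 0.8])
print(probs, labels)
```

Because the problem is multilabel, each label gets its own sigmoid rather than a softmax across labels, so a sentence can be both supportive and disclosive, or neither.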

I held out 10% of the labeled data as a validation set, and also measured the per-label and micro-averaged precision, recall and F1 score. Some of the results also contain the 'accuracy', which is the total number of correct labels over the total number of labels.
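For reference, here is a small sketch of how the micro-averaged metrics and the label-level 'accuracy' can be computed from 0/1 label matrices. It is a plain-Python illustration under my own naming (`micro_prf`, `label_accuracy`), not the code used for the reported numbers.

```python
def micro_prf(y_true, y_pred):
    """Micro-averaged precision, recall and F1: counts are pooled over all labels."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            tp += int(t == 1 and p == 1)
            fp += int(t == 0 and p == 1)
            fn += int(t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def label_accuracy(y_true, y_pred):
    """Correct label decisions divided by the total number of label decisions."""
    correct = sum(t == p for tr, pr in zip(y_true, y_pred) for t, p in zip(tr, pr))
    total = sum(len(row) for row in y_true)
    return correct / total


# Columns are (supportive, disclosive); the rows are made-up sentences.
y_true = [[1, 0], [0, 1], [1, 1], [0, 0]]
y_pred = [[1, 1], [0, 1], [0, 1], [0, 0]]
print(micro_prf(y_true, y_pred), label_accuracy(y_true, y_pred))
```

Micro averaging weights every label decision equally, so the more frequent disclosive label dominates the pooled score.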

Also, every mention of RoBERTa most likely refers to the distilled version, due to Colab memory constraints.


- Language Model Training
    - (Distil) RoBERTa

Uses the unlabeled comments as text for masked language modelling training, where words in the sentences are randomly replaced with mask tokens, which the model then has to predict.

Outputs the logits over the vocabulary for each token position.

The RoBERTa part of this model is then loaded in place of the provided pretrained model to be used for classification.
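The masking step can be sketched roughly as below. This is a simplified illustration: it always substitutes the mask token, whereas the usual BERT/RoBERTa recipe replaces only 80% of the selected tokens with the mask, swaps 10% for a random token and leaves 10% unchanged. The 15% masking rate and the function name `mask_tokens` are assumptions here.

```python
import random

MASK_TOKEN = "<mask>"  # RoBERTa's mask token string
IGNORE = None          # marks positions the model is not asked to predict


def mask_tokens(tokens, mask_prob=0.15, rng=None):
    """Randomly hide tokens and remember what was hidden.

    Returns the masked sequence and a parallel list of labels: the original
    token at masked positions, IGNORE everywhere else.
    """
    rng = rng or random.Random()
    masked, labels = [], []
    for token in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK_TOKEN)
            labels.append(token)
        else:
            masked.append(token)
            labels.append(IGNORE)
    return masked, labels


tokens = "i just needed to get this off my chest".split()
masked, labels = mask_tokens(tokens, rng=random.Random(0))
print(masked)
print(labels)
```

Only the masked positions contribute to the language modelling loss, which is why the labels for the other positions are set to an ignore value.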

- [Unsupervised Data Augmentation](https://arxiv.org/pdf/1904.12848.pdf)
    - Backtranslation
        - [Facebook's submission to WMT'19](https://github.com/pytorch/fairseq/blob/master/examples/wmt19/README.md) 
    - (Distil) RoBERTa

The gist of UDA is consistency training on the unlabeled data, but instead of adding random noise, data augmentation is used as the source of noise. I chose back-translation over tfidf word replacement as the augmentation because the task does not really depend on any particular keywords, but rather on the sentence structure as a whole (which is also why I felt that the features would not be able to predict the classes very well), so tfidf-based word replacement would generate less diverse paraphrases.

In more detail, we compute a combined loss consisting of a supervised and an unsupervised portion. The supervised loss is a standard loss comparing the outputs of the model to the labels, while the unsupervised portion is the KL divergence between the predictions for the original unlabeled text and the predictions for the back-translated text (the original prediction is the reference distribution). Since our problem is multilabel, I adapted the KL divergence to be computed separately for every class, i.e. the divergence is calculated between the two predicted probabilities for each class. I suspect that simply using a binary cross entropy meant for multilabel data would give similar or even better results, since from what I could find online KL divergence does not make for a good loss.
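One way to read the per-class adaptation is to treat every label as its own Bernoulli distribution and take the KL divergence between the original and back-translated predictions for that label. The sketch below follows that reading. Averaging over the labels and clipping with `eps` are assumptions of mine, and the function names are hypothetical.

```python
import math


def bernoulli_kl(p: float, q: float, eps: float = 1e-7) -> float:
    """KL(Bernoulli(p) || Bernoulli(q)); p is the reference (original text)."""
    p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def multilabel_consistency_loss(orig_probs, aug_probs):
    """Mean per-label KL between predictions on original and augmented text.

    In training the original predictions would be treated as constants,
    so gradients only flow through the augmented predictions.
    """
    divergences = [bernoulli_kl(p, q) for p, q in zip(orig_probs, aug_probs)]
    return sum(divergences) / len(divergences)


# Hypothetical (supportive, disclosive) probabilities for one unlabeled sentence.
print(multilabel_consistency_loss([0.9, 0.2], [0.6, 0.4]))
```

The loss is zero when the two predictions agree and grows as the back-translated prediction drifts away from the original one.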

There is also the Training Signal Annealing (TSA) portion of UDA: if the model's confidence on a particular labeled training example is above a certain threshold, which is determined by the stage of training the model is at, that example does not count toward the supervised loss. According to the authors of the paper, this was their strategy for dealing with overfitting. More details are in the [paper](https://arxiv.org/pdf/1904.12848.pdf).
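The threshold schedules described in the paper can be sketched as follows, using its linear, log and exp forms and the threshold `alpha * (1 - 1/K) + 1/K` for `K` classes. Setting `K = 2`, i.e. treating each label as a binary decision, is my assumption about how this maps onto the multilabel setup, and the function names are hypothetical.

```python
import math


def tsa_threshold(step, total_steps, schedule="linear", num_classes=2):
    """Confidence threshold at a given training step, rising from 1/K to 1."""
    progress = step / total_steps
    if schedule == "linear":
        alpha = progress
    elif schedule == "log":
        alpha = 1 - math.exp(-progress * 5)   # releases the signal early
    elif schedule == "exp":
        alpha = math.exp((progress - 1) * 5)  # releases the signal late
    else:
        raise ValueError(f"unknown schedule: {schedule}")
    return alpha * (1 - 1 / num_classes) + 1 / num_classes


def tsa_mask(correct_label_probs, threshold):
    """1 keeps an example in the supervised loss, 0 drops it as too confident."""
    return [int(p < threshold) for p in correct_label_probs]


for name in ("log", "linear", "exp"):
    print(name, round(tsa_threshold(5, 10, name), 3))
print(tsa_mask([0.4, 0.9], threshold=0.75))
```

The exp schedule holds back most of the supervised signal until late in training, which the paper suggests for cases where the labeled set is small and easy to overfit.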

The unlabeled comment text is fed into Facebook's English-to-German and then German-to-English translation models in order to generate paraphrases for UDA. I chose to use Facebook's fairseq instead of tensor2tensor (which the original paper used), as tensor2tensor was giving me issues with dropped translations and also had an older model.

- Feature permutation importance

Basically, for every LIWC feature we swap the first half and the second half of that feature's values to see how much the error changes. This lets us see which LIWC category affects the neural network the most.
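A rough sketch of the idea on a toy model is shown below. It reads "swapping the halves" as swapping the halves of one feature column across the examples, which is my interpretation, and the model, data and function names are all made up for illustration.

```python
def swap_halves(column):
    """Move the second half of a feature column in front of the first half."""
    mid = len(column) // 2
    return column[mid:] + column[:mid]


def accuracy(targets, preds):
    return sum(t == p for t, p in zip(targets, preds)) / len(targets)


def feature_importances(predict, rows, targets, metric=accuracy):
    """Drop in the metric when each feature column is perturbed in turn."""
    baseline = metric(targets, [predict(row) for row in rows])
    drops = []
    for j in range(len(rows[0])):
        swapped = swap_halves([row[j] for row in rows])
        perturbed = [row[:j] + [value] + row[j + 1:] for row, value in zip(rows, swapped)]
        drops.append(baseline - metric(targets, [predict(row) for row in perturbed]))
    return drops


# Toy model (hypothetical): it only ever looks at feature 0.
def toy_predict(row):
    return int(row[0] > 0.5)


rows = [[0.9, 0.1], [0.8, 0.7], [0.1, 0.2], [0.2, 0.9]]
targets = [1, 1, 0, 0]
print(feature_importances(toy_predict, rows, targets))
```

On this toy data the result is `[1.0, 0.0]`: perturbing the feature the model relies on destroys its accuracy, while perturbing the ignored feature changes nothing.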

# Results

Table form? and discussion

## Classical Classifiers
### Data
Format:   
```
                supportive    disclosive  
precision  
recall  
f1 score  
support(ignore)  

precision recall f1 score  
```

OneVsRest  
>Tfidf:  
Results for classifier Logistic Reg  
[[  0.741242   0.578209]  
 [  0.082481   0.69604 ]  
 [  0.147563   0.631616]  
 [322.6      681.9     ]]  
[0.584943 0.499209 0.538674]    
Results for classifier LinearSVC  
[[  0.58721    0.584152]  
 [  0.179531   0.603428]  
 [  0.274463   0.593333]  
 [322.6      681.9     ]]  
[0.584547 0.467203 0.519199]    
Results for classifier SGD  
[[  0.698698   0.579263]  
 [  0.122572   0.673083]  
 [  0.208287   0.622461]  
 [322.6      681.9     ]]  
[0.587186 0.496412 0.537812]  
Results for classifier Naive Bayes  
[[  0.929048   0.54631 ]  
 [  0.013076   0.893738]  
 [  0.025744   0.677871]  
 [322.6      681.9     ]]  
[0.547815 0.610792 0.577488]  

>LSA
Results for classifier Logistic Reg  
[[  0.683801   0.570369]  
 [  0.073255   0.728009]  
 [  0.131953   0.639117]  
 [322.6      681.9     ]]  
[0.574782 0.517624 0.544436]    
Results for classifier LinearSVC  
[[  0.643912   0.587667]  
 [  0.113716   0.642443]  
 [  0.193103   0.613607]  
 [322.6      681.9     ]]  
[0.591656 0.472631 0.525411]    
Results for classifier SGD  
[[  0.79091    0.541212]  
 [  0.05234    0.931395]  
 [  0.097841   0.677928]  
 [322.6      681.9     ]]  
[0.545714 0.64887  0.587517]  
Results for classifier Naive Bayes  
Naive Bayes doesn't like matrix factorisation methods (like LSA)

>LIWC
Results for classifier Logistic Reg  
[[  0.598628   0.631613]  
 [  0.227055   0.680657]  
 [  0.328835   0.65489 ]  
 [322.6      681.9     ]]  
[0.626635 0.534994 0.577055]    
Results for classifier LinearSVC  
[[  0.590914   0.6332  ]  
 [  0.227962   0.668885]  
 [  0.32873    0.650281]  
 [322.6      681.9     ]]  
[0.626825 0.527367 0.572623]  
Results for classifier SGD  
[[  0.661719   0.658957]  
 [  0.02916    0.609686]   
 [  0.054394   0.623961]  
 [322.6      681.9     ]]  
[0.657101 0.423416 0.507834] 
Results for classifier Naive Bayes    
[[  0.         0.559526]  
 [  0.         0.892687]  
 [  0.         0.687544]  
 [322.6      681.9     ]]  
[0.559526 0.605887 0.581585]  



### Comments

A standard baseline. The results were worse than I initially expected, as other multilabel text classification problems had better results even with the classical models. The classifiers are also prone to classifying everything the same way (I noticed some ill-defined F1 and precision scores), and the low support recall may be caused by data imbalance (not many support labels).



## AWD-LSTM

### Data
```
precision    recall  f1-score   support

     support    0.67552   0.33578   0.44858       682
  disclosive    0.67643   0.69581   0.68598      1361

   micro avg    0.67625   0.57562   0.62189      2043
   macro avg    0.67597   0.51579   0.56728      2043
weighted avg    0.67612   0.57562   0.60673      2043
 samples avg    0.43121   0.40770   0.41340      2043
```

### Comments

Better than the classical models, but it suffers from a low support recall as well. I attempted to use ULMFiT, only to find out afterwards that I did it wrong (I used a constant learning rate for all the layers instead of decreasing ones). Not much else to say, as I did not test it very thoroughly.




## XLNet and RoBERTa with the provided pretrained models

### Data
```
RoBERTa:  
              precision    recall  f1-score   support

     support    0.67365   0.67164   0.67265       335
  disclosive    0.69572   0.69881   0.69726       674

   micro avg    0.68843   0.68979   0.68911      1009
   macro avg    0.68468   0.68523   0.68495      1009
weighted avg    0.68839   0.68979   0.68909      1009
 samples avg    0.47784   0.47589   0.46967      1009


XLNet:

              precision    recall  f1-score   support

     support    0.63714   0.66567   0.65109       335
  disclosive    0.69913   0.71365   0.70631       674

   micro avg    0.67823   0.69772   0.68784      1009
   macro avg    0.66814   0.68966   0.67870      1009
weighted avg    0.67855   0.69772   0.68798      1009
 samples avg    0.48523   0.48289   0.47538      1009

```

### Comments

Much better results than the previous models, and the support recall no longer seems to be significantly lower. The precision and recall of the RoBERTa model are more balanced than those of the XLNet model. There is still a slight discrepancy between the metrics for support and disclosure, which is more prominent for XLNet.

## Pretrained on unlabeled comments

### Data
##### lm 3:
###### best valid loss was kept:
>lr:2e-5,wd:0.001 
(0.7167042889390519, 0.6293359762140733, 0.670184696569921, None)
0.5754276827371695

>lr:1e-5,wd:0.001 
(0.678963110667996, 0.6749256689791873, 0.6769383697813122, None)
0.562208398133748

>lr:1e-4,wd:0.001  
(0.6697626418988648, 0.643211100099108, 0.6562184024266936, None)
0.5427682737169518

>lr:5e-5,wd:0.001  
>(0.6740597878495661, 0.6927651139742319, 0.6832844574780058, None)
>0.5598755832037325

>lr:1e-3,wd:0.001  
>(0.5241057542768274, 0.66798810703667, 0.587363834422658, None)
>0.42068429237947125

###### best f1 score was kept:
>lr:2e-5,wd:0.001  
(0.6678798908098271, 0.7274529236868187, 0.6963946869070209, None)
0.5668740279937792

>lr:1e-5,wd:0.01   
(0.6825242718446602, 0.6967294350842418, 0.689553702795488, None)
0.567651632970451

##### lm 6:  
Seems worse overall over the many attempts (not recorded)
lr:7e-5,wd:0.001
(0.698051948051948, 0.6392467789890981, 0.6673564407656492, None)
0.5544323483670296

##### lm 7:  
>lr:2e-5,wd:0.001  
(0.6724303554274735, 0.6937561942517344, 0.6829268292682927, None)
0.5544323483670296

>lr:1e-4,wd:0.001  
(0.6835978835978836, 0.6402378592666006, 0.6612077789150461, None)
0.5443234836702955

>lr:1e-5,wd:0.001  
(0.6734892787524367, 0.6848364717542121, 0.6791154791154792, None)
0.5559875583203733

>lr:9e-6,wd:0.001  
(0.6902564102564103, 0.6669970267591675, 0.6784274193548386, None)
0.5645412130637636

>lr:5e-6,wd:0.001  
(0.6837944664031621, 0.6858275520317145, 0.6848095002474023, None)
0.5637636080870918

>lr:1e-6,wd:0.001 (more epochs than usual)  
(0.6829025844930418, 0.6808721506442021, 0.6818858560794046, None)
0.5668740279937792

>lr:2e-6,wd:0.001
(0.6791120080726539, 0.6669970267591675, 0.673, None)
0.5528771384136858

##### lm 8:  

>lr:5e-5,wd:0.001  
(0.7510489510489511, 0.5322101090188305, 0.622969837587007, None)
0.5497667185069984

>lr:2e-5,wd:0.001  
(0.6762452107279694, 0.6997026759167493, 0.6877739892839747, None)
0.5552099533437014

>lr:1e-5,wd:0.001  
(0.6759615384615385, 0.6967294350842418, 0.6861883845778428, None)
0.5629860031104199

>lr:9e-6,wd:0.001  
(0.7004264392324094, 0.6511397423191279, 0.6748844375963019, None)
0.5699844479004665

>lr:1e-4,wd:0.001  
(0.7113133940182055, 0.5421209117938554, 0.6152980877390326, None)
0.5427682737169518

>lr:1e-6,wd:0.001  
(0.6923879040667362, 0.6580773042616452, 0.6747967479674797, None)
0.5660964230171073

>lr:5e-6,wd:0.001  
(0.6791907514450867, 0.6987115956392468, 0.6888128969223253, None)
0.5684292379471229

##### lm 9:  
>lr:1e-5,wd:0.001  
(0.6918429003021148, 0.6808721506442021, 0.6863136863136863, None)
0.5754276827371695

>lr:2e-5,wd:0.001  
(0.6914778856526429, 0.6352824578790882, 0.6621900826446281, None)
0.5544323483670296

>lr:9e-6,wd:0.001  
(0.6778523489932886, 0.7006937561942518, 0.689083820662768, None)
0.562208398133748

>lr:5e-6,wd:0.001  
(0.7067099567099567, 0.647175421209118, 0.6756337299534403, None)
0.5746500777604977

>lr:2e-6,wd:0.001  
(0.6854838709677419, 0.6739345887016849, 0.6796601699150424, None)
0.5637636080870918

>lr:1e-6,wd:0.001  
(0.6786786786786787, 0.6719524281466799, 0.6752988047808764, None)
0.557542768273717

### Comments

I only pretrained the RoBERTa model, as it was easier to understand which aspect of language it was modelling, so I implemented it first as a test. When the results turned out worse, I decided it was not worth the effort to implement the permutation language modelling of XLNet (and the nasty-looking two-stream attention it boasts). The fact that there was sample code in the huggingface docs also helped in making this decision.

I had thought that these models would give better results than the previous ones, but they seem to perform slightly worse, even after extensive hyperparameter tuning. This may be because I did not train the language model with [discriminative finetuning](https://arxiv.org/pdf/1801.06146.pdf), where the learning rates vary from layer to layer.
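For illustration, the layer-wise learning rates used by discriminative finetuning can be generated as below. The ULMFiT paper suggests dividing the learning rate by 2.6 for each step down the network. The number of layer groups and the top learning rate here are assumptions, not settings I actually ran.

```python
def discriminative_lrs(top_lr, num_layer_groups, decay=2.6):
    """One learning rate per layer group, ordered from the bottom group to the head.

    The head trains at top_lr and each lower group at the rate above it
    divided by decay.
    """
    return [top_lr / decay ** (num_layer_groups - 1 - i) for i in range(num_layer_groups)]


# Hypothetical split: embeddings, lower encoder layers, upper encoder layers, head.
for group, lr in enumerate(discriminative_lrs(2e-5, 4)):
    print(group, f"{lr:.2e}")
```

In PyTorch these values would typically be assigned through one optimizer parameter group per layer group.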


## Simple feedforward network using features

### Data


LSA:
```
              precision    recall  f1-score   support

     support    0.59677   0.22090   0.32244       335
  disclosive    0.60345   0.57122   0.58689       674

   micro avg    0.60236   0.45491   0.51835      1009
   macro avg    0.60011   0.39606   0.45467      1009
weighted avg    0.60123   0.45491   0.49909      1009
 samples avg    0.34176   0.31921   0.32426      1009
```
LIWC:  
```
              precision    recall  f1-score   support

     support    0.56757   0.43881   0.49495       335
  disclosive    0.64613   0.60682   0.62586       674

   micro avg    0.62332   0.55104   0.58496      1009
   macro avg    0.60685   0.52282   0.56041      1009
weighted avg    0.62005   0.55104   0.58240      1009
 samples avg    0.39425   0.38608   0.38362      1009
```

### Comments

The low support recall is present here as well. This is better than the classical classifiers but worse than the LSTM-based one.

## UDA

### Data
#### With finetuned language model
log:  
lr:1e-5  
```

              precision    recall  f1-score   support

     support    0.68038   0.64179   0.66052       335
  disclosive    0.66922   0.77745   0.71929       674

   micro avg    0.67243   0.73241   0.70114      1009
   macro avg    0.67480   0.70962   0.68990      1009
weighted avg    0.67293   0.73241   0.69978      1009
 samples avg    0.50778   0.50855   0.49974      1009

epoch	train_loss	valid_loss	accuracy	accuracy_thresh	multi_label_fbeta	cpu used	peak	gpu used	peak	time
0	0.619644	0.535592	0.393079	0.726672	0.679435	40	75	1132	13850	19:54
1	0.575193	0.508279	0.387636	0.750778	0.692861	20	53	0	13618	19:52
2	0.592565	0.506151	0.386081	0.752722	0.663848	41	75	518	13720	13:48
3	0.575860	0.504516	0.381415	0.749222	0.698739	20	50	0	13662	13:49
4	0.590860	0.501973	0.384526	0.755054	0.701139	20	54	0	13744	13:51
```

exp:  
lr 1e-5
```
              precision    recall  f1-score   support

     support    0.68182   0.62687   0.65319       335
  disclosive    0.65380   0.77893   0.71090       674

   micro avg    0.66157   0.72844   0.69340      1009
   macro avg    0.66781   0.70290   0.68204      1009
weighted avg    0.66310   0.72844   0.69174      1009
 samples avg    0.51089   0.50700   0.50026      1009


 epoch	train_loss	valid_loss	accuracy	accuracy_thresh	multi_label_fbeta	cpu used	peak	gpu used	peak	time
0	0.493801	0.680391	0.396190	0.691291	0.679580	31	62	1132	13784	13:49
1	0.574188	0.663418	0.403966	0.747667	0.673377	20	55	0	13836	13:56
2	0.574881	0.630371	0.393079	0.746112	0.687709	20	52	0	13812	13:53
3	0.593605	0.560605	0.392302	0.749611	0.691571	20	50	0	13818	13:44
4	0.558335	0.521513	0.387636	0.747278	0.693396	20	54	0	13776	13:41
```

```
log: 
lr 1e-5
epoch	train_loss	valid_loss	accuracy	accuracy_thresh	multi_label_fbeta	cpu used	peak	gpu used	peak	time
0	0.648394	0.543909	0.388414	0.737947	0.682375	20	53	1132	13834	13:53
1	0.595730	0.522705	0.381415	0.732115	0.685532	20	50	0	13782	13:52
2	0.596397	0.517787	0.386858	0.751555	0.668396	20	54	0	13724	13:51
3	0.551665	0.526049	0.393857	0.729393	0.683060	20	52	0	13844	13:56
4	0.531672	0.529591	0.391524	0.735225	0.685160	20	56	0	13840	14:00

              precision    recall  f1-score   support

     support    0.75120   0.46866   0.57721       335
  disclosive    0.68688   0.72255   0.70427       674

   micro avg    0.70153   0.63826   0.66840      1009
   macro avg    0.71904   0.59560   0.64074      1009
weighted avg    0.70824   0.63826   0.66208      1009
 samples avg    0.46229   0.44751   0.44816      1009
```

exp:  
1e-5  
```
              precision    recall  f1-score   support

     support    0.65823   0.62090   0.63902       335
  disclosive    0.66232   0.75371   0.70507       674

   micro avg    0.66113   0.70961   0.68451      1009
   macro avg    0.66027   0.68730   0.67204      1009
weighted avg    0.66096   0.70961   0.68314      1009
 samples avg    0.49806   0.49456   0.48808      1009

epoch	train_loss	valid_loss	accuracy	accuracy_thresh	multi_label_fbeta	cpu used	peak	gpu used	peak	time
0	0.488590	0.675085	0.386081	0.664852	0.645850	20	52	1132	13832	13:53
1	0.558965	0.663746	0.394635	0.741446	0.631579	20	55	0	13796	13:52
2	0.543570	0.636842	0.408631	0.740280	0.607981	20	53	0	13846	13:58
3	0.566829	0.565225	0.398523	0.747667	0.667009	20	51	0	13768	13:48
4	0.550432	0.529440	0.393079	0.743390	0.684512	20	55	0	13830	13:46
```

### Comments

Gives results comparable to the provided pretrained models, but without any hyperparameter tuning at all. I feel that better results could be obtained with further adjustment of the parameters. The language model finetuned on the unlabeled text seems to give better results when paired with UDA.

## Feature Importance

Look in notebook for more data

### Comments

Did not manage to test much, but I noticed that adj was an influential category for the predictions made by the simple feedforward network.



# Discussion

Say issues and thought process  
Explain what was tried

## Implementation of things

I used fast.ai as it seemed easy to use, and it had a 1cycle training schedule that had results behind it.

The problem is that fast.ai is very annoying to customise and had some issues that required hacky workarounds.

I haven't looked too closely, but fastai2 seems to fix these issues.

### Data Cleaning

Data came with unicode artefacts, presumably from improper processing in R (the text contained things like `<U+0009>`, a format I could only find in R), so I had to regex those into the proper characters. I also had to remove markdown and HTML entities, as the comments were from Reddit.
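A minimal sketch of this kind of cleaning is shown below. The patterns are my own rough approximations (they only cover the `<U+XXXX>` escapes, HTML entities, markdown links and a couple of emphasis markers), not the exact regexes from the notebook.

```python
import html
import re

# R-style escapes such as <U+0009> or <U+0001F600>.
UNICODE_TAG = re.compile(r"<U\+([0-9A-Fa-f]{4,8})>")


def restore_unicode(text: str) -> str:
    """Replace <U+XXXX> escapes with the character they stand for."""
    return UNICODE_TAG.sub(lambda m: chr(int(m.group(1), 16)), text)


def clean_comment(text: str) -> str:
    text = restore_unicode(text)
    text = html.unescape(text)                            # &amp; -> &, &#39; -> '
    text = re.sub(r"\[([^\]]+)\]\([^)]+\)", r"\1", text)  # [label](url) -> label
    text = re.sub(r"\*+|~~", "", text)                    # crude emphasis removal
    return re.sub(r"\s+", " ", text).strip()              # collapse whitespace


print(clean_comment("I&#39;m **so** sorry<U+0009>to hear that, see [this](http://example.com)"))
```

Running the cleaning before sentence splitting keeps multi-sentence markdown spans intact.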

### Features

To generate the features, I lemmatized the text and then either put it through a tfidf vectorizer or computed the frequency of every LIWC category, which served as a vector. The LSA vectors were derived from the tfidf vectors.
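The LIWC-style vector can be sketched as below. The real LIWC dictionary is proprietary and much larger, so `TOY_LEXICON` and `CATEGORIES` are hypothetical stand-ins, and normalising the counts by the number of lemmas is an assumption about what "frequency" means here.

```python
from collections import Counter

# Hypothetical stand-in for the LIWC dictionary: lemma -> categories.
TOY_LEXICON = {
    "i": ["pronoun"],
    "you": ["pronoun"],
    "sorry": ["negemo"],
    "sad": ["negemo"],
    "hope": ["posemo"],
    "glad": ["posemo"],
}
CATEGORIES = ["pronoun", "posemo", "negemo"]  # fixed order of the vector


def category_frequencies(lemmas):
    """Fraction of lemmas falling into each category, in CATEGORIES order."""
    counts = Counter(cat for lemma in lemmas for cat in TOY_LEXICON.get(lemma, []))
    total = len(lemmas) or 1  # avoid dividing by zero on empty input
    return [counts[cat] / total for cat in CATEGORIES]


# Input is assumed to be lemmatized already.
print(category_frequencies(["i", "be", "sorry", "you", "feel", "sad"]))
```

Every sentence maps to a vector of the same length regardless of how many words it has, which is what lets these features go straight into the classical models and the simple feedforward network.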

### Classical Classifiers

Simple application of sklearn, nothing special was done here

### Fast.ai stuff

#### Quick rundown on fastai

Fastai modularises the whole pipeline into data processing, which creates a so-called DataBunch, and training and testing, which use a Learner. A Learner can take in pytorch modules. Callbacks let you customise things more easily by hooking in before and after basically every step of the training loop. Fastai also has [1cycle](https://arxiv.org/pdf/1803.09820.pdf) implemented, which seems to do the same or better in terms of metrics in a shorter amount of time.

#### Rough Explanation

The LSTM was built in, so the implementation just follows the tutorial and documentation.

For the transformers, I decided to use the [pytorch implementation by the huggingface team](https://huggingface.co/transformers/), since it was the most popular and hence best maintained port in pytorch. However, since our problem is a multilabel one, I had to use a [different library](https://github.com/kaushaltrivedi/fast-bert/blob/master/fast_bert/), built on top of the huggingface library, that implements a multilabel classification head.

There were also quite a few tutorials online that had integrated the two libraries which I followed.  
https://towardsdatascience.com/fastai-with-transformers-bert-roberta-xlnet-xlm-distilbert-4f41ee18ecb2  
https://github.com/maximilienroberti/fastai-transformers/blob/master/Fastai_%2B_Transformers_%2B_ULMFiT.ipynb  
The gist of it is to wrap the huggingface tokenizers in fastai versions so that they fit into the existing text preprocessing pipeline. Then we need to wrap the huggingface model, as it gives a tuple as output instead of just the predicted logits. (In hindsight, we could get around this using a callback.)

To train the RoBERTa language model, I had to implement masked language modelling in fastai. This involved masking the input text before feeding it into the model, and storing which tokens were masked. This was done using a callback, and labelling the data with an empty label. To export this model, I used the huggingface export as that allows the model to be transferred without the head (aka classifier linear layer or language modelling head).

##### UDA
OK, this was a massive effort (and there were no tutorials :/)

Every batch fed into the model had to consist of both labeled and unlabeled data. This was a problem, as the labeled data were labeled with 'one hot encoded' vectors (in quotes because the problem is multilabel, so not one-hot per se), while the unlabeled data was 'labeled' with text (in quotes because it is not exactly a label, but rather the original unaugmented text, which is passed into the model to get a reference prediction to compare against). I got around this with a custom batch sampler, which produces the indices of the data that will be in a batch. It contains two batch samplers, one for the labeled data and one for the unlabeled augmented data, and zips the two to concatenate their batches. The data is arranged so that the first range of indices is occupied by the labeled data, the next range by the unlabeled augmented data, and the last range by the original unlabeled data. Each inner sampler samples from its own range, and the custom sampler then appends the results and fixes the offsets so that the indices are correct for the whole dataset. Another set of indices, belonging to the original unlabeled data and offset appropriately, is appended after the labeled + unlabeled augmented indices. Note that this makes the batches come out at a seemingly larger batch size.
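The index arithmetic can be sketched in plain Python as follows. It assumes the dataset layout described above (labeled, then augmented unlabeled, then original unlabeled, with the two unlabeled ranges aligned one-to-one). The function name and batch sizes are hypothetical, and the actual implementation was a fastai/pytorch batch sampler rather than a generator like this.

```python
import random


def uda_batches(n_labeled, n_unlabeled, labeled_bs, unlabeled_bs, rng=None):
    """Yield index batches: labeled + augmented unlabeled + matching originals.

    Dataset layout assumed: [0, n_labeled) labeled,
    [n_labeled, n_labeled + n_unlabeled) augmented unlabeled,
    [n_labeled + n_unlabeled, n_labeled + 2 * n_unlabeled) original unlabeled.
    """
    rng = rng or random.Random()
    labeled = list(range(n_labeled))
    unlabeled = list(range(n_unlabeled))
    rng.shuffle(labeled)
    rng.shuffle(unlabeled)

    aug_offset = n_labeled
    orig_offset = n_labeled + n_unlabeled
    labeled_batches = [labeled[i:i + labeled_bs] for i in range(0, n_labeled, labeled_bs)]
    unlabeled_batches = [unlabeled[i:i + unlabeled_bs] for i in range(0, n_unlabeled, unlabeled_bs)]

    # zip stops at the shorter of the two, like zipping two batch samplers.
    for lab, unl in zip(labeled_batches, unlabeled_batches):
        augmented = [aug_offset + i for i in unl]
        originals = [orig_offset + i for i in unl]  # same items, unaugmented
        yield lab + augmented + originals


for batch in uda_batches(4, 6, labeled_bs=2, unlabeled_bs=3, rng=random.Random(0)):
    print(batch)
```

Each yielded batch holds `labeled_bs + 2 * unlabeled_bs` indices, which is the seemingly larger batch size mentioned above. The collate step can later split off the trailing originals by position.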


Then the collate function pads and collates the batches of text, and splits each batch into a labeled + unlabeled augmented batch and an unlabeled original batch, since what the model needs to backpropagate from are the labeled data and the unlabeled augmented set. The unlabeled original batch is returned as the target instead (in a tuple with the labeled target), to be further processed by the callback that governs the entire implementation.

The loss involves two components, the supervised loss from the labeled data and the unsupervised loss from the unlabeled data. This had to be implemented using a custom loss that converts the logits predicted by the model into the log probabilities that the loss takes as input.

The callback handled the replacement of the various parts of the pipeline, as well as shaping the data appropriately to be passed into the next part. The calculation of the original unlabeled probabilities as well as the TSA threshold mask takes place here.




## Quirks of Colab

The more you use their GPU, the more likely they are to time you out or give you worse GPUs  
Workaround: Make more google accounts 

Free google drive storage may not be able to save model checkpoints from every epoch  
Workaround: Download to hard drive first

Random disconnects when tabbing out
Workaround: 
```
function ConnectButton(){
    console.log("Connect pushed"); 
    document.querySelector("#connect").click() 
}
setInterval(ConnectButton,120000);
```
Run this JavaScript code in the browser console to auto-click the connect button. Seems to work most of the time.

Somehow I feel that running two runtimes on different accounts for the same file makes them interfere with each other.

# Future work

Try to give stuff other than the classic "more data/compute"  
But still more compute - 1 GPU with 15 GB of usable RAM is not cutting it, especially when the model itself is 1 GB. I have to use smaller batch sizes, which affects training time and may even affect the accuracy, especially in the case of UDA, where the ratio of unlabeled to labeled data per batch is important, as it determines whether the model is able to look at every datapoint we have.

Cleaner data - Instead of splitting into sentences before cleaning, I should clean first to avoid the markdown getting broken up by the split  

MixMatch - another semi-supervised learning algorithm. Use MixUp on the numericalised text or embeddings maybe

Many hyperparameters to tune: I didn't really vary the weight decay much, and there are many other combinations of parameters to tune  
e.g. UDA schedule vs weight decay vs learning rate, looking at which language model would give the best result, or testing UDA without language model pretraining

Combined model using both text and features - May give better results. Can implement in fast.ai using link in notebook, but need to mess around with custom everything for it to work. 

Could have predicted each label individually - Supportiveness and disclosure are not exactly linked

Use the full RoBERTa model - I had to use a distilled version as it had much lower memory requirements (in exchange for losing a bit of accuracy)

More epochs for certain models - Some models may be underfitting, since the validation loss was still decreasing.