# Style Transfer on Text

This is an attempt to apply the model released by Hu et al. to style attributes other than positive and negative sentiment.

Hu, Zhiting, et al. "Toward controlled generation of text." Proceedings of the 34th International Conference on Machine Learning, Volume 70. JMLR.org, 2017. Paper: https://arxiv.org/pdf/1703.00955.pdf. Code: https://github.com/asyml/texar/tree/master/examples/text_style_transfer

The released model differs from the one proposed in the paper. As far as we could reconstruct from their code and explanations, the model actually used is as follows:

![title](images/Pretrainingjupy.png)

For the pretraining:
The autoencoder is trained on two losses: the reconstruction loss and a loss on the predicted label of the reconstruction.

The classifier is also trained during this stage.

![title](images/Trainingjupy.png)

For the training:
The classifier parameters are fixed
The autoencoder is trained similarly, except that instead of adding a transform of the original label to the latent code (the code uses a 200-dimensional MLP transform), what is actually added is a transform of the desired label.

(For example, if 1 == positive and 0 == negative, a sentence with an attached label of 0 will have a transform of 1 concatenated to its latent code.)
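The label handling described above can be illustrated with a minimal sketch. It assumes binary labels, a toy affine map in place of the MLP transform, and tiny dimensions. The names `flip_label`, `label_code`, and `decoder_input` are hypothetical and do not come from the Texar code.

```python
def flip_label(label):
    """Return the desired (opposite) label for a binary style label."""
    return 1 - label


def label_code(label, weights, biases):
    """Toy stand-in for the label transform: one affine map per output unit.

    The released code is described above as using a 200-dimensional MLP
    transform; a 3-unit affine map is used here only to keep the sketch small.
    """
    return [w * label + b for w, b in zip(weights, biases)]


def decoder_input(latent, label, weights, biases, stage):
    """Concatenate a label code to the latent code.

    stage == "pretrain": use the sentence's own label.
    stage == "train":    use the desired (flipped) label instead.
    """
    if stage not in ("pretrain", "train"):
        raise ValueError("stage must be 'pretrain' or 'train'")
    used = label if stage == "pretrain" else flip_label(label)
    return latent + label_code(used, weights, biases)


# Hypothetical values, chosen only for illustration.
latent = [0.1, -0.4, 0.7, 0.2]
weights = [1.0, 2.0, 3.0]
biases = [0.5, 0.5, 0.5]

print(decoder_input(latent, 0, weights, biases, "pretrain"))
print(decoder_input(latent, 0, weights, biases, "train"))
```

With a label of 0, the pretraining call appends the code for 0, while the training call appends the code for 1, which is the only difference between the two stages in this sketch.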

## Methodology and running the code

We used the code of the simplified model provided by Hu et al.

We tried extending the maximum sequence (sentence) length, but our machine could not handle it, so to be safe we used the same length they did (21).

We also removed words longer than 15 characters from the vocabulary, as we found that some of them were links and similar noise. Furthermore, many of these words were probably outliers, that is, they were used only once. We also removed the sentences containing them.
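The filtering rules above can be sketched roughly as follows. This is not the preprocessing code from the linked repository: it assumes whitespace tokenization and assumes that over-length sentences are dropped rather than truncated, neither of which is stated here. The names `keep_sentence` and `filter_corpus` are hypothetical.

```python
MAX_SENTENCE_LEN = 21  # maximum number of tokens per sentence (from the text)
MAX_WORD_LEN = 15      # words longer than this are treated as noise (from the text)


def keep_sentence(sentence, max_tokens=MAX_SENTENCE_LEN, max_chars=MAX_WORD_LEN):
    """Return True if the sentence passes both length filters.

    Assumption: tokens are obtained by splitting on whitespace, and the
    token limit does not account for any begin/end-of-sentence markers.
    """
    tokens = sentence.split()
    if not tokens or len(tokens) > max_tokens:
        return False
    # Drop the whole sentence if any single word is too long (e.g. a URL).
    return all(len(token) <= max_chars for token in tokens)


def filter_corpus(sentences):
    """Keep only the sentences that pass keep_sentence, preserving order."""
    return [s for s in sentences if keep_sentence(s)]


# Made-up sample sentences, for illustration only.
sample = [
    "the food was great",
    "see https://example.com/averyveryverylongurl now",
    " ".join(["w"] * 22),
]
print(filter_corpus(sample))
```

Dropping the whole sentence, rather than only the long word, avoids leaving gaps in the remaining sentences, at the cost of a smaller dataset.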

Preprocessing and construction of the dataset were handled by Aerjay, and the code is available on his GitHub: https://github.com/aerjayc/formality/tree/authors

## To run the code, go to the /text-style-transfer folder and run python main.py --config config

/texar-master/texar-master/examples/text-style-transfer

We have set it up so that the dataset loaded is the authors dataset (i.e. we replaced the contents of their data files but did not rename them), and training runs for roughly as long as it did for Hu et al. However, we did not adjust the hyperparameters, as we lacked the time to experiment with them. Each training run takes around 10 hours on our machine (GTX 1060).

# Results - Formality Transfer

Full training samples are available in the samples_formality folder.

val1-val220 are reconstruction attempts (pretraining).

val221-val300 are style transfer attempts (formal to informal and vice versa).

The sentences from the dataset could not even be reconstructed. We think this is due to the extreme variety among the sentences. The vocabulary was also relatively large compared to the dataset.

![title](images/ptval219.jpg)

The end result is therefore barely understandable.

![title](images/ptval300.jpg)

# Results - Author Style Transfer


Hence we decided to work on author style transfer. First, the vocabulary would be much smaller due to the lack of informal or highly varied words. Second, the dataset could easily be acquired from PDFs of books. Third, we could think of two authors off the top of our heads with differing styles.

We used Mark Twain's The Adventures of Tom Sawyer and Jane Austen's Pride and Prejudice.

The text we used is in the /text-style-transfer folder, labelled as MarkTwain and Jane Austen.

Full training samples are available in the samples_author folder.

### As can be seen, the reconstruction is much better

![title](images/tval219.jpg)

### The model is also able to somewhat perform a style transfer.

For example, it learns to replace character names with those used by the other author (tom -> jane f.). It also learns that some authors are much more likely to use certain words (e.g. waylaid, used by Mark Twain, was replaced by simpers for Jane Austen).

![title](images/tval300.jpg)

## Analysis/Discussion

Unfortunately, we ran into errors running Hu et al.'s code.

First, we had to remove the part that printed the BLEU metric discussed in the paper. Instead, we performed only a visual, cursory analysis.

Second, we could not run the code in a Jupyter notebook, so we instead include two training logs at the end of this notebook. Unfortunately, we did not manage to save the training log of the formality run that was able to complete.

The following files can be found in the GitHub repository:

TrainLog1 : the training log of the uncompleted formality run (the training log of the completed run was not saved)

TrainLog2 : the training log of the completed author-style transfer run

#### Noteworthy points.
The sentiment transfer results by Hu et al. can seemingly be reduced to mere vocabulary switches, e.g. changing delicious to horrible.

### Below are some samples from rerunning their experiment on our machine. The same can be seen in the samples in their GitHub repository.

![title](images/huetalsamples.jpg)
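One way to make the "vocabulary switch" observation slightly less anecdotal is to align each original sentence with its transferred version and list the word-level replacements. The sketch below uses Python's standard `difflib.SequenceMatcher` on whitespace tokens. The helper names and the example sentence pair are hypothetical, and this is not the BLEU evaluation from the paper.

```python
from difflib import SequenceMatcher


def word_substitutions(original, transferred):
    """Return (old, new) pairs for spans of words that were replaced."""
    a, b = original.split(), transferred.split()
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace":
            pairs.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return pairs


def unchanged_ratio(original, transferred):
    """Fraction of words left untouched, relative to the longer sentence."""
    a, b = original.split(), transferred.split()
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / max(len(a), len(b), 1)


# Hypothetical sentence pair, made up to mirror the delicious/horrible case.
source = "the food was delicious"
output = "the food was horrible"
print(word_substitutions(source, output))
print(unchanged_ratio(source, output))
```

A high unchanged ratio combined with a short list of substitutions would be consistent with the transfer being mostly word replacement, although it says nothing about whether the output is fluent.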

For formality transfer, which involves much more than simply replacing a few words here and there (as most words do not necessarily have an informal counterpart), the model by Hu et al. seems insufficient as of now.

It should be noted, however, that we were not able to train even the autoencoder part to convergence on our dataset.

One thing worth noting here:

### We do not seem to have considered how hard it is to define formality/informality, or how the dataset we used defined it.

With that in mind, we created our own dataset based on a much better-defined style attribute (Mark Twain vs. Jane Austen) with a much more standardized vocabulary.

### With this, using a much smaller dataset than the one Hu et al. used in their experiment, we were able to achieve visually similar performance.

## Final Conclusions

The formality-informality transfer failed because it is much harder to perform via a word replacement strategy.

In other words, style transfer that goes beyond a straight one-to-one word replacement failed.

However, we did gain a better idea of what Hu et al.'s SIMPLIFIED model is capable of. Roughly, it seems to be able to identify keywords that express one sentiment and replace them with keywords that express another.

We demonstrated this by performing an author-style transfer of sorts.

Note that we did not run the exact model they proposed in the paper. What we used was the simplified/adapted model they released.
