# Translation and sentiment analysis with ML

An important challenge in computational linguistics is accurate translation of a sentence from one spoken or written language to another.

Translation is a very hard problem, made harder by the fact that there are thousands of languages, each of which can have very different grammar rules. One approach is to convert the formal grammar rules of one language, such as English, into a language-independent structure, and then translate by converting that structure back into another language. This approach means taking the following steps:

1. Identification. Identify, or tag, the words in the input language as nouns, verbs, and so on.
2. Create translation. Produce a direct translation of each word in the target-language format.
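The two steps can be sketched in a few lines of Python. This is a minimal illustration under loud assumptions: the tag lexicon `EN_TAGS` and the word dictionary `EN_TO_GA` are hypothetical, hand-written toy tables (a real system would use a trained part-of-speech tagger and a full bilingual dictionary), and the target language is Irish to match the example sentence in the next section.

```python
from typing import List, Tuple

# Hypothetical toy lexicon for step 1: English word -> part-of-speech tag.
# A real system would use a trained tagger instead of a lookup table.
EN_TAGS = {"i": "PRON", "feel": "VERB", "happy": "ADJ"}

# Hypothetical toy dictionary for step 2: English word -> Irish word.
EN_TO_GA = {"i": "mise", "feel": "bhraitheann", "happy": "áthas"}


def identify(sentence: str) -> List[Tuple[str, str]]:
    """Step 1: tag each word; unknown words get the tag 'UNK'."""
    return [(word, EN_TAGS.get(word, "UNK")) for word in sentence.lower().split()]


def create_translation(tagged: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Step 2: swap each word for its Irish equivalent, keeping its tag and
    its position; unknown words are passed through unchanged."""
    return [(EN_TO_GA.get(word, word), tag) for word, tag in tagged]


tagged = identify("I feel happy")
print(tagged)                      # [('i', 'PRON'), ('feel', 'VERB'), ('happy', 'ADJ')]
print(create_translation(tagged))  # [('mise', 'PRON'), ('bhraitheann', 'VERB'), ('áthas', 'ADJ')]
```

Because step 2 keeps the English word order, joining the words gives "mise bhraitheann áthas", a word-for-word rendering that is not valid Irish.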

## Example sentence: English to Irish

In English, the sentence "I feel happy" is three words in the order:

- subject (I)
- verb (feel)
- adjective (happy)

However, in Irish, the same sentence has a very different grammatical structure: emotions like "happy" or "sad" are expressed as being upon you.

The English phrase "I feel happy" would be "Tá áthas orm" in Irish. A literal translation would be "Happy is upon me".

An Irish speaker translating to English would say "I feel happy", not "Happy is upon me", because they understand the meaning of the sentence, even though the words and sentence structure are different.

The formal order for the sentence in Irish is:

- verb (Tá, or is)
- adjective (áthas, or happy)
- subject (orm, or upon me)
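A structure-aware translator applies a reordering rule instead of keeping the English order. The sketch below hard-codes one hypothetical rule for the "feel + emotion" construction: "feel" becomes "tá" (is), the emotion word moves to second position, and the subject becomes a prepositional pronoun ("orm", upon me). The mapping tables are illustrative assumptions that cover only these few words; they are not a real grammar of Irish.

```python
from typing import Dict

# Hypothetical, hand-written mappings for this one construction only.
# In Irish an emotion "is upon" a person, so "feel" becomes "tá" (is)
# and the subject pronoun becomes a prepositional pronoun.
VERB_GA: Dict[str, str] = {"feel": "tá"}
EMOTION_GA: Dict[str, str] = {"happy": "áthas", "sad": "brón"}
UPON_PRONOUN_GA: Dict[str, str] = {"i": "orm"}  # "orm" = upon me


def english_to_irish(subject: str, verb: str, adjective: str) -> str:
    """Translate an English subject-verb-adjective sentence such as
    'I feel happy' into Irish verb-adjective-subject order."""
    ga_verb = VERB_GA[verb.lower()]
    ga_emotion = EMOTION_GA[adjective.lower()]
    ga_subject = UPON_PRONOUN_GA[subject.lower()]
    # Irish order: verb, then the emotion, then who it is upon.
    sentence = " ".join([ga_verb, ga_emotion, ga_subject])
    # Capitalize the first letter and add a full stop.
    return sentence[0].upper() + sentence[1:] + "."


print(english_to_irish("I", "feel", "happy"))  # Tá áthas orm.
print(english_to_irish("I", "feel", "sad"))    # Tá brón orm.
```

The rule works on roles (verb, emotion, subject) rather than on word positions, which is why it produces a valid sentence where word-for-word substitution does not.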

## Translation

A naive translation program might translate words only, ignoring the sentence structure.

Naive translation leads to bad (and sometimes hilarious) mistranslations: "I feel happy" translates literally to "Mise bhraitheann áthas" in Irish. That means, literally, "me feel happy" and is not a valid Irish sentence. Even though English and Irish are spoken on two closely neighboring islands, they are very different languages with different grammar structures.

## Machine learning approaches

So far, you've learned about the formal rules approach to natural language processing. Another approach is to ignore the meaning of the words and instead use machine learning to detect patterns. This can work for translation if you have a large body of text (a corpus), or several (corpora), in both the source and target languages.

For instance, consider the case of Pride and Prejudice, a well-known English novel written by Jane Austen in 1813. If you consult the book in English and a human translation of the book in French, you could detect phrases in one that are idiomatically translated into the other. You'll do that in a minute.

Similarly, when an English phrase such as "I have no money" is translated literally into French, it might become "Je n'ai pas de monnaie". "Monnaie" is a tricky French false cognate, as "money" and "monnaie" are not synonyms. A better translation, the kind a human would make, is "Je n'ai pas d'argent", because it conveys that you have no money at all, rather than no loose change, which is what "monnaie" means.

If an ML model has enough human translations to build a model on, it can improve the accuracy of translations by identifying common patterns in texts that have been previously translated by expert human speakers of both languages.
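The passage does not name a specific technique. As one simple, classical illustration of finding patterns in human translations (a swapped-in method, not how modern neural translation systems work), the sketch below scores English-French word pairs with the Dice coefficient: how often two words appear in the same aligned sentence pair, relative to how often each appears at all. The four-pair corpus `PARALLEL_CORPUS` is a hypothetical toy example.

```python
import re
from collections import Counter
from itertools import product
from typing import List, Set, Tuple

# Hypothetical toy parallel corpus: English sentences paired with human
# French translations. Real systems learn from millions of such pairs.
PARALLEL_CORPUS: List[Tuple[str, str]] = [
    ("I have no money", "Je n'ai pas d'argent"),
    ("She has money", "Elle a de l'argent"),
    ("I have a book", "J'ai un livre"),
    ("She has no book", "Elle n'a pas de livre"),
]


def tokens(sentence: str) -> Set[str]:
    """Lowercased word set; apostrophes split elisions (d'argent -> d, argent)."""
    return set(re.findall(r"\w+", sentence.lower()))


# Count how many sentence pairs contain each word and each English-French pair.
en_count: Counter = Counter()
fr_count: Counter = Counter()
pair_count: Counter = Counter()
for en_sentence, fr_sentence in PARALLEL_CORPUS:
    en_words, fr_words = tokens(en_sentence), tokens(fr_sentence)
    en_count.update(en_words)
    fr_count.update(fr_words)
    pair_count.update(product(en_words, fr_words))


def dice(en_word: str, fr_word: str) -> float:
    """Dice coefficient: 2 * co-occurrences / (count of en_word + count of fr_word)."""
    return 2 * pair_count[(en_word, fr_word)] / (en_count[en_word] + fr_count[fr_word])


def best_matches(en_word: str, k: int = 1) -> List[Tuple[str, float]]:
    """Return the k French words most strongly associated with en_word
    (ties broken alphabetically)."""
    scored = [(fr, dice(en_word, fr)) for fr in fr_count]
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:k]


print(best_matches("money"))    # [('argent', 1.0)]
print(best_matches("book"))     # [('livre', 1.0)]
print(best_matches("no", k=2))  # [('n', 1.0), ('pas', 1.0)]
```

Because the human translations say "argent" and never "monnaie", the scores link "money" to "argent". The tie for "no" between "n" (the elided "ne") and "pas" reflects French two-part negation: one English word can correspond to more than one French word.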

## Exercise - translation

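You can use TextBlob to translate sentences. Note that `translate()` sends the text to the Google Translate web service and has been deprecated and later removed in newer TextBlob releases, so the cells below need an older TextBlob version. Try the famous first line of Pride and Prejudice: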

```python
from textblob import TextBlob

# The famous first line of Pride and Prejudice
text = "It is a truth universally acknowledged, that a single man in possession of a good fortune, must be in want of a wife!"
blob = TextBlob(text)
# Translate from English ('en') to French ('fr')
print(blob.translate(from_lang='en', to='fr'))
```

```
C'est une vérité universellement reconnue, qu'un seul homme en possession d'une bonne fortune doit être dans le manque d'une femme!
```

```python
text = 'I am so happy I have finally began NLP'
blob = TextBlob(text)
# Translate from English ('en') to Hausa ('ha')
print(blob.translate(from_lang='en', to='ha'))
```

```
Ina matukar farin ciki da na fara nlp
```


That worked with Hausa too. Nice!