# Translating with NLTK

NLTK (Natural Language Toolkit) is a suite of libraries and programs for natural language processing. It provides tools for tokenization, part-of-speech tagging, stemming, chunking, parsing, and semantic reasoning, among other tasks. It also includes a large collection of corpora and lexicons.
For this task, the relevant part is the nltk.corpus package.


First, we need to install NLTK and download its datasets.

In [8]:
pip install nltk

Collecting nltk
  Using cached nltk-3.8.1-py3-none-any.whl (1.5 MB)
Installing collected packages: nltk
Successfully installed nltk-3.8.1
Note: you may need to restart the kernel to use updated packages.



In [10]:
import nltk

# Opens the NLTK downloader window, where you can select the data to download and watch the progress.
nltk.download()


showing info https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/index.xml


True

# Translating a file with NLTK

This code imports the NLTK library and the comtrans corpus reader from nltk.corpus, and loads the aligned English-French corpus. It then builds a translation dictionary by pairing the words (English) and mots (French) of each aligned sentence. Next, it opens an English file, tokenizes it, and replaces each token that appears in the dictionary. Finally, it writes the translated document to a new file, german_file.txt (the content is French, since the corpus is English-French).

In [136]:
import nltk
from nltk.corpus import comtrans

# load the English-French corpus
bilingual_corpus = comtrans.aligned_sents('alignment-en-fr.txt')

# create a translation dictionary
trans_dict = dict()
for sent in bilingual_corpus:
    # words are paired by position; later sentences overwrite earlier entries
    for english, french in zip(sent.words, sent.mots):
        # add the English-French translation to the dictionary
        trans_dict[english] = french

# open and read the English file
with open("ice_man.txt", encoding='utf-8') as f:
    english_file = f.read()

# tokenize the English file
english_tokens = nltk.word_tokenize(english_file)

# translate token by token; tokens missing from the dictionary are kept unchanged
translated_tokens = [trans_dict.get(token, token) for token in english_tokens]
translated_text = " ".join(translated_tokens)

# write the translated file (French content; the file name is kept because a later cell reads it)
with open("german_file.txt", "w", encoding='utf-8') as f:
    f.write(translated_text)



In [None]:
# With this corpus, we can only translate from English to French.
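
The dictionary above pairs words purely by position with `zip(sent.words, sent.mots)`, so word-order differences are ignored and each later sentence overwrites earlier entries. A minimal sketch of one alternative is to count word pairs along explicit alignment links and keep the most frequent target word. The helper `build_trans_dict` and the toy data are hypothetical; the triples stand in for the `words`, `mots`, and `alignment` attributes of the comtrans sentences, under the assumption that `alignment` yields (source index, target index) pairs.

```python
from collections import Counter, defaultdict

def build_trans_dict(aligned_sents):
    """Map each source word to its most frequently aligned target word.

    aligned_sents: iterable of (words, mots, alignment) triples, where
    alignment is an iterable of (source_index, target_index) pairs.
    """
    counts = defaultdict(Counter)
    for words, mots, alignment in aligned_sents:
        for i, j in alignment:
            counts[words[i]][mots[j]] += 1
    # keep the most common target word for every source word
    return {src: tgt.most_common(1)[0][0] for src, tgt in counts.items()}

# toy data (hypothetical), mimicking two aligned English-French sentences
toy_corpus = [
    (["the", "black", "cat"], ["le", "chat", "noir"], [(0, 0), (1, 2), (2, 1)]),
    (["the", "cat", "sleeps"], ["le", "chat", "dort"], [(0, 0), (1, 1), (2, 2)]),
]
trans_dict = build_trans_dict(toy_corpus)
print(trans_dict["cat"])    # chat
print(trans_dict["black"])  # noir
```

With the real corpus, one would presumably pass `(s.words, s.mots, s.alignment)` for each sentence `s` in `bilingual_corpus`. This remains a word-for-word lookup, so grammar and context are still lost.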

# Python "translate"


Another option is the Python "translate" library. For this, we need to install and import translate.

In [67]:
%pip install translate
import translate

Collecting translate
  Downloading translate-3.6.1-py2.py3-none-any.whl (12 kB)
Collecting libretranslatepy==2.1.1
  Downloading libretranslatepy-2.1.1-py3-none-any.whl (3.2 kB)
Installing collected packages: libretranslatepy, translate
Successfully installed libretranslatepy-2.1.1 translate-3.6.1
Note: you may need to restart the kernel to use updated packages.



In [142]:
# again, we need to read the file
with open('ice_man.txt', 'r', encoding='utf-8') as f:
    text = f.read()

# translate from English to German
translator = translate.Translator(from_lang="english", to_lang="german")
translation = translator.translate(text)

# create a new file called "output.txt" containing the German translation
with open('output.txt', 'w', encoding='utf-8') as f:
    f.write(translation)

RuntimeError: generator raised StopIteration

In [None]:
# With this option, we can only translate texts shorter than 500 characters!
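
One possible way around the length limit is to split the text into pieces that each stay under it and translate the pieces one by one. The sketch below shows only the splitting step; the helper `split_into_chunks` is hypothetical and not part of the translate library, and it breaks at whitespace, so line breaks in the original text are not preserved.

```python
def split_into_chunks(text, max_chars=500):
    """Split text into pieces of at most max_chars, breaking at whitespace."""
    chunks, current = [], ""
    for word in text.split():
        # a single word longer than the limit is cut into hard slices
        while len(word) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(word[:max_chars])
            word = word[max_chars:]
        candidate = word if not current else current + " " + word
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks

sample = "word " * 300  # 1500 characters of sample text
chunks = split_into_chunks(sample, max_chars=500)
print(len(chunks), max(len(c) for c in chunks))  # 3 499
```

Each chunk could then be passed to `translator.translate(...)` and the results joined with spaces. Whether that avoids the error above has not been verified here.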

# Googletrans

We can also use googletrans to translate in Python (max. 15,000 characters per text).

In [None]:
%pip install googletrans==3.1.0a0
from googletrans import Translator
translator = Translator()

In [133]:
from googletrans import Translator

# read the English file
with open('ice_man.txt', 'r', encoding='utf-8') as f:
    text = f.read()

# translate to German
translator = Translator()
translation = translator.translate(text, dest='de')

print(translation.text)
# write the German translation to a new file
with open('google.txt', 'w', encoding='utf-8') as f:
    f.write(translation.text)

AttributeError: 'NoneType' object has no attribute 'group'

In [143]:
# I get an AttributeError: 'NoneType' object has no attribute 'group'


# Calculating the sentiment with TextBlob

In [137]:
# Import the TextBlob class from the textblob library to perform sentiment analysis.
from textblob import TextBlob

# Read the English file into 'text_en'.
with open("ice_man.txt", encoding='utf-8') as f:
    text_en = f.read()
blob_en = TextBlob(text_en)  # create a TextBlob object from text_en

print("Sentiment of English File:")  # print the sentiment of the file
print(blob_en.sentiment)

# Read the translated file (written by the NLTK cell above) into 'text_de'.
# Note: TextBlob's default sentiment analyzer targets English text,
# so the score for a non-English file should be read with caution.
with open("german_file.txt", encoding='utf-8') as f:
    text_de = f.read()
blob_de = TextBlob(text_de)  # create a TextBlob object from text_de

print("Sentiment of German File:")
print(blob_de.sentiment)
# Check whether the two sentiment results are exactly equal.
if blob_en.sentiment != blob_de.sentiment:
    print("The sentiment of the English and German files differ.")
else:
    print("The sentiment of the English and German files are the same.")

Sentiment of English File:
Sentiment(polarity=0.08925143936379887, subjectivity=0.4623254814266053)
Sentiment of German File:
Sentiment(polarity=0.06477272727272726, subjectivity=0.5340909090909091)
The sentiment of the English and German files differ.
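
Because polarity and subjectivity are floating-point values, an exact `!=` comparison reports a difference for almost any two texts. A minimal sketch of a tolerance-based comparison follows; the `Sentiment` namedtuple is a local stand-in for TextBlob's result, and both the helper `sentiments_close` and the tolerance of 0.05 are arbitrary assumptions.

```python
import math
from collections import namedtuple

# local stand-in for TextBlob's sentiment result (hypothetical type)
Sentiment = namedtuple("Sentiment", ["polarity", "subjectivity"])

def sentiments_close(a, b, tol=0.05):
    """Return True if polarity and subjectivity both differ by at most tol."""
    return (math.isclose(a.polarity, b.polarity, abs_tol=tol, rel_tol=0.0)
            and math.isclose(a.subjectivity, b.subjectivity, abs_tol=tol, rel_tol=0.0))

# values copied from the output above
sent_en = Sentiment(0.08925143936379887, 0.4623254814266053)
sent_de = Sentiment(0.06477272727272726, 0.5340909090909091)

print(sentiments_close(sent_en, sent_de))           # False: subjectivity differs by about 0.07
print(sentiments_close(sent_en, sent_de, tol=0.1))  # True
```

Which tolerance counts as "the same sentiment" is a judgment call and depends on the texts being compared.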
