# __IndoMTL EmoBank__
The [EmoBank dataset](https://github.com/JULIELab/EmoBank/blob/master/corpus/emobank.csv), machine-translated into Bahasa Indonesia with the Microsoft Translator API.

## Set-up Environment

In [None]:
# uuid and json are part of the Python standard library; only requests needs installing
!pip install requests


In [None]:
import os
import numpy as np
import pandas as pd

## Load Dataset

In [None]:
print("Original EmoBank dataset...")
path = "https://raw.githubusercontent.com/JULIELab/EmoBank/master/corpus/emobank.csv"
data = pd.read_csv(path)
data.head()

Original EmoBank dataset...


Unnamed: 0,id,split,V,A,D,text
0,110CYL068_1036_1079,train,3.0,3.0,3.2,"Remember what she said in my last letter? """
1,110CYL068_1079_1110,test,2.8,3.1,2.8,If I wasn't working here.
2,110CYL068_1127_1130,train,3.0,3.0,3.0,".."""
3,110CYL068_1137_1188,train,3.44,3.0,3.22,Goodwill helps people get off of public assist...
4,110CYL068_1189_1328,train,3.55,3.27,3.46,Sherry learned through our Future Works class ...


In [None]:
print("Extracting text column...")
# Keep only the text column as a one-column DataFrame
emobank_text = data[["text"]]
emobank_text.head()

Extracting text column...


Unnamed: 0,text
0,"Remember what she said in my last letter? """
1,If I wasn't working here.
2,".."""
3,Goodwill helps people get off of public assist...
4,Sherry learned through our Future Works class ...


### Convert the DataFrame column into a list
Each element is a plain string in the form `'insert text'`.

In [None]:
emobank_list = emobank_text['text'].values.tolist()
emobank_list

['Remember what she said in my last letter? "',
 "If I wasn't working here.",
 '.."',
 'Goodwill helps people get off of public assistance.',
 'Sherry learned through our Future Works class that she could rise out of the mire of the welfare system and support her family.',
 'Coming to Goodwill was the first step toward my becoming totally independent.',
 'I am now... totally off of welfare."',
 'Goodwill prepares people for life-long employment.',
 "Here's another story of success from what might seem like an unlikely source: Goodwill's controller, Juli.",
 'Cornell found a number of employment options that he never dreamed existed after a work-site injury forced him out of his job at a foundry.',
 'He trained in desktop publishing and combined his enthusiastic work ethic with new-found skills in a burgeoning industry. "',
 'Dear ,',
 'I\'ve got more than a job; I\'ve got a career."',
 'After a lifetime of trials, Donna not only earned her GED at Goodwill, she earned a job here. "',
 '

## MTL EmoBank

### Set-up variables
Adapted from the [Translator quickstart](https://learn.microsoft.com/en-us/azure/cognitive-services/translator/quickstart-text-rest-api?tabs=python).

In [None]:
import requests, uuid, json

# Add your key and endpoint
key = "<your-translator-key>"
endpoint = "https://api.cognitive.microsofttranslator.com"

# location, also known as region.
# required if you're using a multi-service or regional (not global) resource. It can be found in the Azure portal on the Keys and Endpoint page.
location = "<YOUR-RESOURCE-LOCATION>"

path = '/translate'
constructed_url = endpoint + path

params = {
    'api-version': '3.0',
    'from': 'en',
    'to': 'id'
}

headers = {
    'Ocp-Apim-Subscription-Key': key,
    # location required if you're using a multi-service or regional (not global) resource.
    'Ocp-Apim-Subscription-Region': location,
    'Content-type': 'application/json',
    'X-ClientTraceId': str(uuid.uuid4())
}
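
Hard-coding the key in a notebook makes it easy to leak when the notebook is shared. A minimal sketch of an alternative, assuming the credentials are stored in environment variables; the variable names `TRANSLATOR_KEY` and `TRANSLATOR_REGION` and the helper `build_headers` are hypothetical and not part of the quickstart:

```python
import os
import uuid


def build_headers(env=os.environ):
    """Build Translator request headers from environment variables.

    TRANSLATOR_KEY and TRANSLATOR_REGION are assumed names chosen for
    this sketch; use whatever names your own setup defines.
    """
    key = env.get("TRANSLATOR_KEY")
    if not key:
        raise RuntimeError("TRANSLATOR_KEY is not set")

    headers = {
        "Ocp-Apim-Subscription-Key": key,
        "Content-type": "application/json",
        # A fresh trace ID per call makes individual requests easier to trace.
        "X-ClientTraceId": str(uuid.uuid4()),
    }
    # The region header is only needed for multi-service or regional resources.
    region = env.get("TRANSLATOR_REGION")
    if region:
        headers["Ocp-Apim-Subscription-Region"] = region
    return headers
```

Calling `build_headers()` with no argument reads from the process environment; passing a plain dict is convenient for trying it out without real credentials.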

### Define translation function

In [None]:
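def translate(dataset_list, tr_dataset_list):
  """Translate each text in dataset_list and append the result to tr_dataset_list."""
  for text in dataset_list:
    # The body may hold more than one object; one text is sent per request here.
    body = [{
        'text': text
    }]
    response = requests.post(constructed_url, params=params, headers=headers, json=body)
    # Fail early on HTTP errors (e.g. invalid key, rate limit) instead of a KeyError below
    response.raise_for_status()
    result = response.json()

    # print(json.dumps(result, sort_keys=True, ensure_ascii=False, indent=4, separators=(',', ': ')))

    # Get translation
    translation = result[0]["translations"][0]["text"]

    # Append as a one-element row so the list maps directly onto a one-column DataFrame
    tr_dataset_list.append([translation])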
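
Indexing `[0]["translations"][0]["text"]` assumes the response body is a successful result. A minimal sketch of a helper that checks the payload shape first; `extract_translation` is a hypothetical name, and the error payload shape `{"error": {"code": ..., "message": ...}}` is an assumption based on the Translator v3 documentation:

```python
def extract_translation(payload):
    """Return the first translation text from a parsed Translator response.

    Raises ValueError when the payload is an error object or has an
    unexpected shape.
    """
    # Assumed error shape: {"error": {"code": ..., "message": ...}}
    if isinstance(payload, dict) and "error" in payload:
        error = payload["error"]
        raise ValueError(f"Translator error {error.get('code')}: {error.get('message')}")

    try:
        return payload[0]["translations"][0]["text"]
    except (KeyError, IndexError, TypeError) as exc:
        raise ValueError(f"Unexpected response shape: {payload!r}") from exc


# Example with a hand-written payload (no network call is made).
ok = [{"translations": [{"text": "Saya ingin berada di sana.", "to": "id"}]}]
print(extract_translation(ok))
```

Inside the loop, the helper would be called on the parsed JSON in place of the direct indexing.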

#### Test with a single input

In [None]:
# https://techcommunity.microsoft.com/t5/educator-developer-blog/translate-your-notes-with-azure-translator-and-python/ba-p/3267201

def translate_text(text):
  """Translate a single string and return the translation."""
  # Named differently from translate() above so this cell does not overwrite it.
  body = [{
      'text': text
  }]
  response = requests.post(constructed_url, params=params, headers=headers, json=body)
  result = response.json()

  # Get translation
  translation = result[0]["translations"][0]["text"]
  # Return the translation
  return translation

In [None]:
translate_text("Ibrahim's harsh punishment sends a chilling message to Egyptians yearning for a more accountable and tolerant society.")

'Hukuman keras Ibrahim mengirimkan pesan mengerikan kepada rakyat Mesir yang merindukan masyarakat yang lebih bertanggung jawab dan toleran.'

### Iterate the function over emobank_list

[Service Limits](https://learn.microsoft.com/en-us/azure/cognitive-services/Translator/service-limits#character-and-array-limits-per-request)

__Operation: Translate__
* Maximum size of an array element: 50,000 characters
* Maximum number of array elements: 1,000
* Maximum request size: 50,000 characters

__Throughput (S1 tier):__ 40 million characters per hour

__Latency__

_"Typically, responses for text within 100 characters are returned in 150 milliseconds to 300 milliseconds."_

Actual latency also depends on the client code and the network connection.
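
Because one request may carry up to 1,000 array elements and 50,000 characters, the texts can be grouped into batches instead of being sent one at a time. A minimal sketch of the grouping step only, using the limits listed above; `make_batches` is a hypothetical helper, and counting characters with `len` is an assumption about how the service counts them:

```python
def make_batches(texts, max_items=1000, max_chars=50000):
    """Group texts into request bodies that stay within the given limits.

    Each batch is a list of {"text": ...} objects, the same body format
    used for single-text requests.
    """
    batches, current, current_chars = [], [], 0
    for text in texts:
        if len(text) > max_chars:
            raise ValueError("A single text exceeds the per-request character limit")
        # Start a new batch when adding this text would break either limit.
        if current and (len(current) >= max_items or current_chars + len(text) > max_chars):
            batches.append(current)
            current, current_chars = [], 0
        current.append({"text": text})
        current_chars += len(text)
    if current:
        batches.append(current)
    return batches


# Tiny limits are used here only to make the grouping visible.
demo = make_batches(["aaaa", "bb", "cccc", "d"], max_items=2, max_chars=6)
print([[item["text"] for item in batch] for batch in demo])
# prints [['aaaa', 'bb'], ['cccc', 'd']]
```

Each batch could then be posted as `json=batch`; the response is expected to contain one entry per input element, in the same order.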


In [None]:
indoMTL_emobank_list = []

print("Translating EmoBank text into Bahasa Indonesia...")
translate(emobank_list, indoMTL_emobank_list)

print("MTL finished.")

Translating EmoBank text into Bahasa Indonesia...
MTL finished.


#### Test for list iteration

In [None]:
test_list = ['I wanted to be there.', '..I had my second chance to change my life."', 'Your gift to Goodwill will help the many people who want to tell their own stories of success.', 'Your support will help them go to work.']

In [None]:
# Reset the output list in the same cell so re-running it does not append duplicates
tr_test_list = []
translate(test_list, tr_test_list)
tr_test_list

[['Saya ingin berada di sana.'],
 ['.. Saya memiliki kesempatan kedua untuk mengubah hidup saya."'],
 ['Hadiah Anda untuk Goodwill akan membantu banyak orang yang ingin menceritakan kisah sukses mereka sendiri.'],
 ['Dukungan Anda akan membantu mereka pergi bekerja.']]

## Save IndoMTL_EmoBank

In [None]:
indoMTL_emobank_text = pd.DataFrame(indoMTL_emobank_list, columns=["teks"])
indoMTL_emobank_text.head()

Unnamed: 0,teks
0,Ingat apa yang dia katakan dalam surat terakhi...
1,Jika saya tidak bekerja di sini.
2,".."""
3,Goodwill membantu orang keluar dari bantuan pu...
4,Sherry belajar melalui kelas Future Works kami...


In [None]:
path = "<path.csv>"

indoMTL_emobank_text.to_csv(path, index=False)

new_data = pd.read_csv(path)
new_data.head()

Unnamed: 0,teks
0,Ingat apa yang dia katakan dalam surat terakhi...
1,Jika saya tidak bekerja di sini.
2,".."""
3,Goodwill membantu orang keluar dari bantuan pu...
4,Sherry belajar melalui kelas Future Works kami...


In [None]:
indoMTL_emobank = data.iloc[:,:5]
indoMTL_emobank.head()

Unnamed: 0,id,split,V,A,D
0,110CYL068_1036_1079,train,3.0,3.0,3.2
1,110CYL068_1079_1110,test,2.8,3.1,2.8
2,110CYL068_1127_1130,train,3.0,3.0,3.0
3,110CYL068_1137_1188,train,3.44,3.0,3.22
4,110CYL068_1189_1328,train,3.55,3.27,3.46


In [None]:
full_indoMTL_emobank = pd.concat([indoMTL_emobank, indoMTL_emobank_text], axis=1)
full_indoMTL_emobank.head()

Unnamed: 0,id,split,V,A,D,teks
0,110CYL068_1036_1079,train,3.0,3.0,3.2,Ingat apa yang dia katakan dalam surat terakhi...
1,110CYL068_1079_1110,test,2.8,3.1,2.8,Jika saya tidak bekerja di sini.
2,110CYL068_1127_1130,train,3.0,3.0,3.0,".."""
3,110CYL068_1137_1188,train,3.44,3.0,3.22,Goodwill membantu orang keluar dari bantuan pu...
4,110CYL068_1189_1328,train,3.55,3.27,3.46,Sherry belajar melalui kelas Future Works kami...
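
`pd.concat(..., axis=1)` aligns rows by index, so a translation list that is shorter than the dataset (for example after an interrupted run) would silently produce `NaN` rows. A minimal sketch of a guard, using toy frames in place of the real data; `concat_checked` is a hypothetical helper:

```python
import pandas as pd


def concat_checked(meta, translated):
    """Concatenate two frames column-wise after checking they line up."""
    if len(meta) != len(translated):
        raise ValueError(f"Row count mismatch: {len(meta)} vs {len(translated)}")
    # Reset both indexes so rows are paired by position, not by label.
    return pd.concat(
        [meta.reset_index(drop=True), translated.reset_index(drop=True)], axis=1
    )


# Toy stand-ins for indoMTL_emobank and indoMTL_emobank_text.
meta = pd.DataFrame({"id": ["a", "b"], "V": [3.0, 2.8]})
teks = pd.DataFrame({"teks": ["satu", "dua"]})
print(concat_checked(meta, teks).columns.tolist())
# prints ['id', 'V', 'teks']
```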


In [None]:
save_path = "<save_path.csv>"

full_indoMTL_emobank.to_csv(save_path, index=False)

final_data = pd.read_csv(save_path)
final_data

Unnamed: 0,id,split,V,A,D,teks
0,110CYL068_1036_1079,train,3.00,3.00,3.20,Ingat apa yang dia katakan dalam surat terakhi...
1,110CYL068_1079_1110,test,2.80,3.10,2.80,Jika saya tidak bekerja di sini.
2,110CYL068_1127_1130,train,3.00,3.00,3.00,".."""
3,110CYL068_1137_1188,train,3.44,3.00,3.22,Goodwill membantu orang keluar dari bantuan pu...
4,110CYL068_1189_1328,train,3.55,3.27,3.46,Sherry belajar melalui kelas Future Works kami...
...,...,...,...,...,...,...
10057,wwf12_4531_4624,train,3.00,3.50,3.00,Tolong biarkan itu menjadi pengingat konstan d...
10058,wwf12_501_591,train,3.80,3.40,3.60,Itu sebabnya saya ingin menyampaikan pengharga...
10059,wwf12_592_691,train,3.00,3.00,3.10,Dan mengapa saya menulis kepada Anda hari ini ...
10060,wwf12_702_921,train,3.33,3.44,3.44,"Bahkan, saya ingin mendorong Anda untuk memper..."
