An app that takes English text as input and returns its Japanese translation, a romanized (romaji) version of that translation, and a sentiment analysis of the translated text.

In [249]:
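# Dependencies

# pip install -U pip setuptools wheel
# pip install -U spacy
# python -m spacy download en_core_web_sm
# python -m spacy download ja_core_news_sm

# pip install spacytextblob
# python -m textblob.download_corpora

# pip install translate

# pip install --user asari

# pip install unidic
# python -m unidic download
# pip install mecab-python3
# pip install cutlet

# pip install sacrebleu
# pip install "sacrebleu[ja]"   (optional: MeCab-based tokenizer for Japanese BLEU)

# pip install --user ipywidgets
# pip install --user gradio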
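Because the install lines above are commented out, it is easy to miss one. A minimal sketch, using only the standard library's `importlib.util.find_spec`, that reports which of the notebook's imports are not yet available. The module list is an assumption based on the imports in the cells below; note that import names can differ from pip package names (for example `mecab-python3` is imported as `MeCab`).

```python
import importlib.util

# Top-level import names used in this notebook (assumed from the cells below)
REQUIRED_MODULES = ["spacy", "spacytextblob", "translate", "cutlet", "sacrebleu", "asari", "gradio"]


def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    print("Missing:", ", ".join(missing) if missing else "none")
```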


In [85]:
# Load tokenizing and analysis dependencies:
# English and Japanese spaCy pipelines

import spacy
from spacytextblob.spacytextblob import SpacyTextBlob  # registers the "spacytextblob" pipeline component

en_parse = spacy.load("en_core_web_sm")
jp_parse = spacy.load("ja_core_news_sm")

# Translation dependencies
from translate import Translator

In [86]:
# Set up pipeline for English sentiment analysis
en_parse.add_pipe("spacytextblob")

<spacytextblob.spacytextblob.SpacyTextBlob at 0x1dc38565760>

In [139]:
# Check the polarity of the original English text.
# TextBlob polarity ranges from -1.0 (most negative) to 1.0 (most positive).
import pprint as pp
text = "You used to like working with us mercs a whole bunch. Mercenaries are efficient instruments of war. Just like fast food, ready to eat and easily disposable. You want to know what happened in the past? Hmm, some people would prefer I keep my mouth shut about that topic. Say, it must be such a blessing to not remember anything, right?"
sent = en_parse(text)
sent._.blob.polarity

0.15272108843537419
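TextBlob polarity is a continuous score between -1.0 and 1.0, whereas asari (used further down for the Japanese text) returns a positive/negative class. To put the two side by side, one option is a coarse mapping like the sketch below. The `neutral_band` cutoff of 0.05 is a hypothetical value chosen for illustration, not a TextBlob or spacytextblob setting.

```python
def polarity_label(polarity, neutral_band=0.05):
    """Map a TextBlob polarity score in [-1.0, 1.0] to a coarse label.

    neutral_band is a hypothetical cutoff for illustration only.
    """
    if not -1.0 <= polarity <= 1.0:
        raise ValueError("polarity must be in [-1.0, 1.0]")
    if polarity > neutral_band:
        return "positive"
    if polarity < -neutral_band:
        return "negative"
    return "neutral"


print(polarity_label(0.15272108843537419))  # positive
```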

In [166]:
# Print each sentence spaCy detected in the English text
for sentence in sent.sents:
    print(sentence.text)

You used to like working with us mercs a whole bunch.
Mercenaries are efficient instruments of war.
Just like fast food, ready to eat and easily disposable.
You want to know what happened in the past?
Hmm, some people would prefer I keep my mouth shut about that topic.
Say, it must be such a blessing to not remember anything, right?


In [171]:
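# Translation pipeline: translate the English text one sentence at a time
translator = Translator(to_lang="ja")

# List to store the translated sentences
tl = []

for sentence in sent.sents:
    translation = translator.translate(sentence.text)
    tl.append(translation)

# Recombine the translated sentences into a single string.
# Japanese does not put spaces between sentences, so join with "".
tl_string = "".join(tl)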
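The loop above translates one spaCy sentence at a time and glues the results back together with an empty string, since Japanese does not separate sentences with spaces. A sketch of the same step as a small function that takes the translation callable as a parameter, so it can be exercised offline with a stand-in. The dictionary-based stub is hypothetical and only mimics the shape of `translator.translate` (string in, string out).

```python
from typing import Callable, Iterable


def translate_by_sentence(
    sentences: Iterable[str],
    translate_fn: Callable[[str], str],
    joiner: str = "",
) -> str:
    """Translate each sentence separately and join the results.

    joiner="" suits Japanese output, which has no spaces between sentences.
    """
    return joiner.join(translate_fn(sentence) for sentence in sentences)


# Hypothetical lookup table standing in for translator.translate so the sketch runs offline
fake_translations = {"Hello.": "こんにちは。", "Thank you.": "ありがとう。"}
print(translate_by_sentence(["Hello.", "Thank you."], fake_translations.__getitem__))
```

In the notebook the equivalent call would be `translate_by_sentence((s.text for s in sent.sents), translator.translate)`.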


Romanization: https://github.com/polm/cutlet

In [172]:
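# Romanize the Japanese kana/kanji
import cutlet

romanizer = cutlet.Cutlet()
# Transcribe katakana loanwords phonetically ("fasuto fuudo") instead of using their original spelling ("fast food")
romanizer.use_foreign_spelling = False

print(tl_string)
romanizer.romaji(tl_string)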



傭兵たちと一緒に働くのが好きだったな。傭兵は効率的な戦争の道具です。ファストフードと同じように、すぐに食べられ、使い捨ても簡単です。過去に何があったのか知りたいですか？うーん、その話題については口を閉ざしておいたほうがいいと思う人もいます。ねえ、何も覚えていないのは恵みですよね？


'Youheitachi to issho ni hataraku no ga suki datta na. youhei wa kouritsuteki na sensou no dougu desu. fasuto fuudo to onaji you ni, sugu ni taberare, tsukaisute mo kantan desu. kako ni nan ga atta no ka shiritai desu ka? uun, sono wadai ni tsuite wa kuchi wo tozashite oita hou ga ii to omou hito mo imasu. nee, nan mo oboete inai no wa megumi desu yo ne?'

In [189]:
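# Split the translated text into sentences with the Japanese pipeline, store them in `test`
# (the BLEU hypotheses below), and print each sentence with its romanization.

token_jp = jp_parse(tl_string)

test = []

for sentence in token_jp.sents:
    test.append(sentence.text)
    print(sentence.text)
    print(romanizer.romaji(sentence.text))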

傭兵たちと一緒に働くのが好きだったな。
Youheitachi to issho ni hataraku no ga suki datta na.
傭兵は効率的な戦争の道具です。
Youhei wa kouritsuteki na sensou no dougu desu.
ファストフードと同じように、すぐに食べられ、使い捨ても簡単です。
Fasuto fuudo to onaji you ni, sugu ni taberare, tsukaisute mo kantan desu.
過去に何があったのか知りたいですか？うーん、その話題については口を閉ざしておいたほうがいいと思う人もいます。
Kako ni nan ga atta no ka shiritai desu ka? uun, sono wadai ni tsuite wa kuchi wo tozashite oita hou ga ii to omou hito mo imasu.
ねえ、何も覚えていないのは恵みですよね？
Nee, nan mo oboete inai no wa megumi desu yo ne?
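In the output above, `ja_core_news_sm` kept 過去に何があったのか知りたいですか？ and the sentence beginning with うーん together, so the translation yields five segments instead of six. If sentence-level alignment matters (for example for the BLEU comparison below), a simple punctuation-based fallback is sketched here. It assumes sentences end in 。, ！ or ？ (or their ASCII counterparts) and ignores harder cases such as a closing quote 」 after the punctuation; it also relies on Python 3.7+, where `re.split` accepts zero-width patterns. The same splitter would need to be applied to both the translation and the reference so that sentence pairs stay aligned.

```python
import re

# Zero-width split point right after full-width or ASCII sentence-final punctuation,
# so the punctuation stays attached to its sentence.
SENTENCE_END = re.compile(r"(?<=[。！？!?])")


def split_japanese_sentences(text):
    """Split Japanese text on 。！？!? (a simple heuristic, not a full segmenter)."""
    return [s for s in SENTENCE_END.split(text) if s.strip()]


translated = "過去に何があったのか知りたいですか？うーん、その話題については口を閉ざしておいたほうがいいと思う人もいます。"
for sentence in split_japanese_sentences(translated):
    print(sentence)
```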


BLEU scoring: https://github.com/mjpost/sacrebleu

In [256]:
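# BLEU scoring against the official, localized Japanese translation.
# Caveat: sacreBLEU's default "13a" tokenizer relies on spaces to separate words. Japanese is
# written without spaces, so each sentence below counts as a single token and the score says
# little about translation quality. Japanese and Korean need sacreBLEU's optional MeCab-based
# tokenizers, BLEU(tokenize="ja-mecab") / BLEU(tokenize="ko-mecab"), installed with
# pip install "sacrebleu[ja]" / pip install "sacrebleu[ko]".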
from sacrebleu.metrics import BLEU

reference_sentence = "あたしたちは昔のあんたといいお付き合いをさせてもらってたわ。だって傭兵は効率の良い戦争道具だもの。ファーストフードみたいに食べたい時に食べて、要らなくなったら捨てるだけ。過去に何があったかって？あら、あたしが喋りすぎるのを嫌う人もいるのよ。でも何も覚えてないのは幸せなこと、そうでしょ？"

ref = []

token_eval = jp_parse(reference_sentence)
for sentence in token_eval.sents:
    ref.append(sentence.text)
    print(sentence.text)
    print(romanizer.romaji(sentence.text))

bleu = BLEU()

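# corpus_score expects a list of reference streams, each parallel to the hypotheses,
# so the single reference translation is wrapped in a list. Both sides have 5 sentences here.
bleu.corpus_score(test, [ref])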


あたしたちは昔のあんたといいお付き合いをさせてもらってたわ。
Atashitachi wa mukashi no anta to ii otsukiai wo sasete moratteta wa.
だって傭兵は効率の良い戦争道具だもの。
Da tte youhei wa kouritsu no yoi sensou dougu da mono.
ファーストフードみたいに食べたい時に食べて、要らなくなったら捨てるだけ。
Faasuto fuudo mitai ni tabetai toki ni tabete, iranaku nattara suteru dake.
過去に何があったかって？あら、あたしが喋りすぎるのを嫌う人もいるのよ。
Kako ni nan ga atta ka tte? ara, atashi ga shaberi sugiru no wo kirau hito mo iru no yo.
でも何も覚えてないのは幸せなこと、そうでしょ？
De mo nan mo oboetenai no wa shiawase na koto, sou desho?


BLEU = 0.00 0.0/0.0/0.0/0.0 (BP = 1.000 ratio = 1.000 hyp_len = 5 ref_len = 5)
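The 0.00 above follows from tokenization rather than from translation quality: with whitespace-based tokenization each unsegmented Japanese sentence is one token, so the 5 hypothesis tokens (hyp_len = 5) can only match if an entire sentence is reproduced verbatim. A simplified, unsmoothed sketch of corpus BLEU over characters shows what scoring looks like once Japanese text is split into smaller units. It is an illustration only; for reportable numbers, sacreBLEU's own `BLEU(tokenize="char")` or `BLEU(tokenize="ja-mecab")` would be the tools to use.

```python
import math
from collections import Counter


def char_ngrams(text, n):
    """Counter of character n-grams, ignoring whitespace."""
    chars = [c for c in text if not c.isspace()]
    return Counter(tuple(chars[i:i + n]) for i in range(len(chars) - n + 1))


def corpus_char_bleu(hypotheses, references, max_n=4):
    """Unsmoothed corpus BLEU over characters, one reference per hypothesis, on a 0-100 scale."""
    if len(hypotheses) != len(references):
        raise ValueError("need exactly one reference per hypothesis")
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += sum(1 for c in hyp if not c.isspace())
        ref_len += sum(1 for c in ref if not c.isspace())
        for n in range(1, max_n + 1):
            hyp_counts = char_ngrams(hyp, n)
            ref_counts = char_ngrams(ref, n)
            # Clipped matches: a hypothesis n-gram counts at most as often as it occurs in the reference
            matches[n - 1] += sum(min(count, ref_counts[g]) for g, count in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())
    # Without smoothing, any n-gram order with zero matches makes the geometric mean zero
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)


example_hyp = ["傭兵は効率的な戦争の道具です。"]
example_ref = ["傭兵は効率の良い戦争道具だもの。"]
print(len(example_hyp[0].split()))  # 1 whitespace token for the whole sentence
print(round(corpus_char_bleu(example_hyp, example_ref), 2))
```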

Japanese sentiment analysis: https://github.com/Hironsan/asari

In [248]:
# Setting up Japanese Sentiment Analysis
from asari.api import Sonar
sonar = Sonar()
jp_sent = sonar.ping(text=tl_string)
pp.pprint(jp_sent)

{'classes': [{'class_name': 'negative', 'confidence': 0.05036956071853638},
             {'class_name': 'positive', 'confidence': 0.9496304392814636}],
 'text': '傭兵たちと一緒に働くのが好きだったな。傭兵は効率的な戦争の道具です。ファストフードと同じように、すぐに食べられ、使い捨ても簡単です。過去に何があったのか知りたいですか？うーん、その話題については口を閉ざしておいたほうがいいと思う人もいます。ねえ、何も覚えていないのは恵みですよね？',
 'top_class': 'positive'}


In [177]:
# Extract the top class and its confidence from the sentiment analysis result.
rating = next(c["confidence"] for c in jp_sent["classes"] if c["class_name"] == jp_sent["top_class"])
print("{}, {}".format(jp_sent["top_class"], rating))


positive, 0.9496304392814636


In [253]:
# Full implementation of the above for the interface. Run the earlier cells first so that
# en_parse (spaCy), translator, romanizer (cutlet), and sonar (asari) are initialized.
def full_translate(user_text):
    # Return three empty outputs for blank input
    if not user_text or user_text.isspace():
        return "", "", ""

    # Split the English input into sentences and translate each one
    en_tokenized = en_parse(user_text)

    tl_tokenized = []

    for sentence in en_tokenized.sents:
        translation = translator.translate(sentence.text)
        tl_tokenized.append(translation)

    oneline = "".join(tl_tokenized)

    # Sentiment of the Japanese translation: top class and its confidence
    sentiment = sonar.ping(text=oneline)
    rating = next(c["confidence"] for c in sentiment["classes"] if c["class_name"] == sentiment["top_class"])
    sent_output = "{}, {}".format(sentiment["top_class"], rating)

    # Japanese text, its romanization, and the sentiment summary
    return oneline, romanizer.romaji(oneline), sent_output
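`full_translate` depends on four notebook globals (`en_parse`, `translator`, `sonar`, `romanizer`), which makes it hard to test without network access and downloaded models. One possible refactor, sketched below under that assumption, passes each component in as a callable. `translate_pipeline` and its parameter names are hypothetical, and the stubs only imitate the shapes seen above (asari's result is a dict with `classes` and `top_class`).

```python
from typing import Callable, Iterable, Tuple


def translate_pipeline(
    user_text: str,
    split_sentences: Callable[[str], Iterable[str]],
    translate: Callable[[str], str],
    romanize: Callable[[str], str],
    classify: Callable[[str], dict],
) -> Tuple[str, str, str]:
    """Same steps as full_translate, with every component injected (hypothetical refactor)."""
    if not user_text or user_text.isspace():
        return "", "", ""
    japanese = "".join(translate(s) for s in split_sentences(user_text))
    result = classify(japanese)
    top = result["top_class"]
    confidence = next(c["confidence"] for c in result["classes"] if c["class_name"] == top)
    return japanese, romanize(japanese), "{}, {}".format(top, confidence)


# Offline stubs standing in for spaCy, translate, cutlet and asari
stub_split = lambda text: ["Hi.", "Bye."]
stub_translate = {"Hi.": "やあ。", "Bye.": "じゃあ。"}.__getitem__
stub_romanize = lambda text: "yaa. jaa."
stub_classify = lambda text: {
    "classes": [
        {"class_name": "negative", "confidence": 0.25},
        {"class_name": "positive", "confidence": 0.75},
    ],
    "top_class": "positive",
}

print(translate_pipeline("Hi. Bye.", stub_split, stub_translate, stub_romanize, stub_classify))
```

Wired to the real components, the call would look like `translate_pipeline(text, lambda t: [s.text for s in en_parse(t).sents], translator.translate, romanizer.romaji, lambda t: sonar.ping(text=t))`, and a one-argument wrapper around it could be passed to `gr.Interface` as `fn`.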



UI: https://www.gradio.app/docs/interface

In [257]:
import gradio as gr

# One multi-line input box; three text outputs matching full_translate's return values
interface = gr.Interface(
    fn=full_translate,
    inputs=gr.Textbox(lines=5, placeholder="Text to translate"),
    outputs=["text", "text", "text"],
    theme="monochrome",
)

In [258]:
interface.launch()

Running on local URL:  http://127.0.0.1:7874

To create a public link, set `share=True` in `launch()`.


