# Version 1

## Gemini Pro API 
You can get an API key here:  
https://ai.google.dev/  
It is currently free.  

**Note**: Remember to fill in your API key in the `.env` file.

In [1]:
import os
from dotenv import load_dotenv

# Remember to fill in your API key in the .env file
load_dotenv()

from langchain_google_genai import ChatGoogleGenerativeAI
from google.generativeai.types.safety_types import HarmBlockThreshold, HarmCategory

safety_settings = {
    # Gemini blocks everything
    # HarmCategory.HARM_CATEGORY_UNSPECIFIED: HarmBlockThreshold.BLOCK_NONE, 
    HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT: HarmBlockThreshold.BLOCK_NONE, 
    HarmCategory.HARM_CATEGORY_HATE_SPEECH: HarmBlockThreshold.BLOCK_NONE, 
    HarmCategory.HARM_CATEGORY_HARASSMENT: HarmBlockThreshold.BLOCK_NONE, 
    HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT: HarmBlockThreshold.BLOCK_NONE,
}

geminipro = ChatGoogleGenerativeAI(
    model="gemini-pro",
    temperature=0.5,
    top_p=0.85,
    safety_settings=safety_settings,
    convert_system_message_to_human=True,  # SystemMessages are not yet supported!
)
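
A quick check after `load_dotenv()` can confirm that the key was actually picked up before any request is made. The sketch below is only an illustration: `missing_env_vars` is a hypothetical helper, and `GOOGLE_API_KEY` is assumed to be the variable name used in your `.env` file, so adjust it if yours differs.

```python
import os


def missing_env_vars(names, environ=None):
    """Return the names that are unset or blank in the given environment.

    `environ` defaults to os.environ; passing a plain dict makes testing easy.
    """
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name, "").strip()]


# Assumption: the Gemini key is stored under GOOGLE_API_KEY in .env.
missing = missing_env_vars(["GOOGLE_API_KEY"])
if missing:
    print("Missing in .env:", ", ".join(missing))
```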


## OpenAI API (Optional)
You can get an API key here:  
https://platform.openai.com/docs/overview  

**Note**: Remember to fill in your API key in the `.env` file.

In [2]:
!pip install -qU langchain-openai

In [3]:
from langchain_openai import OpenAI, ChatOpenAI
import os
from dotenv import load_dotenv

# Remember to fill in your API key in the .env file
load_dotenv()

openai = OpenAI(max_tokens=2048)

In [4]:
def split_paragraph(source_path):
    from langchain_text_splitters import RecursiveCharacterTextSplitter
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=1024,
        chunk_overlap=0,
        separators= ["\n\n\n\n","\n\n\n","\n\n"]
        # length_function=len,
        # is_separator_regex=True,
    )
    # Read as UTF-8 explicitly so the result does not depend on the platform default.
    with open(source_path, 'r', encoding='utf-8') as f:
        content = f.read()
        
    return text_splitter.split_text(content)

In [5]:
def translate_paragraph(llm, paragraphs, target_file=None):
    from langchain_core.prompts import ChatPromptTemplate, PromptTemplate
    from langchain_core.output_parsers.string import StrOutputParser
    prompt = PromptTemplate.from_template(
        """You are a professional translation assistant. Your task is to translate a paragraph from english to traditional Chinese.
      
You should first separate each paragraph into several complete sentences.
Each sentence should be prepended with a newline(\n).
Then you should translate each sentence into zh-tw followed by the original sentence, separated by a newline(\n).
Add one more newline(\n) at end before processing next sentence.
Make sure no original text or translations should be skipped.

Example:
---
input:
Before one girl and another even younger one stood a figure in full plate armor brandishing a sword.
The blade swung, sparkling in the sunlight as if to say that taking their lives in a single stroke would be an act of mercy.
---
output:
在一名少女以及比她更年輕的少女面前，站著一位身穿全身板甲、揮舞著劍的男子。
Before one girl and another even younger one stood a figure in full plate armor brandishing a sword.

刀鋒揮動，在陽光下閃爍，彷彿在說一刀奪命是仁慈的作為。
The blade swung, sparkling in the sunlight as if to say that taking their lives in a single stroke would be an act of mercy.


---

The original paragraph is as follows:

{input}

""")
    
    chain = prompt | llm | StrOutputParser()
    translated=[]
    from tqdm.notebook import tqdm, trange
    with tqdm(total=len(paragraphs)) as progress_bar:
        for i, paragraph in enumerate(paragraphs):
            if len(paragraph.strip()) > 0:
                temp = ""
                for chunk in chain.stream({"input": paragraph}):
                    print(chunk, end="", flush=True)
                    temp += chunk
                print("\n------\n")    
                translated.append(temp)
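                if target_file:
                    # Append as UTF-8 so the Chinese text is written correctly on any platform.
                    with open(target_file, 'a+', encoding='utf-8') as f:
                        f.write(temp)
                        f.write('\n')
            progress_bar.update(1)
    # Return the collected translations; the caller assigns the result.
    return translated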

In [6]:
source_filenames = ["alice_in_wonderland.txt"]
for i, source_file in enumerate(source_filenames):
    path=source_file
    paragraphs = split_paragraph(path)[1:5]
    translated_paragraphs = translate_paragraph(geminipro, paragraphs)

  0%|          | 0/4 [00:00<?, ?it/s]

章節一

                      掉進兔子洞
------

愛麗絲開始對坐在河岸上陪伴姊姊感到很疲倦，而且無所事事：她曾一兩次偷瞄姊姊正在讀的書，但書裡沒有插圖或對話，愛麗絲心想：「沒有插圖或對話的書有什麼用？」
Alice was beginning to get very tired of sitting by her sister
on the bank, and of having nothing to do:  once or twice she had
peeped into the book her sister was reading, but it had no
pictures or conversations in it, `and what is the use of a book,'
thought Alice `without pictures or conversation?'


所以她開始思考（盡她所能，因為炎熱的天氣讓她感到昏昏欲睡且遲鈍），編織雛菊花環的樂趣是否值得起身去摘雛菊的麻煩，突然間一隻粉紅色眼睛的白兔跑過她身邊。
So she was considering in her own mind (as well as she could,
for the hot day made her feel very sleepy and stupid), whether
the pleasure of making a daisy-chain would be worth the trouble
of getting up and picking the daisies, when suddenly a White
Rabbit with pink eyes ran close by her.
------

那沒有什麼特別值得注意的；愛麗絲
也不覺得兔子自言自語：「喔，親愛的！喔，親愛的！我遲到了！」有什麼特別奇怪的。（事後她回想起來，才想到她應該對此感到奇怪，但當時一切似乎都很自然）；
但當兔子真的從背心口袋裡拿出懷錶，
看了看，然後匆匆離開時，愛麗絲站了起來，因為她腦海中閃過一個念頭，她從未見過一隻兔子有背心口袋，或從裡面拿出懷錶，她好奇心大發，於是跑過田野追在兔子後面，很幸運地剛好及時看到兔子跳進樹籬

## V1 output:
The output is terrible!  
Much of the source text and many of the translations are lost.  
Layout errors also start to appear after only a few sentences.  
This may be because most bilingual articles are written in the layout `english - zh-tw`,  
whereas we ask the LLM to produce the layout `zh-tw - english`.
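
To make the lost-text problem measurable, one option is to check which source lines never show up in the model output. This is a rough sketch rather than part of the original pipeline: `find_missing_source` is a hypothetical helper, and it assumes the model is expected to echo each source line verbatim apart from whitespace.

```python
def normalize(text: str) -> str:
    """Collapse all whitespace so line wrapping does not affect matching."""
    return " ".join(text.split())


def find_missing_source(source: str, output: str) -> list[str]:
    """Return the source lines that do not appear verbatim in the output."""
    haystack = normalize(output)
    missing = []
    for line in source.splitlines():
        needle = normalize(line)
        # Skip blank lines; report any non-empty line the output dropped.
        if needle and needle not in haystack:
            missing.append(needle)
    return missing


source = "Alice was beginning to get very tired\nof sitting by her sister."
output = "愛麗絲開始覺得很疲倦\nAlice was beginning to get very tired\n"
print(find_missing_source(source, output))  # ['of sitting by her sister.']
```

A check like this only catches dropped English text; dropped translations would need a separate heuristic, such as counting lines that contain CJK characters.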
  
Many sentences are also split by stray line breaks, so I want to remove the redundant `\n`s.  
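
One way to remove the redundant line breaks is to unwrap each paragraph before splitting it. A minimal sketch, assuming paragraphs are separated by blank lines and that every single `\n` inside a paragraph is only a hard wrap; `unwrap_paragraphs` is a hypothetical helper, not part of the notebook above.

```python
import re


def unwrap_paragraphs(text: str) -> str:
    """Join hard-wrapped lines inside each paragraph into a single line.

    Paragraphs are assumed to be separated by one or more blank lines.
    """
    # Split on blank lines: a newline, optional whitespace, another newline.
    paragraphs = re.split(r"\n\s*\n", text.strip())
    # Collapse every run of whitespace, including '\n', into one space.
    unwrapped = [" ".join(paragraph.split()) for paragraph in paragraphs]
    return "\n\n".join(unwrapped)


sample = (
    "Alice was beginning to get very tired of sitting by her sister\n"
    "on the bank, and of having nothing to do.\n"
    "\n"
    "So she was considering in her own mind."
)
print(unwrap_paragraphs(sample))
```

Note that this also collapses runs of spaces (for example the indentation of the chapter title) and turns every paragraph gap into exactly one blank line, so the `"\n\n\n\n"` and `"\n\n\n"` separators in `split_paragraph` would no longer match anything.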

In [None]:
# CHAPTER I

#                       Down the Rabbit-Hole

# CHAPTER I
# 第一章

#                       Down the Rabbit-Hole
#                       愛麗絲漫遊仙境
# ------

# 愛麗絲開始覺得坐在河岸邊的姊姊旁很無聊，而且無所事事：她曾經一兩次偷看姊姊正在讀的書，但書中沒有圖片或對話，愛麗絲心想：「沒有圖片或對話的書有什麼用？」
# Alice was beginning to get very tired of sitting by her sister
# on the bank, and of having nothing to do:

# 她曾經一兩次偷看姊姊正在讀的書，但書中沒有圖片或對話，
# once or twice she had
# peeped into the book her sister was reading, but it had no
# pictures or conversations in it,

# 愛麗絲心想：「沒有圖片或對話的書有什麼用？」
# `and what is the use of a book,'
# thought Alice `without pictures or conversation?'
# ------

# 她於是開始思考（儘管她能思考的不多，因為炎熱的天氣讓她覺得非常想睡覺，而且腦袋昏沉），是否願意起身去摘雛菊來製作花環，還是覺得這樣做太麻煩，就在她猶豫不決時，突然一隻眼睛是粉紅色的白兔從她身旁跑過。
# So she was considering in her own mind (as well as she could,
# for the hot day made her feel very sleepy and stupid), whether
# the pleasure of making a daisy-chain would be worth the trouble
# of getting up and picking the daisies, when suddenly a White
# Rabbit with pink eyes ran close by her.
# ------

# 在當時，這並沒有什麼特別值得注意的地方；愛麗絲
# There was nothing so VERY remarkable in that;

# 也沒有覺得兔子自言自語說「喔，親愛的！喔，親愛的！
# nor did Alice
# 我遲到了！」有什麼特別奇怪的（事後她回想起這件事，
# think it so VERY much out of the way to hear the Rabbit say to
# 發現自己應該對這件事感到驚訝，但在當時這一切似乎
# itself,

# 都很自然）；
# `Oh dear!  Oh dear!  I shall be late!'  (when she thought
# 但當兔子真的從它的背心口袋中
# it over afterwards, it occurred to her that she ought to have
# TOOK A WATCH OUT OF ITS WAISTCOAT-
# wondered at this, but at the time it all seemed quite natural);

# 拿出懷錶來看，然後匆匆忙忙地走了，愛麗絲立刻站
# POCKET, and looked at it, and then hurried on, Alice started to
# 起身，因為她突然想到自己從未看過一隻兔子有背心口
# her feet, for it flashed across her mind that she had never
# 袋，或從裡面拿出懷錶，她懷著滿腔好奇心，跑過田野
# before seen a rabbit with either a waistcoat-pocket, or a watch to
# 跟在兔子後面，幸運的是，她及時看到兔子跳進了樹
# take out of it, and burning with curiosity, she ran across the
# 籬笆下的一個大兔子洞。
# field after it, and fortunately was just in time to see it pop
# down a large rabbit-hole under the hedge.
# ------

# 在另一刻，愛麗絲追著兔子掉下去，從未想過她要如何才能再出來。
# In another moment down went Alice after it, never once
# considering how in the world she was to get out again.
# ------

# 兔子洞像條隧道似的筆直地往前延伸，
# The rabbit-hole went straight on like a tunnel for some way,

# 然後突然向下傾斜，傾斜得如此突然，以致於愛麗絲來不及想辦法阻止自己，
# and then dipped suddenly down, so suddenly that Alice had not a
# moment to think about stopping herself before she found herself

# 就發現自己正掉進一口非常深的井裡。
# falling down a very deep well.
# ------

# 井要不是很深，就是她下落的速度很慢，因為她在下降的過程中，有充裕的時間四處張望，好奇接下來會發生什麼事。
# Either the well was very deep, or she fell very slowly, for she

# 她先試著往下看，想看清楚自己將會落到什麼地方，但因為太暗了什麼都看不見；然後她看看井壁，
# First, she tried to look

# 發現井壁上擺滿了櫥櫃和書架；她偶爾看到地圖和圖片掛在掛鉤上。
# and noticed that they were filled with cupboards and book-shelves;

# 她在經過一個書架時，從架子上取下一罐果醬；罐子上貼著「橘子果醬」的標籤，但她很失望地發現果醬罐是空的：她不願意把罐子摔在地上，以免砸到別人，於是在下落時設法把它放進其中一個櫥櫃裡。
# here and there she saw maps and pictures hung upon pegs. She

# took down a jar from one of the shelves as she passed; it was

# labelled `ORANGE MARMALADE', but to her great disappointment it

# was empty: she did not like to drop the jar for fear of killing

# somebody, so managed to put it into one of the cupboards as she

# fell past it.