# Claude Translator

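## Usage
1. Save the text to be translated in a txt file. Separate paragraphs with a blank line.
2. Fill in the following variables:
    - `INPUT_FILE`: path of the file to translate.
    - `OUTPUT_FILE`: path of the translated file, which is created once the program finishes.
    - `LANGUAGE`: the target language.
    - `MODEL`: the model to use. See [Anthropic's official documentation](https://docs.anthropic.com/claude/docs/models-overview#model-comparison).
3. Create a `.env` file in the same directory and set the following variable:
    - `CLAUDE_API_KEY`: your Anthropic API key.
4. Run the file.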

### Required variables

In [None]:
INPUT_FILE = (
    "C:\\PuSung\\University\\112_Senior\\112-2 Academic\\Thinking Philosophically about Love\\W5\\紳士抑或禽獸.txt"
)
OUTPUT_FILE = (
    "C:\\PuSung\\University\\112_Senior\\112-2 Academic\\Thinking Philosophically about Love\\W5\\紳士抑或禽獸_中文.txt"
)

LANGUAGE = "Traditional Chinese"
MODEL = "claude-3-haiku-20240307"
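
The output path can also be derived from the input path instead of being typed twice. A minimal sketch, assuming the `_中文` suffix used above is the desired naming; `derive_output_path` is a hypothetical helper and the example path is made up:

```python
from pathlib import PureWindowsPath


def derive_output_path(input_file: str, suffix: str = "_中文") -> str:
    """Return the input path with `suffix` inserted before the extension.

    Hypothetical helper for illustration; the default suffix mirrors the
    naming used for OUTPUT_FILE above.
    """
    path = PureWindowsPath(input_file)
    return str(path.with_name(f"{path.stem}{suffix}{path.suffix}"))


# A raw string avoids doubling every backslash in a Windows path.
example = derive_output_path(r"C:\docs\essay.txt")
print(example)
```

`PureWindowsPath` only manipulates the string, so the sketch behaves the same on any operating system.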

### Code

In [None]:
import os
import time

import anthropic
from dotenv import load_dotenv

In [None]:
def preprocess_article(file_path):
    """
    Read the article and split it into a list of cleaned paragraphs.
    
    Args:
        file_path(str): path to the article file
    Returns:
        list[str]: list of paragraphs, each paragraph is a string
    """
    
    # Read the file
    with open(file_path, "r", encoding="utf-8") as file:
        content = file.read()

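    # Split on blank lines, then clean each paragraph: join wrapped lines with
    # spaces and drop "- ", which rejoins words hyphenated across line breaks.
    # Note: this also removes every other "- " in the text, such as list bullets.
    paragraphs = content.split("\n\n")
    preprocessed_paragraphs = [p.strip().replace("\n", " ").replace("- ", "") for p in paragraphs if p.strip()]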

    return preprocessed_paragraphs
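
To see what the cleaning step does, the same logic can be applied to a string instead of a file. A minimal sketch; `split_paragraphs` is a hypothetical name and the sample text is made up:

```python
def split_paragraphs(content: str) -> list[str]:
    """Apply the same split-and-clean steps as preprocess_article to a string."""
    paragraphs = content.split("\n\n")
    return [p.strip().replace("\n", " ").replace("- ", "") for p in paragraphs if p.strip()]


# One paragraph with a hyphenated line break, extra blank lines, then a second paragraph.
sample = "First para-\ngraph, wrapped\nover lines.\n\n\n\nSecond paragraph."
print(split_paragraphs(sample))
# → ['First paragraph, wrapped over lines.', 'Second paragraph.']
```

Empty chunks produced by runs of blank lines are dropped, and the word split as `para-` / `graph` is rejoined.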

In [None]:
def get_paragraph_translated(paragraph, model=MODEL, max_token=3000):
    """
    Translate the given paragraph into LANGUAGE using the Anthropic API.

    Args:
        paragraph(str): the paragraph to be translated
        model(str): the ID of the model to use, defaults to MODEL
        max_token(int): the maximum number of tokens the model may generate
    Returns:
        str: the translated paragraph
    """

    # Load the API key
    load_dotenv()
    api_key = os.getenv("CLAUDE_API_KEY")

    # Build the translation prompt
    instruction = f"""
    Translate the following text into {LANGUAGE}.
    Leave the names in original language. 
    The translated text should be in {LANGUAGE}.
    The response should be the translation only. Do not include other text.
    You must follow the above instructions. Or the result will be poor and ruin my career.
    
    Here is the paragraph: {paragraph}"""

    # Translate the paragraph
    client = anthropic.Anthropic(api_key=api_key)
    message = client.messages.create(
        model=model,  # model ID
        max_tokens=max_token,
        messages=[{"role": "user", "content": instruction}],
        temperature=0.3,
    )

    return message.content[0].text
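
Because `max_tokens` caps the length of the reply, a long paragraph can come back cut off. A minimal sketch of detecting that, assuming the response exposes `stop_reason` and `content[0].text` as Messages API responses do; `extract_translation` is a hypothetical helper, and the stand-in object below replaces a real API response so the example runs offline:

```python
from types import SimpleNamespace


def extract_translation(message) -> str:
    """Return the translated text, or raise if the reply hit the token limit.

    Hypothetical helper: `message` is expected to look like the object
    returned by client.messages.create().
    """
    if message.stop_reason == "max_tokens":
        raise ValueError("Translation truncated; raise max_token or split the paragraph.")
    return message.content[0].text.strip()


# Stand-in for an API response, so the sketch needs no network call.
fake = SimpleNamespace(stop_reason="end_turn", content=[SimpleNamespace(text=" 你好 ")])
print(extract_translation(fake))
```

Raising instead of silently returning a partial translation makes a truncated paragraph visible while the loop is still running.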

In [None]:
def save_translated_text(translated_paragraphs, output_file):
    """
    Save the translated article into a file.
    
    Args:
        translated_paragraphs(list[str]): list of translated paragraphs
        output_file(str): path to the output file
    Returns:
        None
    """
    
    with open(output_file, "w", encoding="utf-8") as file:
        for paragraph in translated_paragraphs:
            file.write(paragraph + "\n\n")
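
The saved file uses the same blank-line convention as the input, so it can be fed back into the preprocessing step. A minimal round-trip sketch using a temporary directory; the file name and sample paragraphs are made up:

```python
import os
import tempfile


def save_translated_text(translated_paragraphs, output_file):
    """Write each paragraph followed by a blank line (same logic as above)."""
    with open(output_file, "w", encoding="utf-8") as file:
        for paragraph in translated_paragraphs:
            file.write(paragraph + "\n\n")


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo.txt")
    save_translated_text(["第一段", "第二段"], path)
    with open(path, encoding="utf-8") as file:
        saved = file.read()

# Splitting on blank lines recovers the original paragraphs.
recovered = [p for p in saved.split("\n\n") if p.strip()]
print(recovered)
```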

In [None]:
# Get the paragraphs
paragraphs = preprocess_article(INPUT_FILE)
par_cnt = len(paragraphs)

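# Translate the paragraphs one by one, retrying a paragraph when the call fails
MAX_RETRIES = 5  # assumed limit: stop instead of retrying forever on a persistent error
translated_par = []
par_index = 0
failures = 0
while par_index < par_cnt:
    try:
        translated_par.append(get_paragraph_translated(paragraphs[par_index]))
        par_index += 1
        failures = 0
        print(f"Paragraph {par_index}/{par_cnt}: {translated_par[-1]}")
    except Exception as e:
        print(e)
        failures += 1
        if failures >= MAX_RETRIES:
            raise  # e.g. an invalid API key will never succeed, so give up
        time.sleep(20)  # Usually a rate-limit error; wait 20 seconds and try again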

# Save the translated paragraphs
save_translated_text(translated_par, OUTPUT_FILE)
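
The retry loop above waits a fixed 20 seconds after each failure. One common alternative is exponential backoff, where the wait doubles after every failed attempt. A minimal sketch, not the notebook's method; `call_with_backoff`, its parameters, and the `flaky` demo function are all hypothetical:

```python
import time


def call_with_backoff(func, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Call func(), doubling the wait after each failure.

    Re-raises the last exception once max_retries attempts have failed.
    `sleep` is injectable so the demo below can run without waiting.
    """
    for attempt in range(max_retries):
        try:
            return func()
        except Exception:
            if attempt == max_retries - 1:
                raise
            sleep(base_delay * 2 ** attempt)


# Demo: a function that fails twice before succeeding.
calls = {"count": 0}


def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("simulated rate limit")
    return "ok"


delays = []
result = call_with_backoff(flaky, sleep=delays.append)
print(result, delays)
# → ok [1.0, 2.0]
```

In the notebook, `func` could be something like `lambda: get_paragraph_translated(paragraph)`, with `base_delay` chosen to suit the account's rate limit.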