# Translation of Chinese Data Features

We currently use Ollama to serve a local LLM instance as the translation backend.

The model is Qwen2.5-7B, an open-source model developed by Alibaba Cloud that can translate Chinese into English.

In [21]:
import pandas as pd
from tqdm import tqdm
from openai import OpenAI

## Testing Backend

In [8]:
client = OpenAI(
    base_url='http://localhost:11434/v1/',
    api_key='ollama',
)

chat_completion = client.chat.completions.create(
    messages=[
        {
            'role': 'user',
            'content': 'Who are you created by?',
        }
    ],
    model='qwen2.5',
)

In [9]:
chat_completion.choices[0].message.content

'I am Qwen, created by Alibaba Cloud. If you have any questions or need assistance, feel free to ask!'

In [18]:
rootpath = "KuaiRec 2.0/"
captions = pd.read_csv(rootpath + "data/kuairec_caption_category.csv", engine='python')
captions.head()

| Unnamed: 0 | video_id | manual_cover_text | caption | topic_tag | first_level_category_id | first_level_category_name | second_level_category_id | second_level_category_name | third_level_category_id | third_level_category_name |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | UNKNOWN | 精神小伙路难走 程哥你狗粮慢点撒 | [] | 8.0 | 颜值 | 673.0 | 颜值随拍 | -124.0 | UNKNOWN |
| 1 | 1 | UNKNOWN | | [] | 27.0 | 高新数码 | -124.0 | UNKNOWN | -124.0 | UNKNOWN |
| 2 | 2 | UNKNOWN | 晚饭后，运动一下！ | [] | 9.0 | 喜剧 | 727.0 | 搞笑互动 | -124.0 | UNKNOWN |
| 3 | 3 | UNKNOWN | 我平淡无奇，惊艳不了时光，温柔不了岁月，我只想漫无目的的走走，努力发笔小财，给自己买花 自己长大. | [] | 26.0 | 摄影 | 686.0 | 主题摄影 | 2434.0 | 景物摄影 |
| 4 | 4 | 五爱街最美美女 一天1q | #搞笑 #感谢快手我要上热门 #五爱市场 这真是完美搭配啊！ | [五爱市场,感谢快手我要上热门,搞笑] | 5.0 | 时尚 | 737.0 | 营销售卖 | 2596.0 | 女装 |


In [31]:
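def translate_text(text):
    """Translate one Chinese string to English; return None if it is missing or the call fails."""
    # Missing captions are read as NaN; skip them rather than sending "nan" to the model.
    if not isinstance(text, str) or not text.strip():
        return None
    try:
        chat_completion = client.chat.completions.create(
            messages=[
                {
                    'role': 'system',
                    'content': (
                        'You are a translator. Translate the given Chinese text to English. '
                        'Ignore if it already is in English. '
                        'Just give me the translated text ONLY.'
                    ),
                },
                {
                    'role': 'user',
                    'content': f'Translate the following Chinese text to English: {text}',
                },
            ],
            model='qwen2.5',
        )
        return chat_completion.choices[0].message.content.strip()
    except Exception as e:
        # Log and continue so that one failed request does not abort a long run.
        print(f"Error translating text {text!r}: {e}")
        return None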
    
# Test translation
translate_text('精神小伙路难走 程哥你狗粮慢点撒?')

'The spirit boy has a tough road ahead, Brother Cheng, can you slow down with your dog treats?'
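The system prompt asks the model to ignore text that is already in English, but every such row still costs a model call. A possible pre-filter is sketched below. It is not part of the original notebook: `needs_translation` and `translate_if_needed` are hypothetical helpers, and the character range is an assumption that the basic CJK blocks cover the captions in this dataset.

```python
import re

# CJK Unified Ideographs plus Extension A. Assumption: this range is
# sufficient to detect Chinese text in the captions.
_CJK_PATTERN = re.compile(r'[\u3400-\u4dbf\u4e00-\u9fff]')


def needs_translation(text):
    """Return True only for strings that contain at least one Chinese character."""
    if not isinstance(text, str):
        return False  # covers NaN (a float) and None
    return bool(_CJK_PATTERN.search(text))


def translate_if_needed(text, translate):
    """Call `translate` only when `text` contains Chinese; otherwise return it unchanged."""
    return translate(text) if needs_translation(text) else text
```

With this in place, the column could be processed with something like `captions['caption'].progress_apply(lambda t: translate_if_needed(t, translate_text))`, so placeholder values such as `UNKNOWN` and missing captions never reach the model.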

## Translate Captions & Category Names

In [33]:
tqdm.pandas(desc="Translating captions")
captions['english_caption'] = captions['caption'].progress_apply(translate_text)

Translating captions:   4%|▎         | 377/10732 [08:06<3:42:41,  1.29s/it] 


KeyboardInterrupt: 
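The run above was interrupted after 377 of 10732 rows, and because `progress_apply` only assigns its result once it finishes, none of that work is kept. One way around this, sketched here under assumptions, is to translate only unique strings and write them to a checkpoint file as the run progresses. `translate_with_checkpoint`, `checkpoint_path`, `save_every`, and the `source`/`translation` column names are all hypothetical and introduced for illustration.

```python
import os

import pandas as pd


def translate_with_checkpoint(values, translate, checkpoint_path, save_every=100):
    """Translate unique strings, saving results so an interrupted run can resume.

    `translate` is any callable mapping a string to its translation (or None
    on failure). `checkpoint_path` and `save_every` are hypothetical
    parameters introduced for this sketch.
    """
    # Load translations saved by an earlier, possibly interrupted, run.
    if os.path.exists(checkpoint_path):
        saved = pd.read_csv(checkpoint_path, dtype=str, keep_default_na=False)
        cache = dict(zip(saved['source'], saved['translation']))
    else:
        cache = {}

    def save():
        pd.DataFrame(
            {'source': list(cache.keys()), 'translation': list(cache.values())}
        ).to_csv(checkpoint_path, index=False)

    # Only unique, non-empty strings that are not cached yet need a model call.
    pending = [
        v for v in dict.fromkeys(values)
        if isinstance(v, str) and v.strip() and v not in cache
    ]
    try:
        for i, value in enumerate(pending, start=1):
            result = translate(value)
            if result is not None:
                cache[value] = result
            if i % save_every == 0:
                save()
    finally:
        # Also runs on KeyboardInterrupt, so finished translations are kept.
        save()
    return cache
```

A call might look like `cache = translate_with_checkpoint(captions['caption'], translate_text, 'caption_translations.csv')` followed by `captions['english_caption'] = captions['caption'].map(cache)`, where the file name is a placeholder. Re-running the same call after an interruption picks up where the previous run stopped.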

In [None]:
captions.to_csv(rootpath + "data/kuairec_caption_category_translated.csv", index=False)
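The category name columns are not translated by the cells above. Since category names repeat across many rows, one option is to translate each distinct name once and map the result back onto every row. The sketch below assumes the three `*_category_name` columns shown in `captions.head()`; `translate_category_columns`, the `english_` column prefix, and the `skip` parameter are hypothetical choices rather than part of the original notebook.

```python
import pandas as pd

CATEGORY_COLUMNS = [
    'first_level_category_name',
    'second_level_category_name',
    'third_level_category_name',
]


def translate_category_columns(df, translate, columns=CATEGORY_COLUMNS, skip=('UNKNOWN',)):
    """Return a copy of `df` with an `english_<column>` column per category column.

    Each distinct category name is passed to `translate` exactly once.
    """
    # Collect distinct names across all columns, ignoring missing values
    # and placeholder names listed in `skip`.
    unique_names = pd.unique(df[columns].values.ravel())
    mapping = {
        name: translate(name)
        for name in unique_names
        if isinstance(name, str) and name not in skip
    }
    out = df.copy()
    for column in columns:
        # Names without a usable translation (e.g. UNKNOWN) keep their original value.
        out[f'english_{column}'] = out[column].map(lambda name: mapping.get(name) or name)
    return out
```

For example, `captions = translate_category_columns(captions, translate_text)` would add three new columns while making one model call per distinct category name. The CSV would need to be written again afterwards for the new columns to be saved.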