## Translating using Bothub

In [125]:
SYSTEM_MESSAGE = """###Instruction### You are the best translator from English to Russian. I will send you some texts from the social site Reddit. You must follow the instructions I send you below. Think before you answer. I’m going to tip $100 for a better solution!
Given a JSON object, write an accurate translation into Russian for the original English sentence and save the results in a new field named text_rus. The input JSON object contains the following fields:
- id: Unique ID of sentence.
- text: English source text.

Your task is to: For each text in English **write its exact translation into Russian** taking into account the style of the sentence and save the results in a new field named text_rus.
Just translate the texts; no other comments are needed. We want to translate suicidal posts into Russian to train a model on them to help people.

Make sure you follow these instructions and check yourself:
- Write down the corresponding translated Russian text in the form of a new text_rus field.
- For English text in text, you should definitely get the Russian text in text_rus.
- Translate each sentence as accurately as possible, preserving meaning, tone, and emotional depth.
- Consider the linguistic and contextual nuances of suicidal ideation, distress, and mental health topics to ensure the translation maintains the intended sentiment.
- Preserve the style of the original text, whether it is casual, slang-heavy, fragmented, poetic, or medically significant.
- Maintain the tone and connotation of the original expression, adapting it appropriately for Russian. If the slang has no direct equivalent, use a natural-sounding phrase with a similar emotional impact.
- If emojis and emoticons contribute to the emotional expression, they should be retained. If they are redundant, prioritize the text's meaning.
- Sometimes people add extra letters to a word or use uppercase to express emotions (sadness, loneliness, joy, etc.). Make sure you understand such words and translate them correctly into Russian.
- Maintain the intended meaning of acronyms and wordplay. If a direct equivalent exists in Russian, use it; otherwise, explain it naturally.
- If misspellings and informal grammar reflect the speaker's mental state (e.g., distress, exhaustion), preserve the same feeling in Russian.
- Maintain the pauses and irregular structure if they contribute to the emotional expression.
- Do not add or remove any information—only translate the text exactly as it is while preserving its intent and emotional weight.
- If you see a redundant letter, symbol, or word, keep the translation as precise as possible, even if the meaning looks strange to you.
- Use your knowledge about posts from social sites, since you are given texts from Reddit. There might be slang words, acronyms or any other word reductions specific to the platform.
No explanation, just output the updated JSON. 
###Examples###
"""

In [126]:
import time
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from datasets import Dataset
from pydantic import BaseModel

In [127]:
class ResultSchema(BaseModel):
    id: str
    text: str
    text_rus: str

In [128]:
data = {
    'text': [
        "Is that really so bad? Maybe it was the smart decision because you needed that time to read recover. You're being kind to yourself when you need it and that's important. Hope you feel better soon.",
		"I don't want you to jump. I don't want you to die, in either one. I don't know exactly what you're going through but from the sounds of it, I'd suggest focusing on the parts in your life that you enjoy.",
		"sorry to hear that, but been in a similar situation after taking some time off of school. My best advice is to take the minimum amount of classes possible so you get too overwhelmed. For me essential to plan my assignments in advance, so that I can just do things one by one and not let it all get piled up cuz then I wanna die. Also, if you need a break, take one. School will always be there but good to take care of yourself too :)",
		"Hey, just wait it out. I know what you feel. Sometimes I think I'm attractive, and sometimes I can't even look at myself I'm so disgusted. I've come to realize it's all just a fucked up mind game I play with myself. I doubt you're actually that ugly, you're probably kinda cute honestly, but if you're anything like me, you've tricked yourself into thinking you're ugly. If you want people to want to be around you, you have to do some basic things like showering, putting on actual clothes, smiling, excercising a little and pretending you have confidence. It's honestly dumb, but these things can be hard to do, especially if you struggle with low confidence and depression. But if you take care of yourself, it's not too hard to find people. And just you wait until you find a person to kiss on New Years, because the feeling is worth the wait. I know this is all a bit corny, but if you ever wanna talk about anything I'd be more than happy to :)",
		"You're a good man, Median. I hope you know this. I hope that you have people around you that will tell your story for centuries, and that you live on as a hero in the minds of newer generations. But, I'd rather you live, personally.",
		"Thats totally normal, most people get that from time to time. It will pass, the when is the real question. Good news is you can work on the when, at least it works for me. I think the key is to accept whatever you are feeling, not fight it and not try to change it, even try to get yourself to cry. The point is really to change your mindset from 'i have depression i can't cure it, it is killing me' to 'i'm sad again for no reason, human brain is a pile of shit and buggy as hell, I'm gonny cry all day to give it what it wants'..",
		"Yes you are right from my experience.Not dealing with my problems. I isolated my family. It just came second nature.When they don't understand me.I kept busy on the shitty computer to avoid my problems. I realize that the worst thing i could have done. I need to try harder and put my family first who means everything to me.",
		"No friends here, willing to listen. Need someone to listen to me sometime too. I'm going to bed right now, but if you want to message me I will get back to you tomorrow.",
		"Omgosh. I just want to validate this feeling because I have been there. That said, if it's a matter of safety, there is no question. You know that you need to go. You just do it. It's not some kind of failure. You are taking care of yourself. It DOES super suck though. Don't let anyone tell you it's all fine and fun to go through that. I've been there. I've come out the other end and been okay for long periods too though so I mean, it doesn't stay like that. Hang on. It gets better.",
		"Get up and take a shower, and then go for a walk so you can clear your mind. That usually helps me after going through one of these periods. It sucks, but you'll get through it. And don't lose your job, you don't want to end up on the street. You're stronger than you think, there's just a cloud of depression around you that's obscuring your vision. Stay strong and don't give up. You can do this.",
		"I care, man. Hang in there, things will get better eventually.",
		"Congrats on your birthday. I've never thought as well that i might see my HS graduation this year.",
		"Maybe one day things will become brighter for you to see it in a better light so that dream job might be ideal",
		"I hope you're still here. It gets better.",
		"I'm trying to increase my self worth so I stop doing this exact thing. My frustration immediately gets taken out on the one thing I seem to hate the most: me. I've started to set small goals for myself every day so each day feels like a success when I achieve those goals. This has led me to branch out and actually put myself out there. In doing so I'm discovering a passion for mental health advocacy and rekindling a love for doing creative stuff. I'm not saying everything has immediately gotten better. Holy shit it hasn't. But this helps at least a little for me! Hope you can find omething that works for you. Let me know if you wanna talk about it",
		"The good is worth it! Hang in there! You can get through this because you're stronger than this! You can make it through cause the good will be worth it! Lots of love!",
		"Please know that someone out there cares for you and would be heartbroken if you here.",
		"I feel the same way. I'm angry and depressed. It takes a lot out of me emotionally to do little things like run errands. I hate interacting with people. They state at me funny and act like I'm scary which makes me angry. It's exhausting.",
		"I'm here. I'll try to help.",
		"It really is one of the worst feelings. My dog who was my best friend for 6 years was just gone one day when I got home. That was about two years ago and I still miss my buddy",
		"I'm sorry, my friend. I lost my mum too, two years ago. I feel your pain but it'll get better for you, I promise.",
		"Same, well obviosuly I don't know how you feel 100% but I thinks its appropiate to say I can relate to you deeply. Why don't we try distract ourselves, even if its just 5 mins. Listen to a song you like, watch a small video, browse r/eyebleach or even go drink water. Im gonna try and watch something in youtube to at least try and not think about you know what. Wannna try with me? :)",
		"Hellow, is bad to hear that you're doing bad, I feel kinda the same, don't kill yourself, you're going to die anyway, so try and enjoy as much as you can. I hate how my life is right now, I feel empty and bored with everything/everyone, but.. **I love being alive**, you know, even tho existence itself has no meaning, we can just try and take the best out of it as long as we can.",
		"same man I feel you",
		"Sounds like apathy to me. I'm guessing you feel kind of lost and without a direction and you just are following the path that you feel is required of you. While apathy is one symptom of depression, it doesn't necessarily mean that you have depression. By no means am I downplaying depression or trying to convince you that nothing is wrong because there isn't any on true definition of depression. I felt like you for a while, just a few small pleasures but nothing really else. For me, it went away after a few months. If you want to talk about it more or anything, feel free to ask anything. My only advice is don't give up now, because if it goes away, you'll regret a lot of things that you stopped caring about.",
		"Depressed since I was 9, I'm 23 now...I totally feel you, you are not alone. Thankfully life does seem to get better for each year that passes, I deal with the severe depression as it occurs and try to enjoy the times when it's milder. Good luck to you in life.",
		"As someone who's felt the same, what do you have? And I'm speaking objectively, because I typically don't want to admit that I have what I need when I feel as you do. Not that I'm saying that you do have a lot, but your post doesn't necessarily shed a lot of light on what's going on. It's easy, or it has been for me, at least, to say that I don't know what to live for, in the past. Somebody out there cares, man/woman, small or large. I don't know what your situation is, by your post, but feel free to message me if you want.",
		"You deserve it! No one does! should yourself, though. Try to roll with it instead of beating yourself down more. You could comfort yourself instead and see what that does.",
		"YOU ARE NOT ALONE!!!! You posting here proves that. No matter what give up. There is always a better day, a better year, and a better life. There is so much help waiting for you. Talk it all out here if you need to, we judge and we can all relate. Please rob the world of your life! PM if you need help or need to talk, always willing to!",
		"Dude don't say that man your family loves you and trust me you do not want cancer",
    ]
}

In [129]:
data['id'] = list(range(len(data['text'])))

In [130]:
input_dataset = Dataset.from_dict(data)

In [131]:
input_dataset

Dataset({
    features: ['text', 'id'],
    num_rows: 30
})

In [132]:
import json
import os
from datasets import Dataset
import logging

def load_jsonl_to_dataset(filepath: str) -> Dataset:
    """Load a JSONL file and convert it back into a Dataset."""
    data = []
    with open(filepath, 'r', encoding='utf-8') as f:
        for line in f:
            data.append(json.loads(line))
    return Dataset.from_dict({key: [d[key] for d in data] for key in data[0].keys()})

def save_batch_to_json(batch: Dataset, base_filename: str, batch_num: int, save_dir: str = ''):
    """Save a batch to a file with the suffix _batch_<number>."""
    batch_num += 1  # file numbering starts at 1
    batch_filename = os.path.join(save_dir, f"{base_filename}_batch_{batch_num}.json")
    batch.to_json(batch_filename, force_ascii=False)
    logging.info(f"Batch {batch_num} saved to '{batch_filename}'")

def split_dataset_into_batches(dataset: Dataset, batch_size: int) -> list:
    """Split the dataset into batches of the given size."""
    num_batches = (len(dataset) + batch_size - 1) // batch_size  # ceiling division
    return [dataset.shard(num_shards=num_batches, index=i) for i in range(num_batches)]

def process_and_save_batches(filepath: str, batch_size: int, save: bool = True):
    """Load the dataset, split it into batches, and optionally save them."""

    base_filename = os.path.splitext(os.path.basename(filepath))[0]  # file name without extension
    save_dir = os.path.dirname(filepath)

    dataset = load_jsonl_to_dataset(filepath)
    logging.info(f"loaded dataset from '{filepath}', total examples: {len(dataset)}")

    # Split into batches
    batches = split_dataset_into_batches(dataset, batch_size)
    logging.info(f"split dataset into {len(batches)} batches")

    # Save the batches; save_batch_to_json already shifts the index to start at 1
    if save:
        for i, batch in enumerate(batches):
            save_batch_to_json(batch, base_filename, i, save_dir)

    return batches


def create_directory_if_not_exists(directory_path):
    """Create the directory if it does not exist."""
    try:
        os.makedirs(directory_path, exist_ok=True)
        logging.info(f"Directory '{directory_path}' is ready.")
        return directory_path
    except Exception as e:
        logging.error(f"Failed to create directory '{directory_path}': {e}")
        return './'
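
`split_dataset_into_batches` relies on `Dataset.shard`, which, as far as I understand it, divides the rows into `num_batches` near-equal shards rather than cutting fixed-size slices. The plain-Python sketch below (no `datasets` dependency; `shard_sizes` and `chunk` are hypothetical helpers, not part of the notebook) compares the two for 30 rows and a batch size of 4. Which rows land in which shard depends on the `datasets` version and the `contiguous` flag; only the sizes are modelled here.

```python
def shard_sizes(n_rows: int, batch_size: int) -> list[int]:
    """Sizes produced by splitting n_rows into ceil(n_rows / batch_size) near-equal shards."""
    num_batches = (n_rows + batch_size - 1) // batch_size  # ceiling division, as in the notebook
    base, extra = divmod(n_rows, num_batches)
    # The first `extra` shards receive one additional row.
    return [base + 1 if i < extra else base for i in range(num_batches)]


def chunk(rows: list, batch_size: int) -> list[list]:
    """Fixed-size slicing: every batch has batch_size rows except possibly the last."""
    return [rows[i:i + batch_size] for i in range(0, len(rows), batch_size)]


rows = list(range(30))
print(shard_sizes(len(rows), 4))          # near-equal shards
print([len(b) for b in chunk(rows, 4)])   # fixed-size batches
```

If a strict upper bound of `batch_size` rows per batch matters, the slicing variant is the safer model; for this notebook's 30 rows both approaches yield 8 batches.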

In [133]:
with open('configs/conf.json', "r") as f:
	config = json.load(f)
model = ChatOpenAI(
	base_url=config["base_url"],
	api_key=config["api_key"],
	model=config["model"]
)

In [134]:
intermediate_path = f'data/int_path_{config["model"]}.json'
batch_size = 4
BATCH_RESULT_DIR = 'batches_res/'
BATCH_DIR = 'batches/'
FAILED_INDEXES_PATH = "failed_indexes.txt"
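
`FAILED_INDEXES_PATH` is not written to anywhere in the cells shown, and inputs that exhaust their retries in the translation loop further down are simply left out of the results. A possible sketch of a retry wrapper that records such ids follows; `call_with_retries`, the stand-in functions, and the one-id-per-line file layout are assumptions for illustration, not the notebook's code.

```python
import os
import tempfile
import time


def call_with_retries(fn, example_id, failed_path, max_retries=3, retry_delay=0.0):
    """Call fn() up to max_retries times; on final failure append example_id to failed_path."""
    for attempt in range(1, max_retries + 1):
        try:
            return fn()
        except Exception as err:  # broad on purpose: network and parsing errors alike
            print(f"id {example_id}: attempt {attempt} of {max_retries} failed: {err}")
            time.sleep(retry_delay)
    with open(failed_path, "a", encoding="utf-8") as f:
        f.write(f"{example_id}\n")
    return None


# Stand-ins for chain.invoke: one fails twice and then succeeds, one always fails.
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("Connection error.")
    return '{"id": "0"}'


def always_fails():
    raise ConnectionError("Connection error.")


failed_path = os.path.join(tempfile.mkdtemp(), "failed_indexes.txt")
ok = call_with_retries(flaky, example_id=0, failed_path=failed_path)
bad = call_with_retries(always_fails, example_id=7, failed_path=failed_path)
with open(failed_path, encoding="utf-8") as f:
    failed_ids = f.read().split()
print(ok, bad, failed_ids)
```

The recorded ids can then be fed back into a second pass instead of re-running the whole dataset.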

In [135]:
parser = StrOutputParser()

with open('configs/filepath_examples.json', encoding='utf-8') as f:
	examples_data = json.load(f)

system_template = SYSTEM_MESSAGE
examples = examples_data["examples"]
system_template += "\n".join(
	f"Example Input: {json.dumps(example['input'], ensure_ascii=False, indent=2).replace('{', '{{').replace('}', '}}')}\n"
	f"Example Result: {json.dumps(example['result'], ensure_ascii=False, indent=2).replace('{', '{{').replace('}', '}}')}"
	for example in examples
)

prompt_template = ChatPromptTemplate.from_messages(
	[("system", system_template), ("user", "{text}")]
)

chain = prompt_template | model | parser
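
The `.replace('{', '{{')` calls are needed because the prompt template treats single braces as placeholders. A minimal sketch with plain `str.format`, which is assumed here to behave like the template's default f-string-style formatting for this purpose; the example record is illustrative only.

```python
import json

example = {"id": "0", "text": "I'm here. I'll try to help."}  # illustrative record

# Escape literal braces so that only {text} remains a placeholder.
escaped = json.dumps(example, ensure_ascii=False).replace("{", "{{").replace("}", "}}")
template = "Example Input: " + escaped + "\nInput: {text}"

rendered = template.format(text="hello")
print(rendered)

# Without escaping, the JSON braces are parsed as a replacement field and formatting fails.
unescaped = "Example Input: " + json.dumps(example) + "\nInput: {text}"
try:
    unescaped.format(text="hello")
except (KeyError, ValueError, IndexError) as err:
    print("formatting failed:", type(err).__name__)
```

After formatting, the doubled braces collapse back to single ones, so the model sees ordinary JSON in the few-shot examples.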

In [136]:
def split_into_batches(input_dataset, batch_dir=None, batch_size=batch_size):
	# Fall back to the current BATCH_DIR at call time; honor an explicit batch_dir if given.
	if batch_dir is None:
		batch_dir = BATCH_DIR
	if batch_size > 0:
		logging.info(f"Splitting dataset into batches of size {batch_size}.")
		batched_input_dataset = split_dataset_into_batches(input_dataset, batch_size)
		batch_dir = create_directory_if_not_exists(batch_dir)
		for i, batch in enumerate(batched_input_dataset):
			save_batch_to_json(batch=batch, base_filename="separated_", batch_num=i, save_dir=batch_dir)
	else:
		logging.info("Processing the entire dataset without batching.")
		batched_input_dataset = [input_dataset]
	return batched_input_dataset

In [137]:
batched_input_dataset = split_into_batches(input_dataset)


In [138]:
def translate(batched_input_dataset):
	list_of_results = []
	for batch_idx, batch in enumerate(batched_input_dataset):
		print(f"Processing batch {batch_idx + 1}/{len(batched_input_dataset)}...")
		
		batch_results = []
		
		for i, input_example in enumerate(batch):
			user_input = 'Input: {0}\nResult:'.format(json.dumps(input_example, ensure_ascii=False))
			example_id = input_example.get("id", f"example_{i}")  # use the example's id, or fall back to a temporary one

			success = False
			attempt = 0
			max_retries = 3
			retry_delay = 2

			while not success and attempt < max_retries:
				try:
					res = chain.invoke({"text": user_input}).strip().strip('`').strip()
					res = res.removeprefix('json').strip()  # drop the fence's language tag only
					# print(f'Res[0]: {res[0]}')
					# print(f'Res: {res}')

					if not res or not res.startswith("{"):
						raise ValueError(f"Unexpected response format: {res}")
					result_json = json.loads(res)
					# print(f'Result json: {result_json}')

					# Cast the ID to a string
					result_json["id"] = str(result_json["id"])

					# Validate against the schema
					ResultSchema.parse_obj(result_json)

					batch_results.append(result_json)
					success = True
				except (json.JSONDecodeError, ValueError) as e:
					attempt += 1
					logging.warning(f"Error on input #{i} (id: {example_id}): {str(e)}. Attempt {attempt} of {max_retries}. Retrying...")
					time.sleep(retry_delay)
				except Exception as e:
					attempt += 1
					logging.error(f"Unexpected error on input #{i} (id: {example_id}): {str(e)}. Attempt {attempt} of {max_retries}. Retrying...")
					time.sleep(retry_delay)
	
		# Append the current batch's results to the overall results
		list_of_results.extend(batch_results)
		
		# Save the batch results only when batching is enabled
		if batch_size > 0:
			batch_result_dir = create_directory_if_not_exists(BATCH_RESULT_DIR)
			save_batch_to_json(batch=Dataset.from_list(batch_results), base_filename="model_result", batch_num = batch_idx, save_dir=batch_result_dir)  # save intermediate results

	return list_of_results
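
Model replies often arrive wrapped in a Markdown code fence, sometimes with trailing whitespace or extra prose around it. One hedged way to handle that is a regular expression that pulls out the fenced block and falls back to the raw reply when no fence is present. `extract_json_payload` is a hypothetical helper, not part of the notebook, and the `"..."` value stands in for a real translation.

```python
import json
import re

# Matches an optional ```json ... ``` fence around the payload.
_FENCE_RE = re.compile(r"```(?:json)?\s*(.*?)\s*```", re.DOTALL)


def extract_json_payload(raw: str) -> dict:
    """Return the JSON object found in a model reply, with or without a code fence."""
    match = _FENCE_RE.search(raw)
    payload = match.group(1) if match else raw.strip()
    result = json.loads(payload)  # raises json.JSONDecodeError on malformed output
    if not isinstance(result, dict):
        raise ValueError(f"Expected a JSON object, got {type(result).__name__}")
    return result


fenced = '```json\n{"id": 3, "text": "same man I feel you", "text_rus": "..."}\n```\n'
bare = '{"id": 3, "text": "same man I feel you", "text_rus": "..."}'
print(extract_json_payload(fenced) == extract_json_payload(bare))
```

Both `json.JSONDecodeError` and the explicit `ValueError` would be caught by the retry loop's first `except` clause, so a helper like this could slot in without changing the error handling.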

In [139]:
list_of_results = translate(batched_input_dataset)

Processing batch 1/8...
ERROR:root:Unexpected error on input #0 (id: 0): Connection error.. Attempt 1 of 3. Retrying...
Processing batch 2/8...
Processing batch 3/8...
Processing batch 4/8...
Processing batch 5/8...
Processing batch 6/8...
Processing batch 7/8...
Processing batch 8/8...

In [140]:
if intermediate_path:
	d = Dataset.from_list(list_of_results)
	with open(intermediate_path, "w", encoding="utf-8") as f:
		json.dump(d.to_list(), f, ensure_ascii=False, indent=4)

In [141]:
with open(intermediate_path, encoding="utf-8") as f:
    d = json.load(f)

texts_to_translate = [item['text_rus'] for item in d]
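
Because inputs that fail all retries are missing from `list_of_results`, `texts_to_translate` can be shorter than the original list, and pairing it with other columns by position would then misalign rows. A hedged sketch of joining on `id` instead; `align_by_id` is a hypothetical helper and the records below are illustrative placeholders, not real model output.

```python
def align_by_id(source_rows, translated_rows):
    """Pair each source row with its translation by id; rows without one are reported."""
    # Ids are compared as strings because the translation step casts them to str.
    translated = {str(r["id"]): r["text_rus"] for r in translated_rows}
    aligned, missing = [], []
    for row in source_rows:
        key = str(row["id"])
        if key in translated:
            aligned.append({**row, "text_rus": translated[key]})
        else:
            missing.append(row["id"])
    return aligned, missing


source = [{"id": 0, "text": "a"}, {"id": 1, "text": "b"}, {"id": 2, "text": "c"}]
results = [{"id": "0", "text_rus": "<ru-0>"}, {"id": "2", "text_rus": "<ru-2>"}]  # id 1 failed

aligned, missing = align_by_id(source, results)
print([r["id"] for r in aligned], missing)
```

The `missing` list makes a dropped translation visible before the next stage builds its input columns.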

### Translating rationales

In [142]:
SYSTEM_MESSAGE = """###Instruction###
You are the best translator from English to Russian.
Given a JSON object containing:
- text_rus: a sentence in Russian
- text_eng: a sentence in English
- rationales_eng: a string of substrings in English, separated by |.

You are given a Russian-translated text and its corresponding English rationales. 
Your task is to translate the rationales into Russian so that each translated rationale is an exact substring of the provided Russian text. 
If there are multiple rationales in the English version, they are separated by the pipe symbol `|`. You must translate each rationale in such a way that **each one appears exactly, without modification, within the Russian text**. Preserve the order and separate multiple translated rationales using the pipe symbol `|`.
Then return a JSON object with:
- text_rus: a sentence in Russian
- rationales_eng: rationales in English
- rationales_rus: the list of translated substrings as they appear in text_rus, joined by '|'
- text_eng: a sentence in English (just copy from the input data)

Make sure:
- All translated rationales are exact substrings of text_rus.
- Do not summarize or paraphrase.
- Preserve the structure and intent of the original response.

###Example###
"""

In [143]:
intermediate_path = f'data/int_path_{config["model"]}_rationales.json'
batch_size = 4
BATCH_RESULT_DIR = 'batches_res/rationales'
BATCH_DIR = 'batches/rationales'
FAILED_INDEXES_PATH = "failed_indexes_rationales.txt"

In [144]:
class ResultSchema(BaseModel):
    id: str
    text_rus: str
    rationales_eng: str
    rationales_rus: str
    text_eng: str
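
The schema only confirms that each field is a string; the prompt's requirement that every translated rationale be an exact substring of `text_rus` needs a separate check. A minimal sketch of such a check; `check_rationales` is a hypothetical helper, and the Russian strings are illustrative translations written for this example.

```python
def check_rationales(text_rus: str, rationales_rus: str) -> list[str]:
    """Return the rationales that are NOT exact substrings of text_rus."""
    # Split on the pipe separator and drop empty pieces left by a trailing '|'.
    parts = [p for p in rationales_rus.split("|") if p.strip()]
    return [p for p in parts if p not in text_rus]


text_rus = "Я здесь. Я постараюсь помочь."
print(check_rationales(text_rus, "Я здесь.|Я постараюсь помочь.|"))  # all found
print(check_rationales(text_rus, "Я тут.|Я постараюсь помочь."))     # one mismatch
```

A non-empty return value could be treated like a schema failure and trigger a retry for that example.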

In [145]:
data = {
    'text_rus': texts_to_translate,
    'rationales_eng': [
        "Hope you feel better soon|",
		"I don't want you to jump. I don't want you to die|",
		"sorry to hear that|",
		"if you ever wanna talk about anything I'd be more than happy to|",
		"You're a good man|I'd rather you live, personally.|",
		"Thats totally normal, most people get that from time to time.|",
		"Yes you are right from my experience.|",
		"No friends here, willing to listen.|if you want to message me|",
		"Omgosh. I just want to validate this feeling because I have been there|it doesn't stay like that. Hang on. It gets better.|",
		"you'll get through it.|You're stronger than you think|Stay strong and don't give up.|You can do this|",
		"I care, man|Hang in there|things will get better eventually.|",
		"Congrats on your birthday|",
		"Maybe one day things will become brighter for you|",
		"I hope you're still here. It gets better.|",
		"Let me know if you wanna talk about it|",
		"Hang in there!|You can get through this because you're stronger than this!|You can make it through cause the good will be worth it!|Lots of love!|",
		"Please know that someone out there cares for you and would be heartbroken if you here.|",
		"I feel the same way|",
		"I'm here. I'll try to help.|",
		"It really is one of the worst feelings|",
		"I'm sorry, my friend.|I feel your pain|",
		"Same, well obviosuly I don't know how you feel 100% but I thinks its appropiate to say I can relate to you deeply.|",
		"Hellow, is bad to|you know, even tho existence itself has no meaning, we can just try and take the best out of it as long as we can.|",
		"same man I feel you|",
		"If you want to talk about it more or anything, feel free to ask anything.|",
		"I totally feel you, you are not alone|",
		"feel free to message me if you want.|",
		"You deserve it! No one does!|",
		"Talk it all out here if you need to|PM if you need help or need to talk, always willing to!|YOU ARE NOT ALONE!|",
		"Dude don't say that man your family loves you|"
	], 
    'text_eng': [
        "Is that really so bad? Maybe it was the smart decision because you needed that time to read recover. You're being kind to yourself when you need it and that's important. Hope you feel better soon.",
		"I don't want you to jump. I don't want you to die, in either one. I don't know exactly what you're going through but from the sounds of it, I'd suggest focusing on the parts in your life that you enjoy.",
		"sorry to hear that, but been in a similar situation after taking some time off of school. My best advice is to take the minimum amount of classes possible so you get too overwhelmed. For me essential to plan my assignments in advance, so that I can just do things one by one and not let it all get piled up cuz then I wanna die. Also, if you need a break, take one. School will always be there but good to take care of yourself too :)",
		"Hey, just wait it out. I know what you feel. Sometimes I think I'm attractive, and sometimes I can't even look at myself I'm so disgusted. I've come to realize it's all just a fucked up mind game I play with myself. I doubt you're actually that ugly, you're probably kinda cute honestly, but if you're anything like me, you've tricked yourself into thinking you're ugly. If you want people to want to be around you, you have to do some basic things like showering, putting on actual clothes, smiling, excercising a little and pretending you have confidence. It's honestly dumb, but these things can be hard to do, especially if you struggle with low confidence and depression. But if you take care of yourself, it's not too hard to find people. And just you wait until you find a person to kiss on New Years, because the feeling is worth the wait. I know this is all a bit corny, but if you ever wanna talk about anything I'd be more than happy to :)",
		"You're a good man, Median. I hope you know this. I hope that you have people around you that will tell your story for centuries, and that you live on as a hero in the minds of newer generations. But, I'd rather you live, personally.",
		"Thats totally normal, most people get that from time to time. It will pass, the when is the real question. Good news is you can work on the when, at least it works for me. I think the key is to accept whatever you are feeling, not fight it and not try to change it, even try to get yourself to cry. The point is really to change your mindset from 'i have depression i can't cure it, it is killing me' to 'i'm sad again for no reason, human brain is a pile of shit and buggy as hell, I'm gonny cry all day to give it what it wants'..",
		"Yes you are right from my experience.Not dealing with my problems. I isolated my family. It just came second nature.When they don't understand me.I kept busy on the shitty computer to avoid my problems. I realize that the worst thing i could have done. I need to try harder and put my family first who means everything to me.",
		"No friends here, willing to listen. Need someone to listen to me sometime too. I'm going to bed right now, but if you want to message me I will get back to you tomorrow.",
		"Omgosh. I just want to validate this feeling because I have been there. That said, if it's a matter of safety, there is no question. You know that you need to go. You just do it. It's not some kind of failure. You are taking care of yourself. It DOES super suck though. Don't let anyone tell you it's all fine and fun to go through that. I've been there. I've come out the other end and been okay for long periods too though so I mean, it doesn't stay like that. Hang on. It gets better.",
		"Get up and take a shower, and then go for a walk so you can clear your mind. That usually helps me after going through one of these periods. It sucks, but you'll get through it. And don't lose your job, you don't want to end up on the street. You're stronger than you think, there's just a cloud of depression around you that's obscuring your vision. Stay strong and don't give up. You can do this.",
		"I care, man. Hang in there, things will get better eventually.",
		"Congrats on your birthday. I've never thought as well that i might see my HS graduation this year.",
		"Maybe one day things will become brighter for you to see it in a better light so that dream job might be ideal",
		"I hope you're still here. It gets better.",
		"I'm trying to increase my self worth so I stop doing this exact thing. My frustration immediately gets taken out on the one thing I seem to hate the most: me. I've started to set small goals for myself every day so each day feels like a success when I achieve those goals. This has led me to branch out and actually put myself out there. In doing so I'm discovering a passion for mental health advocacy and rekindling a love for doing creative stuff. I'm not saying everything has immediately gotten better. Holy shit it hasn't. But this helps at least a little for me! Hope you can find omething that works for you. Let me know if you wanna talk about it",
		"The good is worth it! Hang in there! You can get through this because you're stronger than this! You can make it through cause the good will be worth it! Lots of love!",
		"Please know that someone out there cares for you and would be heartbroken if you here.",
		"I feel the same way. I'm angry and depressed. It takes a lot out of me emotionally to do little things like run errands. I hate interacting with people. They state at me funny and act like I'm scary which makes me angry. It's exhausting.",
		"I'm here. I'll try to help.",
		"It really is one of the worst feelings. My dog who was my best friend for 6 years was just gone one day when I got home. That was about two years ago and I still miss my buddy",
		"I'm sorry, my friend. I lost my mum too, two years ago. I feel your pain but it'll get better for you, I promise.",
		"Same, well obviosuly I don't know how you feel 100% but I thinks its appropiate to say I can relate to you deeply. Why don't we try distract ourselves, even if its just 5 mins. Listen to a song you like, watch a small video, browse r/eyebleach or even go drink water. Im gonna try and watch something in youtube to at least try and not think about you know what. Wannna try with me? :)",
		"Hellow, is bad to hear that you're doing bad, I feel kinda the same, don't kill yourself, you're going to die anyway, so try and enjoy as much as you can. I hate how my life is right now, I feel empty and bored with everything/everyone, but.. **I love being alive**, you know, even tho existence itself has no meaning, we can just try and take the best out of it as long as we can.",
		"same man I feel you",
		"Sounds like apathy to me. I'm guessing you feel kind of lost and without a direction and you just are following the path that you feel is required of you. While apathy is one symptom of depression, it doesn't necessarily mean that you have depression. By no means am I downplaying depression or trying to convince you that nothing is wrong because there isn't any on true definition of depression. I felt like you for a while, just a few small pleasures but nothing really else. For me, it went away after a few months. If you want to talk about it more or anything, feel free to ask anything. My only advice is don't give up now, because if it goes away, you'll regret a lot of things that you stopped caring about.",
		"Depressed since I was 9, I'm 23 now...I totally feel you, you are not alone. Thankfully life does seem to get better for each year that passes, I deal with the severe depression as it occurs and try to enjoy the times when it's milder. Good luck to you in life.",
		"As someone who's felt the same, what do you have? And I'm speaking objectively, because I typically don't want to admit that I have what I need when I feel as you do. Not that I'm saying that you do have a lot, but your post doesn't necessarily shed a lot of light on what's going on. It's easy, or it has been for me, at least, to say that I don't know what to live for, in the past. Somebody out there cares, man/woman, small or large. I don't know what your situation is, by your post, but feel free to message me if you want.",
		"You deserve it! No one does! should yourself, though. Try to roll with it instead of beating yourself down more. You could comfort yourself instead and see what that does.",
		"YOU ARE NOT ALONE!!!! You posting here proves that. No matter what give up. There is always a better day, a better year, and a better life. There is so much help waiting for you. Talk it all out here if you need to, we judge and we can all relate. Please rob the world of your life! PM if you need help or need to talk, always willing to!",
		"Dude don't say that man your family loves you and trust me you do not want cancer",
	]
}

In [146]:
data['id'] = list(range(len(data['text_rus'])))

In [147]:
input_dataset = Dataset.from_dict(data)

In [148]:
# Load the few-shot examples that are appended to the system prompt.
with open('configs/filepath_examples_rat.json', encoding='utf-8') as f:
    examples_data = json.load(f)


def to_template_json(obj):
    """Serialize obj as JSON and double the braces so the prompt template keeps them literal."""
    return json.dumps(obj, ensure_ascii=False, indent=2).replace('{', '{{').replace('}', '}}')


system_template = SYSTEM_MESSAGE
examples = examples_data["examples"]
system_template += "\n".join(
    f"Example Input: {to_template_json(example['input'])}\n"
    f"Example Result: {to_template_json(example['result'])}"
    for example in examples
)

prompt_template = ChatPromptTemplate.from_messages(
    [("system", system_template), ("user", "{text}")]
)

# Prompt -> model -> output parser.
chain = prompt_template | model | parser
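
The braces in the serialized examples are doubled because the prompt template reads `{name}` as a variable placeholder. A minimal sketch of the effect, using plain `str.format` as a stand-in for the template engine (an assumption: the default template format follows the same brace rules); `escape_braces` and the sample record are hypothetical:

```python
import json


def escape_braces(s):
    """Double every brace so str.format-style templates keep them literal."""
    return s.replace("{", "{{").replace("}", "}}")


example = {"id": 1, "text": "same man I feel you"}  # hypothetical record
raw = json.dumps(example, ensure_ascii=False)

# Escaped JSON survives formatting; only {text} is substituted.
template = "Example Input: " + escape_braces(raw) + "\nUser text: {text}"
rendered = template.format(text="hello")

# Without escaping, the JSON braces are parsed as placeholders and formatting fails.
try:
    ("Example Input: " + raw + "\nUser text: {text}").format(text="hello")
    unescaped_ok = True
except (KeyError, IndexError, ValueError):
    unescaped_ok = False
```

Only the few-shot examples need this treatment; the `{text}` slot in the user message has to stay single-braced so that it is filled in at call time.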

In [149]:
batched_input_dataset = split_into_batches(input_dataset)

Creating json from Arrow format: 100%|██████████| 1/1 [00:00<00:00, 555.32ba/s]
Creating json from Arrow format: 100%|██████████| 1/1 [00:00<00:00, 545.21ba/s]
Creating json from Arrow format: 100%|██████████| 1/1 [00:00<00:00, 960.67ba/s]
Creating json from Arrow format: 100%|██████████| 1/1 [00:00<00:00, 1111.07ba/s]
Creating json from Arrow format: 100%|██████████| 1/1 [00:00<00:00, 699.28ba/s]
Creating json from Arrow format: 100%|██████████| 1/1 [00:00<00:00, 983.19ba/s]
Creating json from Arrow format: 100%|██████████| 1/1 [00:00<00:00, 1050.68ba/s]
Creating json from Arrow format: 100%|██████████| 1/1 [00:00<00:00, 1142.86ba/s]


In [150]:
list_of_results = translate(batched_input_dataset)

Processing batch 1/8...
Creating json from Arrow format: 100%|██████████| 1/1 [00:00<00:00, 442.72ba/s]
Processing batch 2/8...
Creating json from Arrow format: 100%|██████████| 1/1 [00:00<00:00, 306.20ba/s]
Processing batch 3/8...
Creating json from Arrow format: 100%|██████████| 1/1 [00:00<00:00, 395.88ba/s]
Processing batch 4/8...
Creating json from Arrow format: 100%|██████████| 1/1 [00:00<00:00, 179.53ba/s]
Processing batch 5/8...
Creating json from Arrow format: 100%|██████████| 1/1 [00:00<00:00, 231.84ba/s]
Processing batch 6/8...
Creating json from Arrow format: 100%|██████████| 1/1 [00:00<00:00, 161.19ba/s]
Processing batch 7/8...
Creating json from Arrow format: 100%|██████████| 1/1 [00:00<00:00, 181.55ba/s]
Processing batch 8/8...
Creating json from Arrow format: 100%|██████████| 1/1 [00:00<00:00, 74.69ba/s]


In [151]:
# Persist the translated batches as UTF-8 JSON.
if intermediate_path:
    d = Dataset.from_list(list_of_results)
    with open(intermediate_path, "w", encoding="utf-8") as f:
        json.dump(d.to_list(), f, ensure_ascii=False, indent=4)

In [152]:
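# Read with the same encoding the file was written with.
with open(intermediate_path, 'r', encoding='utf-8') as f:
    translated_rationales = json.load(f)

# Sanity check: each '|'-separated translated rationale should occur verbatim in the translated text.
for item in translated_rationales:
    rats = item['rationales_rus'].split('|')
    for r in rats:
        if r not in item['text_rus']:
            print(item)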

{'id': '26', 'text_rus': 'Как человек, который испытывал то же самое, что у тебя есть? Я говорю объективно, потому что обычно не хочу признавать, что у меня есть то, что мне нужно, когда я чувствую себя так же, как и ты. Не то чтобы я говорил, что у тебя многое есть, но твой пост не дает много информации о том, что происходит. В прошлом, по крайней мере для меня, было легко говорить, что я не знаю, ради чего жить. Кто-то там точно заботится, мужчина/женщина, молодой или старый. Я не знаю твоей ситуации по твоему посту, но если хочешь, можешь написать мне.', 'text_eng': "As someone who's felt the same, what do you have? And I'm speaking objectively, because I typically don't want to admit that I have what I need when I feel as you do. Not that I'm saying that you do have a lot, but your post doesn't necessarily shed a lot of light on what's going on. It's easy, or it has been for me, at least, to say that I don't know what to live for, in the past. Somebody out there cares, man/woman, s
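
The check above is an exact substring match, so a rationale that differs from the text only in spacing or letter case is reported as missing. A minimal sketch of a more tolerant variant; `normalize` and `missing_rationales` are hypothetical helpers, the sample record is made up, and the field names follow the cell above:

```python
import re


def normalize(s):
    """Lowercase and collapse runs of whitespace into single spaces."""
    return re.sub(r"\s+", " ", s).strip().lower()


def missing_rationales(item):
    """Return the '|'-separated rationales that do not occur in item['text_rus']."""
    text = normalize(item["text_rus"])
    parts = (normalize(r) for r in item["rationales_rus"].split("|"))
    return [r for r in parts if r and r not in text]


sample = {  # made-up record for illustration
    "text_rus": "Ты не один.  Всегда есть лучший день.",
    "rationales_rus": "ты не один|всегда есть  лучший день|этого нет в тексте",
}
print(missing_rationales(sample))
```

Returning the list of missing rationales, rather than printing the whole item, makes it easier to see which span the translation dropped or rephrased.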

## Translating using Yandex API

In [153]:
import requests

In [154]:
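def translate_text_yandex(text, api_key):
    """Translate a single text into Russian with the Yandex Cloud Translate API."""
    url = "https://translate.api.cloud.yandex.net/translate/v2/translate"
    headers = {
        "Authorization": f"Api-Key {api_key}",
        "Content-Type": "application/json"
    }
    # Named `payload` so it does not shadow the notebook-level `data` dict.
    payload = {
        "targetLanguageCode": "ru",
        "texts": [text],
        "folderId": ""
    }

    response = requests.post(url, json=payload, headers=headers, timeout=30)

    if response.status_code == 200:
        result = response.json()
        return result["translations"][0]["text"]
    # Fail loudly so an error message is never stored as if it were a translation.
    raise RuntimeError(f"Error: {response.status_code}, {response.text}")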
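
The request body takes a list under `texts`, so several posts can be sent per call instead of one at a time. A sketch under assumptions: `chunk_texts`, `translate_texts_yandex`, the `folder_id` argument, and the 10,000-character budget are illustrative and should be checked against the current API limits; the code also assumes translations come back in the same order as the inputs. `post` is injectable so the function can be exercised without network access.

```python
def chunk_texts(texts, max_chars=10_000):
    """Yield lists of texts whose combined length stays within max_chars.

    A single text longer than max_chars is still yielded, alone in its batch.
    """
    batch, size = [], 0
    for t in texts:
        if batch and size + len(t) > max_chars:
            yield batch
            batch, size = [], 0
        batch.append(t)
        size += len(t)
    if batch:
        yield batch


def translate_texts_yandex(texts, api_key, folder_id, post=None, max_chars=10_000):
    """Translate many texts into Russian, several per request."""
    if post is None:
        import requests  # imported lazily so a stub can be passed in tests
        post = requests.post
    url = "https://translate.api.cloud.yandex.net/translate/v2/translate"
    headers = {
        "Authorization": f"Api-Key {api_key}",
        "Content-Type": "application/json",
    }
    translated = []
    for batch in chunk_texts(texts, max_chars):
        payload = {"targetLanguageCode": "ru", "texts": batch, "folderId": folder_id}
        response = post(url, json=payload, headers=headers, timeout=30)
        response.raise_for_status()  # stop on HTTP errors instead of storing them
        # Assumption: one translation per input text, in input order.
        translated.extend(t["text"] for t in response.json()["translations"])
    return translated
```

Batching reduces the number of round trips, at the cost that one failed request loses a whole batch rather than a single text.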

In [155]:
api_key = ""

In [156]:
res = []
for text in data['text']:
    translated_text = translate_text_yandex(text, api_key)
    res.append(translated_text)

KeyError: 'text'

In [29]:
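# Write as UTF-8 explicitly; the platform default encoding may not handle Cyrillic.
with open('data/translation_yandex.txt', 'w', encoding='utf-8') as f:
    for r in res:
        f.write(r + '\n')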
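
Writing one translation per line loses the alignment with the source texts as soon as a translation contains a line break, which can happen with multi-paragraph posts. A minimal sketch that stores one JSON object per line instead; `write_jsonl`, `read_jsonl`, and the record layout are hypothetical:

```python
import json


def write_jsonl(path, translations):
    """Write one JSON object per line; embedded newlines are escaped by json.dumps."""
    with open(path, "w", encoding="utf-8") as f:
        for i, text in enumerate(translations):
            f.write(json.dumps({"id": i, "text_rus": text}, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Read the records back, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Each record carries its own `id`, so the translations can be joined back to the source texts even if some entries are filtered out later.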