# Batch Translation Simulated Engine (Magic)

In [1]:
from pptx import Presentation
from os import scandir
import pandas as pd

## Finding repeated text runs in a batch of PPTX files

In [61]:
folder = input('Folder: ').replace('\\', '/')
# folder

Folder:  D:\Earn\Translate\English\Stories\Corny


Extracting all text runs longer than 2 characters from each file:

In [54]:
all_runs = dict()  # keys will be file names, each mapped to a list of text runs from that file as strings
for file in scandir(folder):
    if file.name.endswith('.pptx'):
        pres = Presentation(file.path)
        runs = []
        for slide in pres.slides:
            for shape in slide.shapes:
                if shape.has_text_frame: 
                    for paragraph in shape.text_frame.paragraphs:
                        for run in paragraph.runs:
                            if len(run.text) > 2:
                                runs.append(run.text)
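        # slice off the '.pptx' suffix; rstrip('.pptx') would strip any trailing '.', 'p', 't', 'x' characters
        all_runs[file.name[:-len('.pptx')]] = runs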
        
# the dict is not really needed at this point but could be built upon at a later dev stage
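
A note on the dictionary keys: the extension has to come off as a suffix. `str.rstrip` takes a set of characters rather than a suffix, so a name whose stem ends in `.`, `p`, `t` or `x` loses extra characters. Below is a minimal sketch of the difference using `os.path.splitext`; the file names are made up for illustration.

```python
from os.path import splitext

names = ['app.pptx', 'Corny 1.pptx']  # hypothetical file names

# rstrip removes every trailing character found in the set {'.', 'p', 't', 'x'}
stripped = [name.rstrip('.pptx') for name in names]

# splitext splits off only the final extension
stems = [splitext(name)[0] for name in names]

print(stripped)
print(stems)
```

The first name is cut down to a single letter by `rstrip`, while `splitext` keeps the full stem in both cases.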

In [73]:
# merge all entries into one long list
runs_through = []
for val in all_runs.values():
    runs_through += val

# get count of occurrences for each entry
counted_runs = dict()
for run in runs_through:
    counted_runs[run] = runs_through.count(run)
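
The loop above rescans the whole list for every entry, which gets slow on large batches. As a possible alternative, `collections.Counter` from the standard library produces the same counts in one pass. The sketch below checks this on a made-up sample list; `counted_loop` and `counted_fast` are names introduced only for this illustration.

```python
from collections import Counter

# made-up sample standing in for the merged list of text runs
runs_through = ['Ответ', 'Corny', 'Ответ', 'Задание 1', 'Corny', 'Ответ']

# the approach used above: one full scan of the list per entry
counted_loop = dict()
for run in runs_through:
    counted_loop[run] = runs_through.count(run)

# single pass over the list
counted_fast = Counter(runs_through)

# both approaches agree on the counts
assert dict(counted_fast) == counted_loop
print(counted_fast.most_common(2))
```

A `Counter` is a `dict` subclass, so its `keys()` and `values()` can be used the same way as those of `counted_runs`.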

In [90]:
# Pulling counted entries into a dataframe (views converted to lists explicitly)
text_bits = pd.DataFrame({'Entry': list(counted_runs.keys()),
                          'Count': list(counted_runs.values())})

In [93]:
# selecting strings occurring at least twice, most frequent first
useful_text_bits = text_bits[text_bits.Count >= 2].sort_values(by='Count', ascending=False)

In [94]:
useful_text_bits

| Index | Entry | Count |
|---|---|---|
| 467 | Ответ | 42 |
| 299 | Corny | 36 |
| 65 | день, Ситуация | 30 |
| 79 | Опишите словами и проиллюстрируй картинками сп... | 24 |
| 36 | Опишите подробно ситуацию | 24 |
| ... | ... | ... |
| 1 | Задание 1 | 2 |
| 12 | После того, как вы выполните и отправите нам э... | 2 |
| 13 | мы вышлем второе задание, | 2 |
| 160 | 1 (продолжение) | 2 |


Saving the strings that occur at least twice to a CSV file:

In [118]:
xl_destination = folder + '/repeated_bits.csv'
xl_destination

useful_text_bits.to_csv(xl_destination, sep='|', index_label='Index')

**At this point control is passed to a Sentient Entity (SE) capable of operating relevant human languages to do their magic on the csv file...**

!!! The processed file must be **properly** saved by the SE: still pipe-delimited, with the existing columns preserved !!!

They will have to figure out how to insert a `Translation` column in the right position and fill it out manually, using whatever tools are at their disposal, to prove their sentience.
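
Since the reading step further down expects a pipe-delimited file with `Entry` and `Translation` columns, it may help to sanity-check what the SE hands back before loading it. The sketch below does this with the standard-library `csv` module only; the helper name `check_translated_file` and the sample contents are assumptions made up for illustration, not part of the notebook.

```python
import csv
import io

# made-up sample of what the SE might save; real files will differ
sample = (
    'Index|Entry|Count|Translation\n'
    '467|Ответ|42|Answer\n'
    '299|Corny|36|\n'
    '1|Задание 1|2|Task 1\n'
)

def check_translated_file(handle):
    """Return an Entry -> Translation mapping, skipping blank translations."""
    reader = csv.DictReader(handle, delimiter='|')
    # fail early if the SE dropped or renamed a required column
    missing = {'Entry', 'Translation'} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f'missing columns: {sorted(missing)}')
    return {row['Entry']: row['Translation']
            for row in reader if row['Translation']}

mapping = check_translated_file(io.StringIO(sample))
print(mapping)
```

Rows left untranslated are skipped here, which mirrors the `dropna` call applied after reading the file with pandas.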

## Replacing all occurrences of repeated strings across all files with translations provided by the SE, if any

and saving processed files under new names

In [57]:
path = folder + '/translated_bits_from_csv.txt'
path

# reading the file magically processed by SE - control returned to Computer
translated_bits = pd.read_csv(path, sep='|', usecols=['Entry','Translation'], index_col='Entry')
translated_bits.dropna(inplace=True)

In [44]:
translated_bits

| Entry | Translation |
|---|---|
| Ответ | Answer |
| Опишите словами и проиллюстрируй картинками справа | Please, provide verbal description and illustr... |
| Опишите подробно ситуацию | Detailed description of the situation |
| Место для фото | Paste photo here |
| Место для фото ситуации | Paste photo of the situation here |
| ... | ... |
| Задание 1 | Task 1 |
| После того, как вы выполните и отправите нам это задание, | Once you have completed the task and sent it t... |
| мы вышлем второе задание, | We will send the second task to you |
| 1 (продолжение) | 1 (continued) |


In [60]:
for file in scandir(folder):  # scanning pptx files in folder one by one 
    if file.name.endswith('.pptx'):
        pres = Presentation(file.path)
        changed = False
        for slide in pres.slides:
            for shape in slide.shapes:
                if shape.has_text_frame: 
                    for paragraph in shape.text_frame.paragraphs:
                        for run in paragraph.runs:
                            if run.text in translated_bits.index:  # finding matched text runs
                                run.text = translated_bits.Translation[run.text]  # substituting translation
                                changed = True  # marking the file as changed
        if changed:
            pres.save(folder + '/preprocessed_' + file.name)  # saving processed file if changed
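
The substitution above relies on exact string equality between a run's text and an index entry. The sketch below isolates that matching logic on plain lists of strings, without python-pptx, to make the behaviour easy to test; `translate_runs` and the sample data are hypothetical names introduced for this illustration.

```python
def translate_runs(runs, translations):
    """Replace exact matches; return (new_runs, changed)."""
    changed = False
    result = []
    for text in runs:
        if text in translations:
            result.append(translations[text])  # exact match found
            changed = True
        else:
            result.append(text)  # left untouched
    return result, changed

# made-up sample data
translations = {'Ответ': 'Answer', 'Задание 1': 'Task 1'}
runs = ['Ответ', 'Ответ ', 'Corny', 'Задание 1']

new_runs, changed = translate_runs(runs, translations)
print(new_runs, changed)
```

Note that the second run, which carries a trailing space, is not replaced. Runs that differ from an entry only by whitespace would need normalising first if they should also match.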

Voila