# Translation project manager for translation agency

## Starting point: Translation project manager for freelancer

This script consisted of a single class `Translation` with the following attributes:
- A class attribute `translator`, which defaults to the freelancer's name.
- 10 attributes provided at initialisation:
    - `title` (a string) indicates the project's title (typically the title of the source document, or the overall title the translator gave the project if there is more than one document to translate);
    - `client` (a string) indicates the client who ordered the translation;
    - `source` (a string) indicates the language of the source document (document to be translated);
    - `target` (a string) indicates the language of the target document (translation);
    - `words` (an integer) indicates the word count of the source document;
    - `start` (a string) indicates the project's start date in ISO format (YYYY-MM-DD);
    - `deadline` (a string) indicates the project's deadline in ISO format (YYYY-MM-DD);
    - `price` (an integer) indicates the total price invoiced to the client (excl. VAT);
    - `tm` (a boolean) indicates whether or not a translation memory is available for this project;
    - `domain` (a string) indicates the overall domain to which the project belongs.
- 4 computed attributes:
    - `daysleft`, which calculates the number of days left until the project deadline by subtracting the current date from the deadline;
    - `length`, which calculates the total number of days allotted for the project by subtracting the start date from the deadline;
    - `rate`, which calculates the word rate for the project by dividing the total price by the word count in the source document;
    - `efficiency`, which calculates how many words the translator needs to translate per day to meet the deadline.
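
As a quick sanity check, the four computed attributes can be worked out by hand. The sketch below uses the figures of the 'Guide de Bruxelles' test project shown further down; the fixed `today` value is an assumption, made only so that the result is reproducible.

```python
import datetime

start = datetime.date.fromisoformat("2023-03-22")
deadline = datetime.date.fromisoformat("2023-05-06")
today = datetime.date(2023, 4, 1)  # assumed "current" date, fixed for reproducibility
words, price = 11500, 1610

length = deadline - start          # timedelta of 45 days
daysleft = deadline - today        # timedelta of 35 days
rate = price / words               # 0.14 € per word
efficiency = words / length.days   # roughly 255.6 words per day

print(length.days, daysleft.days, round(rate, 2), round(efficiency))
```

Note that `length` and `daysleft` are `timedelta` objects, which is why the class reads their `.days` attribute whenever a plain number is needed.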

In [1]:
import datetime #datetime package to convert strings into dates, calculate time periods etc.

In [2]:
class Translation:
    translator = "Sibylle" # class attribute

    def __init__(self, title, client, source, target, words, start, deadline, price, tm, domain = ''):
        # 'self' represents the object (= class element) itself
        self.title = title
        self.client = client
        self.source = source
        self.target = target
        self.words = words
        self.start = datetime.date.fromisoformat(start) # turns string into date
        self.deadline = datetime.date.fromisoformat(deadline) # turns string into date
        self.price = price
        self.tm = tm
        self.domain = domain
                
        today = datetime.date.today() # current date
        self.daysleft = self.deadline - today # difference between deadline and current date
        self.length = self.deadline - self.start # difference between deadline and start date
        self.rate = self.price/self.words # word rate (price divided by word count)
        self.efficiency = words/self.length.days # words to translate per day (word count divided by project length, see 'length' in explanations above)

    def days_left(self):
        # returns a text indicating how many days are left until the project deadline
        if self.deadline < datetime.date.today():
            # if the deadline is in the past
            return f"The deadline has been exceeded already."
        else:
            # if the deadline is not in the past
            return f"There are {self.daysleft.days} days left until the deadline."
    
    def __str__(self):
        # defines the print behaviour: returns a text providing the main information about the project
        sent_1 = f"{self.title} is a translation for {self.client} from {self.source} into {self.target}."
        # this if-statement considers whether a domain was added
        if len(self.domain) > 0:
            sent_2 = f"The domain is: {self.domain}."
        else:
            sent_2 = "The domain is unspecified." # if no domain was added, the text mentions it
        sent_3 = f"It's {self.words} words long, with a rate of {round(self.rate, 2)} € per word." #the word rate is rounded to two decimal places to avoid cumbersomely long numbers
        # this if-statement considers whether the deadline is in the past
        if self.deadline < datetime.date.today():
            sent_4 = f"It started on {self.start} and was due on {self.deadline}, so I had {self.length.days} days for it. I needed to translate {round(self.efficiency,0)} words per day to meet the deadline." # the efficiency is rounded to units because you can't translate a fraction of a word anyway
        else:
            sent_4 = f"It started on {self.start} and is due on {self.deadline}, so I have {self.length.days} days for it, of which {self.daysleft.days} left. I need to translate {round(self.efficiency,0)} words per day to meet the deadline."
        # this if-statement considers whether there is a translation memory for the project
        if self.tm is True:
            sent_5 = f"There is a translation memory."
        else:
            sent_5 = f"There is no translation memory"
        # print each sentence in a different line
        return "\n".join([sent_1, sent_2, sent_3, sent_4, sent_5])

In [3]:
test1 = Translation('Guide de Bruxelles', 'Foodies', 'NL', 'FR', 11500, '2023-03-22', '2023-05-06', 1610, False)

In [4]:
print(test1)

Guide de Bruxelles is a translation for Foodies from NL into FR.
The domain is unspecified.
It's 11500 words long, with a rate of 0.14 € per word.
It started on 2023-03-22 and was due on 2023-05-06, so I had 45 days for it. I needed to translate 256.0 words per day to meet the deadline.
There is no translation memory


In [5]:
import json

In [6]:
translations_file = 'translation_projects.json' # assign filename to a string variable
with open(translations_file, encoding = 'utf-8') as f:
    # open file and use json to parse it
    translations = json.load(f) # translations is now a list of dictionaries.    

In [7]:
# go through each of the items in the list
for translation in translations:
    # create a Translation instance with title, client, source, target, words, start, deadline, price, tm and domain
    my_translation = Translation(translation['title'], translation['client'], translation['source'], translation['target'], translation['words'], translation['start'], translation['deadline'], translation['price'], translation['tm'], translation['domain'])
        
    # print the project information
    print(my_translation)
    
    # print a separating line between translations
    print('----')

La polyarthrite rhumatoïde et autres rhumatismes inflammatoires is a translation for Reuma vzw from FR into NL.
The domain is: healthcare.
It's 2131 words long, with a rate of 0.1 € per word.
It started on 2020-09-24 and was due on 2020-10-15, so I had 21 days for it. I needed to translate 101.0 words per day to meet the deadline.
There is no translation memory
----
Handboek voor studentenvertegenwoordigers is a translation for KU Leuven from NL into EN.
The domain is: education.
It's 3654 words long, with a rate of 0.15 € per word.
It started on 2023-02-21 and was due on 2023-03-02, so I had 9 days for it. I needed to translate 406.0 words per day to meet the deadline.
There is a translation memory.
----
User Guide MFPs is a translation for UGent from EN into NL.
The domain is unspecified.
It's 1852 words long, with a rate of 0.15 € per word.
It started on 2023-04-14 and was due on 2023-04-16, so I had 2 days for it. I needed to translate 926.0 words per day to meet the deadline.
Ther

## Expanding and improving the script for a translation agency
- Actually putting the translator's identity in the printed project information (since the database now contains projects handled by various translators).
- Adding a `revisor` and `status` attribute.
- No longer converting the ISO-format strings for the start date and deadline into dates at the instance attribute level; instead, the extra computed attributes `self.st` and `self.dl` hold the dates for calculations (this makes calling the `start` and `deadline` attributes more user-friendly).

In [8]:
class Translation_agency:
    translator = "Internal" # class attribute
    revisor = "Internal"
    status = "created"

    def __init__(self, title, client, source, target, words, start, deadline, price, tm, domain = ''):
        # 'self' represents the object (= class element) itself
        self.title = title
        self.client = client
        self.source = source
        self.target = target
        self.words = words
        self.start = start
        self.deadline = deadline
        self.price = price
        self.tm = tm
        self.domain = domain
                
        today = datetime.date.today() # current date
        self.st = datetime.date.fromisoformat(start) # turns string into date
        self.dl = datetime.date.fromisoformat(deadline) # turns string into date
        self.daysleft = self.dl - today # difference between deadline and current date
        self.length = self.dl - self.st # difference between deadline and start date
        self.rate = self.price/self.words # word rate (price divided by word count)
        self.efficiency = words/self.length.days # words to translate per day (word count divided by project length, see 'length' in explanations above)

    def days_left(self):
        # returns a text indicating how many days are left until the project deadline
        if self.dl < datetime.date.today():
            # if the deadline is in the past
            return f"The deadline has been exceeded already."
        else:
            # if the deadline is not in the past
            return f"There are {self.daysleft.days} days left until the deadline."
    
    def project_length(self):
        return f"{self.length.days} days"
    
    def __str__(self):
        # defines the print behaviour: returns a text providing the main information about the project
        sent_1 = f"{self.title} is a translation for {self.client} from {self.source} into {self.target}."
        if self.translator == "Internal" and self.revisor == "Internal":
            sent_2 = f"Both the translator and the revisor are agency collaborators."
        elif self.translator == "Internal" and self.revisor != "Internal":
            sent_2 = f"The translator is an agency collaborator and the revisor is {self.revisor}."
        elif self.translator != "Internal" and self.revisor == "Internal":
            sent_2 = f"The translator is {self.translator} and the revisor is an agency collaborator."
        else:
            sent_2 = f"The translator is {self.translator} and the revisor is {self.revisor}."
        # this if-statement considers whether a domain was added
        if len(self.domain) > 0:
            sent_3 = f"The domain is: {self.domain}."
        else:
            sent_3 = "The domain is unspecified." # if no domain was added, the text mentions it
        sent_4 = f"It's {self.words} words long, with a rate of {round(self.rate, 2)} € per word." #the word rate is rounded to two decimal places to avoid cumbersomely long numbers
        # this if-statement considers whether the deadline is in the past
        if self.dl < datetime.date.today():
            sent_5 = f"It started on {self.st} and was due on {self.dl}, so {self.length.days} days were foreseen for it. To meet the deadline, {round(self.efficiency)} words needed to be translated or revised per day." # the efficiency is rounded to units because you can't translate a fraction of a word anyway
        else:
            sent_5 = f"It started on {self.st} and is due on {self.dl}, so {self.length.days} days are foreseen for it, of which {self.daysleft.days} left. To meet the deadline, {round(self.efficiency)} words need to be translated or revised per day."
        # this if-statement considers whether there is a translation memory for the project
        sent_6 = f"There is {'a' if self.tm else 'no'} translation memory."
        sent_7 = f"The project is currently {self.status}."
        # print each sentence in a different line
        return "\n".join([sent_1, sent_2, sent_3, sent_4, sent_5, sent_6, sent_7])
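
The class attributes `translator`, `revisor` and `status` work as defaults: assigning to one of them on an instance creates an instance attribute that shadows the class-level value, without changing it for other projects. A minimal sketch with a stripped-down, hypothetical `Project` class (not part of the script):

```python
class Project:
    status = "created"  # class attribute, shared default

    def __init__(self, title):
        self.title = title

first = Project("Handboek")
second = Project("User Guide")

first.status = "delivered"  # creates an instance attribute on 'first' only

print(first.status)    # delivered
print(second.status)   # created (still the class default)
print(Project.status)  # created
```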

In [9]:
rhumatismes_inflammatoires = {
    'translator' : 'Internal',
    'revisor' : 'Internal',
    'status' : 'delivered',
    'title' : 'La polyarthrite rhumatoïde et autres rhumatismes inflammatoires',
    'client' : 'Reuma vzw',
    'source' : 'FR',
    'target' : 'NL',
    'words' : 2131,
    'start' : '2020-09-24',
    'deadline' : '2020-10-15',
    'price' : 210,
    'tm' : False,
    'domain' : 'healthcare'
}
handboek = {
    'translator' : 'Sibylle',
    'revisor' : 'Internal',
    'status' : 'delayed',
    'title' : 'Handboek voor studentenvertegenwoordigers',
    'client' : 'KU Leuven',
    'source' : 'NL',
    'target' : 'EN',
    'words' : 3654,
    'start' : '2023-02-21',
    'deadline' : '2023-03-02',
    'price' : 540,
    'tm' : True,
    'domain' : 'education'
}
user_guide = {
    'translator' : 'Internal',
    'revisor' : 'Sibylle',
    'status' : 'cancelled',
    'title' : 'User Guide MFPs',
    'client' : 'UGent',
    'source' : 'EN',
    'target' : 'NL',
    'words' : 1852,
    'start' : '2023-04-12',
    'deadline' : '2023-04-14',
    'price' : 280,
    'tm' : True,
    'domain' : ''
}
guide_bruxelles = {
    'translator' : 'Sibylle',
    'revisor' : 'Natacha',
    'status' : 'in revision',
    'title' : 'Guide de Bruxelles',
    'client' : 'Foodies',
    'source' : 'NL',
    'target' : 'FR',
    'words' : 11500,
    'start' : '2023-04-06',
    'deadline' : '2023-05-27',
    'price' : 1610,
    'tm' : False,
    'domain' : ''
}

In [10]:
translation_projects = [rhumatismes_inflammatoires, handboek, user_guide, guide_bruxelles]

In [11]:
with open('translation_agency_projects.json', 'w', encoding='utf-8') as f:
    json.dump(translation_projects, f)
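
By default, `json.dump` escapes non-ASCII characters, so a title containing 'ï' is stored with a `\u00ef` escape. Loading the file gives back the same string either way, but passing `ensure_ascii=False` keeps the file itself human-readable. A small sketch using `json.dumps` on a one-key dictionary (the `project` dictionary is a shortened stand-in):

```python
import json

project = {"title": "La polyarthrite rhumatoïde"}

escaped = json.dumps(project)                       # non-ASCII characters become \uXXXX escapes
readable = json.dumps(project, ensure_ascii=False)  # characters are written as-is

print(escaped)
print(readable)

# Both forms decode to the same dictionary
assert json.loads(escaped) == json.loads(readable) == project
```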

In [12]:
translation_agency_file = 'translation_agency_projects.json' # assign filename to a string variable
with open(translation_agency_file, encoding = 'utf-8') as f:
    # open file and use json to parse it
    translations_agency = json.load(f) # translations_agency is now a list of dictionaries.

In [13]:
# go through each of the items in the list
for translation in translations_agency:
    # create a Translation_agency instance with title, client, source, target, words, start, deadline, price, tm and domain
    my_translation2 = Translation_agency(translation['title'], translation['client'], translation['source'], translation['target'], translation['words'], translation['start'], translation['deadline'], translation['price'], translation['tm'], translation['domain'])
    if translation['translator'] != "Internal":
        my_translation2.translator = translation['translator']
    if translation['revisor'] != "Internal":
        my_translation2.revisor = translation['revisor']
    if translation['status'] != "created":
        my_translation2.status = translation['status'] 
    # print the project information
    print(my_translation2)
    
    # print a separating line between translations
    print('----')

La polyarthrite rhumatoïde et autres rhumatismes inflammatoires is a translation for Reuma vzw from FR into NL.
Both the translator and the revisor are agency collaborators.
The domain is: healthcare.
It's 2131 words long, with a rate of 0.1 € per word.
It started on 2020-09-24 and was due on 2020-10-15, so 21 days were foreseen for it. To meet the deadline, 101 words needed to be translated or revised per day.
There is no translation memory.
The project is currently delivered.
----
Handboek voor studentenvertegenwoordigers is a translation for KU Leuven from NL into EN.
The translator is Sibylle and the revisor is an agency collaborator.
The domain is: education.
It's 3654 words long, with a rate of 0.15 € per word.
It started on 2023-02-21 and was due on 2023-03-02, so 9 days were foreseen for it. To meet the deadline, 406 words needed to be translated or revised per day.
There is a translation memory.
The project is currently delayed.
----
User Guide MFPs is a translation for UGent 

## Left to do
- Add validation when updating the status: only allow 'created', 'in translation', 'in revision', 'delivered', 'delayed' and 'cancel(l)ed'.
- Add validation for all the other attributes (not accept 'internal' for translator and revisor, only 'Internal').
- Create a second class `Freelancers` fed by a JSON file containing a freelancer database, and use references to the database rather than strings (names) for external translators and revisors (__how?__).
    - Last name
    - First name
    - Phone number
    - E-mail address
    - Project count
- Use regex to check e-mail address and phone number in the freelancer database.
- Implement argparse.
- Document the script with docstrings.
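
For the first two items, one possible approach is a pair of small validation helpers. This is only a sketch: the names `ALLOWED_STATUSES`, `validate_status` and `validate_person` are hypothetical and not part of the script above.

```python
# Statuses listed in the to-do item, with both spellings of 'cancel(l)ed'
ALLOWED_STATUSES = {
    "created", "in translation", "in revision",
    "delivered", "delayed", "cancelled", "canceled",
}

def validate_status(status):
    """Return the status unchanged if it is allowed, otherwise raise ValueError."""
    if status not in ALLOWED_STATUSES:
        raise ValueError(
            f"Unknown status {status!r}; expected one of {sorted(ALLOWED_STATUSES)}"
        )
    return status

def validate_person(name):
    """Accept any name, but reject spellings of 'Internal' with the wrong casing."""
    if name.lower() == "internal" and name != "Internal":
        raise ValueError("Use 'Internal' (capitalised) for agency collaborators.")
    return name

print(validate_status("in revision"))
print(validate_person("Sibylle"))
```

These helpers could then be called wherever `status`, `translator` or `revisor` is assigned, for instance in the loop that overrides the class defaults.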

# Extra: Source and target text aligner
Sometimes, you still have some translations left over from a time when you didn't use CAT-tools and you'd like to feed them into your translation memory. Some CAT-tools have built-in text aligners, but not all of them, so how do you go from two separate text documents to an aligned bilingual (csv-)file ready to be fed into your TM?

## Step one: Prepare the source and target text
The easiest file format to start from is a pure txt-file... and since for a TM only the pure text is of interest, converting a Word-, PowerPoint- or whatever file to a txt-file isn't an issue. So, we'll take the original source and target document and export them to a txt-format (with utf-8 encoding).

## Step two: Store the two (continuous) texts into variables

In [14]:
# Source text
with open('python_en.txt', encoding = 'utf-8') as f: # the file is closed automatically after reading
    st_1 = f.read()
st_1

'Introduction to Machine Learning with Python.\n\nThis module provides an introduction to the basic concepts and use of the Python programming language in support of translation. Focus lies on the main concepts that include Natural Language Processing, automation, text analysis and machine learning.\n'

In [15]:
# Target text
with open('python_fr.txt', encoding = 'utf-8') as f: # the file is closed automatically after reading
    tt_1 = f.read()
tt_1

'Introduction au machine learning à l’aide de Python.\n\nCe module offre une introduction aux concepts de base et à l’utilisation du langage de programmation Python comme aide à la traduction. L’accent est mis sur le traitement du langage naturel (NLP), l’automatisation, l’analyse de texte et le machine learning.\n'

## Step three: Split the single text string into list of sentences
Since most TMs (and CAT-tools) use sentence segmentation, the source and target text need to be split up into sentences. So, each text becomes a list of separate sentences.

For this, we use NLTK's `sent_tokenize` function, which supports English and French (among many other languages, but these two are the ones that interest us right now).

In [16]:
# Source text
from nltk.tokenize import sent_tokenize
split_st_1 = sent_tokenize(st_1, language = 'english')
split_st_1

['Introduction to Machine Learning with Python.',
 'This module provides an introduction to the basic concepts and use of the Python programming language in support of translation.',
 'Focus lies on the main concepts that include Natural Language Processing, automation, text analysis and machine learning.']

In [17]:
# Target text
from nltk.tokenize import sent_tokenize
split_tt_1 = sent_tokenize(tt_1, language = 'french')
split_tt_1

['Introduction au machine learning à l’aide de Python.',
 'Ce module offre une introduction aux concepts de base et à l’utilisation du langage de programmation Python comme aide à la traduction.',
 'L’accent est mis sur le traitement du langage naturel (NLP), l’automatisation, l’analyse de texte et le machine learning.']

## Step four: Aligning those lists and exporting the tuples list to a csv-file
For this step, we'll need the `csv` module:

In [18]:
import csv

A csv-file consists of rows, often a first header row with the label of each column, followed by the actual content of the file. So, in the first two lines we need to define what each row will contain.
- The content of the header row (stored in the variable `header`) simply consists of the language codes of the source and target language.
- The content of the next rows (stored in the variable `tm_1`) contains our texts.
    - The `zip()`-function aligns the first sentence of the source text with the first sentence of the target text, the second with the second... and so on.
    - The `list()`-function ensures that the `tm_1`-variable contains a list of tuples rather than a one-shot iterator (`zip()` returns a lazy iterator, not a list).
    
Once we know what will go into the file, it's time to actually write the file.
- We open a new file, which gets a name, the `'w'` mode (meaning that the file is opened for writing) and an encoding (utf-8).
- Then, we define a csv-writer (very creatively called `write`).
- Lastly, we write the first row (the header) followed by the rest (the actual TM).

In [19]:
header = ['EN', 'FR'] # header row
tm_1 = list(zip(split_st_1, split_tt_1)) # rest of the file

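with open('translation-memory_1.csv', 'w', encoding = 'utf-8', newline = '') as f: # newline = '' lets the csv module handle line endings itself (avoids blank rows on Windows)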
    write = csv.writer(f)
    
    write.writerow(header)
    write.writerows(tm_1)
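
One caveat with `zip()`: it stops at the shorter list, so if the sentence splitter finds a different number of sentences in the source and the target text, the surplus sentences are silently dropped. A minimal sketch, with made-up sentence lists, of how `itertools.zip_longest` can make such a mismatch visible before writing the file:

```python
from itertools import zip_longest

# Hypothetical sentence lists of unequal length
source_sentences = ["One.", "Two.", "Three."]
target_sentences = ["Un.", "Deux."]

pairs_short = list(zip(source_sentences, target_sentences))  # stops after two pairs
pairs_full = list(zip_longest(source_sentences, target_sentences, fillvalue=""))

# Pairs in which one side is missing and needs manual attention
unmatched = [pair for pair in pairs_full if "" in pair]

print(len(pairs_short), len(pairs_full))
print(unmatched)
```

An unequal count usually means that a sentence was split or merged in translation, so the pairs after that point may be misaligned as well.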

## Step five: Admiring our work
Using pandas, we can read the newly created csv-file.

In [20]:
import pandas as pd

In [21]:
read_tm = pd.read_csv('translation-memory_1.csv', sep = ',')
read_tm.head()

| | EN | FR |
|---|---|---|
| 0 | Introduction to Machine Learning with Python. | Introduction au machine learning à l’aide de P... |
| 1 | This module provides an introduction to the ba... | Ce module offre une introduction aux concepts ... |
| 2 | Focus lies on the main concepts that include N... | L’accent est mis sur le traitement du langage ... |


(Next step: figuring out how to make pandas display the whole text.)
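
One possible way in: pandas has a `display.max_colwidth` option that controls how much of each cell is shown. Alternatively, the standard `csv` module prints complete rows without any truncation. The sketch below reads from an in-memory stand-in for `translation-memory_1.csv`; the `csv_text` content is an assumption for illustration and contains only the first row.

```python
import csv
import io

# Stand-in for the contents of 'translation-memory_1.csv' (header plus one row)
csv_text = (
    "EN,FR\n"
    "Introduction to Machine Learning with Python.,"
    "Introduction au machine learning à l’aide de Python.\n"
)

# DictReader uses the header row as keys, so each row maps 'EN' and 'FR' to a sentence
rows = list(csv.DictReader(io.StringIO(csv_text)))

for row in rows:
    print(row["EN"])
    print(row["FR"])
    print("----")
```

To read the real file instead, the `io.StringIO(csv_text)` part would be replaced by a file object opened with `encoding = 'utf-8'` and `newline = ''`.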

## Step six: Importing the csv-file into a CAT-tool TM

(I would need to switch to Windows to show that.)