Python script to process XML files created by the 'Qualitivity' plugin for Trados Studio. The script creates CSV file(s) with details of the language translation process including keystroke counts, count and duration of pauses in typing, and the duration of 'Records' or segment visits.

ilrt/translation-xml-to-cvs

Introduction

A Python script to process XML files created by the 'Qualitivity' plugin for Trados Studio. The script creates CSV file(s) with details of the language translation process including keystroke counts, count and duration of pauses in typing, and the duration of 'Records' or segment visits.

The script uses pandas (https://pandas.pydata.org/) for the creation of the CSV file and NumPy (https://numpy.org/) for date parsing and the calculations of milliseconds.
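As an illustration of the millisecond calculations mentioned above, the following is a minimal sketch of how NumPy can parse two timestamps and return the elapsed milliseconds. The function name `elapsed_ms` and the timestamp format (ISO 8601 without a time zone offset) are assumptions made for this example; the actual script and the Qualitivity XML may differ.

```python
import numpy as np


def elapsed_ms(started: str, stopped: str) -> int:
    """Return the whole milliseconds between two ISO 8601 timestamps.

    Hypothetical helper: the timestamp format is assumed, not taken
    from the Qualitivity XML schema.
    """
    start = np.datetime64(started, "ms")  # parse at millisecond precision
    stop = np.datetime64(stopped, "ms")
    # Dividing by a 1 ms timedelta turns the difference into a plain number.
    return int((stop - start) / np.timedelta64(1, "ms"))


print(elapsed_ms("2021-03-01T10:15:00.250", "2021-03-01T10:15:02.000"))
```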

Setup

Set up a virtual environment:

python3 -m venv venv
source venv/bin/activate

Install libraries used by the script:

pip install --upgrade pip
pip install -r requirements.txt

If running on Windows, see https://docs.python.org/3/tutorial/venv.html

Running the script

The script takes two arguments: a directory containing the input XML files and a directory for the output files. The script creates the output directory if it does not already exist.
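The argument handling described above could be sketched roughly as follows. This is not the code from process_xml.py; the names `parse_args`, `prepare`, `input_dir` and `output_dir` are hypothetical, and the assumption that input files end in `.xml` is ours.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse the two positional directory arguments (hypothetical names)."""
    parser = argparse.ArgumentParser(
        description="Convert Qualitivity XML files to CSV."
    )
    parser.add_argument("input_dir", type=Path, help="directory of input XML files")
    parser.add_argument("output_dir", type=Path, help="directory for the CSV output")
    return parser.parse_args(argv)


def prepare(argv=None):
    """Create the output directory if needed and list the XML files to process."""
    args = parse_args(argv)
    # exist_ok=True makes this a no-op when the directory is already there.
    args.output_dir.mkdir(parents=True, exist_ok=True)
    # Assumption: input files use the .xml extension.
    xml_files = sorted(args.input_dir.glob("*.xml"))
    return args.output_dir, xml_files
```

Calling `prepare(["./sample", "./output"])` would return the output path and the sorted list of XML files found in `./sample`.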

The following command creates two files for each input XML file. The first holds the duration and count values for each 'Record' in the source XML. The second is an 'audit' file that lists every interval between keystrokes without categorisation. The 'audit' file also includes information about any 'system keystrokes' omitted from the main output files, that is, keystrokes logged by Trados Studio itself (e.g. when a segment is automatically populated with a machine translation match).

python process_xml.py ./sample ./output

To combine the results into just two files, 'combined.csv' and 'combined-audit.csv', pass the --combine option. The first file holds the duration and count values and the second provides the 'audit'. The combined files have an additional column, 'File', that identifies the source XML file.

python process_xml.py ./sample ./output --combine
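One way the combining step might work with pandas is sketched below: each per-file table gets a 'File' column and the tables are then concatenated. The function name `combine`, the sample data and the column names other than 'File' are assumptions for illustration, not the script's actual implementation or output.

```python
import pandas as pd


def combine(frames_by_file):
    """Concatenate per-file tables, tagging each row with its source XML file.

    frames_by_file: dict mapping a source file name to its DataFrame.
    """
    tagged = [df.assign(File=name) for name, df in frames_by_file.items()]
    return pd.concat(tagged, ignore_index=True)


# Made-up example data; real values come from the Qualitivity XML files.
per_file = {
    "a.xml": pd.DataFrame({"Record ID": [1, 2], "Keystrokes": [10, 4]}),
    "b.xml": pd.DataFrame({"Record ID": [1], "Keystrokes": [7]}),
}
combined = combine(per_file)
print(combined)
# combined.to_csv("output/combined.csv", index=False) would write the file.
```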

Translation process measures

The features for each 'Record' in the Qualitivity XML files are as follows. Each pause measure is reported for three minimum pause duration thresholds: 300 milliseconds and above (_300), 500 milliseconds and above (_500), and 1 second and above (_1s).

  • Record ID: The ID for each 'Record' in the Qualitivity output
  • Segment ID: The ID for each text segment
  • Total pause duration: The total duration of pauses - milliseconds.
  • Pause count: The number of pauses - count.
  • Keystrokes: The number of keystrokes ('ks created' elements) - count.
  • Active ms: The duration of each 'Record' copied from the 'activeMiliseconds' attribute in the Qualitivity XML file - milliseconds.
  • Record duration: The duration of each 'Record' computed as the difference between 'stopped' and 'started' times for the record - milliseconds.
  • Total duration: Equivalent to Record duration, but obtained by adding up all intervals between consecutive keystrokes, including the intervals at the beginning and end of a record - milliseconds.
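
The measures listed above can be illustrated with a short sketch. It assumes that a pause is any interval, between consecutive keystrokes or between a keystroke and the start or end of the record, that is at least as long as the threshold. The function name `pause_measures`, the dictionary keys and the timestamps are hypothetical and may not match the script's actual logic or column names.

```python
import numpy as np

# Minimum pause durations in milliseconds, keyed by the suffixes used above.
THRESHOLDS_MS = {"_300": 300, "_500": 500, "_1s": 1000}


def pause_measures(started, stopped, keystrokes):
    """Compute keystroke, duration and pause measures for one record.

    Assumption: intervals at the start and end of the record are treated
    like intervals between keystrokes when looking for pauses.
    """
    times = np.array([started, *keystrokes, stopped], dtype="datetime64[ms]")
    # Intervals between consecutive events, as plain millisecond numbers.
    intervals = np.diff(times) / np.timedelta64(1, "ms")
    measures = {
        "Keystrokes": len(keystrokes),
        "Total duration": int(intervals.sum()),
    }
    for suffix, minimum in THRESHOLDS_MS.items():
        pauses = intervals[intervals >= minimum]
        measures["Pause count" + suffix] = int(pauses.size)
        measures["Total pause duration" + suffix] = int(pauses.sum())
    return measures


result = pause_measures(
    "2021-03-01T10:00:00.000",
    "2021-03-01T10:00:02.000",
    ["2021-03-01T10:00:00.200", "2021-03-01T10:00:00.600", "2021-03-01T10:00:01.800"],
)
print(result)
```

In this made-up record the intervals are 200, 400, 1200 and 200 milliseconds, so two intervals count as pauses at the 300 ms threshold and one at the 500 ms and 1 s thresholds.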
