**DEV**:
* **where to run**: thesis-remote (Workspace) [SSH: login*.pegasus.kl.dfki.de]
* **kernel**: appraise-env (Python 3.10.6)
* **directory**: `~/Appraise`
* **prerequisite**: the Appraise server (`manage.py runserver`) must be running while these cells are executed

**PROD**:
* **where to run**: um-server (Workspace) [SSH: um]
* **kernel**: Python 3.8.10

In [1]:
import sys
sys.path.append("/home/juliafalcao/")
sys.path.append("/home/juliafalcao/experiments")

from experiments.constants import *
from experiments.utils import *

import json
import os  # used below for os.listdir; imported explicitly instead of relying on the star imports
from collections import OrderedDict
from random import randint



In [2]:
# SRC, TGT = SPANISH, BASQUE
SRC, TGT = ENGLISH, MALTESE

LP_FOLDER = f"/home/juliafalcao/thesis_data/{SRC}-{TGT}"
BATCHES_FILENAME = f"batches.{SRC}-{TGT}.json"
BATCHES_PATH = f"{LP_FOLDER}/{BATCHES_FILENAME}"
print("lp folder:", LP_FOLDER)
print("batches filename:", BATCHES_FILENAME)

lp folder: /home/juliafalcao/thesis_data/en-mt
batches filename: batches.en-mt.json


## Create batches

In [22]:
# configuration: item counts per batch, joined later as TGT:CHK:REF:BAD
_task_definition = OrderedDict({
    "TGT": 80,
    "CHK": 0,
    "REF": 10,
    "BAD": 10,
})
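
The four counts appear to describe how many items of each type go into one batch, and in this notebook they add up to the `100` passed to `CreateDirectAssessmentData`. A minimal sketch of a sanity check, assuming the counts are expected to sum to the batch size (`BATCH_SIZE` and `check_task_definition` are hypothetical names, not part of Appraise):

```python
from collections import OrderedDict

# Same counts as in the configuration cell above.
task_definition = OrderedDict({"TGT": 80, "CHK": 0, "REF": 10, "BAD": 10})

# Assumption: this is the batch size given as the first argument of the command.
BATCH_SIZE = 100


def check_task_definition(definition, batch_size):
    """Return the colon-joined counts, raising ValueError if they are inconsistent."""
    counts = list(definition.values())
    if any(not isinstance(count, int) or count < 0 for count in counts):
        raise ValueError(f"counts must be non-negative integers: {counts}")
    if sum(counts) != batch_size:
        raise ValueError(f"counts sum to {sum(counts)}, expected {batch_size}")
    return ":".join(map(str, counts))


task_definition_string = check_task_definition(task_definition, BATCH_SIZE)
print(task_definition_string)  # 80:0:10:10
```

Failing early here is cheaper than discovering a mismatch in the command's output.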


In [23]:
TASK_DEFINITION = ":".join(map(str, _task_definition.values()))
SRC_FILE = f"{LP_FOLDER}/src.{SRC}"
REF_FILE = f"{LP_FOLDER}/ref.{TGT}"
SYSTEMS_FOLDER = f"{LP_FOLDER}/systems/"

print("src file:", SRC_FILE)
print("ref file:", REF_FILE)
print("/systems:", os.listdir(SYSTEMS_FOLDER))

src file: /netscratch/falcao/data/es-eu/eval-set/src.es
ref file: /netscratch/falcao/data/es-eu/eval-set/ref.eu
/systems: ['cmbacktrans.eu', 'nllb.eu', 'euskadi.eu']
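
Before creating batches it may help to confirm that the language-pair folder has the layout used above (`src.<SRC>`, `ref.<TGT>` and a non-empty `systems/` folder). A small sketch of such a check; `missing_inputs` is a hypothetical helper, and the layout is taken only from the paths built in this notebook:

```python
from pathlib import Path


def missing_inputs(lp_folder, src, tgt):
    """Return the expected input paths that are missing under lp_folder."""
    folder = Path(lp_folder)
    systems = folder / "systems"
    expected = [
        folder / f"src.{src}",  # source segments
        folder / f"ref.{tgt}",  # reference translations
        systems,                # one output file per MT system
    ]
    missing = [path for path in expected if not path.exists()]
    # An existing but empty systems folder is reported as well.
    if systems.is_dir() and not any(systems.iterdir()):
        missing.append(systems)
    return missing
```

Calling `missing_inputs(LP_FOLDER, SRC, TGT)` should return an empty list when everything is in place.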


In [24]:
# run the Django management command in a shell; IPython substitutes $NAME and $NAME.attr with the Python values
! python manage.py CreateDirectAssessmentData \
    100 \
    $SRC.code3 \
    $TGT.code3 \
    $LP_FOLDER/src.$SRC \
    $LP_FOLDER/ref.$TGT \
    $SYSTEMS_FOLDER \
    $BATCHES_PATH \
    --task-definition $TASK_DEFINITION \
    --required-annotations 3 \
    --source-based \
    --all-batches

Using task definition: (80, 0, 10, 10)
Loaded 400 source segments
Loaded 400 reference segments
character_based = False
Loaded 400 system nllb.eu segments
Loaded 400 system cmbacktrans.eu segments
Loaded 400 system euskadi.eu segments
Creating /netscratch/falcao/data/es-eu/eval-set/batches.es-eu.json.segments ... OK
Missing items is 8/80/1192
Added 8 missing items rotating keys
Total number of batches is 15
0 10 10
chk_items: 0
ref_items: 10
bad_items: 10
chk_ids: []
ref_ids: [31, 11, 25, 44, 47, 36, 37, 33, 35, 16]
bad_ids: [42, 39, 20, 0, 45, 26, 13, 10, 1, 4]
empty_slots [52, 53, 55, 56, 57, 58, 59, 62, 64, 65, 67, 68, 69, 71, 72, 73, 74, 77, 78, 79, 80, 82, 84, 88, 90, 91, 93, 96, 98, 99]
len(batch_items): 100
len(batch_items) == None: 0
0 10 10
chk_items: 0
ref_items: 10
bad_items: 10
chk_ids: []
ref_ids: [31, 8, 1, 22, 6, 9, 15, 11, 19, 41]
bad_ids: [20, 35, 46, 26, 43, 0, 5, 12, 32, 10]
empty_slots [52, 53, 54, 57, 63, 64, 66, 67, 68, 71, 73, 74, 75, 77, 78, 79, 80, 83, 84, 86, 
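
The batch-creation call above relies on IPython's `!` syntax. Outside a notebook, the same command could be assembled as an argument list, which avoids shell quoting issues. This is a sketch only: `build_create_batches_command` is a hypothetical helper, the first positional argument is read here as the batch size, and the flags are copied from the cell above rather than checked against Appraise.

```python
import shlex


def build_create_batches_command(batch_size, src_code3, tgt_code3, lp_folder,
                                 src, tgt, task_definition, required_annotations=3):
    """Assemble the argument list for the CreateDirectAssessmentData call."""
    return [
        "python", "manage.py", "CreateDirectAssessmentData",
        str(batch_size),
        src_code3,
        tgt_code3,
        f"{lp_folder}/src.{src}",
        f"{lp_folder}/ref.{tgt}",
        f"{lp_folder}/systems/",
        f"{lp_folder}/batches.{src}-{tgt}.json",
        "--task-definition", task_definition,
        "--required-annotations", str(required_annotations),
        "--source-based",
        "--all-batches",
    ]


command = build_create_batches_command(
    100, "eng", "mlt", "/home/juliafalcao/thesis_data/en-mt", "en", "mt", "80:0:10:10"
)
print(shlex.join(command))
# To execute it from the Appraise directory:
# import subprocess; subprocess.run(command, check=True)
```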

## Create campaign

In [3]:
# configuration

CAMPAIGN_NAME = f"{SRC.code3.capitalize()}{TGT.code3.capitalize()}V1"
print("campaign name:", CAMPAIGN_NAME)

campaign name: EngMltV1


In [4]:
manifest = {
    "CAMPAIGN_URL": "http://127.0.0.1:8000/dashboard/sso/",
    "CAMPAIGN_NAME": CAMPAIGN_NAME,
    "CAMPAIGN_KEY": CAMPAIGN_NAME,
    "CAMPAIGN_NO": randint(0, 100),
    "REDUNDANCY": 1,

    "TASKS_TO_ANNOTATORS": [
        [ SRC.code3, TGT.code3, "uniform", 1, 1 ]
    ]
}

# TASKS_TO_ANNOTATORS must be a list of lists
assert isinstance(manifest["TASKS_TO_ANNOTATORS"], list)
assert isinstance(manifest["TASKS_TO_ANNOTATORS"][0], list)
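
A slightly fuller check of the manifest could look like the sketch below. It assumes that each `TASKS_TO_ANNOTATORS` entry holds five values (source code, target code, assignment mode, number of annotators, number of tasks); that reading of the last two positions is an assumption, and `validate_manifest` is a hypothetical helper that only checks the keys used in this notebook.

```python
def validate_manifest(manifest):
    """Return a list of problems found in a campaign manifest dict (empty if none)."""
    problems = []
    for key in ("CAMPAIGN_URL", "CAMPAIGN_NAME", "CAMPAIGN_KEY", "CAMPAIGN_NO",
                "REDUNDANCY", "TASKS_TO_ANNOTATORS"):
        if key not in manifest:
            problems.append(f"missing key: {key}")
    entries = manifest.get("TASKS_TO_ANNOTATORS", [])
    if not isinstance(entries, list) or not entries:
        problems.append("TASKS_TO_ANNOTATORS must be a non-empty list")
        return problems
    for index, entry in enumerate(entries):
        if not isinstance(entry, list) or len(entry) != 5:
            problems.append(f"entry {index} must be a list of 5 values")
            continue
        src, tgt, mode, annotators, tasks = entry
        if not all(isinstance(value, str) for value in (src, tgt, mode)):
            problems.append(f"entry {index}: first three values must be strings")
        # Assumed meaning: positions 4 and 5 are annotator and task counts.
        if not all(isinstance(value, int) and value > 0 for value in (annotators, tasks)):
            problems.append(f"entry {index}: last two values must be positive integers")
    return problems


example_manifest = {
    "CAMPAIGN_URL": "http://127.0.0.1:8000/dashboard/sso/",
    "CAMPAIGN_NAME": "EngMltV1",
    "CAMPAIGN_KEY": "EngMltV1",
    "CAMPAIGN_NO": 1,  # fixed here for reproducibility; the notebook draws it at random
    "REDUNDANCY": 1,
    "TASKS_TO_ANNOTATORS": [["eng", "mlt", "uniform", 1, 1]],
}
print(validate_manifest(example_manifest))  # []
```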

In [5]:
MANIFEST_PATH = f"{LP_FOLDER}/manifest.json"

with open(MANIFEST_PATH, mode="w", encoding="utf-8") as f:
    json.dump(manifest, f, indent=4)

In [8]:
! python3 manage.py StartNewCampaign \
    $MANIFEST_PATH \
    --batches-json $BATCHES_PATH

JSON manifest path: '/home/juliafalcao/thesis_data/en-mt/manifest.json'
CSV output path: None
Excel output path: None
No task type found in the manifest file, assuming it is "Direct". If this is incorrect, define "TASK_TYPE" in the manifest file.
### Running InitCampaign
All languages: [('eng', 'mlt')]
Identified superuser: falcao
Processed Market/Metadata instances
### Creating a new campaign
- '/home/juliafalcao/thesis_data/en-mt/batches.en-mt.json'
Batch: /home/juliafalcao/thesis_data/en-mt/batches.en-mt.json
  Market: eng_mlt_EngMltV1
  Metadata: eng->mlt/EngMltV1["1.0"]
Uploaded file name: Batches/batches.en-mt.json
Campaign name: EngMltV1
### Running validatecampaigndata
Campaign name: EngMltV1
Batch name: Batches/batches.en-mt.json
Validated 1 batches
### Running ProcessCampaignData
Batches/batches.en-mt.json 1
6 ref.mt
153 b'\xc4\xa6afna nazzjonijiet s\xc4\xa7a\xc4\xa7 huma kompletament fluwenti bl-Ingli\xc5\xbc, u f\xe2\x80\x99sa\xc4\xa7ansitra \xc4\xa7afna iktar tista\xe2\x80