# Argument(ation) mining

## Table of Contents

- [Argument Components](#arg_comp)
    * [Load Data](#load_data)
    * [Components distribution](#comp_dist)
    * [Transformer-based Argument Mining for Healthcare Applications](#ecai2020)
    * [Argument components classification results](#comp_clf)
        - Results: pre-problem identification
        - Results: post-problem identification
* [Translation and projections](#translation_projection)
    - Translations
    - Projections
- [Zero-shot](#zero-shot)
    - Train (ES) + test (ES)
    - BETO
    - Data augmentation results => train (EN + ES), test (ES), test (EN)
    
* [Relation Classification](#rel_clf)
    - Results

<a class='anchor' id='arg_comp'></a>
# Argument Components 

In [3]:
import pandas as pd
import glob
import os

<a id='load_data'></a>
### Load data 

Load AbstRCT data that contains .txt and .ann files for each category of disease 

In [5]:
neoplasm_trn = 'abstrct-master/AbstRCT_corpus/data/train/neoplasm_train/'
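# NOTE: as written, neoplasm_tst points to the dev directory and neoplasm_dv
# to the test directory, so the frames loaded from them hold the opposite split.
neoplasm_tst = 'abstrct-master/AbstRCT_corpus/data/dev/neoplasm_dev/'
neoplasm_dv = 'abstrct-master/AbstRCT_corpus/data/test/neoplasm_test/'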
mixed = 'abstrct-master/AbstRCT_corpus/data/test/mixed_test/'
glaucoma = 'abstrct-master/AbstRCT_corpus/data/test/glaucoma_test/'


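def get_annotated_data(url):
    """Read every .ann file under `url` into a single DataFrame."""
    files = glob.glob(url+"/*.ann")

    frames = []
    for f in files:
        df_temp = pd.read_csv(f, sep='\t', header = None)
        # Tag only the rows of this file; assigning on the accumulated
        # frame would overwrite the column for all earlier files.
        df_temp['url'] = f[50:-4]
        frames.append(df_temp)
    # DataFrame.append was removed in pandas 2.0; concatenate once instead.
    df = pd.concat(frames) if frames else pd.DataFrame()
    df.dropna(inplace=True)
    df.reset_index(inplace=True)
    return df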

# Neoplasm
neoplasm_train = get_annotated_data(neoplasm_trn)
neoplasm_test = get_annotated_data(neoplasm_tst)
neoplasm_dev = get_annotated_data(neoplasm_dv)

# Glaucoma
glaucoma_test = get_annotated_data(glaucoma)

# Mixed
mixed_test = get_annotated_data(mixed)

In [6]:
neoplasm_train.head()

Unnamed: 0,index,0,1,2,url
0,0,T1,Premise 761 825,The overall remission rate was 87% with 31% co...,-master/AbstRCT_corpus/data/train/neoplasm_tra...
1,1,T2,Premise 826 932,The median survival of all 406 eligible patien...,-master/AbstRCT_corpus/data/train/neoplasm_tra...
2,2,T3,Premise 933 1107,"The overall remission rate, the rate of comple...",-master/AbstRCT_corpus/data/train/neoplasm_tra...
3,3,T4,Premise 1108 1248,In limited disease the estimated percentages o...,-master/AbstRCT_corpus/data/train/neoplasm_tra...
4,4,T5,Premise 1249 1437,Patients with extensive disease survived signi...,-master/AbstRCT_corpus/data/train/neoplasm_tra...


In [7]:
glaucoma_test.head()

Unnamed: 0,index,0,1,2,url
0,0,T1,Premise 683 812,Percentage of IOP reduction or the magnitude o...,-master/AbstRCT_corpus/data/test/glaucoma_test...
1,1,T2,Premise 813 982,"In the visual field, the estimated rate of cha...",-master/AbstRCT_corpus/data/test/glaucoma_test...
2,2,T3,Premise 983 1130,The estimated rate of change in MD showed no s...,-master/AbstRCT_corpus/data/test/glaucoma_test...
3,3,T4,Premise 1131 1292,No changes in the optic nerve head topography ...,-master/AbstRCT_corpus/data/test/glaucoma_test...
4,4,T5,Premise 1293 1378,There were no patients who dropped out due to ...,-master/AbstRCT_corpus/data/test/glaucoma_test...


In [8]:
# Neoplasm

# neoplasm_train.info() #  2267
# neoplasm_test.info() # 326
# neoplasm_dev.info() # 686

# Glaucoma
# glaucoma_test.info() # 594

# Mixed
mixed_test.info()  # 600

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 600 entries, 0 to 599
Data columns (total 5 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   index   600 non-null    int64 
 1   0       600 non-null    object
 2   1       600 non-null    object
 3   2       600 non-null    object
 4   url     600 non-null    object
dtypes: int64(1), object(4)
memory usage: 23.6+ KB


In [9]:
neoplasm_dev.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 686 entries, 0 to 685
Data columns (total 5 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   index   686 non-null    int64 
 1   0       686 non-null    object
 2   1       686 non-null    object
 3   2       686 non-null    object
 4   url     686 non-null    object
dtypes: int64(1), object(4)
memory usage: 26.9+ KB


**Add a column with the argument component type**

In [11]:
def relation_components(df):
    # Column 1 looks like "Premise 761 825"; keep only the component type.
    df['relation_components'] = [c.split()[0] for c in df[1]]

    return df

neoplasm_train = relation_components(neoplasm_train)
neoplasm_test = relation_components(neoplasm_test)
neoplasm_dev = relation_components(neoplasm_dev)

glaucoma_test = relation_components(glaucoma_test)

mixed_test = relation_components(mixed_test)

In [12]:
mixed_test.head()

Unnamed: 0,index,0,1,2,url,relation_components
0,0,T1,Premise 763 1114,"In sildenafil versus placebo arms, week-12 6MW...",-master/AbstRCT_corpus/data/test/mixed_test/28...,Premise
1,1,T2,Premise 1188 1295,Changes in WHO functional class and Borg dyspn...,-master/AbstRCT_corpus/data/test/mixed_test/28...,Premise
2,2,T3,Premise 1296 1363,"Headache, diarrhoea, and flushing were more co...",-master/AbstRCT_corpus/data/test/mixed_test/28...,Premise
3,3,T4,Claim 1364 1497,"Sildenafil, in addition to stable (≥3 months) ...",-master/AbstRCT_corpus/data/test/mixed_test/28...,Claim
4,0,T1,Premise 1287 1583,Responses (> or = 50% improvement) were seen i...,-master/AbstRCT_corpus/data/test/mixed_test/28...,Premise
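
Column 1 packs three values into one string: the component type and the two character offsets of the span. A minimal sketch of how it could be unpacked is given below. `parse_span` is a hypothetical helper, not part of this notebook, and it assumes contiguous spans (if the files follow the brat standoff format, discontinuous spans separated by `;` would need extra handling).

```python
def parse_span(span):
    """Split a string such as 'Premise 763 1114' into (label, start, end).

    Assumption: exactly three whitespace-separated fields, i.e. one
    contiguous character span per annotation.
    """
    label, start, end = span.split()
    return label, int(start), int(end)


label, start, end = parse_span('Premise 763 1114')
print(label, end - start)  # component type and span length in characters
```

Applied to a loaded frame, something like `df[1].map(parse_span)` would give one tuple per annotation.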


<a id='comp_dist'></a>
### Component distributions

In [30]:
print("Neoplasm train:\n\n", neoplasm_train['relation_components'].value_counts(), '\n')
print("Neoplasm test:\n\n" , neoplasm_test['relation_components'].value_counts(), '\n')
print("Neoplasm dev: \n\n", neoplasm_dev['relation_components'].value_counts(), '\n')
# print('\n')
print("Glaucoma test: \n\n",glaucoma_test['relation_components'].value_counts(), '\n')
# print('\n')
print('Mixed test: \n\n', mixed_test['relation_components'].value_counts(), '\n')

Neoplasm train:

 Premise       1537
Claim          666
MajorClaim      64
Name: relation_components, dtype: int64 

Neoplasm test:

 Premise       218
Claim          99
MajorClaim      9
Name: relation_components, dtype: int64 

Neoplasm dev: 

 Premise       438
Claim         228
MajorClaim     20
Name: relation_components, dtype: int64 

Glaucoma test: 

 Premise       404
Claim         183
MajorClaim      7
Name: relation_components, dtype: int64 

Mixed test: 

 Premise       388
Claim         182
MajorClaim     30
Name: relation_components, dtype: int64 
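
Because the splits differ in size, relative frequencies are easier to compare than raw counts. The sketch below puts them side by side; `component_shares` is a hypothetical helper and the toy frames only stand in for the real splits.

```python
import pandas as pd


def component_shares(splits, label_col='relation_components'):
    """Relative frequency of each component type, one column per split.

    `splits` maps a split name to a DataFrame that has `label_col`.
    """
    shares = {name: df[label_col].value_counts(normalize=True)
              for name, df in splits.items()}
    # Labels missing from a split get a share of 0.
    return pd.DataFrame(shares).fillna(0.0)


toy = {
    'split_a': pd.DataFrame({'relation_components': ['Premise', 'Premise', 'Claim', 'Premise']}),
    'split_b': pd.DataFrame({'relation_components': ['Claim', 'MajorClaim']}),
}
print(component_shares(toy))
```

With the real data the call would look like `component_shares({'train': neoplasm_train, 'glaucoma': glaucoma_test})`.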



In [17]:
print('Mean number of symbols per line - NEOPLASM: \n')
print("\t train set: ", int(neoplasm_train[2].apply(lambda x: len(x)).mean()))
print("\t test set: ", int(neoplasm_test[2].apply(lambda x: len(x)).mean()))
print("\t dev set: ", int(neoplasm_dev[2].apply(lambda x: len(x)).mean()))
print('\n')
print('Mean number of symbols per line - GLAUCOMA: ', int(glaucoma_test[2].apply(lambda x: len(x)).mean()))
print('\n')
print('Mean number of symbols per line - MIXED: ', int(mixed_test[2].apply(lambda x: len(x)).mean()))


Mean number of symbols per line - NEOPLASM: 

	 train set:  138
	 test set:  143
	 dev set:  135


Mean number of symbols per line - GLAUCOMA:  138


Mean number of symbols per line - MIXED:  145


In [46]:
print('Mean number of words per line - NEOPLASM: \n')
print("\t train set: ", int(neoplasm_train[2].apply(lambda x: len(x.split())).mean()))
print("\t test set: ", int(neoplasm_test[2].apply(lambda x: len(x.split())).mean()))
print("\t dev set: ", int(neoplasm_dev[2].apply(lambda x: len(x.split())).mean()))
print('\n')
print('Mean number of words per line - GLAUCOMA: ', int(glaucoma_test[2].apply(lambda x: len(x.split())).mean()))
print('\n')
print('Mean number of words per line - MIXED: ', int(mixed_test[2].apply(lambda x: len(x.split())).mean()))

Mean number of words per line - NEOPLASM: 

	 train set:  21
	 test set:  22
	 dev set:  20


Mean number of words per line - GLAUCOMA:  21


Mean number of words per line - MIXED:  22


**Biran and Rambow (2011) found that premises are, on average, longer than other sentences.** Yes: in the three splits checked below, Premise has the highest mean word count.

In [42]:
print('Neoplasm train: \n')
print('Mean length of MajorClaim: ',neoplasm_train[neoplasm_train['relation_components'] == 'MajorClaim'][2].apply(lambda x: len(x.split())).mean())
print('Mean length of Claim: ',neoplasm_train[neoplasm_train['relation_components'] == 'Claim'][2].apply(lambda x: len(x.split())).mean())
print('Mean length of Premise: ',neoplasm_train[neoplasm_train['relation_components'] == 'Premise'][2].apply(lambda x: len(x.split())).mean())

Neoplasm train: 

Mean length of MajorClaim:  16.484375
Mean length of Claim:  19.064564564564563
Mean length of Premise:  22.657124268054652


In [47]:
print('Glaucoma test: \n')
print('Mean length of MajorClaim: ',glaucoma_test[glaucoma_test['relation_components'] == 'MajorClaim'][2].apply(lambda x: len(x.split())).mean())
print('Mean length of Claim: ',glaucoma_test[glaucoma_test['relation_components'] == 'Claim'][2].apply(lambda x: len(x.split())).mean())
print('Mean length of Premise: ',glaucoma_test[glaucoma_test['relation_components'] == 'Premise'][2].apply(lambda x: len(x.split())).mean())

Glaucoma test: 

Mean length of MajorClaim:  16.142857142857142
Mean length of Claim:  16.950819672131146
Mean length of Premise:  23.925742574257427


In [48]:
print('Mixed test: \n')
print('Mean length of MajorClaim: ',mixed_test[mixed_test['relation_components'] == 'MajorClaim'][2].apply(lambda x: len(x.split())).mean())
print('Mean length of Claim: ',mixed_test[mixed_test['relation_components'] == 'Claim'][2].apply(lambda x: len(x.split())).mean())
print('Mean length of Premise: ',mixed_test[mixed_test['relation_components'] == 'Premise'][2].apply(lambda x: len(x.split())).mean())

Mixed test: 

Mean length of MajorClaim:  17.633333333333333
Mean length of Claim:  17.78021978021978
Mean length of Premise:  24.75
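
The three per-component means can also be computed in one pass with a `groupby`. This is only a sketch: `mean_words_per_component` is a hypothetical helper, and the column names are the ones used in this notebook (`2` for the text, `relation_components` for the label).

```python
import pandas as pd


def mean_words_per_component(df, text_col=2, label_col='relation_components'):
    """Mean number of whitespace-separated words per component type."""
    n_words = df[text_col].str.split().str.len()
    return n_words.groupby(df[label_col]).mean()


toy = pd.DataFrame({
    2: ['a b c', 'a b c d e', 'a b'],
    'relation_components': ['Premise', 'Premise', 'Claim'],
})
print(mean_words_per_component(toy))
```

On the real frames, `mean_words_per_component(mixed_test)` should reproduce the numbers printed above.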


<a id='ecai2020'></a>
## <a href='https://ecai2020.eu/papers/1470_paper.pdf'>Transformer-based Argument Mining for Healthcare Applications</a>


Check whether the .conll data released with the paper is consistent with the .ann data from the previous section.

### Neoplasm

<!-- Glaucoma test:  {'Premise': 404, 'Claim': 183, 'MajorClaim': 7} -->


<!-- Mixed test:  {'Premise': 388, 'Claim': 182, 'MajorClaim': 30} -->

In [49]:
train_conll = pd.read_csv('ecai2020-transformer_based_am/data/neoplasm/train.conll', sep='\t', header=None)
test_conll = pd.read_csv('ecai2020-transformer_based_am/data/neoplasm/test.conll', sep='\t', header=None)
dev_conll = pd.read_csv('ecai2020-transformer_based_am/data/neoplasm/dev.conll', sep='\t', header=None)
train_conll.head()

Unnamed: 0,0,1,2,3,4
0,0,Facial,_,_,B-Claim
1,1,hirsutism,_,_,I-Claim
2,2,is,_,_,I-Claim
3,3,one,_,_,I-Claim
4,4,of,_,_,I-Claim


In [50]:
print("Train: \n", train_conll[4].value_counts())

Train: 
 O            61110
I-Premise    42438
I-Claim      14383
B-Premise     1535
B-Claim        729
Name: 4, dtype: int64


 > Expected from the .ann files: **B-Premise** = 1537, **B-Claim** = 730 (666 Claim + 64 MajorClaim)

In [51]:
print("Test: \n",test_conll[4].value_counts())

Test: 
 O            16739
I-Premise    11931
I-Claim       4677
B-Premise      438
B-Claim        248
Name: 4, dtype: int64


> Expected for test from the .ann files: **Premise**: 218, **Claim**: 108 (99 + 9 MajorClaim)

The .conll counts are completely different from the numbers in the .txt and .ann files.

***It looks like the .conll test split is the .ann dev split and vice versa.*** Yes!

In [18]:
print("Dev: \n",dev_conll[4].value_counts())

Dev: 
 O            9124
I-Premise    6655
I-Claim      2087
B-Premise     218
B-Claim       108
Name: 4, dtype: int64


The two splits are swapped: (dev, test = test, dev)

> Expected for dev from the .ann files: **Premise**: 438, **Claim**: 248 (228 + 20 **MajorClaim**)

______________
Dev => Premise: 218, Claim: 108 <br>
Test => Premise: 438, Claim: 248
______________
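
The comparison above relies on two conventions: the .conll files have no MajorClaim tag, so MajorClaim is counted as Claim, and every component contributes exactly one `B-` tag. A small sketch of that bookkeeping follows; both helper names are hypothetical and only illustrate the check.

```python
from collections import Counter


def fold_major_claims(component_counts):
    """Merge MajorClaim into Claim, as the .conll tag set does."""
    folded = Counter()
    for label, n in component_counts.items():
        folded['Claim' if label == 'MajorClaim' else label] += n
    return dict(folded)


def count_b_tags(tags):
    """Count components in an IOB tag sequence by their B- tags."""
    return dict(Counter(t[2:] for t in tags if t.startswith('B-')))


# .ann counts of the frame called neoplasm_test in this notebook
print(fold_major_claims({'Premise': 218, 'Claim': 99, 'MajorClaim': 9}))
print(count_b_tags(['O', 'B-Claim', 'I-Claim', 'B-Premise', 'I-Premise', 'B-Claim']))
```

If the two dictionaries for a split agree, the .ann and .conll versions contain the same number of components.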


### Glaucoma

In [52]:
gl_conll = pd.read_csv('ecai2020-transformer_based_am/data/glaucoma_test/test.conll', sep='\t', header=None)
gl_conll.head()

Unnamed: 0,0,1,2,3,4
0,1,The,_,_,O
1,2,aim,_,_,O
2,3,of,_,_,O
3,4,this,_,_,O
4,5,study,_,_,O


In [53]:
print("Glaucoma: \n", gl_conll[4].value_counts())

Glaucoma: 
 O            15919
I-Premise    11483
I-Claim       3357
B-Premise      404
B-Claim        190
Name: 4, dtype: int64


> Glaucoma test:  **Premise**: 404, **Claim**: 190 (183 + 7 (MajorClaim))

### Mixed

In [23]:
mx_conll = pd.read_csv('ecai2020-transformer_based_am/data/mixed_test/test.conll', sep='\t', header=None)
mx_conll.head()

Unnamed: 0,0,1,2,3,4
0,0,We,_,_,O
1,1,previously,_,_,O
2,2,reported,_,_,O
3,3,that,_,_,O
4,4,treatment,_,_,O


In [24]:
print("Mixed: \n", mx_conll[4].value_counts())

Mixed: 
 O            16015
I-Premise    11767
I-Claim       3942
B-Premise      388
B-Claim        212
Name: 4, dtype: int64


> Mixed test:  **Premise**: 388, **Claim** : 212 (182 + 30 (MajorClaim))

### Find and fix the errors in the data

**Problems caused by incorrect splitting of the data**

The text has been split incorrectly in some places. For example:
    
 ***and nine symptoms ( .001 < P < . 01) , and the improvement***: the extra space in `P < . 01` forces an unnecessary split, which breaks the IOB scheme because the next chunk starts with an `I-` tag.

In [54]:
def check(url):
    '''
    Find the lines that cause the problem: the first line after a blank
    line (i.e. the start of a sentence) that carries an I- tag.
    Returns the 0-based indices of those lines.
    '''
    check_lines = []
    with open(url, 'r') as f:
        lines = f.readlines()
    for i, line in enumerate(lines[:-1]):
        nxt = lines[i+1]
        # Columns are tab-separated; the tag is the fifth column.
        if line == '\n' and nxt != '\n' and nxt.split('\t')[4].startswith('I-'):
            check_lines.append(i+1)
    return check_lines

In [55]:
train_check = check('ecai2020-transformer_based_am/data/neoplasm/train.conll')
train_check

[]

#### Neoplasm 

<!-- 
# train_conll[train_conll[0] == 279][train_conll[1] == '4'] 
# train_conll[train_conll[0] == 359][train_conll[1] == '0001'] # 10129	359	0001	_	_	I-Premise
# train_conll[train_conll[0] == 400][train_conll[1] == '0002'] # 10170	400	0002	_	_	I-Premise
# train_conll[train_conll[0] == 213][train_conll[1] == '93'] # 11510	213	93	_	_	I-Premise
# train_conll[train_conll[0] == 242][train_conll[1] == '01'] # 15080	242	01	_	_	I-Premise
# train_conll[train_conll[0] == 271][train_conll[1] == '004'] # 33253	271	004	_	_	I-Premise
# train_conll[train_conll[0] == 234][train_conll[1] == 'US'] # 34268	234	US	_	_	I-Premise
# train_conll[train_conll[0] == 213][train_conll[1] == '048'] # 48940	213	048	_	_	I-Premise
# train_conll[train_conll[0] == 189][train_conll[1] == 'pain'] # 51615	189	pain	_	_	I-Premise
# train_conll[train_conll[0] == 210][train_conll[1] == '12'] # 51951	210	12	_	_	I-Premise
# train_conll[train_conll[0] == 232][train_conll[1] == '7'] # 51973	232	7	_	_	I-Premise
# train_conll[train_conll[0] == 205][train_conll[1] == 'better'] # 92762	205	better	_	_	I-Premise
 -->

In [189]:
def find_wrong_lines(arr, data):
    '''
    Show the context of each problematic line to find out what went wrong.
    Prints the line number where each context window starts and returns,
    for every index in arr, the tokens from 10 lines before to 10 lines after it.
    '''
    chunk = []
    with open(data) as f:
        lines = f.readlines()
    for idx in arr:
        start = idx - 10
        end = idx + 10
        print(start)
        # Join the token column of the non-blank lines in the window.
        window = lines[max(start, 0):end + 1]
        c = ''.join(line.split('\t')[1] + ' ' for line in window if line != '\n')
        chunk.append(c)

    return chunk

lines_to_check = find_wrong_lines(check_lines, 'ecai2020-transformer_based_am/data/neoplasm/train.conll')

10434
10517
10560
11948
15662
34565
35624
50891
53688
54037
54061
96489


In [190]:
lines_to_check

['95 % CI = -9.6 % to -1 . 4 % ) per 4 weeks in the control group but ',
 '=.0002 , P =.0001 , and P = . 0001 , respectively , for ATP versus no ATP ) . ',
 '-0.2 % versus -2.4 % ; P = . 0002 ) ; functional scores ( +0.4 % versus -5.5 % ',
 '; 95 % CI , 0.61 to 0 . 93 ; P = .008 ) . Median survival was ',
 'and nine symptoms ( .001 < P < . 01 ) , and the improvement ( > 10 units on ',
 'responders indicated better physical well-being ( P = . 004 ) and mood ( P =.02 ) at month 3 ',
 'patients than in the control group , i.e . US $ 40782 and US $ 34465 , respectively ( P ',
 'survival rate , 31.8 % ; P = . 048 ) . Comparing survival for the two dose levels ',
 'and resolved . Mean ( s.d . ) pain scores on the day of discharge were 1·9 ( 3·1 ',
 ', 16.2 v 14.7 months ; P = . 12 ) . LV5FU2 plus oxaliplatin gave higher frequencies of ',
 'common toxicity criteria grade 3/4 neutropenia ( 41 . 7 % v 5.3 % of patients ) , grade 3/4 ',
 'QOL ( p < 0.05 ) ( i.e . better global health status , an

***Sequence where something is wrong => corresponding file where these lines are located***

'95 % CI = -9.6 % to -1 . 4 % ) per 4 weeks in the control group but ' => ***10675381.ann***   
'=.0002 , P =.0001 , and P = . 0001 , respectively , for ATP versus no ATP ) . ' => ***10675381.ann***   
'-0.2 % versus -2.4 % ; P = . 0002 ) ; functional scores ( +0.4 % versus -5.5 %'  => ***10675381.ann***   
'; 95 % CI , 0.61 to 0 . 93 ; P = .008 ) . Median survival was ' => ***15625369.ann***   
'and nine symptoms ( .001 < ***P < . 01***) , and the improvement ( > 10 units on ' => ***10561201.ann***   
responders indicated better physical well-being **( P = . 004 )** and mood ( P =.02 ) at month 3 => ***10561203.ann***   
patients than in the control group , i.e . US $ 40782 and US $ 34465 , respectively ( P => ***9531327.ann***   
survival rate , 31.8 % ; P = **. 048 )** . Comparing survival for the two dose levels => ***10653877.ann***   
and resolved . Mean ( s.d . ) pain scores on the day of discharge were 1·9 ( 3·1 => ***23254324.ann***   
, 16.2 v 14.7 months ; P = . 12 ) . LV5FU2 plus oxaliplatin gave higher frequencies of => ***10944126.ann***   
common toxicity criteria grade 3/4 neutropenia ( 41 . 7 % v 5.3 % of patients ) , grade 3/4 => ***10944126.ann***  
QOL ( p < 0.05 ) ( i.e . better global health status , and physical and emotional functioning => ***23866850.ann***  

In [196]:
# print(check_lines)

[10444, 10527, 10570, 11958, 15672, 34575, 35634, 50901, 53698, 54047, 54071, 96499]


***Dev and Test***

In [29]:
test = 'ecai2020-transformer_based_am/data/neoplasm/test.conll'
dev = 'ecai2020-transformer_based_am/data/neoplasm/dev.conll'

In [30]:
test_check = check(test)
dev_check = check(dev)

In [31]:
test_check

[1807, 1816, 1834, 1843, 1856, 1865, 33747]

In [214]:
dev_check

[]

In [216]:
tst_find_lines = find_wrong_lines(test_check, test)
tst_find_lines

1797
1806
1824
1833
1846
1855
33737


['both arms , with 12 % ( postop . RCT ) and 12 % ( pre-op . RCT ) ',
 'RCT ) and 12 % ( pre-op . RCT ) of patients , respectively , suffering from anastomotic leakage ',
 'from anastomotic leakage , 3 % ( postop . RCT ) and 3 % ( pre-op . RCT ) ',
 'RCT ) and 3 % ( pre-op . RCT ) from postoperative bleeding , and 6 % ( postop ',
 'postoperative bleeding , and 6 % ( postop . RCT ) and 4 % ( pre-op . RCT ) ',
 'RCT ) and 4 % ( pre-op . RCT ) from delayed wound healing . The patient accrual ',
 'the 6 mg ibandronate group . I.v . ibandronate treatment leads to significant improvements in quality of life , ']

In [217]:
find_wrong_lines(dev_check, dev)

[]

#### Glaucoma


In [208]:
glaucoma_check = check('ecai2020-transformer_based_am/data/glaucoma_test/test.conll')
glaucoma_check

[114, 18541, 24453, 31537]

In [209]:
glaucoma_incorrect_lines = find_wrong_lines(glaucoma_check, 'ecai2020-transformer_based_am/data/glaucoma_test/test.conll')
glaucoma_incorrect_lines

104
18531
24443
31527


['9.8 and 30.6 ± 9.9 mmHg , respectively . Postoperatively , mean IOPs were 11.4 ± 4.9 and 13.6 ± ',
 'least as effective as pilocarpine 2 % t.i.d . in reducing IOP when added to eyes currently on monotherapy with ',
 'the trabeculectomy group ( P = 0.7 ) . Mean postoperative IOP was 13.7+/-2.2 mmHg at 3 months , 14.8+/-3.3 ',
 ', 24.7 mm Hg in the medical group . Mean IOP ( both eyes ) at last follow-up was 18.2 ']

#### Mixed

In [223]:
mx_conll[4].value_counts()

O            16015
I-Premise    11767
I-Claim       3942
B-Premise      388
B-Claim        212
Name: 4, dtype: int64

In [224]:
mixed_check = check('ecai2020-transformer_based_am/data/mixed_test/test.conll')
mixed_check

[32920]

In [225]:
print(find_wrong_lines(mixed_check, 'ecai2020-transformer_based_am/data/mixed_test/test.conll'))


32910
['glucose levels < 54 mg/dL by 65 % . ( 3.1 vs. 8.9 min , P < 0.001 ) . ']


***(The majority of the problematic lines are in the neoplasm train and test data.)***
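
One possible repair for the cases found above is to glue a sentence that starts with an `I-` tag back onto the previous sentence. This is an assumption for illustration only: the notebook does not show how the files were actually fixed, and `merge_broken_sentences` is a hypothetical helper.

```python
def merge_broken_sentences(sentences):
    """Merge every sentence that starts with an I- tag into the previous one.

    `sentences` is a list of sentences, each a list of (token, tag) pairs.
    The input is left untouched; a new list is returned.
    """
    merged = []
    for sent in sentences:
        if merged and sent and sent[0][1].startswith('I-'):
            # The component was cut in the middle: continue the previous sentence.
            merged[-1].extend(sent)
        else:
            merged.append(list(sent))
    return merged


sents = [
    [('P', 'B-Premise'), ('<', 'I-Premise'), ('.', 'I-Premise')],
    [('01', 'I-Premise'), (')', 'O')],
    [('Next', 'O')],
]
print(len(merge_broken_sentences(sents)))
```

After such a merge, `check` should no longer report the affected positions, because no sentence begins with an `I-` tag.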

<a id='comp_clf'></a>
## Argument components classification results

<!-- ### Results from authors

<img src='pics/result_authors.png'>
<img src='pics/results_fine-tune.png'>
<img src='pics/results_all.png'> -->

###   Results: pre-problem identification

||Neoplasm||||Glaucoma||||Mixed|||
|-|-|-|-|-|-|-|-|-|-|-|-|
||F1-macro|C-F1|E-F1||F1-macro|C-F1|E-F1||F1-macro|C-F1|E-F1|
|SciBert+GRU+CRF|.84|.77|.93||.84|.82|.92||.82|.77|.91|
|BioBert+GRU+CRF|.81|.74|.91||.84|.77|.94||.82|.78|.92|
|Bert+GRU+CRF|.82|.72|.91||.83|.80|.89||.83|.77|.90|
|||||||||||||
|SciBert+LSTM+CRF|.70|.73|.90||.70|.72|.92||.69|.74|.90|
|||||||||||||
|mBert+LSTM+CRF|.84|.76|.92|||||||||

###   Results: post-problem identification

||Neoplasm||||Glaucoma||||Mixed|||
|-|-|-|-|-|-|-|-|-|-|-|-|
||F1-macro|C-F1|E-F1||F1-macro|C-F1|E-F1||F1-macro|C-F1|E-F1|
|SciBert+GRU+CRF|.84|.76|.92||.86|.84|.92||.84|.77|.92|
|BioBert+GRU+CRF|.85|.76|.94||.88|.85|.92||.87|.81|.93|
|Bert+GRU+CRF|.84|.73|.93||.85|.80|.90||.84|.77|.90|
|mBert+GRU+CRF|.84|.74|.90||.83|.77|.85||.83|.74|.89|
|||||||||||||
|SciBert+LSTM+CRF|.84|.75|.93||.84|.83|.92||.82|.76|.91|

<!-- BioBert+LSTM+CRF|.83|.76|.93||.83|.82|.91||.82|.80|.92|
Bert+LSTM+CRF|.84|.75|.92||.82|.79|.88||.83|.77|.90|
mBert+LSTM+CRF|.83|.75|.89||.83|.76|.88||.83|.73|.90| -->

E-F1 is always higher than C-F1 => claims are harder to identify than evidence (premises)

In [57]:
preds = pd.read_csv('ecai2020-transformer_based_am/output/sequence_tagging_predictions.conll', sep='\t', header=None)
preds.head()

Unnamed: 0,0,1,2
0,b'we',O,O
1,b'previously',O,O
2,b'reported',O,O
3,b'that',O,O
4,b'treatment',O,O


<a id='translation_projection'></a>
# Translations and Projections



* DeepL
* Opus
* Mixed


### IAA

In [34]:
import sklearn
from sklearn.metrics import cohen_kappa_score

# ann1 = [5,3,5,3,3,3,3,3,1,3,3,3,3,3,3,3,3,3,3,3,3,1,3,3,3,3,1,3,3]
# ann2 =[3,3,1,3,3,3,3,3,5,3,3,5,3,3,2,3,1=3,3,3,5,3,5,3=5,3,3,3,1=2=4=5,3,3]


ann1 = [2,3,5,3,3,3,3,3,1,3,3,3,3,3,3,3,3,3,3,3,3,1,3,3,3,3,1,3,3]
ann2 = [3,3,1,3,3,3,3,3,5,3,3,5,3,3,2,3,1,3,3,5,3,5,3,3,3,3,1,3,3]
print(cohen_kappa_score(ann1, ann2))



0.2817337461300309
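
Cohen's kappa compares the observed agreement $p_o$ with the agreement $p_e$ expected by chance from each annotator's label frequencies: $\kappa = (p_o - p_e) / (1 - p_e)$. The sketch below recomputes the value by hand for the two annotation lists above; `cohen_kappa` is a hypothetical helper written for illustration, not the scikit-learn implementation.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Unweighted Cohen's kappa for two equally long label sequences."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError('need two non-empty sequences of equal length')
    n = len(labels_a)
    # Observed agreement: share of items with identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from the marginal label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)  # undefined when p_e == 1


ann1 = [2,3,5,3,3,3,3,3,1,3,3,3,3,3,3,3,3,3,3,3,3,1,3,3,3,3,1,3,3]
ann2 = [3,3,1,3,3,3,3,3,5,3,3,5,3,3,2,3,1,3,3,5,3,5,3,3,3,3,1,3,3]
print(cohen_kappa(ann1, ann2))  # approximately 0.2817
```

Here the annotators agree on 21 of 29 items, but both use label 3 for most items, so chance agreement is already high and kappa stays low.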


### Neoplasm 
   * train - 4405 lines
   * test - 1251 lines
   * dev - 679 lines

### Glaucoma
   * test - 1247 lines

### Mixed
   * test - 1148 lines
   

## Translations

||train|dev|neoplasm|glaucoma|mixed|
|-|-|-|-|-|-|
|**DeepL**|✅|✅|✅|✅|✅|
|**Opus-mt**|✅|✅|✅|✅|✅|


## Projections

||train|dev|neoplasm|glaucoma|mixed|
|-|-|-|-|-|-|
|**DeepL**|
|simalign|✅|✅|✅|✅|✅|
|awesome|✅|✅|✅|✅|✅|
|**Opus**|
|simalign|✅|✅|✅|✅|✅|
|awesome|✅|✅|✅|✅|✅|




#### Post-processing. Compare

Projections made with awesome-align mostly fail to project labels onto sentence-initial tokens such as articles and conjunctions (e.g. 'Por lo tanto' => 'therefore' / 'thus'), even though these tokens are supposed to carry a tag (B- or I-).

Post-processing: if a sentence is a full argument component in the original data, the same sentence is a full component in the translated data. Therefore, the tags of full-component sentences can be assigned directly, without projection. The cells below count how many sentences were changed by this step (a more readable version is at the end of this [section](#post_table)).
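
The rule can be sketched as follows. Both function names are hypothetical, and the definition of a full component used here (one `B-X` followed only by `I-X`) is an assumption that is slightly stricter than the test in `count_full_seq`.

```python
def is_full_component(tags):
    """True if the whole sentence is one component: B-X followed only by I-X."""
    if not tags or not tags[0].startswith('B-'):
        return False
    label = tags[0][2:]
    return all(t == 'I-' + label for t in tags[1:])


def postprocess_sentence(source_tags, projected_tags):
    """Overwrite the projected tags when the source sentence is a full component.

    The target sentence may have a different number of tokens than the source,
    so the tags are rebuilt for the target length instead of being copied.
    """
    if not is_full_component(source_tags) or not projected_tags:
        return list(projected_tags)
    label = source_tags[0][2:]
    return ['B-' + label] + ['I-' + label] * (len(projected_tags) - 1)


# The aligner left the first two target tokens untagged.
print(postprocess_sentence(['B-Claim', 'I-Claim', 'I-Claim'],
                           ['O', 'O', 'B-Claim', 'I-Claim']))
```

Sentences that are only partly covered by a component are returned unchanged, so they still depend on the alignment.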

In [157]:
def get_difference(projected_path, corrected_path):
    '''
    Compare a projected file with its post-processed version.
    Returns (number of differing lines, number of sentences whose tag sets differ).
    '''
    difference_in_tokens = 0
    difference_in_seq = 0
    p_tags = []
    p_seq_tag = []
    c_tags = []
    c_seq_tag = []
    # The tag is read as the second space-separated field; a blank line ends a sentence.
    with open(projected_path, 'r') as projected, open(corrected_path, 'r') as corrected:
        for p, c in zip(projected, corrected):
            if p != c:
                difference_in_tokens += 1
            if p != '\n':
                p_seq_tag.append(p.split(' ')[1])
            else:
                p_tags.append(p_seq_tag)
                p_seq_tag = []
            if c != '\n':
                c_seq_tag.append(c.split(' ')[1])
            else:
                c_tags.append(c_seq_tag)
                c_seq_tag = []

    p_tags.append(p_seq_tag)
    c_tags.append(c_seq_tag)
    print(len(p_tags))
    for p, c in zip(p_tags, c_tags):
        # Sets ignore order and repetition: only a change in which tags occur is counted.
        if set(p) != set(c):
            difference_in_seq += 1

    return difference_in_tokens, difference_in_seq

In [158]:
def count_full_seq(data_path):
    '''
    Count the sentences that consist of a single component (a B- tag first,
    then one repeated tag). Also prints the number of all-O sentences.
    '''
    counter = 0
    num_os = 0
    tags = []
    cur_tag = []
    with open(data_path) as f:
        for line in f:
            if line != '\n':
                cur_tag.append(line.split(' ')[1])
            else:
                tags.append(cur_tag)
                if cur_tag[0].startswith('B') and len(set(cur_tag[1:])) == 1:
                    counter += 1
                if len(set(cur_tag)) == 1 and cur_tag[0].startswith('O'):
                    num_os += 1
                cur_tag = []

    print("Number of O's: ",num_os)
    return counter

### Awesome-align

#### Neoplasm

In [174]:
print('Neoplasm train (deepl & awesome): ')
print('Number of sentences that are full component in original data: ', count_full_seq('antidote-projections/data/neoplasm/train.tsv'))
print('Number of sentences that are full component in projected: ', count_full_seq('antidote-projections/data/projections/deepl_awesome/neoplasm/pprocessed_train.tsv'))
get_difference('antidote-projections/data/projections/deepl_awesome/neoplasm/train.tsv', 'antidote-projections/data/projections/deepl_awesome/neoplasm/pprocessed_train.tsv')

Neoplasm train (deepl & awesome): 
Number of O's:  2345
Number of sentences that are full component in original data:  1769
Number of O's:  2345
Number of sentences that are full component in projected:  1752
4405


(1652, 800)

In [175]:
print('Neoplasm test (deepl & awesome): ')
print('Number of sentences that are full component in original: ', count_full_seq('antidote-projections/data/neoplasm/test.tsv'))
print('Number of sentences that are full component in projected: ', count_full_seq('antidote-projections/data/projections/deepl_awesome/neoplasm/pprocessed_test.tsv'))
get_difference('antidote-projections/data/projections/deepl_awesome/neoplasm/test.tsv', 'antidote-projections/data/projections/deepl_awesome/neoplasm/pprocessed_test.tsv')

Neoplasm test (deepl & awesome): 
Number of O's:  630
Number of sentences that are full component in original:  518
Number of O's:  630
Number of sentences that are full component in projected:  518
1252


(505, 242)

In [None]:
# 1769   518  262

In [176]:
print('Neoplasm dev (deepl & awesome): ')
print('Number of sentences that are full component in original: ', count_full_seq('antidote-projections/data/neoplasm/dev.tsv'))
print('Number of sentences that are full component in projected: ', count_full_seq('antidote-projections/data/projections/deepl_awesome/neoplasm/pprocessed_dev.tsv'))
get_difference('antidote-projections/data/projections/deepl_awesome/neoplasm/dev.tsv', 'antidote-projections/data/projections/deepl_awesome/neoplasm/pprocessed_dev.tsv')

Neoplasm dev (deepl & awesome): 
Number of O's:  377
Number of sentences that are full component in original:  262
Number of O's:  377
Number of sentences that are full component in projected:  257
680


(195, 95)

#### Glaucoma

In [177]:
print('Glaucoma (deepl & awesome): ') 
print('Number of sentences that are full component in original: ', count_full_seq('antidote-projections/data/glaucoma/test.tsv'))
print('Number of sentences that are full component in projected: ', count_full_seq('antidote-projections/data/projections/deepl_awesome/glaucoma/pprocessed_test.tsv'))
get_difference('antidote-projections/data/projections/deepl_awesome/glaucoma/test.tsv', 'antidote-projections/data/projections/deepl_awesome/glaucoma/pprocessed_test.tsv')

Glaucoma (deepl & awesome): 
Number of O's:  682
Number of sentences that are full component in original:  498
Number of O's:  682
Number of sentences that are full component in projected:  498
1248


(341, 167)

#### Mixed

In [179]:
print('Mixed (deepl & awesome): ') 
print('Number of sentences that are full component in original: ', count_full_seq('antidote-projections/data/mixed/test.tsv'))
print('Number of sentences that are full component in projected: ', count_full_seq('antidote-projections/data/projections/deepl_awesome/mixed/pprocessed_test.tsv'))
get_difference('antidote-projections/data/projections/deepl_awesome/mixed/test.tsv', 'antidote-projections/data/projections/deepl_awesome/mixed/pprocessed_test.tsv')

Mixed (deepl & awesome): 
Number of O's:  591
Number of sentences that are full component in original:  478
Number of O's:  591
Number of sentences that are full component in projected:  476
1147


(415, 203)

### Simalign

#### Neoplasm

In [171]:
print('Neoplasm train (deepl & simalign): ')
print('Number of sentences that are full component: ', count_full_seq('/Users/anaryegen/Desktop/UPV/Master_Thesis/antidote-projections/data/neoplasm/train.tsv'))
get_difference('/Users/anaryegen/Desktop/UPV/Master_Thesis/antidote-projections/data/projections/deepl_simalign/neoplasm/train.tsv', '/Users/anaryegen/Desktop/UPV/Master_Thesis/antidote-projections/data/projections/deepl_simalign/neoplasm/pprocessed_train.tsv')

Neoplasm train (deepl & simalign): 
Number of O's:  2345
Number of sentences that are full component:  1769
4405


(466, 88)

In [168]:
print('Neoplasm test (deepl & simalign): ')
print('Number of sentences that are full component: ', count_full_seq('/Users/anaryegen/Desktop/UPV/Master_Thesis/antidote-projections/data/projections/deepl_awesome/neoplasm/pprocessed_test.tsv'))
get_difference('/Users/anaryegen/Desktop/UPV/Master_Thesis/antidote-projections/data/projections/deepl_simalign/neoplasm/test.tsv', '/Users/anaryegen/Desktop/UPV/Master_Thesis/antidote-projections/data/projections/deepl_simalign/neoplasm/pprocessed_test.tsv')

Neoplasm test (deepl & simalign): 
Number of O's:  630
Number of sentences that are full component:  518
1252


(189, 92)

In [167]:
print('Neoplasm dev (deepl & simalign): ')
print('Number of sentences that are full component: ', count_full_seq('/Users/anaryegen/Desktop/UPV/Master_Thesis/antidote-projections/data/projections/deepl_awesome/neoplasm/pprocessed_dev.tsv'))
get_difference('/Users/anaryegen/Desktop/UPV/Master_Thesis/antidote-projections/data/projections/deepl_simalign/neoplasm/dev.tsv', '/Users/anaryegen/Desktop/UPV/Master_Thesis/antidote-projections/data/projections/deepl_simalign/neoplasm/pprocessed_dev.tsv')

Neoplasm dev (deepl & simalign): 
Number of O's:  377
Number of sentences that are full component:  257
680


(66, 11)

#### Glaucoma

In [169]:
print('Glaucoma (deepl & simalign): ')
print('Number of sentences that are full component: ', count_full_seq('/Users/anaryegen/Desktop/UPV/Master_Thesis/antidote-projections/data/projections/deepl_simalign/glaucoma/pprocessed_test_1.tsv'))
get_difference('/Users/anaryegen/Desktop/UPV/Master_Thesis/antidote-projections/data/projections/deepl_simalign/glaucoma/test.tsv', '/Users/anaryegen/Desktop/UPV/Master_Thesis/antidote-projections/data/projections/deepl_simalign/glaucoma/pprocessed_test_1.tsv')

Glaucoma (deepl & simalign): 
Number of O's:  682
Number of sentences that are full component:  506
1248


(144, 51)

#### Mixed

In [170]:
print('Mixed (deepl & simalign): ')
print('Number of sentences that are full component: ', count_full_seq('/Users/anaryegen/Desktop/UPV/Master_Thesis/antidote-projections/data/projections/deepl_simalign/mixed/pprocessed_test_1.tsv'))
get_difference('/Users/anaryegen/Desktop/UPV/Master_Thesis/antidote-projections/data/projections/deepl_simalign/mixed/test.tsv', '/Users/anaryegen/Desktop/UPV/Master_Thesis/antidote-projections/data/projections/deepl_simalign/mixed/pprocessed_test_1.tsv')

Mixed (deepl & simalign): 
Number of O's:  591
Number of sentences that are full component:  480
1147


(248, 90)

* Neoplasm train/test/dev: 1769 / 518 / 262
* Glaucoma: 498
* Mixed: 478

<a id='post_table'></a>
### Number of changes

"_s" - simalign

"_a" - awesome-align

|(DeepL)|train_a|train_s|dev_a|dev_s|neoplasm_a|neoplasm_s|glaucoma_a|glaucoma_s|mixed_a|mixed_s|
|-|-|-|-|-|-|-|-|-|-|-|
|overall|4405|4405|680|680|1252|1252|1248|1248|1147|1147|
|# of changes|800|88|95|11|242|92|167|51|203|90|
|# of full O's|2345|2345|377|377|630|630|692|682|591|591|
|# of full component|1752|703|257|257|518|518|498|506|476|480|

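The raw counts in the table are easier to compare as a share of each split. The snippet below is a minimal sketch that copies the `overall` and `# of changes` rows by hand and converts them to percentages; the dictionary names and the `change_rate` helper are illustrative and are not part of the notebook's own helpers.

```python
# Counts copied by hand from the "Number of changes" table (DeepL translations).
overall = {"train": 4405, "dev": 680, "neoplasm": 1252, "glaucoma": 1248, "mixed": 1147}
changes = {
    "awesome-align": {"train": 800, "dev": 95, "neoplasm": 242, "glaucoma": 167, "mixed": 203},
    "simalign": {"train": 88, "dev": 11, "neoplasm": 92, "glaucoma": 51, "mixed": 90},
}


def change_rate(n_changed, n_total):
    """Percentage of rows in a split that were changed (hypothetical helper)."""
    return 100.0 * n_changed / n_total


rates = {
    aligner: {split: round(change_rate(n, overall[split]), 1) for split, n in per_split.items()}
    for aligner, per_split in changes.items()
}

for aligner, per_split in rates.items():
    print(aligner, per_split)
```

On these counts, the awesome-align projections have more changes than the simalign ones in every split.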

<a id='zero-shot'></a>
# Zero-shot
### Without post-processing

||Neoplasm||||Glaucoma||||Mixed|||
|-|-|-|-|-|-|-|-|-|-|-|-|
||||||**Simalign**|||||
||F1-macro|C-F1|E-F1||F1-macro|C-F1|E-F1||F1-macro|C-F1|E-F1|
|mBERT+GRU+CRF|
|*DeepL*|-|-|-||.73|.65|.81||.74|.65|.84|
|*Opus*|.76|.70|.86||.74|.67|.81||.75|.69|.85|
||||||**Awesome**|||||
|*DeepL*|-|-|-||.69|.64|.80||.68|.63|.84|
|*Opus*|.70|.68|.85||.70|.66|.80||.69|.68|.84|


### After post-processing

||Neoplasm||||Glaucoma||||Mixed|||
|-|-|-|-|-|-|-|-|-|-|-|-|
||||||**Simalign**|||||
||F1-macro|C-F1|E-F1||F1-macro|C-F1|E-F1||F1-macro|C-F1|E-F1|
|mBERT+GRU+CRF|
|*DeepL*|.79|.69|.87||.76|.70|.81||.78|.69|.85|
|*Opus*|.79|.69|.88||.76|.68|.80||.78|.70|.84|
||||||**Awesome**|||||
|*DeepL*|.78|.69|.87||.76|.69|.81||.77|.69|.84|
|*Opus*|.78|.69|.87||.75|.68|.80||.78|.70|.84|

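The tables above report a macro F1 together with two per-label scores, C-F1 and E-F1 (the raw logs further down name them `f1_claim` and `f1_evidence`). The evaluation script itself is not shown in this notebook, so the following is only a sketch of how token-level per-label F1 can be computed. The `per_label_f1` function, the toy label names and the toy sequences are all assumptions made for illustration.

```python
from collections import Counter


def per_label_f1(gold, pred, labels):
    """Token-level F1 per label from two aligned label sequences (illustrative)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p where it should not be
            fn[g] += 1  # missed gold label g
    scores = {}
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


# Toy data, not taken from the corpus.
gold = ["O", "Claim", "Claim", "Evidence", "Evidence", "O"]
pred = ["O", "Claim", "Evidence", "Evidence", "Evidence", "O"]

scores = per_label_f1(gold, pred, ["O", "Claim", "Evidence"])
macro_f1 = sum(scores.values()) / len(scores)
print(scores, round(macro_f1, 2))
```

Which labels enter the macro average (for example, whether `O` is included) depends on the evaluation script, so the macro value here need not match the convention used for the tables.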
### Train and evaluate EN/ES

#### mBERT & BETO (Train - EN => Test - ES)
||Neoplasm||||Glaucoma||||Mixed|||
|-|-|-|-|-|-|-|-|-|-|-|-|
||||||**Simalign**|||||
||F1-macro|C-F1|E-F1||F1-macro|C-F1|E-F1||F1-macro|C-F1|E-F1|
|**mBERT**|
|*DeepL*|.83|.76|.89||.82|.76|.88||.82|.75|.89|
|*Opus*|.80|.74|.88||.79|.75|.84||.81|.76|.88|
|**BETO**|
|*DeepL*|.83|.75|.90||.83|.80|.90||.83|.75|.91|
|*Opus*|.80|.74|.88||.79|.75|.84||.81|.76|.88|
||||||**Awesome**|||||
|**mBERT**|
|*DeepL*|.81|.75|.88||.80|.75|.87||.80|.75|.89|
|*Opus*|.78|.72|.88||.80|.72|.88||.80|.76|.89|
|**BETO**|
|*DeepL*|.82|.75|.90||.83|.75|.91||.82|.80|.89|
|*Opus*|.81|.74|.90||.83|.82|.89||.82|.77|.90|



#### Train - EN + ES => Test - ES

||Neoplasm||||Glaucoma||||Mixed|||
|-|-|-|-|-|-|-|-|-|-|-|-|
||||||**Simalign**|||||
||F1-macro|C-F1|E-F1||F1-macro|C-F1|E-F1||F1-macro|C-F1|E-F1|
|*DeepL*|.83|.76|.89||.84|.80|.88||.83|.75|.88|
|*Opus*|.83|.75|.89||.84|.80|.87||.82|.76|.88|
||||||**Awesome**|||||
|*DeepL*|.82|.76|.88||.84|.80|.87||.82|.75|.88|
|*Opus*|.82|.75|.89||.84|.80|.87||.82|.76|.88|

#### Train - EN + ES => Test - EN

||Neoplasm||||Glaucoma||||Mixed|||
|-|-|-|-|-|-|-|-|-|-|-|-|
||F1-macro|C-F1|E-F1||F1-macro|C-F1|E-F1||F1-macro|C-F1|E-F1|
|EN => EN|.84|.74|.90||.83|.77|.85||.83|.74|.89|
|EN+ES => EN|.83|.74|.89||.84|.81|.86||.83|.76|.89|

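To see where adding the Spanish data helps or hurts on the English test sets, the two rows can be subtracted column by column. This is a small sketch with the values typed in from the table above; it assumes the three columns per dataset are F1-macro, C-F1 and E-F1, as in the other result tables, and the variable names are illustrative.

```python
metrics = ["F1-macro", "C-F1", "E-F1"]

# Values copied from the "Train - EN + ES => Test - EN" table.
en_only = {"Neoplasm": [0.84, 0.74, 0.90], "Glaucoma": [0.83, 0.77, 0.85], "Mixed": [0.83, 0.74, 0.89]}
en_es = {"Neoplasm": [0.83, 0.74, 0.89], "Glaucoma": [0.84, 0.81, 0.86], "Mixed": [0.83, 0.76, 0.89]}

# Positive delta: training on EN+ES scores higher than training on EN only.
delta = {
    dataset: {m: round(b - a, 2) for m, a, b in zip(metrics, en_only[dataset], en_es[dataset])}
    for dataset in en_only
}

for dataset, per_metric in delta.items():
    print(dataset, per_metric)
```

The largest difference in the table is the claim F1 on Glaucoma (.77 to .81); all other differences are within two points, and no variance across runs is reported here.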

<!-- eval_f1_macro = 0.8283145954559271       
eval_f1_micro = 0.88376279715426         
f1_claim = 0.7357030134515669               
f1_evidence = 0.8887772887063897  

# g
eval_f1_macro = 0.8432134457273017               
eval_f1_micro = 0.8783164030688784               
f1_claim = 0.8137134198314234                    
f1_evidence = 0.8623455338715157 

#m 
eval_f1_macro = 0.8320493187989695      
eval_f1_micro = 0.8898684125234978      
f1_claim = 0.7610239471511149           
f1_evidence = 0.8929866304101518  -->

<!-- # Train en+es => test es (deepl+simalign)(mBert)
#Neoplasm
eval_f1_macro = 0.8331153841159322           
eval_f1_micro = 0.8842202282523245  
f1_claim = 0.7642670871490084          
f1_evidence = 0.8870742183598043 

# Glaucoma
eval_f1_macro = 0.8432297937148395
eval_f1_micro = 0.891302317763293    
f1_claim = 0.8001632653061226             
f1_evidence = 0.8766085585903499 

# Mixed
eval_f1_macro = 0.8276006934273226
eval_f1_micro = 0.8814192515435649    
f1_claim = 0.7540607939610434       
f1_evidence = 0.8820181813702546  

#opus+awesome
# n
eval_f1_macro = 0.8231949813663691    
eval_f1_micro = 0.8808427086532109       
f1_claim = 0.7492806205429751       
f1_evidence = 0.8865031462694234 
# g
eval_f1_macro = 0.835482877127567
eval_f1_micro = 0.8869774873467982  
f1_claim = 0.8008611410118407        
f1_evidence = 0.8734213410856407   
# m
eval_f1_macro = 0.8157857067189477  
eval_f1_micro = 0.8754521477015825  
f1_claim = 0.7551700208188759         
f1_evidence = 0.8755925365607665   

# opus+simailgn
#n
eval_f1_macro = 0.8315486985987438
eval_f1_micro = 0.8828713138475568    
f1_claim = 0.7549044265593561          
f1_evidence = 0.8894982317666956 
#g
eval_f1_macro = 0.8417131464135337         
eval_f1_micro = 0.8869200815162793        
f1_claim = 0.8036748882635326               
f1_evidence = 0.8730090605627088  
#m
eval_f1_macro = 0.8230364075842536  
eval_f1_micro = 0.8764506405425773    
f1_claim = 0.7568804159445408         
f1_evidence = 0.8769498273820023   

# deepl+awesome
# NEo
eval_f1_macro = 0.8208313036341793        
eval_f1_micro = 0.8824491492495504      
f1_claim = 0.7614884963273235       
f1_evidence = 0.8844663349751232 
#g
eval_f1_macro = 0.8364950775840218  
eval_f1_micro = 0.8898828978577965           
f1_claim = 0.7977133523887301                
f1_evidence = 0.8748910508586369
#m
eval_f1_macro = 0.8167979303797827
eval_f1_micro = 0.8805727062763969  
f1_claim = 0.7524015694763901      
f1_evidence = 0.8808520079875749 

# Train en+es => test es (deepl+simalign)(BETO)

# deepl+awesome
# neoplasm
eval_f1_macro = 0.817383732726738  
eval_f1_micro = 0.884087804330333  
f1_claim = 0.7551413881748072                
f1_evidence = 0.8873463114754099 

# glaucoma
eval_f1_macro = 0.8216812826486678  
eval_f1_micro = 0.8950146739337577         
f1_claim = 0.804414261460102         
f1_evidence = 0.8817333014124971

#mixed
eval_f1_macro = 0.807591296933762 
eval_f1_micro = 0.8812696145678688        
f1_claim = 0.7207846410684474             
f1_evidence = 0.8867386180374642   

# opus+awesome
# neoplasm
eval_f1_macro = 0.8099494109863706       
eval_f1_micro = 0.8818545133151443         
f1_claim = 0.7401346179226606             
f1_evidence = 0.8889993719907892 

# glaucoma
eval_f1_macro = 0.8176784586783586    
eval_f1_micro = 0.8951697584879243           
f1_claim = 0.7991386735572782               
f1_evidence = 0.88249885605568 

#mixed
eval_f1_macro = 0.8072936890406357 
eval_f1_micro = 0.8798016161770901        
f1_claim = 0.7395671240628094                     
f1_evidence = 0.8830523513753328  -->

<a id='rel_clf'></a>
# Relation classification

### Train

In [4]:
data = pd.read_csv('/ecai2020-transformer_based_am/data/neoplasm/train_relations.tsv', sep='\t', header=None)
data.head()

Unnamed: 0,0,1,2
0,__label__noRel,The overall remission rate was 87% with 31% co...,The median survival of all 406 eligible patien...
1,__label__noRel,The overall remission rate was 87% with 31% co...,"The overall remission rate, the rate of comple..."
2,__label__noRel,The overall remission rate was 87% with 31% co...,In limited disease the estimated percentages o...
3,__label__noRel,The overall remission rate was 87% with 31% co...,Patients with extensive disease survived signi...
4,__label__noRel,The overall remission rate was 87% with 31% co...,In the latter patients the received dose inten...


In [5]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 14286 entries, 0 to 14285
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   0       14286 non-null  object
 1   1       14286 non-null  object
 2   2       14286 non-null  object
dtypes: object(3)
memory usage: 335.0+ KB


In [6]:
data[0].value_counts()

__label__noRel      12892
__label__Support     1194
__label__Attack       200
Name: 0, dtype: int64

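The label counts above are heavily skewed towards `__label__noRel`. The sketch below turns them into class shares and into inverse-frequency weights using the common `total / (n_classes * count)` heuristic. The notebook does not say that class weighting was used during training, so the weights are shown purely as an illustration of the imbalance.

```python
# Counts copied from the value_counts() output for the training relations.
counts = {"__label__noRel": 12892, "__label__Support": 1194, "__label__Attack": 200}

total = sum(counts.values())
shares = {label: n / total for label, n in counts.items()}

# "Balanced" heuristic: rarer classes receive proportionally larger weights.
weights = {label: total / (len(counts) * n) for label, n in counts.items()}

for label in counts:
    print(f"{label:18s} share={shares[label]:.3f} weight={weights[label]:.2f}")
```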
In [22]:
data[1].nunique()

2266

In [13]:
# Total number of characters over all rows of column 1
count = sum(len(line) for line in data[1])

print(f'Number of characters in training data: {count}')
    

Number of characters in training data: 1882891


### Test

In [14]:
test = pd.read_csv('/ecai2020-transformer_based_am/data/neoplasm/test_relations.tsv', sep='\t', header=None)
test.head()

Unnamed: 0,0,1,2
0,__label__noRel,Responses (> or = 50% improvement) were seen i...,"The rate of progressive disease was 47%, 21%, ..."
1,__label__noRel,Responses (> or = 50% improvement) were seen i...,Eight (73%) of 11 patients crossing over from ...
2,__label__noRel,Responses (> or = 50% improvement) were seen i...,The median duration of response from start of ...
3,__label__noRel,Responses (> or = 50% improvement) were seen i...,The following drug-related adverse effects wer...
4,__label__noRel,Responses (> or = 50% improvement) were seen i...,"No cases of drug-related neutropenic fever, se..."


In [27]:
test.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4380 entries, 0 to 4379
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   0       4380 non-null   object
 1   1       4380 non-null   object
 2   2       4380 non-null   object
dtypes: object(3)
memory usage: 102.8+ KB


In [26]:
test[0].value_counts()

__label__noRel      3961
__label__Support     359
__label__Attack       60
Name: 0, dtype: int64

In [23]:
test[1].nunique()

686

In [25]:
# Total number of characters over all rows of column 1
count = sum(len(line) for line in test[1])

print(f'Number of characters in test data: {count}')
    

Number of characters in test data: 577319


### Dev

In [28]:
dev = pd.read_csv('/Users/anaryegen/Desktop/UPV/servers/ecai2020-transformer_based_am/data/neoplasm/dev_relations.tsv', sep='\t', header=None)
dev.head()

Unnamed: 0,0,1,2
0,__label__noRel,Intervention increased SF36 score by 9.5 point...,"Effect size (ES) was 0.63 [0.37; 0.90], 0.29 [..."
1,__label__noRel,Intervention increased SF36 score by 9.5 point...,Anxiety score was shortly minored by intervent...
2,__label__Support,Intervention increased SF36 score by 9.5 point...,This 2-week group intervention seemed to durab...
3,__label__noRel,Intervention increased SF36 score by 9.5 point...,"Differences, smaller at 12 months than at six,..."
4,__label__noRel,"Effect size (ES) was 0.63 [0.37; 0.90], 0.29 [...",Intervention increased SF36 score by 9.5 point...


In [29]:
dev.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2030 entries, 0 to 2029
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   0       2030 non-null   object
 1   1       2030 non-null   object
 2   2       2030 non-null   object
dtypes: object(3)
memory usage: 47.7+ KB


In [30]:
dev[0].value_counts()

__label__noRel      1815
__label__Support     185
__label__Attack       30
Name: 0, dtype: int64

In [32]:
dev[1].nunique()

326

### Results

|| precision | recall | f1-score | support |
|-|-|-|-|-|
|**sciBERT**|
|noRel|0.95|0.96|0.96|1815|
|Support|0.61|0.61|0.61|185|
|Attack|0.78|0.23|0.36|30|
|accuracy|||0.92|2030|
|macro avg|0.78|0.60|0.64|2030|
|weighted avg|0.92|0.92|0.92|2030|
|**bioBERT**|
|noRel|0.95|0.97|0.96|1815|
|Support|0.68|0.64|0.66|185|
|Attack|0.00|0.00|0.00|30|
|accuracy|||0.93|2030|
|macro avg|0.54|0.54|0.54|2030|
|weighted avg|0.91|0.93|0.92|2030|
|**BERT**|
|noRel|0.94|0.96|0.95|1815|
|Support|0.59|0.57|0.58|185|
|Attack|0.00|0.00|0.00|30|
|accuracy|||0.91|2030|
|macro avg|0.51|0.51|0.51|2030|
|weighted avg|0.90|0.91|0.90|2030|

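The `macro avg` rows in the table above are unweighted means over the three classes, which is why a class with only 30 examples pulls the macro F1 down so strongly while accuracy stays above 0.9. A minimal sketch of that arithmetic, with the per-class F1 values typed in from the table (the variable and function names are illustrative):

```python
# Per-class F1 scores copied from the dev-set table,
# in row order (support 1815, 185, 30).
per_class_f1 = {
    "sciBERT": [0.96, 0.61, 0.36],
    "bioBERT": [0.96, 0.66, 0.00],
    "BERT": [0.95, 0.58, 0.00],
}


def macro_average(scores):
    """Unweighted mean: every class counts equally, whatever its support."""
    return sum(scores) / len(scores)


macro_f1 = {model: round(macro_average(f1s), 2) for model, f1s in per_class_f1.items()}
print(macro_f1)
```

Rounded to two decimals, this reproduces the macro F1 values reported in the table (0.64, 0.54 and 0.51).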

In [60]:
#sciBert
eval_f1_macro = 0.6411988822634034         
eval_f1_macro_filtered = 0.7823111439079256

eval_f1_micro = 0.9187192118226601       
eval_f1_micro_filtered = 0.9241482218353644

#bioBert
eval_f1_macro = 0.5397247640802364
eval_f1_macro_filtered = 0.8095871461203545

eval_f1_micro = 0.9261083743842364
eval_f1_micro_filtered = 0.9332340531149169

#Bert
eval_f1_macro = 0.5100361117076702
eval_f1_macro_filtered = 0.7650541675615052

eval_f1_micro = 0.9103448275862069
eval_f1_micro_filtered = 0.9171215880893302

|| precision | recall | f1-score | support |
|-|-|-|-|-|
|**es => es (BETO)**|
|noRel|0.93|0.98|0.95|1814|
|Support|0.62|0.42|0.50|185|
|Attack|0.00|0.00|0.00|30|
|accuracy|||0.91|2029|
|macro avg|0.52|0.47|0.49|2029|
|weighted avg|0.89|0.91|0.90|2029|
|**en => es**|
|noRel|0.93|0.97|0.95|1814|
|Support|0.57|0.39|0.46|185|
|Attack|0.45|0.17|0.24|30|
|accuracy|||0.90|2029|
|macro avg|0.65|0.51|0.55|2029|
|weighted avg|0.89|0.90|0.89|2029|
   
        

In [None]:
# es => es
eval_f1_macro = 0.5623686841904604
eval_f1_macro_filtered = 0.7153478980805623
eval_f1_micro = 0.905864958107442
eval_f1_micro_filtered = 0.9121672057725804

#en => es
eval_f1_macro = 0.551885577785158
eval_f1_macro_filtered = 0.705877147165542

eval_f1_micro = 0.9043863972400197
eval_f1_micro_filtered = 0.9111277072442121

