## Reading Causal Relations Corpora
By: Pedram Hosseini (phosseini@gwu.edu)

Several causal relation corpora and resources have been created at different levels of granularity. Although valuable, these resources are scattered and do not follow a unified schema, which makes them hard for the NLP community to use. To address this, I’ve developed helper methods in a **Converter** class that convert all of these resources into a single, simple, user-friendly format. The data sets currently covered in CREST are:

- **SemEval 2007 Task 4** - Public (source: **1**)
- **SemEval 2010 Task 8** - Public (source: **2**)
- **EventCausality** - Public (source: **3**)
- **Causal-TimeBank** - Not public (source: **4**)
- **EventStoryLine (v0.9, v1.0, v1.5)** - Public (source: **5**)
- **CaTeRS** - Public (source: **6**)
- **BECAUSE v2.1** - Public (source: **7**)
- **Choice of Plausible Alternatives (COPA)** - Public (source: **8**)
- **Penn Discourse TreeBank (PDTB) 3.0** - Not public (source: **9**)
- **BioCause** - Public (source: **10**)
- **Temporal and Causal Reasoning (TCR)** - Public (source: **11**)
- **Benchmark Corpus for Adverse Drug Effects (ADE)** - Public (source: **12**)
- **SemEval 2020 Task 5** - Public (source: **13**)
- **Your data set?**

#### JOIN US
We invite everyone in the ML/NLP/NLU community, and groups of researchers who work on causal/counterfactual relation extraction in language, to contribute to this repository so that we can all take a step forward in improving the quality of available data resources and reducing their fragmentation.

In [1]:
import os
import sys

# Make the repository's src/ directory importable from this notebook.
sys.path.insert(0, os.path.join(os.path.abspath('..'), 'src'))

from crest import Converter, crest2brat
from utils import min_avg_max

converter = Converter()
total_samples = 0

## SemEval 2007 Task 4 

In [2]:
data, mis = converter.convert_semeval_2007_4()
total_samples += len(data)

print("samples: " + str(len(data)))
print("mismatch: " + str(mis))
print("+ causal: {}".format(len(data.loc[data["label"] == 1])))
print("- non-causal: {}".format(len(data.loc[data["label"] == 0])))

# crest2brat(data, '../data/crest_brat/1')

samples: 1529
mismatch: 0
+ causal: 114
- non-causal: 1415


In [3]:
data.head()

Unnamed: 0,global_id,original_id,span1,span2,signal,context,idx,label,direction,source,ann_file,split
0,1,1,[tumor shrinkage],[radiation therapy],[],The period of tumor shrinkage after radiation ...,span1 14:29\nspan2 36:53\nsignal,1,1,1,,0
1,2,2,[Habitat degradation],[stream channels],[],Habitat degradation from within stream channel...,span1 0:19\nspan2 32:47\nsignal,0,1,1,,0
2,3,3,[discomfort],[traveling],[],Earplugs relieve the discomfort from traveling...,span1 21:31\nspan2 37:46\nsignal,1,1,1,,0
3,4,4,[daily terror],[antipersonnel land mines],[],We continue to see progress toward a world fre...,span1 55:67\nspan2 71:95\nsignal,1,1,1,,0
4,5,5,[segment],[anecdotes],[],The Global Warming segment starts off with two...,span1 19:26\nspan2 53:62\nsignal,0,1,1,,0


In [4]:
min_avg_max(data)

Avg. length: 17.521909744931328
+++++++++++++++
min length/id: {'len_min': 3, 'original_id': 83}
min context: Trees grow seeds.
+++++++++++++++
max length/id: {'len_max': 82, 'original_id': 7}
max context: Literary criticism is the study of literature by means of a microscopic knowledge of the language in which a book is written, of its growth from various roots, of its stages of development and the factors influencing them, of its condition in the period of this particular composition, of the writer's idiosyncrasies of thought and style in his ripening periods, of the general history and literature of his race, and of the special characteristics of his age and of his contemporary writers.
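
The lengths reported by `min_avg_max` in this notebook are consistent with counting the pieces obtained by splitting each context on single spaces (for example, `Trees grow seeds.` gives 3). The sketch below reproduces that kind of summary in plain Python. `context_length_stats` is a hypothetical helper written for illustration, not the implementation in `utils`, and the single-space splitting is an assumption inferred from the printed numbers.

```python
from statistics import mean


def context_length_stats(contexts):
    """Return (min, average, max) context length in space-separated pieces.

    Assumption: length is measured with str.split(" "), so a trailing
    space adds one empty piece to the count.
    """
    lengths = [len(text.split(" ")) for text in contexts]
    return min(lengths), mean(lengths), max(lengths)


# Short contexts taken from the outputs printed in this notebook.
contexts = [
    "Trees grow seeds.",
    "A misty ridge uprises from the surge.",
    "SEACOM downtime explained ",  # trailing space, as printed
]
shortest, average, longest = context_length_stats(contexts)
print(shortest, average, longest)
```

If the real helper tokenizes differently, only the `split(" ")` call would need to change.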


## SemEval 2010 Task 8

In [5]:
data, mis = converter.convert_semeval_2010_8()
total_samples += len(data)

print("samples: {}".format(len(data)))
print("mismatch: " + str(mis))
print("+ causal: {}".format(len(data.loc[data["label"] == 1])))
print("- non-causal: {}".format(len(data.loc[data["label"] == 0])))

# crest2brat(data, '../data/crest_brat/2')

samples: 10717
mismatch: 0
+ causal: 1331
- non-causal: 9386


In [6]:
data.head()

Unnamed: 0,global_id,original_id,span1,span2,signal,context,idx,label,direction,source,ann_file,split
0,1,1,[configuration],[elements],[],The system as described above has its greatest...,span1 73:86\nspan2 98:106\nsignal,0,1,2,,0
1,2,2,[child],[cradle],[],The child was carefully wrapped and bound into...,span1 4:9\nspan2 51:57\nsignal,0,-1,2,,0
2,3,3,[author],[disassembler],[],The author of a keygen uses a disassembler to ...,span1 4:10\nspan2 30:42\nsignal,0,1,2,,0
3,4,4,[ridge],[surge],[],A misty ridge uprises from the surge.,span1 8:13\nspan2 31:36\nsignal,0,-1,2,,0
4,5,5,[student],[association],[],The student association is the voice of the un...,span1 4:11\nspan2 12:23\nsignal,0,0,2,,0
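
The `idx` column in the rows above appears to store character offsets into `context`, with one `name start:end` entry per line for `span1`, `span2`, and `signal`. A minimal sketch of reading those offsets back, assuming that layout and Python-style half-open ranges; `parse_idx` and `extract_spans` are hypothetical helpers, not part of `crest`:

```python
def parse_idx(idx):
    """Parse an idx string into {field: [(start, end), ...]}.

    Assumed layout: one field per line, the field name followed by zero
    or more whitespace-separated 'start:end' pairs.
    """
    offsets = {}
    for line in idx.splitlines():
        parts = line.split()
        if not parts:
            continue
        name, pairs = parts[0], parts[1:]
        offsets[name] = [tuple(int(n) for n in pair.split(":")) for pair in pairs]
    return offsets


def extract_spans(context, idx):
    """Slice the context with the parsed offsets for each field."""
    return {
        name: [context[start:end] for start, end in ranges]
        for name, ranges in parse_idx(idx).items()
    }


# Row 3 of the table above; it has no signal, so that entry is empty.
context = "A misty ridge uprises from the surge."
idx = "span1 8:13\nspan2 31:36\nsignal"
print(extract_spans(context, idx))
# -> {'span1': ['ridge'], 'span2': ['surge'], 'signal': []}
```

Comparing the recovered strings with the `span1` and `span2` columns is a quick way to sanity-check a converted row.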


In [7]:
min_avg_max(data)

Avg. length: 17.21246617523561
+++++++++++++++
min length/id: {'len_min': 3, 'original_id': 7587}
min context: Trees grow seeds.
+++++++++++++++
max length/id: {'len_max': 85, 'original_id': 4025}
max context: It was formerly known as How Park, possibly through the early connexion of William de Ow with the parish, and had its origin in the charter of 1200 granting William Briwere the elder chase of hare, fox, cat and wolf through all the king's land (per totam terram nostram) and warren of hares, pheasants and partridges throughout all his own lands, as also licence to inclose two coppices, one of which was situated between King's Somborne and Stockbridge and the other was called How Wood.


## EventCausality data set

In [8]:
data, mis = converter.convert_event_causality()
total_samples += len(data)

print("samples: " + str(len(data)))
print("mismatch: " + str(mis))
print("+ causal: {}".format(len(data.loc[data["label"] == 1])))
print("- non-causal: {}".format(len(data.loc[data["label"] == 0])))

# crest2brat(data, '../data/crest_brat/3')

samples: 583
mismatch: 1
+ causal: 583
- non-causal: 0


In [9]:
data.head()

Unnamed: 0,global_id,original_id,span1,span2,signal,context,idx,label,direction,source,ann_file,split
0,1,R_1_3_1_102010.01.13.google.china.exit,[search],[return],[],"Previously , a search for "" Tiananmen "" would ...",span1 15:21\nspan2 51:57\nsignal,1,0,3,2010.01.13.google.china.exit,1
1,2,C_6_4_6_102010.01.13.google.china.exit,[attacks],[conclude],[],"The company says the attacks "" have led us to ...",span1 21:28\nspan2 46:54\nsignal,1,0,3,2010.01.13.google.china.exit,1
2,3,C_6_10_6_142010.01.13.google.china.exit,[conclude],[review],[],"The company says the attacks "" have led us to ...",span1 46:54\nspan2 70:76\nsignal,1,0,3,2010.01.13.google.china.exit,1
3,4,C_12_5_12_182010.01.13.google.china.exit,[deliveries],[interpreted],[],A large number of flower deliveries were made ...,span1 25:35\nspan2 105:116\nsignal,1,0,3,2010.01.13.google.china.exit,1
4,5,C_14_3_14_142010.01.13.google.china.exit,[leaves],[advancement],[],""" If Google leaves China , it is likely to be ...",span1 12:18\nspan2 57:68\nsignal,1,0,3,2010.01.13.google.china.exit,1


In [10]:
data.iloc[35].context

'" If these men are ever found , jail wo n\'t be enough to make them pay for the way they \'ve made us feel . " '

In [11]:
min_avg_max(data)

Avg. length: 42.451114922813034
+++++++++++++++
min length/id: {'len_min': 11, 'original_id': 'R_10_8_10_52010.01.01.iran.moussavi'}
min context: At least eight people were killed during those protests . 
+++++++++++++++
max length/id: {'len_max': 202, 'original_id': 'C_3_2_11_12010.03.17.france.eta.policeman'}
max context: French police responded to reports of car theft in a town near Paris late Tuesday and a shootout ensued with a group of alleged thieves . Most of them escaped but police captured one and he was later identified as a suspected ETA member , said the spokeswoman , who by custom is not identified .  Spanish media reported that the shootout occurred in the town of Dammarie-les-Lys .  The dead French policeman was wearing a bullet-proof vest but bullets struck fatally elsewhere on his body .  He was reported to be in his 50s , and the father of four children .  ETA has traditionally used France as its rearguard logistics and planning base to prepare attacks across the bor

## Causal-TimeBank

In [12]:
data, mis = converter.convert_causal_timebank()
total_samples += len(data)

print("samples: " + str(len(data)))
print("mismatch: " + str(mis))
print("+ causal: {}".format(len(data.loc[data["label"] == 1])))
print("- non-causal: {}".format(len(data.loc[data["label"] == 0])))

# crest2brat(data, '../data/crest_brat/4')

samples: 318
mismatch: 0
+ causal: 318
- non-causal: 0


In [13]:
data.head()

Unnamed: 0,global_id,original_id,span1,span2,signal,context,idx,label,direction,source,ann_file,split
0,1,10,[was],[thought],[So],"Not that long ago , before the Chinese takeove...",span1 82:85\nspan2 215:222\nsignal 147:149,1,0,4,ABC19980108.1830.0711.xml,
1,2,27,[downturn],[spending],[],"But in the past three months , stocks have plu...",span1 88:96\nspan2 139:147\nsignal,1,0,4,ABC19980108.1830.0711.xml,
2,3,36,[change],[reposition],[So],"I think that the mood is fairly gloomy , and I...",span1 72:78\nspan2 174:184\nsignal 103:105,1,0,4,ABC19980108.1830.0711.xml,
3,4,6,[rains],[landslides],[],Officials in California are warning residents ...,span1 60:65\nspan2 105:115\nsignal,1,0,4,PRI19980213.2000.0313.xml,
4,5,22,[rains],[get],[],Forecasters say the picture will get worse bec...,span1 56:61\nspan2 33:36\nsignal,1,0,4,PRI19980213.2000.0313.xml,


In [14]:
min_avg_max(data)

Avg. length: 32.79874213836478
+++++++++++++++
min length/id: {'len_min': 13, 'original_id': '32'}
min context: Iraq said the roundup was to protect them from unspecified threats ; 
+++++++++++++++
max length/id: {'len_max': 107, 'original_id': '3'}
max context: WASHINGTON _ Following are statements made Friday and Thursday by Lawrence Wechsler , a lawyer for the White House secretary , Betty Currie ; the White House ; White House spokesman Mike McCurry , and President Clinton in response to an article in The New York Times on Friday about her statements regarding a meeting with the president : Wechsler on Thursday " Without commenting on the allegations raised in this article , to the extent that there is any implication or suggestion that Mrs. Currie was aware of any legal or ethical impropriety by anyone , that implication or suggestion is entirely inaccurate . " 


## EventStoryLine

In [15]:
data, mis = converter.convert_eventstorylines_v1(version="1.5")
total_samples += len(data)

print("samples: " + str(len(data)))
print("mismatch: " + str(mis))
print("+ causal (PRECONDITION and FALLING_ACTION): {}".format(len(data.loc[data["label"] == 1])))
print("- non-causal: {}".format(len(data.loc[data["label"] == 0])))

# crest2brat(data, '../data/crest_brat/5')

samples: 2608
mismatch: 0
+ causal (PRECONDITION and FALLING_ACTION): 2608
- non-causal: 0


In [16]:
data.head(2)

Unnamed: 0,global_id,original_id,span1,span2,signal,context,idx,label,direction,source,ann_file,split
0,1,246682,[double murder],[killing],[],Cumbria double murder : Son suspected of killi...,span1 8:21\nspan2 41:48\nsignal,1,1,5,32_11ecbplus.xml.xml,
1,2,246683,[sectioned],[suicide attempt],[],"John Jenkin , 23 , had been sectioned after an...",span1 28:37\nspan2 56:71\nsignal,1,1,5,32_11ecbplus.xml.xml,


In [17]:
a = data.loc[data['context'] == 'SEACOM downtime explained']
b = data.loc[data['original_id'] == '245609']

In [18]:
b

Unnamed: 0,global_id,original_id,span1,span2,signal,context,idx,label,direction,source,ann_file,split
1382,1383,245609,[downtime],[explained],[],SEACOM downtime explained,span1 7:15\nspan2 16:25\nsignal,1,0,5,30_6ecbplus.xml.xml,


In [19]:
min_avg_max(data)

Avg. length: 43.20782208588957
+++++++++++++++
min length/id: {'len_min': 4, 'original_id': '245609'}
min context: SEACOM downtime explained 
+++++++++++++++
max length/id: {'len_max': 839, 'original_id': '241458'}
max context: The Athens protest march marking the zenith of the general strike called for the 5th of May was attended by an approximate 200 , 000 ( 20 , 000 which is the foreign broadcast number referring to the PAME march alone ) , although because of lack of media coverage due to the media participation in the general strike no concrete estimates can be made . After the PAME ( Communist Party union ) protesters left Syntagma square , the first lines of the main march started arriving before the Parliament with the first clashes erupting at the end of Stadiou street . The march then walked on the Unknown Soldier grounds leading the Presidential Guard to retreat , and attempted to storm the Parliament but was pushed back by riot police forces which today demonstrated a parti

## CaTeRS

In [20]:
data, mis = converter.convert_caters()
total_samples += len(data)

print("samples: " + str(len(data)))
print("mismatch: " + str(mis))
print("+ causal: {}".format(len(data.loc[(data["label"] == 1) | (data["label"] == 2)])))
print("- non-causal: {}".format(len(data.loc[data["label"] == 0])))

# crest2brat(data, '../data/crest_brat/6')

samples: 2502
mismatch: 0
+ causal: 308
- non-causal: 2194


In [21]:
data.head()

Unnamed: 0,global_id,original_id,span1,span2,signal,context,idx,label,direction,source,ann_file,split
0,1,R1,[resuscitation],[passed away],[],There was a man in the alley named Bill\nBill ...,span1 194:207\nspan2 166:177\nsignal,0,0,6,test_15Oct.ann,2
1,2,R2,[shot],[passed away],[],There was a man in the alley named Bill\nBill ...,span1 136:140\nspan2 166:177\nsignal,1,0,6,test_15Oct.ann,2
2,3,R3,[intoxicated],[insulted],[],There was a man in the alley named Bill\nBill ...,span1 49:60\nspan2 91:99\nsignal,1,0,6,test_15Oct.ann,2
3,4,R4,[insulted],[shot],[],There was a man in the alley named Bill\nBill ...,span1 91:99\nspan2 136:140\nsignal,1,0,6,test_15Oct.ann,2
4,5,R6,[ruined],[sad],[],Grayson wanted to bake his brother a birthday ...,span1 192:198\nspan2 232:235\nsignal,1,0,6,test_15Oct.ann,2


In [22]:
min_avg_max(data)

Avg. length: 42.48760991207035
+++++++++++++++
min length/id: {'len_min': 20, 'original_id': 'R90'}
min context: Billy felt lonely in school.
He had no friends.
One day, a new kid came to school.
They instantly became friends.
They became inseparable.
+++++++++++++++
max length/id: {'len_max': 65, 'original_id': 'R123'}
max context: My mother's cat was ill, so my brother took it to the vet for her.
They said the cat needed to stay there for some tests.
They called my brother an hour later, telling him to come pick it up.
When he got there, the girl at the front desk said the cat had died!
Later that day, she called to say she was mistaken, the cat was fine.


## BECAUSE v2.1

Since the raw text files for **PTB** and **NYT** require an LDC subscription, these files are not yet covered by our data reader. Once we have access to the raw files, we will write data readers for them.

In [None]:
# Running this cell may take a while because it uses spaCy (spacy.load("en_core_web_sm")) for processing documents.

data, mis = converter.convert_because()
total_samples += len(data)

print("samples: " + str(len(data)))
print("mismatch: " + str(mis))
print("+ causal: {}".format(len(data.loc[(data["label"] == 1) | (data["label"] == 2)])))
print("- non-causal: {}".format(len(data.loc[data["label"] == 0])))

# crest2brat(data, '../data/crest_brat/7')

In [None]:
data.head()

In [38]:
min_avg_max(data)

Avg. length: 32.358024691358025
+++++++++++++++
min length: 3
min context: so why
not? 
+++++++++++++++
max length: 84
max context: I think this is an important task, and there's a great deal of agreement, that we should be moving to empower the Federal Reserve to have regulatory authority over a wide range of financial institutions in recognition in part of the fact that they have a systemic impact and that the current situation puts the Fed in an untenable position of being given a set of expectations to respond when it doesn't have the full panoply of tools to respond.
    


## Choice of Plausible Alternatives (COPA)

In [23]:
data, mis = converter.convert_copa(dataset_code=2)
total_samples += len(data)

print("samples: " + str(len(data)))
print("mismatch: " + str(mis))
print("+ causal: {}".format(len(data.loc[data["label"] == 1])))
print("- non-causal: {}".format(len(data.loc[data["label"] == 0])))

samples: 2000
mismatch: 0
+ causal: 1000
- non-causal: 1000


In [24]:
data.head()

Unnamed: 0,global_id,original_id,span1,span2,signal,context,idx,label,direction,source,ann_file,split
0,1,1501,[The item was packaged in bubble wrap.],[It was fragile.],[],The item was packaged in bubble wrap. It was f...,span1 0:37\nspan2 38:53\nsignal,1,1,8,BCOPA-CE.xml,2
1,2,1501,[The item was packaged in bubble wrap.],[It arrived at its destination intact.],[],The item was packaged in bubble wrap. It arriv...,span1 0:37\nspan2 38:75\nsignal,0,1,8,BCOPA-CE.xml,2
2,3,1502,[I emptied my pockets.],[I retrieved a ticket stub.],[],I emptied my pockets. I retrieved a ticket stub.,span1 0:21\nspan2 22:48\nsignal,1,0,8,BCOPA-CE.xml,2
3,4,1502,[I emptied my pockets.],[I lost my keys.],[],I emptied my pockets. I lost my keys.,span1 0:21\nspan2 22:37\nsignal,0,0,8,BCOPA-CE.xml,2
4,5,1503,[Termites invaded the house.],[The termites ate through the wood in the house.],[],Termites invaded the house. The termites ate t...,span1 0:27\nspan2 28:75\nsignal,1,0,8,BCOPA-CE.xml,2
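
In the COPA rows above, each `original_id` appears twice: `span1` holds the premise, `span2` holds one of the two alternatives, and `label` is 1 for the plausible alternative. A small sketch of regrouping such rows into one record per question, using plain dictionaries with the column names from the table rather than the DataFrame itself. `group_alternatives` is a hypothetical helper, spans are simplified to strings, and the two-rows-per-id layout is inferred from the preview.

```python
from collections import defaultdict


def group_alternatives(rows):
    """Group rows by original_id and record the alternative labeled 1."""
    grouped = defaultdict(
        lambda: {"premise": None, "alternatives": [], "plausible": None}
    )
    for row in rows:
        question = grouped[row["original_id"]]
        question["premise"] = row["span1"]
        question["alternatives"].append(row["span2"])
        if row["label"] == 1:
            question["plausible"] = row["span2"]
    return dict(grouped)


# The two rows for original_id 1502 from the table above.
rows = [
    {"original_id": 1502, "span1": "I emptied my pockets.",
     "span2": "I retrieved a ticket stub.", "label": 1},
    {"original_id": 1502, "span1": "I emptied my pockets.",
     "span2": "I lost my keys.", "label": 0},
]
questions = group_alternatives(rows)
print(questions[1502]["plausible"])
```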


## Penn Discourse TreeBank (PDTB) 3.0

In [None]:
# Running this cell may take a while because it uses spaCy (spacy.load("en_core_web_sm")) for processing documents.

data, mis = converter.convert_pdtb3()
total_samples += len(data)

print("samples: " + str(len(data)))
print("mismatch: " + str(mis))
print("+ causal: {}".format(len(data.loc[data["label"] == 1])))
print("- non-causal: {}".format(len(data.loc[data["label"] == 0])))

In [None]:
data.head()

## BioCause

In [15]:
data, mis = converter.convert_biocause()
total_samples += len(data)

print("samples: " + str(len(data)))
print("mismatch: " + str(mis))
print("+ causal: {}".format(len(data.loc[data["label"] == 1])))
print("- non-causal: {}".format(len(data.loc[data["label"] == 0])))

[crest-log] Error in converting BioCause. Detail: 
samples: 844
mismatch: 0
+ causal: 844
- non-causal: 0


In [16]:
data.head(3)

Unnamed: 0,global_id,original_id,span1,span2,signal,context,idx,label,direction,source,ann_file,split
0,1,,[Each paired reaction set (TB/PA14) resulted i...,[there is little difference between the two is...,[These results show that],Characterization of the cheB2 Mutant \nWe chos...,span1 836:954\nspan2 997:1131\nsignal 973:996,1,0,10,PMC2714965-02-Results-05.ann,0
1,2,,"[As shown in Figure 3A, the newly engineered c...",[a delayed C. elegans killing comparable to th...,[showed],Characterization of the cheB2 Mutant \nWe chos...,span1 1377:1467\nspan2 1468:1558\nsignal 1461:...,1,0,10,PMC2714965-02-Results-05.ann,0
2,3,,"[In addition, we engineered a similar cheB2 mu...",[the virulence phenotype of a cheB2 mutant and...,[This further confirmed],Characterization of the cheB2 Mutant \nWe chos...,span1 1692:1882\nspan2 1919:2016\nsignal 1896:...,1,0,10,PMC2714965-02-Results-05.ann,0


In [10]:
crest2brat(data, 'biocause')

## Temporal and Causal Reasoning (TCR)

In [11]:
data, mis = converter.convert_tcr()
total_samples += len(data)

print("samples: " + str(len(data)))
print("mismatch: " + str(mis))
print("+ causal: {}".format(len(data.loc[data["label"] == 1])))
print("- non-causal: {}".format(len(data.loc[data["label"] == 0])))

samples: 172
mismatch: 0
+ causal: 172
- non-causal: 0


In [12]:
data.head(3)

Unnamed: 0,original_id,span1,span2,signal,context,idx,label,direction,source,ann_file,split
0,,[addressed],[killed],[],"\nMir Hossein Moussavi, the reformist Iranian ...",span1 204:213\nspan2 292:298\nsignal,1,1,11,2010.01.01.iran.moussavi.tml,1
1,,[killed],[denies],[],"\nMir Hossein Moussavi, the reformist Iranian ...",span1 292:298\nspan2 422:428\nsignal,1,0,11,2010.01.01.iran.moussavi.tml,1
2,,[stop],[making],[],"\nMir Hossein Moussavi, the reformist Iranian ...",span1 1733:1737\nspan2 1753:1759\nsignal,1,0,11,2010.01.01.iran.moussavi.tml,1


## Benchmark Corpus for Adverse Drug Effects (ADE)

In [8]:
data, mis = converter.convert_ade()
total_samples += len(data)

print("samples: " + str(len(data)))
print("mismatch: " + str(mis))
print("+ causal: {}".format(len(data.loc[data["label"] == 1])))
print("- non-causal: {}".format(len(data.loc[data["label"] == 0])))

samples: 5671
mismatch: 0
+ causal: 5671
- non-causal: 0


In [9]:
data.head(3)

Unnamed: 0,original_id,span1,span2,signal,context,idx,label,direction,source,ann_file,split
0,,[azithromycin],[ototoxicity],[],Intravenous azithromycin-induced ototoxicity.\...,span1 12:24\nspan2 33:44\nsignal,1,0,12,,0
1,,[dihydrotachysterol],[increased calcium-release],[],Unaccountable severe hypercalcemia in a patien...,span1 898:916\nspan2 950:975\nsignal,1,0,12,,0
2,,[hypercalcemia],[dihydrotachysterol],[],Unaccountable severe hypercalcemia in a patien...,span1 21:34\nspan2 84:102\nsignal,1,1,12,,0


In [10]:
min_avg_max(data)

Avg. length: 124.4203844119203
+++++++++++++++
min length/id: {'len_min': 20, 'original_id': ''}
min context: Prothipendylhydrochloride-induced priapism: case report.

We present the first case of a patient with priapism after oral intake of the phenothiazine prothipendylhydrochloride.
+++++++++++++++
max length/id: {'len_max': 420, 'original_id': ''}
max context: Erythema multiforme associated with phenytoin and cranial radiation therapy: a report of three patients and review of the literature.

Intracranial malignancies (primary and metastatic) are often complicated by seizure activity. Phenytoin (Dilantin) is typically employed as prophylactic anticonvulsant in this setting. Uncommonly, erythema multiforme (EM) can develop in such patients at the port site during or soon after cranial radiation and can rapidly progress to EM major. Herein, in addition to a comprehensive literature review of this entity, three additional patients are presented. The acronym 'EMPACT' is suggested (E: e

## SemEval 2020 Task 5

In [5]:
data, mis = converter.convert_semeval_2020_5()
total_samples += len(data)

print("samples: " + str(len(data)))
print("mismatch: " + str(mis))
print("+ causal: {}".format(len(data.loc[data["label"] == 1])))
print("- non-causal: {}".format(len(data.loc[data["label"] == 0])))

samples: 5501
mismatch: 0
+ causal: 5501
- non-causal: 0


In [6]:
data.head(3)

Unnamed: 0,original_id,span1,span2,signal,context,idx,label,direction,source,ann_file,split
0,,[I don't think any of us---even economic gurus...,[if the stimulus bill had become hamstrung by ...,[],I don't think any of us---even economic gurus ...,span1 0:139\nspan2 140:233\nsignal,1,1,13,,0
1,,[The GOP's malignant amnesia regarding the eco...,[were it not for the wreckage they caused],[],The GOP's malignant amnesia regarding the econ...,span1 0:68\nspan2 69:109\nsignal,1,1,13,,0
2,,[Had the SEC followed its own standard procedu...,[OPKO and Dr. Frost would gladly have provided...,[],OPKO said in a statement Friday that it was aw...,span1 229:277\nspan2 279:467\nsignal,1,0,13,,0


In [7]:
min_avg_max(data)

Avg. length: 30.79930921650609
+++++++++++++++
min length/id: {'len_min': 4, 'original_id': ''}
min context: I wish I were.
+++++++++++++++
max length/id: {'len_max': 265, 'original_id': ''}
max context: We have been fulfilling our role there." ON STEPPING UP BOND PURCHASES "That is a matter for further consideration, but as you know we don't precommit or even announce the type of policy that we pursue with the Securities Market Programme." "You only have after each week the total amount that has been used, as you have seen that is according to developments in the market, in particular in passing a judgment on whether such developments can have some economic justification in the fundamentals of the economy or not." "The program is still there and you have seen that in the more recent weeks in view of developments we have used a little bit more than in recent months." ON GREECE "There are two elements there (Greece)...in the first place the results for this year are in big part the resu