Commit

some modifications holocaust1
FEREDJ committed Aug 1, 2017
2 parents bc91891 + ffc27ba commit 1ccbea6
Showing 16 changed files with 202 additions and 187 deletions.
11 changes: 9 additions & 2 deletions grobid-ner/doc/class-and-senses.md
@@ -201,7 +201,14 @@ than 1,200</ENAMEX> synagogues were damaged or destroyed.

* <span style="color:#848484">_The history can be divided into four periods: the_</span> **_first_**<span style="color:#848484">, _from 1919 to 1940_</span> <br/>

* <span style="color:#848484">_there occurred a boycott of Jewish businesses, which was the_</span> **_first_** <span style="color:#848484">_national antisemitic campaign_</span> (the "first campaign" is the boycott) <br/> <br/>
* <span style="color:#848484">_there occurred a boycott of Jewish businesses, which was the_</span> **_first_** <span style="color:#848484">_national antisemitic campaign_</span> (the "first campaign" is the boycott) <br/>

* **_second_** <span style="color:#848484">_place in the 2009 European elections and_</span> **_first_** <span style="color:#848484">_place in the 2014 European elections_</span> <br/>

* <span style="color:#848484">_his was the_</span> **_first_** <span style="color:#848484">_time since the 1910 general election_</span> <br/>
* <span style="color:#848484">_These were their_</span> **_first_** <span style="color:#848484">_elected MPs_</span> <br/> <br/>

* But referring expressions, or ordinals not really ordering or quantifying, should **not** be annotated MEASURE.
For example:
@@ -210,7 +217,7 @@ For example:

* Plurals like in <span style="color:#848484">_the first jews to be deported_</span>.

=> in these examples it's impossible to enumerate precisely what is « first ».
=> in these examples it's impossible to enumerate precisely what is « first ». Furthermore, it can't really be replaced by "second" or "third".

➡ Expressions measuring nothing are not to be annotated, for example [(issue #14)](https://github.com/kermitt2/grobid-ner/issues/14):

6 changes: 3 additions & 3 deletions grobid-ner/doc/largest-entity-mention.md
@@ -83,13 +83,13 @@ Coordinated words are annotated as one entity. For example:
Two named entities in apposition are annotated as one NE. If there is a comma, its role is equivalent to a functional word and introduces an apposition, therefore it does not split the entity. For example:

```xml
- Meanwhile, <ENAMEX type="PERSON">Nigel Farage, leader of the anti-EU UKIP</ENAMEX> stood
- Meanwhile, <ENAMEX type="PERSON">Neil Hamilton, Chairman of the anti-EU UKIP</ENAMEX> stood
down after his party's long-term ambition had been accomplished.

- Meanwhile, the <ENAMEX type="PERSON">leader of the anti-EU UKIP, Nigel Farage</ENAMEX> stood
- Meanwhile, the <ENAMEX type="PERSON">Chairman of the anti-EU UKIP, Neil Hamilton</ENAMEX> stood
down after his party's long-term ambition had been accomplished.

- Meanwhile, the <ENAMEX type="PERSON">leader of the anti-EU UKIP Nigel Farage</ENAMEX> stood
- Meanwhile, the <ENAMEX type="PERSON">Chairman of the anti-EU UKIP Neil Hamilton</ENAMEX> stood
down after his party's long-term ambition had been accomplished.
```
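
To check the apposition rule mechanically, one could list the entity spans in an annotated sentence and confirm that the name and its apposition fall inside a single `ENAMEX` element. The following is a minimal sketch using only Python's standard library; the helper name `list_entities`, the `<sentence>` wrapper, and the shortened sample sentence are assumptions made for illustration and are not part of grobid-ner.

```python
import xml.etree.ElementTree as ET


def list_entities(sentence_xml: str):
    """Return (type, text) pairs for each ENAMEX element in a sentence.

    Hypothetical helper for illustration only; not a grobid-ner API.
    """
    root = ET.fromstring(sentence_xml)
    # itertext() gathers all text inside the element, including nested text.
    return [(e.get("type"), "".join(e.itertext())) for e in root.iter("ENAMEX")]


# Shortened version of the first guideline example, wrapped in a
# <sentence> element so that it parses as well-formed XML.
sample = (
    '<sentence>Meanwhile, <ENAMEX type="PERSON">Neil Hamilton, Chairman of '
    'the anti-EU UKIP</ENAMEX> stood down.</sentence>'
)

entities = list_entities(sample)
print(entities)
# → [('PERSON', 'Neil Hamilton, Chairman of the anti-EU UKIP')]
```

A single `PERSON` span covering both the name and the title is what the guideline asks for; two separate spans would indicate that the comma was wrongly treated as an entity boundary.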

@@ -7,15 +7,15 @@
<sentence xml:id="P0E0">In retrospect, the <ENAMEX type="INSTITUTION">National Archives of Belgium</ENAMEX> were established by the <ENAMEX type="LEGAL">French law of October 26th 1796 (5 Brumair V)</ENAMEX>, which, amongst others, foresaw in the organisation of departmental depots (amongst others, in <ENAMEX type="LOCATION">Brussels</ENAMEX>), in which the archives of the disbanded institutions of the <ENAMEX type="PERIOD">Ancien Régime</ENAMEX> would be stored.</sentence>
<sentence xml:id="P0E1">In <ENAMEX type="PERIOD">1831</ENAMEX>, the archive depot in <ENAMEX type="LOCATION">Brussels</ENAMEX> was officially named the <ENAMEX type="INSTITUTION">National Archives of Belgium</ENAMEX>.</sentence>
<sentence xml:id="P0E2">Already in the <ENAMEX type="PERIOD">early nineteenth century</ENAMEX>, more archival depots in the provinces were installed, which were officially placed under the direction of the <ENAMEX type="TITLE">National State Archivist</ENAMEX> (who holds his office in the <ENAMEX type="INSTITUTION">National Archives</ENAMEX>) in <ENAMEX type="PERIOD">1851</ENAMEX>.</sentence>
<sentence xml:id="P0E3">The “<ENAMEX type="INSTITUTION">Archives Générales du Royaume</ENAMEX>”(<ENAMEX type="INSTITUTION">National Archives of Belgium</ENAMEX>) and the “<ENAMEX type="INSTITUTION">Archives de l’État dans les Provinces</ENAMEX>”(<ENAMEX type="INSTITUTION">State Archives in the Provinces</ENAMEX>), in other words the <ENAMEX type="INSTITUTION">State Archives</ENAMEX> are a federal academic establishment that forms part of the <ENAMEX type="INSTITUTION">Service Public Fédéral de Programmation Politique scientifique</ENAMEX>”(<ENAMEX type="INSTITUTION">Belgian Federal Science Policy Office</ENAMEX>).</sentence>
<sentence xml:id="P0E3"><ENAMEX type="INSTITUTION">The “Archives Générales du Royaume”(National Archives of Belgium) and the “Archives de l’État dans les Provinces”(State Archives in the Provinces)</ENAMEX>, in other words the <ENAMEX type="INSTITUTION">State Archives</ENAMEX> are a federal academic establishment that forms part of the <ENAMEX type="INSTITUTION">Service Public Fédéral de Programmation Politique scientifique”(Belgian Federal Science Policy Office)</ENAMEX>.</sentence>
<sentence xml:id="P0E4">The institution includes the “<ENAMEX type="INSTITUTION">Archives Générales du Royaume</ENAMEX>” in <ENAMEX type="LOCATION">Brussels</ENAMEX> and <ENAMEX type="MEASURE">18</ENAMEX> <ENAMEX type="INSTITUTION">State Archives</ENAMEX> that are distributed throughout the country.</sentence>
<sentence xml:id="P0E5">The <ENAMEX type="INSTITUTION">State Archives</ENAMEX> ensure the proper preservation of archival documents produced and managed by the state authorities.</sentence>
<sentence xml:id="P0E6">For this purpose, the <ENAMEX type="INSTITUTION">State Archives</ENAMEX> issue directives and recommendations, conduct inspections, organises training for civil servants and act as an advisory body for the construction and preparation of premises for the conservation of archives and for the organisation of archive management within a public authority.</sentence>
<sentence xml:id="P0E7">The <ENAMEX type="INSTITUTION">State Archives</ENAMEX> obtain and preserve (following sorting) archive documents that are at least <ENAMEX type="PERIOD">30 years</ENAMEX> old from courts, tribunals, public authorities, notaries and from the private sector and private individuals (companies, politicians, associations and societies, influential families, etc. that have played an important role in society).</sentence>
<sentence xml:id="P0E8">They ensure that public archives are transferred according to strict archival standards.</sentence>
</p>
<p xml:lang="en" xml:id="P1">
<sentence xml:id="P1E0">The <ENAMEX type="INSTITUTION">National Archives of Belgium</ENAMEX> 2 – <ENAMEX type="PERSON">Joseph Cuvelier</ENAMEX> repository preserves the archives of the external services of the <ENAMEX type="INSTITUTION">Federal Public Service Justice</ENAMEX> (penal institutions), the courts and tribunals under the responsiblity of the <ENAMEX type="LOCATION">Brussels-Capital Region</ENAMEX> (justices of peace, <ENAMEX type="INSTITUTION">police tribunals, Court of Cassation</ENAMEX>, etc.), the <ENAMEX type="INSTITUTION">Federal Public Service Economy</ENAMEX> (patents), the <ENAMEX type="INSTITUTION">Ministry for Reconstruction</ENAMEX> (files on war damages) and business archives.</sentence>
<sentence xml:id="P1E0">The <ENAMEX type="INSTITUTION">National Archives of Belgium 2 – Joseph Cuvelier repository</ENAMEX> preserves the archives of the external services of the <ENAMEX type="INSTITUTION">Federal Public Service Justice</ENAMEX> (penal institutions), the courts and tribunals under the responsiblity of the <ENAMEX type="LOCATION">Brussels-Capital Region</ENAMEX> (<ENAMEX type="INSTITUTION">justices of peace, police tribunals, Court of Cassation</ENAMEX>, etc.), the <ENAMEX type="INSTITUTION">Federal Public Service Economy</ENAMEX> (patents), the <ENAMEX type="INSTITUTION">Ministry for Reconstruction</ENAMEX> (files on war damages) and business archives.</sentence>
</p>
<p xml:lang="en" xml:id="P2">
<sentence xml:id="P2E0">There are several online search engines: keyword, archives, creator, persons, themes (<ENAMEX type="WEBSITE">http://search.arch.be/</ENAMEX>).</sentence>
@@ -21,7 +21,7 @@
<sentence xml:id="P6E1">In <ENAMEX type="PERIOD">1940</ENAMEX>, following the invasion by the <ENAMEX type="LOCATION">Soviet Union</ENAMEX>, <ENAMEX type="INSTITUTION">the Archives</ENAMEX> administration was transferred to <ENAMEX type="LOCATION">Vilnius</ENAMEX>.</sentence>
<sentence xml:id="P6E2">From an organizational point of view, <ENAMEX type="INSTITUTION">the Archives</ENAMEX> were now under the supervision of the <ENAMEX type="INSTITUTION">NKVD</ENAMEX>.</sentence>
<sentence xml:id="P6E3">The confidential documents were transferred to the <ENAMEX type="LOCATION">Soviet Union</ENAMEX> and afterwards were only partially returned.</sentence>
<sentence xml:id="P6E4">Even after <ENAMEX type="EVENT">World War II</ENAMEX>, <ENAMEX type="INSTITUTION">the Archives</ENAMEX> remained under the supervision of the <ENAMEX type="INSTITUTION">NKVD</ENAMEX>; in <ENAMEX type="PERIOD">1960</ENAMEX>, <ENAMEX type="INSTITUTION">the Archives</ENAMEX> was transferred to the supervision of the <ENAMEX type="INSTITUTION">National Council of Soviet Union Archives</ENAMEX>.</sentence>
<sentence xml:id="P6E4">Even <ENAMEX type="PERIOD">after World War II</ENAMEX>, <ENAMEX type="INSTITUTION">the Archives</ENAMEX> remained under the supervision of the <ENAMEX type="INSTITUTION">NKVD</ENAMEX>; in <ENAMEX type="PERIOD">1960</ENAMEX>, <ENAMEX type="INSTITUTION">the Archives</ENAMEX> was transferred to the supervision of the <ENAMEX type="INSTITUTION">National Council of Soviet Union Archives</ENAMEX>.</sentence>
</p>
<p xml:lang="en" xml:id="P8">
<sentence xml:id="P8E0">The current period began with the enactment of <ENAMEX type="LEGAL">the Archives Law</ENAMEX>.</sentence>
@@ -37,7 +37,7 @@
<sentence xml:id="P14E0"><ENAMEX type="INSTITUTION">the Archives</ENAMEX> preserves records of state, local government, enterprises, religious communities, popular organizations, other non-state institutions and individuals, dating <ENAMEX type="PERIOD">from 1918 until 1990</ENAMEX>.</sentence>
<sentence xml:id="P14E1"><ENAMEX type="INSTITUTION">The division of Sound and Image</ENAMEX> is the main repository of audiovisual heritage in <ENAMEX type="LOCATION">Lithuania</ENAMEX>.</sentence>
<sentence xml:id="P14E2">It preserves moving pictures <ENAMEX type="PERIOD">since 1919</ENAMEX>, photo negatives and positives <ENAMEX type="PERIOD">since 1850&apos;s</ENAMEX>, sound recordings <ENAMEX type="PERIOD">since 1950</ENAMEX>&apos;s, videotapes <ENAMEX type="PERIOD">since 1988 until the present day</ENAMEX>.</sentence>
<sentence xml:id="P14E3"><ENAMEX type="INSTITUTION">Archives</ENAMEX> holdings comprise of approximately <ENAMEX type="MEASURE">31 000</ENAMEX> linear meters of records.</sentence>
<sentence xml:id="P14E3"><ENAMEX type="INSTITUTION">Archives</ENAMEX> holdings comprise of approximately <ENAMEX type="MEASURE">31 000 linear meters</ENAMEX> of records.</sentence>
<sentence xml:id="P14E4">&quot;<ENAMEX type="INSTITUTION">the Archive</ENAMEX> contains <ENAMEX type="MEASURE">177</ENAMEX> collections from the <ENAMEX type="EVENT">Nazi occupation</ENAMEX> period (a total of <ENAMEX type="MEASURE">72 189</ENAMEX> items i.e. files).</sentence>
<sentence xml:id="P14E5">They are collections of the documents of the civil government, offices, military offices and police stations and their branches, industrial, transport and other companies and offices of the <ENAMEX type="EVENT">Nazi period</ENAMEX>.&quot;</sentence>
</p>
@@ -56,9 +56,9 @@
<sentence xml:id="P20E1">Accumulation of video recordings started in <ENAMEX type="PERIOD">1988</ENAMEX>.</sentence>
<sentence xml:id="P20E2">They were mostly video clips filmed by cameramen of <ENAMEX type="INSTITUTION">the Archives</ENAMEX>.</sentence>
<sentence xml:id="P20E3">They testify to the activities of the <ENAMEX type="ORGANISATION">National Liberation Movement Sąjūdis</ENAMEX>, the restoration of <ENAMEX type="EVENT">independence of Lithuania</ENAMEX>, the events of <ENAMEX type="PERIOD">January 1991</ENAMEX>.</sentence>
<sentence xml:id="P20E4">There are also video recordings of reminiscences of the former deportees, political prisoners, partisans submitted to <ENAMEX type="INSTITUTION">the Archives</ENAMEX> by the <ENAMEX type="ORGANISATION">Lithuanian National Foundation, Inc</ENAMEX>; Approximately <ENAMEX type="MEASURE">9,000</ENAMEX> cinematographic documents (<ENAMEX type="PERIOD">1910s– to date</ENAMEX>).</sentence>
<sentence xml:id="P20E4">There are also video recordings of reminiscences of the former deportees, political prisoners, <ENAMEX type="PERSON_TYPE">partisans</ENAMEX> submitted to <ENAMEX type="INSTITUTION">the Archives</ENAMEX> by the <ENAMEX type="ORGANISATION">Lithuanian National Foundation, Inc</ENAMEX>; Approximately <ENAMEX type="MEASURE">9,000</ENAMEX> cinematographic documents (<ENAMEX type="PERIOD">1910s– to date</ENAMEX>).</sentence>
<sentence xml:id="P20E5">Copies of the films made by the <ENAMEX type="PERSON">Lumieres</ENAMEX> in <ENAMEX type="PERIOD">1895</ENAMEX> – the earliest films ever made – are preserved in <ENAMEX type="INSTITUTION">the Archives</ENAMEX>.</sentence>
<sentence xml:id="P20E6">There is also a collection of the <ENAMEX type="NATIONAL">Lithuanian</ENAMEX> newsreels <ENAMEX type="PERIOD">from 1918–1940</ENAMEX> and the works of the <ENAMEX type="MEASURE">first</ENAMEX> <ENAMEX type="NATIONAL">Lithuanian</ENAMEX> cameramen – <ENAMEX type="PERSON">K. Lukšys, J.Milius, brothers Motūza-Beleckas, S. Vainalavičius, S. Uzdonas and J. Miežlaiškis</ENAMEX>; cinematographic material relating to the <ENAMEX type="EVENT">World War II</ENAMEX> period; documentaries made in the <ENAMEX type="NATIONAL">Lithuanian</ENAMEX> Film Studios and <ENAMEX type="NATIONAL">Lithuanian</ENAMEX> Television in <ENAMEX type="PERIOD">1946–1990</ENAMEX>; the <ENAMEX type="MEASURE">first</ENAMEX> <ENAMEX type="NATIONAL">Lithuanian</ENAMEX> feature films &quot;<ENAMEX type="CREATION">Blue Horizon&quot; (Žydrasis horizontas),&quot;The Bridge&quot; (Tiltas), &quot;Ignotas Returned Home&quot; (Ignotas grįžo namo), &quot;The Turkeys&quot; (Kalakutai)</ENAMEX>, and others.</sentence>
<sentence xml:id="P20E6">There is also a collection of the <ENAMEX type="NATIONAL">Lithuanian</ENAMEX> newsreels <ENAMEX type="PERIOD">from 1918–1940</ENAMEX> and the works of the <ENAMEX type="MEASURE">first</ENAMEX> <ENAMEX type="NATIONAL">Lithuanian</ENAMEX> cameramen – <ENAMEX type="PERSON">K. Lukšys, J.Milius, brothers Motūza-Beleckas, S. Vainalavičius, S. Uzdonas and J. Miežlaiškis</ENAMEX>; cinematographic material relating to the <ENAMEX type="EVENT">World War II</ENAMEX> period; documentaries made in the <ENAMEX type="BUSINESS">Lithuanian Film Studios</ENAMEX> and <ENAMEX type="NATIONAL">Lithuanian</ENAMEX> Television in <ENAMEX type="PERIOD">1946–1990</ENAMEX>; the <ENAMEX type="MEASURE">first</ENAMEX> <ENAMEX type="NATIONAL">Lithuanian</ENAMEX> feature films &quot;<ENAMEX type="CREATION">Blue Horizon&quot; (Žydrasis horizontas),&quot;The Bridge&quot; (Tiltas), &quot;Ignotas Returned Home&quot; (Ignotas grįžo namo), &quot;The Turkeys&quot; (Kalakutai)</ENAMEX>, and others.</sentence>
</p>
</document>
</subcorpus>
@@ -3,7 +3,7 @@
<subcorpus>
<document name="EHRI_unit_il-002798-o_32-61.en">
<p xml:lang="en" xml:id="P0">
<sentence xml:id="P0E0">Documentation of the murder of the <ENAMEX type="PERSON_TYPE">Jews of Crimea</ENAMEX>, <ENAMEX type="PERIOD">1941-1942</ENAMEX> Excerpts of documents of the <ENAMEX type="INSTITUTION">Soviet Extraordinary State Commission</ENAMEX> (<ENAMEX type="INSTITUTION">ChGK</ENAMEX>) in the <ENAMEX type="NATIONAL">Crimean</ENAMEX> occupied territory, including information regarding the murder of <ENAMEX type="PERSON_TYPE">Jews</ENAMEX> in <ENAMEX type="LOCATION">Yevpatoria, Zuya, Kheyrus and Biyuk-Onlar</ENAMEX>, names of those who perished, <ENAMEX type="PERIOD">January-March 1942</ENAMEX>; notice from the <ENAMEX type="TITLE">mayor of Simferopol</ENAMEX> regarding housing for <ENAMEX type="PERSON_TYPE">Jews</ENAMEX>, <ENAMEX type="PERIOD">15 March 1942</ENAMEX>; Ausweis (identity card) of <ENAMEX type="PERSON">Foma Paskitovski</ENAMEX>, from <ENAMEX type="LOCATION">Kerch</ENAMEX>; copy of a photograph of a cell in the <ENAMEX type="INSTALLATION">Kerch jail</ENAMEX>, <ENAMEX type="PERIOD">December 1941</ENAMEX>.</sentence>
<sentence xml:id="P0E0">Documentation of the murder of the <ENAMEX type="PERSON_TYPE">Jews of Crimea</ENAMEX>, <ENAMEX type="PERIOD">1941-1942</ENAMEX> Excerpts of documents of the <ENAMEX type="INSTITUTION">Soviet Extraordinary State Commission (ChGK) in the Crimean occupied territory</ENAMEX>, including information regarding the murder of <ENAMEX type="PERSON_TYPE">Jews</ENAMEX> in <ENAMEX type="LOCATION">Yevpatoria, Zuya, Kheyrus and Biyuk-Onlar</ENAMEX>, names of those who perished, <ENAMEX type="PERIOD">January-March 1942</ENAMEX>; notice from the <ENAMEX type="TITLE">mayor of Simferopol</ENAMEX> regarding housing for <ENAMEX type="PERSON_TYPE">Jews</ENAMEX>, <ENAMEX type="PERIOD">15 March 1942</ENAMEX>; Ausweis (identity card) of <ENAMEX type="PERSON">Foma Paskitovski</ENAMEX>, from <ENAMEX type="LOCATION">Kerch</ENAMEX>; copy of a photograph of a cell in the <ENAMEX type="INSTALLATION">Kerch jail</ENAMEX>, <ENAMEX type="PERIOD">December 1941</ENAMEX>.</sentence>
</p>
</document>
</subcorpus>
@@ -3,7 +3,7 @@
<subcorpus>
<document name="EHRI_unit_il-002798-p_37-278.en">
<p xml:lang="en" xml:id="P0">
<sentence xml:id="P0E0">Articles from <ENAMEX type="PERSON_TYPE">European Zionist</ENAMEX> newspapers, <ENAMEX type="PERIOD">1901-1931</ENAMEX> Section of the <ENAMEX type="MEDIA">L&apos;echo Sioniste</ENAMEX> newspaper with reports on <ENAMEX type="PERSON_TYPE">Jewish</ENAMEX> problems in <ENAMEX type="LOCATION">Bulgaria</ENAMEX>, <ENAMEX type="PERIOD">May 1901</ENAMEX>; Page from the <ENAMEX type="MEDIA">Die Arbeit</ENAMEX> newspaper of <ENAMEX type="ORGANISATION">Hapoel Hatzair</ENAMEX>, <ENAMEX type="PERIOD">15 February 1920</ENAMEX>; Issue of <ENAMEX type="MEDIA">Haolam</ENAMEX> newspaper with excerpts on <ENAMEX type="PERSON_TYPE">Zionist</ENAMEX> events in <ENAMEX type="LOCATION">Europe</ENAMEX> at the time.</sentence>
<sentence xml:id="P0E0">Articles from <ENAMEX type="CONCEPTUAL">European Zionist</ENAMEX> newspapers, <ENAMEX type="PERIOD">1901-1931</ENAMEX> Section of the <ENAMEX type="MEDIA">L&apos;echo Sioniste</ENAMEX> newspaper with reports on <ENAMEX type="PERSON_TYPE">Jewish</ENAMEX> problems in <ENAMEX type="LOCATION">Bulgaria</ENAMEX>, <ENAMEX type="PERIOD">May 1901</ENAMEX>; Page from the <ENAMEX type="MEDIA">Die Arbeit</ENAMEX> newspaper of <ENAMEX type="ORGANISATION">Hapoel Hatzair</ENAMEX>, <ENAMEX type="PERIOD">15 February 1920</ENAMEX>; Issue of <ENAMEX type="MEDIA">Haolam</ENAMEX> newspaper with excerpts on <ENAMEX type="CONCEPTUAL">Zionist</ENAMEX> events in <ENAMEX type="LOCATION">Europe</ENAMEX> at the time.</sentence>
</p>
</document>
</subcorpus>