Commit

doc: adding CONCEPT & MEASURE examples - issue #54
Emilia Verzeni committed Aug 31, 2017
1 parent dbd1d55 commit 59781a3
Showing 1 changed file with 7 additions and 3 deletions.
10 changes: 7 additions & 3 deletions grobid-ner/doc/class-and-senses.md
@@ -1,4 +1,4 @@

GROBID NER identifies named-entities and classifies them in 27 classes, as compared to the 4-classes or 7-classes model of most of the existing NER open source tools (usually using the Reuters/CoNLL 2003 annotated corpus, or the MUC annotated corpus).

In addition the entities are often enriched with WordNet sense annotations to help further disambiguation and resolution of the entity. GROBID NER has been developed for the purpose of disambiguating and resolving entities against knowledge bases such as Wikipedia and FreeBase. Sense information can help to disambiguate the entity, because they refine the entity class based on contextual clues.
@@ -18,7 +18,7 @@ The following table describes the 27 named entity classes produced by the model.
| <a style="text-decorations:none; color:#265C83" href=#artifact> ARTIFACT | human-made object, including softwares | _FIAT 634_, _Microsoft Word_ |
| AWARD | award for art, science, sport, etc. | _Ballon d'or_, _Nobel prize_|
| BUSINESS | company / commercial organisation | _Air Canada_, _Microsoft_ |
-| <a style="text-decorations:none; color:#265C83" href=#concept> CONCEPT | abstract concept not included in another class | _English_ <span style="color:#848484">(as language)</span>, _Communism_, _Zionism_ |
+| <a style="text-decorations:none; color:#265C83" href=#concept> CONCEPT | abstract concept not included in another class | _English_ <span style="color:#848484">(as language)</span>, _Communism_, _Zionism_, _FTSE 100_, _CAC40_|
| <a style="text-decorations:none; color:#265C83" href=#conceptual> CONCEPTUAL | entity relating to a concept | _Greek_ <span style="color:#848484">myths</span>, _eurosceptic_ <span style="color:#848484">doctrine</span> |
| <a style="text-decorations:none; color:#265C83" href=#creation> CREATION | artistic creation, such as song, movie, book, TV show, etc. | _Monna Lisa_, _Mullaholland drive_, _Kitchen Nightmares_, _EU Referendum: The Great Debate_, _Europe: The Final Debate_ |
| <a style="text-decorations:none; color:#265C83" href=#event> EVENT | event | _World War 2_, _Battle of France_, _Brexit referendum_|
@@ -27,7 +27,7 @@ The following table describes the 27 named entity classes produced by the model.
| <a style="text-decorations:none; color:#265C83" href=#institution> INSTITUTION | organization of people and a location or structure that share the same name | _Yale University_, _European Patent Office_, <span style="color:#848484">the</span> _British government_, _European Union_, _City Police_, _Eurozone_ |
| <a style="text-decorations:none; color:#265C83" href=#legal> LEGAL | legal mentions such as article of law, convention, cases, treaty., etc. | _European Patent Convention_, &nbsp; &nbsp; _Maastricht Treaty_, &nbsp; &nbsp; _Article 52(2)(c) and (3)_, &nbsp; &nbsp; _Roe v. Wade 410 U.S.113 (1973)_, &nbsp; &nbsp; _European Union Referendum Act 2015_ |
| <a style="text-decorations:none; color:#265C83" href=#location> LOCATION | physical location, including planets and galaxies. | _Los Angeles_, _Northern Madagascar_, _Southern Thailand_, _Channel Islands_, _Earth_, _Milky Way_, _West Mountain_, _Warsaw Ghetto_ |
-| <a style="text-decorations:none; color:#265C83" href=#measure> MEASURE | numerical amount, including an optional unit of measure | _1 500_, &nbsp; _six million_, &nbsp; _72%_, &nbsp; _50°2′9″N 19°10′42″E_ |
+| <a style="text-decorations:none; color:#265C83" href=#measure> MEASURE | numerical amount, including an optional unit of measure | _1 500_, &nbsp; _six million_, &nbsp; _72%_, &nbsp; _50°2′9″N 19°10′42″E_, _AA+_|
| MEDIA | media organization or publication | _Le monde_, _The New York Times_ |
| <a style="text-decorations:none; color:#265C83" href=#national> NATIONAL | relating to a location | _North American_, _German_, _British_ |
| <a style="text-decorations:none; color:#265C83" href=#organisation> ORGANISATION | organized group of people, with some sort of legal entity and concrete membership | _Alcoholics Anonymous_, _Jewish resistance_, _Polish undergound_ |
@@ -73,6 +73,8 @@ Human-made object, including softwares.
### CONCEPT
➡ Sometimes an entity, in isolation, can be ambiguous, for example _**British**_. When it refers to the British English language, it's annotated CONCEPT. (issues [#29](https://github.com/kermitt2/grobid-ner/issues/29) and [#30](https://github.com/kermitt2/grobid-ner/issues/30)).

+➡ Economical indexes and bonds are CONCEPT, for example **_US Dow Jones Industrial Average_, _CAC40_, _CETES_** [(issue #54)](https://github.com/kermitt2/grobid-ner/issues/19).
+
---
### CONCEPTUAL

@@ -223,6 +225,8 @@ For example:
➡ GPS coordinates are a MEASURE (numerical amounts + units), example `50°2′9″N 19°10′42″E`. [(issue #44)](https://github.com/kermitt2/grobid-ner/issues/44)

+➡ Credit ratings like **_AA1, AA+_** are MEASURE [(issue #54)](https://github.com/kermitt2/grobid-ner/issues/54).
+
---
### MEDIA
