No description, website, or topics provided.
Switch branches/tags
Nothing to show
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Failed to load latest commit information.
datasets
README.md
annotation_guidelines.html

README.md

Following datasets are found in the datasets folder.


Compositionality ratings for Estonian particle verbs (folder compositionality_ET_PV)

Compositionality ratings for Estonian particle verbs collected via Qualtrics. We collected judgements for 211 PVs asking 110 judges to evaluate how much the meaning of a PV agrees with the meanings of its components. We collected compositionality ratings via Qualtrics, only Estonian native speakers were asked to participate. Each annotator evaluated 21 PVs on a scale from 1 (not at all) to 5 (fully), with the 6th option I don't know included. Each PV was evaluated at least 10, maximum 20 times.

54 PVs got at least once rating "I don't know".

File 211_comp_unit_ratings.csv - ratings for 211 PVs (column 1: particle verb (adverb[space]verb), column 2-21: ratings);

File 157_comp_unit_ratings.csv - ratings for PVs that didn't get any "I don't know" rating (column 1: particle verb (adverb[space]verb), column 2-21: ratings);

File 157_comp_unit_ratings_SD.csv - ratings and standard deviation for PVs that didn't get any "I don't know" rating (column 1: particle verb (adverb[space]verb), column 2-21: ratings, column 22: value of standard deviation);

File 157_comp_unit_ratings_avg_SD.csv - ratings, average compositionality rate, number of rates, and standard deviation for PVs that didn't get any "I don't know" rating (column 1: particle verb (adverb[space]verb), column 2-21: ratings, column 22: average rating, column 23: number of ratings, column 24: value of standard deviation);


Abstractness/concreteness ratings for Estonian words (folder abstractness_ET)

File abstractness_ET.csv - abstractness/concreteness ratings for Estonian lemmas (column 1: WORD (indicates Estonian lemma), column 2: Rating expresses lemma's concreteness/abstractness score (higher score indicates higher concreteness), lemmas are sorted according to the score in descending order (concrete to abstract))


Literal/non-literal usage of Estonian particle verbs (folder literalness_ET_PV)

File literalness_ET_PV.csv - dataset of literal/non-literal usage of Estonian particle verbs, containing 1490 sentences

Format: id;class;avg;sentence;particle;verb;all;objcase;objanimacy;objabs;subjcase;subjabs;nounsabs;subjanimacy;casegovernment

Explanation:

  • id - number of the sentence (value: 1-1838)
  • class - usage (according to the average score by human annotators) (value: literal or non-literal)
  • avg - average literalness score by human annotators(value: numerical)
  • sentence - lemmatized sentence
  • particle - particle of the PV
  • verb - verbal component of the PV
  • all - abstractness score of all nouns in the sentence (value: numerical)
  • objcase - case of the object of the PV in the sentence (value: nom (nominative) OR gen (genitive) OR part (partitive) OR noobj (no object in the sentence))
  • objanimacy - animacy of the object (value: yes - object is alive OR no - object is not alive OR 0 - no object)
  • objabs - abstrcatness score of the object of the PV (value: numerical)
  • subjcase - case of the subject of the PV (value: nom (nominative) OR gen (genitive) OR part (partitive) OR nosubj (no subject in the sentence))
  • subjabs - abstractness score of the subject (value: numerical)
  • nounsabs - average abstractness score of all nouns in the sentence (value: numerical)
  • subjanimacy - animacy of the subject (value: yes - subject is alive OR no - subject is not alive OR 0 - no subject)
  • casegovernment - case of the argument of the verb (value: nom (nominative) OR gen (genitive) OR part (partitive) OR el (elative) OR all (allative) OR ill (illative) OR tr (translative) OR adt (short illative) OR kom (comitative) OR in (inessive) OR abl (ablative) OR ad (adessive) OR es (essive) OR 0 (no government))

File 1481_literalness_ET_PV.csv - dataset of literal/non-literal usage of Estonian particle verbs, contains 1481 sentences

Format: id;class;avg;sentence;particle;verb;unigrams;all;nounsabs;subjabs;objabs;subjcase;objcase;objanimacy;subjanimacy;casegovernment;advfreq;vfreq;pvfreq

Explanation:

  • id - number of the sentence (value: 1-1838)
  • class - usage (according to the average score by human annotators) (value: literal or non-literal)
  • avg - average literalness score by human annotators(value: numerical)
  • sentence - lemmatized sentence
  • particle - particle of the PV
  • verb - verbal component of the PV
  • unigrams - binary classification of sentences based on the unigrams feature (taking account words that appear at least 6 times)
  • all - abstractness score of all nouns in the sentence (value: numerical)
  • nounsabs - average abstractness score of all nouns in the sentence (value: numerical)
  • subjabs - abstractness score of the subject (value: numerical)
  • objabs - abstrcatness score of the object of the PV (value: numerical)
  • subjcase - case of the subject of the PV (value: nom (nominative) OR gen (genitive) OR part (partitive) OR nosubj (no subject in the sentence))
  • objcase - case of the object of the PV in the sentence (value: nom (nominative) OR gen (genitive) OR part (partitive) OR noobj (no object in the sentence))
  • subjanimacy - animacy of the subject (value: yes - subject is alive OR no - subject is not alive OR 0 - no subject)
  • objanimacy - animacy of the object (value: yes - object is alive OR no - object is not alive OR 0 - no object)
  • casegovernment - case of the argument of the verb (value: nom (nominative) OR gen (genitive) OR part (partitive) OR el (elative) OR all (allative) OR ill (illative) OR tr (translative) OR adt (short illative) OR kom (comitative) OR in (inessive) OR abl (ablative) OR ad (adessive) OR es (essive) OR 0 (no government))
  • advfreq - frequency of the adverb (particle) in the corpus
  • vfreq - frequency of the verb in the corpus
  • pvfreq - cooccurrence frequency of the adverb and verb in the corpus

References:

Aedmaa, Eleri. 2017. Exploring Compositionality of Estonian Particle Verbs. In: Lohiniva, Karoliina; Wahle, Johannes (Ed.). Proceedings of the ESSLLI 2017 Student Session, 197−208. 29th European Summer School in Logic, Language & Information, Toulouse, France, July 17-28, 2017. Toulouse, France.

Aedmaa, Eleri; Köper, Maximilian; Schulte im Walde, Sabine. 2018. Combining Abstractness and Language-specific Theoretical Indicators for Detecting Non-Literal Usage of Estonian Particle Verbs. Proceedings of NAACL-HLT 2018: Student Research Workshop: The 16th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, New Orleans, June 2018. Ed. Silvio Ricardo Cordeiro, Shereen Oraby, Umashanthi Pavalanathan, Kyeongmin Rim. New Orleans: Association for Computational Linguistics, 9−16.