mejutoco/german-grammar-statistics

german-grammar-statistics

The 500,000 most frequent words in German, each tagged with its grammatical gender. The word frequencies were taken from the Wikipedia word-frequency sets for different languages. I believe the German one is built on this list (https://invokeit.wordpress.com/frequency-word-lists/). The gender was extracted by querying several dictionaries for each word.

Format

The CSV has two columns, word and gender. The gender is one of:

  • f: Feminine
  • m: Masculine
  • n: Neuter
  • none: prepositions, conjunctions, etc. do not have a gender
ihm none
koennen none
ich n
sie none
das none
ist n
du none
nicht none
die none
und none
es n
... ...
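
As a rough illustration of the format described above, the following Python sketch reads word/gender pairs and counts how often each gender code appears. It is only a sketch under stated assumptions: the function name `read_genders` is made up for this example, the comma delimiter is an assumption (the sample above is shown without visible separators), and the in-memory sample reuses a few rows from the listing above rather than the real file.

```python
import csv
import io
from collections import Counter

# Gender codes listed in the Format section.
VALID_GENDERS = {"f", "m", "n", "none"}


def read_genders(lines, delimiter=","):
    """Yield (word, gender) pairs, skipping rows that do not fit the format.

    `delimiter` is an assumption; change it if the file uses another separator.
    """
    for row in csv.reader(lines, delimiter=delimiter):
        if len(row) != 2:
            continue  # not a two-column row
        word, gender = row[0].strip(), row[1].strip()
        if gender in VALID_GENDERS:
            yield word, gender


# A few rows from the sample above, joined with the assumed delimiter.
sample = io.StringIO("ihm,none\nkoennen,none\nich,n\nsie,none\ndas,none\nist,n\n")

pairs = list(read_genders(sample))
counts = Counter(gender for _, gender in pairs)
print(counts)
```

To run this against the real dataset, pass an open file object instead of `sample`, for example `open(path, encoding="utf-8", newline="")`, where `path` points to your local copy of the CSV.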

Usage

This dataset was used for the article about grammar statistics in German. It is free to use under the Creative Commons Attribution 4.0 International license (see License). Enjoy.
