# German wordlist for Quordle

The objective is to generate two German wordlists for [Quordle](https://www.quordle.com/): a full list of valid words and a shorter list of possible answers.

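I'm using the [DeReKo](https://www.ids-mannheim.de/digspra/kl/projekte/methoden/derewo) frequency list from IDS Mannheim, specifically the file with the 100,000 most frequent word forms (`DeReKo-2014-II-MainArchive-STT.100000.freq`). Each row holds a word form, its base form, a part-of-speech tag (STTS tagset) and its corpus frequency.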

```python
import pandas as pd
```

```python
# generate translation table for umlauts
umlauts = str.maketrans({
    'ä': 'ae',
    'ö': 'oe',
    'ü': 'ue',
    'ß': 'ss'
})
```
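A quick check of the table on a few sample words (chosen for illustration only). Transliteration changes word length, and the five-letter filter further down is applied after it: *müde* becomes the five-letter `muede`, while the five-letter *Übung* becomes `uebung` and drops out.

```python
# Same table as above: spell out each umlaut and ß with two ASCII letters
umlauts = str.maketrans({'ä': 'ae', 'ö': 'oe', 'ü': 'ue', 'ß': 'ss'})

# Lower-case first: the table only covers the lower-case characters
for word in ['Größe', 'müde', 'Übung']:
    print(word, '->', word.lower().translate(umlauts))
# Größe -> groesse
# müde -> muede
# Übung -> uebung
```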

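```python
# Tab-separated file without a header row; quoting=3 (csv.QUOTE_NONE)
# keeps '"' tokens as plain text instead of treating them as quote characters
dereko = pd.read_table('/Users/felixpuetsch/Downloads/DeReKo-2014-II-MainArchive-STT.100000.freq', quoting=3, names=['word','base','gram','freq'])
dereko.shape
```

```
(100000, 4)
```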

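```python
dereko.head(10)
```

|   | word | base | gram | freq |
|---|------|------|------|------|
| 0 | `,` | `,` | `$,` | 500367700.0 |
| 1 | `.` | `.` | `$.` | 481370200.0 |
| 2 | der | die | ART | 241408400.0 |
| 3 | die | die | ART | 188943900.0 |
| 4 | und | und | KON | 186351600.0 |
| 5 | `"` | `"` | `$(` | 156259600.0 |
| 6 | in | in | APPR | 140040900.0 |
| 7 | den | die | ART | 90221590.0 |
| 8 | `)` | `)` | `$(` | 87781680.0 |
| 9 | `(` | `(` | `$(` | 86235020.0 |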
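The `"` rows above are why the file is read with `quoting=3`. The sketch below reads three of those rows from an in-memory string (`io.StringIO` stands in for the real file) using `csv.QUOTE_NONE`, the named constant whose value is 3. With pandas' default quoting, a field consisting of a lone `"` would instead be taken as the opening of a quoted field.

```python
import csv
import io

import pandas as pd

# Three rows in the DeReKo layout (word, base, gram, freq), copied from head() above
sample = '"\t"\t$(\t156259600.0\nin\tin\tAPPR\t140040900.0\n)\t)\t$(\t87781680.0\n'

# csv.QUOTE_NONE == 3: '"' is read as an ordinary character
df = pd.read_table(io.StringIO(sample), quoting=csv.QUOTE_NONE,
                   names=['word', 'base', 'gram', 'freq'])
print(df)
```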


## Full list

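Generate the full list from all word forms, including inflected ones. Proper nouns (tag `NE`) are dropped, ä/ö/ü/ß are spelled out as ae/oe/ue/ss, frequencies of forms that coincide after lower-casing are summed, and only words of exactly five letters a-z are kept.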

```python
# Drop proper nouns (STTS tag NE); keep the word form and its frequency
dereko_full = dereko.loc[~dereko.gram.isin(['NE']), ['word','freq']].copy()
# Lower-case and spell out ä, ö, ü, ß as ae, oe, ue, ss
dereko_full.word = dereko_full.word.str.lower().str.translate(umlauts)
# Sum frequencies of forms that now coincide; keep five-letter a-z words
dereko_full = dereko_full.groupby('word').freq.sum().filter(regex='^[a-z]{5}$').sort_values(ascending=False)
print(dereko_full.shape)
# Alphabetical Series of the remaining words
quordle_full = dereko_full.sort_index().reset_index().word
quordle_full.head(15)
```

```
(2483,)

0     abbau
1     abend
2     abgab
3     abhob
4     abkam
5     about
6     abruf
7     abtei
8     abtes
9     abtun
10    abzog
11    abzug
12    achse
13    achte
14    acker
Name: word, dtype: object
```
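The same steps on a handful of invented rows (toy data, not DeReKo counts) make two details visible: forms that collapse to one lower-case spelling, such as the noun *Laden* and the verb *laden*, have their frequencies summed, and `Series.filter(regex=...)` matches against the index, i.e. the words, not the frequencies.

```python
import pandas as pd

umlauts = str.maketrans({'ä': 'ae', 'ö': 'oe', 'ü': 'ue', 'ß': 'ss'})

# Hypothetical toy rows with invented frequencies
toy = pd.DataFrame({
    'word': ['Laden', 'laden', 'müde', 'Übung', 'Mainz', 'Abend'],
    'gram': ['NN', 'VVINF', 'ADJD', 'NN', 'NE', 'NN'],
    'freq': [10.0, 20.0, 5.0, 7.0, 3.0, 8.0],
})

full = toy.loc[~toy.gram.isin(['NE']), ['word', 'freq']].copy()  # drops 'Mainz'
full['word'] = full.word.str.lower().str.translate(umlauts)
# 'Laden' and 'laden' now share the key 'laden', so 10.0 + 20.0 are summed
full = full.groupby('word').freq.sum()
# The regex is tested against the index: 'uebung' (6 letters) is removed
full = full.filter(regex='^[a-z]{5}$')
print(full.to_dict())  # {'abend': 8.0, 'laden': 30.0, 'muede': 5.0}
```

In the notebook cell, the `sort_values(ascending=False)` call does not affect the saved file, because `sort_index()` re-sorts alphabetically afterwards; the frequency order would only matter if the list were cut down to the most frequent words first.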

```python
# One word per line, without index or header
quordle_full.to_csv('quordle-full.csv', index=False, encoding='utf-8', header=False)
```

## Answers

Generate the answer list from the base forms (lemmas) only, so that answers are uninflected words. The filtering is the same as for the full list.

```python
# Same steps as for the full list, but on the base form instead of the word form
dereko_answer = dereko.loc[~dereko.gram.isin(['NE']), ['base','freq']].copy()
dereko_answer.base = dereko_answer.base.str.lower().str.translate(umlauts)
dereko_answer = dereko_answer.groupby('base').freq.sum().filter(regex='^[a-z]{5}$').sort_values(ascending=False)
print(dereko_answer.shape)
quordle_answer = dereko_answer.sort_index().reset_index().base
quordle_answer.head(15)
```

```
(1546,)

0     abbau
1     abend
2     abgas
3     abruf
4     abtei
5     abtun
6     abweg
7     abzug
8     achse
9     acker
10    acryl
11    adeln
12    adieu
13    adler
14    adlig
Name: base, dtype: object
```
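The full-list and answer cells differ only in the column they read (`word` vs. `base`). A hypothetical helper, `five_letter_list`, could factor out the shared steps. It is a sketch that reproduces the cells' logic, shown here on invented toy rows where the inflected form *lädt* ends up in the full list but only its base form *laden* in the answers.

```python
import pandas as pd

# Same transliteration table as in the notebook
UMLAUTS = str.maketrans({'ä': 'ae', 'ö': 'oe', 'ü': 'ue', 'ß': 'ss'})


def five_letter_list(df: pd.DataFrame, column: str) -> pd.Series:
    """Alphabetical five-letter list built from `column` ('word' or 'base').

    Hypothetical helper mirroring the two notebook cells: drop proper nouns
    (NE), lower-case, transliterate, sum duplicate spellings, keep ^[a-z]{5}$.
    """
    sub = df.loc[~df.gram.isin(['NE']), [column, 'freq']].copy()
    sub[column] = sub[column].str.lower().str.translate(UMLAUTS)
    counts = sub.groupby(column).freq.sum().filter(regex='^[a-z]{5}$')
    return counts.sort_index().reset_index()[column]


# Invented toy rows: 'lädt' is an inflected form of the base form 'laden'
toy = pd.DataFrame({
    'word': ['Abend', 'Abende', 'lädt'],
    'base': ['Abend', 'Abend', 'laden'],
    'gram': ['NN', 'NN', 'VVFIN'],
    'freq': [3.0, 2.0, 4.0],
})
print(list(five_letter_list(toy, 'word')))  # ['abend', 'laedt']
print(list(five_letter_list(toy, 'base')))  # ['abend', 'laden']
```

On the real data, the two lists would then be `five_letter_list(dereko, 'word')` and `five_letter_list(dereko, 'base')`.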

```python
# One word per line, without index or header
quordle_answer.to_csv('quordle-answer.csv', index=False, encoding='utf-8', header=False)
```
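Comparing the two `head(15)` outputs shows that the answer list is not a subset of the full list: `abgas` and `abweg` are answers, but the full list goes straight from `abgab` to `abhob` and from `abtun` to `abzog`. The full list only contains word forms present in the 100,000-entry file, and a base form need not be one of them. If the game only accepts guesses from the full list, such answers could never be entered, so merging the lists may be worthwhile. A sketch with a hypothetical helper `merge_lists`, checked against the excerpts printed above (every answer up to `acker` can be compared, since the full-list excerpt covers everything up to `acker`):

```python
def merge_lists(full_words, answer_words):
    """Return (answers missing from the full list, merged sorted list)."""
    full_set = set(full_words)
    missing = sorted(set(answer_words) - full_set)
    merged = sorted(full_set | set(answer_words))
    return missing, merged


# Excerpts copied from the head(15) outputs above
full_head = ['abbau', 'abend', 'abgab', 'abhob', 'abkam', 'about', 'abruf',
             'abtei', 'abtes', 'abtun', 'abzog', 'abzug', 'achse', 'achte', 'acker']
answer_head = ['abbau', 'abend', 'abgas', 'abruf', 'abtei', 'abtun', 'abweg',
               'abzug', 'achse', 'acker']

missing, merged = merge_lists(full_head, answer_head)
print(missing)  # ['abgas', 'abweg']
```

On the complete data this would be `merge_lists(quordle_full, quordle_answer)`.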