# Preparing Greek Author Names

This notebook builds a list of Greek author names for use in training a model that recognizes Greek and Latin authors.

The names in `input/greek-authors.txt` came from <https://stephanus.tlg.uci.edu/tlgauthors/post_tlg_e.php>.

The names in `input/tlg-greek-authors-cd.txt` came from <https://stephanus.tlg.uci.edu/tlgauthors/cd.authors.php>.

Names liable to be mistaken for Roman names (e.g., "Rufus") were generally omitted, as were the titles of works.
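
The omissions described above could also be applied programmatically. The following is a minimal sketch, assuming a hand-maintained exclusion set; `ROMAN_LOOKALIKES` and `drop_ambiguous` are hypothetical names, and only "Rufus" comes from the text above.

```python
# Hypothetical exclusion set; only "Rufus" is taken from the text above
ROMAN_LOOKALIKES = {"rufus"}

def drop_ambiguous(names, excluded=ROMAN_LOOKALIKES):
    """Return the names that are not in the exclusion set (case-insensitive)."""
    return [name for name in names if name.strip().lower() not in excluded]

# Made-up sample input for illustration
sample = ["Aeschylus", "RUFUS", "Alcman"]
kept = drop_ambiguous(sample)
print(kept)
```

Comparing in lowercase keeps the filter independent of how the source files capitalize names.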

In [1]:
# Use pandas to deal with csv data
import pandas as pd

In [None]:
# Make a list of names from the files
names = []
for path in ('input/greek-authors.txt', 'input/tlg-greek-authors-cd.txt'):
    with open(path, 'r', encoding='UTF-8') as file:
        # Each line holds one name; drop trailing whitespace and the newline
        names.extend(line.rstrip() for line in file)

In [3]:
# Count the names
len(names)

3868

In [8]:
# Use set() to get the count of unique names
unique_names = set(names)
len(unique_names)

2253
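
The drop from 3868 to 2253 means that many entries occur more than once, whether across the two files or within one of them. To see which names repeat, something like the following sketch with `collections.Counter` may help; `sample_names` is a made-up stand-in for the `names` list.

```python
from collections import Counter

# Made-up stand-in for the `names` list built above
sample_names = ["AESCHYLUS", "ALCMAN", "AESCHYLUS", "ALEXIS", "ALCMAN", "AESCHYLUS"]

counts = Counter(sample_names)

# Keep only the names that occur more than once, most frequent first
duplicates = [(name, n) for name, n in counts.most_common() if n > 1]
print(duplicates)
```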

In [12]:
# Strip unwanted characters from both ends of each name and change the names to title case
cleaned = [name.strip('<>()[]').title() for name in unique_names]

In [16]:
# Use set() again to deduplicate the names. Use sorted() to alphabetize them.
unique_cleaned = sorted(set(cleaned))

In [17]:
# Print the names for review
for name in unique_cleaned:
    print(name)

Abramius
Abydenus
Acacius
Acacius Sabaita
Acesander
Achaeus
Achilleis Byzantina
Achilles Tatius
Achmet
Acusilaus
Adamantius
Adamantius Coraes
Adamantius Judaeus
Adespota Papyracea, Sh
Adrianus
Aegimius
Aelianus
Aelius Aristides
Aelius Dionysius
Aelius Dius
Aelius Herodianus
Aelius Herodianus Pseudo-Herodianus
Aelius Promotus
Aelius Theon
Aeneas
Aeschines
Aeschines Socraticus
Aeschrion
Aeschylus
Aeschylus Alexandrinus
Aeschylus Atheniensis
Aesopus Aesopica
Aethiopis
Aethlius
Aetius Amidenus
Agaclytus
Agamestor
Agapetus
Agapius Landus
Agathangelus
Agatharchides
Agathemerus
Agathias Scholasticus
Agathocles
Agathodaemon
Agathon
Agathyllus
Agesilaus
Agl(A)Osthenes
Aglais
Agroetas
Albinus
Alcaeus
Alcibiades
Alcidamas
Alcimenes
Alcimenes Atheniensis
Alcimus
Alciphron
Alcmaeon
Alcmaeonis
Alcman
Alexander
Alexander Aphrodisiensis
Alexander Nicaeensis
Alexander Trallianus
Alexandri Magni Epistulae
Alexandrus Iv Papa
Alexarchus
Alexinus
Alexis
Alexius
Alexius Aristenus
Alexius Episcopus Nicaeensi
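
Entries such as `Agl(A)Osthenes` and `Alexandrus Iv Papa` in the list above are a side effect of `str.title()`, which capitalizes every letter that follows a non-letter and lowercases the rest. A possible alternative is sketched below, assuming that only letters at the start of the string or after a space or hyphen should be capitalized. `title_name` is a hypothetical helper, not part of the notebook, and it still lowercases Roman numerals such as "IV".

```python
import re

def title_name(name):
    """Capitalize letters at the start or after whitespace or a hyphen only."""
    return re.sub(
        r"(^|[\s-])([^\W\d_])",
        lambda m: m.group(1) + m.group(2).upper(),
        name.lower(),
    )

# str.title() also capitalizes the letters after "(" and ")"
print("agl(a)osthenes".title())

# The helper leaves letters inside the word in lowercase
print(title_name("AGL(A)OSTHENES"))
print(title_name("aelius herodianus pseudo-herodianus"))
```

Interior brackets are left in place here, as in the notebook; whether to remove them is a separate decision for the manual review.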

In [4]:
# Read the Greek names in greek-names.txt into a dataframe
df = pd.read_csv('greek-names.txt')

In [5]:
# Add a "Label" column and set the default value to "Greek"
df = df.assign(Label='Greek')

In [6]:
df

Unnamed: 0,Name,Label
0,Abramius,Greek
1,Abydenus,Greek
2,Acacius,Greek
3,Acacius Sabaita,Greek
4,Acesander,Greek
...,...,...
25735,Tucídides ca. 460-ca. 400 a. C,Greek
25736,Tucídides ca. 460-ca. 400 a.,Greek
25737,Xenophon,Greek
25738,Yamblico,Greek
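
The `Unnamed: 0` column in the output above is what `pd.read_csv` produces when the first header field is empty, which typically happens when a file was saved together with its index. One way to keep that column out of the dataframe is sketched below, using an in-memory stand-in for the file; the contents of `csv_text` are invented for illustration.

```python
import io
import pandas as pd

# Invented stand-in for a file that was saved with its index column
csv_text = ",Name\n0,Abramius\n1,Abydenus\n"

# Default read: the unnamed first column shows up as "Unnamed: 0"
with_extra = pd.read_csv(io.StringIO(csv_text))
print(list(with_extra.columns))

# index_col=0 treats the first column as the index instead of as data
names_df = pd.read_csv(io.StringIO(csv_text), index_col=0).assign(Label='Greek')
print(list(names_df.columns))
```

With the index handled this way, saving with `index=False` writes only the `Name` and `Label` columns.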


In [7]:
# Save the dataframe to a CSV file
df.to_csv('greek-authors.csv', index=False)