# Word Usage

Determines how often each lemma occurs in the New Testament, using the MorphGNT data.

## Load MorphGNT and Lexemes DataFrames

Loads the `DF_MORPHGNT` and `DF_LEXEMES` DataFrames from `morphgnt.csv` and `lexemes.csv`.

Saves the total word count as `TOTAL_WORD_COUNT` and the total lexeme count as `TOTAL_LEXEME_COUNT`.

DF_MORPHGNT:
* Index (index)
* Scripture Reference
* Part of Speech Code
* Inflection Codes
* Text
* Word
* Normalized Word
* Lemma
* Book
* Chapter
* Verse
* Part of Speech
* Person
* Tense
* Voice
* Mood
* Case
* Number
* Gender
* Degree

DF_LEXEMES:
* Lemma (index)
* Part of Speech Code
* Full Citation Form
* BDAG Entry
* Danker Entry
* Dodson Entry
* Mounce Entry
* Strongs
* GK
* Dodson Part of Speech Code
* Gloss
* Mounce MorphCat
* Part of Speech

In [1]:
import pandas as pd
from pprint import pprint

DF_MORPHGNT = pd.read_csv("morphgnt.csv", index_col="Index")

DF_LEXEMES = pd.read_csv("lexemes.csv", index_col="Lemma")

TOTAL_WORD_COUNT = len(DF_MORPHGNT)

TOTAL_LEXEME_COUNT = len(DF_LEXEMES)


# print("===== DF_MORPHGNT")
# print(DF_MORPHGNT.__class__.__name__)
# print("-----")
# pprint(vars(DF_MORPHGNT))
# print("-----")
# pprint(DF_MORPHGNT)

# print("===== DF_LEXEMES")
# print(DF_LEXEMES.__class__.__name__)
# print("-----")
# pprint(vars(DF_LEXEMES))
# print("-----")
# pprint(DF_LEXEMES)

# print("===== TOTAL_WORD_COUNT")
# print(TOTAL_WORD_COUNT)

# print("===== TOTAL_LEXEME_COUNT")
# print(TOTAL_LEXEME_COUNT)
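As a minimal sketch of what `index_col` does in the load step, the example below reads a two-row, in-memory stand-in for `lexemes.csv`. The CSV text and the name `df_lexemes_toy` are hypothetical; the real file has many more columns.

```python
import io

import pandas as pd

# Hypothetical two-row stand-in for lexemes.csv (the real file has more columns).
csv_text = 'Lemma,Gloss\nὁ,the\nκαί,"and, even, also, namely"\n'

# index_col="Lemma" turns the Lemma column into the row index,
# so rows can be looked up by lemma with .loc.
df_lexemes_toy = pd.read_csv(io.StringIO(csv_text), index_col="Lemma")

print(df_lexemes_toy.index.name)
print(df_lexemes_toy.loc["ὁ", "Gloss"])
print(len(df_lexemes_toy))
```

Indexing both lexeme and analysis data by lemma is what lets the later merge step align rows without naming a key column.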

## Determine Lemma Word Counts

Creates `S_LEMMA_WORD_COUNTS` containing the word count for each lemma.

In [2]:
S_LEMMA_WORD_COUNTS = DF_MORPHGNT.groupby("Lemma").size()

# print("===== S_LEMMA_WORD_COUNTS")
# print(S_LEMMA_WORD_COUNTS.__class__.__name__)
# print("-----")
# pprint(vars(S_LEMMA_WORD_COUNTS))
# print("-----")
# pprint(S_LEMMA_WORD_COUNTS)
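The sketch below shows what `groupby("Lemma").size()` returns, using a six-row toy frame in place of `DF_MORPHGNT` (the names `df_toy`, `s_counts`, and `s_value_counts` are made up for illustration). It also compares the result with `value_counts()`, which gives the same numbers already sorted by descending count.

```python
import pandas as pd

# Toy stand-in for DF_MORPHGNT; only the "Lemma" column matters here.
df_toy = pd.DataFrame({"Lemma": ["ὁ", "καί", "ὁ", "λέγω", "ὁ", "καί"]})

# Same call as the cell above: one entry per lemma, value = number of rows.
s_counts = df_toy.groupby("Lemma").size()

# value_counts() produces the same counts, sorted from most to least frequent.
s_value_counts = df_toy["Lemma"].value_counts()

# Every row has a lemma here, so the counts add up to the number of rows.
assert s_counts.sum() == len(df_toy)
assert s_counts.to_dict() == s_value_counts.to_dict()

print(s_counts.sort_values(ascending=False))
```

Note that `groupby` drops rows whose key is missing by default, so the counts only sum to the row count when no `Lemma` value is empty.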

## Create DataFrame for Lemma Analysis (DF_ANALYSIS)

DF_ANALYSIS:
* Lemma (index)
* Word Count

In [3]:
DF_ANALYSIS = S_LEMMA_WORD_COUNTS.to_frame(name="Word Count")
DF_ANALYSIS.index.name = "Lemma"

# print("===== DF_ANALYSIS")
# print(DF_ANALYSIS.__class__.__name__)
# print("-----")
# pprint(vars(DF_ANALYSIS))
# print("-----")
# pprint(DF_ANALYSIS)

## Add Word Percentage to Analysis (DF_ANALYSIS)

DF_ANALYSIS:
* Lemma (index)
* Word Count
* Word Percentage (new)

In [4]:
DF_ANALYSIS["Word Percentage"] = (DF_ANALYSIS["Word Count"] / TOTAL_WORD_COUNT) * 100

# print("===== DF_ANALYSIS")
# print(DF_ANALYSIS.__class__.__name__)
# print("-----")
# pprint(vars(DF_ANALYSIS))
# print("-----")
# pprint(DF_ANALYSIS)
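A small worked example of the percentage calculation, using hypothetical counts rather than MorphGNT data. Assuming every word has exactly one lemma, the percentages should sum to 100 up to floating-point rounding, which makes a convenient sanity check.

```python
import pandas as pd

# Hypothetical counts for illustration; not taken from MorphGNT.
s_counts = pd.Series({"a": 3, "b": 2, "c": 1}, name="Word Count")
s_counts.index.name = "Lemma"

# Plays the role of TOTAL_WORD_COUNT in the notebook.
total_word_count = int(s_counts.sum())

df_toy = s_counts.to_frame()
df_toy["Word Percentage"] = (df_toy["Word Count"] / total_word_count) * 100

# Sanity check: the shares of all lemmas add up to (about) 100.
percentage_sum = df_toy["Word Percentage"].sum()
assert abs(percentage_sum - 100) < 1e-9

print(df_toy)
```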

## Add Cumulative Percentage Column (DF_ANALYSIS)

DF_ANALYSIS:
* Lemma (index)
* Word Count
* Word Percentage
* Word Index (new; zero-based rank by descending word percentage)
* Word Percentage Cumulative (new)

In [5]:
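# Rank lemmas from most to least frequent.
DF_ANALYSIS = DF_ANALYSIS.sort_values("Word Percentage", ascending=False)
# Zero-based rank of each lemma in that order.
DF_ANALYSIS["Word Index"] = range(len(DF_ANALYSIS))
# Running total of the percentages, reaching about 100 at the last row.
DF_ANALYSIS["Word Percentage Cumulative"] = DF_ANALYSIS["Word Percentage"].cumsum()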

# DF_ANALYSIS = DF_ANALYSIS[DF_ANALYSIS["Word Index"] < 100]

# print("===== DF_ANALYSIS")
# print(DF_ANALYSIS.__class__.__name__)
# print("-----")
# pprint(vars(DF_ANALYSIS))
# print("-----")
# pprint(DF_ANALYSIS)
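One use of the cumulative column is to ask how many of the most frequent lemmas are needed to cover a given share of the text. The sketch below rebuilds the same columns on hypothetical counts and answers that question; `df_toy` and the helper `lemmas_needed` are illustrative names, not part of the notebook.

```python
import pandas as pd

# Hypothetical counts for illustration only; not taken from MorphGNT.
df_toy = pd.DataFrame(
    {"Word Count": [50, 30, 15, 5]},
    index=pd.Index(["a", "b", "c", "d"], name="Lemma"),
)
df_toy["Word Percentage"] = df_toy["Word Count"] / df_toy["Word Count"].sum() * 100

# Same three steps as the cell above.
df_toy = df_toy.sort_values("Word Percentage", ascending=False)
df_toy["Word Index"] = range(len(df_toy))
df_toy["Word Percentage Cumulative"] = df_toy["Word Percentage"].cumsum()


def lemmas_needed(df, target_percent):
    """Return how many top lemmas are needed to reach target_percent coverage."""
    reached = df["Word Percentage Cumulative"] >= target_percent
    if not reached.any():
        raise ValueError("target_percent is never reached")
    # Position of the first row at or above the target, converted to a count.
    return int(reached.to_numpy().argmax()) + 1


print(lemmas_needed(df_toy, 75))
print(lemmas_needed(df_toy, 90))
```

Targets that sit exactly on a cumulative value (or exactly at 100) can be sensitive to floating-point rounding, so a small tolerance may be worth adding in practice.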

## Merge Analysis with Lexemes (DF_MERGED)

Joins `DF_ANALYSIS` with `DF_LEXEMES` on their shared lemma index to create `DF_MERGED`.

In [6]:
DF_MERGED = DF_ANALYSIS.join(DF_LEXEMES)

# print("===== DF_MERGED")
# print(DF_MERGED.__class__.__name__)
# print("-----")
# pprint(vars(DF_MERGED))
# print("-----")
# pprint(DF_MERGED)
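`DataFrame.join` aligns rows on the index and defaults to a left join, so every lemma in the analysis is kept even when the lexeme table has no entry for it. The sketch below, built on hypothetical stand-ins for the two frames, shows one way to list such unmatched lemmas.

```python
import pandas as pd

# Hypothetical stand-ins for DF_ANALYSIS and DF_LEXEMES.
df_analysis_toy = pd.DataFrame(
    {"Word Count": [3, 2, 1]},
    index=pd.Index(["ὁ", "καί", "λέγω"], name="Lemma"),
)
df_lexemes_toy = pd.DataFrame(
    {"Gloss": ["the", "and, even, also, namely"]},
    index=pd.Index(["ὁ", "καί"], name="Lemma"),
)

# Left join on the index: all three analysis rows survive.
df_merged_toy = df_analysis_toy.join(df_lexemes_toy)

# Lemmas without a lexeme entry have NaN in the lexeme columns.
missing = df_merged_toy.index[df_merged_toy["Gloss"].isna()].tolist()

print(df_merged_toy)
print(missing)
```

An empty `missing` list on the real data would confirm that every lemma in `DF_MORPHGNT` has a matching row in `DF_LEXEMES`.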

## Display Word Analysis

In [7]:
DF_MERGED_REORDERED = DF_MERGED.reindex(
    columns=[
        "Word Index",
        "BDAG Entry",
        "Dodson Entry",
        "Part of Speech",
        "Gloss",
        "Strongs",
        "GK",
        "Word Count",
        "Word Percentage",
        "Word Percentage Cumulative",
    ]
)
DF_MERGED_REORDERED.style.hide(axis="index").set_properties(
    subset=["Gloss", "BDAG Entry", "Dodson Entry", "Part of Speech"],
    **{"text-align": "left"}
).set_table_styles([{"selector": "th", "props": [("text-align", "left")]}]).bar(
    subset=["Word Percentage Cumulative"], vmax=100
)

| Word Index | BDAG Entry | Dodson Entry | Part of Speech | Gloss | Strongs | GK | Word Count | Word Percentage | Word Percentage Cumulative |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | ὁ | ὁ, ἡ, τό | Definite Article | the | 3588 | 3836 | 19769 | 14.37181 | 14.37181 |
| 1 | καί | καί | Conjunction | and, even, also, namely | 2532 | 2779 | 8973 | 6.523256 | 20.895067 |
| 2 | αὐτός | αὐτός, αὐτή, αὐτό | Personal Pronoun | he, she, it, they, them, same | 846 | 899 | 5546 | 4.031871 | 24.926938 |
| 3 | σύ | σύ, σοῦ, σοί, σέ | Personal Pronoun | you | 4771 | 5148 | 2894 | 2.103901 | 27.030839 |
| 4 | δέ | δέ | Conjunction | but, on the other hand, and | 1161 | 1254 | 2766 | 2.010847 | 29.041685 |
| 5 | ἐν | ἐν | Preposition | in, on, among | 1722 | 1877 | 2733 | 1.986856 | 31.028542 |
| 6 | ἐγώ | ἐγώ | Personal Pronoun | I | 1473 | 1609 | 2572 | 1.869811 | 32.898353 |
| 7 | εἰμί | εἰμί | Verb | I am, exist | 1510 | 1639 | 2456 | 1.785481 | 34.683833 |
| 8 | λέγω | λέγω | Verb | I say, speak | 3004 | 3306 | 2345 | 1.704785 | 36.388618 |
| 9 | εἰς | εἰς | Preposition | into, in, among, till, for | 1519 | 1650 | 1754 | 1.275136 | 37.663754 |
