## 1. Testing the CLTK macronizer

In [1]:
from cltk.prosody.lat.macronizer import Macronizer

In [2]:
# Initialize the Macronizer instance
macronizer = Macronizer(tagger="tag_ngram_123_backoff")

In [27]:
words = [
    "vero",
    "dicebatur",
    "subiectis",
    "essent",
    "orationem",     # idx: 6173
    "integra",       # idx: 6176
    "deprehensus",   # idx: 6735
    "zizania",       # idx: 8597
    "comprehendat",
    "venerunt",
    "diuisione",     # idx: 19980
    "sublimis",      # idx: 11967
]
print("Macronized words:")
for word in words:
    # Each word is macronized in isolation, without sentence context
    macronized_word = macronizer.macronize_text(word)
    print(macronized_word)

Macronized words:
vērō
dīcēbātur
subiectis
essent
ōrātiōnem
integra
dēprehēnsus
zizania
comprehendat
vēnērunt
diuisione
sublīmis
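
Several outputs above come back without any macron. A minimal sketch for listing them automatically is given here; it uses only the standard library, and the helper names `has_macron` and `unmacronized` are hypothetical, not part of CLTK. An output without a macron is not necessarily an error, since some words contain no long vowels, so the list is only a starting point for manual review.

```python
import unicodedata

MACRON = "\u0304"  # combining macron, visible after NFD decomposition


def has_macron(word: str) -> bool:
    """True if any character of the word carries a macron."""
    return MACRON in unicodedata.normalize("NFD", word)


def unmacronized(pairs):
    """Return the input words whose macronizer output has no macron at all."""
    return [src for src, out in pairs if not has_macron(out)]


# (input, output) pairs copied from the run above
pairs = [
    ("vero", "vērō"), ("dicebatur", "dīcēbātur"), ("subiectis", "subiectis"),
    ("essent", "essent"), ("orationem", "ōrātiōnem"), ("integra", "integra"),
    ("deprehensus", "dēprehēnsus"), ("zizania", "zizania"),
    ("comprehendat", "comprehendat"), ("venerunt", "vēnērunt"),
    ("diuisione", "diuisione"), ("sublimis", "sublīmis"),
]
print(unmacronized(pairs))
```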


## 2. Testing the CLTK scansion tool

In [3]:
from cltk.prosody.lat.hexameter_scanner import HexameterScanner

In [4]:
scanner = HexameterScanner()

In [9]:
print(HexameterScanner().scan(
"ueram uitam extulit , habitu uelut nostrā ").scansion) 

-U    UU    -  U U     U U   UU -   -   - 


In [8]:
print(HexameterScanner().scan(
"veram vitam extulit , habitu velut nostrā ").scansion) 

 - -   -    -  - -     - - -  U U   -   - 


=> The non-normalized spelling (u for v) seems to be scanned more accurately than the normalized one.

#### Breve

In [20]:
print(HexameterScanner().scan(
"nōvit glōrĭam . quīdem ūnde etĭam vīta").scansion) 




In [21]:
print(HexameterScanner().scan(
"nōvit glōriam . quīdem ūnde etiam vīta").scansion) 

 - -    - --      - -  -    - UU   - U


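=> Breve-marked vowels (e.g. ĭ) seem to be a problem: the line containing breves returns an empty scansion, while the same line without them is scanned.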
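
One possible workaround, sketched here with the standard library only, is to strip the breve marks before handing the line to the scanner while keeping the macrons. The helper name `strip_breves` is hypothetical. Applied to the line from `In [20]`, it should give the line used in `In [21]`.

```python
import unicodedata

BREVE = "\u0306"  # combining breve, visible after NFD decomposition


def strip_breves(text: str) -> str:
    """Remove breve marks but keep macrons and all other characters."""
    decomposed = unicodedata.normalize("NFD", text)
    without_breves = decomposed.replace(BREVE, "")
    # Recompose so that macronized vowels are single code points again
    return unicodedata.normalize("NFC", without_breves)


line = "nōvit glōrĭam . quīdem ūnde etĭam vīta"
print(strip_breves(line))
```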

#### Elisions

In [16]:
print(HexameterScanner().scan(
"mātrī , sīve īllīs quībus dīgnatur īpse").scansion) 

 -  -    -   -  -    - -   -  U U  -  U


=> This yields only 12 syllables (12 is the minimum the scanner accepts).
* But there is an elision: the final -e of sive elides before illis.
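
To spot such places before scanning, a rough sketch of an elision finder follows. It is a simplification, not the logic CLTK uses: it flags a word ending in a vowel or in vowel + m when the next word starts with a vowel or with h + vowel, and it treats every `i` and `u` as a vowel, so consonantal i/u (as in "iam") will produce false positives. The function names are hypothetical.

```python
import unicodedata

VOWELS = set("aeiouy")


def base_letters(token: str) -> str:
    """Lowercase the token and drop diacritics and punctuation."""
    decomposed = unicodedata.normalize("NFD", token.lower())
    return "".join(ch for ch in decomposed if ch.isalpha())


def elision_candidates(line: str):
    """Return (left, right) word pairs where an elision is possible."""
    words = [w for w in (base_letters(t) for t in line.split()) if w]
    pairs = []
    for left, right in zip(words, words[1:]):
        ends_open = left[-1] in VOWELS or (
            len(left) > 1 and left[-1] == "m" and left[-2] in VOWELS
        )
        starts_open = right[0] in VOWELS or (
            len(right) > 1 and right[0] == "h" and right[1] in VOWELS
        )
        if ends_open and starts_open:
            pairs.append((left, right))
    return pairs


print(elision_candidates("mātrī , sīve īllīs quībus dīgnatur īpse"))
```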

"valid = False" when:
* the line has fewer than 12 syllables:

      if verse.syllable_count < 12:
          verse.valid = False

* the stress count of the generated scansion does not match:

      verse.scansion = self.produce_scansion(stresses, syllables_wspaces, offset_map)
      if len(
          string_utils.stress_positions(self.constants.STRESSED, verse.scansion)
      ) != len(set(stresses)):
          verse.valid = False
          verse.scansion_notes += [self.constants.NOTE_MAP["invalid syllables"]]
          return verse

  => **The scanner compares the number of stressed syllables in the scansion it generates with the number of unique stresses found for the line.**

Source: https://docs.cltk.org/en/latest/_modules/cltk/prosody/lat/hexameter_scanner.html#HexameterScanner
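
The two checks quoted above can be imitated outside the scanner to see which one a line trips over. The sketch below is a simplified stand-in, not the CLTK code: `stress_positions` here is a local re-implementation, `STRESSED = "-"` is an assumption about the constant's value, and `passes_checks` is a hypothetical name.

```python
STRESSED = "-"  # assumed value of the scanner's "stressed" mark


def stress_positions(stress: str, scansion: str):
    """Indices of `stress` marks in the scansion, ignoring alignment spaces."""
    compact = scansion.replace(" ", "")
    return [i for i, mark in enumerate(compact) if mark == stress]


def passes_checks(scansion: str, stresses, syllable_count: int) -> bool:
    """Mirror the two conditions from the excerpt that set valid = False."""
    if syllable_count < 12:
        return False
    found = stress_positions(STRESSED, scansion)
    return len(found) == len(set(stresses))


# Twelve long syllables with twelve distinct stress positions pass both checks
print(passes_checks(" - -  - -  - -  - -  - -  - -", list(range(12)), 12))
```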


In [26]:
verse = scanner.scan("monstrum horrendum, informe, ingens, cui lumen ademptum")
print(verse.scansion)
print(verse.valid)     

 -        -  -      -  -     -  -      -  - U  U -   U 
True


In [29]:
verse = scanner.scan("īpsum , eūntēs , vivite , pater , cūrrite bona")
print(verse.scansion)
print(verse.valid)     

-       --  -     U U U    U -     -  U U  - U
False


In [55]:
verse = scanner.scan("mātrī , sīve īllīs quībus dīgnatur īpse")
print(verse.scansion)
print(verse.valid)     

 -  -    -   -  -    - -   -  U U  -  U
False


In [60]:
verse = scanner.scan("quīdem spīritus patris , nisi spīritus sānctus")
print(verse.scansion)
print(verse.valid)     

  - -    - - -   -  -   - -   - - -   -   U 
False


In [66]:
verse = scanner.scan("vōbīs mūlta , quīa tālibus hōstiis , sīve")
print(verse.scansion)
print(verse.valid)     

 - -   -  U     UU  - U U   -  --     - U
False


=> The short syllables (breves) are not identified reliably.

In [58]:
verse = scanner.scan("mātrī sīve īllīs quībus dīgnatur īpse")

print("Original:", verse.original)
print("Scansion:", verse.scansion)
print("Meter:", verse.meter)
print("Valid:", verse.valid)

# Print all available attributes
print("\nAll attributes of the Verse object:")
for attr in dir(verse):
    if not attr.startswith('__'):
        try:
            value = getattr(verse, attr)
            print(f"{attr}: {value}")
        except Exception:
            print(f"{attr}: <unable to retrieve>")

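# Analyze the scansion pattern
# NOTE: this slices the raw scansion string (alignment spaces included) into
# 2-character chunks, so the chunks printed below are not metrical feet.
print("\nAnalysis of scansion pattern:")
feet = [verse.scansion[i:i+2] for i in range(0, len(verse.scansion), 2)]
for i, foot in enumerate(feet, 1):
    print(f"Foot {i}: {foot}")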

# Check if the last foot is anceps (allows both long and short)
if verse.scansion.endswith('U') or verse.scansion.endswith('X'):
    print("Last foot is anceps (as expected in hexameter)")
else:
    print("Last foot is not anceps")

# Count long and short syllables
long_syllables = verse.scansion.count('-')
short_syllables = verse.scansion.count('U')
print(f"\nLong syllables: {long_syllables}")
print(f"Short syllables: {short_syllables}")

# Check total number of syllables
total_syllables = long_syllables + short_syllables
print(f"Total syllables: {total_syllables}")
if not 12 <= total_syllables <= 17:
    print("Warning: a hexameter has between 12 and 17 syllables")

Original: mātrī sīve īllīs quībus dīgnatur īpse
Scansion:  -  -  -   -  -    - -   -  U U  -  U
Meter: hexameter
Valid: False

All attributes of the Verse object:
accented: 
meter: hexameter
original: mātrī sīve īllīs quībus dīgnatur īpse
scansion:  -  -  -   -  -    - -   -  U U  -  U
scansion_notes: ['Inverted amphibrachs corrected.']
syllable_count: 12
syllables: ['mā', 'trī', 'sīv', 'īl', 'līs', 'qui', 'būs', 'dī', 'gna', 'tur', 'īp', 'se']
valid: False
working_line: mātrī sīv  īllīs quibūs dīgnatur īpse

Analysis of scansion pattern:
Foot 1:  -
Foot 2:   
Foot 3: - 
Foot 4:  -
Foot 5:   
Foot 6:  -
Foot 7:   
Foot 8: - 
Foot 9:   
Foot 10:  -
Foot 11:  -
Foot 12:   
Foot 13:  -
Foot 14:   
Foot 15: U 
Foot 16: U 
Foot 17:  -
Foot 18:   
Foot 19: U
Last foot is anceps (as expected in hexameter)

Long syllables: 9
Short syllables: 3
Total syllables: 12
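
The 19 two-character chunks listed above follow the alignment spaces of the scansion string rather than metrical feet. A minimal sketch of a foot splitter is given here, assuming the usual hexameter shape: five feet that are each a dactyl (`-UU`) or a spondee (`--`), followed by a final foot of one long syllable plus an anceps. The function name `split_feet` is hypothetical; it returns `None` when the marks do not fit that shape.

```python
def split_feet(scansion: str):
    """Split a scansion string into six hexameter feet, or return None."""
    marks = "".join(scansion.split())  # drop alignment spaces
    feet = []
    i = 0
    while i < len(marks):
        if len(feet) == 5:
            # Sixth foot: a long syllable followed by one anceps syllable
            if len(marks) - i == 2 and marks[i] == "-":
                feet.append(marks[i:])
                return feet
            return None
        if marks.startswith("-UU", i):
            feet.append("-UU")  # dactyl
            i += 3
        elif marks.startswith("--", i):
            feet.append("--")  # spondee
            i += 2
        else:
            return None
    return None  # ran out of marks before reaching six feet


# Scansion strings copied from the runs above
print(split_feet(" -        -  -      -  -     -  -      -  - U  U -   U "))
print(split_feet(" -  -  -   -  -    - -   -  U U  -  U"))
```

On the first string (the "monstrum horrendum" line) this finds four spondees, a dactyl, and the closing foot. On the second (the "mātrī sīve" line) it returns `None`, because after four spondees the remaining marks start with a short syllable, which agrees with `valid: False`.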


#### Scanning lines generated from poetry data with CLTK

In [5]:
print(HexameterScanner().scan(
"stat adoratas istrahelitarum formas").scansion) 

  -  - - - -  -   - - - U U   -  U 


In [7]:
print(HexameterScanner().scan(
"zephyros perpetuo uigor stheneleius").scansion) 

 -  - -   -  - -  -- -     U U - U 


In [8]:
print(HexameterScanner().scan(
"non deteriore pyras silex mihi crescit ad undas").scansion) 

 -   U U -U U  - U   U -   U U   -  U  U  -  U 
