Portuguese words are built from syllables, and each syllable is a combination of vowels (V) and consonants (C). The syllable is the rhythmic unit of speech, and its core is almost always a vowel. Understanding how vowels and consonants are distributed across words is key to choosing a strong opening guess in Termo (the Portuguese version of Wordle).
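
To make the V/C idea concrete, here is a minimal sketch that maps a word to its vowel-consonant pattern. The helper name `cv_pattern` is hypothetical and not part of the notebook, and it assumes lowercase input with accents already removed.

```python
VOWELS = set('aeiou')

def cv_pattern(word):
    # Assumes lowercase, unaccented, alphabetic input:
    # every letter that is not a base vowel is treated as a consonant.
    return ''.join('V' if letter in VOWELS else 'C' for letter in word)

for sample in ['termo', 'audio', 'praia']:
    print(sample, cv_pattern(sample))
# termo CVCCV
# audio VVCVV
# praia CCVVV
```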

In [9]:
import os
from collections import Counter
import unicodedata 

In [10]:
FILENAME = 'palavras_5letras.txt' 
TARGET_LENGTH = 5                 

##### Removes accents from a character

In [11]:
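def normalize_char(char):
    # NFD splits an accented letter into its base letter plus combining marks
    # (category 'Mn'), which are then dropped. Note that 'ç' becomes 'c' too.
    return ''.join(c for c in unicodedata.normalize('NFD', char)
                   if unicodedata.category(c) != 'Mn')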
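
A quick check of how this normalization behaves on a few Portuguese characters may help. The function is repeated so the snippet runs on its own, and the sample characters are chosen only for illustration.

```python
import unicodedata

def normalize_char(char):
    # Same logic as the notebook cell: decompose, then drop combining marks.
    return ''.join(c for c in unicodedata.normalize('NFD', char)
                   if unicodedata.category(c) != 'Mn')

samples = ['á', 'ã', 'ê', 'ç', 'z']
result = {ch: normalize_char(ch) for ch in samples}
print(result)
# {'á': 'a', 'ã': 'a', 'ê': 'e', 'ç': 'c', 'z': 'z'}
```

Because the cedilla is also a combining mark, every `ç` is counted as a plain `c` in the analysis that follows.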

##### Defining the vowels and consonants

In [22]:
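BASE_VOWELS = set('aeiou')
# 'ç' is listed for completeness, but normalize_char() turns it into 'c',
# so it never appears in a normalized word.
ALPHABET_PT_FULL = set('abcdefghijklmnopqrstuvwxyzç')
CONSONANTS = {char for char in ALPHABET_PT_FULL if char not in BASE_VOWELS}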

##### Variables and Counters

In [23]:
max_distinct_vowels = 0
words_with_max_vowels = []

consonant_counter = Counter()

total_words_analyzed = 0
total_consonants_counted = 0
error_message = None # Store error messages

##### Reads the word file, performs vowel and consonant analysis and updates global variables with the results.

In [24]:
def analyze_words():

    global max_distinct_vowels, words_with_max_vowels, consonant_counter
    global total_words_analyzed, total_consonants_counted, error_message
    
    if not os.path.exists(FILENAME):
        error_message = f"Critical Error: The file '{FILENAME}' was not found in the current directory."
        print(error_message)
        return False

    try:
        with open(FILENAME, 'r', encoding='utf-8') as f:
            for line in f:
                original_word = line.strip()

                # Consider only non-empty words with the exact target length
                if original_word and len(original_word) == TARGET_LENGTH:
                    # Normalize the word: lowercase and remove accents
                    normalized_word = "".join(normalize_char(c) for c in original_word.lower())

                    # Additional validation: check if all normalized characters are valid letters
                    if all(c in ALPHABET_PT_FULL for c in normalized_word) and len(normalized_word) == TARGET_LENGTH:
                        total_words_analyzed += 1
                        vowels_in_word = set()

                        # Iterate over each normalized letter in the word
                        for letter in normalized_word:
                            # Vowel Analysis
                            if letter in BASE_VOWELS:
                                vowels_in_word.add(letter)
                            # Consonant Analysis
                            elif letter in CONSONANTS:
                                consonant_counter[letter] += 1
                                total_consonants_counted += 1

                        # Update Distinct Vowel Analysis Results
                        num_distinct_vowels = len(vowels_in_word)
                        if num_distinct_vowels > max_distinct_vowels:
                            max_distinct_vowels = num_distinct_vowels
                            words_with_max_vowels = [original_word]
                        elif num_distinct_vowels == max_distinct_vowels:
                            words_with_max_vowels.append(original_word)

        print(f"Analysis complete. {total_words_analyzed} valid {TARGET_LENGTH}-letter words were processed.")
        return True 

    except FileNotFoundError:
        error_message = f"Critical Error: The file '{FILENAME}' was not found."
        print(error_message)
        return False 
    except Exception as e:
        error_message = f"An unexpected error occurred during file processing: {e}"
        print(error_message)
        return False 

In [25]:
analysis_completed = analyze_words()

Analysis complete. 19082 valid 5-letter words were processed.


##### Insights

Next, we use these counts to look for strong starting words for the game. A good first guess should reveal as much as possible about the hidden word.

The idea is to combine two signals. A good candidate contains many distinct vowels (to test which vowels are present) and also includes some of the most frequent consonants in the word list. The two results below report each signal in turn.

##### Prints the results of the distinct vowel analysis

In [31]:
def display_vowel_results():
    
    if error_message and not total_words_analyzed: # Show error only if analysis didn't even start
        print(f"Cannot display results due to a critical error: {error_message}")
    elif total_words_analyzed == 0:
        print(f"No valid {TARGET_LENGTH}-letter words were found or analyzed in the file '{FILENAME}'.")
    else:
        print(f"Total {TARGET_LENGTH}-letter words analyzed: {total_words_analyzed}")
        print("-" * 40)
        if words_with_max_vowels:
            print(f"Highest number of distinct vowels found: {max_distinct_vowels}")
            print(f"Words with {max_distinct_vowels} distinct vowels ({len(words_with_max_vowels)} found):")
            # Print words, 15 per line
            for i in range(0, len(words_with_max_vowels), 15):
                print(", ".join(words_with_max_vowels[i:i+15]))
        else:
            print("No words containing vowels were found.")
        if error_message:
            print(f"\nNote: An error occurred during processing: {error_message}")

In [32]:
display_vowel_results()

Total 5-letter words analyzed: 19082
----------------------------------------
Highest number of distinct vowels found: 4
Words with 4 distinct vowels (132 found):
aboei, aboie, acoei, acuei, adioe, adiou, adoei, aduei, afeio, afeou, afiou, aguei, aiemo, aigue, aioes
airou, aiues, aiune, aiuno, ajoie, aleio, aleou, aliou, aluei, aluio, ameio, ameou, amoie, amuei, aneio
aneou, aoqui, apeio, apeou, apoie, apuei, apuie, aqueo, areio, areou, ariou, ateio, ateou, atoei, atuei
audio, aueti, aueto, aunei, aurei, aureo, ausio, aveio, aviou, avoei, azoei, baiou, caiou, caiue, cuiao
ecoai, eguai, eicou, eimou, eivao, eivou, eixou, eluia, eolia, equio, euria, eurio, faiou, feiao, gaiou
guaie, guaio, guiao, iameu, iaque, iauos, iaupe, iauvo, ideou, iogue, iolau, iuane, iucea, lauie, maeio
maiou, meiao, mueia, odeia, ofaie, oguei, oigue, oleai, oleia, opaie, oquea, oquei, oquie, oreai, oreia
ouari, ougai, ourai, ourei, ousai, ousei, ousia, ousie, outai, outei, ouvia, oviua, ozeai, ozeia, queia
raiou

##### Prints the results of the consonant frequency analysis

In [34]:
def display_consonant_results():

    if error_message and not total_words_analyzed: 
        print(f"Cannot display results due to a critical error: {error_message}")
    elif total_words_analyzed == 0:
        print(f"No valid {TARGET_LENGTH}-letter words were found or analyzed in the file '{FILENAME}'.")
    else:
        print(f"Total {TARGET_LENGTH}-letter words analyzed: {total_words_analyzed}")
        print(f"Total consonants counted: {total_consonants_counted}")
        print("-" * 40)

        if consonant_counter:
            top_10_consonants = consonant_counter.most_common(10)
            print("The 10 most frequent consonants are:")
            print(f"{'Rank':<6} {'Consonant':<12} {'Frequency':<12}")
            print("-" * 40)
            for rank, (consonant, frequency) in enumerate(top_10_consonants, 1):
                print(f"{rank:<6} {consonant:<12} {frequency:<12}")
        elif total_consonants_counted == 0 and total_words_analyzed > 0:
            print("No consonants were found in the analyzed words.")
        else:
            print("No consonants were counted.")
        if error_message: # Display non-critical errors
            print(f"\nNote: An error occurred during processing: {error_message}")


In [35]:
display_consonant_results()

Total 5-letter words analyzed: 19082
Total consonants counted: 48302
----------------------------------------
The 10 most frequent consonants are:
Rank   Consonant    Frequency   
----------------------------------------
1      s            6375        
2      r            5981        
3      m            4282        
4      l            4028        
5      c            3886        
6      n            3465        
7      t            3432        
8      p            2498        
9      b            2471        
10     d            2408        
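
The strategy described under Insights can be sketched as a simple score that combines both results. This is only an illustration: the names `score_word`, `TOP_CONSONANT_FREQ` and `candidates`, as well as the weighting itself, are assumptions and not part of the original analysis. The frequencies are copied from the table above, and the candidates are a few words taken from the four-vowel list.

```python
# Frequencies copied from the top-10 consonant table above.
TOP_CONSONANT_FREQ = {
    's': 6375, 'r': 5981, 'm': 4282, 'l': 4028, 'c': 3886,
    'n': 3465, 't': 3432, 'p': 2498, 'b': 2471, 'd': 2408,
}
TOTAL_CONSONANTS = 48302  # "Total consonants counted" from the output above
VOWELS = set('aeiou')

def score_word(word):
    # Hypothetical scoring: one point per distinct vowel, plus the relative
    # frequency of each distinct top-10 consonant in the word.
    # Assumes the word is already lowercase and accent-free.
    letters = set(word)
    vowel_score = len(letters & VOWELS)
    consonant_score = sum(TOP_CONSONANT_FREQ.get(c, 0) for c in letters) / TOTAL_CONSONANTS
    return vowel_score + consonant_score

# A small illustrative sample from the four-vowel list, not the full ranking.
candidates = ['audio', 'ousei', 'raiou', 'caiou']
ranked = sorted(candidates, key=score_word, reverse=True)
print(ranked)
# ['ousei', 'raiou', 'caiou', 'audio']
```

With this weighting the vowel count dominates and consonant frequency only breaks ties. A different weighting could instead favor words with three vowels and two frequent consonants.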
