### **Performing ETL on the [Quordle](https://quordle.com) game** *(a four-word variant of [Wordle](https://www.nytimes.com/games/wordle/index.html))*

Before we start, let's manually look through the source code for the variables in which each word `list` is stored.

###############################################################################<br>
1st `list`:

```js script
Y =
    "aback abase abate abbey abbot ... young youth zebra zesty zonal".split(
        " "
    ),
```

###############################################################################<br>
2nd `list`: 

```js script
ms =
    "aahed aalii aargh aarti abaca ... zuzim zygal zygon zymes zymic".split(
        " "
    ),
```

Analysing the entire code *(the **quordle.js** file)*, we can find 2 lists of words: a **valid answers** `list` and a **valid guesses** `list`.

The words are stored in variables called `Y` and `ms`, respectively. Each list is written as a single space-separated string on one line of code.
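
For readers less familiar with JavaScript: the `"...".split(" ")` idiom turns one space-separated string into an array of words, and Python's `str.split` behaves the same way for this input. A minimal sketch, assuming a shortened sample string (`sample` and `sample_words` are illustrative names, and the real strings hold thousands of words):

```python
# Shortened sample of the string assigned to `Y` (illustrative only, not the full list)
sample = "aback abase abate abbey abbot"

# Splitting on a single space yields one list element per word
sample_words = sample.split(" ")

print(sample_words)
print(len(sample_words))
```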

So, let's extract them!

In [3]:
# Import modules
import re

# Define constants
NUMBER_OF_PREVIEW_ELEMENTS = 10
DIVIDER = '#' * 120

# Init variables
words_lists = [None] * 2


def print_words_lists_preview() -> None:
    """Helper function to print a preview of each words list and its length"""

    # Iterate over each words list
    for words_list in words_lists:

        # Check if the words list is not empty
        if words_list:
            print(f"first:\t{words_list[:NUMBER_OF_PREVIEW_ELEMENTS]}") # Print the first words
            print(f"last:\t{words_list[-NUMBER_OF_PREVIEW_ELEMENTS:]}") # Print the last words
            print(f"length:\t{len(words_list)}")                        # Print the length of the words list
            print()  

Let's load the JavaScript source code as a `list` of lines and preview it.

In [4]:
def load_file(filepath: str) -> list:
    """Function to output a list of lines from a file"""

    with open(filepath, "r", encoding="utf-8") as f:
        return f.read().splitlines()


def print_code_preview(src_code: list) -> None:
    """Helper function to print a preview of the code"""

    print(DIVIDER)
    print("FIRST:")

    # Iterate over the first lines
    for line in src_code[:NUMBER_OF_PREVIEW_ELEMENTS]:
        print(f"\t{line}")

    print(DIVIDER)
    print("LAST:")

    # Iterate over the last lines
    for line in src_code[-NUMBER_OF_PREVIEW_ELEMENTS:]:
        print(f"\t{line}")
    
    print(DIVIDER)
    print(f"LENGTH: {len(src_code)}") # Print the code's length


# Load the code
src_code = load_file(filepath="data\\src_code\\quordle.js")

# Preview the first and last lines of code
print_code_preview(src_code=src_code)

########################################################################################################################
FIRST:
	var Qe = (e, s) => () => (s || e((s = { exports: {} }).exports, s), s.exports);
	var Ce = (e, s, t) =>
	    new Promise((l, i) => {
	        var n = (a) => {
	                try {
	                    c(t.next(a));
	                } catch (r) {
	                    i(r);
	                }
	            },
########################################################################################################################
LAST:
	                            return h(Pa, {});
	                        },
	                    });
	                },
	            }),
	        document.getElementById("root")
	    );
	});
	export default Fa();
	//# sourceMappingURL=index.3754bfc4.js.map
########################################################################################################################
LENGTH: 2838


Let's extract both word lists:

+ We can detect their start based on their opening string: `<var_name> =`
+ We can detect their end based on their closing string: `.split`
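
The two detection rules above can be tried on a tiny stand-in for the source file before running them on the real one. This is only a sketch: `toy_src_code` and `find_words_line` are hypothetical names, and it assumes the words always sit on the line directly after the one containing the variable name.

```python
# Toy stand-in for the source code lines (hypothetical; the real file has thousands of lines)
toy_src_code = [
    "var a = 1,",
    "    Y =",
    '        "aback abase abate".split(',
    '            " "',
    "        ),",
]


def find_words_line(lines, opening_string, closing_string):
    """Return the first line containing closing_string that directly follows
    a line containing opening_string, or None if there is no such pair."""
    for previous_line, line in zip(lines, lines[1:]):
        if opening_string in previous_line and closing_string in line:
            return line
    return None


words_line = find_words_line(toy_src_code, "Y =", ".split")
print(words_line)
```

Returning `None` when nothing matches makes a renamed variable in a future version of the game easy to notice.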

In [32]:
def extract_words_from_list(src_code: list, opening_string: str, closing_string: str) -> list:
    """Function to extract words from a list in the source code"""

    # Init variables
    previous_line_opens = False # Flag: did the previous line contain the opening string?
    words_line = None           # Line holding the words, once found

    # Iterate over the source code's lines
    for line in src_code:

        # The words sit on the line right after the opening string; that line also contains the closing string
        if previous_line_opens and closing_string in line:
            words_line = line
            break

        # Remember whether the current line contains the opening string
        previous_line_opens = opening_string in line

    # Return an empty list if no matching line was found
    if words_line is None:
        return []

    # Use a RegEx to extract every 5-letter word together with its trailing delimiter
    matches = re.findall(r'[a-z]{5}[ ,"]', words_line)

    # Return the output words list
    return matches

# Extract each words list
words_lists[0] = extract_words_from_list(src_code=src_code, opening_string='Y =', closing_string='split')
words_lists[1] = extract_words_from_list(src_code=src_code, opening_string='ms =', closing_string='split')

# Print the words lists previews
print_words_lists_preview()

first:	['aback ', 'abase ', 'abate ', 'abbey ', 'abbot ', 'abhor ', 'abide ', 'abled ', 'abode ', 'abort ']
last:	['wryly ', 'yacht ', 'yearn ', 'yeast ', 'yield ', 'young ', 'youth ', 'zebra ', 'zesty ', 'zonal"']
length:	2315

first:	['aahed ', 'aalii ', 'aargh ', 'aarti ', 'abaca ', 'abaci ', 'abacs ', 'abaft ', 'abaka ', 'abamp ']
last:	['zulus ', 'zupan ', 'zupas ', 'zuppa ', 'zurfs ', 'zuzim ', 'zygal ', 'zygon ', 'zymes ', 'zymic"']
length:	10657



Before proceeding, we need to remove the trailing whitespace and quotes.

In [35]:
def strip_word_list(word_list: list) -> list:
    """Function to strip the words list"""

    # Remove both quotes and spaces from each word
    return [word.replace('"', '').replace(' ', '') for word in word_list]

# Strip each words list
for i, _ in enumerate(words_lists):
    words_lists[i] = strip_word_list(word_list=words_lists[i])

# Print the words lists previews
print_words_lists_preview()

first:	['aback', 'abase', 'abate', 'abbey', 'abbot', 'abhor', 'abide', 'abled', 'abode', 'abort']
last:	['wryly', 'yacht', 'yearn', 'yeast', 'yield', 'young', 'youth', 'zebra', 'zesty', 'zonal']
length:	2315

first:	['aahed', 'aalii', 'aargh', 'aarti', 'abaca', 'abaci', 'abacs', 'abaft', 'abaka', 'abamp']
last:	['zulus', 'zupan', 'zupas', 'zuppa', 'zurfs', 'zuzim', 'zygal', 'zygon', 'zymes', 'zymic']
length:	10657
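
As an aside, the stripping step could probably be avoided by letting the RegEx return only the letters. The sketch below is an alternative, not the approach used in this notebook: it wraps the five letters in a capture group, so `re.findall` returns the group and leaves out the delimiter. `sample_line` is a shortened, illustrative line.

```python
import re

# Shortened sample of a words line (illustrative only)
sample_line = '          "aback abase abate abbey zonal".split('

# The group (...) captures only the five letters; the trailing delimiter
# must still be present, but it is not part of the returned match
clean_words = re.findall(r'([a-z]{5})[ ,"]', sample_line)

print(clean_words)
```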



Let's make sure there are no duplicated words inside each list. Converting each `list` into a `set` gets rid of any duplicates.

Then, we will check if there are duplicates across both lists.

And, finally, each `set` will be converted back to a `list`.

*(Duplicated words would unbalance the probabilities of each word being randomly picked during the game execution)*

In [36]:
words_lists[0] = set(words_lists[0])
words_lists[1] = set(words_lists[1])

print(f"Duplicated words across lists: {list(words_lists[0] & words_lists[1])}")

words_lists[0] = list(words_lists[0])
words_lists[1] = list(words_lists[1])

Duplicated words across lists: []


Great, no words are duplicated across lists.
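
Note that converting to a `set` removes duplicates inside a list silently. To see which words were repeated, one option is to count occurrences before converting. This is a minimal sketch with `collections.Counter`; the sample data contains a deliberate duplicate and is not taken from the real lists.

```python
from collections import Counter

# Illustrative sample with one deliberate duplicate (not taken from the real lists)
sample_with_duplicate = ["aback", "abase", "aback", "abate"]

# Count occurrences and keep the words that appear more than once
counts = Counter(sample_with_duplicate)
duplicates = sorted(word for word, count in counts.items() if count > 1)

print(duplicates)
```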

Since the **valid guesses** list will always be larger than the **valid answers** list, we can tell the two apart by their length: the shorter list `words_lists[0]` holds our **valid answers** and the longer list `words_lists[1]` holds our **valid guesses**.

We can now store them *(alphabetically sorted)* in `.txt` files.

In [37]:
def write_file(filepath: str, words: list) -> None:
    """Function to write a list of words to a file"""

    with open(filepath, "w", encoding="utf-8") as f:
        for word in words:
            f.write(f"{word}\n")

write_file("data\\words\\valid_answers.txt", sorted(words_lists[0]))
write_file("data\\words\\valid_guesses.txt", sorted(words_lists[1]))
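
To confirm that the files can be loaded again unchanged, a round-trip check like the one below could be used. It is only a sketch: `read_words` is a hypothetical helper, the sample words are illustrative, and a temporary directory stands in for the `data` folder so nothing real is overwritten.

```python
import tempfile
from pathlib import Path


def read_words(filepath) -> list:
    """Read one word per line from a text file (hypothetical helper)."""
    with open(filepath, "r", encoding="utf-8") as f:
        return f.read().splitlines()


# Illustrative sample, sorted alphabetically like the real lists
sample = sorted(["abate", "aback", "abase"])

with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = Path(tmp_dir) / "sample_words.txt"

    # Same format as write_file: one word per line
    with open(sample_path, "w", encoding="utf-8") as f:
        for word in sample:
            f.write(f"{word}\n")

    # Read the file back while the temporary directory still exists
    round_trip = read_words(sample_path)

print(round_trip == sample)
```

Building paths with `pathlib.Path` also avoids hard-coding the Windows-style `\\` separator.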