Refactor and Cleanup #28
base: master
Conversation
Sourcery refactored master branch
I see that the majority of the changes here are mainly code cleanup. I feel that a few small changes could speed up the operations even more.
Before:

def remove_duplicates_from_file(infile_path, outfile_path="temp.000000000.bopscrk"):
    lines_seen = set()  # holds lines already seen
    outfile = open(outfile_path, "w")
    infile = open(infile_path, "r")
    for line in infile:
        if line not in lines_seen:  # not a duplicate
            outfile.write(line)
            lines_seen.add(line)
    outfile.close()

After:

def remove_duplicates_from_file(infile_path, outfile_path="temp.000000000.bopscrk"):
    lines_seen = set()  # holds lines already seen
    with open(outfile_path, "w") as outfile:
        infile = open(infile_path, "r")
        for line in infile:
            if line not in lines_seen:  # not a duplicate
                outfile.write(line)
                lines_seen.add(line)
I would recommend sorting the data read here from infile_path before executing the loop; this removes the need for a set(), reducing the memory footprint of this function. The duplicate check then becomes a simple equality comparison between two strings, touching only those memory locations, whereas a set has to probe its contents to verify that the value does not already exist (many loops, even if they are C-level loops!). I use sorted here, but if the file is already sorted that can be removed.
def remove_duplicates_from_file(infile_path, outfile_path="temp.000000000.bopscrk"):
    last_line = ''  # holds the previous line seen
    with open(outfile_path, "w") as outfile, open(infile_path, "r") as infile:
        for line in sorted(infile):
            if line != last_line:  # not a duplicate
                outfile.write(line)
                last_line = line
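A hypothetical call, just to show the intended usage (the file names here are made up):

remove_duplicates_from_file("wordlist.txt", "wordlist.unique.txt")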
Before:

def remove_by_lengths(wordlist, min_length, max_length):
    '''expect a list, return a new list with the values between min and max length provided'''
    new_wordlist = []
    for word in wordlist:
        #if (len(str(word)) < min_length) or (len(str(word)) > max_length): wordlist.remove(word)
        if (len(str(word)) >= min_length) and (len(str(word)) <= max_length): new_wordlist.append(str(word))
    return new_wordlist

After:

def remove_by_lengths(wordlist, min_length, max_length):
    '''expect a list, return a new list with the values between min and max length provided'''
    return [
        str(word)
        for word in wordlist
        if (len(str(word)) >= min_length) and (len(str(word)) <= max_length)
    ]
I would make use of the filter function in Python. It may make this slightly cleaner and can remove some additional calculations.
def length_filter(word, min_length, max_length):
    # compute the length once instead of calling len(str(word)) twice
    word_length = len(str(word))
    return min_length <= word_length <= max_length

def remove_by_lengths(wordlist, min_length, max_length):
    '''expect a list, return a new list with the values between min and max length provided'''
    # list() keeps the documented return type; in Python 3, filter returns a lazy iterator
    return list(filter(lambda word: length_filter(word, min_length, max_length), wordlist))
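For example, with a hypothetical word list (note that, unlike the list comprehension above, filter keeps the original items rather than converting them with str):

words = ['a', 'hello', 'passwords', 1234]
print(remove_by_lengths(words, 4, 8))  # ['hello', 1234]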
@@ -69,18 +64,15 @@ def case_transforms(word):
def leet_transforms(word):
    new_wordlist = []
Changing this to a set makes the list unique, so there is no need to remove the duplicates at the end; this applies to almost all the other generate and transform functions. A sketch of the idea is below.
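A minimal sketch of that idea applied to leet_transforms, assuming a simple one-to-one substitution table (hypothetical; the project's actual leet charset may differ):

LEET_CHARSET = {'a': '4', 'e': '3', 'i': '1', 'o': '0'}  # hypothetical substitutions

def leet_transforms(word):
    new_words = set()  # a set deduplicates as it goes, so no cleanup pass is needed
    for char, leet in LEET_CHARSET.items():
        candidate = word.replace(char, leet)
        if candidate != word:
            new_words.add(candidate)
    return new_words

print(leet_transforms('password'))  # {'p4ssword', 'passw0rd'} (order may vary)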