
Refactor and Cleanup #28

Open

wants to merge 2 commits into base: master
Conversation

LopeKinz

No description provided.

Sourcery AI and others added 2 commits May 23, 2023 07:45

@desterhuizen desterhuizen left a comment


I see that the majority of the changes here are mainly cleaning up the code. I feel that a few small additional changes could speed up the operations even more.

Comment on lines 47 to +54
Before:

def remove_duplicates_from_file(infile_path, outfile_path="temp.000000000.bopscrk"):
    lines_seen = set()  # holds lines already seen
    outfile = open(outfile_path, "w")
    infile = open(infile_path, "r")
    for line in infile:
        if line not in lines_seen:  # not a duplicate
            outfile.write(line)
            lines_seen.add(line)
    outfile.close()

After:

def remove_duplicates_from_file(infile_path, outfile_path="temp.000000000.bopscrk"):
    lines_seen = set()  # holds lines already seen
    with open(outfile_path, "w") as outfile:
        infile = open(infile_path, "r")
        for line in infile:
            if line not in lines_seen:  # not a duplicate
                outfile.write(line)
                lines_seen.add(line)
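For reference, the refactored body behaves like this self-contained sketch. The file paths below are throwaway temp files (not the project's defaults), and the input file is also wrapped in the context manager here, which the diff's new version still leaves open:

```python
import os
import tempfile

def remove_duplicates_from_file(infile_path, outfile_path):
    # Mirrors the refactored version above: a set tracks lines already
    # written, and the context manager closes both files on exit.
    lines_seen = set()
    with open(outfile_path, "w") as outfile, open(infile_path, "r") as infile:
        for line in infile:
            if line not in lines_seen:  # not a duplicate
                outfile.write(line)
                lines_seen.add(line)

# quick round-trip with throwaway files
workdir = tempfile.mkdtemp()
in_path = os.path.join(workdir, "in.txt")
out_path = os.path.join(workdir, "out.txt")
with open(in_path, "w") as f:
    f.write("a\nb\na\nb\nc\n")
remove_duplicates_from_file(in_path, out_path)
```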


I would recommend sorting the data read here from infile_path before executing the loop; this removes the need for a set(), reducing the memory footprint of this function. The duplicate check then becomes a comparison of two strings, touching only those memory locations, whereas a set has to verify against all the values it already holds that the line does not exist (many loops, even if they are C-level loops!).

I use sorted() here, but if the file is already sorted that call can be removed.

def remove_duplicates_from_file(infile_path, outfile_path="temp.000000000.bopscrk"):
    last_line = ''  # holds the previous line written
    with open(outfile_path, "w") as outfile, open(infile_path, "r") as infile:
        for line in sorted(infile):
            if line != last_line:  # not a duplicate
                outfile.write(line)
                last_line = line
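The trade-off can be seen in a small in-memory sketch (dedup_sorted is an illustrative name, not from the PR): sorting groups duplicates next to each other, so only the previous line has to be remembered.

```python
def dedup_sorted(lines):
    # After sorting, duplicates are adjacent, so remembering just the
    # previously written line is enough -- no set is held in memory.
    out, last = [], None
    for line in sorted(lines):
        if line != last:  # not a duplicate
            out.append(line)
            last = line
    return out

print(dedup_sorted(["b\n", "a\n", "b\n", "c\n", "a\n"]))
```

Note that the output order changes from first-seen to sorted, which is usually acceptable for a wordlist.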

Comment on lines 26 to 33
Before:

def remove_by_lengths(wordlist, min_length, max_length):
    '''expect a list, return a new list with the values between min and max length provided'''
    new_wordlist = []
    for word in wordlist:
        #if (len(str(word)) < min_length) or (len(str(word)) > max_length): wordlist.remove(word)
        if (len(str(word)) >= min_length) and (len(str(word)) <= max_length): new_wordlist.append(str(word))
    return new_wordlist

After:

def remove_by_lengths(wordlist, min_length, max_length):
    '''expect a list, return a new list with the values between min and max length provided'''
    return [
        str(word)
        for word in wordlist
        if (len(str(word)) >= min_length) and (len(str(word)) <= max_length)
    ]


I would make use of the filter function in Python; it may make it slightly cleaner and can remove some additional calculations.

def length_filter(word, min, max):
    word_length = len(str(word))
    return word_length >= min and word_length <= max

def remove_by_lengths(wordlist, min_length, max_length):
    '''expect a list, return a new list with the values between min and max length provided'''
    return filter(lambda word: length_filter(word, min_length, max_length), wordlist)
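One caveat with this suggestion, in case it matters to callers: in Python 3, filter() returns a lazy iterator rather than a list, so code that indexes the result, takes len(), or iterates it twice would break. A minimal sketch of a list-preserving variant (still the filter idea, just materialised; parameter names chosen to avoid shadowing the min/max builtins):

```python
def length_filter(word, min_length, max_length):
    # True when the stringified word's length is within [min_length, max_length]
    return min_length <= len(str(word)) <= max_length

def remove_by_lengths(wordlist, min_length, max_length):
    '''expect a list, return a new list with the values between min and max length provided'''
    # list() materialises filter's lazy iterator so the return type
    # matches the original list-returning implementation
    return list(filter(lambda w: length_filter(w, min_length, max_length), wordlist))
```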

@@ -69,18 +64,15 @@ def case_transforms(word):
def leet_transforms(word):
    new_wordlist = []

Changing this to a set makes the list unique, so there is no need to remove the duplicates at the end. This applies to almost all the other generate and transform functions.
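Sketching that suggestion on a hypothetical transform (the substitution table below is an illustrative subset, not bopscrk's actual leet rules):

```python
def leet_transforms(word):
    # Accumulating into a set de-duplicates results as they are
    # generated, so no separate duplicate-removal pass is needed.
    substitutions = {"a": "4", "e": "3", "o": "0"}  # illustrative subset
    results = set()
    for char, leet in substitutions.items():
        if char in word:
            results.add(word.replace(char, leet))
    return results
```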
