Tld parsers optimize access #84

Anvil · 2024-05-24T18:06:00Z

Hey there.

While looking at the code, I noticed a giant if/elif chain in parse_tld.py, which meant that reaching the "ve" TLD parser required testing every other branch first (linear complexity).

I've moved the parsers to their own files and replaced the nearly 150 lines of if/elif with a getattr() lookup, which should give constant-time access to any RegexFOO class.
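A minimal sketch of the getattr() dispatch idea, assuming parser classes follow a `Regex<TLD>` naming pattern. `RegexDefault`, the `tld_parsers` namespace and `get_parser_class` are hypothetical stand-ins, not the project's actual module or API:

```python
import types
from typing import Type


class RegexDefault:
    """Hypothetical fallback parser, used when no TLD-specific class exists."""


class RegexVE(RegexDefault):
    """Hypothetical parser for .ve records."""


class RegexUK(RegexDefault):
    """Hypothetical parser for .uk records."""


# Hypothetical stand-in for a module holding one class per TLD; real code would import it.
tld_parsers = types.SimpleNamespace(RegexVE=RegexVE, RegexUK=RegexUK)


def get_parser_class(tld: str) -> Type[RegexDefault]:
    # "ve" -> "RegexVE": a single attribute lookup (backed by a dict)
    # instead of testing an if/elif chain branch by branch.
    return getattr(tld_parsers, f"Regex{tld.upper()}", RegexDefault)


parser_cls = get_parser_class("ve")  # RegexVE
```

The third argument to getattr() supplies the fallback for unknown TLDs, which replaces the final `else` branch of the old chain.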

Also, I've made all RegexFOO classes follow the same interface, so there's no need for a specific __init__ in each class, which allowed removing another 200 lines. This should be easier to maintain.
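A rough sketch of the "one shared interface, no per-class `__init__`" pattern. `TLDParser`, `tld_specific_expressions` and `ExpressionDict` appear in this PR, but their definitions here are assumptions; `base_expressions`, `reg_expressions` and the regex patterns are hypothetical:

```python
from typing import Dict

ExpressionDict = Dict[str, str]  # assumed definition of the alias


class TLDParser:
    # Hypothetical patterns shared by all TLDs.
    base_expressions: ExpressionDict = {"domain_name": r"Domain Name: *(.+)"}
    # Each subclass overrides only this class attribute.
    tld_specific_expressions: ExpressionDict = {}

    def __init__(self) -> None:
        # One __init__ for every subclass: merge the class-level dicts into a
        # fresh per-instance dict (TLD-specific keys win on conflict).
        self.reg_expressions: ExpressionDict = {
            **self.base_expressions,
            **self.tld_specific_expressions,
        }


class RegexVE(TLDParser):
    # No __init__ needed; only the TLD-specific patterns differ.
    tld_specific_expressions: ExpressionDict = {"registrar": r"Registrar: *(.+)"}


parser = RegexVE()
```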

There are a few other simplifications in the branch to reduce memory consumption.

Hope this helps.


pogzyb

Wow yeah this is so much better!

The if/else block was originally a big dictionary, but that was even worse for memory usage. Can't believe I never thought to leverage getattr here.

I will try to get this rolled out in 1.1.3 sometime this weekend - hopefully I don't mess it up again lol. I will probably refactor the tests and maybe reorganize the "parse"-related files into a separate dir or something too.

Again, really appreciate your contributions to this library.

Anvil · 2024-05-25T18:40:43Z

Happy to help.

Do you want to keep Python 3.9 compat? If so, I'll restore the `Union`s I've dropped (`Union[t1, t2]` replaced by `t1 | t2`).

(Importing fewer names from `typing` speeds up load time:

* `Dict`, `List`, `Set`, etc., which are deprecated since 3.9, can be replaced by `dict`, `list`, `set`, and so on.
* `Union`, which is _not_ deprecated, can be replaced by the `|` syntax since 3.10.)
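A small illustration of the three spellings under discussion, assuming the goal is code that still runs on 3.9. The functions are hypothetical examples, not code from the library:

```python
import sys
from typing import Dict, List, Optional, Union


# Old style: generic aliases from typing (deprecated since 3.9, still accepted).
def tld_counts_old(domains: List[str]) -> Dict[str, int]:
    return tld_counts(domains)


# Builtin generics (PEP 585) already work at runtime on 3.9.
def tld_counts(domains: list[str]) -> dict[str, int]:
    counts: dict[str, int] = {}
    for domain in domains:
        tld = domain.rsplit(".", 1)[-1].lower()
        counts[tld] = counts.get(tld, 0) + 1
    return counts


# Unions still need typing.Union / typing.Optional to run on 3.9.
def to_label(value: Union[str, bytes]) -> Optional[str]:
    if isinstance(value, bytes):
        value = value.decode("ascii", errors="ignore")
    return value.strip().lower() or None


if sys.version_info >= (3, 10):
    # PEP 604 spelling. Python 3.9 evaluates annotations when the def runs,
    # so `str | bytes` would raise TypeError there without this guard.
    def to_label_310(value: str | bytes) -> str | None:
        return to_label(value)
```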

pogzyb · 2024-05-25T23:54:10Z

> Happy to help.
>
> Do you want to keep Python 3.9 compat? If so, I'll restore the `Union`s I've dropped (`Union[t1, t2]` replaced by `t1 | t2`).
>
> (Importing fewer names from `typing` speeds up load time:
>
> * `Dict`, `List`, `Set`, etc., which are deprecated since 3.9, can be replaced by `dict`, `list`, `set`, and so on.
> * `Union`, which is _not_ deprecated, can be replaced by the `|` syntax since 3.10.)

Yes thank you - Python 3.9 won't be EOL until October 2025, so I'd vote that we keep the old-style typing for the near future.

Other than that, some of the parser tests are failing:

* `RegexNU`: This is due to my comment above about `tld_specific_expressions` not being present in the specific parser classes for either `RegexPT` or `RegexNU`.
* `RegexGQ`: Even though this class inherits from `RegexTK`, it still declares its own empty expression dictionary, which shadows the parent's. I added a possible fix below.

```python
class RegexGQ(RegexTK):
    ...
    # tld_specific_expressions: ExpressionDict = {}  # comment out or remove, so parent class expressions are used
```
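Why removing that line fixes `RegexGQ`, shown as a self-contained sketch: a subclass that re-declares a class attribute hides the parent's value, even when the new value is empty. The class bodies and the regex pattern are hypothetical; only the class and attribute names come from this thread:

```python
from typing import Dict

ExpressionDict = Dict[str, str]  # assumed definition of the alias


class RegexTK:
    # Hypothetical pattern standing in for the real .tk expressions.
    tld_specific_expressions: ExpressionDict = {"registrar": r"Registrar: *(.+)"}


class RegexGQShadowed(RegexTK):
    # Re-declaring the attribute hides the parent's dict, even though it is empty.
    tld_specific_expressions: ExpressionDict = {}


class RegexGQ(RegexTK):
    # Nothing re-declared: attribute lookup falls through to RegexTK.
    pass
```

A bare annotation without `= {}` would not shadow either, since an annotation alone does not create a class attribute.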

After these changes, we should be good to merge!

Anvil added 4 commits May 23, 2024 22:58

use generator instead of list for any() call

484d9c6
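What the first commit changes, sketched under assumptions: passing a generator to any() lets it stop at the first match, while a list comprehension runs every test before any() looks at the results. The marker strings and helper names are hypothetical, not the library's:

```python
from typing import List

# Hypothetical "not found" markers; each registry uses its own phrasing.
NOT_FOUND_MARKERS = ["no match for", "not found", "no data found", "no entries found"]


def contains(blob: str, marker: str, log: List[str]) -> bool:
    log.append(marker)  # record each test so the difference is observable
    return marker in blob


def not_found_list(blob: str, log: List[str]) -> bool:
    # The list comprehension runs every test first, then any() scans the list.
    return any([contains(blob, m, log) for m in NOT_FOUND_MARKERS])


def not_found_gen(blob: str, log: List[str]) -> bool:
    # The generator yields one result at a time, so any() stops at the first True.
    return any(contains(blob, m, log) for m in NOT_FOUND_MARKERS)
```

With the blob `"no match for example.ve"`, the list version performs all four tests and the generator version performs one.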

refactor blob.lower() in DomainParser.parse to avoid repetitive same-result computations

d6a679a
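The idea behind the blob.lower() refactor, as a hypothetical loop rather than the actual DomainParser.parse body: calling `lower()` inside the loop builds a new copy of the whole text on every iteration, while hoisting it computes the same string once:

```python
import re
from typing import Dict, List


def parse_repeated(blob: str, patterns: Dict[str, str]) -> Dict[str, List[str]]:
    results: Dict[str, List[str]] = {}
    for key, pattern in patterns.items():
        # blob.lower() re-creates the same lower-cased copy on every iteration.
        results[key] = re.findall(pattern, blob.lower())
    return results


def parse_hoisted(blob: str, patterns: Dict[str, str]) -> Dict[str, List[str]]:
    # Lower-case once; every pattern then reuses the same string.
    lowered = blob.lower()
    return {key: re.findall(pattern, lowered) for key, pattern in patterns.items()}
```

Both functions return identical results; only the number of `lower()` calls differs.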

move TLDParser classes to their own file

e5300c6

remove redundant __init__ methods in TLDParser subclasses

1473fc1

pogzyb self-requested a review May 24, 2024 18:17

pogzyb approved these changes May 24, 2024


refactor some parse methods with inheritance

45b5720

Fix NU, PT, and GQ

16e4f9b

pogzyb merged commit b6bdd26 into pogzyb:main May 26, 2024
12 checks passed

pogzyb mentioned this pull request May 26, 2024

Release 1.1.3 #85

Merged

Anvil deleted the tld-parsers-optimize-access branch May 26, 2024 16:46
