Fixed concurrent access to cache file #146

Merged
merged 1 commit into john-kurkowski:master on Aug 6, 2020

Conversation

thomasperrot

An error occurs when tldextract is used with multithreading: some threads try to write the cache file while others try to read it. Putting a lock around the read/write operations fixes the issue.
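A minimal sketch of the idea described above, not the code from the merged commit: one `threading.Lock` guards both the read and the write of a JSON cache file, so no thread can read a half-written file. The names `_cache_lock`, `write_cache`, `read_cache`, and `demo` are hypothetical and exist only for this illustration.

```python
import json
import os
import tempfile
import threading

# Hypothetical module-level lock; the actual commit may structure this differently.
_cache_lock = threading.Lock()


def write_cache(path, data):
    """Serialize data to the cache file while holding the lock."""
    with _cache_lock:
        with open(path, "w", encoding="utf-8") as f:
            json.dump(data, f)


def read_cache(path):
    """Read the cache file under the same lock; return None if it is missing."""
    with _cache_lock:
        if not os.path.exists(path):
            return None
        with open(path, encoding="utf-8") as f:
            return json.load(f)


def demo():
    """Write and read one cache file from several threads; return any errors."""
    path = os.path.join(tempfile.mkdtemp(), "cache.json")
    errors = []

    def worker(i):
        try:
            write_cache(path, {"suffixes": ["com", "org"], "writer": i})
            cached = read_cache(path)
            assert cached["suffixes"] == ["com", "org"]
        except Exception as exc:  # collected so the caller can inspect failures
            errors.append(exc)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(8)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return errors
```

A `threading.Lock` only coordinates threads inside one process. It does nothing for separate processes that share the same cache file.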

@CalebFenton

This will fix my problem also. 👍

@brycedrennan
Collaborator

@CalebFenton @thomasperrot #144 might also address this issue.

In my case I needed the lock to work across multiple non-communicating but local processes. I think this method should work for threads as well.

@floer32 added the "hold: a PR is pending" label (a PR is pending that will address the issue, including if it is "X birds 1 stone") on Mar 4, 2019
@floer32
Collaborator

floer32 commented Mar 4, 2019

Marked as hold because #144 could solve this, and since #144 uses filelock, it could be more portable: https://filelock.readthedocs.io/en/latest/

@john-kurkowski
Owner

I thought that other PR was farther along, but it turned out to be contentious and a breaking change. We can merge this one now and easily revert if #144 accomplishes the same. Thank you!

@john-kurkowski john-kurkowski merged commit a6e7277 into john-kurkowski:master Aug 6, 2020
@Srivathsan-Srinivas

Srivathsan-Srinivas commented Nov 5, 2020

Is there a way to suppress the tldextract.lock.json messages? I get them on stdout when I use multiprocessing. I want to suppress only these messages, because I have other log messages that I need to see on stdout. Here are the messages I get:

2020-10-29 20:00:11,750 - filelock - filelock - INFO - Lock 4992537872 released on /Users/sri/p37/lib/python3.7/site-packages/tldextract/.suffix_cache/urls/62bf135d1c2f3d4db4228b9ecaf507a2.tldextract.json.lock
2020-10-29 20:00:11,765 - filelock - filelock - INFO - Lock 4992398992 released on /Users/sri/p37/lib/python3.7/site-packages/tldextract/.suffix_cache/publicsuffix.org-tlds/de84b5ca2167d4c83e38fb162f2e8738.tldextract.json.lock
2020-10-29 20:00:11,790 - filelock - filelock - INFO - Lock 4992506640 acquired on /Users/sri/p37/lib/python3.7/site-packages/tldextract/.suffix_cache/publicsuffix.org-tlds/de84b5ca2167d4c83e38fb162f2e8738.tldextract.json.lock
2020-10-29 20:00:11,796 - filelock - filelock - INFO - Lock 4992529872 acquired on /Users/sri/p37/lib/python3.7/site-packages/tldextract/.suffix_cache/urls/62bf135d1c2f3d4db4228b9ecaf507a2.tldextract.json.lock
2020-10-29 20:00:11,802 - filelock - filelock - INFO - Lock 4992529872 released on /Users/sri/p37/lib/python3.7/site-packages/tldextract/.suffix_cache/urls/62bf135d1c2f3d4db4228b9ecaf507a2.tldextract.json.lock

I tried contextlib, but I am still unable to suppress them:

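import contextlib
import logging
import os

import tldextract

logger = logging.getLogger(__name__)

def get_sld(self, reg_domain_name):
    """
    Get only the second-level domain.
    :param reg_domain_name: sports.cnn.com
    :return: cnn
    """
    domain = reg_domain_name
    # redirect_stdout needs a writable file object, not the os.devnull path string.
    # It only affects writes made through sys.stdout; logging handlers that
    # already hold a stream reference keep writing to that stream.
    with open(os.devnull, "w") as devnull, contextlib.redirect_stdout(devnull):
        if reg_domain_name:
            try:
                ext = tldextract.extract(reg_domain_name)
            except Exception:
                logger.info(f'Unable to extract domain from {reg_domain_name}. Using given fqdn.')
            else:
                domain = ext.domain

    return domain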

Any suggestions?

@john-kurkowski
Owner

You're not the only one. Check out tox-dev/filelock#59. It gives code showing how to manually restrict the log level for just that package.
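For illustration, restricting the level for one package could look like the sketch below. This is not copied from the linked issue. It assumes the messages come from a logger named `filelock`, which matches the logger name shown in the log lines quoted earlier in this thread. The helper names `quiet_filelock` and `demo`, and the `myapp` logger, are hypothetical.

```python
import io
import logging


def quiet_filelock(level=logging.WARNING):
    """Raise the threshold of the 'filelock' logger only (name is an assumption)."""
    logging.getLogger("filelock").setLevel(level)


def demo():
    """Show that INFO from 'filelock' is dropped while other loggers still emit."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    root = logging.getLogger()
    old_level = root.level
    root.addHandler(handler)
    root.setLevel(logging.INFO)
    try:
        quiet_filelock()
        logging.getLogger("filelock").info("Lock acquired")  # below WARNING, dropped
        logging.getLogger("myapp").info("still visible")     # other loggers unaffected
    finally:
        # Restore the root logger so the demo leaves no global side effects.
        root.removeHandler(handler)
        root.setLevel(old_level)
    return stream.getvalue()
```

Because the level is set on the named logger instead of the root logger or a handler, every other INFO message keeps flowing to stdout as before.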

@Srivathsan-Srinivas

Srivathsan-Srinivas commented Nov 6, 2020 via email
