Allow cache_file argument to also accept list #204

Closed
hibare opened this issue Jul 23, 2020 · 9 comments

Comments

@hibare

hibare commented Jul 23, 2020

Problem: We have an application that performs TLD operations using Celery workers. Because we run several Celery workers, whenever a cache_file update is triggered, it only updates the file on the worker instance that picked up that task.
As a result, the cache contents differ across the Celery instances.

If tldextract could accept a list as the cache_file argument, that list could be stored in Redis and any worker could pick it up easily.

@john-kurkowski
Owner

Relatedly, #144 changes the cache file name to a directory name.

@john-kurkowski
Owner

Can you say more about why the list fixes your issue?

Is this a worker ops issue? Does this pseudocode work?

# Instead of …
all_celery_workers.enqueue(`tldextract --update`)

# Do this …
for worker in all_celery_workers:
  worker.exec(`tldextract --update`)

@hibare
Author

hibare commented Nov 6, 2020

This will not be efficient if a worker disconnects temporarily or a new worker is added in the interval between updates.

@john-kurkowski
Owner

Ok, can you say more about why the list fixes your issue? What would it look like?

@hibare
Author

hibare commented Nov 8, 2020

Assume there are 5 Celery workers and we update the list independently of this module. This provides the flexibility to cache the latest PSL data in a store such as Redis. We can then pass the list from the cache to the module without worrying about whether a given worker has the update.

@john-kurkowski
Owner

To be clear, what would it look like?

@john-kurkowski
Owner

In the meantime, suffix_list_urls can be a local file. You could dump raw PSL text content from your Redis into a tempfile. Pass that tempfile path in suffix_list_urls.
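
A minimal sketch of the tempfile workaround described above, assuming tldextract 3.x (where the cache argument is `cache_dir`; the 2.x releases current when this issue was opened used `cache_file`). The helper names `write_psl_to_tempfile` and `build_extractor` are hypothetical, and fetching the raw PSL text from Redis is left to the caller.

```python
import tempfile
from pathlib import Path


def write_psl_to_tempfile(psl_text: str) -> str:
    """Write raw PSL text to a temp file and return its file:// URL."""
    # delete=False keeps the file on disk after the handle is closed,
    # so tldextract can read it later.
    with tempfile.NamedTemporaryFile(
        "w", suffix=".dat", delete=False, encoding="utf-8"
    ) as fh:
        fh.write(psl_text)
    return Path(fh.name).as_uri()


def build_extractor(psl_text: str):
    """Build an extractor that reads only the given PSL text (hypothetical helper)."""
    # Third-party import kept local so the helper above works without it.
    import tldextract

    return tldextract.TLDExtract(
        suffix_list_urls=[write_psl_to_tempfile(psl_text)],
        cache_dir=None,  # assumption: skip the on-disk cache so each worker parses the fresh file
        fallback_to_snapshot=False,  # do not silently fall back to the bundled snapshot
    )
```

A worker could call `build_extractor(psl_text)` with text it read from Redis under a key of your choosing, then use the returned object like the default extractor. The temp file is not removed automatically, so cleanup is up to the caller.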

@hibare
Author

hibare commented Nov 16, 2020

Yeah, this is my current method.

@john-kurkowski
Owner

Closing due to lack of response on why the list fixes the issue and what it would look like.

@john-kurkowski closed this as not planned on Dec 17, 2022