Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use ThreadPool instead of usual Pool #34

Open
VolodyaCO opened this issue Mar 26, 2021 · 0 comments
Open

Use ThreadPool instead of usual Pool #34

VolodyaCO opened this issue Mar 26, 2021 · 0 comments

Comments

@VolodyaCO
Copy link

I tried p_tqdm to do multiprocessing within a function. This works extremely slowly:

import spacy
from pathos.pools import ThreadPool as Pool
import time
from p_tqdm import p_map

# Install with python -m spacy download es_core_news_sm
nlp = spacy.load("es_core_news_sm")

def preworker(text, nlp):
    return [w.lemma_ for w in nlp(text)]

worker = lambda text: preworker(text, nlp)

texts = ["Este es un texto muy interesante en español"] * 1000

st = time.time()
pool = Pool(3)
r = pool.map(worker, texts)
print(f"Usual pool took {time.time()-st:.3f} seconds")

def out_worker(texts, nlp):
    worker = lambda text: preworker(text, nlp)
    pool = Pool(3)
    return pool.map(worker, texts)

st = time.time()
r = out_worker(texts, nlp)
print(f"Pool within a function took {time.time()-st:.3f} seconds")

def out_worker_tqdm(texts, nlp):
    worker = lambda text: preworker(text, nlp)
    return p_map(worker, texts)

st = time.time()
r = out_worker_tqdm(texts, nlp)
print(f"p_tqdm within a function took {time.time()-st:.3f} seconds")

def out_worker2(texts, nlp, pool):     
    worker = lambda text: preworker(text, nlp)     
    return pool.map(worker, texts)

st = time.time()
pool = Pool(3) 
r = out_worker2(texts, nlp, pool)
print(f"Pool passed to a function took {time.time()-st:.3f} seconds")

The output is

Usual pool took 0.052 seconds
Pool within a function took 0.062 seconds
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:08<00:00,  1.23it/s]
p_tqdm within a function took 8.341 seconds
Pool passed to a function took 0.055 seconds

I got the tip of using threadpool instead of the ususal pool (I guess p_tqdm uses the usual pool underneath, but I haven't checked) from pathos author here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant