Closed
Description
Hi,
I want to do entity linking on a large subset of the c4/en dataset. With my current settings, I can extract entities for roughly 1,000 rows/hr on CPU and about 5,000 rows/hr on GPU. Is there any way, whether batching, multiprocessing, or anything else you would suggest, to speed up the process given the size of the c4 dataset? Any advice would be highly appreciated. I would also like to know whether the data is currently held in memory. Thanks!
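
To make the question concrete, here is a minimal sketch of the batching plus multiprocessing pattern being asked about. It is not this library's API: `link_entities` is a hypothetical placeholder that only picks out capitalized tokens so the example runs without a model, and `batched`, `link_stream`, `batch_size`, and `workers` are illustrative names. A real setup would also need each worker to load its own model, which is not shown here.

```python
from itertools import islice
from multiprocessing import Pool


def batched(rows, batch_size):
    """Yield successive lists of at most batch_size rows from any iterable."""
    it = iter(rows)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def link_entities(batch):
    """Hypothetical stand-in for the real entity linker (assumption).

    Returns the capitalized tokens of each text; replace the body with the
    actual batched inference call.
    """
    return [[tok for tok in text.split() if tok[:1].isupper()] for text in batch]


def link_stream(rows, batch_size=64, workers=4):
    """Split rows into batches and fan the batches out to worker processes."""
    with Pool(processes=workers) as pool:
        # imap returns results in the same order as the input batches.
        for result in pool.imap(link_entities, batched(rows, batch_size)):
            yield from result


if __name__ == "__main__":
    texts = ["Paris is in France", "no entities here", "Ada Lovelace wrote notes"]
    for text, entities in zip(texts, link_stream(texts, batch_size=2, workers=2)):
        print(text, "->", entities)
```

Process-level parallelism like this mainly helps CPU inference. On a single GPU, a larger `batch_size` inside one process is usually the first thing to try, since several processes would compete for the same device.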