I think this is an interesting idea. Performance of medspacy has been a concern for us but I don't think any of our processes have been able to really stress test the performance. The "extreme example" in the intro is the team I work on and we expect our completed systems to be able to process millions of records per day.
We also work closely with the authors of this paper and they have contributed to other components of medspacy. We can easily ask them if they have any thoughts for speeding up the code inside each component.
I do have a few initial thoughts on it, though:
One of our goals with medspacy was to leverage the optimizations and work that others were doing (largely in Python for the broader NLP/ML community). Implementing FastContext's specialized trie would require fully abandoning the spacy matchers, so the component may need to be entirely custom.
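For reference, the core idea in FastContext is a trie over rule phrases, so every context rule is matched in a single left-to-right pass over the tokens instead of running each rule separately. A minimal sketch of that data structure (illustrative only; `PhraseTrie` and the rule labels here are not medspacy or FastContext APIs):

```python
class TrieNode:
    __slots__ = ("children", "rule")

    def __init__(self):
        self.children = {}  # token -> TrieNode
        self.rule = None    # set on the node that ends a complete phrase


class PhraseTrie:
    """Match multi-token context phrases in one pass over a token list."""

    def __init__(self):
        self.root = TrieNode()

    def add(self, phrase, rule):
        node = self.root
        for token in phrase.lower().split():
            node = node.children.setdefault(token, TrieNode())
        node.rule = rule

    def find_matches(self, tokens):
        tokens = [t.lower() for t in tokens]
        matches = []
        for start in range(len(tokens)):
            node = self.root
            for end in range(start, len(tokens)):
                node = node.children.get(tokens[end])
                if node is None:
                    break  # no rule phrase continues with this token
                if node.rule is not None:
                    matches.append((start, end + 1, node.rule))
        return matches


trie = PhraseTrie()
trie.add("no evidence of", "NEGATED_EXISTENCE")
trie.add("history of", "HISTORICAL")

matches = trie.find_matches("no evidence of pneumonia".split())
# matches == [(0, 3, "NEGATED_EXISTENCE")]
```

The inner loop walks down the trie and dies as soon as no rule phrase continues, so runtime depends on how far phrases overlap rather than on the total number of rules — which is where the claimed speedup over per-rule matching comes from.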
Medspacy components are not currently compatible with spacy's built-in multiprocessing, we believe due to some missing or incorrect serialization methods. Fixing that is most likely a much quicker path to significant performance gains than reimplementing the context algorithm. Parallelization is not a substitute for FastContext, as the paper notes, but it may be a faster fix for throughput than changing the component significantly.
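To illustrate the serialization issue in the abstract (this is a toy example, not medspacy's actual code): spacy's `n_process` workers receive the pipeline by pickling it, so a component holding any unpicklable attribute, such as a lambda, breaks multiprocessing, while the same logic in a module-level function transfers fine.

```python
import pickle


class BadComponent:
    """Hypothetical component that cannot be sent to worker processes."""

    def __init__(self):
        # Lambdas are not picklable, so this attribute breaks serialization.
        self.matcher = lambda text: "evidence" in text


def _match(text):
    return "evidence" in text


class GoodComponent:
    """Same logic, but the callable is a module-level function and pickles."""

    def __init__(self):
        self.matcher = _match


def is_picklable(obj):
    try:
        pickle.dumps(obj)
        return True
    except Exception:
        return False
```

Auditing each medspacy component for attributes like the first case (or adding the appropriate serialization hooks) would be the prerequisite for `nlp.pipe(texts, n_process=...)` to work.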
We created common internal structures for medspacy's context, sectionizer, and target matcher in the last major release. If this change were made, it could be done inside that framework and benefit all three components. That would be more involved than simply implementing FastContext, but it is still an option worth weighing if we eventually make the change.
The paper (https://www.sciencedirect.com/science/article/pii/S1532046418301576) makes a pretty strong claim that their implementation of ConText is two orders of magnitude faster (which I doubt is achievable in spacy) and also more accurate. I think it would be worth trying to match that performance:
https://github.com/jianlins/FastContext