Improving Context to follow FastContext #70

MichalMalyska · 2021-05-25T22:51:09Z

The paper at https://www.sciencedirect.com/science/article/pii/S1532046418301576
makes a fairly strong claim: that their implementation of ConText is two orders of magnitude faster (which I doubt is achievable in spacy) and also more accurate. I think it would be worth trying to match that performance:
https://github.com/jianlins/FastContext

turbosheep · 2021-05-26T00:28:32Z

I think this is an interesting idea. Performance of medspacy has been a concern for us, but I don't think any of our processes have really been able to stress test it. The "extreme example" in the intro refers to the team I work on, and we expect our completed systems to be able to process millions of records per day.

We also work closely with the authors of this paper and they have contributed to other components of medspacy. We can easily ask them if they have any thoughts for speeding up the code inside each component.

I do have a few initial thoughts on it, though:

- One of our goals with medspacy was to be able to leverage the optimizations and work that others were doing (which was largely done in Python for the broader NLP/ML community). Implementing the specialized trie for FastContext would require fully abandoning the spacy matchers and may need to be entirely custom.
- Medspacy components are not currently compatible with spacy's built-in multiprocessing, due to (we believe) some missing or incorrect serialization methods. Fixing that is most likely a much quicker path to significant performance increases than reimplementing the current context algorithm. Parallelization is not a substitute for FastContext, as mentioned in the paper, but it may be a quicker fix for performance than changing the component significantly.
- We created common internal structures for medspacy's context, sectionizer and target matcher in the last major release. If this change were made, it could possibly be done inside this framework and benefit all three components. This would be more involved than simply implementing FastContext, but it is still a point in favor of eventually making the change.
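To make the trie idea above a bit more concrete, here is a minimal sketch of a token-level trie that finds the longest modifier phrase starting at each position in a single left-to-right pass. This is not the FastContext implementation and not medspacy code: the function names `build_trie` and `find_matches`, the nested-dict layout, and the example phrases and category labels are all assumptions made up for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical end-of-phrase marker. None can never collide with a token,
# because tokens are always strings.
_END = None


def build_trie(phrases: Dict[str, str]) -> dict:
    """Build a nested-dict trie keyed on lowercased, whitespace-split tokens."""
    root: dict = {}
    for phrase, category in phrases.items():
        node = root
        for token in phrase.lower().split():
            node = node.setdefault(token, {})
        node[_END] = category  # a phrase ends at this node
    return root


def find_matches(tokens: List[str], trie: dict) -> List[Tuple[int, int, str]]:
    """Return (start, end, category) spans, longest match first, no overlaps."""
    matches = []
    i = 0
    while i < len(tokens):
        node = trie
        longest = None
        j = i
        # Walk down the trie for as long as the next token continues a phrase.
        while j < len(tokens) and tokens[j].lower() in node:
            node = node[tokens[j].lower()]
            j += 1
            if _END in node:
                longest = (i, j, node[_END])
        if longest is not None:
            matches.append(longest)
            i = longest[1]  # resume after the matched span
        else:
            i += 1
    return matches


# Illustrative rules only; a real rule set would be much larger.
rules = {
    "no": "NEGATED_EXISTENCE",
    "no evidence of": "NEGATED_EXISTENCE",
    "history of": "HISTORICAL",
}
trie = build_trie(rules)
tokens = "There is no evidence of pneumonia".split()
print(find_matches(tokens, trie))
```

Each token is looked up against a small dict rather than tested against every rule, which is the general reason a trie scales well with the number of rules. It also shows the trade-off mentioned above: this structure works on plain token lists and bypasses the spacy matchers entirely.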

turbosheep added the enhancement label May 26, 2021

MichalMalyska closed this as not planned Aug 23, 2023
