Optimize pos-stats #17

Closed
HLasse opened this issue Sep 22, 2021 · 4 comments
Labels
enhancement New feature or request

Comments

@HLasse
Owner

HLasse commented Sep 22, 2021

Calculating POS stats seems to slow things down significantly. TODO:

  • Profile the package, what causes the slowdown?
  • Calculate the sum of values once and reuse it in the dict comprehension in PosStatistics
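
A minimal sketch of the second TODO item, assuming the proportions are built from a `Counter` in a dict comprehension. The function names and the `pos_prop_` key prefix are hypothetical and not taken from the actual `PosStatistics` code:

```python
from collections import Counter


def pos_proportions_slow(pos_tags):
    """Hypothetical 'before' version: the total is recomputed for every key."""
    counts = Counter(pos_tags)
    return {
        f"pos_prop_{tag}": n / sum(counts.values())  # summed once per tag
        for tag, n in counts.items()
    }


def pos_proportions(pos_tags):
    """Hypothetical 'after' version: the total is computed once and reused."""
    counts = Counter(pos_tags)
    total = sum(counts.values())  # hoisted out of the comprehension
    return {f"pos_prop_{tag}": n / total for tag, n in counts.items()}


example = pos_proportions(["NOUN", "VERB", "NOUN", "DET"])
```

Both versions return the same dictionary. The hoisted sum only saves work proportional to the number of distinct tags, so whether it matters in practice is something the profiling should show.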

Ideas for speedup:

  • Identify which pos_tags the model can produce and predefine the counter/dictionary with those keys (would also solve the issue of different numbers of keys across docs/sentences)
  • Alternatives to Counter?
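
The predefined-keys idea could look roughly like the sketch below. It assumes the tag set is the 17 Universal Dependencies UPOS tags, hard-coded here rather than read from the model, and it adds a hypothetical `OTHER` bucket for anything outside that set. `UPOS_TAGS` and `pos_counts_fixed` are illustrative names only:

```python
# Assumption: the model emits Universal Dependencies UPOS tags.
UPOS_TAGS = (
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
)


def pos_counts_fixed(pos_tags, tagset=UPOS_TAGS):
    """Count tags into a dict whose keys are the same for every input."""
    counts = dict.fromkeys(tagset, 0)
    counts["OTHER"] = 0  # hypothetical bucket for tags outside the tagset
    for tag in pos_tags:
        key = tag if tag in counts else "OTHER"
        counts[key] += 1
    return counts


doc_a = pos_counts_fixed(["NOUN", "VERB", "NOUN"])
doc_b = pos_counts_fixed(["SPACE"])
```

Because every call starts from the same keys, `doc_a` and `doc_b` have identical key sets in identical order, which addresses the differing number of keys across docs/sentences.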

Other options:

  • Remove posstats from default TextDescriptives and make it an optional component that takes in which specific POS tags the user is interested in and extracts those (+ 'others')
@HLasse HLasse added the enhancement New feature or request label Sep 22, 2021
@HLasse
Owner Author

HLasse commented Sep 22, 2021

A job for you, @martbern ?

@MartinBernstorff
Contributor

Very well might be! Have quite a few open loops atm – what's the priority on this?

@HLasse
Owner Author

HLasse commented Sep 22, 2021

Profiling has fairly high priority. If pos-stats really slows things down as much as I fear, it should be removed from the default TextDescriptives pipe pretty soon. Optimizing it is not a huge priority atm.

@MartinBernstorff
Contributor

Okay, I've been doing some digging! I've been unable to find any single component whose removal dramatically improves runtime.

Attached is my profiling script and its profile from cProfile: https://www.dropbox.com/sh/s60k7894ufva5vf/AADSpcN5OaDkg5rFDIPAAdLwa?dl=0

I recommend snakeviz for visualisation 👍

Let me know if there's anything I can add!
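
For anyone who wants to reproduce this kind of profile, here is a minimal sketch using the standard-library `cProfile` and `pstats` modules. It is not the linked script. `workload` is a hypothetical stand-in for running the TextDescriptives pipeline on some texts:

```python
import cProfile
import io
import pstats
from collections import Counter


def workload():
    """Hypothetical stand-in for the code under test (e.g. the pipeline)."""
    tags = ["NOUN", "VERB", "DET", "ADJ"] * 25_000
    return Counter(tags)


profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()

# Collect the ten entries with the highest cumulative time into a string.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(10)
report = stream.getvalue()
print(report)

# To inspect the profile in snakeviz, write it to disk first:
# profiler.dump_stats("pos_stats.prof")  # then run: snakeviz pos_stats.prof
```

Sorting by cumulative time puts the functions that account for the most total time, including their callees, at the top. That is usually the quickest way to see which component dominates.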

@HLasse HLasse closed this as completed Oct 28, 2021