Make markovify.Text accept a file-like object or generator to reduce memory footprint when using large files? #54

Oliver2213 · 2017-01-21T02:23:33Z

I've written a script to recursively find all files with a given extension, generate a chain for each, and (once all files have an associated chain), combine them into one mega-chain and store it.
I'm running this on a very large directory (~1.4 GB), and while coding my script I was aware that holding all of that in RAM (as markovify.Text only accepts strings) would probably be an issue.
I was correct; not two seconds after I ran it, the process was killed.
Is there a way to modify .Text and .NewlineText so they can accept (and, of course, properly process) a generator or file-like object to iterate over?
I have no problem implementing this myself and filing a pull request; I'm just unsure how to deal with sentence splitting along chunks.
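
For illustration, the file discovery and chunked reading described above could look roughly like the following. This is a minimal sketch using only the standard library, not the script from this issue; the helper names `iter_files` and `iter_chunks` and the default chunk size are assumptions made for the example.

```python
import os


def iter_files(root, extension):
    """Yield paths of files under `root` whose names end with `extension`."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if name.endswith(extension):
                yield os.path.join(dirpath, name)


def iter_chunks(path, chunk_size=64 * 1024, encoding="utf-8"):
    """Yield successive text chunks of at most `chunk_size` characters.

    Only one chunk is held in memory at a time, so the whole file
    never has to be loaded as a single string.
    """
    with open(path, encoding=encoding) as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:  # empty string means end of file
                break
            yield chunk
```

Chunks produced this way can end in the middle of a sentence, which is exactly the boundary problem raised in the last line of the comment.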

danthedaniel · 2017-02-24T21:14:06Z

> I'm just unsure how to deal with sentence splitting along chunks.

I'd recommend using a generator internally for this: one that runs over an iterable (a generator, a list, or something else) and only yields a new sentence when it encounters a "!", "?", or "." character.

That way you rely on Python to maintain the state for you between chunks, rather than tracking it by hand.
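
A minimal sketch of that idea follows, assuming the input is any iterable of text chunks. The name `iter_sentences` and the splitting rule (terminal punctuation followed by whitespace) are assumptions for illustration; this is much simpler than markovify's own sentence splitter and will, for example, also split after abbreviations such as "Mr.".

```python
import re

# Assumed rule: a sentence ends at one or more of . ! ? followed by whitespace.
_SENTENCE_END = re.compile(r"[.!?]+\s+")


def iter_sentences(chunks):
    """Yield complete sentences from an iterable of text chunks.

    Text after the last sentence boundary is carried over in `buffer`
    and joined with the next chunk, so a sentence split across two
    chunks is still yielded whole.
    """
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        start = 0
        for match in _SENTENCE_END.finditer(buffer):
            sentence = buffer[start:match.end()].strip()
            if sentence:
                yield sentence
            start = match.end()
        # Keep the unfinished remainder for the next chunk.
        buffer = buffer[start:]
    # Whatever is left at the end is treated as a final sentence.
    tail = buffer.strip()
    if tail:
        yield tail


chunks = ["Hello wor", "ld. How are", " you? Fine", "! Trailing text"]
print(list(iter_sentences(chunks)))
# ['Hello world.', 'How are you?', 'Fine!', 'Trailing text']
```

Because the generator keeps its buffer between iterations, the caller can feed it a list, an open file, or another generator without managing any carry-over state itself.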

ghost · 2017-02-27T08:46:40Z

You could also try PyPy for some speedups.

jsvine · 2017-09-02T04:08:11Z

Thanks for suggesting this. It's an improvement I'd been meaning to make. Now available in v0.6.1. Fetch the latest with pip install -U markovify, and see the instructions here: https://github.com/jsvine/markovify#generating-markovifytext-models-from-very-large-corpora

Does that improve performance for you?

jsvine added the enhancement label Mar 31, 2017

jsvine modified the milestone: v0.7.0 Mar 31, 2017

jsvine mentioned this issue Sep 2, 2017

Accept file-like objects, and to discard original #75

Merged

jsvine closed this as completed Sep 29, 2017
