Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add update method to Text #82

Closed
wants to merge 1 commit into from
Closed

Conversation

thallada
Copy link
Contributor

@thallada thallada commented Oct 7, 2017

Sorry to overwhelm you with yet another PR. I'm playing around with trying to make markovify models out of huge corpus and I keep finding ways to eke out a little more saved memory.

This is mainly useful for incrementally building one Text from separate chunks of a corpus too large to store in memory all at once. This is slightly better than using combine() which requires you to have the overhead of at least two models in memory at once.

Chain now accepts base_model which the new Text update method uses.

I also tweaked how Text/Chain behave when you initialize them with None or '' so that you can initialize an empty Text without it throwing an exception and then later call update() on it.

@coveralls
Copy link

coveralls commented Oct 7, 2017

Coverage Status

Coverage remained the same at 100.0% when pulling 696e544 on thallada:model-update into f08065c on jsvine:master.

@coveralls
Copy link

coveralls commented Oct 8, 2017

Coverage Status

Coverage remained the same at 100.0% when pulling 00c51bf on thallada:model-update into f08065c on jsvine:master.

@jsvine
Copy link
Owner

jsvine commented Nov 12, 2017

Thanks for suggesting this, @thallada! The goal makes sense. One question, though: What's the use case in self.build for a base_model that isn't self.model?

Put another way: What would you think about converting def build(self, corpus, state_size, base_model=None) to def build(self, corpus, state_size, overwrite=False)? If you wanted to add to the pre-existing model, you'd pass overwrite=True.

But perhaps I'm glossing over something important. What do you think?

Useful for incrementally building one Text from separate chunks of a corpus too
large to store in memory all at once (example added to README).

Chain's build method now accepts `overwrite` which the new update method uses.
@coveralls
Copy link

coveralls commented Jan 11, 2018

Coverage Status

Coverage remained the same at 100.0% when pulling 806348c on thallada:model-update into c41af53 on jsvine:master.

@thallada
Copy link
Contributor Author

I finally got around to addressing your comment. I think you are right that there is really no use for base_model and that it could be simplified with just an overwrite boolean parameter.

I made those changes and rebased with master. But, the 2.7 tests are failing in Travis and throws an error on showing me the log... I ran the tests on 2.7 on my machine and they all pass so I'm not sure what's up with that.

aurieh added a commit to aurieh/markovify that referenced this pull request Jun 17, 2020
@thallada
Copy link
Contributor Author

thallada commented Nov 9, 2022

After ~4 years I've forgotten most of the context for this PR and it now has conflicts. I'm no longer working with markovify so I don't think I have the time/motivation to clean this up and test it again. Let me know if this is useful to anyone else and maybe I can help implement it again.

@thallada thallada closed this Nov 9, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants