Merge c3853b5 into 6968649
kaaanishk committed May 11, 2019
2 parents 6968649 + c3853b5 commit 055e971
Showing 1 changed file with 10 additions and 10 deletions.
20 changes: 10 additions & 10 deletions README.md
@@ -3,7 +3,7 @@

# Markovify

-Markovify is a simple, extensible Markov chain generator. Right now, its main use is for building Markov models of large corpora of text, and generating random sentences from that. But, in theory, it could be used for [other applications](http://en.wikipedia.org/wiki/Markov_chain#Applications).
+Markovify is a simple, extensible Markov chain generator. Right now, its primary use is for building Markov models of large corpora of text and generating random sentences from that. However, in theory, it could be used for [other applications](http://en.wikipedia.org/wiki/Markov_chain#Applications).

- [Why Markovify?](#why-markovify)
- [Installation](#installation)
@@ -16,7 +16,7 @@ Markovify is a simple, extensible Markov chain generator. Right now, its main us

Some reasons:

-- Simplicity. "Batteries included," but it's easy to override key methods.
+- Simplicity. "Batteries included," but it is easy to override key methods.

- Models can be stored as JSON, allowing you to cache your results and save them for later.

@@ -56,15 +56,15 @@ for i in range(3):

Notes:

-- The usage examples here assume you're trying to markovify text. If you'd like to use the underlying `markovify.Chain` class, which is not text-specific, check out [the (annotated) source code](markovify/chain.py).
+- The usage examples here assume you are trying to markovify text. If you would like to use the underlying `markovify.Chain` class, which is not text-specific, check out [the (annotated) source code](markovify/chain.py).

-- Markovify works best with large, well-punctuated texts. If your text doesn't use `.`s to delineate sentences, put each sentence on a newline, and use the `markovify.NewlineText` class instead of `markovify.Text` class.
+- Markovify works best with large, well-punctuated texts. If your text does not use `.`s to delineate sentences, put each sentence on a newline, and use the `markovify.NewlineText` class instead of `markovify.Text` class.

-- If you've accidentally read your text as one long sentence, markovify will be unable to generate new sentences from it due to a lack of beginning and ending delimiters. This can happen if you've read a newline delimited file using the `markovify.Text` command instead of `markovify.NewlineText`. To check this, the command `[key for key in txt.chain.model.keys() if "___BEGIN__" in key]` command will return all of the possible sentence starting words, and should return more than one result.
+- If you have accidentally read the input text as one long sentence, markovify will be unable to generate new sentences from it due to a lack of beginning and ending delimiters. This issue can occur if you have read a newline delimited file using the `markovify.Text` command instead of `markovify.NewlineText`. To check this, the command `[key for key in txt.chain.model.keys() if "___BEGIN__" in key]` command will return all of the possible sentence-starting words and should return more than one result.

-- By default, the `make_sentence` method tries, a maximum of 10 times per invocation, to make a sentence that doesn't overlap too much with the original text. If it is successful, the method returns the sentence as a string. If not, it returns `None`. To increase or decrease the number of attempts, use the `tries` keyword argument, e.g., call `.make_sentence(tries=100)`.
+- By default, the `make_sentence` method tries a maximum of 10 times per invocation, to make a sentence that does not overlap too much with the original text. If it is successful, the method returns the sentence as a string. If not, it returns `None`. To increase or decrease the number of attempts, use the `tries` keyword argument, e.g., call `.make_sentence(tries=100)`.

-- By default, `markovify.Text` tries to generate sentences that don't simply regurgitate chunks of the original text. The default rule is to suppress any generated sentences that exactly overlaps the original text by 15 words or 70% of the sentence's word count. You can change this rule by passing `max_overlap_ratio` and/or `max_overlap_total` to the `make_sentence` method. Alternatively you can disable this check entirely by passing `test_output` as False.
+- By default, `markovify.Text` tries to generate sentences that do not simply regurgitate chunks of the original text. The default rule is to suppress any generated sentences that exactly overlaps the original text by 15 words or 70% of the sentence's word count. You can change this rule by passing `max_overlap_ratio` and/or `max_overlap_total` to the `make_sentence` method. Alternatively, this check can be disabled entirely by passing `test_output` as False.
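
To make the overlap rule in the last note concrete, here is a minimal pure-Python sketch of one way such a check could work. It is an illustration only and is not presented as markovify's own implementation: the function name `passes_overlap_check` and the whitespace-based word splitting are assumptions made for this example, while the parameter names and defaults (`max_overlap_ratio=0.7`, `max_overlap_total=15`) mirror the values quoted in the note.

```python
def passes_overlap_check(sentence, corpus,
                         max_overlap_ratio=0.7, max_overlap_total=15):
    """Return True if `sentence` does not copy too long a run of words
    from `corpus`. Hypothetical helper, written for illustration."""
    words = sentence.split()
    # Longest run of consecutive words that may match the corpus exactly:
    # the smaller of the absolute cap and the ratio-based cap.
    overlap_max = min(max_overlap_total, round(max_overlap_ratio * len(words)))
    window = overlap_max + 1
    # Normalise whitespace and pad with spaces so that only whole-word
    # runs can match.
    corpus_text = " " + " ".join(corpus.split()) + " "
    for start in range(max(len(words) - overlap_max, 1)):
        gram = " " + " ".join(words[start:start + window]) + " "
        if gram in corpus_text:
            return False  # a run longer than the allowed overlap was copied
    return True


if __name__ == "__main__":
    corpus = ("the quick brown fox jumps over the lazy dog "
              "near the quiet river bank")
    print(passes_overlap_check("the quick brown fox jumps over the lazy dog", corpus))
    print(passes_overlap_check("the lazy fox jumps near the quick river dog", corpus))
```

A sentence generator built around a check like this would simply retry until the check passes or the allowed number of attempts (the `tries` argument described above) runs out.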

## Advanced Usage

@@ -92,12 +92,12 @@ model_b = markovify.Text(text_b)
model_combo = markovify.combine([ model_a, model_b ], [ 1.5, 1 ])
```

-... would combine `model_a` and `model_b`, but place 50% more weight on the connections from `model_a`.
+This code snippet would combine `model_a` and `model_b`, but, it would also place 50% more weight on the connections from `model_a`.
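
As a rough picture of what "50% more weight" can mean, the sketch below merges two toy transition tables by scaling each model's counts by its weight before summing them. This is an assumption-laden illustration, not markovify's actual code: `combine_counts` and the dictionary layout (state tuple mapped to follower counts) are hypothetical names chosen for this example.

```python
from collections import defaultdict


def combine_counts(models, weights):
    """Merge transition-count tables, scaling each by its weight.

    Each model is a dict mapping a state (a tuple of words) to a dict of
    {next_word: count}. Hypothetical helper, written for illustration.
    """
    combined = defaultdict(dict)
    for model, weight in zip(models, weights):
        for state, followers in model.items():
            for word, count in followers.items():
                # Weighted counts from every model accumulate per transition.
                previous = combined[state].get(word, 0)
                combined[state][word] = previous + count * weight
    return dict(combined)


if __name__ == "__main__":
    toy_a = {("the",): {"cat": 2, "dog": 1}}
    toy_b = {("the",): {"cat": 2, "bird": 4}}
    print(combine_counts([toy_a, toy_b], [1.5, 1]))
```

With weights `[1.5, 1]`, every transition seen in the first table counts one and a half times as much as the same transition seen in the second, which shifts the sampling probabilities toward the first model.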


### Extending `markovify.Text`

-The `markovify.Text` class is highly extensible; most methods can be overridden. For example, the following `POSifiedText` class uses NLTK's part-of-speech tagger to generate a Markov model that obeys sentence structure better than a naive model. (It works. But be warned: `pos_tag` is very slow.)
+The `markovify.Text` class is highly extensible; most methods can be overridden. For example, the following `POSifiedText` class uses NLTK's part-of-speech tagger to generate a Markov model that obeys sentence structure better than a naive model. (It works; however, be warned: `pos_tag` is very slow.)

```python
import markovify
Expand Down Expand Up @@ -165,7 +165,7 @@ You can also export the underlying Markov chain on its own — i.e., excluding

### Generating `markovify.Text` models from very large corpora

-By default, the `markovify.Text` class loads, and retains, the your textual corpus, so that it can compare generated sentences with the original (and only emit novel sentences). But, with very large corpora, loading the entire text at once (and retaining it) can be memory-intensive. To overcome this, you can `(a)` tell Markovify not to retain the original:
+By default, the `markovify.Text` class loads, and retains, your textual corpus, so that it can compare generated sentences with the original (and only emit novel sentences). However, with very large corpora, loading the entire text at once (and retaining it) can be memory-intensive. To overcome this, you can `(a)` tell Markovify not to retain the original:

```python
with open("path/to/my/huge/corpus.txt") as f: