Refactor dispersion plot (#3082)
* Modernize Matplotlib code for dispersion plot

- Use object-oriented instead of state-machine interface.
- Return Axes object to allow additional customization (see #2239).
- Remove useless scalex kwarg (it's supposed to be a bool and True by
  default, so passing in 0.1 is confusing).

* Use default palette in dispersion plot

* Refactor data preparation in dispersion plot

Make the code a bit more concise and readable for beginners, who may
want to use it as a starting point for their own tweaked dispersion
plot.

(Incidentally, this version is also a bit faster since it replaces the
nested loop over words with the in operator on a dict, but that's not
the main goal.)

* Casefold instead of lower in dispersion plot

str.casefold is the method primarily meant for caseless comparison.

* Dispersion plot docstring tweak

* Reraise ImportError if importing matplotlib fails

Rather than ValueError.
Additionally, add a space between "... installed." and "See ..."

* Add docstring for return value to dispersion plot

Co-authored-by: Tom Aarsen <Cubiegamedev@gmail.com>
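
The switch from str.lower to str.casefold described in the message above can be illustrated with a small standalone check. This is only a sketch: the sample word pairs are made up for illustration and are not taken from the commit or its tests.

```python
# Illustrative pairs (assumed examples, not from the commit): each pair
# should count as the same word under a caseless comparison.
pairs = [("Elinor", "ELINOR"), ("Straße", "STRASSE")]

results = {}
for a, b in pairs:
    lower_equal = a.lower() == b.lower()        # simple lowercasing
    fold_equal = a.casefold() == b.casefold()   # full case folding
    results[(a, b)] = (lower_equal, fold_equal)
    print(a, b, lower_equal, fold_equal)
```

For plain ASCII words the two methods agree. They differ on characters such as the German "ß", which lower() leaves unchanged but casefold() expands to "ss", so only casefold() matches "Straße" against "STRASSE".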
dlukes and tomaarsen committed Dec 7, 2022
1 parent f019fbe commit b16931c
Showing 1 changed file with 28 additions and 30 deletions.
58 changes: 28 additions & 30 deletions nltk/draw/dispersion.py
@@ -15,51 +15,49 @@ def dispersion_plot(text, words, ignore_case=False, title="Lexical Dispersion Pl
     Generate a lexical dispersion plot.
     :param text: The source text
-    :type text: list(str) or enum(str)
+    :type text: list(str) or iter(str)
     :param words: The target words
     :type words: list of str
     :param ignore_case: flag to set if case should be ignored when searching text
     :type ignore_case: bool
+    :return: a matplotlib Axes object that may still be modified before plotting
+    :rtype: Axes
     """

     try:
-        from matplotlib import pylab
+        import matplotlib.pyplot as plt
     except ImportError as e:
-        raise ValueError(
-            "The plot function requires matplotlib to be installed."
+        raise ImportError(
+            "The plot function requires matplotlib to be installed. "
             "See https://matplotlib.org/"
         ) from e

-    text = list(text)
-    words.reverse()
-
-    if ignore_case:
-        words_to_comp = list(map(str.lower, words))
-        text_to_comp = list(map(str.lower, text))
-    else:
-        words_to_comp = words
-        text_to_comp = text
-
-    points = [
-        (x, y)
-        for x in range(len(text_to_comp))
-        for y in range(len(words_to_comp))
-        if text_to_comp[x] == words_to_comp[y]
-    ]
-    if points:
-        x, y = list(zip(*points))
-    else:
-        x = y = ()
-    pylab.plot(x, y, "b|", scalex=0.1)
-    pylab.yticks(list(range(len(words))), words, color="b")
-    pylab.ylim(-1, len(words))
-    pylab.title(title)
-    pylab.xlabel("Word Offset")
-    pylab.show()
+    word2y = {
+        word.casefold() if ignore_case else word: y
+        for y, word in enumerate(reversed(words))
+    }
+    xs, ys = [], []
+    for x, token in enumerate(text):
+        token = token.casefold() if ignore_case else token
+        y = word2y.get(token)
+        if y is not None:
+            xs.append(x)
+            ys.append(y)
+
+    _, ax = plt.subplots()
+    ax.plot(xs, ys, "|")
+    ax.set_yticks(list(range(len(words))), words, color="C0")
+    ax.set_ylim(-1, len(words))
+    ax.set_title(title)
+    ax.set_xlabel("Word Offset")
+    return ax


 if __name__ == "__main__":
+    import matplotlib.pyplot as plt
+
     from nltk.corpus import gutenberg

     words = ["Elinor", "Marianne", "Edward", "Willoughby"]
     dispersion_plot(gutenberg.words("austen-sense.txt"), words)
+    plt.show()
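
The data preparation in the refactored function can be tried out without matplotlib. The following is a minimal sketch of the same dict-lookup idea, assuming a hypothetical helper name (dispersion_points) and a made-up token list; neither is part of NLTK or of this commit.

```python
def dispersion_points(text, words, ignore_case=False):
    """Hypothetical helper (not in NLTK): compute plot coordinates only."""

    def norm(s):
        # Mirror the commit's choice of casefold() for caseless matching.
        return s.casefold() if ignore_case else s

    # Rows are numbered bottom-to-top, so the first target word ends up
    # on the highest row; labels are kept in that same order.
    labels = list(reversed(words))
    word2y = {norm(word): y for y, word in enumerate(labels)}

    xs, ys = [], []
    for x, token in enumerate(text):
        y = word2y.get(norm(token))  # one dict lookup per token
        if y is not None:
            xs.append(x)
            ys.append(y)
    return xs, ys, labels


# Made-up sample input, for illustration only.
tokens = ["Elinor", "saw", "Edward", "and", "elinor", "smiled"]
xs, ys, labels = dispersion_points(tokens, ["Elinor", "Edward"], ignore_case=True)
print(xs, ys, labels)
```

Each token costs one dictionary lookup instead of a comparison against every target word, which is the speed-up the commit message mentions in passing. One detail worth checking when adapting the diff: the row index is assigned over reversed(words), so tick labels need to be supplied in that same reversed order to line up with the rows. The sketch returns labels already reversed for that reason.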
