# `Bubble Charts` Tutorial
   
This notebook shows how to produce bubble chart visualisations of term counts. It begins with a condensed version of the tutorial for the `dtm` module.

## Load Data and Construct a DTM

For this tutorial, we will load Jane Austen's _Pride and Prejudice_, tokenise it, and then cut it into ten segments, which we'll treat as ten separate documents.

In [None]:
# Python imports
import re
from lexos.io.smart import Loader
from lexos import tokenizer
from lexos.cutter.ginsu import Ginsu
from lexos.dtm import DTM

# Load the data and replace line breaks with spaces
loader = Loader()
loader.load("../test_data/txt/Austen_Pride.txt")
text = re.sub(r"\r\n|\n", " ", loader.texts[0]).strip()

# Make a doc
doc = tokenizer.make_doc(text)

# Cut the doc into 10 segments
cutter = Ginsu()
docs = cutter.splitn(doc, n=10)

# Build a DTM with labels
labels = ["Pride1", "Pride2", "Pride3", "Pride4", "Pride5", "Pride6", "Pride7", "Pride8", "Pride9", "Pride10"]
dtm = DTM(docs, labels=labels)
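
To make the idea of "segments treated as documents" concrete, here is a minimal plain-Python sketch of cutting a token list into contiguous segments and counting terms per segment. It is not how `Ginsu` or `DTM` are implemented: the helper `split_into_segments()`, the sample sentence, and the `Seg` labels are all hypothetical, and Lexos may split and store counts differently.

```python
from collections import Counter


def split_into_segments(tokens, n):
    """Split a token list into n contiguous segments of near-equal length."""
    size, remainder = divmod(len(tokens), n)
    segments, start = [], 0
    for i in range(n):
        # The first `remainder` segments each take one extra token
        end = start + size + (1 if i < remainder else 0)
        segments.append(tokens[start:end])
        start = end
    return segments


# Hypothetical sample text (23 whitespace-separated tokens)
tokens = (
    "it is a truth universally acknowledged that a single man in possession "
    "of a good fortune must be in want of a wife"
).split()

segments = split_into_segments(tokens, 3)
labels = [f"Seg{i}" for i in range(1, len(segments) + 1)]

# One Counter per segment: conceptually, one row of a document-term matrix
table = {label: Counter(segment) for label, segment in zip(labels, segments)}

# Rows are segments, columns are the sorted vocabulary
vocabulary = sorted(set(tokens))
matrix = [[table[label][term] for term in vocabulary] for label in labels]
```

Each row of `matrix` holds the counts for one segment, with one column per term in `vocabulary`.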

## Create a Bubble Chart

Bubble charts are an alternative form of word cloud. In the Lexos web app, a bubble chart is known as a `bubbleviz`, and that is the name used for the Lexos API module. Note that `bubbleviz` is still somewhat experimental and subject to change.

In the example below, we get lists of terms and their counts from our spaCy docs and feed them to the `create_bubble_chart()` function.

In [None]:
from collections import Counter
from lexos.visualization.bubbleviz import create_bubble_chart

# Get a Python Counter containing term counts for all docs
term_counts = Counter([token.text for doc in docs for token in doc])

# Get the terms and counts (area) as lists
terms = list(term_counts.keys())
counts = list(term_counts.values())

# Create the bubble chart
create_bubble_chart(terms, counts, show=True)

This may seem somewhat inefficient. Why can't you just provide a DTM? You can with the `create_bubble_chart_from_dtm()` function.

In [None]:
from lexos.visualization.bubbleviz import create_bubble_chart_from_dtm

# Create the bubble chart
create_bubble_chart_from_dtm(dtm, show=True)

You will notice that this produces a different (and probably less useful) bubble chart. The reason is that a Python `Counter` keeps terms in the order in which they first occur in the text, so the counts reach the plotting algorithm in that order rather than sorted by size. This appears to influence how the algorithm handles collisions between bubbles. As a result, using a Python `Counter` to get the terms and counts and feeding them to `create_bubble_chart()` is currently the preferred method for generating bubble charts.
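
The ordering difference can be seen with the standard library alone. The sketch below uses a small, made-up token list to compare the first-occurrence order of a `Counter` with the same counts sorted by size. Which exact order the DTM-based function uses is not shown here; the sorted version is included only for contrast.

```python
from collections import Counter

# Hypothetical token list standing in for the tokens of the docs
tokens = ["the", "pride", "the", "and", "prejudice", "and", "the"]
term_counts = Counter(tokens)

# A Counter keeps terms in the order in which they were first seen
terms = list(term_counts.keys())
counts = list(term_counts.values())

# The same data sorted by descending count, for comparison
sorted_pairs = term_counts.most_common()
sorted_terms = [term for term, _ in sorted_pairs]
sorted_counts = [count for _, count in sorted_pairs]

print(terms, counts)
print(sorted_terms, sorted_counts)
```

In the first pair of lists, large and small counts are interleaved; in the second, the counts decrease steadily.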

Both functions take a number of optional arguments:

- `limit`: The maximum number of bubbles to plot. Default = 100.
- `title`: The title of the plot.
- `bubble_spacing`: The spacing between bubbles. Default = 0.1.
- `colors`: The colors of the bubbles. This must be a list of colors in hexadecimal format (e.g. "#5A69AF").
- `figsize`: A tuple containing the height and width of the figure. Default = (15, 15).
- `font_family`: The font family of the plot (must be installed on your system). Default = "DejaVu Sans".
- `show`: Whether to show the plot. Default = True.
- `filename`: The filename to save the plot to.

Feel free to try these out in the cell below.


In [None]:
create_bubble_chart(terms, counts, show=True)
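
As one possible starting point, the optional arguments listed above can be collected in a dictionary and passed as keyword arguments. The values below are illustrative only, and the helper `invalid_hex_colors()` is hypothetical: it checks for the six-digit form shown in the example colour, which is an assumption, since Lexos may accept other colour formats.

```python
import re

# Illustrative values; the keyword names come from the list of optional arguments
options = {
    "limit": 50,
    "title": "Pride and Prejudice",
    "bubble_spacing": 0.2,
    "colors": ["#5A69AF", "#579E65", "#F9C784"],
    "figsize": (10, 10),
}


def invalid_hex_colors(colors):
    """Return the entries that are not six-digit hex strings such as "#5A69AF"."""
    pattern = re.compile(r"#[0-9A-Fa-f]{6}")
    return [color for color in colors if not pattern.fullmatch(color)]


# Catch typos in the colour list before plotting
problems = invalid_hex_colors(options["colors"])
print(problems)

# In the notebook, with `terms` and `counts` defined as above:
# create_bubble_chart(terms, counts, **options)
```

Keeping the options in one dictionary makes it easy to change a single value and re-run the cell.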