# Lexos Visualizations

This script does the following:

1. Creates spaCy docs from a list of text files.
2. Converts the tokens to lower case and filters them to remove digits, punctuation, and whitespace.
3. Tests that the loader is working properly.
4. Stores the data in a document-term matrix (DTM).
5. Generates a dendrogram from the DTM.
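
The filtering described in step 2 can be sketched in plain Python. This is only an illustration: the notebook works on spaCy tokens (where attributes such as `is_digit`, `is_punct`, `is_space`, and `lower_` are available), while the regex tokenizer and the `filter_tokens` helper below are assumptions made so the example runs without a language model.

```python
import re

# Hypothetical stand-in for a real tokenizer: runs of word characters,
# or single punctuation marks. Whitespace is never matched.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def filter_tokens(text: str) -> list[str]:
    """Lowercase the tokens and drop digits, punctuation, and whitespace."""
    kept = []
    for token in TOKEN_PATTERN.findall(text):
        if token.isdigit():
            continue  # drop pure-digit tokens such as "1"
        if not token.isalnum():
            continue  # drop punctuation (and tokens containing "_")
        kept.append(token.lower())
    return kept


print(filter_tokens("Chapter 1\nIt is a truth universally acknowledged, that..."))
```

With real spaCy docs, the same idea would be a comprehension over the doc's tokens that tests the token attributes instead of the string methods used here.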

## Configuration

Configure a list of file paths, the labels you wish to use for each document, and the language model you wish to use to parse the texts.

Note that converting long texts to spaCy docs can take a long time.
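
Because parsing is slow, it can be worth checking the configuration before any docs are made. The sketch below is not part of the Lexos API; `check_config` and the example paths are hypothetical names used only for illustration.

```python
from pathlib import Path


def check_config(paths, labels):
    """Return a list of problems found in the configuration (empty if none)."""
    problems = []
    if len(paths) != len(labels):
        problems.append(
            f"number of paths ({len(paths)}) does not match "
            f"number of labels ({len(labels)})"
        )
    if len(set(paths)) != len(paths):
        problems.append("the same path is listed more than once")
    for path in paths:
        if not Path(path).is_file():
            problems.append(f"file not found: {path}")
    return problems


# Hypothetical configuration: the same path is listed twice by mistake.
example_paths = ["texts/first.txt", "texts/first.txt"]
example_labels = ["First", "Second"]
for problem in check_config(example_paths, example_labels):
    print(problem)
```

Running a check like this on `data` and `labels` catches a duplicated or mistyped path in seconds rather than after a long parse.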

In [2]:
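# Replace with your own data.
# Note: both paths below point to the same file (Austen_Pride.txt), although
# the labels name two different texts; point the second entry at your second text.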
data = [
    r"C:\Users\jack\OneDrive\Documents\School\summer22\LexosRepo\lexos\tests\test_data\txt\Austen_Pride.txt",
    r"C:\Users\jack\OneDrive\Documents\School\summer22\LexosRepo\lexos\tests\test_data\txt\Austen_Pride.txt"
]
labels = ["Pride", "Sense"]
model = "en_core_web_sm"

## Import Lexos API

In [4]:
# Set local import path
import os
import sys
LEXOS_PATH = "lexos"
if "NOTEBOOK_INITIATED_FLAG" not in globals():
    NOTEBOOK_INITIATED_FLAG = True
    try:
        module_path = os.path.join(os.path.dirname(__file__), os.pardir)
    except NameError:  # __file__ is not defined inside a notebook
        module_path = os.path.abspath(os.path.join(LEXOS_PATH))
        %cd lexos
        %pwd
    if module_path not in sys.path:
        sys.path.append(module_path)
        
# Import Lexos API modules
from lexos.io.basic import Loader
from lexos import tokenizer
from lexos.dtm import DTM
from lexos.cutter import Ginsu
try:
    from lexos.cluster.dendrogram import Dendrogram
except ImportError:
    print("Dendrogram not imported.")

## Load Texts and Convert to spaCy Docs

In [5]:
# Create the loader and load the data
loader = Loader()
loader.load(data)

# Make the docs -- currently takes a long time with full novels
docs = tokenizer.make_docs(loader.texts, model=model)

## Ensure the Loader Is Working Correctly

In [6]:
# Print the first 50 tokens of each doc (slicing a spaCy doc counts tokens, not characters)
for doc in docs:
    print(doc[:50])
    print("\n")

 Pride and Prejudice
by Jane Austen
Chapter 1
It is a truth universally acknowledged, that a single man in possession of a good fortune, must be in want of a wife.
However little known the feelings or views of such a man


 Pride and Prejudice
by Jane Austen
Chapter 1
It is a truth universally acknowledged, that a single man in possession of a good fortune, must be in want of a wife.
However little known the feelings or views of such a man




## Cut Texts

In [7]:
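# Ginsu is already imported above; the import is repeated so this cell can run on its own
from lexos.cutter import Ginsu
cutter = Ginsu()

# Split each doc into 3 segments
result = cutter.splitn(docs, n=3, merge_threshold=0.5, overlap=None)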
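
To make the shape of the cutter's output easier to picture, here is a plain-Python sketch of cutting token lists into `n` near-equal segments and building one label per segment. It is an assumption-laden illustration, not Ginsu's implementation: it ignores `merge_threshold` and `overlap`, and the helper names `split_into_n`, `flatten`, and `segment_labels` are hypothetical.

```python
def split_into_n(tokens, n):
    """Split a list of tokens into n contiguous, near-equal segments."""
    size, extra = divmod(len(tokens), n)
    segments, start = [], 0
    for i in range(n):
        # The first `extra` segments each take one leftover token.
        end = start + size + (1 if i < extra else 0)
        segments.append(tokens[start:end])
        start = end
    return segments


def flatten(nested):
    """Turn a list of per-document segment lists into one flat list."""
    return [segment for doc_segments in nested for segment in doc_segments]


def segment_labels(names, n):
    """Build labels such as Pride1, Pride2, ... for n segments per document."""
    return [f"{name}{i}" for name in names for i in range(1, n + 1)]


toy_docs = [list(range(10)), list(range(7))]  # stand-ins for tokenized docs
nested = [split_into_n(doc, 3) for doc in toy_docs]
print(len(flatten(nested)))
print(segment_labels(["Pride", "Sense"], 3))
```

Two documents cut into three segments each yield six segments, which is why six labels are needed once the segments are treated as separate documents.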

## Initialize the DTM

In [8]:
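from lexos.dtm import DTM

# One DTM for the two full documents
labels = ["Pride_and_Prejudice", "Sense_and_Sensibility"]
dtm = DTM(docs, labels)

# A second DTM for the segments produced by the cutter.
# Assumes splitn returns one list of segments per input doc, so the lists
# are flattened to give one segment per label.
labels = ["Pride1", "Pride2", "Pride3", "Sense1", "Sense2", "Sense3"]
segments = [segment for doc_segments in result for segment in doc_segments]
dtm2 = DTM(segments, labels)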

ModuleNotFoundError: No module named 'cytoolz'
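
For readers new to the structure, a document-term matrix records how often each term occurs in each document. The sketch below builds one from token lists with the standard library only; it is not how the Lexos `DTM` class is implemented, and `build_dtm` is a hypothetical helper.

```python
from collections import Counter


def build_dtm(token_lists, labels):
    """Map each term to a {label: count} row, one column per document."""
    counts = [Counter(tokens) for tokens in token_lists]
    vocabulary = sorted(set().union(*counts))
    return {
        term: {label: count[term] for label, count in zip(labels, counts)}
        for term in vocabulary
    }


table = build_dtm(
    [["pride", "and", "prejudice", "and"], ["sense", "and", "sensibility"]],
    ["Doc1", "Doc2"],
)
for term, row in table.items():
    print(term, row)
```

Each row is a term and each column a document, so a term missing from a document simply gets a count of 0.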

## Create Dendrogram

In [None]:
dendrogram = Dendrogram(dtm, show=True)
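
A dendrogram is drawn from a hierarchical clustering, which starts from pairwise distances between the documents' term-count vectors. The sketch below computes those distances and finds the pair that an agglomerative clustering would merge first. The Euclidean metric and the helper names are assumptions for illustration; the metric and linkage method that `Dendrogram` uses are not shown in this notebook.

```python
import math
from itertools import combinations


def distance_matrix(vectors):
    """Pairwise Euclidean distances between term-count vectors."""
    return [[math.dist(u, v) for v in vectors] for u in vectors]


def closest_pair(vectors):
    """Indices of the two vectors that would be merged first."""
    return min(
        combinations(range(len(vectors)), 2),
        key=lambda pair: math.dist(vectors[pair[0]], vectors[pair[1]]),
    )


# Toy term counts for three documents over a three-term vocabulary.
toy_vectors = [[3, 0, 1], [3, 0, 2], [0, 4, 0]]
print(distance_matrix(toy_vectors))
print(closest_pair(toy_vectors))
```

Repeating the merge step until one cluster remains produces the tree that the dendrogram displays.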

## Create WordCloud

In [None]:
from lexos.visualization.cloud.wordcloud import multicloud

# Use the DTM table's column names (skipping the first column) as the cloud labels
labels = dtm.get_table().columns.tolist()[1:]
multicloud(dtm, docs=None, opts=None, ncols=3, title=None, labels=labels, show=True, figure_opts=None, round=None, filename=None)

## Create BubbleViz

In [None]:
from lexos.visualization.bubbleviz import bubbleviz
bubbleviz(dtm)