Visualization

This project is still in active development but is mostly stable. Features may appear or disappear with little or no notice.

Current Release

v1.0 - Link

About

This project builds on the work done by Chenhao Tan and Dallas Card in this paper. This tool provides a way to visualize text interactively using the relations outlined in the paper.

Installation

Installation has been tested on 64-bit Windows (7 and higher), Linux (CentOS 7), and some version of OSX. For convenience, install scripts are provided for Windows and Linux. Binaries are also provided for Windows and Linux.

After running any of the installation methods, if you have never used NLTK you will need to download the NLTK stopword corpus. Activate the virtualenv for the application and run: python -m nltk.downloader stopwords

Requirements

Python3 (only tested on 3.5 and higher)
A display
- Running over an X server is possible but does not look good
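
The Python requirement above can be checked before installing anything. The following is a minimal sketch; the function name python_is_supported is made up for illustration and is not part of the project.

```python
import sys

# Minimum version taken from the Requirements list above.
MINIMUM_PYTHON = (3, 5)


def python_is_supported(version_info=sys.version_info, minimum=MINIMUM_PYTHON):
    """Return True when the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    found = sys.version.split()[0]
    if python_is_supported():
        print("Python version looks fine:", found)
    else:
        print("Python 3.5 or higher is needed; found", found)
```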

Installing from the repo

Linux Installation

1. Download or clone the repo
2. Change the value of the variables $PYTHONPATH and $VenvPath to point to your system python installation and virtualenv
3. Run ./FullSetup.sh
   - This will create the application in a new folder called Application
4. Activate the virtualenv (Application/venv)
5. Run the visualizer with: python Visualizer.py

Windows Installation

1. Download or clone the repo
2. Change the value of the variable PythonPath to point to your system python installation
3. Run FullSetup.bat
   - This will create the application in a new folder called Application
4. Activate the virtualenv
   - If you called FullSetup.bat from the command line, the virtualenv may already be active
5. Run the visualizer with: python Visualizer.py

Manual Installation

1. Download or clone the repo
2. Run: git submodule init and then git submodule update
3. Create a new folder where you want to install the application
4. From the folder Visualizer, copy all .py files and all folders to your application folder
5. Copy the top level idea_relations folder into the application folder
6. Create a virtual environment containing the packages listed in the FullRequirements.txt file
7. Activate the virtual environment and run the visualizer with python Visualizer.py
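
The copy steps of the manual installation (creating the application folder, copying the .py files and folders from Visualizer, and copying idea_relations) can be scripted. This is only a sketch of those steps, not the project's own setup script. The function name copy_application_files is hypothetical, and the sketch assumes the application folder is new, since shutil.copytree refuses to overwrite an existing folder.

```python
import shutil
from pathlib import Path


def copy_application_files(repo_dir, app_dir):
    """Copy the files the manual installation asks for into app_dir."""
    repo_dir, app_dir = Path(repo_dir), Path(app_dir)
    # Create the application folder (expected to be new or empty).
    app_dir.mkdir(parents=True, exist_ok=True)

    # Copy every .py file and every folder found directly inside Visualizer.
    for entry in (repo_dir / "Visualizer").iterdir():
        target = app_dir / entry.name
        if entry.is_dir():
            shutil.copytree(str(entry), str(target))
        elif entry.suffix == ".py":
            shutil.copy2(str(entry), str(target))

    # Copy the top-level idea_relations folder into the application folder.
    shutil.copytree(
        str(repo_dir / "idea_relations"), str(app_dir / "idea_relations")
    )
    return app_dir
```

Creating the virtual environment and installing the packages from FullRequirements.txt still has to be done afterwards, as described in the remaining steps.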

Anaconda Installation

1. Follow the Manual Installation instructions up to step 6
2. Instead of creating a python virtualenv, create an anaconda environment containing the packages in Conda_package_list.txt
   - Linux and Windows users can try to use the appropriate spec file
3. Run the program as described in the Manual Installation instructions

Packaged Installation

The application is distributed in three forms: a standalone Windows binary, a standalone Linux binary, and the plain Python files. Both binaries should run without any issue; just double-click the executable (Visualizer.exe) to run it. The Linux binary should function on any Linux system (including OSX), but has only been tested on CentOS 7.

For the Python files, you must have Python 3.5 or higher available on your system. Installation is similar to the manual installation: unpack the zip file and follow the Manual Installation instructions above from step 7.

Because of the much larger size of the Windows and Linux binaries, it is probably better to use the python version, especially if you already have Python 3 on your system.

As of right now, the prepackaged installations cannot run the preprocessor due to bugs with scipy. A fix will potentially be available in late December. Using a virtualenv with the raw Python files works fine, though.

Note for Windows installation

On Windows, it may be difficult or impossible to install scipy using pip. To get around that, we recommend downloading prebuilt wheels for both numpy and scipy from this website. Download the appropriate wheels for your system (matching Python version and architecture). If you're installing the program from the repo using the installation scripts, place the wheels in the root directory of the repo and edit lines 52 and 53 of FullSetup.py to have the correct file names. If you're installing from the zipped Python files, copy the wheels to wherever you unpacked the code and install them into your virtualenv using pip. You must install them before you install the packages from FullRequirements.txt; otherwise pip will install numpy from the web and then be unable to install scipy. If you're installing from the standalone Windows binary, you don't need to do anything.
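
To pick wheels that match your Python version and architecture, it helps to ask the interpreter directly. The sketch below assumes a CPython interpreter, whose wheels are tagged cp followed by the major and minor version; the function name interpreter_tags is made up for illustration. Run it with the same Python that the virtualenv will use.

```python
import struct
import sys


def interpreter_tags():
    """Return (python_tag, pointer_bits) for the running interpreter."""
    # CPython wheels are tagged cp<major><minor>, for example cp35 for 3.5.
    python_tag = "cp{}{}".format(sys.version_info[0], sys.version_info[1])
    # Size of a C pointer in bits: 32 or 64.
    pointer_bits = struct.calcsize("P") * 8
    return python_tag, pointer_bits


if __name__ == "__main__":
    tag, bits = interpreter_tags()
    print("Look for wheels tagged", tag, "built for", bits, "bit Windows")
```

On Windows, 32-bit wheels normally carry the platform tag win32 and 64-bit wheels carry win_amd64.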

Usage

The preprocessor can be run in two modes, Keywords and Topics. In Keywords mode, the system takes two files: an "input" file and a "background" file. The system calculates the prevalence of each word in both files and creates the topic list from the top n words/phrases in the input file that aren't in the background file. While not necessary, the background file should be in the same domain as the input file for more accurate results. In Topics mode, the system runs LDA on the input file using Mallet to "learn" the top 50 topics on its own. This doesn't require a background file, but does take significantly longer (~30 minutes on the demo dataset). Further details on how the topics are selected are available in this paper.
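
As a rough illustration of Keywords mode as it is described above, the sketch below counts words in an input text and a background text and keeps the n most frequent input words that never occur in the background. It is a deliberately simplified stand-in, not the preprocessor's actual code: the function names are hypothetical, tokenization is a plain whitespace split, and phrases are ignored. The paper describes the real selection procedure.

```python
from collections import Counter


def count_words(text):
    """Lowercase the text, split on whitespace, and count each word."""
    return Counter(text.lower().split())


def select_keywords(input_text, background_text, n):
    """Return the n most frequent input words absent from the background."""
    input_counts = count_words(input_text)
    background_counts = count_words(background_text)
    candidates = [
        (word, count)
        for word, count in input_counts.items()
        if word not in background_counts
    ]
    # Most frequent first; ties are broken alphabetically for stable output.
    candidates.sort(key=lambda item: (-item[1], item[0]))
    return [word for word, _ in candidates[:n]]


if __name__ == "__main__":
    input_text = "parsing parsing parsing translation translation the model"
    background_text = "the model learns the kernel"
    print(select_keywords(input_text, background_text, 2))
    # prints ['parsing', 'translation']
```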

After the preprocessor has run and found the relationships, a new tab opens with a plot of the relationships between topics. Each dot on the graph represents a single topic/topic pair. If there are more than 1000 pairs, 1000 items are selected randomly to populate the graph. Selecting an item displays the frequency time series for each relation and populates the 'Top Relation' filters.
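
The 1000-item limit on the plot amounts to random sampling without replacement. A minimal sketch of that behavior follows; the function name and the seed parameter are hypothetical, and the application may select items differently.

```python
import random

MAX_POINTS = 1000  # limit stated in the paragraph above


def choose_pairs_to_plot(pairs, limit=MAX_POINTS, seed=None):
    """Return all pairs, or a random subset of `limit` pairs if there are more."""
    pairs = list(pairs)
    if len(pairs) <= limit:
        return pairs
    # Sample without replacement so no pair is drawn twice.
    return random.Random(seed).sample(pairs, limit)
```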

On the left side of the window is the filter tab area, where you can filter down the visible topics. Selecting items in the ideas tab causes all relationships involving the selected ideas to populate the graph. In the types tab, relations are sorted by their type and strength; selecting a relationship here causes all relations involving either topic to appear on the graph. The other two tabs, 'Top Relation 1' and 'Top Relation 2', simply display a sorted list of the strength of all relations each selected topic has.

You can save visualizations for later use from the File menu, which saves you from having to wait for the preprocessor to run again. Additionally, you can load multiple visualizations into multiple tabs. In the visualization menu you can save images of the current PMI/correlation plot and the time series plot; the saved images are exactly what appears on the screen when you hit save.

Input file format

Input files should be given as jsonlists, with one JSON document per line. The only two fields needed are date and text. The date is first converted to a string and then parsed using the Python dateutil library with a default date of datetime(1,1,1) and all other arguments at their defaults. The text should be just the full text of the document, properly quoted. For examples, see acl.jsonlist in the included example data folder. The data file can be compressed with gz before being passed in.
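
A small sketch of producing and reading such a file is shown below. The helper names are hypothetical and this is not the project's own loader. Using json.dumps takes care of quoting the text, and parse_date mirrors the dateutil call described above (it needs the python-dateutil package, which is imported lazily so the rest of the sketch runs without it).

```python
import gzip
import json
from datetime import datetime


def _opener(path):
    """Pick gzip.open for .gz paths and the built-in open otherwise."""
    return gzip.open if str(path).endswith(".gz") else open


def write_jsonlist(path, documents):
    """Write one JSON object per line; each needs a date and a text field."""
    with _opener(path)(str(path), "wt", encoding="utf-8") as handle:
        for document in documents:
            handle.write(json.dumps(document) + "\n")


def read_jsonlist(path):
    """Read the documents back, skipping blank lines."""
    with _opener(path)(str(path), "rt", encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


def parse_date(value):
    """Convert the value to a string, then parse it with dateutil."""
    # Imported here so the rest of the sketch runs without python-dateutil.
    from dateutil import parser
    return parser.parse(str(value), default=datetime(1, 1, 1))


if __name__ == "__main__":
    docs = [{"date": "2015-06-01", "text": 'A paper about "parsing".'}]
    write_jsonlist("example.jsonlist.gz", docs)
    print(read_jsonlist("example.jsonlist.gz"))
```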

Two example visualizations are provided for you, acl.p and nips_t.p. The first is the acl dataset in acl.jsonlist.gz processed using keywords with the nips dataset as the background file; the second is the nips dataset processed using topics.

Preprocessor options

You must pass the preprocessor 3 options: exploration name, which determines what the visualization will be called (mainly setting the tab title); the number of ideas, which tells the preprocessor how many ideas to look for; and time grouping (year, month, day), which sets the time scale the preprocessor uses when grouping articles. There are also 3 optional settings: Tokenize, Lemmatize, and No Stop Words. These apply the relevant preprocessing to the data before searching for topics. Additionally, you can use the advanced options to make the preprocessor save its intermediate output data somewhere else. The intermediate data isn't terribly useful or readable, though.
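
To make the time grouping option concrete, the sketch below buckets documents by year, month, or day. It only illustrates the idea; the function names and the (date, text) tuple layout are assumptions, and the preprocessor's internal grouping may work differently.

```python
from collections import defaultdict
from datetime import datetime


def group_key(date, grouping):
    """Truncate a datetime to the chosen time scale."""
    if grouping == "year":
        return (date.year,)
    if grouping == "month":
        return (date.year, date.month)
    if grouping == "day":
        return (date.year, date.month, date.day)
    raise ValueError("grouping must be 'year', 'month', or 'day'")


def group_documents(documents, grouping):
    """Group (date, text) pairs into buckets keyed by the truncated date."""
    groups = defaultdict(list)
    for date, text in documents:
        groups[group_key(date, grouping)].append(text)
    return dict(groups)


if __name__ == "__main__":
    documents = [
        (datetime(2015, 6, 1), "first"),
        (datetime(2015, 6, 20), "second"),
        (datetime(2016, 1, 5), "third"),
    ]
    print(group_documents(documents, "month"))
```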
