# **Scattertext**

Scattertext is an open-source Python library that works with spaCy to visualize which words and phrases are most characteristic of a given category. It is a tool for finding distinguishing terms in corpora and presenting them in an interactive HTML scatter plot. The visualizations are informative because the points corresponding to terms are selectively labeled, so that labels don’t overlap with other labels or points.

In this practice session, we will build a sentiment analysis visualization using spaCy and Scattertext and see how Scattertext lets you explore and search for terms in the data.

## **Implementation:**

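We will start by installing spaCy and Scattertext with `pip install spacy` and `pip install scattertext`. The cell below installs them together with several other data-science packages and then restarts the kernel so that the newly installed packages are picked up.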

In [None]:
!python -m pip install pip --upgrade --user -q --no-warn-script-location
!python -m pip install numpy pandas seaborn matplotlib scipy statsmodels scikit-learn nltk gensim spacy scattertext --user -q --no-warn-script-location

import IPython
IPython.Application.instance().kernel.do_shutdown(True)

Importing required libraries

We will import spaCy and Scattertext for the visualization and pandas for loading our dataset.

In [None]:
import spacy

import pandas as pd

import scattertext as st

Loading the Dataset

To create a sentiment analysis visualization we will use the ‘Twitter Airline Sentiment Dataset’ from [Kaggle](https://www.kaggle.com/crowdflower/twitter-airline-sentiment). The dataset contains attributes such as the tweet ID, the user name, the tweet text, and the sentiment label. We will use it to visualize the terms associated with the different sentiments.

In [None]:
twitter_df = pd.read_csv('Tweets.csv')

twitter_df.dtypes
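
Before building the visualization it helps to check how many tweets carry each sentiment label. The following is a minimal standard-library sketch of that check. The rows in `SAMPLE_CSV` are made up for illustration and only mimic the two columns used later in this notebook (`airline_sentiment` and `text`); `count_labels` is a hypothetical helper, not part of pandas or Scattertext.

```python
import csv
import io
from collections import Counter

# Hypothetical rows standing in for Tweets.csv (illustration only).
SAMPLE_CSV = """airline_sentiment,text
negative,My flight was delayed for three hours
positive,Great crew and a smooth flight
neutral,What time does boarding start?
negative,Lost my bag again
"""

def count_labels(csv_text, label_col="airline_sentiment"):
    """Count how many rows carry each sentiment label."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[label_col] for row in reader)

label_counts = count_labels(SAMPLE_CSV)
print(label_counts.most_common())
```

On the real DataFrame, `twitter_df['airline_sentiment'].value_counts()` should give the equivalent summary.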

Downloading English Model

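As we have already discussed, spaCy provides models for different languages. Since our tweets are in English, we will load spaCy's English model. In spaCy 3 and later the `'en'` shortcut is no longer available, so we load the small English model by its full name, `en_core_web_sm`. The model has to be downloaded once beforehand with `python -m spacy download en_core_web_sm`.

In [None]:
# Load the small English model by its full name (the 'en' shortcut was removed in spaCy 3).
nlp = spacy.load('en_core_web_sm')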

Creating Scattertext Corpus

Next, we will create a Scattertext corpus from the dataset. Since we are doing sentiment analysis, we set category_col to ‘airline_sentiment’, and the column that contains the tweets, ‘text’, is used as text_col.

In [None]:
corpus = st.CorpusFromPandas(twitter_df, category_col='airline_sentiment', text_col='text',  nlp=nlp).build()

To create this corpus we passed the English model loaded in the previous step as nlp and built the corpus with the build() function.
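
Conceptually, a corpus of this kind records how often each term occurs in each category. The sketch below illustrates that idea with the standard library only. It is not Scattertext's implementation: the regex tokenizer is a crude stand-in for spaCy, the choice of unigrams plus bigrams reflects our understanding of Scattertext's default features, and `tokenize`, `terms`, and `build_term_counts` are hypothetical helper names.

```python
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase and split into word tokens; a crude stand-in for spaCy."""
    return re.findall(r"[a-z0-9']+", text.lower())

def terms(tokens):
    """Return unigrams plus adjacent-word bigrams."""
    bigrams = [" ".join(pair) for pair in zip(tokens, tokens[1:])]
    return tokens + bigrams

def build_term_counts(docs):
    """Map each category to a Counter of term frequencies.

    `docs` is an iterable of (category, text) pairs.
    """
    counts = defaultdict(Counter)
    for category, text in docs:
        counts[category].update(terms(tokenize(text)))
    return counts

# Made-up tweets for illustration only.
docs = [
    ("negative", "Flight delayed again"),
    ("negative", "Delayed flight and lost bag"),
    ("positive", "Great flight"),
]
term_counts = build_term_counts(docs)
print(term_counts["negative"]["delayed"])
print(term_counts["negative"]["flight delayed"])
```

On the real corpus, `corpus.get_term_freq_df()` should return a comparable table of per-category term frequencies.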

Creating the visualization

This is the main and the final step. Here we will create a visualization with the following parameters:

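> * category: The category whose characteristic terms we want to find. We set it to ‘negative’, the label used for negative sentiment in the dataset.
> * category_name: The display name for that category. We set it to “Negative”, and it is shown as an axis title.
> * not_category_name: The display name for all tweets that are not in the negative category. We call it “Positive”, although in this dataset it also covers tweets labeled neutral.
> * width_in_pixels: The width of the plot, set to 1000 here.
> * metadata: The data shown alongside each excerpt. Here it is the name of the user who posted the tweet.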
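
Because the dataset has three sentiment labels but the plot compares only two sides, every label other than the chosen category ends up on the "not category" side. The sketch below makes that split concrete. `split_category` is a hypothetical helper written for illustration, and the label list is made up.

```python
from collections import Counter

def split_category(labels, category):
    """Collapse multi-class labels into `category` versus everything else."""
    return ["category" if label == category else "not_category" for label in labels]

# Hypothetical label column containing the three sentiment values.
labels = ["negative", "neutral", "positive", "negative", "neutral"]
binary = split_category(labels, category="negative")
print(Counter(binary))
```

If neutral tweets should not count towards the “Positive” side, one option is to filter them out of the DataFrame before building the corpus. Scattertext's `produce_scattertext_explorer` also appears to accept `not_categories` and `neutral_categories` arguments for this purpose; check the documentation of your installed version.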

Now let us define all these and create the visualization using produce_scattertext_explorer.

In [None]:
sent = st.produce_scattertext_explorer(corpus,
        category='negative',
        category_name='Negative',
        not_category_name='Positive',
        width_in_pixels=1000,
        metadata=twitter_df['name'])

This command returns the visualization as an HTML string, which we will write to an HTML file that can be opened standalone.

In [None]:
with open("Twitter_Sentiment.html", 'wb') as f:
    f.write(sent.encode('utf-8'))

In [None]:
from IPython.display import HTML
HTML(filename="Twitter_Sentiment.html")

This is the final visualization we created using scattertext.

In the visualization, the x-axis shows how frequently a term occurs in the “Positive” (not negative) tweets and the y-axis shows how frequently it occurs in the negative tweets. Each axis is divided into three sections:

> * Frequent: Words with the highest frequency.
> * Average: Words with an average frequency.
> * Infrequent: Words with the lowest frequency.
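
To see how a term's frequency on each side could translate into a position on the plot, here is a small sketch that scales frequencies to the range 0 to 1 by dense rank and then assigns one of the three bands. This is an illustration under stated assumptions, not Scattertext's actual code: the counts are made up, the rank-based scaling is one plausible choice, the equal-thirds cut-offs are our assumption, and `dense_rank_scale` and `bucket` are hypothetical helpers.

```python
def dense_rank_scale(freqs):
    """Scale frequencies to [0, 1] by dense rank (ties share a position)."""
    distinct = sorted(set(freqs.values()))
    if len(distinct) == 1:
        return {term: 0.5 for term in freqs}
    rank = {value: i for i, value in enumerate(distinct)}
    top = len(distinct) - 1
    return {term: rank[f] / top for term, f in freqs.items()}

def bucket(position):
    """Label a [0, 1] position with one of three bands (assumed equal thirds)."""
    if position < 1 / 3:
        return "Infrequent"
    if position < 2 / 3:
        return "Average"
    return "Frequent"

# Hypothetical term counts for each side of the plot.
negative_freq = {"delayed": 40, "thanks": 2, "flight": 90, "bag": 15, "crew": 5}
positive_freq = {"delayed": 1, "thanks": 60, "flight": 70, "bag": 3, "crew": 20}

y = dense_rank_scale(negative_freq)  # category ("Negative") on the y-axis
x = dense_rank_scale(positive_freq)  # not-category ("Positive") on the x-axis
for term in negative_freq:
    print(term, x[term], y[term], bucket(x[term]), bucket(y[term]))
```

In this toy data, "delayed" sits low on the x-axis and high on the y-axis, which is where terms characteristic of negative tweets would be expected to appear, while "flight" is frequent on both sides and lands in the top-right corner.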

# **Related Articles:**

> * [Visualizing Sentiment Analysis](https://analyticsindiamag.com/visualizing-sentiment-analysis-reports-using-scattertext-nlp-tool/)

> * [Hands-On Tutorial on Flair](https://analyticsindiamag.com/flair-hands-on-guide-to-robust-nlp-framework-built-upon-pytorch/)

> * [Meet Skweak](https://analyticsindiamag.com/meet-skweak-a-python-toolkit-for-applying-weak-supervision-to-nlp-tasks/)

> * [Simple Chatbot with NLTK](https://analyticsindiamag.com/how-does-a-simple-chatbot-with-nltk-work/)

> * [Guide to DistilBERT](https://analyticsindiamag.com/python-guide-to-huggingface-distilbert-smaller-faster-cheaper-distilled-bert/)

> * [Introduction to Simple Transformers](https://analyticsindiamag.com/text-classification-using-simple-transformers/)


