<a href="https://colab.research.google.com/github/junting-huang/data_storytelling/blob/main/case_1_narrative.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# case_1. narrative


TextBlob is a Python (2 and 3) library for processing textual data. It provides a simple API for common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, and translation. More information is available in the official documentation: https://textblob.readthedocs.io/en/dev/. This tutorial is a quick introduction to the TextBlob package.

## 1.1 installation

In [1]:
! pip install textblob

Collecting textblob
  Downloading textblob-0.17.1-py2.py3-none-any.whl (636 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 636.8/636.8 kB 104.2 kB/s eta 0:00:00
Installing collected packages: textblob
Successfully installed textblob-0.17.1


In [2]:
! python -m textblob.download_corpora

[nltk_data] Downloading package brown to /Users/liyao/nltk_data...
[nltk_data]   Unzipping corpora/brown.zip.
[nltk_data] Downloading package punkt to /Users/liyao/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package wordnet to /Users/liyao/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /Users/liyao/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!
[nltk_data] Downloading package conll2000 to /Users/liyao/nltk_data...
[nltk_data]   Unzipping corpora/conll2000.zip.
[nltk_data] Downloading package movie_reviews to
[nltk_data]     /Users/liyao/nltk_data...
[nltk_data]   Unzipping corpora/movie_reviews.zip.
Finished.


## 1.2 basic usage

TextBlob aims to provide access to common text-processing operations through a familiar interface. You can treat TextBlob objects as if they were Python strings that learned how to do Natural Language Processing.

First, the import.

In [6]:
from textblob import TextBlob

Let’s create our first TextBlob.

In [7]:
blob = TextBlob("When I wrote the following pages, or rather the bulk of them, I lived alone, in the woods, a mile from any neighbor, in a house which I had built myself, on the shore of Walden Pond, in Concord, Massachusetts, and earned my living by the labor of my hands only. ")

### Part-of-speech Tagging

Part-of-speech tags can be accessed through the tags property.

In [8]:
print(blob.tags)

[('When', 'WRB'), ('I', 'PRP'), ('wrote', 'VBD'), ('the', 'DT'), ('following', 'JJ'), ('pages', 'NNS'), ('or', 'CC'), ('rather', 'RB'), ('the', 'DT'), ('bulk', 'NN'), ('of', 'IN'), ('them', 'PRP'), ('I', 'PRP'), ('lived', 'VBD'), ('alone', 'RB'), ('in', 'IN'), ('the', 'DT'), ('woods', 'NNS'), ('a', 'DT'), ('mile', 'NN'), ('from', 'IN'), ('any', 'DT'), ('neighbor', 'NN'), ('in', 'IN'), ('a', 'DT'), ('house', 'NN'), ('which', 'WDT'), ('I', 'PRP'), ('had', 'VBD'), ('built', 'VBN'), ('myself', 'PRP'), ('on', 'IN'), ('the', 'DT'), ('shore', 'NN'), ('of', 'IN'), ('Walden', 'NNP'), ('Pond', 'NNP'), ('in', 'IN'), ('Concord', 'NNP'), ('Massachusetts', 'NNP'), ('and', 'CC'), ('earned', 'VBD'), ('my', 'PRP$'), ('living', 'NN'), ('by', 'IN'), ('the', 'DT'), ('labor', 'NN'), ('of', 'IN'), ('my', 'PRP$'), ('hands', 'NNS'), ('only', 'RB')]
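The tags in this output follow the Penn Treebank convention, in which noun tags start with NN and verb tags start with VB. As a minimal sketch of how the output might be used, the snippet below groups words by the first two letters of their tag. To stay runnable without TextBlob, it hard-codes a few of the (word, tag) pairs printed above; sample_tags and group_by_prefix are illustrative names, not part of the TextBlob API.

```python
from collections import defaultdict

# A few (word, tag) pairs copied from the blob.tags output above.
sample_tags = [
    ("I", "PRP"), ("wrote", "VBD"), ("the", "DT"), ("following", "JJ"),
    ("pages", "NNS"), ("lived", "VBD"), ("woods", "NNS"), ("mile", "NN"),
    ("built", "VBN"), ("Walden", "NNP"), ("Pond", "NNP"),
]


def group_by_prefix(tags, length=2):
    """Group words by the first `length` characters of their tag."""
    groups = defaultdict(list)
    for word, tag in tags:
        groups[tag[:length]].append(word)
    return dict(groups)


groups = group_by_prefix(sample_tags)
print(groups["NN"])  # nouns (tags NN, NNS, NNP)
print(groups["VB"])  # verbs (tags VBD, VBN)
```

In the notebook itself, the same helper should work on the full result by calling group_by_prefix(blob.tags).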


### Noun Phrase Extraction

Similarly, noun phrases are accessed through the noun_phrases property.

In [None]:
print(blob.noun_phrases)

['walden pond', 'concord', 'massachusetts']


### Tokenization

You can break TextBlobs into words or sentences. Sentence objects have the same properties and methods as TextBlobs. For more advanced tokenization, see the Advanced Usage guide: https://textblob.readthedocs.io/en/dev/advanced_usage.html#advanced.

In [11]:
blob.words

WordList(['When', 'I', 'wrote', 'the', 'following', 'pages', 'or', 'rather', 'the', 'bulk', 'of', 'them', 'I', 'lived', 'alone', 'in', 'the', 'woods', 'a', 'mile', 'from', 'any', 'neighbor', 'in', 'a', 'house', 'which', 'I', 'had', 'built', 'myself', 'on', 'the', 'shore', 'of', 'Walden', 'Pond', 'in', 'Concord', 'Massachusetts', 'and', 'earned', 'my', 'living', 'by', 'the', 'labor', 'of', 'my', 'hands', 'only'])

In [12]:
blob.sentences

[Sentence("When I wrote the following pages, or rather the bulk of them, I lived alone, in the woods, a mile from any neighbor, in a house which I had built myself, on the shore of Walden Pond, in Concord, Massachusetts, and earned my living by the labor of my hands only.")]
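Once the text is split into words, a common next step is counting how often each word occurs. The sketch below does this with the standard library only, using the first few words copied from the blob.words output above, so it runs without TextBlob. The variable names are illustrative. TextBlob also offers a word_counts property for the same purpose.

```python
from collections import Counter

# Words copied from the start of the blob.words output above.
words = [
    "When", "I", "wrote", "the", "following", "pages", "or", "rather",
    "the", "bulk", "of", "them", "I", "lived", "alone", "in", "the", "woods",
]

# Lowercase before counting so that capitalized and lowercase
# forms of the same word share one count.
counts = Counter(word.lower() for word in words)

# Show the two most frequent words with their counts.
print(counts.most_common(2))
```

In the notebook, passing blob.words instead of the hard-coded list should give counts for the whole sentence.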

### Words Inflection and Lemmatization

Each word in TextBlob.words or Sentence.words is a Word object (a subclass of str in Python 3, or unicode in Python 2) with useful methods, e.g. for word inflection.

In [15]:
print(blob.words[5])

pages

In [16]:
print(blob.words[5].singularize())

page

In [19]:
print(blob.words[19])

mile

In [26]:
print(blob.words[19].pluralize())

miles


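Words can be lemmatized by calling the lemmatize method, which takes an optional part-of-speech argument ('v' marks a verb). The stem method instead strips suffixes with a stemmer (the Porter stemmer by default).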

In [27]:
print(blob.words[2])

wrote


In [28]:
print(blob.words[2].lemmatize('v'))

write


In [25]:
print(blob.words[4])

following


In [24]:
print(blob.words[4].stem())

follow


The complete quickstart tutorial on the official TextBlob website is available at: https://textblob.readthedocs.io/en/dev/quickstart.html#quickstart.