<a href="https://colab.research.google.com/github/junting-huang/data_storytelling/blob/main/case_1_narrative.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# case_1. narrative


TextBlob is a Python library for processing textual data. It provides a simple API for common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, and translation. More information can be found at https://textblob.readthedocs.io/en/dev/. This tutorial is a quick introduction to the TextBlob package.

## 1.1 installation

In [None]:
! pip install textblob

In [None]:
! python -m textblob.download_corpora

## 1.2 basic usage

TextBlob aims to provide access to common text-processing operations through a familiar interface. You can treat TextBlob objects as if they were Python strings that learned how to do Natural Language Processing.

First, the import.

In [1]:
from textblob import TextBlob

Let’s create our first TextBlob.

In [2]:
blob = TextBlob("When I wrote the following pages, or rather the bulk of them, I lived alone, in the woods, a mile from any neighbor, in a house which I had built myself, on the shore of Walden Pond, in Concord, Massachusetts, and earned my living by the labor of my hands only. ")

### Part-of-speech Tagging

Part-of-speech tagging is a natural language processing task that involves assigning a grammatical category (such as noun, verb, adjective, etc.) to each word in a text. This process is essential for several reasons:

* Syntax Analysis: Part-of-speech tagging helps in understanding the syntactic structure of a sentence. Identifying the part of speech of each word allows for the analysis of how words relate to each other in a grammatical sense.

* Semantic Analysis: It aids in understanding the meaning of words in context. Different parts of speech convey different semantic roles, and knowing the part of speech can provide insights into the intended meaning of a word.

* Information Retrieval: Part-of-speech tagging is crucial in information retrieval systems. It enables more accurate and relevant searches by considering the grammatical roles of words in queries and documents.


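Part-of-speech tags can be accessed through the `tags` property, which returns a list of (word, tag) tuples. The tags follow the Penn Treebank convention: for example, `VBD` marks a past-tense verb and `NNP` a singular proper noun.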

In [3]:
print(blob.tags)

[('When', 'WRB'), ('I', 'PRP'), ('wrote', 'VBD'), ('the', 'DT'), ('following', 'JJ'), ('pages', 'NNS'), ('or', 'CC'), ('rather', 'RB'), ('the', 'DT'), ('bulk', 'NN'), ('of', 'IN'), ('them', 'PRP'), ('I', 'PRP'), ('lived', 'VBD'), ('alone', 'RB'), ('in', 'IN'), ('the', 'DT'), ('woods', 'NNS'), ('a', 'DT'), ('mile', 'NN'), ('from', 'IN'), ('any', 'DT'), ('neighbor', 'NN'), ('in', 'IN'), ('a', 'DT'), ('house', 'NN'), ('which', 'WDT'), ('I', 'PRP'), ('had', 'VBD'), ('built', 'VBN'), ('myself', 'PRP'), ('on', 'IN'), ('the', 'DT'), ('shore', 'NN'), ('of', 'IN'), ('Walden', 'NNP'), ('Pond', 'NNP'), ('in', 'IN'), ('Concord', 'NNP'), ('Massachusetts', 'NNP'), ('and', 'CC'), ('earned', 'VBD'), ('my', 'PRP$'), ('living', 'NN'), ('by', 'IN'), ('the', 'DT'), ('labor', 'NN'), ('of', 'IN'), ('my', 'PRP$'), ('hands', 'NNS'), ('only', 'RB')]
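Since `tags` is a list of tuples, ordinary Python tools can summarize it. The following is a minimal sketch, not part of the TextBlob API: it copies the first fourteen pairs of the output above into a plain list (so it runs without TextBlob installed), counts how often each tag occurs, and groups the words by tag. The names `tag_counts` and `words_by_tag` are chosen here for illustration.

```python
from collections import Counter, defaultdict

# First fourteen (word, tag) pairs from the output above, copied as a
# plain list so the sketch runs without TextBlob installed.
tags = [
    ("When", "WRB"), ("I", "PRP"), ("wrote", "VBD"), ("the", "DT"),
    ("following", "JJ"), ("pages", "NNS"), ("or", "CC"), ("rather", "RB"),
    ("the", "DT"), ("bulk", "NN"), ("of", "IN"), ("them", "PRP"),
    ("I", "PRP"), ("lived", "VBD"),
]

# How often does each tag occur?
tag_counts = Counter(tag for _, tag in tags)

# Which words carry each tag, in order of appearance?
words_by_tag = defaultdict(list)
for word, tag in tags:
    words_by_tag[tag].append(word)

print(tag_counts.most_common(3))
print(words_by_tag["VBD"])  # past-tense verbs
print(words_by_tag["PRP"])  # personal pronouns
```

With a real blob, the same code should work if the literal list is replaced by `blob.tags`.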


### Noun Phrase Extraction

Noun phrase extraction is a natural language processing (NLP) task that involves identifying and extracting noun phrases from a given text. A noun phrase is a group of words that function as a unit and includes a noun (the head) along with its modifiers. The process of noun phrase extraction serves several important purposes:

* Semantic Analysis: Noun phrases often represent meaningful units of information in a sentence. Extracting them helps in understanding the key entities and concepts discussed in the text, contributing to semantic analysis.

* Information Retrieval: Noun phrases play a crucial role in information retrieval systems. By extracting relevant noun phrases from documents, search engines can improve the accuracy of search results and help users find information more effectively.

* Named Entity Recognition (NER): Noun phrase extraction is closely related to named entity recognition. Many named entities, such as people, organizations, and locations, are often part of noun phrases. Extracting noun phrases can be a preliminary step in identifying and categorizing named entities.


Similarly, noun phrases are accessed through the noun_phrases property.

In [4]:
print(blob.noun_phrases)

['walden pond', 'concord', 'massachusetts']
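To illustrate the link between noun phrases and named entities, the sketch below joins runs of consecutive `NNP` (proper noun) tags into candidate names. This is a simple heuristic written for this tutorial, not how TextBlob extracts noun phrases and not a real named entity recognizer. The helper `proper_noun_runs` is hypothetical, and the input is a slice of the tagging output from earlier, copied as a plain list.

```python
from itertools import groupby

# A contiguous slice of the (word, tag) pairs from the tagging output.
tags = [
    ("on", "IN"), ("the", "DT"), ("shore", "NN"), ("of", "IN"),
    ("Walden", "NNP"), ("Pond", "NNP"), ("in", "IN"),
    ("Concord", "NNP"), ("Massachusetts", "NNP"), ("and", "CC"),
]

def proper_noun_runs(tagged):
    """Join each run of consecutive NNP-tagged words into one candidate name."""
    runs = []
    for is_proper, group in groupby(tagged, key=lambda pair: pair[1] == "NNP"):
        if is_proper:
            runs.append(" ".join(word for word, _ in group))
    return runs

candidates = proper_noun_runs(tags)
print(candidates)
```

Because the tag list drops punctuation, this heuristic merges "Concord" and "Massachusetts" into a single candidate, whereas `noun_phrases` above keeps them separate. Tags alone are only a rough first step.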


### Tokenization

You can break TextBlobs into words or sentences. Sentence objects have the same properties and methods as TextBlobs. For more advanced tokenization, see the Advanced Usage guide: https://textblob.readthedocs.io/en/dev/advanced_usage.html#advanced.

In [5]:
blob.words

WordList(['When', 'I', 'wrote', 'the', 'following', 'pages', 'or', 'rather', 'the', 'bulk', 'of', 'them', 'I', 'lived', 'alone', 'in', 'the', 'woods', 'a', 'mile', 'from', 'any', 'neighbor', 'in', 'a', 'house', 'which', 'I', 'had', 'built', 'myself', 'on', 'the', 'shore', 'of', 'Walden', 'Pond', 'in', 'Concord', 'Massachusetts', 'and', 'earned', 'my', 'living', 'by', 'the', 'labor', 'of', 'my', 'hands', 'only'])

In [6]:
blob.sentences

[Sentence("When I wrote the following pages, or rather the bulk of them, I lived alone, in the woods, a mile from any neighbor, in a house which I had built myself, on the shore of Walden Pond, in Concord, Massachusetts, and earned my living by the labor of my hands only.")]
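A `WordList` behaves like a Python list, so standard tools such as `collections.Counter` apply to it directly. The sketch below assumes a plain list copied from the `blob.words` output above (so it runs without TextBlob installed) and computes a case-insensitive word frequency count.

```python
from collections import Counter

# The tokens from blob.words above, rebuilt as a plain list of strings.
words = (
    "When I wrote the following pages or rather the bulk of them I lived "
    "alone in the woods a mile from any neighbor in a house which I had "
    "built myself on the shore of Walden Pond in Concord Massachusetts and "
    "earned my living by the labor of my hands only"
).split()

# Case-insensitive frequency count over all tokens.
freq = Counter(word.lower() for word in words)

print(len(words))           # number of tokens
print(freq.most_common(4))  # the four most frequent tokens
```

Function words such as "the" and "in" dominate the counts, which is why frequency analyses often filter them out first.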

### Words Inflection and Lemmatization

Each word in `TextBlob.words` or `Sentence.words` is a `Word` object (a subclass of `str`) with useful methods, e.g. for word inflection.

In [7]:
print(blob.words[5])

pages


In [8]:
print(blob.words[5].singularize())

page


In [9]:
print(blob.words[19])

mile


In [10]:
print(blob.words[19].pluralize())

miles


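Words can be lemmatized by calling the `lemmatize` method, which takes an optional part-of-speech argument (`'v'` for verb below). The `stem` method, shown in the last two cells, is a different operation: it strips suffixes with a Porter stemmer instead of looking up a dictionary form.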

In [11]:
print(blob.words[2])

wrote


In [12]:
print(blob.words[2].lemmatize('v'))

write


In [13]:
print(blob.words[4])

following


In [14]:
print(blob.words[4].stem())

follow


The complete quickstart tutorial on the official TextBlob website is available at https://textblob.readthedocs.io/en/dev/quickstart.html#quickstart.

## 1.3 labeling agent/action