## Natural Language Understanding
https://natural-language-understanding-demo.mybluemix.net/

Free 6 months trial for students: https://ibm.onthehub.com and click on "Students".
Offer includes IBM Cloud as well as Watson services

In [1]:
#!pip install watson_developer_cloud

In [2]:
import sys
import os
import json
import re
import pandas as pd
sys.path.append(os.path.join(os.getcwd(),'..'))
import watson_developer_cloud
from watson_developer_cloud import NaturalLanguageUnderstandingV1
from watson_developer_cloud.natural_language_understanding_v1 import Features, EntitiesOptions, KeywordsOptions, ConceptsOptions, RelationsOptions, SentimentOptions, SemanticRolesOptions, EmotionOptions, CategoriesOptions

In [3]:
natural_language_understanding = watson_developer_cloud.NaturalLanguageUnderstandingV1(version='2017-03-02',
                                                            username='username',
                                                            password='password')

https://watson-api-explorer.mybluemix.net/apis/natural-language-understanding-v1

#### Concepts

Identify general concepts that are referenced or alluded to in your content. Concepts that are detected typically have an associated link to a DBpedia resource.

#### Entities
Detect important people, places, geopolitical entities and other types of entities in your content. Entity detection recognizes consecutive coreferences of each entity. For example, analysis of the following text would count "Barack Obama" and "He" as the same entity:
"Barack Obama was the 44th President of the United States. He took office in January 2009."

#### Keywords
Determine the most important keywords in your content. Keyword phrases are organized by relevance in the results.

#### Categories
Categorize your content into a hierarchical 5-level taxonomy. For example, "Leonardo DiCaprio won an Oscar" returns "/art and entertainment/movies and tv/movies" as the most confident classification.

#### Sentiment
Determine whether your content conveys postive or negative sentiment. Sentiment information can be returned for detected entities, keywords, or user-specified target phrases found in the text.

#### Emotion
Detect anger, disgust, fear, joy, or sadness that is conveyed by your content. Emotion information can be returned for detected entities, keywords, or user-specified target phrases found in the text.

#### Relations
Recognize when two entities are related, and identify the type of relation. For example, you can identify an "awardedTo" relation between an award and its recipient.

#### Semantic Roles
Parse sentences into subject-action-object form, and identify entities and keywords that are subjects or objects of an action.

## Analyzing text

In [4]:
text='After overtaking BMW last year as the world’s largest luxury automaker, \
Mercedes-Benz is turning its attention to a newer, untapped audience: \
millennials seeking smaller cars as they progress through the stages of early adulthood.'

In [5]:
response = natural_language_understanding.analyze(
            language = "en", text = text,
            features = Features(
                        entities = EntitiesOptions(sentiment = True, emotion = True),
                        #keywords = KeywordsOptions(sentiment = True, emotion = True),
                        #concepts = ConceptsOptions(limit = 50),
                        #relations = RelationsOptions(),
                        #sentiment = SentimentOptions(),
                        #semantic_roles = SemanticRolesOptions(entities = True, keywords = True),
                        #emotion = EmotionOptions(),
                        #categories = CategoriesOptions()
            ))

print(json.dumps(response, indent=2))

{
  "usage": {
    "text_units": 1,
    "text_characters": 230,
    "features": 1
  },
  "language": "en",
  "entities": [
    {
      "type": "Company",
      "text": "BMW",
      "sentiment": {
        "score": 0.216764,
        "label": "positive"
      },
      "relevance": 0.33,
      "emotion": {
        "sadness": 0.212793,
        "joy": 0.507769,
        "fear": 0.064729,
        "disgust": 0.03569,
        "anger": 0.030211
      },
      "disambiguation": {
        "subtype": [
          "Engine",
          "AutomobileCompany",
          "AwardWinner"
        ],
        "name": "BMW",
        "dbpedia_resource": "http://dbpedia.org/resource/BMW"
      },
      "count": 1
    },
    {
      "type": "Company",
      "text": "Mercedes-Benz",
      "sentiment": {
        "score": 0.736997,
        "label": "positive"
      },
      "relevance": 0.33,
      "emotion": {
        "sadness": 0.375508,
        "joy": 0.255631,
        "fear": 0.137987,
        "disgust": 0.036792,
  

In [6]:
response = natural_language_understanding.analyze(
            text = text,
            features = Features(
                        #entities = EntitiesOptions(sentiment = True, emotion = True),
                        keywords = KeywordsOptions(sentiment = True, emotion = True),
                        #concepts = ConceptsOptions(limit = 50),
                        #relations = RelationsOptions(),
                        #sentiment = SentimentOptions(),
                        #semantic_roles = SemanticRolesOptions(entities = True, keywords = True),
                        #emotion = EmotionOptions(),
                        #categories = CategoriesOptions()
            ))

print(json.dumps(response, indent=2))

{
  "usage": {
    "text_units": 1,
    "text_characters": 230,
    "features": 1
  },
  "language": "en",
  "keywords": [
    {
      "text": "largest luxury automaker",
      "sentiment": {
        "score": 0.216764,
        "label": "positive"
      },
      "relevance": 0.951351,
      "emotion": {
        "sadness": 0.212793,
        "joy": 0.507769,
        "fear": 0.064729,
        "disgust": 0.03569,
        "anger": 0.030211
      }
    },
    {
      "text": "untapped audience",
      "sentiment": {
        "score": 0.769293,
        "label": "positive"
      },
      "relevance": 0.758759,
      "emotion": {
        "sadness": 0.151545,
        "joy": 0.174219,
        "fear": 0.093654,
        "disgust": 0.072945,
        "anger": 0.08686
      }
    },
    {
      "text": "early adulthood",
      "sentiment": {
        "score": 0.583717,
        "label": "positive"
      },
      "relevance": 0.673927,
      "emotion": {
        "sadness": 0.073033,
        "joy": 0.637075

In [7]:
response = natural_language_understanding.analyze(
            text = text,
            features = Features(
                        #entities = EntitiesOptions(sentiment = True, emotion = True),
                        #keywords = KeywordsOptions(sentiment = True, emotion = True),
                        concepts = ConceptsOptions(limit = 50),
                        #relations = RelationsOptions(),
                        #sentiment = SentimentOptions(),
                        #semantic_roles = SemanticRolesOptions(entities = True, keywords = True),
                        #emotion = EmotionOptions(),
                        #categories = CategoriesOptions()
            ))

print(json.dumps(response, indent=2))

{
  "usage": {
    "text_units": 1,
    "text_characters": 230,
    "features": 1
  },
  "language": "en",
  "concepts": [
    {
      "text": "BMW",
      "relevance": 0.888452,
      "dbpedia_resource": "http://dbpedia.org/resource/BMW"
    },
    {
      "text": "Mercedes-Benz",
      "relevance": 0.888266,
      "dbpedia_resource": "http://dbpedia.org/resource/Mercedes-Benz"
    },
    {
      "text": "Automotive industry",
      "relevance": 0.882851,
      "dbpedia_resource": "http://dbpedia.org/resource/Automotive_industry"
    },
    {
      "text": "Luxury vehicle",
      "relevance": 0.852593,
      "dbpedia_resource": "http://dbpedia.org/resource/Luxury_vehicle"
    },
    {
      "text": "Karl Benz",
      "relevance": 0.780259,
      "dbpedia_resource": "http://dbpedia.org/resource/Karl_Benz"
    },
    {
      "text": "Diesel engine",
      "relevance": 0.765713,
      "dbpedia_resource": "http://dbpedia.org/resource/Diesel_engine"
    },
    {
      "text": "World",
    

In [8]:
response = natural_language_understanding.analyze(
            text = text,
            features = Features(
                        #entities = EntitiesOptions(sentiment = True, emotion = True),
                        #keywords = KeywordsOptions(sentiment = True, emotion = True),
                        #concepts = ConceptsOptions(limit = 50),
                        relations = RelationsOptions(),
                        #sentiment = SentimentOptions(),
                        #semantic_roles = SemanticRolesOptions(entities = True, keywords = True),
                        #emotion = EmotionOptions(),
                        categories = CategoriesOptions()))

print(json.dumps(response, indent=2))

{
  "usage": {
    "text_units": 1,
    "text_characters": 230,
    "features": 2
  },
  "relations": [
    {
      "type": "basedIn",
      "sentence": "After overtaking BMW last year as the world's largest luxury automaker, Mercedes-Benz is turning its attention to a newer, untapped audience: millennials seeking smaller cars as they progress through the stages of early adulthood.",
      "score": 0.808415,
      "arguments": [
        {
          "text": "automaker",
          "location": [
            61,
            70
          ],
          "entities": [
            {
              "type": "Organization",
              "text": "Mercedes-Benz",
              "disambiguation": {
                "subtype": [
                  "Commercial"
                ]
              }
            }
          ]
        },
        {
          "text": "world",
          "location": [
            38,
            43
          ],
          "entities": [
            {
              "type": "Geopolitical

In [9]:
response = natural_language_understanding.analyze(
            text = text,
            features = Features(
                        #entities = EntitiesOptions(sentiment = True, emotion = True),
                        #keywords = KeywordsOptions(sentiment = True, emotion = True),
                        #concepts = ConceptsOptions(limit = 50),
                        #relations = RelationsOptions(),
                        sentiment = SentimentOptions(),
                        #semantic_roles = SemanticRolesOptions(entities = True, keywords = True),
                        #emotion = EmotionOptions(),
                        #categories = CategoriesOptions()
            ))

print(json.dumps(response, indent=2))

{
  "usage": {
    "text_units": 1,
    "text_characters": 230,
    "features": 1
  },
  "sentiment": {
    "document": {
      "score": 0.906302,
      "label": "positive"
    }
  },
  "language": "en"
}


In [10]:
response = natural_language_understanding.analyze(
            text = text,
            features = Features(
                        #entities = EntitiesOptions(sentiment = True, emotion = True),
                        #keywords = KeywordsOptions(sentiment = True, emotion = True),
                        #concepts = ConceptsOptions(limit = 50),
                        #relations = RelationsOptions(),
                        #sentiment = SentimentOptions(),
                        semantic_roles = SemanticRolesOptions(entities = True, keywords = True),
                        #emotion = EmotionOptions(),
                        #categories = CategoriesOptions()
            ))

print(json.dumps(response, indent=2))

{
  "usage": {
    "text_units": 1,
    "text_characters": 230,
    "features": 1
  },
  "semantic_roles": [
    {
      "subject": {
        "text": "the world",
        "keywords": [
          {
            "text": "world"
          }
        ]
      },
      "sentence": "After overtaking BMW last year as the world\u2019s largest luxury automaker, Mercedes-Benz is turning its attention to a newer, untapped audience: millennials seeking smaller cars as they progress through the stages of early adulthood.",
      "object": {
        "text": "largest luxury automaker",
        "keywords": [
          {
            "text": "largest luxury automaker"
          }
        ]
      },
      "action": {
        "verb": {
          "text": "has",
          "tense": "present"
        },
        "text": "s",
        "normalized": "s"
      }
    },
    {
      "subject": {
        "text": "Mercedes-Benz",
        "keywords": [
          {
            "text": "Mercedes-Benz"
          }
        ],

In [11]:
response = natural_language_understanding.analyze(
            text = text,
            features = Features(
                        #entities = EntitiesOptions(sentiment = True, emotion = True),
                        #keywords = KeywordsOptions(sentiment = True, emotion = True),
                        #concepts = ConceptsOptions(limit = 50),
                        #relations = RelationsOptions(),
                        #sentiment = SentimentOptions(),
                        #semantic_roles = SemanticRolesOptions(entities = True, keywords = True),
                        emotion = EmotionOptions(),
                        #categories = CategoriesOptions()
            ))

print(json.dumps(response, indent=2))

{
  "usage": {
    "text_units": 1,
    "text_characters": 230,
    "features": 1
  },
  "language": "en",
  "emotion": {
    "document": {
      "emotion": {
        "sadness": 0.19455,
        "joy": 0.595719,
        "fear": 0.069535,
        "disgust": 0.050423,
        "anger": 0.046226
      }
    }
  }
}


In [12]:
response = natural_language_understanding.analyze(
            text = text,
            features = Features(
                        #entities = EntitiesOptions(sentiment = True, emotion = True),
                        #keywords = KeywordsOptions(sentiment = True, emotion = True),
                        #concepts = ConceptsOptions(limit = 50),
                        #relations = RelationsOptions(),
                        #sentiment = SentimentOptions(),
                        #semantic_roles = SemanticRolesOptions(entities = True, keywords = True),
                        #emotion = EmotionOptions(),
                        categories = CategoriesOptions()
            ))

print(json.dumps(response, indent=2))

{
  "usage": {
    "text_units": 1,
    "text_characters": 230,
    "features": 1
  },
  "language": "en",
  "categories": [
    {
      "score": 0.554889,
      "label": "/automotive and vehicles/cars"
    },
    {
      "score": 0.152065,
      "label": "/automotive and vehicles/vehicle brands/mercedes-benz"
    },
    {
      "score": 0.130826,
      "label": "/business and industrial"
    }
  ]
}


## Analyzing URLS

In [13]:
url='http://www.chicagotribune.com/news/local/breaking/ct-fbi-chase-bank-branch-robbed-on-far-south-side-20170524-story.html'

response = natural_language_understanding.analyze(
            url = url,
            features = Features(
                        entities = EntitiesOptions(sentiment = True, emotion = True),
                        keywords = KeywordsOptions(sentiment = True, emotion = True),
                        concepts = ConceptsOptions(limit = 50),
                        #relations = RelationsOptions(),
                        #sentiment = SentimentOptions(),
                        #semantic_roles = SemanticRolesOptions(entities = True, keywords = True),
                        #emotion = EmotionOptions(),
                        #categories = CategoriesOptions()
            ))

print(json.dumps(response, indent=2))

{
  "usage": {
    "text_units": 1,
    "text_characters": 811,
    "features": 3
  },
  "retrieved_url": "http://www.chicagotribune.com/news/local/breaking/ct-fbi-chase-bank-branch-robbed-on-far-south-side-20170524-story.html",
  "language": "en",
  "keywords": [
    {
      "text": "would-be robber",
      "sentiment": {
        "score": 0.0,
        "label": "neutral"
      },
      "relevance": 0.97343,
      "emotion": {
        "sadness": 0.23608,
        "joy": 0.086439,
        "fear": 0.177302,
        "disgust": 0.416375,
        "anger": 0.214233
      }
    },
    {
      "text": "Calumet Heights neighborhood",
      "sentiment": {
        "score": -0.567525,
        "label": "negative"
      },
      "relevance": 0.879791,
      "emotion": {
        "sadness": 0.376757,
        "joy": 0.307912,
        "fear": 0.100655,
        "disgust": 0.191158,
        "anger": 0.098523
      }
    },
    {
      "text": "Chase Bank",
      "sentiment": {
        "score": -0.567525,
  

In [14]:
response = natural_language_understanding.analyze(
            url = url,
            features = Features(
                        #entities = EntitiesOptions(sentiment = True, emotion = True),
                        #keywords = KeywordsOptions(sentiment = True, emotion = True),
                        #concepts = ConceptsOptions(limit = 50),
                        #relations = RelationsOptions(),
                        #sentiment = SentimentOptions(),
                        semantic_roles = SemanticRolesOptions(entities = True, keywords = True),
                        #emotion = EmotionOptions(),
                        #categories = CategoriesOptions()
            ))

print(json.dumps(response, indent=2))

{
  "usage": {
    "text_units": 1,
    "text_characters": 811,
    "features": 1
  },
  "semantic_roles": [
    {
      "subject": {
        "text": "A man",
        "keywords": [
          {
            "text": "man"
          }
        ]
      },
      "sentence": "A man tried to rob a Chase Bank branch Wednesday morning in the Calumet Heights neighborhood on the Far South Side, the FBI said.",
      "object": {
        "text": "to rob a Chase Bank branch",
        "keywords": [
          {
            "text": "Chase Bank branch"
          }
        ],
        "entities": [
          {
            "type": "Person",
            "text": "rob"
          },
          {
            "type": "Company",
            "text": "Chase Bank",
            "disambiguation": {
              "subtype": [
                "Company"
              ],
              "name": "Chase (bank)",
              "dbpedia_resource": "http://dbpedia.org/resource/Chase_(bank)"
            }
          }
        ]
     

### Semantically difficult examples

In [15]:
url='http://www.cnbc.com/2017/05/11/caterpillar-earnings-will-beat-all-year-long-bank-of-america-says.html'

response = natural_language_understanding.analyze(
            url = url,
            features = Features(
                        #entities = EntitiesOptions(sentiment = True, emotion = True),
                        #keywords = KeywordsOptions(sentiment = True, emotion = True),
                        concepts = ConceptsOptions(limit = 50),
                        #relations = RelationsOptions(),
                        #sentiment = SentimentOptions(),
                        #semantic_roles = SemanticRolesOptions(entities = True, keywords = True),
                        #emotion = EmotionOptions(),
                        #categories = CategoriesOptions()
            ))

print(json.dumps(response, indent=2))

{
  "usage": {
    "text_units": 1,
    "text_characters": 1037,
    "features": 1
  },
  "retrieved_url": "https://www.cnbc.com/2017/05/11/caterpillar-earnings-will-beat-all-year-long-bank-of-america-says.html",
  "language": "en",
  "concepts": [
    {
      "text": "Bank of America",
      "relevance": 0.955122,
      "dbpedia_resource": "http://dbpedia.org/resource/Bank_of_America"
    },
    {
      "text": "Merrill Lynch",
      "relevance": 0.88865,
      "dbpedia_resource": "http://dbpedia.org/resource/Merrill_Lynch"
    },
    {
      "text": "Private banking",
      "relevance": 0.571594,
      "dbpedia_resource": "http://dbpedia.org/resource/Private_banking"
    },
    {
      "text": "Stock",
      "relevance": 0.551895,
      "dbpedia_resource": "http://dbpedia.org/resource/Stock"
    },
    {
      "text": "Troubled Asset Relief Program",
      "relevance": 0.485637,
      "dbpedia_resource": "http://dbpedia.org/resource/Troubled_Asset_Relief_Program"
    },
    {
      "

In [16]:
text="Caterpillar earnings will 'beat all year long,' Bank of America says"

response = natural_language_understanding.analyze(
            text = text,
            features = Features(
                        #entities = EntitiesOptions(sentiment = True, emotion = True),
                        #keywords = KeywordsOptions(sentiment = True, emotion = True),
                        #concepts = ConceptsOptions(limit = 50),
                        #relations = RelationsOptions(),
                        #sentiment = SentimentOptions(),
                        semantic_roles = SemanticRolesOptions(entities = True, keywords = True)
                        #emotion = EmotionOptions(),
                        #categories = CategoriesOptions()
            ))

print(json.dumps(response, indent=2))

{
  "usage": {
    "text_units": 1,
    "text_characters": 68,
    "features": 1
  },
  "semantic_roles": [
    {
      "subject": {
        "text": "Caterpillar earnings",
        "keywords": [
          {
            "text": "Caterpillar earnings"
          }
        ],
        "entities": [
          {
            "type": "Company",
            "text": "Caterpillar"
          }
        ]
      },
      "sentence": "Caterpillar earnings will 'beat all year long,' Bank of America says",
      "action": {
        "verb": {
          "text": "'beat",
          "tense": "future"
        },
        "text": "will 'beat",
        "normalized": "will 'beat"
      }
    },
    {
      "subject": {
        "text": "Bank of America",
        "keywords": [
          {
            "text": "Bank"
          },
          {
            "text": "America"
          }
        ],
        "entities": [
          {
            "type": "Company",
            "text": "Bank of America"
          }
        ]
 

## Expect the explosion of data volumes

In [17]:
url='https://en.wikipedia.org/wiki/University_of_Chicago'

response = natural_language_understanding.analyze(
            url = url,
            features = Features(
                        entities = EntitiesOptions(sentiment = True, emotion = True),
                        keywords = KeywordsOptions(sentiment = True, emotion = True),
                        concepts = ConceptsOptions(limit = 50),
                        relations = RelationsOptions(),
                        sentiment = SentimentOptions(),
                        semantic_roles = SemanticRolesOptions(entities = True, keywords = True),
                        emotion = EmotionOptions(),
                        categories = CategoriesOptions()
            ))

print(json.dumps(response, indent=2))

{
  "usage": {
    "text_units": 5,
    "text_characters": 50000,
    "features": 8
  },
  "sentiment": {
    "document": {
      "score": -0.306135,
      "label": "negative"
    }
  },
  "semantic_roles": [
    {
      "subject": {
        "text": "University of Illinois at Chicago",
        "keywords": [
          {
            "text": "Illinois"
          },
          {
            "text": "Chicago"
          }
        ],
        "entities": [
          {
            "type": "Organization",
            "text": "University of Illinois",
            "disambiguation": {
              "subtype": [
                "Location",
                "AcademicInstitution",
                "CollegeUniversity"
              ],
              "name": "University of Illinois at Urbana\u2013Champaign",
              "dbpedia_resource": "http://dbpedia.org/resource/University_of_Illinois_at_Urbana\u2013Champaign"
            }
          },
          {
            "type": "Location",
            "text":

## Running Tweets through NLU

In [18]:
directory = 'C://Users//IBM_ADMIN//Documents//Teaching//Data Projects//Text//Tweets//'
    
file = 'jeep_new.txt'
path = directory + file

In [19]:
tweets = pd.read_csv(path,sep='\t', names = ['id', 'lang', 'created_at', 'screen_name', \
                                                       'name', 'location', 'retweet_count', 'text'])

tweets = tweets[tweets['text'].str.contains("love|hate", case=False)==True]

tweets = tweets.sample(n=1000)

In [20]:
tweets.head(5)

Unnamed: 0,id,lang,created_at,screen_name,name,location,retweet_count,text
41958,9.238661e+17,en,Fri Oct 27 10:56:46 +0000 2017,harDCor_barra,Emperor Grunfeld,,0.0,@JimmyDonofrio @md_dc I love the touch of ligh...
13614,9.225938e+17,en,Mon Oct 23 22:41:12 +0000 2017,FieldsCJDR,FieldsCJDR,"Sanford, FL",0.0,A royal pair. 📸: Victoria D. #jeep #itsajeepth...
56150,9.230461e+17,en,Wed Oct 25 04:38:27 +0000 2017,danticvs,danticvs,FLA,0.0,📷 bertmacklin-atf: The Kaiser-Jeep 1969 Bolide...
66256,9.228317e+17,en,Tue Oct 24 14:26:43 +0000 2017,morgan_xc17,Morgan Evans,"Locust Grove,GA ✈️Deland, FL",0.0,RT @JamieXC16: Happy Birthday to the only pers...
10136,9.220377e+17,en,Sun Oct 22 09:51:28 +0000 2017,Addicted2Emison,wayhaught earper,"Vancouver, British Columbia",0.0,"RT @nowhere897: I don’t know why, but I’d love..."


In [21]:
#Take a small sample
#tweets_eng = tweets[tweets['lang']=='en'].reset_index(drop=True).sample(frac=0.01, replace=True)
tweets_eng = tweets[tweets['lang']=='en'].reset_index(drop=True).head(10)

In [22]:
len(tweets_eng)

10

In [23]:
# For NLU we DO NOT have to remove special characters to avoid problems with analysis
tweets_eng['text_clean'] = tweets_eng['text'].map(lambda x: re.sub('[^a-zA-Z0-9 @ . , : - _]', '', str(x)))

In [24]:
pd.set_option('display.max_colwidth', 150)
pd.set_option('display.max_rows', 10)
tweets_eng[['text']].head(5)

Unnamed: 0,text
0,@JimmyDonofrio @md_dc I love the touch of light brown for the Jeep interior
1,A royal pair. 📸: Victoria D. #jeep #itsajeepthing #jeeplove #jeeplife #wrangler #jeepwrangler #jeepporn #jeepfamily… https://t.co/nYHFjlNght
2,📷 bertmacklin-atf: The Kaiser-Jeep 1969 Bolide XJ-002 Concept. I am in love. Want one! #jeep https://t.co/Kk0h0BQFry
3,RT @JamieXC16: Happy Birthday to the only person I would listen to loud country music in a muddy Jeep with. Love you Morgan!!💕😁…
4,"RT @nowhere897: I don’t know why, but I’d love to see #WayHaught driving around in Waverly’s Jeep"


In [25]:
# Read individual columns
raw_id = tweets_eng["id"].values
raw_text = tweets_eng["text"].values

In [26]:
raw_text

array([ '@JimmyDonofrio @md_dc I love the touch of light brown for the Jeep interior',
       'A royal pair. 📸: Victoria D. #jeep #itsajeepthing #jeeplove #jeeplife #wrangler #jeepwrangler #jeepporn #jeepfamily… https://t.co/nYHFjlNght',
       '📷 bertmacklin-atf: The Kaiser-Jeep 1969 Bolide XJ-002 Concept. I am in love. Want one! #jeep https://t.co/Kk0h0BQFry',
       'RT @JamieXC16: Happy Birthday to the only person I would listen to loud country music in a muddy Jeep with. Love you Morgan!!💕😁…',
       'RT @nowhere897: I don’t know why, but I’d love to see #WayHaught driving around in Waverly’s Jeep',
       'See that black classic jeep in the background... Lovely 😍 😍 😘 https://t.co/y2D0AjS3AH',
       'Needs a little love but hey ya girls got a jeep again https://t.co/Q2Vfvj84Xp',
       'I love my jeep but the amount of gas this thing burns thru is ridiculous',
       'RT @jeepfederation: You love your Jeep but also want to drive a Tank! #wranglerLIFE #beatank https://t.co/PXNbTNm

In [27]:
# Set up lists to collect sentiments, and scores
sentiment_list = []
score_list = []

# carry over existing values
tweet_id = []
tweet_list = []

In [28]:
count = 0
for raw_id, raw_text in \
zip(raw_id, raw_text):
    response = natural_language_understanding.analyze(
        language = "en", #do not NLU to guess language on Tweets
        text=raw_text,
        
            features = Features(
                        #entities = EntitiesOptions(sentiment = True, emotion = True),
                        #keywords = KeywordsOptions(sentiment = True, emotion = True),
                        #concepts = ConceptsOptions(limit = 50),
                        #relations = RelationsOptions(),
                        sentiment = SentimentOptions(),
                        #semantic_roles = SemanticRolesOptions(entities = True, keywords = True),
                        #emotion = EmotionOptions(),
                        #categories = CategoriesOptions()
            ))

    if "label" in response["sentiment"]["document"]:
        sentiment_list.append(str(response["sentiment"]["document"]["label"]))
    else:
        sentiment_list.append("NA") 

    if "score" in response["sentiment"]["document"]:
        score_list.append(str(response["sentiment"]["document"]["score"]))
    else:
        score_list.append("0") 

    tweet_list.append(raw_text)
    tweet_id.append(raw_id)
    #print(count)
    #print(tweet_list)
    #print(tweet_id)
    count += 1

#### Dealing with nested JSONs

In [29]:
response

{'language': 'en',
 'sentiment': {'document': {'label': 'positive', 'score': 0.883441}},
 'usage': {'features': 1, 'text_characters': 110, 'text_units': 1}}

In [30]:
response["sentiment"]

{'document': {'label': 'positive', 'score': 0.883441}}

In [31]:
response["sentiment"]["document"]

{'label': 'positive', 'score': 0.883441}

In [32]:
response["sentiment"]["document"]["label"]

'positive'

In [33]:
response["sentiment"]["document"]["score"]

0.883441

#### Combining results together

In [34]:
# Create a dataframe from the results
column_names = ["Tweet_Id", "Tweet_Text", "NLU_Sentiment", "NLU_Score"]
nlu_results = [tweet_id, tweet_list, sentiment_list, score_list]
results_dict = dict(zip(column_names,nlu_results))
results = pd.DataFrame.from_dict(results_dict, orient='columns')
results = results[column_names]   # set specific column order

In [35]:
results.head(10)

Unnamed: 0,Tweet_Id,Tweet_Text,NLU_Sentiment,NLU_Score
0,9.238661e+17,@JimmyDonofrio @md_dc I love the touch of light brown for the Jeep interior,positive,0.734282
1,9.225938e+17,A royal pair. 📸: Victoria D. #jeep #itsajeepthing #jeeplove #jeeplife #wrangler #jeepwrangler #jeepporn #jeepfamily… https://t.co/nYHFjlNght,neutral,0.0
2,9.230461e+17,📷 bertmacklin-atf: The Kaiser-Jeep 1969 Bolide XJ-002 Concept. I am in love. Want one! #jeep https://t.co/Kk0h0BQFry,positive,0.853089
3,9.228317e+17,RT @JamieXC16: Happy Birthday to the only person I would listen to loud country music in a muddy Jeep with. Love you Morgan!!💕😁…,positive,0.767229
4,9.220377e+17,"RT @nowhere897: I don’t know why, but I’d love to see #WayHaught driving around in Waverly’s Jeep",neutral,0.0
5,9.227348e+17,See that black classic jeep in the background... Lovely 😍 😍 😘 https://t.co/y2D0AjS3AH,positive,0.673325
6,9.233692e+17,Needs a little love but hey ya girls got a jeep again https://t.co/Q2Vfvj84Xp,neutral,0.0
7,9.23657e+17,I love my jeep but the amount of gas this thing burns thru is ridiculous,negative,-0.705518
8,9.226811e+17,RT @jeepfederation: You love your Jeep but also want to drive a Tank! #wranglerLIFE #beatank https://t.co/PXNbTNmqHw,neutral,0.0
9,9.230392e+17,@lindarutter You go Dad! Nice looking ride for a deserving patriot. We are a two Jeep family - love our Jeeps!,positive,0.883441


In [36]:
# Data Export
outfile = "jeep_with_sentiment.xlsx"
outpath = directory + outfile

writer = pd.ExcelWriter(outpath, engine = "xlsxwriter")
results.to_excel(writer)
writer.save() 