# WDP Developer Day: Analyze Facebook Data Using IBM Watson and IBM Data Platform

This Python notebook leverages the Natural Language analysis capability of Watson Tone Analyzer to help interpret emotional intent from a collection of Facebook posts regarding one of Acme’s largest Outdoor Protection product lines (Sunscreen). Read through each step of the notebook to understand what is taking place and run each cell step-by-step. You do not need to add any code to the notebook.
 
At a high level, the notebook performs the following actions:
1. The Watson Tone Analyzer Service pulls out `Emotion Tones` and related `Keywords` within each Facebook post.

2. Data is prepared for analysis and visualization. Pandas DataFrames will contain the results of the analysis.

3. PixieDust, a visualization library, is used to analyze the data contained in the DataFrames and visualize the results.


### Step 1. Install packages

In [43]:
!pip install --upgrade watson-developer-cloud

Requirement already up-to-date: watson-developer-cloud in /gpfs/global_fs01/sym_shared/YPProdSpark/user/sb4b-e211cfa7dc4fc9-a26c58fb4b3e/.local/lib/python3.5/site-packages
Requirement already up-to-date: pysolr<4.0,>=3.3 in /gpfs/global_fs01/sym_shared/YPProdSpark/user/sb4b-e211cfa7dc4fc9-a26c58fb4b3e/.local/lib/python3.5/site-packages (from watson-developer-cloud)
Requirement already up-to-date: requests<3.0,>=2.0 in /gpfs/global_fs01/sym_shared/YPProdSpark/user/sb4b-e211cfa7dc4fc9-a26c58fb4b3e/.local/lib/python3.5/site-packages (from watson-developer-cloud)
Requirement already up-to-date: pyOpenSSL>=16.2.0 in /gpfs/global_fs01/sym_shared/YPProdSpark/user/sb4b-e211cfa7dc4fc9-a26c58fb4b3e/.local/lib/python3.5/site-packages (from watson-developer-cloud)
Requirement already up-to-date: urllib3<1.23,>=1.21.1 in /gpfs/global_fs01/sym_shared/YPProdSpark/user/sb4b-e211cfa7dc4fc9-a26c58fb4b3e/.local/lib/python3.5/site-packages (from requests<3.0,>=2.0->watson-developer-cloud)
Requirement alre

<a id="pixie"></a>
### Step 2. Install PixieDust for visualization
If you are new to PixieDust or would like to learn more about the library, please go to this [Introductory Notebook](https://apsportal.ibm.com/exchange/public/entry/view/5b000ed5abda694232eb5be84c3dd7c1) or visit the [PixieDust GitHub](https://ibm-cds-labs.github.io/pixiedust/). The `Setup` section for this notebook uses instructions from the [Intro To PixieDust](https://github.com/ibm-cds-labs/pixiedust/blob/master/notebook/Intro%20to%20PixieDust.ipynb) notebook.

In [44]:
!pip install --user --upgrade pixiedust

Requirement already up-to-date: pixiedust in /gpfs/global_fs01/sym_shared/YPProdSpark/user/sb4b-e211cfa7dc4fc9-a26c58fb4b3e/.local/lib/python3.5/site-packages
Requirement already up-to-date: lxml in /gpfs/global_fs01/sym_shared/YPProdSpark/user/sb4b-e211cfa7dc4fc9-a26c58fb4b3e/.local/lib/python3.5/site-packages (from pixiedust)
Requirement already up-to-date: geojson in /gpfs/global_fs01/sym_shared/YPProdSpark/user/sb4b-e211cfa7dc4fc9-a26c58fb4b3e/.local/lib/python3.5/site-packages (from pixiedust)
Requirement already up-to-date: mpld3 in /gpfs/global_fs01/sym_shared/YPProdSpark/user/sb4b-e211cfa7dc4fc9-a26c58fb4b3e/.local/lib/python3.5/site-packages (from pixiedust)


<a id="setup2"></a>
### Step 3. Import Packages and Libraries
To check whether a package is already installed, open a new cell and run `help('package_name')`.
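
As a complement to `help()`, availability can also be checked programmatically. The sketch below relies on the standard library's `importlib.util.find_spec`; the helper name `is_installed` is hypothetical and not part of the notebook, and it is only meant for top-level package names.

```python
import importlib.util


def is_installed(package_name):
    """Return True if the top-level package can be found in this environment."""
    return importlib.util.find_spec(package_name) is not None


# Report the status of the packages this notebook imports
for name in ["json", "pandas", "pixiedust", "watson_developer_cloud"]:
    print(name, "installed" if is_installed(name) else "missing")
```

Any name reported as missing can then be installed with `!pip install` as in Steps 1 and 2.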

In [45]:
import json
import sys
import watson_developer_cloud
from watson_developer_cloud import ToneAnalyzerV3, VisualRecognitionV3
import watson_developer_cloud.natural_language_understanding.features.v1 as features

import operator
from functools import reduce
from io import StringIO
import numpy as np
# from bs4 import BeautifulSoup as bs
from operator import itemgetter
from os.path import join, dirname
import pandas as pd
import requests
import pixiedust

<a id='setup3'></a>
### Step 4. Insert Service Credentials From Bluemix for Watson Tone Analyzer

In [46]:
tone_analyzer = ToneAnalyzerV3(version='2016-05-19',
                               username='8c8b2ece-1a91-4f0a-a304-aa6f1fc79b77',
                               password='WsNuIErKBNzU')
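
Hard-coded credentials are easy to leak when a notebook is shared. One option, sketched here on the assumption that the credentials have been exported as environment variables, is to read them at run time. The variable names `TONE_ANALYZER_USERNAME` and `TONE_ANALYZER_PASSWORD` and the helper `tone_analyzer_credentials` are hypothetical.

```python
import os


def tone_analyzer_credentials(env=None):
    """Build the keyword arguments for ToneAnalyzerV3 from environment variables.

    The variable names below are illustrative assumptions, not a Watson convention.
    """
    env = os.environ if env is None else env
    required = ("TONE_ANALYZER_USERNAME", "TONE_ANALYZER_PASSWORD")
    missing = [key for key in required if key not in env]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {
        "version": "2016-05-19",
        "username": env["TONE_ANALYZER_USERNAME"],
        "password": env["TONE_ANALYZER_PASSWORD"],
    }


# Demonstration with a plain dict standing in for os.environ
demo = tone_analyzer_credentials(
    {"TONE_ANALYZER_USERNAME": "my-user", "TONE_ANALYZER_PASSWORD": "my-pass"}
)
print(sorted(demo))
```

In the notebook, the result could be passed on as `ToneAnalyzerV3(**tone_analyzer_credentials())`.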

### Step 5. Connect to the Db2 Warehouse on Cloud instance that contains Acme's Facebook comments

In [47]:
from ibmdbpy import IdaDataBase, IdaDataFrame

# @hidden_cell
# This connection object is used to access your data and contains your credentials.
# You might want to remove those credentials before you share your notebook.
idadb_abe8608cb7a84abf8ea60a002a751df5 = IdaDataBase(dsn='DASHDB;Database=BLUDB;Hostname=dashdb-entry-yp-dal09-08.services.dal.bluemix.net;Port=50000;PROTOCOL=TCPIP;UID=dash6285;PWD=lX2Vd~9l@lXL')

data_df_5 = IdaDataFrame(idadb_abe8608cb7a84abf8ea60a002a751df5, 'DASH6285.ACME_FACEBOOK_EXPORT').as_dataframe()
data_df_5.head()

# You can close the database connection with the following code. Please keep the comment line with the @hidden_cell tag,
# because the close function displays parts of the credentials.
# @hidden_cell
# idadb_abe8608cb7a84abf8ea60a002a751df5.close()
# To learn more about the ibmdbpy package, please read the documentation: http://pythonhosted.org/ibmdbpy/


Unnamed: 0,POST_ID,PERMALINK,POST_MESSAGE,TYPE,COUNTRIES,LANGUAGES,POSTED,AUDIENCE_TARGETING,LIFETIME_POST_TOTAL_REACH,LIFETIME_POST_ORGANIC_REACH,...,LIFETIME_POST_CONSUMERS_BY_TYPE___PHOTO_VIEW,LIFETIME_POST_CONSUMPTIONS_BY_TYPE___LINK_CLICKS,LIFETIME_POST_CONSUMPTIONS_BY_TYPE___OTHER_CLICKS,LIFETIME_POST_CONSUMPTIONS_BY_TYPE___PHOTO_VIEW,LIFETIME_NEGATIVE_FEEDBACK___HIDE_ALL_CLICKS,LIFETIME_NEGATIVE_FEEDBACK___HIDE_CLICKS,LIFETIME_NEGATIVE_FEEDBACK___UNLIKE_PAGE_CLICKS,LIFETIME_NEGATIVE_FEEDBACK_FROM_USERS_BY_TYPE___HIDE_ALL_CLICKS,LIFETIME_NEGATIVE_FEEDBACK_FROM_USERS_BY_TYPE___HIDE_CLICKS,LIFETIME_NEGATIVE_FEEDBACK_FROM_USERS_BY_TYPE___UNLIKE_PAGE_CLICKS
0,187446750783_10153359024455784,https://www.facebook.com/ibmwatson/posts/10153...,Cheers to a wonderful Summer Check out Acme's ...,Photo,,,12/31/2015 6:28,,2291,2291,...,4.0,21.0,27.0,4.0,,,,,,
1,187446750783_10153215851080784,https://www.facebook.com/ibmwatson/posts/10153...,I love my Acme sunscreen. Great price for the ...,Photo,,,12/31/2015 6:26,,158,158,...,307.0,,,544.0,,,,,,
2,187446750783_10153357233820784,https://www.facebook.com/ibmwatson/posts/10153...,How great is Acme's suncreen packaging Very co...,Photo,,,12/30/2015 7:00,,4203,4203,...,67.0,26.0,102.0,94.0,,,,,,
3,187446750783_10153355476175784,https://www.facebook.com/ibmwatson/posts/10153...,Acme's suncreen is too Expensive and poor qual...,Link,,,12/29/2015 6:26,,3996,3996,...,,44.0,20.0,,,,,,,
4,187446750783_10153353697105784,https://www.facebook.com/ibmwatson/posts/10153...,If you want great long-lasting sunscreen buy A...,Photo,,,12/28/2015 7:05,,2847,2847,...,62.0,19.0,37.0,83.0,1.0,,,1.0,,


In [48]:
#Make sure this equals the variable above.
df = data_df_5
df.head()

Unnamed: 0,POST_ID,PERMALINK,POST_MESSAGE,TYPE,COUNTRIES,LANGUAGES,POSTED,AUDIENCE_TARGETING,LIFETIME_POST_TOTAL_REACH,LIFETIME_POST_ORGANIC_REACH,...,LIFETIME_POST_CONSUMERS_BY_TYPE___PHOTO_VIEW,LIFETIME_POST_CONSUMPTIONS_BY_TYPE___LINK_CLICKS,LIFETIME_POST_CONSUMPTIONS_BY_TYPE___OTHER_CLICKS,LIFETIME_POST_CONSUMPTIONS_BY_TYPE___PHOTO_VIEW,LIFETIME_NEGATIVE_FEEDBACK___HIDE_ALL_CLICKS,LIFETIME_NEGATIVE_FEEDBACK___HIDE_CLICKS,LIFETIME_NEGATIVE_FEEDBACK___UNLIKE_PAGE_CLICKS,LIFETIME_NEGATIVE_FEEDBACK_FROM_USERS_BY_TYPE___HIDE_ALL_CLICKS,LIFETIME_NEGATIVE_FEEDBACK_FROM_USERS_BY_TYPE___HIDE_CLICKS,LIFETIME_NEGATIVE_FEEDBACK_FROM_USERS_BY_TYPE___UNLIKE_PAGE_CLICKS
0,187446750783_10153359024455784,https://www.facebook.com/ibmwatson/posts/10153...,Cheers to a wonderful Summer Check out Acme's ...,Photo,,,12/31/2015 6:28,,2291,2291,...,4.0,21.0,27.0,4.0,,,,,,
1,187446750783_10153215851080784,https://www.facebook.com/ibmwatson/posts/10153...,I love my Acme sunscreen. Great price for the ...,Photo,,,12/31/2015 6:26,,158,158,...,307.0,,,544.0,,,,,,
2,187446750783_10153357233820784,https://www.facebook.com/ibmwatson/posts/10153...,How great is Acme's suncreen packaging Very co...,Photo,,,12/30/2015 7:00,,4203,4203,...,67.0,26.0,102.0,94.0,,,,,,
3,187446750783_10153355476175784,https://www.facebook.com/ibmwatson/posts/10153...,Acme's suncreen is too Expensive and poor qual...,Link,,,12/29/2015 6:26,,3996,3996,...,,44.0,20.0,,,,,,,
4,187446750783_10153353697105784,https://www.facebook.com/ibmwatson/posts/10153...,If you want great long-lasting sunscreen buy A...,Photo,,,12/28/2015 7:05,,2847,2847,...,62.0,19.0,37.0,83.0,1.0,,,1.0,,


In [27]:

# @hidden_cell
#credentials_1 = {
 # 'port':'50000',
  #'db':'BLUDB',
  #'username':'dash6285',
  #'ssljdbcurl':'jdbc:db2://dashdb-entry-yp-dal09-08.services.dal.bluemix.net:50001/BLUDB:sslConnection=true;',
  #'host':'dashdb-entry-yp-dal09-08.services.dal.bluemix.net',
  #'https_url':'https://dashdb-entry-yp-dal09-08.services.dal.bluemix.net:8443',
  #'dsn':'DATABASE=BLUDB;HOSTNAME=dashdb-entry-yp-dal09-08.services.dal.bluemix.net;PORT=50000;PROTOCOL=TCPIP;UID=dash6285;PWD=lX2Vd~9l@lXL;',
  #'hostname':'dashdb-entry-yp-dal09-08.services.dal.bluemix.net',
  #'jdbcurl':'jdbc:db2://dashdb-entry-yp-dal09-08.services.dal.bluemix.net:50000/BLUDB',
  #'ssldsn':'DATABASE=BLUDB;HOSTNAME=dashdb-entry-yp-dal09-08.services.dal.bluemix.net;PORT=50001;PROTOCOL=TCPIP;UID=dash6285;PWD=lX2Vd~9l@lXL;Security=SSL;',
  #'uri':'db2://dash6285:lX2Vd%7E9l%40lXL@dashdb-entry-yp-dal09-08.services.dal.bluemix.net:50000/BLUDB',
  #'password':"""lX2Vd~9l@lXL"""
#}

<a id='prepare'></a>
### Step 6. Prepare the data: rename columns, remove noticeable noise, and pull out URLs into a new column to run through NLU



In [50]:
df.rename(columns={'POST_MESSAGE': 'Text'}, inplace=True)

In [51]:
df = df.drop([0])
df.head()

Unnamed: 0,POST_ID,PERMALINK,Text,TYPE,COUNTRIES,LANGUAGES,POSTED,AUDIENCE_TARGETING,LIFETIME_POST_TOTAL_REACH,LIFETIME_POST_ORGANIC_REACH,...,LIFETIME_POST_CONSUMERS_BY_TYPE___PHOTO_VIEW,LIFETIME_POST_CONSUMPTIONS_BY_TYPE___LINK_CLICKS,LIFETIME_POST_CONSUMPTIONS_BY_TYPE___OTHER_CLICKS,LIFETIME_POST_CONSUMPTIONS_BY_TYPE___PHOTO_VIEW,LIFETIME_NEGATIVE_FEEDBACK___HIDE_ALL_CLICKS,LIFETIME_NEGATIVE_FEEDBACK___HIDE_CLICKS,LIFETIME_NEGATIVE_FEEDBACK___UNLIKE_PAGE_CLICKS,LIFETIME_NEGATIVE_FEEDBACK_FROM_USERS_BY_TYPE___HIDE_ALL_CLICKS,LIFETIME_NEGATIVE_FEEDBACK_FROM_USERS_BY_TYPE___HIDE_CLICKS,LIFETIME_NEGATIVE_FEEDBACK_FROM_USERS_BY_TYPE___UNLIKE_PAGE_CLICKS
1,187446750783_10153215851080784,https://www.facebook.com/ibmwatson/posts/10153...,I love my Acme sunscreen. Great price for the ...,Photo,,,12/31/2015 6:26,,158,158,...,307.0,,,544.0,,,,,,
2,187446750783_10153357233820784,https://www.facebook.com/ibmwatson/posts/10153...,How great is Acme's suncreen packaging Very co...,Photo,,,12/30/2015 7:00,,4203,4203,...,67.0,26.0,102.0,94.0,,,,,,
3,187446750783_10153355476175784,https://www.facebook.com/ibmwatson/posts/10153...,Acme's suncreen is too Expensive and poor qual...,Link,,,12/29/2015 6:26,,3996,3996,...,,44.0,20.0,,,,,,,
4,187446750783_10153353697105784,https://www.facebook.com/ibmwatson/posts/10153...,If you want great long-lasting sunscreen buy A...,Photo,,,12/28/2015 7:05,,2847,2847,...,62.0,19.0,37.0,83.0,1.0,,,1.0,,
5,187446750783_10153351555645784,https://www.facebook.com/ibmwatson/posts/10153...,Acme's packaging is very hard to open Needs im...,Photo,,,12/27/2015 7:00,,2514,2514,...,71.0,6.0,13.0,97.0,,,,,,
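
The heading for this step mentions pulling URLs out of the posts, while the cells above only rename a column and drop a row. A minimal sketch of the URL step follows, assuming a simple `https?://` pattern is good enough for these posts. The sample posts, the `Url` column name, and `URL_PATTERN` are made up for illustration.

```python
import re

import pandas as pd

# Assumed pattern: anything starting with http:// or https:// up to the next whitespace
URL_PATTERN = re.compile(r"https?://\S+")

# Two made-up posts standing in for the POST_MESSAGE column
posts = pd.DataFrame({"POST_MESSAGE": [
    "Love it! http://example.com/review",
    "Too expensive",
]})
posts = posts.rename(columns={"POST_MESSAGE": "Text"})

# Copy the first URL (if any) into its own column, then strip URLs from the text
posts["Url"] = posts["Text"].apply(lambda t: (URL_PATTERN.findall(t) or [""])[0])
posts["Text"] = posts["Text"].apply(lambda t: URL_PATTERN.sub("", t).strip())

print(posts)
```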


<a id='enrich'></a> 
### Step 7. Enrichment Time!
<a id='nlupost'></a>

The cell below uses Natural Language Understanding to iterate through each post and extract the enrichment features we want to use in our future analysis.

Each extracted feature is appended to the DataFrame in a new column, named at the end of the script. To run the same script on another column, set `free_form_responses` to that column. If the column holds URLs, change the `text=response` parameter to `url=response`. Update the new column names as you see fit.
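
To make the per-post logic easier to follow, here is a small sketch that applies the same rules (document sentiment, highest-scoring emotion, keywords with relevance of at least 0.5) to a single response. The `sample_response` dictionary is hand-written to mirror the keys the script reads and is not real service output; `summarize_enrichment` is a hypothetical helper name.

```python
from operator import itemgetter

# Hand-written sample shaped like the keys the script reads; not real service output
sample_response = {
    "sentiment": {"document": {"score": 0.82, "label": "positive"}},
    "emotion": {"document": {"emotion": {"joy": 0.71, "anger": 0.05, "sadness": 0.12}}},
    "keywords": [
        {"text": "great price", "relevance": 0.93},
        {"text": "bottle", "relevance": 0.31},
    ],
}


def summarize_enrichment(enriched, min_relevance=0.5):
    """Reduce one enriched response to the values stored per post."""
    sentiment = enriched.get("sentiment", {}).get("document", {})
    emotions = enriched.get("emotion", {}).get("document", {}).get("emotion", {})
    if emotions:
        # Pick the (emotion, score) pair with the highest score
        top_emotion, top_score = max(emotions.items(), key=itemgetter(1))
    else:
        top_emotion, top_score = "", ""
    keywords = [
        kw["text"]
        for kw in enriched.get("keywords", [])
        if float(kw["relevance"]) >= min_relevance
    ]
    return {
        "sentiment_label": sentiment.get("label", "0"),
        "sentiment_score": sentiment.get("score", "0"),
        "top_emotion": top_emotion,
        "top_emotion_score": top_score,
        "keywords": ", ".join(keywords),
    }


summary = summarize_enrichment(sample_response)
print(summary)
```

Using `.get()` with defaults means a missing section yields a placeholder value rather than a shorter list, so every post contributes exactly one value per column.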

In [52]:
# Extract the free-form text responses from the DataFrame
# If you are using this script for a different CSV, change this column name
free_form_responses = df['Text']
# Define the list of enrichments to apply
# If you are modifying this script, add or remove enrichments as needed
f = [features.Entities(), features.Keywords(), features.Emotion(), features.Sentiment()]  # 'typed-rels'

# Create a list to store the enriched data
overallSentimentScore = []
overallSentimentType = []
highestEmotion = []
highestEmotionScore = []
kywords = []
entities = []

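# Go through every response and enrich the text using NLU
# NOTE: `nlu` must be a Natural Language Understanding client created before this cell.
# It is not defined in the cells shown in this notebook, so as written each call fails
# and the except branch below fills the enrichment columns with blanks.
for idx, response in enumerate(free_form_responses):
    #print("Processing record number: ", idx, " and text: ", response)
    try:
        enriched_json = json.loads(json.dumps(nlu.analyze(text=response, features=f)))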
        #print(enriched_json)

        # get the SENTIMENT score and type
        if 'sentiment' in enriched_json:
            if 'score' in enriched_json['sentiment']["document"]:
                overallSentimentScore.append(enriched_json["sentiment"]["document"]["score"])
            else:
                overallSentimentScore.append('0')

            if 'label' in enriched_json['sentiment']["document"]:
                overallSentimentType.append(enriched_json["sentiment"]["document"]["label"])
            else:
                overallSentimentType.append('0')
        else:
            # keep both lists aligned with the DataFrame rows when sentiment is missing
            overallSentimentScore.append('0')
            overallSentimentType.append('0')

        # read the EMOTIONS into a dict and get the key (emotion) with maximum value
        if 'emotion' in enriched_json:
            me = max(enriched_json["emotion"]["document"]["emotion"].items(), key=operator.itemgetter(1))[0]
            highestEmotion.append(me)
            highestEmotionScore.append(enriched_json["emotion"]["document"]["emotion"][me])

        else:
            highestEmotion.append("")
            highestEmotionScore.append("")

        # iterate and keep KEYWORDS with a relevance of at least 0.5
        if 'keywords' in enriched_json:
            #print((enriched_json['keywords']))
            tmpkw = []
            for kw in enriched_json['keywords']:
                if float(kw["relevance"]) >= 0.5:
                    #print("kw is: ", kw, "and val is ", kw["text"])
                    tmpkw.append(kw["text"])
            # join the keywords into one comma-separated string ("" when none qualify)
            kywords.append(', '.join(tmpkw))
        else:
            kywords.append("")
            
        # iterate and keep the type of each ENTITY with a relevance of at least 0.3
        if 'entities' in enriched_json:
            #print((enriched_json['entities']))
            tmpent = []
            for ent in enriched_json['entities']:
                if float(ent["relevance"]) >= 0.3:
                    tmpent.append(ent["type"])
            # join the entity types into one comma-separated string ("" when none qualify)
            entities.append(', '.join(tmpent))
        else:
            entities.append("")    
            
    except Exception:
        # On any failure, append placeholders so every list stays aligned with the DataFrame rows
        overallSentimentScore.append(' ')
        overallSentimentType.append(' ')
        highestEmotion.append(' ')
        highestEmotionScore.append(' ')
        kywords.append(' ')
        entities.append(' ')
    
# Create columns from the list and append to the dataframe
if highestEmotion:
    df['TextHighestEmotion'] = highestEmotion
if highestEmotionScore:
    df['TextHighestEmotionScore'] = highestEmotionScore

if overallSentimentType:
    df['TextOverallSentimentType'] = overallSentimentType
if overallSentimentScore:
    df['TextOverallSentimentScore'] = overallSentimentScore

df['TextKeywords'] = kywords
df['TextEntities'] = entities
df.head()
#df.info()

Unnamed: 0,POST_ID,PERMALINK,Text,TYPE,COUNTRIES,LANGUAGES,POSTED,AUDIENCE_TARGETING,LIFETIME_POST_TOTAL_REACH,LIFETIME_POST_ORGANIC_REACH,...,LIFETIME_NEGATIVE_FEEDBACK___UNLIKE_PAGE_CLICKS,LIFETIME_NEGATIVE_FEEDBACK_FROM_USERS_BY_TYPE___HIDE_ALL_CLICKS,LIFETIME_NEGATIVE_FEEDBACK_FROM_USERS_BY_TYPE___HIDE_CLICKS,LIFETIME_NEGATIVE_FEEDBACK_FROM_USERS_BY_TYPE___UNLIKE_PAGE_CLICKS,TextHighestEmotion,TextHighestEmotionScore,TextOverallSentimentType,TextOverallSentimentScore,TextKeywords,TextEntities
1,187446750783_10153215851080784,https://www.facebook.com/ibmwatson/posts/10153...,I love my Acme sunscreen. Great price for the ...,Photo,,,12/31/2015 6:26,,158,158,...,,,,,,,,,,
2,187446750783_10153357233820784,https://www.facebook.com/ibmwatson/posts/10153...,How great is Acme's suncreen packaging Very co...,Photo,,,12/30/2015 7:00,,4203,4203,...,,,,,,,,,,
3,187446750783_10153355476175784,https://www.facebook.com/ibmwatson/posts/10153...,Acme's suncreen is too Expensive and poor qual...,Link,,,12/29/2015 6:26,,3996,3996,...,,,,,,,,,,
4,187446750783_10153353697105784,https://www.facebook.com/ibmwatson/posts/10153...,If you want great long-lasting sunscreen buy A...,Photo,,,12/28/2015 7:05,,2847,2847,...,,1.0,,,,,,,,
5,187446750783_10153351555645784,https://www.facebook.com/ibmwatson/posts/10153...,Acme's packaging is very hard to open Needs im...,Photo,,,12/27/2015 7:00,,2514,2514,...,,,,,,,,,,


### Step 8. Capture the top Keyword and Entity for each post

After the previous step, each post has a column with multiple Keywords and a column with multiple Entities, separated by commas. Future analysis also needs the top Keyword and Entity for each post, so this step adds two new columns, `MaxTextKeywords` and `MaxTextEntity`.

In [53]:
# Choose the first Keyword and the first Entity listed for each post
df["MaxTextKeywords"] = df["TextKeywords"].apply(lambda x: x.split(',')[0])
df["MaxTextEntity"] = df["TextEntities"].apply(lambda x: x.split(',')[0])
#df.head()
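
The `split(',')[0]` idiom used above keeps whatever comes before the first comma. The short sketch below shows its behavior on a few made-up strings, including the empty string produced when no keyword or entity qualified. `first_item` is a hypothetical helper name; whether the first item is truly the most relevant one depends on the order in which the service returned them.

```python
def first_item(joined):
    """Return the text before the first comma, or "" for an empty string."""
    return joined.split(",")[0]


# Made-up values in the same comma-separated form as TextKeywords / TextEntities
examples = ["great price, packaging, bottle", "Company", ""]
tops = [first_item(value) for value in examples]
print(tops)
```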

 <a id='tonepost'></a> 
### Step 9. Run Tone Analyzer on Facebook comments

In [54]:
# Extract the free form text response from the data frame
# If you are using this script for a diff CSV, you will have to change this column name
free_form_responses = df['Text']

#Create a list to store the enriched data

highestEmotionTone = []
emotionToneScore = []

languageToneScore = []
highestLanguageTone = []

socialToneScore = []
highestSocialTone = []


for idx, response in enumerate(free_form_responses):
    #print("Processing record number: ", idx, " and text: ", response)
    try:
        enriched_json = json.loads(json.dumps(tone_analyzer.tone(text=response)))
        #print(enriched_json)
        
        if 'tone_categories' in enriched_json['document_tone']:
            categories = enriched_json["document_tone"]["tone_categories"]

            # The code assumes index 0 is the emotion category, 1 language, 2 social
            top_emotion = max(categories[0]["tones"], key=itemgetter('score'))
            highestEmotionTone.append(top_emotion['tone_name'])
            emotionToneScore.append(top_emotion['score'])

            top_language = max(categories[1]["tones"], key=itemgetter('score'))
            highestLanguageTone.append(top_language['tone_name'])
            languageToneScore.append(top_language['score'])

            top_social = max(categories[2]["tones"], key=itemgetter('score'))
            highestSocialTone.append(top_social['tone_name'])
            socialToneScore.append(top_social['score'])

    except Exception:
        # On any failure, append placeholders so every list stays aligned with the DataFrame rows
        emotionToneScore.append(' ')
        highestEmotionTone.append(' ')
        languageToneScore.append(' ')
        highestLanguageTone.append(' ')
        socialToneScore.append(' ')
        highestSocialTone.append(' ')
    
if highestEmotionTone:
    df['highestEmotionTone'] = highestEmotionTone    
if emotionToneScore:
    df['emotionToneScore'] = emotionToneScore
    
if languageToneScore:
    df['languageToneScore'] = languageToneScore
if highestLanguageTone:
    df['highestLanguageTone'] = highestLanguageTone
    
if highestSocialTone:
    df['highestSocialTone'] = highestSocialTone    
if socialToneScore:
    df['socialToneScore'] = socialToneScore 
    
df.head()
#df.info()

Unnamed: 0,POST_ID,PERMALINK,Text,TYPE,COUNTRIES,LANGUAGES,POSTED,AUDIENCE_TARGETING,LIFETIME_POST_TOTAL_REACH,LIFETIME_POST_ORGANIC_REACH,...,TextKeywords,TextEntities,MaxTextKeywords,MaxTextEntity,highestEmotionTone,emotionToneScore,languageToneScore,highestLanguageTone,highestSocialTone,socialToneScore
1,187446750783_10153215851080784,https://www.facebook.com/ibmwatson/posts/10153...,I love my Acme sunscreen. Great price for the ...,Photo,,,12/31/2015 6:26,,158,158,...,,,,,Joy,0.952695,0.655978,Confident,Extraversion,0.817111
2,187446750783_10153357233820784,https://www.facebook.com/ibmwatson/posts/10153...,How great is Acme's suncreen packaging Very co...,Photo,,,12/30/2015 7:00,,4203,4203,...,,,,,Joy,0.731807,0.80026,Confident,Extraversion,0.741637
3,187446750783_10153355476175784,https://www.facebook.com/ibmwatson/posts/10153...,Acme's suncreen is too Expensive and poor qual...,Link,,,12/29/2015 6:26,,3996,3996,...,,,,,Sadness,0.727545,0.67368,Analytical,Extraversion,0.393746
4,187446750783_10153353697105784,https://www.facebook.com/ibmwatson/posts/10153...,If you want great long-lasting sunscreen buy A...,Photo,,,12/28/2015 7:05,,2847,2847,...,,,,,Joy,0.653999,0.687768,Analytical,Agreeableness,0.797647
5,187446750783_10153351555645784,https://www.facebook.com/ibmwatson/posts/10153...,Acme's packaging is very hard to open Needs im...,Photo,,,12/27/2015 7:00,,2514,2514,...,,,,,Sadness,0.156907,0.80026,Confident,Emotional Range,0.517814
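
The cell above picks the strongest tone in each category by list position. As an illustration only, the sketch below does the same selection keyed by a `category_id` field. The `sample_tone_response` is hand-written, and both the `category_id` field and its values are assumptions about the response shape that should be checked against the service documentation. `top_tones` is a hypothetical helper.

```python
from operator import itemgetter

# Hand-written sample; field names beyond 'tones', 'tone_name' and 'score' are assumed
sample_tone_response = {
    "document_tone": {
        "tone_categories": [
            {"category_id": "emotion_tone", "tones": [
                {"tone_name": "Joy", "score": 0.95},
                {"tone_name": "Sadness", "score": 0.10},
            ]},
            {"category_id": "language_tone", "tones": [
                {"tone_name": "Analytical", "score": 0.20},
                {"tone_name": "Confident", "score": 0.66},
            ]},
            {"category_id": "social_tone", "tones": [
                {"tone_name": "Extraversion", "score": 0.82},
                {"tone_name": "Agreeableness", "score": 0.40},
            ]},
        ]
    }
}


def top_tones(response):
    """Map each category id to its (tone_name, score) pair with the highest score."""
    result = {}
    for category in response["document_tone"].get("tone_categories", []):
        best = max(category["tones"], key=itemgetter("score"))
        result[category["category_id"]] = (best["tone_name"], best["score"])
    return result


strongest = top_tones(sample_tone_response)
print(strongest)
```

Looking categories up by id rather than by index avoids depending on the order of the list.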


### Step 10. Create a new dataframe for analyzing the emotional tone from each Facebook entry

In [55]:
post_tones = ["highestEmotionTone"]

#Create a new dataframe with tones
df_post_tones = df[post_tones]
#Aggregate the tone data for Analysis
tones = df_post_tones
tones = pd.DataFrame(tones.groupby('highestEmotionTone').size().reset_index(name='Posts'))
tones.head()

Unnamed: 0,highestEmotionTone,Posts
0,Anger,5
1,Fear,2
2,Joy,33
3,Sadness,20
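
The aggregation in this step is a group-and-count. The following sketch repeats it on five made-up tone labels so the shape of the result is easy to see; the `sample` DataFrame is illustrative and not taken from the Facebook export.

```python
import pandas as pd

# Five made-up labels standing in for the highestEmotionTone column
sample = pd.DataFrame({"highestEmotionTone": ["Joy", "Sadness", "Joy", "Anger", "Joy"]})

# Count the rows in each group and name the count column 'Posts'
counts = sample.groupby("highestEmotionTone").size().reset_index(name="Posts")
print(counts)
```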


### Step 11. Use the PixieDust library to visualize the data

PixieDust lets you visualize your data in just a few clicks using the display() API. You can find more info at https://ibm-cds-labs.github.io/pixiedust/displayapi.html. The following cell passes the `tones` DataFrame to the display() API to create a pie chart:

In [56]:
display(tones)
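
When reading the pie chart, it can help to have the slices as percentages. The sketch below computes them from the post counts shown in the Step 10 output; the variable names are illustrative.

```python
# Post counts per emotion tone, copied from the Step 10 output
counts = {"Anger": 5, "Fear": 2, "Joy": 33, "Sadness": 20}

total = sum(counts.values())

# Share of all posts for each tone, as a percentage rounded to one decimal place
shares = {tone: round(100 * n / total, 1) for tone, n in counts.items()}

for tone, share in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(tone, share)
```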

### [Action Required] What is the chart telling us about how customers feel about Acme's sunscreen? Include your answer in your final team recommendations


### Step 12. Filter out the "Joy" tone and perform Language Analysis for only the negative tones to understand why customers do not like Acme's sunscreen

In [66]:
# Keep only posts whose strongest emotion tone is not Joy (the negative tones)
df_negative = df.loc[df['highestEmotionTone'] != 'Joy']

# Split each post into words, one word per row
df_words = pd.DataFrame(df_negative['Text'])
df_words = df_words['Text'].str.split(' ', expand=True).stack().to_frame()
df_words.columns = ['Word']

# Count each word, drop common filler words and product names, and sort by frequency
df_words = pd.DataFrame(df_words.groupby('Word').size().reset_index(name='Count'))
df_words = df_words.loc[~df_words['Word'].isin(['Acme','to','I','it',"Acme's",'the','and','Sunscreen','sunscreen','suncreen','is','will'])]
df_words = df_words.sort_values('Count', ascending=False)

In [68]:
#Use the Pixiedust library to easily visualize the data
display(df_words.head(2))
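
The same word count can be sketched without pandas using `collections.Counter`, which may be easier to step through. The three posts below are made up for illustration, the stop-word set is copied from the cell above, and `top_words` is a hypothetical helper.

```python
from collections import Counter

# Stop words copied from the filter in the notebook cell
STOPWORDS = {"Acme", "to", "I", "it", "Acme's", "the", "and",
             "Sunscreen", "sunscreen", "suncreen", "is", "will"}

# Made-up negative posts, not taken from the Facebook export
negative_posts = [
    "Acme's suncreen is too expensive",
    "expensive and hard to open",
    "the packaging is hard and expensive",
]


def top_words(texts, n=2):
    """Return the n most frequent non-stop-words across all texts."""
    words = (
        word
        for text in texts
        for word in text.split(" ")
        if word and word not in STOPWORDS
    )
    return Counter(words).most_common(n)


top_two = top_words(negative_posts)
print(top_two)
```

Like the pandas version, this splits on single spaces and is case-sensitive, so punctuation attached to a word or a different capitalization counts as a separate word.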

### [Action Required] What are the top 2 reasons why customers do not like Acme's sunscreen? What should Acme do to address these concerns? Will this help improve their sales? Include your answers in your final team recommendations.