# **Compute Sentiment Using 4 SyuzhetR and 7 SentimentR Models**

* https://www.youtube.com/watch?v=U3ByGh8RmSc

* https://github.com/ttimbers/intro-to-reticulate

[Use R on Google Colab!](https://colab.research.google.com/notebook#create=true&language=r)

# **[STEP 1] Configuration and Setup**

## Configure Jupyter Notebook

In [None]:
# Ignore warnings

import warnings
warnings.filterwarnings('ignore')

In [None]:
# Configure Jupyter

# Enable multiple outputs from one code cell
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

from IPython.display import display
from IPython.display import Image
from ipywidgets import widgets, interactive

## [INPUT] Connect Google gDrive to this Jupyter Notebook

In [None]:
# [INPUT REQUIRED]: Authorize access to Google gDrive

# Connect this Notebook to your permanent Google Drive
#   so all generated output is saved to permanent storage there

try:
  from google.colab import drive
  IN_COLAB = True
except ImportError:
  # google.colab is only importable inside a Colab runtime
  IN_COLAB = False

if IN_COLAB:
  print("Attempting to attach your Google gDrive to this Colab Jupyter Notebook")
  drive.mount('/gdrive')
else:
  print("Not running in Colab: skipping Google gDrive mount")

Attempting to attach your Google gDrive to this Colab Jupyter Notebook
Mounted at /gdrive


In [None]:
!ls

sample_data


In [None]:
# [CUSTOMIZE]: Change the text after the Unix '%cd ' command below (change directory)
#              to match the full path to your gDrive subdirectory, which should be the
#              root directory cloned from the SentimentArcs GitHub repo.

# NOTE: Make sure this subdirectory already exists and that there are
#       no typos, spaces or illegal characters (e.g. periods) in the full path after %cd

# NOTE: To be safe, each directory name in the path should begin with an upper or lowercase
#         letter and contain only letters, numbers and underscores ('_') afterwards.
#         Make sure your full path after %cd obeys this constraint or errors may appear.



# Step #1: Get full path to SentimentArcs subdir on gDrive
# =======
#@markdown **Accept default path on gDrive or Enter new one:**

Path_to_SentimentArcs = "/gdrive/MyDrive/cdh/sentiment_arcs/" #@param ["/gdrive/MyDrive/sentiment_arcs/"] {allow-input: true}

#@markdown (e.g. /gdrive/MyDrive/research/sentiment_arcs/)



# Step #2: Move to Parent directory of Sentiment_Arcs
# =======
import os

# Strip any trailing '/' first so the parent is found with or without it
parentdir_sentiment_arcs = os.path.dirname(Path_to_SentimentArcs.rstrip('/'))
print(f'subdir_parent: {parentdir_sentiment_arcs}')
%cd $parentdir_sentiment_arcs


# Step #3: If project sentiment_arcs subdir does not exist, 
#          clone it from github
# =======
import os

if not os.path.isdir('sentiment_arcs'):
  # NOTE: This will not work until SentimentArcs becomes an open sourced PUBLIC repo
  # !git clone https://github.com/jon-chun/sentiment_arcs.git

  # Test on open access github repo
  !git clone https://github.com/jon-chun/nabokov_palefire.git


# Step #4: Change into sentiment_arcs subdir
# =======
%cd ./sentiment_arcs
# Test on open access github repo
# %cd ./nabokov_palefire

# Step #5: Confirm contents of sentiment_arcs subdir
# =======
!ls


subdir_parent: /gdrive/MyDrive/cdh
/gdrive/MyDrive/cdh
/gdrive/MyDrive/cdh/sentiment_arcs
config	notebooks  text_clean  text_raw
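
Deriving the parent directory from a path string is sensitive to whether the path ends in `/`. The sketch below compares a split-based approach with `pathlib`. The two helper names are hypothetical and are only meant to illustrate the difference. They are not part of SentimentArcs.

```python
from pathlib import PurePosixPath

def parent_by_split(path: str) -> str:
    """Drop the last two '/'-separated pieces (assumes a trailing '/')."""
    return '/'.join(path.split('/')[:-2])

def parent_by_pathlib(path: str) -> str:
    """Return the parent directory with or without a trailing '/'."""
    return str(PurePosixPath(path).parent)

with_slash = "/gdrive/MyDrive/cdh/sentiment_arcs/"
without_slash = "/gdrive/MyDrive/cdh/sentiment_arcs"

# The split-based version only gives the intended parent when the slash is present
print(parent_by_split(with_slash))
print(parent_by_split(without_slash))

# The pathlib version gives the same parent in both cases
print(parent_by_pathlib(with_slash))
print(parent_by_pathlib(without_slash))
```

If the path is typed without its trailing slash, the split-based version silently moves one directory too high, which is worth checking before running `%cd`.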


In [None]:
# [VERIFY]: Ensure that all the manually preprocessed novels are in plain text
#   files and that file names are formatted correctly

# %cd ../sentiment_arcs
!pwd
!ls ./text_raw

/gdrive/MyDrive/cdh/sentiment_arcs
finance_raw  novels_raw


## Define Directory Tree Structure

In [None]:
#@markdown **Sentiment Arcs Directory Structure** \
#@markdown \
#@markdown **1. Input Directories:** \
#@markdown (a) Raw textfiles in subdir: ./text_raw/(text_type)/  \
#@markdown (b) Cleaned textfiles in subdir: ./text_clean/(text_type)/ \
#@markdown \
#@markdown **2. Output Directories** \
#@markdown (a) Raw Sentiment time series datafiles and plots in subdir: ./sentiment_raw/(text_type) \
#@markdown (b) Cleaned Sentiment time series datafiles and plots in subdir: ./sentiment_clean/(text_type) \
#@markdown \
#@markdown **Which type of texts are you analyzing?** \

Text_Type = "novels" #@param ["novels", "social_media", "finance"]

#@markdown Please check that the required textfiles and datafiles exist in the correct subdirectories before continuing.




In [None]:
# Create Directory CONSTANTS based On Document Type

SUBDIR_TEXT_RAW = f"./text_raw/{Text_Type}_raw/"
SUBDIR_TEXT_CLEAN = f"./text_clean/{Text_Type}_clean/"
SUBDIR_SENTIMENT_RAW = f"./sentiment_raw/{Text_Type}_raw/"
SUBDIR_SENTIMENT_CLEAN = f"./sentiment_clean/{Text_Type}_clean/"
SUBDIR_PLOTS = f"./plots/{Text_Type}_plots/"

# Verify Directory Structure

print('Verify the Directory Structure:\n')
print('-------------------------------\n')

print(f'           [Corpus Type]: {Text_Type}\n')
print(f'       [SUBDIR_TEXT_RAW]: {SUBDIR_TEXT_RAW}\n')
print(f'     [SUBDIR_TEXT_CLEAN]: {SUBDIR_TEXT_CLEAN}\n')
print(f'  [SUBDIR_SENTIMENT_RAW]: {SUBDIR_SENTIMENT_RAW}\n')
print(f'[SUBDIR_SENTIMENT_CLEAN]: {SUBDIR_SENTIMENT_CLEAN}\n')
print(f'          [SUBDIR_PLOTS]: {SUBDIR_PLOTS}\n')

Verify the Directory Structure:

-------------------------------

           [Corpus Type]: novels

       [SUBDIR_TEXT_RAW]: ./text_raw/novels_raw/

     [SUBDIR_TEXT_CLEAN]: ./text_clean/novels_clean/

  [SUBDIR_SENTIMENT_RAW]: ./sentiment_raw/novels_raw/

[SUBDIR_SENTIMENT_CLEAN]: ./sentiment_clean/novels_clean/

          [SUBDIR_PLOTS]: ./plots/novels_plots/
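
The output subdirectories (sentiment and plots) may not exist yet on a fresh clone, and saving into a missing directory fails. A minimal sketch of creating the tree up front follows. The function name `ensure_subdirs` and the use of a temporary root directory are assumptions made for illustration. In the notebook the root would be the SentimentArcs directory.

```python
import os
import tempfile

def ensure_subdirs(root, text_type):
    """Create the five per-corpus subdirectories under root if they are missing."""
    subdirs = [
        f"text_raw/{text_type}_raw",
        f"text_clean/{text_type}_clean",
        f"sentiment_raw/{text_type}_raw",
        f"sentiment_clean/{text_type}_clean",
        f"plots/{text_type}_plots",
    ]
    created = []
    for sub in subdirs:
        full_path = os.path.join(root, sub)
        # exist_ok=True makes the call safe to repeat
        os.makedirs(full_path, exist_ok=True)
        created.append(full_path)
    return created

# Demonstrate in a throwaway directory rather than on gDrive
with tempfile.TemporaryDirectory() as tmp_root:
    paths = ensure_subdirs(tmp_root, "novels")
    print(all(os.path.isdir(p) for p in paths))
```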



## Read YAML Configuration File

In [None]:
!pip install pyyaml



In [None]:
import yaml

### Define Texts to Analyze

In [184]:
# Read SentimentArcs YAML Config Files for Different Corpora Types(3) and Text Files Details

# Novel Text Files
with open("./config/novels_info.yaml", "r") as stream:
  try:
    novels_dt = yaml.safe_load(stream)
  except yaml.YAMLError as exc:
    print(exc)

# Finance Text Files
with open("./config/finance_info.yaml", "r") as stream:
  try:
    finance_dt = yaml.safe_load(stream)
  except yaml.YAMLError as exc:
    print(exc)

# Social Media Text Files

with open("./config/social_info.yaml", "r") as stream:
  try:
    social_dt = yaml.safe_load(stream)
  except yaml.YAMLError as exc:
    print(exc)

In [None]:
import json

In [None]:
# Verify the Corpora: Novel Textfiles in novels_dt

print (json.dumps(novels_dt, indent=2))

{
  "cdickens_achristmascarol": [
    "A Christmas Carol by Charles Dickens ",
    1843,
    1399
  ],
  "cdickens_greatexpectations": [
    "Great Expectations by Charles Dickens",
    1861,
    7230
  ],
  "dbrown_thedavincicode": [
    "The Da Vinci Code by Dan Brown",
    2003,
    9475
  ],
  "ddefoe_robinsoncrusoe": [
    "Robinson Crusoe by Daniel Defoe",
    1719,
    2280
  ],
  "eljames_fiftyshadesofgrey": [
    "Fifty Shades of Grey by E.L. James",
    2011,
    8184
  ],
  "emforster_howardsend": [
    "Howards End by E.M. Forester",
    1910,
    8999
  ],
  "fbaum_thewonderfulwizardofoz": [
    "The Wonderful Wizard of Oz by Frank Baum",
    1850,
    2238
  ],
  "fdouglass_narrativelifeofaslave": [
    "Narrative of the life of Frederick Douglass, an American Slave by Frederick Douglass",
    1845,
    1688
  ],
  "fscottfitzgerald_thegreatgatsby": [
    "The Great Gatsby by F. Scott Fitzgerald",
    1925,
    2950
  ],
  "geliot_middlemarch": [
    "Middlemarch by Georg
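
Each entry in `novels_dt` maps a file-name key to a three-item list. The first two items are a title string and a year. The notebook does not state the meaning of the third integer, so the sketch below leaves it unnamed. The helper `titles_by_year` is hypothetical, and the sample dictionary copies three entries from the output above.

```python
novels_sample = {
    "cdickens_achristmascarol": ["A Christmas Carol by Charles Dickens ", 1843, 1399],
    "cdickens_greatexpectations": ["Great Expectations by Charles Dickens", 1861, 7230],
    "dbrown_thedavincicode": ["The Da Vinci Code by Dan Brown", 2003, 9475],
}

def titles_by_year(info):
    """Return (year, title, key) tuples sorted by year, with titles stripped of stray spaces."""
    rows = [
        (year, title.strip(), key)
        for key, (title, year, _third) in info.items()
    ]
    return sorted(rows)

for year, title, key in titles_by_year(novels_sample):
    print(f"{year}  {title}  [{key}]")
```

Stripping the title matters here because at least one title in the config ends with a trailing space, which would otherwise leak into plot titles.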

In [None]:
# Verify the Corpora: Finance Textfiles in finance_dt

print (json.dumps(finance_dt, indent=2))

{
  "academic_semanticscholar": [
    [
      "etherium",
      "",
      ""
    ],
    [
      "2021-12-01",
      "2021-12-31"
    ],
    0
  ],
  "analyst_goldmansachs": [
    [
      "coin",
      "",
      ""
    ],
    [
      "2021-12-01",
      "2021-12-31"
    ],
    0
  ],
  "analyst_morningstar": [
    [
      "coin",
      "",
      ""
    ],
    [
      "2021-12-01",
      "2021-12-31"
    ],
    0
  ],
  "corporate_coin": [
    [
      "regulator",
      "sec_10k",
      ""
    ],
    [
      "2021-12-01",
      "2021-12-31"
    ],
    0
  ],
  "news_msnbc": [
    [
      "bitcoin",
      "Squakbox",
      "Jim Cramer"
    ],
    [
      "2021-12-01",
      "2021-12-31"
    ],
    0
  ],
  "news_nyt": [
    [
      "coin",
      "finance",
      ""
    ],
    [
      "2021-12-01",
      "2021-12-31"
    ],
    0
  ],
  "news_wsj": [
    [
      "coin",
      "",
      ""
    ],
    [
      "2021-12-01",
      "2021-12-31"
    ],
    0
  ],
  "social_reddit": [
    [
     

In [None]:
# Verify the Corpora: Social Media Textfiles in social_dt

print (json.dumps(social_dt, indent=2))

{
  "instagram_c19": [
    [
      "Covid 19 Pandemic",
      "#c19",
      ""
    ],
    [
      "2021-12-01",
      "2021-12-31"
    ],
    0
  ],
  "instagram_covid": [
    [
      "Covid 19 Pandemic",
      "#covid",
      ""
    ],
    [
      "2021-12-01",
      "2021-12-31"
    ],
    0
  ],
  "instagram_pandemic": [
    [
      "Covid 19 Pandemic",
      "#pandemic",
      ""
    ],
    [
      "2021-12-01",
      "2021-12-31"
    ],
    0
  ],
  "reddit_c19": [
    [
      "Covid 19 Pandemic",
      "#c19",
      ""
    ],
    [
      "2021-12-01",
      "2021-12-31"
    ],
    0
  ],
  "reddit_covid": [
    [
      "Covid 19 Pandemic",
      "#covid",
      ""
    ],
    [
      "2021-12-01",
      "2021-12-31"
    ],
    0
  ],
  "redditr_pandemic": [
    [
      "Covid 19 Pandemic",
      "#pandemic",
      ""
    ],
    [
      "2021-12-01",
      "2021-12-31"
    ],
    0
  ],
  "twitter_c19": [
    [
      "Covid 19 Pandemic",
      "#c19",
      ""
    ],
    [
      "2

## Define Globals

In [None]:
# TODO

## Install Libraries: R

In [None]:
# !pip install rpy2

In [None]:
# !pip install -U rpy2

In [None]:
# Load Jupyter rpy2 Extension  
#   enables the %%R magic commands

%load_ext rpy2.ipython

In [None]:
# %reload_ext rpy2.ipython

In [None]:
%%time 
%%capture 
%%R

# Install Syuzhet.R, Sentiment.R and Utility Libraries

# NOTE: 1m12s 
#       1m05s

install.packages(c('syuzhet', 'sentimentr', 'tidyverse', 'lexicon'))

library(syuzhet)
library(sentimentr)
library(tidyverse)
library(lexicon)

CPU times: user 2.81 s, sys: 238 ms, total: 3.05 s
Wall time: 1min 12s


In [None]:
# %reload_ext rpy2.ipython

In [None]:
# Load Python libraries to exchange data with R Program Space and read R Datafiles

import rpy2.robjects as robjects
from rpy2.robjects.packages import importr

In [None]:
%%R

# Verify R Version in Kernel

R.version.string

[1] "R version 4.1.2 (2021-11-01)"


In [None]:
%%R

# Verify R Kernel Session Info

sessionInfo()

R version 4.1.2 (2021-11-01)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.5 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas/liblapack.so.3

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] tools     stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] lexicon_1.2.1    forcats_0.5.1    stringr_1.4.0    dplyr_1.0.8     
 [5] purrr_0.3.4      readr_2.1.2      tidyr_1.2.0      tibble_3.1.6    
 [9] ggplot2_3.3.5    tidyverse_1.3.1  sentimentr_2.9.0 syuzhet_1.0.6   

loaded via a namespace (and not atta

In [None]:
%%R

# Verify R Kernel Environment

# Sys.getenv


NULL


## Import Libraries: Python

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline

In [None]:
from glob import glob
import copy
import json

## Setup Matplotlib Style

* https://matplotlib.org/stable/tutorials/introductory/customizing.html

In [None]:
from cycler import cycler

colors = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd', '#8c564b', '#e377c2', '#7f7f7f', '#bcbd22', '#17becf']   
linestyles = ['-', '--', ':', '-.','-', '--', ':', '-.','-', '--']

cycle = plt.cycler("color", colors) + plt.cycler("linestyle", linestyles)

# View previous matplotlib configuration
print('\n Old Matplotlib Configuration Settings:\n')
# plt.rc.show
print('\n\n')

# Update and view new matplotlib configuration
print('\n New Matplotlib Configuration Settings:\n')
myparams = {'axes.prop_cycle': cycle}
plt.rcParams.update(myparams)

plt.rcParams["axes.titlesize"] = 16
plt.rcParams['figure.figsize'] = 20,10
plt.rcParams["legend.fontsize"] = 10
plt.rcParams["xtick.labelsize"] = 12
plt.rcParams["ytick.labelsize"] = 12
plt.rcParams["axes.labelsize"] = 12



 Old Matplotlib Configuration Settings:





 New Matplotlib Configuration Settings:
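
Adding two cyclers with `+` pairs their values position by position, so the color and linestyle lists must have the same length. The plain-Python sketch below mimics that pairing with `zip`. It only illustrates the idea and is not how the `cycler` package is implemented. The helper name `pair_styles` is hypothetical.

```python
colors = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd',
          '#8c564b', '#e377c2', '#7f7f7f', '#bcbd22', '#17becf']
linestyles = ['-', '--', ':', '-.', '-', '--', ':', '-.', '-', '--']

def pair_styles(colors, linestyles):
    """Pair the i-th color with the i-th linestyle, rejecting unequal lengths."""
    if len(colors) != len(linestyles):
        raise ValueError("color and linestyle lists must have the same length")
    return [{"color": c, "linestyle": ls} for c, ls in zip(colors, linestyles)]

styles = pair_styles(colors, linestyles)
for style in styles[:4]:
    print(style)
```

With ten colors and four distinct linestyles, no color and linestyle combination repeats within the ten-step cycle, which helps keep overlapping model curves distinguishable.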



In [None]:
"""
import matplotlib.colors as mcolors

mcolors.TABLEAU_COLORS

all_named_colors = {}
all_named_colors.update(mcolors.TABLEAU_COLORS)

print('\n')
all_named_colors.values()
""";

In [None]:
# Set matplotlib plot figure.figsize

new_plt_size = plt.rcParams["figure.figsize"]=(20,10)

print(" New figure size: ",new_plt_size)

 New figure size:  (20, 10)


## Setup Seaborn Style

In [None]:
# View previous seaborn configuration
print('\n Old Seaborn Configuration Settings:\n')
sns.axes_style()
print('\n\n')

# Update and View new seaborn configuration
print('\n New Seaborn Configuration Settings:\n')
# sns.set_style('white')
sns.set_context('paper')
sns.set_style('white')
sns.set_palette('tab10')

# Change defaults
# sns.set(style='white', context='talk', palette='tab10')


 Old Seaborn Configuration Settings:



{'axes.axisbelow': 'line',
 'axes.edgecolor': 'black',
 'axes.facecolor': 'white',
 'axes.grid': False,
 'axes.labelcolor': 'black',
 'axes.spines.bottom': True,
 'axes.spines.left': True,
 'axes.spines.right': True,
 'axes.spines.top': True,
 'figure.facecolor': (1, 1, 1, 0),
 'font.family': ['sans-serif'],
 'font.sans-serif': ['DejaVu Sans',
  'Bitstream Vera Sans',
  'Computer Modern Sans Serif',
  'Lucida Grande',
  'Verdana',
  'Geneva',
  'Lucid',
  'Arial',
  'Helvetica',
  'Avant Garde',
  'sans-serif'],
 'grid.color': '#b0b0b0',
 'grid.linestyle': '-',
 'image.cmap': 'viridis',
 'lines.solid_capstyle': 'projecting',
 'patch.edgecolor': 'black',
 'patch.force_edgecolor': False,
 'text.color': 'black',
 'xtick.bottom': True,
 'xtick.color': 'black',
 'xtick.direction': 'out',
 'xtick.top': False,
 'ytick.color': 'black',
 'ytick.direction': 'out',
 'ytick.left': True,
 'ytick.right': False}





 New Seaborn Configuration Settings:



In [None]:
# Seaborn: Set Theme (Scale of Font)

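# NOTE: set_theme() also resets style and palette to the seaborn defaults ('darkgrid', 'deep')
sns.set_theme(context='paper')  # context: paper, notebook, talk, poster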


# Seaborn: Set Context
# sns.set_context("notebook")



# Seaborn: Set Style

# sns.set_style('ticks') # darkgrid, whitegrid, dark, white, and ticks

In [None]:
# Seaborn: View Current Default Palette ('deep' after set_theme)

sns.color_palette()

In [None]:
# Seaborn: Set to High-Contrast Palette (more Vision Impaired Friendly)

sns.set_palette('tab10')
sns.color_palette()

In [None]:
plt.style.available

['Solarize_Light2',
 '_classic_test_patch',
 'bmh',
 'classic',
 'dark_background',
 'fast',
 'fivethirtyeight',
 'ggplot',
 'grayscale',
 'seaborn',
 'seaborn-bright',
 'seaborn-colorblind',
 'seaborn-dark',
 'seaborn-dark-palette',
 'seaborn-darkgrid',
 'seaborn-deep',
 'seaborn-muted',
 'seaborn-notebook',
 'seaborn-paper',
 'seaborn-pastel',
 'seaborn-poster',
 'seaborn-talk',
 'seaborn-ticks',
 'seaborn-white',
 'seaborn-whitegrid',
 'tableau-colorblind10']

In [None]:
plt.style.use('seaborn-whitegrid')

## Python Utility Functions

In [None]:
# Utility functions to read/write nested Dictionary (key=novel) of DataFrames (Cols = Model Sentiment Series) 

def write_dict_dfs(adict, out_file='sentiments.json', out_dir=SUBDIR_SENTIMENT_RAW):
  '''
  Given a Dictionary of DataFrames and optional output filename and output directory
  Write as nested json file
  '''

  # convert dataframes into dictionaries
  data_dict = {
      key: adict[key].to_dict(orient='records') 
      for key in adict.keys()
  }

  # write to disk
  out_fullpath = f'{out_dir}{out_file}'
  print(f'Saving file to: {out_fullpath}')
  with open(out_fullpath, 'w') as fp:
    json.dump(
      data_dict, 
      fp, 
      indent=4, 
      sort_keys=True
    )

  return 

def read_dict_dfs(in_file='sentiments.json', in_dir=SUBDIR_SENTIMENT_RAW):
  '''
  Given an optional input filename and input directory
  Read nested json file into Dictionary of DataFrames
  '''

  # read from disk
  in_fullpath = f'{in_dir}{in_file}'
  with open(in_fullpath, 'r') as fp:
      data_dict = json.load(fp)

  # convert dictionaries into dataframes
  all_dt = {
      key: pd.DataFrame(data_dict[key]) 
      for key in data_dict
  }

  return all_dt
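
The two utilities store the corpus as one nested JSON object: each novel key maps to a list of row records, which is the layout that `DataFrame.to_dict(orient='records')` produces. The sketch below shows that layout and a write/read round trip using only the standard library. The plain lists of dicts stand in for DataFrames. The names `corpus_records`, `roundtrip` and `sentiments_demo.json` are assumptions made for illustration.

```python
import json
import os
import tempfile

# Stand-in for {novel: DataFrame.to_dict(orient='records')}
corpus_records = {
    "novel_b": [
        {"text_clean": "may that be truly say of us and all of us", "syuzhetr_bing": 0},
    ],
    "novel_a": [
        {"text_clean": "marley be dead to begin with", "syuzhetr_bing": -1},
        {"text_clean": "there be no doubt whatever about that", "syuzhetr_bing": -1},
    ],
}

def roundtrip(records, out_dir, out_file="sentiments_demo.json"):
    """Write the nested records to JSON, then read them back."""
    # os.path.join works whether or not out_dir ends with '/'
    full_path = os.path.join(out_dir, out_file)
    with open(full_path, "w") as fp:
        json.dump(records, fp, indent=4, sort_keys=True)
    with open(full_path, "r") as fp:
        return json.load(fp)

with tempfile.TemporaryDirectory() as tmp_dir:
    restored = roundtrip(corpus_records, tmp_dir)

print(restored == corpus_records)
print(list(restored.keys()))
```

Because the file is written with `sort_keys=True`, the novels come back in alphabetical key order regardless of the order in which they were added.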

# **[STEP 2] Read all Preprocessed Novels**

In [None]:
!pwd

/gdrive/MyDrive/cdh/sentiment_arcs


In [None]:
SUBDIR_TEXT_CLEAN

'./text_clean/novels_clean/'

In [None]:
!ls $SUBDIR_TEXT_CLEAN

cdickens_achristmascarol.csv	     jkrowling_1sorcerersstone.csv
cdickens_greatexpectations.csv	     jkrowling_4gobletoffire.csv
dbrown_thedavincicode.csv	     jkrowling_4gobletoffire_screenplay.csv
ddefoe_robinsoncrusoe.csv	     kvonnegut_slaughterhousefive.csv
eljames_fiftyshadesofgrey.csv	     mproust-mtreharne_3guermantesway.csv
emforster_howardsend.csv	     mshelley_frankenstein.csv
fbaum_thewonderfulwizardofoz.csv     mtwain_huckleberryfinn.csv
fdouglass_narrativelifeofaslave.csv  pjackson_thelightningthief.csv
fscottfitzgerald_thegreatgatsby.csv  staugustine_confessions9end.csv
geliot_middlemarch.csv		     tmorrison_beloved.csv
hjames_portraitofalady.csv	     vnabokov_palefire.csv
homer-ewilson_odyssey.csv	     vwoolf_mrsdalloway.csv
imcewan_machineslikeme.csv	     vwoolf_orlando.csv
jausten_prideandprejudice.csv	     vwoolf_thewaves.csv
jconrad_heartofdarkness.csv	     vwoolf_tothelighthouse.csv
jjoyce_portraitoftheartist.csv	     wgolding_lordoftheflies.csv


In [None]:
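# Create a List (preprocessed_ls) of all preprocessed text files

# glob() returns an empty list (it does not raise) when nothing matches,
#   so test for emptiness explicitly; sort for a reproducible order
preprocessed_ls = sorted(glob(f'{SUBDIR_TEXT_CLEAN}*.csv'))
if not preprocessed_ls:
    raise RuntimeError(f'No csv file found in {SUBDIR_TEXT_CLEAN}')

preprocessed_ls = [x.split('/')[-1] for x in preprocessed_ls]
preprocessed_ls = [x.split('.')[0] for x in preprocessed_ls]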

print('\n'.join(preprocessed_ls))
print('\n')
print(f'Found {len(preprocessed_ls)} Preprocessed files in {SUBDIR_TEXT_CLEAN}')

cdickens_achristmascarol
cdickens_greatexpectations
dbrown_thedavincicode
ddefoe_robinsoncrusoe
eljames_fiftyshadesofgrey
emforster_howardsend
fbaum_thewonderfulwizardofoz
fdouglass_narrativelifeofaslave
fscottfitzgerald_thegreatgatsby
geliot_middlemarch
hjames_portraitofalady
homer-ewilson_odyssey
imcewan_machineslikeme
jausten_prideandprejudice
jconrad_heartofdarkness
jjoyce_portraitoftheartist
jkrowling_1sorcerersstone
jkrowling_4gobletoffire
jkrowling_4gobletoffire_screenplay
kvonnegut_slaughterhousefive
mproust-mtreharne_3guermantesway
mshelley_frankenstein
mtwain_huckleberryfinn
pjackson_thelightningthief
staugustine_confessions9end
tmorrison_beloved
vnabokov_palefire
vwoolf_mrsdalloway
vwoolf_orlando
vwoolf_thewaves
vwoolf_tothelighthouse
wgolding_lordoftheflies


Found 32 Preprocessed files in ./text_clean/novels_clean/
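
Novel keys are derived from file names by stripping the directory and the `.csv` extension. Splitting on the first period works for the files listed above, but it would truncate any name that itself contains a period. The sketch below contrasts that with `pathlib`'s `stem`. The file name `author_title.v2.csv` is a made-up example and both helper names are hypothetical.

```python
from pathlib import PurePosixPath

def stem_by_split(path: str) -> str:
    """Keep everything before the first '.' in the file name."""
    return path.split('/')[-1].split('.')[0]

def stem_by_pathlib(path: str) -> str:
    """Remove only the final extension from the file name."""
    return PurePosixPath(path).stem

listed_file = './text_clean/novels_clean/vwoolf_orlando.csv'
dotted_file = './text_clean/novels_clean/author_title.v2.csv'  # hypothetical name

# Both approaches agree on names without extra periods
print(stem_by_split(listed_file), stem_by_pathlib(listed_file))

# They differ when the name contains an extra period
print(stem_by_split(dotted_file), stem_by_pathlib(dotted_file))
```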


In [None]:
# Read all preprocessed text files into master Dictionary of DataFrames (corpus_dt)

corpus_dt = {}

for i,anovel in enumerate(preprocessed_ls):
  print(f'Processing #{i}: {anovel}...')
  afile_fullpath = f'{SUBDIR_TEXT_CLEAN}{anovel}.csv'
  print(f'               {afile_fullpath}')
  anovel_df = pd.read_csv(afile_fullpath)
  corpus_dt[anovel] = anovel_df

Processing #0: cdickens_achristmascarol...
               ./text_clean/novels_clean/cdickens_achristmascarol.csv
Processing #1: cdickens_greatexpectations...
               ./text_clean/novels_clean/cdickens_greatexpectations.csv
Processing #2: dbrown_thedavincicode...
               ./text_clean/novels_clean/dbrown_thedavincicode.csv
Processing #3: ddefoe_robinsoncrusoe...
               ./text_clean/novels_clean/ddefoe_robinsoncrusoe.csv
Processing #4: eljames_fiftyshadesofgrey...
               ./text_clean/novels_clean/eljames_fiftyshadesofgrey.csv
Processing #5: emforster_howardsend...
               ./text_clean/novels_clean/emforster_howardsend.csv
Processing #6: fbaum_thewonderfulwizardofoz...
               ./text_clean/novels_clean/fbaum_thewonderfulwizardofoz.csv
Processing #7: fdouglass_narrativelifeofaslave...
               ./text_clean/novels_clean/fdouglass_narrativelifeofaslave.csv
Processing #8: fscottfitzgerald_thegreatgatsby...
               ./text_clean/novels_cle

In [None]:
# Verify the novels read into master Dictionary of DataFrames

corpus_dt.keys()
print('\n')
print(f'There were {len(corpus_dt)} preprocessed novels read into the Dict corpus_dt')

dict_keys(['cdickens_achristmascarol', 'cdickens_greatexpectations', 'dbrown_thedavincicode', 'ddefoe_robinsoncrusoe', 'eljames_fiftyshadesofgrey', 'emforster_howardsend', 'fbaum_thewonderfulwizardofoz', 'fdouglass_narrativelifeofaslave', 'fscottfitzgerald_thegreatgatsby', 'geliot_middlemarch', 'hjames_portraitofalady', 'homer-ewilson_odyssey', 'imcewan_machineslikeme', 'jausten_prideandprejudice', 'jconrad_heartofdarkness', 'jjoyce_portraitoftheartist', 'jkrowling_1sorcerersstone', 'jkrowling_4gobletoffire', 'jkrowling_4gobletoffire_screenplay', 'kvonnegut_slaughterhousefive', 'mproust-mtreharne_3guermantesway', 'mshelley_frankenstein', 'mtwain_huckleberryfinn', 'pjackson_thelightningthief', 'staugustine_confessions9end', 'tmorrison_beloved', 'vnabokov_palefire', 'vwoolf_mrsdalloway', 'vwoolf_orlando', 'vwoolf_thewaves', 'vwoolf_tothelighthouse', 'wgolding_lordoftheflies'])



There were 32 preprocessed novels read into the Dict corpus_dt


In [None]:
# Check if there are any Null strings in the text_clean columns

for i, anovel in enumerate(list(corpus_dt.keys())):
  print(f'\nNovel #{i}: {anovel}')
  nan_ct = corpus_dt[anovel].text_clean.isna().sum()
  if nan_ct > 0:
    print(f'      {nan_ct} Null strings in the text_clean column')


Novel #0: cdickens_achristmascarol

Novel #1: cdickens_greatexpectations

Novel #2: dbrown_thedavincicode
      8 Null strings in the text_clean column

Novel #3: ddefoe_robinsoncrusoe

Novel #4: eljames_fiftyshadesofgrey
      3 Null strings in the text_clean column

Novel #5: emforster_howardsend

Novel #6: fbaum_thewonderfulwizardofoz

Novel #7: fdouglass_narrativelifeofaslave
      1 Null strings in the text_clean column

Novel #8: fscottfitzgerald_thegreatgatsby
      24 Null strings in the text_clean column

Novel #9: geliot_middlemarch

Novel #10: hjames_portraitofalady

Novel #11: homer-ewilson_odyssey

Novel #12: imcewan_machineslikeme
      16 Null strings in the text_clean column

Novel #13: jausten_prideandprejudice

Novel #14: jconrad_heartofdarkness
      8 Null strings in the text_clean column

Novel #15: jjoyce_portraitoftheartist

Novel #16: jkrowling_1sorcerersstone
      1 Null strings in the text_clean column

Novel #17: jkrowling_4gobletoffire
      377 Null strin

In [None]:
# Fill in all the Null values of text_clean with placeholder 'empty_string'

for i, anovel in enumerate(list(corpus_dt.keys())):
  # print(f'Novel #{i}: {anovel}')
  # Fill only the text_clean column (not the whole row) where it is Null
  #   with 'empty_string' so sentimentr::sentiment doesn't break
  corpus_dt[anovel]['text_clean'] = corpus_dt[anovel]['text_clean'].fillna('empty_string')

In [None]:
# Verify one DataFrame in the master Dictionary

corpus_dt['dbrown_thedavincicode'].head()

Unnamed: 0.1,Unnamed: 0,text_raw,text_clean
0,0,The Da Vinci Code Dan Brown,the da vinci code dan brown
1,1,FOR BLYTHE...,for blythe
2,2,AGAIN.,again
3,3,MORE THAN EVER.,much than ever
4,4,Acknowledgments,acknowledgment


# **[STEP 3] Get Sentiments with SyuzhetR (4 Models)**

## Option (a): Read Previously Computed SyuzhetR Values from Datafiles

In [None]:
# Read in Saved SyuzhetR Datafile from SUBDIR_SENTIMENT_RAW/all_4syuzhetr.json

corpus_syuzhetr_dt = read_dict_dfs('all_4syuzhetr.json')
corpus_syuzhetr_dt.keys()

In [None]:
# Verify all the Novels have 4 Syuzhet Model Values

for i, anovel in enumerate(list(corpus_syuzhetr_dt.keys())):
  print(f'Novel #{i}: {anovel}')
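  # errors='ignore' lets this cell be re-run after the column has already been dropped
  corpus_syuzhetr_dt[anovel].drop(columns=['Unnamed: 0'], inplace=True, errors='ignore')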
  print(f'      df.shape: {corpus_syuzhetr_dt[anovel].shape}')

In [None]:
# Verify DataFrame for test novel

novel_str = 'cdickens_achristmascarol'
corpus_syuzhetr_dt[novel_str].head()

## Option (b): Compute New SyuzhetR Values

In [None]:
# Verify text_clean of sample text

text_sample = 'cdickens_achristmascarol'

corpus_dt[text_sample]['text_clean'].to_list()[:10]

In [None]:
%%time

# Compute Sentiments from all 4 Syuzhet Models applied to all 32 Novels (4 x 32 = 128 runs)

# NOTE:  9m45s 23:30 on 20220114 Colab Pro (33 Novels)
#       28:32s 21:06 on 20220226 Colab Pro (33 Novels)

# base = importr('base')
syuzhet = importr('syuzhet')

# corpus_syuzhetr_dt = {}

# base.rank(0, na_last = True)
novels_keys_ls = sorted(corpus_dt.keys())
syuzhetr_methods_ls = ['syuzhet', 'bing', 'afinn', 'nrc']
for i, anovel in enumerate(novels_keys_ls):
  print(f'Processing Novel #{i}: {anovel}...')
  # Convert the text column to a list once and reuse it for all 4 models
  text_clean_ls = corpus_dt[anovel]['text_clean'].to_list()
  for amethod in syuzhetr_methods_ls:
    corpus_dt[anovel][f'syuzhetr_{amethod}'] = syuzhet.get_sentiment(text_clean_ls, method=amethod)

Processing Novel #0: cdickens_achristmascarol...
Processing Novel #1: cdickens_greatexpectations...
Processing Novel #2: dbrown_thedavincicode...
Processing Novel #3: ddefoe_robinsoncrusoe...
Processing Novel #4: eljames_fiftyshadesofgrey...
Processing Novel #5: emforster_howardsend...
Processing Novel #6: fbaum_thewonderfulwizardofoz...
Processing Novel #7: fdouglass_narrativelifeofaslave...
Processing Novel #8: fscottfitzgerald_thegreatgatsby...
Processing Novel #9: geliot_middlemarch...
Processing Novel #10: hjames_portraitofalady...
Processing Novel #11: homer-ewilson_odyssey...
Processing Novel #12: imcewan_machineslikeme...
Processing Novel #13: jausten_prideandprejudice...
Processing Novel #14: jconrad_heartofdarkness...
Processing Novel #15: jjoyce_portraitoftheartist...
Processing Novel #16: jkrowling_1sorcerersstone...
Processing Novel #17: jkrowling_4gobletoffire...
Processing Novel #18: jkrowling_4gobletoffire_screenplay...
Processing Novel #19: kvonnegut_slaughterhousefive
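
The four Syuzhet models used above are lexicon-based: each sentence is scored by looking up its words in a word-to-value lexicon and summing the values found. The sketch below shows that idea with a toy lexicon. The lexicon values and function names are made up for illustration. The real lexicons and the package's tokenization differ, so these numbers will not match the SyuzhetR output.

```python
# Toy lexicon with made-up values (NOT taken from any Syuzhet lexicon)
TOY_LEXICON = {"dead": -1.0, "doubt": -0.75, "good": 0.75, "laugh": 0.5}

def score_sentence(sentence, lexicon):
    """Sum the lexicon values of the whitespace-separated words in one sentence."""
    return sum(lexicon.get(word, 0.0) for word in sentence.split())

def score_sentences(sentences, lexicon):
    """Score each sentence independently, giving one value per sentence."""
    return [score_sentence(s, lexicon) for s in sentences]

sentences = [
    "marley be dead to begin with",
    "may that be truly say of us and all of us",
    "good laugh",
]
print(score_sentences(sentences, TOY_LEXICON))
```

Sentences with no lexicon words score zero, which is why placeholder rows such as `empty_string` contribute nothing to the resulting time series.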

## Checkpoint: Save SyuzhetR Values

In [None]:
# Verify in SentimentArcs Root Directory

!pwd
print('\n')
!ls

/gdrive/MyDrive/cdh/sentiment_arcs


config	notebooks  sentiment_clean  sentiment_raw  text_clean  text_raw


In [None]:
# Verify Save Destination Subdir: SUBDIR_SENTIMENT_RAW

SUBDIR_SENTIMENT_RAW
print('\n')
!ls $SUBDIR_SENTIMENT_RAW

'./sentiment_raw/novels_raw/'





In [None]:
corpus_dt.keys()

dict_keys(['cdickens_achristmascarol', 'cdickens_greatexpectations', 'dbrown_thedavincicode', 'ddefoe_robinsoncrusoe', 'eljames_fiftyshadesofgrey', 'emforster_howardsend', 'fbaum_thewonderfulwizardofoz', 'fdouglass_narrativelifeofaslave', 'fscottfitzgerald_thegreatgatsby', 'geliot_middlemarch', 'hjames_portraitofalady', 'homer-ewilson_odyssey', 'imcewan_machineslikeme', 'jausten_prideandprejudice', 'jconrad_heartofdarkness', 'jjoyce_portraitoftheartist', 'jkrowling_1sorcerersstone', 'jkrowling_4gobletoffire', 'jkrowling_4gobletoffire_screenplay', 'kvonnegut_slaughterhousefive', 'mproust-mtreharne_3guermantesway', 'mshelley_frankenstein', 'mtwain_huckleberryfinn', 'pjackson_thelightningthief', 'staugustine_confessions9end', 'tmorrison_beloved', 'vnabokov_palefire', 'vwoolf_mrsdalloway', 'vwoolf_orlando', 'vwoolf_thewaves', 'vwoolf_tothelighthouse', 'wgolding_lordoftheflies'])

In [None]:
corpus_dt['cdickens_achristmascarol']

Unnamed: 0.1,Unnamed: 0,text_raw,text_clean,syuzhetr_syuzhet,syuzhetr_bing,syuzhetr_afinn,syuzhetr_nrc
0,0,CHAPTER I: MARLEY'S GHOST,chapter i marley s ghost,-0.60,0,-1,0.0
1,1,MARLEY was dead: to begin with.,marley be dead to begin with,-1.00,-1,-3,0.0
2,2,There is no doubt whatever about that.,there be no doubt whatever about that,-0.75,-1,-2,-1.0
3,3,The register of his burial was signed by the c...,the register of his burial be sign by the cler...,-1.50,-1,0,-1.0
4,4,Scrooge signed it: and Scrooge's name was good...,scrooge sign it and scrooge s name be good upo...,0.75,1,3,1.0
...,...,...,...,...,...,...,...
1941,1941,Some people laughed to see the alteration in h...,some people laugh to see the alteration in him...,1.80,2,5,3.0
1942,1942,His own heart laughed: and that was quite enou...,his own heart laugh and that be quite enough f...,0.25,1,1,1.0
1943,1943,"He had no further intercourse with Spirits, bu...",he have no far intercourse with spirit but liv...,2.35,1,4,6.0
1944,1944,"May that be truly said of us, and all of us!",may that be truly say of us and all of us,0.00,0,0,0.0


In [None]:
# Save sentiment values to SUBDIR_SENTIMENT_RAW

write_dict_dfs(corpus_dt, out_file='all_4syuzhetr.json', out_dir=SUBDIR_SENTIMENT_RAW)

Saving file to: ./sentiment_raw/novels_raw/all_4syuzhetr.json


In [None]:
# Verify Dictionary was saved correctly by reading back the *.json datafile

test_dt = read_dict_dfs(in_file='all_4syuzhetr.json', in_dir=SUBDIR_SENTIMENT_RAW)
test_dt.keys()

dict_keys(['cdickens_achristmascarol', 'cdickens_greatexpectations', 'dbrown_thedavincicode', 'ddefoe_robinsoncrusoe', 'eljames_fiftyshadesofgrey', 'emforster_howardsend', 'fbaum_thewonderfulwizardofoz', 'fdouglass_narrativelifeofaslave', 'fscottfitzgerald_thegreatgatsby', 'geliot_middlemarch', 'hjames_portraitofalady', 'homer-ewilson_odyssey', 'imcewan_machineslikeme', 'jausten_prideandprejudice', 'jconrad_heartofdarkness', 'jjoyce_portraitoftheartist', 'jkrowling_1sorcerersstone', 'jkrowling_4gobletoffire', 'jkrowling_4gobletoffire_screenplay', 'kvonnegut_slaughterhousefive', 'mproust-mtreharne_3guermantesway', 'mshelley_frankenstein', 'mtwain_huckleberryfinn', 'pjackson_thelightningthief', 'staugustine_confessions9end', 'tmorrison_beloved', 'vnabokov_palefire', 'vwoolf_mrsdalloway', 'vwoolf_orlando', 'vwoolf_thewaves', 'vwoolf_tothelighthouse', 'wgolding_lordoftheflies'])

## Plot SyuzhetR 4 Models

In [None]:
#@markdown Select option to save plots:
Save_Raw_Plots = True #@param {type:"boolean"}

Save_Smooth_Plots = True #@param {type:"boolean"}
Resolution = "300" #@param ["100", "300"]



In [None]:
# Get Col Names for all 4 SyuzhetR Models

cols_all_ls = corpus_dt['cdickens_achristmascarol'].columns
cols_syuzhetr_ls = [x for x in cols_all_ls if 'syuzhetr_' in x]
cols_syuzhetr_ls

['syuzhetr_syuzhet', 'syuzhetr_bing', 'syuzhetr_afinn', 'syuzhetr_nrc']

In [None]:
novels_dt['cdickens_achristmascarol'][0]

'A Christmas Carol by Charles Dickens '

In [None]:
SUBDIR_PLOTS

'./plots/novels_plots/'

In [None]:
# Verify 4 SyuzhetR Models with Plots

for i, anovel in enumerate(list(corpus_dt.keys())):

  print(f'Novel #{i}: {novels_dt[anovel][0]}')

  # Raw Sentiments
  ax = corpus_dt[anovel][cols_syuzhetr_ls].plot(title=f'{novels_dt[anovel][0]}\n SyuzhetR 4 Models: Raw Sentiments', alpha=0.3)

  # Save before plt.show(): show() closes the figure, so a later savefig() writes a blank image
  if Save_Raw_Plots:
    plt.savefig(f'{SUBDIR_PLOTS}plot_syuzhetr_raw_{anovel}_dpi{Resolution}.png', dpi=int(Resolution))
  plt.show();


  # Smoothed Sentiments (SMA 10%)
  # novel_sample = 'cdickens_achristmascarol'
  # Use a window of at least 1 row so very short texts do not produce a zero-width window
  win_10per = max(1, int(corpus_dt[anovel].shape[0] * 0.1))
  corpus_dt[anovel][cols_syuzhetr_ls].rolling(win_10per, center=True, min_periods=0).mean().plot(title=f'{novels_dt[anovel][0]}\n SyuzhetR 4 Models: Smoothed Sentiments (SMA 10%)', alpha=0.3)

  if Save_Smooth_Plots:
    plt.savefig(f'{SUBDIR_PLOTS}plot_syuzhetr_smooth10sma_{anovel}_dpi{Resolution}.png', dpi=int(Resolution))
  plt.show();


Output hidden; open in https://colab.research.google.com to view.
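
The smoothed plots above use a centered simple moving average (SMA) whose window is 10% of the number of sentences in the novel. The following pure-Python sketch shows the idea without pandas. The function name `centered_sma` is hypothetical, and the window placement is only intended to mirror a centered rolling mean that shrinks at the edges, not to reproduce pandas exactly.

```python
def centered_sma(values, window):
    """Centered simple moving average that shrinks the window at both ends."""
    if window < 1:
        raise ValueError('window must be at least 1')
    offset = (window - 1) // 2  # how far the window reaches to the right of i
    smoothed = []
    for i in range(len(values)):
        hi = min(len(values), i + offset + 1)  # exclusive upper bound
        lo = max(0, i + offset + 1 - window)   # inclusive lower bound
        chunk = values[lo:hi]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

scores = [1.0, 2.0, 3.0, 4.0, 5.0]
print(centered_sma(scores, 3))  # [1.5, 2.0, 3.0, 4.0, 4.5]
```

For a text with fewer than 10 sentences, `int(n * 0.1)` evaluates to 0, which this sketch rejects; clamping the window to at least 1 avoids that case.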

# **[STEP 4] Get Sentiments with SentimentR (7 Models)**

In [None]:
# Make a copy of DataFrame for 7 SentimentR Models

# novels_sentimentr_df = copy.deepcopy(corpus_dt)


In [128]:
# !pip install -U rpy2



In [129]:
# import rpy2

In [130]:
# dir(rpy2.robjects)

['Array',
 'BoolVector',
 'ComplexVector',
 'DataFrame',
 'DateVector',
 'Environment',
 'FactorVector',
 'FloatVector',
 'Formula',
 'Function',
 'IntVector',
 'ListVector',
 'Matrix',
 'NA_Character',
 'NA_Complex',
 'NA_Integer',
 'NA_Logical',
 'NA_Real',
 'NULL',
 'POSIXct',
 'POSIXlt',
 'PairlistVector',
 'R',
 'RObject',
 'RObjectMixin',
 'RS4',
 'Sexp',
 'SexpClosure',
 'SexpEnvironment',
 'SexpExtPtr',
 'SexpS4',
 'SexpVector',
 'SignatureTranslatedFunction',
 'StrSexpVector',
 'StrVector',
 'TYPEORDER',
 'Vector',
 '_',
 '__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__path__',
 '__spec__',
 '_convert_rpy2py_boolvector',
 '_convert_rpy2py_bytevector',
 '_convert_rpy2py_complexvector',
 '_convert_rpy2py_floatvector',
 '_convert_rpy2py_intvector',
 '_convert_rpy2py_langvector',
 '_convert_rpy2py_strvector',
 '_function_to_rpy',
 '_globalenv',
 '_py2rpy_array',
 '_py2rpy_bool',
 '_py2rpy_bytes',
 '_py2rpy_complex',
 '_py2rpy

In [132]:
# dir(lexicon)

['___NAMESPACE___',
 '___S3MethodsTable___',
 '__doc__',
 '__loader__',
 '__name__',
 '__package__',
 '__rdata__',
 '__rname__',
 '__spec__',
 '__version__',
 '_env',
 '_exported_names',
 '_packageName',
 '_rpy2r',
 '_symbol_r2python',
 '_symbol_resolve',
 '_translation',
 'as_key',
 'available_data',
 'grady_pos_feature',
 'hash_sentiment_jockers',
 'key_sentiment_jockers']

In [None]:
# %%R
# if (!require("pacman")) install.packages("pacman")
# pacman::p_load_gh("trinker/lexicon")

In [139]:
"""

# THIS CODE DOES NOT WORK FOR LEXICON, SEE ALT METHOD BELOW

# from rpy2.robjects.packages import importr

# NOTE: 1m22s for 1 out of (7x32 = 224)

sentimentr = importr('sentimentr')
lexicon = importr('lexicon')

novels_keys_ls = list(corpus_dt.keys())
# lexicon_robj = lexicon.hash_sentiment_huliu
for i, anovel in enumerate(novels_keys_ls[:1]):
  print(f'Processing Novel #{i}: {anovel}...')
  print( '                 jockers_rinker')
  corpus_dt[anovel]['sentimentr_jockersrinker'] = corpus_dt[anovel]['text_clean'].apply(lambda x: sentimentr.sentiment(x, polarity_dt=lexicon.hash_sentiment_jockers_rinker)) # polarity_dt=lexicon.hash_sentiment_jockers_rinker))

  print( '                 jockers')  # 1m20s
  corpus_dt[anovel]['sentimentr_jockers'] = corpus_dt[anovel]['text_clean'].apply(lambda x: sentimentr.sentiment(x, polarity_dt=lexicon.hash_sentiment_jockers)) # polarity_dt=lexicon.hash_sentiment_jockers_rinker))

  # lexicon_robj = lexicon.hash_sentiment_huliu
  print( '                 huliu')
  corpus_dt[anovel]['sentimentr_huliu'] = corpus_dt[anovel]['text_clean'].apply(lambda x: sentimentr.sentiment(x, polarity_dt=lexicon.hash_sentiment_huliu)) # lexicon_robj)) # polarity_dt=lexicon.hash_sentiment_huliu))

  # lexicon_robj = lexicon.hash_sentiment_nrc
  print( '                 nrc')
  corpus_dt[anovel]['sentimentr_nrc'] = corpus_dt[anovel]['text_clean'].apply(lambda x: sentimentr.sentiment(x, polarity_dt=lexicon_robj)) # polarity_dt=lexicon.hash_sentiment_nrc))

  print( '                 senticnet')
  corpus_dt[anovel]['sentimentr_senticnet'] = corpus_dt[anovel]['text_clean'].apply(lambda x: sentimentr.sentiment(x, polarity_dt=lexicon_robj)[[3]]) # polarity_dt=lexicon.hash_sentiment_senticnet))

  print( '                 sentiword')
  corpus_dt[anovel]['sentimentr_sentiword'] = corpus_dt[anovel]['text_clean'].apply(lambda x: sentimentr.sentiment(x, polarity_dt=lexicon_robj)) # polarity_dt=lexicon.hash_sentiment_sentiword))

  print( '                 loughran_mcdonald')
  corpus_dt[anovel]['sentimentr_loughran_mcdonald'] = corpus_dt[anovel]['text_clean'].apply(lambda x: sentimentr.sentiment(x, polarity_dt=lexicon_robj)) # polarity_dt=lexicon.hash_sentiment_loughran_mcdonald))

  # test_sent = 'I love lint very much'
  # test_str = sentimentr.sentiment(test_sent)
  # print(f'SentimentR: {test_str} for \n            {test_sent}')
  # corpus_dt[anovel]['sentimentr_jockersrinker'] = sentimentr.sentiment(corpus_dt[anovel]['text_clean'].to_list())
""";

Processing Novel #0: cdickens_achristmascarol...
                 jockers_rinker


AttributeError: ignored

In [None]:
# corpus_dt['cdickens_achristmascarol'].head()

In [None]:
"""
from rpy2.robjects.packages import importr

sentiment = importr('sentimentr')

novels_keys_ls = list(corpus_dt.keys())
for i, anovel in enumerate(novels_keys_ls[:1]):
  print(f'Processing Novel #{i}: {anovel}...')
  corpus_dt[anovel]['sentimentr_jockersrinker'] = sentiment(corpus_dt[anovel]['text_clean'].to_list(), polarity_dt=lexicon::hash_sentiment_jockers_rinker, 
                                                                hypen="", amplifier_weight=0.8, n_before=5, n_after=2,
                                                                adversative_weight=0.25, neutral_nonverb_like = FALSE, missing_value = 0)
""";

In [None]:
# %%R 

# SentimentAnalysis <- apply(analyzeSentiment(s_v)[c('SentimentGI', 'SentimentLM', 'SentimentQDAP') ], 2, round, 2)
# colnames(SentimentAnalysis) <- gsub('^Sentiment', "SA_", colnames(SentimentAnalysis))

In [None]:
# %%R

# sentimentr_jockersrinker <- sentiment(s_v, polarity_dt=lexicon::hash_sentiment_jockers_rinker, 
#                                       hypen="", amplifier.weight=0.8, n.before=5, n.after=2,
#                                       adversative.weight=0.25, neutral.nonverb.like = FALSE, missing_value = 0)

In [None]:
# test_ls = corpus_dt['cdickens_achristmascarol']['text_clean'].to_list()
# len(test_ls)

In [None]:
# import rpy2.robjects as robjects
# from rpy2.robjects.packages import importr

In [None]:
# s_v = robjects.StrVector(test_ls)
# type(s_v)

# lexicon_name = lexicon::hash_sentiment_jockers_rinker

In [None]:
# novels_sentimentr_df = pd.DataFrame()

In [None]:
# novels_roots_ls = list(corpus_dt.keys())

In [None]:
"""
for i,anovel_root in enumerate(novels_roots_ls):
  print(f'Novel #{i}: {anovel_root}')
  print(f'     {corpus_dt[anovel_root].shape}')
""";

In [None]:
# %%R -i novels_roots_ls

# novels_key_ls = c('cdickens_achristmascarol')

# for(i in 1:length(novels_roots_ls)) {
#   {print(novels_roots_ls[i])}
# }

## Option (a): Read Previously Computed SentimentR Values from Datafile

In [None]:
# Read in the saved SentimentR datafile all_7sentimentr.json from SUBDIR_SENTIMENT_RAW

corpus_sentimentr_dt = read_dict_dfs(in_file='all_7sentimentr.json', in_dir=SUBDIR_SENTIMENT_RAW)
corpus_sentimentr_dt.keys()

In [None]:
# Verify all the novels have SentimentR model values

for i, anovel in enumerate(list(corpus_sentimentr_dt.keys())):
  print(f'Novel #{i}: {anovel}')
  corpus_sentimentr_dt[anovel].drop(columns=['Unnamed: 0'], inplace=True)
  print(f'      df.shape: {corpus_sentimentr_dt[anovel].shape}')

In [None]:
# Verify DataFrame for test novel

novel_str = 'cdickens_achristmascarol'
corpus_sentimentr_dt[novel_str].head()
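
Beyond inspecting one novel by eye, the loaded dictionary can be checked for missing model columns across every novel. The following is a minimal sketch; the helper `missing_model_columns` is hypothetical, and it accepts either objects with a `.columns` attribute (such as DataFrames) or plain collections of column names.

```python
def missing_model_columns(corpus, expected_cols):
    """Map each novel key to the expected columns it lacks (empty dict if none)."""
    missing = {}
    for novel, table in corpus.items():
        # Use .columns when present (e.g. a DataFrame), else treat the value as column names
        present = set(getattr(table, 'columns', table))
        absent = [c for c in expected_cols if c not in present]
        if absent:
            missing[novel] = absent
    return missing

# Toy stand-in for the corpus: novel key -> list of column names
toy_corpus = {
    'novel_a': ['text_clean', 'sentimentr_huliu', 'sentimentr_nrc'],
    'novel_b': ['text_clean', 'sentimentr_huliu'],
}
print(missing_model_columns(toy_corpus, ['sentimentr_huliu', 'sentimentr_nrc']))
```

An empty result means every novel carries every expected model column.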

## Option (b): Compute New SentimentR Values

Call the `get_sentimentr_values` function defined in the external file `get_sentimentr.R` from within a Python loop.

* https://medium.com/analytics-vidhya/calling-r-from-python-magic-of-rpy2-d8cbbf991571

* https://rpy2.github.io/doc/v3.0.x/html/generated_rst/pandas.html

In [140]:
%%file get_sentimentr.R

library(sentimentr)
library(lexicon)

get_sentimentr_values <- function(s_v) {
  
  print('Processing sentimentr_jockersrinker')
  sentimentr_jockersrinker <- sentiment(s_v, polarity_dt=lexicon::hash_sentiment_jockers_rinker, 
                                        hyphen="", amplifier.weight=0.8, n.before=5, n.after=2,
                                        adversative.weight=0.25, neutral.nonverb.like = FALSE, missing_value = 0)

  print('Processing sentimentr_jockers')
  sentimentr_jockers <- sentiment(s_v, polarity_dt=lexicon::hash_sentiment_jockers, 
                                        hyphen="", amplifier.weight=0.8, n.before=5, n.after=2,
                                        adversative.weight=0.25, neutral.nonverb.like = FALSE, missing_value = 0)

  print('Processing sentimentr_huliu')
  sentimentr_huliu <- sentiment(s_v, polarity_dt=lexicon::hash_sentiment_huliu, 
                                        hyphen="", amplifier.weight=0.8, n.before=5, n.after=2,
                                        adversative.weight=0.25, neutral.nonverb.like = FALSE, missing_value = 0)

  print('Processing sentimentr_nrc')
  sentimentr_nrc <- sentiment(s_v, polarity_dt=lexicon::hash_sentiment_nrc, 
                                        hyphen="", amplifier.weight=0.8, n.before=5, n.after=2,
                                        adversative.weight=0.25, neutral.nonverb.like = FALSE, missing_value = 0)

  print('Processing sentimentr_senticnet')
  sentimentr_senticnet <- sentiment(s_v, polarity_dt=lexicon::hash_sentiment_senticnet, 
                                        hyphen="", amplifier.weight=0.8, n.before=5, n.after=2,
                                        adversative.weight=0.25, neutral.nonverb.like = FALSE, missing_value = 0)

  print('Processing sentimentr_sentiword')
  sentimentr_sentiword <- sentiment(s_v, polarity_dt=lexicon::hash_sentiment_sentiword, 
                                        hyphen="", amplifier.weight=0.8, n.before=5, n.after=2,
                                        adversative.weight=0.25, neutral.nonverb.like = FALSE, missing_value = 0)

  print('Processing sentimentr_loughran_mcdonald')
  sentimentr_loughran_mcdonald <- sentiment(s_v, polarity_dt=lexicon::hash_sentiment_loughran_mcdonald, 
                                        hyphen="", amplifier.weight=0.8, n.before=5, n.after=2,
                                        adversative.weight=0.25, neutral.nonverb.like = FALSE, missing_value = 0)

  print('Processing sentimentr_socal_google')
  sentimentr_socal_google <- sentiment(s_v, polarity_dt=lexicon::hash_sentiment_socal_google, 
                                        hyphen="", amplifier.weight=0.8, n.before=5, n.after=2,
                                        adversative.weight=0.25, neutral.nonverb.like = FALSE, missing_value = 0)

  anovel_sentimentr_df <- data.frame('text_clean' = s_v,
                                'sentimentr_jockersrinker' = sentimentr_jockersrinker$sentiment,
                                'sentimentr_jockers' = sentimentr_jockers$sentiment,
                                'sentimentr_huliu' = sentimentr_huliu$sentiment,
                                'sentimentr_nrc' = sentimentr_nrc$sentiment,
                                'sentimentr_senticnet' = sentimentr_senticnet$sentiment,
                                'sentimentr_sentiword' = sentimentr_sentiword$sentiment,
                                'sentimentr_loughran_mcdonald' = sentimentr_loughran_mcdonald$sentiment,
                                'sentimentr_socal_google' = sentimentr_socal_google$sentiment
                                )
  return(anovel_sentimentr_df)

}

Writing get_sentimentr.R


In [141]:
# Verify the *.R file above was written correctly

!cat get_sentimentr.R


library(sentimentr)
library(lexicon)

get_sentimentr_values <- function(s_v) {
  
  print('Processing sentimentr_jockersrinker')
  sentimentr_jockersrinker <- sentiment(s_v, polarity_dt=lexicon::hash_sentiment_jockers_rinker, 
                                        hyphen="", amplifier.weight=0.8, n.before=5, n.after=2,
                                        adversative.weight=0.25, neutral.nonverb.like = FALSE, missing_value = 0)

  print('Processing sentimentr_jockers')
  sentimentr_jockers <- sentiment(s_v, polarity_dt=lexicon::hash_sentiment_jockers, 
                                        hyphen="", amplifier.weight=0.8, n.before=5, n.after=2,
                                        adversative.weight=0.25, neutral.nonverb.like = FALSE, missing_value = 0)

  print('Processing sentimentr_huliu')
  sentimentr_huliu <- sentiment(s_v, polarity_dt=lexicon::hash_sentiment_huliu, 
                                        hyphen="", amplifier.weight=0.8, n.before=5, n.after=2,
           

In [142]:
# Setup python robject with external library::function()
# https://rpy2.github.io/doc/v3.0.x/html/generated_rst/pandas.html

import rpy2.robjects as robjects

# Handle to the embedded R session
r = robjects.r

# Source the R file so that get_sentimentr_values is defined in R's global environment
r['source']('get_sentimentr.R')

# Python-callable reference to the R function
get_sentimentr_function_r = robjects.globalenv['get_sentimentr_values']

0,1
value,[RTYPES.CLOSXP]
visible,[RTYPES.LGLSXP]


In [143]:
# Test

# Convert Python List of Strings to a R vector of characters
test_ls = corpus_dt['cdickens_achristmascarol']['text_clean'].to_list()
s_v = robjects.StrVector(test_ls)
type(s_v)

get_sentimentr_function_r(s_v)

rpy2.robjects.vectors.StrVector

[1] "Processing sentimentr_jockersrinker"
[1] "Processing sentimentr_jockers"
[1] "Processing sentimentr_huliu"
[1] "Processing sentimentr_nrc"
[1] "Processing sentimentr_senticnet"
[1] "Processing sentimentr_sentiword"
[1] "Processing sentimentr_loughran_mcdonald"
[1] "Processing sentimentr_socal_google"


text_clean,sentimentr_jockersrinker,sentimentr_jockers,...,sentimentr_sentiword,sentimentr_loughran_mcdonald,sentimentr_socal_google
'chapter ...,-0.268328,-0.268328,...,-0.055902,0.000000,1.527067
'marley b...,-0.408248,-0.408248,,-0.102062,0.000000,-0.486695
'there be...,0.283473,0.283473,,-0.283473,0.377964,0.000000
'the regi...,-0.353553,-0.353553,,0.000000,0.000000,0.470472
...,...,...,,...,...,...
'his own ...,0.015076,0.015076,,0.293974,0.000000,0.614583
'he have ...,0.168224,0.168224,,0.268618,0.162221,0.326224
'may that...,0.000000,0.000000,,0.339200,0.000000,0.000000
'and so a...,0.301511,0.301511,,0.075378,0.000000,-0.427565


In [158]:
novels_dt.keys()

dict_keys(['cdickens_achristmascarol', 'cdickens_greatexpectations', 'dbrown_thedavincicode', 'ddefoe_robinsoncrusoe', 'eljames_fiftyshadesofgrey', 'emforster_howardsend', 'fbaum_thewonderfulwizardofoz', 'fdouglass_narrativelifeofaslave', 'fscottfitzgerald_thegreatgatsby', 'geliot_middlemarch', 'hjames_portraitofalady', 'homer-ewilson_odyssey', 'imcewan_machineslikeme', 'jausten_prideandprejudice', 'jconrad_heartofdarkness', 'jjoyce_portraitoftheartist', 'jkrowling_1sorcerersstone', 'jkrowling_4gobletoffire', 'jkrowling_4gobletoffire_screenplay', 'kvonnegut_slaughterhousefive', 'mproust-mtreharne_3guermantesway', 'mshelley_frankenstein', 'mtwain_huckleberryfinn', 'pjackson_thelightningthief', 'staugustine_confessions9end', 'tmorrison_beloved', 'vnabokov_palefire', 'vwoolf_mrsdalloway', 'vwoolf_orlando', 'vwoolf_thewaves', 'vwoolf_tothelighthouse', 'wgolding_lordoftheflies'])

In [None]:
text_clean_ct = corpus_dt['dbrown_thedavincicode'].text_clean.isna().sum()
text_clean_ct
# len(text_clean_ls.isnull())
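
Missing values in `text_clean` are not strings, which can cause trouble when the list is later converted to an R character vector. The following is a minimal sketch of cleaning such a list first, assuming missing entries should become empty strings; the helper name `clean_text_list` is hypothetical.

```python
import math

def clean_text_list(texts):
    """Replace None and float NaN entries with empty strings; stringify the rest."""
    cleaned = []
    for t in texts:
        if t is None or (isinstance(t, float) and math.isnan(t)):
            cleaned.append('')
        else:
            cleaned.append(str(t))
    return cleaned

print(clean_text_list(['marley be dead', None, float('nan')]))
```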

**[RE-EXECUTE] You may have to re-execute the following code cell several times**

In [147]:
%whos dict

Variable             Type    Data/Info
--------------------------------------
corpus_all_dt        dict    n=32
corpus_dt            dict    n=32
corpus_syuzhetr_dt   dict    n=0
finance_dt           dict    n=10
myparams             dict    n=1
novels_dt            dict    n=32
social_dt            dict    n=9
test_dt              dict    n=32


In [148]:
%%time

# NOTE: 8m19s 13 Novels 
#      16m39s 19 Novels
#     -----------------
#      24m58s 32 Novels

# Call external get_sentimentr::get_sentimentr_values with Python loop over all novels

# novels_sentimentr_dt = {}

anovel_df = pd.DataFrame()

novels_keys_ls = list(corpus_dt.keys())
novels_keys_ls.sort()
# for i, anovel in enumerate(novels_keys_ls[:19]):
for i, anovel in enumerate(novels_keys_ls):  
  print(f'\nProcessing Novel #{i}: {anovel}')
  print(f'     {corpus_dt[anovel].shape}')
  # Get text_clean as list of strings
  text_clean_ls = corpus_dt[anovel]['text_clean'].to_list()

  # Convert Python List of Strings to a R vector of characters
  # https://rpy2.github.io/doc/v3.0.x/html/generated_rst/pandas.html
  s_v = robjects.StrVector(text_clean_ls)
  anovel_df_r = get_sentimentr_function_r(s_v)

  # Convert rpy2.robjects.vectors.DataFrame to pandas.core.frame.DataFrame
  # https://stackoverflow.com/questions/20630121/pandas-how-to-convert-r-dataframe-back-to-pandas 
  print(f'type(anovel_df_r): {type(anovel_df_r)}')
  anovel_df = pd.DataFrame.from_dict({ key : np.asarray(anovel_df_r.rx2(key)) for key in anovel_df_r.names })
  print(f'type(anovel_df): {type(anovel_df)}')

  # Save Results
  # novels_dt[anovel] = anovel_df.copy(deep=True)

  # anovel_df has one column per model, so index it by model name only.
  # .values assigns by position and avoids index-alignment surprises.
  for amodel in [x for x in anovel_df.columns if 'sentimentr_' in x]:
    corpus_dt[anovel][amodel] = anovel_df[amodel].values


Novel #0: cdickens_achristmascarol
     (1946, 7)
[1] "Processing sentimentr_jockersrinker"
[1] "Processing sentimentr_jockers"
[1] "Processing sentimentr_huliu"
[1] "Processing sentimentr_nrc"
[1] "Processing sentimentr_senticnet"
[1] "Processing sentimentr_sentiword"
[1] "Processing sentimentr_loughran_mcdonald"
[1] "Processing sentimentr_socal_google"
type(anovel_df_r): <class 'rpy2.robjects.vectors.DataFrame'>
type(anovel_df): <class 'pandas.core.frame.DataFrame'>

Novel #1: cdickens_greatexpectations
     (9975, 7)
[1] "Processing sentimentr_jockersrinker"
[1] "Processing sentimentr_jockers"
[1] "Processing sentimentr_huliu"
[1] "Processing sentimentr_nrc"
[1] "Processing sentimentr_senticnet"
[1] "Processing sentimentr_sentiword"
[1] "Processing sentimentr_loughran_mcdonald"
[1] "Processing sentimentr_socal_google"
type(anovel_df_r): <class 'rpy2.robjects.vectors.DataFrame'>
type(anovel_df): <class 'pandas.core.frame.DataFrame'>

Novel #2: dbrown_thedavincicode
     (13079, 7)
[
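
Because the loop above takes about 25 minutes for all 32 novels and may need to be re-executed, it can help to make it resumable so that finished novels are skipped and failures are collected rather than aborting the run. The following is a minimal sketch. The names `score_all` and `flaky_scorer` are hypothetical; in the notebook the scorer would be a wrapper around `get_sentimentr_function_r`.

```python
def score_all(texts_by_novel, scorer, results=None):
    """Score each novel once; keep earlier results and collect failures."""
    results = {} if results is None else results
    failed = {}
    for novel in sorted(texts_by_novel):
        if novel in results:  # already scored in a previous run
            continue
        try:
            results[novel] = scorer(texts_by_novel[novel])
        except Exception as err:  # e.g. a transient error on the R side
            failed[novel] = str(err)
    return results, failed

# Stand-in scorer for illustration: fails on its first call, then works
calls = {'n': 0}
def flaky_scorer(texts):
    calls['n'] += 1
    if calls['n'] == 1:
        raise RuntimeError('simulated failure')
    return [0.0 for _ in texts]

texts = {'novel_a': ['one', 'two'], 'novel_b': ['three']}
results, failed = score_all(texts, flaky_scorer)           # first pass: novel_a fails
results, failed = score_all(texts, flaky_scorer, results)  # second pass: only novel_a is retried
print(sorted(results), failed)
```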

In [164]:
cols_sentimentr_ls = [x for x in novels_dt['cdickens_greatexpectations'].columns if 'sentimentr_' in x]
cols_sentimentr_ls

['sentimentr_jockersrinker',
 'sentimentr_jockers',
 'sentimentr_huliu',
 'sentimentr_nrc',
 'sentimentr_senticnet',
 'sentimentr_sentiword',
 'sentimentr_loughran_mcdonald',
 'sentimentr_socal_google']

In [168]:
for i, anovel in enumerate(novels_keys_ls):
  print(f'Novel #{i}: {anovel}')
  for j, amodel in enumerate(cols_sentimentr_ls):
    print(f'           Model #{j}: {amodel}')
    corpus_dt[anovel][amodel] = novels_dt[anovel][amodel]

Novel #0: cdickens_achristmascarol
          Model #0: sentimentr_jockersrinker
          Model #1: sentimentr_jockers
          Model #2: sentimentr_huliu
          Model #3: sentimentr_nrc
          Model #4: sentimentr_senticnet
          Model #5: sentimentr_sentiword
          Model #6: sentimentr_loughran_mcdonald
          Model #7: sentimentr_socal_google
Novel #1: cdickens_greatexpectations
          Model #0: sentimentr_jockersrinker
          Model #1: sentimentr_jockers
          Model #2: sentimentr_huliu
          Model #3: sentimentr_nrc
          Model #4: sentimentr_senticnet
          Model #5: sentimentr_sentiword
          Model #6: sentimentr_loughran_mcdonald
          Model #7: sentimentr_socal_google
Novel #2: dbrown_thedavincicode
          Model #0: sentimentr_jockersrinker
          Model #1: sentimentr_jockers
          Model #2: sentimentr_huliu
          Model #3: sentimentr_nrc
          Model #4: sentimentr_senticnet
          Model #5: sentimentr_sentiw

In [169]:
corpus_dt['cdickens_greatexpectations'].head()

Unnamed: 0.1,Unnamed: 0,text_raw,text_clean,syuzhetr_syuzhet,syuzhetr_bing,syuzhetr_afinn,syuzhetr_nrc,sentimentr_jockersrinker,sentimentr_jockers,sentimentr_huliu,sentimentr_nrc,sentimentr_senticnet,sentimentr_sentiword,sentimentr_loughran_mcdonald,sentimentr_socal_google
0,0,"My father's family name being Pirrip, and my C...",my father s family name be pirrip and my chris...,0.35,0,0,1.0,0.028868,0.028868,0.0,0.19245,-0.200725,0.291081,0.0,0.766824
1,1,"So, I called myself Pip, and came to be called...",so i call myself pip and come to be call pip,0.0,0,0,0.0,0.0,0.0,0.0,0.0,0.112162,0.037689,0.0,0.0
2,2,"I give Pirrip as my father's family name, on t...",i give pirrip a my father s family name on the...,0.5,0,1,1.0,0.117851,0.117851,0.0,0.235702,0.329512,0.10312,0.0,0.804835
3,3,"Joe Gargery, who married the blacksmith.",joe gargery who marry the blacksmith,0.6,0,0,1.0,0.244949,0.244949,0.0,0.408248,-0.243724,0.0,0.0,0.0
4,4,"As I never saw my father or my mother, and nev...",a i never see my father or my mother and never...,0.5,2,2,1.0,0.234261,0.078087,0.312348,0.156174,0.526306,0.101838,-0.156174,0.169226


In [170]:
len(corpus_dt)

32

## Checkpoint: Save SentimentR Values

In [151]:
# Verify in SentimentArcs Root Directory

!pwd
print('\n')
!ls

/gdrive/MyDrive/cdh/sentiment_arcs


config		  notebooks  sentiment_clean  text_clean
get_sentimentr.R  plots      sentiment_raw    text_raw


In [152]:
# Verify Save Destination Subdir: SUBDIR_SENTIMENT_RAW

SUBDIR_SENTIMENT_RAW
print('\n')
!ls $SUBDIR_SENTIMENT_RAW

'./sentiment_raw/novels_raw/'



all_4syuzhetr.json


In [171]:
corpus_dt.keys()

dict_keys(['cdickens_achristmascarol', 'cdickens_greatexpectations', 'dbrown_thedavincicode', 'ddefoe_robinsoncrusoe', 'eljames_fiftyshadesofgrey', 'emforster_howardsend', 'fbaum_thewonderfulwizardofoz', 'fdouglass_narrativelifeofaslave', 'fscottfitzgerald_thegreatgatsby', 'geliot_middlemarch', 'hjames_portraitofalady', 'homer-ewilson_odyssey', 'imcewan_machineslikeme', 'jausten_prideandprejudice', 'jconrad_heartofdarkness', 'jjoyce_portraitoftheartist', 'jkrowling_1sorcerersstone', 'jkrowling_4gobletoffire', 'jkrowling_4gobletoffire_screenplay', 'kvonnegut_slaughterhousefive', 'mproust-mtreharne_3guermantesway', 'mshelley_frankenstein', 'mtwain_huckleberryfinn', 'pjackson_thelightningthief', 'staugustine_confessions9end', 'tmorrison_beloved', 'vnabokov_palefire', 'vwoolf_mrsdalloway', 'vwoolf_orlando', 'vwoolf_thewaves', 'vwoolf_tothelighthouse', 'wgolding_lordoftheflies'])

In [172]:
corpus_dt['cdickens_achristmascarol']

Unnamed: 0.1,Unnamed: 0,text_raw,text_clean,syuzhetr_syuzhet,syuzhetr_bing,syuzhetr_afinn,syuzhetr_nrc,sentimentr_jockersrinker,sentimentr_jockers,sentimentr_huliu,sentimentr_nrc,sentimentr_senticnet,sentimentr_sentiword,sentimentr_loughran_mcdonald,sentimentr_socal_google
0,0,CHAPTER I: MARLEY'S GHOST,chapter i marley s ghost,-0.60,0,-1,0.0,-0.268328,-0.268328,0.000000,0.000000,-0.221371,-0.055902,0.000000,1.527067
1,1,MARLEY was dead: to begin with.,marley be dead to begin with,-1.00,-1,-3,0.0,-0.408248,-0.408248,-0.408248,0.000000,-0.148602,-0.102062,0.000000,-0.486695
2,2,There is no doubt whatever about that.,there be no doubt whatever about that,-0.75,-1,-2,-1.0,0.283473,0.283473,0.377964,0.377964,-0.160257,-0.283473,0.377964,0.000000
3,3,The register of his burial was signed by the c...,the register of his burial be sign by the cler...,-1.50,-1,0,-1.0,-0.353553,-0.353553,-0.235702,-0.235702,0.365103,0.000000,0.000000,0.470472
4,4,Scrooge signed it: and Scrooge's name was good...,scrooge sign it and scrooge s name be good upo...,0.75,1,3,1.0,0.167705,0.167705,0.223607,0.223607,0.515414,0.184010,0.223607,1.182146
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1941,1941,Some people laughed to see the alteration in h...,some people laugh to see the alteration in him...,1.80,2,5,3.0,0.136111,0.252778,0.305556,0.561111,0.504650,0.379861,0.333333,0.775901
1942,1942,His own heart laughed: and that was quite enou...,his own heart laugh and that be quite enough f...,0.25,1,1,1.0,0.015076,0.015076,0.542720,0.301511,0.970264,0.293974,0.000000,0.614583
1943,1943,"He had no further intercourse with Spirits, bu...",he have no far intercourse with spirit but liv...,2.35,1,4,6.0,0.168224,0.168224,0.162221,0.486502,0.465690,0.268618,0.162221,0.326224
1944,1944,"May that be truly said of us, and all of us!",may that be truly say of us and all of us,0.00,0,0,0.0,0.000000,0.000000,0.000000,0.000000,0.855749,0.339200,0.000000,0.000000


In [173]:
# Save sentiment values to subdir_sentiments

write_dict_dfs(corpus_dt, out_file='all_7sentimentr.json', out_dir=SUBDIR_SENTIMENT_RAW)

Saving file to: ./sentiment_raw/novels_raw/all_7sentimentr.json


In [175]:
# Verify Dictionary was saved correctly by reading back the *.json datafile

test_dt = read_dict_dfs(in_file='all_7sentimentr.json', in_dir=SUBDIR_SENTIMENT_RAW)
test_dt.keys()

dict_keys(['cdickens_achristmascarol', 'cdickens_greatexpectations', 'dbrown_thedavincicode', 'ddefoe_robinsoncrusoe', 'eljames_fiftyshadesofgrey', 'emforster_howardsend', 'fbaum_thewonderfulwizardofoz', 'fdouglass_narrativelifeofaslave', 'fscottfitzgerald_thegreatgatsby', 'geliot_middlemarch', 'hjames_portraitofalady', 'homer-ewilson_odyssey', 'imcewan_machineslikeme', 'jausten_prideandprejudice', 'jconrad_heartofdarkness', 'jjoyce_portraitoftheartist', 'jkrowling_1sorcerersstone', 'jkrowling_4gobletoffire', 'jkrowling_4gobletoffire_screenplay', 'kvonnegut_slaughterhousefive', 'mproust-mtreharne_3guermantesway', 'mshelley_frankenstein', 'mtwain_huckleberryfinn', 'pjackson_thelightningthief', 'staugustine_confessions9end', 'tmorrison_beloved', 'vnabokov_palefire', 'vwoolf_mrsdalloway', 'vwoolf_orlando', 'vwoolf_thewaves', 'vwoolf_tothelighthouse', 'wgolding_lordoftheflies'])

In [176]:
test_dt['cdickens_greatexpectations'].columns

Index(['Unnamed: 0', 'sentimentr_huliu', 'sentimentr_jockers',
       'sentimentr_jockersrinker', 'sentimentr_loughran_mcdonald',
       'sentimentr_nrc', 'sentimentr_senticnet', 'sentimentr_sentiword',
       'sentimentr_socal_google', 'syuzhetr_afinn', 'syuzhetr_bing',
       'syuzhetr_nrc', 'syuzhetr_syuzhet', 'text_clean', 'text_raw'],
      dtype='object')
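
The columns read back above come out in alphabetical order rather than in the order of the original DataFrame, so a round-trip check should compare column sets rather than column sequences. The following is a minimal sketch using only the standard library. The helpers `save_tables`, `load_tables` and `same_shape` are hypothetical stand-ins and are not the notebook's `write_dict_dfs` and `read_dict_dfs`, whose implementations are not shown in this section.

```python
import json
import os
import tempfile

def save_tables(tables, path):
    """Write a dict of column-oriented tables to one JSON file."""
    with open(path, 'w', encoding='utf-8') as fp:
        json.dump(tables, fp, sort_keys=True)

def load_tables(path):
    """Read the dict of tables back from the JSON file."""
    with open(path, encoding='utf-8') as fp:
        return json.load(fp)

def same_shape(a, b):
    """True if both dicts have the same keys and, per key, the same column names."""
    return a.keys() == b.keys() and all(set(a[k]) == set(b[k]) for k in a)

tables = {'novel_a': {'text_clean': ['x', 'y'], 'sentimentr_huliu': [0.0, 0.5]}}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'tables.json')
    save_tables(tables, path)
    restored = load_tables(path)
print(same_shape(tables, restored))  # True
```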

## Plot SentimentR 7 Models

In [177]:
#@markdown Select option to save plots:
Save_Raw_Plots = True #@param {type:"boolean"}

Save_Smooth_Plots = True #@param {type:"boolean"}
Resolution = "100" #@param ["100", "300"]



In [178]:
# Get Col Names for all SentimentR Models
cols_all_ls = corpus_dt['cdickens_achristmascarol'].columns
cols_sentimentr_ls = [x for x in cols_all_ls if 'sentimentr_' in x]
cols_sentimentr_ls

['sentimentr_jockersrinker',
 'sentimentr_jockers',
 'sentimentr_huliu',
 'sentimentr_nrc',
 'sentimentr_senticnet',
 'sentimentr_sentiword',
 'sentimentr_loughran_mcdonald',
 'sentimentr_socal_google']

In [None]:
novels_dt['cdickens_achristmascarol'][0]

'A Christmas Carol by Charles Dickens '

In [179]:
SUBDIR_PLOTS

'./plots/novels_plots/'

In [183]:
novels_dt['cdickens_greatexpectations']

Unnamed: 0,text_clean,sentimentr_jockersrinker,sentimentr_jockers,sentimentr_huliu,sentimentr_nrc,sentimentr_senticnet,sentimentr_sentiword,sentimentr_loughran_mcdonald,sentimentr_socal_google
0,my father s family name be pirrip and my chris...,0.028868,0.028868,0.000000,0.192450,-0.200725,0.291081,0.000000,0.766824
1,so i call myself pip and come to be call pip,0.000000,0.000000,0.000000,0.000000,0.112162,0.037689,0.000000,0.000000
2,i give pirrip a my father s family name on the...,0.117851,0.117851,0.000000,0.235702,0.329512,0.103120,0.000000,0.804835
3,joe gargery who marry the blacksmith,0.244949,0.244949,0.000000,0.408248,-0.243724,0.000000,0.000000,0.000000
4,a i never see my father or my mother and never...,0.234261,0.078087,0.312348,0.156174,0.526306,0.101838,-0.156174,0.169226
...,...,...,...,...,...,...,...,...,...
9970,i have be bend and break buti hopeinto a well ...,0.165831,0.165831,0.000000,0.301511,0.606339,0.166892,-0.301511,0.000000
9971,be a considerate and good to me a you be and t...,0.575000,0.575000,0.500000,0.750000,0.502750,0.054688,0.250000,0.302033
9972,we be friend say i rise and bend over her a sh...,0.200000,0.200000,0.000000,0.250000,0.369750,0.156250,0.000000,0.000000
9973,and will continue friend apart say estella,0.453557,0.453557,0.000000,0.755929,0.485306,-0.141737,0.000000,0.000000


In [180]:
# Verify 7 SentimentR Models with Plots


for i, anovel in enumerate(list(corpus_dt.keys())):

  print(f'Novel #{i}: {novels_dt[anovel][0]}')

  # Raw Sentiments
  ax = corpus_dt[anovel][cols_sentimentr_ls].plot(title=f'{novels_dt[anovel][0]}\n SentimentR 7 Models: Raw Sentiments', alpha=0.3)

  # Save before plt.show(): with the inline backend, show() closes the figure,
  # so a later savefig() would write an empty image
  if Save_Raw_Plots:
    plt.savefig(f'{SUBDIR_PLOTS}plot_sentimentr_raw_{anovel}_dpi{Resolution}.png', dpi=int(Resolution))
  plt.show();

  
  # Smoothed Sentiments (SMA 10%): centered rolling mean over a window of 10% of the rows
  # novel_sample = 'cdickens_achristmascarol'
  win_10per = int(corpus_dt[anovel].shape[0] * 0.1)
  corpus_dt[anovel][cols_sentimentr_ls].rolling(win_10per, center=True, min_periods=0).mean().plot(title=f'{novels_dt[anovel][0]}\n SentimentR 7 Models: Smoothed Sentiments (SMA 10%)', alpha=0.3)

  if Save_Smooth_Plots:
    plt.savefig(f'{SUBDIR_PLOTS}plot_sentimentr_smooth10sma_{anovel}_dpi{Resolution}.png', dpi=int(Resolution))
  plt.show();


In [None]:
sentimentr_cols_ls = novels_sentimentr_dt['cdickens_achristmascarol'].columns
sentimentr_models_ls = [x for x in sentimentr_cols_ls if 'sentimentr_' in x]
sentimentr_models_ls

# OR

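# sentimentr_models_ls = ['sentimentr_jockersrinker', 'sentimentr_jockers', 'sentimentr_huliu', 'sentimentr_nrc', 'sentimentr_senticnet', 'sentimentr_sentiword', 'sentimentr_loughran_mcdonald', 'sentimentr_socal_google']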

In [None]:
# Verify 7 Sentiment Models from SentimentR for all Novels

for i, anovel in enumerate(list(corpus_dt.keys())):

  # Raw Sentiments 
  ax = novels_sentimentr_dt[anovel][sentimentr_models_ls].plot(title=f'{novels_dt[anovel][0]}\n SentimentR 7 Models: Raw Sentiments', alpha=0.3)

  # Save before plt.show() so the saved image is not empty
  if Save_Raw_Plots:
    plt.savefig(f'{subdir_plots_arcs}plot_sentimentr_raw_{anovel}_dpi{Resolution}.png', dpi=int(Resolution))
  plt.show()

  
  # Smoothed Sentiments (SMA 10%)
  # novel_sample = 'cdickens_achristmascarol'
  win_10per = int(novels_sentimentr_dt[anovel].shape[0] * 0.1)
  novels_sentimentr_dt[anovel][sentimentr_models_ls].rolling(win_10per, center=True, min_periods=0).mean().plot(title=f'{novels_dt[anovel][0]}\n SentimentR 7 Models: Smoothed Sentiments (SMA 10%)', alpha=0.3)

  if Save_Smooth_Plots:
    plt.savefig(f'{subdir_plots_arcs}plot_sentimentr_smooth10sma_{anovel}_dpi{Resolution}.png', dpi=int(Resolution))
  plt.show()


# **END OF NOTEBOOK**

### Save Checkpoint

In [None]:
# Verify save_to directory

subdir_sentiments
print('\n')
!ls $subdir_sentiments

In [None]:
# Save sentiment values to subdir_sentiments

write_dict_dfs(corpus_syuzhetr_dt, out_file='all_4syuzhetr.json', out_dir=subdir_sentiments)

## Plot SyuzhetR 4 Models

In [None]:
#@markdown Select option to save plots:
Save_Raw_Plots = True #@param {type:"boolean"}

Save_Smooth_Plots = True #@param {type:"boolean"}
Resolution = "100" #@param ["100", "300"]



In [None]:
syuzhetr_cols_ls = corpus_syuzhetr_dt['cdickens_achristmascarol'].columns
syuzhetr_models_ls = [x for x in syuzhetr_cols_ls if 'syuzhetr_' in x]
syuzhetr_models_ls

# OR

# syuzhetr_models_ls = ['syuzhetr_afinn', 'syuzhetr_bing', 'syuzhetr_nrc', 'syuzhetr_syuzhet']

In [None]:
novels_dt['cdickens_achristmascarol'][0]

In [None]:
subdir_plots_arcs

In [None]:
# Plot the 4 SyuzhetR models for every novel in the corpus

for i, anovel in enumerate(list(corpus_dt.keys())):

  # Raw Sentiments 
  fig = corpus_syuzhetr_dt[anovel][syuzhetr_models_ls].plot(title=f'{novels_dt[anovel][0]}\n SyuzhetR 4 Models: Raw Sentiments', alpha=0.3)

  # Save before plt.show(): once the figure has been shown, savefig() would write a blank image
  if Save_Raw_Plots:
    plt.savefig(f'{subdir_plots_arcs}plot_syuzhetr_raw_{anovel}_dpi{Resolution}.png', dpi=int(Resolution))
  plt.show()

  
  # Smoothed Sentiments (SMA 10%)
  # novel_sample = 'cdickens_achristmascarol'
  win_10per = int(corpus_syuzhetr_dt[anovel].shape[0] * 0.1)
  corpus_syuzhetr_dt[anovel][syuzhetr_models_ls].rolling(win_10per, center=True, min_periods=0).mean().plot(title=f'{novels_dt[anovel][0]}\n SyuzhetR 4 Models: Smoothed Sentiments (SMA 10%)', alpha=0.3)

  # Save before plt.show() for the same reason as above
  if Save_Smooth_Plots:
    plt.savefig(f'{subdir_plots_arcs}plot_syuzhetr_smooth10sma_{anovel}_dpi{Resolution}.png', dpi=int(Resolution))
  plt.show()
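
The plot filenames above are built by concatenating `subdir_plots_arcs` with the file name, which only works when the directory string ends in a slash. A minimal sketch of a helper that does not depend on the trailing slash is shown here; the function name and its parameters are hypothetical, and the naming pattern is copied from the loop above:

```python
import os


def plot_path(plots_dir, library, kind, novel, dpi):
    """Build '<plots_dir>/plot_<library>_<kind>_<novel>_dpi<dpi>.png'."""
    filename = f'plot_{library}_{kind}_{novel}_dpi{dpi}.png'
    return os.path.join(plots_dir, filename)


print(plot_path('plots', 'syuzhetr', 'raw', 'cdickens_achristmascarol', '100'))
```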



In [None]:
%%R -i s_v -i novels_roots -o novels_sentimentr_df

# novels_key_ls = c('cdickens_achristmascarol')


print('Processing sentimentr_jockersrinker')
sentimentr_jockersrinker <- sentiment(s_v, polarity_dt=lexicon::hash_sentiment_jockers_rinker,
                                      hyphen="", amplifier.weight=0.8, n.before=5, n.after=2,
                                      adversative.weight=0.25, neutral.nonverb.like = FALSE, missing_value = 0)

print('Processing sentimentr_jockers')
sentimentr_jockers <- sentiment(s_v, polarity_dt=lexicon::hash_sentiment_jockers,
                                      hyphen="", amplifier.weight=0.8, n.before=5, n.after=2,
                                      adversative.weight=0.25, neutral.nonverb.like = FALSE, missing_value = 0)

print('Processing sentimentr_huliu')
sentimentr_huliu <- sentiment(s_v, polarity_dt=lexicon::hash_sentiment_huliu,
                                      hyphen="", amplifier.weight=0.8, n.before=5, n.after=2,
                                      adversative.weight=0.25, neutral.nonverb.like = FALSE, missing_value = 0)

print('Processing sentimentr_nrc')
sentimentr_nrc <- sentiment(s_v, polarity_dt=lexicon::hash_sentiment_nrc,
                                      hyphen="", amplifier.weight=0.8, n.before=5, n.after=2,
                                      adversative.weight=0.25, neutral.nonverb.like = FALSE, missing_value = 0)

print('Processing sentimentr_senticnet')
sentimentr_senticnet <- sentiment(s_v, polarity_dt=lexicon::hash_sentiment_senticnet,
                                      hyphen="", amplifier.weight=0.8, n.before=5, n.after=2,
                                      adversative.weight=0.25, neutral.nonverb.like = FALSE, missing_value = 0)

print('Processing sentimentr_sentiword')
sentimentr_sentiword <- sentiment(s_v, polarity_dt=lexicon::hash_sentiment_sentiword,
                                      hyphen="", amplifier.weight=0.8, n.before=5, n.after=2,
                                      adversative.weight=0.25, neutral.nonverb.like = FALSE, missing_value = 0)

print('Processing sentimentr_loughran_mcdonald')
sentimentr_loughran_mcdonald <- sentiment(s_v, polarity_dt=lexicon::hash_sentiment_loughran_mcdonald,
                                      hyphen="", amplifier.weight=0.8, n.before=5, n.after=2,
                                      adversative.weight=0.25, neutral.nonverb.like = FALSE, missing_value = 0)

print('Processing sentimentr_socal_google')
sentimentr_socal_google <- sentiment(s_v, polarity_dt=lexicon::hash_sentiment_socal_google,
                                      hyphen="", amplifier.weight=0.8, n.before=5, n.after=2,
                                      adversative.weight=0.25, neutral.nonverb.like = FALSE, missing_value = 0)

novels_sentimentr_df <- data.frame('text_clean' = s_v,
                              'sentimentr_jockersrinker' = sentimentr_jockersrinker$sentiment,
                              'sentimentr_jockers' = sentimentr_jockers$sentiment,
                              'sentimentr_huliu' = sentimentr_huliu$sentiment,
                              'sentimentr_nrc' = sentimentr_nrc$sentiment,
                              'sentimentr_senticnet' = sentimentr_senticnet$sentiment,
                              'sentimentr_sentiword' = sentimentr_sentiword$sentiment,
                              'sentimentr_loughran_mcdonald' = sentimentr_loughran_mcdonald$sentiment,
                              'sentimentr_socal_google' = sentimentr_socal_google$sentiment
                              )
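
The `sentiment()` calls above pass weights for valence shifters: `n.before` and `n.after` set how many words around a polarized word are inspected, and `amplifier.weight` sets how much an amplifier boosts it. The following Python sketch is a deliberately simplified illustration of that idea, not sentimentr's actual algorithm. The tiny lexicon, the negator and amplifier word lists, and the function name are all hypothetical, and adversative conjunctions, de-amplifiers, and punctuation handling are left out.

```python
import math

# Hypothetical toy word lists (sentimentr uses the much larger `lexicon` tables)
POLARITY = {'happy': 0.75, 'good': 0.75, 'sad': -0.5, 'bad': -0.75}
NEGATORS = {'not', 'never', 'no'}
AMPLIFIERS = {'very', 'extremely'}


def toy_sentence_score(sentence, n_before=5, n_after=2, amplifier_weight=0.8):
    """Score one sentence: polarized words adjusted by nearby negators and amplifiers."""
    words = sentence.lower().split()
    if not words:
        return 0.0
    total = 0.0
    for i, word in enumerate(words):
        if word not in POLARITY:
            continue
        # Context cluster: up to n_before words before and n_after words after
        context = words[max(0, i - n_before):i] + words[i + 1:i + 1 + n_after]
        negations = sum(w in NEGATORS for w in context)
        amplifiers = sum(w in AMPLIFIERS for w in context)
        score = POLARITY[word] * (1 + amplifier_weight * amplifiers)
        if negations % 2 == 1:         # an odd number of negators flips the sign
            score = -score
        total += score
    # Length normalization by the square root of the word count (assumed here)
    return total / math.sqrt(len(words))


for s in ['i am very happy', 'i am not happy', 'the sky is blue']:
    print(s, '->', round(toy_sentence_score(s), 3))
```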


In [None]:
novels_sentimentr_df.head()

In [None]:
# Get list of sentimentr models from columns

sentimentr_models_ls = [x for x in novels_sentimentr_df.columns if 'sentimentr_' in x]
sentimentr_models_ls

In [None]:
# Verify the SentimentR models for the sample novel

# Raw Sentiments 
# corpus_dt['cdickens_achristmascarol'][['syuzhetr_syuzhet','syuzhetr_bing','syuzhetr_afinn','syuzhetr_nrc']].plot(title='Raw Sentiments', alpha=0.3)
novels_sentimentr_df[sentimentr_models_ls].plot(title='SentimentR Raw Sentiments', alpha=0.3)

# Smoothed Sentiments (SMA 10%)
novel_sample = 'cdickens_achristmascarol'
win_10per = int(corpus_dt[novel_sample].shape[0] * 0.1)
novels_sentimentr_df[sentimentr_models_ls].rolling(win_10per, center=True, min_periods=0).mean().plot(title='SentimentR Smoothed Sentiments (SMA 10%)', alpha=0.3)
plt.show()


In [None]:
%%file get_sentimentr.R

library(sentimentr)
library(lexicon)

get_sentimentr_values <- function(df_name, col_name){
  #' Compute SentimentR sentiment values (Jockers lexicon)
  #'
  #' This function scores the text held in one column
  #' of a data frame with sentimentr::sentiment()
  #' and returns the resulting sentiment table
  #'
  #' @param df_name The dataframe containing the text
  #' @param col_name The name (as a string) of the text column
  #
  print('Processing sentimentr_jockers')
  # Use [[col_name]]: df_name$col_name would look for a column literally named 'col_name'
  sentimentr_jockers <- sentiment(df_name[[col_name]], polarity_dt=lexicon::hash_sentiment_jockers,
                                      hyphen="", amplifier.weight=0.8, n.before=5, n.after=2,
                                      adversative.weight=0.25, neutral.nonverb.like = FALSE, missing_value = 0)

  return(sentimentr_jockers)
}

In [None]:
!cat get_sentimentr.R

In [None]:
df = corpus_dt['cdickens_achristmascarol']
df.head()

In [None]:
dir(pandas2ri)

In [None]:
# https://medium.com/analytics-vidhya/calling-r-from-python-magic-of-rpy2-d8cbbf991571

import rpy2.robjects as robjects
from rpy2.robjects import pandas2ri
from rpy2.robjects.conversion import localconverter

# Source the R script and load the function defined in it
r = robjects.r
r['source']('get_sentimentr.R')
sentimentr_function_r = robjects.globalenv['get_sentimentr_values']

# r['source']('sentimentr')# Loading the function we have defined in R.
# sentiment_function_r = robjects.globalenv['sentiment']

# Read the data and convert it into an R object to pass to the R function
# df = pd.read_csv("Country-Sales.csv")
df = corpus_dt['cdickens_achristmascarol']
with localconverter(robjects.default_converter + pandas2ri.converter):
  df_r = robjects.conversion.py2rpy(df)

# df_r = pandas2ri.py2rpy_dataframe(df)

# Invoke the R function and get the result
df_result_r = sentimentr_function_r(df_r, 'text_clean')

# Convert the result back to a pandas dataframe
# (rpy2py is the R-to-Python counterpart of the py2rpy call used above)
with localconverter(robjects.default_converter + pandas2ri.converter):
  df_result = robjects.conversion.rpy2py(df_result_r)

df_result.head()



## Save Checkpoint


# **END OF NOTEBOOK**

In [None]:
pd.DataFrame(syuzhetr_syuzhet).plot()

In [None]:
install.packages('syuzhet')
library(syuzhet)

In [None]:
files <- list.files(pattern="*.csv", full.names=TRUE, recursive=FALSE)
length(files)

In [None]:
files[1]

In [None]:
sentiment_syuzhet <- get_sentiment(s_v, method='syuzhet')
sentiment_syuzhet

In [None]:
get_syuzhetr_sentiments <- function(afilename) {
  anovel_df <- read.csv(file = afilename, header=FALSE)
  # typeof(anovel_df$V3)
  # return(anovel_df)
  s_v <- anovel_df$V3

  syuzhetr_jockers <- get_sentiment(s_v, method='syuzhet')
  syuzhetr_bing <- get_sentiment(s_v, method='bing')
  syuzhetr_afinn <- get_sentiment(s_v, method='afinn')
  syuzhetr_nrc <- get_sentiment(s_v, method='nrc')

  syuzhet_df <- data.frame(syuzhetr_jockers,
                           syuzhetr_bing,
                           syuzhetr_afinn,
                           syuzhetr_nrc)

  return(syuzhet_df)
}

In [None]:
get_syuzhetr_sentiments(files[1])

In [None]:
get_syuzhetr_sentiments()

In [None]:
lapply(files, function(x) {
  print(paste0('Novel: ', x))
  anovel_df <- read.csv(file = x, header=FALSE)
  syuzhetr_df <- get_syuzhetr_sentiments(x)  # the function reads the CSV itself, so pass the filename
  head(anovel_df)
})

## Syuzhet Lexicon

In [None]:
sentiment_syuzhet <- get_sentiment(s_v, method='syuzhet')
sentiment_syuzhet

In [None]:
simple_plot(sentiment_syuzhet)

In [None]:
s_v_sentiment_dct <- get_dct_transform(
  sentiment_syuzhet,
  low_pass_size = 5,
  x_reverse_len = 100,
  scale_vals = F,
  scale_range = T
)
plot(
  s_v_sentiment_dct,
  type = 'l',
  main="DCT Transformed Syuzhet Sentiments",
  xlab = "Narrative Time",
  ylab = "Emotional Valence",
  col = "red"
)
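
`get_dct_transform()` smooths the sentiment series by keeping only its lowest-frequency cosine components and resampling the result to a fixed length. A rough, library-free Python sketch of that idea follows. It assumes a plain DCT-II forward transform, a DCT-III inverse evaluated on `x_reverse_len` points, and min-max rescaling to [-1, 1] for `scale_range`. The function name is hypothetical, and syuzhet's own normalization constants may differ, although constant factors and offsets cancel out after rescaling.

```python
import math


def dct_low_pass(values, low_pass_size=5, x_reverse_len=100):
    """Low-pass a series with a DCT and resample it to x_reverse_len points in [-1, 1]."""
    n = len(values)
    if n == 0:
        return []
    keep = min(low_pass_size, n)
    # Forward DCT-II, computing only the lowest `keep` coefficients
    coeffs = [
        sum(v * math.cos(math.pi / n * (t + 0.5) * k) for t, v in enumerate(values))
        for k in range(keep)
    ]
    # Inverse (DCT-III) on x_reverse_len points; higher coefficients are treated as zero
    smooth = [
        coeffs[0] / 2 + sum(coeffs[k] * math.cos(math.pi / x_reverse_len * (m + 0.5) * k)
                            for k in range(1, keep))
        for m in range(x_reverse_len)
    ]
    lo, hi = min(smooth), max(smooth)
    if hi - lo <= 1e-12 * max(1.0, abs(hi), abs(lo)):   # (nearly) flat series
        return [0.0] * x_reverse_len
    # Rescale to the range [-1, 1]
    return [2 * (s - lo) / (hi - lo) - 1 for s in smooth]


# Hypothetical sentiment series that rises steadily over 50 sentences
arc = dct_low_pass([float(i) for i in range(50)])
print(len(arc), arc[0], arc[-1])
```

Because the output always has `x_reverse_len` points, arcs from novels of different lengths can be drawn on the same normalized narrative-time axis.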

## Bing Lexicon

In [None]:
sentiment_bing <- get_sentiment(s_v, method='bing')
sentiment_bing

In [None]:
simple_plot(sentiment_bing)

In [None]:
s_v_sentiment_dct <- get_dct_transform(
  sentiment_bing,
  low_pass_size = 5,
  x_reverse_len = 100,
  scale_vals = F,
  scale_range = T
)
plot(
  s_v_sentiment_dct,
  type = 'l',
  main="DCT Transformed Bing Sentiments",
  xlab = "Narrative Time",
  ylab = "Emotional Valence",
  col = "red"
)

## AFINN Lexicon

In [None]:
sentiment_afinn <- get_sentiment(s_v, method='afinn')
sentiment_afinn

In [None]:
simple_plot(sentiment_afinn)

In [None]:
s_v_sentiment_dct <- get_dct_transform(
  sentiment_afinn,
  low_pass_size = 5,
  x_reverse_len = 100,
  scale_vals = F,
  scale_range = T
)
plot(
  s_v_sentiment_dct,
  type = 'l',
  main="DCT Transformed AFINN Sentiments",
  xlab = "Narrative Time",
  ylab = "Emotional Valence",
  col = "red"
)

## NRC Lexicon

In [None]:
sentiment_nrc <- get_sentiment(s_v, method='nrc')
sentiment_nrc

In [None]:
simple_plot(sentiment_nrc)

In [None]:
s_v_sentiment_dct <- get_dct_transform(
  sentiment_nrc,
  low_pass_size = 5,
  x_reverse_len = 100,
  scale_vals = F,
  scale_range = T
)
plot(
  s_v_sentiment_dct,
  type = 'l',
  main="DCT Transformed NRC Sentiments",
  xlab = "Narrative Time",
  ylab = "Emotional Valence",
  col = "red"
)

## Stanford Lexicon (Disabled, req Java OpenNLP)

In [None]:
sentiment_stanford <- get_sentiment(s_v, method='stanford')
sentiment_stanford

In [None]:
simple_plot(sentiment_stanford)

In [None]:
s_v_sentiment_dct <- get_dct_transform(
  sentiment_stanford,
  low_pass_size = 5,
  x_reverse_len = 100,
  scale_vals = F,
  scale_range = T
)
plot(
  s_v_sentiment_dct,
  type = 'l',
  main="DCT Transformed Stanford Sentiments",
  xlab = "Narrative Time",
  ylab = "Emotional Valence",
  col = "red"
)

## Combine all SyuzhetR Lexicons

In [None]:
syuzhetr_all_df <- data.frame(syuzhet = sentiment_syuzhet,
                              bing = sentiment_bing,
                              afinn = sentiment_afinn,
                              nrc = sentiment_nrc)

syuzhetr_all_df

In [None]:
write.csv(syuzhetr_all_df, 'syuzhetr_novel.csv', row.names=FALSE)

# SentimentR Library

* https://github.com/trinker/sentimentr

* https://cran.r-project.org/web/packages/sentimentr/

In [None]:
install.packages('sentimentr')
library(sentimentr)

In [None]:
if (!require("pacman")) install.packages("pacman")
pacman::p_load_gh("trinker/sentimentr", "trinker/stansent", "sfeuerriegel/SentimentAnalysis", "wrathematics/meanr")
pacman::p_load(syuzhet, qdap, microbenchmark, RSentiment)

In [None]:
files <- list.files(pattern="*.csv", full.names=TRUE, recursive=FALSE)
length(files)

In [None]:
files[1]

In [None]:
sentiment_syuzhet <- get_sentiment(s_v, method='syuzhet')
sentiment_syuzhet

In [None]:
get_sentimentr_sentiments <- function(afilename) {
  anovel_df <- read.csv(file = afilename, header=FALSE)
  # typeof(anovel_df$V3)
  # return(anovel_df)
  s_v <- anovel_df$V3

  jockersrinker <- sentiment(s_v, polarity_dt=lexicon::hash_sentiment_jockers_rinker,
                                      hyphen="", amplifier.weight=0.8, n.before=5, n.after=2,
                                      adversative.weight=0.25, neutral.nonverb.like = FALSE, missing_value = 0)

  jockers <- sentiment(s_v, polarity_dt=lexicon::hash_sentiment_jockers,
                                      hyphen="", amplifier.weight=0.8, n.before=5, n.after=2,
                                      adversative.weight=0.25, neutral.nonverb.like = FALSE, missing_value = 0)

  huliu <- sentiment(s_v, polarity_dt=lexicon::hash_sentiment_huliu,
                                      hyphen="", amplifier.weight=0.8, n.before=5, n.after=2,
                                      adversative.weight=0.25, neutral.nonverb.like = FALSE, missing_value = 0)

  loughran_mcdonald <- sentiment(s_v, polarity_dt=lexicon::hash_sentiment_loughran_mcdonald,
                                      hyphen="", amplifier.weight=0.8, n.before=5, n.after=2,
                                      adversative.weight=0.25, neutral.nonverb.like = FALSE, missing_value = 0)

  sentimentr_nrc <- sentiment(s_v, polarity_dt=lexicon::hash_sentiment_nrc,
                                      hyphen="", amplifier.weight=0.8, n.before=5, n.after=2,
                                      adversative.weight=0.25, neutral.nonverb.like = FALSE, missing_value = 0)

  senticnet <- sentiment(s_v, polarity_dt=lexicon::hash_sentiment_senticnet,
                                      hyphen="", amplifier.weight=0.8, n.before=5, n.after=2,
                                      adversative.weight=0.25, neutral.nonverb.like = FALSE, missing_value = 0)

  sentiword <- sentiment(s_v, polarity_dt=lexicon::hash_sentiment_sentiword,
                                      hyphen="", amplifier.weight=0.8, n.before=5, n.after=2,
                                      adversative.weight=0.25, neutral.nonverb.like = FALSE, missing_value = 0)

  socal_google <- sentiment(s_v, polarity_dt=lexicon::hash_sentiment_socal_google,
                                      hyphen="", amplifier.weight=0.8, n.before=5, n.after=2,
                                      adversative.weight=0.25, neutral.nonverb.like = FALSE, missing_value = 0)

  slangsd <- sentiment(s_v, polarity_dt=lexicon::hash_sentiment_slangsd,
                                      hyphen="", amplifier.weight=0.8, n.before=5, n.after=2,
                                      adversative.weight=0.25, neutral.nonverb.like = FALSE, missing_value = 0)

  emojis <- sentiment(s_v, polarity_dt=lexicon::hash_sentiment_emojis,
                                      hyphen="", amplifier.weight=0.8, n.before=5, n.after=2,
                                      adversative.weight=0.25, neutral.nonverb.like = FALSE, missing_value = 0)

  sentimentr_df <- data.frame(element_id = jockersrinker$element_id,
                              sentence_id = jockersrinker$sentence_id,
                              word_count = jockersrinker$word_count,
                              jockers_rinker = jockersrinker$sentiment,
                              jockers = jockers$sentiment,
                              huliu = huliu$sentiment,
                              lmcd = loughran_mcdonald$sentiment,
                              nrc = sentimentr_nrc$sentiment,
                              senticnet = senticnet$sentiment,
                              sentiword = sentiword$sentiment,
                              socal_google = socal_google$sentiment,
                              slangsd = slangsd$sentiment,
                              emojis = emojis$sentiment)
  
  return(sentimentr_df)
}

In [None]:
files[1]

In [None]:
get_sentimentr_sentiments(files[1])

In [None]:
typeof(s_v)

In [None]:
ase <- c(
    "I haven't been sad in a long time.",
    "I am extremely happy today.",
    "It's a good day.",
    "But suddenly I'm only a little bit happy.",
    "Then I'm not happy at all.",
    "In fact, I am now the least happy person on the planet.",
    "There is no happiness left in me.",
    "Wait, it's returned!",
    "I don't feel so bad after all!"
)
typeof(ase)

In [None]:
syuzhet <- setNames(as.data.frame(lapply(c("syuzhet", "bing", "afinn", "nrc"),
    function(x) get_sentiment(s_v, method=x))), c("jockers", "bing", "afinn", "nrc"))


In [None]:
dim(syuzhet)

In [None]:
syuzhet

## Jockers_Rinker Lexicon

In [None]:
SentimentAnalysis <- apply(analyzeSentiment(s_v)[c('SentimentGI', 'SentimentLM', 'SentimentQDAP') ], 2, round, 2)
colnames(SentimentAnalysis) <- gsub('^Sentiment', "SA_", colnames(SentimentAnalysis))

In [None]:
sentimentr_jockersrinker <- sentiment(s_v, polarity_dt=lexicon::hash_sentiment_jockers_rinker, 
                                      hyphen="", amplifier.weight=0.8, n.before=5, n.after=2,
                                      adversative.weight=0.25, neutral.nonverb.like = FALSE, missing_value = 0)

In [None]:
sentimentr_jockersrinker

## Jockers Lexicon

In [None]:
sentimentr_jockers <- sentiment(s_v, polarity_dt=lexicon::hash_sentiment_jockers, 
                                      hyphen="", amplifier.weight=0.8, n.before=5, n.after=2,
                                      adversative.weight=0.25, neutral.nonverb.like = FALSE, missing_value = 0)

In [None]:
sentimentr_jockers

## Hu_Liu Lexicon

In [None]:
sentimentr_huliu <- sentiment(s_v, polarity_dt=lexicon::hash_sentiment_huliu, 
                                      hyphen="", amplifier.weight=0.8, n.before=5, n.after=2,
                                      adversative.weight=0.25, neutral.nonverb.like = FALSE, missing_value = 0)

In [None]:
sentimentr_huliu

## Loughran_McDonald Lexicon

In [None]:
sentimentr_loughranmcdonald <- sentiment(s_v, polarity_dt=lexicon::hash_sentiment_loughran_mcdonald, 
                                      hyphen="", amplifier.weight=0.8, n.before=5, n.after=2,
                                      adversative.weight=0.25, neutral.nonverb.like = FALSE, missing_value = 0)

In [None]:
sentimentr_loughranmcdonald

## NRC Lexicon

In [None]:
sentimentr_nrc <- sentiment(s_v, polarity_dt=lexicon::hash_sentiment_nrc, 
                                      hyphen="", amplifier.weight=0.8, n.before=5, n.after=2,
                                      adversative.weight=0.25, neutral.nonverb.like = FALSE, missing_value = 0)

In [None]:
sentimentr_nrc

## SenticNet Lexicon

In [None]:
sentimentr_senticnet <- sentiment(s_v, polarity_dt=lexicon::hash_sentiment_senticnet, 
                                      hyphen="", amplifier.weight=0.8, n.before=5, n.after=2,
                                      adversative.weight=0.25, neutral.nonverb.like = FALSE, missing_value = 0)

In [None]:
sentimentr_senticnet

## SentiWord Lexicon

In [None]:
sentimentr_sentiword <- sentiment(s_v, polarity_dt=lexicon::hash_sentiment_sentiword, 
                                      hyphen="", amplifier.weight=0.8, n.before=5, n.after=2,
                                      adversative.weight=0.25, neutral.nonverb.like = FALSE, missing_value = 0)

In [None]:
sentimentr_sentiword

## Socal_Google Lexicon

In [None]:
sentimentr_socalgoogle <- sentiment(s_v, polarity_dt=lexicon::hash_sentiment_socal_google, 
                                      hyphen="", amplifier.weight=0.8, n.before=5, n.after=2,
                                      adversative.weight=0.25, neutral.nonverb.like = FALSE, missing_value = 0)

In [None]:
sentimentr_socalgoogle

## SlangSD Lexicon

In [None]:
sentimentr_slangsd <- sentiment(s_v, polarity_dt=lexicon::hash_sentiment_slangsd, 
                                      hyphen="", amplifier.weight=0.8, n.before=5, n.after=2,
                                      adversative.weight=0.25, neutral.nonverb.like = FALSE, missing_value = 0)

In [None]:
sentimentr_slangsd

## Emoji Lexicon

In [None]:
sentimentr_emojis <- sentiment(s_v, polarity_dt=lexicon::hash_sentiment_emojis, 
                                      hyphen="", amplifier.weight=0.8, n.before=5, n.after=2,
                                      adversative.weight=0.25, neutral.nonverb.like = FALSE, missing_value = 0)

In [None]:
sentimentr_emojis

## Combine all SentimentR Lexicons

In [None]:
sentimentr_all_df <- data.frame(element_id = sentimentr_jockersrinker$element_id,
                               sentence_id = sentimentr_jockersrinker$sentence_id,
                               word_count = sentimentr_jockersrinker$word_count,
                               jockersrinker = sentimentr_jockersrinker$sentiment,
                               jockers = sentimentr_jockers$sentiment,
                               huliu = sentimentr_huliu$sentiment,
                               nrc = sentimentr_nrc$sentiment,
                               senticnet = sentimentr_senticnet$sentiment,
                               sentiword = sentimentr_sentiword$sentiment,
                               loughranmcdonald = sentimentr_loughranmcdonald$sentiment,
                               socalgoogle = sentimentr_socalgoogle$sentiment,
                               slangsd = sentimentr_slangsd$sentiment,
                               emojis = sentimentr_emojis$sentiment)

sentimentr_all_df

In [None]:
write.csv(sentimentr_all_df, 'sentiment_novel.csv', row.names=FALSE)

# Combine SentimentR and SyuzhetR

In [None]:
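# merge() would join on the shared 'nrc' column instead of pairing rows by sentence,
# so bind the columns side by side (assumes both data frames hold one row per sentence, in the same order)
sentiments_all_df <- cbind(setNames(syuzhetr_all_df, paste0('syuzhetr_', names(syuzhetr_all_df))),
                           sentimentr_all_df)
sentiments_all_df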

In [None]:
write.csv(sentiments_all_df, 'sentiments_novel.csv', row.names=FALSE)
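
The per-sentence scores from the two libraries line up by row position rather than by a key column, and both tables contain a column named `nrc`. A small Python sketch of combining them column-wise follows, using plain dictionaries of equal-length lists as stand-ins for the data frames. The helper name, the prefixes, and the sample numbers are hypothetical.

```python
def bind_columns(left, right, left_prefix, right_prefix):
    """Combine two column dicts row-by-row, prefixing names to avoid clashes."""
    lengths = {len(col) for col in list(left.values()) + list(right.values())}
    if len(lengths) > 1:
        raise ValueError(f'row counts differ: {sorted(lengths)}')
    combined = {f'{left_prefix}{name}': col for name, col in left.items()}
    combined.update({f'{right_prefix}{name}': col for name, col in right.items()})
    return combined


# Hypothetical scores for two sentences
syuzhetr = {'nrc': [1.0, -1.0], 'bing': [0.0, 1.0]}
sentimentr = {'nrc': [0.5, -0.25]}
print(bind_columns(syuzhetr, sentimentr, 'syuzhetr_', 'sentimentr_'))
```

Checking the row counts first surfaces a mismatch early, for example when one library splits the text into a different number of sentences.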

# Iterate over all files in the Directory

In [None]:
getwd()

In [None]:
list.files()

In [None]:
SUBDIR_TEXT_CLEAN

In [None]:
list.files(pattern="*.csv")

In [None]:
# files <- list.files(path=SUBDIR_TEXT_CLEAN, pattern="*.csv", full.names=TRUE, recursive=FALSE)
files <- list.files(pattern="*.csv", full.names=TRUE, recursive=FALSE)
length(files)

In [None]:
files[1]

In [None]:
(strsplit(files[1], '[/.]')[[1]])[3]

In [None]:
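file_roots_v <- c()  # initialize before appending inside the loop
for (afile in files) {
  # afile looks like './<root>.csv', so the root is the 3rd piece after splitting on '/' and '.'
  file_root <- (strsplit(afile, '[/.]')[[1]])[3]
  file_roots_v <- c(file_roots_v, file_root)
}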
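
The loop above recovers each novel's root name by splitting the path on `/` and `.` and taking the third piece, which assumes paths of the exact form `./<root>.csv`. For comparison, a short Python sketch that does not depend on the directory depth is given here; the function names are hypothetical, and the `sentiments_<root>.csv` pattern mirrors the output names used elsewhere in this notebook.

```python
from pathlib import Path


def file_root(path):
    """Return the filename without its directory or extension."""
    return Path(path).stem


def sentiments_outfile(path):
    """Name of the per-novel output CSV for a given input file."""
    return f'sentiments_{file_root(path)}.csv'


print(file_root('./cdickens_achristmascarol.csv'))
print(sentiments_outfile('./texts/cdickens_achristmascarol.csv'))
```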

In [None]:
length(file_roots_v)

In [None]:
library(stringr)

In [None]:
library(tibble)
library(tidyr)
library(dplyr)

tibble(files) %>%
  separate(files, into = c("lv1", "lv2", "lv3"), sep = "/", fill = "left") %>%
  mutate("version" = str_extract(lv3, regex("v\\d+")))

In [None]:
# R vectors are 1-indexed: files[0] returns an empty vector, files[1] is the first file
files[1]

In [None]:
lapply(files, function(x) {
  cat('\nNovel: ', x, '\n', sep='')
  # anovel_df <- read.csv(file = x, header=FALSE)
  syuzhetr_df <- get_syuzhetr_sentiments(x)
  sentimentr_df <- get_sentimentr_sentiments(x)

  # The two data frames share no column, so merge() would return their cross product;
  # bind the columns side by side instead (assumes one row per sentence in both)
  anovel_sentiments_df <- cbind(sentimentr_df, syuzhetr_df)
  # head(anovel_sentiments_df, 5)

  file_root <- (strsplit(x, '[/.]')[[1]])[3]
  outfile <- paste0('sentiments_', file_root, '.csv')
  print(paste0('  saving to: ', outfile))
  write.csv(x=anovel_sentiments_df, file=outfile, row.names=FALSE)
})

In [None]:
lapply(files, function(x) {
  carSpeeds <- read.csv(file = 'data/car-speeds.csv')
  head(carSpeeds)

  
    t <- read.table(x, header=TRUE) # load file
    # apply function
    out <- function(t)
    # write to file
    write.table(out, "path/to/output", sep="\t", quote=FALSE, row.names=FALSE, col.names=TRUE)
})

# **END OF NOTEBOOK**

In [None]:
sentiment_all <- left_just(data.frame(
    sentimentr_jockersrinker = round(sentiment(s_v, question.weight = 0)[["sentiment"]], 2),
    sentimentr_jockers = round(sentiment(s_v, lexicon::hash_sentiment_jockers, question.weight = 0)[["sentiment"]], 2),
    sentimentr_huliu = round(sentiment(s_v, lexicon::hash_sentiment_huliu, question.weight = 0)[["sentiment"]], 2),
    # no trailing comma after the last column: an empty argument makes data.frame() fail
    sentimentr_sentiword = round(sentiment(s_v, lexicon::hash_sentiment_sentiword, question.weight = 0)[["sentiment"]], 2)
))

In [None]:
left_just(data.frame(
    # stanford = sentiment_stanford(s_v)[["sentiment"]],
    sentimentr_jockersrinker = round(sentiment(s_v, question.weight = 0)[["sentiment"]], 2),
    sentimentr_jockers = round(sentiment(s_v, lexicon::hash_sentiment_jockers, question.weight = 0)[["sentiment"]], 2),    
    sentimentr_huliu = round(sentiment(s_v, lexicon::hash_sentiment_huliu, question.weight = 0)[["sentiment"]], 2),    
    sentimentr_sentiword = round(sentiment(s_v, lexicon::hash_sentiment_sentiword, question.weight = 0)[["sentiment"]], 2),    
    RSentiment = calculate_score(s_v), 
    SentimentAnalysis,
    meanr = score(s_v)[['score']],
    syuzhet,
    sentences = s_v,
    stringsAsFactors = FALSE
), "sentences")

In [None]:
os <- import("os")
os$listdir(".")

In [None]:
typeof(novel_df['text_clean'])

In [None]:
print(novel_df['text_clean'])

In [None]:
(novel_df['text_clean'])

In [None]:
library(stringi)

In [None]:
novel_str = stri_join_list(novel_df['text_clean'], sep=' ', collapse=TRUE)
typeof(novel_str)

In [None]:
substr(novel_str, 1, 100)

In [None]:
bovary_s_v <- get_sentences(bovary_str)
bovary_s_v

In [None]:
typeof(bovary_s_v)

In [None]:
bovary_sentiment <- get_sentiment(bovary_s_v)

In [None]:
typeof(bovary_sentiment)

In [None]:

simple_plot(bovary_sentiment)

## Installing libraries

In [None]:
install.packages('reticulate')

In [None]:
# which version of python are we using, where is it?

cli_msg <- system('which python', intern=TRUE)
cli_msg

In [None]:
# In RStudio, run 'usethis::edit_r_profile()' and add the Sys.setenv() line below

Sys.setenv(RETICULATE_PYTHON = "/usr/local/bin/python")

library(reticulate)

In [None]:
print("hello")

In [None]:
repl_python(input='print("hello")')

In [None]:
system('wget https://raw.githubusercontent.com/mwaskom/seaborn-data/master/flights.csv', intern=TRUE)


In [None]:
cli_msg <- system('ls -altr', intern=TRUE)
cli_msg

In [None]:
os <- import("os")
os$listdir(".")

In [None]:
#importing required Python libraries/modules
sns <- import('seaborn')
plt <- import('matplotlib.pyplot')
pd <- import('pandas')

# Syuzhet

In [None]:
install.packages('syuzhet')

In [None]:
library(syuzhet)

In [None]:
install.packages('gutenbergr')

In [None]:
library(gutenbergr)

In [None]:
library(dplyr)

In [None]:
gutenberg_metadata %>%
  filter(title == "Wuthering Heights")

In [None]:
gutenberg_works(author == "Austen, Jane")

In [None]:
library(stringr)
gutenberg_works(str_detect(author, "Austen"))

In [None]:
library(stringr)
gutenberg_works(str_detect(title, "Bovary"))

In [None]:
bovary <- gutenberg_download(2413)
bovary

In [None]:
os <- import("os")
os$listdir(".")

In [None]:
typeof(bovary[1])

In [None]:
# R indexes from 1; [[2]] extracts the text column as a character vector
typeof(bovary[[2]][1])

In [None]:
library(stringi)

In [None]:
# Join all lines of the novel into one string (collapse must be a string, not TRUE)
bovary_str <- stri_flatten(bovary[[2]], collapse = ' ')
typeof(bovary_str)

In [None]:
substr(bovary_str, 1, 100)

In [None]:
s_v <- get_sentences(bovary_str)
s_v

In [None]:
s_v_sentiment <- get_sentiment(s_v)
s_v_sentiment

In [None]:
simple_plot(s_v_sentiment)

In [None]:
# str_flatten() expects a character vector, so extract the column with [[2]]
bovary_str = str_flatten(bovary[[2]], " ")

In [None]:
bovary_str2 = str_flatten(noquote(bovary[[2]]), " ")

In [None]:
bovary_str2

In [None]:
typeof(bovary_str)

In [None]:
bovary_str

In [None]:
bovary_v <- get_sentences(bovary_str)

In [None]:
typeof(bovary_v)

In [None]:
sent_v <- get_sentences(bovary)

In [None]:
```{python}
import os
```

In [None]:
#using R's inbuilt AirPassengers dataset
df <- datasets::AirPassengers

#converting Time-Series object into an R Dataframe 
#Thx: https://stackoverflow.com/questions/5331901/transforming-a-time-series-into-a-data-frame-and-back
df1 <- data.frame(tapply(df, list(year = floor(time(df)), month = month.abb[cycle(df)]), c))
df1 <- df1[month.abb]

#building a heatmap using seaborn
#note that r_to_py() converts an R object into a Python object
sns$heatmap(r_to_py(df1), fmt="g", cmap ='viridis')

#display the plot
plt$show()

In [None]:
```{python}

name = 'Bill'

print(f'Hello {name}!')
```

In [None]:
```{python}
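import pandas as pd

# NOTE: the filters below expect a flights.csv with 'dest', 'carrier', 'dep_delay',
#       and 'arr_delay' columns (nycflights13-style); the seaborn flights.csv
#       downloaded earlier only has 'year', 'month', and 'passengers'.
flights = pd.read_csv('flights.csv')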
flights = flights[flights['dest'] == "ORD"]
flights = flights[['carrier', 'dep_delay', 'arr_delay']]
flights = flights.dropna()
```

In [None]:
# using R's inbuilt AirPassengers dataset
df <- datasets::AirPassengers

In [None]:
df

In [None]:
install.packages('caret')

In [None]:
install.packages('mlbench')

## Importing libraries

In [None]:
library(caret)

In [None]:
library(ggplot2)

In [None]:
library(mlbench)

## How many CPU cores are there?

In [None]:
library(parallel)
detectCores(all.tests = FALSE, logical = TRUE)

---

# Machine Learning in R: Building a Linear Regression Model

YouTube:
https://www.youtube.com/watch?v=el8xP38SWdk

GitHub:
https://github.com/dataprofessor/code/blob/master/linear-regression/boston-housing-linear-regression.R

In [None]:
############################################
# Data Professor                           #
# http://youtube.com/dataprofessor         #
# http://github.com/dataprofessor          #
# http://facebook.com/dataprofessor        #
# https://www.instagram.com/data.professor #
############################################

# Importing libraries
library(mlbench) # Contains several benchmark data sets (especially the Boston Housing dataset)
library(caret) # Package for machine learning algorithms / CARET stands for Classification And REgression Training

# Importing the Boston Housing data set
data(BostonHousing)

head(BostonHousing)

# Check whether there are any missing values
sum(is.na(BostonHousing))

# Set the random seed so the model is reproducible
set.seed(100)

# Perform a stratified random split of the data set
TrainingIndex <- createDataPartition(BostonHousing$medv, p=0.8, list = FALSE)
TrainingSet <- BostonHousing[TrainingIndex,] # Training Set
TestingSet <- BostonHousing[-TrainingIndex,] # Test Set


###############################

# Build Training model
Model <- train(medv ~ ., data = TrainingSet,
               method = "lm",
               na.action = na.omit,
               preProcess=c("scale","center"),
               trControl= trainControl(method="none")
)

# Apply model for prediction
Model.training <-predict(Model, TrainingSet) # Apply model to make prediction on Training set
Model.testing <-predict(Model, TestingSet) # Apply model to make prediction on Testing set

# Model performance (scatter plots of observed vs. predicted medv)
# Scatter plot of Training set
plot(TrainingSet$medv, Model.training, col = "blue")
# Scatter plot of Testing set
plot(TestingSet$medv, Model.testing, col = "blue")

---

# Machine Learning in R: Building a Linear Regression Model

YouTube:
https://www.youtube.com/watch?v=el8xP38SWdk

GitHub:
https://github.com/dataprofessor/code/blob/master/linear-regression/boston-housing-linear-regression.R

### Importing libraries

In [None]:
library(mlbench) # Contains several benchmark data sets (especially the Boston Housing dataset)
library(caret) # Package for machine learning algorithms / CARET stands for Classification And REgression Training

### Importing the Boston Housing data set

In [None]:
data(BostonHousing)

head(BostonHousing)

### Check whether there are any missing values

In [None]:
sum(is.na(BostonHousing))

### Set the random seed so the model is reproducible

In [None]:
set.seed(100)

### Perform a stratified random split of the data set

In [None]:
TrainingIndex <- createDataPartition(BostonHousing$medv, p=0.8, list = FALSE)
TrainingSet <- BostonHousing[TrainingIndex,] # Training Set
TestingSet <- BostonHousing[-TrainingIndex,] # Test Set
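For a numeric outcome such as `medv`, a stratified split is usually done by binning the outcome into quantile groups and sampling within each group. The Python sketch below illustrates that idea only; it is not caret's implementation, and the function name, the five groups, and the rounding rule are assumptions made for the example.

```python
import random


def stratified_split(y, p=0.8, groups=5, seed=100):
    """Return sorted training indices, sampled within quantile bins of y."""
    rng = random.Random(seed)
    order = sorted(range(len(y)), key=lambda i: y[i])  # indices ranked by outcome
    train = []
    for g in range(groups):
        # Slice the ranked indices into `groups` roughly equal bins
        lo = g * len(order) // groups
        hi = (g + 1) * len(order) // groups
        bin_idx = order[lo:hi]
        # Sample the same fraction p from every bin
        train.extend(rng.sample(bin_idx, round(p * len(bin_idx))))
    return sorted(train)


y = [float(v) for v in range(100)]   # made-up outcome values
train_idx = stratified_split(y)
test_idx = [i for i in range(len(y)) if i not in set(train_idx)]
```

Because every bin contributes the same fraction, the training and testing sets cover low, middle, and high outcome values in similar proportions.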

### Build Training model

In [None]:
Model <- train(medv ~ ., data = TrainingSet,
               method = "lm",
               na.action = na.omit,
               preProcess=c("scale","center"),
               trControl= trainControl(method="none")
)
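The `preProcess=c("scale","center")` option standardizes each predictor: it subtracts the mean and divides by the standard deviation. The key point is that both statistics are estimated on the training data and then reused for new data. A minimal Python sketch of that idea, with made-up values and hypothetical helper names:

```python
from statistics import mean, stdev


def fit_scaler(values):
    """Return the (mean, sample standard deviation) of a training column."""
    return mean(values), stdev(values)


def transform(values, center, scale):
    """Center and scale values using statistics learned on the training set."""
    return [(v - center) / scale for v in values]


train_col = [2.0, 4.0, 6.0, 8.0]   # made-up training values
test_col = [5.0, 10.0]             # made-up testing values

center, scale = fit_scaler(train_col)
train_scaled = transform(train_col, center, scale)
test_scaled = transform(test_col, center, scale)  # reuses the training statistics
```

After this step the training column has mean 0 and standard deviation 1, while the testing column generally does not, because it was scaled with the training statistics.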

### Apply model for prediction

In [None]:
Model.training <-predict(Model, TrainingSet) # Apply model to make prediction on Training set
Model.testing <-predict(Model, TestingSet) # Apply model to make prediction on Testing set


### Model performance (scatter plots of observed vs. predicted values)

Scatter plot of Training set

In [None]:
plot(TrainingSet$medv,Model.training, col = "blue" )

Scatter plot of Testing set

In [None]:
plot(TestingSet$medv,Model.testing, col = "blue" )
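The scatter plots show agreement only visually. Common numeric summaries for a regression model are RMSE, MAE, and R-squared; in R, `caret::postResample(pred, obs)` reports comparable values. The Python sketch below shows how such metrics can be computed from observed and predicted values. The numbers are made up for illustration, and R-squared is taken here as the squared Pearson correlation, which is one of several definitions.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def mae(observed, predicted):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def r_squared(observed, predicted):
    """Squared Pearson correlation between observed and predicted values."""
    n = len(observed)
    mean_o, mean_p = sum(observed) / n, sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov ** 2 / (var_o * var_p)


observed = [24.0, 21.6, 34.7, 33.4]    # made-up medv values
predicted = [25.0, 22.0, 33.0, 31.0]   # made-up model predictions

print(rmse(observed, predicted), mae(observed, predicted), r_squared(observed, predicted))
```

Computing the same metrics on the training and testing sets and comparing them gives a quick check for overfitting.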