<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Preprocess-emojis-and-emoticons" data-toc-modified-id="Preprocess-emojis-and-emoticons-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Preprocess emojis and emoticons</a></span><ul class="toc-item"><li><span><a href="#Import-libraries" data-toc-modified-id="Import-libraries-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>Import libraries</a></span></li><li><span><a href="#Load-reviews" data-toc-modified-id="Load-reviews-1.2"><span class="toc-item-num">1.2&nbsp;&nbsp;</span>Load reviews</a></span></li><li><span><a href="#Define-a-function-to-convert-emojis-and-emoticons" data-toc-modified-id="Define-a-function-to-convert-emojis-and-emoticons-1.3"><span class="toc-item-num">1.3&nbsp;&nbsp;</span>Define a function to convert emojis and emoticons</a></span></li><li><span><a href="#Define-a-function-to-convert-emoticons" data-toc-modified-id="Define-a-function-to-convert-emoticons-1.4"><span class="toc-item-num">1.4&nbsp;&nbsp;</span>Define a function to convert emoticons</a></span></li><li><span><a href="#Convert-emojis-and-emoticons-into-words" data-toc-modified-id="Convert-emojis-and-emoticons-into-words-1.5"><span class="toc-item-num">1.5&nbsp;&nbsp;</span>Convert emojis and emoticons into words</a></span></li><li><span><a href="#Export-file" data-toc-modified-id="Export-file-1.6"><span class="toc-item-num">1.6&nbsp;&nbsp;</span>Export file</a></span></li></ul></li></ul></div>

# Preprocess emojis and emoticons

App reviews often contain emojis and emoticons, which users include to express emotions or opinions. To keep this information, we 'translate' them into written expressions that are easier to analyze automatically:
* To manually label sentiment in the training set, we use the opinion units extracted from the reviews before emoji and emoticon preprocessing.
* To train our sentiment classifier, we first convert the emojis and emoticons into words.


The emoji and emoticon dictionaries, as well as the main functions used to convert them into written text, come from this [post about emojis and emoticons](https://studymachinelearning.com/text-preprocessing-handle-emoji-emoticon/). The functions have been slightly modified to make sure emojis are well separated from one another.

## Import libraries

In [4]:
import os
import pandas as pd

In [5]:
import pickle
import re

In [6]:
import spacy
nlp = spacy.load('en_core_web_sm')

## Load reviews

In [7]:
path = os.getcwd()
filename = 'app_reviews_airvisual-air-quality-forecast_1048912974_by_lang_us_exp_abb.csv'
subfolder = '/../data/1_preprocessed_data/'

In [8]:
df = pd.read_csv(path+subfolder+filename)
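
The input path above is built by string concatenation. As a hedged alternative, the same location can be expressed with `pathlib`; the names `data_dir` and `csv_path` are illustrative and not part of the original notebook, and the sketch assumes the notebook sits one level below the project root, as the `/../` in `subfolder` suggests.

```python
from pathlib import Path

# Same file name as in the cell above
filename = 'app_reviews_airvisual-air-quality-forecast_1048912974_by_lang_us_exp_abb.csv'

# Parent of the working directory, then data/1_preprocessed_data
# (assumed to be equivalent to path + '/../data/1_preprocessed_data/')
data_dir = Path.cwd().parent / 'data' / '1_preprocessed_data'
csv_path = data_dir / filename

print(csv_path)
# pd.read_csv accepts a Path object directly, e.g. df = pd.read_csv(csv_path)
```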

In [9]:
df.head()

Unnamed: 0.2,Unnamed: 0,Unnamed: 0.1,review_id,rating,title,review_date,user_name,review,response_id,dev_response,response_date,lang,title_expanded,review_expanded
0,0,3095,6121840341,5,Happy to finally see when and why I can’t brea...,2020-06-25T23:18:57Z,Abbsteroni,Having allergies is annoying but I’m glad to s...,,,,en,Happy to finally see when and why I ca n’t bre...,Having allergies is annoying but I ’m glad to ...
1,1,982,6114444527,5,Super,2020-06-24T02:12:59Z,WillJosue,Easy to keep track on specific Local areas,,,,en,Super,Easy to keep track on specific Local areas
2,2,1965,6114325210,5,Great App,2020-06-24T01:31:38Z,Nejinater,Full of good information!,,,,en,Great App,Full of good information !
3,3,797,6111838742,5,Great app for filtering the air,2020-06-23T11:01:04Z,Jamieissad,Tells you everything you need to know about th...,,,,en,Great app for filtering the air,Tells you everything you need to know about th...
4,4,962,6104666348,5,I look everyday,2020-06-21T14:29:25Z,TorchPitchfork,This is part of my daily planning. I love the ...,,,,en,I look everyday,This is part of my daily planning . I love the...


## Define a function to convert emojis and emoticons

In [106]:
# load emoji dictionary
with open(path+'/Emoji_Dict.p', 'rb') as fp:
    Emoji_Dict = pickle.load(fp)

In [107]:
# we swap keys and values of the Emoji_Dict, such that emojis become keys (instead of values) of the dictionary
Emoji_Dict = {v: k for k, v in Emoji_Dict.items()}
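
The key/value swap can be illustrated on a toy dictionary. This is only a sketch: the two entries below are hypothetical, and the name-to-emoji orientation of the pickled dictionary is inferred from the comment in the cell above.

```python
# Hypothetical two-entry dictionary in the assumed original orientation (name -> emoji)
name_to_emoji = {":thumbs_up:": "👍", ":folded_hands:": "🙏"}

# Swap keys and values so that the emoji becomes the lookup key
emoji_to_name = {emoji: name for name, emoji in name_to_emoji.items()}

# If two names shared the same emoji, one entry would be silently dropped,
# so comparing sizes is a cheap sanity check
assert len(emoji_to_name) == len(name_to_emoji)

print(emoji_to_name["👍"])
```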

In [108]:
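# define a function for converting emojis into words
def convert_emojis_to_word(text):
    for emot in Emoji_Dict:
        # build the label: drop commas and colons, join the remaining words with underscores
        label = "_".join(Emoji_Dict[emot].replace(",","").replace(":","").split())
        # re.escape guards against emojis that contain regex metacharacters;
        # the surrounding spaces keep consecutive emojis separated
        text = re.sub(re.escape(emot), " "+label+" ", text)
    return text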
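
Each emoji is spliced into a regular expression, so an emoji that contains a metacharacter needs escaping. The sketch below assumes the dictionary might contain the keycap asterisk emoji, whose first code point is a literal `*`; the label `keycap_asterisk` is made up for illustration.

```python
import re

keycap_star = "*\ufe0f\u20e3"  # keycap asterisk emoji, starts with a literal '*'

# Unescaped, the leading '*' is read as a quantifier and the pattern is invalid
try:
    re.compile("(" + keycap_star + ")")
    unescaped_ok = True
except re.error:
    unescaped_ok = False

# Escaped, the emoji is matched literally
escaped = re.sub(re.escape(keycap_star), " keycap_asterisk ", "Rated 5" + keycap_star)

print(unescaped_ok)
print(escaped)
```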

In [109]:
text = df.iloc[11].loc['review']
print(text)

What a great app , beautiful design, useful handling , perfect information 👍 5 stars and high recommended. What I miss ? Only 1 thing , a complication for the Siri watchface , it would be so cool when I see t by rough the day some cards with the air data on my Siri watchface 👍 any chance for this feature? Keep working on this app ist a 5 Star app 👍


In [110]:
convert_emojis_to_word(text)

'What a great app , beautiful design, useful handling , perfect information  thumbs_up  5 stars and high recommended. What I miss ? Only 1 thing , a complication for the Siri watchface , it would be so cool when I see t by rough the day some cards with the air data on my Siri watchface  thumbs_up  any chance for this feature? Keep working on this app ist a 5 Star app  thumbs_up '

In [111]:
text = df.iloc[12].loc['review']
print(text)
convert_emojis_to_word(text)

Fast and easy to use 😂👍🙏


'Fast and easy to use  face_with_tears_of_joy  thumbs_up  folded_hands '

NB: consecutive emojis need to be separated from one another, which is why each replacement is padded with a space on both sides.
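
The padding leaves double spaces and a trailing space in the converted text, as the outputs above show. If that matters downstream, one possible clean-up step is sketched here; `normalize_spaces` is a hypothetical helper and not part of the original notebook.

```python
def normalize_spaces(text):
    """Collapse runs of whitespace and strip leading/trailing spaces."""
    return " ".join(text.split())

converted = 'Fast and easy to use  face_with_tears_of_joy  thumbs_up  folded_hands '
print(normalize_spaces(converted))
```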

## Define a function to convert emoticons

In [112]:
# load emoticon dictionary
with open(path+'/Emoticon_Dict.p', 'rb') as fp:
    Emoticon_Dict = pickle.load(fp)

In [113]:
# define a function for converting emoticons into words
def convert_emoticons_to_word(text):
    for emot in Emoticon_Dict:
        # the keys of Emoticon_Dict are used as regular-expression patterns, so they are not escaped here
        text = re.sub('('+emot+')', "_".join(Emoticon_Dict[emot].replace(",","").replace(":","").split())+" ", text)
    return text

In [114]:
print(df.iloc[84].loc['review'])

That is my daily application at the moment and can’t stop using it - Thank you :)


In [115]:
convert_emoticons_to_word(df.iloc[84].loc['review'])

'That is my daily application at the moment and can’t stop using it - Thank you Happy_face_or_smiley '
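
Unlike the emoji function, the emoticon function only adds a trailing space, so an emoticon typed directly after a word would fuse with it. The variant below pads both sides and then collapses the extra spaces. It is only a sketch: `toy_emoticons` is a hypothetical two-entry stand-in for `Emoticon_Dict`, and its keys are assumed to be regex patterns, as in the function above.

```python
import re

# Hypothetical mini dictionary; keys are regex patterns, values are descriptions
toy_emoticons = {r":\)": "Happy face or smiley", r":\(": "Frown, sad"}

def convert_emoticons_padded(text):
    for pattern, description in toy_emoticons.items():
        label = "_".join(description.replace(",", "").split())
        # pad on both sides so the label never fuses with the previous word
        text = re.sub(pattern, " " + label + " ", text)
    # collapse the extra spaces introduced by the padding
    return " ".join(text.split())

print(convert_emoticons_padded("Thank you:)"))
```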

## Convert emojis and emoticons into words

In [116]:
# work on an explicit copy of a small slice, so that new columns can be added without a SettingWithCopyWarning
df_test = df.iloc[10:15].copy()

In [117]:
convert_emojis_to_word(df_test.iloc[1].loc['review'])

'What a great app , beautiful design, useful handling , perfect information  thumbs_up  5 stars and high recommended. What I miss ? Only 1 thing , a complication for the Siri watchface , it would be so cool when I see t by rough the day some cards with the air data on my Siri watchface  thumbs_up  any chance for this feature? Keep working on this app ist a 5 Star app  thumbs_up '

In [118]:
df_test.loc[:,'title_transl_emo'] = df_test['title_expanded'].apply(convert_emojis_to_word).apply(convert_emoticons_to_word)


In [119]:
df_test.head()

Unnamed: 0.2,Unnamed: 0,Unnamed: 0.1,review_id,rating,title,review_date,user_name,review,response_id,dev_response,response_date,lang,title_expanded,review_expanded,title_transl_emo
10,10,2580,6052473443,5,good luv app,2020-06-08T23:06:36Z,맑은이의영상창고,clean world,,,,en,good luv app,clean world,good luv app
11,11,300,6043053831,5,Perfect 👍,2020-06-06T15:00:25Z,Kofolu,"What a great app , beautiful design, useful ha...",,,,en,Perfect 👍,"What a great app , beautiful design , useful h...",Perfect thumbs_up
12,12,2150,6023976474,5,Great App,2020-06-02T02:11:14Z,Smile7777,Fast and easy to use 😂👍🙏,,,,en,Great App,Fast and easy to use 😂 👍 🙏,Great App
13,13,1689,6012386491,5,Helpful,2020-05-30T06:36:46Z,nroose,Wish I didn’t need it.,,,,en,Helpful,Wish I did n’t need it .,Helpful
14,14,1205,6012239877,5,Great app,2020-05-30T05:44:05Z,12lena34,Easy to use and great for checking everyday ai...,,,,en,Great app,Easy to use and great for checking everyday ai...,Great app


In [121]:
df_test.loc[:,'review_transl_emo'] = df_test['review_expanded'].apply(convert_emojis_to_word).apply(convert_emoticons_to_word)

In [122]:
df_test.head()

Unnamed: 0.2,Unnamed: 0,Unnamed: 0.1,review_id,rating,title,review_date,user_name,review,response_id,dev_response,response_date,lang,title_expanded,review_expanded,title_transl_emo,review_transl_emo
10,10,2580,6052473443,5,good luv app,2020-06-08T23:06:36Z,맑은이의영상창고,clean world,,,,en,good luv app,clean world,good luv app,clean world
11,11,300,6043053831,5,Perfect 👍,2020-06-06T15:00:25Z,Kofolu,"What a great app , beautiful design, useful ha...",,,,en,Perfect 👍,"What a great app , beautiful design , useful h...",Perfect thumbs_up,"What a great app , beautiful design , useful h..."
12,12,2150,6023976474,5,Great App,2020-06-02T02:11:14Z,Smile7777,Fast and easy to use 😂👍🙏,,,,en,Great App,Fast and easy to use 😂 👍 🙏,Great App,Fast and easy to use face_with_tears_of_joy ...
13,13,1689,6012386491,5,Helpful,2020-05-30T06:36:46Z,nroose,Wish I didn’t need it.,,,,en,Helpful,Wish I did n’t need it .,Helpful,Wish I did n’t need it .
14,14,1205,6012239877,5,Great app,2020-05-30T05:44:05Z,12lena34,Easy to use and great for checking everyday ai...,,,,en,Great app,Easy to use and great for checking everyday ai...,Great app,Easy to use and great for checking everyday ai...


In [123]:
# convert emojis and emoticons in review title
df.loc[:,'title_transl_emo'] = df['title_expanded'].apply(convert_emojis_to_word).apply(convert_emoticons_to_word)

In [124]:
# convert emojis and emoticons in review text
df.loc[:,'review_transl_emo'] = df['review_expanded'].apply(convert_emojis_to_word).apply(convert_emoticons_to_word)

## Export file

In [125]:
export_filename = filename[:-4]+'_transl_emo.csv'
export_filename

'app_reviews_airvisual-air-quality-forecast_1048912974_by_lang_us_exp_abb_transl_emo.csv'
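
Slicing with `filename[:-4]` relies on the extension being exactly four characters long. A slightly more defensive way to build the same name is sketched below with `os.path.splitext`; the variables `stem` and `ext` are illustrative.

```python
import os

filename = 'app_reviews_airvisual-air-quality-forecast_1048912974_by_lang_us_exp_abb.csv'

# Split the name into its stem and its extension ('.csv')
stem, ext = os.path.splitext(filename)
export_filename = stem + '_transl_emo' + ext

print(export_filename)
```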

In [126]:
export_subfolder = '/../data/1_preprocessed_data/'
export_subfolder

'/../data/1_preprocessed_data/'

In [127]:
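# index=False avoids writing yet another 'Unnamed: 0' index column to the exported file
df.to_csv(path+export_subfolder+export_filename, index=False)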