# Dimensions - Adjectives

In the first part, we analyzed all adjectives regardless of what they describe. In this part, we focus on the adjectives attached to the terms of each dimension (food, service, ambience and value) to find out which adjectives guests use to describe each one.

# Importing Libraries and Data Loading

In [1]:
import pandas as pd
import nltk  # word_tokenize and pos_tag need NLTK's tokenizer and tagger data (install with nltk.download())
from nltk.tokenize import RegexpTokenizer
from functools import reduce
import numpy as np

In [2]:
reviews = pd.read_excel('Reviews.xlsx')
ps = pd.read_excel('Polarity_scores.xlsx')

# Food terms

First, we start with the terms related to food *(food, dish, meal, portion, menu, offer, presented)*. For each term, we **split** the *Review* column at the first occurrence of the term. For example, when the term *'food'* appears in a review, the review is **divided into three columns**: the first holds the part of the review **before** the term, the second holds the **term** itself, and the third holds the rest of the review **after** the term.
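To illustrate the split, here is a small sketch on toy reviews (hypothetical data, not taken from *Reviews.xlsx*). It uses the same pandas `str.partition` call as `getAdjectiveBefore`. Note that `partition` splits at the first occurrence only and matches substrings, so *'seafood'* is split on *'food'* as well.

```python
import pandas as pd

# toy reviews (hypothetical data, not from Reviews.xlsx)
toy_reviews = pd.Series(['Really good food and nice staff',
                         'The food was cold',
                         'Great seafood platter',
                         'Lovely view'])

# split each lower-cased review at the FIRST occurrence of the term
parts = toy_reviews.str.lower().str.partition('food')
parts = parts.rename(columns={0: 'previous', 1: 'food', 2: 'next'})

# keep only reviews where the term was found (the middle column is empty otherwise)
parts = parts[parts['food'] == 'food'].reset_index(drop=True)
print(parts)
```

The review *'Lovely view'* is dropped, while *'Great seafood platter'* is kept with *'great sea'* as the text before the term.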

Next, we keep only the **last word** in the first column *Before* and only the **first three words** in the third column *After*. The logic is that if a guest described the food as *'good food'*, the adjective *'good'* comes **before** the term and ends up in the column *Before*; if the guest wrote *'food is good'*, the adjective comes **after** the term and ends up in the column *After*.
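A minimal sketch of the *Before*/*After* step on hand-made rows (hypothetical data), using the same `str.split`, `str[-1]`, `str[0:3]` and `str.join` calls as `get_words` inside `getAdjectiveBefore`. When nothing precedes the term, `str[-1]` returns NaN and the term itself is used as a placeholder, as in the notebook.

```python
import pandas as pd

# output of the partition step for the term 'food' (hypothetical toy rows)
df_split = pd.DataFrame({'previous': ['really good ', 'the ', ''],
                         'next': [' and nice staff', ' was cold', ' was amazing, truly']})

label = 'food'
# last word before the term; a review starting with the term has none -> NaN -> label
df_split['Before'] = df_split['previous'].str.split().str[-1].fillna(label)
# first three words after the term, joined back into one string
df_split['After'] = df_split['next'].str.split().str[0:3].str.join(' ')
print(df_split[['Before', 'After']])
```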

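Using *pos_tag* from the *nltk* library, we tag the words in the columns *Before* and *After* and keep the **adjectives** *(JJ, JJR, JJS)*. The filter keeps every tag that contains *'JJ'*, so adverbs *(RB, RBR, RBS)* are not extracted. We create two functions, *getAdjectiveBefore* for the column *Before* and *getAdjectiveAfter* for the column *After*, to **extract** the adjectives. Finally, we **combine** the results into one dataframe named *food_review3*. For each term, the columns *Occurrences_[term]_first* and *Occurrences_[term]_last* count how often an adjective appears before and after the term, and the column *Adjective_occurrence_food* holds the **total number of occurrences** of each adjective across all food terms.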
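The tag filter can be checked without running the tagger. The sketch below hard-codes (word, tag) pairs in the shape `nltk.pos_tag` returns (the pairs are made up, not real tagger output) and applies the same `"word,TAG"` join and `str.contains('JJ')` filter as the notebook.

```python
import pandas as pd

# (word, tag) pairs shaped like nltk.pos_tag output; hard-coded, hypothetical
tagged = pd.Series([[('good', 'JJ')], [('better', 'JJR')], [('really', 'RB')],
                    [('the', 'DT')], [('best', 'JJS')]])

# first pair of each row rendered as "word,TAG", as in getAdjectiveBefore
pairs = tagged.apply(lambda x: x[0]).str.join(',')

# tags containing "JJ" are JJ, JJR and JJS; adverbs such as RB are dropped
adjectives = pairs[pairs.str.contains('JJ')].str.split(',').str[0]
print(adjectives.tolist())
```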

### Food

In [3]:
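def getAdjectiveBefore(label):
    """Split every review containing `label` at its first occurrence, POS-tag the last word
    before the term and the first three words after it, and return
    (tagged rows, adjectives found directly before the term)."""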
    
    def cleaning(label):
        df = reviews.Review.str.lower()
        df = df.str.partition(label)  # split each review at the first occurrence of the term
        df = df.rename(columns={0:'previous', 1:label, 2:'next'})
        df = df[df[label]== label]
        df = df.reset_index(drop=True)
        return df
    df_split = cleaning(label)

    def get_words(df_split, label):
        #column before the term, only the last word 
        df_split['Before'] = df_split['previous'].str.split() \
            .str[-1] \
            .fillna(label)
        #column after the term, the first three words
        df_split['After'] = df_split['next'].str.split() \
            .str[0:3] \
            .fillna(label) \
            .str.join(' ')
        return df_split
    df_before = get_words(df_split, label)
    
    def tokenize(df_before):
        df_before['tokenized'] = df_before.Before.apply(lambda x: nltk.word_tokenize(x))
        df_before['Pos_tag_before'] = df_before.tokenized.apply(lambda x: nltk.pos_tag(x))

        df_before['tokenized_after'] = df_before.After.apply(lambda x: nltk.word_tokenize(x))
        df_before['Pos_tag_after'] = df_before.tokenized_after.apply(lambda x: nltk.pos_tag(x))

        # keep the first (word, tag) pair of the Before word and store it as "word,TAG"
        df_before.Pos_tag_before = df_before.Pos_tag_before.apply(lambda x: x[0] if isinstance(x, list) else x).str.join(',')

        return df_before
    df_before = tokenize(df_before)
    
    def get_adjective(df_before):
        output_df = pd.DataFrame(df_before.loc[:, 'Pos_tag_before'])
        output_df = output_df.reset_index(drop=True)
        output_df = output_df.rename(columns={'Pos_tag_before': 'Adjective'})
        output_df = output_df.query('Adjective.str.contains("JJ")')
        return output_df

    term_review = get_adjective(df_before)
    return df_before, term_review

In [4]:
df_food, food_review = getAdjectiveBefore('food')

In [5]:
# Occurrences_food_first: number of reviews in which the adjective stands directly before the term 'food'
food_review = (food_review            
 .assign(Adjective = food_review.Adjective.str.split(',').str[0],
         Occurrences_food_first = food_review.groupby('Adjective')['Adjective'].transform('count'))
 .drop_duplicates(keep='first')

)

In [6]:
# Assumption: if an adjective was found directly before the term,
# the same term is not described again by an adjective after it.
# Rows whose Pos_tag_before already holds an adjective are dropped,
# because those adjectives were already counted in the previous cells.

df2_food = df_food[~df_food.Pos_tag_before.str.contains('JJ')].reset_index(drop=True)

In [7]:
#finding adjectives after the term

def getAdjectiveAfter(df):
    output_df = pd.DataFrame(df.Pos_tag_after.explode().str.join(','))
    output_df = output_df.rename(columns={'Pos_tag_after': 'Adjective'})
    output_df = output_df.dropna()
    output_df = output_df.query('Adjective.str.contains("JJ")')
    output_df = output_df.reset_index(drop=True)
    return output_df

In [8]:
food_review2 = getAdjectiveAfter(df2_food)

In [9]:
# merging food_review2 (adjectives after the term) with food_review (adjectives before the term);
# how='right' keeps only the adjectives that were found before the term

food_review3 = (food_review2
 .assign(Adjective = food_review2.Adjective.str.split(',').str[0],
         Occurrences_food_last = food_review2.groupby('Adjective')['Adjective'].transform('count'))
 .merge(food_review, on='Adjective', how='right')
 .drop_duplicates(keep='first')
 .reset_index(drop=True)
 
)

In [10]:
#creating column for the total number of occurrences of the adjective before and after the term 
food_review3['Adjective_occurrence_food'] = food_review3[['Occurrences_food_first', 'Occurrences_food_last']].sum(axis=1)

### Dish

In [11]:
df_dish, dish_review = getAdjectiveBefore('dish')

In [12]:
dish_review = (dish_review
 .assign(Adjective = dish_review.Adjective.str.split(',').str[0],
         Occurrences_dish_first = dish_review.groupby('Adjective')['Adjective'].transform('count')
 )
 .drop_duplicates(keep='first')
 .reset_index(drop=True)

)

In [13]:
df2_dish = df_dish[~df_dish.Pos_tag_before.str.contains('JJ')].reset_index()

In [14]:
dish_review2 = getAdjectiveAfter(df2_dish)

In [15]:
dish_review2 = (dish_review2
 .assign(Adjective = dish_review2.Adjective.str.split(',').str[0],
         Occurrences_dish_last = dish_review2.groupby('Adjective')['Adjective'].transform('count'))
 .merge(dish_review, on='Adjective', how='right')
 .drop_duplicates(keep='first')
 
)

In [16]:
food_review3 = food_review3.merge(dish_review2, on='Adjective', how='left')

In [17]:
food_review3['Adjective_occurrence_food'] = food_review3[['Occurrences_dish_last', 'Occurrences_dish_first', 
                                                          'Adjective_occurrence_food']].sum(axis=1)

### Meal

In [18]:
df_meal, meal_review = getAdjectiveBefore('meal')

In [19]:
meal_review = (meal_review
 .assign(Adjective = meal_review.Adjective.str.split(',').str[0],
         Occurrences_meal_first = meal_review.groupby('Adjective')['Adjective'].transform('count')
 )
 .drop_duplicates(keep='first')
 .reset_index(drop=True)

)

In [20]:
df2_meal = df_meal[~df_meal.Pos_tag_before.str.contains('JJ')].reset_index()

In [21]:
meal_review2 = getAdjectiveAfter(df2_meal)

In [22]:
meal_review2 = (meal_review2
 .assign(Adjective = meal_review2.Adjective.str.split(',').str[0],
         Occurrences_meal_last = meal_review2.groupby('Adjective')['Adjective'].transform('count'))
 .merge(meal_review, on='Adjective', how='right')
 .drop_duplicates(keep='first')
 
)

In [23]:
food_review3 = food_review3.merge(meal_review2, on='Adjective', how='left')

In [24]:
food_review3['Adjective_occurrence_food'] = food_review3[['Occurrences_meal_last', 'Occurrences_meal_first', 
                                                          'Adjective_occurrence_food']].sum(axis=1)

### Portion

In [25]:
df_portion, portion_review = getAdjectiveBefore('portion')

In [26]:
portion_review = (portion_review
 .assign(Adjective = portion_review.Adjective.str.split(',').str[0],
         Occurrences_portion_first = portion_review.groupby('Adjective')['Adjective'].transform('count')
 )
 .drop_duplicates(keep='first')
 .reset_index(drop=True)

)

In [27]:
df2_portion = df_portion[~df_portion.Pos_tag_before.str.contains('JJ')].reset_index(drop=True)

In [28]:
portion_review2 = getAdjectiveAfter(df2_portion)

In [29]:
portion_review2 = (portion_review2
 .assign(Adjective = portion_review2.Adjective.str.split(',').str[0],
         Occurrences_portion_last = portion_review2.groupby('Adjective')['Adjective'].transform('count'))
 .merge(portion_review, on='Adjective', how='right')
 .drop_duplicates(keep='first')
 
)

In [30]:
food_review3 = food_review3.merge(portion_review2, on='Adjective', how='left')

In [31]:
food_review3['Adjective_occurrence_food'] = food_review3[['Occurrences_portion_last', 'Occurrences_portion_first', 
                                                          'Adjective_occurrence_food']].sum(axis=1)

### Menu

In [32]:
df_menu, menu_review = getAdjectiveBefore('menu')

In [33]:
menu_review = (menu_review
 .assign(Adjective = menu_review.Adjective.str.split(',').str[0],
         Occurrences_menu_first = menu_review.groupby('Adjective')['Adjective'].transform('count')
 )
 .drop_duplicates(keep='first')
 .reset_index(drop=True)

)

In [34]:
df2_menu = df_menu[~df_menu.Pos_tag_before.str.contains('JJ')].reset_index(drop=True)

In [35]:
menu_review2 = getAdjectiveAfter(df2_menu)

In [36]:
menu_review2 = (menu_review2
 .assign(Adjective = menu_review2.Adjective.str.split(',').str[0],
         Occurrences_menu_last = menu_review2.groupby('Adjective')['Adjective'].transform('count'))
 .merge(menu_review, on='Adjective', how='right')
 .drop_duplicates(keep='first')
 
)

In [37]:
food_review3 = food_review3.merge(menu_review2, on='Adjective', how='left')

In [38]:
food_review3['Adjective_occurrence_food'] = food_review3[['Occurrences_menu_last', 'Occurrences_menu_first', 
                                                          'Adjective_occurrence_food']].sum(axis=1)

### Offer

In [39]:
df_offer, offer_review = getAdjectiveBefore('offer')

In [40]:
offer_review = (offer_review
 .assign(Adjective = offer_review.Adjective.str.split(',').str[0],
         Occurrences_offer_first = offer_review.groupby('Adjective')['Adjective'].transform('count')
 )
 .drop_duplicates(keep='first')
 .reset_index(drop=True)

)

In [41]:
df2_offer = df_offer[~df_offer.Pos_tag_before.str.contains('JJ')].reset_index(drop=True)

In [42]:
offer_review2 = getAdjectiveAfter(df2_offer)

In [43]:
offer_review2 = (offer_review2
 .assign(Adjective = offer_review2.Adjective.str.split(',').str[0],
         Occurrences_offer_last = offer_review2.groupby('Adjective')['Adjective'].transform('count'))
 .merge(offer_review, on='Adjective', how='right')
 .drop_duplicates(keep='first')
 
)

In [44]:
food_review3 = food_review3.merge(offer_review2, on='Adjective', how='left')

In [45]:
food_review3['Adjective_occurrence_food'] = food_review3[['Occurrences_offer_last', 'Occurrences_offer_first', 
                                                          'Adjective_occurrence_food']].sum(axis=1)

### Presented

In [46]:
df_presented, presented_review = getAdjectiveBefore('presented')

In [47]:
presented_review = (presented_review
 .assign(Adjective = presented_review.Adjective.str.split(',').str[0],
         Occurrences_presented_first = presented_review.groupby('Adjective')['Adjective'].transform('count')
 )
 .drop_duplicates(keep='first')
 .reset_index(drop=True)

)

In [48]:
df2_presented = df_presented[~df_presented.Pos_tag_before.str.contains('JJ')].reset_index(drop=True)

In [49]:
presented_review2 = getAdjectiveAfter(df2_presented)

In [50]:
presented_review2 = (presented_review2
 .assign(Adjective = presented_review2.Adjective.str.split(',').str[0],
         Occurrences_presented_last = presented_review2.groupby('Adjective')['Adjective'].transform('count'))
 .merge(presented_review, on='Adjective', how='right')
 .drop_duplicates(keep='first')
 
)

In [51]:
food_review3 = food_review3.merge(presented_review2, on='Adjective', how='left')

In [52]:
food_review3['Adjective_occurrence_food'] = food_review3[['Occurrences_presented_last', 'Occurrences_presented_first', 
                                                          'Adjective_occurrence_food']].sum(axis=1)

In [53]:
#getting the dataframe with only adjectives and adjective occurrence
food_adjectives = (food_review3
 .loc[:, ['Adjective', 'Adjective_occurrence_food']]
 .astype({'Adjective_occurrence_food':'int'})
 .sort_values(by='Adjective_occurrence_food', ascending=False)
 .query('Adjective_occurrence_food >5')
 .reset_index(drop=True)

)

# Service terms

Next, we apply the same functions to extract adjectives from the terms related to service *(service, welcome, staff, manager, waiter)*.
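Every term goes through the same count, merge and sum cells. As a sketch of how that repetition could be folded into a loop, the hypothetical helper `dimension_counts` takes the adjective words found before and after each term and adds them up. It assumes each word gets a single POS tag and mirrors the notebook's join logic: each term's *before* adjectives define which adjectives are counted for that term (the `how='right'` merge), and later terms only add to the adjectives collected for the first term (the `how='left'` merges).

```python
import pandas as pd

def dimension_counts(term_data):
    """Total adjective counts for one dimension (hypothetical helper).

    term_data maps each term to (before, after): two Series of adjective
    words found directly before / after that term.
    """
    total = None
    for before, after in term_data.values():
        first = before.value_counts()
        last = after.value_counts()
        # before + after counts, only for adjectives seen before the term
        per_term = first.add(last.reindex(first.index), fill_value=0)
        if total is None:
            total = per_term
        else:
            # later terms only add to adjectives already collected
            total = total.add(per_term.reindex(total.index), fill_value=0)
    return total

service_terms = {
    'service': (pd.Series(['good', 'slow', 'good']), pd.Series(['slow', 'friendly'])),
    'staff': (pd.Series(['friendly', 'good']), pd.Series(['rude'])),
}
print(dimension_counts(service_terms))
```

`Series.add` with `fill_value=0` plays the role of the merges followed by `sum(axis=1)`, which also treats missing counts as zero.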

### Service

In [54]:
df_service, service_review = getAdjectiveBefore('service')

In [55]:
service_review = (service_review
 .assign(Adjective = service_review.Adjective.str.split(',').str[0],
         Occurrences_service_first = service_review.groupby('Adjective')['Adjective'].transform('count'))
 .drop_duplicates(keep='first')

)

In [56]:
df2_service = df_service[~df_service.Pos_tag_before.str.contains('JJ')].reset_index()

In [57]:
service_review2 = getAdjectiveAfter(df2_service)

In [58]:
service_review3 = (service_review2
 .assign(Adjective = service_review2.Adjective.str.split(',').str[0],
         Occurrences_service_last = service_review2.groupby('Adjective')['Adjective'].transform('count'))
 .merge(service_review, on='Adjective', how='right')
 .drop_duplicates(keep='first')
 .reset_index(drop=True)
 
)

In [59]:
service_review3['Adjective_occurrence_service'] = service_review3[['Occurrences_service_first', 
                                                                   'Occurrences_service_last']].sum(axis=1)

### Welcome

In [60]:
df_welcome, welcome_review = getAdjectiveBefore('welcome')

In [61]:
welcome_review = (welcome_review
 .assign(Adjective = welcome_review.Adjective.str.split(',').str[0],
         Occurrences_welcome_first = welcome_review.groupby('Adjective')['Adjective'].transform('count')
 )
 .drop_duplicates(keep='first')
 .reset_index(drop=True)

)

In [62]:
df2_welcome = df_welcome[~df_welcome.Pos_tag_before.str.contains('JJ')].reset_index()

In [63]:
welcome_review2 = getAdjectiveAfter(df2_welcome)

In [64]:
welcome_review2 = (welcome_review2
 .assign(Adjective = welcome_review2.Adjective.str.split(',').str[0],
         Occurrences_welcome_last = welcome_review2.groupby('Adjective')['Adjective'].transform('count'))
 .merge(welcome_review, on='Adjective', how='right')
 .drop_duplicates(keep='first')
 
)

In [65]:
service_review3 = service_review3.merge(welcome_review2, on='Adjective', how='left')

In [66]:
service_review3['Adjective_occurrence_service'] = service_review3[['Adjective_occurrence_service', 'Occurrences_welcome_first', 
                                                                   'Occurrences_welcome_last']].sum(axis=1)

### Staff

In [67]:
df_staff, staff_review = getAdjectiveBefore('staff')

In [68]:
staff_review = (staff_review
 .assign(Adjective = staff_review.Adjective.str.split(',').str[0],
         Occurrences_staff_first = staff_review.groupby('Adjective')['Adjective'].transform('count')
 )
 .drop_duplicates(keep='first')
 .reset_index(drop=True)

)

In [69]:
df2_staff = df_staff[~df_staff.Pos_tag_before.str.contains('JJ')].reset_index()

In [70]:
staff_review2 = getAdjectiveAfter(df2_staff)

In [71]:
staff_review2 = (staff_review2
 .assign(Adjective = staff_review2.Adjective.str.split(',').str[0],
         Occurrences_staff_last = staff_review2.groupby('Adjective')['Adjective'].transform('count'))
 .merge(staff_review, on='Adjective', how='right')
 .drop_duplicates(keep='first')
 
)

In [72]:
service_review3 = service_review3.merge(staff_review2, on='Adjective', how='left')

In [73]:
service_review3['Adjective_occurrence_service'] = service_review3[['Adjective_occurrence_service', 'Occurrences_staff_first', 
                                                                   'Occurrences_staff_last']].sum(axis=1)

### Manager

In [74]:
df_manager, manager_review = getAdjectiveBefore('manager')

In [75]:
manager_review = (manager_review
 .assign(Adjective = manager_review.Adjective.str.split(',').str[0],
         Occurrences_manager_first = manager_review.groupby('Adjective')['Adjective'].transform('count')
 )
 .drop_duplicates(keep='first')
 .reset_index(drop=True)

)

In [76]:
df2_manager = df_manager[~df_manager.Pos_tag_before.str.contains('JJ')].reset_index()

In [77]:
manager_review2 = getAdjectiveAfter(df2_manager)

In [78]:
manager_review2 = (manager_review2
 .assign(Adjective = manager_review2.Adjective.str.split(',').str[0],
         Occurrences_manager_last = manager_review2.groupby('Adjective')['Adjective'].transform('count'))
 .merge(manager_review, on='Adjective', how='right')
 .drop_duplicates(keep='first')
 
)

In [79]:
service_review3 = service_review3.merge(manager_review2, on='Adjective', how='left')

In [80]:
service_review3['Adjective_occurrence_service'] = service_review3[['Adjective_occurrence_service', 'Occurrences_manager_first', 
                                                                   'Occurrences_manager_last']].sum(axis=1)

### Waiter

In [81]:
df_waiter, waiter_review = getAdjectiveBefore('waiter')

In [82]:
waiter_review = (waiter_review
 .assign(Adjective = waiter_review.Adjective.str.split(',').str[0],
         Occurrences_waiter_first = waiter_review.groupby('Adjective')['Adjective'].transform('count')
 )
 .drop_duplicates(keep='first')
 .reset_index(drop=True)

)

In [83]:
df2_waiter = df_waiter[~df_waiter.Pos_tag_before.str.contains('JJ')].reset_index()

In [84]:
waiter_review2 = getAdjectiveAfter(df2_waiter)

In [85]:
waiter_review2 = (waiter_review2
 .assign(Adjective = waiter_review2.Adjective.str.split(',').str[0],
         Occurrences_waiter_last = waiter_review2.groupby('Adjective')['Adjective'].transform('count'))
 .merge(waiter_review, on='Adjective', how='right')
 .drop_duplicates(keep='first')
 
)

In [86]:
service_review3 = service_review3.merge(waiter_review2, on='Adjective', how='left')

In [87]:
service_review3['Adjective_occurrence_service'] = service_review3[['Adjective_occurrence_service', 'Occurrences_waiter_first', 
                                                                   'Occurrences_waiter_last']].sum(axis=1)

In [88]:
#getting the dataframe with only adjectives and adjective occurrence
service_adjectives = (service_review3
 .loc[:, ['Adjective', 'Adjective_occurrence_service']]
 .astype({'Adjective_occurrence_service':'int'})
 .sort_values(by='Adjective_occurrence_service', ascending=False)
 .query('Adjective_occurrence_service >5')
 .reset_index(drop=True)

)

# Ambience terms

Next, we apply the same functions to extract adjectives from the terms related to ambience *(ambience, view, music, interior, decor, atmosphere)*.

### Ambience

In [89]:
df_ambience, ambience_review = getAdjectiveBefore('ambience')

In [90]:
ambience_review = (ambience_review
 .assign(Adjective = ambience_review.Adjective.str.split(',').str[0],
         Occurrences_ambience_first = ambience_review.groupby('Adjective')['Adjective'].transform('count'))
 .drop_duplicates(keep='first')

)

In [91]:
df2_ambience = df_ambience[~df_ambience.Pos_tag_before.str.contains('JJ')].reset_index()

In [92]:
ambience_review2 = getAdjectiveAfter(df2_ambience)

In [93]:
ambience_review3 = (ambience_review2
 .assign(Adjective = ambience_review2.Adjective.str.split(',').str[0],
         Occurrences_ambience_last = ambience_review2.groupby('Adjective')['Adjective'].transform('count'))
 .merge(ambience_review, on='Adjective', how='right')
 .drop_duplicates(keep='first')
 .reset_index(drop=True)
 
)

In [94]:
ambience_review3['Adjective_occurrence_ambience'] = ambience_review3[['Occurrences_ambience_first', 
                                                                      'Occurrences_ambience_last']].sum(axis=1)

### Decor

In [95]:
df_decor, decor_review = getAdjectiveBefore('decor')

In [96]:
decor_review = (decor_review
 .assign(Adjective = decor_review.Adjective.str.split(',').str[0],
         Occurrences_decor_first = decor_review.groupby('Adjective')['Adjective'].transform('count')
 )
 .drop_duplicates(keep='first')
 .reset_index(drop=True)

)

In [97]:
df2_decor = df_decor[~df_decor.Pos_tag_before.str.contains('JJ')].reset_index()

In [98]:
decor_review2 = getAdjectiveAfter(df2_decor)

In [99]:
decor_review2 = (decor_review2
 .assign(Adjective = decor_review2.Adjective.str.split(',').str[0],
         Occurrences_decor_last = decor_review2.groupby('Adjective')['Adjective'].transform('count'))
 .merge(decor_review, on='Adjective', how='right')
 .drop_duplicates(keep='first')
 
)

In [100]:
ambience_review3 = ambience_review3.merge(decor_review2, on='Adjective', how='left')

In [101]:
ambience_review3['Adjective_occurrence_ambience'] = ambience_review3[['Adjective_occurrence_ambience', 
                                                            'Occurrences_decor_first' , 'Occurrences_decor_last']].sum(axis=1)

### View

In [102]:
df_view, view_review = getAdjectiveBefore('view')

In [103]:
view_review = (view_review
 .assign(Adjective = view_review.Adjective.str.split(',').str[0],
         Occurrences_view_first = view_review.groupby('Adjective')['Adjective'].transform('count')
 )
 .drop_duplicates(keep='first')
 .reset_index(drop=True)

)

In [104]:
df2_view = df_view[~df_view.Pos_tag_before.str.contains('JJ')].reset_index()

In [105]:
view_review2 = getAdjectiveAfter(df2_view)

In [106]:
view_review2 = (view_review2
 .assign(Adjective = view_review2.Adjective.str.split(',').str[0],
         Occurrences_view_last = view_review2.groupby('Adjective')['Adjective'].transform('count'))
 .merge(view_review, on='Adjective', how='right')
 .drop_duplicates(keep='first')
 
)

In [107]:
ambience_review3 = ambience_review3.merge(view_review2, on='Adjective', how='left')

In [108]:
ambience_review3['Adjective_occurrence_ambience'] = ambience_review3[['Adjective_occurrence_ambience', 'Occurrences_view_first',
                                                                      'Occurrences_view_last']].sum(axis=1)

### Interior

In [109]:
df_interior, interior_review = getAdjectiveBefore('interior')

In [110]:
interior_review = (interior_review
 .assign(Adjective = interior_review.Adjective.str.split(',').str[0],
         Occurrences_interior_first = interior_review.groupby('Adjective')['Adjective'].transform('count')
 )
 .drop_duplicates(keep='first')
 .reset_index(drop=True)

)

In [111]:
df2_interior = df_interior[~df_interior.Pos_tag_before.str.contains('JJ')].reset_index()

In [112]:
interior_review2 = getAdjectiveAfter(df2_interior)

In [113]:
interior_review2 = (interior_review2
 .assign(Adjective = interior_review2.Adjective.str.split(',').str[0],
         Occurrences_interior_last = interior_review2.groupby('Adjective')['Adjective'].transform('count'))
 .merge(interior_review, on='Adjective', how='right')
 .drop_duplicates(keep='first')
 
)

In [114]:
ambience_review3 = ambience_review3.merge(interior_review2, on='Adjective', how='left')

In [115]:
ambience_review3['Adjective_occurrence_ambience'] = ambience_review3[['Adjective_occurrence_ambience', 
                                                        'Occurrences_interior_first', 'Occurrences_interior_last']].sum(axis=1)

### Music

In [116]:
df_music, music_review = getAdjectiveBefore('music')

In [117]:
music_review = (music_review
 .assign(Adjective = music_review.Adjective.str.split(',').str[0],
         Occurrences_music_first = music_review.groupby('Adjective')['Adjective'].transform('count')
 )
 .drop_duplicates(keep='first')
 .reset_index(drop=True)

)

In [118]:
df2_music = df_music[~df_music.Pos_tag_before.str.contains('JJ')].reset_index()

In [119]:
music_review2 = getAdjectiveAfter(df2_music)

In [120]:
music_review2 = (music_review2
 .assign(Adjective = music_review2.Adjective.str.split(',').str[0],
         Occurrences_music_last = music_review2.groupby('Adjective')['Adjective'].transform('count'))
 .merge(music_review, on='Adjective', how='right')
 .drop_duplicates(keep='first')
 
)

In [121]:
ambience_review3 = ambience_review3.merge(music_review2, on='Adjective', how='left')

In [122]:
ambience_review3['Adjective_occurrence_ambience'] = ambience_review3[['Adjective_occurrence_ambience', 
                                                            'Occurrences_music_first', 'Occurrences_music_last']].sum(axis=1)

### Atmosphere

In [123]:
df_atmosphere, atmosphere_review = getAdjectiveBefore('atmosphere')

In [124]:
atmosphere_review = (atmosphere_review
 .assign(Adjective = atmosphere_review.Adjective.str.split(',').str[0],
         Occurrences_atmosphere_first = atmosphere_review.groupby('Adjective')['Adjective'].transform('count')
 )
 .drop_duplicates(keep='first')
 .reset_index(drop=True)

)

In [125]:
df2_atmosphere = df_atmosphere[~df_atmosphere.Pos_tag_before.str.contains('JJ')].reset_index()

In [126]:
atmosphere_review2 = getAdjectiveAfter(df2_atmosphere)

In [127]:
atmosphere_review2 = (atmosphere_review2
 .assign(Adjective = atmosphere_review2.Adjective.str.split(',').str[0],
         Occurrences_atmosphere_last = atmosphere_review2.groupby('Adjective')['Adjective'].transform('count'))
 .merge(atmosphere_review, on='Adjective', how='right')
 .drop_duplicates(keep='first')
 
)

In [128]:
ambience_review3 = ambience_review3.merge(atmosphere_review2, on='Adjective', how='left')

In [129]:
ambience_review3['Adjective_occurrence_ambience'] = ambience_review3[['Adjective_occurrence_ambience', 
                                                'Occurrences_atmosphere_first', 'Occurrences_atmosphere_last']].sum(axis=1)

In [130]:
#getting the dataframe with only adjectives and adjective occurrence
ambience_adjectives = (ambience_review3
 .loc[:, ['Adjective', 'Adjective_occurrence_ambience']]
 .astype({'Adjective_occurrence_ambience':'int'})
 .sort_values(by='Adjective_occurrence_ambience', ascending=False)
 .query('Adjective_occurrence_ambience >5')
 .reset_index(drop=True)

)

# Value terms

Lastly, we use the same functions to extract adjectives from the terms related to value *(price, value)*.

### Price

In [131]:
df_price, price_review = getAdjectiveBefore('price')

In [132]:
price_review = (price_review
 .assign(Adjective = price_review.Adjective.str.split(',').str[0],
         Occurrences_price_first = price_review.groupby('Adjective')['Adjective'].transform('count'))
 .drop_duplicates(keep='first')

)

In [133]:
df2_price = df_price[~df_price.Pos_tag_before.str.contains('JJ')].reset_index()

In [134]:
price_review2 = getAdjectiveAfter(df2_price)

In [135]:
price_review3 = (price_review2
 .assign(Adjective = price_review2.Adjective.str.split(',').str[0],
         Occurrences_price_last = price_review2.groupby('Adjective')['Adjective'].transform('count'))
 .merge(price_review, on='Adjective', how='right')
 .drop_duplicates(keep='first')
 .reset_index(drop=True)
 
)

In [136]:
price_review3['Adjective_occurrence_price'] = price_review3[['Occurrences_price_first', 'Occurrences_price_last']].sum(axis=1)

### Value

In [137]:
df_value, value_review = getAdjectiveBefore('value')

In [138]:
value_review = (value_review
 .assign(Adjective = value_review.Adjective.str.split(',').str[0],
         Occurrences_value_first = value_review.groupby('Adjective')['Adjective'].transform('count')
 )
 .drop_duplicates(keep='first')
 .reset_index(drop=True)

)

In [139]:
df2_value = df_value[~df_value.Pos_tag_before.str.contains('JJ')].reset_index()

In [140]:
value_review2 = getAdjectiveAfter(df2_value)

In [141]:
value_review2 = (value_review2
 .assign(Adjective = value_review2.Adjective.str.split(',').str[0],
         Occurrences_value_last = value_review2.groupby('Adjective')['Adjective'].transform('count'))
 .merge(value_review, on='Adjective', how='right')
 .drop_duplicates(keep='first')
 
)

In [142]:
price_review3 = price_review3.merge(value_review2, on='Adjective', how='left')

In [143]:
price_review3['Adjective_occurrence_price'] = price_review3[['Adjective_occurrence_price', 'Occurrences_value_first', 
                                                             'Occurrences_value_last']].sum(axis=1)

In [144]:
#getting the dataframe with only adjectives and adjective occurrence
price_adjectives = (price_review3
 .loc[:, ['Adjective', 'Adjective_occurrence_price']]
 .astype({'Adjective_occurrence_price':'int'})
 .sort_values(by='Adjective_occurrence_price', ascending=False)
 .query('Adjective_occurrence_price >5')
 .reset_index(drop=True)

)

# Putting all together
Finally, once the adjectives are extracted for all terms, the data is combined into one dataframe called *df_final*, together with the dataframe *ps* from the previous analysis, which contains all adjectives.

The adjectives **expensive, cheap** and **overpriced** are frequently used by guests to describe value, but they usually stand on their own rather than next to a value term. We therefore count them *separately*, as the number of reviews that contain them (a review containing *'not cheap'* is counted as *expensive* rather than *cheap*).

Next, we drop all adjectives that do not appear with any term and sort the values by the *most popular adjectives for the food terms*. Adjectives such as *italian, japanese, portuguese* are dropped because they describe the type of food a restaurant serves rather than its quality.
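A compact sketch of the combining step on made-up tables: `reduce` chains outer merges on *Adjective*, and `dropna(how='all')` over the occurrence columns keeps adjectives that appear with at least one term (equivalent to the notebook's chain of `notna()` checks). The cuisine adjectives are filtered by name here; filtering by name rather than by row position is a choice of this sketch, not what the notebook does.

```python
from functools import reduce
import pandas as pd

# hypothetical tables using the notebook's column names
ps = pd.DataFrame({'Adjective': ['good', 'nice', 'delicious', 'italian']})
food = pd.DataFrame({'Adjective': ['good', 'delicious', 'italian'],
                     'Adjective_occurrence_food': [40, 15, 9]})
service = pd.DataFrame({'Adjective': ['good', 'friendly'],
                        'Adjective_occurrence_service': [25, 30]})

# outer merges keep every adjective that appears in any table
merged = reduce(lambda left, right: pd.merge(left, right, on='Adjective', how='outer'),
                [ps, food, service])

occurrence_cols = ['Adjective_occurrence_food', 'Adjective_occurrence_service']
cuisine = ['italian', 'japanese', 'portuguese']

merged = (merged
          .dropna(subset=occurrence_cols, how='all')        # drop adjectives seen with no term
          .loc[lambda d: ~d['Adjective'].isin(cuisine)]      # drop cuisine adjectives by name
          .sort_values(by='Adjective_occurrence_food', ascending=False)
          .reset_index(drop=True))
merged['Adjectives_sum'] = merged[occurrence_cols].sum(axis=1)
print(merged)
```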

In [145]:
data_frames = [ps, food_adjectives, service_adjectives, ambience_adjectives, price_adjectives]

df_final = reduce(lambda left, right: pd.merge(left, right, on='Adjective', how = 'outer'), data_frames)

In [146]:
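# extracting the adjectives expensive, cheap and overpriced directly from the reviews,
# because they usually stand alone instead of next to a value term.
# Counts are numbers of reviews containing the word (case-sensitive substring match,
# so e.g. 'inexpensive' also counts as 'expensive').
n_expensive = reviews.Review.str.contains('expensive', na=False).sum()
n_overpriced = reviews.Review.str.contains('overpriced', na=False).sum()
n_cheap = reviews.Review.str.contains('cheap', na=False).sum()
n_not_cheap = reviews.Review.str.contains('not cheap', na=False).sum()  # counted as expensive, not cheap

# keep an existing count from the term analysis, otherwise fall back to the review count
price_values = (
    df_final.loc[df_final.Adjective == 'expensive', 'Adjective_occurrence_price'].fillna(n_expensive + n_not_cheap),
    df_final.loc[df_final.Adjective == 'overpriced', 'Adjective_occurrence_price'].fillna(n_overpriced),
    df_final.loc[df_final.Adjective == 'cheap', 'Adjective_occurrence_price'].fillna(n_cheap - n_not_cheap),
)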

In [147]:
# writing the counts obtained in price_values into the dataframe
df_final.loc[df_final.Adjective == 'expensive', 'Adjective_occurrence_price'] = 82.0
df_final.loc[df_final.Adjective == 'overpriced', 'Adjective_occurrence_price'] = 31.0
df_final.loc[df_final.Adjective == 'cheap', 'Adjective_occurrence_price'] = 21.0

In [148]:
occurrence_cols = ['Adjective_occurrence_food', 'Adjective_occurrence_service',
                   'Adjective_occurrence_ambience', 'Adjective_occurrence_price']

df_final = (df_final
 # keep only adjectives that appear with at least one term
 .dropna(subset=occurrence_cols, how='all')
 .sort_values(by='Adjective_occurrence_food', ascending=False)
 .reset_index(drop=True)
 # rows 7, 10, 14 and 15 hold cuisine adjectives (e.g. italian, japanese, portuguese)
 .drop([7, 10, 14, 15])
 .reset_index(drop=True)
)

In [149]:
df_final['Adjectives_sum'] = df_final[['Adjective_occurrence_food', 'Adjective_occurrence_service', 
                                       'Adjective_occurrence_ambience', 'Adjective_occurrence_price']].sum(axis=1)

In [150]:
#df_final.memory_usage(deep=True).sum()
#df_final.to_excel('Adjective.xlsx')