# Airbnb Review Analysis

The objective of this project is to build a detailed understanding of **Airbnb** guests' perspectives and experiences. We want to determine the top 20 keywords for each neighborhood, rating group, and sentiment score by analyzing customer reviews with Python's regular expression (`re`) module. This lets us identify the words and phrases that appear most frequently in guest reviews, along with the most common themes and experiences that guests share. Understanding the factors that have the greatest impact on guest satisfaction can help Airbnb hosts improve the guest experience, which may lead to more bookings, better ratings, and a better reputation. In addition, the top keywords for each neighborhood give insight into the distinctive features of each location, which can help hosts tailor their amenities to the tastes of prospective guests.

**Dataset Explanation:** <br>

`listing_id`: Airbnb’s unique identifier for the listing <br>
`id`: Airbnb’s unique identifier for each review <br>
`date`: The date on which the review was posted <br>
`reviewer_id`: A unique identifier assigned to each reviewer <br>
`reviewer_name`: The name of the individual who submitted the review <br>
`comments`: The text content of the review provided by the reviewer <br>


## Data PreProcessing

**Review data preprocessing involves several steps, including language detection, translation, removal of punctuation marks, removal of stop words, word tokenization, and sentiment score extraction.**

#### Importing Required Libraries:

In [1]:
import numpy as np
import pandas as pd 
import matplotlib.pyplot as plt
import seaborn as sns
from wordcloud import WordCloud
import nltk
import re
from nltk.corpus import stopwords
import pickle
import os

#### Reading the Compressed CSV File:

In [2]:
#Change the path before reading the CSV
Review_df = pd.read_csv(r"C:\Users\AIRBNB_project\los.angeles.2021-07-05.reviews.csv.gz", 
                        compression='gzip', header=0, sep=',', quotechar='"')

In [3]:
Review_df

Unnamed: 0,listing_id,id,date,reviewer_id,reviewer_name,comments
0,109,449036,2011-08-15,927861,Edwin,The host canceled my reservation the day befor...
1,109,74506539,2016-05-15,22509885,Jenn,Me and two friends stayed for four and a half ...
2,2708,13994902,2014-06-09,10905424,Kuberan,i had a wonderful stay. Everything from start ...
3,2708,14606598,2014-06-23,2247288,Camilla,Charles is just amazing and he made my stay sp...
4,2708,39597339,2015-07-25,27974696,Fallon,Staying with Chas was an absolute pleasure. He...
...,...,...,...,...,...,...
1076621,50749551,399970210760281864,2021-07-05,378721073,Ning,apartment was super nice and cost friendly- we...
1076622,50753420,397801924587553864,2021-07-02,33443576,Steve,Edwin‘s place was quite lovely and my stay was...
1076623,50753420,399278196206436395,2021-07-04,50698003,Johnny,It was a great stay!
1076624,50755273,399234464053633752,2021-07-04,150441511,Yuan,The best house I have ever lived!The room is v...


In [7]:
Review_df.head()

Unnamed: 0,listing_id,id,date,reviewer_id,reviewer_name,comments
0,109,449036,2011-08-15,927861,Edwin,The host canceled my reservation the day befor...
1,109,74506539,2016-05-15,22509885,Jenn,Me and two friends stayed for four and a half ...
2,2708,13994902,2014-06-09,10905424,Kuberan,i had a wonderful stay. Everything from start ...
3,2708,14606598,2014-06-23,2247288,Camilla,Charles is just amazing and he made my stay sp...
4,2708,39597339,2015-07-25,27974696,Fallon,Staying with Chas was an absolute pleasure. He...


In [8]:
Review_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1076626 entries, 0 to 1076625
Data columns (total 6 columns):
 #   Column         Non-Null Count    Dtype 
---  ------         --------------    ----- 
 0   listing_id     1076626 non-null  int64 
 1   id             1076626 non-null  int64 
 2   date           1076626 non-null  object
 3   reviewer_id    1076626 non-null  int64 
 4   reviewer_name  1076624 non-null  object
 5   comments       1075625 non-null  object
dtypes: int64(3), object(3)
memory usage: 49.3+ MB


### Language Detection

The first step is to identify the languages used in the review dataset, so that we can pinpoint the reviews that need to be translated into English. We use the `langdetect` language identification package for this.

In [None]:
#Defining the language detection function
from langdetect import detect

def detect_my(text):
    #detection can raise on empty or non-text input (e.g. missing comments); label those 'unknown'
    try:
        return detect(text)
    except Exception:
        return 'unknown'

#Assigning the detected language to a new column "language"
Review_df['language'] = Review_df['comments'].apply(detect_my)
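
Very short reviews give a language detector little evidence to work with; later output in this notebook shows `Puppies` and `BEST airbnb IN LA` labelled `lv`. One possible guard, sketched below, skips detection for short texts. The helper name `detect_if_long_enough`, the `min_letters` cutoff of 20, and the `'too_short'` label are illustrative assumptions rather than part of the original analysis, and `detector` stands in for `langdetect.detect`.

```python
def detect_if_long_enough(text, detector, min_letters=20):
    """Return detector(text), or a placeholder label when the text is too short.

    `detector` is any callable mapping str -> language code (e.g. langdetect.detect).
    `min_letters` is an arbitrary illustrative cutoff, not a tuned value.
    """
    if not isinstance(text, str):
        return 'unknown'                      # NaN / missing comments
    letters = sum(ch.isalpha() for ch in text)
    if letters < min_letters:
        return 'too_short'                    # not enough evidence to trust a label
    try:
        return detector(text)
    except Exception:
        return 'unknown'


# Stand-in detector so the sketch runs without langdetect installed
def fake_detector(text):
    return 'en'


short_label = detect_if_long_enough('Puppies', fake_detector)
long_label = detect_if_long_enough('The host was friendly and the room was spotless.', fake_detector)
missing_label = detect_if_long_enough(float('nan'), fake_detector)
print(short_label, long_label, missing_label)
```

Reviews labelled `'too_short'` could then be handled separately, for example by treating them as English, which is a judgment call this notebook does not make.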

In [None]:
#Saving the dataframe before proceeding:
#Python sometimes crashes, so it is safer to save before moving on to the next analysis
path= r"C:\Users\AIRBNB_project" #Change the path
import os
Review_df.to_csv(os.path.join(path,r'2021-07-05.review_language.csv'))

In [12]:
#Renaming comments to review (rename with inplace=True returns None, so its result is not assigned)
Review_df.rename(columns={'comments': 'review'}, inplace=True)
#Dropping the rows that have a null value in the review column
df = Review_df.dropna(subset=['review'])

In [13]:
#Splitting the review date into three separate columns
Review_df[['Year', 'Month', 'Day']] = Review_df['date'].str.split('-', expand=True)
Review_df.head()

  


Unnamed: 0,listing_id,id,date,reviewer_id,reviewer_name,review,Year,Month,Day
0,109,449036,2011-08-15,927861,Edwin,The host canceled my reservation the day befor...,2011,8,15
1,109,74506539,2016-05-15,22509885,Jenn,Me and two friends stayed for four and a half ...,2016,5,15
2,2708,13994902,2014-06-09,10905424,Kuberan,i had a wonderful stay. Everything from start ...,2014,6,9
3,2708,14606598,2014-06-23,2247288,Camilla,Charles is just amazing and he made my stay sp...,2014,6,23
4,2708,39597339,2015-07-25,27974696,Fallon,Staying with Chas was an absolute pleasure. He...,2015,7,25


In [14]:
#Counting the number of reviews for each year
count_year_type = Review_df['Year'].value_counts()
pd.set_option('display.max_rows', None)

#rename_axis/reset_index(name=...) yields the same column names across pandas versions
count_year_type_df = count_year_type.rename_axis('year').reset_index(name='frequency')
count_year_type_df

Unnamed: 0,year,frequency
0,2019,285476
1,2018,201756
2,2020,182977
3,2017,130387
4,2021,120460
5,2016,82383
6,2015,41928
7,2014,18880
8,2013,7654
9,2012,3160


### Translating Reviews

In [15]:
#Creating a new dataframe that contains the reviews from 2021
df_21= Review_df[Review_df['Year'] == '2021']
df_21.head()

Unnamed: 0,listing_id,id,date,reviewer_id,reviewer_name,review,Year,Month,Day
30,2708,733454164,2021-02-21,349357923,Gwendolyn,Very nice place to stay...Charles was great,2021,2,21
31,2708,388464514550605494,2021-06-19,130482838,Adam,I had a good experience and Charles works very...,2021,6,19
616,5729,730418199,2021-02-11,54572610,Andrei,"This place is the best place in the world,you ...",2021,2,11
617,5729,386256999451025116,2021-06-16,38367844,Matt,Wonderful experience in the Zen room. The prop...,2021,6,16
769,6931,724463576,2021-01-15,67036093,Alona,.,2021,1,15


In [17]:
#Counting the number of reviews for each language
count_language_type = df_21['language'].value_counts()
pd.set_option('display.max_rows', None)

#rename_axis/reset_index(name=...) names the count column 'frequency' regardless of pandas version
count_language_type_df = count_language_type.rename_axis('language_type').reset_index(name='frequency')
count_language_type_df

Unnamed: 0,language_type,frequency
0,en,263973
1,ro,4224
2,fr,4125
3,es,2859
4,zh-cn,2738
5,af,1554
6,so,543
7,ja,472
8,ca,312
9,ru,311


In [None]:
#Translating reviews to English using Google Translate
import translators as ts

#langdetect codes to translate. The original cells mixed `df1` (never defined in this
#notebook) and `df_21`; `df_21` is assumed here for every language.
lang_codes = ['bg', 'mk', 'th', 'fa', 'lv', 'ar', 'lt', 'sq', 'zh-tw', 'sk', 'tr', 'vi', 'ja',
              'fi', 'id', 'sv', 'hu', 'et', 'pt', 'da', 'it', 'sl', 'cy', 'hr', 'no', 'ru',
              'sw', 'de', 'nl', 'cs', 'pl', 'ca', 'zh-cn', 'tl', 'so', 'af', 'fr', 'ro', 'es']
#the translator is called with region-cased codes for Chinese, as in the original cells
google_codes = {'zh-tw': 'zh-TW', 'zh-cn': 'zh-CN'}

translated_frames = []
for code in lang_codes:
    df_lang = df_21[df_21['language'] == code].copy()  #copy avoids SettingWithCopyWarning
    src = google_codes.get(code, code)
    #the text column was renamed from 'comments' to 'review' earlier in the notebook;
    #newer releases of `translators` may expose this call as ts.translate_text instead of ts.google
    df_lang['trans_text'] = df_lang['review'].apply(
        lambda x, src=src: ts.google(x, from_language=src, to_language='en'))
    translated_frames.append(df_lang)

In [None]:
#Each language appears exactly once (the original list repeated df_nl and df_so)
df_final = pd.concat(translated_frames)
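
Each translation is a network call, so when the same text appears more than once it may be worth translating it only once. The following is a minimal sketch of that idea, not the approach used in this notebook: `translate_unique` is a hypothetical helper, and `fake_translate` is a stand-in for the real translator so the example runs offline.

```python
def translate_unique(texts, translate):
    """Translate each distinct text once and return results in the input order.

    `translate` is any callable mapping str -> str (for example a wrapper
    around the translation call used in this notebook).
    """
    cache = {}
    results = []
    for text in texts:
        if text not in cache:
            cache[text] = translate(text)   # only reached for unseen texts
        results.append(cache[text])
    return results


# Stand-in translator that records how often it is called
calls = []

def fake_translate(text):
    calls.append(text)
    return text.upper()


result = translate_unique(['muy bien', 'gracias', 'muy bien'], fake_translate)
print(result)
print(len(calls))
```

In the sketch the translator is invoked twice for three inputs, because the repeated phrase is served from the cache.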

In [None]:
#Saving the dataframe before proceeding:
#Python sometimes crashes, so it is safer to save before moving on to the next analysis
path= r"C:\Users\AIRBNB_project" #Change the path
import os
df_final.to_csv(os.path.join(path,r'Translated_Review.csv'))

In the dataset, we found that **6.18%** of the reviews were written in a language other than English. We translated these reviews into English with Google Translate, accessed through the `translators` package, so that every review can be analyzed in the same language.
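
As a rough illustration of how such a share can be computed from the detected language labels, the sketch below counts labels and reports the percentage that is not English. The function name `non_english_share` and the toy labels are assumptions made for the example; it does not reproduce the 6.18% figure.

```python
from collections import Counter


def non_english_share(languages, english_code='en'):
    """Return the percentage of labels that differ from `english_code`."""
    counts = Counter(languages)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 100.0 * (total - counts[english_code]) / total


# Made-up labels, not the real dataset: 15 English, 3 French, 2 Spanish
toy_labels = ['en'] * 15 + ['fr'] * 3 + ['es'] * 2
share = non_english_share(toy_labels)
print(f"{share:.2f}%")  # 25.00%
```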

In [None]:
#loading the dataset
listing_df21 = pd.read_csv(r"C:\Users\AIRBNB_project\los.angeles.2021-07-05.listings.csv.gz", compression='gzip', 
                         header=0, sep=',', quotechar='"')

#Renaming the id column so it can be merged with the review data
listing_df21.rename(columns={'id': 'listing_id'}, inplace=True)

In [None]:
#merging the listing data with Review data
merged = pd.merge(Review_df, listing_df21, how="left", on=["listing_id"])

In [None]:
# Getting the columns we need for review analysis
train= merged[["listing_id", "id", "date", "reviewer_id", "reviewer_name","review_scores_rating", "review", 
               "language", "trans_text",]].copy()
train.info()

### Text Cleaning

After translating the reviews to English, we removed all punctuation, converted the text to lowercase, and stripped extra whitespace using Python's `re` module. This standardizes the data, which makes it easier to process and analyze. Each review is then tokenized into words with the **NLTK** package, and stop words such as "a," "the," "of," "I," "you," "is," and "it" are removed using the **NLTK** English stop-word list. The remaining tokens are joined back together and stored in a new `corpus` column. Finally, sentiment scores are extracted with the `SentimentIntensityAnalyzer` from **NLTK**'s **VADER** module, which returns four features: positive, negative, neutral, and compound scores.

In [None]:
corpus = np.array(train['trans_text']) #convert from Dataframe to array
corpus = corpus.ravel()#used to convert from multi-dimensional array to 1 dimension

In [None]:
#Preprocessing Text Corpus
wpt=nltk.WordPunctTokenizer()

stop_words = set(stopwords.words('english')) 

In [None]:
def normalize_docuement (doc):
    #remove everything except ASCII letters and whitespace
    #(flags must be passed by keyword: the fourth positional argument of re.sub is count)
    doc=re.sub(r'[^a-zA-Z\s]', '', str(doc), flags=re.I|re.A)
    #lowercase and strip surrounding whitespace
    doc=doc.lower()
    doc=doc.strip()
    #tokenize document
    tokens=wpt.tokenize(doc)
    #filter stopwords out of document
    filtered_tokens=[token for token in tokens if token not in stop_words]
    #re-create document from filtered tokens
    doc=' '.join(filtered_tokens)
    return doc
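
One detail of `re.sub` is easy to miss: its fourth positional parameter is `count`, not `flags`. The short sketch below shows the difference on a made-up string. The value `re.I | re.A` equals 258, so using it as `count` stops after 258 replacements, whereas passing it as `flags=` replaces every match.

```python
import re

pattern = r'[^a-zA-Z\s]'
text = 'great stay' + '!' * 300          # made-up review with 300 punctuation marks

flag_value = int(re.I | re.A)            # 2 + 256 = 258
# What a flag value means when it lands in the count position
as_count = re.sub(pattern, '', text, count=flag_value)
# Flags passed by keyword: every non-letter, non-space character is removed
as_flags = re.sub(pattern, '', text, flags=re.I | re.A)

print(flag_value)        # 258
print(len(as_count))     # 52: the 10 original characters plus 42 leftover '!'
print(as_flags)          # great stay
```

For typical reviews the cap of 258 is rarely reached, so the difference only shows up on long or punctuation-heavy texts.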

In [None]:
normalize_corpus=np.vectorize(normalize_docuement)
norm_corpus=normalize_corpus(corpus)

In [None]:
train['corpus'] = norm_corpus.tolist()

In [41]:
train.head()

Unnamed: 0,listing_id,id,date,reviewer_id,reviewer_name,review_scores_rating,review,Year,Month,Day,language,trans_text,corpus
0,19102307,503325055,2019-08-05,195062131,Abdulhameed,5.0,رااائع,2019,8,5,ur,RAIA,raia
1,7478647,469007615,2019-06-13,196342565,Helene,4.85,Nous n'avons pas vu Melinda,2019,6,13,lv,Nous n'Avons pas vu melinda,nous navons pas vu melinda
2,16458178,514196068,2019-08-20,70802035,Peter,4.97,Puppies,2019,8,20,lv,Puppies,puppies
3,21234077,580530563,2019-12-22,310702292,Eddie,4.96,BEST airbnb IN LA,2019,12,22,lv,Best Airbnb in la,best airbnb la
4,24086996,566220677,2019-11-18,308207926,Skaidra,4.83,"Ir viss nepieciešamais, atsaucīga saimniece",2019,11,18,lv,"Is all necessary, responsive hostess",necessary responsive hostess


### Sentiment Score Extraction

In [None]:
from nltk.sentiment.vader import SentimentIntensityAnalyzer
#requires the VADER lexicon: nltk.download('vader_lexicon')

analyzer = SentimentIntensityAnalyzer()

#Score each review once and expand the resulting dict into four columns
scores = pd.DataFrame(train['corpus'].apply(analyzer.polarity_scores).tolist(), index=train.index)
train[['neg', 'neu', 'pos', 'compound']] = scores[['neg', 'neu', 'pos', 'compound']]
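
To group reviews by sentiment, the compound score is often bucketed into labels. The sketch below uses the cutoffs of ±0.05 commonly cited for VADER; whether this project uses those cutoffs is an assumption, and `label_sentiment` is a hypothetical helper name.

```python
def label_sentiment(compound, threshold=0.05):
    """Map a VADER compound score in [-1, 1] to a coarse label.

    The 0.05 threshold is the conventional choice, assumed here for illustration.
    """
    if compound >= threshold:
        return 'positive'
    if compound <= -threshold:
        return 'negative'
    return 'neutral'


# Compound scores taken from the sample output (0.6369, 0.0) plus one made-up negative value
labels = [label_sentiment(score) for score in (0.6369, 0.0, -0.4)]
print(labels)  # ['positive', 'neutral', 'negative']
```

Applied to the dataframe, this could look like `train['compound'].apply(label_sentiment)`.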

In [29]:
train.head()

Unnamed: 0,listing_id,id,date,reviewer_id,reviewer_name,review_scores_rating,review,Year,Month,Day,language,trans_text,corpus,neg,neu,pos,compound
0,19102307,503325055,2019-08-05,195062131,Abdulhameed,5.0,رااائع,2019,8,5,ur,RAIA,raia,0.0,1.0,0.0,0.0
1,7478647,469007615,2019-06-13,196342565,Helene,4.85,Nous n'avons pas vu Melinda,2019,6,13,lv,Nous n'Avons pas vu melinda,nous navons pas vu melinda,0.0,1.0,0.0,0.0
2,16458178,514196068,2019-08-20,70802035,Peter,4.97,Puppies,2019,8,20,lv,Puppies,puppies,0.0,1.0,0.0,0.0
3,21234077,580530563,2019-12-22,310702292,Eddie,4.96,BEST airbnb IN LA,2019,12,22,lv,Best Airbnb in la,best airbnb la,0.0,0.323,0.677,0.6369
4,24086996,566220677,2019-11-18,308207926,Skaidra,4.83,"Ir viss nepieciešamais, atsaucīga saimniece",2019,11,18,lv,"Is all necessary, responsive hostess",necessary responsive hostess,0.0,0.444,0.556,0.3612


### Function to Count the Number of Words

In [None]:
#Function to help us extract count of regex patterns from review text
def count_regex(pattern, row):
    return len(re.findall(pattern, row))

In [None]:
#Counting words of two or more letters in each translated review
count_words = train['trans_text'].apply(lambda x: count_regex(r'\b[a-zA-Z]{2,}\b', x))
train['Word_Count'] = count_words
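
The pattern `\b[a-zA-Z]{2,}\b` matches runs of two or more ASCII letters, so one-letter words such as "I" or "a" are not counted. The self-contained sketch below repeats the `count_regex` helper and applies it to two translated reviews from the sample output; the constant name `WORD_PATTERN` is introduced only for the example.

```python
import re


def count_regex(pattern, text):
    """Return the number of non-overlapping matches of `pattern` in `text`."""
    return len(re.findall(pattern, text))


WORD_PATTERN = r'\b[a-zA-Z]{2,}\b'

# "n" in "n'Avons" is a single letter, so it is skipped
french = count_regex(WORD_PATTERN, "Nous n'Avons pas vu melinda")
english = count_regex(WORD_PATTERN, "Best Airbnb in la")
short_words = count_regex(WORD_PATTERN, "I had a wonderful stay")
print(french, english, short_words)  # 5 4 3
```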

In [36]:
train.head()

Unnamed: 0,listing_id,id,date,reviewer_id,reviewer_name,review_scores_rating,review,Year,Month,Day,language,trans_text,corpus,neg,neu,pos,compound,Word_Count
0,19102307,503325055,2019-08-05,195062131,Abdulhameed,5.0,رااائع,2019,8,5,ur,RAIA,raia,0.0,1.0,0.0,0.0,1
1,7478647,469007615,2019-06-13,196342565,Helene,4.85,Nous n'avons pas vu Melinda,2019,6,13,lv,Nous n'Avons pas vu melinda,nous navons pas vu melinda,0.0,1.0,0.0,0.0,5
2,16458178,514196068,2019-08-20,70802035,Peter,4.97,Puppies,2019,8,20,lv,Puppies,puppies,0.0,1.0,0.0,0.0,1
3,21234077,580530563,2019-12-22,310702292,Eddie,4.96,BEST airbnb IN LA,2019,12,22,lv,Best Airbnb in la,best airbnb la,0.0,0.323,0.677,0.6369,4
4,24086996,566220677,2019-11-18,308207926,Skaidra,4.83,"Ir viss nepieciešamais, atsaucīga saimniece",2019,11,18,lv,"Is all necessary, responsive hostess",necessary responsive hostess,0.0,0.444,0.556,0.3612,5


### Function to Count Noun Phrases in Review Text

In [None]:
#! pip install -U git+https://github.com/sloria/TextBlob.git@dev
from textblob import TextBlob

In [None]:
def identify_noun_count(sentence):
    blob = TextBlob(sentence)
    return len(blob.noun_phrases)

In [None]:
#Extracting the number of noun phrases from our review text
train['noun_count'] = train['trans_text'].apply(identify_noun_count)

In [39]:
train.head()

Unnamed: 0,listing_id,id,date,reviewer_id,reviewer_name,review_scores_rating,review,Year,Month,Day,language,trans_text,corpus,neg,neu,pos,compound,Word_Count,noun_count
0,19102307,503325055,2019-08-05,195062131,Abdulhameed,5.0,رااائع,2019,8,5,ur,RAIA,raia,0.0,1.0,0.0,0.0,1,1
1,7478647,469007615,2019-06-13,196342565,Helene,4.85,Nous n'avons pas vu Melinda,2019,6,13,lv,Nous n'Avons pas vu melinda,nous navons pas vu melinda,0.0,1.0,0.0,0.0,5,2
2,16458178,514196068,2019-08-20,70802035,Peter,4.97,Puppies,2019,8,20,lv,Puppies,puppies,0.0,1.0,0.0,0.0,1,1
3,21234077,580530563,2019-12-22,310702292,Eddie,4.96,BEST airbnb IN LA,2019,12,22,lv,Best Airbnb in la,best airbnb la,0.0,0.323,0.677,0.6369,4,1
4,24086996,566220677,2019-11-18,308207926,Skaidra,4.83,"Ir viss nepieciešamais, atsaucīga saimniece",2019,11,18,lv,"Is all necessary, responsive hostess",necessary responsive hostess,0.0,0.444,0.556,0.3612,5,1


In [None]:
#Saving the dataframe before proceeding:
#Python sometimes crashes, so it is safer to save before moving on to the next analysis
path= r"C:\Users\13152\Desktop\AIRBNB_project" #Change the path
import os
train.to_csv(os.path.join(path,r'Final_Review_df.csv'))
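
The cleaned `corpus` column is the natural input for the top-20 keyword counts described in the introduction. That step falls outside this section, so the following is only a minimal sketch of how it might look. The helper `top_keywords_by_group` and the toy rows are hypothetical; in the project the groups would be neighborhoods, rating groups, or sentiment labels.

```python
from collections import Counter, defaultdict


def top_keywords_by_group(rows, n=20):
    """Count words per group and return the n most frequent for each.

    `rows` is an iterable of (group, cleaned_text) pairs, where cleaned_text
    is a space-separated string such as the values in the `corpus` column.
    """
    counters = defaultdict(Counter)
    for group, text in rows:
        counters[group].update(text.split())
    return {group: counter.most_common(n) for group, counter in counters.items()}


# Made-up cleaned reviews, not taken from the dataset
toy_rows = [
    ('positive', 'great location clean room'),
    ('positive', 'great host great view'),
    ('negative', 'dirty room noisy street'),
]
top = top_keywords_by_group(toy_rows, n=2)
print(top['positive'][0])  # ('great', 3)
```

With the real data, the pairs could be built with something like `zip(train['some_group_column'], train['corpus'])`, where the group column name is a placeholder.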