# Importing Data

In [1]:
import pandas as pd

In [2]:
# Importing the scraped datasets
df = pd.read_csv("review_ratings.csv")  # read_csv already returns a DataFrame
df2 = pd.read_csv("disneytrivago.csv")
df3 = pd.read_csv("disneygooglereviews.csv")

In [3]:
df.head(1)

Unnamed: 0,review,rating,date
0,This place is definitely the happiest place on...,5 star rating,12/26/2022


In [4]:
df2.head(1)

Unnamed: 0.1,Unnamed: 0,ratings,date,review
0,0,5,Jan 2023,A fantastic experience. Very busy in the after...


In [5]:
df3.head(1)

Unnamed: 0.1,Unnamed: 0,reviews,ratings,date
0,0,There's an undeniable magic to Disneyland. I m...,4/5,9 months ago on Google


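Current issues with the datasets:

1. The date and rating formats differ: dates appear as '12/26/2022', 'Jan 2023' and '9 months ago on Google'; ratings appear as '5 star rating', '5' and '4/5'.
2. Dataframe 'df' does not have the column 'Unnamed: 0', while 'df2' and 'df3' do.
3. The column names differ: 'review' vs. 'reviews', and 'rating' vs. 'ratings'.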
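
The first issue is not resolved in this notebook. A minimal sketch of one way to bring the three rating and date formats onto a common footing is shown here. The helper names `normalize_rating` and `parse_date` are hypothetical, the rating helper assumes whole-number ratings, and relative dates are left unparsed because the scrape date is not recorded in the data.

```python
import re
from datetime import datetime

def normalize_rating(value):
    """Return the first integer found in a rating, or None if there is none."""
    match = re.search(r"\d+", str(value))
    return int(match.group()) if match else None

def parse_date(text):
    """Parse the two absolute date formats; return None for anything else."""
    if not isinstance(text, str):
        return None  # e.g. missing values
    for fmt in ("%m/%d/%Y", "%b %Y"):
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    # Relative dates such as "9 months ago on Google" need the scrape date,
    # which is not available here, so they are left unparsed.
    return None

print([normalize_rating(r) for r in ["5 star rating", 5, "4/5"]])
print([parse_date(d) for d in ["12/26/2022", "Jan 2023", "9 months ago on Google"]])
```

On a dataframe, the helpers could be applied with something like `df['rating'].map(normalize_rating)`.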

# Data Merging

## Removing Unnecessary Column

In [6]:
df2 = df2.drop(columns=['Unnamed: 0'])
df3 = df3.drop(columns=['Unnamed: 0'])

In [7]:
df2.head(1)

Unnamed: 0,ratings,date,review
0,5,Jan 2023,A fantastic experience. Very busy in the after...


In [8]:
df3.head(1)

Unnamed: 0,reviews,ratings,date
0,There's an undeniable magic to Disneyland. I m...,4/5,9 months ago on Google
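
An 'Unnamed: 0' column usually appears when a CSV was saved together with its index. As an alternative to dropping it afterwards, the column can be consumed as the index at read time. The sketch below uses a small in-memory CSV as a hypothetical stand-in for the scraped files.

```python
import io
import pandas as pd

# Hypothetical CSV whose first column is a saved index with an empty header.
csv_text = ",ratings,date,review\n0,5,Jan 2023,A fantastic experience.\n"

# Default read: the empty header becomes a column named "Unnamed: 0".
raw = pd.read_csv(io.StringIO(csv_text))
print(raw.columns.tolist())

# index_col=0 uses that first column as the index instead of as data.
clean = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(clean.columns.tolist())
```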


## Renaming Columns

In [9]:
df2 = df2.rename(columns={"ratings": "rating"})

In [10]:
df3 = df3.rename(columns={"ratings": "rating", "reviews":"review"})

In [15]:
df.head(1)

Unnamed: 0,review,rating,date
0,This place is definitely the happiest place on...,5 star rating,12/26/2022


In [12]:
df2.head(1)

Unnamed: 0,rating,date,review
0,5,Jan 2023,A fantastic experience. Very busy in the after...


In [13]:
df3.head(1)

Unnamed: 0,review,rating,date
0,There's an undeniable magic to Disneyland. I m...,4/5,9 months ago on Google


The dataframes now share the same column names ('review', 'rating', 'date') and are ready for merging. The date and rating formats still differ between sources.
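
A quick check like the following can confirm that the column sets match before concatenating. It is a sketch on tiny hypothetical frames (`a`, `b`, `c`) rather than the real data. Column order does not need to match, because `pd.concat` aligns columns by name.

```python
import pandas as pd

# Tiny hypothetical frames standing in for df, df2 and df3.
a = pd.DataFrame({"review": ["x"], "rating": ["5 star rating"], "date": ["12/26/2022"]})
b = pd.DataFrame({"rating": [5], "date": ["Jan 2023"], "review": ["y"]})
c = pd.DataFrame({"review": ["z"], "rating": ["4/5"], "date": ["9 months ago on Google"]})

expected = {"review", "rating", "date"}
for name, frame in {"a": a, "b": b, "c": c}.items():
    # Compare as sets so that differing column order is not flagged.
    assert set(frame.columns) == expected, f"{name} has columns {list(frame.columns)}"

combined = pd.concat([a, b, c])
print(combined.columns.tolist())
```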

## Merging DataFrames

In [20]:
print("Number of records in dataframe df:", df.shape[0])
print("Number of records in dataframe df2:", df2.shape[0])
print("Number of records in dataframe df3:", df3.shape[0])
print("Expected number of records after merging the dataframes:", df.shape[0] + df2.shape[0] + df3.shape[0])

Number of records in dataframe df: 6000
Number of records in dataframe df2: 15990
Number of records in dataframe df3: 9810
Expected number of records after merging the dataframes: 31800


In [26]:
merged_df = pd.concat([df, df2, df3])

In [27]:
print("Number of records in the merged dataframe:",merged_df.shape[0])

Number of records in the merged dataframe: 31800


In [28]:
merged_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 31800 entries, 0 to 9809
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   review  31800 non-null  object
 1   rating  31800 non-null  object
 2   date    31398 non-null  object
dtypes: object(3)
memory usage: 993.8+ KB
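
The index range "0 to 9809" shows that each source kept its own row labels, so index values repeat across sources. The non-null counts also imply 31800 - 31398 = 402 missing dates. The sketch below, on small hypothetical frames, shows how `ignore_index=True` would give a fresh index and how missing dates can be counted.

```python
import pandas as pd

# Small hypothetical frames; the second date in `a` is missing.
a = pd.DataFrame({"review": ["x", "y"], "date": ["12/26/2022", None]})
b = pd.DataFrame({"review": ["z"], "date": ["Jan 2023"]})

kept = pd.concat([a, b])                      # original labels are kept
fresh = pd.concat([a, b], ignore_index=True)  # labels are renumbered from 0

print(kept.index.tolist())
print(fresh.index.tolist())
print("Missing dates:", fresh["date"].isna().sum())
```

Because the notebook writes the merged frame with `index=False`, the repeated labels do not reach the saved CSV. They matter only if rows are later selected by label.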


In [31]:
merged_df.to_csv("merged_reviews.csv", index=False)
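
Passing `index=False` keeps the row labels out of the file, which avoids reintroducing an 'Unnamed: 0' column when the CSV is read back. A small round-trip sketch using in-memory buffers and a hypothetical one-row frame illustrates the difference.

```python
import io
import pandas as pd

frame = pd.DataFrame({"review": ["x"], "rating": [5]})

# Default: the index is written as an extra first column with an empty header.
with_index = io.StringIO()
frame.to_csv(with_index)
with_index.seek(0)
cols_with_index = pd.read_csv(with_index).columns.tolist()

# index=False: only the data columns are written.
without_index = io.StringIO()
frame.to_csv(without_index, index=False)
without_index.seek(0)
cols_without_index = pd.read_csv(without_index).columns.tolist()

print(cols_with_index)
print(cols_without_index)
```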