# 01-💌NLP Sentiment Analysis Dating Apps Reviews | 2017-2023 💌

# Introduction

The main goal of this project is to analyze dating-app reviews from the Google Play Store in two countries (India and the US), both large markets with a broad base of dating-app users. By merging these datasets, I will run an NLP sentiment analysis to examine the different behaviors and overall feelings of users across dating platforms.

The analysis focuses on answering three main questions:

- What are the common behaviors and feelings people have when using different dating apps (Tinder, Bumble, Hinge)?
- Which dating app is best suited for dating, for relationships, or offers the best features?
- What is the overall sentiment of users about using dating apps in general?

A summary of the insights from the sentiment analysis appears at the end of this notebook. The datasets I used for this analysis are:


- Dating Apps Reviews India 2017-2022 (dating_review): https://www.kaggle.com/datasets/sidharthkriplani/datingappreviews/data
- Dating Apps Reviews US 2023 (dating_review2): https://figshare.com/articles/dataset/Text_of_user_reviews_of_dating_apps/21895827

01- 💌NLP Sentiment Analysis Dating Apps Reviews | 2017-2023 💌 -> Merging, cleaning, and EDA of all datasets

02- 💌NLP Sentiment Analysis Dating Apps Reviews | 2017-2023 💌 -> ML and sentiment analysis

# 📦Loading Packages

In this first part of the notebook we load the libraries needed for cleaning the data, performing exploratory data analysis (EDA), and merging the datasets. The second part goes deeper into data analysis and sentiment analysis in order to make predictions. For this part we only use the basic libraries:

In [221]:
import pandas as pd
import numpy as np
import string
from faker import Faker
import pytz

# Why do we use Faker?

In this first part we generate fake data such as names, times, and dates. This completes the second dataset, which lacks these fields, and makes the merging process simpler and more efficient, so that it can be tested without using real personal data.
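
To make the idea concrete, here is a minimal sketch of generating synthetic names and 2023 timestamps using only the Python standard library. It is not what Faker does internally; the helper `random_datetime_in_year` and the short name lists are hypothetical placeholders chosen for illustration.

```python
import random
from datetime import datetime, timedelta


def random_datetime_in_year(year: int, rng: random.Random) -> datetime:
    """Return a random timestamp (minute resolution) inside the given year."""
    start = datetime(year, 1, 1)
    end = datetime(year + 1, 1, 1)
    total_minutes = int((end - start).total_seconds() // 60)
    return start + timedelta(minutes=rng.randrange(total_minutes))


rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Hypothetical placeholder names; Faker would produce more realistic ones.
first_names = ["Alex", "Sam", "Robin"]
last_names = ["Smith", "Jones", "Lee"]

synthetic_rows = [
    {
        "Name": f"{rng.choice(first_names)} {rng.choice(last_names)}",
        # Same day-first text format as the Date&Time column of the first dataset
        "Date&Time": random_datetime_in_year(2023, rng).strftime("%d-%m-%Y %H:%M"),
    }
    for _ in range(3)
]
print(synthetic_rows)
```

Because these values are random, they carry no real information about when a review was written or by whom; they only give the second dataset the same shape as the first.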

# 🔋Loading the Datasets

In this section we load the datasets and record the steps needed to arrive at a complete, cleaned dataset. The process runs as follows:

Step 1: Load the first dataset

- We start by loading the first dataset. It is analyzed (EDA) and cleaned to gain insight into the data and resolve any problems. The first dataset contains user reviews of several dating apps from India for the period 2017-2022.

Step 2: Load the second dataset

- Next we create a second dataset by combining three separate datasets. This combined dataset is cleaned and analyzed (EDA).

- We then add synthetic data to the second dataset and run another round of cleaning and EDA to ensure consistency. The (new) second dataset should contain user reviews of several dating apps from the US for 2023.


Step 3: Merging the datasets

- Once both datasets (the first and the new second dataset) are prepared, we combine them into one final dataset.

- Finally, the combined dataset is cleaned and analyzed once more (final EDA), so that it is ready for further processing such as modeling or prediction.
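
As a rough illustration of Step 3, the sketch below stacks two tiny stand-in frames that share the same columns. The `Country` column is a hypothetical addition, not something the notebook defines, used here only to keep track of where each row came from.

```python
import pandas as pd

# Toy stand-ins for the two prepared datasets (same columns in both)
india = pd.DataFrame({
    "Name": ["A"], "Review": ["Love it!"], "Rating": [5],
    "#ThumbsUp": [0], "Date": pd.to_datetime(["2022-02-18"]), "App": ["Tinder"],
})
us = pd.DataFrame({
    "Name": ["B"], "Review": ["Keeps crashing."], "Rating": [1],
    "#ThumbsUp": [0], "Date": pd.to_datetime(["2023-07-25"]), "App": ["Tinder"],
})

# 'Country' is a hypothetical label marking the origin of each row
combined = pd.concat(
    [india.assign(Country="India"), us.assign(Country="US")],
    ignore_index=True,  # build one continuous index for the stacked rows
)
print(combined[["Review", "Rating", "Date", "Country"]])
```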


# 🪜Step 1: Loading the first dataset

We load the first dataset and then start analyzing (EDA) and cleaning it to gain insights from the data.

In [222]:
# Load the first dataset, 'dating app reviews India 2017-2022'
dating_review= pd.read_csv("C:\\Users\\kaile\\OneDrive\\Documents\\S4\\notebooks\\DatingAppReviewsDataset.csv\\DatingAppReviewsDataset.csv",index_col=0)
dating_review

Unnamed: 0,Name,Review,Rating,#ThumbsUp,Date&Time,App
0,linah sibanda,On this app i cant find a partner,5,0,18-02-2022 01:19,Tinder
1,Norman Johnson,Tinder would be so much better if we could spe...,3,0,18-02-2022 01:16,Tinder
2,David Hume,Still doesn't correctly notify matches or mess...,1,0,18-02-2022 01:11,Tinder
3,Last 1 Standing,"Got banned because I updated my bio to say ""I ...",2,0,18-02-2022 01:11,Tinder
4,Arthur Magamedov,Love it!,5,0,18-02-2022 01:06,Tinder
...,...,...,...,...,...,...
52989,A Google user,Useless - I'm in the UK and it tells me i'm ov...,2,5,12-07-2017 01:44,Hinge
52990,Brian Shook,I can't get past the initial set up. It won't...,1,11,12-07-2017 01:36,Hinge
52991,A Google user,This is incredible! A quality dating app for A...,5,1,12-07-2017 01:32,Hinge
52992,A Google user,"""Over Water"" ... Can't choose location.",2,8,12-07-2017 01:28,Hinge


# EDA (Exploratory Data Analysis)

EDA helps us understand this dataset better by giving insight into its structure, data quality, and possible trends. We analyze columns such as Rating, #ThumbsUp, and Date&Time to uncover user patterns and satisfaction. We also check for missing values and prepare the data for further analysis.

In [223]:
# Show the first 5 rows of the dataset (head() defaults to 5)
dating_review.head()

Unnamed: 0,Name,Review,Rating,#ThumbsUp,Date&Time,App
0,linah sibanda,On this app i cant find a partner,5,0,18-02-2022 01:19,Tinder
1,Norman Johnson,Tinder would be so much better if we could spe...,3,0,18-02-2022 01:16,Tinder
2,David Hume,Still doesn't correctly notify matches or mess...,1,0,18-02-2022 01:11,Tinder
3,Last 1 Standing,"Got banned because I updated my bio to say ""I ...",2,0,18-02-2022 01:11,Tinder
4,Arthur Magamedov,Love it!,5,0,18-02-2022 01:06,Tinder


In [224]:
# Show the dtype information and the shape of the dataset
dating_review.info()
print("shape of the dataset -->>",np.shape(dating_review))

<class 'pandas.core.frame.DataFrame'>
Index: 681994 entries, 0 to 52993
Data columns (total 6 columns):
 #   Column     Non-Null Count   Dtype 
---  ------     --------------   ----- 
 0   Name       681987 non-null  object
 1   Review     680609 non-null  object
 2   Rating     681994 non-null  int64 
 3   #ThumbsUp  681994 non-null  int64 
 4   Date&Time  681994 non-null  object
 5   App        681994 non-null  object
dtypes: int64(2), object(4)
memory usage: 36.4+ MB
shape of the dataset -->> (681994, 6)


Notes on the dataset information:

- Not all dtypes are suitable yet (Date&Time, for example, is stored as object), so we convert them to the correct types.
- Splitting Date&Time into a separate date and time makes it easier to analyze whether reviews improve over time.

We first change the datatypes to the correct types:

- Convert Rating and #ThumbsUp to integers.
- Convert Date&Time to a datetime object.
- Make sure Review and App are strings.

Next we split Date&Time:
- We split the Date&Time column into two separate columns: one for the date and one for the time. We then drop the original Date&Time column, since it is no longer needed.

In [225]:
# Cast Rating and #ThumbsUp to int (they were already int64; on this platform int maps to int32, see info() below)
dating_review['Rating'] = dating_review['Rating'].astype(int)
dating_review['#ThumbsUp'] = dating_review['#ThumbsUp'].astype(int) 

In [226]:
# Cast Review and App to str
# Note: in this run astype(str) turns missing reviews into the text 'nan', so Review shows no nulls afterwards
dating_review['Review'] = dating_review['Review'].astype(str) 
dating_review['App'] = dating_review['App'].astype(str)  

In [227]:
# Show the dtype information after the conversions
dating_review.info()

<class 'pandas.core.frame.DataFrame'>
Index: 681994 entries, 0 to 52993
Data columns (total 6 columns):
 #   Column     Non-Null Count   Dtype 
---  ------     --------------   ----- 
 0   Name       681987 non-null  object
 1   Review     681994 non-null  object
 2   Rating     681994 non-null  int32 
 3   #ThumbsUp  681994 non-null  int32 
 4   Date&Time  681994 non-null  object
 5   App        681994 non-null  object
dtypes: int32(2), object(4)
memory usage: 31.2+ MB
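
The Review count rises from 680609 to 681994 non-null after the conversion: in this run, `astype(str)` turned the missing reviews into the literal text `nan`, so they no longer show up as nulls. Below is a minimal sketch of one alternative on toy data, assuming the pandas nullable `string` dtype is acceptable for the later steps.

```python
import numpy as np
import pandas as pd

reviews = pd.Series(["Love it!", np.nan, "Keeps crashing."], name="Review")

# The nullable "string" dtype stores text but keeps the gap as a missing value
safe_reviews = reviews.astype("string")

print(safe_reviews.isna().sum())  # the missing review is still counted
```

Keeping the gap visible means a later `isnull().sum()` still reports empty reviews, which can then be dropped or filled deliberately.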


# Splitting Date & Time

In [228]:
dating_review["Date"] = pd.to_datetime(dating_review['Date&Time'], dayfirst= True).dt.date

In [229]:
dating_review["Time"] = pd.to_datetime(dating_review['Date&Time'], dayfirst = True).dt.time

In [230]:
dating_review= dating_review.drop("Date&Time",axis=1)

In [231]:
dating_review["Date"] = pd.to_datetime(dating_review["Date"])

In [232]:
dating_review["Time"] = dating_review["Time"].astype(str)

In [233]:
# Show the dtype information after splitting Date&Time
dating_review.info()

<class 'pandas.core.frame.DataFrame'>
Index: 681994 entries, 0 to 52993
Data columns (total 7 columns):
 #   Column     Non-Null Count   Dtype         
---  ------     --------------   -----         
 0   Name       681987 non-null  object        
 1   Review     681994 non-null  object        
 2   Rating     681994 non-null  int32         
 3   #ThumbsUp  681994 non-null  int32         
 4   App        681994 non-null  object        
 5   Date       681994 non-null  datetime64[ns]
 6   Time       681994 non-null  object        
dtypes: datetime64[ns](1), int32(2), object(4)
memory usage: 36.4+ MB
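
The cells above parse `Date&Time` twice and convert `Date` back to datetime afterwards. The same split can be sketched with a single parse, assuming every value follows the `DD-MM-YYYY HH:MM` pattern seen in the preview. The toy frame `df` is only for illustration.

```python
import pandas as pd

df = pd.DataFrame({"Date&Time": ["18-02-2022 01:19", "12-07-2017 01:44"]})

# Parse once with an explicit day-first format
parsed = pd.to_datetime(df["Date&Time"], format="%d-%m-%Y %H:%M")

df["Date"] = parsed.dt.normalize()           # datetime64 with the time set to 00:00
df["Time"] = parsed.dt.strftime("%H:%M:%S")  # time of day as text
df = df.drop(columns="Date&Time")
print(df)
```

An explicit `format` also makes the parse fail loudly on unexpected values instead of guessing between day-first and month-first.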


In [234]:
# Reorder the columns into a more readable order
new_column_order = ['Name', 'Review', 'Rating', '#ThumbsUp', 'Date', 'Time', 'App']
dating_review = dating_review[new_column_order]

In [235]:
dating_review.head()

Unnamed: 0,Name,Review,Rating,#ThumbsUp,Date,Time,App
0,linah sibanda,On this app i cant find a partner,5,0,2022-02-18,01:19:00,Tinder
1,Norman Johnson,Tinder would be so much better if we could spe...,3,0,2022-02-18,01:16:00,Tinder
2,David Hume,Still doesn't correctly notify matches or mess...,1,0,2022-02-18,01:11:00,Tinder
3,Last 1 Standing,"Got banned because I updated my bio to say ""I ...",2,0,2022-02-18,01:11:00,Tinder
4,Arthur Magamedov,Love it!,5,0,2022-02-18,01:06:00,Tinder


After transforming the columns and changing the dtypes, we can continue with the rest of the EDA: checking for missing values and looking at the distribution of the dataset.

In [236]:
# Count the null values in each column
print(dating_review.isnull().sum())

Name         7
Review       0
Rating       0
#ThumbsUp    0
Date         0
Time         0
App          0
dtype: int64


In the next step, data cleaning, we will remove or replace the null values.

In [237]:
# Show the describe() summary for the numeric columns only
dating_review.describe(include=[np.number])

Unnamed: 0,Rating,#ThumbsUp
count,681994.0,681994.0
mean,2.997183,1.873719
std,1.746953,24.448095
min,0.0,0.0
25%,1.0,0.0
50%,3.0,0.0
75%,5.0,0.0
max,5.0,5507.0
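
The summary shows a minimum Rating of 0, which is worth a closer look before modeling. Here is a minimal sketch of flagging such rows on toy data, under the assumption that valid ratings run from 1 to 5.

```python
import pandas as pd

ratings = pd.DataFrame({
    "Rating": [5, 3, 0, 1],
    "App": ["Tinder", "Bumble", "Hinge", "Tinder"],
})

# Assumption: valid ratings run from 1 to 5; anything else is flagged for review
out_of_range = ~ratings["Rating"].between(1, 5)

print(ratings[out_of_range])
print(f"{out_of_range.sum()} of {len(ratings)} rows fall outside 1-5")
```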


In [238]:
dating_review.tail()

Unnamed: 0,Name,Review,Rating,#ThumbsUp,Date,Time,App
52989,A Google user,Useless - I'm in the UK and it tells me i'm ov...,2,5,2017-07-12,01:44:00,Hinge
52990,Brian Shook,I can't get past the initial set up. It won't...,1,11,2017-07-12,01:36:00,Hinge
52991,A Google user,This is incredible! A quality dating app for A...,5,1,2017-07-12,01:32:00,Hinge
52992,A Google user,"""Over Water"" ... Can't choose location.",2,8,2017-07-12,01:28:00,Hinge
52993,Dylan Fick,"My entire town counts as ""over water"" and I ca...",2,15,2017-07-12,01:24:00,Hinge


# 🧼 Data cleaning 

In this step we replace the null values, remove duplicates, and drop unnecessary columns.

In [239]:
# Count the null values in each column
print(dating_review.isnull().sum())

Name         7
Review       0
Rating       0
#ThumbsUp    0
Date         0
Time         0
App          0
dtype: int64


In [240]:
# Replace null values in the 'Name' column with 'A Google user' (the placeholder spelling already used in the data)
dating_review['Name'] = dating_review['Name'].fillna('A Google user')

In [241]:
# Verify if null values are replaced
print(dating_review.isnull().sum())

Name         0
Review       0
Rating       0
#ThumbsUp    0
Date         0
Time         0
App          0
dtype: int64


In [263]:
# Check for duplicate rows
print(dating_review.duplicated().sum())

0


In [266]:
# Drop unnecessary columns:
dating_review.drop(['Time'], axis=1, inplace=True)

In [267]:
dating_review.head()

Unnamed: 0,Name,Review,Rating,#ThumbsUp,Date,App
0,linah sibanda,On this app i cant find a partner,5,0,2022-02-18,Tinder
1,Norman Johnson,Tinder would be so much better if we could spe...,3,0,2022-02-18,Tinder
2,David Hume,Still doesn't correctly notify matches or mess...,1,0,2022-02-18,Tinder
3,Last 1 Standing,"Got banned because I updated my bio to say ""I ...",2,0,2022-02-18,Tinder
4,Arthur Magamedov,Love it!,5,0,2022-02-18,Tinder


After the cleaning process, we build the second dataset so that both datasets can be combined (merged).

# 🪜Step 2: Building the second dataset

First we combine the three datasets we are going to use into one merged dataset. Then we clean the data and carry out an Exploratory Data Analysis (EDA) to make the dataset suitable for further analysis.

# 🔋Loading the Datasets

In [242]:
# Load the 'tinder_reviews' dataset
tinder_reviews= pd.read_csv("C:\\Users\\kaile\\OneDrive\\Documents\\S4\\notebooks\\21895827\\tinder.csv", encoding='ISO-8859-1')
tinder_reviews

Unnamed: 0,content,score,thumbsUpCount
0,Matches can be misleading. I keep getting matc...,2,0
1,This app is not working. Whenever i got notifi...,1,0
2,They want you to pay,1,0
3,BANNED ME FOR NO REASON AND I DIDN'T EVEN GET ...,1,0
4,Used to be good and easy to use... Every time ...,2,0
...,...,...,...
538445,Best app ever finally on android,5,2
538446,Tinder is extremely buggy on the galaxy S4 act...,1,0
538447,Keeps crashing.,1,0
538448,Crashes. Doesn't load. Total failure. Take it ...,1,0


In [243]:
# Load the 'bumble_reviews' dataset
bumble_reviews= pd.read_csv("C:\\Users\\kaile\\OneDrive\\Documents\\S4\\notebooks\\21895827\\bumble.csv", encoding='ISO-8859-1')
bumble_reviews

Unnamed: 0,content,score,thumbsUpCount
0,"After being a premium user, I'm not able to lo...",1,0
1,superb,1,0
2,"Fraudulent App, If you install a basic version...",1,0
3,"It's a lot better than Hinge, but it's still n...",3,0
4,"good app, thanks dear women, you are beautiful",5,0
...,...,...,...
109865,Finally here!,5,54
109866,Finally!,5,76
109867,"Finally, an app where women have to start the ...",4,2
109868,At last we have Android version!,5,60


In [244]:
# Load the 'hinge_reviews' dataset
hinge_reviews= pd.read_csv("C:\\Users\\kaile\\OneDrive\\Documents\\S4\\notebooks\\21895827\\hinge.csv", encoding='ISO-8859-1')
hinge_reviews

Unnamed: 0,content,score,thumbsUpCount
0,The chat function doesn't work no matter how m...,1,0
1,Would be great to have the option to just like...,4,0
2,I wrote three matches and no one got my message!!,1,0
3,You guys blocked me for no reason at all... An...,1,0
4,"Absolutely useless, hardly anybody on it and t...",1,0
...,...,...,...
54854,Useless - I'm in the UK and it tells me i'm ov...,2,5
54855,I can't get past the initial set up. It won't...,1,11
54856,This is incredible! A quality dating app for A...,5,1
54857,"""Over Water"" ... Can't choose location.",2,8


# 🫂Merging Datasets

In this step we combine the three datasets. We first add a new column holding the app name, so that the origin of each row is clear. We then make sure the column names are consistent across the datasets. Finally we concatenate them into one uniform dataset and display it to check that the merge was carried out correctly.
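
The steps described above can be sketched compactly on toy data. The dictionary `frames` and its contents are hypothetical stand-ins for the three loaded review files.

```python
import pandas as pd

# Toy stand-ins for tinder_reviews, bumble_reviews and hinge_reviews
frames = {
    "Tinder": pd.DataFrame({"content": ["They want you to pay"], "score": [1], "thumbsUpCount": [0]}),
    "Bumble": pd.DataFrame({"content": ["superb"], "score": [1], "thumbsUpCount": [0]}),
    "Hinge": pd.DataFrame({"content": ["Love it"], "score": [5], "thumbsUpCount": [2]}),
}
columns_to_keep = ["content", "score", "thumbsUpCount", "App"]

# Tag each frame with its app name, keep the shared columns, then stack them
combined = pd.concat(
    [df.assign(App=app)[columns_to_keep] for app, df in frames.items()],
    ignore_index=True,
)
print(combined["App"].value_counts())
```

Looping over one dictionary applies the same tagging and column selection to every frame, which avoids repeating a separate statement per app.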

In [245]:
# Add an 'App' column to each dataset
tinder_reviews['App'] = 'Tinder'
bumble_reviews['App'] = 'Bumble'
hinge_reviews['App'] = 'Hinge'

In [246]:
# Align column names for consistency before merging
columns_to_keep = ['content', 'score', 'thumbsUpCount', 'App']
tinder_reviews = tinder_reviews[columns_to_keep]
bumble_reviews = bumble_reviews[columns_to_keep]
hinge_reviews = hinge_reviews[columns_to_keep]

In [247]:
# Merge all datasets into a unified review dataset
all_reviews_combined = pd.concat([tinder_reviews, bumble_reviews, hinge_reviews], ignore_index=True)

In [248]:
# Display the DataFrame in a table format
display(all_reviews_combined)

Unnamed: 0,content,score,thumbsUpCount,App
0,Matches can be misleading. I keep getting matc...,2,0,Tinder
1,This app is not working. Whenever i got notifi...,1,0,Tinder
2,They want you to pay,1,0,Tinder
3,BANNED ME FOR NO REASON AND I DIDN'T EVEN GET ...,1,0,Tinder
4,Used to be good and easy to use... Every time ...,2,0,Tinder
...,...,...,...,...
703174,Useless - I'm in the UK and it tells me i'm ov...,2,5,Hinge
703175,I can't get past the initial set up. It won't...,1,11,Hinge
703176,This is incredible! A quality dating app for A...,5,1,Hinge
703177,"""Over Water"" ... Can't choose location.",2,8,Hinge


We can see that the datasets were merged successfully. We now continue with the next step: performing an EDA (Exploratory Data Analysis) and cleaning the data before creating synthetic data.

# EDA (Exploratory Data Analysis) & Data Cleaning

In [249]:
# Show the first 5 rows of the dataset (head() defaults to 5)
all_reviews_combined.head()

Unnamed: 0,content,score,thumbsUpCount,App
0,Matches can be misleading. I keep getting matc...,2,0,Tinder
1,This app is not working. Whenever i got notifi...,1,0,Tinder
2,They want you to pay,1,0,Tinder
3,BANNED ME FOR NO REASON AND I DIDN'T EVEN GET ...,1,0,Tinder
4,Used to be good and easy to use... Every time ...,2,0,Tinder


In [250]:
# Show the last 5 rows of the dataset (tail() defaults to 5)
all_reviews_combined.tail()

Unnamed: 0,content,score,thumbsUpCount,App
703174,Useless - I'm in the UK and it tells me i'm ov...,2,5,Hinge
703175,I can't get past the initial set up. It won't...,1,11,Hinge
703176,This is incredible! A quality dating app for A...,5,1,Hinge
703177,"""Over Water"" ... Can't choose location.",2,8,Hinge
703178,"My entire town counts as ""over water"" and I ca...",2,15,Hinge


In [251]:
all_reviews_combined.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 703179 entries, 0 to 703178
Data columns (total 4 columns):
 #   Column         Non-Null Count   Dtype 
---  ------         --------------   ----- 
 0   content        701755 non-null  object
 1   score          703127 non-null  object
 2   thumbsUpCount  702592 non-null  object
 3   App            703179 non-null  object
dtypes: object(4)
memory usage: 21.5+ MB


In [252]:
all_reviews_combined.describe()

Unnamed: 0,content,score,thumbsUpCount,App
count,701755,703127,702592,703179
unique,534901,430,1326,3
top,Good,1,0,Tinder
freq,14309,263702,573219,538450
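
The `score` column is stored as object and has 430 unique values, which suggests that some entries are not plain ratings. Here is a minimal sketch, on made-up values, of listing the entries that cannot be parsed as numbers before converting the column.

```python
import pandas as pd

# Made-up values standing in for the 'score' column
scores = pd.Series(["1", "5", "3", "great app", None, "4"], name="score")

as_number = pd.to_numeric(scores, errors="coerce")  # unparseable text becomes NaN

# Entries that held a value but could not be read as a number
not_numeric = scores[as_number.isna() & scores.notna()]

print(not_numeric.tolist())
print(as_number.value_counts().sort_index())
```

Looking at these entries first shows whether they are stray text or otherwise malformed rows, which helps decide between dropping and repairing them.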


In [253]:
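# Visualize distributions of numeric columns
# Note: every column is still of dtype object at this point (see info() above),
# so this loop selects no columns until score and thumbsUpCount are converted.
import matplotlib.pyplot as plt
import seaborn as sns

for column in all_reviews_combined.select_dtypes(include=['float64', 'int64']).columns:
    plt.figure()
    sns.histplot(all_reviews_combined[column], kde=True)
    plt.title(f'Distribution of {column}')
    plt.show()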

In [254]:
# Check for missing values
missing_values = all_reviews_combined.isnull().sum()
print("Missing values:\n", missing_values)

Missing values:
 content          1424
score              52
thumbsUpCount     587
App                 0
dtype: int64
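
The absolute counts are easier to judge as a share of all rows. The sketch below uses toy data to compute that share and shows one possible way to handle the gaps. Dropping rows without review text is an assumption made here, not a step taken in this notebook.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "content": ["Good", np.nan, "Keeps crashing.", "superb"],
    "score": ["5", "1", np.nan, "1"],
})

# Percentage of missing values per column
missing_share = df.isnull().mean().mul(100).round(1)
print(missing_share)

# Hypothetical choice: a row without review text cannot be used for sentiment analysis
cleaned = df.dropna(subset=["content"])
print(len(cleaned))
```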


# Creating Synthetic Data

In [255]:
!pip install faker



In [256]:
# Create a Faker instance
fake = Faker()

# Number of samples in all_reviews_combined
num_samples = len(all_reviews_combined)

# Generate a synthetic 'Name' for every row
all_reviews_combined['Name'] = [fake.name() for _ in range(num_samples)]

# Generate synthetic 'Date&Time' data restricted to 2023
all_reviews_combined['Date&Time'] = [
    fake.date_time_between_dates(
        datetime_start=pd.Timestamp('2023-01-01 00:00:00'),
        datetime_end=pd.Timestamp('2023-12-31 23:59:59')
    ).strftime('%d-%m-%Y %H:%M') for _ in range(num_samples)
]

# Display the updated dataset
display(all_reviews_combined)

Unnamed: 0,content,score,thumbsUpCount,App,Name,Date&Time
0,Matches can be misleading. I keep getting matc...,2,0,Tinder,Nicholas Mckee,25-07-2023 20:10
1,This app is not working. Whenever i got notifi...,1,0,Tinder,Lori Larson,24-01-2023 13:14
2,They want you to pay,1,0,Tinder,Elijah Rocha,10-08-2023 00:13
3,BANNED ME FOR NO REASON AND I DIDN'T EVEN GET ...,1,0,Tinder,John Butler,11-01-2023 16:46
4,Used to be good and easy to use... Every time ...,2,0,Tinder,Brian Villarreal,18-01-2023 18:53
...,...,...,...,...,...,...
703174,Useless - I'm in the UK and it tells me i'm ov...,2,5,Hinge,Natasha Bradley,17-11-2023 10:56
703175,I can't get past the initial set up. It won't...,1,11,Hinge,Omar Jones,24-03-2023 20:56
703176,This is incredible! A quality dating app for A...,5,1,Hinge,Jonathan Hale,09-03-2023 21:04
703177,"""Over Water"" ... Can't choose location.",2,8,Hinge,Paul Williams,10-02-2023 16:57
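
Calling Faker once per row works, but for roughly 700,000 rows a vectorized draw may be faster and is easy to seed. Below is a minimal sketch of an alternative for the timestamps only, using NumPy instead of Faker. The seed value and the small `num_samples` are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

num_samples = 5
rng = np.random.default_rng(42)  # fixed seed so reruns give the same timestamps

start = pd.Timestamp("2023-01-01")
minutes_in_2023 = 365 * 24 * 60  # 2023 is not a leap year

# Draw a random minute offset inside 2023 for every row
offsets = rng.integers(0, minutes_in_2023, size=num_samples)
timestamps = start + pd.to_timedelta(offsets, unit="min")

# Same day-first text format as the first dataset's Date&Time column
date_time = pd.Series(timestamps.strftime("%d-%m-%Y %H:%M"), name="Date&Time")
print(date_time)
```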


In [257]:
# Rename columns in all_reviews_combined to match the first dataset
all_reviews_combined = all_reviews_combined.rename(columns={
    'content': 'Review',
    'score': 'Rating',
    'thumbsUpCount': '#ThumbsUp',
    'Name': 'Name',
    'Date&Time': 'Date&Time',
    'App': 'App'
})

# Reorder the columns to match the first dataset
all_reviews_combined = all_reviews_combined[['Name', 'Review', 'Rating', '#ThumbsUp', 'Date&Time', 'App']]

# Display the updated DataFrame
display(all_reviews_combined)


Unnamed: 0,Name,Review,Rating,#ThumbsUp,Date&Time,App
0,Nicholas Mckee,Matches can be misleading. I keep getting matc...,2,0,25-07-2023 20:10,Tinder
1,Lori Larson,This app is not working. Whenever i got notifi...,1,0,24-01-2023 13:14,Tinder
2,Elijah Rocha,They want you to pay,1,0,10-08-2023 00:13,Tinder
3,John Butler,BANNED ME FOR NO REASON AND I DIDN'T EVEN GET ...,1,0,11-01-2023 16:46,Tinder
4,Brian Villarreal,Used to be good and easy to use... Every time ...,2,0,18-01-2023 18:53,Tinder
...,...,...,...,...,...,...
703174,Natasha Bradley,Useless - I'm in the UK and it tells me i'm ov...,2,5,17-11-2023 10:56,Hinge
703175,Omar Jones,I can't get past the initial set up. It won't...,1,11,24-03-2023 20:56,Hinge
703176,Jonathan Hale,This is incredible! A quality dating app for A...,5,1,09-03-2023 21:04,Hinge
703177,Paul Williams,"""Over Water"" ... Can't choose location.",2,8,10-02-2023 16:57,Hinge


# EDA & Cleaning Dataset

Both datasets must be cleaned and explored before going further: first preprocessing (cleaning + EDA) of this dataset (all_reviews_combined), then of the first one (dating_review), and finally a last EDA of the merged dataset.

In [258]:
# Display the data types of each column
print(all_reviews_combined.dtypes)

Name         object
Review       object
Rating       object
#ThumbsUp    object
Date&Time    object
App          object
dtype: object


In [259]:
# Convert 'Rating' and '#ThumbsUp' to the nullable Int32 dtype; non-numeric values become <NA>
all_reviews_combined['Rating'] = pd.to_numeric(all_reviews_combined['Rating'], errors='coerce').astype('Int32')
all_reviews_combined['#ThumbsUp'] = pd.to_numeric(all_reviews_combined['#ThumbsUp'], errors='coerce').astype('Int32')

# Display the data types to confirm the change
print(all_reviews_combined.dtypes)


Name         object
Review       object
Rating        Int32
#ThumbsUp     Int32
Date&Time    object
App          object
dtype: object
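
The capitalized `Int32` is the pandas nullable integer dtype, which is what lets the column hold integers alongside missing values. A minimal sketch on made-up values shows the difference from a plain NumPy `int32` cast.

```python
import pandas as pd

raw = pd.Series(["5", "1", "not a number", None])

numeric = pd.to_numeric(raw, errors="coerce")  # float64, bad values become NaN
nullable = numeric.astype("Int32")             # integers again, gaps stay missing

print(nullable)
print(nullable.isna().sum())

# A plain NumPy integer dtype cannot represent the missing entries
try:
    numeric.astype("int32")
except ValueError as exc:
    print("plain int32 cast failed:", type(exc).__name__)
```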


In [260]:
all_reviews_combined

Unnamed: 0,Name,Review,Rating,#ThumbsUp,Date&Time,App
0,Nicholas Mckee,Matches can be misleading. I keep getting matc...,2,0,25-07-2023 20:10,Tinder
1,Lori Larson,This app is not working. Whenever i got notifi...,1,0,24-01-2023 13:14,Tinder
2,Elijah Rocha,They want you to pay,1,0,10-08-2023 00:13,Tinder
3,John Butler,BANNED ME FOR NO REASON AND I DIDN'T EVEN GET ...,1,0,11-01-2023 16:46,Tinder
4,Brian Villarreal,Used to be good and easy to use... Every time ...,2,0,18-01-2023 18:53,Tinder
...,...,...,...,...,...,...
703174,Natasha Bradley,Useless - I'm in the UK and it tells me i'm ov...,2,5,17-11-2023 10:56,Hinge
703175,Omar Jones,I can't get past the initial set up. It won't...,1,11,24-03-2023 20:56,Hinge
703176,Jonathan Hale,This is incredible! A quality dating app for A...,5,1,09-03-2023 21:04,Hinge
703177,Paul Williams,"""Over Water"" ... Can't choose location.",2,8,10-02-2023 16:57,Hinge


In [261]:
# Save the combined second dataset to a CSV file for review
all_reviews_combined.to_csv("all_reviews_combined.csv", index=False)
print("Merged dataset saved as 'all_reviews_combined.csv'")


Merged dataset saved as 'all_reviews_combined.csv'


# Sources

https://medium.com/@robdelacruz/sentiment-analysis-using-natural-language-processing-nlp-3c12b77a73ec
https://www.analyticsvidhya.com/blog/2021/06/nlp-sentiment-analysis/