# Influencer RFM Analysis

## Introduction
This notebook applies RFM (Recency, Frequency, Monetary) analysis to influencers. We use the RFM scores to segment influencers into groups and examine the characteristics of each group.

### 1. Import Libraries

In [98]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.cluster import KMeans
import warnings
warnings.filterwarnings("ignore")

### 2. Importing datasets
NOTE: Only the content, reaction types, and reactions CSV files are needed for the RFM calculation itself, and the user file is needed only to look up influencer names. The remaining files were used for preliminary findings.

In [99]:
# Import user.csv
user = pd.read_csv('../Datasets/User.csv')
user = user.iloc[:, 1:]

# Import content.csv
content = pd.read_csv('../Datasets/Content.csv')
content = content.iloc[:, 1:]

# Import reaction_types.csv
reaction_types = pd.read_csv('../Datasets/ReactionTypes.csv')
reaction_types = reaction_types.iloc[:, 1:]

# Import reaction.csv
reaction = pd.read_csv('../Datasets/Reactions.csv')
reaction = reaction.iloc[:, 1:]

# Import profile.csv
profile = pd.read_csv('../Datasets/Profile.csv')
profile = profile.iloc[:, 1:]

# Import session.csv
session = pd.read_csv('../Datasets/Session.csv')
session = session.iloc[:, 1:]
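Each load above repeats the same two steps: read the CSV, then drop its first column with `iloc[:, 1:]`. The following is a minimal sketch of a helper that does both, assuming the first column is a saved row index (the notebook does not say what it holds). `load_table` is a hypothetical name, and the inline CSV text is made-up data standing in for a real file.

```python
import io
import pandas as pd

def load_table(path_or_buffer):
    """Read a CSV and drop its first column, mirroring `df.iloc[:, 1:]`."""
    df = pd.read_csv(path_or_buffer)
    return df.iloc[:, 1:]

# Toy stand-in for a file such as '../Datasets/Reactions.csv' (values are made up).
sample_csv = io.StringIO(
    ",Content ID,Type\n"
    "0,c1,like\n"
    "1,c2,hate\n"
)
sample = load_table(sample_csv)
print(sample.columns.tolist())
```

With a helper like this, each dataset would load in one line, for example `reaction = load_table('../Datasets/Reactions.csv')`.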

### 3. Data Cleaning
1. Dropping NA values
2. Renaming column names for easier understanding and merging
3. Normalizing the case of the Category column so that duplicate categories collapse, e.g. Studying and studying are the same category.

In [100]:
reaction.rename(columns={'Type': 'Reaction Type'}, inplace=True)
reaction = reaction.dropna()

In [101]:
content.rename(columns={'Type': 'Content Type'}, inplace=True)
content = content.dropna()
content = content.drop('URL', axis=1)
content['Category'] = content['Category'].str.lower()
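To illustrate why lower-casing matters, here is a small sketch on made-up category values (not taken from the dataset). It counts distinct categories before and after applying the same `str.lower()` call used above.

```python
import pandas as pd

# Made-up category values that differ only by letter case.
toy = pd.DataFrame({"Category": ["Studying", "studying", "Food", "food", "tech"]})

distinct_before = toy["Category"].nunique()

# Same normalization as in the cell above: lower-case every category label.
toy["Category"] = toy["Category"].str.lower()

distinct_after = toy["Category"].nunique()
print(distinct_before, distinct_after)
```

Without this step, any later grouping by `Category` would treat the case variants as separate categories.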

In [102]:
reaction_types.rename(columns={'Type': 'Reaction Type'}, inplace=True)

### 4. Data Modeling
Our analysis requires more information about the content produced by the influencers. Hence, we combine the reactions, reaction type and content dataframes.

#### Merging reaction and reaction types
This gives us the sentiment and score of each reaction.

In [103]:
df_reactions = reaction.merge(reaction_types,how='left',on='Reaction Type')
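A left merge like the one above keeps one row per reaction only if every reaction type appears once in the lookup table. The sketch below shows one way to make that assumption explicit with the `validate` argument of `DataFrame.merge`. The toy frames and their sentiment and score values are made up for illustration.

```python
import pandas as pd

# Made-up reactions and a made-up lookup table of reaction types.
reactions_toy = pd.DataFrame({
    "Content ID": ["c1", "c1", "c2"],
    "Reaction Type": ["like", "hate", "like"],
})
types_toy = pd.DataFrame({
    "Reaction Type": ["like", "hate"],
    "Sentiment": ["positive", "negative"],
    "Score": [50, 5],
})

# validate="many_to_one" raises MergeError if a reaction type is duplicated
# in the lookup table, which would otherwise silently multiply reaction rows.
merged = reactions_toy.merge(
    types_toy, how="left", on="Reaction Type", validate="many_to_one"
)
print(merged)
```

The row count of `merged` equals that of `reactions_toy`, which is the property the later score sums rely on.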

#### Merge reactions with content table
This consolidates all the reactions to each piece of content in one dataframe.

In [104]:
df_complete = df_reactions.merge(content[['Content ID','User ID', 'Content Type', 'Category']], how='right', on='Content ID')

df_complete.rename(columns={'User ID_x': 'Viewer User ID','User ID_y': 'Influencer User ID'}, inplace=True)
df_complete.rename(columns={'Datetime': 'Date'}, inplace=True)
df_complete = df_complete.reindex(columns=['Viewer User ID', 'Reaction Type', 'Sentiment', 'Score', 'Date','Content ID', 'Influencer User ID', 'Content Type', 'Category'])

# Truncate timestamps to whole days so that recency is measured in days
df_complete['Date'] = pd.to_datetime(df_complete['Date']).dt.date.astype('datetime64[ns]')
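Both the reactions and content tables carry a `User ID` column, which is why the merge above produces `User ID_x` (the reacting viewer) and `User ID_y` (the content's author). The sketch below reproduces this on made-up data and also shows a side effect of `how='right'`: content with no reactions is kept, with missing values in the reaction columns.

```python
import pandas as pd

# Made-up data: content c2 has no reactions.
reactions_toy = pd.DataFrame({
    "Content ID": ["c1", "c1"],
    "User ID": ["v1", "v2"],
    "Reaction Type": ["like", "hate"],
})
content_toy = pd.DataFrame({
    "Content ID": ["c1", "c2"],
    "User ID": ["i1", "i2"],
    "Category": ["food", "tech"],
})

merged = reactions_toy.merge(content_toy, how="right", on="Content ID")

# 'User ID_x' comes from the left (reactions) table,
# 'User ID_y' from the right (content) table.
merged = merged.rename(
    columns={"User ID_x": "Viewer User ID", "User ID_y": "Influencer User ID"}
)

# Rows for content without reactions have NaN in the viewer and reaction columns.
unreacted = merged[merged["Viewer User ID"].isna()]
print(unreacted)
```

Such rows have no reaction date or score, so they contribute nothing to the recency and monetary sums computed later.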

### 5. RFM Analysis

In [105]:
# Recency: time since the most recent reaction to the influencer's content.
# Date is the reaction date, so this measures recent engagement rather than posting date.
max_date = pd.to_datetime(df_complete['Date']).max()
df_complete['Recency'] = max_date - df_complete.groupby('Influencer User ID')['Date'].transform('max')

# Frequency: number of rows (one per reaction) for the influencer.
# 'count' counts rows, not distinct content items; 'nunique' would count content produced.
df_complete['Frequency'] = df_complete.groupby('Influencer User ID')['Content ID'].transform('count')

# Monetary: total sentiment score across all reactions to the influencer's content.
df_complete['Monetary'] = df_complete.groupby('Influencer User ID')['Score'].transform('sum')

# Calculate quartiles and assign scores inversely for recency (most recent gets highest score)
df_complete['R_score'] = pd.qcut(df_complete['Recency'], q=4, labels=False, duplicates='drop')

# Adjust the scores to have the highest score as the most recent dates
max_score = df_complete['R_score'].max()
df_complete['R_score'] = max_score - df_complete['R_score']  # Invert the scores

# Assign Frequency and Monetary scores
df_complete['F_score'] = pd.qcut(df_complete['Frequency'], q=4, labels=False)
df_complete['M_score'] = pd.qcut(df_complete['Monetary'], q=4, labels=False)

# Analyze RFM scores
rfm_analysis = df_complete.groupby(['R_score', 'F_score', 'M_score']).agg({'Influencer User ID': 'nunique'}).reset_index()
rfm_analysis = rfm_analysis.rename(columns={'Influencer User ID': 'Count'})
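The cell above computes quartiles over the rows of `df_complete`, so an influencer with many reactions contributes many rows to the bin edges. As a sketch of an alternative (not the method used in this notebook), the code below first aggregates to one row per influencer and then scores. It ranks values with `method="first"` before calling `pd.qcut`, a common way to avoid duplicate bin edges when values tie. The toy data and the `quartile_score` helper are hypothetical.

```python
import pandas as pd

# Made-up reaction rows for four influencers.
toy = pd.DataFrame({
    "Influencer User ID": ["a", "a", "a", "b", "b", "c", "d"],
    "Content ID": ["c1", "c1", "c2", "c3", "c3", "c4", "c5"],
    "Score": [50, 10, 30, 70, 70, 5, 20],
    "Date": pd.to_datetime([
        "2021-01-10", "2021-01-12", "2021-01-05",
        "2021-01-01", "2021-01-02", "2021-01-11", "2021-01-08",
    ]),
})

# One row per influencer: last reaction date, reaction count, total score.
max_date = toy["Date"].max()
rfm = toy.groupby("Influencer User ID").agg(
    last_reaction=("Date", "max"),
    Frequency=("Content ID", "count"),
    Monetary=("Score", "sum"),
)
rfm["Recency"] = (max_date - rfm["last_reaction"]).dt.days

def quartile_score(series, higher_is_better=True):
    """Score 0-3 by quartile; ranking first gives unique values, so bin edges never repeat."""
    ranks = series.rank(method="first", ascending=higher_is_better)
    return pd.qcut(ranks, q=4, labels=False)

# For recency, a smaller value (more recent) should earn the higher score.
rfm["R_score"] = quartile_score(rfm["Recency"], higher_is_better=False)
rfm["F_score"] = quartile_score(rfm["Frequency"])
rfm["M_score"] = quartile_score(rfm["Monetary"])
print(rfm[["Recency", "Frequency", "Monetary", "R_score", "F_score", "M_score"]])
```

The trade-off is that tied influencers can land in different quartiles, since ranking breaks ties by order of appearance.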

### 6. Ranking Influencers by RFM Score

In [106]:
new_df = df_complete.groupby(['Influencer User ID', 'Recency', 'Frequency', 'Monetary', 'R_score', 'F_score', 'M_score']).agg({'Influencer User ID': 'count'}).rename(columns={'Influencer User ID': 'Count'}).reset_index()
new_df = new_df.sort_values(by=['R_score', 'F_score', 'M_score'], ascending=False)
new_df.rename(columns={'Influencer User ID': 'User ID'}, inplace=True)
new_df.drop('Count', axis=1, inplace=True)

#### Get names of influencers

In [107]:
influencer_df = new_df.merge(user, how='left', on='User ID')

influencer_df.drop("Email", axis=1, inplace=True)
columns = influencer_df.columns.tolist()
columns.remove("Name")
columns.insert(1, "Name")
influencer_df = influencer_df[columns]

#### Storing the top 10 influencers separately

In [108]:
influencer_df_sorted = influencer_df.head(10).sort_values(by='Frequency', ascending=False)

In [109]:
influencer_df.to_csv('../output/influencer_df.csv', index=False)

### 7. Key insights
The top 10 influencers by RFM score combine recent engagement, a high volume of reactions, and high total sentiment scores, so they can be a source of momentum for content generation.

### 8. Strategic Recommendations
Production of relevant content categories can be incentivized by:
1. Identifying the content categories produced by the top influencers.
2. Prioritizing those categories by the opportunity gaps found in the other analysis.

### 9. Feedback
From the discussions during the Q&A session, we recognized that RFM analysis gives us user personas, not just a strict ranking of influencers.

Segmenting influencers by RFM scores can therefore provide more actionable insights.
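As one possible way to act on this feedback, the sketch below maps quartile scores (0 to 3, as produced by `pd.qcut(..., labels=False)`) to persona labels with simple rules. The segment names, the thresholds, and the toy scores are all hypothetical. They are illustrative choices, not results from this analysis.

```python
import pandas as pd

# Made-up quartile scores for four influencers.
scores = pd.DataFrame({
    "User ID": ["a", "b", "c", "d"],
    "R_score": [3, 0, 3, 1],
    "F_score": [3, 3, 0, 0],
    "M_score": [3, 2, 1, 0],
})

def assign_segment(row):
    """Map quartile scores to a hypothetical persona label."""
    engaged = row["F_score"] >= 2 and row["M_score"] >= 2
    recent = row["R_score"] >= 2
    if recent and engaged:
        return "Champion"
    if engaged:
        return "Lapsed high performer"
    if recent:
        return "Rising"
    return "Low activity"

scores["Segment"] = scores.apply(assign_segment, axis=1)

# Segment sizes show how influencers spread across personas.
segment_sizes = scores["Segment"].value_counts()
print(scores)
print(segment_sizes)
```

Rule-based labels like these are easy to explain to stakeholders, and the thresholds can be tuned once the real score distribution is known.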