## Market segmentation

This data was collected as part of a market research study using followers of a large consumer brand's Twitter account, which we'll call "NutrientH20" for labeling purposes. NutrientH20 wanted to understand its social media audience a little better so that it could sharpen its messaging.

A little context on the data collection: the advertising firm that runs NutrientH20's online marketing campaigns took a sample of the brand's Twitter followers. Over a seven-day period in June 2014, they collected every post ("tweet") made by each of those followers. Each post was then reviewed by a human annotator contracted through Amazon's Mechanical Turk service.

Each tweet was categorized based on its content using a pre-defined scheme of 36 categories, each representing a broad area of interest (e.g., politics, sports, family). Annotators could assign more than one category to a post. For example, a hypothetical post such as "I'm really excited to see grandpa go wreck shop in his geriatric soccer league this Sunday!" might be categorized as both "family" and "sports."

Each row of social_marketing.csv represents one user, labeled by a random (anonymous, unique) 9-digit alphanumeric code. Each column represents an interest, labeled along the top of the data file. Each entry is the number of posts by a given user that fell into the given category.
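To make the layout concrete, here is a minimal sketch of what such a table looks like. The user codes, the three categories shown, and all counts are invented for illustration and are not taken from social_marketing.csv.

```python
import pandas as pd

# Hypothetical rows: the user codes and counts below are made up for illustration.
toy = pd.DataFrame(
    {
        "family": [2, 0, 1],
        "sports": [1, 4, 0],
        "politics": [0, 3, 0],
    },
    index=pd.Index(["a1b2c3d4e", "f5g6h7i8j", "k9l0m1n2o"], name="user"),
)

# Each row is one user; each entry counts that user's posts tagged with the column's category.
# A post can carry several tags, so a row sum counts tags, not distinct posts.
tags_per_user = toy.sum(axis=1)
users_per_topic = (toy > 0).sum()

print(toy)
print(tags_per_user.to_dict())
print(users_per_topic.to_dict())
```

Because one post can receive several categories, the row totals here should be read as tag counts rather than tweet counts.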

Not all posts will have accurate annotations. Some annotators may have been inattentive occasionally, or perhaps often. As a result, some error and noise are unavoidable in the annotation process.

Let's analyze this data and write a brief report for NutrientH20 that identifies any interesting market segments in its social media audience.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
df= pd.read_csv('social_marketing.csv')
df.rename(columns={'Unnamed: 0':'user'}, inplace=True)
df.head()

In [None]:
df.shape

In [None]:
df.columns

In [None]:
# Drop 'chatter' and 'uncategorized': they do not describe any useful user characteristic
df.drop(columns=['uncategorized', 'chatter'], inplace=True)
df.head()

In [None]:
numeric_df = df.drop(columns=['user'])

# Get the column name with the maximum value for each row
df['topic'] = numeric_df.idxmax(axis=1)

df.head()

In [None]:
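# Check whether any user has zero posts in every interest category

features_df = df.drop(columns=['user', 'topic'])
# Compare only the numeric feature columns; df itself also holds the string columns 'user' and 'topic'
features_df.loc[(features_df == 0).all(axis=1)]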

In [None]:
# from sklearn.preprocessing import StandardScaler

# features_df = StandardScaler().fit_transform(features_df)

In [None]:
features_df.shape

In [None]:
from sklearn.decomposition import PCA
pca= PCA(n_components=25, random_state=41)
pcs= pca.fit_transform(features_df)


In [None]:
print('Explained variation per principal component: {}'.format(pca.explained_variance_ratio_))

In [None]:
# Plot the explained variances
features = range(pca.n_components_)
plt.bar(features, pca.explained_variance_ratio_, color='black')
plt.xlabel('PCA features')
plt.ylabel('variance %')
plt.xticks(features)
plt.show()

In [None]:
# The first few components explain most of the variability; check the first six (PC1 to PC6)
print(pca.explained_variance_ratio_[:6].sum())
# That is roughly 70%; add more components to explain around 85%

print(pca.explained_variance_ratio_[:13].sum())

# Keep the first 13 components (PC1 through PC13), matching the slice below

# Save components to a DataFrame
PCA_components = pd.DataFrame(pcs)
PCA_components=PCA_components.iloc[:, : 13]

PCA_components.head()

In [None]:
# Look for evident clusters in pairwise scatter plots of the first five principal components

# Create scatter plots between 5 columns
sns.set(style="ticks")
sns.pairplot(PCA_components[[0,1,2,3,4]], diag_kind='kde',corner=True)

plt.show()

In [None]:
# No clear structure in the pair plots; let's visualize the principal components with t-SNE instead

In [None]:
from sklearn.manifold import TSNE

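# Note: recent scikit-learn releases rename `n_iter` to `max_iter`; adjust the argument if this call warns or fails
# random_state is fixed so the embedding is reproducible between runs
tsne = TSNE(n_components=2, verbose=1, perplexity=30, n_iter=1000, learning_rate=200, random_state=42)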
tsne_results = tsne.fit_transform(PCA_components)


In [None]:
plt.figure(figsize=(10,8))
plt.scatter(tsne_results[:, 0], tsne_results[:, 1])
plt.xticks([])
plt.yticks([])
plt.show()

In [None]:
# Create an interactive plotly visualization, colored by each user's dominant topic (assigned earlier via idxmax)

import plotly.express as px
import random
# convert the t-SNE results to a DataFrame
tsne_df = pd.DataFrame(data = tsne_results, columns = ['Dim1', 'Dim2'])

# add each user's dominant topic to this DataFrame
tsne_df['Topic'] = df['topic'].values

topic_counts = df['topic'].value_counts()


# Keep the most frequent 75% of topics (value_counts is sorted by count, descending)
top_topics = topic_counts.index[:round(len(df['topic'].unique())*0.75)]

# Map each frequent topic to a random color; the remaining topics are drawn in white
color_map = {topic: "#{:06x}".format(random.randint(0, 0xFFFFFF)) if topic in top_topics else 'white' for topic in tsne_df['Topic'].unique()}

# Create an interactive plot and color the points based on the color map
fig = px.scatter(tsne_df, x='Dim1', y='Dim2', hover_data=['Topic'], color='Topic', color_discrete_map=color_map)

# Set the height of the plot to elongate it
fig.update_layout(height=800)

# Show the plot
fig.show()

Although value counts alone give the most frequent topics, they say nothing about how those topics relate to each other: frequent topics can sit far apart in the Dim1-Dim2 plot.

From the plot we can observe the following large clusters, in decreasing order of size:
 - Photo sharing, with shopping and current events
 - Health and nutrition, with personal fitness
 - Cooking
 - College/university and online gaming
 - News, politics and travel
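One way to put numbers on this visual impression is to compute a centroid for each dominant topic in the embedding and compare the distances between centroids. The sketch below uses a small synthetic stand-in for `tsne_df` (random points and invented topic placements), so it illustrates the mechanics only. Distances in a t-SNE embedding are a rough guide at best, so this is a way to structure the reading of the plot rather than a measurement.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic stand-in for tsne_df: three topics placed around invented centres.
centres = {"cooking": (0.0, 0.0), "politics": (10.0, 0.0), "news": (10.0, 1.0)}
frames = []
for topic, (cx, cy) in centres.items():
    pts = rng.normal(loc=(cx, cy), scale=0.5, size=(50, 2))
    frames.append(pd.DataFrame({"Dim1": pts[:, 0], "Dim2": pts[:, 1], "Topic": topic}))
toy_tsne = pd.concat(frames, ignore_index=True)

# Centroid of each topic in the 2-D embedding
centroids = toy_tsne.groupby("Topic")[["Dim1", "Dim2"]].mean()

# Pairwise Euclidean distances between topic centroids
diff = centroids.values[:, None, :] - centroids.values[None, :, :]
dist = pd.DataFrame(
    np.sqrt((diff ** 2).sum(axis=2)),
    index=centroids.index,
    columns=centroids.index,
)
print(dist.round(2))
```

Topics whose centroids are close (here "politics" and "news", by construction) are candidates for belonging to the same audience segment.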

In [None]:
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

silhouette_scores = []
inertia_values = []

for k in range(2, 11):  # Start from 2 clusters as silhouette score requires at least 2 clusters
    kmeans = KMeans(n_clusters=k, init='k-means++', random_state=42)
    kmeans.fit(PCA_components)
    
    # Calculate silhouette score and inertia for each k
    silhouette_scores.append(silhouette_score(PCA_components, kmeans.labels_))
    inertia_values.append(kmeans.inertia_)

# Plot silhouette scores
plt.figure(figsize=(10, 4))
plt.subplot(1, 2, 1)
plt.plot(range(2, 11), silhouette_scores, marker='o')
plt.xlabel('Number of Clusters')
plt.ylabel('Silhouette Score')
plt.title('Silhouette Score for K-means Clustering')

# Plot the elbow curve
plt.subplot(1, 2, 2)
plt.plot(range(2, 11), inertia_values, marker='o')
plt.xlabel('Number of Clusters')
plt.ylabel('Inertia')
plt.title('Elbow Curve for K-means Clustering')

plt.tight_layout()
plt.show()

Based on the plots above, let's visualize the results for 4 and 5 clusters in turn.
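For reference, reading a silhouette curve can be sketched on synthetic data where the answer is known in advance. The four blob centres below are an assumption chosen to be well separated; real data such as ours rarely gives such a clean peak, which is why two candidate values of k are tried here.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data with four well-separated groups (invented for illustration)
X, _ = make_blobs(
    n_samples=400,
    centers=[[0, 0], [10, 0], [0, 10], [10, 10]],
    cluster_std=0.5,
    random_state=0,
)

scores = {}
for k in range(2, 8):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# The k with the highest average silhouette is the natural candidate
best_k = max(scores, key=scores.get)
print({k: round(s, 3) for k, s in scores.items()})
print("k with the highest silhouette score:", best_k)
```

When the curve is flat or has several similar peaks, as can happen with noisy annotation counts, it is reasonable to inspect more than one k and judge the clusters by how interpretable they are.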

In [None]:
# Perform K-means clustering with 4 clusters, using k-means++ initialization
kmeans = KMeans(n_clusters=4, init='k-means++', random_state=42)
cluster_labels = kmeans.fit_predict(PCA_components)

PCA_components_original= PCA_components.copy()

PCA_components['cluster'] = cluster_labels

In [None]:
#Merge the original dataframe with cluster information to analyze the clusters
concatenated_df = pd.concat([df, PCA_components[['cluster']]], axis=1)

#We do not need our user column and our topic column that we previously created
concatenated_df.drop(columns=['user','topic'], axis=1, inplace=True)

concatenated_df.head()

In [None]:
# Calculate the mean value of each topic per cluster to see which topics are prominent in each cluster
cluster_analysis = concatenated_df.groupby('cluster').mean()  
cluster_analysis.head()

In [None]:
# Get the top 4 topics per cluster by mean value, along with the cluster sizes

cluster_info_dict = {}

# Iterate over each cluster
for cluster in cluster_analysis.index:
    # Sort the topics within the cluster based on their mean values
    sorted_topics = cluster_analysis.loc[cluster].sort_values(ascending=False)
    
    # Extract the top 4 topics ('cluster' is the groupby index, so it is not among the columns)
    top_topics = sorted_topics.index[0:4].tolist()

    # Extract corresponding mean values for those topics
    top_values = sorted_topics.values[0:4].tolist()
    
    # Calculate the cluster size (number of data points in the cluster)
    cluster_size = len(concatenated_df[concatenated_df['cluster'] == cluster])
    
    # Store the top topics and cluster size in the dictionary
    cluster_info_dict[cluster] = {'top_topics': top_topics, 'cluster_size': cluster_size, 'top_values':top_values}

# Display the top topics and cluster sizes for each cluster
for cluster, info in cluster_info_dict.items():
    top_topics = ', '.join(info['top_topics'])
    top_values = info['top_values']
    cluster_size = info['cluster_size']
    print(f"Cluster {cluster}: Top 4 Topics - {top_topics}, Corresponding mean values- {top_values}, Cluster Size: {cluster_size}")

In [None]:
# Plotting top topics per cluster 


# Create a grid of subplots
num_clusters = len(cluster_info_dict)
rows = num_clusters // 2  
cols = 2  
fig, axes = plt.subplots(rows, cols, figsize=(15, 10))  

if rows == 1:
    axes = [axes]

# Iterate over each cluster and its corresponding subplot
for i, (cluster, info) in enumerate(cluster_info_dict.items()):
    row = i // cols
    col = i % cols
    
    top_topics = info['top_topics']
    top_values = info['top_values']
    cluster_size = info['cluster_size']
    
    # Plot the top topics for the current cluster on the corresponding subplot
    axes[row][col].barh(top_topics, top_values, color='skyblue')
    axes[row][col].set_xlabel('Mean Values')
    axes[row][col].set_ylabel('Topics')
    axes[row][col].set_title(f'Cluster {cluster}')
    
    # Display the cluster size as a text annotation
    axes[row][col].annotate(f'Cluster Size: {cluster_size}', xy=(0.5, 0.02),
                            xycoords='axes fraction', ha='center', fontsize=10)
    
# Adjust layout for better spacing between subplots
plt.tight_layout()
plt.show()

This cluster analysis revealed distinct topic preferences and strengths within different clusters.

Cluster 0: Centered on "college_uni" and "online_gaming," with a focus on "photo_sharing" and "sports_playing," reflecting an interest in education, gaming, and sports.

Cluster 1: Emphasizes "photo_sharing" and "politics," while also showing interest in "sports_fandom" and "travel," suggesting engagement in current affairs, sports, and travel.

Cluster 2: Primarily revolves around "health_nutrition" and "personal_fitness," accompanied by interest in "cooking" and "photo_sharing," signifying health-conscious behaviors and culinary interests.

Cluster 3: Showcases a blend of "cooking," "photo_sharing," "fashion," and "beauty," revealing an affinity for cooking, fashion trends, and beauty-related content.
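Since "photo_sharing" shows up among the top topics of every cluster above, raw means alone do not show what sets a cluster apart. One possible complement, sketched here on invented numbers, is to divide each cluster mean by the overall mean (a "lift"). The toy counts and the variable names `lift` and `most_distinctive` are assumptions for illustration and are not part of the analysis above.

```python
import pandas as pd

# Hypothetical per-user counts with a cluster label; all numbers are invented.
toy = pd.DataFrame({
    "photo_sharing":    [5, 6, 5, 6],
    "health_nutrition": [8, 9, 1, 0],
    "online_gaming":    [0, 1, 7, 8],
    "cluster":          [0, 0, 1, 1],
})

cluster_means = toy.groupby("cluster").mean()
overall_means = toy.drop(columns="cluster").mean()

# Lift > 1 means the cluster posts about the topic more than the average user does
lift = cluster_means / overall_means
most_distinctive = lift.idxmax(axis=1)

print(lift.round(2))
print(most_distinctive.to_dict())
```

In this toy example "photo_sharing" has a lift of 1.0 in both clusters: it is popular everywhere, so it does not distinguish either one.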

In [None]:
# Repeat K-means with 5 clusters, using k-means++ initialization
kmeans = KMeans(n_clusters=5, init='k-means++', random_state=42)

PCA_components = PCA_components_original.copy()
cluster_labels = kmeans.fit_predict(PCA_components)



PCA_components['cluster'] = cluster_labels

#Merge the original dataframe with cluster information to analyze the clusters
concatenated_df = pd.concat([df, PCA_components[['cluster']]], axis=1)

#We do not need our user column and our topic column that we previously created
concatenated_df.drop(columns=['user','topic'], axis=1, inplace=True)

cluster_analysis = concatenated_df.groupby('cluster').mean()  # Calculate mean values for each cluster

cluster_info_dict = {}

# Iterate over each cluster
for cluster in cluster_analysis.index:
    # Sort the topics within the cluster based on their mean values
    sorted_topics = cluster_analysis.loc[cluster].sort_values(ascending=False)
    
    # Extract the top 4 topics ('cluster' is the groupby index, so it is not among the columns)
    top_topics = sorted_topics.index[0:4].tolist()

    # Extract corresponding mean values for those topics
    top_values = sorted_topics.values[0:4].tolist()
    
    # Calculate the cluster size (number of data points in the cluster)
    cluster_size = len(concatenated_df[concatenated_df['cluster'] == cluster])
    
    # Store the top topics and cluster size in the dictionary
    cluster_info_dict[cluster] = {'top_topics': top_topics, 'cluster_size': cluster_size, 'top_values':top_values}

# Display the top topics and cluster sizes for each cluster
for cluster, info in cluster_info_dict.items():
    top_topics = ', '.join(info['top_topics'])
    top_values = info['top_values']
    cluster_size = info['cluster_size']
    print(f"Cluster {cluster}: Top 4 Topics - {top_topics}, Corresponding mean values- {top_values}, Cluster Size: {cluster_size}")

In [None]:
# Plotting top topics per cluster 


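# Create a grid of subplots
num_clusters = len(cluster_info_dict)
cols = 2
rows = (num_clusters + cols - 1) // cols  # ceiling division, so 5 clusters get 3 rows instead of 2
# squeeze=False always returns a 2-D array of axes, so axes[row][col] works for any grid size
fig, axes = plt.subplots(rows, cols, figsize=(15, 10), squeeze=False)

# Hide any unused subplot slots (e.g. the sixth panel when there are five clusters)
for ax in axes.ravel()[num_clusters:]:
    ax.set_visible(False)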

# Iterate over each cluster and its corresponding subplot
for i, (cluster, info) in enumerate(cluster_info_dict.items()):
    row = i // cols
    col = i % cols
    
    top_topics = info['top_topics']
    top_values = info['top_values']
    cluster_size = info['cluster_size']
    
    # Plot the top topics for the current cluster on the corresponding subplot
    axes[row][col].barh(top_topics, top_values, color='skyblue')
    axes[row][col].set_xlabel('Mean Values')
    axes[row][col].set_ylabel('Topics')
    axes[row][col].set_title(f'Cluster {cluster}')
    
    # Display the cluster size as a text annotation
    axes[row][col].annotate(f'Cluster Size: {cluster_size}', xy=(0.5, 0.02),
                            xycoords='axes fraction', ha='center', fontsize=10)
    
# Adjust layout for better spacing between subplots
plt.tight_layout()
plt.show()

This cluster analysis provides insights into diverse topic preferences and strengths within distinct clusters.

Cluster 0: Focused on "cooking," "photo_sharing," "fashion," and "beauty," showcasing a combination of interests in culinary arts and style.

Cluster 1: Highlights "health_nutrition" and "personal_fitness," coupled with "cooking" and "photo_sharing," indicating a health-conscious segment with culinary inclinations.

Cluster 2: Centers on "college_uni" and "online_gaming," accompanied by "photo_sharing" and "sports_playing," representing an audience engaged in education, gaming, and sports.

Cluster 3: Emphasizes "photo_sharing," "sports_fandom," "current_events," and "shopping," suggesting an interest in photography, sports, current affairs, and shopping.

Cluster 4: Showcases "politics," "travel," "news," and "photo_sharing," portraying an engaged group interested in politics, travel, and current events.

In [None]:
from sklearn.cluster import SpectralClustering
from sklearn.metrics import silhouette_score
import numpy as np
import matplotlib.pyplot as plt

PCA_components = PCA_components_original.copy()

silhouette_scores = []

for k in range(2, 11):  # Start from 2 clusters as silhouette score requires at least 2 clusters
    spectral_clustering = SpectralClustering(n_clusters=k, random_state=42, affinity='nearest_neighbors')
    labels = spectral_clustering.fit_predict(PCA_components)
    
    # Calculate silhouette score
    silhouette_scores.append(silhouette_score(PCA_components, labels))

# Inertia is not defined for spectral clustering, so only the silhouette scores are plotted
plt.figure(figsize=(6, 4))
plt.plot(range(2, 11), silhouette_scores, marker='o')
plt.xlabel('Number of Clusters')
plt.ylabel('Silhouette Score')
plt.title('Silhouette Score for Spectral Clustering')

plt.tight_layout()
plt.show()

From this silhouette plot, we will try 4 and 5 as candidate numbers of clusters.

In [None]:
from sklearn.cluster import SpectralClustering


# Specify the number of clusters for spectral clustering
n_clusters = 4

# Initialize SpectralClustering
spectral_clustering = SpectralClustering(n_clusters=n_clusters, random_state=42, affinity='nearest_neighbors')

# Perform spectral clustering and get cluster labels
cluster_labels = spectral_clustering.fit_predict(PCA_components_original)

# Copy PCA_components_original to avoid modifying the original data
PCA_components = PCA_components_original.copy()

# Add cluster labels to PCA_components DataFrame
PCA_components['cluster'] = cluster_labels

# Merge the original dataframe with cluster information to analyze the clusters
concatenated_df = pd.concat([df, PCA_components[['cluster']]], axis=1)

# Drop unnecessary columns
concatenated_df.drop(columns=['user', 'topic'], axis=1, inplace=True)

# Calculate mean values for each cluster
cluster_analysis = concatenated_df.groupby('cluster').mean()

# Create a dictionary to store cluster information
cluster_info_dict = {}

# Iterate over each cluster
for cluster in cluster_analysis.index:
    # Sort the topics within the cluster based on their mean values
    sorted_topics = cluster_analysis.loc[cluster].sort_values(ascending=False)
    
    # Extract the top 4 topics (excluding 'cluster' column)
    top_topics = sorted_topics.index[0:4].tolist()

    # Extract corresponding mean values for those topics
    top_values = sorted_topics.values[0:4].tolist()
    
    # Calculate the cluster size (number of data points in the cluster)
    cluster_size = len(concatenated_df[concatenated_df['cluster'] == cluster])
    
    # Store the top topics and cluster size in the dictionary
    cluster_info_dict[cluster] = {'top_topics': top_topics, 'cluster_size': cluster_size, 'top_values': top_values}

# Display the top topics and cluster sizes for each cluster
for cluster, info in cluster_info_dict.items():
    top_topics = ', '.join(info['top_topics'])
    top_values = info['top_values']
    cluster_size = info['cluster_size']
    print(f"Cluster {cluster}: Top 4 Topics - {top_topics}, Corresponding mean values - {top_values}, Cluster Size: {cluster_size}")

In [None]:
# Plotting top topics per cluster 


# Create a grid of subplots
num_clusters = len(cluster_info_dict)
rows = num_clusters // 2  
cols = 2  
fig, axes = plt.subplots(rows, cols, figsize=(15, 10))  

if rows == 1:
    axes = [axes]

# Iterate over each cluster and its corresponding subplot
for i, (cluster, info) in enumerate(cluster_info_dict.items()):
    row = i // cols
    col = i % cols
    
    top_topics = info['top_topics']
    top_values = info['top_values']
    cluster_size = info['cluster_size']
    
    # Plot the top topics for the current cluster on the corresponding subplot
    axes[row][col].barh(top_topics, top_values, color='skyblue')
    axes[row][col].set_xlabel('Mean Values')
    axes[row][col].set_ylabel('Topics')
    axes[row][col].set_title(f'Cluster {cluster}')
    
    # Display the cluster size as a text annotation
    axes[row][col].annotate(f'Cluster Size: {cluster_size}', xy=(0.5, 0.02),
                            xycoords='axes fraction', ha='center', fontsize=10)
    
# Adjust layout for better spacing between subplots
plt.tight_layout()
plt.show()

This cluster analysis uncovers significant topic preferences within distinct clusters.

Cluster 0: Focused on "college_uni," "online_gaming," "photo_sharing," and "sports_playing," indicating an audience engaged in education, gaming, and sports-related activities.

Cluster 1: Highlights "health_nutrition," "personal_fitness," "cooking," and "photo_sharing," suggesting a health-conscious segment with culinary interests.

Cluster 2: Centers around "photo_sharing," "sports_fandom," "cooking," and "current_events," depicting an audience enthusiastic about photography, sports, and current affairs.

Cluster 3: Emphasizes "politics," "travel," "news," and "computers," revealing a tech-savvy group interested in politics, travel, and technology.

In [None]:
from sklearn.cluster import SpectralClustering


# Specify the number of clusters for spectral clustering
n_clusters = 5

# Initialize SpectralClustering
spectral_clustering = SpectralClustering(n_clusters=n_clusters, random_state=42, affinity='nearest_neighbors')

# Perform spectral clustering and get cluster labels
cluster_labels = spectral_clustering.fit_predict(PCA_components_original)

# Copy PCA_components_original to avoid modifying the original data
PCA_components = PCA_components_original.copy()

# Add cluster labels to PCA_components DataFrame
PCA_components['cluster'] = cluster_labels

# Merge the original dataframe with cluster information to analyze the clusters
concatenated_df = pd.concat([df, PCA_components[['cluster']]], axis=1)

# Drop unnecessary columns
concatenated_df.drop(columns=['user', 'topic'], axis=1, inplace=True)

# Calculate mean values for each cluster
cluster_analysis = concatenated_df.groupby('cluster').mean()

# Create a dictionary to store cluster information
cluster_info_dict = {}

# Iterate over each cluster
for cluster in cluster_analysis.index:
    # Sort the topics within the cluster based on their mean values
    sorted_topics = cluster_analysis.loc[cluster].sort_values(ascending=False)
    
    # Extract the top 4 topics (excluding 'cluster' column)
    top_topics = sorted_topics.index[0:4].tolist()

    # Extract corresponding mean values for those topics
    top_values = sorted_topics.values[0:4].tolist()
    
    # Calculate the cluster size (number of data points in the cluster)
    cluster_size = len(concatenated_df[concatenated_df['cluster'] == cluster])
    
    # Store the top topics and cluster size in the dictionary
    cluster_info_dict[cluster] = {'top_topics': top_topics, 'cluster_size': cluster_size, 'top_values': top_values}

# Display the top topics and cluster sizes for each cluster
for cluster, info in cluster_info_dict.items():
    top_topics = ', '.join(info['top_topics'])
    top_values = info['top_values']
    cluster_size = info['cluster_size']
    print(f"Cluster {cluster}: Top 4 Topics - {top_topics}, Corresponding mean values - {top_values}, Cluster Size: {cluster_size}")

In [None]:
# Plotting top topics per cluster 

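# Create a grid of subplots
num_clusters = len(cluster_info_dict)
cols = 2
rows = (num_clusters + cols - 1) // cols  # ceiling division, so 5 clusters get 3 rows instead of 2
# squeeze=False always returns a 2-D array of axes, so axes[row][col] works for any grid size
fig, axes = plt.subplots(rows, cols, figsize=(15, 10), squeeze=False)

# Hide any unused subplot slots (e.g. the sixth panel when there are five clusters)
for ax in axes.ravel()[num_clusters:]:
    ax.set_visible(False)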

# Iterate over each cluster and its corresponding subplot
for i, (cluster, info) in enumerate(cluster_info_dict.items()):
    row = i // cols
    col = i % cols
    
    top_topics = info['top_topics']
    top_values = info['top_values']
    cluster_size = info['cluster_size']
    
    # Plot the top topics for the current cluster on the corresponding subplot
    axes[row][col].barh(top_topics, top_values, color='skyblue')
    axes[row][col].set_xlabel('Mean Values')
    axes[row][col].set_ylabel('Topics')
    axes[row][col].set_title(f'Cluster {cluster}')
    
    # Display the cluster size as a text annotation
    axes[row][col].annotate(f'Cluster Size: {cluster_size}', xy=(0.5, 0.02),
                            xycoords='axes fraction', ha='center', fontsize=10)
    
# Adjust layout for better spacing between subplots
plt.tight_layout()
plt.show()

Cluster 0: Centered on "health_nutrition," "personal_fitness," "cooking," and "photo_sharing," this group exhibits a health-conscious focus with culinary and wellness interests.

Cluster 1: Emphasizes "photo_sharing," "sports_fandom," "current_events," and "shopping," indicating an engagement with photography, sports, current affairs, and shopping.

Cluster 2: Highlights "college_uni," "online_gaming," "photo_sharing," and "sports_playing," suggesting an audience interested in education, online gaming, and sports-related content.

Cluster 3: Focused on "politics," "travel," "news," and "computers," this cluster indicates an engagement with political news, travel content, and technology.

Cluster 4: Revolves around "cooking," "photo_sharing," "fashion," and "beauty," portraying interests in culinary arts, photography, fashion, and beauty.
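Note that cluster numbers are arbitrary: the health-focused group is Cluster 1 in the 5-cluster K-means run but Cluster 0 in the 5-cluster spectral run. A minimal sketch of how two labelings of the same users could be compared is shown below, using a cross-tabulation and the adjusted Rand index from scikit-learn. The label vectors are invented for illustration and are not the notebook's results.

```python
import pandas as pd
from sklearn.metrics import adjusted_rand_score

# Hypothetical label vectors for the same ten users from two clustering runs.
kmeans_labels   = [0, 0, 1, 1, 2, 2, 2, 3, 3, 3]
spectral_labels = [2, 2, 0, 0, 1, 1, 1, 3, 3, 0]

# Rows: K-means cluster, columns: spectral cluster, entries: number of shared users
table = pd.crosstab(
    pd.Series(kmeans_labels, name="kmeans"),
    pd.Series(spectral_labels, name="spectral"),
)

# The adjusted Rand index ignores how clusters are numbered; 1.0 means identical partitions
ari = adjusted_rand_score(kmeans_labels, spectral_labels)

# A pure renumbering of the K-means labels scores a perfect 1.0
renumbered = [{0: 2, 1: 0, 2: 1, 3: 3}[label] for label in kmeans_labels]
ari_renumbered = adjusted_rand_score(kmeans_labels, renumbered)

print(table)
print("Adjusted Rand index:", round(ari, 3))
print("Adjusted Rand index after renumbering only:", ari_renumbered)
```

Applied to the real label arrays, a table like this would show which K-means cluster corresponds to which spectral cluster and where the two methods disagree.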

**Conclusion**

In this analysis, we segmented NutrientH20's followers based on their posting activity across topics. The dataset was first reduced in dimension using Principal Component Analysis (PCA). K-means and spectral clustering were then applied with 4 and 5 clusters to identify distinct user segments. The resulting clusters reveal meaningful insights that can guide business strategies for NutrientH20.

Across all our analyses, the clusters show distinct interests, which enables targeted business strategies:

Fitness Focused: Centers around health, fitness, cooking, and photo sharing. This presents opportunities for health and wellness brands, recipe platforms, and fitness apps to engage this health-conscious group.

Social Media Focused: Shows affinity for social sharing, politics, and travel. Tailored content creation and partnerships with travel and news platforms can effectively resonate with this socially and politically engaged segment.

Educational and Sports focused: Displays interests in education, gaming, and sports. Businesses in the education sector and sports industry can capitalize on this cluster by offering relevant content, courses, and gaming experiences.

Lifestyle Focused: Prioritizes lifestyle topics like fashion, beauty, and shopping. E-commerce platforms, beauty brands, and fashion retailers can optimize marketing to captivate this style-conscious audience.


Business Implications:
The clustering analysis provides insights for tailored marketing strategies, content creation, and product offerings for NutrientH20, which can personalize user experiences by targeting each cluster with relevant content and promotions. For instance, health and wellness brands can collaborate with influencers from the 'Fitness Focused' cluster to amplify their message. E-commerce platforms can curate product recommendations, and news agencies can tailor news updates for the 'Social Media Focused' cluster, while educational institutions can engage the 'Educational and Sports focused' cluster with suitable offerings. Fashion brands can collaborate with 'Lifestyle Focused' enthusiasts. By understanding and leveraging these user clusters, NutrientH20 can enhance customer engagement and drive growth by delivering what its users are interested in.