<h1> Unsupervised Analysis Project </h1>

The following analysis uses data on users' app-downloading behavior to generate actionable recommendations for the marketing department: a strategy to increase and retain the total number of active users, together with insights for targeting consumers more efficiently. <br>

Our main insights are explained throughout the analysis; a summary by cluster is provided below:<br>

- The first cluster should be targeted with advertising for the company's app on social media, entertainment and gaming apps, as well as on the Facebook website.
- For the second cluster, the marketing department should focus on advertising on Facebook and YouTube, since most of these users are millennials.
- The third cluster should be targeted via the YouTube website as well as on gaming and social media apps.
- For the fourth cluster, the marketing department should advertise heavily on free music and gaming apps.

In [None]:
##########################################
# importing packages
##########################################
import numpy             as np                   # mathematical essentials
import pandas            as pd                   # data science essentials
import matplotlib.pyplot as plt                  # fundamental data visualization
import seaborn           as sns                  # enhanced visualization

# packages for unsupervised learning
from sklearn.preprocessing   import StandardScaler      # standard scaler
from sklearn.decomposition   import PCA                 # pca
from scipy.cluster.hierarchy import dendrogram, linkage # dendrograms
from sklearn.cluster         import KMeans              # k-means clustering

##########################################
# loading data and setting display options
##########################################
# loading data
app_df = pd.read_excel('./Mobile_App_Survey_Data.xlsx')

# setting print options
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)
pd.set_option('display.max_colwidth', 100)

#app_df.head(n = 5)

# analyzing the dataframe
#app_df.describe().round(decimals = 2)

In [None]:
##########################################
# loading user defined functions
##########################################

########################
# scree_plot           #
########################
def scree_plot(pca_object, export = False):
    """
    Visualizes a scree plot from a pca object.
    
    PARAMETERS
    ----------
    pca_object | A fitted pca object
    export     | Set to True if you would like to save the scree plot to the
               | current working directory (default: False)
    """
    # building a scree plot

    # setting plot size
    fig, ax = plt.subplots(figsize=(10, 8))
    features = range(pca_object.n_components_)


    # developing a scree plot
    plt.plot(features,
             pca_object.explained_variance_ratio_,
             linewidth = 2,
             marker = 'o',
             markersize = 10,
             markeredgecolor = 'black',
             markerfacecolor = 'grey')


    # setting more plot options
    plt.title('Scree Plot')
    plt.xlabel('PCA feature')
    plt.ylabel('Explained Variance')
    plt.xticks(features)

    if export == True:
    
        # exporting the plot
        plt.savefig('./__analysis_images/top_customers_correlation_scree_plot.png')
        
    # displaying the plot
    plt.show()


########################
# unsupervised_scaler  #
########################
def unsupervised_scaler(df):
    """
    Standardizes a dataset (mean = 0, variance = 1). Returns a new DataFrame.
    Requires sklearn.preprocessing.StandardScaler()
    
    PARAMETERS
    ----------
    df     | DataFrame to be used for scaling
    """

    # INSTANTIATING a StandardScaler() object
    scaler = StandardScaler()


    # FITTING the scaler with the data
    scaler.fit(df)


    # TRANSFORMING our data after fit
    x_scaled = scaler.transform(df)

    
    # converting scaled data into a DataFrame
    new_df = pd.DataFrame(x_scaled)


    # reattaching column names
    new_df.columns = df.columns
    
    return new_df

<h1>Creating Data Frames</h1>

The principal component analysis should include only psychometric features, so a separate DataFrame is created for each of questions 24, 25 and 26. In addition, two more DataFrames group the demographic and usage-behavior features, which are used later to compare the clusters.

In [None]:
# creating dataframes for each psychometric features
app_behavior_24 = pd.DataFrame(app_df.loc[ : , 'q24r1':'q24r12' ])
app_behavior_25 = pd.DataFrame(app_df.loc[ : , 'q25r1':'q25r12' ])
app_behavior_26 = pd.DataFrame(app_df.loc[ : , 'q26r18':'q26r17'])


# creating dataframes for demographic features
app_demo = pd.concat([app_df.loc[ : , 'q1'], app_df.loc[ : , 'q48':'q57']], 
                     axis = 1)


# creating dataframes for behavior features
app_usage = pd.DataFrame(app_df.loc[ : ,'q2r1':'q13r12'])


<h1>Analyzing psychometric data from question 24</h1>

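We scale each psychometric DataFrame twice. The first pass, applied to the transposed data, standardizes each respondent's answers across the items, so that people whose opinions are much stronger or more extreme than average are treated on an equal statistical footing with everyone else. The second pass, after transposing back, standardizes each item across respondents before running PCA.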
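A minimal NumPy sketch of the same two-pass standardization, on made-up Likert answers rather than the survey data. It assumes the behavior of `StandardScaler` inside `unsupervised_scaler` (population standard deviation, constant columns left unscaled) and shows that two respondents with the same answer pattern, one shifted up by three points, end up with identical rows. The helper name `double_standardize` is illustrative, not part of the notebook.

```python
import numpy as np


def double_standardize(responses: np.ndarray) -> np.ndarray:
    """Z-score each row (respondent) across items, then each column (item)
    across respondents, mirroring transpose -> scale -> transpose -> scale."""

    def zscore_columns(data: np.ndarray) -> np.ndarray:
        mean = data.mean(axis=0)
        std = data.std(axis=0)      # population std (ddof=0), as StandardScaler uses
        std[std == 0] = 1.0         # leave constant columns unscaled, as StandardScaler does
        return (data - mean) / std

    # pass 1: transpose so respondents become columns, standardize, transpose back
    by_respondent = zscore_columns(responses.T).T
    # pass 2: standardize each survey item across respondents
    return zscore_columns(by_respondent)


# toy Likert answers: rows = respondents, columns = survey items
answers = np.array([[1, 2, 3, 2],
                    [4, 5, 6, 5],   # same pattern as row 0, shifted up by 3 points
                    [3, 1, 2, 6],
                    [5, 5, 1, 2]], dtype=float)

scaled = double_standardize(answers)
print(np.allclose(scaled[0], scaled[1]))  # True: the constant shift is removed
```

The first pass is what removes individual response style (always answering high or low); the second pass puts every item on the same scale so no single question dominates the principal components.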

In [None]:
# first transposing for question 24 DataFrame
app_behavior_24_transposed = app_behavior_24.transpose()

# first applying the unsupervised_scaler function to question 24 DataFrame
app_scaled_24_tran = unsupervised_scaler(df = app_behavior_24_transposed)


# second transposing for question 24 DataFrame
app_24_tech = app_scaled_24_tran.transpose()

# naming columns as original features
app_24_tech.columns = app_behavior_24.columns

# second applying the unsupervised_scaler function to question 24 DataFrame
app_24_tech_scaled = unsupervised_scaler(df = app_24_tech)

<h2> Principal Component Analysis </h2>

In [None]:
# Instantiating a PCA object with no limit to principal components
pca_24 = PCA(n_components = None,
             random_state = 219)

# Fitting and transforming the scaled data
app_pca_24 = pca_24.fit_transform(app_24_tech_scaled)

# comparing dimensions of each DataFrame
#print("Original shape:", app_24_tech_scaled.shape)
#print("PCA shape     :", app_pca_24.shape)

In [None]:
# component number counter
component_number    = 0
cumulative_variance = 0

# looping over each principal component
for variance in pca_24.explained_variance_ratio_:
    component_number    += 1
    cumulative_variance += variance

#    print(f"""
#PC:                  {component_number}
#Percentage Variance: {variance.round(3)}
#Cumulative Variance: {cumulative_variance.round(3)}""") 

<h2> Optimal number of principal components </h2>

We chose the optimal number of principal components visually, by comparing the scree plot of the full PCA model with that of the reduced model.
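As a cross-check on the visual choice (not the method used in this analysis), a common rule of thumb keeps the smallest number of components whose cumulative explained variance reaches a chosen threshold. The sketch assumes an 80% threshold and uses made-up ratios; in the notebook one would pass `pca_24.explained_variance_ratio_` instead. The function name is hypothetical.

```python
import numpy as np


def components_for_variance(ratios, threshold: float = 0.8) -> int:
    """Smallest number of components whose cumulative explained variance
    reaches `threshold` (e.g. pass pca_24.explained_variance_ratio_)."""
    cumulative = np.cumsum(ratios)
    # index of the first cumulative value >= threshold, converted to a count
    k = int(np.searchsorted(cumulative, threshold)) + 1
    # a threshold above the total variance falls back to keeping every component
    return min(k, len(ratios))


# illustrative ratios only, not the survey's actual values
example_ratios = np.array([0.45, 0.25, 0.15, 0.10, 0.05])
print(components_for_variance(example_ratios, threshold=0.8))  # 3
```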

In [None]:
#calling scree_plot function
scree_plot(pca_object = pca_24, 
           export     = False)

In [None]:
# Instantiating new model using optimal number of principal components
pca_24_2 = PCA(n_components = 2,
               random_state = 219)

# Fitting and transforming question 24 scaled data
app_pca_24_2 = pca_24_2.fit_transform(app_24_tech_scaled)

# calling scree_plot function
scree_plot(pca_object = pca_24_2,
           export     = False)

In [None]:
# component number counter
component_number    = 0
cumulative_variance = 0

# looping over each principal component
for variance in pca_24_2.explained_variance_ratio_:
    component_number    += 1
    cumulative_variance += variance
    
    #checking variances
    print(f"""
PC:                  {component_number}
Percentage Variance: {variance.round(3)}
Cumulative Variance: {cumulative_variance.round(3)}""") 

<h2> Interpretation and Persona Development </h2>

The survey's psychometric questions let us group respondents into personas, which help decide which customers to target according to their behavior. Across questions 24, 25 and 26 we defined six personas: tech savvy, not tech savvy, leaders, optimists, people who avoid luxury brands, and freemium users. <br>

- The tech savvy persona already enjoys technology and keeping up with new apps, so the marketing department should focus its strategies on this persona and take advantage of its technology usage behavior.<br>

- The not tech savvy persona is characterized by skepticism toward the use of technology and a preference for not using social media platforms.

In [None]:
# transposing optimal number pca components
factor_loadings_24_2 = pd.DataFrame(np.transpose(pca_24_2.components_))

# naming rows as original features
factor_loadings_24_2 = factor_loadings_24_2.set_index(app_24_tech_scaled.columns)

In [None]:
# naming principal components
factor_loadings_24_2.columns = ['Technology savvy', # embraces technology
                                'Not technology savvy'] # avoid  technology

# checking the result
factor_loadings_24_2.round(decimals = 2)

# saving to Excel for analysis
#factor_loadings_24_2.to_excel('factor_loadings_24.xlsx')
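The persona names come from reading which survey items load most heavily on each component. A small pandas helper, hypothetical and run on made-up loadings, that lists the top items per component by absolute loading:

```python
import pandas as pd


def top_loadings(loadings: pd.DataFrame, n: int = 3) -> dict:
    """For each principal component (column), list the `n` survey items with
    the largest absolute loading, i.e. the items that define the component."""
    return {component: loadings[component].abs().nlargest(n).index.tolist()
            for component in loadings.columns}


# hypothetical loadings; real ones come from factor_loadings_24_2
toy = pd.DataFrame({'PC 1': [0.60, -0.10, 0.55, 0.05],
                    'PC 2': [0.02, -0.70, 0.10, 0.65]},
                   index=['item_a', 'item_b', 'item_c', 'item_d'])

print(top_loadings(toy, n=2))
# {'PC 1': ['item_a', 'item_c'], 'PC 2': ['item_b', 'item_d']}
```

Ranking by absolute value keeps strongly negative items (such as `item_b` on PC 2) in view; their sign indicates that agreeing with the item pushes a respondent's score in the opposite direction.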

<h1>Analyzing psychometric data from question 25</h1>

In [None]:
# first transposing for question 25 DataFrame
app_behavior_25_transposed = app_behavior_25.transpose()

# first applying the unsupervised_scaler function to question 25 DataFrame
app_scaled_25_tran = unsupervised_scaler(df = app_behavior_25_transposed)


# second transposing for question 25 DataFrame
app_25_tech = app_scaled_25_tran.transpose()

# naming columns as original features
app_25_tech.columns = app_behavior_25.columns

# second applying the unsupervised_scaler function to question 25 DataFrame
app_25_tech_scaled = unsupervised_scaler(df = app_25_tech)

<h2> Principal Component Analysis </h2>

In [None]:
# Instantiating a PCA object with no limit to principal components
pca_25 = PCA(n_components = None,
            random_state = 219)

# Fitting and transforming the scaled data
app_pca_25 = pca_25.fit_transform(app_25_tech_scaled)

# comparing dimensions of each DataFrame
#print("Original shape:", app_25_tech_scaled.shape)
#print("PCA shape     :", app_pca_25.shape)

In [None]:
# component number counter
component_number    = 0
cumulative_variance = 0

# looping over each principal component
for variance in pca_25.explained_variance_ratio_:
    component_number    += 1
    cumulative_variance += variance
    
#    print(f"""
#PC:                  {component_number}
#Percentage Variance: {variance.round(3)}
#Cumulative Variance: {cumulative_variance.round(3)}""") 

<h2> Optimal number of principal components </h2>

In [None]:
#calling scree_plot function
scree_plot(pca_object = pca_25,
           export     = False)

In [None]:
# Instantiating new model using optimal number of principal components
pca_25_2 = PCA(n_components = 2,
               random_state = 219)

# Fitting and transforming question 25 scaled data
app_pca_25_2 = pca_25_2.fit_transform(app_25_tech_scaled)


# calling scree_plot function
scree_plot(pca_object = pca_25_2,
           export     = False)

In [None]:
# component number counter
component_number    = 0
cumulative_variance = 0

# looping over each principal component
for variance in pca_25_2.explained_variance_ratio_:
    component_number    += 1
    cumulative_variance += variance
    
    #checking variances
    print(f"""
PC:                  {component_number}
Percentage Variance: {variance.round(3)}
Cumulative Variance: {cumulative_variance.round(3)}""") 

<h2> Interpretation and Persona Development </h2>

- People in the leader persona tend to be more competitive than average, and their behavior is not influenced by other people's opinions. When making purchase decisions, they are more inclined to follow their own instinct and knowledge, which may make them a harder type of customer to approach.<br>

- The optimistic persona, on the other hand, may be more willing to accept deals and buy new products given its more accommodating nature. These people also tend to be more brand loyal, because they would rather rely on someone else's greater competence in a field (such as an established company's way of doing business) than do research themselves; for these reasons they may be an easier target for new products entering the market.<br>

In [None]:
# transposing optimal number pca components
factor_loadings_25_2 = pd.DataFrame(np.transpose(pca_25_2.components_))

# naming rows as original features
factor_loadings_25_2 = factor_loadings_25_2.set_index(app_25_tech_scaled.columns)

In [None]:
# naming principal components
factor_loadings_25_2.columns = ['Leader',
                                'Optimistic']

# checking the result
factor_loadings_25_2.round(decimals = 2)

# saving to Excel
#factor_loadings_25_2.to_excel('factor_loadings_25.xlsx')

<h1>Analyzing psychometric data from question 26</h1>

In [None]:
# first transposing for question 26 DataFrame
app_behavior_26_transposed = app_behavior_26.transpose()

# first applying the unsupervised_scaler function to question 26 DataFrame
app_scaled_26_tran = unsupervised_scaler(df = app_behavior_26_transposed)


# second transposing for question 26 DataFrame
app_26_tech = app_scaled_26_tran.transpose()

# naming columns as original features
app_26_tech.columns = app_behavior_26.columns

# second applying the unsupervised_scaler function to question 26 DataFrame
app_26_tech_scaled = unsupervised_scaler(df = app_26_tech)

## Principal Component Analysis and Scaling 

In [None]:
# Instantiating a PCA object with no limit to principal components
pca_26 = PCA(n_components = None,
            random_state = 219)

# Fitting and transforming the scaled data
app_pca_26 = pca_26.fit_transform(app_26_tech_scaled)

# comparing dimensions of each DataFrame
print("Original shape:", app_26_tech_scaled.shape)
print("PCA shape     :", app_pca_26.shape)

In [None]:
# component number counter
component_number = 0

# looping over each principal component
for variance in pca_26.explained_variance_ratio_:
    component_number += 1
    
    print(f"PC {component_number}: {variance.round(3)}")

In [None]:
# component number counter
component_number    = 0
cumulative_variance = 0

# looping over each principal component
for variance in pca_26.explained_variance_ratio_:
    component_number    += 1
    cumulative_variance += variance
#    print(f"""
#PC:                  {component_number}
#Percentage Variance: {variance.round(3)}
#Cumulative Variance: {cumulative_variance.round(3)}""") 

### Evaluation of PCA

In [None]:
scree_plot(pca_object = pca_26,
           export     = False)

In [None]:
# transposing pca components
factor_loadings_df_26 = pd.DataFrame(np.transpose(pca_26.components_.round(decimals = 2)))

# naming rows as original features
factor_loadings_df_26 = factor_loadings_df_26.set_index(app_26_tech_scaled.columns)

# saving to Excel
#factor_loadings_df_26.to_excel('app_factor_loadings_26.xlsx')

#factor_loadings_df_26

In [None]:
# Instantiating a new model using the first two principal components
pca_26_2 = PCA(n_components = 2,
            random_state = 219)


# Fitting and transforming the app scaled
app_pca_26_2 = pca_26_2.fit_transform(app_26_tech_scaled)


# calling the scree_plot function
scree_plot(pca_object = pca_26_2,
           export     = False)

In [None]:
# setting plot size
#fig, ax = plt.subplots(figsize = (12, 3))


# developing a PC to feature heatmap
#sns.heatmap(pca_26_2.components_, 
#            cmap = 'coolwarm',
#            square = True,
#            annot = True,
#            linewidths = 0.1,
#            linecolor = 'black')


# setting more plot options
#plt.yticks([0, 1],
#           ["PC 1", "PC 2"])

#plt.xticks(range(0, 16),
#           app_26_tech_scaled.columns,
#           rotation=60,
#           ha='left')

#plt.xlabel(xlabel = "Feature")
#plt.ylabel(ylabel = "Principal Component")


# displaying the plot
#plt.show()

In [None]:
# component number counter
component_number    = 0
cumulative_variance = 0

# looping over each principal component
for variance in pca_26_2.explained_variance_ratio_:
    component_number    += 1
    cumulative_variance += variance
    print(f"""
PC:                  {component_number}
Percentage Variance: {variance.round(3)}
Cumulative Variance: {cumulative_variance.round(3)}""") 

<h2> Interpretation and Persona Development </h2>

- For the third psychometric question, we labeled the first persona "Avoids luxury brands" because it describes a person who prefers more affordable brands and abstains from luxury ones. This target group could be more difficult to reach, since they usually do not spend much money.

- We labeled the second persona "Freemium user" because these people lean toward discounts and prefer not to spend much money on apps. They do not download many apps, and they believe that paying for better app features is not worth it.

In [None]:
# transposing pca components (pc = 2)
factor_loadings_26_2 = pd.DataFrame(np.transpose(pca_26_2.components_))


# naming rows as original features
factor_loadings_26_2 = factor_loadings_26_2.set_index(app_26_tech_scaled.columns)


In [None]:
# naming each principal component
factor_loadings_26_2.columns = ['Avoids luxury brands',
                                'Freemium user']

factor_loadings_26_2.round(decimals = 2)

In [None]:
# saving to Excel
#factor_loadings_26_2.to_excel('factor_loadings_26.xlsx')

Next, we compute each customer's principal component scores (factor strengths) on every persona, to develop a strategy.
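For reference, the scores returned by `PCA.transform` (with the default `whiten=False`) are the centred data projected onto the rows of `components_`. A NumPy-only sketch of that computation on random data, assuming SVD-based PCA; the helper name is hypothetical, and the signs of individual components may differ from scikit-learn's, which applies its own sign convention.

```python
import numpy as np


def pca_scores(X: np.ndarray, n_components: int):
    """Project centred data onto its first principal axes via SVD.
    Returns (scores, components); rows of `components` play the role of
    PCA.components_ and `scores` that of PCA.transform(X) (up to sign)."""
    centred = X - X.mean(axis=0)
    # rows of vt are the principal axes, ordered by explained variance
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    components = vt[:n_components]
    scores = centred @ components.T   # one row per respondent, one column per persona
    return scores, components


rng = np.random.default_rng(219)
X = rng.normal(size=(50, 6))
scores, components = pca_scores(X, n_components=2)
print(scores.shape)  # (50, 2)
```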

In [None]:
# analyzing factor strengths per customer
factor_load24 = pca_24_2.transform(app_24_tech_scaled)

# converting to a DataFrame
factor_load_df_24 = pd.DataFrame(factor_load24)

# renaming columns
factor_load_df_24.columns = factor_loadings_24_2.columns

factor_load_df_24.head(n=10)

In [None]:
# analyzing factor strengths per customer
factor_load25 = pca_25_2.transform(app_25_tech_scaled)

# converting to a DataFrame
factor_load_df_25 = pd.DataFrame(factor_load25)

# renaming columns
factor_load_df_25.columns = factor_loadings_25_2.columns

factor_load_df_25.head(n=15)

In [None]:
# analyzing factor strengths per customer
factor_load26 = pca_26_2.transform(app_26_tech_scaled)

# converting to a DataFrame
factor_load_df_26 = pd.DataFrame(factor_load26)

# renaming columns
factor_load_df_26.columns = factor_loadings_26_2.columns

factor_load_df_26.head(n=15)

In [None]:
# exploring customers in the Technology savvy persona
len(factor_load_df_24['Technology savvy'][factor_load_df_24['Technology savvy'] > 1.0])

In [None]:
# exploring customers in the Not technology savvy persona
len(factor_load_df_24['Not technology savvy'][factor_load_df_24['Not technology savvy'] > 1.0])

In [None]:
# exploring customers in the Leader persona
len(factor_load_df_25['Leader'][factor_load_df_25['Leader'] > 1.0])

In [None]:
# exploring customers in the Optimistic persona
len(factor_load_df_25['Optimistic'][factor_load_df_25['Optimistic'] > 1.0])

In [None]:
# exploring customers in the Avoids luxury brands persona
len(factor_load_df_26['Avoids luxury brands'][factor_load_df_26['Avoids luxury brands'] > 1.0])

In [None]:
# exploring customers in the Freemium user persona
len(factor_load_df_26['Freemium user'][factor_load_df_26['Freemium user'] > 1.0])
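The six cells above can be collapsed into one call per DataFrame. A sketch, with a hypothetical helper name and toy scores, that counts respondents above the same 1.0 threshold for every persona at once:

```python
import pandas as pd


def persona_counts(scores: pd.DataFrame, threshold: float = 1.0) -> pd.Series:
    """Number of respondents whose score on each persona exceeds `threshold`."""
    # comparing the whole frame gives booleans; summing counts the True values per column
    return (scores > threshold).sum()


# hypothetical scores; in the notebook one would pass e.g. factor_load_df_24
toy_scores = pd.DataFrame({'Technology savvy':     [1.5, 0.2, -0.3, 2.1],
                           'Not technology savvy': [-0.4, 1.2, 0.9, 0.1]})
print(persona_counts(toy_scores))
```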

## Clustering
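We developed a clustering model to group respondents with similar traits. A dendrogram built with Ward's hierarchical clustering gives a visual representation of the candidate clusters, and k-means is then fitted with the chosen number of clusters.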

In [None]:
# concatenating the persona scores from all three psychometric questions
all_factors = pd.concat([factor_load_df_24,factor_load_df_25,factor_load_df_26],axis = 1)

# applying the unsupervised_scaler function
pca_scaled = unsupervised_scaler(all_factors)

# grouping data based on Ward distance
standard_mergings_ward = linkage(y = pca_scaled,
                                 method = 'ward',
                                 optimal_ordering = True)


fig, ax = plt.subplots(figsize=(12, 12))

# developing a dendrogram
dendrogram(Z = standard_mergings_ward,
           leaf_rotation = 90,
           leaf_font_size = 6)
plt.show()
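To turn the dendrogram into concrete labels, SciPy's `fcluster` can cut the Ward tree into a fixed number of flat clusters, which could then be compared with the k-means assignment. Sketch on synthetic, well-separated points standing in for `pca_scaled`; cutting at four clusters mirrors the k-means model and is an assumption here.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# hypothetical data: four well-separated groups standing in for pca_scaled
rng = np.random.default_rng(219)
centres = np.array([[0, 0], [10, 0], [0, 10], [10, 10]], dtype=float)
points = np.vstack([c + rng.normal(scale=0.5, size=(25, 2)) for c in centres])

# same Ward linkage as above, then cut the tree into at most 4 flat clusters
ward_tree = linkage(points, method='ward')
ward_labels = fcluster(ward_tree, t=4, criterion='maxclust')  # labels start at 1

print(np.bincount(ward_labels)[1:])  # members per cluster
```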

In [None]:
# testing the candidate number of clusters
# instantiating a k-Means object with four clusters
survey_k_pca = KMeans(n_clusters   = 4,
                      n_init       = 10,   # explicit, since the default changed to 'auto' in scikit-learn 1.4
                      random_state = 219)


# fitting the object to the data
survey_k_pca.fit(pca_scaled)


# converting the clusters to a DataFrame
survey_kmeans_pca = pd.DataFrame({'Cluster': survey_k_pca.labels_})

print(survey_kmeans_pca.iloc[: , 0].value_counts())
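The number of clusters here was read off the dendrogram. As an optional check (not part of the original analysis), the mean silhouette score can be compared across candidate values of k, where higher is better. Sketch on synthetic data with four known groups:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# hypothetical stand-in for pca_scaled: four well-separated groups
rng = np.random.default_rng(219)
centres = np.array([[0, 0], [10, 0], [0, 10], [10, 10]], dtype=float)
points = np.vstack([c + rng.normal(scale=0.5, size=(25, 2)) for c in centres])

silhouettes = {}
for k in range(2, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=219).fit(points)
    # mean silhouette: closer to 1 means tighter, better-separated clusters
    silhouettes[k] = silhouette_score(points, model.labels_)

best_k = max(silhouettes, key=silhouettes.get)
print(best_k)
```

On real survey data the silhouette curve is usually much flatter than on toy blobs, so it works best as one input alongside the dendrogram and the interpretability of the resulting personas.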

By displaying the centroid (mean) of each cluster, we can develop a story for each group.
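The k-means centroids are expressed in the re-standardized space (`pca_scaled`). One way to read the cluster stories in the original persona-score units is to average the scores within each cluster label, for example on `clst_pca_df`, which is built a few cells below. Sketch on made-up values:

```python
import pandas as pd

# hypothetical persona scores with cluster labels (in the notebook: clst_pca_df)
scores = pd.DataFrame({'Cluster':          [0, 0, 1, 1, 1],
                       'Technology savvy': [1.0, 2.0, -1.0, 0.0, 1.0],
                       'Leader':           [0.5, 1.5, 2.0, 2.0, 2.0]})

# mean of every persona score within each cluster: a readable "centroid"
profile = scores.groupby('Cluster').mean()
print(profile.round(2))
```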

In [None]:
# storing cluster centers
centroids_pca = survey_k_pca.cluster_centers_


# converting cluster centers into a DataFrame
centroids_pca_df = pd.DataFrame(centroids_pca)

# renaming principal components
centroids_pca_df.columns = ['Technology savvy',
                           'Not technology savvy',
                           'Leader',
                           'Optimistic',
                           'Avoids luxury brands',
                           'Freemium user']

# checking results (clusters = rows, pc = columns)
centroids_pca_df.round(2)

In [None]:
# concatenating cluster memberships with principal components
clst_pca_df = pd.concat([survey_kmeans_pca,
                         all_factors],
                         axis = 1)

# concatenating demographic and behavioral information with pca-clusters
final_pca_clust_df = pd.concat([app_demo, app_usage,
                                clst_pca_df.round(decimals = 2)],
                                axis = 1)
# renaming columns
final_pca_clust_df.columns = ['Age', 'Education', 'Marital status', 'No children',
                              'Children under 6 yrs', 'Children 6-12 yrs', 'Children 13-17 yrs',
                              'Children >18', 'Race', 'Hispanic or Latino', 'Income before taxes',
                              'Gender', 'iPhone', 'iPod touch', 'Android', 'BlackBerry', 'Nokia',
                              'Windows', 'HP', 'Tablet', 'Other', 'None', 'Music App', 'TV Check in App',
                              'Entertainment App', 'TV Show App', 'Gaming App', 'Social Networking App',
                              'News App', 'Shopping App', 'Specific News App', 'Other Apps', 'No apps',
                              'Number of apps', 'Free apps', 'Facebook', 'Twitter', 'Myspace', 'Pandora radio',
                              'Vevo', 'YouTube', 'AOL Radio', 'Last.fm', 'Yahoo Entertainment and Music',
                              'IMDb', 'LinkedIn', 'Netflix', 'Cluster', 'Technology savvy', 'Not technology savvy',
                              'Leader', 'Optimistic', 'Avoids luxury brands', 'Freemium user']

final_pca_clust_df.head(n = 5)

In [None]:
# replacing numeric survey codes with readable labels

age = {1: 'Under 18',
       2: '18-24',
       3: '25-29',
       4: '30-34',
       5: '35-39',
       6: '40-44',
       7: '45-49',
       8: '50-54',
       9: '55-59',
       10: '60-64',
       11: '>65'}
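# assigning back instead of inplace=True on a column, so the change also sticks under pandas Copy-on-Write
final_pca_clust_df['Age'] = final_pca_clust_df['Age'].replace(age)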

education = {1: 'Some high school',
             2: 'High school graduate',
             3: 'Some college',
             4: 'College graduate',
             5: 'Some post-graduate studies',
             6: 'Post graduate degree'}
final_pca_clust_df['Education'] = final_pca_clust_df['Education'].replace(education)

marital_status = {1: 'Married',
                  2: 'Single',
                  3: 'Single with a partner',
                  4: 'Separated/Widowed/ Divorced'}
final_pca_clust_df['Marital status'] = final_pca_clust_df['Marital status'].replace(marital_status)

race = {1: 'White/Caucasian',
        2: 'Black/African American',
        3: 'Asian',
        4: 'Native Hawaiian/Pacific Islander',
        5: 'American Indian/Alaska Native',
        6: 'Other race'}
final_pca_clust_df['Race'] = final_pca_clust_df['Race'].replace(race)

ethnicity = {1: 'Yes',
             2: 'No'}
final_pca_clust_df['Hispanic or Latino'] = final_pca_clust_df['Hispanic or Latino'].replace(ethnicity)

income = {1: 'Under 10k',
          2: '10k-15k[' ,
          3: '15k-20k[',
          4: '20k-30k[',
          5: '30k-40k[',
          6: '40k-50k[',
          7: '50k-60k[',
          8: '60k-70k[',
          9: '70k-80k[',
          10:'80k-90k[',
          11: '90k-100k[',
          12:'100k-125k[',
          13: '125k-150k[',
          14: '>150k'}
final_pca_clust_df['Income before taxes'] = final_pca_clust_df['Income before taxes'].replace(income)

cluster_names = {0 : 'Cluster 1',
                 1 : 'Cluster 2',
                 2 : 'Cluster 3',
                 3 : 'Cluster 4'}
final_pca_clust_df['Cluster'] = final_pca_clust_df['Cluster'].replace(cluster_names)

data_df = final_pca_clust_df


# checking results
#data_df.head(n = 5)

# saving to Excel
#data_df.to_excel('data_df.xlsx')

In [None]:
#data_df[data_df.loc[ : , 'Cluster'] == 'Cluster 1']

In [None]:
#fig, ax = plt.subplots(figsize = (12, 8))
#sns.boxplot(x = 'Age',
#            y = 'Technology savvy',
#            hue = 'Cluster',
#            data = data_df)
#plt.tight_layout()
#plt.show()

In [None]:
#fig, ax = plt.subplots(figsize = (12, 8))
#sns.boxplot(x = 'Income before taxes',
#            y = 'Technology savvy',
#            hue = 'Cluster',
#            data = data_df)
#plt.tight_layout()
#plt.show()

In [None]:
#fig, ax = plt.subplots(figsize = (12, 8))
#sns.boxplot(x = 'Age',
#            y = 'Not technology savvy',
#            hue = 'Cluster',
#            data = data_df)
#plt.tight_layout()
#plt.show()

In [None]:
#fig, ax = plt.subplots(figsize = (12, 8))
#sns.boxplot(x = 'Income before taxes',
#            y = 'Not technology savvy',
#            hue = 'Cluster',
#            data = data_df)
#plt.tight_layout()
#plt.show()

<h1>Final analysis with Demographic variables</h1>

After dividing our data into four clusters, we can see some tendencies that will help the marketing department develop strategies to target these groups and increase app downloads.<br>
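Figures such as "32% of the observations are regular visitors" are within-cluster shares. A sketch of how such shares can be computed with `pd.crosstab(..., normalize='index')`; the answer labels here are hypothetical, and in `data_df` the website columns still hold numeric survey codes.

```python
import pandas as pd

# hypothetical rows; in the notebook one would use data_df
toy = pd.DataFrame({'Cluster':  ['Cluster 1', 'Cluster 1', 'Cluster 1', 'Cluster 2'],
                    'Facebook': ['Regular', 'Regular', 'Rarely', 'Rarely']})

# share of each answer within each cluster; every row sums to 100
shares = pd.crosstab(toy['Cluster'], toy['Facebook'], normalize='index') * 100
print(shares.round(1))
```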

<h2>Cluster 1</h2>

Cluster 1 mainly includes young White/Caucasian people between 18 and 24 years old who are still in college or have already completed a college degree, and who most likely do not have children. Members of this cluster earn on average more than 150k annually; however, they are not inclined toward luxury brand products, as that persona is uncommon within the cluster.<br>

In addition, this cluster mostly includes the personas that avoid luxury brands and that are more skeptical about how their data is used on technological and digital platforms. Its members regularly visit Facebook and are very unlikely to visit websites like Myspace and Vevo. They do not show a strong tendency regarding Netflix: 32% of the observations are regular visitors, while 37% rarely visit the website. Finally, members of this cluster are avid users of music, gaming and social media apps, and show no clear trend in their usage of entertainment, news and shopping apps.<br>

We therefore recommend that the marketing team create a strategy to target this cluster if the company's app is not a luxury brand or tech-related product, or if it falls into the music, gaming or social media category with mobile support. If these criteria are met, we strongly advise a campaign advertising the company's app on other social media apps as well as on Facebook to increase the number of users. To strengthen this strategy, the company should also develop app features meant to retain these users and build loyal engagement.
<br><br>

<h2>Cluster 2</h2>

Cluster 2 consists mostly of people from 18 to 29 years old with an income between 40k and 60k, who are mostly White with no children. This group is willing to spend money on technology, since its members have more paid apps downloaded on their devices than other customers. <br>

This cluster falls into the millennial category: people who are willing to spend money on new technology as well as on luxury and designer brands. Its members mostly use social media, music and gaming apps.<br>

The marketing department should focus on advertising apps on Facebook and YouTube, since these are the platforms most people in this cluster use. Advertising apps on these platforms increases the chances that people will pay for them.
<br><br> 


<h2>Cluster 3</h2>

Members of cluster 3 are on average between 18 and 34 years old, mostly fit the tech savvy persona, earn between 30k and 60k annually, are most likely White/Caucasian, and are predominantly married without children. They are probably more conscious about their expenses because their income is low, which could be one reason they decide not to have children.<br>

This segment is a little older but tends to enjoy technology and usually has a leader personality. Its members are neither freemium users nor optimistic, they do not use apps for entertainment, and we identified that most of them frequently visit the Facebook and YouTube websites. <br> <br>

The recommendation for this cluster is to target them via YouTube while putting more weight on paid entertainment apps, since these could appeal to their interest in new experiences and make them willing to pay more for new apps. If the company's app is a specific news or TV-related app, this cluster should not be targeted at all.<br> <br>



<h2>Cluster 4</h2>

Based on our analysis, cluster 4 includes people between 18 and 29 years old who are most likely single and earn on average between 20k and 50k annually. This cluster most closely matches the tech savvy and leader personas.<br>

In addition, they are regular users of free apps, which cost nothing to the user and most likely include ads. They are engaged with Facebook but do not show a strong trend toward using Twitter or the Netflix website. Within these age groups, they avoid luxury brands, as they do not have the means to spend on this type of product.<br>

Given their highly engaged social media presence, the marketing department should advertise heavily on free music and gaming apps, and create a separate strategy to convert the users who are not active on entertainment, news and shopping apps.