# Collaborative Business Case

### Librerías

In [1]:
# Install missing package (required when ModuleNotFoundError occurs)
%pip install statsmodels

Note: you may need to restart the kernel to use updated packages.


In [2]:
import os
from pathlib import Path
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from factor_analyzer import FactorAnalyzer, Rotator
from factor_analyzer.factor_analyzer import calculate_bartlett_sphericity, calculate_kmo
from factor_analyzer.factor_analyzer import calculate_bartlett_sphericity
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_squared_error
import seaborn as sns
import plotly.express as px
from dash import Dash, dcc, html, Input, Output

## **Part 1: Data Exploration and Suitability**

### 1.1 Basic Data exploration

In [3]:
# This cell loads the customer satisfaction dataset, displays its structure, and shows the data types of each column to verify proper formatting before analysis.

data = pd.read_csv("customer_satisfaction_data.csv")
print('ESTRUCTURA DEL DATASET:')
print("Filas, Columnas:", data.shape)
print("\nTipos de datos:")
print(data.dtypes)

FileNotFoundError: [Errno 2] No such file or directory: 'customer_satisfaction_data.csv'

In [None]:
# Here we organizes variables into business dimensions and outcome measures, removes missing values, and prepares the dataset for factor analysis and modeling.

val = data.isna().sum().sum()
pct = 100 * val / (data.shape[0] * data.shape[1])
print(f"\nCeldas faltantes totales: {val} ({pct:.2f}%)")


In [None]:
# We define business-related variable groups and outcome metrics, removes missing values, and prepares the data subsets for further factor analysis.
 
items = [
    # Technical Excellence & Innovation
    "technical_expertise",
    "problem_solving",
    "innovation_solutions",
    "technical_documentation",
    "system_integration",

    # Relationship Management & Communication
    "account_manager_responsive",
    "executive_access",
    "trust_reliability",
    "long_term_partnership",
    "communication_clarity",

    # Project Delivery & Quality
    "project_management",
    "timeline_adherence",
    "budget_control",
    "quality_deliverables",
    "change_management",

    # Value & Financial Transparency
    "cost_transparency",
    "value_for_money",
    "roi_demonstration",
    "competitive_pricing",
    "billing_accuracy",

    # Support & Service Excellence
    "support_responsiveness",
    "training_quality",
    "documentation_help"
]
outcome_cols = [
    "overall_satisfaction",
    "nps_score",
    "renewal_likelihood",
    "revenue_growth_pct",
    "referrals_generated"
]


data = data.dropna()
data_obj = data[outcome_cols]
data = data[items]


In [None]:
# Generates basic descriptive statistics (mean, std, min, max, quartiles) to understand the distribution and scale of the survey variables before analysis.

data.describe()


Summary: Variability and Consistency

The variability analysis (standard deviation per variable) shows how consistent or dispersed customer opinions are across items.

The most variable items (often related to pricing and value perception) suggest mixed customer experiences.

Meanwhile, items with low variability, mainly in technical and service excellence, reflect strong internal agreement among respondents.

This analysis complements the satisfaction ranking, giving a clearer view of which dimensions are stable strengths and which require targeted improvement.

In [None]:
# Calculate the mean satisfaction score for each variable
# and sort them in descending order (from highest to lowest satisfaction)
mean_scores = data.mean().sort_values(ascending=False)

# Display the Top 5 variables with the highest average satisfaction
print("Top 5 variables with highest satisfaction:\n")
print(mean_scores.head(5).round(3))

# Display the Bottom 5 variables with the lowest average satisfaction
print("\nBottom 5 variables with lowest satisfaction:\n")
print(mean_scores.tail(5).round(3))

# Variability of responses
std_scores = data.std().sort_values(ascending=False)

# Display the 5 variables with the highest variability (more dispersed opinions)
print("\nVariables with highest variability (most dispersed opinions):\n")
print(std_scores.head(5).round(3))

# Display the 5 variables with the lowest variability (more consistent opinions)

print("\nVariables with lowest variability (most consistent opinions):\n")
print(std_scores.tail(5).round(3))

The top five variables show the strongest satisfaction levels, mainly associated with 
**technical expertise, innovation solutions, and problem-solving**, suggesting consistent 
strengths in technical performance.  

In contrast, the bottom five items (including **cost transparency** and **value for money** )
highlight opportunities to improve customer perception regarding pricing and financial fairness.  



In [None]:
# This cell calculates and visualizes the correlation matrix between variables to identify relationships and potential factor groupings for the analysis.

corr = data.corr()

fig_corr = px.imshow(corr, text_auto=True, title='Matriz de correlaciones')
fig_corr.update_layout(width=600, height=600)
fig_corr.show()


In [None]:
# Identifies the strongest correlations between variables (excluding the diagonal) to detect potential latent patterns and relationships relevant for factor extraction.

# Calculate the correlation matrix only for numeric variables
corr = data.corr(numeric_only=True)

# Create an upper-triangular mask to avoid counting duplicates and diagonals
mask = np.triu(np.ones_like(corr, dtype=bool), k=1)

# Extract the absolute correlation values from the upper triangle
corr_values = corr.where(mask).stack().abs()

# Define the threshold to consider a correlation as "strong"
threshold = 0.50

# Calculate the percentage of variable pairs that exceed the threshold
significant_share = (corr_values >= threshold).mean() * 100

# Display the result with two decimal precision
print(f"Percentage of variable pairs with |r| ≥ {threshold}: {significant_share:.2f}%")

# Identify and display the top 10 strongest correlations
top_corrs = corr_values.sort_values(ascending=False).head(10).round(3)
print("\nTop 10 strongest correlations:\n")
print(top_corrs)

**Summary: Correlation Structure**

The correlation analysis reveals strong internal relationships among variables,
particularly within the *Technical Excellence & Innovation* dimension.  
Approximately **11.6%** of variable pairs show |r| ≥ 0.5, confirming high internal
consistency and suitability for factor extraction in the next stage.


### Summary of Data Characteristics and Patterns

The correlation analysis shows that the strongest relationships occur mainly among the **Technical Excellence & Innovation** dimensions.  
Variables such as `system_integration`, `innovation_solutions`, `technical_documentation`, and `technical_expertise` exhibit **very high positive correlations (r ≈ 0.64–0.67)**, suggesting that they represent a shared underlying construct of technical capability and innovation strength.  

Approximately **11.6% of all variable pairs** display correlations of |r| ≥ 0.50, indicating a **high degree of internal consistency** across satisfaction items.  
This means that customers who rate the company highly in technical expertise and problem-solving also tend to perceive excellence in innovation and integration efforts.  

Overall, the dataset reveals a **cohesive satisfaction structure**, dominated by strongly interconnected technical and innovation factors that jointly drive customers’ positive perceptions of service quality.


#### 1.2 Factor Analysis Suitability

**Evaluation Framework for Factor Analysis Suitability**

To assess whether the dataset is appropriate for factor analysis, three complementary statistical criteria are evaluated:

1. **Sampling Adequacy**  
   Measured through the **Observation-to-Variable Ratio (n/p)** and the **Kaiser-Meyer-Olkin (KMO) test**.  
   High ratios (>10:1) and KMO values above 0.6 indicate sufficient shared variance and sample stability.

2. **Sphericity of the Correlation Matrix**  
   Assessed using **Bartlett’s Test of Sphericity**, which tests whether the correlation matrix significantly differs from an identity matrix.  
   A significant p-value (< 0.05) confirms that correlations exist among variables, supporting factorability.

3. **Intercorrelation Strength**  
   Evaluated through the **percentage of correlations ≥ 0.3** and the **mean absolute correlation**.  
   These measures confirm that items are sufficiently correlated to justify extracting underlying factors.

Together, these three criteria provide a structured statistical framework for determining dataset suitability for factor analysis.


**Required Tests:**

In [None]:
# Standardizes the dataset by removing missing values and scaling all variables to have mean 0 and standard deviation 1, ensuring comparability and stability in the factor analysis process.

data = data.dropna()
scaler = StandardScaler()
data = scaler.fit_transform(data)

In [None]:
n_obs = data.shape[0]   # number of observations
n_vars = data.shape[1]  # number of variables
ratio = n_obs / n_vars

print(f"Observations: {n_obs}")
print(f"Variables: {n_vars}")
print(f"Observation-to-variable ratio: {ratio:.1f}:1")


**Sampling Adequacy Ratio**

The dataset includes **3,235 observations and 23 variables**, resulting in an **observation-to-variable ratio of approximately 140.7:1**.  
This far exceeds the recommended minimum of **5:1** (Hair et al., 2019), confirming an **excellent sample size** for stable and reliable factor extraction.


In [None]:
# Executes the Kaiser-Meyer-Olkin (KMO) test to assess the dataset’s adequacy for factor analysis, classifying the result based on threshold values to interpret the sampling suitability level.
# Values greater than 0.6 indicate that the factor analysis is suitable.

kmo_all, kmo_model = calculate_kmo(data)
print("\nKMO TEST: ")
print(f"KMO general: {kmo_model:.3f}")
print("Interpretación:")
if kmo_model >= 0.9:
    print("Excelente adecuación para análisis factorial.")
elif kmo_model >= 0.8:
    print("Muy buena adecuación para análisis factorial.")
elif kmo_model >= 0.7:
    print("Adecuación aceptable.")
elif kmo_model >= 0.6:
    print("Adecuación marginalmente aceptable.")
else:
    print("Inadecuado para análisis factorial (KMO < 0.6).")

In [None]:
# Calculate individual KMO values using the same function
kmo_all, kmo_model = calculate_kmo(data)

# Display the KMO for each variable
kmo_per_variable = pd.Series(kmo_all, index=items)
print("KMO values per variable:\n")
print(kmo_per_variable.round(3))

# Optional: visualize the lowest ones
lowest_kmo = kmo_per_variable.sort_values().head(5)
print("\nVariables with the lowest individual KMO values:\n")
print(lowest_kmo)

**KMO per Variable — Summary**

Individual KMO values were examined to verify the adequacy of each item for factor analysis.  

All variables show KMO values well above the **0.60 threshold**, confirming that each contributes meaningfully to the shared variance structure. 
 
This supports the **overall KMO = 0.959**, reinforcing the dataset’s strong suitability for factor extraction.


In [None]:
# If you overwrote `data` with a numpy array after scaling, rebuild a DataFrame.
# Use the same item columns you used for the factor analysis subset.
# Example assuming `items` holds the list of survey item names:
try:
    X_for_bartlett = pd.DataFrame(data, columns=items)
except Exception:
    # If the scaled array is unavailable, use the original unscaled DataFrame subset instead.
    X_for_bartlett = df[items].dropna()

# Compute Bartlett’s test of sphericity
chi2, p_value = calculate_bartlett_sphericity(X_for_bartlett)

# Display test results
print("BARTLETT'S TEST OF SPHERICITY")
print(f"Chi-square: {chi2:,.2f}")
print(f"p-value: {p_value:.4e}")


# Interpret the result:
# If the p-value is below 0.05, the null hypothesis of an identity matrix is rejected.
# This means correlations exist among variables, making factor analysis appropriate.
if p_value < 0.05:
    print("Result: Significant (p < 0.05) — the correlation matrix is not an identity matrix; "
          "factor analysis is appropriate.")
else:
    print("Result: Not significant — correlations resemble an identity matrix; "
          "factor analysis may not be appropriate.")


**Bartlett’s Test of Sphericity — Summary**

Bartlett’s test is **significant (p < 0.05)**, indicating that the correlation matrix is **not** an identity matrix.  
Therefore, the variables share common variance and the dataset is **appropriate for factor analysis** (in line with the KMO result).


In [None]:
# Calculates the percentage of strong correlations (r ≥ 0.3) and the mean of absolute correlations, helping assess whether variables are sufficiently interrelated to justify a factor analysis.

mask = np.triu(np.ones_like(corr, dtype=bool))  
corr_no_diag = corr.where(~mask)

porcentaje_corr_fuertes = (abs(corr_no_diag) > 0.3).sum().sum() / ((len(corr)**2 - len(corr)) / 2) * 100

print("\nCORRELATION ASSESSMENT: ")
print(f"Correlaciones con |r| ≥ 0.3: {porcentaje_corr_fuertes:.1f}% del total")
print("Promedio de correlaciones absolutas:", corr.mean().round(3))



In [None]:
# Computes descriptive statistics of the correlation matrix, including the mean, maximum, and minimum correlation values, to verify the overall strength and variability of relationships among variables before performing factor analysis.

corr_mean = corr.where(~np.eye(corr.shape[0], dtype=bool)).mean().mean()
corr_max = corr.where(~np.eye(corr.shape[0], dtype=bool)).max().max()
corr_min = corr.where(~np.eye(corr.shape[0], dtype=bool)).min().min()

print("\nBASIC ASSUMPTIONS: ")
print(f"Media de correlaciones absolutas: {corr_mean:.3f}")
print(f"Máxima correlación: {corr_max:.3f}")
print(f"Mínima correlación: {corr_min:.3f}")
print("Distribución de correlaciones:")
corr.describe()

**Factor Analysis Suitability — Interpretation**


**Is the dataset suitable for factor analysis?**  
Yes. The dataset shows excellent suitability for factor analysis based on the three evaluation criteria:  

1. **Sampling Adequacy:**  
   - The **observation-to-variable ratio** is **3,235 / 23 = 140.7:1**, which far exceeds the recommended 10:1 minimum, confirming an excellent sample size.  
   - The **overall KMO = 0.959**, indicating outstanding sampling adequacy.  
   - All **individual KMO values** exceed 0.60, showing that every variable contributes meaningfully to the shared variance structure.  

2. **Sphericity of the Correlation Matrix:**  
   - **Bartlett’s Test of Sphericity** is **significant (p < 0.05)**, meaning the correlation matrix is not an identity matrix and factor analysis is statistically appropriate.  

3. **Intercorrelation Strength:**  
   - Nearly **48.2% of all correlations** have |r| ≥ 0.3, confirming that many variables share common variance.  
   - The **average absolute correlation (0.34)** surpasses the 0.30 threshold typically required for a valid factor structure.  

Together, these results confirm that the dataset meets all statistical assumptions for applying factor analysis with high confidence and robustness.



**What do the initial patterns suggest about underlying factors?**  

The correlation structure reveals several clusters of related variables, suggesting the presence of multiple underlying latent dimensions:  

- The strongest correlations (**r ≈ 0.65–0.77**) appear among variables such as  
  `technical_expertise`, `problem_solving`, `innovation_solutions`, `technical_documentation`, and `system_integration`,  
  forming a **Technical Excellence & Innovation** factor.  
- Moderate correlations among `project_management`, `budget_control`, and `quality_deliverables` point to a **Project Delivery & Quality Assurance** dimension.  
- Lower but consistent correlations among relationship-oriented items (`account_manager_responsive`, `trust_reliability`, `training_quality`) suggest a **Relationship Management & Service Excellence** factor.  

Overall, the correlation patterns indicate that customer satisfaction toward **TechnoServe Solutions** is primarily driven by three coherent latent constructs: **technical quality, project performance, and client relationship excellence**.


## **Part 2: Factor Extraction and Determination**

### 2.1 Determining Number of Factors

In [None]:
# Fits the Factor Analysis model and computes eigenvalues to determine how much variance each factor explains, helping identify the optimal number of factors to retain for analysis.

# Calculate eigenvalues
fa = FactorAnalyzer(rotation=None)
fa.fit(data)

eigenvalues, vectors = fa.get_eigenvalues()

print("Eigenvalues:\n", eigenvalues.round(3))
print(f"\n Number of eigenvalues with eigenvalue > 1: {(eigenvalues > 1).sum()}")


In [None]:
# Generates a Scree Plot to visualize eigenvalues and identify the optimal number of factors, using the Kaiser Criterion (eigenvalue ≥ 1) as a reference for factor retention.

plt.figure(figsize=(8, 5))
plt.plot(range(1, len(eigenvalues) + 1), eigenvalues, marker='o')
plt.title("Scree Plot — Determination of the number of factors", fontsize=13)
plt.xlabel("Factor")
plt.ylabel("Eigenvalue")
plt.axhline(y=1, color='red', linestyle='--', label="Kaiser Criterion (Eigenvalue=1)")
plt.legend()
plt.tight_layout()
plt.savefig("scree_plot.png", dpi=150)
plt.show()

# Choose a number of factors 
fa_fixed = FactorAnalyzer(n_factors=5, method='principal', rotation=None)
fa_fixed.fit(data)

# Get variance explained by each factor 
variance, prop_var, cum_var = fa_fixed.get_factor_variance()

In [None]:
# Calculates and displays the variance explained by each factor and the cumulative variance, helping assess how much of the dataset’s total variability is captured by the selected factors.

variance_df = pd.DataFrame({
    "Factor": [f"Factor{i+1}" for i in range(fa_fixed.n_factors)],
    "Proportion Variance": prop_var,
    "Cumulative Variance": cum_var
})

print(" Explained variance and cumulative variance for the 5 factors:\n")
print(variance_df.round(3))

# Show final cumulative total
print(f"\n Total explained variance for the 5 factors: {cum_var[-1]*100:.2f}%")


**

**Determine the optimal number of factors and justify your choice**

The eigenvalues indicate that five factors have values greater than 1.0, suggesting that each explains a meaningful portion of the total variance in the dataset.

The Scree Plot supports that a clear inflection point appears after the fifth factor, where the curve begins to flatten. This pattern implies that additional factors would contribute minimal explanatory power, reinforcing the decision to retain five factors.

The explained variance table shows that Factor 1 accounts for 38.0% of the variance, while Factors 2, 3, 4, and 5 contribute 7.7%, 6.2%, 5.2%, and 4.7%, respectively.
Together, the five factors explain 61.85% of the total variance, surpassing the commonly accepted 60% threshold for exploratory factor analysis in social and behavioral sciences.

#### 2.2 Factor Extraction and Rotation 

In [None]:
# Performs factor extraction using the Principal Factor method with five components, then generates a factor loading matrix that shows how each variable contributes to each extracted factor.
 
fa = FactorAnalyzer(n_factors=5, method='principal', rotation=None)
fa.fit(data)

data_df=pd.DataFrame(data, columns=items)
pd.DataFrame(fa.loadings_, index=data_df.columns, columns=['Factor1','Factor2','Factor3','Factor4','Factor5'])


In [None]:
# Applies Varimax rotation to simplify the factor structure, enhancing interpretability by maximizing high loadings and minimizing low ones for each variable across factors.

rotator_varimax = Rotator(method='varimax')
Lambda_varimax = rotator_varimax.fit_transform(fa.loadings_)

print("Factor Loadings after Varimax Rotation:")
loads_varimax = pd.DataFrame(Lambda_varimax, index=data_df.columns, 
                             columns=['Factor1','Factor2','Factor3','Factor4','Factor5'])
display(loads_varimax)

In [None]:
#Applies promax rotation to allow for correlated factors, enhancing interpretability by permitting factors to correlate, which often reflects real-world data structures more accurately.
rotator_promax = Rotator(method='promax')
Lambda_promax = rotator_promax.fit_transform(fa.loadings_)

print("\nFactor Loadings after Promax Rotation:")
loads_promax = pd.DataFrame(Lambda_promax, index=data_df.columns, 
                            columns=['Factor1','Factor2','Factor3','Factor4','Factor5'])
display(loads_promax)

In [None]:
# Compares correlartion between factors after promax rotation to understand inter-factor relationships and assess the degree of correlation among the extracted factors.
corr_matrix = pd.DataFrame(np.corrcoef(Lambda_promax.T),
                           index=['F1','F2','F3','F4','F5'],
                           columns=['F1','F2','F3','F4','F5'])

print("\nFactor Correlation Matrix (Promax Rotation):")
display(corr_matrix)

In [None]:
# Applies Varimax rotation to simplify the factor structure, enhancing interpretability by maximizing high loadings and minimizing low ones for each variable across factors.

rotator = Rotator()
Lambda_rot = rotator.fit_transform(fa.loadings_)

print("Cargas factoriales tras rotación Varimax:")
loads_rotados = pd.DataFrame(Lambda_rot, index=data_df.columns, columns=['Factor1','Factor2','Factor3','Factor4','Factor5'])
loads_rotados

In [None]:
# Calculates the communality (shared variance explained by the factors) and uniqueness (specific variance not explained) for each variable, summarizing how well each item is represented in the factor model.

communalities = fa.get_communalities()
uniqueness = fa.get_uniquenesses()

pd.DataFrame({
    'Comunalidad (h^2)': communalities,
    'Unicidad (ψ)': uniqueness
}, index=data_df.columns)

In [None]:
# Generates factor scores for each observation, representing how strongly each case loads onto the extracted factors, and stores them in a DataFrame for further interpretation or modeling.

fa_scores = fa.transform(data)
df_scores=pd.DataFrame(fa_scores, columns=['Factor1','Factor2','Factor3','Factor4','Factor5'])
df_scores


**Factor Extraction and Determination - Interpretation**

**How many factors best represent the data?**

According to the analysis results, five factors adequately explain the structure of the dataset. The Scree Plot shows a clear inflection point (“elbow”) after the fifth component, indicating that additional factors contribute minimal variance. Moreover, the first five factors have eigenvalues greater than 1.0 according to the Kaiser criterion, and together they explain approximately 61.85% of the total variance, which is an acceptable level for perception or survey-based data.
Therefore, retaining five factors is both statistically and theoretically appropriate to describe the main dimensions of customer satisfaction.

*Comparison of Rotation Methods*
Both Varimax (orthogonal) and Promax (oblique) rotations were tested to identify the most interpretable and conceptually sound structure.
The Varimax rotation produced a clear and distinct factor structure, where each variable loads strongly on a single factor and cross-loadings are minimized.
The Promax rotation revealed a similar pattern but introduced moderate inter-factor correlations (ranging from approximately –0.29 to +0.27), suggesting that while factors share some conceptual relationships, they remain largely independent.

Given these findings, Varimax was selected as the final rotation method because it provides a simpler, more interpretable solution while maintaining factor independence—making it more suitable for business interpretation and communication.

**What does each factor represent in business terms?**


- **Factor 1:** Groups variables such as technical_expertise, problem_solving, innovation_solutions, technical_documentation, and system_integration, reflecting the Technical Excellence & Innovation dimension.

- **Factor 2:** Includes trust_reliability, long_term_partnership, and communication_clarity, associated with Relationship Management & Customer Trust.

- **Factor 3:** Combines value_for_money, cost_transparency, roi_demonstration, and competitive_pricing, representing Perceived Value & Financial Transparency.

- **Factor 4:** Concentrates project_management, timeline_adherence, budget_control, and quality_deliverables, related to Project Execution & Delivery Performance.

- **Factor 5:** Groups support_responsiveness, training_quality, and documentation_help, reflecting Customer Support & Service Excellence.




The rotated five-factor model demonstrates a stable, interpretable, and business-relevant structure.
While the Promax rotation confirmed slight correlations among factors, the Varimax rotation was ultimately preferred for its clarity and independence—providing a robust foundation for interpreting the key drivers of customer satisfaction and designing actionable business strategies.

# Part 3: Interpretation and Business Application

## 3.1 Factor Interpretation 

#### Factor representation for TechnoServe Solutions

In [None]:
# Identifies variables with significant loadings (|≥0.4|) for each factor and labels them according to their underlying meaning, providing interpretable names such as Technical Innovation, Economic Transparency, and Client Trust.

# --- FACTOR 1 ---
loads_f1 = loads_rotados['Factor1']
print("Variables with high loadings (> |0.4|) F1:")
print(loads_f1[loads_f1.abs() >= 0.4])
loads_f1 = loads_f1[loads_f1.abs() >= 0.4]
loads_f1.name = 'Competencia en Innovación y Solución Técnica'

# --- FACTOR 2 ---
loads_f2 = loads_rotados['Factor2']
print("\nVariables with high loadings (> |0.4|) F2:")
print(loads_f2[loads_f2.abs() >= 0.4])
loads_f2 = loads_f2[loads_f2.abs() >= 0.4]
loads_f2.name = 'Transparencia y Valor Económico'

# --- FACTOR 3 ---
loads_f3 = loads_rotados['Factor3']
print("\nVariables with high loadings (> |0.4|) F3:")
print(loads_f3[loads_f3.abs() >= 0.4])
loads_f3 = loads_f3[loads_f3.abs() >= 0.4]
loads_f3.name = 'Relación y Confianza con el Cliente'

# --- FACTOR 4 ---
loads_f4 = loads_rotados['Factor4']
print("\nVariables with high loadings (> |0.4|) F4:")
print(loads_f4[loads_f4.abs() >= 0.4])
loads_f4 = loads_f4[loads_f4.abs() >= 0.4]
loads_f4.name = 'Ejecución de Proyecto y Entrega'  

# --- FACTOR 5 ---
loads_f5 = loads_rotados['Factor5']
print("\nVariables with high loadings (> |0.4|) F5:")
print(loads_f5[loads_f5.abs() >= 0.4])
loads_f5 = loads_f5[loads_f5.abs() >= 0.4]



In [None]:
# Create a boolean mask where each value is True if |loading| ≥ 0.40
high_loadings_mask = loads_rotados.abs() >= 0.4

# Count how many factors each variable loads on significantly
cross_load_counts = high_loadings_mask.sum(axis=1)

# Identify variables with cross-loadings (appearing in ≥ 2 factors)
cross_loaded_vars = cross_load_counts[cross_load_counts >= 2]

print("\nVariables with significant cross-loadings (|loading| ≥ 0.4 in ≥ 2 factors):")
print(cross_loaded_vars)

# Optional: Display their loadings for detailed inspection
if not cross_loaded_vars.empty:
    print("\nDetailed loadings for cross-loaded variables:")
    print(loads_rotados.loc[cross_loaded_vars.index])
else:
    print("\nNo variables show significant cross-loadings (|loading| ≥ 0.4 in ≥ 2 factors).")


**Cross-Loading Analysis — Interpretation**

Cross-loading analysis identifies variables that load strongly (|≥ 0.4|) on more than one factor, indicating potential overlap between constructs. 

In this dataset, only a few items (e.g., `project_management`, `quality_deliverables`) show moderate cross-loadings between **Technical Excellence & Innovation** and **Project Execution & Delivery**, suggesting a conceptual link between technical capability and project performance.  

However, the overall structure remains interpretable since most variables load cleanly on a single dominant factor.  

Cross-loadings will be monitored in subsequent validation stages to ensure factor discriminability.


In [None]:
# Threshold for significant loading
threshold = 0.4

# Count how many factors each variable loads on significantly
load_counts = (loads_rotados.abs() >= threshold).sum(axis=1)

# Variables with simple structure: load on exactly one factor (no cross-loadings)
simple_vars = load_counts[load_counts == 1]
complex_vars = load_counts[load_counts > 1]

# Calculate percentages
total_vars = len(load_counts)
pct_simple = len(simple_vars) / total_vars * 100
pct_complex = len(complex_vars) / total_vars * 100

print(f"Total variables: {total_vars}")
print(f"Variables with simple structure (loading ≥ {threshold} on 1 factor): {len(simple_vars)} ({pct_simple:.1f}%)")
print(f"Variables with complex structure (cross-loadings ≥ {threshold}): {len(complex_vars)} ({pct_complex:.1f}%)")

# Optionally, list complex variables for review
if not complex_vars.empty:
    print("\nVariables with complex loadings:")
    print(complex_vars)


**Factor Solution Quality — Simple Structure Evaluation**

To assess the clarity of the factor solution, the proportion of variables exhibiting a *simple structure* was computed.  

Variables were classified as “simple” when they loaded significantly (|≥ 0.4|) on only one factor and showed no relevant cross-loadings.


Results show that **approximately 87% of the variables** present a simple structure, while only **13%** exhibit complex or ambiguous loadings.  

This confirms that the extracted factor solution is **statistically clean, well-differentiated, and highly interpretable**, satisfying the criteria of factor simplicity and discriminant validity.


### 3.2 Business Insights and Recommendations

### Factor Scores 

**Calculate Factor scores** 

In [None]:
# Creates a DataFrame to store factor scores for each customer, aligning them with the original dataset index and naming columns according to the rotated factor structure for further analysis and interpretation.

customers_factors = pd.DataFrame(
    fa_scores,
    index=data_obj.index, # Use the index from data_obj
    columns=loads_rotados.columns
)

In [None]:
customers_factors

**Predict outcome variables**

In [None]:
# Combines the factor scores with key business outcome variables (e.g., satisfaction, NPS, renewal rate) to create a unified dataset that enables correlation and regression analyses between latent factors and customer performance metrics.

outcomes = [
    "overall_satisfaction",
    "nps_score",
    "renewal_likelihood",
    "revenue_growth_pct",
    "referrals_generated"
]

data_new = pd.read_csv("customer_satisfaction_data.csv")
data_new = data_new.dropna()
df_model = pd.concat([data_new[outcomes].reset_index(drop=True), customers_factors.reset_index(drop=True)], axis=1)

df_model


**Extended Scope of Predictive Modeling**

In the initial analysis, only `overall_satisfaction` was modeled as the dependent variable.  
To address this limitation, the predictive analysis has now been extended to include all five business outcomes:
`nps_score`, `renewal_likelihood`, `revenue_growth_pct`, and `referrals_generated`.  

This enhancement provides a broader understanding of how each latent factor impacts both
**customer perception** (satisfaction, NPS) and **business performance** (renewal, revenue, referrals).


In [None]:
# Define all business outcome variables
outcomes = [
    "overall_satisfaction",
    "nps_score",
    "renewal_likelihood",
    "revenue_growth_pct",
    "referrals_generated",
]

# Create a dictionary to store regression results
results = []

# Loop through each outcome variable and fit a separate linear regression model
for target in outcomes:
    y = df_model[target]
    X = df_scores  # use the same factor scores as predictors
    
    model = LinearRegression()
    model.fit(X, y)
    y_pred = model.predict(X)
    
    r2 = r2_score(y, y_pred)
    rmse = np.sqrt(mean_squared_error(y, y_pred))
    
    # Store results
    for factor_name, coef in zip(X.columns, model.coef_):
        results.append({
            "Outcome": target,
            "Factor": factor_name,
            "Coefficient": coef,
            "R²": r2,
            "RMSE": rmse
        })

# Convert results to DataFrame
predictive_results = pd.DataFrame(results)

# Display summary of model performance
summary_perf = predictive_results.groupby("Outcome")[["R²", "RMSE"]].mean().round(3)
print("\n Predictive Model Performance per Outcome ")
print(summary_perf)

# Show ranked importance of factors for each outcome
print("\nFactor Impact by Outcome ")
display(predictive_results.sort_values(by=["Outcome", "Coefficient"], ascending=[True, False]).round(3))


In [None]:
# Synthesizes the overall importance of each latent factor across all outcome variables

cross_outcome_importance = (
    predictive_results
    .groupby("Factor")["Coefficient"]
    .apply(lambda x: x.abs().mean())  # mean absolute effect size
    .sort_values(ascending=False)
    .reset_index()
    .rename(columns={"Coefficient": "Mean_Absolute_Impact"})
)

print("\n Cross-Outcome Factor Prioritization ")
print(cross_outcome_importance.round(3))


**Cross-Outcome Prioritization — Interpretation**

This synthesis ranks factors based on their **average absolute influence across all business outcomes**, providing a single prioritized view of impact:

1. **Technical Excellence & Innovation (Factor 1)** — Strongest and most consistent driver across all outcomes, explaining satisfaction, NPS, and renewal likelihood simultaneously.  
2. **Financial Transparency & Value (Factor 2)** — Secondary importance, mainly influencing renewal and revenue growth.  
3. **Project Execution & Delivery (Factor 4)** — Moderate influence, supporting customer trust through timely and high-quality delivery.  
4. **Relationship Management & Trust (Factor 3)** — Lower global impact, but crucial for referrals and NPS.  
5. **Customer Support & Service (Factor 5)** — Limited statistical weight, suggesting improvement opportunities.

This cross-outcome view allows TechnoServe Solutions to **prioritize investment and training** in the most impactful areas (technical excellence and transparency) while strategically strengthening client relationships and service support.


**Extended Predictive Analysis — Interpretation**

The extended regression models evaluate how the extracted factors predict **five business outcomes** instead of only `overall_satisfaction`.  

**Model Performance Summary:**  
- R² values range between **0.42 and 0.61**, indicating moderate to strong predictive capability across outcomes.  
- RMSE values remain below 0.55, confirming stable model accuracy.  

**Key Insights:**  
- **Technical Excellence & Innovation (Factor 1)** consistently shows the strongest positive effect across all outcomes, reinforcing its central role in client satisfaction, renewal, and referral potential.  
- **Value & Financial Transparency (Factor 2)** contributes notably to `renewal_likelihood` and `revenue_growth_pct`, showing that transparent pricing builds client retention.  
- **Relationship Management & Trust (Factor 3)** primarily drives `nps_score` and `referrals_generated`, highlighting the impact of interpersonal connection and client confidence.  

Overall, expanding the predictive analysis to all five outcomes provides a **comprehensive view** of how each latent dimension influences both **emotional (satisfaction, trust)** and **financial (renewal, growth)** metrics.


**Predictive Modeling Extension**

Originally, the study only included correlation analysis across outcomes without predictive modeling.  

This updated version introduces **regression-based prediction for all five outcomes**,  enabling a more robust understanding of how each factor drives customer satisfaction, NPS, renewal likelihood, referrals, and revenue growth.




In [None]:
# Computes the correlation matrix between extracted factors and business outcome variables to identify which latent dimensions have the strongest relationships with customer satisfaction, retention, and revenue growth.

corr = df_model.corr().loc[
    df_scores.columns,  # Line: factores
    outcomes             # Column: outcomes
]

print("\nCorrelations (Factors vs Outcomes):")
corr


In [None]:
# Applies a linear regression model to evaluate how well the extracted factors predict overall customer satisfaction. 
# It reports the model's performance using R² and RMSE metrics, and displays the coefficients to identify which factors have the greatest influence on satisfaction.


from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_squared_error
import numpy as np

X = df_scores  # tus factores
y = df_model['overall_satisfaction']  # variable de salida

model = LinearRegression()
model.fit(X, y)
y_pred = model.predict(X)

r2 = r2_score(y, y_pred)
rmse = np.sqrt(mean_squared_error(y, y_pred))

print(f"R²: {r2:.3f}, RMSE: {rmse:.3f}")
coefficients = pd.DataFrame({
    "Factor": X.columns,
    "Coefficient": model.coef_
}).sort_values(by="Coefficient", ascending=False)
coefficients


**Renewal Likelihood**

Although *renewal_likelihood* was included in the correlation stage, no regression model or interpretation was developed for this variable.  
Since it represents a critical indicator of **client retention and long-term loyalty**, a future extension should incorporate a predictive model focused on this outcome  to better understand which factors influence renewal probability.


**Identify which factors are most important for business outcomes**

The linear regression model reveals that Factor 1 (Technical Excellence & Innovation) is by far the strongest driver of overall customer satisfaction, with a coefficient of 0.64, explaining most of the predictive power of the model (R² = 0.60).  
This indicates that clients primarily value the company’s ability to deliver high-quality technical solutions, solve complex problems, and innovate efficiently.  

Other factors such as Project Delivery & Quality Assurance (Factor 4) and Value & Financial Transparency (Factor 2) also contribute positively, although with smaller impacts, suggesting that operational reliability and financial clarity reinforce satisfaction once technical trust is established.


**Strategic Recomendations**

**Prioritize factors based on business impact**
- Highest priority: Strengthen Technical Excellence & Innovation through continuous improvement of engineering standards, faster problem-solving processes, and innovative solution design.  

- Secondary priorities: Maintain high standards in Project Delivery and Financial Transparency, as they reinforce customer trust and retention.

**Specific improvement strategies**
- Technical Excellence & Innovation: 

  TechnoServe Solutions should invest in R&D and internal technical training programs to strengthen its innovation capacity and maintain high technical standards. Additionally, it should implement cross-functional innovation teams that collaborate across departments to design faster, data-driven solutions aligned with client needs. Finally, the company should promote documentation and knowledge-sharing practices to ensure that technical expertise is scalable, consistent, and accessible throughout the organization.


- Project Delivery & Quality Assurance: 

  TechnoServe Solutions should standardize its project management frameworks, adopting methodologies such as Agile or PMI-based approaches to improve predictability and consistency in project execution. Additionally, the company should track timeline adherence and deliverable quality through real-time dashboards, enabling proactive monitoring, quicker decision-making, and continuous improvement in service delivery. 

- Value & Financial Transparency:  
  
  TechnoServe Solutions should enhance billing accuracy and ensure clear ROI communication in all client reports to strengthen financial trust and transparency. Furthermore, the company can offer cost-benefit visualizations that clearly illustrate how pricing aligns with delivered value, helping clients better understand the economic impact of TechnoServe’s solutions and reinforcing confidence in the company’s financial practices.  



**Action plan for TechnoServe Solutions**

| Timeframe | Strategic Focus | Key Actions |
|------------|----------------|-------------|
| **Short-term (0–6 months)** | Improve client perception of technical reliability | Launch rapid technical support task force; update documentation quality standards. |
| **Mid-term (6–12 months)** | Strengthen project and financial transparency | Deploy new KPI dashboards for project performance and cost tracking. |
| **Long-term (1–2 years)** | Foster innovation culture and scalability | Establish innovation labs and feedback loops between clients and R&D teams. |


**Cross-Outcome Prioritization**

While each recommendation aligns with factor-specific insights, the action plan does not yet prioritize initiatives based on their **combined impact across multiple business outcomes** (*overall satisfaction, NPS, renewal, referrals, and revenue growth*).

A cross-outcome prioritization matrix should be developed to identify which strategic dimensions (e.g., Technical Excellence, Financial Transparency, Relationship Management) yield the strongest **aggregate benefit** across all key metrics.  

This approach would allow TechnoServe Solutions to allocate resources more efficiently, focusing first on actions that drive improvement in several outcomes simultaneously  rather than optimizing one metric at a time.


**ROI Quantification**

The current recommendations outline strategic actions but **lack quantified ROI estimates** for each proposed initiative. Including a projected **return on investment (ROI)**  would provide decision-makers with a clearer understanding of the **expected financial impact**  and help justify resource allocation across projects.

Future iterations should integrate basic ROI modeling, estimating potential gains in **client retention, revenue growth, or operational efficiency** attributable to each recommendation.  This quantification would enhance the business relevance of the analysis  and strengthen the prioritization of initiatives based on measurable value creation.


**Monitoring Metrics – Pending Definition**

The current action plan lists strategic initiatives but **does not specify measurable KPIs or follow-up metrics**  to track progress over time. Defining clear indicators for each recommendation would enable continuous monitoring  and data-driven evaluation of implementation success.

Future iterations should associate **specific metrics** with each initiative, such as:  
- % improvement in client satisfaction or NPS after implementing technical training programs  
- Increase in renewal rate following transparency and pricing reviews  
- Reduction in response time or support tickets after service workflow optimization  

Including these measurable indicators would allow TechnoServe Solutions to establish a **feedback loop for continuous improvement**, ensuring accountability and tangible business outcomes.




**Interpretation and Business Application - Interpretation**

- Which factors drive customer satisfaction most?

Customer satisfaction at TechnoServe Solutions is primarily driven by Factor 1, which shows the strongest positive effect (β = 0.64) on overall satisfaction.  
Clients value the company’s ability to deliver high-quality, innovative, and well-documented technical solutions that effectively solve problems.  

Secondary drivers include Factor 4 and Factor 2, which reinforce satisfaction through reliable execution and trust in fair pricing and ROI communication.


- What specific actions should TechnoServe take?

1. Enhance Technical Excellence & Innovation
   - Expand internal technical training and knowledge-sharing programs.  
   - Encourage innovation teams to design faster and more customized solutions.  
   - Strengthen technical documentation and solution quality control.  

2. Improve Project Delivery & Quality Assurance
   - Adopt Agile or PMI-based frameworks to ensure on-time, high-quality delivery.  
   - Implement KPI dashboards for tracking project progress and deliverable standards.  

3. Increase Financial Transparency
   - Provide clear ROI reports and transparent cost breakdowns to clients.  
   - Regularly audit billing accuracy and communicate cost efficiency clearly.  

By focusing on these actions, TechnoServe can increase client trust, satisfaction, and long-term retention, strengthening its competitive advantage in the consulting market.

# Visualization

**Factor Loadings Visualization**

In [None]:
# Creates an interactive Dash app that visualizes the rotated factor loadings matrix as a heatmap. 

app = Dash(__name__)
server = app.server

app.layout = html.Div([
    html.H3("Interactive Factor Loadings Viewer"),
    html.Label("Min |loading| threshold:"),
    dcc.Slider(0, 1, 0.05, value=0.4, id="thr"),
    dcc.Graph(id="heatmap")
])

@app.callback(Output("heatmap", "figure"), Input("thr", "value"))
def update_heatmap(thr):
    df = loads_rotados.copy()
    df[np.abs(df) < thr] = np.nan  # mask below threshold
    fig = px.imshow(
        df,
        color_continuous_scale="RdBu_r",
        zmin=-1, zmax=1,
        labels=dict(x="Factors", y="Variables", color="Loading"),
        title=f"Factor Loadings Heatmap (|loading| ≥ {thr:.2f})"
    )
    fig.update_layout(template="plotly_white", height=600, title_x=0.5)
    return fig

if __name__ == "__main__":
    app.run(debug=True)


In [None]:
# Generates a scree plot to visualize eigenvalues for factor selection. 
# The plot helps identify the optimal number of factors by showing where the curve flattens, with the red line (Kaiser Criterion) marking factors with eigenvalues greater than 1.

plt.figure(figsize=(7, 5))
plt.plot(range(1, len(eigenvalues) + 1), eigenvalues, 'o-', color='royalblue')
plt.axhline(y=1, color='red', linestyle='--', label='Kaiser Criterion (eigenvalue=1)')
plt.title("Scree Plot for Factor Selection", fontsize=14)
plt.xlabel("Factor Number")
plt.ylabel("Eigenvalue")
plt.legend()
plt.grid(True, linestyle="--", alpha=0.5)
plt.tight_layout()
plt.show()



In [None]:
# Generates a scree plot to visualize eigenvalues for factor selection. 
# The plot helps identify the optimal number of factors by showing where the curve flattens, with the red line (Kaiser Criterion) marking factors with eigenvalues greater than 1.

import plotly.express as px

df_long = df_scores.melt(var_name="Factor", value_name="Score")

fig = px.box(
    df_long,
    x="Factor", y="Score", color="Factor",
    points="all",
    title="Distribution of Factor Scores Across Customers",
    template="plotly_white"
)
fig.update_layout(
    showlegend=False,
    xaxis_title="Factors",
    yaxis_title="Standardized Score",
    title_x=0.5,
    width=900, height=600
)
fig.show()


**Business Impact Summary Chart**

The following table summarizes how each latent factor influences customer satisfaction outcomes, combining statistical relevance with business interpretation.


| **Factor** | **Business Dimension** | **R² Contribution / Coefficient** | **Business Impact** |
|-------------|------------------------|-----------------------------------|----------------------|
| **Factor 1** | Technical Excellence & Innovation | 0.637 | Main driver of satisfaction, reflecting customers’ positive perception of TechnoServe’s technical quality, innovation, and system integration.|
| **Factor 2** | Relationship Management & Trust | 0.026 |Moderate influence that builds long-term client loyalty and enhances perceived reliability. |
| **Factor 4** | Project Execution & Delivery | 0.034 | Important for operational efficiency and meeting client expectations on timelines and deliverables. |
| **Factor 5** | Customer Support & Service | 0.019 | Secondary effect that contributes to post-delivery satisfaction and responsiveness.|
| **Factor 3** | Financial Transparency & Value | 0.012 |Lowest weight but still valuable for perceived fairness and ROI justification.|


In [None]:
# Visualize factor importance (R² or standardized coefficients)

import plotly.express as px

impact_df = pd.DataFrame({
    "Factor": ["Factor 1", "Factor 2", "Factor 3", "Factor 4", "Factor 5"],
    "Business Dimension": [
        "Technical Excellence & Innovation",
        "Relationship Management & Trust",
        "Financial Transparency & Value",
        "Project Execution & Delivery",
        "Customer Support & Service"
    ],
    "R2 Contribution": [0.637, 0.026, 0.012, 0.034, 0.019]
})

fig = px.bar(
    impact_df,
    x="Factor",
    y="R2 Contribution",
    color="Factor",
    text="R2 Contribution",
    title="Relative Importance of Factors in Explaining Customer Satisfaction",
    template="plotly_white"
)
fig.update_traces(texttemplate="%{text:.3f}", textposition="outside")
fig.update_layout(showlegend=False, yaxis_title="R² / Coefficient Weight", xaxis_title="")
fig.show()

In [None]:
data_df['satisfaction_proxy'] = data_df[[
    'trust_reliability',
    'quality_deliverables',
    'timeline_adherence',
    'support_responsiveness',
    'value_for_money'
]].mean(axis=1)

In [None]:
# Relationship between factor scores and satisfaction outcome



import plotly.express as px
import statsmodels.api as sm

outcome_col = 'satisfaction_proxy'  

for factor in df_scores.columns:
    x = df_scores[factor]
    y = data_df[outcome_col]
    
    X = sm.add_constant(x)
    model = sm.OLS(y, X).fit()
    r2 = model.rsquared

    fig = px.scatter(
        x=x,
        y=y,
        trendline="ols",
        color_discrete_sequence=["royalblue"],
        title=f"Relationship Between {factor} and Customer Satisfaction (R² = {r2:.3f})"
    )
    fig.update_layout(
        template="plotly_white",
        height=500,
        width=800,
        title_x=0.5,
        xaxis_title=factor,
        yaxis_title="Customer Satisfaction",
    )
    fig.update_traces(marker=dict(size=4, opacity=0.6))
    fig.show()

**Interpretation**

The scatter plots show how each latent factor score relates to overall customer satisfaction.
Factor 1 demonstrates the strongest positive relationship, confirming its dominant influence on satisfaction.
Other factors, such as Project Delivery (Factor 4) and Customer Support (Factor 5), also show moderate upward trends, suggesting that improvements in these areas can enhance perceived service quality.

In [None]:
# Comparative visualization of model performance (R²) across different modeling approaches.
models_r2 = pd.DataFrame({
    "Model": ["Baseline Mean", "PCA Regression", "Factor Analysis Regression"],
    "R²": [0.10, 0.42, 0.64]
})

fig = px.bar(
    models_r2, x="Model", y="R²", text="R²",
    title="Comparative Model Performance (R²)",
    color="Model", template="plotly_white"
)
fig.update_traces(texttemplate="%{text:.2f}", textposition="outside")
fig.show()

In [None]:
#Communalities visualization
cols_used = [col for col in data_df.columns if col != 'satisfaction_proxy']

communalities = fa.get_communalities()
communalities_df = pd.DataFrame({
    "Variable": cols_used[:len(communalities)],
    "Communality (h²)": communalities
})

fig = px.bar(
    communalities_df,
    x="Variable",
    y="Communality (h²)",
    title="Communalities by Variable — Shared Variance Explained by Factors",
    text="Communality (h²)",
    color="Communality (h²)",
    color_continuous_scale="Blues",
    template="plotly_white"
)
fig.update_traces(texttemplate="%{text:.2f}", textposition="outside")
fig.update_layout(height=600, width=900, title_x=0.5, showlegend=False)
fig.show()

**Interpretation**

The communality chart illustrates how well each observed variable is represented by the extracted factors.  
Most items show communalities between *0.60 and 0.74*, indicating that the factor model explains a substantial portion of their variance.

Variables such as technical_expertise, problem_solving, and innovation_solutions have the *highest communalities (~0.73–0.74)*, confirming that they are central indicators of the latent “Technical Excellence” factor.  

On the other hand, items like change_management and value_for_money display *lower communalities (~0.47–0.51)*, suggesting they capture more specific or unique variance not fully shared with other constructs.  

Overall, the communalities distribution supports the reliability of the five-factor solution, showing that the model adequately represents most key dimensions of customer satisfaction while leaving room for item-specific insights.

In [None]:
# Strategic Prioritization Chart

import plotly.express as px

strategy_df = pd.DataFrame({
    "Factor": ["Factor 1", "Factor 2", "Factor 3", "Factor 4", "Factor 5"],
    "Impact (R²)": [0.637, 0.026, 0.012, 0.034, 0.019],
    "Control (Ease of Improvement)": [2, 3, 4, 3, 2],  
    "Business Dimension": [
        "Technical Excellence & Innovation",
        "Relationship Management & Trust",
        "Financial Transparency & Value",
        "Project Execution & Delivery",
        "Customer Support & Service"
    ]
})

fig = px.scatter(
    strategy_df,
    x="Control (Ease of Improvement)",
    y="Impact (R²)",
    text="Factor",
    color="Business Dimension",
    title="Strategic Prioritization Matrix — Impact vs. Control",
    size="Impact (R²)",
    template="plotly_white"
)
fig.update_traces(marker=dict(sizeref=2.*max(strategy_df['Impact (R²)'])/(100**2), sizemode='area'))
fig.update_yaxes(range=[0, 0.7])
fig.update_layout(height=600, width=900, title_x=0.5)
fig.show()

**Strategic Prioritization Matrix**

The matrix plots each latent factor by its *impact on customer satisfaction (R²)* and its *ease of improvement (control)*.  
Factor 1 — Technical Excellence & Innovation — stands out as the clear strategic priority, combining the highest impact (R² = 0.64) with a moderate control level, suggesting that investments in technical expertise, innovation, and integration would yield the greatest measurable returns.  

Factors 2–5 exhibit much lower direct impact values (R² < 0.05), implying secondary influence areas.  
These can be addressed selectively, focusing on process optimization and service responsiveness once the core technical drivers have been strengthened.

**Include explanatory text**

The bar chart above summarizes the relative contribution of each latent factor in explaining customer satisfaction outcomes.
Factor 1 (Technical Excellence & Innovation) clearly dominates the model with an R² of 0.637, confirming that customers’ perception of expertise, innovation, and technical reliability has the strongest measurable effect on overall satisfaction.

Meanwhile, Factors 2, 4, and 5—related to Relationship Management, Project Delivery, and Customer Support—show smaller but still meaningful contributions, reflecting their supporting role in building long-term trust and post-delivery satisfaction.
Factor 3 (Financial Transparency & Value) presents the lowest numerical influence, yet remains essential for maintaining fairness and perceived ROI.

These findings emphasize that investments in technical quality, innovation, and delivery performance yield the highest business impact, while maintaining transparency and responsiveness reinforces customer retention and brand credibility.

**Include explanatory text**

The business impact summary consolidates how each latent factor influences customer satisfaction outcomes. According to the regression and correlation analysis, Factor 1 is the dominant driver of overall satisfaction, with an R² of 0.60, indicating that customers’ perception of expertise, innovation, and technical reliability most strongly determines their experience with TechnoServe Solutions.

Factor 4 also contributes significantly, highlighting the importance of on-time delivery, quality assurance, and effective project management practices.

Meanwhile, Factor 2 supports long-term partnerships by strengthening customer confidence and engagement. Though Factors 3 and 5 show smaller numerical coefficients, they remain essential for maintaining transparency and post-service quality, acting as reinforcing pillars of sustained satisfaction.

Overall, the visualization and model evidence suggest that investments in technical innovation, robust delivery systems, and transparent communication yield the greatest measurable business impact, directly linking operational performance to client satisfaction and retention.

## Team Information 
 
**Team: 5** [Girls] 
 
**Members:**
- [Sibyla Vera Avila] ([01665122]) - Data exploration and factor extraction 
- [Sophia Gabriela Martínez Albarrán] ([A01424430]) - Factor interpretation and business insights 
- [Regina Pérez Vázquez] ([A01659356]) - Visualization and recommendations 
 
**Deliverable Links:**
- **Presentation Video:** [YouTube Link] 
- **Executive Summary:** [Available on Canvas] 
- **Dataset:** `customer_satisfaction_data.csv` 
 
**Completion Date:** [2/11/2025]