Data Collection:
First, I would collect employee feedback, which could come from surveys, open-ended responses, or reviews of the two-pot system. This feedback would typically describe employees' experiences with the accessible and locked pots, along with any concerns or their level of satisfaction.

Data Preprocessing:
Lowercasing: Convert all text to lowercase for uniformity.
Tokenization: Break down the text into individual words or tokens.
Stop Words Removal: Remove common words like "and", "is", "the" which don’t contribute to sentiment.
Lemmatization/Stemming: Reduce words to their base form (e.g., "withdrawals" becomes "withdrawal"), so that variants of the same word are counted together.
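The four preprocessing steps above can be sketched in plain Python. This is a minimal illustration, not a production pipeline: the stop-word list is a small hand-picked sample, and the `stem` function is a naive suffix stripper standing in for a real stemmer or lemmatizer (such as those in NLTK or spaCy).

```python
import re

# Small illustrative stop-word list (assumption); real libraries ship much fuller lists.
STOP_WORDS = {"a", "an", "and", "are", "i", "is", "it", "of", "the", "to", "was", "how", "this"}

# Naive suffixes to strip; a crude stand-in for Porter stemming or lemmatization.
SUFFIXES = ("ing", "ed", "ly", "es", "s")


def stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    text = text.lower()                                   # 1. lowercasing
    tokens = re.findall(r"[a-z']+", text)                 # 2. tokenization
    tokens = [t for t in tokens if t not in STOP_WORDS]   # 3. stop-word removal
    return [stem(t) for t in tokens]                      # 4. crude stemming


print(preprocess("I love how flexible the accessible pot is"))
```

Note that the crude stemmer can produce non-words (e.g. "frustrating" becomes "frustrat"); that is acceptable for matching but is one reason lemmatization is often preferred.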

Sentiment Detection:
Positive: Feedback that shows satisfaction or approval of the system (e.g., “I love how flexible the accessible pot is”).
Negative: Feedback expressing dissatisfaction or concerns.
Neutral: Feedback that does not convey strong emotions, possibly descriptive (e.g., “The system is functional”).
Two popular tools for this are VADER (Valence Aware Dictionary and sEntiment Reasoner) and TextBlob, both of which are lexicon-based approaches. Alternatively, if labeled training data were available, a supervised machine learning model such as Logistic Regression or Naive Bayes could be trained instead.
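With VADER, each text gets a "compound" score between -1 and 1, and the three classes above come from thresholding that score. The sketch below uses the ±0.05 cut-offs suggested in the VADER documentation; `vader_label` assumes the `vaderSentiment` package is installed and imports it lazily, so the thresholding function can be used on its own.

```python
def label_from_compound(compound, threshold=0.05):
    """Map a VADER compound score in [-1, 1] to a sentiment class."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def vader_label(text):
    # Requires `pip install vaderSentiment`; imported here so the rest of the
    # module works without it.
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

    scores = SentimentIntensityAnalyzer().polarity_scores(text)
    return label_from_compound(scores["compound"])
```

In practice the analyzer would be created once and reused across all feedback items rather than per call; it is created inside the function here only to keep the sketch short.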

Categorization:
Positive if the feedback contains words such as "flexible", "benefit", "happy".
Negative if the feedback includes words like "restrictive", "concern", "frustrating".
Neutral for comments that do not express strong emotion, such as “The system was implemented last year.”
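The keyword rules above can be expressed as a simple count-based classifier. This sketch uses only the example words listed; a real keyword list would be longer, and the tie-breaking rule (equal counts mean Neutral) is an assumption.

```python
import re

# Keyword lists taken from the examples above; extend them for real use.
POSITIVE_WORDS = {"flexible", "benefit", "happy"}
NEGATIVE_WORDS = {"restrictive", "concern", "frustrating"}


def categorize(feedback):
    """Label feedback by comparing counts of positive and negative keywords."""
    tokens = re.findall(r"[a-z']+", feedback.lower())
    positives = sum(token in POSITIVE_WORDS for token in tokens)
    negatives = sum(token in NEGATIVE_WORDS for token in tokens)
    if positives > negatives:
        return "Positive"
    if negatives > positives:
        return "Negative"
    return "Neutral"  # no keywords, or an equal number of each


print(categorize("The system was implemented last year."))
```

Because matching is exact, "concerns" would not match "concern"; running the text through the stemming step first is one way to close that gap.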

Output and Insights:
Positive Feedback: Some share of employees might express satisfaction with the flexibility the accessible pot offers for emergencies.
Negative Feedback: Some might be concerned about the restrictions on withdrawals or about whether the locked pot will be sufficient for retirement.
Neutral Feedback: General observations about the system's features without any emotional tone.
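To report these groups as shares of all feedback, the per-comment labels can be aggregated into percentages. A minimal sketch; the label list in the usage line is illustrative, not real survey output.

```python
from collections import Counter

LABELS = ("Positive", "Negative", "Neutral")


def sentiment_breakdown(labels):
    """Return the percentage of feedback items in each sentiment class."""
    total = len(labels)
    if total == 0:
        return {label: 0.0 for label in LABELS}
    counts = Counter(labels)
    return {label: round(100 * counts[label] / total, 1) for label in LABELS}


# Illustrative labels only
print(sentiment_breakdown(["Positive", "Positive", "Negative", "Neutral"]))
```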

In [None]:
import pandas as pd

# Load the datasets
economic_data = pd.read_csv('./economic_indicators_dataset_2010_2023.csv')
financial_behavior = pd.read_csv('./Financial_ Application_ Behavior_ Dataset.csv')


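# Preview the first rows of each dataset
print(economic_data.head())
print(financial_behavior.head())

# Basic data information (info() prints its own summary and returns None,
# so it is called directly rather than wrapped in print())
economic_data.info()
financial_behavior.info()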


In [None]:
# Check for missing values
print(economic_data.isnull().sum())
print(financial_behavior.isnull().sum())

# Handle missing values by dropping any row that contains one
# (filling them, e.g. with column medians, is an alternative that keeps more rows)
economic_data = economic_data.dropna()
financial_behavior = financial_behavior.dropna()

# Verify cleaning
print(economic_data.isnull().sum())
print(financial_behavior.isnull().sum())

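Dropping every row with a missing value can discard a lot of data. As a hedged alternative, missing values could be filled instead: the sketch below uses the median for numeric columns and the most frequent value for other columns. The column names in the usage example are hypothetical.

```python
import pandas as pd


def fill_missing(df):
    """Return a copy with numeric NaNs set to the median and others to the mode."""
    filled = df.copy()
    for col in filled.columns:
        if pd.api.types.is_numeric_dtype(filled[col]):
            filled[col] = filled[col].fillna(filled[col].median())
        else:
            mode = filled[col].mode()
            if not mode.empty:  # an all-missing column has no mode
                filled[col] = filled[col].fillna(mode.iloc[0])
    return filled


# Hypothetical columns for illustration
example = pd.DataFrame({"gdp_growth": [1.0, None, 3.0], "region": ["A", None, "A"]})
print(fill_missing(example))
```

Median imputation is less sensitive to outliers than the mean, but any imputation should be reported, since it can shrink the apparent variance of a column.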

In [None]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Load the data and drop incomplete rows, as in the cleaning step above
data = pd.read_csv('./economic_indicators_dataset_2010_2023.csv').dropna()

# Assumes a binary "withdrawal_flag" target column (1 = withdrawal, 0 = none);
# change TARGET if the target has a different name or lives in another dataset
TARGET = "withdrawal_flag"

# Split data into features (X) and target variable (y).
# Only numeric features are kept: StandardScaler and these models cannot use raw text or dates.
X = data.drop(TARGET, axis=1).select_dtypes(include="number")
y = data[TARGET]

# Split data into training and testing sets, preserving the class balance in both
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Feature scaling: fit on the training set only so no test information leaks in.
# Logistic Regression benefits from scaling; the tree-based models are unaffected by it.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Model 1: Logistic Regression (max_iter raised to help convergence)
logistic_model = LogisticRegression(max_iter=1000)
logistic_model.fit(X_train_scaled, y_train)
y_pred_logistic = logistic_model.predict(X_test_scaled)

# Model 2: Decision Tree (random_state fixed for reproducibility)
decision_tree_model = DecisionTreeClassifier(random_state=42)
decision_tree_model.fit(X_train_scaled, y_train)
y_pred_decision_tree = decision_tree_model.predict(X_test_scaled)

# Model 3: Random Forest
random_forest_model = RandomForestClassifier(random_state=42)
random_forest_model.fit(X_train_scaled, y_train)
y_pred_random_forest = random_forest_model.predict(X_test_scaled)

# Evaluation: precision, recall, and F1 refer to the positive class (withdrawal_flag == 1)
def evaluate_model(y_true, y_pred):
    accuracy = accuracy_score(y_true, y_pred)
    # zero_division=0 reports 0 instead of warning when no positives are predicted
    precision = precision_score(y_true, y_pred, zero_division=0)
    recall = recall_score(y_true, y_pred, zero_division=0)
    f1 = f1_score(y_true, y_pred, zero_division=0)
    print("Accuracy:", accuracy)
    print("Precision:", precision)
    print("Recall:", recall)
    print("F1-score:", f1)

print("Logistic Regression:")
evaluate_model(y_test, y_pred_logistic)

print("Decision Tree:")
evaluate_model(y_test, y_pred_decision_tree)

print("Random Forest:")
evaluate_model(y_test, y_pred_random_forest)
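A single 80/20 split can make one model look better or worse by chance. One hedged way to compare the models more robustly is stratified k-fold cross-validation, with scaling wrapped in a `Pipeline` so it is refit inside each fold. The sketch below uses synthetic data from `make_classification` as a stand-in for the real features; the 80/20 class imbalance is an assumption meant to mimic withdrawals being the minority class.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: 500 rows, 8 features, roughly 20% positive class (assumption)
X, y = make_classification(n_samples=500, n_features=8, weights=[0.8, 0.2], random_state=42)

models = {
    # Scaling inside the pipeline is refit on each training fold, avoiding leakage
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=42),
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
results = {}
for name, model in models.items():
    # F1 focuses on the minority (withdrawal) class rather than overall accuracy
    scores = cross_val_score(model, X, y, cv=cv, scoring="f1")
    results[name] = scores
    print(f"{name}: mean F1 = {scores.mean():.3f} (+/- {scores.std():.3f})")
```

Reporting the spread across folds, not just the mean, shows whether the difference between models is larger than the fold-to-fold noise.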

In [None]:
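
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.arima.model import ARIMA
from prophet import Prophet

data = pd.read_csv('./Financial_ Application_ Behavior_ Dataset.csv')

# "locked_pot_value" is a placeholder name for the series being forecast;
# replace it with the dataset's actual value column
VALUE_COL = "locked_pot_value"

# Prepare the data: parse dates and aggregate to one value per date, in time order
data["Date"] = pd.to_datetime(data["Date"])
series = data.groupby("Date")[VALUE_COL].sum().sort_index()

# Prophet expects a DataFrame with a "ds" (date) column and a "y" (value) column
prophet_df = series.reset_index().rename(columns={"Date": "ds", VALUE_COL: "y"})

# Create and fit a Prophet model
prophet_model = Prophet()
prophet_model.fit(prophet_df)

# Forecast 36 periods ahead (daily by default; pass freq="MS" for monthly data)
future = prophet_model.make_future_dataframe(periods=36)
forecast = prophet_model.predict(future)

# Visualize the forecast
prophet_model.plot(forecast)
plt.title("Locked Pot Value Forecast")
plt.show()

# ARIMA(1, 1, 1) model on the same univariate series
arima_model = ARIMA(series, order=(1, 1, 1))
arima_model_fit = arima_model.fit()

# Forecast 36 steps ahead
forecast_arima = arima_model_fit.forecast(steps=36)

# Visualize ARIMA forecast
plt.plot(forecast_arima)
plt.title("ARIMA Locked Pot Value Forecast")
plt.show()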
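Prophet and ARIMA forecasts are easier to judge against a simple baseline. A hedged way to do this: hold out the last `horizon` observations, compute the mean absolute error (MAE) of each model on that window, and compare it with a naive forecast that repeats the last observed value. The helper names below are illustrative, not from any library.

```python
def naive_forecast(history, steps):
    """Repeat the last observed value `steps` times."""
    return [history[-1]] * steps


def mean_absolute_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def holdout_naive_mae(series, horizon):
    """MAE of the naive forecast on the final `horizon` points of `series`."""
    if not 0 < horizon < len(series):
        raise ValueError("horizon must be between 1 and len(series) - 1")
    train, test = series[:-horizon], series[-horizon:]
    return mean_absolute_error(test, naive_forecast(train, horizon))


print(holdout_naive_mae([10, 12, 14, 16, 18], horizon=2))
```

If a fitted model cannot beat this baseline on the same holdout window, its forecasts should be treated with caution.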

Dataset Selection and Relevance for Analyzing Employee Behavior

Economic Indicators Dataset
Rationale:

Understanding External Influences: Economic indicators provide a broader context for employee financial decisions. Factors like inflation, GDP growth, and interest rates can significantly impact savings behavior.
Correlation with Employee Behavior: Changes in economic conditions may influence employees' propensity to withdraw from their accessible pots, especially during downturns.
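One way to check this relationship, assuming a yearly table that already pairs each indicator with an observed withdrawal rate, is to rank the indicators by their correlation with that rate. The function and column names below are hypothetical; Spearman correlation (`method="spearman"`) is a reasonable option if the relationship is monotonic but not linear.

```python
import pandas as pd


def indicator_correlations(df, target, method="pearson"):
    """Correlate every numeric column with `target`, sorted from most negative to most positive."""
    numeric = df.select_dtypes(include="number")
    return numeric.corr(method=method)[target].drop(target).sort_values()


# Toy values chosen only to show the output shape, not real data
toy = pd.DataFrame({
    "inflation_rate": [1, 2, 3, 4],
    "gdp_growth": [4, 3, 2, 1],
    "withdrawal_rate": [2, 4, 6, 8],
})
print(indicator_correlations(toy, "withdrawal_rate"))
```

With only 14 years of data (2010 to 2023), correlations will be noisy, so they are better read as hints for further modeling than as evidence of cause.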
Relevance:

Predicting Withdrawals: Economic downturns might lead to increased withdrawals as employees draw on their savings to cover short-term financial needs.
Forecasting Savings Growth: Economic indicators can be used to model future market conditions and their impact on investment returns.

Financial Application Behavior Dataset
Rationale:

Direct Employee Data: This dataset contains individual-level records of how users interact with a financial application, which can serve as a proxy for employee behavior within the two-pot system.
Understanding Usage Patterns: Analyzing factors like screen_list, numscreens, and minigame can provide insights into employee engagement and potential areas of confusion.
Relevance:

Predicting Withdrawals: Usage patterns might indicate financial distress or a need for liquidity, suggesting potential withdrawal behavior.
Forecasting Savings Growth: Understanding employee engagement with financial tools can help assess their financial literacy and potential for making informed savings decisions.
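Usage fields such as `screen_list` need feature engineering before they can feed a model. A minimal sketch, assuming `screen_list` holds comma-separated screen names: it counts the screens each user visited and adds a 0/1 flag for each screen of interest. The screen names in the example are hypothetical.

```python
import pandas as pd


def screen_features(df, screens_of_interest):
    """Add a screen count and one visited_<screen> flag per screen of interest."""
    out = df.copy()
    screens = out["screen_list"].fillna("").str.split(",")
    out["screen_count"] = screens.apply(lambda s: len([x for x in s if x]))
    for screen in screens_of_interest:
        # Default argument binds the current screen name for this column
        out[f"visited_{screen}"] = screens.apply(lambda s, target=screen: int(target in s))
    return out


# Hypothetical screen names for illustration
users = pd.DataFrame({"screen_list": ["Home,Balance,Withdraw", "Home", None]})
print(screen_features(users, ["Withdraw"]))
```

Flags for screens related to withdrawals or balances could then serve as candidate predictors of withdrawal behavior, alongside `numscreens` and `minigame`.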
Combined Analysis:
By combining these datasets, you can gain a more comprehensive understanding of how economic conditions and individual behavior interact within the two-pot system. This enables you to:

Identify Vulnerable Groups: Analyze which segments of the workforce are more likely to withdraw funds based on both economic indicators and individual behavior.
Tailor Financial Advice: Provide targeted financial advice to employees based on their specific circumstances and the prevailing economic conditions.
Inform Policy Decisions: Assess the effectiveness of the two-pot system and identify areas for improvement.
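Combining the two datasets requires a shared key. A hedged sketch, assuming the economic data has one row per calendar year in a `year` column and the behavior data has a `Date` column: derive the year from each behavior record and left-join the matching indicators. The key and column names are assumptions to adapt to the real files.

```python
import pandas as pd


def combine_datasets(economic, behavior, date_col="Date", year_col="year"):
    """Attach yearly economic indicators to each behavior record."""
    behavior = behavior.copy()
    behavior[year_col] = pd.to_datetime(behavior[date_col]).dt.year
    # validate="many_to_one" raises if a year appears more than once in `economic`
    return behavior.merge(economic, on=year_col, how="left", validate="many_to_one")


# Toy values for illustration only
economic = pd.DataFrame({"year": [2021, 2022], "inflation_rate": [1.0, 2.0]})
behavior = pd.DataFrame({"user": [1, 2, 3], "Date": ["2021-03-01", "2022-07-15", "2022-11-30"]})
print(combine_datasets(economic, behavior))
```

A left join keeps every behavior record even when no indicator exists for its year; those rows show up as missing values, which makes gaps in coverage easy to spot.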
In conclusion, the combination of these datasets provides a strong foundation for analyzing employee behavior within the two-pot system. By considering both external economic factors and individual-level data, you can gain valuable insights to inform policy decisions and improve financial well-being.
