# Data Analysis for HR Assistant

This notebook explores the training data for the HR assistant model, loaded from `data/all_texts.jsonl`. It plots the distribution of competencies and the feedback scores per competency, then summarizes what each plot shows.

In [1]:
import pandas as pd

# Load the JSON Lines file (one JSON object per line) into a DataFrame
data_path = '../data/all_texts.jsonl'
data = pd.read_json(data_path, lines=True)

# Display the first few rows of the dataset
data.head()

In [2]:
# Summary statistics of the dataset
data.describe(include='all')
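
Before plotting, it can help to confirm that the columns used in the cells below are present and to see how many values are missing. The following is a minimal sketch, assuming the `competency` and `feedback_score` columns that the later plots rely on. The helper name `check_columns` and the toy rows are hypothetical and are not taken from `all_texts.jsonl`.

```python
import pandas as pd

# Columns the plotting cells rely on (assumed from the plots in this notebook).
REQUIRED_COLUMNS = ["competency", "feedback_score"]


def check_columns(df, required=REQUIRED_COLUMNS):
    """Return the number of missing values per required column.

    Raises KeyError if any required column is absent.
    Hypothetical helper, for illustration only.
    """
    missing = [col for col in required if col not in df.columns]
    if missing:
        raise KeyError(f"Missing expected columns: {missing}")
    return df[list(required)].isna().sum()


# Toy rows for illustration only; in the notebook, pass the loaded `data` frame.
toy = pd.DataFrame(
    {
        "competency": ["communication", None, "leadership"],
        "feedback_score": [4.0, 3.0, None],
    }
)
null_counts = check_columns(toy)
print(null_counts)
```

In the notebook itself, `check_columns(data)` would run the same check on the loaded dataset.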

In [3]:
# Visualize the distribution of competencies
import matplotlib.pyplot as plt
import seaborn as sns

plt.figure(figsize=(12, 6))
sns.countplot(data=data, x='competency')
plt.title('Distribution of Competencies')
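plt.xticks(rotation=45, ha='right')  # right-align rotated labels under their bars
plt.tight_layout()  # keep rotated labels inside the figure
plt.show()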

In [4]:
# Analyze feedback scores
plt.figure(figsize=(12, 6))
sns.boxplot(data=data, x='competency', y='feedback_score')
plt.title('Feedback Scores by Competency')
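plt.xticks(rotation=45, ha='right')  # right-align rotated labels under their boxes
plt.tight_layout()  # keep rotated labels inside the figure
plt.show()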

## Insights

1. The count plot shows how many training examples each competency has, so over- and under-represented competencies are visible at a glance.
2. The box plot shows the median, spread, and outliers of `feedback_score` within each competency, so score distributions can be compared across competencies.
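
The two observations above can also be expressed as numbers. The following is a minimal sketch, assuming the `competency` and `feedback_score` columns used in the plots. The helper name `summarize_competencies` and the toy rows are hypothetical and are not taken from `all_texts.jsonl`.

```python
import pandas as pd


def summarize_competencies(df: pd.DataFrame) -> pd.DataFrame:
    """Per-competency row count and feedback-score statistics.

    Assumes the columns `competency` and `feedback_score` exist.
    Hypothetical helper, for illustration only.
    """
    summary = (
        df.groupby("competency")["feedback_score"]
        .agg(count="size", median="median", mean="mean", std="std")
        .sort_values("count", ascending=False)
    )
    # Share of all rows per competency. Rows with a missing competency are
    # dropped by groupby, so the shares sum to less than 1 if any exist.
    summary["share"] = summary["count"] / len(df)
    return summary


# Toy rows for illustration only; in the notebook, pass the loaded `data` frame.
toy = pd.DataFrame(
    {
        "competency": [
            "communication",
            "communication",
            "leadership",
            "teamwork",
            "communication",
        ],
        "feedback_score": [4, 5, 3, 2, 3],
    }
)
summary = summarize_competencies(toy)
print(summary)
```

The `count` and `share` columns correspond to the count plot, while `median`, `mean`, and `std` give a numeric counterpart to the box plot. A group with a single row has an undefined standard deviation, which pandas reports as `NaN`.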