# **Project Name: Data Mining for Confidence Level Detection**

# **Katie Park and Lydia Lonzarich**

# **Dataset Description**

The dataset used for this project is the "Confidence Detection Dataset" from [Kaggle](https://www.kaggle.com/datasets/muhammadkhubaibahmad/confidence-detection-dataset), a table stored as a CSV file with 5,950 rows. It has 19 attributes: 15 continuous attributes, which are already normalized in the dataset, and 4 categorical attributes:
1. eye_shoulder_y_ratio
2. shoulder_y_diff
3. wrist_distance_x
4. wrist_shoulder_ratio
5. nose_eye_center_offset_x
6. shoulder_span
7. hip_should_y_diff
8. body_lean_x
9. shoulder_center_x
10. hip_center_x
11. spine_angle
12. eye_distance
13. head_tilt_angle
14. eye_distance_ratio
15. shoulder_slope
16. head_direction
17. arm_position
18. posture
19. confidence_label

The attributes are derived from human body landmarks and body postures. They are used to classify confidence_label, which is also the attribute this project will predict.
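Before modeling, the properties described above (row count, column count, missing values, and the share of each confidence_label value) can be checked with a short script. The following is a minimal sketch using only the Python standard library. The helper name `summarize_csv`, the three-column toy sample, and the category values `upright` and `slouched` are all assumptions made for illustration; they are not taken from the real file.

```python
import csv
import io
from collections import Counter


def summarize_csv(handle, label_column="confidence_label"):
    """Return row count, column count, empty-cell count, and label shares."""
    reader = csv.DictReader(handle)
    rows = list(reader)
    # A cell counts as missing if it is absent (None) or blank.
    n_missing = sum(
        1
        for row in rows
        for value in row.values()
        if value is None or value.strip() == ""
    )
    label_counts = Counter(row[label_column] for row in rows)
    shares = {label: count / len(rows) for label, count in label_counts.items()}
    return {
        "rows": len(rows),
        "columns": len(reader.fieldnames),
        "missing": n_missing,
        "label_shares": shares,
    }


# Tiny made-up sample with three of the 19 columns; values are illustrative only.
sample = io.StringIO(
    "spine_angle,posture,confidence_label\n"
    "0.12,upright,Confident\n"
    "0.80,slouched,Low\n"
    "0.15,upright,Confident\n"
    "0.40,,Neutral\n"
)
summary = summarize_csv(sample)
print(summary)
```

For the real dataset, the same function could be called on the downloaded file opened with `open(path, newline="")`, where `path` is wherever the CSV was saved locally.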

# **Implementation and Technical Merit**

Because the dataset has no missing values and its continuous attributes are already normalized, the main challenge we anticipate is class imbalance: the "Confident" class accounts for 53% of instances, compared with 28% for "Neutral" and 19% for "Low." As the source author suggests in the Dataset Notes, this can be addressed with either class weights or oversampling, so that the classifier is not biased toward the majority class.
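To illustrate the class-weight option, the sketch below computes "balanced" weights, where each class is weighted by `n_samples / (n_classes * class_count)`. This is the same heuristic scikit-learn documents for `class_weight="balanced"`, although the project has not committed to a particular library. The helper name `balanced_class_weights` is hypothetical, and the 100 toy labels only mirror the reported 53% / 28% / 19% split; they are not the real data.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Toy labels mirroring the reported class proportions (assumption, not real rows).
toy_labels = ["Confident"] * 53 + ["Neutral"] * 28 + ["Low"] * 19
weights = balanced_class_weights(toy_labels)
for cls, weight in weights.items():
    print(f"{cls}: {weight:.3f}")
```

Under this scheme the rarest class receives the largest weight, so misclassifying a "Low" instance costs more during training than misclassifying a "Confident" one.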

Additionally, because 18 attributes are available for predicting an unseen instance's class, we will likely reduce the number of attributes used. The attributes most likely to be dropped are the columns that describe unchangeable features of a person. For example, eye_shoulder_y_ratio (the vertical ratio between eyes and shoulders), wrist_distance_x (the horizontal distance between wrists), eye_distance (the distance between eyes), etc., will likely be left out of prediction, since they are fixed features of a person and do not change with whether that person is confident.

Further, a decision tree will likely be used to rank the remaining attributes by importance. The k attributes with the highest information gain (that is, the largest reduction in entropy after splitting on them) will be the attributes used to predict confidence levels in this project.
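The ranking idea can be sketched by computing information gain directly, which is the quantity an entropy-based decision tree evaluates when choosing a split. The example below handles categorical attributes only; continuous attributes would first need a threshold or binning step, which is omitted here. The function names, the eight toy rows, and the category values are all assumptions made for illustration, not values from the dataset.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """Entropy of labels minus the weighted entropy after splitting on values."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[value].append(label)
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def top_k_attributes(columns, labels, k):
    """Return the k attribute names with the highest information gain."""
    gains = {name: information_gain(vals, labels) for name, vals in columns.items()}
    ranked = sorted(gains, key=gains.get, reverse=True)
    return ranked[:k], gains


# Hypothetical toy rows; category values are made up for this sketch.
labels = ["Confident"] * 4 + ["Low"] * 2 + ["Neutral"] * 2
columns = {
    "posture": ["upright"] * 4 + ["slouched"] * 4,
    "arm_position": ["open", "open", "crossed", "crossed",
                     "open", "crossed", "open", "crossed"],
    "head_direction": ["forward"] * 4 + ["down"] * 2 + ["forward"] * 2,
}
selected, gains = top_k_attributes(columns, labels, k=2)
print(selected)
for name, gain in gains.items():
    print(f"{name}: {gain:.3f} bits")
```

In this toy example, arm_position has the same class mix in both of its groups, so splitting on it yields no information gain and it is not selected.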

# **Potential Impact**

This project aims to predict a person’s confidence level using various body features and postures. Confidence is an important behavioral indicator because it can provide insights into how someone is engaging with an environment or task. Low confidence levels might reveal areas where improvement or intervention is necessary.

For example, a professor who learns that their students have low confidence in their class might reconsider their teaching approach or pacing. Similarly, interview preparation tools can provide feedback to help users strengthen their interview skills and present themselves more confidently and assertively. In healthcare settings, a patient’s confidence level can inform healthcare professionals of the level of support and care the patient requires for treatment and recovery.

Therefore, developing a model that can accurately and robustly predict a person’s confidence level based on observable body features and postures supports a wide range of applications, including education, career development, and healthcare.

Internal stakeholders for this classification task might include behavioral scientists who are interested in analyzing the relationship between body language and human behavioral patterns. External stakeholders may include human-computer interaction (HCI) designers, who could use these results to refine models that respond appropriately to humans based on their confidence and behavior during interaction. Augmented reality (AR) developers represent another external stakeholder group. When designing AR technologies, developers must consider how users will respond within the virtual environment. Individuals with low confidence might interact less authentically, which can hinder the effectiveness of these technologies as an alternative or supplement for real-world tasks. Understanding which behaviors induce confidence and which inhibit it can help AR developers design more realistic and engaging virtual environments and experiences.


# **Citations**

- Ahmad, Muhammad Khubaib. “Confidence Detection Dataset.” Kaggle, 24 Oct. 2025, www.kaggle.com/datasets/muhammadkhubaibahmad/confidence-detection-dataset.
- Benson, Tim, and Alex Benson. “Measuring Health Confidence: Benefits to Patients, Clinicians and Healthcare Providers.” BMJ Open Quality, U.S. National Library of Medicine, 17 Aug. 2025, pmc.ncbi.nlm.nih.gov/articles/PMC12359424/.
- Landau, Peter. “What Is a Stakeholder? Definitions, Types & Examples.” ProjectManager, ProjectManager, 17 Sept. 2025, www.projectmanager.com/blog/what-is-a-stakeholder.
- Mujahid, Amna, et al. “Multi-Class Confidence Detection Using Deep Learning Approach.” MDPI, Multidisciplinary Digital Publishing Institute, 30 Apr. 2023, www.mdpi.com/2076-3417/13/9/5567.
