This thesis project focuses on developing a system for recognizing Bangla sign language at the phrase level using hand gesture analysis. Sign language serves as the primary means of communication for individuals who cannot hear or speak.
The thesis addresses hand detection and tracking, a critical component of sign recognition. To this end, a dataset of 27 Bangla sign phrases, comprising over 1,100 videos, was created.
MediaPipe Hands is a high-fidelity hand and finger tracking solution that uses machine learning (ML) to infer 21 3D landmarks of a hand from a single frame. Unlike earlier state-of-the-art approaches, which rely primarily on powerful desktop environments for inference, it achieves real-time performance on mobile phones and scales to multiple hands.
Google's MediaPipe library was used to detect and track hands and to extract key points from the palm region of each hand. Angle features between the key points of the hand were then calculated, and a Long Short-Term Memory (LSTM) network was employed for classification.
The proposed system achieved an overall accuracy of 92.07% in recognizing Bangla sign language phrases. It was additionally tested on the PkSLMNM dataset, a phrase-level sign language dataset created by Pakistani researchers that consists of seven different sign phrases. On PkSLMNM, the proposed system achieved a recognition accuracy of 90.98%, surpassing the 82.66% reported by the dataset's authors.
Although the system was primarily evaluated on phrase-level sign language recognition, its potential applications extend to recognizing any hand gesture-based sign and to facilitating Human-Computer Interaction (HCI).