Multi-Stage Hybrid-CNN Transformer Model for Human Intent-Prediction

An Undergraduate Capstone Project under the Digital Signal Processing Laboratory of the University of the Philippines Diliman

This repository contains all the models (including their variants) developed for the project Multi-Stage Hybrid-CNN Transformer Model for Human Intent-Prediction. As an overview, the Multi-Stage Hybrid-CNN Transformer Classifier System is composed of two key components: the Gazed Object Detector and the Intent Classifier. A sketch of how the two stages chain together at inference time is given after the folder list below.

  • The Gazed Object Detector is in the gazed_object_detectors folder, which contains the three variations of the model.
  • The Intent Classifier is in the intent_classifier folder.
  • The Overall System for inference is in the multi-stage_human_intent_classifier_system folder.

Each folder has its own readme.md for guidance.
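
To make the data flow concrete, here is a minimal sketch of how the two stages could be chained at inference time. This is an illustration only, assuming a PyTorch setup; the class and method names below are not the repository's actual API.

```python
# Minimal sketch of the two-stage inference flow. All names and shapes are
# illustrative assumptions, not the repository's actual API.
import torch


class MultiStageIntentClassifier:
    """Chains the Gazed Object Detector (stage 1) with the Intent Classifier (stage 2)."""

    def __init__(self, detector: torch.nn.Module, classifier: torch.nn.Module):
        self.detector = detector      # predicts which object (if any) is being gazed at
        self.classifier = classifier  # predicts the human's intent from the gaze output

    @torch.no_grad()
    def predict(self, frames: torch.Tensor) -> torch.Tensor:
        gaze_output = self.detector(frames)   # stage 1: gazed-object predictions
        return self.classifier(gaze_output)   # stage 2: intent prediction
```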

Dataset

The dataset used in this project can be accessed here. The generators and statistics for the train-test split are in the split folder.

Recommendations

  1. Add more video samples so that the gaze distribution is balanced ("None", i.e. not looking at any object, is currently overrepresented).
  2. Train the weights of the Gazed Object Detector from scratch to tailor the model to object-gaze classification.
  3. Use the gaze probabilities for all objects in a given frame, instead of only the most probable gaze, as input to the human intent classifier (see the sketch after this list).
  4. Explore other human pose estimation techniques, as motivated by the performance gain from the additional head information used.
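
Recommendation 3 can be illustrated with a small sketch; the tensor shapes and the number of object classes below are assumptions chosen only for illustration.

```python
# Hypothetical illustration of recommendation 3; shapes and class count are assumptions.
import torch

num_objects = 10                                 # assumed object classes, including "None"
gaze_logits = torch.randn(1, num_objects)        # stage-1 output for a single frame

# Current input to the intent classifier: only the single most probable gazed object.
top1_input = gaze_logits.argmax(dim=-1)          # shape: (1,)

# Recommended input: the full gaze probability distribution, so the intent
# classifier also sees how confident the detector was about every object.
full_input = torch.softmax(gaze_logits, dim=-1)  # shape: (1, num_objects)
```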

Acknowledgement

We are extremely grateful for the work of DETR and MGTR, on which the gazed object detector was heavily based.
