
DO Not Touch your face (DONT)


NEWS (20.03.30) : DONT ver.0.4

  • Added a MobileNetV3 version.

    • Recognizes face-touching actions in 0.07 sec on an Intel(R) Core i7-6700 CPU @ 3.40GHz or better (92% accuracy)
  • TO DO:

    • Upload a paper about this project to arXiv

    • Add sound effects for the alarm

    • Add a "report" function based on a 24-hour monitoring mode

    • Add more supported GUI languages (English, Japanese)

    • Try video clips from ceiling-mounted security cameras

    • Develop a lightweight model for mobile phones and CPU-only machines


Installation

# Clone this repository and enter it:
$ git clone https://github.com/mi2rl/DONT.git

# Set up the environment
$ conda create -n [your virtual environment name] python=3

# Activate the environment
$ conda activate [your virtual environment name]

# Install all the dependencies
$ pip install torch==1.2.0+cu92 torchvision==0.4.0+cu92 -f https://download.pytorch.org/whl/torch_stable.html

$ pip install -r requirements.txt
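
After installing, you can quickly check that PyTorch and the GPU are visible before launching the GUI. This is only a minimal sketch (not part of the repository); on a CPU-only machine `torch.cuda.is_available()` simply returns `False`:

```python
# quick_env_check.py -- illustrative sanity check, not part of the DONT repository
import torch
import torchvision

print("torch:", torch.__version__)              # expected: 1.2.0
print("torchvision:", torchvision.__version__)  # expected: 0.4.0
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```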

Quick Guide

  • The GUI program can be run with

    $ python main.py
  • GUI


    • Start or pause the classifier with the 'Start'/'Pause' buttons
    • The result from the action classifier is displayed

    • Turn on the live webcam display with the 'Camera' button


  • Run Windows (EXE) App

    • Unzip the downloaded .zip file, then run "DONT.exe"


  • While the webcam display window is active, the other buttons are disabled (close the webcam display first, then use the buttons)

  • If the webcam is not ready, the software does NOT work properly.


Further details

  • Datasets

    • The training dataset was built with contributions from MI2RL members and many collaborators; in total, we gathered 190,000 images
    • Video clips were recorded at more than 10 different locations
    • Action classes : 11 classes
      • Overall classes : drinking, picking up phone, removing mask, resting chin on hand, rubbing eyes, touching glasses, touching hairs, touching keyboard, touching nose, touching phone, wearing mask
      • Touching actions : picking up phone, resting chin on hand, rubbing eyes, touching hairs, touching nose (see the binary grouping sketch after this list)
  • Action Classification Network (I3D / MobileNetV3)

    • I3D Network (https://github.com/deepmind/kinetics-i3d)

      • Training phase

        • The number of frames in each stack for 3D CNN : 16
        • Data augmentation
          • Step in frames between each clip : 4
          • Color distortion
          • Rotation
      • Inference Phase

        • The number of frames in each stack for 3D CNN : 24 (see the clip-stacking sketch after this list)
    • MobileNetV3 (https://github.com/d-li14/mobilenetv3.pytorch)

      • Training phase

        • The number of frames in each stack for CNN : 3

          • A 3-channel image is made for each of the following time intervals: 3, 5, 7, 9, 11, 13, 15 (see the channel-stacking sketch after this list)

          • For each time interval, 20,000 images are made; the training set contains 280,000 images in total.


        • Dataset configuration

      • Inference Phase

        • Time interval (stride): 3 frames
  • H/W specification

    • Test specification.

      • GPU : GeForce GTX 960 4GB
      • CPU : Intel(R) Core i7-6700 CPU @ 3.40GHz
      • OS : Linux Ubuntu 18.04
      • Inference
        • I3D Network
          • 0.07~0.085 sec on GPU
          • 1.4~1.5 sec on CPU
          • CPU usage ≈ 35%
          • GPU memory usage ≈ 1.1GB
        • MobileNet v3
          • 0.03~0.04 sec on GPU
          • 0.07~0.09 sec on CPU
          • CPU usage ≈ 4%
          • GPU memory usage ≈ 520MB
    • Minimum specification

      • GeForce GTX 960 4GB
      • Intel(R) Core i7-6700 CPU @ 3.40GHz
      • OS : Linux / Windows
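
For the I3D branch, the action classifier consumes a stack of consecutive frames (16 during training, 24 at inference). A minimal sketch of that clip stacking, assuming OpenCV webcam capture; the frame size, normalization, and helper name are illustrative, not the project's exact preprocessing:

```python
import cv2
import numpy as np
import torch

def grab_clip(cap, num_frames=24, size=224):
    """Collect `num_frames` consecutive frames and stack them into a
    (1, 3, T, H, W) tensor, the input layout a 3D CNN such as I3D expects."""
    frames = []
    while len(frames) < num_frames:
        ok, frame = cap.read()
        if not ok:
            raise RuntimeError("webcam frame could not be read")
        frame = cv2.cvtColor(cv2.resize(frame, (size, size)), cv2.COLOR_BGR2RGB)
        frames.append(frame.astype(np.float32) / 255.0)
    clip = np.stack(frames, axis=0).transpose(3, 0, 1, 2)  # (T, H, W, 3) -> (3, T, H, W)
    return torch.from_numpy(clip).unsqueeze(0)              # (1, 3, T, H, W)

# usage (illustrative):
#   cap = cv2.VideoCapture(0)
#   clip = grab_clip(cap, num_frames=24)   # 16 for training stacks, 24 at inference
```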
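
For the MobileNetV3 branch, the three input channels are three grayscale frames separated by the chosen time interval, so motion is encoded across channels. A minimal sketch of that idea; the indexing, resizing, and helper name are assumptions for illustration, not the project's dataset code:

```python
import cv2
import numpy as np
import torch

def temporal_stack(frames, t, interval=3, size=224):
    """Build a 3-channel image from grayscale frames at t - 2*interval, t - interval, and t."""
    picks = [frames[t - 2 * interval], frames[t - interval], frames[t]]
    channels = []
    for f in picks:
        gray = cv2.cvtColor(cv2.resize(f, (size, size)), cv2.COLOR_BGR2GRAY)
        channels.append(gray.astype(np.float32) / 255.0)
    img = np.stack(channels, axis=0)              # (3, H, W), one time step per channel
    return torch.from_numpy(img).unsqueeze(0)     # (1, 3, H, W)

# usage (illustrative): one stack per interval in (3, 5, 7, 9, 11, 13, 15) for training;
# at inference, the stride between consecutive stacks is 3 frames (per the README).
```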
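
The confusion matrix below is reported as a binary-class result. A minimal sketch of how the 11 action classes can be collapsed into touch / no-touch, assuming the class names listed in the Datasets section (the helper `to_binary` is hypothetical, not the project's code):

```python
# Touching actions as listed in the Datasets section; everything else counts as "no touch".
TOUCHING_ACTIONS = {
    "picking up phone", "resting chin on hand", "rubbing eyes",
    "touching hairs", "touching nose",
}

def to_binary(predicted_class: str) -> str:
    """Collapse an 11-class prediction into the binary touch / no-touch label."""
    return "touch" if predicted_class in TOUCHING_ACTIONS else "no touch"
```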

Experimental Results

Confusion matrix : binary-class



Limitations

  • DONT was started on 2020.03.05. We judged that calling for joint efforts through a fast release would be more valuable than first building a high-performance program, so we released the project despite its early stage.

How to donate your data

  • To build a more robust model for DONT, we need more data from different environments and people. If you would like to donate your data, please send it to namkugkim@gmail.com. Your privacy will be protected as strictly as possible.

Guideline for data donation

  • Please take a video and send it to the e-mail address above.
  • The recording process is as follows.
    • Wearing mask -> (With a mask) -> Touching nose -> Resting chin on hand -> Rubbing eyes -> Touching hairs -> Drinking water -> Touching phone -> Picking up phone -> Touching keyboard -> (Without a mask) -> Touching nose -> Resting chin on hand -> Rubbing eyes -> Touching hairs -> Drinking water -> Touching phone -> Picking up phone -> Touching keyboard
    • A suitable total recording time is about 90 seconds.
    • Example : Guideline for video recording

Project Contributors