# Dataset
Note: The database is publicly available for non-commercial use.

Please cite [Schuldt, Laptev and Caputo, Proc. ICPR'04, Cambridge, UK] if you use this database in your publications.

This database contains video sequences of six action classes, one archive per class:
* walking.zip (242 MB)
* jogging.zip (168 MB)
* running.zip (149 MB)
* boxing.zip (194 MB)
* handwaving.zip (218 MB)
* handclapping.zip (176 MB)

[Dataset link](http://www.nada.kth.se/cvap/actions/)


## Downloading Dataset
Remove the leading "!" from each command if you are running this script on your local machine.
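
If shell commands are not available, the same download and unpack steps can be approximated with the Python standard library. This is a minimal sketch rather than part of the original notebook: the helper names `archive_url`, `download_and_extract` and `download_all` are hypothetical, and it assumes the archive URLs used in this notebook are still reachable.

```python
import os
import urllib.request
import zipfile

# Base URL and category names as used elsewhere in this notebook.
BASE_URL = "http://www.nada.kth.se/cvap/actions/"
CATEGORIES = ["boxing", "handclapping", "handwaving",
              "jogging", "running", "walking"]


def archive_url(category):
    """Build the download URL of one category archive (hypothetical helper)."""
    return BASE_URL + category + ".zip"


def download_and_extract(category, root="dataset"):
    """Download <category>.zip, unpack it into <root>/<category>, then delete the zip."""
    target_dir = os.path.join(root, category)
    os.makedirs(target_dir, exist_ok=True)
    zip_path = category + ".zip"
    urllib.request.urlretrieve(archive_url(category), zip_path)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target_dir)
    os.remove(zip_path)


def download_all(root="dataset"):
    """Fetch every category archive plus the 00sequences.txt annotation file."""
    os.makedirs(root, exist_ok=True)
    os.makedirs("data", exist_ok=True)
    for category in CATEGORIES:
        download_and_extract(category, root)
    urllib.request.urlretrieve(BASE_URL + "00sequences.txt",
                               os.path.join(root, "00sequences.txt"))
```

Calling `download_all()` once should leave the same `dataset/` and `data/` layout that the shell commands produce. Nothing is downloaded until the function is called.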

In [0]:
!mkdir dataset
!mkdir data

In [2]:
!wget http://www.nada.kth.se/cvap/actions/boxing.zip
!wget http://www.nada.kth.se/cvap/actions/handclapping.zip
!wget http://www.nada.kth.se/cvap/actions/handwaving.zip
!wget http://www.nada.kth.se/cvap/actions/jogging.zip
!wget http://www.nada.kth.se/cvap/actions/running.zip
!wget http://www.nada.kth.se/cvap/actions/walking.zip
!wget http://www.nada.kth.se/cvap/actions/00sequences.txt -P dataset

!unzip boxing.zip -d dataset/boxing
!unzip handclapping.zip -d dataset/handclapping
!unzip handwaving.zip -d dataset/handwaving
!unzip jogging.zip -d dataset/jogging
!unzip running.zip -d dataset/running
!unzip walking.zip -d dataset/walking

!rm *.zip

--2019-11-25 21:15:43--  http://www.nada.kth.se/cvap/actions/boxing.zip
Resolving www.nada.kth.se (www.nada.kth.se)... 130.237.227.116
Connecting to www.nada.kth.se (www.nada.kth.se)|130.237.227.116|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/zip]
Saving to: ‘boxing.zip’

boxing.zip              [       <=>          ] 185.49M  18.5MB/s    in 9.1s    

2019-11-25 21:15:52 (20.4 MB/s) - ‘boxing.zip’ saved [194498294]

--2019-11-25 21:15:54--  http://www.nada.kth.se/cvap/actions/handclapping.zip
Resolving www.nada.kth.se (www.nada.kth.se)... 130.237.227.116
Connecting to www.nada.kth.se (www.nada.kth.se)|130.237.227.116|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/zip]
Saving to: ‘handclapping.zip’

handclapping.zip        [  <=>               ] 168.71M  23.6MB/s    in 7.6s    

2019-11-25 21:16:02 (22.3 MB/s) - ‘handclapping.zip’ saved [176901831]

--2019-11-25 21:16:03--  http://ww

In [3]:
import cv2
cv2.__version__

'3.4.3'

In [0]:
import numpy as np
import os
import pickle
import PIL
import re
from PIL import Image
import imageio

In [5]:
!pip install scipy==1.1.0

Collecting scipy==1.1.0
  Downloading https://files.pythonhosted.org/packages/a8/0b/f163da98d3a01b3e0ef1cab8dd2123c34aee2bafbb1c5bffa354cc8a1730/scipy-1.1.0-cp36-cp36m-manylinux1_x86_64.whl (31.2MB)
     |████████████████████████████████| 31.2MB 76kB/s 
ERROR: albumentations 0.1.12 has requirement imgaug<0.2.7,>=0.2.5, but you'll have imgaug 0.2.9 which is incompatible.
Installing collected packages: scipy
  Found existing installation: scipy 1.3.2
    Uninstalling scipy-1.3.2:
      Successfully uninstalled scipy-1.3.2
Successfully installed scipy-1.1.0


In [0]:
from scipy.misc.pilutil import imresize

In [0]:
CATEGORIES = [
    "boxing",
    "handclapping",
    "handwaving",
    "jogging",
    "running",
    "walking"
]

## Splitting Dataset
The dataset is divided by person ID according to the instructions at:

http://www.nada.kth.se/cvap/actions/00sequences.txt


In [0]:
TRAIN_PEOPLE_ID = [11, 12, 13, 14, 15, 16, 17, 18]
DEV_PEOPLE_ID = [19, 20, 21, 23, 24, 25, 1, 4]
TEST_PEOPLE_ID = [22, 2, 3, 5, 6, 7, 8, 9, 10]
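
A quick sanity check can confirm that the three ID lists do not overlap and together cover every subject. The sketch below is an addition, not part of the original notebook: `check_split` is a hypothetical helper, and it assumes subjects are numbered 1 to 25, which matches the IDs listed above.

```python
TRAIN_PEOPLE_ID = [11, 12, 13, 14, 15, 16, 17, 18]
DEV_PEOPLE_ID = [19, 20, 21, 23, 24, 25, 1, 4]
TEST_PEOPLE_ID = [22, 2, 3, 5, 6, 7, 8, 9, 10]


def check_split(train, dev, test, n_people=25):
    """Return (overlap, missing) for a three-way split of person IDs.

    overlap: IDs that appear in more than one split.
    missing: IDs in 1..n_people (assumed numbering) that appear in no split.
    """
    train, dev, test = set(train), set(dev), set(test)
    overlap = (train & dev) | (train & test) | (dev & test)
    missing = set(range(1, n_people + 1)) - (train | dev | test)
    return overlap, missing


overlap, missing = check_split(TRAIN_PEOPLE_ID, DEV_PEOPLE_ID, TEST_PEOPLE_ID)
print("overlapping IDs:", sorted(overlap))
print("missing IDs:", sorted(missing))
```

Both lists should be empty for a clean split. Splitting by person rather than by video keeps the same subject from appearing in both training and evaluation data.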

## Preparing Dataset


In [0]:
def prepare_dataset(dataset="train", sequences=None):
    if dataset == "train":
        ID = TRAIN_PEOPLE_ID
    elif dataset == "dev":
        ID = DEV_PEOPLE_ID
    else:
        ID = TEST_PEOPLE_ID

    if sequences is None:
        frames_idx = clean_sequence_file()
    else:
        frames_idx = sequences

    data = []
    
    for category in CATEGORIES:
        # Get all files in current category's folder.
        folder_path = os.path.join("", "dataset", category)
        filenames = sorted(os.listdir(folder_path))

        for filename in filenames:
            filepath = os.path.join("", "dataset", category, filename)

            # Get id of person in this video.
            person_id = int(filename.split("_")[0][6:])
            if person_id not in ID:
                continue

            vid = imageio.get_reader(filepath, "ffmpeg")

            frames = []

            # Add each frame to correct list.
            for i, frame in enumerate(vid):
                # Boolean flag to check if current frame contains human.
                ok = False
                for seg in frames_idx[filename]:
                    if i >= seg[0] and i <= seg[1]:
                        ok = True
                        break
                if not ok:
                    continue

                # Convert to grayscale; the result is a (height, width) uint8
                # array, i.e. (120, 160) for these videos.
                frame = np.array(Image.fromarray(np.array(frame)).convert("L"),
                                 dtype=np.uint8)
                #frame = imresize(frame, (60, 80))

                frames.append(frame)

            data.append({
                "filename": filename,
                "category": category,
                "frames": frames    
            })

    # Use a context manager so the file is flushed and closed after writing.
    with open("data/%s.p" % dataset, "wb") as f:
        pickle.dump(data, f)
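
The person ID is read from the filename by slicing, which relies on every file starting with `person`. A regular expression makes that assumption explicit. The sketch below is illustrative only: `parse_filename` is a hypothetical helper, and the pattern assumes names of the form `person01_boxing_d1_uncomp.avi`, as implied by the parsing code in this notebook.

```python
import re

# Assumed naming scheme: person<ID>_<category>_d<scenario>_uncomp.avi
FILENAME_PATTERN = re.compile(r"^person(\d+)_([a-z]+)_d(\d+)_uncomp\.avi$")


def parse_filename(filename):
    """Return (person_id, category, scenario) or None if the name does not match."""
    match = FILENAME_PATTERN.match(filename)
    if match is None:
        return None
    person, category, scenario = match.groups()
    return int(person), category, int(scenario)


print(parse_filename("person01_boxing_d1_uncomp.avi"))
print(parse_filename("README.txt"))
```

Returning `None` for unexpected names lets a caller skip stray files instead of failing on `int()`.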

In [0]:
def clean_sequence_file():
    print("Cleaning dataset/00sequences.txt ...")

    # Read 00sequences.txt file.
    with open('dataset/00sequences.txt', 'r') as content_file:
        # Skip the first 20 lines of the sequences file (instruction details).
        for _ in range(20):
            next(content_file)
        content = content_file.read()

    # Replace tab and newline character with space, then split file's content
    # into strings.
    content = re.sub("[\t\n]", " ", content).split()

    # Dictionary to keep ranges of frames with humans.
    # Example:
    # video "person01_boxing_d1": [(1, 95), (96, 185), (186, 245), (246, 360)].
    frames_idx = {}

    # Current video that we are parsing.
    current_filename = ""

    for s in content:
        if s == "frames":
            # Ignore this token.
            continue
        elif s.find("-") >= 0:
            # This is the token we are looking for. e.g. 1-95.
            if s.endswith(","):
                # Remove trailing comma.
                s = s[:-1]

            # Split into 2 numbers => [1, 95]
            idx = s.split("-")

            # Add to dictionary.
            if current_filename not in frames_idx:
                frames_idx[current_filename] = []
            frames_idx[current_filename].append((int(idx[0]), int(idx[1])))
        else:
            # Parse next file.
            current_filename = s + "_uncomp.avi"

    return frames_idx
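
To illustrate what the parser produces, the sketch below handles a single annotation line in isolation. It is not the notebook's implementation: `parse_sequence_line` and `frame_in_ranges` are hypothetical helpers, and the sample line is assumed to follow the `<video name> frames <start>-<end>, ...` layout that the token handling above expects.

```python
import re


def parse_sequence_line(line):
    """Parse one annotation line into (filename, [(start, end), ...]).

    Returns None when the line does not look like '<name> frames <ranges>'.
    """
    tokens = line.split()
    if len(tokens) < 3 or tokens[1] != "frames":
        return None
    filename = tokens[0] + "_uncomp.avi"
    ranges = [(int(start), int(end))
              for start, end in re.findall(r"(\d+)-(\d+)", " ".join(tokens[2:]))]
    return filename, ranges


def frame_in_ranges(index, ranges):
    """True if the frame index falls inside any inclusive (start, end) range."""
    return any(start <= index <= end for start, end in ranges)


sample = "person01_boxing_d1\t\tframes\t1-95, 96-185, 186-245, 246-360"
name, ranges = parse_sequence_line(sample)
print(name, ranges)
print(frame_in_ranges(95, ranges), frame_in_ranges(361, ranges))
```

The range check mirrors the inclusive comparison used when frames are filtered in `prepare_dataset`.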

In [21]:
extracted_sequences = clean_sequence_file()
print("Preparing train dataset ...")
prepare_dataset(dataset="train", sequences=extracted_sequences)
print("Preparing dev dataset ...")
prepare_dataset(dataset="dev", sequences=extracted_sequences)
print("Preparing test dataset ...")
prepare_dataset(dataset="test", sequences=extracted_sequences)

Cleaning dataset/00sequences.txt ...
Preparing train dataset ...
Preparing dev dataset ...
Preparing test dataset ...
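
Once the pickles are written, they can be read back and summarized before training. The following is a minimal sketch under the assumption that each file holds the list of `{"filename", "category", "frames"}` dictionaries built by `prepare_dataset`; `load_split` and `summarize` are hypothetical helper names.

```python
import pickle
from collections import Counter


def load_split(path):
    """Load one prepared split, e.g. 'data/train.p'."""
    with open(path, "rb") as f:
        return pickle.load(f)


def summarize(data):
    """Count videos and kept frames per category."""
    videos = Counter()
    frames = Counter()
    for item in data:
        videos[item["category"]] += 1
        frames[item["category"]] += len(item["frames"])
    return videos, frames
```

For example, `summarize(load_split("data/train.p"))` should return two counters keyed by category, which makes it easy to spot a category with no videos or no kept frames.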
