## Auxiliary tool for splitting the train/test data sets for TRAFFIC ACCIDENT DETECTION

In [3]:
from pathlib import Path
import shutil

The goal of this tool is to split the synthetic **test set** into 2 parts:
- *Train set of anomalies (accidents)*: Merged into the train set, which requires restructuring the train set directory.
- *Test set of nominals (normal)*: Remains in the current directory.
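
The split rule can be sketched as a pure function over file names, which makes it easy to check before any file is touched. This is only a minimal sketch: it assumes feature files are named `<scene>_<index>.npy`, which is inferred from the way the stems are parsed in this notebook, and `split_by_index` and its `cutoff` parameter are hypothetical names introduced for illustration.

```python
from pathlib import Path


def split_by_index(paths, cutoff=40):
    """Partition paths into (to_train, stay_in_test) by the numeric index in the stem.

    Assumes stems look like "<scene>_<index>"; this naming is an assumption.
    """
    to_train, stay_in_test = [], []
    for path in paths:
        path = Path(path)
        index = int(path.stem.split("_")[1])
        (to_train if index <= cutoff else stay_in_test).append(path)
    return to_train, stay_in_test


# hypothetical file names, for illustration only
example = [Path("scene01_001.npy"), Path("scene01_040.npy"), Path("scene01_041.npy")]
moved, kept = split_by_index(example)
print([p.name for p in moved])
print([p.name for p in kept])
```

Keeping the rule separate from the file moves means the cutoff can be changed or tested without touching the data set.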

In [13]:
# assign the train/test directories
train_dir = "../datasets/CTAD/features/train"  # mind the relative path
test_dir = "../datasets/CTAD/features/test"

##### Splitting test features

In [14]:
# get all normal features from the train set
normal_features = sorted(Path(train_dir).glob("*.npy"))

# get all anomaly features from the test set
anomaly_features = sorted(Path(test_dir).glob("*.npy"))

# select the first 40 instances of each scene (index <= 40) for the train set
# (assumes file stems look like "<scene>_<index>")
feat_to_move = [path for path in anomaly_features if int(path.stem.split('_')[1]) <= 40]

##### Statistics

In [16]:
total, merge = len(anomaly_features), len(feat_to_move)
print("Total features num: {}, features to be moved: {}".format(total, merge))

print("Total normal features: {}".format(len(normal_features)))

Total features num: 4782, features to be moved: 2411
Total normal features: 3473
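
A per-scene breakdown may help confirm that the cutoff selects the expected number of instances from each scene, which the totals alone do not show. The sketch below assumes stems of the form `<scene>_<index>`; `count_per_scene` is a hypothetical helper, and the example file names are made up.

```python
from collections import Counter
from pathlib import Path


def count_per_scene(paths):
    """Count feature files per scene, taking the scene as the text before the first underscore."""
    return Counter(Path(p).stem.split("_")[0] for p in paths)


# hypothetical file names, for illustration only
example = ["a_1.npy", "a_2.npy", "b_1.npy"]
counts = count_per_scene(example)
for scene, n in sorted(counts.items()):
    print(f"{scene}: {n}")
```

In the notebook, the same helper could be called on `feat_to_move` and on `anomaly_features` to compare the two distributions.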


##### Reconstruct train directory:

- train
  - normal
  - anomaly

In [21]:
# Make new directories

train_anomaly = Path(train_dir) / 'anomaly'
train_anomaly.mkdir(parents=True, exist_ok=True)

train_normal = Path(train_dir) / 'normal'
train_normal.mkdir(parents=True, exist_ok=True)

# Move normal features into train_normal directory
for path in normal_features:
    # print(f"src: {str(path)}, dest: {str(train_normal / path.name)}")
    path.rename(train_normal / path.name)

In [23]:
# Merge anomaly features into train_anomaly directory
for path in feat_to_move:
    # print(f"src: {str(path)}, dest: {str(train_anomaly / path.name)}")
    path.rename(train_anomaly / path.name)
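
On Unix, `Path.rename` silently replaces an existing file at the destination, so a more defensive variant can check for collisions before moving. The sketch below is one possible approach, not the method used above: `safe_move` is a hypothetical helper built on `shutil.move`, and the demonstration runs in a temporary directory rather than on the real data set.

```python
import shutil
import tempfile
from pathlib import Path


def safe_move(src, dest_dir):
    """Move src into dest_dir, refusing to overwrite an existing file."""
    src, dest_dir = Path(src), Path(dest_dir)
    dest = dest_dir / src.name
    if dest.exists():
        raise FileExistsError(f"refusing to overwrite {dest}")
    dest_dir.mkdir(parents=True, exist_ok=True)
    shutil.move(str(src), str(dest))
    return dest


# demonstration on a throwaway directory, not on the real data set
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "scene_001.npy"
    src.write_bytes(b"")
    moved = safe_move(src, Path(tmp) / "anomaly")
    moved_ok = moved.exists() and not src.exists()
    print(moved_ok)
```

`shutil.move` also works when source and destination are on different file systems, where a plain rename can fail.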

### Done