# Getting started with **Time2Feat**

Note: you can run **[this notebook live in Google Colab](https://colab.research.google.com/github/softlab-unimore/time2feat/blob/master/demo.ipynb)**

Clone the **time2feat** repository and install its requirements in the Colab environment

Remember to **🔴RESTART RUNTIME ON COLAB**❗🔴 once the installation has finished

In [1]:
!git clone --quiet https://github.com/softlab-unimore/time2feat.git
!pip install -q -r time2feat/requirements.txt


**🔴RESTART RUNTIME ON COLAB**❗🔴

Import standard and third-party libraries

In [1]:
import os
import sys
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import adjusted_mutual_info_score

Import **time2feat** functions

In [2]:
sys.path.append('./time2feat')

In [3]:
from t2f.dataset import read_ucr_dataset
from t2f.extractor import feature_extraction
from t2f.importance import feature_selection
from t2f.clustering import ClusterWrapper

### Params

In [4]:
# Input folder
data_dir = 'time2feat/data/Cricket'

# Model params
transform_type = 'minmax'
model_type = 'Hierarchical'

# Performance params
train_size = 0  # Fraction of labelled series used for semi-supervision (0 = fully unsupervised)
batch_size = 500
p = 1

In [None]:
# Simple consistency check (only data_dir is defined in this notebook)
if not os.path.isdir(data_dir):
    raise ValueError("Dataset folder doesn't exist: {}".format(data_dir))

if train_size < 0 or train_size > 1:
    raise ValueError('Train size must be between 0 and 1')
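
The `transform_type = 'minmax'` setting presumably refers to min-max scaling of the feature matrix before clustering. That reading is an assumption, and the helper below (`minmax_scale_columns`, a hypothetical name) is not part of time2feat. A minimal NumPy sketch of the technique:

```python
import numpy as np

def minmax_scale_columns(x):
    """Rescale each column of x to [0, 1]; constant columns map to 0."""
    x = np.asarray(x, dtype=float)
    col_min = x.min(axis=0)
    col_range = x.max(axis=0) - col_min
    # Avoid division by zero for constant columns
    col_range[col_range == 0] = 1.0
    return (x - col_min) / col_range

# Toy feature matrix: 3 samples, 3 features on very different scales
features = np.array([[1.0, 10.0, 5.0],
                     [2.0, 20.0, 5.0],
                     [3.0, 40.0, 5.0]])
scaled = minmax_scale_columns(features)
print(scaled)
```

After scaling, every feature contributes on a comparable range, which matters for distance-based clustering.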

### Read dataset

In [5]:
print('Read ucr dataset: ', data_dir)
ts_list, y_true = read_ucr_dataset(path=data_dir)
n_clusters = len(set(y_true))  # Get number of clusters to find

print('Dataset shape: {}, Num of clusters: {}'.format(ts_list.shape, n_clusters))

labels = {}
if train_size > 0:
    # Extract a subset of labelled multivariate time series (MTS) to train the semi-supervised model
    idx_train, _, y_train, _ = train_test_split(np.arange(len(ts_list)), y_true, train_size=train_size)
    labels = {i: j for i, j in zip(idx_train, y_train)}
    print('Number of Labels: {}'.format(len(labels)))

Read ucr dataset:  time2feat/data/Cricket
Dataset shape: (180, 1197, 6), Num of clusters: 12
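
To make the structure of `labels` concrete, here is a small sketch on synthetic data. The toy array, the `train_size=0.2` value, and the fixed `random_state` are illustrative assumptions, not settings from the Cricket run. It mirrors the split in the cell above: `labels` maps the index of each labelled series to its class, and stays empty when `train_size` is 0.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic ground truth: 20 series, 4 classes (illustrative only)
toy_y = np.repeat(np.arange(4), 5)

# Keep 20% of the indices as labelled examples; random_state fixed for reproducibility
idx_train, _, y_train, _ = train_test_split(
    np.arange(len(toy_y)), toy_y, train_size=0.2, random_state=0
)

# Map each labelled series index to its class id
toy_labels = {int(i): int(j) for i, j in zip(idx_train, y_train)}
print('Number of Labels: {}'.format(len(toy_labels)))
print(toy_labels)
```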


### Feature extraction

In [6]:
print('Feature extraction')
df_features = feature_extraction(ts_list, batch_size, p)
print('Number of extracted features: {}'.format(df_features.shape[1]))

Feature extraction


Feature Extraction: 100%|██████████| 1080/1080 [29:36<00:00,  1.64s/it]


Number of extracted features: 4854


### Feature selection

In [7]:
print('Feature selection')
context = {'model_type': model_type, 'transform_type': transform_type}
top_features = feature_selection(df_features, labels, context)
df_features = df_features[top_features]
print('Number of selected features: {}'.format(df_features.shape[1]))

Feature selection
Number of selected features: 116


### Clustering

In [9]:
print('Clustering')
model = ClusterWrapper(n_clusters=n_clusters, model_type=model_type, transform_type=transform_type)
y_pred = model.fit_predict(df_features.values)

print('AMI: {:0.4f}'.format(adjusted_mutual_info_score(y_true, y_pred)))

Clustering
AMI: 0.9312
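
Adjusted mutual information compares two partitions and ignores how the cluster ids are named, which is why the arbitrary ids returned by a clustering model can be scored against `y_true` directly. A small sketch with toy labels (illustrative values, not taken from the run above):

```python
from sklearn.metrics import adjusted_mutual_info_score

toy_true = [0, 0, 1, 1, 2, 2]
# Same grouping as toy_true, but with the cluster ids permuted
toy_pred_perfect = [2, 2, 0, 0, 1, 1]
# Every sample in a single cluster: carries no information about the classes
toy_pred_single = [0, 0, 0, 0, 0, 0]

ami_perfect = adjusted_mutual_info_score(toy_true, toy_pred_perfect)
ami_single = adjusted_mutual_info_score(toy_true, toy_pred_single)
print('perfect: {:0.4f}, single cluster: {:0.4f}'.format(ami_perfect, ami_single))
```

A score near 1 means the predicted grouping matches the ground truth up to renaming, while a score near 0 means it is no better than chance.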
