Functional Time Series Clustering (FTSC) is a nonparametric clustering algorithm for time series data. Unlike k-means, PCA-based approaches, and many other clustering methods, FTSC handles data with missing values naturally. FTSC is also flexible: it works well even when the groups in a dataset have heterogeneous structures, because it takes within-group variability into account.
This is a summer research project I took part in at the Department of Biostatistics, Epidemiology and Informatics in the Perelman School of Medicine, University of Pennsylvania.
I am very fortunate to be co-advised by Prof. J. Richard Landis and Prof. Wensheng Guo. Without their detailed guidance and caring support, I wouldn't have finished the whole project.
Here is a comparison of the classification rates of FTSC, FunHDDC, and k-means.
FunHDDC is a popular model-based clustering method.
The data consist of 3 groups with different time series structures, each with 50 subjects.
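Cluster labels are arbitrary, so a classification rate is usually computed after matching cluster labels to the true groups. The sketch below shows one common way to do this in Python, by trying every relabeling and keeping the best match. Whether the rates reported here were computed this way is an assumption, and the function name `classification_rate` and the example labels are hypothetical, not results from this project.

```python
from itertools import permutations


def classification_rate(true_labels, cluster_labels):
    """Fraction of subjects correctly grouped under the best relabeling of clusters."""
    true_ids = sorted(set(true_labels))
    cluster_ids = sorted(set(cluster_labels))
    if len(cluster_ids) > len(true_ids):
        raise ValueError("more clusters than true groups; cannot match one-to-one")
    best_hits = 0
    # Try every one-to-one assignment of cluster ids to true group ids.
    for perm in permutations(true_ids, len(cluster_ids)):
        mapping = dict(zip(cluster_ids, perm))
        hits = sum(mapping[c] == t for t, c in zip(true_labels, cluster_labels))
        best_hits = max(best_hits, hits)
    return best_hits / len(true_labels)


# Hypothetical example: 3 groups of 50 subjects each.
truth = [0] * 50 + [1] * 50 + [2] * 50
# Made-up clustering output with different label names and 3 misassigned subjects.
found = [2] * 50 + [0] * 48 + [1] * 2 + [1] * 49 + [2] * 1
rate = classification_rate(truth, found)
print(f"classification rate: {rate:.2%}")
```

Trying all permutations is fine for 3 clusters; for many clusters an assignment solver would scale better.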
Visualization of the clustering results from FTSC:
The Symptom and Health Care Utilization Questionnaire, denoted SYM-Q5, is a self-rated total severity score of overall urologic or pelvic pain symptoms. An increasing SYM-Q5 score indicates that the overall urologic or pelvic symptoms are worsening.
Clustering patients by their SYM-Q5 scores over a period of time can help hospitals evaluate the effectiveness of treatment and concentrate resources on the patients who are getting worse.
We apply FTSC to the SYM-Q5 data from 397 patients with the number of clusters set to 3. The results are:
- 110 (27.5%) getting better (improving, symptom change scores decrease over time);
- 126 (31.7%) remaining stable (symptom change scores vary around 0);
- 161 (40.8%) worsening (symptom change scores increase over time).
This version includes:
- a numerically stable filtering and smoothing algorithm for time-variant state space models
- a functional mixed effects model for periodic data
- support for missing data
- functional clustering of time series data
- selection of the optimal number of clusters through estimated Kullback-Leibler divergence
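
To illustrate why a state space formulation accommodates missing data naturally, here is a minimal Python sketch of a Kalman filter for a scalar local-level model. It is not the algorithm implemented in this project: the model, the function name `kalman_filter_local_level`, the variance values, and the example scores are all assumptions chosen for illustration, and the plain covariance update used here is not the numerically stable form. The point is only that a missing observation skips the update step while the prediction step still runs.

```python
def kalman_filter_local_level(y, obs_var, state_var, m0=0.0, p0=1e6):
    """Filter the model y_t = mu_t + e_t, mu_t = mu_{t-1} + w_t.

    Entries of y that are None are treated as missing observations.
    Returns the filtered means and variances of mu_t.
    """
    means, variances = [], []
    m, p = m0, p0
    for obs in y:
        # Prediction step: the state mean carries over, uncertainty grows.
        p = p + state_var
        if obs is not None:
            # Update step, run only when an observation is available.
            gain = p / (p + obs_var)
            m = m + gain * (obs - m)
            p = (1.0 - gain) * p
        means.append(m)
        variances.append(p)
    return means, variances


# Hypothetical score series with two missing visits.
scores = [1.0, 1.2, None, None, 1.9, 2.1]
means, variances = kalman_filter_local_level(scores, obs_var=0.25, state_var=0.1)
for t, (m, v) in enumerate(zip(means, variances)):
    print(t, round(m, 3), round(v, 3))
```

During the missing visits the filtered mean stays at its last value and the variance increases, then shrinks again once a new observation arrives.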