A generalizable deep learning pipeline for analyzing sequential signals through recurrence quantification analysis (RQA) and metric learning. Originally developed for character-level linguistic biomarkers in dementia detection (95.9% AUC), this methodology can be applied to any sequential signal data.
This repository implements a methodology that transforms sequential signals into visual recurrence patterns, then learns discriminative embeddings from those patterns using deep metric learning. The approach is domain-agnostic: it has so far been validated on speech data, and it is designed to extend to:
- Biomedical Signals: ECG, EEG, EMG, speech patterns
- Financial Time Series: Stock prices, trading patterns
- Industrial Sensors: Manufacturing quality control, predictive maintenance
- Behavioral Data: User interaction sequences, activity recognition
- Natural Language: Character or word-level text analysis
The pipeline consists of four key stages:
- Signal preprocessing and embedding
  - Converts sequential data into fixed-length representations
  - Supports custom tokenization/embedding strategies
  - Handles variable-length sequences with padding
- Recurrence plot generation
  - Transforms signal embeddings into visual recurrence matrices
  - Captures temporal dynamics and self-similarity patterns
  - Uses Euclidean distance with adaptive epsilon thresholding
- Deep metric learning (Siamese network)
  - Learns discriminative embeddings through contrastive loss
  - Trains on pairs of similar and dissimilar samples
  - Uses a CNN-based architecture for feature extraction
- Classification
  - Uses learned embeddings for downstream tasks
  - Supports any classifier (XGBoost, Random Forest, SVM, etc.)
  - Includes cross-validation and performance metrics
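The recurrence and metric-learning stages described above can be illustrated with a minimal NumPy sketch. It is not the package's implementation: the function names `recurrence_matrix` and `contrastive_loss` are hypothetical, and choosing epsilon as a quantile of the pairwise distances is only one assumed way to make the threshold adaptive. The loss shown is the common contrastive form (squared distance for similar pairs, squared hinge on the margin for dissimilar pairs).

```python
import numpy as np


def recurrence_matrix(embedded, epsilon=None, quantile=0.1):
    """Binary recurrence matrix: R[i, j] = 1 when ||x_i - x_j|| <= epsilon.

    embedded: array of shape (T, d) with T >= 2, one vector per time step.
    epsilon:  distance threshold. If None, it is set to the given quantile
              of the off-diagonal pairwise distances (an assumed heuristic,
              not necessarily what the package does).
    """
    x = np.asarray(embedded, dtype=float)
    # Pairwise Euclidean distances via broadcasting, shape (T, T).
    diff = x[:, None, :] - x[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    if epsilon is None:
        off_diagonal = dist[~np.eye(len(x), dtype=bool)]
        epsilon = float(np.quantile(off_diagonal, quantile))
    return (dist <= epsilon).astype(np.uint8), epsilon


def contrastive_loss(emb_a, emb_b, same, margin=1.0):
    """Mean contrastive loss over a batch of embedding pairs.

    same: 1 for similar pairs, 0 for dissimilar pairs.
    Similar pairs are penalised by their squared distance; dissimilar
    pairs are penalised only when they are closer than the margin.
    """
    a = np.asarray(emb_a, dtype=float)
    b = np.asarray(emb_b, dtype=float)
    same = np.asarray(same, dtype=float)
    d = np.linalg.norm(a - b, axis=1)
    similar_term = same * d ** 2
    dissimilar_term = (1.0 - same) * np.maximum(margin - d, 0.0) ** 2
    return float(np.mean(similar_term + dissimilar_term))


# Toy example: a one-dimensional sine wave treated as a (T, 1) embedding.
signal = np.sin(np.linspace(0, 4 * np.pi, 50))[:, None]
R, eps = recurrence_matrix(signal)
print(R.shape, round(eps, 4))
```

The matrix `R` is what would be rendered as an image and passed to the CNN; a smaller `quantile` gives a sparser plot, a larger one a denser plot.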
# Clone the repository
git clone https://github.com/jkevin2010/signal2recurrence.git
cd signal2recurrence

# Install dependencies
pip install -r requirements.txt

from signal2recurrence import SignalPipeline
import pandas as pd

# Load your sequential data
# Format: DataFrame with 'signal' and 'label' columns
data = pd.read_csv('your_data.csv')

# Initialize pipeline
pipeline = SignalPipeline(
    embedding_dim=32,
    max_sequence_length=None,  # Auto-detect
    recurrence_epsilon=None,   # Auto-calculate
    image_size=(128, 128)
)

# Process signals and generate recurrence plots
pipeline.fit_transform(
    signals=data['signal'],
    labels=data['label'],
    save_plots=True,
    output_dir='recurrence_plots'
)

# Train deep metric learning model
pipeline.train_siamese_network(
    epochs=20,
    batch_size=16,
    validation_split=0.2
)

# Extract embeddings
embeddings = pipeline.get_embeddings()

# Train classifier
# NOTE: 'label_train' and 'label_test' are assumed to hold the labels for the
# pipeline's train/test split; they are not columns of the CSV described above.
from xgboost import XGBClassifier
classifier = XGBClassifier(random_state=42)
classifier.fit(embeddings['train'], data['label_train'])

# Evaluate
accuracy = classifier.score(embeddings['test'], data['label_test'])

signal2recurrence/
├── signal2recurrence/            # Main package
│   ├── __init__.py
│   ├── preprocessing.py          # Signal preprocessing & embedding
│   ├── recurrence.py             # Recurrence plot generation
│   ├── siamese.py                # Siamese network implementation
│   ├── pipeline.py               # End-to-end pipeline
│   └── utils.py                  # Utility functions
├── examples/                     # Usage examples
│   ├── speech_analysis.py        # Character-level speech example
│   ├── ecg_classification.py     # ECG signal example
│   └── custom_signals.py         # Generic signal template
├── tests/                        # Unit tests
├── notebooks/                    # Jupyter notebooks
│   └── demo.ipynb                # Interactive demo
├── requirements.txt
├── setup.py
├── LICENSE
└── README.md

from signal2recurrence.preprocessing import SignalPreprocessor

preprocessor = SignalPreprocessor(
    tokenization='character',     # 'character', 'word', 'custom'
    embedding_type='learned',     # 'learned', 'onehot', 'pretrained'
    embedding_dim=32,
    max_length=None,              # Auto-detect or specify
    padding='post',               # 'post' or 'pre'
    truncation='post'             # 'post' or 'pre'
)

from signal2recurrence.recurrence import RecurrencePlotGenerator

rp_generator = RecurrencePlotGenerator(
    epsilon=None,                 # Auto-calculate or specify
    distance_metric='euclidean',  # 'euclidean', 'cosine', 'manhattan'
    image_size=(128, 128),
    colormap='binary'
)

from signal2recurrence.siamese import SiameseNetwork

siamese = SiameseNetwork(
    input_shape=(128, 128, 1),
    base_filters=32,
    embedding_dim=128,
    learning_rate=0.001,
    margin=1.0                    # Contrastive loss margin
)

The original implementation achieved:
- ROC AUC: 95.9% (character-level linguistic biomarkers)
- Stratified 5-Fold CV: 0.9589 ± 0.0142
- Precision/Recall: Balanced across classes
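A stratified 5-fold ROC AUC evaluation of the kind reported above can be sketched with scikit-learn. This is an illustration only: the helper name `evaluate_embeddings` is hypothetical, the synthetic data from `make_classification` stands in for real learned embeddings, and a Random Forest is an assumed classifier choice rather than the one used in the original study.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score


def evaluate_embeddings(embeddings, labels, n_splits=5, seed=42):
    """Return mean and standard deviation of ROC AUC across stratified folds."""
    # Stratification keeps the class ratio roughly equal in every fold.
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    clf = RandomForestClassifier(n_estimators=100, random_state=seed)
    scores = cross_val_score(clf, embeddings, labels, cv=cv, scoring="roc_auc")
    return scores.mean(), scores.std()


# Synthetic stand-in for learned embeddings (200 samples, 16 dimensions).
X, y = make_classification(
    n_samples=200, n_features=16, n_informative=8, random_state=0
)
mean_auc, std_auc = evaluate_embeddings(X, y)
print(f"ROC AUC: {mean_auc:.4f} +/- {std_auc:.4f}")
```

Reporting the mean together with the standard deviation across folds, as in the figures above, shows how stable the score is across different splits of the data.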
Potential applications include:
- Early detection of cognitive decline
- Parkinson's disease voice analysis
- Sleep apnea detection from breathing patterns
- Anomaly detection in sensor data
- Predictive maintenance from vibration signals
- Quality control in manufacturing
- Fraud detection in transaction sequences
- Market regime classification
- Trading pattern recognition
If you use this methodology in your research, please cite:
@article{mekulu2025character,
  title={Character-Level Linguistic Biomarkers for Precision Assessment of Cognitive Decline: A Symbolic Recurrence Approach},
  author={Mekulu, Kevin and Aqlan, Faisal and Yang, Hui},
  journal={medRxiv},
  year={2025},
  doi={10.1101/2025.06.12.25329529},
  note={Preprint}
}

Published: June 13, 2025 (medRxiv preprint)
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
- Fork the repository
- Create your feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- Developed at Penn State University, Industrial Engineering Department
- Funded by NSF I-Corps ($50K)
- Forbes 30 Under 30 Healthcare Recognition
Kevin - jkevin2010.kj@gmail.com
Project Link: https://github.com/jkevin2010/signal2recurrence
Note: This is a research tool. For medical applications, consult with healthcare professionals and follow appropriate regulatory guidelines.