Behavioral data embeddings for the stratification of individuals with neurodevelopmental conditions.
Designed for observational measurements of cognition and behavior of individuals with Autism Spectrum Conditions (ASCs).
Python 3.6+
R 3.4+
The full list of required Python packages is available in the requirements.txt file.
All dependencies can be installed with:
$ pip install -r requirements.txt
A complete example of the behavioral phenotype stratification is available as a Jupyter notebook:
jupyter notebook behavioral_phenotyping_pipeline.ipynb
The code is structured into multiple modules (.py files) that implement the algorithms and methods for each step of the pipeline:
dataset.py: Connects to the database and dumps the data
features.py: Returns the vocabulary and dictionary of behavioral EHRs for each of the 4 possible depth levels. It also returns a dataset with quantitative scores for level 4 features
pt_embedding.py: Performs TF-IDF for patient embeddings; GloVe embeddings on words, averaged to obtain subject embeddings; Word2vec embeddings on words, averaged to output individual representations
clustering.py: Performs hierarchical clustering/k-means on the embeddings and on the quantitative level 4 features
visualization.py: Visualizes results (e.g. scatterplot and dendrogram) for sub-cluster visualization; heatmap for inspection of quantitative scores between sub-clusters
basic_statistics.py: Returns basic demographic statistics for dataset description
test-demog-cl.R: Runs multiple pairwise comparisons between subgroups to check for confounders and support clinical validation
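
The two embedding strategies described for pt_embedding.py (TF-IDF per subject, and averaging word vectors into a subject representation) can be illustrated with a minimal, dependency-free sketch. This is not the repository's implementation: the function names, the toy records, the term labels and the 2-dimensional word vectors are all hypothetical, and the smoothed IDF formula is one common choice among several.

```python
import math
from collections import Counter


def tfidf_embeddings(docs):
    """Return (vocabulary, vectors) with one TF-IDF vector per subject.

    docs maps a subject id to its list of behavioral terms.
    """
    vocab = sorted({term for terms in docs.values() for term in terms})
    n_docs = len(docs)
    # Document frequency: number of subjects whose record contains the term.
    df = Counter(term for terms in docs.values() for term in set(terms))
    # Smoothed inverse document frequency (an assumed, common variant).
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}
    vectors = {}
    for subject, terms in docs.items():
        counts = Counter(terms)
        total = len(terms) or 1  # avoid division by zero for empty records
        vectors[subject] = [counts[t] / total * idf[t] for t in vocab]
    return vocab, vectors


def average_word_vectors(terms, word_vectors):
    """Average the vectors of the known terms into one subject embedding."""
    known = [word_vectors[t] for t in terms if t in word_vectors]
    if not known:
        raise ValueError("no term has a word vector")
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


# Hypothetical toy records: subject id -> behavioral terms.
records = {
    "s1": ["ados::social", "ados::social", "vineland::motor"],
    "s2": ["vineland::motor", "wisc::verbal"],
}
vocab, tfidf = tfidf_embeddings(records)

# Hypothetical 2-d vectors standing in for trained GloVe/Word2vec output.
toy_vectors = {
    "ados::social": [1.0, 0.0],
    "vineland::motor": [0.0, 1.0],
    "wisc::verbal": [1.0, 1.0],
}
s2_embedding = average_word_vectors(records["s2"], toy_vectors)
```

In either case the result is one fixed-length vector per subject, which is the form that a clustering step such as hierarchical clustering or k-means expects as input.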