A desktop application for Supervised Semantic Differential (SSD) analysis.
Download | ssdiff core library
SSD finds interpretable semantic dimensions in text data that are associated with a continuous outcome variable or categorical group labels.
Given a corpus of texts with associated numeric scores or group memberships, SSD identifies the direction through word-embedding space that best explains variation in the outcome. The result is a semantic dimension with two interpretable poles, one associated with high outcomes and the other with low, complete with thematic clusters, example sentences, and statistical validation.
This application is the GUI frontend for the ssdiff Python package.
Pre-built binaries for Windows, Linux, and macOS are available on the Releases page. No Python installation required — just download the binary for your platform and run it.
The first startup will be quite slow because the app has to set itself up. On Windows you will also have to click through a couple of security warnings; the only way I could avoid this on my side is to pay Microsoft for a code-signing license, which I am not ready to do at the moment.
spaCy language models are downloaded automatically on first use.
Already have v1.0.0? Starting with this release, SSD automatically notifies you when a new version is available. Since v1.0.0 didn't have that feature yet, you'll need to download v1.1.0 manually one last time; after that, the app will alert you in-app whenever a newer release is out.
- Automatic update notifications — on startup the app silently checks GitHub for a newer release and shows a dismissible banner with a direct download link
- Export options — configure which columns appear in exported Word tables (cluster, regression, and pairwise tables) and how many top words are shown per cluster
- Delete saved runs — remove archived runs from the run selector with a confirmation prompt
- New appearance themes — two new light-mode themes (Crisp and Warm) for users who prefer lower-saturation palettes
- Outcome/Group column shown in run details — the Config tab now displays which outcome or group column was used for a run
SSD supports 23 languages via spaCy models (small, medium, and large variants available for each):
| Code | Language | Code | Language | Code | Language |
|---|---|---|---|---|---|
| ca | Catalan | hr | Croatian | pl | Polish |
| da | Danish | it | Italian | pt | Portuguese |
| de | German | ja | Japanese | ro | Romanian |
| el | Greek | ko | Korean | ru | Russian |
| en | English | lt | Lithuanian | sl | Slovenian |
| es | Spanish | mk | Macedonian | sv | Swedish |
| fr | French | nb | Norwegian | uk | Ukrainian |
| nl | Dutch | zh | Chinese | | |
SSD requires pre-trained word embeddings, which are not bundled with the application due to their size. You need to download an embedding file separately before running an analysis.
- GloVe (English) — Download GloVe 840B 300d (~2 GB) for the best coverage, or GloVe 6B for quick tests.
- fastText (157 languages) — Pre-trained word vectors for most languages. Download the `.bin` format.
- Polish distributional models — For Polish-language analyses.
- Custom — Train your own with gensim's Word2Vec or fastText and export as `.kv`.
| Format | Extension | Notes |
|---|---|---|
| gensim KeyedVectors | `.kv` | Fastest to load |
| word2vec binary | `.bin` | Standard binary format |
| Text | `.txt`, `.vec` | One word per line followed by floats |
| Compressed text | `.txt.gz`, `.vec.gz` | Gzip-compressed text |
Tip: The app offers to convert text-format embeddings to `.kv` on first load, which makes subsequent loads much faster.
- Clinical psychology — linking patient narratives to symptom severity or treatment outcomes
- Computational social science — analyzing survey responses across demographic groups
- Political communication — comparing rhetorical framing across party lines
- Psycholinguistics — discovering latent semantic dimensions in language production
- Three-stage guided workflow — Setup, Run, and Results, with validation at each step
- Two analysis modes — continuous outcome regression or categorical group comparison with permutation tests
- Two concept modes — full-document analysis or lexicon-focused with context windows
- Interactive lexicon builder — token suggestions ranked by correlation, per-token coverage statistics with quartile breakdowns
- Automated PCA sweep — elbow detection for optimal dimensionality, with manual override
- Cluster interpretation — K-means clustering of pole neighbors with coherence scores and representative snippets
- Snippet browser — real sentences from the data, organized by cluster or beta alignment, with full document context
- APA-formatted export — regression tables, pairwise comparisons, and cluster summaries as Word documents, with configurable column selection
- Comprehensive export — CSV scores, pole neighbors, PCA plots, configuration JSON, and a human-readable hyperparameters file
- Project system — save, reload, and delete analyses; run multiple analyses with different lexicons or settings
- Automatic update notifications — silent startup check against GitHub releases with a dismissible in-app banner
- Customizable appearance — multiple color themes (including light-mode options) and font size scaling
- In-app tutorial — navigable guide with table of contents
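The permutation test behind the group-comparison mode can be illustrated with a minimal sketch. This shows the general technique (shuffle the group labels many times and see how often the shuffled difference is as extreme as the observed one), not the app's exact procedure:

```python
import numpy as np

def permutation_test(scores_a, scores_b, n_perm=10_000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Shuffles group labels n_perm times and counts how often the
    shuffled mean difference is at least as extreme as the observed one.
    """
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([scores_a, scores_b])
    n_a = len(scores_a)
    observed = scores_a.mean() - scores_b.mean()
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = pooled[:n_a].mean() - pooled[n_a:].mean()
        if abs(diff) >= abs(observed):
            count += 1
    return (count + 1) / (n_perm + 1)  # add-one smoothing avoids p = 0

# Example with synthetic groups that clearly differ:
rng = np.random.default_rng(1)
p = permutation_test(rng.normal(1.0, 0.5, 60), rng.normal(0.0, 0.5, 60),
                     n_perm=2000)
print(f"p = {p:.4f}")
```

With well-separated groups the p-value bottoms out at 1 / (n_perm + 1), which is why the number of permutations bounds the smallest reportable p.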
- Python 3.10+
- Pre-trained word embeddings (see Word Embeddings above)
```bash
# Clone the repository
git clone https://github.com/hplisiecki/SSD_APP.git
cd SSD_APP

# Install dependencies
pip install -r requirements.txt

# Optional: Asian language support (Chinese, Japanese, Korean)
pip install spacy-pkuseg sudachipy sudachidict-core

# Run the application
python run.py
```

To build a standalone binary:

```bash
pyinstaller SSD.spec --clean --noconfirm
```

Configure the data, text processing, and embedding settings.
- Load dataset — import a CSV, TSV, or Excel file and select the text, ID, and outcome/group columns
- Validate — check for missing values and confirm the dataset is ready
- Preprocess — tokenize, lemmatize, and sentence-split texts using spaCy
- Load embeddings — load a pre-trained word-embedding file with optional L2 normalization and ABTT (All-But-The-Top) denoising; text-format files (`.txt`, `.vec`) can be auto-converted to `.kv` for faster future loading
- Choose analysis type — continuous outcome (regression) or group comparison (permutation test)
- Set hyperparameters — PCA sweep range, context window size, SIF weighting, clustering parameters, and more
A ready indicator shows which sections are complete before proceeding.
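The ABTT denoising option offered at the embedding-loading step can be sketched in a few lines of numpy. This follows the published algorithm (Mu & Viswanath, 2018), not necessarily the app's exact implementation:

```python
import numpy as np

def abtt(vectors: np.ndarray, n_components: int = 2) -> np.ndarray:
    """All-But-The-Top denoising of an embedding matrix.

    Centers the embeddings, then removes their top principal directions,
    which tend to encode word frequency rather than meaning.
    """
    centered = vectors - vectors.mean(axis=0)
    # Top principal directions via SVD of the centered matrix.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    top = vt[:n_components]                   # (n_components, dim)
    return centered - centered @ top.T @ top  # subtract the projections
```

After the transform, the embeddings have zero mean and zero variance along the removed directions; cosine similarities computed on the result are typically sharper for semantic tasks.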
Define the concept and execute the analysis.
- Lexicon mode — build a keyword list using the interactive lexicon builder with automated suggestions, coverage statistics, and per-token diagnostics
- Full-document mode — analyze entire texts with an optional custom stoplist
- Pre-flight review — a read-only summary of the full configuration with sanity checks (outcome variance, sample size, OOV rate)
- Run — executes the SSD pipeline: document embedding, PCA, beta estimation, pole extraction, clustering, and snippet collection
Explore and export the results across multiple tabs.
| Tab | Contents |
|---|---|
| Summary | R², F-statistic, p-value, standardized beta, effect sizes, sample counts |
| Clusters | Side-by-side positive/negative cluster tables with size, coherence, and top words |
| Poles | Ranked word lists for each pole with cosine similarities |
| Themes | Detailed cluster view with full member lists |
| Snippets | Real sentences organized by cluster or beta alignment with document context |
| Scores | Per-document table with cosine scores, predicted values, and true outcomes |
| PCA Sweep | Plot of fit criterion across K values with selected elbow (auto mode) |
| Config | Read-only snapshot of all settings used for the run |
Multiple runs can be saved and compared using the run selector. Results can be exported as:
- CSV — per-document scores, pole neighbors
- Word (.docx) — APA-formatted regression/comparison tables, cluster summaries, snippet tables
- PNG — PCA sweep plot
- JSON — full configuration snapshot
- TXT — human-readable hyperparameters file
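The exported CSV can be picked up directly in pandas for follow-up analysis. In this sketch a stand-in file replaces the real export, and the column names (`cosine_score`, `true_outcome`) are illustrative; check the header of your own export.

```python
import pandas as pd

# Stand-in for the app's per-document score export; in practice, replace
# this block with the CSV you exported from the Results stage.
pd.DataFrame({
    "cosine_score": [0.12, 0.31, 0.55, 0.72],
    "true_outcome": [1.0, 2.1, 2.9, 4.2],
}).to_csv("ssd_scores.csv", index=False)

# How well do the SSD cosine scores track the outcome?
scores = pd.read_csv("ssd_scores.csv")
r = scores["cosine_score"].corr(scores["true_outcome"])
print(f"Pearson r between SSD score and outcome: {r:.3f}")
```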
```
SSD_APP/
├── run.py                       # Application entry point
├── requirements.txt             # Python dependencies
├── SSD.spec                     # PyInstaller build configuration
│
└── ssdiff_gui/                  # Main package
    ├── main.py                  # App initialization
    ├── models/
    │   └── project.py           # Data models and configuration dataclasses
    ├── controllers/
    │   ├── ssd_runner.py        # SSD analysis execution
    │   └── export_controller.py # Result export (DOCX, CSV, PNG, JSON, TXT)
    ├── views/
    │   ├── main_window.py       # Main application window
    │   ├── stage1_setup.py      # Stage 1: Setup
    │   ├── stage2_concept.py    # Stage 2: Concept definition & run
    │   ├── stage3_results.py    # Stage 3: Results viewer
    │   ├── appearance_dialog.py # Theme and font customization
    │   ├── settings_dialog.py   # Application settings
    │   ├── tutorial_dialog.py   # In-app tutorial
    │   └── widgets/             # Reusable UI components
    ├── utils/
    │   ├── file_io.py           # Project save/load
    │   ├── validators.py        # Input validation
    │   └── worker_threads.py    # Background workers
    └── resources/
        ├── styles.qss           # Application stylesheet
        └── quotes.json          # Loading screen quotes
```
| Package | Purpose |
|---|---|
| PySide6 | GUI framework |
| ssdiff | Core SSD analysis engine |
| spaCy | Text preprocessing and lemmatization |
| gensim | Word embedding loading and management |
| scikit-learn | PCA, K-means clustering, metrics |
| pandas | Data manipulation |
| numpy / scipy | Numerical computation |
| python-docx | Word document generation |
| matplotlib / seaborn | Visualization |
If you use SSD in your research, please cite:
Plisiecki, H., Lenartowicz, P., Pokropek, A., Malyska, K., & Flakus, M. (2025). Measuring Individual Differences in Meaning: The Supervised Semantic Differential. PsyArXiv. https://doi.org/10.31234/osf.io/gvrsb_v1
```bibtex
@article{plisiecki2025ssd,
  title = {Measuring Individual Differences in Meaning: The Supervised Semantic Differential},
  author = {Plisiecki, Hubert and Lenartowicz, Pawe{\l} and Pokropek, Artur and Ma{\l}yska, Kinga and Flakus, Maria},
  year = {2025},
  journal = {PsyArXiv},
  doi = {10.31234/osf.io/gvrsb_v1},
  url = {https://doi.org/10.31234/osf.io/gvrsb_v1}
}
```

This project is licensed under the MIT License.
Some dependencies are distributed under the LGPL — see the LICENSES/ directory for details.
- File issues and feature requests on the repo’s Issues page.
- Pull requests welcome — especially for:
  - Documentation improvements
Contact: hplisiecki@gmail.com
This project was funded by the National Science Centre, Poland (grant no. 2020/38/E/HS6/00302).