
TMSC

TMSC (Topics Modeling on Source Code) is a command line application to discover the topics of a repository the user provides. A "topic" is a set of keywords, in this case source code identifiers, which typically occur together. This project has nothing to do with GitHub topics.
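To make the notion of a topic concrete, here is a toy sketch. It is not TMSC's actual model: it only assumes that a topic can be pictured as a mapping from identifiers to weights, and a repository as a bag of identifier counts. The topic names are borrowed from the sample output in this README, while the identifiers, weights, and the `score_topics` helper are hypothetical.

```python
from collections import Counter

# Hypothetical topics: each maps source code identifiers to illustrative weights.
TOPICS = {
    "SQL, working with databases": {"query": 1.0, "cursor": 0.8, "commit": 0.5},
    "Cryptography: libraries": {"cipher": 1.0, "hash": 0.7, "key": 0.6},
}


def score_topics(identifiers, topics=TOPICS):
    """Rank topics by the weighted count of their identifiers in the input."""
    bag = Counter(identifiers)  # identifier -> number of occurrences
    scores = {
        name: sum(weight * bag[ident] for ident, weight in weights.items())
        for name, weights in topics.items()
    }
    # Highest-scoring topic first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Identifiers that might be extracted from a small database-heavy project.
ranked = score_topics(["query", "cursor", "query", "key"])
```

In this sketch a topic scores highly when many of its identifiers appear together in the same repository, which is the intuition behind "keywords which typically occur together".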

$ tmsc
                Parallel and distributed processing - General IT	4.43
                Machine Learning, sklearn-like APIs - General IT	3.87
               Java/JS + async + JSON serialization - General IT	3.58
                Java string input/output - Programming languages	3.29
                            Cryptography: libraries - General IT	3.23
                        SQL, working with databases - General IT	3.11
                          Java: Spring, Hibernate - Technologies	3.09
                              Operations on numbers - General IT	2.98
                               Distributed clusters - General IT	2.62
           Functional programming, Scala - Programming languages	2.60
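The listing above appears to be right-aligned topic labels followed by a tab and a score. Assuming that layout holds, a minimal sketch for turning the output into Python tuples could look like this; `parse_tmsc_output` is a hypothetical helper and not part of the tmsc package.

```python
def parse_tmsc_output(text):
    """Parse 'label<TAB>score' lines into (label, score) tuples."""
    topics = []
    for line in text.splitlines():
        # Skip blank lines and anything that does not match the assumed layout.
        if "\t" not in line:
            continue
        label, _, score = line.rpartition("\t")
        topics.append((label.strip(), float(score)))
    return topics


# Two lines copied from the sample output above.
sample = (
    "    Parallel and distributed processing - General IT\t4.43\n"
    "    Machine Learning, sklearn-like APIs - General IT\t3.87\n"
)
parsed = parse_tmsc_output(sample)
```

Splitting on the last tab keeps labels intact even when they contain commas, dashes, or plus signs.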

Automatic topic inference can be useful for cataloging repositories or mining concepts from them. The current model was trained on GitHub repositories cloned in October 2016 after de-fuzzy-forking. There is a paper on it.


pip3 install tmsc


Command line:

$ tmsc

Python API:

import tmsc

engine = tmsc.Topics()

Docker image

docker build -t srcd/tmsc .
docker run -d --privileged -p 9432:9432 --name bblfshd bblfsh/bblfshd
docker exec -it bblfshd bblfshctl driver install --recommended
docker run -it --rm srcd/tmsc

To cache the downloaded models between runs, mount a host directory at /root:

docker run -it --rm -v /path/to/cache/on/host:/root srcd/tmsc


Contributions are welcome! See CONTRIBUTING and code of conduct.


License: Apache 2.0