Commentator is a fucking fast source code comments finder CLI and Rust SDK (crate).
Existing source code comment extractors (see References) are quite slow, not always accurate (they miss some comments), or don't provide an SDK. This tool fixes all of this.
$ cat Main.java
/*
 * License text.
 */
package com.example;

/**
 * Entry point class.
 * @since 1.0
 */
public class Main {

    /**
     * Main method.
     */
    public static void main(String... args) {
        // TODO: run app here
    }
}
$ commentator --format=json --lang=java --trim Main.java | jq
[
  {
    "line": 1,
    "start": 0,
    "body": "License text."
  },
  {
    "line": 6,
    "start": 0,
    "body": "Entry point class.\n@since 1.0"
  },
  {
    "line": 12,
    "start": 4,
    "body": "Main method."
  },
  {
    "line": 16,
    "start": 10,
    "body": "TODO: run app here"
  }
]
- Get crate: crates.io/crates/commentator
- Get CLI: releases@latest
This library can be used as a CLI or from code.
To build the CLI from source (you need Rust and Cargo installed):
# clone repo
git clone https://github.com/g4s8/commentator.git
cd commentator
# build with cargo
cargo build --release --bin commentator --features feat-bin
# move binary to your $PATH
sudo mv ./target/release/commentator /usr/local/bin
Or download a binary from the releases page: https://github.com/g4s8/commentator/releases/tag/0.1.0
commentator requires a file name argument and supports these options:
- --format - output format: either plain or json
- --lang - language comment specification, one of:
  - c, java, go, cpp - for C-like comment syntax
  - rust - Rust comment syntax
  - bash - for Bash, Python and Ruby
  - html - for HTML, XML
- --trim - trim comment symbols and whitespace, align to the first sentence indent.
Example:
./commentator --format=json --lang=java filename.java
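Since --lang has to be chosen per file, a small wrapper can derive it from the file name. The sketch below is only an illustration: `lang_for_file` and `commentator_args` are hypothetical helpers, not part of commentator, and the extension table is an assumption based on the option list above.

```rust
use std::path::Path;

/// Pick a `--lang` value from a file name.
/// Hypothetical helper: the extension table is an assumption,
/// derived from the language groups listed for `--lang`.
pub fn lang_for_file(file_name: &str) -> Option<&'static str> {
    // Files without an extension (e.g. `Makefile`) yield `None`.
    let ext = Path::new(file_name).extension()?.to_str()?;
    let lang = match ext.to_ascii_lowercase().as_str() {
        "c" | "h" => "c",
        "java" => "java",
        "go" => "go",
        "cpp" | "cc" | "hpp" => "cpp",
        "rs" => "rust",
        // Bash, Python and Ruby share the `bash` specification.
        "sh" | "bash" | "py" | "rb" => "bash",
        // HTML and XML share the `html` specification.
        "html" | "htm" | "xml" => "html",
        _ => return None,
    };
    Some(lang)
}

/// Build the argument list for a `commentator` call with JSON output
/// and trimming enabled; returns `None` for unknown file types.
pub fn commentator_args(file_name: &str) -> Option<Vec<String>> {
    let lang = lang_for_file(file_name)?;
    Some(vec![
        "--format=json".to_string(),
        format!("--lang={}", lang),
        "--trim".to_string(),
        file_name.to_string(),
    ])
}
```

The returned arguments can be passed to `std::process::Command::new("commentator").args(...)`, assuming the binary is on your $PATH.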
The SDK allows you to find, parse and trim comments from source code files. It's designed to be performance- and memory-efficient: you push source code to the tokenizer line by line and can take a parsed comment after each push operation. When you are done with the tokenizer, you need to notify it about the end of the file.
See documentation for more details: https://docs.rs/commentator/0.2.3/commentator/
Example:
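// Create a tokenizer for C-like comment syntax (it also covers Java).
let mut t = Tokenizer::new(&spec::StandardSpec::C);
// Push the source line by line: line number first, then the line text.
t.update(1, "/*\n");
t.update(2, " * Entry point.\n");
t.update(3, " */\n");
t.update(4, "public static void main(String... args) {\n");
t.update(5, " System.out.println(\"hello world\");\n");
t.update(6, "}\n");
// Notify the tokenizer about the end of the file.
t.finish();
// Take the parsed comment and trim comment symbols and whitespace.
let mut cmt = t.take().unwrap();
cmt.trim(&spec::StandardSpec::C);
assert_eq!(cmt.text, "Entry point.");
// There are no more comments in this source.
assert!(t.take().is_none());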
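To show why the push/take/finish shape keeps memory use low, here is a self-contained toy with the same call pattern. It is not the crate's implementation: `LineCommentFinder` and `Found` are hypothetical names, and the toy only understands `//` line comments. It holds just the comments that have not been taken yet, never the whole file.

```rust
use std::collections::VecDeque;

/// A found comment: line number as passed to `update`, byte offset of
/// the `//` marker within that line, and the trimmed comment text.
#[derive(Debug, PartialEq)]
pub struct Found {
    pub line: usize,
    pub start: usize,
    pub text: String,
}

/// Toy push-based finder for `//` line comments only (hypothetical type).
/// It ignores string literals, so "http://..." would be a false positive.
#[derive(Default)]
pub struct LineCommentFinder {
    ready: VecDeque<Found>,
    finished: bool,
}

impl LineCommentFinder {
    pub fn new() -> Self {
        Self::default()
    }

    /// Push one source line; a comment on it becomes available via `take`.
    pub fn update(&mut self, line: usize, src: &str) {
        assert!(!self.finished, "update called after finish");
        if let Some(start) = src.find("//") {
            // Skip the two marker bytes, then drop surrounding whitespace.
            let text = src[start + 2..].trim().to_string();
            self.ready.push_back(Found { line, start, text });
        }
    }

    /// Mark the end of input. Line comments never span lines, so this toy
    /// has nothing to flush; a finder with multi-line comments would need
    /// this call to handle a comment that is still open.
    pub fn finish(&mut self) {
        self.finished = true;
    }

    /// Take the oldest comment that has not been taken yet, if any.
    pub fn take(&mut self) -> Option<Found> {
        self.ready.pop_front()
    }
}
```

A caller would feed lines from a buffered reader into `update` and drain `take` after each push, so only pending comments stay in memory.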
- github.com/jonschlinkert/extract-comments - supports only JavaScript, not a binary, not fast.
- github.com/nknapp/multilang-extract-comments - doesn't extract all comment cases (didn't find all comments in the test files ./test-files), not a binary tool (requires npm and node to run), not fast.
- tree-sitter.github.io/tree-sitter - too complex for this case, doesn't have a binary CLI.
- (feel free to submit other tools)