-
Notifications
You must be signed in to change notification settings - Fork 260
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix Markdown / Jupyter markup not getting counted #937
Conversation
Edit: Added a couple of tests. |
a12fd0b
to
99fa852
Compare
811b320
to
5ed5031
Compare
This allows an optional definition of the line type for a language. If it is defined, it will sum them together. Defaults to using `.code`. Fixes o2sh#933
5ed5031
to
8d68fab
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nicely done!
Left a few remarks.
Don't forget to apply the same logic to the get_total_loc
function.
"compile" is a bit of a misnomer, since it's related to any compilation, but merely counts the number of lines of code for a language. Co-authored-by: Ossama Hjaji <ossama-hjaji@live.fr>
Co-authored-by: Ossama Hjaji <ossama-hjaji@live.fr>
Thanks for the review! I'll probably get to |
This references internal logic, and is unnecessarily wordy. Instead, this field is now described in CONTRIBUTING to explain what it is and what values it can take. Co-authored-by: Ossama Hjaji <ossama-hjaji@live.fr>
27858a9
to
3b96046
Compare
src/info/langs/mod.rs
Outdated
|
||
let has_children = !language.children.is_empty(); | ||
|
||
if has_children { | ||
for reports in language.children.values() { | ||
for stats in reports.iter().map(|r| r.stats.summarise()) { | ||
code += stats.code; | ||
loc += stats.code; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point. I don't use Jupyter notebooks, so I thought Markdown was the "top level" code. Could that Markdown then contain its own code blocks? If so, perhaps this function should be recursive for deeply nested code blocks?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tokei's parser is indeed recursive, as stated in its changelog:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Deeply nested stats are now recursively counted with 939e9e1
Oops, I thought that I would be labelled the committer and you would be the author of f41bc02. I guess codespaces behave differently when |
bc2859f
to
875f945
Compare
As something to do in the future, I think it would be nice to collect code samples like linguist and tokei do, and perform some snapshot tests on the CLI output using those samples. As a lazy test, I used {
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Setup\n",
"\n",
"```bash\n",
"echo 'Hello, World!'\n",
"```"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.4.3"
}
},
"nbformat": 4,
"nbformat_minor": 0
} And got 4 lines as expected (Jupyter contains Markdown, Markdown contains 3 comment lines and 1 line of Bash). |
9afb9b3
to
cdcd926
Compare
All lines are counted from the language's children. Co-authored-by: Ossama Hjaji <ossama-hjaji@live.fr>
`.summarise()` accomplishes what recursion was doing. Co-authored-by: Ossama Hjaji <ossama-hjaji@live.fr>
b1bbb20
to
aa31a72
Compare
Just checked the source code of Which may be preferable behavior (it makes sense to me IMO), but I just want to throw that out there. |
src/info/langs/language.tera
Outdated
pub fn loc(language_type: &tokei::LanguageType, language: &tokei::Language) -> usize { | ||
let loc = match language_type { | ||
{% for language, attrs in languages -%} | ||
{%- set line_types = attrs.line_types | default(value=['code']) -%} | ||
tokei::LanguageType::{{ language }} => language.{{ line_types.0 }}{% for line_type in line_types | slice(start=1) %} + language.{{ line_type }}{% endfor %}, | ||
{% endfor %} | ||
_ => unimplemented!("Language Type {:?}", language_type), | ||
}; | ||
loc + language.children.iter().fold(0, |sum, (lang_type, reports)| { | ||
sum + reports.iter().fold(0, |sum, report| sum + stats_loc(lang_type, &report.stats)) | ||
}) | ||
} | ||
|
||
/// Counts the lines-of-code of a tokei `Report`. This is the child of a | ||
/// `tokei::CodeStats`. | ||
pub fn stats_loc(language_type: &tokei::LanguageType, stats: &tokei::CodeStats) -> usize { | ||
let stats = stats.summarise(); | ||
match language_type { | ||
{% for language, attrs in languages -%} | ||
{%- set line_types = attrs.line_types | default(value=['code']) -%} | ||
tokei::LanguageType::{{ language }} => stats.{{ line_types.0 }}{% for line_type in line_types | slice(start=1) %} + stats.{{ line_type }}{% endfor %}, | ||
{% endfor %} | ||
_ => unimplemented!("Language Type {:?}", language_type), | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't you think it would be better to keep the code logic outside of the tera template? I'm refering to lines 123-125 and 131.
We could probably extract those lines and put them back in get_language_distribution()
between line 34 and 36:
let has_children = !language.children.is_empty();
if has_children {
for (lang_type, reports) in language.children.iter() {
loc += reports
.iter()
.map(|r| r.stats.summarise())
.fold(0, |sum, stats| sum + language::stats_loc(lang_type, &stats));
}
}
this would simplify the template but more importantly avoid obfuscating part of the code logic.
Leaving two methods loc
and stats_loc
inside the template which would be almost identical except for one of their parameters. I wonder if there is a way we could combine the two methods 🤔 (maybe in the future)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔 Yeah, it does obfuscate it a bit by hiding the logic in the template. But my thinking was that I'd like to avoid needing to repeat similar logic (adding the LOC of the children) for both language distributions and total line counts. Since these are kind of private methods, maybe they should be #[doc(hidden)]
and called __loc()
, and have more visible methods called loc()
and stats_loc()
in language.rs
that call these methods and act as wrappers.
fn loc(language_type: &tokei::LanguageType, language: &tokei::Language) -> usize {
__loc(language_type, language) + language
.children
// ...
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As you mentioned in your comment #937 (comment)
We do need additional tests covering more complex cases (language with children f. ex.) to make sure we don't break things. Should not block this PR, but preferably before the next release.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But my thinking was that I'd like to avoid needing to repeat similar logic (adding the LOC of the children) for both language distributions and total line counts.
You're right, I missed that 🤦
Since these are kind of private methods, maybe they should be #[doc(hidden)] and called __loc(), and have more visible methods called loc() and stats_loc() in language.rs that call these methods and act as wrappers.
I'm sorry, but I'm lost - too many loc
😅 - maybe you can commit your suggestion to illustrate your idea. But yeah, if we could avoid putting logic inside the tera template, that would be great. we could probably factorize this part inside a method:
let mut loc = language::loc(language_name, language);
let has_children = !language.children.is_empty();
if has_children {
for (lang_type, reports) in language.children {
loc += reports
.iter()
.map(|r| r.stats.summarise())
.fold(0, |sum, stats| sum + language::stats_loc(&lang_type, &stats));
}
}
and call it for both language distributions and total line counts.
But it's your call, do what you think is best 👍 I think this PR is already pretty good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe you can commit your suggestion to illustrate your idea.
Sure! 64077e7 is the basic idea. __<function>
contains the repetitive code that the template manages, and <function>
is a wrapper that calls that function and includes some helper code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll probably add a test for Jupyter code and call it on get_total_loc
. That way loc
and stats_loc
is 100% covered. __loc
's template would be 100% covered, but the generated code would probably be <10% with each language being a branch 😆
I think I'll just hardcode the structs necessary to simulate Jupyter code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perfect 🎊 , thanks @spenserblack
This adds public wrapper functions to the templated functions and makes them private. This makes it so as much code is in actual Rust code, and leaves the template to only manage the repetitive code.
3e14718
to
64077e7
Compare
e9b006e
to
9c0d544
Compare
This allows an optional definition of the line type for a language. If it is defined, it will sum them together. Defaults to using
.code
. By using this definition, comments are included in the type of lines of code to be summed for Markdown and Jupyter Notebooks.This also changes the summing of child languages to be recursive, so that deeply nested code (e.g. Bash in Markdown in Jupyter) can be counted.
Fixes #933