Fix Markdown / Jupyter markup not getting counted #937

spenserblack · 2023-01-19T14:48:59Z

This allows a language to optionally define which line types count toward its lines of code. When line types are defined, they are summed together; otherwise only .code is used. With this definition, comments are counted as lines of code for Markdown and Jupyter Notebooks.

This also changes the summing of child languages to be recursive, so that deeply nested code (e.g. Bash in Markdown in Jupyter) can be counted.

Fixes #933
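
To illustrate the optional line-type definition described in this comment, here is a minimal sketch in plain Rust. The `Counts` struct, the `LineType` enum, and this `loc` signature are hypothetical stand-ins, not the PR's generated code or tokei's types. The sketch only shows the rule itself: sum the configured line types, and fall back to code lines when nothing is configured.

```rust
/// Hypothetical stand-in for the per-language line counts that tokei reports.
#[derive(Debug, Clone, Copy, Default)]
pub struct Counts {
    pub code: usize,
    pub comments: usize,
    pub blanks: usize,
}

/// Hypothetical names for the line types a language may opt into.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LineType {
    Code,
    Comments,
    Blanks,
}

/// Default used when a language defines no line types: code lines only.
const DEFAULT_LINE_TYPES: &[LineType] = &[LineType::Code];

/// Sums the configured line types, or only `code` when none are configured.
pub fn loc(counts: &Counts, line_types: Option<&[LineType]>) -> usize {
    line_types
        .unwrap_or(DEFAULT_LINE_TYPES)
        .iter()
        .map(|line_type| match line_type {
            LineType::Code => counts.code,
            LineType::Comments => counts.comments,
            LineType::Blanks => counts.blanks,
        })
        .sum()
}
```

With `Counts { code: 0, comments: 3, blanks: 1 }`, passing `None` yields 0, while passing code and comments as the line types yields 3.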

spenserblack · 2023-01-19T14:54:27Z

~~I can add a test or two if needed.~~

Edit: Added a couple of tests.

src/info/langs/mod.rs

languages.yaml

src/info/langs/language.tera

o2sh

Nicely done!

Left a few remarks.

Don't forget to apply the same logic to the get_total_loc function.

spenserblack · 2023-01-20T22:57:52Z

Thanks for the review! I'll probably get to get_total_loc over the weekend.

o2sh · 2023-01-21T17:00:45Z

src/info/langs/mod.rs


        let has_children = !language.children.is_empty();

        if has_children {
            for reports in language.children.values() {
                for stats in reports.iter().map(|r| r.stats.summarise()) {
-                    code += stats.code;
+                    loc += stats.code;


language::loc should probably also be applied here, in case one of the child languages happens to be Markdown.

Jupyter notebooks are a good example:

That's a good point. I don't use Jupyter notebooks, so I thought Markdown was the "top level" code. Could that Markdown then contain its own code blocks? If so, perhaps this function should be recursive for deeply nested code blocks?

tokei's parser is indeed recursive, as stated in its changelog:

Deeply nested stats are now recursively counted with 939e9e1

languages.yaml

src/info/langs/mod.rs

spenserblack · 2023-01-22T14:20:13Z

Oops, I thought that I would be labelled the committer and you would be the author of f41bc02. I guess codespaces behave differently when --author= is used. Hope you don't mind that I made a commit that looks like you.

spenserblack · 2023-01-22T17:31:47Z

As something to do in the future, I think it would be nice to collect code samples like linguist and tokei do, and perform some snapshot tests on the CLI output using those samples. As a lazy test, I used

{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Setup\n",
    "\n",
    "```bash\n",
    "echo 'Hello, World!'\n",
    "```"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.4.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}

And got 4 lines as expected (Jupyter contains Markdown, Markdown contains 3 comment lines and 1 line of Bash).

src/info/langs/mod.rs

spenserblack · 2023-01-23T13:44:50Z

Just checked the source code of .summarise, and it looks like we'll lose some granularity.
Before aa31a72, a comment in a bash code block in Markdown in Jupyter would be ignored, since only code is counted for Bash. Summarise would combine the CodeStats from the Bash with the Markdown, making all comments get counted, as long as the 1st-level child is Markdown.

That may be the preferable behavior (it makes sense to me), but I just want to throw that out there.
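
The difference in granularity could be sketched as follows. This models nested stats with hypothetical `Stats` and `Lang` types instead of tokei's, and `flatten` only approximates what `.summarise()` is described as doing in this thread, so treat it as an illustration and not as tokei's implementation.

```rust
use std::collections::BTreeMap;

/// Hypothetical language tags for the illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Lang {
    Bash,
    Markdown,
}

/// Hypothetical nested stats: own counts plus embedded child languages.
#[derive(Debug, Clone, Default)]
pub struct Stats {
    pub code: usize,
    pub comments: usize,
    pub children: BTreeMap<Lang, Stats>,
}

impl Stats {
    /// Folds all nested children into one (code, comments) total.
    /// Assumed to approximate the effect of `.summarise()`.
    pub fn flatten(&self) -> (usize, usize) {
        self.children
            .values()
            .fold((self.code, self.comments), |(code, comments), child| {
                let (child_code, child_comments) = child.flatten();
                (code + child_code, comments + child_comments)
            })
    }
}

/// Assumption for this sketch: only Markdown counts comments as LOC.
fn counts_comments(lang: Lang) -> bool {
    matches!(lang, Lang::Markdown)
}

/// Per-language recursion: every nested language applies its own rule.
pub fn loc_recursive(lang: Lang, stats: &Stats) -> usize {
    let own = if counts_comments(lang) {
        stats.code + stats.comments
    } else {
        stats.code
    };
    own + stats
        .children
        .iter()
        .map(|(child_lang, child)| loc_recursive(*child_lang, child))
        .sum::<usize>()
}

/// Flatten first, then apply only the outer language's rule.
pub fn loc_flattened(lang: Lang, stats: &Stats) -> usize {
    let (code, comments) = stats.flatten();
    if counts_comments(lang) {
        code + comments
    } else {
        code
    }
}
```

For Markdown with 3 comment lines that embeds a Bash block with 1 code line and 1 comment, `loc_recursive` gives 4 (the Bash comment is ignored) and `loc_flattened` gives 5 (the Bash comment is counted under Markdown's rule).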

o2sh · 2023-01-23T21:18:59Z

src/info/langs/language.tera

+pub fn loc(language_type: &tokei::LanguageType, language: &tokei::Language) -> usize {
+    let loc = match language_type {
+        {% for language, attrs in languages -%}
+            {%- set line_types = attrs.line_types | default(value=['code']) -%}
+            tokei::LanguageType::{{ language }} => language.{{ line_types.0 }}{% for line_type in line_types | slice(start=1) %} + language.{{ line_type }}{% endfor %},
+        {% endfor %}
+        _ => unimplemented!("Language Type {:?}", language_type),
+    };
+    loc + language.children.iter().fold(0, |sum, (lang_type, reports)| {
+        sum + reports.iter().fold(0, |sum, report| sum + stats_loc(lang_type, &report.stats))
+    })
+}
+
+/// Counts the lines-of-code of a tokei `Report`. This is the child of a
+/// `tokei::CodeStats`.
+pub fn stats_loc(language_type: &tokei::LanguageType, stats: &tokei::CodeStats) -> usize {
+    let stats = stats.summarise();
+    match language_type {
+        {% for language, attrs in languages -%}
+            {%- set line_types = attrs.line_types | default(value=['code']) -%}
+            tokei::LanguageType::{{ language }} => stats.{{ line_types.0 }}{% for line_type in line_types | slice(start=1) %} + stats.{{ line_type }}{% endfor %},
+        {% endfor %}
+        _ => unimplemented!("Language Type {:?}", language_type),
+    }
+}


Don't you think it would be better to keep the code logic outside of the tera template? I'm referring to lines 123-125 and 131.

We could probably extract those lines and put them back in get_language_distribution() between line 34 and 36:

    let has_children = !language.children.is_empty();
    if has_children {
        for (lang_type, reports) in language.children.iter() {
            loc += reports
                .iter()
                .map(|r| r.stats.summarise())
                .fold(0, |sum, stats| sum + language::stats_loc(lang_type, &stats));
        }
    }

This would simplify the template and, more importantly, avoid obfuscating part of the code logic.

That would leave two methods, loc and stats_loc, inside the template, almost identical except for one of their parameters. I wonder if there is a way we could combine the two methods 🤔 (maybe in the future)

🤔 Yeah, it does obfuscate it a bit by hiding the logic in the template. But my thinking was that I'd like to avoid needing to repeat similar logic (adding the LOC of the children) for both language distributions and total line counts. Since these are kind of private methods, maybe they should be #[doc(hidden)] and called __loc(), and have more visible methods called loc() and stats_loc() in language.rs that call these methods and act as wrappers.

    fn loc(language_type: &tokei::LanguageType, language: &tokei::Language) -> usize {
        __loc(language_type, language)
            + language
                .children
                // ...

What do you think?

As you mentioned in your comment #937 (comment), we do need additional tests covering more complex cases (a language with children, for example) to make sure we don't break things. This should not block this PR, but it should preferably be done before the next release.

> But my thinking was that I'd like to avoid needing to repeat similar logic (adding the LOC of the children) for both language distributions and total line counts.

You're right, I missed that 🤦

> Since these are kind of private methods, maybe they should be #[doc(hidden)] and called __loc(), and have more visible methods called loc() and stats_loc() in language.rs that call these methods and act as wrappers.

I'm sorry, but I'm lost - too many loc 😅 - maybe you can commit your suggestion to illustrate your idea. But yeah, if we could avoid putting logic inside the tera template, that would be great. We could probably factor this part out into a method:

    let mut loc = language::loc(language_name, language);
    let has_children = !language.children.is_empty();
    if has_children {
        for (lang_type, reports) in language.children {
            loc += reports
                .iter()
                .map(|r| r.stats.summarise())
                .fold(0, |sum, stats| sum + language::stats_loc(&lang_type, &stats));
        }
    }

and call it for both language distributions and total line counts.

But it's your call, do what you think is best 👍 I think this PR is already pretty good.

> maybe you can commit your suggestion to illustrate your idea.

Sure! 64077e7 is the basic idea. __<function> contains the repetitive code that the template manages, and <function> is a wrapper that calls that function and includes some helper code.

I'll probably add a test for Jupyter code and call it on get_total_loc. That way loc and stats_loc are 100% covered. __loc's template would be 100% covered, but the generated code would probably be <10% with each language being a branch 😆
I think I'll just hardcode the structs necessary to simulate Jupyter code.
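
The wrapper idea can be sketched in plain Rust as below. All types here (`Lang`, `Counts`, `Language`) and the `total_loc` helper are hypothetical simplifications, and the children are assumed to be already summarised into flat counts; the actual code works on `tokei::Language` and generates the `match` arms from the template. The point is only the split: a private, repetitive `__loc` and a hand-written public `loc` that adds the children, reused for totals.

```rust
/// Hypothetical language tags; the real code uses `tokei::LanguageType`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Lang {
    Bash,
    Markdown,
    Jupyter,
}

/// Hypothetical flat line counts.
#[derive(Debug, Clone, Copy, Default)]
pub struct Counts {
    pub code: usize,
    pub comments: usize,
}

/// Hypothetical language entry: own counts plus already-summarised children.
#[derive(Debug, Clone, Default)]
pub struct Language {
    pub own: Counts,
    pub children: Vec<(Lang, Counts)>,
}

/// Stands in for the templated function: one repetitive arm per language.
fn __loc(lang: Lang, counts: &Counts) -> usize {
    match lang {
        Lang::Markdown => counts.code + counts.comments,
        Lang::Bash | Lang::Jupyter => counts.code,
    }
}

/// Hand-written wrapper: the language's own LOC plus the LOC of its children,
/// each child counted with its own language's rule.
pub fn loc(lang: Lang, language: &Language) -> usize {
    let children: usize = language
        .children
        .iter()
        .map(|(child_lang, child)| __loc(*child_lang, child))
        .sum();
    __loc(lang, &language.own) + children
}

/// Total line count, reusing the same wrapper as the per-language numbers.
pub fn total_loc(languages: &[(Lang, Language)]) -> usize {
    languages
        .iter()
        .map(|(lang, language)| loc(*lang, language))
        .sum()
}
```

A hardcoded notebook with no lines of its own and a Markdown child of 1 code line and 3 comment lines gives `loc(Lang::Jupyter, &notebook) == 4`, which matches the 4 lines reported for the sample notebook earlier in this thread.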

Perfect 🎊 , thanks @spenserblack

spenserblack requested a review from o2sh as a code owner January 19, 2023 14:48

spenserblack force-pushed the feature/933/line-types branch from a12fd0b to 99fa852 Compare January 19, 2023 16:28

spenserblack marked this pull request as draft January 19, 2023 16:28

spenserblack force-pushed the feature/933/line-types branch 2 times, most recently from 811b320 to 5ed5031 Compare January 19, 2023 16:31

Conditionally count lines based on language type

8d68fab

This allows an optional definition of the line type for a language. If it is defined, it will sum them together. Defaults to using `.code`. Fixes o2sh#933

spenserblack force-pushed the feature/933/line-types branch from 5ed5031 to 8d68fab Compare January 19, 2023 16:54

spenserblack marked this pull request as ready for review January 19, 2023 16:55

o2sh reviewed Jan 20, 2023

src/info/langs/mod.rs (outdated, resolved)

o2sh reviewed Jan 20, 2023

languages.yaml (outdated, resolved)

o2sh reviewed Jan 20, 2023

src/info/langs/language.tera (outdated, resolved)

o2sh requested changes Jan 20, 2023

spenserblack and others added 2 commits January 20, 2023 22:47

Simplify name of LOC counting function

8f3ddf4

"compile" is a bit of a misnomer, since it's not related to any compilation, but merely counts the number of lines of code for a language. Co-authored-by: Ossama Hjaji <ossama-hjaji@live.fr>

Reduce clutter of output loc function

3711141

Co-authored-by: Ossama Hjaji <ossama-hjaji@live.fr>

spenserblack and others added 2 commits January 21, 2023 09:34

Use templated loc function for total LOC

6c509d0

Remove reference to tokei from languages file

3b96046

This references internal logic, and is unnecessarily wordy. Instead, this field is now described in CONTRIBUTING to explain what it is and what values it can take. Co-authored-by: Ossama Hjaji <ossama-hjaji@live.fr>

spenserblack force-pushed the feature/933/line-types branch from 27858a9 to 3b96046 Compare January 21, 2023 09:34

o2sh reviewed Jan 21, 2023

languages.yaml (outdated, resolved)

o2sh reviewed Jan 21, 2023

src/info/langs/mod.rs (resolved)

spenserblack marked this pull request as draft January 22, 2023 14:14

Add unit test for get_total_loc

f41bc02

spenserblack force-pushed the feature/933/line-types branch from bc2859f to 875f945 Compare January 22, 2023 17:00

spenserblack marked this pull request as ready for review January 22, 2023 17:16

spenserblack force-pushed the feature/933/line-types branch from 9afb9b3 to cdcd926 Compare January 22, 2023 17:37

o2sh reviewed Jan 22, 2023

src/info/langs/mod.rs (resolved)

spenserblack and others added 3 commits January 23, 2023 02:09

Recursively count code stats LOC

d5b8519

Unset line types for Jupyter Notebooks

1adece9

All lines are counted from the language's children. Co-authored-by: Ossama Hjaji <ossama-hjaji@live.fr>

Remove unnecessary recursion

aa31a72

`.summarise()` accomplishes what recursion was doing. Co-authored-by: Ossama Hjaji <ossama-hjaji@live.fr>

spenserblack force-pushed the feature/933/line-types branch from b1bbb20 to aa31a72 Compare January 23, 2023 02:10

o2sh reviewed Jan 23, 2023

o2sh approved these changes Jan 23, 2023

Refactor to make templated functions private

64077e7

This adds public wrapper functions to the templated functions and makes them private. This makes it so as much code is in actual Rust code, and leaves the template to only manage the repetitive code.

spenserblack force-pushed the feature/933/line-types branch from 3e14718 to 64077e7 Compare January 24, 2023 00:07

Add Jupyter total LOC test

9c0d544

spenserblack force-pushed the feature/933/line-types branch from e9b006e to 9c0d544 Compare January 24, 2023 00:46

o2sh merged commit 5379ecd into o2sh:main Jan 24, 2023

spenserblack deleted the feature/933/line-types branch January 24, 2023 18:57

o2sh mentioned this pull request Jan 28, 2023

Refactoring of info/langs/mod.rs #948

Merged
