Add code_count to JSON output #122

Closed
2 tasks
SeanTConrad opened this issue Jul 14, 2023 · 5 comments · Fixed by #153

SeanTConrad commented Jul 14, 2023

Story

As a user of the JSON format, I want...

  • ...to have access to the various code counts so that I can decide for myself which one fits my purpose best
  • ...to understand the differences between the various code counts so that I can make an educated choice about which one fits my purpose best.

Goals

  • The JSON output includes code_count.
  • The documentation section on How pygount counts code includes an explanation of the different counts, especially the difference between code_count and source_count.

Original request: Format=json summary "totalSourceCount" doesn't match format=summary Code Sum

I apologize if I am missing an option or documentation on this.

I was testing pygount by running it on the public Rails repo:
git clone git@github.com:rails/rails.git

First, I tested the summary format with:
pygount -F=.git,node_modules --format=summary rails

The resulting "Sum" for the "Code" column is 382801

Then I tested the JSON output with:
pygount -F=.git,node_modules --format=json -o pygount_test.json rails

The resulting summary at the bottom of the JSON has 410575 for totalSourceCount
{ "summary":{ "totalDocumentationCount":62533, "totalDocumentationPercentage":10.743837150966607, "totalEmptyCount":108928, "totalEmptyPercentage":18.71499357428063, "totalFileCount":4546, "totalSourceCount":410575, "totalSourcePercentage":70.54116927475276 } }
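As a quick sanity check, here is a small Python sketch that uses only the numbers quoted above. Each reported percentage appears to equal its count divided by the sum of documentation, empty and source lines (582036), which suggests that string-only lines are folded into totalSourceCount rather than reported separately. The helper name `percentage` is made up for this sketch.

```python
import json

# The summary object exactly as quoted in the issue.
raw = (
    '{ "summary":{ "totalDocumentationCount":62533, '
    '"totalDocumentationPercentage":10.743837150966607, '
    '"totalEmptyCount":108928, "totalEmptyPercentage":18.71499357428063, '
    '"totalFileCount":4546, "totalSourceCount":410575, '
    '"totalSourcePercentage":70.54116927475276 } }'
)
summary = json.loads(raw)["summary"]

# Documentation, empty and source lines together appear to cover every counted line.
total_lines = (
    summary["totalDocumentationCount"]
    + summary["totalEmptyCount"]
    + summary["totalSourceCount"]
)


def percentage(count: int) -> float:
    """Share of `count` in all counted lines, in percent (hypothetical helper)."""
    return 100.0 * count / total_lines


print(total_lines)  # 582036
for kind in ("Documentation", "Empty", "Source"):
    reported = summary[f"total{kind}Percentage"]
    computed = percentage(summary[f"total{kind}Count"])
    print(f"{kind}: reported {reported:.4f}, computed {computed:.4f}")
```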

Am I wrong to expect the summary format's Code Sum and the JSON format's totalSourceCount to match?

Thank you

@roskakori roskakori self-assigned this Jul 14, 2023
roskakori (Owner) commented Jul 14, 2023

True, this is an unfortunate inconsistency and lapse in the JSON output.

Background: I always disliked how SLOC tools count even the most trivial pieces of code that add no complexity. That's why pygount never counts lines that contain only "{" in C or pass in Python as code.

For the same reason, it only reluctantly counts lines that contain nothing but strings. Internally, it collects the following counts (see summary.LanguageSummary). Each line adds to exactly one of these counts:

  • code_count: lines that contain actual, meaningful code that takes some effort to understand, like variables, function calls, math operations, ...
  • string_count: lines that contain only strings and the characters typically used to separate them, like commas (,). Technically, they are code, but most of the time they are easy to comprehend because they are just text targeted at the end user. In practice that is not always true, e.g. when cramming complex SQL statements into Java code as strings. But I decided this is rare enough to justify occasionally erring on the side of lower complexity.
  • documentation_count: lines that contain only comments
  • empty_count: lines that contain only white space or language-dependent "no operation" code like curly braces, pass or nop
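To illustrate the rule that each line adds to exactly one count, here is a deliberately simplified Python sketch. It is not pygount's actual implementation: the function `classify`, the `NO_OP_LINES` set and the regular expressions are hypothetical and only look at single lines of Python-like code.

```python
import re
from collections import Counter

# Hypothetical, simplified rules for Python-like source code. They only look at
# single lines, so multi-line strings, docstrings and block comments are not handled.
NO_OP_LINES = {"", "pass", "{", "}"}

# A single- or double-quoted string literal, allowing backslash escapes inside.
STRING = r"""(?:"[^"\\]*(?:\\.[^"\\]*)*"|'[^'\\]*(?:\\.[^'\\]*)*')"""
# One or more string literals separated by commas, optionally ending with a comma.
STRINGS_ONLY = re.compile(rf"{STRING}(?:\s*,\s*{STRING})*\s*,?")


def classify(line: str) -> str:
    """Return the name of the one count that this line contributes to."""
    stripped = line.strip()
    if stripped in NO_OP_LINES:
        return "empty_count"
    if stripped.startswith("#"):
        return "documentation_count"
    if STRINGS_ONLY.fullmatch(stripped):
        return "string_count"
    return "code_count"


sample = [
    "total = price * quantity",
    '    "Hello, world",',
    "# Explain the next step.",
    "    pass",
    "",
]
counts = Counter(classify(line) for line in sample)
print(counts)
```

With these toy rules, a line such as `x = "a"` still counts as code because it contains more than string literals and separators.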

The --format=summary shows the code_count.

Because I figured that "my" code count might not be popular with everyone, internally there also is:

  • source_count = code_count + string_count: This is pretty close to how e.g. SLOCCount and cloc count code.
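Applied to the numbers in the original request, this is simple arithmetic. Assuming both runs analyzed the same files (both commands use the same -F option), the gap between the JSON totalSourceCount and the summary's Code sum should be the number of string-only lines. The `Counts` class below is a hypothetical illustration, not pygount's summary.LanguageSummary.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Hypothetical container for the four per-line counts (not pygount's API)."""

    code_count: int
    string_count: int
    documentation_count: int
    empty_count: int

    @property
    def source_count(self) -> int:
        # Code lines plus string-only lines, close to what SLOCCount and cloc report.
        return self.code_count + self.string_count


# Numbers quoted in the original request for the Rails repository.
summary_code_sum = 382801   # "Code" column sum from --format=summary
json_source_count = 410575  # totalSourceCount from --format=json

# Assumption: both runs analyzed the same files, so the difference is string_count.
implied_string_count = json_source_count - summary_code_sum
print(implied_string_count)  # 27774

rails = Counts(
    code_count=summary_code_sum,
    string_count=implied_string_count,
    documentation_count=62533,
    empty_count=108928,
)
print(rails.source_count)  # 410575
```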

The --format=json includes the source_count.

For the record, there also is:

  • line_count = code_count + documentation_count + empty_count + string_count: This is essentially the number of lines a text editor would show, or what wc -l reports, except that the last line is always counted, even if it does not end with a newline / carriage return.
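The difference between line_count and wc -l only shows up when the last line lacks a trailing newline. Here is a small Python sketch of the two ways of counting; both functions are hypothetical helpers, not part of pygount.

```python
def wc_style_count(text: str) -> int:
    """Count newline characters, which is what `wc -l` reports."""
    return text.count("\n")


def editor_style_count(text: str) -> int:
    """Count lines as a text editor shows them, including a last line without a newline.

    str.splitlines() also treats carriage returns, CR LF pairs and a few other
    separators (for example form feeds and Unicode line separators) as line breaks.
    """
    return len(text.splitlines())


for text in ("first\nsecond\n", "first\nsecond", "first\r\nsecond"):
    print(repr(text), wc_style_count(text), editor_style_count(text))
```

For "first\nsecond" without a trailing newline, `wc_style_count` returns 1 while `editor_style_count` returns 2; with a trailing newline, both return 2.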

To conclude: would these two changes resolve your issue?

  • The JSON output includes code_count.
  • The documentation section on How pygount counts code includes an explanation of the different counts, especially the difference between code_count and source_count.

@roskakori roskakori added this to the v1.6.2 milestone Jul 14, 2023
roskakori (Owner) commented:

@SeanTConrad If I understand correctly from looking at VerinFast/verinfast#8, you are striving for compatibility with cloc. In that case, the source_count number from the JSON is already the one you are looking for.

Regardless, pygount's inconsistency with --format=summary and the lack of documentation still need to be addressed.

SeanTConrad (Author) commented:

@roskakori Thank you.

SeanTConrad (Author) commented:

Sorry @roskakori, I just realized I didn't respond to your earlier message.

If it costs the same, you could show all three "LOC" values in both formats, as you described above. They can be called whatever you like, as long as they are documented as you said. If I understand you correctly, these are the outputs for "LOC":

  1. code_count - Count of meaningful lines of code
  2. source_count - Count of any lines of code, including single character lines. Similar to "CLOC"
  3. line_count - Count of all lines, akin to wc -l (nice to have for QA, but not necessary for our needs)

For our needs, we are comparing two or more repos for size, complexity, amount of work, etc. I would use "your" measure of code; it is very important that the measure is consistent across repos.

Thank you!

tldr: I agree with you and adding code_count to the json would be great.

@roskakori roskakori modified the milestones: v1.7.0, v1.8.0 May 12, 2024
@roskakori roskakori changed the title Format=json summary "totalSourceCount" doesn't match format=summary Code Sum Add code_count to JSON output May 13, 2024
roskakori added a commit that referenced this issue May 13, 2024
This is not really part of this issue because this is just for fun. But it avoids the overhead of creating a separate issue and branch for such a minor thing.
roskakori added three more commits that referenced this issue May 13, 2024
roskakori (Owner) commented:

@SeanTConrad The JSON format finally contains all available counts, and the documentation describes them.

I also added #152 so that the counts eventually become consistent across formats. This will be a breaking change and thus will only be part of a future version 2.0.
