Merge pull request #84 from roskakori/62-add-json-output
#62 Added JSON as additional output --format.
roskakori committed Jan 5, 2022
2 parents 54b82f2 + 805b529 commit 5f656e7
Showing 10 changed files with 266 additions and 19 deletions.
15 changes: 15 additions & 0 deletions docs/background.rst
@@ -45,6 +45,21 @@ As example consider this Python code:
This counts as 1 line of code and 3 lines of comments. The line with ``pass``
is considered a "no operation" and thus not taken into account.

.. _binary:

Binary files
------------

A file is considered to be binary when all of the following conditions
match:

1. The file does not start with a BOM for UTF-8, UTF-16 or UTF-32 (which
indicates text files).
2. The initial 8192 bytes contain at least one 0 byte.

In this case, pygount assigns it the pseudo language ``__binary__`` and
performs no further analysis.
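
The two conditions above can be sketched in a few lines of Python. This is
only an illustration of the rule as described here, not pygount's actual
implementation; the helper name ``looks_binary`` and the constants are
hypothetical.

```python
import codecs

# BOMs that mark a file as text (UTF-8, UTF-16 and UTF-32 in both byte orders).
_TEXT_BOMS = (
    codecs.BOM_UTF8,
    codecs.BOM_UTF16_BE,
    codecs.BOM_UTF16_LE,
    codecs.BOM_UTF32_BE,
    codecs.BOM_UTF32_LE,
)
# Number of initial bytes to inspect, as stated in the text above.
_BYTES_TO_CHECK = 8192


def looks_binary(initial_bytes: bytes) -> bool:
    """Hypothetical helper applying both conditions to the start of a file."""
    head = initial_bytes[:_BYTES_TO_CHECK]
    if head.startswith(_TEXT_BOMS):
        # Condition 1 fails: a BOM indicates a text file.
        return False
    # Condition 2: at least one 0 byte within the inspected bytes.
    return b"\0" in head
```

For example, UTF-16 text contains many 0 bytes but still counts as text
because it starts with a BOM, and a 0 byte beyond the first 8192 bytes is
never seen by the check.
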


Comparison with other tools
-----------------------------------
3 changes: 3 additions & 0 deletions docs/changes.rst
@@ -7,6 +7,9 @@ This chapter describes the changes coming with each new version of pygount.

Version 1.3.0, 2022-xx-xx

* Added JSON as additional output :option:`--format`, see :doc:`json` for
details (issue `#62 <https://github.com/roskakori/pygount/issues/62>`_).

* Changed build process to `poetry <https://python-poetry.org/>`_ to change
several messy configuration files into a single even more messy
configuration file.
1 change: 1 addition & 0 deletions docs/index.rst
@@ -22,6 +22,7 @@ code is available from https://github.com/roskakori/pygount.
installation
usage
continuous-integration
json
background
api
contributing
119 changes: 119 additions & 0 deletions docs/json.rst
@@ -0,0 +1,119 @@
JSON
====

.. program:: pygount

JavaScript Object Notation (JSON) is widely used to interchange data.
Running pygount with :option:`--format` "json" is a simple way to provide
the results of an analysis for further processing.


General format
--------------

The general structure of the resulting JSON is:

.. code-block:: JavaScript

    {
      "formatVersion": "1.0.0",
      "pygountVersion": "1.3.0",
      "files": [...],
      "languages": [...],
      "runtime": {...},
      "summary": {...}
    }

The naming of the entries deliberately uses camel case to conform to the
`JSLint <https://www.jslint.com/>`_ guidelines.

Both ``formatVersion`` and ``pygountVersion`` use
`semantic versioning <https://semver.org/>`_. The other entries contain the following information:

With ``files`` you can access a list of files analyzed, for example:

.. code-block:: JavaScript

    {
      "path": "/Users/someone/workspace/pygount/pygount/write.py",
      "sourceCount": 253,
      "emptyCount": 60,
      "documentationCount": 27,
      "group": "pygount",
      "isCountable": true,
      "language": "Python",
      "state": "analyzed",
      "stateInfo": null
    }

Here, ``sourceCount`` is the number of source lines of code (SLOC),
``documentationCount`` the number of lines containing comments and
``emptyCount`` the number of empty lines (which includes "no operation"
lines).

The ``state`` can have one of the following values:

* analyzed: successfully analyzed
* binary: the file is a :ref:`binary file <binary>`
* duplicate: the file is a :ref:`duplicate <duplicates>` of another
* empty: the file is empty (file size = 0)
* error: the source could not be parsed; in this case, ``stateInfo``
contains a message with more details
* generated: the file has been generated as specified with :option:`--generated`
* unknown: pygments does not offer any lexer to analyze the file
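
As a minimal sketch of how the ``state`` and ``stateInfo`` entries might be
used, the following Python snippet counts files per state and collects the
messages of files that could not be parsed. The report text is shortened and
made up for illustration; only the key names are taken from the description
above.

```python
import json
from collections import Counter

# Shortened, made-up report; a real one contains more entries per file.
report_text = """
{
  "files": [
    {"path": "a.py", "state": "analyzed", "stateInfo": null},
    {"path": "logo.png", "state": "binary", "stateInfo": null},
    {"path": "broken.py", "state": "error", "stateInfo": "cannot decode"}
  ]
}
"""
report = json.loads(report_text)

# Number of files for each state.
state_counts = Counter(entry["state"] for entry in report["files"])

# Map the path of each failed file to its error message.
errors = {
    entry["path"]: entry["stateInfo"]
    for entry in report["files"]
    if entry["state"] == "error"
}

print(dict(state_counts))
print(errors)
```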

In ``languages`` the summary for each language is available, for example:

.. code-block:: JavaScript

    {
      "documentationCount": 406,
      "emptyCount": 631,
      "fileCount": 18,
      "isPseudoLanguage": false,
      "language": "Python",
      "sourceCount": 2332
    }

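
A possible way to work with the ``languages`` entries is to skip pseudo
languages and rank the remaining ones by their source lines. The sketch below
assumes the entry layout shown above; the reStructuredText and ``__binary__``
entries are made up for illustration.

```python
import json

# The Python entry is the example from above; the other two are made up.
languages_text = """
[
  {"documentationCount": 406, "emptyCount": 631, "fileCount": 18,
   "isPseudoLanguage": false, "language": "Python", "sourceCount": 2332},
  {"documentationCount": 0, "emptyCount": 40, "fileCount": 5,
   "isPseudoLanguage": false, "language": "reStructuredText", "sourceCount": 120},
  {"documentationCount": 0, "emptyCount": 0, "fileCount": 2,
   "isPseudoLanguage": true, "language": "__binary__", "sourceCount": 0}
]
"""
languages = json.loads(languages_text)

# Keep only real languages and sort them by source lines, largest first.
real_languages = [entry for entry in languages if not entry["isPseudoLanguage"]]
ranked = sorted(real_languages, key=lambda entry: entry["sourceCount"], reverse=True)

for entry in ranked:
    print(f'{entry["language"]}: {entry["sourceCount"]} SLOC in {entry["fileCount"]} files')
```
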
In ``summary`` the total counts across the whole project can be accessed, for
example:

.. code-block:: JavaScript

    "summary": {
      "totalDocumentationCount": 410,
      "totalEmptyCount": 869,
      "totalFileCount": 32,
      "totalSourceCount": 2930
    }

The ``runtime`` entry collects general information about how well pygount
performed while analyzing the files, for example:

.. code-block:: JavaScript

    "runtime": {
      "durationInSeconds": 0.712625,
      "filesPerSecond": 44.904402736362044,
      "finishedAt": "2022-01-05T11:49:27.009310",
      "linesPerSecond": 5906.332222417121,
      "startedAt": "2022-01-05T11:49:26.296685"
    }

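
The timestamps in this example look like ISO 8601 date-times without a time
zone. Assuming that holds in general, which the text here does not state, they
can be parsed with Python's standard library and compared with
``durationInSeconds``:

```python
import json
from datetime import datetime

# Values copied from the example above.
runtime = json.loads("""
{
  "durationInSeconds": 0.712625,
  "filesPerSecond": 44.904402736362044,
  "finishedAt": "2022-01-05T11:49:27.009310",
  "linesPerSecond": 5906.332222417121,
  "startedAt": "2022-01-05T11:49:26.296685"
}
""")

# Assumption: both timestamps are naive ISO 8601 strings.
started_at = datetime.fromisoformat(runtime["startedAt"])
finished_at = datetime.fromisoformat(runtime["finishedAt"])
elapsed_seconds = (finished_at - started_at).total_seconds()

# Derive the number of files from the rate and the duration.
estimated_file_count = round(runtime["filesPerSecond"] * runtime["durationInSeconds"])

print(elapsed_seconds, estimated_file_count)
```

For these example values, the elapsed time agrees with ``durationInSeconds``
and the derived file count agrees with ``totalFileCount`` in the summary
example.
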
Pretty printing
---------------

Because the output is concise and consequently mostly illegible for a
human reader, you might want to pipe it through a pretty printer. As you
already have Python installed, the easiest way is:

.. code-block:: sh

    pygount --format json | python -m json.tool

Another alternative is `jq <https://stedolan.github.io/jq/>`_:

.. code-block:: sh

    pygount --format json | jq .
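
If the JSON is already being processed in a Python script, the same effect
can be had with the standard ``json`` module. This is a minimal sketch; the
compact text is a shortened, made-up stand-in for real pygount output.

```python
import json

# Shortened, made-up stand-in for compact pygount output.
compact_text = '{"formatVersion":"1.0.0","summary":{"totalFileCount":32,"totalSourceCount":2930}}'

# Parse the compact text and write it again with indentation.
pretty_text = json.dumps(json.loads(compact_text), indent=2)

print(pretty_text)
```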
18 changes: 13 additions & 5 deletions docs/usage.rst
@@ -36,6 +36,12 @@ To limit the analysis on certain file types, you can specify a comma separated
list of suffixes to take into account, for example ``--suffix=py,sql,xml``.

.. option:: --out FILE

By default the results of the analysis are written to the standard output. To
redirect the output to a file, use for example ``--out=counts.txt``.

To explicitly redirect to the standard output specify ``--out=STDOUT``.

.. option:: --format FORMAT

By default the results of the analysis are written to the standard output in a
@@ -64,6 +70,9 @@ overview and a sum total. For example pygount's summary looks like this::
The summary output is designed for human readers and the column widths adjust
to the data.

To process the results of pygount further, ``--format=json`` is usually the
easiest format to work with. For more information see :doc:`json`.


Patterns
--------
@@ -90,6 +99,8 @@ So for example to specify that generated code can also contain the German word
``--generated="[regex][...](?i).*generiert"``.


.. _duplicates:

Counting duplicates
-------------------

@@ -154,16 +165,13 @@ Pseudo languages
If a source code is not counted, the number of lines is 0 and the language
shown is a pseudo language indicating the reason:

* ``__binary__`` - the source code is a binary file; the detection of binary files
first ensures that file does not start with a BOM for UTF-8, UTF-16 or
UTF-32 (which indicates text files). After that it checks for zero bytes
within the initial 8192 bytes of the file.
* ``__binary__`` - used for :ref:`binary`.
* ``__duplicate__`` - the source code is a duplicate as described for the
command line option :option:`--duplicates`.
* ``__empty__`` - the source code is an empty file with a size of 0 bytes.
* ``__error__`` - the source code could not be parsed e.g. due to an I/O error.
* ``__generated__`` - the source code is generated according to the command line
option ``--generated``.
option :option:`--generated`.
* ``__unknown__`` - pygments does not provide a lexer to parse the source code.


5 changes: 5 additions & 0 deletions pygount/analysis.py
@@ -391,6 +391,11 @@ def string_count(self) -> int:
"""number of lines containing only strings but no other code"""
return self._string

@property
def source_count(self) -> int:
"""number of source lines of code (the sum of code_count and string_count)"""
return self.code_count + self.string_count

@property
def code(self) -> int:
# TODO #47: Remove deprecated property.
3 changes: 2 additions & 1 deletion pygount/command.py
@@ -14,7 +14,7 @@
import pygount.write

#: Valid formats for option --format.
VALID_OUTPUT_FORMATS = ("cloc-xml", "sloccount", "summary")
VALID_OUTPUT_FORMATS = ("cloc-xml", "json", "sloccount", "summary")

_DEFAULT_ENCODING = "automatic"
_DEFAULT_OUTPUT_FORMAT = "sloccount"
@@ -55,6 +55,7 @@

_OUTPUT_FORMAT_TO_WRITER_CLASS_MAP = {
"cloc-xml": pygount.write.ClocXmlWriter,
"json": pygount.write.JsonWriter,
"sloccount": pygount.write.LineWriter,
"summary": pygount.write.SummaryWriter,
}
11 changes: 9 additions & 2 deletions pygount/summary.py
@@ -57,6 +57,11 @@ def string_count(self) -> int:
"""sum number of lines containing only strings for this language"""
return self._string_count

@property
def source_count(self) -> int:
"""sum number of source lines of code"""
return self.code_count + self.string_count

@property
def is_pseudo_language(self) -> bool:
"""``True`` if the language is not a real programming language"""
@@ -127,14 +132,16 @@ def total_documentation_count(self) -> int:

@property
def total_empty_count(self) -> int:

return self._total_empty_count

@property
def total_string_count(self) -> int:

return self._total_string_count

@property
def total_source_count(self) -> int:
return self.total_code_count + self.total_string_count

@property
def total_file_count(self) -> int:
return self._total_file_count