Merge pull request #15 from sarnold/ssg-nist-controls

Doc stitching and cleanup
sarnold · Apr 19, 2024 · ecdff90
2 parents 3e6cbe0 + 9129977
commit ecdff90

Showing 22 changed files with 759 additions and 188 deletions.
diff --git a/.gitignore b/.gitignore
@@ -8,6 +8,7 @@ ext/
 .ymltoxml.y*
 .yasort.y*
 .yagrep.y*
+.oscal.y*
 in.*
 out.*
 sorted-out/

diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
@@ -47,6 +47,7 @@ repos:
       additional_dependencies:
         - importlib_metadata
         - importlib_resources
+        - natsort
         - munch
         - munch-stubs
         - ruamel.yaml

diff --git a/README.rst b/README.rst
@@ -1,19 +1,19 @@
-=========================
- ymltoxml (and xmltoyml)
-=========================
+================================
+ ymltoxml (and more YAML tools)
+================================
 
 |ci| |wheels| |release| |badge| |coverage|
 
 |pre| |cov| |pylint|
 
 |tag| |license| |python|
 
-Python command line tools to convert between XML_ files and YAML_ files,
+Python command line tools to convert files between XML_ and YAML_,
 preserving attributes and comments (with minor corrections).  The default
 file encoding for both types is UTF-8 without a BOM. Now includes more
 console entry points to grep or sort interesting YAML files (eg, lists
-of rules found in the `SCAP Security Guide`_) and more input file types
-to ingest SSG and other upstream data, eg, NIST oscal-content_.
+of rules found in the `SCAP Security Guide`_) and support for more input
+file types to ingest SSG and other upstream data, eg, NIST oscal-content_.
 
 .. _SCAP Security Guide: https://github.com/ComplianceAsCode/content
 .. _oscal-content: https://github.com/usnistgov/oscal-content.git
@@ -96,9 +96,9 @@ type::
   can be relative or absolute)
 
   + when passing input files as arguments, the output file names/paths
-    are the same as the input files but with the output extension
+    are the same as the input files but with the (new) output extension
 
-By default it will process one more input files as command args, typically
+By default it will process one or more input files as command args, typically
 in the current directory, however, the ``--infile`` option will only
 process a single file path, optionally with an output file path, with no
 extra (file) arguments.
@@ -110,7 +110,7 @@ copy must be named ``.ymltoxml.yaml``.  To get a copy of the default
 configuration file, do::
 
   $ cd path/to/work/dir/
-  $ ymltoxml --dump-config > .ymltoxml.yaml
+  $ ymltoxml --save-config
   $ $EDITOR .ymltoxml.yaml
 
 yagrep
@@ -129,13 +129,13 @@ General usage guidelines:
 
 * use the ``-f`` (filter) arg to search for a value string
 * follow the (json) output from above to find the key name
-* then use the ``-l`` (lookup) arg to extract the values for the above key
+* then use the ``-l`` (lookup) arg to extract the values for the key
 
 Useful yagrep config file settings:
 
 :default_separator: change the path separator to something like ``;`` if data
                     has forward slashes
-:output_format: set the output format to ``raw`` for unformmated output
+:output_format: set the output format to ``raw`` for unformatted output
 
 ::
 
@@ -169,9 +169,9 @@ Useful yagrep config file settings:
 yasort
 ------
 
-Another helper script is included for sorting large (YAML) lists.
-The ``yasort`` script also uses its own configuration file, creatively named
-``.yasort.yaml``. The above applies equally to this config file.
+Yet another helper script is included for sorting large (YAML) lists.
+The ``yasort`` script also uses its own configuration file, creatively
+named ``.yasort.yaml``. The above applies equally to this config file.
 
 ::
 
@@ -237,14 +237,14 @@ Dev workflows
 The following covers two types of workflows, one for tool usage in other
 (external) projects, and one for (internal) tool development.
 
-Mavlink support
----------------
+Mavlink use case
+----------------
 
-The ymltoxml tool is intended to be part of larger workflow, ie, developing
-custom mavlink message dialects and generating/deploying the resulting
-mavlink language interfaces.  To be more specific, for this example we
-use a mavlink-compatible component running on a micro-controller, thus
-the target language bindings are C and C++.
+The ymltoxml tools are intended to be part of a larger workflow, ie,
+developing custom mavlink message dialects and generating/deploying the
+resulting mavlink language interfaces.  To be more specific, for this
+example we use a mavlink-compatible component running on a micro-controller,
+thus the target language bindings are C and C++.
 
 Tool requirements for the full mavlink workflow:
 
@@ -259,13 +259,14 @@ only Git, Python, and Tox.
 .. _XML: https://en.wikipedia.org/wiki/Extensible_Markup_Language
 .. _YAML: https://en.wikipedia.org/wiki/YAML
 
-SCAP support
-------------
+SCAP use case
+-------------
 
-The yasort/yagrep tools are intended to be part of a larger workflow, mainly
-working with SCAP content, ie, the scap-security-guide source files (or
-just content_). It is currently used to sort profiles with large numbers
-of rules to make it easier to visually diff and spot duplicates, etc.
+The yasort/yagrep tools are also intended to be part of a larger
+workflow, mainly working with SCAP content, ie, the scap-security-guide
+source files (or just content_). It is currently used to sort profiles
+with large numbers of rules, as well as create control files and analyze
+existing controls.
 
 The yasort configuration file defaults are based on existing yaml structure,
 but feel free to change them for another use case. To adjust how the sorting
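
The yagrep workflow described in the README has three steps: filter on a value string, read the key name from the JSON output, then look up every value stored under that key. It can be sketched in plain Python. This is a hypothetical stand-in, not yagrep's implementation. `find_key_paths()` and `lookup_key()` are illustrative names. The `sep` argument only mimics what the `default_separator` setting is described as doing, which is changing the path separator when the data itself contains forward slashes. The project lists `dpath` and `nested-lookup` as dependencies, and yagrep may rely on those; this sketch uses neither.

```python
"""Hypothetical sketch of the yagrep filter -> lookup workflow (stdlib only)."""

import json
from typing import Any, Iterator


def find_key_paths(doc: Any, needle: str, sep: str = '/', prefix: str = '') -> Iterator[str]:
    """Yield separator-joined paths to every string value containing ``needle``."""
    if isinstance(doc, dict):
        items = list(doc.items())
    elif isinstance(doc, list):
        items = list(enumerate(doc))
    else:
        if isinstance(doc, str) and needle in doc:
            yield prefix
        return
    for key, value in items:
        path = f'{prefix}{sep}{key}' if prefix else str(key)
        yield from find_key_paths(value, needle, sep, path)


def lookup_key(doc: Any, key: str) -> list:
    """Return every value stored under ``key`` at any nesting depth."""
    found = []
    if isinstance(doc, dict):
        for k, v in doc.items():
            if k == key:
                found.append(v)
            found.extend(lookup_key(v, key))
    elif isinstance(doc, list):
        for item in doc:
            found.extend(lookup_key(item, key))
    return found


# toy data, not real SSG content
data = {
    'id': 'nist_example',
    'controls': [
        {'id': 'ac-2', 'rules': ['rule_a']},
        {'id': 'ac-10', 'rules': ['path/with/slashes']},
    ],
}

# step 1: filter on a value; ';' keeps paths readable when values contain '/'
print(json.dumps(list(find_key_paths(data, 'slashes', sep=';'))))  # ["controls;1;rules;0"]
# step 2: look up all values for the key name seen in that path
print(lookup_key(data, 'rules'))  # [['rule_a'], ['path/with/slashes']]
```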

diff --git a/pyproject.toml b/pyproject.toml
@@ -40,7 +40,7 @@ exclude_lines = [
 ]
 
 [tool.black]
-line-length = 90
+line-length = 88
 skip-string-normalization = true
 include = '\.py$'
 exclude = '''

diff --git a/requirements.txt b/requirements.txt
@@ -3,6 +3,7 @@ dpath
 importlib-metadata; python_version < '3.8'
 importlib-resources; python_version < '3.10'
 munch
+natsort
 nested-lookup
 pystache==0.6.5
 PyYAML

diff --git a/scripts/analyze_ssg_controls.py b/scripts/analyze_ssg_controls.py
@@ -4,20 +4,16 @@
 
 import os
 import sys
+import tempfile
 import typing
 from collections import Counter
 from pathlib import Path
 
 from diskcache import Deque
-from nested_lookup import nested_lookup
 
+from nested_lookup import nested_lookup
 from ymltoxml.templates import xform_id
-from ymltoxml.utils import (
-    FileTypeError,
-    get_cachedir,
-    get_filelist,
-    text_file_reader,
-)
+from ymltoxml.utils import FileTypeError, get_filelist, text_file_reader
 
 id_count: typing.Counter[str] = Counter()
 id_queue = Deque(get_cachedir(dir_name='id_queue'))
@@ -37,6 +33,15 @@
 ]
 
 
+def get_cachedir(dir_name='yml_cache'):
+    """
+    Get temp cachedir (create it if needed) and override the dir_name if
+    passed.
+    """
+    cache_dir = tempfile.gettempdir()
+    return os.path.join(cache_dir, dir_name)
+
+
 def set_unique(sequence):
     """
     Remove duplicates and emulate a set with ordered elements.
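
As far as the visible hunks show, this change drops the `get_cachedir` import but still calls `get_cachedir(dir_name='id_queue')` at module level. That call sits above the new `def get_cachedir` further down the file, so importing the script would raise `NameError` unless the function is defined before that line. The new docstring also says the directory is created if needed, while the body only joins a path. Below is a minimal sketch of a helper that matches its docstring, defined before use. It also includes one common way to write an order-preserving `set_unique()`. The real `set_unique` body is not shown here, so that part is an assumption.

```python
"""Sketch: define-before-use cache dir helper and an ordered de-duplicator."""

import os
import tempfile
from collections import Counter


def get_cachedir(dir_name='yml_cache'):
    """
    Return a cache directory under the system temp dir, creating it if
    needed; ``dir_name`` overrides the default subdirectory name.
    """
    cache_dir = os.path.join(tempfile.gettempdir(), dir_name)
    os.makedirs(cache_dir, exist_ok=True)
    return cache_dir


def set_unique(sequence):
    """
    Remove duplicates while keeping first-seen order (dicts preserve
    insertion order in Python 3.7+).
    """
    return list(dict.fromkeys(sequence))


# module-level state goes *after* the helper it depends on
queue_dir = get_cachedir(dir_name='id_queue')
ids = ['ac-2', 'ac-10', 'ac-2', 'au-3']
id_count = Counter(ids)

print(os.path.isdir(queue_dir))  # True
print(set_unique(ids))           # ['ac-2', 'ac-10', 'au-3']
print(id_count.most_common(1))   # [('ac-2', 2)]
```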

diff --git a/scripts/csvchk.py b/scripts/csvchk.py
@@ -0,0 +1,16 @@
+"""
+Simple consumer test.
+"""
+
+from ymltoxml.utils import text_data_writer, text_file_reader
+
+OPTS = {
+    'file_encoding': 'utf-8',
+    'output_format': 'csv',
+}
+
+
+# read in some json "column data"
+data = text_file_reader('tests/data/catalog.json', OPTS)
+# spit out CSV records
+ret = text_data_writer(data, OPTS)
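
The consumer test above passes JSON "column data" to `text_data_writer()` with `output_format: 'csv'`. This diff does not show the exact data shape ymltoxml expects. The stdlib sketch below assumes a list of flat JSON objects, one per row. It writes them as CSV records with a header row, which is roughly what such a writer has to do. `rows_to_csv` is a hypothetical name, not part of ymltoxml.

```python
"""Hypothetical sketch: JSON records -> CSV text using only the stdlib."""

import csv
import io
import json


def rows_to_csv(records, fieldnames=None):
    """
    Write a list of flat dicts as CSV. The header comes from ``fieldnames``
    or, if omitted, from the keys in first-seen order across all records;
    missing keys are written as empty fields.
    """
    if fieldnames is None:
        fieldnames = list(dict.fromkeys(k for rec in records for k in rec))
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator='\n')
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


text = (
    '[{"id": "ac-1", "title": "Policy and Procedures"},'
    ' {"id": "ac-2", "title": "Account Management"}]'
)
csv_text = rows_to_csv(json.loads(text))
print(csv_text, end='')
```

The header is derived from the data here. A fixed header list, such as whatever `default_csv_hdr` is meant to hold, could be passed as `fieldnames` instead.
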
diff --git a/scripts/xform_idchk.py b/scripts/xform_idchk.py
@@ -0,0 +1,23 @@
+"""
+Simple consumer test.
+"""
+
+from natsort import os_sorted
+
+from ymltoxml.templates import xform_id
+from ymltoxml.utils import text_file_reader
+
+OPTS = {
+    'file_encoding': 'utf-8',
+    'output_format': 'raw',
+    'default_csv_hdr': None,
+}
+
+# read in some json "column data"
+data = text_file_reader('tests/data/OE-expanded-profile-all-ids.txt', OPTS)
+if data[0].isupper():
+    lc_ids = [xform_id(x) for x in data]
+
+# spit out lowercase id format
+for ctl in os_sorted(lc_ids):
+    print(ctl)
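
`natsort.os_sorted` is what puts `ac-2` ahead of `ac-10` here. A plain `sorted()` compares character by character and would put `ac-10` first. A rough stdlib approximation of that natural ordering follows. It is not natsort's algorithm, since `os_sorted` mimics the platform file browser and handles many more cases. Note also that in the script as committed, `lc_ids` is only bound when `data[0].isupper()` is true, so the final loop would raise `NameError` for input that is already lowercase. The sketch always binds its list. It uses `normalize_ids()`, a hypothetical stand-in for `xform_id` that only lowercases, because the real transform is not shown in this diff.

```python
"""Rough stdlib approximation of natural ("human") sort for control IDs."""

import re

_DIGITS = re.compile(r'(\d+)')


def natural_key(text):
    """
    Split into alternating non-digit/digit runs; the capture group in
    re.split puts digit runs at odd indexes, which compare as ints.
    """
    return [int(part) if i % 2 else part.lower() for i, part in enumerate(_DIGITS.split(text))]


def normalize_ids(ids):
    """
    Hypothetical stand-in for xform_id: lowercase uppercase IDs and pass
    others through unchanged, so the result is always bound.
    """
    return [x.lower() if x.isupper() else x for x in ids]


ids = normalize_ids(['AC-10', 'AC-2', 'AC-2(1)', 'AU-3'])
print(sorted(ids))                   # ['ac-10', 'ac-2', 'ac-2(1)', 'au-3']
print(sorted(ids, key=natural_key))  # ['ac-2', 'ac-2(1)', 'ac-10', 'au-3']
```
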
diff --git a/setup.cfg b/setup.cfg
@@ -27,7 +27,7 @@ setup_requires =
 
 install_requires =
     importlib-resources; python_version < '3.10'
-    diskcache
+    natsort
     nested-lookup
     xmltodict
     munch
@@ -57,6 +57,9 @@ console_scripts =
 # extra deps are included here mainly for local/venv installs using pip
 # otherwise deps are handled via tox, ci config files or pkg managers
 [options.extras_require]
+demos =
+    diskcache
+
 doc =
     sphinx
     sphinx_git

diff --git a/src/ymltoxml/data/oscal.yaml b/src/ymltoxml/data/oscal.yaml
@@ -0,0 +1,19 @@
+---
+# comments should be preserved
+file_encoding: 'utf-8'
+default_ext: '.yaml'
+default_content_path: 'ext/oscal-content/nist.gov/SP800-53/rev5'
+default_profile_glob: '*resolved-profile_catalog.yaml'
+default_profile_name: 'PRIVACY'
+default_ssg_glob: 'nist_rhcos4.yml'
+default_ssg_path: 'ext/content/controls'
+default_lookup_key: 'controls'
+default_csv_hdr: null
+new_csv_hdrs: []
+input_format: null
+output_format: 'json'
+preserve_quotes: true
+process_comments: false
+mapping: 4
+sequence: 6
+offset: 4
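
The new `oscal.yaml` defaults pair a content path with a filename glob for resolved profile catalogs. The `mapping`, `sequence` and `offset` keys look like the arguments of ruamel.yaml's `YAML.indent()`. Below is a hypothetical sketch of how the path and glob might be combined to locate input files. `find_profiles` is an illustrative name. Searching recursively with `rglob` is an assumption, since the glob alone does not say which subdirectory holds the catalogs. The YAML loading step is omitted, and a plain dict stands in for the parsed config.

```python
"""Hypothetical sketch: locate OSCAL profile catalogs from oscal.yaml-style defaults."""

import tempfile
from pathlib import Path

# subset of the shipped defaults as a plain dict (YAML parsing omitted)
CFG = {
    'default_content_path': 'ext/oscal-content/nist.gov/SP800-53/rev5',
    'default_profile_glob': '*resolved-profile_catalog.yaml',
}


def find_profiles(cfg, root='.'):
    """Return sorted paths under root/default_content_path matching the glob."""
    base = Path(root) / cfg['default_content_path']
    return sorted(base.rglob(cfg['default_profile_glob']))


with tempfile.TemporaryDirectory() as tmp:
    yaml_dir = Path(tmp, CFG['default_content_path'], 'yaml')
    yaml_dir.mkdir(parents=True)
    # made-up file names; only the suffix matters for the glob
    for name in ('PRIVACY-baseline-resolved-profile_catalog.yaml', 'catalog.yaml'):
        (yaml_dir / name).touch()
    found = [p.name for p in find_profiles(CFG, root=tmp)]

print(found)  # ['PRIVACY-baseline-resolved-profile_catalog.yaml']
```
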
diff --git a/src/ymltoxml/data/yagrep.yaml b/src/ymltoxml/data/yagrep.yaml
@@ -3,10 +3,7 @@
 file_encoding: 'utf-8'
 default_ext: '.yaml'
 default_separator: '/'
-default_oscal_path: 'ext/oscal-content'
-default_profile_path: 'nist.gov/SP800-53/rev5'
-default_ssg_glob: 'nist_*.yml'
-default_ssg_path: 'ext/content/controls'
+default_csv_hdr: null
 input_format: null
 output_format: 'json'
 preserve_quotes: true