@hop-dev hop-dev commented Apr 15, 2025

Pull Request

Summary - What I changed

Add a generate-json command which converts the hunting TOML files to JSON.

JSON files are written to a json directory, in the same way that docs generation produces the docs folder.

I have added this output directory to .gitignore.

I have added this because I am investigating adding these prebuilt queries to Kibana and wanted them in JSON format.

How To Test

python3.12 -m hunting generate-json

Checklist

  • Added a label for the type of pr: bug, enhancement, schema, maintenance, Rule: New, Rule: Deprecation, Rule: Tuning, Hunt: New, or Hunt: Tuning so guidelines can be generated
  • Documentation and comments were added for features that require explanation

@hop-dev hop-dev added the enhancement New feature or request label Apr 15, 2025
@github-actions
Contributor

Enhancement - Guidelines

These guidelines serve as a reminder of considerations to address when adding a feature to the code.

Documentation and Context

  • Describe the feature enhancement in detail (alternative solutions, description of the solution, etc.) if not already documented in an issue.
  • Include additional context or screenshots.
  • Ensure the enhancement includes necessary updates to the documentation and versioning.

Code Standards and Practices

  • Code follows established design patterns within the repo and avoids duplication.
  • Code changes do not introduce new warnings or errors.
  • Variables and functions are well-named and descriptive.
  • Any unnecessary / commented-out code is removed.
  • Ensure that the code is modular and reusable where applicable.
  • Check for proper exception handling and messaging.

Testing

  • New unit tests have been added to cover the enhancement.
  • Existing unit tests have been updated to reflect the changes.
  • Provide evidence of testing and validating the enhancement (e.g., test logs, screenshots).
  • Validate that any rules affected by the enhancement are correctly updated.
  • Ensure that performance is not negatively impacted by the changes.
  • Verify that any release artifacts are properly generated and tested.

Additional Checks

  • Ensure that the enhancement does not break existing functionality.
  • Review the enhancement with a peer or team member for additional insights.
  • Verify that the enhancement works across all relevant environments (e.g., different OS versions).
  • Confirm that all dependencies are up-to-date and compatible with the changes.
  • Confirm that the proper version label is applied to the PR: patch, minor, or major.

@hop-dev hop-dev marked this pull request as ready for review April 15, 2025 13:32
@hop-dev hop-dev added the patch label Apr 15, 2025
@botelastic botelastic bot added the Hunting label Apr 15, 2025
hunting/json.py Outdated
"license": hunt_config.license
}

return json_data
Contributor

Just checking: is there a particular reason to use this effectively custom JSON format? The code generally looks fine if we want to use this format; however, if not, it could be substantially simpler.

For instance, in your generate_json you could load each TOML file as a Hunt dataclass object and use the built-in json.dumps function to convert the dataclass to JSON.

E.g.

import json
from dataclasses import asdict
from hunting.definitions import Hunt

# Assuming you have a Hunt object
hunt = Hunt(
    author="Elastic",
    description="Example hunt",
    integration=["integration1", "integration2"],
    uuid="123e4567-e89b-12d3-a456-426614174000",
    name="Example Hunt",
    language=["esql"],
    license="Elastic License",
    query=["from logs | stats count() by host.name"],
    notes=["Example note"],
    mitre=["T1003"],
    references=["https://example.com"]
)

# Convert Hunt object to JSON
hunt_json = json.dumps(asdict(hunt), indent=4)

print(hunt_json)

Would result in

❯ python test_hunt.py
{
    "author": "Elastic",
    "description": "Example hunt",
    "integration": [
        "integration1",
        "integration2"
    ],
    "uuid": "123e4567-e89b-12d3-a456-426614174000",
    "name": "Example Hunt",
    "language": [
        "esql"
    ],
    "license": "Elastic License",
    "query": [
        "from logs | stats count() by host.name"
    ],
    "notes": [
        "Example note"
    ],
    "mitre": [
        "T1003"
    ],
    "references": [
        "https://example.com"
    ]
}

This could be called in generate_json on a glob of the TOML files in the provided directory, and you could write the JSON objects in a similar way to how you are writing them now.

Author

You're right, I was over-complicating it. I had copied the Markdown converter and was mimicking the Markdown structure, which was completely unnecessary. I've implemented your recommendation 👍

Contributor

If we are OK with the less complex JSON structure, have you considered not using any of the code in hunting/json.py and adding the functionality with something similar to the following?

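import json
from dataclasses import asdict
from pathlib import Path

import click

# `hunting` (the click group) and `load_toml` come from the existing hunting package.
@hunting.command('generate-json')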
@click.option('--path', type=Path, default=None, help="Path to a TOML file or directory containing TOML files.")
@click.option(
    '--output-folder',
    type=Path,
    default=Path("json"),
    show_default=True,
    help="Output folder to save the generated JSON files. Defaults to './json'."
)
def generate_json(path: Path = None, output_folder: Path = None):
    """Convert TOML hunting queries to JSON format and save to output folder."""
    output_folder = Path(output_folder)
    output_folder.mkdir(parents=True, exist_ok=True)

    # Determine the list of files to process
    if path:
        path = Path(path)
        if path.is_file() and path.suffix == '.toml':
            files_to_process = [path]
        elif path.is_dir():
            files_to_process = list(path.glob('*.toml'))
        else:
            raise ValueError(f"Invalid path provided: {path}")
    else:
        raise ValueError("Path must be provided as a file or directory.")

    # Process each file
    for file_path in files_to_process:
        hunt_contents = load_toml(file_path)
        json_hunt_contents = json.dumps(asdict(hunt_contents), indent=4)
        output_file = output_folder / f"{file_path.stem}.json"
        with open(output_file, 'w') as f:
            f.write(json_hunt_contents)
        click.echo(f"Generated JSON: {output_file}")

markdown_generator.update_index_md()

@hunting.command('generate-json')
@click.argument('path', required=False)
Contributor

@traut traut Apr 18, 2025

from pathlib import Path

...

@click.argument("path", type=click.Path(dir_okay=True, path_type=Path, exists=True))

would ensure the argument path has the correct type, so there will be no need for forced conversion below:

path = Path(path)
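
A minimal, standalone sketch of the same idea, assuming click 8 or later (where click.Path accepts path_type). The command body here only echoes file names for illustration and is not the PR's implementation:

```python
from pathlib import Path

import click


@click.command("generate-json")
@click.argument("path", type=click.Path(exists=True, dir_okay=True, path_type=Path))
def generate_json(path: Path) -> None:
    """Echo the TOML files that would be converted (illustration only)."""
    # `path` already arrives as a pathlib.Path, so no Path(path) call is needed.
    files = [path] if path.is_file() else sorted(path.glob("*.toml"))
    for file_path in files:
        click.echo(file_path.name)
```

With exists=True, click rejects a missing path with a usage error before the function body runs, so the manual "Invalid path provided" check also becomes unnecessary.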

from .definitions import Hunt
from .utils import load_index_file, load_toml

class JSONGenerator:
Contributor

There is really no need to create a class here: the grouping of the logic is already achieved by using a separate module (though it would be better to rename it to avoid confusion with the standard-library json package), and we have no use for state beyond this module.

I suggest refactoring this as separate stateless functions.
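
For illustration only, the stateless shape might look like the sketch below, living in a module with some name other than json.py. The function names render_json and output_path_for are hypothetical and not taken from the PR:

```python
import json
from pathlib import Path


def render_json(hunt_fields: dict) -> str:
    """Serialize one hunt's fields to an indented JSON string."""
    return json.dumps(hunt_fields, indent=4)


def output_path_for(toml_path: Path, output_folder: Path) -> Path:
    """Map a TOML file path to the JSON file path inside output_folder."""
    return output_folder / f"{toml_path.stem}.json"
```

Each function depends only on its arguments, so there is no instance to construct and each piece can be tested in isolation.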

@hop-dev hop-dev closed this Apr 24, 2025