Merged
3 changes: 1 addition & 2 deletions docs/source/pages/cli_usage/command_gen_build_spec.rst
Original file line number Diff line number Diff line change
Expand Up @@ -39,5 +39,4 @@ Options

.. option:: --output-format OUTPUT_FORMAT

The desired output format for the build specification. The default format is `rc-buildspec`, which is the Reproducible-Central build specification.
Other formats may be available depending on your configuration.
The output format. Can be ``default-buildspec`` (default) or ``rc-buildspec`` (the Reproducible Central build specification).
24 changes: 22 additions & 2 deletions docs/source/pages/output_files.rst
Original file line number Diff line number Diff line change
Expand Up @@ -21,6 +21,7 @@ Top level structure

output/
├── build_log/
├── buildspec/
├── git_repos/
├── reports/
├── debug.log
Expand All @@ -43,8 +44,8 @@ The report files of Macaron (from using the :ref:`analyze command <analyze-comma
Unique result path
''''''''''''''''''

For each target software component, Macaron creates a directory under ``reports`` to store the report files. This directory
path is formed from the PURL string of that component. The final path is created using the following template:
For each target software component, Macaron creates a directory under ``reports`` to store the report. The directory
path is formed from the PURL string of that component. The final path is created using the following template:

.. code-block::

Expand Down Expand Up @@ -131,6 +132,25 @@ to the directory:

.. note:: Please see :ref:`pages/using:analyzing a repository on the local file system` to know how to set the directory for analyzing local repositories.

.. _output_files_macaron_build_spec-Gen:

--------------------------------------
Output files of macaron gen-build-spec
--------------------------------------

As part of the ``gen-build-spec`` command, Macaron generates build spec files to help rebuild artifacts from source. For each target software component, Macaron creates a dedicated directory under ``buildspec`` to store the generated build specification file. The directory path is derived from the component's PURL (Package URL) string and follows this template:

.. code-block::

<path_to_output>/buildspec/<purl_type>/<purl_namespace>/<purl_name>

Depending on the chosen output format, the following files may be generated in each directory:

- ``macaron.buildspec`` (default format)
- ``reproducible_central.buildspec`` (when run with the ``rc-buildspec`` output format for Maven artifacts)

Each file contains the build specification for the corresponding software component.
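The path template above can be sketched in Python. This is only an illustration, not Macaron's implementation: ``parse_simple_purl`` and ``buildspec_path`` are hypothetical helpers, the toy parser ignores qualifiers and percent-encoding, and two behaviours are assumptions: dots in the namespace become underscores (inferred from the tutorial's example path ``output/buildspec/maven/org_apache_hugegraph/computer-k8s/``), and an empty namespace segment is skipped.

```python
from pathlib import Path

# File names per output format, taken from the list above.
BUILDSPEC_FILE_NAMES = {
    "default-buildspec": "macaron.buildspec",
    "rc-buildspec": "reproducible_central.buildspec",
}


def parse_simple_purl(purl: str) -> tuple[str, str, str]:
    """Split a simple ``pkg:type/namespace/name@version`` PURL into (type, namespace, name).

    Hypothetical toy parser: qualifiers and subpaths are dropped and nothing is percent-decoded.
    """
    if not purl.startswith("pkg:"):
        raise ValueError(f"Not a PURL: {purl}")
    body = purl[len("pkg:"):].split("#", 1)[0].split("?", 1)[0]
    body = body.rsplit("@", 1)[0]  # drop the version, if any
    parts = body.split("/")
    if len(parts) < 2 or not parts[-1]:
        raise ValueError(f"PURL has no name: {purl}")
    return parts[0], "/".join(parts[1:-1]), parts[-1]


def buildspec_path(output_dir: str, purl: str, output_format: str = "default-buildspec") -> Path:
    """Build <output_dir>/buildspec/<purl_type>/<purl_namespace>/<purl_name>/<file name>."""
    purl_type, namespace, name = parse_simple_purl(purl)
    path = Path(output_dir) / "buildspec" / purl_type
    if namespace:
        # Assumption: dots become underscores, as in the tutorial's example path.
        path = path / namespace.replace(".", "_")
    return path / name / BUILDSPEC_FILE_NAMES[output_format]


example = buildspec_path(
    "output", "pkg:maven/org.apache.hugegraph/computer-k8s@1.0.0", "rc-buildspec"
)
print(example.as_posix())
```

For the PURL used in the tutorial, this yields the same location that the tutorial reports for the ``rc-buildspec`` output.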


.. _output_files_macaron_verify_policy:

-------------------------------------
Expand Down
3 changes: 2 additions & 1 deletion docs/source/pages/supported_technologies/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -29,7 +29,8 @@ such as GitHub Actions workflows.
Build Specification Generation
------------------------------

* Maven and Gradle builds for Java artifacts
* Maven and Gradle builds for Java packages
* The built-in ``build`` module and various build tools, such as Poetry, for Python packages

.. _supported_git_services:

Expand Down
54 changes: 48 additions & 6 deletions docs/source/pages/tutorials/rebuild_third_party_artifacts.rst
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,7 @@ These buildspecs help document and automate the build process for packages, enab

* - Currently Supported packages
* - Maven packages built with Gradle or Maven
* - Python packages built with the built-in ``build`` module and various build tools, like Poetry

.. contents:: :local:

Expand All @@ -31,9 +32,9 @@ Addressing this lack of transparency is critical for improving supply chain secu
Background
**********

A build specification is a file that describes all necessary information to rebuild a package from source. This includes metadata such as the build tool, the specific build command to run, the language version, e.g., JDK for Java, and artifact coordinates. Macaron can now generate this file automatically for supported ecosystems, greatly simplifying build from source.
A build specification is a file that describes all the information necessary to rebuild a package from source. This includes metadata such as the build tool, the specific build command to run, the language version (e.g., the Python version, or the JDK version for Java), and artifact coordinates. Macaron can now generate this file automatically for supported ecosystems, greatly simplifying building from source.

The generated buildspec will be stored in an ecosystem- and PURL-specific path under the ``output/`` directory (see more under :ref:`Output Files Guide <output_files_guide>`).
The generated buildspec will be stored in an ecosystem- and PURL-specific path under the ``output/`` directory (see more under :ref:`Output Files Guide <output_files_macaron_build_spec-Gen>`).

******************************
Installation and Prerequisites
Expand Down Expand Up @@ -101,7 +102,48 @@ In the example above, the buildspec is located at:
Step 3: Review and Use the Buildspec File
*****************************************

The generated buildspec uses the `Reproducible Central buildspec <https://github.com/jvm-repo-rebuild/reproducible-central/blob/master/doc/BUILDSPEC.md>`_ format, for example:
By default we generate the buildspec in JSON format as follows:

.. code-block:: json

    {
        "macaron_version": "0.18.0",
        "group_id": "org.apache.hugegraph",
        "artifact_id": "computer-k8s",
        "version": "1.0.0",
        "git_repo": "https://github.com/apache/hugegraph-computer",
        "git_tag": "d2b95262091d6572cc12dcda57d89f9cd44ac88b",
        "newline": "lf",
        "language_version": [
            "11"
        ],
        "ecosystem": "maven",
        "purl": "pkg:maven/org.apache.hugegraph/computer-k8s@1.0.0",
        "language": "java",
        "build_tools": [
            "maven"
        ],
        "build_commands": [
            [
                "mvn",
                "-DskipTests=true",
                "-Dmaven.test.skip=true",
                "-Dmaven.site.skip=true",
                "-Drat.skip=true",
                "-Dmaven.javadoc.skip=true",
                "clean",
                "package"
            ]
        ]
    }
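As a rough illustration of how the default JSON buildspec might be consumed, the sketch below loads an abridged copy of the example and renders each entry of ``build_commands`` as a shell-ready string. The helper name ``render_build_commands`` is hypothetical and not part of Macaron; only the field names come from the example above.

```python
import json
import shlex

# Abridged copy of the example buildspec; only the fields used here are kept.
BUILDSPEC_JSON = """
{
    "purl": "pkg:maven/org.apache.hugegraph/computer-k8s@1.0.0",
    "language_version": ["11"],
    "build_tools": ["maven"],
    "build_commands": [
        ["mvn", "-DskipTests=true", "-Dmaven.test.skip=true", "clean", "package"]
    ]
}
"""


def render_build_commands(buildspec: dict) -> list[str]:
    """Join each command (a list of arguments) into one safely quoted shell string."""
    return [shlex.join(command) for command in buildspec["build_commands"]]


spec = json.loads(BUILDSPEC_JSON)
commands = render_build_commands(spec)
for command in commands:
    print(command)
```

Keeping each command as a list of arguments avoids quoting problems; joining is only needed when the command has to be pasted into a shell or a script.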

If you use the ``rc-buildspec`` output format, the generated buildspec follows the `Reproducible Central buildspec <https://github.com/jvm-repo-rebuild/reproducible-central/blob/master/doc/BUILDSPEC.md>`_ format. For example, you can generate it with:

.. code-block:: shell

./run_macaron.sh gen-build-spec -purl pkg:maven/org.apache.hugegraph/computer-k8s@1.0.0 --database output/macaron.db --output-format rc-buildspec

The resulting file will be saved as ``output/buildspec/maven/org_apache_hugegraph/computer-k8s/reproducible_central.buildspec``, and will look like this:

.. code-block:: ini

Expand Down Expand Up @@ -136,18 +178,18 @@ The ``gen-build-spec`` works as follows:

- Extracts metadata and build information from Macaron’s local SQLite database.
- Parses and modifies build commands from CI/CD configurations to ensure compatibility with rebuild systems.
- Identifies the JDK version by parsing CI/CD configurations or extracting it from the ``META-INF/MANIFEST.MF`` file in Maven Central artifacts.
- Identifies the language version (e.g., the JDK version) by parsing CI/CD configurations or by extracting it from the ``META-INF/MANIFEST.MF`` file in Maven Central artifacts.
- Ensures that only the major JDK version is included, as required by the build specification format.
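The last step, reducing a JDK version string to its major version, could look roughly like the sketch below. ``major_jdk_version`` is a hypothetical helper written for illustration and is not taken from Macaron's code base. It assumes the two common version schemes: the legacy ``1.x`` scheme (for example ``1.8.0_292``) and the modern scheme (for example ``11.0.2``).

```python
import re


def major_jdk_version(version: str) -> str:
    """Return the major JDK version for a version string.

    Legacy versions such as "1.8.0_292" map to "8"; modern versions such as
    "11.0.2" or "17" map to their first component.
    """
    parts = [part for part in re.split(r"[._+\-]", version.strip()) if part]
    if not parts or not parts[0].isdigit():
        raise ValueError(f"Unrecognized JDK version: {version!r}")
    # In the legacy scheme the leading "1" is a prefix, not the major version.
    if parts[0] == "1" and len(parts) > 1 and parts[1].isdigit():
        return parts[1]
    return parts[0]


for raw in ("1.8.0_292", "11.0.2", "17"):
    print(raw, "->", major_jdk_version(raw))
```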


This feature is described in more detail in our accepted ASE 2025 Industry ShowCase paper: `Unlocking Reproducibility: Automating the Re-Build Process for Open-Source Software <https://arxiv.org/pdf/2509.08204>`_.
The Java support for this feature is described in more detail in our accepted ASE 2025 Industry ShowCase paper: `Unlocking Reproducibility: Automating the Re-Build Process for Open-Source Software <https://arxiv.org/pdf/2509.08204>`_.

***********************************
Frequently Asked Questions (FAQs)
***********************************

*Q: What formats are supported for buildspec output?*
A: Currently, only ``rc-buildspec`` is supported.
A: Currently, two formats are supported: the default JSON buildspec (``default-buildspec``) and the optional ``rc-buildspec``.

*Q: Do I need to analyze the package every time before generating a buildspec?*
A: No, you only need to analyze the package once unless you want to update the database with newer information.
Expand Down
25 changes: 14 additions & 11 deletions src/macaron/build_spec_generator/common_spec/base_spec.py
Original file line number Diff line number Diff line change
Expand Up @@ -25,8 +25,8 @@ class BaseBuildSpecDict(TypedDict, total=False):
#: The programming language, e.g., 'java', 'python', 'javascript'.
language: Required[str]

#: The build tool or package manager, e.g., 'maven', 'gradle', 'pip', 'poetry', 'npm', 'yarn'.
build_tool: Required[str]
#: The build tools or package managers, e.g., 'maven', 'gradle', 'pip', 'poetry', 'npm', 'yarn'.
build_tools: Required[list[str]]

#: The version of Macaron used for generating the spec.
macaron_version: Required[str]
Expand Down Expand Up @@ -73,10 +73,13 @@ class BaseBuildSpecDict(TypedDict, total=False):
#: Entry point script, class, or binary for running the project.
entry_point: NotRequired[str | None]

#: The packages that must be available in the build environment.
build_requires: NotRequired[dict[str, str]]

#: A "back end" is a tool that a "front end" (such as pip/build) would call to
#: package the source distribution into the wheel format. build_backends would
#: be a list of these that were used in building the wheel alongside their version.
build_backends: NotRequired[dict[str, str]]
build_backends: NotRequired[list[str]]


class BaseBuildSpec(ABC):
Expand All @@ -94,21 +97,21 @@ def resolve_fields(self, purl: PackageURL) -> None:
"""

@abstractmethod
def get_default_build_command(
def get_default_build_commands(
self,
build_tool_name: str,
) -> list[str]:
"""Return a default build command for the build tool.
build_tool_names: list[str],
) -> list[list[str]]:
"""Return the default build commands for the build tools.

Parameters
----------
build_tool_name: str
The build tool to get the default build command.
build_tool_names: list[str]
The build tools to get the default build commands for.

Returns
-------
list[str]
The build command as a list[str].
list[list[str]]
The build commands as a list[list[str]].

Raises
------
Expand Down
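To make the new ``get_default_build_commands`` signature concrete, here is a minimal standalone sketch of one way a subclass might implement it. The command table and the choice of ``ValueError`` are assumptions for illustration only: the real method is abstract on ``BaseBuildSpec``, and the diff does not show which commands or which exception the concrete specs use.

```python
# Hypothetical default commands per build tool; not taken from Macaron.
DEFAULT_BUILD_COMMANDS: dict[str, list[str]] = {
    "maven": ["mvn", "clean", "package"],
    "gradle": ["./gradlew", "clean", "assemble"],
    "poetry": ["poetry", "build"],
    "pip": ["python", "-m", "build"],
}


def get_default_build_commands(build_tool_names: list[str]) -> list[list[str]]:
    """Return one default command (a list of arguments) per build tool, in input order."""
    commands: list[list[str]] = []
    for name in build_tool_names:
        try:
            # Copy the list so callers cannot mutate the shared table.
            commands.append(list(DEFAULT_BUILD_COMMANDS[name]))
        except KeyError as error:
            # Assumption: unknown tools are reported with ValueError.
            raise ValueError(f"No default build command for build tool: {name}") from error
    return commands


print(get_default_build_commands(["maven", "poetry"]))
```

Returning ``list[list[str]]`` keeps each command's arguments separate, which matches the shape of ``build_commands`` in the generated buildspec.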
56 changes: 28 additions & 28 deletions src/macaron/build_spec_generator/common_spec/core.py
Original file line number Diff line number Diff line change
Expand Up @@ -57,6 +57,9 @@ class MacaronBuildToolName(str, Enum):
GRADLE = "gradle"
PIP = "pip"
POETRY = "poetry"
FLIT = "flit"
HATCH = "hatch"
CONDA = "conda"


def format_build_command_info(build_command_info: list[GenericBuildCommandInfo]) -> str:
Expand Down Expand Up @@ -117,18 +120,14 @@ def compose_shell_commands(cmds_sequence: list[list[str]]) -> str:
return result


def get_macaron_build_tool_name(
def get_macaron_build_tool_names(
build_tool_facts: Sequence[BuildToolFacts], target_language: str
) -> MacaronBuildToolName | None:
) -> list[MacaronBuildToolName] | None:
"""
Retrieve the Macaron build tool name for supported projects from the database facts.
Retrieve the Macaron build tool names for supported projects from the database facts.

Iterates over the provided build tool facts and returns the first valid `MacaronBuildToolName`
for a supported language. If no valid build tool name is found, returns None.

.. note::
If multiple build tools are present in the database, only the first valid one encountered
in the sequence is returned.
Iterates over the provided build tool facts and returns the list of valid `MacaronBuildToolName`
values for the target language. If no valid build tool name is found, returns None.

Parameters
----------
Expand All @@ -139,31 +138,27 @@ def get_macaron_build_tool_name(

Returns
-------
MacaronBuildToolName or None
The corresponding Macaron build tool name if found, otherwise None.
list[MacaronBuildToolName] | None
The corresponding Macaron build tool names, or None if no valid build tool name is found.
"""
build_tool_names = []
for fact in build_tool_facts:
if fact.language.lower() == target_language:
try:
macaron_build_tool_name = MacaronBuildToolName(fact.build_tool_name)
build_tool_names.append(MacaronBuildToolName(fact.build_tool_name))
except ValueError:
continue

# TODO: What happen if we report multiple build tools in the database?
return macaron_build_tool_name

return None
return build_tool_names or None


def get_build_tool_name(
def get_build_tool_names(
component_id: int, session: sqlalchemy.orm.Session, target_language: str
) -> MacaronBuildToolName | None:
"""
Retrieve the Macaron build tool name for a given component.
) -> list[MacaronBuildToolName] | None:
"""Retrieve the Macaron build tool names for a given component.

Queries the database for build tool facts associated with the specified component ID
and returns the corresponding `MacaronBuildToolName` if found. If no valid build tool
information is available or an error occurs during the query, returns None.
Queries the database for build tool facts associated with the specified component ID.
It returns the corresponding list of `MacaronBuildToolName` if found.

Parameters
----------
Expand All @@ -176,7 +171,7 @@ def get_build_tool_name(

Returns
-------
MacaronBuildToolName or None
list[MacaronBuildToolName] | None
The corresponding build tool names for the component if available, otherwise None.
"""
try:
Expand All @@ -203,7 +198,7 @@ def get_build_tool_name(
[(fact.build_tool_name, fact.language) for fact in build_tool_facts],
)

return get_macaron_build_tool_name(build_tool_facts, target_language)
return get_macaron_build_tool_names(build_tool_facts, target_language)


def get_build_command_info(
Expand Down Expand Up @@ -345,12 +340,17 @@ def gen_generic_build_spec(
latest_component_repository.commit_sha,
)

build_tool_name = get_build_tool_name(
build_tool_names = []
build_tools = get_build_tool_names(
component_id=latest_component.id, session=session, target_language=target_language
)
if not build_tool_name:
if not build_tools:
raise GenerateBuildSpecError(f"Failed to determine build tool for {purl}.")

# This check is for Pylint, which is not able to iterate over build_tools, even though it cannot be None.
if build_tools is not None:
build_tool_names = [build_tool.value for build_tool in build_tools]

build_command_info = get_build_command_info(
component_id=latest_component.id,
session=session,
Expand All @@ -377,7 +377,7 @@ def gen_generic_build_spec(
"ecosystem": purl.type,
"purl": str(purl),
"language": target_language,
"build_tool": build_tool_name.value,
"build_tools": build_tool_names,
"build_commands": [selected_build_command],
}
)
Expand Down
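The change from returning the first match to collecting every valid build tool can be illustrated in isolation. The sketch below mirrors the logic of ``get_macaron_build_tool_names`` with stand-in types: ``ToolName`` and ``Fact`` are simplified, hypothetical substitutes for ``MacaronBuildToolName`` and ``BuildToolFacts``, so it runs without a database.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class ToolName(str, Enum):
    """Stand-in for MacaronBuildToolName with a subset of its members."""

    MAVEN = "maven"
    GRADLE = "gradle"
    PIP = "pip"
    POETRY = "poetry"


@dataclass
class Fact:
    """Stand-in for a build tool fact row; only the two fields used here."""

    build_tool_name: str
    language: str


def collect_tool_names(facts: list[Fact], target_language: str) -> list[ToolName] | None:
    """Collect every supported build tool for the target language, or None if there is none."""
    names: list[ToolName] = []
    for fact in facts:
        if fact.language.lower() != target_language:
            continue
        try:
            names.append(ToolName(fact.build_tool_name))
        except ValueError:
            # The fact names a tool that the enum does not know; skip it.
            continue
    return names or None


facts = [
    Fact("pip", "Python"),
    Fact("poetry", "python"),
    Fact("unknown", "python"),
    Fact("maven", "java"),
]
python_tools = collect_tool_names(facts, "python")
print([tool.value for tool in python_tools or []])
```

Returning ``None`` for an empty result lets the caller keep a single ``if not build_tools`` check, at the cost of the extra ``is not None`` narrowing that the diff adds for Pylint.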