
Commit ff4fc6a

chore: add tutorial for python gen-build-spec and support more build tools
Signed-off-by: behnazh-w <behnaz.hassanshahi@oracle.com>
1 parent 76ec530 commit ff4fc6a

42 files changed (+1643 / -132 lines changed)

docs/source/pages/cli_usage/command_gen_build_spec.rst

Lines changed: 1 addition & 2 deletions
@@ -39,5 +39,4 @@ Options

 .. option:: --output-format OUTPUT_FORMAT

-The desired output format for the build specification. The default format is `rc-buildspec`, which is the Reproducible-Central build specification.
-Other formats may be available depending on your configuration.
+The output format for the build specification: ``default-buildspec`` (the default) or ``rc-buildspec`` (the Reproducible Central build specification).

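The two accepted `--output-format` values correspond to the two file names documented for `gen-build-spec` output: `macaron.buildspec` for the default format and `reproducible_central.buildspec` for `rc-buildspec`. A minimal Python sketch of that correspondence follows; `OUTPUT_FORMAT_FILES` and `buildspec_file_name` are hypothetical names, not Macaron's API.

```python
# Hypothetical mapping from --output-format values to generated file names,
# based on the documented gen-build-spec output files; not Macaron's code.
OUTPUT_FORMAT_FILES = {
    "default-buildspec": "macaron.buildspec",
    "rc-buildspec": "reproducible_central.buildspec",
}


def buildspec_file_name(output_format: str = "default-buildspec") -> str:
    """Return the buildspec file name for an output format, rejecting unknown formats."""
    try:
        return OUTPUT_FORMAT_FILES[output_format]
    except KeyError:
        valid = ", ".join(sorted(OUTPUT_FORMAT_FILES))
        raise ValueError(f"unknown output format {output_format!r}; expected one of: {valid}") from None


print(buildspec_file_name())                # macaron.buildspec
print(buildspec_file_name("rc-buildspec"))  # reproducible_central.buildspec
```

Failing fast on an unknown value, rather than silently falling back to the default, keeps a mistyped format from producing a file the user did not ask for.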
docs/source/pages/output_files.rst

Lines changed: 22 additions & 2 deletions
@@ -21,6 +21,7 @@ Top level structure

 output/
 ├── build_log/
+├── buildspec/
 ├── git_repos/
 ├── reports/
 ├── debug.log
@@ -43,8 +44,8 @@ The report files of Macaron (from using the :ref:`analyze command <analyze-comma
 Unique result path
 ''''''''''''''''''

-For each target software component, Macaron creates a directory under ``reports`` to store the report files. This directory
-path is formed from the PURL string of that component. The final path is created using the following template:
+For each target software component, Macaron creates a directory under ``reports`` to store the report. These directory
+paths are formed from the PURL string of that component. The final path is created using the following template:

 .. code-block::

@@ -131,6 +132,25 @@ to the directory:

 .. note:: Please see :ref:`pages/using:analyzing a repository on the local file system` to know how to set the directory for analyzing local repositories.

+.. _output_files_macaron_build_spec-Gen:
+
+--------------------------------------
+Output files of macaron gen-build-spec
+--------------------------------------
+
+The ``gen-build-spec`` command generates build specification files that help rebuild artifacts from source. For each target software component, Macaron creates a dedicated directory under ``buildspec`` to store the generated build specification file. These directory paths are derived from the component's PURL (Package URL) string. The resulting path structure follows this template:
+
+.. code-block::
+
+<path_to_output>/buildspec/<purl_type>/<purl_namespace>/<purl_name>
+
+Depending on the chosen output format, the following files may be generated in each directory:
+
+- ``macaron.buildspec`` (default format)
+- ``reproducible_central.buildspec`` (when run with the ``rc-buildspec`` output format for Maven artifacts)
+
+Each file contains the build specification for the corresponding software component.
+
+
 .. _output_files_macaron_verify_policy:

 -------------------------------------

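The buildspec path template can be sketched in Python. This is a minimal illustration, not Macaron's implementation: it uses a deliberately small PURL parser that only strips qualifiers, subpath, and version, and it assumes, based on the tutorial's example path `output/buildspec/maven/org_apache_hugegraph/computer-k8s/`, that dots in the namespace are replaced with underscores and that the namespace segment is simply omitted for PURLs without one, such as `pkg:pypi/...`. `parse_purl` and `buildspec_dir` are hypothetical helpers.

```python
from __future__ import annotations

from pathlib import PurePosixPath
from urllib.parse import unquote


def parse_purl(purl: str) -> tuple[str, str | None, str]:
    """Split a PURL into (type, namespace, name); a minimal parser, not a full PURL implementation."""
    if not purl.startswith("pkg:"):
        raise ValueError(f"not a PURL: {purl!r}")
    remainder = purl[len("pkg:"):]
    # Drop the subpath (#...) and qualifiers (?...), then the version (@...).
    remainder = remainder.split("#", 1)[0].split("?", 1)[0].rsplit("@", 1)[0]
    purl_type, *segments = remainder.strip("/").split("/")
    if not segments:
        raise ValueError(f"PURL has no name: {purl!r}")
    namespace = "/".join(unquote(s) for s in segments[:-1]) or None
    return purl_type.lower(), namespace, unquote(segments[-1])


def buildspec_dir(output_dir: str, purl: str) -> PurePosixPath:
    """Return <output_dir>/buildspec/<purl_type>/<purl_namespace>/<purl_name> for a PURL."""
    purl_type, namespace, name = parse_purl(purl)
    parts = [output_dir, "buildspec", purl_type]
    if namespace:
        # Assumption: dots become underscores, matching the tutorial's example path.
        parts.append(namespace.replace(".", "_"))
    parts.append(name)
    return PurePosixPath(*parts)


print(buildspec_dir("output", "pkg:maven/org.apache.hugegraph/computer-k8s@1.0.0"))
# output/buildspec/maven/org_apache_hugegraph/computer-k8s
```

The version is deliberately not part of the path, so regenerating a buildspec for a newer release of the same component would land in the same directory under these assumptions.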
docs/source/pages/supported_technologies/index.rst

Lines changed: 2 additions & 1 deletion
@@ -29,7 +29,8 @@ such as GitHub Actions workflows.
 Build Specification Generation
 ------------------------------

-* Maven and Gradle builds for Java artifacts
+* Maven and Gradle builds for Java packages
+* The PyPA ``build`` frontend (``python -m build``) and other build tools, such as Poetry, for Python packages

 .. _supported_git_services:

docs/source/pages/tutorials/rebuild_third_party_artifacts.rst

Lines changed: 46 additions & 6 deletions
@@ -16,6 +16,7 @@ These buildspecs help document and automate the build process for packages, enab

 * - Currently Supported packages
 * - Maven packages built with Gradle or Maven
+* - Python packages built with the PyPA ``build`` frontend (``python -m build``) or other build tools, such as Poetry

 .. contents:: :local:

@@ -31,9 +32,9 @@ Addressing this lack of transparency is critical for improving supply chain secu
 Background
 **********

-A build specification is a file that describes all necessary information to rebuild a package from source. This includes metadata such as the build tool, the specific build command to run, the language version, e.g., JDK for Java, and artifact coordinates. Macaron can now generate this file automatically for supported ecosystems, greatly simplifying build from source.
+A build specification is a file that describes all the information needed to rebuild a package from source. This includes metadata such as the build tool, the specific build command to run, the language version (e.g., the Python version, or the JDK version for Java), and artifact coordinates. Macaron can now generate this file automatically for supported ecosystems, greatly simplifying building from source.

-The generated buildspec will be stored in an ecosystem- and PURL-specific path under the ``output/`` directory (see more under :ref:`Output Files Guide <output_files_guide>`).
+The generated buildspec will be stored in an ecosystem- and PURL-specific path under the ``output/`` directory (see more under :ref:`Output Files Guide <output_files_macaron_build_spec-Gen>`).

 ******************************
 Installation and Prerequisites
@@ -101,7 +102,46 @@ In the example above, the buildspec is located at:
 Step 3: Review and Use the Buildspec File
 *****************************************

-The generated buildspec uses the `Reproducible Central buildspec <https://github.com/jvm-repo-rebuild/reproducible-central/blob/master/doc/BUILDSPEC.md>`_ format, for example:
+By default, Macaron generates the buildspec in JSON format, for example:
+
+.. code-block:: json
+
+{
+"macaron_version": "0.18.0",
+"group_id": "org.apache.hugegraph",
+"artifact_id": "computer-k8s",
+"version": "1.0.0",
+"git_repo": "https://github.com/apache/hugegraph-computer",
+"git_tag": "d2b95262091d6572cc12dcda57d89f9cd44ac88b",
+"newline": "lf",
+"language_version": [
+"11"
+],
+"ecosystem": "maven",
+"purl": "pkg:maven/org.apache.hugegraph/computer-k8s@1.0.0",
+"language": "java",
+"build_tool": "maven",
+"build_commands": [
+[
+"mvn",
+"-DskipTests=true",
+"-Dmaven.test.skip=true",
+"-Dmaven.site.skip=true",
+"-Drat.skip=true",
+"-Dmaven.javadoc.skip=true",
+"clean",
+"package"
+]
+]
+}
+
+If you use the ``rc-buildspec`` output format, the generated buildspec follows the `Reproducible Central buildspec <https://github.com/jvm-repo-rebuild/reproducible-central/blob/master/doc/BUILDSPEC.md>`_ format. For example, you can generate it with:
+
+.. code-block:: shell
+
+./run_macaron.sh gen-build-spec -purl pkg:maven/org.apache.hugegraph/computer-k8s@1.0.0 --database output/macaron.db --output-format rc-buildspec
+
+The resulting file will be saved as ``output/buildspec/maven/org_apache_hugegraph/computer-k8s/reproducible_central.buildspec``, and will look like this:

 .. code-block:: ini

@@ -136,18 +176,18 @@ The ``gen-build-spec`` works as follows:

 - Extracts metadata and build information from Macaron’s local SQLite database.
 - Parses and modifies build commands from CI/CD configurations to ensure compatibility with rebuild systems.
-- Identifies the JDK version by parsing CI/CD configurations or extracting it from the ``META-INF/MANIFEST.MF`` file in Maven Central artifacts.
+- Identifies the language version (e.g., the JDK version) by parsing CI/CD configurations or, for Java, extracting it from the ``META-INF/MANIFEST.MF`` file in Maven Central artifacts.
 - Ensures that only the major JDK version is included, as required by the build specification format.


-This feature is described in more detail in our accepted ASE 2025 Industry ShowCase paper: `Unlocking Reproducibility: Automating the Re-Build Process for Open-Source Software <https://arxiv.org/pdf/2509.08204>`_.
+The Java support for this feature is described in more detail in our accepted ASE 2025 Industry Showcase paper: `Unlocking Reproducibility: Automating the Re-Build Process for Open-Source Software <https://arxiv.org/pdf/2509.08204>`_.

 ***********************************
 Frequently Asked Questions (FAQs)
 ***********************************

 *Q: What formats are supported for buildspec output?*
-A: Currently, only ``rc-buildspec`` is supported.
+A: Currently, two formats are supported: the default JSON buildspec (``default-buildspec``) and the Reproducible Central buildspec (``rc-buildspec``).

 *Q: Do I need to analyze the package every time before generating a buildspec?*
 A: No, you only need to analyze the package once unless you want to update the database with newer information.
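
Because the default buildspec is plain JSON, downstream tooling can consume it with the Python standard library. The sketch below loads a copy of the tutorial's example buildspec and turns each entry of `build_commands` (a list of argument lists) into a shell-quoted command line with `shlex.join`. It only prints what a rebuild would run; cloning `git_repo`, checking out `git_tag`, and selecting the `language_version` toolchain are left out, and `build_shell_commands` is a hypothetical helper rather than part of Macaron.

```python
from __future__ import annotations

import json
import shlex

# Fields copied from the default JSON buildspec example in the tutorial.
SPEC_TEXT = """
{
  "purl": "pkg:maven/org.apache.hugegraph/computer-k8s@1.0.0",
  "git_repo": "https://github.com/apache/hugegraph-computer",
  "git_tag": "d2b95262091d6572cc12dcda57d89f9cd44ac88b",
  "language_version": ["11"],
  "build_tool": "maven",
  "build_commands": [
    ["mvn", "-DskipTests=true", "-Dmaven.test.skip=true", "-Dmaven.site.skip=true",
     "-Drat.skip=true", "-Dmaven.javadoc.skip=true", "clean", "package"]
  ]
}
"""


def build_shell_commands(spec: dict) -> list[str]:
    """Turn each argument list in build_commands into one shell-quoted command line."""
    return [shlex.join(argv) for argv in spec.get("build_commands", [])]


spec = json.loads(SPEC_TEXT)
print(f"{spec['purl']}: {spec['build_tool']}, language version {spec['language_version'][0]}")
print(f"source: {spec['git_repo']} at {spec['git_tag']}")
for command in build_shell_commands(spec):
    print(command)
```

One advantage of storing `build_commands` as argument lists instead of single strings is that a consumer never has to re-parse a shell string: `shlex.join` quotes each argument only when it needs quoting.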

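The version-handling steps of `gen-build-spec`, reading a JDK version from `META-INF/MANIFEST.MF` and keeping only the major version, can be sketched as follows. This is an illustration under assumptions, not Macaron's code: it assumes the version comes from the `Build-Jdk-Spec` or `Build-Jdk` manifest attributes that Maven's archiver commonly writes (which attributes Macaron actually reads is not stated in this commit), and it treats legacy `1.x` versions as major version `x`. `parse_manifest`, `major_jdk_version`, and `jdk_major_from_manifest` are hypothetical names.

```python
from __future__ import annotations

import re


def parse_manifest(text: str) -> dict[str, str]:
    """Parse the main section of a MANIFEST.MF, joining continuation lines (leading space)."""
    attributes: dict[str, str] = {}
    last_key: str | None = None
    for line in text.splitlines():
        if line.startswith(" ") and last_key is not None:
            attributes[last_key] += line[1:]
        elif not line.strip():
            break  # a blank line ends the main section
        elif ":" in line:
            key, value = line.split(":", 1)
            last_key = key.strip()
            attributes[last_key] = value.strip()
    return attributes


def major_jdk_version(version: str) -> str | None:
    """Reduce a JDK version string to its major version, e.g. '1.8.0_292' -> '8', '11.0.2' -> '11'."""
    match = re.match(r"\s*(\d+)(?:\.(\d+))?", version)
    if match is None:
        return None
    major, minor = match.group(1), match.group(2)
    # Versions up to Java 8 were reported as "1.x", where x is the major version.
    if major == "1" and minor is not None:
        return minor
    return major


def jdk_major_from_manifest(text: str) -> str | None:
    """Return the major JDK version recorded in a manifest, if any (attribute names are assumptions)."""
    attributes = parse_manifest(text)
    for key in ("Build-Jdk-Spec", "Build-Jdk"):
        if key in attributes:
            return major_jdk_version(attributes[key])
    return None


MANIFEST = "Manifest-Version: 1.0\nCreated-By: Maven JAR Plugin 3.3.0\nBuild-Jdk-Spec: 11\n"
print(jdk_major_from_manifest(MANIFEST))  # 11
print(major_jdk_version("1.8.0_292"))     # 8
```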
src/macaron/build_spec_generator/common_spec/core.py

Lines changed: 3 additions & 0 deletions
@@ -57,6 +57,9 @@ class MacaronBuildToolName(str, Enum):
 GRADLE = "gradle"
 PIP = "pip"
 POETRY = "poetry"
+FLIT = "flit"
+HATCH = "hatch"
+CONDA = "conda"


 def format_build_command_info(build_command_info: list[GenericBuildCommandInfo]) -> str:

src/macaron/build_spec_generator/common_spec/pypi_spec.py

Lines changed: 7 additions & 2 deletions
@@ -67,6 +67,12 @@ def get_default_build_command(
 default_build_command = "python -m build".split()
 case "poetry":
 default_build_command = "poetry build".split()
+case "flit":
+default_build_command = "flit build".split()
+case "hatch":
+default_build_command = "hatch build".split()
+case "conda":
+default_build_command = "conda build".split()
 case _:
 pass

@@ -95,8 +101,7 @@ def resolve_fields(self, purl: PackageURL) -> None:
 registry.load_defaults()

 registry_info = PackageRegistryInfo(
-build_tool_name="pip",
-build_tool_purl_type="pypi",
+ecosystem="pypi",
 package_registry=registry,
 metadata=[],
 )

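The `match` statement in `get_default_build_command` now covers Flit, Hatch, and Conda alongside Poetry. The same mapping can also be written as a lookup table keyed by a `str`-valued enum, which keeps the tool list and the commands in one place. This sketch uses a stand-in `BuildTool` enum mirroring the Python-related members of `MacaronBuildToolName`, and it assumes that the `python -m build` branch, whose `case` label lies outside the hunk, belongs to `pip`. `BuildTool`, `DEFAULT_BUILD_COMMANDS`, and `default_build_command` are hypothetical names.

```python
from __future__ import annotations

from enum import Enum


class BuildTool(str, Enum):
    """Hypothetical stand-in for the Python members of MacaronBuildToolName."""

    PIP = "pip"
    POETRY = "poetry"
    FLIT = "flit"
    HATCH = "hatch"
    CONDA = "conda"


# Assumption: "pip" maps to `python -m build`, as the context line before `case "poetry":` suggests.
DEFAULT_BUILD_COMMANDS: dict[BuildTool, list[str]] = {
    BuildTool.PIP: "python -m build".split(),
    BuildTool.POETRY: "poetry build".split(),
    BuildTool.FLIT: "flit build".split(),
    BuildTool.HATCH: "hatch build".split(),
    BuildTool.CONDA: "conda build".split(),
}


def default_build_command(tool_name: str) -> list[str] | None:
    """Return a copy of the default build command for a tool name, or None if it is unknown."""
    try:
        tool = BuildTool(tool_name)
    except ValueError:
        return None
    # Copy so callers can append flags without mutating the shared table.
    return list(DEFAULT_BUILD_COMMANDS[tool])


print(default_build_command("hatch"))       # ['hatch', 'build']
print(default_build_command("setuptools"))  # None
```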
src/macaron/config/defaults.ini

Lines changed: 83 additions & 0 deletions
@@ -293,11 +293,13 @@ packager =
 pip3
 flit
 conda
+hatch
 publisher =
 twine
 flit
 conda
 tox
+hatch
 # These are the Python interpreters that may be used to load modules.
 interpreter =
 python
@@ -322,6 +324,11 @@ package_lock = poetry.lock
 builder =
 poetry
 poetry-core
+# build-system information.
+build_requires =
+poetry-core
+build_backend =
+poetry.core.masonry.api
 # These are the Python interpreters that may be used to load modules.
 interpreter =
 python
@@ -336,6 +343,82 @@ deploy_arg =
 [builder.poetry.ci.deploy]
 github_actions = pypa/gh-action-pypi-publish

+# This is the spec for the Flit packaging tool.
+[builder.flit]
+entry_conf =
+build_configs =
+pyproject.toml
+flit.ini
+builder =
+flit
+# build-system information.
+build_requires =
+flit_core
+build_backend =
+flit_core.buildapi
+# These are the Python interpreters that may be used to load modules.
+interpreter =
+python
+python3
+interpreter_flag =
+-m
+build_arg =
+build
+deploy_arg =
+publish
+
+[builder.flit.ci.deploy]
+github_actions = pypa/gh-action-pypi-publish
+
+# This is the spec for the Hatch packaging tool.
+[builder.hatch]
+entry_conf =
+build_configs =
+pyproject.toml
+hatch.toml
+builder =
+hatch
+# build-system information.
+build_requires =
+hatchling
+build_backend =
+hatchling.build
+# These are the Python interpreters that may be used to load modules.
+interpreter =
+python
+python3
+interpreter_flag =
+-m
+build_arg =
+build
+deploy_arg =
+publish
+
+[builder.hatch.ci.deploy]
+github_actions = pypa/gh-action-pypi-publish
+
+# This is the spec for the Conda packaging tool.
+[builder.conda]
+entry_conf =
+build_configs =
+environment.yml
+meta.yaml
+builder =
+conda
+# These are the Python interpreters that may be used to load modules.
+interpreter =
+python
+python3
+interpreter_flag =
+-m
+build_arg =
+build
+deploy_arg =
+publish
+
+[builder.conda.ci.deploy]
+github_actions = pypa/gh-action-pypi-publish
+
 # This is the spec for trusted Docker build tool usages.
 [builder.docker]
 entry_conf =

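The new `build_requires` and `build_backend` keys mirror the `[build-system]` table of a `pyproject.toml`. As a sketch of how they could be used, the snippet below reads builder sections with the standard-library `configparser` and builds a map from `build-backend` value to tool name. It is not Macaron's code: the excerpt indents multi-line values because `configparser` only treats indented lines as continuations, and `backend_to_builder` is a hypothetical helper.

```python
from __future__ import annotations

import configparser

# Excerpt of the builder sections in defaults.ini; continuation lines must be indented.
DEFAULTS_EXCERPT = """
[builder.poetry]
build_requires =
    poetry-core
build_backend =
    poetry.core.masonry.api

[builder.flit]
build_requires =
    flit_core
build_backend =
    flit_core.buildapi

[builder.flit.ci.deploy]
github_actions = pypa/gh-action-pypi-publish

[builder.hatch]
build_requires =
    hatchling
build_backend =
    hatchling.build

[builder.conda]
builder =
    conda
"""


def backend_to_builder(config: configparser.ConfigParser) -> dict[str, str]:
    """Map each build_backend value to the tool name of its [builder.<tool>] section."""
    mapping: dict[str, str] = {}
    for section in config.sections():
        # Only top-level builder sections; skips e.g. [builder.flit.ci.deploy].
        if not section.startswith("builder.") or section.count(".") != 1:
            continue
        tool = section.split(".", 1)[1]
        for backend in config.get(section, "build_backend", fallback="").split():
            mapping[backend] = tool
    return mapping


config = configparser.ConfigParser()
config.read_string(DEFAULTS_EXCERPT)
backends = backend_to_builder(config)
print(backends["hatchling.build"])            # hatch
print(backends.get("setuptools.build_meta"))  # None
```

Conda has no `build_backend` entry in the configuration, so it never appears in the resulting map.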
src/macaron/provenance/provenance_finder.py

Lines changed: 1 addition & 5 deletions
@@ -554,11 +554,7 @@ def get_artifact_hash(
 return None

 registry_info = next(
-(
-info
-for info in package_registries_info
-if info.package_registry == pypi_registry and info.build_tool_name in {"pip", "poetry"}
-),
+(info for info in package_registries_info if info.package_registry == pypi_registry),
 None,
 )
 if not registry_info:

src/macaron/repo_finder/repo_finder_pypi.py

Lines changed: 1 addition & 5 deletions
@@ -37,11 +37,7 @@ def find_repo(
 if package_registries_info:
 # Find the package registry info object that contains the PyPI registry and has the pypi build tool.
 pypi_info = next(
-(
-info
-for info in package_registries_info
-if isinstance(info.package_registry, PyPIRegistry) and info.build_tool_name in {"poetry", "pip"}
-),
+(info for info in package_registries_info if isinstance(info.package_registry, PyPIRegistry)),
 None,
 )
 if not pypi_info:

src/macaron/slsa_analyzer/analyzer.py

Lines changed: 6 additions & 14 deletions
@@ -379,7 +379,7 @@ def run_single(
 )

 # Pre-populate all package registries so assets can be stored for later.
-package_registries_info = self._populate_package_registry_info()
+package_registries_info = self._populate_package_registry_info(parsed_purl) if parsed_purl else []

 provenance_is_verified = False
 provenance_asset = None
@@ -1127,18 +1127,14 @@ def _determine_ci_services(self, analyze_ctx: AnalyzeContext, git_service: BaseG
 "[red]Not Found[/]",
 )

-def _populate_package_registry_info(self) -> list[PackageRegistryInfo]:
+def _populate_package_registry_info(self, parsed_purl: PackageURL) -> list[PackageRegistryInfo]:
 """Add all possible package registries to the analysis context."""
 package_registries = []
 for package_registry in PACKAGE_REGISTRIES:
-for build_tool in BUILD_TOOLS:
-build_tool_name = build_tool.name
-if build_tool_name not in package_registry.build_tool_names:
-continue
+if package_registry.ecosystem == parsed_purl.type:
 package_registries.append(
 PackageRegistryInfo(
-build_tool_name=build_tool_name,
-build_tool_purl_type=build_tool.purl_type,
+ecosystem=parsed_purl.type,
 package_registry=package_registry,
 )
 )
@@ -1149,14 +1145,10 @@ def _determine_package_registries(
 analyze_ctx: AnalyzeContext,
 package_registries_info: list[PackageRegistryInfo],
 ) -> None:
-"""Determine the package registries used by the software component based on its build tools."""
-build_tools = (
-analyze_ctx.dynamic_data["build_spec"]["tools"] or analyze_ctx.dynamic_data["build_spec"]["purl_tools"]
-)
-build_tool_names = {build_tool.name for build_tool in build_tools}
+"""Determine the package registries used by the software component."""
 relevant_package_registries = []
 for package_registry in package_registries_info:
-if package_registry.build_tool_name not in build_tool_names:
+if package_registry.ecosystem != analyze_ctx.component.type:
 continue
 relevant_package_registries.append(package_registry)

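With this change, package registries are matched to a component by ecosystem (the PURL type) instead of by detected build tool, and the provenance and repo finders pick the PyPI entry with `next(..., None)`. A minimal sketch of that flow, using hypothetical stand-in classes: `Registry`, `RegistryInfo`, `REGISTRIES`, `populate_registry_info`, and the registry names are assumptions, not Macaron's types.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Registry:
    """Hypothetical stand-in for a package registry with an ecosystem attribute."""

    name: str
    ecosystem: str


@dataclass
class RegistryInfo:
    """Hypothetical stand-in for PackageRegistryInfo, now keyed by ecosystem."""

    ecosystem: str
    package_registry: Registry
    metadata: list = field(default_factory=list)


REGISTRIES = [Registry("maven_central", "maven"), Registry("pypi", "pypi"), Registry("npm", "npm")]


def populate_registry_info(purl_type: str | None) -> list[RegistryInfo]:
    """Keep only registries whose ecosystem matches the PURL type; no PURL means no registries."""
    if not purl_type:
        return []
    return [RegistryInfo(purl_type, r) for r in REGISTRIES if r.ecosystem == purl_type]


infos = populate_registry_info("pypi")
# First match or None, like the next(..., None) lookups in the finders.
pypi_info = next((i for i in infos if i.package_registry.name == "pypi"), None)
print([i.package_registry.name for i in infos])  # ['pypi']
print(pypi_info is not None)                     # True
print(populate_registry_info(None))              # []
```

Because the filter depends only on the PURL type, a component analyzed without a PURL gets no registries, matching the `if parsed_purl else []` guard in `run_single`.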