diff --git a/docs/detectors/README.md b/docs/detectors/README.md index f63d8569a..3309e9eed 100644 --- a/docs/detectors/README.md +++ b/docs/detectors/README.md @@ -1,6 +1,6 @@ # Detectors -- CocoaPods +- [CocoaPods](cocoapods.md) | Detector | Status | | -------------------- | ------ | @@ -18,10 +18,10 @@ | -------------------------- | ---------- | | CondaLockComponentDetector | DefaultOff | -- DockerFile +- [Dockerfile](dockerfile.md) -| Detector | Status | -| -------------------------- | ---------- | +| Detector | Status | +| --------------------------- | ---------- | | DockerfileComponentDetector | DefaultOff | - [DotNet](dotnet.md) @@ -42,7 +42,7 @@ | ----------------------- | ------ | | GradleComponentDetector | Stable | -- Ivy +- [Ivy](ivy.md) | Detector | Status | | ----------- | ------------ | @@ -84,7 +84,7 @@ | PipComponentDetector | DefaultOff | | SimplePipComponentDetector | DefaultOff | -- Pnpm +- [Pnpm](pnpm.md) | Detector | Status | | ---------------------------- | ------ | @@ -96,7 +96,7 @@ | ----------------------- | ------------ | | PoetryComponentDetector | Experimental | -- Ruby +- [Ruby](ruby.md) | Detector | Status | | --------------------- | ------ | @@ -108,7 +108,7 @@ | ---------------- | ------ | | RustSbomDetector | Stable | -- Spdx +- [Spdx](spdx.md) | Detector | Status | | ----------------------- | ---------- | @@ -126,13 +126,13 @@ | ----------------------- | ------------ | | UvLockComponentDetector | Experimental | -- Vcpkg +- [Vcpkg](vcpkg.md) | Detector | Status | | ---------------------- | ------ | | VcpkgComponentDetector | Stable | -- Yarn +- [Yarn](yarn.md) | Detector | Status | | ------------------------ | ------ | diff --git a/docs/detectors/cocoapods.md b/docs/detectors/cocoapods.md new file mode 100644 index 000000000..213dff52a --- /dev/null +++ b/docs/detectors/cocoapods.md @@ -0,0 +1,23 @@ +# CocoaPods Detection + +## Requirements + +CocoaPods detection relies on a `Podfile.lock` file being present. 
This file is generated by CocoaPods when dependencies are installed. + +## Detection strategy + +CocoaPods detection is performed by parsing every `Podfile.lock` found under the scan directory. The detector: + +- Parses the YAML-formatted `Podfile.lock` file to extract pod dependencies +- Identifies root dependencies from the `DEPENDENCIES` section +- Constructs a dependency graph by traversing pod relationships +- Supports both standard CocoaPods packages and Git-based dependencies +- Normalizes Git repository URIs (e.g., converting `git@` references to `https://`) +- Maps pods to their spec repositories (TRUNK or custom repositories) +- Handles subspecs (e.g., `AFNetworking/Reachability`) by mapping them to their parent podspec + +## Known limitations + +CocoaPods detection will not work if lock files are not used or have not yet been generated. Ensure that `pod install` or `pod update` has been run to generate the `Podfile.lock` file(s) before running the scan. + +The detector constructs a full dependency graph based on the relationships present in the `Podfile.lock` file, including transitive dependencies. However, dependency relationships are limited to what CocoaPods records in the lock file at the time of pod installation. diff --git a/docs/detectors/dockerfile.md b/docs/detectors/dockerfile.md new file mode 100644 index 000000000..8143c031a --- /dev/null +++ b/docs/detectors/dockerfile.md @@ -0,0 +1,32 @@ +# Dockerfile Detection + +## Requirements + +Dockerfile detection depends on the following to successfully run: + +- One or more Dockerfiles matching the patterns: `dockerfile`, `dockerfile.*`, or `*.dockerfile` + +The `DockerfileComponentDetector` is a **DefaultOff** detector and must be explicitly enabled via the `--DetectorArgs` parameter. + +## Detection strategy + +The Dockerfile detector extracts Docker image references from `FROM` and `COPY --from` instructions. 
It uses the [Valleysoft.DockerfileModel](https://github.com/mthalman/DockerfileModel) library to parse Dockerfile syntax. + +### FROM Instruction Detection +The detector extracts base image references from `FROM` instructions and resolves multi-stage build references: +- Direct image references (e.g., `FROM ubuntu:22.04`) +- Multi-stage builds with stage names (e.g., `FROM node:18 AS builder`) +- Stage-to-stage references are tracked to avoid reporting internal build stages as external dependencies + +### COPY --from Instruction Detection +The detector extracts image references from `COPY --from=` instructions that reference external images rather than build stages. + +### Variable Resolution +The detector attempts to resolve Dockerfile variables using the `ResolveVariables()` method from the parser library. Images with unresolved variables (containing `$`, `{`, or `}` characters) are skipped to avoid reporting incomplete or incorrect references. + +## Known limitations + +- **DefaultOff Status**: This detector must be explicitly enabled using `--DetectorArgs DockerReference=EnableIfDefaultOff` +- **Variable Resolution**: Image references containing unresolved Dockerfile `ARG` or `ENV` variables are not reported, which may lead to under-reporting in Dockerfiles that heavily use build-time variables +- **No Version Pinning Validation**: The detector does not warn about unpinned image versions (e.g., `latest` tags), which are generally discouraged in production Dockerfiles +- **Digest Support Not Guaranteed**: While Docker supports content-addressable image references using SHA256 digests (e.g., `ubuntu@sha256:abc...`), the parsing and reporting of these references depend on the underlying `DockerReferenceUtility.ParseFamiliarName()` implementation diff --git a/docs/detectors/ivy.md b/docs/detectors/ivy.md new file mode 100644 index 000000000..14ef8e4b0 --- /dev/null +++ b/docs/detectors/ivy.md @@ -0,0 +1,35 @@ +# Ivy Detection + +## Requirements + +Ivy detection depends on the 
following to successfully run: + +- Apache Ant CLI as part of your PATH (`ant` or `ant.bat` should be runnable from a given command line). +- Java Development Kit (JDK) installed and configured for Ant. +- One or more `ivy.xml` files. +- Optional `ivysettings.xml` files in the same directory as `ivy.xml` for repository configuration. + +## Detection strategy + +Ivy detection is performed by running Apache Ant to resolve dependencies for each `ivy.xml` file found. The detector: + +1. Copies `ivy.xml` (and `ivysettings.xml` if present) to a temporary directory. +2. Creates a synthetic Ant build file with a custom task that invokes Ivy's dependency resolver. +3. Executes `ant resolve-dependencies` to resolve both direct and transitive dependencies. +4. Parses the JSON output produced by the custom Ant task to register components. + +Components are identified using Maven's GAV (group, artifact, version) coordinate system, which corresponds to Ivy's (org, name, rev) coordinates. Dependencies with the same organization as the project are treated as first-party dependencies and ignored. + +Components tagged as development dependencies are marked appropriately. + +Full dependency graph generation is supported. + +## Known limitations + +Ivy detection will not run if `ant` is unavailable in the PATH. + +The `ivy.xml` and `ivysettings.xml` files must be self-contained. Detection will fail if these files: +- Rely on properties defined in the project's `build.xml` +- Use file inclusion mechanisms (e.g., `` tags) + +Dependencies that cannot be resolved by Ivy will be logged as package parse failures and not included in the detection results. diff --git a/docs/detectors/pnpm.md b/docs/detectors/pnpm.md index 004c86fea..6c8c4780f 100644 --- a/docs/detectors/pnpm.md +++ b/docs/detectors/pnpm.md @@ -1,23 +1,65 @@ -# Pnpm detection +# Pnpm Detection + +## Requirements + +Pnpm detection relies on the presence of lockfiles generated by the pnpm package manager. 
The detector searches for the following files: + +- `pnpm-lock.yaml` +- `shrinkwrap.yaml` (legacy format) + +The detector supports lockfile versions 5, 6, and 9, with version 9 being the maximum supported version. + +## Detection strategy + +Pnpm detection is performed by parsing lockfiles found in the scan directory. The `PnpmComponentDetectorFactory` acts as a version-aware factory that: + +1. **Detects the lockfile version** by parsing the `lockfileVersion` field in the YAML file +2. **Delegates to the appropriate version-specific detector**: + - `Pnpm5Detector` for lockfile version 5.x (also handles legacy `shrinkwrapVersion` files) + - `Pnpm6Detector` for lockfile version 6.x + - `Pnpm9Detector` for lockfile version 9.x + +Each version-specific detector handles the format differences in pnpm lockfiles: + +- **Version 5**: Basic package graph with dependencies listed in the `packages` section +- **Version 6**: Introduced workspace support with `importers` section and improved dependency tracking +- **Version 9**: Changed the structure to use `snapshots` instead of `packages` and removed dev dependency metadata from the lockfile + +### Dependency Graph Construction + +The detectors build a complete dependency graph by: + +1. Registering all packages found in the lockfile as components +2. Creating parent-child relationships based on dependency declarations +3. Marking direct dependencies as explicit references +4. Identifying development dependencies based on: + - Lockfile metadata (versions 5-6) + - Dependency tree position (version 9, where lockfile no longer includes dev dependency flags) + +### Workspace Support + +Pnpm supports both single-package projects ("dedicated shrinkwrap") and multi-package workspaces ("shared shrinkwrap"). 
The detectors handle both scenarios: + +- Single-package: Dependencies are read directly from the root level +- Workspaces: Dependencies are read from each importer in the `importers` section ## Known limitations -The Pnpm detector doesn't support the resolution of local dependencies -like: - -- Link dependencies -``` -dependencies: - '@learningclient/common': link:../common -``` - -- File dependencies -``` -dependencies: - file:./projects/gmc-bootstrapper.tgz -``` -These kind of components are ignored by the Pnpm detector. - -In the case of `link` dependencies that refer to a folder with a `package.json` file -the component is then going to be detected by the `NpmComponentDetector`. This is going to happen -only if the folder is inside the path that is been use for scanning. +1. **Local dependencies are skipped**: Packages referenced with `file:` or `link:` prefixes are not included in detection as they represent local packages rather than external dependencies. In the case of `link` dependencies that refer to a folder with a `package.json` file, the component may be detected by the `NpmComponentDetector` if the folder is inside the scan path. + + Example of ignored dependencies: + ```yaml + dependencies: + '@learningclient/common': link:../common + file:./projects/gmc-bootstrapper.tgz + ``` + +2. **HTTP/HTTPS dependencies**: In version 9, dependencies referenced via `http:` or `https:` protocols are also skipped as they are treated similarly to local dependencies + +3. **Lockfile version support**: Only versions 5, 6, and 9 are supported. If an unsupported version is detected, the file will be skipped and a warning will be logged + +4. **Version 9 dev dependency detection**: Lockfile version 9 removed the metadata that explicitly marks packages as dev dependencies. The detector relies on the dependency tree structure to determine dev dependency status, which may be less accurate in complex scenarios + +5. 
**Pnpm dependency path complexity**: Pnpm uses specialized dependency paths that include peer dependency information and other metadata. While the detector handles standard cases, highly complex dependency scenarios with multiple peer dependencies may not be perfectly represented + +6. **Automatic root dependency calculation required**: The detector sets `NeedsAutomaticRootDependencyCalculation = true`, indicating that the orchestrator must perform additional analysis to determine root-level dependencies diff --git a/docs/detectors/ruby.md b/docs/detectors/ruby.md new file mode 100644 index 000000000..01bc99ed3 --- /dev/null +++ b/docs/detectors/ruby.md @@ -0,0 +1,59 @@ +# Ruby + +## Requirements + +The Ruby detector scans for Ruby dependencies defined in Bundler lockfiles. + +**File Patterns:** `Gemfile.lock` + +**Supported Ecosystems:** RubyGems + +## Detection Strategy + +The detector parses `Gemfile.lock` files to identify Ruby gems and their dependencies. It processes the lockfile in multiple passes: + +### Parsing Approach + +1. **Section-based parsing**: The detector reads the lockfile by sections, which are identified by all-caps headings (`GEM`, `GIT`, `PATH`, `BUNDLED WITH`, etc.) + +2. **Component registration**: For each section, the detector extracts: + - **GEM section**: Standard RubyGems components with name, version, and remote source + - **GIT section**: Git-based dependencies with remote URL and revision + - **PATH section**: Local path dependencies + - **BUNDLED WITH section**: The Bundler version used to generate the lockfile + +3. 
**Dependency graph construction**: After collecting all components, the detector creates parent-child relationships by: + - Identifying top-level dependencies (4-space indentation) + - Mapping sub-dependencies (6-space indentation) to their parent components + - Using automatic root dependency calculation to determine direct vs transitive dependencies + +### Component Types + +- **RubyGemsComponent**: Standard gems from RubyGems.org or custom sources + - Properties: name, version, source +- **GitComponent**: Git-based dependencies + - Properties: remote URL, revision + +## Known Limitations + +### Version Resolution Constraints + +- **Relative versions are excluded**: Components with relative version specifiers (starting with `~` or `=`) are skipped and logged as parse failures. Only absolute versions are registered. +- **Fuzzy version handling**: Different sections of the lockfile can reference the same component, but authoritative version information is only stored in specific sections (e.g., the GEM section), requiring cross-section resolution. + +### Git Component Naming + +- Git components use a Ruby-specific "name" annotation that doesn't map directly to standard GitComponent semantics (remote/version). The detector works around this by maintaining a name-to-component mapping during parsing. + +### Root Dependency Detection + +- The detector uses **automatic root dependency calculation** rather than parsing the `DEPENDENCIES` section of `Gemfile.lock` (which lists user-specified dependencies from the `Gemfile`). +- This approach may not perfectly distinguish between direct and transitive dependencies in all cases. + +### Bundler Source Information + +- The `bundler` version is always registered with `"unknown"` as its source, since the lockfile doesn't specify where Bundler originated. 
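The section-based parsing, the relative-version rule, and the `"unknown"` Bundler source described above can be illustrated with a short Python sketch. It is not the detector's implementation: the function name `parse_gemfile_lock`, the returned structures, and the restriction to the `GEM` and `BUNDLED WITH` sections are assumptions made to keep the example small.

```python
import re

# All-caps section heading at column 0, e.g. "GEM" or "BUNDLED WITH".
SECTION_RE = re.compile(r"^[A-Z][A-Z ]*$")
# "name (version)" or a bare "name".
SPEC_RE = re.compile(r"^(\S+)(?: \((.+)\))?$")


def parse_gemfile_lock(text):
    """Hypothetical helper: returns (components, edges, skipped)."""
    components = {}  # name -> {"version": ..., "source": ...}
    edges = []       # (parent name, child name) pairs
    skipped = []     # names dropped because the version is missing or relative
    section = remote = parent = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if SECTION_RE.match(line):
            section, remote, parent = line, None, None
            continue
        indent = len(line) - len(line.lstrip(" "))
        body = line.strip()
        if section == "BUNDLED WITH":
            # The lockfile does not say where Bundler came from.
            components["bundler"] = {"version": body, "source": "unknown"}
        elif section == "GEM":
            if body.startswith("remote:"):
                remote = body.split(":", 1)[1].strip()
                continue
            match = SPEC_RE.match(body)
            if match is None:
                continue
            name, version = match.groups()
            if indent == 4:
                # Top-level entry: keep it only if the version is absolute.
                if version is None or version.startswith(("~", "=")):
                    skipped.append(name)
                    parent = None  # children of an excluded parent are dropped
                else:
                    components[name] = {"version": version, "source": remote}
                    parent = name
            elif indent == 6 and parent is not None:
                # Sub-dependency: record only the parent-child relationship.
                edges.append((parent, name))
    return components, edges, skipped
```

In this sketch a 6-space entry contributes only an edge; its version is taken from the matching 4-space entry, which mirrors the cross-section resolution mentioned under "Fuzzy version handling".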
+ +### Excluded Dependencies + +- When a parent component has a relative version and is excluded, all of its child dependencies are also excluded from the dependency graph to maintain consistency. diff --git a/docs/detectors/spdx.md b/docs/detectors/spdx.md new file mode 100644 index 000000000..733e18281 --- /dev/null +++ b/docs/detectors/spdx.md @@ -0,0 +1,33 @@ +# SPDX Detection + +## Requirements + +SPDX detection depends on the following to successfully run: + +- One or more `*.spdx.json` files in the scan directory + +## Detection strategy + +The SPDX detector (`Spdx22ComponentDetector`) discovers SPDX SBOM (Software Bill of Materials) files in JSON format and creates components representing the SPDX documents themselves. + +The detector: +- Searches for files matching the pattern `*.spdx.json` +- Validates that the SPDX version is `SPDX-2.2` (currently the only supported version) +- Computes a SHA-1 hash of the SPDX file for identification +- Extracts metadata including: + - Document namespace + - Document name + - SPDX version + - Root element ID from `documentDescribes` (defaults to `SPDXRef-Document` if not specified) +- Creates an `SpdxComponent` to represent the SPDX document + +The detector does not parse or register individual packages listed within the SPDX document; it only registers the SPDX document itself as a component. 
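The steps listed above can be sketched in Python. This is a minimal illustration under stated assumptions, not the detector's code: the function name `describe_spdx_document` and the keys of the returned dictionary are hypothetical, hashing the raw file bytes is an assumption about how the SHA-1 is computed, and the JSON field names (`spdxVersion`, `documentNamespace`, `name`, `documentDescribes`) are taken from the SPDX 2.2 JSON format.

```python
import hashlib
import json
from typing import Optional

SUPPORTED_VERSION = "SPDX-2.2"
DEFAULT_ROOT = "SPDXRef-Document"


def describe_spdx_document(raw: bytes) -> Optional[dict]:
    """Return metadata for one SPDX document, or None if it should be skipped."""
    try:
        doc = json.loads(raw)
    except ValueError:
        # Invalid JSON or undecodable bytes: the file is skipped.
        return None
    if not isinstance(doc, dict) or doc.get("spdxVersion") != SUPPORTED_VERSION:
        return None
    described = doc.get("documentDescribes")
    # Only the first described element is used as the root element.
    if isinstance(described, list) and described:
        root = described[0]
    else:
        root = DEFAULT_ROOT
    return {
        "sha1": hashlib.sha1(raw).hexdigest(),  # assumption: hash of the raw bytes
        "namespace": doc.get("documentNamespace"),
        "name": doc.get("name"),
        "spdx_version": doc["spdxVersion"],
        "root_element_id": root,
    }
```

The sketch returns a single record per file and never looks at the document's `packages` array, matching the statement that only the SPDX document itself is registered.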
+ +## Known limitations + +- Only SPDX version 2.2 is currently supported +- Only JSON format is supported (`.spdx.json` files) +- The detector is **DefaultOff** and must be explicitly enabled via detector arguments +- If an SPDX document contains multiple elements in `documentDescribes`, only the first element is selected as the root element +- The detector does not create a dependency graph from the packages listed within the SPDX document +- Invalid JSON files or files that cannot be parsed are skipped with a warning diff --git a/docs/detectors/vcpkg.md b/docs/detectors/vcpkg.md index a08922993..decd095ba 100644 --- a/docs/detectors/vcpkg.md +++ b/docs/detectors/vcpkg.md @@ -1,23 +1,40 @@ -# vcpkg Detection +# Vcpkg Detection ## Requirements -vcpkg detection triggers off of `vcpkg.spdx.json` files found under the scan directory. You must use a version of vcpkg in your build that generates SBOM files (newer than 2022-05-05). +The Vcpkg detector searches for the following files: + +- `vcpkg.spdx.json` - SPDX 2.2 format SBOM files generated by vcpkg +- `manifest-info.json` - Metadata files that map installed packages to their source manifest + +You must use a version of vcpkg in your build that generates SBOM files (newer than 2022-05-05). For enhanced detection with manifest linking, use vcpkg >= 2025.02.14 and Component Detection >= v5.2.26. ## Detection strategy -The vcpkg detector searches for `vcpkg.spdx.json` files produced by vcpkg during the install process. These files are typically found under the installed packages directory in a path like `installed//share//vcpkg.spdx.json`. Each vcpkg port installes a separate `vcpkg.spdx.json` file[1]. +The detector operates in two phases: + +### Phase 1: Manifest Discovery +During the preparation phase, the detector locates `manifest-info.json` files within `vcpkg_installed` directories. 
These files contain the path to the source `vcpkg.json` or `vcpkg-configuration.json` manifest that describes the project's dependencies. The detector builds a mapping between the installed package location and the original manifest file. + +### Phase 2: SBOM Parsing +The detector searches for `vcpkg.spdx.json` files produced by vcpkg during the install process. These files are typically found under the installed packages directory in a path like `installed//share//vcpkg.spdx.json`. Each vcpkg port installs a separate `vcpkg.spdx.json` file[1]. -Because this detection strategy looks for the concrete files in the installed tree, it will accurately detect the precise packages used -during this build and exclude packages optionally used on other platforms. +The detector parses these SPDX files to extract component information and recognizes three types of SPDX package entries: -## Enhancements +- **Port packages** (`SPDXRef-port`): Library ports installed by vcpkg, including the port name, version, and port version +- **Binary packages** (`SPDXRef-binary`): Compiled binaries with triplet information (e.g., `x64-windows`, `x64-linux`) +- **Resource packages** (`SPDXRef-resource-*`): External resources and dependencies referenced by ports -The latest versions of `Component Detector (>= v5.2.26)` and `VCPKG (>= 2025.02.14)` resolve issues with Vcpkg detection by linking `vcpkg.spdx.json` files to their originating `vcpkg.json` file. This improvement, enabled through the new `manifest-info.json` introduced in `VCPKG`, ensures accurate dependency tracking and streamlines workflows like vulnerability resolution (e.g., Dependabot). +When a `manifest-info.json` exists in the `vcpkg_installed` directory, components are attributed to the source manifest rather than the SBOM file itself. The detector checks both the preferred location (`vcpkg_installed/vcpkg/manifest-info.json`) and a fallback location (`vcpkg_installed/manifest-info.json`). 
This enhancement, introduced in vcpkg >= 2025.02.14, ensures accurate dependency tracking and streamlines workflows like vulnerability resolution (e.g., Dependabot). + +Because this detection strategy looks for the concrete files in the installed tree, it accurately detects the precise packages used during the build and excludes packages optionally used on other platforms. ## Known limitations -The vcpkg detector does not distinguish between direct dependencies and transitive dependencies. It also does not distinguish -"development-only" dependencies that are not intended to impact the final shipping product. +The detector does not distinguish between direct dependencies and transitive dependencies. All detected components are registered as top-level dependencies without parent-child relationships. + +The detector does not distinguish "development-only" dependencies that are not intended to impact the final shipping product. + +The detector relies on vcpkg-generated SPDX files and does not parse `vcpkg.json` manifests directly. Projects must have run vcpkg installation to generate the required SBOM files. [1]: https://learn.microsoft.com/vcpkg/reference/software-bill-of-materials diff --git a/docs/detectors/yarn.md b/docs/detectors/yarn.md new file mode 100644 index 000000000..9760deb24 --- /dev/null +++ b/docs/detectors/yarn.md @@ -0,0 +1,81 @@ +# Yarn Detection + +## Requirements + +Yarn detection relies on the presence of lockfiles generated by the Yarn package manager. The detector searches for the following file: + +- `yarn.lock` + +Additionally, the detector requires a peer `package.json` file in the same directory as the `yarn.lock` file to determine root-level dependencies. + +## Detection strategy + +Yarn detection is performed by the `YarnLockComponentDetector`, which parses `yarn.lock` files found in the scan directory. The detection process follows these steps: + +1. 
**File Discovery**: Searches for `yarn.lock` files while skipping specific folders: + - `node_modules` + - `pnpm-store` + - `\package\` (folder named "package") + +2. **Lockfile Parsing**: Uses the `YarnLockFileFactory` to parse the `yarn.lock` file format, which includes: + - Package names and resolved versions + - Dependency relationships (regular and optional dependencies) + - Version ranges that each entry satisfies + +3. **Root Dependency Identification**: Reads the peer `package.json` file to determine which packages are direct (root) dependencies. This includes: + - `dependencies` + - `devDependencies` + - `peerDependencies` + - `optionalDependencies` + +4. **Workspace Support**: Handles Yarn workspaces by: + - Reading workspace patterns from the root `package.json` + - Using glob patterns to locate workspace `package.json` files + - Merging dependencies from all workspace packages + - Tracking component file paths for workspace dependencies + +5. **Dependency Graph Construction**: Builds a complete dependency graph by: + - Registering all root dependencies first + - Traversing the dependency tree using a breadth-first approach + - Creating parent-child relationships based on lockfile data + - Handling circular dependencies by tracking processed components + - Marking development dependencies based on the root `package.json` + +### Lookup Table Strategy + +The detector builds a lookup table where each key represents a package request (e.g., `npm@^2.3.4`) and maps to the resolved package entry. 
This is necessary because: + +- A single resolved package can satisfy multiple version ranges +- The `yarn.lock` file explicitly lists all satisfied version strings +- Example: `npm@2.3.4` satisfies requests for `npm@2`, `npm@2.3.4`, and `npm@^2.3.4` + +### Dependency Types + +The detector correctly identifies: +- **Root dependencies**: Direct dependencies from `package.json` +- **Development dependencies**: Marked when defined in `devDependencies` in the root `package.json` +- **Optional dependencies**: Included in the dependency graph alongside regular dependencies +- **Workspace dependencies**: Dependencies from workspace packages with their source file paths + +## Known limitations + +1. **Requires peer package.json**: If no `package.json` file is found in the same directory as the `yarn.lock` file, the components will not be registered as root dependencies. A warning will be logged and detection will be incomplete. + +2. **Lockfile-only packages**: If a package appears in `yarn.lock` but not in any `package.json` file (root or workspace), it will be registered as a component without parent relationships. This can happen with: + - Orphaned dependencies from previous installations + - Hoisted dependencies in workspaces + +3. **Missing lockfile entries**: If a package is declared in `package.json` but not found in the `yarn.lock` file, a parse failure is registered. This typically indicates: + - The lockfile is out of sync with `package.json` + - The project hasn't run `yarn install` after adding dependencies + +4. **Duplicate entries**: If the lockfile contains duplicate entries for the same package request, only the first entry is used and a warning is logged. + +5. **Workspace glob pattern complexity**: The detector uses `DotNet.Globbing` to match workspace patterns. Complex glob patterns may behave differently than Yarn's native glob implementation, particularly regarding: + - Case sensitivity (handled differently on Windows vs. 
Linux) + - Nested wildcard patterns + - Negative patterns + +6. **Parallel processing**: While the detector enables parallelism for performance, this requires careful handling of the component recorder's thread safety guarantees. + +7. **Development dependency propagation**: All transitive dependencies of a development dependency are also marked as development dependencies, which may not always reflect the actual usage in the project.
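The lookup table, the breadth-first traversal, and the development-dependency propagation described above can be sketched in Python. The entry layout (`name`, `version`, `satisfied`, `dependencies`) and the function names `build_lookup` and `walk` are hypothetical. The sketch also assumes that a package reachable from any non-development root is treated as a regular dependency; the text above does not state how that case is resolved.

```python
from collections import deque


def build_lookup(entries):
    """Map every satisfied request string (e.g. 'npm@^2.3.4') to its entry."""
    lookup = {}
    for entry in entries:
        for request in entry["satisfied"]:
            # On duplicate requests, keep the first entry.
            lookup.setdefault(request, entry)
    return lookup


def walk(lookup, roots):
    """Breadth-first traversal from the roots.

    roots: list of (request, is_dev) pairs taken from package.json.
    Returns {(name, version): is_dev}.
    """
    result = {}
    seen = set()
    queue = deque(roots)
    while queue:
        request, is_dev = queue.popleft()
        entry = lookup.get(request)
        if entry is None:
            # Declared but missing from the lockfile; nothing to traverse.
            continue
        key = (entry["name"], entry["version"])
        if (key, is_dev) in seen:
            # Already processed with this flag: guards against cycles.
            continue
        seen.add((key, is_dev))
        # Assumption: dev only if every path that reaches the package is dev.
        result[key] = result.get(key, True) and is_dev
        for dep_request in entry["dependencies"]:
            # Children inherit the dev flag of the root they were reached from.
            queue.append((dep_request, is_dev))
    return result
```

Tracking `(key, is_dev)` rather than `key` alone lets a package first reached through a development root be revisited, and downgraded, when a regular root reaches it later.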