Skip to content

Conversation

ivy
Copy link

@ivy ivy commented Aug 27, 2025

Adds validation to ensure package_todo.yml files maintain the correct serialization format, preventing manual edits from breaking the standard format and causing noise when running pks update.

Changes

  • Add PackageTodoFormatChecker validator that compares existing files with their expected serialized format
  • Integrate validator into existing validation system using parallel processing for performance
  • Add comprehensive tests with fixtures for both valid and invalid formats
  • Show context-aware error messages suggesting correct update command based on packs_first_mode
  • Add detailed documentation explaining validation purpose, implementation, and common failure causes
  • Minimal performance impact - benchmarking shows only 1% slowdown (10ms), negligible compared to the benefit

Test plan

  • Validation fails for incorrectly formatted package_todo.yml files
  • Validation passes for correctly formatted package_todo.yml files
  • Error messages suggest appropriate command based on packs_first_mode setting
  • Parallel processing optimizes performance for large codebases
  • Integration with existing validation system works correctly

This prevents issues where mass search-replace operations (like renaming packs) result in incorrectly formatted files that create large changesets when people run pks update.

ivy added 3 commits August 27, 2025 12:48
Adds validation to ensure package_todo.yml files maintain the correct
serialization format, preventing manual edits from breaking the standard
format and causing noise when running `pks update`.

- Add PackageTodoFormatChecker validator that compares existing files
  with their expected serialized format
- Integrate validator into existing validation system
- Add comprehensive tests with fixtures for both valid and invalid formats
- Show context-aware error messages suggesting correct update command
  based on packs_first_mode

This prevents issues where mass search-replace operations (like renaming
packs) result in incorrectly formatted files that create large changesets
when people run `pks update`.
Add detailed doc comments explaining the package_todo.yml format validation
functionality to help future maintainers understand:

- How the validation works (deserialize -> re-serialize -> compare)
- Why it's needed (prevent format inconsistencies from manual edits)
- Common causes of validation failures
- The role of serialize_package_todo in maintaining format consistency

Documentation covers:
- Module-level overview of the validation purpose
- Function-level details for validation logic
- Test documentation explaining test scenarios
- Implementation details for the serialization format
Replace sequential iteration with par_iter() to validate package_todo.yml
files in parallel, improving performance for large codebases with many packages.
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds validation to ensure package_todo.yml files maintain the correct serialization format, preventing manual edits from breaking the standard format and causing noise when running pks update.

  • Adds a new PackageTodoFormatChecker validator that compares existing files with their expected serialized format
  • Integrates the validator into the existing validation system using parallel processing for performance
  • Includes comprehensive tests with fixtures for both valid and invalid formats

Reviewed Changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
tests/package_todo_format_test.rs Test suite validating correct and incorrect package_todo.yml formats
tests/fixtures/incorrectly_formatted_package_todo/* Test fixture with incorrectly ordered violations to test validation failure
tests/fixtures/contains_package_todo/packs/foo/package_todo.yml Updated existing fixture with YAML separator for consistency
src/packs/package_todo.rs Made serialize_package_todo function public and added comprehensive documentation
src/packs/checker/package_todo.rs New validator implementation for package_todo.yml format checking
src/packs/checker.rs Integration of new validator into validation system

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

Replace &String parameters with &str to avoid unnecessary heap allocations
and improve API design. Addresses feedback from code review.

- Change serialize_package_todo parameter from &String to &str
- Change header function parameter from &String to &str
- Remove unnecessary to_string() call when passing pack_name
///
/// This function is used both for writing new package_todo.yml files and for
/// format validation to ensure existing files match the expected format.
pub(crate) fn serialize_package_todo(
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not super familiar with Rust's modules. Is this a code smell? I didn't want to dive into the "how should we organize and expose these types/methods" rabbit hole just yet.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems reasonable, if you don't want the method to be part of the public API. If you really want to limit the usage you can declare pub(super) so only this and the parent module can use it, but that's likely overkill for this project.

Copy link

@smacfarlane smacfarlane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aside from the unwrap, this looks safe to ship as is. I've left additional comments on how to use anyhow for consistency within pks.

It's been a while since I've written any Rust though, so it would be good to have a second set of eyes on this.

package_todo_path: &Path,
pack_name: &str,
packs_first_mode: bool,
) -> Result<(), String> {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This crate uses the anyhow to simplify the Result handling. I would be good to use it here for consistency. Example

Suggested change
) -> Result<(), String> {
) -> anyhow::Result<()> {

The return type can also be written as Result, if use anyhow::Result is included above.


// Compare the current content with the expected serialized format
if current_content != expected_content {
return Err(format!(

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If using anyhow, this would need to be written as

Suggested change
return Err(format!(
return Err(anyhow!(

with a use anyhow::anyhow in the imports above.

Comment on lines 97 to 100
let current_content =
fs::read_to_string(package_todo_path).map_err(|e| {
format!("Failed to read {}: {}", package_todo_path.display(), e)
})?;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With anyhow, you wouldn't need the map_err here

Suggested change
let current_content =
fs::read_to_string(package_todo_path).map_err(|e| {
format!("Failed to read {}: {}", package_todo_path.display(), e)
})?;
let current_content = fs::read_to_string(package_todo_path)?;

Comment on lines 103 to 106
let package_todo: PackageTodo = serde_yaml::from_str(&current_content)
.map_err(|e| {
format!("Failed to parse {}: {}", package_todo_path.display(), e)
})?;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With anyhow, you wouldn't need the map_err here

Suggested change
let package_todo: PackageTodo = serde_yaml::from_str(&current_content)
.map_err(|e| {
format!("Failed to parse {}: {}", package_todo_path.display(), e)
})?;
let package_todo: PackageTodo = serde_yaml::from_str(&current_content)?;

.par_iter()
.filter_map(|pack| {
let package_todo_path =
pack.yml.parent().unwrap().join("package_todo.yml");
Copy link

@smacfarlane smacfarlane Aug 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.unwrap() can lead to runtime panics. It's best to explicitly handle the None case.

    /// Returns [`None`] if the path terminates in a root or prefix, or if it's
    /// the empty string.

If, in this context, the parent will never be one of the above cases, we can instead use expect("parent will always exist") to provide the same behavior as unwrap with additional context as to why it's safe.

&pack.name,
configuration.packs_first_mode,
)
.err()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If using anyhow you'll need to convert these into String

Suggested change
.err()
.map_err(|e| e.to_string()).err

///
/// This function is used both for writing new package_todo.yml files and for
/// format validation to ensure existing files match the expected format.
pub(crate) fn serialize_package_todo(

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems reasonable, if you don't want the method to be part of the public API. If you really want to limit the usage you can declare pub(super) so only this and the parent module can use it, but that's likely overkill for this project.

use crate::packs::pack::write_pack_to_disk;
use crate::packs::pack::Pack;
use crate::packs::package_todo;
use crate::packs::package_todo as package_todo_module;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would try to differentiate the names of these two modules more, if not at a higher level, then at least for this alias. It's not clear what the difference is.
If one is meant to be public and the other internal, it may be worth restructuring the code to create an create::internal:: namespace.

@@ -0,0 +1,125 @@
//! Package todo format validation checker.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see now. We've introduced a new package_todo module when there was already an existing one in the checker namespace. That's why you're having namespace collisions.

None
} else {
Some(validation_errors)
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is totally legit and there's nothing wrong with it, but I would prefer pattern matching in general.
It's also just good to know how to pattern match a slice.

match &validation_errors[..] {
    [] => None,
    _ => Some(validation_errors)
}

I whipped up a more complete example of matching over slices for you.
https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=7932bced185feb6f53dc7b23194c3466

To be 100% clear, I don't actually have a problem with the if/else here.
You were just asking about idiomatic rust so it seemed to be a learning opportunity to bring this up.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! This is exactly the kind of feedback I'm looking for. Thank you!

));
}

Ok(())

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a place I would definitely prefer pattern matching over an if return.
The if return structure leaves room to wedge code in between this and returning the Ok, which can encourage complexity in the future as more early returns get added in here.

match current_content {
    expected_content => Ok(()),
     _ => {
        Err(/*...*/)     
     }
 }

Copy link
Contributor

@ashleywillard ashleywillard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks so much for taking this on! I'll leave the syntax and such to the rust experts, but this looks good to me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be nice to have a test that ensures packs without a package todo file are correctly skipped. I wouldn't consider this a blocker though!

return Err(format!(
"Package todo file {} is not in the expected format. Please run `{}` to fix it.",
package_todo_path.display(),
if packs_first_mode { "pks update" } else { "bin/packwerk update-todo" }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can definitely be improved, but we have https://github.com/rubyatscale/pks/blob/main/src/packs/bin_locater.rs for identifying the executable.

@ivy ivy mentioned this pull request Aug 29, 2025
ivy and others added 3 commits September 23, 2025 10:40
- Remove unnecessary 'as package_todo_module' alias in checker.rs
- Rename package_todo.rs to todo_format.rs for clarity
- Replace unwrap() with explicit expect() for better error handling

Co-authored-by: Sweta Sanghavi <sweta.sanghavi@gusto.com>
Co-authored-by: Denis Kisselev <denis.kisselev@gusto.com>
Co-authored-by: Martin Emde <martin.emde@gusto.com>
…tion

- Migrate from custom Result<(), String> to anyhow::Result<()> for consistency with codebase
- Remove manual map_err calls and leverage anyhow's automatic error conversion
- Use bin_locater::packs_bin_name() for command suggestions in packs_first_mode
- Maintain backward compatibility with bin/packwerk update-todo for legacy mode

Addresses PR feedback to align error handling patterns and centralize command generation.
- Rename test file: package_todo_format_test.rs → todo_format_validation_test.rs
- Add test for violations existing but no package_todo.yml files
- Add test for clean case with no violations and no todo files
- Correct test #2 to use privacy_violation_overrides fixture with actual violations
- Ensure validator correctly skips packs without todo files in all scenarios
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Triage
Development

Successfully merging this pull request may close these issues.

5 participants