Refactor walk features and the walk module.
This change significantly refactors the `walk` module and its features.
Changes were motivated by issue #49, which describes poor performance
when walking directory trees. Filters did not compose properly: if a
glob discarded a directory but not its sub-tree, a subsequent `not`
combinator never observed that directory and so could not discard its
tree when necessary.

Major changes include:

1. The `walk` module is now a part of the public API.
2. Unfiltered walking is provided via `PathExt`.
3. `FileIterator` is now very general: any `FileIterator` supports
   `filter_entry` and `not` combinators.
4. Issue #49 is fixed: all filtering is composed monotonically, so
   every filter observes both filtrate and residue.
5. `Glob::walk` is no longer implemented via a macro and `for_each_ref`
   has been removed.
6. The MSRV has been bumped to `1.66.1` for opaque lifetime features.
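Item 2 moves unfiltered walking onto an extension trait, so any path can be walked without first building a glob. The sketch below shows that extension-trait pattern using only the standard library. The trait name `WalkExt`, the method `walk_all`, and its eager `Vec` return type are hypothetical and do not reflect the actual signature of wax's `PathExt`.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical extension trait (NOT wax's `PathExt`): adds an unfiltered
/// walk to every `Path`.
trait WalkExt {
    /// Collects every path beneath `self`, depth-first, without filtering.
    fn walk_all(&self) -> io::Result<Vec<PathBuf>>;
}

impl WalkExt for Path {
    fn walk_all(&self) -> io::Result<Vec<PathBuf>> {
        let mut paths = Vec::new();
        let mut stack = vec![self.to_path_buf()];
        while let Some(directory) = stack.pop() {
            for entry in fs::read_dir(&directory)? {
                let entry = entry?;
                let path = entry.path();
                // `DirEntry::file_type` does not follow symbolic links, so
                // linked directories are reported but not entered.
                if entry.file_type()?.is_dir() {
                    stack.push(path.clone());
                }
                paths.push(path);
            }
        }
        Ok(paths)
    }
}

fn main() -> io::Result<()> {
    for path in Path::new(".").walk_all()? {
        println!("{}", path.display());
    }
    Ok(())
}
```

Because the trait is implemented for `Path`, a `PathBuf` can call `walk_all` directly through auto-deref.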

To compose filters, a `filter` module has been added that implements
separating filters. These are much like iterators, but partition each
item. Items, collectively known as the feed, take one of two variants:
filtrate or residue. Filter mapping permits only monotonic separation
(an item can move from filtrate to residue, but never back), with
special support for hierarchical tree iterators, which have two residue
variants: node and tree. Filtering by tree has an important side effect:
it uses a cancellation token to cancel the walk into that item's
sub-tree, discarding it completely.
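A small model helps show why monotonic separation fixes issue #49. The sketch below uses only the standard library; `Feed`, `separate`, `walk`, and the two toy filters are hypothetical names, not the API of the new `filter` module, and a pruned sub-tree is modeled by simply not pushing its children rather than by a real cancellation token. The glob-like filter discards `target` only as a node, but because every filter observes residue too, the negation still sees `target` and escalates it to tree residue, so `target/b.txt` is never visited.

```rust
/// Filtrate is kept; residue is discarded. Tree walks get two residue kinds.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Feed {
    Filtrate(&'static str),
    /// Discard only this item; its descendants are still walked.
    NodeResidue(&'static str),
    /// Discard this item and cancel the walk into its sub-tree.
    TreeResidue(&'static str),
}

impl Feed {
    fn path(self) -> &'static str {
        match self {
            Feed::Filtrate(p) | Feed::NodeResidue(p) | Feed::TreeResidue(p) => p,
        }
    }

    /// Separation may only move "up" this ranking (monotonicity).
    fn rank(self) -> u8 {
        match self {
            Feed::Filtrate(_) => 0,
            Feed::NodeResidue(_) => 1,
            Feed::TreeResidue(_) => 2,
        }
    }
}

/// A tiny in-memory tree standing in for a directory tree.
struct Node {
    path: &'static str,
    children: Vec<Node>,
}

type Filter = fn(Feed) -> Feed;

/// Every filter observes the item whether it is filtrate or residue; a result
/// that would move it back toward filtrate is ignored.
fn separate(path: &'static str, filters: &[Filter]) -> Feed {
    filters.iter().fold(Feed::Filtrate(path), |feed, filter| {
        let next = filter(feed);
        if next.rank() >= feed.rank() { next } else { feed }
    })
}

/// Depth-first walk returning (filtrate paths, visited paths).
fn walk(root: &Node, filters: &[Filter]) -> (Vec<&'static str>, Vec<&'static str>) {
    let (mut filtrate, mut visited) = (Vec::new(), Vec::new());
    let mut stack = vec![root];
    while let Some(node) = stack.pop() {
        visited.push(node.path);
        match separate(node.path, filters) {
            Feed::Filtrate(path) => filtrate.push(path),
            Feed::NodeResidue(_) => {}
            // Stands in for the cancellation token: children are never pushed.
            Feed::TreeResidue(_) => continue,
        }
        // Reverse so that children are visited in their listed order.
        stack.extend(node.children.iter().rev());
    }
    (filtrate, visited)
}

/// Like a glob `**/*.txt`: discards non-matching items but keeps walking.
fn glob_txt(feed: Feed) -> Feed {
    match feed {
        Feed::Filtrate(p) if !p.ends_with(".txt") => Feed::NodeResidue(p),
        other => other,
    }
}

/// Like an exhaustive negation `target/**`: prunes the whole sub-tree.
fn not_target(feed: Feed) -> Feed {
    let p = feed.path();
    if p == "target" || p.starts_with("target/") { Feed::TreeResidue(p) } else { feed }
}

fn leaf(path: &'static str) -> Node {
    Node { path, children: vec![] }
}

fn sample() -> Node {
    let target = Node { path: "target", children: vec![leaf("target/b.txt")] };
    Node { path: ".", children: vec![leaf("a.txt"), target] }
}

fn main() {
    let (filtrate, visited) = walk(&sample(), &[glob_txt, not_target]);
    println!("filtrate: {:?}", filtrate);
    println!("visited: {:?}", visited);
}
```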

The paths used when matching a `Glob` against a directory tree are now
represented by the root path (a `PathBuf`) together with a prefix into
that root path, represented as a `usize`. Entries use a similar
representation, so they can cheaply implement a `root_relative_paths`
function that splits a path into its root and relative parts.
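The root-plus-prefix representation lets an entry store a single owned path and still hand out its root and relative parts as borrowed slices. In the minimal sketch below, `Entry` and its fields are hypothetical, and the prefix is assumed to count leading path components; the commit message only says the prefix is a `usize`, not what it counts.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical walk entry: one owned path plus a `usize` prefix giving the
/// number of leading components that belong to the walk root (assumed encoding).
struct Entry {
    path: PathBuf,
    prefix: usize,
}

impl Entry {
    fn new(root: &Path, relative: &Path) -> Self {
        Entry {
            path: root.join(relative),
            prefix: root.components().count(),
        }
    }

    /// Splits the path into (root, relative). Both halves borrow from
    /// `self.path`, so no allocation is needed.
    fn root_relative_paths(&self) -> (&Path, &Path) {
        let mut components = self.path.components();
        // Skip the root's components; the remainder is the relative part.
        for _ in 0..self.prefix {
            components.next();
        }
        let relative = components.as_path();
        // Climb from the full path by the relative depth to reach the root.
        let depth = relative.components().count();
        let root = self.path.ancestors().nth(depth).unwrap_or_else(|| Path::new(""));
        (root, relative)
    }
}

fn main() {
    let entry = Entry::new(Path::new("projects/wax"), Path::new("src/walk.rs"));
    let (root, relative) = entry.root_relative_paths();
    println!("root: {}, relative: {}", root.display(), relative.display());
}
```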
olson-sean-k committed Nov 17, 2023
1 parent 303b5d6 commit 35b000b
Showing 10 changed files with 2,533 additions and 1,361 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/continuous-integration.yml
@@ -44,7 +44,7 @@ jobs:
matrix:
os: [macOS-latest, ubuntu-latest, windows-latest]
toolchain:
- 1.65.0 # Minimum.
- 1.66.1 # Minimum.
- stable
- beta
- nightly
2 changes: 1 addition & 1 deletion Cargo.toml
@@ -6,7 +6,7 @@ description = "Opinionated and portable globs that can be matched against paths
repository = "https://github.com/olson-sean-k/wax"
readme = "README.md"
edition = "2021"
rust-version = "1.65.0"
rust-version = "1.66.1"
license = "MIT"
keywords = [
"glob",
3 changes: 2 additions & 1 deletion README.md
@@ -51,7 +51,8 @@ for entry in glob.walk("doc") {
Match a directory tree against a glob with negations:

```rust
use wax::{Glob, LinkBehavior};
use wax::walk::{FileIterator, LinkBehavior};
use wax::Glob;

let glob = Glob::new("**/*.{md,txt}").unwrap();
for entry in glob
4 changes: 2 additions & 2 deletions src/capture.rs
@@ -124,8 +124,8 @@ impl<'t> MatchedText<'t> {
/// Clones any borrowed data to an owning instance.
///
/// This function is similar to [`into_owned`], but does not consume its receiver. Due to a
/// technical limitation, `MatchedText` cannot implement [`Clone`], so this function is
/// provided as a stop gap that allows a distinct instance to be created that owns its data.
/// technical limitation, `MatchedText` cannot properly implement [`Clone`], so this function
/// is provided as a stop gap that allows a distinct instance to be created that owns its data.
///
/// [`Clone`]: std::clone::Clone
/// [`into_owned`]: crate::MatchedText::into_owned
245 changes: 58 additions & 187 deletions src/lib.rs
@@ -35,7 +35,7 @@ mod diagnostics;
mod encode;
mod rule;
mod token;
mod walk;
pub mod walk;

#[cfg(feature = "miette")]
use miette::Diagnostic;
@@ -53,14 +53,11 @@ use thiserror::Error;
use crate::encode::CompileError;
use crate::rule::{Checked, RuleError};
use crate::token::{InvariantText, ParseError, Token, TokenTree, Tokenized};
#[cfg(feature = "walk")]
use crate::walk::WalkError;

pub use crate::capture::MatchedText;
pub use crate::diagnostics::{LocatedError, Span};
#[cfg(feature = "walk")]
pub use crate::walk::{
FileIterator, FilterTarget, FilterTree, LinkBehavior, Walk, WalkBehavior, WalkEntry, WalkError,
WalkNegation,
};

#[cfg(windows)]
const PATHS_ARE_CASE_INSENSITIVE: bool = true;
@@ -545,6 +542,7 @@ impl<'b> From<&'b str> for CandidatePath<'b> {
/// iterator over matching paths.
///
/// ```rust,no_run,ignore
/// use wax::walk::Entry;
/// use wax::Glob;
///
/// let glob = Glob::new("**/*.(?i){jpg,jpeg}").unwrap();
@@ -588,39 +586,6 @@ impl<'t> Glob<'t> {
Ok(Glob { tree, pattern })
}

/// Constructs a [`Glob`] from a glob expression with diagnostics.
///
/// This function is the same as [`Glob::new`], but additionally returns detailed diagnostics
/// on both success and failure.
///
/// See [`Glob::diagnose`].
///
/// # Examples
///
/// ```rust
/// use tardar::DiagnosticResultExt as _;
/// use wax::Glob;
///
/// let result = Glob::diagnosed("(?i)readme.{md,mkd,markdown}");
/// for diagnostic in result.diagnostics() {
/// eprintln!("{}", diagnostic);
/// }
/// if let Some(glob) = result.ok_output() { /* ... */ }
/// ```
///
/// [`Glob`]: crate::Glob
/// [`Glob::diagnose`]: crate::Glob::diagnose
/// [`Glob::new`]: crate::Glob::new
#[cfg(feature = "miette")]
#[cfg_attr(docsrs, doc(cfg(feature = "miette")))]
pub fn diagnosed(expression: &'t str) -> DiagnosticResult<'t, Self> {
parse_and_diagnose(expression).and_then_diagnose(|tree| {
Glob::compile(tree.as_ref().tokens())
.into_error_diagnostic()
.map_output(|pattern| Glob { tree, pattern })
})
}

/// Partitions a [`Glob`] into an invariant [`PathBuf`] prefix and variant [`Glob`] postfix.
///
/// The invariant prefix contains no glob patterns nor other variant components and therefore
@@ -706,154 +671,6 @@ impl<'t> Glob<'t> {
}
}

/// Gets an iterator over matching files in a directory tree.
///
/// This function matches a [`Glob`] against a directory tree, returning each matching file as
/// a [`WalkEntry`]. [`Glob`]s are the only patterns that support this semantic operation; it
/// is not possible to match combinators over directory trees.
///
/// As with [`Path::join`] and [`PathBuf::push`], the base directory can be escaped or
/// overridden by rooted [`Glob`]s. In many cases, the current working directory `.` is an
/// appropriate base directory and will be intuitively ignored if the [`Glob`] is rooted, such
/// as in `/mnt/media/**/*.mp4`. The [`has_root`] function can be used to check if a [`Glob`]
/// is rooted and the [`Walk::root`] function can be used to get the resulting root directory
/// of the traversal.
///
/// The [root directory][`Walk::root`] is established via the [invariant
/// prefix][`Glob::partition`] of the [`Glob`]. **The prefix and any [semantic
/// literals][`Glob::has_semantic_literals`] in this prefix are interpreted semantically as a
/// path**, so components like `.` and `..` that precede variant patterns interact with the
/// base directory semantically. This means that expressions like `../**` escape the base
/// directory as expected on Unix and Windows, for example.
///
/// This function uses the default [`WalkBehavior`]. To configure the behavior of the
/// traversal, see [`Glob::walk_with_behavior`].
///
/// Unlike functions in [`Pattern`], **this operation is semantic and interacts with the file
/// system**.
///
/// # Examples
///
/// ```rust,no_run
/// use wax::Glob;
///
/// let glob = Glob::new("**/*.(?i){jpg,jpeg}").unwrap();
/// for entry in glob.walk("./Pictures") {
/// let entry = entry.unwrap();
/// println!("JPEG: {:?}", entry.path());
/// }
/// ```
///
/// Glob expressions do not support general negations, but the [`not`] iterator adaptor can be
/// used when walking a directory tree to filter [`WalkEntry`]s using arbitary patterns. **This
/// should generally be preferred over functions like [`Iterator::filter`], because it avoids
/// unnecessary reads of directory trees when matching [exhaustive
/// negations][`Pattern::is_exhaustive`].**
///
/// ```rust,no_run
/// use wax::Glob;
///
/// let glob = Glob::new("**/*.(?i){jpg,jpeg,png}").unwrap();
/// for entry in glob
/// .walk("./Pictures")
/// .not(["**/(i?){background<s:0,1>,wallpaper<s:0,1>}/**"])
/// .unwrap()
/// {
/// let entry = entry.unwrap();
/// println!("{:?}", entry.path());
/// }
/// ```
///
/// [`Glob`]: crate::Glob
/// [`Glob::walk_with_behavior`]: crate::Glob::walk_with_behavior
/// [`has_root`]: crate::Glob::has_root
/// [`Iterator::filter`]: std::iter::Iterator::filter
/// [`not`]: crate::Walk::not
/// [`Path::join`]: std::path::Path::join
/// [`PathBuf::push`]: std::path::PathBuf::push
/// [`Pattern`]: crate::Pattern
/// [`Pattern::is_exhaustive`]: crate::Pattern::is_exhaustive
/// [`Walk::root`]: crate::Walk::root
/// [`WalkBehavior`]: crate::WalkBehavior
/// [`WalkEntry`]: crate::WalkEntry
#[cfg(feature = "walk")]
#[cfg_attr(docsrs, doc(cfg(feature = "walk")))]
pub fn walk(&self, directory: impl AsRef<Path>) -> Walk {
self.walk_with_behavior(directory, WalkBehavior::default())
}

/// Gets an iterator over matching files in a directory tree.
///
/// This function is the same as [`Glob::walk`], but it additionally accepts a
/// [`WalkBehavior`]. This can be used to configure how the traversal interacts with symbolic
/// links, the maximum depth from the root, etc.
///
/// Depth is relative to the [root directory][`Walk::root`] of the traversal, which is
/// determined by joining the given path and any [invariant prefix][`Glob::partition`] of the
/// [`Glob`].
///
/// See [`Glob::walk`] for more information.
///
/// # Examples
///
/// ```rust,no_run
/// use wax::{Glob, WalkBehavior};
///
/// let glob = Glob::new("**/*.(?i){jpg,jpeg}").unwrap();
/// for entry in glob.walk_with_behavior("./Pictures", WalkBehavior::default()) {
/// let entry = entry.unwrap();
/// println!("JPEG: {:?}", entry.path());
/// }
/// ```
///
/// By default, symbolic links are read as normal files and their targets are ignored. To
/// follow symbolic links and traverse any directories that they reference, specify a
/// [`LinkBehavior`].
///
/// ```rust,no_run
/// use wax::{Glob, LinkBehavior};
///
/// let glob = Glob::new("**/*.txt").unwrap();
/// for entry in glob.walk_with_behavior("/var/log", LinkBehavior::ReadTarget) {
/// let entry = entry.unwrap();
/// println!("Log: {:?}", entry.path());
/// }
/// ```
///
/// [`Glob`]: crate::Glob
/// [`Glob::partition`]: crate::Glob::partition
/// [`Glob::walk`]: crate::Glob::walk
/// [`LinkBehavior`]: crate::LinkBehavior
/// [`Walk::root`]: crate::Walk::root
/// [`WalkBehavior`]: crate::WalkBehavior
#[cfg(feature = "walk")]
#[cfg_attr(docsrs, doc(cfg(feature = "walk")))]
pub fn walk_with_behavior(
&self,
directory: impl AsRef<Path>,
behavior: impl Into<WalkBehavior>,
) -> Walk {
walk::walk(self, directory, behavior)
}

/// Gets **non-error** [`Diagnostic`]s.
///
/// This function requires a receiving [`Glob`] and so does not report error-level
/// [`Diagnostic`]s. It can be used to get non-error diagnostics after constructing or
/// [partitioning][`Glob::partition`] a [`Glob`].
///
/// See [`Glob::diagnosed`].
///
/// [`Diagnostic`]: miette::Diagnostic
/// [`Glob`]: crate::Glob
/// [`Glob::diagnosed`]: crate::Glob::diagnosed
/// [`Glob::partition`]: crate::Glob::partition
#[cfg(feature = "miette")]
#[cfg_attr(docsrs, doc(cfg(feature = "miette")))]
pub fn diagnose(&self) -> impl Iterator<Item = Box<dyn Diagnostic + '_>> {
diagnostics::diagnose(self.tree.as_ref())
}

/// Gets metadata for capturing sub-expressions.
///
/// This function returns an iterator over capturing tokens, which describe the index and
@@ -903,6 +720,60 @@ impl<'t> Glob<'t> {
}
}

/// APIs for diagnosing globs.
#[cfg(feature = "miette")]
#[cfg_attr(docsrs, doc(cfg(feature = "miette")))]
impl<'t> Glob<'t> {
/// Constructs a [`Glob`] from a glob expression with diagnostics.
///
/// This function is the same as [`Glob::new`], but additionally returns detailed diagnostics
/// on both success and failure.
///
/// See [`Glob::diagnose`].
///
/// # Examples
///
/// ```rust
/// use tardar::DiagnosticResultExt as _;
/// use wax::Glob;
///
/// let result = Glob::diagnosed("(?i)readme.{md,mkd,markdown}");
/// for diagnostic in result.diagnostics() {
/// eprintln!("{}", diagnostic);
/// }
/// if let Some(glob) = result.ok_output() { /* ... */ }
/// ```
///
/// [`Glob`]: crate::Glob
/// [`Glob::diagnose`]: crate::Glob::diagnose
/// [`Glob::new`]: crate::Glob::new
#[cfg(feature = "miette")]
#[cfg_attr(docsrs, doc(cfg(feature = "miette")))]
pub fn diagnosed(expression: &'t str) -> DiagnosticResult<'t, Self> {
parse_and_diagnose(expression).and_then_diagnose(|tree| {
Glob::compile(tree.as_ref().tokens())
.into_error_diagnostic()
.map_output(|pattern| Glob { tree, pattern })
})
}

/// Gets **non-error** [`Diagnostic`]s.
///
/// This function requires a receiving [`Glob`] and so does not report error-level
/// [`Diagnostic`]s. It can be used to get non-error diagnostics after constructing or
/// [partitioning][`Glob::partition`] a [`Glob`].
///
/// See [`Glob::diagnosed`].
///
/// [`Diagnostic`]: miette::Diagnostic
/// [`Glob`]: crate::Glob
/// [`Glob::diagnosed`]: crate::Glob::diagnosed
/// [`Glob::partition`]: crate::Glob::partition
pub fn diagnose(&self) -> impl Iterator<Item = Box<dyn Diagnostic + '_>> {
diagnostics::diagnose(self.tree.as_ref())
}
}

impl Display for Glob<'_> {
fn fmt(&self, f: &mut Formatter) -> fmt::Result {
write!(f, "{}", self.tree.as_ref().expression())
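The removed `Glob::walk` documentation compares rooted globs to `Path::join` and `PathBuf::push`, where a rooted argument overrides the base directory instead of being appended to it. The standard-library behavior it refers to can be checked directly; this sketch assumes Unix-style path literals and does not involve wax itself.

```rust
use std::path::{Path, PathBuf};

fn main() {
    let base = Path::new(".");
    // A relative argument is appended to the base directory.
    assert_eq!(base.join("Pictures"), PathBuf::from("./Pictures"));
    // A rooted argument replaces the base directory entirely.
    assert!(Path::new("/mnt/media").has_root());
    assert_eq!(base.join("/mnt/media"), PathBuf::from("/mnt/media"));
    println!("{}", base.join("/mnt/media").display());
}
```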
