πŸ‹ extendio: extensions to Go standard io libraries

⚠ DOCUMENTATION IS A WORK IN PROGRESS

🔰 Introduction

This project provides extensions to, and alternative implementations of, Go standard libraries, typically but not limited to io and filepath. The client should be able to use extendio alongside standard packages such as io/fs. To make this easier, sub-packages within extendio are named with a prefix of x, so that they do not clash with their standard counterparts and the client does not need to define an import alias; eg the fs package inside extendio is called xfs.

🚀 Quick Start

👣 Traversal

To invoke a traversal, create a PrimarySession with the root path:

  import (
    "fmt"

    "github.com/snivilised/extendio/xfs/nav"
  )

  session := nav.PrimarySession{
    Path: "/foo/bar/",
  }

then configure by calling the Configure method on the session:

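  callback := nav.LabelledTraverseCallback{
    Label: "print item path", // optional, helps identify the callback when debugging
    Fn: func(item *nav.TraverseItem) error {
      fmt.Printf("Current Item Path: '%v' \n", item.Path)
      // return any error encountered while handling the item
      return nil
    },
  }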

  result := session.Configure(func(o *nav.TraverseOptions) {
    o.Store.Subscription = nav.SubscribeFolders
    o.Callback = callback
  }).Run()

  noOfFoldersFound := (*result.Metrics)[nav.MetricNoFoldersEn].Count

πŸ“ Points of Note:

  • the callback here is actually an instance of LabelledTraverseCallback, which is a struct that contains the function to be invoked and a Label. The Label is optional and was defined for debugging purposes. When you have a lot of func definitions, its difficult to identify which is which without having some form of identification.

  • function signature of TraverseCallback is defined as follows:

func(item *TraverseItem) error
  • Configure requires a function to be passed in that receives an instance of TraverseOptions, which is already populated with default values. The function the client provides simply sets the required options (see options reference). ⚠ The Callback option is mandatory, if not set then traversal will fail with a panic.
  • the call to Configure returns an instance of NavigationRunner, which contains a single Run method that returns a TraverseResult
  • the TraverseResult contains a Metrics (of type MetricCollection) item which currently indicates the number of files and folders the callback has been invoked for during the traversal. To inspect, use the MetricEnum (MetricNoFilesEn, MetricNoFoldersEn) to index into Metrics as illustrated in the example.
  • this example traverses the file system rooted at the path indicated in the session ('/foo/bar/') and invokes the callback for all folders found in the tree.

🎀 Features

👣 Traverse

  • Provides a pre-emptive declarative paradigm that allows the client to be notified on a wider set of criteria and minimises callback invocation. This allows for more efficiency when navigating large directory trees.
  • More comprehensive filtering capabilities, incorporating that which is already provided by filepath.Match. The filtering will include positive and negative matching for globs (shell patterns) and regular expressions.
  • The client is able to define custom filters.
  • The callback function signature will differ from WalkDir. Instead of being passed just the corresponding fs.DirEntry, the callback will be passed a custom type which contains the fs.DirEntry as a member. More properties can be attached to this new abstraction to support more features (as indicated below).
  • Add Depth property. This will indicate to the callback how many levels of descent have occurred relative to the root directory.
  • Add IsLeaf property. The client may need to know if the directory it is being notified for is a leaf directory. As part of the declarative paradigm, the client may also request to be notified for leaf nodes only, and this will be achieved using the IsLeaf property.
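
The Depth and IsLeaf properties described above can be illustrated with a minimal sketch built only on the standard library. It does not use extendio; the item type and the walk, depthOf, isLeaf and sampleTree functions are hypothetical names introduced here, and the in-memory tree is made up for the example.

```go
package main

import (
	"fmt"
	"io/fs"
	"strings"
	"testing/fstest"
)

// item is a hypothetical stand-in for the richer callback argument:
// it wraps the fs.DirEntry and attaches extra properties.
type item struct {
	Path   string
	Entry  fs.DirEntry
	Depth  int
	IsLeaf bool
}

// depthOf returns the number of levels below the root ("." is depth 0).
func depthOf(path string) int {
	if path == "." {
		return 0
	}
	return strings.Count(path, "/") + 1
}

// isLeaf reports whether the entry has no sub folders; files are always leaves.
func isLeaf(fsys fs.FS, path string, d fs.DirEntry) (bool, error) {
	if !d.IsDir() {
		return true, nil
	}
	entries, err := fs.ReadDir(fsys, path)
	if err != nil {
		return false, err
	}
	for _, e := range entries {
		if e.IsDir() {
			return false, nil
		}
	}
	return true, nil
}

// walk invokes fn for every node, passing the enriched item.
func walk(fsys fs.FS, fn func(item) error) error {
	return fs.WalkDir(fsys, ".", func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		leaf, err := isLeaf(fsys, path, d)
		if err != nil {
			return err
		}
		return fn(item{Path: path, Entry: d, Depth: depthOf(path), IsLeaf: leaf})
	})
}

// sampleTree builds a small in-memory file system for the example.
func sampleTree() fstest.MapFS {
	return fstest.MapFS{
		"music/album/track.flac": {Data: []byte("x")},
		"music/cover.jpg":        {Data: []byte("x")},
	}
}

func main() {
	err := walk(sampleTree(), func(it item) error {
		fmt.Printf("%-24s depth=%d leaf=%t\n", it.Path, it.Depth, it.IsLeaf)
		return nil
	})
	if err != nil {
		fmt.Println("walk failed:", err)
	}
}
```

A client that only cares about leaf directories would simply return early from the callback when IsLeaf is false; the declarative approach described above moves that decision into the navigator so the callback is not invoked at all.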

♻️ Resume

  • Add Resume function. This is typically required in recovery scenarios, particularly when the traversal of a large directory tree has been terminated part way, possibly in response to a CTRL-C interrupt. Instead of requiring a full traversal of the directory tree, the Resume function can be used to process only that part of the tree not visited in the previous run. The Resume function would require the Root path parameter and a checkpoint path. The term fractured ancestor is introduced, which denotes those directory nodes in the tree whose contents were only partially visited. Resume would traverse the tree beginning at the checkpoint, then get the parent and find the successor sibling nodes, traversing their corresponding trees. It would then ascend and repeat the process until the root is encountered. Resume needs to invoke Traverse for each sub tree individually.
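
The ascent through the fractured ancestors can be sketched as follows. This is not extendio's implementation: resumePlan and sampleTree are hypothetical names, the tree is held in memory with the standard library's fstest.MapFS, and the sketch assumes that siblings are visited in the lexical order returned by fs.ReadDir. It only computes the list of sub trees that would each need their own traversal.

```go
package main

import (
	"fmt"
	"io/fs"
	"path"
	"testing/fstest"
)

// resumePlan returns the sub trees still to be traversed: the checkpoint
// itself, followed by the successor siblings found at each fractured
// ancestor while ascending from the checkpoint to the root.
func resumePlan(fsys fs.FS, root, checkpoint string) ([]string, error) {
	plan := []string{checkpoint}
	current := checkpoint
	for current != root {
		parent := path.Dir(current)
		if parent == current {
			return nil, fmt.Errorf("checkpoint %q is not below root %q", checkpoint, root)
		}
		entries, err := fs.ReadDir(fsys, parent) // sorted by name
		if err != nil {
			return nil, err
		}
		name := path.Base(current)
		for _, e := range entries {
			if e.Name() > name { // successor siblings only
				plan = append(plan, path.Join(parent, e.Name()))
			}
		}
		current = parent // ascend to the next fractured ancestor
	}
	return plan, nil
}

// sampleTree builds a small in-memory file system for the example.
func sampleTree() fstest.MapFS {
	return fstest.MapFS{
		"a/1/x.txt": {Data: []byte("x")},
		"a/2/y.txt": {Data: []byte("y")},
		"a/3/z.txt": {Data: []byte("z")},
		"b/w.txt":   {Data: []byte("w")},
		"c.txt":     {Data: []byte("c")},
	}
}

func main() {
	plan, err := resumePlan(sampleTree(), ".", "a/2")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(plan) // [a/2 a/3 b c.txt]
}
```

In this example "a" and the root are the fractured ancestors: "a/1" was fully visited before the interruption, so only "a/2", "a/3" and the remaining top level entries are scheduled.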

🌐 i18n

  • In order to support i18n, error handling will be implemented slightly differently to the standard error handling paradigm already established in Go. Simply returning an error which is just a string containing an error message is not i18n friendly. We could just return an error code, which of course would be documented, but it would be more useful to return a richer abstraction: another object which contains various properties about the error. This object will contain an error code (probably int based, or a pseudo enum). It can even contain a string member which holds the error message in English, but the error code would allow for messages to be translated (possibly using Go templates). The exact implementation has not been finalised yet, but this is the direction of travel.

📨 Message Bus

  • Contains an alternative version of bus. The requirement for a bus implementation is based upon the need to create loosely coupled internal code. The original bus was designed with concurrency in mind, so it uses locks to achieve thread safety. This aspect is surplus to requirements, as all we need it for are synchronous scenarios, so it has been stripped out.
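
To show what a lock free, synchronous bus amounts to, here is a minimal sketch. It is not the bus shipped with extendio; syncBus, handler, subscribe and emit are hypothetical names, and the sketch assumes all calls are made from a single goroutine, which is why no mutex is needed.

```go
package main

import "fmt"

// handler is a hypothetical subscriber signature.
type handler func(payload any)

// syncBus dispatches synchronously and holds no locks, so it must only
// be used from one goroutine at a time.
type syncBus struct {
	handlers map[string][]handler
}

func newSyncBus() *syncBus {
	return &syncBus{handlers: make(map[string][]handler)}
}

// subscribe registers h for the topic.
func (b *syncBus) subscribe(topic string, h handler) {
	b.handlers[topic] = append(b.handlers[topic], h)
}

// emit calls every handler for the topic in registration order and
// returns how many were invoked.
func (b *syncBus) emit(topic string, payload any) int {
	for _, h := range b.handlers[topic] {
		h(payload)
	}
	return len(b.handlers[topic])
}

func main() {
	bus := newSyncBus()
	bus.subscribe("descend", func(payload any) {
		fmt.Println("descending into", payload)
	})
	fmt.Println("handlers invoked:", bus.emit("descend", "/foo/bar"))
}
```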

☒ Error Handling

User Guide

👣 Using Traverse

The Traverse feature comes with many options to customise the way a file system is traversed, as illustrated in the following table:

βš™οΈ Options Reference

Name - - - Default Reference
StoreπŸ”— REF
SubscriptionπŸ”— SubscribeAny REF
DoExtendπŸ”— false
Behaviours
SubPathπŸ”—
KeepTrailingSepπŸ”— true
Sort
IsCaseSensitiveπŸ”— false
DirectoryEntryOrderπŸ”— DirectoryEntryOrderFoldersFirstEn
ListenπŸ”—
InclusiveStartπŸ”— true
InclusiveStopπŸ”— false
LoggingπŸ”—
Path ~/snivilised.extendio.nav.log
TimeStampFormat 2006-01-02 15:04:05
Level InfoLevel
Rotation
MaxSizeInMb 50
MaxNoOfBackups 3
MaxAgeInDays 28
Callback ❌ (mandatory)
NotifyπŸ”—
OnBegin no-op
OnEnd no-op
OnDescend no-op
OnAscend no-op
OnStart no-op
OnStop no-op
HooksπŸ”—
QueryStatus LstatHookFn
ReadDirectory ReadEntries
FolderSubPath RootParentSubPath
FileSubPath RootParentSubPath
InitFilters InitFiltersHookFn
Sort CaseSensitiveSortHookFn / CaseInSensitiveSortHookFn
Extend DefaultExtendHookFn / no-op
ListenπŸ”—
Start no-op
Stop no-op
PersistπŸ”—
Format PersistInJSONEn

Options.Store

  • Sort.IsCaseSensitive: blah

  • Sort.DirectoryEntryOrder: blah

  • Logging: blah

Options.Hooks

Options.Listen

Options.Persist

🥥 Subscription Types

A subscription defines which file system item type the callback gets invoked for. The client can make a subscription of one of the following types:

  • files (SubscribeFiles): callback invoked for file items only
  • folders (SubscribeFolders): callback invoked for folder items only
  • folders with files (SubscribeFoldersWithFiles): callback invoked for folder items only, but includes all files that are contained inside the folder, as the Children property of the TraverseItem that the callback is invoked with
  • all (SubscribeAny): callback invoked for files and folders
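
The effect of each subscription type can be sketched with the standard library alone. The code below does not call extendio: the subscription constants, the item type and the traverse function are local, hypothetical stand-ins that mirror the behaviour described in the list above.

```go
package main

import (
	"fmt"
	"io/fs"
	"testing/fstest"
)

// subscription is a local stand-in for the subscription types listed above.
type subscription int

const (
	subscribeAny subscription = iota
	subscribeFiles
	subscribeFolders
	subscribeFoldersWithFiles
)

// item is a hypothetical, simplified traverse item.
type item struct {
	Path     string
	IsDir    bool
	Children []string // file names, only set for subscribeFoldersWithFiles
}

// traverse walks fsys and invokes fn only for the subscribed item types.
func traverse(fsys fs.FS, sub subscription, fn func(item)) error {
	return fs.WalkDir(fsys, ".", func(p string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		foldersOnly := sub == subscribeFolders || sub == subscribeFoldersWithFiles
		if (d.IsDir() && sub == subscribeFiles) || (!d.IsDir() && foldersOnly) {
			return nil // not subscribed to this item type
		}
		it := item{Path: p, IsDir: d.IsDir()}
		if d.IsDir() && sub == subscribeFoldersWithFiles {
			entries, err := fs.ReadDir(fsys, p)
			if err != nil {
				return err
			}
			for _, e := range entries {
				if !e.IsDir() {
					it.Children = append(it.Children, e.Name())
				}
			}
		}
		fn(it)
		return nil
	})
}

// sampleTree builds a small in-memory file system for the example.
func sampleTree() fstest.MapFS {
	return fstest.MapFS{
		"a/one.txt":     {Data: []byte("1")},
		"a/two.txt":     {Data: []byte("2")},
		"a/b/three.txt": {Data: []byte("3")},
	}
}

func main() {
	err := traverse(sampleTree(), subscribeFoldersWithFiles, func(it item) {
		fmt.Printf("%-4s children=%v\n", it.Path, it.Children)
	})
	if err != nil {
		fmt.Println("traverse failed:", err)
	}
}
```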

πŸ‰ Scopes

Extra semantics have been assigned to folders which allows for enhanced filtering. Each folder is allocated a scope depending on a combination of factors including the depth of that folder relative to the root and whether the folder contains any child folders. Available scopes are:

  • root: the root node, ie the path specified by the user to start traversing from
  • top: any node that is a direct descendent of the root node
  • leaf: any node that has no sub folders
  • intermediate: nodes which are neither root, top or leaf nodes

A node may contain multiple scope designations. The following are valid combinations:

  • root and leaf
  • top and leaf
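
The scope rules above can be expressed as a small function over a folder's depth and whether it has sub folders. This is a sketch of the rules as stated here, not extendio's code; the scope type, its constants and scopeOf are hypothetical names, and a bit set is used only because a node may carry two designations at once.

```go
package main

import (
	"fmt"
	"strings"
)

// scope is a hypothetical bit set of scope designations.
type scope uint8

const (
	scopeRoot scope = 1 << iota
	scopeTop
	scopeLeaf
	scopeIntermediate
)

// scopeOf derives the scope of a folder from its depth relative to the
// root (root is depth 0) and whether it contains any sub folders.
func scopeOf(depth int, hasSubFolders bool) scope {
	var s scope
	switch depth {
	case 0:
		s |= scopeRoot
	case 1:
		s |= scopeTop
	}
	if !hasSubFolders {
		s |= scopeLeaf
	}
	if s == 0 {
		// neither root, top nor leaf
		s = scopeIntermediate
	}
	return s
}

// String lists the designations held in s.
func (s scope) String() string {
	var names []string
	for _, c := range []struct {
		bit  scope
		name string
	}{
		{scopeRoot, "root"},
		{scopeTop, "top"},
		{scopeLeaf, "leaf"},
		{scopeIntermediate, "intermediate"},
	} {
		if s&c.bit != 0 {
			names = append(names, c.name)
		}
	}
	return strings.Join(names, " and ")
}

func main() {
	fmt.Println(scopeOf(0, false)) // root and leaf
	fmt.Println(scopeOf(1, true))  // top
	fmt.Println(scopeOf(2, true))  // intermediate
	fmt.Println(scopeOf(3, false)) // leaf
}
```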

πŸ’ Filters

There are 2 categories of filters, a node filter (defined in options at Options.Store.FilterDefs.Node) and a child filter (defined at Options.Store.FilterDefs.Children). The node filter is applied to a single entity (the file system item, for which the callback is being invoked for), where as the child filter is a compound filter which is applied to a collection, ie the list of the current folder's file items (for subscription type folders with files).

Filter Types

The following filter types are available:

  • regex: built in filter by a Go regular expression
  • glob: built in filter by a glob pattern, characterised by use of *
  • custom: allows the client to perform custom filtering

built in filters also benefit from the following features

  • negation: a filter's logic can be reversed, by setting the Negate property of the FilterDef to true. Any node will now only be invoked for, if it does not match the defined pattern.
  • scope: a filter can be restricted to only be applied to those matching the defined scope. Eg a filter may specify a scope of intermediate which means that it is only applicable to intermediate nodes. To turn off scope based filtering, use the all scope (ScopeAllEn) in the filter definition (FilterDef.Scope)
  • ifNotApplicable: when scope filtering is in use, we can also change the behaviour of the filter if it is not applicable to the node. By default, if the filter is not applicable, the callback will not be invoked for that node. The client can invert this behaviour so that if the filter is not applicable, then the filter should not activate and allow the callback to be invoked. To use this, the FilterDef's IfNotApplicable property should be set to true.
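
How negation, scope and ifNotApplicable interact can be sketched as a single decision function. The sketch below is an interpretation of the descriptions above rather than extendio's implementation: filterDef here is a local, hypothetical type (its field names echo the FilterDef properties mentioned above), the scope constants are local, and only the regex filter type is modelled.

```go
package main

import (
	"fmt"
	"regexp"
)

// scope is a hypothetical bit set of scope designations.
type scope uint8

const (
	scopeRoot scope = 1 << iota
	scopeTop
	scopeLeaf
	scopeIntermediate
	scopeAll = scopeRoot | scopeTop | scopeLeaf | scopeIntermediate
)

// filterDef is a local stand-in for a regex filter definition.
type filterDef struct {
	Pattern         string
	Negate          bool
	Scope           scope
	IfNotApplicable bool
}

// accepts reports whether the callback should be invoked for a node with
// the given name and scope.
func (f filterDef) accepts(name string, nodeScope scope) (bool, error) {
	if f.Scope&nodeScope == 0 {
		// the filter does not apply to this node's scope
		return f.IfNotApplicable, nil
	}
	matched, err := regexp.MatchString(f.Pattern, name)
	if err != nil {
		return false, err
	}
	// Negate reverses the outcome of the match
	return matched != f.Negate, nil
}

func main() {
	f := filterDef{Pattern: "^vinyl", Scope: scopeLeaf}
	for _, node := range []struct {
		name string
		sc   scope
	}{
		{"vinyl-rips", scopeLeaf},
		{"cd-rips", scopeLeaf},
		{"vinyl-rips", scopeTop},
	} {
		ok, err := f.accepts(node.name, node.sc)
		fmt.Println(node.name, ok, err)
	}
}
```

With scopeAll as the filter's scope, the first branch is never taken, which corresponds to turning scope based filtering off.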

πŸ“ Extension

The Extension provides extra information contained in the TraverseItem that is passed to the client callback. To request the Extension, the client should set the DoExtend property in the traverse options at Options.Store.DoExtend to true.

⚠ Warning: Only use the properties on the Extension (TraverseItem.Extension) if the DoExtend described above has been set to true. If Extension is not active, attempting to reference a field on it will result in a panic.

Extension properties include the following:

  • Depth: traversal depth relative to the root
  • IsLeaf: is the item a leaf node (file items are always leaf nodes)
  • Name: is just the name portion of the item's path (TraverseItem.Path)
  • Parent: is the parent path of the current node
  • SubPath: represents the relative path between the root and the current node
  • NodeScope: scope designation applied to the current node
  • Custom: a client defined property that can be set by overriding the Extension (see next)

The Extension can be overridden using the hook function. The default Extension hook is implemented by exported function DefaultExtendHookFn. The client needs to set a custom extend function on the options at: Options.Hooks.Extend. See hooks for function signature. If the client just needs to augment the default functionality rather than replace it, in the custom function implemented by the client, just needs to invoke the default function DefaultExtendHookFn.

Behaviours.SubPath

When composing the SubPath on the Extension, 2 hooks are employed: FileSubPath for files and FolderSubPath for folders. The SubPath created by both of these can be configured to retain a trailing path separator using the option setting Options.Store.Behaviours.SubPath.KeepTrailingSep, which defaults to true.

⛏️ Hooks

The behaviour of the traversal process can be modified by use of the declared hooks. The following shows the hooks with the function type and default hook indicated inside brackets:

  • QueryStatus (QueryStatusHookFn, LstatHookFn): acquires the fs.FileInfo entry of the root node
  • ReadDirectory (ReadDirectoryHookFn, ReadEntries): reads the contents of a directory
  • FolderSubPath (SubPathHookFn, RootParentSubPath): used to populate the SubPath property of TraverseItem.Extension for folder nodes
  • FileSubPath (SubPathHookFn, RootParentSubPath): used to populate the SubPath property of TraverseItem.Extension for file nodes
  • InitFilters (FilterInitHookFn, InitFiltersHookFn): filter initialisation function
  • Sort (SortEntriesHookFn, set depending on value of Options.Store.Behaviours.Sort.IsCaseSensitive): sorting function. When Options.Store.Behaviours.Sort.IsCaseSensitive is set to true, the default function is CaseSensitiveSortHookFn, otherwise CaseInSensitiveSortHookFn.
  • Extend (ExtendHookFn, set depending on value of Options.Store.DoExtend): when Options.Store.DoExtend is set to true, the default function is DefaultExtendHookFn, otherwise it is set to an internally defined no-op function.
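
As an illustration of what a sort hook has to decide, the sketch below orders directory entries with or without case sensitivity and optionally places folders first. It is written against a local, hypothetical entry type and sortEntries function; the real SortEntriesHookFn signature is not shown in this document, so nothing here should be read as that signature.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// entry is a hypothetical, simplified directory entry.
type entry struct {
	Name  string
	IsDir bool
}

// sortEntries orders entries in place by name, optionally ignoring case
// and optionally placing folders before files.
func sortEntries(entries []entry, caseSensitive, foldersFirst bool) {
	sort.SliceStable(entries, func(i, j int) bool {
		a, b := entries[i], entries[j]
		if foldersFirst && a.IsDir != b.IsDir {
			return a.IsDir
		}
		x, y := a.Name, b.Name
		if !caseSensitive {
			x, y = strings.ToLower(x), strings.ToLower(y)
		}
		return x < y
	})
}

// names extracts the entry names in their current order.
func names(entries []entry) []string {
	out := make([]string, 0, len(entries))
	for _, e := range entries {
		out = append(out, e.Name)
	}
	return out
}

// sample returns a fresh, unsorted set of entries for the example.
func sample() []entry {
	return []entry{
		{"gamma", true},
		{"alpha.txt", false},
		{"Delta", true},
		{"Beta.txt", false},
	}
}

func main() {
	insensitive := sample()
	sortEntries(insensitive, false, true)
	fmt.Println(names(insensitive)) // [Delta gamma alpha.txt Beta.txt]

	sensitive := sample()
	sortEntries(sensitive, true, true)
	fmt.Println(names(sensitive)) // [Delta gamma Beta.txt alpha.txt]
}
```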

🔔 Notifications

Enables the client to be called back at specific moments of the traversal. The following notifications are available (with the function type indicated inside brackets):

  • OnBegin (BeginHandler): beginning of traversal
  • OnEnd (EndHandler): end of traversal
  • OnDescend (AscendancyHandler): invoked as a folder is descended
  • OnAscend (AscendancyHandler): invoked as a folder is ascended
  • OnStart (ListenHandler): start listening condition met (if listening enabled)
  • OnStop (ListenHandler): finish listening condition met (if listening enabled)

🎧 Listening

The Listen feature allows the client to define a particular condition when callback invocation is to start and when to stop. The client does this by defining predicate functions in the options at Options.Listen.Start and Options.Listen.Stop.

The client can choose to define either or both of the Listen events. If Start is defined, then once traversal begins, the callback will not be invoked until the first node is encountered that satisfies the condition. If Stop is defined, then the callback will cease to be called at the point when the Stop predicate fires and the traversal is ended early.

The Start and Stop conditions are defined using ListenBy, eg:

  session.Configure(func(o *nav.TraverseOptions) {
    o.Store.Behaviours.Listen.InclusiveStart = true
    o.Store.Behaviours.Listen.InclusiveStop = false
    o.Listen.Start = nav.ListenBy{
      Name: "Start listening at Night Drive",
      Fn: func(item *nav.TraverseItem) bool {
        return item.Extension.Name == "Night Drive"
      },
    }
    o.Listen.Stop = nav.ListenBy{
      Name: "Stop listening at Electric Youth",
      Fn: func(item *nav.TraverseItem) bool {
        return item.Extension.Name == "Electric Youth"
      },
    }
  })

πŸ“ Points of Note:

  • start listening when node found whose name is "Night Drive"
  • stop listening when node found whose name is "Electric Youth"
  • InclusiveStart and InclusiveStop shown in this example are the default values so do not need to be specified, (just showed here for illustration). The Inclusive settings allows the client to adjust whether the callback is invoked at the time the predicate is fired. When Inclusive is true, the callback is invoked for the current item. When false, the callback is not invoked for the current node item. So for the default settings, the callback is invoked when the Start predicate fires, but not when the Stop predicate fires (inclusive for Start and exclusive for Stop)
  • the predicates for Start and for Stop are defined by the Listener interface. This means that the client can use a filter to define these predicates, the previous example defined with filters is shown as follows:

πŸ’₯ NOT IMPLEMENTED YET see issue #125

  session.Configure(func(o *nav.TraverseOptions) {
    o.Listen.Start = nav.RegexFilter{
      Filter: nav.Filter{
        Name:          "Start listening at Night Drive",
        RequiredScope: nav.ScopeAllEn,
        Pattern:       "^Night Drive$",
      },
    }
    // the second predicate is assigned to Stop, not Start
    o.Listen.Stop = nav.GlobFilter{
      Filter: nav.Filter{
        Name:          "Stop listening at Electric Youth",
        RequiredScope: nav.ScopeAllEn,
        Pattern:       "Electric Youth",
      },
    }
  })
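
The Start and Stop behaviour, including the Inclusive settings described in the points of note, can be modelled as a small state machine over a sequence of node names. This sketch does not use extendio; predicate, listen and named are hypothetical names, and it assumes that the Stop predicate is only consulted once listening has started, which the text above does not specify.

```go
package main

import "fmt"

// predicate is a hypothetical stand-in for the Fn member of a listen
// condition; here it inspects only the node name.
type predicate func(name string) bool

// listen returns the names for which the callback would be invoked.
// A nil start means listening is active from the first node.
func listen(names []string, start, stop predicate, inclusiveStart, inclusiveStop bool) []string {
	var invoked []string
	listening := start == nil
	for _, name := range names {
		if !listening {
			if !start(name) {
				continue // still waiting for the Start condition
			}
			listening = true
			if inclusiveStart {
				invoked = append(invoked, name)
			}
			continue
		}
		if stop != nil && stop(name) {
			if inclusiveStop {
				invoked = append(invoked, name)
			}
			break // traversal ends early
		}
		invoked = append(invoked, name)
	}
	return invoked
}

// named builds a predicate that fires on an exact name match.
func named(target string) predicate {
	return func(name string) bool { return name == target }
}

func main() {
	nodes := []string{"Intro", "Night Drive", "Chromatics", "Electric Youth", "Outro"}

	// defaults: inclusive Start, exclusive Stop
	fmt.Println(listen(nodes, named("Night Drive"), named("Electric Youth"), true, false))

	// inverted: exclusive Start, inclusive Stop
	fmt.Println(listen(nodes, named("Night Drive"), named("Electric Youth"), false, true))
}
```

With the default settings the callback fires for "Night Drive" and "Chromatics"; with the settings inverted it fires for "Chromatics" and "Electric Youth".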

🎬 Logging

🧰 Other Utils

🔨 Development

RxGo

To support concurrency features, extendio uses the reactive model provided by RxGo. However, since RxGo seems to be a dead project, with its last release in April 2021 and its unit tests not currently running successfully, the decision has been made to re-implement this locally. One of the main reasons the project is no longer actively maintained is the release of the generics feature in Go version 1.18; supporting generics in RxGo would require significant effort to re-write the entire library. While work on this has begun, it's unclear when it will be delivered. Despite this, the reactive model's support for concurrency is highly valued, and extendio aims to make use of a minimal functionality set for parallel processing during directory traversal. The goal is to replace the local implementation with the next version of RxGo when it becomes available.

See: