Enable key:value completions & allow removal of tags with --force-cv #87

Closed
jneidel wants to merge 4 commits into novoid:master from jneidel:colon-completion

Conversation

@jneidel
Contributor

@jneidel jneidel commented Feb 20, 2026

Review by commit.

Todo

Significant changes

Improvement: Enable key:value completions

Given the .filetags:

other:bugs other:features other:refactors

Completion across the colon does not work as expected. See this demo for a before and after:
image
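
A minimal sketch of why completion can break at the colon, assuming the usual line-editor behavior: the word to complete is whatever follows the last delimiter character. In CPython's `readline` module, `:` is among the default completer delimiters (see `readline.get_completer_delims()` and `readline.set_completer_delims()`); whether filetags runs into exactly this mechanism is an assumption. `VOCABULARY`, `word_under_cursor` and `complete` are hypothetical names for illustration, not the code in this PR.

```python
# Hypothetical vocabulary, mirroring the .filetags example above.
VOCABULARY = ["other:bugs", "other:features", "other:refactors"]


def word_under_cursor(line, delimiters):
    """Return the part of `line` after the last delimiter character."""
    start = max((line.rfind(d) for d in delimiters), default=-1) + 1
    return line[start:]


def complete(text, vocabulary=VOCABULARY):
    """Return all tags that start with `text`."""
    return sorted(tag for tag in vocabulary if tag.startswith(text))


line = "other:b"
# With ":" as a delimiter only "b" is completed, and no tag starts with "b".
print(complete(word_under_cursor(line, " \t:")))   # → []
# Without ":" the whole "other:b" is completed.
print(complete(word_under_cursor(line, " \t")))    # → ['other:bugs']
```

Dropping `:` from the delimiter set is the smallest change that makes the whole `key:value` tag the unit of completion.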

Fix: Allow removal of tags not in controlled vocabulary with --force-cv

Follow-up to #80

With --force-cv active, it was previously impossible to remove tags unless they were in the CV.
This is inconvenient: after renaming or removing a tag in the CV, you could not clean up with your usual command (which includes --force-cv). Removal of non-CV tags is now possible.

See demo before and after:
image
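
The rule described above can be sketched as follows. This is a simplified illustration, not the code in this PR: `rejected_tags` is a hypothetical helper, and the sketch assumes that a leading `-` marks a tag for removal.

```python
def rejected_tags(requested, controlled_vocabulary):
    """Return the requested tags that a --force-cv style check should refuse.

    Assumption: a leading "-" marks a tag for removal, and removals are
    always allowed, even for tags outside the vocabulary.
    """
    rejected = []
    for tag in requested:
        if tag.startswith("-"):
            continue  # removal: allowed regardless of the vocabulary
        if tag not in controlled_vocabulary:
            rejected.append(tag)  # adding an unknown tag: refuse
    return rejected


cv = {"bugs", "features"}
print(rejected_tags(["bugs", "-oldtag", "typo"], cv))  # → ['typo']
```

Here `-oldtag` passes although `oldtag` is no longer in the vocabulary, which is the cleanup case described above.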

Tests

image

@novoid
Owner

novoid commented Feb 20, 2026

I don't understand the idea behind renaming "controlled_vocabulary" (a term often used in the literature on personal management and personal informatics) to "constant_vocabulary". Would you elaborate on that?

@novoid
Owner

novoid commented Feb 20, 2026

I don't like the colon as a separator, although it's the most commonly used separator for key-value pairs: it's an illegal character on Windows file systems: https://learn.microsoft.com/en-us/windows/win32/fileio/naming-a-file

Therefore it would cause issues when people start adopting colons as parts of file names on non-Windows file systems and then run into trouble when trying to copy files to USB thumb drives or share them via network services.

Would you provide a PR without colons?

@jneidel
Contributor Author

jneidel commented Feb 20, 2026

> I don't like the colon as a separator, although it's the most commonly used separator for key-value pairs: it's an illegal character on Windows file systems: https://learn.microsoft.com/en-us/windows/win32/fileio/naming-a-file
> Therefore it would cause issues when people start adopting colons as parts of file names on non-Windows file systems and then run into trouble when trying to copy files to USB thumb drives or share them via network services.

I am aware of the Windows problem, but why must it stand in the way of completion on Linux?

Even as it is right now, one can just use colon-separated tags.
This change neither incentivizes nor advertises colon separation; it merely makes completion work as expected for them.

I only got the idea of key:value tags from your great how to do tagging article, in which you clearly mention the Windows caveat.

@jneidel
Contributor Author

jneidel commented Feb 20, 2026

> I don't understand the idea behind renaming "controlled_vocabulary" (a term often used in the literature on personal management and personal informatics) to "constant_vocabulary". Would you elaborate on that?

I introduced the variable "controlled_vocabulary" in #80.
A variable "vocabulary" already exists. What is the difference between vocabulary and controlled_vocabulary (the variables)? The names do not tell you.
The difference is that vocabulary will be modified by the code; controlled_vocabulary will not be.
Renaming makes the purpose of the variable clearer. Constant = unchanging. Also see the comment at creation:

    vocabulary = list(controlled_vocabulary) # will be modified
    constant_vocabulary = controlled_vocabulary # will stay constant/unmodified
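
A small sketch of what the two assignments do, assuming `controlled_vocabulary` is a plain list: `list(...)` makes an independent copy, while a bare assignment only binds a second name to the same object, so "constant" holds by convention only. The tag values and `frozen_vocabulary` are hypothetical and only illustrate one way to enforce the intent.

```python
controlled_vocabulary = ["bugs", "features"]

vocabulary = list(controlled_vocabulary)      # independent copy, safe to modify
constant_vocabulary = controlled_vocabulary   # second name for the same list

vocabulary.append("refactors")
print(controlled_vocabulary)                          # → ['bugs', 'features']
print(constant_vocabulary is controlled_vocabulary)   # → True

# Hypothetical alternative: a tuple cannot be changed in place,
# so the "constant" intent is enforced rather than documented.
frozen_vocabulary = tuple(controlled_vocabulary)
```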

@novoid
Owner

novoid commented Feb 21, 2026

> I introduced the variable "controlled_vocabulary" in #80. A variable "vocabulary" already exists. What is the difference between vocabulary and controlled_vocabulary (the variables)? The names do not tell you. The difference is that vocabulary will be modified by the code; controlled_vocabulary will not be. Renaming makes the purpose of the variable clearer. Constant = unchanging. Also see the comment at creation:
>
>     vocabulary = list(controlled_vocabulary) # will be modified
>     constant_vocabulary = controlled_vocabulary # will stay constant/unmodified

I see.

However, the term is problematic in my opinion. In the literature, "controlled vocabulary" (CV) stands for the general idea of having a user-curated set of vocabulary - in this case: tags.

If I understand you correctly, both variable names deal with user-curated tags.

Therefore, I would rename both variables in order to avoid misunderstanding - just as I misunderstood the variable's purpose.

In general, I'd say that "controlled vocabulary" should always relate to the content of .filetags files, which represent the concept of CV within the filetags implementation. If a variable does not contain just that data, it should not be named after CV. If we have multiple derived data structures, it is better to avoid the CV term and use descriptive names that differ.

@novoid
Owner

novoid commented Feb 21, 2026

> I am aware of the Windows problem, but why must it stand in the way of completion on Linux?

Yes. This is not a Linux-only tool, despite the fact that (most likely) neither of us uses Windows-based file systems at all.

> Even as it is right now, one can just use colon-separated tags. This change neither incentivizes nor advertises colon separation; it merely makes completion work as expected for them.

I would even say that filetags must make sure that colons cannot be entered by the user by accident. I don't think that I've introduced any user input sanitization check to avoid illegal characters. Maybe this is the right moment to think about introducing such a check.

> I only got the idea of key:value tags from your great how to do tagging article, in which you clearly mention the Windows caveat.

I know what your intentions are and I'm really pissed myself that Microsoft made that stupid design decision back in the old days (or inherited it from CP/M and/or QDOS). Colons are very common characters for all sorts of things that also might be present in file names.

However, when somebody starts with filetags and introduces colons to file names (because, not unlike the two of us, the user fancies colons for particular purposes), I don't want the user to face issues when working with Windows people. This always falls back on filetags one way or the other.

Therefore, I would suggest that you modify your PR so that it nudges towards dashes (only) instead.

If the documentation is also adapted accordingly, I would really like to see that idea in filetags, provided I don't find anything that stops working that worked before with respect to user experience: the format of .filetags files, text completion as well as Python UI completion, ...

@jneidel
Contributor Author

jneidel commented Feb 21, 2026

> In general, I'd say that "controlled vocabulary" should always relate to the content of .filetags files, which represent the concept of CV within the filetags implementation. If a variable does not contain just that data, it should not be named after CV.

Both vocabulary and constant_vocabulary contain the structured contents of the .filetags file. So they contain the CV; the name fits.

> I would even say that filetags must make sure that colons cannot be entered by the user by accident. I don't think that I've introduced any user input sanitization check to avoid illegal characters. Maybe this is the right moment to think about introducing such a check.

Sure, this can be combined with an operating system = windows check.
I would have this check throw an error a la "Colons were detected in your tags. Rename them to proceed. This is to avoid file system problems + LINK"
I can add this to the PR and adjust the docs.

This way the colon improvement for completion can be applied and Windows users can be protected from (unknowingly) shooting themselves in the foot 🙂
WDYT?

> I would really like to see that idea in filetags, provided I don't find anything that stops working that worked before with respect to user experience: the format of .filetags files, text completion

In terms of backward compatibility (which is what you are referring to here) the changes would mean:

  • Incompatibility for Windows users with colon-separated tags in .filetags: they would need to rename those tags
  • No change to text completion, unless using colons in tags on non-Windows platforms, then you get the expected behavior (working completion, whereas it's broken currently)

@novoid
Owner

novoid commented Feb 21, 2026

> > In general, I'd say that "controlled vocabulary" should always relate to the content of .filetags files, which represent the concept of CV within the filetags implementation. If a variable does not contain just that data, it should not be named after CV.
>
> Both vocabulary and constant_vocabulary contain the structured contents of the .filetags file. So they contain the CV; the name fits.

But their content differs, otherwise one of them would be redundant.

You wrote:

> The difference is that vocabulary will be modified by the code; controlled_vocabulary will not be.

So I would suggest using "user_controlled_vocabulary" and maybe something like "filtered_controlled_vocabulary", depending on the use case, which you understand better than I do at this moment. If my assumption is true, this would clarify the situation.

> > I would even say that filetags must make sure that colons cannot be entered by the user by accident. I don't think that I've introduced any user input sanitization check to avoid illegal characters. Maybe this is the right moment to think about introducing such a check.
>
> Sure, this can be combined with an operating system = windows check.

No, I would prevent colons on any operating system, as the generated file names should not cause issues when transferred later on to a different computer/user/system.

> I would have this check throw an error a la "Colons were detected in your tags. Rename them to proceed. This is to avoid file system problems + LINK" I can add this to the PR and adjust the docs.

I'd prefer to not let the user enter problematic characters at all instead of issuing an error or warning which requires a manual change by the user. It's better to avoid errors (if possible) than to notify about them later.

I'm not sure if this is possible with the two input methods filetags has: CLI + GUI.

If it's not possible to prevent this during input, we have to accept the warning, I guess.
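
A minimal sketch of a check that behaves the same on every platform, assuming the reserved characters listed on the Microsoft page linked earlier in this thread (`< > : " / \ | ? *`). `PROBLEMATIC_CHARACTERS` and `problematic_characters_in` are hypothetical names; nothing like this exists in filetags yet as far as this discussion shows.

```python
# Characters reserved in Windows file names (assumption: taken from the
# Microsoft "Naming Files, Paths, and Namespaces" page linked in this thread).
PROBLEMATIC_CHARACTERS = frozenset('<>:"/\\|?*')


def problematic_characters_in(tag):
    """Return the problematic characters found in `tag`, sorted.

    The current operating system is deliberately not consulted, so the
    result is the same on Linux, macOS and Windows.
    """
    return sorted(set(tag) & PROBLEMATIC_CHARACTERS)


print(problematic_characters_in("foo:bar"))   # → [':']
print(problematic_characters_in("foo-bar"))   # → []
```

The same function could back both input methods: an interactive prompt can refuse to accept the tag, and a non-interactive call can abort with a message.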

> This way the colon improvement for completion can be applied and Windows users can be protected from (unknowingly) shooting themselves in the foot 🙂 WDYT?

Yes.

Once again: I would love to be able to use colons for exactly that use case myself, as it's standard for almost any implementation of key-value pairs I know of. My personal preference is to rank "avoid filename issues" above "use the standard delimiter for key-value pairs".

> > I would really like to see that idea in filetags, provided I don't find anything that stops working that worked before with respect to user experience: the format of .filetags files, text completion
>
> In terms of backward compatibility (which is what you are referring to here) the changes would mean:
>
> * Incompatibility for Windows users with colon-separated tags in .filetags: they would need to rename those tags

Yes. I want to avoid that as much as possible in the first place. Most users probably won't even realize what the issue at hand is if the tool used to copy/sync does not provide meaningful help. For example, I doubt that NextCloud sync or Syncthing.net would help the user here: they would simply not synchronize and the user would assume that there is a bug or similar.

I have a similar issue when syncing with Syncthing to Android: somehow, Google decided that problematic characters (Windows file systems included!) are not allowed on Android storage even though it could handle them itself. (As far as I remember, there were a couple of years when this limitation was not in place.)

> * No change to text completion, unless using colons in tags on non-Windows platforms, then you get the expected behavior (working completion, whereas it's broken currently)

Again: I don't want different behavior depending on the current platform used. Even when using Linux, colons must be avoided.

There are edge cases where the user still might shoot himself in the foot: filetags -t "foo:bar" * is the most obvious. Well, at least for those cases, I'm not sure myself if a warning would be appropriate.

@jneidel
Contributor Author

jneidel commented Feb 22, 2026

> Yes. I want to avoid that as much as possible in the first place. Most users probably won't even realize what the issue at hand is if the tool used to copy/sync does not provide meaningful help. For example, I doubt that NextCloud sync or Syncthing.net would help the user here: they would simply not synchronize and the user would assume that there is a bug or similar.
>
> I have a similar issue when syncing with Syncthing to Android: somehow, Google decided that problematic characters (Windows file systems included!) are not allowed on Android storage even though it could handle them itself. (As far as I remember, there were a couple of years when this limitation was not in place.)

Okay, I got you. That argument makes sense.

> I'd prefer to not let the user enter problematic characters at all instead of issuing an error or warning which requires a manual change by the user.
> I'm not sure if this is possible with the two input methods filetags has: CLI + GUI.

Yes, it's possible. --force-cv does the same thing: it warns the user about non-CV tags and blocks submission. That mechanism/approach can be reused to do input validation for colons.

> There are edge cases where the user still might shoot himself in the foot: filetags -t "foo:bar" * is the most obvious.

--force-cv also covers that. That input would also be validated.

Would you want the colon input validation/blocking to be optional, e.g. disabled via --force-allow-colon-separator? That would preserve backwards compatibility for those currently using colons. It would be accompanied by a warning.
Having --force-allow would let users who really want colons still use them.

@novoid
Owner

novoid commented Feb 22, 2026

> > I'd prefer to not let the user enter problematic characters at all instead of issuing an error or warning which requires a manual change by the user.
> > I'm not sure if this is possible with the two input methods filetags has: CLI + GUI.
>
> Yes, it's possible. --force-cv does the same thing: it warns the user about non-CV tags and blocks submission. That mechanism/approach can be reused to do input validation for colons.

Great to know.

> > There are edge cases where the user still might shoot himself in the foot: filetags -t "foo:bar" * is the most obvious.
>
> --force-cv also covers that. That input would also be validated.
>
> Would you want the colon input validation/blocking to be optional, e.g. disabled via --force-allow-colon-separator? That would preserve backwards compatibility for those currently using colons. It would be accompanied by a warning. Having --force-allow would let users who really want colons still use them.

No, I would not add an additional parameter just for colons. To my mind, that doesn't make much sense.

Instead, I would prevent all problematic characters in the input (as explained above) and optionally introduce a new parameter like "--allow-problematic-characters" for those people who really do know what they are doing. This is also the same group of people who read manpages and online help output. 😉

@nbehrnd
Collaborator

nbehrnd commented Feb 23, 2026

I would prefer filetags to remain agnostic of the operating system used. This is the reason why

> Instead, I would prevent all problematic characters in the input (as explained above) ...

resonates well with me. Since filetags already uses argparse from the Python standard library, I propose to use it to validate the input. There are a couple of examples throughout the book Tiny Python Projects. So I thought to extend the underlying idea and define a blacklist of disallowed characters (i.e., strings), at least for now. (Because I dislike "option x reliably works only in Linux, but not so much in Windows".)

Prior to a dive into / an edit of filetags, a separate minimal working example:

#!/usr/bin/env python3
"""Constrain choices with Python's argparse.

The constraints on a garment's size and color are basically one of the
examples in Ken Youens-Clark's _Tiny Python Projects_ appendix about
argparse.  The idea of a blacklist of strings, however, is not.
"""

import argparse


BLACKLIST = [":", ";", "*"]


def get_args():
    """Collect the arguments."""

    parser = argparse.ArgumentParser(
        description="Choices", formatter_class=argparse.ArgumentDefaultsHelpFormatter
    )

    parser.add_argument(
        "name",
        metavar="str",
        help="Name to greet",
    )

    parser.add_argument(
        "-c",
        "--color",
        metavar="str",
        help="Color to choose",
        choices=["red", "yellow", "blue", "colorless"],
        default="colorless",
    )

    parser.add_argument(
        "-s",
        "--size",
        metavar="size",
        type=int,
        choices=range(1, 11),
        default=5,
        help="The size of the garment",
    )

    # addition to the book:
    parser.add_argument(
        "-S",
        "--separator",
        metavar="str",
        help="Define a separator enclosed in single/double quotes.",
        type=validate_string,
        default="_",
    )

    return parser.parse_args()


def validate_string(element: str) -> str:
    """Prevent using a string of the blacklist."""
    if element in BLACKLIST:
        raise argparse.ArgumentTypeError(
            # a more verbose note:
            # f"invalid value `{element}` is among the forbidden strings {BLACKLIST}"
            # less verbose note:
            f"`{element}` is blacklisted (cf. documentation)"
        )
    return element


def main():
    """Join the functionalities."""

    args = get_args()

    print(f"{args.name} uses as separator the string `{args.separator}`.")
    print(f"The garment is {args.color} and of size {args.size}.")


if __name__ == "__main__":
    main()

@nbehrnd
Collaborator

nbehrnd commented Feb 23, 2026

@jneidel Please note filetags moved; the remote reference of your checkout might require an update. What happened: when a PR is filed, the existing unit tests run as a check (Python 3.14 on GitHub runners for Ubuntu, Windows and macOS). As of now, a few of the checks fail, but only on Windows.

I speculate it might be better to solve the errors ahead of merging a new flag --force-cv.

@jneidel
Contributor Author

jneidel commented Feb 24, 2026

Ok, I will add a new PR which will prohibit problematic characters from being used in tags (colon for now).
This is enforced regardless of the platform and can be disabled with --allow-problematic-characters.

For non-interactive input (-t), argparse can be used to validate.
For interactive input validation (gui/tui) the same approach as for --force-cv is used.
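
A rough sketch of how the non-interactive part of this plan could look with argparse. Everything here is an assumption for illustration: the `filetags-sketch` program name, passing tags as one space-separated string, and the `parse_tags` helper. The check runs after parsing rather than in a `type=` callback, because a `type=` callback sees one value at a time and cannot know whether the opt-out flag was given.

```python
import argparse

# Colon only for now, as proposed above; extend as needed.
PROBLEMATIC_CHARACTERS = ":"


def parse_tags(argv):
    """Parse -t/--tags and reject tags with problematic characters."""
    parser = argparse.ArgumentParser(prog="filetags-sketch")
    parser.add_argument("-t", "--tags", default="", help="space-separated tags")
    parser.add_argument("--allow-problematic-characters", action="store_true")
    args = parser.parse_args(argv)

    tags = args.tags.split()
    if not args.allow_problematic_characters:
        bad = [t for t in tags if any(c in t for c in PROBLEMATIC_CHARACTERS)]
        if bad:
            # parser.error() prints the message and exits with status 2.
            parser.error(f"problematic characters in: {' '.join(bad)}")
    return tags
```

With this shape, `parse_tags(["-t", "foo:bar"])` exits with an error on every platform, while adding `--allow-problematic-characters` lets the same tag through.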

@jneidel jneidel closed this Feb 24, 2026
jneidel added a commit to jneidel/filetags that referenced this pull request Feb 27, 2026
and differentiate it from controlled_vocabulary.

Follows discussion on novoid#87

3 participants