Add .stem method to IO::Path for 6.e #5031

Open: wants to merge 2 commits into main

@lizmat (Contributor) commented Aug 16, 2022

"stem" takes the number of extension levels to remove.
By default all levels are removed; * and Inf are also accepted as arguments.

    say "foo/bar.tar.gz".IO.stem;     # bar
    say "foo/bar.tar.gz".IO.stem(1);  # bar.tar

- stem: everything in the basename before the first ".", or the whole basename if there is no "."
- suffix: everything in the basename after the first ".", or "" if there is no "."
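The two bullet definitions can be modelled with plain string operations. This is a minimal Raku sketch, not the PR's implementation: stem-of and suffix-of are hypothetical sub names chosen so as not to imply real IO::Path methods, and dotfiles such as ".bashrc" get no special treatment.

```raku
# Hypothetical helpers (not IO::Path methods) modelling the proposed definitions.
sub stem-of(IO::Path:D $path --> Str:D) {
    # Everything before the first "."; the whole basename if there is no ".".
    $path.basename.split('.', 2)[0]
}

sub suffix-of(IO::Path:D $path --> Str:D) {
    # Everything after the first "."; the empty string if there is no ".".
    $path.basename.split('.', 2)[1] // ''
}

say stem-of("foo/bar.tar.gz".IO);    # bar
say suffix-of("foo/bar.tar.gz".IO);  # tar.gz
say stem-of("foo/README".IO);        # README
```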
@lizmat changed the title from "Add .stem and .suffix methods to IO::Patyh for 6.e" to "Add .stem and .suffix methods to IO::Path for 6.e" on Aug 16, 2022
We basically already have suffix, in the form of the "extension" method.

Adapted "stem" to take the number of extension levels to remove.
By default all levels are removed; * and Inf are also accepted as arguments.

    say "foo/bar.tar.gz".IO.stem;     # bar
    say "foo/bar.tar.gz".IO.stem(1);  # bar.tar
@lizmat changed the title from "Add .stem and .suffix methods to IO::Path for 6.e" to "Add .stem method to IO::Path for 6.e" on Aug 16, 2022
@tbrowder (Member) commented

As another set of names for the basename’s parts, consider:

    say $basename.IO.barename;
    say $basename.IO.extension;

Thus, only one new method would need to be added.

@lizmat (Contributor, Author) commented Aug 17, 2022

Are you suggesting s/stem/barename/ ?

Or are you suggesting that barename is the part before the first . ?

@tbrowder (Member) commented Aug 17, 2022

> Are you suggesting s/stem/barename/ ?

No.

> Or are you suggesting that barename is the part before the first . ?

Yes!

@vrurg (Member) commented Aug 18, 2022

Does it mean then that for a file.tar.gz.1:

.barename: file
.stem: file.tar.gz
.stem.stem: file.tar

@lizmat (Contributor, Author) commented Aug 18, 2022

.stem: file
.stem(1): file.tar
.stem(2): file.tar.gz

FWIW, I don't like "barename", because conceptually people could think of "file.tar" as the barename of "file.tar.gz". I think stem is clearer in that respect.
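The results listed in this comment (.stem, .stem(1), .stem(2)) can be modelled as below. This is a sketch under an assumption read off those examples: an integer argument counts the extensions kept after the first part, while the default, * and Inf all leave only the part before the first ".". The sub name stem is a stand-in here, not the actual IO::Path method.

```raku
# Hypothetical sub modelling the .stem semantics from the examples above:
# no argument, * or Inf keeps only the part before the first "."; an integer
# $n keeps that part plus the next $n dot-separated components.
sub stem(IO::Path:D $path, $levels = Inf --> Str:D) {
    my @parts = $path.basename.split('.');
    # Test for Whatever first so that == only ever sees numbers.
    return @parts[0] if $levels ~~ Whatever || $levels == Inf;
    # .head never runs past the end, so a large $n returns the whole basename.
    @parts.head($levels + 1).join('.')
}

my $file = "file.tar.gz.1".IO;
say stem($file);     # file
say stem($file, 1);  # file.tar
say stem($file, 2);  # file.tar.gz
say stem($file, *);  # file
```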

@vrurg (Member) commented Aug 18, 2022

I would say that, to me, the argument is more about skipping from the end than about keeping from the start. Normally we strip off extra extensions; otherwise I would need to know how many extensions there are in order to strip the number I need.

There is a compromise option: a *-$n argument. In my view, though, $n must be the number of extensions to be kept, as with the currently proposed .stem(1): file.tar. I mean that it would be preferable to implement it as .stem(*-1): file.tar.
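One reading of the *-$n suggestion is that a WhateverCode argument is called with the total number of extensions, just as @array[*-1] is called with the number of elements, and its result is the number of extensions to keep. A sketch under that assumption, where stem-keeping is a hypothetical name:

```raku
# Hypothetical variant: $keep is the number of extensions to keep, and a
# WhateverCode such as *-1 receives the total number of extensions, the same
# way @array[*-1] receives the number of elements.
sub stem-keeping(IO::Path:D $path, $keep = 0 --> Str:D) {
    my @parts      = $path.basename.split('.');
    my $extensions = @parts.elems - 1;
    my $n = $keep ~~ Callable ?? $keep($extensions) !! $keep;
    # Clamp at zero so that e.g. *-5 on a short name still returns the bare stem.
    @parts.head(($n max 0) + 1).join('.')
}

say stem-keeping("file.tar.gz".IO, *-1);    # file.tar
say stem-keeping("file.tar.gz.1".IO, *-1);  # file.tar.gz
say stem-keeping("file.tar.gz.1".IO, 1);    # file.tar
```

With this reading, *-1 strips exactly the last extension however many there are, so the caller never has to count them first.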

@tbrowder (Member) commented Aug 19, 2022

> Does it mean then that for a file.tar.gz.1:
> .barename: file
> .stem: file.tar.gz
> .stem.stem: file.tar

No. For file <file.tar.gz.1>, my CURRENT recommendation is:

basename: <file.tar.gz.1>                    # as in current Raku
barename: <file.tar.gz>                      # my NEW recommendation, change required to current Raku
extension: <1>                               # as in current Raku

So only one change would be seen by the user.

However, if @lizmat's change (my original suggestion) is kept, I believe the result should be:

basename: <file.tar.gz.1>                    # as in current Raku
stem: <file.tar.gz>                          # change required to current Raku
suffix: <1>                                  # change required to current Raku
extension: <1>                               # as in current Raku

And two changes would be seen by the user.
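The "barename" row of the first list (split at the last dot, as the counterpart of the existing .extension) can be sketched as follows. barename is a hypothetical sub here, not an existing IO::Path method, and a dotfile such as ".bashrc" would yield an empty string in this sketch.

```raku
# Hypothetical sub (not an IO::Path method) for the "barename" row above:
# the basename minus its last "." and everything after it.
sub barename(IO::Path:D $path --> Str:D) {
    my $name = $path.basename;
    my $dot  = $name.rindex('.');
    $dot.defined ?? $name.substr(0, $dot) !! $name
}

my $file = "file.tar.gz.1".IO;
say $file.basename;   # file.tar.gz.1
say barename($file);  # file.tar.gz
say $file.extension;  # 1
```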

@Leont (Contributor) commented Aug 19, 2022

I have a hard time seeing the use case for these particular functions. What I want is the basename as implemented on the Unix command line and in Perl: one that can remove a given suffix. Bonus points if it can take a junction of them, but a list is acceptable too. To be honest, IO::Path's basename method not doing what every other basename does is rather frustrating.

Just give me a method basename(*@suffixes) and I'd be happy.
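A minimal sketch of the requested behaviour, assuming the rule of the POSIX basename utility: a suffix is removed only if it matches the end of the name and is not the entire name. basename-without is a hypothetical sub (it does not change IO::Path.basename), and when several suffixes are given, the first one that matches wins, so longer suffixes should be listed first.

```raku
# Hypothetical sub (not part of IO::Path) in the spirit of basename(1):
# strip the first listed suffix that matches, but never empty the name.
sub basename-without(IO::Path:D $path, *@suffixes --> Str:D) {
    my $name = $path.basename;
    for @suffixes -> Str $suffix {
        return $name.substr(0, $name.chars - $suffix.chars)
          if $name.ends-with($suffix) && $name ne $suffix;
    }
    $name
}

say basename-without("foo/bar.tar.gz".IO, '.tar.gz', '.tgz');  # bar
say basename-without("foo/bar.tgz".IO, '.tar.gz', '.tgz');     # bar
say basename-without("foo/.gz".IO, '.gz');                     # .gz
```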

@jonathanstowe (Contributor) commented

> the basename as implemented on the Unix command line

This would be my suggestion too. I guess it could be awesomified to accept matchers as well as strings for the suffixes to remove, but the basic idea seems good.
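Accepting matchers as well as strings could look like the sketch below. It relies on Raku's regex interpolation rules: a Str held in a variable is matched literally and a Regex is matched as a pattern, with a trailing $ anchoring either one to the end of the name. basename-without-any is a hypothetical name.

```raku
# Hypothetical sub: each matcher may be a Str (matched literally) or a Regex,
# and must match at the very end of the basename.
sub basename-without-any(IO::Path:D $path, *@matchers --> Str:D) {
    my $name = $path.basename;
    for @matchers -> $matcher {
        # Interpolating $matcher treats a Str literally and a Regex as a regex;
        # the trailing $ anchors the match to the end of the name.
        my $match = $name.match(/ $matcher $ /);
        # As with basename(1), never strip the entire name.
        return $name.substr(0, $match.from) if $match && $match.from > 0;
    }
    $name
}

say basename-without-any("foo/bar.tar.gz".IO, '.gz');           # bar.tar
say basename-without-any("foo/bar.tar.gz".IO, / ['.' \w+]+ /);  # bar
```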

@lizmat (Contributor, Author) commented Aug 20, 2022

I'm not clear in which direction this should go now, so closing this PR. I'd welcome any new PR with a clear definition and path forward (pun intended).

@lizmat closed this on Aug 20, 2022
@tbrowder (Member) commented Aug 20, 2022

I'll make one more observation. The one thing that bothers me about the current situation is how easy it is to get the extension of a basename compared with getting the basename without the extension. It just seems like there ought to be a simple, default method for doing so, regardless of what we call it.

In pseudocode:

    my $extension = $basename.extension;
    my $basename'less'extension = $basename.SOME_MISSING_METHOD;

Each can support more complex uses, but by default, getting the two pieces of the basename on either side of the last dot should need no arguments.
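For reference, both pieces defined by the last dot can already be obtained with existing IO::Path methods, though not through a dedicated method: calling .extension with a replacement string of '' returns a new path with the extension and its joining dot removed. A sketch with a hard-coded example path:

```raku
my $path = "statements/report.2022.pdf".IO;
# The part after the last dot:
my $extension = $path.extension;
# Replacing the extension with '' also drops the joining dot:
my $basename-less-extension = $path.extension('').basename;
say $extension;                # pdf
say $basename-less-extension;  # report.2022
```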

@lizmat deleted the lizmat-stem-suffix branch on August 20, 2022 12:01
@lizmat (Contributor, Author) commented Aug 20, 2022

I think the issue revolves around the question: "what is the extension?". On "foo.tar.gz", is the extension "tar.gz" or "gz" ?

As long as we don't agree on the name, it will be impossible to give a good answer to a request for a method that returns the basename "without the extension", let alone an implementation.
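Both readings can already be requested from the existing .extension method through its :parts argument; the open question is which one a method without arguments should follow. A short illustration:

```raku
my $path = "foo.tar.gz".IO;
say $path.extension;             # gz
say $path.extension(:parts(2));  # tar.gz
```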

@tbrowder (Member) commented Aug 20, 2022

I'm now arguing that the existing 'extension' method is fine. By default, the extension captured should be the last one in the basename, as it is now.

The new direction I favor now has two parts:

  1. What to name the part of the 'basename' when the default 'extension' is removed?

  2. How to implement it as a method with its default behavior being to return the default 'basename' with the default 'extension' (and the joining dot) removed?

You have basically solved Part 2. We just need to agree on Part 1.

@jonathanstowe (Contributor) commented

I think stem is probably a reasonable name for part 1, TBH. And I think what was proposed is good for that, on the assumption that the suffix is by default everything after the first '.', and that special cases with "stacked" suffixes can be dealt with too. It seems to me that the semantics of any particular "compound suffix" are going to be application-specific, and would need to be handled by a program that deals with them individually. After all, 'tar.gz' is merely a convention; some software will use '.tgz', and so forth.

@tbrowder (Member) commented Aug 20, 2022

I think I can live with that since 'extension' gives me the last part (after the last dot) and 'stem' will give me the first part (before the first dot). That suits my usual use case:

  1. use File::Find to find files with a certain 'extension' (usually pdf or csv or ofx) for further processing.
  2. remove the stem for a rename with a new extension

And if 'suffix' becomes the basename remainder after the first dot, that can be useful, too.
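The workflow in the list above can be sketched with existing methods. In this sketch a hard-coded list of paths stands in for File::Find results and nothing on disk is touched; it also swaps in .extension('csv'), which replaces only the last extension, rather than the proposed stem, so "feb.2022.ofx" keeps its "2022" part.

```raku
# Hard-coded paths stand in for File::Find results; nothing on disk is touched.
my @found = <statements/jan.ofx statements/feb.2022.ofx notes/todo.txt>.map(*.IO);

# Keep only files with the wanted extension and compute their new names.
my @renames = @found.grep(*.extension eq 'ofx').map: -> $file {
    $file => $file.extension('csv')
};

say "$_.key() -> $_.value()" for @renames;
# A real script would then rename each file, e.g. $from.rename($to).
```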

@lizmat restored the lizmat-stem-suffix branch on October 13, 2022 15:05
@lizmat reopened this on Oct 13, 2022
@masukomi commented

Finding the "stem" of a file passed into a script is something I've had to do in every language. Yes, it's simple to get the basename and the extension and then strip the extension, but you could just as easily argue that it's trivial to strip off everything up to the last path delimiter in order to get the basename.

If the argument against .stem is essentially that "we don't need a method for something so trivial", then what's the argument for .basename?

> I think the issue revolves around the question: "what is the extension?". On "foo.tar.gz", is the extension "tar.gz" or "gz" ? - @lizmat

I disagree. I don't think that's a "valid" question in this context.

We've already got a .extension method. So the "extension" is, by definition, whatever the .extension method returns. So the "stem" is, by definition, the basename minus whatever .extension returns.

If there's debate about what the "correct" extension is, then that should be taken up in a ticket about the .extension method. It doesn't have any real bearing on this PR because this functionality doesn't have to weigh in on what the extension is. It can simply use the existing definition of extension, and if that changes, then this will stay in sync with that.

> As long as we don't agree on the name, asking for a method that will give the basename "without the extension", will be impossible to give a good answer, let alone an implementation.

So, what's the historical resolution here? How has Raku decided on what name to use for things?

I'm voting for "stem" because I haven't heard anything that's less ambiguous. Personally, I don't really care. Every language has method names I disagree with. ;)
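The definition in this comment, the stem as the basename minus whatever .extension returns, can be sketched so that it forwards its arguments (such as :parts) to .extension and therefore stays in sync with it. stem-from-extension is a hypothetical name, and the sketch assumes .extension returns an empty string when there is no extension.

```raku
# Hypothetical sub: the stem is "basename minus whatever .extension returns",
# with any arguments (such as :parts) forwarded to .extension.
sub stem-from-extension(IO::Path:D $path, |args --> Str:D) {
    my $name = $path.basename;
    my $ext  = $path.extension(|args);
    # No extension: the stem is the whole basename.
    return $name unless $ext.chars;
    # Drop the extension plus the joining dot.
    $name.substr(0, $name.chars - $ext.chars - 1)
}

say stem-from-extension("foo/bar.tar.gz".IO);             # bar.tar
say stem-from-extension("foo/bar.tar.gz".IO, :parts(2));  # bar
say stem-from-extension("foo/README".IO);                 # README
```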

@coke changed the base branch from master to main on April 19, 2023 15:13