Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add the function stripExtension #43

Closed
thomie opened this issue Mar 19, 2015 · 7 comments
Closed

Add the function stripExtension #43

thomie opened this issue Mar 19, 2015 · 7 comments

Comments

@thomie
Copy link
Contributor

thomie commented Mar 19, 2015

-- | Drop the given extension from a FilePath, and the \".\" preceding it.
--
-- It returns Nothing if the FilePath does not have the extension given, or
-- Just the part before the extension, if it does.
--
-- It is safer to use this function than System.FilePath.dropExtensions,
-- because FilePath might be something like 'file.name.ext1.ext2', where we
-- want to only drop the 'ext1.ext2' part, but keep the full 'file.name' part.
--
-- > dropExtension x == fromJust (stripExtension (takeExtension x) x)
-- > dropExtensions x == fromJust (stripExtension (takeExtensions x) x)
-- > stripExtension "c.d"  "a.b.c.d"  == Just "a.b"
-- > stripExtension ".c.d" "a.b.c.d"  == Just "a.b"
-- > stripExtension "c.d"  "a.b..c.d" == Just "a.b."
-- > stripExtension ".c.d" "a.b..c.d" == Just "a.b."
-- > stripExtension "baz"  "foo.bar"  == Nothing
-- > stripExtension "bar"  "foobar"   == Nothing
-- > stripExtension ""     "foobar"   == Just "foobar"
stripExtension :: String -> FilePath -> Maybe FilePath
stripExtension []        path = Just path
stripExtension ext@(x:_) path = stripSuffix dotExt path
  where
    dotExt = if isExtSeparator x then ext else '.':ext

-- | The stripSuffix function drops the given suffix from a list. It returns
-- Nothing if the list did not end with the suffix given, or Just the list
-- before the suffix, if it does.
stripSuffix :: Eq a => [a] -> [a] -> Maybe [a]
stripSuffix xs ys = reverse <$> stripPrefix (reverse xs) (reverse ys)

I think stripping an empty extension should return Just the original FilePath, for consistency with stripPrefix (and stripSuffix). If however it should return Nothing, then the laws could be changed to:

-- > dropExtension x == fromMaybe x (stripExtension (takeExtension x) x)
-- > dropExtensions x == fromMaybe x (stripExtension (takeExtensions x) x)
@ndmitchell
Copy link
Contributor

Hmm, interesting thought. I'm not changing any code in the repo until a few weeks after the GHC 7.10 release, but time to think about it before then.

I'm not super happy about the empty string being so different - it seems a weird special case. At the moment it always removes one dot at least, unless you pass the empty string. That said <.> adds a . automatically unless you pass the empty string, so I can see the argument for consistency. It seems whatever we go with will be slightly inconsistent in one way or another.

I can't help but feel stripSuffix ".foo" might be a cleaner way for people to write this, without going anywhere near the filepath library - it's not as if the extension separator is really something that ever changes. I know the base library doesn't yet have stripSuffix, but maybe it should? I've already got stripSuffix in my extra library: https://hackage.haskell.org/package/extra-1.1/docs/Data-List-Extra.html#v:stripSuffix

@thomie
Copy link
Contributor Author

thomie commented Mar 19, 2015

I like that "it removes at least one dot, or Nothing" rule.

I can't help but feel stripSuffix ".foo" might be a cleaner way for people to write this

I see 2 problems:

  1. People might look for something like stripSuffix (edit: stripExtension) in filepath, not find it, and go for the next best thing which is dropExtensions. I was about to do that, until I realized my mistake (it assumes basename doesn't contain any dots). This is troubling, because dropExtensions probably works most of the time. Using dropExtension isn't good either, because the extension might consist of two parts: ".ext1.ext2".
  2. Or, the extension is a variable instead of a literal string, and users write their own version of stripExtension, possibly using stripSuffix. This is potential source for bugs, because they might not handle a (missing) leading dot in the extension correctly. Here's a bug currently the GHC codebase (it drops the first character from the file's basename if osuf already starts with a dot):
                let file_base = dropTail (length osuf + 1) file

Related: the number of bugs I've discovered in GHC because of improper mangling of FilePaths is astounding. Simple usage of one of the filepath functions has fixed all of them sofar.

@ndmitchell
Copy link
Contributor

OK, I'm getting slowly more convinced :). Shouldn't stripSuffix "" "foo." == Just "foo" and stripSuffix "" "foo" == Just "foo"?

Glad to hear that filepath is eliminating some GHC bugs, indeed, string mangling is hard to get right, especially given the weirdness of filepaths in the wild.

@thomie
Copy link
Contributor Author

thomie commented May 18, 2015

Getting back to this.

Shouldn't stripExtension "" "foo." == Just "foo"...

That could work. The rule being that it doesn't matter whether you prepend the extension with a dot or not.

with dot:              stripExtension ".bar" "foo.bar" == Just "foo"
without dot:           stripExtension "bar"  "foo.bar" == Just "foo"

with dot:              stripExtension "." "foo." == Just "foo"
hence, without dot:    stripExtension ""  "foo." == Just "foo"

... and stripExtension "" "foo" == Just "foo"?

I would then perhaps say stripExtension "" "foo" == Nothing, guided by the following short and simple implementation (no special case for the empty extension):

stripExtension :: String -> FilePath -> Maybe FilePath
stripExtension ext@('.':_) = stripSuffix ext
stripExtension ext         = stripSuffix ('.':ext)

This would also give us the easy to remember/explain "it removes at least one dot, or returns Nothing" rule.

@ndmitchell
Copy link
Contributor

My thought about stripExtension "" = id was that "x" <.> "" == "x", so in a way extensions already treat the empty string as a special case. I imagine there's some stripExtension (takeExtension x) x = Just $ dropExtension x property that you would expect to hold that might not if we don't special-case the empty string. Note that you can always do stripExtension "." if you really meant to do that, just like you can do "x" <.> "." if you really want a blank extension.

That said, I'm not totally convinced.

@thomie
Copy link
Contributor Author

thomie commented May 21, 2015

Below are 4 pairs of properties. For each pair, the first property is for when we special-case the empty string (stripExtension "" = id), and the second property is for when we don't (stripSuffix "" "foo." == Just "foo" and stripSuffix "" "foo" == Nothing).

-- 1. It doesn't matter whether you prepend the extension with a dot or not: 
 null x || head x /= '.'  && stripExtension x y == stripExtension ('.':x) y || x == "." || stripExtension x y == stripExtension (tail x) y
(null x || head x /= '.') && stripExtension x y == stripExtension ('.':x) y ||             stripExtension x y == stripExtension (tail x) y

 -- 2. (If the extension is not empty), it removes at least one dot, or returns Nothing:
fmap (filter isExtSeparator) (stripExtension x y) /= Just (filter isExtSeparator y) || null x
fmap (filter isExtSeparator) (stripExtension x y) /= Just (filter isExtSeparator y)

-- 3.
fmap (`addExtension` (takeExtension x)) (stripExtension (takeExtension x) x) == Just x
fmap (`addExtension` (takeExtension x)) (stripExtension (takeExtension x) x) == Just x || null (takeExtension x)

-- 4.
fromJust    (stripExtension (takeExtension x) x) == dropExtension  x
fromMaybe x (stripExtension (takeExtension x) x) == dropExtension  x

So properties 3&4 are shorter when we special case the empty string, and properties 1&2 are shorter when we don't.

@ndmitchell
Copy link
Contributor

Added. Thinking more, I decided that the extension "foo" is really ["foo"], while "" is [] and "." is ["."]. When viewed like that, your initial formulation with the special case for "" makes sense. Therefore, I merged your original code, with a few property cleanups and tweaks based on the following discussion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants