Wildcards in data-files don't work with filenames containing multiple dots #784

Closed
bos opened this Issue May 24, 2012 · 11 comments

@bos
Contributor

bos commented May 24, 2012

(Imported from Trac #794, reported by guest on 2011-01-22)

The Hoogle cabal file reads:

data-files:
    resources/*.js
    -- surely a Cabal bug that this isn't picked up by *.js
    -- but that hoogle.js is matched
    resources/jquery-1.4.2.js
    resources/jquery.cookie.js
It seems that the filename a.b.ext doesn't get picked by the wildcard *.ext.

-- Neil Mitchell

Contributor

bos commented May 24, 2012

(Imported comment by @dcoutts on 2011-01-22)

From the user guide:

A limited form of * wildcards in file names, for example data-files: images/*.png matches all the .png files in the images directory.
The limitation is that * wildcards are only allowed in place of the file name, not in the directory name or file extension. In particular, wildcards do not include directories contents recursively. Furthermore, if a wildcard is used it must be used with an extension, so data-files: data/* is not allowed. When matching a wildcard plus extension, a file's full extension must match exactly, so *.gz matches foo.gz but not foo.tar.gz. A wildcard that does not match any files is an error.

So currently this is deliberate, that "*.gz matches foo.gz but not foo.tar.gz". It has the unfortunate side effect that "*.js" does not match "jquery-1.4.2.js". In this case "1.4.2" is not really a bunch of extensions but part of the name. On the other hand, "jquery.cookie.js" is exactly the sort of thing the test is there to avoid.

Perhaps it's not a problem at all and the behavior should be reversed. Are there any cases anyone can think of where it would be misleading or dangerous, where a wildcard would accidentally pick up more files than was really intended (especially generated or temp files)?
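
For readers following along, here is a minimal sketch of the rule quoted from the user guide. It is written in Python rather than Cabal's own Haskell, the name `old_glob_matches` is hypothetical, and it only models the "full extension must match exactly" behaviour described above; it is not Cabal's implementation.

```python
def old_glob_matches(pattern: str, filename: str) -> bool:
    """Model of the documented rule: `*<ext>` matches a file only when the
    file's full extension (everything from the first dot) equals <ext>."""
    if not pattern.startswith("*."):
        raise ValueError("this sketch only handles patterns of the form *.ext")
    wanted_ext = pattern[1:]                      # "*.js" -> ".js"
    dot = filename.find(".")                      # first dot starts the full extension
    full_ext = filename[dot:] if dot != -1 else ""
    return full_ext == wanted_ext


if __name__ == "__main__":
    for name in ["hoogle.js", "jquery-1.4.2.js", "jquery.cookie.js"]:
        print(name, old_glob_matches("*.js", name))
```

Under this model, jquery-1.4.2.js has the full extension ".4.2.js", which is why "*.js" skips it even though "1.4.2" is really part of the name.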

Contributor

bos commented May 24, 2012

(Imported comment by guest on 2011-01-24)

Many of those restrictions seem reasonable. I think the only one I'd change is that *.ext means all files ending ".ext", not all files ending ".ext" that don't have a '.' in the filename. I understand the desire about foo.gz, but who would expect *.gz not to match foo.tar.gz? That seems completely counterintuitive.

-- Neil

Contributor

bos commented May 24, 2012

(Imported comment by guest on 2011-01-24)

Btw, jquery.cookie.js is a standard .js file, and if you asked for its extension I would describe it as having extension .js. jQuery extensions are often named jquery.nameofthething.js.

Contributor

bos commented May 24, 2012

(Imported comment by @dcoutts on 2011-01-24)

Mm, I'll consider changing it for the next major version (perhaps only with a change in the "cabal-version" field, so we do not change the behavior of older packages). See also #722.

Contributor

hdgarrood commented May 21, 2013

+1 for this, I just spent half an hour trying to work out why half of my JS files weren't coming across. Should I submit a pull request?

Member

23Skidoo commented May 21, 2013

@hdgarrood Yes, patches are always welcome.

hdgarrood added a commit to hdgarrood/cabal that referenced this issue May 21, 2013

Fix file globbing with multiple dots (#784)
Setting the "data-files" field in a .cabal file to pull in
"resources/*.js" would previously not work for any file containing more
than one dot, for example "resources/jquery-1.9.1.js".

This commit changes this behaviour so that a glob counts as a match,
provided that its extension is a suffix of the filename. So, if
resources/ contains foo.js and foo.bar.js:

* "resources/*.js" will match both "resources/foo.js" and
  "resources/foo.bar.js"
* "resources/*.bar.js" will match "resources/foo.bar.js" but not
  "resources/foo.js"

It should be noted that I've only tested this by calling
matchFileDirGlob in ghci -- I'm fairly new to Cabal, and can't see how
to test it properly.
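
The matching rule described in this commit message can be sketched as follows. This is a Python model, not the patched Haskell code: `suffix_glob_matches` is a hypothetical name, and the handling of edge cases (for example a file named exactly ".js") is an assumption of the sketch rather than something the commit message specifies.

```python
def suffix_glob_matches(pattern: str, filename: str) -> bool:
    """Model of the proposed rule: `*<ext>` matches any file whose name
    ends with <ext>, regardless of how many other dots the name contains."""
    if not pattern.startswith("*."):
        raise ValueError("this sketch only handles patterns of the form *.ext")
    wanted_ext = pattern[1:]                      # "*.bar.js" -> ".bar.js"
    return filename.endswith(wanted_ext)


if __name__ == "__main__":
    files = ["foo.js", "foo.bar.js"]
    for pat in ["*.js", "*.bar.js"]:
        print(pat, [f for f in files if suffix_glob_matches(pat, f)])
```
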

Collaborator

BardurArantsson commented Nov 5, 2013

It seems rather bizarre that Cabal should behave this way given that almost(?) every single other program out there which does globbing does it differently. Surely the principle of least surprise should prevail here?

Any chance of getting this merged any time soon?

(Yes, I've also wasted 15-30 minutes on this recently :).)

Member

tibbe commented Nov 5, 2013

If this matches standard shell globs, I'm fine with this getting merged.

Collaborator

BardurArantsson commented Nov 14, 2013

I don't know if there's a "standard" for shell globs, but shell globs do have more features. (E.g. {} and [].)

As far as I can tell from the code this should at least make Cabal's globbing a proper subset of shell globbing.
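
As a rough point of comparison, Python's standard `fnmatch` module implements shell-style wildcard matching and can stand in for "what a shell glob would do" here. It is only an approximation of a real shell (it supports `*`, `?` and `[]` but not `{}`), and the file list below is made up for illustration.

```python
from fnmatch import fnmatchcase

# Illustrative file names; the first three are taken from this thread.
names = ["hoogle.js", "jquery-1.4.2.js", "jquery.cookie.js", "notes.txt"]

# Shell-style matching: "*" may span any characters, including dots.
matched = [n for n in names if fnmatchcase(n, "*.js")]

if __name__ == "__main__":
    print(matched)
```

With shell-style matching, all three .js files are selected, which is the behaviour most people in this thread expected from `resources/*.js`.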

Contributor

mietek commented Nov 27, 2014

Halcyon supports declaring additional data files for use at run-time with the HALCYON_EXTRA_DATA_FILES option, using standard GNU bash globbing syntax.

See Haskell Language for an example of declaring static website content as data files.

@mietek mietek referenced this issue in mietek/halcyon Feb 14, 2015

Open

Add and improve glob-based options #39

0 of 2 tasks complete

Ericson2314 added a commit to Ericson2314/clash-compiler that referenced this issue Feb 26, 2015

@hdgarrood hdgarrood referenced this issue Apr 4, 2015

Closed

Implement bash (with globstar) style globbing #2522

1 of 4 tasks complete

@ttuegel ttuegel modified the milestones: Cabal-1.24, Cabal-1.22 Apr 23, 2015

Collaborator

BardurArantsson commented Jun 25, 2015

It seems that this is essentially a duplicate of #2522 (given the discussion therein). Close?

@23Skidoo 23Skidoo modified the milestones: Cabal 1.24, Cabal 1.26 Feb 21, 2016

@ezyang ezyang modified the milestone: Cabal 2.0 Sep 6, 2016

quasicomputational added a commit to quasicomputational/cabal that referenced this issue Jun 10, 2018

Allow globs to match against a suffix of a file's extensions
This has the effect of allowing a glob `*.html` to match the file
`foo.en.html`. For compatibility, this is only allowed with
`cabal-version: 3.0` or later; for earlier spec versions, a warning
will be generated by `cabal check` if there are files affected by this
change in behaviour.

Fixes #5057. Fixes #784. Closes #5061.
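
The version gating described in this commit message might be modelled roughly like this. It is a Python sketch under stated assumptions: the function names are hypothetical, the threshold is taken from the commit message as a parameter, and `needs_warning` only approximates the condition under which `cabal check` would warn.

```python
def parse_version(text: str) -> tuple:
    """Turn a spec version such as "2.2" into a comparable tuple (2, 2)."""
    return tuple(int(part) for part in text.split("."))


def glob_matches(pattern: str, filename: str, cabal_version: str,
                 threshold: str = "3.0") -> bool:
    """Suffix matching at or above the threshold, exact full-extension
    matching below it. Only patterns of the form *.ext are handled."""
    wanted_ext = pattern[1:]                      # "*.html" -> ".html"
    if parse_version(cabal_version) >= parse_version(threshold):
        return filename.endswith(wanted_ext)
    dot = filename.find(".")
    return dot != -1 and filename[dot:] == wanted_ext


def needs_warning(pattern: str, filename: str, cabal_version: str,
                  threshold: str = "3.0") -> bool:
    """True when an older spec version skips a file that the newer
    suffix rule would have picked up."""
    is_old = parse_version(cabal_version) < parse_version(threshold)
    would_match_new = filename.endswith(pattern[1:])
    return (is_old and would_match_new
            and not glob_matches(pattern, filename, cabal_version, threshold))
```

For example, `glob_matches("*.html", "foo.en.html", "2.2")` is false in this model, and `needs_warning` flags that file so the package author can either list it explicitly or raise `cabal-version`.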

quasicomputational added a commit to quasicomputational/cabal that referenced this issue Jun 11, 2018

23Skidoo added a commit to quasicomputational/cabal that referenced this issue Jun 13, 2018
