glob.glob() can return bizarre ordering #26

erykoff · 2016-06-22T16:03:26Z

Hey, Stack Overflow has the (unsatisfying) answer:
http://stackoverflow.com/questions/6773584/how-is-pythons-glob-glob-ordered

Apparently, the order is whatever the filesystem reports, possibly the order in which the files were written to disk. So two globs over related sets of files are by no means guaranteed to come back in matching order.

(I think sorting might fix this but would require some testing)

esheldon · 2016-06-22T17:01:00Z

I always do the following

flist=glob(pattern)
flist.sort()

Note that sort() works in place and returns None, so the following will not work (flist ends up as None):

flist=glob(pattern).sort()

rmjarvis · 2016-06-22T17:08:38Z

Or

flist = sorted(glob(pattern))

But the point remains that it would be helpful to have other ways to specify the list of image and catalog files to make sure they match up 1-1.
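
As a minimal sketch of the sorted-glob approach with a basic sanity check, something like the following could be used. The function name `matched_file_lists` and its two pattern arguments are made up for illustration, and an equal-length check only catches missing files, not files that are mismatched in some other way.

```python
import glob


def matched_file_lists(image_pattern, cat_pattern):
    """Return sorted image and catalog lists, checking that the lengths agree."""
    # sorted() returns a new list, so the result no longer depends on
    # the order in which the filesystem reports the directory entries.
    images = sorted(glob.glob(image_pattern))
    cats = sorted(glob.glob(cat_pattern))
    if len(images) != len(cats):
        raise ValueError(
            "Found %d images but %d catalogs" % (len(images), len(cats))
        )
    return images, cats
```

This still relies on the image and catalog names sorting into the same relative order, which is why other ways of specifying the lists would be useful.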

esheldon · 2016-06-22T17:15:04Z

FYI, sorted() returns an iterator, not a list.

rmjarvis · 2016-06-22T17:21:13Z

I don't think so...

>>> type(sorted(glob.glob('*.fits')))
<type 'list'>

or in Python 3.4 (where I thought maybe they changed the nature of this function)

>>> type(sorted(glob.glob('*.fits')))
<class 'list'>

esheldon · 2016-06-22T17:25:07Z

I was clearly confused

erykoff · 2016-06-22T17:38:36Z

I did not know about the sorted() thingy. I always did it the way Erin did.

In any event, as Mike has said, the original point still stands that we
need to guarantee 1-1 matching. It may be that simply using sorted() does
the trick if you have filenames that are all the same except for some
obviously sortable index, which is the "expected" behavior.
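
One caveat on "obviously sortable": sorted() compares strings character by character, so an index that is not zero-padded will not come out in numeric order. The sketch below illustrates this with made-up file names; `numeric_key` is a hypothetical helper, not something in the code base.

```python
import re


def numeric_key(filename):
    """Sort key that compares embedded runs of digits as integers."""
    # re.split with a capturing group alternates text, digits, text, ...
    # so the odd-indexed parts are always the digit runs.
    parts = re.split(r"(\d+)", filename)
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]


names = ["img_10.fits", "img_2.fits", "img_1.fits"]
plain = sorted(names)                      # ['img_1.fits', 'img_10.fits', 'img_2.fits']
natural = sorted(names, key=numeric_key)   # ['img_1.fits', 'img_2.fits', 'img_10.fits']
```

If the image and catalog names use the same padding convention, plain sorted() orders both lists the same way, so the 1-1 matching still holds even when the order is not numeric.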

esheldon · 2016-06-22T18:05:39Z

In my experience it is best to determine such things algorithmically. For example, if I'm processing an exposure I can generate the file names for the ccds in the exposure using a simple algorithm based on the DESDM run identifier. The base name is known and the suffix is based on ccd number.

I recommend doing that by default. e.g. specify an exposure you want to process by name or id, and the code looks for it in the usual place ($DESDATA/OPS/red/...) and does its thing.

This has the advantage that if a file is missing you will notice it: you generated the file list, so you know what should be there, and if a file is missing the code will crash. Also, there are usually auxiliary files you will need that can be generated using the same algorithms, and they don't need to be determined from the input file list.

Then if you want to be able to do something different, perhaps for testing, that can be an alternative method, e.g. specifying an exact list of files, or a pattern as in the above config.
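
A rough sketch of the algorithmic approach is below. The function name `expected_ccd_files`, its parameters, and the `basename_NN.fits` naming layout are all assumptions made for illustration; they are not the actual DESDM convention or directory structure.

```python
import os


def expected_ccd_files(directory, basename, ccds, suffix=".fits"):
    """Build the expected file name for each CCD and fail if any is missing."""
    # Hypothetical layout: <directory>/<basename>_<2-digit ccd><suffix>
    files = [
        os.path.join(directory, "%s_%02d%s" % (basename, ccd, suffix))
        for ccd in ccds
    ]
    # Because the list is generated rather than discovered, a missing
    # file is detected immediately instead of silently shifting the order.
    missing = [f for f in files if not os.path.exists(f)]
    if missing:
        raise IOError("Missing expected files: %s" % ", ".join(missing))
    return files
```

Calling the same function with a different suffix would give the matching catalog or auxiliary file names, so the lists line up by construction.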

rmjarvis · 2016-06-29T15:59:40Z

This is now fixed on branch #20. There are now several ways to specify the file lists. From the doc string:

        There are a number of ways to specify the input files (parameters `images` and `cats`):

        1. If you only have a single image/catalog, you may just give the file name directly
           as a single string.
        2. For multiple images, you may specify a list of strings listing all the file names.
        3. You may specify a string with ``{chipnum}`` which will be filled in by the chipnum
           values given in the `chipnums` parameter using ``s.format(chipnum=chipnum)``.
        4. You may specify a string with ``%s`` (or perhaps ``%02d``, etc.) which will be filled
           in by the chipnum values given in the `chipnums` parameter using ``s % chipnum``.
        5. You may specify a string that ``glob.glob(s)`` will understand and convert into a
           list of file names.  Caveat: ``glob`` returns the files in native directory order
           (cf. ``ls -f``).  This can thus be different for the images and catalogs if they
           were written to disk out of order.  Therefore, we sort the list returned by
           ``glob.glob(s)``.  Typically, this will result in the image file names and catalog
           file names matching up correctly, but it is the user's responsibility to ensure
           that this is the case.

        The `chipnums` parameter specifies chip "numbers" which are really just any identifying
        number or string that is different for each chip in the exposure.  Typically, these are
        numbers, but they don't have to be if you have some other way of identifying the chips.

        There are a number of ways that the chipnums may be specified:

        1. A single number or string.
        2. A list of numbers or strings.
        3. A string that can be ``eval``ed to yield the appropriate list.  e.g.
           `[ c for c in range(1,63) if c != 61 ]`
        4. None, in which case range(len(images)) will be used.  In this case options 3,4 above
           for the images and cats parameters are not allowed.
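
To make the five input options concrete, here is a simplified sketch of how such a specification could be expanded into a list of file names. It is not the code on the branch: `expand_file_spec` is a hypothetical name, and the order of the checks and the way a glob pattern is detected are assumptions.

```python
import glob


def expand_file_spec(spec, chipnums=None):
    """Turn a file specification into a list of file names (illustrative only)."""
    if not isinstance(spec, str):
        return list(spec)                          # option 2: explicit list
    if "{chipnum}" in spec or "%" in spec:
        if chipnums is None:
            raise ValueError("chipnums is required for a templated file name")
        if "{chipnum}" in spec:                    # option 3: str.format template
            return [spec.format(chipnum=c) for c in chipnums]
        return [spec % c for c in chipnums]        # option 4: %-style template
    if any(ch in spec for ch in "*?["):
        return sorted(glob.glob(spec))             # option 5: glob, sorted
    return [spec]                                  # option 1: single file name
```

For example, `expand_file_spec("img_%02d.fits", [1, 2])` gives `['img_01.fits', 'img_02.fits']`, and using the same `chipnums` for the images and catalogs keeps the two lists matched 1-1.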

rmjarvis added a commit that referenced this issue Jun 29, 2016

Add more ways to specify the file lists. cf Issue #26

7517398

rmjarvis mentioned this issue Jul 1, 2016

#20 Lots of updates to front end API #28

Merged

rmjarvis closed this as completed Jul 8, 2016

rmjarvis added this to the Version 1.0 milestone Feb 26, 2023
