Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Files matching when padding is different #41

Closed
broganross opened this issue Sep 22, 2017 · 13 comments
Closed

Files matching when padding is different #41

broganross opened this issue Sep 22, 2017 · 13 comments
Labels
bug
Milestone

Comments

@broganross
Copy link

@broganross broganross commented Sep 22, 2017

If a file name is the same in every way except it's numerical padding, contains still says it's a match.

files = ["file_name.{:04}.ext".format(i + 1) for i in xrange(10)]
seq = pyseq.get_sequences(files)[0]
seq.contains("file_name.1.ext") # returns True
seq.contains("file_name.0001.ext") # also returns True
@nebukadhezer

This comment has been minimized.

Copy link
Contributor

@nebukadhezer nebukadhezer commented Sep 22, 2017

https://github.com/rsgalloway/pyseq/blob/master/pyseq.py#L282

is_sibling = (len(d) == 1) and (self.parts == item.parts) and (len(d[0]['frames'][0]) == len(d[0]['frames'][1]))

with this I am getting one failed unittest...
but I think something along those line should fix this...

@nebukadhezer

This comment has been minimized.

Copy link
Contributor

@nebukadhezer nebukadhezer commented Sep 22, 2017

https://github.com/rsgalloway/pyseq/blob/master/tests/test_pyseq.py#L528

Bit of a design question...
The unittest describes the problem, that sometimes the padding just changes with every 10* multiplication...
Sounds to me like we should have sth like a mode for that upon init, if we want to be strict or not strict with this...

@broganross

This comment has been minimized.

Copy link
Author

@broganross broganross commented Sep 22, 2017

I think uncompress should fail in that unit test, because the formatting specifies it has a padding of 3 which 10000 obviously wouldn't fit in. If the formatting didn't specify padding it would should pass.

But, I see your point about the digit size increasing the apparent padding. I think having an optional strictness setting is a good plan. I would also recommend defaulting it to strict. Just on the wild assumption that it's more common for people to pad the digits than it is for them to not.

You could probably also compare the lens of Item.digits and get the same result. Not sure which is faster.

@rsgalloway rsgalloway added the bug label Sep 22, 2017
@rsgalloway rsgalloway added this to the 0.5.1 milestone Sep 22, 2017
@rsgalloway

This comment has been minimized.

Copy link
Owner

@rsgalloway rsgalloway commented Sep 22, 2017

Hi Brogan, thanks. Just looked at this and I agree with both of your assessments, so I've categorized this as a bug. However, I'm not sure about adding a 'strictness' option. It's not a bad idea, but having different conventions in different contexts could lead to problems. To me, it seems obvious that

seq.contains("file_name.1.ext") # returns True

should be False, and the uncompress unit test you mentioned should be updated or expected to fail. If you disagree, then I'm Ok with adding a strictness setting to provide some flexibility, I guess I'd rather try to make pyseq smarter first if possible.

I've already made the discussed changes and will submit an update shortly for testing. The only difference is I put the pad length conditional in diff() instead. Thoughts?

            if (m1.start() == m2.start()) and \
               (m1.group() != m2.group()) and \
               (len(m1.group()) == len(m2.group())):
                d.append({
                    'start': m1.start(),
                    'end': m1.end(),
                    'frames': (m1.group(), m2.group())
                })

All unit tests pass with this change (except the one mentioned, which I also fixed).

I'd like to push this into a 0.5.1 release. Technically, this changes the behavior slightly. Hopefully this change would not break pyseq for anyone for which this bug is the expected behavior.

@broganross

This comment has been minimized.

Copy link
Author

@broganross broganross commented Sep 22, 2017

That sounds good to me. I just think the files with no padding is a valid use case, and it would be good to provide a way to work with those files. I don't foresee it being my use case, but now that i've said that I know I'll run into it.

@rsgalloway

This comment has been minimized.

Copy link
Owner

@rsgalloway rsgalloway commented Sep 22, 2017

Do you mean the use case where a non-padded file should be included in a padded sequence? i.e. the current behavior where this should return True in some cases?

seq.contains("file_name.1.ext")

@broganross

This comment has been minimized.

Copy link
Author

@broganross broganross commented Sep 22, 2017

Well, that's sort of the question. If padding strictness is added, the following files won't work:

file.8.ext
file.9.ext
file.10.ext

But not adding it causes the reason for this ticket.
Maybe there's a way to handle both cases. But, the quick fix is to add padding strictness. Though I think we should give a mechanism for user's to disable it in case they have files that aren't padded. Say a global variable STRICT_PAD, or whatever.

        if (m1.start() == m2.start()) and (m1.group() != m2.group()):
            if STRICT_PAD is True and (len(m1.group()) != len(m2.group())):
                continue
            d.append({
                'start': m1.start(),
                'end': m1.end(),
                'frames': (m1.group(), m2.group())
            })

Does that make sense?

@nebukadhezer

This comment has been minimized.

Copy link
Contributor

@nebukadhezer nebukadhezer commented Sep 24, 2017

For our usecases pyseq can be "strict" about the padding, we almost never have sequences with an increasing padding.
As we all think the strict mode is the default I would rather call the "switch" something like PYSEQ_LOOSE_PAD.
@rsgalloway What do you think about a switch to control the behaviour? And it would need to be an envvar, there is no other option ?

@rsgalloway

This comment has been minimized.

Copy link
Owner

@rsgalloway rsgalloway commented Sep 26, 2017

I agree, my gut tells me strict padding is probably the vast majority of use cases. I've never seen variable padding in a given sequence unless it's a mistake. The padding is always predefined. So my initial reaction was to just fix the bug, period.

To make it backwards compatible for anyone relying on this bug, then, we could add a global variable to use the old/current behavior. I don't think it needs to go as far as an environment variable because that implies variable padding in some contexts and not others, which seems like overkill.

@broganross

This comment has been minimized.

Copy link
Author

@broganross broganross commented Sep 26, 2017

I agree. I'm pretty sure most use cases would have static padding. Just trying to keep the flexibility. Because now that the functionality is going to be changed, one of us is totally going to get some files without padding 😃

@rsgalloway

This comment has been minimized.

Copy link
Owner

@rsgalloway rsgalloway commented Sep 29, 2017

@broganross I took your advice and code snippet, but I changed STRICT_PAD to strict_pad to match the case of the other variables (I know, probably bad practice but I wanted it to be consistent). It's in the issue-41 branch right now, will merge shortly.

@rsgalloway rsgalloway closed this Sep 29, 2017
@broganross

This comment has been minimized.

Copy link
Author

@broganross broganross commented Oct 4, 2017

Yeah, it doesn't really matter. I think it's PEP8 thing to name global variables with upper case, but I'm not sure. Could just be in my head.

@herronelou

This comment has been minimized.

Copy link

@herronelou herronelou commented Jun 27, 2019

I'm sorry to comment on a closed issue, but this has just caused me some woes.

I'm in a case right now where I need both strict padding and not strict padding. Not necessarily at the same time, but in the same code base, in which case neither the global variable nor an env variable could help me (plus I don't know in advance which is which).

We have a full file browser that uses pyseq to display sequences rather than individual files (think Nuke's file browser), so it encounters many many different forms. In general, strict padding is the desired behavior, however today we ran into this:

somefile.0947.exr
[...]
somefile.9998.exr
somefile.9999.exr
somefile.10000.exr
somefile.10001.exr
[...]

This came from a client, so we had no control over it, although it's quite obvious that the client decided to render with %04d padding, but ran out of digits.

This came in as 2 sequences using get_sequences, turning off strict padding solved that issues, although I have seen other cases where we definitely want to be strict on the padding, though I can't quite cite an example right now.

I have a feeling pyseq should be a little bit smarter, maybe using leading zeros to check if the padding is implicit or explicit.

seq = pyseq.get_sequences(['test.0003.jpg', 'test.0004.jpg' ,'test.0005.jpg'])[0]  # padding is explicitly 4
seq.includes('test.1.jpg')  # False, because it's obviously not matching the padding
seq.includes('test.1001.jpg')  # True, normal
seq.contains('test.10000.jpg')  # True, because we ran out of padding, so this would indeed happen

seq = pyseq.get_sequences(['test.1003.jpg', 'test.1004.jpg' ,'test.1005.jpg'])[0]  # no explicit padding
seq.includes('test.1.jpg')  # True
seq.includes('test.1001.jpg')  # True
seq.contains('test.10000.jpg')  # True
seq.append('test.0002.jpg')  # Now we add something with explicit padding, we can infer the actual padding
seq.includes('test.1.jpg')  # False

What do you think about that? Would that be sufficient? Overkill? Would that address the original issue? (It would handle the test code from the original post)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.