Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added missing typing annotations in datasets/ucf101 #4171

Merged
merged 4 commits into from
Jul 20, 2021

Conversation

frgfm
Copy link
Contributor

@frgfm frgfm commented Jul 12, 2021

Following up on #2025, this PR adds missing typing annotations in datasets/ucf101.py.

Any feedback is welcome!

@pmeier pmeier self-requested a review July 14, 2021 12:31
Copy link
Collaborator

@pmeier pmeier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR @frgfm! Unfortunately, the mypy failures are relevant:

torchvision/datasets/ucf101.py:106: error: List comprehension has incompatible
type List[List[str]]; expected List[str]  [misc]
                data = [x.strip().split(" ") for x in data]
                        ^
torchvision/datasets/ucf101.py:109: error: Incompatible types in assignment
(expression has type "Set[str]", variable has type "List[str]")  [assignment]
            selected_files = set(selected_files)
                             ^
Found 2 errors in 1 file (checked 110 source files)

@NicolasHug
Copy link
Member

NicolasHug commented Jul 15, 2021

Sorry, rant time.

The offending lines are the following:

        selected_files = []
        with open(f, "r") as fid:
            data = fid.readlines()
            data = [x.strip().split(" ") for x in data]
            data = [os.path.join(self.root, x[0]) for x in data]
            selected_files.extend(data)
        selected_files = set(selected_files)

This is perfectly fine Python code.

Mypy is unhappy about this very useful and common pattern:

l = some_iterable
l = [f(y) for y in l]

It's also unhappy about l = set(l), making the set() constructor pretty much unusable with anything else but an empty set.

mypy is wrong here, and will force us to re-write the code in something that makes less sense, and that is less readable. These kind of false positive happen all the time, everywhere. They cost maintenance time, and cause debt.

End of rant, sorry again. I'm really questioning the benefits of type checking torchvision.

@pmeier
Copy link
Collaborator

pmeier commented Jul 15, 2021

This is perfectly fine Python code.

Disagree.

data = fid.readlines()  # List[str]
data = [x.strip().split(" ") for x in data]  # List[List[str]]
data = [os.path.join(self.root, x[0]) for x in data]  # List[str]

We split on line 2 and on line 3 select only the first element. By simply moving the [0] index to line 2 mypy is happy.

data = fid.readlines()  # List[str]
data = [x.strip().split(" ")[0] for x in data]  # List[str]
data = [os.path.join(self.root, x) for x in data]  # List[str]

It is also better "decoupled". First line reads raw data, second line removes unwanted stuff and third line adds some paths to it.

As for the set call. mypy complains because data was a List[str] before and is a Set[str] afterwards. Four possible remedies:

  1. Instead of using a list at first and converting it into a set later, why not use a set directly?
  2. We can move this call directly into the list comprehension removing the need for a temporary variable.
  3. We can put a type: ignore[no-redef] on the line if we want to keep it as is.
  4. If we don't want mypy to ever complain about this we can put a allow_redefinition = True into mypy.ini and be done with it.

From these, IMO 1. is the best option.

To summarize: is the code running? Yes. Is perfectly fine?`No.

@pmeier
Copy link
Collaborator

pmeier commented Jul 15, 2021

@frgfm Since I already dug into this, you can git apply

--- a/torchvision/datasets/ucf101.py
+++ b/torchvision/datasets/ucf101.py
@@ -100,13 +100,12 @@ class UCF101(VisionDataset):
         name = "train" if train else "test"
         name = "{}list{:02d}.txt".format(name, fold)
         f = os.path.join(annotation_path, name)
-        selected_files = []
+        selected_files = set()
         with open(f, "r") as fid:
             data = fid.readlines()
-            data = [x.strip().split(" ") for x in data]
-            data = [os.path.join(self.root, x[0]) for x in data]
-            selected_files.extend(data)
-        selected_files = set(selected_files)
+            data = [x.strip().split(" ")[0] for x in data]
+            data = [os.path.join(self.root, x) for x in data]
+            selected_files.update(data)
         indices = [i for i in range(len(video_list)) if video_list[i] in selected_files]
         return indices

to improve the code and make mypy happy.

@NicolasHug
Copy link
Member

NicolasHug commented Jul 15, 2021

Is perfectly fine?`No.

Agree to disagree then :).

It might not be as "perfect", but it's still perfectly fine. There's no bug (hopefully), it's readable, and isn't bottlenecking performance.

There's nothing in the original code that is inherently wrong. It doesn't warrant a CI to go red, nor does it justify the extra work of re-writing it.

Outside of the type-annotation context, if a contributor opened a PR with those changes, we might accept the changes but we might also reply "thanks but no thanks, this doesn't bring enough improvement to be considered".

@pmeier
Copy link
Collaborator

pmeier commented Jul 15, 2021

but we might also reply "thanks but no thanks, this doesn't bring enough improvement to be considered".

I don't understand. If someone opens a PR fixing a typo in the documentation would you also consider turning it down because it "doesn't bring enough improvement to be considered"?

If no, what is different in this case? This is also easily reviewed and 100% covered by tests.

@NicolasHug
Copy link
Member

NicolasHug commented Jul 15, 2021

No, typo fixes are net and clear improvements IMO. This re-writing isn't, or at least it's very, very marginal. The fact that mypy systematically forces us to address those things which provide very little gain is a concern to me.

@frgfm
Copy link
Contributor Author

frgfm commented Jul 19, 2021

No, typo fixes are net and clear improvements IMO. This re-writing isn't, or at least it's very, very marginal. The fact that mypy systematically forces us to address those things which provide very little gain is a concern to me.

Hi @NicolasHug,

I agree that the transition is troublesome sometimes, but from what I gather in #2025, typing annotations are being added for new pieces of code. The tricky part is that we would prefer the entire codebase to have typing, and fortunately, you guys have some contributors hoping to help with that 👍

So yes, this PR doesn't fix a typing issue, it slightly modifies the codebase so that we could, later on, enjoy typing over the entire codebase. If that has changed, happy to close this PR but then I guess we need to close or discuss a few things in #2025 🤷‍♂️

Copy link
Collaborator

@pmeier pmeier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @frgfm!

Copy link
Member

@NicolasHug NicolasHug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @frgfm , LGTM. I hope my rant didn't deter you :). I have one question below but I'll merge this one anyway

_video_height: int = 0,
_video_min_dimension: int = 0,
_audio_samples: int = 0
) -> None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm just curious here: were you using something (an IDE, a type checker (mypy??) or something else) that was prompting you to type the return of __init__?

If yes, I'm curious to know which one, and why they would do that. But if not, I would recommend not to type those in future PRs, as this is essentially useless.

I have the same question/remark for typing the return type of __len__ below actually

Copy link
Contributor Author

@frgfm frgfm Jul 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I don't since I have Sublime Text.
But it's not the first PR I've opened regarding typing on this repo, and the last time I was asked to add those. Happy to change that in future PRs, let me know what you prefer :)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently mypy probably accepts those, but if we run with no_untyped_defs it will yell at us.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the details. OK, if mypy is that needy, I guess we have to surrender and keep typing __init__

@NicolasHug NicolasHug merged commit 642ad75 into pytorch:master Jul 20, 2021
@frgfm frgfm deleted the ucf101-typing branch July 20, 2021 14:41
facebook-github-bot pushed a commit that referenced this pull request Jul 27, 2021
Reviewed By: fmassa

Differential Revision: D29932701

fbshipit-source-id: b36efaa3365fc46857271ca07aa536851b6af8db
@frgfm frgfm mentioned this pull request Jul 28, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants