-
Notifications
You must be signed in to change notification settings - Fork 6.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added missing typing annotations in datasets/ucf101 #4171
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR @frgfm! Unfortunately, the mypy
failures are relevant:
torchvision/datasets/ucf101.py:106: error: List comprehension has incompatible
type List[List[str]]; expected List[str] [misc]
data = [x.strip().split(" ") for x in data]
^
torchvision/datasets/ucf101.py:109: error: Incompatible types in assignment
(expression has type "Set[str]", variable has type "List[str]") [assignment]
selected_files = set(selected_files)
^
Found 2 errors in 1 file (checked 110 source files)
Sorry, rant time. The offending lines are the following: selected_files = []
with open(f, "r") as fid:
data = fid.readlines()
data = [x.strip().split(" ") for x in data]
data = [os.path.join(self.root, x[0]) for x in data]
selected_files.extend(data)
selected_files = set(selected_files) This is perfectly fine Python code. Mypy is unhappy about this very useful and common pattern: l = some_iterable
l = [f(y) for y in l] It's also unhappy about mypy is wrong here, and will force us to re-write the code in something that makes less sense, and that is less readable. These kind of false positive happen all the time, everywhere. They cost maintenance time, and cause debt. End of rant, sorry again. I'm really questioning the benefits of type checking torchvision. |
Disagree. data = fid.readlines() # List[str]
data = [x.strip().split(" ") for x in data] # List[List[str]]
data = [os.path.join(self.root, x[0]) for x in data] # List[str] We split on line 2 and on line 3 select only the first element. By simply moving the data = fid.readlines() # List[str]
data = [x.strip().split(" ")[0] for x in data] # List[str]
data = [os.path.join(self.root, x) for x in data] # List[str] It is also better "decoupled". First line reads raw data, second line removes unwanted stuff and third line adds some paths to it. As for the
From these, IMO 1. is the best option. To summarize: is the code running? Yes. Is perfectly fine?`No. |
@frgfm Since I already dug into this, you can --- a/torchvision/datasets/ucf101.py
+++ b/torchvision/datasets/ucf101.py
@@ -100,13 +100,12 @@ class UCF101(VisionDataset):
name = "train" if train else "test"
name = "{}list{:02d}.txt".format(name, fold)
f = os.path.join(annotation_path, name)
- selected_files = []
+ selected_files = set()
with open(f, "r") as fid:
data = fid.readlines()
- data = [x.strip().split(" ") for x in data]
- data = [os.path.join(self.root, x[0]) for x in data]
- selected_files.extend(data)
- selected_files = set(selected_files)
+ data = [x.strip().split(" ")[0] for x in data]
+ data = [os.path.join(self.root, x) for x in data]
+ selected_files.update(data)
indices = [i for i in range(len(video_list)) if video_list[i] in selected_files]
return indices to improve the code and make |
Agree to disagree then :). It might not be as "perfect", but it's still perfectly fine. There's no bug (hopefully), it's readable, and isn't bottlenecking performance. There's nothing in the original code that is inherently wrong. It doesn't warrant a CI to go red, nor does it justify the extra work of re-writing it. Outside of the type-annotation context, if a contributor opened a PR with those changes, we might accept the changes but we might also reply "thanks but no thanks, this doesn't bring enough improvement to be considered". |
I don't understand. If someone opens a PR fixing a typo in the documentation would you also consider turning it down because it "doesn't bring enough improvement to be considered"? If no, what is different in this case? This is also easily reviewed and 100% covered by tests. |
No, typo fixes are net and clear improvements IMO. This re-writing isn't, or at least it's very, very marginal. The fact that mypy systematically forces us to address those things which provide very little gain is a concern to me. |
Hi @NicolasHug, I agree that the transition is troublesome sometimes, but from what I gather in #2025, typing annotations are being added for new pieces of code. The tricky part is that we would prefer the entire codebase to have typing, and fortunately, you guys have some contributors hoping to help with that 👍 So yes, this PR doesn't fix a typing issue, it slightly modifies the codebase so that we could, later on, enjoy typing over the entire codebase. If that has changed, happy to close this PR but then I guess we need to close or discuss a few things in #2025 🤷♂️ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks @frgfm!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @frgfm , LGTM. I hope my rant didn't deter you :). I have one question below but I'll merge this one anyway
_video_height: int = 0, | ||
_video_min_dimension: int = 0, | ||
_audio_samples: int = 0 | ||
) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm just curious here: were you using something (an IDE, a type checker (mypy??) or something else) that was prompting you to type the return of __init__
?
If yes, I'm curious to know which one, and why they would do that. But if not, I would recommend not to type those in future PRs, as this is essentially useless.
I have the same question/remark for typing the return type of __len__
below actually
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I don't since I have Sublime Text.
But it's not the first PR I've opened regarding typing on this repo, and the last time I was asked to add those. Happy to change that in future PRs, let me know what you prefer :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently mypy
probably accepts those, but if we run with no_untyped_defs
it will yell at us.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the details. OK, if mypy is that needy, I guess we have to surrender and keep typing __init__
Reviewed By: fmassa Differential Revision: D29932701 fbshipit-source-id: b36efaa3365fc46857271ca07aa536851b6af8db
Following up on #2025, this PR adds missing typing annotations in datasets/ucf101.py.
Any feedback is welcome!