Added missing typing annotations in datasets/ucf101 #4171

frgfm · 2021-07-12T22:55:57Z

Following up on #2025, this PR adds missing typing annotations in datasets/ucf101.py.

Any feedback is welcome!

pmeier

Thanks for the PR @frgfm! Unfortunately, the mypy failures are relevant:

torchvision/datasets/ucf101.py:106: error: List comprehension has incompatible
type List[List[str]]; expected List[str]  [misc]
                data = [x.strip().split(" ") for x in data]
                        ^
torchvision/datasets/ucf101.py:109: error: Incompatible types in assignment
(expression has type "Set[str]", variable has type "List[str]")  [assignment]
            selected_files = set(selected_files)
                             ^
Found 2 errors in 1 file (checked 110 source files)

NicolasHug · 2021-07-15T12:58:27Z

Sorry, rant time.

The offending lines are the following:

        selected_files = []
        with open(f, "r") as fid:
            data = fid.readlines()
            data = [x.strip().split(" ") for x in data]
            data = [os.path.join(self.root, x[0]) for x in data]
            selected_files.extend(data)
        selected_files = set(selected_files)

This is perfectly fine Python code.

Mypy is unhappy about this very useful and common pattern:

l = some_iterable
l = [f(y) for y in l]

It's also unhappy about l = set(l), making the set() constructor pretty much unusable with anything else but an empty set.

mypy is wrong here, and will force us to rewrite the code into something that makes less sense and is less readable. These kinds of false positives happen all the time, everywhere. They cost maintenance time and create debt.

End of rant, sorry again. I'm really questioning the benefits of type checking torchvision.

pmeier · 2021-07-15T13:38:22Z

This is perfectly fine Python code.

Disagree.

data = fid.readlines()  # List[str]
data = [x.strip().split(" ") for x in data]  # List[List[str]]
data = [os.path.join(self.root, x[0]) for x in data]  # List[str]

We split on line 2, and on line 3 we select only the first element. Simply moving the [0] index to line 2 makes mypy happy.

data = fid.readlines()  # List[str]
data = [x.strip().split(" ")[0] for x in data]  # List[str]
data = [os.path.join(self.root, x) for x in data]  # List[str]

It is also better "decoupled". First line reads raw data, second line removes unwanted stuff and third line adds some paths to it.
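
A minimal standalone sketch of that three-step version, for anyone who wants to run it outside torchvision. The helper name `annotation_paths`, the `root` value, and the sample lines are made up for illustration; only the two comprehension steps are taken from the snippet above.

```python
import os
from typing import List


def annotation_paths(lines: List[str], root: str) -> List[str]:
    # Step 1 in the original is fid.readlines(); here the raw lines are passed in.
    data = lines  # List[str]
    # Step 2: strip the newline and keep only the first space-separated field.
    data = [x.strip().split(" ")[0] for x in data]  # still List[str]
    # Step 3: prefix every entry with the dataset root.
    data = [os.path.join(root, x) for x in data]  # still List[str]
    return data


# Hypothetical annotation lines in a "<relative path> <label>" layout.
sample = ["a/clip1.avi 1\n", "b/clip2.avi 2\n"]
paths = annotation_paths(sample, "root")
```

Because `data` holds a list of strings at every step, the variable is never rebound to a different type.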

As for the set call: mypy complains because selected_files was a List[str] before and is a Set[str] afterwards. Four possible remedies:

1. Instead of using a list at first and converting it into a set later, why not use a set directly?
2. We can move this call directly into the list comprehension, removing the need for a temporary variable.
3. We can put a type: ignore[assignment] comment on the line if we want to keep it as is.
4. If we don't want mypy to ever complain about this, we can put allow_redefinition = True into mypy.ini and be done with it.

From these, IMO 1. is the best option.
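
A rough sketch of the first two remedies side by side. The function names and sample lines are hypothetical, and reading remedy 2 as a set comprehension is my interpretation of "moving the call into the comprehension", not something the comment spells out.

```python
import os
from typing import List, Set


def selected_as_set(lines: List[str], root: str) -> Set[str]:
    # Remedy 1: start from a set, so the variable never changes type.
    selected: Set[str] = set()
    names = [x.strip().split(" ")[0] for x in lines]
    selected.update(os.path.join(root, name) for name in names)
    return selected


def selected_as_set_comprehension(lines: List[str], root: str) -> Set[str]:
    # Remedy 2 (as interpreted here): build the set in one comprehension,
    # with no temporary list that later gets rebound to a set.
    return {os.path.join(root, x.strip().split(" ")[0]) for x in lines}


# Hypothetical annotation lines; the duplicate shows the set semantics.
lines = ["a/clip1.avi 1\n", "b/clip2.avi 2\n", "a/clip1.avi 1\n"]
first = selected_as_set(lines, "root")
second = selected_as_set_comprehension(lines, "root")
```

Both variants produce the same set; the difference is only whether the intermediate list of names is kept as a separate, separately typed variable.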

To summarize: is the code running? Yes. Is it perfectly fine? No.

pmeier · 2021-07-15T13:41:59Z

@frgfm Since I already dug into this, you can git apply

--- a/torchvision/datasets/ucf101.py
+++ b/torchvision/datasets/ucf101.py
@@ -100,13 +100,12 @@ class UCF101(VisionDataset):
         name = "train" if train else "test"
         name = "{}list{:02d}.txt".format(name, fold)
         f = os.path.join(annotation_path, name)
-        selected_files = []
+        selected_files = set()
         with open(f, "r") as fid:
             data = fid.readlines()
-            data = [x.strip().split(" ") for x in data]
-            data = [os.path.join(self.root, x[0]) for x in data]
-            selected_files.extend(data)
-        selected_files = set(selected_files)
+            data = [x.strip().split(" ")[0] for x in data]
+            data = [os.path.join(self.root, x) for x in data]
+            selected_files.update(data)
         indices = [i for i in range(len(video_list)) if video_list[i] in selected_files]
         return indices

to improve the code and make mypy happy.

NicolasHug · 2021-07-15T14:25:01Z

Is it perfectly fine? No.

Agree to disagree then :).

It might not be as "perfect", but it's still perfectly fine. There's no bug (hopefully), it's readable, and isn't bottlenecking performance.

There's nothing in the original code that is inherently wrong. It doesn't warrant CI going red, nor does it justify the extra work of rewriting it.

Outside of the type-annotation context, if a contributor opened a PR with those changes, we might accept the changes but we might also reply "thanks but no thanks, this doesn't bring enough improvement to be considered".

pmeier · 2021-07-15T15:10:58Z

but we might also reply "thanks but no thanks, this doesn't bring enough improvement to be considered".

I don't understand. If someone opens a PR fixing a typo in the documentation would you also consider turning it down because it "doesn't bring enough improvement to be considered"?

If no, what is different in this case? This is also easily reviewed and 100% covered by tests.

NicolasHug · 2021-07-15T15:19:32Z

No, typo fixes are net and clear improvements IMO. This re-writing isn't, or at least it's very, very marginal. The fact that mypy systematically forces us to address those things which provide very little gain is a concern to me.

frgfm · 2021-07-19T17:48:31Z

No, typo fixes are net and clear improvements IMO. This re-writing isn't, or at least it's very, very marginal. The fact that mypy systematically forces us to address those things which provide very little gain is a concern to me.

Hi @NicolasHug,

I agree that the transition is troublesome sometimes, but from what I gather in #2025, typing annotations are being added for new pieces of code. The tricky part is that we would prefer the entire codebase to have typing, and fortunately, you guys have some contributors hoping to help with that 👍

So yes, this PR doesn't fix a typing issue, it slightly modifies the codebase so that we could, later on, enjoy typing over the entire codebase. If that has changed, happy to close this PR but then I guess we need to close or discuss a few things in #2025 🤷‍♂️

pmeier

LGTM, thanks @frgfm!

NicolasHug

Thanks @frgfm, LGTM. I hope my rant didn't deter you :). I have one question below, but I'll merge this one anyway.

NicolasHug · 2021-07-20T11:15:16Z

torchvision/datasets/ucf101.py

+        _video_height: int = 0,
+        _video_min_dimension: int = 0,
+        _audio_samples: int = 0
+    ) -> None:


I'm just curious here: were you using something (an IDE, a type checker (mypy??) or something else) that was prompting you to type the return of __init__?

If yes, I'm curious to know which one, and why they would do that. But if not, I would recommend not to type those in future PRs, as this is essentially useless.

I have the same question/remark for typing the return type of __len__ below actually

frgfm

Actually, I wasn't, since I use Sublime Text.
But it's not the first PR I've opened regarding typing on this repo, and the last time I was asked to add those. Happy to change that in future PRs, let me know what you prefer :)

pmeier

Currently mypy probably accepts those, but if we run with disallow_untyped_defs it will yell at us.

NicolasHug

Thanks for the details. OK, if mypy is that needy, I guess we have to surrender and keep typing __init__.
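
For reference, a small sketch of the annotations being discussed. The class `TinyDataset` is hypothetical and not part of torchvision; it only shows what `-> None` on `__init__` and `-> int` on `__len__` look like. Whether a given mypy configuration actually requires them is not something this sketch settles.

```python
from typing import List


class TinyDataset:
    def __init__(self, items: List[str], _video_height: int = 0) -> None:
        # __init__ always returns None; the annotation only states it explicitly.
        self.items = items
        self._video_height = _video_height

    def __len__(self) -> int:
        # len() requires an int, so the annotation mirrors the runtime contract.
        return len(self.items)


ds = TinyDataset(["a.avi", "b.avi"])
```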

Reviewed By: fmassa Differential Revision: D29932701 fbshipit-source-id: b36efaa3365fc46857271ca07aa536851b6af8db

style: Added missing typing annotations

2722187

facebook-github-bot added the cla signed label Jul 12, 2021

chore: Fixed missing import

468d2aa

pmeier self-requested a review July 14, 2021 12:31

pmeier requested changes Jul 15, 2021

frgfm added 2 commits July 19, 2021 19:32

Merge branch 'master' into ucf101-typing

4245694

style: Fixed typing

dc2fcab

pmeier approved these changes Jul 20, 2021

NicolasHug approved these changes Jul 20, 2021

NicolasHug merged commit 642ad75 into pytorch:master Jul 20, 2021

NicolasHug added code quality module: datasets labels Jul 20, 2021

frgfm deleted the ucf101-typing branch July 20, 2021 14:41

facebook-github-bot pushed a commit that referenced this pull request Jul 27, 2021

[fbsync] Added missing typing annotations in datasets/ucf101 (#4171)

8f0bf1f

frgfm mentioned this pull request Jul 28, 2021

Type annotations #2025

Closed
