Style inconsistency makes it hard to contribute using pycharm #205
Comments
Thanks for the thoughtful discussion! IIRC you can configure pycharm to disable this functionality - with most jetbrains stuff you can do so on a per-project basis, although I can't recall if pycharm allows that or not. The main thing I note in the change you show is that it uses up a lot more vertical space, which I certainly don't like. Also, I often try to line things up vertically when 2 lines have similar params, which means inserting some extra spaces here and there. I wouldn't want to have a linter remove that manual formatting. Since I generally prefer to format code based on the semantics and structure of the specific code I'm looking at, rather than based on automatic rules, I'm not sure how well auto-linting will work with this project - at least for formatting. Many of the other issues mentioned in your screenshot look like things I'd be happy to fix, although they'd need to be tested carefully of course. |
You've got me, the pycharm argument was a good reason, not the true one. Moreover, my "PEP8 example" had many more blank lines than required by PEP8. (I've posted a corrected version at the end.) The point is that I could see your library being widely adopted. I love the idea of an accessible deep learning library that can be used by regular programmers, with a sensible set of defaults. But a regular python programmer is used to PEP8, and it is hard to shake off, especially when most python libraries (including TF & pytorch) follow this style to some extent. So there will be a constant tension that I'd like to minimize. The argument about autoformatting was a bad turn; it was just to give you a good rationale to talk about the styles without going into a discussion of personal tastes and adoption. You are right, formatting the code automatically won't work, and I wasn't even thinking about it. I meant configuring linters to return only the warnings that you truly want to fix. When it returns 8k warnings it isn't helpful. If we configure the pep8 linter, then we can use it to create a consistent style for the project that can be learned by contributors, without having it autocorrect the code. To create a consistent style we can see what types of warnings we have in the code and decide what to correct and which checks to disable:
I guess you don't care about the infrequent issues and we can assume that all of them should be corrected. So let's talk about the most frequent issues:
I guess you don't mind having that corrected. For me, it is a sign of clean code so I would love to see it fixed.
I bet you had your reasoning for that, and I don't think ppl will care too much about this. Although the reason why PEP8 recommends spaces is to avoid having tabs mixed with spaces, as this can introduce errors that are hard to notice, especially for newbies. Correcting this would be a pain, so I would keep it. I think there will be another warning if the tabs and spaces are mixed, so this warning can be safely disabled.
I bet no one cares about 4 characters. The only reason to correct this is to be able to enforce the 80c rule in the future.
That seems to be caused by commented code or the alignment you were talking about. I would turn that warning off.
This is at the end of the classes, I would love that to be corrected. It helps me locate places where a class ends.
That is one major readability issue for me, I think it is the second reason after E231 why I started this issue.
I looked at a few examples and I guess you are fine with getting that corrected. Here they are:
Here is an example:
def pretrained(cls, f, data, ps=None, xtra_fc=None, xtra_cut=0, custom_head=None, precompute=False, **kwargs):
    models = ConvnetBuilder(f, data.c, data.is_multi, data.is_reg,
        ps=ps, xtra_fc=xtra_fc, xtra_cut=xtra_cut, custom_head=custom_head)
    return cls(data, models, precompute, **kwargs)
should become:
def pretrained(cls, f, data, ps=None, xtra_fc=None, xtra_cut=0, custom_head=None, precompute=False, **kwargs):
    models = ConvnetBuilder(
        f, data.c, data.is_multi, data.is_reg,
        ps=ps, xtra_fc=xtra_fc, xtra_cut=xtra_cut, custom_head=custom_head)
    return cls(data, models, precompute, **kwargs)
or
def pretrained(cls, f, data, ps=None, xtra_fc=None, xtra_cut=0, custom_head=None, precompute=False, **kwargs):
    models = ConvnetBuilder(f, data.c, data.is_multi, data.is_reg,
                            ps=ps, xtra_fc=xtra_fc, xtra_cut=xtra_cut,
                            custom_head=custom_head)
    return cls(data, models, precompute, **kwargs)
I guess we can have this corrected if it does not destroy vertical alignment.
Here is a corrected version of the PEP8 code. I hope it isn't that bad anymore.
def resize(self, targ, new_path):
    """Return a copy of ImageData with resized images cached in {self.path}/{new_path}/{targ}/."""
    new_ds = []
    dls = [self.trn_dl, self.val_dl, self.fix_dl, self.aug_dl]
    if self.test_dl:
        dls += [self.test_dl, self.test_aug_dl]
    else:
        dls += [None, None]
    t = tqdm_notebook(dls)
    for dl in t: new_ds.append(self.resized(dl, targ, new_path))
    t.close()
    return self.__class__(new_ds[0].path, new_ds, self.bs, self.num_workers, self.classes)
Let me know which PEP8 checks you would like to disable. Does the list I've suggested make sense? |
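On the tabs point in the comment above: a minimal sketch of why mixed indentation is worth guarding against. In Python 3 the compiler itself rejects indentation whose meaning depends on the tab width. The helper name `compiles` and the sample source are made up for illustration and are not taken from fastai.

```python
# Sketch: Python 3 raises TabError when a block's indentation mixes tabs and spaces
# in a way that depends on how wide a tab is.
mixed = "if True:\n\tx = 1\n        y = 2\n"  # second line uses a tab, third uses eight spaces


def compiles(source):
    """Return True if the source compiles, False if it mixes tabs and spaces inconsistently."""
    try:
        compile(source, "<example>", "exec")
        return True
    except TabError:
        return False


# The same code with the tab expanded to spaces is accepted.
spaces_only = mixed.replace("\t", "        ")

print(compiles(mixed))        # False
print(compiles(spaces_only))  # True
```

So a file that uses tabs consistently still runs; only the mixed case fails, which is why the separate "mixed tabs and spaces" check matters more than the "contains tabs" one.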
Since I deeply dislike PEP8 and the entire culture around bike-shedding and lack of open-mindedness in much of the python community, I think this tension might be hard to avoid ;)
Yup. Although I haven't found any linters that are thoughtful enough to actually work the way I want for nearly any language, unfortunately.
Thanks, this is helpful.
This is intentional. Nearly every class has something like:
The spacing clearly separates LHS from RHS. Adding a space after commas here makes it less readable (to me).
That is not intentional. Where do we have tabs? I thought my vimrc automatically fixes that.
The 80 char line rule is a hangover from a long-gone era!
Many classes are just a line or two, and often they're put right next to each other so you can clearly see how they are similar/different. Only classes longer than about a screen should have 2 blank lines.
I much prefer this:
Generally, where the 2 parts of a conditional or try block fit on a line and don't have too much going on, I prefer to have them together.
For each operator it depends on the context. Generally if there aren't spaces around '=' it's probably a mistake. But for math equations I try to lay it out closest to how it looks in a paper. Often that means no spaces around some operators.
Thanks for the example. Replacing 2 lines of code here with 3 means more vertical space is used for no (IMO) good reason, so I'd rather not do that.
Generally I try to lay things out in a way that maximizes the ability to understand the code block without jumping around. I'm sure there are examples where I've done a suboptimal job and am happy to fix them on a case by case basis.
It's better, although I still don't like the conditional.
I can't see any checks here which could be automated without breaking things, except perhaps for the tabs (I'd want to see what was triggering that first though). Thanks again for your careful analysis! |
Ok, fair enough, no more bike-shedding :). I've learned enough to be able to replicate your style where it matters so that I can try to contribute. I will set PEP8 to ignore what is intentional and correct all the rest, and we'll see where it gets us. The following warnings are going to be disabled:
The rest of the warnings stay, with the intention to get rid of them or put them on the list above.
Let me think how I can best reflect this discussion in the linter configurations / tests etc. and I will propose something in a PR. |
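One way such a configuration could be reflected in a test is sketched below: parse the linter report and fail only on codes that are not on the project's ignore list. This is a sketch under assumptions, not the setup proposed in the PR. The `IGNORED` set is a placeholder (the list agreed in this thread is not reproduced here), `remaining` is a hypothetical helper, and the report lines are made up. The line format is pycodestyle's default `path:row:col: CODE message`.

```python
import re

# Placeholder ignore list, for illustration only.
IGNORED = {"E231", "E501"}

# pycodestyle's default report format: path:row:col: CODE message
REPORT_LINE = re.compile(r"^(?P<path>.+?):(?P<row>\d+):(?P<col>\d+): (?P<code>[EW]\d+) (?P<text>.*)$")


def remaining(report_lines, ignored=IGNORED):
    """Return (path, row, code) for every warning whose code is not ignored."""
    kept = []
    for line in report_lines:
        match = REPORT_LINE.match(line)
        if match and match.group("code") not in ignored:
            kept.append((match.group("path"), int(match.group("row")), match.group("code")))
    return kept


# Made-up report lines, not real output from the fastai code base.
report = [
    "example.py:3:12: E231 missing whitespace after ','",
    "example.py:7:1: E302 expected 2 blank lines, found 1",
    "example.py:9:80: E501 line too long (95 > 79 characters)",
]

print(remaining(report))  # [('example.py', 7, 'E302')]
```

A test could then assert that `remaining(...)` is empty, so intentional style choices never fail the build while everything else does.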
Thanks - I think this issue will be a useful reference for the future,
in the absence (for now) of a fastai style-guide.
|
btw. pytorch managed to use autopep8 some time in 2017. I will try to configure it according to your preferences and we'll see what results we get. |
If you ignore some PEP8 warnings, autopep8 (https://github.com/hhatto/autopep8) is nice, and you can configure which ones to ignore (see https://github.com/pytorch/pytorch/blob/e7c1e6a8e39df0d206efe247f5eb0481eb8b8b6c/setup.cfg). Maybe you can decide what to ignore starting from the top of the warnings list (because those show the way the programmer does things, or the more prevalent things in the fastai source code).
Which is the longest line in the code? Your configuration could be something like:
With this you will skip 3110. Remember, PEP8 is guidance; if your repo has some specifics, they can be kept in place. You only need to configure things so that everyone can follow them and check when needed. That means you can choose what to ignore (and maybe correct the ones with fewer than 100 occurrences in the code, or maybe fewer than 50). We should test whether pycharm picks up these disabled warnings automatically (no need to turn them off manually); if it does, it will be worth doing. |
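The "correct the rare ones, decide on the frequent ones" idea above can be sketched as a tally over the linter report. This is only an illustration: `tally` and `split_by_frequency` are hypothetical helpers, the report lines are made up, and the threshold is lowered from 100 to 2 so the tiny sample shows both groups.

```python
from collections import Counter


def tally(report_lines):
    """Count warnings per code in lines shaped like 'path:row:col: CODE message'."""
    counts = Counter()
    for line in report_lines:
        _, _, rest = line.partition(": ")
        code = rest.split(" ", 1)[0]
        if code:
            counts[code] += 1
    return counts


def split_by_frequency(counts, threshold=100):
    """Split codes into (rare: just fix them, frequent: decide whether to ignore)."""
    rare = sorted(code for code, n in counts.items() if n < threshold)
    frequent = sorted(code for code, n in counts.items() if n >= threshold)
    return rare, frequent


# Made-up report lines, not real output from the fastai code base.
report = [
    "example.py:1:5: E231 missing whitespace after ','",
    "example.py:2:9: E231 missing whitespace after ','",
    "example.py:4:7: E231 missing whitespace after ','",
    "example.py:8:1: E302 expected 2 blank lines, found 1",
]

counts = tally(report)
print(counts.most_common())                      # [('E231', 3), ('E302', 1)]
print(split_by_frequency(counts, threshold=2))   # (['E302'], ['E231'])
```

The frequent codes are the ones most likely to reflect a deliberate habit, so they are the candidates for the ignore list rather than for bulk correction.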
If you are working on an open-source project I would strongly recommend learning PEP8 and what the python community likes in general. My personal and the most unfavorite part is PEP8
You say "Practical Deep Learning For Coders" - and at the same time you are writing your code in a way coders hate a lot :) |
@oduvan fastai is in its early stages, when you iterate a lot. There is a lot of merit to the style decisions, and Jeremy explains it quite well. I think we will find a middle ground soon enough. |
How so? I haven't seen this kind of approach in any project, where an open source project has a very bad coding style at the very beginning in order to clean everything up later. How long has this project existed? 2 weeks? I know PEP8 is just a set of recommendations and some of them are not crucial (like line length), but I this using In order to save time on coding you can use PyCharm. It adds import lines automatically when you use functions |
Here is an easy example. I'm trying to understand how the open_image function works. https://github.com/fastai/fastai/blob/master/fastai/dataset.py#L218 From the source code, I see that it is using the cv2 object. I want to understand what this object actually means (or whether I can learn something about the object), so I scroll up to the top of the module and here is what I see https://github.com/fastai/fastai/blob/master/fastai/dataset.py#L3
"thank you for choosing our airlines" (c) |
@oduvan, it looks like you are quite passionate about good engineering practices, and it seems that you want to help but you are put off by the code smell. Maybe you even wonder why such an awesome idea as fastai is being hampered by poor code readability? On the other hand, Jeremy and some fastai students are focusing on completely different topics, like making the library the fastest in the world. Have you seen these results:
I'm still amazed by them, and if you are too, help us with getting this library up to open-source standards. To do so we need good test coverage, otherwise we risk breaking things for Jeremy and others that are trying to promote this library. If you want to help, here is a pull request #286 where you will find how the tests are being written. Re. cv2 - that's opencv, simply click on the object in pycharm to jump to the right place. I know import * makes things harder to follow, but without tests it is impossible to get rid of them. Besides, a lot of great minds that work with fast.ai use vim without plugins :( and they will get irritated if they have to jump to the top to import every single thing they need. This can be cleaned up, but first we need good tests. |
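For readers without an IDE, the origin of a name that arrived through a star import can also be looked up at runtime. A minimal sketch, assuming only the standard library: `json` stands in for a module pulled in via `import *`, and `origin` is a hypothetical helper, not part of fastai.

```python
import inspect

# Simulate a module whose names arrive through a star import.
namespace = {}
exec("from json import *", namespace)


def origin(name, ns):
    """Return the name of the module that defines ns[name], or None if it cannot be determined."""
    module = inspect.getmodule(ns[name])
    return module.__name__ if module else None


print(origin("dumps", namespace))  # json
```

The same call works in a notebook or REPL on an already imported name, and for a module object such as cv2 it returns the module itself, so its `__name__` answers the "what is this object" question from the comment above.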
I would be more than happy to contribute, but let me finish the course first :) |
Hi Jeremy,
I know you have a particularly strong opinion regarding the code style that suits data science, and I fully support you. Normally I would refrain from starting a discussion about such an unimportant topic.
However I would like to help document and fix bugs in the library and I've noticed that pycharm cleans up the files automatically to adhere to PEP 8, which makes it hard to use it to make clean commits.
To address this I would like to create a style configuration that reflects your conventions for pycharm and for one of the linters (pylint or flake8), and then clean up all inconsistencies such linter would find.
I know you use mnemonics in names which make the code shorter, which I find nice to read. I guess that Indent 2 is also chosen to make the code shorter. But how about other parts of pep8. Like whitespace, blank lines (especially after if/else), imports etc? Would you consider adhering to some part of the PEP8 to make it easier for others to contribute and to make it easier to configure the linters?
Here is what an example 'resize' function would look like after applying PEP8 advice:
How it is now in dataset.py
After following PEP8
Btw, there seems to be a bug in the resize_imgs used by ImageData.resize, as the new_path isn't being used (that's for a different issue).
Btw, here are screenshots showing all the stylistic issues pycharm has found:
