Style inconsistency makes it hard to contribute using pycharm #205

PiotrCzapla · 2018-03-10T18:17:57Z

Hi Jeremy,

I know you have particularly strong opinion regarding the code style that suites data science, and I fully support you. Normally I would refrain from starting a discussion about such unimportant topic.
However I would like to help document and fix bugs in the library and I've noticed that pycharm cleans up the files automatically to adhere to PEP 8, which makes it hard to use it to make clean commits.

To address this I would like to create a style configuration that reflects your conventions for pycharm and for one of the linters (pylint or flake8), and then clean up all inconsistencies such linter would find.

I know you use mnemonics in names which make the code shorter, which I find nice to read. I guess that Indent 2 is also chosen to make the code shorter. But how about other parts of pep8. Like whitespace, blank lines (especially after if/else), imports etc? Would you consider adhering to some part of the PEP8 to make it easier for others to contribute and to make it easier to configure the linters?

Here is how an example 'resize' function would look like after applying PEP8 advice:

How it is now in dataset.py

    def resize(self, targ, new_path):
        new_ds = []
        dls = [self.trn_dl,self.val_dl,self.fix_dl,self.aug_dl]
        if self.test_dl: dls += [self.test_dl, self.test_aug_dl]
        else: dls += [None,None]
        t = tqdm_notebook(dls)
        for dl in t: new_ds.append(self.resized(dl, targ, new_path))
        t.close()
        return self.__class__(new_ds[0].path, new_ds, self.bs, self.num_workers, self.classes)

After following PEP8

      def resize(self, targ, new_path):
        """Return a copy of ImageData with resized images   
        
        The images are cached in {self.path}/{new_path}/{targ}/

        :param targ: the new size (sz) to resize images to [targ, trag].
        :param new_path: the directory where to store resized images relative to self.path
        :return: a new ImageData that represents directory with resized images
        """
        
        new_ds = []

        dls = [self.trn_dl, self.val_dl, self.fix_dl, self.aug_dl]
        if self.test_dl:
            dls += [self.test_dl, self.test_aug_dl]
        else:
            dls += [None, None]

        t = tqdm_notebook(dls)
        for dl in t:
            new_ds.append(self.resized(dl, targ, new_path))
        t.close()

        return self.__class__(new_ds[0].path, new_ds, self.bs, self.num_workers, self.classes)

Btw there seems to be a bug in theresize_imgs used by ImageData.resize as the new_path isn't being used. (that for different issue)

btw. here are screen showing all stylistic issues pycharm has found:

The text was updated successfully, but these errors were encountered:

jph00 · 2018-03-13T16:50:17Z

Thanks for the thoughtful discussion! IIRC you can configure pycharm to disable this functionality - with most jetbrains stuff you can do so on a per-project basis, although I can't recall if pycharm allows that or not.

The main thing I note in the change you show is that it uses up a lot more vertical space, which I certainly don't like. Also, I often try to line things up vertically when 2 lines have similar params, which means inserting some extra spaces here and there. I wouldn't want to have a linter remove that manual formatting.

Since I generally prefer to format code based on the semantics and structure of the specific code I'm looking at, rather than based on automatic rules, I'm not sure how well auto-linting will work with this project - at least for formatting. Many of the other issues mentioned in your screenshot look like things I'd be happy to fix, although they'd need to be tested carefully of course.

PiotrCzapla · 2018-03-13T22:21:39Z

You've got me, the pycharm argument was a good reason, not the true one. Moreover my "PEP8 example" had more many blank lines than required by PEP8. (I've posted a corrected version at the end)

The point is that I could see your library being widely adopted I love the idea of an accessible deep learning library that can be used by regular programmers, with a sensible set of defaults. But a regular python programmer is used to PEP8, and it is hard to shake off, especially when most of the python libraries (including TF & pytorch) follow this style to some extent. So there will be a constant tension, that I'd like to minimize.

The argument about autoformatting was a bad turn, It was just to give you a good rationale to talk about the styles without going into the discussion around personal tastes and adoption. You are right formatting the code automatically won't work and I wasn't even thinking about it.

I meant configuring linters to return warnings that you truly want to fix. When It returns 8k warnings it isn't helpful. If we configure the pep8 linter then we can use it to create a consistent style for the project that can be learned by contributors, it autocorrect the code.

To create a consistent style we can see what types of warnings we have in the code and decide what to correct and which checks to disable:

$ pep8 --statistics -qq . | sort -nr
3110    E231 missing whitespace after ','
1938    W191 indentation contains tabs
669     E501 line too long (83 > 79 characters)
619     E261 at least two spaces before inline comment
266     E302 expected 2 blank lines, found 1
188     E701 multiple statements on one line (colon)
164     E225 missing whitespace around operator
142     E128 continuation line under-indented for visual indent
130     E111 indentation is not a multiple of four
55      E301 expected 1 blank line, found 0
54      W293 blank line contains whitespace
22      W391 blank line at end of file
20      E265 block comment should start with '# '
13      W291 trailing whitespace
10      E401 multiple imports on one line
9       E202 whitespace before ']'
8       E114 indentation is not a multiple of four (comment)
6       E266 too many leading '#' for block comment
6       E221 multiple spaces before operator
6       E127 continuation line over-indented for visual indent
5       E201 whitespace after '('
4       E702 multiple statements on one line (semicolon)
4       E402 module level import not at top of file
4       E116 unexpected indentation (comment)
4       E101 indentation contains mixed spaces and tabs
3       W503 line break before binary operator
3       W292 no newline at end of file
3       E303 too many blank lines (2)
3       E272 multiple spaces before keyword
3       E211 whitespace before '['
3       E203 whitespace before ':'
2       E251 unexpected spaces around keyword / parameter equals
2       E122 continuation line missing indentation or outdented
1       E712 comparison to True should be 'if cond is True:' or 'if cond:'
1       E711 comparison to None should be 'if cond is None:'
1       E703 statement ends with a semicolon
1       E502 the backslash is redundant between brackets
1       E271 multiple spaces after keyword
1       E228 missing whitespace around modulo operator
1       E222 multiple spaces after operator
1       E124 closing bracket does not match visual indentation

I guess you don't care about the infrequent issues and we can assume that all of the should be corrected. So let's talk about the most frequrent issues:

3110 E231 missing whitespace after ','

I guess you don't mind having that corrected. For me, it is a sign of clean code so I would love to see it fixed.

1938 W191 indentation contains tabs

I bet you had your reasoning for that, and I don't think ppl will care too much about this. Although the reason why PEP recommends spaces is to avoid having it mixed with spaces as this can introduce errors that are hard to notice, especially for newbies. Correcting this would be a pain so I would keep it. I think there will be another warning if the tabs and spaces are mixed, so this warning can be safely disabled.

669 E501 line too long (83 > 79 characters)

I bet no-one cares 4 characters. The only reason to correct this is to be able to enforce the 80c rule in the future.

619 E261 at least two spaces before inline comment

That seems to be caused by commented code or the alignment you were talking about. I would turn that warning off.

266 E302 expected 2 blank lines, found 1

This is at the end of the classes, I would love that to be corrected. It helps me locate places where a class ends.

188 E701 multiple statements on one line (colon)

That is one major readability issue for me, I think it is the second reason after E231 why I started this issue.

164 E225 missing whitespaces around operator

I looked at few examples and I guess you are fine to get that corrected. Here they are:

*xs,y=args should become:
*xs, y = args
self.n_emb, self.n_cont=n_emb, n_cont should become
self.n_emb, self.n_cont = n_emb, n_cont

142 E128 continuation line under-indented for visual indent

Here is an example:

    def pretrained(cls, f, data, ps=None, xtra_fc=None, xtra_cut=0, custom_head=None, precompute=False, **kwargs):
        models = ConvnetBuilder(f, data.c, data.is_multi, data.is_reg,
            ps=ps, xtra_fc=xtra_fc, xtra_cut=xtra_cut, custom_head=custom_head)
        return cls(data, models, precompute, **kwargs)

should become:

    def pretrained(cls, f, data, ps=None, xtra_fc=None, xtra_cut=0, custom_head=None, precompute=False, **kwargs):
        models = ConvnetBuilder(
            f, data.c, data.is_multi, data.is_reg,
            ps=ps, xtra_fc=xtra_fc, xtra_cut=xtra_cut, custom_head=custom_head)
        return cls(data, models, precompute, **kwargs)

or

    def pretrained(cls, f, data, ps=None, xtra_fc=None, xtra_cut=0, custom_head=None, precompute=False, **kwargs):
        models = ConvnetBuilder(f, data.c, data.is_multi, data.is_reg,
                                ps=ps, xtra_fc=xtra_fc, xtra_cut=xtra_cut,
                                custom_head=custom_head)
        return cls(data, models, precompute, **kwargs)

130 E111 indentation is not a multiple of four

I guess we can have this corrected if it does not destroy vertical alignment.

Here is a corrected version of PEP8 code. I hope it isn't that bad anymore.

def resize(self, targ, new_path):
        """Return a copy of ImageData with resized images cached in {self.path}/{new_path}/{targ}/."""
        new_ds = []
        dls = [self.trn_dl, self.val_dl, self.fix_dl, self.aug_dl]
        if self.test_dl:
            dls += [self.test_dl, self.test_aug_dl]
        else:
            dls += [None, None]
        t = tqdm_notebook(dls)
        for dl in t:  new_ds.append(self.resized(dl, targ, new_path))
        t.close()
        return self.__class__(new_ds[0].path, new_ds, self.bs, self.num_workers, self.classes)

Let me know which PEP8 checks would you like to disable. Does the list I've suggested make sense?

jph00 · 2018-03-14T13:53:19Z

But a regular python programmer is used to PEP8, and it is hard to shake off, especially when most of the python libraries (including TF & pytorch) follow this style to some extent. So there will be a constant tension, that I'd like to minimize.

Since I deeply dislike PEP8 and the entire culture around bike-shedding and lack of open-mindedness in much of the python community, I think this tension might be hard to avoid ;)

I meant configuring linters to return warnings that you truly want to fix.

Yup. Although I haven't found any linters that are thoughtful enough to actually work the way I want for nearly any language, unfortunately.

To create a consistent style we can see what types of warnings we have in the code and decide what to correct and which checks to disable:

Thanks this is helpful

3110 E231 missing whitespace after ','

This is intentional. Nearly every class has something like:

self.x,self.y = x,y

The spacing clearly separates LHS from RHS. Adding a space after commas here makes it less readable (to me).

1938 W191 indentation contains tabs

That is not intentional. Where do we have tabs? I thought my vimrc automatically fixes that.

669 E501 line too long (83 > 79 characters)

The 80 char line rule is a hangover from a long-gone era!

266 E302 expected 2 blank lines, found 1

Many classes are just a line or two, and often they're put right next to each other so you can clearly see how they are similar/different. Only classes longer than a about screen should have 2 blank lines.

188 E701 multiple statements on one line (colon)

I much prefer this:

if a: do_a()
else: do_b()

Generally, where the 2 parts of a conditional or try block fit on a line and don't have too much going on, I prefer to have them together.

164 E225 missing whitespaces around operator

For each operator it depends on the context. Generally if there aren't spaces around '=' it's probably a mistake. But for math equations I try to lay it out closest to how it looks in a paper. Often that means no spaces around some operators.

142 E128 continuation line under-indented for visual indent

Thanks for the example. Replacing 2 lines of code here with 3 means more vertical space is used for no (IMO) good reason, so I'd rather not do that.

130 E111 indentation is not a multiple of four

Generally I try to lay things out in a way that maximizes ability to understand the code block with jumping around. I'm sure there are examples where I've done a suboptimal job and am happy to fix them on a case by case basis.

Here is a corrected version of PEP8 code. I hope it isn't that bad anymore.

It's better, although I still don't like the conditional.

Let me know which PEP8 checks would you like to disable. Does the list I've suggested make sense?

I can't see any checks here which could be automated without breaking things, except perhaps for the tabs (I'd want to see what was triggering that first though).

Thanks again for your careful analysis!

PiotrCzapla · 2018-03-14T14:20:40Z

Ok fair enough no more bike-sheeding :).

I've learned enough to be able to replicate your style where it matter so that I can try to contribute. I will set PEP8 to ignore what is intentional, and correct all the rest and we see where it get us.

The following warnings are going to be disabled:

3110 E231 missing whitespace after ','
669 E501 line too long (83 > 79 characters)
266 E302 expected 2 blank lines, found 1
188 E701 multiple statements on one line (colon)
142 E128 continuation line under-indented for visual indent

The rest of warnings stay, with intention to get rid of them or put them on the list above.
That includes:

1938 W191 indentation contains tabs
164 E225 missing whitespaces around operator
130 E111 indentation is not a multiple of four

Let me think how I can best reflect this discussion in the linter configurations / tests etc. and I will propose something in a PR.

jph00 · 2018-03-14T17:16:28Z

Thanks - I think this issue will be a useful reference for the future, in the absence (for now) of a fastai style-guide.

PiotrCzapla · 2018-03-14T17:56:41Z

btw. pytorch managed to used auto pep8, some time in 2017, I will try to configure it according to your preferences and we see what results we get.
pytorch/pytorch@e7c1e6a

tyoc213 · 2018-03-21T16:08:27Z

If you ignore PEP8 warnings 7497-6838 =659 things to fix.

https://github.com/hhatto/autopep8 is nice and you can configure things to ignore https://github.com/pytorch/pytorch/blob/e7c1e6a8e39df0d206efe247f5eb0481eb8b8b6c/setup.cfg

maybe you can focus in what to ignore from the top of the warnings (because they show the way a programmer do things, or the more prevalent things on fast ai source code)

$ pep8 --statistics -qq . | sort -nr
3110    E231 missing whitespace after ','
1938    W191 indentation contains tabs
669     E501 line too long (83 > 79 characters)
619     E261 at least two spaces before inline comment
266     E302 expected 2 blank lines, found 1
188     E701 multiple statements on one line (colon)
164     E225 missing whitespace around operator
142     E128 continuation line under-indented for visual indent
130     E111 indentation is not a multiple of four
55      E301 expected 1 blank line, found 0
54      W293 blank line contains whitespace

I think W191 should be fixed automatically because it contains tabs! :).

which is the largest line in the code?

So your configure could be some like

[pep8]
max-line-length = 512
ignore = E231

[flake8]
max-line-length = 512

with this you will skip 3110 some,other and 669 lines to loong. 3779 skips of 7497. Fixing 3718, maybe it will be nice to configure it and run it one time, so in the future the configuration could be checked.

Remember PEP8 is a guidance, but if your repo does have some specifics, they could be keep in place, you only need to configure things out, so that all people could follow and check if needed... that means that you can choose to ignore (maybe corrects the ones with less than 100 ocurrences in code, or maybe less than 50).

We should test if pycharm follows this warnings turn offs in automatic (no need to manually shut them down), if it does, it will be worth to do it.

oduvan · 2018-04-27T09:47:55Z

Since I deeply dislike PEP8 and the entire culture around bike-shedding and lack of open-mindedness in much of the python community, I think this tension might be hard to avoid ;)

if you are working on open-source project I would strongly recommend to learn PEP8 and what python community likes in general.

My personal and the most unfavorite part is import *

PEP8

Wildcard imports (from import *) should be avoided, as they make it unclear which names are present in the namespace, confusing both readers and many automated tools.

You say "Practical Deep Learning For Coders" - and at the same time writing you code in a way codes hate a lot :)

PiotrCzapla · 2018-04-28T09:07:42Z

@oduvan fastai is in early stages when you iterate a lot, the import * is a very useful approach at this stage. It can be later cleaned up quickly with pycharm or any other tool. I'm accustomed to pep8 and I find it easier to follow but read the style guide file: https://github.com/fastai/fastai/blob/master/docs/style.md

There is a lot of merit to the styles decisions and Jeremy explains it quite well. I think we will find a middle ground soon enough.

oduvan · 2018-04-29T13:05:49Z

when you iterate a lot, the import * is a very useful approach at this stage.

how so?

I haven't seen this kind approach in any project. When Open Source project has a very bad coding style at the very beginning in order to clean everything up later.

How long this project exists? 2 weeks?

I know, PEP8 is just a recommendations and some of them are not crucial (like line length), but I this using import * all over the place, just because it is easier to code like this - is bad idea. Because it become so much hard to read.

In order to save time on coding you can use PyCharm. It adds import lines automatically when you use functions

oduvan · 2018-04-29T13:40:35Z

Here is an easy example. I'm trying to understand how open_image function works.

https://github.com/fastai/fastai/blob/master/fastai/dataset.py#L218

From source code, I see that it is using cv2 object. I want to understand what this object actually means (or can I learn something about the object), I scroll up to the top of the module and here is what I see

https://github.com/fastai/fastai/blob/master/fastai/dataset.py#L3

from .imports import *
from .torch_imports import *
from .core import *
from .transforms import *

"thank you for choosing our airlines" (c)

PiotrCzapla · 2018-04-29T13:50:39Z

@oduvan, it looks you are quit passionate about good engineering practices and It seems that you want to help but you are put a back with the code smell. Maybe you even wonder why such awesome idea as fastai is being hampered by poor code readability?

On the other hand Jeremy and some fastai students are focussing on completly different topics like making the library the fastest in the world. Have you seen this results:

CIFAR from 1h in Jan April 2018 to 2 min in April 2018 and
ImageNet from 3h30m -> 2h57m and 128 less nodes - anything above Fastai result is on google's private cloud.

I'm still amazed with them, and if you are too help us with getting this library up to open-source standards. To do so we need good test coverage otherwise we are risking breaking things for Jeremy and others that are trying to promote this library.

If you want to help here is a pull request #286 where you will find how the test are being written.
I think you could start with the https://github.com/fastai/fastai/blob/master/fastai/core.py it has plenty of small functions that are used everywhere so good test there would be essential.

Re. cv2 - that opencv, simply click with pycharm on the object to jump to the right place. I know import * make things harder to follow but without test it is impossible to get rid of them, besides a lot of great minds that work with fast.ai use vim without plugins :( and they will get irritated when they would have to jump to the top to import every single thing they need. This can be cleaned up but first we need good tests.

oduvan · 2018-05-01T16:05:37Z

I would be more than happy to contribute, but let me finish the course first :)

jph00 closed this as completed Mar 23, 2018

borisdayma pushed a commit to borisdayma/fastai that referenced this issue Aug 18, 2020

fixes fastai#205 (fastai#209)

10dad0a

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Style inconsistency makes it hard to contribute using pycharm #205

Style inconsistency makes it hard to contribute using pycharm #205

PiotrCzapla commented Mar 10, 2018 •

edited

jph00 commented Mar 13, 2018

PiotrCzapla commented Mar 13, 2018

jph00 commented Mar 14, 2018

PiotrCzapla commented Mar 14, 2018

jph00 commented Mar 14, 2018 via email

PiotrCzapla commented Mar 14, 2018

tyoc213 commented Mar 21, 2018 •

edited

oduvan commented Apr 27, 2018

PiotrCzapla commented Apr 28, 2018

oduvan commented Apr 29, 2018

oduvan commented Apr 29, 2018

PiotrCzapla commented Apr 29, 2018 •

edited

oduvan commented May 1, 2018

Style inconsistency makes it hard to contribute using pycharm #205

Style inconsistency makes it hard to contribute using pycharm #205

Comments

PiotrCzapla commented Mar 10, 2018 • edited

jph00 commented Mar 13, 2018

PiotrCzapla commented Mar 13, 2018

jph00 commented Mar 14, 2018

PiotrCzapla commented Mar 14, 2018

jph00 commented Mar 14, 2018 via email

PiotrCzapla commented Mar 14, 2018

tyoc213 commented Mar 21, 2018 • edited

oduvan commented Apr 27, 2018

PiotrCzapla commented Apr 28, 2018

oduvan commented Apr 29, 2018

oduvan commented Apr 29, 2018

PiotrCzapla commented Apr 29, 2018 • edited

oduvan commented May 1, 2018

PiotrCzapla commented Mar 10, 2018 •

edited

tyoc213 commented Mar 21, 2018 •

edited

PiotrCzapla commented Apr 29, 2018 •

edited