Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Type annotations for early stage compiler functions #1504

Merged
merged 9 commits into from Jul 2, 2019

Conversation

@davesque
Copy link
Contributor

davesque commented Jun 27, 2019

What I did

Added type annotations for a number of functions encountered during the first stages of invoking the vyper binary. Also did some refactors to facilitate type checking and annotation.

How to verify it

Run the tests. Reviewers might also find it helpful to look at each commit individually instead of the entire branch diff.

Description for the changelog

  • The API of vyper.compile_codes has been simplified to always return an OrderedDict mapping vyper code file names to their respective compilation results. This means that this function no longer accepts an output_type keyword argument. Users wishing to duplicate the functionality of passing output_type='list' should convert the results of this function to a list like so: list(compile_codes(...).values()).

Cute Animal Picture

Put a link to a cute animal picture inside the parenthesis-->

davesque added 6 commits Jun 26, 2019
Improves readability and passes mypy.
@davesque davesque force-pushed the davesque:type-checking branch from b262833 to c531068 Jun 27, 2019
@davesque

This comment has been minimized.

Copy link
Contributor Author

davesque commented Jun 27, 2019

Actually, I'll just say that this PR is ready for review. I have some other changes that build on top of this but I don't want people to feel like they have to read through too many commits. I'll make those as another PR.

@davesque davesque changed the title [WIP] Type annotations for early stage compiler functions Type annotations for early stage compiler functions Jun 27, 2019
@davesque davesque requested review from pipermerriam and jacqueswww Jun 27, 2019
davesque added 2 commits Jun 27, 2019
@davesque davesque force-pushed the davesque:type-checking branch from 216b69a to 2403d26 Jun 27, 2019
@jacqueswww

This comment has been minimized.

Copy link
Collaborator

jacqueswww commented Jun 27, 2019

I am afraid you'll have to revert the compile_codes interface work as it's now a solidified compiler interface for tools to use and the change is not backwards compatible.
Looks good for the annotations and replacements with compile_code where we could use that.

bin/vyper Outdated

def uniq(seq: Iterable[T]) -> List[T]:
"""
Returns unique items in ``seq`` in order.

This comment has been minimized.

Copy link
@cburgdorf

cburgdorf Jun 27, 2019

Contributor

Nitpick: We use imperative present tense among the rest of the Python libraries and I think this should also follow this. (Same as Python docs and many others)

https://github.com/ethereum/snake-charmers-tactical-manual/blob/master/documentation.md#grammar

Return unique items in seq in order

As a rule of thumb, imperative present tense comes out when it fits in this:

If we call this API it will: __________________________

bin/vyper Outdated
T = TypeVar('T')


def uniq(seq: Iterable[T]) -> List[T]:

This comment has been minimized.

Copy link
@cburgdorf

cburgdorf Jun 27, 2019

Contributor

I'm aware this has nothing to do with this PR and should not be addressed here but it would probably be a good idea to not turn the Iterable[T] into a List[T] and instead do one of these things:

  1. Either just yield the items so that we return an Iterable[T] and have consumers deal with conversion if needed
    or
  2. Return a Tuple[T, ...] instead of a List[T] to not return something that is mutable.

This comment has been minimized.

Copy link
@davesque

davesque Jun 27, 2019

Author Contributor

@cburgdorf I like the yielding suggestion, thanks!

This comment has been minimized.

Copy link
@cburgdorf

cburgdorf Jun 27, 2019

Contributor

Great! Minor further suggestion. You could change the type annotation from Generator[T, None, None] to Iterable[T] which is basically a subset of a generator and looks a bit cleaner imho. Basically our convention in Trinity is to use Iterator[T] instead of Generator[T, None, None] whenever you are only interested in the yield type but not in the send type or return type (e.g. not using generator.send(something))

@davesque

This comment has been minimized.

Copy link
Contributor Author

davesque commented Jun 27, 2019

@jacqueswww Can you give me an example of why the API of compile_codes can't be changed? Vyper is pre-release software and I would assume these kinds of changes are still on the table. It also seems like compile_codes would be used far less often than command-line invocation of the vyper script. Is its use common enough that changing it is difficult?

Practically speaking, having a function return one type or another depending on the input makes working with mypy difficult. Also, having the API work that way makes it harder to remember how the code behaves.

@jacqueswww

This comment has been minimized.

Copy link
Collaborator

jacqueswww commented Jun 27, 2019

@davesque vyper-debug and vyper online off the top of my head. Last time we changed it many things ended up breaking and people popped on gitter that we had to handle on a case by case basis. I agree it's better to do it the above way, however I don't see mypy as worth breaking backwards compatibility change i.e. I don't see it as critical to have OrderedDict vs. having to find all projects that use it this way currently.

@fubuloubu

This comment has been minimized.

Copy link
Member

fubuloubu commented Jun 27, 2019

Can we ignore it for now?

@davesque

This comment has been minimized.

Copy link
Contributor Author

davesque commented Jun 27, 2019

@jacqueswww @fubuloubu Perhaps we can raise a deprecation warning if the arg is provided? I don't agree that mypy is not worth making backwards incompatible changes. It helps tremendously with code readability and is a very worthy endeavor, even if there are some growing pains in adopting it.

@fubuloubu

This comment has been minimized.

Copy link
Member

fubuloubu commented Jun 27, 2019

Is there a way to have it support both methods if that is the case?

@cburgdorf

This comment has been minimized.

Copy link
Contributor

cburgdorf commented Jun 27, 2019

I don't agree that mypy is not worth making backwards incompatible changes. It helps tremendously with code readability and is a very worthy endeavor, even if there are some growing pains in adopting it.

I can't comment on the specific case (other than that functions with variable return types make me sad 😅 ) but I 💯 % agree that adopting mypy is absolutely worth it. In the long run, it make the code base much more robust. When we started adopting mypy in Trinity we've found plenty of small bugs just by adding type hints! Also it makes refactoring code so much simpler and safer. Mypy does catch lots of things that are easy to go unnoticed otherwise.

@fubuloubu

This comment has been minimized.

Copy link
Member

fubuloubu commented Jun 27, 2019

I think it may make sense to adopt the change here, but in a separate PR and perhaps alongside the suggestion to add the vyper-json script that I now realize doesn't have an issue for it.

Those impacted by the change would probably be happy to find that we added a new script to help them with their use case.

@jacqueswww

This comment has been minimized.

Copy link
Collaborator

jacqueswww commented Jun 27, 2019

I think you guys are misunderstanding what I am saying. I am not saying mypy shouldn't be adopted, I am saying that a single use case that is difficult to annotate is, should rather be difficult to annotate than force backwards incompatibility. I see compatibility to outwards facing APIs as critical to the usage of vyper, we can make the switch but will have to be deprecation warning first for b11 and then disabled in b12.

We changed the interface before and it was painful, just trying to learn from previous mistakes.

@jacqueswww

This comment has been minimized.

Copy link
Collaborator

jacqueswww commented Jun 27, 2019

@cburgdorf @davesque

OK (bear with me here) so I am not sure why in this specific case it's a problem and am trying to understand something:
I ask the the function to provide me either of 2 output types: a list or a dict format. From the calling perspective it's very clear what going on, and this is even clearer than a type annotation because it happens at point of calling. Looking over the function again think the only change I would make is make output type a mandatory kwarg parameter. But if you have function that says compile_codes(codes, output_type='dict') or perhaps a function that you pass formatter into render(result, formater=Formatter) Why not expect different types as a result?

Why is it a bad practice to return different types from a function, if you ask for that type at calling time? I can see the argument if the output is not dependent on the call parameters - as you have no idea how to handle a call. But being able to this in python is great, and to me has a lot of utility.

@jacqueswww

This comment has been minimized.

Copy link
Collaborator

jacqueswww commented Jun 27, 2019

Good news: I scanned most implementations I could find calling compile_codes and they don't use list type, so we can probably just #yolo it.

@davesque

This comment has been minimized.

Copy link
Contributor Author

davesque commented Jun 27, 2019

@jacqueswww That's convenient if true. Are you remembering that the default result type was list? I ask because I also reviewed a number of call sites for compile_codes in the vyper codebase. Then, I realized that I'd forgotten that the default return type was list and so I had to go back and re-examine everything.

As for the exact reason why it's not convenient, here's a commit that causes mypy to complain (I didn't give an annotation for the interface_codes arg because I'm still trying to figure out the best way to do that): davesque@f36931b .

Here's one of the errors that mypy throws that relates to the return statement in the implementation of compile_code:

vyper/compiler.py:271: error: No overload variant of "__getitem__" of "list" matches argument type "str"
vyper/compiler.py:271: note: Possible overload variants:
vyper/compiler.py:271: note:     def __getitem__(self, int) -> Dict[str, Any]
vyper/compiler.py:271: note:     def __getitem__(self, slice) -> List[Dict[str, Any]]

The problem is that the return type could be a list or a dict, so it has to be a Union of those two types. But then mypy correctly complains that a str value isn't a correct value to use as an index into the result of compile_codes because the result could be a list and a str wouldn't work. So mypy is requiring that every type that is part of that union type will implement a __getitem__ that accepts strings. I think you can fix things like that by adding an isinstance check just above the return statement but it's better to avoid doing that everywhere.

Those sorts of problems will come up for any kind of function that potentially returns different types. I think it's good that mypy complains about stuff like this because it highlights an API that is sort of ambiguous and potentially a source of bugs. A possibly better approach would be to just have two different, appropriately named, functions that return results with the desired structure.

As for functions which take formatters as args, I guess I would try to design such an API so that the formatter only modifies the structure of the result within a single type. So a function which does string templating or something might accept a formatter which could build the result differently, but the result would still be a string either way.

@jacqueswww

This comment has been minimized.

Copy link
Collaborator

jacqueswww commented Jun 27, 2019

Yes so far I can trace most use the compile_code will check tomorrow again (a bit late here and could've checked wrong).

@cburgdorf

This comment has been minimized.

Copy link
Contributor

cburgdorf commented Jun 28, 2019

I ask the the function to provide me either of 2 output types: a list or a dict format. From the calling perspective it's very clear what going on, and this is even clearer than a type annotation because it happens at point of calling

You are generally right that reading compile_codes(codes, output_type='dict') is clear to a human reader.

However, it isn't clear enough to be useful to perform static analyzing on it. A static analyzer can't tell the difference between compile_codes(codes, output_type='dict') or compile_codes(codes, output_type='dicct').

Sure compile_codes will throw an exception if output_type is not with the expected values but this is a runtime thing. Mypy is all about catching errors during development time.

However, there is a way to have a function return different types based on its inputs which is entirely type safe and statically analyzable. E.g. you can have a function parse_to_unicorn that returns a type T where the actual type behind T changes based on the input parameter parser.

def parse_to_unicorn(raw: bytes, parser: UnicornParser[T]) -> T:
    return parser.parse(raw)

But because this code uses generics, mypy knows exactly the return type of parse_to_unicorn at any time based on the parameters it was called with.

Here's the entire code for that.

from abc import ABC, abstractmethod
from typing import Generic, Tuple, TypeVar


T = TypeVar("T")


class UnicornParser(Generic[T], ABC):
    @abstractmethod
    def parse(self, raw: bytes) -> T:
        ...


class TupleUnicornParser(UnicornParser[Tuple[str, ...]]):
    def parse(self, raw: bytes) -> Tuple[str, ...]:
        return ("u", "n", "i", "c", "o", "r", "n")


class StringUnicornParser(UnicornParser[str]):
    def parse(self, raw: bytes) -> str:
        return "unicorn"


def parse_to_unicorn(raw: bytes, parser: UnicornParser[T]) -> T:
    return parser.parse(raw)


first_unicorn = parse_to_unicorn(b"", TupleUnicornParser())

# mypy knows that `first_unicorn` is of type `Tuple[str, ...]`

print(first_unicorn)  # prints: ('u', 'n', 'i', 'c', 'o', 'r', 'n')

second_unicorn = parse_to_unicorn(b"", StringUnicornParser())

# mypy knows that `second_unicorn` is of type `str`

print(second_unicorn)  # prints: unicorn

# mypy rightfully rejects this with: `Unsupported operand types for + ("Tuple[str, ...]" and "str")
# first_unicorn + second_unicorn
@jacqueswww

This comment has been minimized.

Copy link
Collaborator

jacqueswww commented Jun 28, 2019

@cburgdorf Looking at the above example - does this mean that the only way to return different types in the mypy sphere is by creating extra classes using generics?

How would it work if I wanted to pass a function pointer instead of a class in the above example? ie. it's after all a class with only one function.

@cburgdorf

This comment has been minimized.

Copy link
Contributor

cburgdorf commented Jun 28, 2019

Looking at the above example - does this mean that the only way to return different types in the mypy sphere is by creating extra classes using generics?

Nope, generics work both for classes and functions. I just did it class based to show case a more complete example.

I'm with you that classes with single functions are generally a smell and are often better to be replaced with standalone functions.

The generics work out the same.

from typing import Callable, Tuple, TypeVar


T = TypeVar("T")


def tuple_unicorn_parser(raw: bytes) -> Tuple[str, ...]:
    return ("u", "n", "i", "c", "o", "r", "n")


def string_unicorn_parser(raw: bytes) -> str:
    return "unicorn"


def parse_to_unicorn(raw: bytes, parser: Callable[[bytes], T]) -> T:
    return parser(raw)


first_unicorn = parse_to_unicorn(b"", tuple_unicorn_parser)

# mypy knows that `first_unicorn` is of type `Tuple[str, ...]`

print(first_unicorn)  # prints: ('u', 'n', 'i', 'c', 'o', 'r', 'n')

second_unicorn = parse_to_unicorn(b"", string_unicorn_parser)

# mypy knows that `second_unicorn` is of type `str`

print(second_unicorn)  # prints: unicorn

# mypy rightfully rejects this with: `Unsupported operand types for + ("Tuple[str, ...]" and "str")
# first_unicorn + second_unicorn
@jacqueswww

This comment has been minimized.

Copy link
Collaborator

jacqueswww commented Jun 28, 2019

Ok that's great :)

@davesque

This comment has been minimized.

Copy link
Contributor Author

davesque commented Jun 28, 2019

@jacqueswww @cburgdorf Yeah, I should have mentioned generics in the context of the formatter example.

@davesque davesque force-pushed the davesque:type-checking branch from 052e8c6 to edbff67 Jun 28, 2019
@davesque

This comment has been minimized.

Copy link
Contributor Author

davesque commented Jul 1, 2019

@jacqueswww Just wanted to ping you on this. Still feel like doing a #yolo on it?

@jacqueswww

This comment has been minimized.

Copy link
Collaborator

jacqueswww commented Jul 1, 2019

Yes compile_code output remains the same so we can go ahead.

@davesque

This comment has been minimized.

Copy link
Contributor Author

davesque commented Jul 2, 2019

I believe I also included all of @cburgdorf 's suggestions. I'll go ahead and merge this.

@davesque davesque merged commit 18d140f into ethereum:master Jul 2, 2019
3 checks passed
3 checks passed
ci/circleci: lint Your tests passed on CircleCI!
Details
ci/circleci: py36-core Your tests passed on CircleCI!
Details
ci/circleci: py37-core Your tests passed on CircleCI!
Details
@davesque davesque deleted the davesque:type-checking branch Jul 3, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.