Skip to content

Latest commit

 

History

History
96 lines (66 loc) · 11.3 KB

type_inference.md

File metadata and controls

96 lines (66 loc) · 11.3 KB

Type inference

Type inference, in other words deducing the type of an object is one of the keystones of Pyccel. It is what allows us to correctly declare our variables in the generated code and thereby obtain the expected results. The majority of the type inference is found in the semantic stage, however there are a few objects whose type cannot be deduced. The type of these objects is inferred from type annotations. As such there are a few steps of the type inference which occur in the syntactic stage.

Syntactic Stage

There are two types of type annotations that are found in Pyccel-compatible code. The first, simplest case is the case of Python type annotations. E.g:

a : int

The annotations on these objects are parsed by Python's ast module. This ensures that the code is syntactically correct. The object can be visited in the same way as any other node in the code.

This treatment is very simple, however it is not trivial to identify the result as a type annotation. As a result the resulting PyccelSymbol/DottedName/etc is stored as an element of the dtypes property of the class pyccel.ast.type_annotations.SyntacticTypeAnnotation. This allows all type annotations to be handled in the same function in the semantic stage regardless of whether they are function argument annotations, variable annotations, or something else.

The second type of annotation is annotations in strings. E.g:

a : 'int'

This kind of annotation is equally important as it allows us to add extra information (e.g. 'const int') or refer to types which are not fully initialised (e.g. passing as a class method argument an instance of the same class).

In order to ensure that the syntax inside the string is correct, the expression is parsed immediately in the syntactic stage. This is done using the package textx. This tool creates a parser for a language described by a file. The files describing the type language can be found in the folder pyccel/parser/grammar/. The results generated by textx are stored in auto-generated classes or classes defined in the folder pyccel/parser/syntax/. As these instances are created by textx we do not have full control over their contents. Notably this means that we cannot pickle the instance (pickling is used for header files to accelerate compilation). It is therefore important to transfer this information to a new class. The pyccel.ast.type_annotations.SyntacticTypeAnnotation class is used for this purpose. It stores all the information described by either a UnionTypeStmt or a Type (see pyccel/parser/grammar/headers.tx for definitions).

Therefore at the end of the syntactic stage, the syntax of all type annotations has been verified, and the raw information is stored in a pyccel.ast.type_annotations.SyntacticTypeAnnotation object ready to be treated in the semantic stage.

Semantic Stage

The semantic stage handles two types of type inference:

  1. Inferring types from type annotations.
  2. Inferring types from assignments.

Each of these cases must be handled in different ways however there are many unifying factors which come into play once the types are deduced. Namely the verification that the types are coherent (i.e. that we are not changing the type of an existing variable), and the creation of any necessary variables. Both of these are usually handled by the function pyccel.parser.semantic.SemanticParser._assign_lhs_variable.

Inferring types from type annotations

After the syntactic stage any type annotations should be stored in a pyccel.ast.type_annotations.SyntacticTypeAnnotation. The critical function is therefore pyccel.parser.semantic.SemanticParser._visit_SyntacticTypeAnnotation. This function converts the annotation to either a pyccel.ast.type_annotations.VariableTypeAnnotation or a pyccel.ast.type_annotations.FunctionTypeAnnotation. Union types exist in Python. As a result a type annotation may indicate more than one type. E.g:

a : (int | float)

In order to make it easier to handle the result of pyccel.parser.semantic.SemanticParser._visit_SyntacticTypeAnnotation, the VariableTypeAnnotations and FunctionTypeAnnotations are always stored in a pyccel.ast.type_annotations.UnionTypeAnnotation, even if there is only one possible type.

These objects contain all the information necessary to create a Variable from a type annotation.

Inferring types from assignments

When assignments occur in the code types must also be inferred. This allows new variables to be declared implicitly, and also enables us to verify that the types of existing variables do not change. In this case the type inference is done via the AST nodes. Each node contains the logic necessary to deduce its type information from the arguments passed to it. The resulting object, which is found on the right hand side of an assignment can be used to verify or define the type of the object on the left hand side of the assignment.

Data types in Pyccel

The data types in Pyccel are designed around two super classes:

  • PrimitiveType : Representing the category of datatype (integer/floating point/etc)
  • PyccelType : Representing the actual type of the object in Python

Subclasses of PyccelType generally fall into one of two categories:

  • FixedSizeType
  • ContainerType

The types can be compared using either the is operator or the == operator. These operators have slightly different behaviour. All instances of PyccelType are singletons so the is operator tests if the types are identical. However the == operator tests if the types are compatible. For example PythonNativeFloat() == NumpyFloat64Type() will return true. This operator should therefore be used when permissive behaviour is required (e.g. when adding elements to a list of PythonNativeFloat() we are capable of adding an instance with the type NumpyFloat64Type() even if this would not be strictly homogeneous in Python).

Fixed Size Type

A FixedSizeType is an object whose size in memory is known and cannot change from one instance to another (e.g. float64, int32, void). In most cases the developer will need the sub-class FixedSizeNumericType which refers to the subset of fixed size types which contain numeric values. These objects are characterised by a primitive_type describing the category of datatype (integer/floating point/etc) and a precision. They additionally implement two magic methods:

  • __add__ (+)
  • __and__ (&)

The add operator describes what happens when two numeric types are combined in an arithmetic operator. In almost all cases this is sufficient to describe all resulting datatypes. Special cases (e.g. float for integer division) are handled in the associated operator in ast.operators or ast.bitwise_operators.

The and operator describes what happens when two numeric types are combined in a bitwise comparison operator. This only applies to integers and booleans.

When using these operators on an unknown number of arguments it can be useful to use NativeGeneric() as a starting point for the sum.

Container Type

A ContainerType is an object which is comprised of FixedSizeType objects (e.g. ndarray,list,tuple, custom class). The sub-class HomogeneousContainerType describes containers which contain homogeneous data. These objects are characterised by an element_type, a rank (and associated container_rank) and an order. The elements of a HomogeneousContainerType are instances of PyccelType, but they can be either FixedSizeTypes or ContainerTypes. The container_rank is an integer equal to the number of indices necessary to index the container and get an element of type element_type. As these elements may also be indexable the rank property allows us to get the number of indices necessary to obtain a scalar element. It is the sum of all the container_ranks in the nested types. The order specifies the order in which the indices should be used to index the object. This is discussed in detail in the order docs.

HomogeneousContainerTypes also contain some utility functions. They implement primitive_type and precision to get the properties of the internal FixedSizeType (even if that type is inside another HomogeneousContainerType). They also implement switch_basic_type which creates a new HomogeneousContainerType which is similar to the current HomogeneousContainerType. The only difference is that the FixedSizeType is exchanged. This is useful when we want to preserve information about the container but need to change the type. For example, when we divide an integer by another we get a floating point type. When we divide a NumPy array or a CuPy array of integers by an integer (or array of integers) we get a NumPy/CuPy array of floating point numbers (with default Python precision). In order to preserve the container type we therefore call switch_basic_type. So for the division in the case of NumPy arrays, we want to change the type from np.ndarray[int] to np.ndarray[float]. This is done in one line:

new_class_type = class_type.switch_basic_type(NativePythonFloat())

instead of the multiple lines that would be needed without this function. The advantage of this is seen most clearly when we consider a function acting on a more complex type e.g. list[np.ndarray[float]] in this case without the switch_basic_type function, the equivalent code would be much longer:

new_container_types = []
old_type = class_type
while isinstance(old_type, ContainerType):
    new_container_types.append(type(old_type))
    old_type = old_type.element_type

new_type = old_type
for container in new_container_types:
    new_type = container(new_type)

The switch_basic_type cannot be implemented generally in PyccelType as there is no logical interpretation for an inhomogeneous ContainerType, however the function is also implemented (as the identity function) for FixedSizeTypes so switch_basic_type can be used without the need for type checks (generally inhomogeneous containers will not be valid arguments to classes which may need to use the switch_basic_type function).

HomogeneousContainerTypes also contain a switch_rank function. This function is similar to switch_basic_type in that it is used to obtain a type which is similar in all but one characteristic. It is usually used to reduce the rank of an object, for example when calculating the type of a slice, however in the future it can also be used to increase the size of the type (e.g. to implement np.newaxis), in this case an order may need to be provided to add additional context. Increasing the rank is only possible for multi-dimensional types (e.g. NumpyNDArrayType) however the rank can be decreased for any ContainerType. If the rank is reduced by more than the container_rank, this function is called recursively.

For multi-dimensional HomogeneousContainerTypes (e.g. NumpyNDArrayType) the function swap_order is also implemented. This inverts the ordering, changing from 'C' to 'F' or 'F' to 'C' if the rank is greater than 1.

In order to access the internal FixedSizeType, PyccelType also implements a datatype property. This makes more sense in the case of a HomogeneousContainerType however it is also implemented (as the identity function) for FixedSizeTypes so the low-level type can be obtained without the need for type checks.