Type inference, in other words deducing the type of an object is one of the keystones of Pyccel. It is what allows us to correctly declare our variables in the generated code and thereby obtain the expected results. The majority of the type inference is found in the semantic stage, however there are a few objects whose type cannot be deduced. The type of these objects is inferred from type annotations. As such there are a few steps of the type inference which occur in the syntactic stage.
There are two types of type annotations that are found in Pyccel-compatible code. The first, simplest case is the case of Python type annotations. E.g:
a : int
The annotations on these objects are parsed by Python's ast
module. This ensures that the code is syntactically correct. The object can be visited in the same way as any other node in the code.
This treatment is very simple, however it is not trivial to identify the result as a type annotation. As a result the resulting PyccelSymbol
/DottedName
/etc is stored as an element of the dtypes
property of the class pyccel.ast.type_annotations.SyntacticTypeAnnotation
. This allows all type annotations to be handled in the same function in the semantic stage regardless of whether they are function argument annotations, variable annotations, or something else.
The second type of annotation is annotations in strings. E.g:
a : 'int'
This kind of annotation is equally important as it allows us to add extra information (e.g. 'const int'
) or refer to types which are not fully initialised (e.g. passing as a class method argument an instance of the same class).
In order to ensure that the syntax inside the string is correct, the expression is parsed immediately in the syntactic stage. This is done using the package textx
. This tool creates a parser for a language described by a file. The files describing the type language can be found in the folder pyccel/parser/grammar/
. The results generated by textx
are stored in auto-generated classes or classes defined in the folder pyccel/parser/syntax/
. As these instances are created by textx
we do not have full control over their contents. Notably this means that we cannot pickle the instance (pickling is used for header files to accelerate compilation). It is therefore important to transfer this information to a new class. The pyccel.ast.type_annotations.SyntacticTypeAnnotation
class is used for this purpose. It stores all the information described by either a UnionTypeStmt
or a Type
(see pyccel/parser/grammar/headers.tx
for definitions).
Therefore at the end of the syntactic stage, the syntax of all type annotations has been verified, and the raw information is stored in a pyccel.ast.type_annotations.SyntacticTypeAnnotation
object ready to be treated in the semantic stage.
The semantic stage handles two types of type inference:
- Inferring types from type annotations.
- Inferring types from assignments.
Each of these cases must be handled in different ways however there are many unifying factors which come into play once the types are deduced. Namely the verification that the types are coherent (i.e. that we are not changing the type of an existing variable), and the creation of any necessary variables. Both of these are usually handled by the function pyccel.parser.semantic.SemanticParser._assign_lhs_variable
.
After the syntactic stage any type annotations should be stored in a pyccel.ast.type_annotations.SyntacticTypeAnnotation
. The critical function is therefore pyccel.parser.semantic.SemanticParser._visit_SyntacticTypeAnnotation
. This function converts the annotation to either a pyccel.ast.type_annotations.VariableTypeAnnotation
or a pyccel.ast.type_annotations.FunctionTypeAnnotation
. Union types exist in Python. As a result a type annotation may indicate more than one type. E.g:
a : (int | float)
In order to make it easier to handle the result of pyccel.parser.semantic.SemanticParser._visit_SyntacticTypeAnnotation
, the VariableTypeAnnotation
s and FunctionTypeAnnotation
s are always stored in a pyccel.ast.type_annotations.UnionTypeAnnotation
, even if there is only one possible type.
These objects contain all the information necessary to create a Variable from a type annotation.
When assignments occur in the code types must also be inferred. This allows new variables to be declared implicitly, and also enables us to verify that the types of existing variables do not change. In this case the type inference is done via the AST nodes. Each node contains the logic necessary to deduce its type information from the arguments passed to it. The resulting object, which is found on the right hand side of an assignment can be used to verify or define the type of the object on the left hand side of the assignment.
The data types in Pyccel are designed around two super classes:
PrimitiveType
: Representing the category of datatype (integer/floating point/etc)PyccelType
: Representing the actual type of the object in Python
Subclasses of PyccelType
generally fall into one of two categories:
FixedSizeType
ContainerType
The types can be compared using either the is
operator or the ==
operator. These operators have slightly different behaviour. All instances of PyccelType
are singletons so the is
operator tests if the types are identical. However the ==
operator tests if the types are compatible. For example PythonNativeFloat() == NumpyFloat64Type()
will return true. This operator should therefore be used when permissive behaviour is required (e.g. when adding elements to a list of PythonNativeFloat()
we are capable of adding an instance with the type NumpyFloat64Type()
even if this would not be strictly homogeneous in Python).
A FixedSizeType
is an object whose size in memory is known and cannot change from one instance to another (e.g. float64
, int32
, void
). In most cases the developer will need the sub-class FixedSizeNumericType
which refers to the subset of fixed size types which contain numeric values. These objects are characterised by a primitive_type
describing the category of datatype (integer/floating point/etc) and a precision
. They additionally implement two magic methods:
__add__
(+)__and__
(&)
The add operator describes what happens when two numeric types are combined in an arithmetic operator. In almost all cases this is sufficient to describe all resulting datatypes. Special cases (e.g. float for integer division) are handled in the associated operator in ast.operators
or ast.bitwise_operators
.
The and operator describes what happens when two numeric types are combined in a bitwise comparison operator. This only applies to integers and booleans.
When using these operators on an unknown number of arguments it can be useful to use NativeGeneric()
as a starting point for the sum.
A ContainerType
is an object which is comprised of FixedSizeType
objects (e.g. ndarray
,list
,tuple
, custom class). The sub-class HomogeneousContainerType
describes containers which contain homogeneous data. These objects are characterised by an element_type
, a rank
(and associated container_rank
) and an order
. The elements of a HomogeneousContainerType
are instances of PyccelType
, but they can be either FixedSizeType
s or ContainerType
s. The container_rank
is an integer equal to the number of indices necessary to index the container and get an element of type element_type
. As these elements may also be indexable the rank
property allows us to get the number of indices necessary to obtain a scalar element. It is the sum of all the container_rank
s in the nested types. The order
specifies the order in which the indices should be used to index the object. This is discussed in detail in the order docs.
HomogeneousContainerType
s also contain some utility functions. They implement primitive_type
and precision
to get the properties of the internal FixedSizeType
(even if that type is inside another HomogeneousContainerType
). They also implement switch_basic_type
which creates a new HomogeneousContainerType
which is similar to the current HomogeneousContainerType
. The only difference is that the FixedSizeType
is exchanged. This is useful when we want to preserve information about the container but need to change the type. For example, when we divide an integer by another we get a floating point type. When we divide a NumPy array or a CuPy array of integers by an integer (or array of integers) we get a NumPy/CuPy array of floating point numbers (with default Python precision). In order to preserve the container type we therefore call switch_basic_type
. So for the division in the case of NumPy arrays, we want to change the type from np.ndarray[int]
to np.ndarray[float]
. This is done in one line:
new_class_type = class_type.switch_basic_type(NativePythonFloat())
instead of the multiple lines that would be needed without this function. The advantage of this is seen most clearly when we consider a function acting on a more complex type e.g. list[np.ndarray[float]]
in this case without the switch_basic_type
function, the equivalent code would be much longer:
new_container_types = []
old_type = class_type
while isinstance(old_type, ContainerType):
new_container_types.append(type(old_type))
old_type = old_type.element_type
new_type = old_type
for container in new_container_types:
new_type = container(new_type)
The switch_basic_type
cannot be implemented generally in PyccelType
as there is no logical interpretation for an inhomogeneous ContainerType
, however the function is also implemented (as the identity function) for FixedSizeType
s so switch_basic_type
can be used without the need for type checks (generally inhomogeneous containers will not be valid arguments to classes which may need to use the switch_basic_type
function).
HomogeneousContainerType
s also contain a switch_rank
function. This function is similar to switch_basic_type
in that it is used to obtain a type which is similar in all but one characteristic. It is usually used to reduce the rank of an object, for example when calculating the type of a slice, however in the future it can also be used to increase the size of the type (e.g. to implement np.newaxis
), in this case an order may need to be provided to add additional context. Increasing the rank is only possible for multi-dimensional types (e.g. NumpyNDArrayType
) however the rank can be decreased for any ContainerType
. If the rank is reduced by more than the container_rank
, this function is called recursively.
For multi-dimensional HomogeneousContainerType
s (e.g. NumpyNDArrayType
) the function swap_order
is also implemented. This inverts the ordering, changing from 'C' to 'F' or 'F' to 'C' if the rank is greater than 1.
In order to access the internal FixedSizeType
, PyccelType
also implements a datatype
property. This makes more sense in the case of a HomogeneousContainerType
however it is also implemented (as the identity function) for FixedSizeType
s so the low-level type can be obtained without the need for type checks.