From 2349b5f6cf963175d885d89bcee43435e80fc0b1 Mon Sep 17 00:00:00 2001 From: Slava Pestov Date: Mon, 10 Nov 2025 18:53:05 -0500 Subject: [PATCH] docs: Update generics book for 2025 - Revised "Substitution Maps" chapter: - New "Subclassing" section - New "SIL Type Lowering" section - New "Opaque Result Types" chapter - Various smaller edits New PDF will be available shortly at https://download.swift.org/docs/assets/generics.pdf. --- docs/Generics/README.md | 7 +- docs/Generics/chapters/archetypes.tex | 255 +-- docs/Generics/chapters/basic-operation.tex | 80 +- .../chapters/building-generic-signatures.tex | 95 +- docs/Generics/chapters/compilation-model.tex | 226 ++- docs/Generics/chapters/completion.tex | 48 +- .../chapters/concrete-conformances.tex | 70 - docs/Generics/chapters/conformance-paths.tex | 117 +- docs/Generics/chapters/conformances.tex | 361 ++-- docs/Generics/chapters/declarations.tex | 211 +-- docs/Generics/chapters/existential-types.tex | 42 +- docs/Generics/chapters/extensions.tex | 90 +- docs/Generics/chapters/generic-signatures.tex | 178 +- docs/Generics/chapters/introduction.tex | 214 ++- docs/Generics/chapters/math-summary.tex | 6 +- ...rule-minimization.tex => minimization.tex} | 23 +- docs/Generics/chapters/monoids.tex | 121 +- .../Generics/chapters/opaque-result-types.tex | 1106 +++++++++++ .../Generics/chapters/opaque-return-types.tex | 436 ----- docs/Generics/chapters/preface.tex | 52 +- docs/Generics/chapters/property-map.tex | 48 +- docs/Generics/chapters/substitution-maps.tex | 1668 ++++++++++++----- .../chapters/symbols-terms-and-rules.tex | 100 +- docs/Generics/chapters/type-resolution.tex | 106 +- .../chapters/type-substitution-summary.tex | 49 +- docs/Generics/chapters/types.tex | 193 +- docs/Generics/generics.bib | 394 +++- docs/Generics/generics.tex | 82 +- 28 files changed, 4107 insertions(+), 2271 deletions(-) delete mode 100644 docs/Generics/chapters/concrete-conformances.tex rename docs/Generics/chapters/{rule-minimization.tex 
=> minimization.tex} (83%) create mode 100644 docs/Generics/chapters/opaque-result-types.tex delete mode 100644 docs/Generics/chapters/opaque-return-types.tex diff --git a/docs/Generics/README.md b/docs/Generics/README.md index c3eeb3190fd11..a94819cd6239d 100644 --- a/docs/Generics/README.md +++ b/docs/Generics/README.md @@ -18,7 +18,7 @@ It's written in TeX, so to typeset the PDF yourself, you need a TeX distribution ### Using `make` -Running `make` in `docs/Generics/` will run `pdflatex` and `bibtex` in the right order to generate the final document with bibliography, index and cross-references: +Running `make` in `docs/Generics/` will run `pdflatex` and `bibtex` in the right order to generate the final document with bibliography, index, and cross-references: ``` cd docs/Generics/ @@ -61,16 +61,13 @@ This is a work in progress. The following chapters need some editing: -- Part II: - - Substitution Maps - Part IV: - Completion The following chapters are not yet written: - Part III: - - Opaque Return Types - Existential Types - Part IV: - The Property Map - - Rule Minimization + - Minimization diff --git a/docs/Generics/chapters/archetypes.tex b/docs/Generics/chapters/archetypes.tex index b74d266ecb758..b3a31c4fe3cc4 100644 --- a/docs/Generics/chapters/archetypes.tex +++ b/docs/Generics/chapters/archetypes.tex @@ -2,16 +2,16 @@ \begin{document} -\chapter{Archetypes}\label{genericenv} +\chapter{Archetypes}\label{chap:archetypes} -\lettrine{A}{n archetype encapsulates} a reduced type parameter and a generic signature, two semantic objects we met in \ChapRef{genericsig}. This representation is self-describing, so it can answer questions about protocol conformance and such without further context. An archetype thus behaves like a concrete type in many ways; it is the ``most general'' concrete type that satisfies the requirements of its type parameter. (Recall that a type parameter is essentially a \emph{name} that requires a separate generic signature to interpret.) 
Archetypes are created by mapping type parameters into a \emph{generic environment}, an object derived from a generic signature. Multiple environments can be instantiated from a single generic signature, each one generating a distinct family of archetypes. Exactly one of those is the \IndexDefinition{primary generic environment}\emph{primary} generic environment, whose archetypes are called \IndexDefinition{primary archetype}\emph{primary archetypes}. We will discuss these first. +\lettrine{A}{n archetype encapsulates} a reduced type parameter and a generic signature, two semantic objects we met in \ChapRef{chap:generic signatures}. This representation is self-describing, so it can answer questions about protocol conformance and such without further context. An archetype thus behaves like a concrete type in many ways; it is the ``most general'' concrete type that satisfies the requirements of its type parameter. (Recall that a type parameter is essentially a \emph{name} that requires a separate generic signature to interpret.) Archetypes are created by mapping type parameters into a \emph{generic environment}, an object derived from a generic signature. Multiple environments can be instantiated from a single generic signature, each one generating a distinct family of archetypes. Exactly one of those is the \IndexDefinition{primary generic environment}\emph{primary} generic environment, whose archetypes are called \IndexDefinition{primary archetype}\emph{primary archetypes}. We will discuss these first. A primary generic environment defines a correspondence between type parameters and primary archetypes: \begin{itemize} \item \IndexDefinition{map type into environment}\textbf{Mapping into an environment} recursively replaces type parameters in the given interface type with their archetypes from the given primary generic environment. 
\item \IndexDefinition{map type out of environment}\textbf{Mapping out of an environment} recursively replaces primary archetypes with their reduced type parameters. \end{itemize} -Mapping into an environment produces a type which no longer contains type parameters, but instead may contain \index{primary archetype}primary archetypes. We call this a \IndexDefinition{contextual type}\emph{contextual type}. +Mapping into an environment produces a type which no longer contains type parameters, but instead \index{type!containing archetypes}may contain \index{primary archetype}primary archetypes. We call this a \IndexDefinition{contextual type}\emph{contextual type}. Contextual types and primary archetypes are used in place of interface types inside the \index{expression}expressions that appear in generic function bodies, but archetypes don't surface as a distinct concept in the language model; it's a representation change. \index{SILGen}SILGen lowers expressions to \index{SIL}SIL instructions which consume and produce SIL values, so the types of SIL values are also contextual types. Finally, when \index{IRGen}IRGen generates code for a generic function, archetypes become \emph{values} representing the \index{runtime type metadata}runtime type metadata provided by the caller of the generic function. @@ -20,7 +20,7 @@ \chapter{Archetypes}\label{genericenv} \paragraph{Nesting.} A nested generic declaration with a distinct generic signature (because it introduced new generic parameters or requirements) also introduces a fresh primary generic environment; we do not ``inherit'' primary archetypes from the outer scope. All primary archetypes, including those that represent outer generic parameters, are instantiated from the innermost environment. (If the inner declaration does not declare new generic parameters or requirements, it has the same generic signature, and thus the same generic environment, as its parent context.) 
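As an aside, the last point can be observed from the language itself: the caller of a generic function supplies runtime type metadata for each generic argument, and the function body can inspect it. A minimal sketch of ours (the function name is hypothetical, not from the text):

```swift
// The caller implicitly passes runtime type metadata for T; within
// the body, T.self denotes that metadata as a first-class value.
func typeName<T>(of value: T) -> String {
  return String(describing: T.self)
}
```

Calling \texttt{typeName(of: 42)} yields \texttt{"Int"}: the metadata was provided by the caller of the generic function, exactly as described above.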
-This allows correct modeling of nested declarations that impose new requirements on outer generic parameters, which we discussed in \SecRef{requirements}. This was not supported prior to \IndexSwift{3.0}Swift~3 because outer primary archetypes were re-used. This simple program demonstrates this behavior: +This allows correct modeling of nested declarations that impose new requirements on outer generic parameters, which we discussed in \SecRef{sec:requirements}. This was not supported prior to \IndexSwift{3.0}Swift~3 because outer primary archetypes were re-used. This simple program demonstrates this behavior: \begin{Verbatim} struct Box { var contents: T? @@ -37,15 +37,15 @@ \chapter{Archetypes}\label{genericenv} } } \end{Verbatim} -The \texttt{take()} and \texttt{compare()} methods have different generic signatures, with \texttt{take()} inheriting the generic signature of struct~\texttt{Box}, and \texttt{compare()} adding a conformance requirement. The \index{type representation}type representation ``\tT'' resolves to the \emph{same} generic parameter type~\tT\ (or \rT) in all source locations where it appears. However, this generic parameter maps to two different archetypes in the generic environment of each function. Thus, the type of the expression ``\texttt{self.contents}'' is actually different in \texttt{take()} and \texttt{compare()}. Let's use the notation $\archetype{T}_d$ for the archetype obtained by mapping the type parameter \tT\ into the generic environment of some declaration~$d$. Here, we have two archetypes: +The \texttt{take()} and \texttt{compare()} methods have different generic signatures, with \texttt{take()} inheriting the generic signature of struct~\texttt{Box}, and \texttt{compare()} adding a conformance requirement. The \index{type representation}type representation ``\tT'' resolves to the \emph{same} generic parameter type~\tT\ (or \rT) in all source locations where it appears. 
However, this generic parameter maps to two different archetypes in the generic environment of each function. Thus, the type of the expression ``\texttt{self.contents}'' is actually different in \texttt{take()} and \texttt{compare()}. Let's use the notation \index{$\archetype{T}$}\index{$\archetype{T}$!z@\igobble|seealso{archetype}}$\archetype{T}_d$ for the archetype obtained by mapping the type parameter \tT\ into the generic environment of some declaration~$d$. Here, we have two archetypes: \begin{enumerate} \item $\archetype{T}_\text{take}$, which does not conform to any protocols. \item $\archetype{T}_\text{compare}$, which conforms to \texttt{Equatable}. \end{enumerate} -By working with archetypes instead of type parameters, the constraint solver knows this, without having to plumb the generic signature of the current context through everywhere that types of expressions must be reasoned about. +The \index{expression type checker}expression type checker is aware of this fact, without needing to plumb the generic signature of the current context through manually, because it operates on archetypes instead of type parameters. \paragraph{Generic environment kinds.} -We can form a unique \IndexDefinition{generic environment}generic environment for each combination of parameters, that depend on the generic environment's \emph{kind}. There are three kinds of generic environments, with the following parameters: +There are three \IndexDefinition{generic environment}generic environment kinds. Instances of each kind are uniquely allocated for each combination of input parameters: \begin{itemize} \item As we already said, every generic signature has exactly one \index{primary generic environment}\textbf{primary generic environment} of \index{primary archetype}primary archetypes. 
\begin{tightcenter} @@ -55,43 +55,43 @@ \chapter{Archetypes}\label{genericenv} }; \begin{scope}[on background layer] \node (aa) [genericenv, fit=(a), minimum width=15em] {}; -\node [genericenvlabel] at (aa.north) {primary generic environment}; +\node [genericenvlabel] at (aa.north) {primary generic environment:}; \end{scope} \end{tikzpicture} \end{tightcenter} -Primary generic environments preserve the sugared names of generic parameters for the printed representation of an archetype, so two generic signatures which are not equal pointers will instantiate distinct primary generic environments, even if those generic signatures are canonically equal. +Primary generic environments preserve the \index{sugared type}sugared names of \index{generic parameter type!type sugar}generic parameters for the printed representation of an archetype, so a primary generic environment depends on the \index{type pointer equality!generic environment}pointer identity of its generic signature; two generic signatures which are canonically equal but differ in type sugar will instantiate distinct primary generic environments. 
-\item When a declaration has an opaque return type, we can create an \index{opaque generic environment}\textbf{opaque generic environment} parameterized by a substitution map for the owner declaration's generic signature: +\item When a declaration has an opaque result type, we can create an \index{opaque generic environment}\textbf{opaque generic environment} parameterized by a substitution map for the owner declaration's generic signature: \begin{tightcenter} \begin{tikzpicture} \node [genericenvmatrix] { -\node (a) [genericenvpart] {\strut opaque type declaration};& +\node (a) [genericenvpart] {\strut opaque result declaration};& \node (b) [genericenvpart] {\strut substitution map};\\ }; \begin{scope}[on background layer] \node (ab) [genericenv, fit=(a)(b)] {}; -\node [genericenvlabel] at (ab.north) {opaque generic environment}; +\node [genericenvlabel] at (ab.north) {opaque generic environment:}; \end{scope} \end{tikzpicture} \end{tightcenter} -Unlike primary archetypes, \index{opaque archetype}\emph{opaque return type archetypes} are not scoped to the lexical scope of their owner declaration; they can appear anywhere that their owner declaration is visible. They also behave differently under substitution. It is legal for an interface type to contain an opaque archetype, in particular the return type of the owner declaration itself will contain one. Details are in \ChapRef{opaqueresult}. +Unlike primary archetypes, \index{opaque archetype}\emph{opaque archetypes} are not scoped to the lexical scope of their owner declaration; they can appear anywhere that their owner declaration is visible. They also behave differently under substitution. It is legal for an interface type to contain an opaque archetype, in particular the return type of the owner declaration itself will contain one. Details are in \ChapRef{chap:opaque result types}. 
\index{call expression} \index{expression} -\item An \index{opened generic environment}\textbf{opened generic environment} is created when an existential value is opened at a call expression. An \index{opened archetype}\emph{opened archetype} instantiated from such an environment represents the concrete payload stored inside a specific value of existential type. +\item An \index{existential generic environment}\textbf{existential generic environment} is created when an existential value is opened at a call expression. An \index{existential archetype}\emph{existential archetype} instantiated from such an environment represents the concrete payload stored inside the existential value. \begin{tightcenter} \begin{tikzpicture} \node [genericenvmatrix] { \node [genericenvpart] (a) {\strut generic signature};& -\node [genericenvpart] (b) {\strut existential type};& +\node [genericenvpart] (b) {\strut constraint type};& \node [genericenvpart] (c) {\strut unique ID};\\ }; \begin{scope}[on background layer] \node (abc) [genericenv, fit=(a)(b)(c)] {}; -\node [genericenvlabel] at (abc.north) {opened generic environment}; +\node [genericenvlabel] at (abc.north) {existential generic environment:}; \end{scope} \end{tikzpicture} \end{tightcenter} -Every opening expression gets a fresh unique ID, and therefore a new opened generic environment. In the AST, opened archetypes cannot ``escape'' from their opening expression; in SIL, they are introduced by an opening instruction and are similarly scoped by the dominance relation on the control flow graph. We will discuss existential types in \ChapRef{existentialtypes}. +Every opening expression gets a fresh unique ID, and therefore a new existential generic environment. In the AST, existential archetypes cannot ``escape'' from their opening expression; in SIL, they are introduced by an opening instruction and are similarly scoped by the dominance relation on the control flow graph. 
We will discuss existential types in \ChapRef{chap:existential types}. \end{itemize} \paragraph{Archetype equality.} Some notes about contextual types: @@ -103,10 +103,10 @@ \chapter{Archetypes}\label{genericenv} \item Archetypes from non-primary generic environments may appear in both interface types and contextual types, but we won't worry about those for now. \end{enumerate} -In the following, let $G$ be a generic signature, and write \IndexSet{type}{\TypeObj{G}}$\TypeObj{G}$ for the set of all interface types of~$G$, as before. Now, let's say that $\TypeObj{\EquivClass{G}}$ is the set of all contextual types that contain primary archetypes of $G$. We can understand mapping into and out of an environment as giving us a pair of functions: +In the following, let $G$ be a generic signature, and write \IndexSet{type}{\TypeObj{G}}$\TypeObj{G}$ for the set of all interface types of~$G$, as before. Now, let's say that $\TypeObjCtx{G}$ is the set of all contextual types that contain primary archetypes of $G$. We can understand mapping into and out of an environment as giving us a pair of functions: \begin{gather*} -\MapIn\colon\TypeObj{G}\longrightarrow\TypeObj{\EquivClass{G}}\\ -\MapOut\colon\TypeObj{\EquivClass{G}}\longrightarrow\TypeObj{G} +\MapIn\colon\TypeObj{G}\longrightarrow\TypeObjCtx{G}\\ +\MapOut\colon\TypeObjCtx{G}\longrightarrow\TypeObj{G} \end{gather*} In \SecRef{valid type params}, we defined the reduced type equality relation on the valid type parameters of~$G$, and in \SecRef{genericsigqueries}, we extended this to all of $\TypeObj{G}$ via the recursive definition of a reduced type. 
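For instance, take $G$ to be \verb|<T where T: Collection>| (a sketch of ours using the standard library's \texttt{Collection} protocol). Mapping into and out of the primary generic environment of $G$ acts as follows:
\begin{gather*}
\MapIn(\texttt{Array<T.Index>}) = \texttt{Array<$\archetype{T.Index}$>}\\
\MapOut(\texttt{Array<$\archetype{T.Index}$>}) = \texttt{Array<T.Index>}
\end{gather*}
Because each archetype stores a \emph{reduced} type parameter, the round trip $\MapOut(\MapIn(\tX))$ produces the reduced type of~$\tX$, which is not necessarily $\tX$ itself.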
The fact that an archetype's type parameter is a \index{reduced type}reduced type means the following when it comes to equivalence of types: \begin{enumerate} @@ -122,16 +122,16 @@ \chapter{Archetypes}\label{genericenv} \section{Local Requirements}\label{local requirements} -If we have a type parameter $\tT \in \TypeObj{G}$, we can understand its behavior by looking at the list of all conformance, superclass and layout requirements with a subject type of~\tT\ that can be derived from~$G$. To recover this information, we can apply \index{generic signature query}generic signature queries to \tT\ and $G$, which must be given separately (\SecRef{genericsigqueries}). When working with an archetype~$\archetype{T} \in \TypeObj{\EquivClass{G}}$, we can instead ask the archetype for its \emph{local requirements}, because the archetype knows $G$. The local requirements are the following: +The behavior of a type parameter $\tT \in \TypeObj{G}$ can be understood by considering all conformance, superclass, and layout requirements with a subject type of~\tT\ that can be derived from~$G$. We can recover this using the \index{generic signature query}generic signature queries from \SecRef{genericsigqueries}, which take a $G$ and a \tT; that is, the caller must keep track of the~$G$. If instead we have an archetype that represents \tT\ in a generic signature~$G$, we can directly consult a list of \emph{local requirements} recorded within the archetype itself. We pretend we have a primary archetype~$\archetype{T} \in \TypeObjCtx{G}$ below, but everything we say in this section applies equally to the other archetype kinds. An archetype $\archetype{T}$ records the following: \begin{itemize} \item \textbf{Required protocols:} A list of all protocols \texttt{P} such that $G\vdash\TP$. This is the result of the \IndexQuery{getRequiredProtocols}$\Query{getRequiredProtocols}{}$ generic signature query. \item \textbf{Superclass bound:} A class type \tC\ such that $G\vdash\TC$.
This is the result of the \IndexQuery{getSuperclassBound}$\Query{getSuperclassBound}{}$ generic signature query. \item \textbf{Requires class flag:} True if $G\vdash\TAnyObject$. This is the result of the \IndexQuery{requiresClass}$\Query{requiresClass}{}$ generic signature query. \item \textbf{Layout constraint:} A more fine-grained view than the previous, to differentiate between Objective-C and Swift-native reference counting. This is the result of the \IndexQuery{getLayoutConstraint}$\Query{getLayoutConstraint}{}$ generic signature query. \end{itemize} -When we create an archetype, we issue the \IndexQuery{getLocalRequirements}$\Query{getLocalRequirements}{}$ generic signature query to gather all of the above information in one shot. Local requirements can be inspected directly by looking at an archetype. Even more importantly, they also allow archetypes to take on the behaviors of concrete types. +When we create an archetype, we use the \IndexQuery{getLocalRequirements}$\Query{getLocalRequirements}{}$ generic signature query to gather all of the above information in one shot. -\paragraph{Qualified name lookup.} An archetype can serve as the base type for a \index{qualified lookup!archetype base}qualified name lookup (\SecRef{name lookup}). The visible members are those of the archetype's required protocols, superclass declaration, and finally, the concrete protocol conformances of the superclass. This explains how we type check a member reference expression where the base type is an archetype. +\paragraph{Qualified name lookup.} An archetype can serve as the base type for a \index{qualified lookup!archetype base}qualified name lookup (\SecRef{name lookup}). The visible members are those of the archetype's required protocols, \index{superclass declaration}superclass declaration, and finally, the concrete protocol conformances of the superclass. This explains how we type check a member reference expression where the base type is an archetype. 
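For illustration (an example of ours, not taken from the implementation): in the function below, the base of each member reference expression is an archetype whose local requirements record both a superclass bound and a required protocol, so qualified name lookup finds the members of both.

```swift
class Base {
  func baseMethod() -> Int { return 1 }
}

protocol Proto {
  func protoMethod() -> Int
}

// Within the body, `value` has the archetype type for T; qualified
// lookup on this base sees members of the superclass bound `Base`
// as well as members of the required protocol `Proto`.
func combined<T: Base & Proto>(_ value: T) -> Int {
  return value.baseMethod() + value.protoMethod()
}
```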
\paragraph{Global conformance lookup.} Consider the two declarations below, together with the generic signature~\verb|, Box: Proto>|: \begin{Verbatim} @@ -149,7 +149,7 @@ \section{Local Requirements}\label{local requirements} Thus, the subject type of an abstract conformance can be an archetype, and not just a type parameter. Also notice how the second conformance is formed from the normal conformance $\ConfReq{Base<\rT>}{OtherProto}$ with this conformance substitution map: \begin{gather*} \ConfReq{Base<\rT>}{OtherProto}\otimes\SubstMap{\SubstType{\rT}{$\archetype{Elt}$}}\\ -\qquad\qquad {} = \ConfReq{Base<$\archetype{Elt}$>}{OtherProto} +\qquad {} = \ConfReq{Base<$\archetype{Elt}$>}{OtherProto} \end{gather*} We will discuss substitution maps with contextual replacement types shortly, but first let's formalize the above behavior of \index{global conformance lookup!with archetype}global conformance lookup. @@ -157,7 +157,7 @@ \section{Local Requirements}\label{local requirements} \begin{enumerate} \item If $G\vdash\TC$ for some class type~\tC, and \tC\ conforms to \texttt{P}, then the archetype \emph{conforms concretely} to \texttt{P}. The class type \tC\ may contain type parameters, so global conformance lookup recursively calls itself on~$\MapIn(\tC)$: \[\PP \otimes \archetype{T} := \PP \otimes \MapIn(\tC)=\ConfReq{$\MapIn(\tC)$}{P}\] -The result is actually wrapped in an \index{inherited conformance}\emph{inherited conformance}, which we recall from \ChapRef{conformances} exists to give it the conforming type~$\archetype{T}$ rather than~$\MapIn(\tC)$. +The result is actually wrapped in an \index{inherited conformance}\emph{inherited conformance}, which we recall from \ChapRef{chap:conformances} exists to give it the conforming type~$\archetype{T}$ rather than~$\MapIn(\tC)$. 
\item If $G\vdash\TP$, the archetype \emph{conforms abstractly}, and thus global conformance lookup returns an abstract conformance: \[\PP\otimes\archetype{T} := \ConfReq{$\archetype{T}$}{P}\] \item Otherwise, $\archetype{T}$ does not conform to \texttt{P}, and global conformance lookup returns an invalid conformance. @@ -168,56 +168,56 @@ \section{Local Requirements}\label{local requirements} \AssocConf{Self.U}{Q}\otimes \ConfReq{$\archetype{T}$}{P} := \PQ\otimes\MapIn(\texttt{T.U})=\ConfReq{$\archetype{T.U}$}{Q} \end{gather*} -\section{Archetype Substitution}\label{archetypesubst} +\section{Primary Archetypes}\label{archetypesubst} Recall that $\SubMapObj{G}{H}$ is the set of substitution maps with input generic signature $G$ and output generic signature $H$; that is, their replacement types are interface types. To understand how type substitution behaves when the original type is a contextual type, we define a new form of our $\otimes$ operator: \[ -\TypeObj{\EquivClass{G}}\otimes\SubMapObj{G}{H}\longrightarrow\TypeObj{H} +\TypeObjCtx{G}\otimes\SubMapObj{G}{H}\longrightarrow\TypeObj{H} \] -To apply a substitution map $\Sigma$ to a contextual type $\tY\in\TypeObj{\EquivClass{G}}$, we first map the contextual type out of its environment, and then apply the substitution map to this interface type: +To apply a substitution map $\Sigma$ to a contextual type $\tY\in\TypeObjCtx{G}$, we first map the contextual type out of its environment, and then apply the substitution map to this interface type: \[\tY\otimes \Sigma = \MapOut(\tY)\otimes \Sigma\] If the replacement types of $\Sigma$ are interface types, we always get an interface type back, regardless of whether the original type was an interface type or a contextual type. -Substitution maps can also have contextual replacement types. This will be important in \SecRef{checking generic arguments}, so we will introduce the notation now.
We write \IndexSet{sub}{\SubMapObj{G}{H}}$\SubMapObj{G}{\EquivClass{H}}$ for the set of substitution maps whose replacement types are drawn from $\TypeObj{\EquivClass{H}}$. Now, if~$\Sigma$ has contextual replacement types, we always get a contextual type back, regardless of whether the original type was an interface type or contextual type: +Substitution maps can also have contextual replacement types. This will be important in \SecRef{checking generic arguments}, so we will introduce the notation now. We write \IndexSet{sub}{\SubMapObj{G}{H}}$\SubMapObjCtx{G}{H}$ for the set of substitution maps whose replacement types are drawn from $\TypeObjCtx{H}$. Now, if~$\Sigma$ has contextual replacement types, we always get a contextual type back, regardless of whether the original type was an interface type or contextual type: \begin{gather*} -\TypeObj{G}\otimes\SubMapObj{G}{\EquivClass{H}}\longrightarrow\TypeObj{\EquivClass{H}}\\ -\TypeObj{\EquivClass{G}}\otimes\SubMapObj{G}{\EquivClass{H}}\longrightarrow\TypeObj{\EquivClass{H}} +\TypeObj{G}\otimes\SubMapObjCtx{G}{H}\longrightarrow\TypeObjCtx{H}\\ +\TypeObjCtx{G}\otimes\SubMapObjCtx{G}{H}\longrightarrow\TypeObjCtx{H} \end{gather*} \index{substitution map composition}Substitution map composition generalizes like so: \begin{gather*} -\SubMapObj{F}{G}\otimes\SubMapObj{G}{\EquivClass{H}}\longrightarrow\SubMapObj{F}{\EquivClass{H}}\\ -\SubMapObj{F}{\EquivClass{G}}\otimes\SubMapObj{G}{\EquivClass{H}}\longrightarrow\SubMapObj{F}{\EquivClass{H}} +\SubMapObj{F}{G}\otimes\SubMapObjCtx{G}{H}\longrightarrow\SubMapObjCtx{F}{H}\\ +\SubMapObjCtx{F}{G}\otimes\SubMapObjCtx{G}{H}\longrightarrow\SubMapObjCtx{F}{H} \end{gather*} -\paragraph{Forwarding substitution map.} When working in the constraint solver, SILGen, or anywhere else that deals with both interface types and contextual types, a special substitution map often appears. 
If $G$ is a generic signature, the \IndexDefinition{forwarding substitution map}\emph{forwarding substitution map} of $G$, denoted \index{$1_{\EquivClass{G}}$}\index{$1_{\EquivClass{G}}$!z@\igobble|seealso{forwarding substitution map}}$1_{\EquivClass{G}}$, sends each generic parameter \ttgp{d}{i} of $G$ to the corresponding contextual type $\MapIn(\ttgp{d}{i})$ in the primary generic environment of $G$: -\[1_{\EquivClass{G}}:=\{\ldots,\,\ttgp{d}{i}\mapsto\MapIn(\ttgp{d}{i}),\,\ldots\}\] -Note that $1_{\EquivClass{G}}\in\SubMapObj{G}{\EquivClass{G}}$. The forwarding substitution map looks similar to the \index{identity substitution map}identity substitution map $1_G\in\SubMapObj{G}{G}$ from \SecRef{submapcomposition}. We recall this substitution map sends every generic parameter to itself: +\paragraph{Forwarding substitution map.} In the \index{expression type checker}expression type checker, \index{SILGen}SILGen, and other places that contextual types appear, a special substitution map often comes in handy. If $G$ is a generic signature, the \IndexDefinition{forwarding substitution map}\emph{forwarding substitution map} of $G$, denoted \index{$\FwdMap{G}$}\index{$\FwdMap{G}$!z@\igobble|seealso{forwarding substitution map}}$\FwdMap{G}$, sends each generic parameter \ttgp{d}{i} of $G$ to the corresponding contextual type $\MapIn(\ttgp{d}{i})$ in the primary generic environment of $G$: +\[\FwdMap{G}:=\{\ldots,\,\ttgp{d}{i}\mapsto\MapIn(\ttgp{d}{i}),\,\ldots\}\] +Note that $\FwdMap{G}\in\SubMapObjCtx{G}{G}$. The forwarding substitution map looks similar to the \index{identity substitution map}identity substitution map $1_G\in\SubMapObj{G}{G}$ from \SecRef{sec:composition}. 
We recall this substitution map sends every generic parameter to itself: \[1_{G}:=\{\ldots,\,\ttgp{d}{i}\mapsto\ttgp{d}{i},\,\ldots\}\] -If $\tX\in\TypeObj{G}$ is an interface type and $\tY\in\TypeObj{\EquivClass{G}}$ is a contextual type, we can apply $1_G$ and $1_{\EquivClass{G}}$ to each one: +If $\tX\in\TypeObj{G}$ is an interface type and $\tY\in\TypeObjCtx{G}$ is a contextual type, we can apply $1_G$ and $\FwdMap{G}$ to each one: \begin{gather*} \tX\otimes 1_G=\tX\\ -\tX\otimes 1_{\EquivClass{G}}=\MapIn(\tX)\\ +\tX\otimes \FwdMap{G}=\MapIn(\tX)\\ \tY\otimes 1_G=\MapOut(\tY)\otimes 1_G=\MapOut(\tY)\\ -\tY\otimes 1_{\EquivClass{G}}=\MapOut(\tY)\otimes 1_{\EquivClass{G}}=\MapIn(\MapOut(\tY))=\tY +\tY\otimes \FwdMap{G}=\MapOut(\tY)\otimes \FwdMap{G}=\MapIn(\MapOut(\tY))=\tY \end{gather*} -Or in words, +Or in other words: \begin{itemize} -\item The identity substitution map leaves an interface type unchanged, while it maps a contextual type out of the environment. -\item The forwarding substitution map leaves a contextual type unchanged, while it maps an interface type into the environment. +\item The \textbf{identity substitution map} $1_G$ leaves an interface type unchanged, while it maps a contextual type out of the environment. +\item The \textbf{forwarding substitution map} $\FwdMap{G}$ leaves a contextual type unchanged, while it maps an interface type into the environment. \end{itemize} -We can use this to convert between substitution maps with contextual and interface replacement types, by composing the substitution map with the \index{identity substitution map}identity or forwarding substitution map on the right. If $\Sigma\in\SubMapObj{G}{H}$ and $\Sigma^\prime\in\SubMapObj{G}{\EquivClass{H}}$, we have: +We can use this to convert between substitution maps with contextual and interface replacement types, by composing the substitution map with the \index{identity substitution map}identity or forwarding substitution map on the right. 
If $\Sigma\in\SubMapObj{G}{H}$ and $\Sigma^\prime \in \SubMapObjCtx{G}{H}$, we have: \begin{gather*} -\Sigma\otimes 1_{\EquivClass{H}}\in\SubMapObj{G}{\EquivClass{H}}\\ -\Sigma^\prime\otimes 1_H\in\SubMapObj{G}{H} +\Sigma \otimes \FwdMap{H}\in\SubMapObjCtx{G}{H}\\ +\Sigma^\prime \otimes 1_H\in\SubMapObj{G}{H} \end{gather*} Furthermore, if $\tX\in\TypeObj{G}$, \begin{gather*} -\tX\otimes (\Sigma\otimes 1_{\EquivClass{G}}) = \mathsf{in}_{H}(\tX \otimes \Sigma)\\ -\tX\otimes (\Sigma^\prime \otimes 1_G) = \mathsf{out}_{H}(\tX \otimes \Sigma^\prime) +\tX\otimes (\Sigma\otimes \FwdMap{H}) = \mathsf{in}_{H}(\tX \otimes \Sigma)\\ +\tX\otimes (\Sigma^\prime \otimes 1_H) = \mathsf{out}_{H}(\tX \otimes \Sigma^\prime) \end{gather*} Finally, substitution maps support a \IndexDefinition{map replacement types out of environment}\textbf{map replacement types out of environment} operation which is more direct than composing with the identity substitution map: -\[\mathsf{out}_H\colon\SubMapObj{G}{\EquivClass{H}}\longrightarrow\SubMapObj{G}{H}\] +\[\mathsf{out}_H\colon\SubMapObjCtx{G}{H}\longrightarrow\SubMapObj{G}{H}\] \paragraph{Invariants.} In the implementation, a pair of predicates distinguish interface types from contextual types: \begin{itemize} @@ -246,11 +246,11 @@ \section{The Type Parameter Graph}\label{type parameter graph} We will define an object called the \emph{type parameter graph} of a generic signature, such that valid type parameters are \emph{paths} in this graph, and equivalence classes of type parameters, or archetypes, are the \emph{vertices}. Two equivalent type parameters define a pair of paths with the same destination vertex. We will have occasion to study other directed graphs later, so as usual we begin with the abstract definitions. 
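As a preview, consider the generic signature \verb|<T where T: Sequence>| (an example of ours; recall that the standard library's \texttt{Sequence} protocol states the same-type requirement \texttt{Element == Iterator.Element}). The type parameters \texttt{T.Element} and \texttt{T.Iterator.Element} are then two paths with the same destination vertex:
\[
\texttt{T.Element}\ \equiv\ \texttt{T.Iterator.Element}
\]
and both map to a single archetype in the primary generic environment.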
\begin{definition} -A \IndexDefinition{directed graph}\index{graph|see{directed graph}}\emph{directed graph} is a pair $(V,\, E)$ consisting of a set of \IndexDefinition{vertex}vertices $V$ together with a \index{set}set of \IndexDefinition{edge}edges $E$, where every edge $e\in E$ has an associated \IndexDefinition{source vertex}\emph{source} vertex and a \IndexDefinition{destination vertex}\emph{destination} vertex, denoted $\Src(e)$ and $\Dst(e)$, respectively. +A \IndexDefinition{directed graph}\index{graph|see{directed graph}}\emph{directed graph} is a pair $(V,\, E)$ consisting of a \index{set!vertices}set of \IndexDefinition{vertex}vertices $V$ together with a \index{set!edges}set of \IndexDefinition{edge}edges $E$, where every edge $e\in E$ has an associated \IndexDefinition{source vertex}\emph{source} vertex and a \IndexDefinition{destination vertex}\emph{destination} vertex, denoted $\Src(e)$ and $\Dst(e)$, respectively. Some books (for example, \cite{grimaldi}) define directed graphs such that an edge is exactly an \index{ordered pair}ordered pair of vertices, $(\Src(e),\Dst(e))$. This disallows graphs where two edges share the same source and destination. Our formulation is more general because the source and destination are merely \emph{properties} of an edge, which may also have an additional \IndexDefinition{labeled edge}\emph{label} of its own. This is called a \emph{directed multi-graph} in \cite{alggraph}. -A finite directed graph can be visualized by plotting each vertex as a point, and each edge as an arrow pointing from the source towards the destination. If either $V$ or $E$ is infinite, then we say that $(V, E)$ is an \IndexDefinition{infinite graph}\emph{infinite graph}. We can still visualize some finite \IndexDefinition{subgraph}\emph{subgraph} $(V^\prime, E^\prime)$, where $V^\prime\subseteq V$ and $E^\prime\subseteq E$, if the remaining structure is understood somehow. 
+A finite directed graph can be visualized by plotting each vertex as a point, and each edge as an arrow pointing from the source towards the destination. If either $V$ or $E$ is infinite, then we say that $(V, E)$ is an \IndexDefinition{infinite graph}\emph{infinite graph}. We can still visualize some finite \IndexDefinition{subgraph}\emph{subgraph} $(V^\prime, E^\prime)$, \index{subset}where $V^\prime\subseteq V$ and $E^\prime\subseteq E$, if the remaining structure is understood somehow. \end{definition} \begin{definition}\label{digraph path} @@ -270,61 +270,30 @@ \section{The Type Parameter Graph}\label{type parameter graph} The \IndexDefinition{path length}\emph{length} of a path is the number of edges in the path. Every vertex $v\in V$ defines an \IndexDefinition{empty path}empty path (of length zero), denoted $1_v$, with source vertex $v$ followed by an empty sequence of edges; by the above definitions, we have $\Src(1_v)=\Dst(1_v)=v$. Every edge $e$ also defines a one-element path (of length one), with source vertex $\Src(e)$ and destination vertex $\Dst(e)$. We can also denote the one-element path as $e$, because $\Src(e)$ and $\Dst(e)$ have the same meaning whether we interpret $e$ as an edge or a path. \end{definition} -\begin{definition} A \IndexDefinition{tree}\emph{tree} is a directed graph with a distinguished \index{root vertex!of tree}root vertex, and the property that there is one unique \index{path!in tree}path from the root to every other vertex. This describes various forms of hierarchical data structures that are familiar to programmers. 
When we draw a tree, we indicate the root with a darker shade:
-\begin{center}
-\begin{tikzpicture}[x=1.5cm]
-\node (a) [abstractvertex] at (0,0) {};
-\node (b) [abstractvertex2] at (-1,0) {};
-\node (c) [abstractvertex2] at (1,0) {};
-\node (d) [abstractvertex2] at (1,-1) {};
-\node (e) [abstractvertex2] at (1,1) {};
-\node (f) [abstractvertex2] at (2,0) {};
-\node (g) [abstractvertex2] at (2,1) {};
-
-\draw [arrow] (a) -- (b);
-\draw [arrow] (a) -- (c);
-\draw [arrow] (c) -- (d);
-\draw [arrow] (c) -- (e);
-\draw [arrow] (c) -- (f);
-\draw [arrow] (f) -- (g);
-\end{tikzpicture}
-\end{center}
+\begin{definition}\label{dag def}
+We use the following terminology throughout:
+\begin{itemize}
+\item A \IndexDefinition{cycle}\emph{cycle} is a non-empty path with the same source and destination vertex.
+
+\item A directed graph is \index{DAG|see {directed acyclic graph}}\IndexDefinition{directed acyclic graph}\emph{acyclic} if it does not contain cycles.
+
+\item A \IndexDefinition{tree}\emph{tree} is a directed graph with a distinguished \index{root vertex!of tree}root vertex, and the property that there is exactly one \index{path!in tree}path from the root to every other vertex. Note that every tree is acyclic.
+\end{itemize}
+We observe that this definition of a tree matches the concept familiar to programmers.
\end{definition}
-\smallskip
We now have enough graph theory to define our main object of interest.
\begin{definition}
-Let $G$ be a \index{generic signature!type parameter graph}generic signature. The \IndexDefinition{type parameter graph}\emph{type parameter graph} of $G$ is the following \index{directed graph!type parameter graph}directed graph:
+Let $G$ be a \index{generic signature!type parameter graph}generic signature.
The \IndexDefinition{type parameter graph}\emph{type parameter graph} of~$G$ is the \index{directed graph!type parameter graph}directed graph constructed as follows:
\begin{itemize}
-\item For each \index{equivalence class!of type parameters}equivalence class of type parameters of~$G$, we add a vertex to the vertex set. Each such vertex is labeled with the \index{reduced type!type parameter graph}reduced type of the equivalence class (which might be a concrete type).
-\item If $G$ has more than one generic parameter, we add a distinguished \index{root vertex!of type parameter graph}root vertex to the \index{vertex!of type parameter graph}vertex set. Otherwise, the vertex for this generic parameter is the root.
-\item If there is a distinguished root vertex, we add an edge joining the root vertex with each equivalence class that contains a generic parameter type $\ttgp{d}{i}$ of $G$. Each such edge is labeled with this generic parameter type.
-\item A second series of edges relates each pair of vertices where the destination contains some dependent member type \texttt{U.A}, and the source contains its base type~\tU. Each such edge is labeled ``\texttt{.A}''.
+\item The vertex set contains a distinguished \index{root vertex!of type parameter graph}root vertex.
+\item The \index{vertex!of type parameter graph}vertex set also contains a vertex for each \index{equivalence class!of type parameters}equivalence class of type parameters of~$G$. Each such vertex is labeled with the \index{reduced type!type parameter graph}reduced type of the equivalence class (which might be a concrete type).
+\item The edge set contains an edge for each generic parameter $\ttgp{d}{i}$ of $G$. The source vertex of each such edge is the root, and the destination vertex is the equivalence class of the corresponding generic parameter type; the edge is labeled with the generic parameter type itself.
+\item Additional edges join those pairs of vertices where the source is the equivalence class of some type parameter \tU, and the destination is the equivalence class of a dependent member type \texttt{U.A} with base type~\tU. Each such edge is labeled with the name of the associated type~``\texttt{.A}''. \end{itemize} \end{definition} -\begin{proposition} -The edge relation described above is \index{well-defined}\emph{well-defined}, so it does not depend on a choice of representative from the source and destination equivalence class. -\end{proposition} -\begin{proof} -Suppose that for some \texttt{U} and \texttt{V}, we have $G\vdash\texttt{U.A}$ and $G\vdash\SameReq{U.A}{V}$. We note that $G\vdash\texttt{U.A}$ implies that $G\vdash\ConfReq{U}{P}$ for some protocol~\texttt{P} that declares an associated type named~\texttt{A}. -Now, further suppose that $G\vdash\SameReq{U}{$\tUp$}$ and $G\vdash\SameReq{V}{$\texttt{V}^\prime$}$. We can then derive both $G\vdash\tUp\texttt{.A}$ (7) and $G\vdash\SameReq{$\tUp$\texttt{.A}}{$\texttt{V}^\prime$}$ (11): -\begin{gather*} -\AnyStep{\ConfReq{U}{P}}{1}\\ -\AnyStep{\SameReq{U.A}{V}}{2}\\ -\AnyStep{\SameReq{U}{$\tUp$}}{3}\\ -\AnyStep{\SameReq{V}{$\texttt{V}^\prime$}}{4}\\ -\SymStep{3}{$\tUp$}{U}{5}\\ -\SameConfStep{1}{5}{$\tUp$}{P}{6}\\ -\AssocNameStep{6}{$\tUp$.A}{7}\\ -\SameNameStep{1}{5}{$\tUp$.A}{U.A}{8}\\ -\TransStep{8}{2}{$\tUp$.A}{V}{9}\\ -\TransStep{10}{4}{$\tUp$.A}{$\texttt{V}^\prime$}{11} -\end{gather*} -(We could not prove this without the \IndexStep{SameConf}\textsc{SameConf} or \IndexStep{SameName}\textsc{SameName} inference rules.) -\end{proof} - -Every \index{unbound type parameter!type parameter graph}unbound type parameter \texttt{\ttgp{d}{i}.$\texttt{A}_1$...$\texttt{A}_n$} defines a \index{path!in type parameter graph}path from the root vertex to the type parameter's equivalence class. Two type parameters belong to the same equivalence class if and only if their paths end at the same destination. 
If a type parameter is equivalent to a concrete type, the type parameter's path ends at a vertex labeled by this concrete type. (We could also model bound type parameters as paths by adding more edges, but we will not pursue this direction, because as we recall from \SecRef{bound type params}, bound type parameters don't really give us anything new.) +The key fact about this graph is that every \index{unbound type parameter!type parameter graph}valid type parameter \texttt{\ttgp{d}{i}.$\texttt{A}_1$...$\texttt{A}_n$} defines a \index{path!in type parameter graph}path from the root vertex to the type parameter's equivalence class, if we follow the edges labeled ``\ttgp{d}{i}'' and ``$\texttt{.A}_1$'' and so on. Notice that two type parameters belong to the same equivalence class if and only if the corresponding paths end at the same vertex. If a type parameter is fixed to a concrete type, the type parameter's path ends at a vertex labeled by this concrete type. \begin{example} Consider these two generic signatures: @@ -373,7 +342,7 @@ \section{The Type Parameter Graph}\label{type parameter graph} τ_0_0.Element == τ_0_1.Element> \end{verbatim} \end{quote} -We already studied the equivalence class structure of this generic signature extensively. We can now present what we already know in the form of a type parameter graph, which allows us to look at the member type relationships in a new way: +We already studied the equivalence class structure of this generic signature extensively. 
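The path-following interpretation can be sketched as follows (illustrative Python, not compiler code; the table below hand-encodes a signature like \texttt{<T, U where T:\ Sequence, U:\ Sequence, T.Element == U.Element>}, with made-up vertex names):

```python
# A type parameter is a path from the root; equivalent type parameters are
# exactly those whose paths end at the same vertex. Edges are stored as a
# map from (vertex, label) to the destination vertex.
edges = {
    ("root", "T"): "vT", ("root", "U"): "vU",
    ("vT", ".Iterator"): "vTIter", ("vU", ".Iterator"): "vUIter",
    ("vT", ".Element"): "vElem",   ("vU", ".Element"): "vElem",
    ("vTIter", ".Element"): "vElem", ("vUIter", ".Element"): "vElem",
}

def resolve(type_param):
    """Follow the path 'T.A1.A2...' from the root; None if invalid."""
    head, *rest = type_param.split(".")
    vertex = edges.get(("root", head))
    for name in rest:
        if vertex is None:
            return None
        vertex = edges.get((vertex, "." + name))
    return vertex

# T.Element, U.Element and T.Iterator.Element all reach the same vertex:
assert resolve("T.Element") == resolve("U.Iterator.Element") == "vElem"
assert resolve("T.Index") is None   # not a valid type parameter here
```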
The type parameter graph gives us a new way to see the member type relationships: \begin{center} \begin{tikzpicture}[x=3cm,y=1.3cm] \node (Root) [root] at (0,1) {root}; @@ -405,9 +374,49 @@ \section{The Type Parameter Graph}\label{type parameter graph} Notice how the paths corresponding to each one of the type parameters \texttt{\rT.Element}, \texttt{\rU.Element}, \texttt{\rT.Iterator.Element}, and \texttt{\rU.Iterator.Element} all have the same destination, because they belong to the same equivalence class $\archetype{\rT.Element}$. \end{example} -When our generic signature only has a single generic parameter \rT, we can omit the root vertex. This changes our formulation slightly, because now a type parameter maps to a path that starts at \rT, so it has one fewer step. +\begin{example}\label{two sequence same iterator example} Now, recall this generic signature from \ExRef{same name rule example}: +\begin{quote} +\begin{verbatim} +<τ_0_0, τ_0_1 where τ_0_0: Sequence, τ_0_1: Sequence, + τ_0_0.Iterator == τ_0_1.Iterator> +\end{verbatim} +\end{quote} +Here is its type parameter graph: +\begin{center} +\begin{tikzpicture}[x=3cm,y=1.3cm] +\node (Root) [root] at (0,1) {root}; + +\node (T) [interior] at (1,2) {\rT}; +\node (U) [interior] at (1,0) {\rU}; + +\node (TIterator) [interior] at (2.4,1) {\texttt{\rT.Iterator}}; + +\node (TElement) [interior] at (1,1) {\texttt{\rT.Element}}; + +\begin{scope}[on background layer] +\path (Root) edge [arrow] node [sloped, above] {\tiny{\rT}} (T); +\path (Root) edge [arrow] node [sloped, below] {\tiny{\rU}} (U); + +\path (T) edge [arrow] node [sloped, above] {\tiny{\texttt{.Iterator}}} (TIterator); +\path (U) edge [arrow] node [sloped, below] {\tiny{\texttt{.Iterator}}} (TIterator); + +\path (T) edge [arrow] node [right] {\tiny{\texttt{.Element}}} (TElement); +\path (U) edge [arrow] node [right] {\tiny{\texttt{.Element}}} (TElement); + +\path (TIterator) edge [arrow] node [above] {\tiny{\texttt{.Element}}} (TElement); + +\end{scope} 
+\end{tikzpicture} +\end{center} +From the above graph, we see that the type parameters \texttt{\rT.Element}, \texttt{\rU.Element}, \texttt{\rT.Iterator.Element}, and \texttt{\rU.Iterator.Element} are also equivalent in this signature, just as in the previous example. Here though, the paths for \texttt{\rT.Iterator} and \texttt{\rU.Iterator} also end at the same vertex. +\end{example} + +In each of the remaining examples, our generic signature only has a single generic parameter~\rT, so to simplify the presentation, we omit the distinguished root vertex from the graph. Instead, we say that the equivalence class of \rT\ is the root. This changes our formulation slightly, for now the path corresponding to a type parameter has one fewer step; there is no edge corresponding to \rT. + +The next example exhibits an \index{infinite graph!type parameter graph}infinite type parameter graph. In any given compilation session, the set of archetypes that are actually instantiated will form an arbitrarily large but finite \index{subgraph}subgraph of the generic signature's type parameter graph. -\begin{example}\label{protocol n graph} Recall this protocol from \ExRef{protocol n example}: +\begin{example}\label{protocol n graph} +Recall this protocol from \ExRef{protocol n example}: \begin{Verbatim} protocol N { associatedtype A: N @@ -430,15 +439,13 @@ \section{The Type Parameter Graph}\label{type parameter graph} \end{center} \end{example} -In general, the type parameter graph may be \index{infinite graph!type parameter graph}infinite. If we think of archetypes as the vertices in this graph, then we see that the lazy construction of archetypes means that in any given compilation session, the compiler will explore an arbitrarily large but finite \index{subgraph}subgraph of the type parameter graph. 
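The idea that a compilation session only ever materializes a finite subgraph of an infinite type parameter graph can be sketched like so (illustrative Python; the class and method names are invented, and protocol \texttt{N} is hard-coded with its single associated type):

```python
# Lazy exploration of the infinite type parameter graph of protocol N:
# every T.A...A lies in its own equivalence class, so the full graph is
# infinite, but we only materialize the vertices we actually visit.
class LazyGraphN:
    def __init__(self):
        self.vertices = {"T"}          # just the root, initially

    def successor(self, vertex, name):
        assert name == ".A"            # N declares a single associated type
        dst = vertex + name
        self.vertices.add(dst)         # materialize the vertex on demand
        return dst

g = LazyGraphN()
v = "T"
for _ in range(3):                     # instantiate T.A, T.A.A, T.A.A.A
    v = g.successor(v, ".A")
assert v == "T.A.A.A"
assert len(g.vertices) == 4            # only the explored finite subgraph
```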
- \begin{example}\label{protocol z4 graph} Now let's change \tN\ by adding an associated same-type requirement: \begin{Verbatim} protocol Z4 { associatedtype A: Z4 where Self == Self.A.A.A.A } \end{Verbatim} -Just like $\GN$, the protocol generic signature $G_\texttt{Z4}$ still defines an infinite set of valid type parameters, but the set of equivalence classes is now finite. The type parameter graph is the \IndexDefinition{cycle graph}\emph{cycle graph} of order~4: +Just like $\GN$, the protocol generic signature $G_\texttt{Z4}$ still defines an infinite set of valid type parameters, but the set of equivalence classes is now finite, so we're back in the world of finite graphs. The type parameter graph is the \IndexDefinition{cycle graph}\emph{cycle graph} of order~4: \begin{center} \begin{tikzpicture}[x=2cm] \node (T) [root] at (1,2) {\rT}; @@ -452,7 +459,7 @@ \section{The Type Parameter Graph}\label{type parameter graph} \path [arrow, bend left] (TAAA) edge [left] node [xshift=-3pt] {\tiny{\texttt{.A}}} (T); \end{tikzpicture} \end{center} -For example, \texttt{\ttgp{0}{0}.A} and \texttt{\ttgp{0}{0}.A.A.A.A.A} are equivalent. Note that a generic signature with an \index{theory!infinite}infinite theory but only finitely many equivalence classes must have at least one infinite equivalence class. In $G_\texttt{Z4}$, \emph{every} equivalence class is infinite. +For example, \texttt{\ttgp{0}{0}.A} and \texttt{\ttgp{0}{0}.A.A.A.A.A} are equivalent in this generic signature. Notice how if a generic signature has an \index{theory!infinite}infinite theory but only finitely many equivalence classes, a simple counting argument shows that at least one equivalence class must be infinite. In $G_\texttt{Z4}$, \emph{every} equivalence class is infinite. 
\end{example} \begin{example}\label{protocol collection graph} Recall the simplified \tCollection\ protocol from \ExRef{protocol collection example}: @@ -463,7 +470,7 @@ \section{The Type Parameter Graph}\label{type parameter graph} where SubSequence == SubSequence.SubSequence } \end{Verbatim} -We studied the protocol generic signature $G_\texttt{Collection}$ and again we saw that it defines an infinite theory, but only a finite set of equivalence classes. Here is the type parameter graph: +When we studied the protocol generic signature $G_\texttt{Collection}$, we saw that it defines an infinite theory, but only a finite set of equivalence classes. The type parameter graph looks like this: \begin{center} \begin{tikzpicture}[x=3.2cm,y=1.3cm] @@ -492,7 +499,7 @@ \section{The Type Parameter Graph}\label{type parameter graph} \end{tikzpicture} \end{center} -The equivalence class $\archetype{\rT.SubSequence}$ contains infinitely many type parameters, and just like $G_\texttt{Z4}$, the type parameter graph contains a \IndexDefinition{cycle}\emph{cycle}, or a path with the same source and destination, which we can take any number of times to generate more type parameters in this equivalence class: +The equivalence class $\archetype{\rT.SubSequence}$ contains infinitely many type parameters, and just like $G_\texttt{Z4}$, the type parameter graph of $G_\texttt{Collection}$ contains a cycle of length~1 (sometimes known as a \index{loop!see{cycle}}\emph{loop}), which we can take any number of times to generate more type parameters in this equivalence class: \begin{quote} \begin{verbatim} τ_0_0.SubSequence @@ -501,11 +508,11 @@ \section{The Type Parameter Graph}\label{type parameter graph} ... \end{verbatim} \end{quote} -The other infinite equivalence classes are formed by taking the cycle zero or more times, followed by some other edge. 
In general, if a generic signature has an infinite theory but a finite set of equivalence classes, the type parameter graph must contain a cycle.
+We get the other infinite equivalence classes by taking the cycle zero or more times, followed by some other edge. In general, if a generic signature has an infinite theory but a finite set of equivalence classes, the type parameter graph must contain a cycle. However, this is no longer the case if the set of equivalence classes is itself infinite.
 \end{example}
 
 \begin{example}\label{protocol indexable graph}
-The real \tCollection\ protocol declares two more associated types, \texttt{Index} and \texttt{Indices}, and imposes some associated requirements on them. We can subset this out into a new protocol to see what it looks like:
+The real \tCollection\ protocol declares two more associated types, \texttt{Index} and \texttt{Indices}, and imposes some requirements on them, which boil down to:
 \begin{Verbatim}
 protocol Indexable {
   associatedtype Index
@@ -513,7 +520,7 @@ \section{The Type Parameter Graph}\label{type parameter graph}
     where Index == Indices.Index
 }
 \end{Verbatim}
-The protocol generic signature $G_\texttt{Indexable}$ defines an infinite set of equivalence classes, like $\GN$. The equivalence class of \texttt{\rT.Index} also has infinitely many representatives:
+The protocol generic signature $G_\texttt{Indexable}$ defines an infinite set of equivalence classes, and the equivalence class of \texttt{\rT.Index} is infinite, and yet the graph is acyclic:
 \begin{center}
 \begin{tikzpicture}[y=1.8cm]
 \node (T) [root] at (0,1) {\texttt{\vphantom{y}\rT}};
@@ -557,10 +564,10 @@ \section{The Type Parameter Graph}\label{type parameter graph}
 If $v$ and $w$ are vertices in some directed graph $(V, E)$, we say that $v$ is a \IndexDefinition{successor!of vertex}\emph{successor} of $w$ if there is an edge~$e\in E$ having $\Src(e)=w$ and $\Dst(e)=v$. Likewise, $w$~is a \IndexDefinition{predecessor}\emph{predecessor} of~$v$.
A vertex in the type parameter graph always has a finite set of successors, because all edges that share a source vertex correspond to distinct associated type declarations. The \texttt{Indexable} example above shows that we cannot make the same claim about the \emph{predecessor} set of a vertex, because the vertex $\archetype{\rT.Index}$ has an infinite set of predecessors in $G_\texttt{Indexable}$. \begin{definition}\label{locally finite def} -A directed graph $(V,E)$ is \IndexDefinition{locally finite graph}\emph{locally finite} if for every vertex $v\in V$, there are only finitely many edges $e\in E$ with $\Src(e)=v$. +A directed graph $(V,E)$ is \IndexDefinition{locally finite graph}\emph{locally finite} if every vertex has a finite set of successors, so for every $v\in V$, there are finitely many $e\in E$ with $\Src(e)=v$. \end{definition} -In \SecRef{finding conformance paths}, we will define another directed graph for each generic signature, called the \emph{conformance path graph}. +In \SecRef{finding conformance paths}, we will meet another locally finite directed graph that we can associate with a generic signature, called the \emph{conformance path graph}. \section{The Archetype Builder}\label{archetype builder} @@ -593,7 +600,7 @@ \section{The Archetype Builder}\label{archetype builder} \end{center} A potential archetype~$t$ always represents a fixed type parameter \tT, so that $\PAType(t)=\tT$ for the lifetime of~$t$. In the initial state, we have $\PAForward(t)=\text{null}$, $\PAConforms(t)=\varnothing$, and $\PAMembers(t)=\varnothing$. Potential archetypes are allocated sequentially while the algorithm runs, and then freed all at once when the entire archetype builder instance is freed. -The archetype builder establishes the following invariant. Any potential archetype with a null forwarding pointer represents the reduced type parameter of its equivalence class. 
In this case, $\PAConforms(t)$ is the set of all protocols this equivalence class conforms to, and $\PAMembers(t)$ lists all of the \index{successor!of potential archetype}successors of the equivalence class in the type parameter graph. Otherwise, if a potential archetype~$t$ has a non-null forwarding pointer, it points to some other potential archetype in the same equivalence class, and the other potential archetype precedes it in type parameter order; that is, $\PAType(\PAForward(t))<\PAType(t)$. By following this chain, we eventually end up at the reduced type parameter. +The archetype builder establishes the following invariant. Any potential archetype with a null forwarding pointer represents the reduced type parameter of its equivalence class. In this case, $\PAConforms(t)$ is the set of all protocols this equivalence class conforms to, and $\PAMembers(t)$ lists all of the \index{successor!of potential archetype}successors of the equivalence class in the \index{type parameter graph}type parameter graph. Otherwise, if a potential archetype~$t$ has a non-null forwarding pointer, it points to some other potential archetype in the same equivalence class, and the other potential archetype precedes it in type parameter order; that is, $\PAType(\PAForward(t))<\PAType(t)$. By following this chain, we eventually end up at the reduced type parameter. We now describe how this invariant is established, then show how the resulting data structure can implement generic signature queries. @@ -663,7 +670,7 @@ \section{The Archetype Builder}\label{archetype builder} \end{enumerate} \end{algorithm} -Once the algorithm returns, the potential archetype structure now encodes the finite type parameter graph built from the input requirements, assuming we didn't diagnose an invalid recursive conformance. 
At this point, no more expansion or merging takes place; the remaining potential archetypes, those without forwarding pointers, are ``frozen'' into immutable archetype \emph{types} that describe this generic signature to the rest of the compiler. While generic signature queries did not exist back then, we can reconcile this algorithm with our present model by defining each generic signature query as follows: +Once the algorithm returns, the potential archetype structure now encodes the finite \index{type parameter graph}type parameter graph built from the input requirements, assuming we didn't diagnose an invalid recursive conformance. At this point, no more expansion or merging takes place; the remaining potential archetypes, those without forwarding pointers, are ``frozen'' into immutable archetype \emph{types} that describe this generic signature to the rest of the compiler. While generic signature queries did not exist back then, we can reconcile this algorithm with our present model by defining each generic signature query as follows: \begin{itemize} \item \IndexQuery{isValidTypeParameter}$\Query{isValidTypeParameter}{G,\,\tT}$: Resolve \tT\ to a potential archetype~$t$, and check if~$t$ is non-null. \item \IndexQuery{getReducedType}$\Query{getReducedType}{G,\,\tT}$: Resolve \tT\ to a potential archetype~$t$, and return $\PAType(t)$. @@ -681,17 +688,15 @@ \section{The Archetype Builder}\label{archetype builder} \paragraph{Lazy expansion.} From a theoretical point of view, the archetype builder's approach amounts to exhaustive enumeration of all \index{derived requirement!enumeration}derived requirements and \index{valid type parameter!enumeration}valid type parameters of a generic signature, made slightly more efficient by the choice of data structure (the asymmetry in the handling of member types in \AlgRef{archetype builder merge} means we skip parts of the search space that would yield nothing new). 
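The forwarding-pointer invariant and the \texttt{getReducedType}-style lookup described above can be sketched as follows (a hypothetical Python model; the real potential archetype structure is a C++ data structure inside the archetype builder):

```python
# A null (None) forward pointer marks the representative (reduced) type
# parameter of an equivalence class; otherwise the pointer leads to a
# lesser type parameter in the same class.
class PotentialArchetype:
    def __init__(self, type_param):
        self.type = type_param   # the fixed type parameter it represents
        self.forward = None      # None => this is the reduced representative

def reduced(pa):
    """Follow the forwarding chain to the reduced representative."""
    while pa.forward is not None:
        pa = pa.forward
    return pa

t_elem = PotentialArchetype("T.Element")
t_iter_elem = PotentialArchetype("T.Iterator.Element")
t_iter_elem.forward = t_elem   # merged: T.Iterator.Element == T.Element

# A getReducedType-style query: resolve, then read the representative's type.
assert reduced(t_iter_elem).type == "T.Element"
assert reduced(t_elem) is t_elem
```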
-The eager expansion model survived the introduction of protocol \texttt{where} clauses in Swift~4 \cite{se0142}, and thus associated requirements, with only relatively minor changes. The introduction of recursive conformances in Swift~4.1~\cite{se0157} necessitated a larger overhaul. Once the type parameter graph becomes infinite, the eager conformance requirement expansion of \AlgRef{archetype builder expand} no longer makes sense. The \texttt{ArchetypeBuilder} was renamed to the \IndexDefinition{GenericSignatureBuilder@\texttt{GenericSignatureBuilder}}\texttt{GenericSignatureBuilder} as part of a re-design where the recursive expansion was now performed as needed, within the lookup of \AlgRef{archetype builder lookup} itself \cite{implrecursive}. +The eager expansion model survived the introduction of protocol \texttt{where} clauses in Swift~4 \cite{se0142}, and thus associated requirements, with only relatively minor changes. The introduction of recursive conformances in Swift~4.1~\cite{se0157} necessitated a larger overhaul. Once the \index{type parameter graph}type parameter graph becomes infinite, the eager conformance requirement expansion of \AlgRef{archetype builder expand} no longer makes sense. The \texttt{ArchetypeBuilder} was renamed to the \IndexDefinition{GenericSignatureBuilder@\texttt{GenericSignatureBuilder}}\texttt{GenericSignatureBuilder} as part of a re-design where the recursive expansion was now performed as needed, within the lookup of \AlgRef{archetype builder lookup} itself \cite{implrecursive}. In the lazy expansion model, the potential archetype structure was highly mutable, as generic signature queries would create new potential archetypes and merge existing potential archetypes while exploring new \index{subgraph}subgraphs of the type parameter graph. 
The on-going mutation prevented sharing of structure between \texttt{GenericSignatureBuilder} instances, and every generic signature that depends on a complicated standard library protocol, such as \texttt{RangeReplaceableCollection}, would eventually construct its own copy of a large graph of potential archetypes. This would significantly impact compiler memory usage and performance. Lazy expansion also suffered from correctness problems. Even the second family of generic signatures, those with an infinite theory but a finite set of equivalence classes, did not always work properly. \ExRef{protocol z4 graph} happened to be one such instance, and we will later see another in \ExRef{proto assoc rule}. An even more fundamental problem was discovered later. As we will see in \SecRef{word problem}, the third family of generic signatures, those having an infinite set of equivalence classes, actually contains generic signatures where the question of reduced type equality is \emph{undecidable}. Such generic signatures must be rejected by any correct implementation of Swift. By virtue of its design, the \texttt{GenericSignatureBuilder} pretended to be able to accept any generic signature and answer queries about it. Lazy expansion was a dead end! This seemed surprising at the time, because in the absence of recursive conformances, eager expansion completely ``solved'' Swift generics in a straightforward way. -These problems motivated the search for a sound and decidable foundation on top of which Swift generics should be built, which became the Requirement Machine. Instead of incrementally constructing finite subgraphs of the type parameter graph, the correct approach is to construct a \emph{convergent rewriting system}. While more abstract, this is actually much \emph{simpler} than lazy expansion of the type parameter graph. 
Just like with the original eager expansion design, the rewriting system is a finite data structure that is built once and then remains immutable after construction. Unlike eager expansion, a rewriting system can describe an infinite set of equivalence classes, and in many cases, it can also encode a \emph{finite} set without exhaustive enumeration. We will describe the contemporary approach and present a correctness proof in \PartRef{part rqm}. - - +These problems motivated the search for a sound and decidable foundation on top of which Swift generics should be built, which became the Requirement Machine. Instead of incrementally constructing finite subgraphs of the type parameter graph, the correct approach is to construct a \emph{convergent rewriting system}. While more abstract, this is actually much \emph{simpler} than lazy expansion of the type parameter graph. Just like with the original eager expansion design, the rewriting system is a finite data structure that is built once and then remains immutable after construction. Unlike eager expansion, a rewriting system can describe an infinite set of equivalence classes, and in many cases, it can also encode a \emph{finite} set without exhaustive enumeration. We will investigate the contemporary approach and study a correctness proof in \PartRef{part rqm}. -\section{Source Code Reference} +\section{Source Code Reference}\label{src:archetypes} \IndexSource{generic environment} \IndexSource{map type into environment} @@ -707,31 +712,31 @@ \section{Source Code Reference} \end{itemize} \apiref{GenericSignature}{class} -See also \SecRef{genericsigsourceref}. +See also \SecRef{src:generic signatures}. \begin{itemize} \item \texttt{getGenericEnvironment()} returns the \IndexSource{primary generic environment}primary generic environment associated with this generic signature. \end{itemize} \apiref{TypeBase}{class} -See also \SecRef{typesourceref}. +See also \SecRef{src:types}. 
\begin{itemize} \item \texttt{mapTypeOutOfContext()} returns the interface type obtained by \IndexSource{map type out of environment}mapping this contextual type out of its generic environment. \end{itemize} \apiref{SubstitutionMap}{class} -See also \SecRef{substmapsourcecoderef}. +See also \SecRef{src:substitution maps}. \begin{itemize} \item \texttt{mapReplacementTypesOutOfContext()} returns the substitution map obtained by \IndexSource{map replacement types out of environment}mapping this substitution map's replacement types and conformances out of their generic environment. \end{itemize} \apiref{ProtocolConformanceRef}{class} -See also \SecRef{conformancesourceref}. +See also \SecRef{src:conformances}. \begin{itemize} \item \texttt{mapConformanceOutOfContext()} returns the protocol conformance obtained by mapping this protocol conformance out of its generic environment. \end{itemize} \apiref{DeclContext}{class} -See also \SecRef{declarationssourceref}. +See also \SecRef{src:declarations}. \begin{itemize} \item \texttt{getGenericEnvironmentOfContext()} returns the generic environment of the innermost generic declaration containing this declaration context. \item \texttt{mapTypeIntoContext()} Maps an interface type into the primary generic environment for the innermost generic declaration. If at least one outer declaration context is generic, this is equivalent to: @@ -749,13 +754,13 @@ \section{Source Code Reference} \end{itemize} Taking an archetype apart: \begin{itemize} -\item \texttt{getInterfaceType()} returns the reduced type parameter of this archetype. +\item \texttt{getInterfaceType()} returns the \IndexSource{reduced type parameter}reduced type parameter of this archetype. \item \texttt{getGenericEnvironment()} returns the archetype's generic environment. \item \texttt{isRoot()} answers if the reduced type parameter is a generic parameter type. 
\end{itemize} Local requirements (\SecRef{local requirements}): \begin{itemize} -\item \texttt{getConformsTo()} returns the archetype's required protocols. This set does not include inherited protocols. To actually check if an archetype conforms to a specific protocol, use global conformance lookup (\SecRef{conformancesourceref}) instead of looking through this array. +\item \texttt{getConformsTo()} returns the archetype's required protocols. This set does not include inherited protocols. To actually check if an archetype conforms to a specific protocol, use \IndexSource{global conformance lookup}global conformance lookup (\SecRef{src:conformances}) instead of looking through this array. \item \texttt{getSuperclass()} returns the archetype's superclass bound, or the empty \texttt{Type} if there isn't one. \item \texttt{requiresClass()} answers with the requires class flag. \item \texttt{getLayoutConstraint()} returns the layout constraint, or the empty layout constraint if there isn't one. diff --git a/docs/Generics/chapters/basic-operation.tex b/docs/Generics/chapters/basic-operation.tex index 1f101832cac41..9795ea47f055f 100644 --- a/docs/Generics/chapters/basic-operation.tex +++ b/docs/Generics/chapters/basic-operation.tex @@ -4,38 +4,38 @@ \chapter{Basic Operation}\label{rqm basic operation} -\lettrine{T}{he final part} of this book is devoted to the implementation of \index{generic signature query}generic signature queries (\SecRef{genericsigqueries}) and \index{minimal requirement}requirement minimization (\SecRef{minimal requirements}). More precisely, we're looking at the \emph{decision procedure} for the \index{derived requirement!decision procedure}derived requirements formalism. We achieve this by translating the requirements of each generic signature and protocol into \emph{rewrite rules}, and then analyzing the relationships between these rewrite rules. 
+\lettrine{T}{he final part} of this book is devoted to the implementation of \index{generic signature query}generic signature queries (\SecRef{genericsigqueries}) and \index{minimal requirement}requirement minimization (\SecRef{minimal requirements}). Our goal will be to understand a \emph{decision procedure} for the \index{derived requirement!decision procedure}derived requirements formalism. We will learn how to translate the requirements of each generic signature and protocol into \emph{rewrite rules}, and then we will analyze a certain relation generated by these rules. We use the proper noun \IndexDefinition{requirement machine}``the Requirement Machine'' to mean the compiler component responsible for this reasoning, while \index{requirement machine}\emph{a} requirement machine---a common noun---is an \emph{instance} of a data structure containing those rewrite rules that describe a specific generic signature or protocol. -In this chapter, we give a high-level overview of how requirement machines operate, without saying anything about what goes on inside. Understanding their inner workings will take two more chapters. \ChapRef{monoids} introduces the theory of finitely-presented monoids and string rewriting, and \ChapRef{symbols terms rules} details the translation of requirements into rewrite rules. +In this chapter, we will gain a high-level overview of how requirement machines operate, without saying anything about what goes on inside. Understanding their inner workings will take two more chapters. \ChapRef{monoids} introduces the theory of finitely-presented monoids and string rewriting, and \ChapRef{chap:symbols terms rules} details the translation of requirements into rewrite rules. \smallskip Let's go. 
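As a first taste of string rewriting, the toy sketch below (our own illustration in Swift; the names \texttt{Rule} and \texttt{reduce}, and the ``\texttt{$\tau$.A}'' notation, are invented and bear no relation to the compiler's term representation) repeatedly applies rewrite rules whose right-hand sides are shorter than their left-hand sides, so reduction always terminates:

```swift
import Foundation  // for String.range(of:) on all platforms

// A toy rewrite rule: replace an occurrence of `lhs` with `rhs`.
// We assume each rhs is strictly shorter than its lhs, so that
// every replacement shrinks the term and reduction terminates.
struct Rule {
    let lhs: String
    let rhs: String
}

// Apply rules until no rule's left-hand side occurs in the term.
func reduce(_ term: String, rules: [Rule]) -> String {
    var current = term
    var changed = true
    while changed {
        changed = false
        for rule in rules {
            if let range = current.range(of: rule.lhs) {
                current.replaceSubrange(range, with: rule.rhs)
                changed = true
                break  // restart from the first rule
            }
        }
    }
    return current
}

// A single rule collapsing a repeated step: τ.A.A → τ.A.
let rules = [Rule(lhs: "τ.A.A", rhs: "τ.A")]
print(reduce("τ.A.A.A", rules: rules))  // τ.A
```

The real machine works over interned symbol terms rather than Unicode strings, and orients its rules with a reduction order rather than by length alone, but the shape of the computation is the same.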
There are two principal entry points into the Requirement Machine: \begin{itemize} -\item To answer a \index{generic signature query}\textbf{generic signature query}, we construct a requirement machine from the requirements of the given generic signature; we call this a \emph{query machine}. We then consult the machine's \emph{property map}: a description of all conformance, superclass, layout and concrete type requirements imposed on each type parameter. +\item To \index{generic signature query}\textbf{answer a generic signature query}, we first construct a requirement machine from the requirements of the given generic signature; we call this a \emph{query machine}. We then consult the query machine's \emph{property map}: a description of all conformance, superclass, layout, and concrete type requirements imposed on each type parameter. -\item To \textbf{build a new generic signature}, we construct a requirement machine from a list of user-written requirements; we call this a \emph{minimization machine}. We compute the minimal requirements as a side-effect of the construction process. +\item To \textbf{build a new generic signature}, we construct a requirement machine from a list of user-written requirements; we call this a \emph{minimization machine}. We get the new signature's minimal requirements as a side-effect of the construction process. \end{itemize} -In addition, we have two more for when we need to reason about protocols: +In addition, we have two more entry points related to protocol declarations: \begin{itemize} -\item To answer questions about \textbf{a protocol's requirement signature}, we construct a requirement machine from the protocol's requirement signature; we call this a \emph{protocol machine}. +\item To construct a query or minimization machine for a generic signature that states a conformance requirement, we first build a requirement machine for the protocol's requirement signature; we call this a \emph{protocol machine}. 
\item To \textbf{build a new requirement signature} for a protocol written in source, we construct a requirement machine from the list of user-written requirements; we call this a \emph{protocol minimization machine}. We compute the minimal associated requirements as a side-effect of the construction process.
\end{itemize}
-We partition the rewrite rules of a requirement machine into \IndexDefinition{local rule}\emph{local rules}, corresponding to the initial requirements the machine was built from, and \emph{imported rules}, which as we see shortly, describe associated requirements of protocols referenced by the local rules. We now look at each of the four requirement machine kinds in detail.
+A requirement machine's rewrite rules are divided into \IndexDefinition{local rule}\emph{local rules}, corresponding to the initial requirements the machine was built from, and \emph{imported rules}, which, as we will see shortly, describe associated requirements of protocols referenced by the local rules. We now look at each of the four requirement machine kinds in detail.

\paragraph{Query machines.}
We maintain a table of \IndexDefinition{query machine}\emph{query machine} instances inside a singleton object, called the \IndexDefinition{rewrite context}\emph{rewrite context}. We can use \index{canonical generic signature}\emph{canonical} generic signatures as keys, because the translation of requirements into rewrite rules ignores \index{sugared type!in requirement machine}type sugar. Once built, a query machine remains live for the duration of the compilation session. We build a query machine from a \index{generic signature!query machine}generic signature like so:
\begin{enumerate}
-\item We translate the generic signature's explicit requirements into rewrite rules using \AlgRef{build rule} of \SecRef{building rules} to get the list of local rules for our query machine.
+\item We translate the generic signature's explicit requirements into rewrite rules using \AlgRef{build rule} to get the list of local rules for our query machine. \item We lazily construct a \emph{protocol machine} for each \index{protocol declaration}protocol appearing on the right-hand side of a \index{conformance requirement!protocol dependency}conformance requirement in our signature. \item We use \AlgRef{importing rules} to collect local rules from each of these protocol machines as well as all protocol machines they transitively reference, to get the list of imported rules for our query machine. -\item We run the \index{completion}\emph{completion procedure} (\ChapRef{completion}) to add new local rules which are ``consequences'' of other rules. This gives us a \index{convergent rewriting system}\emph{convergent rewriting system}. +\item We run the \index{completion}\emph{completion procedure} (\ChapRef{chap:completion}) to add new local rules which are ``consequences'' of other rules. This gives us a \index{convergent rewriting system}\emph{convergent rewriting system}. \item We build the property map data structure (\ChapRef{propertymap}). @@ -68,7 +68,7 @@ \chapter{Basic Operation}\label{rqm basic operation} \end{center} \paragraph{Minimization machines.} -To compute the list of \index{minimal requirement}minimal requirements for a new generic signature, we build a \IndexDefinition{minimization machine}\emph{minimization machine}. This is the final step in \FigRef{inferred generic signature request figure} and \FigRef{abstract generic signature request figure} of \ChapRef{building generic signatures}. Minimization machines have a temporary lifetime owned by the \index{inferred generic signature request}\Request{inferred generic signature request} or \index{abstract generic signature request}\Request{abstract generic signature request}. 
The flow is similar to building a query machine, except now the inputs are user-written requirements after \index{desugared requirement!in requirement machine}desugaring (\SecRef{requirement desugaring}). The other main differences are: +To compute the list of \index{minimal requirement}minimal requirements for a new generic signature, we build a \IndexDefinition{minimization machine}\emph{minimization machine}. This is the final step in \FigRef{inferred generic signature request figure} and \FigRef{abstract generic signature request figure} of \ChapRef{chap:building generic signatures}. Minimization machines have a temporary lifetime owned by the \index{inferred generic signature request}\Request{inferred generic signature request} or \index{abstract generic signature request}\Request{abstract generic signature request}. The flow is similar to building a query machine, except now the inputs are user-written requirements after \index{desugared requirement!in requirement machine}desugaring (\SecRef{requirement desugaring}). The other main differences are: \begin{enumerate} \item Completion must do additional work if the input requirements contain \index{unbound dependent member type!in requirements}unbound dependent member types, as they do when they come from the \index{structural resolution stage}structural resolution stage. \item Completion also records \emph{rewrite loops}, which describe relations between rewrite rules---in particular, this describes which rules are redundant because they are a consequence of existing rules. @@ -137,7 +137,7 @@ \chapter{Basic Operation}\label{rqm basic operation} \item As we will see in \SecRef{word problem}, completion may fail if the user-written requirements are too complex to reason about. In this case the minimization machine outputs an empty list of minimal requirements. 
-\item If a conformance requirement is made redundant by a same-type requirement that fixes a type parameter to a concrete type (such as $\TP$ and $\SameReq{T}{X}$ where \tX\ is a concrete type conforming to \tP), we drop the conformance requirement, but this changes the theory; we will talk about this in \ChapRef{concrete conformances}. +\item If a conformance requirement is made redundant by a same-type requirement that fixes a type parameter to a concrete type (such as $\TP$ and $\SameReq{T}{X}$ where \tX\ is a concrete type conforming to \tP), we drop the conformance requirement, but this changes the theory; we will talk about this in \SecRef{rqm concrete conformances}. \end{enumerate} If any of these conditions hold, we record this fact while building the minimization machine. This prevents the minimization machine from being installed into the rewrite context, forcing us to discard it instead. The \IndexFlag{disable-requirement-machine-reuse}\texttt{-disable-requirement-machine-reuse} frontend flag is intended for debugging. It disables this optimization entirely, forcing us to discard all minimization machines immediately without attempting to install them. @@ -147,15 +147,15 @@ \chapter{Basic Operation}\label{rqm basic operation} Protocol machines have a global lifetime and are owned by the \index{rewrite context}rewrite context. The procedure for building a protocol machine is similar to building a query machine: \begin{enumerate} -\item We translate the associated requirements of each protocol into rewrite rules, using \AlgRef{build rule protocol} of \SecRef{building rules}. These become the protocol machine's local rules. +\item We translate the associated requirements of each protocol into rewrite rules, using \AlgRef{build rule protocol}. These become the protocol machine's local rules. 
\item We lazily construct a protocol machine for each protocol that appears on the right-hand side of an \index{associated conformance requirement!protocol dependency}associated conformance requirement. \item We use \AlgRef{importing rules} to collect local rules from each of these protocol machines as well as all protocol machines they transitively reference, to get our imported rules. Thus, protocol machines recursively import rules from other protocol machines. \item Completion and property map construction proceed as in the query machine case. \end{enumerate} -Usually the user's program will declare an assortment of protocols, with various generic types and functions then stating conformance requirements to those protocols, so that each protocol is typically mentioned in several generic signatures. For this reason, it is often the case that requirement machines for different generic signatures often have many rewrite rules in common, because they depend on the same protocols. +A typical scenario is that the user's program declares an assortment of protocols, and then proceeds to reference those protocols multiple times, by stating conformance requirements on various generic types and functions. For this reason, the requirement machines for two different generic signatures may have many rewrite rules in common, if they depend on the same protocols. -The role of protocol machines is to serve as containers for these shared rules. This eliminates the overhead of translating the associated requirements of each protocol, and processing them with the completion procedure, in every requirement machine that contains a conformance requirement to this protocol. Instead, we look up the protocol machine and \emph{import} its rewrite rules when building a requirement machine that depends on a protocol. +Protocol machines serve as containers for these shared rules. 
This eliminates the overhead of repeatedly translating the associated requirements of a protocol into rewrite rules, and then running the completion procedure on these rules. Instead, when a query or minimization machine depends on a protocol, we lazily create the protocol machine for this protocol once, and \emph{import} its rewrite rules whenever they are needed to build a requirement machine. \paragraph{Protocol minimization machines.} To actually build the requirement signature of a protocol written in source, we construct a \IndexDefinition{protocol minimization machine}\emph{protocol minimization machine}. A protocol minimization machine has a temporary lifetime, scoped to the \index{requirement signature request}\Request{requirement signature request}. The minimal requirements of the requirement signature are computed as a side-effect of constructing the protocol minimization machine. @@ -163,7 +163,7 @@ \chapter{Basic Operation}\label{rqm basic operation} In the absence of error conditions, we install the protocol minimization machine in the rewrite context, so that it becomes a long-lived protocol machine for the same protocol component. In fact, we only ever directly construct a protocol machine for a protocol written in source if we failed to install the protocol minimization machine. Otherwise, protocol machines are usually only built when the user's program references a protocol from a \index{serialized module}serialized module, such as the standard library. 
\begin{example}
-We talked about these declarations from the start of \ChapRef{genericsig} most recently in \ExRef{same name rule example}:
+We talked about these declarations from the start of \ChapRef{chap:generic signatures} most recently in \ExRef{same name rule example}:
\begin{Verbatim}
func sameElt<S1: Sequence, S2: Sequence>(_ s1: S1, _ s2: S2) where S1.Element == S2.Element {...}
@@ -226,7 +226,7 @@ \chapter{Basic Operation}\label{rqm basic operation}

\section{Protocol Components}\label{protocol component}

-We now take a closer look at how generic signatures and protocols depend on other protocols, and fully explain how protocol machines work. We begin by revisiting the \index{protocol dependency graph}protocol dependency graph, from \DefRef{protocol dependency graph def} of \SecRef{recursive conformances}. Recall that the \index{vertex}vertices are protocol declarations, and the \index{edge}edges are \index{associated conformance requirement!protocol dependency}associated conformance requirements. That is, if $\AssocConfReq{Self.U}{P}{Q}$ is some associated conformance requirement, we define:
+In this section, we take a closer look at how generic signatures and protocols depend on other protocols, and fully explain how protocol machines work. We begin by revisiting the \index{protocol dependency graph}protocol dependency graph, from \DefRef{protocol dependency graph def}. We recall that the \index{vertex}vertices are protocol declarations, and the \index{edge}edges are \index{associated conformance requirement!protocol dependency}associated conformance requirements. That is, if $\AssocConfReq{Self.U}{Q}{P}$ is some associated conformance requirement, we define:
\begin{align*}
\Src(\AssocConfReq{Self.U}{Q}{P})&:=\tP,\\
\Dst(\AssocConfReq{Self.U}{Q}{P})&:=\tQ.
@@ -243,7 +243,7 @@ \section{Protocol Components}\label{protocol component}
\begin{proof}
For the first part, let \tP\ be any protocol.
We can always derive $\GP\vdash\ConfReq{Self}{P}$ via the explicit requirement $\ConfReq{Self}{P}$ of $\GP$, therefore $\tP\prec\tP$, so $\prec$ is reflexive. -For the second part, let \tP, \tQ\ and \tR\ be protocols such that $\tP\prec\tQ$ and $\tQ\prec\tR$. By definition, $\GP\vdash\ConfReq{Self.U}{Q}$ and $G_\tQ\vdash\ConfReq{Self.V}{R}$ for some type parameters \SelfU\ and \texttt{Self.V}. By \LemmaRef{subst lemma}, $\GP\vdash\ConfReq{Self.U.V}{R}$, where \texttt{Self.U.V} is the type parameter formed by \index{formal substitution}replacing \tSelf\ with \SelfU\ in \texttt{Self.V}. Therefore, $\tP\prec\tR$. +For the second part, let \tP, \tQ, and \tR\ be protocols such that $\tP\prec\tQ$ and $\tQ\prec\tR$. By definition, $\GP\vdash\ConfReq{Self.U}{Q}$ and $G_\tQ\vdash\ConfReq{Self.V}{R}$ for some type parameters \SelfU\ and \texttt{Self.V}. By \LemmaRef{subst lemma}, $\GP\vdash\ConfReq{Self.U.V}{R}$, where \texttt{Self.U.V} is the type parameter formed by \index{formal substitution}replacing \tSelf\ with \SelfU\ in \texttt{Self.V}. Therefore, $\tP\prec\tR$. \end{proof} In fact, $\prec$ is exactly the \index{reachability relation}reachability relation in the protocol dependency graph. @@ -283,11 +283,11 @@ \section{Protocol Components}\label{protocol component} \end{wrapfigure} \paragraph{Recursive conformances.} -In \SecRef{recursive conformances} we saw that recursive conformance requirements introduce cycles in the protocol dependency graph. \ListingRef{protocol component listing} shows some protocol declarations. The protocol dependency graph for this example, drawn on the right, has a cycle. +In \SecRef{recursive conformances}, we said that a conformance requirement is \emph{recursive} if it is part of a cycle in the protocol dependency graph. \ListingRef{protocol component listing} shows some protocol declarations. The protocol dependency graph for this example, drawn on the right, contains a cycle. 
Each one of \texttt{Foo}~and~\texttt{Bar} points at the other via a pair of mutually-recursive associated conformance requirements. Based on our description so far, we cannot build one protocol machine without first building the other: a circular dependency. -We solve this by grouping protocols into \IndexDefinition{protocol component}\emph{protocol components}, so in our example, \texttt{Foo}~and~\texttt{Bar} belong to the same component. A protocol machine describes an entire protocol component, and the local rules of a protocol machine include the associated requirements of all protocols in the component. If we consider dependencies between protocol \emph{components} as opposed to \emph{protocols}, we get a directed \emph{acyclic} graph. To make this all precise, we step back to consider directed graphs in the abstract. +We solve this by grouping protocols into \IndexDefinition{protocol component}\emph{protocol components}, so in our example, \texttt{Foo}~and~\texttt{Bar} belong to the same component. A protocol machine describes an entire protocol component, and the local rules of a protocol machine include the associated requirements of all protocols in the component. If we consider dependencies between protocol \emph{components} as opposed to \emph{protocols}, we get a directed \emph{acyclic} graph. To understand how these components are formed, we first consider directed graphs in the abstract. \begin{listing}\captionabove{Protocol component demonstration}\label{protocol component listing} \begin{Verbatim} @@ -326,7 +326,7 @@ \section{Protocol Components}\label{protocol component} \item \index{transitive relation}(Transitivity) If $x\equiv y$ and $y\equiv z$, then in particular $x\prec y$ and $y\prec z$, so $x\prec z$, because $\prec$ is transitive. By the same argument, we also have $y\prec x$ and $z\prec y$, hence $z\prec x$. Therefore, $x\equiv z$. 
\end{itemize}

-The \index{equivalence class}equivalence classes of $\equiv$ are called the \emph{strongly connected components} of $(V, E)$. We can define a graph where each \emph{vertex} is a strongly connected component of the original graph; two components are joined by an edge if and only if some vertex in the first component is joined by a path with another vertex in the second component in the original graph. (This is \index{well-defined}well-defined, because if $x_1\equiv x_2$ and $y_1\equiv y_2$, then $x_1\prec y_1$ if and only if $x_2\prec y_2$.) The graph of strongly connected components is always acyclic. Furthermore, when the original graph is acyclic, the graph of stringly connected components is the same as the original graph. we can only have $x\equiv y$ if in fact $x$~and~$y$ are the same vertex, in which case every vertex simply belongs to its own one-element strongly connected component.
+The \index{equivalence class}equivalence classes of $\equiv$ are called the \emph{strongly connected components} of $(V, E)$. We can then construct a graph whose \emph{vertices} are the strongly connected components of the original graph; two components are joined by an edge if and only if some vertex in the first component is joined by a path to another vertex in the second component in the original graph. (This is \index{well-defined}well-defined, because if $x_1\equiv x_2$ and $y_1\equiv y_2$, then $x_1\prec y_1$ if and only if $x_2\prec y_2$.) The graph of strongly connected components is always acyclic. Furthermore, if our original graph is acyclic, we have $x\equiv y$ if and only if $x$~and~$y$ are in fact the same vertex; in other words, the graph of strongly connected components of a directed acyclic graph is isomorphic to the original graph itself.
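To see the definition in action, here is a small self-contained sketch (our own, for illustration; the real implementation uses a linear-time algorithm) that computes strongly connected components directly from mutual reachability, so that $x\equiv y$ exactly when each vertex reaches the other:

```swift
// Vertices reachable from v by a depth-first walk of the adjacency lists.
func reachable(from v: Int, in adjacency: [[Int]]) -> Set<Int> {
    var seen: Set<Int> = [v]
    var stack = [v]
    while let u = stack.popLast() {
        for w in adjacency[u] where !seen.contains(w) {
            seen.insert(w)
            stack.append(w)
        }
    }
    return seen
}

// Strongly connected components via mutual reachability: x ≡ y iff
// x reaches y and y reaches x. Quadratic, for illustration only.
func components(_ adjacency: [[Int]]) -> [Set<Int>] {
    let reach = (0..<adjacency.count).map { reachable(from: $0, in: adjacency) }
    var result: [Set<Int>] = []
    for v in 0..<adjacency.count {
        let component = Set((0..<adjacency.count).filter {
            reach[v].contains($0) && reach[$0].contains(v)
        })
        if !result.contains(component) { result.append(component) }
    }
    return result
}

// A 4-vertex graph with one 2-cycle: 0 → 1, 1 → 0, 1 → 2, 2 → 3.
print(components([[1], [0, 2], [3], []]).map { $0.sorted() })
// [[0, 1], [2], [3]]
```

Note that the condensation of this graph, with vertices $\{0,1\}$, $\{2\}$, and $\{3\}$, is acyclic, as the above discussion promises.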
\begin{wrapfigure}{l}{3.3cm} \begin{tikzpicture}[x=1cm, y=1.3cm] @@ -413,9 +413,9 @@ \section{Protocol Components}\label{protocol component} We return to the protocol dependency graph from \ListingRef{protocol component listing}. A depth-first search originating from \texttt{Top} produces the numbering of vertices shown on the right, where the first vertex is assigned the value 1. Here, the entire graph was ultimately reachable from 1; more generally, we get a numbering of the \index{subgraph}subgraph reachable from the initial vertex. -When we visit a vertex $v$, we look at each edge $e\in E$ with $\Src(e)=v$. Suppose that $\Dst(e)$ is some other vertex~$w$; we consider the state of $\Number{w}$, and classify the edge $e$ as a tree edge, ignored edge, frond, or cross-link. +When we visit a vertex $v$, we look at each edge $e\in E$ with $\Src(e)=v$. Suppose that $\Dst(e)$ is some other vertex~$w$; we consider the state of $\Number{w}$, and classify the edge~$e$ as a tree edge, ignored edge, frond, or cross-link. -If $\Number{w}$ has not yet been set, we say that $e$ is a \emph{tree edge}; the tree edges define a \IndexDefinition{spanning tree}\emph{spanning tree} of the subgraph explored by the search. Otherwise, we saw~$w$ earlier in our search, and the edge $e$ has one of the three remaining kinds. If $\Number{w}\geq\Number{v}$ (with equality corresponding to an edge from $v$ to itself, which we allow), any path from $v$ to $w$ must pass through a common ancestor of $v$ and $w$ in the spanning tree. We say that $e$ is an \emph{ignored edge}, because $e$ cannot generate any new strongly connected components. The final two kinds arise when $\Number{w}<\Number{v}$. We say that $e$ is a \emph{frond} if $w$ is also an ancestor of $v$ in the spanning tree; otherwise, $e$ is a \emph{cross-link}. (While the distinction between the final two kinds plays a role in Tarjan's correctness proof, we'll see that algorithm handles fronds and cross-links uniformly.) 
+If $\Number{w}$ has not yet been set, we say that $e$ is a \emph{tree edge}; the tree edges define a \IndexDefinition{spanning tree}\emph{spanning tree} of the subgraph explored by the search. Otherwise, we saw~$w$ earlier in our search, and the edge $e$ has one of the three remaining kinds. If $\Number{w}\geq\Number{v}$ (with equality corresponding to an edge from $v$ to itself, which we allow), any path from $v$ to $w$ must pass through a common ancestor of $v$ and $w$ in the spanning tree. We say that $e$ is an \emph{ignored edge}, because $e$ cannot generate any new strongly connected components. The final two kinds arise when $\Number{w}<\Number{v}$. We say that $e$ is a \emph{frond} if $w$ is also an ancestor of $v$ in the spanning tree; otherwise, $e$ is a \emph{cross-link}. (While the distinction between the final two kinds plays a role in Tarjan's correctness proof, we'll see that the algorithm handles fronds and cross-links uniformly.)

\begin{wrapfigure}[11]{l}{2.7cm}
\begin{tikzpicture}[x=1cm, y=1.3cm]
@@ -496,7 +496,7 @@ \section{Protocol Components}\label{protocol component}
\end{algorithm}

After the outermost recursive call returns, the stack will always be empty. Note that while this algorithm is recursive, it is not re-entrant; in particular, the act of getting the successors of a vertex must not trigger the computation of the same strongly connected components. This is enforced in Step~2. In our case, this can happen because getting the successors of a protocol performs type resolution; in practice this should be extremely difficult to hit, so for simplicity we report a fatal error and exit the compiler instead of attempting to recover.
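The numbering, stack discipline, and edge classification can be sketched as follows (a self-contained toy in Swift; all identifiers are ours, and the compiler's implementation differs in detail). The \texttt{lowlink} value is Tarjan's device for detecting the root of a component: the smallest number reachable from a vertex via tree edges followed by at most one frond or cross-link into the stack.

```swift
// A sketch of Tarjan's algorithm. Tree edges recurse; fronds and
// cross-links into the stack are handled uniformly; ignored edges
// (targets no longer on the stack) are skipped.
final class Tarjan {
    let adjacency: [[Int]]
    var number: [Int?]       // DFS numbering; nil means "not yet visited"
    var lowlink: [Int]
    var stack: [Int] = []
    var onStack: [Bool]
    var nextNumber = 1
    var components: [[Int]] = []

    init(_ adjacency: [[Int]]) {
        self.adjacency = adjacency
        number = Array(repeating: nil, count: adjacency.count)
        lowlink = Array(repeating: 0, count: adjacency.count)
        onStack = Array(repeating: false, count: adjacency.count)
    }

    func visit(_ v: Int) {
        number[v] = nextNumber
        lowlink[v] = nextNumber
        nextNumber += 1
        stack.append(v)
        onStack[v] = true
        for w in adjacency[v] {
            if number[w] == nil {       // tree edge: recurse
                visit(w)
                lowlink[v] = min(lowlink[v], lowlink[w])
            } else if onStack[w] {      // frond or cross-link into the stack
                lowlink[v] = min(lowlink[v], number[w]!)
            }                           // otherwise: ignored edge
        }
        if lowlink[v] == number[v] {    // v is the root of a component
            var component: [Int] = []
            while true {
                let w = stack.removeLast()
                onStack[w] = false
                component.append(w)
                if w == v { break }
            }
            components.append(component)
        }
    }
}

// The cycle 1 ⇄ 2 forms one component; 0 and 3 are singletons.
let tarjan = Tarjan([[1], [2], [1, 3], []])
tarjan.visit(0)
print(tarjan.components.map { $0.sorted() })  // [[3], [1, 2], [0]]
```

As the text observes, components are popped in reverse topological order, and after the outermost \texttt{visit} returns, the stack is empty.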
-\paragraph{Protocol components.} We maintain two tables in \index{rewrite context}rewrite context: +\paragraph{Protocol components.} We maintain two tables in the \index{rewrite context}rewrite context: \begin{enumerate} \item A map from \index{protocol declaration}protocol declarations to \IndexDefinition{protocol node}\emph{protocol nodes}. @@ -516,7 +516,7 @@ \section{Protocol Components}\label{protocol component} This new form of $\prec$ is not a binary relation in the sense of \DefRef{def relation}, because the operands are in different sets. However, it is still transitive in the sense that $G\prec\tP$ and $\tP\prec\tQ$ together imply $G\prec\tQ$. Note that the two definitions of $\prec$ are related via the \index{protocol generic signature}protocol generic signature: $\GP\prec\tQ$ if and only if $\tP\prec\tQ$. -Now, consider the protocols appearing on the right-hand side of the \index{explicit requirement}\emph{explicit} \index{conformance requirement!protocol dependency}conformance requirements of our generic signature $G$; call these the \emph{successors} of $G$, by analogy with the successors of a protocol in the protocol dependency graph. If we consider all protocols reachable via paths originating from the successors of~$G$, we arrive at the complete set of protocol dependencies of~$G$. +Now, consider the protocols appearing on the right-hand side of the \index{explicit requirement}\emph{explicit} \index{conformance requirement!protocol dependency}conformance requirements of our generic signature $G$; call these the \emph{successors} of $G$, by analogy with the successors of a protocol in the protocol dependency graph. If we \index{transitive closure}consider all protocols reachable via paths originating from the successors of~$G$, we arrive at the complete set of protocol dependencies of~$G$. We end this section with the algorithm to collect imported rules for a new requirement machine. 
We find all protocols reachable from some initial set, compute their strongly connected components, lazily construct a protocol machine for each component, and finally, \index{imported rule}collect the local rules from each requirement machine. \begin{algorithm}[Import rules from protocol components]\label{importing rules} @@ -533,7 +533,7 @@ \section{Protocol Components}\label{protocol component} \end{enumerate} \end{algorithm} -We will encounter the protocol dependency graph again when we talk about the Knuth-Bendix completion procedure in \ChapRef{completion}. Completion looks for \emph{overlaps} between pairs of rules, and we will use the protocol dependency graph to cut down on the work performed by showing that pairs of rules where both are imported need not be considered. +We will encounter the protocol dependency graph again when we talk about the Knuth-Bendix completion procedure in \ChapRef{chap:completion}. Completion looks for \emph{overlaps} between pairs of rules, and we will use the protocol dependency graph to cut down on the work performed by showing that pairs of rules where both are imported need not be considered. A protocol component is always operated on as an indivisible unit; for example, in \ChapRef{rqm minimization} we will see that requirement minimization must consider all protocols in a component simultaneously to get correct results. @@ -551,9 +551,9 @@ \section{Debugging Flags}\label{rqm debugging flags} \item \texttt{timers}: \ChapRef{rqm basic operation}. \item \texttt{protocol-dependencies}: \SecRef{protocol component}. \item \texttt{simplify}: \SecRef{term reduction}. -\item \texttt{add}, \texttt{completion}: \ChapRef{completion}. +\item \texttt{add}, \texttt{completion}: \ChapRef{chap:completion}. \item \texttt{concrete-unification}, \texttt{conflicting-rules}, \texttt{property-map}: \ChapRef{propertymap}. -\item \texttt{concretize-nested-types}, \texttt{conditional-requirements}: \SecRef{rqm type witnesses}. 
+\item \texttt{concretize-nested-types}, \texttt{conditional-requirements}: \SecRef{rqm concrete conformances}. \item \texttt{concrete-contraction}: \SecRef{concrete contraction}. \item \texttt{homotopy-reduction}, \texttt{homotopy-reduction-detail},\\ \texttt{propagate-requirement-ids}: \SecRef{homotopy reduction}. @@ -570,15 +570,15 @@ \section{Debugging Flags}\label{rqm debugging flags} \item Statistics about the minimal conformances algorithm (\SecRef{minimal conformances}). \end{itemize} -The \IndexFlag{dump-requirement-machine}\texttt{-dump-requirement-machine} flag prints each requirement machine before and after the \index{completion}completion procedure runs. The printed representation includes a list of rewrite rules, the property map, and all \index{rewrite loop}rewrite loops. The output will begin to make sense after \ChapRef{symbols terms rules}. +The \IndexFlag{dump-requirement-machine}\texttt{-dump-requirement-machine} flag prints each requirement machine before and after the \index{completion}completion procedure runs. The printed representation includes a list of rewrite rules, the property map, and all \index{rewrite loop}rewrite loops. The output will begin to make sense after \ChapRef{chap:symbols terms rules}. -\section{Source Code Reference}\label{rqm basic operation source ref} +\section{Source Code Reference}\label{src:basic operation} Key source files: \begin{itemize} \item \SourceFile{lib/AST/RequirementMachine/} \end{itemize} -The Requirement Machine implementation is private to \texttt{lib/AST/}. The remainder of the compiler interacts with it indirectly, through the generic signature query methods on \texttt{GenericSignature} (\SecRef{genericsigsourceref}) and the various requests for building new generic signatures (\SecRef{buildinggensigsourceref}). +The Requirement Machine implementation is private to \texttt{lib/AST/}. 
The remainder of the compiler interacts with it indirectly, through the generic signature query methods on \texttt{GenericSignature} (\SecRef{src:generic signatures}) and the various requests for building new generic signatures (\SecRef{src:building generic signatures}). \subsection*{The Rewrite Context} @@ -590,14 +590,14 @@ \subsection*{The Rewrite Context} \IndexSource{AST context} \apiref{ASTContext}{class} -The global singleton for a single frontend instance. See also \SecRef{compilation model source reference}. +The global singleton for a single frontend instance. See also \SecRef{src:compilation model}. \begin{itemize} \item \texttt{getRewriteContext()} returns the global singleton \texttt{RewriteContext} for this frontend instance. \end{itemize} \IndexSource{rewrite context} \apiref{rewriting::RewriteContext}{class} -A singleton object to manage construction of requirement machines, building of the protocol component graph, and unique allocation of symbols and terms. See also \SecRef{symbols terms rules sourceref}. +A singleton object to manage construction of requirement machines, building of the protocol component graph, and unique allocation of symbols and terms. See also \SecRef{src:symbols terms rules}. \begin{itemize} \item \texttt{getRequirementMachine(CanGenericSignature)} returns a \IndexSource{query machine}query machine for the given generic signature, creating one first if necessary. \item \texttt{getRequirementMachine(ProtocolDecl *)} returns a \IndexSource{protocol machine}protocol machine for the protocol component which contains the given protocol, creating one first if necessary. @@ -607,7 +607,7 @@ \subsection*{The Rewrite Context} \item \texttt{isRecursivelyConstructingRequirementMachine(CanGenericSignature)}\\ returns true if we are currently constructing a query machine for this generic signature.
\item \texttt{isRecursivelyConstructingRequirementMachine(ProtocolDecl *)}\\ -returns true if we are currently constructing a protocol machine for the given protocol's component. These two methods are used to break cycles in \index{associated type inference}associated type inference, since otherwise re-entrant construction triggers an assertion in the compiler. +returns true if we are currently constructing a protocol machine for the given protocol's component. These two methods are used to \index{request cycle}break cycles in \index{associated type inference!request cycle}associated type inference, since otherwise re-entrant construction triggers an assertion in the compiler. \item \verb|installRequirementMachine(CanGenericSignature,|\\ \verb| std::unique_ptr)|\\ @@ -619,7 +619,7 @@ \subsection*{The Rewrite Context} \end{itemize} \apiref{GenericSignatureImpl}{class} -See also \SecRef{genericsigsourceref}. +See also \SecRef{src:generic signatures}. \begin{itemize} \item \texttt{getRequirementMachine()} returns the query machine for this \IndexSource{generic signature!requirement machine}generic signature, by asking the rewrite context to produce one and then caching the result in an instance variable of the \texttt{GenericSignatureImpl} instance itself. @@ -631,14 +631,14 @@ \subsection*{The Rewrite Context} A request evaluator request which computes all protocols referenced from a given protocol's associated conformance requirements. These are the successors of the protocol in the \IndexSource{protocol dependency graph}protocol dependency graph. \apiref{ProtocolDecl}{class} -See also \SecRef{declarationssourceref}. +See also \SecRef{src:declarations}. \begin{itemize} \item \texttt{getProtocolDependencies()} evaluates the \texttt{ProtocolDependenciesRequest}. \end{itemize} \IndexSource{requirement machine} \apiref{rewriting::RequirementMachine}{class} -A list of rewrite rules and a property map. See also \SecRef{symbols terms rules sourceref} and \SecRef{property map sourceref}.
Entry points for initializing a requirement machine, called by the rewrite context and various requests: +A list of rewrite rules and a property map. See also \SecRef{src:symbols terms rules} and \SecRef{property map sourceref}. Entry points for initializing a requirement machine, called by the rewrite context and various requests: \begin{itemize} \item \texttt{initWithGenericSignature()} initializes a new \IndexSource{query machine}query machine from the requirements of an existing generic signature. \item \texttt{initWithWrittenRequirements()} initializes a new \IndexSource{minimization machine}minimization machine from user-written requirements when building a new generic signature. @@ -647,7 +647,7 @@ \subsection*{The Rewrite Context} \end{itemize} Taking a requirement machine apart: \begin{itemize} -\item \texttt{getRewriteSystem()} returns the \texttt{RewriteSystem} (\SecRef{symbols terms rules sourceref}). +\item \texttt{getRewriteSystem()} returns the \texttt{RewriteSystem} (\SecRef{src:symbols terms rules}). \item \texttt{getPropertyMap()} returns the \texttt{PropertyMap} (\SecRef{property map sourceref}). \end{itemize} Miscellaneous: @@ -669,16 +669,16 @@ \subsection*{Requests} \apiref{InferredGenericSignatureRequest::evaluate}{method} Evaluation function for building a new generic signature from requirements written in source. Constructs a minimization machine. -This ultimately implements the \texttt{GenericContext::getGenericSignature()} method; see \SecRef{genericsigsourceref}. +This ultimately implements the \texttt{GenericContext::getGenericSignature()} method; see \SecRef{src:generic signatures}. \IndexSource{abstract generic signature request} \apiref{AbstractGenericSignatureRequest::evaluate}{method} -Evaluation function for building a new generic signature from a list of generic parameters and requirements. Constructs a minimization machine. 
This ultimately implements the \texttt{buildGenericSignature()} function; see \SecRef{buildinggensigsourceref}. +Evaluation function for building a new generic signature from a list of generic parameters and requirements. Constructs a minimization machine. This ultimately implements the \texttt{buildGenericSignature()} function; see \SecRef{src:building generic signatures}. \IndexSource{requirement signature request} \apiref{RequirementSignatureRequest::evaluate}{method} Evaluation function for obtaining the requirement signature of a protocol. This either deserializes the requirement signature, or constructs a protocol minimization machine from user-written requirements and uses it to build a new requirement signature. -This ultimately implements the \texttt{ProtocolDecl::getRequirementSignature()} method; see \SecRef{genericsigsourceref}. +This ultimately implements the \texttt{ProtocolDecl::getRequirementSignature()} method; see \SecRef{src:generic signatures}. \subsection*{Debugging} diff --git a/docs/Generics/chapters/building-generic-signatures.tex b/docs/Generics/chapters/building-generic-signatures.tex index 77aa48ca5db30..1115a58da7370 100644 --- a/docs/Generics/chapters/building-generic-signatures.tex +++ b/docs/Generics/chapters/building-generic-signatures.tex @@ -2,9 +2,9 @@ \begin{document} -\chapter{Building Generic Signatures}\label{building generic signatures} +\chapter{Building Generic Signatures}\label{chap:building generic signatures} -\lettrine{B}{uilding a generic signature} from user-written requirements is something we glossed over before, and it's time to detail it now. We're going to fill in missing steps between the syntax for declaring generic parameters and stating requirements of Sections \ref{generic params}~and~\ref{requirements}, and the \index{generic signature}generic signature of \ChapRef{genericsig}, a semantic representation of the generic parameters and requirements of a declaration. 
+\lettrine{B}{uilding a generic signature} from user-written requirements is something we glossed over before, and it's time to detail it now. We're going to fill in missing steps between the syntax for declaring generic parameters and stating requirements of \SecRef{generic params}~and~\SecRef{sec:requirements}, and the \index{generic signature}generic signature of \ChapRef{chap:generic signatures}, a semantic representation of the generic parameters and requirements of a declaration. The requirements in a generic signature must be \index{reduced requirement}reduced, \index{minimal requirement}minimal, and ordered in a certain way (we will see the formal definitions in \SecRef{minimal requirements}). To build a generic signature then, we must convert user-written requirements into an \emph{equivalent} set of requirements that satisfy these additional invariants. We're going to start from the entry points for building generic signatures and peel away the layers: \begin{itemize} @@ -25,7 +25,7 @@ \chapter{Building Generic Signatures}\label{building generic signatures} \item If the declaration is a \index{protocol declaration!generic signature}protocol or an unconstrained \index{protocol extension!generic signature}protocol extension, we build the \index{protocol generic signature}protocol generic signature \verb|<Self where Self : P>| using the primitive constructor. \index{generic parameter list} -\item A declaration having neither a generic parameter list nor a \Index{where clause@\texttt{where} clause}trailing \texttt{where} clause simply inherits the generic signature from its parent context. If the declaration is at the top level of a source file, we return the empty generic signature. Otherwise, we recursively evaluate the \Request{generic signature request} against the parent. +\item A declaration having neither a generic parameter list nor a \Index{where clause@\texttt{where} clause}trailing \texttt{where} clause simply inherits the generic signature from its parent context.
If the declaration is at the \index{global declaration}top level of a \index{source file}source file, we return the \index{empty generic signature}empty generic signature. Otherwise, we recursively evaluate the \Request{generic signature request} against the parent. \end{enumerate} In every other case, the generic signature request kicks off the lower-level \index{request}\index{request evaluator}\IndexDefinition{inferred generic signature request}\Request{inferred generic signature request}, handing it a list of arguments: \begin{enumerate} @@ -49,12 +49,12 @@ \chapter{Building Generic Signatures}\label{building generic signatures} If we're building the generic signature of a function or subscript declaration, this consists of the declaration's parameter and return types; otherwise, it is empty (\SecRef{requirementinference}). -\item A source location for diagnostics. +\item A \index{source location}source location for diagnostics. \end{enumerate} \paragraph{Inferred generic signature request.} A possibly apocryphal story says the name ``inferred generic signature request'' was chosen because ``requirement inference'' is one of the steps below, but this does not infer the generic signature in any real sense. Rather, this request transforms user-written requirements into a minimal, reduced form via a multi-step process shown in \FigRef{inferred generic signature request figure}: \begin{enumerate} -\item \index{requirement resolution}\textbf{Requirement resolution} builds user-written requirements, from constraint types stated in generic parameter \index{inheritance clause!generic parameter declaration}inheritance clauses and requirement representations in the trailing \Index{where clause@\texttt{where} clause}\texttt{where} clause. 
This happens in \index{structural resolution stage}structural resolution stage (\ChapRef{typeresolution}), so the resolved requirements may contain \index{unbound dependent member type!in requirements}unbound dependent member types, which will reduce to bound dependent member types in requirement minimization. +\item \index{requirement resolution}\textbf{Requirement resolution} builds user-written requirements from constraint types stated in generic parameter \index{inheritance clause!generic parameter declaration}inheritance clauses and requirement representations in the trailing \Index{where clause@\texttt{where} clause}\texttt{where} clause. This happens in the \index{structural resolution stage}structural resolution stage (\ChapRef{chap:type resolution}), so the resolved requirements may contain \index{unbound dependent member type!in requirements}unbound dependent member types, which will reduce to bound dependent member types in requirement minimization. Diagnostics are emitted here if type resolution fails. @@ -98,7 +98,7 @@ \chapter{Building Generic Signatures}\label{building generic signatures} \end{center} \end{figure} -The \index{diagnostic!from inferred generic signature request}diagnostics mentioned above are emitted at the source location of the declaration, which is given to the request. This source location is used for one more diagnostic, an artificial restriction of sorts. Once we have a generic signature, we ensure that every innermost generic parameter is a \index{reduced type parameter}reduced type.
If a generic parameter is not reduced, it must be \index{reduced type equality}equivalent to a concrete type or an earlier generic parameter; it serves no purpose and should be removed, so we diagnose an error\footnote{Well, it's a warning prior to \texttt{-swift-version 6}.}: +The \index{diagnostic!from inferred generic signature request}diagnostics mentioned above are emitted at the \index{source location}source location of the declaration, which is given to the request. This source location is used for one more diagnostic, an artificial restriction of sorts. Once we have a generic signature, we ensure that every innermost generic parameter is a \index{reduced type parameter}reduced type. If a generic parameter is not reduced, it must be \index{reduced type equality}equivalent to a concrete type or an earlier generic parameter; it serves no purpose and should be removed. We diagnose an error in \IndexFlag{language-mode}\texttt{-language-mode 6}, or a warning in earlier language modes: \begin{Verbatim} // error: same-type requirement makes generic parameter `T' non-generic func add<T>(_ lhs: T, _ rhs: T) -> T where T == Int { @@ -139,7 +139,7 @@ \chapter{Building Generic Signatures}\label{building generic signatures} Like the inferred generic signature request, the abstract generic signature request decomposes, desugars and minimizes requirements; for this reason, it is preferred over using the primitive constructor. Often this request is invoked with a list of \index{substituted requirement}substituted requirements obtained by applying a substitution map to each requirement of some original generic signature. This request is used in various places: \begin{itemize} -\item Computing the generic signature of an opaque type declaration (\ChapRef{opaqueresult}). +\item Computing the generic signature of an opaque type declaration (\ChapRef{chap:opaque result types}).
\item Computing the generic signature for an opened existential type (\SecRef{open existential archetypes}). \item Checking that a method override declared by a subclass satisfies the generic requirements of the overridden method in the superclass. \end{itemize} @@ -148,10 +148,10 @@ \chapter{Building Generic Signatures}\label{building generic signatures} \iffalse -When a subclass overrides a method from a superclass, the type checker must ensure the subclass method is compatible with the superclass method in order to guarantee that instances of the subclass are dynamically interchangeable with a superclass. If neither the superclass nor the subclass are generic, the compatibility check simply compares the fully concrete parameter and result types of the non-generic declarations. Otherwise, the superclass substitution map plays a critical role yet again, because the compatibility relation must project the superclass method's type into the subclass to meaningfully compare it with the override. +When a subclass overrides a method from a superclass, the type checker must ensure the subclass method is compatible with the superclass method in order to guarantee that instances of the subclass are dynamically interchangeable with a superclass. If neither the superclass nor the subclass are generic, the compatibility check simply compares the fully concrete parameter and return types of the non-generic declarations. Otherwise, the superclass substitution map plays a critical role yet again, because the compatibility relation must project the superclass method's type into the subclass to meaningfully compare it with the override. \paragraph{Non-generic overrides} -The simple case is when the superclass or subclass is generic, but the superclass method does not define generic parameters of its own, either explicitly or via the opaque parameters of \SecRef{requirements}. Let's call such a method ``non-generic,'' even if the class it appears inside is generic. 
So a non-generic method has the same generic signature as its parent context, which in our case is a class. In the non-generic case, the superclass substitution map is enough to understand the relation between the interface type of the superclass method and its override. +The simple case is when the superclass or subclass is generic, but the superclass method does not define generic parameters of its own, either explicitly or via the opaque parameters of \SecRef{sec:requirements}. Let's call such a method ``non-generic,'' even if the class it appears inside is generic. So a non-generic method has the same generic signature as its parent context, which in our case is a class. In the non-generic case, the superclass substitution map is enough to understand the relation between the interface type of the superclass method and its override. \begin{listing}\captionabove{Some method overrides}\label{method overrides} \begin{Verbatim} @@ -191,7 +191,7 @@ \chapter{Building Generic Signatures}\label{building generic signatures} We build the attaching map by ``extending'' the superclass substitution map, adding replacement types for the superclass method's innermost generic parameters, which map to the subclass method's generic parameters via the above correspondence. In addition to new replacement types, the attaching map stores conformances not present in the superclass substitution map, if the superclass method introduces conformance requirements. -\begin{algorithm}[Compute attaching map for generic method override]\label{superclass attaching map} As input, takes the superclass method's generic signature \texttt{G}, the superclass declaration \texttt{B}, and some subclass declaration \texttt{D}. Outputs a substitution map for \texttt{G}. 
+\begin{algorithm}[Compute attaching map for generic method override]\label{superclass attaching map} As input, takes the superclass method's generic signature \texttt{G}, the \index{superclass declaration}superclass declaration \texttt{B}, and some subclass declaration \texttt{D}. Outputs a substitution map for \texttt{G}. \begin{enumerate} \item Initialize \texttt{R} to an empty list of replacement types. \item Initialize \texttt{C} to an empty list of conformances. @@ -230,7 +230,7 @@ \chapter{Building Generic Signatures}\label{building generic signatures} \item any additional generic requirements imposed by the subclass method \end{enumerate} The computation of the expected generic signature is similar, except in place of the third step, we build the additional requirements by applying the attaching map to each requirement of the \emph{superclass} method. -\begin{algorithm}[Compute override generic signature] As input, takes the superclass method's generic signature \texttt{G}, the superclass declaration \texttt{B}, and some subclass declaration \texttt{D}. Outputs a new generic signature. +\begin{algorithm}[Compute override generic signature] As input, takes the superclass method's generic signature \texttt{G}, the \index{superclass declaration}superclass declaration \texttt{B}, and some subclass declaration \texttt{D}. Outputs a new generic signature. \begin{enumerate} \item Initialize \texttt{P} to an empty list of generic parameter types. \item Initialize \texttt{R} to an empty list of generic requirements. @@ -242,14 +242,14 @@ \chapter{Building Generic Signatures}\label{building generic signatures} \end{enumerate} \end{algorithm} -For the override to satisfy the contract of the superclass method, it should accept any valid set of concrete type arguments also accepted by the superclass method. The override might be more permissive, however. 
The correct relation is that each generic requirement of the actual override signature must be satisfied by the expected override signature, but not necessarily vice versa. This uses the same mechanism as conditional requirement checking for conditional conformances, described in \SecRef{conditional conformance}. The requirements of one signature can be mapped to archetypes of the primary generic environment of another signature. This makes the requirement types concrete, which allows the \texttt{isSatisfied()} predicate to be checked against the substituted requirement. +For the override to satisfy the contract of the superclass method, it should accept any valid set of concrete type arguments also accepted by the superclass method. The override might be more permissive, however. The correct relation is that each generic requirement of the actual override signature must be satisfied by the expected override signature, but not necessarily vice versa. This uses the same mechanism as conditional requirement checking for conditional conformances, described in \SecRef{sec:conditional conformances}. The requirements of one signature can be mapped to archetypes of the primary generic environment of another signature. This makes the requirement types concrete, which allows the \texttt{isSatisfied()} predicate to be checked against the substituted requirement. \begin{example} In \ListingRef{method overrides}, the superclass method generic signature is \texttt{}. The generic parameter \texttt{A} belongs to the method; the other two are from the generic signature of the superclass. The override signature glues together the innermost generic parameters and their requirements from the superclass method with the generic signature of the subclass, which is \texttt{}. This operation produces the signature \texttt{}. This is different from the actual override generic signature of \texttt{doStuff()} in \texttt{Derived}, which is \texttt{}. 
However, the actual signature's requirements are satisfied by the expected signature. \end{example} \fi -\paragraph{Requirement signature request.} This \index{request}\IndexDefinition{requirement signature request}request builds the \index{requirement signature}requirement signature for a given \index{protocol component}\emph{protocol component}, or a collection of one or more mutually-recursive protocols; this will be explained in \SecRef{protocol component}. The evaluation function begins by evaluating two subordinate requests to collect the user-written requirements of each protocol: +\paragraph{Requirement signature request.} This \index{request}\IndexDefinition{requirement signature request}request builds the \index{requirement signature}requirement signature for a given \index{protocol component}\emph{protocol component}, or a collection of one or more mutually-recursive protocols, as we will see in \SecRef{protocol component}. The evaluation function begins by evaluating two subordinate requests to collect the user-written requirements of each protocol: \begin{itemize} \item The \IndexDefinition{structural requirements request}\Request{structural requirements request} collects \index{associated requirement}associated requirements from the protocol’s inheritance clause, \index{associated type declaration!inheritance clause}associated type \index{inheritance clause!associated type declaration}inheritance clauses, any \Index{where clause@\texttt{where} clause!associated type declaration}\texttt{where} clauses on the protocol’s associated types, and the \Index{where clause@\texttt{where} clause!protocol declaration}\texttt{where} clause on the protocol itself. Refer to \SecRef{protocols} for a description of the syntax. \item The \IndexDefinition{type alias requirements request}\Request{type alias requirements request} collects \index{protocol type alias}protocol type aliases and converts them to same-type requirements. 
These are discussed further in \SecRef{protocol type aliases}. @@ -295,7 +295,7 @@ \chapter{Building Generic Signatures}\label{building generic signatures} \AssocSameStep{1}{\rT}{\rT.Tricky.Other}{4}\\ \SameConfStep{3}{4}{\rT}{Base}{5} \end{gather*} -A special code path detects this problem and \index{diagnostic!protocol inheritance clause}diagnoses a \index{warning}warning to explain what is going on. After building a protocol's requirement signature, the \Request{type-check source file request} verifies that any conformance requirements known to be satisfied by \tSelf\ actually appear in the protocol's inheritance clause, or \Index{where clause@\texttt{where} clause!protocol declaration}\texttt{where} clause entries with a subject type of \tSelf. +A special code path detects this problem and \index{diagnostic!protocol inheritance clause}diagnoses a \index{warning}warning to explain what is going on. After building a protocol's requirement signature, the \Request{type-check primary file request} verifies that any conformance requirements known to be satisfied by \tSelf\ actually appear in the protocol's inheritance clause, or \Index{where clause@\texttt{where} clause!protocol declaration}\texttt{where} clause entries with a subject type of \tSelf. In our example, we discover the unexpected derived requirement $\ConfReq{\ttgp{0}{0}}{Base}$ of $G_\texttt{Bad}$, but at this point it is too late to attempt the failed name lookup of \texttt{Salary} again. The compiler instead suggests that the user should explicitly state the inheritance from \texttt{Base} in the inheritance clause of \texttt{Bad}: \begin{Verbatim} @@ -411,14 +411,14 @@ \section{Requirement Inference}\label{requirementinference} We infer requirements if the declaration has generic parameters \emph{or} a \texttt{where} clause, so in the first two, we infer $\ConfReq{\rU}{Hashable}$. 
The third method inherits the generic signature of the struct, so the requirement is not satisfied and we diagnose an error. \paragraph{Generic type aliases.} -A reference to a \index{generic type alias}generic type alias resolves to a \index{sugared type}sugared \index{type alias type}type alias type (\SecRef{more types}). This sugared type prints as written when it appears in \index{diagnostic!sugared type alias type}diagnostic messages, but it is canonically equal to its \index{substituted underlying type}substituted underlying type, behaving like it otherwise. So here, the interface type of ``\texttt{x}'' prints as \texttt{OptionalElement<Array<Int>>}, but it is canonically equal to its substituted underlying type, \texttt{Optional<Int>}: +A reference to a \index{generic type alias}generic type alias resolves to a \index{sugared type}sugared \index{type alias type}type alias type (\SecRef{sec:more types}). This sugared type prints as written when it appears in \index{diagnostic!sugared type alias type}diagnostic messages, but it is canonically equal to its \index{substituted underlying type}substituted underlying type, and otherwise behaves like it. So here, the interface type of ``\texttt{x}'' prints as \texttt{OptionalElement<Array<Int>>}, but it is canonically equal to its substituted underlying type, \texttt{Optional<Int>}: \begin{Verbatim} typealias OptionalElement<T: Sequence> = Optional<T.Element> let x: OptionalElement<Array<Int>> = ... \end{Verbatim} -In the above, we form the substituted underlying type \texttt{Optional<Int>} by applying a substitution map that replaces $\rT$ with \texttt{Array<Int>} to the underlying type of the type alias declaration, \texttt{Optional<T.Element>}. This gives us the \index{structural components}structural components of a type alias type: a reference to a type alias declaration, a substituted underlying type, and a substitution map. The substitution map is used when printing the sugared type's generic arguments.
Cruicially to our topic at hand, this substitution map is also considered by requirement inference, making this one of a handful of language features where the appearance of a sugared type \emph{does} have a semantic effect. In this example, we infer the requirement $\rTSequence$ from considering \texttt{OptionalElement<\rT>}: +In the above, we form the substituted underlying type \texttt{Optional<Int>} by applying a substitution map that replaces $\rT$ with \texttt{Array<Int>} to the underlying type of the type alias declaration, \texttt{Optional<T.Element>}. This gives us the \index{structural components}structural components of a type alias type: a reference to a type alias declaration, a substituted underlying type, and a substitution map. The substitution map is used when printing the sugared type's generic arguments. Crucially to our topic at hand, this substitution map is also considered by requirement inference, making this one of a handful of language features where the appearance of a sugared type \emph{does} have a semantic effect. In this example, we infer the requirement $\rTSequence$ from considering \texttt{OptionalElement<\rT>}: \begin{Verbatim} func maybePickElement<T>(_ sequence: T) -> OptionalElement<T> \end{Verbatim} @@ -470,7 +470,7 @@ \section{Decomposition and Desugaring}\label{requirement desugaring} \end{itemize} In the \index{derived requirement!decomposition}derived requirements formalism, the right-hand side of a \index{conformance requirement!decomposition}conformance requirement is always a protocol type; in the syntax, however, we can also write a protocol composition type or a parameterized protocol type. Such conformance requirements must be split up---this is \emph{requirement decomposition}. We also recall that the left-hand side of a derived requirement is always a type parameter, whereas requirement inference (or even the user) can write down a requirement with any subject type.
Such requirements are also split up into zero or more simpler requirements, which gives us \emph{requirement desugaring}. -\paragraph{Decomposition.} This \IndexDefinition{requirement decomposition}formalizes the syntax sugar from Sections \ref{requirements}~and~\ref{protocols}. For example, the standard library defines the \texttt{Codable} \index{type alias type}type alias, whose underlying type is a \index{protocol composition type!decomposition}composition of two protocols, \texttt{Decodable} and \texttt{Encodable}: +\paragraph{Decomposition.} This \IndexDefinition{requirement decomposition}formalizes the syntax sugar from \SecRef{sec:requirements}~and~\SecRef{protocols}. For example, the standard library defines the \texttt{Codable} \index{type alias type}type alias, whose underlying type is a \index{protocol composition type!decomposition}composition of two protocols, \texttt{Decodable} and \texttt{Encodable}: \begin{Verbatim} typealias Codable = Decodable & Encodable \end{Verbatim} @@ -518,7 +518,7 @@ \section{Decomposition and Desugaring}\label{requirement desugaring} func f<T>(_: Array<T>) where Array<T>: Sequence {} \end{Verbatim} -A more useful application of decomposition is the following. The \Request{abstract generic signature request} performs decomposition to deal with the opaque return types of \ChapRef{opaqueresult} and existential types of \ChapRef{existentialtypes}. We reason about these kinds of types by using this request to build an ``auxiliary'' generic signature describing the type. The \index{constraint type}constraint type after the \texttt{some} or \texttt{any} keyword defines a conformance requirement; this requirement must be decomposed by the same algorithm, so that we can interpret something like ``\texttt{any Sequence}'' or ``\verb|some Equatable & AnyObject|''. +A more useful application of decomposition is the following.
The \Request{abstract generic signature request} performs decomposition to deal with the opaque result types of \ChapRef{chap:opaque result types} and existential types of \ChapRef{chap:existential types}. We reason about these kinds of types by using this request to build an ``auxiliary'' generic signature describing the type. The \index{constraint type}constraint type after the \texttt{some} or \texttt{any} keyword defines a conformance requirement; this requirement must be decomposed by the same algorithm, so that we can interpret something like ``\texttt{any Sequence}'' or ``\verb|some Equatable & AnyObject|''. \paragraph{Desugaring.} Requirement inference is one of several instances of a useful pattern: we apply a substitution map to the minimal requirements of a generic signature, and build a new generic signature from these substituted requirements. We use the \Request{abstract generic signature request} in this way to check class method overrides, for example. When the original requirements are minimal, the substituted requirements are already decomposed, but it is possible for such a requirement to have a left-hand side that is not a type parameter, but a concrete type, introduced by the substitution. @@ -529,8 +529,8 @@ \section{Decomposition and Desugaring}\label{requirement desugaring} This rule essentially determines the implementation of \IndexDefinition{requirement desugaring}requirement desugaring. Let's first consider a requirement that does not contain type parameters at all, such as $\ConfReq{Int}{Hashable}$ or $\SameReq{Int}{String}$. Applying a substitution map cannot ever change our requirement, so it is always true or always false; we can check it with \AlgRef{reqissatisfied}: \begin{itemize} -\item If the requirement is satisfied, we we can delete it, that is, replace it with the \index{empty set}empty set of requirements, without violating our invariant. It doesn't contribute anything new. 
-\item If the requirement is unsatisfied on the other hand, the only way to proceed is to replace it with something else unsatisfiable, so we must diagnose an error and give up. +\item If the requirement is satisfied, we can delete it, that is, replace it with the \index{empty set}empty set of requirements, without violating our invariant. The requirement doesn't contribute anything new. +\item On the other hand, if the requirement is unsatisfied, the only way to proceed is to replace this requirement with something else unsatisfiable, so we must diagnose an error and give up. \end{itemize} The second case merits further explanation. A generic declaration whose requirements cannot be satisfied by any substitution map is essentially useless, and we will see in \SecRef{minimal requirements} that we try to uncover situations where two requirements are in conflict with each other and cannot be simultaneously satisfied. Here though, requirement desugaring detects the trivial case where a requirement \index{conflicting requirement}``conflicts'' with itself. @@ -595,7 +595,7 @@ \section{Decomposition and Desugaring}\label{requirement desugaring} \item For a \index{superclass requirement!desugaring}\textbf{superclass requirement} $\TC$: \begin{enumerate} \item If \tT\ and \tC\ are two specializations of the same \index{generic class type}\index{class declaration}class declaration, add the same-type requirement $\SameReq{T}{C}$ to the worklist. -\item If \tT\ does not have a \index{superclass type}superclass type (\ChapRef{classinheritance}), then \tT\ cannot be a subclass of~\tC; add $\TC$ to the conflict list. +\item If \tT\ does not have a \index{superclass type}superclass type (\SecRef{classinheritance}), then \tT\ cannot be a subclass of~\tC; add $\TC$ to the conflict list. \item Otherwise, let $\tTp$ be the superclass type of \tT. Add the superclass requirement $\ConfReq{$\tTp$}{C}$ to the worklist. 
\end{enumerate} \item For a \index{layout requirement!desugaring}\textbf{layout requirement} $\TAnyObject$, any type parameters contained in the concrete type \tT\ have no bearing on the outcome. It suffices to apply \AlgRef{reqissatisfied}. If unsatisfied, add $\TAnyObject$ to the conflict list. @@ -680,7 +680,7 @@ \section{Well-Formed Requirements}\label{generic signature validity} \AssocConfStep{2}{\rT.Element.Iterator}{IteratorProtocol}{3}\\ \AssocNameStep{3}{\rT.Element.Iterator.Element}{4} \end{gather*} -The derived requirements (1)~and~(2) are not well-formed because their subject types are not valid type parameters, and the valid type parameter (4) has an invalid prefix \texttt{\rT.Element}. Clearly, \texttt{Bad} ought to be rejected by the compiler. What actually happens when we type check \texttt{Bad}? Recall the \index{type resolution stage}type resolution stage from \ChapRef{typeresolution}. We first resolve the requirement $\ConfReq{\rT.Element}{Collection}$ in \index{structural resolution stage}structural resolution stage, and we get a requirement whose subject type is an \index{unbound dependent member type!in requirements}unbound dependent member type. We don't know that this requirement is not well-formed, yet. +The derived requirements (1)~and~(2) are not well-formed because their subject types are not valid type parameters, and the valid type parameter (4) has an invalid prefix \texttt{\rT.Element}. Clearly, \texttt{Bad} ought to be rejected by the compiler. What actually happens when we type check \texttt{Bad}? Recall the \index{type resolution stage}type resolution stage from \ChapRef{chap:type resolution}. We first resolve the requirement $\ConfReq{\rT.Element}{Collection}$ in \index{structural resolution stage}structural resolution stage, and we get a requirement whose subject type is an \index{unbound dependent member type!in requirements}unbound dependent member type. We don't know that this requirement is not well-formed, yet. 
After we build the generic signature for \texttt{Bad}, we revisit the \texttt{where} clause again, and resolve the requirement in the \index{interface resolution stage}interface resolution stage. As the subject type is not a valid type parameter, type resolution \index{diagnostic!invalid type parameter}diagnoses an error and returns an \index{error type}error type: \begin{Verbatim} @@ -706,7 +706,7 @@ \section{Well-Formed Requirements}\label{generic signature validity} \end{theorem} To prove this theorem, we must expand our repertoire for reasoning about derivations. First, we recall the \index{protocol generic signature}protocol generic signature from \SecRef{requirement sig}. If~\tP\ is any protocol, then its generic signature, which we denote by~$\GP$, has the single requirement $\ConfReq{Self}{P}$. As always, the protocol \tSelf\ type is sugar for $\rT$. -The protocol generic signature describes the structure generated by the protocol's requirement signature. These are the \index{valid type parameter!protocol generic signature}valid type parameters and derived requirements inside the declaration of the protocol and its unconstrained extensions. These type parameters are all rooted in the protocol \tSelf\ type, and the derived requirements talk about these \tSelf-rooted type parameters. Informally, anything we can say about the protocol \tSelf\ type in $\GP$, should also be true of an arbitrary type parameter \tT\ in some other generic signature~$G$ where $G\vdash\TP$. We will now make this precise. +The protocol generic signature describes the structure generated by the protocol's requirement signature. These are the \index{valid type parameter!protocol generic signature}valid type parameters and derived requirements inside the declaration of the protocol and its \index{unconstrained extension}unconstrained extensions. These type parameters are all rooted in the protocol \tSelf\ type, and the derived requirements talk about these \tSelf-rooted type parameters. 
Informally, anything we can say about the protocol \tSelf\ type in $\GP$, should also be true of an arbitrary type parameter \tT\ in some other generic signature~$G$ where $G\vdash\TP$. We will now make this precise. For example, we might first define an algorithm in a protocol extension of \texttt{Collection}, and then call our algorithm from another generic function: \begin{Verbatim} @@ -898,7 +898,7 @@ \section{Well-Formed Requirements}\label{generic signature validity} \end{gather*} This completes the induction. \end{proof} -Formally, structural induction depends on a \index{well-founded order}well-founded order (\SecRef{reduced types}), so we would use the ``containment'' order on derivations. However, the ``recursive algorithm'' viewpoint is good enough for us. Induction over the natural numbers is covered in introductory books such as \cite{grimaldi}; for structural induction in formal logic, see something like~\cite{bradley2007calculus}. We will use structural induction over derivations again to study conformance paths in \SecRef{conformance paths exist}, encode finitely-presented monoids as protocols in \SecRef{monoidsasprotocols}, and finally present a correctness proof for the Requirement Machine in \ChapRef{symbols terms rules}. +Formally, structural induction depends on a \index{well-founded order}well-founded order (\SecRef{reduced types}), so we would use the ``containment'' order on derivations. However, the ``recursive algorithm'' viewpoint is good enough for us. Induction over the natural numbers is covered in introductory books such as \cite{grimaldi}; for structural induction in formal logic, see something like~\cite{bradley2007calculus}. We will use structural induction over derivations again to study conformance paths in \SecRef{conformance paths exist}, encode finitely-presented monoids as protocols in \SecRef{monoidsasprotocols}, and finally present a correctness proof for the Requirement Machine in \ChapRef{chap:symbols terms rules}. 
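+To keep this principle concrete, here is a small hedged sketch in the spirit of the \texttt{Collection} example mentioned earlier in this section; the names \texttt{secondElement} and \texttt{pickSecond} are invented for illustration. The extension method is checked against the protocol generic signature of \texttt{Collection}, where everything we know about \tSelf\ follows from $\ConfReq{Self}{Collection}$; at the call site, the derived requirement $\TP[Collection]$ is not needed in this exact form---it suffices that $G\vdash\ConfReq{T}{Collection}$, so we may substitute \tT\ for \tSelf:
+\begin{Verbatim}
+extension Collection {
+  // Checked against the protocol generic signature of Collection;
+  // 'Element' and 'dropFirst()' are available because [Self: Collection].
+  func secondElement() -> Element? {
+    return dropFirst().first
+  }
+}
+
+func pickSecond<T: Collection>(_ c: T) -> T.Element? {
+  // [T: Collection] is derivable here, so anything provable about Self
+  // in the protocol's generic signature also holds for T.
+  return c.secondElement()
+}
+\end{Verbatim}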
\medskip @@ -957,7 +957,7 @@ \section{Requirement Minimization}\label{minimal requirements} \item Type substitution only accepts bound dependent member types. To ensure that we can apply a substitution map to the requirements of a generic signature, as we do in \AlgRef{check generic arguments algorithm} for example, each requirement is rewritten to use \index{bound dependent member type!type substitution}bound dependent member types. -\item Generic signatures describe the calling convention of generic functions, the layout of nominal type metadata, the mangling of symbol names, and so on. To ensure that trivial syntactic changes do not affect ABI, each requirement in a generic signature is \emph{reduced} into the simplest possible form, redundant requirements are dropped to produce a \emph{minimal} list, and this list is sorted in canonical order. +\item Generic signatures describe the calling convention of generic functions, the layout of nominal type metadata, the \index{mangling}mangling of symbol names, and so on. To ensure that trivial syntactic changes do not affect ABI, each requirement in a generic signature is \emph{reduced} into the simplest possible form, redundant requirements are dropped to produce a \emph{minimal} list, and this list is sorted in canonical order. \item We need to detect and \index{diagnostic!conflicting requirement}diagnose generic signatures with \emph{conflicting requirements} that cannot be satisfied by any \index{well-formed substitution map}well-formed substitution map. This allows us to assume that all generic signatures in the \index{main module}main module are satisfiable as long as no diagnostics are emitted during type checking. 
\end{enumerate} @@ -1221,7 +1221,7 @@ \section{Requirement Minimization}\label{minimal requirements} \end{verbatim} \end{quote} -The fact that \texttt{Knot2} has a distinct generic signature from the other two was actually due to a quirk of the \Index{GenericSignatureBuilder@\texttt{GenericSignatureBuilder}}\texttt{GenericSignatureBuilder}, and this behavior is now part of the Swift \index{ABI}ABI. The stronger form of requirement minimization that guarantees uniqueness would actually be \emph{simpler} to implement, and we will explain the minor complication with the legacy behavior in \SecRef{minimal conformances}. +The fact that \texttt{Knot2} has a distinct generic signature from the other two was actually due to a quirk of the \Index{GenericSignatureBuilder@\texttt{GenericSignatureBuilder}}\texttt{GenericSignatureBuilder}, and this behavior is now part of the Swift \index{ABI}ABI. The stronger form of requirement minimization that guarantees uniqueness would actually be \emph{simpler} to implement. A minor complication caused by the legacy behavior will be explained in \SecRef{minimal conformances}. There is another downside, from a theoretical standpoint. With type parameters, we are able to \index{reduced type equality}check equivalence by comparing their reduced types. However, we cannot check two lists of requirements for ``theory equivalence'' by comparing minimal generic signatures, because they are not unique. In practice though, nothing seems to call for this equivalence check; this is unlike type parameters, which are checked for reduced type equality all over the place. @@ -1309,27 +1309,26 @@ \section{Requirement Minimization}\label{minimal requirements} \end{verbatim} \end{quote} -We now state the general definition. - \begin{definition}\label{conflicting req def} -Let $G$ be a \index{well-formed generic signature}well-formed generic signature. 
If $G$ has a pair of \index{derived requirement!conflicts}derived requirements $R_1$~and~$R_2$ where $R_1\otimes\Sigma$ and $R_2\otimes\Sigma$ cannot both be \index{satisfied requirement!conflicts}satisfied by the same substitution map~$\Sigma$, then $R_1$~and~$R_2$ define a pair of \IndexDefinition{conflicting requirement}\emph{conflicting requirements}. A generic signature $G$ is \emph{conflict-free} if it does not have any pairs of conflicting requirements. The pairs of derived requirements that can lead to conflicts are enumerated below: +Let $G$ be a \index{well-formed generic signature}well-formed generic signature. If $G$ has a pair of \index{derived requirement!conflicts}derived requirements $R_1$~and~$R_2$ such that for every substitution map~$\Sigma$, at least one of $R_1\otimes\Sigma$ or $R_2\otimes\Sigma$ is \index{satisfied requirement!conflicts}unsatisfied, then $R_1$~and~$R_2$ are \IndexDefinition{conflicting requirement}\emph{conflicting requirements}. We can also characterize conflicting requirements in terms of \index{desugared requirement}requirement desugaring: \begin{enumerate}
-\item For a same-type requirement $\TX$ and a \index{conformance requirement!conflicts}conformance requirement $\TP$, we desugar $\ConfReq{X}{P}$, which can be satisfied only if \tX\ conforms to \tP. +\item For two concrete \index{same-type requirement!conflicts}same-type requirements $\SameReq{T}{$\tX_1$}$ and $\SameReq{T}{$\tX_2$}$, we desugar the ``combined'' requirement $\SameReq{$\tX_1$}{$\tX_2$}$ using \AlgRef{desugar same type algo}. Desugaring will either detect a conflict, or produce a simpler list of requirements to replace one of the two original requirements, in which case we can look for conflicts again. +\item For a concrete same-type requirement $\TX$ and superclass requirement $\TC$, we desugar $\ConfReq{X}{C}$. Desugaring will succeed if and only if \tX~is a class type that is also a subclass of~\tC; otherwise, it will detect a conflict. +\item For a same-type requirement $\TX$ and a \index{layout requirement!conflicts}layout requirement $\TAnyObject$, we desugar $\ConfReq{X}{AnyObject}$. This will succeed if and only if \tX~is any \index{class type}class type. +\item For a same-type requirement $\TX$ and a \index{conformance requirement!conflicts}conformance requirement $\TP$, we desugar $\ConfReq{X}{P}$. This will succeed if and only if \tX\ conforms to \tP. \item For two \index{superclass requirement!conflicts}superclass requirements $\ConfReq{T}{$\tC_1$}$ and $\ConfReq{T}{$\tC_2$}$, we must consider the \index{superclass type}superclass relationship between the declarations of $\tC_1$~and~$\tC_2$: \begin{enumerate} \item If the \index{class declaration!superclass requirement}class declaration of $\tC_1$ is a subclass of the declaration of $\tC_2$, we desugar $\ConfReq{$\tC_1$}{$\tC_2$}$, and $\ConfReq{T}{$\tC_2$}$ becomes redundant. \item If the class declaration of $\tC_2$ is a subclass of the declaration of $\tC_1$, we desugar $\ConfReq{$\tC_2$}{$\tC_1$}$, and $\ConfReq{T}{$\tC_1$}$ becomes redundant.
-\item If the two declarations are unrelated, the requirements conflict. +\item If the two declarations are unrelated, we have a conflict. \end{enumerate} \item For a superclass requirement $\TC$ and a layout requirement $\TAnyObject$, we desugar $\ConfReq{C}{AnyObject}$, which is always satisfied and cannot conflict. -\item For a superclass requirement $\TC$ and a conformance requirement $\TP$, we desugar $\ConfReq{C}{P}$. If \tC\ conforms to \tP, the conformance requirement $\TP$ becomes redundant. However, if \tC\ does not conform to \tP, there is no conflict; the generic signature just requires a subclass of \tC\ that \emph{also} conforms to \tP. +\item For a superclass requirement $\TC$ and a conformance requirement $\TP$, we desugar $\ConfReq{C}{P}$. If \tC\ conforms to \tP, the conformance requirement $\TP$ becomes redundant. However, if \tC\ does not conform to \tP, there is no conflict; the generic signature just requires that \tT\ is a subclass of \tC\ that \emph{also} conforms to \tP. \end{enumerate} +We say $G$ is \emph{conflict-free} if it does not have any pairs of conflicting requirements. \end{definition} -We will explain how the implementation deals with superclass, layout and concrete same-type requirements, sans theory, in Chapters \ref{propertymap}~and~\ref{concrete conformances}, but we're going to look at two examples here. +We will investigate the compiler's implementation of superclass, layout, and concrete same-type requirements in \ChapRef{propertymap}, but in this section, we're just going to look at a couple of examples. \smallskip @@ -1416,7 +1415,7 @@ \section{Requirement Minimization}\label{minimal requirements} The major difference is that we must minimize all requirement signatures of a set of mutually-dependent protocols, or a \index{protocol component}\emph{protocol component}, simultaneously. We will discuss this again in \SecRef{protocol component} and see an example in \SecRef{homotopy reduction}. 
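+Returning briefly to \DefRef{conflicting req def}, here is a minimal illustrative sketch of the case of two superclass requirements whose class declarations are unrelated; the class names are invented for illustration. No substitution map can satisfy both requirements, so the compiler diagnoses a conflict:
+\begin{Verbatim}
+class Cat {}
+class Dog {}
+
+// 'T: Cat' and 'T: Dog' are superclass requirements naming two
+// unrelated class declarations, so no replacement type for T can
+// satisfy both; the compiler rejects this declaration.
+func adopt<T>(_: T) where T: Cat, T: Dog {}
+\end{Verbatim}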
-\section{Source Code Reference}\label{buildinggensigsourceref} +\section{Source Code Reference}\label{src:building generic signatures} \subsection*{Requests} @@ -1426,18 +1425,18 @@ \subsection*{Requests} \item \SourceFile{lib/AST/RequirementMachine/RequirementMachineRequests.cpp} \end{itemize} -The header file declares the requests; the evaluation functions are implemented by the Requirement Machine (\SecRef{rqm basic operation source ref}). +The header file declares the requests; the evaluation functions are implemented by the Requirement Machine (\SecRef{src:basic operation}). \IndexSource{generic signature constructor} \apiref{GenericSignature}{class} -See also \SecRef{genericsigsourceref}. +See also \SecRef{src:generic signatures}. \begin{itemize} \item \texttt{get()} is the primitive constructor, which builds a generic signature directly from a list of generic parameters and minimal requirements. \end{itemize} \IndexSource{generic signature request} \apiref{GenericSignatureRequest}{class} -The \texttt{GenericContext::getGenericSignature()} method (\SecRef{genericsigsourceref}) evaluates this request, which either returns the parent declaration's generic signature, or evaluates \texttt{InferredGenericSignatureRequest} with the appropriate arguments. +The \texttt{GenericContext::getGenericSignature()} method (\SecRef{src:generic signatures}) evaluates this request, which either returns the parent declaration's generic signature, or evaluates \texttt{InferredGenericSignatureRequest} with the appropriate arguments. \IndexSource{inferred generic signature request} \apiref{InferredGenericSignatureRequest}{class} @@ -1473,8 +1472,8 @@ \subsection*{Requests} The \texttt{visitRequirements()} method is called with \texttt{TypeResolutionStage::Structural} by the \texttt{InferredGenericSignatureRequest} and \texttt{RequirementSignatureRequest}. 
\index{primary file} -\index{type-check source file request} -The \texttt{TypeCheckSourceFileRequest} then visits all \texttt{where} clauses in every primary file again, this time with \texttt{TypeResolutionStage::Interface}. This is how invalid dependent member types in \texttt{where} clauses get diagnosed. Recall that the structural resolution stage builds unbound dependent member types, without any knowledge of what associated type declarations are visible. The interface resolution stage actually performs a name lookup using the generic signature that was built, catching invalid dependent member types. +\index{type-check primary file request} +The \texttt{TypeCheckPrimaryFileRequest} then visits all \texttt{where} clauses in every primary file again, this time with \texttt{TypeResolutionStage::Interface}. This is how we diagnose any invalid dependent member types in \texttt{where} clauses. Recall that the structural resolution stage builds unbound dependent member types, without any knowledge of what associated type declarations are visible. \IndexSource{abstract generic signature request} \apiref{AbstractGenericSignatureRequest}{class} @@ -1486,21 +1485,21 @@ \subsection*{Requests} \apiref{GenericSignatureErrorFlags}{enum class} Error flags returned by \texttt{AbstractGenericSignatureRequest}. We will see these conditions again in \ChapRef{rqm basic operation}; they prevent the requirement machine for this signature from being \emph{installed}. \begin{itemize} -\item \texttt{HasInvalidRequirements}: the original requirements were not \IndexSource{well-formed requirement}well-formed, or were in \IndexSource{conflicting requirement}conflict with each other. Any errors in the requirements handed to this request usually mean there was another error diagnosed elsewhere, like an invalid conformance, so this flag being set is not really actionable to the rest of the compiler. Without source location information, this error cannot be diagnosed in a friendly manner. 
+\item \texttt{HasInvalidRequirements}: the original requirements were not \IndexSource{well-formed requirement}well-formed, or were in \IndexSource{conflicting requirement}conflict with each other. Any errors in the requirements handed to this request usually mean there was another error diagnosed elsewhere, like an invalid conformance, so this flag being set is not really actionable to the rest of the compiler. Without \IndexSource{source location}source location information, this error cannot be diagnosed in a friendly manner. \item \texttt{HasConcreteConformances}: the generic signature had non-redundant concrete conformance requirements, which is an internal flag used to prevent the requirement machine from being installed. It does not indicate an error condition to the caller. See \SecRef{concrete contraction} for discussion. \item \texttt{CompletionFailed}: the \index{completion}completion procedure could not construct a \index{convergent rewriting system}convergent rewriting system within the maximum number of steps (see the discussion of termination that immediately follows \AlgRef{knuthbendix}). This is actually fatal, so the \texttt{buildGenericSignature()} wrapper function aborts the compiler in this case. \end{itemize} \IndexSource{requirement signature constructor} \apiref{RequirementSignature}{class} -See also \SecRef{genericsigsourceref}. +See also \SecRef{src:generic signatures}. \begin{itemize} \item \texttt{get()} is the primitive constructor, which builds a requirement signature directly from a list of minimal requirements and protocol type aliases. \end{itemize} \IndexSource{requirement signature request} \apiref{RequirementSignatureRequest}{class} -The \texttt{ProtocolDecl::getRequirementSignature()} method (\SecRef{genericsigsourceref}) evaluates this request, which computes the protocol's requirement signature if the protocol is in the main module, or deserializes it if the protocol is from a serialized module. 
+The \texttt{ProtocolDecl::getRequirementSignature()} method (\SecRef{src:generic signatures}) evaluates this request, which computes the protocol's requirement signature if the protocol is in the main module, or deserializes it if the protocol is from a serialized module. \IndexSource{structural requirements request} \apiref{StructuralRequirementsRequest}{class} @@ -1519,7 +1518,7 @@ \subsection*{Requirement Resolution} \item \SourceFile{lib/AST/RequirementMachine/RequirementLowering.cpp} \end{itemize} -User-written requirements are wrapped in the \texttt{StructuralRequirement} type, which stores a \texttt{Requirement} together with a source location used for diagnostics. A couple of functions defined in \texttt{RequirementLowering.cpp} construct \texttt{StructuralRequirement} instances. The \texttt{InferredGenericSignatureRequest} calls these functions directly. The \texttt{RequirementSignatureRequest} delegates to \texttt{StructuralRequirementsRequest}, which uses them to resolve requirements written in a protocol declaration. +User-written requirements are wrapped in the \texttt{StructuralRequirement} type, which stores a \texttt{Requirement} together with a \IndexSource{source location}source location used for diagnostics. A couple of functions defined in \texttt{RequirementLowering.cpp} construct \texttt{StructuralRequirement} instances. The \texttt{InferredGenericSignatureRequest} calls these functions directly. The \texttt{RequirementSignatureRequest} delegates to \texttt{StructuralRequirementsRequest}, which uses them to resolve requirements written in a protocol declaration. \apiref{rewriting::realizeRequirement()}{function} Calls the \texttt{WhereClauseOwner::visitRequirements()} method to resolve requirements written in \texttt{where} clauses, and wraps the results in \texttt{StructuralRequirement} instances. 
@@ -1568,7 +1567,7 @@ \subsection*{Requirement Minimization} \IndexSource{requirement order} \apiref{Requirement}{class} -See also \SecRef{genericsigsourceref}. +See also \SecRef{src:generic signatures}. \begin{itemize} \item \texttt{compare()} implements the requirement order (\AlgRef{requirement order}), returning one of the following: \begin{itemize} @@ -1582,7 +1581,7 @@ \subsection*{Requirement Minimization} \IndexSource{minimal requirement} \IndexSource{reduced requirement} \apiref{GenericSignatureImpl}{class} -See also \SecRef{genericsigsourceref}. +See also \SecRef{src:generic signatures}. \begin{itemize} \item \texttt{verify()} ensures that all explicit requirements in this signature are desugared (\DefRef{desugaredrequirementdef}), reduced (\DefRef{reduced requirement}), minimal (\DefRef{minimal generic sig def}), and ordered (\AlgRef{requirement order}). Any violations report a fatal error that crashes the compiler even in no-assert builds, since such generic signatures should not be built at all. \end{itemize} diff --git a/docs/Generics/chapters/compilation-model.tex b/docs/Generics/chapters/compilation-model.tex index 5e811eb10be04..a0e2684a58d5e 100644 --- a/docs/Generics/chapters/compilation-model.tex +++ b/docs/Generics/chapters/compilation-model.tex @@ -2,7 +2,7 @@ \begin{document} -\chapter{Compilation Model}\label{compilation model} +\chapter{Compilation Model}\label{chap:compilation model} \lettrine{M}{ost developers} interact with the Swift compiler through the \index{Xcode}Xcode build system or the \index{Swift package manager}Swift package manager, but for simplicity's sake we're just going to consider direct invocation of \texttt{swiftc} from the command line. 
The \texttt{swiftc} command runs the \IndexDefinition{Swift driver}\emph{Swift driver}, which invokes the \emph{Swift frontend} program to actually compile each source file; then, depending on the usage mode, the driver runs additional tools, such as the linker, to produce the final build artifact. Most of this book concerns the frontend, but we will briefly review the operation of the driver now. @@ -21,26 +21,26 @@ \chapter{Compilation Model}\label{compilation model} \item The older \texttt{@NSApplicationMain} and \texttt{@UIApplicationMain} attributes, deprecated since \IndexSwift{5.a@5.10}Swift 5.10~\cite{se0383}, provide a similar mechanism specific to Apple platforms. Attaching one of these attributes to a class conforming to \texttt{NSApplicationMain} or \texttt{UIApplicationMain}, respectively, will generate a main entry point which calls the \texttt{NSApplicationMain()} or \texttt{UIApplicationMain()} system framework function. \end{enumerate} -Invoking the driver with the \IndexFlag{emit-library}\texttt{-emit-library} and \IndexFlag{emit-module}\texttt{-emit-module} flags instructs it to generate a shared library, together with the serialized module file consumed by the compiler when importing the library (\SecRef{module system}): +The \IndexFlag{emit-library}\texttt{-emit-library} and \IndexFlag{emit-module}\texttt{-emit-module} flags instruct the driver to generate a shared library together with the \emph{serialized module} file that is consumed by the compiler when importing the library (\SecRef{module system}): \begin{Verbatim} $ swiftc algorithm.swift utils.swift -module-name SudokuSolver -emit-library -emit-module \end{Verbatim} \paragraph{Frontend jobs.} -The \IndexDefinition{Swift frontend}Swift frontend itself is single-threaded, but the driver can benefit from multi-core concurrency by running multiple \IndexDefinition{frontend job}frontend jobs in parallel. 
Each frontend job compiles one or more source files; these are the \IndexDefinition{primary file}\emph{primary source files} of the frontend job. All non-primary source files are the \IndexDefinition{secondary file}\emph{secondary source files} of the frontend job. The assignment of primary source files to each frontend job is determined by the \emph{compilation mode}: +The \IndexDefinition{Swift frontend}Swift frontend itself is single-threaded, but the driver can benefit from multi-core concurrency by running multiple \IndexDefinition{frontend job}frontend jobs in parallel. Each frontend job compiles one or more \index{source file}source files; these are the \IndexDefinition{primary file}\emph{primary source files} of the frontend job. All non-primary source files are the \IndexDefinition{secondary file}\emph{secondary source files} of the frontend job. The assignment of primary source files to each frontend job is determined by the \emph{compilation mode}: \begin{itemize} \item The \IndexFlag{wmo}\texttt{-wmo} driver flag selects \IndexDefinition{whole module optimization}\emph{whole module mode}, typically used for \index{release build}release builds. In this mode, the driver schedules a single frontend job. The primary files of this job are all the source files in the main module, and there are no secondary files. In whole module mode, the frontend is able to perform more aggressive optimization across source file boundaries, hence its usage for release builds. \item The \IndexFlag{disable-batch-mode}\texttt{-disable-batch-mode} driver flag selects \IndexDefinition{single file mode}\emph{single file mode}, with one frontend job per source file. In this mode, each frontend job has a single primary file, with all other files being secondary files. Single file mode was the default for \index{debug build}debug builds until \IndexSwift{4.1}Swift~4.1, however these days it is only used for testing the compiler. 
-Single file mode incurs inexorable overhead in the form of duplicated work between frontend jobs; if two source files reference the same declaration in a third source file, the two frontend jobs will both need to parse and type check this declaration as there is no caching across frontend jobs (the next two sections detail how the frontend deals with secondary files, with delayed parsing and the request evaluator respectively).
+Single file mode incurs overhead in the form of duplicated work between frontend jobs; if two source files reference some declaration from a third source file, all three frontend jobs will need to parse and type check that declaration; there is no caching or shared state between frontend jobs. (The next two sections detail how the frontend deals with secondary files, with delayed parsing and the request evaluator respectively.)

\item The \IndexFlag{enable-batch-mode}\texttt{-enable-batch-mode} driver flag selects \IndexDefinition{batch mode}\emph{batch mode}, which is a happy medium between whole module and single file mode. In batch mode, the list of source files is partitioned into fixed-size batches, up to the maximum batch size. The source files in each batch become the primary files of each frontend job. By compiling multiple primary files in a single frontend job, batch mode amortizes the cost of parsing and type checking work performed on secondary files. At the same time, it still schedules multiple frontend jobs for parallelism on multi-core systems. Batch mode was first introduced in \IndexSwift{4.2}Swift 4.2, and is now the default for debug builds.
\end{itemize}

-Note that each source file is a primary source file of exactly one frontend job, and within a single frontend job, the primary files and secondary files together form the full list of source files in the module. A single source file is therefore the minimum unit of parallelism.
By default, the number of concurrent frontend jobs is determined by the number of CPU cores; this can be overridden with the \IndexFlag{j}\texttt{-j} driver flag. If there are more frontend jobs than can be run simultaneously, the driver queues them and kicks them off as other frontend jobs complete. In batch mode and single file mode, the driver can also perform an \index{incremental build}\emph{incremental build} by re-using the result of previous compilations, providing an additional compile-time speedup. Incremental builds are described in \SecRef{request evaluator}. +Each source file is a primary source file of exactly one frontend job, and within a single frontend job, the primary files and secondary files form a partition of the full list of source files in the module. A single source file is therefore the minimum unit of parallelism. By default, the number of concurrent frontend jobs is determined by the number of CPU cores; this can be overridden with the \IndexFlag{j}\texttt{-j} driver flag. If there are more frontend jobs than can be run simultaneously, the driver queues them and kicks them off as other frontend jobs complete. In batch mode and single file mode, the driver can also perform an \index{incremental build}\emph{incremental build} by re-using the result of previous compilations, providing an additional compile-time speedup. Incremental builds are described in \SecRef{request evaluator}. The \index[flags]{###@\texttt{-\#\#\#}}\verb|-###| driver flag performs a ``dry run'' which prints all commands to run without actually doing anything. In the below example, the driver schedules three frontend jobs, with each job having a single primary source file and two secondary files. The final command is the linker invocation, which combines the output of each frontend job into our binary executable. 
\begin{Verbatim} @@ -53,16 +53,16 @@ \chapter{Compilation Model}\label{compilation model} \paragraph{Compilation pipeline.} \FigRef{compilerpipeline} shows a high-level view of the Swift frontend; this resembles the classic multi-pass compiler design, described in \cite{muchnick1997advanced} or \cite{cooper2004engineering} for example: -\begin{itemize} +\begin{enumerate} \item \IndexDefinition{parser}\textbf{Parse:} Source files are parsed, building the \IndexDefinition{abstract syntax tree}\index{AST|see{abstract syntax tree}}\index{syntax tree|see{abstract syntax tree}}abstract syntax \index{tree}tree. \item \IndexDefinition{Sema}\textbf{Sema:} Semantic analysis is performed, producing a type-checked syntax tree. (We'll see shortly the first two stages are not completely sequential.) -\item \IndexDefinition{SILGen}\textbf{SILGen:} The type-checked syntax tree is lowered to \IndexDefinition{raw SIL}``raw SIL.'' \IndexDefinition{SIL}SIL is the Swift Intermediate Language, described in \cite{sil} and \cite{siltalk}. +\item \IndexDefinition{SILGen}\textbf{SILGen:} The syntax tree is lowered to \IndexDefinition{raw SIL}``raw SIL.'' \IndexDefinition{SIL}SIL is the Swift Intermediate Language, an \index{SSA form}SSA (static single assignment) program representation \cite{sil,siltalk}. \item \IndexDefinition{SIL optimizer}\textbf{SILOptimizer:} The raw SIL is transformed into \IndexDefinition{canonical SIL}``canonical SIL'' by a series of \IndexDefinition{SIL mandatory pass}\emph{mandatory passes}, which analyze the control flow graph and emit diagnostics; for example, \IndexDefinition{definite initialization}\emph{definite initialization} ensures that all storage locations are initialized. When the \IndexFlag{O}\texttt{-O} command line flag is specified, the canonical SIL is optimized by a series of \IndexDefinition{SIL performance pass}\emph{performance passes} to improve run-time performance and code size. 
\item \IndexDefinition{IRGen}\textbf{IRGen:} The optimized SIL is then transformed into LLVM IR.
\item \index{LLVM}\textbf{LLVM:} Finally, the LLVM IR is handed off to LLVM, which performs various lower level optimizations before generating machine code. (LLVM is, of course, the project formerly known as the ``Low Level Virtual Machine''~\cite{llvm}.)
-\end{itemize}
+\end{enumerate}

\begin{figure}\captionabove{The compilation pipeline}\label{compilerpipeline}
\begin{center}
@@ -88,7 +88,7 @@ \chapter{Compilation Model}\label{compilation model}
\paragraph{Debugging flags.} Various command-line flags are provided to run the pipeline until a certain phase, and dump the output of that phase to the terminal (or some other file, in conjunction with the \IndexFlag{o}\texttt{-o} flag). These are useful for debugging the compiler:
\begin{itemize}
-\item \IndexFlag{dump-parse}\texttt{-dump-parse} runs only the parser, and prints the \index{abstract syntax tree}syntax tree as an \index{s-expression}s-expression.\footnote{The term comes from \index{Lisp}Lisp. An s-expression represents a tree structure as nested parenthesized lists; e.g.\ \texttt{(a (b c) d)} is a node with three children \texttt{a}, \texttt{(b c)} and \texttt{d}, and \texttt{(b c)} has two children \texttt{b} and \texttt{c}.}
+\item \IndexFlag{dump-parse}\texttt{-dump-parse} runs only the parser, and prints the \index{abstract syntax tree}syntax tree as an \IndexDefinition{s-expression}\emph{s-expression}. (The term comes from \index{Lisp}Lisp. An s-expression represents a tree structure by nested parenthesized lists; e.g.\ \texttt{(a (b c) d)} is a node with three children \texttt{a}, \texttt{(b c)} and \texttt{d}, and \texttt{(b c)} has two children \texttt{b} and \texttt{c}.)
\item \IndexFlag{dump-ast}\texttt{-dump-ast} runs only the parser and Sema, and prints the type-checked syntax tree as an s-expression.
\item \IndexFlag{print-ast}\texttt{-print-ast} prints the type-checked syntax tree in a form that approximates what was written in source code. This is useful for getting a sense of what declarations the compiler \index{synthesized declaration}synthesized, for example for derived conformances to protocols like \texttt{Equatable}. \item \IndexFlag{emit-silgen}\texttt{-emit-silgen} runs only Sema and SILGen, and prints the raw SIL output by SILGen. @@ -101,7 +101,7 @@ \chapter{Compilation Model}\label{compilation model} \index{TBD} \index{textual interface} -The compilation pipeline will vary slightly depending on what the driver and frontend were asked to produce. When the frontend is instructed to emit a serialized module file only, and not an object file, compilation stops after the SIL optimizer. When generating a textual interface file or TBD file, compilation stops after Sema. (Textual interfaces are discussed in \SecRef{module system}. A TBD file is a list of symbols in a shared library, which can be consumed by the linker and is faster to generate than the shared library itself; we're not going to talk about them here.) +The compilation pipeline will vary slightly depending on what the driver and frontend were asked to produce. When the frontend is instructed to emit a serialized module file only, and not an \index{object file}object file, compilation stops after the SIL optimizer. When generating a textual interface file or TBD file, compilation stops after Sema. (Textual interfaces are discussed in \SecRef{module system}. A TBD file is a list of symbols in a shared library, which can be consumed by the linker and is faster to generate than the shared library itself; we're not going to talk about them here.) 
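For instance, to stop after Sema and emit a textual interface for the module from our earlier example, we might write the following (a sketch; note that emitting a textual interface also requires the \IndexFlag{enable-library-evolution}\texttt{-enable-library-evolution} flag):
\begin{Verbatim}
$ swiftc algorithm.swift utils.swift -module-name SudokuSolver \
    -enable-library-evolution -emit-module-interface
\end{Verbatim}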
\paragraph{Frontend flags.} \index{frontend flag} @@ -109,7 +109,7 @@ \chapter{Compilation Model}\label{compilation model} \begin{Verbatim} $ swiftc -frontend -typecheck -primary-file a.swift b.swift \end{Verbatim} -Another mechanism for passing flags to the frontend is the \IndexFlag{Xfrontend}\texttt{-Xfrontend} flag. When this flag appears in a command-line invocation of the driver, the driver schedules job as usual, but the command line argument that comes immediately after is passed directly to each frontend job: +Another mechanism for passing flags to the frontend is provided by the \IndexFlag{Xfrontend}\texttt{-Xfrontend} driver flag. When this flag appears in a command-line invocation, the driver proceeds to schedule jobs as usual, but the command line argument that immediately follows is passed directly to each frontend job: \begin{Verbatim} $ swiftc a.swift b.swift -Xfrontend -dump-requirement-machine \end{Verbatim} @@ -152,7 +152,7 @@ \section{Name Lookup}\label{name lookup} \paragraph{Dynamic lookup.} A qualified lookup whose base is the \texttt{AnyObject} type implements the legacy \index{Objective-C}Objective-C behavior of a message send to \texttt{id}, which can invoke any method defined in any Objective-C class or protocol. In Swift, the so-called \IndexDefinition{AnyObject lookup@\texttt{AnyObject} lookup|see{dynamic lookup}}\IndexDefinition{dynamic lookup}\emph{dynamic lookup} searches a global lookup table constructed to contain all \texttt{@objc} members of all classes and protocols: \begin{itemize} -\item Any class can contain \texttt{@objc} members, and the attribute can either be explicitly stated, or inferred if the method overrides an \texttt{@objc} method from the superclass. +\item Any class can contain \texttt{@objc} members, and the attribute can either be explicitly stated, or inferred if the method \index{override}overrides an \texttt{@objc} method from the superclass. 
\item Protocol members are \texttt{@objc} only if the protocol itself is \texttt{@objc}. \end{itemize} @@ -165,7 +165,7 @@ \section{Name Lookup}\label{name lookup} Operator symbols do not themselves have an implementation; they are just names. An operator symbol can be used as the name of a function implementing the operator on a specific type (for prefix and postfix operators) or a specific pair of types (for infix operators). Operator functions can be declared either at the top level, or as a member of a type. As far as a name lookup is concerned, the interesting thing about operator functions is that they are visible globally, even when declared inside of a type. Operator functions are found by consulting the operator lookup table, which contains top-level operator functions as well as member operator functions of all declared types. -When the compiler type checks the expression \texttt{2 + 3 * 6}, it must pick two specific operator functions for \texttt{+} and \texttt{*} among all the possibilities in order to make this expression type check. In this case, the overloads for \texttt{Int} are chosen, because \texttt{Int} is the default literal type for the literals \texttt{2}, \texttt{3} and \texttt{6}. +When the compiler \index{expression type checker}type checks the expression \texttt{2 + 3 * 6}, it must pick two specific operator functions for \texttt{+} and \texttt{*} among all the possibilities, in order for this expression to type check. In this case, the overloads for \texttt{Int} are chosen, because \texttt{Int} is the default literal type for the literals \texttt{2}, \texttt{3} and \texttt{6}. \begin{listing}\captionabove{Operator lookup in action}\label{customops} \begin{Verbatim} @@ -202,23 +202,21 @@ \section{Name Lookup}\label{name lookup} \section{Delayed Parsing}\label{delayed parsing} -The ``compilation pipeline'' model as described is an over-simplification of the actual state of affairs. 
Ultimately, each frontend job only needs to generate machine code from the declarations in its primary files, so all stages from SILGen onward operate on the frontend job's primary files only. The situation while parsing and type checking is more subtle, because name lookup must find declarations in other source files, even secondary files. This requires having the \index{abstract syntax tree}abstract syntax tree for secondary files as well. However, it would be inefficient if every frontend job was required to fully parse all secondary files, because the time spent in the \index{parser}parser would be proportional to the number of frontend jobs multiplied by the number of source files, negating the benefits of parallelism. +The ``compilation pipeline'' model as described is an over-simplification of the actual state of affairs. Ultimately, each frontend job only needs to generate machine code from the declarations in its primary files, so all stages from SILGen onward operate on the frontend job's primary files only. The situation while parsing and type checking is more subtle, because name lookup must find declarations in other \index{source file}source files, even secondary files. This requires having the \index{abstract syntax tree}abstract syntax tree for secondary files as well. However, it would be inefficient if every frontend job was required to fully parse all secondary files, because the time spent in the \index{parser}parser would be proportional to the number of frontend jobs multiplied by the number of source files, negating the benefits of parallelism. -The \IndexDefinition{delayed parsing}\emph{delayed parsing} optimization solves this dilemma. When parsing a \index{secondary file}secondary file for the first time, the parser does not construct syntax tree nodes for the bodies of top-level types, extensions and functions. 
Instead, it operates in a high-speed mode where comments are skipped and pairs of braces are matched, but very little other work is performed. This outputs a ``skeleton'' representation of each secondary file. (In whole module mode, there is no delayed parsing. There are no secondary files, and delayed parsing of declarations in \index{primary file}primary files is pointless, since they are always needed for type checking and code generation anyway.) If the body of a type or extension declaration from a secondary file is needed later---for example, if type checking of an \index{expression}expression in a primary file performs a name lookup into this declaration---the source range of the declaration is parsed again, this time building the full syntax tree. While it is possible to construct a pathological program where every source file triggers delayed parsing of all declarations in every other file, this does not occur in practice. +The \IndexDefinition{delayed parsing}\emph{delayed parsing} optimization solves this dilemma. When parsing a \index{secondary file}secondary file for the first time, the parser does not construct syntax tree nodes for the bodies of top-level types, extensions, and functions. Instead, it operates in a high-speed mode where comments are skipped and pairs of braces are matched, but very little other work is performed. This outputs a ``skeleton'' representation of each secondary file. (In whole module mode, there is no delayed parsing. There are no secondary files, and delayed parsing of declarations in \index{primary file}primary files is pointless, since they are always needed for type checking and code generation anyway.) 
If we later need the body of a type or extension declaration from a secondary file---for example, if a name lookup into this declaration is performed while type checking an \index{expression}expression in a primary file---then we will parse the source range of the declaration again, this time building the full syntax tree. -For delayed parsing to work, the skipped members of types and extensions must have no observable effect on compilation. This is always true with two exceptions: operator lookup, and dynamic lookup. +While it is possible to construct a pathological program where every source file triggers delayed parsing of all declarations in every other file, this is unlikely in practice. \paragraph{Operator lookup.} -\index{operator lookup} -\index{expression} -As explained in the previous section, operator functions are visible globally, even when declared as a method of a type. To deal with this, the parser looks for the keyword ``\texttt{func}'' followed by an operator symbol when skipping a type or extension body in a secondary file. The first time an operator lookup is performed, the bodies of all types and extensions that contain operator functions are parsed again. Most types and extensions do not define operator functions, so this occurs rarely in practice. +For delayed parsing to work, the skipped members of types and extensions must have no observable effect on compilation. This is always true with two exceptions: operator lookup, and dynamic lookup. As explained in the previous section, \index{operator lookup}operator functions are visible globally, even when declared as a method of a type. To deal with this, the parser looks for the keyword ``\texttt{func}'' followed by an operator symbol when skipping a type or extension body in a secondary file. The first time an operator lookup is performed, the bodies of all types and extensions that contain operator functions are parsed again. 
Most types and extensions do not define operator functions, so this occurs rarely in practice.

\paragraph{Dynamic lookup.}
\index{dynamic lookup}
\index{Objective-C}
The situation with dynamic lookup is similar, since a method call on a value of type \texttt{AnyObject} must consult a global lookup table constructed from \texttt{@objc} members of classes, and the (implicitly \texttt{@objc}) members of \texttt{@objc} protocols. Unlike operator functions, classes and \texttt{@objc} protocols are quite common in Swift programs; however, \texttt{AnyObject} lookup itself is rarely used. The first time a frontend job encounters a dynamic \texttt{AnyObject} method call, all class bodies flagged as potentially containing \texttt{@objc} methods are eagerly parsed.

-There's actually one more complication here. Classes can be nested inside of other types, whose bodies are skipped if they appear in a secondary file. This is resolved with the same trick as operator lookup. When skipping the body of a type, the parser looks for occurrences of the ``\texttt{class}'' keyword. If the body contains this keyword, this type is parsed and its members visited recursively when building the \texttt{AnyObject} global lookup table.
+There's actually one more complication here. Classes can be nested inside of other types, whose bodies are skipped if they appear in a secondary file. To find such classes when building the \texttt{AnyObject} lookup table, we rely on the same trick as for operator lookup. When skipping the body of a type, the parser looks for occurrences of the ``\texttt{class}'' keyword. If the body contains this keyword, we record this fact, so that this type can be completely parsed later if needed.

Most Swift programs, even those making heavy use of Objective-C interoperability, do not contain a dynamic \texttt{AnyObject} method call in every source file, so delayed parsing remains effective.
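To see what such a dynamic method call looks like in source, consider this minimal sketch (the \texttt{Dog} class is invented for illustration, and the example assumes the Objective-C runtime, so it only builds on Apple platforms):
\begin{Verbatim}
import Foundation

class Dog: NSObject {
  @objc func bark() -> String { return "woof" }
}

let pet: AnyObject = Dog()

// Dynamic lookup searches the global table of @objc members; the
// optional call succeeds only if the object responds to bark().
if let sound = pet.bark?() {
  print(sound)
}
\end{Verbatim}
The first call of this form encountered by a frontend job is what triggers the eager parsing of flagged class bodies described above.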
@@ -262,7 +260,7 @@ \section{Request Evaluator}\label{request evaluator} The \IndexDefinition{request evaluator}\emph{request evaluator} generalizes the idea behind delayed parsing to all of type checking. As with parsing, the classic compiler design, where a single semantic analysis pass walks declarations in source order, is not well-suited for Swift: \begin{itemize} -\item Declarations may be written in any order within a Swift source file, without being \index{forward reference}forward declared (unlike \index{Pascal}Pascal or \index{C}C). Expressions and type annotations can also reference declarations in other source files without restriction. Finally, certain kinds of circular references are permitted. +\item Declarations may be written in any order within a Swift \index{source file}source file, without being \index{forward reference}forward declared (unlike \index{Pascal}Pascal or \index{C}C). Expressions and type annotations can also reference declarations in other source files without restriction. Finally, certain kinds of circular references are permitted. In particular, this means that within a single frontend job, an entity in a primary file may reference a declaration that has not yet been type checked, or is in the process of being type checked. @@ -275,7 +273,7 @@ \section{Request Evaluator}\label{request evaluator} Concretely, a \emph{request} packages a list of input parameters together with an \IndexDefinition{evaluation function}\emph{evaluation function}. With the exception of emitting diagnostics, the request function's result should only depend on those inputs, and the results of other requests. The request evaluator directly invokes the evaluation function, and caches the result. Clients only evaluate the request via the request evaluator framework, which returns the cached value if present, detects request cycles automatically, and tracks dependency information for incremental builds. 
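The caching and cycle detection behavior can be sketched with a toy model (a simplification for exposition only; the actual interface in the compiler is written in C++ and is parametrized over request types):
\begin{Verbatim}
enum EvalError: Error { case cycle }

final class ToyEvaluator {
  private var cache: [String: Int] = [:]
  private var active: Set<String> = []

  // `name` identifies a request; `body` is its evaluation function,
  // which may recursively evaluate other requests.
  func evaluate(_ name: String, _ body: () throws -> Int) throws -> Int {
    if let result = cache[name] { return result }  // cached result
    guard active.insert(name).inserted else {
      throw EvalError.cycle                        // request cycle
    }
    defer { active.remove(name) }
    let result = try body()
    cache[name] = result
    return result
  }
}
\end{Verbatim}
A request that transitively evaluates itself throws an error, and a second evaluation of a completed request returns the cached result without running its evaluation function again.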
-\IndexDefinition{type-check source file request} +\IndexDefinition{type-check primary file request} \IndexDefinition{AST lowering request} \index{interface type request} \index{generic signature request} @@ -283,11 +281,11 @@ \section{Request Evaluator}\label{request evaluator} \IndexDefinition{unqualified lookup request} The Swift frontend defines hundreds of request kinds; for our purposes, the most important ones are: \begin{itemize} -\item The \Request{type-check source file request} visits each declaration in a primary source file. It is responsible for kicking off enough requests to ensure that SILGen can proceed if all requests succeeded without emitting diagnostics. +\item The \Request{type-check primary file request} visits each declaration in a \index{primary file}primary source file. It is responsible for kicking off enough requests to ensure that SILGen can proceed if all requests succeeded without emitting diagnostics. \item The \Request{AST lowering request} is the entry point into \index{SILGen}SILGen, generating SIL from the abstract syntax tree for a source file. \item The \Request{unqualified lookup request} and \Request{qualified lookup request} perform the two kinds of name lookup described in the previous section. -\item The \Request{interface type request} is explained in \ChapRef{decls}. -\item The \Request{generic signature request} is explained in \ChapRef{building generic signatures}. +\item The \Request{interface type request} is explained in \ChapRef{chap:decls}. +\item The \Request{generic signature request} is explained in \ChapRef{chap:building generic signatures}. 
\end{itemize}

\begin{example}
@@ -297,23 +295,24 @@ \section{Request Evaluator}\label{request evaluator}
func cook() -> Food {}
struct Food {}
\end{Verbatim}
-Notice how the \index{initial value expression}initial value expression of the variable references the function, and the function's return type is the struct declared immediately after, so the inferred type of the variable is then this struct. This plays out with the request evaluator:
+Notice how the \index{initial value expression}initial value expression of \texttt{food} references the \texttt{cook()} function, and the return type of \texttt{cook()} is the \texttt{Food} struct declared immediately after, so \texttt{Food} is the inferred type of \texttt{food}. This plays out with the request evaluator:
\begin{enumerate}
-\item The \Request{type-check source file request} begins by visiting the declaration of \texttt{food} and performing various semantic checks.
+\item The \Request{type-check primary file request} begins by visiting the declaration of \texttt{food} and performing various semantic checks.
\item One of these checks evaluates the \Request{interface type request} with the declaration of \texttt{food}. This is a variable declaration, so the evaluation function will type check the initial value expression and return the type of the result.
\begin{enumerate}
\item In order to type check the expression \texttt{cook()}, the \Request{interface type request} is evaluated again, this time with the declaration of \texttt{cook} as its input parameter.
-\item The interface type of \texttt{cook()} has not been computed yet, so the request evaluator calls the evaluation function for this request.
+\item The interface type of \texttt{cook()} has not been computed yet, so the request evaluator invokes the request evaluation function.
\end{enumerate} -\item After computing the interface type of \texttt{food} and performing other semantic checks, the \Request{type-check source file request} moves on to the declaration of \texttt{cook}: +\item After computing the interface type of \texttt{food} and performing other semantic checks, the \Request{type-check primary file request} moves on to the declaration of \texttt{cook}: \begin{enumerate} \item The \Request{interface type request} is evaluated once again, with the input parameter being the declaration of \texttt{cook}. \item The result was already cached, so the request evaluator immediately returns the cached result without computing it again. \end{enumerate} +\item Finally, we type check the declaration of \texttt{Food}, evaluating any remaining requests. \end{enumerate} \end{example} -The \Request{type-check source file request} is special, because it does not return a value; it is evaluated for the side effect of emitting diagnostics, whereas most other requests return a value. The implementation of the \Request{type-check source file request} guarantees that if no diagnostics were emitted, then \index{SILGen}SILGen can generate valid SIL for all declarations in a primary file. However, the next example shows that SILGen can encounter invalid declarations, and diagnose errors in secondary files. +The \Request{type-check primary file request} is special, because it does not return a value; it is evaluated for the side effect of emitting diagnostics, whereas most other requests return a value. The implementation of the \Request{type-check primary file request} guarantees that if no diagnostics were emitted, then \index{SILGen}SILGen can generate valid SIL for all declarations in a primary file. However, the next example shows that SILGen can encounter invalid declarations, and diagnose errors in secondary files. 
\begin{example}
Suppose we run a frontend job with the below primary file:
@@ -329,12 +328,12 @@ \section{Request Evaluator}\label{request evaluator}
}
\end{Verbatim}

-Our frontend job does not emit any diagnostics in the semantic analysis pass, because the \texttt{contents} stored property of \texttt{Box} is not actually referenced while type checking the primary file \texttt{a.swift}. However when SILGen runs, it needs to determine whether the parameter of type \texttt{Box} to the \texttt{open()} function needs to be passed directly in registers, or via an address by computing the \emph{type lowering} for the \texttt{Box} type. The type lowering procedure recursively computes the type lowering of each stored property of \texttt{Box}; this evaluates the \index{interface type request}\Request{interface type request} for the \texttt{contents} property of \texttt{Box}, which emits a diagnostic because the identifier \index{identifier}``\texttt{DoesNotExist}'' does not resolve to a valid type. The interface type of the stored property then becomes the \index{error type}error type.
+Our frontend job does not emit any diagnostics in the semantic analysis pass, because the \texttt{contents} stored property of \texttt{Box} is not actually referenced while type checking the primary file \texttt{a.swift}. However, when SILGen runs, it computes the \index{type lowering}\emph{type lowering} for the \texttt{Box} type, to determine whether the parameter of type \texttt{Box} to the \texttt{open()} function is passed directly in registers, or via an address. The type lowering procedure recursively computes the type lowering of each stored property of \texttt{Box}; this evaluates the \index{interface type request}\Request{interface type request} for the \texttt{contents} property of \texttt{Box}, which emits a diagnostic because the identifier \index{identifier}``\texttt{DoesNotExist}'' does not resolve to a valid type.
The interface type of the stored property then becomes the \index{error type}error type. We will discuss type lowering in \SecRef{sec:type lowering}. \end{example} The request evaluator framework was first introduced in \IndexSwift{4.2}Swift~4.2 \cite{reqeval}. In subsequent releases, various ad-hoc mechanisms were gradually converted into request evaluator requests, with resulting gains to compiler performance, stability, and implementation maintainability. -\paragraph{Cycles.} In a language supporting \index{forward reference}forward references, it is possible to write a program that is syntactically well-formed, and where all identifiers resolve to valid declarations, but is nonetheless invalid because of circularity. The classic example of this is a pair of classes where each class \index{circular inheritance}inherits from the other: +\paragraph{Cycles.} In a language that permits \index{forward reference}forward references, one can write a syntactically well-formed program where all identifiers reference valid declarations, but the program is nonetheless invalid, because of circularity. The classic example of this is a pair of classes where each class \index{circular inheritance}inherits from the other: \begin{Verbatim} class A: B {} class B: A {} @@ -354,36 +353,33 @@ \section{Request Evaluator}\label{request evaluator} \begin{Verbatim} $ swiftc cycle.swift -Xfrontend -debug-cycles ===CYCLE DETECTED=== - `--TypeCheckSourceFileRequest(source_file "cycle.swift") + `--TypeCheckPrimaryFileRequest(source_file "cycle.swift") `--SuperclassDeclRequest(cycle.(file).A@cycle.swift:1:7) `--SuperclassDeclRequest(cycle.(file).B@cycle.swift:2:7) `--SuperclassDeclRequest(cycle.(file).A@cycle.swift:1:7) \end{Verbatim} -\IndexFlag{trace-stats-events} -\paragraph{Debugging.} A couple of command-line flags are useful for debugging compile-time performance issues. The \texttt{-stats-output-dir} flag is followed by the name of a directory, which must already exist. 
Each frontend job writes a new JSON file to this directory, with various counters and timers. For each kind of request, there is a counter for the number of unique requests of this kind that were evaluated, not counting requests whose results were cached. The timer records the time spent in the request's evaluation function.
-
-The output can be sliced and diced in various ways; one can actually make pretty effective use of \Index{awk@\texttt{awk}}\texttt{awk}, despite the \index{JSON}JSON format:
+\paragraph{Performance analysis.} A handful of command-line flags are provided to help with understanding compile-time performance. The \IndexFlag{stats-output-dir}\texttt{-stats-output-dir} flag is followed by the name of a directory, which must already exist. Each frontend job writes a new JSON file to this directory, with various counters and timers. When used in conjunction with the \IndexFlag{fine-grained-timers}\texttt{-fine-grained-timers} flag, the compiler will count the number of request evaluations, and the total time spent in request evaluation, broken down into each kind of request. The output can be sliced and diced in various ways; one can actually make pretty effective use of ``\Index{awk@\texttt{awk}}\texttt{awk}'' \cite{awk} for example, despite the \index{JSON}JSON format:
\begin{Verbatim}
$ mkdir /tmp/stats
-$ swiftc ... -stats-output-dir /tmp/stats
-$ awk '/InterfaceTypeRequest.wall/ { x += $2 } END { print x }' \
+$ swiftc -stats-output-dir /tmp/stats -fine-grained-timers ...
+$ awk '/InterfaceTypeRequest.wall/ { x += $2 } END { print x }' \
 /tmp/stats/*.json
\end{Verbatim}
-The second command-line flag is \texttt{-trace-stats-events}. It must be passed in conjunction with \texttt{-stats-output-dir}, and enables output of a trace file to the statistics directory. The trace file records a time-stamped event for the start and end of each request evaluation function, in CSV format.
-\IndexFlag{stats-output-dir}
+Another command-line flag is \IndexFlag{trace-stats-events}\texttt{-trace-stats-events}. It must be passed in conjunction with \texttt{-stats-output-dir}, and enables output of a trace file to the statistics directory. The trace file is a sequence of time-stamped events that mark the start and end of each request evaluation function, in CSV format. More details about these flags can be found in \cite{compileperf}.
+
 \section{Incremental Builds}\label{incremental builds}
 
 \IndexDefinition{incremental build}
 \IndexFlag{incremental}
-The request evaluator also records dependencies for incremental compilation, enabled by the \verb|-incremental| driver flag. The goal of incremental compilation is to prove which files do not need to be rebuilt, in the least conservative way possible. The quality of an incremental compilation implementation can be judged as follows:\footnote{Credit for this idea goes to David Ungar.}
+The request evaluator also records dependencies for incremental compilation, enabled by the \verb|-incremental| driver flag. The goal of incremental compilation is to prove which files do not need to be rebuilt, in the least conservative way possible. The quality of an incremental compilation implementation can be judged as follows (the author would like to acknowledge David Ungar for this explanation):
 \begin{enumerate}
-\item Perform a clean build of all source files in the program, and collect the object files.
+\item Perform a clean build of all \index{source file}source files in the program, and collect the \index{object file}object files.
 \item Make a change to one or more source files in the input program.
 \item Do an incremental build, which rebuilds some subset of source files in the input program. If a source file was rebuilt but the resulting object file is identical to the one saved in Step~1, the incremental build performed \emph{wasted work}.
\item Finally, do another clean build, which yet again rebuilds all source files in the input program. If a source file was rebuilt and the resulting object file is different to the one saved in Step~1, the incremental build was \emph{incorrect}. \end{enumerate} -This highlights the difficulty of the incremental compilation problem. Rebuilding \emph{too many} files is an annoyance; rebuilding \emph{too few} files is an error. A correct but ineffective implementation would rebuild all source files every time. The opposite approach of only rebuilding the subset of source files that have changed since the last compiler invocation is also too aggressive. To see why it is incorrect, consider the program shown in \ListingRef{incrlisting1}. Let's say the programmer builds the program, adds the overload \verb|f: (Int) -> ()|, then builds it again. The new overload is more specific, so the call \texttt{f(123)} in \texttt{b.swift} now refers to the new overload; therefore, \texttt{b.swift} must also be rebuilt. +This highlights the difficulty of the incremental compilation problem. Rebuilding \emph{too many} files is an annoyance; rebuilding \emph{too few} files is a correctness issue. A correct but ineffective implementation would rebuild all source files every time. On the other hand, the opposite approach of only rebuilding the subset of source files that have changed since the last compiler invocation is too aggressive. To see why it is incorrect, consider the program shown in \ListingRef{incrlisting1}. Let's say the programmer builds the program, adds the overload \verb|f: (Int) -> ()|, then builds it again. The new overload is more specific, so the call \texttt{f(123)} in \texttt{b.swift} now refers to the new overload; therefore, \texttt{b.swift} must also be rebuilt. 
\begin{listing}\captionabove{Rebuilding a file after adding a new overload}\label{incrlisting1} \begin{Verbatim} // a.swift @@ -402,13 +398,13 @@ \section{Incremental Builds}\label{incremental builds} \end{listing} \IndexDefinition{dependency file} -The approach taken by the Swift compiler is to construct a \emph{dependency graph}. The frontend outputs a \emph{dependency file} for each source file, recording all names the source file \emph{provides}, and all names the type checker \emph{requires} while compiling the source file. Dependency files use a binary format with the ``\texttt{.swiftdeps}'' file name extension. The list of provided names in the dependency file is generated by walking the \index{abstract syntax tree}abstract syntax tree, collecting all visible declarations in each source file. The list of required names is generated by the request evaluator, using the \index{stack}stack of active requests. Every cached request has a list of required names, and a \index{request}request can optionally be either a dependency sink, or dependency source: +The approach taken by the Swift compiler is to construct a \emph{dependency graph}. The frontend outputs a \emph{dependency file} for each source file, recording all names the source file \emph{provides}, and all names the type checker \emph{requires} while compiling the source file. Dependency files use a binary format with the ``\texttt{.swiftdeps}'' file name extension. The list of provided names in the dependency file is generated by walking the \index{abstract syntax tree}abstract syntax tree and collecting all visible declarations in each source file. The list of required names is generated by the request evaluator, using the \index{stack}stack of active requests. 
Every cached request has a list of required names, and a \index{request}request can optionally be either a dependency sink, or dependency source: \begin{itemize} \item A \IndexDefinition{dependency sink}\emph{dependency sink} is a name lookup request which records a required name. When a dependency sink request is evaluated, the request evaluator walks the stack of active requests, adding the identifier to each active request's list of required names. Thus, for every request, we track the name lookups that took place from the evaluation function. An important caveat is that when a request with a cached value is evaluated again, the request's cached list of required names must again be ``replayed,'' adding them to each active request that depends on the cached value. -\item A \IndexDefinition{dependency source}\emph{dependency source} is a request which appears at the top of the request stack, such as the \index{type-check source file request}\Request{type-check source file request} or the \index{AST lowering request}\Request{AST lowering request}. A dependency source scopes some amount of work to a source file. +\item A \IndexDefinition{dependency source}\emph{dependency source} is a request which appears at the top of the request stack, such as the \index{type-check primary file request}\Request{type-check primary file request} or the \index{AST lowering request}\Request{AST lowering request}. A dependency source scopes some amount of work to a source file. After the evaluation of a dependency source request completes, all required names attributed to the request are added to the source file's list of required names. \end{itemize} @@ -439,29 +435,29 @@ \section{Incremental Builds}\label{incremental builds} \end{listing} \begin{example} -To understand how request caching interacts with dependency recording, consider the program shown in \ListingRef{dependencyexample}. 
Suppose the driver decides to compile \emph{both} \texttt{a.swift} and \texttt{b.swift} in the same frontend job (in fact, the issue at hand can only appear in \index{batch mode}batch mode, when a frontend job has more than one primary file). First, the \Request{type-check source file request} runs with the source file \texttt{a.swift}.
+To understand how request caching interacts with dependency recording, consider the program shown in \ListingRef{dependencyexample}. Suppose the driver decides to compile \emph{both} \texttt{a.swift} and \texttt{b.swift} in the same frontend job (in fact, the issue at hand can only appear in \index{batch mode}batch mode, when a frontend job has more than one primary file). First, the \Request{type-check primary file request} runs with the source file \texttt{a.swift}.
 \begin{enumerate}
-\item While type checking the body of \texttt{breakfast()}, the type checker evaluates the \Request{unqualified lookup request} with the identifier ``\texttt{soup}.''
-\item This records the identifier ``\texttt{soup}'' in the requires list of each active request. There is one active request, the \Request{type-check source file request} for \texttt{a.swift}.
+\item While type checking the body of \texttt{breakfast()}, the type checker evaluates the \Request{unqualified lookup request} to resolve the identifier ``\texttt{soup}.''
+\item This records the identifier ``\texttt{soup}'' in the required names list of each active request. There is one active request, the \Request{type-check primary file request} for \texttt{a.swift}.
 \item The lookup finds the declaration of \texttt{soup()} in \texttt{c.swift}.
 \item The type checker evaluates the \Request{interface type request} with the declaration of \texttt{soup()}.
\begin{enumerate} \item The \Request{interface type request} evaluates the \Request{unqualified lookup request} with the identifier ``\texttt{Pumpkin}.'' -\item This records the identifier ``\texttt{Pumpkin}'' in the requires list of each active request, of which there are now two: the \Request{interface type request} for \texttt{soup()}, and the \Request{type-check source file request} for \texttt{a.swift}. +\item This records the identifier ``\texttt{Pumpkin}'' in the required names list of each active request, of which there are now two: the \Request{interface type request} for \texttt{soup()}, and the \Request{type-check primary file request} for \texttt{a.swift}. \end{enumerate} -\item The \Request{type-check source file request} for \texttt{a.swift} has now finished. The requires list for this request contains two identifiers, ``\texttt{soup}'' and ``\texttt{Pumpkin}''; both are added to the requires list of the source file \texttt{a.swift}. +\item The \Request{type-check primary file request} for \texttt{a.swift} completes. The required names list for this request contains two identifiers, ``\texttt{soup}'' and ``\texttt{Pumpkin}''; both are added to the required names list of the source file \texttt{a.swift}. \end{enumerate} -Next, the \Request{type-check source file request} runs with the source file \texttt{b.swift}. +Next, the \Request{type-check primary file request} runs with the source file \texttt{b.swift}. \begin{enumerate} \item While type checking the body of \texttt{lunch()}, the type checker evaluates the \Request{unqualified lookup request} with the identifier ``\texttt{soup}.'' -\item This records the identifier ``\texttt{soup}'' in the requires list of each active request. There is one active request, the \Request{type-check source file request} for \texttt{b.swift}. +\item This records the identifier ``\texttt{soup}'' in the required names list of each active request. 
There is one active request, the \Request{type-check primary file request} for \texttt{b.swift}. \item The lookup finds the declaration of \texttt{soup()} in \texttt{c.swift}. \item The type checker evaluates the \Request{interface type request} with the declaration of \texttt{soup()}. -\item This request has already been evaluated, and the cached result is returned. The requires list for this request is the single identifier ``\texttt{Pumpkin}.'' This requires list is replayed, as if the request was being evaluated for the first time. This adds the identifier ``\texttt{Pumpkin}'' to the requires list of each active request, of which there is just one: the \Request{type-check source file request} for \texttt{b.swift}. -\item The \Request{type-check source file request} for \texttt{b.swift} has now finished. The requires list for this request contains two identifiers, ``\texttt{soup}'' and ``\texttt{Pumpkin}''; both are added to the requires list of the source file \texttt{b.swift}. +\item This request has already been evaluated, and the cached result is returned. The required names list for this request is the single identifier ``\texttt{Pumpkin}.'' This required names list is replayed, as if the request was being evaluated for the first time. This adds the identifier ``\texttt{Pumpkin}'' to the required names list of each active request, of which there is just one: the \Request{type-check primary file request} for \texttt{b.swift}. +\item The \Request{type-check primary file request} for \texttt{b.swift} completes. The required names list for this request contains two identifiers, ``\texttt{soup}'' and ``\texttt{Pumpkin}''; both are added to the required names list of the source file \texttt{b.swift}. \end{enumerate} -The frontend job writes out the dependency files for \texttt{a.swift} and \texttt{b.swift} upon completion. 
Both source files require the names ``\texttt{soup}'' and ``\texttt{Pumpkin}.'' The dependency of \texttt{b.swift} on ``\texttt{Pumpkin}'' is correctly recorded because evaluating a request with a cached value replays the request's requires list in Step~(2) above.
+The frontend job writes out the dependency files for \texttt{a.swift} and \texttt{b.swift} upon completion. Both source files require the names ``\texttt{soup}'' and ``\texttt{Pumpkin}.'' The dependency of \texttt{b.swift} on ``\texttt{Pumpkin}'' is correctly recorded because evaluating a request with a cached value replays the request's required names list in Step~2 above.
 \end{example}
 
 There's a bit more to the incremental build story than this; in particular, we haven't talked about the ``interface hash'' mechanism, meant to avoid rebuilding dependent source files when changes were limited to comments, whitespace, or function bodies. We're already far afield from the goal of describing Swift generics though, so the curious reader can refer to \cite{reqeval} and \cite{incremental} for details.
@@ -469,24 +465,24 @@ \section{Incremental Builds}\label{incremental builds}
 
 \section{Module System}\label{module system}
 
 The frontend represents a module by a \IndexDefinition{module declaration}\emph{module declaration} containing one or more \IndexDefinition{file unit}\emph{file units}. The list of source files in a compiler invocation forms the \index{main module}\emph{main module}. The main module is special, because its \index{abstract syntax tree}abstract syntax tree is constructed directly by parsing source code; the file units are \IndexDefinition{source file}\emph{source files}. There are three other kinds of modules:
-\begin{itemize}
-\item \textbf{Serialized modules} containing one or more \IndexDefinition{serialized AST file unit}\emph{serialized AST file units}. When the main module imports another module written in Swift, the frontend reads a serialized module that was previously built.
+\begin{enumerate}
+\item \textbf{Serialized modules} that consist of one or more \IndexDefinition{serialized AST file unit}\emph{serialized AST file units}. When the main module imports another module written in Swift, the frontend reads a serialized module that was previously built.
 
-\item \textbf{Imported modules} consisting of one or more \IndexDefinition{Clang file unit}\emph{Clang file units}. These are the modules implemented in C, Objective-C, or C++.
+\item \textbf{Imported modules} that consist of one or more \IndexDefinition{Clang file unit}\emph{Clang file units}. These are the modules implemented in C, Objective-C, or C++.
 
-\item \textbf{The builtin module} with exactly one file unit, containing types and intrinsics implemented by the compiler itself.
-\end{itemize}
-The main module depends on other modules via the \texttt{import} keyword, which parses as an \IndexDefinition{import declaration}\emph{import declaration}. After parsing, one of the first stages in semantic analysis loads all modules imported the main module. The standard library is defined in the \texttt{Swift} module, which is imported automatically unless the frontend was invoked with the \IndexFlag{parse-stdlib}\texttt{-parse-stdlib} flag, used when building the standard library itself. As for the builtin module, it is ordinarily not visible, but the \texttt{-parse-stdlib} flag also causes it to be implicitly imported (\SecRef{misc types}).
+\item \textbf{The builtin module}, which has exactly one file unit that contains types and intrinsics implemented by the compiler itself.
+\end{enumerate}
+The main module depends on other modules via the \texttt{import} keyword, which parses as an \IndexDefinition{import declaration}\emph{import declaration}. After parsing, one of the first stages in semantic analysis loads all modules imported by the main module.
The standard library is defined in the \texttt{Swift} module, which is imported automatically unless the frontend was invoked with the \IndexFlag{parse-stdlib}\texttt{-parse-stdlib} flag, used when building the standard library itself. As for the builtin module, it is ordinarily not visible, but the \texttt{-parse-stdlib} flag also causes it to be implicitly imported (\SecRef{sec:special types}). -\paragraph{Serialized modules.} The \IndexFlag{emit-module}\texttt{-emit-module} flag instructs the compiler to generate a \index{binary module|see{serialized module}}\IndexDefinition{serialized module}serialized module. Serialized module files use the ``\texttt{.swiftmodule}'' file name extension. Serialized modules are stored in a binary format, closely tied to the specific version of the Swift compiler (when building a shared library for distribution, it is better to publish a textual interface instead, as described at the end of this section). +\paragraph{Serialized modules.} The \IndexFlag{emit-module}\texttt{-emit-module} flag instructs the compiler to generate a \index{binary module|see{serialized module}}\IndexDefinition{serialized module}serialized module. Serialized module files use the ``\texttt{.swiftmodule}'' file name extension. Serialized modules are stored in a binary format, closely tied to the specific version of the Swift compiler. (When building a shared library for distribution, it is better to publish a textual interface instead, as described at the end of this section.) -Name lookup into a serialized module lazily constructs declarations by deserializing records from this binary format as needed. Deserialized declarations generally look like parsed and fully type-checked declarations, but they sometimes contain less information. For example, in \SecRef{requirements}, we will describe various syntactic representations of requirements, such as \texttt{where} clauses. 
Since this information is only used when type checking the declaration, it is not serialized. Instead, deserialized declarations only need to store a generic signature, described in \ChapRef{genericsig}. +Name lookup into a serialized module lazily constructs declarations by deserializing records from this binary format as needed. Deserialized declarations generally look like parsed and fully type-checked declarations, but they sometimes contain less information. For example, in \SecRef{sec:requirements}, we will encounter various syntactic representations of requirements, such as \texttt{where} clauses. Since this information is only used when type checking the declaration, it is not serialized. Instead, deserialized declarations only need to store a generic signature, described in \ChapRef{chap:generic signatures}. \index{expression} \index{statement} \IndexDefinition{inlinable function} \index{serialized SIL} -Another key difference between parsed declarations and deserialized declarations is that parsed function declarations have a body, consisting of statements and expressions. This body is never serialized, so deserialized function declarations never have a body. The one case where the body of a function is made available across module boundaries is when the function is annotated with the \texttt{@inlinable} attribute; this is implemented by serializing the SIL representation of the function instead. +A parsed function declaration has a body, consisting of statements and expressions. This body is not preserved by serialization, so a deserialized function declaration does not have a body. To implement the \texttt{@inlinable} attribute, which makes a function definition available for inlining and \index{specialization}specialization across module boundaries, we serialize the SIL representation of the function. 
\IndexDefinition{imported module} \IndexDefinition{Clang importer} @@ -496,23 +492,23 @@ \section{Module System}\label{module system} \IndexDefinition{bridging header} \IndexFlag{import-objc-header} -Invoking the compiler with the \texttt{-import-objc-header} flag followed by a header file name specifies a \emph{bridging header}. This is a shortcut for making C declarations in the bridging header visible to all other source files in the main module, without having to define a separate Clang module first. This is implemented by adding a Clang file unit corresponding to the bridging header to the main module. For this reason, compiler code should not assume that all file units in the main module are necessarily source files. +Invoking the compiler with the \texttt{-import-objc-header} flag followed by a header file name specifies a \emph{bridging header}. This is a shortcut for making C declarations in the bridging header visible to all other source files in the main module, without having to define a separate Clang module first. This is implemented by adding a Clang file unit corresponding to the bridging header to the main module. For this reason, compiler code should not assume that all file units in the main module are Swift source files. -\paragraph{Textual interfaces.} \IndexFlag{emit-module-interface}The binary module format depends on compiler internals and no attempt is made to preserve compatibility across compiler releases. When building a shared library for distribution, it is better to generate a \IndexDefinition{textual interface}\emph{textual interface}:\index{horse} +\paragraph{Textual interfaces.} \IndexFlag{emit-module-interface}The binary module format depends on compiler internals and no attempt is made to preserve compatibility across compiler releases. 
When building a shared library for distribution, it is better to generate a \index{module interface|see{textual interface}}\IndexDefinition{textual interface}\emph{textual interface}:\index{horse} \begin{Verbatim} $ swiftc Horse.swift -enable-library-evolution -emit-module-interface \end{Verbatim} -Unlike the serialized module format, textual interfaces only describe the public declarations of a module. The \IndexFlag{enable-library-evolution}\texttt{-enable-library-evolution} flag enables \IndexDefinition{library evolution}\IndexDefinition{resilience}\emph{resilience}, which is a prerequisite for emitting a textual interface. Resilience instructs clients to use more abstract access patterns which are guaranteed to only depend on the public declarations of a module. For example, it allows new stored properties to be added to a public struct. Resilience is documented in \cite{libraryevolution}. +Unlike the serialized module format, textual interfaces only describe the public declarations of a module. The \IndexFlag{enable-library-evolution}\texttt{-enable-library-evolution} flag enables \IndexDefinition{library evolution}\IndexDefinition{resilience}\emph{resilience}, which is a prerequisite for emitting a textual interface. Resilience instructs clients to use more abstract access patterns which are guaranteed to only depend on the public declarations of a module. For example, it allows new stored properties to be added to a public struct. Resilience is documented in \cite{evolutionblog,libraryevolution}. -\index{inlinable function} -\index{synthesized declaration} -\index{associated type inference} +\index{inlinable function!textual interfaces} +\index{synthesized declaration!textual interfaces} +\index{associated type inference!textual interfaces} \IndexDefinition{AST printer} Textual interface files use the ``\texttt{.swiftinterface}'' file name extension. 
They are generated by the AST printer, which prints declarations in a format that looks very much like Swift source code, with a few exceptions: \begin{enumerate} \item Non-\texttt{@inlinable} function bodies are skipped. Bodies of \texttt{@inlinable} functions are printed verbatim, including comments, except that \verb|#if| conditions are evaluated. \item Various synthesized declarations, such as type alias declarations from associated type inference, witnesses for derived conformances such as \texttt{Equatable}, and so on, are written out explicitly. -\item Opaque return types also require special handling (\SecRef{reference opaque archetype}). +\item Opaque result types also require special handling (\SecRef{reference opaque archetype}). \end{enumerate} Note that (1) above means the textual interface format is target-specific; a separate textual interface needs to be generated for each target platform, alongside the shared library itself. @@ -520,7 +516,7 @@ \section{Module System}\label{module system} The \texttt{@inlinable} attribute was introduced in \IndexSwift{4.2}Swift 4.2~\cite{se0193}. The Swift \index{ABI}ABI was formally stabilized in \IndexSwift{5.0}Swift 5, when the standard library became part of the operating system on Apple platforms. Library evolution support and textual interfaces became user-visible features in \IndexSwift{5.1}Swift 5.1~\cite{se0260}. A recent paper describes a formal model for reasoning about the Swift ABI \cite{formalabi}. 
-\section{Source Code Reference}\label{compilation model source reference} +\section{Source Code Reference}\label{src:compilation model} \IndexSource{Swift frontend} \IndexSource{Swift driver} @@ -529,11 +525,11 @@ \section{Source Code Reference}\label{compilation model source reference} \begin{quote} \url{https://github.com/swiftlang/swift-driver} \end{quote} -The Swift frontend, standard library and runtime are found in the main repository: +The Swift frontend, standard library, and runtime are found in the main repository: \begin{quote} \url{https://github.com/swiftlang/swift} \end{quote} -The major components of the Swift frontend live in their own subdirectories of the main repository. The entities modeling the abstract syntax tree are defined in \SourceFile{lib/AST/} and \SourceFile{include/swift/AST/}; among these, types and declarations are important for the purposes of this book, and will be covered in \ChapRef{types} and \ChapRef{decls}. The core of the SIL intermediate language is implemented in \SourceFile{lib/SIL/} and \SourceFile{include/swift/SIL/}. +The major components of the Swift frontend live in their own subdirectories of the main repository. The entities modeling the abstract syntax tree are defined in \SourceFile{lib/AST/} and \SourceFile{include/swift/AST/}; among these, types and declarations are important for the purposes of this book, and will be covered in \ChapRef{chap:types} and \ChapRef{chap:decls}. The core of the SIL intermediate language is implemented in \SourceFile{lib/SIL/} and \SourceFile{include/swift/SIL/}. Each stage of the compilation pipeline has its own subdirectory: \begin{itemize} @@ -553,7 +549,7 @@ \subsection*{The AST Context} \IndexSource{AST context} \apiref{ASTContext}{class} -The global singleton for a single frontend instance. An AST context provides a memory allocation arena, unique allocation for various immutable data types used throughout the compiler, and storage for various other global singletons. 
+This is a singleton class representing the frontend instance. An AST context provides a memory allocation arena, unique allocation for various immutable data types used throughout the compiler, and storage for various other global singletons. \subsection*{Request Evaluator} Key source files: @@ -562,31 +558,38 @@ \subsection*{Request Evaluator} \item \SourceFile{lib/AST/Evaluator.cpp} \end{itemize} \apiref{SimpleRequest}{template class} -Each request kind is a subclass of \texttt{SimpleRequest}. The evaluation function is implemented by overriding the \texttt{evaluate()} method of \texttt{SimpleRequest}. +Each request kind is a subclass of \texttt{SimpleRequest}. Subclasses implement the \IndexSource{evaluation function}evaluation function by overriding the \texttt{evaluate()} method of \texttt{SimpleRequest}. -\IndexSource{dependency source} -\IndexSource{dependency sink} \apiref{RequestFlags}{enum class} -One of the template parameters to \texttt{SimpleRequest} is a set of flags: +One of the template parameters to \texttt{SimpleRequest} is a value of this type. A handful of flags specify the caching policy. Exactly one of these must be specified: +\begin{itemize} +\item \texttt{RequestFlags::Uncached} indicates that no caching is to be performed. +\item \texttt{RequestFlags::Cached} indicates that results should be automatically cached. +\item \texttt{RequestFlags::SeparatelyCached} indicates that the request's results should be cached by the request implementation itself. +\item \texttt{RequestFlags::SplitCached} is a hybrid strategy that combines both automatic and separate caching, as described below. +\end{itemize} +Another pair of flags define how this request interacts with the dependency tracking mechanism for incremental builds (\SecRef{incremental builds}). At most one of the below should be specified: \begin{itemize} -\item \texttt{RequestFlags::Uncached}: indicates that the result of the evaluation function should not be cached. 
-\item \texttt{RequestFlags::Cached}: indicates that the result of the evaluation function should be cached by the request evaluator, which uses a per-request kind \texttt{DenseMap} for this purpose.
-\item \texttt{RequestFlags::SeparatelyCached}: the result of the evaluation function should be cached by the request implementation itself, as described below.
-\item \texttt{RequestFlags::DependencySource}, \texttt{DependencySink}: if one of these is set, the request kind becomes a dependency source or sink, as described in \SecRef{incremental builds}.
+\item \texttt{RequestFlags::DependencySource} marks this request as a \IndexSource{dependency source}dependency source. This is set for a top-level request that scopes work to an entire source file.
+\item \texttt{RequestFlags::DependencySink} marks this request as a \IndexSource{dependency sink}dependency sink. This is set if the request directly performs a name lookup.
 \end{itemize}
 
-Separate caching can be more performant if it allows the cached value to be stored directly inside of an AST node, instead of requiring the request evaluator to consult a side table. For example, many requests taking a declaration as input store the result directly inside of the \texttt{Decl} instance or some subclass thereof.
 
 Due to expressivity limitations in C++, a bit of boilerplate is involved in the definition of a new request kind. For example, consider the \texttt{InterfaceTypeRequest}, which takes a \texttt{ValueDecl} as input and returns a \texttt{Type} as output:
 \begin{itemize}
 \item \begingroup \raggedright The request type ID is declared in \SourceFile{include/swift/AST/TypeCheckerTypeIDZone.def}.
 \item The \texttt{InterfaceTypeRequest} class is declared in \SourceFile{include/swift/AST/TypeCheckRequests.h}.
 \item The \texttt{InterfaceTypeRequest::evaluate()} method is defined in \SourceFile{lib/Sema/TypeCheckDecl.cpp}.
-\item \endgroup The request is separately cached.
The \texttt{InterfaceTypeRequest} class overrides the \texttt{isCached()}, \texttt{getCachedResult()} and \texttt{cacheResult()} methods to store the declaration's interface type inside the \texttt{ValueDecl} instance itself. These methods are implemented in \SourceFile{lib/AST/TypeCheckRequestFunctions.cpp}. +\item \endgroup The request is separately cached, so the \texttt{InterfaceTypeRequest} class also overrides the \texttt{isCached()}, \texttt{getCachedResult()}, and \texttt{cacheResult()} methods, described below. \end{itemize} +These methods are implemented in \SourceFile{lib/AST/TypeCheckRequestFunctions.cpp}. \IndexSource{request evaluator} \apiref{Evaluator}{class} -Request evaluation is performed by calling the \texttt{evaluateOrDefault()} top-level function, passing it an instance of the request evaluator, the request to evaluate, and a sentinel value to return in case of circularity. The \texttt{Evaluator} class is a singleton, stored in the \texttt{evaluator} instance variable of the global \texttt{ASTContext} singleton. The request evaluator will either return a cached value, or invoke the evaluation function and cache the result. For example, the \texttt{getInterfaceType()} method of \texttt{ValueDecl} is implemented as follows: +The \texttt{Evaluator} class is a singleton, stored in the \texttt{evaluator} instance variable of the global \texttt{ASTContext} singleton. + +Requests are evaluated by calling the \texttt{evaluateOrDefault()} top-level function. This function receives the request evaluator singleton, the request to evaluate, and a sentinel value to return in case of circularity. The request evaluator will either return a cached value, or invoke the evaluation function and cache the result. 
+
+For example, the implementation of the \texttt{ValueDecl::getInterfaceType()} method evaluates the \texttt{InterfaceTypeRequest} as follows:
\begin{Verbatim}
Type ValueDecl::getInterfaceType() const {
  auto &ctx = getASTContext();
@@ -597,6 +600,28 @@ \subsection*{Request Evaluator}
}
\end{Verbatim}

+\subsubsection*{Request Caching}
+We will now discuss the various forms of caching, which are selected by specifying \texttt{RequestFlags}.
+
+An \texttt{Uncached} request is appropriate when the evaluation function wraps some other bit of code that performs its own caching, or when external conditions guarantee that this request will only be evaluated once for each possible input.
+
+A \texttt{Cached} request will store its results in a per-request kind \texttt{DenseMap} whose keys are the inputs given to the evaluation function. This requires almost no additional work on the part of the request, but it has the disadvantage that the overhead of the \texttt{DenseMap} becomes significant when the number of distinct keys is too great. The request implementation only needs to declare one additional method in this case, named \texttt{isCached()}. This allows opting out of caching for certain inputs only; always returning \texttt{true} is the most common implementation.
+
+A \texttt{SeparatelyCached} request must declare \texttt{isCached()} as above, together with two additional methods, \texttt{getCachedResult()} and \texttt{cacheResult()}. Separate caching takes additional work to implement, but it avoids the cost of the \texttt{DenseMap} when the cached value can be stored directly inside the input value.
+
+For example, in a whole-module build, the \texttt{InterfaceTypeRequest} will be evaluated against almost every single \texttt{ValueDecl}. For this reason, the \texttt{InterfaceTypeRequest} uses separate caching, storing the interface type directly in an instance variable of the \texttt{ValueDecl} itself.
+ +A \texttt{SplitCached} request is like \texttt{SeparatelyCached} in that the request must declare \texttt{isCached()}, \texttt{getCachedResult()}, and \texttt{cacheResult()} methods, which are called by the evaluator. The evaluator also allocates a \texttt{DenseMap}, just like the \texttt{Cached} case, but the evaluator does not store anything there itself. The request's \texttt{getCachedResult()} and \texttt{cacheResult()} methods either store the result somewhere of their own choosing, or they pass it down to the evaluator's cache, using a pair of methods on the \texttt{Evaluator} singleton: +\begin{itemize} +\item \texttt{getCachedNonEmptyOutput()} takes a request, and returns a \texttt{std::optional} with the cached value, or \texttt{std::nullopt} if there isn't one. +\item \texttt{cacheNonEmptyOutput()} takes a request and the result, and updates the cache. +\end{itemize} +Split caching works best when the request is evaluated against a large number of inputs, but when the result is almost always some empty placeholder value. The idea is that the empty value is stored directly, perhaps with a single bit, whereas all other results are cached by the request evaluator. + +For example, we use split caching for the \texttt{OpaqueResultTypeRequest} of \ChapRef{chap:opaque result types}. This request is evaluated against almost every \texttt{ValueDecl}, but most value declarations don't have an \index{opaque result type}opaque result type, so the result is almost always \texttt{nullptr}. We don't want to build a \texttt{DenseMap} whose keys range over all value declarations and where most values are \texttt{nullptr}. We also don't want to use separate caching, since that would require adding a new instance variable to \texttt{ValueDecl} whose value is almost always \texttt{nullptr}. 
Split caching allows us to reserve a single bit in the \texttt{ValueDecl} to indicate the presence of an opaque result type, while the actual declaration of the opaque result type is then stored in the request evaluator's cache, but only for those value declarations that actually have one. + +If the \IndexFlag{analyze-request-evaluator}\texttt{-analyze-request-evaluator} \index{frontend flag}frontend flag is specified, the frontend job prints statistics about the request evaluator cache upon completion. This information is useful when choosing the appropriate caching strategy for a new request. + \subsection*{Name Lookup} \IndexSource{name lookup} @@ -608,7 +633,7 @@ \subsection*{Name Lookup} \item \SourceFile{lib/AST/NameLookup.cpp} \item \SourceFile{lib/AST/UnqualifiedLookup.cpp} \end{itemize} -The ``AST scope'' subsystem implements unqualified lookup for local bindings. Outside of the name lookup implementation itself, the rest of the compiler does not generally interact with it directly: +The ``ASTScope'' subsystem implements unqualified lookup for local bindings. This code is internal to the name lookup implementation; the rest of the compiler does not generally interact with ASTScope directly: \begin{itemize} \item \SourceFile{include/swift/AST/ASTScope.h} \item \SourceFile{lib/AST/ASTScope.cpp} @@ -628,7 +653,7 @@ \subsection*{Name Lookup} \begin{itemize} \item The name to look up. \item The declaration context where the lookup starts. -\item The source location where the name was written in source. If not specified, this becomes a top-level lookup. +\item The \IndexSource{source location}source location where the name was written in source. If not specified, this becomes a top-level lookup. \item Various flags, described below. 
\end{itemize}

@@ -642,9 +667,9 @@ \subsection*{Name Lookup}
\end{itemize}

\apiref{DeclContext}{class}
-Declaration contexts will be \IndexSource{declaration context}introduced in \ChapRef{decls}, and the \texttt{DeclContext} class in \SecRef{declarationssourceref}.
+Declaration contexts will be \IndexSource{declaration context}introduced in \ChapRef{chap:decls}, and the \texttt{DeclContext} class in \SecRef{src:declarations}.
\begin{itemize}
-\item \texttt{lookupQualified()} has various overloads, which perform a \IndexSource{qualified lookup}qualified name lookup into one of various combinations of types or declarations. The ``\texttt{this}'' parameter---the \texttt{DeclContext~*} which the method is called on determines the visibility of declarations found via lookup through imports and access control; it is not the base type of the lookup.
+\item The various overloads of \texttt{lookupQualified()} perform a \IndexSource{qualified lookup}qualified name lookup into the given base type. The ``\texttt{this}'' parameter---the \texttt{DeclContext~*} on which the method is called---determines the visibility of declarations found via lookup through imports and access control; \texttt{this} is \emph{not} the base type of the lookup.
\end{itemize}

\apiref{NLOptions}{enum}
@@ -657,7 +682,7 @@ \subsection*{Name Lookup}
\IndexSource{direct lookup}
\apiref{NominalTypeDecl}{class}
-Nominal type declarations will be introduced in \ChapRef{decls}, and the \texttt{NominalTypeDecl} class in \SecRef{declarationssourceref}. The implementation of direct lookup and lazy member loading is discussed in \SecRef{extensionssourceref}.
+Nominal type declarations will be introduced in \ChapRef{chap:decls}, and the \texttt{NominalTypeDecl} class in \SecRef{src:declarations}. The implementation of direct lookup and lazy member loading is discussed in \SecRef{src:extensions}.
\begin{itemize} \item \texttt{lookupDirect()} performs a direct lookup, which only searches the nominal type declaration itself and its extensions, ignoring access control. \end{itemize} @@ -673,12 +698,12 @@ \subsection*{Name Lookup} \subsection*{Primary File Type Checking} \IndexSource{primary file} -\index{type-check source file request} +\index{type-check primary file request} Key source files: \begin{itemize} \item \SourceFile{lib/Sema/TypeCheckDeclPrimary.cpp} \end{itemize} -The \texttt{TypeCheckSourceFileRequest} calls the \texttt{typeCheckDecl()} global function, which uses the visitor pattern to switch on the declaration kind. For each declaration kind, it performs various semantic checks and kicks off requests which may emit diagnostics. +The \texttt{TypeCheckPrimaryFileRequest} calls the \texttt{typeCheckDecl()} global function, which uses the visitor pattern to switch on the declaration kind. For each declaration kind, it performs various semantic checks and kicks off requests which may emit diagnostics. \subsection*{Module System} \IndexSource{module declaration} @@ -689,7 +714,6 @@ \subsection*{Module System} \item \texttt{getFiles()} returns an array of \texttt{FileUnit}. \item \texttt{isMainModule()} answers if this is the main module. \end{itemize} -See \SecRef{conformancesourceref} and \SecRef{extensionssourceref} for the global conformance lookup operations defined on \texttt{ModuleDecl}. \apiref{FileUnit}{class} Abstract base class representing a file unit. \IndexSource{primary file} @@ -705,17 +729,15 @@ \subsection*{Module System} \item \texttt{getScope()} returns the root of the scope tree for unqualified lookup. 
\end{itemize}
-\IndexSource{imported module}
-\IndexSource{serialized module}
-\IndexSource{textual interface}
-\IndexSource{AST printer}
-\IndexSource{Clang importer}
-Imported and serialized modules get a subdirectory each:
+\subsubsection*{Imported and serialized modules}
+Support for \IndexSource{Clang importer}\IndexSource{imported module}imported and \IndexSource{serialized module}serialized modules can be found in a pair of subdirectories:
\begin{itemize}
\item \SourceFile{lib/ClangImporter/}
\item \SourceFile{lib/Serialization/}
\end{itemize}
-The AST printer for generating textual interfaces is implemented in a pair of files:
+
+\subsubsection*{AST printer}
+The \IndexSource{AST printer}AST printer generates \IndexSource{textual interface}textual interface files:
\begin{itemize}
\item \SourceFile{include/swift/AST/ASTPrinter.h}
\item \SourceFile{lib/AST/ASTPrinter.cpp}
diff --git a/docs/Generics/chapters/completion.tex b/docs/Generics/chapters/completion.tex
index f0a77ef02e3f4..b3084a486cbee 100644
--- a/docs/Generics/chapters/completion.tex
+++ b/docs/Generics/chapters/completion.tex
@@ -2,7 +2,7 @@
\begin{document}

-\chapter{Completion}\label{completion}
+\chapter{Completion}\label{chap:completion}

\IndexDefinition{Knuth-Bendix algorithm}%
\index{completion!z@\igobble|seealso{Knuth-Bendix algorithm}}
@@ -19,20 +19,20 @@ \chapter{Completion}\label{completion}
\end{itemize}
We begin with Algorithms \ref{critical pair algo}~and~\ref{add rule derived algo}, proceeding from the inside out. The twin concepts of overlapping rule and critical pair are fundamental to the algorithm, and they provide the theoretical justification for the rest.
-\paragraph{Local confluence.} We would like our \index{reduction relation}reduction relation $\rightarrow$ to satisfy the \index{Church-Rosser property}Church-Rosser property: if $x\sim y$ are two equivalent terms, then $x\rightarrow z$ and $y\rightarrow z$ for some term $z$.
By \ThmRef{church rosser theorem}, this is equivalent to $\rightarrow$ being \index{confluence}confluent, meaning any two \index{positive rewrite path}positive rewrite paths diverging from a common source can be extended to meet each other. This is difficult to verify directly, but a 1941 paper by Max~Newman~\cite{newman} shows there is a simpler equivalent condition when the reduction relation is \index{terminating reduction relation}terminating. +\paragraph{Local confluence.} We would like our \index{reduction relation}reduction relation $\rightarrow$ to satisfy the \index{Church-Rosser property}Church-Rosser property: if $x\sim y$ are two \index{term equivalence relation}equivalent terms, then $x\rightarrow z$ and $y\rightarrow z$ for some term $z$. By \ThmRef{church rosser theorem}, this is equivalent to $\rightarrow$ being \index{confluence}confluent, meaning any two \index{positive rewrite path}positive rewrite paths diverging from a common source can be extended to meet each other. This is difficult to verify directly, but a 1941 paper by Max~Newman~\cite{newman} shows there is a simpler equivalent condition when the reduction relation is \index{terminating reduction relation}terminating. \begin{definition} -A reduction relation $\rightarrow$ is \IndexDefinition{local confluence}\emph{locally confluent}, if whenever $s_1$ and $s_2$ are two positive rewrite steps with $\Src(s_1)=\Src(s_2)$, there exists a term $z$ such that $\Dst(s_1)\rightarrow z$ and $\Dst(s_2)\rightarrow z$. +A reduction relation $\rightarrow$ is \IndexDefinition{local confluence}\emph{locally confluent}, if whenever $s_1$ and $s_2$ are two positive \index{rewrite step!local confluence}rewrite steps with $\Src(s_1)=\Src(s_2)$, there exists a term $z$ such that $\Dst(s_1)\rightarrow z$ and $\Dst(s_2)\rightarrow z$. \end{definition} To test for local confluence, we ``diverge'' from a term by only one step in two different directions, and then check if both sides reduce to some common term. 
We will see this can be decided algorithmically, and also that we can ``repair'' any local \index{confluence violation}confluence violations we do find. Thus, \index{Newman's lemma}Newman's result is fundamental: \begin{theorem}[Newman's Lemma] If a reduction relation $\rightarrow$ is terminating and locally confluent, then $\rightarrow$ is confluent. \end{theorem} -\paragraph{Overlapping rules.} A pair of positive rewrite steps with a common source define a \IndexDefinition{critical pair}\emph{critical pair}. A critical pair shows that some term can be reduced in ``two different ways.'' We can answer if our rewrite rules define a locally confluent reduction relation by inspecting each critical pair. With any non-trivial list of rewrite rules, there are infinitely many such critical pairs, however, all but a finite subset can be disregarded. Suppose we have these two rules over some alphabet $A$: +\paragraph{Overlapping rules.} A pair of positive \index{rewrite step!critical pair}rewrite steps with a common source define a \IndexDefinition{critical pair}\emph{critical pair}. A critical pair shows that some term can be reduced in ``two different ways.'' We can answer if our rewrite rules define a locally confluent reduction relation by inspecting each critical pair. With any non-trivial list of rewrite rules, there are infinitely many such critical pairs, however, all but a finite subset can be disregarded. Suppose we have these two rules over some alphabet $A$: \begin{gather*} u_1\Rightarrow v_1\\ u_2\Rightarrow v_2 \end{gather*} -For any term $x\in A^*$, we can form the ``sandwich'' term $t := u_1xu_2$. Every such choice of $x$ defines a new critical pair; the occurrences of $u_1$ and $u_2$ within $t$ can be rewritten in two ways, by $s_1 := (u_1\Rightarrow v_1)xu_2$ and $s_2 := u_1x(u_2\Rightarrow v_2)$. However, since $s_1$ and $s_2$ rewrite disjoint subterms of $t$, we say this critical pair is \IndexDefinition{orthogonal rewrite step}\emph{orthogonal}. 
Orthogonal critical pairs are not interesting because they cannot witness a local confluence violation. To see why, notice that regardless of whether we apply $s_1$ or $s_2$ first, there exists a complementary rewrite step $s_1^\prime$ or $s_2^\prime$ to rewrite $\Dst(s_1)$ or $\Dst(s_2)$ into the ``reduced sandwich'' $v_1xv_2$. In fact, we get a \index{commutative diagram}commutative diagram like this for any orthogonal critical pair: +For any term $x\in A^*$, we can form the ``sandwich'' term $t := u_1xu_2$. Every such choice of $x$ defines a new critical pair; the occurrences of $u_1$ and $u_2$ within $t$ can be rewritten in two ways, by $s_1 := (u_1\Rightarrow v_1)xu_2$ and $s_2 := u_1x(u_2\Rightarrow v_2)$. However, since $s_1$ and $s_2$ rewrite disjoint \index{subterm}subterms of $t$, we say this critical pair is \IndexDefinition{orthogonal rewrite step}\index{rewrite step!orthogonal}\emph{orthogonal}. Orthogonal critical pairs are not interesting because they cannot witness a local confluence violation. To see why, notice that regardless of whether we apply $s_1$ or $s_2$ first, there exists a complementary rewrite step $s_1^\prime$ or $s_2^\prime$ to rewrite $\Dst(s_1)$ or $\Dst(s_2)$ into the ``reduced sandwich'' $v_1xv_2$. In fact, we get a \index{commutative diagram}commutative diagram like this for any orthogonal critical pair: \begin{center} \begin{tikzcd} &u_1xu_2\arrow[ld, Rightarrow, "s_1:=(u_1\Rightarrow v_1)xu_2"', bend right]\arrow[rd, Rightarrow, "s_2:=u_1x(u_2\Rightarrow v_2)", bend left]\\ @@ -93,7 +93,7 @@ \chapter{Completion}\label{completion} \] \end{ceqn} -Clearly, in our quest to uncover local \index{confluence violation}confluence violations, we only need to inspect critical pairs that are \emph{not} orthogonal; that is, they must rewrite \emph{overlapping} subterms of their common source term. There are only finitely many such critical pairs, and they are all generated by inspecting the left-hand sides of our rewrite rules. 
We can completely characterize them with the below definition. +Clearly, in our quest to uncover local \index{confluence violation}confluence violations, we only need to inspect critical pairs that are \emph{not} orthogonal; that is, they must rewrite \emph{overlapping} \index{subterm}subterms of their common source term. There are only finitely many such critical pairs, and they are all generated by inspecting the left-hand sides of our rewrite rules. We can completely characterize them with the below definition. \begin{definition}\label{overlappingrules} Two rules $(u_1, v_1)$ and $(u_2, v_2)$ \IndexDefinition{overlapping rules}\emph{overlap} if one of the following holds: \begin{enumerate} @@ -162,7 +162,7 @@ \chapter{Completion}\label{completion} \end{example} \paragraph{Resolving critical pairs.} -A critical pair exhibits some term $t$ being rewritten in two distinct ways. If we take the destination term of each of the two rewrite steps, we get a pair of terms that are known to be equivalent to $t$, and each other. For an overlap of the first kind, the two terms are $(v_1,\,xv_2z)$; for the second kind, $(v_1z,\,xv_2)$: +A critical pair exhibits some term $t$ being rewritten in two distinct ways. If we take the destination term of each of the two \index{rewrite step!critical pair}rewrite steps, we get a pair of terms that are known to be equivalent to $t$, and each other. For an overlap of the first kind, the two terms are $(v_1,\,xv_2z)$; for the second kind, $(v_1z,\,xv_2)$: \begin{center} \begin{tabular}{cc} Overlap of the first kind& @@ -282,7 +282,7 @@ \chapter{Completion}\label{completion} Rewrite loops are not just a theoretical tool; our implementation of the Knuth-Bendix algorithm follows \cite{loggedrewriting} and \cite{homotopicalcompletion} in encoding and recording the rewrite loops that describe resolved critical pairs. This enables the computation of minimal requirements in \SecRef{homotopy reduction}. 
Only local rules are subject to \index{requirement minimization}minimization, so we only record rewrite loops involving local rules. If a requirement machine instance is only to be used for generic signature queries and not minimization, rewrite loops are not recorded at all. \paragraph{An optimization.} -Now that we know how to process a single overlap and resolve a critical pair, the next chunk of code concerns enumerating all candidate overlaps. If our rewrite rules were truly arbitrary, we would need to consider all possible combinations: for every rewrite rule $u_1\Rightarrow v_1$, for every rewrite rule $u_2\Rightarrow v_2$, and for every position $i<|u_1|$, we would need to check if the corresponding subterms of $u_1$ and $u_2$ are identical. However, we can do better. The bottom-up construction of a requirement machine from protocol components, and the partition of rewrite rules into \index{imported rule}imported rules and \index{local rule}local rules, enables an optimization where overlaps between certain pairs of rules need not be considered at all: +Now that we know how to process a single overlap and resolve a critical pair, the next chunk of code concerns enumerating all candidate overlaps. If our rewrite rules were truly arbitrary, we would need to consider all possible combinations: for every rewrite rule $u_1\Rightarrow v_1$, for every rewrite rule $u_2\Rightarrow v_2$, and for every position $i<|u_1|$, we would need to check if the corresponding \index{subterm}subterms of $u_1$ and $u_2$ are identical. However, we can do better. The bottom-up construction of a requirement machine from protocol components, and the partition of rewrite rules into \index{imported rule}imported rules and \index{local rule}local rules, enables an optimization where overlaps between certain pairs of rules need not be considered at all: \begin{itemize} \item We don't need to look for overlaps between imported rules. 
While an imported rule can overlap with another imported rule, all such critical pairs are trivial and do not need to be resolved again. \item We don't need to look for overlaps between an imported rule and a local rule. An imported rule cannot overlap with a local rule. @@ -393,7 +393,7 @@ \chapter{Completion}\label{completion} \begin{enumerate} \item Clear the flag. \item Build a list of critical pairs with \AlgRef{find overlapping rule algo}. -\item Left-simpify all rewrite rules with \AlgRef{left simplification}. +\item Left-simplify all rules with \AlgRef{left simplification}. \item Resolve each critical pair with \AlgRef{add rule derived algo}, and set the flag if any new rewrite rules were added. \item Right-simplify all rewrite rules with \AlgRef{right simplification}. \item Substitution-simplify all rewrite rules with \AlgRef{subst simplification algo}. @@ -484,8 +484,8 @@ \section{Rule Simplification}\label{rule reduction} &t^\prime& \end{tikzcd} \end{center} -The \IndexDefinition{left simplification}left simplification algorithm considers the left-hand side of each rule, and marks the rule if it finds a subterm matching some other rule. -\begin{algorithm}[Left-simplify rewrite rules]\label{left simplification} +The \IndexDefinition{left simplification}left simplification algorithm considers the left-hand side of each rule, and marks the rule if it finds a \index{subterm}subterm matching some other rule. +\begin{algorithm}[Left-simplify rules]\label{left simplification} Takes the list of local rules as input. Has side effects. \begin{enumerate} \item (Initialize) Let $n$ be the total number of local rules, and set $i:=0$. @@ -517,7 +517,7 @@ \section{Rule Simplification}\label{rule reduction} \end{tikzcd} \end{center} -\begin{algorithm}[Right-simplify rewrite rules]\label{right simplification} +\begin{algorithm}[Right-simplify rules]\label{right simplification} Takes the list of local rules as input. Has side effects. 
\begin{enumerate} \item (Initialize) Let $n$ be the total number of local rules, and set $i:=0$. @@ -672,7 +672,7 @@ \section{Associated Types}\label{critical pairs} p^\prime := (\ProtoConfInv{\rT}{P})\cdot\nA\cdot\nB \circ \rT\cdot(\AssocIntro{P}{A})\cdot\nB\\ \circ \rT\cdot(\ProtoConfInv{\aPA}{Q})\cdot\nB \circ \rT\cdot\aPA\cdot(\AssocIntro{Q}{B}) \end{multline*} -Unlike $p$, $p^\prime$ is \emph{not} a positive rewrite path, because the first and third steps are negative; each one applies a conformance rule backwards. We can visualize $p^\prime$ as a path in the rewrite graph, with the \index{negative rewrite step}negative rewrite steps going up: +Unlike $p$, $p^\prime$ is \emph{not} a positive rewrite path, because the first and third steps are negative; each one applies a conformance rule backwards. We can visualize $p^\prime$ as a path in the rewrite graph, with the \index{negative rewrite step}negative \index{rewrite step}rewrite steps going up: \begin{center} \begin{tikzcd}[column sep=-5pt] &\rT\cdot\pP\cdot\nA\cdot\nB\arrow[rd, Rightarrow, bend left] @@ -1140,7 +1140,7 @@ \section{More Critical Pairs}\label{more critical pairs} &\protosym{S}\cdot\nC\cdot\nC\Rightarrow\protosym{S}\tag{7}&\\ \bottomrule \end{flalign*} -Completion also adds \emph{ten} new rules; we will reveal a few at a time. Let's consider the first rule, $\ProtoInherit{S}{S}$. Every protocol has an identity conformance rule, but in the previous examples it didn't play an important role so we ignored it. The identity conformance rule always overlaps with itself: +Completion also adds \emph{ten} new rules; we will study a few at a time. Let's consider the first rule, $\ProtoInherit{S}{S}$. Every protocol has an identity conformance rule, but in the previous examples it didn't play an important role so we ignored it. 
The identity conformance rule always overlaps with itself: \begin{align*} \protosym{S}\cdot{}&\protosym{S}\\ &\protosym{S}\cdot\protosym{S} @@ -1263,7 +1263,7 @@ \section{Tietze Transformations}\label{tietze transformations} \begin{enumerate} \item (Adding a rewrite rule) If a pair of terms $u$, $v\in A^*$ are already joined by a rewrite path from $u$ to $v$---that is, if $u\sim v$ as elements of $\AR$---we can add $(u,v)$: \[\Pres{A}{R\cup\{(u,v)\}}\] -\item (Removing a rewrite rule) If $(u,v)\in R$ and we have a rewrite path from $u$ to $v$ that does not contain the rewrite step $x(u\Rightarrow v)y$ or $x(v\Rightarrow u)y$ for any $x$, $y\in A^*$, we can remove $(u,v)$: +\item (Removing a rewrite rule) If $(u,v)\in R$ and we have a rewrite path from $u$ to $v$ that does not contain the \index{rewrite step!Tietze transformation}rewrite step $x(u\Rightarrow v)y$ or $x(v\Rightarrow u)y$ for any $x$, $y\in A^*$, we can remove $(u,v)$: \[\Pres{A}{R\setminus\{(u,v)\}}\] \item (Adding a generator) If $a$ is some symbol distinct from all other symbols of $A$, and $t\in A^*$ is any term, we can simultaneously add $a$ and make it equivalent to $t$: \[\Pres{A\cup\{a\}}{R\cup\{(t,a)\}}\] @@ -1276,7 +1276,7 @@ \section{Tietze Transformations}\label{tietze transformations} \begin{itemize} \item For every Tietze transformation, there is a complementary transformation which undoes the change; in this way (1) and (2) are inverses, and similarly (3) and (4). -\item We already know that changing the order of the terms in a rewrit erule---replacing $(u,v)\in R$ with $(v,u)$---does not change the set of rewrite steps generated, and thus presents the same monoid. Now we see this operation is actually a composition of two elementary Tietze transformations; we first add $(v,u)$, and remove $(u,v)$. 
+\item We already know that changing the order of the terms in a rewrite rule---replacing $(u,v)\in R$ with $(v,u)$---does not change the set of rewrite steps generated, and thus presents the same monoid. Now we see this operation is actually a composition of two elementary Tietze transformations; we first add $(v,u)$, and remove $(u,v)$. \item Some definitions of (4) drop the condition that the removed symbol $a$ not appear in any other rewrite rule $(u,v)\in R$. Our restriction does not cost us any generality, because if $a$ occurs in any other rewrite rule, we can always first perform a series of Tietze transformations to eliminate such occurrences: for each such $(u,v)$, we replace occurrences of $a$ with $t$ in $u$ and $v$, add the new rewrite rule, and finally remove the old rule $(u,v)$. However, we must still require that $a$ not occur in $t$; otherwise, replacing $a$~with~$t$ does not eliminate all occurrences of~$a$. \end{itemize} @@ -1286,7 +1286,7 @@ \section{Tietze Transformations}\label{tietze transformations} Thus, we can understand the associated type rules as having been added by a sequence of Tietze transformations of the third kind, applied to some monoid presentation. This ``primordial'' monoid presentation involves only protocol and name symbols, and it encodes the same monoid that ours does, up to isomorphism. So why do we bother with associated type symbols at all? In the next section, we will see the answer has to do with the interaction between recursive conformance requirements and completion. \paragraph{Further discussion.} -The elementary Tietze transformations are ``higher dimensional'' rewrite steps, in that they define an edge relation on a graph whose vertices are \emph{monoid presentations}. A path in this graph witnesses the fact that the source and destination define an isomorphic pair of monoids. 
The problem of deciding if two presentations are joined by a path is the \index{monoid isomorphism problem}\emph{monoid isomorphism problem}. Like the \index{word problem}word problem, this is of course \index{undecidable problem}undecidable in general. +The elementary Tietze transformations are ``higher dimensional'' rewrite steps, in that they define an edge relation on a graph whose vertices are \emph{monoid presentations}. A path in this graph witnesses the fact that the source and destination define an isomorphic pair of monoids. The problem of deciding if two presentations are joined by a path is the \index{monoid isomorphism problem}\emph{monoid isomorphism problem}. Like the \index{word problem}word problem, this is of course \index{undecidable problem!monoid isomorphism}undecidable in general. Tietze transformations are fundamental to the study of \emph{combinatorial group theory}, and are described in any book on the subject, such as \cite{combinatorialgroup}. Recall that in a group presentation, a rewrite rule $(u, v)$ can always be written as $(uv^{-1},\varepsilon)$, with the identity element on the right hand side. The term $uv^{-1}$ is called a \emph{relator}; a set of relators takes the place of rewrite rules in a group presentation. Tietze transformations of monoid presentations are described in \cite{book2012string} and \cite{henry2021tietze}. @@ -1756,7 +1756,7 @@ \section{Recursive Conformances}\label{recursive conformances redux} \end{example} \begin{example}\label{double encoding} -We end this chapter with one final curiosity. We proved the derived requirements formalism to be undecidable in \SecRef{monoidsasprotocols} by showing that an arbitrary finitely-presented monoid $\AR$ can be encoded in the form of a protocol declaration. In \ChapRef{symbols terms rules} we defined a lowering of a generic signature and its protocol dependencies into a finitely-presented monoid. 
If we chain both transformations, we see that we can map a finitely-presented monoid to a protocol declaration and then back to a finitely-presented monoid. In what sense does the latter monoid encode the original monoid? +We end this chapter with one final curiosity. We proved the derived requirements formalism to be undecidable in \SecRef{monoidsasprotocols} by showing that an arbitrary finitely-presented monoid $\AR$ can be encoded in the form of a protocol declaration. In \ChapRef{chap:symbols terms rules} we defined a lowering of a generic signature and its protocol dependencies into a finitely-presented monoid. If we chain both transformations, we see that we can map a finitely-presented monoid to a protocol declaration and then back to a finitely-presented monoid. In what sense does the latter monoid encode the original monoid? Let $A^*:=\{a,b,c\}$, $R:=\{(ab,c),\,(bc,\varepsilon)\}$, and consider the monoid $M:=\AR$. Written down as a Swift protocol, $M := \Pres{a,b,c}{ab\sim c,\,bc\sim\varepsilon}$ looks like this: \begin{Verbatim} @@ -1835,7 +1835,7 @@ \section{Recursive Conformances}\label{recursive conformances redux} Our double encoding of ``a monoid as a protocol as a rewrite system'' introduces many new symbols and rewrite rules not found in the original presentation. It is a remarkable fact of our construction then, that a convergent monoid presentation always maps to a convergent rewriting system where the added detritus can always be ``factored out.'' \end{example} -\section{Source Code Reference}\label{completion sourceref} +\section{Source Code Reference}\label{src:completion} Key source files: \begin{itemize} @@ -1845,18 +1845,18 @@ \section{Source Code Reference}\label{completion sourceref} \end{itemize} \apiref{rewriting::RewriteSystem}{class} -See also \SecRef{symbols terms rules sourceref}. +See also \SecRef{src:symbols terms rules}. \begin{itemize} \item \texttt{addRule()} implements \AlgRef{add rule derived algo}. 
-\item \texttt{recordRewriteLoop()} records a \index{rewrite loop}rewrite loop if this rewrite system is used for minimization. +\item \texttt{recordRewriteLoop()} records a \index{rewrite loop}rewrite loop if this rewriting system is used for minimization. \item \IndexSource{critical pair}\texttt{computeCriticalPair()} implements \AlgRef{critical pair algo}. \item \IndexSource{Knuth-Bendix algorithm}\texttt{performKnuthBendix()} implements \AlgRef{knuthbendix}. -\item \texttt{simplifyLeftHandSides()} \IndexSource{left simplification}\IndexSource{reduced rewrite system}\IndexSource{left-reduced rewrite system}implements \AlgRef{left simplification}. -\item \texttt{simplifyRightHandSides()} \IndexSource{right simplification}\IndexSource{right-reduced rewrite system}implements \AlgRef{right simplification}. +\item \texttt{simplifyLeftHandSides()} \IndexSource{left simplification}\IndexSource{reduced rewriting system}\IndexSource{left-reduced rewriting system}implements \AlgRef{left simplification}. +\item \texttt{simplifyRightHandSides()} \IndexSource{right simplification}\IndexSource{right-reduced rewriting system}implements \AlgRef{right simplification}. \end{itemize} \apiref{rewriting::Trie}{template class} -See also \SecRef{symbols terms rules sourceref}. +See also \SecRef{src:symbols terms rules}. \begin{itemize} \item \texttt{findAll()} finds all \index{rule trie}overlapping rules using \AlgRef{find overlapping rule algo}. 
\end{itemize} diff --git a/docs/Generics/chapters/concrete-conformances.tex b/docs/Generics/chapters/concrete-conformances.tex deleted file mode 100644 index ab0fbb1743ae4..0000000000000 --- a/docs/Generics/chapters/concrete-conformances.tex +++ /dev/null @@ -1,70 +0,0 @@ -\documentclass[../generics]{subfiles} - -\begin{document} - -\chapter[]{Concrete Conformances}\label{concrete conformances} - -\ifWIP -TODO: -\begin{itemize} -\item Concrete conformance rule, property-like -\item Virtual rule that introduces it -\item Idea: it should eliminate the conformance rule but not the concrete type rule -\item Doesn't actually appear in signature so should not impact minimization -\item Conditional requirement inference, only in generic signatures and not protocols, because we can't merge connected components during completion. for a generic signature this actually has to import new components in the general case -\end{itemize} -\fi - -\section[]{Type Witnesses}\label{rqm type witnesses} - -\IndexTwoFlag{debug-requirement-machine}{concretize-nested-types} - -\ifWIP -TODO: -\begin{itemize} -\item Concrete type witness -\item Abstract type witness -\item Virtual rules -\item Algorithm for building a ``relative'' concrete type symbol from substituting another symbol's pattern type -\end{itemize} -\fi - -\section[]{Recursive Conformances} - -\ifWIP -TODO: -\begin{itemize} -\item Free conformances -\item Can a protocol have a free conformance -\item Can a conformance be made free by changing the protocol -\item Conformance evaluation graph -\item Heuristic to find same-type requirements from a conformance; just the parent type thing -\item The problem with opaque archetypes -\item Open question: can we encode a conformance more directly without evaluating it; \verb|G>>| example -\end{itemize} -\fi - -\IndexFlag{enable-requirement-machine-opaque-archetypes} - -\section[]{Concrete Contraction}\label{concrete contraction} - 
-\IndexFlag{disable-requirement-machine-concrete-contraction} -\IndexTwoFlag{debug-requirement-machine}{concrete-contraction} - -\IndexDefinition{concrete contraction} - -\ifWIP -TODO: -\begin{itemize} -\item Doesn't actually appear in signature so should not impact minimization -\item The problem: it might give you a smaller anchor -\item Invariant violation without concrete contraction -\item Concrete contraction substitutes superclass and concrete types -\item Also GSB compatibility: T.A, T == C, C.A is a concrete typealias that's not an associated type. this doesn't add a rule -\item Open question: can we do this in a more principled way -\end{itemize} -\fi - -\section[]{Source Code Reference} - -\end{document} diff --git a/docs/Generics/chapters/conformance-paths.tex b/docs/Generics/chapters/conformance-paths.tex index a6711284e8f9d..316e91ed90720 100644 --- a/docs/Generics/chapters/conformance-paths.tex +++ b/docs/Generics/chapters/conformance-paths.tex @@ -96,14 +96,14 @@ \chapter{Conformance Paths}\label{conformance paths} τ_0_1 == τ_0_0.[Collection]SubSequence> \end{verbatim} \end{quote} -We will also be working with the \texttt{String}, \texttt{Substring}, and \texttt{Character} nominal types from the standard library, and their conformances to various protocols. We already studied \tCollection\ in \ExRef{protocol collection example} and \ExRef{protocol collection graph}. \ListingRef{conformance paths listing} recalls the protocol declarations, and lists the relevant facts about the following normal conformances: +We will also be working with the \texttt{String}, \texttt{Substring}, and \texttt{Character} types from the standard library, and their conformances to \tSequence\ and \tCollection. We already studied these protocols in \ExRef{protocol collection example} and \ExRef{protocol collection graph}. 
\ListingRef{conformance paths listing} recalls the protocol declarations, and lists the relevant facts about the following normal conformances: \begin{gather*} \StringCollection\\ \SubstringCollection\\ \StringSequence\\ \SubstringSequence \end{gather*} -W're going to apply this \index{substitution map}substitution map to various dependent member types: +We're going to apply this \index{substitution map}substitution map to various dependent member types: \begin{align*} \Sigma := \SubstMapC{ &\SubstType{\rT}{String},\\ @@ -180,15 +180,15 @@ \chapter{Conformance Paths}\label{conformance paths} \qquad {} = \texttt{IndexingIterator} \end{gather*} \IndexDefinition{conformance path}% -If we peel off the type witness projection and substitution map on either end of the long expression in the middle, we're left with a conformance path for the abstract conformance $\rUSequence$ of $G$: +If we peel off the type witness projection and substitution map at either end of the long expression in the middle, we're left with a conformance path for the abstract conformance $\rUSequence$ of $G$: \[\SelfSequence \otimes \SelfSubSequence \otimes \rTCollection\] -A conformance path only depends on the generic signature, and not on any particular substitution map. We can use this conformance path to perform a local conformance lookup of $\rUSequence$ in \emph{any} substitution map for our generic signature~$G$. +A conformance path only depends on the generic signature, and not on any particular substitution map. We can use the above conformance path to perform a local conformance lookup of $\rUSequence$ within \emph{any} substitution map with the same input generic signature~$G$. \end{example} \begin{definition}\label{conformance path def} Let $G$ be a generic signature. 
Formally, a \IndexDefinition{conformance path}\emph{conformance path} is an \index{ordered tuple}ordered tuple $(s_1,\,\ldots,\,s_n)$ of \IndexDefinition{conformance path length}length $n \geq 1$, such that the first step~$s_1$ is a \index{root abstract conformance}root abstract conformance of~$G$, and each subsequent step~$s_i$ is an \index{associated conformance requirement}associated conformance requirement stated by the protocol that appears on the right-hand side of the previous step. However, instead of using tuple notation, we write conformance paths from \emph{right to left}:
\[s_n\otimes\cdots \otimes s_1\]
-Alteratively, we can write something like this:
+Alternatively, we can write something like this:
\[\AssocConf{Self.$\tU_n$}{$\tP_n$} \otimes \cdots \otimes \AssocConf{Self.$\tU_2$}{$\tP_2$} \otimes \TPOne \]
A conformance path of length~1 is just a single root abstract conformance $\TPOne$.
\end{definition}
@@ -247,7 +247,7 @@ \section{Validity and Existence}\label{conformance paths exist}
\item Is every valid abstract conformance equivalent to a principal abstract conformance?
\item Given a valid abstract conformance, can we find a conformance path representing an equivalent principal abstract conformance?
\end{enumerate}
-We will answer (1) and (2) first, before we present \AlgRef{find conformance path algorithm} to solve (3) in the next section. Recall that $\TP$ is a \index{valid abstract conformance}valid abstract conformance if $G\vdash\TP$. To establish (1), we can show that a conformance path can be translated into a special kind of derivation.
+We will answer (1) and (2) first, before we look at \AlgRef{find conformance path algorithm} to solve (3) in the next section. Recall that $\TP$ is a \index{valid abstract conformance}valid abstract conformance if $G\vdash\TP$. To establish (1), we can show that a conformance path can be translated into a special kind of derivation.
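Before we turn to derivations, it is worth seeing where a conformance path surfaces in source code. A sketch (the function names are hypothetical; the path described in the comment is the one from the example above):

```swift
func consume<S: Sequence>(_ s: S) {}

// In the generic signature <T: Collection>, the type parameter
// T.SubSequence is not the subject of any explicit requirement; it
// conforms to Sequence only via a conformance path: the root abstract
// conformance [T: Collection], then the associated conformance
// requirement [Collection]Self.SubSequence: Collection, then the
// inherited conformance [Collection]Self: Sequence.
func consumeTail<T: Collection>(_ t: T) {
    let tail: T.SubSequence = t.dropFirst()
    consume(tail)  // type checking this call follows the path above
}
```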
\begin{definition}\label{principal derivation def} Let $G$ be a generic signature. A \IndexDefinition{principal derivation}\emph{principal derivation} is a derivation of a conformance requirement that consists of a single \IndexStep{Conf}\textsc{Conf} elementary statement, followed by zero or more steps that apply \IndexStep{AssocConf}\textsc{AssocConf} to the conclusion of the previous step, with no unused conclusions or other kinds of steps: @@ -283,28 +283,30 @@ \section{Validity and Existence}\label{conformance paths exist} \SameNameStepDef\\ \SameDeclStepDef \end{gather*} -These rules wrap both sides of a same-type requirement in a dependent member type, increasing their lengths by one. The below construction shows that this also allows us to derive ``longer'' same-type requirements. This requires proof by induction, together with some facts about well-formed generic signatures (\SecRef{generic signature validity}). +These rules wrap both sides of a same-type requirement in a dependent member type, increasing their lengths by one. The below lemma shows that we can also iterate these steps and produce ``longer'' same-type requirements. This requires proof by induction, together with some facts about well-formed generic signatures (\SecRef{generic signature validity}). \begin{lemma}\label{general member type} -Suppose that $G$ is a \index{well-formed generic signature}well-formed generic signature, \tT\ and \tU\ are \index{valid type parameter}valid type parameters of $G$, and finally, \texttt{T.V} and \texttt{U.V} are two type parameters formed by repeatedly wrapping \tT~and~\tU\ in dependent member types having pairwise identical identifiers or associated type declarations, so for some $n\geq 0$: -\begin{gather*} -\texttt{T.V} := \texttt{T.$\nA_1$...$\nA_n$}\\ -\texttt{U.V} := \texttt{U.$\nA_1$...$\nA_n$} -\end{gather*} -Under these assumptions, if $G\vdash\TU$ and $G\vdash\texttt{U.V}$, then $G\vdash\SameReq{T.V}{U.V}$. 
The same conclusion also holds if we start with $G\vdash\texttt{T.V}$, instead of $G\vdash\texttt{U.V}$.
+Let $G$ be a \index{well-formed generic signature}well-formed generic signature, with $G\vdash\TU$ for some type parameters \tT~and~\tU. Suppose that \texttt{T.V} and \texttt{U.V} are type parameters formed by wrapping \tT~and~\tU\ in dependent member types with pairwise identical identifiers or associated type declarations, so for some $n\geq 0$:
+\[
+\texttt{T.V} := \texttt{T.$\nA_1$...$\nA_n$} \qquad \text{and} \qquad \texttt{U.V} := \texttt{U.$\nA_1$...$\nA_n$}
+\]
+Also, suppose that at least one of \texttt{T.V} or \texttt{U.V} is valid, so $G\vdash\texttt{T.V}$ or $G\vdash\texttt{U.V}$. Then:
+\[
+G\vdash\SameReq{T.V}{U.V}
+\]
\end{lemma}
\begin{proof}
-We establish the $G\vdash\texttt{U.V}$ case first. Note that \index{type parameter length}$|\texttt{T.V}|=|\tT| + n$ and $|\texttt{U.V}|=|\tU|+n$. We proceed by \index{induction}induction on~$n$.
+We consider the case where $G\vdash\texttt{U.V}$ first. Note that \index{type parameter length}$|\texttt{T.V}|=|\tT| + n$ and $|\texttt{U.V}|=|\tU|+n$. We proceed by \index{induction}induction on~$n$.
\BaseCase If $n=0$, then \texttt{T.V} is just \tT, and \texttt{U.V} is also \tU. We have $G\vdash\TU$ by assumption, so we already have the desired conclusion $G\vdash\SameReq{T.V}{U.V}$.
-\InductiveStep Suppose that $n>0$, and consider the outermost dependent member type of \texttt{U.V}, which is either \index{bound dependent member type}bound \texttt{$\tUp$.[P]A} or \index{bound dependent member type}unbound \texttt{$\tUp$.A}, for some type parameter~$\tUp$ and associated type~\nA\ of a protocol \tP. By assumption, \texttt{T.V} has the same form \texttt{$\tTp$.[P]A} or \texttt{$\tTp$.A}, for some type parameter $\tTp$.
+\InductiveStep Suppose that $n>0$, and consider the outermost dependent member type of \texttt{U.V}, which is either \index{bound dependent member type}bound \texttt{$\tUp$.[P]A} or \index{unbound dependent member type}unbound \texttt{$\tUp$.A}, for some type parameter~$\tUp$ and associated type~\nA\ of a protocol \tP. By assumption, \texttt{T.V} has the same form \texttt{$\tTp$.[P]A} or \texttt{$\tTp$.A}, for some type parameter $\tTp$. Note that $|\tTp|=|\texttt{T.V}|-1$ and $|\tUp|=|\texttt{U.V}|-1$.
-By \PropRef{valid type param alt}, we know $G\vdash\texttt{U.V}$ implies $G\vdash\ConfReq{$\tUp$}{P}$, so we start by deriving:
+Since $G\vdash\texttt{U.V}$, we have $G\vdash\ConfReq{$\tUp$}{P}$ by \PropRef{valid type param alt}:
\begin{gather*}
\AnyStep{\ConfReq{$\tUp$}{P}}{1}
\end{gather*}
-Since $G$ is well-formed, we also have $G\vdash\tUp$, so by the induction hypothesis, we get:
+Also, $G$ is well-formed, so $G\vdash\tUp$. Applying the induction hypothesis, we get:
\begin{gather*}
\AnyStep{\SameReq{$\tTp$}{$\tUp$}}{2}
\end{gather*}
If $G\vdash\TP$ for some type parameter~\tT\ and protocol~\tP, there exists a valid type parameter $\tTp$ such that $G\vdash\ConfReq{$\tTp$}{P}$ via a principal derivation, and $G\vdash\SameReq{T}{$\tTp$}$. \end{theorem} -Before we present the proof, we refer to~\AppendixRef{derived summary} and observe that conformance requirements are always derived by one of three kinds of derivation steps: +Before we look at the proof, the reader may wish to review~\AppendixRef{derived summary}, and recall that conformance requirements are always derived by one of three kinds of steps: \begin{gather*} \ConfStepDef\\ \SameConfStepDef\\ @@ -341,7 +343,7 @@ \section{Validity and Existence}\label{conformance paths exist} \begin{proof} We proceed by \index{structural induction}structural induction on the \index{derived requirement!structural induction}derivation of $G\vdash\TP$. We will construct the two desired derivations at each step. -\BaseCase A \textsc{Conf} elementary statement is the \index{base case}base case of our structural induction, because it has no assumptions. A derivation of an explicit conformance requirement is already principal by our definition, so we set $\tTp := \tT$. Our first derivation is just: +\BaseCase A \textsc{Conf} elementary statement is the base case of our structural induction, because it has no assumptions. A derivation of an explicit conformance requirement is already principal by our definition, so we set $\tTp := \tT$. Our first derivation is just: \[ \ConfStepDef \] @@ -362,7 +364,7 @@ \section{Validity and Existence}\label{conformance paths exist} \AssocConfStepDefX \] This is the ``untangling'' part. While $G\vdash\ConfReq{U}{Q}$ might not be principal, by induction, -it splits up into a principal derivation $G\vdash\ConfReq{$\tUp$}{Q}$ for some~$\tUp$, and a same-type requirement $G\vdash\SameReq{U}{$\tUp$}$. 
We apply \textsc{AssocConf} to the first derivation to get a conformance to~\tQ, but with some other subject type \texttt{$\tUp$.V}: +it splits up into a principal derivation $G\vdash\ConfReq{$\tUp$}{Q}$ for some~$\tUp$, and a same-type requirement $G\vdash\SameReq{U}{$\tUp$}$. We apply \textsc{AssocConf} to the first derivation to get a conformance to~\tP, but with some other subject type \texttt{$\tUp$.V}: \begin{gather*} \AnyStep{\ConfReq{$\tUp$}{Q}}{1}\\ \AssocConfStep{1}{$\tUp$.V}{P}{2} @@ -372,7 +374,7 @@ \section{Validity and Existence}\label{conformance paths exist} \section{The Conformance Path Graph}\label{finding conformance paths} -We embarked on our tour of type substitution in \ChapRef{substmaps}, with the following three algorithms marking key waypoints along the journey: +We embarked on our tour of type substitution in \ChapRef{chap:substitution maps}, with the following three algorithms marking key waypoints along the journey: \begin{enumerate} \item \AlgRef{type subst algo} implements type parameter substitution, handling generic parameter types directly while delegating the \index{dependent member type}dependent member type case. \item \AlgRef{dependent member type substitution} implements dependent member type substitution in terms of local conformance lookup. @@ -387,7 +389,7 @@ \section{The Conformance Path Graph}\label{finding conformance paths} \item For each pair of abstract conformances $\TP$ and $\ConfReq{$\tTp$}{Q}$ such that $\ConfReq{$\tTp$}{Q}$ is equivalent to $\TP \otimes \AssocConf{Self.U}{Q}$ for some \index{associated conformance projection!conformance path graph}associated conformance projection $\AssocConf{Self.U}{Q}$, we add an \index{edge!conformance path graph}edge with \index{source vertex!conformance path graph}source vertex $\TP$ and \index{destination vertex!conformance path graph}destination vertex $\ConfReq{$\tTp$}{Q}$. The edge is labeled $\AssocConf{Self.U}{Q}$. 
\end{itemize} \end{definition} -A conformance path is just a \index{path!conformance path graph}path in the conformance path graph. The path \index{source vertex!conformance path}starts from a \index{root abstract conformance!conformance path}root abstract conformance, and \index{destination vertex!conformance path}ends at the \index{principal abstract conformance}principal abstract conformance that the conformance path represents; the vertices visited along the way are the intermediate values of~$\TP$ in \AlgRef{invertconformancepath}. We previously studied directed graphs in \SecRef{type parameter graph} when discussing the \index{type parameter graph}type parameter graph of a generic signature. The conformance path graph construction is similar, and it is instructive to compare the two: +A conformance path is just a \index{path!conformance path graph}path in the conformance path graph. The path starts from a \index{root abstract conformance!conformance path}root abstract conformance, and \index{destination vertex!conformance path}ends at the \index{principal abstract conformance}principal abstract conformance that the conformance path represents; the vertices visited along the way are the intermediate values of~$\TP$ in \AlgRef{invertconformancepath}. We previously studied directed graphs in \SecRef{type parameter graph} when discussing the \index{type parameter graph!conformance path graph}type parameter graph of a generic signature. The conformance path graph construction is similar, and it is instructive to compare the two: \begin{center} \begin{tabular}{lll} \toprule @@ -490,7 +492,7 @@ \section{The Conformance Path Graph}\label{finding conformance paths} \end{algorithm} While the requirement order of \AlgRef{requirement order} is a partial order in general, it is linear on conformance requirements, so Step~5 cannot return ``$\bot$'', and the conformance path order is linear as well. 
The argument used in \PropRef{well founded type order} can also show that the conformance path order is \index{well-founded order}well-founded. Therefore, each equivalence class of conformance paths contains a unique minimum element, or a \IndexDefinition{reduced conformance path}\emph{reduced conformance path}. -While \AlgRef{conformance path order alg} is part of the formal model, the compiler does not implement it directly. Instead, we ensure that our breadth-first search always visits requirements in increasing order at each step, which is enough to guarantee that we always find the reduced conformance path in each equivalence class first. The breadth-first search requires some persistent state with each generic signature $G$: +While \AlgRef{conformance path order alg} is part of the formal model, the compiler does not implement it directly. Instead, we ensure that our breadth-first search always visits requirements in increasing order at each step, which is enough to guarantee that we always find the reduced conformance path in each equivalence class first. The breadth-first search associates some persistent state with each generic signature $G$: \begin{itemize} \item A hash table, mapping reduced abstract conformances to conformance paths. \item An integer $N$, initially 0. @@ -498,7 +500,7 @@ \section{The Conformance Path Graph}\label{finding conformance paths} \item A growable array $B_1$, to temporarily hold all conformance paths of length $N+1$. \end{itemize} -Here is the actual algorithm at least. Each invocation will either immediately return an existing conformance path from the table, or proceed to enumerate conformance paths in increasing order, until the one we are looking for appears in the table. +Here is the actual algorithm at last. Each invocation will either immediately return an existing conformance path from the table, or proceed to enumerate conformance paths in increasing order, until the one we are looking for appears in the table. 
\begin{algorithm}[Find conformance path]\label{find conformance path algorithm} Receives a generic signature $G$ and a valid abstract conformance $\TP$ as input. Outputs the reduced conformance path for $\TP$. @@ -534,7 +536,7 @@ \section{The Conformance Path Graph}\label{finding conformance paths} In Step~1, we bail out if the given type parameter~\tT\ does not actually conform to \tP, because then we will never find a conformance path for $\TP$. Once we establish the precondition, we can guarantee termination, by \ThmRef{conformance paths theorem} together with our choice of conformance path order. In particular, this means that the $B$ array will never be empty at the start of Step~5, because then we would enter an infinite loop, contradicting our theorem. We don't \emph{have to} show why it cannot be empty! (Let's do it anyway. If $B$~is empty but~$N>0$, our generic signature $G$ has a finite set of conformance paths, and we've already enumerated all of them on a previous invocation. In this case though, Step~3 will always return a value before we get to Step~5.) \begin{example}\label{free monoid first time} -Termination is great, but how large is ``finite''? In the worst case, the running time of \AlgRef{find conformance path algorithm} is \index{asymptotic complexity}exponential in the \index{type parameter length}length of the subject type parameter. Consider the protocol protocol signature $G_\texttt{M}$, with \texttt{M} as below: +Termination is great, but how large is ``finite''? In the worst case, the running time of \AlgRef{find conformance path algorithm} is \index{asymptotic complexity}exponential in the \index{type parameter length}length of the subject type parameter. 
Consider the protocol generic signature $G_\texttt{M}$, with \texttt{M} as below: \begin{Verbatim} protocol M { associatedtype A: M @@ -556,20 +558,20 @@ \section{The Conformance Path Graph}\label{finding conformance paths} For now at least, we have yet to observe this form of combinatorial explosion in practice. In realistic programs, conformance path graphs tend to be relatively simple, and type parameters tend not to be overly long. \end{example} -Now, we pause to take a breather. Our implementation of \AlgRef{type subst algo} is almost completely filled in, except for the \IndexQuery{getReducedType}$\Query{getReducedType}{}$ and \IndexQuery{conformsToProtocol}$\Query{requiresProtocol}{}$ generic signature queries, whose implementation is explained in \PartRef{part rqm}. It is worth noting that the pathological protocol~\texttt{M} shown above is quite interesting for other reasons, and it will reappear in \SecRef{monoidsasprotocols}. +Now, we pause to take a breather. Our implementation of \AlgRef{type subst algo} is almost completely filled in, except for the \IndexQuery{getReducedType}$\Query{getReducedType}{}$ and \IndexQuery{requiresProtocol}$\Query{requiresProtocol}{}$ generic signature queries, whose implementation is explained in \PartRef{part rqm}. It is worth noting that the pathological protocol~\texttt{M} shown above is quite interesting for other reasons, and it will reappear in \SecRef{monoidsasprotocols}. \section{Recursive Conformances}\label{recursive conformances} -We classified generic signatures into three families of increasing complexity in \SecRef{archetype builder}. A generic signature may either have a finite set of type parameters in all, an infinite set of type parameters that collapse into a finite set of equivalence classes, or most generally, an infinite set of equivalence classes. +In \SecRef{archetype builder}, we classified generic signatures into three families of increasing complexity. 
To review, a generic signature may have a finite set of type parameters in all, an infinite set of type parameters that collapse down to a finite set of equivalence classes, or most generally, an infinite set of equivalence classes. -The vertices in the type parameter graph are equivalence classes of type parameters, so in the first two cases, the type parameter graph is finite. Furthermore, in the first case, our graph must be \index{DAG|see {directed acyclic graph}}\IndexDefinition{directed acyclic graph}\emph{acyclic}, meaning it does not contain a \index{cycle}\emph{cycle}, which is a non-empty \index{path}path with the same source and destination. Indeed, if we follow a cycle an arbitrary number of times, we can exhibit an infinite sequence of equivalent valid type parameters. +We can restate this in terms of the type parameter graph, as follows. In the first two cases, the type parameter graph is finite. Furthermore, in the first case, the type parameter graph is a \index{directed acyclic graph}directed acyclic graph (\DefRef{dag def}). Indeed, as \ExRef{protocol collection graph} shows, we can exhibit an infinite sequence of equivalent type parameters by following a \index{cycle}cycle an arbitrary number of times. -We get a similar classification of generic signatures if we consider the conformance path graph instead. Our conformance path graph might be finite and acyclic, so we only have finitely many conformance paths, or it might be finite but contain a cycle, so we have a finite set of equivalence classes where at least one contains infinitely many paths, or we might have \index{infinite graph!conformance path graph}infinitelt many equivalence classes of conformance paths. +We will now see that we obtain a similar classification of generic signatures if we instead consider the conformance path graph. 
Our conformance path graph might be finite and acyclic, so we only have finitely many conformance paths, or it might be finite but contain a cycle, so we have a finite set of equivalence classes where at least one contains infinitely many paths, or we might have \index{infinite graph!conformance path graph}infinitely many equivalence classes of conformance paths. It turns out that the two classifications overlap, but do not completely coincide. To understand why, we introduce another directed graph. \begin{definition}\label{protocol dependency graph def} -The \IndexDefinition{protocol dependency graph}\emph{protocol dependency graph} is the \index{directed graph!protocol dependency graph}directed graph where the vertices are protocol declarations, or more precisely, the \index{vertex}vertex set is the universe of all \index{protocol declaration!protocol dependency graph}protocol declarations visible to name lookup, which we denoted by $\ProtoObj$ in \SecRef{conformance lookup}. The edge set consists of all \index{associated conformance requirement!protocol dependency graph}associated conformance requirements from all protocols, such that the \index{source vertex!protocol dependency graph}source vertex of $\AssocConfReq{Self.U}{Q}{P}$ is~\tP\ and the \index{destination vertex!protocol dependency graph}destination vertex is~\tQ. +The \IndexDefinition{protocol dependency graph}\emph{protocol dependency graph} is the \index{directed graph!protocol dependency graph}directed graph whose vertices are protocol declarations, or more precisely, its \index{vertex}vertex set is the universe of all \index{protocol declaration!protocol dependency graph}protocol declarations visible to name lookup, which we denoted by $\ProtoObj$ in \SecRef{conformance lookup}. 
The edge set consists of all \index{associated conformance requirement!protocol dependency graph}associated conformance requirements from all protocols, such that the \index{source vertex!protocol dependency graph}source vertex of $\AssocConfReq{Self.U}{Q}{P}$ is~\tP\ and the \index{destination vertex!protocol dependency graph}destination vertex is~\tQ.
\end{definition}

We will encounter the protocol dependency graph again in \SecRef{protocol component}, when we describe the construction of the requirement machine for a generic signature.
@@ -643,11 +645,11 @@ \section{Recursive Conformances}\label{recursive conformances}
An \index{associated conformance requirement!recursive}associated conformance requirement is \IndexDefinition{recursive conformance requirement}\emph{recursive} if its edge is part of a \index{cycle!protocol dependency graph}cycle in the protocol dependency graph.
\end{definition}
-In the previous three examples we saw, $\AssocConfReq{Self.SubSequence}{Collection}{Collection}$, $\AssocConfReq{Self.A}{N}{N}$, $\AssocConfReq{Self.Foo}{Foe}{Fee}$, and $\AssocConfReq{Self.Foo}{Fee}{Foe}$ are all recursive. We recall from \SecRef{archetype builder} that recursive conformance requirements appeared in \IndexSwift{4.1}Swift~4.1~\cite{se0157}; prior to this, the protocol dependency graph was always \index{directed acyclic graph}acyclic.
+In the three preceding examples, $\AssocConfReq{Self.SubSequence}{Collection}{Collection}$, $\AssocConfReq{Self.A}{N}{N}$, $\AssocConfReq{Self.Foo}{Foe}{Fee}$, and $\AssocConfReq{Self.Foo}{Fee}{Foe}$ are all recursive. We recall from \SecRef{archetype builder} that recursive conformance requirements appeared in \IndexSwift{4.1}Swift~4.1~\cite{se0157}; prior to this, the protocol dependency graph was always \index{directed acyclic graph}acyclic.
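The smallest possible cycle in the protocol dependency graph is a loop: an associated conformance requirement pointing back at its own protocol. A minimal sketch (the protocol names here are hypothetical, echoing the examples above):

```swift
// [Node]Self.Next: Node is an edge from Node back to itself, so the
// protocol dependency graph has a loop at Node, and the requirement
// is recursive. Prior to Swift 4.1 (SE-0157), this was rejected.
protocol Node {
    associatedtype Next: Node
}

// Two protocols can also form a longer cycle, as with Fee and Foe:
// each associated conformance requirement is an edge to the other.
protocol Ping {
    associatedtype Partner: Pong
}
protocol Pong {
    associatedtype Partner: Ping
}
```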
To relate the conformance path graph with the protocol dependency graph, consider the function $\pi\colon\ConfObj{G}\rightarrow\ProtoObj$: \[\pi(\XP):=\tP\] -This operation makes sense with both on abstract and concrete conformances. When restricted to abstract conformances, this maps the vertices of the conformance path graph to the vertices of the protocol dependency graph. Furthermore, if two vertices in the conformance path graph are joined by an edge, the corresponding vertices in the protocol dependency graph are also joined by an edge with the same label. +This operation is defined on both abstract and concrete conformances. If we consider its restriction to abstract conformances, we see that it maps the vertices of the conformance path graph to the vertices of the protocol dependency graph. Furthermore, if two vertices in the conformance path graph are joined by an edge, the corresponding vertices in the protocol dependency graph are also joined by an edge with the same label. \begin{definition} Let $X:=(V_X,E_X)$ and $Y:=(V_Y,E_Y)$ be directed graphs. A \index{homomorphism}\IndexDefinition{graph homomorphism}\emph{graph homomorphism} $f\colon X\rightarrow Y$ is a pair of functions, $f_v\colon V_X\rightarrow V_Y$ for mapping \index{vertex!graph homomorphism}vertices to vertices, and $f_e\colon E_X\rightarrow E_Y$ for mapping \index{edge!graph homomorphism}edges to edges, such that for all $e\in E_X$, @@ -658,7 +660,7 @@ \section{Recursive Conformances}\label{recursive conformances} Note that $f_e$ is permitted to map $\Src(e)$ and $\Dst(e)$ to the same vertex as long as $Y$ has a \index{loop!graph homomorphism}loop---an edge with the same source and destination---at this vertex. \end{definition} -We can also apply our graph homomorphism to a \index{path!graph homomorphism}path in~$G_1$, by first applying it to the source vertex and then each successive edge. By induction, the result is a valid path in~$G_2$. 
If we apply $\pi$ to a \index{path!conformance path graph}path in the conformance path graph, we get a path in the protocol dependency graph. A \index{path!protocol dependency graph}path in the protocol dependency graph is a sequence of \index{associated conformance projection}associated conformance projections, so it's like a conformance path except that the first step is missing. The general form of a path from $\tP_1$ to $\tP_n$ can be denoted by: +We can also apply our graph homomorphism to a \index{path!graph homomorphism}path in~$G_1$, by first applying it to the source vertex, and then to each successive edge. By induction, the result is a valid path in~$G_2$. In particular, if we apply~$\pi$ to a \index{path!conformance path graph}path in the conformance path graph, we get a path in the protocol dependency graph. A \index{path!protocol dependency graph}path in the protocol dependency graph is a sequence of \index{associated conformance projection}associated conformance projections, so it's like a conformance path but without the first step. The general form of a path from $\tP_1$ to $\tP_n$ can be denoted by: \[\AssocConf{Self.$\tU_n$}{$\tP_n$} \otimes \cdots \otimes \AssocConf{Self.$\tU_1$}{$\tP_1$}\] We can now state this interesting fact. @@ -680,7 +682,7 @@ \section{Recursive Conformances}\label{recursive conformances} Let $G$ be a \index{well-formed generic signature}well-formed generic signature. If $G$ has an infinite set of conformance paths, then $G$ has an infinite set of \index{valid type parameter}valid type parameters. \end{proposition} -Therefore, if $G$ has a finite set of valid type parameters, that is, if the type parameter graph of $G$ is finite and acyclic, then the conformance path graph of $G$ must be finite and acyclic as well. However, we cannot strenghten this to an ``if and only if,'' as the following example shows. 
+Therefore, if $G$ has a finite set of valid type parameters, that is, if the type parameter graph of $G$ is finite and acyclic, then the conformance path graph of $G$ must be finite and acyclic as well. However, we cannot strengthen this to an ``if and only if,'' as the following example shows. \begin{example} Let $G$ be the generic signature of the protocol extension: @@ -705,7 +707,7 @@ \section{Recursive Conformances}\label{recursive conformances} $(2 \Rightarrow 1)$ We choose a conformance path from each equivalence class, and find the reduced abstract conformance for each one. This gives us an infinite set of reduced abstract conformances. Consider a function $g$ that maps an abstract conformance to its subject type. There can only be finitely many distinct reduced abstract conformances with the same subject type, because there are only finitely many protocols in $\ProtoObj$. Thus, $g$ maps the infinite set of reduced abstract conformances to an infinite set of reduced type parameters. \end{proof} -To summarize, we can refine our classification of generic signatures into \emph{four} families, by considering the type parameter graph and conformance path graph together: +To summarize, we can refine our classification of generic signatures into \emph{four} families, by considering the \index{type parameter graph}type parameter graph and \index{conformance path graph}conformance path graph together: \begin{enumerate} \item Both graphs are finite and acyclic. \item Both graphs are finite, but only the conformance path graph is acyclic. @@ -779,10 +781,10 @@ \section{Recursive Conformances}\label{recursive conformances} \item Another infinite ray. \item A finite path leading to a cycle.
\end{enumerate} -Another way of saying this is that there can only be three possible outcomes if start with $\XN$ and repeatedly apply $\SelfAToN$: +Therefore, if we start with $\XN$ and repeatedly apply $\SelfAToN$, one of three things will eventually happen: \begin{enumerate} -\item We eventually end up back at $\XN$. -\item We get an infinite sequence of distinct conformances to \tN. +\item We end up back at $\XN$. +\item We produce an infinite sequence of distinct conformances to \tN. \item We end up at a conformance we've already seen, but not $\XN$. \end{enumerate} We will now construct an example of each behavior, followed by a surprise. @@ -927,7 +929,7 @@ \section{Recursive Conformances}\label{recursive conformances} \end{gather*} We will now attempt to calculate the substituted return type of the below call to \verb|f()|: \begin{Verbatim} -func f<T: N>(_: T) -> T.A { ... } +func f<T: N>(_: T) -> T.A {...} let value = f(F()) // What is the type of `value'? \end{Verbatim} @@ -948,13 +950,13 @@ \section{Recursive Conformances}\label{recursive conformances} \end{gather*} \paragraph{Normal forms.} -We've been making the tacit assumption that every valid expression written using the $\otimes$ ``turntable'' operator can be reduced to a \emph{normal form}---a single type, substitution map, or conformance---by the finite application of algebraic identities, until no occurrences of $\otimes$ remain. This turns out to be a false assumption. In the last example, the record keeps spinning and the party doesn't stop:$\quad \otimes \quad \oplus \quad \otimes \quad \oplus \quad \otimes \quad \cdots$ +We've been making the tacit assumption that every valid expression written using the $\otimes$ ``turntable'' operator can be reduced to a \emph{normal form}---a single type, substitution map, or conformance---by the finite application of algebraic identities, until no occurrences of $\otimes$ remain. This turns out to be a false assumption.
In the prior example, the record keeps spinning and the party doesn't stop:$\quad \otimes \quad \oplus \quad \otimes \quad \oplus \quad \otimes \quad \cdots$ We showed that \AlgRef{find conformance path algorithm} for \emph{finding} a conformance path always terminates, but we clearly cannot say the same about \AlgRef{local conformance lookup algorithm} for \emph{evaluating} a conformance path! Local conformance lookup---or more generally, the $\otimes$ operator---is actually a \index{partial function}\emph{partial} function, which does not always produce a result. Similarly, the substitution map~$\SigmaS$ described above does not \emph{have} a conformance substitution graph. At the present time, the compiler terminates with a stack overflow if a non-terminating type substitution is attempted. While \AlgRef{local conformance lookup algorithm} itself is a simple counted loop, the recursion is coming from Step~4. If $C$ is a \index{specialized conformance!recursive}specialized conformance, projecting an associated conformance from $C$ applies a substitution map to some other conformance; if this conformance is abstract, we recursively call into \index{local conformance lookup!recursive}local conformance lookup again. Our example happens to arrange everything just right so that we end up repeating the \emph{same} local conformance lookup. -\section{The Halting Problem}\label{tag systems} +\section{The Halting Problem}\label{halting problem} In the future, the compiler should detect non-terminating type substitution and diagnose the problem, instead of crashing. Two approaches spring to mind: \begin{enumerate} @@ -962,12 +964,12 @@ \section{The Halting Problem}\label{tag systems} \item We could try to detect such behavior ahead of time, by carefully analyzing all substitution maps and conformances in the program, and rejecting those that encode non-terminating computation. \end{enumerate} -It turns out that we cannot hope to do better than (1) above in general. 
Before we can understand why, we first recall some computability theory. The interested reader can refer to~\cite{cutland} for details. +It turns out that we cannot hope to do better than (1) above in general. Before we can understand why, we first recall some computability theory. A classic text on this material is~\cite{cutland}, while~\cite{maccormick2018can} offers a gentler introduction. \paragraph{Computable numbers.} In 1937, \index{Alan Turing}Alan Turing set out to formalize what is meant by a \index{computable number}``computable number,'' in the sense that $1/3$, $(1+\sqrt{5})/2$, and $\pi := 3.14159\ldots$ are computable \cite{turing}. If we start from the observation that every integer is ``inherently'' computable by virtue of being representable by a finite sequence of binary digits, then it remains to define the computability of real numbers in the interval $(0,1)$. The binary representation of such a number consists of the radix point, followed by an infinite sequence of ``0''~and~``1'' digits; if we can write down an \emph{effective procedure} for generating this sequence of digits to any degree of precision, the number is computable. -To describe such an effective procedure, Turing introduced what we now call the \index{Turing machine}\emph{Turing machine} formalism. The precise definition can be found in the literature, but for our purposes, a Turing machine is the following: +To describe such an \IndexDefinition{effective procedure}effective procedure, Turing introduced what we now call the \index{Turing machine}\emph{Turing machine} formalism. The precise definition can be found in the literature, but for our purposes, a Turing machine is the following: \begin{enumerate} \item The fixed \emph{machine description} consists of a finite set of \emph{symbols} (``0'' and ``1'' will do), a finite set of \emph{states}, one of which is the \emph{initial state}, and a table mapping (state, symbol) keys to (state, symbol, direction) values.
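The machine description in item 1 can be sketched concretely. The code below is an illustration, not from the text; the machine encoded is the standard two-state busy beaver, and all names are our own:

```swift
// A sketch of a machine description: a table from (state, symbol)
// keys to (state, symbol, direction) values, plus an initial state.
enum Direction { case left, right }

struct TuringMachine {
  // transitions[state]?[symbol] = (newState, newSymbol, direction)
  var transitions: [Int: [Int: (Int, Int, Direction)]]
  var initialState: Int

  // Run until no table entry applies (halt) or the step budget
  // runs out; return the final tape contents.
  func run(steps maxSteps: Int) -> [Int: Int] {
    var tape: [Int: Int] = [:]  // unwritten cells read as 0
    var head = 0
    var state = initialState
    for _ in 0..<maxSteps {
      guard let (next, symbol, dir) = transitions[state]?[tape[head, default: 0]] else {
        break  // halt: no entry in the table
      }
      tape[head] = symbol
      state = next
      head += (dir == .right) ? 1 : -1
    }
    return tape
  }
}

// The two-state busy beaver: halts after writing four 1s.
let bb2 = TuringMachine(
  transitions: [
    0: [0: (1, 1, .right), 1: (1, 1, .left)],
    1: [0: (0, 1, .left)],  // no entry for (1, 1): the machine halts
  ],
  initialState: 0)
```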
@@ -1312,7 +1314,7 @@ \section{The Halting Problem}\label{tag systems} \end{proof} \begin{example} -To append ``$c$'' at the end of ``$ab$'', we compose on the \emph{left} with $\SigmaC$; notice how it trades places with each neighbour until it reaches $\SigmaEnd$, at which point it morphs into $\SigmaCC$: +To append ``$c$'' at the end of ``$ab$'', we compose on the \emph{left} with $\SigmaC$; notice how it trades places with each neighbor until it reaches $\SigmaEnd$, at which point it morphs into $\SigmaCC$: \begin{gather*} \SigmaC\otimes\SigmaAA\otimes\SigmaBB\otimes\SigmaEnd\\ \qquad {} = \SigmaAA\otimes\SigmaC\otimes\SigmaBB\otimes\SigmaEnd\\ @@ -1385,17 +1387,17 @@ \section{The Halting Problem}\label{tag systems} \end{proof} \begin{example} -The following program calculates the Collatz sequence for $3$ at compile time. It traps at run time, but that's irrelevant for our purposes: +The following program calculates the Collatz sequence for $3$ at compile time. (The \texttt{fatalError()} call traps at run time, but that's irrelevant here.) \begin{Verbatim} func collatz(_: T) -> T.Next { fatalError() } -collatz(A<A<A<End>>>()) // here +let x = collatz(A<A<A<End>>>()) // what is the type of `x'? \end{Verbatim} -To get the substituted return type for the call to \texttt{collatz()}, we apply the substitution map $\SigmaAA\otimes\SigmaAA\otimes\SigmaAA\otimes\SigmaEnd$ to the dependent member type \texttt{$\uptau$.Next}, which is just \texttt{Halt}, because the Collatz sequence for 3 reaches 1. +To get the substituted return type for the call expression, we apply the substitution map $\SigmaAA\otimes\SigmaAA\otimes\SigmaAA\otimes\SigmaEnd$ to the original return type \texttt{$\uptau$.Next}. The substituted type turns out to be \texttt{Halt}, because the Collatz sequence for 3 reaches 1. -To save space, we're just going to consider what happens when the substitution map is $\SigmaAA \otimes \SigmaAA \otimes \SigmaEnd$, which evaluates the Collatz sequence for $2$.
We make use of the above lemmas. The parentheses indicate the next sub-expression to reduce: +For a worked example, we instead consider what happens when the substitution map is $\SigmaAA \otimes \SigmaAA \otimes \SigmaEnd$, which just evaluates the Collatz sequence for $2$. We make use of the previous lemmas. Each pair of parentheses indicates the next reduction step: \begin{gather*} \left(\TNext\otimes\SigmaAA\right)\otimes\SigmaAA\otimes\SigmaEnd \tag{\LemmaRef{tag next lemma}}\\ \qquad {} = \TNext \otimes \SigmaC \otimes \SigmaB \otimes \left(\SigmaDel \otimes \SigmaAA\right)\otimes\SigmaEnd \tag{\LemmaRef{tag del lemma}}\\ @@ -1409,21 +1411,22 @@ \section{The Halting Problem}\label{tag systems} \qquad {} = \TNext \otimes \SigmaC \otimes \SigmaB \otimes \left( \SigmaDel \otimes \SigmaEnd\right) \tag{\LemmaRef{tag halt lemma}}\\ \qquad {} = \TNext \otimes \SigmaC \otimes \left(\SigmaB \otimes \SigmaHalt\right) \tag{\LemmaRef{tag halt abc lemma}}\\ \qquad {} = \TNext \otimes \left(\SigmaC \otimes \SigmaHalt\right) \tag{\LemmaRef{tag halt abc lemma}}\\ -\qquad {} = \TNext \otimes \SigmaHalt \tag{\LemmaRef{tag next lemma}}\\ +\qquad {} = \left(\TNext \otimes \SigmaHalt\right) \tag{\LemmaRef{tag next lemma}}\\ \qquad {} = \Halt \end{gather*} \end{example} -The Collatz sequence for 2 is just 2 followed by 1, so clearly, our encoding is not very efficient; it takes a great deal of work just to compute $2\div 2 = 1$! Evaluating the Collatz tag system with $n=19$ takes approximately $1/10$th of a second on the author's machine, while attempting $n=27$ traps with a stack overflow, because the intermediate strings are too long. Practical concerns aside, it should be clear that our scheme can encode any tag system. We can add new symbols, change the deletion number and production rules as needed, and specify an arbitrary input string. 
Without a doubt: +The Collatz sequence for 2 is just 2 followed by 1, so clearly, our encoding is not very efficient; it takes a great deal of work just to compute $2\div 2 = 1$! Evaluating the Collatz tag system with $n=19$ takes approximately $1/10$th of a second on the author's machine, while attempting $n=27$ traps with a stack overflow, because the intermediate strings are too long. Practical concerns aside, it should be clear that our scheme can encode any tag system. We can add new symbols, change the deletion number and production rules as needed, and kick it off with an arbitrary input string. Without a doubt: \begin{theorem} Swift type substitution is Turing-complete. \end{theorem} \paragraph{Further discussion.} -In \SecRef{conditional conformance}, we looked at an example of a non-terminating conditional conformance check in Swift, and cited a similar example in \index{Rust}Rust. Later, \SecRef{word problem} will show that Swift reduced type equality is also undecidable in the general case, but we will provide a termination guarantee by restricting the problem. -Another source of Turing-completeness in type checking is described in a post to the Swift forums \cite{brainfuck}, where it is shown that an interpreter for the ``Brainfuck'' programming language can be encoded by using the \emph{key path member lookup} feature \cite{se0252}. +In \SecRef{sec:conditional conformances}, we looked at an example of a non-terminating conditional conformance check in Swift, and cited a similar example in \index{Rust}Rust. In fact, the construction in this section can be tweaked slightly to encode the tag system in terms of conditional conformances instead, so Swift conditional conformance checking is again Turing-complete, beyond simply encoding infinite loops. Looking ahead, \SecRef{word problem} will show that Swift reduced type equality is also undecidable in the general case, but we will get a termination guarantee by restricting the problem. 
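For comparison with the type-level encoding above, a plain interpreter for a Collatz 2-tag system is only a few lines. This is a sketch using De Mol's production rules (a -> bc, b -> a, c -> aaa, with deletion number 2); the chapter's own encoding may differ in details:

```swift
// Direct string-rewriting interpreter for the Collatz 2-tag
// system: at each step, look at the first letter, delete the
// first two letters, and append that letter's production.
func runTagSystem(_ input: String, maxSteps: Int = 100_000) -> String? {
  let productions: [Character: String] = ["a": "bc", "b": "a", "c": "aaa"]
  var word = Array(input)
  for _ in 0..<maxSteps {
    if word.count < 2 { return String(word) }  // halt: word too short
    let appended = productions[word[0]]!
    word.removeFirst(2)
    word.append(contentsOf: appended)
  }
  return nil  // step budget exhausted without halting
}
```

Starting from \texttt{"aaa"} (that is, $n = 3$), the word passes through $a^5$ and $a^8$ before halting at \texttt{"a"}, mirroring the Collatz sequence $3 \to 5 \to 8 \to 4 \to 2 \to 1$ of the ``shortcut'' Collatz function.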
+ +A contributor to the Swift forums discovered another Turing machine in the Swift type system \cite{brainfuck}, by constructing an interpreter for the ``Brainfuck'' programming language in terms of the \emph{key path member lookup} feature \cite{se0252}. Countless other examples of undecidable type checking problems are described in the literature. For example, \index{Java}Java generics are known to be Turing-complete \cite{java_wildcards,java_undecidable}. -Countless other examples of undecidable type checking problems are described in the literature. We will cite just two more. The Turing-completeness of \index{Java}Java generics is shown in \cite{java_undecidable}, and of \index{TypeScript}TypeScript in \cite{tscollatz}; the latter example also encodes the Collatz sequence, but in a completely different manner than we did in this section. For further discussion of the Collatz conjecture, see \cite{collatzbook} and \cite{wolframtag}. +Finally, an encoding of the Collatz sequence in the \index{TypeScript}TypeScript language is described in \cite{tscollatz}. For further discussion of the Collatz conjecture, see \cite{collatzbook} and \cite{wolframtag}. \section{Source Code Reference} @@ -1445,7 +1448,7 @@ \section{Source Code Reference} \IndexSource{local conformance lookup} \apiref{SubstitutionMap}{class} -The \verb|lookupConformance()| method implements \AlgRef{local conformance lookup algorithm} for performing a local conformance lookup. For other methods, see \SecRef{substmapsourcecoderef}. +The \verb|lookupConformance()| method implements \AlgRef{local conformance lookup algorithm} for performing a local conformance lookup. For other methods, see \SecRef{src:substitution maps}. \subsection*{Finding Conformance Paths} @@ -1455,7 +1458,7 @@ \subsection*{Finding Conformance Paths} \end{itemize} \apiref{GenericSignatureImpl}{class} -The \texttt{getConformancePath()} method returns the reduced conformance path for a given type parameter and protocol declaration. 
For other methods, see \SecRef{genericsigsourceref}. +The \texttt{getConformancePath()} method returns the reduced conformance path for a given type parameter and protocol declaration. For other methods, see \SecRef{src:generic signatures}. \apiref{rewriting::RequirementMachine}{class} The \texttt{getConformancePath()} method on \texttt{GenericSignature} calls the method with the same name on the \texttt{RequirementMachine} class. The latter implements \AlgRef{find conformance path algorithm}. The \texttt{RequirementMachine} class has a pair of instance variables to store the algorithm's persistent state: @@ -1463,6 +1466,6 @@ \subsection*{Finding Conformance Paths} \item \texttt{ConformancePaths} is the table of known conformance paths. \item \texttt{CurrentConformancePaths} is the buffer of conformance paths at the currently enumerated length, which we denoted by~$B$. \end{itemize} -\AlgRef{find conformance path algorithm} traffics in reduced type parameters, while the actual implementation deals with instances of \verb|Term|. A term is the internal Requirement Machine representation of a type parameter, as we will learn in \ChapRef{symbols terms rules}. This avoids round-trip conversions between \verb|Term| and \verb|Type| when computing reduced types, but does not fundamentally alter the algorithm. +\AlgRef{find conformance path algorithm} traffics in reduced type parameters, while the actual implementation deals with instances of \verb|Term|. A term is the internal Requirement Machine representation of a type parameter, as we will learn in \ChapRef{chap:symbols terms rules}. This avoids round-trip conversions between \verb|Term| and \verb|Type| when computing reduced types, but does not fundamentally alter the algorithm. 
\end{document} diff --git a/docs/Generics/chapters/conformances.tex b/docs/Generics/chapters/conformances.tex index 8a015716b03e5..b6d44f97d778e 100644 --- a/docs/Generics/chapters/conformances.tex +++ b/docs/Generics/chapters/conformances.tex @@ -2,26 +2,26 @@ \begin{document} -\chapter{Conformances}\label{conformances} +\chapter{Conformances}\label{chap:conformances} -\lettrine{C}{onformances relate} types with the protocols they conform to. To be precise, a \index{protocol conformance|see{conformance}}\IndexDefinition{conformance}\emph{conformance} describes how its conforming type \emph{witnesses} the requirements of a protocol. We will start with the representation of conformances, and then discuss conformance lookup. Conformances also play an important role in type substitution. We will complete the discussion of substitution maps from the previous chapter, by showing how we use conformances to substitute dependent member types. +\lettrine{C}{onformances relate} types with the protocols they conform to. To be precise, a \index{protocol conformance|see{conformance}}\IndexDefinition{conformance}\emph{conformance} describes how its conforming type \emph{witnesses} the requirements of a protocol. We will start by looking at the representation of conformances, and then discuss conformance lookup. Finally, conformances also play an important role in type substitution, so we will fill the gaps in the previous chapter's discussion of substitution maps, by showing how we use conformances to substitute dependent member types. Conformances come in three distinct varieties: \begin{enumerate} -\item A \IndexDefinition{concrete conformance}\textbf{concrete conformance} states that a nominal type conforms to a protocol, in which case we know the original declaration of the conformance, and thus all of the witnesses. 
-\item An \index{abstract conformance}\textbf{abstract conformance} states that a type parameter or archetype conforms to a protocol, that is, satisfies a conformance requirement (\SecRef{abstract conformances}). -\item The \IndexDefinition{invalid conformance}\textbf{invalid conformance} sentinel indicates that a type does not conform. +\item A \IndexDefinition{concrete conformance}\textbf{concrete conformance} records that a nominal type conforms to a protocol, in which case we know the original declaration of the conformance, and thus all of the witnesses. +\item An \index{abstract conformance}\textbf{abstract conformance} records that a type parameter or archetype satisfies a conformance requirement, but we don't know ``how'' (\SecRef{abstract conformances}). +\item The \IndexDefinition{invalid conformance}\textbf{invalid conformance} records that some type does not conform. \end{enumerate} Concrete conformances break down further into four kinds: \begin{enumerate} -\item A \IndexDefinition{normal conformance}\textbf{normal conformance} directly refers to the declaration of a conformance on a nominal type or extension. +\item A \IndexDefinition{normal conformance}\textbf{normal conformance} represents the actual declaration of a conformance on a nominal type or extension.
+\item An \index{inherited conformance}\textbf{inherited conformance} represents a conformance inherited by a subclass from its superclass. +\item A \IndexDefinition{self conformance}\textbf{self conformance} represents the conformance of an existential type to its own protocol. This is possible only in special circumstances (\SecRef{selfconformingprotocols}). \end{enumerate} -\paragraph{Normal conformances.} A normal conformance is declared by stating the name of a protocol in the inheritance clause of a \index{inheritance clause!nominal type declaration}\index{nominal type declaration}nominal type or \index{inheritance clause!extension declaration}\index{extension declaration}extension:\index{horse} +\paragraph{Normal conformances.} In the language, a normal conformance is declared by stating the name of a protocol in the inheritance clause of a \index{inheritance clause!nominal type declaration}\index{nominal type declaration}nominal type or \index{inheritance clause!extension declaration}\index{extension declaration}extension:\index{horse} \begin{Verbatim} struct Horse: Animal {...} @@ -35,16 +35,16 @@ \chapter{Conformances}\label{conformances} \begin{itemize} \item The \index{conforming type!normal conformance}\textbf{conforming type}, which is the \index{declared interface type!normal conformance}declared interface type of the nominal type declaration. \item The \textbf{protocol declaration} being conformed to. -\item The \textbf{conforming context} where this normal conformance is declared, either the nominal type declaration itself, or one of its extensions. This is a \index{declaration context!normal conformance}declaration context (\ChapRef{decls}). -\item The \index{generic signature!normal conformance}\textbf{generic signature} of the conforming context. 
If the conformance context is a constrained extension, this generic signature has additional requirements not present in the nominal type's generic signature; we say that we have a \emph{conditional conformance} (\SecRef{conditional conformance}). +\item The \textbf{conforming context} where this normal conformance is declared, either the nominal type declaration itself, or one of its extensions. This is a \index{declaration context!normal conformance}declaration context (\ChapRef{chap:decls}). +\item The \index{generic signature!normal conformance}\textbf{generic signature} of the conforming context. If the conformance context is a constrained extension, this generic signature has additional requirements not present in the nominal type's generic signature; we say that we have a \emph{conditional conformance} (\SecRef{sec:conditional conformances}). \item A \textbf{type witness} for each associated type declared by the protocol. This is an interface type written using the generic signature of the conformance (\SecRef{type witnesses}). \item An \textbf{associated conformance} for each associated conformance requirement of the protocol. This is another conformance with the appropriate subject type. Among other things, given a conformance to a derived protocol, this allows us to recover the conformance of the same type to a base protocol (\SecRef{associated conformances}). -\item A \textbf{value witness} for each \index{value requirement}value requirement of the protocol. This is a reference to a value declaration, together with a substitution map. The value declaration is either a member of the conforming nominal type, an extension, a \index{superclass}superclass, or a \index{protocol extension}protocol extension. +\item A \textbf{value witness} for each \index{value requirement}value requirement of the protocol. This is a reference to a value declaration, together with a substitution map. 
The value declaration is either a member of the conforming nominal type, an extension, a \index{superclass type}superclass, or a \index{protocol extension}protocol extension. \end{itemize} Every nominal type and extension declaration has a list of \IndexDefinition{local conformance}\emph{local conformances}, which are normal conformances stated in the declaration's inheritance clause. In our example, $\ConfReq{Horse}{Animal}$ is a local conformance of the nominal type declaration \texttt{Horse}, while $\ConfReq{Cow}{Animal}$ is a local conformance of the extension of \texttt{Cow}. In \index{SILGen}SILGen, we generate a witness table for each local conformance of each declaration in each primary file. -In addition, every nominal type declaration also has a \IndexDefinition{conformance lookup table}\emph{conformance lookup table}. This is how global conformance lookup will \emph{find} a conformance, in the next section. In this table, we collect the local conformances from the nominal type itself, and all of its extensions. In Swift, conformances are inherited, so the conformance lookup table for a \index{class declaration}class has an additional behavior; it also collects all conformances from its \index{superclass}superclass, and its superclass, and so on. To ensure that such a conformance inherited from a superclass has the correct conforming type, we introduce a little bit of bookkeeping. +In addition, every nominal type declaration also has a \IndexDefinition{conformance lookup table}\emph{conformance lookup table}. This is how global conformance lookup will \emph{find} a conformance, in the next section. In this table, we collect the local conformances from the nominal type itself, and all of its extensions. 
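To make the witness terminology concrete, here is a hypothetical fleshed-out version of the \texttt{Animal} protocol (only the inheritance clauses appear in the text; the \texttt{Feed}, \texttt{eat}, and \texttt{Hay} names are invented for illustration), showing what the normal conformance $\ConfReq{Horse}{Animal}$ must record:

```swift
// Hypothetical protocol body; only `struct Horse: Animal {...}`
// appears in the text.
protocol Animal {
  associatedtype Feed        // requires a type witness
  func eat(_ food: Feed)     // requires a value witness
}

struct Hay {}

struct Horse: Animal {
  // Type witness: [Feed => Hay], inferred from the parameter type.
  // The method itself is the value witness for Animal.eat.
  func eat(_ food: Hay) {}
}
```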
In Swift, conformances are inherited, so the conformance lookup table for a \index{class declaration!inherited conformance}class has an additional behavior; it also collects all conformances from its \index{superclass type!inherited conformance}superclass, and its superclass, and so on. To ensure that such a conformance inherited from a superclass has the correct conforming type, we introduce a little bit of bookkeeping. \paragraph{Inherited conformances.} In the below, \texttt{Square} is a subclass of \texttt{Polygon}, so it inherits the normal conformance $\ConfReq{Polygon}{Shape}$: @@ -64,7 +64,7 @@ \chapter{Conformances}\label{conformances} \draw [arrow] (SquareShape) -- (Square); \end{tikzpicture} \end{center} -We denote this inherited conformance by $\ConfReq{Square}{Shape}$. It behaves identically to the superclass conformance $\ConfReq{Polygon}{Shape}$, except if ask for its \index{conforming type!inherited conformance}conforming type, we get back \texttt{Square} instead of \texttt{Polygon}. In \ExRef{inherited specialized conf}, we will see that more complex behaviors can manifest when the superclass declaration has generic parameters. +We denote this inherited conformance by $\ConfReq{Square}{Shape}$. It behaves identically to the superclass conformance $\ConfReq{Polygon}{Shape}$, except if ask for its \index{conforming type!inherited conformance}conforming type, we get back \texttt{Square} instead of \texttt{Polygon}. In \ExRef{inherited specialized conf}, we will see that more complex behaviors can manifest when the \index{superclass declaration}superclass declaration has generic parameters. 
\section{Conformance Lookup}\label{conformance lookup} @@ -136,7 +136,7 @@ \section{Conformance Lookup}\label{conformance lookup} \paragraph{Future directions.} Extending the language implementation to guarantee correctness in the presence of overlapping conformances is a major undertaking, as it would require several architectural changes in the compiler and runtime. -The global conformance lookup operation would need a disambiguation rule, perhaps by taking additional source location information into account. Parts of the compiler that rely on coherence would need to be redesigned to perform each lookup exactly once and save the result for later, instead of freely performing another lookup and assuming the same conformance will be found. +The global conformance lookup operation would need a disambiguation rule, perhaps by taking additional \index{source location}source location information into account. Parts of the compiler that rely on coherence would need to be redesigned to perform each lookup exactly once and save the result for later, instead of freely performing another lookup and assuming the same conformance will be found. For example, when we resolve a generic nominal type written in source, we perform global conformance lookup to ensure that the generic arguments satisfy the conformance requirements of the nominal type. If we later need the context substitution map of this \index{generic nominal type}generic nominal type, we currently call global conformance lookup again, to populate the substitution map's conformances. We could avoid the second lookup by changing the representation of a generic nominal type to store a substitution map, instead of just the generic arguments. @@ -180,7 +180,7 @@ \section{Conformance Substitution}\label{conformance subst} \item The output generic signature of an \index{inherited conformance!output generic signature}inherited conformance is the output generic signature of its underlying superclass conformance. 
\end{itemize} -The set of conformances with output generic signature~$G$ is denoted by \IndexSetDefinition{conf}{\ConfObj{G}}$\ConfObj{G}$. This is related to the notation of \ChapRef{substmaps} as follows. The above definition implies that if $\XP\in\ConfObj{G}$, then $\tX\in\TypeObj{G}$. Conversely, if some $\tX\in\TypeObj{G}$ conforms to~\tP, then $\PP \otimes \tX \in \ConfObj{G}$, by definition of \AlgRef{conformance lookup algo}. +The set of conformances with output generic signature~$G$ is denoted by \IndexSetDefinition{conf}{\ConfObj{G}}$\ConfObj{G}$. This is related to the notation of \ChapRef{chap:substitution maps} as follows. The above definition implies that if $\XP\in\ConfObj{G}$, then $\tX\in\TypeObj{G}$. Conversely, if some $\tX\in\TypeObj{G}$ conforms to~\tP, then $\PP \otimes \tX \in \ConfObj{G}$, by definition of \AlgRef{conformance lookup algo}. We've only seen substitution maps applied to normal conformances so far, but shortly we will generalize this and define $\XP\otimes\Sigma \in \ConfObj{H}$ for any concrete conformance $\XP\in\ConfObj{G}$ and substitution map $\Sigma\in\SubMapObj{G}{H}$. @@ -217,24 +217,24 @@ \section{Conformance Substitution}\label{conformance subst} \[(\NormalConf\otimes\Sigma_1)\otimes\Sigma_2 := \NormalConf\otimes(\Sigma_1\otimes\Sigma_2)\] \paragraph{Inherited conformances.} -Finally, we can apply a substitution map to an \index{inherited conformance!substitution}inherited conformance. Suppose that $\ConfReq{C}{P}$ is an inherited conformance with conforming type $\tC \in \TypeObj{G}$ and superclass conformance $\ConfReq{$\tC^\prime$}{P} \in \ConfObj{G}$, where $\tC^\prime$ is a \index{superclass}superclass type of \tC. If $\Sigma \in \SubMapObj{G}{H}$ is a substitution map, then $\ConfReq{C}{P} \otimes \Sigma \in \ConfObj{H}$ is the inherited conformance built from $\tC \otimes \Sigma$ and $\ConfReq{$\tC^\prime$}{P} \otimes \Sigma$. 
+Finally, we can apply a substitution map to an \index{inherited conformance!substitution}inherited conformance. Suppose that $\ConfReq{C}{P}$ is an inherited conformance with conforming type $\tC \in \TypeObj{G}$ and superclass conformance $\ConfReq{$\tC^\prime$}{P} \in \ConfObj{G}$, where $\tC^\prime$ is a \index{superclass type!inherited conformance}superclass type of \tC. If $\Sigma \in \SubMapObj{G}{H}$ is a substitution map, then $\ConfReq{C}{P} \otimes \Sigma \in \ConfObj{H}$ is the inherited conformance built from $\tC \otimes \Sigma$ and $\ConfReq{$\tC^\prime$}{P} \otimes \Sigma$. \begin{example}\label{inherited specialized conf} -Consider \texttt{D} below, which inherits the conformance to \tP\ from \texttt{B}: +Consider \texttt{Mid} below, which inherits the conformance to \tP\ from \texttt{Top}: \begin{Verbatim} protocol P {} -class B<T>: P {} -class D<X, Y>: B<(Y, X)> {} +class Top<T>: P {} +class Mid<X, Y>: Top<(Y, X)> {} \end{Verbatim} -Note that \texttt{D}'s superclass type is \texttt{B<(\rU, \rT)>}, with context substitution map: +The superclass type of \texttt{Mid} is \texttt{Top<(\rU, \rT)>}, with context substitution map: \[\Sigma_1 := \SubstMap{\SubstType{\rT}{(\rU, \rT)}}\] -To build the conformance lookup table of \texttt{D}, we look up the conformance of the superclass type \texttt{B<(\rU, \rT)>} to \tP\ first, which gives us a specialized conformance with the above substitution map. We then wrap this specialized conformance in an inherited conformance, to give it the correct conforming type \texttt{D<\rT, \rU>}. We record the resulting structure in the table: +When we build the conformance lookup table of \texttt{Mid}, we look up the conformance of the superclass type \texttt{Top<(\rU, \rT)>} to \tP, which gives us a specialized conformance with the above substitution map. We then wrap this specialized conformance in an inherited conformance, to give it the correct conforming type \texttt{Mid<\rT, \rU>}.
We record the resulting structure in the table: \begin{center} \begin{tikzpicture}[node distance=0.5cm] -\node (Inherited) [type, rectangle split, rectangle split parts=2] {\vphantom{p}inherited conformance\nodepart{two}\texttt{\vphantom{p()}D<\rT, \rU>:~P}}; -\node (Specialized) [type, rectangle split, rectangle split parts=2, right=of Inherited] {specialized conformance\nodepart{two}\texttt{\vphantom{p()}B<(\rU, \rT)>:~P}}; -\node (Normal) [type, rectangle split, rectangle split parts=2, right=of Specialized] {\vphantom{p}normal conformance\nodepart{two}\texttt{\vphantom{p()}B<\rT>:~P}}; -\node (Type) [type, rectangle split, rectangle split parts=2, below=of Inherited] {conforming type\nodepart{two}\texttt{\vphantom{p()}D<\rT, \rU>}}; +\node (Inherited) [type, rectangle split, rectangle split parts=2] {\vphantom{p}inherited conformance\nodepart{two}\texttt{\vphantom{p()}Mid<\rT, \rU>:~P}}; +\node (Specialized) [type, rectangle split, rectangle split parts=2, right=of Inherited] {specialized conformance\nodepart{two}\texttt{\vphantom{p()}Top<(\rU, \rT)>:~P}}; +\node (Normal) [type, rectangle split, rectangle split parts=2, right=of Specialized] {\vphantom{p}normal conformance\nodepart{two}\texttt{\vphantom{p()}Top<\rT>:~P}}; +\node (Type) [type, rectangle split, rectangle split parts=2, below=of Inherited] {conforming type\nodepart{two}\texttt{\vphantom{p()}Mid<\rT, \rU>}}; \node (SubMap) [type, rectangle split, rectangle split parts=2, below=of Specialized] {substitution map\nodepart{two}$\vphantom{\texttt{p}}\SubstType{\rT}{(\rU, \rT)}$}; \draw [arrow] (Inherited) -- (Specialized); @@ -243,21 +243,21 @@ \section{Conformance Substitution}\label{conformance subst} \draw [arrow] (Specialized) -- (SubMap); \end{tikzpicture} \end{center} -Suppose global conformance lookup is asked for the conformance of $\texttt{D<Int, Bool>}$~to~\tP. We need the context substitution map of \texttt{D<Int, Bool>}: +Suppose we ask global conformance lookup for the conformance of $\texttt{Mid<Int, Bool>}$~to~\tP.
We need the context substitution map of \texttt{Mid<Int, Bool>}: \[\Sigma_2 := \SubstMap{\SubstType{\rT}{Int},\,\SubstType{\rU}{Bool}}\] -We first find the inherited conformance $\ConfReq{D<\rT, \rU>}{P}$ from the conformance lookup table of~\texttt{D}, and then we apply $\Sigma_2$ to this inherited conformance, which will in turn apply $\Sigma_2$ to its subclass type \texttt{D<\rT, \rU>} and specialized conformance $\ConfReq{B<(\rU, \rT)>}{P}$. For the latter, we note $\Sigma_1 \otimes \Sigma_2 = \SubstMap{\SubstType{\rT}{(Bool, Int)}}$: +We first find the inherited conformance $\ConfReq{Mid<\rT, \rU>}{P}$ from the conformance lookup table of~\texttt{Mid}, and then we apply $\Sigma_2$ to this inherited conformance, which will in turn apply $\Sigma_2$ to its subclass type \texttt{Mid<\rT, \rU>} and specialized conformance $\ConfReq{Top<(\rU, \rT)>}{P}$. For the latter, we note $\Sigma_1 \otimes \Sigma_2 = \SubstMap{\SubstType{\rT}{(Bool, Int)}}$: \begin{gather*} -\texttt{D<\rT, \rU>} \otimes \Sigma_2 = \texttt{D<Int, Bool>}\\ -\ConfReq{B<(\rU, \rT)>}{P} \otimes \Sigma_2 = \ConfReq{B<(Bool, Int)>}{P} +\texttt{Mid<\rT, \rU>} \otimes \Sigma_2 = \texttt{Mid<Int, Bool>}\\ +\ConfReq{Top<(\rU, \rT)>}{P} \otimes \Sigma_2 = \ConfReq{Top<(Bool, Int)>}{P} \end{gather*} -\noindent Our final result is the rather intricate inherited conformance $\ConfReq{D<Int, Bool>}{P}$: +\noindent Our final result is the rather intricate inherited conformance $\ConfReq{Mid<Int, Bool>}{P}$: \begin{center} \begin{tikzpicture}[node distance=0.5cm] -\node (Inherited) [type, rectangle split, rectangle split parts=2] {\vphantom{p}inherited conformance\nodepart{two}\texttt{\vphantom{p()}D<Int, Bool>:~P}}; -\node (Specialized) [type, rectangle split, rectangle split parts=2, right=of Inherited] {specialized conformance\nodepart{two}\texttt{\vphantom{p()}B<(Bool, Int)>:~P}}; -\node (Normal) [type, rectangle split, rectangle split parts=2, right=of Specialized] {\vphantom{p}normal conformance\nodepart{two}\texttt{\vphantom{p()}B<\rT>:~P}}; -\node (Type) [type,
rectangle split, rectangle split parts=2, below=of Inherited] {conforming type\nodepart{two}\texttt{\vphantom{p()}D<Int, Bool>}}; +\node (Inherited) [type, rectangle split, rectangle split parts=2] {\vphantom{p}inherited conformance\nodepart{two}\texttt{\vphantom{p()}Mid<Int, Bool>:~P}}; +\node (Specialized) [type, rectangle split, rectangle split parts=2, right=of Inherited] {specialized conformance\nodepart{two}\texttt{\vphantom{p()}Top<(Bool, Int)>:~P}}; +\node (Normal) [type, rectangle split, rectangle split parts=2, right=of Specialized] {\vphantom{p}normal conformance\nodepart{two}\texttt{\vphantom{p()}Top<\rT>:~P}}; +\node (Type) [type, rectangle split, rectangle split parts=2, below=of Inherited] {conforming type\nodepart{two}\texttt{\vphantom{p()}Mid<Int, Bool>}}; \node (SubMap) [type, rectangle split, rectangle split parts=2, below=of Specialized] {substitution map\nodepart{two}$\vphantom{\texttt{p}}\SubstType{\rT}{(Bool, Int)}$}; \draw [arrow] (Inherited) -- (Specialized); @@ -269,7 +269,7 @@ \section{Conformance Substitution}\label{conformance subst} \end{example} \paragraph{Summary.} -Let's recap the various forms of the \index{$\otimes$}$\otimes$ operator we've worked with so far.
We can substitute types (\ChapRef{chap:substitution maps}): \begin{gather*} \TypeObj{G}\otimes\SubMapObj{G}{H}\longrightarrow\TypeObj{H} \end{gather*} @@ -277,19 +277,19 @@ \section{Conformance Substitution}\label{conformance subst} \begin{gather*} \ConfObj{G}\otimes\SubMapObj{G}{H}\longrightarrow\ConfObj{H} \end{gather*} -We can compose substitution maps (\SecRef{submapcomposition}): +We can compose substitution maps (\SecRef{sec:composition}): \[\SubMapObj{G}{H}\otimes\SubMapObj{H}{I}\longrightarrow\SubMapObj{G}{I}\] -We can find conformances, where \IndexSetDefinition{proto}{\ProtoObj}$\ProtoObj$ denotes the set of all protocols (\SecRef{conformance lookup}): +We can look up conformances, where \IndexSetDefinition{proto}{\ProtoObj}$\ProtoObj$ is the set of all protocols (\SecRef{conformance lookup}): \[\ProtoObj \otimes \TypeObj{G} \longrightarrow \ConfObj{G}\] -Furthermore, in every instance $\otimes$ operator was associative, so the result of evaluating an expression does not depend on the placement of parentheses. +Furthermore, we saw that the $\otimes$ operator is associative, so parentheses are never required to disambiguate the result of an expression. \section{Type Witnesses}\label{type witnesses} In order to conform to a protocol with associated types, a nominal type declaration must declare a \IndexDefinition{type witness}\emph{type witness} for each \index{associated type declaration!type witness}associated type. There are four ways to declare a type witness in the source language: \begin{enumerate} -\item With a \textbf{member type declaration}---that is, a nested nominal type or \index{type alias declaration}type alias declaration---having the same name as the associated type. This member type might be a child of the conforming nominal type, one of its extensions, or if the conforming type is a class, one of its \index{superclass!type witness}superclasses. 
+\item With a \textbf{member type declaration}---that is, a nested nominal type or \index{type alias declaration}type alias declaration---having the same name as the associated type. This member type might be a child of the conforming nominal type, one of its extensions, or if the conforming type is a class, one of its \index{superclass type!type witness}superclasses. \item With \index{associated type inference}\textbf{associated type inference}, which deduces type witnesses by considering the candidate value witnesses for each \index{value requirement}value requirement in the protocol. -\item With a \index{generic parameter declaration}\textbf{generic parameter} having the same name as the associated type. +\item With a \index{generic parameter declaration!type witness}\textbf{generic parameter} having the same name as the associated type. \item With a \index{default type witness}\textbf{default type witness} on the associated type declaration, if one is present. This is the fallback if all else fails. \end{enumerate} @@ -301,13 +301,13 @@ \section{Type Witnesses}\label{type witnesses} func play(_: Toy) } \end{Verbatim} -Before we begin, we add a default implementation of \texttt{play()} that can be used with an arbitrary \texttt{Toy}, so conforming types are not required to provide their own: +Before we begin, we add a \index{default witness}default witness for \texttt{play()} that can be used with an arbitrary \texttt{Toy}, so conforming types are not required to provide their own: \begin{Verbatim} extension Pet { func play(_: Toy) {} } \end{Verbatim} -Our first type, \texttt{Chicken}, explicitly declares a member type named \texttt{Toy}, while relying on the default implementation of \texttt{play()}. This is Case~1 above: +Our first type, \texttt{Chicken}, explicitly declares a member type named \texttt{Toy}, while relying on the default witness for \texttt{play()}. 
This is Case~1 above: \begin{Verbatim} struct Chicken: Pet { struct Toy {} @@ -346,9 +346,9 @@ \section{Type Witnesses}\label{type witnesses} \paragraph{Normal conformances.} Every \index{normal conformance!type witness}normal conformance contains a table that maps the protocol's associated type declarations to the type witnesses of the conformance. This table is lazily populated by the \IndexDefinition{type witness request}\Request{type witness request}, which attempts to resolve a single type witness using \index{qualified lookup!type witness}qualified lookup, which handles Case~1. -If no such member type declaration exists, we then evaluate the \index{type witnesses request}\Request{type witnesses request}, which attempts to simultaneously resolve all type witnesses in the normal conformance via associated type inference. This code path implements Case~2, but also Case~3 and Case~4, which users might not think of as being part of associated type inference. We will discuss associated type inference in \SecRef{associated type inference}. +If no such member type declaration exists, we then evaluate the \IndexDefinition{type witnesses request}\Request{type witnesses request}, which attempts to simultaneously resolve all type witnesses in the normal conformance via associated type inference. This code path implements Case~2, but also Case~3 and Case~4, which users might not think of as being part of associated type inference. We will discuss associated type inference in \SecRef{associated type inference}. -When a conformance is declared in a \index{secondary file}secondary file of a \index{frontend job}frontend job, we resolve its type witnesses only if we need to while type checking something else. On the other hand, when a conformance is declared in a \index{primary file}primary file, we also run the \IndexDefinition{conformance checker}\emph{conformance checker} as part of the \index{type-check source file request}\Request{type-check source file request}. 
The conformance checker resolves all type and value witnesses and performs various additional checks. If the conformance checker does not emit any diagnostics, we know the conformance provides a complete set of type and value witnesses, and we can move on to code generation. +When a conformance is declared in a \index{secondary file}secondary file of a \index{frontend job}frontend job, we resolve its type witnesses only if we need to while type checking something else. On the other hand, when a conformance is declared in a \index{primary file}primary file, we also run the \IndexDefinition{conformance checker}\emph{conformance checker} as part of the \index{type-check primary file request}\Request{type-check primary file request}. The conformance checker resolves all type and value witnesses and performs various additional checks. If the conformance checker does not emit any diagnostics, we know the conformance provides a complete set of type and value witnesses, and we can move on to code generation. \paragraph{Projection.} We incorporate type witnesses into the \index{type substitution}type substitution algebra with a new form of the $\otimes$ operator. Suppose $\NormalConf$ is a normal conformance, and \tP\ declares an associated type~\nA. 
We introduce the notation $\APA$ for the \index{associated type declaration!type witness}associated type declaration, and then we denote the type witness for \nA\ with the following expression: @@ -374,25 +374,25 @@ \section{Type Witnesses}\label{type witnesses} The standard library's $\ArraySequence$ normal conformance witnesses the \texttt{Element} and \texttt{Iterator} associated types of \tSequence\ as follows: \begin{gather*} \AElement \otimes \ArraySequence \\ -\qquad\qquad {} = \rT\\ +\qquad {} = \rT\\ \AIterator \otimes \ArraySequence\\ -\qquad\qquad {} = \texttt{IndexingIterator<Array<\rT>>} +\qquad {} = \texttt{IndexingIterator<Array<\rT>>} \end{gather*} Recall the specialized conformance $\ArrayIntSequence$ from \ExRef{specialized conf example}. To get its type witnesses, we apply the conformance substitution map $\SubMapInt$: \begin{gather*} \AElement \otimes \ArrayIntSequence \\ -\qquad\qquad {} = \rT \otimes \SubMapInt\\ -\qquad\qquad {} = \texttt{Int}\\[\medskipamount] +\qquad {} = \rT \otimes \SubMapInt\\ +\qquad {} = \texttt{Int}\\[\medskipamount] \AIterator \otimes \ArrayIntSequence\\ -\qquad\qquad {} = \texttt{IndexingIterator<Array<\rT>>}\otimes\SubMapInt\\ -\qquad\qquad {} = \texttt{IndexingIterator<Array<Int>>} +\qquad {} = \texttt{IndexingIterator<Array<\rT>>}\otimes\SubMapInt\\ +\qquad {} = \texttt{IndexingIterator<Array<Int>>} \end{gather*} \end{example} -Projecting a type witness from a normal or specialized conformance has the property that any type parameters appearing in the type witness are valid in the output generic signature of the conformance. In other words, if $\XP\in\ConfObj{G}$ and \nA\ is an associated type of \tP, then $\APA\otimes\XP\in\TypeObj{G}$. (When we get to abstract conformances, we will see the property holds there as well.)
If we take \IndexSetDefinition{assoctype}{\AssocTypeObj{P}}$\AssocTypeObj{P}$ to mean the set of all associated type declarations of a fixed protocol \tP, and $\ConfPObj{P}{G}$ as the subset of $\ConfObj{G}$ containing only conformances to this \tP, then for each protocol \tP, \IndexDefinition{type witness projection}type witness projection defines this additional form of the \index{$\otimes$}$\otimes$ operation: +An important invariant is that any type parameters appearing in the type witness of a normal or specialized conformance are valid in the output generic signature of the conformance. In other words, if $\XP\in\ConfObj{G}$ and \nA\ is an associated type of \tP, then $\APA\otimes\XP\in\TypeObj{G}$. (When we get to abstract conformances, we will see the property holds there as well.) To summarize all of the above, if we take \IndexSetDefinition{assoctype}{\AssocTypeObj{P}}$\AssocTypeObj{P}$ to mean the set of all associated type declarations of a fixed protocol \tP, and $\ConfPObj{P}{G}$ as the subset of $\ConfObj{G}$ containing only conformances to this \tP, then for each protocol \tP, \IndexDefinition{type witness projection}type witness projection gives us this additional form of the \index{$\otimes$}$\otimes$ operation: \[\AssocTypeObj{P}\otimes\ConfPObj{P}{G}\longrightarrow\TypeObj{G}\] -We will now study the relationship between \index{global conformance lookup}global conformance lookup and type witness projection, which will help us to better understand the material in the next section. If $\tX=\tXd\otimes\Sigma$ conforms to \tP, and \tP\ declares an associated type \nA, we can write down the ``general type witness expression'' below: +Finally, we consider the relationship between \index{global conformance lookup}global conformance lookup and type witness projection. 
If $\tX=\tXd\otimes\Sigma$ conforms to \tP, and \tP\ declares an associated type \nA, one way to resolve the type representation ``\texttt{X.A}'' is via global conformance lookup followed by type witness projection: \[\APA\otimes\PP\otimes\tXd\otimes\Sigma\] Of the four possible ways to parenthesize this expression, there are three that correspond to valid combinations of $\otimes$: \newcommand{\PB}{\phantom{\Big(}} @@ -402,13 +402,13 @@ \section{Type Witnesses}\label{type witnesses} \Big(\!\PB \APA \PBB \otimes \Big(\!\Big(\PP \PB \otimes \PB \tXd \Big)\!\PB \otimes \PBB \Sigma \Big)\!\Big) \tag{2}\\ \Big(\!\PB \APA \PBB \otimes \PB\! \Big( \PP \PB \otimes \Big( \tXd \PBB \otimes \PBB \Sigma \Big)\!\Big)\!\Big) \tag{3} \end{gather*} -Let's say that \texttt{$\tXd$.A} is the type witness for \nA\ in $\NormalConf$, and \texttt{X.A} is the type witness of \nA\ in $\XP$. Then, all three combinations above must output \texttt{X.A}, because we defined $\otimes$ to be associative in each case: +Let's say that \texttt{$\tXd$.A} denotes the type witness for \nA\ in $\NormalConf$, and \texttt{X.A} denotes the type witness of \nA\ in $\XP$. Then, all three combinations above must output \texttt{X.A}, because we defined $\otimes$ to be associative in each case: \begin{enumerate} \item We can look up the conformance of $\tXd$ to \tP, project the type witness \texttt{$\tXd$.A} from this normal conformance, and apply $\Sigma$ to this type witness to get \texttt{X.A}. \item We can look up the conformance of $\tXd$ to \tP, apply $\Sigma$ to this normal conformance to get a specialized conformance, and project the type witness \texttt{X.A}. \item We can apply $\Sigma$ to $\tXd$, look up the conformance of \tX\ to \tP, and project the type witness \texttt{X.A}. \end{enumerate} -We can also exhibit this with a \index{commutative diagram}commutative diagram. 
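To connect this back to the source language, here is a small Swift illustration of our own (it is not one of the chapter's examples): resolving the member type \texttt{Array<Int>.Element} goes through exactly this combination of conformance lookup and type witness projection, and by associativity all three evaluation orders agree on the result.
\begin{Verbatim}
// Resolving "Array<Int>.Element" projects the Element type witness
// from the conformance [Array<Int>: Sequence]; every evaluation
// order above produces Int.
let n: Array<Int>.Element = 42

// The same projection, reached from a generic context: S.Element
// is substituted with the type witness at the call site below.
func firstElement<S: Sequence>(_ s: S) -> S.Element? {
  var iterator = s.makeIterator()
  return iterator.next()
}
let m: Int? = firstElement([1, 2, 3])
\end{Verbatim}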
Each of the three evaluation orders corresponds to one of the three unique paths from $\tXd$ to $\texttt{X.A}$: +We can also exhibit this with a \index{commutative diagram!type witness projection}commutative diagram. Each of the three evaluation orders corresponds to one of the three unique paths from $\tXd$ to $\texttt{X.A}$: \begin{center} \begin{tikzcd}[column sep=3cm,row sep=1cm] \tXd \arrow[d, "\PP"] \arrow[r, "\Sigma"] &\tX \arrow[d, "\PP"] \\ @@ -416,7 +416,7 @@ \section{Type Witnesses}\label{type witnesses} \texttt{$\tXd$.A} \arrow[r, "\Sigma"]&\texttt{X.A} \end{tikzcd} \end{center} -To make this concrete, with the \texttt{Element} type witness of $\ArrayIntSequence$ from \ExRef{array type witness example}, we get: +To make this concrete, with the \texttt{Element} type witness of $\ArrayIntSequence$ from \ExRef{array type witness example}, we get the following, because ``\texttt{Array<Int>.Element}'' resolves to \texttt{Int}: \begin{center} \begin{tikzcd}[column sep=2.5cm,row sep=1cm] \texttt{Array<\rT>} \arrow[d, "\PSequence"] \arrow[r, "\SubMapInt"] &\texttt{Array<Int>} \arrow[d, "\PSequence"] \\
Thus, there is no ambiguity in our two respective formalisms using the same notation $\TP$ to denote both conformance requirements and abstract conformances. +\[\PP\otimes\tT := \TP\] +We say that an abstract conformance $\TP$ is \IndexDefinition{valid abstract conformance}\emph{valid} in a \index{generic signature}generic signature~$G$ if the \index{conformance requirement!type substitution}conformance requirement $\TP$ can be \index{derived requirement!abstract conformance}derived from~$G$ (\SecRef{derived req}). Thus, there is no ambiguity in our two respective formalisms using the same notation $\TP$ to denote both conformance requirements and abstract conformances. -We will use abstract conformances to fill in the part of \AlgRef{type subst algo} we have not explained yet, namely how to apply a substitution map to a bound dependent member type. To do this, we must complete our type substitution algebra by defining the two fundamental operations of type witness projection and conformance substitution with an abstract conformance. Suppose that $G\vdash\TP$, and~\tP\ declares an associated type~\nA. In \SecRef{bound type params}, we saw that we can derive the \index{bound dependent member type!validity}\index{dependent member type!validity}bound dependent member type \texttt{T.[P]A} by applying the \IndexStep{AssocDecl}\textsc{AssocDecl} inference rule to a derivation of $\TP$: +We will use abstract conformances to fill in the part of \AlgRef{type subst algo} we have not explained yet, namely, how to apply a substitution map to a bound dependent member type. Before we can do this, we must define type witness projection and conformance substitution with an abstract conformance. Suppose that $G\vdash\TP$, and~\tP\ declares an associated type~\nA. 
In \SecRef{bound type params}, we saw that we can derive the \index{bound dependent member type!validity}\index{dependent member type!validity}bound dependent member type \texttt{T.[P]A} by applying the \IndexStep{AssocDecl}\textsc{AssocDecl} inference rule to a derivation of $\TP$: \begin{gather*} \AnyStep{\TP}{1}\\ \AssocDeclStep{1}{T.[P]A}{2} @@ -457,18 +457,18 @@ \section{Abstract Conformances}\label{abstract conformances} \end{gather*} \end{example} -Factoring a dependent member type in this manner allows us to understand what happens when we apply a substitution map to such a type: +Factoring a \IndexDefinition{dependent member type!type substitution}dependent member type in this manner allows us to understand what happens when we apply a substitution map to such a type: \[\texttt{T.[P]A}\otimes\Sigma:=\bigl(\APA \otimes \TP\bigr)\otimes\Sigma\] The right-hand side has but one interpretation, since we require $\otimes$ to be associative: \[\bigl(\APA \otimes \TP\bigr)\otimes\Sigma := \APA\otimes\bigl(\TP\otimes\Sigma\bigr)\] We're applying the substitution map $\Sigma$ to the abstract conformance $\TP$, and then we get the final result by projecting a type witness from this substituted conformance. This new form of conformance substitution is called \IndexDefinition{local conformance lookup}\emph{local conformance lookup}, by analogy with \index{global conformance lookup!compatibility}global conformance lookup, because it plays a similar role. -While we won't fully explain the implementation of local conformance lookup until \ChapRef{conformance paths}, we can completely \emph{specify} its behavior now. Suppose that $\TP$ and $\Sigma$ are as above. Since $\TP=\PP\otimes\tT$, we also have: +While we won't investigate the implementation of local conformance lookup until \ChapRef{conformance paths}, we can completely \emph{specify} its behavior now. Suppose that $\TP$ and $\Sigma$ are as above. 
Since $\TP=\PP\otimes\tT$, we also have: \[\TP\otimes\Sigma=(\PP\otimes\tT)\otimes\Sigma=\PP\otimes(\tT\otimes\Sigma)\] In other words, local conformance lookup of $\TP$ in $\Sigma$ must find the \emph{same} conformance as global conformance lookup of $\tT\otimes\Sigma$ to \tP. \paragraph{Substitution maps.} -Given any valid abstract conformance, local conformance lookup must be able to recover the corresponding substituted conformance from the substitution map. Indeed, in the previous chapter, we mentioned that in addition to replacement types, \index{substitution map!conformances}substitution maps also store conformances, when their input generic signature has conformance requirements. We can expand upon this now. +Given any valid abstract conformance, local conformance lookup must be able to recover the corresponding substituted conformance from the substitution map. Indeed, in the previous chapter, we mentioned that in addition to replacement types, \index{substitution map!conformances}substitution maps also store conformances, when their input generic signature has \index{conformance requirement!type substitution}conformance requirements. We can expand upon this now. Suppose we have a substitution map $\Sigma\in\SubMapObj{G}{H}$. The abstract conformance for an \emph{explicit} conformance requirement $\TP$ of $G$ is called a \IndexDefinition{root abstract conformance}\emph{root abstract conformance}. For each root abstract conformance of~$G$, the substitution map $\Sigma$ contains a conformance with subject type $\tT\otimes\Sigma$ and protocol~\tP; we call these the \IndexDefinition{root conformance}\emph{root conformances} of $\Sigma$. Local conformance lookup with a root abstract conformance is easy to describe. 
Applying $\Sigma$ to a root abstract conformance of~$G$ projects the corresponding root conformance from~$\Sigma$: \[ \TP \otimes \{\ldots,\, \SubstConf{T}{X}{P},\, \ldots\} := \ConfReq{X}{P} \] @@ -499,7 +499,7 @@ \section{Abstract Conformances}\label{abstract conformances} To continue with \ExRef{abstract conformance example}, suppose we call \texttt{firstTwoEqual()} with \texttt{Array<Int>} and \texttt{Set<Int>}: \begin{Verbatim} func doIt(_ s1: Array<Int>, _ s2: Set<Int>) { - if firstTwoEqual(s1, s2) { ... } + if firstTwoEqual(s1, s2) {...} } \end{Verbatim} Here is the substitution map $\Sigma$ for the call; it has three root conformances: @@ -543,7 +543,7 @@ \section{Abstract Conformances}\label{abstract conformances} \end{gather*} \end{example} \paragraph{Protocol substitution maps.} -In \ChapRef{genericsig}, we saw that a protocol declaration~\tP\ has the generic signature $\verb|<Self where Self: P>|$, called the \index{protocol generic signature}protocol generic signature, or~$\GP$ for short. We will sometimes need to specify a substitution map for $\GP$. This is called a \IndexDefinition{protocol substitution map}\emph{protocol substitution map}, and it consists of a single replacement type \tX\, and a conformance of this type \tX\ to \tP. We denote it by $\Sigma_{\XP}$: +In \ChapRef{chap:generic signatures}, we saw that a protocol declaration~\tP\ has the generic signature $\verb|<Self where Self: P>|$, called the \index{protocol generic signature}protocol generic signature, or~$\GP$ for short. We will sometimes need to specify a substitution map for $\GP$. This is called a \IndexDefinition{protocol substitution map}\emph{protocol substitution map}, and it consists of a single replacement type \tX\, and a conformance of this type \tX\ to \tP.
We denote it by $\Sigma_{\XP}$: \[\Sigma_{\XP} := \SubstMapC{ \SubstType{Self}{X} }{ @@ -627,12 +627,12 @@ \section{Associated Conformances}\label{associated conformances} \end{example} \begin{example}\label{indexing iterator example} -Recall that \tSequence\ has a single associated conformance requirement $\AssocConfReq{Self.Iterator}{IteratorProtocol}{Sequence}$. In \ExRef{array type witness example}, we saw the type witness for \texttt{Iterator} in $\ArraySequence$ is the \texttt{IndexingIterator} type, specialized at \texttt{Array<\rT>}: +In \ExRef{array type witness example}, we saw that $\ArraySequence$ witnesses the \texttt{Iterator} associated type with \texttt{IndexingIterator}, specialized at \texttt{Array<\rT>}: \begin{gather*} \AIterator \otimes \ArraySequence\\ \qquad {} = \texttt{IndexingIterator<Array<\rT>>} \end{gather*} -When we project this associated conformance, we receive a specialized conformance of \texttt{IndexingIterator<Array<\rT>>} to \tIterator: +Now, recall that the \tSequence\ protocol states the associated conformance requirement $\AssocConfReq{Self.Iterator}{IteratorProtocol}{Sequence}$. If we project this requirement from our conformance, we get the specialized conformance of \texttt{IndexingIterator<Array<\rT>>} to \tIterator: \begin{gather*} \SelfIterator \otimes \ArraySequence\\ {} = \PIterator \otimes \texttt{IndexingIterator<Array<\rT>>}\\ @@ -726,7 +726,7 @@ \section{Associated Type Inference}\label{associated type inference} \end{enumerate} \paragraph{The problem instance.} -We start by looking at the protocol's \IndexDefinition{value requirement}\emph{value requirements}---those are its \index{function declaration!associated type inference}function, \index{subscript declaration!associated type inference}subscript, and \index{variable declaration!associated type inference}variable (or property) members. The interface type of a value requirement is written with respect to the protocol generic signature \verb|<Self where Self: P>|.
We only need to consider a value requirement if its \index{interface type!associated type inference}interface type mentions a dependent member type \texttt{Self.[P]A}, where \texttt{A} is one of the associated types whose type witness we're trying to infer. According to the rules of the Swift language, the conforming type must witness each value requirement with some member declaration that has the same name and kind as the value requirement. The interface type of the witness must also match the interface type of the value requirement, if we substitute the conforming type in place of \texttt{Self}. In particular, if the interface type of the value requirement involves a dependent member type \texttt{Self.[P]A}, then the interface type of the value witness must contain the type witness for \texttt{A} in the same position. +We start by looking at the protocol's \index{value requirement}\emph{value requirements}---those are its \index{function declaration!associated type inference}function, \index{subscript declaration!associated type inference}subscript, and \index{variable declaration!associated type inference}variable (or property) members. The interface type of a value requirement is written with respect to the protocol generic signature \verb|<Self where Self: P>|. We only need to consider a value requirement if its \index{interface type!associated type inference}interface type mentions a dependent member type \texttt{Self.[P]A}, where \texttt{A} is one of the associated types whose type witness we're trying to infer. According to the rules of the Swift language, the conforming type must witness each value requirement with some member declaration that has the same name and kind as the value requirement. The interface type of the witness must also match the interface type of the value requirement, if we substitute the conforming type in place of \texttt{Self}.
In particular, if the interface type of the value requirement involves a dependent member type \texttt{Self.[P]A}, then the interface type of the value witness must contain the type witness for \texttt{A} in the same position. In associated type inference, we don't know what the mapping from value requirements to value witnesses will be yet, and of course we don't even know all of the type witnesses either. However, at the very least, for each value requirement, we can perform a \index{qualified lookup}qualified lookup to find members of the conforming type that \emph{could} witness this value requirement, provided their interface types match. This gives us a set of \IndexDefinition{candidate value witness}\emph{candidate value witnesses} for each value requirement. @@ -745,24 +745,24 @@ \section{Associated Type Inference}\label{associated type inference} In the below conformance, we are asked to infer all three associated types: \begin{Verbatim} struct Lunch: Meal { - func eat(_: Int, _: Bool) { ... } - func eat(_: Bool, _: Float) { ... } - func eat(_: Int, _: Float) -> String { ... } + func eat(_: Int, _: Bool) {...} + func eat(_: Bool, _: Float) {...} + func eat(_: Int, _: Float) -> String {...} - func prepare(_: Void) -> String { ... } - func prepare(_: String) -> Bool { ... } + func prepare(_: Void) -> String {...} + func prepare(_: String) -> Bool {...} } \end{Verbatim} \end{example} -Each candidate value witness gives us a \emph{partial solution}, or an assignment of type witnesses for some subset of the protocol's associated types. To get this partial solution, we walk the interface type of the requirement in parallel with that of the witness. If there are any differences, we reject the candidate, unless the mismatch is that the requirement contains the dependent member type \texttt{Self.[P]A}; in this case, we record a \emph{type witness assignment}.
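To make the matching step concrete, here is a small illustrative sketch (Python rather than compiler code; the type encoding and the \texttt{Meal} associated type names \texttt{Commodity} and \texttt{Fuel} are invented for the example). Interface types are modeled as nested tuples, and a string beginning with \texttt{Self.} stands for a dependent member type:

```python
def derive_partial_solution(requirement, witness):
    """Walk the requirement's interface type in parallel with the witness's.

    Returns a partial solution, a dict of "associated type -> type witness"
    pairs, or None if the pairing is rejected."""
    solution = {}

    def walk(req, wit):
        if isinstance(req, str) and req.startswith("Self."):
            # The requirement has a dependent member type in this position;
            # whatever the witness has here becomes a type witness assignment.
            solution[req] = wit
            return True
        if isinstance(req, tuple) and isinstance(wit, tuple):
            return len(req) == len(wit) and all(map(walk, req, wit))
        return req == wit  # concrete positions must agree exactly

    return solution if walk(requirement, witness) else None

# Matching a hypothetical requirement "(Self.[Meal]Commodity,
# Self.[Meal]Fuel) -> ()" against the witness "(Int, Bool) -> ()":
requirement = (("Self.[Meal]Commodity", "Self.[Meal]Fuel"), "()")
witness = (("Int", "Bool"), "()")
partial = derive_partial_solution(requirement, witness)
```

This sketch does not reject a pairing in which the same dependent member type appears twice with two different witnesses; in the scheme described here, such conflicts are caught later, by the consistency check on solutions.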
We collect the partial solutions from each candidate value witness to form a \emph{disjunction} for each value requirement. The list of disjunctions obtained from each value requirement gives us our \emph{problem instance}: +Each candidate value witness gives us a \emph{partial solution}, or a list of ``$\text{associated type} \mapsto \text{type witness}$'' pairs, as follows. We walk the interface type of the requirement in parallel with the interface type of the witness. If they differ at any position, we reject the pairing, unless the corresponding child of the requirement's interface type is a dependent member type \texttt{Self.[P]A}; in this case, we add a \emph{type witness assignment} to our partial solution. For each value requirement, we collect the partial solutions given by its candidate value witnesses into a \emph{disjunction}. The list of disjunctions obtained from all value requirements is our \emph{problem instance}: \begin{ceqn} \[ \text{problem instance} = \text{list of disjunctions} = \text{list of lists of partial solutions} \] \end{ceqn} -A \emph{solution} for our problem is a single list of ``$\text{associated type} \mapsto \text{type witness}$'' pairs. A solution is \emph{complete} if every associated type appears \emph{at least} once on the left-hand side of ``$\,\mapsto\,$'', and \emph{consistent} if every associated type appears \emph{at most} once on the left-hand side of ``$\,\mapsto\,$''. A solution \emph{covers} a disjunction if it contains one of the partial solutions from the disjunction as a subset. Our goal is to find \textsl{a complete and consistent solution that covers all disjunctions.} +A \emph{solution} to our problem instance is then just a very special kind of partial solution. We say that a solution is \emph{complete} if every associated type appears \emph{at least} once on the left-hand side of ``$\,\mapsto\,$'', and \emph{consistent} if every associated type appears \emph{at most} once on the left-hand side of ``$\,\mapsto\,$''.
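The search itself can be sketched with a brute-force enumeration (again illustrative Python, not the compiler's implementation, which uses backtracking; the problem instance below is invented). We pick one partial solution from each disjunction, so the merged result contains a partial solution from every disjunction as a subset; we reject a merge that assigns two different witnesses to the same associated type, and we keep the merges that assign every associated type:

```python
from itertools import product

def solve(disjunctions, assoc_types):
    """Enumerate the complete and consistent solutions; each one contains
    a partial solution from every disjunction as a subset."""
    solutions = []
    for choice in product(*disjunctions):  # one partial solution per disjunction
        merged = {}
        consistent = all(merged.setdefault(assoc, wit) == wit
                         for partial in choice
                         for assoc, wit in partial.items())
        # a consistent merge contains each chosen partial solution by
        # construction; it remains to check completeness
        if consistent and all(a in merged for a in assoc_types):
            solutions.append(merged)
    return solutions

# A toy problem instance with two disjunctions over associated types A and B:
instance = [
    [{"A": "Int"}, {"A": "Bool"}],  # partial solutions for one requirement
    [{"A": "Int", "B": "String"}],  # partial solutions for another
]
solutions = solve(instance, ["A", "B"])
```

Only the choice of \texttt{\{"A": "Int"\}} from the first disjunction merges consistently with the second, so this toy instance has exactly one solution.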
A solution \emph{covers} a disjunction if it contains one of the partial solutions from the disjunction as a subset. We can now state our goal: given a problem instance, \textsl{find a complete and consistent solution that covers all disjunctions.} \begin{example}\label{assoc type inference example 2} Given the declarations of \ExRef{assoc type inference example 1}, the below table shows the interface type of each value requirement and candidate value witness: @@ -859,20 +859,20 @@ \section{Associated Type Inference}\label{associated type inference} \paragraph{Incomplete solutions.} The solver cycles through every consistent solution that covers every disjunction, but certain solutions might not assign every associated type, in which case we will get a solution that is consistent but not complete. In this case, we try a few more things to fill in the missing type witnesses before we give up on this solution: \begin{enumerate} -\item We analyze at the \index{associated same-type requirement}associated same-type requirements of the protocol, to see if this type witness is equivalent to some other known type witness. +\item We analyze the protocol's \index{associated same-type requirement}associated same-type requirements to see if this type witness is equivalent to some other known type witness. \item We check if the conforming nominal type declares a \index{generic parameter declaration!associated type inference}generic parameter with the same name as the associated type. \item We look for a \index{default type witness}default type witness, either on the associated type declaration we are attempting to infer, or some other associated type declaration with the same name, in some other protocol the conforming type conforms to. \end{enumerate} We will now look at an example of each behavior. -\begin{example} +\begin{example}\label{abstract type witness example} The first behavior handles a rather common scenario.
Suppose we have a \tSequence\ conformance whose \nElement\ type is just some fixed concrete type. We could write it as follows: \begin{Verbatim} struct Fibonacci: Sequence { struct Iterator: IteratorProtocol { - mutating func next() -> Int? { ... } + mutating func next() -> Int? {...} } - func makeIterator() -> Iterator { ... } + func makeIterator() -> Iterator {...} } \end{Verbatim} Consider the nested type declaration first. We deduce the type witness for \nElement\ from \texttt{next()} in the \tIterator\ conformance: @@ -892,7 +892,7 @@ \section{Associated Type Inference}\label{associated type inference} \begin{Verbatim} struct Permutations: IteratorProtocol { // synthesized: typealias Element = Array - func next() -> [Element]? { ... } + func next() -> [Element]? {...} } \end{Verbatim} The \nElement\ type witness in the conformance is \texttt{Array<\rT>}, and not \rT, despite what the generic parameter's name may suggest. Thus, \texttt{Permutations.Element} resolves to \texttt{Array} below: @@ -910,7 +910,7 @@ \section{Associated Type Inference}\label{associated type inference} \end{example} \paragraph{Finishing up.} -If the above produces a complete solution, we still need to check that our type witness assignment actually satisfies the protocol's associated requirements, using \AlgRef{check generic arguments algorithm} from \SecRef{checking generic arguments}. If the algorithm accepts, we have a \emph{valid} solution. Just like in the \index{expression type checker}expression checker (\SecRef{more types}), we collect all valid solutions, and consider the three possibilities: +If the above produces a complete solution, we still need to check that our type witness assignment actually satisfies the protocol's associated requirements, using \AlgRef{check generic arguments algorithm} from \SecRef{checking generic arguments}. If the algorithm accepts, we have a \emph{valid} solution. 
Just like in the \index{expression type checker}expression type checker (\SecRef{sec:more types}), we collect all valid solutions, and consider the three possibilities: \begin{itemize} \item \textbf{One solution}---there is just one unique way to assign a type witness to each associated type in the conformance. \item \textbf{No solutions}---no possible assignment of type witnesses could be deduced from the conformance. @@ -973,9 +973,9 @@ \section{Associated Type Inference}\label{associated type inference} \end{array} \] \end{ceqn} -A Boolean formula is \emph{satisfiable} if there exists at least one \emph{satisfying assignment} under which the formula evaluates to 1. The \emph{satisfiability problem}, or \index{SAT problem}SAT, asks if a given Boolean formula is satisfiable. We will now show that associated type inference can solve SAT, by encoding a Boolean formula as a conformance, where a valid assignment of type witness corresponds to a satisfying assignment of truth values. +A Boolean formula is \emph{satisfiable} if there exists at least one \emph{satisfying assignment} under which the formula evaluates to 1. The \emph{satisfiability problem}, or \index{SAT problem}SAT, asks if a given Boolean formula is satisfiable. We will now show that associated type inference can solve SAT, by encoding a Boolean formula as a conformance, where a valid assignment of type witnesses corresponds to a satisfying assignment of truth values. -We only need to consider a restricted form of the SAT problem. Let's say that a single variable~$x$ or its negation~$\neg x$ is a \index{literal!in SAT}\emph{literal}, while a \index{clause!in SAT}\emph{clause} is a disjunction of one or more literals, for example $(\neg x_1 \vee x_2 \vee \neg x_3)$. A formula is in \index{conjunctive normal form}\emph{conjunctive normal form}, or \index{CNF|see{conjunctive normal form}}CNF, if it is a conjunction of clauses: +We only need to consider a restricted form of the SAT problem. 
Let's say that a single variable~$x$ or its negation~$\neg x$ is a \index{literal!in SAT}\emph{literal}, while a \index{clause in SAT}\emph{clause} is a disjunction of one or more literals, for example $(\neg x_1 \vee x_2 \vee \neg x_3)$. A formula is in \index{conjunctive normal form}\emph{conjunctive normal form}, or \index{CNF|see{conjunctive normal form}}CNF, if it is a conjunction of clauses: \[(\neg x_1 \vee x_2 \vee x_3) \wedge (\neg x_1 \vee x_2 \vee \neg x_3) \wedge (\neg x_2 \vee x_3) \wedge (x_1 \vee x_2) \wedge (\neg x_1 \vee \neg x_2)\] The above formula is more precisely in 3CNF, meaning it has the additional property that no clause contains more than three literals. In fact, an arbitrary Boolean formula can always be converted first into CNF, and then 3CNF, in a way that preserves satisfiability (or lack thereof). Thus, for our purposes it suffices to solve 3SAT, as the restriction of the SAT problem to 3CNF is known. @@ -1046,30 +1046,28 @@ \section{Associated Type Inference}\label{associated type inference} \] There is one caveat with how we interpret the output of our ``SAT solver.'' Formally, SAT is a \emph{decision problem} with a true or false answer; it just asks if at least one satisfying assignment \emph{exists}. On the other hand, associate type inference will enumerate \emph{all} valid solutions and attempt to pick the best one. Our example formula was not only satisfiable, but it happened to have a unique satisfying assignment. In general, if associated type inference finds a unique solution, \emph{or} if it diagnoses an ambiguity because there is more than one valid solution, we can conclude that our Boolean formula is satisfiable. If there is no solution, there is no satisfying assignment. But what does this all \emph{mean}? -\paragraph{Non-deterministic polynomial time.} -To understand the significance of this, we turn to \cite{garey1979computers}. -In theory, we can solve satisfiability with a \emph{non-deterministic algorithm}. 
First, we fix a scheme for encoding a truth assignment as a string of symbols; for example, we can write down a series of 0~and~1 values assigned to each of our $n$ variables, in some fixed order. Our non-deterministic algorithm then ``guesses'' all $2^n$ truth assignments simultaneously, and ``checks'' each assignment in parallel, by evaluating the formula to obtain a true or false result. The non-deterministic algorithm then outputs success if at least one thread of execution had a true result, otherwise it outputs failure. +\paragraph{Non-deterministic polynomial time.} We've arrived at one of the most celebrated ideas in theoretical computer science. A classic text on this topic is~\cite{garey1979computers}, while \cite{maccormick2018can} has a gentler introduction. We can start by describing a \emph{non-deterministic} algorithm to solve satisfiability. We fix an encoding of truth assignments as strings of symbols; for example, we can write down the 0~and~1 values assigned to each of our $n$ variables, in some fixed order. Our non-deterministic algorithm then simultaneously ``guesses'' all $2^n$ truth assignments, and ``checks'' each assignment in parallel with all others, by evaluating the formula to obtain a true or false result. The overall algorithm outputs success if at least one thread outputs success, otherwise it outputs failure. -Our non-deterministic algorithm for satisfiability is also extremely efficient! Indeed, if someone hands us an assignment of truth values, we can \emph{check} if it is a satisfying assignment in $O(n)$ \index{asymptotic complexity}steps, where $n$ is the length of the formula. 
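Both phases are easy to simulate deterministically, at exponential cost. The sketch below (illustrative Python; a literal is encoded as a pair of a variable number and a negation flag) runs the linear-time clause check over all $2^3$ truth assignments of the example formula from earlier:

```python
from itertools import product

# (!x1 v x2 v x3) & (!x1 v x2 v !x3) & (!x2 v x3) & (x1 v x2) & (!x1 v !x2)
clauses = [
    [(1, True), (2, False), (3, False)],
    [(1, True), (2, False), (3, True)],
    [(2, True), (3, False)],
    [(1, False), (2, False)],
    [(1, True), (2, True)],
]

def satisfies(assignment, clauses):
    # the linear-time "checking" phase: every clause needs a true literal;
    # a literal (var, negated) is true when assignment[var] != negated
    return all(any(assignment[var] != negated for var, negated in clause)
               for clause in clauses)

# the "guessing" phase, simulated sequentially over all 2^3 assignments
models = [assignment
          for bits in product([False, True], repeat=3)
          for assignment in [dict(zip([1, 2, 3], bits))]
          if satisfies(assignment, clauses)]
```

As noted above, this particular formula turns out to have exactly one satisfying assignment, namely $x_1 = 0$, $x_2 = 1$, $x_3 = 1$.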
More generally, a decision problem belongs to the class of \index{non-deterministic polynomial time}\emph{non-deterministic polynomial time problems}, also known as \index{NP problem}NP for short, if we can check that a given solution satisfies the problem instance with an algorithm that always terminates in at most $O(n^k)$ steps, where $n$ is the size of the instance, and $k$ is a fixed constant. +As far as non-deterministic algorithms go, this one is extremely efficient. If someone hands us an assignment of truth values, we can \emph{check} if it is a satisfying assignment in $O(n)$ \index{asymptotic complexity}steps, where $n$ is the length of the formula. More generally, a decision problem belongs to the class of \index{non-deterministic polynomial time}\emph{non-deterministic polynomial time problems}, also known as \index{NP problem}NP for short, if we can check that a given solution satisfies the problem instance with an algorithm that always terminates in at most $O(n^k)$ steps, where $n$ is the size of the instance, and $k$ is a fixed constant. -The checking phase of a non-deterministic polynomial time algorithm terminates in a finite number of steps, so it must only consume a finite amount of memory. In a 1971~paper, \index{Stephen Cook}Stephen A.~Cook used this observation to show that \emph{every} NP problem can be encoded as an instance of Boolean satisfiability, by representing a finite execution of a non-deterministic algorithm with a Boolean formula~\cite{cook}. Boolean satisfiability was only the first of many \index{NP-complete problem}\emph{NP-complete} problems to be discovered. Indeed, we can encode SAT as 3SAT, so 3SAT is also NP-complete. The NP-complete problems are ``the hardest'' problems in NP, because they have every \emph{other} problem in NP as a special case. +We require that the checking phase terminates on all inputs, which implies it only uses a finite amount of memory. 
In a 1971~paper, \index{Stephen Cook}Stephen A.~Cook used this observation to show that the execution of a non-deterministic algorithm can be modeled by a Boolean formula (the general idea here is almost like assembling a finite digital circuit). In other words, \emph{every} NP problem can be translated into a Boolean satisfiability problem---and importantly, this encoding itself takes only polynomial time and space~\cite{cook}. -Finally, if we can encode some NP-complete problem as an instance of our problem which itself is not necessarily in NP, we just say that our problem is \index{NP-hard problem}\emph{NP-hard}, so the NP-hard problems are ``at least as hard'' as the problems in NP. We've shown that: +Boolean satisfiability was the first of many \index{NP-complete problem}\emph{NP-complete} problems to be discovered. The NP-complete problems are ``the hardest'' problems in NP, because they have every \emph{other} problem in NP as a special case, and they are all ``equally hard'' in this sense. More generally, if our problem can encode an NP-complete problem, but our problem is itself not necessarily in NP, we just say that our problem is \index{NP-hard problem}\emph{NP-hard}: \begin{theorem}\label{assoc np hard} Swift associated type inference is NP-hard. \end{theorem} -In practice, no machine is capable of infinite parallelism, so we cannot actually \emph{run} a non-deterministic algorithm as specified, unless of course, we simulate every possible thread of execution sequentially. In particular, there is no apparent way to translate a non-deterministic \emph{polynomial time} algorithm into an equally efficient deterministic algorithm for a real computer. +But why are these problems ``hard''? The reason is very simple. No machine is capable of unlimited parallelism, so we cannot actually \emph{run} a non-deterministic algorithm as specified, unless of course, we simulate every possible thread of execution sequentially.
In particular, there is no apparent way to translate a non-deterministic \emph{polynomial time} algorithm into an equally efficient deterministic algorithm for a real computer. Notice that the running time of \AlgRef{associated type inference algo} is worst case \emph{exponential}, due to the use of backtracking search. We can make various improvements to reduce the size of the search space, so the running time need not be exponential on \emph{every} instance. However, if we could actually devise an algorithm that could \emph{always} solve associated type inference in polynomial time, then by \ThmRef{assoc np hard}, our algorithm would be quite remarkable indeed---because it could solve \emph{every} NP problem in polynomial time. On the other hand, if only someone could prove that \emph{at least one} NP problem cannot be solved by a deterministic polynomial time algorithm, then we could immediately conclude the same about \emph{every} NP-hard problem, so in particular, it would preclude the possibility of solving associated type inference in polynomial time. This is, of course, the famous ``{P} $\stackrel{?}{=}$ {NP}'' problem, which remains unsolved at the present time. For a survey of the problem, see \cite{pvnp}. \paragraph{Overload resolution.} -In realistic Swift programs, associated type inference is only ever presented with a handful of requirements and candidate witnesses at a time, and the search space explored by \AlgRef{associated type inference algo} is not large. However, this matching of requirements against candidate witnesses is just a special case of \emph{overload resolution}, performed by the \index{expression type checker}expression type checker. Swift programmers sometimes run into the NP-hardness of that problem, when \texttt{the compiler is unable to type-check this expression in reasonable time}. 
It can also be shown that overload resolution is NP-hard in the \index{C${}^\sharp$}$\mathrm{C}^\sharp$~programming language \cite{csharpsat}, using an encoding of the SAT problem quite similar to ours. +In realistic Swift programs, associated type inference is only ever presented with a handful of requirements and candidate witnesses at a time, and the search space explored by \AlgRef{associated type inference algo} is not large. However, this matching of requirements against candidate witnesses is just a special case of \index{overload resolution|see{expression type checker}}\emph{overload resolution}, performed by the \index{expression type checker}expression type checker. Swift programmers sometimes run into the NP-hardness of that problem, when \texttt{the compiler is unable to type-check this expression in reasonable time}. It can also be shown that overload resolution is NP-hard in the \index{C${}^\sharp$}$\mathrm{C}^\sharp$~programming language \cite{csharpsat}, using an encoding of the SAT problem quite similar to ours. -We will not explain it here, but in fact associated type inference can be seen as an instance of an \index{exact cover}\emph{exact cover with colors}, or \index{XCC problem}XCC problem. XCC is also NP-complete of course, but just as with SAT, clever algorithms have been devised that can solve many instances of XCC efficiently. This is a possible future direction if the time spent in associated type inference ever becomes non-trivial in a real-world project. Algorithms for solving XCC and SAT are discussed in~\cite{art4b}. +We will not explain it here, but in fact associated type inference can be seen as an instance of an \index{exact cover}\emph{exact cover with colors}, or \index{XCC problem}XCC problem. XCC is also NP-complete of course, but just as with SAT, clever algorithms have been devised that can solve many instances of XCC efficiently. 
This is a possible future direction if the time spent in associated type inference ever becomes non-trivial in a real-world project. Algorithms for solving XCC and SAT are discussed in~\cite{art4b}. Finally, another good reference about the SAT problem is~\cite{sataa}. \begin{ceqn} \[\ast \ast \ast\] @@ -1078,42 +1076,45 @@ \section{Associated Type Inference}\label{associated type inference} \vfill \eject -\section{Source Code Reference}\label{conformancesourceref} +\section{Source Code Reference}\label{src:conformances} \subsection*{Global Conformance Lookup} Key source files: \begin{itemize} \item \SourceFile{include/swift/AST/ConformanceLookup.h} -\item \SourceFile{lib/AST/ConformanceLookupTable.h} \item \SourceFile{lib/AST/ConformanceLookup.cpp} +\item \SourceFile{lib/AST/ConformanceLookupTable.h} \item \SourceFile{lib/AST/ConformanceLookupTable.cpp} -\end{itemize} -Other source files: -\begin{itemize} \item \SourceFile{include/swift/AST/DeclContext.h} \end{itemize} -\IndexSource{global conformance lookup} \apiref{lookupConformance()}{function} -Performs global conformance lookup of a type to a protocol. Does not check conditional requirements. To check conditional requirements, use \texttt{checkConformance()} described in \SecRef{extensionssourceref}. +Performs \IndexSource{global conformance lookup}global conformance lookup of a type to a protocol. Note that for a \IndexSource{conditional conformance}conditional conformance, it does not check the \IndexSource{conditional requirement}conditional requirements. To check conditional requirements, use \texttt{checkConformance()} from \SecRef{src:extensions} instead.
+ +\subsection*{Conformance Lookup Table} + +Key source files: +\begin{itemize} +\item \SourceFile{lib/AST/ConformanceLookupTable.h} +\item \SourceFile{lib/AST/ConformanceLookupTable.cpp} +\item \SourceFile{include/swift/AST/DeclContext.h} +\end{itemize} -\IndexSource{conformance lookup table} \apiref{ConformanceLookupTable}{class} -A conformance lookup table for a nominal type. Every \texttt{NominalTypeDecl} has a private instance of this class, but it is not exposed outside of the global conformance lookup implementation. +A \IndexSource{conformance lookup table}conformance lookup table for a nominal type declaration. Every \texttt{NominalTypeDecl} has a conformance lookup table, but it is not exposed outside of the global conformance lookup implementation. -\IndexSource{local conformance} \apiref{IterableDeclContext}{class} Base class inherited by \texttt{NominalTypeDecl} and \texttt{ExtensionDecl}. \begin{itemize} -\item \texttt{getLocalConformances()} returns a list of conformances directly declared on this nominal type or extension. +\item \texttt{getLocalConformances()} returns a \IndexSource{local conformance} +list of conformances directly declared on this nominal type or extension. \end{itemize} -\index{nominal type declaration} \apiref{NominalTypeDecl}{class} -See also \SecRef{declarationssourceref}. +See also \SecRef{src:declarations}. \begin{itemize} -\item \texttt{getAllConformances()} returns a list of all conformances declared on this nominal type, its extensions, and inherited from its superclass, if any. +\item \texttt{getAllConformances()} returns a list of all conformances declared on this nominal type and its extensions, together with any inherited from its \IndexSource{superclass declaration}superclass.
\end{itemize} \subsection*{Operations on Conformances} @@ -1126,27 +1127,55 @@ \subsection*{Operations on Conformances} \item \SourceFile{lib/AST/ProtocolConformance.cpp} \end{itemize} -\paragraph{Canonical conformances.} -Along the same lines as types and substitution maps, specialized conformances are immutable, and uniquely allocated for each pairing of a normal conformance and substitution map. Specialized conformances can be compared for pointer equality, but this depends on type sugar unless the conformance is \IndexDefinition{canonical conformance}\emph{canonical}. A specialized conformance is canonical if the replacement types and conformances in its substitution map are \index{canonical substitution map}canonical, while normal and abstract conformances are always canonical. We canonicalize a conformance by canonicalizing its substitution map and forming a new specialized conformance. Normal conformances are always considered to be canonical. - -\IndexSource{conformance} -\IndexSource{abstract conformance} \apiref{ProtocolConformanceRef}{class} -A protocol conformance. Stores a single pointer, and is cheap to pass around by value. +A protocol \IndexSource{conformance}conformance. The representation fits in a single pointer, so values of this type are cheap to pass by value. An invalid conformance is encoded using a null pointer, and it is an error to call most of the below operations on an invalid conformance. + +Usually, instances of \texttt{ProtocolConformanceRef} are obtained by substitution or global conformance lookup. It is also possible to construct them directly: +\begin{itemize} +\item To get an invalid conformance, call the default constructor, or equivalently, the \texttt{ProtocolConformanceRef::forInvalid()} static method. +\item To wrap a concrete conformance, call the one-argument constructor that takes a \verb|ProtocolConformance *|. +\item To get an abstract conformance, call the \texttt{forAbstract()} static method. 
+\end{itemize} +A \texttt{ProtocolConformanceRef} can be taken apart: \begin{itemize} -\item \texttt{isInvalid()} answers if this is an invalid conformance reference, meaning the type did not actually conform to the protocol. -\item \texttt{isAbstract()} answers if this is an abstract conformance reference. -\item \texttt{isConcrete()} answers if this is a concrete conformance reference. -\item \texttt{getConcrete()} returns the \texttt{ProtocolConformance} instance if this is a concrete conformance. -\item \texttt{getRequirement()} returns the \texttt{ProtocolDecl} instance if this is an abstract or concrete conformance. +\item \texttt{isInvalid()} checks if this conformance is invalid. +\item \texttt{isAbstract()} checks if this conformance is abstract. +\item \texttt{getAbstract()} returns the stored \verb|AbstractConformance *| if this conformance is abstract, otherwise asserts. +\item \texttt{isConcrete()} checks if this conformance is concrete. +\item \texttt{getConcrete()} returns the stored \verb|ProtocolConformance *| if this conformance is concrete, otherwise asserts. +\end{itemize} +Often, we do not care if a protocol conformance is abstract or concrete, because we instead use the following methods of \texttt{ProtocolConformanceRef} that work on both: +\begin{itemize} +\item \texttt{getType()} returns the \IndexSource{conforming type}conforming type. +\item \texttt{getProtocol()} returns the \IndexSource{protocol declaration!conformance}\texttt{ProtocolDecl} being conformed to. +\item \texttt{getTypeWitness()} returns the \IndexSource{type witness}type witness for the given associated type declaration. +\item \texttt{getAssociatedConformance()} returns the \IndexSource{associated conformance}associated conformance for an associated conformance requirement of the conformed protocol. +\item \texttt{subst()} returns a new protocol \IndexSource{conformance substitution}conformance obtained by applying a substitution map to this conformance. 
+\end{itemize} +Like \texttt{Type}, \texttt{GenericSignature}, and \texttt{SubstitutionMap}, conformances are immutable and uniquely allocated. Thus, conformances can be tested for equality using the \texttt{operator==} overload. This depends on type sugar, unless the conformance is \IndexDefinition{canonical conformance}\emph{canonical}. +\begin{itemize} +\item \texttt{isCanonical()} answers if this conformance is canonical. +\item \texttt{getCanonical()} returns the canonical conformance equivalent to this one. +\end{itemize} + +\apiref{AbstractConformance}{class} +An abstract \IndexSource{abstract conformance}protocol conformance. This class is rarely used directly, because both of its operations are available on \texttt{ProtocolConformanceRef} itself. + \begin{itemize} +\item \texttt{getType()} returns the conforming type. +\item \texttt{getProtocol()} returns the \texttt{ProtocolDecl} being conformed to. +\end{itemize} +Abstract conformances with the same conforming type and protocol are equal as pointers. An abstract conformance is canonical if its conforming type is a canonical type. + +\apiref{ProtocolConformance}{class} +A \IndexSource{concrete conformance}concrete protocol conformance. Concrete protocol conformances are always passed by pointer. Concrete conformances can have conditional requirements; this is documented in \SecRef{sec:conditional conformances} and \SecRef{src:extensions}. +\begin{itemize} +\item \texttt{getType()} returns the \IndexSource{conforming type}conforming type. +\item \texttt{getProtocol()} returns the conformed protocol. \item \texttt{getTypeWitness()} returns the \IndexSource{type witness}type witness for an associated type. \item \texttt{getAssociatedConformance()} returns the \IndexSource{associated conformance}associated conformance for a conformance requirement in the protocol's requirement signature. \item \texttt{subst()} returns the protocol conformance obtained by applying a substitution map to this conformance. 
\end{itemize} - -\IndexSource{concrete conformance} -\apiref{ProtocolConformance}{class} -A concrete protocol conformance. This class is the root of a class hierarchy shown in \FigRef{conformancehierarchy}. Concrete protocol conformances are allocated in the AST context, and are always passed by pointer. See \SecRef{extensionssourceref} for documentation about conditional conformance. +The \texttt{ProtocolConformance} class is the root of a class hierarchy shown in \FigRef{conformancehierarchy}. \begin{figure}\captionabove{The \texttt{ProtocolConformance} class hierarchy}\label{conformancehierarchy} \begin{center} @@ -1167,24 +1196,16 @@ \subsection*{Operations on Conformances} \end{center} \end{figure} -\begin{itemize} -\item \texttt{getType()} returns the \IndexSource{conforming type}conforming type. -\item \texttt{getProtocol()} returns the conformed protocol. -\item \texttt{getTypeWitness()} returns the \IndexSource{type witness}type witness for an associated type. -\item \texttt{getAssociatedConformance()} returns the \IndexSource{associated conformance}associated conformance for a conformance requirement in the protocol's requirement signature. -\item \texttt{subst()} returns the protocol conformance obtained by applying a substitution map to this conformance. -\end{itemize} - \apiref{RootProtocolConformance}{class} Abstract base class for \texttt{NormalProtocolConformance} and \texttt{SelfProtocolConformance}. Inherits from \texttt{ProtocolConformance}. -\IndexSource{normal conformance} \apiref{NormalProtocolConformance}{class} -A normal protocol conformance. Subclass of \texttt{RootProtocolConformance}. +A \IndexSource{normal conformance}normal protocol conformance. Subclass of \texttt{RootProtocolConformance}. \begin{itemize} \item \texttt{getDeclContext()} returns the conforming declaration context, either a nominal type declaration or extension. \item \texttt{getGenericSignature()} returns the generic signature of the conforming context. 
\end{itemize} +A normal conformance is always canonical. \IndexSource{inherited conformance} \apiref{InheritedProtocolConformance}{class} @@ -1192,6 +1213,7 @@ \subsection*{Operations on Conformances} \begin{itemize} \item \texttt{getInheritedConformance()} returns the base conformance, which must be normal or specialized. \end{itemize} +An inherited conformance is canonical if its conforming type is a canonical type, and its base conformance is a canonical conformance. \IndexSource{conformance substitution map} \IndexSource{specialized conformance} @@ -1201,32 +1223,81 @@ \subsection*{Operations on Conformances} \item \texttt{getGenericConformance()} returns the underlying normal conformance. \item \texttt{getSubstitutionMap()} returns the conformance substitution map. \end{itemize} - -\apiref{SubstitutionMap}{class} -See also \SecRef{substmapsourcecoderef}. A static method for constructing a \IndexSource{protocol substitution map}protocol substitution map from a conformance: -\begin{itemize} -\item \texttt{getProtocolSubstitutions()} builds a new substitution map from a conforming type and a conformance of this type to a protocol. -\end{itemize} +A specialized conformance is canonical if its conformance substitution map is \index{canonical substitution map}canonical. To canonicalize a specialized conformance, we canonicalize the elements of its substitution map, and form a new specialized conformance. \subsection*{Type Substitution} Key source files: \begin{itemize} +\item \SourceFile{include/swift/AST/SubstitutionMap.h} \item \SourceFile{lib/AST/TypeSubstitution.cpp} \end{itemize} -\apiref{TypeSubstituter::transformDependentMemberType()}{method} -Implements \AlgRef{dependent member type substitution}. +\apiref{SubstitutionMap}{class} +We discussed substitution maps in \SecRef{src:substitution maps}. 
Recall that a \IndexSource{substitution map}substitution map stores a list of conformances, one for each \IndexSource{conformance requirement}conformance requirement in its \IndexSource{input generic signature}input generic signature. Three overloads of the \texttt{get()} static method construct substitution maps. They differ in how the \IndexSource{replacement type}replacement types and conformances are specified: -\subsection*{Lazy Loading} +\medskip +\noindent +\texttt{get(GenericSignature, ArrayRef<Type>, ArrayRef<ProtocolConformanceRef>)}:\newline +Builds a new substitution map from an input generic signature, an array of replacement types, and an array of conformances. The first array's elements are in one-to-one correspondence with the signature's generic parameters, and the second array's elements are in one-to-one correspondence with the signature's conformance requirements. -The interface between conformances and the module system is mediated by an abstract base classes defined in the below header file: -\begin{itemize} -\item \SourceFile{include/swift/AST/LazyResolver.h} -\end{itemize} +\medskip +\noindent +\texttt{get(GenericSignature, ArrayRef<Type>, LookupConformanceFn)}:\newline +Builds a new substitution map from an input generic signature and an array of replacement types. Instead of an array of conformances, this form takes a callback, which is invoked once for each conformance requirement. + +\medskip +\noindent +\texttt{get(GenericSignature, TypeSubstitutionFn, LookupConformanceFn)}:\newline +Builds a new substitution map by invoking a pair of callbacks, one to produce each replacement type and one to produce each conformance. + +\medskip -\apiref{LazyConformanceLoader}{class} -Abstract base class implemented by different kinds of modules to fill out conformances.
For \index{serialized module}serialized modules, this populates the mapping from requirements to witnesses by deserializing records. For \index{imported module}imported modules, this populates the mapping by inspecting \index{Clang}Clang declarations. +Finally, the \texttt{getProtocolSubstitutions()} static method builds a \IndexSource{protocol substitution map}protocol substitution map for a \IndexSource{protocol generic signature}protocol generic signature, given a conformance to this protocol. + +\apiref{TypeSubstitutionFn}{type alias} +The type of a replacement type callback for the third form of \texttt{SubstitutionMap::get()}. +\begin{verbatim} +using TypeSubstitutionFn + = llvm::function_ref<Type(SubstitutableType *)>; +\end{verbatim} +The parameter type is always a \texttt{GenericTypeParamType *} when the callback is used with \texttt{SubstitutionMap::get()}. + +\IndexSource{conformance lookup callback} +\apiref{LookupConformanceFn}{type alias} +The type signature of a conformance lookup callback for \texttt{SubstitutionMap::get()}. +\begin{verbatim} +using LookupConformanceFn = llvm::function_ref< + ProtocolConformanceRef(InFlightSubstitution &IFS, + CanType origType, + ProtocolDecl *proto)>; +\end{verbatim} +The \texttt{origType} and \texttt{proto} are the subject type and protocol declaration of a conformance requirement in the input generic signature of the substitution map being constructed. If desired, the \texttt{InFlightSubstitution} instance can be used to recover the substituted subject type as follows: +\begin{Verbatim} +Type substType = origType.subst(IFS); +\end{Verbatim} + +\apiref{LookUpConformanceInModule}{struct} +A callback intended to be used with \texttt{SubstitutionMap::get()} as a conformance lookup callback. Overloads \texttt{operator()} with the signature of \texttt{LookupConformanceFn} to perform a global conformance lookup with the given requirement's substituted subject type and protocol. An instance of this callback is constructed without arguments.
For example: +\begin{Verbatim} +auto subMap = SubstitutionMap::get(genericSig, replacementTypes, + LookUpConformanceInModule()); +\end{Verbatim} + +\IndexSource{local conformance lookup callback} +\apiref{LookUpConformanceInSubstitutionMap}{struct} +A callback intended to be used with \texttt{SubstitutionMap::get()} as a conformance lookup callback. Overloads \texttt{operator()} with the signature of \texttt{LookupConformanceFn} to perform a local conformance lookup into another substitution map (\SecRef{abstract conformances}). Constructed with another \texttt{SubstitutionMap}. + +For example, if \texttt{genericSig} is the same as the input generic signature of \texttt{subMap} except that it drops some requirements, we can construct a substitution map for \texttt{genericSig} as follows: +\begin{Verbatim} +auto newMap = SubstitutionMap::get( + genericSig, + subMap.getReplacementTypes(), + LookUpConformanceInSubstitutionMap{subMap}); +\end{Verbatim} + +\apiref{TypeSubstituter::transformDependentMemberType()}{method} +Implements \AlgRef{dependent member type substitution}. \subsection*{Associated Type Inference} diff --git a/docs/Generics/chapters/declarations.tex b/docs/Generics/chapters/declarations.tex index 966fd4fd28f79..ce3a899c36864 100644 --- a/docs/Generics/chapters/declarations.tex +++ b/docs/Generics/chapters/declarations.tex @@ -2,16 +2,16 @@ \begin{document} -\chapter{Declarations}\label{decls} +\chapter{Declarations}\label{chap:decls} -\lettrine{D}{eclarations} are the \IndexDefinition{declaration}building blocks of Swift programs. In \ChapRef{compilation model}, we started by viewing the user's program as a series of \index{module declaration}module declarations, where a module declaration holds \index{file unit}file units. A file unit further holds a list of \IndexDefinition{top-level declaration}top-level declarations, which correspond to the main divisions in a source file. 
The different kinds of declarations are categorized into a taxonomy, and we will survey this taxonomy, as we did with types in \ChapRef{types}. Our principal goal will be describing the syntactic representations for declaring generic parameters and stating requirements, which are common to all generic declarations; once we have that, we can proceed to \PartRef{part semantics}. +\lettrine{D}{eclarations} are the \IndexDefinition{declaration}building blocks of Swift programs. In \ChapRef{chap:compilation model}, we saw that the user's entire program is a \index{module declaration}\emph{module declaration} at the root of a hierarchy, with \index{file unit}\emph{file units} as immediate children. Now, we will see that a file unit holds a list of \IndexDefinition{top-level declaration}\emph{top-level declarations}, which correspond to the main divisions in a source file, and these declarations can have further declarations nested within them. We will sort the different kinds of declarations into a taxonomy, as we did with types in \ChapRef{chap:types}. Then, we focus on the syntactic representations for declaring generic parameters and stating requirements, which are common to all generic declarations. Finally, we end the chapter with a discussion of functions, closures, and captured values. We begin with two major divisions in the declaration taxonomy: \begin{enumerate} -\item A \IndexDefinition{value declaration}\emph{value declaration} is a one that can be referenced by name from an \index{expression}expression; this includes variables, functions, and such. Every value declaration has an \IndexDefinition{interface type!value declaration}\emph{interface type}, which is the type assigned to an expression that names this declaration. +\item A \IndexDefinition{value declaration}\emph{value declaration} is one that can be referenced by name from an \index{expression}expression; this includes variables, functions, and such.
Every value declaration has an \IndexDefinition{interface type!value declaration}\emph{interface type}, which is the type we assign to an expression naming this declaration. \item A \IndexDefinition{type declaration}\emph{type declaration} is one that can be referenced by name from within a \index{type representation}type representation. This includes structs, type aliases, and so on. A type declaration declares a type, called the \IndexDefinition{declared interface type}\emph{declared interface type} of the type declaration. \end{enumerate} -Not all declarations are value declarations. An \index{extension declaration}extension declaration adds members to an existing nominal type declaration, as we'll see in \ChapRef{extensions}, but an extension does not itself have a name. A \IndexDefinition{top-level code declaration}\emph{top-level code declaration} holds the statements and expressions written at the top level of a source file, and again, it does not have a name, semantically. +Not all declarations are value declarations. An \index{extension declaration}extension declaration adds members to an existing nominal type declaration, as we'll see in \ChapRef{chap:extensions}, but an extension does not itself have a name. A \IndexDefinition{top-level code declaration}\emph{top-level code declaration} holds the statements and expressions written at the top level of a source file, and again, it does not have a name, semantically. \paragraph{Declaration contexts.} Every declaration is contained in a \IndexDefinition{declaration context}\emph{declaration context}, and a declaration context is anything that \emph{contains} declarations. Consider this program: \begin{Verbatim} @@ -21,29 +21,27 @@ \chapter{Declarations}\label{decls} \end{Verbatim} The \index{parameter declaration}parameter declaration ``\texttt{x}'' is a child of the closure expression ``\verb|{ x in x * x }|'', and not a direct child of the enclosing function declaration. 
So a \index{closure expression}closure expression is a declaration context, but not a declaration. On the other hand, a parameter declaration is a declaration, but not a declaration context. Finally, the \texttt{squares()} function itself is both a declaration, and a declaration context. -\paragraph{Type declarations.} Types can be written inside expressions, so every type declaration is also a value declaration. We can understand the relationship between the interface type and declared interface type of a type declaration by looking at this example: +\paragraph{Type declarations.} Since the Swift grammar allows type representations to appear inside expressions, every type declaration is \emph{also} a value declaration. The interface type of a type declaration is the metatype formed from its declared interface type. This is a mouthful, but the basic idea should be familiar to every Swift programmer. Consider a global variable with a type annotation and an initial value: \begin{Verbatim} struct Horse {} let myHorse: Horse = Horse() \end{Verbatim} -The struct declaration \index{horse}\texttt{Horse} is referenced twice, first in the type representation on the left-hand side of ``\texttt{=}'' and then again in the \index{expression}\index{initial value expression}initial value expression on the right. On the left-hand side, it's referenced as a type declaration; we want the \emph{declared interface type}, which is the nominal type \texttt{Horse}, because this type's values are stored inside the \texttt{myHorse} variable. The second reference to \texttt{Horse}, within the \index{call expression}call expression, refers to the \emph{type itself} as a value declaration, so we want the \emph{interface type}, which is the \index{metatype type}metatype \texttt{Horse.Type}. (Recall the diagram from \SecRef{more types}.) 
When a metatype is the callee in a call expression, we interpret it as looking up the member named \texttt{init}: +The struct declaration \index{horse}\texttt{Horse} is referenced twice here, first in the type annotation that follows ``\texttt{:}'', and then again in the \index{expression}\index{initial value expression}initial value expression that follows ``\texttt{=}''. Inside the type annotation, \texttt{Horse} means the \emph{declared interface type} of \texttt{Horse}, which is simply the nominal type~\texttt{Horse}; we're declaring that the \texttt{myHorse} variable stores a value whose type has the stated name. The second reference to \texttt{Horse}, from the initial value expression, refers to \emph{the type itself} as a value, so this uses its \emph{interface type}, which is the \index{metatype type}metatype \texttt{Horse.Type}. (Recall the diagram from \SecRef{sec:more types}.) Furthermore, this metatype value is the callee of a call expression, which is a shorthand for calling a constructor member named \texttt{init}. We can write this out as follows to be even more explicit: \begin{Verbatim} -struct Horse {} let myHorseType: Horse.Type = Horse.self let myHorse: Horse = myHorseType.init() \end{Verbatim} -The interface type of a type declaration always wraps its declared interface type in a metatype type. (It sounds like a mouthful, but the idea is simple.) \paragraph{Nominal type declarations.} -\IndexDefinition{nominal type declaration}Introduced with the \texttt{struct}, \IndexDefinition{enum declaration}\texttt{enum} and \IndexDefinition{class declaration}\texttt{class} keywords; \IndexSwift{5.5}Swift~5.5 also added \texttt{actor}, which to us is just a class~\cite{se0306}. Nominal type declarations are declaration contexts, and the declarations they contain are called their \IndexDefinition{member declaration}\emph{member declarations}. 
If a member declaration is a function, we call it a \IndexDefinition{method declaration}\emph{method}, a member variable is a \IndexDefinition{property declaration}\emph{property}, and a \IndexDefinition{member type declaration}\emph{member type declaration} is exactly that. +\IndexDefinition{nominal type declaration}Declared by the \texttt{struct}, \IndexDefinition{enum declaration}\texttt{enum}, and \IndexDefinition{class declaration}\texttt{class} keywords; \IndexSwift{5.5}Swift~5.5 also added \texttt{actor}, which to us is just a class~\cite{se0306}. Nominal type declarations are declaration contexts, and the declarations they contain are called their \IndexDefinition{member declaration}\emph{member declarations}. A function member declaration is commonly called a \IndexDefinition{method declaration}\emph{method}, a member variable is a \IndexDefinition{property declaration}\emph{property}, and a \IndexDefinition{member type declaration}\emph{member type declaration} is exactly that. -Structs and classes can contain a special kind of property declaration called a \IndexDefinition{stored property declaration}\emph{stored property declaration}. Struct values directly store their stored properties, while a class value is a reference to a heap allocated \index{boxing}box. Enum values store exactly one element among several; enum declarations instead contain \IndexDefinition{enum element declaration}\emph{enum element declarations}, introduced with the \texttt{case} keyword. +Structs and classes can contain a special kind of property declaration called a \IndexDefinition{stored property declaration}\emph{stored property declaration}. Struct values directly store their stored properties, while a class value is a reference to a heap-allocated \index{boxing}box that contains its stored properties.
A value of enum type stores exactly one element among several; instead of stored properties, enum declarations contain \IndexDefinition{enum element declaration}\emph{enum element declarations}, introduced with the \texttt{case} keyword. -The members of a nominal type declaration are visible to name lookup (\SecRef{name lookup}), both in the nominal type declaration's scope (unqualified lookup) and outside (qualified lookup). \ListingRef{unqualified lookup listing} shows three features we will cover in detail later: +The members of a nominal type declaration are visible to name lookup (\SecRef{name lookup}), both in the nominal type declaration's scope (unqualified lookup) and outside (qualified lookup). \ListingRef{unqualified lookup listing} shows three features we will discuss in detail later: \begin{itemize} -\item Nominal type declarations can conform to protocols (\ChapRef{conformances}). -\item Extensions add members to existing nominal type declarations (\ChapRef{extensions}). \item A class can inherit from a \index{superclass type}superclass type, and members of the superclass are also visible from the subclass (\SecRef{classinheritance}). +\item Nominal type declarations can conform to protocols (\ChapRef{chap:conformances}). +\item Extensions add members to existing nominal type declarations (\ChapRef{chap:extensions}). \end{itemize} \begin{listing}\captionabove{Some behaviors of name lookup}\label{unqualified lookup listing} @@ -84,7 +82,7 @@ \chapter{Declarations}\label{decls} \end{tikzpicture} \end{center} -We will say more about name lookup in \ChapRef{typeresolution} and \SecRef{direct lookup}. +We will say more about name lookup in \ChapRef{chap:type resolution} and \SecRef{direct lookup}. A nominal type declaration declares a new type with its own name and identity (hence ``nominal''). 
The declared interface type of a nominal type declaration is called a \index{nominal type}nominal type, which we talked about in \SecRef{fundamental types}: \begin{Verbatim} @@ -98,7 +96,7 @@ \chapter{Declarations}\label{decls} \end{Verbatim} The declared interface type of the \texttt{Galaxy} struct is \texttt{Universe.Galaxy}, while the declared interface type of \texttt{Planet} is just \texttt{Planet}, with no parent type. This reflects the semantic difference; \texttt{Galaxy} is visible to qualified lookup as a member of \texttt{Universe}, while \texttt{Planet} is only visible to unqualified lookup within the scope of \texttt{solarSystem()}; we call it a \IndexDefinition{local type declaration}\emph{local type declaration}. \SecRef{nested nominal types} gives more detail about nominal type nesting. -\paragraph{Type alias declarations.} These are introduced by the \IndexDefinition{type alias declaration}\texttt{typealias} keyword. The \IndexDefinition{underlying type}underlying type is written on the right-hand side of ``\texttt{=}'': +\paragraph{Type alias declarations.} These are introduced by the \IndexDefinition{type alias declaration}\texttt{typealias} keyword. The \IndexDefinition{underlying type!of type alias declaration}underlying type is written on the right-hand side of ``\texttt{=}'': \begin{Verbatim} typealias Hands = Int // one hand is four inches func measure(horse: Horse) -> Hands {...} @@ -111,23 +109,23 @@ \chapter{Declarations}\label{decls} While type aliases are declaration contexts, the only declarations a type alias can contain are generic parameter declarations, in the event the type alias is generic. \paragraph{Other type declarations.} -We've now seen the first two kinds of type declarations. In the next two sections, we expand on this by looking at the declarations of generic parameters, protocols and associated types. At this point, our foray into Swift generics can begin in earnest. 
Here are all the type declaration kinds and their declared interface types, with a representative specimen of each: +We've now seen the first two kinds of type declarations. In the next two sections, we expand on this by looking at the declarations of generic parameters, protocols, and associated types. At this point, our foray into Swift generics can begin in earnest. Here are all the type declaration kinds and their declared interface types, with a representative specimen of each: \begin{center} \begin{tabular}{ll} \toprule \textbf{Type declaration}&\textbf{Declared interface type}\\ \midrule Nominal type declaration:&Nominal type:\\ -\verb|struct Horse {...}|&\verb|Horse|\\ +\verb|struct Horse {...}|&\texttt{Horse}\\ \midrule Type alias declaration:&Type alias type:\\ -\verb|typealias Hands = Int|&\verb|Hands|\\ +\verb|typealias Hands = Int|&\texttt{Hands} (canonically \texttt{Int})\\ \midrule Generic parameter declaration:&Generic parameter type:\\ -\verb||&\verb|T| (or \rT)\\ +\verb|<T>|&\tT\ (canonically \rT)\\ \midrule Protocol declaration:&Protocol type:\\ -\verb|protocol Sequence {...}|&\verb|Sequence|\\ +\verb|protocol Sequence {...}|&\tSequence\\ \midrule Associated type declaration:&Dependent member type:\\ \verb|associatedtype Element|&\verb|Self.Element|\\ @@ -137,11 +135,11 @@ \chapter{Declarations}\label{decls} \section{Generic Parameters}\label{generic params} -Various kinds of declarations can have a \IndexDefinition{generic parameter list}generic parameter list. We call them \IndexDefinition{generic declaration}\emph{generic declarations}. We start with those where the generic parameter list is written in source: \index{struct declaration}structs, \index{enum declaration}enums, \index{class declaration}classes, \IndexDefinition{generic type alias}type aliases, \index{function declaration}functions and \index{constructor declaration}constructors, and \index{subscript declaration}subscripts.
In all cases, a \IndexDefinition{parsed generic parameter list}generic parameter list is denoted in source with the \texttt{<...>} syntax following the name of the declaration: +Various kinds of declarations can have a \IndexDefinition{generic parameter list}generic parameter list. We call them \IndexDefinition{generic declaration}\emph{generic declarations}. We start with those where the generic parameter list is written in source: \index{struct declaration}structs, \index{enum declaration}enums, \index{class declaration}classes, \IndexDefinition{generic type alias}type aliases, \index{function declaration}functions, \index{constructor declaration}constructors, and \index{subscript declaration}subscripts. In all cases, the \IndexDefinition{parsed generic parameter list}generic parameter list is declared by following the generic declaration's name with the \texttt{<...>} syntax: \begin{Verbatim} struct Outer<T> {...} \end{Verbatim} -Each comma-separated element in this list is a \IndexDefinition{generic parameter declaration}\emph{generic parameter declaration}; this is a type declaration that declares a generic parameter type. Generic parameter declarations are visible to unqualified lookup in the entire source range of the parent declaration, the one that has the generic parameter list. When generic declarations nest, each inner generic declaration is effectively parameterized by the generic parameters of its outer declarations. +Each comma-separated element in this list is a \IndexDefinition{generic parameter declaration}\emph{generic parameter declaration}; this is a type declaration that declares a generic parameter type. Generic parameter declarations are scoped to the entire \index{source range}source range of their parent declaration, the one with the generic parameter list.
A declaration with a generic parameter list can be nested inside another generic declaration; each inner generic declaration is effectively parameterized by its own generic parameters, together with all generic parameters of its outer declarations. Thus, unqualified lookup will ``see'' all outer generic parameter declarations. Any declaration kind that can have a generic parameter list is also a \index{declaration context}declaration context in our taxonomy, because it contains other declarations; namely, its generic parameter declarations. We say that a declaration context is a \IndexDefinition{generic context}\emph{generic context} if at least one parent context has a generic parameter list. @@ -161,7 +159,6 @@ \section{Generic Parameters}\label{generic params} } } \end{Verbatim} - When type resolution resolves the type representation ``\tT'' in the return type of \texttt{two()}, it outputs a generic parameter type that prints as ``\tT'', if it appears in a diagnostic for example. This is a \index{sugared type}sugared type. Every generic parameter type also has a canonical form which only records the depth and index; we denote a canonical generic parameter type by ``\ttgp{d}{i}'', where \texttt{d} is the depth and \texttt{i} is the index. Two generic parameter types are canonically equal if they have the same depth and index. This is sound, because the depth and index unambiguously identify a generic parameter within its lexical scope. Let's enumerate all generic parameters visible within \texttt{two()},
+The generic parameter~\tU\ of \texttt{two()} has the same \index{declared interface type!generic parameter declaration}declared interface type, \ttgp{1}{0}, as the generic parameter~\tV\ of \texttt{four()}. This is not a problem because the \index{source range}source ranges of their parent declarations, \texttt{two()} and \texttt{Both}, do not intersect. The numbering by depth can be seen in the \index{declared interface type!nested nominal type}declared interface type of a nested generic nominal type declaration. For example, the declared interface type of \texttt{Outer.Both} is the generic nominal type \texttt{Outer<\ttgp{0}{0}>.Both<\ttgp{1}{0}, \ttgp{1}{1}>}. -\paragraph{Implicit generic parameters.} Sometimes the generic parameter list is not written in source. Every protocol declaration has a generic parameter list with a single generic parameter named \IndexSelf\tSelf\ (\SecRef{protocols}), and every extension declaration has a generic parameter list cloned from that of the extended type (\ChapRef{extensions}). These implicit generic parameters can be referenced by name within their scope, just like the generic parameter declarations in a parsed generic parameter list (\SecRef{identtyperepr}). +\paragraph{Implicit generic parameters.} Sometimes the generic parameter list is not written in source. Every protocol declaration has a generic parameter list with a single generic parameter named \IndexSelf\tSelf\ (\SecRef{protocols}), and every extension declaration has a generic parameter list cloned from that of the extended type (\ChapRef{chap:extensions}). These implicit generic parameters can be referenced by name within their scope, just like the generic parameter declarations in a parsed generic parameter list (\SecRef{identtyperepr}). 
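To make the depth-and-index numbering concrete, here is a small runnable Swift sketch consistent with the \texttt{Outer}/\texttt{Both} example above; the \texttt{collect()} method and the concrete arguments are illustrative, not from the text:

```swift
// T has depth 0 and index 0, so its canonical type is t_0_0; U and V
// have depth 1, so they are t_1_0 and t_1_1 respectively.
struct Outer<T> {
    struct Both<U, V> {
        // All outer generic parameters remain visible in this scope.
        func collect(_ t: T, _ u: U, _ v: V) -> (T, U, V) {
            return (t, u, v)
        }
    }
}

// Supplying generic arguments fixes replacements for all three depths.
let both = Outer<Int>.Both<String, Bool>()
let triple = both.collect(1, "x", true)
```

Note that the inner \texttt{Both} never redeclares \tT; it is implicitly parameterized by the generic parameters of its outer declaration.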
Function, constructor and subscript declarations can also declare \IndexDefinition{opaque parameter}\emph{opaque parameters} with the \texttt{some} keyword, possibly in combination with a generic parameter list: \begin{Verbatim} @@ -200,11 +197,11 @@ \section{Generic Parameters}\label{generic params} \end{Verbatim} An opaque parameter simultaneously declares a parameter value, a generic parameter type that is the type of the value, and a requirement this type must satisfy. Here, we can refer to ``\texttt{elts}'' from an expression inside the function body, but we cannot name the \emph{type} of ``\texttt{elts}'' in a type representation. From \index{expression}expression context however, the type of an opaque parameter can be obtained via the \texttt{type(of:)} special form, which produces a metatype value. This allows for invoking static methods on these types. -The \IndexDefinition{generic parameter list request}\Request{generic parameter list request} appends the opaque parameters to the parsed generic parameter list, so they follow the parsed generic parameters in index order. In \texttt{pickElement()}, the generic parameter \texttt{E} has canonical type~\rT, while the opaque parameter associated with ``\texttt{elts}'' has canonical type~\rU. Opaque parameter declarations also state a constraint type, which imposes a requirement on this unnamed generic parameter. We will discuss this in the next section. Note that when \texttt{some} appears in the return type of a function, it declares an \emph{opaque return type}, which is a related but different feature (\ChapRef{opaqueresult}). +The \IndexDefinition{generic parameter list request}\Request{generic parameter list request} appends the opaque parameters to the parsed generic parameter list, so they follow the parsed generic parameters in index order. 
In \texttt{pickElement()}, the generic parameter \texttt{E} has canonical type~\rT, while the opaque parameter associated with ``\texttt{elts}'' has canonical type~\rU. Opaque parameter declarations also state a constraint type, which imposes a requirement on this unnamed generic parameter. We will discuss this in the next section. Note that when \texttt{some} appears in the return type of a function, it declares an \emph{opaque result type}, which is a related but different feature (\ChapRef{chap:opaque result types}). -In \ChapRef{genericsig}, we define the generic signature, which records all visible generic parameters of a declaration, independent of surface syntax. +In \ChapRef{chap:generic signatures}, we will discuss the generic signature, a data structure which collects all generic parameters that parameterize a declaration, independent of surface syntax. -\section{Requirements}\label{requirements} +\section{Requirements}\label{sec:requirements} The requirements of a generic declaration constrain the generic argument types that can be provided by the caller. This endows the generic declaration's type parameters with new capabilities, so they abstract over the concrete types that satisfy those requirements. We use the following encoding for requirements in the theory and implementation. @@ -213,7 +210,7 @@ \section{Requirements}\label{requirements} \begin{itemize} \item A \IndexDefinition{conformance requirement}\textbf{conformance requirement} $\TP$ states that the replacement type for~\tT\ must conform to~\texttt{P}, which must be a protocol, protocol composition, or parameterized protocol type. \item A \IndexDefinition{superclass requirement}\textbf{superclass requirement} $\TC$ states that the replacement type for~\tT\ must be a subclass of some \index{class type}class type~\tC. 
-\item A \IndexDefinition{layout requirement}\textbf{layout requirement} $\TAnyObject$ states that the replacement type for~\tT\ must be represented as a single reference-counted pointer at runtime. +\item A \IndexDefinition{layout requirement}\textbf{layout requirement} $\TAnyObject$ states that the replacement type for~\tT\ must be represented as a single \index{reference count}reference-counted pointer at run time. \item A \IndexDefinition{same-type requirement}\textbf{same-type requirement} $\TU$ states that the replacement types for \tT~and~\tU\ must be \index{canonical type equality}canonically equal. \end{itemize} \end{definition} @@ -231,7 +228,7 @@ \section{Requirements}\label{requirements} \item A \index{parameterized protocol type!constraint type}parameterized protocol type, like \texttt{Sequence}. \item A \index{protocol composition type!constraint type}protocol composition type, like \texttt{Sequence \& MyClass}. \item A \index{class type!constraint type}class type, like \texttt{NSObject}. -\item The \Index{AnyObject@\texttt{AnyObject}}\texttt{AnyObject} \index{layout constraint}\emph{layout constraint}, which restricts the possible concrete types to those represented as a single reference-counted pointer. +\item The \Index{AnyObject@\texttt{AnyObject}}\texttt{AnyObject} \index{layout constraint}\emph{layout constraint}, which restricts the possible concrete types to those represented as a single \index{reference count}reference-counted pointer. \end{enumerate} In the first three cases, the stated requirement becomes a conformance requirement. Otherwise, it is a superclass or layout requirement. In all cases, the subject type of the requirement is the \index{declared interface type!generic parameter declaration}declared interface type of the generic parameter. 
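The four requirement kinds can be illustrated with the inheritance clauses of generic parameters; the names below are invented for illustration, and the same-type requirement additionally needs a \texttt{where} clause, as described shortly:
\begin{Verbatim}
class Base {}
func f<T: Sequence,   // conformance requirement
       U: Base,       // superclass requirement
       V: AnyObject>  // layout requirement
    (_: T, _: U, _: V) where T.Element == Int {}  // same-type requirement
\end{Verbatim}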
@@ -265,13 +262,13 @@ \section{Requirements}\label{requirements} We will see later that constraint types also appear in various other positions, and in all cases, they state a requirement with some distinguished subject type: \begin{enumerate} \item In the inheritance clause of a protocol or associated type (\SecRef{protocols}). -\item Following the \texttt{some} keyword in return position, where it declares an opaque return type (\ChapRef{opaqueresult}). -\item Following the \texttt{any} keyword that references an existential type (\ChapRef{existentialtypes}), with the exception that the constraint type cannot be a class by itself (for example, we allow ``\verb|any NSObject & Equatable|'', but ``\verb|any NSObject|'' is just ``\texttt{NSObject}''). +\item Following the \texttt{some} keyword in return position, where it declares an opaque result type (\ChapRef{chap:opaque result types}). +\item Following the \texttt{any} keyword for introducing an existential type (\ChapRef{chap:existential types}), with the exception that the constraint type cannot be a class by itself (for example, we allow ``\verb|any MyClass & Equatable|'', but ``\verb|any MyClass|'' is just ``\texttt{MyClass}''). \end{enumerate} -\paragraph{Trailing where clauses.} Requirements can also be stated in a \IndexDefinition{where clause@\texttt{where} clause}\index{trailing where clause@trailing \texttt{where} clause|see{\texttt{where} clause}}\texttt{where} clause attached to the generic declaration. This allows generality that cannot be expressed using the inheritance clause of a generic parameter alone. +\paragraph{Trailing where clauses.} Requirements can also be stated in a \IndexDefinition{where clause@\texttt{where} clause}\index{trailing where clause@trailing \texttt{where} clause|see{\texttt{where} clause}}\texttt{where} clause attached to the generic declaration. This allows for more generality than can be expressed in the inheritance clause of a generic parameter alone. 
-A \texttt{where} clause entry defines a requirement whose subject type is written explicitly, so that \index{dependent member type!in requirements}dependent member types can be be subject to requirements; here, we state two requirements, $\rTSequence$ and $\ConfReq{\rT.Element}{Comparable}$: +A \texttt{where} clause entry defines a requirement whose subject type is written explicitly, so that \index{dependent member type!in requirements}dependent member types can be subject to requirements; here, we state two requirements, $\rTSequence$ and $\ConfReq{\rT.Element}{Comparable}$: \begin{Verbatim} func isSorted(_: S) where S: Sequence, S.Element: Comparable {...} \end{Verbatim} @@ -279,12 +276,12 @@ \section{Requirements}\label{requirements} A \texttt{where} clause can also state a same-type requirement. In the next example, we state two conformance requirements using the inheritance clause syntax, another conformance requirement, and the same-type requirement $\SameReq{\rT.Element}{\rU.Element}$: \begin{Verbatim} func merge(_: S1, _: S2) -> [S1.Element] - where S1: Comparable, S1.Element == S2.Element {...} + where S1.Element: Comparable, S1.Element == S2.Element {...} \end{Verbatim} Note that there is no way to refer to an opaque parameter type within the function's \Index{where clause@\texttt{where} clause!opaque parameter}\texttt{where} clause, but every declaration using opaque parameters can always be rewritten into an equivalent one using named generic parameters, so no generality is lost. -We saw in \ChapRef{types} that when the parser reads a type annotation in the source, it constructs a \index{type representation}type representation, a lower-level syntactic object which must be \index{type resolution}resolved to obtain a \index{type}type. Similarly, requirements have a syntactic form, called a \IndexDefinition{requirement representation}\emph{requirement representation}. 
The parser constructs requirement representations while reading a \texttt{where} clause. The relationship between the syntactic and semantic entities is shown in this diagram: +We saw in \ChapRef{chap:types} that when the parser reads a type annotation in the source, it constructs a \index{type representation}type representation, a lower-level syntactic object which must be \index{type resolution}resolved to obtain a \index{type}type. Similarly, requirements have a syntactic form, called a \IndexDefinition{requirement representation}\emph{requirement representation}. The parser constructs requirement representations while reading a \texttt{where} clause. The relationship between the syntactic and semantic entities is shown in this diagram: \begin{center} \begin{tikzpicture}[node distance=1cm] \node (ReqRepr) [data] {Requirement representation}; @@ -332,7 +329,7 @@ \section{Requirements}\label{requirements} \medskip -In \ChapRef{genericsig}, we will see that the generic signature of a declaration records all of its requirements, regardless of they were stated in source. +In \ChapRef{chap:generic signatures}, we will see that the generic signature of a declaration records all of its requirements, regardless of how they were stated in source. \paragraph{History.} The syntax described in this section has evolved over time: \begin{itemize} @@ -350,21 +347,21 @@ \section{Protocols}\label{protocols} Every protocol has an implicit generic parameter list with a single generic parameter named \IndexDefinition{protocol Self type@protocol \tSelf\ type}\tSelf, which abstracts over the conforming type. The declared interface type of \tSelf\ is always~\rT; protocols cannot be nested in other generic contexts (\SecRef{nested nominal types}), nor can they declare any other generic parameters. -The \texttt{associatedtype} keyword introduces an \IndexDefinition{associated type declaration}\emph{associated type declaration}, which can only appear inside of a protocol.
The declared interface type is a \index{dependent member type!associated type declaration}dependent member type (\SecRef{fundamental types}). Specifically, the \index{declared interface type!associated type declaration}declared interface type of an associated type~\texttt{A} in a protocol~\texttt{P} is the \index{bound dependent member type!associated type declaration}bound dependent member type denoted \texttt{Self.[P]A}, formed from the base type of~\tSelf\ together with~\texttt{A}. A nominal type conforming to this protocol must declare a type witness for each associated type (\SecRef{type witnesses}). +The \texttt{associatedtype} keyword introduces an \IndexDefinition{associated type declaration}\emph{associated type declaration}, which can only appear inside of a protocol. The declared interface type is a \index{dependent member type!associated type declaration}dependent member type (\SecRef{fundamental types}). Specifically, the \index{declared interface type!associated type declaration}declared interface type of an associated type~\texttt{A} in a protocol~\texttt{P} is the \index{bound dependent member type!associated type declaration}bound dependent member type denoted \texttt{Self.[P]A}, formed from the base type of~\tSelf\ together with~\texttt{A}. A nominal type conforming to a protocol with associated types must declare a type witness for each associated type (\SecRef{type witnesses}). -Protocols can also state \IndexDefinition{associated requirement}\emph{associated requirements} on their \tSelf\ type and its dependent member types. The conforming type and its type witnesses must satisfy the protocol's associated requirements. We will review all the ways of stating associated requirements now. +Protocols can also impose \IndexDefinition{associated requirement}\emph{associated requirements} on \tSelf\ and its dependent member types. A conforming type and its type witnesses must together satisfy these associated requirements. 
There are a number of ways to state associated requirements in the language, so we will review them now. \paragraph{Protocol inheritance clauses.} A protocol can have an \index{inheritance clause!protocol declaration}inheritance clause with a list of one or more comma-separated \index{constraint type!protocol inheritance clause}constraint types. Each inheritance clause entry states an associated requirement with a subject type of \tSelf. These are additional requirements the conforming type itself must satisfy in order to conform. -An associated conformance requirement with a subject type of \tSelf\ establishes a \index{protocol inheritance|see{inherited protocol}}\IndexDefinition{inherited protocol}\emph{protocol inheritance} relationship. The protocol stating the requirement is the \emph{derived protocol}, and the protocol on the right-hand side is the \emph{base protocol}. The derived protocol is said to \emph{inherit} from (or sometimes, \emph{refine}) the base protocol. A \index{qualified lookup!protocol inheritance}qualified lookup will search through all base protocols, when the lookup begins at a derived protocol or one of its concrete conforming types. +An associated conformance requirement with a subject type of \tSelf\ establishes a \index{protocol inheritance|see{inherited protocol}}\IndexDefinition{inherited protocol}\emph{protocol inheritance} relationship. The protocol stating the requirement is the \emph{derived protocol}, and the protocol on the right-hand side is the \emph{base protocol}. The derived protocol is said to \emph{inherit} from (or sometimes, \emph{refine}) the base protocol. A \index{qualified lookup!protocol inheritance}qualified lookup into a protocol traverses all of its base protocols; likewise, a qualified lookup into a concrete nominal type traverses the base protocols of every protocol it conforms to.
For example, the standard library's \texttt{Collection} protocol inherits from \texttt{Sequence} by stating the associated requirement $\ConfReq{Self}{Sequence}$: \begin{Verbatim} protocol Collection: Sequence {...} \end{Verbatim} -Protocols can restrict their conforming types to those with a reference-counted pointer representation by stating an \texttt{AnyObject} layout constraint in the inheritance clause: +Protocols can restrict their conforming types to those with a \index{reference count}reference-counted pointer representation by stating an \texttt{AnyObject} layout constraint in the inheritance clause: \begin{Verbatim} protocol BoxProtocol: AnyObject {...} \end{Verbatim} @@ -376,7 +373,7 @@ \section{Protocols}\label{protocols} protocol Duck: Animal {} class MockDuck: Plant, Duck {} // error: not a subclass of Animal \end{Verbatim} -A protocol is \IndexDefinition{class-constrained protocol}\emph{class-constrained} if the $\ConfReq{Self}{AnyObject}$ \index{associated layout requirement}associated layout requirement is either explicitly stated, or a consequence of some other associated requirement. We'll say more about the semantics of protocol inheritance clauses and name lookup in \SecRef{requirement sig}, \SecRef{identtyperepr}, and \ChapRef{building generic signatures}. +A protocol is \IndexDefinition{class-constrained protocol}\emph{class-constrained} if the $\ConfReq{Self}{AnyObject}$ \index{associated layout requirement}associated layout requirement is either explicitly stated, or a consequence of some other associated requirement. We'll say more about the semantics of protocol inheritance clauses and name lookup in \SecRef{requirement sig}, \SecRef{identtyperepr}, and \ChapRef{chap:building generic signatures}. 
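For instance, all three of the following protocols are class-constrained; the names are invented for illustration:
\begin{Verbatim}
class Animal {}
protocol P: AnyObject {}          // Self: AnyObject stated explicitly
protocol Q: P {}                  // consequence of inheriting from P
protocol R where Self: Animal {}  // consequence of a superclass requirement
\end{Verbatim}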
\paragraph{Primary associated types.} A protocol can declare a list of \IndexDefinition{primary associated type}\emph{primary associated types} with a syntax resembling that of a generic parameter list: @@ -386,7 +383,7 @@ \section{Protocols}\label{protocols} mutating func next() -> Element? } \end{Verbatim} -While generic parameter lists introduce new generic parameter declarations, the entries in the primary associated type list reference an \emph{existing} associated type declaration, either in the protocol itself, or some base protocol. +While generic parameter lists introduce new generic parameter declarations, the entries in the primary associated type list reference \emph{existing} associated type declarations, either in the protocol itself, or some base protocol. A \index{parameterized protocol type}\emph{parameterized protocol type} can be formed from a reference to a protocol with primary associated types, by taking a list of generic argument types, one for each primary associated type. On the right-hand side of a conformance requirement, a parameterized protocol type decomposes into a conformance requirement to the protocol, followed by a series of same-type requirements. The following are equivalent: \begin{Verbatim} @@ -426,7 +423,7 @@ \section{Protocols}\label{protocols} protocol Sequence where Self.Iterator: IteratorProtocol, Self.Element == Self.Iterator.Element {...} \end{Verbatim} -In all cases, we state the same two associated requirements. Our notation is to append a subscript with the protocol name declaring the requirement: +In all cases, we state the same two associated requirements. 
Our notation is to subscript the associated requirement with the name of the protocol stating this requirement: \begin{gather*} \ConfReq{Self.Iterator}{IteratorProtocol}_\texttt{Sequence}\\ \SameReq{Self.Element}{Self.Iterator.Element}_\texttt{Sequence} @@ -434,12 +431,12 @@ \section{Protocols}\label{protocols} Let's summarize all the ways of stating associated requirements in a protocol~\texttt{P}: \begin{itemize} -\item The protocol can state an inheritance clause. Each entry defines a conformance, superclass or layout requirement with a subject type of \tSelf. +\item The protocol can state an inheritance clause. Each entry defines a conformance, superclass, or layout requirement with a subject type of \tSelf. \item An associated type declaration \texttt{A} can state an inheritance clause. Each entry defines a conformance, superclass or layout requirement with a subject type of \texttt{Self.[P]A}. \item Arbitrary associated requirements can be stated in \Index{where clause@\texttt{where} clause!protocol declaration}trailing \texttt{where} clauses, attached to the protocol or any of its associated types, in any combination. \end{itemize} -A protocol's associated requirements are collected in its requirement signature, which we will see is dual to a generic signature in some sense (\SecRef{requirement sig}). How concrete types satisfy the requirement signature will be discussed in \ChapRef{conformances}. +A protocol's associated requirements are collected in its requirement signature, which we will see is dual to a generic signature in some sense (\SecRef{requirement sig}). How concrete types satisfy the requirement signature will be discussed in \ChapRef{chap:conformances}. \paragraph{Self requirements.} The \Index{where clause@\texttt{where} clause!protocol declaration}\texttt{where} clause of a protocol method or subscript requirement cannot constrain \tSelf\ or its associated types. 
For example, the following protocol is rejected, because there would be no way to implement the \texttt{minElement()} requirement in a concrete conforming type whose \texttt{Element} type is \emph{not} \texttt{Comparable}: @@ -467,7 +464,7 @@ \section{Functions}\label{function decls} \end{Verbatim} \paragraph{Method declarations.} -In addition to the formal parameters declared in its parameter list, a method declaration also has an implicit \texttt{self} parameter, to receive the value on the left-hand side of the ``\texttt{.}'' in the \index{method call expression}method call expression. The interface type of a method declaration is a function type which receives the \IndexDefinition{self parameter declaration}\texttt{self} parameter, and returns another function which then takes the method's formal parameters. The ``\texttt{->}'' syntax for a function type associates to the right, so \verb|A -> B -> C| means \verb|A -> (B -> C)|: +In addition to the formal parameters declared in its parameter list, a \index{method declaration}method declaration also has an implicit \IndexDefinition{self parameter@\texttt{self} parameter}\texttt{self} parameter, to receive the value on the left-hand side of the ``\texttt{.}'' in the \index{member reference expression}member reference expression. The interface type of a method declaration is a function type which receives the \texttt{self} parameter, and returns another function which then takes the method's formal parameters. 
The ``\texttt{->}'' syntax for a function type associates to the right, so \verb|A -> B -> C| means \verb|A -> (B -> C)|: \begin{Verbatim} struct Universe { func wormhole(x: Int, y: String) -> Bool {...} @@ -480,13 +477,13 @@ \section{Functions}\label{function decls} // Interface type: (inout Universe) -> () -> () } \end{Verbatim} -The interface type of the \texttt{self} parameter is derived as follows: +We derive the \index{interface type!self parameter}interface type and \index{ownership specifier!self parameter}ownership specifier for the \texttt{self} parameter as follows: \begin{itemize} -\item We start with the \IndexDefinition{self interface type}\emph{self interface type} of the method's parent declaration context. In a struct, enum or class, this is the same as the \index{declared interface type!self interface type}declared interface type. In a protocol, this is the protocol \IndexSelf\tSelf\ type (\SecRef{protocols}). In an extension, the self interface type is that of the extended type. +\item We start with the \IndexDefinition{self interface type}\emph{self interface type} of the method's parent declaration context. In a struct, enum, or class, this is the same as the \index{declared interface type!self interface type}declared interface type. In a protocol, this is the protocol \IndexSelf\tSelf\ type (\SecRef{protocols}). In an extension, the self interface type is that of the extended type. -\item If the method is declared inside a class, and if it returns the \Index{dynamic Self type@dynamic \tSelf\ type}dynamic \tSelf\ type, we wrap the type in the dynamic \tSelf\ type (\SecRef{misc types}). +\item If the method is declared inside a class, and if it returns the \Index{dynamic Self type@dynamic \tSelf\ type}dynamic \tSelf\ type, we wrap the \texttt{self} type in the dynamic \tSelf\ type (\SecRef{sec:special types}). -\item If the method is \IndexDefinition{static method declaration}\texttt{static}, we wrap the type in a \index{metatype type}metatype. 
+\item If the method is \IndexDefinition{static method declaration}\texttt{static}, we further wrap the \texttt{self} type in a \index{metatype type}metatype. \item If the method is \texttt{mutating}, we pass the \texttt{self} parameter \texttt{inout}. @@ -562,18 +559,18 @@ \section{Functions}\label{function decls} \end{minipage} \end{wrapfigure} -All function declarations must be followed by a body in the source language, except for protocol requirements. A function body can contain statements, expressions, and other declarations. (Unlike types and declarations, we will not exhaustively cover all statements and expressions in this book.) The example on the left shows some call expressions. +All function declarations must be followed by a body in the source language, except for protocol requirements. A function body can contain \index{statement}statements, \index{expression}expressions, and other declarations. (Unlike types and declarations, we will not exhaustively survey all statements and expressions in this book.) The example on the left shows some \index{call expression}call expressions. In a method body, an unqualified reference to a member of the innermost nominal type declaration is interpreted as having an implicit ``\texttt{self.}'' qualification. Thus, instance methods can refer to other instance methods this way, and static methods can refer to other static methods. -An unqualified reference to a member of an outer nominal type can only be made if the member is static, because there is no ``outer \texttt{self} value'' to invoke the method with; a \emph{value} of the nested type does not contain a \emph{value} of its parent type. 
+An unqualified reference to a member of an outer nominal type can only be made if the member is static, because there is no ``outer \Index{self parameter@\texttt{self} parameter!nested nominal type}\texttt{self} parameter'' to invoke the method with; a \emph{value} of the nested type does not contain a \emph{value} of its parent type. -For the same reason, methods inside \index{local type declaration}local types cannot refer to local variables declared outside of the local type. (Contrast this with \index{Java}Java inner classes for example, which can be declared as \texttt{static} or instance members of their outer class; a non-\texttt{static} inner class captures a ``\texttt{this}'' reference from the outer class. Inner classes nested in methods can also capture local variables in Java.) +For the same reason, methods inside \index{local type declaration}local types cannot refer to local variables declared outside of the local type. (Contrast this with \index{Java}Java inner classes for example, which can be declared as \texttt{static} or instance members of their outer class; a non-\texttt{static} inner class \index{captured value!Java inner class}captures a ``\texttt{this}'' reference from the outer class. Inner classes nested in methods can also capture local variables in Java.) \paragraph{Constructor declarations.} \IndexDefinition{constructor declaration}Constructor declarations are introduced with the \texttt{init} keyword. The parent context of a constructor must be a nominal type or extension. -From the outside, the interface type of a constructor looks like a static method that returns a new instance of the type, but inside the constructor, \texttt{self} is the instance being initialized, so the interface type of \texttt{self} is the nominal type, and not its metatype. In a struct or enum, \texttt{self} is also \texttt{inout}. Constructors can delegate to other constructors in various ways. 
To model the delegation with a call expression, the \IndexDefinition{initializer interface type}\emph{initializer interface type} describes the type of an in-place initialization at a location provided by the caller: +From the outside, the interface type of a constructor looks like a static method that returns a new instance of the type, but inside the constructor, \Index{self parameter@\texttt{self} parameter!constructor declaration}\texttt{self} is the instance being initialized, so the interface type of \texttt{self} is the nominal type, and not its metatype. In a struct or enum, \texttt{self} is also \texttt{inout}. Constructors can delegate to other constructors in various ways. To model the delegation with a call expression, the \IndexDefinition{initializer interface type}\emph{initializer interface type} describes the type of an in-place initialization at a location provided by the caller: \begin{Verbatim} struct Universe { init(age: Int) {...} @@ -583,21 +580,21 @@ \section{Functions}\label{function decls} \end{Verbatim} \paragraph{Destructor declarations.} -\IndexDefinition{destructor declaration}Destructor declarations are introduced with the \texttt{deinit} keyword. They can only appear inside classes. They have no formal parameters, no generic parameter list, no \texttt{where} clause, and no return type. +\IndexDefinition{destructor declaration}Introduced with the \texttt{deinit} keyword, which is only valid inside a class. A destructor cannot have a generic parameter list or \texttt{where} clause. Its interface type is that of a method with no formal parameters and a return type of \texttt{()}. \paragraph{Local contexts.} -A \IndexDefinition{local context}\emph{local context} is any declaration context that is not a module, source file, type declaration or extension. Swift allows variable, function and type declarations to appear in local context. 
The following are local contexts: +A \IndexDefinition{local context}\emph{local context} is any declaration context that is not a module, source file, type declaration or extension. Swift allows variable, function, and type declarations to appear in local context. The following are local contexts: \begin{itemize} \item \index{top-level code declaration}Top-level code declarations. \item Function declarations. \item \index{closure expression}Closure expressions. \item If a variable is not itself in local context (for example, it's a member of a nominal type declaration), then its \index{initial value expression}initial value expression defines a new local context. -\item \index{subscript declaration}Subscript and \index{enum element declaration}enum element declarations declarations are local contexts, because they can contain parameter declarations (and also a generic parameter list, in the case of a subscript). +\item \index{subscript declaration}Subscript and \index{enum element declaration}enum element declarations are local contexts, because they can contain parameter declarations (and also a generic parameter list, in the case of a subscript declaration). \end{itemize} -Local functions and closures can \IndexDefinition{captured value}\emph{capture} references to other local declarations from outer scopes. We use the standard technique of \IndexDefinition{closure conversion}\emph{closure conversion} to lower functions with captured values into ones without. We can understand this process as introducing an additional parameter for each captured value, followed by a walk to replace references to captured values with references to the corresponding parameters in the function body. In Swift, this is part of \index{SILGen}SILGen's lowering process, and not a separate transformation on the abstract syntax tree. +Local functions and closures can \IndexDefinition{captured value}\emph{capture} references to other local declarations from outer scopes. 
We use the standard technique of \IndexDefinition{closure conversion}\emph{closure conversion} to lower functions with captured values into ones without. We can understand this process as introducing an additional parameter for each captured value, followed by a walk of the function body to replace references to those captured values with references to the corresponding parameters. In Swift, this is part of \index{SILGen}SILGen's lowering process, and not a separate transformation on the abstract syntax tree. -The \IndexDefinition{capture info request}\Request{capture info request} computes the list of declarations captured by the given function and all of its nested local functions and closure expressions. +The \IndexDefinition{capture info request}\Request{capture info request} computes the list of values captured by the given function and all of its nested local functions and closure expressions. \begin{wrapfigure}[10]{l}{10.6em} \begin{minipage}{10.5em} @@ -620,7 +617,7 @@ \section{Functions}\label{function decls} Consider the three nested functions shown on the left. We proceed to compute their captures from the inside out. -The innermost function~\texttt{h()} captures \texttt{y}~and~\texttt{z}. The middle function~\texttt{g()} captures~\texttt{x}. It also captures~\texttt{y}, because~\texttt{h()} captures~\texttt{y}, but it does not capture~\texttt{z}, because~\texttt{z} is declared by~\texttt{g()} itself. Finally,~\texttt{f()} is declared at the top level, so it does not have any captures. +The innermost function~\texttt{h()} captures \texttt{y}~and~\texttt{z}. The middle function~\texttt{g()} captures~\texttt{x}. It also captures~\texttt{y}, because~\texttt{h()} captures~\texttt{y}, but it does not capture~\texttt{z}, because~\texttt{z} is declared by~\texttt{g()} itself. Finally,~\texttt{f()} is declared at the top level, so it doesn't have any captures. 
We can summarize this as follows (see \AppendixRef{math summary} for a summary of set notation): % FIXME @@ -640,18 +637,18 @@ \section{Functions}\label{function decls} \bigskip \begin{algorithm}[Compute closure captures]\label{closure captures algorithm} -As input, takes the type-checked body of a \index{closure function}closure expression or \index{local function declaration}local function~$F$. Outputs the \index{set}set of captures of~$F$. +As input, takes the type-checked body of a \index{closure function}closure expression or \index{local function declaration}local function~$F$. Outputs the \index{set!captures}set of captures of~$F$. \begin{enumerate} \item Initialize the return value with an \index{empty set}empty set, $C\leftarrow\varnothing$. \item Recursively walk the type-checked body of $F$ and handle each element: -\item (Declaration references) If $F$ contains an expression that references some local variable or local function~$d$ by name, let $\texttt{PARENT}(d)$ denote the declaration context containing~$d$. This is either $F$ itself, or some outer local context, because we found $d$ by unqualified lookup from~$F$. +\item (Declaration references) If $F$ contains an expression that references some local variable or local function~$d$ by name, let $\texttt{PARENT}(d)$ denote the declaration context containing~$d$. This is either $F$ itself, or an outer local context, because we found~$d$ by unqualified lookup from~$F$. If $\texttt{PARENT}(d)\neq F$, set $C\leftarrow C\cup\{d\}$. \item (Nested closures) If $F$ contains a nested closure expression or local function $F^\prime\!$, then any captures of $F^\prime$ not declared by~$F$ are also captures of~$F$. Recursively compute the captures of $F^\prime\!$. For each $d$ captured by $F^\prime$ such that $\texttt{PARENT}(d)\neq F$, set $C\leftarrow C\cup\{d\}$. -\item (Local types) If $F$ contains a local type, do not walk into the children of the local type. 
Local types do not capture values; we enforce this below. +\item (Local types) If $F$ contains a local type, do not walk into the children of the local type. Local types do not capture values; we enforce this in the next step. \item (Diagnose) After the recursive walk, consider each element $d\in C$. If the path of parent declaration contexts from $F$ to $d$ contains a nominal type declaration, we have an unsupported capture inside a local type. Diagnose an error. @@ -673,7 +670,7 @@ \section{Functions}\label{function decls} \end{minipage} \end{wrapfigure} -Local functions can also reference each other recursively. Consider the functions shown on the right and notice how \texttt{f()} and \texttt{g()} are mutually recursive. At runtime, we cannot represent this by forming two closure contexts where each one retains the other, because then neither context will ever be released. +Local functions can also reference each other recursively. Consider the functions shown on the right and notice how \texttt{f()} and \texttt{g()} are mutually recursive. At runtime, we cannot represent this by forming two closure contexts \index{reference counting}where each one retains the other, because then neither context will ever be released. We use a second algorithm to obtain the list of \IndexDefinition{lowered captures}\emph{lowered captures}, by replacing any captured local functions with their corresponding capture lists, repeating this until fixed point. The final list contains variable declarations only. With our example, the captures and lowered captures of each function are as follows: @@ -693,7 +690,7 @@ \section{Functions}\label{function decls} (As a special case, if a set of local functions reference each other but capture no other state from the outer declaration context, their lowered captures will be empty, so no runtime context allocation is necessary.) 
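A sketch of this special case, with invented names; \texttt{even()} and \texttt{odd()} reference each other but capture no variables, so their lowered capture lists are empty:
\begin{Verbatim}
func parity(_ n: Int) -> Bool {
  // Mutually recursive, but no variables of parity() are captured,
  // so no closure context allocation is needed at run time.
  func even(_ k: Int) -> Bool { k == 0 ? true : odd(k - 1) }
  func odd(_ k: Int) -> Bool { k == 0 ? false : even(k - 1) }
  return even(n)
}
\end{Verbatim}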
\begin{algorithm}[Compute lowered closure captures]\label{lowered closure captures algorithm}
-As input takes the type-checked body of a \index{closure function}closure expression or \index{local function declaration}local function~$F$. Outputs the \index{set}set of variable declarations transitively captured by~$F$.
+As input, takes the type-checked body of a \index{closure function}closure expression or \index{local function declaration}local function~$F$. Outputs the \index{set!captures}set of variable declarations \index{transitive closure}transitively captured by~$F$.
\begin{enumerate}
\item Initialize the set $C\leftarrow\varnothing$; this will be the return value. Initialize an empty worklist. Initialize an empty visited set. Add $F$ to the worklist.
\item If the worklist is empty, return $C$. Otherwise, remove the next function $F$ from the worklist.
@@ -707,7 +704,7 @@ \section{Functions}\label{function decls}

A \index{non-escaping function type}\emph{non-escaping} closure can capture a \texttt{var} or \texttt{inout} by simply capturing the memory address of the storage location. This is safe, because a non-escaping closure cannot outlive the dynamic extent of the storage location.

-An \index{escaping function type}\texttt{@escaping} closure can also capture a \texttt{var}, which requires promoting the \texttt{var} to a \index{boxing}heap-allocated box with a reference count, with all variable accesses indirecting through the box. The below example can be found in every \index{Lisp}Lisp textbook. Each invocation of \texttt{counter()} allocates a new counter value on the heap, and returns three closures that reference the box; the box itself is completely hidden by the abstraction:
+An \index{escaping function type}\texttt{@escaping} closure can also capture a \texttt{var}, which requires promoting the \texttt{var} to a \index{boxing}heap-allocated box with a \index{reference counting}reference count, with all variable accesses indirecting through the box.
The example below can be found in every \index{Lisp}Lisp textbook. Each invocation of \texttt{counter()} allocates a new counter value on the heap, and returns three closures that reference the box; the box itself is completely hidden by the abstraction:
\begin{Verbatim}
func counter() -> (read: () -> Int, inc: () -> (), dec: () -> ()) {
var count = 0 // promoted to a box
@@ -715,7 +712,7 @@ \section{Functions}\label{function decls}
}
\end{Verbatim}

-Before \IndexSwift{3.0}Swift~3.0, \texttt{@escaping} closures were permitted to capture \texttt{inout} parameters as well. To make this safe, the contents of the \texttt{inout} parameter were first copied into a heap-allocated box, which was captured by the closure. The contents of this box were then copied back before the function returned to its caller. This was essentially equivalent to doing the following transform, where we introduced \verb|_n| by hand:
+Before \IndexSwift{3.0}Swift~3, \texttt{@escaping} closures were permitted to capture \texttt{inout} parameters as well. To make this safe, the contents of the \texttt{inout} parameter were first copied into a heap-allocated box, which was captured by the closure. The contents of this box were then copied back before the function returned to its caller. This was essentially equivalent to doing the following transform, where we introduce \verb|_n|:
\begin{Verbatim}
func changeValue(_ n: inout Int) {
var _n = n // copy the value
@@ -727,11 +724,11 @@ \section{Functions}\label{function decls}
n = _n // write it back
}
\end{Verbatim}
+In this scheme, if the closure outlived the dynamic extent of the \texttt{inout} parameter, any subsequent writes from within the closure were silently dropped. This was a source of user confusion, so Swift~3 banned \texttt{inout} captures from escaping closures instead~\cite{se0035}. -In SIL, a closure is represented abstractly, as the result of this partial application operation. The mechanics of how the partially-applied function value actually stores its captures---the partially-applied arguments---are left up to IRGen. In IRGen, we allocate space for storing the captures (either on the stack for a \index{non-escaping function type}non-escaping function type, otherwise it's on the heap for \texttt{@escaping}), and then we emit a thunk, which takes a pointer to the context as an argument, unpacks the captured values from the context, and passes them as individual arguments to the original function. This thunk together with the context forms a \IndexDefinition{thick function}\emph{thick function} value which can then be passed around. +In SILGen, captured values introduce new parameters at the end of the function's parameter list, and a closure value is formed from a function with captures by \index{partial application}\emph{partially applying} the captured values. This produces a new function value with the required type, so the captured values are ``sliced off.'' In IRGen, we lower a partial application by allocating space for storing the captures (either on the stack for a \index{non-escaping function type}non-escaping function type, otherwise it's on the heap for \texttt{@escaping}), and then we emit a thunk, which takes a pointer to the \IndexDefinition{closure context}context as an argument, unpacks the captured values from the context, and passes them as individual arguments to the original function. This thunk together with the context forms a \index{thick function}\emph{thick function} value which represents the closure. 
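At the source level, the effect of this lowering can be pictured with a hypothetical sketch: forming a closure over a local value behaves like partially applying that value to an underlying function, with the captured value stored in the closure context and supplied by the thunk on each call.

```swift
// Hypothetical sketch: the closure returned by makeAdder(_:) acts like
// the partial application of add(_:_:) to `base`; `base` lives in the
// closure context and is passed along at each call.
func add(_ x: Int, _ y: Int) -> Int { x + y }

func makeAdder(_ base: Int) -> (Int) -> Int {
    return { y in add(base, y) }  // single capture: base
}
```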
-If nothing is captured (or if all captured values are zero bytes in size), we can pass a null pointer as the context, without performing a heap allocation. If there is exactly one captured value and this value can be represented as a reference-counted pointer, we can also elide the allocation by passing the captured value as the context pointer instead. For example, if a closure's single capture is an instance of a \index{class type}class type, nothing is allocated. If the single capture is the heap-allocated box that wraps a \texttt{var}, we must still allocate the box for the \texttt{var}, but we avoid a second context allocation. +If nothing is captured (or if all captured values are zero bytes in size), we can pass a null pointer as the context, without performing a heap allocation. If there is exactly one captured value and this value can be represented as a \index{reference count}reference-counted pointer, we can also elide the allocation by passing the captured value as the context pointer instead. For example, if a closure's single capture is an instance of a \index{class type}class type, nothing is allocated. If the single capture is the \index{heap-allocated box}heap-allocated box that wraps a \texttt{var}, we must still allocate the box for the \texttt{var}, but we avoid a second context allocation. \section{Storage}\label{other decls} @@ -739,9 +736,9 @@ \section{Storage}\label{other decls} \index{l-value type} Storage declarations represent locations that can be read and written. -\paragraph{Parameter declarations.} Functions, enum elements and subscripts can have parameter lists; each parameter is represented by a \IndexDefinition{parameter declaration}parameter declaration. Parameter declarations are a kind of variable declaration. +\paragraph{Parameter declarations.} Functions, enum elements, and subscripts can have parameter lists; each parameter is represented by a \IndexDefinition{parameter declaration}parameter declaration. 
Parameter declarations are a kind of variable declaration. -\paragraph{Variable declarations.} \IndexDefinition{variable declaration}Variables that are not parameters are introduced with \texttt{var} and \texttt{let}. A variable might either be \emph{stored} or \emph{computed}; the behavior of a computed variable is given by its accessor implementations. The interface type of a variable is the stored value type, possibly wrapped in a reference storage type if the variable is declared as \texttt{weak} or \texttt{unowned}. The \IndexDefinition{value interface type}\emph{value interface type} of a variable is the storage type without any wrapping. +\paragraph{Variable declarations.} \IndexDefinition{variable declaration}Variables that are not parameters are introduced with \texttt{var} and \texttt{let}. A variable is either \emph{stored} or \emph{computed}; the behavior of a computed variable is determined by its \index{accessor declaration}\emph{accessor declarations}. The \IndexDefinition{value interface type}\emph{value interface type} of a variable is the type of its value. The interface type of a variable is obtained by taking the type of its value, and possibly wrapping it in a \index{reference storage type}reference storage type if the variable is declared as \index{weak reference type}\texttt{weak} or \index{unowned reference type}\texttt{unowned}. 
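For example, in the following hypothetical declaration, the value interface type of \texttt{delegate} is the optional type of its value, while its interface type wraps this in a reference storage type because of the \texttt{weak} modifier:

```swift
final class Delegate {}

final class View {
    // Value interface type: Delegate?
    // Interface type: a weak reference storage type wrapping Delegate?
    weak var delegate: Delegate?
}
```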
\IndexDefinition{pattern binding declaration} \IndexDefinition{pattern binding entry} @@ -757,11 +754,11 @@ \section{Storage}\label{other decls} \begin{Verbatim} let x = 123 \end{Verbatim} -We can write a more complex pattern, for example storing the first element of a tuple while discarding the second element: +We can write a more complex pattern, for example binding the first element of a tuple while discarding the second element: \begin{Verbatim} let (x, _) = (123, "hello") \end{Verbatim} -Here is a pattern binding declaration with a single entry, whose pattern delares two variables \texttt{x} and \texttt{y}: +Here is a pattern binding declaration with a single entry, whose pattern declares two variables \texttt{x} and \texttt{y}: \begin{Verbatim} let (x, y) = (123, "hello") \end{Verbatim} @@ -795,9 +792,9 @@ \section{Storage}\label{other decls} \paragraph{Subscript declarations.} \IndexDefinition{subscript declaration}Subscripts are introduced with the \texttt{subscript} keyword. They can only appear as members of nominal types and extensions. The interface type of a subscript is a function type taking the index parameters and returning the storage type. The value interface type of a subscript is just the storage type. For historical reasons, the interface type of a subscript does not include the \tSelf\ clause, the way that method declarations do. Subscripts can either be instance or static members; static subscripts were introduced in \IndexSwift{5.1}Swift~5.1 \cite{se0254}. \paragraph{Accessor declarations.} -Each storage declaration has a \IndexDefinition{accessor declaration}set of accessor declarations, which are a special kind of function declaration. The accessor declarations are siblings of the storage declaration in the declaration context hierarchy. The interface type of an accessor depends the accessor kind. For example, getters return the value, and setters take the new value as a parameter. 
Property accessors do not take any other parameters; subscript accessors also take the subscript's index parameters. We will not need any more details about accessor and storage declarations in this book.
+Each storage declaration has a \IndexDefinition{accessor declaration}set of accessor declarations, which are a special kind of function declaration. The accessor declarations are siblings of the storage declaration in the declaration context hierarchy. The interface type of an accessor depends on the accessor kind. For example, getters return the value, and setters take the new value as a parameter. Variable accessors do not take any other parameters; subscript accessors also take the subscript's index parameters. We will not need any more details about accessor and storage declarations in this book.

-\section{Source Code Reference}\label{declarationssourceref}
+\section{Source Code Reference}\label{src:declarations}

Key source files:
\begin{itemize}
@@ -815,7 +812,7 @@ \section{Source Code Reference}\label{declarationssourceref}

\IndexSource{declaration}
\apiref{Decl}{class}
-Base class of declarations. \FigRef{declhierarchy} shows various subclasses, which correspond to the different kinds of declarations defined previously in this chapter.
+Base class of declarations. \FigRef{declhierarchy} shows various subclasses, which correspond to the different kinds of declarations described previously in this chapter.
\begin{figure}\captionabove{The \texttt{Decl} class hierarchy}\label{declhierarchy}
\begin{center}
\begin{tikzpicture}[%
@@ -898,7 +895,7 @@ \section{Source Code Reference}\label{declarationssourceref}
\end{figure}

\IndexSource{synthesized declaration}
-Instances are always allocated in the permanent arena of the \texttt{ASTContext}, either when the declaration is parsed or synthesized. The top-level \verb|isa<>|, \verb|cast<>| and \verb|dyn_cast<>| template functions support dynamic casting from \texttt{Decl *} to any of its subclasses.
+Instances are always allocated in the permanent arena of the \texttt{ASTContext}, either when the declaration is parsed or when it is synthesized. The top-level \verb|isa<>|, \verb|cast<>|, and \verb|dyn_cast<>| template functions support dynamic casting from \texttt{Decl *} to any of its subclasses.
\begin{itemize}
\item \texttt{getDeclContext()} returns the parent \texttt{DeclContext} of this declaration.
\item \texttt{getInnermostDeclContext()} if this declaration is also a declaration context, returns the declaration as a \texttt{DeclContext}, otherwise returns the parent \texttt{DeclContext}.
@@ -948,20 +945,17 @@ \section{Source Code Reference}\label{declarationssourceref}

\apiref{ValueDecl}{class}
Base class of named declarations.
-\IndexSource{interface type}
\begin{itemize}
\item \texttt{getDeclName()} returns the declaration's name.
-\item \texttt{getInterfaceType()} returns the declaration's interface type.
+\item \texttt{getInterfaceType()} returns the declaration's \IndexSource{interface type}interface type.
\end{itemize}

\subsection*{Type Declarations}

-\IndexSource{type declaration}
-\IndexSource{declared interface type}
\apiref{TypeDecl}{class}
-Base class of type declarations.
+Base class of \IndexSource{type declaration}type declarations.
\begin{itemize}
-\item \texttt{getDeclaredInterfaceType()} returns the type of an instance of this declaration.
+\item \texttt{getDeclaredInterfaceType()} returns the \IndexSource{declared interface type}type of an instance of this declaration.
\end{itemize}

\IndexSource{nominal type declaration}
@@ -973,12 +967,12 @@ \subsection*{Type Declarations}
\apiref{NominalTypeDecl}{class}
Base class of nominal type declarations. Also a \texttt{DeclContext}.
\begin{itemize}
-\item \texttt{getSelfInterfaceType()} returns the self interface type of the declaration context (\SecRef{function decls}).
Different from the declared interface type for protocols, where the declared interface type is a nominal but the declared self type is the generic parameter \tSelf. -\item \texttt{getDeclaredType()} returns the type of an instance of this declaration, without generic arguments. If the declaration is generic, this is an \IndexSource{unbound generic type}unbound generic type. If this declaration is not generic, this is a nominal type. This is occasionally used in diagnostics instead of the declared interface type, when the generic parameter types are irrelevant. +\item \texttt{getSelfInterfaceType()} returns this nominal type declaration's self interface type (\SecRef{function decls}). This is the same as the declared interface type, except if this is a protocol declaration. A protocol's declared interface type is a nominal type, but its self interface type is the generic parameter \tSelf. +\item \texttt{getDeclaredType()} returns the type of an instance of this declaration, without generic arguments. If the declaration is generic, this is an \IndexSource{unbound generic type}unbound generic type. If this declaration is not generic, this is the same as the declared interface type. This is occasionally used in diagnostics instead of the declared interface type, when the generic parameter types are irrelevant. \end{itemize} \IndexSource{type alias declaration} -\IndexSource{underlying type} +\IndexSource{underlying type!of type alias declaration} \apiref{TypeAliasDecl}{class} A type alias declaration. Also a \texttt{DeclContext}. \begin{itemize} @@ -990,7 +984,11 @@ \subsection*{Declaration Contexts} \IndexSource{declaration context} \apiref{DeclContext}{class} -Base class for declaration contexts. The top-level \verb|isa<>|, \verb|cast<>| and \verb|dyn_cast<>| template functions also support dynamic casting from a \texttt{DeclContext *} to any of its subclasses. See also \SecRef{genericsigsourceref}. +Base class for declaration contexts. 
See also \SecRef{src:generic signatures}. + +The top-level \verb|isa<>|, \verb|cast<>| and \verb|dyn_cast<>| template functions also support dynamic casting from a \texttt{DeclContext *} to any of its subclasses. + +\pagebreak \IndexSource{closure expression} \IndexSource{source file} @@ -1002,7 +1000,7 @@ \subsection*{Declaration Contexts} \item A few other less interesting ones found in the source. \end{itemize} -Utilities for understanding the nesting of declaration contexts: +Methods for understanding the nesting of declaration contexts: \begin{itemize} \item \texttt{getAsDecl()} if declaration context is also a declaration, returns the declaration, otherwise returns \texttt{nullptr}. \item \texttt{getParent()} returns the parent declaration context. @@ -1047,7 +1045,7 @@ \subsection*{Generic Contexts} \IndexSource{generic declaration} \IndexSource{parsed generic parameter list} \apiref{GenericContext}{class} -Subclass of \texttt{DeclContext}. Base class for declaration kinds which can have a generic parameter list. See also \SecRef{genericsigsourceref}. +Subclass of \texttt{DeclContext}. Base class for declaration kinds which can have a generic parameter list. See also \SecRef{src:generic signatures}. \begin{itemize} \item \texttt{getParsedGenericParams()} returns the declaration's parsed generic parameter list, or \texttt{nullptr}. \item \texttt{getGenericParams()} returns the declaration's full generic parameter list, which includes any implicit generic parameters. Evaluates a \texttt{GenericParamListRequest}. @@ -1056,7 +1054,7 @@ \subsection*{Generic Contexts} \item \texttt{getTrailingWhereClause()} returns the declaration's trailing \texttt{where} clause, or \texttt{nullptr}. \end{itemize} -Trailing \texttt{where} clauses are not preserved in serialized generic contexts. Most code should look at \texttt{GenericContext::getGenericSignature()} instead (\SecRef{genericsigsourceref}), except when actually building the generic signature. 
+Trailing \texttt{where} clauses are not preserved in serialized generic contexts. Most code should look at \texttt{GenericContext::getGenericSignature()} instead (\SecRef{src:generic signatures}), except when actually building the generic signature. \IndexSource{generic parameter list} @@ -1089,7 +1087,7 @@ \subsection*{Generic Contexts} \item \texttt{getInherited()} returns the generic parameter declaration's \IndexSource{inheritance clause!generic parameter declaration}inheritance clause. \end{itemize} -Inheritance clauses are not preserved in serialized generic parameter declarations. Requirements stated on generic parameter declarations are part of the corresponding generic context's generic signature, so except when actually building the generic signature, most code uses \texttt{GenericContext::getGenericSignature()} instead (\SecRef{genericsigsourceref}). +Inheritance clauses are not preserved in serialized generic parameter declarations. Requirements stated on generic parameter declarations are part of the corresponding generic context's generic signature, so except when actually building the generic signature, most code uses \texttt{GenericContext::getGenericSignature()} instead (\SecRef{src:generic signatures}). \apiref{GenericTypeParamType}{class} A \IndexSource{generic parameter type}generic parameter type. @@ -1119,6 +1117,7 @@ \subsection*{Generic Contexts} \end{itemize} \apiref{RequirementReprKind}{enum class} +Return type of \texttt{RequirementRepr::getKind()}. \begin{itemize} \item \texttt{RequirementRepr::TypeConstraint} \item \texttt{RequirementRepr::SameType} @@ -1126,7 +1125,7 @@ \subsection*{Generic Contexts} \end{itemize} \apiref{WhereClauseOwner}{class} -Represents a reference to some set of requirement representations which can be resolved to requirements, for example a trailing \texttt{where} clause. 
This is used by various requests, such as the \texttt{RequirementRequest} below, and the \texttt{InferredGenericSignatureRequest} in \SecRef{buildinggensigsourceref}. +Represents a reference to some set of requirement representations which can be resolved to requirements, for example a trailing \texttt{where} clause. This is used by various requests, such as the \texttt{RequirementRequest} below, and the \texttt{InferredGenericSignatureRequest} in \SecRef{src:building generic signatures}. \begin{itemize} \item \texttt{getRequirements()} returns an array of \texttt{RequirementRepr}. \item \texttt{visitRequirements()} resolves each requirement representation and invokes a callback with the \texttt{RequirementRepr} and resolved \texttt{Requirement}. @@ -1146,9 +1145,9 @@ \subsection*{Generic Contexts} \item \texttt{getInherited()} returns this protocol's \IndexSource{inheritance clause!protocol declaration}inheritance clause. \end{itemize} -Trailing \texttt{where} clauses and inheritance clauses are not preserved in serialized protocol declarations. Except when actually building the requirement signature, most code uses \texttt{ProtocolDecl::getRequirementSignature()} instead (\SecRef{genericsigsourceref}). +Trailing \texttt{where} clauses and inheritance clauses are not preserved in serialized protocol declarations. Except when actually building the requirement signature, most code uses \texttt{ProtocolDecl::getRequirementSignature()} instead (\SecRef{src:generic signatures}). -The last four utility methods operate on the requirement signature, so are safe to use on deserialized protocols: +The last three utility methods operate on the requirement signature, so are safe to use on deserialized protocols: \begin{itemize} \item \texttt{getInheritedProtocols()} returns an array of all protocols directly \IndexSource{inherited protocol}inherited by this protocol, computed from the inheritance clause. 
\item \texttt{inheritsFrom()} determines if this protocol inherits from the given protocol, possibly transitively. @@ -1162,23 +1161,21 @@ \subsection*{Generic Contexts} \item \texttt{getInherited()} returns this associated type's \IndexSource{inheritance clause!associated type declaration}inheritance clause. \end{itemize} -Trailing \texttt{where} clauses and inheritance clauses are not preserved in serialized associated type declarations. Requirements on associated types are part of a protocol's requirement signature, so except when actually building the requirement signature, most code uses \texttt{ProtocolDecl::getRequirementSignature()} instead (\SecRef{genericsigsourceref}). +Trailing \texttt{where} clauses and inheritance clauses are not preserved in serialized associated type declarations. Requirements on associated types are part of a protocol's requirement signature, so except when actually building the requirement signature, most code uses \texttt{ProtocolDecl::getRequirementSignature()} instead (\SecRef{src:generic signatures}). \subsection*{Function Declarations} -\IndexSource{function declaration} -\IndexSource{method self parameter} \apiref{AbstractFunctionDecl}{class} -Base class of function-like declarations. Also a \texttt{DeclContext}. +Base class of \IndexSource{function declaration}function-like declarations. Also a \texttt{DeclContext}. \begin{itemize} -\item \texttt{getImplicitSelfDecl()} returns the implicit \texttt{self} parameter, if there is one. +\item \texttt{getImplicitSelfDecl()} returns the implicit \IndexSource{self parameter@\texttt{self} parameter}\texttt{self} parameter if this is a method, \texttt{nullptr} otherwise. \item \texttt{getParameters()} returns the function's parameter list. \item \texttt{getMethodInterfaceType()} returns the type of a method without the \tSelf\ clause. \item \texttt{getResultInterfaceType()} returns the return type of this function or method. 
\end{itemize} \apiref{ParameterList}{class} -The parameter list of \texttt{AbstractFunctionDecl}, \texttt{EnumElementDecl} or \texttt{SubscriptDecl}. +The parameter list of \texttt{AbstractFunctionDecl}, \texttt{EnumElementDecl}, or \texttt{SubscriptDecl}. \begin{itemize} \item \texttt{size()} returns the number of parameters. \item \texttt{get()} returns the \texttt{ParamDecl} at the given index. @@ -1186,7 +1183,7 @@ \subsection*{Function Declarations} \IndexSource{constructor declaration} \apiref{ConstructorDecl}{class} -Constructor declarations. +A constructor declaration. \begin{itemize} \item \texttt{getInitializerInterfaceType()} returns the initializer interface type, used when type checking \texttt{super.init()} delegation. \end{itemize} @@ -1210,8 +1207,8 @@ \subsection*{Closure Conversion} \apiref{CaptureInfoRequest::evaluate}{method} Computes the \texttt{CaptureInfo}. This is \AlgRef{closure captures algorithm}. -\apiref{TypeLowering::getLoweredLocalCaptures()}{method} -Computes the lowered \texttt{CaptureInfo}. This is \AlgRef{lowered closure captures algorithm}. +\apiref{TypeConverter::getLoweredLocalCaptures()}{method} +Computes the lowered \texttt{CaptureInfo}. This is \AlgRef{lowered closure captures algorithm}. See \SecRef{sec:type lowering} and \SecRef{src:substitution maps} for a discussion of SIL type lowering and the \texttt{TypeConverter}. \subsection*{Storage Declarations} @@ -1232,4 +1229,8 @@ \subsection*{Storage Declarations} \apiref{SubscriptDecl}{class} Subclass of \texttt{AbstractStorageDecl} and \texttt{DeclContext}. +\IndexSource{accessor declaration} +\apiref{AccessorDecl}{class} +Subclass of \texttt{AbstractFunctionDecl}. 
+
\end{document}
diff --git a/docs/Generics/chapters/existential-types.tex b/docs/Generics/chapters/existential-types.tex
index 1b22ce698383e..7e5febeafee77 100644
--- a/docs/Generics/chapters/existential-types.tex
+++ b/docs/Generics/chapters/existential-types.tex
@@ -2,7 +2,7 @@

\begin{document}

-\chapter[]{Existential Types}\label{existentialtypes}
+\chapter[]{Existential Types}\label{chap:existential types}

\ifWIP

@@ -14,9 +14,9 @@

This feature has an interesting history. The protocols that could be used as types were initially restricted to those without associated types, or requirements with \tSelf\ in non-covariant position (the latter rules out \texttt{Equatable} for example). This meant that the implementation of existential types was at first rather disjoint from generics. As existential types gained the ability to state more complex constraints over time, the two sides of protocols converged.

-Protocol compositions were originally written as \texttt{protocol<P, Q>} for a value of a type conforming to both protocols \texttt{P} and \texttt{Q}. The modern syntax for protocol compositions \texttt{P~\&~Q} was introduced in \IndexSwift{3.0}Swift 3 \cite{se0095}. Protocol compositions with superclass terms were introduced in \IndexSwift{4.0}Swift 4 \cite{se0156}. The spelling \texttt{any P} of an existential type, to distinguish from \texttt{P} the constraint type, was introduced in \IndexSwift{5.6}Swift 5.6 \cite{se0355}. This was followed by \IndexSwift{5.7}Swift 5.7 allowing all protocols to be used as existential types \cite{se0309}, and introducing implicit opening of existential types \cite{se0352}, and constrained existential types \cite{se0353}.
+Protocol compositions were originally written as \texttt{protocol<P, Q>} for a value of a type conforming to both protocols \texttt{P} and \texttt{Q}. The modern syntax for protocol compositions \texttt{P~\&~Q} was introduced in \IndexSwift{3.0}Swift 3 \cite{se0095}.
Protocol compositions with superclass terms were introduced in \IndexSwift{4.0}Swift 4 \cite{se0156}. The spelling \texttt{any P} of an existential type, to distinguish it from the constraint type \texttt{P}, was introduced in \IndexSwift{5.6}Swift 5.6 \cite{se0335}. This was followed by \IndexSwift{5.7}Swift 5.7, which allowed all protocols to be used as existential types \cite{se0309}, and introduced implicit opening of existential types \cite{se0352} and constrained existential types \cite{se0353}.

-An existential type is written with the \texttt{any} keyword followed by a constraint type, which is a concept previously defined in \SecRef{requirements}. For aesthetic reasons, the \texttt{any} keyword can be omitted if the constraint type is \texttt{Any} or \texttt{AnyObject}, since \texttt{any~Any} or \texttt{any~AnyObject} looks funny. For backwards compatibility, \texttt{any} can also be omitted if the protocols appearing in the constraint type do not have any associated types or requirements with \tSelf\ in non-covariant position.
+An existential type is written with the \texttt{any} keyword followed by a constraint type, which is a concept previously defined in \SecRef{sec:requirements}. For aesthetic reasons, the \texttt{any} keyword can be omitted if the constraint type is \texttt{Any} or \texttt{AnyObject}, since \texttt{any~Any} or \texttt{any~AnyObject} looks funny. For backwards compatibility, \texttt{any} can also be omitted if the protocols appearing in the constraint type do not have any associated types or requirements with \tSelf\ in non-covariant position.

\paragraph{Type representation} Existential types are instances of \texttt{ExistentialType}, which wraps a constraint type. Even in the cases where \texttt{any} can be omitted, type resolution will wrap the constraint type in \texttt{ExistentialType} when resolving a type in a context where the type of a value is expected.
If the constraint type is a protocol composition with a superclass term, or a parameterized protocol type, arbitrary types can appear as structural components of the constraint type. This means that the constraint type of an existential type is subject to substitution by \texttt{Type::subst()}. For example, the interface types of the properties \texttt{foo} and \texttt{bar} below are existential types containing type parameters:
@@ -33,14 +33,18 @@

The special \texttt{Any} type can store an arbitrary Swift value. This ``absence of constraints'' is represented as an existential type with an empty protocol composition as the constraint type. The \texttt{ASTContext::getAnyExistentialType()} method returns this type.

-The \texttt{AnyObject} type which can store an arbitrary reference-counted pointer is an existential type with a special protocol composition storing a layout constraint as the constraint type. The \texttt{ASTContext::getAnyObjectType()} method returns this type. The \texttt{AnyClass} type in the standard library is a type alias for the existential metatype of \texttt{AnyObject}:
+The \texttt{AnyObject} type, which can store an arbitrary \index{reference count}reference-counted pointer, is an existential type with a special protocol composition storing a layout constraint as the constraint type. The \texttt{ASTContext::getAnyObjectType()} method returns this type. The \texttt{AnyClass} type in the standard library is a type alias for the existential metatype of \texttt{AnyObject}:
\begin{Verbatim}
typealias AnyClass = AnyObject.Type
\end{Verbatim}

\fi

-\section[]{Opened Existentials}\label{open existential archetypes}
+\section[]{Existential Archetypes}\label{open existential archetypes}
+
+\begin{algorithm}[Apply substitution map to existential archetype]\label{existential archetype subst}
+Hello.
+\end{algorithm}

\ifWIP

@@ -103,10 +107,10 @@

\end{quote}
In both signatures, the interface type of the existential is \texttt{\ttgp{1}{0}}.
\end{example} -Recall from \ChapRef{genericenv} that there are three kinds of generic environments. We've seen primary generic environments, which are associated with generic declarations. We also saw opaque generic environments, which are instantiated from an opaque return declaration and substitution map, in \SecRef{opaquearchetype}. Now, it's time to introduce the third kind, the opened generic environment. An opened generic environment is created from an opened existential signature of the first kind (with no parent generic signature). The archetypes of an opened generic environment are \emph{opened archetypes}. +Recall from \ChapRef{chap:archetypes} that there are three kinds of generic environments. We've seen primary generic environments, which are associated with generic declarations. We also saw opaque generic environments, which are instantiated from an opaque result declaration and substitution map, in \SecRef{opaquearchetype}. Now, it's time to introduce the third kind, the opened generic environment. An opened generic environment is created from an opened existential signature of the first kind (with no parent generic signature). The archetypes of an opened generic environment are \emph{existential archetypes}. \index{call expression} -When the expression type checker encounters a call expression where an argument of existential type is passed to a parameter of generic parameter type, the existential value is \emph{opened}, projecting the value and assigning it a new opened archetype from a fresh opened generic environment. The call expression is rewritten by wrapping the entire call in an \texttt{OpenExistentialExpr}, which stores two sub-expressions. The first sub-expression is the original call argument, which evaluates to the value of existential type. The payload value and opened archetype is scoped to the second sub-expression, which consumes the payload value. 
The call argument is replaced with a \texttt{OpaqueValueExpr}, which has the opened archetype type. The opened archetype also becomes the replacement type for the generic parameter in the call's substitution map. +When the expression type checker encounters a call expression where an argument of existential type is passed to a parameter of generic parameter type, the existential value is \emph{opened}, projecting the value and assigning it a new existential archetype from a fresh opened generic environment. The call expression is rewritten by wrapping the entire call in an \texttt{OpenExistentialExpr}, which stores two sub-expressions. The first sub-expression is the original call argument, which evaluates to the value of existential type. The payload value and existential archetype are scoped to the second sub-expression, which consumes the payload value. The call argument is replaced with an \texttt{OpaqueValueExpr}, which has the existential archetype type. The existential archetype also becomes the replacement type for the generic parameter in the call's substitution map. For example, if \texttt{animal} is a value of type \texttt{any Animal}, the expression \texttt{animal.eat()} calling a protocol method looks like this before opening: \begin{quote} @@ -147,15 +151,13 @@ child { node [class] {\texttt{\vphantom{p}DeclRefExpr:\ animal}}}; \end{tikzpicture} \end{quote} -Not shown in this picture is that the type of the \texttt{OpaqueValueExpr} is an opened archetype type, and the substitution map replacing \ttgp{0}{0} with this opened archetype is stored in the \texttt{DeclRefExpr} for \texttt{Animal.eat()}. +Not shown in this picture is that the type of the \texttt{OpaqueValueExpr} is an existential archetype type, and the substitution map replacing \ttgp{0}{0} with this existential archetype is stored in the \texttt{DeclRefExpr} for \texttt{Animal.eat()}.
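To see the transformation from the source level, here is a minimal sketch; the \texttt{Horse} conforming type is hypothetical, but \texttt{Animal} and the call \texttt{animal.eat()} are as above:
\begin{Verbatim}
protocol Animal {
  func eat()
}
struct Horse: Animal {
  func eat() {}
}

let animal: any Animal = Horse()
animal.eat()  // payload is opened; the fresh existential archetype
              // becomes the replacement type for 'Self'
\end{Verbatim}
Each evaluation of such a call conceptually works with its own opened payload, since the dynamic type stored in \texttt{animal} may differ between calls.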
-An existential value can store different concrete types dynamically, so each call site where an existential value is opened must produce a new opened archetype from a fresh opened generic environment. Opened generic environments are keyed by the opened existential signature together with a unique ID: -\[\left(\,\ttbox{GenericSignature}\otimes \mathboxed{Unique ID}\,\right) \rightarrow \mathboxed{Opened \texttt{GenericEnvironment}}\] The \texttt{GenericEnvironment::forOpenedExistential()} method creates a fresh opened generic environment, should you have occasion to do this yourself outside of the expression type checker. \fi -\section[]{Existential Layouts}\label{existentiallayouts} +\section[]{Runtime Representation}\label{existentiallayouts} \ifWIP @@ -168,7 +170,7 @@ \item[\texttt{getLayoutConstraint()}] Returns the existential's layout constraint, if there is one. This is the \texttt{AnyObject} layout constraint if the existential can store any Swift or \index{Objective-C}Objective-C class instance. If the superclass bound is further known to be a Swift-native class, this is the stricter \verb|_NativeClass| layout constraint. \end{description} -Some of the above methods might look familiar from the description of generic signature queries in \SecRef{genericsigqueries}, or the local requirements of archetypes in \ChapRef{genericenv}. Indeed, for the most part, the same information can be recovered by asking questions about the existential's interface type in the opened existential signature, or if you have an opened archetype handy, by calling similar methods on the archetype. There is one important difference though. In a generic signature, the minimization algorithm drops protocol conformance requirements which are satisfied by a superclass bound. This is true with opened existential signatures as well. However, for historical reasons, the same transformation is not applied when computing an existential layout. 
This means that the list of protocols in \texttt{ExistentialLayout::getProtocols()} may include more protocols than the \texttt{getConformsTo()} query on the opened existential signature. It is the former list of protocols coming from the \texttt{ExistentialLayout} that informs the runtime representation of the existential type \texttt{any C \& P}. If \index{ABI}ABI stability was not a concern, this would be reworked to match the behavior of requirement minimization. +Some of the above methods might look familiar from the description of generic signature queries in \SecRef{genericsigqueries}, or the local requirements of archetypes in \ChapRef{chap:archetypes}. Indeed, for the most part, the same information can be recovered by asking questions about the existential's interface type in the opened existential signature, or if you have an existential archetype handy, by calling similar methods on the archetype. There is one important difference though. In a generic signature, the minimization algorithm drops protocol conformance requirements which are satisfied by a superclass bound. This is true with opened existential signatures as well. However, for historical reasons, the same transformation is not applied when computing an existential layout. This means that the list of protocols in \texttt{ExistentialLayout::getProtocols()} may include more protocols than the \texttt{getConformsTo()} query on the opened existential signature. It is the former list of protocols coming from the \texttt{ExistentialLayout} that informs the runtime representation of the existential type \texttt{any C \& P}. If \index{ABI}ABI stability were not a concern, this would be reworked to match the behavior of requirement minimization. \begin{example} Consider these definitions: @@ -205,7 +207,7 @@ \end{tabular} \end{quote}
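To spell out the consequence for \texttt{any C \& P}, a brief sketch, assuming \texttt{C} is a class conforming to \texttt{P} as in the example:
\begin{Verbatim}
// The opened existential signature minimizes to <Self where Self: C>,
// since 'Self: P' is implied by the superclass bound. However,
// ExistentialLayout::getProtocols() still lists P, so a value of type
// 'any C & P' carries a witness table for the conformance of C to P,
// even though the conformance could be recovered from the class.
let value: any C & P = ...
\end{Verbatim}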
Instead of a three-word value buffer, only a single pointer is stored, and the type metadata does not need to be separately stored since it can be recovered from the first word of the heap allocation (the ``isa pointer''). The trailing witness tables are stored as in the opaque representation. +\paragraph{Class representation} This representation is used when the concrete type is known to be a \index{reference count}reference-counted pointer. Instead of a three-word value buffer, only a single pointer is stored, and the type metadata does not need to be separately stored since it can be recovered from the first word of the heap allocation (the ``isa pointer''). The trailing witness tables are stored as in the opaque representation. \begin{quote} \begin{tabular}{|l|l|} @@ -230,7 +232,7 @@ \end{tabular} \end{quote} -\paragraph{Error representation} A special representation only used for types conforming to \texttt{Error}. This representation consists of a single reference-counted pointer. The heap allocation is layout-compatible with the \index{Objective-C}Objective-C \texttt{NSError} class. The concrete value and the witness table for the conformance is stored inside the heap allocation. +\paragraph{Error representation} A special representation only used for types conforming to \texttt{Error}. This representation consists of a single \index{reference count}reference-counted pointer. The heap allocation is layout-compatible with the \index{Objective-C}Objective-C \texttt{NSError} class. The concrete value and the witness table for the conformance is stored inside the heap allocation. \begin{quote} \begin{tabular}{|l|l|} @@ -255,11 +257,9 @@ \end{tabular} \end{quote} -\section[]{Generalization Signatures} - \index{metatype type} \index{runtime type metadata} -Swift metatype values have a notion of equality. 
While metatypes are not nominal types, and cannot conform to protocols, in particular the \texttt{Equatable} protocol,\footnote{but maybe one day...} the standard library nevertheless defines an overload of the \texttt{==} operator taking a pair of \texttt{Any.Type} values. You might recall from the previous section that \texttt{Any.Type} is an existential metatype with no constraints, so it is represented is a single pointer to runtime type metadata. Equality of metatypes can therefore implemented as pointer equality. What this means is that runtime type metadata must be unique by construction. Frozen fixed-size types such as \texttt{Int} have statically-emitted metadata which is directly referenced thereafter, so uniqueness is trivial. On the other hand, generic nominal types and structural types such as functions or tuples can be instantiated with arbitrary generic arguments. Since the arguments are recursively guaranteed to be unique, the metadata instantiation function for each kind of type constructor maintains a cache mapping all generic arguments seen so far to instantiated types. Each new instantiation is only constructed once for a given set of generic arguments, guaranteeing uniqueness. +Swift metatype values have a notion of equality. While metatypes are not nominal types, and cannot conform to protocols, in particular the \texttt{Equatable} protocol,\footnote{but maybe one day...} the standard library nevertheless defines an overload of the \texttt{==} operator taking a pair of \texttt{Any.Type} values. You might recall from the previous section that \texttt{Any.Type} is an existential metatype with no constraints, so it is represented as a single pointer to runtime type metadata. Equality of metatypes can therefore be implemented as pointer equality. What this means is that runtime type metadata must be unique by construction.
Frozen fixed-size types such as \texttt{Int} have statically-emitted metadata which is directly referenced thereafter, so uniqueness is trivial. On the other hand, generic nominal types and \index{structural type}structural types such as functions or tuples can be instantiated with arbitrary generic arguments. Since the arguments are recursively guaranteed to be unique, the metadata instantiation function for each kind of type constructor maintains a cache mapping all generic arguments seen so far to instantiated types. Each new instantiation is only constructed once for a given set of generic arguments, guaranteeing uniqueness. \index{mangling} \begin{listing}\captionabove{Example demonstrating uniqueness of runtime metadata}\label{metadataunique} @@ -453,6 +453,8 @@ \ifWIP +\cite{sr55} + A common source of confusion for beginners is that in general, protocols in Swift do not conform to themselves. The layperson's explanation of this is that an existential type is a ``box'' for storing a value with an unknown concrete type. If the box requires that the value's type conforms to a protocol, you can't fit the ``box itself'' inside of another box, because it has the wrong shape. This explanation will be made precise in this section. For many purposes, implicit existential opening introduced in \IndexSwift{5.7}Swift 5.7 \cite{se0352} offers an elegant way around this problem: @@ -468,7 +470,7 @@ } } \end{Verbatim} -The above code type checks in Swift 5.7 because the replacement type for the generic parameter \texttt{A} of \texttt{careForAnimal()} becomes the opened archetype from the payload of \texttt{animal}. The lack of self-conformance can still be observed in Swift 5.7 when a generic parameter type is a structural sub-component of another type: +The above code type checks in Swift 5.7 because the replacement type for the generic parameter \texttt{A} of \texttt{careForAnimal()} becomes the existential archetype from the payload of \texttt{animal}. 
The lack of self-conformance can still be observed in Swift 5.7 when a generic parameter type is a structural sub-component of another type: \begin{Verbatim} func petAnimals<A: Animal>(_ animals: [A]) {...} @@ -493,7 +495,7 @@ doStuff([value]) // okay \end{Verbatim} -\paragraph{AnyObject} The \texttt{AnyObject} type is an existential where the constraint type requires the stored value to be a single reference-counted pointer. The \texttt{AnyObject} existential does not carry any witness tables, so the existential itself has the same representation as its payload. For this reason, the \texttt{AnyObject} existential type satisfies the \texttt{AnyObject} layout constraint. The calling convention of \texttt{doStuff()} takes the type metadata for \texttt{T}, and an array of reference-counted pointers. Passing the type metadata of \texttt{AnyObject} itself for \texttt{T}, and an array of \texttt{AnyObject} values works just fine: +\paragraph{AnyObject} The \texttt{AnyObject} type is an existential where the constraint type requires the stored value to be a single \index{reference count}reference-counted pointer. The \texttt{AnyObject} existential does not carry any witness tables, so the existential itself has the same representation as its payload. For this reason, the \texttt{AnyObject} existential type satisfies the \texttt{AnyObject} layout constraint. The calling convention of \texttt{doStuff()} takes the type metadata for \texttt{T}, and an array of reference-counted pointers. Passing the type metadata of \texttt{AnyObject} itself for \texttt{T}, and an array of \texttt{AnyObject} values works just fine: \begin{Verbatim} func doStuff<T: AnyObject>(_: [T]) {...} @@ -506,7 +508,7 @@ \paragraph{Sendable protocol} The \texttt{Sendable} protocol does not have a witness table or any requirements, so \texttt{Sendable} existentials trivially conform to themselves.
-\paragraph{Certain @objc protocols} \index{Objective-C}Objective-C protocols do not use witness tables to dispatch method calls, so an existential type where all protocols are \texttt{@objc} has the same representation as \texttt{AnyObject}---a single reference-counted pointer. This allows protocol compositions where all terms are \texttt{@objc} protocols to conform to themselves as long as each protocol satisfies some additional conditions: +\paragraph{Certain @objc protocols} \index{Objective-C}Objective-C protocols do not use witness tables to dispatch method calls, so an existential type where all protocols are \texttt{@objc} has the same representation as \texttt{AnyObject}---a single \index{reference count}reference-counted pointer. This allows protocol compositions where all terms are \texttt{@objc} protocols to conform to themselves as long as each protocol satisfies some additional conditions: \begin{enumerate} \item Each inherited protocol must recursively self-conform. \item The protocol must be an \texttt{@objc} protocol. diff --git a/docs/Generics/chapters/extensions.tex b/docs/Generics/chapters/extensions.tex index e68500cc2d641..af1af4b6a51ac 100644 --- a/docs/Generics/chapters/extensions.tex +++ b/docs/Generics/chapters/extensions.tex @@ -2,9 +2,9 @@ \begin{document} -\chapter{Extensions}\label{extensions} +\chapter{Extensions}\label{chap:extensions} -\lettrine{E}{xtensions add members} to existing nominal type declarations. We refer to this nominal type declaration as the \IndexDefinition{extended type}\emph{extended type} of the extension. The extended type may have been declared in the same source file, another source file of the main module, or in some other module. Extensions themselves are \IndexDefinition{extension declaration}declarations, but they are \emph{not} \index{value declaration}value declarations in the sense of \ChapRef{decls}, meaning the extension itself cannot be referenced by name. 
Instead, the members of an extension are referenced as members of the extended type, visible to \index{qualified lookup}qualified name lookup. +\lettrine{E}{xtensions add members} to existing nominal type declarations. We refer to this nominal type declaration as the \IndexDefinition{extended type}\emph{extended type} of the extension. The extended type can originate from the same \index{source file}source file, another source file of the main module, or most generally, some other module. Extensions themselves are \IndexDefinition{extension declaration}declarations, but they are \emph{not} \index{value declaration}value declarations in the sense of \ChapRef{chap:decls}, meaning the extension itself cannot be referenced by name. Instead, the members of an extension become visible as members of the extended type to \index{qualified lookup}qualified name lookup. Consider a module containing a pair of struct declarations, \texttt{Outer} and \texttt{Outer.Middle}: \begin{Verbatim} @@ -22,9 +22,9 @@ \chapter{Extensions}\label{extensions} If a third module subsequently imports both the first and second module, it will see the members \texttt{Outer.Middle.foo()} and \texttt{Outer.Middle.Inner} just as if they were defined inside \texttt{Outer.Middle} itself. \paragraph{Extensions and generics.} -The \index{generic parameter declaration}generic parameters of the extended type are visible in the \index{scope tree}scope of the extension's body. Each extension has a generic signature, which describes the \index{interface type!extension member}interface types of its members. The generic signature of an ``unconstrained'' extension is the same as that of the extended type. Extensions can impose additional requirements on their generic parameters via a \Index{where clause@\texttt{where} clause!extension declaration}\texttt{where} clause; this declares a \emph{constrained extension} with its own generic signature (\SecRef{constrained extensions}). 
An extension can also state a conformance to a protocol, which is represented as \index{normal conformance}normal conformance visible to global conformance lookup. If the extension is unconstrained, this is essentially equivalent to stating a conformance on the extended type. If the extension is constrained, the conformance becomes a \emph{conditional conformance} (\SecRef{conditional conformance}). +The \index{generic parameter declaration}generic parameters of the extended type are visible in the \index{scope tree}scope of the extension's body. Each extension has a generic signature, which describes the \index{interface type!extension member}interface types of its members. An \emph{unconstrained extension} has the same generic signature as the extended type. Extensions can also impose additional requirements on their generic parameters via a \Index{where clause@\texttt{where} clause!extension declaration}\texttt{where} clause; this declares a \emph{constrained extension} with its own generic signature (\SecRef{constrained extensions}). Finally, an extension can state a conformance to a protocol, which, as we recall, declares a \index{normal conformance}normal conformance visible to global conformance lookup. If the extension is unconstrained, this is essentially equivalent to stating a conformance on the extended type. If the extension is constrained, the conformance becomes a \emph{conditional conformance} (\SecRef{sec:conditional conformances}). -Let's begin by taking a closer look at \index{generic parameter list}generic parameter lists of extensions, by way of our nested type \texttt{Outer.Middle} and its extension above. Recall how generic parameters of nominal types work from \SecRef{generic params}: their names are lexically scoped to the body of the type declaration, and each generic parameter uniquely identified by its \index{depth}depth and \index{index}index.
We can represent the declaration context nesting and generic parameter lists of our nominal type declaration \texttt{Outer.Middle} with a diagram like the following: +Let's begin by taking a closer look at \index{generic parameter list}generic parameter lists of extensions, by way of our nested type \texttt{Outer.Middle} and its extension above. Recall how generic parameters of nominal types work from \SecRef{generic params}: their names are lexically scoped to the body of the type declaration, and each generic parameter is uniquely identified by its \index{depth}depth and \index{index}index. We can represent the declaration context nesting and generic parameter lists of our nominal type declaration \texttt{Outer.Middle} with a diagram like the following: \begin{quote} \begin{tikzpicture}[node distance=5mm and 5mm,text height=1.5ex,text depth=.25ex] @@ -45,9 +45,9 @@ \chapter{Extensions}\label{extensions} \end{tikzpicture} \end{quote} -We create the fiction that each generic parameter in the scope of the extended type is also visible inside the extension by \emph{cloning} generic parameter declarations. The cloned declarations have the same name, depth and index as the originals, but they are parented to the extension. This ensures that looking up a generic parameter inside an extension finds a generic parameter with the same depth and index as the one with the same name in the extended type. Since all generic parameter in a single generic parameter list have the same depth, an extension conceptually has multiple generic parameter lists, one for each level of depth (that is, the generic context nesting) of the extended type. +We create the fiction that each generic parameter in the scope of the extended type is also visible inside the extension, by \emph{cloning} generic parameter declarations. The cloned declarations have the same name, depth, and index as the originals, but they are parented to the extension. 
This ensures that looking up a generic parameter inside an extension finds a generic parameter with the same depth and index as the one with the same name in the extended type. -This is represented by linking the generic parameter lists together via an optional ``outer list'' pointer. The innermost generic parameter list is ``the'' generic parameter list of the extension, and the head of the list; it is cloned from the extended type. Its outer list pointer is cloned from the extended type's parent generic context, if any, and so on. The outermost generic parameter list has a null outer list pointer. In our extension of \texttt{Outer.Middle}, this looks like so: +Since all generic parameters in a single generic parameter list have the same depth, an extension may have multiple generic parameter lists, one for each level of depth (that is, generic context nesting) of the extended type. We link the cloned generic parameter lists together using an ``outer list'' pointer. The innermost generic parameter list is ``the'' generic parameter list of the extension, and the head of the list; it is cloned from the extended type. Its outer list is cloned from the extended type's parent generic context, if any, and so on. The outermost generic parameter list has a null outer list. In our extension of \texttt{Outer.Middle}, this looks like so: \begin{quote} \begin{tikzpicture}[every node/.style={draw,rectangle},node distance=5mm and 5mm,text height=1.5ex,text depth=.25ex] @@ -72,7 +72,7 @@ \chapter{Extensions}\label{extensions} struct Inner {} } \end{Verbatim} -When an extension declares a nested generic type, the depth of the nested type's generic parameters becomes one greater than the depth of the extension's innermost generic parameter list. Thus, the generic parameter \texttt{V} of \texttt{Inner} has depth 2 and index 0. 
All three generic parameter lists are visible inside the body of \texttt{Outer.Middle.Inner}: +If an extension declares a nested generic type, the depth of the nested type's generic parameters becomes one greater than the depth of the extension's innermost generic parameter list. Thus, the generic parameter \texttt{V} of \texttt{Inner} has depth 2 and index 0. All three generic parameter lists are visible inside the body of \texttt{Outer.Middle.Inner}: \begin{quote} \begin{tikzpicture}[every node/.style={draw,rectangle},node distance=5mm and 5mm,text height=1.5ex,text depth=.25ex] @@ -129,23 +129,19 @@ \chapter{Extensions}\label{extensions} var flippedX: Int { return -x } // computed property: okay } \end{Verbatim} -All stored properties must be initialized by all declared constructors; extensions being able to introduce stored properties hithero-unknown to the original module would break this invariant. Another reason is that the stored property layout of a struct or class is computed when the struct or class declaration is emitted, and there is no mechanism for extensions to alter this layout after the fact. -\index{enum declaration} -\item Extensions cannot add new cases to enum declarations, for similar reasons as stored properties; doing so would complicate both the static exhaustiveness checking for \texttt{switch} statements and the in-memory layout computation of the enum. +All stored properties must be initialized by all declared constructors; extensions being able to introduce stored properties hitherto-unknown to the original module would break this invariant. Another reason is that the stored property layout of a struct or class is computed when the struct or class declaration is emitted, and there is no mechanism for extensions to alter this layout after the fact. 
+\item Extensions cannot add new cases to \index{enum declaration!extension}enum declarations, for similar reasons as stored properties; doing so would complicate both the static exhaustiveness checking for \texttt{switch} statements and the in-memory layout computation of the enum. -\index{class declaration} -\index{vtable} -\item When the extended type is a \index{class type!extended type}class, the methods of the extension are implicitly \texttt{final}, and the extension's methods are not permitted to override methods from the superclass. Non-\texttt{final} methods declared inside a class are dispatched through a vtable of function pointers attached to the runtime metadata of the class, and there is no mechanism for extensions to add new entries or replace existing entries inherited from the superclass. +\item When the extended type is a \index{class declaration!extension}\index{class type!extended type}class, the methods of the extension are implicitly \texttt{final}, and the extension's methods are not permitted to \index{override}override methods from the superclass. Calls to non-\texttt{final} class methods are dispatched through a \index{vtable}\emph{vtable}, or table of function pointers, attached to the runtime metadata of the class, and there is no mechanism for extensions to add new entries or replace existing entries in the vtable. -\index{nested type declaration} -\item The rules for nested types in extensions are the same as those for other nominal types (\SecRef{nested nominal types}). An extension of a struct, enum or class may contain nested structs, enums and classes, while a protocol extension cannot just as a protocol cannot. Extensions themselves must be at the top level of a source file, but the extended type can be nested inside of another nominal type (or extension). 
+\item The rules for types \index{nested type declaration!extension}nested in extensions are the same as those for types nested in nominal types (\SecRef{nested nominal types}). An extension of a struct, enum, or class may contain a nested struct, enum, or class, while a protocol extension cannot, just as a protocol cannot. Extensions themselves must be at the \index{global declaration}top level of a \index{source file}source file, but the extended type can be nested inside of another nominal type (or extension). \end{itemize} \section{Extension Binding}\label{extension binding} -The extended type of an extension is given as a \index{type representation}type representation following the \texttt{extension} keyword. Extension members are made available to \index{qualified lookup}qualified lookup by the process of \IndexDefinition{extension binding}\emph{extension binding}, which resolves the type representation of an extension to its extended type, and adds the extension's members to the extended type's name lookup table. +The extended type of an extension is specified by the \index{type representation}type representation that follows the \texttt{extension} keyword. Extension members are made available to \index{qualified lookup}qualified lookup by the process of \IndexDefinition{extension binding}\emph{extension binding}, which resolves the type representation of an extension to its extended type, and adds the extension's members to the extended type's name lookup table. -A complication is that the extended type of an extension can itself be declared inside of another extension. Extension binding cannot simply visit all extension declarations in a single pass in source order, because of ordering dependencies between extensions and nested types. Instead, multiple passes are performed; a failure to bind an extension is not a fatal condition, and instead failed extensions are revisited later after subsequent extensions are successfully bound. 
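To illustrate why binding can require multiple passes, consider this hypothetical arrangement, where one extended type is itself declared inside another extension:
\begin{Verbatim}
extension Outer.Middle {  // may fail at first: 'Middle' must first be
  func bar() {}           // added to the lookup table of 'Outer'
}

extension Outer {
  struct Middle {}        // binds on the first pass
}

struct Outer {}
\end{Verbatim}
Resolving \texttt{Outer.Middle} requires that the second extension has already contributed \texttt{Middle} to the name lookup table of \texttt{Outer}, so the first extension may only be bound on a subsequent pass.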
This process iterates until fixed point. +A complication is that the extended type of an extension may \emph{itself} be declared inside of another extension. Extension binding cannot simply visit all extension declarations in a single pass in source order, because of ordering dependencies between extensions and nested types. Instead, multiple passes are performed; a failure to bind an extension is not a fatal condition, and instead failed extensions are revisited later after subsequent extensions are successfully bound. This process iterates until fixed point. \begin{algorithm}[Bind extensions]\label{extension binding algorithm} Takes a list of all extensions in the main module as input, in any order. \begin{enumerate} @@ -159,15 +155,15 @@ \section{Extension Binding}\label{extension binding} \end{algorithm} The worklist-driven extension binding algorithm was introduced in \IndexSwift{5.0}Swift~5. Older compiler releases attempted to bind extensions in a single pass, something that could either succeed or fail depending on declaration order. This incorrect behavior was one of the most frequently reported bugs of all time \cite{sr631}. -\paragraph{Invalid extensions.} If extension binding fails to resolve the extended type of an extension, it simply remains on the delayed list without any diagnostics emitted. Invalid extensions are \index{diagnostic!extension binding}diagnosed later, when the \index{type-check source file request}\Request{type-check source file request} visits all \index{primary file}primary files and attempts to resolve the extended types of any extensions again. +\paragraph{Invalid extensions.} If extension binding fails to resolve the extended type of an extension, it simply remains on the delayed list without any diagnostics emitted. 
Invalid extensions are \index{diagnostic!extension binding}diagnosed later, when the \index{type-check primary file request}\Request{type-check primary file request} visits all \index{primary file}primary files and attempts to resolve the extended types of any extensions again. Extension binding uses a more limited form of type resolution, because we only need to resolve the type representation to a \emph{type declaration} and not a \emph{type}. This type declaration must be a \index{nominal type}nominal type declaration, so the extended type is typically written as an \index{identifier type representation}\emph{identifier} or \index{member type representation}\emph{member} type representation (Sections \ref{identtyperepr}~and~\ref{member type repr}). Extension binding runs early in the type checking process, immediately after parsing and import resolution. We cannot build \index{generic signature!extension binding}generic signatures or \index{conformance checker!extension binding}check conformances in extension binding, because those requests assume that extension binding has already taken place, freely relying on qualified lookup to find members of arbitrary extensions. -In particular, extension binding \index{limitation!extension binding}cannot find declarations \index{synthesized declaration}synthesized by the compiler, including type aliases created by \index{associated type inference}associated type inference. Also, it cannot perform type substitution; this rules out extensions of \index{generic type alias}generic type aliases whose underlying type is itself a type parameter. +In particular, extension binding \index{limitation!extension binding}cannot find declarations \index{synthesized declaration}synthesized by the compiler, including type aliases created by \index{associated type inference!extension binding}associated type inference. 
Also, it cannot perform type substitution; this rules out extensions of \index{generic type alias}generic type aliases whose underlying type is itself a type parameter. -If extension binding fails while the \Request{type-check source file request} successfully resolve the extended type when it visits the extension later, we emit a special diagnostic, tailored by a couple of additional checks. If ordinary type resolution returned a \index{type alias type}type alias type \texttt{Foo} desugaring to a nominal type \texttt{Bar}, the type checker emits the ``extension of type \texttt{Foo} must be declared as an extension of \texttt{Bar}'' diagnostic. Even though we know what the extended type should be at this point, we must still diagnose an error; it is too late to bind the extension, because other name lookups may have already been performed, potentially missing out on finding members of this extension. +If extension binding fails, but the \Request{type-check primary file request} successfully resolves the extended type when it visits the extension later, we emit a special diagnostic, tailored by a couple of additional checks. If ordinary type resolution returned a \index{type alias type}type alias type \texttt{Foo} whose underlying type is a nominal type \texttt{Bar}, the type checker emits the ``extension of type \texttt{Foo} must be declared as an extension of \texttt{Bar}'' diagnostic. Even though we know what the extended type should be at this point, we must still diagnose an error; it is too late to bind the extension, because other name lookups may have already been performed, potentially missing out on finding members of this extension. 
-\begin{example}\label{bad extension 1} An invalid extension of an inferred type alias:\index{horse} +\begin{example}\label{bad extension 1} An invalid extension of a synthesized type alias:\index{horse} \begin{Verbatim} protocol Animal { associatedtype FeedType @@ -182,7 +178,7 @@ \section{Extension Binding}\label{extension binding} // extension of `Hay' extension Horse.FeedType {...} \end{Verbatim} -Extension binding fails to resolve \texttt{Horse.FeedType}, because this type alias was not stated explicitly and is inferred by the conformance checker from the witness \verb|Horse.eat(_:)|. Therefore, the type alias does not exist at extension binding time, and we specifically do not trigger its synthesis. +Extension binding fails to resolve \texttt{Horse.FeedType}, because this type alias was not written in source, so it does not exist when extension binding runs. However, when the \Request{type-check primary file request} visits this extension later, type resolution will trigger associated type inference, which will synthesize the type alias from the witness \verb|Horse.eat(_:)|. We then emit a diagnostic directing the user to write the extended type in a form that can be resolved at extension binding time. \end{example} \begin{example}\label{bad extension 2} An invalid extension of a type alias with an underlying type that is a type parameter: @@ -200,7 +196,7 @@ \section{Extension Binding}\label{extension binding} \end{gather*} \end{example} -The other case where extension binding fails but \textbf{type-check source file request} is able to resolve the extended type is when this type is not actually a nominal type. +The other case where extension binding fails but the \Request{type-check primary file request} is able to resolve the extended type is when this type is not actually a nominal type. 
In this situation, the type checker emits the fallback ``non-nominal type cannot be extended'' diagnostic: \begin{Verbatim} typealias Fn = () -> () @@ -235,7 +231,7 @@ \section{Extension Binding}\label{extension binding} \paragraph{Local types.} \index{local type declaration} -\index{limitation!conditional conformance and local types} +\index{limitation!conditional conformance} Because extensions can only appear at the top level of a source file, the extended type must ultimately be visible from the top level. This allows extensions of types nested inside other top-level types, but precludes extensions of local types nested inside of functions or other local contexts, because there is ultimately no way to name a local type from the top level of a source file. (As a curious consequence, local types cannot conditionally conform to protocols, since the only way to declare a conditional conformance is to write an extension!) \section{Direct Lookup}\label{direct lookup} @@ -244,14 +240,14 @@ \section{Direct Lookup}\label{direct lookup} Nominal type declarations and extensions are \IndexDefinition{iterable declaration context}\emph{iterable declaration contexts}, meaning they contain member declarations. Before discussing direct lookup, let's consider what happens if we ask an iterable declaration context to list its members. This is a lazy operation which triggers work the first time it is called: \begin{itemize} -\item Iterable declaration contexts parsed from source are populated by \index{delayed parsing}delayed parsing; when the \index{parser}parser first reads a source file, the bodies of iterable declaration contexts are skipped, and only the source range is recorded. Asking for the list of members goes and parses the source range again, constructing declarations from their parsed representation (\SecRef{delayed parsing}). 
+\item Iterable declaration contexts parsed from source are populated by \index{delayed parsing}delayed parsing; when the \index{parser}parser first reads a \index{source file}source file, the bodies of iterable declaration contexts are skipped, and only the source range is recorded. Asking for the list of members goes and parses the source range again, constructing declarations from their parsed representation (\SecRef{delayed parsing}). \item Iterable declaration contexts from \index{serialized module}binary and \index{imported module}imported modules are equipped with a \IndexDefinition{lazy member loader}\emph{lazy member loader} which serves a similar purpose. Asking the lazy member loader to list all members will build the corresponding Swift declarations from deserialized records or imported Clang declarations. The lazy member loader can also find just those declarations with a \emph{specific} name, as explained below; this is the more common operation, since it is much more efficient. \end{itemize} \paragraph{Member lookup table.} Every nominal type declaration has an associated \IndexDefinition{member lookup table}\emph{member lookup table}, which is used for direct lookup. This table maps each identifier to a list of value declarations with that name (multiple value declarations can share a name because Swift allows type-based overloading). The declarations in a member lookup table are understood to be members of one or more iterable declaration contexts, which are exactly the type declaration itself and all of its extensions. These iterable declaration contexts might originate from a mix of different module kinds. For example, the nominal type itself might be an \index{Objective-C}Objective-C class from an imported Objective-C module, with one extension declared in a binary Swift module, and another extension defined in the main module, parsed from source. -The lookup table is populated lazily, in a manner resembling a state machine. 
Say we're asked to perform a direct lookup for some given name. If this is the first direct lookup, we populate the member lookup table with \emph{all} members from any \emph{parsed} iterable declaration contexts, which might trigger delayed parsing. Each entry in the member lookup table stores a ``complete'' bit. The ``complete'' bit of these initially-populated entries is \emph{not} set, meaning that the entry only contains those members that were parsed from source. If any iterable declaration contexts originate from binary and imported modules, direct lookup then asks each lazy member loader to selectively load only those members with the given name. (Parsed declaration contexts do not offer this level of laziness, because there is no way to parse a subset of the members only.) After the lazy member loaders do their work, the lookup table entry for this name is now complete, and the ``complete'' bit is set. Later when direct lookup finds a member lookup table entry with the ``complete'' bit set, it knows this entry is fully populated, and the stored list of declarations is returned immediately without querying the lazy member loaders. +The lookup table is populated lazily, in a manner resembling a state machine. Say we're asked to perform a direct lookup for some given name~\tX. If this is the first lookup into this table, we begin by populating the table with \emph{all} members from any \emph{parsed} iterable declaration contexts, which might trigger delayed parsing. Each entry in the member lookup table also stores a ``complete'' bit. The ``complete'' bit of these initially-populated entries is \emph{not} set, meaning that each entry only contains those members that were parsed from source. Next, if any iterable declaration contexts originate from binary or imported modules, direct lookup asks each lazy member loader to selectively load only those members named \tX. 
(Parsed declaration contexts do not offer this level of granularity, because there is no way to find a specific member without parsing them all.) After the lazy member loaders do their work, the lookup table entry for \tX\ is now complete, so we set its ``complete'' bit. If a subsequent direct lookup encounters a member lookup table entry whose ``complete'' bit is already set, the list of declarations stored in the entry is returned immediately, without querying the lazy member loaders. This \emph{lazy member loading} mechanism ensures that only those members which are actually referenced in a compilation session are loaded from serialized and imported iterable declaration contexts. @@ -393,13 +389,13 @@ \section{Direct Lookup}\label{direct lookup} \section{Constrained Extensions}\label{constrained extensions} -An extension can impose its own requirements on the generic parameters of the extended type; we refer to this as a \IndexDefinition{constrained extension}\emph{constrained extension}. These requirements are always additive; the generic signature of a constrained extension is built from the generic signature of the extended type, together with the new requirements. The members of a constrained extension are then only available on specializations of the extended type that satisfy these requirements. There are three ways to declare a constrained extension: +An extension can impose its own requirements on the generic parameters of the extended type; we call this a \IndexDefinition{constrained extension}\emph{constrained extension}. These requirements are always additive; the generic signature of a constrained extension is built from the generic signature of the extended type, together with the new requirements. The members of a constrained extension are then only available on specializations of the extended type that satisfy these requirements. 
There are three ways to declare a constrained extension: \begin{enumerate} \item using a \Index{where clause@\texttt{where} clause!extension declaration}\texttt{where} clause, -\item by writing a \index{generic nominal type!extended type}bound generic type as the extended type, +\item by writing a \index{generic nominal type!extended type}generic nominal type with generic arguments as the extended type, \item by writing a generic \index{type alias type!extended type}type alias type as the extended type, with some restrictions. \end{enumerate} -Case~1 is the most general form; Case~2 and Case~3 can be expressed by writing the appropriate requirements in a \texttt{where} clause. An extension that does not fall under one of these three cases is sometimes called an \IndexDefinition{unconstrained extension}\emph{unconstrained extension} when it is important to distinguish it from a constrained extension. The generic signature of an unconstrained extension is the same as the generic signature of the extended type. +Case~1 is the most general form; Case~2 and Case~3 can also be expressed by writing the appropriate requirements in a \texttt{where} clause. An extension that does not fall under one of these three cases is sometimes called an \IndexDefinition{unconstrained extension}\emph{unconstrained extension} when it is important to distinguish it from a constrained extension. The generic signature of an unconstrained extension is the same as the generic signature of the extended type. 
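As an illustrative sketch (hypothetical, minimal declarations; Case~2 relies on support for extensions of bound generic types in newer compilers), the three forms might look like this:

```swift
// Case 1: a trailing `where` clause on the extension.
extension Set where Element == Int {}

// Case 2: generic arguments written on the extended type;
// equivalent to the `where` clause in Case 1.
extension Set<Int> {}

// Case 3: a non-generic type alias whose underlying type
// supplies the generic arguments; also equivalent to Case 1.
typealias IntSet = Set<Int>
extension IntSet {}
```

All three impose the same-type requirement $\SameReq{Element}{Int}$ on top of the generic signature of \texttt{Set}.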
Here is a constrained extension of \texttt{Set} which constrains the \texttt{Element} type to \texttt{Int}: \begin{Verbatim} @@ -415,7 +411,7 @@ \section{Constrained Extensions}\label{constrained extensions} \begin{Verbatim} extension Set<Int> {...} \end{Verbatim} -A non-generic type alias type whose underlying type is a bound generic type can also be used in the same manner: +A non-generic type alias type whose underlying type is a generic nominal type can also be used in the same manner: \begin{Verbatim} typealias StringMap = Dictionary<String, Int> extension StringMap {...} \end{Verbatim} @@ -474,16 +470,16 @@ \section{Constrained Extensions}\label{constrained extensions} \end{verbatim} \item This type alias does not substitute the second generic parameter of the underlying type with the second generic parameter of the type alias: \begin{verbatim} -typealias D = Dictionary Int> +typealias C = Dictionary Int> \end{verbatim} \end{enumerate} \end{example} -\section{Conditional Conformances}\label{conditional conformance} +\section{Conditional Conformances}\label{sec:conditional conformances} A conformance written on a nominal type or unconstrained extension implements the protocol's requirements for all specializations of the nominal type, and we call this an \emph{unconditional} conformance. A conformance declared on a \emph{constrained} extension is what's known as a \IndexDefinition{conditional conformance}\emph{conditional conformance}, which implements the protocol requirements only for those specializations of the extended type which satisfy the requirements of the extension. Conditional conformances were introduced in \IndexSwift{4.2}Swift 4.2~\cite{se0143}. -For example, arrays have a natural notion of equality, defined in terms of the equality operation on the element type. However, there is no reason to restrict all arrays to only storing \texttt{Equatable} types. 
Instead, the standard library defines a conditional conformance of \texttt{Array} to \texttt{Equatable} where the element type is \texttt{Equatable}: +For example, arrays have a natural notion of equality, defined in terms of an equality operation on the element type. However, we don't require that every array has an \texttt{Equatable} element type. Instead, we declare a conditional conformance of \texttt{Array} to \texttt{Equatable}, for those arrays whose element type is \texttt{Equatable}: \begin{Verbatim} struct Array<Element> {...} @@ -500,7 +496,7 @@ \section{Conditional Conformances}\label{conditional conformance} } } \end{Verbatim} -More complex conditional requirements can also be written. We previously discussed overlapping conformances and coherence in \SecRef{conformance lookup}, and conditional conformances inherit an important restriction. A nominal type can only conform to a protocol once, so in particular, \index{overlapping conformance}overlapping conditional conformances are not supported and we emit a \index{diagnostic!overlapping conformance}diagnostic: +More complex conditional requirements can also be written. We previously discussed overlapping conformances and coherence in \SecRef{conformance lookup}, and conditional conformances inherit an important restriction. A nominal type can still only conform to a protocol once, even if that conformance is conditional, so in particular, \index{overlapping conformance}overlapping conditional conformances are not supported and we emit a \index{diagnostic!overlapping conformance}diagnostic: \begin{Verbatim} struct G {} @@ -533,7 +529,7 @@ \section{Conditional Conformances}\label{conditional conformance} \end{Verbatim} \paragraph{Specialized conditional conformances.} Building upon \SecRef{conformance lookup}, we now describe how substitution relates to conditional conformances. 
If $\tXd$ is the \index{declared interface type!nominal type declaration}declared interface type of some nominal type declaration~$d$ that \emph{unconditionally} conforms to \tP, and $\tX = \tXd \otimes \Sigma$ is a \index{specialized type}specialized type of~$d$ for some substitution map $\Sigma$, then looking up the conformance of \tX\ to $\texttt{P}$ returns a \index{specialized conformance!conditional conformance}specialized conformance with \index{conformance substitution map}conformance substitution map~$\Sigma$: \[\PP\otimes\tX=\ConfReq{$\tXd$}{P}\otimes\Sigma\] -If $\ConfReq{$\tXd$}{P}$ is conditional though, we cannot take~$\Sigma$ to be the \index{context substitution map!for a declaration context}context substitution map of~\tX. The \index{type witness!of conditional conformance}type witnesses of $\ConfReq{$\tXd$}{P}$ might contain type parameters of the constrained extension, and not just the conforming type; however the context substitution map of \tX\ has the generic signature of the conforming type. We must set $\Sigma$ to be the context substitution map of \tX\ for the generic signature of the constrained extension, as in \SecRef{member type repr}. Indeed, this is the same problem as member type resolution when the referenced type declaration is declared in a constrained extension; we must perform some additional \index{global conformance lookup!substitution map}global conformance lookups to populate the substitution map completely. +If $\ConfReq{$\tXd$}{P}$ is conditional though, we cannot take~$\Sigma$ to be the \index{context substitution map!for a declaration context}context substitution map of~\tX. The \index{type witness!of conditional conformance}type witnesses of $\ConfReq{$\tXd$}{P}$ might contain type parameters of the constrained extension, and not just the conforming type; however the context substitution map of~\tX\ has the generic signature of the conforming type. 
We instead define $\Sigma$ to be the context substitution map of~\tX\ for the generic signature of the constrained extension, as in \SecRef{member type repr}. Indeed, we find ourselves in the same situation as in member type resolution, when the referenced type declaration is declared in a constrained extension; we must perform some additional \index{global conformance lookup!substitution map}global conformance lookups to completely populate the substitution map. The conditional requirements of the specialized conformance $\XP$ are the \index{substituted requirement}substituted requirements obtained by applying $\Sigma$ to each conditional requirement of \ConfReq{$\tXd$}{P}. This makes the following \index{commutative diagram!conditional conformance}diagram commute for each conditional requirement~$R$: \begin{center} @@ -544,7 +540,7 @@ \section{Conditional Conformances}\label{conditional conformance} \end{tikzcd} \end{center} -Consider the $\ConfReq{Array<\rT>}{Equatable}$ conformance from the standard library. The generic signature of \texttt{Array} is \texttt{<\rT>} with no requirements, while the conformance context is the constrained extension with signature \texttt{<\rT\ where \rT:~Equatable>}. The context substitution map of \texttt{Array} for the constrained extension is: +Consider the $\ConfReq{Array<\rT>}{Equatable}$ conformance from the standard library. The generic signature of \texttt{Array} is \texttt{<\rT>}, with no requirements, while the conformance context is the constrained extension with signature \texttt{<\rT\ where \rT:~Equatable>}. 
The context substitution map of \texttt{Array<Int>} for the constrained extension is: \begin{align*} \Sigma := \SubstMapC{ &\SubstType{\rT}{Int}}{ @@ -592,13 +588,13 @@ \section{Conditional Conformances}\label{conditional conformance} protocol Base {...} protocol Derived: Base {...} \end{Verbatim} -When checking a conformance to \texttt{Derived}, the \index{conformance checker}conformance checker ensures that the conforming type satisfies the $\ConfReq{Self}{Base}$ requirement. When the conformance to \texttt{Base} is unconditional, this always succeeds, because the conformance declaration also implies an unconditional conformance to the base protocol: +When checking a conformance to \texttt{Derived}, the \index{conformance checker}conformance checker ensures that the conforming type satisfies the $\ConfReq{Self}{Base}$ requirement. When the conformance to \texttt{Base} is unconditional, this always succeeds, because the conformance declaration also \index{implied conformance}\emph{implies} an unconditional conformance to the base protocol: \begin{Verbatim} struct Pair {} extension Pair: Derived {...} // implies `extension Pair: Base {}' \end{Verbatim} -The nominal type's \index{conformance lookup table}conformance lookup table synthesizes these implied conformances and makes them available to global conformance lookup. 
With conditional conformances, such implied conformances are not synthesized, because there is no way to guess what the conditional requirements should be. The conformance checker still checks the associated conformance requirement on \tSelf\ though, so the user must first explicitly declare a conformance to each base protocol when writing a conditional conformance. Suppose we wish for \texttt{Pair} to conform to \texttt{Derived} when $\SameReq{\rT}{Int}$: \begin{Verbatim} @@ -653,7 +649,7 @@ \section{Conditional Conformances}\label{conditional conformance} \paragraph{Termination.} Conditional conformances can express \index{non-terminating computation}non-terminating computation at compile time. -The below code is taken from a bug report which remains \index{limitation!non-terminating conditional conformance}unfixed for the time being \cite{sr6724}: +The below code is taken from a bug report which remains \index{limitation!conditional conformance}unfixed for the time being \cite{sr6724}: \begin{Verbatim} protocol P {} @@ -723,7 +719,7 @@ \section{Conditional Conformances}\label{conditional conformance} The substitution map $\Sigma:=\SubstMapC{\SubstType{\rT}{Bad}}{\SubstConf{\rT}{Bad}{Sequence}}$ satisfies all of the explicit requirements of its generic signature, but it does not satisfy the derived requirement $\ConfReq{\rT.[Sequence]Iterator}{IteratorProtocol}$. However, this is not a problem, because we still \index{diagnostic!conformance checker}diagnose an error when we \index{conformance checker}check the conformance, so the program is rejected anyway. -Unfortunately, we can write down a substitution map that is not well-formed, and yet \AlgRef{check generic arguments algorithm} does not produce any diagnostics at all, including during conformance checking. This reveals a \index{limitation!conditional conformance soundness hole}soundness hole with conditional conformances that remains unfixed for the time being. 
We start with two protocols: +Unfortunately, we can write down a substitution map that is not well-formed, and yet \AlgRef{check generic arguments algorithm} does not produce any diagnostics at all, including during conformance checking. This reveals a \index{limitation!conditional conformance}soundness hole with conditional conformances that remains unfixed for the time being. We start with two protocols: \begin{Verbatim} protocol Bar { associatedtype Beer @@ -792,7 +788,7 @@ \section{Conditional Conformances}\label{conditional conformance} \end{enumerate} The second approach is more desirable, and we will revisit this problem in \ChapRef{rqm minimization}. -\section{Source Code Reference}\label{extensionssourceref} +\section{Source Code Reference}\label{src:extensions} Key source files: \begin{itemize} @@ -845,7 +841,7 @@ \subsection*{Direct Lookup and Lazy Member Loading} \IndexSource{direct lookup} \apiref{DirectLookupRequest}{class} -The request evaluator request implementing direct lookup. The entry point is the \texttt{NominalTypeDecl::lookupDirect()} method, which was introduced in \SecRef{compilation model source reference}. This request is uncached, because the member lookup table effectively implements caching outside of the request evaluator. +The request evaluator request implementing direct lookup. The entry point is the \texttt{NominalTypeDecl::lookupDirect()} method, which was introduced in \SecRef{src:compilation model}. This request is uncached, because the member lookup table effectively implements caching outside of the request evaluator. To understand the implementation of \verb|DirectLookupRequest::evaluate()|, one can start with the following functions: \begin{itemize} @@ -876,10 +872,10 @@ \subsection*{Constrained Extensions} \item \SourceFile{lib/Sema/TypeCheckDecl.cpp} \item \SourceFile{lib/Sema/TypeCheckGeneric.cpp} \end{itemize} -The \texttt{GenericSignatureRequest} was previously introduced in \SecRef{buildinggensigsourceref}. 
It delegates to a pair of utility functions to implement special behaviors of extensions. +The \texttt{GenericSignatureRequest} was previously introduced in \SecRef{src:building generic signatures}. It delegates to a pair of utility functions to implement special behaviors of extensions. \apiref{collectAdditionalExtensionRequirements()}{function} -Collects an extension's requirements from the extended type, which handles extensions of pass-through type aliases (\verb|extension CountableRange {...}|) and extensions of bound generic types (\verb|extension Array {...}|). +Collects an extension's requirements from the extended type; this handles extensions of pass-through type aliases (\verb|extension CountableRange {...}|) and extensions of generic nominal types with generic arguments (\verb|extension Array<Int> {...}|). \IndexSource{pass-through type alias} \apiref{isPassthroughTypealias()}{function} @@ -897,9 +893,9 @@ \subsection*{Conditional Conformances} \end{itemize} \apiref{checkConformance}{function} -A utility function that first calls \texttt{lookupConformance()} (\SecRef{conformancesourceref}), and then checks any conditional requirements using \texttt{checkRequirements()}, described in \SecRef{type resolution source ref}. +A utility function that first calls \IndexSource{global conformance lookup}\texttt{lookupConformance()} (\SecRef{src:conformances}), and then checks any conditional requirements using \texttt{checkRequirements()}, described in \SecRef{src:type resolution}. -The \verb|NormalProtocolConformance| and \verb|SpecializedProtocolConformance| classes were previously introduced in \SecRef{conformancesourceref}. +The \verb|NormalProtocolConformance| and \verb|SpecializedProtocolConformance| classes were previously introduced in \SecRef{src:conformances}. 
\apiref{NormalProtocolConformance}{class} \begin{itemize} \item \texttt{getConditionalRequirements()} returns an array of \IndexSource{conditional requirement}conditional requirements; this is non-empty exactly when this is a \IndexSource{conditional conformance}conditional conformance. @@ -909,7 +905,7 @@ \subsection*{Conditional Conformances} \item \texttt{getConditionalRequirements()} applies the \IndexSource{conformance substitution map}conformance substitution map to each conditional requirement of the underlying normal conformance. \end{itemize} \apiref{GenericSignatureImpl}{class} -See also \SecRef{genericsigsourceref}. +See also \SecRef{src:generic signatures}. \begin{itemize} \item \texttt{requirementsNotSatisfiedBy()} returns an array of those requirements of this generic signature not satisfied by the given generic signature. This is used for computing the conditional requirements of a \texttt{NormalProtocolConformance}. \end{itemize} diff --git a/docs/Generics/chapters/generic-signatures.tex b/docs/Generics/chapters/generic-signatures.tex index 37bdeb9a0ef34..ff55b032f71a6 100644 --- a/docs/Generics/chapters/generic-signatures.tex +++ b/docs/Generics/chapters/generic-signatures.tex @@ -2,7 +2,7 @@ \begin{document} -\chapter{Generic Signatures}\label{genericsig} +\chapter{Generic Signatures}\label{chap:generic signatures} \index{opaque parameter} \lettrine{G}{eneric signatures} are semantic objects that describe the type checking behavior of \index{generic declaration}generic declarations. Declarations can nest, and outer generic parameters are visible inside inner declarations, so a \IndexDefinition{generic signature}generic signature is a ``flat'' representation that collects all \emph{generic parameter types} and \emph{requirements} that apply in the declaration's scope. 
This abstracts away the concrete syntax described in the previous chapter: @@ -16,9 +16,9 @@ \chapter{Generic Signatures}\label{genericsig} \[\underbrace{\mathstrut\texttt{}}_{\text{requirements}}\] \end{ceqn} -The requirements in a generic signature use the same semantic representation as the requirements in a trailing \texttt{where} clause, given by \DefRef{requirement def}, with a few additional properties. A generic signature always omits any redundant requirements from the list, and the remaining ones are written to be ``as simple as possible.'' We will describe this in \ChapRef{building generic signatures}, when we see how the generic signature of a declaration is built from the syntactic forms described in the previous chapter, but for now, we're just going to assume we're working with an existing generic signature that was given to us by this black box. +The requirements in a generic signature use the same semantic representation as the requirements in a trailing \texttt{where} clause, given by \DefRef{requirement def}, with a few additional properties. A generic signature always omits any redundant requirements from the list, and the remaining ones are rewritten to be ``as simple as possible.'' We will fully explain this in \ChapRef{chap:building generic signatures}, when we see how the generic signature of a declaration is built from the syntactic forms described in the previous chapter, but for now, we're just going to assume we're working with an existing generic signature that was given to us. -After some preliminaries, we will go on to introduce a formal system for reasoning about requirements and type parameters in \SecRef{derived req}. This makes precise the earlier concept of the \index{interface type}\emph{interface type} of a declaration---it contains valid type parameters of the declaration's generic signature. 
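As a brief, hedged illustration of this simplification (hypothetical declaration; the printed form is what the \texttt{-debug-generic-signatures} frontend flag would show):

```swift
// `Hashable` refines `Equatable`, so the second requirement in the
// `where` clause is redundant; the built generic signature keeps only
// `T: Hashable` and prints as: <T where T : Hashable>
func f<T>(_: T) where T: Hashable, T: Equatable {}
```

The redundant conformance requirement is omitted from the generic signature, even though it appears in the written \texttt{where} clause.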
\SecRef{genericsigqueries} describes \emph{generic signature queries}, which are fundamental primitives in the implementation, used by the rest of the compiler to answer questions about generic signatures. These questions will be statements in our formal system. +After some preliminaries, we will go on to introduce a formal system for reasoning about requirements and type parameters in \SecRef{derived req}. This makes precise the earlier concept of the \index{interface type}\emph{interface type} of a declaration---it contains type parameters whose validity can be derived from the declaration's generic signature. \SecRef{genericsigqueries} describes \emph{generic signature queries}, which are fundamental primitives in the implementation, used by the rest of the compiler to answer questions about generic signatures. These questions will be precisely stated using our formal system. \paragraph{Debugging.} The \IndexFlag{debug-generic-signatures}\texttt{-debug-generic-signatures} frontend flag gives us a glimpse into the generics implementation by printing the generic signature of each declaration being type checked. Here is a simple program with three nested generic declarations: \begin{Verbatim} @@ -54,7 +54,7 @@ \chapter{Generic Signatures}\label{genericsig} where S1.Element == S2.Element, S1.Iterator == S2.Iterator {...} \end{Verbatim} -The first function expects two sequences with the same \texttt{Element} associated type, so ``\verb|sameElt(Array<Int>(), Set<Int>())|'' for example. The second requires both have the same \texttt{Iterator} type. The third requires both, but a consequence of how \texttt{Sequence} is declared in the standard library is that the second is a stronger condition; if two sequences have the same \texttt{Iterator} type, they will also have the same \texttt{Element} type, but not vice versa.
In other words, the same-type requirement $\SameReq{S1.Element}{S2.Element}$ is \emph{redundant} in the trailing \texttt{where} clause of \texttt{sameEltAndIter()}. (We will be able to \emph{prove} it when we revisit this generic signature in \ExRef{same name rule example}.) When we compile this program with \texttt{-debug-generic-signatures}, we observe that the generic signature of \texttt{sameElt()} is distinct from the other two, while \texttt{sameIter()} and \texttt{sameEltAndIter()} have a generic signature that looks like this: +The first function expects two sequences with the same \texttt{Element} associated type, so ``\verb|sameElt(Array<Int>(), Set<Int>())|'' for example. The second function requires both have the same \texttt{Iterator} type. The third function requires both conditions, but a consequence of how \texttt{Sequence} is declared in the standard library is that the second condition is stronger than the first; that is, if two sequences have the same \texttt{Iterator} type, they \emph{also} have the same \texttt{Element} type. In other words, the same-type requirement $\SameReq{S1.Element}{S2.Element}$ is \emph{redundant} in \texttt{sameEltAndIter()}. (We will be able to \emph{prove} this fact when we revisit this generic signature in \ExRef{same name rule example}.) When we compile this program with \texttt{-debug-generic-signatures}, we observe that the generic signature of \texttt{sameElt()} is distinct from the other two, while \texttt{sameIter()} and \texttt{sameEltAndIter()} have a generic signature that looks like this: \begin{quote} \begin{verbatim} 1 @@ -632,11 +632,11 @@ \section{Valid Type Parameters}\label{valid type params} \begin{gather*} \AssocSameStep{1}{\rT.$\texttt{SubSequence}^{n+1}$}{\rT.$\texttt{SubSequence}^{n+2}$}{5} \end{gather*} -These form an equivalence class from all \texttt{\rT.$\texttt{SubSequence}^n$.SubSequence} for $n\geq 0$.
Finally, applying \textsc{SameName} gives us one more infinite family: +We get an equivalence class containing \texttt{\rT.$\texttt{SubSequence}^n$} for $n\geq 1$. Finally, applying \textsc{SameName} gives us one more infinite family: \begin{gather*} \SameNameStep{5}{2}{\rT.$\texttt{SubSequence}^{n+1}$.Iterator}{\rT.$\texttt{SubSequence}^{n+2}$.Iterator}{6} \end{gather*} -These form the last equivalence class, because the \texttt{SubSequence} may have a distinct \texttt{Iterator} from the original sequence. +These form our final equivalence class; the \texttt{SubSequence} may have a distinct \texttt{Iterator} from the original sequence. We see that $G_\texttt{Collection}$ defines five equivalence classes, and three of those contain infinitely many representative type parameters each: \begin{align*} @@ -695,13 +695,13 @@ \section{Bound Type Parameters}\label{bound type params} We often apply substitution maps to requirements of \index{generic signature!type substitution}generic signatures and requirement signatures, and the \index{interface type}interface types of declarations. Indeed, all of these semantic objects must only contain bound type parameters. We resolve this as follows: \begin{itemize} \item Requirement minimization (\SecRef{minimal requirements}) converts unbound type parameters in the \texttt{where} clause into bound type parameters when building a generic signature. -\item Queries against an existing generic signature (\SecRef{genericsigqueries}) allow unbound type parameters. We will see that one can obtain a bound type parameter by asking the generic signature for an unbound type parameter's \index{reduced type}\emph{reduced type}. -\item Type resolution makes use of generic signature queries to form bound dependent member types when resolving the interface type of a declaration (\ChapRef{typeresolution}). +\item Generic signature queries (\SecRef{genericsigqueries}) accept unbound type parameters. 
One can obtain a bound type parameter from an unbound type parameter by asking the generic signature for the unbound type parameter's \index{reduced type}\emph{reduced type}. +\item Type resolution uses generic signature queries to form bound type parameters when resolving the interface type of a declaration (\ChapRef{chap:type resolution}). \end{itemize} We now extend our formal system to describe these behaviors. As always, let $G$ be a generic signature. We assume that $G\vdash\TP$ for some~\tT\ and~\tP. -We first add an inference rule that is analogous to \IndexStep{AssocName}\textsc{AssocName} except that it derives a bound dependent member type. From each \index{associated type declaration!inference rule}associated type declaration~\nA\ of~\tP, the \IndexStepDefinition{AssocDecl}\textsc{AssocDecl} inference rule derives the \index{bound dependent member type!inference rule}bound \index{dependent member type!inference rule}dependent member type \texttt{T.[P]A}, with base type~\tT, referencing the associated type declaration~\nA: +We first add an inference rule analogous to \IndexStep{AssocName}\textsc{AssocName}, except that it derives a bound dependent member type. 
From each \index{associated type declaration!inference rule}associated type declaration~\nA\ of~\tP, the \IndexStepDefinition{AssocDecl}\textsc{AssocDecl} inference rule derives the \index{bound dependent member type!inference rule}bound \index{dependent member type!inference rule}dependent member type \texttt{T.[P]A}, with base type~\tT, referencing the associated type declaration~\nA: \begin{gather*} \AssocDeclStepDef \end{gather*} @@ -772,7 +772,7 @@ \section{Bound Type Parameters}\label{bound type params} \end{gather*} \end{example} -In fact, we will prove the following in \ChapRef{building generic signatures}: +In fact, we will prove the following in \ChapRef{chap:building generic signatures}: \begin{itemize} \item \ThmRef{bound and unbound equiv} will show that every equivalence class of valid type parameters always contains at least one bound and at least one unbound type parameter. \item \PropRef{equiv generic signatures} will tell us that we can start with a list of explicit requirements written with bound or unbound type parameters without changing the theory. @@ -809,10 +809,10 @@ \section{Bound Type Parameters}\label{bound type params} So far, we've only used our formal system to derive concrete statements \emph{in} a fixed generic signature. In later chapters, we prove results \emph{about} generic signatures: \begin{itemize} -\item In \ChapRef{building generic signatures}, we describe how we diagnose invalid requirements, and show that if no diagnostics are emitted, our formal system has a particularly nice theory. +\item In \ChapRef{chap:building generic signatures}, we describe how we diagnose invalid requirements, and show that if no diagnostics are emitted, our formal system has a particularly nice theory. \item In \ChapRef{conformance paths}, we take a closer look at derived conformance requirements, to describe substitution of dependent member types; in particular, we prove that a certain algorithm must terminate. 
\item In \ChapRef{monoids}, we use derived requirements to show that a Swift protocol can encode an arbitrary finitely-presented monoid, which demonstrates that a generic signature can have an undecidable theory. -\item In \ChapRef{symbols terms rules}, we will translate the explicit requirements of a generic signature into rewrite rules, and then show that derived requirements correspond to \emph{rewrite paths} under this mapping. This provides a correctness proof for the implementation. +\item In \ChapRef{chap:symbols terms rules}, we will translate the explicit requirements of a generic signature into rewrite rules, and then show that derived requirements correspond to \emph{rewrite paths} under this mapping. This provides a correctness proof for the implementation. \end{itemize} \section{Reduced Type Parameters}\label{reduced types} @@ -842,7 +842,7 @@ \section{Reduced Type Parameters}\label{reduced types} We will define a linear order on type parameters, denoted by $<$. This will express the idea that if $G\vdash\TU$ and $\tT<\tU$, then \tT\ is ``more reduced'' than \tU, within this equivalence class that contains both. The minimum out of all representatives is then the reduced type parameter itself. We will start by measuring the ``complexity'' of a type parameter by counting the number of dependent member types involved in its construction: \begin{definition} -The \index{length!of type parameter}\IndexDefinition{type parameter length}\emph{length} of a type parameter \tT, denoted by $|\tT|$, is a natural number. The length of a generic parameter type is~1, while the length of a dependent member type \texttt{U.A} or \texttt{U.[P]A} is recursively defined as $|\tU|+1$. +The \IndexDefinition{type parameter length}\emph{length} of a type parameter \tT, denoted by $|\tT|$, is a natural number. The length of a generic parameter type is~1, while the length of a dependent member type \texttt{U.A} or \texttt{U.[P]A} is recursively defined as $|\tU|+1$. 
\end{definition} We want the reduced type parameter to be one of minimum possible length, and we also want to be able to apply substitution maps to it, so it ought to be a \index{bound type parameter!type substitution}bound type parameter. This gives us two conditions our type parameter order must satisfy: @@ -875,9 +875,9 @@ \section{Reduced Type Parameters}\label{reduced types} (The real \texttt{Collection} protocol defines these reduced types together with a few more equivalence classes related to the \texttt{Index} and \texttt{Indices} associated types.) \end{example} -In general, we must also order type parameters of equal length, so we show how we do this now. All of the below algorithms take a pair of values $x$ and $y$ and then compute whether $x<y$, $x>y$ or $x=y$ simultaneously. We start with the generic parameters, which are the type parameters of length~1. +In general, we must also order type parameters of equal length, so we show how we do this now. All of the below algorithms take a pair of values $x$ and $y$ and then compute whether $x<y$, $x>y$, or $x=y$ simultaneously. We start with the generic parameters, which are the type parameters of length~1. -\begin{algorithm}[Generic parameter order]\label{generic parameter order} \IndexDefinition{generic parameter order}Takes two \index{generic parameter type}generic parameter types \ttgp{d}{i} and \ttgp{D}{I} as input. Returns one of ``$<$'', ``$>$'' or ``$=$'' as output. +\begin{algorithm}[Generic parameter order]\label{generic parameter order} \IndexDefinition{generic parameter order}Takes two \index{generic parameter type}generic parameter types \ttgp{d}{i} and \ttgp{D}{I} as input. Returns one of ``$<$'', ``$>$'', or ``$=$'' as output. \begin{enumerate} \item If $\texttt{d}<\texttt{D}$, return ``$<$''. \item If $\texttt{d}>\texttt{D}$, return ``$>$''.
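The depth-then-index comparison above can be sketched as a toy Swift model, where a generic parameter \ttgp{d}{i} is represented by a depth/index pair (this is illustrative only, not compiler code):

```swift
// Toy sketch of the generic parameter order: compare depths first,
// then indices, mirroring the algorithm's steps.
struct GenericParam {
    var depth: Int
    var index: Int
}

// Returns "<", ">", or "=".
func compare(_ lhs: GenericParam, _ rhs: GenericParam) -> String {
    if lhs.depth < rhs.depth { return "<" }
    if lhs.depth > rhs.depth { return ">" }
    if lhs.index < rhs.index { return "<" }
    if lhs.index > rhs.index { return ">" }
    return "="
}
```

For example, `compare(GenericParam(depth: 0, index: 1), GenericParam(depth: 1, index: 0))` yields `"<"`: depth is compared before index.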
@@ -889,7 +889,7 @@ \section{Reduced Type Parameters}\label{reduced types} To order dependent member types, we must order associated type declarations, and to do that we must order protocol declarations. -\begin{algorithm}[Protocol order]\label{linear protocol order} Takes \IndexDefinition{protocol order}protocols \tP\ and \tQ\ as input, and returns one of ``$<$'', ``$>$'' or ``$=$'' as output. +\begin{algorithm}[Protocol order]\label{linear protocol order} Takes \IndexDefinition{protocol order}protocols \tP\ and \tQ\ as input, and returns one of ``$<$'', ``$>$'', or ``$=$'' as output. \begin{enumerate} \item Compare the parent \index{module declaration}module names of \tP\ and \tQ\ with the usual lexicographic order on identifiers. Return the result if it is ``$<$'' or ``$>$''. Otherwise, \tP\ and \tQ\ are declared in the same module, so keep going. \item Compare the names of \tP\ and~\tQ\ and return the result if it is ``$<$'' or ``$>$''. If \tP~and~\tQ\ are actually the same protocol, return ``$=$''. Otherwise, the program is invalid because it declares two protocols with the same name. Any tie-breaker can be used, such as source location. @@ -919,7 +919,7 @@ \section{Reduced Type Parameters}\label{reduced types} \IndexDefinition{associated type order}% \begin{algorithm}[Associated type order]\label{associated type order}% -Takes associated type declarations $\nA_1$ and $\nA_2$ as input, and returns one of ``$<$'', ``$>$'' or ``$=$'' as output. +Takes associated type declarations $\nA_1$ and $\nA_2$ as input, and returns one of ``$<$'', ``$>$'', or ``$=$'' as output. \begin{enumerate} \item First, compare their names lexicographically. Return the result if it is ``$<$'' or ``$>$''. Otherwise, both associated types have the same name, so keep going. \item If $\nA_1$ is a root associated type and $\nA_2$ is not, return ``$<$''. @@ -933,7 +933,7 @@ \section{Reduced Type Parameters}\label{reduced types} We now have enough to order all type parameters. 
The type parameter order does not distinguish \index{sugared type}type sugar, so it outputs ``$=$'' if and only if \tT\ and \tU\ are canonically equal. \begin{algorithm}[Type parameter order]\label{type parameter order} -Takes type parameters \tT\ and \tU\ as input, and returns one of ``$<$'', ``$>$'' or ``$=$'' as output. +Takes type parameters \tT\ and \tU\ as input, and returns one of ``$<$'', ``$>$'', or ``$=$'' as output. \begin{enumerate} \item If \tT\ is a generic parameter type and \tU\ is a \index{dependent member type!linear order}dependent member type, then $|\tT|<|\tU|$, so return ``$<$''. \item If \tT\ is a dependent member type and \tU\ is a generic parameter type, then $|\tT|>|\tU|$, so return ``$>$''. @@ -986,7 +986,7 @@ \section{Reduced Type Parameters}\label{reduced types} \item The in-memory layout of a \index{witness table}witness table stores an associated witness table for each associated conformance requirement in the protocol's \index{requirement signature!witness table}requirement signature, sorted in this order. \end{enumerate} -In \ChapRef{genericenv}, we introduce \emph{archetypes}, a self-describing representation of a reduced type parameter packaged up with a generic signature, which behaves more like a concrete type inside the compiler. An archetype thus represents an entire equivalence class of type parameters, and for this reason we will also use the equivalence class notation~$\archetype{T}$ to denote the archetype corresponding to the type parameter~\tT. +In \ChapRef{chap:archetypes}, we introduce \emph{archetypes}, a self-describing representation of a reduced type parameter packaged up with a generic signature, which behaves more like a concrete type inside the compiler. An archetype thus represents an entire equivalence class of type parameters, and for this reason we will also use the equivalence class notation~$\archetype{T}$ to denote the archetype corresponding to the type parameter~\tT. 
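The length-first flavor of this order (a shortlex order, as discussed below) can be sketched with a toy model. The tie-breaking here compares bases and then member names lexicographically, which is a deliberate simplification of the real algorithm: the actual order also compares protocols and associated type declarations as described above.

```swift
// Toy model of type parameters, illustrating the shortlex character of
// the type parameter order: shorter type parameters always precede
// longer ones, and parameters of equal length are compared
// component-wise. (Simplified; not compiler code.)
indirect enum TypeParam {
    case genericParam(depth: Int, index: Int)
    case member(base: TypeParam, name: String)

    var length: Int {
        switch self {
        case .genericParam: return 1
        case .member(let base, _): return base.length + 1
        }
    }
}

// Returns "<", ">", or "=".
func compare(_ t: TypeParam, _ u: TypeParam) -> String {
    if t.length < u.length { return "<" }
    if t.length > u.length { return ">" }
    switch (t, u) {
    case (.genericParam(let d1, let i1), .genericParam(let d2, let i2)):
        if (d1, i1) < (d2, i2) { return "<" }
        if (d1, i1) > (d2, i2) { return ">" }
        return "="
    case (.member(let b1, let n1), .member(let b2, let n2)):
        let base = compare(b1, b2)
        if base != "=" { return base }
        if n1 < n2 { return "<" }
        if n1 > n2 { return ">" }
        return "="
    default:
        fatalError("unreachable: equal lengths imply matching cases")
    }
}
```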
A more complete treatment of equivalence relations and partial orders can be found in a discrete mathematics textbook, such as~\cite{grimaldi}. Something to keep in mind is that some authors say that a partial order is reflexive instead of anti-reflexive, so $\leq$ is the basic operation. This necessitates the additional condition that if both $x\leq y$ and $y\leq x$ are true, then $x=y$; without this assumption, we have a \index{preorder}\emph{preorder}. Sometimes, a linear order is also called a \emph{total order}, and a set with a well-founded linear order is said to be \emph{well-ordered}. The type parameter order is actually a special case of a \index{shortlex order}\emph{shortlex order}. Under reasonable assumptions, a shortlex order is always well-founded. We will see another instance of a shortlex order in \SecRef{finding conformance paths}, and then generalize the concept when we introduce the theory of string rewriting in \SecRef{rewritesystemintro}. @@ -1030,7 +1030,7 @@ \section{Generic Signature Queries}\label{genericsigqueries} \end{itemize} As for $\Query{getRequiredProtocols}{}$, because the ``for all'' quantifier is selecting from a finite universe of protocols, a correct but inefficient implementation would repeatedly check if $\Query{requiresProtocol}{G,\tT,\,\tP}$ for each~\tP\ in turn. The real implementation instead builds a data structure that can perform this lookup more efficiently (\ChapRef{propertymap}). -The $\Query{getRequiredProtocols}{}$ query is used when type checking a \index{member reference expression}member reference expression like ``\texttt{foo.bar}'' where the type of ``\texttt{foo}'' is a type parameter, because we resolve ``\texttt{bar}'' by a \index{qualified lookup}qualified name lookup into this list of protocols. Qualified lookup recursively visits inherited protocols, so the list is minimal in the sense that no protocol inherits from any other.
The protocols are also sorted using \AlgRef{linear protocol order}.} +The $\Query{getRequiredProtocols}{}$ query is used when type checking a \index{member reference expression}member reference expression like ``\texttt{foo.bar}'' where the type of ``\texttt{foo}'' is a type parameter, because we resolve ``\texttt{bar}'' by a \index{qualified lookup}qualified name lookup into this list of protocols. This list is minimal in the sense that no protocol inherits from any other, because qualified lookup recursively visits all inherited protocols anyway. The protocols are also sorted using \AlgRef{linear protocol order}.} \begin{example} Let $G$ be the generic signature from \ExRef{motivating derived reqs}: @@ -1091,13 +1091,15 @@ \section{Generic Signature Queries}\label{genericsigqueries} \end{enumerate} \end{example} -\paragraph{Reduced types.} Given a generic signature $G$, we now consider the interface types of~$G$, that is, those types that \emph{contain} type parameters. We will impose an \index{equivalence relation!interface types}equivalence relation on interface types, with two important properties. First, if an interface type contains a type parameter that is fixed to a concrete type, we can replace this type parameter with the concrete type, to get an equivalent interface type. In the previous example, \texttt{\rT.[Foo]A} was fixed to \texttt{Array<\rT.[Foo]B>}, and \texttt{\rT.[Foo]B} was fixed to \texttt{Int}, so the following three interface types are equivalent in that signature: +\paragraph{Reduced types.} We can now generalize reduced type equality of type parameters to get an \index{equivalence relation!interface types}equivalence relation on interface types, that is, those types that \emph{contain} type parameters. This equivalence relation, known as \emph{reduced type equality of interface types}, can be characterized by its two defining properties. 
+ +The first property is that if a generic signature fixes a type parameter to a concrete type, an interface type containing this type parameter is then equivalent to one where this type parameter has been replaced by its concrete type. In the previous example, \texttt{\rT.[Foo]A} was fixed to \texttt{Array<\rT.[Foo]B>}, and \texttt{\rT.[Foo]B} was fixed to \texttt{Int}, so the following three interface types are equivalent in that signature: \begin{gather*} \texttt{\rT.[Foo]A}\\ \texttt{Array<\rT.[Foo]B>}\\ \texttt{Array<Int>} \end{gather*} -The second property is that we can also replace a type parameter with an equivalent type parameter, anywhere that it appears in an interface type, to get an equivalent interface type. Consider this signature, with the same protocol~\texttt{Foo} as before, but instead of fixing \texttt{\rT.[Foo]B} to \texttt{Int}, we say that \texttt{\rT.[Foo]B} is equivalent to \rU: +The second property is that replacing a type parameter with an equivalent type parameter, anywhere that it appears in an interface type, will always produce an equivalent interface type. Consider this signature, with the same protocol~\texttt{Foo} as before, but instead of fixing \texttt{\rT.[Foo]B} to \texttt{Int}, we say that \texttt{\rT.[Foo]B} is equivalent to \rU: \begin{quote} \begin{verbatim} <τ_0_0, τ_0_1 where τ_0_0: Foo, τ_0_1 == τ_0_0.[Foo]B> @@ -1110,23 +1112,21 @@ \section{Generic Signature Queries}\label{genericsigqueries} \texttt{Array<\rU>} \end{gather*}
+Suppose we take an interface type, and apply a pair of transformations until fixed point: first, we replace every type parameter that is fixed to concrete type, and second, we replace every other type parameter with the reduced type parameter of its equivalence class. A key fact is that the resulting interface type is equivalent to the original. -\begin{algorithm}[Reduce interface type]\label{reduced type algorithm} +\begin{algorithm}[Compute reduced type]\label{reduced type algorithm} Takes a generic signature~$G$, and an interface type~\tX, as input. Outputs the reduced type of~\tX. \begin{enumerate} \item If \tX\ is actually a type parameter \tT: \begin{enumerate} -\item If $\Query{isConcreteType}{G,\,\tT}$: recurse with $\Query{getConcreteType}{G,\,\tT}$. +\item If $\Query{isConcreteType}{G,\,\tT}$: recursively reduce $\Query{getConcreteType}{G,\,\tT}$. \item Otherwise, return the reduced type parameter in the equivalence class of~\tT. \end{enumerate} \item Otherwise, \tX\ is a concrete type. If \tX\ does not have any children, return \tX. -\item Otherwise, recurse on each child type of~\tX, and construct a new type from these reduced child types, together with any non-type attributes of~\tX. +\item Otherwise, recursively reduce each child type of~\tX, and construct a new type from these reduced child types together with any non-type attributes of~\tX. \end{enumerate} \end{algorithm} -(We guarantee \index{termination}termination by disallowing self-referential \index{same-type requirement!recursive}same-type requirements, like $\SameReq{\rT}{Array<\rT>}$, otherwise Step~1a gets stuck reducing \rT, \texttt{Array<\rT>}, \texttt{Array>}, and so on. We'll describe how in \SecRef{subst simplification}, but for now we just assume they don't appear.) 
- This algorithm outputs a special kind of interface type: \begin{definition} @@ -1135,29 +1135,31 @@ \section{Generic Signature Queries}\label{genericsigqueries} \end{enumerate} -A \index{fully-concrete type!reduced type}fully-concrete type (one without type parameters) is trivially a reduced type. +In particular, a \index{fully-concrete type!reduced type}fully-concrete type, one without type parameters, is reduced. \end{definition} -\begin{definition} -Let $G$ be a generic signature. Two interface types are equivalent under the \index{reduced type equality!on interface types}\emph{reduced type equality} relation on the interface types of~$G$ if they have \index{canonical type equality}canonically equal reduced types. -\end{definition} -Every equivalence class of interface types contains a unique reduced type, so we can answer questions about this relation with this pair of generic signature queries: +To guarantee \index{termination}termination in \AlgRef{reduced type algorithm}, we must disallow generic signatures with self-referential \index{same-type requirement!recursive}same-type requirements. For example, if $G\vdash\SameReq{\rT}{Array<\rT>}$, Step~1a would get stuck reducing \rT, \texttt{Array<\rT>}, \texttt{Array<Array<\rT>>}, and so on. We will describe how such requirements are disallowed in \SecRef{subst simplification}. Given that this can't happen, it follows that every equivalence class of interface types contains a unique reduced type. Therefore: + +\begin{proposition} +Let $G$ be a generic signature. Two interface types are equivalent under the \index{reduced type equality!on interface types}reduced type equality relation if and only if they have the same reduced type. (More precisely, if their reduced types are \index{canonical type equality}canonically equal.)
+\end{proposition} +Thus, the following pair of generic signature queries decide reduced type equality: \begin{itemize} \QueryDef{isReducedType} {G,\,\tT} -{interface type \tT} +{interface type \tT.} {true or false: is \tT\ canonically equal to its reduced type?} {Decide if an interface type is already a reduced type.} \QueryDef{getReducedType} {G,\,\tT} -{interface type \tT} +{interface type \tT.} {the reduced type of \tT.} -{Compute the reduced type of an interface type using \AlgRef{reduced type algorithm}. This will also output a \index{canonical type!reduced type}canonical type, so it will not contain type sugar.} +{Compute the reduced type using \AlgRef{reduced type algorithm}. This always outputs a \index{canonical type!reduced type}canonical type, so any type sugar in the original type is lost.} \end{itemize} \begin{example} -Reduced type equality of interface types is a \index{coarser relation}\emph{coarser} relation than reduced type equality of type parameters; when two type parameters are equivalent as type parameters, they are also equivalent as interface types, but the converse does not hold. Let~$G$ be the generic signature: +Reduced type equality on interface types is a \index{coarser relation}\emph{coarser} relation than reduced type equality on type parameters; two equivalent type parameters are also equivalent as interface types, but the converse does \emph{not} hold. 
Let~$G$ be this generic signature, and note that $\SameReq{\rT}{\rU}$ \emph{cannot} be derived from $G$: \begin{quote} \begin{verbatim} <τ_0_0, τ_0_1 where τ_0_0 == Int, τ_0_1 == Int> @@ -1184,7 +1186,7 @@ \section{Generic Signature Queries}\label{genericsigqueries} {G, \tT} {type parameter \tT.} {true or false: $G\vdash\TAnyObject$?} -{Decide if \tT\ has a \index{layout requirement!generic signature query}single retainable pointer representation.} +{Decide if \tT\ has a \index{layout requirement!generic signature query}single \index{reference count}reference-counted pointer representation.} \QueryDef{getLayoutConstraint} {G, \tT} @@ -1212,12 +1214,12 @@ \section{Generic Signature Queries}\label{genericsigqueries} \end{Verbatim} \begin{enumerate} -\item $\Query{getSuperclassBound}{G,\,\texttt{\rT}}$ is null. -\item $\Query{getSuperclassBound}{G,\,\texttt{\rU}}$ is \texttt{Shape}. -\item $\Query{getSuperclassBound}{G,\,\texttt{\rV}}$ is \texttt{Shape}. -\item $\Query{requiresClass}{G,\,\texttt{\rT}}$ is true. -\item $\Query{requiresClass}{G,\,\texttt{\rU}}$ is true. -\item $\Query{requiresClass}{G,\,\texttt{\rV}}$ is true. +\item $\Query{getSuperclassBound}{G,\,\rT}$ is null. +\item $\Query{getSuperclassBound}{G,\,\rU}$ is \texttt{Shape}. +\item $\Query{getSuperclassBound}{G,\,\rV}$ is \texttt{Shape}. +\item $\Query{requiresClass}{G,\,\rT}$ is true. +\item $\Query{requiresClass}{G,\,\rU}$ is true. +\item $\Query{requiresClass}{G,\,\rV}$ is true. 
\end{enumerate} We can write down a derivation of $\Query{requiresClass}{G,\,\texttt{\rT}}$: \begin{gather*} @@ -1252,7 +1254,7 @@ \section{Generic Signature Queries}\label{genericsigqueries} \end{gather*}} \end{itemize} -\section{Source Code Reference}\label{genericsigsourceref} +\section{Source Code Reference}\label{src:generic signatures} Key source files: \begin{itemize} @@ -1270,14 +1272,14 @@ \section{Source Code Reference}\label{genericsigsourceref} \end{itemize} \apiref{DeclContext}{class} -See also \SecRef{declarationssourceref}. +See also \SecRef{src:declarations}. \begin{itemize} \item \texttt{getGenericSignatureOfContext()} returns the generic signature of the innermost \IndexSource{declaration context}generic context, or the empty generic signature if there isn't one. \end{itemize} \index{generic context} \apiref{GenericContext}{class} -See also \SecRef{declarationssourceref}. +See also \SecRef{src:declarations}. \begin{itemize} \item \texttt{getGenericSignature()} returns the declaration's generic signature, computing it first if necessary. If the declaration does not have a generic parameter list or trailing \texttt{where} clause, returns the generic signature of the parent context. \end{itemize} @@ -1326,7 +1328,7 @@ \section{Source Code Reference}\label{genericsigsourceref} \item \texttt{print()} prints the generic signature, with various options to control the output. \item \texttt{dump()} prints the generic signature, meant for use from the debugger or ad-hoc print debug statements. \end{itemize} -Also see \SecRef{buildinggensigsourceref}. +Also see \SecRef{src:building generic signatures}. \IndexSource{generic signature query} \apiref{GenericSignatureImpl}{class} @@ -1348,7 +1350,7 @@ \section{Source Code Reference}\label{genericsigsourceref} \item \texttt{isEqual()} checks if two generic signatures are canonically equal. 
\item \texttt{getSugaredType()} given a type containing canonical type parameters that is understood to be written with respect to this generic signature, replaces the generic parameter types with their ``sugared'' forms, so that the name is preserved when the type is printed out to a string. \item \texttt{forEachParam()} invokes a callback on each generic parameter of the signature; the callback also receives a boolean indicating if the generic parameter type is reduced or not---a generic parameter on the left hand side of a same-type requirement is not reduced. -\item \texttt{areAllParamsConcrete()} answers if all generic parameters are fixed to concrete types via same-type requirements, which makes the generic signature somewhat like an empty generic signature. Fully-concrete generic signatures are lowered away at the SIL level. +\item \texttt{areAllParamsConcrete()} checks if all generic parameters are fixed to concrete types via same-type requirements, which makes the generic signature somewhat like an empty generic signature. Fully-concrete generic signatures are lowered away at the SIL level. \end{itemize} The generic signature queries from \SecRef{genericsigqueries} are methods on \texttt{GenericSignatureImpl}: \begin{itemize} @@ -1393,18 +1395,20 @@ \section{Source Code Reference}\label{genericsigsourceref} \IndexSource{layout requirement} \IndexSource{same-type requirement} \apiref{Requirement}{class} -A generic requirement. See also \SecRef{type resolution source ref} and \SecRef{buildinggensigsourceref}. +A generic requirement. Consists of a kind, a left-hand side which is always a \texttt{Type}, and a right-hand side, which depends on the kind. See also \SecRef{src:type resolution} and \SecRef{src:building generic signatures}. \begin{itemize} +\item \texttt{Requirement(RequirementKind, Type, Type)} constructs any requirement kind except for \texttt{RequirementKind::Layout}. 
+\item \texttt{Requirement(RequirementKind, Type, LayoutConstraint)} constructs a\\ \texttt{RequirementKind::Layout} requirement. \item \texttt{getKind()} returns the \texttt{RequirementKind}. -\item \texttt{getSubjectType()} returns the subject type. -\item \texttt{getConstraintType()} returns the constraint type if the requirement kind is not \texttt{RequirementKind::Layout}, otherwise asserts. -\item \texttt{getProtocolDecl()} returns the protocol declaration of the constraint type if this is a conformance requirement with a protocol type as the constraint type. -\item \texttt{getLayoutConstraint()} returns the layout constraint if the requirement kind is \texttt{RequirementKind::Layout}, otherwise asserts. +\item \texttt{getFirstType()} returns the left-hand side type. +\item \texttt{getSecondType()} returns the right-hand side type if this requirement has any kind except for \texttt{RequirementKind::Layout}; otherwise asserts. +\item \texttt{getProtocolDecl()} returns the right-hand side protocol declaration if this is a \texttt{RequirementKind::Conformance}; otherwise asserts. +\item \texttt{getLayoutConstraint()} returns the right-hand side layout constraint if this is a \texttt{RequirementKind::Layout}; otherwise asserts. \end{itemize} \IndexSource{requirement kind} \apiref{RequirementKind}{enum class} -An enum encoding the four kinds of requirements. +Return type of \texttt{Requirement::getKind()}. \begin{itemize} \item \texttt{RequirementKind::Conformance} \item \texttt{RequirementKind::Superclass} @@ -1415,10 +1419,10 @@ \section{Source Code Reference}\label{genericsigsourceref} \IndexSource{protocol declaration} \IndexSource{class-constrained protocol} \apiref{ProtocolDecl}{class} -See also \SecRef{declarationssourceref} and \SecRef{buildinggensigsourceref}. +See also \SecRef{src:declarations} and \SecRef{src:building generic signatures}.
\begin{itemize} \item \texttt{getRequirementSignature()} returns the protocol's requirement signature, first computing it, if necessary. -\item \texttt{requiresClass()} answers if the protocol is a class-constrained protocol. +\item \texttt{requiresClass()} checks if the protocol is a class-constrained protocol. \end{itemize} \IndexSource{requirement signature} @@ -1428,10 +1432,10 @@ \section{Source Code Reference}\label{genericsigsourceref} \item \texttt{getRequirements()} returns an array of \texttt{Requirement}. \item \texttt{getTypeAliases()} returns an array of \texttt{ProtocolTypeAlias}. \end{itemize} -Also see \SecRef{buildinggensigsourceref}. +Also see \SecRef{src:building generic signatures}. \IndexSource{protocol type alias} -\index{underlying type} +\IndexSource{underlying type!of type alias declaration} \apiref{ProtocolTypeAlias}{class} A protocol type alias descriptor. \begin{itemize} @@ -1442,10 +1446,10 @@ \section{Source Code Reference}\label{genericsigsourceref} \IndexSource{type parameter} \IndexSource{interface type} \apiref{TypeBase}{class} -See also \SecRef{typesourceref}. +See also \SecRef{src:types}. \begin{itemize} -\item \texttt{isTypeParameter()} answers if this type is a type parameter; that is, a generic parameter type, or a \texttt{DependentMemberType} whose base is another type parameter. -\item \texttt{hasTypeParameter()} answers if this type is itself a type parameter, or if it contains a type parameter in structural position. For example, \texttt{Array<\ttgp{0}{0}>} will answer \texttt{false} to \texttt{isTypeParameter()}, but \texttt{true} to \texttt{hasTypeParameter()}. +\item \texttt{isTypeParameter()} checks if this type is a type parameter; that is, a generic parameter type, or a \texttt{DependentMemberType} whose base is another type parameter. +\item \texttt{hasTypeParameter()} checks if this type is itself a type parameter, or if it contains a type parameter in structural position. 
For example, \texttt{Array<\ttgp{0}{0}>} will answer \texttt{false} to \texttt{isTypeParameter()}, but \texttt{true} to \texttt{hasTypeParameter()}. \end{itemize} \IndexSource{dependent member type} @@ -1461,7 +1465,7 @@ \section{Source Code Reference}\label{genericsigsourceref} \IndexSource{type declaration} \IndexSource{protocol order} \apiref{TypeDecl}{class} -See also \SecRef{declarationssourceref}. +See also \SecRef{src:declarations}. \begin{itemize} \item \texttt{compare()} compares two protocols by the protocol order (\DefRef{linear protocol order}), returning one of the following: \begin{itemize} diff --git a/docs/Generics/chapters/introduction.tex b/docs/Generics/chapters/introduction.tex index ebe12ce737768..8d8e20e3cfca2 100644 --- a/docs/Generics/chapters/introduction.tex +++ b/docs/Generics/chapters/introduction.tex @@ -2,22 +2,22 @@ \begin{document} -\chapter{Introduction}\label{roadmap} +\chapter{Introduction}\label{chap:introduction} \lettrine{S}{wift's generics implementation} is best understood by first considering various design constraints faced by the compiler: \begin{enumerate} \item Generic functions should be independently type checked, without knowledge of all possible generic arguments that they are invoked with. -\item Shared libraries that export generic types and functions should be able to evolve resiliently without requiring recompilation of clients. -\item Layouts of generic types should be determined by their concrete substitutions, with fields of generic parameter type stored inline. -\item Abstraction over concrete types with generic parameters should only impose a cost across module boundaries, or in other situations where type information is not available at compile time. +\item Shared libraries that export generic types and functions should be able to evolve \index{resilience}resiliently without requiring recompilation of clients. 
+\item Generic struct or enum types should store their fields inline without boxing, so their layout needs to be defined abstractly in terms of their generic argument types. +\item This flexibility should only impose runtime overhead when absolutely necessary, such as when calling across module boundaries, or when complete type information is not available at compile time. \end{enumerate} \noindent The high-level design can be summarized as follows: \begin{enumerate} -\item The interface between a generic function and its caller is mediated by \textbf{generic requirements}. The generic requirements describe the behavior of the generic parameter types inside the function body, and the generic arguments at the call site are checked against the function's generic requirements at compile time. -\item Generic functions receive \textbf{runtime type metadata} for each generic argument from the caller. Type metadata defines operations to abstractly manipulate values of their type without knowledge of their concrete layout. -\item Runtime type metadata is constructed for each type in the language. The \textbf{runtime type layout} of a generic type is computed recursively from the type metadata of the generic arguments. Generic types always store their contents directly, without indirection through a heap-allocated \index{boxing}box. -\item The optimizer can generate a \textbf{specialization} of a generic function in the case where the definition is visible at the call site. This eliminates the overhead of runtime type metadata and abstract value manipulation. +\item The interface between a generic declaration and its caller is given by a list of \textbf{generic parameter types} and \textbf{requirements}. Inside the generic declaration itself, the requirements impose behaviors on its generic parameters. In turn, the caller supplies a list of generic arguments, which must satisfy the requirements. 
+\item The calling convention of a generic function requires the caller to pass \textbf{runtime type metadata} for each generic argument type. A type metadata record describes how to abstractly manipulate values of the represented type without compile-time knowledge of its concrete layout. +\item The runtime type metadata for a generic struct or enum encodes the layout of the type. This layout information is computed recursively from the type metadata of its generic arguments. Runtime type metadata is constructed lazily when first requested, and then cached. +\item If a generic function's definition is visible from its call site, the optimizer can generate a \textbf{specialization} of the generic function for the given generic argument types. When this is possible, specialization eliminates the overhead of abstract value manipulation and runtime type metadata. \end{enumerate} We're going to think of the compiler as \textsl{a library for modeling the concepts of the target language}. The Swift generics implementation defines four fundamental kinds of semantic objects: \emph{generic signatures}, \emph{substitution maps}, \emph{requirement signatures}, and \emph{conformances}. As we will see, they are understood as much by their inherent structure, as their relationships with each other. Subsequent chapters will dive into all the details, but first, we're going to present a series of worked examples. @@ -33,7 +33,7 @@ \section{Functions} \begin{Verbatim} func identity<T>(_ x: T) -> T { return x } \end{Verbatim} -While this function declaration is trivial, it illustrates some important concepts and allows us to introduce terminology. +While this function declaration is rather simple, it illustrates some important concepts and allows us to introduce terminology.
We'll see a full description of the compilation pipeline in the next chapter, but for now, let's consider a simplified view where we begin with parsing, then type checking, and finally code generation. \begin{figure}\captionabove{The abstract syntax tree for \texttt{identity(\char`_:)}}\label{identity ast} \begin{center} @@ -69,71 +69,70 @@ \section{Functions} \index{parser} \paragraph{Parsing.} \FigRef{identity ast} shows the abstract syntax tree produced by the parser before type checking. The key elements: \begin{enumerate} -\item The \emph{generic parameter list} \texttt{<T>} introduces a single \emph{generic parameter declaration} named \tT. As its name suggests, this declares the generic parameter type \tT, scoped to the entire source range of this function. -\item The \emph{type representation} \tT\ appears twice, first in the declaration of the parameter ``\verb|_ x: T|'' and then again, as the return type. A type representation is the purely syntactic form of a type. The parser does not perform name lookup, so the type representation stores the identifier \tT\ and does not refer to the generic parameter declaration of \tT\ in any way. -\item The function body contains an expression referencing \texttt{x}. Again, the parser does not perform name lookup, so this is just the identifier \texttt{x} and is not associated with the parameter declaration ``\verb|_ x: T|''. +\item The \emph{generic parameter list} \texttt{<T>} introduces a single \emph{generic parameter declaration} named \tT. This declaration declares the generic parameter type \tT, which is scoped to the entire source range of this function. +\item The parameter declaration ``\verb|_ x: T|'' and the return type of \texttt{identity} both contain the \emph{type representation} \tT. A type representation is a syntactic form that denotes a reference to an existing type.
The parser does not perform name lookup, so the type representation \tT\ is just an identifier; it is not immediately associated with the generic parameter declaration~\tT. +\item The function body consists of a \texttt{return} statement with the expression \texttt{x}. Again, the parser does not perform name lookup, so this expression is just the identifier \texttt{x}, not associated with the parameter declaration ``\verb|_ x: T|''. \end{enumerate} -\index{generic parameter type} -\index{generic signature} -\index{type resolution} -\index{type} -\index{interface type} -\index{generic function type} -\paragraph{Type checking.} The type checker constructs the \emph{interface type} of the function declaration from the following: +\paragraph{Type checking.} The type checker translates these syntactic forms into higher-level semantic objects: \begin{enumerate} -\item A \emph{generic signature}, to introduce the generic parameter type \tT. In our case, the generic signature has the printed representation \texttt{<T>}. This is the simplest generic signature, apart from the empty generic signature of a non-generic declaration. We'll see more interesting generic signatures soon. +\item The \index{generic parameter type}generic parameter types declared by the generic parameter list are collected in the function's \index{generic signature}\emph{generic signature}. In our case, the generic signature has the printed representation \texttt{<T>}. This is the simplest generic signature, apart from the empty generic signature of a non-generic declaration. We'll see more interesting generic signatures soon. -\item The \emph{interface type} of the parameter ``\verb|_ x: T|'', which is declared to be the type representation \tT\ in source. The compiler component responsible for resolving this to a semantic type is called \emph{type resolution}. In this case, we perform name lookup inside the lexical scope defined by the function, which finds the generic parameter type~\tT.
+\item The \index{type resolution}\emph{type resolution} procedure resolves the parameter type representation~\tT\ to the generic parameter type~\tT, by performing a name lookup. -\item The function's return type, which also resolves to the generic parameter type~\tT. +\item The function's return type representation also resolves to the generic parameter type~\tT. \end{enumerate} -All of this is packaged up into a \emph{generic function type} which completely describes the type checking behavior of a reference to this function: +The generic signature, together with the resolved parameter and return type, is packaged up into a \index{generic function type}\emph{generic function type}, which becomes the \emph{interface type} of the function declaration. We denote this generic function type as follows: \begin{quote} \begin{verbatim} <T> (T) -> T \end{verbatim} \end{quote} -The name ``\tT'' is of no semantic consequence beyond name lookup. We will learn that the \emph{canonical type} for the above erases the generic parameter type \tT\ to the notation \ttgp{0}{0}. More generally, each generic parameter type is uniquely identified within its lexical scope by its \emph{depth} and \emph{index}. +The name ``\tT'' is of no semantic consequence beyond name lookup. We will learn that every generic parameter type is uniquely identified within its lexical scope by its \emph{depth} and \emph{index}. The \emph{canonical type} of the above generic function type is denoted as follows, where we replace \tT\ with the \emph{canonical generic parameter type} \ttgp{0}{0}: +\begin{quote} +\begin{verbatim} +<τ_0_0> (τ_0_0) -> τ_0_0 +\end{verbatim} +\end{quote} -Having computed the interface type of the function, the type checker moves on to the function's body. The type of the return statement's expression must match the return type of the function declaration. When we type check the return expression \texttt{x}, we use name lookup to find the parameter declaration ``\verb|_ x: T|''.
The interface type of this parameter declaration is the generic parameter type \tT, which is understood relative to the function's generic signature. The type assigned to the expression, however, is the \index{archetype type}\emph{archetype} corresponding to \tT, denoted $\archetype{T}$. We will learn later that an archetype is a self-describing form of a type parameter which behaves like a concrete type. +The \index{interface type}interface type completely describes the type checking behavior of a reference to a declaration. Having computed the interface type of the function, the type checker moves on to the function's body. The type of the return statement's expression must match the return type of the function declaration. When we type check the return expression \texttt{x}, we use name lookup to find the parameter declaration ``\verb|_ x: T|''. The interface type of this parameter declaration is the generic parameter type \tT, which is understood relative to the function's generic signature. The type assigned to the expression, however, is the \index{archetype type}\emph{archetype} corresponding to \tT, denoted $\archetype{T}$. We will learn later that an archetype is a self-describing form of a type parameter which behaves like a concrete type. \paragraph{Code generation.} -We've now successfully type checked our function declaration. How might we generate code for it? Recall the two concrete implementations that we folded into our single generic function: +We've now successfully type checked our function declaration. The next step is to actually lower the function to executable code.
Recall the two concrete implementations that we folded together to get our generic function: \begin{Verbatim} func identity(_ x: Int) -> Int { return x } func identity(_ x: String) -> String { return x } \end{Verbatim} The \index{calling convention}calling conventions of these functions differ significantly: \begin{enumerate} -\item The first function receives and returns the \texttt{Int} value in a machine register. The \texttt{Int} type is \emph{trivial},\footnote{Or POD, for you C++ folks.} meaning it can be copied and moved at will. -\item The second function is trickier. A \texttt{String} is stored as a 16-byte value in memory, and contains a pointer to a reference-counted buffer. When manipulating values of a non-trivial type like \texttt{String}, memory ownership comes into play. +\item The first function receives and returns the \texttt{Int} value in a machine register. The \texttt{Int} type is \index{trivial type}\emph{trivial}, meaning that its values can be copied and moved without doing anything special. (\index{C++}C++ also calls this a ``POD'' type.) +\item The second function is trickier. A \texttt{String} is stored as a 16-byte value in memory, and contains a pointer to a \index{reference count}reference-counted buffer. When manipulating values of a non-trivial type like \texttt{String}, memory \index{ownership}ownership comes into play. \end{enumerate} -The standard ownership semantics for a Swift function call are defined such that the caller retains ownership over the parameter values passed into the callee, while the callee transfers ownership of the return value to the caller. Thus the implementation of \verb|identity(_:)| must create a logical copy of \texttt{x} and then move this copy back to the caller, and do this in a manner that abstracts over all possible concrete types. 
+The default ownership rules for a Swift function are that the caller retains ownership over the parameter values passed into the callee, while the callee transfers ownership of the return value to the caller. Thus the generic implementation of \verb|identity(_:)| must create a logical copy of \texttt{x} and then move this copy back to the caller, and do this in a manner that abstracts over all possible concrete types. The calling convention for a generic function passes \index{runtime type metadata}\emph{runtime type metadata} for each generic parameter in the function's generic signature. Runtime type metadata describes the size and alignment of a concrete type, and provides implementations of the \emph{move}, \emph{copy} and \emph{destroy} operations. -We move and copy values of trivial type by copying bytes; destroy does nothing in this case. With a reference type, the value is a reference-counted pointer; copy and destroy operations update the reference count, while a move leaves the reference count unchanged. The value operations for structs and enums are defined recursively from their members. Finally, weak references and existential types also have non-trivial value operations. +The move and copy operations for a trivial type copy bytes, while destroy does nothing. With a reference type, the value is a \index{reference count}reference-counted pointer, so copy and destroy operations update the \index{reference counting}reference count while a move leaves the reference count unchanged. The value operations for structs and enums are defined recursively from their members. Finally, \index{weak reference type}weak references and existential types also have special value operations. For copyable types, a move is semantically equivalent to a copy followed by a destroy, only more efficient. Traditionally, all types in the language were copyable.
\IndexSwift{5.9}Swift 5.9 introduced \index{noncopyable type}\emph{noncopyable types} \cite{se0390}, and \IndexSwift{6.0}Swift 6 extended generics to work with noncopyable types \cite{se0427}. We will not discuss noncopyable types in this book. \begin{MoreDetails} -\item Types: \ChapRef{types} +\item Types: \ChapRef{chap:types} \item Generic parameter lists: \SecRef{generic params} \item Function declarations: \SecRef{function decls} -\item Archetypes: \ChapRef{genericenv} -\item Type resolution: \ChapRef{typeresolution} +\item Archetypes: \ChapRef{chap:archetypes} +\item Type resolution: \ChapRef{chap:type resolution} \end{MoreDetails} \index{call expression} \index{expression} -\paragraph{Substitution maps.} Let us now turn our attention to the callers of generic functions. A \emph{call expression} brings together a \emph{callee} and a list of argument expressions. A callee is just an expression of function type. This function type's parameters must match the argument expressions, and its return type is then the type of the call expression. Some possible callees include an expression that names an existing function declaration, type expressions (which is sugar for invoking a constructor member of the type), function parameters and local variables of function type, and even other calls whose result has function type. In our example, we might call the \verb|identity(_:)| function as follows: +\paragraph{Substitution maps.} Let us now turn our attention to the callers of generic functions. A \emph{call expression} brings together a \emph{callee} and a list of argument expressions. A callee is some expression with a function type; this can be a reference to a named function declaration, a reference to a type (which is sugar for invoking a constructor), a reference to a parameter or local variable of function type, or most generally, the result of another call expression. 
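To make the different kinds of callee concrete, here is a short illustrative snippet (the local variable \texttt{f} is our own example, not drawn from the surrounding text):
\begin{Verbatim}
identity(3)                     // callee names a function declaration
String(3)                       // callee is a type expression
let f: (Int) -> Int = identity  // a local variable of function type...
f(3)                            // ...can itself be a callee
\end{Verbatim}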
In our example, we can call the \verb|identity(_:)| function by name: \begin{Verbatim} -identity(3) -identity("Hello, Swift") +let x = identity(3) +let y = identity("Hello, Swift") \end{Verbatim} -In Swift, calls to generic functions do not specify their generic arguments explicitly; the type checker infers them from the types of call argument expressions. Generic arguments are encoded in a \index{substitution map}\emph{substitution map}, which assigns a \emph{replacement type} to each generic parameter type of the callee's generic signature. +In Swift, a call to a generic function does not specify the generic argument types in the syntax; instead, the type checker infers generic arguments by matching the function type of the callee against the types of the call argument expressions, as well as the expression's expected result type, if there is one. The inferred generic arguments are collected in a \index{substitution map}\emph{substitution map}, which is a data structure that assigns a \emph{replacement type} to each generic parameter type of the callee's generic signature. The generic signature of \verb|identity(_:)| has a single generic parameter type. Thus each of its substitution maps holds a single concrete type. We now introduce some notation. Here are two possible substitution maps, corresponding to the two calls shown above: \[ @@ -147,14 +146,14 @@ \section{Functions} \tT \otimes \Sigma_2 = \texttt{String} \] -Substitution maps play a role in code generation. When calling a generic function, the compiler must realize the runtime type metadata for each replacement type in the substitution map of the call. In our example, the types \texttt{Int} and \texttt{String} are \emph{nominal types} defined in the standard library. They are non-generic and have a fixed layout, so their runtime type metadata is accessed by calling a function, exported by the standard library, that returns the address of a constant symbol. 
+Substitution maps play a role in code generation. When lowering a call to a generic function, the compiler generates code to construct the runtime type metadata for each replacement type in the call expression's \index{substitution map!runtime type metadata}substitution map. In our example, the types \texttt{Int} and \texttt{String} are \emph{nominal types} defined in the standard library. They are non-generic and have a fixed layout, so their runtime type metadata is accessed by calling a function, exported by the standard library, that returns the address of a constant symbol. \begin{MoreDetails} -\item Substitution maps: \ChapRef{substmaps} +\item Substitution maps: \ChapRef{chap:substitution maps} \end{MoreDetails} \index{inlinable function} -\paragraph{Specialization.} Reification of runtime type metadata and the subsequent indirect manipulation of values incur a performance penalty. As an alternative, if the definition of a generic function is visible at the call site, the optimizer can generate a \emph{specialization} of the generic function by cloning the definition and applying the substitution map to all types appearing in the function's body. Definitions of generic functions are always visible to the specializer within their defining module. Shared library developers can also opt-in to exporting the body of a function across module boundaries with the \texttt{@inlinable} attribute. +\paragraph{Specialization.} Reification of \index{runtime type metadata}runtime type metadata and the subsequent indirect manipulation of values incur a performance penalty. As an alternative, if the definition of a generic function is visible at the call site, the optimizer can generate a \IndexDefinition{specialization}\index{monomorphization|see{specialization}}\emph{specialization} of the generic function by cloning the definition and applying the substitution map to all types appearing in the function's body.
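For instance, cloning \verb|identity(_:)| and applying the substitution map $\Sigma_1$ from earlier to the types in its body recovers a function equivalent to the concrete \texttt{Int} implementation we started with (the name below is illustrative; the actual symbol name is produced by the mangler):
\begin{Verbatim}
// A sketch of the specialization of identity(_:) with {T := Int}.
func identity_Int(_ x: Int) -> Int { return x }
\end{Verbatim}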
Definitions of generic functions are always visible to the specializer within their defining module. Shared library developers can also opt in to exporting the body of a function across module boundaries with the \texttt{@inlinable} attribute. \begin{MoreDetails} \item \texttt{@inlinable} attribute: \SecRef{module system} @@ -180,16 +179,15 @@ \section{Nominal Types} The in-memory layout of a struct value is determined by the interface types of its stored properties. Our \texttt{Pair} struct declares two stored properties, \texttt{first} and \texttt{second}, both with interface type \tT. Thus the layout of a \texttt{Pair} depends on the layout of the generic parameter type~\tT. -The generic nominal type \texttt{Pair<T>}, formed by taking the generic parameter type as the argument, is called the \emph{declared interface type} of \texttt{Pair}. The type \texttt{Pair<Int>} is a \emph{specialized type} of the declared interface type, \texttt{Pair<T>}. We can obtain \texttt{Pair<Int>} from \texttt{Pair<T>} by applying a substitution map: +The generic nominal type \texttt{Pair<T>}, formed by taking the generic parameter type \tT\ as the argument, is called the \emph{declared interface type} of \texttt{Pair}. The type \texttt{Pair<Int>} is a \emph{specialized type} of the declared interface type, \texttt{Pair<T>}. We can obtain \texttt{Pair<Int>} from \texttt{Pair<T>} by applying a substitution map: \[\texttt{Pair<T>} \otimes \SubstMap{\SubstType{T}{Int}} = \texttt{Pair<Int>}\] - -This ``factorization'' is our first example of what is in fact an algebraic identity. The \index{context substitution map}\emph{context substitution map} of a generic nominal type is the substitution map formed from its generic arguments. This has the property that applying the context substitution map to the declared interface type recovers the original generic nominal type.
Suppose we declare a local variable of type \texttt{Pair<Int>}: +The above substitution map is known as the \index{context substitution map}\emph{context substitution map} of \texttt{Pair<Int>}. Every specialized type has a context substitution map, and if we apply the context substitution map to its declared interface type, we will get back the specialized type. Now, suppose we declare a local variable of type \texttt{Pair<Int>}: \begin{Verbatim} let twoIntegers: Pair<Int> = ... \end{Verbatim} -The compiler must allocate storage on the stack for this value. We take the context substitution map, and apply it to the interface type of each stored property: +The compiler must allocate storage on the stack for this value. We take the context substitution map, and apply it to the interface type of each stored property. Since \texttt{Pair} has two stored properties of type \tT, we get: \[\tT \otimes \SubstMap{\SubstType{T}{Int}} = \texttt{Int}\] -We see that a value of type \texttt{Pair<Int>} stores two consecutive values of type \texttt{Int}, which gives \texttt{Pair<Int>} a size of 16 bytes and alignment of 8 bytes. Since \texttt{Pair<Int>} is trivial, the stack allocation does not require any special cleanup once we exit its scope. +So a \texttt{Pair<Int>} consists of two consecutive \texttt{Int}s, which gives \texttt{Pair<Int>} a total size of 16 bytes and alignment of 8 bytes. Since \texttt{Pair<Int>} is trivial, the stack allocation does not require any special cleanup once we exit its scope. Now, we complete our local variable declaration by writing down an \index{expression}\index{initial value expression}initial value expression which calls the constructor: \begin{Verbatim} @@ -205,19 +203,19 @@ The metadata access function for a generic type takes the metadata for each generic argument, and calculates the offset of each stored property, also obtaining the size and alignment of the entire value.
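As a quick sanity check of the layout described above, we can query the standard library's \texttt{MemoryLayout} type at runtime (an illustrative experiment on our part, not a step in the compilation model):
\begin{Verbatim}
struct Pair<T> {
  var first: T
  var second: T
}

// On a 64-bit platform:
print(MemoryLayout<Pair<Int>>.size)       // 16
print(MemoryLayout<Pair<Int>>.alignment)  // 8
\end{Verbatim}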
The move, copy and destroy operations of the aggregate type delegate to the corresponding operations in the generic argument metadata. The constructor of \texttt{Pair} then uses the runtime type metadata for both \texttt{Pair<T>} and \tT\ to correctly initialize the aggregate value from its two constituent parts. \index{structural type} -Structural types, such as function types, tuple types and metatypes, are similar to generic nominal types in that we call a metadata access function to obtain runtime type metadata for them, but this time, the metadata access function is part of the Swift runtime. For example, to construct metadata for the tuple type \texttt{(Int, Pair<Int>)}, we first call the metadata access function for \texttt{Pair} to get \texttt{Pair<Int>}, then call an entry point in the runtime to obtain \texttt{(Int, Pair<Int>)}. +Structural types, such as \index{function type}function types, \index{tuple type}tuple types, and \index{metatype type}metatypes, are similar to generic nominal types in that we call a metadata access function to obtain runtime type metadata for them, but this time, the metadata access function is part of the Swift runtime. For example, to construct metadata for the tuple type \texttt{(Int, Pair<Int>)}, we first call the metadata access function for \texttt{Pair} to get \texttt{Pair<Int>}, then call an entry point in the runtime to obtain \texttt{(Int, Pair<Int>)}. \begin{MoreDetails} -\item Declarations: \ChapRef{decls} +\item Declarations: \ChapRef{chap:decls} \item Context substitution map: \SecRef{contextsubstmap} -\item Structural types: \SecRef{more types} +\item Structural types: \SecRef{sec:more types} \end{MoreDetails} \section{Protocols}\label{intro protocols} -Our \verb|identity(_:)| and \texttt{Pair} declarations both abstract over arbitrary concrete types, but in turn, this limits their generic parameter \tT\ to the common capabilities shared by all types---the move, copy and destroy operations.
By stating \emph{generic requirements}, a generic declaration can impose various restrictions on the concrete types used as generic arguments, which in turn endows its generic parameter types with new capabilities provided by those concrete types. +Our \verb|identity(_:)| and \texttt{Pair} declarations abstract over arbitrary concrete types, but this in turn limits their generic parameter \tT\ to the common capabilities shared by all types---the move, copy and destroy operations. By stating \emph{generic requirements}, a generic declaration can impose various restrictions on the concrete types used as generic arguments, which in turn endows its generic parameter types with new capabilities provided by those concrete types. -A \emph{protocol} abstracts over the capabilities of a concrete type. By stating a \index{conformance requirement}\emph{conformance requirement} between a generic parameter type and protocol, a generic declaration can require that its generic argument is a concrete type that \emph{conforms} to this protocol: +A \emph{protocol} specifies some additional capabilities that a concrete type may have. 
A generic declaration can state a \index{conformance requirement}\emph{conformance requirement} on a generic parameter type, which the caller must satisfy with a concrete type that \emph{conforms} to this protocol: \begin{Verbatim} protocol Shape { func draw() @@ -242,7 +240,7 @@ \section{Protocols}\label{intro protocols} (Array<S>) -> () \end{verbatim} \end{quote} -We can also write the \verb|drawShapes(_:)| function to state the conformance requirement with a trailing \texttt{where} clause, or we can avoid naming the generic parameter \texttt{S} by using an \emph{opaque parameter type} instead: +We can change the \verb|drawShapes(_:)| function to state the conformance requirement with a trailing \texttt{where} clause, or we can avoid naming the generic parameter \texttt{S} by using an \emph{opaque parameter type} instead: \begin{Verbatim} func drawShapes<S>(_ shapes: Array<S>) where S: Shape func drawShapes(_ shapes: Array<some Shape>) \end{Verbatim} All three forms of \verb|drawShapes(_:)| are ultimately equivalent, because they define the same generic signature, up to the choice of generic parameter name. In general, when there is more than one way to spell the same underlying language construct due to syntax sugar, the semantic objects ``desugar'' these differences into the same uniform representation. \begin{MoreDetails} \item Protocols: \SecRef{protocols} -\item Requirements: \SecRef{requirements} -\item Generic signatures: \ChapRef{genericsig} +\item Requirements: \SecRef{sec:requirements} +\item Generic signatures: \ChapRef{chap:generic signatures} \end{MoreDetails} \paragraph{Qualified lookup.} -Once we have a generic signature, we can type check the body of \verb|drawShapes(_:)|.
The \texttt{for}~loop introduces a local variable ``\texttt{shape}'' of type $\archetype{S}$ (we re-iterate that the generic parameter type \texttt{S} is represented as the archetype $\archetype{S}$ inside a function body, but the distinction doesn't matter right now). This variable is referenced inside the \texttt{for} loop by the \index{member expression}\emph{member expression} ``\texttt{shape.draw}'': +Once we have a generic signature, we can type check the body of \verb|drawShapes(_:)|. The \texttt{for}~loop introduces a local variable ``\texttt{shape}'' of type $\archetype{S}$ (we re-iterate that the generic parameter type \texttt{S} is represented as the archetype $\archetype{S}$ inside a function body, but the distinction doesn't matter right now). This variable is referenced inside the \texttt{for} loop by the \index{member reference expression}\emph{member reference expression} ``\texttt{shape.draw}'': \begin{Verbatim}[firstnumber=6] for shape in shapes { shape.draw() } \end{Verbatim} -Our generic signature has the conformance requirement $\ConfReq{S}{Shape}$, so the caller must provide a replacement type for \texttt{S} conforming to \texttt{Shape}. We'll return to the caller's side of the equation shortly, but inside the callee, the requirement also tells us that the archetype $\archetype{S}$ conforms to \texttt{Shape}. To resolve the member expression, the type checker performs a \index{qualified lookup}\emph{qualified lookup} of the identifier \texttt{draw} with a base type of $\archetype{S}$. A qualified lookup into an archetype checks each protocol the archetype conforms to, so we find and return the \texttt{draw()} method of the \texttt{Shape} protocol. +Our generic signature has the conformance requirement $\ConfReq{S}{Shape}$, so the caller must provide a replacement type for \texttt{S} conforming to \texttt{Shape}. 
We'll return to the caller's side of the equation shortly, but inside the callee, the requirement also tells us that the archetype $\archetype{S}$ conforms to \texttt{Shape}. To resolve the member reference ``\texttt{shape.draw}'', the type checker performs a \index{qualified lookup}\emph{qualified lookup} of the identifier \texttt{draw} with a base type of $\archetype{S}$, the type of ``\texttt{shape}''. A qualified lookup into an archetype visits each protocol the archetype conforms to, so we find and return the \texttt{draw()} method of the \texttt{Shape} protocol. -How does the compiler generate code for the call \verb|shape.draw()|? Together with the runtime type metadata for \texttt{S}, the \index{calling convention}calling convention for \verb|drawShapes(_:)| passes an additional argument, corresponding to the conformance requirement $\ConfReq{S}{Shape}$. This argument is the \index{witness table}\emph{witness table} for the conformance. The layout of a witness table is determined by the protocol's members; a witness table for a conformance to \texttt{Shape} has a single entry, the implementation of the \texttt{draw()} method. To call \texttt{shape.draw()}, we load the function pointer from the witness table, and invoke it with the value of \texttt{shape}. +How do we lower the call expression ``\texttt{shape.draw()}'' to executable code? In addition to the runtime type metadata for \texttt{S}, the \index{calling convention}calling convention for \verb|drawShapes(_:)| has another parameter, corresponding to the conformance requirement $\ConfReq{S}{Shape}$. This parameter is used to pass the \index{witness table}\emph{witness table} for the conformance. The layout of a witness table is determined by the protocol; a witness table for a conformance to \texttt{Shape} has a single entry, the implementation of the \texttt{draw()} method.
So to call ``\texttt{shape.draw()}'', we load this function pointer from the witness table and call it, passing in the value of \texttt{shape}. \begin{MoreDetails} \item Name lookup: \SecRef{name lookup} \end{MoreDetails} -\paragraph{Conformances.} This \texttt{Circle} type states a \index{conformance}\emph{conformance} to the \texttt{Shape} protocol: +\paragraph{Conformances.} This \texttt{Circle} struct states a \index{conformance}\emph{conformance} to the \texttt{Shape} protocol: \begin{Verbatim} struct Circle: Shape { let radius: Double func draw() {...} } \end{Verbatim} -The \index{conformance checker}\emph{conformance checker} ensures that the conforming type \texttt{Circle} declares a \emph{witness} for the \texttt{draw()} method of \texttt{Shape}, and records this fact in a \index{normal conformance}\emph{normal conformance}. We denote this normal conformance by $\ConfReq{Circle}{Shape}$. When generating code for the declaration of \texttt{Circle}, we also emit the witness table for the normal conformance $\ConfReq{Circle}{Shape}$. This witness table contains a pointer to the implementation of \texttt{Circle.draw()}. +The \index{conformance checker}\emph{conformance checker} ensures that the declaration of \texttt{Circle} contains a \emph{witness} for the \texttt{draw()} method of \texttt{Shape}, and records this fact in a \index{normal conformance}\emph{normal conformance}. We denote this normal conformance by $\ConfReq{Circle}{Shape}$. When we visit \texttt{Circle} in code generation, we emit its runtime type metadata, together with the witness table for the normal conformance $\ConfReq{Circle}{Shape}$. This witness table contains a pointer to the implementation of \texttt{Circle.draw()}.
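Putting the two preceding paragraphs together, the dispatch inside \verb|drawShapes(_:)| can be sketched in pseudocode. All names here are illustrative inventions; the actual lowering is expressed in SIL and the real runtime interfaces differ:

```text
// Pseudocode sketch only -- not actual SIL or runtime API.
// The caller passes type metadata for S and the witness table
// for the conformance S: Shape, alongside the array argument.
drawShapes(shapes, S_metadata, S_Shape_wtable):
    for shape in shapes:
        draw_fn = S_Shape_wtable[0]   // sole entry: draw()
        draw_fn(shape, S_metadata, S_Shape_wtable)

// At a call site with an array of circles, the compiler passes the
// metadata for Circle and the emitted witness table for Circle: Shape.
```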
Now, let's call \verb|drawShapes(_:)| with an array of circles and look at the substitution map for the call: \begin{Verbatim} @@ -284,13 +282,13 @@ \section{Protocols}\label{intro protocols} \end{Verbatim} When the callee's generic signature has conformance requirements, the substitution map must store a conformance for each conformance requirement. This is the ``proof'' that the \index{conforming type}concrete replacement type actually conforms to the protocol, as required. We denote a substitution map with conformances as follows: \[\SubstMapC{\SubstType{S}{Circle}}{\SubstConf{S}{Circle}{Shape}}\] -To find the normal conformance, the type checker performs a \index{global conformance lookup}\emph{global conformance lookup} with the concrete type and protocol: +To find the conformance, the type checker performs a \index{global conformance lookup}\emph{global conformance lookup} with the concrete type and protocol, which we denote as follows: \[\Proto{Shape}\otimes\texttt{Circle}=\ConfReq{Circle}{Shape}\] When generating code for the call to \verb|drawShapes(_:)|, we visit each entry in the substitution map, emitting a reference to runtime type metadata for each replacement type, and a reference to the witness table for each conformance. In our case, we pass the runtime type metadata for \texttt{Circle} and the witness table for $\ConfReq{Circle}{Shape}$. \begin{MoreDetails} -\item Conformances: \ChapRef{conformances} +\item Conformances: \ChapRef{chap:conformances} \item Conformance lookup: \SecRef{conformance lookup} \end{MoreDetails} @@ -303,11 +301,10 @@ \section{Protocols}\label{intro protocols} } } \end{Verbatim} - -This function uses the \texttt{Shape} protocol in a new way. The existential type \texttt{any Shape} is a container for a value of some concrete type, together with runtime type metadata, and a witness table describing the conformance.
This container stores small values inline, otherwise it points to a heap-allocated \index{boxing}box. Observe the difference between the type \texttt{Array} from the previous variant of \verb|drawShapes(_:)|, and \texttt{Array} here. Every element of the latter has its own runtime type metadata and witness table, so we can mix multiple types of shapes in one array. In the implementation, existential types are built on top of the core primitives of the generics system. +This function uses the \texttt{Shape} protocol in a new way. The existential type \texttt{any Shape} is a container for a value of some concrete type, together with its runtime type metadata and a witness table describing the conformance. This container stores small values inline; otherwise, it points to a heap-allocated \index{boxing}box. Observe the difference between the type \texttt{Array<S>} from the previous variant of \verb|drawShapes(_:)|, and \texttt{Array<any Shape>} here. Every element of the latter has its own runtime type metadata and witness table, so we can mix multiple types of shapes in one array. In the implementation, existential types are built on top of the core primitives of the generics system. \begin{MoreDetails} -\item Existential types: \ChapRef{existentialtypes} +\item Existential types: \ChapRef{chap:existential types} \end{MoreDetails} \section{Associated Types} @@ -321,10 +318,10 @@ \section{Associated Types} \end{Verbatim} A conforming type must declare a member type named \texttt{Element}, and a \texttt{next()} method returning an optional value of this type. This member type, which can be a type alias or nominal type, is the \emph{type witness} for the associated type \texttt{Element}.
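The \texttt{Nat} example below witnesses \texttt{Element} with a type alias; for contrast, here is a hypothetical conforming type (\texttt{TokenStream}, invented for illustration) whose type witness is a nested nominal type instead:

```swift
// Hypothetical example: the type witness for [IteratorProtocol]Element
// is a nested struct rather than a type alias.
struct TokenStream: IteratorProtocol {
    // This nested nominal type is the type witness for Element.
    struct Element {
        let text: String
    }

    var tokens = ["let", "x"]

    mutating func next() -> Element? {
        guard !tokens.isEmpty else { return nil }
        return Element(text: tokens.removeFirst())
    }
}
```

Either way, the conformance checker records the member type as the type witness in the normal conformance.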
-We declare a \texttt{Nat} type conforming to \texttt{IteratorProtocol} with an \texttt{Element} type of \texttt{Int}, for generating an infinite stream of consecutive natural numbers: +Let's declare a \texttt{Nat} type conforming to \texttt{IteratorProtocol} with an \texttt{Element} type of \texttt{Int}, for generating an infinite stream of consecutive natural numbers: \begin{Verbatim} struct Nat: IteratorProtocol { - typealias Element = Int + typealias Element = Int // (can also be omitted in this case) var x = 0 mutating func next() -> Int? { @@ -335,7 +332,7 @@ \section{Associated Types} \end{Verbatim} We say that \texttt{Int} is the \emph{type witness} for \texttt{[IteratorProtocol]Element} in the conformance $\ConfReq{Nat}{IteratorProtocol}$. We can express this in our type substitution algebra, using the type witness \emph{projection} operation on normal conformances: \[\AssocType{IteratorProtocol}{Element}\otimes \ConfReq{Nat}{IteratorProtocol} = \texttt{Int}\] -(More accurately, the type witness is the \texttt{Element} \index{type alias type}\emph{type alias type}, which is \emph{canonically equal} to \texttt{Int}.) Finally, we can actually omit the declaration of the \texttt{Element} type alias above, because in this case, \index{associated type inference}\emph{associated type inference} will deduce it for us. +(More accurately, the type witness here is the \texttt{Element} \index{type alias type}\emph{type alias type}, whose canonical type is \texttt{Int}.) Finally, we can actually omit the declaration of the \texttt{Element} type alias above, in which case \index{associated type inference}\emph{associated type inference} will be able to deduce it for us. \begin{MoreDetails} \item Type witnesses: \SecRef{type witnesses} @@ -348,53 +345,54 @@ \section{Associated Types} return Pair(first: iter.next()!, second: iter.next()!) 
} \end{Verbatim} -The return type is the generic nominal type \texttt{Pair}, obtained by applying the declaration of \texttt{Pair} to the generic argument \texttt{I.Element}. The generic argument is a \index{dependent member type}\emph{dependent member type}, built from the base type \texttt{I} together with the associated type declaration \texttt{[IteratorProtocol]Element}. This dependent member type represents the type witness in the conformance $\ConfReq{I}{IteratorProtocol}$. +The return type is the generic nominal type \texttt{Pair<I.Element>}, constructed from the declaration of \texttt{Pair} and the generic argument type \texttt{I.Element}. This generic argument type is a \index{dependent member type}\emph{dependent member type}, made from the base type \texttt{I} and a reference to the associated type declaration \texttt{[IteratorProtocol]Element}. This dependent member type represents the type witness in the conformance $\ConfReq{I}{IteratorProtocol}$. Suppose we call \verb|readTwo(_:)| with a value of type \texttt{Nat}: \begin{Verbatim} var iter = Nat() print(readTwo(&iter)) \end{Verbatim} -The substitution map for the call stores the replacement type \texttt{Nat} and the conformance of \texttt{Nat} to \texttt{IteratorProtocol}. We'll call this substitution map~$\Sigma$: +The substitution map for the call stores the replacement type \texttt{Nat} and the conformance of \texttt{Nat} to \texttt{IteratorProtocol}. Let's denote this substitution map by~$\Sigma$: \begin{align*} \Sigma := \SubstMapC{&\SubstType{I}{Nat}}{\\ &\SubstConf{I}{Nat}{IteratorProtocol}} \end{align*}
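As a warm-up before the harder case of \texttt{I.Element}, note that applying $\Sigma$ to the generic parameter type \texttt{I} itself simply reads off the replacement type stored in the map (this follows directly from the substitution map shown above):

```latex
% Substituting the generic parameter I via \Sigma yields its
% recorded replacement type directly:
\[\texttt{I}\otimes\Sigma=\texttt{Nat}\]
```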
We will eventually understand the next equation, to relate dependent member type substitution with type witness projection: +To get the return type of our call, we evaluate $\texttt{Pair<I.Element>}\otimes\Sigma$. Applying a substitution map to a generic nominal type recursively applies the substitution map to each generic argument, so it remains to define $\texttt{I.Element}\otimes\Sigma$. Since this dependent member type abstracts over the type witness in the conformance, it must be the case that $\texttt{I.Element}\otimes\Sigma=\texttt{Int}$. We will eventually derive the following equation, which relates dependent member type substitution with type witness projection: \begin{gather*} \texttt{I.Element}\otimes\Sigma\\ \qquad {} = \AssocType{IteratorProtocol}{Element}\otimes \ConfReq{I}{IteratorProtocol}\otimes\Sigma\\ \qquad {} = \AssocType{IteratorProtocol}{Element}\otimes \ConfReq{Nat}{IteratorProtocol}\\ \qquad {} = \texttt{Int} \end{gather*} -We can finally say that $\texttt{Pair}\otimes\Sigma=\texttt{Pair}$ is the return type of our call to \verb|readTwo(_:)|. +Once we have the above, we can conclude that the return type of our call to \verb|readTwo(_:)| is $\texttt{Pair<I.Element>}\otimes\Sigma=\texttt{Pair<Int>}$. \begin{MoreDetails} \item Dependent member type substitution: \SecRef{abstract conformances}, \ChapRef{conformance paths} \end{MoreDetails} -\paragraph{Bound and unbound.} -We now briefly introduce a concept that will later become important in our study of type substitution. If we need to make the associated type declaration explicit, we use the notation \verb|I.[IteratorProtocol]Element|, despite this not being valid language syntax. This is the \index{bound dependent member type}\emph{bound} form of a dependent member type.
To transform the syntactic type representation \texttt{I.Element} into a bound dependent member type, type resolution queries qualified lookup on the base type \texttt{I}, which is known to conform to \texttt{IteratorProtocol}; thus, the lookup finds the associated type declaration \texttt{Element}. For our current purposes, it is more convenient to use the \index{unbound dependent member type}\emph{unbound} notation for a dependent member type, written in the source language style \texttt{I.Element}. - -\begin{MoreDetails} -\item Member type representations: \SecRef{identtyperepr} -\end{MoreDetails} - \paragraph{Type parameters.} Generic parameter types and dependent member types are the two kinds of \index{type parameter}\emph{type parameters}. The generic signature of \verb|readTwo(_:)| defines two type parameters, \texttt{I} and \texttt{I.Element}. -As with generic parameter types, dependent member types map to \index{archetype type}archetypes in the body of a generic function. We can reveal a little more about the structure of archetypes now, and say that an archetype packages a type parameter together with a generic signature. While a type parameter is just a \emph{name} which can only be understood in relation to some generic signature, an archetype inherently ``knows'' what requirements it is subject to. +As with generic parameter types, dependent member types map to \index{archetype type}archetypes in the body of a generic function. We can reveal a little more about the structure of archetypes now, and say that an archetype packages a type parameter together with a generic signature. While a type parameter is like a ``name'' which can only be understood in relation to some generic signature, an archetype inherently ``knows'' what requirements it is subject to. 
\begin{MoreDetails} \item Type parameters: \SecRef{fundamental types} \item Primary archetypes: \SecRef{archetypesubst} \end{MoreDetails} +\paragraph{Bound and unbound.} +There are actually two kinds of dependent member types. A \index{bound dependent member type}\emph{bound} dependent member type references an associated type declaration. We will denote this by \verb|I.[IteratorProtocol]Element|, despite this not being valid language syntax. An \index{unbound dependent member type}\emph{unbound} dependent member type references an identifier, which we denote by the source language style \texttt{I.Element}. The two forms have distinct representations, but we will see they are equivalent in a strong sense. We will often prefer the unbound form as a notational convenience, but bound dependent member types will later become important in our study of type substitution. + +\begin{MoreDetails} +\item Member type representations: \SecRef{identtyperepr} +\item Bound type parameters: \SecRef{bound type params} +\end{MoreDetails} + \paragraph{Code generation.} -Inside the body of \verb|readTwo(_:)|, the \index{expression}\index{call expression}call expression \verb|iter.next()!| has the type \texttt{$\archetype{I.Element}$?}, which is force-unwrapped to yield the type $\archetype{I.Element}$. To manipulate a value of this type abstractly, we need its runtime type metadata. +Inside the body of \verb|readTwo(_:)|, the \index{expression}\index{call expression}call expression \verb|iter.next()| has the type \texttt{$\archetype{I.Element}$?}, which we force-unwrap using the ``\texttt{!}'' operator to yield a value of type $\archetype{I.Element}$. To manipulate a value of this type abstractly, we need its runtime type metadata. -We recover the runtime type metadata for an associated type from the witness table at run time, in the same way that dependent member type substitution projects a type witness from a conformance at compile time. 
+To recover the runtime type metadata for a dependent member type at run time, we consult the witness table for the conformance. This mirrors how at compile time, we substitute a dependent member type by projecting a type witness from the conformance. -A witness table for a conformance to \texttt{IteratorProtocol} consists of a metadata access function to witness the \texttt{Element} associated type, and a function pointer to witness the \texttt{next()} method. The witness table for $\ConfReq{Nat}{IteratorProtocol}$ references the runtime type metadata for \texttt{Int}, defined by the standard library. +A witness table for a conformance to \texttt{IteratorProtocol} consists of a pair of function pointers, the first to recover the runtime type metadata for \texttt{Element}, and the second for the implementation of \texttt{next()}. The witness table for our $\ConfReq{Nat}{IteratorProtocol}$ thus references the runtime type metadata for \texttt{Int} from the standard library. \paragraph{Same-type requirements.} To introduce another fundamental requirement kind, we compose \texttt{Pair} and \texttt{IteratorProtocol} in a new way, writing a function that takes two iterators and reads an element from each one: \begin{Verbatim} @@ -417,7 +415,7 @@ \section{Associated Types} \[\SubstMap{\SubstType{T}{$\archetype{I.Element}$}}\] \begin{MoreDetails} -\item Reduced types: \SecRef{reduced types} +\item Reduced type parameters: \SecRef{reduced types} \item The type parameter graph: \SecRef{type parameter graph} \end{MoreDetails} @@ -461,20 +459,13 @@ \section{Associated Requirements} func makeIterator() -> Iterator } \end{Verbatim} -This protocol states two associated requirements: +The associated requirements of a protocol are recorded in the protocol's \index{requirement signature}\emph{requirement signature}. 
The \texttt{Sequence} protocol states two associated requirements: \begin{itemize} \item The \index{associated conformance requirement}conformance requirement $\ConfReq{Self.Iterator}{IteratorProtocol}$, stated using the sugared form, as a constraint type in the inheritance clause of the \texttt{Iterator} associated type. \item The \index{associated same-type requirement}same-type requirement $\SameReq{Self.Element}{Self.Iterator.Element}$, which we state in a trailing \texttt{where} clause attached to the associated type. \end{itemize} Associated requirements are like the requirements in a generic signature, except they are rooted in the \IndexSelf protocol \tSelf\ type. Once again, there are multiple equivalent syntactic forms for stating them. For example, we could write out the conformance requirement explicitly, and the above \texttt{where} clause could be attached to the protocol itself for the same semantic effect. -The associated requirements of a protocol are recorded in the protocol's \index{requirement signature}\emph{requirement signature}. The \texttt{Sequence} protocol has the following requirement signature: -\begin{quote} -\begin{verbatim} - -\end{verbatim} -\end{quote} Now consider this generic signature, and call it $G$: \begin{quote} \begin{verbatim} @@ -482,11 +473,11 @@ \section{Associated Requirements} <T, U where T: Sequence, U: Sequence, T.Element == U.Element> \end{verbatim} \end{quote} -We can informally describe $G$ by looking at its equivalence classes: +We can informally describe $G$ by classifying its type parameters into equivalence classes: \begin{itemize} \item \tT, which conforms to \texttt{Sequence}. \item \texttt{U}, which conforms to \texttt{Sequence}. -\item \texttt{T.Element}, \texttt{U.Element}, \texttt{T.Iterator.Element} and \texttt{U.Iterator.Element} are all in one equivalence class. +\item \texttt{T.Element}, \texttt{U.Element}, \texttt{T.Iterator.Element}, and \texttt{U.Iterator.Element}, which are all in one equivalence class.
\item \texttt{T.Iterator}, which conforms to \texttt{IteratorProtocol}. \item \texttt{U.Iterator}, which conforms to \texttt{IteratorProtocol}. \end{itemize} @@ -499,37 +490,44 @@ \section{Associated Requirements} \SameReq{T.Iterator.Element}{U.Iterator.Element} \end{gather*} -At this point, it's worth clarifying that type parameters have a recursive structure; the base type of \texttt{U.Iterator.Element} is another dependent member type, \texttt{U.Iterator}. The theory of derived requirements will also describe the \emph{valid type parameters} of a generic signature. +At this point, it's worth clarifying that type parameters have a recursive structure; the base type of \texttt{U.Iterator.Element} is another dependent member type, \texttt{U.Iterator}. Note that not every such combination is meaningful. The theory of derived requirements will also describe the subset of \emph{valid type parameters} of a generic signature. \paragraph{Conformances.} -A normal conformance stores the \index{associated conformance}\emph{associated conformance} for each associated conformance requirement of its protocol. In the runtime representation, there is a corresponding entry in the witness table for each associated conformance requirement. A witness table for \texttt{Sequence} has four entries: +A normal conformance also stores an \index{associated conformance}\emph{associated conformance} for each associated conformance requirement of its protocol. The \index{associated conformance projection}\emph{associated conformance projection} operation recovers this conformance. At run time, a witness table for a conformance has a corresponding entry for each associated conformance. For example, a witness table for a conformance to \texttt{Sequence} has four entries: \begin{enumerate} -\item A metadata access function to witness the \texttt{Element} associated type. -\item A metadata access function to witness the \texttt{Iterator} associated type. 
-\item A \index{witness table}\emph{witness table access function} to witness the associated conformance requirement \ConfReq{Self.Iterator}{IteratorProtocol}. -\item A function pointer to witness the \texttt{makeIterator()} protocol method. +\item A metadata access function for \texttt{Element}. +\item A metadata access function for \texttt{Iterator}. +\item A \index{witness table}\emph{witness table access function} for \ConfReq{Self.Iterator}{IteratorProtocol}. +\item A function pointer for the implementation of \texttt{makeIterator()}. \end{enumerate} -Protocol inheritance is the special case of an associated conformance requirement with a subject type of \tSelf. The standard library \texttt{Collection} protocol inherits from \texttt{Sequence}, so the associated conformance requirement $\ConfReq{Self}{Sequence}$ appears in the requirement signature of \texttt{Collection}. Starting from a conformance $\ConfReq{Array}{Collection}$, we can get at the conformance to \texttt{Sequence} via \emph{associated conformance projection}: +A \emph{protocol inheritance} relation is represented by an associated conformance requirement with a subject type of \tSelf. For example, the standard library \texttt{Collection} protocol inherits from \texttt{Sequence}, so the associated conformance requirement $\ConfReq{Self}{Sequence}$ appears in the requirement signature of \texttt{Collection}: +\begin{Verbatim} +protocol Collection: Sequence {...} +\end{Verbatim} +Starting from a conformance $\ConfReq{Array}{Collection}$, we can get at the conformance to \texttt{Sequence} via associated conformance projection: \[\AssocConf{Self}{Sequence}\otimes\ConfReq{Array}{Collection}=\ConfReq{Array}{Sequence}\] -Similarly, at runtime, starting from a witness table for a conformance to \texttt{Collection}, we can recover a witness table for a conformance to \texttt{Sequence}, and thus all of the metadata described above. 
We will take a closer look at the other associated requirements of the \texttt{Collection} protocol in due time. This will lead us into the topic of \emph{recursive} associated conformance requirements, which as we show, enable the type substitution algebra to encode any computable function. +This also means that at run time, we can recover a witness table for a conformance to \texttt{Sequence} from a witness table for a conformance to \texttt{Collection}. We will take a closer look at the other associated requirements of the \texttt{Collection} protocol in due time. This will lead us into the topic of \emph{recursive} associated conformance requirements, which, as we show, enable the type substitution algebra to encode any computable function. \begin{MoreDetails} \item Requirement signatures: \SecRef{requirement sig} \item Derived requirements: \SecRef{derived req} +\item Valid type parameters: \SecRef{valid type params} \item Associated conformances: \SecRef{associated conformances} \item Recursive conformances: \SecRef{recursive conformances} \end{MoreDetails} \section{Related Work} -A \index{calling convention}calling convention centered on a runtime representation of types was explored in a 1996 paper \cite{intensional}. Swift protocols are similar in spirit to \index{Haskell}Haskell type classes, described in \cite{typeclass}, and subsequently \cite{typeclasshaskell} and \cite{peytonjones1997type}. Swift witness tables follow the ``dictionary passing'' implementation strategy for type classes. Two papers subsequently introduced type classes with associated types, but without associated requirements. Using our Swift lingo, the first paper defined a formal model for associated types witnessed by \index{nested type declaration}nested nominal types \cite{assoctype}. In this model, every \index{type witness}type witness is a distinct concrete type.
To translate their example into Swift: +A \index{calling convention}calling convention based around a runtime representation of types was explored in a 1996 paper \cite{intensional}. Swift protocols are similar in spirit to \index{Haskell}Haskell type classes, described in \cite{typeclass}, and subsequently \cite{typeclasshaskell} and \cite{peytonjones1997type}. Swift witness tables follow the ``dictionary passing'' implementation strategy for type classes. + +Associated types were introduced in \cite{assoctype}. In Swift parlance, the original formulation loosely corresponds to a world where each associated type is witnessed by a distinct \index{nested type declaration}nested nominal type. There were no associated requirements in the model. The original motivating example in their paper translates to Swift as follows: \begin{Verbatim} protocol ArrayElem { associatedtype Array func index(_: Array, _: Int) -> Self } \end{Verbatim} -The second paper described \emph{associated type synonyms} \cite{synonyms}. This is the special case of an associated type witnessed by a type alias in Swift. Again, we translate their example: +A subsequent work introduced \emph{associated type synonyms} \cite{synonyms}. In Swift, this translates to an associated type witnessed by a type alias, again without associated requirements. The motivating example looks like this in Swift: \begin{Verbatim} protocol Collects { associatedtype Elem @@ -538,26 +536,26 @@ \section{Related Work} func toList() -> [Elem] } \end{Verbatim} -Other relevant papers from the Haskell world include \cite{schrijvers2008type} and \cite{kiselyov2009fun}. +Other relevant papers from the Haskell world include \cite{schrijvers2008type} and \cite{kiselyov2009fun}. -\paragraph{C++.} While \index{C++}C++ templates are synonymous with ``generic programming'' to many programmers, C++ is somewhat unusual compared to most languages with parametric polymorphism, because templates are fundamentally syntactic in nature. 
Compiling a template declaration only does some minimal amount of semantic analysis, with most type checking deferred to until \emph{after} template expansion. There is no formal notion of requirements on template parameters, so whether template expansion succeeds or fails at a given expansion point depends on how the template's body uses the substituted template parameters. +\paragraph{C++.} While \index{C++}C++ templates are synonymous with ``generic programming'' to many programmers, C++ is somewhat unusual compared to most languages with parametric polymorphism, because templates are fundamentally syntactic in nature. Compiling a template declaration involves only a minimal amount of semantic analysis, with most type checking deferred until \emph{after} template expansion. There is no formal notion of requirements on template parameters, so whether template expansion succeeds or fails at a given expansion point depends entirely on how the template's body uses the specified template arguments. -The inherent flexibility of C++ templates enables some advanced metaprogramming techniques \cite{gregor}. On the other hand, a template declaration's body must be visible from each expansion point, so this model is fundamentally at odds with separate compilation. Undesirable consequences include libraries where large parts must be implemented in header files, and cryptic error messages on template expansion failure. +This unusual flexibility enables some advanced metaprogramming techniques \cite{gregor}. On the other hand, because a template declaration's body must be visible at each expansion point, heavy template use is fundamentally at odds with separate compilation. Library authors can find themselves implementing a large amount of logic in header files, and error messages on template expansion failure can be difficult to understand.
-Swift's generics model was in many ways inspired by ``C++0x concepts,'' a proposal to extend the C++ template metaprogramming model with \emph{concepts}, a form of type classes with associated types (\cite{concepts}, \cite{essential}). Concepts could declare their own associated requirements, but the full ramifications were perhaps not yet apparent to the authors when they wrote this remark: +Swift's ``value semantics'' are an evolution of a certain philosophy that originated in the C++ community~\cite{stepanov}. Swift generics also draw inspiration from ``C++0x concepts,'' a proposal to add checked requirements to C++ templates, based on type classes and associated types (\cite{concepts,essential}). Concepts could even state associated requirements, but the full ramifications of this were perhaps not yet apparent to the authors: \begin{quote} \textsl{``Concepts often include requirements on associated types. For example, a container's associated iterator \texttt{A} would be required to model the \texttt{Iterator} concept. This form of concept composition is slightly different from refinement but close enough that we do not wish to clutter our presentation [\ldots]''} \end{quote} \paragraph{Rust.} -\index{Rust}Rust generics are separately type checked, but Rust does not define a calling convention for unspecialized generic code, so there is no separate compilation. Instead, the implementation of a generic function is \emph{specialized}, or \emph{monomorphized}, for every unique set of generic arguments. +\index{Rust}Rust generics are separately type checked, but Rust does not define a calling convention for unspecialized generic code, so there is no separate compilation. Instead, the implementation of a generic function is \index{specialization}\emph{specialized}, or \emph{monomorphized}, for every unique set of generic arguments. Rust's \emph{traits} are similar to Swift's protocols; traits can declare associated types and associated conformance requirements.
Rust generics also allow some kinds of abstraction not supported by Swift, such as lifetime variables, generic associated types \cite{rust_gat}, and const generics \cite{rust_const}. On the flip side, Rust does not allow same-type requirements to be stated in full generality~\cite{rust_same}. Instead, trait bounds can constrain associated types with a syntactic form resembling Swift's \index{parameterized protocol type}parameterized protocol types (\SecRef{protocols}), but we will show in \ExRef{proto assoc rule} that Swift's same-type requirements are more general. Rust's ``\texttt{where} clause elaboration'' is more limited than Swift's derived requirements formalism, and associated requirements sometimes need to be re-stated in the \texttt{where} clause of a generic declaration \cite{rust_bug}. An early attempt at formalizing Rust traits appears in a PhD dissertation from 2015 \cite{Milewski_2015}. A more recent effort is ``Chalk,'' an implementation of a \index{Prolog}Prolog-like solver based on Horn clauses~\cite{rust_chalk}. \paragraph{Java.} -\index{Java}Java generics are separately type checked and compiled. In Java, only reference types can be used as generic arguments; primitive value types must be boxed first. Generic argument types are not reified at runtime, and values of generic parameter type are always represented as object pointers. This avoids the complexity of dependent layout, but comes at the cost of more runtime type checks and heap allocation. Java generics are based on existential types and also support \emph{variance}, a subtyping relation between different instantiations of the same generic type, defined in terms of the subtype relation on corresponding generic arguments \cite{java_faq}. +Traditional \index{Java}Java generics are implemented by erasing generic argument types at compile time, so that values of generic parameter type are always represented by object pointers, and primitive value types must be \index{boxing}boxed first. 
This avoids the complexity of dependent layout, at the cost of more runtime checks and heap allocations. Java generics also support \emph{variance}, a form of subtyping between different instantiations of the same generic type, defined in terms of the \index{subtype}subtype relation on their corresponding generic arguments~\cite{java_faq}. An effort to extend the Java virtual machine with support for user-defined value types and reified generics is currently underway~\cite{valhalla}. \paragraph{Hylo.} \index{Hylo} Hylo is a research language with a focus on mutable value semantics \cite{hylo}. Hylo's generic programming capabilities are similar to Swift and Rust. The Hylo compiler implementation incorporates some ideas from this book, using the theory of string rewriting to reason about generic requirements \cite{hylorqm}. diff --git a/docs/Generics/chapters/math-summary.tex b/docs/Generics/chapters/math-summary.tex index 9f3748625acb5..622d12b68c430 100644 --- a/docs/Generics/chapters/math-summary.tex +++ b/docs/Generics/chapters/math-summary.tex @@ -52,12 +52,12 @@ \section*{More} \item Formal systems (\SecRef{derived req}). \item Equivalence relations (\SecRef{valid type params}, \SecRef{rewrite graph}). \item Partial and linear orders (\SecRef{reduced types}, \SecRef{rewritesystemintro}, \SecRef{building terms}). -\item Category theory (\SecRef{submapcomposition}). +\item Category theory (\SecRef{sec:composition}). \item Boolean satisfiability (\SecRef{associated type inference}). \item Directed graphs (\SecRef{type parameter graph}, \SecRef{finding conformance paths}, \SecRef{recursive conformances}, \SecRef{protocol component}). \item Proof by induction (\SecRef{generic signature validity}). -\item Computability theory (\SecRef{tag systems}, \SecRef{word problem}). -\item Finitely-presented monoids and string rewriting (\ChapRef{monoids}, \ChapRef{completion}). +\item Computability theory (\SecRef{halting problem}, \SecRef{word problem}). 
+\item Finitely-presented monoids and string rewriting (\ChapRef{monoids}, \ChapRef{chap:completion}). \end{itemize} \end{document} diff --git a/docs/Generics/chapters/rule-minimization.tex b/docs/Generics/chapters/minimization.tex similarity index 83% rename from docs/Generics/chapters/rule-minimization.tex rename to docs/Generics/chapters/minimization.tex index dd86f38aaad7d..928662668f27d 100644 --- a/docs/Generics/chapters/rule-minimization.tex +++ b/docs/Generics/chapters/minimization.tex @@ -2,7 +2,7 @@ \begin{document} -\chapter[]{Rule Minimization}\label{rqm minimization} +\chapter[]{Minimization}\label{rqm minimization} \ifWIP TODO: @@ -42,8 +42,6 @@ \end{itemize} \fi -\section[]{Loop Normalization} - \ifWIP \cite{homotopyreduction} \IndexFlag{disable-requirement-machine-loop-normalization} @@ -127,6 +125,25 @@ \fi +\section[]{Concrete Contraction}\label{concrete contraction} + +\IndexFlag{disable-requirement-machine-concrete-contraction} +\IndexTwoFlag{debug-requirement-machine}{concrete-contraction} + +\IndexDefinition{concrete contraction} + +\ifWIP +TODO: +\begin{itemize} +\item Doesn't actually appear in signature so should not impact minimization +\item The problem: it might give you a smaller anchor +\item Invariant violation without concrete contraction +\item Concrete contraction substitutes superclass and concrete types +\item Also GSB compatibility: T.A, T == C, C.A is a concrete typealias that's not an associated type. 
this doesn't add a rule +\item Open question: can we do this in a more principled way +\end{itemize} +\fi + \section[]{Source Code Reference}\label{rqm minimization source ref} \end{document} diff --git a/docs/Generics/chapters/monoids.tex b/docs/Generics/chapters/monoids.tex index 14487864b9b1b..d42cac6e25c11 100644 --- a/docs/Generics/chapters/monoids.tex +++ b/docs/Generics/chapters/monoids.tex @@ -4,9 +4,9 @@ \chapter{Monoids}\label{monoids} -\lettrine{M}{onoids} are one of the fundamental objects that we study in abstract algebra. Their simplicity allows for a great deal of generality, so we will quickly narrow our focus to \emph{finitely-presented} monoids. We will see that every finitely-presented monoid can be encoded by a Swift generic signature. After a brief detour into computability theory, we will introduce the \emph{word problem}, and relate the word problem to our derived requirements formalism of \SecRef{derived req}. The word problem turns out to be undecidable in general, but we will see it can be solved in those finitely-presented monoids that admit a \emph{convergent} presentation. This prepares us for for the next chapter, where we attempt to solve derived requirements by encoding a generic signature as a finitely-presented monoid. We begin with standard definitions, found in texts such as \cite{semigroup} or \cite{postmodern}. +\lettrine{M}{onoids} are one of the fundamental objects that we study in abstract algebra. Their simplicity allows for a great deal of generality, so we will quickly narrow our focus to \emph{finitely-presented} monoids. We will see that every finitely-presented monoid can be translated into a Swift generic signature, which establishes a connection between the derived requirements formalism of \SecRef{derived req}, and the \emph{word problem}. We will see that, like the halting problem of \SecRef{halting problem}, the word problem is undecidable in general.
On the other hand, it can be solved in those finitely-presented monoids that admit a \emph{convergent} presentation. This prepares us for the next chapter, where we proceed to ``solve'' the derived requirements formalism by encoding a generic signature as a finitely-presented monoid. We begin with standard definitions, found in \cite{semigroup} or \cite{postmodern}. \begin{definition} -A \IndexDefinition{monoid}\emph{monoid} \index{$\cdot$}\index{$\cdot$!z@\igobble|seealso{monoid}}$(M,\, \cdot,\, \varepsilon)$ is a structure consisting of a \index{set}set~$M$, a \index{binary operation}binary operation~``\;$\cdot$\;'', and an identity element~$\varepsilon\in M$, which together satisfy the following three axioms: +A \IndexDefinition{monoid}\emph{monoid} \index{$\cdot$}\index{$\cdot$!z@\igobble|seealso{monoid}}$(M,\, \cdot,\, \varepsilon)$ is a structure consisting of a \index{set!monoid}set~$M$, a \index{binary operation}binary operation~``\;$\cdot$\;'', and an identity element~$\varepsilon\in M$, which together satisfy the following three axioms: \begin{itemize} \item The set $M$ is \emph{closed} under the binary operation: for all $x$, $y \in M$, $x\cdot y\in M$. \item The binary operation is \IndexDefinition{associative operation}\emph{associative}: for all $x, y, z \in M$, $x\cdot(y\cdot z)=(x\cdot y)\cdot z$. @@ -32,7 +32,7 @@ \chapter{Monoids}\label{monoids} Let $A$ be a \index{set!generators}set. The \IndexDefinition{free monoid}\emph{free monoid} generated by $A$, denoted $A^*$, is the \index{set!free monoid}set of all finite strings of elements of $A$. (This is the same notation as the ``\;\texttt{*}\;'' operator in a \index{regular language}\emph{regular expression}.) The set $A$ is called the \index{alphabet!z@\igobble|seealso{generating set}}\emph{alphabet} or \IndexDefinition{generating set}\emph{generating set} of $A^*$. The generating set may be finite or infinite, but we assume it is finite unless stated otherwise. 
The \index{binary operation!free monoid}binary operation on $A^*$ is \emph{string concatenation}, and the \index{identity element!free monoid}identity element $\varepsilon$ is the empty string. The elements of $A^*$ are also called \IndexDefinition{term}\emph{terms}. We say that $u$ is a \IndexDefinition{subterm}\emph{subterm} of $t$ if $t=xuy$ for some $x$, $y\in A^*$. The \IndexDefinition{term length}\emph{length} of a term $t\in A^*$ is denoted $|t|\in\NN$. \end{definition} -The reader might wish to review the discussion of the type parameter graph from \SecRef{type parameter graph}. In abstract algebra, the \index{directed graph!Cayley graph}\index{Cayley graph!free monoid}\emph{Cayley graph} of a monoid plays a similar role, and we will discover many parallels between the two. We will describe the general construction in the next section, but for now we'll just look the Cayley graph of the free monoid~$A^*$. The vertices in this graph are the terms of $A^*$, and the vertex for the \index{identity element!Cayley graph}identity element is the distinguished \index{root vertex!Cayley graph}root vertex. Then, for each vertex~$t$ and each generator $g\in A$, we also add an edge with \index{source vertex!Cayley graph}source~$t$ and \index{source vertex!Cayley graph}destination~$tg$, and label this edge ``$g$''. (This is sometimes called the \emph{right} Cayley graph, and the left Cayley graph can then be defined in the opposite way by joining $t$ with $gt$.) +The reader might wish to review the discussion of the \index{type parameter graph!Cayley graph}type parameter graph from \SecRef{type parameter graph}. In abstract algebra, the \index{directed graph!Cayley graph}\index{Cayley graph!free monoid}\emph{Cayley graph} of a monoid plays a similar role, and we will discover many parallels between the two. We will describe the general construction in the next section, but for now we'll just look at the Cayley graph of the free monoid~$A^*$.
The vertices in this graph are the terms of $A^*$, and the vertex for the \index{identity element!Cayley graph}identity element is the distinguished \index{root vertex!Cayley graph}root vertex. Then, for each vertex~$t$ and each generator $g\in A$, we also add an edge with \index{source vertex!Cayley graph}source~$t$ and \index{destination vertex!Cayley graph}destination~$tg$, and label this edge ``$g$''. (This is sometimes called the \emph{right} Cayley graph, and the left Cayley graph can then be defined in the opposite way by joining $t$ with $gt$.) \begin{example} The free monoid with two generators $\{a,b\}^*$ consists of all finite strings made up of $a$ and $b$. Two typical elements are $abba$ and $bab$, and their concatenation is $abba\cdot bab=abbabab$. Unlike $(\NN,+,0)$, this monoid operation is not \index{commutative operation}\emph{commutative}, so for example, $abba\cdot bab\neq bab\cdot abba$. The Cayley graph of $\{a,b\}^*$ is an infinite binary \index{tree!Cayley graph of free monoid}tree. Every vertex has two successors, corresponding to multiplication on the right by $a$~and~$b$, respectively: @@ -181,23 +181,23 @@ \chapter{Monoids}\label{monoids} \section{Finitely-Presented Monoids}\label{finitely presented monoids} -Every element of a free monoid has a \emph{unique} expression as a product of generators. Finitely-presented monoids are more general, because multiple distinct combinations can name the same element. To model this phenomenon, we start from a finite set of \index{rule|see{rewrite rule}}\IndexDefinition{rewrite rule}\emph{rewrite rules}, which then generate an equivalence relation on terms. We first turned to equivalence relations in \SecRef{valid type params}, to understand the same-type requirements of a generic signature; the below construction is similar. +Every element of a free monoid has a \emph{unique} expression as a product of generators.
Finitely-presented monoids are more general, because multiple distinct combinations can name the same element. To model this phenomenon, we add a finite set of \index{rule|see{rewrite rule}}\IndexDefinition{rewrite rule}\emph{rewrite rules} (or \emph{relations}); these then generate an equivalence relation on terms. We first turned to equivalence relations in \SecRef{valid type params}, to understand the same-type requirements of a generic signature; the below construction is similar. A \IndexDefinition{monoid presentation}\emph{monoid presentation} is a list of generators and rewrite rules: -\[\Pres{ \underbrace{a_1,\,\ldots,\, a_m}_{\text{generators}} }{ \underbrace{u_1 \sim v_1,\,\ldots,\, u_n \sim v_n}_{\text{rewrite rules}}} \] +\[\Pres{ \underbrace{a_1,\,\ldots,\, a_m}_{\text{generators\vphantom{l}}} }{ \underbrace{u_1 \sim v_1,\,\ldots,\, u_n \sim v_n}_{\text{rewrite rules\vphantom{g}}}} \] Also, if $A := \{a_1,\ldots, a_m\}$ and $R := \{(u_1,v_1),\,\ldots,\,(u_n,v_n)\}\subseteq A^* \times A^*$ are finite sets, we can form the monoid presentation~$\AR$. The equivalence relation that~$R$ generates on the terms of~$A^*$ is denoted by $x\sim_R y$, or $x \sim y$ when $R$ is clear from context. Note that when we write $x=y$, we always mean that $x$ and $y$ are \emph{identical} in $A^*$, not just equivalent in $\AR$. We will describe the intuitive model behind the \index{term equivalence relation}term equivalence relation first, and then give a rigorous definition in the next section. Syntactically, a rewrite rule $(u,v)\in R$ is a pair of terms. Semantically, a rewrite rule tells us that if we find $u$ anywhere within a term, we can replace this subterm with $v$, and obtain another equivalent term. Term equivalence is symmetric, so we can replace $v$ with $u$ as well. Finally, it is transitive, so we can iterate these rewrite steps any number of times to prove an equivalence.
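The subterm-replacement reading of a rewrite rule is mechanical enough to sketch in a few lines of code. The following Python fragment is purely illustrative (it is not part of the book or the compiler, and the function name is invented); it enumerates every term obtained from a given term by a single application of one rewrite rule $(u,v)$, that is, every rewrite step of the form $x(u\Rightarrow v)y$:

```python
def rewrite_steps(term, rule):
    """Yield every term reachable from `term` by a single rewrite
    step x(u => v)y: replace one occurrence of the subterm u with v."""
    u, v = rule
    i = term.find(u)
    while i != -1:
        # Here x = term[:i] and y = term[i + len(u):].
        yield term[:i] + v + term[i + len(u):]
        i = term.find(u, i + 1)

# With the rule (ab, c), the term abab admits two rewrite steps:
print(list(rewrite_steps("abab", ("ab", "c"))))  # ['cab', 'abc']
```

Iterating this step, in both directions, is exactly what generates the equivalence relation $\sim_R$ described above.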
-A monoid presentation $\AR$ then defines a \IndexDefinition{finitely-presented monoid}\emph{finitely-presented monoid}. The \index{set!finitely-presented monoid}elements of a finitely-presented monoid are the \index{equivalence class!terms}equivalence classes of $\sim_R$, the \index{binary operation!finitely-presented monoid}binary operation is concatenation of terms, and the \index{identity element!finitely-presented monoid}identity element is the equivalence class of $\varepsilon$. +A monoid presentation $\AR$ defines a \IndexDefinition{finitely-presented monoid}\emph{finitely-presented monoid} where the \index{set!finitely-presented monoid}elements are the \index{equivalence class!terms}equivalence classes of $\sim_R$, the \index{identity element!finitely-presented monoid}identity element is the equivalence class of $\varepsilon$, and the \index{binary operation!finitely-presented monoid}binary operation is concatenation of terms (we will show it is \index{well-defined}well-defined later). \begin{example} -Finitely-presented monoids generalize the free monoids, because every free monoid (with a finite generating set) is also finitely presented if we start from an empty set of rewrite rules. In this case $x\sim y$ if and only if $x=y$. +Finitely-presented monoids generalize the free monoids, because every free monoid (with a finite generating set) is also finitely presented if we add an empty set of rewrite rules. In this case $x\sim y$ if and only if $x=y$. \end{example} \begin{example}\label{monoid z4 example} -Consider the finite set $\{0,1,2,3\}$ with the binary operation $+$ given by the following table. The operation is \index{modular arithmetic}addition modulo 4. We call this monoid $\mathbb{Z}_4$. This method of specifying a finite monoid is called a \IndexDefinition{Cayley table}\emph{Cayley table}: +Consider the finite set $\{0,1,2,3\}$ with the binary operation $+$ given by the below table. 
This method of defining a finite monoid is called a \IndexDefinition{Cayley table}\emph{Cayley table}: \begin{center} \begin{tabular}{c|cccc} $+$&0&1&2&3\\ @@ -208,11 +208,11 @@ \section{Finitely-Presented Monoids}\label{finitely presented monoids} 3&3&0&1&2 \end{tabular} \end{center} -If we write $a$ instead of $1$, $\varepsilon$ instead of $0$, and $\cdot$ instead of $+$, we can also describe $\mathbb{Z}_4$ as a finitely-presented monoid with one generator and one rewrite rule: +The above operation is \index{modular arithmetic}addition modulo 4, and we will denote this monoid by~$\mathbb{Z}_4$. If we write $a$~instead of~$1$, $\varepsilon$~instead of~$0$, and $\cdot$~instead of~$+$, we can present $\mathbb{Z}_4$ with one generator and one rewrite rule: \[\mathbb{Z}_4 := \Pres{a}{a^4\sim\varepsilon}\] The rule $a^4\sim\varepsilon$ allows us to insert or delete $aaaa$, which gives us this general principle: \[a^m\sim a^n\qquad\text{if and only if}\qquad m\equiv n\pmod 4\] -Therefore, the rule partitions the terms of $\{a^*\}$ into four infinite equivalence classes: +Therefore, the rule partitions the terms of $\{a\}^*$ into four infinite equivalence classes: \begin{gather*} \{\varepsilon,\, a^4,\, a^8, \ldots,\, a^{4k},\, \ldots\}\\ \{a,\, a^5,\, a^9,\, \ldots,\, a^{4k+1},\, \ldots\}\\ @@ -220,7 +220,7 @@ \section{Finitely-Presented Monoids}\label{finitely presented monoids} \{a^3,\, a^7,\, a^{11},\, \ldots,\, a^{4k+3},\, \ldots\} \end{gather*} -In the \index{Cayley graph!finitely-presented monoid}Cayley graph of a finitely-presented monoid, the \index{vertex!Cayley graph}vertices are \index{equivalence class!terms}equivalence classes of terms. The \index{edge!Cayley graph}edge set is now defined on these equivalence classes, so for each equivalence class $\EquivClass{t}$ and generator $g\in A$, we have an edge with source~$\EquivClass{t}$ and destination~$\EquivClass{tg}$. We will see later this does not depend on our choice of labels. 
With $\mathbb{Z}_4$ we can label each vertex with the shortest term in each equivalence class. The Cayley graph for $\mathbb{Z}_4$ is the same as the graph we associated with the \texttt{Z4} protocol from \ExRef{protocol z4 graph}: +We construct the \index{Cayley graph!finitely-presented monoid}Cayley graph of a finitely-presented monoid by taking the \index{vertex!Cayley graph}vertices to be \index{equivalence class!terms}equivalence classes of terms. The \index{edge!Cayley graph}edge relation is now defined on these equivalence classes, so for each equivalence class $\EquivClass{t}$ and generator $g\in A$, we add an edge with source~$\EquivClass{t}$ and destination~$\EquivClass{tg}$. If we label each vertex with the shortest term in its equivalence class, we see that the Cayley graph for $\mathbb{Z}_4$ looks like the type parameter graph of the \texttt{Z4} protocol from \ExRef{protocol z4 graph}: \begin{center} \begin{tikzpicture} @@ -415,7 +415,7 @@ \section{Equivalence of Terms}\label{rewrite graph} \] \end{ceqn} -\begin{definition} +\begin{definition}\label{rewrite graph def} The \IndexDefinition{rewrite graph}\emph{rewrite graph} of a \index{monoid presentation!rewrite graph}monoid presentation $\AR$ has the terms of~$A^*$ as \index{vertex!rewrite graph}vertices, and rewrite steps as \index{edge!rewrite graph}edges. Except in the trivial case where $A=\varnothing$, this graph is \index{infinite graph!rewrite graph}infinite, but we will only look at small \index{subgraph!rewrite graph}subgraphs at any one time. 
For example, every rewrite step determines a subgraph with two vertices and one edge: \begin{center} \begin{tikzcd}[column sep=huge] @@ -472,7 +472,7 @@ \section{Equivalence of Terms}\label{rewrite graph} \end{center} \end{example} -\paragraph{The algebra of rewriting.} To ensure that the term equivalence relation ultimately satisfies the necessary axioms, we introduce some algebraic operations on rewrite steps and paths, called inversion, composition, and whiskering. +\paragraph{The algebra of rewriting.} To ensure that the \index{term equivalence relation}term equivalence relation ultimately satisfies the necessary axioms, we introduce some algebraic operations on rewrite steps and paths, called inversion, composition, and whiskering. \begin{definition} The \IndexDefinition{inverse rewrite step}\emph{inverse} of a rewrite step $s:=x(u\Rightarrow v)y$, denoted $s^{-1}$, is defined as $x(v\Rightarrow u)y$, so we swap the $u$ and the $v$. The inverse of a positive rewrite step is negative, and vice versa, which means that the edges in the rewrite graph come in complementary pairs: @@ -605,10 +605,10 @@ \section{Equivalence of Terms}\label{rewrite graph} \begin{proposition} Let $\AR$ be a monoid presentation. Let $P$ be the set of rewrite paths of $\AR$. The following axioms characterize~$P$: \begin{enumerate} -\item (Base case) For each $(u,v)\in R$, there is a rewrite path $(u\Rightarrow v)\in P$. +\item (Base case) For each $(u,v)\in R$, there is an ``elementary'' rewrite path $(u\Rightarrow v)\in P$. \item (Closure under inversion) If $p\in P$, then $p^{-1}\in P$. \item (Closure under composition) If $p_1$, $p_2\in P$ with $\Dst(p_1)=\Src(p_2)$, then $p_1\circ p_2\in P$. -\item (Closure under whiskering) If $p\in P$ and $z\in A^*$, then $z\WL p$ and $p\WR z\in P$. +\item (Closure under whiskering) If $p\in P$ and $z\in A^*$, then $z\WL p$, $p\WR z\in P$. 
\end{enumerate} \end{proposition} \begin{proof} @@ -618,7 +618,7 @@ \section{Equivalence of Terms}\label{rewrite graph} Recall that $x\sim_R y$ means our rewrite graph has a path from $x$~to~$y$. \begin{proposition} -Let $\AR$ be a monoid presentation. Then $\sim_R$ is an equivalence relation on $A^*$. +Let $\AR$ be a monoid presentation. Then $\sim_R$ is an \IndexDefinition{term equivalence relation}equivalence relation on $A^*$. \end{proposition} \begin{proof} Let $P$ be the set of rewrite paths of $\AR$. We check each axiom in turn. @@ -644,7 +644,7 @@ \section{Equivalence of Terms}\label{rewrite graph} Consider the monoid $(\NN,+,0)$. The standard \index{linear order}linear order $<$ on \index{natural numbers!linear order}$\NN$ is translation-invariant (but not an equivalence relation). For instance $5<7$ also implies that $5+2<7+2$. (Hence we're using ``translation'' in the geometric sense.) \end{example} \begin{theorem} -Let $\AR$ be a monoid presentation. Then the term equivalence relation~$\sim_R$ is a monoid congruence. +Let $\AR$ be a monoid presentation. Then the \index{term equivalence relation}term equivalence relation~$\sim_R$ is a monoid congruence. \end{theorem} \begin{proof} We've seen that $\sim$ is an equivalence relation, so it remains to establish translation invariance. Let $P$ be the set of rewrite paths of $\AR$. Suppose that $x$, $y$, $z\in A^*$, and also that $x\sim y$, so we have $p\in P$ such that $x=\Src(p)$ and $y=\Dst(p)$. We must show $zx\sim zy$ and $xz\sim yz$. 
For the first claim, we notice that $z\WL p\in P$, so $zx\sim zy$: @@ -790,7 +790,7 @@ \section{A Swift Connection}\label{monoidsasprotocols} Explicit requirements&Rewrite rules\\ Derived requirements&Rewrite paths\\ Reduced type equality&Term equivalence\\ -\index{type parameter graph}Type parameter graph&\index{Cayley graph}Cayley graph\\ +\index{type parameter graph!Cayley graph}Type parameter graph&\index{Cayley graph}Cayley graph\\ \bottomrule \end{tabular} \end{center} @@ -951,10 +951,10 @@ \section{A Swift Connection}\label{monoidsasprotocols} \end{enumerate} \begin{theorem}\label{path to derivation} -Let $\AR$ be a monoid presentation, and let \texttt{M} be the protocol that encodes $\AR$. Then $x\sim_R y$ implies $\GM\vdash\SameReqPhi{x}{y}$ for all $x$, $y\in A^*$. +Let $\AR$ be a monoid presentation, and let \texttt{M} be the protocol that encodes $\AR$. Then for all $x$, $y\in A^*$, $x\sim_R y$ implies $\GM\vdash\SameReqPhi{x}{y}$. \end{theorem} \begin{proof} -We are given a \index{rewrite path!to derivation}rewrite path $p$ with $\Src(p)=x$ and $\Dst(p)=y$, and we must construct a derivation of $\GM\vdash\SameReqPhi{x}{y}$. We proceed by \index{induction}induction on the \index{rewrite path length}length of $p$. +We are given a \index{rewrite path!to derivation}rewrite path $p$ with $\Src(p)=x$ and $\Dst(p)=y$, and we must construct a derivation of $\GM\vdash\SameReqPhi{x}{y}$. We proceed by \index{induction}induction on the \index{rewrite path length}length of the rewrite path~$p$. \BaseCase We have an \index{empty rewrite path}empty rewrite path, so $p=1_t$ for some $t\in A^*$. 
We construct a derivation of $\GM\vdash\varphi(t)$ using \PropRef{monoid type lemma}, and then apply the \IndexStep{Reflex}\textsc{Reflex} inference rule to derive $\SameReq{$\varphi(t)$}{$\varphi(t)$}$: \begin{gather*} @@ -988,7 +988,7 @@ \section{A Swift Connection}\label{monoidsasprotocols} \end{proof} \begin{theorem}\label{derivation to path} -Let $\AR$ be a monoid presentation, and let \texttt{M} be the protocol that encodes $\AR$. Then $\GM\vdash\SameReqPhi{x}{y}$ implies that $x\sim_R y$ for all $x$, $y\in A^*$. +Let $\AR$ be a monoid presentation, and let \texttt{M} be the protocol that encodes $\AR$. Then for all $x$, $y\in A^*$, $\GM\vdash\SameReqPhi{x}{y}$ implies that $x\sim_R y$. \end{theorem} \begin{proof} We are given a derivation of $\SameReqPhi{x}{y}$, and we must show that $x\sim y$ by constructing a rewrite path $p$ with source $x$ and destination $y$. We proceed by structural \index{induction}induction on derived requirements. @@ -1043,9 +1043,9 @@ \section{A Swift Connection}\label{monoidsasprotocols} \end{enumerate} The same can be said about the edge sets as well: \begin{enumerate} -\item An edge in (1) joins each $\EquivClass{t}$ with $\EquivClass{tg}$ for all $a\in A$. -\item An edge in (2) joins each $\EquivClass{\varphi(t)}$ with $\EquivClass{\varphi(t)\texttt{.A}}$ for all associated types~\texttt{A} of~\texttt{M}. -\item An edge in (3) joins each $\ConfReq{$\EquivClass{\varphi(t)}$}{M}$ with $\ConfReq{$\EquivClass{\varphi(t)\texttt{.A}}$}{M}$ for all associated conformance requirements $\AssocConfReq{Self.A}{M}{M}$ of~\texttt{M}. +\item An edge in (1) joins each $\EquivClass{t}$ with $\EquivClass{tg}$, for all $a\in A$. +\item An edge in (2) joins each $\EquivClass{\varphi(t)}$ with $\EquivClass{\varphi(t)\texttt{.A}}$, for all associated types~\texttt{A} of~\texttt{M}. 
+\item An edge in (3) joins each $\ConfReq{$\EquivClass{\varphi(t)}$}{M}$ with $\ConfReq{$\EquivClass{\varphi(t)\texttt{.A}}$}{M}$, for all associated conformance requirements $\AssocConfReq{Self.A}{M}{M}$ of~\texttt{M}. \end{enumerate} One final remark. Recall that in the Cayley graph of a monoid presentation~$\AR$, every vertex has the same number of \index{successor!vertex}successors, and we just showed that the same is true of the type parameter graph of $\GM$. On the other hand, in an arbitrary generic signature, the number of successors of a vertex will vary, because equivalence classes will conform to various sets of protocols with distinct associated types. The type parameter graph of a ``typical'' generic signature is almost never the Cayley graph of a monoid. @@ -1078,12 +1078,12 @@ \section{The Word Problem}\label{word problem} \end{gather*} The first statement is true, because $baaba\sim babaa$, as we've seen. However, the second statement is actually false, and in fact $baa\not\sim bbb$. The compiler accepts the first call, and diagnoses an error on the second call to \texttt{sameType()}, as we would expect. -We can write a similar program to encode every other finitely-presented monoid we've seen so far, and the Swift compiler will happily accept them, and correctly decide if arbitrary terms are equivalent to each other. We now ask, does this always work, or are there some protocol declarations constructed this way that we cannot accept? +In fact, if we write a similar program for every example of a finitely-presented monoid we've seen so far, the Swift compiler will be able to correctly decide for us if any two arbitrary terms are equivalent. A natural question is, does this always work, or are there some protocol declarations constructed this way that we cannot accept? 
-This is the well-known \IndexDefinition{word problem}\emph{word problem}: +We're asking the Swift compiler to solve the well-known \IndexDefinition{word problem}\emph{word problem}: \begin{itemize} \item -\textbf{Input:} A monoid presentation $\AR$, and two terms $x$, $y\in A^*$. +\textbf{Input:} Monoid presentation $\AR$, two terms $x$, $y\in A^*$. \textbf{Result:} True or false: $x\sim_R y$? \end{itemize} @@ -1094,7 +1094,7 @@ \section{The Word Problem}\label{word problem} Thue was able to solve the word problem in certain restricted instances. For example, if the rewrite rules preserve \index{length-preserving rewrite rule}\index{term length}term length, then every equivalence class must be finite, and $x\sim y$ can be decided by exhaustive enumeration of $\EquivClass{y}$. What eluded Thue though, was a general approach that worked in all cases. -The next twist in this saga came after the development of computability theory. We already sketched out \index{Turing machine}Turing machines and the \index{halting problem}halting problem in \SecRef{tag systems}, where we saw that the question of \emph{termination checking} is undecidable in our type substitution algebra. Now, we will see another such undecidable problem. In a 1947~paper~\cite{post_1947}, \index{Emil Post}Emil~Post established that no computable algorithm can solve the \index{undecidable problem!word problem}word problem. Post did this by defining an encoding of a Turing machine as a finitely-presented monoid. At a very high level, this encoding works as follows: +The next twist in this saga came after the development of computability theory. We already sketched out \index{Turing machine}Turing machines and the \index{halting problem}halting problem in \SecRef{halting problem}, where we saw that the question of \emph{termination checking} is undecidable in our type substitution algebra. Now, we will see another such undecidable problem. 
In a 1947~paper~\cite{post_1947}, \index{Emil Post}Emil~Post gave a proof that no \index{effective procedure}effective procedure can solve the \index{undecidable problem!word problem}word problem. Post did this by defining an encoding of a Turing machine as a Thue system. At a very high level, this encoding works as follows: \begin{enumerate} \item At any point in time, the complete state of the Turing machine, consisting of the current state symbol and the contents of the tape, can be described by a term in the finitely-presented monoid. \item The single-step transitions, where the machine replaces the symbol at the current head position and moves left or right, define the monoid's rewrite rules. @@ -1108,8 +1108,6 @@ \section{The Word Problem}\label{word problem} A much simpler monoid with an undecidable word problem appeared in a 1958~paper by \index{Grigori Tseitin}G.~S.~Tseitin \cite{undecidablesemigroup}. For an English translation with commentary, see~\cite{nybergbrodda2024g}. -\newcommand{\Ts}{\mathfrak{C}_1} - \begin{theorem}\label{undecidablemonoid} This finitely-presented monoid has an undecidable word problem: \begin{align*} @@ -1119,7 +1117,7 @@ \section{The Word Problem}\label{word problem} \end{align*} \end{theorem} \begin{corollary} -No computable algorithm can decide if two arbitrary type parameters are equivalent in the protocol generic signature $G_\texttt{C1}$: +There does not exist an \index{effective procedure}effective procedure to decide if two arbitrary type parameters are equivalent in the protocol generic signature $G_\texttt{C1}$: \begin{Verbatim} protocol C1 { associatedtype A: C1 @@ -1179,9 +1177,9 @@ \section{The Word Problem}\label{word problem} Our example does not exhibit undecidability, because the word problem in~$D_{12}$ is very easy to solve. There are only 12 elements, so we can just write down the \index{Cayley table}Cayley table. 
However, as with monoids, the word problem for groups is undecidable in the general case \cite{undecidablegroup} (for a more accessible introduction, see \cite{undecidablegroup2}, or Chapter~12 of \cite{rotman}). Because~$\Ts$ can encode the word problem in \emph{all} groups, we certainly cannot hope to write down an algorithm to solve the word problem in~$\Ts$. -\paragraph{The big picture.} We've shown that every finitely-presented monoid can be encoded as a specific kind of Swift protocol, almost as if Swift's protocols are a generalization of the finitely-presented monoids! This doesn't immediately help us implement Swift generics, though. However, \ChapRef{symbols terms rules} shows that by taking a suitable alphabet and set of rewrite rules, we can go in the other direction and encode any Swift generic signature as a finitely-presented monoid. While the undecidability of the word problem prevents us from accepting \emph{all} generic signatures, the next section will describe a large class of finitely-presented monoids where the word problem is easy to solve. All reasonable generic signatures fit in with our model. +\paragraph{The big picture.} We've shown that every finitely-presented monoid can be encoded as a specific kind of Swift protocol, almost as if Swift's protocols are a generalization of the finitely-presented monoids! This doesn't immediately help us implement Swift generics, though. However, \ChapRef{chap:symbols terms rules} shows that by taking a suitable alphabet and set of rewrite rules, we can go in the other direction and encode any Swift generic signature as a finitely-presented monoid. While the undecidability of the word problem prevents us from accepting \emph{all} generic signatures, the next section will describe a large class of finitely-presented monoids where the word problem is easy to solve. All reasonable generic signatures fit in with our model. 
-\paragraph{Closing remarks.} In \SecRef{tag systems}, we showed that the \index{type substitution}type substitution algebra can encode arbitrary computation with recursive conformance requirements~\cite{se0157}. Now we see that using recursive conformance requirements and protocol \texttt{where} clauses~\cite{se0142}, we can encode an undecidable problem in the derived requirements formalism as well. +\paragraph{Closing remarks.} In \SecRef{halting problem}, we showed that the \index{type substitution}type substitution algebra can encode arbitrary computation with recursive conformance requirements~\cite{se0157}. Now we see that using recursive conformance requirements and protocol \texttt{where} clauses~\cite{se0142}, we can encode an undecidable problem in the derived requirements formalism as well. Tseitin's monoid is not a special monoid, so it cannot encode its own word problem. D.~J.~Collins~\cite{universalsemigroup} later discovered a ``more'' universal word problem interpreter having only a few more rules; this can encode word problems in $\Ts$ or even itself: \begin{align*} @@ -1387,7 +1385,7 @@ \section{The Normal Form Algorithm}\label{rewritesystemintro} \end{center} In fact, the type parameter order of \AlgRef{type parameter order} is basically a specific instance of the shortlex order, except we described it with a recursive algorithm, whereas \AlgRef{shortlex} is iterative. -\paragraph{Completion.} The process of repairing confluence violations by adding rules is called \IndexDefinition{completion}\emph{completion}. If completion succeeds, we get a convergent rewriting system. \ChapRef{completion} will explain the Knuth-Bendix algorithm that is used for this purpose. For example, the Swift compiler accepts a protocol \texttt{M} encoding the monoid presentation from \ExRef{non confluent example}; completion allows us to establish that $\GM\vdash\SameReq{Self.A.C}{Self.A}$. 
+\paragraph{Completion.} The process of repairing confluence violations by adding rules is called \IndexDefinition{completion}\emph{completion}. If completion succeeds, we get a convergent rewriting system. \ChapRef{chap:completion} will explain the Knuth-Bendix algorithm that is used for this purpose. For example, the Swift compiler accepts a protocol \texttt{M} encoding the monoid presentation from \ExRef{non confluent example}; completion allows us to establish that $\GM\vdash\SameReq{Self.A.C}{Self.A}$. Completion is only a \emph{semi-decision procedure} that may fail to terminate; in practice, we impose an iteration limit. This cannot be improved upon, because ultimately, it is \index{undecidable problem}undecidable if a monoid can be presented by a convergent rewriting system \cite{ODUNLAING1983339}. A monoid with an undecidable word problem cannot have such a presentation, so completion must fail in that case. Is the converse true? That is, if a finitely-presented monoid is known to have a decidable word problem, will completion always terminate in a finite number of steps and output a convergent rewriting system? The answer is also no: \begin{enumerate} @@ -1401,11 +1399,9 @@ \section{The Normal Form Algorithm}\label{rewritesystemintro} \begin{theorem}\label{squier s1} The following finitely-presented monoid has a word problem decidable via a ``bespoke'' algorithm, but no convergent presentation over any generating set: \[S_1:=\Pres{a,b,t,x,y}{ab\sim \varepsilon,\,xa\sim atx,\,xt\sim tx,\,xb\sim bx,\,xy\sim \varepsilon}\] \end{theorem} -To prove this theorem, Squier introduced a new combinatorial property of the \index{rewrite graph}rewrite graph (the term \emph{derivation graph} is used in the paper), called \emph{finite derivation type}. 
It is shown that the rewrite graph of a convergent rewriting system has finite derivation type, and that finite derivation type does not depend on the choice of presentation, so it is an invariant of the monoid itself, and not just a particular presentation.
-
-For example, $\Pres{a,b}{aba\sim bab}$ from above has finite derivation type, because a \emph{different} presentation of the same monoid happens to be convergent.
+To prove this theorem, Squier introduced a new combinatorial property of the \index{rewrite graph}rewrite graph we met in \DefRef{rewrite graph def} (the term \emph{derivation graph} is used in the paper), called \emph{finite derivation type}. It can be shown that the property of having finite derivation type does not depend on the choice of presentation, so it is an invariant of the monoid itself. Furthermore, a convergent presentation necessarily has finite derivation type. (Having finite derivation type is a necessary, but not a sufficient, condition for the existence of a convergent presentation.) For example, $\Pres{a,b}{aba\sim bab}$ from above can be presented as a convergent rewriting system over a different alphabet, so this monoid necessarily has finite derivation type.

-On the other hand, if it can be shown than some monoid does not have finite derivation type, we can conclude that no convergent presentation exists for this monoid, even if its word problem is decidable by other means. Squier was able to show that while $S_1$ does not have finite derivation type, it does have a decidable word problem, and an explicit decision procedure is given.
+On the other hand, if it can be shown that a monoid does not have finite derivation type, we can conclude that no convergent presentation can exist over any generating set, even if its word problem is decidable by other means. Squier was able to show that $S_1$ does not have finite derivation type, but it does have a decidable word problem.
So, each of the below classes is a \index{proper subset}proper subset of the next, and we shall be content to play in the shallow end of the pool from here on out: \begin{center} @@ -1417,11 +1413,54 @@ \section{The Normal Form Algorithm}\label{rewritesystemintro} all finitely-presented monoids \end{tabular} \end{center} -More conditions satisfied by monoids presented by convergent rewriting systems were explored in \cite{fptype}, \cite{fdtfp3}, and \cite{mild}. For a survey of results about convergent rewriting systems, see \cite{Otto1997}. Many open questions remain. Notice how many of the example monoids we saw only had a single rewrite rule. Even with such ``one-relation'' monoids, the situation is far from trivial. For example, it is not known if every one-relation monoid can be presented by a convergent rewriting system, or more generally, if the word problem for such monoids is decidable or not. However, one-relation monoids are known to have finite derivation type \cite{KOBAYASHI2000547}. For a survey of results about one-relation monoids and a good overview of the word problem in general, see \cite{onerelation}. Apart from string rewriting, another approach to solving the word problem is to use finite state automata to describe the structure of a monoid or group~\cite{epstein1992word}. Other interesting undecidable problems in abstract algebra appear in~\cite{tarski1953undecidable}. +More conditions satisfied by monoids presented by convergent rewriting systems were explored in \cite{fptype,fdtfp3,mild}. For a survey of results about convergent rewriting systems, see \cite{Otto1997}. Many open questions remain. Notice how many of the example monoids we saw only had a single rewrite rule. Even with such ``one-relation'' monoids, the situation is far from trivial. For example, it is not known if every one-relation monoid can be presented by a convergent rewriting system, or more generally, if the word problem for such monoids is decidable or not. 
However, one-relation monoids are known to have finite derivation type \cite{KOBAYASHI2000547}. For a survey of results about one-relation monoids and a good overview of the word problem in general, see \cite{onerelation}. Apart from string rewriting, another approach to solving the word problem is to use finite state automata to describe the structure of a monoid or group~\cite{epstein1992word}. Other interesting undecidable problems in abstract algebra appear in~\cite{tarski1953undecidable}.
+
+\ifWIP
+
+%%%
+% This will replace some of the above content:
+%%%
+
+The first examples of finitely-presented monoids with a decidable word problem but no finite complete presentation are due to Craig C.~Squier~\cite{fptype}, who showed that a monoid given by a finite complete presentation must satisfy the invariant of~$\FP_3$, while~$S_k$, with presentation below, has a decidable word problem for all $k \geq 0$, but not $\FP_3$ when $k \geq 2$:
+\begin{alignat*}{2}
+S_k := \Pres{a,b,t,x_1,\ldots,x_k,y_1,\ldots,y_k}{
+&ab = 1, \\
+&x_1 a = a t x_1, && \quad \ldots,\quad x_k a = a t x_k, \\
+&x_1 t = t x_1, && \quad \ldots,\quad x_k t = t x_k, \\
+&x_1 b = b x_1, && \quad \ldots,\quad x_k b = b x_k, \\
+&x_1 y_1 = 1, && \quad \ldots,\quad x_k y_k = 1}
+\end{alignat*}
+
+To settle the $k=1$ case, Squier then introduced the invariant of \emph{finite derivation type}, or FDT, in a subsequent paper~\cite{SQUIER1994271}. The key result is that a monoid given by a finite complete presentation has FDT, while $S_1$, with 5~generators and 5~relations, does not have FDT, and thus, no finite complete presentation:
+\[S_1:=\Pres{a,b,t,x,y}{ab= 1,xa= atx,xt= tx,xb= bx,xy= 1}\]
+
+We review the definition of FDT in \SecRef{sec:fdt}. We do not define $\FP_3$; it suffices to note that FDT implies $\FP_3$, so in fact $S_k$ is not FDT for all $k \geq 1$ \cite{fdtfp3}.
+
+Finite derivation type is not a \emph{sufficient} condition for a monoid to admit a finite complete presentation.
Katsura and Kobayashi discovered this monoid with decidable word problem and FDT, but no finite complete presentation~\cite{solid}:
+\begin{align*}
+\Pres{a,b_1,c_1,d_1,b_2,c_2,d_2,b_3,c_3,d_3}{&b_1 a = ab_1,
+b_2 a = ab_2,
+b_3 a = ab_3,\\
+&c_2 b_2 = c_1 b_1,\,c_3 b_3 = c_1 b_1,\\
+&b_2 d_2 = b_1 d_1,\,b_3 d_3 = b_1 d_1}
+\end{align*}
+
+Can we have fewer defining relations than Squier's $S_1$? It seems the previous record was \emph{three} relations. Lafont and Prout{\'e} exhibit this non-$\FP_3$ monoid~\cite{Lafont1991ChurchRooserPA}:
+\[
+\Pres{a,b,c,d,d'}{ab = a, da = ac, d'a = ac}
+\]
+Cain et al.\ show that this monoid does not have FDT~\cite{CAIN201768}:
+\[
+\Pres{a,b,c}{ac = ca, bc = cb, cab = cbb}
+\]
+
+Every one-relation monoid $\Pres{A}{u=v}$ has FDT \cite{KOBAYASHI2000547}. It remains an open question if every one-relation monoid has a finite complete presentation or a decidable word problem~\cite{onerelation}.
+
+\fi

-\paragraph{Type substitution.} In fact, we could have formalized the \index{type substitution}type substitution algebra as a term rewriting system. This would be a rewriting system over tree-structured terms, because the types, substitution maps and conformances in this algebra all recursively nest within one another.
+\paragraph{Type substitution.} In fact, we could have formalized the \index{type substitution}type substitution algebra as a \index{term rewriting}term rewriting system. This would be a rewriting system over tree-structured terms, because the types, substitution maps and conformances in this algebra all recursively nest within one another.
This reduction relation is not \index{terminating reduction relation}terminating, though, because as \SecRef{tag systems} demonstrates, it can encode arbitrary computation. +The ``evaluation rules'' for the $\otimes$ operator would instead define a reduction relation, and the \index{associative operation}associativity of $\otimes$ would be equivalent to proving that the reduction relation is \index{confluence}confluent. (The intrepid reader can quickly review \AppendixRef{notation summary} to see why this might be the case.) This reduction relation is not \index{terminating reduction relation}terminating, though, because as \SecRef{halting problem} demonstrates, it can encode arbitrary computation. When viewed as a term rewriting system, type substitution is a lot like the \index{lambda calculus}lambda calculus---confluent but non-terminating. While this is an interesting observation, we will not pursue this direction further. diff --git a/docs/Generics/chapters/opaque-result-types.tex b/docs/Generics/chapters/opaque-result-types.tex new file mode 100644 index 0000000000000..54c3b8c2e02a9 --- /dev/null +++ b/docs/Generics/chapters/opaque-result-types.tex @@ -0,0 +1,1106 @@ +\documentclass[../generics]{subfiles} + +\begin{document} + +\chapter{Opaque Result Types}\label{chap:opaque result types} + +\lettrine{O}{paque result types} introduce a new form of abstraction that cannot be expressed with the other language features we've described so far. Despite this conceptual leap, we will see that opaque result types are assembled from existing building blocks: generic signatures (\ChapRef{chap:generic signatures}), substitution maps (\ChapRef{chap:substitution maps}), and abstract conformances (\SecRef{abstract conformances}). We will begin with an intuitive overview, look at the concrete syntax, and finally, detail the semantics. 
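As a first taste of the feature, consider the following minimal pair of declarations (a hypothetical example of ours, not drawn from the standard library):
\begin{Verbatim}
// An ordinary generic parameter: the caller chooses T.
func identity<T>(_ x: T) -> T { return x }

// An opaque result type: the callee chooses the concrete type.
// Callers only learn that the result conforms to Equatable;
// the underlying type Int remains hidden.
func makeToken() -> some Equatable { return 42 }
\end{Verbatim}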
+
+The fundamental idea behind opaque result types is that they ``reverse'' the usual relationship between the caller and callee of a generic declaration. While an ordinary ``input'' generic parameter abstracts over a generic argument provided by the caller, an opaque result type is like an ``output'' generic parameter in that it abstracts over a fixed concrete type that is only known to the \emph{callee}. We refer to this concrete type as the \IndexDefinition{underlying type!opaque result type}\emph{underlying type} of the opaque result type.
+
+Like ordinary generic parameters, opaque result types can be subject to requirements. When type checking the body of a declaration with an opaque result type, we infer its underlying type from the statements in the body, and we ensure this underlying type satisfies the requirements imposed upon it. If we then encounter a call to a declaration with an opaque result type, we say that the result type of the call is an \emph{opaque archetype} subject to these requirements; the caller does not know the underlying type.
+
+\paragraph{Opaque result declarations.}
+Recall how in parameter position, the \texttt{some} keyword denotes an \index{opaque parameter}opaque \emph{parameter} declaration, which was simply syntax sugar for an unnamed generic parameter together with a single conformance requirement (\SecRef{generic params}).
Now, when \texttt{some} appears in the \emph{return type} of a \index{function declaration!opaque result type}function or \index{subscript declaration!opaque result type}subscript declaration, or as the type of a \index{variable declaration!opaque result type}variable declaration, we get an \IndexDefinition{opaque result declaration}opaque \emph{result} declaration instead: +\begin{Verbatim} +func foo() -> some Sequence {...} +var bar: some Sequence {...} +struct S { + subscript(_: Int) -> some Sequence {...} +} +\end{Verbatim} +Opaque result declarations come into existence when we evaluate the \IndexDefinition{opaque result type request}\Request{opaque result type request}. This request receives a value declaration as input. In the evaluation function, we check if the declaration's return type contains any occurrences of \texttt{some}. If not, we're done; otherwise, we construct an opaque result declaration. + +\paragraph{Opaque result generic signatures.} +An opaque result declaration points back to its \IndexDefinition{owner declaration}\emph{owner declaration}, which is the \index{value declaration}value declaration that declares this opaque result type. There is a one-to-one correspondence between opaque result declarations, and their owner declarations. Since the \texttt{some} keyword may appear any number of times in the owner declaration's return type, an opaque result declaration will in general declare one or more \IndexDefinition{opaque result type}\emph{opaque result types}. + +We describe these opaque result types using the \index{generic signature!opaque result type}generic signature of the opaque result declaration, which we build within the \Request{opaque result type request}. We will call this generic signature the \IndexDefinition{opaque result generic signature}\emph{opaque result generic signature}, while the \IndexDefinition{outer generic signature}\emph{outer generic signature} is what we call the generic signature of the owner declaration. 
+ +The first step is to collect all occurrences of \texttt{some}. We're working with syntactic \index{type representation!opaque result type}type representations here (\ChapRef{chap:type resolution}), so we walk the type representation of the owner declaration's return type, and invoke \index{type resolution!opaque result type}type resolution to resolve each constraint type that follows each \texttt{some}. Once we have this list of constraint types, we kick off the \index{abstract generic signature request!opaque result type}\Request{abstract generic signature request} (\ChapRef{chap:building generic signatures}) with the following parameters: +\begin{enumerate} +\item We pass the outer generic signature as the parent signature. +\item We add a \index{generic parameter type!opaque result type}generic parameter for each occurrence of the \texttt{some} keyword in the owner declaration's return type. + +If the owner declaration is not actually generic, all new generic parameters have \index{depth!opaque result type}depth 0; otherwise, we set their depth to be one more than the maximum depth of the outer generic signature's parameters. The \index{index!opaque result type}index of each new generic parameter is determined by the order in which the \texttt{some} keyword appears. + +\item For each new generic parameter, we also add a requirement with this generic parameter on the left-hand side, and the constraint type on the right-hand side. + +When the constraint type is a \index{protocol type!opaque result type}protocol type, a \index{protocol composition type!opaque result type}protocol composition type, or a \index{parameterized protocol type!opaque result type}parameterized protocol type, this will be a \index{conformance requirement!opaque result type}conformance requirement. 
When we have a protocol composition or parameterized protocol type, the request will \index{requirement decomposition}decompose the requirement into simpler requirements automatically (\SecRef{requirement desugaring}).
+
+Otherwise, the constraint type must be a \index{class type!opaque result type}class type, and we add a \index{superclass requirement!opaque result type}superclass requirement instead.
+
+\end{enumerate}
+
+The opaque result generic signature does not depend on the underlying type of its opaque result types, so we can compute it without looking at the function body. This is important, because we omit function bodies in \index{textual interface!opaque result type}textual interfaces, except for \index{inlinable function!opaque result type}\verb|@inlinable| functions (\SecRef{module system}). Similarly, the parser does not parse bodies of declarations in the \index{secondary file}secondary files of a frontend job. The only time we attempt to compute the underlying type is when the owner declaration appears in a \index{primary file}primary source file for this frontend job. For this reason, we can omit function bodies in the examples on the following page.
+
+\begin{example}
+Below is a list of the opaque result generic signatures that we obtain in a few simple instances. We will revisit some of these examples later.
+\begin{enumerate} +\item Here we get a completely unconstrained opaque result type, because \Index{Any@\texttt{Any}}\texttt{Any} is the empty protocol composition; a conformance requirement to \texttt{Any} is a no-op: +\begin{Verbatim} +func fullyOpaque() -> some Any {...} +\end{Verbatim} +\textbf{Opaque result generic signature:} \verb|<τ_0_0>| + +\item The following opaque result declaration has two unconstrained opaque result types: +\begin{Verbatim} +func fullyOpaquePair() -> (some Any, some Any) {...} +\end{Verbatim} +\textbf{Opaque result generic signature:} \verb|<τ_0_0, τ_0_1>| + +\item If the constraint type is a single \index{protocol type!opaque result type}protocol, we get an opaque result generic signature with a conformance requirement to this protocol: +\begin{Verbatim} +func someEquatable(_ b: Bool) -> some Equatable {...} +\end{Verbatim} +\textbf{Opaque result generic signature:}\\ +\verb|<τ_0_0 where τ_0_0: Equatable>| + +\item If the owner declaration is generic, we incorporate its generic signature into the opaque result generic signature. 
We will see why in the next section:
+\begin{Verbatim}
+func someEquatable2<T>(_: T) -> some Equatable {...}
+\end{Verbatim}
+\textbf{Opaque result generic signature:}\\
+\verb|<τ_0_0, τ_1_0 where τ_1_0: Equatable>|
+
+\item A \index{parameterized protocol type!opaque result type}parameterized protocol type decomposes into a conformance requirement and one or more \index{same-type requirement!opaque result type}same-type requirements:
+\begin{Verbatim}
+func someSequenceOfInt() -> some Sequence<Int> {...}
+\end{Verbatim}
+\textbf{Opaque result generic signature:}\\ \verb|<τ_0_0 where τ_0_0: Sequence, τ_0_0.Element == Int>|
+
+\item Finally, a parameterized protocol type can introduce a same-type requirement between an opaque result type, and a type parameter of the owner declaration:
+\begin{Verbatim}
+func someSequenceOfT<T>(_: T) -> some Sequence<T> {...}
+\end{Verbatim}
+\textbf{Opaque result generic signature:}\\ \verb|<τ_0_0, τ_1_0 where τ_0_0 == τ_1_0.Element, τ_1_0: Sequence>|
+\end{enumerate}
+\end{example}
+
+\paragraph{Inferring the underlying type.}
+When type checking a function body in a primary file, we infer the \index{underlying type!opaque result type}underlying types of its opaque result types after we assign types to \index{expression!opaque result type}expressions. We collect all \index{return statement!opaque result type}\texttt{return} \index{statement}statements that appear in the function body, and consider the type of each returned expression.
+
+\begin{example}
+The \index{horse}\texttt{hungryHorses} computed property demonstrates a typical use case for opaque result types, which is to ``hide'' a non-trivial generic return type:
+\begin{Verbatim}
+struct Horse {
+  var isHungry: Bool
+}
+
+struct Farm {
+  var horses: [Horse] = []
+  var hungryHorses: some Collection<Horse> {
+    return horses.lazy.filter(\.isHungry)
+  }
+}
+\end{Verbatim}
+The \texttt{return} statement's type is \texttt{LazyFilterSequence<[Horse]>}, but the caller only sees that it is a \texttt{Collection} with an \texttt{Element} type of \texttt{Horse}.
+\end{example}
+
+The fundamental invariant we must maintain is that the underlying type of an opaque result type cannot change while a program is running. In general, if the body of a function with an opaque result type contains multiple \texttt{return} statements, they must all return the same exact type. (This is true of an ordinary function as well.)
+\begin{example}
+The following is thus not allowed; to model this kind of dynamism, the function could return an existential \verb|any Sequence| instead (\ChapRef{chap:existential types}):
+\begin{Verbatim}
+func twoSequences(_ b: Bool) -> some Sequence {
+  if b {
+    return [1, 2, 3] // Array<Int>
+  } else {
+    return ["a", "ab", "ba"] // Array<String>
+  }
+}
+\end{Verbatim}
+\end{example}
+There is one exception to this rule. We allow the underlying type of an opaque result type to depend on \index{availability}\emph{availability}. We will not discuss availability checking in this book, except to say that the outcome of an availability check, which is written with the special ``\verb|if #available(...)|'' syntax, does not change during program execution. Thus, we allow a function with an opaque result type to contain \texttt{return} statements with mismatched types, as long as they're in different branches of an availability check.
+\begin{example} +On a \index{macOS}macOS host, the \texttt{bestWidget()} function will return one of two underlying types, depending on the operating system version: +\begin{Verbatim} +protocol Widget {} +struct OldWidget: Widget {} + +@available(macOS 11, *) +struct NewWidget: Widget {} + +func bestWidget() -> some Widget { + if #available(macOS 11, *) { + return NewWidget() + } else { + return OldWidget() + } +} +\end{Verbatim} +\end{example} + +We record the underlying types of an opaque result declaration in a series of \IndexDefinition{underlying type substitution map}\emph{underlying type substitution maps}, where each substitution map corresponds to a disjoint \index{opaque result type!with availability}\emph{availability range}. Henceforth, we will assume that each opaque result declaration only has one availability range, and thus one underlying type substitution map. + +The underlying type substitution map's \index{input generic signature!opaque result type}input generic signature is the opaque result generic signature, while its output generic signature is, by construction, the outer generic signature. This substitution map sends the generic parameters of the outer generic signature to themselves, and the generic parameters that represent opaque result types to their corresponding underlying types. We check that this substitution map \index{satisfied requirement!opaque result type}satisfies the requirements of the opaque result generic signature using \AlgRef{check generic arguments algorithm}. 
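For instance, the \texttt{bestWidget()} function above ends up with two underlying type substitution maps, one per availability range (sketched here with informal labels for the two ranges):
\begin{gather*}
\text{macOS 11 or later:}\quad \SubstMapC{\SubstType{\rT}{NewWidget}}{\SubstConf{\rT}{NewWidget}{Widget}}\\
\text{otherwise:}\quad \SubstMapC{\SubstType{\rT}{OldWidget}}{\SubstConf{\rT}{OldWidget}{Widget}}
\end{gather*}
Each map satisfies the conformance requirement $\ConfReq{\rT}{Widget}$ of the opaque result generic signature \texttt{<\rT\ where \rT:\ Widget>}.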
+
+\begin{example}
+We reject the following program:
+\begin{Verbatim}
+func invalidUnderlyingType() -> some Sequence<Int> {
+  return ["ab", "ba", "aab"] // error
+}
+\end{Verbatim}
+The opaque result generic signature is
+\begin{quote}
+\texttt{<\rT\ where \rT:\ Sequence, \rT.Element == Int>}
+\end{quote}
+We form the following underlying type substitution map for this generic signature:
+\begin{align*}
+\SubstMapC{&\SubstType{\rT}{Array<String>}
+}{\\
+&\SubstConf{\rT}{Array<String>}{Sequence}
+}
+\end{align*}
+Applying this substitution map to the \index{same-type requirement!opaque result type}same-type requirement $\SameReq{\rT.Element}{Int}$ produces $\SameReq{String}{Int}$, which is unsatisfied, so we \index{diagnostic!opaque result type}diagnose an error.
+\end{example}
+
+\paragraph{History.}
+Opaque result types were first introduced in \IndexSwift{5.1}Swift 5.1 \cite{se0244}, so they actually predate opaque parameter declarations~\cite{se0341}. The initial implementation only allowed \texttt{some} to appear once at the outermost level of a type representation. \IndexSwift{5.7}Swift~5.7 generalized this to allow \texttt{some} to appear one or more times, nested in arbitrary types \cite{se0328}. Swift~5.7 also introduced opaque result types that depend on availability~\cite{se0360}.
+
+\section{Opaque Archetypes}\label{opaquearchetype}
+
+Once we have the opaque result generic signature, we can proceed to \index{type resolution!opaque archetype}type resolution, where we build the \index{interface type!opaque result type}interface type of the \index{owner declaration}owner declaration. To get the interface type for an occurrence of \texttt{some} in return position, we take the corresponding generic parameter in the opaque result generic signature, and form the \IndexDefinition{opaque archetype}\emph{opaque archetype} representing this generic parameter. This opaque archetype then appears in the interface type of the owner declaration.
+ +We met \index{archetype type!opaque archetype}archetypes and the \index{generic environment!opaque result type}generic environments that spawn them in \ChapRef{chap:archetypes}. Our focus there was on \emph{primary archetypes}, and we recall some facts about primary archetypes first. Formally, a primary archetype is a pair $(G, \tT)$ for some generic signature~$G$ and (reduced) type parameter~\tT. We denoted a primary archetype by~$\archetype{T}_G$, or just~$\archetype{T}$ when $G$ is clear from context. Primary archetypes only appear in the types of expressions within function bodies, and subsequently in SIL instructions. + +An opaque archetype similarly packages up a type parameter with a generic signature: + +\begin{definition} +If $d$~is an opaque result declaration and \tT~is a \index{reduced type parameter!opaque archetype}reduced \index{type parameter!opaque archetype}type parameter for the opaque result generic signature of~$d$, we will denote the opaque archetype for~\tT\ by~$\Ot_d$, or~$\Ot$ when $d$~is implied from context. (The symbol \index{$\Ot$} +\index{$\Ot$!z@\igobble|seealso{opaque archetype}} +``$\circlearrowright$'' is, of course, an amalgamation of ``o'' for opaque and ``\texttt{->}'' for result type.) +\end{definition} + +\begin{example}\label{some equatable example} +Recall that when the owner declaration is not generic, the opaque result generic signature's generic parameters have \index{depth!opaque result type}depth 0. Consider this function: +\begin{Verbatim} +func someEquatable(_ b: Bool) -> some Equatable { + return b ? 1 : 2 +} +\end{Verbatim} +The opaque result generic signature is \texttt{<\rT\ where \rT:\ Equatable>} in the above, and we resolve this function's return type to the opaque archetype $\Ox$. 
The \index{interface type!opaque archetype}interface type of this function declaration is the following function type: +\begin{quote} +\texttt{(Bool) -> $\Ox$} +\end{quote} +\end{example} + +Note that opaque archetypes from distinct opaque result declarations are themselves distinct, even if their opaque result generic signatures are equal; in that case, they just represent unrelated types that happen to satisfy the same requirements. + +Opaque archetypes interact with the \index{type substitution!opaque archetype}type substitution algebra in two ways. First, an opaque archetype may appear in a substitution map as a replacement type. Second, we can apply a substitution map to an opaque archetype to get a new opaque archetype. We will now consider each possibility in turn. +\begin{example} +In the previous example, the underlying type of the opaque result type of \texttt{someEquatable()} is \texttt{Int}, but the caller cannot observe this fact. Instead, we get the opaque archetype $\Ox$, and only the operations provided by the \texttt{Equatable} protocol are available to us. We also know that every call returns the same type: +\begin{Verbatim} +print(someEquatable(false) == someEquatable(true)) // prints false +print(someEquatable(false) == someEquatable(false)) // prints true +\end{Verbatim} +Recall that both arguments to ``\texttt{==}'' must have the same type, and this type must conform to \texttt{Equatable}. The generic signature of ``\texttt{==}'' is \texttt{<\rT\ where \rT:\ Equatable>}, so we form this substitution map that contains the replacement type $\Ox$: +\begin{align*} +\SubstMapC{&\SubstType{\rT}{$\Ox$}}{\\ +&\SubstConf{\rT}{$\Ox$}{Equatable}} +\end{align*} +Our input generic signature states a conformance requirement, so we need to \index{global conformance lookup!opaque archetype}look up the conformance of $\Ox$ to \texttt{Equatable} when we build our substitution map. 
We extended global conformance lookup to archetypes in \SecRef{local requirements}. It behaves identically with primary and opaque archetypes. When given an opaque archetype whose opaque result generic signature states a conformance requirement, global conformance lookup will output an \index{abstract conformance!opaque archetype}abstract conformance whose subject type is this opaque archetype:
+\[
+\Proto{Equatable} \otimes \Ox = \ConfReq{$\Ox$}{Equatable}
+\]
+\end{example}
+
+We shorten this by saying we get an \IndexDefinition{opaque abstract conformance}\emph{opaque abstract conformance}. Next, we will demonstrate \index{type witness!opaque archetype}type witness projection from an opaque abstract conformance. To do that, we constrain our opaque result type to a protocol with associated types.
+
+\begin{example}\label{opaque type witness example}
+In the final line of the listing below, the call expression \texttt{someSequence()} has the type~$\Ox$. We will deduce the type of \texttt{pick(someSequence())}:
+\begin{Verbatim}
+func someSequence() -> some Sequence {
+  return [1, 2, 3]
+}
+
+func pick<S: Sequence>(_ s: S) -> S.Element {...}
+
+print(pick(someSequence()))
+\end{Verbatim}
+The substitution map for the call to \texttt{pick()} is:
+\begin{align*}
+\Sigma := \SubstMapC{&\SubstType{\rT}{$\Ox$}}{\\
+&\SubstConf{\rT}{$\Ox$}{Sequence}}
+\end{align*}
+We apply $\Sigma$ to \texttt{pick()}'s original return type, \texttt{\rT.Element}, to get the answer. If we factor this dependent member type and apply $\Sigma$ to its abstract conformance, we get the opaque abstract conformance from our substitution map:
+\begin{gather*}
+\texttt{\rT.Element} \otimes \Sigma \\
+\qquad {} = \AElement \otimes \ConfReq{\rT}{Sequence} \otimes \Sigma \\
+\qquad {} = \AElement \otimes \ConfReq{$\Ox$}{Sequence}
+\end{gather*}
+To get the final result, we project the type witness.
This outputs an opaque archetype for the same opaque result declaration, but representing our dependent member type:
+\begin{gather*}
+\AElement \otimes \ConfReq{$\Ox$}{Sequence} = \Opaque{\rT.Element}
+\end{gather*}
+Note that \texttt{someSequence()} has the following \index{underlying type!opaque result type}underlying type substitution map:
+\begin{align*}
+\SubstMapC{&\SubstType{\rT}{Array<Int>}}{\\
+&\SubstConf{\rT}{Array<Int>}{Sequence}}
+\end{align*}
+Therefore, at run time, a value of type $\Opaque{\rT.Element}$ is actually an \texttt{Int}.
+\end{example}
+
+\paragraph{Declared interface type.}
+Recall our discussion of type declarations from \ChapRef{chap:decls}. We didn't mention it at the time, but an opaque result declaration is actually a special kind of type declaration, so we must also give it a \index{declared interface type!opaque result type}declared interface type. We say that the declared interface type of an opaque result declaration is the \index{owner declaration}owner declaration's return type. Note that this type will \emph{contain} at least one opaque archetype, but it need not itself \emph{be} an opaque archetype. Thus, an opaque result declaration can be said to ``declare'' all opaque archetypes that appear in its declared interface type, which can otherwise contain arbitrary structure. In the implementation, the declared interface type of a type declaration must be set to \emph{something}, and \emph{this} choice turns out to be a convenient one.
+
+\paragraph{Type substitution.}
+Under \AlgRef{type subst algo}, a primary archetype~$\archetype{T}$ behaves like the type parameter \tT\ that it represents, in the sense that if we have a substitution map~$\Sigma$, then $\archetype{T} \otimes \Sigma = \tT \otimes \Sigma$. Opaque archetypes are rather different: applying a substitution map to an opaque archetype produces a new opaque archetype.
+ +In full generality, an opaque archetype is really a \emph{triple} $(\tT, d, \Sigma)$ where \tT\ is a reduced type parameter in the opaque result generic signature of~$d$, and $\Sigma$ is a substitution map \index{input generic signature!opaque archetype}for the \index{outer generic signature}outer generic signature of~$d$. (If the owner declaration of~$d$ is not generic, $\Sigma$ is always the \index{empty substitution map!opaque archetype}empty substitution map, and all of the below trivializes.) + +\begin{definition} +We will continue to use the notation $\Ot_d$ to denote an opaque archetype for the identity substitution map of its \index{outer generic signature}outer generic signature. More generally, $\Ot_d \otimes\Sigma$ will denote a \IndexDefinition{substituted opaque archetype}\emph{substituted opaque archetype} for a \index{substitution map!opaque archetype}substitution map~$\Sigma$. +\end{definition} + +Type resolution always resolves each occurrence of \texttt{some} to an opaque archetype for the \index{identity substitution map!opaque archetype}identity substitution map of its outer generic signature. Of course, as the notation suggests, if we apply a substitution map $\Sigma$ to $\Ot$, we get $\Ot \otimes \Sigma$. We remark that this behavior is analogous to the \index{context substitution map!opaque archetype}context substitution map of a generic nominal type. 
+
+\begin{example}\label{generic opaque example}
+Here is a generic function that states an opaque result type:
+\begin{Verbatim}
+func someEquatable2<T: Equatable>(_ t: T) -> some Equatable {
+  return [t]
+}
+\end{Verbatim}
+The opaque result generic signature includes the outer generic signature, and also adds a generic parameter with \index{depth!opaque result type}depth 1, for the opaque result type:
+\begin{quote}
+\begin{verbatim}
+<τ_0_0, τ_1_0 where τ_0_0: Equatable, τ_1_0: Equatable>
+\end{verbatim}
+\end{quote}
+The interface type of \texttt{someEquatable2()} is the following \index{generic function type}generic function type:
+\begin{quote}
+\texttt{<\rT\ where \rT:\ Equatable> (\rT) -> $\Oy$}
+\end{quote}
+Now, we consider type checking the following two statements:
+\begin{Verbatim}
+print(someEquatable2(1) == someEquatable2(2)) // prints false
+print(someEquatable2(1) == someEquatable2("hello")) // type error
+\end{Verbatim}
+In the first statement, both calls to \texttt{someEquatable2()} have the same substitution map:
+\begin{align*}
+\Sigma_1 := \SubstMapC{
+&\SubstType{\rT}{Int}
+}{\\
+&\SubstConf{\rT}{Int}{Equatable}
+}
+\end{align*}
+It follows that the first statement is well-typed, because we call ``\texttt{==}'' with two arguments of type $\Oy \otimes \Sigma_1$. Evaluating it would print \texttt{false}, because the two values are not actually equal. We don't get that far, however. In the second statement, the right-hand side is a call to \texttt{someEquatable2()} with a different substitution map:
+\begin{align*}
+\Sigma_2 := \SubstMapC{
+&\SubstType{\rT}{String}
+}{\\
+&\SubstConf{\rT}{String}{Equatable}
+}
+\end{align*}
+Thus, the type checker rejects the second statement, because the type of the right-hand side is $\Oy \otimes \Sigma_2$, which is distinct from the left-hand side $\Oy \otimes \Sigma_1$.
+
+We can justify this behavior if we note that the underlying type of an opaque result type similarly depends on its outer generic signature.
Consider the \index{underlying type substitution map}underlying type substitution map of \texttt{someEquatable2()}: +\begin{align*} +\SubstMapC{ +&\SubstType{\rT}{\rT},\\ +&\SubstType{\rO}{Array<\rT>} +}{\\ +&\SubstConf{\rT}{\rT}{Equatable}\\ +&\SubstConf{\rO}{Array<\rT>}{Equatable} +} +\end{align*} +The \index{output generic signature!opaque result type}output generic signature of this substitution map is the generic signature of the owner declaration, \texttt{someEquatable2()}. We will elaborate this in \SecRef{opaque result runtime}. +\end{example} + +We saw that if we take an opaque archetype for the identity substitution map, and apply a substitution map $\Sigma$, we get a new opaque archetype for this substitution map~$\Sigma$. More \index{type substitution!opaque archetype}generally, if an opaque archetype carries an arbitrary substitution map, applying another \index{opaque archetype!type substitution}substitution map to this archetype will \index{substitution map composition}compose the two substitution maps. + +\begin{algorithm}[Apply substitution map to opaque archetype]\label{opaquearchetypesubst} +Receives an opaque archetype~$\Opaque{T}_d \otimes \Sigma$ and a substitution map $\Sigma^\prime$ as input. Outputs $(\Ot_d\otimes\Sigma)\otimes\Sigma^\prime$. +\begin{enumerate} +\item Decompose $\Ot_d \otimes \Sigma$ into a triple $(\tT, d, \Sigma)$ for some type parameter \tT, opaque result declaration $d$, and substitution map $\Sigma$. +\item Construct $\Sigma \otimes \Sigma^\prime$ (\DefRef{subst map composition}). +\item Construct an opaque archetype from the triple $(\tT, d, \Sigma\otimes\Sigma^\prime)$, and return it. (This opaque archetype is written as $\Ot_d \otimes (\Sigma\otimes\Sigma^\prime)$ in our notation, for reasons that are now hopefully clear.) 
+\end{enumerate} +\end{algorithm} +Notice how if we start with $\Ot_d = \Ot_d\otimes 1_G$ and apply a substitution map~$\Sigma$, the above algorithm reduces to directly constructing a new opaque archetype with this substitution map, because $1_G \otimes \Sigma = \Sigma$: +\[ +(\Ot_d\otimes 1_G) \otimes \Sigma = \Ot_d\otimes (1_G \otimes \Sigma)=\Ot_d\otimes \Sigma +\] + +\paragraph{Interface types.} +The fact that opaque archetypes can appear in the \index{interface type!opaque archetype}interface types of declarations is important. Certainly, when a declaration states an opaque result type, its interface type will contain its own opaque archetype. However, the interface type of a declaration may involve opaque archetypes declared elsewhere. The most obvious case is a variable declaration whose initial value expression calls a function with an opaque result type, for example: +\begin{Verbatim} +let result = someEquatable(false) +\end{Verbatim} +However, this is not always permitted, depending on the variable's \index{declaration context}declaration context. We allow this for \index{local variable!opaque archetype}local variables, as well as any global variables in the \index{main source file}main source file of a module (if it has one). \index{global variable!opaque archetype}Global variables in library source files, and \index{stored property!opaque archetype}stored properties of nominal type declarations, are prohibited from referencing opaque archetypes from other owner declarations, and we \index{diagnostic}diagnose an \index{error}error in this case. + +This distinction between these two kinds of variables is \index{limitation!opaque result type}artificial though, and we will see an even more general way to reference the opaque archetypes of another declaration in \SecRef{reference opaque archetype}. 
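+
+\begin{example}
+To make the restriction concrete, suppose that \texttt{someEquatable()} is declared in a library source file. In this hypothetical listing of our own, the local variable is accepted, but the stored property is not, because its inferred type contains the opaque archetype $\Ox$ of another owner declaration:
+\begin{Verbatim}
+func localIsFine() {
+  let x = someEquatable(false)  // okay: local variable
+  _ = x
+}
+
+struct NotAllowed {
+  let y = someEquatable(false)  // error: stored property
+}
+\end{Verbatim}
+\end{example}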
Thus, we extend our definition of an interface type to encompass types that contain certain opaque archetypes, without any restriction on the opaque archetype's owner declaration. + +\begin{definition} +We amend \DefRef{interface type def} as follows. Suppose that $\Ot_d \otimes \Sigma$ is an \index{opaque archetype!interface type}opaque archetype. If $G$ is the outer generic signature of~$d$, and $H$ is some other generic signature such that $\Sigma\in\SubMapObj{G}{H}$, then we say that: +\[ +\Ot_d \otimes \Sigma \in \TypeObj{H} +\] +Thus, an opaque archetype is an interface type for the \index{output generic signature!opaque archetype}output generic signature of its \index{substitution map!opaque archetype}substitution map. In particular, if $\Sigma$ is the identity substitution map~$1_G$, then: +\[ +\Ot \otimes 1_G = \Ot \in \TypeObj{G} +\] +That is, when we resolve an occurrence of \texttt{some} to an opaque archetype in type resolution, we get an interface type for its outer generic signature, as we expect. +\end{definition} + +\paragraph{Conformance substitution.} +We now ponder what it means to apply a \index{conformance substitution!opaque archetype}substitution map to an \index{opaque abstract conformance}opaque \index{abstract conformance!opaque archetype}abstract conformance. As originally defined, an abstract conformance $\TP$ has a type parameter \tT\ as its subject type, and we saw that applying a substitution map $\Sigma$ to $\TP$ reduces to a \index{local conformance lookup}local conformance lookup into $\Sigma$. 
We related this to \index{global conformance lookup}global conformance lookup on the type $\tT \otimes \Sigma$ using the following identity from \SecRef{abstract conformances}:
+\[
+\TP \otimes \Sigma = (\PP \otimes \tT) \otimes \Sigma = \PP \otimes (\tT \otimes \Sigma)
+\]
+In \SecRef{archetypesubst}, we encountered abstract conformances of primary archetypes, and from the above, we deduced they act like type parameters under conformance substitution, in the sense that $\ConfReq{$\archetype{T}$}{P} \otimes \Sigma = \TP \otimes \Sigma$. Now, suppose we apply $\Sigma$ to an opaque abstract conformance, say $\ConfReq{$\Ot$}{P}$. Again, we have $\ConfReq{$\Ot$}{P} = \PP \otimes \Ot$, so:
+\[
+\ConfReq{$\Ot$}{P} \otimes \Sigma = (\PP \otimes \Ot) \otimes \Sigma = \PP \otimes (\Ot \otimes \Sigma)
+\]
+However, this time, $\Ot \otimes \Sigma$ is another opaque archetype, and $\PP \otimes (\Ot \otimes \Sigma)$ reduces to another abstract conformance to~\tP, except now for the substituted archetype. We define conformance substitution with an opaque abstract conformance as follows:
+\[
+\ConfReq{$\Ot$}{P} \otimes \Sigma := \ConfReq{$(\Ot \otimes \Sigma)$}{P}
+\]
+More generally, the subject type of an opaque abstract conformance might have a non-identity substitution map, at which point the notation becomes slightly awkward:
+\[
+\ConfReq{$(\Ot \otimes \Sigma)$}{P} \otimes \Sigma^\prime := \ConfReq{$(\Ot \otimes (\Sigma \otimes \Sigma^\prime))$}{P}
+\]
+The key fact is that applying a substitution map to an opaque abstract conformance always outputs another opaque abstract conformance. This is unlike the situation with abstract conformances whose subject types are type parameters or primary archetypes.
+
+\paragraph{Interface types and contextual types.} Recall that a \index{contextual type!opaque archetype}contextual type is one that contains primary archetypes.
In \SecRef{archetypesubst}, we introduced the $\MapIn$ and $\MapOut$ operations for mapping between the interface types and contextual types of a generic signature~$G$, two sets which we denoted by $\TypeObj{G}$ and $\TypeObjCtx{G}$. + +We also characterized $\MapIn$ and $\MapOut$ in terms of type substitution, as the application of the \index{forwarding substitution map!opaque archetype}forwarding substitution map~$\FwdMap{G}$ or the \index{identity substitution map!opaque archetype}identity substitution map $1_G$ to a type, respectively. This gave us analogous mappings between $\SubMapObj{G}{H}$ and $\SubMapObjCtx{G}{H}$; we compose a substitution map on the right with $\FwdMap{H}$ or $1_H$ to transform its replacement types into contextual or interface types, as desired. + +Naturally, we extend ``$\mathsf{in}$'' and ``$\mathsf{out}$'' to opaque archetypes by composing the opaque archetype's substitution map with the forwarding or identity substitution map for the appropriate generic signature on the right. Let's work out the details. + +First, suppose that $\Ot_d$ is an opaque archetype for the identity substitution map~$1_G$, where $G$ is the outer generic signature of~$d$. When we map $\Ot_d$ into the primary generic environment of $G$, we get the following opaque archetype: +\[ +\MapIn(\Ot_d) = \Ot_d \otimes \FwdMap{G} \in \TypeObjCtx{G} +\] +In the general case, we have an opaque archetype $\Ot_d \otimes \Sigma$ for some $\Sigma \in \SubMapObj{G}{H}$. This opaque archetype is an element of $\TypeObj{H}$, and we can map it into the primary generic environment of $H$ to get a new opaque archetype whose substitution map is the corresponding element of $\SubMapObjCtx{G}{H}$: +\[ +\mathsf{in}_H(\Ot_d \otimes \Sigma) = \Ot_d \otimes (\Sigma \otimes \FwdMap{H}) \in \TypeObjCtx{H} +\] + +In the other direction, we can apply \index{map type out of environment!opaque archetype}$\MapOut$ to the contextual type $\Ot_d \otimes \FwdMap{G}$. 
Since $\FwdMap{G} \otimes 1_G = 1_G$, we recover the original opaque archetype $\Ot_d$:
+\[
+\MapOut(\Ot_d \otimes \FwdMap{G}) = \Ot_d \otimes (\FwdMap{G} \otimes 1_G) = \Ot_d \in \TypeObj{G}
+\]
+Finally, in the general case where $\Sigma \in \SubMapObjCtx{G}{H}$, we get:
+\[
+\mathsf{out}_H(\Ot_d \otimes \Sigma) = \Ot_d \otimes (\Sigma \otimes 1_H) \in \TypeObj{H}
+\]
+
+To summarize, an opaque archetype can either play the role of an \index{interface type!opaque archetype}interface type or a \index{contextual type!opaque archetype}contextual type, depending on whether its substitution map stores interface or contextual replacement types.
+
+\begin{example}
+A detail we left out of \ExRef{generic opaque example} is that the replacement types of substitution maps appearing in expressions are contextual types. We use an opaque archetype as a contextual type below:
+\begin{Verbatim}
+func someEquatable3<T: Equatable>(_ t: T) -> some Equatable {
+  return [someEquatable2(t)]
+}
+\end{Verbatim}
+Let $G$ denote the generic signature of \texttt{someEquatable3()}. Note that \texttt{someEquatable3()} calls \texttt{someEquatable2()} with the following substitution map, whose replacement type is a primary archetype of~$G$:
+\begin{align*}
+\SubstMapC{
+&\SubstType{\rT}{$\EquivClass{\rT}$}
+}{\\
+&\SubstConf{\rT}{$\EquivClass{\rT}$}{Equatable}
+}
+\end{align*}
+In fact, the above is just the forwarding substitution map for~$G$. If $d$ is the opaque result declaration of \texttt{someEquatable2()}, then the original return type of \texttt{someEquatable2()} is the opaque archetype $\Oy_d$, and the type of the call \texttt{someEquatable2(t)} is $\Oy_d \otimes \FwdMap{G}$. This call expression is nested inside an array literal, so the return expression has the type \texttt{Array<$\Oy_d \otimes \FwdMap{G}$>}.
+
+To build the \index{underlying type substitution map}underlying type substitution map, we map the return expression's type out of the primary environment of~$G$, to get the interface type \texttt{Array<$\Oy_d$>}. We then look up this type's conformance to \texttt{Equatable}, which gives us a conditional specialized conformance. We obtain this substitution map:
+\begin{align*}
+\SubstMapC{
+&\SubstType{\rT}{\rT},\\
+&\SubstType{\rO}{Array<$\Oy_d$>}
+}{\\
+&\SubstConf{\rT}{\rT}{Equatable}\\
+&\SubstConf{\rO}{Array<$\Oy_d$>}{Equatable}
+}
+\end{align*}
+We observe that the opaque result type of \texttt{someEquatable3()} is defined in terms of the opaque result type of \texttt{someEquatable2()}. This is a common occurrence in frameworks that make use of opaque result types.
+\end{example}
+
+\paragraph{Opaque generic environments.}
+Every opaque archetype is instantiated from an \IndexDefinition{opaque generic environment}\emph{opaque generic environment}, and every opaque generic environment is uniquely identified by a pair $(d, \Sigma)$, where $d$ is an opaque result declaration, and $\Sigma$ is a \index{substitution map!opaque generic environment}substitution map for the outer generic signature of~$d$. Recall the discussion of reduced types from~\SecRef{reduced types}. Within an opaque generic environment, we lazily populate a lookup table where each key is a \index{reduced type parameter!opaque archetype}reduced type parameter~\tT, storing the \index{opaque archetype}opaque archetype $\Ot_d \otimes \Sigma$. The time has come to take a closer look at this mapping.
+
+While every opaque archetype represents a reduced type parameter, it is not the case that every reduced type parameter in the opaque result generic signature is represented by an opaque archetype. We recall that a generic parameter in the \index{opaque result generic signature}opaque result generic signature is either part of the outer generic signature, or it represents an opaque result type.
Thus, if we consider the \IndexDefinition{root generic parameter}\emph{root} generic parameter that remains after we peel off any \index{dependent member type!base type}dependent member types, we see the opaque result generic signature has two varieties of reduced type parameter: +\begin{enumerate} +\item If a type parameter is rooted in a generic parameter of the owner declaration, we say it is an \index{outer type parameter!opaque result type}\emph{outer} type parameter. +\item All other valid type parameters are rooted in one of the new generic parameters added by the opaque result declaration, so they represent opaque result types. +\end{enumerate} +Only reduced type parameters of the second kind are represented by opaque archetypes; a type parameter of the first kind is handled differently below. + +\begin{algorithm}[Map type parameter into opaque generic environment]\label{map into opaque alg} +As input, takes an \index{opaque generic environment}opaque generic environment $(d, \Sigma)$ for some opaque result declaration~$d$ and substitution map~$\Sigma$, and a type parameter \tT. Outputs $\Ot_d \otimes \Sigma$ if the archetype exists, otherwise an interface or contextual type for the output generic signature of $\Sigma$. +\begin{enumerate} +\item Let $O$ be the opaque result generic signature of~$d$. +\item (Reduce) Set \IndexQuery{getReducedType}$\tX\leftarrow\Query{getReducedType}{O,\,\tT}$. +\item (Concrete) If \tX\ is a concrete type, it may still contain type parameters. Recursively apply this algorithm to each type parameter of~\tX, and return the result. +\item (Abstract) Otherwise, \tX\ is a reduced type parameter. Set $\tT\leftarrow\tX$. +\item (Outer) If \tT\ is an outer type parameter, return $\tT\otimes\Sigma$. +\item (Archetype) If not, return the opaque archetype $(\tT,d,\Sigma)$, denoted $\Ot_d \otimes \Sigma$. 
+\end{enumerate}
+\end{algorithm}
+
+To actually observe a concrete type in Step~3 or an outer type parameter in Step~5, we can consider an opaque result type constrained to a \index{parameterized protocol type!opaque result type}parameterized protocol type.
+
+\begin{example}
+The following is a fancier variant of \ExRef{opaque type witness example}:
+\begin{Verbatim}
+func someSequenceOfInt(_ n: Int) -> some Sequence<Int> {
+  return [n]
+}
+
+func pick<S: Sequence>(_ s: S) -> S.Element {...}
+
+print(pick(someSequenceOfInt(1)))
+\end{Verbatim}
+We make a note of the opaque result generic signature of \texttt{someSequenceOfInt()}, which has a \index{same-type requirement!opaque result type}same-type requirement from the parameterized protocol type:
+\begin{quote}
+\begin{verbatim}
+<τ_0_0 where τ_0_0: Sequence, τ_0_0.Element == Int>
+\end{verbatim}
+\end{quote}
+The return type of the call to \texttt{someSequenceOfInt()} is the opaque archetype $\Ox$ for this signature. To get the return type of \texttt{pick(someSequenceOfInt())}, we proceed as in \ExRef{opaque type witness example}, but when we project the type witness from the opaque abstract conformance in the last step, we see that we do not get another opaque archetype back. Instead, when type witness projection maps \texttt{\rT.Element} into the opaque generic environment, we end up in Step~3 of \AlgRef{map into opaque alg}, and we get back \texttt{Int}:
+\[
+\AElement \otimes \ConfReq{$\Ox$}{Sequence} = \texttt{Int}
+\]
+Therefore, the call expression \texttt{pick(someSequenceOfInt())} has type \texttt{Int}.
+\end{example}
+
+\begin{example}\label{opaque result parameterized generic example}
+If the owner declaration is generic, the parameterized protocol type can constrain its \index{primary associated type!opaque result type}primary associated types to the type parameters of the outer generic signature.
We show that in the following, the type of the argument to \texttt{print()} is \texttt{Bool}:
+\begin{Verbatim}
+func someSequenceOfT<T>(_ t: T) -> some Sequence<T> {
+  return [t]
+}
+
+func pick<S: Sequence>(_ s: S) -> S.Element {...}
+
+print(pick(someSequenceOfT(true)))
+\end{Verbatim}
+The generic signature of \texttt{someSequenceOfT()} is \texttt{<\rT>}, and we call it with the substitution map $\Sigma := \SubstMap{\SubstType{\rT}{Bool}}$ on the final line of the above. Also, let $d$ be the opaque result declaration of \texttt{someSequenceOfT()}, and let~$O$ be the opaque result generic signature:
+\begin{quote}
+\begin{verbatim}
+<τ_0_0, τ_1_0 where τ_0_0 == τ_1_0.Element, τ_1_0: Sequence>
+\end{verbatim}
+\end{quote}
+The return type of \texttt{someSequenceOfT(true)}, and hence the argument type of \texttt{pick()}, is the opaque archetype $\Oy \otimes \Sigma$. Hence, we can obtain the return type of \texttt{pick()} by projecting the type witness for $\AElement$ from the conformance $\ConfReq{$(\Oy \otimes \Sigma)$}{Sequence}$.
+
+To get this type witness, we map \texttt{\rO.Element} into the opaque generic environment $(d,\Sigma)$, and to do that, we must first compute $\Query{getReducedType}{O,\,\texttt{\rO.Element}}$. The result is \rT, which is an outer type parameter of $d$, so we end up in Step~5 of \AlgRef{map into opaque alg}. We apply $\Sigma$ to \rT\ and arrive at our answer:
+\begin{gather*}
+\AElement \otimes \ConfReq{$(\Oy \otimes \Sigma)$}{Sequence} \\
+\qquad {} = \rT \otimes \Sigma \\
+\qquad {} = \texttt{Bool}
+\end{gather*}
+\end{example}
+
+\AlgRef{map into opaque alg} assumes that if a type parameter \tT\ is equivalent to some outer type parameter, then $\Query{getReducedType}{O,\,\tT}$ will always be an outer type parameter. For this to hold, outer type parameters must come first in the \index{type parameter order!opaque result type}type parameter order.
However, \AlgRef{type parameter order} does not have this property, because we compare type parameters by length first; we can have $|\tT| > |\tU|$ with \tT\ being outer and \tU\ not. This was the subject of a bug report, now fixed~\cite{issue59391}. Before we describe the fix, let's look at an example.
+
+\begin{example}
+We use protocol \texttt{N} from \ExRef{protocol n example} below; in fact, any protocol that allows us to state a reduced type parameter of \index{type parameter length}length $> 2$ will do. (Also, we take \texttt{pick()} from the previous two examples.)
+\begin{Verbatim}
+func someSequenceOfLong<T: N>(_ t: T) -> some Sequence<T.A.A> {
+  return Array<T.A.A>()
+}
+
+print(pick(someSequenceOfLong(...)))
+\end{Verbatim}
+We call \texttt{someSequenceOfLong()} with some substitution map~$\Sigma$, so we get the opaque archetype $\Oy \otimes \Sigma$. As in \ExRef{opaque type witness example}, we determine the substituted type of the call to \texttt{pick()} by projecting the type witness for $\AElement$ from a conformance. Notice how the generic argument of the opaque result type is \texttt{\rT.A.A}, a type parameter of length~3. Here is the opaque result generic signature:
+\begin{quote}
+\begin{verbatim}
+<τ_0_0, τ_1_0 where τ_0_0: N, τ_1_0: Sequence,
+    τ_1_0.Element == τ_0_0.A.A>
+\end{verbatim}
+\end{quote}
+In the above generic signature, \texttt{\rO.Element} is equivalent to the outer type parameter \texttt{\rT.A.A}. Thus, to project the type witness, we apply $\Sigma$ to \texttt{\rT.A.A}:
+\[
+\AElement \otimes \ConfReq{$\Oy \otimes \Sigma$}{Sequence} = \texttt{\rT.A.A} \otimes \Sigma
+\]
+Since $|\texttt{\rO.Element}|=2$ and $|\texttt{\rT.A.A}|=3$, \AlgRef{type parameter order} tells us the reduced type must have length 2. If that were the case, however, type witness projection would output an opaque archetype $\Opaque{\rO.Element} \otimes \Sigma$, which is incorrect. Thus, we must modify the type parameter order used in the opaque result generic signature.
+\end{example} + +\begin{definition} +In addition to storing its depth and index, a \index{generic parameter type!opaque result type}generic parameter type stores a \emph{weight}, which is either 0 or 1. The generic parameters of an ordinary generic signature have weight~0, so in particular, the \index{outer type parameter!weight}outer generic parameters of the \index{opaque result generic signature!weight}opaque result generic signature have weight~0. Generic parameters that represent \index{opaque result type!weight}opaque result types in the opaque result generic signature have weight~1. +\end{definition} + +The \emph{weighted type parameter order} compares weight before comparing length, so that outer type parameters (weight~0) always precede opaque result types (weight~1). + +\begin{algorithm}[Weighted type parameter order] +\IndexDefinition{opaque archetype order}Takes two \index{type parameter order!opaque result type}type parameters \tT\ and \tU\ as input. Returns one of ``$<$'', ``$>$'', or ``$=$'' as output. +\begin{enumerate} +\item Let $\tT^\prime$ and $\tU^\prime$ denote the root generic parameters of \tT\ and \tU, respectively. +\item If $\tT^\prime$ has weight~0 and $\tU^\prime$ has weight~1, return ``$<$''. +\item If $\tT^\prime$ has weight~1 and $\tU^\prime$ has weight~0, return ``$>$''. +\item Otherwise both have equal weight. Compare \tT\ and \tU\ with \AlgRef{type parameter order}. +\end{enumerate} +\end{algorithm} + +We claimed that the type parameter order was \index{well-founded order!opaque result type}well-founded in \PropRef{well founded type order}. This remains true with the weighted order above. (Suppose we are given a non-empty set of type parameters; we can produce a minimum element as follows. If at least one type parameter in this set has weight~0, we can discard any elements of weight~1, because they cannot be smaller than this element. 
Now, since all remaining elements have equal weight, the minimum element of the remaining set under the original type parameter order is in fact the minimum element of the original set under the modified order.)
+
+\smallskip
+
+We close this section with two minor implementation limitations.
+
+\begin{example}
+An opaque result type \index{limitation!opaque result type}cannot be nested inside the constraint type of another, so today, one cannot write this:
+\begin{Verbatim}
+func unsupportedNesting() -> some Sequence<some Equatable> {...}
+\end{Verbatim}
+
+In fact, the above has a clear interpretation in terms of the following opaque result generic signature:
+\begin{verbatim}
+<τ_0_0, τ_0_1 where τ_0_0: Sequence, τ_0_1: Equatable,
+  τ_0_1 == τ_0_0.Element>
+\end{verbatim}
+This one could be resolved with relatively little effort.
+\end{example}
+
+\begin{example}
+Unlike ordinary ``input'' type parameters, opaque result types are not \index{limitation!opaque result type}considered by \index{requirement inference!opaque result type}requirement inference (\SecRef{requirementinference}):
+\begin{Verbatim}
+func goodInference<T>(_: Set<T>) {}      // infers T: Hashable
+func badInference() -> Set<some Any> {}  // error
+\end{Verbatim}
+
+We infer the requirement $\ConfReq{\rT}{Hashable}$ when we build the generic signature of the first declaration. There is no equivalent behavior in the second declaration, so we do not infer the requirement $\ConfReq{\rT}{Hashable}$ in the opaque result generic signature. The opaque archetype $\Ox$ for the ``\texttt{some Any}'' does not conform to \texttt{Hashable}, so it cannot be a generic argument type for \texttt{Set}, and we diagnose an error. While extending requirement inference to the opaque result generic signature would certainly be possible, it may be undesirable from a code legibility standpoint.
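+
+In the meantime, one can state the requirement as part of the opaque constraint type itself. A sketch (the function name is ours, and we assume a compiler that accepts opaque types in structural position):
+\begin{Verbatim}
+func okInference() -> Set<some Hashable> {  // hypothetical workaround
+    return Set([1, 2, 3])                   // underlying type is Set<Int>
+}
+\end{Verbatim}
+Here, ``\texttt{some Hashable}'' declares the conformance requirement directly, so nothing needs to be inferred.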
+\end{example} + +\section{Opaque Type Witnesses}\label{reference opaque archetype} + +Recall the discussion of \index{type witness!opaque result type}type witnesses from \SecRef{type witnesses}, and \index{associated type inference!opaque result type}associated type inference from \SecRef{associated type inference}. We will now see that if a \index{value requirement}value requirement in a protocol returns an associated type, and a \index{candidate value witness!opaque result type}candidate value witness returns an opaque archetype, associated type inference will deduce that the type witness is this opaque archetype. Therefore, an \index{opaque archetype!type witness}opaque archetype can witness an associated type requirement in a conformance. + +\begin{example}\label{opaque archetype witness example} +The \index{normal conformance!opaque type witness}normal conformance \index{horse}$\ConfReq{Horses}{Sequence}$ shown below witnesses the \nElement\ and \nIterator\ associated types with a pair of opaque archetypes: +\begin{Verbatim} +struct Horses: Sequence { + func makeIterator() -> some IteratorProtocol { + return ["Noby", "Neo"].makeIterator() + } +} +\end{Verbatim} +Our \texttt{makeIterator()} method returns an opaque archetype $\Ox$. We deduce that the \nIterator\ type witness is the opaque archetype it returns: +\[ +\AIterator \otimes \ConfReq{Horses}{Sequence} = \Ox +\] +This opaque archetype conforms to \tIterator, so it satisfies the \tSequence\ protocol's \index{associated conformance!opaque archetype}associated conformance requirement with an \index{opaque abstract conformance}opaque abstract conformance: +\begin{gather*} +\SelfIterator \otimes \ConfReq{Horses}{Sequence} \\ +\qquad {} = \ConfReq{$\Ox$}{IteratorProtocol} +\end{gather*} +To get \nElement, we proceed as in \ExRef{abstract type witness example}, and consider the \tSequence\ protocol's associated same-type requirement. 
This produces another opaque archetype: +\begin{gather*} +\AElement \otimes \ConfReq{Horses}{Sequence} \\ +\qquad = \AElement \otimes \ConfReq{$\Ox$}{IteratorProtocol} \\ +\qquad = \Opaque{\rT.Element} +\end{gather*} + +Associated type inference also \index{synthesized declaration}synthesizes two \index{type alias declaration!opaque archetype}type alias declarations, named \texttt{Iterator} and \texttt{Element}. These allow us to refer to these opaque archetypes by name. For example, \texttt{ride()} receives the opaque archetype $\Opaque{\rT.Element}$ as a \emph{parameter}: +\begin{Verbatim} +func ride(_: Horses.Element) {...} // how? + +for horse in Horses() { + ride(horse) +} +\end{Verbatim} +\end{example} + +When the candidate value witness is defined in a superclass of the conforming type, or a protocol extension of some protocol it conforms to, we must \index{opaque archetype!type substitution}apply a substitution map to the candidate type witness, to get an interface type for the right generic signature. This is analogous to type resolution of member types in \SecRef{member type repr}. For the next two examples, we will declare conformances to this protocol: +\begin{Verbatim} +protocol P { + associatedtype A + func f() -> A +} +\end{Verbatim} + +\begin{example}\label{opaque type witness proto example} +Here, we have a \index{default witness}default witness in a \index{protocol extension!opaque type witness}protocol extension: +\begin{Verbatim} +extension P { + func f() -> some Any {...} // underlying type can depend on `Self' +} + +struct S: P {} +\end{Verbatim} +The \texttt{P.f()} default witness returns an opaque archetype $\Oy$ whose outer generic signature is the \index{protocol generic signature!opaque result type}protocol generic signature $\GP$. 
We deduce that the type witness for \nA\ is $\Oy$ with the \index{protocol substitution map!opaque type witness}protocol substitution map $\Sigma_{\ConfReq{S}{P}}$ applied:
+\[
+\APA \otimes \ConfReq{S}{P} = \Oy \otimes \SubstMapC{\SubstType{\rT}{S}}{\SubstConf{\rT}{S}{P}}
+\]
+Notice how the substitution map refers back to the normal conformance $\ConfReq{S}{P}$.
+\end{example}
+
+\begin{example}\label{opaque type witness superclass example}
+The final possibility is that the witness is in a \index{superclass declaration!opaque type witness}superclass:
+\begin{Verbatim}
+class Base<T> {
+    func f() -> some Any {...}  // underlying type can depend on `T'
+}
+
+class Derived: Base<Int>, P {}
+\end{Verbatim}
+The \texttt{Base.f()} class method returns an opaque archetype $\Oy$ whose outer generic signature is the generic signature of \texttt{Base}. We apply the \index{superclass substitution map!opaque type witness}superclass substitution map for \texttt{Derived} to get the type witness in the conformance of \texttt{Derived} to \tP:
+\[
+\APA \otimes \ConfReq{Derived}{P} = \Oy \otimes \SubstMap{\SubstType{\rT}{Int}}
+\]
+By modifying the declarations of \texttt{Base} and \texttt{Derived} as appropriate, we can set the type witness to an opaque archetype with an arbitrary substitution map.
+\end{example}
+
+\begin{example}
+If the conforming type is generic, we can substitute the type witness.
Consider the $\ConfReq{Barn<\rT>}{Sequence}$ normal conformance:
+\begin{Verbatim}
+struct Barn<T>: Sequence {
+    func makeIterator() -> some IteratorProtocol {...}
+}
+\end{Verbatim}
+In this conformance, the type witness for \nIterator\ is the opaque archetype $\Oy$ of \texttt{makeIterator()}, while the type witness for \nElement\ is the outer generic parameter \rT, as in \ExRef{opaque result parameterized generic example}:
+\begin{gather*}
+\AIterator \otimes \ConfReq{Barn<\rT>}{Sequence} = \Oy \\
+\AElement \otimes \ConfReq{Barn<\rT>}{Sequence} = \rT
+\end{gather*}
+The type witnesses of the \index{specialized conformance!opaque type witness}specialized conformance $\ConfReq{Barn<Int>}{Sequence}$ are thus:
+\begin{gather*}
+\AIterator \otimes \ConfReq{Barn<Int>}{Sequence} = \Oy \otimes \SubstMap{\SubstType{\rT}{Int}} \\
+\AElement \otimes \ConfReq{Barn<Int>}{Sequence} = \texttt{Int}
+\end{gather*}
+\end{example}
+
+\begin{example}
+An opaque archetype cannot witness an associated type requirement if its owner declaration has a \index{generic parameter list}generic parameter list:
+\begin{Verbatim}
+protocol GenericP {
+    associatedtype A
+    func f<T>(_: T) -> A
+}
+
+struct Bad: GenericP {  // error
+    func f<T>(_: T) -> some Any {...}
+}
+\end{Verbatim}
+This is not an implementation \index{limitation!opaque result type}limitation, but rather a consequence of the language semantics. A type witness must be an interface type for the generic signature of its conformance. However, an opaque result type is parameterized by the \index{outer generic signature}generic signature of its \index{owner declaration}owner declaration. The two no longer coincide if the owner declaration introduces generic parameters.
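+
+By contrast, nothing stops a witness with its own generic parameter list from returning a \emph{concrete} type; it is only the combination with an opaque result type that is ruled out. A sketch (the declaration name here is ours):
+\begin{Verbatim}
+struct Good: GenericP {
+    func f<T>(_: T) -> Int { return 0 }  // ok: type witness is `Int'
+}
+\end{Verbatim}
+Associated type inference deduces the type witness \texttt{Int} for \nA; this type does not depend on \texttt{T}.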
There is an analogous situation with nested nominal types:
+\begin{Verbatim}
+protocol P {
+    associatedtype A
+}
+
+struct S: P {  // error
+    struct A<T> {}
+}
+\end{Verbatim}
+\end{example}
+
+\paragraph{Textual interfaces.}
+When building a shared library for distribution, we generate a \index{textual interface!opaque result type}textual interface file, by printing each declaration in the module with the \index{AST printer}AST printer (\SecRef{module system}). The interface file includes all \index{synthesized declaration}synthesized declarations, and in particular, type aliases synthesized by associated type inference. We are now faced with a new dilemma. The underlying type of a synthesized \index{type alias declaration!textual interface}type alias declaration may reference an existing \index{opaque archetype!textual interface}opaque archetype. However, the Swift language does not have a syntax for referencing an opaque archetype; the \texttt{some} keyword \emph{declares} an opaque archetype.
+
+We solve this problem with a special syntax, permitted only in textual interface files. This syntax encodes a reference to an opaque archetype $\Ot_d \otimes \Sigma$ in terms of the \index{mangling!opaque result type}\texttt{mangling} of the \index{owner declaration!mangling}owner declaration of~$d$, and the \index{index!opaque result type}\texttt{index} of the root generic parameter of \tT. If $\Sigma$ is empty, and \tT\ is just a \index{generic parameter type!opaque archetype}generic parameter type, this looks like so:
+\begin{quote}
+\verb|@_opaqueReturnTypeOf("mangling", index) __|
+\end{quote}
+If \tT\ is a \index{dependent member type!opaque archetype}dependent member type, we wrap it in \index{member type representation!opaque archetype}member type representation syntax:
+\begin{quote}
+\verb|(@_opaqueReturnTypeOf("mangling", index) __).Element|
+\end{quote}
+If the owner declaration of $d$ is generic, then $\Sigma$ is non-empty.
We print the replacement types of $\Sigma$ (say \texttt{X}, \texttt{Y}, \texttt{Z}) in a generic argument list:
+\begin{quote}
+\begin{verbatim}
+@_opaqueReturnTypeOf("mangling", index) __<X, Y, Z>
+(@_opaqueReturnTypeOf("mangling", index) __<X, Y, Z>).Element
+\end{verbatim}
+\end{quote}
+As this syntax is not meant for human consumption, it is buried within the ``\texttt{@foo}'' type attribute grammar, which also simplifies the implementation. Usually, a type attribute modifies the immediately following type representation, like \texttt{@escaping} in front of a function type for example, but in the case of \verb|@_opaqueReturnTypeOf|, the attribute itself entirely specifies the type, and the type representation that follows is not used, except for its generic arguments. Indeed, ``\verb|__|'' can be any valid identifier; the AST printer actually used an \index{emoji}emoji prior to \IndexSwift{5.5}Swift 5.5, but this innovation was removed~\cite{opaqueemoji}.
+
+\begin{example}
+We can generate a textual interface for \ExRef{opaque archetype witness example} by making the declarations \texttt{public} and invoking \texttt{swiftc} with the \IndexFlag{enable-library-evolution}\texttt{-enable-library-evolution} and \IndexFlag{emit-module-interface}\texttt{-emit-module-interface} flags. Here is part of the output, with line breaks added:
+\begin{Verbatim}
+public struct Horses : Swift.Sequence {
+  public func makeIterator() -> some Swift.IteratorProtocol
+
+  public typealias Element =
+    (@_opaqueReturnTypeOf("$s5horse6HorsesV12makeIteratorQryF", 0) __)
+      .Element
+  public typealias Iterator =
+    @_opaqueReturnTypeOf("$s5horse6HorsesV12makeIteratorQryF", 0) __
+}
+\end{Verbatim}
+We use the ``\texttt{some}'' syntax when we print out the return type of \texttt{Horses.makeIterator()}, because this denotes the declaration of a new opaque result type.
The underlying types of our two type aliases, on the other hand, use the \verb|@_opaqueReturnTypeOf| syntax to refer to existing opaque archetypes. Note that ``\verb|$s5horse6HorsesV12makeIteratorQryF|'' is the mangled name of our \texttt{makeIterator()} method.
+\end{example}
+
+\begin{example}
+The textual interface for \ExRef{opaque type witness proto example} looks like this, if we make the declarations public, and set the module name to ``\texttt{p}'':
+\begin{Verbatim}
+public protocol P {
+  associatedtype A
+  func f() -> Self.A
+}
+extension p.P {
+  public func f() -> some Any
+
+}
+public struct S : p.P {
+  public typealias A =
+    @_opaqueReturnTypeOf("$s1p1PPAAE1fQryF", 0) __<p.S>
+}
+\end{Verbatim}
+Notice how we refer to the opaque archetype with a substitution map. Let's use the \index{demangler}\texttt{swift-demangle} tool to print out the mangled name of the owner declaration:
+\begin{Verbatim}
+$ swift-demangle s1p1PPAAE1fQryF
+s1p1PPAAE1fQryF ---> (extension in p):p.P.f() -> some
+\end{Verbatim}
+The \texttt{-expand} flag prints the \index{mangling!opaque result type}mangled name's structure in more detail. Try it.
+\end{example}
+
+\begin{example}
+Finally, here is the textual interface for \ExRef{opaque type witness superclass example}:
+\begin{Verbatim}
+public class Base<T> {
+  public init()
+  public func f() -> some Any
+}
+@_inheritsConvenienceInitializers
+public class Derived : p.Base<Swift.Int>, p.P {
+  override public init()
+  public typealias A =
+    @_opaqueReturnTypeOf("$s1p4BaseC1fQryF", 0) __<Swift.Int>
+}
+\end{Verbatim}
+\end{example}
+
+To support the \verb|@_opaqueReturnTypeOf| syntax in \index{type resolution!opaque archetype}type resolution, we introduce some bookkeeping in the parser when \index{parser!opaque archetype}parsing a \index{textual interface!opaque archetype}textual interface file, to collect all \index{opaque result declaration}opaque result declarations in a per-source file list.
We also maintain a lookup table, which is initially empty, to map the mangled names of \index{owner declaration}owner declarations back to the corresponding opaque result declarations. This table is populated on the first invocation of the below algorithm.
+
+\begin{algorithm}[Resolve opaque archetype]\label{resolve opaque archetype algorithm}
+As input, takes a mangled name~$s$, an integer~\texttt{i}, and an optional list of generic argument types. Returns the corresponding opaque archetype.
+\begin{enumerate}
+\item If the list of parsed opaque result declarations is empty, go to Step~3.
+\item Otherwise, remove the next opaque result declaration from this list. Invoke the mangler to construct the \index{mangling!opaque result type}mangled name of its owner declaration, which will trigger various requests, such as the \index{interface type request}\Request{interface type request}. Add an entry to the lookup table to associate this mangled name with the opaque result declaration. Go back to Step~1.
+\item Look up $s$ in the table to get the opaque result declaration~$d$.
+\item Form a generic parameter type \ttgp{d}{i}, where the depth \texttt{d} is the maximum depth of the \index{opaque result generic signature}opaque result generic signature of~$d$, and \texttt{i} is the index input to the algorithm.
+\item Let $G$ be the \index{outer generic signature}outer generic signature of~$d$. If $G$ is non-empty, we must have a list of generic arguments. Form a substitution map $\Sigma$ for $G$, from these generic arguments, using global conformance lookup to populate the substitution map's conformances. Otherwise, if $G$ is empty, let $\Sigma$ be the empty substitution map.
+
+\item Call \AlgRef{map into opaque alg} to map \ttgp{d}{i} into the opaque generic environment $(d, \Sigma)$.
+\item Return this opaque archetype $\Opaque{\ttgp{d}{i}}_d \otimes \Sigma$.
+\end{enumerate} +\end{algorithm} + +\section{Runtime Representation}\label{opaque result runtime} + +Back in \ChapRef{chap:introduction}, we learned that Swift implements separate compilation of generic functions by encoding the function's generic signature in the calling convention. The caller constructs a runtime representation of each replacement type and conformance in the substitution map, and the callee manipulates generic values abstractly, using the metadata and witness tables provided by the caller. The implementation of opaque result types is analogous, but ``backwards.'' + +The caller of a function with an opaque result type must manipulate the resulting value abstractly, using \index{runtime type metadata!opaque result type}runtime type metadata. To this end, the compiler emits an \IndexDefinition{opaque type descriptor}\emph{opaque type descriptor} when compiling the callee. The opaque type descriptor references the runtime type metadata for the opaque archetype's underlying type. + +We will first define the underlying type of an opaque archetype in a precise way. +To simplify the discussion, we only consider the case where the underlying type of an opaque result type does not depend on \index{availability}availability, so there is only one \index{underlying type substitution map}underlying type substitution map. Say we have an opaque result declaration~$d$ with opaque result generic signature~$O$, underlying type substitution map $\Sigma^\prime$, and outer generic signature~$G$. + +\begin{definition} +Let $\Ot_d \otimes \Sigma$ be an opaque archetype. The \IndexDefinition{underlying type!opaque archetype}\emph{underlying type} of $\Ot_d \otimes \Sigma$ is the following substituted type: +\[ +\tT \otimes \Sigma^\prime \otimes \Sigma +\] +If $\Sigma \in \SubMapObj{G}{H}$, then $\Ot_d \otimes \Sigma \in \TypeObj{H}$, by definition. As we expect, the underlying type of $\Ot_d \otimes \Sigma$ is also an element of $\TypeObj{H}$. 
Indeed:
+\begin{gather*}
+\tT \otimes \Sigma^\prime \in \TypeObj{G}\\
+\tT \otimes \Sigma^\prime \otimes \Sigma \in \TypeObj{H}
+\end{gather*}
+Analogously, if $\Sigma \in \SubMapObjCtx{G}{H}$, then the underlying type of $\Ot_d \otimes \Sigma$ is an element of $\TypeObjCtx{H}$.
+\end{definition}
+
+\begin{definition}
+We also define the \IndexDefinition{underlying conformance}\emph{underlying conformance} of an \index{opaque abstract conformance!underlying conformance}opaque abstract conformance $\ConfReq{$(\Ot_d \otimes \Sigma)$}{P}$ as follows:
+\[\TP \otimes \Sigma^\prime \otimes \Sigma\]
+That is, we apply the underlying type substitution map to the abstract conformance for the type parameter \tT\ of our opaque archetype. Once again, by the definition of substitution map composition, if $\ConfReq{$(\Ot_d \otimes \Sigma)$}{P} \in \ConfObj{H}$, then its underlying conformance is also an element of $\ConfObj{H}$. (Likewise for $\ConfObjCtx{H}$.)
+\end{definition}
+
+\begin{example}\label{opaque archetype underlying type example}
+Consider this function:
+\begin{Verbatim}
+func someSequence2<T>(_ t: T) -> some Sequence {
+    return [t]
+}
+\end{Verbatim}
+Here is the underlying type substitution map:
+\begin{align*}
+\Sigma^\prime := \SubstMapC{
+&\SubstType{\rT}{\rT}\\
+&\SubstType{\rO}{Array<\rT>}
+}{\\
+&\SubstConf{\rO}{Array<\rT>}{Sequence}
+}
+\end{align*}
+Now, let $\Sigma := \SubstMap{\SubstType{\rT}{Int}}$ be a substitution map for the outer generic signature \texttt{<\rT>}.
The underlying type of $\Oy \otimes \Sigma$ is the following:
+\[
+\rO \otimes \Sigma^\prime \otimes \Sigma = \texttt{Array<\rT>} \otimes \Sigma = \texttt{Array<Int>}
+\]
+The underlying type of $\Opaque{\rO.Element} \otimes \Sigma$ is:
+\[
+\texttt{\rO.Element} \otimes \Sigma^\prime \otimes \Sigma = \texttt{\rT} \otimes \Sigma = \texttt{Int}
+\]
+\end{example}
+
+\paragraph{Runtime entry points.} The Swift runtime exports a pair of entry points that receive an opaque type descriptor, and project the \index{runtime type metadata!opaque result type}type metadata and \index{witness table!opaque result type}witness tables for the replacement types and conformances of its underlying type \index{substitution map!runtime type metadata}substitution map:
+\begin{itemize}
+\item \verb|swift_getOpaqueTypeMetadata2()| takes an opaque type descriptor and the index of an opaque result type in the opaque result generic signature, and returns type metadata for the corresponding underlying type.
+\item \verb|swift_getOpaqueTypeConformance2()| takes an opaque type descriptor and the index of a conformance requirement in the opaque result generic signature, and returns the witness table for the corresponding underlying conformance.
+\end{itemize}
+Since the underlying type substitution map depends on the \index{outer generic signature!runtime type metadata}outer generic signature, each runtime entry point also takes the full set of type metadata and witness tables that would be passed to a generic function with the same signature. Thus, the caller of a function with an opaque result type must invoke these entry points with the same substitution map that was used for the call. This will yield type metadata and witness tables describing the opaque archetype, and from that point on, instances of this opaque archetype can be manipulated like instances of ordinary type parameters.
+
+\begin{example}
+We continue with \ExRef{opaque archetype underlying type example}.
Let's call our \texttt{someSequence2()} function as follows:
+\begin{Verbatim}
+func pick<S: Sequence>(_ s: S) -> S.Element {...}
+
+print(pick(someSequence2(123)))
+\end{Verbatim}
+
+The entry point for \texttt{someSequence2()} takes three parameters: a buffer to hold the return value of type $\Oy$, the type metadata for \rT\ which here is \texttt{Int}, and a pointer to a value of type \rT.
+
+To determine the size of the return value buffer, we generate a call to the appropriate Swift runtime entry point, handing it the opaque type descriptor for \texttt{someSequence2()}, and the type metadata for \rT. At run time, this outputs type metadata for the underlying type, which is \texttt{Array<Int>}. We recover the type's size from its metadata, and generate a dynamic stack allocation with the result.
+
+We then hand the result of \texttt{someSequence2()} to \texttt{pick()}. The entry point for \texttt{pick()} takes four parameters: a buffer for the return value of type \texttt{\rT.Element}, a pointer to a value of type \rT, the type metadata for \rT, and a witness table for $\ConfReq{\rT}{Sequence}$. For the latter, we must pass in the witness table for $\ConfReq{$\Oy \otimes \Sigma$}{Sequence}$. To obtain this witness table, we generate a call to the other Swift runtime entry point, again handing it the opaque type descriptor, and the type metadata for \texttt{Int}. This runtime entry point outputs the witness table for the underlying conformance, $\ConfReq{Array<Int>}{Sequence}$.
+
+Finally, to abstractly manipulate the return value of \texttt{pick()}, and in particular, to recover the size of a \texttt{\rT.Element}, we generate a call to the \index{metadata access function}metadata access function for the \nElement\ associated type in the witness table for $\ConfReq{$\Oy \otimes \Sigma$}{Sequence}$.
+\end{example}
+
+\paragraph{Specialization.}
+This implementation strategy is \index{resilience!opaque result type}resilient to library evolution.
If the callee is in a shared library and the caller links against this library, we can freely change the underlying type of the callee's opaque result type. As long as the opaque result generic signature does not change, the layout of the generated opaque type descriptor remains the same, and binary compatibility with the caller is maintained.
+
+On the other hand, there are many situations where the caller and callee are always compiled together---for example, if they're declared in the same \index{source file}source file. In this case, the type checker must continue to maintain the illusion that the underlying type of an opaque archetype is hidden from the caller. However, in code generation, we can avoid the abstraction penalty imposed by runtime type metadata, instead manipulating values of the opaque archetype as if they were the underlying type.
+
+Recall that we use a similar implementation strategy for ordinary generic functions. Separate compilation is the default, but in the case where the function's body is visible to the caller, the \index{SIL optimizer}SIL optimizer can generate a \index{specialization!opaque result type}specialization of the function from the caller's substitution map.
+
+In fact, the situation with opaque result types is somewhat simpler, because an opaque archetype only has one underlying type. Instead of a separate optimizer pass, we perform this replacement in \index{SILGen}SILGen while lowering a type checked \index{abstract syntax tree}AST to SIL instructions. To be precise, this replacement happens as part of SIL \index{SIL type lowering!opaque archetype}type lowering (\SecRef{sec:type lowering}).
+
+\begin{definition}
+Suppose that $f$ is the function declaration currently being lowered, and $\Ot_d \otimes \Sigma$ is some opaque archetype referenced from $f$.
Formally, the \index{underlying type substitution map}underlying type substitution map of $d$ is \emph{visible} from~$f$ if either of the following holds: +\begin{enumerate} +\item The owner declaration of $d$ is in another module, and either: +\begin{enumerate} +\item the owner declaration is \index{inlinable function!opaque result type}\verb|@inlinable|, or +\item this other module was built without \index{library evolution!opaque result types}library evolution. +\end{enumerate} +\item The owner declaration of $d$ is in a \index{primary file}primary file of the \index{main module}main module, and either: +\begin{enumerate} +\item $f$ itself is \emph{not} \verb|@inlinable|, or +\item the \index{owner declaration!opaque result type}owner declaration of $d$ is also \verb|@inlinable|. +\end{enumerate} +\end{enumerate} +\end{definition} + +In \index{whole module optimization}whole module mode, every source file is primary, so the optimization is most effective in that case. The \verb|@inlinable| restriction ensures we avoid leaking implementation details when we serialize the SIL representation of an \verb|@inlinable| function as part of a \index{binary module}binary module (\SecRef{module system}). This serialized representation cannot depend on the underlying types of any non-\verb|@inlinable| functions, even within the same module. + +When the above visibility condition holds, we can obtain the underlying type substitution map~$\Sigma^\prime$ for~$d$, and safely assume that the underlying type of~$\Ot$ will always be equal to~$\tT \otimes \Sigma^\prime$ inside~$f$. However, there is one more thing to check. + +The \index{access control}\emph{access control} keywords (\texttt{fileprivate}, \texttt{internal}, \texttt{public}) determine not only the visibility of declarations from \index{name lookup!access control}name lookup at compile time, but also the visibility of symbols in the generated \index{object file}object file. 
In particular, we can only replace an opaque archetype with its underlying type if every nominal type appearing in the underlying type is visible. For example, if our opaque result type~$d$ is declared in another source file and its underlying type involves a \texttt{private struct}, we cannot directly reference the underlying type, even if we are allowed to know what this type is.
+
+A \IndexDefinition{type expansion context}\emph{type expansion context} collects the input data for the visibility check. It consists of the parent declaration context of~$f$, paired with a flag indicating if~$f$ is \index{inlinable function}\verb|@inlinable|.
+
+The three mutually-recursive \IndexDefinition{replace opaque archetypes with underlying types}algorithms below are used by \index{SIL type lowering!opaque result type}SIL type lowering to replace opaque archetypes appearing within \index{type!replace opaque archetype}types, \index{conformance!replace opaque archetype}conformances, and \index{substitution map!replace opaque archetype}substitution maps.
+
+\begin{algorithm}[Specialize opaque archetypes within a type]\label{unwrap opaque type algorithm}
+Receives a type expansion context and a type.
+\begin{itemize}
+\item For an \index{opaque archetype}\textbf{opaque archetype} $\Ot_d \otimes \Sigma$:
+\begin{enumerate}
+\item If its underlying type is visible, substitute in the underlying type, and recurse if this underlying type further contains opaque archetypes.
+\item Otherwise, recursively transform~$\Sigma$ and form a new opaque archetype.
+\end{enumerate}
+
+\item For an \index{existential archetype}\textbf{existential archetype} $\Et \otimes \Sigma$: recursively transform $\Sigma$. (We will meet these in \SecRef{open existential archetypes}.)
+
+\item For \textbf{every other type} \tX: recursively transform the child types of \tX\ if any, and construct a new type with the transformed children.
+\end{itemize}
+\end{algorithm}
+
+\begin{algorithm}[Specialize opaque archetypes within a conformance]\label{unwrap opaque conformance algorithm}
+Receives a type expansion context and a conformance.
+\begin{itemize}
+\item For a \index{specialized conformance!opaque archetype}\textbf{specialized conformance} $\XP \otimes \Sigma$: recursively transform~$\Sigma$ and return a new specialized conformance.
+
+\item For an \index{opaque abstract conformance}\textbf{opaque abstract conformance} $\ConfReq{$(\Ot \otimes \Sigma)$}{P}$:
+ \begin{enumerate}
+ \item If its underlying conformance is visible, substitute in the underlying conformance. Recurse if the conforming type again contains opaque archetypes.
+ \item Otherwise, recursively transform~$\Sigma$ and form a new opaque abstract conformance.
+\end{enumerate}
+\item For an \index{existential abstract conformance}\textbf{existential abstract conformance} $\ConfReq{$(\Et \otimes \Sigma)$}{P}$: recursively transform~$\Sigma$ and form a new existential abstract conformance.
+\end{itemize}
+\end{algorithm}
+
+\begin{algorithm}[Specialize opaque archetypes within a substitution map]\label{unwrap opaque substitution map algorithm}
+Receives a type expansion context and a substitution map.
+\begin{itemize}
+\item Recursively transform each replacement type and conformance with \AlgRef{unwrap opaque type algorithm} and \AlgRef{unwrap opaque conformance algorithm}, and form a new substitution map.
+\end{itemize}
+\end{algorithm}
+
+\paragraph{Circularity.}
+We encountered \index{non-terminating computation}non-terminating compile-time computation in our prior discussion of conditional and recursive conformances (\SecRef{sec:conditional conformances}, \SecRef{recursive conformances}). This issue can also arise with opaque result types, because nothing in our discussion so far precludes the possibility of an \index{opaque result type!circularity}opaque result type whose underlying type is \index{circular reference}defined in terms of itself.
If this were to happen, our algorithm for replacing an opaque archetype with its underlying type would run forever, if implemented as described. However, in reality, we attempt to detect this and \index{diagnostic!opaque result type}diagnose an error, as we will see now.
+
+\begin{example}
+The simplest example is when a function with an opaque result type calls itself on all control flow paths. Here, the opaque result types of \texttt{f1a()} and \texttt{f1b()} are defined in terms of each other:
+\begin{Verbatim}
+func f1a() -> some Any {
+    return f1b()
+    // error: function opaque return type was inferred as `some Any',
+    // which defines the opaque type in terms of itself
+}
+
+func f1b() -> some Any {
+    return f1a()
+    // error: function opaque return type was inferred as `some Any',
+    // which defines the opaque type in terms of itself
+}
+\end{Verbatim}
+\end{example}
+\begin{example}
+More complex recursion is possible, where the underlying type contains the opaque result type in structural position. Here, the underlying type of our opaque result type $\Ot$ is \texttt{Array<$\Ot$>}:
+\begin{Verbatim}
+func f2() -> some Any {
+    return [f2()]
+    // error: function opaque return type was inferred as `[some Any]',
+    // which defines the opaque type in terms of itself
+}
+\end{Verbatim}
+\end{example}
+
+Beyond these simple examples, the technique in \SecRef{halting problem} of encoding a \index{tag system}tag system in terms of type substitution can be modified slightly to use opaque result types instead. In fact, opaque archetype specialization is Turing-complete, and it is \index{undecidable problem!halting problem}undecidable in general whether this process terminates.
Thus, the only way to prevent non-terminating computation is to impose an upper bound on the total amount of work we can do, and signal an error if this limit is breached:
+\begin{enumerate}
+\item To catch most cases, we eagerly attempt to fully specialize every declared opaque result type from the \index{type-check primary file request}\Request{type-check primary file request}. If circular specialization is detected, we emit a diagnostic, as shown in the two examples above.
+
+\item In more complex scenarios, a circular opaque result type only becomes apparent in the \index{SIL optimizer}SIL optimizer, after some number of inlining and specialization passes have run. In this case, we print a fatal error message and stop compilation.
+\end{enumerate}
+
+In the future, it would be nice to \index{limitation!opaque result type}diagnose an error with a useful source location in the second scenario as well. However, even this cannot catch everything. With separate compilation, one can always define mutually-recursive opaque result types which do not become apparent until run time. In this case, the program will crash with infinite recursion inside the \index{runtime type metadata!circularity}runtime type metadata instantiation logic.
+
+Finally, while the default limits should be sufficient for all reasonable programs, it is possible to change them with a pair of \index{frontend flag}frontend flags:
+\begin{itemize}
+\item \IndexFlag{max-substitution-depth}\texttt{-max-substitution-depth} is the maximum recursion depth; this bound is reached when the underlying type of an opaque archetype involves another opaque archetype, which involves another, and so on. The default value is 500.
+
+\item \IndexFlag{max-substitution-count}\texttt{-max-substitution-count} is the maximum total number of opaque archetypes that can be visited within a single type before we give up. The default value is 120,000.
+\end{itemize}
+
+\section{Source Code Reference}
+
+Key source files:
+\begin{itemize}
+\item \SourceFile{lib/Sema/MiscDiagnostics.cpp}
+\item \SourceFile{lib/Sema/TypeCheckGeneric.cpp}
+\end{itemize}
+
+\apiref{OpaqueTypeDecl}{class}
+An \IndexSource{opaque result declaration}opaque result declaration.
+\begin{itemize}
+\item \texttt{getNamingDecl()} returns this declaration's \IndexSource{owner declaration}owner declaration.
+\item \texttt{getOpaqueInterfaceGenericSignature()} returns this declaration's \IndexSource{opaque result generic signature}opaque result generic signature.
+\item \texttt{getUniqueUnderlyingTypeSubstitutions()} returns this declaration's \IndexSource{substitution map!opaque result type}underlying type \IndexSource{underlying type substitution map}substitution map, if it does not depend on availability.
+
+Returns \texttt{nullopt} if the substitution map has not been computed yet, or if it cannot be computed because the owner declaration does not have a body.
+\item \texttt{getConditionallyAvailableSubstitutions()} is the general form for when we have more than one availability range and underlying type substitution map.
+\end{itemize}
+
+\apiref{ValueDecl::getOpaqueTypeDecl()}{method}
+Returns this \IndexSource{value declaration!opaque result type}declaration's opaque result declaration if it has one, or \texttt{nullptr} otherwise.
+See also \SecRef{src:declarations}.
+
+\apiref{OpaqueResultTypeRequest}{class}
+A \IndexSource{opaque result type request}request for constructing opaque result type declarations. This request is evaluated by calling \texttt{ValueDecl::getOpaqueTypeDecl()} above.
+
+\apiref{OpaqueUnderlyingTypeChecker}{class}
+An AST walker to collect the \texttt{return} statements inside a function body and fill in its opaque result declaration's underlying type substitution map. Also diagnoses errors, such as multiple \texttt{return} statements with mismatched return types.
The type checker performs this walk after assigning types to expressions.
+
+\subsection*{Opaque Archetypes}
+
+Key source files:
+\begin{itemize}
+\item \SourceFile{include/swift/AST/GenericEnvironment.h}
+\item \SourceFile{lib/AST/GenericEnvironment.cpp}
+\end{itemize}
+
+\apiref{TypeBase::hasOpaqueArchetype()}{method}
+Returns true if this type contains an opaque archetype. See also \SecRef{src:types}.
+
+\apiref{OpaqueTypeArchetypeType}{class}
+Subclass of \texttt{ArchetypeType} representing an \IndexSource{opaque archetype}opaque archetype.
+
+Recall from \SecRef{src:archetypes} that every archetype has a \texttt{getInterfaceType()} method that returns its type parameter, and a \texttt{getGenericEnvironment()} method that returns its parent generic environment. Opaque archetypes have two more accessor methods:
+\begin{itemize}
+\item \texttt{getOpaqueDecl()} returns the opaque result declaration for this opaque archetype.
+\item \texttt{getSubstitutions()} returns this opaque archetype's substitution map.
+\end{itemize}
+
+\apiref{GenericEnvironment}{class}
+See also \SecRef{src:archetypes}.
+\begin{itemize}
+\item \texttt{forOpaqueType()} is a static factory method that returns the unique opaque generic environment for the given opaque result declaration and substitution map.
+\item \texttt{getKind()} returns \texttt{GenericEnvironment::Kind::Opaque} for an \IndexSource{opaque generic environment}opaque generic environment. The generic environment of an opaque archetype is always an opaque generic environment.
+\item \texttt{getOpaqueTypeDecl()} returns the opaque result declaration for this environment.
+\item \texttt{getOuterSubstitutions()} returns this generic environment's substitution map.
+\end{itemize}
+
+\subsection*{Opaque Type Witnesses}
+
+Key source files:
+\begin{itemize}
+\item \SourceFile{lib/AST/SourceFile.cpp}
+\item \SourceFile{lib/Sema/TypeCheckType.cpp}
+\end{itemize}
+
+\apiref{TypeResolver::resolveOpaqueReturnType()}{method}
+Implements \AlgRef{resolve opaque archetype algorithm}, which resolves the special \verb|@_opaqueReturnTypeOf| syntax.
+
+\apiref{SourceFile}{class}
+See also \SecRef{src:compilation model}.
+\begin{itemize}
+\item \texttt{addUnvalidatedDeclWithOpaqueResultType()} records a declaration as having an opaque result type in the per-source file list. Invoked by the parser.
+\item \texttt{getOpaqueReturnTypeDecls()} walks the above list, constructs each opaque result declaration, and populates the per-source file lookup table with the mangled name of each owner declaration. These are Steps 1~and~2 of \AlgRef{resolve opaque archetype algorithm}.
+\item \texttt{lookupOpaqueResultType()} returns an opaque result declaration for the given \IndexSource{mangling}mangled name. This is Step~3 of \AlgRef{resolve opaque archetype algorithm}.
+\end{itemize}
+
+\subsubsection*{AST Demangler}
+
+Key source files:
+\begin{itemize}
+\item \SourceFile{include/swift/AST/ASTDemangler.h}
+\item \SourceFile{lib/AST/ASTDemangler.cpp}
+\end{itemize}
+We didn't say it in \SecRef{reference opaque archetype}, but the mangled name in a \verb|@_opaqueReturnTypeOf| may refer to an opaque archetype from another module. In this case, we do not consult the per-source file list. Instead, we query the ``AST demangler.''
+
+\apiref{ASTBuilder::resolveOpaqueType()}{method}
+Calls \texttt{lookupOpaqueResultType()} on the appropriate \texttt{SourceFile} if the mangled name refers to the main module; otherwise, it takes a different path.
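+
+For illustration, here is roughly what such a reference looks like when printed into a module interface. This sketch is hypothetical: it assumes a module named \texttt{mince} containing an owner declaration \texttt{ConcreteP.f()} whose return type contains two occurrences of \texttt{some}, with the mangled name identifying the owner declaration and the index selecting one of its opaque result types:
+\begin{Verbatim}
+public typealias X =
+    @_opaqueReturnTypeOf("$s5mince9ConcretePV1fQr_QR_tyF", 0) __
+public typealias Y =
+    @_opaqueReturnTypeOf("$s5mince9ConcretePV1fQr_QR_tyF", 1) __
+\end{Verbatim}
+When type resolution encounters such a reference in the main module, the mangled name is looked up in the per-source file table via \texttt{lookupOpaqueResultType()}; when it refers to a declaration in another module, \texttt{ASTBuilder::resolveOpaqueType()} takes its other path.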
+ +\subsection*{Runtime Representation} + +Key source files: +\begin{itemize} +\item \SourceFile{lib/IRGen/GenMeta.cpp} +\item \SourceFile{stdlib/public/runtime/MetadataLookup.cpp} +\end{itemize} + +\apiref{IRGenModule::emitOpaqueTypeDecl()}{method} +Emits an \IndexSource{opaque type descriptor}opaque type descriptor. + +\apiref{swift\char`_getOpaqueTypeMetadata2()}{function} +Runtime entry point to construct \IndexSource{runtime type metadata!opaque result type}runtime type metadata for a replacement type in an \IndexSource{opaque type descriptor}opaque type descriptor's underlying type substitution map. + +\apiref{swift\char`_getOpaqueTypeConformance2()}{function} +Runtime entry point to construct a \IndexSource{witness table!opaque result type}witness table for a conformance in an \IndexSource{opaque type descriptor}opaque type descriptor's underlying type substitution map. + +\subsubsection*{Specialization} + +Key source files: +\begin{itemize} +\item \SourceFile{include/swift/AST/Type.h} +\item \SourceFile{lib/AST/TypeSubstitution.cpp} +\end{itemize} + +\apiref{TypeExpansionContext}{class} +A \IndexSource{type expansion context}data type to encode a declaration context together with the \verb|@inlinable| flag. + +\apiref{swift::substOpaqueTypesWithUnderlyingTypes()}{function} +The three overloads of this function, for types, conformances, and substitution maps, \IndexSource{replace opaque archetypes with underlying types}implement \AlgRef{unwrap opaque type algorithm}, \AlgRef{unwrap opaque conformance algorithm}, and \AlgRef{unwrap opaque substitution map algorithm}. Each overload also takes a \texttt{TypeExpansionContext}. 
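+
+To make the role of the \texttt{TypeExpansionContext} concrete, consider the following source-level sketch; the declarations here are hypothetical, and the comments describe the intended behavior under library evolution rather than the exact implementation:
+\begin{Verbatim}
+public func f() -> some Equatable { return 42 }
+
+func g() -> Bool {
+  // Same module, non-inlinable context: the optimizer may
+  // specialize the opaque archetype of f() to its underlying
+  // type, Int.
+  return f() == f()
+}
+
+@inlinable public func h() -> Bool {
+  // The @inlinable flag in the type expansion context blocks
+  // specialization: this body may be inlined into a client
+  // built against a later version of the module, where the
+  // underlying type of f() might differ.
+  return f() == f()
+}
+\end{Verbatim}
+In the first case, the underlying type is fixed by the time the module is compiled; in the second, the abstraction boundary must be preserved even within the defining module.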
+ +\end{document} diff --git a/docs/Generics/chapters/opaque-return-types.tex b/docs/Generics/chapters/opaque-return-types.tex deleted file mode 100644 index d8ca6232133c9..0000000000000 --- a/docs/Generics/chapters/opaque-return-types.tex +++ /dev/null @@ -1,436 +0,0 @@ -\documentclass[../generics]{subfiles} - -\begin{document} - -\chapter[]{Opaque Return Types}\label{opaqueresult} - -\ifWIP - -\IndexDefinition{opaque return type} - -TODO: -\begin{itemize} -\item Say a few words about how the underlying type is inferred -\item Joe's thing where an opaque return type of a generic method cannot fulfill an associated type requirement -\end{itemize} - -An opaque return type hides a fixed concrete type behind a generic interface. Opaque return types are declared by defining a function, property or subscript return type with the \texttt{some} keyword: -\begin{Verbatim} -func foo() -> some P {...} -var bar: some P {...} -subscript() -> some P {...} -\end{Verbatim} -Opaque return types were first introduced in \IndexSwift{5.1}Swift 5.1 \cite{se0244}. The feature was generalized to allow occurrences of \texttt{some} structurally nested in other types, as well as multiple occurrences of \texttt{some}, in \IndexSwift{5.7}Swift 5.7 \cite{se0328}. - -The type that follows \texttt{some} is a constraint type, as defined in \SecRef{requirements}. The underlying type is inferred from \texttt{return} statements in the function body. There must be at least one return statement; if there is more than one, all must return the same concrete type. - -At the implementation level, a declaration has an associated \emph{opaque return type declaration} if \texttt{some} appears at least once in the declaration's return type. An opaque return type declaration stores three pieces of information: -\begin{enumerate} -\item A generic signature describing the \emph{interface} of the opaque return type, called the \emph{opaque interface generic signature}. 
-\item A generic environment instantiated from the opaque generic signature, called the \emph{opaque generic environment}, mapping each generic parameter to its opaque archetype. (Opaque generic environments also store a substitution map, described in the next section.) -\item A substitution map for this generic signature, called the \emph{underlying type substitution map}, mapping each generic parameter to its underlying type. The underlying type substitution map is the \emph{implementation} of the opaque type declaration, and callers from other modules cannot depend on its contents. (This is different from the substitution map stored inside the generic environment.) -\end{enumerate} -The opaque interface generic signature is built from the generic signature of the original function, property or subscript (the owning declaration). Each occurrence of the \texttt{some} keyword introduces a new generic parameter with a single requirement relating the generic parameter to the constraint type. - -The opaque generic environment describes the type of an opaque return type. When computing the interface type of the owner declaration, type resolution replaces each occurrence of the \texttt{some} keyword in the return type with the corresponding opaque archetype. - -The underlying substitution map is computed by analyzing the concrete types returned by one or more \texttt{return} statements appearing in the owner declaration's body. The underlying type substitution map is only needed when emitting the owner declaration, not when referencing the owner declaration. It is not computed if the body of the declaration appears in a secondary source file in batch mode (because the body is not parsed or type checked), or if the declaration was parsed from a \texttt{swiftinterface} file (because declaration bodies are not printed in module interfaces). 
- -\begin{example} -Consider the following declaration: -\begin{Verbatim} -struct Farm { - var horses: [Horse] = [] - - var hungryHorses: some Collection { - return horses.lazy.filter(\.isHungry) - } -} -\end{Verbatim} -The \texttt{hungryHorses} property has an associated opaque result type declaration, because \texttt{some} appears in its return type. - -The property appears in a non-generic context, so there is no parent generic signature. The return type has a single occurrence of \texttt{some}, so the opaque interface generic signature has a single generic parameter \texttt{\ttgp{0}{0}}. The constraint type is \texttt{Collection}, so the sugared generic requirement \texttt{\ttgp{0}{0}:\ Collection} desugars to a pair of requirements. The opaque generic signature is -\begin{quote} -\texttt{<\ttgp{0}{0} where \ttgp{0}{0}:\ Collection, \ttgp{0}{0}.[Sequence]Element == Horse>}. -\end{quote} - -The return statement's underlying type is \texttt{LazyFilterSequence>}, so the substitution map is -\begin{align*} -\SubstMapC{ -&\SubstType{\ttgp{0}{0}}{LazyFilterSequence>} -}{\\ -&\SubstConf{\ttgp{0}{0}}{LazyFilterSequence>}{Collection} -} -\end{align*} -The opaque generic environment has a single opaque archetype $\archetype{\ttgp{0}{0}}$ corresponding to the opaque interface generic signature's single generic parameter \ttgp{0}{0}. The interface type of the declaration \texttt{hungryHorses} is the opaque archetype $\archetype{\ttgp{0}{0}}$. 
-\end{example} - -\begin{example} -Consider the following declaration: -\begin{Verbatim} -func makePair(first: T, second: T) -> (some Collection, some Collection) { - return ([first], [second]) -} -\end{Verbatim} -The generic signature of the \texttt{makePair} function is -\begin{quote} -\texttt{<\ttgp{0}{0} where \ttgp{0}{0}:\ Equatable>} -\end{quote} - -The opaque interface generic signature is constructed from this with two additional generic parameters and requirements added for each of the two occurrences of \texttt{some} in the return type: -\begin{quote} -\texttt{<\ttgp{0}{0}, \ttgp{1}{0}, \ttgp{1}{1} where \ttgp{0}{0}:\ Equatable, \ttgp{1}{0}:\ Collection, \ttgp{1}{1}:\ Collection, \ttgp{1}{0}.[Collection]Element == \ttgp{1}{1}.[Collection]Element, \ttgp{1}{1}.[Collection]Element == \ttgp{0}{0}>} -\end{quote} - -The substitution map sends the outer generic parameter \ttgp{0}{0} to itself, and the two inner generic parameters \ttgp{1}{0}, \ttgp{1}{1} both to \texttt{Array<\ttgp{0}{0}>}. -\begin{align*} -\SubstMapC{ -&\SubstType{\ttgp{0}{0}}{\ttgp{0}{0}},\\ -&\SubstType{\ttgp{1}{0}}{Array<\ttgp{0}{0}>},\\ -&\SubstType{\ttgp{1}{1}}{Array<\ttgp{0}{0}>} -}{\\ -&\SubstConf{\ttgp{0}{0}}{\ttgp{0}{0}}{Equatable} -} -\end{align*} -The opaque generic environment has two opaque archetypes $\archetype{\ttgp{1}{0}}$ and $\archetype{\ttgp{1}{1}}$. The interface type of the function is -\begin{quote} -\texttt{<\ttgp{0}{0} where \ttgp{0}{0}:\ Equatable> (\ttgp{0}{0}) -> ($\archetype{\ttgp{1}{0}}$, \archetype{\ttgp{1}{1}})} -\end{quote} -\end{example} - -\fi - -\section[]{Opaque Archetypes}\label{opaquearchetype} - -\ifWIP - -\IndexDefinition{opaque archetype} - -TODO: -\begin{itemize} -\item opaque archetypes: global extent and same underlying concrete type -\item primary archetypes: lexically scoped and bound by caller -\item every type written with primary archetypes has an equivalent interface type representation. 
opaque archetypes don't correspond to any type parameter in our generic context's signature, etc. -\end{itemize} - -Opaque archetypes appear inside interface types, unlike primary archetypes. Also, unlike primary archetypes, opaque archetypes do not represent type parameters to be substituted by the caller. They behave differently from primary archetypes in two important respects: -\begin{itemize} -\item The \texttt{TypeBase::hasArchetype()} predicate does not detect their presence, since this predicate is asserted to be false for interface types of declarations. To check if a type contains opaque archetypes, use \texttt{TypeBase::hasOpaqueArchetype()}. - -\item The \texttt{Type::subst()} method does not replace opaque archetypes by default. For situations where opaque archetypes need to be replaced, \texttt{subst()} takes an optional set of flags. The \texttt{SubstFlags::SubstituteOpaqueArchetypes} flag can be passed in to enable replacement of opaque archetypes. Usually the lower level two-callback form of \texttt{subst()} is used with this flag, instead of the variant taking a substitution map. To differentiate between the two behaviors, let's call them ``opaque archetype replacement'' and ``opaque archetype substitution,'' respectively. -\end{itemize} -Opaque archetype substitution is the default and common case, but there is less to say about opaque archetype replacement, so let's discuss it first. - -\paragraph{Opaque archetype replacement} A notable appearance of the first behavior is for an optimization performed during SIL lowering. When a usage of an opaque return type appears in the same compilation unit as the definition (the same source file in batch mode, or the same module in whole-module mode), the opaque archetype can be safely replaced with its underlying type. 
This replacement is performed after type checking; the abstraction boundary between the opaque archetype's interface and implementation still exists as far as the type checker is concerned, but SIL optimizations can generate more efficient code with knowledge of the underlying type. SIL type lowering implements this with appropriately-placed calls to \texttt{Type::subst()} with an instance of the \texttt{ReplaceOpaqueTypesWithUnderlyingTypes} functor as the type replacement callback, and the \texttt{SubstFlags::SubstituteOpaqueArchetypes} flag set. - -\paragraph{Opaque archetype substitution} -Opaque archetypes are parameterized by the generic signature of the owner declaration. In the general case, the underlying type of an opaque archetype can depend on the generic parameters of the owner declaration. For this reason, each substitution map of the owner declaration's generic signature must produce a different opaque archetype. A new opaque generic environment is instantiated for each combination of an opaque return declaration and substitution map; the substitution map is stored in the opaque generic environment: -\[\left(\,\ttbox{OpaqueTypeDecl}\otimes \ttbox{SubstitutionMap}\,\right) \rightarrow \mathboxed{Opaque \texttt{GenericEnvironment}}\] - -\begin{algorithm}[Applying a substitution map to an opaque archetype]\label{opaquearchetypesubst} -As input, takes an opaque archetype $T$ and a substitution map $S$. As output, produces a new type (which is not necessarily an opaque archetype). -\begin{enumerate} -\item Let $G$ be the opaque generic environment of $T$. -\item Compose the original substitution map of $G$ with $S$ to produce the substituted substitution map $S'$. -\item Look up the opaque generic environment for the same opaque return type declaration as $T$ the substituted substitution map $S'$; call it $G'$. -\item Map the interface type of $T$ into $G'$ to produce the result, $T'$. 
-\end{enumerate} -\end{algorithm} - -TODO: figure with one generic signature, two generic environments, two archetypes. Arrow from generic signatures to generic environments, arrow from generic environment to archetype labeled ``map type into context''. arrow from one archetype to another labeled ''substitution''. - -\begin{example} Consider this definition: -\begin{Verbatim} -func underlyingType(_ t: T) -> some Equatable { return 3 } -\end{Verbatim} -The original declaration's generic signature has a single generic parameter, and the return type has a single occurrence of \texttt{some}, so the opaque interface generic signature has two generic parameters, both constrained to \texttt{Equatable}: -\begin{quote} -\texttt{<\ttgp{0}{0}, \ttgp{1}{0} where \ttgp{0}{0}:\ Equatable, \ttgp{1}{0}:\ Equatable>} -\end{quote} -The interface type of \texttt{underlyingType()} is a generic function type with an opaque archetype as the return type: -\begin{quote} -\texttt{<\ttgp{0}{0} where \ttgp{0}{0}:\ Equatable> (\ttgp{0}{0}) -> \$\ttgp{1}{0}} -\end{quote} -Consider the following three calls: -\begin{Verbatim} -let x = underlyingType(1) -let y = underlyingType(2) -let z = underlyingType("hello") -\end{Verbatim} -The types of \texttt{x}, \texttt{y} and \texttt{z} are constructed by applying substitution maps to the opaque archetype \texttt{\$\ttgp{1}{0}}. For \texttt{x} and \texttt{y}, the substitution map is the following: -\[ -\SubstMapC{ -\SubstType{\ttgp{0}{0}}{Int} -}{ -\SubstConf{\ttgp{0}{0}}{Int}{Equatable} -} -\] -For \texttt{z}, the substitution map is different: -\[ -\SubstMapC{ -\SubstType{\ttgp{0}{0}}{String} -}{ -\SubstConf{\ttgp{0}{0}}{String}{Equatable} -} -\] -Per \AlgRef{opaquearchetypesubst}, two new opaque generic environments are constructed from the opaque return type declaration of \texttt{underlyingType()} with each of the above two substitution maps. 
The substituted opaque archetypes are constructed by mapping the interface type \texttt{\ttgp{1}{0}} into each of the two opaque generic environments. - -Indeed, even though the generic parameter \texttt{T} and the value \texttt{t} are completely unused in the body of the \texttt{underlyingType()} function, each call of \texttt{underlyingType()} with a different specialization produces a different type. This can be observed by noting the behavior of the \texttt{Equatable} protocol's \texttt{==} operator; it expects both operands to have the same type: -\begin{Verbatim} -let x = underlyingType(1) -let y = underlyingType(2) -print(x == y) // okay - -let z = underlyingType("hello") -print(x == z) // type check error -\end{Verbatim} -The expression \texttt{x == y} type checks successfully, because \texttt{x} and \texttt{y} have the same type, an opaque archetype instantiated from the declaration of \texttt{underlyingType()} with the substitution \texttt{T := Int}. On the other hand, the expression \texttt{x == z} fails to type check, because \texttt{x} and \texttt{z} have different types; both originate from \texttt{underlyingType()}, but with different substitutions: -\begin{itemize} -\item the type of \texttt{x} was instantiated with \texttt{T := Int}, -\item the type of \texttt{z} was instantiated with \texttt{T := String}. -\end{itemize} -\end{example} -\begin{example} -The above behavior might seem silly, since the underlying type of \texttt{underlyingType()}'s opaque return type is always \texttt{Int}, irrespective of the generic parameter \texttt{T} supplied by the caller. 
However, since opaque return types introduce an abstraction boundary, it is in fact a source-compatible and binary-compatible change to redefine \texttt{underlyingType()} as follows: -\begin{Verbatim} -func underlyingType(_ t: T) -> some Equatable { return t } -\end{Verbatim} -Now, the underlying type is \texttt{T}; it would certainly not be valid to mix up the result of calling \texttt{underlyingType()} with an \texttt{Int} and \texttt{String}. -\end{example} - -The \texttt{GenericEnvironment::forOpaqueType()} method creates an opaque generic environment for a given substitution map, should you have occasion to do this yourself outside of the type substitution machinery. The opaque generic environment's substitution map plays a role beyond its use as a uniquing key for creating new opaque archetypes; it is also applied to the ``outer'' generic parameters of the opaque return type's interface signature when they are mapped into context. This is important when a same-type requirement equates an associated type of an opaque return type with a generic parameter of the owner declaration; the substituted opaque archetype will behave correctly when the associated type is projected. - -\begin{example} -In this example, the specialization of the original declaration is ``exposed'' via a same-type requirement on the opaque return type. -\begin{Verbatim} -func sequenceOfOne(_ elt: Element) -> some Sequence { - return [elt] -} - -let result = sequenceOfOne(3) -var iterator = result.makeIterator() -let value: Int = iterator.next()! -\end{Verbatim} -Let's walk through the formalities first. 
The opaque interface generic signature: -\begin{quote} -\texttt{<\ttgp{0}{0}, \ttgp{1}{0} where \ttgp{1}{0}:\ Sequence, \ttgp{1}{0}.Element == \ttgp{0}{0}>} -\end{quote} -The interface type of \texttt{sequenceOfOne()} has an opaque archetype in return position: -\begin{quote} -\texttt{<\ttgp{0}{0}> (\ttgp{0}{0}) -> $\archetype{\ttgp{1}{0}}$} -\end{quote} -The type of \texttt{result} is the substituted opaque archetype with the substitution map \texttt{T := Int}, which is the substitution map of the call to \texttt{sequenceOfOne()}. For lack of a better notation, call this archetype $\archetype{\ttgp{1}{0}}$. - -\IndexSelf -The type of \texttt{iterator} is calculated by applying a substitution replacing the protocol \tSelf\ type with the type of \texttt{result} to the return type of the \texttt{makeIterator()} requirement of the \texttt{Sequence} protocol. The type of \texttt{result} is the substituted opaque archetype we're calling \archetype{\ttgp{1}{0}} above, so the type of \texttt{iterator} is the substituted opaque archetype \archetype{\ttgp{1}{0}.Iterator} from the same substituted opaque generic environment as \archetype{\ttgp{1}{0}}. - -What about \texttt{value}? The \texttt{next()} requirement of the \texttt{IteratorProtocol} protocol returns the \texttt{Self.Element} associated type of \texttt{IteratorProtocol}. We're substituting \tSelf\ here with the type of \texttt{iterator}, which is \texttt{\ttgp{1}{0}.Iterator}. This means the type of \texttt{value} can be computed by mapping the type parameter \texttt{\ttgp{1}{0}.Iterator.Element} into the substituted opaque generic environment. - -The type parameter \texttt{\ttgp{0}{0}.Iterator.Element} is equivalent to \texttt{\ttgp{0}{0}} in the opaque interface generic signature. So mapping \texttt{\ttgp{1}{0}.Iterator.Element} into our substituted opaque generic environment applies the substitution map to the interface type \texttt{\ttgp{0}{0}}. This is just \texttt{Int}. 
So the type of \texttt{value} is \texttt{Int}! -\end{example} - -\fi - -\section[]{Referencing Opaque Archetypes}\label{reference opaque archetype} - -\ifWIP - -\index{AST printer} -TODO: right now, my understanding is the parser puts all OpaqueTypeDecls in a big list, and the first time you do a type reconstruction, we go and compute the interface type of each entry in that list, which implicitly populates a second mangled name -> OpaqueTypeDecl map, and then we go and look in that second map - -\index{associated type inference} -Opaque return types are different from other type declarations in that the \texttt{some P} syntax serves to both declare an opaque return type, and immediately reference the declared type. It is however possible to reference an opaque return type of an existing declaration from a different context. The trick is to use associated type inference to synthesize a type alias whose underlying type is the opaque return type, and then reference this type alias. This can be useful when writing tests to exercise an opaque return type showing up in compiler code paths that might not expect them. - -\begin{example} The normal conformance \texttt{ConcreteP:\ P} in \ListingRef{reference opaque return type} shows how an opaque archetype can witness an associated type requirement. The method \texttt{ConcreteP.f()} witnesses the protocol requirement \texttt{P.f()}. The return type of \texttt{ConcreteP.f()} is a tuple type of two opaque archetypes, and the type witnesses for the \texttt{X} and \texttt{Y} associated types are inferred to be the first and second of these opaque archetypes, respectively. Associated type inference synthesizes two type aliases, \texttt{ConcreteP.X} and \texttt{ConcreteP.Y}, which are then referenced further down in the program: -\begin{enumerate} -\item The global variable \texttt{mince} has an explicit type \texttt{(ConcreteP.X,~ConcreteP.Y)}. 
-\item The function \texttt{pie()} declares a same-type requirement whose right hand side is the type alias \texttt{ConcreteP.X}. -\end{enumerate} - -\begin{listing}\captionabove{Referencing an opaque return type via associated type inference}\label{reference opaque return type} -\begin{Verbatim} -public protocol P { - associatedtype X: Q - associatedtype Y: Q - - func f() -> (X, Y) -} - -public protocol Q {} - -public struct ConcreteP: P { - public func f() -> (some Q, some Q) { - return (FirstQ(), SecondQ()) - } -} - -public struct FirstQ: Q {} -public struct SecondQ: Q {} - -public let mince: (ConcreteP.X, ConcreteP.Y) = ConcreteP().f() - -public func pie(_: S) where S.Element == ConcreteP.X {} -\end{Verbatim} -\end{listing} -\end{example} - -\index{synthesized declaration} -\index{associated type inference} -The above trick allows referencing opaque return types, albeit indirectly. Is there a way to write down the underlying type of the type aliases \texttt{ConcreteP.X} and \texttt{ConcreteP.Y}? The answer is yes, but only in module interface files and textual SIL, not source code. Module interface files explicitly spell out all type aliases synthesized by associated type inference, avoiding the need to perform associated type inference when building the interface file in another compilation job. Textual SIL similarly needs to spell out the type of the value produced by each SIL instruction. - -\index{mangling} -A direct reference to an opaque return type is expressed in the grammar as a type attribute encoding the mangled name of the owner declaration together with an index: -\begin{quote} -\texttt{@\_opaqueReturnTypeOf("\underline{mangled name}", \underline{index}) \underline{identifier}} -\end{quote} -The mangled name unambiguously identifies the owner declaration. The index identifies a specific opaque archetype among several when the owner declaration's return type contains multiple occurrences of \texttt{some}. 
The identifier is ignored; in the Swift language grammar, a type attribute must apply to some underlying type representation, so by convention module interface printing outputs ``\texttt{\_\_}'' as the underlying type representation. - -\begin{example} -\ListingRef{reference opaque return type from interface} shows the generated module interface for \ListingRef{reference opaque return type}, with some line breaks inserted for readability. -\begin{listing}\captionabove{References to opaque return types in a module interface}\label{reference opaque return type from interface} -\begin{Verbatim} -public protocol P { - associatedtype X : mince.Q - associatedtype Y : mince.Q - func f() -> (Self.X, Self.Y) -} -public protocol Q { -} -public struct ConcreteP : mince.P { - public func f() -> (some mince.Q, some mince.Q) - - public typealias X = @_opaqueReturnTypeOf("$s5mince9ConcretePV1fQr_QR_tyF", 0) __ - public typealias Y = @_opaqueReturnTypeOf("$s5mince9ConcretePV1fQr_QR_tyF", 1) __ -} -public struct FirstQ : mince.Q { -} -public struct SecondQ : mince.Q { -} - -public let mince: (@_opaqueReturnTypeOf("$s5mince9ConcretePV1fQr_QR_tyF", 0) __, - @_opaqueReturnTypeOf("$s5mince9ConcretePV1fQr_QR_tyF", 1) __) -public func pie(_: S) - where S : Swift.Sequence, - S.Element == @_opaqueReturnTypeOf("$s5mince9ConcretePV1fQr_QR_tyF", 0) __ -\end{Verbatim} -\end{listing} -\end{example} - -A direct reference to a substituted opaque archetype is written like a generic argument list following the identifier. The generic arguments correspond to the flattened list of generic parameters in the generic signature of the opaque archetype's owner declaration. - -\begin{example} -In \ListingRef{substituted opaque archetype reference}, the conformance is declared on the \texttt{Derived} class, but the type witness for \texttt{X} is an opaque archetype from a method on \texttt{Outer.Inner}. 
The superclass type of \texttt{Derived} is \texttt{Outer<Int>.Inner<String>}, so a substitution map is applied to the opaque archetype: -\[ -\SubstMap{ -\SubstType{T}{Int}\\ -\SubstType{U}{String} -} -\] -In the module interface file, this prints as the generic argument list \texttt{<Swift.Int, Swift.String>}, as shown in \ListingRef{substituted opaque archetype reference interface}. -\end{example} -\begin{listing}\captionabove{Source code with a substituted opaque archetype as a type witness}\label{substituted opaque archetype reference} -\begin{Verbatim} -public protocol P { - associatedtype X: Q - - func f() -> X -} - -public protocol Q {} - -public struct ConcreteQ: Q {} - -public class Outer<T> { - public class Inner<U> { - public func f() -> some Q { - return ConcreteQ() - } - } -} - -public class Derived: Outer<Int>.Inner<String>, P {} -\end{Verbatim} -\end{listing} -\begin{listing}\captionabove{Module interface with a substituted opaque archetype as a type witness}\label{substituted opaque archetype reference interface} -\begin{Verbatim} -public protocol P { - associatedtype X : mince.Q - func f() -> Self.X -} -public protocol Q { -} -public struct ConcreteQ : mince.Q { -} -public class Outer<T> { - public class Inner<U> { - public func f() -> some mince.Q - } -} -public class Derived : mince.Outer<Swift.Int>.Inner<Swift.String>, mince.P { - public typealias X = @_opaqueReturnTypeOf("$s5mince5OuterC5InnerC1fQryF", 0) - __<Swift.Int, Swift.String> -} -\end{Verbatim} -\end{listing} - -\section[]{Runtime Representation} - -At runtime, an instance of an opaque archetype must be manipulated abstractly, similar to a generic parameter. This mechanism allows the underlying type of an opaque return type to change without breaking callers in other modules. - -Recall that an opaque type declaration consists of an opaque interface generic signature, and an underlying type substitution map for this generic signature. - -The opaque interface generic signature is the \emph{interface} of the opaque type declaration.
The underlying type substitution map is the \emph{implementation} of the opaque type declaration. For each generic parameter and conformance requirement, the compiler emits an accessor function. Each accessor function returns the corresponding concrete type metadata or witness table from the substitution map. - -Opaque archetypes are also parameterized by the owner declaration's generic signature. The generic parameters and conformance requirements of the owner declaration become the input parameters of these accessor functions. - -Note the symmetry here between a function's ``input'' generic parameters and conformance requirements, which become input parameters, and the opaque type declaration's ``output'' generic parameters and conformance requirements, which become calls to accessor functions. The caller provides a substitution map for the ``input'' parameters by passing in concrete type metadata and witness tables. The opaque type declaration provides a substitution map for the ``output'' parameters by emitting an accessor function to return the concrete type metadata and witness tables. - -TODO: figure - -\begin{example} -The following generic function declares an opaque return type: -\begin{Verbatim} -func uniqueElements<E: Hashable>(_ elts: [E]) -> some Sequence {...} -\end{Verbatim} -The calling convention for \texttt{uniqueElements()} receives the type metadata for \texttt{E} together with a witness table for \texttt{E:\ Hashable} as lowered arguments. The return value is an instance of an opaque archetype, and is returned indirectly. - -In order to allocate a buffer of the correct size to hold the return value prior to making the call and to manipulate the return value after the call, the caller invokes the opaque type metadata accessor for \texttt{uniqueElements()}.
The metadata accessor also takes the type metadata for \texttt{E} together with a witness table for \texttt{E:\ Hashable}, since the underlying type is parameterized by the generic signature of \texttt{uniqueElements()}. - -Finally, the witness table for the conformance of the underlying type to \texttt{Sequence} is obtained by calling the opaque type witness table accessor for \texttt{uniqueElements()}, which again takes the type metadata for \texttt{E} together with a witness table for \texttt{E:\ Hashable}. -\end{example} - -\fi - -\section[]{Source Code Reference} - -\iffalse - -TODO: - -\begin{description} -\item[\texttt{TypeBase}] The base class of the Swift type hierarchy. -\begin{itemize} -\item \texttt{hasOpaqueArchetype()} Returns true if the type contains an opaque archetype. -\end{itemize} -\item[\texttt{OpaqueTypeArchetypeType}] The class of opaque archetypes. -\begin{itemize} -\item \texttt{getOpaqueDecl()} Returns the opaque type declaration that owns this archetype. -\item \texttt{getSubstitutions()} Returns substitutions applied to this archetype's generic environment. Initially this is an identity substitution map. -\end{itemize} -\item[\texttt{OpaqueTypeDecl}] An opaque type declaration. -\begin{itemize} -\item \texttt{getNamingDecl()} Returns the original declaration having this opaque return type. -\item \texttt{getOpaqueInterfaceGenericSignature()} Returns the generic signature describing the opaque return types and their requirements. -\item \texttt{getUniqueUnderlyingTypeSubstitutions()} Returns the substitution map describing the underlying types of the opaque archetypes. Will return \texttt{None} if the underlying types have not been computed yet (or if they will never be computed because the original declaration's body is not available). -\end{itemize} - -\item[\texttt{GenericEnvironment}] A mapping from type parameters to archetypes with respect to a generic signature. 
-\begin{itemize} -\item \texttt{forOpaqueType()} Returns the unique opaque generic environment for an opaque return type declaration and substitution map. -\end{itemize} - -\end{description} - -\fi - -\end{document} diff --git a/docs/Generics/chapters/preface.tex b/docs/Generics/chapters/preface.tex index f0e5c191c437a..a00ceca74a84e 100644 --- a/docs/Generics/chapters/preface.tex +++ b/docs/Generics/chapters/preface.tex @@ -9,56 +9,54 @@ \lettrine{T}{his is a book} about the implementation of generic programming---also known as parametric polymorphism---in the \index{Swift}Swift compiler. You won't learn how to \emph{write} generic code in Swift here; the best reference for that is, of course, the official language guide \cite{tspl}. This book is intended mainly for Swift compiler developers who interact with the generics implementation, other language designers who want to understand how Swift evolved, Swift programmers curious to peek under the hood, and finally, mathematicians interested in a practical application of string rewriting and the Knuth-Bendix completion procedure. -From the compiler developer's point of view, the \emph{user} is the developer writing the code being compiled. The declarations, types, statements and expressions written in the user's program become \emph{data structures} the compiler must analyze and manipulate. I assume some basic familiarity with these concepts, and compiler construction in general. For background reading, I recommend \cite{muchnick1997advanced}, \cite{cooper2004engineering}, \cite{craftinginterpreter}, and \cite{incrementalracket}. +From the compiler developer's point of view, the \emph{user} is the developer writing the code being compiled. The declarations, types, statements, and expressions that comprise the user's program become \emph{data structures} the compiler must then analyze and manipulate. I assume some basic familiarity with these concepts, and compiler construction in general. 
For background reading, see \cite{muchnick1997advanced,cooper2004engineering,craftinginterpreter,incrementalracket}. This book is divided into five parts. \PartRef{part syntax} gives a high-level overview of the Swift compiler architecture, and describes how types and declarations, and specifically, generic types and declarations, are modeled by the compiler. \begin{itemize} -\item \ChapRef{roadmap} summarizes every key concept in the generics implementation with a series of worked examples, and surveys capabilities for generic programming found in other programming languages. -\item \ChapRef{compilation model} covers Swift's compilation model and module system as well as the \emph{request evaluator}, which adds an element of lazy evaluation to the typical ``compilation pipeline'' of parsing, type checking and code generation. -\item \ChapRef{types} describes how the compiler models the \emph{types} of values declared by the source program. Types form a miniature language of their own, and we often find ourselves taking them apart and re-assembling them in new ways. Generic parameter types, dependent member types, and generic nominal types are the three fundamental kinds; others are also summarized. -\item \ChapRef{decls} is about \emph{declarations}, the building blocks of Swift code. Functions, structs, and protocols are examples of declarations. Various kinds of declarations can be \emph{generic}. There is a common syntax for declaring generic parameters and stating \emph{requirements}. Protocols can declare associated types and impose \emph{associated requirements} on their associated types in a similar manner. +\item \ChapRef{chap:introduction} summarizes every key concept in the generics implementation with a series of worked examples, and surveys capabilities for generic programming found in other programming languages. 
+\item \ChapRef{chap:compilation model} covers Swift's compilation model and module system as well as the \emph{request evaluator}, which adds an element of lazy evaluation to the typical ``compilation pipeline'' of parsing, type checking and code generation. +\item \ChapRef{chap:types} describes how the compiler models the \emph{types} of values declared by the source program. Types form a miniature language of their own, and we often find ourselves taking them apart and re-assembling them in new ways. Generic parameter types, dependent member types, and generic nominal types are the three fundamental kinds; others are also summarized. +\item \ChapRef{chap:decls} is about \emph{declarations}, the building blocks of Swift code. Functions, structs, and protocols are examples of declarations. Various kinds of declarations can be \emph{generic}. There is a common syntax for declaring generic parameters and stating \emph{requirements}. Protocols can declare associated types and impose \emph{associated requirements} on their associated types in a similar manner. \end{itemize} -\PartRef{part semantics} focuses on the core \emph{semantic} objects in the generics implementation. To grasp the mathematical asides, it helps to have had some practice working with definitions and proofs, at the level of an introductory course in calculus, linear algebra or combinatorics. A summary of basic math appears in \AppendixRef{math summary}. +\PartRef{part semantics} focuses on the core \emph{semantic} objects in the generics implementation. To grasp the mathematical asides, it helps to have had some practice working with definitions and proofs, at the level of an introductory course in calculus, linear algebra, or combinatorics. A summary of basic math appears in \AppendixRef{math summary}. \begin{itemize} -\item \ChapRef{genericsig} defines the \emph{generic signature}, which collects the generic parameters and explicit requirements of a generic declaration. 
The explicit requirements of a generate signature generate a set of \emph{derived requirements} and \emph{valid type parameters}, which explains how we type check code \emph{inside} a generic declaration. This formalism is realized in the implementation via \emph{generic signature queries}. +\item \ChapRef{chap:generic signatures} defines the \emph{generic signature}, which collects the generic parameters and explicit requirements of a generic declaration. The explicit requirements of a generic signature generate a set of \emph{derived requirements} and \emph{valid type parameters}, which explains how we type check code \emph{inside} a generic declaration. This formalism is realized in the implementation via \emph{generic signature queries}. -\item \ChapRef{substmaps} defines the \emph{substitution map}, a mapping from generic parameter types to replacement types. The \emph{type substitution algebra} will explain the operations of type substitution and substitution map composition. This explains how we type check a \emph{reference} to (sometimes called a \emph{specialization} of) a generic declaration. +\item \ChapRef{chap:substitution maps} defines the \emph{substitution map}, a mapping from generic parameter types to replacement types. The \emph{type substitution algebra} will explain the operations of type substitution and substitution map composition. This explains how we type check a \emph{reference} to (sometimes called a \emph{specialization} of) a generic declaration. -\item \ChapRef{conformances} defines the \emph{conformance}, a description of how a concrete type fulfills the requirements of a protocol, in particular its associated types.
In the type substitution algebra, conformances are to protocols what substitution maps are to generic signatures. -\item \ChapRef{genericenv} defines \emph{archetypes} and \emph{generic environments}, two abstractions used throughout the compiler. Also describes the \emph{type parameter graph} that gives us an intuitive visualization of a generic signature. +\item \ChapRef{chap:archetypes} defines \emph{archetypes} and \emph{generic environments}, two abstractions used throughout the compiler. Also describes the \emph{type parameter graph} that gives us an intuitive visualization of a generic signature. -\item \ChapRef{typeresolution} describes \emph{type resolution}, which uses name lookup and substitution to resolve syntactic representations to semantic types. Checking if a substitution map satisfies the requirements of its generic signature links our two formalisms. +\item \ChapRef{chap:type resolution} describes \emph{type resolution}, which uses name lookup and substitution to resolve syntactic representations to semantic types. Checking if a substitution map satisfies the requirements of its generic signature links our two formalisms. \end{itemize} -\PartRef{part specialties} covers some additional language features and compiler internals, while further developing the derived requirements formalism and type substitution: +\PartRef{part subtleties} covers some additional language features and compiler internals, while further developing the derived requirements formalism and type substitution: \begin{itemize} -\item \ChapRef{extensions} discusses extension declarations, which add members and conformances to existing types. Extensions can also declare \emph{conditional conformances}, which have some interesting behaviors. +\item \ChapRef{chap:extensions} discusses extension declarations, which add members and conformances to existing types. Extensions can also declare \emph{conditional conformances}, which have some interesting behaviors. 
-\item \ChapRef{building generic signatures} explains how we build a generic signature from syntax written in source, and gives a formal description of \emph{requirement minimization}. This chapter also shows how invalid requirements are diagnosed, and defines a \emph{well-formed generic signature} as one that passes these checks. +\item \ChapRef{chap:building generic signatures} explains how we build a generic signature from syntax written in source, and gives a formal description of \emph{requirement minimization}. This chapter also shows how invalid requirements are diagnosed, and defines a \emph{well-formed generic signature} as one that passes these checks. \item \ChapRef{conformance paths} shows that \emph{conformance paths} give us a way to evaluate expressions in the type substitution algebra, which completes the formalism. The concept of a \emph{recursive conformance} is explored, and finally, the type substitution algebra is shown to be Turing-complete. -\item \ChapRef{opaqueresult} is unfinished. It will describe opaque return types. +\item \ChapRef{chap:opaque result types} describes \emph{opaque result types}, another kind of generic abstraction that allows a function to hide its concrete return type from its caller, by specifying the generic requirements this return type satisfies instead. This capability is built in terms of generic signatures, substitution maps, and archetypes. -\item \ChapRef{existentialtypes} is unfinished. It will describe existential types. +\item \ChapRef{chap:existential types} is unfinished. It will describe existential types. \end{itemize} -\PartRef{part rqm} describes the Requirement Machine, a \emph{decision procedure} for the derived requirements formalism. 
The original contribution here is that generic signature queries and requirement minimization are problems in the theory of \emph{string rewriting}: +\PartRef{part rqm} describes the Requirement Machine, a \emph{decision procedure} for the derived requirements of \ChapRef{chap:generic signatures}. The original contribution here is that generic signature queries and requirement minimization are problems in the theory of \emph{string rewriting}: \begin{itemize} \item \ChapRef{rqm basic operation} gives a high level overview of how both generic signature queries and requirement minimization recursively build a \emph{requirement machine} for a generic signature from the requirement machines of its \emph{protocol components}. -\item \ChapRef{monoids} introduces \emph{finitely-presented monoids} and the \emph{word problem}, and then presents the theoretical result that a finitely-presented monoid can be encoded as a generic signature, such that word problems become generic signature queries. Therefore, derived requirements are \emph{at least} as expressive as the word problem; that is, undecidable in the general case. +\item \ChapRef{monoids} introduces \emph{finitely-presented monoids} and the \emph{word problem}, and then presents the theoretical result that a finitely-presented monoid can be encoded as a generic signature, such that word problems become generic signature queries. Therefore, generic signature queries are \emph{at least} as hard as the word problem; that is, undecidable in the general case. -\item \ChapRef{symbols terms rules} goes in the other direction and shows that a generic signature can be encoded in the form of a finitely-presented monoid, such that generic signature queries become word problems. Therefore, derived requirements are \emph{at most} as expressive as the word problem---which can be solved in many cases using known techniques. This is the heart of our decision procedure. 
+\item \ChapRef{chap:symbols terms rules} goes in the other direction and shows that a generic signature can be encoded in the form of a finitely-presented monoid, such that generic signature queries become word problems. Thus, derived requirements are \emph{at most} as hard as the word problem---and the word problem can be solved in many cases using known techniques. This is the heart of our decision procedure. -\item \ChapRef{completion} describes the Knuth-Bendix algorithm, which attempts to solve the word problem by constructing a \emph{convergent rewriting system}. Fundamental generic signature queries can then be answered via the \emph{normal form algorithm}. This is the brain of our decision procedure. +\item \ChapRef{chap:completion} describes the Knuth-Bendix algorithm, which attempts to solve the word problem by constructing a \emph{convergent rewriting system}. Fundamental generic signature queries can then be answered via the \emph{normal form algorithm}. This is the brain of our decision procedure. \item \ChapRef{propertymap} is unfinished. It will describe the construction of a \emph{property map} from a convergent rewriting system; the property map answers trickier generic signature queries. -\item \ChapRef{concrete conformances} is unfinished. It will describe the handling of concrete types in the Requirement Machine. - \item \ChapRef{rqm minimization} is unfinished. It will present the algorithm for rewrite rule minimization, which is the final step in building a new generic signature. \end{itemize} @@ -66,12 +64,14 @@ Occasional \IndexDefinition{history}historical asides talk about how things came to be. Starting with Swift~2.2, the design of the Swift language has been guided by the Swift evolution process, where language changes are pitched, debated, and formalized in the open \cite{evolution}. I will cite Swift evolution proposals when describing various language features. 
You will find a lot of other interesting material in the bibliography as well, not just evolution proposals. -This book does not say much about the runtime side of the separate compilation of generics, except for a brief overview of the model in relation to the type checker in \ChapRef{roadmap}. To learn more, I recommend watching a pair of LLVM Developer's Conference talks: \cite{llvmtalk} which gives a summary of the whole design, and \cite{cvwtalk} which describes some recent optimizations. +This book does not say much about the runtime side of the separate compilation of generics, except for a brief overview of the model in relation to the type checker in \ChapRef{chap:introduction}. To learn more, I recommend watching a pair of LLVM Developer's Conference talks: \cite{llvmtalk} which gives a summary of the whole design, and \cite{cvwtalk} which describes some recent optimizations. -Also, while most of the material should be current as of Swift~6, two recent language extensions are not covered. These features are mostly additive and can be understood by reading the evolution proposals: +Also, while most of the material should be current as of Swift~6.2, some recent language extensions are not covered. These features are mostly additive and can be understood by reading the evolution proposals: \begin{enumerate} -\item \index{parameter pack}Parameter packs, also known as \index{variadic generics}variadic generics (\cite{se0393}, \cite{se0398}, \cite{se0399}). -\item \index{noncopyable type}Noncopyable types (\cite{se0390}, \cite{se0427}). +\item \index{parameter pack}Parameter packs (or \index{variadic generics}variadic generics), introduced in \IndexSwift{5.9}Swift~5.9 (\cite{se0393,se0398,se0399}). +\item \index{noncopyable type}Noncopyable types, introduced in \IndexSwift{6.0}Swift~6 (\cite{se0390,se0427}). +\item \index{nonescapable type}Nonescapable types, introduced in \IndexSwift{6.2}Swift~6.2 (\cite{se0446}). 
+\item Integer generic parameters, introduced in \IndexSwift{6.2}Swift~6.2 (\cite{se0452}). \end{enumerate} \section*{Source Code} diff --git a/docs/Generics/chapters/property-map.tex b/docs/Generics/chapters/property-map.tex index 930598eda6aef..fd305aea68f47 100644 --- a/docs/Generics/chapters/property-map.tex +++ b/docs/Generics/chapters/property-map.tex @@ -10,14 +10,10 @@ % Remove these %%%% \newcommand{\proto}[1]{\texttt{#1}} -\usepackage{bm} \newcommand{\namesym}[1]{\mathrm{#1}} -\newcommand{\genericparam}[1]{\bm{\mathsf{#1}}} +\newcommand{\genericparam}[1]{\mathsf{#1}} \newcommand{\gensig}[2]{\langle #1\;\textit{where}\;#2\rangle} -\newcommand{\genericsym}[2]{\bm{\uptau}_{#1,#2}} -\DeclareMathOperator{\gpdepth}{depth} -\DeclareMathOperator{\gpindex}{index} -\DeclareMathOperator{\domain}{domain} +\newcommand{\genericsym}[2]{\uptau_{#1,#2}} %%%%% Until now, you've seen how to solve the \texttt{requiresProtocol()} generic signature @@ -280,6 +276,44 @@ \IndexTwoFlag{debug-requirement-machine}{concrete-unification} +\section[]{Concrete Conformances}\label{rqm concrete conformances} + +\ifWIP +TODO: +\begin{itemize} +\item Concrete conformance rule, property-like +\item Virtual rule that introduces it +\item Idea: it should eliminate the conformance rule but not the concrete type rule +\item Doesn't actually appear in signature so should not impact minimization +\item Conditional requirement inference, only in generic signatures and not protocols, because we can't merge connected components during completion. 
for a generic signature this actually has to import new components in the general case +\end{itemize} +\fi + +\IndexTwoFlag{debug-requirement-machine}{concretize-nested-types} + +\ifWIP +TODO: +\begin{itemize} +\item Concrete type witness +\item Abstract type witness +\item Virtual rules +\item Algorithm for building a ``relative'' concrete type symbol from substituting another symbol's pattern type +\end{itemize} + +TODO: +\begin{itemize} +\item Free conformances +\item Can a protocol have a free conformance +\item Can a conformance be made free by changing the protocol +\item Conformance evaluation graph +\item Heuristic to find same-type requirements from a conformance; just the parent type thing +\item The problem with opaque archetypes +\item Open question: can we encode a conformance more directly without evaluating it; \verb|G>>| example +\end{itemize} +\fi + +\IndexFlag{enable-requirement-machine-opaque-archetypes} + \section[]{Generic Signature Queries}\label{implqueries} \ifWIP @@ -306,7 +340,7 @@ Layout symbols store a layout constraint as an instance of the \texttt{LayoutConstraint} class. The join operation used in the implementation of the \texttt{requiresClass()} query is defined in the \texttt{merge()} method on \texttt{LayoutConstraint}. -You've already seen the \texttt{requiresProtocol()} query in \ChapRef{symbols terms rules}, where it was shown that it can be implemented by checking if $\Lambda(T).\pP\downarrow\Lambda(T)$. The property map implementation is perhaps slightly more efficient, since it only simplifies a single term and not two. The $\texttt{requiresClass()}$ and $\texttt{isConcreteType()}$ queries are new on the other hand, and demonstrate the power of the property map. With the rewrite system alone, they cannot be implemented except by exhaustive enumeration over all known layout and concrete type symbols. 
+You've already seen the \texttt{requiresProtocol()} query in \ChapRef{chap:symbols terms rules}, where it was shown that it can be implemented by checking if $\Lambda(T).\pP\downarrow\Lambda(T)$. The property map implementation is perhaps slightly more efficient, since it only simplifies a single term and not two. The $\texttt{requiresClass()}$ and $\texttt{isConcreteType()}$ queries are new on the other hand, and demonstrate the power of the property map. With the rewrite system alone, they cannot be implemented except by exhaustive enumeration over all known layout and concrete type symbols. All of the subsequent examples reference the protocol definitions from \ExRef{propmapexample3}, and the resulting property map shown in Table~\ref{propmapexample2table}. \begin{example} Consider the canonical type term $\genericsym{0}{0}.\assocsym{P}{B}$. This type parameter conforms to $\proto{Q}$ via a requirement stated in the generic signature, and also to $\proto{R}$, because $\proto{Q}$ inherits from $\proto{R}$. The $\texttt{requiresProtocol()}$ query will confirm these facts, because the property map entry for $\genericsym{0}{0}.\assocsym{P}{B}$ contains the protocol symbols $\pQ$ and $\pR$: diff --git a/docs/Generics/chapters/substitution-maps.tex b/docs/Generics/chapters/substitution-maps.tex index 09573bbd86f1b..489c283153e87 100644 --- a/docs/Generics/chapters/substitution-maps.tex +++ b/docs/Generics/chapters/substitution-maps.tex @@ -2,339 +2,598 @@ \begin{document} -\chapter{Substitution Maps}\label{substmaps} +\chapter{Substitution Maps}\label{chap:substitution maps} -\lettrine{S}{ubstitution maps arise} when the type checker needs to reason about a reference to a generic declaration, specialized with list of generic arguments. 
Abstractly, a \IndexDefinition{substitution map}substitution map defines a \IndexDefinition{replacement type}replacement type corresponding to each type parameter of a generic signature; applying a substitution map to the interface type of a generic declaration recursively replaces the type parameters therein, producing the type of the specialized reference. +\lettrine{S}{ubstitution maps} are the semantic objects that describe \emph{references} to generic declarations. If we think of a generic signature as a \emph{contract} between a declaration and its client, then the previous chapter's derived requirements formalism gave us one side of this contract: within the \emph{body} of a generic declaration, we start from the assumption that the concrete replacement types, whatever they might be, must satisfy the generic signature's explicit requirements, and from this, we derive various consequences which then hold. In this chapter, we turn our attention to the client's side of this contract, which leads to the study of \emph{type substitution}. -The \index{generic signature!type substitution}generic signature of a substitution map is called the \IndexDefinition{input generic signature}\emph{input generic signature}. 
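To make the two sides of this contract concrete, here is a small standalone example (hypothetical; not taken from the book's listings). The body of \texttt{allEqual()} reasons from the assumption that its requirements hold, while each caller must discharge those requirements with its chosen replacement types:
\begin{Verbatim}
func allEqual<S: Sequence>(_ s: S) -> Bool where S.Element: Equatable {
  // Body side of the contract: we may apply == to values of type
  // S.Element, because S.Element: Equatable is assumed to hold.
  var iter = s.makeIterator()
  guard let first = iter.next() else { return true }
  while let next = iter.next() {
    if next != first { return false }
  }
  return true
}

// Client side of the contract: the replacement types, here
// S := [Int] and thus S.Element := Int, satisfy the requirements.
let ok = allEqual([1, 1, 1])
\end{Verbatim}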
A substitution map stores its input generic signature, and the generic signature's list of generic parameters and \index{conformance}conformance requirements determine the substitution map's shape: -\begin{quote} -\texttt{<\ttbox{A}, \ttbox{B} where \ttbox{B:\ Sequence}, B.[Sequence]Element == Int>} -\end{quote} -Formally, a substitution map consists of a replacement type for each generic parameter, and a conformance for each conformance requirement: -\begin{quote} -\begin{tabular}{ccc} -\ttbox{A}&\ttbox{B}&\ttbox{B:\ Sequence}\\ -$\Downarrow$&$\Downarrow$&$\Downarrow$\\ -\ttbox{String}&\ttbox{Array<Int>}&\ttbox{Array<Int>:\ Sequence} -\end{tabular} -\end{quote} -We can collect all of the above information in a table: -\begin{quote} -\begin{tabular}{|lcl|} -\hline -\rule{0pt}{3ex}\textbf{Generic parameters}&&\textbf{Replacement types}\\ -\texttt{A}&$\mapsto$&\texttt{String}\\ -\texttt{B}&$\mapsto$&\texttt{Array<Int>}\\[\medskipamount] -\textbf{Conformance requirements}&&\textbf{Conformances}\\ -$\ConfReq{B}{Sequence}$&$\mapsto$&$\ConfReq{Array<Int>}{Sequence}$\\[\medskipamount] -\hline -\end{tabular} -\end{quote} -Or more concisely,\index{$\mapsto$}\index{$\mapsto$!z@\igobble|seealso{substitution map}} -\begin{align*} -\SubstMapC{ -&\SubstType{A}{String},\\ -&\SubstType{B}{Array<Int>} -}{\\ -&\SubstConf{B}{Array<Int>}{Sequence} -} -\end{align*} +\paragraph{Input generic signature.} +The ``shape'' of a substitution map is determined by its \IndexDefinition{input generic signature}\emph{input generic signature}. In this chapter, we will primarily focus on the simplest case, where this generic signature has no requirements, but we will summarize the general case shortly. Subsequent chapters will explain how requirements interact with substitution.
-\begin{listing}\captionabove{Substitution maps in type checking}\label{substmaptypecheck} +\begin{example} +Let's look at this generic function, and think about its callers: \begin{Verbatim} -func genericFunction<A, B: Sequence>(_: A, _: B) - where B.Element == Int {} - -struct GenericType<A, B: Sequence> where B.Element == Int { - func nonGenericMethod() {} +func combine<T, U>(_ t: T, _ u: U) -> (T, Array<U>) { + return (t, [u]) } - -// substitution map for the call is {A := String, B := Array<Int>}. -genericFunction("hello", [1, 2, 3]) - -// the type of `value' is GenericType<String, Array<Int>>. -let value = GenericType<String, Array<Int>>() - -// the context substitution map for the type of `value' is -// {A := String, B := Array<Int>}. -value.nonGenericMethod() \end{Verbatim} -\end{listing} +Our generic signature has two generic parameters and no requirements. Let's call it $G$: +\[G := \texttt{<T, U>}, \qquad \text{or} \qquad \texttt{<\rT, \rU>}.\] +Suppose we call our function with these two values; what is the type of the result? +\begin{Verbatim} +let t: Optional<Int> = 3 +let u: String = "Hello world" -\begin{example} -We often use the Greek letter $\Sigma$, in various forms, to denote substitution maps. For example, $\Sigma$, $\Sigma_1$, $\Sigma_2$, $\Sigma^\prime$, etc. For now we'll just work with the single substitution map defined above, so let's denote it by $\Sigma$: +let result = combine(t, u) // what is the return type of this call? +\end{Verbatim} +By matching the function's declared parameter types against the types of the argument expressions, the \index{expression type checker}expression type checker can deduce that \tT~must be \texttt{Optional<Int>}, and also that \tU~must be \texttt{String}.
This gives us our substitution map; let's call it $\Sigma$:
\begin{align*}
-\Sigma := \SubstMapC{
-&\SubstType{A}{String},\\
-&\SubstType{B}{Array<Int>}
-}{\\
-&\SubstConf{B}{Array<Int>}{Sequence}
-}
+\Sigma := \{&\SubstType{\rT}{Optional<Int>},\\
+&\SubstType{\rU}{String}\}
\end{align*}
-Our substitution map appears while type checking the program shown in \ListingRef{substmaptypecheck}. Here, all three of \texttt{genericFunction()}, \texttt{GenericType} and \texttt{nonGenericMethod()} have the same generic signature, \texttt{<A, B where B:\ Sequence, B.[Sequence]Element == Int>}. When type checking a generic function call, the expression type checker infers the generic arguments from the types of the argument expressions. When referencing a generic type, the generic arguments can be written explicitly. In fact, all three declarations are also referenced with the same substitution map. (In the case of a generic type, this substitution map is called the \emph{context substitution map}, as you will see in \SecRef{contextsubstmap}.)
+Our notation for specifying a substitution map lists the \IndexDefinition{replacement type}replacement type for each generic parameter of its input generic signature, in order. Type substitution disregards \index{sugared type}type sugar, and to remind ourselves of this fact, we write the canonical generic parameter type on the left-hand side of the ``$\mapsto$'' symbol. We often name our substitution maps $\Sigma$ or some variation, like $\Sigma_1$, $\Sigma_2$, $\Sigma^\prime$, and so on.
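Since the substituted types here are ordinary concrete types, we can observe the substitution directly at runtime. The following is a runnable sketch of the `combine()` example; the checker deduces `T := Optional<Int>` and `U := String`, so the call's result type is the substituted type:

```swift
// The combine() function from the example above.
func combine<T, U>(_ t: T, _ u: U) -> (T, Array<U>) {
    return (t, [u])
}

let t: Optional<Int> = 3
let u: String = "Hello world"

// The declared return type (T, Array<U>) substitutes to
// (Optional<Int>, Array<String>) at this call site.
let result = combine(t, u)
let isSubstituted = type(of: result) == (Optional<Int>, Array<String>).self
```

Here `type(of:)` recovers the substituted tuple type, so `isSubstituted` is `true`.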
+
+Recall from \ChapRef{chap:types} that types have a tree structure, so the two replacement types in our substitution map~$\Sigma$ have an in-memory form that looks something like this:
+\begin{center}
+\begin{tikzpicture}[node distance=0.5cm, baseline={([yshift=-2pt]OptionalInt)}]
+\node (OptionalInt) [type] {\texttt{Optional<Int>}};
+
+\node (Int) [type, below=of OptionalInt] {\texttt{\vphantom{y}Int}};
+
+\draw [arrow] (OptionalInt) -- (Int);
+\end{tikzpicture}\;,
+\quad and \quad
+\begin{tikzpicture}[node distance=0.5cm, baseline={([yshift=-2pt]String)}]
+\node (String) [type] {\texttt{String}};
+\end{tikzpicture}\;.
+\end{center}
+Now, to figure out the inferred type of ``\texttt{result},'' we note that our function's declared return type is the \index{tuple type}tuple type \texttt{(\rT, Array<\rU>)}. Let's call this the \emph{original type}. This type contains the generic parameters of~$G$. To get the \emph{substituted type}, we replace each occurrence of \rT\ and \rU\ in the original type with its replacement type. We conclude that the substituted type is \texttt{(Optional<Int>, Array<String>)}. \FigRef{subst fig} illustrates the transformation.
\end{example}
-\paragraph{Type substitution.} Substitution maps operate on interface types. Recall that an \index{interface type!type substitution}interface type is a type \emph{containing} valid type parameters for some generic signature, which may itself not be a type parameter; for example, one possible interface type is \texttt{Array<T>}, if \texttt{T} is a generic parameter type. Let's introduce the formal notation \IndexSetDefinition{type}{\TypeObj{G}}$\TypeObj{G}$ to mean the set of interface types for a generic signature $G$. Then, if $\texttt{T}\in\TypeObj{G}$ and $\Sigma$ is a substitution map with input generic signature $G$, we can \emph{apply} $\Sigma$ to \texttt{T} to get a new type. This operation is called \IndexDefinition{type substitution}\emph{type substitution}.
The interface type here is called the \IndexDefinition{original type}\emph{original type}, and the result of the substitution is the \IndexDefinition{substituted type}\emph{substituted type}. We will think of applying a substitution map to an interface type as an binary operation: \[\texttt{T}\otimes\Sigma\] -The \index{$\otimes$}\index{$\otimes$!z@\igobble|seealso{type substitution}}\index{binary operation}$\otimes$ binary operation is a \emph{right action} of substitution maps on types. (We could have instead defined a left action, but later we will see the right action formulation is more natural for expressing certain identities. Indeed, we'll develop this notation further throughout this book.) +\begin{figure}\captionabove{Type substitution}\label{subst fig} +\begin{center} +\begin{tabular}{ccc} +\textbf{Original type:}&&\textbf{Substituted type:}\\[\medskipamount] +\begin{tikzpicture}[baseline={([yshift=-3pt]Tuple)}] +\node (Tuple) [type] {\texttt{(\rT, Array<\rU>)}}; -Type substitution recursively replaces any type parameters appearing in the original type with new types derived from the substitution map, while preserving the ``concrete structure'' of the original type. Thus the behavior of type substitution is ultimately defined by how substitution maps act the two kinds of type parameters: generic parameters and dependent member types: -\begin{itemize} -\item Applying a substitution map to a generic parameter type returns the corresponding replacement type from the substitution map. +\node (T) [type, fill=light-gray, below=of Tuple, xshift=-38] {\texttt{\vphantom{y}}\rT}; -Type substitution does not care about generic parameter sugar in the original type; replacement types for generic parameters are always looked up by depth and index in the substitution map. 
+\node (ArrayU) [type, below=of Tuple, xshift=38] {\texttt{Array<\rU>}};
-\item Applying a substitution map to a dependent member type derives the replacement type from one of the substitution map's conformances.
+\node (U) [type, fill=light-gray, below=of ArrayU] {\texttt{\vphantom{y}}\rU};
-Now, we haven't talked about conformances yet. There is a circularity between substitution maps and conformances---substitution maps can store conformances, and conformances can store substitution maps. We will look at conformances in great detail in \ChapRef{conformances}. The derivation of replacement types for dependent member types is discussed in \SecRef{abstract conformances}.
-\end{itemize}
+\draw [arrow] (Tuple) -- (T);
+\draw [arrow] (Tuple) -- (ArrayU);
+\draw [arrow] (ArrayU) -- (U);
+\end{tikzpicture}&
+{\Large $\Rightarrow$}&
+\begin{tikzpicture}[baseline={([yshift=-3pt]TupleSubst)}]
+\node (TupleSubst) [type, xshift=30] {\texttt{(Optional<Int>, Array<String>)}};
+
+\node (OptionalInt) [type, below=of TupleSubst, xshift=-50] {\texttt{Optional<Int>}};
+
+\node (Int) [type, below=of OptionalInt] {\texttt{\vphantom{y}Int}};
+
+\node (ArrayString) [type, below=of TupleSubst, xshift=50] {\texttt{Array<String>}};
+
+\node (String) [type, below=of ArrayString] {\texttt{String}};
+
+\draw [arrow] (TupleSubst) -- (OptionalInt);
+\draw [arrow] (OptionalInt) -- (Int);
+\draw [arrow] (TupleSubst) -- (ArrayString);
+\draw [arrow] (ArrayString) -- (String);
+\end{tikzpicture}
+\end{tabular}
+\end{center}
+\end{figure}
+
+This operation is called \emph{type substitution}. We will introduce some notation to talk about it first, before we look at the formal algorithm.
With $\Sigma$ as above, we denote this type substitution operation as follows, with the original type on the left of the $\otimes$ operator, and the substitution map on the right:
+\[
+\texttt{(\rT, Array<\rU>)} \otimes \Sigma = \texttt{(Optional<Int>, Array<String>)}
+\]
+The above example showed one way to ``observe'' type substitution: we write down a generic function declaration, call it with some list of generic argument types, and then look at the return type of the call. Here is another similar trick.
\begin{example}
-Applying the substitution map from our running example to sugared and canonical generic parameter types produces the same results:
+Consider \index{type resolution}type resolution with a \index{generic type alias!type substitution}generic type alias:
+\begin{Verbatim}
+typealias Foo<T, U> = (T, Array<U>)
+
+let x: Foo<Optional<Int>, String> = ...
+\end{Verbatim}
+Here, we're referencing \texttt{Foo} with the same generic arguments as before, so we get the same substitution map~$\Sigma$. To resolve the type annotation on ``\texttt{x}'', we apply $\Sigma$ to the \index{underlying type!type substitution}underlying type of \texttt{Foo}:
\[
-\left\{
-\begin{array}{l}
-\texttt{A}\\
-\ttgp{0}{0}\\
-\texttt{B}\\
-\ttgp{0}{1}
-\end{array}\right\}
-\otimes
-\Sigma
-=
-\left\{
-\begin{array}{l}
-\texttt{String}\\
-\texttt{String}\\
-\texttt{Array<Int>}\\
-\texttt{Array<Int>}
-\end{array}\right\}
+\texttt{(\rT, Array<\rU>)} \otimes \Sigma = \texttt{(Optional<Int>, Array<String>)}
\]
+Since the underlying type of the alias and the generic arguments of a reference are both arbitrary, we can use generic type aliases to encode any type substitution operation. We will discuss type resolution of generic type aliases in \SecRef{identtyperepr} and \SecRef{member type repr}.
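As a quick runnable check of this trick (a sketch; the concrete values are invented here just to have something to assign), resolving the alias reference really does produce the substituted tuple type:

```swift
// A generic type alias; referencing it applies a substitution map
// to its underlying type (T, Array<U>).
typealias Foo<T, U> = (T, Array<U>)

// Foo<Optional<Int>, String> resolves to (Optional<Int>, Array<String>).
let x: Foo<Optional<Int>, String> = (3, ["Hello world"])

// The annotation resolved to the substituted tuple type.
let sameType = type(of: x) == (Optional<Int>, Array<String>).self
```

Both spellings name the same canonical type, so `sameType` is `true`.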
\end{example}
-\begin{listing}\captionabove{Applying a substitution map to four interface types}\label{typealiassubstlisting}
+
+\paragraph{Output generic signature.}
+In our examples so far, the replacement types in the substitution map were \index{fully-concrete type}fully concrete. To observe a substitution map whose replacement types contain type parameters, we can reference a generic declaration from the body of some other generic declaration. We say that a generic signature $H$ is the \IndexDefinition{output generic signature}\emph{output generic signature} of a substitution map $\Sigma$ if all type parameters that appear in the replacement types of $\Sigma$ are valid type parameters of~$H$.
+
+\begin{example}
+In the following, the generic \texttt{callee()} function is referenced from the body of \texttt{caller()}:
\begin{Verbatim}
-struct GenericType<A, B: Sequence> where B.Element == Int {
-  typealias T1 = A
-  typealias T2 = B
-  typealias T3 = (A.Type, Float)
-  typealias T4 = (Optional<A>) -> B
+func callee<T>(_ value: T) -> Array<T> {
+  return [value]
}
-let t1: GenericType<String, Array<Int>>.T1 = ...
-let t2: GenericType<String, Array<Int>>.T2 = ...
-let t3: GenericType<String, Array<Int>>.T3 = ...
-let t4: GenericType<String, Array<Int>>.T4 = ...
+func caller<X, Y>(_ x: X, y: Y) -> Array<(X, Y)> {
+  return callee((x, y)) // call here
+}
\end{Verbatim}
-\end{listing}
+The call expression passes in a tuple value as the argument, so the replacement type for \rT\ is the tuple type \texttt{(\rT, \rU)}:
+\begin{center}
+\begin{tikzpicture}[node distance=0.5cm, baseline={([yshift=-2pt]OptionalInt)}]
+\node (Tuple) [type] {\texttt{(\rT, \rU)}};
+
+\node (T) [type, fill=light-gray, below=of Tuple, xshift=-40] {\texttt{\vphantom{y}}\rT};
+\node (U) [type, fill=light-gray, below=of Tuple, xshift=40] {\texttt{\vphantom{y}}\rU};
+
+\draw [arrow] (Tuple) -- (T);
+\draw [arrow] (Tuple) -- (U);
+\end{tikzpicture}
+\end{center}
+Our input generic signature is the generic signature of \texttt{callee()}:
+\[ G := \texttt{<\rT>} \]
+Our output generic signature is the generic signature of \texttt{caller()}:
+\[ H := \texttt{<\rT, \rU>} \]
+Here is the substitution map:
+\[ \Sigma := \{ \SubstType{\rT}{(\rT, \rU)} \} \]
+Now, we apply $\Sigma$ to the return type of \texttt{callee()} to get the substituted type:
+\[ \texttt{Array<\rT>} \otimes \Sigma = \texttt{Array<(\rT, \rU)>} \]
+\FigRef{subst fig output sig} illustrates the transformation.
+\end{example}

-\begin{example}\label{type alias subst example} \ListingRef{typealiassubstlisting} shows a generic type with four member type alias declarations. There are four global variables, and the type of each global variable is written as a member type alias reference with the same type base type, \texttt{GenericType<String, Array<Int>>}.
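Here is a runnable sketch of this pair of functions. When `caller()` is itself invoked with concrete arguments, its own generic parameters are substituted too, and the fully-substituted result is an `Array<(Int, String)>`:

```swift
func callee<T>(_ value: T) -> Array<T> {
    return [value]
}

func caller<X, Y>(_ x: X, y: Y) -> Array<(X, Y)> {
    // Inside this body, the replacement type for callee's T is the
    // tuple type (X, Y), so Array<T> substitutes to Array<(X, Y)>.
    return callee((x, y))
}

// A concrete call; X := Int and Y := String here.
let pairs = caller(1, y: "one")
```

The replacement type `(X, Y)` still contains type parameters, which is exactly the "output generic signature" situation described above.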
+\begin{figure}\captionabove{Type substitution with output generic signature}\label{subst fig output sig}
+\begin{center}
+\begin{tabular}{ccc}
+\textbf{Original type:}&&\textbf{Substituted type:}\\[\medskipamount]
+\begin{tikzpicture}[baseline={([yshift=-3pt]ArrayT)}]
+\node (ArrayT) [type] {\texttt{Array<\rT>}};
-\index{underlying type}
-\index{type alias declaration}
-\index{substitution map}
-Type resolution resolves a member type alias reference by applying a substitution map to the underlying type of the type alias declaration. Here, the underlying type of each type alias declaration is an interface type for the generic signature of \texttt{GenericType}, and the substitution map is the substitution map $\Sigma$ of \ExRef{substmaptypecheck}.
+\node (T) [type, fill=light-gray, below=of ArrayT] {\texttt{\vphantom{y}}\rT};
-The type of each global variable \texttt{t1}, \texttt{t2}, \texttt{t3} and \texttt{t4} is determined by applying $\Sigma$ to the underlying type of each type alias declaration:
-\begin{quote}
-\begin{tabular}{lll}
-\toprule
-&\textbf{Original type}&\textbf{Substituted type}\\
-\midrule
-\texttt{t1}&\texttt{A}&\texttt{String}\\
-\texttt{t2}&\texttt{B}&\texttt{Array<Int>}\\
-\texttt{t3}&\texttt{(A.Type, Float)}&\texttt{(String.Type, Float)}\\
-\texttt{t4}&\texttt{(Optional<A>) -> B}&\texttt{(Optional<String>) -> Array<Int>}\\
-\bottomrule
+\draw [arrow] (ArrayT) -- (T);
+\end{tikzpicture}&
+{\Large $\Rightarrow$}&
+\begin{tikzpicture}[baseline={([yshift=-3pt]ArrayTuple)}]
+\node (ArrayTuple) [type] {\texttt{Array<(\rT, \rU)>}};
+
+\node (Tuple) [type, below=of ArrayTuple] {\texttt{(\rT, \rU)}};
+
+\node (T) [type, fill=light-gray, below=of Tuple, xshift=-40] {\texttt{\vphantom{y}}\rT};
+
+\node (U) [type, fill=light-gray, below=of Tuple, xshift=40] {\texttt{\vphantom{y}}\rU};
+
+\draw [arrow] (ArrayTuple) -- (Tuple);
+\draw [arrow] (Tuple) -- (T);
+\draw [arrow] (Tuple) -- (U);
+\end{tikzpicture}
\end{tabular}
parameters, and substitution directly projects the corresponding replacement type from the substitution map; the second two original types are substituted by recursively replacing generic parameters they contain. -\end{example} +\end{center} +\end{figure} -References to generic type alias declarations are more complex because in addition to the generic parameters of the base type, the generic type alias will have generic parameters of its own. \SecRef{identtyperepr} describes how the substitution map is computed in this case. +Next, we introduce a bit of notation to formalize a concept we've already seen: +\begin{definition}\label{interface type def} +Let $G$ be a generic signature. We write \IndexSetDefinition{type}{\TypeObj{G}}$\TypeObj{G}$ to denote the \index{set!types}set of \IndexDefinition{interface type}interface types of $G$. In particular, this set contains the following: +\begin{itemize} +\item All \index{valid type parameter!interface type}valid type parameters of $G$. +\item All \index{nominal type}nominal types, formed from the non-generic nominal type declarations. +\item All \index{generic nominal type}generic nominal types, formed from the generic nominal type declarations and all combinations of generic arguments from $\TypeObj{G}$. +\item All \index{structural type}structural types---such as \index{function type}function types, \index{tuple type}tuple types, and \index{metatype type}metatypes---formed from the elements of $\TypeObj{G}$. +\end{itemize} +\end{definition} -\paragraph{Substitution failure.} -Substitution of an interface type containing dependent member types can \IndexDefinition{substitution failure}\emph{fail} if any of the conformances in the substitution map are invalid. In this case, an \index{error type}error type is returned instead of signaling an assertion. 
Invalid conformances can appear in substitution maps when the user's own code is invalid; it is not an invariant violation as long as other errors are diagnosed elsewhere and the compiler does not proceed to \index{SILGen}SILGen with error types in the \index{abstract syntax tree}abstract syntax tree. +We can also talk about the set of all substitution maps that map the interface types of one fixed generic signature into another: +\begin{definition} +Let $G$ and $H$ be generic signatures. We write \IndexSetDefinition{sub}{\SubMapObj{G}{H}}$\SubMapObj{G}{H}$ to denote the set of all substitution maps that have input generic signature~$G$ and output generic signature~$H$. +\end{definition} -\paragraph{Output generic signature.} -If the replacement types in the substitution map are \index{fully-concrete type}fully concrete---that is, they do not contain any type parameters---then all possible substituted types produced by this substitution map will also be fully concrete. If the replacement types are interface types for some \IndexDefinition{output generic signature}\emph{output} generic signature, the substitution map will produce interface types for this generic signature. The output generic signature might be a different from the \emph{input} generic signature of the substitution map. +These are \index{infinite set}infinite sets, so we do not realize them in the implementation. Instead, we use them as a notational aid to understand type substitution. An important and subtle point is that the output generic signature is not stored in the substitution map itself; it is implicit from usage. Thus, we must take care not to get our output generic signatures ``mixed up'' by keeping the relationships between these sets in mind. -The output generic signature is not stored in the substitution map; it is implicit from context. Also, fully-concrete types can be seen as valid interface types for \emph{any} generic signature, because they do not contain type parameters at all. 
Keeping these caveats in mind, we have this essential principle:
-\begin{quote}
-\textbf{A substitution map defines a transformation from the interface types of its input generic signature to the interface types of its output generic signature.}
-\end{quote}
-Recall our notation $\TypeObj{G}$ for the set of interface types of $G$. We also use the notation \IndexSetDefinition{sub}{\SubMapObj{G}{H}}$\SubMapObj{G}{H}$ for the set of substitution maps with input generic signature $G$ and output generic signature $H$. We make use of this notation to formalize our principle. If $\texttt{T}\in\TypeObj{G}$ and $\Sigma\in\SubMapObj{G}{H}$, then $\texttt{T}\otimes\Sigma\in\TypeObj{H}$, and thus the $\otimes$ binary operation is a function between the following sets:
+Suppose we are given a $\Sigma\in\SubMapObj{G}{H}$. Note that the replacement types of $\Sigma$ are elements of $\TypeObj{H}$, and furthermore, if we take some $\tX\in\TypeObj{G}$ and replace each type parameter that occurs in \tX\ with an element of $\TypeObj{H}$, the transformed type will then also be an element of $\TypeObj{H}$. In particular, $\tX\otimes\Sigma\in\TypeObj{H}$, and we can think of type substitution as the following mapping between sets:
\[\TypeObj{G}\otimes\SubMapObj{G}{H}\longrightarrow\TypeObj{H}\]
+We model type substitution as a binary operation on types and substitution maps, instead of using an ``applicative'' notation like ``$\Sigma(\tT)$'' where the substitution map is represented as a unary function. The reason for this will soon become clear, when we encounter additional forms of the \index{$\otimes$}\index{$\otimes$!z@\igobble|seealso{type substitution}}\index{binary operation}$\otimes$ operator that allow us to write down more complex expressions. The different forms of $\otimes$ will be related by various identities, which will define the \emph{type substitution algebra}.
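To build intuition for the mapping $\TypeObj{G}\otimes\SubMapObj{G}{H}\longrightarrow\TypeObj{H}$, here is a toy model of structural type substitution over a miniature type grammar. This is only a sketch for intuition; the names `Ty` and `substitute` are invented here, and this is not the compiler's implementation (in particular, it handles only generic parameters, not dependent member types or archetypes):

```swift
// A toy model of type substitution. Generic parameters are identified
// by index; a substitution map is just an array of replacement types.
indirect enum Ty: Equatable {
    case genericParam(Int)            // e.g. the parameter at index i
    case nominal(String, args: [Ty])  // e.g. Array<T>, Int
    case tuple([Ty])                  // e.g. (T, U)
}

func substitute(_ type: Ty, _ sigma: [Ty]) -> Ty {
    switch type {
    case .genericParam(let i):
        // Look up the replacement type by index.
        return sigma[i]
    case .nominal(let name, let args):
        // Recurse into generic arguments, preserving concrete structure.
        return .nominal(name, args: args.map { substitute($0, sigma) })
    case .tuple(let elts):
        return .tuple(elts.map { substitute($0, sigma) })
    }
}

// (T, Array<U>) ⊗ {T := Optional<Int>, U := String}
let original = Ty.tuple([.genericParam(0),
                         .nominal("Array", args: [.genericParam(1)])])
let sigma = [Ty.nominal("Optional", args: [.nominal("Int", args: [])]),
             Ty.nominal("String", args: [])]
let substituted = substitute(original, sigma)
```

Note that the replacement types in `sigma` may themselves mention `.genericParam` cases; nothing in `substitute` inspects them, which mirrors how the output generic signature is implicit from usage rather than stored.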
-\begin{algorithm}[Substitute interface type]\label{type subst algo}
-Takes an interface type \tX\ and a substitution map $\Sigma$ as input. The type parameters contained in \tX\ must be valid for the input generic signature of $\Sigma$. Outputs $\tX\otimes\Sigma$.
+The following algorithm for \IndexDefinition{type substitution}type substitution formalizes the structural transformation we saw earlier. It also calls out to a handful of subroutines to handle the dependent member type and archetype cases, which we will fill out later.
+
+\begin{algorithm}[Substitute type]\label{type subst algo}
+Takes a type \tX, and a substitution map $\Sigma$ with input generic signature $G$. Outputs $\tX\otimes\Sigma$.
+\begin{enumerate}
+\item (Type parameter) If \tX\ is a type parameter \tT:
\begin{enumerate}
-\item If \tX\ is a generic parameter type \ttgp{d}{i}, find the replacement type for \ttgp{d}{i} in~$\Sigma$ and return it.
-\item If \tX\ is a dependent member type \texttt{T.[P]A}, apply \AlgRef{dependent member type substitution} to \texttt{T.[P]A}~and~$\Sigma$, and return the result.
-\item If \tX\ does not recursively contain any type parameters, return \tX.
-\item Otherwise, recursively apply this algorithm to each child of \tX, and return the new type formed from the substituted children.
+\item If $\Query{isValidTypeParameter}{G,\,\tT}$ is false, return the \index{error type}error type.
+\item If \tT\ is a generic parameter type, return the replacement type for \tT\ in~$\Sigma$.
+\item If \tT\ is a dependent member type, apply \AlgRef{dependent member type substitution}, and return the result.
+\end{enumerate}
+\item (Archetype) If \tX\ is a \index{primary archetype!type substitution}primary archetype $\archetype{T}_G$, return $\tT \otimes \Sigma$ (\SecRef{archetypesubst}). If \tX\ is an opaque archetype, call \AlgRef{opaquearchetypesubst}. If \tX\ is existential, call \AlgRef{existential archetype subst}.
+\item (Base case) If \tX\ does not contain type parameters or archetypes: return \tX. +\item (Recurse) Apply $\Sigma$ to each child of \tX\ recursively, and form a new type from these substituted child types, preserving any non-type structural components of \tX. \end{enumerate} \end{algorithm} -\section{Generic Arguments}\label{contextsubstmap} +\begin{figure}\captionabove{Type substitution with dependent member type}\label{subst fig dmt} +\begin{center} +\begin{tabular}{ccc} +\textbf{Original type:}&&\textbf{Substituted type:}\\[\medskipamount] +\begin{tikzpicture}[baseline={([yshift=-3pt]ArrayTElement)}] +\node (ArrayTElement) [type] {\texttt{Array<\rT.Element>}}; -A nominal type is \IndexDefinition{specialized type}\emph{specialized} if the type itself or one of its \index{parent type!specialized type}parent types is a generic nominal type. That is, \texttt{Array} and \texttt{Array.Iterator} are both specialized types, but \texttt{Int} and \texttt{String.UTF8View} are not. Equivalently, a nominal type is specialized if its nominal type declaration is a generic context---that is, if the type declaration itself has a generic parameter list, or an outer declaration context has one. +\node (TElement) [type, fill=light-gray, below=of ArrayTElement] {\texttt{\vphantom{y}\rT.Element}}; -Every specialized type determines a unique substitution map for the generic signature of its declaration, called the \IndexDefinition{context substitution map}\emph{context substitution map}. The context substitution map replaces the generic parameters of the type declaration with the corresponding generic arguments of the specialized type. +\node (T) [type, fill=light-gray, below=of TElement] {\texttt{\vphantom{y}}\rT}; -Let's say that $d$ is a \index{nominal type declaration!generic arguments}nominal type declaration with generic signature $G$. 
The \index{declared interface type!nominal type declaration}declared interface type of $d$, which we will denote by $\tXd$, is an element of $\TypeObj{G}$. Suppose that \tX\ is some specialized type of $d$ whose generic arguments are interface types for a generic signature $H$, so that $\tX\in\TypeObj{H}$. The context substitution map of \tX\ is a substitution map $\Sigma\in\SubMapObj{G}{H}$, such that applying it to the declared interface type of $d$ gives us back \texttt{T}. That is,
-\[
-\tX = \tXd\otimes\Sigma
-\]
-To demonstrate the above identity, consider the generic signature of the \texttt{Dictionary} type declaration in the standard library:
-\begin{quote}
-\texttt{<Key, Value where Key:\ Hashable>}
-\end{quote}
-One possible specialized type for \texttt{Dictionary} is the type \texttt{Dictionary<Int, String>}; this type is related to the declared interface type of \texttt{Dictionary} by this substitution map:
+\draw [arrow] (ArrayTElement) -- (TElement);
+\draw [arrow] (TElement) -- (T);
+
+\begin{scope}[on background layer]
+  \node (Foo)[fit=(TElement) (T), inner sep=5pt, rounded corners, draw=gray, dashed] {};
+\end{scope}
+
+\end{tikzpicture}&
+{\quad \Large $\Rightarrow$}&
+\begin{tikzpicture}[baseline={([yshift=-3pt]ArrayTuple)}]
+\node (ArrayInt) [type] {\texttt{Array<Int>}};
+
+\node (Int) [type, below=of ArrayInt] {\texttt{\vphantom{y}Int}};
+
+\draw [arrow] (ArrayInt) -- (Int);
+\end{tikzpicture}
+\end{tabular}
+\end{center}
+\end{figure}
+
+\paragraph{Dependent member types.}
+If a generic signature states \index{conformance requirement!type substitution}conformance requirements to protocols with associated types, the generic signature's \index{valid type parameter}valid type parameters also include all \index{dependent member type!type substitution}dependent member types \index{derived requirement}derived from these conformance requirements, in addition to the generic parameters stated in the generic signature.
+\begin{Verbatim}
+func extract<S: Sequence>(_ s: S) -> Array<S.Element> {...}
+let mySet: Set<Int> = [1, 2, 3]
+let x = extract(mySet)
+\end{Verbatim}
+A substitution map records a \index{root conformance}\emph{root conformance} for each conformance requirement of its \index{input generic signature!conformance requirement}input generic signature. In the call to \texttt{extract()}, the substitution map is:
\begin{align*}
-\texttt{Dictionary<\ttgp{0}{0}, \ttgp{0}{1}>}\otimes
-\SubstMapC{
-&\SubstType{\ttgp{0}{0}}{Int},\\
-&\SubstType{\ttgp{0}{1}}{String}
-}{\\
-&\SubstConf{\ttgp{0}{0}}{Int}{Hashable}
-}\\
-{} = \texttt{Dictionary<Int, String>}
+\Sigma := \{&\SubstType{\rT}{Set<Int>},\\
+&\SubstConf{\rT}{Set<Int>}{Sequence}\}
\end{align*}
-\paragraph{The identity substitution map.}
-What is the context substitution map of a type declaration's declared interface type? By definition, if $\Sigma$ is the context substitution map of $\tXd$, then $\tXd\otimes\Sigma=\tXd$; it leaves the declared interface type unchanged. That is, this substitution map maps every generic parameter of the type declaration's generic signature to itself. If we look at the \texttt{Dictionary} type again, we can write down this substitution map:
+Type substitution treats a dependent member type as an indivisible atomic element, and replaces it in one shot with a substituted type. (We do not recurse into the base type.) In \SecRef{abstract conformances}, we will see that the substituted type is computed from one of the conformances stored in the substitution map. We leave this unexplained for now:
+\[ \texttt{Array<\rT.Element>} \otimes \Sigma = \texttt{Array<Int>} \]
+\FigRef{subst fig dmt} illustrates the transformation.
+
+\paragraph{Archetypes.} In \ChapRef{chap:archetypes}, we will meet \index{archetype type}\emph{archetypes}, an alternate representation which combines a type parameter with a generic signature. There are three kinds of archetype.
A \emph{primary} archetype acts like the type parameter it represents, as far as type substitution is concerned. Substitution of \emph{opaque} and \emph{existential} archetypes is different, and will be described in \SecRef{opaquearchetype} and \SecRef{open existential archetypes}. + +\paragraph{Substitution failure.} +Type substitution outputs an error type if the original type contains a type parameter that is not valid in the substitution map's input generic signature. Type substitution might also produce an error type if the substitution map's replacement types contain error types, or if one of the conformances stored in the substitution map is invalid. This is called \IndexDefinition{substitution failure}\emph{substitution failure}, and it indicates an error was diagnosed elsewhere in the user's program. The compiler does not proceed to \index{SILGen}SILGen if errors were diagnosed, and so error types should not appear after type checking. + +\paragraph{Other requirements.} +Conformance requirements are special, because a substitution map directly records how it fulfills each conformance requirement. Other requirements are instead conditions to check. For example, if the input generic signature states a \index{same-type requirement!type substitution}same-type requirement, then a substitution map for this generic signature must \emph{satisfy} this requirement, in the sense that applying the substitution map to both sides should produce canonically equal substituted types. A substitution map is \emph{well-formed} if it satisfies all derived requirements of its input generic signature. We will discuss the algorithm for checking if requirements are satisfied in \SecRef{checking generic arguments}. + +\section{Nominal Types}\label{contextsubstmap} + +We're now going to look at the relationship between nominal type declarations, nominal types, and substitution maps. 
Consider these declarations:
+\begin{Verbatim}
+struct Bacon<T, U> {
+  struct Lettuce<V> {
+    struct Tomato {
+      var t: T
+      var u: U
+      var v: V
+    }
+  }
+}
+\end{Verbatim}
+(We will discuss the stored properties of \texttt{Tomato} shortly.) Given these declarations, we can then write down various nominal types, for example the type of ``\texttt{x}'' below:
+\begin{Verbatim}
+let x: Bacon<Int, Bool>.Lettuce<Float>.Tomato = ...
+\end{Verbatim}
+In the implementation, \texttt{Bacon<Int, Bool>} is classified as a \index{generic nominal type}generic nominal type, while both \texttt{Int} and \texttt{Bacon<Int, Bool>.Lettuce<Float>.Tomato} are ``just'' \index{nominal type}nominal types. However, the latter's \index{parent type}\emph{parent} is a generic nominal type. We say that a \IndexDefinition{specialized type}\emph{specialized type} is a nominal type that is either itself generic, or has a generic parent. Equivalently, a nominal type is specialized if its declaration has a non-empty generic signature.
+
+\begin{figure}[b!]\captionabove{Applying the context substitution map}\label{context sub map fig}
+\begin{center}
+\begin{tabular}{c}
+\textbf{Declared interface type:}\\[\medskipamount]
+\begin{tikzpicture}[node distance=0.4cm]
+\node (Tomato) [type] {\texttt{\vphantom{y}Bacon<\rT, \rU>.Lettuce<\ttgp{1}{0}>.Tomato}};
+
+\node (Lettuce) [type, below=of Tomato] {\texttt{\vphantom{y}Bacon<\rT, \rU>.Lettuce<\ttgp{1}{0}>}};
+
+\node (Bacon) [type, below=of Lettuce] {\texttt{\vphantom{y}Bacon<\rT, \rU>}};
+\node (V) [type, fill=light-gray, right=of Lettuce, xshift=20pt] {\texttt{\vphantom{y}\ttgp{1}{0}}};
+
+\node (Dummy) [left=of Lettuce, xshift=-20pt] {\texttt{\phantom{Float}}};
+
+\node (U) [type, fill=light-gray, below=of V] {\texttt{\vphantom{y}\rT}};
+\node (T) [type, fill=light-gray, below=of U] {\texttt{\vphantom{y}\rU}};
+
+\draw [arrow] (Tomato) -- (Lettuce);
+\draw [arrow] (Lettuce) -- (Bacon);
+\draw [arrow] (Lettuce) -- (V);
+
+\draw [arrow] (Bacon) -- (U);
+\draw [arrow] (Bacon.east) ++ (0, -0.1) -- ++ (0.5, 0) |-
(T);
+\end{tikzpicture}\\
+\textbf{Specialized type:}\\[\medskipamount]
+\begin{tikzpicture}[node distance=0.4cm]
+\node (Tomato) [type] {\texttt{\vphantom{y}Bacon<Int, Bool>.Lettuce<Float>.Tomato}};
+
+\node (Lettuce) [type, below=of Tomato] {\texttt{\vphantom{y}Bacon<Int, Bool>.Lettuce<Float>}};
+
+\node (Bacon) [type, below=of Lettuce] {\texttt{\vphantom{y}Bacon<Int, Bool>}};
+\node (Float) [type, right=of Lettuce, xshift=28pt] {\texttt{\vphantom{y}Float}};
+
+\node (Dummy) [left=of Lettuce, xshift=-28pt] {\texttt{\phantom{Float}}};
+
+\node (Int) [type, below=of Float] {\texttt{\vphantom{y}Int}};
+\node (Bool) [type, below=of Int] {\texttt{\vphantom{y}Bool}};
+
+\draw [arrow] (Tomato) -- (Lettuce);
+\draw [arrow] (Lettuce) -- (Bacon);
+\draw [arrow] (Lettuce) -- (Float);
+
+\draw [arrow] (Bacon) -- (Int);
+\draw [arrow] (Bacon.east) ++ (0, -0.1) -- ++ (0.8, 0) |- (Bool);
+\end{tikzpicture}
+\end{tabular}
+\end{center}
+\end{figure}
+
+If we take the \index{generic argument}generic arguments from each level of nesting in the specialized type \texttt{Bacon<Int, Bool>.Lettuce<Float>.Tomato}, we can form a substitution map for the generic signature of the declaration of \texttt{Tomato}, which is \texttt{<\rT, \rU, \ttgp{1}{0}>}:
+\[
+\Sigma := \SubstMap{ \SubstType{\rT}{Int},\, \SubstType{\rU}{Bool},\, \SubstType{\ttgp{1}{0}}{Float} }
+\]
+Recall from \ChapRef{chap:decls} that every nominal type declaration has a \index{declared interface type!nominal type declaration}declared interface type. In the case of \texttt{Tomato}, this is \texttt{Bacon<\rT, \rU>.Lettuce<\ttgp{1}{0}>.Tomato}. Now, we see that when we apply our substitution map $\Sigma$ to the declared interface type of \texttt{Tomato}, we get back our specialized type:
+\begin{gather*}
+\texttt{Bacon<\rT, \rU>.Lettuce<\ttgp{1}{0}>.Tomato} \otimes \Sigma \\
+{} \qquad = \texttt{Bacon<Int, Bool>.Lettuce<Float>.Tomato}
+\end{gather*}
+We say that $\Sigma$ is the \IndexDefinition{context substitution map}\emph{context substitution map} of our specialized type.
\FigRef{context sub map fig} illustrates the transformation.
+
+\paragraph{Context substitution map.}
+Suppose we have two generic signatures $G$ and $H$, and a \index{nominal type declaration!generic arguments}nominal type declaration $d$ with generic signature $G$. Let $\tXd$ denote the \index{declared interface type!nominal type declaration}declared interface type of $d$. If we are given some other specialized type formed from $d$, call it $\tX\in\TypeObj{H}$, then the context substitution map of \tX\ is the unique substitution map $\Sigma\in\SubMapObj{G}{H}$ that satisfies the identity:
+\[ \tXd \otimes \Sigma = \tX \]
+The context substitution map is more than just a mathematical trick, because it explains the type of a \index{member reference expression}member reference expression like ``\texttt{foo.bar}''. Recall that \texttt{Tomato} declared three stored properties named \texttt{t}, \texttt{u}, and \texttt{v}, with interface types \rT, \rU, and \ttgp{1}{0}, respectively. Consider this snippet:
+\begin{Verbatim}
+let x: Bacon<Int, Bool>.Lettuce<Float>.Tomato = ...
+let xt = x.t
+let xu = x.u
+let xv = x.v
+\end{Verbatim}
+To deduce the inferred types of \texttt{xt}, \texttt{xu}, and \texttt{xv}, we can apply our substitution map~$\Sigma$ to the interface type of each stored property:
+\begin{gather*}
+\rT \otimes \Sigma = \texttt{Int}\\
+\rU \otimes \Sigma = \texttt{Bool}\\
+\ttgp{1}{0} \otimes \Sigma = \texttt{Float}
+\end{gather*}
+
+\paragraph{Identity substitution map.}
+What is the context substitution map of $\tXd$, the declared interface type of $d$? If $\Sigma$ satisfies $\tXd = \tXd \otimes \Sigma$, then in particular, $\Sigma$ replaces every generic parameter of $G$ with itself. We call this the \IndexDefinition{identity substitution map}\emph{identity substitution map} for~$G$ and we denote it by~\index{$1_G$}\index{$1_G$!z@\igobble|seealso{identity substitution map}}$1_G$. Note that $1_G \in \SubMapObj{G}{G}$.
For example, if $G$ is the generic signature of \texttt{Tomato} from the above, then: +\[ +1_G := \SubstMap{\SubstType{\rT}{\rT},\,\SubstType{\rU}{\rU},\,\SubstType{\ttgp{1}{0}}{\ttgp{1}{0}}} +\] +If $G$ states one or more conformance requirements, the identity substitution map for~$G$ will also record a series of \index{abstract conformance!identity substitution map}\emph{abstract conformances} corresponding to each conformance requirement in $G$. This ensures that applying $1_G$ to a \index{dependent member type!identity substitution map}dependent member type leaves it unchanged. We will discuss abstract conformances in \SecRef{abstract conformances}. + +For example, if $G := \texttt{<\rT, \rU\ where \rT:~Sequence, \rU:~Sequence>}$: \begin{align*} -\texttt{Dictionary<\ttgp{0}{0}, \ttgp{0}{1}>}\otimes -\SubstMapC{ -&\SubstType{\ttgp{0}{0}}{\ttgp{0}{0}},\\ -&\SubstType{\ttgp{0}{1}}{\ttgp{0}{1}} -}{\\ -&\SubstConf{\ttgp{0}{0}}{\ttgp{0}{0}}{Hashable} -}\\ -{} = \texttt{Dictionary<\ttgp{0}{0}, \ttgp{0}{1}>} +1_G := \SubstMapC{&\SubstType{\rT}{\rT}}{\\ +&\SubstConf{\rT}{\rT}{Sequence},\\ +&\SubstConf{\rU}{\rU}{Sequence}} \end{align*} -This is called the \IndexDefinition{identity substitution map}\emph{identity substitution map} for this generic signature; every generic signature has one. We denote the identity substitution map of a generic signature $G$ by \index{$1_G$}\index{$1_G$!z@\igobble|seealso{identity substitution map}}$1_G$. Then, $1_G\in\SubMapObj{G}{G}$, and if $\texttt{T}\in\TypeObj{G}$, we have -\[\tX \otimes 1_G = \tX\] -Applying the identity substitution map to any interface type leaves it unchanged, with three caveats: + +In general, if \tX\ is any \index{interface type!identity substitution map}interface type in $\TypeObj{G}$, then $\tX \otimes 1_G = \tX$. In other words, applying the identity substitution map to any interface type of $G$ leaves it unchanged. 
(This does not hold when the original type is a \index{contextual type}contextual type that contains archetypes; such a type is not an element of $\TypeObj{G}$. We will discuss archetype substitution in \SecRef{archetypesubst}, and encounter another substitution map, the \emph{forwarding substitution map}, which plays the role of the identity on contextual types.)
+
+\paragraph{Empty substitution map.}
+A nominal type declaration like \texttt{Int}, with an \index{empty generic signature}empty generic signature, only declares a single nominal type with no generic arguments. The \index{context substitution map!non-generic type}context substitution map of this type is called the \IndexDefinition{empty substitution map}\emph{empty substitution map}, because it does not store any replacement types or conformances. We denote it by $\SubstMap{}$.
+
+This is the \emph{only} possible substitution map for the empty generic signature. If $G$ is the empty generic signature, the set of \index{interface type!empty generic signature}interface types $\TypeObj{G}$ is the set of \index{fully-concrete type}fully-concrete types that do not contain any type parameters. The empty substitution map leaves such a type unchanged, so for example, $\texttt{Int}\otimes\SubstMap{} = \texttt{Int}$.
+
+The empty substitution map should not be confused with the identity substitution map of a non-empty generic signature. If the original type contains any type parameters whatsoever, applying the empty substitution map will replace them with \index{error type}error types:
+\[\texttt{\rT.Element} \otimes \SubstMap{} = \texttt{<<error type>>}\]
+
+\section{Nested Nominal Types}\label{nested nominal types}
+
+We've seen that generic signatures and substitution maps use a flat representation that collects all outer generic parameters together, while \index{nominal type}nominal types have a recursive structure that reflects the lexical nesting of their nominal type declarations.
When we build the \index{context substitution map}context substitution map for a nominal type, we must be able to translate between these two representations. In particular, we must be able to recover a generic argument for each generic parameter of the signature. This imposes some \index{limitation!nominal type nesting}restrictions on how \index{nested type declaration}nominal type declarations can nest: \begin{enumerate} -\item The interface type must only contain type parameters which are valid in the input generic signature $G$ of this identity substitution map $1_G$. -\item Substitution might change type sugar, because generic parameters appearing in the original interface type might be sugared differently than the input generic signature of this identity substitution map. Therefore, canonical equality of types is preserved, not necessarily pointer equality. -\item We won't talk about archetypes until \ChapRef{genericenv}, but you may have met them already. Applying the identity substitution map to a contextual type containing archetypes replaces the archetypes with equivalent type parameters. There is a corresponding \emph{forwarding substitution map} which maps all generic parameters to archetypes; the forwarding substitution map acts as the identity in the world of contextual types. +\item Structs, enums and classes cannot nest in generic \index{local declaration context}local contexts. +\item Structs, enums and classes cannot nest in protocols or \index{protocol extension}protocol extensions. +\item Protocols cannot nest in other generic contexts. \end{enumerate} -\paragraph{The empty substitution map.} -The \index{empty generic signature}empty generic signature only has a single unique substitution map, the \IndexDefinition{empty substitution map}\emph{empty substitution map}, so the context substitution map of a non-specialized nominal type is the empty substitution map. In our notation, the empty substitution map is denoted $\SubstMap{}$. 
The only valid interface types of the empty generic signature are the \index{fully-concrete type}fully-concrete types. The action of the empty substitution map leaves fully-concrete types unchanged, so for example, $\texttt{Int}\otimes\SubstMap{} = \texttt{Int}$.
+\paragraph{Types in generic local contexts.} This restriction stems from the fact that a nominal type declaration can only encode generic arguments for outer nominal type contexts. If the parent context of a nominal type declaration is a generic context that is not a type, the \index{local type declaration}nominal type declaration only declares a single nominal type without any generic arguments. This prevents us from being able to properly model the following code, which is rejected today:
+\begin{Verbatim}
+func f<T>(t: T) {
+  struct Nested { // error
+    let t: T
-The empty substitution map $\SubstMap{}$ is almost never the same as the identity substitution map $1_G$. In fact, they only coincide if $G$ is the empty generic signature. Applying the empty substitution map to an interface type containing type parameters is a substitution failure and returns an error type.
-\[\texttt{\ttgp{0}{0}.[Sequence]Element} \otimes \SubstMap{} = \texttt{<<error type>>}\]
+    func printT() {
+      print(t)
+    }
+  }
+
+  Nested(t: t).printT()
+}
+\end{Verbatim}
+The \texttt{Nested} local type declaration has a stored property of type \rT, the generic parameter of the outer declaration \texttt{f()}. However, \texttt{Nested} only declares a single nominal type, also written as \texttt{Nested}. This type does not have a parent type or any generic arguments, because it is not nested inside of another nominal type.
This means we have no way to fill in the replacement type for \rT\ in the \index{context substitution map!of local type}context substitution map:
+\[
+\texttt{Nested} \otimes \SubstMap{\rT\mapsto\text{???}} = \texttt{Nested}
+\]
+If we call our function \texttt{f()} with two generic argument types, the \index{SIL optimizer}SIL optimizer may decide to specialize one or both calls to \texttt{f()}:
+\begin{Verbatim}
+func g() {
+  f(t: 123)
+  f(t: "hello")
+}
+\end{Verbatim}
+We will form a substitution map replacing \rT\ with \texttt{Int} or \texttt{String}, respectively, and apply this substitution map to every SIL instruction appearing in the body of \texttt{f()}. To ensure the stored property access has the correct substituted type, we must be able to represent the two distinct \index{specialized type}specializations of \texttt{Nested}. Thus, we actually want the declared interface type of \texttt{Nested} to store a generic argument for the outer generic parameter, so notionally, we want to be able to write something like:
+\[
+\texttt{<\rT>.Nested} \otimes \SubstMap{\SubstType{\rT}{Int}} = \texttt{<Int>.Nested}
+\]
+The representation of \index{runtime type metadata}runtime type metadata is already designed to store a flat list of generic arguments, just like a generic signature or substitution map. So while lifting this restriction would require some engineering effort on the compiler side, it would be a backward-deployable and \index{ABI}ABI-compatible change.

-\section{Composing Substitution Maps}\label{submapcomposition}\label{classinheritance}
+\paragraph{Types in protocol contexts.} We do not allow struct, enum, and class declarations to appear inside protocols and protocol extensions today. There are two ways to lift this restriction, and they differ in whether the nominal type declaration should capture the \IndexSelf protocol \tSelf\ type. Consider this protocol, and the protocol extension that follows.
Note that \texttt{Box} has a stored property of type \texttt{\rT.Contents}, because the declaration of the \texttt{Contents} associated type is visible from \texttt{Holder}'s lexical scope:
+\begin{Verbatim}
+protocol Holder {
+  associatedtype Contents
+  var contents: Contents { get }
+}
+
+extension Holder {
+  struct Box { // error today
+    let contents: Contents // depends on the outer Self
+  }
+
+  var box: Box {
+    return Box(contents: contents)
+  }
+}
+\end{Verbatim}
+Today, we reject the nested struct \texttt{Box}, but one possible interpretation would allow this code. In this model, the declared interface type of \texttt{Box} is something like ``\texttt{\rT.Box},'' but that's not a dependent member type; it's a nominal type whose \index{parent type}parent type is a \emph{type parameter}. Every nominal type that conforms to \texttt{Holder} would gain a new member type declaration named \texttt{Box}, and the generic argument for \tSelf\ would be this parent type. That is, if we conform a type \texttt{Foo} to \texttt{Holder} and call \texttt{box()}, we would receive a value of type \texttt{Foo.Box}:
+\begin{Verbatim}
+struct Foo: Holder {
+  typealias Contents = Int
+}
-\iffalse
+let x: Foo.Box = Foo.box(123) // imaginary
+\end{Verbatim}
+The context substitution map of \texttt{Foo.Box} would then be the \index{protocol substitution map}protocol substitution map of the conformance:
+\[
+\texttt{\rT.Box} \otimes \SubstMapC{\SubstType{\rT}{Foo}}{\SubstConf{\rT}{Foo}{Holder}} = \texttt{Foo.Box}
+\]
+In this model, it would not make sense to reference \texttt{Box} as a member of the protocol itself; that is, \texttt{Holder.Box} would not be valid.

-\SecRef{abstract conformances} talks about composition with root conformances
+The alternative would prohibit the nested type declaration from capturing the protocol \tSelf\ type.
In this case, the nested type declaration's generic signature would \emph{not} include the protocol \tSelf\ type, so \texttt{Box} would be disallowed as written above, because of its stored property. In this model, the protocol would simply act as a namespace, and the nested type would not depend on the protocol in any other way. A nested type could then be referenced as a member of the protocol type itself, like \texttt{Holder.Box}.

-\fi
+\paragraph{Protocols in generic contexts.}
+Historically, protocols could only be declared at the top level of a source file. In \IndexSwift{5.a@5.10}Swift~5.10 \cite{se0404}, this was relaxed to allow protocols to nest inside other declarations arbitrarily, as long as those declarations are not generic:
+\begin{Verbatim}
+enum E {
+  protocol P {} // allowed as of SE-0404
+}

-Suppose that we have three generic signatures, $G$, $H$ and $I$, and a pair of substitution maps: $\Sigma_1\in\SubMapObj{G}{H}$, and $\Sigma_2\in\SubMapObj{H}{I}$. If we start with an interface type $\tX\in\TypeObj{G}$, then $\tX\otimes\Sigma_1\in\TypeObj{H}$. If we then apply $\Sigma_2$ to $\tX\otimes\Sigma_1$, we get an interface type in $\TypeObj{I}$:
+struct S: E.P {}
+\end{Verbatim}
+However, protocols are still prohibited from appearing within generic declarations:
+\begin{Verbatim}
+struct G<T> {
+  protocol P { // error
+    func f() -> T // because what would this mean?
+  }
+}
+\end{Verbatim}
+If a protocol could depend on outer generic parameters in this way, its \index{protocol generic signature!generic protocol}protocol generic signature would contain those other parameters and their requirements, in addition to \IndexSelf\tSelf. \index{Haskell}Haskell calls this a \index{multi-parameter type class}\emph{multi-parameter type class}.
+
+Effectively, each specialization of a generic protocol would be a distinct type, and the same concrete conforming type could conform to multiple specializations of a generic protocol with different implementations of each conformance:
+\begin{Verbatim}
+struct S: G<Int>.P {...}
+struct S: G<String>.P {...}
+\end{Verbatim}
+This would be a major change. Today, a conformance requirement $\TP$ is effectively a unary predicate---a true or false statement---about a single type parameter. A conformance to a generic protocol, on the other hand, is a more general kind of relation that relates \emph{multiple} type parameters which then all ``participate'' in the conformance. At the very least, this entails a complete rethink of the formal system from the previous chapter, as well as most of the material in \PartRef{part rqm}. To get another sense of the complexity this feature introduces, see~\cite{mptc}.
+
+\section{Composition}\label{sec:composition}
+
+Suppose that we have three generic signatures $G$, $H$, $I$, and a pair of substitution maps $\Sigma_1\in\SubMapObj{G}{H}$, $\Sigma_2\in\SubMapObj{H}{I}$. If we take any interface type $\tX\in\TypeObj{G}$, we can apply $\Sigma_1$ to $\tX$, and we get $\tX\otimes\Sigma_1\in\TypeObj{H}$. If we then apply $\Sigma_2$ to $\tX\otimes\Sigma_1$, we get the following element of $\TypeObj{I}$:
\[(\tX\otimes\Sigma_1)\otimes\Sigma_2\]
-The \IndexDefinition{substitution map composition}\emph{composition} of the substitution maps $\Sigma_1$ and $\Sigma_2$, denoted by \index{$\otimes$}$\Sigma_1\otimes\Sigma_2$, is the unique substitution map which satisfies the following equation for all $\tX\in\TypeObj{F}$:
+Now, consider how we might get from \tX\ to $(\tX\otimes\Sigma_1)\otimes\Sigma_2$ in one step.
+\begin{definition}\label{subst map composition} +The \index{composition!of substitution maps}\IndexDefinition{substitution map composition}\emph{composition} of the substitution maps $\Sigma_1$ and $\Sigma_2$, denoted by \index{$\otimes$}$\Sigma_1\otimes\Sigma_2$, is defined as the unique substitution map which satisfies the following identity for all $\tX\in\TypeObj{G}$: \[\tX\otimes(\Sigma_1\otimes\Sigma_2):=(\tX\otimes\Sigma_1)\otimes\Sigma_2\] -That is, applying the composition of two substitution maps is the same as applying the first substitution map followed by the second. Since $(\tX\otimes\Sigma_1)\otimes\Sigma_2\in\TypeObj{I}$, we see that $\Sigma_1\otimes\Sigma_2\in\SubMapObj{G}{I}$; the \index{input generic signature}input generic signature of the composition is the input generic signature of the first substitution map, and the output generic signature of the composition is the \index{output generic signature}output generic signature of the second. Substitution map composition can thus be understood as a function between sets: + +Since $(\tX\otimes\Sigma_1)\otimes\Sigma_2\in\TypeObj{I}$, it follows that $\Sigma_1\otimes\Sigma_2\in\SubMapObj{G}{I}$; that is, the \index{input generic signature}input generic signature of $\Sigma_1 \otimes \Sigma_2$ is the input generic signature of $\Sigma_1$, and the output generic signature of $\Sigma_1 \otimes \Sigma_2$ is the \index{output generic signature}output generic signature of $\Sigma_2$. 
Substitution map composition gives us this mapping of sets: \[\SubMapObj{G}{H}\otimes\SubMapObj{H}{I}\longrightarrow\SubMapObj{G}{I}\] +\end{definition} -To understand how the composition $\Sigma_1\otimes\Sigma_2$ is actually constructed from $\Sigma_1$ and $\Sigma_2$ in the implementation, we decompose $\Sigma_1$ by applying it to each \index{generic parameter type!type substitution}generic parameter of generic signature $F$: -\[\Sigma_1 := \SubstMap{\SubstType{\ttgp{0}{0}}{$\ttgp{0}{0}\otimes\Sigma_1$},\,\ldots}\] -This looks like a circular definition, but what it really says is that the behavior of $\Sigma_1$ is completely determined by these primitive elements of its input generic signature. Now, we define $\Sigma_1\otimes\Sigma_2$ by applying $\Sigma_2$ to each element of $\Sigma_1$: +Simply stated, applying the composition of two substitution maps to an interface type produces the same result as first applying the first substitution map, followed by the second. Now we will see how to construct $\Sigma_1 \otimes \Sigma_2$. 
First, suppose that~$G$, the input generic signature of~$\Sigma_1$, does not state any conformance requirements, so the behavior of~$\Sigma_1$ is completely determined by its replacement types:
+\[\Sigma_1 := \SubstMap{\ldots,\,\SubstType{\ttgp{d}{i}}{$\ttgp{d}{i}\otimes\Sigma_1$},\,\ldots}\]
+To construct $\Sigma_1 \otimes \Sigma_2$, we apply $\Sigma_2$ to each replacement type of $\Sigma_1$:
\[
\Sigma_1\otimes\Sigma_2 := \SubstMap{
-\SubstType{\ttgp{0}{0}}{$\bigl((\ttgp{0}{0}\otimes\Sigma_1)\otimes\Sigma_2\bigr)$},\,
+\ldots,\,
+\SubstType{\ttgp{d}{i}}{$\bigl((\ttgp{d}{i}\otimes\Sigma_1)\otimes\Sigma_2\bigr)$},\,
\ldots
}
\]
-Under this definition, if we take a generic parameter \ttgp{d}{i} of $G$, we see that $\Sigma_1\otimes\Sigma_2$ satisfies the necessary identity:
+It is immediate that the following identity now holds for all generic parameters $\ttgp{d}{i}$ of $G$, which completely determines $\Sigma_1 \otimes \Sigma_2$:
\[
\ttgp{d}{i}\otimes(\Sigma_1\otimes\Sigma_2)=(\ttgp{d}{i}\otimes\Sigma_1)\otimes\Sigma_2
\]
-Since the behavior of $\Sigma_1\otimes\Sigma_2$ is completely determined by its replacement types, this is actually true for any interface type $\tX\in\TypeObj{G}$:
-\[\tX\otimes(\Sigma_1\otimes\Sigma_2)=(\tX\otimes\Sigma_1)\otimes\Sigma_2\]
+
+In the general case where $G$ states one or more conformance requirements, $\Sigma_1$ also stores a list of \index{root conformance!substitution map composition}\emph{root conformances}. In this case, substitution map composition also needs to apply $\Sigma_2$ to each root conformance of $\Sigma_1$ to properly form the new substitution map. Conformance substitution will be explained in \SecRef{conformance subst}.
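The construction just described can be sketched as runnable code. The following is a toy model with hypothetical names (`Ty`, `Subst`, `apply`, `compose`); the compiler's real type and substitution map representations are far richer, and this sketch ignores conformances entirely:

```swift
// Toy model of the type substitution algebra (illustrative only).
// An interface type is a generic parameter or a nominal type with arguments.
indirect enum Ty: Equatable {
    case parameter(String)       // stands in for a generic parameter
    case nominal(String, [Ty])   // e.g. Optional<Int> is .nominal("Optional", ...)
}

// A substitution map without conformances: generic parameter -> replacement.
typealias Subst = [String: Ty]

// X ⊗ Σ: recursively replace every generic parameter.
func apply(_ type: Ty, _ subst: Subst) -> Ty {
    switch type {
    case .parameter(let name):
        return subst[name] ?? type
    case .nominal(let name, let args):
        return .nominal(name, args.map { apply($0, subst) })
    }
}

// Σ₁ ⊗ Σ₂: apply Σ₂ to each replacement type of Σ₁.
func compose(_ s1: Subst, _ s2: Subst) -> Subst {
    s1.mapValues { apply($0, s2) }
}

// Check the defining identity (X ⊗ Σ₁) ⊗ Σ₂ = X ⊗ (Σ₁ ⊗ Σ₂).
let x = Ty.nominal("Pair", [.parameter("T"), .parameter("U")])
let s1: Subst = ["T": .nominal("Optional", [.parameter("T")]),
                 "U": .nominal("Bool", [])]
let s2: Subst = ["T": .nominal("Int", [])]
assert(apply(apply(x, s1), s2) == apply(x, compose(s1, s2)))
```

In this toy model, an identity substitution map sends each parameter name to itself, and composing with it on either side leaves a map unchanged.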
\newcommand{\FirstMapInExample}{\SubstMap{
-\SubstType{T}{Array<A>},\,\SubstType{U}{A}
+\SubstType{\rT}{Optional<\rT>},\,\SubstType{\rU}{Bool}
}}
\newcommand{\SecondMapInExample}{\SubstMap{
-\SubstType{A}{Int}
+\SubstType{\rT}{Int}
}}
\newcommand{\ThirdMapInExample}{\SubstMap{
-\SubstType{T}{Array<Int>},\,\SubstType{U}{Int}
+\SubstType{\rT}{Optional<Int>},\,\SubstType{\rU}{Bool}
}}
-\begin{listing}\captionabove{Motivating substitution map composition}\label{composesubstmaplisting}
+\begin{example}\label{composesubstmapexample}
+Substitution map composition can help us reason about the types of chained \index{member reference expression}member reference expressions, like the type of ``\texttt{x}'' below:
\begin{Verbatim}
-struct Outer<A> {
-  var inner: Inner<Array<A>, A>
+struct Outer<T> {
+  var inner: Inner<Optional<T>, Bool>
}

struct Inner<T, U> {
-  var value: (T) -> U
+  var value: (T, U)
}

let outer: Outer<Int> = ...
-let x = outer.inner.value
+let x = outer.inner.value // What is the type of `x'?
\end{Verbatim}
-\end{listing}
-\begin{example}\label{composesubstmapexample}
-\index{expression}
-\ListingRef{composesubstmaplisting} shows an example where substitution map composition can help reason about the types of chained \index{member reference expression}member reference expressions. The \texttt{inner} stored property of \texttt{Outer} has type \texttt{Inner<Array<A>, A>}. Here is the context substitution map of this type, which we will refer to as $\Sigma_1$:
+First, consider the type of the ``\texttt{inner}'' stored property of \texttt{Outer}. Let's denote its context substitution map by $\Sigma_1$:
\[
\Sigma_1 := \FirstMapInExample
\]
-The interface type of the \texttt{inner} stored property is a specialization of the nominal type \texttt{Inner} with generic signature \texttt{<T, U>}, so the input generic signature of $\Sigma_1$ is \texttt{<T, U>}. The interface type of \texttt{inner} is declared inside the nominal type \texttt{Outer} with generic signature \texttt{<A>}, so the output generic signature of $\Sigma_1$ is \texttt{<A>}.
+Note that the input generic signature of $\Sigma_1$ is the generic signature of \texttt{Inner}, which is the declaration being referenced, and the output generic signature of $\Sigma_1$ is the generic signature of \texttt{Outer}, the declaration the reference appears in.

-Now, let's look at the \texttt{outer} global variable. It has the type \texttt{Outer<Int>}, with the following context substitution map, which we denote as $\Sigma_2$:
+Now, consider the type of the ``\texttt{outer}'' global variable. We will denote its context substitution map by $\Sigma_2$:
\[
\Sigma_2 := \SecondMapInExample
\]
-The input generic signature of $\Sigma_2$ is \texttt{<A>}, the generic signature of \texttt{Outer}. The output generic signature of $\Sigma_2$ is the empty generic signature, because its replacement types are fully concrete. We can compose $\Sigma_1$ and $\Sigma_2$, because the output generic signature of $\Sigma_1$ is the same as the input generic signature of $\Sigma_2$:
-\[\Sigma_1\otimes\Sigma_2 = \FirstMapInExample\otimes\SecondMapInExample = \ThirdMapInExample\]
-
-Now, the substituted type of \texttt{outer.inner.value} can be derived from the interface type of \texttt{value} in two equivalent ways:
-\begin{enumerate}
-\item By applying $\Sigma_1$ to \verb|(T) -> U| and then applying $\Sigma_2$ to the result:
-\begin{gather*}
-(\texttt{(T) -> U}\otimes\Sigma_1)\otimes \Sigma_2\\
-\qquad {} = \texttt{(Array<A>) -> A}\otimes \Sigma_2\\
-\qquad {} = \texttt{(Array<Int>) -> Int}
-\end{gather*}
-\item By applying the composition $\Sigma_1\otimes\Sigma_2$ to \texttt{(T) -> U}:
+The input generic signature of $\Sigma_2$ is the generic signature of \texttt{Outer}, which is the output generic signature of $\Sigma_1$, so we can compose the two substitution maps:
+\[\Sigma_1\otimes\Sigma_2 = \ThirdMapInExample\]
+Finally, consider the interface type of the ``value'' stored property of \texttt{Inner}. This is the tuple type \texttt{(\rT, \rU)}.
The type of the expression ``\texttt{outer.inner.value}'' can be derived from this original type in two ways: we can apply each substitution map in turn, or we can apply their composition:
\begin{gather*}
-\texttt{(T) -> U}\otimes(\Sigma_1\otimes \Sigma_2)\\
-\qquad {} = \texttt{(T) -> U}\otimes \ThirdMapInExample\\
-\qquad {} = \texttt{(Array<Int>) -> Int}
+(\texttt{(\rT, \rU)}\otimes\Sigma_1)\otimes \Sigma_2\\
+\qquad {} = \texttt{(Optional<\rT>, \rU)}\otimes \Sigma_2\\
+\qquad {} = \texttt{(Optional<Int>, Bool)}\\[\medskipamount]
+\texttt{(\rT, \rU)}\otimes(\Sigma_1\otimes \Sigma_2)\\
+\qquad {} = \texttt{(\rT, \rU)}\otimes \ThirdMapInExample\\
+\qquad {} = \texttt{(Optional<Int>, Bool)}
\end{gather*}
-\end{enumerate}
-The final substituted type, \texttt{(Array<Int>) -> Int}, is the same in both cases.
\end{example}
-If $\Sigma\in\SubMapObj{G}{H}$, then the identity substitution maps $1_G$ and $1_H$ have a natural behavior under substitution map composition:
-\[1_G\otimes\Sigma = \Sigma\otimes 1_H = \Sigma\]
-The second identity carries the same caveat as the identity $\tX\otimes 1_G=\tX$ does for types; it is only true if the replacement types of $\Sigma$ are interface types. If the replacement types are contextual types, they will map to interface types, as we will explain in \SecRef{archetypesubst}.
-\begin{example}
-Recall the generic signatures $G$ and $H$, and the substitution map $\Sigma_1 := \FirstMapInExample\in\SubMapObj{G}{H}$ from \ExRef{composesubstmapexample}. We can write down the identity substitution maps $1_G$ and $1_H$:
-\begin{gather*}
-1_G := \SubstMap{\SubstType{T}{T},\,\SubstType{U}{U}}\\
-1_H := \SubstMap{\SubstType{A}{A}}
-\end{gather*}
-Now, one can verify that both of these hold:
-\begin{gather*}
-\SubstMap{\SubstType{T}{T},\,\SubstType{U}{U}}\otimes\FirstMapInExample=\FirstMapInExample\\
-\FirstMapInExample\otimes\SubstMap{\SubstType{A}{A}}=\FirstMapInExample
-\end{gather*}
-Thus $1_G\otimes\Sigma_1 = \Sigma_1\otimes 1_H=\Sigma_1$.
Note that the left and right identity substitution maps are different in this case, because the input and output generic signatures of $\Sigma_1$ are different. -\end{example} -\index{associative operation} -One final rule here is that substitution map composition is \emph{associative}. This means that both possible ways of composing three substitution maps will output the same result: + +\paragraph{Commutative diagrams.} +A \IndexDefinition{commutative diagram}\emph{commutative diagram} exhibits a collection of objects and operations, where each operation is represented by a labeled arrow. When we say this diagram \emph{commutes}, we mean that whenever we have two paths with the same source and destination, we can follow the composition of operations given by either path and arrive at the same result. For example, we can illustrate \DefRef{subst map composition} with this diagram: +\begin{center} +\begin{tikzcd}[column sep=2cm, row sep=2cm] +\tX \arrow[r, "\Sigma_1"] \arrow[rd, "\Sigma_1 \otimes \Sigma_2"'] &{\tX \otimes \Sigma_1}\arrow[d, "\Sigma_2"]\\ +&{(\tX \otimes \Sigma_1) \otimes \Sigma_2} +\end{tikzcd} +\end{center} +We will see more interesting commutative diagrams later. + +\paragraph{Identity substitution map.} +We saw that every generic signature has an identity substitution map in \SecRef{contextsubstmap}. The identity substitution map comes up whenever we forward the generic parameters as arguments to another generic declaration. Now, we consider how the identity behaves under composition. There are two possibilities. + +If $\Sigma\in\SubMapObj{G}{H}$, we can compose $\Sigma$ with the \index{identity substitution map}identity substitution map $1_G$ on the left. 
This applies $\Sigma$ to each replacement type of $1_G$, which yields each replacement type of $\Sigma$ in turn; we assemble a new substitution map identical to $\Sigma$:
\[
-(\Sigma_1\otimes\Sigma_2)\otimes\Sigma_3=\Sigma_1\otimes(\Sigma_2\otimes\Sigma_3)
+1_G \otimes \Sigma = \Sigma\tag{1}
\]
-Putting everything together, if \tX is some type, all of the following are equivalent when defined (and by our compatibility conditions, if one is defined, all are):
+We can also compose $\Sigma$ with $1_H$ on the right. This applies $1_H$ to each replacement type of $\Sigma$. From our earlier discussion of the identity substitution map, we know this leaves the replacement types of $\Sigma$ unchanged, so again we get:
+\[
+\Sigma\otimes 1_H = \Sigma\tag{2}
+\]
+
+We end with a remark about \index{primary archetype}primary archetypes, which we won't introduce until \ChapRef{chap:archetypes}. Our definition of $\SubMapObj{G}{H}$ was crafted to exclude substitution maps whose replacement types contain primary archetypes. If we also consider such substitution maps, then (1) still holds, but (2) does not. In \SecRef{archetypesubst}, we will see that a different substitution map, the \emph{forwarding substitution map}, acts as the identity element on such types, which we will refer to as \emph{contextual types}.
+
+\paragraph{Order of operations.}
+A fact we will state without proof is that substitution map composition is an \index{associative operation}\emph{associative} operation.
That is, if $\Sigma_1$, $\Sigma_2$, and $\Sigma_3$ are any three substitution maps such that all of the compositions below are defined, then:
+\[
+(\Sigma_1 \otimes \Sigma_2) \otimes \Sigma_3 = \Sigma_1 \otimes (\Sigma_2 \otimes \Sigma_3)
+\]
+For example, this means that when \tX\ is any interface type, all of the following expressions must produce the same substituted type:
\begin{gather*}
((\tX\otimes\Sigma_1)\otimes\Sigma_2)\otimes\Sigma_3\\
(\tX\otimes\Sigma_1)\otimes(\Sigma_2\otimes\Sigma_3)\\
@@ -342,419 +601,786 @@ \section{Composing Substitution Maps}\label{submapcomposition}\label{classinheri
\tX\otimes((\Sigma_1\otimes\Sigma_2)\otimes\Sigma_3)\\
\tX\otimes(\Sigma_1\otimes(\Sigma_2\otimes\Sigma_3))
\end{gather*}
-Thus, our type substitution algebra allows us to omit grouping parentheses without introducing ambiguity:
+Thus, we can omit parentheses without ambiguity in our notation:
\[\tX\otimes\Sigma_1\otimes\Sigma_2\otimes\Sigma_3\]
-A \IndexDefinition{commutative diagram}\emph{commutative diagram} is one where following the chain of operations along two paths with the same start and end always produces the same result.
-
\paragraph{Categorically speaking.}
-A \IndexDefinition{category}\emph{category} is a collection of \IndexDefinition{object}\emph{objects} and \IndexDefinition{morphism}\emph{morphisms}. (Very often the morphisms are \index{function}functions of some sort, but they might also be completely abstract.) Each morphism is associated with a pair of objects, the \emph{source} and \emph{destination}. The collection of morphisms with source $A$ and destination $B$ is denoted $\mathrm{Hom}(A,B)$. The morphisms of a category must obey certain properties:
+We won't need this elsewhere, but it's worth mentioning briefly that a certain mathematical abstraction captures the properties of substitution map composition.
A \IndexDefinition{category}\emph{category} is a collection of \IndexDefinition{object in category}\emph{objects} and \IndexDefinition{morphism in category}\emph{morphisms}. A morphism has a \emph{source} and \emph{destination} object, and the collection of morphisms with source $A$ and destination $B$ is denoted $\mathrm{Hom}(A,B)$. The morphisms must satisfy some axioms: \begin{enumerate} \item For every object $A$, there is an \IndexDefinition{identity morphism}\emph{identity morphism} $1_A\in\mathrm{Hom}(A, A)$. -\item If $f\in\mathrm{Hom}(A, B)$ and $g\in\mathrm{Hom}(B, C)$ are a pair of morphisms, there is a third morphism $g\circ f\in\mathrm{Hom}(A,C)$, called the \emph{composition} of $g$ with $f$. +\item If $f\in\mathrm{Hom}(A, B)$ and $g\in\mathrm{Hom}(B, C)$, there is a morphism $g\circ f\in\mathrm{Hom}(A,C)$, called the \index{composition!of morphisms}\emph{composition} of $g$ and $f$. (The conventional notation $g\circ f$ mirrors how we write $g(f(x))$ with function application on the left.) \item Composition respects the identity: if $f\in\mathrm{Hom}(A, B)$, then $f\circ 1_A=1_B\circ f=f$. \item Composition is associative: if $f\in\mathrm{Hom}(A, B)$, $g\in\mathrm{Hom}(B, C)$ and $h\in\mathrm{Hom}(C, D)$, then $h\circ(g\circ f)=(h\circ g)\circ f$. \end{enumerate} -We define \emph{the category of generic signatures} as follows: +For example, we could define \emph{the category of interface types} as follows: \begin{itemize} -\item The objects are generic signatures. -\item The morphisms are substitution maps (a technicality here is that their replacement types must not contain archetypes). -\item The source of a morphism (substitution map) is the input generic signature of the substitution map. -\item The destination of a morphism (substitution map) is the output generic signature of the substitution map. 
-\item The identity morphism is the identity substitution map (you will see later it does not act as the identity on archetypes, which is why we rule them out above). -\item The composition of morphisms $g\circ f$ is the composition of substitution maps $f\otimes g$ (note that we must reverse the order here for the definition to work). +\item The objects are the sets $\TypeObj{G}$, for each generic signature $G$. +\item Substitution maps are morphisms, and $\mathrm{Hom}$ is $\textsc{Sub}$. +\item The source of a morphism $\Sigma\in\SubMapObj{G}{H}$ is $\TypeObj{G}$. +\item The destination of a morphism $\Sigma\in\SubMapObj{G}{H}$ is $\TypeObj{H}$. +\item Identity substitution maps are identity morphisms. +\item Morphism composition $g \circ f$ is substitution map composition $f\otimes g$. \end{itemize} -Category theory often comes up in programming when working with data structures and higher-order functions; an excellent introduction to the topic is \cite{catprogrammer}. While we don't need to deal with categories in the abstract here, but we will encounter another idea from category theory, the commutative diagram, in \SecRef{type witnesses}. +This happens to be a \emph{concrete} category, where every object is a set and every morphism is a function between sets, but this need not be the case in general. In programming, we often encounter category theory when working with data structures and higher-order functions; an excellent introduction to the topic is \cite{catprogrammer}. + +\section{Subclassing}\label{classinheritance} + +The Swift language supports \IndexDefinition{subclassing}\emph{subclassing}, or \index{inheritance|see{subclassing}}\emph{inheritance}. After some preliminaries, we will use the type substitution algebra to describe the interaction between subclassing and generics. This will be useful when we discuss the semantics of \index{superclass requirements}superclass requirements in later chapters. 
Since superclass requirements are not an essential part of the core Swift generics model, the reader can safely skip this section on a first reading. -\section{Building Substitution Maps}\label{buildingsubmaps} +An inheritance relationship is established when some \index{class declaration!superclass type}class declaration, known as the \emph{subclass}, states a \IndexDefinition{superclass type}\emph{superclass type} in its \index{inheritance clause!class declaration}inheritance clause. Unlike \index{C++}C++, Swift does not allow multiple inheritance, so a class declaration can have at most one superclass type. Unlike \index{Java}Java, there is no distinguished \texttt{Object} class at the root of the hierarchy, so it is common to have class declarations with no superclass type at all. +\begin{example} +First, consider the case where the superclass and the subclass are both non-generic. In the below, \texttt{Apple} and \texttt{Banana} state the superclass type \texttt{Fruit}: +\begin{Verbatim} +class Fruit { + func eat() {...} + func juice() {...} +} -Now that we've seen how to get substitution maps from types, and how to compose existing substitution maps, it's time to talk about building substitution maps from scratch using the two variants of the \textbf{get substitution map} operation. +class Apple: Fruit { + override func eat() {...} +} -\IndexDefinition{get substitution map} -\index{serialized module} -\index{conforming type} -The first variant constructs a substitution map directly from its three constituent parts: a generic signature, an array of replacement types, and an array of conformances. The arrays must have the correct length---equal to the number of generic parameters and \index{conformance requirement!type substitution}conformance requirements, respectively. 
-This variant of \textbf{get substitution map} is used when constructing a substitution map from a deserialized representation, because a serialized substitution map is guaranteed to satisfy the above invariants. +class Banana: Fruit {} -\IndexDefinition{replacement type callback} -\index{type variable type} -\index{type parameter} -\index{archetype type} -\IndexDefinition{query substitution map callback} -\IndexDefinition{query type map callback} -The second variant takes the input generic signature and a pair of callbacks: -\begin{enumerate} -\item The \textbf{replacement type callback} maps a generic parameter type to a replacement type. It is invoked with each generic parameter type to populate the replacement types array. -\item The \textbf{conformance lookup callback} maps a protocol conformance requirement to a conformance. It is invoked with each conformance requirement to populate the conformances array. -\end{enumerate} -The conformance lookup callback takes three parameters: +let f: Fruit = Apple() // implicit conversion from Apple to Fruit +f.eat() // vtable-dispatched call to Apple.eat() +\end{Verbatim} +In the \index{expression type checker}expression type checker, \texttt{Apple}~and~\texttt{Banana} are \index{subtype}\emph{subtypes} of~\texttt{Fruit}. This allows us to assign the result of the \texttt{Apple()}~constructor call to our variable~\texttt{f}, whose type is~\texttt{Fruit}. 
We can visualize this with a \index{class hierarchy diagram}class hierarchy diagram:
+\begin{center}
+\begin{tikzpicture}[node distance=0.75cm]
+\node (Fruit) [class] {\texttt{\vphantom{p}Fruit}};
+\node (Apple) [class, below left=of Fruit, xshift=2em] {\texttt{Apple}};
+\node (Banana) [class, below right=of Fruit, xshift=-2em] {\texttt{\vphantom{p}Banana}};
+\draw [arrow] (Apple) -- (Fruit);
+\draw [arrow] (Banana) -- (Fruit);
+\end{tikzpicture}
+\end{center}
+Qualified lookup walks up the class hierarchy, so a \index{qualified lookup!superclass}qualified lookup into \texttt{Apple} or \texttt{Banana} will also find members of \texttt{Fruit}. Furthermore, subclasses can~\IndexDefinition{override}\emph{override} \index{function declaration!override}methods, \index{variable declaration!override}properties, and \index{subscript declaration!override}subscripts declared in their superclass. In the above, \texttt{Apple} overrides \texttt{Fruit.eat()} with its own implementation.
+
+An instance of~\texttt{Fruit} might actually be an~\texttt{Apple} at run time, so a call to \texttt{eat()} on a value of type \texttt{Fruit} must be dispatched through a \IndexDefinition{vtable}\emph{vtable} stored in the object header. The vtable for~\texttt{Fruit} points at \texttt{Fruit}'s original implementation of \texttt{eat()} and \texttt{juice()}, while the vtable for \texttt{Apple} replaces \texttt{eat()} with a pointer to \texttt{Apple.eat()}.
+\end{example}
+
+\begin{example}
+Next, suppose we declare a generic subclass of \texttt{Fruit}.
Notice how we override \texttt{Fruit.eat()} to print the concrete replacement type for~\rT:
+\begin{Verbatim}
+class Pear<T>: Fruit {
+  override func eat() {
+    print(T.self)
+  }
+}
+\end{Verbatim}
+With \texttt{Pear} as above, we can form various \index{generic class type}generic class types, such as \texttt{Pear<Int>}, \texttt{Pear<String>}, and so on; these are all obtained by applying a \index{substitution map}substitution map to the \index{declared interface type!class declaration}declared interface type \texttt{Pear<\rT>}. The key fact about this example is that every \index{specialized type}specialization of \texttt{Pear} has the same superclass type, \texttt{Fruit}, because \texttt{Pear}'s superclass type does not depend on its generic parameter \rT. Thus, as far as the expression type checker is concerned, all specializations of \texttt{Pear} are subtypes of~\texttt{Fruit}. Our complete class hierarchy diagram is now infinite, but here is one part of it:
+\begin{center}
+\begin{tikzpicture}
+\node (Fruit) [class] {\texttt{\vphantom{p}Fruit}};
+\node (PearInt) [class, below left=of Fruit] {\texttt{\vphantom{p}Pear<Int>}};
+\node (PearString) [class, below=of Fruit] {\texttt{\vphantom{p}Pear<String>}};
+\node (PearT) [class, below right=of Fruit] {\texttt{\vphantom{p}Pear<\rT>}};
+\draw [arrow] (PearInt) -- (Fruit);
+\draw [arrow] (PearString) -- (Fruit);
+\draw [arrow] (PearT) -- (Fruit);
+\end{tikzpicture}
+\end{center}
+This example implements a design pattern known as \index{type erasure}\emph{type erasure}. If we take a value of type~\texttt{Pear<\tX>} for some \tX, we can always convert it to a value of type~\texttt{Fruit}, effectively \emph{erasing} the generic argument \tX\ from the static type of the resulting value. This allows the type \tX\ to vary dynamically.
In the below, the static type of~\texttt{f} is just~\texttt{Fruit}, but the dynamic type of the value stored within is either \texttt{Pear<Int>} or \texttt{Pear<String>}, depending on the outcome of an arbitrary check:
+\begin{Verbatim}
+let someCondition: Bool = ...
+let p1 = Pear<Int>()
+let p2 = Pear<String>()
+let f: Fruit = (someCondition ? p1 : p2)
+\end{Verbatim}
+In \ChapRef{chap:existential types}, we will learn about \emph{existential types}, another more general type erasure mechanism built into the language, based on protocols.
+\end{example}
+
+\paragraph{Generic superclasses.} The interesting case arises when both subclass and superclass are generic. The superclass type of a class declaration is an \index{interface type!superclass type}interface type, so it can contain type parameters from the \index{generic signature!superclass type}generic signature of the subclass. For example:
+\begin{Verbatim}
+class StoneFruit<Pit> {}
+class Mango<T>: StoneFruit<Array<T>> {}
+\end{Verbatim}
+In certain situations, we don't care about generic arguments at all, and we say that the \IndexDefinition{superclass declaration}\emph{superclass declaration} of \texttt{Mango} is \texttt{StoneFruit}. However, the full relationship is expressed by the fact that the superclass \emph{type} of \texttt{Mango} is \texttt{StoneFruit<Array<\rT>>}. We can also obtain the superclass type of a \index{class type!superclass type}class \emph{type}, that is, a reference to a class declaration specialized with a list of generic arguments. This is defined in terms of applying a substitution map to the superclass type of a class declaration.
+
+\begin{algorithm}[Get superclass type]\label{superclass type of type} Takes a class type~\tC\ as input. Returns the superclass type of~\tC, or null if \tC\ does not have a superclass type.
\begin{enumerate}
-\item The \emph{original type}; this is the subject type of the conformance requirement.
-\item The \emph{substituted type}; this is the result of applying the substitution map to the original type, which should be canonically equal to the conforming type of the conformance that will be returned.
-\item The protocol declaration named by the conformance requirement.
+\item Let $c$ be the class declaration of \tC. If $c$ doesn't have a superclass type, return null.
+\item Let \texttt{S} be the superclass type of $c$.
+\item Let $\Sigma$ be the \index{context substitution map!superclass type}context substitution map of \tC.
+\item Return $\texttt{S}\otimes\Sigma$.
\end{enumerate}
-The callbacks can be arbitrarily defined by the caller. Several pre-existing callbacks also implement common behaviors. For the replacement type callback,
+\end{algorithm}
+\begin{example}
+The class type \texttt{Mango<Int>} is formed by applying the substitution map $\{\SubstType{\rT}{Int}\}$ to the declared interface type \texttt{Mango<\rT>}. To compute the superclass type of \texttt{Mango<Int>}, we apply this substitution map to the superclass type of the declaration, which is \texttt{StoneFruit<Array<\rT>>}:
+\[
+\texttt{StoneFruit<Array<\rT>>} \otimes \{\SubstType{\rT}{Int}\} = \texttt{StoneFruit<Array<Int>>}
+\]
+The below class hierarchy diagram shows the relationship (or lack thereof) between the two types \texttt{Mango<Int>} and \texttt{Mango<Bool>}:
+\begin{center}
+\begin{tikzpicture}
+\node (StoneFruitInt) [class] {\texttt{\vphantom{p}StoneFruit<Array<Int>>}};
+\node (MangoInt) [class, below=of StoneFruitInt] {\texttt{Mango<Int>}};
+
+\node (StoneFruitBool) [class, right=of StoneFruitInt] {\texttt{\vphantom{p}StoneFruit<Array<Bool>>}};
+\node (MangoBool) [class, below=of StoneFruitBool] {\texttt{Mango<Bool>}};
+
+\draw [arrow] (MangoInt) -- (StoneFruitInt);
+\draw [arrow] (MangoBool) -- (StoneFruitBool);
+\end{tikzpicture}
+\end{center}
+Now, consider the superclass type of \texttt{Mango<\rT>}.
We saw at the beginning of this chapter that the context substitution map of the declared interface type is the \index{identity substitution map!superclass type}identity substitution map. Indeed, since $\texttt{Mango<\rT>} \otimes \SubstMap{\SubstType{\rT}{\rT}} = \texttt{Mango<\rT>}$, the superclass type of \texttt{Mango<\rT>} is \texttt{StoneFruit<Array<\rT>>}, which is the same as the superclass type of the declaration of \texttt{Mango}.
+\end{example}
+Suppose that the class type \tC\ given to \AlgRef{superclass type of type} is an \index{interface type!superclass type}interface type for some generic signature~$H$, so that $\tC\in\TypeObj{H}$. If $G$~is the generic signature of~$c$ from Step~1, then $\Sigma\in\SubMapObj{G}{H}$ in Step~3. We already said that \texttt{S}, the superclass type of $c$, is an element of $\TypeObj{G}$, so our result $\texttt{S}\otimes\Sigma\in\TypeObj{H}$. That is, the superclass type of \tC, if it exists, is again an interface type for~$H$.
+
+Not every interface type has a superclass type.
Thus, for every generic signature $H$, we get the following \emph{partial} mapping of \index{set!types}sets, which we denote by ``\textsf{superclass}'':
+\[
+\textsf{superclass}\colon \TypeObj{H} \rightarrow \TypeObj{H}
+\]
+If \tC\ is a class type, $\tC^\prime$ is the superclass type of \tC, and $\Sigma$ is a substitution map, then the following diagram commutes:
+\begin{center}
+\begin{tikzcd}[column sep=3cm,row sep=1cm]
+\tC \arrow[d, "\textsf{superclass}"{left}] \arrow[r, "\Sigma"] &\tC \otimes \Sigma \arrow[d, "\textsf{superclass}"] \\
+\tC^\prime \arrow[r, "\Sigma"]&\tC^\prime \otimes \Sigma
+\end{tikzcd}
+\end{center}
+
+\begin{example}
+We can restate our previous example this way:
+\begin{center}
+\begin{tikzcd}[column sep=3cm,row sep=1cm]
+\texttt{Mango<\rT>} \arrow[d, "\textsf{superclass}"{left}] \arrow[r, "\{\SubstType{\rT}{Int}\}"] &\texttt{Mango<Int>} \arrow[d, "\textsf{superclass}"] \\
+\texttt{StoneFruit<Array<\rT>>} \arrow[r, "\{\SubstType{\rT}{Int}\}"]&\texttt{StoneFruit<Array<Int>>}
+\end{tikzcd}
+\end{center}
+\end{example}
+
+\paragraph{Superclass substitution map.} If qualified lookup into a class type finds a member one or more levels up the class hierarchy, we need to form the correct substitution map for the \index{member reference expression}member reference expression.
+\begin{example}
+When we invoke \texttt{method()} on a value of type \texttt{Top<Int>} below, the substitution map for the call is the \index{context substitution map!class member}context substitution map of \texttt{Top<Int>}, because \texttt{method()} is a direct member of \texttt{Top}:
+\begin{Verbatim}
+class Top<T> {
+  func method() {}
+}
+Top<Int>().method()
+\end{Verbatim}
+If we call \texttt{method()} on a subclass of \texttt{Top} though, this is no longer the case:
+\begin{Verbatim}
+class Mid<X, Y>: Top<(Y, X)> {}
+class Bot: Mid<Int, Bool> {}
+Bot().method()
+\end{Verbatim}
+The class type \texttt{Bot} is not generic, so its context substitution map is \index{empty substitution map}empty.
However, we need a substitution map for \texttt{<\rT>}, the generic signature of \texttt{method()}. Notice how starting from \texttt{Bot}, we reach the parent context of \texttt{method()} in two hops; we jump to \texttt{Mid}, and then to \texttt{Top}. The superclass types of \texttt{Mid} and \texttt{Bot} are as follows:
+\begin{gather*}
+\mathsf{superclass}(\texttt{Mid<\rT, \rU>}) = \texttt{Top<(\rU, \rT)>} = \texttt{Top<\rT>} \otimes \Sigma_1\\
+\mathsf{superclass}(\texttt{Bot}) = \texttt{Mid<Int, Bool>} = \texttt{Mid<\rT, \rU>} \otimes \Sigma_2
+\end{gather*}
+with the two substitution maps below:
+\begin{gather*}
+\Sigma_1 := \SubstMap{\SubstType{\rT}{(\rU, \rT)}}\\
+\Sigma_2 := \SubstMap{\SubstType{\rT}{Int},\,\SubstType{\rU}{Bool}}
+\end{gather*}
+The input generic signature of~$\Sigma_1$ is the generic signature of \texttt{Top}, and the output generic signature of~$\Sigma_1$ is the generic signature of \texttt{Mid}. Also, the input generic signature of~$\Sigma_2$ is the generic signature of \texttt{Mid}, and its output generic signature is the generic signature of \texttt{Bot}, which is the empty generic signature. If we \index{substitution map composition!superclass type}compose the two substitution maps, we get the following:
+\[
+\Sigma_1 \otimes \Sigma_2 = \SubstMap{\SubstType{\rT}{(Bool, Int)}}
+\]
+This is the substitution map for the call. In particular, when we call \texttt{method()} on a value of type \texttt{Bot}, the type of the \texttt{self} parameter inside the method is going to be:
+\[
+\texttt{Top<\rT>} \otimes \Sigma_1 \otimes \Sigma_2 = \texttt{Top<(Bool, Int)>}
+\]
+In fact, we can also get the same result if we apply ``\textsf{superclass}'' twice:
+\[
+\mathsf{superclass}(\mathsf{superclass}(\texttt{Bot})) = (\texttt{Top<\rT>} \otimes \Sigma_1) \otimes \Sigma_2
+\]
+\end{example}
+
+We can generalize this as follows.
We are given a class type together with the parent \index{class declaration}class declaration of some referenced member, so we must walk up the class hierarchy and collect substitutions along the way, until we reach the parent class. + +\begin{algorithm}[Get superclass type for declaration]\label{superclassfordecl} As input, takes a class type \tC\ and a class declaration $d$. Returns the superclass type of \tC\ for $d$. \begin{enumerate} -\item The \textbf{query substitution map} callback looks up a generic parameter in an existing substitution map. -\item The \textbf{query type map} callback looks up a generic parameter in a hashtable. +\item Let $c$ be the class declaration of \tC. If~$c=d$, return \tC. +\item Let \texttt{S} be the superclass type of $c$, or signal an invariant violation if it doesn't have one. (That would indicate $d$~is not a superclass of the original class type~\tC, in which case members of $d$ should not be visible as members of \tC.) +\item Let~$\Sigma$ be the context substitution map of~\tC. Set $\tC \leftarrow \texttt{S} \otimes \Sigma$. Go back to Step~1. \end{enumerate} -For the conformance lookup callback, +\end{algorithm} + +Note that this algorithm returns the type of the \Index{self parameter@\texttt{self} parameter}\texttt{self} parameter as seen ``inside'' the referenced member. To get the substitution map for the reference, we ask this type for its context substitution map. This is known as the \IndexDefinition{superclass substitution map}\emph{superclass substitution map}. + +This wraps up our discussion of subclassing for now. In later sections, we will learn more about this topic, and in particular, we will see how this all intersects with superclass requirements in a generic signature: +\begin{itemize} +\item \ChapRef{chap:conformances} explains how a subclass inherits conformances from its superclass. 
+\item \SecRef{local requirements} explains how archetypes behave when their type parameter is subject to a superclass requirement. +\item \SecRef{identtyperepr} explains how type resolution resolves a reference to a member type declared in the superclass of a base type. +\item \SecRef{checking generic arguments} explains how we can check if a substitution map satisfies a superclass requirement. +\item \SecRef{requirement desugaring} and \SecRef{minimal requirements} explain what it means for the user's program to state a superclass requirement that is either redundant or in conflict with some other requirement. +\end{itemize} + +\section{SIL Type Lowering}\label{sec:type lowering} + +We end this chapter by taking a brief look at the \index{SIL}SIL type system. Recall that SIL is an intermediate representation, constructed from the \index{abstract syntax tree}abstract syntax tree by the compiler's \index{SILGen}SILGen pass. In this section, the types appearing in the \index{abstract syntax tree}abstract syntax tree will be called \IndexDefinition{formal type}\emph{formal types}, to distinguish them from the \IndexDefinition{lowered type}\emph{lowered types} used by SIL. The translation of formal types into lowered types is the process of \IndexDefinition{SIL type lowering}\index{type lowering|see{SIL type lowering}}\emph{SIL type lowering}. + +Our primary focus in this section is the relationship between SIL type lowering and \index{type substitution!lowered type}type substitution. We cannot hope to gain a complete understanding of SIL type lowering here, so certain concepts will be omitted or left unexplained. For more details about SIL and SIL types, see \cite{sil,siltypes}. The reader may also choose to skip this section entirely, because this material is not needed in the rest of this book. + +\paragraph{Loadable or address-only.} +Most formal types are also lowered types, and SIL type lowering simply returns the original formal type in this case. 
One major exception concerns the types of functions. Unlike the \index{function type!SIL type lowering}function types of \SecRef{sec:more types}, which we will refer to as \emph{formal} function types in this section, the type of a function in SIL is a different kind of type that does not appear in the abstract syntax tree: the \IndexDefinition{SIL function type}\emph{SIL function type}. Compared to formal function types, SIL function types encode more details about the \index{calling convention}calling convention. We begin by taking a look at one of these details. + +A SIL function type makes explicit whether each \IndexDefinition{lowered parameter}lowered parameter and \IndexDefinition{lowered result}result can be passed in registers, or if it must be passed indirectly, via a pointer to a value stored in memory. If the parameter's lowered type is \IndexDefinition{loadable type}\emph{loadable}, it is passed directly; otherwise, the lowered type is \IndexDefinition{address-only type}\emph{address-only}, and it must be passed indirectly. Almost all lowered types are loadable. The exceptions are the following: \begin{enumerate} -\item The \textbf{global conformance lookup} callback performs a global conformance lookup (\SecRef{conformance lookup}). -\item The \textbf{local conformance lookup} callback performs a local conformance lookup into another substitution map (\SecRef{abstract conformances}). -\item The \textbf{make abstract conformance} callback asserts that the substituted type is a type variable, type parameter or archetype, and returns an abstract conformance (also in \SecRef{abstract conformances}). It is used when it is known that the substitution map can be constructed without performing any conformance lookups, as is the case with the identity substitution map. 
-\end{enumerate} -\IndexDefinition{conformance lookup callback} -\index{abstract conformance} -\index{global conformance lookup!substitution map} -\index{local conformance lookup} -\IndexDefinition{global conformance lookup callback} -\IndexDefinition{local conformance lookup callback} -\IndexDefinition{make abstract conformance callback} +\item \index{type parameter type!SIL type lowering}Type parameter types and \index{archetype type!SIL type lowering}archetype types are address-only when they are not subject to an \Index{AnyObject@\texttt{AnyObject}}\texttt{AnyObject} layout constraint (\SecRef{sec:requirements}). -Specialized types only store their generic arguments, not conformances, so the \index{context substitution map!construction}context substitution map of a specialized type is constructed by first populating a \texttt{DenseMap} with the generic arguments of the specialized type and all of its parent types, and then invoking the \textbf{get substitution map} operation with the \textbf{query type map} and \textbf{global conformance lookup} callbacks. +(Values of such type have an unknown size at compile time.) -\index{identity substitution map} -The identity substitution map of a generic signature is constructed from a replacement type callback which just returns the input generic parameter together with the \textbf{make abstract conformance} callback. +\item \index{weak reference type!SIL type lowering}Weak reference types (\SecRef{sec:special types}) are address-only. -\iffalse +(While a weak reference is just the address of a class instance, the location of each weak reference must be registered with the Swift runtime at all times, and so they can only ``exist'' in memory.) -When a subclass inherits from a superclass, there is a subtype relationship between the subclass and superclass. If neither class is generic, the relationship is straightforward. 
Here, we have a pair of classes \texttt{C} and \texttt{D}; \texttt{D} inherits from \texttt{C}, so instances of \texttt{D} are also instances of \texttt{C}: -\begin{Verbatim} -class C {} -class D {} +\item \index{struct type!SIL type lowering}Struct and \index{enum type}enum types with \index{resilience!SIL type lowering}\emph{resilient} declarations are address-only. + +(We briefly talked about the resilience model in \SecRef{module system}, but further discussion is beyond the scope of this book. See \cite{evolutionblog,libraryevolution} for details.) + +\item Aggregate types, such as \index{tuple type!SIL type lowering}tuples with address-only elements, structs with address-only \index{stored property}stored properties, and enums with address-only cases, are address-only. -let instanceOfD: D = D() -let instanceOfC: C = instanceOfD // okay +(The property of being address-only is \index{transitive relation}transitive with respect to containment.) + +\item \index{existential type!SIL type lowering}Existential types (\ChapRef{chap:existential types}) are address-only when not \texttt{AnyObject} constrained. + +(A value of such a type can \emph{contain} a value of address-only type, and so the existential container itself must be address-only.) +\end{enumerate} +We will look at some examples before we discuss the general case. + +\begin{example}\label{ex:loadable address only 1} +The \texttt{Int} and \texttt{String} types in the standard library are loadable, and so is the below \texttt{Loadable} struct type, because its stored properties are loadable: +\begin{Verbatim} +struct Loadable { + var x: Int + var y: String +} \end{Verbatim} -With generic classes, the situation is more subtle. The subclass \emph{declaration} states a superclass \emph{type}. 
The superclass type appears in the inheritance clause of the subclass, and can reference the subclass's generic parameters:
+On the other hand, the below \texttt{AddressOnly} struct is address-only, because it contains a weak reference:
\begin{Verbatim}
-class Base {}
-class Derived: Base {}
+struct AddressOnly {
+  weak var x: AnyObject?
+}
\end{Verbatim}
-Now, the declaration \texttt{Derived} has the generic superclass type \texttt{Base}. We expect that \texttt{Derived} is a subtype of \texttt{Base}, and \texttt{Derived} is a subtype of \texttt{Base}, but that \texttt{Derived} and \texttt{Base} are unrelated types.
+\end{example}
-To get a complete picture of the subtype relationship, we need to define the concept of the superclass type \emph{of a type}, and not just the superclass type of a declaration.
+For \index{generic nominal type!SIL type lowering}generic nominal types, deciding if a type is address-only requires type substitution.
-First of all, what is the superclass type of the declared interface type of a class? The superclass type of the class declaration is an interface type for the class declaration's generic signature, so we say that the superclass type of the declared interface type is just the superclass type of the declaration. In our example, this tells us that \texttt{Derived} is a subtype of \texttt{Base} and an unrelated type to \texttt{Base}.
+\begin{example}\label{ex:lowering optional}
+First, consider the standard library \texttt{Optional} enum:
+\begin{Verbatim}
+public enum Optional<Wrapped> {
+  case none
+  case some(Wrapped)
+}
+\end{Verbatim}
+In this section, we will say that a \index{generic nominal type!SIL type lowering}generic nominal type formed from this declaration is an ``\index{optional type}optional type.'' (Not to be confused with the \index{optional sugared type}optional sugared type that is spelled as \texttt{Int?}, from \SecRef{sec:more types}---however, the latter desugars to the former.)
-What about the superclass type of an arbitrary specialization of the class? Here, we rely on the property that a specialized type is the result of applying its context substitution map to the declared interface type. If we instead apply the context substitution map to the superclass type of the class declaration, we get the superclass type of our specialized type. This can be shown with a commutative diagram:
-\begin{quote}
-\begin{tikzcd}[column sep=3cm,row sep=1cm]
-\mathboxed{declared interface type} \arrow[d, "\text{superclass type}"{left}] \arrow[r, "\text{substitution}"] &\mathboxed{specialized type} \arrow[d, "\text{superclass type}"] \\
-\mathboxed{superclass type of declaration} \arrow[r, "\text{substitution}"]&\mathboxed{superclass type of type}
-\end{tikzcd}
-\end{quote}
+The generic argument of an optional type is called the \index{payload type of optional}\emph{payload type} of the optional. In memory, a value of optional type must be large enough to store the payload, together with a tag value indicating if the selected case is \texttt{none} or \texttt{some}. (The tag is also sometimes encoded \emph{inside} the payload~\cite{typelayout}.)
-Now that we can compute the superclass type of a type, we can walk up the inheritance hierarchy by iterating the process, to get the superclass type of a superclass type, and so on.
+An optional type is loadable if and only if its payload type is loadable. Thus, \texttt{Optional<Int>} is loadable, while \texttt{Optional<AddressOnly>}, with \texttt{AddressOnly} as in the previous example, is address-only.
+\end{example}
-
-\begin{algorithm}[Iterated superclass type]\label{superclassfordecl} As input, takes a class type \texttt{T} and a superclass declaration \texttt{D}. Returns the superclass type of \texttt{T} for \texttt{D}.
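+The containment rule from the list above composes through nesting. As an illustration, consider a hypothetical \texttt{Box} struct (not part of the original examples) that reuses \texttt{Loadable} and \texttt{AddressOnly} from \ExRef{ex:loadable address only 1}:
+\begin{Verbatim}
+struct Box {
+  var a: Loadable     // loadable
+  var b: AddressOnly  // address-only
+}
+\end{Verbatim}
+Since one of its stored properties is address-only, \texttt{Box} is itself address-only, and so is any aggregate that in turn contains a \texttt{Box}.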
+\begin{example}
+Next, consider this enum declaration, which is identical to \texttt{Optional} except that the ``\texttt{some}'' case is declared to be \index{indirect enum case}\texttt{indirect}:
+\begin{Verbatim}
+enum IndirectOptional<Wrapped> {
+  case none
+  indirect case some(Wrapped)
+}
+\end{Verbatim}
+Instead of storing the payload inline, an \texttt{indirect} enum case is represented by a single reference-counted pointer that refers to a heap-allocated box. For this reason, both \texttt{IndirectOptional<Int>} and \texttt{IndirectOptional<AddressOnly>} are loadable.
+\end{example}
+
+\begin{example}
+This generic struct declaration has no stored properties at all:
+\begin{Verbatim}
+struct Phantom<T> {}
+\end{Verbatim}
+Hence, both \texttt{Phantom<Int>} and \texttt{Phantom<AddressOnly>} are loadable.
+\end{example}
+
+\begin{example}
+When a type contains \index{type parameter type!SIL type lowering}type parameters, its category depends on the referencing generic context:
+\begin{Verbatim}
+func f<T, U: AnyObject>(_ t: T, _ u: U) {
+  let x: Optional<T> = ... // address-only
+  let y: Optional<U> = ... // loadable
+}
+\end{Verbatim}
+The type of \texttt{x} is address-only, because \texttt{T} is unconstrained, while the type of \texttt{y} is loadable, because \texttt{U} is subject to an \texttt{AnyObject} layout constraint.
+\end{example}
+
+Now, we're going to look at the algorithm. As described here, it accepts both formal and lowered types. In the actual implementation, it operates on lowered types only, and we compute a number of additional properties of the type's layout in the same way.
+
+\begin{algorithm}[Decide if type is address-only]\label{alg:lowered type category}
+Receives a type~\tX\ as input, together with an optional \index{generic signature!SIL type lowering}generic signature~$G$ in the case where \tX\ contains type parameters. Returns one of ``\index{loadable type}loadable'' or ``\index{address-only type}address-only'' as output.
\begin{enumerate} -\item Let \texttt{C} be the class declaration referenced by \texttt{T}. If $\texttt{C}=\texttt{D}$, return \texttt{T}. -\item If \texttt{C} does not have a superclass type, fail with an invariant violation; \texttt{D} is not actually a superclass of \texttt{T}. -\item Otherwise, apply the context substitution map of \texttt{T} to the superclass type of \texttt{C}. Assign this new type to \texttt{T}, and go back to Step~1. +\item If \tX\ is a \index{type parameter!SIL type lowering}\textbf{type parameter}, say $\tX = \tT$: evaluate the \IndexQuery{requiresClass}$\Query{requiresClass}{G,\,\tT}$ generic signature query. If the answer to the query is true, return ``loadable,'' otherwise return ``address-only.'' +\item If \tX\ is an \index{archetype type!SIL type lowering}\textbf{archetype type} (\ChapRef{chap:archetypes}): proceed as above, but in place of~$G$, use the generic signature stored within the archetype. +\item If \tX\ is a \index{weak reference type!SIL type lowering}\textbf{weak reference type}, return ``address-only.'' +\item If \tX\ is a \index{struct type!SIL type lowering}\textbf{struct type}, say $\tX = \tXd \otimes \Sigma$ where~$\Sigma$ is the \index{context substitution map}context substitution map and~$\tXd$ is the \index{declared interface type!SIL type lowering}declared interface type of its struct declaration~$d$, then, for each \index{stored property!SIL type lowering}stored property of~$d$: +\begin{enumerate} +\item Let \tY\ be the \index{interface type!SIL type lowering}interface type of the stored property. +\item Compute the substituted type $\tY \otimes \Sigma$. +\item Recursively check if $\tY \otimes \Sigma$ is address-only. 
+\item If it is address-only, immediately return ``address-only.'' \end{enumerate} -\end{algorithm} +Otherwise, if all substituted stored property types are loadable, return ``loadable.'' -\begin{example}\label{genericsuperclassexample} -\ListingRef{generic superclass example listing} shows a class hierarchy demonstrating these behaviors: +\item If \tX\ is an \index{enum type!SIL type lowering}\textbf{enum type}: \begin{enumerate} -\item The superclass type of \texttt{Derived} is \texttt{Middle}. -\item The superclass type of \texttt{Middle} is \texttt{Base<(T, T)>}. +\item Check if the \index{enum declaration!SIL type lowering}enum declaration is \texttt{indirect}; if so, return ``loadable.'' +\item Otherwise, proceed as in the struct case, but only consider those enum cases \emph{not} declared as \index{indirect enum case}\texttt{indirect}. + +(The lowered type of an \index{indirect enum case}\texttt{indirect} case is always a reference-counted pointer to a \index{heap-allocated box}heap-allocated box, and such a pointer is loadable. If the enum itself is \texttt{indirect}, we proceed as if each case was \texttt{indirect}.) \end{enumerate} -The superclass type of the type \texttt{Middle} is the superclass type of \texttt{Middle} with the context substitution map of \texttt{Middle} applied: -\[\texttt{Base<(T, T)>}\otimes -\SubstMap{ -\SubstType{T}{Int}\\ -\SubstType{U}{String} -} = \texttt{Base<(Int, Int)>} -\] -This means the superclass type of \texttt{Derived} with respect to \texttt{Base} is \texttt{Base<(Int, Int)>}. -What is the type \texttt{Derived.C}? The type alias \texttt{C} is declared in \texttt{Base}. The superclass type of \texttt{Derived} with respect to \texttt{Base} is \texttt{Base<(Int, Int)>}. 
We can apply the context substitution map of this superclass type to the declared interface type of \texttt{C}: -\[\texttt{() -> T}\otimes -\SubstMap{ -\SubstType{T}{(Int, Int)} -} = \texttt{() -> (Int, Int)} -\] -\end{example} +\item If \tX\ is a \index{tuple type!SIL type lowering}\textbf{tuple type}, recursively check if each element type of \tX\ is address-only. If any are address-only, return ``address-only.'' Otherwise, return ``loadable.'' -\fi +\item If \tX\ is an \index{existential type!SIL type lowering}\textbf{existential type}, check if the existential type satisfies the \texttt{AnyObject} layout constraint. If so, return ``loadable,'' otherwise ``address-only.'' -\section{Nested Nominal Types}\label{nested nominal types} +\item If \tX\ is a \index{class type!SIL type lowering}\textbf{class type}, \index{metatype type!SIL type lowering}\textbf{metatype type}, or \index{function type!SIL type lowering}\textbf{(SIL) function type}, return ``loadable.'' +\end{enumerate} +\end{algorithm} + +\paragraph{Exploding tuples.} +A formal function type has zero or more parameter types, and exactly one return type. This return type can be a \index{tuple type!SIL type lowering}tuple type, which is how we model the case of having no return value, or more than one return value. -Nominal type declarations can appear \index{nested type declaration}inside other declaration contexts, subject to the following \index{limitation!nested type declarations}restrictions: +Like a formal function type, a \index{SIL function type}SIL function type has zero or more \index{lowered parameter!tuple type}lowered parameter types, but unlike a formal function type, a SIL function type \emph{also} has zero or more lowered \index{lowered result!tuple type}result types. When a tuple type appears in the parameter list or return type of a formal function type, SIL type lowering will ``explode'' it into multiple lowered parameters or lowered results. 
+ +Both lowered parameters and lowered results are annotated with a \index{convention!SIL type lowering}\emph{convention}, which encodes two things: whether the value is passed directly in registers or indirectly via memory, and also, how the \index{ownership}ownership of the value changes during the call: \begin{enumerate} -\item Structs, enums and classes cannot be nested in generic \index{local declaration context}local contexts. -\item Structs, enums and classes cannot be nested in protocols or \index{protocol extension}protocol extensions. -\item Protocols cannot be nested in generic contexts. +\item The directness is determined by the lowered parameter or result type; if the type is loadable, we may pass it directly (but sometimes, we must pass it indirectly, as we will see below); when it is address-only, it must be passed indirectly. +\item A discussion of ownership is beyond the scope of this book. See \cite{siltypes} for details. \end{enumerate} -We're going to explore the implementation limitations behind these restrictions, and possible future directions for lifting them. (The rest of the book talks about what the compiler does, but this section is about what the compiler \emph{doesn't} do.) -\paragraph{Types in generic local contexts.} \index{local declaration context}\index{local type declaration}This restriction is a consequence of a shortcoming in the representation of a nominal type. Recall from \ChapRef{types} that nominal types and generic nominal types store a parent type, and generic nominal types additionally store a list of generic arguments, corresponding to the generic parameter list of the nominal type declaration. This essentially means there is no place to store the generic arguments from outer \index{generic context}local contexts, such as functions. +A SIL function type also has an overall \index{calling convention}convention specifier. 
The possible conventions are a superset of those for formal function types, which were listed in \SecRef{sec:more types}. For us, the two that we need are \index{thin function!SIL type lowering}\texttt{@convention(thin)}, for functions that don't capture values (such as global functions and methods), and \index{thick function!SIL type lowering}\texttt{@convention(thick)}, for functions that do (such as \index{closure expression!SIL type lowering}closure expressions). A thick function value consists of a function pointer together with a \index{closure context}context, which can be used to store \index{captured value}captured values---the caller passes the context as an additional parameter. Finally, a SIL function type has a \index{generic signature!SIL function type}generic signature.
-\begin{listing}\captionabove{A nominal type declaration in a generic local context}\label{nominal type in generic local context}
+\begin{example}
+Here is a \index{function declaration}function declaration with two parameters, the second of which is a tuple. The second element of the tuple is our address-only type from \ExRef{ex:loadable address only 1}:
\begin{Verbatim}
-func f<T>(t: T) {
-  struct Nested {  // error
-    let t: T
-
-    func printT() {
-      print(t)
-    }
-  }
-
-  Nested(t: t).printT()
-}
+func foo(x: Int, y: (String, AddressOnly)) {}
+\end{Verbatim}
+Here is the notation used by the compiler when printing this declaration's SIL function type (for example, put the above declaration together with the \texttt{AddressOnly} struct from earlier in a source file, and build it with the \IndexFlag{emit-silgen}\verb|-emit-silgen| flag):
+\begin{quote}
+\begin{verbatim}
+@convention(thin)
+(Int, @guaranteed String, @in_guaranteed AddressOnly) -> ()
+\end{verbatim}
+\end{quote}
+The \texttt{Int} parameter does not have a convention annotation, so it is passed directly. The \texttt{Int} type is a \index{trivial type}trivial type, so its values can be moved and copied without ownership concerns.
The \texttt{String} parameter is also passed directly, but this time, the \verb|@guaranteed| convention indicates that ownership of the string is retained by the caller. Finally, the \texttt{AddressOnly} parameter is address-only, so the \verb|@in_guaranteed| convention is used to pass the value indirectly, and once again, the caller retains ownership of the value. Since \texttt{foo()} does not state a return type, the SIL function type has zero results.
+\end{example}
-func g() {
-  f(t: 123)
-  f(t: "hello")
+\begin{example}
+Here is a generic function that returns a tuple:
+\begin{Verbatim}
+func flip<T, U>(t: T, u: U) -> (U, T) {
+  return (u, t)
}
\end{Verbatim}
-\end{listing}
+Here is its SIL function type:
+\begin{quote}
+\begin{verbatim}
+@convention(thin)
+  (@in_guaranteed T, @in_guaranteed U) -> (@out U, @out T)
+\end{verbatim}
+\end{quote}
+Both parameters are passed indirectly, with the caller retaining ownership (so in fact the implementation of the function must copy these values). The SIL function type also returns two results indirectly (at the machine level, the caller provides a pair of return buffers, large enough to hold both results).
+\end{example}
-\ListingRef{nominal type in generic local context} shows a nominal type nested inside of a generic function. The generic signature of \texttt{Nested} contains the generic parameter \texttt{T} from the outer generic function \texttt{algorithm()}. However, under our rules, the declared interface type of \texttt{Nested} is a singleton nominal type, because \texttt{Nested} does not have its own generic parameter list, and its parent context is not a nominal type declaration. This means there is no way to recover a \index{context substitution map!of local type}context substitution map for this type because the generic argument for \texttt{T} is not actually stored anywhere.
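+\begin{example}
+As one more illustration of these conventions, consider a hypothetical identity function (our own example, following the same lowering rules as above) whose parameter and result are both an unconstrained generic parameter:
+\begin{Verbatim}
+func identity<T>(_ t: T) -> T {
+  return t
+}
+\end{Verbatim}
+Since \texttt{T} is unconstrained, its values might be address-only, so both the parameter and the result must be passed indirectly in the SIL function type:
+\begin{quote}
+\begin{verbatim}
+@convention(thin) (@in_guaranteed T) -> (@out T)
+\end{verbatim}
+\end{quote}
+\end{example}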
+\paragraph{Re-abstraction thunks.} +So, to get a SIL function type of a \index{function declaration!SIL type lowering}function declaration or \index{closure expression!SIL type lowering}closure expression, we take its \index{interface type!SIL function type}interface type, which is a formal function type. We explode tuples in the parameter list and return type, compute the category of each parameter and result, and do a few other things. Of course, some of those parameter and result types may themselves be function types, and we recursively lower those function types, too. We then form our SIL function type. This is sufficient to explain SILGen's behavior with direct calls of functions and closures, but it is not the whole story. -In the source language, there is no way to specialize \texttt{Nested}; the reference to \texttt{T} inside \texttt{f()} is always understood to be the generic parameter \texttt{T} of the outer function. However, inside the compiler, different generic specializations can still arise. If the two calls to \texttt{f()} from inside \texttt{g()} are specialized and inlined by the SIL optimizer for example, the two temporary instances of \texttt{Nested} must have different in-memory layouts, because in one call \texttt{T} is \texttt{Int}, and in the other \texttt{T} is \texttt{String}. +To understand what happens when a function value is passed between generic contexts, we need to talk about \IndexDefinition{abstraction pattern}\emph{abstraction patterns}. The operation of SIL type lowering in the compiler receives not just a \index{formal type}formal type, but also an abstraction pattern. -A better representation for the specializations of nominal types would replace the parent type and list of generic arguments with a single ``flat'' list that includes all outer generic arguments as well. This approach could represent generic arguments coming from outer local contexts without loss of information. 
+This complication arises for the following reason. When a substitution map is applied to a SIL function type, the type parameters in the SIL function type are replaced with concrete types. However, substitution does not change the directness of the SIL function type's parameters and results. If an original type parameter is address-only but its replacement type is loadable, the resulting SIL function type will differ from what we would get if we just lower the substituted formal type instead. Typically, a closure expression is emitted with a ``natural'' abstraction pattern that is its own formal type. To change the abstraction pattern of a function value, SILGen wraps the function value in a \IndexDefinition{re-abstraction thunk}\emph{re-abstraction thunk}. There are two kinds of thunk: +\begin{enumerate} +\item A \IndexDefinition{substituted-to-original thunk}\textbf{substituted-to-original thunk} appears when a closure is passed in to a generic function. It wraps a function type lowered with itself as the abstraction pattern, with a function type lowered with a more general abstraction pattern. +\item An \IndexDefinition{original-to-substituted thunk}\textbf{original-to-substituted thunk} appears when a closure is returned from a generic function. It wraps a function type lowered with a more general abstraction pattern, with one lowered with the fully substituted abstraction pattern. +\end{enumerate} -\index{runtime type metadata} -Luckily, this ``flat'' representation is already implemented in the Swift runtime. The runtime type metadata for a nominal type includes all the generic parameters from the nominal type declaration's generic signature, not just the generic parameters of the nominal type declaration itself. So while lifting this restriction would require some engineering effort on the compiler side, it would be a backward-deployable and \index{ABI}ABI-compatible change. 
+\begin{example}\label{ex:abstraction pattern 1}
+Consider this rather pointless generic function that applies a closure to a value and returns the result:
+\begin{Verbatim}
+func apply<T>(_ x: T, _ fn: (T) -> T) -> T {
+  return fn(x)
+}
+\end{Verbatim}
+Say we call our function like this, with the substitution map $\Sigma := \SubstMap{\SubstType{\rT}{String}}$:
+\begin{Verbatim}
+let fn: (String) -> String = { $0.uppercased() }
+let result = apply("hello world", fn)
+\end{Verbatim}
+If we apply $\Sigma$ to the formal type of the ``\texttt{fn}'' parameter to \texttt{apply()}, we get:
+\[
+\left[ \texttt{(\rT) -> \rT} \right] \otimes \Sigma = \texttt{(String) -> String}
+\]
+The above formal type is identical to the formal type of the \index{closure expression}closure expression passed in as an argument by the caller. Now, let's take a look at lowered types. Since \texttt{String} is loadable, lowering the above formal type gives:
+\begin{quote}
+\begin{verbatim}
+@convention(thick) (@guaranteed String) -> (@owned String)
+\end{verbatim}
+\end{quote}
+On the other hand, the lowered type of the ``\texttt{fn}'' parameter passes the parameter and returns the result indirectly:
+\begin{quote}
+\begin{verbatim}
+@convention(thick) (@in_guaranteed T) -> (@out T)
+\end{verbatim}
+\end{quote}
+Finally, if we apply $\Sigma$ to this \index{SIL function type!type substitution}SIL function type, we obtain the following:
+\begin{quote}
+\begin{verbatim}
+@convention(thick) (@in_guaranteed String) -> (@out String)
+\end{verbatim}
+\end{quote}
+To reconcile the difference, SILGen wraps the closure with a substituted-to-original thunk before handing it off to \texttt{apply()}. This thunk receives a pointer to a \texttt{String}, loads it, and calls the wrapped closure with the loaded value. The closure returns a new \texttt{String} directly, which the thunk stores into the indirect result buffer before returning.
+\end{example}
-\paragraph{Types in protocol contexts.} Allowing struct, enum and class declarations to appear inside protocols and protocol extensions would come down to deciding if the \IndexSelf protocol \tSelf\ type should be ``captured'' by the nested type.
+\paragraph{More about abstraction patterns.}
+For our purposes, an \index{abstraction pattern}abstraction pattern simply consists of the unsubstituted \index{formal type!abstraction pattern}formal type for computing the directness of each \index{lowered parameter!abstraction pattern}parameter and \index{lowered result!abstraction pattern}result in the \index{SIL function type!abstraction pattern}SIL function type. (In reality, abstraction patterns also carry a \emph{kind} and some additional information, but we won't need this level of detail here.) So, a \index{re-abstraction thunk}re-abstraction thunk is then basically a closure that wraps another closure. The body of the thunk takes each parameter and forwards it to the wrapped closure, then takes the wrapped closure's result and returns it to the caller of the thunk. In doing so, a re-abstraction thunk can change the convention of the parameters and results.
-\begin{listing}\captionabove{A nominal type declaration nested in a protocol context}\label{nominal type in protocol context}
-\begin{Verbatim}
-protocol P {}
+There is also a useful peephole optimization that we will mention now. Very often, the closure being re-abstracted is a literal \index{closure expression!re-abstraction thunk}closure expression, in which case SILGen is able to directly emit the closure with the expected abstraction pattern, instead of immediately wrapping it in a thunk. This avoids an unnecessary heap allocation.
-extension P {
-  typealias Me = Self
+\paragraph{Opaque abstraction patterns.}
+Type lowering traverses a formal function type ``in parallel'' with the abstraction pattern in order to build the \index{SIL function type}SIL function type.
The abstraction pattern must have the same shape as the formal type, in the sense that the formal type should be obtainable from the abstraction pattern by substitution. For example, the abstraction pattern \texttt{(Int, \rU) -> \rV} is compatible with the formal type \texttt{(Int, (Bool, String) -> Float) -> Bool}, but not with the formal type \texttt{(Bool, Int, String) -> (Float, Float)}---in the latter, the number of parameters doesn't match up, and neither does the return type.
-  struct Nested {  // error
-    let value: Me  // because what would this mean?
+The implementation of this ``parallel decomposition'' is mostly obvious, except when the abstraction pattern consists of a single \index{type parameter}type parameter. This is referred to as an \index{opaque abstraction pattern}\emph{opaque} abstraction pattern in the implementation (not to be confused with opaque parameters or opaque result types). In this situation, we lower the formal function type in the ``most general'' way, as if each one of its parameters and results was passed indirectly, without exploding tuples. The next three examples will illustrate this.
-    func method() {
-      print(value)
-    }
-  }
-
-  func f() {
-    Nested(value: self).method()
-  }
+\begin{example}
+Let's modify the \texttt{apply()} function from \ExRef{ex:abstraction pattern 1} to receive an array of closures instead, while making the closure's parameter concrete:
+\begin{Verbatim}
+func apply1<T>(_ x: String, _ fns: [(String) -> T]) -> [T] {
+  return fns.map { fn in fn(x) }
+}
+\end{Verbatim}
+We can do the same but make the result concrete instead:
+\begin{Verbatim}
+func apply2<T>(_ x: T, _ fns: [(T) -> String]) -> [String] {
+  return fns.map { fn in fn(x) }
}
-
-struct S1: P {}
-struct S2: P {}  // are S1.Nested and S2.Nested distinct?
\end{Verbatim}
-\end{listing}
-If the nested type captures \tSelf, the code shown in \ListingRef{nominal type in generic local context} would become valid.
With this model, the \texttt{Nested} struct depends on \tSelf, so it would not make sense to reference it as a member of the protocol itself, like \texttt{P.Nested}. Instead, \texttt{Nested} would behave as if it was a member of every \index{conforming type}conforming type, like \texttt{S.Nested} above (or even \texttt{T.Nested}, if \texttt{T} is a generic parameter conforming to \texttt{P}). At the implementation level, the generic signature of a nominal type nested inside of a protocol context would include the protocol \tSelf\ type, and the \emph{entire} parent type, for example \texttt{S} in \texttt{S.Nested}, would become the replacement type for \tSelf\ in the context substitution map.
+Say we call both functions with an array containing some large number of closures:
+\begin{Verbatim}
+let fns: [(String) -> String] = [
+  { $0.uppercased() },
+  /* and more... */
+]
+apply1("hello world", fns)
+apply2("hello world", fns)
+\end{Verbatim}
+We don't want to re-abstract every closure in the array before we pass it to \texttt{apply1()} and \texttt{apply2()}. Instead, when storing a closure in a generic container, we re-abstract it into its most general form, where both the parameter and result are passed indirectly. This is accomplished by wrapping each closure in a \index{substituted-to-original thunk}substituted-to-original thunk before storing it in the array, and wrapping it again in an \index{original-to-substituted thunk}original-to-substituted thunk when loading it from the array.
+\end{example}
-The alternative is to prohibit the nested type from referencing the protocol \tSelf\ type. The nested type's generic signature would \emph{not} include the protocol \tSelf\ type, and \texttt{P.Nested} would be a valid member type reference. The protocol would effectively act as a namespace for the nominal types it contains, with the nested type not depending on the conformance to the protocol in any way.
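+To make this concrete, and assuming the lowering rules sketched above, the closures stored in \texttt{fns} all receive the most general lowered type, with both the parameter and the result passed indirectly:
+\begin{quote}
+\begin{verbatim}
+@convention(thick) (@in_guaranteed String) -> (@out String)
+\end{verbatim}
+\end{quote}
+This is the same SIL function type that we obtained in \ExRef{ex:abstraction pattern 1} by applying the substitution map to the lowered type of the ``\texttt{fn}'' parameter.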
+An interesting \index{limitation!re-abstraction thunk}limitation of this implementation model is that re-abstraction thunks can nest, resulting in potentially unbounded memory usage. This was reported by a Swift developer on the forums~\cite{abstractleak}.
-\begin{listing}\captionabove{Protocol declaration nested inside other declaration contexts}\label{protocol nested inside type}
+\begin{example}
+Suppose we take an array of closures, repeatedly load each element from the array, and store it back without modification:
\begin{Verbatim}
-struct Outer {
-  protocol P {}  // allowed as of SE-0404
+var array: [(String) -> String] = []
+
+// Initialize the array
+for _ in 0 ..< 100 {
+  array.append({ $0.lowercased() })
}
-struct S: Outer.P {}
+// Repeat this as many times as necessary
+for _ in 0 ..< 1000 {
+  for n in 0 ..< array.count {
+    let fn = array[n]
+    array[n] = fn
+  }
+}
+\end{Verbatim}
+Each loaded element will be wrapped with an \index{original-to-substituted thunk}original-to-substituted thunk, and this thunk will then be wrapped with a \index{substituted-to-original thunk}substituted-to-original thunk before being stored back to the same location in the array. Thus, each iteration will allocate a pair of closure contexts on the heap, and repeating will consume an arbitrary amount of memory.
-func generic(_: T) {
-  protocol P {  // error
-    func f(_: T)  // because what would this mean?
+For a simple example like the above, we could detect this situation, and either avoid emitting redundant thunks in \index{SILGen}SILGen, or clean up redundant thunks with a \index{SIL optimizer}SIL optimizer pass.
A compile-time fix would not completely solve the problem, though, because we can just change the second ``\texttt{for}'' loop into the following:
+\begin{Verbatim}
+for _ in 0 ..< 1000 {
+  for n in 0 ..< array.count {
+    let fn = array[n]
+    let fn2 = someFunction(fn)
+    array[n] = fn2
+  }
}
\end{Verbatim}
-\end{listing}
+Perhaps \texttt{someFunction()} just returns its argument unchanged, but if it is defined in another module, the compiler has no way to know this, because it cannot analyze its body, so we would encounter the same problem again. To solve this problem completely would require some kind of run-time check when forming a re-abstraction thunk, to ``collapse'' multiple levels of nested thunks into one.
+\end{example}
+
+Our final example shows that a re-abstraction thunk may also need to explode and implode \index{tuple type!SIL type lowering}tuple types appearing in the formal function type's parameters and result.
+\begin{example}
+Consider a closure with formal type \texttt{(String, (Int, Bool)) -> ()}. Using its own formal type as the abstraction pattern, we get a SIL function type with three parameters and no results:
+\begin{quote}
+\begin{verbatim}
+(@guaranteed String, Int, Bool) -> ()
+\end{verbatim}
+\end{quote}
+The most general lowered type for this function type, however, is the following:
+\begin{quote}
+\begin{verbatim}
+(@in_guaranteed String, @in (Int, Bool)) -> (@out ())
+\end{verbatim}
+\end{quote}
+The latter lowered type receives the two-element tuple as a single indirect parameter. It also returns an empty tuple indirectly (an empty tuple value does not take up any space in memory, so there is nothing to load or store; however, the \index{calling convention}calling convention still allots a parameter to serve as the indirect result buffer).
Now, if we are asked to emit a \index{substituted-to-original thunk!tuple type}substituted-to-original thunk to convert from the first type to the second, the thunk must explode the tuple and load both elements from memory before passing them directly to the wrapped closure. Likewise, an \index{original-to-substituted thunk!tuple type}original-to-substituted thunk must implode the last two lowered parameters into a single tuple value to be passed indirectly. +\end{example} + +\paragraph{Optional payloads.} +Most \index{generic nominal type!SIL type lowering}generic nominal types use a uniform representation that does not depend on the abstraction pattern, and type lowering simply returns the original formal type unchanged when given such a formal type, ignoring the abstraction pattern. In particular, the generic arguments of generic nominal types are not themselves lowered. + +Optional types are the exception. (Recall the declaration of \texttt{Optional} from \ExRef{ex:lowering optional}.) Optional function types are particularly common, and we don't want to pay the cost of allocating a re-abstraction thunk every time we wrap a function type with an optional, or unwrap an optional function type. + +For this reason, SIL type lowering treats \index{optional type!re-abstraction}optional types as a special case. When lowering a formal optional type with an optional type as the abstraction pattern, we lower its payload type with the abstraction pattern's formal type, and form a \IndexDefinition{lowered optional type}lowered optional type from the result. + +For example, consider the formal type \texttt{Optional<(String) -> String>}. 
Lowering this type with itself as the abstraction pattern yields the following---notice how the generic argument becomes a \index{SIL function type!optional payload}SIL function type, and this SIL function type receives its parameter and returns its result directly: +\begin{center} +\texttt{Optional<@convention(thick) (@guaranteed String) -> (@owned String)>} +\end{center} -\paragraph{Protocols in other declaration contexts.} The final possibility is the nesting of protocols inside other declaration contexts, such as functions or nominal types. This breaks down into two cases, illustrated in \ListingRef{protocol nested inside type}: +Since the same formal optional type can map to one of several lowered optional types depending on the abstraction pattern, values of optional types may need to be re-abstracted. We accomplish this by generating a conditional branch. If the active case is ``\texttt{none}'' we simply return this empty value. If the active case is ``\texttt{some}'' we wrap the payload in a re-abstraction thunk, and then wrap this thunk in a new optional value. + +Finally, we are ready to look at the type lowering algorithm. + +\begin{algorithm}[Lower type with abstraction pattern]\label{alg:compute lowered type} +Receives a \index{formal type}formal type~\tX\ and an \index{abstraction pattern}abstraction pattern \tY\ as input, together with an optional \index{generic signature!SIL type lowering}generic signature~$G$ in the case where \tY\ \index{type!containing type parameters}contains type parameters. Returns a lowered type. The structure of the abstraction pattern must match the formal type. +\begin{enumerate} +\item If \tX\ is a \index{function type!SIL type lowering}\textbf{function type}: +\begin{enumerate} +\item Walk the formal parameters of \tX\ and \tY\ in parallel: \begin{enumerate} -\item Protocols inside non-generic declaration contexts. -\item Protocols inside generic declaration contexts. 
+\item Lower each formal parameter type using the corresponding formal parameter type of \tY\ as the abstraction pattern. +\item For any formal parameter of \tY\ that is a tuple type, explode the lowered type into multiple lowered parameters. +\item Compute a category for each lowered parameter type using \AlgRef{alg:lowered type category}. +\end{enumerate} +\item Consider the formal result types of \tX\ and \tY: +\begin{enumerate} +\item Lower the formal result type of \tX\ using the result type of \tY\ as the abstraction pattern. +\item If the formal result type of \tY\ is a tuple type, explode the lowered result type into multiple lowered results. +\item Compute a category for each lowered result type using \AlgRef{alg:lowered type category}. +\end{enumerate} +\item Collect the lowered parameter types, lowered result types, and their categories, to form a new \index{SIL function type}SIL function type. \end{enumerate} -The first case was originally prohibited, but is now permitted as of \IndexSwift{5.a@5.10}Swift~5.10 \cite{se0404}; the non-generic declaration context acts as a namespace to which the protocol declaration is scoped, but apart from the interaction with name lookup this has no other semantic consequences. The second case is more subtle. If we were to allow a ``generic protocol'' to be parameterized by its outer generic parameters in addition to just the protocol \IndexSelf\tSelf\ type, we would get what \index{Haskell}Haskell calls a \index{multi-parameter type class}``multi-parameter type class.'' Multi-parameter type classes introduce some complications, for example undecidable type inference~\cite{mptc}. -\section{Reabstraction Thunks} +\item If \tX\ is a \index{tuple type!SIL type lowering}\textbf{tuple type}, walk the elements of \tX\ and \tY\ in parallel, and lower each element. Collect the lowered element types to form a lowered tuple type, and return the result. 
+ +\item If \tX\ is an \index{optional type!SIL type lowering}\textbf{optional type}: +\begin{enumerate} +\item Lower the payload type of \tX\ using the payload type of \tY\ as the abstraction pattern. +\item Form a new optional type from this lowered payload type. +\end{enumerate} + +\item If \tX\ is a \index{struct type!SIL type lowering}\textbf{struct type}, \index{enum type!SIL type lowering}\textbf{enum type} (except for \texttt{Optional}, which was handled above), or \index{class type!SIL type lowering}\textbf{class type}, return \tX\ itself. (The formal type is already a lowered type in this case, and we ignore the abstraction pattern \tY.) +\end{enumerate} +\end{algorithm} + +\paragraph{Substitution with lowered types.} +We're now going to investigate the meaning of applying a \index{substitution map!lowered type}substitution map to a \index{lowered type!type substitution}\index{type substitution!lowered type}lowered type. This is a common operation in the \index{SIL optimizer}SIL optimizer; for example, it is used to generate \index{specialization}specializations of generic functions. If the implementation of the function is visible from the call site, the optimizer can generate a specialization by cloning each SIL instruction in the function's body, and applying the substitution map from the call site. + +A substitution map's \index{replacement type!SIL type lowering}replacement types are always \index{formal type!type substitution}formal types. When we substitute a type parameter appearing in a lowered type, we must sometimes lower the replacement type if it appears in a certain position. We always lower the replacement type with an opaque abstraction pattern. We do this precisely in those positions where \AlgRef{alg:compute lowered type} would lower a formal type that appears there. Such a position within a type is referred to as \IndexDefinition{lowered position}\emph{lowered position}. Lowered positions are defined recursively. 
The entire original lowered type is itself in lowered position. Furthermore: +\begin{enumerate} +\item A \index{SIL function type!lowered position}SIL function type can only appear in lowered position, and all of its parameter and result types are also in lowered position. +\item If a \index{tuple type!lowered position}tuple type appears in lowered position, its element types are also in lowered position. +\item If an \index{optional type!lowered position}optional type appears in lowered position, its generic argument is also in lowered position. +\end{enumerate} +If a type parameter occurs in any other position, the replacement type is not lowered, and it is substituted as-is. + +Let's say that \tY~is an abstraction pattern, \tX~is a formal type, and $L(\tY, \tX)$ is the lowered type for~\tX\ with respect to~\tY. Finally, let ``$=$'' denote \index{canonical type equality}canonical type equality. In \ExRef{ex:abstraction pattern 1}, we saw that lowering a type with its own abstraction pattern is not, in general, compatible with type substitution---that is, $L(\tX, \tX) \otimes \Sigma \neq L(\tX \otimes \Sigma, \tX \otimes \Sigma)$. However, there is an identity relating type lowering with type substitution. When the abstraction pattern \tY\ remains unsubstituted and we apply a substitution map to both sides, it is always true that $L(\tY, \tX) \otimes \Sigma = L(\tY, \tX \otimes \Sigma)$. \begin{example} -Unbounded space usage example -\index{limitation!reabstraction thunks} +Let's apply $\Sigma := \SubstMap{\SubstType{\rT}{Int},\,\rU \mapsto \left[ \texttt{() -> String} \right] }$ to a pair of lowered types. First, consider the generic nominal type \texttt{Array<(\rT, \rU)>}. Neither of the two type parameters is in lowered position, so type substitution will replace them without lowering, and we just get \texttt{Array<(Int, () -> String)>}. + +Now, suppose we apply $\Sigma$ to \texttt{Optional<(\rT, \rU)>}.
Both type parameters are in lowered position, so type substitution will lower the replacement types with an \index{opaque abstraction pattern}opaque abstraction pattern. We see that while \texttt{Int} remains unchanged, the formal function type \texttt{() -> String} becomes a SIL function type, and we get: +\[ +\texttt{Optional<(Int, @convention(thick) () -> (@out String))>} +\] \end{example} -This is not the whole story; the representation of SIL function type is more elaborate \cite{substfunctype}. +\paragraph{History.} +\IndexSwift{3.1}Swift~3.1 introduced the special type lowering support and re-abstraction for \index{optional type!SIL type lowering}optional payloads; prior to this, optional types were lowered like every other generic nominal type~\cite{optionalpayload}. The \index{opaque abstraction pattern}most general form of a \index{function type!SIL type lowering}function type changed in \IndexSwift{5.0}Swift~5, when \IndexFlag{language-mode}\verb|-swift-version 3| mode was dropped (these days, the \texttt{-swift-version} flag is named \texttt{-language-mode}, and the minimum supported language mode at the time of writing is~\texttt{4}). As we saw in \SecRef{sec:more types}, \IndexSwift{3.0}Swift~3 did not distinguish between function types that receive multiple parameters and function types with a single parameter of \index{tuple type!SIL type lowering}tuple type. Thus, in Swift~3, the most general form of a function type had to implode all parameters into a single indirect parameter of tuple type, which was somewhat inefficient. Resolving this was a strong impetus for dropping Swift~3 source compatibility~\cite{mostopaque}. Finally, \IndexSwift{5.6}Swift~5.6 introduced the peephole optimization allowing \index{SILGen}SILGen to emit a closure expression with an arbitrary abstraction pattern, rather than immediately wrapping it with a substituted-to-original thunk~\cite{emitabstract}. 
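The lowering recursion described above can be pictured in miniature. The sketch below is not compiler code; the names \texttt{Ty}, \texttt{Kind}, and \texttt{lower} are hypothetical, and the real algorithm operates on the compiler's \texttt{TypeBase} hierarchy and computes conventions and categories that we elide here. It only captures the structural idea: a function type lowered against an opaque pattern (a type parameter) uses the maximally-indirect convention, tuples and optionals recurse into the pattern, and nominal types pass through unchanged.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical toy model of formal and lowered types, for illustration only.
struct Ty;
using TyRef = std::shared_ptr<Ty>;

enum class Kind { Param, Nominal, Function, SILFunction, Tuple, Optional };

struct Ty {
  Kind kind;
  std::string name;        // used by Param and Nominal; convention for SILFunction
  std::vector<TyRef> args; // parameters+result, tuple elements, or payload
};

TyRef make(Kind k, std::string n, std::vector<TyRef> a = {}) {
  return std::make_shared<Ty>(Ty{k, std::move(n), std::move(a)});
}

// Sketch of the lowering algorithm: lower the formal type `x` against the
// abstraction pattern `y`. When the pattern is not opaque, we assume x and y
// have the same shape, as the real algorithm guarantees.
TyRef lower(const TyRef &x, const TyRef &y) {
  switch (x->kind) {
  case Kind::Function: {
    // Opaque pattern: every parameter and result becomes indirect.
    std::vector<TyRef> lowered;
    for (size_t i = 0; i < x->args.size(); ++i) {
      TyRef pat = (y->kind == Kind::Param) ? y : y->args[i];
      lowered.push_back(lower(x->args[i], pat));
    }
    return make(Kind::SILFunction,
                y->kind == Kind::Param ? "@indirect" : "@direct", lowered);
  }
  case Kind::Tuple:
  case Kind::Optional: {
    // Recurse into elements (or the payload), pairing up with the pattern.
    std::vector<TyRef> lowered;
    for (size_t i = 0; i < x->args.size(); ++i) {
      TyRef pat = (y->kind == Kind::Param) ? y : y->args[i];
      lowered.push_back(lower(x->args[i], pat));
    }
    return make(x->kind, x->name, lowered);
  }
  default:
    // Structs, enums, classes, and parameters are already lowered; the
    // abstraction pattern is ignored here.
    return x;
  }
}
```

Lowering `() -> String` against an opaque pattern yields the toy analogue of a maximally-abstract SIL function type, while lowering it against its own pattern does not, mirroring the distinction drawn in this section.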
-\section{Source Code Reference}\label{substmapsourcecoderef} +\section{Source Code Reference}\label{src:substitution maps} Key source files: \begin{itemize} \item \SourceFile{include/swift/AST/SubstitutionMap.h} -\item \SourceFile{lib/AST/SubstitutionMap.cpp} -\item \SourceFile{lib/AST/TypeSubstitution.cpp} -\end{itemize} -Other source files: -\begin{itemize} -\item \SourceFile{include/swift/AST/GenericSignature.h} \item \SourceFile{include/swift/AST/Type.h} \item \SourceFile{include/swift/AST/Types.h} +\item \SourceFile{lib/AST/SubstitutionMap.cpp} +\item \SourceFile{lib/AST/TypeSubstitution.cpp} \end{itemize} -\IndexSource{type substitution} \apiref{Type}{class} -See also \SecRef{typesourceref}. +See also \SecRef{src:types}. \begin{itemize} -\item \texttt{subst()} applies a substitution map to this type and returns the substituted type. +\item \texttt{subst()} applies a \IndexSource{substitution map}substitution map to this type and returns the \IndexSource{type substitution}\IndexSource{substituted type}substituted type. \end{itemize} -\IndexSource{context substitution map} \apiref{TypeBase}{class} -See also \SecRef{typesourceref} and \SecRef{genericsigsourceref}. \begin{itemize} -\item \texttt{getContextSubstitutionMap()} returns this type's context substitution map. +\item \texttt{getContextSubstitutionMap()} returns this type's \IndexSource{context substitution map}context substitution map. +\end{itemize} + +\apiref{GenericSignature}{class} +See also \SecRef{src:generic signatures}. +\begin{itemize} +\item \texttt{getIdentitySubstitutionMap()} returns the \IndexSource{identity substitution map}identity substitution map for this generic signature, which replaces each \IndexSource{generic signature!type substitution}generic parameter with itself.
\end{itemize} -\IndexSource{substitution map} -\IndexSource{input generic signature} -\IndexSource{empty substitution map} -\IndexSource{substitution map composition} \apiref{SubstitutionMap}{class} -Represents an immutable, uniqued substitution map. See also \SecRef{conformancesourceref}. +A \IndexSource{substitution map}substitution map. Just like \texttt{Type} and \texttt{GenericSignature}, substitution maps are immutable and uniqued, and the representation of a \texttt{SubstitutionMap} fits in a single pointer, so they are cheap to pass by value. -\paragraph{Canonical substitution maps.} -\IndexDefinition{canonical substitution map}% -\index{canonical type}% -\index{canonical conformance}% -\IndexDefinition{substitution map equality}% -Substitution maps are immutable and uniqued, just like types and generic signatures. A substitution map is canonical if all replacement types are canonical types and all conformances are canonical conformances. A substitution map is canonicalized by constructing a new substitution map from the original substitution map's canonicalized replacement types and conformances. +The default \texttt{SubstitutionMap()} constructor constructs an \IndexSource{empty substitution map}empty substitution map. The implicit \texttt{bool} conversion tests for a non-empty substitution map. We defer our discussion of the entry points that build non-empty substitution maps to \SecRef{src:conformances}, since this involves passing in an array of conformances as well, and we haven't introduced conformances yet. -As with types, canonicalization gives substitution maps two levels of equality; two substitution maps are equal pointers if their replacement types and conformances are equal pointers. Two substitution maps are canonically equal if their canonical substitution maps are equal pointers; or equivalently, if their replacement types and conformances are canonically equal. 
+Accessor methods: +\begin{itemize} +\item \texttt{empty()} answers if this is the empty substitution map; this is the logical negation of the \texttt{bool} implicit conversion. +\item \texttt{getGenericSignature()} returns the substitution map's \IndexSource{input generic signature}input generic signature. +\item \texttt{getReplacementTypes()} returns an array of \texttt{Type}. +\item \texttt{hasAnySubstitutableParams()} answers if this substitution map's input generic signature contains at least one generic parameter that is not fixed to a concrete type (see the \texttt{GenericSignatureImpl::areAllParamsConcrete()} method from \SecRef{src:generic signatures}). +\end{itemize} -Applying a canonical substitution map to a canonical original type is not guaranteed to produce a canonical substituted type. However, there are two important invariants that do hold: -\begin{enumerate} -\item Given two canonically equal original types, applying the same substitution map to both will produce two canonically equal substituted types. -\item Given an original type and two canonically equal substitution maps, applying the two substitution maps to this type will also produce two canonically equal substituted types. -\end{enumerate} +Recursive properties of replacement types: +\begin{itemize} +\item \texttt{hasPrimaryArchetypes()} answers if any of the replacement types contain \IndexSource{primary archetype}primary archetypes. +\item \texttt{hasOpenedExistential()} answers if any of the replacement types contain an \IndexSource{existential archetype}existential archetype. +\item \texttt{hasDynamicSelf()} answers if any of the replacement types contain the \IndexSource{dynamic Self type@dynamic \tSelf\ type}dynamic Self type. +\end{itemize} + +A substitution map is \IndexSource{canonical substitution map}\emph{canonical} if all replacement types are \IndexSource{canonical type}canonical types and all conformances are \IndexSource{canonical conformance}canonical conformances. 
A substitution map is canonicalized by constructing a new substitution map from the original substitution map's canonicalized replacement types and conformances. +\begin{itemize} +\item \texttt{isCanonical()} answers if the replacement types and conformances stored in this substitution map are canonical. +\item \texttt{getCanonical()} constructs a new substitution map by canonicalizing the replacement types and conformances of this substitution map. +\end{itemize} -As with \texttt{Type} and \texttt{GenericSignature}, this class stores a single pointer, so substitution maps are cheap to pass around as values. The default constructor \texttt{SubstitutionMap()} constructs an empty substitution map. The implicit \texttt{bool} conversion tests for a non-empty substitution map. +Substitution map composition (\SecRef{sec:composition}): +\begin{itemize} +\item \texttt{subst()} returns the \IndexSource{substitution map composition}composition of this substitution map on the left with the given substitution map on the right. +\end{itemize} -\IndexSource{substitution map equality} -The overload of \texttt{operator==} implements substitution map pointer equality. Canonical equality can be tested by first canonicalizing both sides: +As with types, substitution maps have two levels of \IndexSource{substitution map equality}equality; two substitution maps are equal pointers if their replacement types and conformances are equal pointers. Two substitution maps are canonically equal if their canonical substitution maps are equal pointers; or equivalently, if their replacement types and conformances are canonically equal. The \texttt{operator==} overload implements pointer equality. Canonical equality can be tested by first canonicalizing both sides: \begin{Verbatim} if (subMap1.getCanonical() == subMap2.getCanonical()) ...; \end{Verbatim} +However, unlike types, checking two substitution maps for equality is rare.
The fact that they are uniquely allocated is more of a performance optimization than anything else. -\index{primary archetype} -\index{opened archetype} -\Index{dynamic Self type@dynamic \tSelf\ type} -\IndexSource{canonical substitution map} -Accessor methods: +\subsection*{Subclassing} + +\apiref{ClassDecl}{class} +See also \SecRef{src:declarations}. \begin{itemize} -\item \texttt{empty()} answers if this is the empty substitution map; this is the logical negation of the \texttt{bool} implicit conversion. -\item \texttt{getGenericSignature()} returns the substitution map's input generic signature. -\item \texttt{getReplacementTypes()} returns an array of \texttt{Type}. -\item \texttt{hasAnySubstitutableParams()} answers if the input generic signature contains at least one generic parameter not fixed to a concrete type; that is, it must be non-empty and not fully concrete (see the \texttt{areAllParamsConcrete()} method of \texttt{GenericSignatureImpl} from \SecRef{genericsigsourceref}). +\item \texttt{getSuperclass()} returns this \IndexSource{class declaration}class declaration's \IndexSource{superclass type}superclass type. \end{itemize} -Recursive properties computed from replacement types: + +\apiref{Type}{class} \begin{itemize} -\item \texttt{hasArchetypes()} answers if any of the replacement types contain a primary archetype or opened existential archetype. -\item \texttt{hasOpenedExistential()} answers if any of the replacement types contain an opened existential archetype. -\item \texttt{hasDynamicSelf()} answers if any of the replacement types contain the dynamic Self type. +\item \texttt{getSuperclass()} returns this \IndexSource{class type}class type's \IndexSource{superclass type}superclass type. This is \AlgRef{superclass type of type}. +\item \texttt{getSuperclassDecl()} returns this \IndexSource{class type}class type's \IndexSource{superclass declaration}superclass declaration. This is \AlgRef{superclassfordecl}.
\end{itemize} -Canonical substitution maps: + +\subsection*{SIL Types} + +Key source files: \begin{itemize} -\item \texttt{isCanonical()} answers if the replacement types and conformances stored in this substitution map are canonical. -\item \texttt{getCanonical()} constructs a new substitution map by canonicalizing the replacement types and conformances of this substitution map. +\item \SourceFile{include/swift/AST/Types.h} +\item \SourceFile{include/swift/SIL/SILType.h} +\item \SourceFile{lib/SIL/IR/SILType.cpp} +\item \SourceFile{lib/SIL/IR/SILFunctionType.cpp} \end{itemize} -Composing substitution maps (\SecRef{submapcomposition}): + +A \IndexSource{lowered type}lowered type is represented as a \texttt{CanType}, just like a \IndexSource{formal type}formal type. Be forewarned that the distinction between the two is not enforced statically, and care must be taken not to get them mixed up. + +\apiref{TypeBase}{class} +See also \SecRef{src:types}. Two methods distinguish formal types from lowered types: \begin{itemize} -\item \texttt{subst()} applies another substitution map to this substitution map, producing a new substitution map. +\item \texttt{isLegalFormalType()} checks if this is a legal formal type. +\item \texttt{isLegalSILType()} checks if this is a legal lowered type. \end{itemize} -Two overloads of the \texttt{get()} static method are defined for constructing substitution maps (\SecRef{buildingsubmaps}). -\IndexSource{get substitution map} -\medskip -\noindent -\texttt{get(GenericSignature, ArrayRef, ArrayRef)}\newline builds a new substitution map from an input generic signature, an array of replacement types, and array of conformances. +\apiref{SILValueCategory}{enum class} +In the implementation, whether a type is loadable or address-only is referred to as the lowered type's ``category.'' +\begin{itemize} +\item \texttt{SILValueCategory::Object} is the category of loadable types. 
+\item \texttt{SILValueCategory::Address} is the category of address-only types. +\end{itemize} -\medskip -\noindent -\texttt{get(GenericSignature, TypeSubstitutionFn, LookupConformanceFn)} builds a new substitution map by invoking a pair of callbacks to produce each replacement type and conformance. +\apiref{SILType}{class} +Combines a lowered type with its category. SIL types can be taken apart: +\begin{itemize} +\item \texttt{getASTType()} returns the SIL type's lowered type as a \texttt{CanType}. +\item \texttt{getCategory()} returns the SIL type's category. +\end{itemize} +New SIL types can be constructed from a lowered type and a category as follows: +\begin{itemize} +\item \texttt{SILType::getPrimitiveObjectType()} is a static factory method that returns a new object SIL type for the given lowered type. The lowered type must be loadable. +\item \texttt{SILType::getPrimitiveAddressType()} is a static factory method that returns a new address SIL type for the given lowered type. +\end{itemize} +One can also call the \texttt{SILType} constructor with a lowered type and a \texttt{SILValueCategory}. -\IndexSource{replacement type callback} -\apiref{TypeSubstitutionFn}{type alias} -The type signature of a replacement type callback for \texttt{SubstitutionMap::get()}. -\begin{verbatim} -using TypeSubstitutionFn - = llvm::function_ref; -\end{verbatim} -The parameter type is always a \texttt{GenericTypeParamType *} when the callback is used with \texttt{SubstitutionMap::get()}. +Note that all of the above requires already having a lowered type on hand; attempting to construct a SIL type from a formal type that is not a lowered type will result in a run-time assertion. To construct a SIL type from a formal type, the formal type must be lowered first, as discussed below. -\IndexSource{query substitution map callback} -\apiref{QuerySubstitutionMap}{struct} -A callback intended to be used with \texttt{SubstitutionMap::get()} as a replacement type callback.
-Overloads \texttt{operator()} with the signature of \texttt{TypeSubstitutionFn}. +SIL types support down-casting to the various \texttt{TypeBase} subclasses that represent different kinds of lowered types. The \texttt{SILType} class declares three template methods, \verb|is<>()|, \verb|getAs<>()|, and \verb|castTo<>()|. These are equivalent to calling \texttt{getASTType()} and then passing the result to the top-level \verb|isa<>|, \verb|dyn_cast<>|, and \verb|cast<>| template functions discussed in \SecRef{src:types}. -Constructed from a \texttt{SubstitutionMap}: -\begin{Verbatim} -QuerySubstitutionMap{subMap} -\end{Verbatim} +The printed representation of a SIL type is \verb|$type| for a loadable type and \verb|$*type| for an address-only type, where \verb|type| is the printed representation of its lowered type. This printed representation shows up in \IndexFlag{emit-silgen}\verb|-emit-silgen| output, for example. -\IndexSource{query type map callback} -\apiref{QueryTypeSubstitutionMap}{struct} -A callback intended to be used with \texttt{SubstitutionMap::get()} as a replacement type callback. -Overloads \texttt{operator()} with the signature of \texttt{TypeSubstitutionFn}. +\apiref{SILFunctionType}{class} +A \texttt{TypeBase} subclass representing a \IndexSource{SIL function type}SIL function type. +\begin{itemize} +\item \texttt{getParameters()} returns an \texttt{ArrayRef} with the function's \IndexSource{lowered parameter}lowered parameters. +\item \texttt{getResults()} returns an \texttt{ArrayRef} with the function's \IndexSource{lowered result}lowered results. +\item \texttt{getInvocationGenericSignature()} returns the function's \IndexSource{generic signature!SIL function type}generic signature. +\end{itemize} -Constructed from an LLVM \texttt{DenseMap}: -\begin{Verbatim} -DenseMap typeMap; +In reality, generic SIL function types have a more complex representation than what we've described; see~\cite{substfunctype}.
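The pairing of a lowered type with a category can be pictured with a hypothetical miniature. \texttt{MiniSILType} below is not the compiler's class; the member names merely echo the real \texttt{SILType} API, and a string stands in for the \texttt{CanType}.

```cpp
#include <string>

// Hypothetical miniature of the SILType idea: a lowered type paired with a
// value category. Not the compiler's code.
enum class SILValueCategory { Object, Address };

struct MiniSILType {
  std::string loweredType; // stands in for the CanType
  SILValueCategory category;

  // Factory methods mirroring the real API's naming.
  static MiniSILType getPrimitiveObjectType(std::string t) {
    return {std::move(t), SILValueCategory::Object};
  }
  static MiniSILType getPrimitiveAddressType(std::string t) {
    return {std::move(t), SILValueCategory::Address};
  }

  // Printed representation: $type for the object category, $*type for the
  // address category, as seen in -emit-silgen output.
  std::string print() const {
    return (category == SILValueCategory::Object ? "$" : "$*") + loweredType;
  }
};
```

This makes the printed-representation convention mentioned above concrete: the same lowered type prints differently depending on its category.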
-QueryTypeSubstitutionMap{typeMap} -\end{Verbatim} +\subsection*{SIL Type Lowering} -\IndexSource{conformance lookup callback} -\apiref{LookupConformanceFn}{type alias} -The type signature of a conformance lookup callback for \texttt{SubstitutionMap::get()}. -\begin{verbatim} -using LookupConformanceFn = llvm::function_ref< - ProtocolConformanceRef(CanType origType, - Type substType, - ProtocolDecl *conformedProtocol)>; -\end{verbatim} +Key source files: +\begin{itemize} +\item \SourceFile{include/swift/SIL/AbstractionPattern.h} +\item \SourceFile{include/swift/SIL/TypeLowering.h} +\item \SourceFile{lib/SIL/IR/AbstractionPattern.cpp} +\item \SourceFile{lib/SIL/IR/TypeLowering.cpp} +\end{itemize} -\IndexSource{global conformance lookup callback} -\apiref{LookUpConformanceInModule}{struct} -A callback intended to be used with \texttt{SubstitutionMap::get()} as a conformance lookup callback. Overloads \texttt{operator()} with the signature of \texttt{LookupConformanceFn}. +\apiref{AbstractionPattern}{class} +An \IndexSource{abstraction pattern}abstraction pattern. +\begin{itemize} +\item \texttt{getKind()} returns the abstraction pattern's kind. See the source code for an explanation of the possible kinds. +\item \texttt{isTypeParameterOrOpaqueArchetype()} answers if this is an \IndexSource{opaque abstraction pattern}opaque abstraction pattern. +\item \texttt{getType()} returns the abstraction pattern's \IndexSource{formal type!abstraction pattern}formal type. +\end{itemize} -Performs a global conformance lookup. Constructed without arguments: -\begin{Verbatim} -LookUpConformanceInModule() -\end{Verbatim} +\apiref{TypeConverter}{class} +A singleton object responsible for \IndexSource{SIL type lowering}SIL type lowering. +\begin{itemize} +\item \texttt{getLoweredType()} takes a \texttt{CanType} representing a formal type, and returns a \texttt{SILType} which encodes the lowered type together with its category.
+\item \texttt{getTypeLowering()} takes a \texttt{CanType} representing a formal type, and returns a \texttt{TypeLowering} object, which encodes the lowered type together with additional recursive properties computed by type lowering, such as whether the type is \IndexSource{trivial type}trivial. +\end{itemize} -\IndexSource{local conformance lookup callback} -\apiref{LookUpConformanceInSubstitutionMap}{struct} -A callback intended to be used with \texttt{SubstitutionMap::get()} as a conformance lookup callback. Overloads \texttt{operator()} with the signature of \texttt{LookupConformanceFn}. +\subsection*{Re-abstraction Thunks} -Constructed with a \texttt{SubstitutionMap}: -\begin{Verbatim} -LookUpConformanceInSubstitutionMap{subMap} -\end{Verbatim} +Key source files: +\begin{itemize} +\item \SourceFile{lib/SILGen/SILGenPoly.cpp} +\item \SourceFile{lib/SILGen/SILGenThunk.cpp} +\end{itemize} -\IndexSource{make abstract conformance callback} -\apiref{MakeAbstractConformanceForGenericType}{struct} -A callback intended to be used with \texttt{SubstitutionMap::get()} as a conformance lookup callback. Overloads \texttt{operator()} with the signature of \texttt{LookupConformanceFn}. +\apiref{SILGenFunction}{class} +A class containing various entry points used by \IndexSource{SILGen}SILGen to emit SIL functions. The following two entry points emit \IndexSource{re-abstraction thunk}re-abstraction thunks: +\begin{itemize} +\item \texttt{emitOrigToSubstValue()} re-abstracts a value from a given abstraction pattern to the ``natural'' abstraction pattern for its formal type. If the value has function type, this emits an \IndexSource{original-to-substituted thunk}original-to-substituted thunk. +\item \texttt{emitSubstToOrigValue()} re-abstracts a value from the ``natural'' abstraction pattern for its formal type to a given abstraction pattern. If the value has function type, this emits a \IndexSource{substituted-to-original thunk}substituted-to-original thunk.
+\end{itemize} -Constructed without arguments: -\begin{Verbatim} -MakeAbstractConformanceForGenericType() -\end{Verbatim} +\subsection*{SIL Type Substitution} -\apiref{GenericSignature}{class} -See also \SecRef{genericsigsourceref}. +Key source files: +\begin{itemize} +\item \SourceFile{lib/SIL/IR/SILTypeSubstitution.cpp} +\end{itemize} +\apiref{SILType}{class} +\IndexSource{type substitution!lowered type}SIL type substitution: \begin{itemize} -\item \texttt{getIdentitySubstitutionMap()} returns the \IndexSource{identity substitution map}substitution map that replaces each \IndexSource{generic signature!type substitution}generic parameter with itself. +\item \texttt{subst()} applies a \IndexSource{substitution map!lowered type}substitution map to a SIL type. \end{itemize} +\apiref{SILTypeSubstituter}{class} +A \texttt{CanTypeVisitor} subclass that implements the above. + \end{document} diff --git a/docs/Generics/chapters/symbols-terms-and-rules.tex b/docs/Generics/chapters/symbols-terms-and-rules.tex index 919977715153c..1f279b7ca7bd9 100644 --- a/docs/Generics/chapters/symbols-terms-and-rules.tex +++ b/docs/Generics/chapters/symbols-terms-and-rules.tex @@ -2,9 +2,9 @@ \begin{document} -\chapter{Symbols, Terms, and Rules}\label{symbols terms rules} +\chapter{Symbols, Terms, and Rules}\label{chap:symbols terms rules} -\lettrine{O}{ur motivation} for translating a \index{monoid presentation}monoid presentation into a generic signature was to show that there are theoretical limits to what we can accept. In comparison, the translation of a generic signature into a monoid presentation is an eminently practical matter. We ended \ChapRef{rqm basic operation} by saying that a requirement machine consists of rewrite rules. We now make this precise. As with did with the derived requirements formalism in \SecRef{derived req}, we gradually reveal the full encoding.
We begin with the core of the Swift generics model: \index{unbound type parameter!in requirement machine}unbound type parameters, conformance requirements, and same-type requirements between type parameters. We prove a correctness result, and then extend the encoding to cover bound type parameters and the other requirement kinds. +\lettrine{O}{ur motivation} for translating a \index{monoid presentation}monoid presentation into a generic signature was to show that there are theoretical limits to what we can accept. In comparison, the translation of a generic signature into a monoid presentation is an eminently practical matter. We ended \ChapRef{rqm basic operation} by saying that a requirement machine consists of rewrite rules. We now make this precise. As we did with the derived requirements formalism in \SecRef{derived req}, we gradually reveal the full encoding. We begin with the core of the Swift generics model: \index{unbound type parameter!in requirement machine}\index{unbound dependent member type!in requirement machine}unbound type parameters, conformance requirements, and same-type requirements between type parameters. We prove a correctness result, and then extend the encoding to cover bound type parameters and the other requirement kinds. \paragraph{Core model.} Let $G$ be a \index{generic signature!requirement machine}generic signature. We recall from \SecRef{protocol component} that if~\tP\ is some protocol, then $G\prec\tP$ means that $G$ \index{protocol dependency set}depends on~\tP. We now introduce a certain \index{monoid presentation!of requirement machine}monoid presentation~$\AR$, called the \IndexDefinition{requirement machine!monoid presentation}\emph{requirement machine} for~$G$. 
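Before developing the formal encoding, it may help to see the raw mechanics in miniature. The sketch below is not compiler code; symbols are modeled as ad-hoc strings (with a spelling like \texttt{[P]} standing in for a protocol symbol), and it implements a single rewrite step $x(u\Rightarrow v)y$: replace one occurrence of the subterm $u$ by $v$, leaving the surrounding context $x$ and $y$ alone. A rewrite path is then a composable chain of such steps.

```cpp
#include <optional>
#include <string>
#include <vector>

// A term is a finite sequence of symbols; symbols are plain strings here.
using Term = std::vector<std::string>;

// One rewrite step x(u => v)y: if the subterm u occurs in t starting at
// position pos, replace it with v and return the resulting term; otherwise
// report failure. (Rules generate an equivalence relation, so a step may
// apply a rule in either direction by swapping u and v.)
std::optional<Term> applyStep(const Term &t, size_t pos,
                              const Term &u, const Term &v) {
  if (pos + u.size() > t.size())
    return std::nullopt;
  for (size_t i = 0; i < u.size(); ++i)
    if (t[pos + i] != u[i])
      return std::nullopt;
  Term result(t.begin(), t.begin() + pos);         // the prefix x
  result.insert(result.end(), v.begin(), v.end()); // the replacement v
  result.insert(result.end(),
                t.begin() + pos + u.size(), t.end()); // the suffix y
  return result;
}
```

For instance, with a conformance-style rule whose two sides are T·[P] and T, a step rewrites T·[P]·Element to T·Element, and a step in the opposite direction recovers the original term.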
@@ -20,7 +20,7 @@ \chapter{Symbols, Terms, and Rules}\label{symbols terms rules} \end{tabular} \end{center} -To get~$R$, we define a ``$\Term$'' function to translate a type parameter of~$G$ into a \index{term!in requirement machine}term, and a ``$\Rule$'' function to translate each \index{explicit requirement}explicit requirement of~$G$ into a \index{rewrite rule!in requirement machine}rewrite rule: +To get~$R$, we define a ``$\Term$'' function to translate a \index{type parameter!in requirement machine}\index{dependent member type!in requirement machine}type parameter of~$G$ into a \index{term!in requirement machine}term, and a ``$\Rule$'' function to translate each \index{explicit requirement}explicit requirement of~$G$ into a \index{rewrite rule!in requirement machine}rewrite rule: \begin{center} \begin{tabular}{l@{ $:=$ }l} \toprule @@ -52,7 +52,7 @@ \chapter{Symbols, Terms, and Rules}\label{symbols terms rules} \bottomrule \end{tabular} \end{center} -This completely describes a very remarkable object, where the \emph{derived requirements} of~$G$ are \emph{rewrite paths} in the requirement machine for~$G$. +This completely describes a very remarkable object, where the \emph{derived requirements} of~$G$ are \index{rewrite path!in requirement machine}\emph{rewrite paths} in the requirement machine for~$G$. \begin{example}\label{rqm first example} Let $G$ be the generic signature below: @@ -65,8 +65,12 @@ \chapter{Symbols, Terms, and Rules}\label{symbols terms rules} \end{quote} We met this signature in \ExRef{motivating derived reqs} and then studied it in \SecRef{derived req} and \SecRef{valid type params}. We will now construct its requirement machine. -Our generic signature depends on \tSequence\ and \texttt{IteratorProtocol}. Both declare an associated type named ``\texttt{Element}'', and \tSequence\ also declares an associated type named ``\texttt{Iterator}''. 
Our alphabet has six symbols---two of each kind: -\[ A := \{ \underbrace{\mathstrut\rT,\, \rU}_{\text{generic param}},\, \underbrace{\mathstrut\nElement,\, \nIterator}_{\text{name}},\, \underbrace{\mathstrut\pSequence,\, \pIterator}_{\text{protocol}} \} \] +Our generic signature depends on \tSequence, \texttt{IteratorProtocol}, and \texttt{Equatable}. The first two declare an associated type named \nElement, and \tSequence\ also declares an associated type named \nIterator. Our alphabet has seven symbols: +\begin{ceqn} +\[ +\{ \underbrace{\mathstrut\rT,\, \rU}_{\text{generic param\vphantom{l}}},\, \underbrace{\mathstrut\nElement,\, \nIterator}_{\text{name\vphantom{pl}}},\, \underbrace{\mathstrut\pSequence,\, \pIterator,\,\pEquatable}_{\text{protocol}}\} +\] +\end{ceqn} Here are the requirements of $G$ and \tSequence\ that will define our rewrite rules: \begin{gather*} @@ -91,7 +95,7 @@ \chapter{Symbols, Terms, and Rules}\label{symbols terms rules} One more remark. Name symbols and protocol symbols exist in different namespaces, so even if we renamed \tIterator\ to \texttt{Iterator}, the name symbol \texttt{Iterator} would remain distinct from the protocol symbol $\protosym{Iterator}$. On the other hand, two associated types with the same name in different protocols always define the \emph{same} name symbol. -Now, let's continue with our example and think about the \index{term equivalence relation!in requirement machine}equivalence relation these rewrite rules generate. Notice how there is a certain symmetry behind the appearance of protocol symbols. The conformance requirements (1), (2), (3) and (5) rewrite a term that \emph{ends} in a protocol symbol. The associated requirements (5) and (6) rewrite a term that \emph{begins} with a protocol symbol. (The associated conformance requirement (5) is both; its left-hand side begins and ends with a protocol symbol.)
+Now, let's continue with our example and think about the \index{term equivalence relation!in requirement machine}equivalence relation these rewrite rules generate. Notice how there is a certain symmetry behind the appearance of protocol symbols. The conformance requirements (1), (2), (3), and (5) rewrite a term that \emph{ends} in a protocol symbol. The associated requirements (5) and (6) rewrite a term that \emph{begins} with a protocol symbol. (The associated conformance requirement (5) is both; its left-hand side begins and ends with a protocol symbol.) We know that $G\vdash\SameReq{\rT.Iterator.Element}{\rU.Iterator.Element}$ from \ExRef{derived equiv example}, even though this requirement is not explicitly stated. Applied to this requirement, ``$\Rule$'' outputs this ordered pair of terms: \[ (\rT\cdot\nIterator\cdot\nElement,\, \rU\cdot\nIterator\cdot\nElement) \] @@ -138,7 +142,7 @@ \chapter{Symbols, Terms, and Rules}\label{symbols terms rules} \section{Correctness}\label{rqm correctness} -We saw some examples of encoding derived requirements as word problems. We now make this precise. We prove two theorems, to establish that we can translate derivations into rewrite paths, and vice versa. We use the algebra of rewriting from \SecRef{rewrite graph}. +We saw some examples of encoding \index{derived requirement!in requirement machine}derived requirements as word problems. We now make this precise. We prove two theorems, to establish that we can translate derivations into \index{rewrite path!in requirement machine}rewrite paths, and vice versa. We use the algebra of rewriting from \SecRef{rewrite graph}. We begin with a preliminary result. 
In \SecRef{derived req}, we defined the \IndexStep{AssocConf}\textsc{AssocConf} and \IndexStep{AssocSame}\textsc{AssocSame} inference rules via formal substitution: if \tT\ is some type parameter of~$G$ known to conform to some protocol~\tP, and \SelfU\ is some type parameter of~$\GP$, then \texttt{T.U} denotes the replacement of \tSelf\ with \tT\ in \SelfU. We can relate this to the requirement machine \index{monoid operation!in requirement machine}monoid operation.

@@ -194,7 +198,7 @@ \section{Correctness}\label{rqm correctness}

A \IndexStep{Reflex}\textsc{Reflex} step derives a trivial same-type requirement from a valid type parameter:
\[\ReflexStepDef\]
-We don't need the fact that $G\vdash\tT$ at all, because in a finitely-presented monoid, every term is already equivalent to itself via the empty rewrite path, so we let $t := \Term(\tT)$, and we set:
+We don't need the fact that $G\vdash\tT$ at all, because in a finitely-presented monoid, every term is already equivalent to itself via the \index{empty rewrite path}empty rewrite path, so we let $t := \Term(\tT)$, and we set:
\[\Path \SameReq{T}{T} := 1_t\]

\InductiveStep In each case, the ``$\mathsf{path}$'' of the conclusion is defined from the ``$\mathsf{path}$'' of the step's assumptions. For an \IndexStep{AssocConf}\textsc{AssocConf} step, the induction hypothesis gives us $p_1 := \Path\TP$, so $\Src(p_1)=\Term(\tT)\cdot\pP$ and $\Dst(p_1)=\Term(\tT)$:
@@ -336,7 +340,7 @@ \section{Correctness}\label{rqm correctness}
\end{tabular}
\end{figure}

-We now go in the other direction, and describe how rewrite paths in the requirement machine define derived requirements in our generic signature. Consider how we might reverse our ``$\mathsf{path}$'' mapping.
The main difficulty to overcome is the fact that we did not actually use every assumption in each step:
+We now go in the other direction, and describe how \index{rewrite path!in requirement machine}rewrite paths in the requirement machine define derived requirements in our generic signature. Consider how we might reverse our ``$\mathsf{path}$'' mapping. The main difficulty to overcome is the fact that we did not actually use every assumption in each step:
\begin{enumerate}
\item The ``$\mathsf{path}$'' of a $\textsc{Reflex}$ step $\SameReq{T}{T}$ disregards the proof of the validity of the type parameter~\tT, because in the monoid, we can immediately construct an \index{empty rewrite path}empty rewrite path for \emph{any} term.
@@ -377,7 +381,7 @@ \section{Correctness}\label{rqm correctness}
\begin{proof}
Let $p$ be a rewrite path from $t$ to $z$. We argue by \index{induction}induction on the length of~$p$.

-\BaseCase If we have an empty rewrite path, then $t=z$, so $z$ is an admissible term.
+\BaseCase If we have an \index{empty rewrite path}empty rewrite path, then $t=z$, so $z$ is an admissible term.

\InductiveStep Otherwise, $p=p^\prime \circ s$ for some rewrite path $p^\prime$ and rewrite step~$s$. We write $s := x(u\Rightarrow v)y$, where $x$, $y\in A^*$, and one of $(u,v)$ or $(v,u)\in R$.
@@ -393,7 +397,7 @@ \section{Correctness}\label{rqm correctness}

Just like a standard term describes an unbound type parameter, an admissible term more generally describes a type parameter and \textsl{a list of conformance requirements}.
\begin{definition} -The ``$\Type$'' function maps each admissible term to an unbound type parameter, by ignoring protocol symbols, and translating generic parameter symbols and name symbols into generic parameter types and dependent member types: +The ``$\Type$'' function maps each admissible term to an unbound \index{type parameter!in requirement machine}type parameter, by ignoring protocol symbols, and translating generic parameter symbols and name symbols into generic parameter types and \index{dependent member type!in requirement machine}dependent member types: \[ \begin{array}{l@{\ :=\ }l} \Type(\ttgp{d}{i})&\ttgp{d}{i} \\ @@ -476,7 +480,7 @@ \section{Correctness}\label{rqm correctness} \item $G \vdash \SameReq{$\Type(t)$}{$\Type(z^\prime)$}$. \item $G \vdash \ConfReq{$\Type(u^\prime)$}{P}$ for each $\ConfReq{$\Type(u^\prime)$}{P}\in\Chart(z^\prime)$. \end{enumerate} -The rewrite step~$s$ applies a conformance or same-type requirement; this requirement is explicit (``$\Rule$'') or associated (``$\RuleP{P}$''); and the rewrite step may be \index{positive rewrite step}positive or \index{negative rewrite step}negative. We consider each case to derive the desired conclusion about~$z$: +The \index{rewrite step!in requirement machine}rewrite step~$s$ applies a conformance or same-type requirement; this requirement is explicit (``$\Rule$'') or associated (``$\RuleP{P}$''); and the rewrite step may be \index{positive rewrite step}positive or \index{negative rewrite step}negative. 
We consider each case to derive the desired conclusion about~$z$: \begin{center} \begin{tabular}{lllcl} \toprule @@ -585,7 +589,7 @@ \section{Correctness}\label{rqm correctness} \SameConfStep{3}{1}{$\Type(u\cdot y_1)$}{P}{4} \end{gather*} -\Case{6} A negative rewrite step for an explicit same-type requirement has the following general form, where $(u,v)\in R$ for standard terms $u$~and~$v$, the left whisker is empty, and the right whisker~$y$ is is a combination of name and protocol symbols: +\Case{6} A negative rewrite step for an explicit same-type requirement has the following general form, where $(u,v)\in R$ for standard terms $u$~and~$v$, the left whisker is empty, and the right whisker~$y$ is a combination of name and protocol symbols: \[(v\Rightarrow u)\WR y\] We proceed as in Case~5, except that when the \textsc{Same} elementary statement gives us $\SameReq{$\Type(u)$}{$\Type(v)$}$, we must apply the \IndexStep{Sym}\textsc{Sym} inference rule to flip it around to get $\SameReq{$\Type(v)$}{$\Type(u)$}$ before we proceed: @@ -679,9 +683,9 @@ \section{Symbols}\label{rqm symbols} We've now described how the core model of Swift generics translates requirements into rewrite rules. We defined these rules over an alphabet of generic parameter, name, and protocol symbols. The remainder of this chapter describes the full model of Swift generics as it is actually implemented. Going forward, there is less mathematical rigor than before, and more of a focus on practical concerns. -We begin by expanding our alphabet with a few more symbol kinds, to encode both bound type parameters, and the other requirement kinds. While our correctness results only relied on the symmetric term equivalence relation, in the implementation we use the \index{normal form algorithm}normal form algorithm to get something computable, so we will define a reduction order on our alphabet as well. 
+We begin by expanding our alphabet with a few more symbol kinds, to encode both bound type parameters, and the other requirement kinds. While our correctness results only relied on the symmetric \index{term equivalence relation!in requirement machine}term equivalence relation, in the implementation we use the \index{normal form algorithm}normal form algorithm to get something computable, so we will define a reduction order on our alphabet as well. -Symbols are formed by the \index{rewrite context}rewrite context, the global singleton from \SecRef{protocol component} which manages the lifecycle of requirement machine instances. There are eight symbol kinds, with each kind having its own set of \index{structural components}structural components. This resembles how we modeled types in \ChapRef{types}. +Symbols are formed by the \index{rewrite context}rewrite context, the global singleton from \SecRef{protocol component} which manages the lifecycle of requirement machine instances. There are eight symbol kinds, with each kind having its own set of \index{structural components}structural components. This resembles how we modeled types in \ChapRef{chap:types}. For each symbol kind, a constructor function takes the structural components and returns a pointer to the unique symbol formed from those structural components. The pointer identity of a symbol is determined by its kind and structural components, so checking equality of symbols is cheap. \begin{definition}\label{rqm symbol def} @@ -699,7 +703,7 @@ \section{Symbols}\label{rqm symbols} \end{definition} \paragraph{Name symbols.} -We already introduced name symbols at the beginning of this chapter and there isn't much else to say, but we add one remark. 
In the requirement machine for a \index{well-formed generic signature!name symbols}well-formed generic signature~$G$, every name symbol ``\nA'' has to be the name of some associated type or type alias of some \tP\ such that $G\prec\tP$---for otherwise, ``\nA'' cannot appear in any valid type parameter. However, we allow name symbols for arbitrary identifiers to be formed, so that we can model invalid programs as well. In any case, only a finite set of name symbols are constructed. +We already introduced name symbols at the beginning of this chapter and there isn't much else to say, but we add one remark. In the requirement machine for a \index{well-formed generic signature!name symbols}well-formed generic signature~$G$, every name symbol ``\nA'' has to be the name of some associated type or type alias of some \tP\ such that $G\prec\tP$---for otherwise, ``\nA'' cannot appear in any valid type parameter. However, we allow name symbols for arbitrary identifiers to be formed, so that we can model invalid programs as well. In any case, only a finite set of name symbols is constructed. \paragraph{Protocol symbols.} The reduction order on \index{protocol symbol!reduction order}protocol symbols has the property that if a protocol~\tQ\ inherits from a protocol~\tP, then $\pP<\pQ$. @@ -756,7 +760,7 @@ \section{Symbols}\label{rqm symbols} In the next section, we will see how associated type symbols arise when we translate type parameters to terms. \SecRef{building rules} will discuss associated type rules. -Why do we need associated type symbols? There are two reasons, and we will study them in great detail in \ChapRef{completion}: +Why do we need associated type symbols? 
There are two reasons, and we will study them in great detail in \ChapRef{chap:completion}: \begin{enumerate} \item In \SecRef{genericsigqueries}, we saw that we can implement $\Query{isValidTypeParameter}{}$ using the $\Query{areReducedTypeParametersEqual}{}$ and $\Query{requiresProtocol}{}$ generic signature queries. Once we add associated type symbols, we can instead directly decide if a type parameter is valid using the \index{normal form algorithm}normal form algorithm. @@ -772,7 +776,7 @@ \section{Symbols}\label{rqm symbols} \end{algorithm} \begin{example} -We continue \ExRef{protocol reduction order example}. The \tSequence\ protocol declares an \nElement\ associated type, so protocols inheriring from \tSequence\ inherit this associated type. The \index{protocol machine}protocol machine for \texttt{RandomAccessCollection} has an alphabet with various associated type symbols, ordered as follows: +We continue \ExRef{protocol reduction order example}. The \tSequence\ protocol declares an \nElement\ associated type, so protocols inheriting from \tSequence\ inherit this associated type. The \index{protocol machine}protocol machine for \texttt{RandomAccessCollection} has an alphabet with various associated type symbols, ordered as follows: \begin{gather*} \assocsym{RandomAccessCollection}{Element}\\ {} < \assocsym{BidirectionalCollection}{Element}\\ @@ -785,7 +789,7 @@ \section{Symbols}\label{rqm symbols} \medskip -The next three symbol kinds appear when we build rewrite rules for \index{layout requirement!in requirement machine}layout, \index{superclass requirement!in requirement machine}superclass and \index{same-type requirement!in requirement machine}concrete type requirements. 
Just like a \index{conformance requirement!in requirement machine}conformance requirement $\TP$ defines a rewrite rule $\Term(\tT)\cdot\pP\sim\Term(\tT)$, these other requirement kinds define \index{rewrite rule!in requirement machine}rewrite rules of the form $\Term(\tT)\cdot s\sim\Term(\tT)$, where \tT\ is the requirement's subject type, and $s$ is a layout, superclass, or concrete type symbol. +The next three symbol kinds appear when we build rewrite rules for \index{layout requirement!in requirement machine}layout, \index{superclass requirement!in requirement machine}superclass, and \index{same-type requirement!in requirement machine}concrete type requirements. Just like a \index{conformance requirement!in requirement machine}conformance requirement $\TP$ defines a rewrite rule $\Term(\tT)\cdot\pP\sim\Term(\tT)$, these other requirement kinds define \index{rewrite rule!in requirement machine}rewrite rules of the form $\Term(\tT)\cdot s\sim\Term(\tT)$, where \tT\ is the requirement's subject type, and $s$ is a layout, superclass, or concrete type symbol. \paragraph{Layout symbols.} The $\layoutsym{AnyObject}$ \index{layout symbol}layout symbol represents the \texttt{AnyObject} layout constraint we introduced in \DefRef{requirement def}. It appears when we translate a layout requirement $\ConfReq{T}{AnyObject}$ into a rule $\Term(\tT)\cdot\layoutsym{AnyObject} \sim \Term(\tT)$. @@ -797,19 +801,19 @@ \section{Symbols}\label{rqm symbols} \begin{ceqn} \[\text{concrete type} = \text{pattern type} + \text{substitution terms}\] \end{ceqn} -We're forming the symbol because we're translating an explicit or associated requirement into a rewrite rule, so we use the ``$\Term$'' or ``$\TermP{P}$'' mapping as appropriate to translate each type parameter appearing in our concrete type. (The next section will introduce \AlgRef{build term generic} and \AlgRef{build term protocol} used for this purpose, but for now, the definitions from the start of this chapter remain valid.) 
We then replace each type parameter with a ``phantom'' generic parameter \ttgp{0}{i}, where the \index{depth}depth is always zero, and the \index{index}index~$i\in\NN$ is the index of the corresponding substitution term in the list. +We're forming the symbol because we're translating an explicit or associated requirement into a rewrite rule, so we use the ``$\Term$'' or ``$\TermP{P}$'' mapping as appropriate to translate each type parameter appearing in our concrete type. (The next section will introduce \AlgRef{build term generic} and \AlgRef{build term protocol} used for this purpose, but for now, the definitions from the start of this chapter remain valid.) We then replace each type parameter with a ``phantom'' generic parameter \ttgp{0}{i}, where the \index{depth!substitution term}depth is always zero, and the \index{index}index~$i\in\NN$ is the index of the corresponding substitution term in the list. \begin{algorithm}[Build concrete type symbol]\label{concretesymbolcons} Receives an interface type~\tX\ as input, and optionally, a protocol~\tP. As output, returns a pattern type together with a list of substitution terms. Note that the type \tX\ must not itself be a type parameter. \begin{enumerate} -\item Initialize $S$ with an empty list of terms, and let $i:=0$. -\item Perform a pre-order walk over the tree structure of \tT, transforming each type parameter~\tT\ contained in \tX\ to form the pattern type~\texttt{Y}: +\item (Initialize) Set $S\leftarrow\{\}$. Set $i\leftarrow 0$. +\item (Recurse) Perform a pre-order walk over the tree structure of \tX\ and transform every type parameter~\tT\ that occurs in \tX\ as follows, to form a new type~\texttt{Y}: \begin{enumerate} -\item If we're lowering an explicit requirement, let $t := \Term(\tT)$. Otherwise, let $t := \TermP{P}(\tT)$. Set $S\leftarrow S + \{t\}$. -\item Replace \tT\ with the generic parameter type \ttgp{0}{i} in \tX. -\item Set $i\leftarrow i + 1$. 
+\item (Record) If we're lowering a generic signature requirement, let $t := \Term(\tT)$. Otherwise, let $t := \TermP{P}(\tT)$. Set $S\leftarrow S + \{t\}$. (Thus, $S[i]=t$.) +\item (Transform) Replace this occurrence of \tT\ with \ttgp{0}{i}. +\item (Next) Set $i\leftarrow i + 1$. (Now, $|S|=i$.) \end{enumerate} -\item Return the type \texttt{Y}, and the array of substitution terms~$S$. +\item Return the pattern type \texttt{Y}, and the array of substitution terms~$S$. \end{enumerate} \end{algorithm} @@ -849,17 +853,17 @@ \section{Symbols}\label{rqm symbols} Takes two superclass or concrete type symbols $s_1$ and $s_2$ as input. Returns one of ``$<$'', ``$>$'', ``$=$'' or \index{$\bot$}``$\bot$'' as output. \begin{enumerate} \item (Invariant) We assume the two symbols already have the same kind; the case of comparing different kinds is handled by the general symbol order we define next. -\item (Incomparable) Compare the pattern type of $s_1$ and $s_2$ with \index{canonical type equality}canonical type equality. If the pattern types are distinct, return ``$\bot$''. -\item (Initialize) Both symbols have the same pattern type, so they must have the same number of substitution terms, say $n$. Also, let $i:=0$. +\item (Incomparable) Compare the pattern types of $s_1$ and $s_2$ for \index{canonical type equality}canonical type equality. If the pattern types are distinct, return ``$\bot$''. +\item (Initialize) Both symbols have the same pattern type, so they must have the same number of substitution terms, say $n$. Set $i\leftarrow 0$. \item (Equal) If $i=n$, all substitution terms are identical. Return ``$=$''. -\item (Compare) Compare the $i$th substitution terms of $s_1$ and $s_2$ using \AlgRef{rqm reduction order}. Return the result if it is ``$<$'' or ``$>$''. +\item (Compare) Compare the $i$th substitution term of $s_1$ and $s_2$ using \AlgRef{rqm reduction order}. Return the result if it is ``$<$'' or ``$>$''. 
\item (Next) Otherwise, set $i\leftarrow i+1$ and go back to Step~4. \end{enumerate} \end{algorithm} -Note that superclass and concrete type symbols contain terms, but those terms cannot recursively contain superclass and concrete type symbols, because terms corresponding to type parameters only contain generic parameter, name, protocol and associated type symbols. +While superclass and concrete type symbols may contain terms, those terms cannot themselves contain superclass and concrete type symbols, because terms corresponding to type parameters only contain generic parameter, name, protocol, and associated type symbols. -\paragraph{Concrete conformance symbols.} The notation for a \index{concrete conformance symbol}concrete conformance symbol looks like a concrete type symbol, but it also stores a protocol declaration. We will see in \ChapRef{concrete conformances} that when a type parameter \tT\ is subject to the combination of a \index{conformance requirement!concrete conformance}conformance requirement $\TP$ and a concrete \index{same-type requirement!concrete conformance}same-type requirement $\TX$, we introduce a rewrite rule containing a concrete conformance symbol: +\paragraph{Concrete conformance symbols.} The notation for a \index{concrete conformance symbol}concrete conformance symbol looks like a concrete type symbol, but it also stores a protocol declaration. We will see in \SecRef{rqm concrete conformances} that when a type parameter \tT\ is subject to the combination of a \index{conformance requirement!concrete conformance}conformance requirement $\TP$ and a concrete \index{same-type requirement!concrete conformance}same-type requirement $\TX$, we introduce a rewrite rule containing a concrete conformance symbol: \[\Term(\tT)\cdot\concretesym{\tX\colon\texttt{P};\,\ldots} \sim \Term(\tT)\] To compare two concrete conformance symbols, we first compare their protocol, then we compare their pattern type and substitution terms. 
\begin{algorithm}[Concrete conformance reduction order]\label{concrete conformance reduction order} @@ -1168,7 +1172,7 @@ \section{Rules}\label{building rules} \ConfRule{\pSequence\cdot\nIterator}{\pIterator} \tag{6} \\ \SameRule{\pSequence\cdot\nElement}{\pSequence\cdot\nIterator\cdot\nElement} \tag{7} \end{gather*} -We have seven local rules in total. The next step is \index{completion}completion, which we describe in \ChapRef{completion}; for now, the important fact is that we transform rules (6) and (7) into a different form. The new rules contain associated type symbols instead of protocol and name symbols: +We have seven local rules in total. The next step is \index{completion}completion, which we describe in \ChapRef{chap:completion}; for now, the important fact is that we transform rules (6) and (7) into a different form. The new rules contain associated type symbols instead of protocol and name symbols: \begin{gather*} \ConfRule{\assocsym{Sequence}{Iterator}}{\pIterator} \tag{\CRule{6}} \\ \SameRule{\assocsym{Sequence}{Element}}{\assocsym{Sequence}{Iterator}\cdot\assocsym{IteratorProtocol}{Element}} \tag{\CRule{7}} @@ -1268,7 +1272,7 @@ \section{Protocol Type Aliases}\label{protocol type aliases} Recall that \index{protocol type alias}protocol type aliases appear as member types of type parameters, alongside associated type declarations (\SecRef{member type repr}). If a type parameter \tT\ conforms to a protocol \tP\ that declares a type alias \nA, the user can state the \index{member type representation!protocol type alias}member type representation \texttt{T.A}. -In the \index{interface resolution stage!protocol type alias}interface resolution stage, we already have a generic signature, and we can issue generic signature queries against the type parameter~\tT\ to find the declaration of \texttt{A}. We then substitute \tSelf\ with \tT\ in the \index{underlying type}underlying type of \nA. 
However, protocol type aliases can also appear in the requirements of a \Index{where clause@\texttt{where} clause!protocol type alias}\texttt{where} clause, which are resolved in the structural resolution stage, \emph{while} we're building the current context's generic signature.
+In the \index{interface resolution stage!protocol type alias}interface resolution stage, we already have a generic signature, and we can issue generic signature queries against the type parameter~\tT\ to find the declaration of \texttt{A}. We then substitute \tSelf\ with \tT\ in the \index{underlying type!in requirement machine}underlying type of \nA. However, protocol type aliases can also appear in the requirements of a \Index{where clause@\texttt{where} clause!protocol type alias}\texttt{where} clause, which are resolved in the structural resolution stage, \emph{while} we're building the current context's generic signature.

In this case, type resolution doesn't know anything about \tT\ yet, so the resolved type is always an unbound dependent member type \texttt{T.A}. Therefore, in a source program with protocol type aliases, an \index{unbound dependent member type!protocol type alias}unbound dependent member type may refer to the name of a \emph{type alias} declaration, and not just an associated type declaration. This gives us a new behavior that is \index{limitation!of derived requirements}not part of the \index{derived requirement!protocol type alias}derived requirements formalism. Instead, we just describe how it is implemented at the level of rewrite rules here.

@@ -1453,7 +1457,7 @@ \section{The Normal Form Algorithm}\label{term reduction}

In this section, we revisit the \index{normal form algorithm}normal form algorithm of \SecRef{rewritesystemintro}.
Among other things, we can use this algorithm to decide if two type parameters are equivalent; this is how the \IndexQuery{areReducedTypeParametersEqual}$\Query{areReducedTypeParametersEqual}{}$ generic signature query is implemented. Our goal will be to address two shortcomings of the specification in \AlgRef{term reduction algo}:
\begin{enumerate}
-\item If the original term contains more than one subterm equal to the left-hand side of a rewrite rule, we did not specify which rewrite step to take.
+\item If the original term contains more than one \index{subterm!normal form algorithm}subterm equal to the left-hand side of a rewrite rule, we did not specify which \index{rewrite step!normal form algorithm}rewrite step to take.
\item If there are a large number of rewrite rules, representing them as a list of pairs is inefficient, because we find ourselves comparing each subterm of our original term with the left-hand side of every rewrite rule.
\end{enumerate}

@@ -1550,7 +1554,7 @@ \section{The Normal Form Algorithm}\label{term reduction}

Once we collect the rewrite rules for a requirement machine, we use the algorithm below to record them. We first apply the normal form algorithm to reduce both sides with all rules added so far. If both sides already have the same normal form, we discard the rule. Otherwise, we orient the pair with the reduction order before adding it to the array and updating the trie. Note that completion also adds new rewrite rules, but with a slightly extended procedure, which we describe in \AlgRef{add rule derived algo}.

-\begin{algorithm}[Record rewrite rule]\label{add rewrite rule}
+\begin{algorithm}[Record rule]\label{add rewrite rule}
Takes a pair of terms $(u,v)$ as input. Returns a flag indicating if a new rule was actually recorded. Has side effects.
\begin{enumerate}
\item (Reduce) Apply \AlgRef{term reduction trie algo} to $u$ and $v$ to obtain $\tilde{u}$ and $\tilde{v}$.
@@ -1575,18 +1579,18 @@ \section{The Normal Form Algorithm}\label{term reduction} Thus, we encode a rewrite step as three integers together with a boolean flag, while a rewrite path consists of the initial term together with a list of rewrite steps encoded in this manner. The \IndexDefinition{rewrite path evaluator}\emph{rewrite path evaluator} recovers the intermediate term at each step, as described above. This is used to print rewrite paths in debugging output. -\paragraph{More about tries.} \AlgRef{trie lookup algo} finds the shortest prefix of the input string that matches some key in the trie. In particular, if our trie contains two keys where one is a prefix of another, a lookup will only ever find the shorter key. We cannot simply discard the longer key though, and we will see that completion performs a different kind of trie lookup to find \emph{all} matching prefixes, in \AlgRef{overlap trie lookup}. Finally, in \ChapRef{propertymap}, we will see that to implement the property map data structure, we use a different trie to find the \emph{longest suffix} of the input string that matches a certain key. We refer the reader to Section~6.3 of \cite{art3} for further discussion of tries. +\paragraph{More about tries.} \AlgRef{trie lookup algo} finds the shortest prefix of the input string that matches some key in the trie. In particular, if our trie contains two keys where one is a prefix of another, a lookup will only ever find the shorter key. We cannot simply discard the longer key though, and we will see that completion performs a different kind of trie lookup to find \emph{all} matching prefixes, in \AlgRef{overlap trie lookup}. Finally, in \ChapRef{propertymap}, we will see that to implement the property map data structure, we use a different trie to find the \emph{longest suffix} of the input string that matches a certain key. To learn more about tries, see Section~6.3 of \cite{art3}. 
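The shortest-match lookup and the normal form loop described above can be sketched in a few lines of Python. This is an illustrative model only, not the compiler's C++ \texttt{Trie} and \texttt{RewriteSystem} classes; the symbol spellings and rules below are made up for the example.

```python
# Illustrative model of shortest-match trie lookup driving the normal form
# algorithm. Terms are tuples of symbol strings; the symbols and rules are
# invented for this sketch and are not the compiler's data structures.

class TrieNode:
    def __init__(self):
        self.children = {}  # maps a symbol to a child TrieNode
        self.value = None   # right-hand side of a rule, if a key ends here

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, key, value):
        node = self.root
        for symbol in key:
            node = node.children.setdefault(symbol, TrieNode())
        node.value = value

    def find_shortest(self, term, start):
        """Find the shortest key that is a prefix of term[start:], if any."""
        node = self.root
        for end in range(start, len(term)):
            node = node.children.get(term[end])
            if node is None:
                return None
            if node.value is not None:
                # Stop at the first key found: shortest-match semantics.
                return (end + 1 - start, node.value)
        return None

def normal_form(term, trie):
    """Rewrite the leftmost matching subterm until no rule applies.

    Terminates as long as the rules are oriented by a reduction order.
    """
    while True:
        for i in range(len(term)):
            match = trie.find_shortest(term, i)
            if match is not None:
                length, rhs = match
                term = term[:i] + rhs + term[i + length:]
                break
        else:
            return term  # no position matched any rule

# Two oriented rules, written left => right:
#   T.[Sequence] => T                      (a conformance requirement)
#   T.Iterator => T.[Sequence:Iterator]    (binding a name symbol)
trie = Trie()
trie.insert(("T", "[Sequence]"), ("T",))
trie.insert(("T", "Iterator"), ("T", "[Sequence:Iterator]"))
```

For instance, the term \texttt{("T", "[Sequence]", "Iterator")} reduces first to \texttt{("T", "Iterator")} and then to \texttt{("T", "[Sequence:Iterator]")}, which is in normal form.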
\paragraph{A possible optimization.} -Searching an input string for occurrences of a fixed list of substrings is a common problem in programming, and our approach of using a trie is a typical solution. Our measurements show that the normal form algorithm makes up a neglegible portion of total compile time, so further optimization seems unnecessary. However, in applications of this problem where the input string is very long, the paper by \index{Alfred Aho}Alfred~V.~Aho and \index{Margaret Corasick}Margaret~J.~Corasick~\cite{ahocorasick} presents a further improvement. +Searching an input string for occurrences of a fixed list of substrings is a common problem in programming, and our approach of using a trie is a typical solution. Our measurements show that the normal form algorithm makes up a negligible portion of total compile time, so further optimization seems unnecessary. However, in applications of this problem where the input string is very long, the paper by \index{Alfred Aho}Alfred~V.~Aho and \index{Margaret Corasick}Margaret~J.~Corasick~\cite{ahocorasick} presents a further improvement. -In our normal form algorithm, we check for a matching subterm at every position of the input term. If we're currently at some position~$i$, we might perform a lookup that traverses some number of nodes in the trie, say~$j$, before failing at position $i+j$. At this point, we start a new lookup from the root node, back at position $i+1$ in the original term. Let's call this integer~$j$ the \emph{level} of a trie node. When lookup fails at a node with level $j>1$, we are forced to revisit symbols that we saw already. +In our normal form algorithm, we check for a matching \index{subterm}subterm at every position of the input term. If we're currently at some position~$i$, we might perform a lookup that traverses some number of nodes in the trie, say~$j$, before failing at position $i+j$. 
At this point, we start a new lookup from the root node, back at position $i+1$ in the original term. Let's call this integer~$j$ the \emph{level} of a trie node. When lookup fails at a node with level $j>1$, we are forced to revisit symbols that we saw already.

The key idea in the \IndexDefinition{Aho-Corasick algorithm}Aho-Corasick algorithm is to augment the trie, which they call the \emph{goto graph}, with an additional data structure describing the \emph{failure function}. When a node~$n$ does not have a child for the next symbol~$s$, the failure function sends us to some other node at a shallower level, which may not be the root node. We can then repeat the lookup of the \emph{same} symbol~$s$ at the new node. If we reach the root node, there cannot be a match, so we advance to the next symbol, and we never go back.

-The tradeoff with this approach is the time taken to precompute the failure function, work that would need to be redone when the rule trie changes. Thus, it would only benefit us after we perform \index{completion}completion, because completion intertwines updating the rule trie with computation of normal forms.
+The tradeoff with this approach is the time taken to compute the failure function, work which would need to be redone when the rule trie changes. However, the necessary updates can be done incrementally; see \cite{MEYER1985219}.

-\section{Source Code Reference}\label{symbols terms rules sourceref}
+\section{Source Code Reference}\label{src:symbols terms rules}

\subsection*{Symbols}

@@ -1623,7 +1627,7 @@ \subsection*{Symbols}
\item \texttt{forConcreteType()} \IndexSource{concrete type symbol}takes a pattern type and a list of substitution terms.
\item \texttt{forConcreteConformance()} \IndexSource{concrete conformance symbol}takes a pattern type, a list of substitution terms, and a \texttt{ProtocolDecl *}.
\end{itemize}
-The last three methods take the pattern type as a \texttt{CanType} and the substitution terms as an \texttt{ArrayRef}.
The \texttt{RewriteContext::getSubstitutionSchemaFromType()} method takes an arbitrary concrete \texttt{Type} and builds the pattern type and substitution terms. Note that the pattern type is always a canonical type, so the Requirement Machine does not preserve type sugar in requirements when building a generic signautre, for example. +The last three methods take the pattern type as a \texttt{CanType} and the substitution terms as an \texttt{ArrayRef}. The \texttt{RewriteContext::getSubstitutionSchemaFromType()} method takes an arbitrary concrete \texttt{Type} and builds the pattern type and substitution terms. Note that the pattern type is always a canonical type, so the Requirement Machine does not preserve type sugar in requirements when building a generic signature, for example. \paragraph{Structural components.} Various instance methods take symbols apart: \begin{itemize} @@ -1657,7 +1661,7 @@ \subsection*{Symbols} \end{itemize} \apiref{rewriting::RewriteContext}{class} -See also \SecRef{rqm basic operation source ref}. +See also \SecRef{src:basic operation}. \begin{itemize} \item \texttt{compareProtocols()} implements \AlgRef{protocol reduction order}. @@ -1697,7 +1701,7 @@ \subsection*{Terms} \item \texttt{empty()} checks if this mutable term is empty. \item \texttt{add()} adds a single \texttt{Symbol} at the end of this term. \item \texttt{append()} appends another \texttt{Term} or \texttt{MutableTerm} at the end of this term. This is the \index{free monoid}free monoid operation. -\item \texttt{rewriteSubTerm()} replaces a subterm of this term with another term. The subterm to replace is specified by a pair of iterators, which must be valid iterators into this term. The replacement term may be shorter or longer than the \IndexSource{subterm}subterm, and the underlying storage of this term is resized as appropriate. This operation is the key step in \AlgRef{term reduction trie algo}. 
+\item \texttt{rewriteSubTerm()} replaces a \IndexSource{subterm}subterm of this term with another term. The subterm to replace is specified by a pair of iterators, which must be valid iterators into this term. The replacement term may be shorter or longer than the subterm, and the underlying storage of this term is resized as appropriate. This operation is the key step in \AlgRef{term reduction trie algo}.
\end{itemize}

\apiref{rewriting::Term}{class}
@@ -1720,7 +1724,7 @@ \subsection*{Terms}
\end{itemize}

\apiref{rewriting::RewriteContext}{class}
-See also \SecRef{rqm basic operation source ref}.
+See also \SecRef{src:basic operation}.
\begin{itemize}
\item \texttt{getMutableTermForType()} translates a \texttt{Type} containing a \IndexSource{type parameter!in requirement machine}type parameter into a \texttt{MutableTerm}. This method implements both \AlgRef{build term generic} and \AlgRef{build term protocol}; the second parameter is a \texttt{ProtocolDecl *}, which may be null.
@@ -1772,7 +1776,7 @@ \subsection*{Rules}
\end{itemize}

\apiref{rewriting::RewriteSystem}{class}
-A ``rewrite system'' is what the implementation calls the list of \IndexSource{monoid presentation!of requirement machine}rewrite rules in a monoid presentation. Every \texttt{RequirementMachine} has a \texttt{RewriteSystem}. See also \SecRef{completion sourceref}.
+A list of \IndexSource{monoid presentation!of requirement machine}rewrite rules in a monoid presentation together with some additional state. Every \texttt{RequirementMachine} has a \texttt{RewriteSystem}. See also \SecRef{src:completion}.
\begin{itemize}
\item \texttt{initialize()} adds the initial list of \IndexSource{imported rule}imported and local rules.
\item \texttt{simplify()} computes the normal form of a term using \AlgRef{term reduction trie algo}. Modifies the given \texttt{MutableTerm} in place, and optionally outputs a \texttt{RewritePath} from the original term to the reduced term.
@@ -1802,7 +1806,7 @@ \subsection*{Rules} \end{itemize} \apiref{rewriting::Trie}{template class} -A template class implementing the \IndexSource{trie}trie data structure. Used by the \texttt{RewriteSystem} and \texttt{PropertyMap}. The keys are terms, while the value type is a template parameter. Another template parameter selects between a shortest or longest match lookup strategy. We use shortest match for the normal form algorithm. See also \SecRef{completion sourceref} and \SecRef{property map sourceref}. +A template class implementing the \IndexSource{trie}trie data structure. Used by the \texttt{RewriteSystem} and \texttt{PropertyMap}. The keys are terms, while the value type is a template parameter. Another template parameter selects between a shortest or longest match lookup strategy. We use shortest match for the normal form algorithm. See also \SecRef{src:completion} and \SecRef{property map sourceref}. \begin{itemize} \item \texttt{find()} finds an existing entry with \AlgRef{trie lookup algo}. \item \texttt{insert()} inserts a new entry with \AlgRef{trie insert algo}. @@ -1815,7 +1819,7 @@ \subsection*{Rewrite Steps} \item \SourceFile{lib/AST/RequirementMachine/RewriteLoop.h} \item \SourceFile{lib/AST/RequirementMachine/RewriteLoop.cpp} \end{itemize} -See also \SecRef{completion sourceref}. +See also \SecRef{src:completion}. \apiref{rewriting::RewriteStep::Kind}{enum} The \texttt{RewriteStep::Kind::Rule} kind represents the application of a rewrite rule to a term. Other rewrite step kinds appear in \SecRef{property map sourceref}. 
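To make the shortest-match strategy used by the normal form algorithm concrete, here is a hypothetical Swift sketch of trie lookup. The names \texttt{TrieNode} and \texttt{findShortestMatch} are invented for exposition; the actual implementation is the C++ \texttt{Trie} template class described above:
\begin{Verbatim}
// Hypothetical sketch; not the actual C++ implementation.
struct TrieNode<Symbol: Hashable, Value> {
  var value: Value? = nil
  var children: [Symbol: TrieNode] = [:]
}

func findShortestMatch<Symbol, Value>(
    in root: TrieNode<Symbol, Value>, term: [Symbol]) -> Value? {
  var node = root
  for symbol in term {
    guard let child = node.children[symbol] else { return nil }
    node = child
    // Shortest match: stop at the first node that stores a value.
    if let value = node.value { return value }
  }
  return nil
}
\end{Verbatim}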
diff --git a/docs/Generics/chapters/type-resolution.tex b/docs/Generics/chapters/type-resolution.tex index 195810cbf8678..8a31b4c5b235e 100644 --- a/docs/Generics/chapters/type-resolution.tex +++ b/docs/Generics/chapters/type-resolution.tex @@ -2,14 +2,14 @@ \begin{document} -\chapter{Type Resolution}\label{typeresolution} +\chapter{Type Resolution}\label{chap:type resolution} -\IndexDefinition{type resolution}\lettrine{T}{ype resolution} transforms the syntactic \IndexDefinition{type representation}type representations produced by the \index{parser}parser into the semantic \index{type}types of \ChapRef{types}. Type representations have a \index{tree}tree structure. The leaf nodes are \emph{identifier type representations} without generic arguments, such as \texttt{Int}. Nodes with children include \emph{member type representations} which recursively store a base type representation, such as \texttt{T.Element}. There are also type representations for function types, metatypes, tuples, and existentials; they have children and follow the same shape as the corresponding kind of type. Finally, identifier and member type representations may have children in the form of generic arguments, such as \texttt{Array}. One of our main goals in this chapter is to understand how type resolution forms a generic nominal type from a reference to a type declaration and a list of generic arguments. +\IndexDefinition{type resolution}\lettrine{T}{ype resolution} transforms the syntactic \IndexDefinition{type representation}type representations produced by the \index{parser}parser into the semantic \index{type}types of \ChapRef{chap:types}. Type representations have a \index{tree}tree structure. The leaf nodes are \emph{identifier type representations} without generic arguments, such as \texttt{Int}. Nodes with children include \emph{member type representations} which recursively store a base type representation, such as \texttt{T.Element}. 
There are also type representations for function types, metatypes, tuples, and existentials; they have children and follow the same shape as the corresponding kind of type. Finally, identifier and member type representations may have children in the form of generic arguments, such as \texttt{Array<Int>}. One of our main goals in this chapter is to understand how type resolution forms a generic nominal type from a reference to a type declaration and a list of generic arguments.

Type resolution builds the \index{resolved type!z@\igobble|see{type resolution}}\emph{resolved type} by consulting the type representation itself, as well as contextual information describing where the type representation appears:
\begin{enumerate}
-\item Identifier type representations are resolved by unqualified lookup of their identifier; this depends on the type representation's source location. For example, a type representation might name a generic parameter declared in the current scope.
-\item Certain type representations are also resolved based on their semantic position. For example, a function type representation appearing in parameter position resolves to a \index{non-escaping function type}non-escaping function type unless annotated with the \texttt{@escaping} attribute; in any other position, a function type representation resolves to an \index{escaping function type}escaping function type. This behavior was introduced in \IndexSwift{3.0}Swift~3~\cite{se0103}.
+\item Identifier type representations are resolved by unqualified lookup of their identifier; this depends on the type representation's \index{source location}source location. For example, a type representation might name a generic parameter declared in the current scope.
+\item Resolution of certain type representations is also sensitive to semantic position.
For example, a function type representation appearing in parameter position resolves to a \index{non-escaping function type}non-escaping function type unless annotated with the \texttt{@escaping} attribute; in any other position, a function type representation always resolves to an \index{escaping function type}escaping function type. This behavior was introduced in \IndexSwift{3.0}Swift~3~\cite{se0103}.
\end{enumerate}

We encode contextual information in the \IndexDefinition{type resolution context}\emph{type resolution context}, consisting of the following:
\begin{enumerate}
@@ -25,7 +25,7 @@ \chapter{Type Resolution}\label{typeresolution}
\item \index{interface resolution stage}Interface resolution stage requests the current context's generic signature first, and issues generic signature queries against this signature to perform semantic checks.
\end{enumerate}

-The \index{generic signature request}\Request{generic signature request} resolves various type representations so that it can build a generic signature from user-written requirements, as we will see in \ChapRef{building generic signatures}. This must be done in the structural resolution stage.
+The \index{generic signature request}\Request{generic signature request} resolves various type representations so that it can build a generic signature from user-written requirements, as we will see in \ChapRef{chap:building generic signatures}. This must be done in the structural resolution stage.

The \index{interface type request}\Request{interface type request} resolves type representations in the interface resolution stage, to form a \index{value declaration}value declaration's \index{interface type!type resolution}interface type from semantically well-formed types. By resolving types in the interface stage, the \Request{interface type request} depends on the \Request{generic signature request}.
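The position-sensitive resolution of function type representations can be seen in a short example. These declarations are hypothetical, illustrating the behavior the text attributes to Swift~3:
\begin{Verbatim}
func call(_ f: () -> ()) { f() }        // parameter position: non-escaping
func store(_ f: @escaping () -> ()) {}  // `@escaping' opts out
var saved: () -> () = {}                // any other position: escaping
\end{Verbatim}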
@@ -35,7 +35,7 @@ \chapter{Type Resolution}\label{typeresolution}
\item Generic arguments are not checked to satisfy the requirements of a generic nominal type in the structural resolution stage. We'll describe checking generic arguments in \SecRef{checking generic arguments}.
\end{itemize}

-All invocations of type resolution ``downstream'' of the generics implementation must use the interface resolution stage, to not admit invalid types. To emit the full suite of \index{diagnostic!type resolution}diagnostics for type representations resolved in the structural stage, in particular inheritance clauses and trailing \texttt{where} clauses of generic declarations, the \index{type check source file request}\Request{type check source file request} revisits these type representations and resolves them again in the interface resolution stage.
+All invocations of type resolution ``downstream'' of the generics implementation must use the interface resolution stage, so as not to admit invalid types. To emit the full suite of \index{diagnostic!type resolution}diagnostics for type representations resolved in the structural stage, in particular inheritance clauses and trailing \texttt{where} clauses of generic declarations, the \index{type-check primary file request}\Request{type-check primary file request} revisits these type representations and resolves them again in the interface resolution stage.

While the structural resolution stage skips some semantic checks, it can still produce diagnostics; name lookup can fail to resolve an identifier, and certain simpler semantic invariants are still enforced, such as checking that the \emph{number} of generic arguments is correct. For these reasons, care must be taken not to emit the same diagnostics twice if the same invalid type representation is resolved in both stages.
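As a hypothetical illustration of this division of labor: the first declaration below can be diagnosed even in the structural resolution stage, because the \emph{number} of generic arguments is wrong, while the second is only diagnosed once generic arguments are checked against requirements:
\begin{Verbatim}
func f<T>(_: Array<T, T>) {}  // diagnosed in the structural stage:
                              // wrong number of generic arguments
func g<T>(_: Set<T>) {}       // diagnosed later: `T' does not
                              // conform to `Hashable'
\end{Verbatim}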
@@ -43,10 +43,10 @@ \chapter{Type Resolution}\label{typeresolution} \section{Identifier Type Representations}\label{identtyperepr} -An \IndexDefinition{identifier type representation}\emph{identifier type representation} is a single identifier that names a type declaration in some outer scope. We find the \emph{resolved type declaration} via \index{unqualified lookup}unqualified lookup, starting from the source location of the type representation (\SecRef{name lookup}). We then form the resolved type, which will be a nominal type, type alias type, generic parameter type, or dependent member type, depending on the type declaration's kind. We show some examples before describing the general principle. +An \IndexDefinition{identifier type representation}\emph{identifier type representation} is a single identifier that names a type declaration in some outer scope. We find the \emph{resolved type declaration} via \index{unqualified lookup}unqualified lookup, starting from the \index{source location}source location of the type representation (\SecRef{name lookup}). We then form the resolved type, which will be a nominal type, type alias type, generic parameter type, or dependent member type, depending on the type declaration's kind. We show some examples before describing the general principle. \paragraph{Nominal types.} -A top-level non-generic \index{nominal type}nominal type declaration declares a single type, which we referred to as the \index{declared interface type!nominal type declaration}declared interface type in \ChapRef{decls}. Here, the type representation resolves to the \index{struct type}struct type \texttt{Int} declared by the standard library: +A top-level non-generic \index{nominal type}nominal type declaration declares a single type, which we referred to as the \index{declared interface type!nominal type declaration}declared interface type in \ChapRef{chap:decls}. 
Here, the type representation resolves to the \index{struct type}struct type \texttt{Int} declared by the standard library: \begin{Verbatim} var x: Int = ... \end{Verbatim} @@ -72,11 +72,11 @@ \section{Identifier Type Representations}\label{identtyperepr} \end{Verbatim} \SecRef{unbound generic types} describes another special case where generic arguments can be omitted when referencing a generic nominal type. -Recall from \SecRef{name lookup} and \ChapRef{decls} that unqualified lookup visits each outer scope in turn, and if that scope is a nominal type declaration, we attempt a \index{qualified lookup}qualified lookup with this nominal type as the base type. If this base type is a class type, qualified lookup also walks up the superclass hierarchy. +Recall from \SecRef{name lookup} and \ChapRef{chap:decls} that \index{unqualified lookup}unqualified lookup visits each outer scope in turn, and if that scope is a nominal type declaration, we attempt a \index{qualified lookup}qualified lookup with this nominal type as the base type. If this base type is a class type, qualified lookup also walks up the superclass hierarchy. Thus, our identifier type representation might refer to a member type of a superclass of some outer nominal type declaration. In this case, the declared interface type will be written using type parameters that are not visible in the current scope. To get the final resolved type, we \index{type substitution!type resolution}apply a \index{substitution map}substitution map to the \index{declared interface type!type substitution}declared interface type. -If we found a member of the base type's immediate superclass, we use the context substitution map of the \index{superclass type}superclass type. (In the general case, we use the \index{superclass substitution map}\emph{superclass substitution map} construction from \SecRef{classinheritance}). 
In the below example, unqualified lookup of \texttt{Inner} inside \texttt{Derived} finds the member of \texttt{Base}. The generic parameter \tT\ of \texttt{Base} is always \texttt{Int} in \texttt{Derived}, so the superclass substitution map we apply to members of \texttt{Base} when seen from \texttt{Derived}, is $\SubstMap{\SubstType{T}{Int}}$:
+If we found a member of the base type's immediate superclass, we use the context substitution map of the \index{superclass type}superclass type. (In the general case, we use \AlgRef{superclassfordecl} from \SecRef{classinheritance}.) In the example below, unqualified lookup of \texttt{Inner} inside \texttt{Derived} finds the member of \texttt{Base}. The generic parameter \tT\ of \texttt{Base} is always \texttt{Int} in \texttt{Derived}, so the \index{superclass substitution map!type resolution}superclass substitution map we apply to members of \texttt{Base} when seen from \texttt{Derived} is $\SubstMap{\SubstType{T}{Int}}$:
\begin{Verbatim}
class Base {
  struct Inner {}
@@ -121,7 +121,7 @@ \section{Identifier Type Representations}\label{identtyperepr}
}
\end{Verbatim}

-Inside the source range of a \index{struct type}struct or \index{enum type}enum declaration or an extension thereof, \tSelf\ is shorthand for the \index{declared interface type!nominal type declaration}declared interface type of this nominal type declaration. This is not a generic parameter type at all, but rather a nominal type, generic or non-generic:
+Inside the \index{source range}source range of a \index{struct type}struct or \index{enum type}enum declaration or an extension thereof, \tSelf\ is shorthand for the \index{declared interface type!nominal type declaration}declared interface type of this nominal type declaration.
This is not a generic parameter type at all, but rather a nominal type, generic or non-generic: \begin{Verbatim} struct Outer { // Return type of `f' is `Outer' @@ -140,10 +140,10 @@ \section{Identifier Type Representations}\label{identtyperepr} } \end{Verbatim} -We previously described the dynamic \tSelf\ type in \SecRef{misc types}. Historically, Swift only had \tSelf\ in protocols and dynamic \tSelf\ in classes, and the latter could only appear in the return type of a method. \IndexSwift{5.1}Swift~5.1 introduced the ability to state dynamic \tSelf\ in more positions, and also refer to ``static'' \tSelf\ inside struct and enum declarations~\cite{se0068}. +We previously described the dynamic \tSelf\ type in \SecRef{sec:special types}. Historically, Swift only had \tSelf\ in protocols and dynamic \tSelf\ in classes, and the latter could only appear in the return type of a method. \IndexSwift{5.1}Swift~5.1 introduced the ability to state dynamic \tSelf\ in more positions, and also refer to ``static'' \tSelf\ inside struct and enum declarations~\cite{se0068}. \paragraph{Type aliases.} -Identifier type representations can also refer to type alias declarations, generalizing the behavior described for nominal type declarations above. Once again, we take the declared interface type of the type alias declaration, and possibly apply a substitution map. While the declared interface type of a nominal type declaration is a nominal type, the \index{declared interface type!type alias declaration}declared interface type of a \index{type alias declaration}type alias declaration is a \index{type alias type}type alias type. This is a sugared type, \index{canonical type}canonically equal to the \index{underlying type}underlying type of the type alias declaration. +Identifier type representations can also refer to type alias declarations, generalizing the behavior described for nominal type declarations above. 
Once again, we take the declared interface type of the type alias declaration, and possibly apply a substitution map. While the declared interface type of a nominal type declaration is a nominal type, the \index{declared interface type!type alias declaration}declared interface type of a \index{type alias declaration}type alias declaration is a \index{type alias type}type alias type. This is a sugared type, \index{canonical type}canonically equal to the \Index{underlying type!of type alias declaration}underlying type of the type alias declaration. If the named type alias declaration is in a local context, the resolved type is the declared interface type, with no substitution map applied: \begin{Verbatim} @@ -170,7 +170,7 @@ \section{Identifier Type Representations}\label{identtyperepr} As explained in \SecRef{nested nominal types}, a nominal type declaration cannot be a member of a protocol or protocol extension, but a type alias declaration can. We say it's a \IndexDefinition{protocol type alias}\emph{protocol type alias}. Unqualified lookup will find such a type alias declaration from within the scope of any nominal type declaration that conforms to this protocol. We discuss protocol type aliases in the next section when we talk about member type representations. \paragraph{Associated types.} -If the identifier type representation is located within the source range of a \index{protocol declaration!unqualified lookup}protocol or protocol extension, the protocol's \index{associated type declaration!unqualified lookup}associated type declarations are visible to unqualified lookup. 
The resolved type is the \index{declared interface type!associated type declaration}declared interface type of the associated type declaration, which is a \index{dependent member type!unqualified lookup}dependent member type around ``\tSelf'': +If the identifier type representation is located within the \index{source range}source range of a \index{protocol declaration!unqualified lookup}protocol or protocol extension, the protocol's \index{associated type declaration!unqualified lookup}associated type declarations are visible to unqualified lookup. The resolved type is the \index{declared interface type!associated type declaration}declared interface type of the associated type declaration, which is a \index{dependent member type!unqualified lookup}dependent member type around ``\tSelf'': \begin{Verbatim} protocol Pair { associatedtype A @@ -184,7 +184,7 @@ \section{Identifier Type Representations}\label{identtyperepr} } \end{Verbatim} -Associated type declarations are also visible from within the protocol's \index{conforming type}conforming types. Recall that associated types can be \index{type witness}witnessed by generic parameters, member type declarations, or \index{associated type inference}inference (\SecRef{type witnesses}). When the type witness is a generic parameter or member type, unqualified lookup will always find the witness \emph{before} the associated type declaration. However, if the type witness is inferred, unqualified lookup will find the associated type declaration: +Associated type declarations are also visible from within the protocol's \index{conforming type}conforming types. Recall that associated types can be \index{type witness}witnessed by generic parameters, member type declarations, or \index{associated type inference!type resolution}inference (\SecRef{type witnesses}). When the type witness is a generic parameter or member type, unqualified lookup will always find the witness \emph{before} the associated type declaration. 
However, if the type witness is inferred, unqualified lookup will find the associated type declaration: \begin{Verbatim} struct S: Pair { // Explicit type witness: @@ -215,7 +215,7 @@ \section{Identifier Type Representations}\label{identtyperepr} \end{Verbatim} Of course \index{import declaration}\texttt{import} declarations are usually followed by a bare module name, but they are resolved by different means than type resolution. -\paragraph{Summary.} If the resolved type declaration is in \index{local type declaration!type resolution}local context, or at the \index{top-level type declaration!type resolution}top level of a source file, the resolved type is just its \index{declared interface type!type resolution}declared interface type. Otherwise, we found the resolved type declaration by performing a \index{qualified lookup}qualified lookup into some outer nominal type or extension. In this case, the resolved type declaration might be a \emph{direct} member of the outer nominal, or a member of a superclass or protocol. In the direct case, or if we started from a protocol and found a member of another protocol, we again return the member's declared interface type. Otherwise, we build a substitution map whose \index{output generic signature}output generic signature is the generic signature of the outer nominal type or extension. We apply it to the member's declared interface type to get the resolved type: +\paragraph{Summary.} If the resolved type declaration is in \index{local type declaration!type resolution}local context, or at the \index{top-level type declaration!type resolution}top level of a \index{source file}source file, the resolved type is just its \index{declared interface type!type resolution}declared interface type. Otherwise, we found the resolved type declaration by performing a \index{qualified lookup}qualified lookup into some outer nominal type or extension. 
In this case, the resolved type declaration might be a \emph{direct} member of the outer nominal, or a member of a superclass or protocol. In the direct case, or if we started from a protocol and found a member of another protocol, we again return the member's declared interface type. Otherwise, we build a substitution map whose \index{output generic signature}output generic signature is the generic signature of the outer nominal type or extension. We apply it to the member's declared interface type to get the resolved type: \begin{itemize} \item In the \textbf{superclass case}, we take the outer nominal's superclass bound and the member's parent class declaration, and build the \index{superclass substitution map!type resolution}superclass substitution map. @@ -224,7 +224,7 @@ \section{Identifier Type Representations}\label{identtyperepr} \end{itemize} \paragraph{Source ranges.} -In the \index{scope tree}scope tree, a nominal type or extension declaration actually defines \emph{two} scopes, one nested within the other. The smaller source range contains the \emph{body} only, from ``\verb|{|'' to ``\verb|}|''. The larger source range starts with the declaration's opening keyword, such as ``\texttt{class}'' or ``\texttt{extension}'', and continues until the ``\verb|}|''. In particular, the \Index{where clause@\texttt{where} clause!scope tree}\texttt{where} clause is inside the larger source range, but outside of the smaller source range. Within a protocol or protocol extension, the protocol's \index{associated type declaration!unqualified lookup}associated type members (and type alias members, too) are always visible in both scopes: +In the \index{scope tree}scope tree, a nominal type or extension declaration actually defines \emph{two} scopes, one nested within the other. The smaller \index{source range}source range contains the \emph{body} only, from ``\verb|{|'' to ``\verb|}|''. 
The larger source range starts with the declaration's opening keyword, such as ``\texttt{class}'' or ``\texttt{extension}'', and continues until the ``\verb|}|''. In particular, the \Index{where clause@\texttt{where} clause!scope tree}\texttt{where} clause is inside the larger source range, but outside of the smaller source range. Within a protocol or protocol extension, the protocol's \index{associated type declaration!unqualified lookup}associated type members (and type alias members, too) are always visible in both scopes: \begin{Verbatim} // This is OK; `Element' resolves to `Self.[Collection]Element' extension Collection where Element == Int {...} @@ -243,7 +243,7 @@ \section{Identifier Type Representations}\label{identtyperepr} \section{Member Type Representations}\label{member type repr} -A \IndexDefinition{member type representation}\emph{member type representation} consists of a \emph{base} type representation together with an identifier, joined by ``\verb|.|'' in the concrete syntax. The base might be an identifier type representation, or recursively, another member type representation. The base may also have generic arguments. The general procedure for resolving a member type representation is the following. We start by recursively resolving the base type reprensetation; then, we issue an \index{qualified lookup}qualified lookup (\SecRef{name lookup}) to look for a member type declaration with the given name, inside the resolved base type. This finds the resolved type declaration, from which we compute the resolved type by applying a substitution map. +A \IndexDefinition{member type representation}\emph{member type representation} consists of a \emph{base} type representation together with an identifier, joined by ``\verb|.|'' in the concrete syntax. The base might be an identifier type representation, or recursively, another member type representation. The base may also have generic arguments. 
The general procedure for resolving a member type representation is the following. We start by recursively resolving the base type representation; then, we issue a \index{qualified lookup}qualified lookup (\SecRef{name lookup}) to look for a member type declaration with the given name, inside the resolved base type. This finds the resolved type declaration, from which we compute the resolved type by applying a substitution map.

The resolved types obtained this way include nominal types, dependent member types, and type alias types. We classify the various behaviors by considering each kind of base type in turn, and describing its member types.
@@ -287,7 +287,7 @@ \section{Member Type Representations}\label{member type repr}
\qquad\qquad{}=\texttt{T.[Provider]Entity}
\end{gather*}

-The second case, where we found a \index{protocol type alias}\index{type alias declaration}protocol type alias as a member of the base, is closely related. The resolved type is again obtained by replacing all occurrences of \tSelf\ in the \index{underlying type}underlying type of the type alias with the resolved base type of the member type representation. We're also going to make things slightly more interesting by using a dependent member type \texttt{T.Element} as the base type, rather than the generic parameter~\tT:
+The second case, where we found a \index{protocol type alias}\index{type alias declaration}protocol type alias as a member of the base, is closely related. The resolved type is again obtained by replacing all occurrences of \tSelf\ in the \Index{underlying type!of type alias declaration}underlying type of the type alias with the resolved base type of the member type representation.
We're also going to make things slightly more interesting by using a dependent member type \texttt{T.Element} as the base type, rather than the generic parameter~\tT:
\begin{Verbatim}
protocol Subscriber {
  associatedtype Parent: Provider
@@ -325,14 +325,14 @@ \section{Member Type Representations}\label{member type repr}
\end{quote}
At this point, \texttt{f()} has a generic signature, so type representations appearing inside the function can be resolved in the interface resolution stage. The rewriting of requirements that use unbound dependent member types into requirements that use bound dependent member types will be completely justified in \SecRef{minimal requirements}.

-We discussed bound and unbound dependent member types in \SecRef{bound type params}. An unbound dependent member type from the structural resolution stage can be converted into a bound dependent member type by first checking the \IndexQuery{isValidTypeParameter}$\Query{isValidTypeParameter}{}$ generic signature query, followed by \IndexQuery{getReducedType}$\Query{getReducedType}{}$. The first check is needed since an unbound dependent member type, being a syntactic construct, might not name a valid member type at all. However, the usual situation is that the entire type representation is resolved again in the interface resolution stage, at which point an invalid type parameter is resolved to an error type and a \index{diagnostic!invalid type parameter}diagnostic is emitted. In particular, type representations in the trailing \texttt{where} clause of each declaration are re-visited by the \Request{type-check source file request}, which walks all top-level declarations in source order and emits further diagnostics.
+We discussed bound and unbound dependent member types in \SecRef{bound type params}.
An unbound dependent member type from the structural resolution stage can be converted into a bound dependent member type by first checking the \IndexQuery{isValidTypeParameter}$\Query{isValidTypeParameter}{}$ generic signature query, followed by \IndexQuery{getReducedType}$\Query{getReducedType}{}$. The first check is needed since an unbound dependent member type, being a syntactic construct, might not name a valid member type at all. However, the usual situation is that the entire type representation is resolved again in the interface resolution stage, at which point an invalid type parameter is resolved to an error type and a \index{diagnostic!invalid type parameter}diagnostic is emitted. In particular, type representations in the trailing \texttt{where} clause of each declaration are revisited by the \Request{type-check primary file request}, which walks all top-level declarations in source order and emits further diagnostics.

Suppose now we change \texttt{f()} to add the invalid requirement $\ConfReq{T.Foo}{Equatable}$:
\begin{Verbatim}
func f<T: Provider>(_: T) where T.Entity: Equatable, T.Foo: Equatable {...}
\end{Verbatim}
-The invalid requirement will be dropped by requirement minimization, and the \Request{generic signature request} does not emit any diagnostics. Instead, the invalid member type representation \texttt{T.Foo} will be diagnosed by \Request{type-check source file request}, because qualified lookup would fail to find a member type named \texttt{Foo} in \texttt{Provider} when we revisit the \texttt{where} clause in interface resolution stage. We will resume the discussion of how invalid requirements are diagnosed in \SecRef{generic signature validity}.
+The invalid requirement will be dropped by requirement minimization, and the \Request{generic signature request} does not emit any diagnostics.
Instead, the invalid member type representation \texttt{T.Foo} will be diagnosed by \Request{type-check primary file request}, because qualified lookup would fail to find a member type named \texttt{Foo} in \texttt{Provider} when we revisit the \texttt{where} clause in the interface resolution stage. We will resume the discussion of how invalid requirements are diagnosed in \SecRef{generic signature validity}. \smallskip @@ -384,7 +384,7 @@ \section{Member Type Representations}\label{member type repr} Given the generic nominal type \texttt{Base<String>}, we resolve the member type representation written above by applying the \index{context substitution map!type resolution}context substitution map of \texttt{Base<String>} to the declared interface type of \texttt{Inner}, which we can do because \texttt{Inner} has the same generic signature as \texttt{Base}: \[\texttt{Base<T>.Inner}\otimes\SubstMap{\SubstType{T}{String}}=\texttt{Base<String>.Inner}\] -The substitutions above are trivial in a sense, because the referenced member is a nominal type declaration, and to compute the resolved type, we simply transfer over the generic arguments from the base type. However, when the member is a \index{type alias declaration}type alias declaration, we can actually encode an arbitrary substitution. Each choice of base type defines a possible substitution map, which is then applied to the underlying type of the type alias, which can be any valid interface type for its generic signature. The reader may recall that when substitution maps were introduced in \ChapRef{substmaps}, one of the motivating examples was \ExRef{type alias subst example}, showing substitutions performed when resolving member type representations with type alias members. We're going to look at a few interesting examples of type alias members now.
+The substitutions above are trivial in a sense, because the referenced member is a nominal type declaration, and to compute the resolved type, we simply transfer over the generic arguments from the base type. However, when the member is a \index{type alias declaration}type alias declaration, we can actually encode an arbitrary substitution. Each choice of base type defines a possible substitution map, which is then applied to the underlying type of the type alias, which can be any valid interface type for its generic signature. When we introduced substitution maps in \ChapRef{chap:substitution maps}, we saw that resolving the type of a type alias member was one situation where substitution maps naturally appear in the language. We're going to look at some examples of type alias members now. \smallskip @@ -468,7 +468,7 @@ \section{Member Type Representations}\label{member type repr} func celebratePetBirthday(_ age: Pet.Age) {} \end{Verbatim} -Due to peculiarities of type substitution, \index{generic type alias}protocol type aliases that are also \index{generic type alias}generic are always considered to depend on \tSelf, even if their underlying type does not reference \tSelf, so they \index{limitation!generic type alias with protocol base}cannot be referenced with a protocol base. (In structural resolution stage, a generic type alias cannot be referenced with a \index{limitation!generic type alias with type parameter base}type parameter base, either. Perhaps it is best not to stick generic type aliases inside protocols, at all.) +Due to peculiarities of type substitution, \index{generic type alias}protocol type aliases that are also \index{generic type alias}generic are always considered to depend on \tSelf, even if their underlying type does not reference \tSelf, so they \index{limitation!generic type alias}cannot be referenced with a protocol base. 
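A protocol type alias like \texttt{Pet.Age} above can be exercised through a conforming type; the surrounding declarations are not shown in full in the text, so this sketch reconstructs them with illustrative names:

```swift
// Illustrative reconstruction of the Pet.Age example from the text.
protocol Animal {
    associatedtype Age
}
protocol PetOwner {
    associatedtype Pet: Animal
    // This alias depends on Self (via Pet), so it must be accessed
    // through a conforming type or a type parameter, not through
    // the bare protocol "PetOwner".
    typealias PetAge = Pet.Age
}

struct Dog: Animal { typealias Age = Int }
struct Person: PetOwner { typealias Pet = Dog }

// Resolving Person.PetAge applies the substitution map
// {Self := Person} to the underlying type Pet.Age, giving Dog.Age = Int.
let age: Person.PetAge = 3
```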
(In the structural resolution stage, a generic type alias cannot be referenced with a \index{limitation!generic type alias}type parameter base, either. Perhaps it is best not to stick generic type aliases inside protocols at all.) \paragraph{General principle.} Let's say that $H$ is the generic signature of the current context, and \tT\ is the resolved base type of our member type representation, obtained via a recursive call to type resolution. We perform a \index{qualified lookup}qualified lookup after considering the base type \tT: \begin{itemize} @@ -483,7 +483,7 @@ \section{Member Type Representations}\label{member type repr} \end{itemize} If the base type \tT\ is a type parameter subject to a concrete \index{same-type requirement!type resolution}same-type requirement or a \index{superclass requirement!type resolution}superclass requirement, we replace \tT\ with the corresponding concrete type obtained by a generic signature query against $H$ before proceeding to compute $\Sigma$ above. -In all three cases above, $d$ might be defined in a constrained extension that imposes further conformance requirements. When building $\Sigma$, we resolve any of these additional conformances via \index{global conformance lookup!substitution map}global conformance lookup (\SecRef{buildingsubmaps}). +In all three cases above, $d$ might be defined in a constrained extension that imposes further conformance requirements. When building $\Sigma$, we resolve any of these additional conformances via \index{global conformance lookup!substitution map}global conformance lookup (\SecRef{conformance lookup}). This is called a \IndexDefinition{context substitution map!for a declaration context} \emph{context substitution map for a declaration context}. This concept generalizes the context substitution map of a type from \SecRef{contextsubstmap}, which was an inherent property of a type, without reference to a declaration context.
If \tT\ is a nominal type and $d$ is a direct member of the nominal type declaration of \tT, the context substitution map of \tT\ for the parent context of $d$ is simply the context substitution map of \tT. @@ -493,7 +493,7 @@ \section{Member Type Representations}\label{member type repr} \paragraph{Caching the type declaration.} Having computed the resolved type of an identifier or member type representation, we stash the resolved type declaration within our type representation, as a sort of cache. If the type representation is resolved again (perhaps once in the \index{type resolution stage}structural stage, and then again in the interface stage), we skip name lookup and proceed directly to computing the resolved type from the stored type declaration. The optimization was more profitable in the past, when type resolution actually had \emph{three} stages. The third stage would resolve interface types to archetypes, but it has since been subsumed by the \index{map type into environment}\textbf{map type into environment} operation on \index{generic environment}generic environments. We also pre-populate this cache when parsing textual \index{SIL}SIL, by assigning a type declaration to certain type representations. Name lookup would otherwise not find these declarations, because of SIL syntax oddities that we're not going to discuss here. 
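The case of a member defined in a constrained extension can be seen at the source level. In this sketch (declarations invented for illustration), building the context substitution map for the extension's declaration context must also resolve the conformance \texttt{Int:~Equatable} via global conformance lookup:

```swift
// Illustrative sketch: a type alias member defined in a constrained
// extension, so referencing it requires the extra conformance.
struct Base<T> {}

extension Base where T: Equatable {
    typealias Pair = (T, T)
}

// Resolving "Base<Int>.Pair" builds the substitution map {T := Int}
// (plus the conformance Int: Equatable) and applies it to the
// underlying type (T, T) of the type alias.
let p: Base<Int>.Pair = (1, 2)
```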
-\section{Applying Generic Arguments}\label{checking generic arguments} +\section{Generic Arguments}\label{checking generic arguments} Identifier and member type representations may be equipped with generic arguments, where each \index{generic argument!type resolution}generic argument is recursively another type representation: \begin{Verbatim} @@ -549,13 +549,13 @@ \section{Applying Generic Arguments}\label{checking generic arguments} \paragraph{Checking generic arguments.} Returning to Step~4 from the beginning of the present section, we're given a substitution map $\Sigma\in\SubMapObj{G}{H}$ and we must decide if $\Sigma$ satisfies the requirements of $G$, the generic signature of the referenced type declaration. We apply $\Sigma$ to each explicit requirement of $G$ to get a series of \index{substituted requirement}\emph{substituted requirements}. A substituted requirement is a statement about concrete types that is either true or false; we proceed to check each one by global conformance lookup, canonical type comparison, and so on, as will be described below. If any substituted requirements are unsatisfied, we \index{diagnostic!unsatisfied requirement}diagnose an error. This checking of substituted requirements is a generally useful operation, used not only by type resolution; we discuss other applications at the end of the section. -Our generic argument types can contain type parameters of $H$ (the generic signature of the current context), and checking a substituted requirement may raise questions about~$H$. To avoid separately passing in~$H$, we require that a substituted requirement's types are expressed in terms of \index{archetype type}\emph{archetypes} instead (see \SecRef{archetypesubst} for a refresher). Thus, we begin by mapping $\Sigma$ into the \index{primary generic environment}primary generic environment of $H$, and assume henceforth that $\Sigma\in\SubMapObj{G}{\EquivClass{H}}$. 
+Our generic argument types can contain type parameters of $H$ (the generic signature of the current context), and checking a substituted requirement may raise questions about~$H$. To avoid separately passing in~$H$, we require that a substituted requirement's types are expressed in terms of \index{archetype type}\emph{archetypes} instead (see \SecRef{archetypesubst} for a refresher). Thus, we begin by mapping $\Sigma$ into the \index{primary generic environment}primary generic environment of $H$, and assume henceforth that $\Sigma\in\SubMapObjCtx{G}{H}$. \begin{definition} -We denote by \IndexSetDefinition{req}{\ReqObj{G}}$\ReqObj{G}$ the set of all requirements whose left-hand and right-hand side types contain type parameters of $G$ (all explicit and \index{derived requirement}derived requirements of $G$ are also elements of $\ReqObj{G}$, but there are many more). Similarly, let $\ReqObj{\EquivClass{H}}$ denote the set of requirements written using the primary archetypes of $H$. +We denote by \IndexSetDefinition{req}{\ReqObj{G}}$\ReqObj{G}$ the set of all requirements whose left-hand and right-hand side types contain type parameters of $G$ (all explicit and \index{derived requirement}derived requirements of $G$ are also elements of $\ReqObj{G}$, but there are many more). Similarly, let $\ReqObjCtx{H}$ denote the set of requirements written using the primary archetypes of $H$. We then define \emph{requirement substitution} as a new ``overload'' of \index{$\otimes$}$\otimes$: -\[\ReqObj{G}\otimes\SubMapObj{G}{\EquivClass{H}}\rightarrow\ReqObj{\EquivClass{H}}\] +\[\ReqObj{G}\otimes\SubMapObjCtx{G}{H}\rightarrow\ReqObjCtx{H}\] Requirement substitution must apply $\Sigma$ to every type parameter appearing in a given requirement $R$ by considering the \index{requirement kind}requirement kind. 
In all of the below, \tT\ is the subject type of the requirement, so $\tT\in\TypeObj{G}$: \begin{itemize} \item For a \index{conformance requirement!type substitution}\textbf{conformance requirement} $\TP$, we apply $\Sigma$ to \tT. The \index{protocol type}protocol type~\texttt{P} remains unchanged because it does not contain any type parameters: @@ -625,7 +625,7 @@ \section{Applying Generic Arguments}\label{checking generic arguments} var x: Concat = ... } \end{Verbatim} -We build $\Sigma\in\SubMapObj{G}{\EquivClass{H}}$ by mapping our generic arguments into the primary generic environment of $H$: +We build $\Sigma\in\SubMapObjCtx{G}{H}$ by mapping our generic arguments into the primary generic environment of $H$: \begin{align*} \Sigma := \SubstMapC{ &\SubstType{\rT}{$\archetype{C}$},\\ @@ -664,28 +664,28 @@ \section{Applying Generic Arguments}\label{checking generic arguments} \end{example} \begin{algorithm}[Check requirement]\label{reqissatisfied} -Takes a substituted requirement $R\in\ReqObj{\EquivClass{H}}$ as input, where the generic signature $H$ is not given explicitly; $R$ may contain primary archetypes of $H$, but not type parameters. Returns true if $R$ is \IndexDefinition{satisfied requirement}satisfied, false otherwise. In the below, \tX\ is the concrete subject type of $R$, so $\tX\in\TypeObj{\EquivClass{H}}$. We handle each \index{requirement kind}requirement kind as follows: +Takes a substituted requirement $R\in\ReqObjCtx{H}$ as input, where the generic signature $H$ is not given explicitly; $R$ may contain primary archetypes of $H$, but not type parameters. Returns true if $R$ is \IndexDefinition{satisfied requirement}satisfied, false otherwise. In the below, \tX\ is the concrete subject type of $R$, so $\tX\in\TypeObjCtx{H}$. 
We handle each \index{requirement kind}requirement kind as follows: \begin{itemize} \item For a \index{conformance requirement!checking}\textbf{conformance requirement} $\XP$, we perform the \index{global conformance lookup!conformance requirement}global conformance lookup $\tX \otimes \tP$. There are three possible outcomes: \begin{enumerate} \item If we get an \index{abstract conformance}abstract conformance, it must be that \tX\ is an archetype of $H$ whose type parameter conforms to $\tP$. Return true. -\item If we get a \index{concrete conformance}concrete conformance, it might be \index{conditional conformance}conditional (\SecRef{conditional conformance}). These conditional requirements are also substituted requirements of $\ReqObj{\EquivClass{H}}$, and we check them by recursively invoking this algorithm. If all conditional requirements are satisfied (or if there aren't any), return true. +\item If we get a \index{concrete conformance}concrete conformance, it might be \index{conditional conformance}conditional (\SecRef{sec:conditional conformances}). These conditional requirements are also substituted requirements of $\ReqObjCtx{H}$, and we check them by recursively invoking this algorithm. If all conditional requirements are satisfied (or if there aren't any), return true. \item If we get an invalid conformance, or if the conditional requirement check failed above, return false. \end{enumerate} \item For a \index{superclass requirement!checking}\textbf{superclass requirement} $\ConfReq{X}{C}$, we proceed as follows: \begin{enumerate} \item If \tX\ is a class type \index{canonical type equality}canonically equal to \tC, return true. \item If \tX\ and \tC\ are two distinct generic class types for the same \index{class declaration}class declaration, return false. -\item If \tX\ does not have a \index{superclass type}superclass type (\ChapRef{classinheritance}), return false.
+\item If \tX\ does not have a \index{superclass type}superclass type (\SecRef{classinheritance}), return false. \item Otherwise, let $\tX^\prime$ be the superclass type of \tX. Recursively apply the algorithm to the superclass requirement $\ConfReq{$\tX^\prime$}{C}$. \end{enumerate} -\item For a \index{layout requirement!checking}\textbf{layout requirement} $\ConfReq{X}{AnyObject}$, we check if \tX\ is a class type, an archetype satisfying the \Index{AnyObject@\texttt{AnyObject}}\texttt{AnyObject} \index{layout constraint}layout constraint, or an \index{Objective-C existential}\texttt{@objc} existential, and if so, we return true. Otherwise, we return false. (We'll discuss representation of existentials in \ChapRef{existentialtypes}.) +\item For a \index{layout requirement!checking}\textbf{layout requirement} $\ConfReq{X}{AnyObject}$, we check if \tX\ is a class type, an archetype satisfying the \Index{AnyObject@\texttt{AnyObject}}\texttt{AnyObject} \index{layout constraint}layout constraint, or an \index{Objective-C existential}\texttt{@objc} existential, and if so, we return true. Otherwise, we return false. (We'll discuss representation of existentials in \ChapRef{chap:existential types}.) \index{canonical type equality!same-type requirement} \item For a \index{same-type requirement!checking}\textbf{same-type requirement} $\SameReq{X}{Y}$, we check if \tX\ and \tY\ are canonically equal. \end{itemize} \end{algorithm} \paragraph{Contextually-generic declarations.} -A type declaration with a trailing \Index{where clause@\texttt{where} clause!contextually-generic declaration}\texttt{where} clause but no generic parameter list was called a \index{contextually-generic declaration}contextually-generic declaration in \SecRef{requirements}. A non-generic declaration inside a constrained extension is conceptually similar; we'll meet constrained extensions in \SecRef{constrained extensions}. 
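The superclass and same-type cases of the check-requirement algorithm above can be modeled with a toy recursion. This is purely illustrative: the \texttt{Ty} representation and all names are invented, the generic-class and conformance cases are omitted, and none of this reflects the compiler's actual data structures:

```swift
// Toy model of two cases of the "check requirement" algorithm.
indirect enum Ty: Equatable {
    case klass(name: String, superclass: Ty?)
    case concrete(String)
}

enum Req {
    case superclassReq(subject: Ty, constraint: Ty)
    case sameType(Ty, Ty)
}

func check(_ r: Req) -> Bool {
    switch r {
    case .sameType(let x, let y):
        // Canonical equality, modeled here as plain equality.
        return x == y
    case .superclassReq(let x, let c):
        if x == c { return true }  // step 1: X equals C
        guard case .klass(name: _, superclass: let sup?) = x else {
            return false           // step 3: X has no superclass type
        }
        // step 4: recurse on the superclass of X
        return check(.superclassReq(subject: sup, constraint: c))
    }
}

let root = Ty.klass(name: "Root", superclass: nil)
let derived = Ty.klass(name: "Derived", superclass: root)
```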
In both cases, the generic signature~$G$ of the referenced declaration and the generic signature of the parent context~$G^\prime$ share the same generic parameters, but $G$ has additional requirements not present in $G^\prime$. While there are no generic arguments to apply, we still proceed to check that~$\Sigma$ satisfies the requirements of~$G$. +A type declaration with a trailing \Index{where clause@\texttt{where} clause!contextually-generic declaration}\texttt{where} clause but no generic parameter list was called a \index{contextually-generic declaration}contextually-generic declaration in \SecRef{sec:requirements}. A non-generic declaration inside a constrained extension is conceptually similar; we'll meet constrained extensions in \SecRef{constrained extensions}. In both cases, the generic signature~$G$ of the referenced declaration and the generic signature of the parent context~$G^\prime$ share the same generic parameters, but $G$ has additional requirements not present in $G^\prime$. While there are no generic arguments to apply, we still proceed to check that~$\Sigma$ satisfies the requirements of~$G$. \begin{example} The \texttt{Inner} type below demonstrates the first case: \begin{Verbatim} @@ -734,11 +734,11 @@ \section{Applying Generic Arguments}\label{checking generic arguments} \item The original type was a dependent member type, and the substituted base type does not conform to the member type's protocol. This means that an earlier \index{conformance requirement!checking}conformance requirement was unsatisfied, and hence already diagnosed. \item The declaration of a \index{normal conformance}normal conformance may itself contain error types for any invalid or missing \index{type witness}type witnesses, in which case projecting a type witness may output an error type; again, we will have diagnosed an error earlier, when checking the conformance. 
\end{enumerate} -In all cases, a \index{diagnostic!substitution failure}diagnostic was already emitted, thus the requirement itself need not be diagnosed. We called this a \index{substitution failure}\emph{substitution failure} in \ChapRef{substmaps}. +In all cases, a \index{diagnostic!substitution failure}diagnostic was already emitted, thus the requirement itself need not be diagnosed. We called this a \index{substitution failure}\emph{substitution failure} in \ChapRef{chap:substitution maps}. \begin{algorithm}[Check substitution map]\label{check generic arguments algorithm} Takes two inputs: \begin{enumerate} -\item A substitution map $\Sigma\in\SubMapObj{G}{\EquivClass{H}}$. +\item A substitution map $\Sigma\in\SubMapObjCtx{G}{H}$. \item Some list of elements of $\ReqObj{G}$. (When checking the generic arguments of a type representation, these are the explicit requirements of the generic signature~$G$.) \end{enumerate} As output, returns an \emph{unsatisfied} list, and a \emph{failed} list. If both output lists are empty, all input requirements are satisfied by $\Sigma$. @@ -751,28 +751,28 @@ \section{Applying Generic Arguments}\label{checking generic arguments} \item (Loop) Go back to Step~2. \end{enumerate} \end{algorithm} -If any requirements appear in the unsatisfied list, type resolution diagnoses a series of errors at the source location of the generic type representation, one for each unsatisfied requirement. Requirements on the failed list are dropped, because another diagnostic will have been emitted, as explained previously. If at least one requirement was unsatisfied or failed, the resolved type becomes the error type; this enforces the invariant that clients of type resolution will not encounter any types that do not satisfy generic requirements---with the important exception of the \index{error type}error type itself! 
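At the source level, the effect of checking a substitution map against a list of requirements can be seen with a variant of the \texttt{Concat} example; the declaration in the text is abridged, so this version is illustrative:

```swift
// Illustrative variant of the Concat example from the text.
struct Concat<A: Sequence, B: Sequence> where A.Element == B.Element {
    var a: A
    var b: B
}

// Checking Concat<[Int], Set<Int>> substitutes the requirements of its
// generic signature to "[Int]: Sequence", "Set<Int>: Sequence", and
// "Int == Int", all of which are satisfied.
let ok = Concat(a: [1, 2], b: Set([3]))

// Concat<[Int], [String]> would leave the substituted requirement
// "Int == String" on the unsatisfied list, and type resolution would
// diagnose an error and resolve the type to the error type.
```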
+If any requirements appear in the unsatisfied list, type resolution diagnoses a series of errors at the \index{source location}source location of the generic type representation, one for each unsatisfied requirement. Requirements on the failed list are dropped, because another diagnostic will have been emitted, as explained previously. If at least one requirement was unsatisfied or failed, the resolved type becomes the error type; this enforces the invariant that clients of type resolution will not encounter any types that do not satisfy generic requirements---with the important exception of the \index{error type}error type itself! \paragraph{There's more.} Other places where we use \AlgRef{check generic arguments algorithm}: \begin{enumerate} \item When \index{conformance checker}checking the declaration of a \index{normal conformance}normal conformance $\NormalConf$ where $\tXd$ is the \index{declared interface type!nominal type declaration}declared interface type of some nominal type declaration~$d$, we must decide if the given set of type witnesses satisfy the associated requirements of \tP. In other words, we take the \index{protocol substitution map}protocol substitution map $\Sigma_{\TP}$, and apply it to each associated requirement of \tP\ (\SecRef{requirement sig}). -\item When a concrete type \tX\ conforms to a protocol \tP\ via a \index{conditional conformance}conditional conformance, we check if the \index{context substitution map!conditional conformance}context substitution map of \tX\ satisfies the conditional requirements of $\XP$. We will describe this in \SecRef{conditional conformance}. +\item When a concrete type \tX\ conforms to a protocol \tP\ via a \index{conditional conformance}conditional conformance, we check if the \index{context substitution map!conditional conformance}context substitution map of \tX\ satisfies the conditional requirements of $\XP$. We will describe this in \SecRef{sec:conditional conformances}. 
\item The conditional requirements of a conditional conformance are also computed via the same algorithm when the declaration of the conformance is type checked. We ask which requirements in the generic signature of the constrained extension are \emph{not} satisfied by the generic signature of the \index{extended type}extended type. -\item Checking if a subclass method is a well-formed override of a superclass method asks whether the generic signature of the subclass method satisfies each requirement of the generic signature of the superclass method (\ChapRef{building generic signatures}). +\item Checking if a subclass method is a well-formed override of a superclass method asks whether the generic signature of the subclass method satisfies each requirement of the generic signature of the superclass method (\ChapRef{chap:building generic signatures}). \end{enumerate} There are also two related problems which follow different code paths but reason about requirements in the same way as above: \begin{enumerate} -\item The expression type checker translates generic requirements to constraints when type checking a reference to a generic function; these constraints are then solved by the constraint solver and a substitution map is formed for the call. This is entirely analogous to what happens in type resolution when referencing a generic type declaration. +\item The \index{expression type checker}expression type checker translates generic requirements to constraints when type checking a reference to a generic function; these constraints are then solved by the constraint solver and a substitution map is formed for the call. This is entirely analogous to what happens in type resolution when referencing a generic type declaration. \item Requirement inference is the step in building a new generic signature where we \emph{add} requirements to ensure that certain substituted requirements will be satisfied (\SecRef{requirementinference}). 
\end{enumerate} \section{Unbound Generic Types}\label{unbound generic types} -Introduced in \SecRef{misc types}, the \index{unbound generic type}\emph{unbound generic type} represents a reference to a generic type declaration without generic arguments, and a \index{placeholder type}\emph{placeholder type} represents a specific missing generic argument. +Introduced in \SecRef{sec:special types}, the \index{unbound generic type}\emph{unbound generic type} represents a reference to a generic type declaration without generic arguments, and a \index{placeholder type}\emph{placeholder type} represents a specific missing generic argument. Unbound generic types and placeholder types only appear when permitted by the \index{type resolution context}type resolution context. These contexts are those syntactic positions where missing generic arguments can be filled in by some other mechanism, for example by using the expression type checker to infer the type of an expression. @@ -792,12 +792,12 @@ \section{Unbound Generic Types}\label{unbound generic types} \item The \index{extended type}extended type of an \index{extension declaration}extension is typically written as an unbound generic type: \begin{Verbatim} -struct GenericType { ... } -extension GenericType { ... } +struct GenericType {...} +extension GenericType {...} \end{Verbatim} We will see in \SecRef{constrained extensions} that writing generic arguments for the extended type also has meaning, as a shorthand for a series of same-type requirements. Placeholder types cannot appear here. -\item The \index{underlying type}underlying type of a \index{type alias declaration}type alias may contain an unbound generic type (but not a placeholder type). 
This is a shorthand for a generic type alias that forwards its generic arguments, so the below are equivalent, given \texttt{GenericType} as above: +\item The \Index{underlying type!of type alias declaration}underlying type of a \index{type alias declaration}type alias may contain an unbound generic type (but not a placeholder type). This is a shorthand for a generic type alias that forwards its generic arguments, so the below are equivalent, given \texttt{GenericType} as above: \begin{Verbatim} typealias GenericAlias = GenericType typealias GenericAlias<T> = GenericType<T> @@ -810,7 +810,7 @@ \section{Unbound Generic Types}\label{unbound generic types} \end{enumerate} Notice how the first two contexts allow any type representation to resolve to an unbound generic type even if it appears in a nested position, while in the last two, only the topmost type representation can resolve to an unbound generic type. -\paragraph{A limitation.} An unbound generic type can refer to either a nominal type declaration, or a type alias declaration. However, an unbound generic type referring to a type alias declaration cannot be the parent type of a nominal type.
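The type alias shorthand from the list above can be exercised directly; a minimal sketch, assuming a generic \texttt{GenericType} as in the text:

```swift
struct GenericType<T> {
    var value: T
}

// An unbound generic type as the underlying type of an alias is
// shorthand for a generic alias that forwards its arguments, i.e.
// "typealias GenericAlias<T> = GenericType<T>".
typealias GenericAlias = GenericType

let g = GenericAlias<Int>(value: 42)
```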
It is an error to access a member type of a generic type alias without providing generic arguments, \index{limitation!generic type alias}even when an unbound generic type referencing a nominal type declaration can appear in the same position: \begin{Verbatim} struct GenericType { struct Nested { @@ -829,7 +829,7 @@ \section{Unbound Generic Types}\label{unbound generic types} \paragraph{A future direction.} If we think of type representations as syntactic, and types as semantic, unbound generic types occupy a weird point in between. They are produced by type resolution and refer to type declarations, but they do not actually survive type checking. When all is said and done, generic arguments in expressions will either be resolved or replaced with error types. We could eliminate unbound generic types from the implementation by reworking type resolution to take a callback that fills in missing generic arguments. Each context where an unbound generic type or placeholder type can appear would supply its own callback. For example, the expression type checker would provide a callback that returns a fresh \index{type variable type}type variable type. This callback model is partially implemented today, but all existing callbacks are trivial; the callback's presence simply communicates to type resolution that an unbound generic type is permitted in this position. -\section{Source Code Reference}\label{type resolution source ref} +\section{Source Code Reference}\label{src:type resolution} Key source files: \begin{itemize} @@ -853,7 +853,7 @@ \section{Source Code Reference}\label{type resolution source ref} The \IndexSource{type resolution context}type resolution context, which encodes the position of a type representation: \begin{itemize} \item \texttt{TypeResolverContext::None}: no special type handling is required. -\item \texttt{TypeResolverContext::GenericArgument}: generic arguments of a bound generic type.
+\item \texttt{TypeResolverContext::GenericArgument}: generic arguments of a generic nominal type. \item \texttt{TypeResolverContext::ProtocolGenericArgument}: generic arguments of a parameterized protocol type. \item \texttt{TypeResolverContext::TupleElement}: elements of a tuple type. \item \texttt{TypeResolverContext::AbstractFunctionDecl}: the base context of a function declaration's parameter list. @@ -872,12 +872,12 @@ \section{Source Code Reference}\label{type resolution source ref} \item \texttt{TypeResolverContext::EnumPatternPayload}: the payload type of an enum element pattern. Tweaks the behavior of tuple element labels. \item \texttt{TypeResolverContext::TypeAliasDecl}: the underlying type of a non-generic type alias. \item \texttt{TypeResolverContext::GenericTypeAliasDecl}: the underlying type of a generic type alias. -\item \texttt{TypeResolverContext::ExistentialConstraint}: the constraint type of an existential type (\ChapRef{existentialtypes}); +\item \texttt{TypeResolverContext::ExistentialConstraint}: the constraint type of an existential type (\ChapRef{chap:existential types}). \item \texttt{TypeResolverContext::GenericRequirement}: the constraint type of a conformance requirement in a \texttt{where} clause. \item \texttt{TypeResolverContext::SameTypeRequirement}: the subject type or constraint type of a same-type requirement in a \texttt{where} clause. \item \texttt{TypeResolverContext::ProtocolMetatypeBase}: the instance type of a protocol metatype, like \texttt{P.Protocol}. \item \texttt{TypeResolverContext::MetatypeBase}: the base type of a concrete metatype, like \texttt{T.Type}. -\item \texttt{TypeResolverContext::ImmediateOptionalTypeArgument}: the argument of an optional type, like \texttt{T?}. This just tailors some diagnostics.
\item \texttt{TypeResolverContext::EditorPlaceholderExpr}: the type of an editor placeholder. \item \texttt{TypeResolverContext::Inherited}: the inheritance clause of a concrete type. \item \texttt{TypeResolverContext::GenericParameterInherited}: the inheritance clause of a generic parameter. @@ -952,7 +952,7 @@ \subsection*{Type Representations} Type representations store a source location and kind: \begin{itemize} -\item \texttt{getLoc()}, \texttt{getSourceRange()} returns the source location and source range of this type representation. +\item \texttt{getLoc()}, \texttt{getSourceRange()} returns the \IndexSource{source location}source location and \IndexSource{source range}source range of this type representation. \item \texttt{getKind()} returns the \texttt{TypeReprKind}. \end{itemize} Each \texttt{TypeReprKind} corresponds to a subclass of \texttt{TypeRepr}. Instances of subclasses support safe downcasting via the \verb|isa<>|, \verb|cast<>| and \verb|dyn_cast<>| template functions, @@ -1055,14 +1055,14 @@ \subsection*{Applying Generic Arguments} Factor of \texttt{applyGenericArguments()} to build the substitution map from the base type and check requirements of \IndexSource{contextually-generic declaration}contextually-generic declarations. \apiref{Requirement}{class} -See also \SecRef{genericsigsourceref}. Requirement substitution: +See also \SecRef{src:generic signatures}. Requirement substitution: \begin{itemize} \item \texttt{subst()} applies a substitution map to this \IndexSource{requirement}requirement, returning the \IndexSource{substituted requirement}substituted requirement. \end{itemize} Checking requirements: \begin{itemize} -\item \texttt{checkRequirement()} answers if a single requirement is satisfied, implementing \AlgRef{reqissatisfied}. Any conditional requirements are returned via a \texttt{SmallVector} supplied by the caller, who must then check these requirements recursively. 
-\item \texttt{checkRequirements()} checks each requirement in an array; if any have their own conditional requirements, those are checked too. This implements \AlgRef{check generic arguments algorithm} for when the caller needs to check a condition without emitting diagnostics. For example, this is used by \texttt{checkConformance()} to check conditional requirements in \SecRef{extensionssourceref}. +\item \texttt{checkRequirement()} answers if a single requirement is satisfied, implementing \AlgRef{reqissatisfied}. Any \IndexSource{conditional requirement}conditional requirements are returned via a \texttt{SmallVector} supplied by the caller, who must then check these requirements recursively. +\item \texttt{checkRequirements()} checks each requirement in an array; if any have their own conditional requirements, those are checked too. This implements \AlgRef{check generic arguments algorithm} for when the caller needs to check a condition without emitting diagnostics. For example, this is used by \texttt{checkConformance()} to check conditional requirements in \SecRef{src:extensions}. \end{itemize} \apiref{checkGenericArgumentsForDiagnostics()}{function} diff --git a/docs/Generics/chapters/type-substitution-summary.tex b/docs/Generics/chapters/type-substitution-summary.tex index 2a70c10e7e093..3115218428e46 100644 --- a/docs/Generics/chapters/type-substitution-summary.tex +++ b/docs/Generics/chapters/type-substitution-summary.tex @@ -4,7 +4,7 @@ \chapter{Substitution Algebra}\label{notation summary} -This is a \index{$\otimes$}summary of the algebra we used to describe various operations on types, substitution maps, and conformances. See \ChapRef{substmaps}, \ChapRef{conformances}, and \ChapRef{conformance paths}. +This is a \index{$\otimes$}summary of our algebraic notation for various operations on types, substitution maps, and conformances. Details in \ChapRef{chap:substitution maps}, \ChapRef{chap:conformances}, and \ChapRef{conformance paths}. 
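To ground the conditional-requirement checking performed by \texttt{checkRequirement()} and \texttt{checkRequirements()}, here is a small sketch in plain Swift (the \texttt{Box} type is our own invention, not compiler code): using \texttt{Box<Int>:\ Equatable} obliges the compiler to recursively check the conditional requirement \texttt{Int:\ Equatable}.

```swift
// A conditional conformance: Box<T>: Equatable only holds when the
// conditional requirement T: Equatable is itself satisfied.
struct Box<T> { var value: T }

extension Box: Equatable where T: Equatable {
    static func == (lhs: Box, rhs: Box) -> Bool { lhs.value == rhs.value }
}

// Box<Int>: Equatable holds because Int: Equatable is satisfied; a type
// like Box<() -> Void> would fail the conditional requirement check.
print(Box(value: 1) == Box(value: 1))  // prints "true"
```

When the conformance is used with a substituted type that does not satisfy the conditional requirement, the compiler diagnoses the failure at the use site.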
\begin{tabbing} XXXXXXXX \= XXXXXXXXXXXXXXX \= XXXX \= XXXXXXX \= \kill @@ -17,13 +17,13 @@ \chapter{Substitution Algebra}\label{notation summary} \IndexSet{sub}{\SubMapObj{G}{H}}$\SubMapObj{G}{H}$ \> \index{substitution map!summary}Substitution maps with \index{input generic signature!summary}input signature $G$ and \index{output generic signature!summary}output signature $H$\\ \IndexSet{conf}{\ConfObj{G}}$\ConfObj{G}$ \> \index{conformance}Conformances with output generic signature $G$\\ \IndexSet{req}{\ReqObj{G}}$\ReqObj{G}$ \> \index{requirement}Requirements containing interface types of $G$\\[\bigskipamount] -$\TypeObj{G}\otimes\SubMapObj{G}{H}\rightarrow\TypeObj{H}$ \> \> \index{type substitution!summary}Type substitution \` \ChapRef{substmaps}\\ -$\SubMapObj{G}{H}\otimes\SubMapObj{H}{I}\rightarrow\SubMapObj{G}{I}$ \> \> \index{substitution map composition!summary}Substitution map composition \` \SecRef{submapcomposition}\\ +$\TypeObj{G}\otimes\SubMapObj{G}{H}\rightarrow\TypeObj{H}$ \> \> \index{type substitution!summary}Type substitution \` \ChapRef{chap:substitution maps}\\ +$\SubMapObj{G}{H}\otimes\SubMapObj{H}{I}\rightarrow\SubMapObj{G}{I}$ \> \> \index{substitution map composition!summary}Substitution map composition \` \SecRef{sec:composition}\\ $\ConfObj{G}\otimes\SubMapObj{G}{H}\rightarrow\ConfObj{H}$ \> \> \index{conformance substitution map!summary}Conformance substitution \` \SecRef{conformance subst}\\ $\ReqObj{G}\otimes\SubMapObj{G}{H}\rightarrow\ReqObj{H}$ \> \> \index{substituted requirement!summary}Requirement substitution \` \SecRef{checking generic arguments} \end{tabbing} -\paragraph{Substitution maps.} A substitution map $\Sigma\in\SubMapObj{G}{H}$ consists of an array of \index{replacement type!summary}types from $\TypeObj{H}$, and an array of \index{root conformance!summary}conformances from $\ConfObj{H}$. 
If \tX\ is a generic nominal type, then $\tX=\tXd\otimes\Sigma$ for some nominal type declaration~$d$ and substitution map~$\Sigma$ (\SecRef{contextsubstmap}). +\paragraph{Substitution maps.} A substitution map $\Sigma\in\SubMapObj{G}{H}$ contains an array of \index{replacement type!summary}interface types from $\TypeObj{H}$, and an array of \index{root conformance!summary}conformances from $\ConfObj{H}$. \begin{tabbing} XXXXXXXXXXXXXXXXXXXX \= \kill @@ -38,41 +38,42 @@ \chapter{Substitution Algebra}\label{notation summary} \qquad $\XP \otimes 1_G = \XP$ \> for all $\XP\in\ConfObj{G}$\\ \qquad $1_G \otimes \Sigma = \Sigma \otimes 1_H = \Sigma$ \> for all $\Sigma\in\SubMapObj{G}{H}$ \end{tabbing} +Every generic nominal type \tX\ can be written as $\tX=\tXd\otimes\Sigma$ for some nominal type declaration~$d$ and substitution map~$\Sigma$ (\SecRef{contextsubstmap}). -\paragraph{Conformances.} A normal conformance $\NormalConf$ declares a series of \index{type witness!summary}type witnesses and \index{associated conformance!summary}associated conformances. If $\tX=\tXd\otimes\Sigma$, then $\XP$ is a specialized conformance formed from the normal conformance $\NormalConf$ with substitution map $\Sigma$. +\paragraph{Conformances.} A normal conformance $\NormalConf$ declares a series of \index{type witness!summary}type witnesses and \index{associated conformance!summary}associated conformances. If $\tX=\tXd\otimes\Sigma$, then $\XP=\NormalConf \otimes \Sigma$ is a specialized conformance formed from the normal conformance $\NormalConf$ and substitution map $\Sigma$.
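The interaction of type substitution with substitution map composition, $(\tT \otimes \Sigma_1) \otimes \Sigma_2 = \tT \otimes (\Sigma_1 \otimes \Sigma_2)$, can be exercised in a deliberately simplified toy model (the names and representation below are ours, not the compiler's): a "type" is a generic parameter or a nominal type with arguments, and a substitution map is a dictionary whose domain covers the input signature's generic parameters.

```swift
// A toy model of the substitution algebra.
indirect enum Ty: Equatable {
    case param(String)           // a generic parameter, e.g. Element
    case nominal(String, [Ty])   // a nominal type, e.g. Array<Element>
}

typealias SubMap = [String: Ty]

// Type substitution T ⊗ Σ: replace each generic parameter with its
// replacement type, recursing into structural components.
func subst(_ t: Ty, _ sigma: SubMap) -> Ty {
    switch t {
    case .param(let name):
        return sigma[name] ?? t
    case .nominal(let name, let args):
        return .nominal(name, args.map { subst($0, sigma) })
    }
}

// Composition Σ1 ⊗ Σ2: apply Σ2 to each replacement type of Σ1.
func compose(_ s1: SubMap, _ s2: SubMap) -> SubMap {
    s1.mapValues { subst($0, s2) }
}

let arrayOfElement = Ty.nominal("Array", [.param("Element")])
let sigma1: SubMap = ["Element": .param("T")]
let sigma2: SubMap = ["T": .nominal("Int", [])]

// Substituting twice agrees with substituting by the composition.
let lhs = subst(subst(arrayOfElement, sigma1), sigma2)
let rhs = subst(arrayOfElement, compose(sigma1, sigma2))
print(lhs == rhs)  // prints "true"
```

The model elides everything about conformances and signatures, but it shows why composition is defined by applying the second map to the replacement types of the first.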
\begin{tabbing} XXXXXXXXX \= XXXXXXXXXXXXXXX \= X \= XXXXXXXXX \= \kill \IndexSet{proto}{\ProtoObj}$\ProtoObj$ \> \index{protocol declaration!summary}All protocol declarations\\ -\IndexSet{assoctype}{\AssocTypeObj{P}}$\AssocTypeObj{P}$ \> \index{associated type declaration!summary}Associated types declared in \tP\\ -\IndexSet{assocconf}{\AssocConfObj{P}}$\AssocConfObj{P}$ \> \index{associated conformance requirement!summary}Associated conformance requirements declared in \tP\\ -\IndexSet{confp}{\ConfPObj{P}{G}}$\ConfPObj{P}{G}$ \> Conformances to \tP\ with output generic signature $G$\\[\medskipamount] -$\PP$ \> A protocol\\ -$\AssocType{P}{A}$ \> An associated type declaration\\ -$\SelfUQ$ \> An associated conformance requirement\\ -\texttt{T.[P]A} \> Dependent member type \\ +\IndexSet{assoctype}{\AssocTypeObj{P}}$\AssocTypeObj{P}$ \> \index{associated type declaration!summary}All associated type declarations of $\tP\in\ProtoObj$\\ +\IndexSet{assocconf}{\AssocConfObj{P}}$\AssocConfObj{P}$ \> \index{associated conformance requirement!summary}All associated conformance requirements of $\tP\in\ProtoObj$\\ +\IndexSet{confp}{\ConfPObj{P}{G}}$\ConfPObj{P}{G}$ \> The set of all $\XP\in\ConfObj{G}$ for a fixed $\tP\in\ProtoObj$\\[\medskipamount] +$\PP$ \> Protocol declaration named \tP\\ +$\AssocType{P}{A}$ \> Associated type declaration named \nA\ in protocol \tP\\ +$\SelfUQ$ \> Associated conformance requirement\\ +\texttt{T.[P]A} \> Dependent member type with base type \tT\ and associated type \nA\ of \tP\\ $\Sigma_{\XP}$ \> \index{protocol substitution map!summary}Protocol substitution map $\SubstMapC{\SubstType{\rT}{X}}{\SubstConf{\rT}{X}{P}}$\\[\bigskipamount] $\ProtoObj\otimes\TypeObj{G}\rightarrow\ConfObj{G}$ \> \> \index{global conformance lookup!summary}Global conformance lookup \` \SecRef{conformance lookup}\\ $\AssocTypeObj{P}\otimes\ConfPObj{P}{G}\rightarrow\TypeObj{G}$ \> \> \index{type witness!summary}Type witness projection \` \SecRef{type
witnesses}\\ $\AssocConfObj{P}\otimes\ConfPObj{P}{G}\rightarrow\ConfObj{G}$ \> \> \index{associated conformance projection!summary}Associated conformance proj. \` \SecRef{associated conformances}\\[\bigskipamount] Global conformance lookup:\\ -\qquad $\PP \otimes \tXd := \NormalConf$\\ -\qquad $\PP \otimes (\tXd \otimes \Sigma) := \NormalConf \otimes \Sigma$\\ -\qquad $\PP \otimes \tT := \TP$\\ +\qquad $\PP \otimes \tXd := \NormalConf$\` normal\\ +\qquad $\PP \otimes (\tXd \otimes \Sigma) := \NormalConf \otimes \Sigma$ \` specialized\\ +\qquad $\PP \otimes \tT := \TP$ \` abstract\\ \qquad $(\PP \otimes \tT) \otimes \Sigma = \PP \otimes (\tT \otimes \Sigma)$ \` \SecRef{abstract conformances}\\[\medskipamount] Specialized conformance substitution:\\ -\qquad $(\tXd \otimes \Sigma_1) \otimes \Sigma_2 := \tXd \otimes (\Sigma_1 \otimes \Sigma_2)$\\[\medskipamount] +\qquad $(\NormalConf \otimes \Sigma_1) \otimes \Sigma_2 := \NormalConf \otimes (\Sigma_1 \otimes \Sigma_2)$\\[\medskipamount] For each $\APA \in \AssocTypeObj{P}$:\\ -\qquad $\APA\otimes \NormalConf := \text{declared in source}$\\ -\qquad $\APA\otimes (\NormalConf\otimes \Sigma) := (\APA\otimes \NormalConf) \otimes \Sigma$\\ -\qquad $\AssocType{P}{A} \otimes \TP := \texttt{T.[P]A}$ \` \SecRef{abstract conformances}\\[\medskipamount] +\qquad $\APA\otimes \NormalConf := \text{declared in source}$ \` normal\\ +\qquad $\APA\otimes (\NormalConf\otimes \Sigma) := (\APA\otimes \NormalConf) \otimes \Sigma$ \` specialized\\ +\qquad $\AssocType{P}{A} \otimes \TP := \texttt{T.[P]A}$ \` \SecRef{abstract conformances} \` abstract\\[\medskipamount] For each $\SelfUQ \in \AssocConfObj{P}$:\\ -\qquad $\SelfUQ\otimes \NormalConf := \PQ \otimes \SelfU \otimes \Sigma_{\NormalConf}$\\ -\qquad $\SelfUQ\otimes (\NormalConf \otimes \Sigma) := (\SelfUQ \otimes \NormalConf) \otimes \Sigma$\\ -\qquad $\SelfUQ \otimes \TP := \ConfReq{T.U}{Q}$\\[\medskipamount] -Dependent member types:\\ +\qquad $\SelfUQ\otimes \NormalConf := \PQ \otimes 
\SelfU \otimes \Sigma_{\NormalConf}$ \` normal\\ +\qquad $\SelfUQ\otimes (\NormalConf \otimes \Sigma) := (\SelfUQ \otimes \NormalConf) \otimes \Sigma$ \` specialized\\ +\qquad $\SelfUQ \otimes \TP := \ConfReq{T.U}{Q}$ \` abstract\\[\medskipamount] +Dependent member type substitution:\\ \qquad $\texttt{T.[P]A} \otimes \Sigma := \AssocType{P}{A} \otimes (\TP \otimes \Sigma)$ \` \SecRef{abstract conformances}\\[\medskipamount] -Local conformance lookup:\\ +Local conformance lookup using a conformance path:\\ \qquad $\TP \otimes \Sigma := \AssocConf{Self.$\texttt{U}_n$}{$\texttt{P}_n$} \otimes diff --git a/docs/Generics/chapters/types.tex b/docs/Generics/chapters/types.tex index d06b9976913cc..9b07f758a325a 100644 --- a/docs/Generics/chapters/types.tex +++ b/docs/Generics/chapters/types.tex @@ -2,9 +2,9 @@ \begin{document} -\chapter{Types}\label{types} +\chapter{Types}\label{chap:types} -\lettrine{R}{easoning about types} is a central concern in the implementation of a statically typed language. In Swift, various syntactic forms such as \texttt{Int}, \texttt{Array} and \texttt{(Bool) -> ()} denote references to types. A \index{type representation}\emph{type representation} is the syntactic form of a type annotation written in source, as constructed by the parser. A \IndexDefinition{type}\emph{type} is a higher-level semantic object. Types are constructed from type representations by \index{type}\emph{type resolution}. They can also be built and taken apart directly. +\lettrine{R}{easoning about types} is a central concern in the implementation of a statically typed language. In Swift, various syntactic forms such as \texttt{Int}, \texttt{Array}, and \texttt{(Bool) -> ()} denote references to types. A \index{type representation}\emph{type representation} is the syntactic form of a type annotation written in source, as constructed by the parser. A \IndexDefinition{type}\emph{type} is a higher-level semantic object. 
Types are constructed from type representations by \emph{type resolution}. They can also be built and taken apart directly. \medskip @@ -64,7 +64,7 @@ \chapter{Types}\label{types} \end{tikzpicture} \end{wrapfigure} -Type resolution calls upon name lookup to find type declarations, here \texttt{Array} and \texttt{Int}, and validates the generic argument, to produce a semantic type object. +To resolve this type representation, we first use name lookup to find the type declarations for \texttt{Array} and \texttt{Int}. From these, we form the generic nominal type \texttt{Array<Int>}. The generic nominal type \texttt{Array<Int>} points at the \texttt{Array} type declaration, and contains a child node for its generic argument, which is the type \texttt{Int}. The latter type also points at the declaration of \texttt{Int}, and not just an identifier. @@ -144,19 +144,19 @@ \chapter{Types}\label{types} \end{minipage} \medskip -Outside of type resolution, type representations do not play a big role in the compiler, so we punt on the topic of type representations until \ChapRef{typeresolution} and just focus on types for now. For our current purposes, it suffices to say that type resolution is really just one possible mechanism by which types are constructed. The expression checker builds types by solving a constraint system, and the generics system builds types via substitution, to give two examples. +Outside of type resolution, type representations do not play a big role in the compiler, so we punt on the topic of type representations until \ChapRef{chap:type resolution} and just focus on types for now. For our current purposes, it suffices to say that type resolution is really just one possible mechanism by which types are constructed. The \index{expression type checker}expression checker builds types by solving a constraint system, and the generics system builds types via substitution, to give two examples.
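Distinct type representations can resolve to one and the same semantic type, which is observable from the language itself; the following assertions use only standard Swift.

```swift
// Different spellings, one semantic type: sugared forms are
// indistinguishable from their desugarings at runtime.
assert([Int].self == Array<Int>.self)
assert([Int?].self == Array<Optional<Int>>.self)
assert([String: Int].self == Dictionary<String, Int>.self)
print("sugared spellings denote the same type")
```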
\paragraph{Structural components.} A type is constructed from structural components, which may either be other types, or non-type information. Common examples include: nominal types, which consist of a pointer to a declaration, together with a list of \index{generic argument}generic argument types; \index{tuple type}tuple types, which have element types and labels; and \index{function type}function types, which contain parameter types, return types, and various additional bits like \texttt{@escaping} and \texttt{inout}. We will give a full accounting of all type kinds and their structural components in the second half of this chapter. Once created, types are immutable. To say that a type \emph{contains} another type means that the latter appears as a structural component of the former, perhaps nested several levels deep. We will often talk about \emph{replacing} a type contained by another type. This is understood as constructing a new type with the same kind as the original type, preserving all structural components except for the one being replaced. The original type is never mutated directly. -More generally, types can be transformed by taking the type apart by kind, recursively transforming each structural component, and forming a new type of the same kind from the new components. To preview \ChapRef{substmaps}, if \texttt{Element} is a generic parameter type, the type \texttt{Array<Int>} can be formed from \texttt{Array<Element>} by replacing \texttt{Element} with \texttt{Int}; this is called \emph{type substitution}. The compiler provides various utilities to simplify the task of implementing recursive walks and transformations over kinds of types; type substitution is one example of such a transformation. +More generally, types can be transformed by taking the type apart by kind, recursively transforming each structural component, and forming a new type of the same kind from the new components.
To preview \ChapRef{chap:substitution maps}, if \texttt{Element} is a generic parameter type, the type \texttt{Array<Int>} can be formed from \texttt{Array<Element>} by replacing \texttt{Element} with \texttt{Int}; this is called \emph{type substitution}. The compiler provides various utilities to simplify the task of implementing recursive walks and transformations over kinds of types; type substitution is one example of such a transformation. \paragraph{Canonical types.} It is possible for two types to differ in their spelling, and yet be equivalent semantically: \begin{itemize} \item The Swift language defines some shorthands for common types, such as \texttt{T?} for \texttt{Optional<T>}, \texttt{[T]} for \texttt{Array<T>}, and \texttt{[K:\ V]} for \texttt{Dictionary<K,\ V>}. -\item \index{type alias declaration}Type alias declarations introduce a new name for some existing underlying type, equivalent to writing out the \index{underlying type}underlying type in place of the type alias. The standard library, for example, declares a type alias \IndexDefinition{Void type@\texttt{Void} type}\texttt{Void} with underlying type \texttt{()}. +\item \index{type alias declaration}Type alias declarations introduce a new name for some existing underlying type, equivalent to writing out the \Index{underlying type!of type alias declaration}underlying type in place of the type alias. The standard library, for example, declares a type alias \IndexDefinition{Void type@\texttt{Void} type}\texttt{Void} with underlying type \texttt{()}.
+\item Another form of fiction along these lines is the preservation of generic parameter names. \index{generic parameter type!type sugar}Generic parameter types written in source have a name, like ``\texttt{Element},'' and should be printed back as such in diagnostics, but internally they are uniquely identified in their generic signature by a pair of integers, the \index{depth}depth and the \index{index}index. This is detailed in \SecRef{generic params}. \end{itemize} These constructions are the so-called \IndexDefinition{sugared type}\emph{sugared types}. A sugared type has a desugaring into a more primitive form in terms of its structural components. The compiler constructs type sugar in \index{type resolution}type resolution, and attempts to preserve it as much as possible when transforming types. Preserving sugar in diagnostics can be especially helpful with more complex type aliases and such. @@ -164,7 +164,7 @@ \chapter{Types}\label{types} The compiler can transform an arbitrary type into a canonical type by the process of \emph{canonicalization}, which recursively replaces sugared types with their desugared form; in this way, \texttt{[(Int?, Void)]} becomes \verb|Array<(Optional<Int>, ())>|. This operation is very cheap; each type caches a pointer to its canonical type, which is computed as needed (so types are not completely immutable, as we said previously; but the mutability cannot be observed from outside). -One notable exception where the type checker does depend on type sugar is the rule for default initialization of variables: if the variable's type is declared as the \index{optional sugared type}sugared optional type \texttt{T?} for some \tT, the variable's \index{initial value expression}initial value \index{expression}expression is assumed to be \texttt{nil} if none was provided.
Spelling the type as \texttt{Optional<Int>} avoids the default initialization behavior: +One notable case where the type checker's behavior \emph{does} depend on type sugar is with default initialization of variables: if a variable's type is declared as the \index{optional sugared type}sugared optional type \texttt{T?} for some \tT, the variable's \index{initial value expression}initial value \index{expression}expression is assumed to be \texttt{nil} if none was provided. Spelling the type as \texttt{Optional<Int>} avoids the default initialization behavior: \begin{Verbatim} var x: Int?
Thus, all four of the original types are equal under reduced type equality. \end{itemize} -Reduced type equality means ``equivalent as a consequence of one or more same-type requirements.'' We will define this equivalence of type parameters in \SecRef{valid type params} using the derived requirements formalism, and then generalize to all interface types in \SecRef{genericsigqueries}. Presenting a computable algorithm for reduced type equality is one of our main results in this book; key developments take place in \SecRef{rewritesystemintro} and \ChapRef{symbols terms rules}. +Reduced type equality means ``equivalent as a consequence of one or more same-type requirements.'' We will define this equivalence of type parameters in \SecRef{valid type params} using the derived requirements formalism, and then generalize to all interface types in \SecRef{genericsigqueries}. Presenting a decision procedure for reduced type equality is one of our main results in this book; key developments take place in \SecRef{rewritesystemintro} and \ChapRef{chap:symbols terms rules}. \section{Fundamental Types}\label{fundamental types} @@ -345,7 +345,7 @@ \section{Fundamental Types}\label{fundamental types} \end{center} \end{figure} -\paragraph{Generic parameter types.} A \IndexDefinition{generic parameter type}generic parameter type abstracts over a generic argument provided by the caller. Generic parameter types are declared by \index{generic parameter declaration}generic parameter declarations, described in \SecRef{generic params}. The sugared form references the declaration, and prints as the declaration's name; the canonical form only stores a depth and an index. Care must be taken not to print canonical generic parameter types in \index{diagnostic!printing generic parameter type}diagnostics, to avoid surfacing the ``\ttgp{1}{2}'' notation to the user. (We will show how to transform a canonical generic parameter type into its sugared form at the end of \SecRef{genericsigsourceref}.) 
+\paragraph{Generic parameter types.} A \IndexDefinition{generic parameter type}generic parameter type abstracts over a generic argument provided by the caller. Generic parameter types are declared by \index{generic parameter declaration}generic parameter declarations, described in \SecRef{generic params}. The sugared form references the declaration, and prints as the declaration's name; the canonical form only stores a depth and an index. Care must be taken not to print canonical generic parameter types in \index{diagnostic!printing generic parameter type}diagnostics, to avoid surfacing the ``\ttgp{1}{2}'' notation to the user. (We will show how to transform a canonical generic parameter type into its sugared form at the end of \SecRef{src:generic signatures}.) \paragraph{Dependent member types.} A \IndexDefinition{dependent member type}dependent member type abstracts over a concrete type that fulfills an associated type requirement. It has two structural components: @@ -354,32 +354,33 @@ \section{Fundamental Types}\label{fundamental types} \item An \index{identifier!dependent member type}identifier (in which case this is an \IndexDefinition{unbound dependent member type}\emph{unbound} dependent member type), or an \index{associated type declaration!dependent member type}associated type declaration (in which case it is \IndexDefinition{bound dependent member type}\emph{bound}). \end{itemize} -In \ChapRef{typeresolution}, we describe the two stages of type resolution. Unbound dependent member types appear in the \index{structural resolution stage}structural resolution stage, when we resolve the requirements in the \texttt{where} clause to feed into the generic signature construction procedure. Once we have a generic signature, we move on to \index{interface resolution stage}interface resolution stage, and dependent member types written elsewhere are fully resolved into their bound form. 
+In \ChapRef{chap:type resolution}, we describe the two stages of type resolution. Unbound dependent member types appear in the \index{structural resolution stage}structural resolution stage, when we resolve the requirements in the \texttt{where} clause to feed into the generic signature construction procedure. Once we have a generic signature, we move on to the \index{interface resolution stage}interface resolution stage, and dependent member types written elsewhere are fully resolved into their bound form.
Key topics include: \begin{itemize} \item Semantic validity of type parameters (\SecRef{derived req}, \SecRef{valid type params}). \item Generic signature queries (\SecRef{genericsigqueries}). \item Dependent member type substitution (\SecRef{abstract conformances}, \ChapRef{conformance paths}). -\item Type resolution with bound and unbound type parameters (\ChapRef{typeresolution}). +\item Type resolution with bound and unbound type parameters (\ChapRef{chap:type resolution}). \end{itemize} +Finally, a note about terminology. This usage of ``dependent'' comes from C++, and this is not a \index{dependent type}type dependent on a value in the \index{lambda cube}``lambda cube'' sense. \paragraph{Archetype types.} -Type parameters derive their meaning from the requirements of a generic signature; they are only ``names'' of external entities, in a sense. \IndexDefinition{archetype type}Archetypes are an alternate ``self-describing'' representation. Archetypes are instantiated from a \emph{generic environment}, which stores a generic signature together with other information (\ChapRef{genericenv}). +Type parameters derive their meaning from the requirements of a generic signature; they are only ``names'' of external entities, in a sense. \IndexDefinition{archetype type}Archetypes are an alternate ``self-describing'' representation. Archetypes are instantiated from a \emph{generic environment}, which stores a generic signature together with other information (\ChapRef{chap:archetypes}). A \index{contextual type}\emph{contextual type} is a type that might contain archetypes but is not necessarily an archetype itself. -Archetypes occur inside \index{expression}expressions and \index{SIL}SIL instructions. Archetypes also represent references to opaque return types (\SecRef{opaquearchetype}) and the type of the payload inside of an existential (\SecRef{open existential archetypes}). 
In \index{diagnostic!printing archetype type}diagnostics, an archetype is printed as the type parameter it represents. We will denote by $\archetype{T}$ the archetype for the type parameter \tT\ in some generic environment understood from context. A type that might contain archetypes but is not necessarily an archetype itself is called a \index{contextual type}\emph{contextual type}. +Archetypes occur inside \index{expression}expressions and \index{SIL}SIL instructions. Archetypes also represent references to opaque result types (\SecRef{opaquearchetype}) and the type of the payload inside of an existential (\SecRef{open existential archetypes}). In \index{diagnostic!printing archetype type}diagnostics, an archetype is printed as the type parameter it represents. We will denote by $\archetype{T}$ the archetype for the type parameter \tT\ in some generic environment understood from context. \medskip -The fundamental type kinds we surveyed above---nominal types, type parameters, and archetypes---are \textsl{the Swift types that can conform to protocols}. In other words, they can satisfy the left-hand side of a \index{conformance requirement!checking}conformance requirement, with the details for each type given later in \SecRef{conformance lookup}. Also important are \index{constraint type}\emph{constraint types}; these are \textsl{the types appearing on the right hand side of conformance requirements}. Constraint types themselves are never the types of value-producing expressions. (A type-erased value has an existential type, which wraps a constraint type, as we will see in the next section.) +The fundamental type kinds we surveyed above---nominal types, type parameters, and archetypes---are \textsl{the Swift types that can conform to protocols}. In other words, they can satisfy the left-hand side of a \index{conformance requirement!checking}conformance requirement, with the details for each type given later in \SecRef{conformance lookup}. 
Also important are \index{constraint type}\emph{constraint types}; these are \textsl{the types appearing on the right hand side of conformance requirements}. Constraint types themselves are never the types of value-producing expressions. (A value can have an existential type though, which wraps a constraint type, as we will see next.) \paragraph{Protocol types.} A protocol type is the most fundamental kind of constraint type; a conformance requirement involving any other kind of constraint type can always be split up into simpler conformance requirements. A \IndexDefinition{protocol type}protocol type is a kind of \index{nominal type}nominal type, so it will have a \index{parent type}parent type if the protocol declaration is nested inside of another nominal type declaration. Unlike other nominal types, protocols cannot be nested in generic contexts (\SecRef{nested nominal types}), so neither the protocol type itself nor any of its parents can have generic arguments. Thus, there is exactly one protocol type corresponding to each protocol declaration. \paragraph{Protocol composition types.} -A \IndexDefinition{protocol composition type}protocol composition type is a constraint type with a list of members. On the right hand side of a conformance requirement, protocol compositions \emph{decompose} into a series of requirements for each member of the composition (\SecRef{requirement desugaring}). 
The members can include protocol types, a class type (at most one), and the \Index{AnyObject@\texttt{AnyObject}}\texttt{AnyObject} layout constraint:
+A \IndexDefinition{protocol composition type}protocol composition type may contain any number of protocol types, and at most one class type or \Index{AnyObject@\texttt{AnyObject}}\texttt{AnyObject} layout constraint:
\begin{quote}
\begin{verbatim}
P & Q
@@ -387,13 +388,14 @@ \section{Fundamental Types}\label{fundamental types}
SomeClass & P
\end{verbatim}
\end{quote}
+The \index{empty protocol composition|see{\texttt{Any}}}empty protocol composition is spelled \texttt{Any}. If the right-hand side of a conformance requirement is a \IndexDefinition{protocol composition type}protocol composition type, we generate a series of simpler requirements, one for each member of the composition (\SecRef{requirement desugaring}).

\IndexSwift{2.2}Swift~2.2 and prior used the syntax ``\verb|protocol<P, Q>|''; \IndexSwift{3.0}Swift~3 introduced the modern spelling~\cite{se0095}. So now \texttt{Any} is a special case, but of course it used to be \IndexDefinition{Any@\texttt{Any}}``\verb|typealias Any = protocol<>|''.

\paragraph{Parameterized protocol types.}
A \index{constrained protocol type|see{parameterized protocol type}}\IndexDefinition{parameterized protocol type}parameterized protocol type stores a protocol type together with a list of generic arguments. As a constraint type, it expands into a conformance requirement together with one or more same-type requirements, for each of the protocol's \emph{primary associated types} (\SecRef{protocols}). The written representation looks just like a generic nominal type, except the named declaration is a protocol, for example, \texttt{Sequence<Int>}. Parameterized protocol types were introduced in \IndexSwift{5.7}Swift 5.7~\cite{se0346} (the evolution proposal calls them ``constrained protocol types'').
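For example, the parameterized protocol type \texttt{Sequence<Int>} can be used as a generic constraint. The following sketch (which assumes, as in the standard library, that \texttt{Element} is a primary associated type of \texttt{Sequence}) is equivalent to writing out the conformance requirement and same-type requirement by hand:
\begin{Verbatim}
// 'S: Sequence<Int>' desugars to 'S: Sequence, S.Element == Int'.
func sum<S: Sequence<Int>>(_ elements: S) -> Int {
  return elements.reduce(0, +)
}
\end{Verbatim}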
-\section{More Types}\label{more types} +\section{More Types}\label{sec:more types} -Now we will look at the various \IndexDefinition{structural type}\emph{structural types} which are part of the language. (Not to be confused with the types produced by the \emph{structural resolution stage}, which is discussed in \ChapRef{typeresolution}.) +Various \IndexDefinition{structural type}\emph{structural types} can also be formed from other types. (Not to be confused with the types produced by the \emph{structural resolution stage}, discussed in \ChapRef{chap:type resolution}.) \begin{wrapfigure}[15]{r}{6.5cm} \begin{center} @@ -413,9 +415,9 @@ \section{More Types}\label{more types} \paragraph{Existential types.} An \index{existential type}existential type has one structural component, the \emph{constraint type}. An existential value is a container holding a value of some unknown dynamic type that is known to satisfy the constraint; to the right we show the existential type \verb|any (P & Q)|, which stores a value conforming to both \tP\ and \tQ. -The \texttt{any} keyword was added in \IndexSwift{5.6}Swift~5.6~\cite{se0355}; in Swift releases prior, existential types and constraint types were the same concept in the language and implementation. (For the sake of source compatibility, a constraint type without the \texttt{any} keyword still resolves to an existential type except when it appears on the right-hand side of a conformance requirement.) +The \texttt{any} keyword was added in \IndexSwift{5.6}Swift~5.6~\cite{se0335}; in Swift releases prior, existential types and constraint types were the same concept in the language and implementation. (For the sake of source compatibility, a constraint type without the \texttt{any} keyword still resolves to an existential type except when it appears on the right-hand side of a conformance requirement.) -Existential types are covered in \ChapRef{existentialtypes}. +Existential types are covered in \ChapRef{chap:existential types}. 
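As a quick illustration (the protocols and struct here are hypothetical), a value of existential type has a static type wrapping the constraint type, while its dynamic type is the concrete type of the stored value:
\begin{Verbatim}
protocol P {}
protocol Q {}
struct S: P, Q {}

let value: any P & Q = S()  // static type 'any P & Q', dynamic type 'S'
\end{Verbatim}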
\begin{wrapfigure}[9]{r}{3cm} \begin{tikzpicture} @@ -426,7 +428,7 @@ \section{More Types}\label{more types} \end{tikzpicture} \end{wrapfigure} -\paragraph{Metatype types.} A type \tT\ can be the callee in a \index{call expression}call expression, \texttt{T(...)}; this is shorthand for a constructor call \texttt{T.init(...)}. It can serve as the base of a static method call, \texttt{T.foo(...)}, where the type is passed as the \texttt{self} parameter. Finally, it can be directly referenced by the expression \texttt{T.self}. In all cases, the type becomes a \emph{value}, and this value must itself be assigned a type; this type is called a \emph{metatype}. The metatype of a type \tT\ is written as \texttt{T.Type}. The type \tT\ is the \IndexDefinition{instance type}\emph{instance type} of the metatype. For example, the type of the expression ``\verb|Int.self|'' is the metatype \texttt{Int.Type}, whose instance type is \verb|Int|. Metatypes are sometimes referred to as \IndexDefinition{concrete metatype type}\emph{concrete metatypes}, to distinguish them from existential metatypes, which we introduce below. Most concrete metatypes are singleton types with one value, the instance type itself. One exception is that the class metatype for a non-final class also has all subclasses of the class as values. +\paragraph{Metatype types.} A type \tT\ can be the callee in a \index{call expression}call expression, \texttt{T(...)}; this is shorthand for a constructor call \texttt{T.init(...)}. It can serve as the base of a static method call, \texttt{T.foo(...)}, where the type is passed as the \texttt{self} parameter. Finally, it can be directly referenced by the expression \texttt{T.self}. In all cases, the type becomes a \emph{value}, and this value must itself be assigned a type; this type is called a \IndexDefinition{metatype type}\emph{metatype}. The metatype of a type \tT\ is written as \texttt{T.Type}. 
The type \tT\ is the \IndexDefinition{instance type of metatype}\emph{instance type} of the metatype. For example, the type of the expression ``\verb|Int.self|'' is the metatype \texttt{Int.Type}, whose instance type is \verb|Int|. Metatypes are sometimes referred to as \IndexDefinition{concrete metatype type}\emph{concrete metatypes}, to distinguish them from existential metatypes, which we introduce below. Most concrete metatypes are singleton types with one value, the instance type itself. One exception is that the class metatype for a non-final class also has all subclasses of the class as values. \begin{figure}[b!]\captionabove{Existential metatype and metatype of existential}\label{existential metatype fig} \begin{center} @@ -487,19 +489,19 @@ \section{More Types}\label{more types} An unlabeled one-element tuple type cannot be formed at all; \texttt{(T)} resolves to the same type as \tT. Labeled one-element tuple types \texttt{(foo:\ T)} are valid in the grammar, but are rejected by type resolution. \index{SILGen}SILGen creates them internally when it materializes the payload of an enum case (for instance, ``\texttt{case person(name:\ String)}''), but they do not appear as the types of expressions. -\paragraph{Function types.} A \IndexDefinition{function type}function type is the type of the callee in a \index{call expression}call expression. It contains a parameter list, a return type, and non-type attributes. The attributes include the function's effect, lifetime, and calling convention. The effects are \texttt{throws} and \texttt{async} (part of the \IndexSwift{5.5}Swift~5.5 concurrency model \cite{se0296}). Function values with \index{non-escaping function type}non-escaping lifetime are second-class; they can only be passed to another function, captured by a non-escaping closure, or called. Only \index{escaping function type}escaping functions can be returned or stored inside other values. 
The four calling conventions are: +\paragraph{Function types.} A \IndexDefinition{function type}function type is the type of the callee in a \index{call expression}call expression. It contains a parameter list, a return type, and non-type attributes. The attributes include the function's effect, lifetime, and calling convention. The effects are \texttt{throws} and \texttt{async} (the latter is from the \IndexSwift{5.5}Swift~5.5 concurrency model \cite{se0296}). Values of function type with \index{non-escaping function type}non-escaping lifetime are second-class; they can only be passed to another function, captured by a non-escaping closure, or called. \index{escaping function type}Escaping functions can also be returned, or stored inside other values, as usual. The four calling conventions are: \begin{itemize} -\item The default ``thick'' convention, where the function is passed as a function pointer together with a reference-counted closure context. -\item \texttt{@convention(thin)}: the function is passed as a single function pointer, without a closure context. Thin functions cannot capture values from outer scopes. -\item \texttt{@convention(c)}: passed as a single function pointer, and also the parameter and return types must be representable in C. -\item \texttt{@convention(block)}: passed as an \index{Objective-C}Objective-C block, which allow captures but must have parameter and return types representable in Objective-C. +\item The default \IndexDefinition{thick function}``thick'' convention, where the function is passed as a function pointer together with a \index{reference count}reference-counted \index{closure context}closure context. +\item \texttt{@convention(thin)}: The value is a single function pointer using the Swift calling convention, \IndexDefinition{thin function}without a closure context. Thin functions cannot have \index{captured value}captures. 
+\item \texttt{@convention(c)}: The value is a single function pointer, and the parameter and return types must be representable in C. Once again, captures are not permitted.
+\item \texttt{@convention(block)}: The value is an \index{Objective-C}Objective-C block. Captures are permitted, but the parameter and return types must be representable in Objective-C.
\end{itemize}

Each entry in the parameter list contains a parameter type and some non-type bits:
\begin{itemize}
-\item The \textbf{value ownership kind}, which can be the default, \texttt{inout}, \texttt{borrowing} or \texttt{consuming}.
+\item The \index{ownership specifier}\textbf{ownership specifier}, whose possible values correspond to default ownership, \texttt{inout}, \texttt{borrowing}, or \texttt{consuming}.

-The \texttt{inout} kind is key to Swift's mutable value type model; the interested reader can consult \cite{valuesemantics} for details. The last two were introduced in \IndexSwift{5.9}Swift~5.9 \cite{se0377}.
+The \texttt{inout} specifier is key to Swift's mutable value type model; the interested reader can consult \cite{valuesemantics} for details. The other two were introduced in \IndexSwift{5.9}Swift~5.9 \cite{se0377}.

\item The \textbf{variadic} flag, in which case the parameter type must be an array type.

@@ -573,23 +575,23 @@ \section{More Types}\label{more types}

Once the above proposals were implemented, the compiler continued to model function types as having a single input type for quite some time, despite this being completely hidden from the user. After \IndexSwift{5.0}Swift~5, the function type representation fully converged with the semantic model of the language.

-It is worth noting that metatypes, tuple types and function types play an important role in the core language model, but they are not essential to the formal analysis of generics; one can construct a toy implementation of Swift generics from nominal types and type parameters alone.
Indeed, all structural types can be seen as special kinds of type constructors, which have no special behavior other than possibly containing type parameters which can be substituted.
+Note that while \index{metatype type}metatypes, \index{tuple type}tuple types, and \index{function type}function types all play an important role in the Swift language as a whole, they are not essential to the formal analysis of generics; one can construct a toy implementation of Swift generics with generic nominal types and type parameters alone. Indeed, from the viewpoint of the generics model, structural types have no intrinsic behaviors, other than possibly containing type parameters which can be substituted.

\smallskip

We finish this section by turning to sugared types. Sugared generic parameter types were already described in \SecRef{fundamental types}. Of the remaining kinds of \index{sugared type}sugared types, type alias types are defined by the user, and the other three are built-in to the language.

-\paragraph{Type alias types.} A \IndexDefinition{type alias type}type alias type represents a reference to a type alias declaration. It contains an optional parent type, a substitution map, and the substituted \IndexDefinition{underlying type}underlying type. The canonical type of a type alias type is the substituted underlying type. The substitution map is formed in type resolution, from any generic arguments applied to the type alias type declaration itself, together with the generic arguments of the parent type (\SecRef{identtyperepr}). Type resolution applies this substitution map to the original underlying type of the type alias declaration to compute the substituted underlying type. The substitution map is preserved for printing, and for requirement inference (\SecRef{requirementinference}).
+\paragraph{Type alias types.} A \IndexDefinition{type alias type}type alias type represents a reference to a type alias declaration.
It consists of an optional parent type, a list of generic arguments, and the substituted \IndexDefinition{underlying type!of type alias type}underlying type. The canonical type of a type alias type is the substituted underlying type. The parent type is only present if the type alias declaration is a member of another type declaration, and the generic arguments are only present if the type alias is generic (\SecRef{identtyperepr}). The parent type and generic arguments are preserved by type resolution so that the type alias type can be printed faithfully, and also for requirement inference (\SecRef{requirementinference}).

-\paragraph{Optional types.} The \IndexDefinition{optional sugared type}optional type is written as \texttt{T?} for some object type \tT; its canonical type is \texttt{Optional}.
+\paragraph{Optional types.} The \IndexDefinition{optional sugared type}optional type is written as \texttt{T?} for some \IndexDefinition{payload type of optional}payload type \tT; its canonical type is \texttt{Optional<T>}.
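For instance, the sugared and canonical spellings denote the same type, so values of one can be assigned to the other freely:
\begin{Verbatim}
let a: Int? = nil         // sugared spelling
let b: Optional<Int> = a  // canonical spelling; same type as 'Int?'
\end{Verbatim}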
+We now discuss a few of the remaining types, which are all weird in their own unique ways. They tend to only be valid in specific contexts, and some do not represent actual types of values at all. Their unexpected appearance can be a source of counterexamples and failed assertions. They all play important roles in the expression type checker, but again, do not really give us anything new if we consider the generics model from a purely formal viewpoint. \paragraph{Generic function types.} A \IndexDefinition{generic function type}generic function type has the same structural components as a function type, except it also stores a generic signature: @@ -599,12 +601,12 @@ \section{Special Types}\label{misc types} \end{verbatim} \end{quote} -Generic function types are the \index{interface type!function declaration}interface types of generic \index{function declaration}function and \index{interface type!subscript declaration}\index{subscript declaration}subscript declarations. A reference to a generic function declaration from an expression always applies substitutions first, so generic function types do not appear as the types of expressions. In particular, an unsubstituted generic function value cannot be a parameter to another function, thus the Swift type system does not support \index{limitation!higher-rank polymorphism}\index{higher-rank polymorphism}higher-rank polymorphism. Type inference with higher-rank types is known to be \index{halting problem}\index{undecidable problem!rank-3 polymorphism}undecidable; see \cite{wells} and \cite{practicalhigherrank}. +A generic function type represents the \index{interface type!function declaration}interface type of a generic \index{function declaration}function or \index{interface type!subscript declaration}\index{subscript declaration}subscript declaration, before substitutions are applied. 
An expression that references a generic function or subscript declaration will always apply substitutions first, so a \emph{value} in the Swift language cannot have a generic function type. In particular, a generic function type cannot be a parameter or result of another function type; the Swift type system does not support \index{limitation!higher-rank polymorphism}\index{higher-rank polymorphism}\emph{higher-rank polymorphism}. Type inference with higher-rank types is known to be \index{halting problem}\index{undecidable problem!rank-3 polymorphism}undecidable; see \cite{wells} and \cite{practicalhigherrank}. Generic function types have a special behavior when their canonical type is computed. Since generic function types carry a generic signature, the parameter types and return type of a \emph{canonical} generic function type are actually \emph{reduced} types with respect to this generic signature (\SecRef{reduced types}). \paragraph{Reference storage types.} -A \IndexDefinition{reference storage type}reference storage type is the type of a variable declaration adorned with the \IndexDefinition{weak reference type}\texttt{weak}, \IndexDefinition{unowned reference type}\texttt{unowned} or \texttt{unowned(unsafe)} attribute. The wrapped type must be a class type, a class-constrained archetype, or class-constrained existential type. Reference storage types arise as the interface types of variable declarations, and as the types of SIL instructions. The types of \index{expression}expressions never contain reference storage types. +A \IndexDefinition{reference storage type}reference storage type is the type of a variable declaration adorned with the \IndexDefinition{weak reference type}\texttt{weak}, \IndexDefinition{unowned reference type}\texttt{unowned}, or \texttt{unowned(unsafe)} attribute. 
These modifiers change the \index{reference counting}reference counting behavior of the class type, class-constrained archetype, or class-constrained existential type they are applied to. Reference storage types arise as the interface types of variable declarations, and as the types of values in \index{SIL}SIL. The types of \index{expression}expressions never contain reference storage types.

\paragraph{Placeholder types.} A \IndexDefinition{placeholder type}placeholder type represents a generic argument to be inferred by the type checker. The written representation is the underscore ``\verb|_|''. They can only appear in a handful of restricted contexts, and do not appear in the types of expressions or the interface types of declarations after type checking. The constraint solver replaces placeholder types with type variables when solving the constraint system. For example, here, the interface type of the \texttt{myPets} local variable is inferred as \texttt{Array<String>}:
@@ -618,7 +620,7 @@ \section{Special Types}\label{misc types}
\begin{Verbatim}
let myPets: Array = ["Zelda", "Giblet"]
\end{Verbatim}
-Unbound generic types are also occasionally useful in \index{diagnostic!printing unbound generic type}diagnostics, to print the name of a type declaration only (like \texttt{Outer.Inner}) without the generic parameters of its declared interface type (\texttt{Outer.Inner}, for example). Unbound generic types are discussed in the context of type resolution in \SecRef{unbound generic types}.
+Unbound generic types are also occasionally useful in \index{diagnostic!printing unbound generic type}diagnostics, to print the name of a type declaration only (like \texttt{Outer.Inner}) without the generic parameters of its declared interface type (\texttt{Outer<T>.Inner<U>}, for example). We will discuss the behavior of type resolution with unbound generic types in \SecRef{unbound generic types}.
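The following sketch (with hypothetical variable names) contrasts a placeholder type, where the underscore marks the one generic argument to infer, with an unbound generic type, where all generic arguments are omitted and inferred:
\begin{Verbatim}
let pets1: Array<_> = ["Zelda", "Giblet"]  // placeholder type
let pets2: Array = ["Zelda", "Giblet"]     // unbound generic type
// Both variables have the inferred interface type 'Array<String>'.
\end{Verbatim}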
\begin{listing}\captionabove{Dynamic \tSelf\ type example}\label{dynamic self example} \begin{Verbatim} @@ -636,10 +638,10 @@ \section{Special Types}\label{misc types} } func invalid1() -> Self { - return Base() + return Base() // error } - func invalid2(_: Self) {} + func invalid2(_: Self) {} // error } class Derived: Base {} @@ -650,23 +652,26 @@ \section{Special Types}\label{misc types} \end{listing} \paragraph{Dynamic Self types.} -The \IndexDefinition{dynamic Self type@dynamic \tSelf\ type}dynamic \tSelf\ type appears when a class method declares a return type of \tSelf. In this case, the object is known to have the same dynamic type as the base of the method call, which might be a subclass of the method's class. The dynamic \tSelf\ type has one structural component, a class type, which is the static upper bound for the type. This concept comes from \index{Objective-C}Objective-C, where it is called \texttt{instancetype}. The dynamic \tSelf\ type in many ways behaves like a generic parameter, but it is not represented as one; the type checker and \index{SILGen}SILGen implement support for it directly. Note that the identifier ``\tSelf'' only has this interpretation inside a class. In a protocol declaration, \tSelf\ is the implicit generic parameter (\SecRef{protocols}). In a struct or enum declaration, \tSelf\ is the declared interface type (\SecRef{identtyperepr}). +The \IndexDefinition{dynamic Self type@dynamic \tSelf\ type}dynamic \tSelf\ type appears when a method in a class declares a return type of \tSelf. This enforces a guarantee that the return value must have the same \emph{dynamic} type as the \texttt{self} parameter passed in to the method call, which might be a subclass of the static type of \texttt{self}. The dynamic \tSelf\ type has one structural component, a class type. The dynamic \tSelf\ type behaves like an existential type in certain respects, but it's just a hard-coded behavior in the type checker and \index{SILGen}SILGen. 
-\ListingRef{dynamic self example} demonstrates some of the behaviors of the dynamic \tSelf\ type. Two invalid cases are shown; \texttt{invalid1()} is rejected because the type checker cannot prove that the return type is always an instance of the dynamic type of \texttt{self}, and \texttt{invalid2()} is rejected because \tSelf\ appears in contravariant position. +Note that the identifier ``\tSelf'' only has this interpretation inside a class. In a protocol declaration, \tSelf\ is the implicit generic parameter (\SecRef{protocols}). In a struct or enum declaration, \tSelf\ is the declared interface type (\SecRef{identtyperepr}). + +\ListingRef{dynamic self example} demonstrates some of the behaviors of the dynamic \tSelf\ type. Two invalid cases are shown; \texttt{invalid1()} is rejected because the type checker cannot prove that the return type is always an instance of the dynamic type of \texttt{self}, and \texttt{invalid2()} is rejected because \tSelf\ appears in contravariant position. This feature comes from \index{Objective-C}Objective-C, where it is called \texttt{instancetype}. \paragraph{Type variable types.} -A \IndexDefinition{type variable type}type variable represents the future inferred type of an \index{expression}expression. The \index{expression type checker}expression type checker builds a \emph{constraint system} by walking an expression recursively, assigning new type variables to each sub-expression and then recording constraints to relate these type variables. The constraint solver then searches for a \emph{solution} that assigns a concrete type to each type variable in a way that satisfies the constraints. Solving the constraint system can have three possible outcomes: +A \IndexDefinition{type variable type}type variable represents the future inferred type of an \index{expression}expression. 
The \IndexDefinition{expression type checker}expression type checker builds a \emph{constraint system} by walking an expression recursively, assigning new type variables to each sub-expression and then recording constraints to relate these type variables. The constraint solver then searches for a \emph{solution} that assigns a concrete type to each type variable in a way that satisfies the constraints. Solving the constraint system can have three possible outcomes: \begin{itemize} \item \textbf{One solution}---the expression and all of its sub-expressions are well-typed. \item \textbf{No solutions}---the constraints cannot be satisfied, so the expression is invalid. \item \textbf{Multiple solutions}---the expression is ambiguous, because there is more than one possible assignment of concrete types that satisfies the constraints. \end{itemize} -If there is one solution, we record the concrete type of each expression in the AST. In the case of multiple solutions, we first rank the solutions using a heuristic; if one is clearly ``better'' than the others, we proceed as in the case of one solution. Otherwise, we either have no solution, or the expression was ambiguous, so we \index{diagnostic!multiple solutions}diagnose an error. -The utmost care must be taken when working with type variables and the structures that contain them, because unlike all other types, they are not allocated with indefinite lifetime. Type variables live in the \IndexDefinition{constraint solver arena}constraint solver arena, whose lifetime is scoped to the type checking of a single expression. Structures that \emph{contain} type variables, such as other \index{type!containing type variables}types and \index{substitution map!containing type variables}substitution maps, also need to be allocated in the constraint solver arena. Type variables should never ``escape'' from the constraint solver, or the compiler will crash in odd ways. 
Assertions should be used to rule out type variables from appearing in the wrong places.
+If there is exactly one solution, we update each expression in the AST with its concrete type. In the case of multiple solutions, we attempt to rank the solutions using a heuristic; if one is clearly ``better'' than the others, we proceed as in the case of one solution. If there is no solution, or there are multiple ambiguous solutions and none is better than the rest, we \index{diagnostic!multiple solutions}diagnose an error.
+
+Type variable types and the structures that contain them cannot outlive the \IndexDefinition{constraint solver arena}\emph{constraint solver arena}, whose lifetime is scoped to the invocation of the expression type checker on a single expression. Structures that \emph{contain} type variables, such as other \index{type!containing type variables}types and \index{substitution map!containing type variables}substitution maps, are also allocated in the constraint solver arena. For this reason, type variable types must never ``escape'' from the constraint solver arena. Assertions are used to rule out the unexpected appearance of type variable types.

\IndexFlag{debug-constraints}
-The printed representation of a type variable is \texttt{\$Tn}, where \texttt{n} is an incrementing integer local to the constraint system. One way to see type variables in action is to pass the \texttt{-Xfrontend~-debug-constraints} compiler flag.
+The printed representation of a type variable is \texttt{\$Tn}, where \texttt{n} is an incrementing integer local to the constraint system. One way to see type variables and constraint solving in action is to pass the \texttt{-Xfrontend~-debug-constraints} compiler flag.

To learn more about the expression type checker, see \cite{typechecker,diagnosticsblog}.
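For instance, the possible outcomes can be observed with very simple expressions (a sketch; the final line is rejected by the compiler):
\begin{Verbatim}
let a = 1 + 2          // one solution: 'a' is inferred as 'Int'
let b: Double = 1 + 2  // the context picks a different solution: 'Double'
let c: String = 1 + 2  // no solutions: diagnosed as an error
\end{Verbatim}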
\paragraph{L-value types.} An \IndexDefinition{l-value type}l-value type represents the type of an \index{expression}expression appearing on the left hand side of an assignment operator (hence the ``l'' in l-value), or as an argument to an \texttt{inout} parameter in a function call. L-value types wrap an \IndexDefinition{object type}\emph{object type} which is the type of the stored value; they print out as \verb|@lvalue T| where \tT\ is the object type, but this is not valid syntax in the language. @@ -674,11 +679,11 @@ \section{Special Types}\label{misc types} L-value types appear in type-checked expressions. The reader familiar with C++ might think of an l-value type as somewhat analogous to a C++ mutable reference type ``\verb|T &|''---unlike C++ though, they are not directly visible in the source language. L-value types do not appear in the types of SIL instructions; \index{SILGen}SILGen lowers l-value accesses into accessor calls or direct manipulation of memory. \paragraph{Error types.} -\IndexDefinition{error type} -\index{expression} -Error types are returned when type substitution encounters an invalid or missing conformance (\ChapRef{substmaps}). In this case, the error type wraps the original type, and prints as the original type to make types coming from malformed conformances more readable in \index{diagnostic!printing error type}diagnostics. +An \IndexDefinition{error type}error type represents an unknown type in an erroneous program. There are two forms that an error type may take: it either wraps some other type, or it is the so-called singleton error type. In some sense, the error type resembles the ``Not a Number'' value of floating point arithmetic. + +Type substitution returns an error type when it encounters an invalid conformance (\ChapRef{chap:conformances}). 
In this case, the error type wraps the original type being substituted, so that it prints as the original type, to make types coming from malformed conformances more readable in \index{diagnostic!printing error type}diagnostics. -Error types are also returned by \index{type resolution}type resolution if the \index{type representation}type representation is invalid in some way. This uses the singleton form of the error type, which prints as \texttt{<>}. To avoid user confusion, diagnostics containing the singleton error type should not be emitted. Generally, any expression whose type contains an error type does not need to be diagnosed, because a diagnostic should have been emitted elsewhere. +\index{type resolution}Type resolution returns the singleton error type when the \index{type representation}type representation being resolved is invalid in some way. The singleton error type prints as \texttt{<>}. To avoid user confusion, diagnostics containing the singleton error type should not be emitted. Generally, any \index{expression}expression whose type contains an error type does not need to be diagnosed, because a diagnostic will have been emitted elsewhere. \paragraph{Built-in types.} \IndexDefinition{compiler intrinsic} @@ -691,7 +696,7 @@ \section{Special Types}\label{misc types} \IndexFlag{parse-stdlib} Built-in types and their intrinsics are defined in the special \texttt{Builtin} module, which is a module constructed by the compiler itself, and not built from source code. The \texttt{Builtin} module is only visible when the compiler is invoked with the \texttt{-parse-stdlib} frontend flag; the standard library is built with this flag, but user code never interacts with the \texttt{Builtin} module directly. 
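To sketch what this looks like in practice (the struct shown here is hypothetical, but it mirrors how the standard library wraps built-in types in ordinary structs, such as \texttt{Int}):
\begin{Verbatim}
// Only valid when compiled with the -parse-stdlib frontend flag,
// which makes the Builtin module visible.
struct MyInt64 {
  var _value: Builtin.Int64
}
\end{Verbatim}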
-\section{Source Code Reference}\label{typesourceref}
+\section{Source Code Reference}\label{src:types}
Key source files:
\begin{itemize}
@@ -716,18 +721,18 @@ \section{Source Code Reference}\label{typesourceref}
\begin{itemize}
\item \textbf{Various traversals:} \texttt{walk()} is a general pre-order traversal where the callback returns a tri-state value---continue, stop, or skip a sub-tree. Built on top of this are two simpler variants; \texttt{findIf()} takes a boolean predicate, and \texttt{visit()} takes a void-returning callback which offers no way to terminate the traversal.
-\item \textbf{Transformations:} \texttt{transformWithPosition()} and \texttt{transformRec()}. As with the traversals, the first is the most general, and the other one changes the signature of the callback to ignore a \texttt{TypePosition} parameter. The callback is invoked on all types contained within a type, recursively. It can either elect to replace a type with a new type, or leave a type unchanged and instead try to transform any of its child types.
-\item \textbf{Substitution:} \texttt{subst()} implements type substitution, which is a particularly common kind of transform which replaces generic parameters or archetypes with concrete types (\SecRef{substmapsourcecoderef}).
+\item \textbf{Transformations:} \texttt{transformWithPosition()} and \texttt{transformRec()}. The first is the most general, and the other one changes the signature of the callback to ignore a \texttt{TypePosition} parameter. The callback is invoked on all types contained within a type, recursively. It can either elect to replace a type with a new type, or leave a type unchanged and instead try to transform any of its child types.
+\item \textbf{Substitution:} \texttt{subst()} implements type substitution, a particularly common kind of transform that replaces generic parameters or archetypes with concrete types (\SecRef{src:substitution maps}).
\item \textbf{Printing:} \texttt{print()} outputs the string form of a type, with many customization options; \texttt{dump()} prints the \index{tree}tree structure of a type in an \index{s-expression}s-expression form. The latter is extremely useful for invoking from inside a debugger, or ad-hoc print debug statements.
\end{itemize}
\IndexSource{type pointer equality}
-The \texttt{Type} class explicitly deletes the overloads of \texttt{operator==} and \texttt{operator!=} to make the choice between pointer and canonical equality explicit. To check type pointer equality of possibly-sugared types, first unwrap both sides with a \texttt{getPointer()} call:
+The \texttt{Type} class explicitly deletes the overloads of \texttt{operator==} and \texttt{operator!=} to make the choice between pointer and canonical equality explicit. To check type pointer equality of possibly-sugared types, first unwrap both sides with a \texttt{getPointer()} call, and then compare the \texttt{TypeBase *} values for equality:
\begin{Verbatim}
if (lhsType.getPointer() == rhsType.getPointer())
  ...;
\end{Verbatim}
\IndexSource{canonical type equality}
-The more common canonical type equality check is implemented by the \texttt{isEqual()} method on \texttt{TypeBase}:
+Canonical type equality is more commonly used. To check for canonical type equality, call the \texttt{isEqual()} method on \texttt{TypeBase}. Unlike the pointer equality check, this check only works if both types are non-\texttt{nullptr}:
\begin{Verbatim}
if (lhsType->isEqual(rhsType))
  ...;
@@ -749,12 +754,6 @@ \section{Source Code Reference}\label{typesourceref}
\item \texttt{BoundGenericEnumType},
\item \texttt{BoundGenericClassType}.
\end{itemize}
-\item The structural types \texttt{TupleType}, \texttt{MetatypeType}.
-\item \texttt{AnyFunctionType} and its two subclasses:
-\begin{itemize}
-\item \texttt{FunctionType},
-\item \texttt{GenericFunctionType}.
-\end{itemize} \item \texttt{GenericTypeParamType}, \texttt{DependentMemberType}, the two type parameter types. \item \texttt{ArchetypeType}, and its three subclasses: \begin{itemize} @@ -762,20 +761,28 @@ \section{Source Code Reference}\label{typesourceref} \item \texttt{ExistentialArchetypeType}, \item \texttt{OpaqueArchetypeType}. \end{itemize} -\item The abstract types: +\item Constraint types: \begin{itemize} \item \texttt{ProtocolCompositionType}, \item \texttt{ParameterizedProtocolType}, +\end{itemize} +\item Existential types: +\begin{itemize} \item \texttt{ExistentialType}, \item \texttt{ExistentialMetatypeType}, -\item \texttt{DynamicSelfType}. +\end{itemize} +\item The structural types \texttt{TupleType}, \texttt{MetatypeType}. +\item \texttt{AnyFunctionType} and its two subclasses: +\begin{itemize} +\item \texttt{FunctionType}, +\item \texttt{GenericFunctionType}. \end{itemize} \item \texttt{SugarType} and its four subclasses: \begin{itemize} -\item \texttt{TypeAliasType}, -\item \texttt{OptionalType}, -\item \texttt{ArrayType}, -\item \texttt{DictionaryType}. +\item \IndexSource{type alias type}\texttt{TypeAliasType}, +\item \IndexSource{optional sugared type}\texttt{OptionalType}, +\item \IndexSource{array sugared type}\texttt{ArrayType}, +\item \IndexSource{dictionary sugared type}\texttt{DictionaryType}. \end{itemize} \item \texttt{BuiltinType} and its subclasses (there are a bunch of esoteric ones; only a few are shown below): \begin{itemize} @@ -791,8 +798,9 @@ \section{Source Code Reference}\label{typesourceref} \item \texttt{WeakStorageType}, \item \texttt{UnownedStorageType}. 
\end{itemize}
-\item Miscellaneous types:
+\item Everything else:
\begin{itemize}
+\item \texttt{DynamicSelfType},
\item \texttt{UnboundGenericType},
\item \texttt{PlaceholderType},
\item \texttt{TypeVariableType},
@@ -802,8 +810,10 @@ \section{Source Code Reference}\label{typesourceref}
\end{itemize}
Each concrete subclass defines some set of static factory methods, usually named \texttt{get()} or similar, which take the structural components and construct a new, uniqued type of this kind. There are also getter methods, prefixed with \texttt{get}, which project the structural components of each kind of type. It would be needlessly duplicative to list all of the getter methods for each subclass of \texttt{TypeBase}; they can all be found in \SourceFile{include/swift/AST/Types.h}.
-\paragraph{Dynamic casts.}
-Subclasses of \texttt{TypeBase *} are identifiable at runtime via the \verb|is<>|, \verb|castTo<>| and \verb|getAs<>| template methods. To check if a type has a specific kind, use \verb|is<>|:
+\paragraph{Desugaring casts.}
+Subclasses of \texttt{TypeBase *} are identifiable at runtime via the \verb|is<>|, \verb|castTo<>| and \verb|getAs<>| template methods.
+
+To check if a type has a specific kind, use \verb|is<>|:
\begin{Verbatim}
Type type = ...;
@@ -815,13 +825,14 @@ \section{Source Code Reference}\label{typesourceref}
if (FunctionType *funcTy = type->getAs<FunctionType>())
  ...;
\end{Verbatim}
-Finally, \verb|castTo<>| is an unconditional cast which asserts that the type has the required kind:
+Finally, to assert that a type has the required kind, use \verb|castTo<>|:
\begin{Verbatim}
FunctionType *funcTy = type->castTo<FunctionType>();
\end{Verbatim}
-These template methods desugar the type if it is a sugared type, and the casted type can never itself be a sugared type.
This is usually correct; for example, if \texttt{type} is the \texttt{Swift.Void} type alias type, then \texttt{type->is<TupleType>()} returns true, because it is for all intents and purposes a tuple (an empty tuple), except when printed in diagnostics.
+These template methods desugar the type if it is a sugared type, and they can never cast \emph{to} a sugared type. This is usually what is intended, because type sugar is not meant to have any effect on behavior. For example, if \texttt{type} is the \texttt{Swift.Void} type alias type, then \texttt{type->is<TupleType>()} returns true, because it is for all intents and purposes a tuple (an empty tuple), except when printed in diagnostics.
-There are also top-level template functions \verb|isa<>|, \verb|dyn_cast<>| and \verb|cast<>| that operate on \texttt{TypeBase *}. Using these with \texttt{Type} is an error; the pointer must be explicitly unwrapped with \texttt{getPointer()} first. These casts do not desugar, and permit casting to sugared types. This is the mechanism used when \IndexSource{sugared type}sugared types must be distinguished from canonical types for some reason:
+\paragraph{Direct casts.}
+Alternatively, there are also the three top-level template functions \verb|isa<>|, \verb|dyn_cast<>|, and \verb|cast<>|, which operate on \texttt{TypeBase *}. Using these with \texttt{Type} is an error; the pointer must be explicitly unwrapped with \texttt{getPointer()} first. These casts do not desugar, and thus they permit casting to a sugared type. This is the mechanism used when \IndexSource{sugared type}sugared types must be distinguished from canonical types for some reason:
\begin{Verbatim}
Type type = ...;
@@ -829,6 +840,9 @@ \section{Source Code Reference}\label{typesourceref}
...;
\end{Verbatim}
+\paragraph{Canonical types.}
+The \texttt{getCanonicalType()} method outputs a \texttt{CanType} wrapping the \IndexSource{canonical type}canonical form of this \texttt{TypeBase *}.
The canonical type is computed once and memoized, so this operation is cheap. The actual computation is done in \texttt{computeCanonicalType()}. Finally, the \texttt{isCanonical()} method checks if a type is already canonical. + \paragraph{Visitors.} The simplest way to \index{exhaustive switch}exhaustively handle each kind of type is to switch over the \IndexSource{type kind}kind, which is an instance of the \texttt{TypeKind} enum, like this: \begin{Verbatim} @@ -861,23 +875,20 @@ \section{Source Code Reference}\label{typesourceref} The \texttt{TypeVisitor} preserves information if it receives a sugared type; for example, visiting \texttt{Int?}\ will call \texttt{visitOptionalType()}, while visiting \texttt{Optional} will call \texttt{visitBoundGenericEnumType()}. In the common situation where the semantics of your operation do not depend on type sugar, you can use the \texttt{CanTypeVisitor} template class instead. Here, the \texttt{visit()} method takes a \texttt{CanType}, so \texttt{Int?}\ will need to be canonicalized to \texttt{Optional} before being passed in. -\paragraph{Canonical types.} -The \texttt{getCanonicalType()} method outputs a \texttt{CanType} wrapping the \IndexSource{canonical type}canonical form of this \texttt{TypeBase *}. The \texttt{isCanonical()} method checks if a type is canonical. 
- -\paragraph{Nominal types.} A handful of methods on \texttt{TypeBase} exist which perform a desugaring cast to a nominal type (so they will also accept a type alias type or other sugared type), and return the nominal type declaration, or \texttt{nullptr} if the type isn't of a nominal kind: +\paragraph{Nominal types.} The below methods on \texttt{TypeBase} perform a desugaring cast to a nominal type (so they will also accept a type alias type or other sugared type), and then they return the nominal type declaration, or \texttt{nullptr} if the type isn't of a nominal kind: \begin{itemize} -\item \texttt{getAnyNominal()} returns the nominal type declaration of \texttt{UnboundGenericType}, \texttt{NominalType} or \texttt{BoundGenericNominalType}. -\item \texttt{getNominalOrBoundGenericNominal()} returns the nominal type declaration of a \texttt{NominalType} or \texttt{BoundGenericNominalType}. -\item \texttt{getStructOrBoundGenericStruct()} returns the type declaration of a \texttt{StructType} or \texttt{BoundGenericStructType}. -\item \texttt{getEnumOrBoundGenericEnum()} returns the type declaration of an \texttt{EnumType} or \texttt{BoundGenericEnumType}. -\item \texttt{getClassOrBoundGenericClass()} returns the class declaration of a \texttt{ClassType} or \texttt{BoundGenericClassType}. -\item \texttt{getNominalParent()} returns the parent type stored by an \texttt{UnboundGenericType}, \texttt{NominalType} or \texttt{BoundGenericNominalType}. +\item \texttt{getAnyNominal()} returns the nominal type declaration of \texttt{UnboundGenericType}, \texttt{NominalType} or \texttt{BoundGenericNominalType}, or \texttt{nullptr}. +\item \texttt{getNominalOrBoundGenericNominal()} returns the nominal type declaration of a \texttt{NominalType} or \texttt{BoundGenericNominalType}, or \texttt{nullptr}. +\item \texttt{getStructOrBoundGenericStruct()} returns the type declaration of a \texttt{StructType} or \texttt{BoundGenericStructType}, or \texttt{nullptr}. 
+\item \texttt{getEnumOrBoundGenericEnum()} returns the enum declaration of an \texttt{EnumType} or \texttt{BoundGenericEnumType}, or \texttt{nullptr}.
+\item \texttt{getClassOrBoundGenericClass()} returns the class declaration of a \texttt{ClassType} or \texttt{BoundGenericClassType}, or \texttt{nullptr}.
+\item \texttt{getNominalParent()} returns the parent type stored by an \texttt{UnboundGenericType}, \texttt{NominalType} or \texttt{BoundGenericNominalType}, or \texttt{nullptr}.
\end{itemize}
-\paragraph{Recursive properties.} Various predicates are computed when a type is constructed and are therefore cheap to check:
+\paragraph{Recursive properties.} Various predicates determine if a type recursively contains other types with certain properties. These are computed when a type is constructed and are therefore cheap to check:
\begin{itemize}
\item \texttt{hasTypeVariable()} determines whether the type was allocated in the permanent arena or the \IndexSource{constraint solver arena}constraint solver arena.
-\item \texttt{hasArchetype()}, \texttt{hasOpaqueArchetype()}, \texttt{hasOpenedExistential()}.
+\item \texttt{hasPrimaryArchetype()}, \texttt{hasOpaqueArchetype()}, \texttt{hasOpenedExistential()}.
\item \texttt{hasTypeParameter()}.
\item \texttt{hasUnboundGenericType()}, \texttt{hasDynamicSelf()}, \texttt{hasPlaceholder()}.
\item \texttt{hasLValueType()} determines whether the type contains an l-value type.
@@ -885,9 +896,9 @@
\paragraph{Utility operations.} These encapsulate frequently-useful patterns.
\begin{itemize}
-\item \texttt{getOptionalObjectType()} \IndexSource{optional sugared type}returns the type \tT\ if the type is some \texttt{Optional<T>}, otherwise it returns the null type.
-\item \texttt{getMetatypeInstanceType()} returns the type \tT\ if the type is some \texttt{T.Type}, otherwise it returns \tT.
-\item \texttt{mayHaveMembers()} answers if this is a nominal type, archetype, existential type or dynamic Self type.
+\item \texttt{getOptionalObjectType()} \IndexSource{optional sugared type}returns the \IndexSource{payload type of optional}payload type \tT\ of a generic nominal type \texttt{Optional<T>} that was formed from the \texttt{Optional} enum declaration in the standard library. For any other kind of type, returns the null type.
+\item \texttt{getMetatypeInstanceType()} \IndexSource{instance type of metatype}returns the instance type \tT\ if this type is a metatype \texttt{T.Type}; otherwise, it returns the original type unchanged (and not the null type).
+\item \texttt{mayHaveMembers()} checks if this is a nominal type, archetype, existential type, or dynamic Self type.
\end{itemize}
\paragraph{Recovering the AST context.} All non-canonical types point at their canonical type, and canonical types point at the AST context.
@@ -921,7 +932,7 @@ \section{Source Code Reference}\label{typesourceref}
This is the base class of \texttt{FunctionType} and \texttt{GenericFunctionType}.
\begin{itemize}
\item \texttt{getParams()} returns an array of \texttt{AnyFunctionType::Param}.
-\item \texttt{getResult()} returns the result type.
+\item \texttt{getResult()} returns the return type.
\item \texttt{getExtInfo()} returns an instance of \texttt{AnyFunctionType::ExtInfo} storing the additional non-type attributes.
\end{itemize}
@@ -931,11 +942,11 @@ \section{Source Code Reference}\label{typesourceref}
\item \texttt{getPlainType()} returns the type of the parameter. If the parameter is variadic (\texttt{T...}), this is the element type \tT.
\item \texttt{getParameterType()} same as above, but if the parameter is variadic, returns the type \texttt{Array<T>}.
\item \texttt{isVariadic()}, \texttt{isAutoClosure()} are the special behaviors.
-\item \texttt{getValueOwnership()} returns an instance of the \texttt{ValueOwnership} enum.
+\item \texttt{getValueOwnership()} returns an \IndexSource{ownership specifier}ownership specifier encoded as an instance of the \texttt{ValueOwnership} enum. \end{itemize} \apiref{ValueOwnership}{enum class} -The possible ownership attributes on a function parameter. +The return type of \texttt{AnyFunctionType::Param::getValueOwnership()}. \begin{itemize} \item \texttt{ValueOwnership::Default} \item \texttt{ValueOwnership::InOut} diff --git a/docs/Generics/generics.bib b/docs/Generics/generics.bib index 2fc3ef2518379..7b2620a2a64be 100644 --- a/docs/Generics/generics.bib +++ b/docs/Generics/generics.bib @@ -1,8 +1,16 @@ @IEEEtranBSTCTL{IEEEexample:BSTcontrol, - CTLname_url_prefix = "\\*", + CTLname_url_prefix = "", CTLdash_repeated_names = "no" } +% Use this when citing this book: +@misc{csg, + title = "Compiling {S}wift {G}enerics", + author = "Slava Pestov", + url = "https://download.swift.org/docs/assets/generics.pdf", + year = {2024} +} + @misc{tspl, title = "The {S}wift Programming Language", url = "https://docs.swift.org/swift-book/", @@ -57,6 +65,15 @@ @book{gregor publisher={Pearson Education} } +@book{stepanov, + title={Elements of Programming}, + author={Stepanov, A. 
and McJones, P.}, + isbn={9780578222141}, + url={https://www.elementsofprogramming.com}, + year={2019}, + publisher={Semigroup Press} +} + @book{grimaldi, title={Discrete and Combinatorial Mathematics: An Applied Introduction}, author={Grimaldi, R.P.}, @@ -155,6 +172,19 @@ @book{andallthat url={https://www21.in.tum.de/~nipkow/TRaAT/} } +@book{Sims_1994, +place={Cambridge}, +series={Encyclopedia of Mathematics and its Applications}, +title={Computation with Finitely Presented Groups}, +publisher={Cambridge University Press}, +author={Sims, Charles C.}, +year={1994}, +isbn={9780511574702}, +collection={Encyclopedia of Mathematics and its Applications}, +doi={https://doi.org/10.1017/CBO9780511574702}, +url={https://www.cambridge.org/us/universitypress/subjects/mathematics/algebra/computation-finitely-presented-groups} +} + @book{epstein1992word, title={Word Processing in Groups}, author={Epstein, D.B.A.}, @@ -191,6 +221,16 @@ @book{art4b publisher={Addison-Wesley}, } +@book{sataa, + title={The Satisfiability Problem: Algorithms and Analyses}, + author={Sch{\"o}ning, U. 
and Tor{\'a}n, J.}, + isbn={9783865416483}, + series={Mathematik f{\"u}r Anwendungen}, + url={https://www.goodreads.com/book/show/27394636-the-satisfiability-problem}, + year={2013}, + publisher={Lehmanns Media} +} + @book{konig, title={Theory of Finite and Infinite Graphs}, author={K{\H{o}}nig, D.}, @@ -219,6 +259,16 @@ @book{cutland url={https://www.cambridge.org/us/universitypress/subjects/computer-science/programming-languages-and-applied-logic/computability-introduction-recursive-function-theory?format=PB} } +@book{maccormick2018can, + title={What Can Be Computed?: A Practical Guide to the Theory of Computation}, + author={MacCormick, J.}, + isbn={9780691170664}, + lccn={2018935138}, + url={https://whatcanbecomputed.com}, + year={2018}, + publisher={Princeton University Press} +} + @book{collatzbook, title={The Ultimate Challenge: The $3x+1$ Problem}, author={Lagarias, J.C.}, @@ -284,6 +334,34 @@ @book{garey1979computers publisher={Freeman} } +@book{transductions, + title={Transductions and Context-Free Languages}, + author={Berstel, J.}, + isbn={9783519023401}, + url={https://www-igm.univ-mlv.fr/~berstel/LivreTransductions/LivreTransductions.html}, + year={1979}, + publisher={Teubner Verlag} +} + +@book{grechuk2024polynomial, + title={Polynomial Diophantine Equations: A Systematic Approach}, + author={Grechuk, B.}, + isbn={9783031629495}, + url={https://link.springer.com/book/10.1007/978-3-031-62949-5}, + year={2024}, + publisher={Springer International Publishing} +} + +@book{awk, + title={The AWK Programming Language}, + author={Aho, A.V. and Kernighan, B.W. 
and Weinberger, P.J.}, + isbn={9780138269777}, + series={Addison-Wesley Professional Computing Series}, + url={https://awk.dev}, + year={2023}, + publisher={Pearson Education} +} + @article{factor, author = {Pestov, Slava and Ehrenberg, Daniel and Groff, Joe}, title = {Factor: a dynamic stack-based programming language}, @@ -453,6 +531,18 @@ @inproceedings{mptc url={https://lirias.kuleuven.be/retrieve/25382/} } +@inproceedings{morph, +author="Pedersen, John", +editor="Dershowitz, Nachum", +title="Morphocompletion for one-relation monoids", +booktitle="Rewriting Techniques and Applications", +year="1989", +publisher="Springer Berlin Heidelberg", +address="Berlin, Heidelberg", +pages="574--578", +isbn="9783540461494" +} + @article{wells, title = {Typability and type checking in {System F} are equivalent and undecidable}, journal = {Annals of Pure and Applied Logic}, @@ -636,6 +726,24 @@ @article{undecidablegroup_collins URL = {https://doi.org/10.1215/ijm/1256044631} } +@article{java_wildcards, +author = {Tate, Ross and Leung, Alan and Lerner, Sorin}, +title = {Taming wildcards in {J}ava's type system}, +year = {2011}, +issue_date = {June 2011}, +publisher = {Association for Computing Machinery}, +address = {New York, NY, USA}, +volume = {46}, +number = {6}, +issn = {0362-1340}, +url = {https://rosstate.org/publications/tamewild/tamewild-tate-pldi11-tr.pdf}, +doi = {10.1145/1993316.1993570}, +journal = {SIGPLAN Not.}, +month = jun, +pages = {614–627}, +numpages = {14} +} + @article{java_undecidable, author = {Grigore, Radu}, title = {Java Generics Are {T}uring Complete}, @@ -697,6 +805,19 @@ @article{KAPUR1985337 author = {Deepak Kapur and Paliath Narendran} } +@article{OTTO1984249, +title = {Finite complete rewriting systems for the {J}antzen monoid and the {G}reendlinger group}, +journal = {Theoretical Computer Science}, +volume = {32}, +number = {3}, +pages = {249-260}, +year = {1984}, +issn = {0304-3975}, +doi = {https://doi.org/10.1016/0304-3975(84)90044-6}, +url 
= {https://www.sciencedirect.com/science/article/pii/0304397584900446}, +author = {Friedrich Otto} +} + @article{fptype, title = {Word problems and a homological finiteness condition for monoids}, journal = {Journal of Pure and Applied Algebra}, @@ -723,6 +844,28 @@ @article{SQUIER1994271 author = {Craig C. Squier and Friedrich Otto and Yuji Kobayashi} } +@article{solid, + title={Constructing finitely presented monoids which have no finite complete presentation}, + author={Masashi Katsura and Yuji Kobayashi}, + journal={Semigroup Forum}, + year={1997}, + volume={54}, + pages={292-302}, + url={https://link.springer.com/article/10.1007/BF02676612} +} + +@article{CAIN201768, +title = {On finite complete rewriting systems, finite derivation type, and automaticity for homogeneous monoids}, +journal = {Information and Computation}, +volume = {255}, +pages = {68-93}, +year = {2017}, +issn = {0890-5401}, +doi = {https://doi.org/10.1016/j.ic.2017.05.003}, +url = {https://www.sciencedirect.com/science/article/pii/S0890540117300937}, +author = {Alan J. Cain and Robert D. Gray and Ant\'onio Malheiro} +} + @article{LAFONT1995229, title = {A new finiteness condition for monoids presented by complete rewriting systems (after {Craig C. 
Squier})}, journal = {Journal of Pure and Applied Algebra}, @@ -783,6 +926,16 @@ @article{homotopyreduction author = {Yuji Kobayashi} } +@InProceedings{homotopyreduction2, +author="Kobayashi, Yuji", +title="{Homotopy Reduction Systems for Monoid Presentations II: The Guba---Sapir Reduction and Homotopy Modules}", +booktitle="Algorithmic Problems in Groups and Semigroups", +year="2000", +publisher="Birkh{\"a}user Boston", +address="Boston, MA", +pages="143--159", +isbn="978-1-4612-1388-8"} + @misc{undecidablegroup2, author = {Will Cravitz}, title = {An introduction to the word problem for groups}, @@ -1033,6 +1186,19 @@ @article{ahocorasick numpages = {8} } +@article{MEYER1985219, +title = {Incremental string matching}, +journal = {Information Processing Letters}, +volume = {21}, +number = {5}, +pages = {219-227}, +year = {1985}, +issn = {0020-0190}, +doi = {https://doi.org/10.1016/0020-0190(85)90088-2}, +url = {https://se.inf.ethz.ch/~meyer/publications/string/string_matching.pdf}, +author = {Bertrand Meyer} +} + @article{formalabi, author = {Wagner, Andrew and Eisbach, Zachary and Ahmed, Amal}, title = {Realistic Realizability: Specifying {ABIs} You Can Count On}, @@ -1061,6 +1227,20 @@ @inproceedings{cook pages = {151–158} } +@article{finitemonoid, +title = {The finiteness of finitely presented monoids}, +journal = {Theoretical Computer Science}, +volume = {204}, +number = {1}, +pages = {169-182}, +year = {1998}, +issn = {0304-3975}, +doi = {https://doi.org/10.1016/S0304-3975(98)00038-3}, +url = {https://www.sciencedirect.com/science/article/pii/S0304397598000383}, +author = {Robert McNaughton}, +keywords = {Finitely presented monoids, Finite monoids, Thue systems, Multiplication tables} +} + @misc{kiselyov2009fun, author = {Kiselyov, Oleg and Peyton Jones, Simon and Shan, {Chung-chieh}}, title = {Fun with type functions}, @@ -1098,6 +1278,12 @@ @misc{java_faq url = {http://www.angelikalanger.com/GenericsFAQ/JavaGenericsFAQ.html} } +@misc{valhalla, + title = 
"Project {V}alhalla", + year = {2024}, + url = {https://openjdk.org/projects/valhalla/} +} + @misc{rust_chalk, author="{Rust Traits Working Group}", title = "The {Chalk} Book", @@ -1132,6 +1318,13 @@ @misc{rust_gat year = {2016} } +@misc{typechecker, + title="Type checker design and implementation", + author="Doug Gregor and Pavel Yaskevich and Holly Borla", + url = "https://github.com/swiftlang/swift/blob/main/docs/TypeChecker.md", + year = {2020} +} + @misc{hylo, title="The {H}ylo programming language", author="Dave Abrahams and Dimi Racordon", @@ -1156,7 +1349,7 @@ @misc{llvmtalk @misc{cvwtalk, author = "Dario Rexin", title = "Compact value witnesses in {S}wift", - url = "https://www.youtube.com/watch?v=ctS8FzqcRug", + url = "https://www.youtube.com/watch?v=hjgDwdGJIhI", year = {2023} } @@ -1167,40 +1360,93 @@ @misc{siltalk year = {2015} } -@misc(sil, +@misc{sil, title = "Swift Intermediate Language {(SIL)}", - url = "https://github.com/swiftlang/swift/blob/main/docs/SIL.rst", + url = "https://github.com/swiftlang/swift/blob/main/docs/SIL/SIL.md", + year = {2024} +} + +@misc{siltypes, + title = "{SIL} Type Lowering", + url = "https://github.com/swiftlang/swift/blob/main/docs/SIL/Types.md", + year = {2024} +} + +@misc{abstractleak, + title = "Value wrapped in a thunk recursively(?)", + url = "https://forums.swift.org/t/value-wrapped-in-a-thunk-recursively/75925", + author = "Aleksandr", + year = {2024} +} + +@misc{emitabstract, + author = "Joe Groff", + title = "{SILGen}: Emit literal closures at the abstraction level of their context. 
[take 3]", + url = "https://github.com/swiftlang/swift/pull/39233", + year = {2021} +} + +@misc{optionalpayload, + author = "John McCall", + title = "Abstract the object type of optional types", + url = "https://github.com/swiftlang/swift/pull/4689", year = {2016} -) -@misc(gensig, +} + +@misc{mostopaque, + author = "Slava Pestov", + title = "{SIL}: {S}top imploding parameter list into a single value with opaque abstraction pattern", + url = "https://github.com/swiftlang/swift/pull/19578", + year = {2018} +} + +@misc{gensig, author = "Doug Gregor", title = "Generic Signatures", url = "https://github.com/swiftlang/swift/blob/main/docs/ABI/GenericSignature.md", year = {2018} -) -@misc(reqeval, +} + +@misc{reqeval, author = "Doug Gregor", title = "Request evaluator", url = "https://github.com/swiftlang/swift/blob/main/docs/RequestEvaluator.md", year = {2018} -) -@misc(incremental, +} + +@misc{compileperf, + author = "Graydon Hoare", + title = "Swift compiler performance", + url = "https://github.com/swiftlang/swift/blob/main/docs/CompilerPerformance.md", + year = {2017} +} + +@misc{incremental, author = "Jordan Rose", title = "Dependency analysis", url = "https://github.com/swiftlang/swift/blob/main/docs/DependencyAnalysis.md", year = {2015} -) -@misc(mangling, +} + +@misc{mangling, title = "Mangling", url = "https://github.com/swiftlang/swift/blob/main/docs/ABI/Mangling.rst", year = {2012} -) +} + +@misc{typelayout, + title = "Type layout", + url = "https://github.com/swiftlang/swift/blob/main/docs/ABI/TypeLayout.rst", + year = {2017} +} + @misc{libraryevolution, author = "Jordan Rose and Slava Pestov", title = "Library Evolution", url = "https://github.com/swiftlang/swift/blob/main/docs/LibraryEvolution.rst", year = {2015} } + @misc{rustturing, author = "Shea Leffler", title = "Rust's Type System is {Turing-Complete}", @@ -1222,11 +1468,25 @@ @misc{implrecursive year = {2016} } +@misc{evolutionblog, + author = "Slava Pestov", + title = "Library evolution in 
{S}wift", + url = {https://www.swift.org/blog/library-evolution/}, + year = {2020} +} + +@misc{diagnosticsblog, + title="New diagnostic architecture overview", + author="Pavel Yaskevich", + url = {https://www.swift.org/blog/new-diagnostic-arch-overview/}, + year = {2019} +} + @misc{swift57, author = "Holly Borla", title = "Swift 5.7 released", - year = {2022}, url = {https://www.swift.org/blog/swift-5.7-released/}, + year = {2022} } @misc{brainfuck, @@ -1243,37 +1503,68 @@ @misc{substfunctype year = {2019} } +@misc{opaqueemoji, + author = {David Zarzycki}, + title = {{[NFC]} {C}hange magic emoji to \texttt{\char`_\char`_}}, + url = {https://github.com/swiftlang/swift/pull/35734}, + year = {2021} +} + +@misc{envelopinggroup, + title = {Mastodon post}, + url = {https://gamedev.lgbt/@typeswitch/114915626980225127}, + year = {2025} +} + +@misc{sr55, + title = "{SR-55}: non-@objc existentials do not conform to their own protocol type", + url = "https://github.com/swiftlang/swift/issues/42677", + year = {2015} +} + @misc{sr617, title = "{SR-617}: \texttt{Self} not always resolved dynamically with Generics", url = "https://github.com/swiftlang/swift/issues/43234", year = {2016} } + @misc{sr631, title = "{SR-631}: Extensions in different files do not recognize each other", url = "https://github.com/swiftlang/swift/issues/43248", year = {2016} } + @misc{sr4206, title = "{SR-4206}: Override checking does not properly enforce requirements", url = "https://github.com/swiftlang/swift/issues/46789", year = {2017} } + @misc{sr6724, title = "{SR-6724}: Swift 4.1 crash when using conditional conformance", url = "https://github.com/swiftlang/swift/issues/49273", year = {2018} } + @misc{sr12120, title = "{SR-12120}: Compiler forgets some constraints of {P} within extension to {P}, known bug?", url = "https://github.com/swiftlang/swift/issues/54555", year = {2020} } + @misc{sr2235, title = "{SR-2235}: Redeclared associatedtype inference not working", url = 
"https://github.com/swiftlang/swift/issues/44842", year = {2016}, } +@misc{issue59391, + title = "Swift 5.7: Incorrect compiler error with opaque result types and primary associated types", + author = "Gwendal Rou\'e", + url = "https://github.com/swiftlang/swift/issues/59391", + year = {2022}, +} + @misc{evolution, title = "Swift evolution process", url = "https://www.swift.org/swift-evolution/", @@ -1286,156 +1577,182 @@ @misc{se0011 url = "https://github.com/swiftlang/swift-evolution/blob/main/proposals/0011-replace-typealias-associated.md", year = {2015} } + @misc{se0021, author = "Doug Gregor", title = "{SE-0021}: Naming Functions with Argument Labels", url = "https://github.com/swiftlang/swift-evolution/blob/main/proposals/0021-generalized-naming.md", year = {2016} } + @misc{se0029, author = "Chris Lattner", title = "{SE-0029}: Remove implicit tuple splat behavior from function applications", url = "https://github.com/swiftlang/swift-evolution/blob/main/proposals/0029-remove-implicit-tuple-splat.md", year = {2016} } + @misc{se0035, author = "Joe Groff", title = "{SE-0035}: Limiting \texttt{inout} capture to \texttt{@noescape} contexts", url = "https://github.com/swiftlang/swift-evolution/blob/main/proposals/0035-limit-inout-capture.md", year = {2016} } + @misc{se0048, author = "Chris Lattner", title = "{SE-0048}: Generic type aliases", url = "https://github.com/swiftlang/swift-evolution/blob/main/proposals/0048-generic-typealias.md", year = {2016} } + @misc{se0066, author = "Chris Lattner", title = "{SE-0066}: Standardize function type argument syntax to require parentheses", url = "https://github.com/swiftlang/swift-evolution/blob/main/proposals/0066-standardize-function-type-syntax.md", year = {2016} } + @misc{se0077, author = "Anton Zhilin", title = "{SE-0077}: Improved operator declarations", url = "https://github.com/swiftlang/swift-evolution/blob/main/proposals/0077-operator-precedence.md", year = {2016} } + @misc{se0091, author = "Tony Allevato and Doug 
Gregor", title = "{SE-0091}: Improving operator requirements in protocols", url = "https://github.com/swiftlang/swift-evolution/blob/main/proposals/0091-improving-operators-in-protocols.md", year = {2016} } + @misc{se0110, author = "Vladimir S. and Austin Zheng", title = "{SE-0110}: Distinguish between single-tuple and multiple-argument function types", url = "https://github.com/swiftlang/swift-evolution/blob/main/proposals/0110-distinguish-single-tuple-arg.md", year = {2016} } + @misc{se0111, author = "Austin Zheng", title = "{SE-0111}: Remove type system significance of function argument labels", url = "https://github.com/swiftlang/swift-evolution/blob/main/proposals/0111-remove-arg-label-type-significance.md", year = {2016} } + @misc{se0068, author = "Erica Sadun", title = "{SE-0068}: Expanding Swift \texttt{Self} to class members and value types", url = "https://github.com/swiftlang/swift-evolution/blob/main/proposals/0068-universal-self.md", year = {2016} } + @misc{se0081, author = "David Hart and Robert Widmann and Pyry Jahkola", title = "{SE-0081}: Move \texttt{where} clause to end of declaration", url = "https://github.com/swiftlang/swift-evolution/blob/main/proposals/0081-move-where-expression.md", year = {2016} } + @misc{se0092, author = "David Hart and Doug Gregor", title = "{SE-0092}: Typealiases in protocols and protocol extensions", url = "https://github.com/swiftlang/swift-evolution/blob/main/proposals/0092-typealiases-in-protocols.md", year = {2016} } + @misc{se0095, author = "Adrian Zubarev and Austin Zheng", title = "{SE-0095}: Replace \texttt{protocol} syntax with \texttt{P1 \& P2} syntax", url = "https://github.com/swiftlang/swift-evolution/blob/main/proposals/0095-any-as-existential.md", year = {2016} } + @misc{se0103, author = "Trent Nadeau", title = "{SE-0103}: Make non-escaping closures the default", url = "https://github.com/swiftlang/swift-evolution/blob/main/proposals/0103-make-noescape-default.md", year = {2016} } + @misc{se0142, author 
= "David Hart and Jacob Bandes-Storch and Doug Gregor", title = "{SE-0142}: Permit where clauses to constrain associated types", url = "https://github.com/swiftlang/swift-evolution/blob/main/proposals/0142-associated-types-constraints.md", year = {2017} } + @misc{se0143, author = "Doug Gregor", title = "{SE-0143}: Conditional conformances", url = "https://github.com/swiftlang/swift-evolution/blob/main/proposals/0143-conditional-conformances.md", year = {2016} } + @misc{se0148, author = "Chris Eidhof", title = "{SE-0148}: Generic subscripts", url = "https://github.com/swiftlang/swift-evolution/blob/main/proposals/0148-generic-subscripts.md", year = {2017} } + @misc{se0156, author = "David Hart and Austin Zheng", title = "{SE-0156}: Class and subtype existentials", url = "https://github.com/swiftlang/swift-evolution/blob/main/proposals/0156-subclass-existentials.md", year = {2017} } + @misc{se0157, author = "Doug Gregor and Erica Sadun and Austin Zheng", title = "{SE-0157}: Support recursive constraints on associated types", url = "https://github.com/swiftlang/swift-evolution/blob/main/proposals/0157-recursive-protocol-constraints.md", year = {2017} } + @misc{se0193, author = "Slava Pestov", title = "{SE-0193}: Cross-module inlining and specialization", url = "https://github.com/swiftlang/swift-evolution/blob/main/proposals/0193-cross-module-inlining-and-specialization.md", year = {2018} } + @misc{se0244, author = "Doug Gregor and Joe Groff", title = "{SE-0244}: Opaque result types", url = "https://github.com/swiftlang/swift-evolution/blob/main/proposals/0244-opaque-result-types.md", year = {2019} } + @misc{se0252, author = "Doug Gregor and Pavel Yaskevich", title = "{SE-0252}: Key path member lookup", url = "https://github.com/swiftlang/swift-evolution/blob/main/proposals/0252-keypath-dynamic-member-lookup.md", year = {2019} } + @misc{se0254, author = "Becca Royal-Gordon", title = "{SE-0254}: Static and class subscripts", url = 
"https://github.com/swiftlang/swift-evolution/blob/main/proposals/0254-static-subscripts.md", year = {2019} } + @misc{se0260, author = "Jordan Rose and Ben Cohen", title = "{SE-0260}: Library Evolution for Stable {ABIs}", url = "https://github.com/swiftlang/swift-evolution/blob/main/proposals/0260-library-evolution.md", year = {2019} } + @misc{se0261, author = "Anthony Latsis", title = "{SE-0261}: \texttt{where} clauses on contextually generic declarations", url = "https://github.com/swiftlang/swift-evolution/blob/main/proposals/0267-where-on-contextually-generic.md", year = {2019} } + @misc{se0281, author = "Nate Cook and Nate Chandler and Matt Ricketson", title = "{SE-0281}: \texttt{@main}: Type-Based Program Entry Points", @@ -1463,111 +1780,150 @@ @misc{se0309 url = "https://github.com/swiftlang/swift-evolution/blob/main/proposals/0309-unlock-existential-types-for-all-protocols.md", year = {2022} } + @misc{se0315, author = "Frederick Kellison-Linn", title = "{SE-0315}: Type placeholders", url = "https://github.com/swiftlang/swift-evolution/blob/main/proposals/0315-placeholder-types.md", year = {2021} } + @misc{se0328, author = "Benjamin Driscoll and Holly Borla", title = "{SE-0328}: Structural opaque result types", url = "https://github.com/swiftlang/swift-evolution/blob/main/proposals/0328-structural-opaque-result-types.md", year = {2021} } + +@misc{se0335, + author = "Holly Borla", + title = "{SE-0335}: Introduce existential \texttt{any}", + url = "https://github.com/swiftlang/swift-evolution/blob/main/proposals/0335-existential-any.md", + year = {2021} +} + @misc{se0341, author = "Doug Gregor", title = "{SE-0341}: Opaque parameter declarations", url = "https://github.com/swiftlang/swift-evolution/blob/main/proposals/0341-opaque-parameters.md", year = {2022} } + @misc{se0346, author = "Pavel Yaskevich and Holly Borla and Slava Pestov", title = "{SE-0346}: Lightweight same-type requirements for primary associated types", url = 
"https://github.com/swiftlang/swift-evolution/blob/main/proposals/0346-light-weight-same-type-syntax.md", year = {2022} } + @misc{se0352, author = "Doug Gregor", title = "{SE-0352}: Implicitly opened existentials", url = "https://github.com/swiftlang/swift-evolution/blob/main/proposals/0352-implicit-open-existentials.md", year = {2022} } + @misc{se0353, author = "Robert Widmann", title = "{SE-0353}: Constrained existential types", url = "https://github.com/swiftlang/swift-evolution/blob/main/proposals/0353-constrained-existential-types.md", year = {2022} } -@misc{se0355, - author = "Holly Borla", - title = "{SE-0335}: Introduce existential \texttt{any}", - url = "https://github.com/swiftlang/swift-evolution/blob/main/proposals/0335-existential-any.md", - year = {2021} + +@misc{se0360, + author = "Pavel Yaskevich", + title = "{SE-0360}: Opaque result types with limited availability", + url = "https://github.com/swiftlang/swift-evolution/blob/main/proposals/0360-opaque-result-types-with-availability.md", + year = {2022} } + @misc{se0361, author = "Holly Borla", title = "{SE-0361}: Extensions on bound generic types", url = "https://github.com/swiftlang/swift-evolution/blob/main/proposals/0361-bound-generic-extensions.md", year = {2022} } + @misc{se0364, author = "Harlan Haskins", title = "{SE-0364}: Warning for Retroactive Conformances of External Types", url = "https://github.com/swiftlang/swift-evolution/blob/main/proposals/0364-retroactive-conformance-warning.md", year = {2023} } + @misc{se0377, author = "Michael Gottesman and Joe Groff", title = "{SE-0377}: \texttt{borrowing} and \texttt{consuming} parameter ownership modifiers", url = "https://github.com/swiftlang/swift-evolution/blob/main/proposals/0377-parameter-ownership-modifiers.md", year = {2023} } + @misc{se0383, author = "Robert Widmann", title = "{SE-0383}: Deprecate {@UIApplicationMain} and {@NSApplicationMain}", url = 
"https://github.com/swiftlang/swift-evolution/blob/main/proposals/0383-deprecate-uiapplicationmain-and-nsapplicationmain.md", year = {2023} } + @misc{se0390, author = "Joe Groff and Michael Gottesman and Andrew Trick and Kavon Farvardin", title = "{SE-0390}: Noncopyable structs and enums", url = "https://github.com/swiftlang/swift-evolution/blob/main/proposals/0390-noncopyable-structs-and-enums.md", year = {2023} } + @misc{se0393, author = "Holly Borla and John McCall and Slava Pestov", title = "{SE-0393}: Value and Type Parameter Packs", url = "https://github.com/swiftlang/swift-evolution/blob/main/proposals/0393-parameter-packs.md", year = {2023} } + @misc{se0398, author = "Slava Pestov and Holly Borla", title = "{SE-0398}: Allow Generic Types to Abstract Over Packs", url = "https://github.com/swiftlang/swift-evolution/blob/main/proposals/0398-variadic-types.md", year = {2023} } + @misc{se0399, author = "Sophia Poirier and Holly Borla", title = "{SE-0399}: Tuple of value pack expansion", url = "https://github.com/swiftlang/swift-evolution/blob/main/proposals/0399-tuple-of-value-pack-expansion.md", year = {2023} } + @misc{se0404, author = "Karl Wagner", title = "{SE-0404}: Nested protocols", url = "https://github.com/swiftlang/swift-evolution/blob/main/proposals/0404-nested-protocols.md", year = {2023} } + @misc{se0413, author = "Jorge Revuelta and Torsten Lehmann and Doug Gregor", title = "{SE-0413}: Typed throws", url = "https://github.com/swiftlang/swift-evolution/blob/main/proposals/0413-typed-throws.md", year = {2023} } + @misc{se0427, author = "Kavon Farvardin and Tim Kientzle and Slava Pestov", title = "{SE-0427}: Noncopyable generics", url = "https://github.com/swiftlang/swift-evolution/blob/main/proposals/0427-noncopyable-generics.md", year = {2024} } + +@misc{se0446, + author = "Andrew Trick and Tim Kientzle", + title = "{SE-0446}: Nonescapable Types", + url = "https://github.com/swiftlang/swift-evolution/blob/main/proposals/0446-non-escapable.md", + 
+  year = {2025}
+}
+
+@misc{se0452,
+  author = "Alejandro Alonso and Joe Groff",
+  title = "{SE-0452}: Integer Generic Parameters",
+  url = "https://github.com/swiftlang/swift-evolution/blob/main/proposals/0452-integer-generic-parameters.md",
+  year = {2025}
+}
diff --git a/docs/Generics/generics.tex b/docs/Generics/generics.tex
index 1c6fe88834fe9..61046cc87b975 100644
--- a/docs/Generics/generics.tex
+++ b/docs/Generics/generics.tex
@@ -39,14 +39,28 @@
% Theorem-like environments: proposition, example, algorithm, etc.
\usepackage{amsthm}
+\makeatletter
+
% For the 'List of Algorithms'. Must load before hyperref.
\usepackage{thmtools}
+% Workaround from https://github.com/muzimuzhi/thmtools/issues/72
+\IfFormatAtLeastTF{2024-11-01}{
+  \renewcommand*\@addtoreset[2]{%
+    \bgroup
+      \edef\aliasctr@@truelist{\aliasctr@follow{#2}}%
+      \let\@elt\relax
+      \expandafter\@cons\aliasctr@@truelist{{#1}}%
+    \egroup
+    \expandafter\xdef\csname theH#1\endcsname{%
+      \expandafter\noexpand\csname theH#2\endcsname.%
+      \noexpand\the\noexpand\value{#1}}%
+  }
+}{}
+
\renewcommand\thmtformatoptarg[1]{#1}

% thmtools doesn't offer enough customization options, so hack it up to not print "Algorithm" in each line of the "List of Algorithms"
-\makeatletter
-
\renewcommand\thmt@mklistcmd{%
@@ -68,7 +82,6 @@
}{%
  \ignorespacesafterend
}
-
\makeatother

% Hyperlinks in PDF files. We enable numbering in PDF bookmark titles, and backreferences from bibliography entries, and make the section title in the TOC clickable.
@@ -80,7 +93,7 @@
% \uptau for generic parameter notation
\usepackage{upgreek}

-% rcases, etc
+% rcases, gather*, etc
\usepackage{mathtools}

% Nice horizontal rules in tables: \toprule, \midrule, \bottomrule
@@ -132,9 +145,6 @@
% Each chapter is its own file.
\usepackage{subfiles}

-% Strikethrough (\st)
-\usepackage[normalem]{ulem}
-
% Line breaks in URLs in bibliography
\usepackage{url}
\def\UrlBreaks{\do\/\do-}
@@ -165,6 +175,7 @@
% Use the same counter for all three so you get Definition 12.1,
% Proposition 12.2, Example 12.3 etc.
\newtheorem{example}[definition]{Example}
+\newtheorem{remark}[definition]{Remark}

% We don't want the statement of the theorem to use italics...
% \theoremstyle{theorem}
@@ -172,6 +183,7 @@
\newtheorem{lemma}[definition]{Lemma}
\newtheorem{corollary}[definition]{Corollary}
\newtheorem{theorem}[definition]{Theorem}
+\newtheorem{conjecture}[definition]{Conjecture}

% 'listing' floating environment for source listings
\DeclareNewTOC[
@@ -214,6 +226,8 @@
\newcommand{\LemmaRef}[1]{\hyperref[#1]{Lemma~\ref*{#1}}}
\newcommand{\AlgRef}[1]{\hyperref[#1]{Algorithm~\ref*{#1}}}
\newcommand{\ExRef}[1]{\hyperref[#1]{Example~\ref*{#1}}}
+\newcommand{\RemarkRef}[1]{\hyperref[#1]{Remark~\ref*{#1}}}
+\newcommand{\ConjectureRef}[1]{\hyperref[#1]{Conjecture~\ref*{#1}}}
\newcommand{\FigRef}[1]{\hyperref[#1]{Figure~\ref*{#1}}}
\newcommand{\ListingRef}[1]{\hyperref[#1]{Listing~\ref*{#1}}}
@@ -249,6 +263,7 @@
\newcommand{\rT}{\ttgp{0}{0}}
\newcommand{\rU}{\ttgp{0}{1}}
\newcommand{\rV}{\ttgp{0}{2}}
+\newcommand{\rO}{\ttgp{1}{0}}

% Identifiers
\newcommand{\tT}{\texttt{T}}
@@ -436,6 +451,21 @@
% Archetypes print [[like this]].
% Must be in math mode
\newcommand{\archetype}[1]{[\![\texttt{#1}]\!]}
+\newcommand{\TypeObjCtx}[1]{\TypeObj{\EquivClass{#1}}}
+\newcommand{\ConfObjCtx}[1]{\ConfObj{\EquivClass{#1}}}
+\newcommand{\SubMapObjCtx}[2]{\SubMapObj{#1}{\EquivClass{#2}}}
+\newcommand{\ReqObjCtx}[1]{\ReqObj{\EquivClass{#1}}}
+\newcommand{\FwdMap}[1]{1_{\EquivClass{#1}}}
+
+% Opaque archetypes for Chapter 12
+\newcommand{\Opaque}[1]{\EquivClass{\circlearrowright \texttt{#1}}}
+\newcommand{\Ot}{\Opaque{\tT}}
+\newcommand{\Ox}{\Opaque{\rT}}
+\newcommand{\Oy}{\Opaque{\ttgp{1}{0}}}
+
+% Existential archetypes for Chapter 13
+\newcommand{\Exist}[1]{\EquivClass{\exists\,\texttt{#1}}}
+\newcommand{\Et}{\Exist{T}}

% Generic environment operations
\newcommand{\MapIn}{\mathsf{in}_G}
@@ -491,6 +521,32 @@
\newcommand{\CRule}[1]{$\diamond\, #1$}

+%%%
+% Proof at the end of completion chapter
+%%%
+
+\newcommand{\CC}[2]{#1#2 \Eq #1}
+\newcommand{\TheM}{\Pres{t, p, q, a}{\CC{t}{p},\, \CC{t}{q},\, \CC{pa}{p},\, \CC{qa}{q}}}
+\newcommand{\TheMS}{\Pres{t,p,q,r,a}{\CC{t}{p},\, \CC{t}{q},\, \CC{pa}{p},\, \CC{qa}{q},\, \CC{p}{r},\, \CC{q}{r}}}
+\newcommand{\MS}{M_\succ}
+
+\newcommand{\Eq}{\Leftrightarrow}
+\newcommand{\Red}{\Rightarrow}
+\newcommand{\Irr}{\textsc{Irr}}
+\newcommand{\NotEq}{\not\Eq}
+\newcommand{\PAndQ}{\{p,q\}}
+
+\newcommand{\RInf}{R^\infty}
+
+\newcommand{\Rank}{\mathsf{rank}}
+\newcommand{\Deg}{\mathsf{deg}}
+\newcommand{\First}{\mathsf{first}}
+\newcommand{\Tail}{\mathsf{tail}}
+
+\newcommand{\Ell}{\ell(S_0)}
+\newcommand{\Ts}{\mathfrak{C}_1}
+\newcommand{\FP}{\mathrm{FP}}
+
%%%
% Math helpers
%%%
@@ -505,10 +561,6 @@
\DeclareMathOperator{\Src}{src}
\DeclareMathOperator{\Dst}{dst}

-% Old substitution algebra notation
-\newcommand{\mathboxed}[1]{\boxed{\mbox{\vphantom{pI\texttt{pI}}#1}}}
-\newcommand{\ttbox}[1]{\boxed{\mbox{\vphantom{pI\texttt{pI}}\texttt{#1}}}}
-
% GitHub links in 'Source Code Reference' sections
\newcommand{\SourceFile}[1]{\href{https://github.com/swiftlang/swift/tree/main/#1}{\texttt{#1}}}
@@ -674,7 +726,7 @@
\part{Semantics}\label{part semantics}

\subfile{chapters/type-resolution}

-\part{Specialties}\label{part specialties}
+\part{Subtleties}\label{part subtleties}

\subfile{chapters/extensions}
@@ -682,7 +734,7 @@ \part{Specialties}\label{part specialties}

\subfile{chapters/conformance-paths}

-\subfile{chapters/opaque-return-types}
+\subfile{chapters/opaque-result-types}

\subfile{chapters/existential-types}
@@ -698,9 +750,7 @@ \part{The Requirement Machine}\label{part rqm}

\subfile{chapters/property-map}

-\subfile{chapters/concrete-conformances}
-
-\subfile{chapters/rule-minimization}
+\subfile{chapters/minimization}

% Move subsequent PDF bookmarks to the top level since the Appendix, Bibliography and Index are not logically contained in Part V
\bookmarksetup{startatroot}