Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[libclang/python] Add strict typing to clang Python bindings (#76664) #78114

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

DeinAlptraum
Copy link
Contributor

@DeinAlptraum DeinAlptraum commented Jan 14, 2024

This PR adds type annotations to the clang Python bindings, to the point that they pass a strict MyPy check.
This utilizes Python features up to 3.7, thus ending support of the bindings for versions <= 3.6 (see also discussion here: https://discourse.llvm.org/t/type-annotations-for-libclang-python-bindings/70644). I thus removed some compatibility stuff for older Python versions.

I tried to change as little as possible, but sometimes passing the type check without changing semantics would have been impossible. Such changes mostly come down to e.g. returning an empty string instead of None and similar.
Some changes might look unrelated at first glance, but everything was done specifically to resolve typing errors.

I've left comments in some other places where I'm unsure about my changes, especially where they affect semantics.

I know that this is a massive PR, if there's anything I can do to make reviewing this easier etc. please do tell.

This resolves #76664

Copy link

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be
notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write
permissions for the repository. In which case you can instead tag reviewers by
name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review
by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate
is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot llvmbot added the clang Clang issues not falling into any other category label Jan 14, 2024
@llvmbot
Copy link
Collaborator

llvmbot commented Jan 14, 2024

@llvm/pr-subscribers-clang

Author: Jannick Kremer (DeinAlptraum)

Changes

This PR adds type annotations to the clang Python bindings, to the point that they pass a strict MyPy check.
This utilizes Python features up to 3.7, thus ending support of the bindings for versions <= 3.6 (see also discussion here: https://discourse.llvm.org/t/type-annotations-for-libclang-python-bindings/70644). I thus removed some compatibility stuff for older Python versions.

I tried to change as little as possible, but sometimes passing the type check without changing semantics would have been impossible. Such changes mostly come down to e.g. returning an empty string instead of None and similar.
Some changes might look unrelated at first glance, but everything was done specifically to resolve typing errors.

I've left comments in some other places where I'm unsure about my changes, especially where they affect semantics.


Patch is 151.03 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/78114.diff

1 Files Affected:

  • (modified) clang/bindings/python/clang/cindex.py (+1733-1201)
diff --git a/clang/bindings/python/clang/cindex.py b/clang/bindings/python/clang/cindex.py
index d780ee353a133cb..809527e712acded 100644
--- a/clang/bindings/python/clang/cindex.py
+++ b/clang/bindings/python/clang/cindex.py
@@ -43,7 +43,7 @@
 Most object information is exposed using properties, when the underlying API
 call is efficient.
 """
-from __future__ import absolute_import, division, print_function
+from __future__ import annotations
 
 # TODO
 # ====
@@ -67,89 +67,690 @@
 import clang.enumerations
 
 import os
-import sys
-
-if sys.version_info[0] == 3:
-    # Python 3 strings are unicode, translate them to/from utf8 for C-interop.
-    class c_interop_string(c_char_p):
-        def __init__(self, p=None):
-            if p is None:
-                p = ""
-            if isinstance(p, str):
-                p = p.encode("utf8")
-            super(c_char_p, self).__init__(p)
-
-        def __str__(self):
-            return self.value
-
-        @property
-        def value(self):
-            if super(c_char_p, self).value is None:
-                return None
-            return super(c_char_p, self).value.decode("utf8")
-
-        @classmethod
-        def from_param(cls, param):
-            if isinstance(param, str):
-                return cls(param)
-            if isinstance(param, bytes):
-                return cls(param)
-            if param is None:
-                # Support passing null to C functions expecting char arrays
-                return None
-            raise TypeError(
-                "Cannot convert '{}' to '{}'".format(type(param).__name__, cls.__name__)
-            )
+from enum import Enum
+
+from typing import (
+    Any,
+    Callable,
+    cast as Tcast,
+    Generic,
+    Iterable,
+    Iterator,
+    Optional,
+    Sequence,
+    Type as TType,
+    TypeVar,
+    TYPE_CHECKING,
+    Union as TUnion,
+)
+from typing_extensions import Protocol, TypeAlias
+
+if TYPE_CHECKING:
+    from ctypes import _Pointer, _FuncPointer, _CArgObject
+    from io import TextIOWrapper
+
+    StrPath: TypeAlias = TUnion[str, os.PathLike[str]]
+    InMemoryFile: TypeAlias = (
+        "tuple[TUnion[str, os.PathLike[Any]], TUnion[str, TextIOWrapper]]"
+    )
+    LibFunc: TypeAlias = TUnion[
+        "tuple[str, Optional[list[Any]]]",
+        "tuple[str, Optional[list[Any]], Any]",
+        "tuple[str, Optional[list[Any]], Any, Callable[..., Any]]",
+    ]
+    CObjP: TypeAlias = _Pointer[Any]
 
-        @staticmethod
-        def to_python_string(x, *args):
-            return x.value
+    TSeq = TypeVar("TSeq", covariant=True)
 
-    def b(x):
-        if isinstance(x, bytes):
-            return x
-        return x.encode("utf8")
+    class NoSliceSequence(Protocol[TSeq]):
+        def __len__(self) -> int:
+            ...
 
-elif sys.version_info[0] == 2:
-    # Python 2 strings are utf8 byte strings, no translation is needed for
-    # C-interop.
-    c_interop_string = c_char_p
+        def __getitem__(self, key: int) -> TSeq:
+            ...
 
-    def _to_python_string(x, *args):
-        return x
 
-    c_interop_string.to_python_string = staticmethod(_to_python_string)
+class ClangLib(Protocol):
+    def clang_annotateTokens(
+        self, tu: TranslationUnit, token: _CArgObject, num: int, cursor: _CArgObject
+    ) -> None:
+        ...
 
-    def b(x):
-        return x
+    def clang_CompilationDatabase_dispose(self, cdb: CompilationDatabase) -> None:
+        ...
+
+    def clang_CompilationDatabase_fromDirectory(
+        self, buildDir: str, errorCode: _CArgObject
+    ) -> CompilationDatabase:
+        ...
+
+    def clang_CompilationDatabase_getAllCompileCommands(
+        self, cdb: CompilationDatabase
+    ) -> CompileCommands:
+        ...
+
+    def clang_CompilationDatabase_getCompileCommands(
+        self, cdb: CompilationDatabase, filename: str
+    ) -> CompileCommands:
+        ...
+
+    def clang_CompileCommands_dispose(self, ccmds: CObjP) -> None:
+        ...
+
+    def clang_CompileCommands_getCommand(self, ccmds: CObjP, key: int) -> CObjP:
+        ...
+
+    def clang_CompileCommands_getSize(self, ccmds: CObjP) -> c_uint:
+        ...
+
+    def clang_CompileCommand_getArg(self, cmd: CObjP, key: int) -> str:
+        ...
+
+    def clang_CompileCommand_getDirectory(self, cmd: CObjP) -> str:
+        ...
+
+    def clang_CompileCommand_getFilename(self, cmd: CObjP) -> str:
+        ...
+
+    def clang_CompileCommand_getNumArgs(self, cmd: CObjP) -> int:
+        ...
+
+    def clang_codeCompleteAt(
+        self,
+        tu: TranslationUnit,
+        filename: str,
+        line: int,
+        column: int,
+        unsaved_files: TUnion[int, Array[_CXUnsavedFile]],
+        num_unsaved_files: int,
+        options: int,
+    ) -> _Pointer[CCRStructure]:
+        ...
+
+    def clang_codeCompleteGetDiagnostic(
+        self, ccrs: CodeCompletionResults, key: int
+    ) -> Diagnostic:
+        ...
+
+    def clang_codeCompleteGetNumDiagnostics(self, ccrs: CodeCompletionResults) -> c_int:
+        ...
+
+    def clang_createIndex(self, excludeDecls: int, displayDiagnostics: int) -> CObjP:
+        ...
+
+    def clang_createTranslationUnit(self, index: Index, filename: str) -> CObjP:
+        ...
+
+    def clang_CXXConstructor_isConvertingConstructor(self, cursor: Cursor) -> bool:
+        ...
+
+    def clang_CXXConstructor_isCopyConstructor(self, cursor: Cursor) -> bool:
+        ...
+
+    def clang_CXXConstructor_isDefaultConstructor(self, cursor: Cursor) -> bool:
+        ...
+
+    def clang_CXXConstructor_isMoveConstructor(self, cursor: Cursor) -> bool:
+        ...
+
+    def clang_CXXField_isMutable(self, cursor: Cursor) -> bool:
+        ...
+
+    def clang_CXXMethod_isConst(self, cursor: Cursor) -> bool:
+        ...
+
+    def clang_CXXMethod_isDefaulted(self, cursor: Cursor) -> bool:
+        ...
+
+    def clang_CXXMethod_isDeleted(self, cursor: Cursor) -> bool:
+        ...
+
+    def clang_CXXMethod_isCopyAssignmentOperator(self, cursor: Cursor) -> bool:
+        ...
+
+    def clang_CXXMethod_isMoveAssignmentOperator(self, cursor: Cursor) -> bool:
+        ...
+
+    def clang_CXXMethod_isExplicit(self, cursor: Cursor) -> bool:
+        ...
+
+    def clang_CXXMethod_isPureVirtual(self, cursor: Cursor) -> bool:
+        ...
+
+    def clang_CXXMethod_isStatic(self, cursor: Cursor) -> bool:
+        ...
+
+    def clang_CXXMethod_isVirtual(self, cursor: Cursor) -> bool:
+        ...
+
+    def clang_CXXRecord_isAbstract(self, cursor: Cursor) -> bool:
+        ...
+
+    def clang_EnumDecl_isScoped(self, cursor: Cursor) -> bool:
+        ...
+
+    def clang_defaultDiagnosticDisplayOptions(self) -> int:
+        ...
+
+    def clang_defaultSaveOptions(self, tu: TranslationUnit) -> c_uint:
+        ...
+
+    def clang_disposeCodeCompleteResults(self, tu: CodeCompletionResults) -> None:
+        ...
+
+    def clang_disposeDiagnostic(self, diag: Diagnostic) -> None:
+        ...
+
+    def clang_disposeIndex(self, index: Index) -> None:
+        ...
+
+    def clang_disposeString(self, string: _CXString) -> None:
+        ...
+
+    def clang_disposeTokens(
+        self, tu: TranslationUnit, tokens: _Pointer[Token], count: c_uint
+    ) -> None:
+        ...
+
+    def clang_disposeTranslationUnit(self, tu: TranslationUnit) -> None:
+        ...
+
+    def clang_equalCursors(self, self_c: Cursor, other_c: Cursor) -> bool:
+        ...
+
+    def clang_equalLocations(
+        self, self_loc: SourceLocation, other_loc: SourceLocation
+    ) -> bool:
+        ...
+
+    def clang_equalRanges(self, self_r: SourceRange, other_r: SourceRange) -> bool:
+        ...
+
+    def clang_equalTypes(self, self_t: Type, other_t: Type) -> bool:
+        ...
+
+    def clang_formatDiagnostic(self, diag: Diagnostic, options: int) -> str:
+        ...
+
+    def clang_getAddressSpace(self, type: Type) -> int:
+        ...
+
+    def clang_getArgType(self, parent: Type, key: int) -> Type:
+        ...
+
+    def clang_getArrayElementType(self, type: Type) -> Type:
+        ...
+
+    def clang_getArraySize(self, type: Type) -> int:
+        ...
+
+    def clang_getFieldDeclBitWidth(self, cursor: Cursor) -> int:
+        ...
+
+    def clang_getCanonicalCursor(self, cursor: Cursor) -> Cursor:
+        ...
+
+    def clang_getCanonicalType(self, type: Type) -> Type:
+        ...
+
+    def clang_getChildDiagnostics(self, diag: Diagnostic) -> CObjP:
+        ...
+
+    def clang_getCompletionAvailability(self, cs_obj: CObjP) -> int:
+        ...
+
+    def clang_getCompletionBriefComment(self, cs_obj: CObjP) -> str:
+        ...
+
+    def clang_getCompletionChunkCompletionString(
+        self, cs_obj: CObjP, key: int
+    ) -> CObjP:
+        ...
+
+    def clang_getCompletionChunkKind(self, cs_obj: CObjP, key: int) -> int:
+        ...
+
+    def clang_getCompletionChunkText(self, cs_obj: CObjP, key: int) -> str:
+        ...
+
+    def clang_getCompletionPriority(self, cs_obj: CObjP) -> int:
+        ...
+
+    def clang_getCString(self, string: _CXString) -> str | None:
+        ...
+
+    def clang_getCursor(self, tu: TranslationUnit, loc: SourceLocation) -> Cursor:
+        ...
+
+    def clang_getCursorAvailability(self, cursor: Cursor) -> int:
+        ...
+
+    def clang_getCursorDefinition(self, cursor: Cursor) -> Cursor:
+        ...
+
+    def clang_getCursorDisplayName(self, cursor: Cursor) -> str:
+        ...
+
+    def clang_getCursorExceptionSpecificationType(self, cursor: Cursor) -> int:
+        ...
+
+    def clang_getCursorExtent(self, cursor: Cursor) -> SourceRange:
+        ...
+
+    def clang_getCursorLexicalParent(self, cursor: Cursor) -> Cursor:
+        ...
+
+    def clang_getCursorLinkage(self, cursor: Cursor) -> int:
+        ...
+
+    def clang_getCursorLocation(self, cursor: Cursor) -> SourceLocation:
+        ...
+
+    def clang_getCursorReferenced(self, cursor: Cursor) -> Cursor:
+        ...
+
+    def clang_getCursorResultType(self, cursor: Cursor) -> Type:
+        ...
+
+    def clang_getCursorSemanticParent(self, cursor: Cursor) -> Cursor:
+        ...
+
+    def clang_getCursorSpelling(self, cursor: Cursor) -> str:
+        ...
+
+    def clang_getCursorTLSKind(self, cursor: Cursor) -> int:
+        ...
+
+    def clang_getCursorType(self, cursor: Cursor) -> Type:
+        ...
+
+    def clang_getCursorUSR(self, cursor: Cursor) -> str:
+        ...
+
+    def clang_Cursor_getMangling(self, cursor: Cursor) -> str:
+        ...
+
+    def clang_getCXXAccessSpecifier(self, cursor: Cursor) -> int:
+        ...
+
+    def clang_getDeclObjCTypeEncoding(self, cursor: Cursor) -> str:
+        ...
+
+    def clang_getDiagnostic(self, tu: TranslationUnit, key: int) -> CObjP:
+        ...
+
+    def clang_getDiagnosticCategory(self, diag: Diagnostic) -> int:
+        ...
+
+    def clang_getDiagnosticCategoryText(self, diag: Diagnostic) -> str:
+        ...
+
+    def clang_getDiagnosticFixIt(
+        self, diag: Diagnostic, key: int, sr: _CArgObject
+    ) -> str:
+        ...
+
+    def clang_getDiagnosticInSet(self, diag_set: CObjP, key: int) -> CObjP:
+        ...
+
+    def clang_getDiagnosticLocation(self, diag: Diagnostic) -> SourceLocation:
+        ...
+
+    def clang_getDiagnosticNumFixIts(self, diag: Diagnostic) -> c_uint:
+        ...
+
+    def clang_getDiagnosticNumRanges(self, diag: Diagnostic) -> c_uint:
+        ...
+
+    def clang_getDiagnosticOption(
+        self, diag: Diagnostic, disable: _CArgObject | None
+    ) -> str:
+        ...
+
+    def clang_getDiagnosticRange(self, diag: Diagnostic, key: int) -> SourceRange:
+        ...
+
+    def clang_getDiagnosticSeverity(self, diag: Diagnostic) -> int:
+        ...
+
+    def clang_getDiagnosticSpelling(self, diag: Diagnostic) -> str:
+        ...
+
+    def clang_getElementType(self, type: Type) -> Type:
+        ...
+
+    def clang_getEnumConstantDeclUnsignedValue(self, cursor: Cursor) -> int:
+        ...
+
+    def clang_getEnumConstantDeclValue(self, cursor: Cursor) -> int:
+        ...
+
+    def clang_getEnumDeclIntegerType(self, cursor: Cursor) -> Type:
+        ...
+
+    def clang_getExceptionSpecificationType(self, type: Type) -> int:
+        ...
+
+    def clang_getFile(self, tu: TranslationUnit, filename: str) -> CObjP:
+        ...
+
+    def clang_getFileName(self, file: File) -> str:
+        ...
+
+    def clang_getFileTime(self, file: File) -> int:
+        ...
+
+    def clang_getIncludedFile(self, cursor: Cursor) -> File:
+        ...
+
+    def clang_getInclusions(
+        self,
+        tu: TranslationUnit,
+        callback: _FuncPointer,
+        inclusions: list[FileInclusion],
+    ) -> None:
+        ...
+
+    def clang_getInstantiationLocation(
+        self,
+        loc: SourceLocation,
+        file: _CArgObject,
+        line: _CArgObject,
+        column: _CArgObject,
+        offset: _CArgObject,
+    ) -> None:
+        ...
+
+    def clang_getLocation(
+        self, tu: TranslationUnit, file: File, line: int, column: int
+    ) -> SourceLocation:
+        ...
+
+    def clang_getLocationForOffset(
+        self, tu: TranslationUnit, file: File, offset: int
+    ) -> SourceLocation:
+        ...
+
+    def clang_getNullCursor(self) -> Cursor:
+        ...
+
+    def clang_getNumArgTypes(self, type: Type) -> int:
+        ...
+
+    def clang_getNumCompletionChunks(self, cs_obj: CObjP) -> int:
+        ...
+
+    def clang_getNumDiagnostics(self, tu: TranslationUnit) -> c_uint:
+        ...
+
+    def clang_getNumDiagnosticsInSet(self, diag_set: CObjP) -> c_uint:
+        ...
+
+    def clang_getNumElements(self, type: Type) -> int:
+        ...
+
+    def clang_getPointeeType(self, type: Type) -> Type:
+        ...
+
+    def clang_getRange(self, start: SourceLocation, end: SourceLocation) -> SourceRange:
+        ...
+
+    def clang_getRangeEnd(self, range: SourceRange) -> SourceLocation:
+        ...
+
+    def clang_getRangeStart(self, range: SourceRange) -> SourceLocation:
+        ...
+
+    def clang_getResultType(self, type: Type) -> Type:
+        ...
+
+    def clang_getTokenExtent(self, tu: TranslationUnit, token: Token) -> SourceRange:
+        ...
+
+    def clang_getTokenKind(self, token: Token) -> int:
+        ...
+
+    def clang_getTokenLocation(
+        self, tu: TranslationUnit, token: Token
+    ) -> SourceLocation:
+        ...
+
+    def clang_getTokenSpelling(self, tu: TranslationUnit, token: Token) -> str:
+        ...
+
+    def clang_getTranslationUnitCursor(self, tu: TranslationUnit) -> Cursor:
+        ...
+
+    def clang_getTranslationUnitSpelling(self, tu: TranslationUnit) -> str:
+        ...
+
+    def clang_getTypeDeclaration(self, type: Type) -> Cursor:
+        ...
+
+    def clang_getTypedefDeclUnderlyingType(self, cursor: Cursor) -> Type:
+        ...
+
+    def clang_getTypedefName(self, type: Type) -> str:
+        ...
+
+    def clang_getTypeKindSpelling(self, kind: int) -> str:
+        ...
+
+    def clang_getTypeSpelling(self, type: Type) -> str:
+        ...
+
+    def clang_hashCursor(self, cursor: Cursor) -> int:
+        ...
+
+    def clang_isAttribute(self, kind: CursorKind) -> bool:
+        ...
+
+    def clang_isConstQualifiedType(self, type: Type) -> bool:
+        ...
+
+    def clang_isCursorDefinition(self, cursor: Cursor) -> bool:
+        ...
 
+    def clang_isDeclaration(self, kind: CursorKind) -> bool:
+        ...
 
-# Importing ABC-s directly from collections is deprecated since Python 3.7,
-# will stop working in Python 3.8.
-# See: https://docs.python.org/dev/whatsnew/3.7.html#id3
-if sys.version_info[:2] >= (3, 7):
-    from collections import abc as collections_abc
-else:
-    import collections as collections_abc
+    def clang_isExpression(self, kind: CursorKind) -> bool:
+        ...
 
-# We only support PathLike objects on Python version with os.fspath present
-# to be consistent with the Python standard library. On older Python versions
-# we only support strings and we have dummy fspath to just pass them through.
-try:
-    fspath = os.fspath
-except AttributeError:
+    def clang_isFunctionTypeVariadic(self, type: Type) -> bool:
+        ...
 
-    def fspath(x):
+    def clang_isInvalid(self, kind: CursorKind) -> bool:
+        ...
+
+    def clang_isPODType(self, type: Type) -> bool:
+        ...
+
+    def clang_isPreprocessing(self, kind: CursorKind) -> bool:
+        ...
+
+    def clang_isReference(self, kind: CursorKind) -> bool:
+        ...
+
+    def clang_isRestrictQualifiedType(self, type: Type) -> bool:
+        ...
+
+    def clang_isStatement(self, kind: CursorKind) -> bool:
+        ...
+
+    def clang_isTranslationUnit(self, kind: CursorKind) -> bool:
+        ...
+
+    def clang_isUnexposed(self, kind: CursorKind) -> bool:
+        ...
+
+    def clang_isVolatileQualifiedType(self, type: Type) -> bool:
+        ...
+
+    def clang_parseTranslationUnit(
+        self,
+        index: Index,
+        filename: str | None,
+        args_array: Array[c_char_p] | None,
+        num_args: int,
+        unsaved_files: Array[_CXUnsavedFile] | None,
+        num_unsaved: int,
+        options: int,
+    ) -> CObjP:
+        ...
+
+    def clang_reparseTranslationUnit(
+        self,
+        tu: TranslationUnit,
+        num_unsaved: int,
+        unsabed_files: Array[_CXUnsavedFile] | int,
+        options: int,
+    ) -> c_int:
+        ...
+
+    def clang_saveTranslationUnit(
+        self, tu: TranslationUnit, filename: str, options: c_uint
+    ) -> int:
+        ...
+
+    def clang_tokenize(
+        self,
+        tu: TranslationUnit,
+        range: SourceRange,
+        tokens_memory: _CArgObject,
+        tokens_count: _CArgObject,
+    ) -> None:
+        ...
+
+    def clang_visitChildren(
+        self, cursor: Cursor, callback: _FuncPointer, children: list[Cursor]
+    ) -> c_uint:
+        ...
+
+    def clang_Cursor_getNumArguments(self, cursor: Cursor) -> int:
+        ...
+
+    def clang_Cursor_getArgument(self, cursor: Cursor, key: int) -> Cursor:
+        ...
+
+    def clang_Cursor_getNumTemplateArguments(self, cursor: Cursor) -> int:
+        ...
+
+    def clang_Cursor_getStorageClass(self, cursor: Cursor) -> int:
+        ...
+
+    def clang_Cursor_getTemplateArgumentKind(
+        self, cursor: Cursor, key: int
+    ) -> TemplateArgumentKind:
+        ...
+
+    def clang_Cursor_getTemplateArgumentType(self, cursor: Cursor, key: int) -> Type:
+        ...
+
+    def clang_Cursor_getTemplateArgumentValue(self, cursor: Cursor, key: int) -> int:
+        ...
+
+    def clang_Cursor_getTemplateArgumentUnsignedValue(
+        self, cursor: Cursor, key: int
+    ) -> int:
+        ...
+
+    def clang_Cursor_isAnonymous(self, cursor: Cursor) -> bool:
+        ...
+
+    def clang_Cursor_isBitField(self, cursor: Cursor) -> bool:
+        ...
+
+    def clang_Cursor_getBriefCommentText(self, cursor: Cursor) -> str:
+        ...
+
+    def clang_Cursor_getRawCommentText(self, cursor: Cursor) -> str:
+        ...
+
+    def clang_Cursor_getOffsetOfField(self, cursor: Cursor) -> int:
+        ...
+
+    def clang_Location_isInSystemHeader(self, loc: SourceLocation) -> bool:
+        ...
+
+    def clang_Type_getAlignOf(self, type: Type) -> int:
+        ...
+
+    def clang_Type_getClassType(self, type: Type) -> Type:
+        ...
+
+    def clang_Type_getNumTemplateArguments(self, type: Type) -> int:
+        ...
+
+    def clang_Type_getTemplateArgumentAsType(self, type: Type, key: int) -> Type:
+        ...
+
+    def clang_Type_getOffsetOf(self, type: Type, fieldname: str) -> int:
+        ...
+
+    def clang_Type_getSizeOf(self, type: Type) -> int:
+        ...
+
+    def clang_Type_getCXXRefQualifier(self, type: Type) -> int:
+        ...
+
+    def clang_Type_getNamedType(self, type: Type) -> Type:
+        ...
+
+    def clang_Type_visitFields(
+        self, type: Type, callback: _FuncPointer, fields: list[Cursor]
+    ) -> c_uint:
+        ...
+
+
+# Python 3 strings are unicode, translate them to/from utf8 for C-interop.
+class c_interop_string(c_char_p):
+    def __init__(self, p: str | bytes | None = None):
+        if p is None:
+            p = ""
+        if isinstance(p, str):
+            p = p.encode("utf8")
+        super(c_char_p, self).__init__(p)
+
+    def __str__(self) -> str:
+        return self.value or ""
+
+    @property
+    def value(self) -> st...
[truncated]

from enum import Enum

from typing import (
Any,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used Any (mostly) to label ununused arguments. More specific annotations would have been possible in some cases, but I did not find this worth the effort.

@DeinAlptraum
Copy link
Contributor Author

@AaronBallman could you review this or recommend reviewers? I didn't see any category in clang/CodeOwners.rst that seems to cover the bindings.

@DeinAlptraum DeinAlptraum marked this pull request as draft January 15, 2024 10:17
@DeinAlptraum
Copy link
Contributor Author

DeinAlptraum commented Jan 15, 2024

I was just made aware of some changes by @linux4life798 that seem reasonable to discuss/merge before this, so I've returned this to draft status for now. Sorry for the confusion.

@DeinAlptraum DeinAlptraum marked this pull request as ready for review June 9, 2024 21:46
@DeinAlptraum
Copy link
Contributor Author

Since #83962 is now closed and the minimum Python version updated to 3.8, this is now finally ready for review. I've updated this with the changes to the Python bindings of the past couple months.

I have no idea idea what the error in the test failure means, so any advice would be appreciated.

@AaronBallman could you review this or recommend reviewers?

@DeinAlptraum DeinAlptraum changed the title [clang-bindings] Add strict typing to clang Python bindings (#76664) [libclang/python] Add strict typing to clang Python bindings (#76664) Jun 9, 2024
@AaronBallman AaronBallman added the clang:as-a-library libclang and C++ API label Jun 11, 2024
Copy link
Contributor

@Endilll Endilll left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your great work!
However, I don't think this can be merged as is:

  1. I have maintainability concerns about ClangLib protocol. From what I see, clang_-prefixed functions are not really intended to be a user-facing interface, and instead wrapped in Python code that has type annotations. When a new function is added to libclang, we won't just add a wrapper, implement it, and annotate, but also add the clang_-prefixed function into ClangLib, and basically duplicate the annotation for it.
  2. I see several bugfixes that you highlighted with your comments. I believe they should be done as a separate PR, because they do something else than just add typing annotations.
  3. Changes to enums are massive, and feel somewhat out of place in this PR as well.

Splitting things out would also help reviewing. Several times I found myself in a "this is indented, but which scope I'm in again? Need to scroll all the way up to find out" situation.

@DeinAlptraum
Copy link
Contributor Author

DeinAlptraum commented Jun 11, 2024

Thanks a lot for your feedback! Yup I get that the PR is pretty big and might still need significant changes.

1. I have maintainability concerns about `ClangLib` protocol [...]

I completely agree that this is ugly, but I didn't find a better solution that would enable a strict type check. How do you think we should handle this?

  1. Do you know of another solution that would correctly annotate this? It would be nice if we could just reuse the types as described in functionList but I don't think this is possible
  2. Alternatively, we could type: ignore all calls to ClangLib attributes. That would require about 180 such annotations I believe. Or otherwise give the protocol class a __getattr__ that returns Any, in which case we only need to type: ignore the non-matching return values in about 120 places
  3. Further, we could go without a strict type-check, only fixing type errors to pass a non-strict typecheck, as well as annotating the user-facing interfaces

Regarding the other two points: I tried to change as little as possible here in order to enable type annotations or fix type errors. While there's a lot of places in cindex.py that could use refactoring etc. I held off on doing this unless it was necessary for annotations. Both

2. I see several bugfixes that you highlighted with your comments. I believe they should be done as a separate PR, because they do something else than just add typing annotations.

and

3. Changes to enums are massive, and feel somewhat out of place in this PR as well.

were a direct result of my attempt to fix type errors or annotate interfaces correctly.
The enum changes were also necessary, since the implementation up to now "dynamically" assigned the enum variants after declaration of the class. That means if you use e.g. CursorKind.UNEXPOSED_DECL in a script, this will work fine but fail the type check since it doesn't recognize UNEXPOSED_DECL as an attribute of CursorKind. This is thus effectively part of annotating the user-facing interfaces.

Should I close this PR for now and split this into multiple PRs for the bugfixes and then several smaller, grouped changes?

@Endilll
Copy link
Contributor

Endilll commented Jun 11, 2024

Conceptually, ClangLib works for strict type checking, because implementations are opaque, so the checker has to simply believe the annotations. What I want is to push this opaqueness boundary closer to the user-facing API, and get rid of the large body of code that is here just to satisfy strict checker without providing much value.

Alternatively, we could type: ignore all calls to ClangLib attributes. That would require about 180 such annotations I believe.

This sounds fine to me. Even though I don't care too much about strict type checkers, ubiquitous // type: ignore on calls to clang_-prefixed functions should be rather easy to pick up by both authors and reviewers of future PRs.


Should I close this PR for now and split this into multiple PRs for the bugfixes and then several smaller, grouped changes?

I didn't intend you to close this PR in any case. My thinking was that you submit some of the changes from this PR and another PR, it gets merged into main, then you merge main back into this PR, eliminating changes from here.

That said, what you said about enums sounds convincing. I still would like bugs to be fixed separately (hopefully with tests). The fact that they get in the way of this PR is good enough to show that they are actually bugs to be fixed.

DeinAlptraum pushed a commit to DeinAlptraum/llvm-project that referenced this pull request Jun 14, 2024
Use Python's builtin enum class instead of writing our own.

This is preparation for strict typing in PR llvm#78114
Endilll added a commit that referenced this pull request Jul 12, 2024
Use Python's builtin enum class instead of writing our own.

This is preparation for passing a strict type check in PR #78114 ,
fixing 927 out of 1341 strict typing errors

---------

Co-authored-by: Jannick Kremer <jannick-kremer@gmx.de>
Co-authored-by: Vlad Serebrennikov <serebrennikov.vladislav@gmail.com>
aaryanshukla pushed a commit to aaryanshukla/llvm-project that referenced this pull request Jul 14, 2024
Use Python's builtin enum class instead of writing our own.

This is preparation for passing a strict type check in PR llvm#78114 ,
fixing 927 out of 1341 strict typing errors

---------

Co-authored-by: Jannick Kremer <jannick-kremer@gmx.de>
Co-authored-by: Vlad Serebrennikov <serebrennikov.vladislav@gmail.com>
Copy link

github-actions bot commented Jul 30, 2024

⚠️ Python code formatter, darker found issues in your code. ⚠️

You can test this locally with the following command:
darker --check --diff -r 55255669077b191043b1a8920107890815151835...04641f7ea15382df2d43fecc32bd1b699e0b2481 clang/bindings/python/clang/cindex.py
View the diff from darker here.
--- cindex.py	2024-07-31 12:40:11.000000 +0000
+++ cindex.py	2024-07-31 13:03:03.564878 +0000
@@ -3054,11 +3054,13 @@
     # Used to indicate that brief documentation comments should be included
     # into the set of code completions returned from this translation unit.
     PARSE_INCLUDE_BRIEF_COMMENTS_IN_CODE_COMPLETION = 128
 
     @staticmethod
-    def process_unsaved_files(unsaved_files: list[InMemoryFile]) -> Array[_CXUnsavedFile] | None:
+    def process_unsaved_files(
+        unsaved_files: list[InMemoryFile],
+    ) -> Array[_CXUnsavedFile] | None:
         unsaved_array = None
         if len(unsaved_files):
             unsaved_array = (_CXUnsavedFile * len(unsaved_files))()
             for i, (name, contents) in enumerate(unsaved_files):
                 if hasattr(contents, "read"):

@DeinAlptraum
Copy link
Contributor Author

I rebased this on the other changes, so there's mostly relatively simple annotations left now. Still a few hundred of them though. @Endilll do you want me to split this further, or is this fine?

Copy link
Contributor

@Endilll Endilll left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+471 −397 is still a bit too much to my liking. I think there's a way for further splitting. I left comments about possible direction.

clang/bindings/python/clang/cindex.py Show resolved Hide resolved
clang/bindings/python/clang/cindex.py Outdated Show resolved Hide resolved
Endilll pushed a commit that referenced this pull request Jul 31, 2024
On its own, this change leads to _more_ strict typing errors as the
functions are mostly not annotated so far, so the `# type: ignore`s are
reported as Unused. This is part of the work leading up to #78114
though, and one of the bigger parts factored out from it, so these will
later lead to less strict typing errors as the functions are annotated
with return types.
@DeinAlptraum
Copy link
Contributor Author

I'm planning to do one last split that contains all "logic" changes, after which this PR should contain only actual typing annotations and nothing else. Will open a PR for that other change one of these days

clementval pushed a commit to clementval/llvm-project that referenced this pull request Jul 31, 2024
…101310)

On its own, this change leads to _more_ strict typing errors as the
functions are mostly not annotated so far, so the `# type: ignore`s are
reported as Unused. This is part of the work leading up to llvm#78114
though, and one of the bigger parts factored out from it, so these will
later lead to less strict typing errors as the functions are annotated
with return types.
banach-space pushed a commit to banach-space/llvm-project that referenced this pull request Aug 7, 2024
…101310)

On its own, this change leads to _more_ strict typing errors as the
functions are mostly not annotated so far, so the `# type: ignore`s are
reported as Unused. This is part of the work leading up to llvm#78114
though, and one of the bigger parts factored out from it, so these will
later lead to less strict typing errors as the functions are annotated
with return types.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:as-a-library libclang and C++ API clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[clang/python] Add type annotation to libclang Python binding
4 participants