
Language specifications for strings #1370

Open
boxed opened this issue Mar 15, 2023 · 10 comments

@boxed

boxed commented Mar 15, 2023

In PyCharm you can set the programming/markup language of strings like this:

# language=html
foo = '<div>hello there!</div>'

I find this extremely useful and use it all over the place. A pattern I noticed is that a large proportion of such uses are in fact more like:

# language=html
foo = format_html("<div>{}</div>", bar)

In this case the format_html function doesn’t take any random string as the first argument, it takes a string that is supposed to be html. Reading through my usage of # language= I see that I have CSS, HTML, and JavaScript.

There are many places in my codebases that also contain very different kinds of “languages” that unfortunately aren’t supported as language injections in PyCharm. Some examples:

fully qualified name (module.module.symbol, for example view functions in Django)
module names (module.module, for example app names in Django)

There are probably more, but I think this gets the point across.

PyCharm has some (presumably hardcoded) rules about some of these strings, for example in settings.py it knows that strings in the list INSTALLED_APPS are module names, so you can jump to the definitions of those modules, and PyCharm will resolve and check them for you. But this is a closed system where any introduced variables I create myself can’t be validated in this way.
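As a rough illustration of what checking a "module name" string could involve, here is a minimal sketch using the standard library's importlib.util.find_spec. The helper name module_exists and the sample list are assumptions made for this example (standard-library modules stand in for Django app names); this is not how PyCharm implements its check.

```python
import importlib.util


def module_exists(dotted_name: str) -> bool:
    """Return True if dotted_name resolves to an importable module."""
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except (ImportError, ValueError):
        # find_spec imports parent packages first; a missing parent raises
        # ModuleNotFoundError, which is a subclass of ImportError.
        return False


# Stand-in for a settings list whose entries are meant to be module names.
INSTALLED_APPS = ["json", "email.mime", "no_such_package.models"]

unresolved = [name for name in INSTALLED_APPS if not module_exists(name)]
print(unresolved)
```

A tool could only run a check like this on arbitrary user-defined variables if something told it that the strings are module names, which is the gap described above.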

I think it would be good if python typing could have a facility for this type of thing. When this gets some traction we could see support for it in Python language servers, PyCharm, static analysis tools, etc.

What do you guys think?

(originally posted at https://discuss.python.org/t/language-specifications-for-strings/21826/1 where it was suggested that this is the right place for this discussion)

@srittau
Collaborator

srittau commented Mar 15, 2023

There are currently two ways that this could be implemented. Using NewType, which constrains the type:

from typing import NewType, cast

HTMLString = NewType("HTMLString", str)
def print_html(html: HTMLString) -> None: ...

html = cast(HTMLString, "<div>Hi!</div>")
print_html(html)  # accepted: html is an HTMLString
print_html("<script ...>")  # type checker error: a plain str is not an HTMLString

Using Annotated:

from typing import Annotated

def print_html(html: str) -> None: ... 

html: Annotated[str, "html"] = "<div>Hi!</div>"
print_html(html)

In both cases, the hard part is finding common ground for tool manufacturers to support this notation.

@AlexWaygood
Member

(originally posted at https://discuss.python.org/t/language-specifications-for-strings/21826/1 where it was suggested that this is the right place for this discussion)

Sorry for the ping-pong -- this issue really belongs over at the python/typing repo, rather than python/typeshed. @srittau, could you transfer it over? :)

srittau transferred this issue from python/typeshed on Mar 15, 2023
@boxed
Author

boxed commented Mar 15, 2023

@srittau

In both cases, the hard part is finding common ground for tool manufacturers to support this notation.

100% agreed. It's clearly a "build it and they will come" kind of thing. It has to be built first, and people need to start pushing implementations for it all over the place. I can certainly write lots of PRs, but if the basic architecture isn't there, then I can't :P

@erictraut
Collaborator

There has been some discussion of this topic in the pylance discussion forum.

@boxed
Author

boxed commented Mar 24, 2023

@srittau Looking again at your example code I think you must have misunderstood the idea. Changing your example to what I want:

def print_html(html: HTMLString) -> None: ...


html = cast(HTMLString, "<div>Hi!</div>")
print_html(html)  # works
print_html("<script ...>")  # ALSO WORKS 

The proposal here is that in the second case (print_html("<script ...>")) the tooling could know it's an HTML fragment, and so could apply syntax highlighting, for example, or check for matching tags.

For re.sub this would be a huge step up in usability! And we can specify these languages for strings centrally in Python's standard library or in typeshed, and when PyCharm and the Python Language Server support this, everyone will get syntax highlighting for regexes in their regex calls automatically, without having to change their code and annotate every argument.
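To make the idea concrete, here is a minimal sketch of how a parameter-level language tag could be declared once and then discovered by tooling. The Language marker class, the wrapper function sub, and the helper parameter_languages are all hypothetical names invented for this example; nothing like them exists in typing or typeshed today.

```python
import re
from dataclasses import dataclass
from typing import Annotated, get_args, get_origin, get_type_hints


@dataclass(frozen=True)
class Language:
    """Hypothetical marker; not part of typing or any existing tool."""
    name: str


# Hypothetical signature in the style a stub for re.sub might use.
def sub(pattern: Annotated[str, Language("regex")], repl: str, string: str) -> str:
    return re.sub(pattern, repl, string)


def parameter_languages(func) -> dict[str, str]:
    """Map parameter names to the language declared in their annotation."""
    hints = get_type_hints(func, include_extras=True)
    found = {}
    for param, hint in hints.items():
        if get_origin(hint) is Annotated:
            # get_args returns the underlying type first, then the metadata.
            for meta in get_args(hint)[1:]:
                if isinstance(meta, Language):
                    found[param] = meta.name
    return found


print(parameter_languages(sub))
```

With the tag attached to the parameter, a call such as sub(r"\d+", "#", text) needs no annotation at the call site; an editor could look up the parameter's tag and highlight the literal accordingly.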

@boxed
Author

boxed commented Apr 17, 2023

A discussion at microsoft/pylance-release#3952 led to the concrete suggestion of a Language type used as Language['filename_extension'].

@erictraut
Collaborator

I don't think Language['filename_extension'] would work. It would require a bunch of special casing in every runtime type checking library and static analysis tool because a quoted string in the position of a type argument is generally treated as a forward-declared symbol and is parsed as such. Literal is only one exception to this today, and that requires significant special casing.

The other suggestions above, including NewType and Annotated, do not suffer from this problem.
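The runtime side of this distinction can be observed directly with the typing module. The sketch below only shows what typing itself does at runtime; static type checkers have their own parsers, so it illustrates the point rather than proving it.

```python
from typing import Annotated, ForwardRef, List, Literal, get_args

# A bare string used as a type argument is converted to a forward reference.
(arg,) = get_args(List["html"])
print(isinstance(arg, ForwardRef), arg.__forward_arg__)

# Literal is special-cased: the string stays a plain value.
print(get_args(Literal["html"]))

# Annotated leaves everything after the first argument untouched as metadata.
print(Annotated[str, "html"].__metadata__)
```

In the first case "html" becomes the name of a symbol to be resolved later, whereas Literal and Annotated keep it as an ordinary string.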

@boxed
Author

boxed commented Apr 17, 2023

@erictraut I don't understand the distinction. Why can't Language["html"] be implemented as some variant of Annotated[str, "html"]?

Anyway, I think my point was more about using the word "language" somewhere. Annotated[str, "html"] suffers from adding an annotation string "html" without the context that it's a language it's talking about. To a human that knows a ton about programming that's not a big difference, but to tooling it's the difference between being able to automatically install language support for the specified language, and not.
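For what it's worth, a runtime-only version of Language["html"] layered on Annotated is easy to sketch. The names Language and LanguageTag are hypothetical and defined only for this example. Note that this covers runtime behaviour only: a static checker reading Language["html"] in source would still need to be taught that "html" is not a forward reference, which is the objection raised above.

```python
from dataclasses import dataclass
from typing import Annotated, get_args


@dataclass(frozen=True)
class LanguageTag:
    """Hypothetical metadata object that says explicitly: this is a language."""
    name: str


class Language:
    """Hypothetical sugar: Language["html"] expands to Annotated[str, LanguageTag("html")]."""

    def __class_getitem__(cls, name: str):
        return Annotated[str, LanguageTag(name)]


Html = Language["html"]
print(get_args(Html))
```

Using a dedicated LanguageTag object rather than the bare string "html" gives tooling the context that the metadata names a language.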

@adriangb

I really like the Annotated solution. Runtime typing has been trying to standardize related metadata, the latest effort is over at https://github.com/annotated-types/annotated-types. It would be very cool if you could do something like:

class MyModel:
    template: Html  # aka Annotated[str, Html(flavor=?)] or similar
    regex: Regex  # aka Annotated[str, Regex(flags=0)] or similar

And have that be available both for static type checking and runtime type checking, so that MyModel(regex="<some invalid regex>") gets flagged by static type checkers and MyModel.validate({'regex': 'some unverified data'}) gets runtime type checked, all from a single annotation without having to duplicate it.
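The runtime half of this can be sketched with the standard library alone. RegexMeta and the Regex alias are hypothetical names defined here for illustration; they are not taken from annotated-types, and the validate method is only one possible shape for such a check.

```python
import re
from dataclasses import dataclass
from typing import Annotated, get_args, get_origin, get_type_hints


@dataclass(frozen=True)
class RegexMeta:
    """Hypothetical marker: the annotated string must compile as a regex."""
    flags: int = 0


Regex = Annotated[str, RegexMeta()]


class MyModel:
    regex: Regex

    def __init__(self, regex: str) -> None:
        self.regex = regex

    @classmethod
    def validate(cls, data: dict) -> "MyModel":
        # Read the class-level annotations, keeping the Annotated metadata.
        hints = get_type_hints(cls, include_extras=True)
        for field, hint in hints.items():
            if get_origin(hint) is not Annotated:
                continue
            for meta in get_args(hint)[1:]:
                if isinstance(meta, RegexMeta):
                    try:
                        re.compile(data[field], meta.flags)
                    except re.error as exc:
                        raise ValueError(f"{field!r} is not a valid regex: {exc}") from exc
        return cls(**data)


model = MyModel.validate({"regex": r"\d+"})
print(model.regex)
```

Calling MyModel.validate({"regex": "("}) raises ValueError here, because the pattern does not compile. The static half, flagging an invalid literal before the code runs, would still depend on type checkers agreeing on what the marker means.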

@boxed
Author

boxed commented Jan 26, 2024
