-
-
Notifications
You must be signed in to change notification settings - Fork 31.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor makeunicodedata.py: dedupe parsing, use dataclass #81941
Comments
I spent some time yesterday on bpo-18236, and I have a patch for it. Most of that work happens in the script Tools/unicode/makeunicode.py , and along the way I made several changes there that I found made it somewhat nicer to work on, and I think will help other people reading that script too. I'd like to try to merge those improvements first. The main changes are:
There's no radical restructuring or rewrite here; this script has served us well. I've kept these changes focused where there's a high ratio of value, in future ease of development, to cost, in a reviewer's effort as well as mine. I'll send PRs of my changes shortly. |
Just posted three PRs:
Two remaining patches go on top of #59335. Here are drafts, in case they're helpful for reference:
I figure they may be easiest to review after the PR they depend on is merged, so my plan is to send PRs for them each in turn. |
Thanks for working this. In your interested in doing some more hacking on Unicode data, there's bpo-32771. |
$ find -name "*db.h"
./Modules/unicodedata_db.h
./Modules/unicodename_db.h
./Objects/unicodetype_db.h For .gitattributes, I would prefer to use unicode*_db.h pattern rather than "a generic" *_db.h, to avoid masking other potential file called "something_db.h" in the future in GitHub. But it's not really an issue right now ;-) |
"from typing import *" I like to run pyflakes time to time on the Python code base. Please avoid "import *" since it prevents pyflakes (and other code analyzers) to find bugs. Example: $ pyflakes Tools/unicode/makeunicodedata.py
Tools/unicode/makeunicodedata.py:35: 'from typing import *' used; unable to detect undefined names
Tools/unicode/makeunicodedata.py:921: 'Iterator' may be undefined, or defined from star imports: typing
Tools/unicode/makeunicodedata.py:921: 'List' may be undefined, or defined from star imports: typing
Tools/unicode/makeunicodedata.py:921: 'Iterator' may be undefined, or defined from star imports: typing
... |
BTW: Since when do we use type annotations in Python's stdlib ? |
Ah fair enough, thanks! Pushed that change to the next/current PR, #59453. |
Hmm, interesting question! At a quick grep, it's in a handful of places in the stdlib: asyncio, functools, importlib. The earliest it appeared was in 3.7.0a4. It's in more places in the test suite, which I think is a closer parallel to this maintainer script in Tools/. The typing module itself is in the stdlib, so I don't see any obstacle to using it more widely. I imagine the main reason it doesn't appear more widely already is simply that it's new, and most of the stdlib is quite stable. |
What is the minimal Python version for developing CPython? The system Python 3 on current Ubuntu LTS (18.04) is 3.6, so I think it should not be larger. |
Ah, I think my previous message had an ambiguous parse: the earliest that *uses* of the typing module appeared in the stdlib was 3.7. The typing module has been around longer than that. I just checked and I think it would be OK for doing development on CPython to require the latest minor version (i.e. 3.7) -- after all, if you're doing development, you're already building it, so you can always get a newer version than your system provides if needed. But happily the question is moot here, so I guess the place to discuss that further would be a new thread. |
This is good. But the title mentioned dataclasses, and they are 3.7+. |
Ahh, sorry, I think now I understand you. :-) Indeed, when I switch to the branch with that change (gnprice@2b4aec4dd -- it comes after the patch that's #59453, so I haven't yet sent it as a PR), then I think this is fine. Most of all that's because this always works:
Anyone who's going to be running that script will want to build a In fact |
From my perspective, the main problem with using type annotations is that there's nothing checking them in CI. |
Yeah, fair concern. In fact I think I'm on video (from PyCon 2018) warning everyone not to do that in their codebases, because what you really don't want is a bunch of annotations that have gradually accumulated falsehoods as the code has changed around them. Still, I think from "some annotations + no checking" the good equilibrium to land in "some annotations + checking", not "no annotations + no checking". (I do mean "some" -- I don't predict we'll ever go sweep all over adding them.) And I think the highest-probability way to get there is to let them continue to accumulate where people occasionally add them in new/revised code... because that holds a door open for someone to step up to start checking them, and then to do the work to make that part of CI. (That someone might even be me! But I can think of plenty of other likely folks to do it.) If we accumulated quite a lot of them and nobody had yet stepped up to make checking happen, I'd worry. But with the smattering we currently have, I think that point is far off. |
Even if unchecked, type annotations can serve as builtin documentation, as docstrings (even when docstrings are not checked ;-)). Well, I don't have a strong opinion on type annotations in the Python stdlib. But here we are talking about a tool which is not installed, so I don't think that it matters much. It's not really part of the "stdlib". |
On Wed, Aug 14, 2019, at 03:25, STINNER Victor wrote:
Yes, but no one has the expectation that docstrings are automatically verified in any way. :) |
(A bit easy to miss in the way this thread gets displayed, so to highlight in a comment: #59470 is up, following the 5 other patches which have now all been merged. That's the one that replaces the length-18 tuples with a dataclass.) |
Thanks Benjamin for reviewing and merging this series! |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: