-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Automatic PSL formatting: what fixes should be applied? #2028
Comments
I agree with what I think you're proposing. Let's do the private section bits first and do all of them. Let's figure out how to integrate this with the already automatic script for the ICANN section later. |
Acknowledged. I've set the initial configuration to focus on the private section, and we can look at the icann block later. |
@danderson looks like there is a strange trailing crlf just before the ICANN section close that is causing the automation to trigger a PR daily. Were you the last to touch this? Can you look at that? |
Likely related to the reformatting change, yes. I'm on it, apologies for the noise. |
all good. your contributions towards evolving this are app appreciated immensely and things are never perfect in change. please @ me on merges on the automaion |
Looking at the timings on the git history, I believe #2093 fixed this, along with the submission of #2103. Timeline (times in US Pacific timezone because that's what github's giving me):
Bottom line is this mess was caused by my neglecting to adjust the gtld updater's output to match the reformat done in #2088, which caused the cronjob to keep trying to make the diff go away until Simon merged a fix. And then one last noisy change to make that newline go away once and for all. I'll be more careful in future, and will add "check what the automation runs would do" to my PR checklist. The good news is the noise should be over now. The gtld updater and 'psltool fmt' both agree on what the file layout should be, so there should be no more spurious PRs. |
ok - another thing to watch is that the date will always be different in ICANN's json file as they generate it daily, so maybe ignore it if it is the sole diff |
The gtld updater already handles that gracefully, thankfully. The timestamped header isn't included in the "has this changed" check, so something other than the timestamp has to change for it to write a new file and create a PR. |
I have a WIP that can automatically fix formatting issues in a PSL file, as well as apply machine edits while preserving correct formatting. It will take a few more PRs to merge it all, but you can peek at https://github.com/danderson/list/tree/danderson/psl-parser-format .
The idea is to give contributors a tool that they run before sending a PR, and it (a) tells them everything that they need to fix, but also (b) just automatically fixes things that don't require human judgement, like "did you sort these strings correctly". This also makes automated editing easier, because scripts can operate on an abstract syntax tree, and not have to worry about exactly how to lay out text bytes. Just adjust AST nodes, then validate+format+write the file back out and get a clean diff.
I'm opening this issue to get feedback from PSL maintainers: what formatting fixes do you want me to implement? @dnsguru @simon-friedberger
I've already implemented/thought of several, listed below. Each one requires a one-time PSL reformat to fix existing issues, with diffs ranging from <10 lines to >1000 lines. After that, we can require new contributions to run the formatter before sending the PR (and enforce it in CI), and there will be no random churn.
Each of the listed things is independent from the others, you can pick any combination. These changes are all purely cosmetic, and do not change the set of public suffixes, or the meaning of any comments (modulo bugs of course, but I'm confident I can reduce the risk of that with tests, pre/post format sanity checks, etc.).
Basic consistency edits: remove leading/trailing whitespace, list wildcard exceptions immediately after corresponding wildcard, consistent inter-block spacing: patch
Sort blocks in the private section, by company/product/owner name. This automates Volunteer support: comment on PRs that have Sorting wrong #1858, or at least one half of it. Somewhat large change, but not unjustified: all those blocks are indeed in the wrong order today. patch
Sort suffixes in the private section in addition to the blocks. This is the other half of Volunteer support: comment on PRs that have Sorting wrong #1858. Patch for this one is tiny, 2 suffixes move by a few lines. patch
Sort suffixes in the ICANN section. Large initial diff, due to the older TLD/ccTLD blocks with hand-curated information, and order changes in non-Latin scripts (I spot-checked some diffs with native speakers, they agree the new order is "correct" for their languages). patch
Canonicalize metadata comments in the private section. The format I selected for the diff is "the most popular current format", to get a minimal diff. We can pick any format we like though. patch
Sort blocks in the ICANN section. I haven't tried this yet, but I assume the diff is going to be massive, and it needs coordination with the GTLD update script so they don't fight each other. But it's possible and pretty easy to implement, if you want it.
FWIW, my personal opinion:
The text was updated successfully, but these errors were encountered: