Feature Request: Authority Syntax #2002
Comments
Wow, that is a lot. I think you're seeking to tap maple syrup from a pine 2x4 here. I mean, it is a smart idea, it sounds sensible, and I get the objective. The concept, no matter how reasonable it may sound, comes front-loaded with obligations on the downline consumers of the PSL to implement something. And I think that is where it subsequently fails. Some may embrace this, some may not. It is a really, really diverse set of consumers/integrators with varying levels of engagement/set-and-forget use of the PSL from the GitHub repo, and the maintainers have no dominion over those parties or what outcomes to expect from them. Some of the past initiatives akin to this, to introduce things beyond a text file, have been glorious volunteer cycle drains that netted out to manifest as disposable effort for the volunteers. The challenge has been that the PSL consumer space seems to be lacking a desire to evolve. A portion of the consumer space might engage and others won't. The main concern that should be paramount to us all is to not introduce fragmentation. So this is why we keep the status quo, and do so to ensure the lowest common denominator is consistent for all consumers. I'd look, as a parallel of your concept, to maybe server-side includes or something at the time of PSL generation as a means to perhaps accomplish what you're proposing: things that might generate these sections into the PSL on some interval. There is some dev currently happening on automation - if Amazon wanted to put resourcing towards aiding in our backlog debt as we get that all organized, it might be helpful to look at some of the DNS validation script work being done at the time of pull request reviews, and see if that library could be extended to do a subsequent poll in the presence of some commented text. We need to get that all stable first, and focus on not breaking the status quo, but there is some thought going towards generating the list in other formats on the undocumented roadmap.
Meanwhile, there is gratitude for the processes put in place to not wrecking-ball our (I am being generous when I say) modest volunteer resources, and recognition that the desire is to net out to something smarter and better. |
I defer to @dnsguru and other maintainers on the strategic side of what evolution is even possible. But, as I'm working on some PSL tooling right now, on a technical level here's some short/medium term things we could do to ease both Amazon's and PSL maintainers' burden. My options aren't particularly elegant or web-scale, but pragmatically they're something I could commit to implementing, if they're acceptable to PSL maintainers. Certainly I'm sure I can do these quicker than defining a new protocol, file format, and getting the world to adopt both :)
Support (hardcoded) recursive ownership validation
Looking at the block of Amazon-managed domains in the PSL, the large majority fall within a few top-level domains, whose ownership is either well established, or could be established by some TBD one time process. Assuming that's done, we could change the parser/validator code I'm writing as follows:
Modulo working out a one-time verification, this reduces the cost of reviewing Amazon's PSL changes to ~nothing, and reduces Amazon's per-new-suffix burden to ~nothing. There is still some burden when validating new parent domains and new authorized PR senders, but as I said I think I can get that burden down to the equivalent of clicking approve on a trivial PR. I can't promise any timeline for doing this, since I'm a volunteer and I'm still getting the very basic machinery for the automation set up, and writing basic validation passes. But this is all squarely in the ballpark of the general kind of automation I would like to end up with, even if I wasn't thinking of this specific shape until this thread.
Import external suffixes as a cronjob
This is effectively the parallel option @dnsguru proposed: Amazon gives us a URL of the suffixes it wants for its domains, and a robot periodically pulls that and merges it into the PSL - either as a PR that still requires human signoff, or fully hands off if the merge passes tests+lints. The tooling I'm building is explicitly aiming to enable machine-driven edits without causing spurious changes or requiring a wholesale format change. Right now machine-edits is at the bottom of my TODO, behind "get the basics set up", "write/port/improve a bunch of lints and validations", and so on. So this would be more of a medium-term thing, but again if this is acceptable to the PSL maintainers in principle, that's definitely something I'm willing to build towards, and we can work out the details when I have code in hand to look at.
Both of these proposals don't require downstream consumers to change anything, and (hopefully) provide incremental improvement along the way. I freely admit it's not particularly elegant or scalable, and it doesn't fix all the pain points that were mentioned...
But I'm very sure I can get the above working with minimal surprises, and afaict it would incrementally reduce the workload of maintainers on both sides of this bug. It's also all a logical continuation of my existing automation todo list, so ~95% of work needed for the above is stuff I plan to implement regardless of what we do about this specific bug (again, assuming my plans and implementation are acceptable to PSL maintainers, my plans are just my plans to try and help, not an authoritative roadmap). |
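To make the "robot periodically pulls that and merges it" option a bit more concrete, here is a rough sketch of just the merge step. It is not the tooling mentioned above: the function name `merge_published`, the idea of a list of already-verified parent domains, and the "keep existing order, append new rules sorted" policy are all assumptions made for illustration. Fetching the published list and opening a PR are left out.

```python
from typing import Iterable, List, Tuple


def merge_published(
    current: Iterable[str],
    published: Iterable[str],
    verified_parents: Iterable[str],
) -> Tuple[List[str], List[str]]:
    """Merge a publisher's suffix list into an existing PSL block.

    Hypothetical helper: returns (merged, rejected). Existing rules keep
    their order so that re-running the job produces no spurious diff.
    """
    parents = [p.strip().strip(".").lower() for p in verified_parents]

    def covered(rule: str) -> bool:
        # Strip PSL markers, then require the name to sit at or under a
        # parent domain whose ownership was verified once, up front.
        name = rule.lstrip("!")
        if name.startswith("*."):
            name = name[2:]
        return any(name == p or name.endswith("." + p) for p in parents)

    merged = list(current)
    seen = set(merged)
    accepted: List[str] = []
    rejected: List[str] = []
    for line in published:
        rule = line.strip().lower()
        if not rule or rule.startswith("//"):
            continue  # skip blanks and comments
        rule = rule.split()[0]
        if rule in seen:
            continue  # already listed (or already handled in this run)
        seen.add(rule)
        (accepted if covered(rule) else rejected).append(rule)
    return merged + sorted(accepted), sorted(rejected)
```

Anything in `rejected` would presumably still go through the normal human review path rather than being merged automatically.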
We are exploring a comment syntax that would allow for expression of an abuse contact and rdap/whois to be present, and a possible idea for how one might express a "go get more sub-stuff" might be to look at that syntax as a means to express what Ian is attempting to have occur in a more backward-compatible manner... One possible friendly amendment might be to alter the syntax suggested from lines that would start with an Example:
Might instead be articulated as
Of course, I think convincing consumers to do anything more than just read in a text file is a large issue. As is building out whatever retrieval subroutines with guardrails so that subitem retrieval remains within the namespace scope. This all assumes that consumers might read and do, or ignore a comment line. Rather than being "hot-cuppa-no", I wanted to theorize some "maybe" ideas. All of this, of course, still assumes there were resources beyond the existing ones to do anything. You wouldn't happen to know a large internet company that has over a trillion dollar valuation that could put forth more than just ideas for unpaid volunteers to solve for them, would you? |
@aph3rson Can we clarify the motivation a bit, please?
This seems to be an organizational issue and changing the rules for validation as @danderson suggests might be a solution here and would not require a change to the PSL format. Does that sound correct? Maybe I missed it but I don't see how the "validate at the top" solution logically applies to multiple registries? So, Badger Technologies says they are the owner of badger-tech.com and want to decide the PSL entries with a @badger-tech.example entry or such. Then, their customer at cust1.badger-tech.example wants to have that added to the PSL but cust2.badger-tech.example does not want to be on the PSL, how do they do it?
Is the concern here just the size of the PSL? Or would you like to add and remove entries without changing the PSL? Single-level wildcards haven't always been a rule and I wouldn't mind changing that but as @dnsguru says, we don't know what the downstream consumers do. Looking at for example this section
You could probably cut that down to a third or so already by using
is there a reason why you're not doing that?
I'm not sure I understand what the problem is here. While we have plans to automatically check the _psl entries in DNS and remove entries, we haven't been doing that so far, so DNS changes shouldn't really impact anyone. |
@aph3rson it seems like this is a 'flying submarine' request; i.e. seeking something beyond the capability of a text file. We have a lot of 'set and forget' participant requestors that are super casual about their listings once their PR gets merged, an example being #1401 (comment) where the requestor let a name lapse, another party picked up the name, and there might be some security implications. It seems like part of the functionality sought in your issue would be that the listed @domain.example would somehow be treated as administratively dynamic at the whim of the domain administrator. When looking at the consequence of a lapsed name, using the commented Pull Request as an example, it seems as though introducing a universe of dynamically generated entries introduces significant security issues that would need to be thought through. |
Appreciate the response from the community on this so far. I wanted to drill in to a few points that have been raised:
Ideally, this would not be the case. I fully agree that backwards compatibility is the most important aspect of the PSL, and we'd like to preserve the behavior of any existing PSL clients at present. This is part of the reason we suggested having the "resolution" performed within the existing GitHub Actions workflow that publishes the PSL artifact to publicsuffix.org. Existing clients could continue to pull the same PSL they're expecting, and clients interested in doing their own resolution could pull the artifact from elsewhere (e.g. artifacts from a GitHub release).
The biggest issue for us has been DNS verification. At any given time, we have a good idea which suffixes need to be on the PSL for a given service. We can assert our ownership over a higher-order domain that encompasses those services. However, the mechanisms to emit DNS verification records require significant interaction from our service teams, which makes automation of this difficult.
We can look into if we can provide some support here.
I'm not personally in favor of a specialized comment syntax here, as the failure mode of not being able to parse that is "the comment is silently ignored." With a new type of prefix character, presumably a PSL library would know to fail (loudly) if an unrecognized syntax was seen.
A very good point. Barring some kind of cryptographic signature mechanism on the authority’s list of suffixes (or perhaps some level of certificate pinning?), it may be tough to determine if the same party controls that domain at any given time. This might be closer to the discussion of “what to do with stale entries?,” though, as I think this proposed record type is impacted in the same way as all other private members of the PSL.
This is correct. The majority of our services operate under a domain that is per-partition (e.g. A somewhat-limited number of services have their own domain for customer resources, e.g.
In #1605, we talked about attribution of our commits - mainly, that the submissions will come from a specific org/repository, and commits will be made by members of that org. We manage permissions on that repository closely. Perhaps that might be a better option than a static list of users?
We aim to avoid this if at all possible. I can say that the automatic addition of suffixes for new regions might cause the list to grow faster, but not at the two-million-suffixes-at-a-time rate.
Our thought is that some consumers might want to do the resolution (or merging) themselves - in such cases, we figured the un-merged artifact would be helpful.
That's somewhat-correct. The major stipulation is that, sans DNS verification artifacts, we wouldn't have a lot to hand to the PSL maintainers when we submit our changes. It also increases the work associated with PSL maintainers when these changes (within an existing "authority") are proposed to the PSL.
This is a good point. I don’t know how this might be handled at the moment. It would likely require a level of arrangement between BadgerTech and their customer, in this scenario. (In our example, I can’t think of a situation where we’d hand public-suffix-control to a customer, but the use-case might exist elsewhere.)
Yes, both. Specifically, we’re concerned about the size of the PSL source (not necessarily artifact). Many in the PSL community might look at the source on GitHub, and rely on the artifact to be pulled into a library/browser/other consumer.
We cannot cut those suffixes down as-such. The suffixes provided are specific to the EMR service, and the regional suffixes are shared with many other AWS services in said region. We cannot announce all children of those as public suffixes, as it’s possible it may cause issues in a separate unrelated service. This was discussed in #1605, many of the AWS services fall into one of those zones, EMR is one (as is API Gateway, Cloud9, some portions of S3, and other services).
This isn’t referring to the PSL’s DNS verification process at this point. Prior, there was a recommendation that AWS should reorganize their DNS records for in-scope services to more-closely-align with PSL best-practices (e.g. to support wildcarding) - reorganization in that fashion would be backwards-incompatible for AWS customers, and would break PSL consumers who try to use AWS resources. |
Maybe I am missing something but I think the growth of the PSL source is irrelevant compared to the current scale of the internet. The relevant metric would be the size of the generated artifact since that is what gets a large volume of downloads. But for the artifact it doesn't matter if this proposal goes through. I also cannot think of any reason why a consumer would want to merge the includes themselves. Do you have an example for this? Consequently, I would propose to close this and to open issues regarding the other concerns, which afaiu are
Edit: Re 1. - I am also totally open to having a setup where you make PRs from a known org and we just check against that. |
Hello, PSL community,
We (Amazon) are aware that we take up a sizable chunk of the PSL. While we have taken strides internally to emit only the suffixes absolutely-necessary to the PSL, we recognize that we are still the largest private entity on the PSL today. Part of this is related to our DNS infrastructure, and part of it to PSL syntax.
Some challenges we’ve faced so far include:
Our team has a solution to propose on this, which we’ve dubbed the “authority” syntax. This would function almost like an #include or import statement in modern programming languages, and allows for those with demonstrated control over a parent domain to define their child public suffixes (either direct or indirect). The idea here being that the load is shifted off of the PSL maintainers for these large PSL entities, and onto the entities themselves/the suite of PSL libraries in existence today.
We’d originally defined this in an IETF-style Internet Draft, but those aren’t as conducive to a discussion on GitHub and there isn’t precedent for the PSL being governed by such documents. The examples and considerations below capture the technical detail within said prior Internet Draft, though.
A few top-level TL;DRs:
@, to denote an “authority record” - e.g. a record of @badger-tech.example means “fetch additional limited suffixes from badger-tech.example.”
.well-known file for that domain, the contents of which uses the same syntax (except authority records, i.e. no recursion) as the PSL.
Example
We’ll use the following minimal PSL as an example:
In the above example, Badger Technologies provides a significant number of entries on the PSL, and may require additional suffixes (given their domain architecture). Rather than adding each suffix directly onto the list, an authority record is added to the PSL, with a line beginning with @. This syntax is similar to other usage of special characters in the PSL (e.g. ! and *), and is not expected to impact list sort or functionality. An example of this syntax is below:
This line indicates that suffixes in this portion of the list are provided by the owners of badger-tech.example. To fetch these suffixes, a PSL consumer would need to pull a well-known URI (RFC5785), such as https://badger-tech.example/.well-known/public_suffix_list.dat (the file name is based on the existing PSL file). This file might contain the following content, which uses otherwise-identical syntax to the standard PSL:
Considerations / Open Questions
As with any technical proposal, there are a number of considerations on which discussion from the PSL community would be helpful. Some of our open questions on these points are below.
Domain Verification
When adding a new suffix authority record to the PSL, the same DNS verification process associated with current PSL modifications is expected. However, a suffix authority adding records to their own authority file MAY implement their own verification process for entries added - suffix authorities are not required to publish/maintain DNS verification records for the suffixes in their own authority file.
Authority Fetching
The authority file would be fetched from a .well-known file (RFC5785) for the authority’s domain. For example, for @badger-tech.example, the authority file would be located at https://badger-tech.example/.well-known/public_suffix_list.dat.
_psl child thereof?
This might make authority syntax easier for the ICANN suffixes, e.g. a TLS certificate for https://uk/.well-known/public_suffix_list.dat might be a challenge.
Authority Isolation
It should be noted that in no case should a suffix authority be allowed to add suffixes to the PSL for domains that are not their children. Allowing such behavior would permit a suffix authority on the PSL to influence the PSL behavior for domains not under their control and potentially influence PSL-oriented behavior depended on by other Internet entities. If an authority file contains a suffix which is not its child, that suffix in the authority file MUST be ignored.
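As a rough illustration of the isolation rule above, a consumer might filter an authority file along these lines. This is only a sketch: the helper names are made up, and it assumes that the authority's own domain is not a "child" (so only strict descendants are kept, plus a wildcard directly under the authority), which the proposal does not spell out.

```python
from typing import Iterable, List


def normalize(name: str) -> str:
    """Lower-case a domain name and drop surrounding dots/whitespace."""
    return name.strip().strip(".").lower()


def is_in_scope(rule: str, authority: str) -> bool:
    """True if a PSL rule only affects names below the authority's domain."""
    authority = normalize(authority)
    rule = rule.strip()
    if rule.startswith("!"):
        rule = rule[1:]
    wildcard = rule.startswith("*.")
    if wildcard:
        rule = rule[2:]
    domain = normalize(rule)
    # Match on a label boundary so "evilbadger-tech.example" is not accepted.
    if domain.endswith("." + authority):
        return True
    # "*.<authority>" names only children of the authority (assumed in scope).
    return wildcard and domain == authority


def filter_authority_rules(lines: Iterable[str], authority: str) -> List[str]:
    """Keep in-scope rules; out-of-scope rules are ignored, per the MUST above."""
    kept = []
    for line in lines:
        rule = line.strip()
        # Skip blanks, comments, and nested authority records (no recursion).
        if not rule or rule.startswith("//") or rule.startswith("@"):
            continue
        rule = rule.split()[0]
        if is_in_scope(rule, authority):
            kept.append(rule)
    return kept
```

Matching on "." + authority rather than a plain suffix comparison is the important detail here, since a bare string suffix check would let a look-alike domain through.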
Or, should authorities be a “single-level” operation, in that authority files may not also define their own sub-authorities, and are only permitted to use bare-suffixes, wildcards (*), or exclusions (!)?
Client Anonymity
If desired, PSL clients MAY choose to add a configuration option to permit/deny interaction with suffix authorities to protect client anonymity. Careful considerations by PSL client maintainers should be observed, as usage of this option will cause the library to operate on an “incomplete” version of the PSL. Implementation of these options is left to individual maintainers of PSL clients and/or libraries which may allow for specifying a custom PSL artifact.
This may be good for backwards compatibility and secure defaults, but may require changes from PSL libraries that wish to perform suffix authority resolution at runtime.
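For a sense of what such an option could look like in a PSL library, here is a minimal sketch. The names `load_rules` and `authority_url` and the injected `fetch` callable are hypothetical, and defaulting to "no resolution" is an assumption rather than something the proposal mandates. Scope checks on the fetched rules are omitted to keep the sketch short.

```python
from typing import Callable, Iterable, List, Optional

# Path taken from the proposal's example URL.
WELL_KNOWN_PATH = "/.well-known/public_suffix_list.dat"


def authority_url(authority: str) -> str:
    """Build the well-known URL for an authority domain."""
    return "https://" + authority.strip().strip(".").lower() + WELL_KNOWN_PATH


def load_rules(
    lines: Iterable[str],
    fetch: Optional[Callable[[str], Iterable[str]]] = None,
) -> List[str]:
    """Parse PSL lines; resolve '@' authority records only if `fetch` is given.

    `fetch` maps a URL to the lines of the authority file. Leaving it as
    None means the client never contacts any suffix authority.
    """
    rules: List[str] = []
    for line in lines:
        entry = line.strip()
        if not entry or entry.startswith("//"):
            continue  # blank line or comment
        entry = entry.split()[0]
        if not entry.startswith("@"):
            rules.append(entry)
            continue
        if fetch is None:
            continue  # opted out: operate on the "incomplete" list
        for sub in fetch(authority_url(entry[1:])):
            sub = sub.strip()
            # Authority files may not contain authority records (no recursion).
            if sub and not sub.startswith("//") and not sub.startswith("@"):
                rules.append(sub.split()[0])
    return rules
```

Passing the fetcher in, instead of hardcoding an HTTP client, keeps the privacy decision with the application and makes the behavior easy to test offline.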
Authority Redirection
To prevent cases of a malfunctioning/malicious suffix authority from directing traffic to a destination they do not operate, redirects are only permitted if the redirect target is within the suffix authority’s domain, or a child domain thereof. Responses from a suffix authority redirecting to an HTTPS server outside of their control SHOULD be rejected.
A CNAME-level redirect would still require TLS to function under the pre-CNAME domain, a 3XX redirect would not.
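The 3XX case could be checked roughly as follows. This is a sketch, not part of the proposal text: the function name `redirect_allowed` is made up, and rejecting non-HTTPS targets outright is an added assumption.

```python
from urllib.parse import urljoin, urlsplit


def redirect_allowed(authority: str, request_url: str, location: str) -> bool:
    """Decide whether a 3XX Location header stays within the authority's domain."""
    # Resolve relative Location values against the URL that was requested.
    target = urlsplit(urljoin(request_url, location))
    if target.scheme != "https" or target.hostname is None:
        return False  # assumption: only HTTPS targets are acceptable
    host = target.hostname.rstrip(".").lower()
    authority = authority.strip().strip(".").lower()
    # Allow the authority's own domain or any child, matched on a label boundary.
    return host == authority or host.endswith("." + authority)
```

Using the parsed hostname rather than a substring test matters: a target such as https://badger-tech.example@evil.example/ has the authority's name in the URL but points at a different host.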
Logistical Considerations
It should be noted that this could be used to reduce the size of the source for the PSL, and upkeep effort for the PSL maintainers. However, the size of PSL artifacts will likely remain unchanged, and may even increase. The process of collecting changes from authority files may also require changes to the automation currently used within the PSL.
Migration Processes
As PSL syntax hasn’t changed for a significant amount of time (/ever?), there may be some migration/onboarding work necessary here.
We’ve identified a handful of entities that we think could onboard (e.g. entities that have multiple suffixes that share a common suffix themselves, but that common suffix is not on the PSL itself).