domain entity: handle requirement #5

Closed
bpereto opened this issue Jan 14, 2022 · 5 comments

Comments

@bpereto
Contributor

bpereto commented Jan 14, 2022

Hi,

I saw your library today and was amazed. Cool work, also for the possibility to override the bootstrap (nic.ch has also not yet been submitted to IANA ;-) ).

I see a problem with the requirement of a handle in the domain parsing:

if not handle:

The rdap response profile defines this for domains:
https://www.icann.org/en/system/files/files/rdap-response-profile-15feb19-en.pdf

Section 3.2 - for registries

Contacts (Admin, Technical) - The RDAP response SHOULD contain at least two
entities, with the administrative and technical roles respectively within the entity
with the registrar role. The entities with the administrative and technical roles
MUST contain valid fn, tel, email members, and MAY contain a handle and a
valid adr element

So entities MAY contain a handle for the admin and technical roles (it's a MUST for the registrar, but not for these two).
Can we remove the enforcement of a handle in extract_entities?
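
To make the MUST/MAY distinction concrete, here is a minimal sketch, not whoisit's actual code. It treats a missing handle as a problem only for roles where the profile, as read above, requires one. The helper name `missing_required_handle` and the `ROLES_REQUIRING_HANDLE` set are hypothetical and encode only that reading.

```python
# Hypothetical helper, not part of whoisit.
# Assumption: only the registrar role makes the handle mandatory,
# following the reading of the response profile given above.
ROLES_REQUIRING_HANDLE = {"registrar"}


def missing_required_handle(entity: dict) -> bool:
    """Return True if the entity lacks a handle that one of its roles requires."""
    roles = set(entity.get("roles", []))
    handle = entity.get("handle") or ""
    has_handle = bool(str(handle).strip())
    return bool(roles & ROLES_REQUIRING_HANDLE) and not has_handle


admin = {"objectClassName": "entity", "roles": ["administrative"]}
registrar = {"objectClassName": "entity", "roles": ["registrar"]}
print(missing_required_handle(admin))      # False
print(missing_required_handle(registrar))  # True
```

With a rule like this, admin and technical contacts without a handle would pass through, while a registrar without one could still be flagged.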

@meeb
Owner

meeb commented Jan 15, 2022

Thanks for the comments!

The requirement for a handle on entities was added for a number of reasons while I was testing the library with various RDAP servers:

  1. A surprising number of RDAP servers are not fully compliant with the specification and return weird formats.
  2. The handle is most commonly used to look up the registrant and find details on the entity. For example, you look up an allocation, which returns entities; to get any information on the registrant you then need to query the registrant entity, and that API call requires the handle because entity RDAP queries are all referenced by handle.

That limitation can be removed; however, results were "more useful" with it in place, as almost all of the records caught by the "do you have a handle" check were junk or pointless (empty fields, results with just notes in them, placeholders, etc.). It's not totally accurate as per the spec, as you've noticed, but it seemed to be more useful in the real world when attempting to actually query objects.

I could make it a config flag / opt-in param as well if that would be suitable. Can you give a use case where this check might filter out a record that shouldn't be filtered out? You can always just use raw=True as well and bypass the whoisit parser entirely.
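
A rough sketch of what such an opt-in parameter could look like, assuming entities are plain dicts as in the raw RDAP JSON. The function `group_entities_by_role` and the `require_handle` parameter are hypothetical and are not part of whoisit's API; the handle `T1` is a made-up value.

```python
from collections import defaultdict


def group_entities_by_role(raw_entities, require_handle=False):
    """Group raw RDAP entity dicts by role.

    require_handle is a hypothetical opt-in switch: when True, entities
    without a handle are skipped (the stricter behaviour); when False,
    every entity is kept regardless of whether it has a handle.
    """
    grouped = defaultdict(list)
    for entity in raw_entities:
        if require_handle and not entity.get("handle"):
            continue
        for role in entity.get("roles", []):
            grouped[role].append(entity)
    return dict(grouped)


entities = [
    {"roles": ["registrar"]},                  # no handle
    {"roles": ["technical"], "handle": "T1"},  # made-up handle
]
print(sorted(group_entities_by_role(entities)))                       # ['registrar', 'technical']
print(sorted(group_entities_by_role(entities, require_handle=True)))  # ['technical']
```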

Feel free to submit a PR to add the .ch RDAP endpoint; the format is pretty straightforward, see https://github.com/meeb/whoisit/blob/main/whoisit/overrides.py

@bpereto
Contributor Author

bpereto commented Jan 15, 2022

I experienced the same: not all RDAP providers are compliant with the minimal rdap_level_0, and they do not follow the spec.

True, parsing with raw=True is a workaround, but it misses the point of having a handy summary of the roles.

I'm currently re-reading the specification and I come to the conclusion that the RDAP service of TLD-CH does not quite follow the spec, as the registrar role contact entity does not contain a handle.

In response to registrar
queries, the returned RDAP response MUST be an entity with registrar role, with a handle and valid elements fn, adr, tel, email

With the current code you get, in the best case, registrar, tech and admin contact entities or, in the worst case, nothing :)

Here is an example:

>>> import whoisit
>>> whoisit.overrides.iana_overrides['domain'].update({'ch': ['https://rdap.nic.ch/']})
>>> whoisit.bootstrap(overrides=True)
True
>>> response = whoisit.domain('test.ch')
>>> import pprint
>>> pprint.pprint(response)
{'copyright_notice': '',
 'description': [],
 'entities': {},
 'expiration_date': None,
 'handle': 'TEST.CH',
 'last_changed_date': None,
 'name': 'test.ch',
 'nameservers': ['ns1.cyon.ch', 'ns2.cyon.ch'],
 'parent_handle': '',
 'registration_date': datetime.datetime(1996, 11, 7, 0, 0),
 'rir': '',
 'status': ['active'],
 'terms_of_service_url': '',
 'type': 'domain',
 'url': '',
 'whois_server': ''}
>>> response = whoisit.domain('test.ch', raw=True)
>>> pprint.pprint(response)
{'entities': [{'objectClassName': 'entity',
               'roles': ['registrar'],
               'url': 'https://www.kreativmedia.ch',
               'vcardArray': ['vcard',
                              [['version', {}, 'text', '4.0'],
                               ['org', {}, 'text', 'Kreativ Media GmbH'],
                               ['adr',
                                {},
                                'text',
                                ['',
                                 '',
                                 'Höschgasse 45',
                                 'Zürich',
                                 '',
                                 '8008',
                                 'CH']],
                               ['kind', {}, 'text', 'group']]]}],
 'events': [{'eventAction': 'registration', 'eventDate': '1996-11-07'}],
 'handle': 'test.ch',
 'ldhName': 'test.ch',
 'nameservers': [{'ipAddresses': {'v4': ['194.126.200.5'],
                                  'v6': ['2a01:ab20::2']},
                  'ldhName': 'ns1.cyon.ch',
                  'objectClassName': 'nameserver'},
                 {'ipAddresses': {'v4': ['91.206.24.2'],
                                  'v6': ['2001:67c:234::2']},
                  'ldhName': 'ns2.cyon.ch',
                  'objectClassName': 'nameserver'}],
 'notices': [{'description': ['This information is subject to an Acceptable '
                              'Use Policy.'],
              'links': [{'href': 'https://www.nic.ch/terms/aup/',
                         'rel': 'alternate',
                         'type': 'text/html'}],
              'title': 'Acceptable Use Policy (AUP)'}],
 'objectClassName': 'domain',
 'rdapConformance': ['rdap_level_0'],
 'secureDNS': {'delegationSigned': False},
 'status': ['active'],
 'switch_name': 'test.ch'}

In this example you see 'entities': {}. The parsed entities are empty, even though the raw response contains an entity with the registrar role.
Due to Swiss law, since 1 January 2021 personal data associated with registered domain names is no longer disclosed. Information about holders of domain names can only be obtained in exceptional cases.

Thanks for the discussion. I will probably stick to parsing the raw data
and ping the TLD about including the handle :)
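
For reference, parsing the raw data into a small per-role summary could look roughly like the sketch below. It assumes the raw layout shown in the example above (`roles`, `url` and a jCard-style `vcardArray` on each entity); the function name `summarise_entities` is hypothetical, and falling back from `fn` to `org` for the name is an assumption made for this sketch.

```python
def summarise_entities(raw: dict) -> dict:
    """Map each role to a list of {'name', 'url'} summaries.

    Works on the raw RDAP response and does not require a handle.
    """
    summary = {}
    for entity in raw.get("entities", []):
        vcard = entity.get("vcardArray") or []
        properties = vcard[1] if len(vcard) > 1 else []
        values = {}
        for prop in properties:
            # Each jCard property is [name, parameters, type, value].
            if len(prop) >= 4:
                values.setdefault(prop[0], prop[3])  # first occurrence wins
        item = {
            "name": values.get("fn") or values.get("org", ""),
            "url": entity.get("url", ""),
        }
        for role in entity.get("roles", []):
            summary.setdefault(role, []).append(item)
    return summary


raw = {
    "entities": [{
        "objectClassName": "entity",
        "roles": ["registrar"],
        "url": "https://www.kreativmedia.ch",
        "vcardArray": ["vcard", [
            ["version", {}, "text", "4.0"],
            ["org", {}, "text", "Kreativ Media GmbH"],
            ["kind", {}, "text", "group"],
        ]],
    }],
}
print(summarise_entities(raw))
# {'registrar': [{'name': 'Kreativ Media GmbH', 'url': 'https://www.kreativmedia.ch'}]}
```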

@meeb
Owner

meeb commented Jan 15, 2022

Thanks for the example. I've just released v2.4.2 which you can upgrade to now. This includes the following commits:

v2.4.1...v2.4.2

The behaviour now is:

>>> from pprint import pprint
>>> import whoisit
>>> whoisit.bootstrap(overrides=True)
True
>>> results = whoisit.domain('test.ch')
>>> pprint(results)
{'copyright_notice': '',
 'description': [],
 'entities': {'registrar': [{'name': 'Kreativ Media GmbH',
                             'type': 'entity',
                             'url': 'https://www.kreativmedia.ch'}]},
 'expiration_date': None,
 'handle': 'TEST.CH',
 'last_changed_date': None,
 'name': 'test.ch',
 'nameservers': ['ns1.cyon.ch', 'ns2.cyon.ch'],
 'parent_handle': '',
 'registration_date': datetime.datetime(1996, 11, 7, 0, 0),
 'rir': '',
 'status': ['active'],
 'terms_of_service_url': '',
 'type': 'domain',
 'url': '',
 'whois_server': ''}

I re-ran some checks and the handle check was OK to remove. It was added quite early on, and other checks to remove junk results were added afterwards, so it wasn't doing a great deal other than filtering out non-spec-compliant entities.
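
As an illustration only, a junk check that does not depend on the handle could be as simple as the sketch below. It is not the check used in whoisit; the name `is_junk` and the choice to ignore the `type` field are assumptions made for this example.

```python
def is_junk(entity: dict, ignore=("type",)) -> bool:
    """Return True when every field except those in `ignore` is empty."""
    return not any(value for key, value in entity.items() if key not in ignore)


placeholder = {"type": "entity", "name": "", "url": ""}
registrar = {"type": "entity", "name": "Kreativ Media GmbH",
             "url": "https://www.kreativmedia.ch"}
print(is_junk(placeholder))  # True
print(is_junk(registrar))    # False
```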

@bpereto
Contributor Author

bpereto commented Jan 15, 2022

Thank you.

Just for clarification, here is what I learned from my research:

The entity object class can contain the following members

Note the "can": this applies not only to the entity object class; members are optional ("can") for all object classes.

TLD registries and registrars are required to implement an RDAP service by 26 August 2019. ICANN org continues to work with gTLD registries and registrars to implement a service-level agreement and registry reporting requirements for RDAP.
https://www.icann.org/rdap

So I conclude that all gTLDs should have a well-defined RDAP response profile conforming to icann_rdap_response_profile_0.

All the other TLDs, in most cases the ccTLD implementations, have no requirements on what they must or should return; the specification only says which properties/members are available and can be used.
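
One way to act on this in client code is to look at the `rdapConformance` array of the raw response before assuming any profile guarantees. This is a minimal sketch; the function name `claims_icann_profile` is hypothetical, and a server listing the profile string is only a claim of conformance, not proof of it.

```python
ICANN_PROFILE = "icann_rdap_response_profile_0"


def claims_icann_profile(raw: dict) -> bool:
    """Return True if the raw RDAP response lists the ICANN response profile."""
    return ICANN_PROFILE in raw.get("rdapConformance", [])


# The nic.ch response shown earlier only lists rdap_level_0.
print(claims_icann_profile({"rdapConformance": ["rdap_level_0"]}))  # False
print(claims_icann_profile(
    {"rdapConformance": ["rdap_level_0", "icann_rdap_response_profile_0"]}
))  # True
```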

@bpereto bpereto closed this as completed Jan 15, 2022
@meeb
Owner

meeb commented Jan 16, 2022

Thanks, that's generally what I'd discovered as well. The whoisit parser likely won't ever be fully compliant given it has to handle some potentially invalid upstream responses. Feel free to report any other issues if you find any with data being over or under extracted.
