Allow mirrors to be defined in the index #142

Closed
goodmami opened this issue Oct 5, 2021 · 5 comments
Labels: data (There is a problem with the indexed data sources), enhancement (New feature or request)

@goodmami
Owner

goodmami commented Oct 5, 2021

Several wordnet projects have changes that will require some new features in the index.

  • The OMW English Wordnet was misnamed the Princeton WordNet previously. The old and new data should be renamed, along with the identifiers and versions (e.g., pwn:3.0 -> wn30:1.4+omw). When someone uses the old identifier, a warning should be issued as it redirects to the new resource.
  • It might also be useful to have a general mechanism for displaying messages when someone downloads a wordnet. One use for this is to recommend users of the upcoming OWN-PT wordnet to also get the OWN-EN wordnet to go along with it.
  • There have been some reliability issues with servers hosting wordnets (see Cannot download wn.download('ewn:2020') #141), so it would be good to be able to specify a mirror in case one server times out.

These changes might look like this:

[pwn]
  warn = "This project has been renamed. See https://... for more information."
  [pwn.versions."3.0"]
    redirect = "wn30:1.4+omw"
  [pwn.versions."3.1"]
    redirect = "wn31:1.4+omw"

[wn30]
  label = "OMW English Wordnet 3.0"
  [wn30.versions."1.4+omw"]
    url = "https://..."

[wn31]
  label = "OMW English Wordnet 3.1"
  [wn31.versions."1.4+omw"]
    url = "https://..."

[ewn]
  label = "Estonian Wordnet"
  # ...
  [ewn.versions.2020]
    warn = "The Open English WordNet is now under the identifier 'oewn'. See https://... for more information."
    redirect = "oewn:2020"

[oewn]
  label = "Open English WordNet"
  language = "en"
  license = "https://creativecommons.org/licenses/by/4.0/"
  [oewn.versions.2020]
    urls = [
      "https://en-word.net/static/english-wordnet-2020.xml.gz",
      "https://...",
    ]
  [oewn.versions.2019]
    urls = [
      "https://en-word.net/static/english-wordnet-2019.xml.gz",
      "https://...",
    ]
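
A minimal sketch of how the mirror fallback could behave, assuming the index yields an ordered `urls` list as in the proposal above. The names `download_first_available` and `fetch_with_urllib` are hypothetical and are not part of wn's API; the fetch function is injectable so the fallback logic can be exercised without network access.

```python
from typing import Callable, Iterable
import urllib.request


def fetch_with_urllib(url: str, timeout: float = 10.0) -> bytes:
    """Download one URL; failures (URLError, timeouts) surface as OSError."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def download_first_available(
    urls: Iterable[str],
    fetch: Callable[[str], bytes] = fetch_with_urllib,
) -> bytes:
    """Try each mirror in order and return the first successful payload."""
    failures = []
    for url in urls:
        try:
            return fetch(url)
        except OSError as exc:  # URLError and socket timeouts subclass OSError
            failures.append(f"{url}: {exc}")
    # Every mirror failed (or the list was empty): report all attempts.
    raise OSError("all mirrors failed:\n" + "\n".join(failures))
```

Trying the URLs strictly in listed order keeps the first entry as the primary server, so the mirrors are only contacted when it fails.
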
@goodmami added the enhancement and data labels on Oct 5, 2021
@goodmami added this to the v0.9.0 milestone on Oct 19, 2021
@goodmami
Owner Author

Trying to pin down the behavior, I think that having the redirect be automatic might actually be problematic: if it happens silently, the user won't be aware that they should use a different identifier in order to use the lexicon. E.g.:

>>> wn.download('pwn:3.0')  # really gets wn30:1.4
>>> pwn30 = wn.Wordnet('pwn:3.0')  # error

Some solutions:

  • Use the redirect to get the proper ID when creating a wn.Wordnet object. This will again issue any warning for the old project ID, but won't necessarily tell the user what the new name is. E.g.,

    >>> pwn30 = wn.Wordnet('pwn:3.0')  # issues warning about pwn:3.0
    >>> pwn30.lexicons()[0].specifier()  # user is unaware of the new ID unless they look for it
    'wn30:1.4'
  • Raise an error for the redirect instead, mentioning the new ID and version. This is initially disruptive, but then the user is made aware of what they should be doing. E.g.,

    >>> pwn30 = wn.Wordnet('pwn:3.0')
    Traceback (most recent call last):
      ...
    wn.Error: lexicon specifier 'pwn:3.0' has been deprecated; use 'wn30:1.4' instead
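
The second option could be sketched roughly as follows. This is only an illustration under stated assumptions: `INDEX` is a hypothetical already-parsed index that follows the proposed `redirect` layout, `Error` stands in for `wn.Error`, and `check_specifier` is an invented helper, not an existing wn function.

```python
class Error(Exception):
    """Stand-in for wn.Error in this sketch."""


# Hypothetical parsed index, following the TOML layout proposed earlier.
INDEX = {
    "pwn": {"versions": {"3.0": {"redirect": "wn30:1.4"}}},
    "wn30": {"versions": {"1.4": {"url": "https://..."}}},
}


def check_specifier(specifier: str, index: dict = INDEX) -> dict:
    """Return the version entry for 'id:version', or raise on a redirect."""
    project, _, version = specifier.partition(":")
    try:
        entry = index[project]["versions"][version]
    except KeyError:
        raise Error(f"no such lexicon: {specifier!r}") from None
    if "redirect" in entry:
        # Name the replacement so the user learns the new specifier.
        raise Error(
            f"lexicon specifier {specifier!r} has been deprecated; "
            f"use {entry['redirect']!r} instead"
        )
    return entry
```
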

@goodmami
Owner Author

Another alternative: specify the error message in the index and don't bother with warnings or redirects. E.g.:

[pwn]
  error = "Instead of 'pwn:3.0' and 'pwn:3.1', please use 'wn30' and 'wn31'."

Then...

>>> wn.download('pwn:3.0')
Traceback (most recent call last):
  ...
wn.Error: Instead of 'pwn:3.0' and 'pwn:3.1', please use 'wn30' and 'wn31'.
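
On the library side, honoring such a key might be as small as the check below. This is a sketch under assumptions: the index is taken to be a plain dict parsed from the TOML, `Error` stands in for `wn.Error`, and `ensure_downloadable` is a hypothetical helper name.

```python
class Error(Exception):
    """Stand-in for wn.Error in this sketch."""


def ensure_downloadable(specifier: str, index: dict) -> None:
    """Raise the project's index-defined error message, if it has one."""
    project = specifier.partition(":")[0]
    message = index.get(project, {}).get("error")
    if message:
        raise Error(message)


# Hypothetical parsed index containing the entry shown above.
index = {
    "pwn": {
        "error": "Instead of 'pwn:3.0' and 'pwn:3.1', "
                 "please use 'wn30' and 'wn31'."
    },
    "wn30": {"label": "OMW English Wordnet 3.0"},
}

ensure_downloadable("wn30:1.4", index)  # no error key, so nothing happens
```
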

@fcbond
Collaborator

fcbond commented Oct 22, 2021 via email

@goodmami changed the title from "Allow mirrors and redirects to be defined in the index" to "Allow mirrors to be defined in the index" on Oct 22, 2021
@goodmami
Owner Author

In that case #146 is for adding error and this issue is about adding mirrors.

I think I might just extend the current url key to accept a whitespace-separated list of URLs:

[oewn]
  label = "Open English WordNet"
  language = "en"
  license = "https://creativecommons.org/licenses/by/4.0/"
  [oewn.versions.2020]
    url = """
      https://en-word.net/static/english-wordnet-2020.xml.gz
      https://...
    """

This way there's no change for those with only one URL.
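
As a rough illustration of why this stays backward compatible, splitting the value on whitespace gives a one-element list for existing single-URL entries and an ordered list of mirrors for the multi-line form. `parse_urls` is a hypothetical helper name, and the sketch assumes URLs contain no literal whitespace.

```python
from typing import List


def parse_urls(value: str) -> List[str]:
    """Split a `url` value on any whitespace (spaces, newlines, indentation)."""
    return value.split()


# An existing entry with a single URL parses to a one-element list.
single = parse_urls("https://en-word.net/static/english-wordnet-2020.xml.gz")

# The multi-line form yields the primary URL followed by its mirrors.
# The second URL is elided in the proposal, so it stays elided here.
multiple = parse_urls("""
  https://en-word.net/static/english-wordnet-2020.xml.gz
  https://...
""")
```
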

@fcbond
Collaborator

fcbond commented Oct 24, 2021 via email
