Recognising proto-languages #32

Closed
parker57 opened this Issue Jun 13, 2018 · 6 comments

@parker57
Contributor

parker57 commented Jun 13, 2018

Gerard de Melo's readme states:

Words are given with ISO 639-3 codes
(additionally, there are some ISO 639-2 codes prefixed with "p_" to indicate proto-languages).

From Wikipedia for ISO 639-3:

It provides an enumeration of languages as complete as possible, including living and extinct, ancient and constructed, major and minor, written and unwritten. However, it does not include reconstructed languages such as Proto-Indo-European.

The etywn-relety.json file contains the following proto-language references:

  • 124 instances of 'p_sla', from ISO 639-2 Proto-Slavic
  • 13 instances of 'p_gem', from ISO 639-2 Proto-Germanic
  • 6 instances of 'p_ine', from ISO 639-2 Proto-Indo-European
  • 3 instances of 'p_gmw', not in ISO 639-2 but it seems to be Proto-West Germanic (it only points to the word "iuwiz")

It's probably best to just add the relevant JSON and document it accordingly; for instance, 'p_sla' could be:

  {
    "name": "Proto-Slavic",
    "type": "extinct",
    "scope": "individual",
    "iso6393": 'p_sla',
    "iso6392B": null,
    "iso6392T": null,
    "iso6391": null
  }

no idea what to put for scope tbh

@jmsv

Owner

jmsv commented Jun 13, 2018

We should probably strip the values we're not using from the language JSON file and just keep keys called name and iso or something, which would make the scope value irrelevant
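
For reference, a slimmed-down entry could end up as something like this (the key names here are just a guess at "name and iso", not final):

  {
    "name": "Proto-Slavic",
    "iso": "p_sla"
  }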

@alxwrd

Collaborator

alxwrd commented Jun 13, 2018

I think it might be useful to keep the extra information and use it to extend the Language class. E.g. Language("eng").type
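
Rough sketch of what that could look like (the file name, loading, and available keys here are assumptions for illustration, not how the project currently does it):

  import json

  # Hypothetical data file and shape; the real project may load this differently
  with open("languages.json", encoding="utf-8") as f:
      _LANGUAGES = json.load(f)  # e.g. {"eng": {"name": "English", "type": "living", ...}, ...}


  class Language:
      """Thin wrapper around an entry in the languages JSON file."""

      def __init__(self, code):
          self.code = code
          self._entry = _LANGUAGES.get(code, {})

      @property
      def name(self):
          return self._entry.get("name")

      @property
      def type(self):
          # e.g. "living", "extinct", "ancient", "constructed"
          return self._entry.get("type")


  print(Language("eng").type)  # -> "living", if a type key is kept in the JSON

Keeping the extra keys in the JSON is what would make properties like .type possible without a second data source.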

@parker57

Contributor Author

parker57 commented Jun 13, 2018

I don't think there's much point keeping the other ISO values, but I'm a bit upset we don't have language family - that would be neat for analysis and even presentation (rough sketch after the JSON below).

This guy might have better JSON, but tragically seems to have stopped at 639-3:

  "bo": {
    "639-1": "bo",
    "639-2": "bod",
    "639-2/B": "tib",
    "family": "Sino-Tibetan",
    "name": "Tibetan Standard, Tibetan, Central",
    "nativeName": "བོད་ཡིག",
    "wikiUrl": "https://en.wikipedia.org/wiki/Standard_Tibetan"
  },
  ...
  "ru": {
    "639-1": "ru",
    "639-2": "rus",
    "family": "Indo-European",
    "name": "Russian",
    "nativeName": "Русский",
    "wikiUrl": "https://en.wikipedia.org/wiki/Russian_language"
  },
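
If a dataset shaped like that were adopted, grouping by family becomes easy; a minimal sketch, using a tiny inline sample in the same shape instead of the full file:

  import json
  from collections import defaultdict

  # Tiny inline sample shaped like the dataset above; in practice this would be
  # json.load()-ed from the full file
  languages = json.loads("""
  {
    "bo": {"family": "Sino-Tibetan", "name": "Tibetan Standard, Tibetan, Central"},
    "ru": {"family": "Indo-European", "name": "Russian"}
  }
  """)

  by_family = defaultdict(list)
  for code, entry in languages.items():
      by_family[entry["family"]].append(entry["name"])

  print(dict(by_family))
  # {'Sino-Tibetan': ['Tibetan Standard, Tibetan, Central'], 'Indo-European': ['Russian']}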
@jmsv

Owner

jmsv commented Jun 13, 2018

I'm going to strip the unused keys for now, then we can add back other keys (or switch to a different dataset) when we want to expand the language class

jmsv added a commit that referenced this issue Jun 13, 2018

@alxwrd

Collaborator

alxwrd commented Jun 13, 2018

noumar/iso639 looks like a good replacement for keeping the data in this project.

@jmsv

Owner

jmsv commented Jun 13, 2018

Going to close this because the issue is solved - feel free to open a new issue about changing where we source ISO 639 codes if that seems like a good idea

@jmsv jmsv closed this Jun 14, 2018
