Daniel McCloy edited this page May 13, 2015 · 6 revisions

PHOIBLE repositories reorg proposal

github.com/phoible/phoible.github.io

The documentation website. Current location and content are fine. Some existing links will need updating if the changes below are made. Will need a "downloads" page.

github.com/phoible/data

New repository, to include:

  • Rdata and TSV files of the most current aggregated data that passes all tests
  • Might also include the features table? Or other shapes of the same data (allophone-level, language-level, etc.), though I'm leaning toward just providing functions to let people generate those as needed.
  • If we made a data-only R package, it would need a CRAN-unique name. "phoible-data" is the obvious choice for the repository, I think, though R package names cannot contain hyphens, so the package itself would need a variant spelling. Worth discussing.

github.com/phoible/dev

New repository (or possibly a renamed existing repository; not sure which is easier). To include:

  • Raw data source files
  • Raw feature mapping table
  • Aggregation code & associated tests
  • Includes the "data" repository as a "submodule" (see here for an example of how this is set up... looks pretty easy).
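
A minimal sketch of the submodule setup, assuming the proposed repository names above and HTTPS remotes (both hypothetical until the repos exist). The script only prints the git commands rather than running them, so it is safe to try anywhere:

```python
import shlex

# Hypothetical remotes: the "data" and "dev" repos are only proposed above.
DATA_URL = "https://github.com/phoible/data.git"
DEV_URL = "https://github.com/phoible/dev.git"


def submodule_commands(url, path="data"):
    """Return the git commands (as argv lists) that register `url`
    as a submodule at `path` inside the current repository."""
    return [
        ["git", "submodule", "add", url, path],
        ["git", "commit", "-m", f"Add {path} as a submodule"],
    ]


def clone_command(url):
    """Return the command for cloning a repo together with its submodules."""
    return ["git", "clone", "--recurse-submodules", url]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for argv in submodule_commands(DATA_URL):
        print(shlex.join(argv))
    print(shlex.join(clone_command(DEV_URL)))
```

The first two commands would be run once inside a clone of "dev". If someone clones "dev" without --recurse-submodules, running git submodule update --init afterwards fetches the submodule contents.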

github.com/phoible/bib

New repository, to include:

  • bibliographic info
  • mapping tables from our IDs to glottocodes?

github.com/phoible/pub

New repository for presentations / papers / notebooks.

github.com/phoible/script

New repository. Could possibly be made a submodule of "dev" too, if that makes sense. Content is user-level R/Python code designed to make working with PHOIBLE data easier. Examples:

  • Functions for applying "trump"
  • Feature reduction code
  • Functions for reshaping the data (language-level, allophone-level, etc).
  • Code relating to future functionality (phonological environments / rules?)
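
As an illustration of the reshaping idea, here is a minimal sketch that collapses phoneme-level rows into one record per language. The column names ("LanguageCode", "Phoneme") and the sample rows are assumptions made for the example, not the actual PHOIBLE schema:

```python
from collections import defaultdict


def to_language_level(rows):
    """Collapse phoneme-level rows into one entry per language.

    `rows` is an iterable of dicts with the (assumed) keys
    "LanguageCode" and "Phoneme".
    """
    inventories = defaultdict(list)
    for row in rows:
        inventories[row["LanguageCode"]].append(row["Phoneme"])
    return {
        code: {"phonemes": phonemes, "count": len(phonemes)}
        for code, phonemes in inventories.items()
    }


if __name__ == "__main__":
    # Made-up sample rows, for illustration only.
    sample = [
        {"LanguageCode": "aaa", "Phoneme": "p"},
        {"LanguageCode": "aaa", "Phoneme": "a"},
        {"LanguageCode": "bbb", "Phoneme": "k"},
    ]
    for code, info in to_language_level(sample).items():
        print(code, info["count"], " ".join(info["phonemes"]))
```

Shipping functions like this, rather than pre-built tables, keeps a single canonical data file and lets users derive other shapes on demand.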

github.com/phoible/phoible

Possibly goes away? Or gets renamed to one of the new repo names, depending on what is easiest. Or gets used as the new home for user-level scripts.

github.com/phoible/roible

This would be the home of an R package for PHOIBLE, if we ever built one. Other possible names are "phoibler", "phoible-r", "phroible", or "phoibre" (playing on the notorious l/r confusion). Keeping this in its own stand-alone repository is the simplest way to make devtools::install_github() work from within R (otherwise the package subdirectory has to be specified). If we had something like this, then "script" would probably go away.

github.com/phoible/phoible-py

If we ever make a Python package. Other possible names: "pyble" (sounds like bible!), "phyble", "pyhoible", "pyoible", "phoibley", "phoiblepy", "pyphoib" (sounds like typhoid!) and "boiga", a type of snake (like a python) whose name vaguely sounds like "phoible". I don't like any of these names very much, to be honest, but making a Python package is low on the priority list at the moment anyway, so maybe it doesn't matter.

Notes

About names

The repo names above are open to debate. Ideally I would like to come up with names that each start with a different letter so that command-line tab completion is as simple as possible, since our local clones of these repos will likely all live side-by-side in the same folder. It is of course always possible to rename them, but IMO it's just easier (i.e., for relative paths when writing code) if your local folder tree matches mine, and they both match the remote repo, so I'd like to avoid renaming folders, while still getting efficient tab-completion. Some naming ideas:

  • data, dev, bib, pub, plus either "script" or "roible + pyphoib"

    • has two "d" words, and eventually maybe two "p" words, but otherwise seems the most intuitive choice
  • data, build, ref, pub, plus either "script" or "roible + pyphoib"

    • fully differentiated first letters for the main repos
    • two "r" words if we add "roible"
    • two "p" words if we add the python package
    • "build" is not the most intuitive, since we're not compiling code in the technical sense, but the analogy more or less works for me.
  • data, agg, bib, pub, roible, pyphoib

    • maybe the best option, as long as the dev/build repo is really only for the aggregation code and doesn't do other things.
  • phoible-data, phoible-dev, phoible-bib, phoible-pubs, phoible, (roible)

    • this one is probably nicest from an end-user perspective, because they'll probably end up with just one or two ("phoible-data", and either "phoible" for scripts or installing "roible" through R into the R library).
    • this one is most annoying from a developer perspective: extra typing, extra tabbing, less visually distinct names in the filesystem.
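
The tab-completion cost of each naming scheme can be checked mechanically. The following is a rough sketch (the helper names are made up for this example) that reports first-letter clashes and how many characters must be typed before Tab can complete each name, assuming the repos sit side by side in one folder with nothing else in it:

```python
from collections import defaultdict


def first_letter_clashes(names):
    """Group names by first letter; keep only letters shared by 2+ names."""
    groups = defaultdict(list)
    for name in names:
        groups[name[0]].append(name)
    return {letter: group for letter, group in groups.items() if len(group) > 1}


def keystrokes_to_complete(name, names):
    """Number of characters needed before Tab can uniquely complete `name`."""
    others = [n for n in names if n != name]
    for length in range(1, len(name) + 1):
        prefix = name[:length]
        if not any(other.startswith(prefix) for other in others):
            return length
    return len(name)  # `name` is itself a prefix of another name


if __name__ == "__main__":
    # Candidate name sets taken from the list above.
    options = {
        "short": ["data", "dev", "bib", "pub", "script"],
        "distinct": ["data", "build", "ref", "pub", "script"],
        "prefixed": ["phoible-data", "phoible-dev", "phoible-bib",
                     "phoible-pubs", "phoible"],
    }
    for label, names in options.items():
        print(label, "clashes:", first_letter_clashes(names))
        for name in names:
            print("  ", name, keystrokes_to_complete(name, names))
```

This ignores the partial completion that shells perform on a shared prefix, so for the "phoible-" scheme it overstates typing somewhat, but it still shows the relative ordering of the options.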

About sprawl

It might be overkill to have so many separate repos (e.g., one just for bibliographic info). A more minimal set of repos might look like this:

  • phoible-data (either static data dumps or a data-only R package)
  • phoible-dev (with subfolders):
    • aggregation
    • feature-reduction
    • tests
    • raw-data (subfolders ph, gm, spa, etc)
    • data (link to the phoible-data repo as a submodule)
    • mappings (or "tables"; for mapping our IDs to glottocodes, other identifiers?)
  • roible (or "phoibler", R package of useful scripts, to be created later)
  • pubs (or "publications" or whatever)