Feature table -- Outdated -- missing 1000 segments #373

Open
tang-kevin opened this issue Mar 27, 2024 · 7 comments

Comments

@tang-kevin

Dear Daniel,

I am looking for the feature table that covers all the segments listed at https://phoible.org/parameters (3,183 segments).

But the one available for download on GitHub, https://github.com/phoible/dev/blob/master/raw-data/FEATURES/phoible-segments-features.tsv, has only 2,161 segments and is missing basic segments such as "ɚ".

Perhaps it was not updated properly, as the other files in the same folder were all updated one year ago.

Many thanks,
Kevin

@drammock
Member

I thought it should be possible on https://phoible.org/parameters to simply press a "download" button to get that table, but looking at the site now I don't see any download button (cc @xrotwang - am I just mistaken that there should be a download button for the full segments table?)

As for why the table in https://github.com/phoible/dev/blob/master/raw-data/FEATURES/phoible-segments-features.tsv has fewer segments than the table at https://phoible.org/parameters --- I'm not sure off the top of my head. The website should reflect the state of this repo as of 862bec9 (the 2.0 release tag) but if I look at that file from the 2.0 release commit it has 2163 lines, not 3183 like on the website. Maybe @bambooforest or @xrotwang have ideas? The files in https://github.com/clld/phoible/tree/master/phoible/static/data have suspicious-looking filenames/dates, making me wonder if the live data is in fact out of date?

@xrotwang
Contributor

The process of feeding PHOIBLE 2.0 into the web app wasn't particularly streamlined :) This should be a lot simpler for PHOIBLE 3.0, I'd hope.

So, the data from https://github.com/phoible/dev/ was converted to a CLDF dataset using scripts in https://github.com/bambooforest/phoible-scripts . The process is described in https://github.com/bambooforest/phoible-scripts/blob/master/to_cldf/to_cldf.md and the 3,183 segments already show up at that step. This CLDF data then served as input for https://github.com/cldf-datasets/phoible/ , which basically copies the CLDF data but adds metadata, and which was eventually loaded into the web app database.

As far as I can tell, the primary data source in the phoible/dev repo is the RData object https://github.com/phoible/dev/blob/master/data/phoible.RData , but @bambooforest might know more about this.

@xrotwang
Contributor

xrotwang commented Mar 28, 2024

> (cc @xrotwang - am I just mistaken that there should be a download button for the full segments table?)

I did away with the per-table download buttons when I moved to the new paradigm that clld apps only serve data from released CLDF datasets. Thus, rather than downloading individual collections of rows (filtered, sorted, or otherwise manipulated, and without any provenance information), users are encouraged to work from the full CLDF dataset, which includes metadata regarding provenance, etc.

I realize that the PHOIBLE app still advertises the per-table download feature, though. Should be changed (see clld/phoible#32).
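To illustrate what "metadata regarding provenance" can look like in practice, here is a minimal sketch that summarises a CLDF metadata file using only the Python standard library. CLDF metadata is a JSON document describing a group of tables; the property names `dc:title` and `prov:wasDerivedFrom` are assumptions about what such a file commonly contains, the helper `describe_cldf_metadata` is hypothetical, and the sample document is toy data rather than the real PHOIBLE metadata.

```python
import json


def describe_cldf_metadata(metadata):
    """Summarise a CLDF metadata document (a JSON table group) as a plain dict."""
    return {
        # Both property names below are assumptions; missing keys yield defaults.
        "title": metadata.get("dc:title"),
        "derived_from": metadata.get("prov:wasDerivedFrom", []),
        # Each table description carries a "url" pointing at its CSV file.
        "tables": [table.get("url") for table in metadata.get("tables", [])],
    }


# Toy metadata standing in for a real file loaded with json.load().
sample = json.loads("""
{
  "dc:title": "Example dataset",
  "prov:wasDerivedFrom": [{"rdf:about": "https://example.org/source-repo"}],
  "tables": [{"url": "values.csv"}, {"url": "parameters.csv"}]
}
""")

summary = describe_cldf_metadata(sample)
print(summary["tables"])  # ['values.csv', 'parameters.csv']
```

For a real dataset, the same function could be applied to the result of `json.load()` on the metadata file shipped alongside the CSV tables.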

@xrotwang
Contributor

@tang-kevin Looking at your particular example, maybe phoible-segments-features.tsv isn't supposed to be the full list of segments appearing in any inventories? Just grepping for the segment reveals that it appears elsewhere:

$ grep "ɚ" raw-data/*/*
raw-data/FEATURES/component-feature-table.csv:ɚ,025A,0,-,+,-,-,-,+,+,0,+,-,-,-,-,-,0,0,+,+,+,+,+,-,-,-,-,-,-,-,+,-,-,-,0,-,-,0
raw-data/UZ/UZ_inventories.tsv:				"ɚː"		"ɚː"	"""vowels are lowered and centralised before [ɹ] and many contrasts are lost"""	
raw-data/UZ/UZ_inventories.tsv:				"ɚ"		"ɚ"	
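For anyone without `grep` at hand, roughly the same search can be sketched in Python. This assumes it is run from a checkout of phoible/dev; the function name `find_segment` is hypothetical and not part of any PHOIBLE tooling. It does a plain substring match with no Unicode normalization, so a segment stored in a different normalization form would not be found.

```python
from pathlib import Path


def find_segment(root, segment, pattern="*/*"):
    """Yield (path, line) pairs for every line under root that contains segment.

    Mirrors `grep SEGMENT raw-data/*/*`: files exactly one directory level
    below root, plain substring match.
    """
    for path in sorted(Path(root).glob(pattern)):
        if not path.is_file():
            continue
        # errors="replace" keeps the scan going over files that are not valid UTF-8.
        with path.open(encoding="utf-8", errors="replace") as handle:
            for line in handle:
                if segment in line:
                    yield path, line.rstrip("\n")
```

Calling `find_segment("raw-data", "ɚ")` and printing each `path` and `line` gives output in the same `path:line` shape as the grep run above.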

@tang-kevin
Author

tang-kevin commented Mar 28, 2024 via email

@xrotwang
Contributor

@tang-kevin As far as I can tell, https://github.com/cldf-datasets/phoible/blob/v2.0.1/cldf/parameters.csv is exactly the complete list of all sounds encountered in any of the inventories covered in PHOIBLE.
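To see which segments are in the complete list but lack a row in the feature table, the two files can be compared as sets. The following is a minimal sketch, not a tested recipe for the real files: the column names `Name` (in parameters.csv) and `segment` (in phoible-segments-features.tsv) are assumptions, the helper names are hypothetical, and the demonstration runs on toy data.

```python
import csv
import io


def read_column(handle, column, delimiter=","):
    """Return the set of non-empty values found in one column of a delimited file."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    return {row[column] for row in reader if row.get(column)}


def missing_segments(parameters_csv, features_tsv,
                     name_column="Name", segment_column="segment"):
    """Segments listed in the parameters table but absent from the feature table."""
    all_segments = read_column(parameters_csv, name_column)
    with_features = read_column(features_tsv, segment_column, delimiter="\t")
    return all_segments - with_features


# Toy data standing in for the two real files.
parameters = io.StringIO("ID,Name\n1,a\n2,ɚ\n")
features = io.StringIO("segment\ttone\na\t0\n")
print(missing_segments(parameters, features))  # {'ɚ'}
```

With the real files, the two arguments would be file objects opened with `encoding="utf-8"` and `newline=""`, and the column names should be checked against the actual headers first.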

@tang-kevin
Author

@xrotwang Thank you. It does appear to have all 3,183 sounds! It solves my personal problem for sure.
I would suggest that the PHOIBLE website direct readers to this file instead of advertising a per-table download.
