
Prepare release #9

Merged: 5 commits, Aug 12, 2024


@chrzyki (Contributor) commented Aug 1, 2024

No description provided.

@FredericBlum commented:

This PR does not address #8, right? I think we have to take care of this, since it led to many doubled entries.

@chrzyki (Contributor, Author) commented Aug 1, 2024

No, this will be a separate PR.

@chrzyki (Contributor, Author) commented Aug 5, 2024

Tests are allowed to fail for now. Commit 47827ed and the subsequent commits are a draft that uses the online supplement, as discussed in #8, which should also fix lexibank/lexibank-analysed#49. Note that this is a very quick first draft of switching to the other data source, and I fully expect cognates etc. not to work 100% correctly yet.

Things I need to do:

  1. verify the data (particularly in comparison to the old version)
  2. verify cognate sets
  3. verify languages (theoretically, the online supplement version of the dataset includes a substantial number of additional languages)
  4. answer the question whether the data sheet I'm using (i.e. the one based on English glosses) is indeed the right one to use, or whether it should rather be `33403_SOURCE4_2_A.cognatesets.csv`

@chrzyki (Contributor, Author) commented Aug 5, 2024

Apologies, I just continued working in this PR!

@chrzyki (Contributor, Author) commented Aug 7, 2024

After a first round of spot checks, everything looks good with the online supplement used instead of the previous raw TSV. The cognate sets also seem correct to me. @LinguList, this would close #8 (though I'm still going to do some more checks and clean up the XLSX->CSV process). It fixes the issue of cognates occurring multiple times, which led to doublets (triplets, etc.) in the lexeme list and in Lexibank.
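The doublet problem described above is something that can be checked mechanically after a data-source switch like this one. The following is a minimal sketch, not the dataset's actual tooling: it assumes a CLDF-style `forms.csv` with `Language_ID`, `Parameter_ID`, and `Form` columns (an assumption; the thread does not show the file layout), and the helper names `find_doublets` and `find_doublets_in_file` are hypothetical. It counts rows that share the same language, concept, and form and reports any key seen more than once.

```python
import csv
from collections import Counter
from typing import Dict, Iterable, Mapping, Tuple

# (language, concept, form) triple used to detect repeated lexemes.
Key = Tuple[str, str, str]


def find_doublets(rows: Iterable[Mapping[str, str]]) -> Dict[Key, int]:
    """Return every (language, concept, form) key seen more than once, with its count."""
    counts = Counter(
        (row["Language_ID"], row["Parameter_ID"], row["Form"]) for row in rows
    )
    return {key: n for key, n in counts.items() if n > 1}


def find_doublets_in_file(path: str) -> Dict[Key, int]:
    """Read a CSV file (assumed CLDF-style forms.csv) and report doublets."""
    with open(path, newline="", encoding="utf-8") as handle:
        return find_doublets(csv.DictReader(handle))


if __name__ == "__main__":
    # Placeholder rows for illustration only; not taken from the dataset.
    sample = [
        {"Language_ID": "lang1", "Parameter_ID": "concept1", "Form": "aaa"},
        {"Language_ID": "lang1", "Parameter_ID": "concept1", "Form": "aaa"},
        {"Language_ID": "lang1", "Parameter_ID": "concept2", "Form": "bbb"},
    ]
    print(find_doublets(sample))  # {('lang1', 'concept1', 'aaa'): 2}
```

An empty result means no exact repeats were found; a non-empty one lists the rows worth inspecting. Keying on the form as well as language and concept is a deliberate choice here, so that legitimate synonyms (different forms for the same concept) are not flagged.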

Also, the online supplement version of the data contains more languages (varieties) than the previous raw data file; (x) marks varieties that were already present in the previous version of the raw data:

  • PLa (x)
  • CW-QY (x)
  • C-WC
  • C-LJ
  • C-LB
  • C-CJ
  • C-QS
  • CE-YA
  • W-SZP (x)
  • W-SLZ
  • W-DT
  • W-YL
  • E-DC (x)
  • E-HS
  • E-TS
  • XZ (x)
  • YL (young) (x, as just YL)
  • YL (old)
  • MD (x)
  • Eka

Should those be included as well, or were they excluded for a reason? (I'm not an expert on these languages or studies, so there might be a reason I'm not seeing here.)

@LinguList (Contributor) left a comment

Thanks a lot. In my opinion, the major work is done for now. Very cool that you now used the official data, @chrzyki. If we find any other errors in the future, we can address them in subsequent versions. For now, I'd say this is good enough for Lexibank 2.0.

@chrzyki (Contributor, Author) commented Aug 12, 2024

Thank you very much for checking! I'll prepare the updated release for Lexibank 2.0.

@chrzyki merged commit d92ec94 into master on Aug 12, 2024
1 check passed
3 participants