Skip to content

Conversation

@jbv2
Copy link
Contributor

@jbv2 jbv2 commented May 13, 2025

PR Checklist for a new package submission

  • The package does not exist already in the community archive, also not with a different name.
  • The package title in the POSEIDON.yml conforms to the general title structure suggested here: <Year>_<Last name of first author>_<Region, time period or special feature of the paper>, e.g. 2021_Zegarac_SoutheasternEurope, 2021_SeguinOrlando_BellBeaker or 2021_Kivisild_MedievalEstonia.
  • The package is stored in a directory that is named like the package title.

  • The package is complete and features the following elements:
    • Genotype data in binary PLINK format (not EIGENSTRAT format).
    • A POSEIDON.yml file with not just the file-referencing fields, but also the following meta-information fields present and filled: poseidonVersion, title, description, contributor, packageVersion, lastModified (see here for their definition)
    • A reasonably filled .janno file (for a list of available fields look here and here for more detailed documentation about them).
    • A .bib file with the necessary literature references for each sample in the .janno file.
  • Every file in the submission is correctly referenced in the POSEIDON.yml file and there are no additional, supplementary files in the submission that are not documented there.
  • Genotype data, .janno and .bib file are all named after the package title and only differ in the file extension.
  • The package version in the POSEIDON.yml file is 1.0.0.
  • The poseidonVersion of the package in the POSEIDON.yml file is set to the latest version of the Poseidon schema.
  • The POSEIDON.yml file contains the corresponding checksums for the fields genoFile, snpFile, indFile, jannoFile and bibFile.
  • There is either no CHANGELOG file or one with a single entry for version 1.0.0.

  • The Publication column in the .janno file is filled and the respective .bib file has complete entries for the listed mentioned keys.
  • The .janno file does not include any empty columns or columns only filled with n/a.
  • The order of columns in the .janno file adheres to the standard order as defined in the Poseidon schema here.
  • The .janno and the .ssf files are not fully quoted, so they only use single- or double quotes ("...", '...') to enclose text fields where it is strictly necessary (i.e. their entry includes a TAB).

  • The package passes a validation with trident validate --fullGeno.

  • Large genotype data files are properly tracked with Git LFS and not directly pushed to the repository. For an instruction on how to set up Git LFS please look here. If you accidentally pushed the files the wrong way you can fix it with git lfs migrate import --no-rewrite path/to/file.bed (see here).

@jbv2 jbv2 assigned jbv2 and EleniSef and unassigned jbv2 May 13, 2025
@nevrome
Copy link
Member

nevrome commented May 13, 2025

Nice! Thanks @jbv2!

If I understood correctly, then @EleniSef may eventually provide the .janno file for this package. If so and to avoid Git headaches, I suggest you open a PR against @jbv2's branch jbv2:2023_Nakatsuka_CaliMexico, @EleniSef. When the correct .janno file gets added there, then this PR here will automatically get updated as well.

@EleniSef
Copy link
Contributor

Hi! yes, I will start working on the .janno file today and open a PR when it's ready!

@jbv2 jbv2 requested a review from a team May 15, 2025 13:02
I confronted the janno file with the supplementary table from Nakatsuka2023 and changed some stuff (see my review comment). There is a bunch of stuff that should probably be checked which I also mentioned in my comment but haven't changed here as I am not sure they actually need a change or they were intentionally altered as compared to the supplement.
@martynamolak
Copy link
Contributor

martynamolak commented May 21, 2025

Janno file:

    1. I removed hash signs from location columns in the janno file as they might cause problems (be read as comment-out) in some downstream analyses. But of course feel free to ignore this change
    1. Those age-related notes are quite confusing but I see they're straight off the original publication's supplement (although for some samples, like I23706, they are different between the submitted package and the supplement, I added it) but I guess we don't really have much choice but to paste them as are for the Poseidon package.
    1. If the uncalibrated date is not available but the Date_BC_AD_Start and Date_BC_AD_Stop still come from actual C14 dating, then shouldn't we still label it as C14? Or is is mandatory to fill in Date_C14_Uncal_BP and Date_C14_Uncal_BP_Err when Date_Type is set to C14? If not, I think we should still be labelling it as C14 (but that's a matter more for the whole general Poseidon-dev consideration; not particularly for this package @nevrome )
    1. user and ds/ss information did not all adhere to the supplementary info of the original paper (I fixed these in the janno file)
    1. Latitude of Cueva DeLos Muertos Chiquitos (Rio Zape) differs between the paper and your janno (28.813 vs 25.813). Is this an intentional fix or a typo?
    1. Similar for Tayopa Site #60 (Sonora, Sahuaripa) Longitude is -100.8 vs -108.8 in the original
    1. Also, for two samples (I23705, I23706) La Playa (Sonora, Trincheras) the coordinates also differ. And according to some archaeological publication for these sites (https://www.mdpi.com/2073-445X/12/3/560), I think there might be various coordinates within the "La Playa" as there were a couple of sites there.
    1. Also for this one: CA-SBA-477 (California, Lake Cachuma, Tequepis Creek) Lat and Lon are different from the supplement
    1. The age for the I11983, I11984, I11986 and I11987 in the supplement are 100-1400 CE and in the janno these are 1000 - 1400CE. While it makes sense that this was a mistake from the authors' side, it may also be that this is a very uncertain contextual dating hence the wide range. Have you confirmed that anyhow?
    1. I fixed the stop age for sample I23705; must have been a typo
    1. I changed "tooth(molar)" to "tooth_molar" as recommended, also added "; bone" to I2713 as the supplement says it was both, and "_incisor" to I21250 and I21249.

I also had a look at the other files but haven't noticed anything suspicious :)

@martynamolak
Copy link
Contributor

So it seems that if I submit a commit with suggested changes for the janno, it damages the checksums and hence validation fails. I hope you can still see my changes to the janno and can use them to edit your version with a correct checksum.

@EleniSef
Copy link
Contributor

EleniSef commented May 26, 2025

Hi @martynamolak ,

thank you for the super thorough review and for already fixing some of the issues you encountered. You caught a lot of my typos! Thank you.
As a response and a way forward:

    1. If I understood correctly it is mandatory to fill in Date_C14_Uncal_BP and Date_C14_Uncal_BP_Err when Date_Type is set to C14. A solution would be to add this as a Date_Note.
    1. The wrong coordinates were a typo which I will fix.
    1. For the same lat / long coordinates, I am aware they are repeated. This is the way they are reported in the paper's supplementary - most likely by accident. We could search for the right coordinates and correct them but as you said we risk providing more generic and/or wrong coordinates. I am not sure how to handle this, what do you think @nevrome?
    1. For the 1000-1400 dates, it was indeed a typo from my side. I believe that even though it's probably reported wrong in the supplementary we shouldn't correct the date based on this assumption.

Once we agree on how to proceed with the date_note and the repeated coordinates I will finish the edits on the janno and can be uploaded again.

@nevrome
Copy link
Member

nevrome commented May 27, 2025

Thank you for the thorough review, @martynamolak! It's commendable that you even went ahead and already implemented the non-controversial changes in the .janno file. And thanks for already looking through the suggestions, @EleniSef!

About the points you raised:

    1. Eleni is right, Date_Type == C14 requires Date_C14_Uncal_BP and Date_C14_Uncal_BP_Err to be set. I decided to specify it like this from an analysis point of view: A C14 date is only useful for any sort of derived analysis when the necessary info is there. Date_Note would be the right place to say that there is a C14 date somewhere supporting this dating.
    1. You're talking about these sites, right?
      image
      image
      I don't see a coordinate difference between I23705 and I23706 in the current version of the .janno file. Are the two different coordinates for La Playa (Sonora, Trincheras) the issue?

@EleniSef
Copy link
Contributor

I updated the Date_Note field to include this and fixed the coordinates errors.

I have also implemented all of Martyna's changes in the .janno so @jbv2 can upload it and we spare ourselves the issues with checksum and validation.

@nevrome
Copy link
Member

nevrome commented Jun 6, 2025

I simplified the documentation for the partially complete C14 ages in the .janno file and added the month field to the bibtex entry. Looks good now. Will merge.

@nevrome nevrome merged commit e448c98 into poseidon-framework:master Jun 6, 2025
1 check passed
@nevrome nevrome mentioned this pull request Oct 1, 2025
22 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants