Making ids (cual-ids) and alphanumeric ids spreadsheet-friendly, or addressing what happens to ids when read into Google Sheets #90

Open
shiffer1 opened this issue Feb 8, 2018 · 1 comment


shiffer1 commented Feb 8, 2018

Improvement Description
As I have been working with these again I wanted to revisit this issue.
Just to reiterate, since it has been a while: these are the problems that come up when a metadata sheet containing cual-ids (or any alphanumeric ids) is opened in Excel or Google Sheets:

  • A cual-id with an e followed by a number is read as scientific notation:
    for example, 891e3 is read in as 891000
  • A cual-id with leading zeroes loses them:
    for example, 04567 is read in as 4567
  • Any cual-id made up entirely of digits is treated as a number rather than as text.
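A rough sketch of how the three cases above could be flagged automatically, in Python. The regular expressions are simplified assumptions about what Excel and Google Sheets treat as numbers (not their actual parsing rules), and `spreadsheet_risks` is a hypothetical helper, not part of cual-id or Keemei.

```python
import re

# Simplified approximations of spreadsheet type inference (assumption,
# not the real Excel or Google Sheets rules).
_SCI_NOTATION = re.compile(r"^[+-]?(\d+\.?\d*|\.\d+)[eE][+-]?\d+$")
_ALL_DIGITS = re.compile(r"^\d+$")


def spreadsheet_risks(sample_id: str) -> list:
    """Hypothetical helper: list the ways a spreadsheet might mangle an id."""
    risks = []
    if _SCI_NOTATION.match(sample_id):
        # e.g. "891e3" would be shown as 891000
        risks.append("scientific-notation")
    if _ALL_DIGITS.match(sample_id):
        # all-digit ids become numbers instead of text
        risks.append("numeric")
        if len(sample_id) > 1 and sample_id.startswith("0"):
            # numeric conversion drops the leading zeroes, e.g. "04567" -> 4567
            risks.append("leading-zeros")
    return risks
```

For example, `spreadsheet_risks("891e3")` flags scientific notation, `spreadsheet_risks("04567")` flags both the numeric and leading-zero cases, and an id such as `"a1b2c"` comes back with no risks.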

Proposed Behavior
Earlier we offered importing as the solution. I used the import functions in both Excel and Google Sheets, and this works well. It is a bit technical, though, and I worry about some folks' ability to handle it without a good set of instructions. I definitely think we should provide some language on how to deal with this. We currently recommend Keemei for verifying metadata sheets, and I talked with @jairideout about this yesterday. Since we recommend both Excel and Google Sheets, the tool descriptions should include instructions for importing these documents into each of them when working with cual-ids.

Comments

  1. All that being said, I think it may be worth revisiting whether to adjust the characters allowed in cual-ids. The issue could be considered solved because importing gives us a workaround; however, it only takes one click of a button to undo the formatting of the cual-ids. Many of the people we hand these ids off to (lab staff, sequencing centers, etc.) may not have a great deal of technical aptitude, and we are putting the onus on them to make sure the files are imported rather than just opened. While I agree we want to avoid making exceptions for a bunch of tools, both Excel and Google Sheets are widely used by the community at large, which of course includes the labs and sequencing centers.
  2. I believe a simple fix, disallowing the e character, would solve 90% of the problems that come from simply opening the files. The all-digit and leading-zero problems are rather minor, since those ids can still be easily interpreted.
  3. I believe we need to either make this exception, provide proper instructions in both the cual-id and Keemei docs, or do both. Thoughts?
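To illustrate point 2, here is a minimal sketch of drawing ids from an alphabet with e removed. The alphabet (lowercase letters plus digits) and the `make_id` helper are hypothetical; cual-id's real character set and id format may differ.

```python
import secrets
import string

# Hypothetical alphabet: lowercase letters and digits, minus "e".
# This is NOT cual-id's actual alphabet; it only illustrates the proposal.
SAFE_ALPHABET = "".join(
    c for c in string.ascii_lowercase + string.digits if c != "e"
)


def make_id(length: int = 6) -> str:
    """Hypothetical helper: draw a random id that can never contain "e"."""
    return "".join(secrets.choice(SAFE_ALPHABET) for _ in range(length))
```

This only removes the scientific-notation case. An id made entirely of digits, or one starting with 0, can still be drawn, which matches the point above that those cases are the minor ones.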
@jairideout
Member

Thanks @shiffer1! I think we could add some recommendations for importing spreadsheets to Keemei's docs (e.g. don't try to infer column types when importing; keep everything as strings).
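As a small illustration of the "keep everything as strings" recommendation outside a spreadsheet: Python's `csv` module never converts field types, so ids come back exactly as written. The tab delimiter and the `sample-id` column name are assumptions about the metadata layout, and `naive_numeric_read` is a hypothetical stand-in for spreadsheet type inference, not how Excel or Google Sheets actually work.

```python
import csv
import io


def read_ids_as_text(stream, id_column, delimiter="\t"):
    """Read one column as raw strings; the csv module never infers types."""
    reader = csv.DictReader(stream, delimiter=delimiter)
    return [row[id_column] for row in reader]


def naive_numeric_read(values):
    """Hypothetical stand-in for type inference: numeric-looking text becomes a number."""
    converted = []
    for value in values:
        try:
            number = float(value)
        except ValueError:
            converted.append(value)  # not numeric, kept as text
            continue
        converted.append(int(number) if number.is_integer() else number)
    return converted


metadata = io.StringIO("sample-id\tph\n891e3\t7.0\n04567\t6.5\na1b2c\t7.2\n")
ids = read_ids_as_text(metadata, "sample-id")
print(ids)                      # ['891e3', '04567', 'a1b2c']
print(naive_numeric_read(ids))  # [891000, 4567, 'a1b2c']
```

The second line of output reproduces both failures from the original report (891e3 becoming 891000 and 04567 losing its leading zero), while the text-only read leaves every id intact.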

You might also create an issue on the cual-id repo to discuss cual-id specifics (e.g. removing use of the e character). That seems like a separate issue/discussion from updating Keemei's docs. Thanks!
