Uncaught incorrect value 'X' of type string passed to column 'Y with type number #214

Open
protobi opened this Issue Oct 22, 2013 · 0 comments


ghost commented Oct 22, 2013

Is there an option to tell the Miso Dataset CSV parser to look at all the values before determining that a column should be of type other than String?

In practice the CSV parser is brittle for general datasets: it does not look far enough down a column to detect its type, and it is then unforgiving about type mismatches, failing with a hard error.

I keep running into errors of the form "Uncaught incorrect value 'X' of type string passed to column 'Y with type number". This happens in cases where

  • A column is detected as boolean from its early values and then does not allow "Y"
  • A column has ICD9 codes that are often numeric (e.g. 40100), and fails on the first V code (e.g. V0100)
  • A column has postal codes, which look like numbers (e.g. 02138) but are really strings, and fails on the first Canadian code (e.g. K1A0B1)
  • A column has medical ID numbers, which look like numbers for most physicians (e.g. 00002348938) but include alphanumerics for nurses and other HCPs
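
The failure mode in these cases can be reproduced without Miso. The sketch below is a hypothetical stand-in, not Miso's actual code: `looksNumeric`, `inferFromHead`, and `firstMismatch` are illustrative names, and the head size of 5 is an assumption chosen to match the slice mentioned in the update further down.

```javascript
// Hypothetical stand-in for head-only type detection; not Miso's code.
function looksNumeric(value) {
  // Matches numeric-looking strings such as "40100" or "02138".
  return /^-?\d+(\.\d+)?$/.test(String(value));
}

function inferFromHead(values, headSize) {
  // Decide the column type from the first `headSize` values only.
  return values.slice(0, headSize).every(looksNumeric) ? "number" : "string";
}

function firstMismatch(values, type) {
  // Index of the first value that contradicts the inferred type, or -1.
  if (type !== "number") return -1;
  return values.findIndex(function (v) { return !looksNumeric(v); });
}

// Five numeric ICD9 codes followed by the first V code.
var icd9 = ["40100", "25000", "41401", "42731", "4280", "V0100"];
var type = inferFromHead(icd9, 5);
var bad = firstMismatch(icd9, type);
```

Here the column is typed from its first five values, and the sixth value is the first one that contradicts that type, which is the point where a strict parser would raise the hard error.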

Many stats packages look at the first 100 rows by default, and have an option to scan more or even all rows before assessing column type.

** Update **
I see builder.js line 23 has the code, so I just need to find a way to parameterize the 5:
var type = _.inject(data.slice(0, 5), function(memo, value) {

Created a quick patch that always scans all the values. Since a type mismatch is a fatal error, it seems more appropriate to make a complete scan the default and a partial scan the option.
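
For illustration, here is a minimal sketch of what "complete scan by default, partial scan as an option" could look like. It is not the linked patch and not Miso's API: `detectType`, `inferColumnType`, and the `sampleSize` option are hypothetical names, and the fallback to "string" on mixed values is an assumption.

```javascript
// Sketch only: these names are hypothetical, not part of Miso Dataset.
function detectType(value) {
  // Empty cells give no evidence about the column type.
  if (value === null || value === undefined || value === "") return null;
  if (typeof value === "boolean" || /^(true|false)$/i.test(value)) return "boolean";
  if (typeof value === "number" || /^-?\d+(\.\d+)?$/.test(value)) return "number";
  return "string";
}

function inferColumnType(values, options) {
  var sampleSize = options && options.sampleSize;
  // Default: scan every value. A partial scan is opt-in via sampleSize.
  var sample = sampleSize > 0 ? values.slice(0, sampleSize) : values;
  var type = null;
  for (var i = 0; i < sample.length; i++) {
    var t = detectType(sample[i]);
    if (t === null) continue;
    if (type === null) {
      type = t;
    } else if (type !== t) {
      return "string"; // mixed evidence: fall back to string (assumption)
    }
  }
  return type || "string";
}

// US postal codes followed by the first Canadian code.
var postal = ["02138", "10001", "94105", "60601", "30301", "K1A0B1"];
var fullScan = inferColumnType(postal);
var partialScan = inferColumnType(postal, { sampleSize: 5 });
```

With the full scan the Canadian code is seen and the column falls back to string; with `sampleSize: 5` it is still typed as number, reproducing the original problem. A full scan does not help with columns such as all-numeric postal codes, which would still be typed as number unless the type is set explicitly.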

https://github.com/gradualstudent/dataset/tree/master/dist
