Integer values are being inappropriately #1
How very strange. This seems to be happening at the SQLite level: if I run incsv on the CSV you supply, and dump the schema, I see that the column is a string:
(I don't actually do any conversion of integers/floats at the moment, just dates, currency, and strings, but it's obviously one of the ones to add, along with stuff like datetimes, with all the fun of timezones.) I'll need to get my head further around type affinities in SQLite. But I guess that the data here is being inserted with no affinity (as a blob?) and then SQLite is casting it to a number.
Thanks for this, a good one to dig into! I guess it will inevitably lead to an issue I'd already considered, which is: should you be able to override guesses for column types? Or just disable guessing? If you override them do you need to specify all of them (since that would make an easy interface …
By the way: I know they're just examples, but in the spirit of using Sequel you can, FYI, write your two example queries much more concisely:
More info: http://sequel.jeremyevans.net/rdoc/files/doc/dataset_basics_rdoc.html
So for this sort of tool, my preference is brevity. I usually need this sort of thing to turn some data into a script, so I'll be using it to write a .rb or a .sql or something similar. To that end, the bare minimum for the command would be preferable for me. I'd only want to specify types for the things that were wrong, or things for which I had some preference. Excel messes with numeric string coercion so badly that it stands to reason SQLite might also have a hard time with it, I guess!
Sequel is one of those things that I don't really have much of a reason to look into (though I wish it were available on ActiveRecord instances, because I hate the default query builder). I only started to get into it because you mentioned it recently.
On 1 Mar 2016, 23:03 +0000, Rob Miller notifications@github.com, wrote:
This not only prevents truncation of longer columns, but also avoids an issue where SQLite was incorrectly casting numeric text fields to numbers (since Sequel was specifying a column type of "string", which SQLite seems to assign no affinity to).
I seem to have got to the bottom of this: when you create fields as strings in Sequel, they're created in a way that gives them no affinity in SQLite. Creating them as For your use-case: I guess you'll definitely want a "strings only" mode, where all columns are just left as-is. I wonder whether those two modes (that and the current default) will be enough for all use cases, and I can avoid trying to think of a way to do the manual overrides thing… I suspect not!
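For anyone following along: SQLite's documentation describes column affinity as being derived from the declared type name by simple substring matching. Here is a rough Ruby sketch of those rules (`sqlite_affinity` is a made-up helper for illustration, not part of Sequel or incsv). If I'm reading the rules right, a declared type of "string" matches none of the substrings and so falls through to NUMERIC affinity, which would be consistent with numeric-looking text being converted on insert.

```ruby
# Illustrative sketch of SQLite's documented affinity rules, checked in order
# against the declared column type. `sqlite_affinity` is a hypothetical helper.
def sqlite_affinity(declared_type)
  type = declared_type.to_s.upcase
  return "INTEGER" if type.include?("INT")
  return "TEXT"    if %w[CHAR CLOB TEXT].any? { |s| type.include?(s) }
  return "BLOB"    if type.empty? || type.include?("BLOB")
  return "REAL"    if %w[REAL FLOA DOUB].any? { |s| type.include?(s) }
  "NUMERIC" # anything else, including "string"
end

%w[string text varchar(255) integer].each do |declared|
  puts "#{declared} -> #{sqlite_affinity(declared)}"
end
```

So a column declared with a type name containing "TEXT" or "CHAR" should keep values like postcodes or zero-padded IDs as the strings they were in the CSV.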
Fix for numeric fields that were being inappropriately cast to floats by SQLite
Should be enough for my use case. If I get some time this week I'll see if I can make a PR for custom column types.
Oh, your point about output raises another point: I want to have some simple helpers that make output easier, so you can do e.g.:
…and so on. At the moment it's good only for exploratory analysis, and not so good for then getting your output into another tool.
Sample CSV attached. These should be strings containing integers, and shouldn't be treated as integers at all. Not sure of the best way to handle this, because either use case is valid. Maybe some configuration option, such as
--FIELDNAME "FIELDTYPE"
to override the built-in behaviour?
integer.zip
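To make the suggestion a bit more concrete, here is a minimal sketch of how per-column overrides could be layered on top of guessed types, so that only the wrong guesses need to be specified. Everything in it is hypothetical: `parse_type_overrides`, `apply_overrides`, and the column names and type symbols are invented for illustration and are not incsv's actual interface.

```ruby
# Hypothetical: collect `--FIELDNAME FIELDTYPE` pairs from an argument list.
# Assumes the arguments arrive strictly as flag/value pairs.
def parse_type_overrides(argv)
  overrides = {}
  argv.each_slice(2) do |flag, type|
    next unless flag.start_with?("--") && type
    overrides[flag.delete_prefix("--")] = type.downcase.to_sym
  end
  overrides
end

# Hypothetical: overrides win over guesses; unknown column names are rejected.
def apply_overrides(guessed, overrides)
  unknown = overrides.keys - guessed.keys
  raise ArgumentError, "unknown column(s): #{unknown.join(', ')}" unless unknown.empty?
  guessed.merge(overrides)
end

# Made-up example: the tool guessed "postcode" as an integer, the user disagrees.
guessed   = { "id" => :integer, "postcode" => :integer, "joined" => :date }
overrides = parse_type_overrides(["--postcode", "string"])
p apply_overrides(guessed, overrides)
```

Columns without an override keep their guessed type, which keeps the command as short as possible for the common case.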