Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integer values are being inappropriately #1

Merged
merged 1 commit into from
Mar 1, 2016

Conversation

robmiller
Copy link
Owner

Sample CSV attached. These should be strings containing integers, and shouldn't be treated as integers at all. Not sure the best way to handle this because either use case is valid. Maybe some configure option? --FIELDNAME "FIELDTYPE" to override the built-in behaviour?

> @db["select * from integer where _incsv_id = 5"].to_a
=> [{:_incsv_id=>5, :date=>"2016-02-29T23:51:45+00:00", :tracking=>9.27489999220832e+21}]
> @db["select * from integer where _incsv_id = 5"].to_a.first[:tracking]
=> 9.27489999220832e+21
> @db["select * from integer where _incsv_id = 5"].to_a.first[:tracking].to_s
=> "9.27489999220832e+21"

integer.zip

@robmiller
Copy link
Owner

How very strange. This seems to be happening at the SQLite level: if I run incsv on the CSV you supply, and dump the schema, I see that the column is a string:

sqlite> .schema
CREATE TABLE `integer` (`_incsv_id` integer NOT NULL PRIMARY KEY AUTOINCREMENT, `date` string, `tracking` string);

(I don't actually do any conversion of integers/floats at the moment — just dates, currency, and strings — but it's obviously one of the ones to add, along with stuff like datetimes, with all the fun of timezones.)

I'll need to get my head further around type affinities in SQLite. But I guess that the data here is being inserted with no affinity (as a blob?) and then SQLite is casting it to a number.

Thanks for this, a good one to dig into! I guess it will inevitably lead to an issue I'd already considered, which is: should you be able to override guesses for column types? Or just disable guessing? If you override them do you need to specify all of them (since that would make an easy interface — --types=integer,integer,string,date) or just override one (e.g. --column-2=integer)?

@robmiller
Copy link
Owner

By the way: I know they're just examples, but in the spirit of using Sequel you can, FYI, write your two example queries much more concisely:

> @db[:integer].first(_incsv_id: 5)
> # or if you end up wanting multiple/more complex conditions:
> @db[:integer].where(_incsv_id: 5).first
> # or multiple rows (so an array of hashes):
> @db[:integer].where(_incsv_id: 5).all
> # first returns just a hash so you can access properties directly:
> @db[:integer].first(_incsv_id: 5)[:tracking]

More info:

http://sequel.jeremyevans.net/rdoc/files/doc/dataset_basics_rdoc.html
http://sequel.jeremyevans.net/rdoc/files/doc/querying_rdoc.html
http://sequel.jeremyevans.net/rdoc/files/doc/dataset_filtering_rdoc.html

@jaspertandy
Copy link
Author

So for this sort of tool, my preference is brevity. I usually need to use this sort of thing to turn some data into a script or something similar. So I'll be using it to write a .rb or a .sql or something similar. To that end, I would say that the bare minimum for the command would be preferable for me. I'd only want to specify types for the things that were wrong, or things for which I had some preference. Excel messes with numeric string coercion so badly, it stands to reason that SQLite might also have a hard time with it, I guess!

Sequel is one of those things that I don't really have much of a reason to look into (though I wish it was available on ActiveRecord instances because I hate the default query builder). I only started to get into it because you mentioned it recently.

On 1 Mar 2016, 23:03 +0000, Rob Millernotifications@github.com, wrote:

By the way: I know they're just examples, but in the spirit of using Sequel you can, FYI, write your two example queries much more concisely:

@db[:integer].first(_incsv_id: 5)># or if you end up wanting multiple/more complex conditions:>@db[:integer].where(_incsv_id: 5).first># or multiple rows (so an array of hashes):>@db[:integer].where(_incsv_id: 5).all># first returns just a hash so you can access properties directly:>@db[:integer].first(_incsv_id: 5)[:tracking]

More info:

http://sequel.jeremyevans.net/rdoc/files/doc/dataset_basics_rdoc.html
http://sequel.jeremyevans.net/rdoc/files/doc/querying_rdoc.html
http://sequel.jeremyevans.net/rdoc/files/doc/dataset_filtering_rdoc.html


Reply to this email directly orview it on GitHub(#1 (comment)).

This not only prevents truncation of longer columns, but also avoids an
issue where SQLite was incorrectly casting numeric text fields to
numbers (since Sequel was specifying a column type of "string", which
SQLite seems to assign no affinity to).
@robmiller
Copy link
Owner

I seem to have got to the bottom of this: when you create fields as strings in Sequel, they're created in a way that gives them no affinity in SQLite. Creating them as TEXT instead gives them an affinity of text which doesn't do this numeric mangling. I'm pushing a fix now.

For your use-case: I guess you'll definitely want a "strings only" mode, where all columns are just left as-is. I wonder whether those two modes (that and the current default) will be enough for all use cases, and I can avoid trying to think of a way to do the manual overrides thing… I suspect not!

robmiller added a commit that referenced this pull request Mar 1, 2016
Fix for numeric fields that were being inappropriately cast to floats by SQLite
@robmiller robmiller merged commit 1623976 into master Mar 1, 2016
@robmiller robmiller deleted the fix/incorrect-casting branch March 1, 2016 23:18
@jaspertandy
Copy link
Author

Should be enough for my usecase. If I get some time this week I'll see if I can make a PR for custom column types.

I seem to have got to the bottom of this: when you create fields as strings in Sequel, they're created in a way that gives them no affinity in SQLite. Creating them asTEXTinstead gives them an affinity of text which doesn't do this numeric mangling. I'm pushing a fix now.

For your use-case: I guess you'll definitely want a "strings only" mode, where all columns are just left as-is. I wonder whether those two modes (that and the current default) will be enough for all use cases, and I can avoid trying to think of a way to do the manual overrides thing… I suspect not!


Reply to this email directly orview it on GitHub(#1 (comment)).

@robmiller
Copy link
Owner

Oh, your point about output raises another point: I want to have some simple helpers that make output easier, so you can do e.g.:

> table @db[:foo].first
+-----+-----+
| foo | bar |
+-----+-----+
| 24  | 89  |
+-----+-----+
> csv @db[:foo].first
foo,bar
24,89
> csv "foo.csv", @db[:foo].first

…and so on. At the moment it's good only for exploratory analysis, and not so good for then getting your output into another tool.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants