Promote columns #64

@davidanthoff

Assume this file:

a,b,c,d,e,f
1863001,134,9981,1.0019,1.0000,0.0246182823
1863001,134,9982,1.0046,1.0000,0.0057100506
1863001,134,9983,1.0186,1.0000,-0.007331108
1863001,134,9984,1.0069,1.0000,0.0052499037
1863001,134,9985,0.9994,1.0000,0.0017077061
1863001,134,9986,1.0009,1.0000,0.0147836641
1863001,134,9987,0.9968,1.0000,-0.004429422
1863001,134,9988,1.0006,1.0000,0.0047920628
1863001,134,9989,1.0143,1.0000,-0.009376122
1863001,134,9990,1.0268,1.0000,-0.002704543
1863001,134,9991,1.0124,1.0000,-0.005628563
1863001,134,9992,1.0053,1.0000,0.0098183902
1863001,134,9993,1.0026,1.0000,0.0098087401
1863001,134,9994,0.9935,1.0000,0.0029873303
1863001,134,9995,1.0112,1.0000,0.0021465219
1863001,134,9996,0.9926,1.0000,-0.006265291
1863001,134,9997,0.9968,1.0000,0.0061212349
1863001,134,9998,0.9999,1.0000,-0.00676656
1863001,134,9999,1.0012,1.0000,0.0010823933
1863001,134,10000,1.0009,1.0000,-0.002033899
1863209,137,0,1.0000,"2,773.9000",

Column type detection gets this wrong because the type change occurs only after row 20: column e holds plain floats in the first 20 data rows, and the quoted value "2,773.9000" first appears in row 21. Right now reading this fails with:

Main> TextParse.csvread("foo.csv")
MethodError: Cannot `convert` an object of type Float64 to an object of type TextParse.StrRange
This may have arisen from a call to the constructor TextParse.StrRange(...),
since type constructors fall back to convert methods.ERROR: CSV parsing error in foo.csv at line 24 char 21:
1863209,137,0,1.0000,"2,773.9000",
____________________^
column 5 is expected to be: TextParse.Field{Float64,TextParse.Numeric{Float64}}(<Float64>, true, true, false)
Stacktrace:
 [1] copy!(::Array{TextParse.StrRange,1}, ::Int64, ::Array{Float64,1}, ::Int64, ::Int64) at .\abstractarray.jl:691
 [2] promote_column(::Array{Float64,1}, ::Int64, ::Type{T} where T, ::Type{T} where T, ::Bool) at C:\Users\david\.julia\v0.6\TextParse\src\csv.jl:470
 [3] promote_field(::String, ::TextParse.Field{Float64,TextParse.Numeric{Float64}}, ::Array{Float64,1}, ::TextParse.CSVParseError, ::Array{String,1}, ::Type{T} where T) at C:\Users\david\.julia\v0.6\TextParse\src\csv.jl:435
 [4] (::TextParse.##39#43{DataType,DataStructures.OrderedDict{Union{Int64, String},AbstractArray{T,1} where T}})(::String, ::Int64) at C:\Users\david\.julia\v0.6\TextParse\src\csv.jl:345
 [5] collect(::Base.Generator{Base.Iterators.Zip2{Array{String,1},UnitRange{Int64}},Base.##3#4{TextParse.##39#43{DataType,DataStructures.OrderedDict{Union{Int64, String},AbstractArray{T,1} where T}}}}) at .\array.jl:475
 [6] #_csvread_internal#35(::Bool, ::Char, ::Char, ::Bool, ::Type{T} where T, ::Bool, ::Int64, ::Void, ::Int64, ::Void, ::Bool, ::Array{String,1}, ::Array{String,1}, ::DataStructures.OrderedDict{Union{Int64, String},AbstractArray{T,1} where T}, ::Int64, ::Void, ::Array{Any,1}, ::String, ::Int64, ::TextParse.#_csvread_internal, ::String, ::Char) at C:\Users\david\.julia\v0.6\TextParse\src\csv.jl:341
 [7] (::TextParse.#kw##_csvread_internal)(::Array{Any,1}, ::TextParse.#_csvread_internal, ::String, ::Char) at .\<missing>:0
 [8] (::TextParse.##31#33{Array{Any,1},String,Char})(::IOStream) at C:\Users\david\.julia\v0.6\TextParse\src\csv.jl:97
 [9] open(::TextParse.##31#33{Array{Any,1},String,Char}, ::String, ::String) at .\iostream.jl:152
 [10] #_csvread_f#29(::Array{Any,1}, ::Function, ::String, ::Char) at C:\Users\david\.julia\v0.6\TextParse\src\csv.jl:95
 [11] #csvread#25(::Array{Any,1}, ::Function, ::String, ::Char) at C:\Users\david\.julia\v0.6\TextParse\src\csv.jl:69
 [12] csvread(::String) at C:\Users\david\.julia\v0.6\TextParse\src\csv.jl:69
 [13] eval(::Module, ::Any) at .\boot.jl:235

Shouldn't this work, i.e., shouldn't that column's type simply be promoted automatically? Promotion would cost some performance, but the read should still succeed.

@tk3369 initially found this in https://discourse.julialang.org/t/csv-misread/10966.
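
To make the requested behavior concrete, here is a minimal sketch of what "promoting" a column could look like. It is not TextParse's implementation: `parse_column` is a hypothetical helper, it assumes Julia 1.x (where `tryparse` returns `nothing` on failure), and it assumes the raw field text is still available when the late type change is found.

```julia
# Hypothetical sketch, not TextParse code.
# Parse one column of raw fields as Float64, and promote the whole column
# to String the first time a field fails to parse as a number.
function parse_column(fields::Vector{String})
    floats = Float64[]
    for field in fields
        value = tryparse(Float64, field)
        if value === nothing
            # Promotion: fall back to the raw text for every row,
            # including the rows that were already parsed as Float64.
            return copy(fields)
        end
        push!(floats, value)
    end
    return floats
end

# Column e from the file above, shortened to three rows.
col_e = ["1.0000", "1.0000", "2,773.9000"]
promoted = parse_column(col_e)        # Vector{String}
numeric  = parse_column(col_e[1:2])   # Vector{Float64}
```

The performance cost mentioned above shows up here as the copy of every row on promotion, and as a return type that is only known after the whole column has been read.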
