Reading of malformed CSV #54
Comments
When I started developing this package, I kind of implicitly optimized for the "well-formed" csv file case so as to be able to focus on performance. That's part of the reason there's not as many "warnings" and such. Happy to accept PRs that improve things without sacrificing performance. In my mind, I'd like to have a single |
We now have the CSV.validate function. For these two examples:
julia> io = IOBuffer("""A,B,C
1,1,10
6,1""")
IOBuffer(data=UInt8[...], readable=true, writable=false, seekable=true, append=false, size=16, maxsize=Inf, ptr=1, mark=-1)
julia> CSV.validate(io)
ERROR: CSV.ExpectedMoreColumnsError("row=2, col=2: expected 3 columns, parsed 2, but parsing encountered unexpected end-of-file (EOF); parsed row: '6,1'")
Stacktrace:
[1] validate(::CSV.Source{Base.GenericIOBuffer{Array{UInt8,1}},Nulls.Null}) at /Users/jacobquinn/.julia/v0.7/CSV/src/validate.jl:26
[2] #validate#40(::Bool, ::Dict{Int64,Function}, ::Array{Any,1}, ::Function, ::Base.GenericIOBuffer{Array{UInt8,1}}, ::Type{T} where T) at /Users/jacobquinn/.julia/v0.7/CSV/src/validate.jl:38
[3] validate(::Base.GenericIOBuffer{Array{UInt8,1}}) at /Users/jacobquinn/.julia/v0.7/CSV/src/validate.jl:38
julia> io = IOBuffer("""A;B;C
1,1,10
2,0,16""")
IOBuffer(data=UInt8[...], readable=true, writable=false, seekable=true, append=false, size=19, maxsize=Inf, ptr=1, mark=-1)
julia> CSV.validate(io)
ERROR: CSV.TooManyColumnsError("row=1, col=1: expected 1 columns then a newline or EOF, but parsing encountered another delimiter: ','; parsed row: '1'")
Stacktrace:
[1] validate(::CSV.Source{Base.GenericIOBuffer{Array{UInt8,1}},Nulls.Null}) at /Users/jacobquinn/.julia/v0.7/CSV/src/validate.jl:30
[2] #validate#40(::Bool, ::Dict{Int64,Function}, ::Array{Any,1}, ::Function, ::Base.GenericIOBuffer{Array{UInt8,1}}, ::Type{T} where T) at /Users/jacobquinn/.julia/v0.7/CSV/src/validate.jl:38
[3] validate(::Base.GenericIOBuffer{Array{UInt8,1}}) at /Users/jacobquinn/.julia/v0.7/CSV/src/validate.jl:38
julia> io = IOBuffer("""A;B;C
1,1,10
2,0,16""")
IOBuffer(data=UInt8[...], readable=true, writable=false, seekable=true, append=false, size=19, maxsize=Inf, ptr=1, mark=-1)
julia> CSV.validate(io; delim=';')
ERROR: CSV.ExpectedMoreColumnsError("row=1, col=1: expected 3 columns, parsed 1, but parsing encountered unexpected newline; parsed row: '1,1,10'")
Stacktrace:
[1] validate(::CSV.Source{Base.GenericIOBuffer{Array{UInt8,1}},Nulls.Null}) at /Users/jacobquinn/.julia/v0.7/CSV/src/validate.jl:26
[2] #validate#40(::Bool, ::Dict{Int64,Function}, ::Array{Any,1}, ::Function, ::Base.GenericIOBuffer{Array{UInt8,1}}, ::Type{T} where T) at /Users/jacobquinn/.julia/v0.7/CSV/src/validate.jl:38
[3] (::getfield(CSV, Symbol("#kw##validate")))(::Array{Any,1}, ::typeof(CSV.validate), ::Base.GenericIOBuffer{Array{UInt8,1}}, ::Type{T} where T) at ./<missing>:0 (repeats 2 times)
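The errors above all come down to comparing the number of fields in each row against the number of columns in the header. A minimal sketch of that idea in plain Julia follows. The helper name column_count_mismatches is hypothetical and is not part of CSV.jl; it ignores quoting and escaping, so it only illustrates the check and is not a substitute for CSV.validate.

```julia
# Hypothetical helper, NOT part of CSV.jl: list the rows whose field count
# differs from the header's. Assumption: no quoted fields containing `delim`.
function column_count_mismatches(io::IO; delim::Char=',')
    header = readline(io)
    expected = length(split(header, delim))
    mismatches = Tuple{Int,Int,Int}[]   # (row, expected, found)
    for (row, line) in enumerate(eachline(io))
        isempty(line) && continue       # skip blank lines
        found = length(split(line, delim))
        found == expected || push!(mismatches, (row, expected, found))
    end
    return mismatches
end

short_row   = column_count_mismatches(IOBuffer("A,B,C\n1,1,10\n6,1"))
wrong_delim = column_count_mismatches(IOBuffer("A;B;C\n1,1,10\n2,0,16"))
semi_delim  = column_count_mismatches(IOBuffer("A;B;C\n1,1,10\n2,0,16"); delim=';')
```

On the first input the sketch reports only row 2 (3 columns expected, 2 found). On the second, the header is read as a single column, so both data rows are flagged; with delim=';' the header has 3 columns and both data rows are flagged as having 1.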
Below I describe three behaviors of CSV.read on malformed CSV files that I found unexpected.

I have the following file:

which is malformed because ';' was used by mistake instead of ',' in the header.

The behavior of three standard utilities for reading such a file in Julia is:
- readcsv from Base: loads the whole file and replaces missing column names with empty strings;
- readtable from DataFrames: throws an error;
- CSV.read: reads only a single column of data into a data frame.

Additionally:
- [...] readcsv and readtable);
- [...] readtable throws an error).

In the documentation of CSV.read I have not found these behaviors described, so I am not sure what the intended functionality is. I would recommend at least giving a warning in those cases.
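As an illustration of the kind of warning suggested above, here is a minimal sketch in plain Julia. The function suspicious_header_delims and its candidates list are assumptions made for this example and are not CSV.jl API. It flags a header that parses as a single column under the chosen delimiter while containing another common delimiter character.

```julia
# Hypothetical check, NOT CSV.jl behavior: return the other common delimiter
# characters found in a header that yields only one column under `delim`.
function suspicious_header_delims(header::AbstractString; delim::Char=',',
                                  candidates=(',', ';', '\t', '|'))
    # A header that already splits into several columns is not suspicious.
    length(split(header, delim)) > 1 && return Char[]
    return [c for c in candidates if c != delim && c in header]
end

found = suspicious_header_delims("A;B;C")
isempty(found) || @warn "Header has a single column but contains other delimiters" found
```

A reader could call such a check once on the first line before parsing, so the cost is independent of file size.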