Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

need floor before Int64 #80

Merged
merged 2 commits into from
Nov 1, 2019
Merged

need floor before Int64 #80

merged 2 commits into from
Nov 1, 2019

Conversation

snthot
Copy link
Contributor

@snthot snthot commented Oct 29, 2019

I worked on Julia Version 1.2.0 (2019-08-20) with omnisci/core-os-cpu docker.

I bumped into trouble when applying TColumn on Datetime column.
I think we need "floor" before converting into Int64 and it works well.

No obligation, just want to have it worked well :-)

@codecov
Copy link

codecov bot commented Oct 29, 2019

Codecov Report

Merging #80 into master will increase coverage by 61.7%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master     #80      +/-   ##
=========================================
+ Coverage    23.7%   85.5%   +61.7%     
=========================================
  Files          11       6       -5     
  Lines        2563     221    -2342     
=========================================
- Hits          609     189     -420     
+ Misses       1954      32    -1922
Impacted Files Coverage Δ
src/constructors.jl 90.2% <100%> (ø) ⬆️
src/mapd/MapD.jl
src/mapd/mapd_types.jl
src/completion_hints/completion_hints_types.jl
...rialized_result_set/serialized_result_set_types.jl
src/common/common_types.jl

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ab11d57...0842885. Read the comment docs.

@randyzwitch
Copy link
Contributor

Thanks for checking out the package @snthot! Can you provide a brief example that demonstrates the error you saw? Prior to this fix, the test suite passed with this functionality, so I want to make sure I'm accounting for any edge cases that might be present.

@snthot
Copy link
Contributor Author

snthot commented Oct 30, 2019

oh sure, here is the case:
I have a DateTime column in a DataFrame which filled with now() function. I got error when trying to convert the column to TColumn. Please see below.

julia> using DataFrames

julia> using OmniSci

julia> using Dates

julia> df1 = DataFrame(A=[1,2], B=[DateTime(2013,7,1,12,30,59), DateTime(2013,7,1,12,30,59)])
2×2 DataFrame
│ Row │ A     │ B                   │
│     │ Int64 │ DateTime            │
├─────┼───────┼─────────────────────┤
│ 1   │ 1     │ 2013-07-01T12:30:59 │
│ 2   │ 2     │ 2013-07-01T12:30:59 │

julia> df2 = DataFrame(A=[1,2], B=[DateTime(2013,7,1,12,30,59), now()])
2×2 DataFrame
│ Row │ A     │ B                       │
│     │ Int64 │ DateTime                │
├─────┼───────┼─────────────────────────┤
│ 1   │ 1     │ 2013-07-01T12:30:59     │
│ 2   │ 2     │ 2019-10-30T16:06:57.018 │

julia> TColumn.(eachcol(df1))
2-element Array{TColumn,1}:
 TColumn(OmniSci.TColumnData([1, 2], #undef, #undef, #undef), Bool[false, false])
 TColumn(OmniSci.TColumnData([1372681859, 1372681859], #undef, #undef, #undef), Bool[false, false])

julia> TColumn.(eachcol(df2))
ERROR: InexactError: Int64(1.572451617018e9)
Stacktrace:
 [1] Type at ./float.jl:703 [inlined]
 [2] myInt64 at /home/santi/.julia/packages/OmniSci/OmRhl/src/constructors.jl:38 [inlined]
 [3] _broadcast_getindex_evalf at ./broadcast.jl:578 [inlined]
 [4] _broadcast_getindex at ./broadcast.jl:551 [inlined]
 [5] getindex at ./broadcast.jl:511 [inlined]
 [6] macro expansion at ./broadcast.jl:843 [inlined]
 [7] macro expansion at ./simdloop.jl:73 [inlined]
 [8] copyto! at ./broadcast.jl:842 [inlined]
 [9] copyto! at ./broadcast.jl:797 [inlined]
 [10] copy at ./broadcast.jl:773 [inlined]
 [11] materialize at ./broadcast.jl:753 [inlined]
 [12] TColumn(::Array{DateTime,1}) at /home/santi/.julia/packages/OmniSci/OmRhl/src/constructors.jl:325
 [13] _broadcast_getindex_evalf at ./broadcast.jl:578 [inlined]
 [14] _broadcast_getindex at ./broadcast.jl:551 [inlined]
 [15] getindex at ./broadcast.jl:511 [inlined]
 [16] copyto_nonleaf!(::Array{TColumn,1}, ::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1},Tuple{Base.OneTo{Int64}},Type{TColumn},Tuple{Base.Broadcast.Extruded{DataFrames.DataFrameColumns{DataFrame,AbstractArray{T,1} where T},Tuple{Bool},Tuple{Int64}}}}, ::Base.OneTo{Int64}, ::Int64, ::Int64) at ./broadcast.jl:928
 [17] copy at ./broadcast.jl:791 [inlined]
 [18] materialize(::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1},Nothing,Type{TColumn},Tuple{DataFrames.DataFrameColumns{DataFrame,AbstractArray{T,1} where T}}}) at ./broadcast.jl:753
 [19] top-level scope at none:0

I noticed that your test pass because the DateTime data are defined down to second(s) level but the now() function normally go down to millisecond level so I got the trouble. And this prevented me from using 'load_table_binary_columnar' function. Once I fixed this problem by applying 'floor' I found that I can convert high precision datetime to TColumn (with losses) and load the data to OmniSci correctly.

I am not sure whether there is a better way to fix this. But it looks like Omnisci supports high precision timestamp so you may go with Float64 instead of Int64 ?
Please advise.

@randyzwitch
Copy link
Contributor

@snthot Thanks for the example, yes that's an oversight on my part. Your solution fixes the timestamp at the seconds level (which is what OmniSci supported when I implemented this), but it's a good question about how to implement higher-resolution timestamps

@randyzwitch
Copy link
Contributor

I confirmed internally that we do want to cast to Int.

Could you make your change mydatetime2unix(x) = datetime2unix(floor(x, Dates.Second)) so that the operation here is explicit in rounding to the second? Also, if you could switch one of the values in the tests to use now() like it caused your error, that will help to not make this mistake in the future.

Separately, I'll open up an issue about supporting high-resolution timestamps.

Thanks!

@snthot
Copy link
Contributor Author

snthot commented Nov 1, 2019

great! I absolutely agree!
thanks!

@randyzwitch randyzwitch merged commit a9e98b7 into heavyai:master Nov 1, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants