@transform and @where (maybe more) are broken on v0.7 #88

Closed
2 tasks
tbeason opened this issue Feb 6, 2018 · 8 comments

@tbeason

tbeason commented Feb 6, 2018

I'm on Julia v0.7 and DataFramesMeta v3.0.0. It appears that @transform and @where are broken. I believe it could be related to some changes to broadcast, but most of the work in this package is done in macros which are very unfamiliar to me, so I'm struggling to debug it.

  • @transform is behaving badly: sometimes I get nonsense for the result, sometimes not. If I just extract the columns and do something like df[:c] = df[:a] .+ df[:b], I have no issues.
julia> dd = DataFrame(a=[1,1,1,1,1,2,2,2,2,2],b=rand(10),c=rand(10))
10×3 DataFrame
│ Row │ a │ b         │ c          │
├─────┼───┼───────────┼────────────┤
│ 1   │ 1 │ 0.940118  │ 0.00203923 │
│ 2   │ 1 │ 0.443475  │ 0.600234   │
│ 3   │ 1 │ 0.958618  │ 0.984405   │
│ 4   │ 1 │ 0.657766  │ 0.826958   │
│ 5   │ 1 │ 0.0482096 │ 0.249115   │
│ 6   │ 2 │ 0.903136  │ 0.27147    │
│ 7   │ 2 │ 0.319808  │ 0.697216   │
│ 8   │ 2 │ 0.0525784 │ 0.890392   │
│ 9   │ 2 │ 0.223741  │ 0.978436   │
│ 10  │ 2 │ 0.297486  │ 0.176859   │

julia> dd[:b] ./ dd[:c]
10-element Array{Float64,1}:
 461.0164605051064
   0.7388362802425429
   0.9738049508515227
   0.7954040678869911
   0.19352378037309087
   3.3268399796579144
   0.4586931077205082
   0.059050832041258175
   0.22867234620825236
   1.6820560963139088

julia> @transform(dd, d = (:b ./ :c))
10×4 DataFrame
│ Row │ a │ b         │ c          │ d         │
├─────┼───┼───────────┼────────────┼───────────┤
│ 1   │ 1 │ 0.940118  │ 0.00203923 │ 461.016   │
│ 2   │ 1 │ 0.443475  │ 0.600234   │ 0.738836  │
│ 3   │ 1 │ 0.958618  │ 0.984405   │ 0.973805  │
│ 4   │ 1 │ 0.657766  │ 0.826958   │ 0.795404  │
│ 5   │ 1 │ 0.0482096 │ 0.249115   │ 0.193524  │
│ 6   │ 2 │ 0.903136  │ 0.27147    │ 3.32684   │
│ 7   │ 2 │ 0.319808  │ 0.697216   │ 0.458693  │
│ 8   │ 2 │ 0.0525784 │ 0.890392   │ 0.0590508 │
│ 9   │ 2 │ 0.223741  │ 0.978436   │ 0.228672  │
│ 10  │ 2 │ 0.297486  │ 0.176859   │ 1.68206   │

julia> @transform(dd, d = (:b ./ :c))
10×4 DataFrame
│ Row │ a │ b         │ c          │ d          │
├─────┼───┼───────────┼────────────┼────────────┤
│ 1   │ 1 │ 0.940118  │ 0.00203923 │ 0.00216912 │
│ 2   │ 1 │ 0.443475  │ 0.600234   │ 1.35348    │
│ 3   │ 1 │ 0.958618  │ 0.984405   │ 1.0269     │
│ 4   │ 1 │ 0.657766  │ 0.826958   │ 1.25722    │
│ 5   │ 1 │ 0.0482096 │ 0.249115   │ 5.16732    │
│ 6   │ 2 │ 0.903136  │ 0.27147    │ 0.300586   │
│ 7   │ 2 │ 0.319808  │ 0.697216   │ 2.18011    │
│ 8   │ 2 │ 0.0525784 │ 0.890392   │ 16.9346    │
│ 9   │ 2 │ 0.223741  │ 0.978436   │ 4.37307    │
│ 10  │ 2 │ 0.297486  │ 0.176859   │ 0.59451    │
  • When using @where, I sometimes get an error that makes no sense given what I'm asking it to do. Based on the error, it seems to be mixing the columns together. If I apply one condition at a time, it works.
julia> dd = DataFrame(a = [Date(1998),Date(1999),Date(2000)],b=rand(3))
3×2 DataFrame
│ Row │ a          │ b        │
├─────┼────────────┼──────────┤
│ 1   │ 1998-01-01 │ 0.100379 │
│ 2   │ 1999-01-01 │ 0.516064 │
│ 3   │ 2000-01-01 │ 0.541581 │

julia> @where(dd, :b .>= 0.2, :a .>= Date(1998))
ERROR: MethodError: no method matching isless(::Float64, ::Date)
Closest candidates are:
  isless(::Float64, ::Float64) at float.jl:457
  isless(::Missing, ::Any) at missing.jl:62
  isless(::AbstractFloat, ::AbstractFloat) at operators.jl:124
  ...
Stacktrace:
 [1] <(::Float64, ::Date) at .\operators.jl:227
 [2] <= at .\operators.jl:273 [inlined]
 [3] >= at .\operators.jl:297 [inlined]
 [4] (::getfield(, Symbol("##331#334")))(::Date, ::Float64, ::Date) at .\<missing>:0
 [5] broadcast_nonleaf(::Function, ::Base.Broadcast.VectorStyle, ::Type{Union{}}, ::Tuple{Base.OneTo{Int64}}, ::Array{Date,1}, ::Vararg{Any,N} where N) at .\broadcast.jl:649
 [6] broadcast(::Function, ::Base.Broadcast.VectorStyle, ::Type{Union{}}, ::Tuple{Base.OneTo{Int64}}, ::Array{Date,1}, ::Array{Float64,1}, ::Vararg{Any,N} where N) at .\broadcast.jl:626
 [7] broadcast at .\broadcast.jl:618 [inlined]
 [8] broadcast at .\broadcast.jl:615 [inlined]
 [9] (::getfield(, Symbol("###1843#333")))(::Array{Float64,1}, ::Array{Date,1}) at C:\Users\tbeason\.julia\v0.7\DataFramesMeta\src\DataFramesMeta.jl:70
 [10] (::getfield(, Symbol("##330#332")))(::DataFrame) at C:\Users\tbeason\.julia\v0.7\DataFramesMeta\src\DataFramesMeta.jl:72
 [11] where(::DataFrame, ::getfield(, Symbol("##330#332"))) at C:\Users\tbeason\.julia\v0.7\DataFramesMeta\src\DataFramesMeta.jl:194
 [12] top-level scope
@nalimilan
Member

@bramtayl saved us last time by porting the macros to 0.6, maybe he has ideas? It would also be interesting to try reproducing the bug directly with where, i.e. without macros.

@bramtayl
Contributor

bramtayl commented Feb 7, 2018

I don't think there's a macro issue here:

julia> MacroTools.prettify(@macroexpand @transform(dd, d = (:b ./ :c)))
:(transform(dd, d=(barracuda->begin
                  function echidna(lion, dinosaur)
                      dinosaur ./ lion
                  end
                  echidna(barracuda[:c], barracuda[:b])
              end)))
julia> MacroTools.prettify(@macroexpand @where(dd, :b .>= 0.2, :a .>= Date(1998)))
:(where(dd, (barracuda->begin
              function echidna(lion, dinosaur)
                  (dinosaur .>= 0.2) .& (lion .>= Date(1998))
              end
              echidna(barracuda[:b], barracuda[:a])
          end)))

Both seem reasonable to me, meaning the issues are in transform and where.

@bramtayl
Contributor

bramtayl commented Feb 7, 2018

Oh wait, the argument order is backwards for where...
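
For anyone following along, here is a minimal sketch of why a swapped argument order would produce the isless(::Float64, ::Date) error from the original report. It uses a plain Dict of vectors instead of a DataFrame, and the names cols and helper are made up for illustration; it is not the code DataFramesMeta generates.

```julia
using Dates

# Hypothetical stand-in for the DataFrame columns in the report.
cols = Dict(:a => [Date(1998), Date(1999), Date(2000)],
            :b => [0.1, 0.5, 0.6])

# Same shape as the expanded helper: the first parameter is meant
# to receive :b (floats), the second :a (dates).
helper(b, a) = (b .>= 0.2) .& (a .>= Date(1998))

# Arguments passed in the intended order give a Boolean mask.
correct = helper(cols[:b], cols[:a])

# Arguments passed in the swapped order compare a Date with a Float64,
# which has no isless method and therefore throws a MethodError.
swapped_fails = try
    helper(cols[:a], cols[:b])
    false
catch err
    err isa MethodError
end
```

With the intended order the mask keeps the rows where :b is at least 0.2; with the swapped order the comparison never gets that far.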

@bramtayl
Contributor

bramtayl commented Feb 7, 2018

Ok so here's what I'm getting:

The arguments of the "echidna" function come in a random/inconsistent order. These arguments are the keys of the membernames Dict. Aren't keys and values supposed to return items in a consistent order?

This behavior only appears when with_helper is nested within other functions. That is, this is working consistently for me:

:($d -> $(DataFramesMeta.with_helper(d, body))) |> MacroTools.prettify

But this is not:

DataFramesMeta.with_anonymous(body) |> MacroTools.prettify

@nalimilan
Member

Good catch! AFAIK entries in a dictionary are unordered, but it's guaranteed that keys and values use the same order. So maybe something has been broken in Base. It would help if we could identify the commit which broke this code. If it still works on the outdated Windows nightlies, we should be able to find it without too much work. Can you confirm that?
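
The guarantee mentioned above can be checked with a small sketch. The Dict contents here are invented for illustration (the real membernames holds generated symbols), and the second half is only one possible way to avoid relying on two separate traversals agreeing; it is not necessarily what the eventual fix does.

```julia
# Hypothetical stand-in for membernames: column symbol => argument name.
membernames = Dict(:a => :lion, :b => :dinosaur, :c => :zebra)

# For one unmodified Dict, keys and values iterate in the same
# (otherwise unspecified) order, so zipping them pairs each key
# with its own value.
ks = collect(keys(membernames))
vs = collect(values(membernames))
aligned = all(membernames[k] == v for (k, v) in zip(ks, vs))

# Alternative: traverse the Dict once as pairs and split afterwards,
# so the column list and the argument list cannot drift apart.
ps = collect(membernames)      # Vector of key => value pairs
columns = first.(ps)           # keys, in traversal order
argnames = last.(ps)           # matching values, same order
split_aligned = all(membernames[c] == n for (c, n) in zip(columns, argnames))
```

Both checks should hold for a Dict that is not modified between the calls; the order itself may differ from the insertion order.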

@bramtayl
Contributor

No, it doesn't work on the current Windows nightly. I'm not sure how to figure out when it was last working...

@nalimilan
Member

I've found a MWE, filed in Julia as JuliaLang/julia#26359.

@nalimilan
Member

Got an explanation on the Julia issue. #91 should fix it.
