predict for fixed effects #243

Closed
jariji opened this issue Jul 25, 2023 · 4 comments

jariji commented Jul 25, 2023

predict is not implemented for models with fixed effects, but I would like to use this functionality. The source contains this commented-out code:

# Join FE estimates onto data and sum row-wise
# This code does not work properly with missing or with interacted fixed effects, so deleted
#if has_fe(m)
# df = DataFrame(t; copycols = false)
# fes = leftjoin(select(df, m.fekeys), unique(m.fe); on = m.fekeys, makeunique = true, #matchmissing = :equal)
# fes = combine(fes, AsTable(Not(m.fekeys)) => sum)
# out[nonmissings] .+= fes[nonmissings, 1]
#end

That code looks okay to me but the comment says it's wrong, so I'm reluctant to try implementing it myself lest I get it wrong. What is the problem with this code?

matthieugomez (Member) commented Jul 26, 2023

It does not work if there are missing values in the original dataframe or if fixed effects are of the form fe(id)&fe(year) (i.e. id-year fixed effects). It would be awesome if you could write code that handles these two things.
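
For the missing-value case, here is a minimal sketch of one possible approach. It uses made-up data rather than a fitted model: `fe_table` is a hypothetical stand-in for `unique(m.fe)`, and the `row` column is a helper added only for illustration. It assumes a DataFrames.jl version in which `leftjoin` accepts `matchmissing = :notequal`. Because the row order of a join result may not follow the left table by default, the sketch restores it explicitly before reading off the estimates.

```julia
using DataFrames

# Hypothetical data: the third observation has a missing fixed-effect key.
df = DataFrame(id = [1, 2, missing, 1], x = [0.1, 0.2, 0.3, 0.4])

# Hypothetical table of estimated fixed effects, one row per observed key
# (stands in for unique(m.fe)).
fe_table = DataFrame(id = [1, 2], fe_id = [0.5, -0.5])

# Helper column (illustrative) so the original row order can be restored.
df.row = 1:nrow(df)

# With :notequal a missing key never matches, so that row keeps a missing
# estimate instead of raising an error (the default is :error).
joined = leftjoin(df, fe_table; on = :id, matchmissing = :notequal)
sort!(joined, :row)

# Fixed-effect contribution per observation, aligned with df.
fe_sum = joined.fe_id
```

Rows with a missing key end up with a `missing` contribution, which is one defensible behavior for `predict`; whether it is the desired one is a design choice for the package.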

Here is some background: #204

jariji (Author) commented Jul 26, 2023

Setting the missing issue aside for now, I'm looking at the case of interacted fixed effects. Doing the naive thing seems to work here. Am I missing something?

julia> using DataFrames, FixedEffectModels

julia> df = let
           halfX = allcombinations(DataFrame, :a => 1:3, :b => 10:10:30)
           X = vcat(halfX, halfX)
           d = DataFrame(X)
           d.y = rand(nrow(d))
           d
       end
18×3 DataFrame
 Row │ a      b      y         
     │ Int64  Int64  Float64   
─────┼─────────────────────────
   1 │     1     10  0.634415
   2 │     2     10  0.10137
   3 │     3     10  0.619162
   4 │     1     20  0.308558
   5 │     2     20  0.673735
   6 │     3     20  0.0323582
   7 │     1     30  0.0197685
   8 │     2     30  0.22085
   9 │     3     30  0.875045
  10 │     1     10  0.747533
  11 │     2     10  0.150399
  12 │     3     10  0.82051
  13 │     1     20  0.259925
  14 │     2     20  0.728193
  15 │     3     20  0.340064
  16 │     1     30  0.983969
  17 │     2     30  0.376881
  18 │     3     30  0.799643

julia> m = FixedEffectModels.reg(df, @formula(y ~ fe(a) * fe(b)), save = true)
                       FixedEffectModel                       
==============================================================
Number of obs:              18  Converged:                true
dof (model):                 0  dof (residuals):             3
R²:                      0.668  R² adjusted:            -0.880
F-statistic:               NaN  P-value:                   NaN
R² within:              -0.000  Iterations:                  3
==============================================================
  Estimate  Std. Error  t-stat  Pr(>|t|)  Lower 95%  Upper 95%
──────────────────────────────────────────────────────────────

==============================================================


julia> m.fe
18×5 DataFrame
 Row │ a      b      fe_a      fe_b        fe_a&fe_b  
     │ Int64  Int64  Float64?  Float64?    Float64?   
─────┼────────────────────────────────────────────────
   1 │     1     10  0.487636   0.0146608   0.188678
   2 │     2     10  0.429074   0.0146608  -0.31785
   3 │     3     10  0.53202    0.0146608   0.173155
   4 │     1     20  0.487636  -0.046219   -0.157175
   5 │     2     20  0.429074  -0.046219    0.318109
   6 │     3     20  0.53202   -0.046219   -0.29959
   7 │     1     30  0.487636   0.0315582  -0.0173249
   8 │     2     30  0.429074   0.0315582  -0.161766
   9 │     3     30  0.53202    0.0315582   0.273766
  10 │     1     10  0.487636   0.0146608   0.188678
  11 │     2     10  0.429074   0.0146608  -0.31785
  12 │     3     10  0.53202    0.0146608   0.173155
  13 │     1     20  0.487636  -0.046219   -0.157175
  14 │     2     20  0.429074  -0.046219    0.318109
  15 │     3     20  0.53202   -0.046219   -0.29959
  16 │     1     30  0.487636   0.0315582  -0.0173249
  17 │     2     30  0.429074   0.0315582  -0.161766
  18 │     3     30  0.53202    0.0315582   0.273766

julia> unique(m.fe)
9×5 DataFrame
 Row │ a      b      fe_a      fe_b        fe_a&fe_b  
     │ Int64  Int64  Float64?  Float64?    Float64?   
─────┼────────────────────────────────────────────────
   1 │     1     10  0.487636   0.0146608   0.188678
   2 │     2     10  0.429074   0.0146608  -0.31785
   3 │     3     10  0.53202    0.0146608   0.173155
   4 │     1     20  0.487636  -0.046219   -0.157175
   5 │     2     20  0.429074  -0.046219    0.318109
   6 │     3     20  0.53202   -0.046219   -0.29959
   7 │     1     30  0.487636   0.0315582  -0.0173249
   8 │     2     30  0.429074   0.0315582  -0.161766
   9 │     3     30  0.53202    0.0315582   0.273766

julia> fes = leftjoin(df, unique(m.fe); on=m.fekeys, makeunique=true)
18×6 DataFrame
 Row │ a      b      y          fe_a      fe_b        fe_a&fe_b  
     │ Int64  Int64  Float64    Float64?  Float64?    Float64?   
─────┼───────────────────────────────────────────────────────────
   1 │     1     10  0.634415   0.487636   0.0146608   0.188678
   2 │     2     10  0.10137    0.429074   0.0146608  -0.31785
   3 │     3     10  0.619162   0.53202    0.0146608   0.173155
   4 │     1     20  0.308558   0.487636  -0.046219   -0.157175
   5 │     2     20  0.673735   0.429074  -0.046219    0.318109
   6 │     3     20  0.0323582  0.53202   -0.046219   -0.29959
   7 │     1     30  0.0197685  0.487636   0.0315582  -0.0173249
   8 │     2     30  0.22085    0.429074   0.0315582  -0.161766
   9 │     3     30  0.875045   0.53202    0.0315582   0.273766
  10 │     1     10  0.747533   0.487636   0.0146608   0.188678
  11 │     2     10  0.150399   0.429074   0.0146608  -0.31785
  12 │     3     10  0.82051    0.53202    0.0146608   0.173155
  13 │     1     20  0.259925   0.487636  -0.046219   -0.157175
  14 │     2     20  0.728193   0.429074  -0.046219    0.318109
  15 │     3     20  0.340064   0.53202   -0.046219   -0.29959
  16 │     1     30  0.983969   0.487636   0.0315582  -0.0173249
  17 │     2     30  0.376881   0.429074   0.0315582  -0.161766
  18 │     3     30  0.799643   0.53202    0.0315582   0.273766

julia> combine(fes, AsTable(Not(m.fekeys)) => sum => :prediction)
18×1 DataFrame
 Row │ prediction 
     │ Float64    
─────┼────────────
   1 │   1.32539
   2 │   0.227254
   3 │   1.339
   4 │   0.592799
   5 │   1.3747
   6 │   0.218569
   7 │   0.521638
   8 │   0.519716
   9 │   1.71239
  10 │   1.43851
  11 │   0.276283
  12 │   1.54035
  13 │   0.544166
  14 │   1.42916
  15 │   0.526274
  16 │   1.48584
  17 │   0.675747
  18 │   1.63699
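
One thing worth checking in the output above: `Not(m.fekeys)` also keeps `y`, so the `prediction` column appears to be `y` plus the three fixed-effect columns (row 1: 0.634415 + 0.487636 + 0.0146608 + 0.188678 ≈ 1.32539). The sketch below reproduces this with rows 1 and 10 of the joined table, typed in by hand, and compares it with a sum restricted to the fixed-effect columns. The names `fekeys` and `fecols` are illustrative stand-ins, not fields of the fitted model.

```julia
using DataFrames

# Rows 1 and 10 of the joined table above (the cell a = 1, b = 10),
# copied by hand from the printed output.
fes = DataFrame(
    "a" => [1, 1],
    "b" => [10, 10],
    "y" => [0.634415, 0.747533],
    "fe_a" => [0.487636, 0.487636],
    "fe_b" => [0.0146608, 0.0146608],
    "fe_a&fe_b" => [0.188678, 0.188678],
)

fekeys = ["a", "b"]                       # stands in for m.fekeys
fecols = ["fe_a", "fe_b", "fe_a&fe_b"]    # estimate columns only (illustrative)

# Same selection as in the session: everything except the keys, y included.
with_y = combine(fes, AsTable(Not(fekeys)) => sum => :prediction)

# Restricting the sum to the fixed-effect estimate columns.
only_fe = combine(fes, AsTable(fecols) => sum => :fe_sum)

# In a model with only fully interacted fixed effects, the fitted value
# should be the cell mean of y.
cell_mean = sum(fes.y) / nrow(fes)
```

Here `only_fe.fe_sum` is about 0.690975 for both rows, which agrees with the cell mean of `y` up to the rounding of the printed estimates, while `with_y.prediction` is larger by each row's own `y`.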

matthieugomez (Member) commented

Hmm, maybe what was missing was an interaction with a continuous variable, like y & fe(a)?
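
To illustrate why a continuous interaction could break a plain row-wise sum, here is a small sketch on made-up numbers. It assumes that the estimate for such a term is a per-group slope that has to be multiplied by the continuous variable before being added. The table `est` and its column names are hypothetical and are not the names FixedEffectModels saves.

```julia
using DataFrames

# Hypothetical data: a group key `a` and a continuous regressor `x`.
df = DataFrame(a = [1, 1, 2, 2], x = [1.0, 2.0, 1.0, 2.0])
df.row = 1:nrow(df)   # helper column to restore row order after the join

# Hypothetical per-group estimates: an intercept and a slope on x.
# Column names are illustrative only.
est = DataFrame(a = [1, 2], intercept_a = [0.5, -0.5], slope_a_x = [2.0, 3.0])

joined = sort!(leftjoin(df, est; on = :a), :row)

# Plain sum of the estimate columns: ignores x, so every row in a group
# gets the same value.
naive = joined.intercept_a .+ joined.slope_a_x

# Scaling the slope by x before summing.
scaled = joined.intercept_a .+ joined.slope_a_x .* joined.x
```

With these numbers `naive` is 2.5 for every row, while `scaled` varies with `x` within each group, so the two only agree where `x == 1`.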

nilshg (Contributor) commented Sep 22, 2023

I had completely forgotten about #204, and the discussion had died down after my suggestion for dealing with the missing issue. Could you point me to an example of the interacted FE issue? It would be really good to get predict back; we just need a more comprehensive testset that covers the issues raised with my old predict implementation.
