Segfaults when clustering with large dataset #11

Closed
jmboehm opened this issue Jan 14, 2017 · 6 comments

Comments

@jmboehm
Contributor

jmboehm commented Jan 14, 2017

Hi,

I'm getting a segmentation fault when trying to use clustered standard errors. The dataset is fairly large (46m obs); it could be that I'm just running out of memory. What do you think?

The regression command I'm using is

reg( C ~ interaction |> outputinput , df, VcovCluster([:outputinput]))

and it yields

signal (11): Segmentation fault: 11
while loading no file, in expression starting on line 0
macro expansion at /Users/johannes.boehm/.julia/v0.5/FixedEffectModels/src/vcov/vcovcluster.jl:74 [inlined]
macro expansion at ./simdloop.jl:73 [inlined]
helper_cluster at /Users/johannes.boehm/.julia/v0.5/FixedEffectModels/src/vcov/vcovcluster.jl:73
unknown function (ip: 0x316729720)
jl_call_method_internal at /Users/osx/buildbot/slave/package_osx10_9-x64/build/src/./julia_internal.h:189 [inlined]
jl_apply_generic at /Users/osx/buildbot/slave/package_osx10_9-x64/build/src/gf.c:1942
shat! at /Users/johannes.boehm/.julia/v0.5/FixedEffectModels/src/vcov/vcovcluster.jl:55
unknown function (ip: 0x316728f66)
jl_call_method_internal at /Users/osx/buildbot/slave/package_osx10_9-x64/build/src/./julia_internal.h:189 [inlined]
jl_apply_generic at /Users/osx/buildbot/slave/package_osx10_9-x64/build/src/gf.c:1942
vcov! at /Users/johannes.boehm/.julia/v0.5/FixedEffectModels/src/vcov/vcovcluster.jl:31 [inlined]
#reg#48 at /Users/johannes.boehm/.julia/v0.5/FixedEffectModels/src/reg.jl:332
unknown function (ip: 0x3166e0f30)
runRegressions at /Users/johannes.boehm/Documents/IndiaCourts/verticalDistance/verticalDistanceModule.jl:256
unknown function (ip: 0x3166baf26)
jl_call_method_internal at /Users/osx/buildbot/slave/package_osx10_9-x64/build/src/./julia_internal.h:189 [inlined]
jl_apply_generic at /Users/osx/buildbot/slave/package_osx10_9-x64/build/src/gf.c:1942
do_call at /Users/osx/buildbot/slave/package_osx10_9-x64/build/src/interpreter.c:66
eval at /Users/osx/buildbot/slave/package_osx10_9-x64/build/src/interpreter.c:190
jl_toplevel_eval_flex at /Users/osx/buildbot/slave/package_osx10_9-x64/build/src/toplevel.c:558
jl_toplevel_eval_in_warn at /Users/osx/buildbot/slave/package_osx10_9-x64/build/src/builtins.c:590
eval at ./boot.jl:234
jlcall_eval_19752 at /Applications/Julia-0.5.app/Contents/Resources/julia/lib/julia/sys.dylib (unknown line)
jl_call_method_internal at /Users/osx/buildbot/slave/package_osx10_9-x64/build/src/./julia_internal.h:189 [inlined]
jl_apply_generic at /Users/osx/buildbot/slave/package_osx10_9-x64/build/src/gf.c:1942
eval_user_input at ./REPL.jl:64
unknown function (ip: 0x31667b8e6)
jl_call_method_internal at /Users/osx/buildbot/slave/package_osx10_9-x64/build/src/./julia_internal.h:189 [inlined]
jl_apply_generic at /Users/osx/buildbot/slave/package_osx10_9-x64/build/src/gf.c:1942
macro expansion at ./REPL.jl:95 [inlined]
#3 at ./event.jl:68
unknown function (ip: 0x3166780df)
jl_call_method_internal at /Users/osx/buildbot/slave/package_osx10_9-x64/build/src/./julia_internal.h:189 [inlined]
jl_apply_generic at /Users/osx/buildbot/slave/package_osx10_9-x64/build/src/gf.c:1942
jl_apply at /Users/osx/buildbot/slave/package_osx10_9-x64/build/src/./julia.h:1392 [inlined]
start_task at /Users/osx/buildbot/slave/package_osx10_9-x64/build/src/task.c:253
Allocations: 1077545854 (Pool: 1077544121; Big: 1733); GC: 866
Segmentation fault: 11

Running it without VcovCluster() works fine. Any help/thoughts are appreciated.

Best, Johannes

@matthieugomez
Member

matthieugomez commented Jan 14, 2017 via email

@jmboehm
Contributor Author

jmboehm commented Jan 14, 2017

Thanks for the quick response, Matthieu. I switched to the master branch using Pkg.checkout("FixedEffectModels"), but I'm still getting the same segfault. I should perhaps add that memory use before the segfault is not particularly high.

@jmboehm
Contributor Author

jmboehm commented Dec 5, 2017

This seems to have indeed been a memory issue.

@jmboehm jmboehm closed this as completed Dec 5, 2017
@matthieugomez
Member

OK. Out of curiosity, could you tell me the size of your memory compared to the size of the dataset?

@jmboehm
Contributor Author

jmboehm commented Dec 5, 2017

The dataframe was about 20 GB (according to Base.summarysize()); total memory on the machine was 64 GB. I'm not sure it is/was really related to FixedEffectModels, actually. These days I tend to trim down my dataframes before running regressions, and that works OK for me.
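
For readers who land here with the same problem, the trimming idea can be sketched roughly as follows. This is an illustration only, not the code used in this thread: it assumes a recent Julia (1.x), uses a plain NamedTuple of column vectors as a stand-in for the dataframe so that it runs without any packages, and the columns `unused1` and `unused2` are invented. With DataFrames.jl the analogous step would presumably be selecting just the needed columns before calling reg.

```julia
# Hypothetical stand-in for a wide table: a NamedTuple of column vectors.
# Column names C, interaction, outputinput mirror the regression above;
# unused1 and unused2 are made up for illustration.
n = 1_000
table = (
    C           = rand(n),
    interaction = rand(n),
    outputinput = rand(1:50, n),
    unused1     = rand(n),
    unused2     = rand(n),
)

# Keep only the columns the regression actually needs.
needed  = (:C, :interaction, :outputinput)
trimmed = NamedTuple{needed}(table)

# Compare the in-memory footprint of each table with total RAM.
full_bytes    = Base.summarysize(table)
trimmed_bytes = Base.summarysize(trimmed)
total_bytes   = Sys.total_memory()

println("full table:    ", full_bytes, " bytes")
println("trimmed table: ", trimmed_bytes, " bytes")
println("share of RAM:  ", trimmed_bytes / total_bytes)
```

Checking the trimmed size against total memory before running the regression gives a rough sense of how much headroom is left for the intermediate arrays that the clustering step allocates.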

@matthieugomez
Member

Ok thanks!
