Large data sets cause browser to crash #320

Open · jwildfire opened this issue Feb 14, 2020 · 3 comments

@jwildfire (Contributor)
Summary

Very large data sets (~100 MB+) cause severe performance issues and may not render at all.

Details

Once the data is loaded, the codebook shouldn't need a huge amount of time to summarize it and render the page. After some basic profiling, I'm 99% sure the issues are due to inefficient data handling in various places in the code.

@samussiah samussiah added this to the v1.8.0 milestone Mar 16, 2020
@samussiah samussiah added this to To do in v1.8.0 via automation Mar 16, 2020
@samussiah samussiah moved this from To do to In progress in v1.8.0 Mar 16, 2020
@samussiah (Contributor) commented Mar 18, 2020

  • makeSummary takes ~60% of load time
    • determineType takes a significant share of that time. Avoid checking every value of every variable; instead, loop through values only until one identifies the variable's type (see the first sketch after this list).
    • coerce to numeric once here, and only here
    • define value arrays once
    • figure out how to preserve the record index (used when clicking bars) without building an array of objects the same size as the input data, which is computationally expensive (see the second sketch after this list).
  • draw takes ~30% of load time
    • makeTitle takes a lot of time
    • lots of garbage collection at the bottom of makeHist
    • avoid creating new data arrays in the charts
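
A minimal sketch of the short-circuiting type check, assuming a column's values arrive as an array of strings; the actual determineType signature and the type labels used in the codebase may differ:

```js
// Stop scanning as soon as a value settles the column's type, instead of
// inspecting every value. Type labels ('categorical'/'continuous') are
// illustrative, not necessarily what web-codebook uses internally.
function determineType(values) {
    for (const value of values) {
        if (value === null || value === undefined || value === '') continue; // skip missings
        if (isNaN(+value)) return 'categorical'; // one non-numeric value decides it
    }
    return 'continuous'; // every non-missing value parsed as a number
}
```

And a sketch of preserving the record index without cloning the data into a same-size array of objects: each bin stores only integer indices back into the original rows. The names (binWithIndices, binOf, rawData) are illustrative, not the library's API:

```js
// Group row indices by bin so a bar click can recover the underlying records
// via rawData[i], without ever materializing a parallel array of row objects.
function binWithIndices(rawData, column, binOf) {
    const bins = new Map();
    rawData.forEach((row, i) => {
        const key = binOf(row[column]);
        let indices = bins.get(key);
        if (!indices) bins.set(key, (indices = []));
        indices.push(i); // store the integer index only, not a copy of the row
    });
    return bins;
}
```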

@jwildfire (Contributor, Author)

> • makeSummary takes ~60% of load time
>   • determineType takes a significant share of that time. Avoid checking every value of every variable; instead, loop through values only until one identifies the variable's type.

We could also recommend that the user provide a type for each column and avoid this altogether for large data sets. We could just pass in the R column types from datadigest (see the sketch below).
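
A minimal sketch of what passing pre-specified column types through to the browser could look like; the settings shape here is hypothetical, not web-codebook's or datadigest's actual API:

```js
// Column types supplied by the caller (e.g. generated in R from the data
// frame's column classes) so the browser never has to infer them.
const settings = {
    variableTypes: {
        AGE: 'continuous',
        SEX: 'categorical'
    }
};

// Use the caller-provided type when available; fall back to inference otherwise.
function typeOf(column, values, settings) {
    return (settings.variableTypes && settings.variableTypes[column])
        || determineType(values);
}
```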

@samussiah (Contributor)

> > • makeSummary takes ~60% of load time
> >   • determineType takes a significant share of that time. Avoid checking every value of every variable; instead, loop through values only until one identifies the variable's type.
>
> We could also recommend that the user provide a type for each column and avoid this altogether for large data sets. We could just pass in the R column types from datadigest.

Having R determine the data types would definitely take some load off the browser.
