
metric.values - speed up #2

Open

leppott opened this issue Mar 19, 2018 · 7 comments

leppott commented Mar 19, 2018

Adding 11 dominant metrics slowed metric.values() down considerably. Try to speed it up.


leppott commented Mar 19, 2018

Seemed mostly OK until the pipe, %>%, had to be defined; that is when it slowed down, but otherwise an error was thrown if dplyr wasn't loaded.

Line 270 in metric.values.R:

# define pipe
`%>%` <- dplyr::`%>%`

(Backticks are required to assign to, or access, the non-syntactic name %>%.)

Should clean up the extra data frames that were created for the dominant metrics. This may free up some memory and keep things from getting sluggish.
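A minimal sketch of the cleanup idea; the data frame names below are hypothetical stand-ins for the dominant-metric working tables:

```r
# Hypothetical intermediate data frames standing in for the
# dominant-metric working tables
df_dom01 <- data.frame(x = 1:3)
df_dom02 <- data.frame(x = 4:6)

# Drop them once their values are merged into the main result,
# then trigger garbage collection to release the memory
rm(df_dom01, df_dom02)
invisible(gc())
```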


leppott commented Mar 19, 2018

Only minor improvement with removing extra data frames.

Defining pipe seems to be the issue.
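One way to avoid defining the pipe at run time (a sketch, assuming the package uses roxygen2 to generate its NAMESPACE) is to import it once at the package level:

```r
# In a package R file; roxygen2 turns this tag into
# importFrom(dplyr, "%>%") in the NAMESPACE
#' @importFrom dplyr %>%
NULL
```

With the pipe imported at build time, metric.values() would not need the line-270 workaround and users would not need dplyr attached.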

leppott referenced this issue in leppott/BCGcalc Mar 19, 2018
Metric changes for nonclumpy and removed extra dominant data frames for
speed.  And ReadMe packages.  Issue #4 and Issue #5

leppott commented Mar 19, 2018

Nothing else to do at this point other than to disable the metrics if they aren't needed.

I could add a trigger in the calling routine to set the max number of Dominant metrics to generate.

Then I could modify the code to loop through until all of those metrics have been added.

This might be a good idea even if not putting into master function call.
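The trigger idea could look roughly like this; `dom_max` and the function name are hypothetical, not the package's actual API:

```r
# Sketch: compute only the first dom_max dominant metrics for one sample
calc_dominant <- function(counts, dom_max = 3) {
  sorted <- sort(counts, decreasing = TRUE)
  total  <- sum(counts)
  vapply(seq_len(dom_max), function(n) {
    # percent of individuals in the n most dominant taxa
    100 * sum(sorted[seq_len(min(n, length(sorted)))]) / total
  }, numeric(1))
}

calc_dominant(c(50, 30, 15, 5), dom_max = 2)  # 50 80
```

Callers that only need a couple of dominant metrics could then skip the cost of generating all 11.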

leppott transferred this issue from leppott/BCGcalc Nov 19, 2018
leppott added the enhancement label Nov 20, 2018

leppott commented Jul 3, 2019

Time metric.values:

[screenshot: timing of metric.values]


leppott commented Jul 23, 2020

Better but still a bit slow.

This is example number 2 in metric.values().
678 samples.

[screenshots: timing results for example 2]


leppott commented Dec 12, 2022

Ran a 19 MB file (125k records, 2,000+ samples) and it took somewhere between 45 minutes and an hour.

Lots of metrics have been added, so the calculation may have to be broken apart and the pieces merged back together.
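A base-R sketch of the split-and-merge idea; the `SampleID` column name and chunk size are assumptions for illustration:

```r
# Split a large input into chunks of samples, compute per chunk, recombine
dat <- data.frame(
  SampleID = rep(c("A", "B", "C", "D"), each = 2),
  Count    = 1:8
)

# Two samples per chunk (chunk size is arbitrary here)
chunks <- split(dat, ceiling(match(dat$SampleID, unique(dat$SampleID)) / 2))

# Placeholder for the real per-chunk metric calculation
calc_chunk <- function(x) aggregate(Count ~ SampleID, data = x, FUN = sum)

result <- do.call(rbind, lapply(chunks, calc_chunk))
```

Chunking keeps each summarise call working on a smaller table and caps peak memory use.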


leppott commented Dec 23, 2022

Things to try:

  1. group_by first or inside of summarise (current configuration).

  [screenshot: group_by placement timing comparison]

  2. Switch packages (requiring a rewrite) from dplyr::summarise to data.table.

  3. Check whether conditions can be added to the data first, then evaluated more easily in the summarise.

https://community.rstudio.com/t/dplyr-summarise-with-condition/100885/3

  [screenshot: conditional summarise example]
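The precomputed-condition idea from the linked thread can be sketched in base R; the `TolVal` and `SampleID` columns and the 7+ threshold are hypothetical:

```r
dat <- data.frame(
  SampleID = c("A", "A", "B", "B"),
  TolVal   = c(2, 8, 5, 9),
  Count    = c(10, 5, 20, 5)
)

# Precompute the condition once as a plain logical column...
dat$is_tol <- dat$TolVal >= 7

# ...so each per-sample summary reduces to simple arithmetic (shown with
# base tapply; the same expression would sit inside dplyr::summarise)
pct_tol <- with(dat, 100 * tapply(Count * is_tol, SampleID, sum) /
                       tapply(Count, SampleID, sum))
```

Evaluating the condition once up front avoids re-evaluating it inside every summarise term.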

leppott pinned this issue May 24, 2023