[major] Speed up split_df by vectorizing for loops #1356

Merged
merged 9 commits into from Jun 30, 2023

Collaborator

@leoniewgnr leoniewgnr commented Jun 20, 2023

🔬 Background

This is part of a bigger change in which NeuralProphet (NP) will be sped up step by step.
split_df becomes very slow as more IDs are added.

🔮 Key changes

The following sub-functions are accelerated:

  • check_dataframe --> ~10x faster
  • handle_missing_data --> ~100x faster
  • split_df --> ~15x faster at 512 IDs (97 s to 6.4 s)
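
For readers unfamiliar with the technique named in the title, here is a minimal sketch of replacing a per-ID for loop with a single grouped pandas call. It is not the code from this PR: the function names `fill_per_id_loop` and `fill_per_id_vectorized` are hypothetical, the forward-fill stands in for whatever handle_missing_data actually does, and the column names `ds`, `ID` and `y` are assumed for illustration.

```python
import numpy as np
import pandas as pd


def fill_per_id_loop(df):
    """Per-ID for loop: one filtered copy and one ffill call per ID (hypothetical)."""
    parts = []
    for id_name in df["ID"].unique():
        part = df[df["ID"] == id_name].copy()
        part["y"] = part["y"].ffill()
        parts.append(part)
    return pd.concat(parts).sort_index()


def fill_per_id_vectorized(df):
    """Single grouped call: pandas forward-fills within every ID at once (hypothetical)."""
    out = df.copy()
    out["y"] = out.groupby("ID")["y"].ffill()
    return out


# Toy data with two IDs and a few missing values (assumed column names).
df = pd.DataFrame(
    {
        "ds": pd.date_range("2023-01-01", periods=4).tolist() * 2,
        "ID": ["a"] * 4 + ["b"] * 4,
        "y": [1.0, np.nan, 3.0, np.nan, np.nan, 5.0, np.nan, 7.0],
    }
)

looped = fill_per_id_loop(df)
vectorized = fill_per_id_vectorized(df)
print(looped.equals(vectorized))
```

Both functions return the same frame. The loop version runs a boolean filter over the whole frame once per ID, so its cost grows with the number of IDs, while the grouped call processes all IDs in one pass.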

⏩ Speed up in s

| No. IDs | Before total | After total | Before check_dataframe | After check_dataframe | Before handle_missing_data | After handle_missing_data | Before split_df | After split_df |
|---|---|---|---|---|---|---|---|---|
| 8 | 1.4 | 0.5 | 0.3 | 0.2 | 0.19 | 0.03 | 0.2 | 0.1 |
| 16 | 3.8 | 0.9 | 0.5 | 0.4 | 0.52 | 0.06 | 0.3 | 0.2 |
| 32 | 21.6 | 2.2 | 1.3 | 0.7 | 0.91 | 0.11 | 0.8 | 0.5 |
| 64 | 48.5 | 5.2 | 4.4 | 1.8 | 5.4 | 0.42 | 3.7 | 1.5 |
| 128 | 131.9 | 9 | 12 | 3.4 | 18 | 0.85 | 12 | 4.4 |
| 256 | 570.9 | 16 | 35 | 6.7 | 55 | 1.33 | 37 | 4.5 |
| 512 | 1504.1 | 35 | 161 | 14 | 151 | 1.18 | 97 | 6.4 |
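
The overall speed-up factor per row can be derived from the "Before total" and "After total" columns. The short sketch below does that arithmetic; the variable names are illustrative and the numbers are copied from the table as reported, not re-measured.

```python
# Total runtime in seconds from the table above: {No. IDs: (before, after)}
total_runtime_s = {
    8: (1.4, 0.5),
    16: (3.8, 0.9),
    32: (21.6, 2.2),
    64: (48.5, 5.2),
    128: (131.9, 9),
    256: (570.9, 16),
    512: (1504.1, 35),
}

# Speed-up factor = runtime before / runtime after.
speedup = {n: before / after for n, (before, after) in total_runtime_s.items()}

for n, factor in speedup.items():
    print(f"{n:>4} IDs: {factor:.1f}x faster")
```

The factor grows with the number of IDs, from roughly 3x at 8 IDs to roughly 43x at 512 IDs.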

@leoniewgnr leoniewgnr self-assigned this Jun 21, 2023
@leoniewgnr leoniewgnr added the status: in development label Jun 21, 2023

github-actions bot commented Jun 21, 2023

Model Benchmark

| Benchmark | Metric | main | current | diff |
|---|---|---|---|---|
| AirPassengers | MAE_val | 13.0626 | 13.0626 | 0.0% |
| AirPassengers | RMSE_val | 15.9453 | 15.9453 | 0.0% |
| AirPassengers | Loss_val | 0.00131 | 0.00131 | 0.0% |
| AirPassengers | MAE | 9.88156 | 9.88156 | 0.0% |
| AirPassengers | RMSE | 11.7354 | 11.7354 | 0.0% |
| AirPassengers | Loss | 0.00052 | 0.00052 | 0.0% |
| AirPassengers | time | 5.36697 | 5.14 | -4.23% |
| PeytonManning | MAE_val | 0.58159 | 0.58159 | 0.0% |
| PeytonManning | RMSE_val | 0.72216 | 0.72216 | 0.0% |
| PeytonManning | Loss_val | 0.01239 | 0.01239 | 0.0% |
| PeytonManning | MAE | 0.41671 | 0.41671 | 0.0% |
| PeytonManning | RMSE | 0.55961 | 0.55961 | 0.0% |
| PeytonManning | Loss | 0.00612 | 0.00612 | 0.0% |
| PeytonManning | time | 12.532 | 11.97 | -4.48% |
| YosemiteTemps | MAE_val | 1.3442 | 1.3442 | 0.0% |
| YosemiteTemps | RMSE_val | 2.00245 | 2.00245 | 0.0% |
| YosemiteTemps | Loss_val | 0.00077 | 0.00077 | 0.0% |
| YosemiteTemps | MAE | 1.3192 | 1.3192 | 0.0% |
| YosemiteTemps | RMSE | 2.13518 | 2.13518 | 0.0% |
| YosemiteTemps | Loss | 0.00064 | 0.00064 | 0.0% |
| YosemiteTemps | time | 59.6048 | 55.81 | -6.37% 🎉 |
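
The diff column appears to be the relative change of current versus main. The sketch below checks that reading against the three time rows; the formula and the helper name `relative_diff_percent` are assumptions, not taken from the benchmark workflow itself.

```python
def relative_diff_percent(main, current):
    """Relative change of `current` versus `main`, in percent (assumed formula)."""
    return (current - main) / main * 100.0


# (main, current) training times as displayed in the benchmark table.
timings = {
    "AirPassengers": (5.36697, 5.14),
    "PeytonManning": (12.532, 11.97),
    "YosemiteTemps": (59.6048, 55.81),
}

diffs = {
    name: round(relative_diff_percent(main, current), 2)
    for name, (main, current) in timings.items()
}
print(diffs)
```

Under this assumption the computed values match the reported -4.23%, -4.48% and -6.37%.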
Model training plots (images not reproduced here): PeytonManning, YosemiteTemps, AirPassengers


codecov bot commented Jun 21, 2023

Codecov Report

Merging #1356 (9678f51) into main (2b303f8) will decrease coverage by 0.08%.
The diff coverage is 91.57%.

@@            Coverage Diff             @@
##             main    #1356      +/-   ##
==========================================
- Coverage   89.88%   89.81%   -0.08%     
==========================================
  Files          38       38              
  Lines        5103     5058      -45     
==========================================
- Hits         4587     4543      -44     
+ Misses        516      515       -1     
| Impacted Files | Coverage Δ |
|---|---|
| neuralprophet/df_utils.py | 94.50% <87.75%> (-0.76%) ⬇️ |
| neuralprophet/data/process.py | 93.88% <95.65%> (+1.32%) ⬆️ |

@leoniewgnr leoniewgnr marked this pull request as ready for review June 22, 2023 17:37
@leoniewgnr leoniewgnr added the status: needs review label and removed the status: in development label Jun 22, 2023
@leoniewgnr leoniewgnr changed the title from "[major] Speed up split_df" to "[major] Speed up split_df by vectorizing for loops" Jun 30, 2023
Owner

@ourownstory ourownstory left a comment

Excellent Work!

@ourownstory ourownstory merged commit 887ef3f into main Jun 30, 2023
14 checks passed
@ourownstory ourownstory deleted the speed-up-split-df branch June 30, 2023 21:53