-
Notifications
You must be signed in to change notification settings - Fork 43
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added thread_requests parameter to import_pbp_data and import_weekly … #46
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great idea, @bendominguez0111! Just a couple small fixes needed, plus a question on the dependency addition.
Cool! I will make some of these changes. I actually thought up another optimization for this but never got to it (removes the need to sort the CSVs at the end, which takes a considerable amount of time if youre pulling a lot of data) I'll add that in as well along with the changes requested |
… not to have to sort dfs when using threaded requests
Made those changes. Didn't move the |
Oh, and added some additional logic compared to last time to avoid sorting years at the end. Since the HTTP requests can resolve out of order, originally there would need to be a sort at the end which took up some time. Now just creating a fixed sized list and then inserting responses into it as their threads resolve |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great idea on avoiding the extra sort operation, Ben! All seems to work as intended, just had a couple more quick suggestions you can commit if you like. We'll get this merged soon.
Co-authored-by: Alec Ostrander <alec.ostrander@gmail.com>
Co-authored-by: Alec Ostrander <alec.ostrander@gmail.com>
Co-authored-by: Alec Ostrander <alec.ostrander@gmail.com>
Committed those suggestions. Good call, those lines were a bit cluttered |
…data functions.
Added optional parameter to import_pbp_data and import_weekly_data to use threading to speed up requests for play by play data and weekly data. I also tested async and multiprocessing as well but threading posted the best results. Depending on connection, it sped up the speed at which PBP data from 1999 to 2022 was loaded by 25-50%.
Also added associated tests. I added a init.py file to the tests folder, because it was the only way I could run pytest (although I may have done something wrong there).
Did not open issue for this beforehand but spoke to @cooperdff on Twitter about the idea and thought it was a good idea.
Test results below: