
Use multithreading to fetch json faster #41

Merged
17 commits merged on Nov 25, 2022

Conversation

CafeAuLait-CC
Contributor

@CafeAuLait-CC CafeAuLait-CC commented Oct 3, 2022

Hello,

As s3s and stat.ink are both under active development, I'm still using the -o parameter to back up my data to local files quite often. Since fetch_json() for ALL data now takes forever to run, I added multithreading support for it.

I tried setting max_workers to 10, and the total running time with the -o parameter dropped from 4m45s to 45 seconds for me. So I hope this is useful.
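For context, the change described above can be sketched with Python's concurrent.futures. This is a minimal, hypothetical stand-in, not the actual s3s code: fetch_one and URLS are invented names, and the real version would make HTTP requests to SplatNet 3.

```python
# Sketch of capping parallel fetches with a worker pool.
# fetch_one and URLS are hypothetical stand-ins for the s3s internals.
from concurrent.futures import ThreadPoolExecutor

def fetch_one(url):
    # In s3s this would be an HTTP request to SplatNet 3;
    # here we just echo the URL so the sketch is runnable.
    return f"fetched {url}"

URLS = [f"https://example.invalid/battle/{i}" for i in range(6)]

# max_workers bounds how many requests run concurrently.
with ThreadPoolExecutor(max_workers=10) as pool:
    results = list(pool.map(fetch_one, URLS))

print(len(results))  # one result per input URL
```

Lowering max_workers is the knob for the rate-limiting concern discussed below: the pool never has more than that many requests in flight at once.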

@clovervidia
Collaborator

I do agree that it does take some time to download the JSONs from SplatNet 3, but I'm not sure if using multithreading to speed it up is a good idea. We're walking a fine line in terms of accessing SplatNet 3, and I think going any faster with our requests might raise some red flags. I don't want to risk anyone's account getting banned.

@CafeAuLait-CC
Contributor Author

CafeAuLait-CC commented Oct 5, 2022

@clovervidia I totally understand your concern. I was thinking that when opening a webpage in a modern web browser, it's a normal strategy for the browser to send dozens of requests in the background and preload the links on that page. So I assumed a small max_number_of_threads would be fine. Even 2 would make it twice as fast as a single for-loop.

But it's up to you to make the decision; I'm absolutely OK with it either way. After all, -o is just a temporary feature, right?

@frozenpandaman
Owner

Sorry for the late reply, I've been traveling.

I agree with @clovervidia that we don't want to do anything that the official app can't do. If we add multithreading, limiting the concurrent threads to 2 would probably be best. I don't think this should necessarily be closed – but @clovervidia and I should probably discuss further.

@clovervidia
Collaborator

Yeah, we should probably discuss this. Limiting it to 2 or 3 threads might be all right.

@CafeAuLait-CC
Contributor Author

Please feel free to reopen this PR or keep it closed for now, whichever you prefer.

@niyari
Contributor

niyari commented Oct 7, 2022

I really like the idea of using threads. After I got the whole thing working, I was preparing a PR to do it that way too.
For example, it would let display and capture be handled separately in any scene.

@frozenpandaman frozenpandaman reopened this Oct 9, 2022
@frozenpandaman frozenpandaman mentioned this pull request Oct 10, 2022
@frozenpandaman
Owner

Thanks a lot for keeping this updated @CafeAuLait-CC! Will probably limit the number of threads to 2 but will plan to take another closer look by this weekend and hopefully get a version of this merged in.

@CafeAuLait-CC
Contributor Author

Cool! I have just updated it and set the number to 2. Feel free to squash the commits when you are merging it.

@frozenpandaman
Owner

frozenpandaman commented Nov 20, 2022

So sorry to make you keep having to fix the merge conflicts. 😅 What is really holding things up is that I have a ton of local changes that aren't public/present in master yet – namely, support for Salmon Run. As soon as Hina adds the stat.ink web UI so you can actually see uploaded SR results (he told me it would take "maybe less than a week"!), I'll update to v0.2.0 and push all those changes. I can then immediately merge this PR in as well, so it can go along with that update. And the areas of code those touch vs. this PR touches are pretty different so I don't think there will be any conflicts then, but if there are I'm happy to resolve them myself.

Just, for now, I'm wary (really due to my own lack of Git confidence, sorry) about merging this and then having to try and pull these new upstream changes into my own non-committed/draft local version of the codebase… I think stuff would just get messy. But as soon as v0.2.0 goes out I will pull this in with it!

@frozenpandaman
Owner

frozenpandaman commented Nov 20, 2022

Also, would you mind explaining what's going on with the "passing in lists the length of battle_ids" parts in these two lines? I'm also curious what just the bare swim does here, like, referring to the function but…?
https://github.com/CafeAuLait-CC/s3s/blob/8d60a6a8ac2b541af318695a41db676798e443ba/s3s.py#L274-L276

Relatedly, why is swim an argument in fetch_detailed_result()?
https://github.com/CafeAuLait-CC/s3s/blob/8d60a6a8ac2b541af318695a41db676798e443ba/s3s.py#L302

Never done anything with concurrency in Python, really, so just for my own understanding if you have the time to just give a quick explanation that'd be cool!

@CafeAuLait-CC
Contributor Author

CafeAuLait-CC commented Nov 20, 2022

Don’t worry and take your time. I’m looking forward to seeing the new features, like support for Salmon Run and Tri-color Turf War (in Hina’s plan), come out. Most of the conflicts are in the import section, so they’re not hard to fix at all.

As for your questions, thread_pool.map() takes iterables as arguments: the function the threads will run, plus one iterable per parameter of that function. So each thread looks like:

thread     function to run         arg1    arg2          arg3
Thread 0   fetch_detailed_result   True    battle_id 1   swim
Thread 1   fetch_detailed_result   True    battle_id 2   swim

The arg columns are the lists passed in; each thread takes one element from each list and uses them as the arguments for the fetch_detailed_result() function.

So Thread 0 runs fetch_detailed_result(arg1[0], arg2[0], arg3[0]), and Thread 1 runs fetch_detailed_result(arg1[1], arg2[1], arg3[1]).

I’m not sure whether my explanation is clear enough. You can find a simple example of using map() in the “Python ThreadPoolExecutor map” section at this link.

And the reason swim is an argument in fetch_detailed_result() is simply that swim() needs to be called right after each detailed record is fetched from the server, to make sure the progress bar works as expected. In your original code, where a loop is used instead of multithreading, swim() is called inside the loop at every iteration. So in the parallelized version, swim() needs to be called in every thread.
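The multi-iterable map() pattern described above can be sketched like this. The names mirror the explanation but are illustrative stand-ins, not the actual s3s implementation:

```python
from concurrent.futures import ThreadPoolExecutor

def swim():
    # Stand-in for the progress-bar tick; called once per fetched record.
    print(".", end="")

def fetch_detailed_result(is_first, battle_id, progress_callback):
    # Stand-in for the real fetch; ticks the progress bar right after
    # "fetching" one record, just as described above.
    result = f"details for {battle_id}"
    progress_callback()
    return result

battle_ids = ["battle-1", "battle-2", "battle-3"]

with ThreadPoolExecutor(max_workers=2) as pool:
    # Each positional iterable supplies one argument per task:
    # task i runs fetch_detailed_result(True, battle_ids[i], swim).
    results = list(pool.map(fetch_detailed_result,
                            [True] * len(battle_ids),
                            battle_ids,
                            [swim] * len(battle_ids)))
```

This is why the lists are "the length of battle_ids": map() zips the iterables into one argument tuple per task, so they must all be the same length, and the bare swim (no parentheses) is passed as a function object so each worker can call it itself.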

@frozenpandaman frozenpandaman changed the base branch from master to multithreading November 25, 2022 17:02
@frozenpandaman frozenpandaman merged commit 08b8872 into frozenpandaman:multithreading Nov 25, 2022
@frozenpandaman
Owner

frozenpandaman commented Nov 25, 2022

ok lol looks like i still made the commit history super gross when merging the multithreading branch into master but whatever. merged!

thank you, @CafeAuLait-CC, and for your explanation above as well, super appreciated! :)

@CafeAuLait-CC
Contributor Author

lol Thank you! And great to see v0.2.0 finally out!
