CPU is completely utilized #34

Closed
MoroseCorpse opened this issue Feb 2, 2021 · 12 comments

Comments

@MoroseCorpse

When I download a complete playlist, my server's CPU is completely maxed out. My VM has 4 cores assigned, and they are almost always all at 100% utilization. So far I have only tested the Spotify playlist download. I already turned down the parallel downloads in the config, and the result was the same.

@miraclx
Owner

miraclx commented Feb 2, 2021

Hi @MoroseCorpse, this is probably caused by ffmpeg re-encoding the audio stream fetched from YouTube into AAC for the M4A output.

You should probably turn down the number of concurrent tasks for the encoder.

By default (as of 6d8a90b, see conf.json), freyr uses 6 concurrent ffmpeg instances to encode multiple tracks in parallel.

You can override this with the -z, --concurrency <SPEC> flag, like this:

$ freyr -z encoder=1 <QUERY>

Or modify the conf.json file by changing the default 6 to 1 or whatever you are comfortable with.

{
  ...
  "concurrency": {
    "queries": 1,
    "tracks": 1,
    "trackStage": 6,
    "downloader": 4,
-   "encoder": 6,
+   "encoder": 1,
    "embedder": 10
  },
  ...
}
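
For anyone scripting that config change, here is a minimal Python sketch. It assumes conf.json is plain JSON with the "concurrency" section quoted above; `set_concurrency` is a hypothetical helper, not part of freyr, and the sample text contains only the section shown in this thread.

```python
import json


def set_concurrency(conf_text, **overrides):
    """Return conf.json text with the given concurrency entries overridden.

    Hypothetical helper: assumes the file is plain JSON with a top-level
    "concurrency" object, as quoted in this thread.
    """
    conf = json.loads(conf_text)
    section = conf.setdefault("concurrency", {})
    for key, value in overrides.items():
        section[key] = int(value)
    return json.dumps(conf, indent=2)


# Only the "concurrency" section from the thread; a real conf.json has more keys.
sample = """
{
  "concurrency": {
    "queries": 1,
    "tracks": 1,
    "trackStage": 6,
    "downloader": 4,
    "encoder": 6,
    "embedder": 10
  }
}
"""

updated = json.loads(set_concurrency(sample, encoder=1))
print(updated["concurrency"])
```

Only the keys passed as overrides change; everything else in the file is written back untouched.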

@MoroseCorpse
Author

I have changed the encoder value to 1 in the config, but all 4 cores are still at up to 100% utilization. The same happens when I pass -z encoder=1 in my command.

@miraclx
Owner

miraclx commented Feb 2, 2021

Can you inspect this with htop or something similar to see which processes are hogging up the CPU?
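
If htop isn't available, a rough stand-in is to total CPU usage per command name from `ps`. The sketch below is only an illustration: `cpu_by_command` and `snapshot` are hypothetical helpers, the sample numbers are invented (not measurements from this issue), and on Linux `ps` reports CPU averaged over each process's lifetime, so it is coarser than htop's live view.

```python
import os
import subprocess
from collections import defaultdict


def cpu_by_command(ps_output):
    """Sum %CPU per command name from lines of the form '<pcpu> <command>'."""
    totals = defaultdict(float)
    for line in ps_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        try:
            cpu = float(parts[0])
        except ValueError:
            continue  # skip header or malformed lines
        totals[parts[1].strip()] += cpu
    # Highest total first.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


def snapshot():
    """Run ps for all processes and aggregate its output (not called here)."""
    env = dict(os.environ, LC_ALL="C")  # force '.' as the decimal separator
    out = subprocess.run(
        ["ps", "-A", "-o", "pcpu=", "-o", "comm="],
        capture_output=True, text=True, check=True, env=env,
    ).stdout
    return cpu_by_command(out)


# Invented sample output, for illustration only.
sample = """\
 40.5 ffmpeg
 30.0 youtube-dl
 25.5 youtube-dl
  1.0 node
"""
totals = cpu_by_command(sample)
for command, cpu in totals:
    print(f"{cpu:6.1f}  {command}")
```

Grouping by command name helps when many short-lived child processes each take a small slice of CPU.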

@MoroseCorpse
Author

[Screenshot of htop output: anZFxTw]

@miraclx
Owner

miraclx commented Feb 2, 2021

The culprit seems to be youtube-dl, which understandably does some work extracting audio feeds from the YouTube sources.

You can turn down its concurrency with the trackStage=1 spec in the command, or with the trackStage entry in the conf file.

@miraclx
Owner

miraclx commented Feb 2, 2021

If you want to combine several concurrency specs in the CLI flags, you can merge them like this:

$ freyr -z "encoder=1,trackStage=1" <QUERY>

or like this

$ freyr -z encoder=1 -z trackStage=1 <QUERY>

(as documented here)

$ freyr get --help | grep '\--concurrency' -A 4

-z, --concurrency <SPEC>     key-value concurrency pairs (repeatable and optionally `,`-separated)
                               (format: <[key=]value>) (key omission implies track concurrency)
                               (valid(key): queries,tracks,trackStage,downloader,encoder,embedder)
                               (example: `queries=2,downloader=4` processes 2 CLI queries,
                               downloads at most 4 tracks concurrently)
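
To make the SPEC format concrete, here is a small Python sketch that parses it the way the help text describes. This is not freyr's actual parser: `parse_concurrency` is a hypothetical function, and mapping a bare value to the `tracks` key is my reading of "key omission implies track concurrency".

```python
# Keys taken from the help text above.
VALID_KEYS = ("queries", "tracks", "trackStage", "downloader", "encoder", "embedder")


def parse_concurrency(specs):
    """Merge one or more -z style SPEC strings into a {key: int} mapping."""
    result = {}
    for spec in specs:
        for pair in spec.split(","):
            pair = pair.strip()
            if not pair:
                continue
            key, sep, value = pair.partition("=")
            if not sep:
                # Bare value: assumed to mean track concurrency.
                key, value = "tracks", pair
            if key not in VALID_KEYS:
                raise ValueError(f"unknown concurrency key: {key!r}")
            result[key] = int(value)
    return result


merged = parse_concurrency(["encoder=1,trackStage=1"])
repeated = parse_concurrency(["encoder=1", "trackStage=1"])
bare = parse_concurrency(["2"])
print(merged, repeated, bare)
```

Under these assumptions the comma-separated form and the repeated-flag form produce the same mapping.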

@MoroseCorpse
Author

Okay thanks, that helped a lot. It is now at "only" 50% utilization, but individual cores still spike to 100%. If I give my VM only one core, it is completely overloaded. For my server this may not matter much, but I would like to integrate your tool into a project of mine that should also work on weaker servers.

I have now set everything to 1, and one core is still at 100% utilization:

freyr -z "encoder=1,trackStage=1,downloader=1,embedder=1" spotify:playlist:37i9dQZF1DXcBWIGoYBM5M
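
When calling freyr from another project, the same command line could be assembled programmatically. A minimal sketch, assuming only the `-z` flag and the keys listed in the help output above; `build_freyr_command` is a hypothetical helper, not something freyr provides.

```python
import shlex

# Keys taken from the --concurrency help text earlier in the thread.
VALID_KEYS = ("queries", "tracks", "trackStage", "downloader", "encoder", "embedder")


def build_freyr_command(query, **concurrency):
    """Build an argv list for freyr with a single comma-separated -z spec."""
    for key in concurrency:
        if key not in VALID_KEYS:
            raise ValueError(f"unknown concurrency key: {key!r}")
    spec = ",".join(f"{key}={int(value)}" for key, value in concurrency.items())
    argv = ["freyr"]
    if spec:
        argv += ["-z", spec]
    argv.append(query)
    return argv


argv = build_freyr_command(
    "spotify:playlist:37i9dQZF1DXcBWIGoYBM5M",
    encoder=1, trackStage=1, downloader=1, embedder=1,
)
print(shlex.join(argv))
```

The resulting list can be handed to `subprocess.run(argv)` as is, which avoids shell quoting problems with the query string.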

@miraclx
Owner

miraclx commented Feb 2, 2021

That seems quite problematic. Please include a screenshot of htop on this run.

@MoroseCorpse
Author

[Screenshot of htop output: zqZgZCJ]

@miraclx
Owner

miraclx commented Feb 2, 2021

Well, from that, it's clear that the brunt of the work comes from youtube-dl and ffmpeg: one is sourcing feeds and the other is re-encoding audio streams.

That seems justified, although the computational load of youtube-dl comes as a shock. Surely there is a better way.

Maybe #21 can help with that, by using youtube-dl as a library as opposed to individual processes.

@miraclx
Owner

miraclx commented Feb 2, 2021

Closing this for now, seeing as the issue isn't from freyr itself.

miraclx closed this as completed Feb 2, 2021

@miraclx
Owner

miraclx commented Feb 3, 2021

So I profiled the youtube-dl source using scalene, and it shows that most of the CPU time was spent simply reading from the HTTP response:

  • HEADER
    [Screenshot: Screenshot_20210203_014251]
  • youtube_dl/utils.py (% of time = 89.23% out of 12.68s)
    [Screenshot: Screenshot_20210203_014408] (source)
    [Screenshot: Screenshot_20210203_014542] (source)
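
For readers who want to try this kind of measurement without scalene, here is a rough sketch using the standard library's cProfile instead (a different profiler from the one used above). Everything in it is an assumption for illustration: an in-memory `io.BytesIO` stands in for the HTTP response, and the chunk size is arbitrary rather than whatever youtube-dl uses.

```python
import cProfile
import io
import pstats


def read_in_chunks(stream, chunk_size):
    """Drain a file-like object chunk by chunk and return the byte count."""
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        total += len(chunk)
    return total


def profile_read(payload_size=1_000_000, chunk_size=1024):
    """Profile read_in_chunks on an in-memory stream and return (bytes, report)."""
    stream = io.BytesIO(b"\0" * payload_size)  # stand-in for an HTTP response body
    profiler = cProfile.Profile()
    total = profiler.runcall(read_in_chunks, stream, chunk_size)
    report = io.StringIO()
    stats = pstats.Stats(profiler, stream=report)
    stats.sort_stats("cumulative").print_stats(5)
    return total, report.getvalue()


total, report = profile_read()
print(report)
```

The report lists how many calls went to the stream's `read` method and the cumulative time spent in the read loop, which is the same kind of per-function breakdown, though cProfile works at function level rather than per line.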
