
Switch to self hosted version (aka Podsync CLI) #38

Closed
14 tasks done
mxpv opened this issue Oct 27, 2019 · 14 comments

Comments

@mxpv
Owner

mxpv commented Oct 27, 2019

All the workarounds I've made over the last few months have been about finding ways to bypass various limitations (API tokens have limited quotas, the number of download requests is limited). I believe the path forward is to switch to a self-hosted version, which would not only allow normal use, but also make it possible to add some neat features (like mp3 support).

This issue is to track progress:

  • Add CLI entrypoint
  • TOML configuration
  • Adapt current feed engine for YouTube
  • Adapt current feed engine for Vimeo
  • youtube-dl wrapper
  • Implement feed updater
  • Implement episode downloader
  • Web server with static file hosting
  • Mp3 encoder
  • Write Dockerfile for easy deployment
  • Write unit tests
  • Integrate with GitHub actions
  • Update documentation and write tutorials for different cloud providers
  • Versioning, releaser script

Patreon post: https://www.patreon.com/posts/self-hosted-aka-31073377
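
For a rough idea of how the checklist above could fit together, here is a minimal sketch of a TOML config plus a CLI invocation. Every key name and flag below is an illustrative assumption, not the final format:

# Sketch only: write a tiny TOML config for one feed, then run the
# (hypothetical) self-hosted binary against it.
cat > config.toml <<'EOF'
[feeds.my_channel]
url = "https://www.youtube.com/channel/UCxxxxxxxx"
format = "audio"          # or "video"
update_period = "12h"
EOF

./podsync --config config.toml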

@grafmik

grafmik commented Oct 27, 2019

Hello Max, and thanks for your work so far and for your effort to make a self-hosted version!

Just wondering, why the mp3 format? Isn't that for audio only? Do you mean mp4?

Keep it up!

@jonathanp0

Since some YouTube channels simply consist of people talking for long periods of time in front of a camera, it's useful to be able to convert the video to an audio file and listen to it using podcast software. Podsync already has this feature.

@amcgregor

amcgregor commented Oct 27, 2019

I found the internal architecture of podsync to be… rather excessively complicated. (A testament to the power of ingenuity, but too complicated for me to be comfortable setting up locally.) DynamoDB? Lambda? Golang! Docker. But also Node… (How many programming languages does one need at a single time? Node does not touch my machines.) I looked at all that, then just dove into the code looking for the ultimate youtube-dl invocation, then extracted and isolated it and automated it using GNU parallel. That Gist also describes the patches made to youtube-dl to avoid excessive numbers of HTTP requests. (E.g. actually sys.exit() when failing a video because it is too old; all subsequent videos will be older.)

I've resolved the issue with playlists being named after their origin channel instead of the actual name of the playlist, and will continue to keep this tiny little shell script updated. I also added optional lines for rate limiting, randomized sleep periods, and SOCKS5 proxy configuration; that is, ssh -D 8088 example.com and the proxy would be --proxy "socks5://localhost:8088/". The only real remaining issue is feed thumbnails. With this setup, it's taken YouTube two weeks to begin to throw up Captcha challenges, after ingesting 5,895 episodes totalling 1.6 TiB across 141 channels / playlists. (Each run taking, on average, around 10 minutes, run via cron every few hours.)
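
For anyone following along, those pieces combine roughly like this. Only the youtube-dl and GNU parallel flags are real options; the tunnel host, limits, and the channels.txt URL list are placeholders:

# Open a SOCKS5 tunnel through a remote host, as described above.
ssh -f -N -D 8088 example.com

# Pull each channel/playlist from a plain-text URL list, two at a time,
# with rate limiting and randomized sleeps between downloads.
parallel -j 2 -a channels.txt \
  youtube-dl \
    --proxy "socks5://localhost:8088/" \
    --limit-rate 1M \
    --sleep-interval 5 --max-sleep-interval 30 \
    --download-archive archive.txt \
    {}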

@amcgregor

amcgregor commented Oct 27, 2019

Ah, adding a second comment as it's an important note: my shell script there (basically a text file containing a channel or playlist URL per line…) explicitly gives you control over per-channel quality settings (see line 41; split that up with multiple formats if needed, I do) as well as extended video selection criteria, such as title exclusions (see line 74). (Run youtube-dl -h to see the many, many options available.)
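
In youtube-dl terms, those two knobs look roughly like this; the format string and exclusion pattern below are just examples, not what the script actually uses:

# Prefer up-to-1080p video+audio (merging separate streams requires ffmpeg),
# fall back to the best pre-muxed file, and skip titles matching the pattern.
youtube-dl \
  --format "bestvideo[height<=1080]+bestaudio/best[height<=1080]/best" \
  --reject-title "(?i)trailer|teaser" \
  "https://www.youtube.com/user/SomeChannel"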

@grafmik

grafmik commented Oct 27, 2019

Hello Alice,

Thanks for this work! I had started to dive into Max's code, starting with the early commits. Did you know podsync started as a .NET project? :)

I can understand why Max used a database: he had to store every user's playlist. For a self-hosted, single-user version, generating/serving just one file may indeed be a simpler/better solution.
As for Node, I feel you.
As for Docker, it could be a nice feature to add to your code, as it could pin down the right environment, especially for versions of parallel and python.

Anyway, I'm grateful because you made me save some time and effort.

@amcgregor

amcgregor commented Oct 27, 2019

especially for versions of parallel and python.

Any version will do:

brew install parallel

Python 3 is already a pretty universal standard; the given code will work with any Python 3.3 or newer, that is, virtually any Python 3 released since 2012. Including the version that comes pre-installed on macOS.

Edited to add: thus, in this particular case, Docker would simplify nothing, and complicate everything. Like a Spartan soldier taking everything and giving nothing.

store every user playlist

On-disk directories are the database, in my case. My Python script and template will transform any directory containing youtube-dl .info.json files into a podcast. (Future improvement: only regenerate the index.xml if there are actually new/updated episodes, but feed generation is so minor compared to content collection that it's a low priority.)

Edited to add: ingest (pull.sh invocations of youtube-dl) is one half of the problem: actually getting the content. The other half, turning those collected media files into podcast feeds, is tackled entirely separately.
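
That "only regenerate when something changed" improvement could be as small as a timestamp check; the paths are placeholders and generate_feed.py is a stand-in name for the actual Python script:

# Rebuild index.xml only when some .info.json is newer than the existing feed.
for dir in /media/podcasts/*/; do
  if [ ! -e "$dir/index.xml" ] || \
     [ -n "$(find "$dir" -name '*.info.json' -newer "$dir/index.xml" | head -n 1)" ]; then
    ./generate_feed.py "$dir"   # stand-in for the real feed generator
  fi
done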

@grafmik

grafmik commented Oct 27, 2019

I understand you want to keep things simple. Loved the 300 reference and can't help imagining Docker as a bare-torso warrior now.

As I already said, I didn't read the podsync code, but does it store every mp4 on their server?
I'm using your script right now (I'll also check these YouTube channels of yours, just curious).
I see the content (mp4) is directly "youtube-dl-ed" right here on the machine.

I can see the dl.podsync.net/* URLs link to googlevideo.com. Is there an upload wrapper somewhere that could avoid using space on the podsync self-hosted server?

@amcgregor

amcgregor commented Oct 27, 2019

…does it store every mp4 on their server?

Yes, as part of the background "updater" process. That is Python code, so it invokes youtube_dl directly, whereas my shell script invokes the youtube-dl command itself. One layer out. ;)

«googlevideo.com links» … Is there an upload wrapper somewhere that could avoid using space on the podsync self-hosted server?

Well, while youtube-dl on the command line will download the video content by default, if you are careful to pick a video format that comes "pre-muxed" (that is, audio and video together), you can hypothetically avoid downloading the video at all and pull the actual origin links from the .info.json for use in the RSS feed. Or, in Podsync's case, serve them behind a 302 redirect, likely after checking the local cache status vs. the availability of the pre-muxed link from YouTube.

That's a key difference, I think. I get 1080p episodes, as I re-mux the independent streams. Hypothetically I could choose a 4K --format. (But ye gods, the storage space, then!)
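
A sketch of that "origin link" idea using only stock youtube-dl flags (the video ID is a placeholder):

# Print the direct, pre-muxed media URL without downloading anything,
# so a feed could point at the origin instead of a local copy.
youtube-dl --format "best[ext=mp4]" --get-url "https://www.youtube.com/watch?v=VIDEO_ID"

# Or keep just the metadata and pull the same URL out of the .info.json later.
youtube-dl --format "best[ext=mp4]" --skip-download --write-info-json \
  "https://www.youtube.com/watch?v=VIDEO_ID"

One caveat: those googlevideo.com links expire after a few hours, which is presumably part of why a redirect/cache layer is involved at all.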

@leekillough

Just wondering, why the mp3 format? Isn't that for audio only? Do you mean mp4?

Some of us want audio-only, since we like listening to the audio of podcasts posted to YouTube but don't have the time to watch the video: we're doing other things while we listen, such as driving, or we simply don't care to see the podcaster's studio when what they say is more important than what it looks like.

I consider it a welcome addition.

Self-hosting seems like the way to go too, eliminating single points of failure.

@davidAlittle

Some of us want audio-only… I consider it a welcome addition.

Seconding the audio-only option. I don't know what APIs you're using, but I can tell you that as a YouTube Red subscriber (actually a Google Play Music subscriber, but that's the same thing now), there is a way to stream only the audio, since this is a premium feature specifically offered as part of Red.

@amcgregor

amcgregor commented Nov 6, 2019

Direct use of youtube-dl (the command-line program powering all of this media ingest) permits retrieval of just the audio. My little automation script wins again: it can already do this! ;P
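
Concretely, that boils down to youtube-dl's audio-extraction flags (the output template and playlist URL are just examples):

# Download the best available audio and convert it to mp3 (requires ffmpeg).
youtube-dl \
  --extract-audio --audio-format mp3 --audio-quality 0 \
  --output "%(uploader)s/%(title)s.%(ext)s" \
  "https://www.youtube.com/playlist?list=PLACEHOLDER"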

Some of us want audio-only…

It really is a bit flabbergasting to be repeatedly asked for something the user already has the ability to do… and search for.

@mirth

mirth commented Nov 7, 2019

Is it expected that docker-compose pull produces the following?

ERROR: for api  pull access denied for mxpv/podsync_api, repository does not exist or may require 'docker login': denied: requested access to the resource is denied
ERROR: for updater  pull access denied for mxpv/updater, repository does not exist or may require 'docker login': denied: requested access to the resource is denied
ERROR: for resolver  pull access denied for mxpv/podsync_lambda, repository does not exist or may require 'docker login': denied: requested access to the resource is denied
ERROR: for nginx  pull access denied for mxpv/nginx, repository does not exist or may require 'docker login': denied: requested access to the resource is denied

@mxpv
Owner Author

mxpv commented Nov 7, 2019

CLI docker images are not yet published.

@mxpv
Owner Author

mxpv commented Nov 16, 2019

New functionality, docs, and tutorials will be added in follow-up PRs.
