[BUG] Download speed getting slower and slower when downloading many videos #1033

Closed
joon612 opened this issue Jun 25, 2021 · 51 comments · Fixed by #1037

Comments

@joon612

joon612 commented Jun 25, 2021

When downloading multiple videos continuously, the download speed will become slower and slower.

image
image

System information
Please provide the following information:

  • Python version (run python --version)
    Python 3.8.0
  • Pytube version (run print(pytube.__version__) in python)
    10.8.5
  • Command used to install pytube
    python3.8 -m pip install pytube
@joon612 joon612 added the bug label Jun 25, 2021
@github-actions

Thank you for contributing to PyTube. Please remember to reference Contributing.md

@tfdahlin
Collaborator

I think I saw this somewhere a while back while I was researching another problem. I seem to recall seeing an implementation somewhere that dynamically adjusts the chunk size that's used for downloads, but I think it was a toggleable feature. Did you try accessing a YouTube video on your web browser while this was happening, to see if this was being applied to your IP address? Did you see the same slowdown on other websites?

There are two things that I can think of that would cause this issue. It could be throttling on YouTube's end, where they see you accessing large amounts of data, and slow down transfers to your IP address (or possibly just to requests made by your client at your IP address). It could also be your ISP throttling your speeds for the same reason, or because of a data cap. If the problem is that you're simply downloading things too quickly, then it might be possible to dynamically adjust how quickly we download files with some clever code. If it's a data cap issue with your ISP, that's not really a problem we can circumvent.

@joon612
Author

joon612 commented Jun 25, 2021 via email

@Glevion

Glevion commented Jun 26, 2021

This is also happening to me since the last week...

@rishabh3354

Same issue here. I downloaded 9 videos in a row with pytube. The first 2 downloaded quickly, and after that each one got slower and slower. I know it's not my ISP.

My last video was 10 MB, and it took more than 5 minutes to download. If this is coming from YouTube's end, there must be some solution out there. Please help.

@tfdahlin
Collaborator

ytdl-org/youtube-dl#29326 looks like this is a new youtube issue, with a possible solution, but it sounds like it might be a bit hard to write. I'll try to investigate and implement it this week if I can, just might take me a while

@shoxie007

@tfdahlin If you succeed, would you PLEASE provide some suggestions on how to edit Youtube-DL's extractor to bypass the throttling? Many thanks for your efforts.

@rishabh3354

ytdl-org/youtube-dl#29326 looks like this is a new youtube issue, with a possible solution, but it sounds like it might be a bit hard to write. I'll try to investigate and implement it this week if I can, just might take me a while

Thanks for considering. I hope you will succeed

@tfdahlin tfdahlin linked a pull request Jun 28, 2021 that will close this issue
@tfdahlin
Collaborator

Current progress: I have the functionality to find and extract the relevant function for computing n implemented. I have not yet implemented the functionality for performing the relevant operations on n. That will be the hard part.

@tfdahlin
Collaborator

For documentation purposes, an example of what this function looks like (after adding indentation to make it more readable):

Function for computing n
cha=function(a){
    var b=a.split(""),
        c=[
            142389785,
            -1062889771,
            function(d,e){
                for(
                    e=(e%d.length+d.length)%d.length;
                    e--;
                    
                )
                    d.unshift(d.pop())
            },
            b,
            1966130824,
            -388933535,
            952987009,
            -152323794,
            -646774959,
            -1835069559,
            null,
            -1891128459,
            2069343819,
            264462757,
            null,
            -1827754172,
            function(d,e){
                e=(e%d.length+d.length)%d.length;
                d.splice(
                    0,
                    1,
                    d.splice(
                        e,
                        1,
                        d[0]
                    )[0]
                )
            },
            null,
            b,
            function(d){
                d.reverse()
            },
            1468427690,
            b,
            "pop",
            416624305,
            86884804,
            function(d,e){
                d.push(e)
            },
            1815184233,
            1319603020,
            function(d,e){
                for(
                    var f=64,h=[];
                    ++f-h.length-32;
                    
                ){
                    switch(f){
                        case 58:
                            f-=14;
                        case 91:
                        case 92:
                        case 93:
                            continue;
                        case 123:
                            f=47;
                        case 94:
                        case 95:
                        case 96:
                            continue;
                        case 46:
                            f=95
                    }
                    h.push(String.fromCharCode(f))
                }
                d.forEach(
                    function(l,m,n){
                        this.push(
                            n[m]=h[
                                (h.indexOf(l)-h.indexOf(this[m])+m-32+f--)%h.length
                            ]
                        )
                    },
                    e.split("")
                )
            },
            1823083268,
            1134033916,
            -646774959,
            -2126127395,
            function(d,e){
                e=(e%d.length+d.length)%d.length;
                d.splice(-e).reverse().forEach(
                    function(f){
                        d.unshift(f)
                    }
                )
            },
            1845050767,
            1965266158,
            1132026013,
            -294807870,
            -508492140,
            -2000244778,
            52788620,
            2012640407,
            function(d,e){
                for(
                    e=(e%d.length+d.length)%d.length;
                    e--;
                    
                )
                    d.unshift(d.pop())
            },
            -2021094533,
            1012922155,
            -1870404083,
            -161415485,
            1496545176,
            function(d,e){
                e=(e%d.length+d.length)%d.length;
                var f=d[0];
                d[0]=d[e];
                d[e]=f
            },
            1247337884,
            1584781061,
            function(d){
                for(
                    var e=d.length;
                    e;
                )
                    d.push(d.splice(--e,1)[0])
            },
            416358396,
            -410621770,
            function(d,e){
                e=(e%d.length+d.length)%d.length;
                d.splice(e,1)
            }
        ];
        c[10]=c;
        c[14]=c;
        c[17]=c;
        try{
            c[16](c[3],c[24]),
            c[16](c[18],c[35]),
            c[16](c[21],c[43]),
            c[16](c[14],c[7]),
            c[54](c[3],c[8]),
            c[16](c[18],c[53]),
            c[16](c[3],c[5]),
            c[33](c[3],c[0]),
            c[33](c[17],c[39]),
            c[40](c[21],c[22]),
            c[53](c[8],c[41]),
            c[38](c[22],c[28]),
            c[27](c[42],c[37]),
            c[22](c[50],c[15]),
            c[54](c[46],c[14]),
            c[28](c[32],c[12]),
            c[22](c[46],c[23]),
            c[22](c[39],c[42]),
            c[2](c[50],c[51]),
            c[28](c[47],c[35]),
            c[28](c[47],c[44]),
            c[31](c[50],c[26]),
            c[45](c[50],c[3]),
            c[22](c[39],c[24]),
            c[28](c[47],c[49]),
            c[16](c[46],c[11]),
            c[42](c[25],c[43]),
            c[14](c[0],c[5]),
            c[2](c[25],c[26]),
            c[55](c[53],c[38]),
            c[46](c[53],c[3]),
            c[11](c[30],c[4]),
            c[8](c[22]),
            c[24](c[26]),
            c[39](c[26],c[1]),
            c[46](c[29],c[16]),
            c[55](c[26],c[13]),
            c[43](c[55],c[50]),
            c[13](c[43],c[1]),
            c[49](c[23],c[39]),
            c[29](c[41],c[6])
        }
        catch(d){
            return"enhanced_except_8pIBje3-_w8_"+a
        }
    return b.join("")
}
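
For readers following along, here is a minimal Python sketch of how a few of the in-place array helpers in the JavaScript above might be emulated. The function names are hypothetical and this is not necessarily how pytube implements them; each docstring names the JavaScript pattern it mirrors.

```python
from typing import List


def _normalize(index: int, length: int) -> int:
    """Mirror the JS idiom `(e % d.length + d.length) % d.length`.

    Python's % already yields a non-negative result for a positive modulus.
    Assumes a non-empty list (length > 0).
    """
    return index % length


def rotate_right(d: List[str], e: int) -> None:
    """Mirror `for(...; e--;) d.unshift(d.pop())`: move the last e items to the front."""
    e = _normalize(e, len(d))
    if e:
        d[:] = d[-e:] + d[:-e]


def swap_with_first(d: List[str], e: int) -> None:
    """Mirror `var f=d[0]; d[0]=d[e]; d[e]=f`: swap item 0 with item e."""
    e = _normalize(e, len(d))
    d[0], d[e] = d[e], d[0]


def remove_at(d: List[str], e: int) -> None:
    """Mirror `d.splice(e,1)`: delete the item at index e."""
    del d[_normalize(e, len(d))]


def reverse(d: List[str]) -> None:
    """Mirror `d.reverse()`: reverse the list in place."""
    d.reverse()
```

All helpers mutate the list they are given, which matches the JavaScript, where `b` (the split characters of `n`) appears several times in the array and is modified through each reference.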

@tfdahlin
Collaborator

The transform plan seems to be successfully extracting. Next step is to create the array of values + functions so that the plan can be applied.
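
As an illustration of what extracting and applying a transform plan can look like, the sketch below pulls the `c[i](c[j],c[k])` calls out of the `try{...}` block shown earlier with a regular expression, then replays them against a Python list standing in for `c`. The regex and the names `extract_plan` and `apply_plan` are assumptions made for this example, not pytube's actual code, and the pattern only covers the one- and two-argument call shapes seen in the sample (the array name `c` may differ in other versions of the script).

```python
import re
from typing import Callable, List, Optional, Tuple

# Matches calls such as c[16](c[3],c[24]) or c[8](c[22]).
_CALL = re.compile(r"c\[(\d+)\]\(c\[(\d+)\](?:,c\[(\d+)\])?\)")

PlanStep = Tuple[int, int, Optional[int]]


def extract_plan(js_try_body: str) -> List[PlanStep]:
    """Return (function index, first arg index, second arg index or None) per call."""
    plan: List[PlanStep] = []
    for func, arg1, arg2 in _CALL.findall(js_try_body):
        plan.append((int(func), int(arg1), int(arg2) if arg2 else None))
    return plan


def apply_plan(c: list, plan: List[PlanStep]) -> None:
    """Replay each step as c[func](c[arg1]) or c[func](c[arg1], c[arg2])."""
    for func, arg1, arg2 in plan:
        step: Callable = c[func]
        if arg2 is None:
            step(c[arg1])
        else:
            step(c[arg1], c[arg2])
```

Keeping the plan as plain index tuples separates parsing from execution, so the parsing half can be tested without emulating any of the JavaScript functions.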

@tfdahlin
Collaborator

I'm about halfway done w/ the next step. Have to stop here for now

@tfdahlin
Collaborator

Slow progress today; two more javascript functions left to emulate, then I can try to put together the last piece of this and hopefully get it working

@tfdahlin
Collaborator

tfdahlin commented Jul 3, 2021

Finished writing the code that will calculate the new value of n. Feel free to test out the experimental branch here, and let me know if it fixes the slowdown problem for you. One thing that I'm unclear about is whether or not we will need to disable caching the base.js file.
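
Once a new value of n has been calculated, it has to replace the old one in the stream URL. A minimal sketch of that last step, assuming n is carried as a query parameter named `n` and using only the standard library; `replace_n` is a hypothetical helper, and the `transform` callable stands in for the emulated JavaScript function:

```python
from typing import Callable
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def replace_n(url: str, transform: Callable[[str], str]) -> str:
    """Return url with its `n` query parameter passed through transform."""
    parts = urlsplit(url)
    query = parse_qs(parts.query, keep_blank_values=True)
    if "n" in query:
        # parse_qs maps each key to a list of values; only the first is used here.
        query["n"] = [transform(query["n"][0])]
    return urlunsplit(parts._replace(query=urlencode(query, doseq=True)))
```

URLs without an `n` parameter come back with the same parameters, so the helper can be applied to every stream URL unconditionally.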

@joon612
Author

joon612 commented Jul 5, 2021

The slow download problem seems to be solved. But how can I prevent the HTTP 429 status? I fetch one video every 3 seconds, and it still reports too many requests.
image

@joon612
Author

joon612 commented Jul 5, 2021

Hi @tfdahlin
A KeyError occurs after this fix.

Traceback (most recent call last):
 File "youtube_downloader.py", line 118, in <module>
   download_video(v_url, SKIP, LOG_FILENAME)
 File "youtube_downloader.py", line 53, in download_video
   logger.info("Process video id: %s, title: %s", video_id, video.title)
 File "/.../pytube/__main__.py", line 351, in title
   self._title = self.player_response['videoDetails']['title']
KeyError: 'videoDetails'

The video sample is https://www.youtube.com/watch?v=ZiBpzHuHRJc

@shoxie007

The slow download problem seems to be solved. But how can I prevent the HTTP 429 status? I fetch one video every 3 seconds, and it still reports too many requests.

How many videos in total did you download prior to getting this error? 100? 200? etc

@joon612
Author

joon612 commented Jul 5, 2021 via email

@tfdahlin
Collaborator

tfdahlin commented Jul 5, 2021

@joon612 thank you for helping me test this!

The 429 error is a separate issue that we can't really handle in the library. Instead, you'll have to use proxies to handle that, as the 429 rate limit will most likely be based on your IP address. You can tell pytube to use a proxy with the pytube.helpers.install_proxy function, which takes a dict of proxies to use. Pytube calls this function internally as well, so code to use a proxy would look something like this:

import pytube.helpers

proxy_list = [
    # Fill with the proxies you want to use
    'https://proxy.example.com:9001',
    'https://proxy.example.com:9002',
    'https://proxy.example.com:9003',
]

# video_list is assumed to be a list of pytube.YouTube objects
for video in video_list:
    try:
        video.streams[0].download()
    except Exception:  # Handle 429 errors by installing a new proxy
        success = False
        for proxy in proxy_list:
            try:
                pytube.helpers.install_proxy({'https': proxy})
                video.streams[0].download()
                success = True
                break  # Stop at the first proxy that works
            except Exception:
                print(f'Failed with proxy {proxy}')
        if not success:
            print(f'Could not successfully download {video.title} with proxies.')
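
A different technique from the proxy rotation above is to wait and retry only when the failure really is a 429. The sketch below assumes the error surfaces as `urllib.error.HTTPError` with `code == 429`; `is_rate_limited` and `call_with_backoff` are hypothetical helpers, and the delays are arbitrary. If the limit is tied to the IP address, as suggested above, waiting may not be enough and proxies would still be needed.

```python
import time
from typing import Callable, Sequence, TypeVar
from urllib.error import HTTPError

T = TypeVar("T")


def is_rate_limited(exc: BaseException) -> bool:
    """True only for an HTTP 429 (Too Many Requests) error."""
    return isinstance(exc, HTTPError) and exc.code == 429


def call_with_backoff(
    action: Callable[[], T],
    delays: Sequence[float] = (5, 15, 45),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run action, sleeping and retrying after each 429; other errors propagate."""
    for delay in delays:
        try:
            return action()
        except HTTPError as exc:
            if not is_rate_limited(exc):
                raise
            sleep(delay)
    return action()  # Final attempt; any error, including 429, propagates.
```

With the loop above, this could be used as `call_with_backoff(lambda: video.streams[0].download())`. Passing `sleep` as a parameter keeps the helper testable without real waiting.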

Hi @tfdahlin
A KeyError occurs after this fix.

Traceback (most recent call last):
 File "youtube_downloader.py", line 118, in <module>
   download_video(v_url, SKIP, LOG_FILENAME)
 File "youtube_downloader.py", line 53, in download_video
   logger.info("Process video id: %s, title: %s", video_id, video.title)
 File "/.../pytube/__main__.py", line 351, in title
   self._title = self.player_response['videoDetails']['title']
KeyError: 'videoDetails'

The video sample is https://www.youtube.com/watch?v=ZiBpzHuHRJc

The reason that the video you sent isn't working is because it's a private video, but it should be raising a VideoPrivate exception, so I'll try to investigate why that's not happening.
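
To show the kind of guard that turns the bare KeyError into a clearer failure, here is a minimal sketch. It is not pytube's actual fix: `VideoDetailsMissing` and `get_title` are hypothetical names, and the only thing taken from the traceback is the `['videoDetails']['title']` lookup.

```python
class VideoDetailsMissing(Exception):
    """Hypothetical error for a player response without usable video details."""


def get_title(player_response: dict) -> str:
    """Return the title, or raise a descriptive error instead of a KeyError."""
    details = player_response.get("videoDetails")
    if not isinstance(details, dict) or "title" not in details:
        raise VideoDetailsMissing(
            "player response has no 'videoDetails' title; "
            "the video may be private or otherwise unavailable"
        )
    return details["title"]
```

Callers can then catch one specific exception type rather than a generic KeyError that could come from anywhere in the lookup.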

@joon612
Author

joon612 commented Jul 5, 2021 via email

@shoxie007

shoxie007 commented Jul 5, 2021

Yes, I think so it should be raise exception. And thank you for your advice about 429 status.

I wonder if the new version of pytube for handling the changes in Javascript generates many additional requests which lead to this error prematurely. Eg 100 videos, for which 1000 requests were generated during the download.

@tfdahlin
Collaborator

tfdahlin commented Jul 5, 2021

@shoxie007 the new version does not make any new requests that the old version did not make

@Justxd22

Justxd22 commented Jul 5, 2021

@tfdahlin, I'm getting a 429 error after downloading one or two videos. The error appears when I wait about 5 minutes after the last download and then download a third video. It's a Heroku bot, so I restart it with a fresh IP and still get the same error. Does adding cookies solve the problem, as in #601?

@tfdahlin
Collaborator

tfdahlin commented Jul 5, 2021

@Justxd22 I don't know for certain if adding cookies will solve that problem. Pretty regularly, when we get reports about 429 errors, it's because the IP address has been blocked by YouTube. I don't know if authenticating with YouTube solves that problem or not.

@Justxd22

Justxd22 commented Jul 5, 2021

@tfdahlin, I'm trying to edit the source code to perform certain actions after getting "http error 429". Which part should I look into?

@tfdahlin
Collaborator

tfdahlin commented Jul 5, 2021

If you're trying to inject cookies directly into your requests, look at _execute_request() in request.py.
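
As a rough idea of what injecting cookies into a request involves, the sketch below builds a standard-library `urllib.request.Request` with a `Cookie` header. It does not reproduce pytube's `_execute_request()`; `build_request` is a hypothetical helper, and the cookie names and values are placeholders you would replace with your own.

```python
from typing import Dict
from urllib.request import Request


def build_request(url: str, cookies: Dict[str, str]) -> Request:
    """Create a Request carrying the given cookies in a single Cookie header."""
    headers = {"User-Agent": "Mozilla/5.0"}
    if cookies:
        # Cookie pairs are joined as "name=value; name2=value2".
        headers["Cookie"] = "; ".join(
            f"{name}={value}" for name, value in cookies.items()
        )
    return Request(url, headers=headers)
```

The resulting object can be passed to `urllib.request.urlopen()`. Whether sending account cookies avoids the 429 responses is, as noted above, not certain.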

@tfdahlin
Collaborator

tfdahlin commented Jul 5, 2021

I'm writing unit tests for the new code now, then will merge and update pypi.

@tfdahlin
Collaborator

tfdahlin commented Jul 6, 2021

I've merged the update into the v10.9.0 release

@joon612
Author

joon612 commented Jul 6, 2021 via email

@tfdahlin
Collaborator

tfdahlin commented Jul 6, 2021

So the newest version of Pytube is still 10.8.5?

No, the newest version is 10.9.0. It should be available on pypi.

@joon612
Author

joon612 commented Jul 6, 2021 via email

@tfdahlin
Collaborator

tfdahlin commented Jul 6, 2021

Sometimes it takes a while for that icon to update, it's showing up correctly for me

@joon612
Author

joon612 commented Jul 6, 2021 via email

@joon612
Author

joon612 commented Jul 6, 2021

The reason that the video you sent isn't working is because it's a private video, but it should be raising a VideoPrivate exception, so I'll try to investigate why that's not happening.

@tfdahlin The KeyError still exists in the newest version. Is it still not fixed?

@tfdahlin
Collaborator

tfdahlin commented Jul 6, 2021

The reason that the video you sent isn't working is because it's a private video, but it should be raising a VideoPrivate exception, so I'll try to investigate why that's not happening.

@tfdahlin The KeyError still exists in the newest version. Is it still not fixed?

The KeyError will be fixed in a different update. I'm planning to add a few fixes and features tomorrow if I'm able to, and the KeyError problem will be one of those fixes

@joon612
Author

joon612 commented Jul 6, 2021

@tfdahlin
I ran into a new error in this version.

Traceback (most recent call last):
  File "youtube_downloader.py", line 135, in <module>
    download_video(v_url, SKIP)
  File "youtube_downloader.py", line 70, in download_video
    video_output_path = video.download(
  File "/usr/local/lib/python3.8/dist-packages/pytube/streams.py", line 258, in download
    for chunk in request.stream(
  File "/usr/local/lib/python3.8/dist-packages/pytube/request.py", line 181, in stream
    chunk = response.read()
  File "/usr/lib/python3.8/http/client.py", line 467, in read
    s = self._safe_read(self.length)
  File "/usr/lib/python3.8/http/client.py", line 610, in _safe_read
    raise IncompleteRead(data, amt-len(data))
http.client.IncompleteRead: IncompleteRead(102061 bytes read, 9335123 more expected)

Video sample is: https://www.youtube.com/watch?v=FbcC7na-alY.

@tfdahlin
Collaborator

tfdahlin commented Jul 6, 2021

Usually an IncompleteRead error is caused by a network error. I'm planning to add some additional handling for these errors in the retry code, so an upcoming release should help with that problem.

@joon612
Author

joon612 commented Jul 6, 2021 via email

@tfdahlin
Collaborator

tfdahlin commented Jul 6, 2021

Hi @tfdahlin
A KeyError occurs after this fix.

Traceback (most recent call last):
 File "youtube_downloader.py", line 118, in <module>
   download_video(v_url, SKIP, LOG_FILENAME)
 File "youtube_downloader.py", line 53, in download_video
   logger.info("Process video id: %s, title: %s", video_id, video.title)
 File "/.../pytube/__main__.py", line 351, in title
   self._title = self.player_response['videoDetails']['title']
KeyError: 'videoDetails'

The video sample is https://www.youtube.com/watch?v=ZiBpzHuHRJc

The fix for this has been released in v10.9.1

@tfdahlin
I ran into a new error in this version.

Traceback (most recent call last):
  File "youtube_downloader.py", line 135, in <module>
    download_video(v_url, SKIP)
  File "youtube_downloader.py", line 70, in download_video
    video_output_path = video.download(
  File "/usr/local/lib/python3.8/dist-packages/pytube/streams.py", line 258, in download
    for chunk in request.stream(
  File "/usr/local/lib/python3.8/dist-packages/pytube/request.py", line 181, in stream
    chunk = response.read()
  File "/usr/lib/python3.8/http/client.py", line 467, in read
    s = self._safe_read(self.length)
  File "/usr/lib/python3.8/http/client.py", line 610, in _safe_read
    raise IncompleteRead(data, amt-len(data))
http.client.IncompleteRead: IncompleteRead(102061 bytes read, 9335123 more expected)

Video sample is: https://www.youtube.com/watch?v=FbcC7na-alY.

I've also added some support for getting around this in v10.9.1. Using the max_retries argument in Stream.download() should now retry when it experiences IncompleteRead errors.
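
For a batch of downloads, the max_retries argument can be combined with per-item error collection so that one bad video does not stop the run. A minimal sketch: `download_all` is a hypothetical helper, and each item in `streams` is assumed to be a pytube Stream (anything with a `download(max_retries=...)` method works).

```python
from typing import Any, Iterable, List, Tuple


def download_all(streams: Iterable[Any], max_retries: int = 3) -> List[Tuple[Any, Exception]]:
    """Download each stream with retries; return (stream, error) for failures."""
    failures: List[Tuple[Any, Exception]] = []
    for stream in streams:
        try:
            stream.download(max_retries=max_retries)
        except Exception as exc:  # e.g. retries exhausted or a network error
            failures.append((stream, exc))
    return failures
```

The returned list can be logged or fed back into a second pass once the network has settled.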

@joon612
Author

joon612 commented Jul 7, 2021

@tfdahlin
I ran into a new error in this version.

Traceback (most recent call last):
  File "youtube_downloader.py", line 135, in <module>
    download_video(v_url, SKIP)
  File "youtube_downloader.py", line 70, in download_video
    video_output_path = video.download(
  File "/usr/local/lib/python3.8/dist-packages/pytube/streams.py", line 258, in download
    for chunk in request.stream(
  File "/usr/local/lib/python3.8/dist-packages/pytube/request.py", line 181, in stream
    chunk = response.read()
  File "/usr/lib/python3.8/http/client.py", line 467, in read
    s = self._safe_read(self.length)
  File "/usr/lib/python3.8/http/client.py", line 610, in _safe_read
    raise IncompleteRead(data, amt-len(data))
http.client.IncompleteRead: IncompleteRead(102061 bytes read, 9335123 more expected)

Video sample is: https://www.youtube.com/watch?v=FbcC7na-alY.

I've also added some support for getting around this in v10.9.1. Using the max_retries argument in Stream.download() should now retry when it experiences IncompleteRead errors.

Although I have set max_retries = 2, IncompleteRead errors still happen.

@tfdahlin
Collaborator

tfdahlin commented Jul 7, 2021

@joon612 are you getting IncompleteRead errors, or are you getting MaxRetriesExceeded errors? If you're seeing IncompleteRead errors, then I don't think you have the latest build

@joon612
Author

joon612 commented Jul 7, 2021

@tfdahlin I am using the latest build, 10.9.1. The error is caught by:

except IncompleteRead:
    logger.error("IncompleteRead: %s", url)

2021-07-07 01:56:26,836 - tid 140216156170048 - ERROR: IncompleteRead: https://www.youtube.com/watch?v=yTl0qi70sMU

@tfdahlin
Collaborator

tfdahlin commented Jul 7, 2021

Have you tried decreasing the default_range_size?

import pytube.request
pytube.request.default_range_size = 1048576  # set to 1MB chunks; default is 9MB

@joon612
Author

joon612 commented Jul 7, 2021 via email

@tfdahlin
Collaborator

tfdahlin commented Jul 7, 2021

Try using 1048576 or lower. This tells pytube to download smaller chunks at a time, which could help with the IncompleteRead errors.
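
To make the effect of the range size concrete, here is a minimal sketch of splitting a file into inclusive byte ranges, the way HTTP Range requests are usually expressed. It is an illustration rather than pytube's actual code, and `byte_ranges` is a hypothetical name. Note that the failing read in the traceback earlier in the thread expected 102061 + 9335123 = 9437184 bytes, which is exactly one 9MB chunk.

```python
from typing import Iterator, Tuple

DEFAULT_RANGE_SIZE = 9 * 1024 * 1024  # 9437184 bytes, the 9MB default mentioned above


def byte_ranges(file_size: int, range_size: int) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (start, end) byte offsets covering file_size bytes."""
    start = 0
    while start < file_size:
        end = min(start + range_size, file_size) - 1
        yield start, end
        start = end + 1
```

A 20MB file needs 3 requests at the 9MB default and 20 requests at 1MB, so a dropped connection costs at most 1MB of re-download instead of 9MB.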

@joon612
Author

joon612 commented Jul 7, 2021

It still doesn't work. It only happens on some videos, and it still occurs even with a smaller range size.
video sample: https://www.youtube.com/watch?v=yTl0qi70sMU

@tfdahlin
Collaborator

tfdahlin commented Jul 7, 2021

I can't seem to replicate this error at all on my machine. Does it consistently happen on specific videos? Or is it random?

@joon612
Author

joon612 commented Jul 7, 2021 via email

@joon612
Author

joon612 commented Jul 15, 2021

I've been getting slow downloads again recently. Is anyone else seeing the same issue?

@joon612
Author

joon612 commented Jul 16, 2021

@tfdahlin, I'm trying to edit the source code to perform certain actions after getting "http error 429". Which part should I look into?

Is there any update for this issue?

@tfdahlin
Collaborator

I've been getting slow downloads again recently. Is anyone else seeing the same issue?

I haven't seen this myself, but I don't download things regularly enough to encounter the issue. If anybody else has, please share.

@tfdahlin, I'm trying to edit the source code to perform certain actions after getting "http error 429". Which part should I look into?

Is there any update for this issue?

429 issues aren't something we can effectively solve on the side of pytube. This is youtube rate-limiting your IP address, and you would need to use proxies to bypass that. I think youtube-dl bypasses this by allowing you to pass a file to read cookies from, so that you're effectively using your YouTube account to access the videos. I think I could do the same, but I want to more explicitly support logging in if I'm going to support credentialled requests, which is going to be more challenging.
