[BUG] RegexMatchError: get_throttling_function_name: could not find match for multiple #1750

Open
SpaceRocket69 opened this issue Aug 9, 2023 · 23 comments

@SpaceRocket69

SpaceRocket69 commented Aug 9, 2023

I have confirmed that I am on the latest version of pytube by installing from the source. I did this by running !pip install git+https://github.com/pytube/pytube.

Describe the bug

As of today, I get a RegexMatchError when trying to download a YouTube video with pytube. The error message says that the function get_throttling_function_name could not find a match for "multiple". It worked fine until yesterday, and I have not changed the code.

To Reproduce

The video URL that is causing the error is: https://www.youtube.com/watch?v=oMHLkcc9I9c&ab_channel=NENA

Here is the code where the problem is occurring:

from pytube import YouTube

yt = YouTube("https://www.youtube.com/watch?v=oMHLkcc9I9c&ab_channel=NENA")
yt.streams

Expected behavior

I expected the video to be downloaded without any errors.

Output

Here is the full traceback for the exception:

---------------------------------------------------------------------------
RegexMatchError                           Traceback (most recent call last)
~\AppData\Roaming\Python\Python38\site-packages\pytube\__main__.py in fmt_streams(self)
    180         try:
--> 181             extract.apply_signature(stream_manifest, self.vid_info, self.js)
    182         except exceptions.ExtractError:

~\AppData\Roaming\Python\Python38\site-packages\pytube\extract.py in apply_signature(stream_manifest, vid_info, js)
    408     """
--> 409     cipher = Cipher(js=js)
    410 

~\AppData\Roaming\Python\Python38\site-packages\pytube\cipher.py in __init__(self, js)
     42 
---> 43         self.throttling_plan = get_throttling_plan(js)
     44         self.throttling_array = get_throttling_function_array(js)

~\AppData\Roaming\Python\Python38\site-packages\pytube\cipher.py in get_throttling_plan(js)
    404         The contents of the base.js asset file.
--> 405     :returns:
    406         The full function code for computing the throttlign parameter.

~\AppData\Roaming\Python\Python38\site-packages\pytube\cipher.py in get_throttling_function_code(js)
    310     :returns:
--> 311         The name of the function used to compute the throttling parameter.
    312     """

~\AppData\Roaming\Python\Python38\site-packages\pytube\cipher.py in get_throttling_function_name(js)
    295                     array = array.group(1).strip("[]").split(",")
--> 296                     array = [x.strip() for x in array]
    297                     return array[int(idx)]

RegexMatchError: get_throttling_function_name: could not find match for multiple

During handling of the above exception, another exception occurred:

RegexMatchError                           Traceback (most recent call last)
 in 
      1 #yt = YouTube("https://www.youtube.com/watch?v=8JIncRMkr00&pp=ygU7SSdtIEluIExvdmUgRHViIFBpc3RvbHMgZmVhdC4gTGluZHkgTGF5dG9uICYgUm9kbmV5IFAgbXVzaWM%3D")
      2 yt = YouTube("https://www.youtube.com/watch?v=oMHLkcc9I9c&ab_channel=NENA")
----> 3 yt.streams

~\AppData\Roaming\Python\Python38\site-packages\pytube\__main__.py in streams(self)
    294         """
    295         self.check_availability()
--> 296         return StreamQuery(self.fmt_streams)
    297 
    298     @property

~\AppData\Roaming\Python\Python38\site-packages\pytube\__main__.py in fmt_streams(self)
    186             pytube.__js__ = None
    187             pytube.__js_url__ = None
--> 188             extract.apply_signature(stream_manifest, self.vid_info, self.js)
    189 
    190         # build instances of :class:`Stream `

~\AppData\Roaming\Python\Python38\site-packages\pytube\extract.py in apply_signature(stream_manifest, vid_info, js)
    407 
    408     """
--> 409     cipher = Cipher(js=js)
    410 
    411     for i, stream in enumerate(stream_manifest):

~\AppData\Roaming\Python\Python38\site-packages\pytube\cipher.py in __init__(self, js)
     41         ]
     42 
---> 43         self.throttling_plan = get_throttling_plan(js)
     44         self.throttling_array = get_throttling_function_array(js)
     45 

~\AppData\Roaming\Python\Python38\site-packages\pytube\cipher.py in get_throttling_plan(js)
    403     :param str js:
    404         The contents of the base.js asset file.
--> 405     :returns:
    406         The full function code for computing the throttlign parameter.
    407     """

~\AppData\Roaming\Python\Python38\site-packages\pytube\cipher.py in get_throttling_function_code(js)
    309     :rtype: str
    310     :returns:
--> 311         The name of the function used to compute the throttling parameter.
    312     """
    313     # Begin by extracting the correct function name

~\AppData\Roaming\Python\Python38\site-packages\pytube\cipher.py in get_throttling_function_name(js)
    294                 if array:
    295                     array = array.group(1).strip("[]").split(",")
--> 296                     array = [x.strip() for x in array]
    297                     return array[int(idx)]
    298 

RegexMatchError: get_throttling_function_name: could not find match for multiple

I already tried removing the semicolon as suggested in #1707, but it didn't work for me.

System information
Please provide the following information:

@github-actions

github-actions bot commented Aug 9, 2023

Thank you for contributing to PyTube. Please remember to reference Contributing.md

@bigbear22941

Same error here.

@gravek

gravek commented Aug 9, 2023

Same situation: I cannot download anything.

@NannoSilver

Same problem

@jm7uz

jm7uz commented Aug 9, 2023

Same error here.

@oscaraandersson

Same

@YuriiMaiboroda

#1707 (comment)
This works for me.

@bigbear22941

Yes:

r'a\.[a-zA-Z]\s*&&\s*\([a-z]\s*=\s*a\.get\("n"\)\)\s*&&.*?\|\|\s*([a-z]+)',
r'\([a-z]\s*=\s*([a-zA-Z0-9$]+)(\[\d+\])\([a-z]\)',

This works for me, too.

@NannoSilver

NannoSilver commented Aug 9, 2023

That worked for me too, but there seems to be an abnormal delay before the download starts:

In cipher.py locate:

        r'a\.[a-zA-Z]\s*&&\s*\([a-z]\s*=\s*a\.get\("n"\)\)\s*&&.*?\|\|\s*([a-z]+)',
        r'\([a-z]\s*=\s*([a-zA-Z0-9$]+)(\[\d+\])?\([a-z]\)',

And replace it with:

        r'a\.[a-zA-Z]\s*&&\s*\([a-z]\s*=\s*a\.get\("n"\)\)\s*&&.*?\|\|\s*([a-z]+)',
        r'\([a-z]\s*=\s*([a-zA-Z0-9$]+)(\[\d+\])\([a-z]\)',

The only change is the removal of a ?
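
To illustrate what removing the ? changes, here is a minimal sketch. SAMPLE_JS is a made-up, simplified fragment, not YouTube's actual base.js, and the names OPTIONAL_INDEX and REQUIRED_INDEX exist only for this example. With the optional group, the pattern can also match an ordinary call such as (c=decodeURIComponent(c)). With the group required, it only matches an indexed call such as (b=iha[0](b)). Whether this is exactly what happened with the real script is not confirmed in this thread.

```python
import re

# Second pattern from cipher.py before the change: the "[<digits>]" group is optional.
OPTIONAL_INDEX = re.compile(r'\([a-z]\s*=\s*([a-zA-Z0-9$]+)(\[\d+\])?\([a-z]\)')
# Same pattern after removing the "?": the "[<digits>]" group is required.
REQUIRED_INDEX = re.compile(r'\([a-z]\s*=\s*([a-zA-Z0-9$]+)(\[\d+\])\([a-z]\)')

# Hypothetical, heavily simplified stand-in for a fragment of base.js.
SAMPLE_JS = 'x&&(c=decodeURIComponent(c));a.D&&(b=a.get("n"))&&(b=iha[0](b),a.set("n",b))'

optional_match = OPTIONAL_INDEX.search(SAMPLE_JS)
required_match = REQUIRED_INDEX.search(SAMPLE_JS)

# The optional form stops at the first plain call it sees.
print(optional_match.groups())
# The required form skips plain calls and finds the indexed one.
print(required_match.groups())
```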

@qupterra

Not to speak out of turn for this repo, but adding comments saying "me too" adds a lot of noise to the discussion. Feel free to just thumbs-up the original post to let the maintainers know you're watching the thread.

@oszlsm

oszlsm commented Aug 10, 2023

That worked for me too, but there seems to be an abnormal delay before the download starts:

In cipher.py locate:

        r'a\.[a-zA-Z]\s*&&\s*\([a-z]\s*=\s*a\.get\("n"\)\)\s*&&.*?\|\|\s*([a-z]+)',
        r'\([a-z]\s*=\s*([a-zA-Z0-9$]+)(\[\d+\])?\([a-z]\)',

And replace it with:

        r'a\.[a-zA-Z]\s*&&\s*\([a-z]\s*=\s*a\.get\("n"\)\)\s*&&.*?\|\|\s*([a-z]+)',
        r'\([a-z]\s*=\s*([a-zA-Z0-9$]+)(\[\d+\])\([a-z]\)',

The only change is the removal of a ?

It works, but the download speed is awful.

@NannoSilver

I ran many tests and can confirm that it works, but the download speed is far lower than usual.

Here are some of the download speeds I got:

downloaded in      30.37 seconds     at speed         316.60 kB/s
downloaded in      20.47 seconds     at speed          46.13 kB/s
downloaded in      16.87 seconds     at speed          54.87 kB/s
downloaded in      23.06 seconds     at speed         437.26 kB/s
downloaded in      20.93 seconds     at speed       1,288.54 kB/s
downloaded in      32.14 seconds     at speed         538.38 kB/s
downloaded in      27.80 seconds     at speed          28.67 kB/s
downloaded in      28.10 seconds     at speed         378.43 kB/s
downloaded in      31.73 seconds     at speed         675.31 kB/s
downloaded in      32.36 seconds     at speed          26.40 kB/s
downloaded in      33.19 seconds     at speed       1,649.87 kB/s
downloaded in      13.64 seconds     at speed          69.36 kB/s
downloaded in      33.52 seconds     at speed          26.18 kB/s
downloaded in      14.62 seconds     at speed         258.11 kB/s
downloaded in      42.43 seconds     at speed       2,123.43 kB/s
downloaded in      41.57 seconds     at speed         123.60 kB/s
downloaded in      45.57 seconds     at speed           9.91 kB/s
downloaded in      42.59 seconds     at speed         107.25 kB/s
downloaded in      44.01 seconds     at speed       1,285.29 kB/s
downloaded in      12.58 seconds     at speed         371.68 kB/s
downloaded in      19.63 seconds     at speed         119.63 kB/s
downloaded in      18.96 seconds     at speed         162.03 kB/s
downloaded in      38.84 seconds     at speed          13.18 kB/s
downloaded in      62.54 seconds     at speed          12.88 kB/s
downloaded in      66.01 seconds     at speed         808.10 kB/s
downloaded in       9.19 seconds     at speed         538.32 kB/s
downloaded in       8.29 seconds     at speed       1,018.81 kB/s
downloaded in      27.40 seconds     at speed       4,351.08 kB/s
downloaded in      15.88 seconds     at speed         171.19 kB/s
downloaded in       8.13 seconds     at speed         168.79 kB/s
downloaded in      18.02 seconds     at speed       1,022.75 kB/s
downloaded in      11.60 seconds     at speed          73.62 kB/s
downloaded in      11.02 seconds     at speed          54.26 kB/s
downloaded in      11.63 seconds     at speed         357.40 kB/s


@NannoSilver

NannoSilver commented Aug 10, 2023

Not to speak out of turn for this repo, but adding comments saying "me too" adds a lot of noise to the discussion. Feel free to just thumbs-up the original post to let the maintainers know you're watching the thread.

It is not bad at all.
When people say "same problem here" or something similar, it raises the comment count shown in the issues list and draws other people's attention to the thread for the issue that is affecting everybody. That helps concentrate the discussion and reduces the number of threads.

@AkshayShineKrishna

That worked for me too, but there seems to be an abnormal delay before the download starts:

In cipher.py locate:

        r'a\.[a-zA-Z]\s*&&\s*\([a-z]\s*=\s*a\.get\("n"\)\)\s*&&.*?\|\|\s*([a-z]+)',
        r'\([a-z]\s*=\s*([a-zA-Z0-9$]+)(\[\d+\])?\([a-z]\)',

And replace it with:

        r'a\.[a-zA-Z]\s*&&\s*\([a-z]\s*=\s*a\.get\("n"\)\)\s*&&.*?\|\|\s*([a-z]+)',
        r'\([a-z]\s*=\s*([a-zA-Z0-9$]+)(\[\d+\])\([a-z]\)',

The only change is the removal of a ?

This works, but download speeds are significantly lower than usual.

@akawari

akawari commented Aug 10, 2023

I agree with everyone. I hit the issue today and replaced the code as described above; it works, but it significantly reduces download speeds.

@Mc-Kappa

In cipher.py I deleted one more question mark, and now my program runs smoothly. The regexes in my cipher.py now look like this:

        r'a\.[a-zA-Z]\s*&&\s*\([a-z]\s*=\s*a\.get\("n"\)\)\s*&&.*\|\|\s*([a-z]+)',
        r'\([a-z]\s*=\s*([a-zA-Z0-9$]+)(\[\d+\])\([a-z]\)'

@SpaceRocket69
Author

In cipher.py I deleted one more question mark, and now my program runs smoothly. The regexes in my cipher.py now look like this:

        r'a\.[a-zA-Z]\s*&&\s*\([a-z]\s*=\s*a\.get\("n"\)\)\s*&&.*\|\|\s*([a-z]+)',
        r'\([a-z]\s*=\s*([a-zA-Z0-9$]+)(\[\d+\])\([a-z]\)'

This works, but a note for less advanced users like me: you have to restart the kernel after changing anything in an imported library. Otherwise the traceback shows the new declaration, but the old code is still what runs.
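
The reason is that Python keeps imported modules in memory for the lifetime of the process, so editing a file on disk does not change code that is already loaded. Here is a minimal sketch of that behaviour, using a throwaway module named demo_patterns (hypothetical, created in a temporary directory) rather than pytube itself:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Avoid writing .pyc files so the demo always re-reads the source on reload.
sys.dont_write_bytecode = True

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical stand-in for an installed library file such as cipher.py.
    module_file = Path(tmp_dir) / "demo_patterns.py"
    module_file.write_text("PATTERN = r'a.*?b'\n")

    sys.path.insert(0, tmp_dir)
    importlib.invalidate_caches()
    import demo_patterns

    # Edit the file on disk, as you would when patching a library.
    module_file.write_text("PATTERN = r'a.*b'\n")
    stale = demo_patterns.PATTERN      # still the old value held in memory

    importlib.reload(demo_patterns)    # re-execute the edited source
    fresh = demo_patterns.PATTERN

    sys.path.remove(tmp_dir)

print(stale)
print(fresh)
```

importlib.reload re-executes a single module only. Other modules that already imported names from it keep the old objects, so for a package with several files, restarting the kernel is the more reliable option.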

@ymk201

ymk201 commented Aug 15, 2023

Here is what I found.

Fresh install with the command below:
$ python -m pip install git+https://github.com/pytube/pytube

The first run of the command below works fine:
$ pytube -l <youtube_url>

A second run of the same command produces the regex error below:
pytube.exceptions.RegexMatchError: __init__: could not find match for ^\w+\W

I found that a fork has a fix for the RegexMatchError above:
#1763
In cipher.py, line 30:
AS-IS
var_regex = re.compile(r"^\w+\W")
TO-BE
var_regex = re.compile(r"^$\w+\W")

After applying the fix from #1763, the get_throttling_function_name error below occurs:
pytube.exceptions.RegexMatchError: get_throttling_function_name: could not find match for multiple
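
A quick check of the patterns may help here. As written in this comment, the TO-BE pattern contains an unescaped $, which Python's re treats as an end-of-string anchor, so it matches neither of the sample names below. The ESCAPED variant is an assumption about what the fix was meant to be (a backslash may have been lost when the comment was rendered), and the sample strings are made up for illustration.

```python
import re

AS_IS = re.compile(r"^\w+\W")        # original line 30, as quoted above
TO_BE = re.compile(r"^$\w+\W")       # fix exactly as written in this comment
ESCAPED = re.compile(r"^\$*\w+\W")   # assumption: intended form with the "$" escaped

# Hypothetical transform-plan entries; the second one starts with "$".
samples = ["DE.reverse(a,3)", "$z.swap(a,1)"]

# For each pattern, record whether it matches each sample.
results = {
    name: [bool(regex.search(sample)) for sample in samples]
    for name, regex in (("AS_IS", AS_IS), ("TO_BE", TO_BE), ("ESCAPED", ESCAPED))
}
print(results)
```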

@jeancarv

This worked for me: #1754 (comment)

@LaniakeaArmstrong

I have a question: why does it seem like "cipher.py" is where most of the errors occur?

@YuriiMaiboroda

I have a question: why does it seem like "cipher.py" is where most of the errors occur?

Because this code parses and processes the script that comes from YouTube. If YouTube changes something, the processing algorithm can become outdated and need modification.
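
As a rough illustration of why such code is fragile, here is a simplified sketch, not pytube's actual implementation. The patterns, the NoPatternMatched exception, the find_function_name helper and the two script strings are all hypothetical. Several patterns are tried in order, and an error is raised when none of them fits the script any more.

```python
import re


class NoPatternMatched(Exception):
    """Hypothetical stand-in for a 'could not find match' error."""


# Patterns written against one particular layout of the script (made up for this sketch).
NAME_PATTERNS = [
    r'decode\s*=\s*function\s+([a-zA-Z0-9$]+)\(',
    r'var\s+decode\s*=\s*([a-zA-Z0-9$]+)\s*;',
]


def find_function_name(js: str) -> str:
    """Return the first captured name, trying each pattern in order."""
    for pattern in NAME_PATTERNS:
        match = re.search(pattern, js)
        if match:
            return match.group(1)
    # None of the known layouts matched: the script has probably changed.
    raise NoPatternMatched("could not find match for multiple")


old_script = "var decode = abc;"          # layout the patterns were written for
new_script = "let decode=(x)=>abc(x);"    # same logic, different layout

print(find_function_name(old_script))
try:
    find_function_name(new_script)
except NoPatternMatched as error:
    print("failed:", error)
```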

@LaniakeaArmstrong

In cipher.py I deleted one more question mark, and now my program runs smoothly. The regexes in my cipher.py now look like this:

        r'a\.[a-zA-Z]\s*&&\s*\([a-z]\s*=\s*a\.get\("n"\)\)\s*&&.*\|\|\s*([a-z]+)',
        r'\([a-z]\s*=\s*([a-zA-Z0-9$]+)(\[\d+\])\([a-z]\)'

The pattern provided by @Mc-Kappa solved the problem, and without the slow-download side effect, too. Thank you.
Maybe it's because the pattern is more permissive? I don't know. I would appreciate it if someone could explain why it worked (if it really worked for you too, of course).

@YuriiMaiboroda

.*? matches the smallest possible part of the text for which the rest of the expression still matches.
.* matches the largest possible part of the text for which the rest of the expression still matches.
With the first option, the wrong element of the code could be picked up, so the key could be decrypted incorrectly, and YouTube then throttles the speed. With the second option, the required function is found, decryption succeeds, and YouTube stops throttling the speed.
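
The difference can be seen in a small sketch. SAMPLE_JS is a made-up, simplified line containing two "||name" candidates; it is not YouTube's actual script, and which candidate is the right one in the real base.js is not shown here. The lazy pattern captures the name after the first "||", and the greedy pattern captures the name after the last one on the same line.

```python
import re

# First pattern with the lazy quantifier ".*?" (before the change).
LAZY = re.compile(
    r'a\.[a-zA-Z]\s*&&\s*\([a-z]\s*=\s*a\.get\("n"\)\)\s*&&.*?\|\|\s*([a-z]+)'
)
# Same pattern with the greedy quantifier ".*" (after the change).
GREEDY = re.compile(
    r'a\.[a-zA-Z]\s*&&\s*\([a-z]\s*=\s*a\.get\("n"\)\)\s*&&.*\|\|\s*([a-z]+)'
)

# Hypothetical fragment with two "||<name>" occurrences on one line.
SAMPLE_JS = 'a.D&&(b=a.get("n"))&&(b=xyz(b),a.set("n",b))||first;q||second'

lazy_name = LAZY.search(SAMPLE_JS).group(1)
greedy_name = GREEDY.search(SAMPLE_JS).group(1)

print(lazy_name)    # name after the first "||"
print(greedy_name)  # name after the last "||"
```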
