This repository has been archived by the owner on Sep 10, 2022. It is now read-only.

Download process doesn't work in macOS #24

Closed
dayvsonsales opened this issue Oct 1, 2020 · 10 comments · Fixed by #26 or #31
Assignees: excalibur-kvrv
Labels: bug (Something isn't working), hacktoberfest

Comments

@dayvsonsales
Contributor

dayvsonsales commented Oct 1, 2020

Describe the bug
Hello,

I tried to use your application to download an m3u8 playlist, but it fails during the download step. Apparently the os module doesn't provide the sched_getaffinity method on macOS.

press ctrl+c or ctrl+z if parsed headers of type http2 False are incorrect.

Starting Download process MainProcess.
Traceback (most recent call last):
  File "/Users/dayvsonsales/m3u8-dl/core/download_process.py", line 27, in download_process
    download_manager = DownloadProcess(links, total_links, session, http2,
  File "/Users/dayvsonsales/m3u8-dl/core/download_process.py", line 55, in __init__
    self.__process_num = 4 if platform.system() == "Windows" else len(os.sched_getaffinity(os.getpid()))
AttributeError: module 'os' has no attribute 'sched_getaffinity'

To Reproduce
Steps to reproduce the behavior:

  1. python3 main.py https://<mywebsite>.com/scripts/m3us/playlist.m3u
  2. See error

Expected behavior
I expected my playlist to be downloaded.

Desktop (please complete the following information):

  • OS: macOS
  • Version: 10.13.6
  • Python version: 3.8.5
@excalibur-kvrv excalibur-kvrv self-assigned this Oct 2, 2020
@excalibur-kvrv excalibur-kvrv added the bug Something isn't working label Oct 2, 2020
@excalibur-kvrv excalibur-kvrv added this to To do in m3u8-downloader via automation Oct 2, 2020
@excalibur-kvrv excalibur-kvrv moved this from To do to In progress in m3u8-downloader Oct 2, 2020
@excalibur-kvrv excalibur-kvrv added this to In progress in M3U8-DL Oct 2, 2020
@excalibur-kvrv excalibur-kvrv linked a pull request Oct 2, 2020 that will close this issue
@excalibur-kvrv excalibur-kvrv moved this from In progress to Review in progress in M3U8-DL Oct 2, 2020
m3u8-downloader automation moved this from In progress to Done Oct 2, 2020
M3U8-DL automation moved this from Review in progress to Done Oct 2, 2020
@excalibur-kvrv excalibur-kvrv reopened this Oct 2, 2020
m3u8-downloader automation moved this from Done to In progress Oct 2, 2020
M3U8-DL automation moved this from Done to In progress Oct 2, 2020
@excalibur-kvrv
Owner

excalibur-kvrv commented Oct 2, 2020

The fix has been made; let me know if you face another issue @dayvsonsales. If not, this issue can be closed.
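For reference, a minimal sketch of the kind of cross-platform fallback this bug calls for (the actual change is in the linked PR; using os.cpu_count() as the macOS substitute and the process_num name are my assumptions):

import os
import platform

if platform.system() == "Windows":
    process_num = 4
elif hasattr(os, "sched_getaffinity"):
    # Linux exposes the set of CPUs this process is allowed to run on
    process_num = len(os.sched_getaffinity(0))
else:
    # macOS (and others) lack sched_getaffinity; fall back to the CPU count
    process_num = os.cpu_count() or 4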

m3u8-downloader automation moved this from In progress to Done Oct 2, 2020
M3U8-DL automation moved this from In progress to Done Oct 2, 2020
@excalibur-kvrv excalibur-kvrv reopened this Oct 2, 2020
m3u8-downloader automation moved this from Done to In progress Oct 2, 2020
M3U8-DL automation moved this from Done to In progress Oct 2, 2020
@excalibur-kvrv excalibur-kvrv moved this from In progress to Review in progress in M3U8-DL Oct 2, 2020
@excalibur-kvrv excalibur-kvrv moved this from Review in progress to Done in M3U8-DL Oct 2, 2020
@dayvsonsales
Contributor Author

@excalibur-kvrv it works now, thank you. One more question: is the download held entirely in memory? I downloaded a 730 MB playlist and noticed via top that the Python process kept growing and growing.

@excalibur-kvrv
Owner

excalibur-kvrv commented Oct 2, 2020

It is designed to write the data as soon as it is downloaded (this may be a problem if each individual chunk is big). I did notice the growing size; I'm still working on identifying the areas where the memory is growing. By the way @dayvsonsales, how much download speed (megabytes per second) were you getting while using m3u8-dl? Was it close to your internet bandwidth?
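If it helps narrow things down, here's a quick sketch for logging resident memory around the download and write calls (this assumes the third-party psutil package, which the project doesn't use; log_memory is a hypothetical helper):

import psutil  # third-party: pip install psutil

proc = psutil.Process()

def log_memory(tag: str) -> None:
    # Resident set size of the current process, in megabytes
    rss_mb = proc.memory_info().rss / (1024 * 1024)
    print(f"[mem] {tag}: rss={rss_mb:.1f} MB")

Calling it before and after each file's write would show whether memory is released between downloads.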

@dayvsonsales
Contributor Author

@excalibur-kvrv the download speed is fine. The only problem is memory usage; my playlists have big single files (usually more than 100 MB). I have a simple script I wrote using only curl, and it worked fine (no memory problem), but it doesn't scale the way your script does with its 4 parallel processes.

@excalibur-kvrv
Owner

excalibur-kvrv commented Oct 2, 2020

100 MB per file in the playlist? @dayvsonsales, then I think I know what the issue is. The playlists I have encountered so far only contained small files (10 MB max), so I designed the program to download the entire file and then write it. The fix is quite simple: I just need to write the data in chunks. It'll take a few hours to fix.

@dayvsonsales
Contributor Author

dayvsonsales commented Oct 2, 2020

@excalibur-kvrv I think I solved my problem. Inspecting fetch.py, I noticed that you call session.get without passing the stream option. So I added it and handled the chunks myself, writing to the file_path file with Python's standard file I/O (not your write_file_no_gil). The code is below:

with session.get(download_url, timeout=timeout, stream=True) as r:
    r.raise_for_status()

    if r.status_code == 302:
        r = redirect_handler(session, r.content)

    with open(file_path, "wb") as f:
        for chunk in r.iter_content(1024):
            if not chunk:
                break
            f.write(chunk)

Memory usage now seems lower than before.

But there's a check that I had to ignore:

if type(request_data) == bytes:
    data = request_data
else:
    data = request_data.content

I don't know what you were trying to do with this type check. Could you explain to me, please?

@excalibur-kvrv
Owner

excalibur-kvrv commented Oct 2, 2020

The type check was simply for compatibility in the event redirect_handler were to run, since it returns bytes. The if/else ensures that .content isn't called on a bytes object, only on a response object. Also, try experimenting with the number of bytes passed to r.iter_content: a small value increases the overall file write time, since writing to the OS is faster when it's handed larger buffers. The custom write_file_no_gil was meant to ensure faster write times by taking advantage of the fact that the GIL gets dropped during the write.
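To make that concrete, a minimal sketch of how the type check and chunked writing could coexist (write_response and CHUNK_SIZE are hypothetical names; redirect_handler returning bytes is as described above, and 64 KB is just a starting value to tune):

CHUNK_SIZE = 64 * 1024  # larger chunks mean fewer, faster writes to the OS

def write_response(request_data, file_path):
    with open(file_path, "wb") as f:
        if isinstance(request_data, bytes):
            # redirect_handler returned raw bytes; there is no iter_content here
            f.write(request_data)
        else:
            # a streamed requests.Response; write it out chunk by chunk
            for chunk in request_data.iter_content(CHUNK_SIZE):
                f.write(chunk)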

@excalibur-kvrv
Owner

Nice. So @dayvsonsales, I take it your issue has been resolved?

@dayvsonsales
Contributor Author

dayvsonsales commented Oct 2, 2020

@excalibur-kvrv it's solved. Just to clarify: if redirect_handler returns bytes, there's no iter_content to call on it, right? Removing that check could cause more problems, I think. I'll make a pull request just to keep the code in this issue's history, but I think it should be investigated further before merging.

@excalibur-kvrv
Owner

excalibur-kvrv commented Oct 2, 2020

Well, if you were to remove the type check, it would cause a lot of problems whenever the redirect handler runs. But you are on the right path: with a few more changes and a bit of restructuring, your fix will work. I'll take a look and notify you of the changes you need to make. Oh, and do ensure that your code passes the Codacy checks.
