
paragraphs issue #5

Closed
jaan143 opened this issue Sep 6, 2022 · 34 comments
Labels
help wanted Extra attention is needed

Comments

@jaan143

jaan143 commented Sep 6, 2022

Please check, the whole book has this text issue:

[Screenshot 2022-09-06 213612]
[Screenshot 2022-09-06 213540]

This is the book link:
https://www.perlego.com/book/3294395/second-language-pronunciation-bridging-the-gap-between-research-and-teaching-pdf

@evmer evmer added the bug Something isn't working label Sep 6, 2022
@evmer
Owner

evmer commented Sep 6, 2022

Thanks for reporting this to me. After some research I found that this is related to a pdfkit/wkhtmltopdf bug that seems to have been ongoing for 4+ years:

wkhtmltopdf/wkhtmltopdf#3256
wkhtmltopdf/wkhtmltopdf#45

Unfortunately there's no solution yet, so all I can do is refactor the script, replacing pdfkit with another HTML-to-PDF library. It's going to take some time; maybe in the next few days I'll come up with something that works.

@jaan143
Author

jaan143 commented Sep 7, 2022

@evmer Well, you can check this topic: most people fixed it with the DPI setting, though for some it didn't help:
https://stackoverflow.com/questions/34241932/letter-spacing-is-too-large-with-wkhtmltopdf
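For context, the DPI workaround discussed in that StackOverflow thread passes `--dpi` to wkhtmltopdf. A minimal sketch with pdfkit, assuming wkhtmltopdf is installed and on PATH; `build_wkhtmltopdf_options` and `html_to_pdf` are hypothetical helpers, and the DPI value is illustrative (the thread reports that it helps some users but not others):

```python
def build_wkhtmltopdf_options(dpi: int = 300) -> dict:
    """Return a pdfkit options dict; pdfkit passes each key to wkhtmltopdf as --key value."""
    return {
        "dpi": str(dpi),      # becomes: --dpi 300
        "encoding": "UTF-8",  # becomes: --encoding UTF-8
    }


def html_to_pdf(html_path: str, pdf_path: str, dpi: int = 300) -> None:
    # Imported here so the options helper can be used without pdfkit installed.
    import pdfkit

    pdfkit.from_file(html_path, pdf_path, options=build_wkhtmltopdf_options(dpi))
```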

@evmer
Owner

evmer commented Sep 7, 2022

@jaan143 I refactored the script, replacing pdfkit with pyppeteer. This bug should now be fixed; you can try it yourself.
Unfortunately, the PDF building process became slow due to external font/image rendering; I hope to improve it in a future version.

Don't forget to update the Python requirements:

python3 -m pip install pyppeteer
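For readers following along, a pyppeteer-based HTML-to-PDF step generally looks like the sketch below. It is not the script's actual code: `html_file_to_pdf`, `pdf_options` and `main` are hypothetical names, and it assumes pyppeteer is installed and can download or find Chromium.

```python
import asyncio
import pathlib


def pdf_options(out_path: str) -> dict:
    """Options for pyppeteer's page.pdf(); values here are illustrative."""
    return {
        "path": out_path,         # write the PDF to this file
        "printBackground": True,  # keep CSS background colours and images
    }


async def html_file_to_pdf(html_path: str, out_path: str) -> None:
    from pyppeteer import launch  # needs pyppeteer and a working Chromium

    browser = await launch(headless=True)
    try:
        page = await browser.newPage()
        # Local files must be opened through an absolute file:// URL.
        url = pathlib.Path(html_path).resolve().as_uri()
        # Wait until the network is idle so external fonts/images are loaded.
        await page.goto(url, waitUntil="networkidle0")
        await page.pdf(pdf_options(out_path))
    finally:
        await browser.close()


def main(html_path: str, out_path: str) -> None:
    asyncio.run(html_file_to_pdf(html_path, out_path))
```

Usage would be `main("book.html", "book.pdf")`. Waiting for `networkidle0` is one plausible reason this approach is slower than pdfkit, since every external font and image has to finish loading first.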

@jaan143
Author

jaan143 commented Sep 8, 2022

@evmer Thanks for your efforts, dear :)
Here is the error I get while converting to PDF with the new script:

page 347 downloaded
Traceback (most recent call last):
  File "downloader.py", line 183, in <module>
    asyncio.get_event_loop().run_until_complete(html2pdf())
  File "C:\Users\Hp\AppData\Local\Programs\Python\Python38\lib\asyncio\base_events.py", line 616, in run_until_complete
    return future.result()
  File "downloader.py", line 114, in html2pdf
    browser = await launch(options={
  File "C:\Users\Hp\AppData\Local\Programs\Python\Python38\lib\site-packages\pyppeteer\launcher.py", line 307, in launch
    return await Launcher(options, **kwargs).launch()
  File "C:\Users\Hp\AppData\Local\Programs\Python\Python38\lib\site-packages\pyppeteer\launcher.py", line 168, in launch
    self.browserWSEndpoint = get_ws_endpoint(self.url)
  File "C:\Users\Hp\AppData\Local\Programs\Python\Python38\lib\site-packages\pyppeteer\launcher.py", line 227, in get_ws_endpoint
    raise BrowserError('Browser closed unexpectedly:\n')
pyppeteer.errors.BrowserError: Browser closed unexpectedly:

PS C:\Users\Hp\Downloads\Compressed\perlego-downloader-main\perlego-downloader-main_4 (new pdf convert library)\perlego-downloader-main>

@evmer
Owner

evmer commented Sep 8, 2022

@jaan143 It seems your system is missing some required dependencies:

https://stackoverflow.com/questions/57217924/pyppeteer-errors-browsererror-browser-closed-unexpectedly
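Most answers in that StackOverflow thread are about missing Linux libraries. On any OS, a generic first step is to let Chromium print its own error: pyppeteer's `launch()` accepts `dumpio=True`, which forwards the browser's stdout/stderr to the console. A sketch; `launch_options` and `try_launch` are hypothetical helpers, and `--no-sandbox` is an assumption commonly suggested for Linux/Docker rather than something this thread confirms is needed:

```python
from typing import Optional


def launch_options(executable_path: Optional[str] = None) -> dict:
    """Build options for pyppeteer.launch() that help with debugging."""
    opts = {
        "headless": True,
        "dumpio": True,            # show Chromium's own stdout/stderr
        "args": ["--no-sandbox"],  # often suggested on Linux/Docker
    }
    if executable_path:
        # Point at a specific Chromium/Chrome binary instead of the bundled one.
        opts["executablePath"] = executable_path
    return opts


async def try_launch(executable_path: Optional[str] = None) -> str:
    from pyppeteer import launch  # needs pyppeteer installed

    browser = await launch(launch_options(executable_path))
    try:
        return await browser.version()  # browser version string
    finally:
        await browser.close()
```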

@jaan143
Author

jaan143 commented Sep 8, 2022

@evmer Check this:
downloader.py:183: DeprecationWarning: There is no current event loop
  asyncio.get_event_loop().run_until_complete(html2pdf())

I read a lot of topics, and people fixed the issue in their own project code.
I think you need to add a timeout,
but I don't know exactly.

Here is the main link:
https://github.com/miyakogi/pyppeteer

It is also no longer being updated.

Have you tried it on Windows?

@jaan143 jaan143 mentioned this issue Sep 8, 2022
@evmer
Owner

evmer commented Sep 8, 2022

@jaan143 Can you please describe your issue in more detail?

This is just a warning and shouldn't break the script execution:

downloader.py:183: DeprecationWarning: There is no current event loop
  asyncio.get_event_loop().run_until_complete(html2pdf())

Try reinstalling the latest version of Python and upgrading the required dependencies.
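The warning appears because `asyncio.get_event_loop()` is deprecated when no event loop is running, starting with Python 3.10 (the later traceback shows Python310). A minimal sketch of the usual replacement, assuming `html2pdf` is a coroutine function as in the traceback; the body here is only a placeholder:

```python
import asyncio


async def html2pdf() -> str:
    # Placeholder for the script's real coroutine.
    await asyncio.sleep(0)
    return "done"


def main() -> str:
    # asyncio.run() creates a new event loop, runs the coroutine to completion
    # and closes the loop, so no "current event loop" is needed beforehand.
    return asyncio.run(html2pdf())
```

As noted above, the warning itself is harmless; this change only silences it and does not fix the `BrowserError`.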

@jaan143
Author

jaan143 commented Sep 8, 2022

@evmer Actually, the issue is the same one I showed above.
I spent the whole day looking for help online (GitHub, StackOverflow, etc.)
but couldn't find a proper answer; most of the help is Linux-related.

Traceback (most recent call last):
  File "downloader.py", line 183, in <module>
    asyncio.get_event_loop().run_until_complete(html2pdf())
  File "C:\Users\Hp\AppData\Local\Programs\Python\Python38\lib\asyncio\base_events.py", line 616, in run_until_complete
    return future.result()
  File "downloader.py", line 114, in html2pdf
    browser = await launch(options={
  File "C:\Users\Hp\AppData\Local\Programs\Python\Python38\lib\site-packages\pyppeteer\launcher.py", line 307, in launch
    return await Launcher(options, **kwargs).launch()
  File "C:\Users\Hp\AppData\Local\Programs\Python\Python38\lib\site-packages\pyppeteer\launcher.py", line 168, in launch
    self.browserWSEndpoint = get_ws_endpoint(self.url)
  File "C:\Users\Hp\AppData\Local\Programs\Python\Python38\lib\site-packages\pyppeteer\launcher.py", line 227, in get_ws_endpoint
    raise BrowserError('Browser closed unexpectedly:\n')
pyppeteer.errors.BrowserError: Browser closed unexpectedly:

PS C:\Users\Hp\Downloads\Compressed\perlego-downloader-main\perlego-downloader-main_4 (new pdf convert library)\perlego-downloader-main>

@evmer
Owner

evmer commented Sep 8, 2022

@jaan143 I updated the script, can you please try now?

@jaan143
Author

jaan143 commented Sep 8, 2022

@evmer OK, let me check and confirm.

@jaan143
Author

jaan143 commented Sep 8, 2022

@evmer Still the same. What OS are you using?
C:\Users\Hp\Downloads\Compressed\perlego-downloader-main\perlego-downloader-main 5\perlego-downloader-main\downloader.py:184: DeprecationWarning: There is no current event loop
  asyncio.get_event_loop().run_until_complete(html2pdf())
Traceback (most recent call last):
  File "C:\Users\Hp\Downloads\Compressed\perlego-downloader-main\perlego-downloader-main 5\perlego-downloader-main\downloader.py", line 184, in <module>
    asyncio.get_event_loop().run_until_complete(html2pdf())
  File "C:\Users\Hp\AppData\Local\Programs\Python\Python310\lib\asyncio\base_events.py", line 641, in run_until_complete
    return future.result()
  File "C:\Users\Hp\Downloads\Compressed\perlego-downloader-main\perlego-downloader-main 5\perlego-downloader-main\downloader.py", line 114, in html2pdf
    browser = await launch(options={
  File "C:\Users\Hp\AppData\Local\Programs\Python\Python310\lib\site-packages\pyppeteer\launcher.py", line 307, in launch
    return await Launcher(options, **kwargs).launch()
  File "C:\Users\Hp\AppData\Local\Programs\Python\Python310\lib\site-packages\pyppeteer\launcher.py", line 168, in launch
    self.browserWSEndpoint = get_ws_endpoint(self.url)
  File "C:\Users\Hp\AppData\Local\Programs\Python\Python310\lib\site-packages\pyppeteer\launcher.py", line 227, in get_ws_endpoint
    raise BrowserError('Browser closed unexpectedly:\n')
pyppeteer.errors.BrowserError: Browser closed unexpectedly:

PS C:\Users\Hp\Downloads\Compressed\perlego-downloader-main\perlego-downloader-main 5\perlego-downloader-main>

@evmer
Owner

evmer commented Sep 8, 2022

@jaan143 I tested it on macOS, Linux (Debian) and Windows 10, so it seems to be a problem with your configuration.

Can you please follow these instructions to troubleshoot and post the output here?
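Because the pyppeteer traceback hides Chromium's own message, the troubleshooting idea is to run the browser binary directly. A cross-platform sketch in Python that sidesteps quoting differences between shells; `run_browser_check` and `default_check_args` are hypothetical helpers, and the executable path is whatever the troubleshooting output prints:

```python
import subprocess
import sys


def default_check_args() -> list:
    """Standard Chromium switches for a quick headless smoke test."""
    args = ["--headless", "--dump-dom", "about:blank"]
    if sys.platform.startswith("win"):
        # Long-standing advice for headless Chromium on Windows.
        args.insert(1, "--disable-gpu")
    return args


def run_browser_check(executable: str, *args: str, timeout: float = 30.0):
    """Run a binary directly; return (exit code, combined stdout/stderr)."""
    result = subprocess.run(
        [executable, *args],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.returncode, (result.stdout + result.stderr).strip()
```

A call such as `run_browser_check(r"C:\path\to\chrome.exe", *default_check_args())` (placeholder path) returns the exit code together with whatever Chromium printed, which often names the missing library or the reason it refuses to start.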

@jaan143
Author

jaan143 commented Sep 8, 2022

@evmer The instructions say to copy the command and run it in PowerShell or cmd, but they are written for Docker or AWS.
Anyway, I just copied and pasted it into my PowerShell; you can see the result in this screenshot:
[Screenshot 2022-09-08 161259]

@evmer
Owner

evmer commented Sep 8, 2022

@jaan143 can you please copy-paste the printed command and run it? I mean this:

[Screenshot]

@jaan143
Author

jaan143 commented Sep 8, 2022

@evmer Here it is:
[Screenshot 2022-09-08 223526]

@evmer
Owner

evmer commented Sep 8, 2022

@jaan143 To run an executable from a relative path in PowerShell, you first have to prefix it with a dot ('./' or '.\'), so for you:

./root/.local/share/pyppeteer/local-chromium/588429/chrome-linux/chrome...etc.

@jaan143
Author

jaan143 commented Sep 8, 2022

@evmer Here it is with the dot:
[Screenshot 2022-09-08 224425]

@evmer
Owner

evmer commented Sep 8, 2022

@jaan143 I'm not familiar with PowerShell; try googling the error and getting it to work.

@evmer evmer added help wanted Extra attention is needed and removed bug Something isn't working labels Sep 8, 2022
@jaan143
Author

jaan143 commented Sep 8, 2022

@evmer Here is the cmd output:
[Screenshot 2022-09-08 224751]

@lilfmdude

This is what I get. Any idea? I got this far.

@jaan143
Author

jaan143 commented Sep 9, 2022

@evmer This script works the same way as yours: it downloads HTML pages and then builds an EPUB. You could take ideas from it and produce an EPUB file, which would be even better:
https://github.com/ilyakharlamov/bookmate_downloader

@smack893

Do you have any plans to update this?

@lilfmdude

I don't know if I'm getting closer, but I'm getting this now:
[Screenshot]

@lilfmdude

This one is new too. Hopefully I'm not annoying you with all this.
[Screenshot]

@jaan143
Author

jaan143 commented Sep 12, 2022

@evmer I don't want the HTML files to be deleted. Can you tell me which lines I need to remove from the script
so that the HTML files are kept?
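The thread does not say which lines of downloader.py delete the pages, so the following is only the general pattern: put the deletion behind a flag instead of removing code. `cleanup_html` and `keep_html` are hypothetical names, and the sketch assumes the pages are saved in a single directory:

```python
import shutil
from pathlib import Path


def cleanup_html(html_dir: Path, keep_html: bool = False) -> bool:
    """Delete the downloaded HTML pages unless keep_html is True.

    Returns True if the directory was removed.
    """
    if keep_html or not html_dir.exists():
        return False
    shutil.rmtree(html_dir)
    return True
```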

@evmer
Owner

evmer commented Sep 12, 2022

Guys, this is not a chat group; please try to follow GitHub's guidelines. The original problem was solved by replacing pdfkit with pyppeteer, so I'll now mark this as closed. Feel free to open a new issue with detailed info about your problem.

@lilfmdude please see my reply to #7.
@jaan143 I can't address issues regarding Puppeteer/pyppeteer. Your OS configuration may be different; try googling the error messages or pinging their GitHub repo.

@evmer evmer closed this as completed Sep 12, 2022
evmer added a commit that referenced this issue Sep 12, 2022
evmer added a commit that referenced this issue Sep 12, 2022
bug reported by some windows users in #5 #8
@evmer
Owner

evmer commented Sep 12, 2022

@jaan143 I added some troubleshooting guidelines to the README; let me know if they solve your issue. Please consider downloading the latest version of the script; I fixed a few bugs.

@jaan143
Author

jaan143 commented Sep 12, 2022

@evmer It's done now.
Thank you very, very much :)

@evmer
Owner

evmer commented Sep 12, 2022

@evmer its done now
Thank you very very much :)

Finally! 😅 So was the missing chromedriver the issue?

@jaan143
Author

jaan143 commented Sep 12, 2022

@evmer No, it was already set in PATH.
The problem was in pyppeteer: I reinstalled it, and then pyppeteer reinstalled Chromium.
But if I download a book in EPUB format, the top and bottom margins of the pages are very close to the text, and the page width and height are very large,
while the PDF format looks good.

@evmer
Owner

evmer commented Sep 12, 2022

but if i download epub book format its top and bottom margins of pages are very close with text and also page width and height is very big

You can try to adjust the margins setting a different value (in pixels) at line 209: https://github.com/evmer/perlego-downloader/blob/main/downloader.py#L209

options['margin'] = {'top': '10', 'bottom': '10', 'left': '10', 'right': '10'}

Just try increasing them and see what the output looks like.
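If the margins at that line are passed to pyppeteer's `page.pdf()`, each value can also carry a unit (`px`, `in`, `cm` or `mm`); a bare number such as `'10'` is read as pixels. A small hypothetical helper for trying different values (the unit handling is pyppeteer's; the helper name is not from the script):

```python
from typing import Optional


def pdf_margins(top: float, bottom: Optional[float] = None,
                left: Optional[float] = None, right: Optional[float] = None,
                unit: str = "px") -> dict:
    """Build a margin dict for page.pdf(); sides left as None reuse `top`."""
    sides = {"top": top, "bottom": bottom, "left": left, "right": right}
    return {side: f"{top if value is None else value}{unit}"
            for side, value in sides.items()}
```

For example, `options['margin'] = pdf_margins(20, unit='mm')` sets 20 mm on every side.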

@evmer
Owner

evmer commented Sep 12, 2022

problem was in pyppeteer which i reinstall and then pyppeteer reinstall chromium

Can you please elaborate on the solution step by step? It's a very common issue, and you'd make many users happy. Thank you!
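The fix described above (reinstall pyppeteer, let it fetch Chromium again) could be scripted roughly as follows. This is an assumption-laden sketch, not a confirmed procedure: `reinstall_commands` and `run_all` are hypothetical helpers, and `pyppeteer-install` (the download command that ships with pyppeteer) may only report that Chromium is already installed if an old copy is still in pyppeteer's `local-chromium` folder, in which case that folder would need deleting first.

```python
import subprocess
import sys


def reinstall_commands() -> list:
    """Commands to reinstall pyppeteer and then fetch its bundled Chromium."""
    return [
        # Use the same interpreter that runs downloader.py.
        [sys.executable, "-m", "pip", "install", "--force-reinstall", "pyppeteer"],
        # Console script installed by pyppeteer that downloads Chromium.
        ["pyppeteer-install"],
    ]


def run_all(commands: list) -> None:
    for cmd in commands:
        subprocess.run(cmd, check=True)  # stop at the first failing step
```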

@jaan143
Author

jaan143 commented Sep 13, 2022

@evmer Can I also set the page size, like A4 or Letter?
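In pyppeteer's `page.pdf()`, page size is set either with `format` (paper names such as `'A4'` or `'Letter'`) or with explicit `width`/`height` strings that include units; if both are given, `format` wins. Whether downloader.py exposes these is not stated in the thread, so this hypothetical helper only builds the options:

```python
from typing import Optional


def page_size_options(fmt: Optional[str] = None, width: Optional[str] = None,
                      height: Optional[str] = None, landscape: bool = False) -> dict:
    """Return page-size options for page.pdf(): a paper name or width/height."""
    if fmt is not None:
        opts = {"format": fmt}  # e.g. "A4", "Letter"
    elif width and height:
        opts = {"width": width, "height": height}  # e.g. "6in" by "9in"
    else:
        raise ValueError("pass either fmt or both width and height")
    opts["landscape"] = landscape
    return opts
```

For example, `options.update(page_size_options("Letter"))` before calling `page.pdf(options)`.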

@jaan143
Author

jaan143 commented Sep 13, 2022

@evmer See this:
[Screenshot 2022-09-13 191525]
After generating the PDF, I opened it and saw that some images are split across two pages.
Here is the book URL so you can check:
https://www.perlego.com/book/3260547/cell-biology-a-short-course-pdf
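Image splitting is usually addressed with CSS rather than PDF options: `break-inside: avoid` (and its older alias `page-break-inside: avoid`) asks Chromium not to split an element across pages. A sketch that injects such a rule with pyppeteer's `page.addStyleTag()`; `NO_SPLIT_CSS` and `keep_images_whole` are hypothetical names, and an image taller than one page will still be cut because it cannot fit on any single page:

```python
NO_SPLIT_CSS = """
img, figure {
    break-inside: avoid;       /* current CSS property */
    page-break-inside: avoid;  /* older alias, kept for compatibility */
}
"""


async def keep_images_whole(page) -> None:
    """Inject the no-split rule into an already loaded pyppeteer page."""
    await page.addStyleTag({"content": NO_SPLIT_CSS})
```

It would be awaited after `page.goto(...)` and before `page.pdf(...)`.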


4 participants