Some web pages did not get readable mode #26

Open
JadeVane opened this issue Aug 18, 2022 · 3 comments · May be fixed by go-shiori/shiori#525

Comments

@JadeVane

Most of the pages I visit have Chinese content, for example:

https://finance.sina.com.cn/tech/it/2022-07-27/doc-imizirav5659240.shtml

https://www.zhihu.com/question/346862321/answer/2573127062

Only archived versions of these pages are available, not readable versions. What puzzles me is that the first page I added to shiori got a readable version correctly. It came from this link: https://www.zhihu.com/question/546215156/answer/2605044965 , yet other links from the same site do not get a readable version.

[Image]

As you can see, both of them are from zhihu.com.

@Katarn

Katarn commented Aug 23, 2022

@stale

stale bot commented Sep 22, 2022

This issue has been automatically marked as stale because it has not had any activity for quite some time.
It will be closed if no further activity occurs.
Thank you for your contributions.

@Acelya-9028

https://finance.sina.com.cn/tech/it/2022-07-27/doc-imizirav5659240.shtml and https://www.zhihu.com/question/346862321/answer/2573127062 are actually readable, but the CheckDocument() function fails on them: their content consists of many short paragraphs, so no paragraph reaches the 140-character minimum required to count toward the final score.

https://www.zhihu.com/question/546215156/answer/2605044965 has a paragraph longer than 140 characters, and its calculated score is over 20, so the CheckDocument() function does not fail and caching can be done.

https://habr.com/ru/company/selectel/blog/684162/ is OK, and https://habr.com/ru/post/683052/ needs this commit.
