Realbooru extractor broken #2530

Closed
DuendeInexistente opened this issue Apr 26, 2022 · 11 comments

@DuendeInexistente

Realbooru's extractor is broken, and generates 404ing image URLs.

@mikf mikf closed this as completed in 3e926bd May 2, 2022
@mo-han
Contributor

mo-han commented Sep 28, 2022

@mikf
Is this fix included in the latest release, or is it not a complete fix?
I'm using version 1.23.1 and still encounter this issue. Most images return 404 on this search page:
https://realbooru.com/index.php?page=post&s=list&tags=x-art
Note: I picked one or two of those images; the resized picture is shown, but the "original" ones return 404.

@mo-han
Contributor

mo-han commented Sep 28, 2022

btw, tags=true is not working for realbooru

@Hrxn
Contributor

Hrxn commented Sep 28, 2022

[..] Most images return 404 on this search page: https://realbooru.com/index.php?page=post&s=list&tags=x-art Note: I picked one or two of those images; the resized picture is shown, but the "original" ones return 404.

Same in the browser. Their "original" links seem to be broken for many of these.

@mo-han
Contributor

mo-han commented Nov 10, 2022

@mikf
I've made a patch for this that fixes both the incorrect image URLs (the 404 problem) and the mismatch in extended tag extraction.
The code snippet I use myself is pasted in the comments of 3e926bd.

mikf added a commit that referenced this issue Nov 11, 2022
@mikf
Owner

mikf commented Nov 11, 2022

@mo-han
Thank you. I've applied your fixes and everything seems to be working again. (ecad02c, 6423f99)

@mo-han
Contributor

mo-han commented Jan 12, 2023

404 again. It seems realbooru changed something again, though not in the same way as last time.
Should I reopen this issue or create a new one?
Anyway, let me describe the new problem here.

https://realbooru.com/index.php?page=post&s=view&id=813374
https://realbooru.com/index.php?page=post&s=view&id=762045
The extension is missing, and file_url is invalid (the suffix is lost).
Videos seem OK; only images (GIFs included) are affected.

I haven't dug into the details yet; maybe later.

Where is the source code that extracts the href from "Original"?

@mikf
Owner

mikf commented Jan 13, 2023

Should I reopen this issue or create a new one?

You can't reopen this issue, since you didn't create it and are not the repo owner. A new one might have been better, but this works as well.

Where is the source code that extracts the href from "Original"?

gallery-dl uses the old gelbooru API from back when realbooru was just a gelbooru fork. It still "works", but, as you said, most URLs are missing their filename extensions.
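
As a rough illustration of the API-based approach described above, the sketch below builds the kind of `dapi` request that appears in the debug log later in this thread, and checks whether a file URL still carries a filename extension. The helper names `build_api_url` and `has_extension` are hypothetical and are not taken from gallery-dl's source.

```python
import posixpath
from urllib.parse import urlencode, urlsplit


def build_api_url(root, tags, pid=0, limit=100):
    """Build a Gelbooru-style 'dapi' post-index URL (hypothetical helper)."""
    params = {
        "page": "dapi",
        "s": "post",
        "q": "index",
        "tags": tags,    # search tags, space-separated
        "pid": pid,      # zero-based page index
        "limit": limit,  # posts per page
    }
    return root + "/index.php?" + urlencode(params)


def has_extension(file_url):
    """Return True if the path part of file_url ends in a filename extension."""
    path = urlsplit(file_url).path
    return posixpath.splitext(path)[1] != ""


if __name__ == "__main__":
    print(build_api_url("https://realbooru.com", "malena_morgan sort:score"))
```

A URL for which `has_extension` returns False corresponds to the "suffix lost" case reported above; such a URL cannot be downloaded as-is.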

@mikf mikf reopened this Jan 13, 2023
@mikf
Owner

mikf commented Apr 19, 2023

Fixed in ac97aca (v1.25.2) by grabbing file URLs from HTML post pages.
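
A minimal sketch of the general idea (reading the file URL from a post's HTML page rather than from the API) might look like the following. It assumes the post page contains an anchor whose visible text is "Original"; the actual markup on realbooru and the actual code in ac97aca may differ, and `OriginalLinkParser` and `find_original_url` are hypothetical names.

```python
from html.parser import HTMLParser


class OriginalLinkParser(HTMLParser):
    """Collect the href of the first <a> whose text is exactly 'Original'."""

    def __init__(self):
        super().__init__()
        self._href = None          # href of the <a> currently being read
        self._text = []            # text fragments inside that <a>
        self.original_url = None   # result, once found

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            label = "".join(self._text).strip()
            if label == "Original" and self.original_url is None:
                self.original_url = self._href
            self._href = None


def find_original_url(html):
    """Return the 'Original' link target from a post page, or None."""
    parser = OriginalLinkParser()
    parser.feed(html)
    parser.close()
    return parser.original_url
```

The page HTML would be fetched from a post URL of the form `index.php?page=post&s=view&id=...`, as seen elsewhere in this thread.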

@mikf mikf closed this as completed Apr 19, 2023
@mo-han
Contributor

mo-han commented May 29, 2024

@mikf
It's been a long time since I last downloaded from realbooru, and today I ran into some 404 Not Found errors again.

  • they are old posts
  • they are webm/gif, but the URL extracted is jpeg
['gallery-dl', '-R', '20', '-c', 'C:\\Users\\mo-han\\locallib\\usr\\etc\\gallery-dl.json', '-o', 'base-directory=C:\\Users\\mo-han\\locallib\\usr\\dl\\gldl', '-o', 'cookies-update=true', '-o', 'videos=true', '-o', 'tags=true', '-o', 'filename="{category} {date!S:.10} {id} {md5} ${tags_copyright!S:L40/___/} @{tags_model!S:L80/___/} .{extension}"', '-o', 'directory=["malena_morgan {category} pq"]', '-vv', '--range', '1-10', 'https://realbooru.com/index.php?page=post&s=list&tags=malena_morgan sort:score']
[gallery-dl][debug] Version 1.26.9
[gallery-dl][debug] Python 3.8.10 - Windows-10-10.0.22631-SP0
[gallery-dl][debug] requests 2.31.0 - urllib3 2.1.0
[gallery-dl][debug] Configuration Files ['C:\\Users\\mo-han\\locallib\\usr\\etc\\gallery-dl.json']
[gallery-dl][debug] Starting DownloadJob for 'https://realbooru.com/index.php?page=post&s=list&tags=malena_morgan sort:score'
[realbooru][debug] Using GelbooruV02TagExtractor for 'https://realbooru.com/index.php?page=post&s=list&tags=malena_morgan sort:score'
[cookies][debug] Extracting cookies from C:\Users\mo-han\AppData\Roaming\Mozilla\Firefox\Profiles\4n321zy5.default-release\cookies.sqlite
[cookies][info] Extracted 337 cookies from Firefox
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): realbooru.com:443
[urllib3.connectionpool][debug] https://realbooru.com:443 "GET /index.php?page=dapi&s=post&q=index&tags=malena_morgan+sort%3Ascore&pid=0&limit=100 HTTP/1.1" 200 None
[urllib3.connectionpool][debug] https://realbooru.com:443 "GET /index.php?page=post&s=view&id=755451 HTTP/1.1" 200 None
[realbooru][debug] Active postprocessor modules: [ExecPP]
...
[urllib3.connectionpool][debug] https://realbooru.com:443 "GET /images/cc/9f/cc9f3177b8a185cb33760862cf05cad5.jpeg HTTP/1.1" 404 None
[downloader.http][warning] '404 Not Found' for 'https://realbooru.com/images/cc/9f/cc9f3177b8a185cb33760862cf05cad5.jpeg'
[download][error] Failed to download realbooru 2018-02-19 649682 cc9f3177b8a185cb33760862cf05cad5 $ @aurielee_summers malena_morgan .jpg
...
[urllib3.connectionpool][debug] https://realbooru.com:443 "GET /images/5c/1d/5c1d9cea962d56d1539d1799936c227f.jpeg HTTP/1.1" 404 None
[downloader.http][warning] '404 Not Found' for 'https://realbooru.com/images/5c/1d/5c1d9cea962d56d1539d1799936c227f.jpeg'
[download][error] Failed to download realbooru 2019-09-03 694959 5c1d9cea962d56d1539d1799936c227f $ @aurielee_summers malena_morgan morg .jpg
...
[urllib3.connectionpool][debug] https://realbooru.com:443 "GET /images/3b/2e/3b2e9d686748d3d1f71f606ed131ccf2.jpeg HTTP/1.1" 404 None
[downloader.http][warning] '404 Not Found' for 'https://realbooru.com/images/3b/2e/3b2e9d686748d3d1f71f606ed131ccf2.jpeg'
[download][error] Failed to download realbooru 2019-09-03 694958 3b2e9d686748d3d1f71f606ed131ccf2 $ @aurielee_summers malena_morgan morg .jpg
...

@mikf
Owner

mikf commented Jun 1, 2024

@mo-han
Should be fixed in v1.27.0 (807e2f7)
(I managed to break video downloads with my previous attempt at fixing older files in acc94ac)

@mo-han
Contributor

mo-han commented Jun 2, 2024

@mikf
I see "trying fallback url" in the output; I guess that's the fix.
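
For readers unfamiliar with the idea, a fallback scheme for the webm/gif-served-as-jpeg case reported above could be sketched as follows. This is only an illustration under stated assumptions, not gallery-dl's implementation: the extension list, the function names `fallback_urls` and `first_available`, and the `exists` callback (which would wrap an HTTP request in practice) are all hypothetical.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit

# Assumed candidate extensions; the real set used by any tool may differ.
EXTENSIONS = ("jpeg", "jpg", "png", "gif", "webm", "mp4")


def fallback_urls(url, extensions=EXTENSIONS):
    """Yield copies of url with its extension swapped for each other candidate."""
    parts = urlsplit(url)
    stem, ext = posixpath.splitext(parts.path)
    current = ext.lstrip(".").lower()
    for candidate in extensions:
        if candidate != current:
            yield urlunsplit(parts._replace(path=stem + "." + candidate))


def first_available(url, exists):
    """Return the first of url and its fallbacks for which exists(u) is true."""
    for candidate in (url, *fallback_urls(url)):
        if exists(candidate):
            return candidate
    return None
```

Passing `exists` in as a callable keeps the sketch free of network code; a real downloader would instead attempt each URL and move to the next one on a 404 response.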

4 participants