[facebook] add support #5626

Open · zWolfrost wants to merge 32 commits into master

Conversation

@zWolfrost commented May 22, 2024

Fixes #470 and #2612 (probably a duplicate).
For now it supports Account Photos & Photo Albums.
The only way it can work is by making one request per post, so unfortunately it's not really optimized.

@zWolfrost (Author) commented May 23, 2024

It looks like Facebook blocks your account for about one hour when running the extractor for too many images.
It happened to me after running the extractor and downloading 1800 of them.
Also, it only appears to be an account-level block (logging out removes it), and it only prevents you from viewing images by opening them from a link (opening them through the React UI still works).
It's probably best if the extractor actively avoids using the imported cookies unless requested otherwise (with proper warnings).
Please let me know your thoughts.

zWolfrost marked this pull request as ready for review May 31, 2024 14:20
@MrJmpl3 commented Jun 13, 2024

> It's probably best if the extractor actively avoids using the imported cookies unless requested otherwise (with proper warnings).

I think photos and videos have a signature in the URL; maybe Facebook can track you and ban you using this info.

@zWolfrost (Author)

> I think photos and videos have a signature in the URL; maybe Facebook can track you and ban you using this info.

I don't quite understand how you could ban someone using a signature in the photo URL. I think the most reasonable option would just be to use the request cookies (which include your account ids and such) to account-ban you.

As I mentioned, it's not really a complete ban; it's limited to some parts of the UI, and logging out (thus not sending the account request cookies) does remove the block, with the tradeoff that you can't view private or R-18 images.

I still don't know if being logged out in advance prevents the block altogether. If that's the case, then I think I will add a warning about that.

also added author followups for singular images
@zWolfrost (Author)

After doing some more testing, I can tell that not using cookies still gets you blocked from viewing images, in the sense that you are forced to log in, and it happens much faster than when using them.
I think it's best not to use cookies unless Facebook forces the user to log in, and to print a small warning whenever the extractor uses them. Doing this, I could extract about 2400 images before getting temporarily blocked, and I think that's pretty good.

Also, I'd love to know if the extractor works for anyone else other than me, so please feel free to let me know.

@AdBlocker69

Hi, I've tested your version and it seems to be working fine for pictures. I'm planning to save quite a few images from a public Facebook page and was wondering if using one of the --sleep options could prevent you from being blocked (or if Facebook just reacts to an arbitrary number of requests, no matter the frequency). And overall: does the block mean I'm generally unable to connect to the Facebook services (like using gallery-dl with it), or does it just prevent browser/account interaction?
Let me not forget: thanks for your work. I hope this gets implemented in the official project soon. Facebook is (still) such a big platform, so having a tool like gallery-dl support it is pretty important (imo)...

Facebook video support would be nice too. Luckily in my case there weren't that many videos, so I was able to download them one by one with yt-dlp... But yt-dlp doesn't support album/account video downloading (yet) like it does for YouTube, for example.

@zWolfrost (Author)

Hi, thank you for your feedback.

I'm not sure if waiting to continue extracting would work, and if it did work, I have no idea for how long or after how many images the wait should start. That would require a lot of testing, and unfortunately every time I get blocked I have to wait about 6 hours to try again.

To be more specific, the "block" I'm talking about only prevents you from accessing images by their URL (the way the extractor does it); you can still access them through the React user interface.
That means accessing them by clicking them, scrolling with the arrows, etc., but if, for example, you reload the page while viewing one, an error pops up about you "using this feature too fast".
When not using an account, instead, you don't get the error but you get redirected to the login page (aside from that, the behavior is the same).

As far as I can tell, this block is limited to that, and you can do anything else on Facebook.

About the video support, I will keep that in mind. I'm not sure how yt-dlp downloads them; I will check that out when I have time.

@AdBlocker69

Thanks for the info :)
Btw, do you know what to do when, let's say, you have downloaded 2400 pictures, get blocked afterwards, and want to continue downloading from the same profile? Can you just continue after the 6 hours? I guess gallery-dl still makes requests for the 2400 images when checking for duplicates (as it goes chronologically from newest to oldest post), or does that work differently?

@zWolfrost (Author)

No, I'm sorry; once you get blocked, the photo page doesn't load at all (assuming you're loading it by its URL), so there is no way to get its metadata and such. This is the reason why I just added a way to continue the extraction from an image in the set/album instead of having to start from the beginning: just take the photo URL and append "&setextract" to it to download the whole set from there instead of the photo alone. The user will be prompted with this URL if they get blocked while extracting.
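
For example, with placeholder ids (not a real photo):

gallery-dl "https://www.facebook.com/photo/?fbid=0000000000&set=a.0000000000&setextract"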

@AdBlocker69

Good idea for implementing that 👍🏻 Does it only work with the prompted URL? I just tried it up front by using an image link and adding "&setextract" to it, but it gave me an 'unknown command' error after downloading just the singular image.

Also, it seems like your video extraction only pulls the version without audio (the best one in yt-dlp's list of formats, but there it gets merged with an audio-only version by default)... So it would be best to either merge with ffmpeg by default or select the "hd" format by default, which has video+audio.

@zWolfrost (Author) commented Jun 18, 2024

The "&setextract" feature didn't work to you because you probably passed it to gallery-dl without using the double quotes (") and the command prompt recognized the "&" as the split character between two commands (you can use the ampersand to execute two commands in one line). That would also explain why it downloaded the image, and then gave you an "unknown command" message, as you probably don't have a command assigned to the "setextract" keyword

By the way, after further inspection, I don't think there's a way to make an "all profile videos" extractor, as the videos don't share a set id I can use to navigate through them all.

Good catch on the audio thing though, I wasn't wearing headphones :)
I have fixed it just now; the audio gets downloaded as well. By default the two will be separate; to merge them, you'll have to let youtube-dl/yt-dlp handle the download by adding "videos": "ytdl" to the facebook extractor configuration.
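
That is, something along these lines in your gallery-dl configuration file:

{
	"extractor": {
		"facebook": {
			"videos": "ytdl"
		}
	}
}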

@AdBlocker69 commented Jun 18, 2024

Okay, several things I just realized by doing some trial and error 😅:
First of all, I need to use the full link for gallery-dl to detect which "set" I even want to continue downloading from; I had just used the short version (as marked blue) and wondered why it didn't do anything anymore after downloading the image the link points to.

(screenshot)

Secondly, I then need to put the full link into quotation marks, since otherwise, as you said, the text after the ampersand is treated as a second command (as marked red), giving me a 'syntax error' and not processing the rest after downloading the image the link points to.

(screenshot)

So this is how it has to be written to avoid any errors due to the link format, command logic, etc.

(screenshot)

Now I got it to work successfully 👌🏻

Alternatively, you can also just manually append "&set=" plus the set id to the 'short' image link; the set id is given to you by the previous download run, as it appears in the folder name where your set images were saved (e.g. the 'Timeline photos' set, i.e. all account images):

(screenshot)

@zWolfrost (Author)

I'm sorry if things got confusing 😥 At least you managed to make it work now.
Of course, someone whose download actually got blocked would have been prompted with the full URL already, so there is no chance of this happening in a real situation (at least I hope so).
I will see if there is a way to get the set id by inspecting the photo page itself (if I remember correctly, there should be a default one).

@AdBlocker69 commented Jun 18, 2024

No problem, that's just what happens when the starting situation is slightly different :)
I just like to use short links because, when taking them directly while browsing the web, they sometimes have certain parameters in them (like a smaller-than-source image size, etc.) which are undesirable. So it's more or less best practice for me to take the link as 'raw' as possible to avoid any of that.
In this case the extra information in the link was vital though...

@zWolfrost (Author) commented Jun 18, 2024

There, I just had to change the matching URL pattern a little. Now it works even without including the set id. Hopefully it's the same for you. I recommend avoiding this anyway, as Facebook acts a little weird when you navigate images without their set id: sometimes their sequence gets changed, or some images get skipped altogether. Or maybe it just works fine and I unintentionally bugfixed it a while ago, I don't know.

@AdBlocker69 commented Jun 20, 2024

Works 🤙🏻
Thanks; I guess it's generally helpful to have it work like that too, for when you just have the image link from a 3rd-party source or whatever and want to download from that point back; then you don't have to go back to the profile page itself to find the set id.
So quality-of-life-wise it's good, no question.

the extractor should be ready :)
@zWolfrost (Author)

> Ah, well, that's embarrassing. I kept thinking I had to manually replace the {filename} argument with the image filename inside the curly brackets instead of just using the argument as is. Not very smart of me. The argument works just like you said and I'm able to automatically save photos with my preferred filenames. Apologies for the confusion and thanks for your patience.

Don't worry, happens to the best of us. By the way, I've just fixed a thing that could have been the cause. Could you try again with the latest commit and let me know if it works instead?

@lobt4 commented Aug 10, 2024

> Don't worry, happens to the best of us. By the way, I've just fixed a thing that could have been the cause. Could you try again with the latest commit and let me know if it works instead?

Errored out the same as last time; perhaps it's happening on just my machine for some reason:
(screenshot of the error)

@AdBlocker69

Hi,
I just tried having yt-dlp handle the facebook video extraction and I'm getting this error:
(screenshot of the error)
Might it be that it only searches for "youtube-dl" and therefore can't find it?

@zWolfrost (Author) commented Aug 15, 2024

@AdBlocker69 Please enter this line in the CMD: py -m yt_dlp (Windows) / python3 -m yt_dlp (Linux)

If it prints No module named yt_dlp or such, then it's something to do with your yt-dlp installation;

If it's something like You must provide at least one URL, then you should try to see if this happens with other extractors as well (reddit and twitter, for example) and let me know what happens. Remember to try the other extractors on the same repo you are using for the facebook one.

For your information, gallery-dl should, by default, try to import "yt_dlp" and fall back to "youtube_dl". For me it works fine, so this is kinda odd.
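
(Roughly, that fallback amounts to something like this; a simplified sketch, not gallery-dl's actual code:)

# simplified sketch of the import fallback described above
try:
    import yt_dlp as ytdl_module
except ImportError:
    import youtube_dl as ytdl_module

print("using", ytdl_module.__name__)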

@AdBlocker69 commented Aug 15, 2024

I don't have yt-dlp installed through Python but as the standalone exe; could that be the culprit? (I get No module named yt_dlp...)

@zWolfrost (Author) commented Aug 15, 2024

Yeah, it most likely is. You should probably install yt-dlp with pip (pip install yt-dlp); follow a tutorial if you don't have pip installed and don't know how to set it up, there are a lot of them. As far as I know, there is no way to make gallery-dl use the yt-dlp executable instead of the Python module.
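
Something like this, matching the py launcher example from before:

pip install yt-dlp
py -m yt_dlp --version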

@AdBlocker69 commented Aug 15, 2024

Thanks, this way it works.
By default yt-dlp pulls the best video + best audio and merges them using FFmpeg.
In this case it would be "452530707751531v" + "1054256548546316a".
(screenshot)
How would I go about telling yt-dlp to, for example, pull "hd" instead?
Because the following doesn't do the trick (the normal way of selecting the yt-dlp format in gallery-dl, per the gallery-dl configuration documentation):
(screenshot)

@zWolfrost (Author)

It's not working because you should put that argument in the "downloader" options, not the "extractor" ones. Took me a bit 😅

{
	"extractor": {
		"#": "not here!"
	},

	"downloader": {
		"ytdl": {
			"format": "sd"
		}
	}
}

@AdBlocker69

LOL, okay thanks, that works better now 😂...
Logically it makes sense for it to be in "downloader", but I simply didn't think of that even being an option, as I saw the same argument existing in the "extractor" options.
What is the one in there for, then?

Fixed some metadata attributes not decoding correctly in non-Latin languages, or not showing at all.
Also improved a few patterns.
-Added tests
-Fixed video extractor giving incorrect URLs
-Removed start warning
-Listed supported site correctly
I've chosen to remove the "reactions", "comments" and "views" attributes, as I felt they would require additional maintenance even though hardly anyone would actually use them to sort files.
I've also removed the "title" and "caption" video attributes because of their inconsistency across different videos.
Feel free to share your thoughts.
@fireattack (Contributor) commented Oct 13, 2024

Sorry if this has been mentioned, but this seems to only be able to process a user page, not a single post (https://www.facebook.com/{username}/posts/{posthash}).

@zWolfrost (Author)

@fireattack I just fixed it. Let me know if it works for you now.

@fireattack (Contributor)

I now get an error:

>python -m gallery_dl "https://www.facebook.com/joho.press.jp/posts/pfbid02mfFRpVkErLQxQ8cpD2f1hwXEVsFzK8kfNBKdK2Jndnx6AkmMQZuXhovwDgwvoDNil" 
[facebook][error] HttpError: '404 Not Found' for 'https://www.facebook.com/media/set/?set='

@zWolfrost (Author)

@fireattack The problem is that the extractor can't really get all the images in the post. You have a better chance copying the set id of one of those images, which is in their URLs (in your case it's pcb.1160563418981189), and giving the set page to gallery-dl (https://www.facebook.com/media/set/?set=pcb.1160563418981189). I'll see if I can do something about it (as well as fix the error).
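
That is, for this post:

python -m gallery_dl "https://www.facebook.com/media/set/?set=pcb.1160563418981189"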

@zWolfrost (Author)

Right now the extractor should get the first set id it can find in the post and try to extract that set. Sometimes the set contains way more images than the post, but you could still quit when it's done, I guess. I could technically make it quit by itself once it has extracted the same number of images as the post (36+4), but I'm not sure if I want to.
