Attempting to craft Twitter configuration to fully archive an account #3137
atomicthumbs
started this conversation in
General
Replies: 1 comment
Looks like you have a working config now: #3158
If you want to put everything in a file together, try adding this PP:
{
"mtime": true,
"name": "metadata",
"open": "a",
"#": "post",
"filename": "{user[name]!l}.jsonl",
"directory": "media_full_jsonl/",
"not locals().get('extension') and not locals().get('reply_to') and not locals().get('quote_id') and not retweet_id and not count": {
"directory": "tweets_full_jsonl/"},
"mode": "jsonl",
"extension": "jsonl"
}
If you want to put {date} and {content} together in a jsonlines file, add this PP:
{
"name": "metadata",
"mtime": true,
"event": "post",
"filter": "content",
"filter": "not locals().get('extension') and not locals().get('reply_to') and not locals().get('quote_id') and not retweet_id and not count",
"archive": "./gallery-dl_archives/metadata_archives/twitter/{user[name]!l}_tweets_metadata.archive",
"directory": "tweets_jsonl/",
"filename": "{user[name]!l}_tweets.jsonl",
"open": "a+",
"mode": "custom",
"content-format": "{_lit[{]}\"Date\": {date!j}, \"Tweet\": {content!j}{_lit[}]}\n"
}
This would create a valid JSON object on every line:
{"Date": "2022-12-18 17:03:22", "Tweet": "A Ted talk that\u2019s too long is a Theodore Talk"}
{"Date": "2022-12-16 23:32:49", "Tweet": "Saw a great speech on arthritis. Still not sure if the mic drop was intentional"}
{"Date": "2022-12-14 22:17:12", "Tweet": "Geneva must have a crazy convention center"}
{"Date": "2022-12-13 18:09:59", "Tweet": "No one is more disappointed around the holidays than the people who steal my packages"}
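For reading such a file back, here is a minimal Python sketch. It assumes each line has exactly the "Date" and "Tweet" keys shown in the output above and that the date uses the `YYYY-MM-DD HH:MM:SS` layout from those samples. The `load_tweets` helper is a hypothetical name for illustration, not part of gallery-dl.

```python
import json
from datetime import datetime


def load_tweets(lines):
    """Parse JSON Lines text into a list of (datetime, text) tuples.

    Assumes every non-empty line is an object with "Date" and "Tweet"
    keys, as produced by the custom content-format above.
    """
    tweets = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        when = datetime.strptime(record["Date"], "%Y-%m-%d %H:%M:%S")
        tweets.append((when, record["Tweet"]))
    return tweets


# Two of the sample lines from above, used as stand-in input.
sample = [
    '{"Date": "2022-12-18 17:03:22", "Tweet": "A Ted talk that\\u2019s too long is a Theodore Talk"}',
    '{"Date": "2022-12-14 22:17:12", "Tweet": "Geneva must have a crazy convention center"}',
]

# Print the tweets oldest first.
for when, text in sorted(load_tweets(sample)):
    print(when.isoformat(), text)
```

With a real archive you would pass an open file instead of the sample list, e.g. `with open(path, encoding="utf-8") as f: load_tweets(f)`, where `path` points at whatever file the "directory" and "filename" settings produced on your system.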
I have multiple Twitter accounts, both locked and unlocked, with many thousands of media and text posts. I am trying to archive the accounts in full using gallery-dl, since Twitter's built-in data archive is extremely inconvenient to "hydrate."
Unfortunately, the metadata/postprocessor/etc configuration is too complex for me to wrap my head around. I'd like to have all the tweets I've posted or retweeted archived, whether or not they contain media. I'd also like, ideally, a single big metadata file containing all my text tweets and information associated with them. I can't tell if gallery-dl can do that or not by reading the configuration instructions. Dealing with hundreds of thousands of tiny JSON files seems like a pain in the ass.
What I'd prefer is the following: I feed gallery-dl the account, and get back the following directories in the base dir:
[account]/
(contains all media files I've posted, and their metadata)
[account]/retweets/
(contains media and metadata for everything I've retweeted or quote tweeted)
[account]/likes/
(the same for likes, would be nice to have but I'm not sure if it's possible)
Essentially, just the most comprehensive and comprehensible archive possible, with mtime postprocessing enabled. Is there an easy way to pull this off?
As a postscript: some kind of config cookbook/snippet sharing arrangement would be very helpful for this stuff.