Track deleted posts #56

Closed
py1984 opened this issue Nov 19, 2017 · 4 comments

Comments

@py1984
commented Nov 19, 2017

Absolutely love this software! It would be helpful if the filenames of deleted posts were changed to indicate the deletion. For example, after downloading all the posts in a profile, a later run of Instaloader with a certain flag would scan through the profile, and whenever a post that was downloaded previously is no longer online, the previously downloaded file would be renamed with a "-deleted" suffix.
Looking forward to feedback! I think this would really amplify this tool.
Thanks!

@aandergr

Member

commented Dec 3, 2017

I agree it would be an interesting feature, but cleanly implementing it would be challenging. We would face at least the following problems:

  • Sources such as hashtags most commonly spawn almost-infinite amounts of new posts at a time, so it would not be possible to iterate through them and compare which posts are still present online. This option would only make sense for downloading profiles and :feed.
  • The path where posts are stored is customizable, so for this feature to work, a filename specification has to be chosen that encodes the profile name and date, or the shortcode. Instaloader would then need to "reverse" the filename specification to get that data and compare it with the query result from Instagram. Alternatively, the JSON file previously created with --metadata-json would have to be analyzed.
  • With --fast-update enabled, Instaloader aborts once the first already-downloaded picture is encountered. The online-offline comparison would stop at that moment, so this option could not be used together with --track-deleted.
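To illustrate what "reversing" a filename specification could look like, here is a minimal sketch. It assumes the files were saved with a pattern that starts with a `%Y-%m-%d_%H-%M-%S` timestamp; the helper name `date_from_filename` is hypothetical and not part of Instaloader. With any other pattern the parsing would have to change accordingly.

```python
from datetime import datetime
from pathlib import PurePath

# Assumed strftime layout; it must match the pattern used when downloading.
DATE_FORMAT = "%Y-%m-%d_%H-%M-%S"
# Length of a timestamp rendered with DATE_FORMAT, e.g. "2017-11-19_12-30-05".
STAMP_LENGTH = len("2017-11-19_12-30-05")


def date_from_filename(filename):
    """Return the post date encoded at the start of filename, or None.

    Hypothetical helper: anything after the timestamp (such as a sidecar
    index like "_2") and the file extension are ignored.
    """
    stem = PurePath(filename).stem
    try:
        return datetime.strptime(stem[:STAMP_LENGTH], DATE_FORMAT)
    except ValueError:
        # The name does not start with a timestamp in the assumed layout.
        return None
```

The recovered dates could then be compared with the dates of the posts returned by Instagram. Two posts created in the same second would be indistinguishable with this approach, which is one reason a pattern containing the shortcode would be more robust.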

What exactly are you trying to achieve with such an option? What is the original problem that you're trying to solve? How would you further process the generated information? I'm asking in the hope that we could work out an easier method for obtaining that result. Maybe one could also write a small separate script for tracking deleted posts, given that certain conditions are met.

Happy Instaloading!
Alex

@py1984

Author

commented Dec 6, 2017

@aandergr Alex, yes, great points. So --track-deleted would not be able to run in conjunction with --fast-update or on a hashtag, only on :feed or a user profile. One way I could see implementing this would be to keep track of the last photo downloaded and the number of photos. The next time you look at the profile, you can check whether the last downloaded photo is in the correct position in the series, after any new pictures, based on the knowns: the number of new pictures, the number of pictures currently online, the number of pictures at the end of the last session, and the most recent photo downloaded in the last session. Would you say that is a practical way to go about it?
The purpose is to determine when photos are deleted and attempt to understand patterns in why users delete photos.
Cheers!
Luke
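The counting idea described in this comment can be sketched roughly as follows. This is only an illustration under stated assumptions: posts are listed newest first, each post is identified by its shortcode, and the function name `count_deleted` is hypothetical. It yields how many older posts disappeared, not which ones, and it cannot give an answer if the last downloaded post was itself deleted.

```python
def count_deleted(previous_count, current_shortcodes, last_downloaded):
    """Estimate how many previously seen posts are no longer online.

    previous_count     -- number of posts online at the end of the last session
    current_shortcodes -- shortcodes currently online, newest first (assumed order)
    last_downloaded    -- shortcode of the newest post seen in the last session

    Returns None when last_downloaded is missing from the current listing,
    because the boundary between new and old posts is then unknown.
    """
    try:
        # Everything listed before the last downloaded post is new.
        new_posts = current_shortcodes.index(last_downloaded)
    except ValueError:
        return None
    older_still_online = len(current_shortcodes) - new_posts
    return previous_count - older_still_online
```

For example, if five posts were online last time and the current listing shows two new posts followed by only three of the old ones, the function reports two deletions.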

@aandergr

Member

commented Dec 13, 2017

I wrote a little separate script to track deleted posts:

from glob import glob
from sys import argv
from os import chdir

import instaloader

# Instaloader instantiation - you may pass additional arguments to the constructor here
loader = instaloader.Instaloader(filename_pattern='{date:%Y-%m-%d_%H-%M-%S}')

# If desired, load session previously saved with `instaloader -l USERNAME`:
#loader.load_session_from_file(USERNAME)

def post_filenames(post_iterator):
    needs_profilename = instaloader.format_string_contains_key(loader.filename_pattern, 'profile')
    for p in post_iterator:
        base = loader.filename_pattern.format(profile=p.owner_username if needs_profilename else None,
                                              target=TARGET.lower(), date=p.date, shortcode=p.shortcode,
                                              post=p)
        if p.typename == 'GraphImage':
            yield base + '.jpg'
        elif p.typename == 'GraphVideo':
            yield base + '.jpg'
            yield base + '.mp4'
        elif p.typename == 'GraphSidecar':
            yield from ('{0}_{1}.{2}'.format(base, q + 1, 'mp4' if r['node']['is_video'] else 'jpg')
                        for q, r in enumerate(p.get_sidecar_edges()))

try:
    TARGET = argv[1]
except IndexError:
    raise SystemExit("Pass profile name as argument!")

post_iterator = loader.get_profile_posts(loader.get_profile_metadata(TARGET))

online_posts = set(post_filenames(post_iterator))
chdir(TARGET)
offline_posts = set(glob('*.mp4') + glob('*.jpg')) - set(glob('*_profile_pic.jpg'))

if online_posts - offline_posts:
    print("Not yet downloaded posts:")
    print(" ".join(online_posts - offline_posts))

if offline_posts - online_posts:
    print("Deleted posts:")
    print(" ".join(offline_posts - online_posts))

Copy it into a file (say track_deleted.py), then run it with python3 track_deleted.py PROFILE. The directory from which you run it must contain a subdirectory named after the profile; its contents are used for the online-offline comparison. Note that, depending on the number of posts, it might take several seconds until you get any output.

The script uses Instaloader to obtain a list of currently-online posts, and generates the matching filename of each post. Then it compares this list with the *.jpg and *.mp4 files that are locally present. It outputs a list of posts which are online but not offline (i.e. not yet downloaded) and a list of posts which are offline but not online (i.e. deleted in the profile).

You may pass additional arguments or altered filename specifications in the loader = instaloader.Instaloader(...) line. Also, you may log in by uncommenting the loader.load_session_from_file() line. Theoretically, you may also substitute the post_iterator = ... line to track hashtags or feeds. This script works perfectly for profiles with the default filename specification, but keep in mind that there is no logic for handling the corner cases mentioned in my last comment, e.g. non-unique filenames or sources spawning infinite amounts of posts.

It outputs lists of posts rather than renaming files. However, I think it could be easily modified to add _deleted suffixes if desired.
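As a rough sketch of that modification, the following helper renames the files reported as deleted. It assumes the names in `offline_posts - online_posts` are relative to the profile directory, as in the script above; the function `mark_deleted` and its parameters are hypothetical and not part of Instaloader.

```python
from pathlib import Path


def mark_deleted(directory, deleted_filenames, suffix="_deleted"):
    """Append suffix to the stem of each listed file inside directory.

    Hypothetical helper. Files that are missing or already carry the suffix
    are skipped, so repeated runs do not produce "_deleted_deleted" names.
    Returns the new filenames in sorted order of the original names.
    """
    renamed = []
    for name in sorted(deleted_filenames):
        source = Path(directory) / name
        if not source.is_file() or source.stem.endswith(suffix):
            continue
        # "2017-11-19_12-30-05.jpg" becomes "2017-11-19_12-30-05_deleted.jpg".
        target = source.with_name(source.stem + suffix + source.suffix)
        source.rename(target)
        renamed.append(target.name)
    return renamed
```

In the script above, this could be called as `mark_deleted('.', offline_posts - online_posts)` after the `chdir(TARGET)` line. Files that have already been renamed would still show up in the "Deleted posts" output on later runs, since their names no longer match any online post.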

I hope this gives you a useful approach and helps you towards solving your problem. Please comment on whether this fulfills the purpose, or which further ideas you have regarding this issue.

@aandergr

Member

commented Jun 4, 2018

I ported the snippet to Instaloader version 4, where it is much simpler thanks to the .json files that are created by default for each post.

from glob import glob
from sys import argv
from os import chdir

from instaloader import Instaloader, Post, Profile, load_structure_from_file

# Instaloader instantiation - you may pass additional arguments to the constructor here
L = Instaloader()

# If desired, load session previously saved with `instaloader -l USERNAME`:
#L.load_session_from_file(USERNAME)

try:
    TARGET = argv[1]
except IndexError:
    raise SystemExit("Pass profile name as argument!")

# Obtain set of posts that are on hard disk
chdir(TARGET)
offline_posts = set(filter(lambda s: isinstance(s, Post),
                           (load_structure_from_file(L.context, file)
                            for file in (glob('*.json.xz') + glob('*.json')))))

# Obtain set of posts that are currently online
post_iterator = Profile.from_username(L.context, TARGET).get_posts()
online_posts = set(post_iterator)

if online_posts - offline_posts:
    print("Not yet downloaded posts:")
    print(" ".join(str(p) for p in (online_posts - offline_posts)))

if offline_posts - online_posts:
    print("Deleted posts:")
    print(" ".join(str(p) for p in (offline_posts - online_posts)))
aandergr added a commit that referenced this issue Jun 4, 2018
Presents code examples that use the instaloader module for more advanced tasks
than what is possible with the Instaloader command line interface.

Presents #46, #56, #110, #113, #120, #121.