Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add instagram metadata: post_pageurl, post_tags #743

Merged
merged 10 commits into from
May 28, 2020
Merged

Conversation

Vrihub
Copy link
Contributor

@Vrihub Vrihub commented May 10, 2020

Add the following metadata for instagram:

  • post_pageurl: json string with url of the post page
  • post_tags tags: json array with instagram tags extracted from the post description

Add the following metadata for instagram:
- post_pageurl: json string with url of the post page
- post_tags: json array with instagram tags extracted from the post description
This way, --write-tags will pick up the post tags.
Copy link
Contributor

@iamleot iamleot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Vrihub for working on it!

Just some minor suggestions, apart the rename from post_pageurl to url.

Usally all change for extractor in the first commit message line has the format:

[extractor] short description

Please change the first line of the commit message to something like:

[instagram] add 'url' and 'tags' metadata

....

For both url and tags please update existing test in order to test these metadata as well.

Can you please rename post_pageurl to url, add tests and squash all commits to one single commit and adjust the commit message as suggested?

Thanks again!

gallery_dl/extractor/instagram.py Outdated Show resolved Hide resolved
@Vrihub
Copy link
Contributor Author

Vrihub commented May 10, 2020

I've made the requested changes (and fixed a bug with duplicate tags).

Re: the squash everything into a single commit, IIRC this can be done by the project admin when merging the pull request, right?

BTW, I think I'm going to add some other useful metadata: location and tagged users, so please don't merge this yet.

@@ -395,6 +394,7 @@ class InstagramImageExtractor(InstagramExtractor):
def __init__(self, match):
InstagramExtractor.__init__(self, match)
self.shortcode = match.group(1)
self._find_tags = re.compile(r'#\w+').findall
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to add this to the __init__() of the base class, so that all Instagram extractors have access to it.

The Travis tests currently have lots of AttributeError: 'InstagramUserExtractor' object has no attribute '_find_tags' messages in them. You can run those tests locally with python test/test_results.py instagram to check if anything broke before pushing your commits.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uhm, right. I was misleaded by the absence of an __init__() in the base class InstagramExtractor.

A tentative fix is in 1305200

I added __init__() to the base class, and also had it call Extractor.__init__(), since all the subcategory extractors in their __init__() are calling InstagramExtractor.__init__() (which in the absence of it would resolve in calling Extractor.__init__(), am I right?)

It works for me, but please confirm that this indeed makes sense.

Or maybe it would be better to define self._find_tags() as a static method instead?

@mikf mikf merged commit 62b65e5 into mikf:master May 28, 2020
mikf added a commit that referenced this pull request May 28, 2020
@Vrihub Vrihub deleted the ins-metadata branch November 30, 2020 10:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants