Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Support oEmbed for media previews. #7920

Merged
merged 18 commits into from
Jul 27, 2020
Merged

Support oEmbed for media previews. #7920

merged 18 commits into from
Jul 27, 2020

Conversation

clokep
Copy link
Contributor

@clokep clokep commented Jul 21, 2020

Fixes #7643.

This fixes Twitter previews by querying for oEmbed information instead of the originally requested URL. It is mostly generic (so additional oEmbed endpoints can be added), but it isn't end-user configurable at the moment.

The gist of the changes to the preview algorithm is:

  1. Check if the requested URL matches any of the oEmbed patterns, if so:
    1. Query the oEmbed endpoint for the JSON metadata.
    2. If the metadata is a photo or has a thumbnail, use that instead of the original URL and continue the normal preview code.
    3. If the metadata is HTML, use the HTML directly.

A big section of _download_url got indented, but was otherwise unchanged.

A couple of questions I have:

  • Does it make sense to make this user configurable?
  • Parsing of the globs to regular expressions at module load time seems kind of meh, but is easy. I wonder if we should have a separate handler or something for oEmbed?
  • Are there other services we want to add? (Note that there's a JSON file which contains all the providers, we could fetch it programmatically or include it.)

Note that this does NOT attempt to support "discovery" via <link> tags. (Nor does it handle XML.)

Testing steps

  1. Enable URL previews in your homeserver (I think this involes enable_media_repo, url_preview_enabled, url_preview_ip_range_blacklist).
  2. Join a room.
  3. Enable URL previews for that room in the settings.
  4. Send a link that should get previewed and make sure it still works.
  5. Send a Twitter link to get previewed (e.g. https://twitter.com/matrixdotorg/status/1285937589791399936) and see a preview text.

@clokep clokep marked this pull request as ready for review July 22, 2020 15:30
@clokep clokep requested a review from a team July 22, 2020 15:31
@clokep
Copy link
Contributor Author

clokep commented Jul 22, 2020

I think this is ready for review, might want to read the description first before taking a look at the code in case we decide to rework any of the features.

synapse/rest/media/v1/preview_url_resource.py Outdated Show resolved Hide resolved
tests/rest/media/v1/test_url_preview.py Outdated Show resolved Hide resolved
tests/rest/media/v1/test_url_preview.py Outdated Show resolved Hide resolved
synapse/rest/media/v1/preview_url_resource.py Outdated Show resolved Hide resolved
Copy link
Member

@anoadragon453 anoadragon453 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to make this user configurable?

If we agree that using oEmbed over the existing URL preview code is generally advantageous, then I'd say that the ideal situation would be to support oEmbed discovery + a manual user configurable list of glob strings to use. This could be done in a later PR though.

Parsing of the globs to regular expressions at module load time seems kind of meh, but is easy. I wonder if we should have a separate handler or something for oEmbed?

Presumably this is loaded at startup, so not too big a deal?

Are there other services we want to add? (Note that there's a JSON file which contains all the providers, we could fetch it programmatically or include it.)

I guess with previous ACME experiments it's not totally out of the ordinary for Synapse to poll an external service on startup. Again, could be decided out of scope of this PR though(?).

synapse/rest/media/v1/preview_url_resource.py Outdated Show resolved Hide resolved
synapse/rest/media/v1/preview_url_resource.py Outdated Show resolved Hide resolved
synapse/rest/media/v1/preview_url_resource.py Outdated Show resolved Hide resolved
synapse/rest/media/v1/preview_url_resource.py Outdated Show resolved Hide resolved
synapse/rest/media/v1/preview_url_resource.py Outdated Show resolved Hide resolved
synapse/rest/media/v1/preview_url_resource.py Show resolved Hide resolved
synapse/rest/media/v1/preview_url_resource.py Show resolved Hide resolved
synapse/rest/media/v1/preview_url_resource.py Outdated Show resolved Hide resolved
tests/rest/media/v1/test_url_preview.py Outdated Show resolved Hide resolved
synapse/rest/media/v1/preview_url_resource.py Outdated Show resolved Hide resolved
@clokep
Copy link
Contributor Author

clokep commented Jul 24, 2020

Does it make sense to make this user configurable?

If we agree that using oEmbed over the existing URL preview code is generally advantageous, then I'd say that the ideal situation would be to support oEmbed discovery + a manual user configurable list of glob strings to use. This could be done in a later PR though.

Thinking more about it, I think we should save this for future PRs.

Parsing of the globs to regular expressions at module load time seems kind of meh, but is easy. I wonder if we should have a separate handler or something for oEmbed?

Presumably this is loaded at startup, so not too big a deal?

It is done at start-up, yeah. I don't think it is a big deal...just running code at module import always seems odd. 😄

Are there other services we want to add? (Note that there's a JSON file which contains all the providers, we could fetch it programmatically or include it.)

I guess with previous ACME experiments it's not totally out of the ordinary for Synapse to poll an external service on startup. Again, could be decided out of scope of this PR though(?).

I think this fits in with above, let's leave it for a future PR.

@clokep
Copy link
Contributor Author

clokep commented Jul 24, 2020

@anoadragon453 Thanks for the review! I handled the straightforward comments, wasn't sure about a couple though!

Copy link
Member

@anoadragon453 anoadragon453 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is done at start-up, yeah. I don't think it is a big deal...just running code at module import always seems odd.

You could populate the lists in the class's constructor instead? I don't have a strong opinion one way or the other :)

synapse/rest/media/v1/preview_url_resource.py Outdated Show resolved Hide resolved
synapse/rest/media/v1/preview_url_resource.py Show resolved Hide resolved
@clokep clokep merged commit 3fc8fdd into develop Jul 27, 2020
@clokep clokep deleted the clokep/twitter-preview branch July 27, 2020 11:50
anoadragon453 added a commit that referenced this pull request Jul 28, 2020
…rove_test_times

* 'develop' of github.com:matrix-org/synapse: (148 commits)
  Add script for finding files with unix line terminators (#7965)
  Convert the remaining media repo code to async / await. (#7947)
  Convert a synapse.events to async/await. (#7949)
  Convert groups and visibility code to async / await. (#7951)
  Convert push to async/await. (#7948)
  update changelog
  1.18.0rc1
  Fix error reporting when using `opentracing.trace` (#7961)
  Fix typing replication not being handled on master (#7959)
  Remove hacky error handling for inlineDeferreds. (#7950)
  Convert tests/rest/admin/test_room.py to unix file endings (#7953)
  Support oEmbed for media previews. (#7920)
  Convert state resolution to async/await (#7942)
  Fix up types and comments that refer to Deferreds. (#7945)
  Do not convert async functions to Deferreds in the interactive_auth_handler (#7944)
  Convert more of the media code to async/await (#7873)
  Return an empty body for OPTIONS requests. (#7886)
  Downgrade warning on client disconnect to INFO (#7928)
  Convert presence handler helpers to async/await. (#7939)
  Update the auth providers to be async. (#7935)
  ...
@Recolate
Copy link

Just a question - should this enable image previews from Twitter as well, or is that still a work-in-progress? Because while I can get text previews now, image previews are still lost.

@clokep
Copy link
Contributor Author

clokep commented Jul 31, 2020

Just a question - should this enable image previews from Twitter as well, or is that still a work-in-progress? Because while I can get text previews now, image previews are still lost.

I was unable to find any image information in the Twitter oEmbed JSON. If you have examples of some with it we can try to support it.

@anoadragon453
Copy link
Member

Getting the oEmbed information for this tweet: https://twitter.com/hotsamsdetroit/status/1288641347914694657, we get:

{
  "url": "https:\\/\\/twitter.com\\/hotsamsdetroit\\/status\\/1288641347914694657",
  "author_name": "HotSamsDetroit",
  "author_url": "https:\\/\\/twitter.com\\/hotsamsdetroit",
  "html": "\\u003Cblockquote class=\"twitter-tweet\"\\u003E\\u003Cp lang=\"en\" dir=\"ltr\"\\u003EWe invite you to hear from Lauren Stovall, marketing director and legacy preserver as she speaks with \\u003Ca href=\"https:\\/\\/twitter.com\\/Square?ref_src=twsrc%5Etfw\"\\u003E@Square\\u003C\\/a\\u003E about how we have been weathering the storm and surviving the storm that we are all still in. \\u003Cbr\\u003E\\u003Cbr\\u003EClick and share the link below:\\u003Ca href=\"https:\\/\\/t.co\\/7qryHvxP18\"\\u003Ehttps:\\/\\/t.co\\/7qryHvxP18\\u003C\\/a\\u003E \\u003Ca href=\"https:\\/\\/t.co\\/Lq35RhMngn\"\\u003Epic.twitter.com\\/Lq35RhMngn\\u003C\\/a\\u003E\\u003C\\/p\\u003E&mdash; HotSamsDetroit (@hotsamsdetroit) \\u003Ca href=\"https:\\/\\/twitter.com\\/hotsamsdetroit\\/status\\/1288641347914694657?ref_src=twsrc%5Etfw\"\\u003EJuly 30, 2020\\u003C\\/a\\u003E\\u003C\\/blockquote\\u003E\n\\u003Cscript async src=\"https:\\/\\/platform.twitter.com\\/widgets.js\" charset=\"utf-8\"\\u003E\\u003C\\/script\\u003E\n",
  "width": 550,
  "height": null,
  "type": "rich",
  "cache_age": "3153600000",
  "provider_name": "Twitter",
  "provider_url": "https:\\/\\/twitter.com",
  "version": "1.0"
}

In there is a pic.twitter.com link, pic.twitter.com/Lq35RhMngn, but navigating to this URL just takes you to the tweet HTML.

By right-click->copy image location on twitter's website, I get the link https://pbs.twimg.com/media/EeIsedPXoAI1T-N?format=jpg&name=orig, which is the actual image.

So, not entirely sure how we link those two together, and it may be a really twitter-specific process, which we don't want.

@Recolate
Copy link

Getting the oEmbed information for this tweet: https://twitter.com/hotsamsdetroit/status/1288641347914694657, we get:

{
  "url": "https:\\/\\/twitter.com\\/hotsamsdetroit\\/status\\/1288641347914694657",
  "author_name": "HotSamsDetroit",
  "author_url": "https:\\/\\/twitter.com\\/hotsamsdetroit",
  "html": "\\u003Cblockquote class=\"twitter-tweet\"\\u003E\\u003Cp lang=\"en\" dir=\"ltr\"\\u003EWe invite you to hear from Lauren Stovall, marketing director and legacy preserver as she speaks with \\u003Ca href=\"https:\\/\\/twitter.com\\/Square?ref_src=twsrc%5Etfw\"\\u003E@Square\\u003C\\/a\\u003E about how we have been weathering the storm and surviving the storm that we are all still in. \\u003Cbr\\u003E\\u003Cbr\\u003EClick and share the link below:\\u003Ca href=\"https:\\/\\/t.co\\/7qryHvxP18\"\\u003Ehttps:\\/\\/t.co\\/7qryHvxP18\\u003C\\/a\\u003E \\u003Ca href=\"https:\\/\\/t.co\\/Lq35RhMngn\"\\u003Epic.twitter.com\\/Lq35RhMngn\\u003C\\/a\\u003E\\u003C\\/p\\u003E&mdash; HotSamsDetroit (@hotsamsdetroit) \\u003Ca href=\"https:\\/\\/twitter.com\\/hotsamsdetroit\\/status\\/1288641347914694657?ref_src=twsrc%5Etfw\"\\u003EJuly 30, 2020\\u003C\\/a\\u003E\\u003C\\/blockquote\\u003E\n\\u003Cscript async src=\"https:\\/\\/platform.twitter.com\\/widgets.js\" charset=\"utf-8\"\\u003E\\u003C\\/script\\u003E\n",
  "width": 550,
  "height": null,
  "type": "rich",
  "cache_age": "3153600000",
  "provider_name": "Twitter",
  "provider_url": "https:\\/\\/twitter.com",
  "version": "1.0"
}

In there is a pic.twitter.com link, pic.twitter.com/Lq35RhMngn, but navigating to this URL just takes you to the tweet HTML.

By right-click->copy image location on twitter's website, I get the link https://pbs.twimg.com/media/EeIsedPXoAI1T-N?format=jpg&name=orig, which is the actual image.

So, not entirely sure how we link those two together, and it may be a really twitter-specific process, which we don't want.

I see. That's unfortunate. A lot of people on my home server use Twitter to look at artwork, so when they share links they like to be able to see a thumbnail.

Thank you both for your efforts regardless.

@clokep
Copy link
Contributor Author

clokep commented Jul 31, 2020

No problem! If there's another way to get the attached images please let us know! (I'd suggest filing a new issue so it doesn't get lost.)

@anoadragon453
Copy link
Member

I've created a new issue for this at #8022, with a potential solution.

babolivier pushed a commit that referenced this pull request Sep 1, 2021
* commit 'f88c48f3b':
  1.18.0rc1
  Fix error reporting when using `opentracing.trace` (#7961)
  Fix typing replication not being handled on master (#7959)
  Remove hacky error handling for inlineDeferreds. (#7950)
  Convert tests/rest/admin/test_room.py to unix file endings (#7953)
  Support oEmbed for media previews. (#7920)
  Convert state resolution to async/await (#7942)
  Fix up types and comments that refer to Deferreds. (#7945)
  Do not convert async functions to Deferreds in the interactive_auth_handler (#7944)
  Convert more of the media code to async/await (#7873)
  Return an empty body for OPTIONS requests. (#7886)
  Downgrade warning on client disconnect to INFO (#7928)
  Convert presence handler helpers to async/await. (#7939)
  Update the auth providers to be async. (#7935)
  Put a cache on `/state_ids` (#7931)
clokep added a commit that referenced this pull request Sep 21, 2021
The major change is moving the decision of whether to use oEmbed
further up the call-stack. This reverts the _download_url method to
being a "dumb" functionwhich takes a single URL and downloads it
(as it was before #7920).

This also makes more minor refactorings:

* Renames internal variables for clarity.
* Factors out shared code between the HTML and rich oEmbed
  previews.
* Fixes tests to preview an oEmbed image.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Repair URL preview for tweets, may require a logged in Twitter account
4 participants