skip target URLs with no TLD (ie dot in domain) #534

Closed
snarfed opened this issue Oct 30, 2015 · 1 comment

snarfed commented Oct 30, 2015

https://www.facebook.com/Kayvx recently signed up and we tried to backfeed their event https://www.facebook.com/897476880334829 with 1.9k (!) invitees. the event description does have URLs, but they're mangled and have spaces in them:

Cheapest price on the market: http://chiasefb .com/fbapps#/fbshare_price
How do you buy this service?
Contact 0978142241 or inbox on Facebook.
Reference and sign-up link: http://chiasefb. com/fbapps#/fbshare_

we extracted the part before the space, http://chiasefb, as the only target URL (which will obviously fail), and then proceeded to fetch it for webmention endpoint discovery 1.9k times, once for each invitee. ugh.

dumb hack: automatically skip target URLs without dots in the domain, since they'll pretty much never work. (except for TLD NIC web sites, natch, but i'm ok with false negatives for those.)
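
a minimal sketch of that check, assuming Python 3's standard `urllib.parse`. the `has_dotted_domain` helper and the `targets` list are hypothetical names for illustration, not the project's actual code:

```python
from urllib.parse import urlparse


def has_dotted_domain(url):
    """Return True if the URL's hostname contains at least one interior dot.

    Hypothetical helper: hostnames like 'chiasefb' or 'localhost' have no dot
    and are treated as not worth fetching.
    """
    host = urlparse(url).hostname or ''
    # Strip a trailing root dot ('com.') so it doesn't count as a separator.
    return '.' in host.strip('.')


# Illustrative input: the truncated URL from the event above, plus a normal one.
targets = ['http://chiasefb', 'http://chiasefb.com/fbapps#/fbshare_price']
to_fetch = [url for url in targets if has_dotted_domain(url)]
# to_fetch == ['http://chiasefb.com/fbapps#/fbshare_price']
```

as noted, this also skips a bare TLD host, which is the accepted tradeoff.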

snarfed added a commit that referenced this issue Oct 31, 2015
before this, we only cached webmention endpoint discovery results when we successfully fetched the page. now we cache failed fetches too. this is a bit aggressive, since it means we'll cache transient network failures, but with the lower expiration (2h instead of 1d), and now that the retry button flushes the cache (#524), i think it's ok.

for #534

snarfed commented Oct 31, 2015

OK, we're now caching all webmention discovery errors, including connection failures like this, for 2h. i say that's good enough.
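
a rough sketch of the idea, not the actual commit: an in-memory cache that stores discovery failures as well as successes and expires them after 2h. `DiscoveryCache`, `get_or_discover`, `flush`, and the `discover` callback are all hypothetical names, and using one TTL for both outcomes is an assumption:

```python
import time

CACHE_TTL = 2 * 60 * 60  # 2h in seconds, per the comment above


class DiscoveryCache:
    """Hypothetical cache of webmention endpoint discovery results."""

    def __init__(self, ttl=CACHE_TTL, clock=time.time):
        self._ttl = ttl
        self._clock = clock
        self._entries = {}  # url -> (expires_at, result)

    def get_or_discover(self, url, discover):
        """Return the cached result for url, calling discover(url) on a miss."""
        now = self._clock()
        entry = self._entries.get(url)
        if entry is not None and entry[0] > now:
            return entry[1]
        try:
            result = ('ok', discover(url))
        except Exception as e:
            # Cache the failure too, so a bad URL is fetched once per TTL
            # rather than once per invitee.
            result = ('error', str(e))
        self._entries[url] = (now + self._ttl, result)
        return result

    def flush(self, url):
        """Drop the cached entry, e.g. when a retry is requested."""
        self._entries.pop(url, None)
```

with something like this, 1.9k lookups of the same broken target inside the 2h window would trigger a single fetch, and `flush` stands in for the retry button clearing the cache.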

@snarfed snarfed closed this as completed Oct 31, 2015