facebook: /me/posts returns objects with ids with colons #305

Closed
snarfed opened this issue Oct 28, 2014 · 48 comments

@snarfed
Owner

@snarfed snarfed commented Oct 28, 2014

very weird. @aaronpk reported that bridgy sent him duplicate responses this morning, e.g. on http://aaronparecki.com/notes/2014/09/02/4/drhorrible. here are example original and duplicated u-urls:

note the odd (bad) colon-formatted post id in the dupe. FB evidently returned comment ids like these in a single /me/posts API call, 2014-10-28 16:53:24 UTC. log here. polls for @aaronpk's account since then have received normal comment ids, without colons.

@snarfed snarfed added the listen label Oct 28, 2014

@snarfed snarfed commented Oct 28, 2014

google doesn't find any other reports, at least for queries like facebook graph api ids "colons".

@snarfed snarfed closed this in cc02b97 Oct 28, 2014

@snarfed snarfed commented Apr 1, 2015

reopening, we're seeing this again for many/all FB users. example log:

Traceback (most recent call last):
...
File "/base/data/home/apps/s~brid-gy/3.383236630846410047/tasks.py", line 109, in post
  source_updates.update(self.poll(source))
File "/base/data/home/apps/s~brid-gy/3.383236630846410047/tasks.py", line 161, in poll
  cache=cache)
File "/base/data/home/apps/s~brid-gy/3.383236630846410047/facebook.py", line 204, in get_activities_response
  assert ':' not in id[1], 'Cowardly refusing id with colon: %s' % id[1]
AssertionError: Cowardly refusing id with colon: 12802152:10101224399086339:10101869104476441_10101225182401569

@kylewm thinks it might be /1565113317092307/invited, but we don't see colons in those ids right now.

@snarfed snarfed reopened this Apr 1, 2015

@snarfed snarfed commented Apr 1, 2015

the id in that example, 12802152:10101224399086339:10101869104476441_10101225182401569, is [USERID]:[POSTID]:[OTHER-USERID]_[UNKNOWN]
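
a rough sketch of that breakdown in python. this is not Bridgy's actual code; `ColonId` and `parse_colon_id` are made-up names, and the field names just mirror the guess above, including the still-unidentified last part:

```python
from collections import namedtuple

# Field names follow the breakdown above. "unknown" is the trailing component
# whose meaning had not been identified at this point in the thread.
ColonId = namedtuple('ColonId', ['user_id', 'post_id', 'other_user_id', 'unknown'])


def parse_colon_id(fb_id):
    """Split USERID:POSTID:OTHERUSERID_UNKNOWN into parts, or return None."""
    parts = fb_id.split(':')
    if len(parts) != 3 or '_' not in parts[2]:
        return None  # not the four-part colon form
    other_user_id, unknown = parts[2].split('_', 1)
    return ColonId(parts[0], parts[1], other_user_id, unknown)


parsed = parse_colon_id(
    '12802152:10101224399086339:10101869104476441_10101225182401569')
print(parsed)
```

ids that don't fit the four-part shape, including normal underscore ids, come back as `None` instead of raising.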

@snarfed snarfed added the now label Apr 1, 2015

@snarfed snarfed commented Apr 1, 2015

more example ids:

  • 100001501388539:866229876770384:10101869104476441_866434786749893
  • 10205989425965050:10206207863185844:10101869104476441_10206207866545928
  • 510659665:10153194244359666:10101869104476441_10153194358259666

the only part that's the same is the second user id, 10101869104476441.
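
that check can be done mechanically instead of by eye. a minimal sketch, assuming every piece of an id is separated by either `:` or `_`; `components` is a hypothetical helper name:

```python
bad_ids = [
    '100001501388539:866229876770384:10101869104476441_866434786749893',
    '10205989425965050:10206207863185844:10101869104476441_10206207866545928',
    '510659665:10153194244359666:10101869104476441_10153194358259666',
]


def components(fb_id):
    """Return the set of numeric pieces of an id, splitting on ':' and '_'."""
    return set(fb_id.replace('_', ':').split(':'))


# Pieces that appear in every one of the bad ids.
common = set.intersection(*(components(i) for i in bad_ids))
print(common)
```

for the three ids listed, the intersection is the single value 10101869104476441.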


@snarfed snarfed commented Apr 1, 2015

i'm seeing that 10101869104476441 user id in every single bad id, so i'm going to go out on a limb and say it's the root cause, or at least the common thread.


@snarfed snarfed commented Apr 1, 2015

it's the app-scoped user id for https://www.facebook.com/rickosterberg .


@snarfed snarfed commented Apr 1, 2015

also notable, the colon-separated ids here are of a different form than the original ones in this bug, e.g. 11500459:10101595978024156:63.


@snarfed snarfed commented Apr 1, 2015

progress: these ids are in comments returned by /me/posts for some users, e.g. 10205049272524056. example post https://www.facebook.com/10205049272524056/posts/10205200565786293 . corresponding object returned by FB API:

{
  "id": "10205049272524056_10205200565786293",
  "comments": {
    "data": [
      {
        "id": "10205049272524056:10205200565786293:10101869104476441_10205207541960693",
        "message": "Did you go to Comicon (spelled correctly?) this year?",
        ...
      },
      {
        "id": "10205049272524056:10205200565786293:10101869104476441_10205216360221144",
        "message": "No. As usual, just lived vicariously through tweets.",
        ...
      },
  ...

the odd part is that the 10101869104476441 user seems unrelated. he didn't write the post or any comments or like anything. i wonder if that id here is something other than the app-scoped user id...but the FB id namespace is pretty unified, so hrmph.

@snarfed snarfed changed the title from "facebook: /me/posts API call once returned objects with badly formatted ids" to "facebook: /me/posts returns objects with ids with colons" Apr 1, 2015

@snarfed snarfed commented Apr 1, 2015

current status is watchful waiting. it was transient last time, so i'm still kinda hoping this is an FB bug that will eventually get fixed. it's been happening for longer this time, ~16h so far, but hope springs eternal.

chart


@snarfed snarfed commented Apr 1, 2015

huh. the v2.3 /me/posts docs say it returns Post objects, but the v2.3 Post object docs say they were "removed after Graph API v2.2."

i'm guessing just an unrelated doc bug...but hrm.


@snarfed snarfed commented Apr 1, 2015

currently reading the changes from 2.2 to 2.3.

@kylewm
Collaborator

@kylewm kylewm commented Apr 1, 2015

/rickosterberg's global user id = "63", does that number mean anything to you?


@snarfed snarfed commented Apr 1, 2015

oh wow, funny! just that he was harvard undergrad at the same time as zuck? i think that's who those very low ids are generally, as opposed to employees...?


@snarfed snarfed commented Apr 1, 2015

ooh coincidence! 63 is the last part of the original weird id, 11500459:10101595978024156:63.


@snarfed snarfed commented Apr 1, 2015

WHO IS RICK OSTERBERG AND WHY DOES HE HATE US 😢 😭 😿


@snarfed snarfed commented Apr 1, 2015

afaik you can't. you can only change the time window. :/

here's the alternative new hotness: https://console.developers.google.com/project/brid-gy/appengine . i don't think it lets you change the axes explicitly either, but it's different at least.


@snarfed snarfed commented Apr 1, 2015

same graph you were looking at. its (automatic) y axis scale just wasn't thrown off by that 4xx spike.


@kylewm kylewm commented Apr 2, 2015

More fallout from this. All these FacebookPages are marked in the DB with status = 'error' now, which means publish is also disabled for them.

This was noticed by @tantek; Bridgy reports "Publish is not enabled for your account(s). Please visit https://www.brid.gy/facebook/214611 and sign up!", even though publish is enabled.


@snarfed snarfed commented Apr 2, 2015

oops! argh. fortunately that's at least an easy fix.


@snarfed snarfed commented Apr 2, 2015

if this is here to stay, one approach would be to skip the comments with these bad ids.
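
a minimal sketch of that skip, assuming posts look like the /me/posts object quoted earlier (a `comments` dict with a `data` list). `drop_colon_comments` is a hypothetical name, and the second comment id in the sample is invented just to have one normal id to keep:

```python
import logging


def drop_colon_comments(post):
    """Return a shallow copy of a post with colon-id comments removed."""
    comments = post.get('comments', {})
    kept = []
    for comment in comments.get('data', []):
        if ':' in comment.get('id', ''):
            logging.warning('Skipping comment with colon id: %s', comment['id'])
            continue
        kept.append(comment)
    filtered = dict(post)
    # Preserve any other keys under "comments" and swap in the filtered list.
    filtered['comments'] = dict(comments, data=kept)
    return filtered


post = {
    'id': '10205049272524056_10205200565786293',
    'comments': {'data': [
        {'id': '10205049272524056:10205200565786293:10101869104476441_10205207541960693',
         'message': 'Did you go to Comicon (spelled correctly?) this year?'},
        # Invented id, for illustration only.
        {'id': '10205200565786293_1', 'message': 'made-up comment with a normal id'},
    ]},
}
result = drop_colon_comments(post)
print([c['id'] for c in result['comments']['data']])
```

logging and skipping, instead of asserting, means one bad id no longer aborts the whole poll for that user.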

on a related note, i'm curious if they're all new to bridgy, or if we've seen some of them before with normal ids, and they changed. i think that's what happened the first time, since we sent @aaronpk dupe wms.


@snarfed snarfed commented Apr 2, 2015


@snarfed snarfed commented Apr 2, 2015

i've filed a facebook bug for this here: https://developers.facebook.com/bugs/786903278061433/ ...and they've already responded! discussion ongoing, i'll summarize here when it's done.


@kylewm kylewm commented Apr 2, 2015

Here's two IDs that refer to the same comment (I already posted them on the FB bug report, copying here for posterity). Both seem to work with the graph API explorer for now:

10101218793564849_10101219208273769

12802152:10101218793564849:10101869104476441_10101219208273769

We received the first one on 2015-02-22, and I found it in the appspot Data Viewer with this query:

SELECT * FROM Response where failed='https://kylewm.com/10101859667632951/02/til-i-have-close-friends-who-don-t-know-what-ethics'

@beck24 beck24 commented Apr 3, 2015

Hi, I'm new to Bridgy as of yesterday, and my account seems to be experiencing this problem consistently. Hopefully more data to look at helps...

https://www.brid.gy/facebook/948590525171368

[screenshot, 2015-04-03 8:45:55 am]


@kylewm kylewm commented Apr 8, 2015

We got a reply back from Facebook saying it's a bug and they won't fix it.

So I think we can assume:

  1. ids with colons are here to stay
  2. they will continue to appear unpredictably
  3. on posts that we have previously seen other ids on

Based on that, I guess we handle both forms, but for the purposes of deduplication, canonicalize the colon form to "userid_postid"


@kylewm kylewm commented Apr 8, 2015

OK I'm confused already. You said above the old comment format was USER_POST_COMMENT, but I'm seeing POST_COMMENT and MYUSERID:POST:OSTERBERG_COMMENT


@kylewm kylewm commented Apr 8, 2015

kind of fascinating fiddling with it. you can fetch the comment with either POST_COMMENT or USER_POST_COMMENT. Also you can substitute any number for USER and it still works.


@snarfed snarfed commented Apr 8, 2015

conclusion: @kylewm and i discussed, and we're going to leave it as is for now, dropping all colon ids. we'll probably revisit this soon, especially since i expect they'll gradually migrate all new ids to this format, so we'll start seeing non-dupes eventually.


@beck24 beck24 commented Apr 10, 2015

Hey guys, I think we're seeing non-dupes already

Original post: https://www.facebook.com/matt.beckett.39/posts/952329091464178

http://known.mattbeckett.me/2015/local-friends---where-would-be-a-good-spot-to

But only the likes are coming through. I think all of the comments are coming through with the colon ids.


@snarfed snarfed commented Apr 10, 2015

ugh, ok. thanks for letting us know. we'll get them working.

@kylewm, i definitely get your proposal, but i wonder if we can get away with less parsing and converting between id formats. an alternative would be to store new, non-dupe colon ids as is, instead of normalizing to the old format.

...however, we'd still need to parse and convert every colon id to check whether it's a dupe...and we'd need to add similar new logic for every new id format they throw at us...so i dunno. i wonder if there's a way to avoid the parsing altogether, and still avoid most (if not all?) dupes. meh. thoughts?


@kylewm kylewm commented Apr 11, 2015

It'd be great if we could truly treat the ids as opaque strings (like we were admonished to do :-P). The first problem I ran into on that track was get_comment -- they don't give us enough data in the comment object to e.g. construct a permalink URL. https://github.com/snarfed/activitystreams-unofficial/blob/master/facebook.py#L860


@snarfed snarfed commented Apr 12, 2015

@kylewm and i discussed and settled on his earlier proposal: canonicalize these ids to the old form, and complain and ignore ids in any other format that we don't recognize.
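
a sketch of what "canonicalize, and complain and ignore anything else" could look like. it is not the code that landed in granary. it assumes the old comment form is POST_COMMENT, as in the pair of equivalent ids @kylewm posted above, and that every component is all digits; `canonicalize_or_ignore` and the two regex names are hypothetical:

```python
import logging
import re

# Assumed id shapes; every component is taken to be all digits.
PLAIN_RE = re.compile(r'^(\d+)_(\d+)$')              # POST_COMMENT
COLON_RE = re.compile(r'^(\d+):(\d+):(\d+)_(\d+)$')  # USER:POST:OTHER_COMMENT


def canonicalize_or_ignore(fb_id):
    """Return the POST_COMMENT form of a comment id, or None if unrecognized."""
    if PLAIN_RE.match(fb_id):
        return fb_id  # already in the old form
    match = COLON_RE.match(fb_id)
    if match:
        # Keep the post id and the trailing comment id, drop the other parts.
        return '%s_%s' % (match.group(2), match.group(4))
    logging.warning('Ignoring unrecognized id format: %s', fb_id)
    return None


old_form = canonicalize_or_ignore(
    '12802152:10101218793564849:10101869104476441_10101219208273769')
print(old_form)
```

under these assumptions the colon id maps to the same string as the plain id for the same comment, so the two dedupe against each other, and any shape the regexes don't know yields `None` plus a warning.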

@snarfed snarfed self-assigned this Apr 13, 2015
snarfed added a commit to snarfed/granary that referenced this issue Apr 14, 2015
snarfed added a commit to snarfed/granary that referenced this issue Apr 15, 2015
also store original, un-canonicalized facebook id in new fb_id field

for snarfed/bridgy#305
snarfed added a commit to snarfed/granary that referenced this issue Apr 17, 2015

@snarfed snarfed commented Apr 17, 2015

i'm about to turn on the new id parsing for just us three. hold onto your butts! https://www.youtube.com/watch?v=HKK4KmDlj8U


@snarfed snarfed commented Apr 17, 2015

looks like it didn't find any colon ids (ie new comments that it had previously ignored) for @kylewm or me, but it found a boatload for @beck24. matt, mind confirming when you get a chance? hopefully you now got all the comments you were missing, but no duplicates of old comments.


@beck24 beck24 commented Apr 17, 2015

Well, I definitely got a bunch of missing comments, but I did end up getting some duplicates:

http://known.mattbeckett.me/2015/embedlycards---a-plugin-for-known


@snarfed snarfed commented Apr 17, 2015

ugh, ok. thanks for the heads up. glad i canaried this with just us first. i'll look at those dupes soon.


@snarfed snarfed commented Apr 18, 2015

funny. looks like some of the original comments on that post, from a week or two ago, were colon ids that we passed through untouched, so the dupes we sent yesterday are the old style non-colon ids.


@snarfed snarfed commented Apr 19, 2015

expanding the canary to a few more guinea pig users now.


@snarfed snarfed commented Apr 19, 2015

looks good. expanding it to (gulp) everyone now. 🙈 🙉 🙊

snarfed added a commit that referenced this issue Apr 19, 2015
only look for an existing Response with the fb_id field if it's different than the normal id field.
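
one reading of that commit message, sketched with a toy in-memory store instead of the real datastore: look up an existing response by its canonical id first, and fall back to the original fb_id only when the two differ. `ResponseStore` and `find_existing` are hypothetical names, not Bridgy's API:

```python
class ResponseStore:
    """Toy in-memory stand-in for the datastore; not Bridgy's real model."""

    def __init__(self):
        self.by_id = {}
        self.lookups = 0  # counts lookups, to show when the second is skipped

    def get(self, key):
        self.lookups += 1
        return self.by_id.get(key)

    def put(self, key, response):
        self.by_id[key] = response


def find_existing(store, canonical_id, fb_id):
    """Look up by canonical id, then by original fb_id only if it differs."""
    existing = store.get(canonical_id)
    if existing is None and fb_id and fb_id != canonical_id:
        existing = store.get(fb_id)
    return existing


store = ResponseStore()
colon_id = '12802152:10101218793564849:10101869104476441_10101219208273769'
# Simulates a response saved earlier under the raw, un-canonicalized colon id.
store.put(colon_id, {'note': 'stored earlier under the raw colon id'})
found = find_existing(store, '10101218793564849_10101219208273769', colon_id)
print(found, store.lookups)
```

the fallback is what would catch responses that were stored under a colon id before canonicalization was turned on. skipping it when both ids match avoids a redundant second lookup.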

@snarfed snarfed commented Apr 20, 2015

tentatively closing. (woo!) @beck24, @kylewm, feel free to reopen if you see anything suspicious.

@snarfed snarfed closed this Apr 20, 2015

@beck24 beck24 commented Apr 20, 2015

Fantastic, thanks for the hard work on this!
