Rounding errors found in IDs #24

kmcelwee · 2020-10-10T19:34:24Z

The oEmbed process revealed IDs that had been rounded:

1265004698148495400
1265812305817862100
1266080125805834200

This exists within the dataset in fortune-100-blm-dataset. I believe I manually entered values for Lowe's because of API limits, so that might be what's happening.

Request oembed for all tweets whose IDs end in 00 to double-check that the problem is limited to these tweets.
Dig into the fortune-100-blm-dataset repo and double-check the scripts.
Pandas automatically reads the ID column as an integer. Verify that this doesn't cause issues.


kmcelwee · 2020-10-10T20:02:59Z

df['ID'].astype(str).str.len().value_counts()

outputs:

19    83280
18    54160
11      117
17       87
10       38
16       11

This means a majority of the IDs (83,280 of 137,693) have 19 digits.

kmcelwee · 2020-10-10T20:09:01Z

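The maximum ID value as an integer is 1287171305138204672, which is greater than all the Lowe's values that were rounded. Since this larger ID does not appear to be rounded, the size of the ID alone doesn't explain the problem, which supports the argument that it's just the Lowe's tweets.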

kmcelwee · 2020-10-10T20:23:33Z

Looking only at IDs that end in 00 leaves us with 3158 IDs:

all_ids = [x for x in df['ID'].astype(str).tolist() if x[-2:] == '00']

Calling get_oembed(tweet_id) on each of these, the following tweet IDs raised errors:

1139202878801715200
1063239357237248000
1192497953505792000
1266080125805834200
1265812305817862100
1265004698148495400
1176691432100249600
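For reference, the check above can be sketched roughly as follows. This is a minimal sketch, not the actual script: it assumes get_oembed raises an exception when a tweet cannot be embedded, and `find_failing_ids` and `fake_get_oembed` are hypothetical names used only so the example runs without network access.

```python
from typing import Callable, Iterable, List


def find_failing_ids(tweet_ids: Iterable[str], fetch: Callable[[str], object]) -> List[str]:
    """Return the IDs for which `fetch` raises an exception."""
    failing = []
    for tweet_id in tweet_ids:
        try:
            fetch(tweet_id)
        except Exception:
            # Assumption: any exception means the tweet could not be embedded.
            failing.append(tweet_id)
    return failing


def fake_get_oembed(tweet_id: str) -> dict:
    """Hypothetical stand-in for get_oembed; fails for one made-up ID."""
    if tweet_id == "222":
        raise ValueError("tweet not found")
    return {"html": "<blockquote>...</blockquote>"}


# Made-up IDs, for illustration only.
failing_ids = find_failing_ids(["111", "222", "333"], fake_get_oembed)
print(failing_ids)
```

In practice, `fetch` would be the real get_oembed and the input would be the list of IDs ending in 00.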

kmcelwee · 2020-10-11T01:32:36Z

1139202878801715200
Nike
Thu Jun 13 16:08:26 +0000 2019

It doesn’t matter what you play. Nobody wins alone. #BeTrue #UntilWeAllWin

@caster800m @TheChrisMosier @ScoutBassett @KerronClement @MarkMcKenzie4_ @ EricKoston @S10bird @brittneyGriner @jordin_canada @jewellloyd https://t.co/veA9PtqwbW

✅ confirmed. This was deleted.

kmcelwee · 2020-10-11T01:43:18Z

1063239357237248000,Exelon,Fri Nov 16 01:16:31 +0000 2018,,

RT @ Amartines: “HR does not solely own the responsibility for ensuring diversity. Leaders need to be accountable for the make up of their t…

Cannot scroll back far enough. Feed for Exelon stops in 2019. The original tweet exists though. Not exactly sure what happened here.

kmcelwee · 2020-10-11T01:48:18Z

1192497953505792000,IBM,Thu Nov 07 17:44:02 +0000 2019,

What does a day without IBM look like?

Watch Techless, where people must complete seemingly simple tasks without using anything that was invented by IBM or could use our technology: https://t.co/GRvjE2fz6s https://t.co/tL5UnsfCbW

✅ confirmed. This was deleted.

kmcelwee · 2020-10-11T01:49:29Z

1266080125805834200
1265812305817862100
1265004698148495400

✅ These are all the Lowe's tweets we already know about.

kmcelwee · 2020-10-11T01:53:12Z

1176691432100249600,Facebook,Wed Sep 25 02:54:33 +0000 2019

RT @ boztank: See you tomorrow at #OC6

https://t.co/oFTviQaIyr

✅ Looks like the original tweet was deleted

kmcelwee · 2020-10-11T02:05:38Z

It seems that in the raw data pull (fortune-100-blm-dataset/data/fortune-100-json/Lowes.json), id did not match id_str. A test has been added to test.py to check for this.
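A check along these lines might look like the sketch below. The contents of test.py are not shown in this thread, so this is only an illustration: the field names id and id_str come from the comment above, while `mismatched_ids`, the list-of-objects layout, and the sample records are assumptions made up for the example.

```python
import json


def mismatched_ids(records):
    """Return the records whose integer `id` disagrees with the string `id_str`."""
    return [r for r in records if str(r["id"]) != r["id_str"]]


# Made-up records for illustration; these are not IDs from the dataset.
raw = """[
  {"id": 1000000000000000123, "id_str": "1000000000000000123"},
  {"id": 1000000000000000100, "id_str": "1000000000000000123"}
]"""

records = json.loads(raw)  # Python's json module parses large integers exactly
bad = mismatched_ids(records)
print([r["id_str"] for r in bad])
```

Because Python parses JSON integers without loss, a mismatch found this way means the rounded value was already present in the file.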

kmcelwee · 2020-10-11T02:09:33Z

Pandas supports 64-bit integers by default, and Twitter's documentation suggests that its IDs are 64-bit integers as well. I still can't figure out how that error crept in, but it should be all set now.
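One mechanism that can round IDs of this size, offered only as a possibility and not confirmed anywhere in this thread, is a pass through a 64-bit float: floats represent every integer exactly only up to 2**53, while a signed 64-bit integer goes up to 2**63 - 1. The sketch below shows the difference; `survives_float_round_trip` is a hypothetical helper, and `sample_id` is a made-up value rather than an ID from the dataset.

```python
INT64_MAX = 2**63 - 1        # largest signed 64-bit integer
FLOAT_EXACT_LIMIT = 2**53    # above this, float64 cannot represent every integer


def survives_float_round_trip(n: int) -> bool:
    """True if converting n to a float and back returns the same integer."""
    return int(float(n)) == n


max_id = 1287171305138204672      # maximum ID reported earlier in this thread
sample_id = 1000000000000000123   # made-up 19-digit value, for illustration only

fits_in_int64 = max_id <= INT64_MAX
above_exact_limit = sample_id > FLOAT_EXACT_LIMIT
sample_is_exact = survives_float_round_trip(sample_id)

print(fits_in_int64, above_exact_limit, sample_is_exact)
```

If something like this were the cause, storing and comparing id_str instead of id would avoid it, since strings never pass through a float.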

kmcelwee mentioned this issue Oct 10, 2020

Address errors in twitter oembeds #14

Closed

kmcelwee closed this as completed Oct 11, 2020
