
API for sharing data #2

Open
fhopp opened this issue Sep 26, 2019 · 4 comments
fhopp (Collaborator) commented Sep 26, 2019

We need to find a good API that lets us obtain sharing data of newspaper articles.

fhopp added the enhancement label on Sep 26, 2019
fhopp (Collaborator, Author) commented Sep 27, 2019

Check out Facebook's Graph API: https://developers.facebook.com/docs/graph-api
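
As a reference point, here is a minimal sketch of pulling engagement counts for a single article URL with `requests`; the `engagement` field and API version are assumptions to double-check against the Graph API docs, and the token is a placeholder.

```python
import requests

ACCESS_TOKEN = "YOUR_APP_TOKEN"  # placeholder
article_url = "https://example.com/some-article"

# Query the Graph API URL node for its engagement counts.
# The field name and version are assumptions -- verify against the Graph API docs.
resp = requests.get(
    "https://graph.facebook.com/v4.0/",
    params={"id": article_url, "fields": "engagement", "access_token": ACCESS_TOKEN},
)
resp.raise_for_status()
print(resp.json())  # expected shape: {"engagement": {"share_count": ..., ...}, "id": ...}
```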

fhopp added this to Waiting in Sharing on Sep 27, 2019
fhopp moved this from Waiting to In progress in Sharing on Sep 27, 2019
fhopp (Collaborator, Author) commented Sep 27, 2019

@yibeichan and @fhopp discussed today that we can get very informative data via the Twitter API. For now, we are going to focus on "shares" on Twitter and return to the idea of Facebook shares later. The unit for scraping Twitter data will still be a single URL. However, we will create two extra tables in Cassandra:

  1. twitter_shares
  2. twitter_tweets

For (1), each row will be a unique URL, and the columns will hold the number of unique users that mentioned this URL along with the total retweet, like, and comment counts for this URL.

For (2), each row will be a unique tweet that mentioned this URL, along with metadata for that tweet such as its text and how many likes, replies, and favorites it has received.
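
A rough sketch of what these two tables could look like, created through the Python `cassandra-driver`; the keyspace, column names, and types below are placeholders, not a settled schema.

```python
from cassandra.cluster import Cluster

# Keyspace name "sharing" is a placeholder.
session = Cluster(["127.0.0.1"]).connect("sharing")

# (1) One row per URL with aggregated counts -- column names are assumptions.
session.execute("""
    CREATE TABLE IF NOT EXISTS twitter_shares (
        url text PRIMARY KEY,
        unique_users int,
        total_retweets int,
        total_likes int,
        total_replies int
    )
""")

# (2) One row per (URL, tweet) with per-tweet metadata.
session.execute("""
    CREATE TABLE IF NOT EXISTS twitter_tweets (
        url text,
        tweet_id bigint,
        tweet_text text,
        likes int,
        replies int,
        favorites int,
        PRIMARY KEY (url, tweet_id)
    )
""")
```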

The next step for @yibeichan is to think about how we can retrieve so many URLs. @musainayatmalik will help with implementing the "Twitter scraping" pipeline in PySpark.
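
As a starting point for that pipeline, a minimal PySpark sketch that parallelizes per-URL lookups; `fetch_share_counts` and the input path are hypothetical placeholders for whatever Twitter client and URL source we settle on.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("twitter-shares").getOrCreate()

def fetch_share_counts(url):
    # Placeholder: the real version would call the Twitter API / scraper for this URL.
    return (url, 0, 0, 0, 0)  # (url, unique_users, retweets, likes, replies) -- dummy values

# One article URL per line; the path is a placeholder.
urls = spark.sparkContext.textFile("article_urls.txt")
rows = urls.map(fetch_share_counts)

# Inspect a small sample; the real pipeline would write these rows to Cassandra instead.
print(rows.take(5))
```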

yibeichan commented

Several ways to get historical Twitter data (sorted):

  1. http://www.orgneat.com/ (free): it doesn't allow downloading the tweets themselves, but that should be fine since we only need the retweet count. If we can get the tweet IDs, I can try other ways to retrieve the tweets.
  2. Use multiple public Twitter databases, choose certain topics or combine them, search for news links among the tweets, and get share counts.
    Database catalog: https://www.docnow.io/catalog/ (free)
    Most of these databases are topic-, event-, or keyword-specific.
  3. https://github.com/Jefferson-Henrique/GetOldTweets-python (free)
    I have used it before, but it doesn't include deleted tweets. We can give it a try.
  4. https://codecanyon.net/item/historical-tweets/22120633 ($14 purchase): it seems good; it's an app.
  5. https://sifter.texifter.com/: this site has the complete, undeleted historical Twitter data from 01/14/2014 to 09/29/2018, and the data can be cleaned with https://discovertext.com/ ($24/month). However, we need to contact Twitter for approval to use the data.
  6. https://www.trackmyhashtag.com/historical-twitter-data (paid): this one retrieves historical data by hashtag.
  7. https://www.tweetbinder.com/payments/#/process-payment/historical (one-time purchase?): historical data, limited to 140,000 tweets.

fhopp (Collaborator, Author) commented Oct 3, 2019

@yibeichan, can we close this now? We are using sharedcount.com to get the Facebook data, and we will pay to get the Twitter data, right? Can you open an issue for the Twitter data and comment with the link to the company so I can get started on the application? Thanks!
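
For the Facebook side, a minimal sketch of what a sharedcount.com request could look like; the endpoint path and parameter names are assumptions to verify against their docs, and the key is a placeholder.

```python
import requests

API_KEY = "YOUR_SHAREDCOUNT_KEY"  # placeholder
article_url = "https://example.com/some-article"

# Endpoint and parameter names are assumptions -- check the SharedCount docs.
resp = requests.get(
    "https://api.sharedcount.com/v1.0/",
    params={"url": article_url, "apikey": API_KEY},
)
resp.raise_for_status()
print(resp.json())  # expected to include Facebook share/comment/reaction counts
```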
