What is considered "news" when using filter:news? #5

Closed
igorbrigadir opened this issue Aug 5, 2019 · 4 comments

@igorbrigadir (Owner) commented Aug 5, 2019

Need to try to enumerate all the "newsworthy" URLs Twitter includes in a filter:news query.
e.g.: search

The query stopped working at:

lang:en filter:news -url:smh.com.au -url:buzzfeed.com -url:billboard.com -url:nhk.or.jp -url:note.mu -url:excelsior.com.mx -url:cnbc.com -url:thehindu.com -url:apnews.com -url:nymag.com -url:msn.com -url:bbc.co.uk -url:news.biglobe.ne.jp -url:freep.com -url:clarin.com -url:ameblo.jp -url:nicovideo.jp -url:foxnews.com -url:dallasnews.com -url:thehill.com -url:abcnews -url:startribune -url:cnn.com

(Perhaps a limit on the number of exclude parameters was reached? This query has 23.)
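A minimal sketch (Python, with a hypothetical helper `build_news_query`) for assembling queries like the one above from a domain list, so the number of `-url:` exclusions is explicit. The cap of 22 is only an assumption drawn from this one failure at 23. It is a guess, not a confirmed Twitter limit, and the real constraint might be total query length rather than operator count.

```python
# Hypothetical limit: the query above failed with 23 exclusions,
# so this assumes at most 22 are accepted. Not a confirmed value.
MAX_EXCLUDES = 22


def build_news_query(excluded_domains, lang="en", max_excludes=MAX_EXCLUDES):
    """Build a filter:news search string that excludes the given domains.

    Raises ValueError if the number of -url: operators exceeds max_excludes.
    """
    if len(excluded_domains) > max_excludes:
        raise ValueError(
            f"{len(excluded_domains)} exclusions exceeds assumed limit of {max_excludes}"
        )
    parts = [f"lang:{lang}", "filter:news"]
    # One -url: operator per excluded domain, in the order given.
    parts.extend(f"-url:{domain}" for domain in excluded_domains)
    return " ".join(parts)


if __name__ == "__main__":
    print(build_news_query(["cnn.com", "bbc.co.uk"]))
    # lang:en filter:news -url:cnn.com -url:bbc.co.uk
```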

@igorbrigadir (Owner, Author) commented Aug 9, 2019

Going to try to get a handle on the URLs: English only first, then other languages.

1-day sample of filter:news lang:en tweets: 757 tweets, 148 distinct domains. Top domains by tweet count:

     64 hollywoodreporter.com
     64 cnn.com
     43 bbc.co.uk
     33 forbes.com
     32 eonline.com
     29 abcnews.go.com
     28 rollingstone.com
     24 bloomberg.com
     24 billboard.com
     22 washingtonpost.com
     20 reuters.com
     20 404
     19 nature.com
     19 gamepass.nfl.com
     18 ctvnews.ca
     18 cbc.ca
     17 thehill.com
     16 variety.com
     16 espn.com
     15 cbsnews.com
     14 tribpub.com
     14 si.com
     14 ign.com
     13 nytimes.com
     11 standard.co.uk
     11 soompi.com
     10 vogue.com
     10 timesofindia.indiatimes.com
     10 theguardian.com
     10 choice.npr.org
     10 cbs.com
...
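The counts above have the shape of `sort | uniq -c | sort -rn` output. A rough Python equivalent, assuming you already have the expanded article URL for each sampled tweet (the URLs in the example are made up, and `domain_of`/`count_domains` are hypothetical helpers), normalises each URL to its host and tallies them:

```python
from collections import Counter
from urllib.parse import urlsplit


def domain_of(url):
    """Return the lower-cased host of a URL, minus a leading 'www.'."""
    host = (urlsplit(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def count_domains(urls):
    """Count URLs per domain, most common first."""
    return Counter(domain_of(u) for u in urls).most_common()


if __name__ == "__main__":
    # Illustrative URLs only; not taken from the real sample.
    sample = [
        "https://www.cnn.com/2019/08/05/some-story",
        "https://edition.cnn.com/another-story",
        "https://www.bbc.co.uk/news/example",
        "https://cnn.com/third",
    ]
    for domain, n in count_domains(sample):
        print(f"{n:>7} {domain}")
```

Subdomains such as `edition.cnn.com` are counted separately from `cnn.com`; folding them together properly would need a public-suffix list.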

@igorbrigadir igorbrigadir added this to In progress in Collections Aug 22, 2019
@igorbrigadir igorbrigadir moved this from In progress to To Do in Collections Aug 22, 2019
@samhenrigold commented Jul 13, 2021

Hey there! I found something that might help answer this. In Chrome, if you go to DevTools > Application > Storage > IndexedDB > localforage > keyvaluepairs, there's an entry titled "device:rweb.articleDomains". It contains an array of around 4,300 news sources. If you're having trouble finding it, let me know and I can post a gist :)

@igorbrigadir (Owner, Author) commented Jul 14, 2021

Fantastic find! Thank you! This looks like exactly what I was looking for!

@igorbrigadir (Owner, Author) commented Jul 14, 2021

I managed to extract it with this snippet https://gist.github.com/loilo/ed43739361ec718129a15ae5d531095b in case anyone else is looking for a fast way.

Today's list is here (I imagine they update it occasionally): https://gist.github.com/igorbrigadir/ef143d2f3167258359007a0ff7ac401d#file-news_domain_list-json

Thanks again!
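Once the news domain list linked above is downloaded, checking whether a link points at one of those domains is a set lookup. A sketch, assuming the JSON file is a flat array of domain strings (not verified against the gist) and using hypothetical helper names. It also assumes a subdomain such as `edition.cnn.com` should count as `cnn.com`, which may not match how Twitter itself applies filter:news.

```python
import json
from urllib.parse import urlsplit


def load_news_domains(path):
    """Load a JSON file assumed to hold a flat array of domain strings."""
    with open(path, encoding="utf-8") as f:
        return {d.lower() for d in json.load(f)}


def is_news_url(url, news_domains):
    """True if the URL's host, or any parent domain of it, is in news_domains."""
    host = (urlsplit(url).hostname or "").lower()
    labels = host.split(".")
    # Try the full host first, then drop leading labels:
    # edition.cnn.com -> cnn.com (the bare TLD is never checked).
    for i in range(len(labels) - 1):
        if ".".join(labels[i:]) in news_domains:
            return True
    return False


if __name__ == "__main__":
    # Small inline set for illustration; in practice use
    # load_news_domains("news_domain_list.json").
    domains = {"cnn.com", "abcnews.go.com"}
    print(is_news_url("https://edition.cnn.com/2021/07/14/story", domains))  # True
    print(is_news_url("https://example.org/post", domains))  # False
```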

Collections automation moved this from To Do to Done Jul 14, 2021