# GDELT

The GDELT Project monitors the world's broadcast, print, and web news from nearly every corner of every country in over 100 languages and identifies the people, locations, organizations, themes, sources, emotions, counts, quotes, images and events driving our global society every second of every day, creating a free open platform for computing on the entire world.

https://www.gdeltproject.org

We can access the data in several ways:
- Google BigQuery through the Google Cloud Platform
- Download the raw data
- API

Here you can download online news articles monitored by GDELT across the 65 languages machine translated by GDELT since November 2015 mentioning either "virus" or "Covid-19." Each row represents a distinct URL and includes the date GDELT first saw the article, its title, URL and the language code of the primary publication language of the article:
    
https://blog.gdeltproject.org/a-new-dataset-for-exploring-the-global-multilingual-covid-19-online-news-narrative/

The full list of available dataset files is here:

http://data.gdeltproject.org/blog/2020-coronavirus-narrative/live_onlinenews/MASTERFILELIST.TXT
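
Once the master file list is downloaded, it can be filtered by topic. The sketch below assumes the list is plain text with one file URL per line, which is not verified here; `parse_file_list` and `files_for_topic` are illustrative helper names, and the sample text is hypothetical data shaped like the URLs used in this notebook.

```python
def parse_file_list(text):
    """Return the non-empty lines of a file list, stripped of whitespace."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def files_for_topic(urls, topic):
    """Keep the URLs whose file name ends with '-<topic>.csv.gz'."""
    suffix = "-" + topic + ".csv.gz"
    return [u for u in urls if u.endswith(suffix)]


# Hypothetical sample standing in for the downloaded list (assumed format)
sample = """
http://data.gdeltproject.org/blog/2020-coronavirus-narrative/live_onlinenews/20200326-masks.csv.gz
http://data.gdeltproject.org/blog/2020-coronavirus-narrative/live_onlinenews/20200326-panic.csv.gz
"""

urls = parse_file_list(sample)
masks_files = files_for_topic(urls, "masks")
print(masks_files)
```

To run this on the real list, replace `sample` with the text of the URL above, for example the `.text` attribute of a `requests.get(...)` response.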

In [1]:
# you can download articles in csv format with pandas 
import pandas as pd

# different datasets are: 
# * cases
# * covid19
# * falsehoods
# * masks
# * panic
# * prices
# * quarantine
# * shortages
# * socialdistancing
# * testing
# * ventilators

# to download a specific dataset:
dataset = "panic"

# import (can take time)
df = pd.read_csv("http://data.gdeltproject.org/blog/2020-coronavirus-narrative/live_onlinenews/20191101-20200326-" + dataset + ".csv.gz", names=["date", "url", "title", "content"])

# print the number of rows / columns
print(df.shape)

# see first rows 
df.head()

(118612, 4)


Unnamed: 0,date,url,title,content
0,2020-03-08 03:46:52 UTC,https://www.freepressseries.co.uk/news/nationa...,Women charged after supermarket toilet paper f...,Video of the incident was shared widely online...
1,2020-03-04 18:19:03 UTC,https://www.nbcbayarea.com/tag/california/,California – NBC Bay Area,Kris Sanchez reports. Robert Durst 13 hours ag...
2,2020-03-06 17:31:57 UTC,https://www.dunfermlinepress.com/news/national...,Coronavirus: 163 people in UK test positive,"HEALTH Coronavirus / (PA Graphics) Meanwhile, ..."
3,2020-02-14 22:17:56 UTC,https://www.dailymail.co.uk/news/article-80059...,British scientist leading coronavirus fight sa...,"The Ferns Medical Practice in Farnham, Surrey,..."
4,2020-01-24 10:32:35 UTC,http://www.interaksyon.com/politics-issues/202...,Philippine government's order to deport travel...,"Its incubation period, or the time from exposu..."
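
The `date` column is loaded as plain strings such as `2020-03-08 03:46:52 UTC`. A minimal sketch of converting it to timestamps and counting articles per day follows; the `sample` DataFrame is hypothetical data in the same shape as `df`, and the format string is an assumption based on the rows shown above.

```python
import pandas as pd

# Hypothetical rows in the same shape as the notebook's DataFrame
sample = pd.DataFrame({
    "date": [
        "2020-03-08 03:46:52 UTC",
        "2020-03-08 18:19:03 UTC",
        "2020-03-06 17:31:57 UTC",
    ],
    "url": [
        "https://example.com/a",
        "https://example.com/b",
        "https://example.com/c",
    ],
})

# Parse the strings into timezone-aware timestamps (UTC)
sample["date"] = pd.to_datetime(
    sample["date"], format="%Y-%m-%d %H:%M:%S UTC", utc=True
)

# Count the articles first seen on each calendar day
per_day = sample.groupby(sample["date"].dt.strftime("%Y-%m-%d")).size()
print(per_day)
```

The same two steps should apply to `df` directly, provided every row follows this date format.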


We can then save the dataset locally:

In [2]:
# the file is gzip-compressed, so give it a matching .csv.gz extension
df.to_csv("../data/gdelt_live_onlinenews_20191101-20200326-" + dataset + ".csv.gz", compression="gzip")
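
A gzip-compressed CSV written by `to_csv` can be loaded again with `read_csv`. The sketch below shows the round trip on hypothetical data written to a temporary directory; since `to_csv` stores the row index as the first column by default, `index_col=0` is used to restore it.

```python
import os
import tempfile

import pandas as pd

# Hypothetical rows with the same columns as the notebook's DataFrame
sample = pd.DataFrame({
    "date": ["2020-03-08 03:46:52 UTC", "2020-03-04 18:19:03 UTC"],
    "url": ["https://example.com/a", "https://example.com/b"],
    "title": ["First title", "Second title"],
    "content": ["First snippet", "Second snippet"],
})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.csv.gz")
    # Write with gzip compression, index included (the default)
    sample.to_csv(path, compression="gzip")
    # Read back, stating the compression explicitly and restoring the index
    restored = pd.read_csv(path, compression="gzip", index_col=0)

print(restored.shape)
```

Passing `index=False` to `to_csv` is an alternative if the row index is not needed in the saved file.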

You can also download articles for a single day:

In [3]:
dataset = "masks"

# import 
df_20200326 = pd.read_csv("http://data.gdeltproject.org/blog/2020-coronavirus-narrative/live_onlinenews/20200326-" + dataset + ".csv.gz", names=["date", "url", "title", "content"])

# print the number of rows / columns
print(df_20200326.shape)

# see first rows 
df_20200326.head()

(15181, 4)


Unnamed: 0,date,url,title,content
0,2020-03-27 17:18:45 UTC,https://www.vanguardngr.com/2020/03/coronaviru...,,He stressed that accuracy of most of the test ...
1,2020-03-27 13:33:30 UTC,https://www.aircargonews.net/services/ground-h...,,This equipment is now in use in hospitals acro...
2,2020-03-27 21:03:17 UTC,https://www.informationng.com/2020/03/bobrisky...,,A video fast trending captures the moment popu...
3,2020-03-27 11:48:27 UTC,https://www.tap.info.tn/en/Portal-World/124956...,,U.S. has most coronavirus cases in world 27/03...
4,2020-03-27 21:48:35 UTC,http://www.wcsmradio.com/index.php/news/52407/...,,THEY SAY THE MIST PERMEATES THE LAYERS OF THE ...
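
To collect several days at once, the daily URLs can be generated from a date range. This is a sketch that assumes every daily file follows the `YYYYMMDD-<dataset>.csv.gz` pattern used above; the master file list is the place to confirm which days actually exist. `daily_file_url` and `daily_file_urls` are illustrative helper names.

```python
from datetime import date, timedelta

BASE_URL = "http://data.gdeltproject.org/blog/2020-coronavirus-narrative/live_onlinenews/"


def daily_file_url(day, dataset):
    """Build the URL of one daily file (assumed naming pattern)."""
    return BASE_URL + day.strftime("%Y%m%d") + "-" + dataset + ".csv.gz"


def daily_file_urls(start, end, dataset):
    """Build the URLs for every day from start to end, both inclusive."""
    n_days = (end - start).days + 1
    return [daily_file_url(start + timedelta(days=i), dataset) for i in range(n_days)]


urls = daily_file_urls(date(2020, 3, 26), date(2020, 3, 28), "masks")
for u in urls:
    print(u)
```

Each URL can then be passed to `pd.read_csv` with the same `names` argument as above, and the resulting frames combined with `pd.concat`.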


Alternatively, you can send requests to the API endpoints:

1. https://blog.gdeltproject.org/gdelt-doc-2-0-api-debuts/

In [4]:
# import libraries
import requests
import json

# define query
url = 'https://api.gdeltproject.org/api/v2/doc/doc?query=("covid" OR "coronavirus") domain:cnn.com&mode=artlist&maxrecords=250&timespan=1day&format=json'

# we send the request to the API 
response = requests.get(url)

# parse results 
data = json.loads(response.text)

# print results
print("Number of articles:", len(data["articles"]))
data["articles"]

Number of articles: 28


[{'url': 'https://www.cnn.com/2021/07/26/politics/us-china-sherman-tense-meetings/',
  'url_mobile': 'https://amp.cnn.com/cnn/2021/07/26/politics/us-china-sherman-tense-meetings/index.html',
  'title': 'US and China trade barbs after another high - level meeting but say they want to keep talking',
  'seendate': '20210727T003000Z',
  'socialimage': 'https://cdn.cnn.com/cnnnext/dam/assets/210726132824-wendy-sherman-chinese-officials-0726-super-tease.jpg',
  'domain': 'cnn.com',
  'language': 'English',
  'sourcecountry': 'United States'},
 {'url': 'https://www.cnn.com/2021/07/26/politics/us-covid-travel-restrictions/',
  'url_mobile': 'https://amp.cnn.com/cnn/2021/07/26/politics/us-covid-travel-restrictions/index.html',
  'title': 'US to keep existing Covid - related travel restrictions',
  'seendate': '20210726T151500Z',
  'socialimage': 'https://cdn.cnn.com/cnnnext/dam/assets/210716145103-white-house-file-0306-super-tease.jpg',
  'domain': 'cnn.com',
  'language': 'English',
  'sourcec
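
Writing the query string by hand gets error-prone as queries grow. The sketch below builds the same kind of DOC API URL from separate parameters and percent-encodes it with the standard library. `doc_api_url` is an illustrative helper, not part of any GDELT client, and it covers only the parameters used in the cell above.

```python
from urllib.parse import urlencode

DOC_API = "https://api.gdeltproject.org/api/v2/doc/doc"


def doc_api_url(terms, domain=None, timespan="1day", maxrecords=250):
    """Build an article-list URL for the DOC API from search terms."""
    quoted = ['"' + t + '"' for t in terms]
    # Several terms are OR'd inside parentheses; a single term is left as-is
    query = "(" + " OR ".join(quoted) + ")" if len(quoted) > 1 else quoted[0]
    if domain:
        query += " domain:" + domain
    params = {
        "query": query,
        "mode": "artlist",
        "maxrecords": maxrecords,
        "timespan": timespan,
        "format": "json",
    }
    # urlencode escapes quotes, spaces, parentheses and colons
    return DOC_API + "?" + urlencode(params)


url = doc_api_url(["covid", "coronavirus"], domain="cnn.com")
print(url)
```

The resulting `url` can be passed to `requests.get` exactly as in the cell above.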

2. https://blog.gdeltproject.org/announcing-the-gdelt-context-2-0-api/

In [5]:
# define query
# request JSON only (the URL previously set the format parameter twice, as html and as json)
url = 'https://api.gdeltproject.org/api/v2/context/context?timespan=24H&query=("Trump" OR "Biden")&mode=artlist&maxrecords=75&format=json'

# we send the request to the API 
response = requests.get(url)

# parse results 
data = json.loads(response.text)

# print results
print("Number of articles:", len(data["articles"]))
data["articles"]

Number of articles: 24


[{'url': 'https://www.myheraldreview.com/news/benson/opinion/commentary/cal-thomas-kristi-noem-and-the-gop-s-future/article_f23c1f6e-ee3f-11eb-811e-4bff405d6fad.html',
  'title': 'Cal Thomas: Kristi Noem and the GOPs future | Commentary',
  'seendate': '20210727T090412Z',
  'socialimage': 'https://bloximages.chicago2.vip.townnews.com/myheraldreview.com/content/tncms/custom/image/5de3b48c-b05c-11e7-99b7-db39ff5c3a7d.jpg',
  'domain': 'myheraldreview.com',
  'language': 'ENGLISH',
  'isquote': 0,
  'sentence': 'Noem believes Donald Trump will run for president again in 2024.',
  'context': 'When you look at some states that have done what we believe in you can see overwhelming prosperity and families healthier and in school getting educated and that\'s what the American dream is and what the Republican Party needs to be talking about." Noem believes Donald Trump will run for president again in 2024. "I think (former) President Trump did a great job.'},
 {'url': 'https://www.westport-news
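
The list of article dictionaries returned by the API converts directly into a pandas DataFrame, which is easier to sort and filter. This is a minimal sketch on hypothetical records that reuse the field names shown in the output above; the `seendate` format string is an assumption based on values such as `20210727T090412Z`.

```python
import pandas as pd

# Hypothetical records with the same keys as the Context API output
records = [
    {
        "url": "https://example.com/a",
        "title": "First example",
        "seendate": "20210726T151500Z",
        "domain": "example.com",
        "language": "ENGLISH",
        "isquote": 0,
        "sentence": "An example sentence.",
        "context": "Some text. An example sentence. More text.",
    },
    {
        "url": "https://example.org/b",
        "title": "Second example",
        "seendate": "20210727T090412Z",
        "domain": "example.org",
        "language": "ENGLISH",
        "isquote": 1,
        "sentence": "Another example sentence.",
        "context": "Some text. Another example sentence. More text.",
    },
]

articles = pd.DataFrame(records)

# Parse the compact timestamps into timezone-aware datetimes (UTC)
articles["seendate"] = pd.to_datetime(
    articles["seendate"], format="%Y%m%dT%H%M%SZ", utc=True
)

# Most recent articles first
articles = articles.sort_values("seendate", ascending=False).reset_index(drop=True)
print(articles[["seendate", "domain", "title"]])
```

With a live response, `records` would be replaced by `data["articles"]`.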

3. https://blog.gdeltproject.org/gdelt-geo-2-0-api-debuts/