# Check country coverage using the GDELT API

This is a quick notebook to see how much data GDELT is collecting from New Zealand and selected Pacific media. I'm interested in the overall volume and the specific domains that are being covered.

Docs on the GDELT API:  
[https://blog.gdeltproject.org/gdelt-doc-2-0-api-debuts/](https://blog.gdeltproject.org/gdelt-doc-2-0-api-debuts/)

This tool is a helpful way to generate the URL for a GDELT API query without having to read the docs:  
[https://gdelt.github.io/](https://gdelt.github.io/)

The next cell imports a function that retrieves data from the GDELT DOC API for a specific country over a 24-hour period. It assumes reasonably low volumes, i.e. fewer than 250 articles per hour (250 is the maximum number of rows the GDELT API returns per request). It first queries the full 24-hour period. If it gets back 250 rows, it then queries each of the 24 one-hour periods separately. I'm not interested in high-volume countries, so this is fine for my purposes, but if you use this for a country with higher volumes you will need to adjust the code in get_gdelt_data.py to handle that.

In [12]:
from get_gdelt_data import get_gdelt_data_for_country
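
The fallback described above (one 24-hour query, then 24 one-hour queries if the row cap is hit) could be sketched roughly as follows. This is not the contents of get_gdelt_data.py, which is not shown here. The names `build_url` and `fetch_day` are hypothetical, and the use of the `sourcecountry:` operator with `mode=artlist` is an assumption about how the query is built. The `fetch` argument is any callable that turns a URL into a list of article dicts, which keeps the logic testable without network access.

```python
from datetime import datetime, timedelta
from urllib.parse import urlencode

API_URL = "https://api.gdeltproject.org/api/v2/doc/doc"
MAX_ROWS = 250        # GDELT DOC API cap on rows per request
FMT = "%Y%m%d%H%M%S"  # timestamp format seen in the output, e.g. 20240610000000


def build_url(country, start, end):
    """Build an article-list query URL for one source country and time window."""
    params = {
        "query": f"sourcecountry:{country}",  # assumption: filter by source country
        "mode": "artlist",
        "maxrecords": MAX_ROWS,
        "format": "json",
        "startdatetime": start.strftime(FMT),
        "enddatetime": end.strftime(FMT),
    }
    return f"{API_URL}?{urlencode(params)}"


def fetch_day(fetch, country, day_start):
    """Return articles for one 24-hour window, falling back to hourly queries.

    `fetch` is any callable that takes a URL and returns a list of article dicts.
    """
    day_end = day_start + timedelta(days=1)
    articles = fetch(build_url(country, day_start, day_end))
    if len(articles) < MAX_ROWS:
        return articles
    # Hit the cap, so the day was probably truncated: re-query hour by hour.
    articles = []
    for hour in range(24):
        start = day_start + timedelta(hours=hour)
        end = start + timedelta(hours=1)
        articles.extend(fetch(build_url(country, start, end)))
    return articles
```

With the `requests` library, `fetch` might be something like `lambda url: requests.get(url).json().get("articles", [])`, assuming the JSON response keeps its rows under an `articles` key. An hour that itself returns 250 rows would still be truncated, which is the high-volume case mentioned above.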

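GDELT uses FIPS country codes, which sometimes differ from ISO codes (for example, TN is Tonga, not Tunisia). To find the code for a country, use this link:  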
[http://data.gdeltproject.org/api/v2/guides/LOOKUP-COUNTRIES.TXT](http://data.gdeltproject.org/api/v2/guides/LOOKUP-COUNTRIES.TXT)

In [2]:
# New Zealand, Fiji, Cook Islands, Tonga, Samoa, Nauru
countries = ['NZ', 'FJ', 'CW', 'TN', 'WS', 'NR']
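
To label the codes with country names, the lookup file can be parsed into a dictionary. The sketch below assumes the file holds one `code<TAB>name` pair per line. The helper name `parse_country_lookup` is hypothetical, and the sample text is a hand-written stand-in rather than a download of the real file.

```python
def parse_country_lookup(text):
    """Map two-letter codes to country names from tab-separated lookup text."""
    lookup = {}
    for line in text.splitlines():
        if "\t" not in line:
            continue  # skip blank or malformed lines
        code, name = line.split("\t", 1)
        lookup[code.strip()] = name.strip()
    return lookup


# Hand-written sample in the assumed format (not fetched from GDELT).
sample = "FJ\tFiji\nNZ\tNew Zealand\nTN\tTonga\n"
names = parse_country_lookup(sample)
for code in ["NZ", "FJ", "TN"]:
    print(code, names.get(code, "unknown"))
```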

In [3]:
dfs = {}
for country in countries:
    print("Getting data for country: ", country)
    dfs[country] = get_gdelt_data_for_country(country)
    print("Data for country ", country, " has been downloaded.")
    print("")

Getting data for country:  NZ
20240610000000 20240611000000 250
Result appears to be larger than 250 records, trying 1 hour increments ...
20240610000000 20240610010000 90
20240610010000 20240610020000 63
20240610020000 20240610030000 77
20240610030000 20240610040000 75
20240610040000 20240610050000 90
20240610050000 20240610060000 32
20240610060000 20240610070000 45
20240610070000 20240610080000 25
20240610080000 20240610090000 18
20240610090000 20240610100000 3
20240610100000 20240610110000 6
20240610110000 20240610120000 5
20240610120000 20240610130000 8
20240610130000 20240610140000 2
20240610140000 20240610150000 4
20240610150000 20240610160000 4
20240610160000 20240610170000 5
20240610170000 20240610180000 14
20240610180000 20240610190000 13
20240610190000 20240610200000 18
20240610200000 20240610210000 48
20240610210000 20240610220000 10
20240610220000 20240610230000 10
20240610230000 20240611000000 12
Total Rows: 479
Data for country  NZ  has been downloaded.

Getting data for 
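
The 24 hourly counts printed above add up to 677, but the reported total is 479. One plausible explanation, which I have not confirmed against get_gdelt_data.py, is that some articles come back in more than one hourly window and duplicates are dropped before the total is counted. A minimal sketch of that kind of de-duplication, assuming each article is a dict with a `url` key (the helper name `dedupe_by_url` is hypothetical):

```python
def dedupe_by_url(articles):
    """Drop repeat articles, keeping the first occurrence of each URL."""
    seen = set()
    unique = []
    for article in articles:
        url = article["url"]
        if url not in seen:
            seen.add(url)
            unique.append(article)
    return unique
```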

In [6]:
# compare data counts for each country
for country in countries:
    if dfs[country] is None:
        print("Data count for country ", country, " is: 0")
    else:
        print("Data count for country ", country, " is: ", len(dfs[country]))


Data count for country  NZ  is:  479
Data count for country  FJ  is:  14
Data count for country  CW  is: 0
Data count for country  TN  is:  3
Data count for country  WS  is:  11
Data count for country  NR  is: 0
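
Since this count comparison is repeated later in the notebook, it could be pulled into a small helper. A sketch, assuming (as in the cell above) that a country with no results is stored as `None`. The name `row_counts` is hypothetical, and plain lists stand in for DataFrames in the toy example.

```python
def row_counts(frames):
    """Return {country: row count}, treating a missing (None) result as 0."""
    return {
        country: 0 if df is None else len(df)
        for country, df in frames.items()
    }


# Toy stand-in for dfs: lists play the role of DataFrames.
toy = {"FJ": ["a", "b", "c"], "NR": None}
for country, count in row_counts(toy).items():
    print(f"Data count for country {country} is: {count}")
```

In the notebook this would be called as `row_counts(dfs)`.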


In [8]:
# domain column summary
print('Domains:')
for country in countries:
    if dfs[country] is not None:
        print('Country:', country)
        display(dfs[country]['domain'].value_counts().reset_index())


Domains:
Country: NZ


| | domain | count |
|---|---|---|
| 0 | nzherald.co.nz | 88 |
| 1 | home.nzcity.co.nz | 67 |
| 2 | scoop.co.nz | 63 |
| 3 | odt.co.nz | 28 |
| 4 | foreignaffairs.co.nz | 27 |
| 5 | auckland.scoop.co.nz | 19 |
| 6 | livenews.co.nz | 18 |
| 7 | community.scoop.co.nz | 16 |
| 8 | sunlive.co.nz | 14 |
| 9 | thedailyblog.co.nz | 12 |


Country: FJ


| | domain | count |
|---|---|---|
| 0 | fijivillage.com | 9 |
| 1 | pina.com.fj | 5 |


Country: TN


| | domain | count |
|---|---|---|
| 0 | parliament.gov.to | 2 |
| 1 | matangitonga.to | 1 |


Country: WS


| | domain | count |
|---|---|---|
| 0 | oane.ws | 7 |
| 1 | samoaobserver.ws | 4 |
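
The per-country tables above can also be folded into a single ranking. The sketch below counts `(country, domain)` pairs with `collections.Counter`. The helper name `top_domains` is hypothetical. It only needs `df["domain"]` to be iterable, so it should work with a pandas DataFrame, and the toy example uses plain dicts of lists as stand-ins.

```python
from collections import Counter


def top_domains(frames, n=5):
    """Return the n most common (country, domain) pairs across all results."""
    counts = Counter()
    for country, df in frames.items():
        if df is None:
            continue  # no articles returned for this country
        for domain in df["domain"]:
            counts[(country, domain)] += 1
    return counts.most_common(n)


# Toy stand-in for dfs: dicts of lists play the role of DataFrames.
toy = {
    "FJ": {"domain": ["fijivillage.com", "fijivillage.com", "pina.com.fj"]},
    "NR": None,
}
for (country, domain), count in top_domains(toy):
    print(country, domain, count)
```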


In [9]:
# repeat for an arbitrary earlier day (8 days ago)
dfs = {}
for country in countries:
    print("Getting data for country: ", country)
    dfs[country] = get_gdelt_data_for_country(country, days_ago=8)
    print("Data for country ", country, " has been downloaded.")
    print("")

Getting data for country:  NZ
20240603000000 20240604000000 250
Result appears to be larger than 250 records, trying 1 hour increments ...
20240603000000 20240603010000 48
20240603010000 20240603020000 39
20240603020000 20240603030000 39
20240603030000 20240603040000 13
20240603040000 20240603050000 14
20240603050000 20240603060000 54
20240603060000 20240603070000 75
20240603070000 20240603080000 20
20240603080000 20240603090000 5
20240603090000 20240603100000 10
20240603100000 20240603110000 3
20240603110000 20240603120000 2
20240603120000 20240603130000 4
20240603130000 20240603140000 2
20240603140000 20240603150000 1
20240603150000 20240603160000 1
20240603160000 20240603170000 2
20240603170000 20240603180000 4
20240603180000 20240603190000 8
20240603190000 20240603200000 38
20240603200000 20240603210000 44
20240603210000 20240603220000 22
20240603220000 20240603230000 39
20240603230000 20240604000000 34
Total Rows: 355
Data for country  NZ  has been downloaded.

Getting data for co

In [10]:
# compare data counts for each country
for country in countries:
    if dfs[country] is None:
        print("Data count for country ", country, " is: 0")
    else:
        print("Data count for country ", country, " is: ", len(dfs[country]))


Data count for country  NZ  is:  355
Data count for country  FJ  is:  36
Data count for country  CW  is: 0
Data count for country  TN  is:  4
Data count for country  WS  is:  7
Data count for country  NR  is: 0


In [11]:
# domain column summary
print('Domains:')
for country in countries:
    if dfs[country] is not None:
        print('Country:', country)
        display(dfs[country]['domain'].value_counts().reset_index())


Domains:
Country: NZ


| | domain | count |
|---|---|---|
| 0 | nzherald.co.nz | 90 |
| 1 | home.nzcity.co.nz | 56 |
| 2 | foreignaffairs.co.nz | 29 |
| 3 | scoop.co.nz | 20 |
| 4 | newstalkzb.co.nz | 20 |
| 5 | odt.co.nz | 18 |
| 6 | community.scoop.co.nz | 17 |
| 7 | nbr.co.nz | 11 |
| 8 | newzealandstar.com | 11 |
| 9 | thedailyblog.co.nz | 10 |


Country: FJ


| | domain | count |
|---|---|---|
| 0 | fijivillage.com | 10 |
| 1 | fijisun.com.fj | 10 |
| 2 | pina.com.fj | 8 |
| 3 | islandsbusiness.com | 8 |


Country: TN


| | domain | count |
|---|---|---|
| 0 | parliament.gov.to | 4 |


Country: WS


| | domain | count |
|---|---|---|
| 0 | samoaobserver.ws | 3 |
| 1 | samoanews.com | 3 |
| 2 | oane.ws | 1 |


OK, so GDELT collects very little from the Pacific states profiled here. What does this look like over a 7-day period?

In [15]:
if 'NZ' in countries:
    countries.remove('NZ')

dfs = {}
for day in range(1, 8):
    print(f"Getting data for {day} days ago ...")
    dfs[day] = {}
    for country in countries:
        print("Getting data for country: ", country)
        dfs[day][country] = get_gdelt_data_for_country(country, days_ago=day)
        print("Data for country ", country, " has been downloaded.")


Getting data for 1 days ago ...
Getting data for country:  FJ
20240610000000 20240611000000 14
Total Rows: 14
Data for country  FJ  has been downloaded.
Getting data for country:  CW
20240610000000 20240611000000 0
Data for country  CW  has been downloaded.
Getting data for country:  TN
20240610000000 20240611000000 3
Total Rows: 3
Data for country  TN  has been downloaded.
Getting data for country:  WS
20240610000000 20240611000000 11
Total Rows: 11
Data for country  WS  has been downloaded.
Getting data for country:  NR
20240610000000 20240611000000 0
Data for country  NR  has been downloaded.
Getting data for 2 days ago ...
Getting data for country:  FJ
20240609000000 20240610000000 19
Total Rows: 19
Data for country  FJ  has been downloaded.
Getting data for country:  CW
20240609000000 20240610000000 0
Data for country  CW  has been downloaded.
Getting data for country:  TN
20240609000000 20240610000000 2
Total Rows: 2
Data for country  TN  has been downloaded.
Getting data for cou

In [20]:
# not pretty, but I'm only interested in the counts
# each row lists the oldest day first: 7 days ago down to 1 day ago
for country in countries:
    print("Country: ", country)
    for day in range(7, 0, -1):
        count = 0
        if dfs[day][country] is not None:
            count = len(dfs[day][country])
        print(count, end=" ")
    print()


Country:  FJ
25 23 34 38 13 19 14 
Country:  CW
4 0 0 9 2 0 0 
Country:  TN
3 7 5 0 1 2 3 
Country:  WS
11 6 15 8 1 5 11 
Country:  NR
0 0 0 0 0 0 0 
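
For anything beyond eyeballing, the nested `dfs[day][country]` structure can be flattened into one list of counts per country. A sketch, again assuming `None` means no results. The helper name `daily_counts` is hypothetical, and lists stand in for DataFrames in the toy data.

```python
def daily_counts(frames_by_day, countries, days):
    """Return {country: [count per day]} in the order given by `days`."""
    table = {}
    for country in countries:
        table[country] = [
            0 if frames_by_day[day][country] is None
            else len(frames_by_day[day][country])
            for day in days
        ]
    return table


# Toy stand-in for the nested dict; lists play the role of DataFrames.
toy = {
    1: {"FJ": ["a", "b"], "NR": None},
    2: {"FJ": ["a"], "NR": None},
}
table = daily_counts(toy, ["FJ", "NR"], days=[2, 1])  # oldest day first
for country, counts in table.items():
    print(country, counts, "total:", sum(counts))
```

In the notebook the call would be `daily_counts(dfs, countries, range(7, 0, -1))`. If a DataFrame is preferred, `pd.DataFrame.from_dict(table, orient="index")` should give one row per country.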
