# GDELT API Exploration

## Usage

Note: An open-source Python library, `gdeltdoc`, provides handy utilities for querying the GDELT Project database. See [the project's GitHub page](https://github.com/alex9smith/gdelt-doc-api).


The following example queries the API and fetches articles in the GDELT database that contain at least one of the keywords ("femicide", "illegal fishing", "wildlife crime") and were published within the last two months.


In [1]:
from gdeltdoc import GdeltDoc, Filters

In [6]:
f = Filters(keyword=["femicide", "illegal fishing", "wildlife crime"], timespan="2months")
gd = GdeltDoc()

articles = gd.article_search(filters=f)
articles

Unnamed: 0,url,url_mobile,title,seendate,socialimage,domain,language,sourcecountry
0,https://www.elsiglodetorreon.com.mx/noticia/20...,https://www.elsiglodetorreon.com.mx/nota/2023/...,Fiscalía de Edomex detiene al presunto feminic...,20230918T031500Z,https://tecolotito.elsiglodetorreon.com.mx/i/2...,elsiglodetorreon.com.mx,Spanish,Mexico
1,https://www.semana.com/nacion/articulo/urgente...,https://www.semana.com/amp/nacion/articulo/urg...,Capturan en México a exnovio de Ana María Serr...,20230918T030000Z,https://www.semana.com/resizer/oAmAZdLzkZhBnge...,semana.com,Spanish,Colombia
2,https://www.eluniversal.com.mx/edomex/caso-ana...,https://www.eluniversal.com.mx/edomex/caso-ana...,Caso Ana María : Trasladan al penal de Barrien...,20230918T024500Z,https://www.eluniversal.com.mx/resizer/3RcNoOo...,eluniversal.com.mx,Spanish,Mexico
3,https://www.24-horas.mx/2023/09/17/asesinan-en...,,"Asesinan en México a Ana María Serrano , sobri...",20230918T033000Z,https://www.24-horas.mx/wp-content/uploads/202...,24-horas.mx,Spanish,Mexico
4,https://www.elsiglodetorreon.com.mx/noticia/20...,https://www.elsiglodetorreon.com.mx/nota/2023/...,"Sujeto asesina a exnovia en Atizapán , Edomex",20230918T031500Z,https://tecolotito.elsiglodetorreon.com.mx/i/2...,elsiglodetorreon.com.mx,Spanish,Mexico
5,https://www.eluniversal.com.co/mundo/el-crudo-...,https://m.eluniversal.com.co/mundo/el-crudo-re...,El crudo relato de la madre de Ana María Serra...,20230918T033000Z,https://www.eluniversal.com.co/binrepository/1...,eluniversal.com.co,Spanish,Colombia
6,https://www.eluniversal.com.mx/edomex/caso-ana...,https://www.eluniversal.com.mx/edomex/caso-ana...,Caso Ana María : La joven víctima de feminicid...,20230918T024500Z,https://www.eluniversal.com.mx/resizer/OaCauEl...,eluniversal.com.mx,Spanish,Mexico
7,http://www.razon.com.mx/estados/feminicidio-an...,https://www.razon.com.mx/amp/estados/feminicid...,Feminicidio de Ana María . Así recuerdan a la ...,20230918T033000Z,,razon.com.mx,Spanish,Mexico
8,https://tn.com.ar/sociedad/2023/09/17/la-reacc...,https://tn.com.ar/sociedad/2023/09/17/la-reacc...,EL FUERTE MENSAJE de la mamá de CECILIA Strzyz...,20230918T024500Z,https://tn.com.ar/resizer/ESMC8ceJWXp9maJJod4q...,tn.com.ar,Spanish,Argentina
9,https://www.eltiempo.com/bogota/autoridades-re...,https://www.eltiempo.com/amp/bogota/autoridade...,Autoridades reportaron cero homicidios durante...,20230918T030000Z,https://www.eltiempo.com/files/og_paste_img/up...,eltiempo.com,Spanish,Colombia


## 1. Can we use code to fetch lists of URLs matching boolean queries?

Yes, the API supports filtering with boolean operators (AND, OR). The example below chains keywords with `OR`; the resulting query string is what gets sent to the GDELT API to produce a response similar to the one above.

Note: Building queries with `AND` is not yet directly supported by the library, but it could be added with minor changes to a few functions.

Note: The GDELT API behaves differently when the query string is longer than 225 characters. The Python library currently has no good mechanism for handling such long query strings or the response they produce: in these cases the GDELT API returns an interactive HTML widget meant for a browser.

In [9]:
f = Filters(keyword=['asesinato', 'homicidio', 'femicidio', 'feminicidio', 'travesticidio', 'transfemicidio', 'Lesbicidio', 'asesina', 'asesinada', 'muerta', 'muerte', 'mata', 'mató', 'dispara', 'balea', 'apuñala', 'acuchillada', 'golpeada', 'estrangula', 'ahogada', 'degollada', 'incinera', 'quemada', 'envenenada', 'prendida fuego', 'descuartizada', 'sin vida', 'intento', 'intento de asesinato', 'Intentó asesinarla', 'intento de femicidio', 'intento de transfemicidio', 'intento de travesticidio', 'intento de lesbicidio', 'intentó matarla', 'suicidio', 'se quito la vida', 'se mató', 'se suicido', 'se ahorco") AND (muje', 'niña', 'una joven', 'una adolescente', 'una chica', 'cuerpo de una mujer', 'restos', 'cadaver de una mujer', 'prostituta', 'trabajadora sexual', 'mujer trans', 'una travesti', 'hombre vestido de mujer'], timespan="2months")
f.query_string

'(asesinato OR homicidio OR femicidio OR feminicidio OR travesticidio OR transfemicidio OR Lesbicidio OR asesina OR asesinada OR muerta OR muerte OR mata OR mató OR dispara OR balea OR apuñala OR acuchillada OR golpeada OR estrangula OR ahogada OR degollada OR incinera OR quemada OR envenenada OR "prendida fuego" OR descuartizada OR "sin vida" OR intento OR "intento de asesinato" OR "Intentó asesinarla" OR "intento de femicidio" OR "intento de transfemicidio" OR "intento de travesticidio" OR "intento de lesbicidio" OR "intentó matarla" OR suicidio OR "se quito la vida" OR "se mató" OR "se suicido" OR "se ahorco") AND (muje" OR niña OR "una joven" OR "una adolescente" OR "una chica" OR "cuerpo de una mujer" OR restos OR "cadaver de una mujer" OR prostituta OR "trabajadora sexual" OR "mujer trans" OR "una travesti" OR "hombre vestido de mujer") &timespan=2months&maxrecords=250'
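One way to stay under the 225-character threshold noted above is to split a long keyword list into several shorter `OR` groups and query each one separately. The sketch below is only an illustration: `or_group` and `chunk_keywords` are hypothetical helpers (not part of `gdeltdoc`), and `or_group` simply mimics the quoting seen in the query string above, where multi-word phrases are wrapped in double quotes.

```python
def or_group(keywords):
    """Join keywords into '(a OR b OR "c d")', quoting multi-word phrases."""
    terms = [f'"{k}"' if " " in k else k for k in keywords]
    return "(" + " OR ".join(terms) + ")"


def chunk_keywords(keywords, max_len=225):
    """Split keywords into lists whose OR group stays within max_len characters.

    A single keyword longer than max_len still gets its own chunk.
    """
    chunks, current = [], []
    for kw in keywords:
        candidate = current + [kw]
        if current and len(or_group(candidate)) > max_len:
            # Adding this keyword would exceed the limit: start a new chunk.
            chunks.append(current)
            current = [kw]
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


keywords = ["asesinato", "homicidio", "sin vida"]
for chunk in chunk_keywords(keywords, max_len=25):
    print(or_group(chunk))
```

Each chunk could then be passed as the `keyword` list to its own `Filters` object, with the results combined afterwards. The length check here covers only the keyword group, so it is an assumption that leaving some headroom for the other query parameters is enough.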

## 2. If so, how do we filter to results from the last three days and page through those?

The GDELT API supports filtering results by both absolute and relative time ranges. For relative filtering, the supported units are:
- Minutes: `min`
- Hours: `h`, `hours`
- Days: `d`, `days`
- Weeks: `w`, `weeks`
- Months: `m`, `months`

So the results from the last three days can be filtered as follows:

In [10]:
f = Filters(keyword="femicide", timespan="3d")
gd.article_search(f)

Unnamed: 0,url,url_mobile,title,seendate,socialimage,domain,language,sourcecountry
0,https://diario.mx/juarez/encabeza-juarez-top-e...,https://diario.mx/juarez/amp/encabeza-juarez-t...,Encabeza Juárez top estatal de feminicidios,20230915T200000Z,https://diario.mx/jrz/media/uploads/galeria/20...,diario.mx,Spanish,Mexico
1,https://www.am.com.mx/guanajuato/2023/9/17/con...,https://www.am.com.mx/amp/congreso-amplia-caus...,Congreso amplía causales para aplicar pena máx...,20230917T230000Z,https://www.am.com.mx/u/fotografias/m/2023/9/1...,am.com.mx,Spanish,Mexico
2,https://www.eluniversal.com.mx/estados/exigen-...,https://www.eluniversal.com.mx/estados/exigen-...,Exigen justicia por maestra asesinada,20230916T104500Z,https://www.eluniversal.com.mx/resizer/CSE8Tf3...,eluniversal.com.mx,Spanish,Mexico
3,https://www.radioformula.com.mx/nacional/2023/...,,"Encuentran sin vida a Kailani , niña de 3 años...",20230915T201500Z,https://www.radioformula.com.mx/u/fotografias/...,radioformula.com.mx,Spanish,Mexico
4,https://www.semana.com/nacion/articulo/aterrad...,https://www.semana.com/amp/nacion/articulo/ate...,"Aterrador | Ana María Serrano , sobrina del ex...",20230917T220000Z,https://www.semana.com/resizer/cH2C1EzR8Xmeez0...,semana.com,Spanish,Colombia
...,...,...,...,...,...,...,...,...
245,https://larepublica.pe:443/sociedad/2023/09/16...,https://larepublica.pe/amp/sociedad/2023/09/16...,Abuso infantil : piden aborto terapéutico en Á...,20230916T100000Z,https://imgmedia.larepublica.pe/1200x630/larep...,larepublica.pe,Spanish,Peru
246,https://www.diariodecuyo.com.ar/argentina/En-C...,https://www.diariodecuyo.com.ar/amp/argentina/...,"En Chaco , cajoneado el caso Cecilia Strzyzo...",20230917T041500Z,https://www.diariodecuyo.com.ar/__export/16949...,diariodecuyo.com.ar,Spanish,Argentina
247,https://www.zazoom.it/2023-09-17/violenza-su-d...,,Violenza su donne | Zanella | castrazione chi...,20230917T143000Z,,zazoom.it,Italian,Italy
248,https://www.jpost.com/opinion/article-759223,https://m.jpost.com/opinion/article-759223,The shofar call to remember Mahsa Amini - opin...,20230915T081500Z,"https://images.jpost.com/image/upload/f_auto,f...",jpost.com,English,Israel
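The `seendate` values in the output above look like UTC timestamps in the form `YYYYMMDDTHHMMSSZ`. Assuming that format holds for every row, the following sketch parses them with the standard library so the time window can be double-checked on the client side. The helper names `parse_seendate` and `seen_within` are made up for this illustration.

```python
from datetime import datetime, timedelta, timezone


def parse_seendate(value):
    """Parse a seendate string such as '20230918T031500Z' into an aware UTC datetime."""
    return datetime.strptime(value, "%Y%m%dT%H%M%SZ").replace(tzinfo=timezone.utc)


def seen_within(seendates, days, now):
    """Return the seendate strings that fall within the last `days` days before `now`."""
    cutoff = now - timedelta(days=days)
    return [s for s in seendates if parse_seendate(s) >= cutoff]


# Values copied from the outputs in this notebook; `now` is fixed for reproducibility.
now = datetime(2023, 9, 18, 4, 0, tzinfo=timezone.utc)
dates = ["20230915T200000Z", "20230917T230000Z", "20230914T003000Z"]
print(seen_within(dates, days=3, now=now))
```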


Note: There is currently no mechanism to paginate or to retrieve more than 250 articles in a single API call. None of the online resources or blog posts on the GDELT website describe an alternative. However, this should be relatively easy to work around by making multiple API calls, each restricted to a single day or week, and concatenating the result sets.
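The multiple-call approach described in the note above could be sketched roughly as follows. This is a minimal, untested-against-the-live-API illustration: `daily_windows` and `fetch_all` are hypothetical helpers, and it assumes that `Filters` accepts `start_date` and `end_date` as `YYYY-MM-DD` strings and that `article_search` returns a pandas DataFrame with a `url` column, as in the outputs above.

```python
from datetime import date, timedelta


def daily_windows(start, end):
    """Yield (start_date, end_date) ISO strings covering [start, end) one day at a time."""
    current = start
    while current < end:
        nxt = min(current + timedelta(days=1), end)
        yield current.isoformat(), nxt.isoformat()
        current = nxt


def fetch_all(keyword, start, end):
    """Query GDELT once per day and concatenate the results (assumed API usage)."""
    # Imported here so daily_windows stays usable without these packages installed.
    import pandas as pd
    from gdeltdoc import GdeltDoc, Filters

    gd = GdeltDoc()
    frames = []
    for window_start, window_end in daily_windows(start, end):
        f = Filters(keyword=keyword, start_date=window_start, end_date=window_end)
        result = gd.article_search(f)
        if not result.empty:
            frames.append(result)
    if not frames:
        return pd.DataFrame()
    # The same article may appear in adjacent windows, so de-duplicate by URL.
    return pd.concat(frames, ignore_index=True).drop_duplicates(subset="url")


print(list(daily_windows(date(2023, 9, 15), date(2023, 9, 18))))
```

If a single day still returns 250 rows, that window has probably been truncated as well, and a narrower window would be needed.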

## 3. Can we filter by geographic country or state/province of publication?

Yes, search results can be filtered by the source country of publication, as well as by the language of publication. The API uses [FIPS country codes](https://en.wikipedia.org/wiki/List_of_FIPS_country_codes) to filter by country.

In [11]:
f = Filters(keyword=["femicide", "farmer"], timespan="2m", country=["IN", "PK"])
gd.article_search(f)

Unnamed: 0,url,url_mobile,title,seendate,socialimage,domain,language,sourcecountry
0,https://www.thehindu.com/news/international/tu...,https://www.thehindu.com/news/international/tu...,Turkey drops bid to close leading women rights...,20230914T003000Z,https://th-i.thgim.com/public/incoming/hktweg/...,thehindu.com,English,India
1,https://tribune.com.pk/story/2429155/corporate...,,Corporate farming to attract Rs100 billion,20230804T083000Z,https://i.tribune.com.pk/media/images/JSSKHSJR...,tribune.com.pk,English,Pakistan
2,https://jang.com.pk/news/1263261,https://jang.com.pk/amp/1263261,بھارت : کسانوں کا احتجاج ، مظاہرین عمارت کے حف...,20230829T171500Z,https://jang.com.pk/assets/uploads/updates/202...,jang.com.pk,Urdu,Pakistan
3,https://dunyanews.tv/en/Entertainment/749835-F...,,Farmer stumbles upon big dinosaur egg,20230824T223000Z,https://img.dunyanews.tv/news/2023/August/08-2...,dunyanews.tv,English,Pakistan
4,https://jang.com.pk/news/1256337,https://jang.com.pk/amp/1256337,جھل مگسی میں کسان کے قتل کا نو ٹس لیا جا ئے ، ...,20230810T033000Z,,jang.com.pk,Urdu,Pakistan
...,...,...,...,...,...,...,...,...
245,https://www.punjabitribuneonline.com/news/doab...,https://m.punjabitribuneonline.com/article/the...,ਕਿਸਾਨਾਂ - ਮਜ਼ਦੂਰਾਂ ਨੇ ਕੇਂਦਰ ਸਰਕਾਰ ਦੇ ਪੁਤਲੇ ਫੂਕ...,20230909T061500Z,https://www.punjabitribuneonline.com/wp-conten...,punjabitribuneonline.com,Punjabi,India
246,https://www.amarujala.com/dehradun/the-farmers...,https://www.amarujala.com/amp/dehradun/the-far...,Dehradun News : बीज की कमी के कारण चिंतित नजर ...,20230723T203000Z,https://staticimg.amarujala.com/assets/images/...,amarujala.com,Hindi,India
247,https://www.punjabitribuneonline.com/news/punj...,https://m.punjabitribuneonline.com/article/lon...,ਲੌਂਗੋਵਾਲ : ਪੁਲੀਸ ਨਾਲ ਝੜਪ ਦੌਰਾਨ ਕਿਸਾਨ ਦੀ ਮੌਤ - ...,20230822T034500Z,https://www.punjabitribuneonline.com/wp-conten...,punjabitribuneonline.com,Punjabi,India
248,https://www.kannadaprabha.com/karnataka/2023/a...,https://m.kannadaprabha.com/karnataka/2023/aug...,ತಿ . ನರಸಿಪುರ ರಸ್ತೆ ಅಪಘಾತದಲ್ಲಿ ರೈತ ಸಾವು : ಹೆದ್ದ...,20230826T114500Z,https://media.kannadaprabha.com/uploads/user/i...,kannadaprabha.com,Kannada,India
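A quick way to sanity-check a country filter is to tally the `sourcecountry` and `language` columns of the result. The sketch below works on plain dictionaries so it runs without pandas; with a real result, the rows could be obtained via `articles.to_dict("records")`. The helper `count_by` is made up for this example, and the sample rows are copied from the output above.

```python
from collections import Counter


def count_by(records, column):
    """Tally how many records share each value of the given column."""
    return Counter(row[column] for row in records)


# A few rows from the output above, reduced to the columns needed here.
sample = [
    {"domain": "thehindu.com", "language": "English", "sourcecountry": "India"},
    {"domain": "tribune.com.pk", "language": "English", "sourcecountry": "Pakistan"},
    {"domain": "jang.com.pk", "language": "Urdu", "sourcecountry": "Pakistan"},
    {"domain": "dunyanews.tv", "language": "English", "sourcecountry": "Pakistan"},
]

print(count_by(sample, "sourcecountry"))
print(count_by(sample, "language"))
```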


## 4. Are there API limits? Are there any associated monetary costs?

There doesn't seem to be any pricing mechanism imposed on the GDELT API. The GDELT website describes the project as follows:
```
The entire GDELT database is 100% free and open and you can download the raw datafiles, visualize it using the
GDELT Analysis Service, or analyze it at limitless scale with Google BigQuery.
```
So, in the absence of any pricing information on their blog, there shouldn't be any associated monetary costs.


The following are the limitations that I noticed while working with GDELT:
- A maximum of 250 articles is returned per API query.
- There is no support for fuzzy text matching within articles, so filters have to be designed to account for verb tense, sentence structure, etc.


## Conclusion

While working with the GDELT API, I found that it supports a few other features as well:
- Allows for searching images (attached to articles) by image EXIF data, image OCR results, or image tags (GDELT processes the images and tags them based on predictions of their content).
- Allows for filtering based on the tone of articles (how negative or positive the article's content is).
- Allows for filtering based on themes (a predetermined set of categories, for example TERROR, CRISIS, MARITIME). The list of themes [is here](http://data.gdeltproject.org/api/v2/guides/LOOKUP-GKGTHEMES.TXT).
- It can return visualizations for certain query contexts, which can save processing overhead in some situations.


## References
- GDELT API Documentation: https://blog.gdeltproject.org/gdelt-doc-2-0-api-debuts/
- GDELT Python API package: https://github.com/alex9smith/gdelt-doc-api
- GDELT Blog (for extra info): https://blog.gdeltproject.org/