# 🌍 GDELT Event Scraper — Global Energy Themes (Jul–Dec 2017)

This notebook pulls GDELT event records with global impact related to energy (e.g. OIL, GAS, ELECTRICITY) for the last half of 2017.
- Filters by energy-related `themes`
- Collects all 15-minute GDELT files over 6 months
- Stores results to `../data/raw/gdelt_energy_events_2017_H2.csv`

In [1]:
import pandas as pd
from datetime import datetime, timedelta
import requests
from io import StringIO
from tqdm.notebook import tqdm

In [2]:
# Parameters
start_date = datetime(2017, 7, 1)
end_date = datetime(2017, 12, 31)
keywords = ['ENERGY', 'OIL', 'GAS', 'ELECTRICITY']
output_path = '../data/raw/gdelt_energy_events_2017_H2.csv'
base_url = "http://data.gdeltproject.org/gdeltv2/"

# Filename generator for 15-min intervals
def generate_filenames(start, end):
    current = start
    while current <= end:
        yield current.strftime("events/%Y%m%d%H%M00.export.CSV")
        current += timedelta(minutes=15)

In [3]:
# Download and filter
records = []
print("🔍 Searching for global energy-related events...")

for fname in tqdm(list(generate_filenames(start_date, end_date))):
    url = base_url + fname
    try:
        r = requests.get(url, timeout=10)
        if r.status_code == 200:
            df = pd.read_csv(StringIO(r.text), sep='\t', header=None, low_memory=False)
            df.columns = [f'col_{i}' for i in range(df.shape[1])]
            df = df[df['col_50'].fillna('').str.contains('|'.join(keywords), case=False)]
            if not df.empty:
                records.append(df)
    except Exception:
        continue

🔍 Searching for global energy-related events...


  0%|          | 0/17569 [00:00<?, ?it/s]

In [4]:
# Save all matched rows
if records:
    result = pd.concat(records)
    result.to_csv(output_path, index=False)
    print(f"✅ Saved {len(result)} records to: {output_path}")
else:
    print("⚠️ No matching energy-related events found.")

⚠️ No matching energy-related events found.


Need to find a new strategy for collecting news </br>
Finished bu Jad Akra on April 20th 2025