# IEEE Literature Search Strategy & CSV Generation
This notebook allows you to test different keyword strategies for the IEEE API.

Obtain your API key through: https://developer.ieee.org/

Edit `groups` and `logic` in the keyword-strategy cell (section 2), then run the subsequent cells to see the results.

In [None]:
# === SECTION: Import Important Functions ===
import os
import csv
import requests
from datetime import datetime

print("✅ Imported all necessary libraries.")

# 1. Setup folders and API key

In the cell below, keep the block for your operating system active and comment out the other one (Ctrl+/ on PC or Command+/ on Mac toggles comments). Then edit the CSV and summary folder paths and insert your API key.
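
As an alternative to hard-coded paths, the folders can be derived from the home directory so that the same cell runs on Windows and Mac. The following is a minimal sketch, not part of the original notebook: `default_folders`, `read_api_key`, and the environment variable name `IEEE_API_KEY` are hypothetical names chosen for illustration.

```python
import os
from pathlib import Path

def default_folders(base_dir=None):
    """Return (csv_folder, summary_folder) under base_dir, defaulting to ~/Documents/csvs."""
    base = Path(base_dir) if base_dir else Path.home() / "Documents" / "csvs"
    return base / "ieee_csv", base / "summaries"

def read_api_key(env_var="IEEE_API_KEY"):
    """Return the key stored in the (hypothetical) environment variable, or '' if unset."""
    return os.environ.get(env_var, "")

csv_folder, summary_folder = default_folders()
api_key = read_api_key()
print(csv_folder, summary_folder, "key set" if api_key else "key missing")
```

Reading the key from an environment variable keeps it out of the notebook file, which helps if the notebook is shared.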

In [None]:
# === SECTION: USER SETUP (PC/Windows) ===
# Keep this block active on Windows and edit the paths to match your setup
csv_folder = r"C:\Users\YOUR_USERNAME\Documents\csvs\ieee_csv"
summary_folder = r"C:\Users\YOUR_USERNAME\Documents\csvs\summaries"

# === SECTION: USER SETUP (Mac) ===
# On a Mac, comment out the Windows block above, then uncomment and edit these lines
# csv_folder = r"/Users/YOUR_USERNAME/Documents/csvs/ieee_csv"
# summary_folder = r"/Users/YOUR_USERNAME/Documents/csvs/summaries"

# === SECTION: API KEY (both platforms) ===
api_key = "YOUR_IEEE_API_KEY"  # Replace with your IEEE API key (https://developer.ieee.org/)

# === SECTION: FOLDER CREATION AND CHECK ===
import os

os.makedirs(csv_folder, exist_ok=True)
os.makedirs(summary_folder, exist_ok=True)
missing = []
if not api_key or api_key == "YOUR_IEEE_API_KEY":
    missing.append("API key")
if not os.path.isdir(csv_folder):
    missing.append("CSV folder")
if not os.path.isdir(summary_folder):
    missing.append("Summary folder")

if missing:
    print(f"⚠️ WARNING: Please check the following: {', '.join(missing)}")
else:
    print("✅ Output folders and API key are set up and ready.")


# 2. Test and adjust your keyword strategy

The four cells below help you test different keyword groups and their combinations.
- 2.1. Define your keyword groups and the exclusion group using AND/OR rules, then define the logic that combines them
- 2.2. See the number of results returned for each keyword group and for the combined query
- 2.3. See the first 10 titles for each keyword group
- 2.4. See the first 10 titles for the combined query
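
To illustrate how groups combine into one query, here is a small sketch. The helper `build_query` and the keywords are made-up placeholders, not part of the notebook. It wraps every group in parentheses so that mixed AND/OR terms inside a group are evaluated as written. How the IEEE API resolves unparenthesized mixed operators is not confirmed here, so explicit parentheses are the safer assumption.

```python
def build_query(include_groups, exclude_group=None):
    """Wrap each group in parentheses, join with AND, then append a NOT clause."""
    query = " AND ".join(f"({group})" for group in include_groups)
    if exclude_group:
        query += f" NOT ({exclude_group})"
    return query

# Placeholder keywords for illustration only
example = build_query(
    ['"machine learning" OR "deep learning"', '(radar OR lidar) AND detection'],
    exclude_group='survey OR review',
)
print(example)
```

The result has the same shape as the `logic` template used in this notebook: included groups joined by AND, followed by the exclusion group.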

In [None]:
# === SECTION: Define Groups, Logic, and Year Filter ===

groups = {
    'group1': 'keyword OR keyword',
    'group2': 'keyword OR keyword AND keyword', # add more keyword groups if needed
    'excluded': 'NOT (keyword OR keyword)'
}

logic = "({group1}) AND ({group2}) {excluded}" # make sure to add in any additional keyword groups
combined_query = logic.format(**groups)

year_from = 2016  # earliest publication year to include; adjust as needed
year_to = datetime.now().year

print(f"Keyword groups and logic defined.\nYear filter: {year_from}-{year_to}")
print("Combined IEEE query:", combined_query)

In [None]:
# === SECTION: Run API Query and Return Total Results for Each Group and Combined Query ===

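def run_ieee_query(query, api_key, year_from=None, year_to=None, max_records=1):
    """Query the IEEE Xplore API and return (total_records, list_of_articles)."""
    base_url = "https://ieeexploreapi.ieee.org/api/v1/search/articles"
    params = {
        'apikey': api_key,
        'format': 'json',
        'max_records': max_records,
        'querytext': query
    }
    if year_from:
        params['start_year'] = year_from
    if year_to:
        params['end_year'] = year_to
    # A timeout stops the cell from hanging if the API does not respond
    response = requests.get(base_url, params=params, timeout=30)
    # Raise on HTTP errors (e.g. a rejected API key) instead of failing later on the JSON
    response.raise_for_status()
    data = response.json()
    total = int(data.get('total_records', 0))
    articles = data.get('articles', [])
    return total, articles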

print("Year filter applied to all queries:", f"{year_from}-{year_to}\n")
print("="*50)
print("INDIVIDUAL GROUP RESULTS:")
print("="*50)
group_results = {}
for name, query in groups.items():
    if name != 'excluded':
        count, _ = run_ieee_query(query, api_key, year_from, year_to)
        group_results[name] = count
        print(f"{name.upper():<25}: {count} results")

print("\n" + "="*50)
print("COMBINED LOGIC RESULTS:")
print("="*50)
print(f"Logic: {logic}\n")
print(f"Combined query: {combined_query}")
combined_count, combined_articles = run_ieee_query(combined_query, api_key, year_from, year_to)
print(f"Combined results: {combined_count}")


The next block will show the first 10 titles for each keyword group (except the excluded keyword group).

Based on this, you can go back and adjust your keyword groups.

In [None]:
# === SECTION: Show First 10 Titles for Each Keyword Group (Except NOT Group) ===

for name, query in groups.items():
    if name == 'excluded':
        continue
    print(f"\n{'='*30}\n{name.upper()} (First 10 Titles):\n{'='*30}")
    _, articles = run_ieee_query(query, api_key, year_from, year_to, max_records=10)
    for i, article in enumerate(articles, 1):
        print(f"{i}. {article.get('title', 'No Title')}")


The next block will show the first 10 titles for the combined query.

Based on the results you can go back and adjust your groups and logic.

In [None]:
# === SECTION: Show First 10 Titles for Combined Keyword Group ===

print(f"\n{'='*30}\nCOMBINED QUERY (First 10 Titles):\n{'='*30}")
_, articles = run_ieee_query(combined_query, api_key, year_from, year_to, max_records=10)
for i, article in enumerate(articles, 1):
    print(f"{i}. {article.get('title', 'No Title')}")


# 3. Export IEEE Results to CSV

The cell below uses your combined query to download the matching records and save them to a CSV file containing the first author, title, abstract, year, and DOI of each record. It also appends a row to the summary table recording the number of records found and downloaded, the source, the final query, and a timestamp for record-keeping purposes.
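
If the combined query matches more records than the API returns in one response, the download can be split into pages. The sketch below assumes that the IEEE Xplore API caps `max_records` at 200 per request and accepts a 1-based `start_record` offset; check both against the current API documentation. `fetch_all` and `fake_fetch_page` are hypothetical helpers, and the fake page function stands in for the real request so the example runs without an API key.

```python
def fetch_all(fetch_page, max_records=1000, page_size=200):
    """Collect up to max_records articles by requesting successive pages.

    fetch_page(start_record, page_size) must return (total_found, articles);
    start_record is 1-based.
    """
    collected = []
    total_found = 0
    start = 1
    while len(collected) < max_records:
        wanted = min(page_size, max_records - len(collected))
        total_found, page = fetch_page(start, wanted)
        if not page:
            break  # nothing more to download
        collected.extend(page)
        start += len(page)
        if start > total_found:
            break  # every matching record has been fetched
    return total_found, collected

# Stand-in for the API: 450 fake records, so the sketch runs offline
def fake_fetch_page(start_record, page_size):
    records = [{"title": f"Paper {i}"} for i in range(1, 451)]
    return len(records), records[start_record - 1:start_record - 1 + page_size]

total, articles = fetch_all(fake_fetch_page, max_records=1000)
print(total, len(articles))
```

In real use, `fetch_page` would wrap the request made in `run_ieee_query` and pass the offset as an extra `start_record` parameter.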

In [None]:
# === SECTION: Download CSV and Update Summary ===

import os
import csv
from datetime import datetime

def extract_first_author(authors_field):
    if isinstance(authors_field, dict) and 'authors' in authors_field:
        authors_list = authors_field.get('authors', [])
        if isinstance(authors_list, list) and authors_list:
            first = authors_list[0]
            if isinstance(first, dict):
                return first.get('full_name', '') or first.get('name', '')
            elif isinstance(first, str):
                return first
    elif isinstance(authors_field, list) and authors_field:
        first = authors_field[0]
        if isinstance(first, dict):
            return first.get('full_name', '') or first.get('name', '')
        elif isinstance(first, str):
            return first
    elif isinstance(authors_field, dict):
        return authors_field.get('full_name', '') or authors_field.get('name', '')
    elif isinstance(authors_field, str):
        return authors_field
    return ''

def get_next_versioned_filename(folder, base_name="ieee_csv", ext=".csv"):
    i = 1
    while True:
        filename = f"{base_name}_v{i}{ext}"
        filepath = os.path.join(folder, filename)
        if not os.path.exists(filepath):
            return filename, filepath, i
        i += 1

def ensure_newline_at_end(filepath):
    """Ensures the file ends with a newline before appending."""
    if os.path.isfile(filepath):
        with open(filepath, 'rb+') as f:
            f.seek(-1, os.SEEK_END)
            last_char = f.read(1)
            if last_char != b'\n':
                f.write(b'\n')

def export_ieee_to_csv_and_update_summary(query, csv_folder, summary_folder, api_key, year_from, year_to, max_records=1000):
    # Download articles
    total_found, articles = run_ieee_query(query, api_key, year_from, year_to, max_records=max_records)
    actual_downloaded = len(articles)
    # Get next versioned CSV name and version number
    csv_name, csv_path, version_number = get_next_versioned_filename(csv_folder, base_name="ieee_csv", ext=".csv")
    # Write to CSV
    with open(csv_path, 'w', newline='', encoding='utf-8') as csvfile:
        writer = csv.writer(csvfile)
        writer.writerow(['first_author', 'title', 'abstract', 'year', 'doi'])
        for article in articles:
            first_author = extract_first_author(article.get('authors'))
            title = article.get('title', '')
            abstract = article.get('abstract', '')
            year = article.get('publication_year', '')
            doi = article.get('doi', '')
            writer.writerow([first_author, title, abstract, year, doi])
    print(f"✅ Exported {actual_downloaded} of {total_found} found records to {csv_path}")

    # Prepare versioned source name for summary row
    source_name = f"ieee v{version_number}"

    # Format timestamp as YYYY-MM-DDTHH:MM
    timestamp = datetime.now().strftime('%Y-%m-%dT%H:%M')

    # Update summary CSV: always append, never overwrite, never repeat header
    summary_csv_path = os.path.join(summary_folder, "summary_csv.csv")
    file_exists = os.path.isfile(summary_csv_path)
    # An existing but empty file still needs the header row
    needs_header = not file_exists or os.path.getsize(summary_csv_path) == 0
    # Ensure a non-empty file ends with a newline before appending
    if not needs_header:
        ensure_newline_at_end(summary_csv_path)
    with open(summary_csv_path, 'a', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        if needs_header:
            writer.writerow(['source', 'found', 'downloaded', 'keyword_combination', 'date'])
        writer.writerow([source_name, total_found, actual_downloaded, query, timestamp])
    print("✅ Summary row updated.")

# Example usage:
export_ieee_to_csv_and_update_summary(
    combined_query, csv_folder, summary_folder, api_key, year_from, year_to, max_records=1000
)


# IEEE Literature Search Strategy Completed

If all cells have run successfully (once or several times), you should have received a confirmation message for each one and have at least one CSV named ieee_csv_v(n).csv in the CSV folder defined at the start. Note that every download creates a new version, following the naming convention v1, v2, v3, etc. The summary table should also contain one row for each download you made.
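
To check the record of downloads, the summary table can be read back with the standard `csv` module. The sketch below is illustrative only: `read_summary` is a hypothetical helper, and the demo rows are made-up values written to a temporary file that mimics the summary layout used in this notebook.

```python
import csv
import os
import tempfile

def read_summary(summary_csv_path):
    """Return the summary rows as a list of dicts keyed by the header row."""
    with open(summary_csv_path, newline='', encoding='utf-8') as f:
        return list(csv.DictReader(f))

# Demo with a throwaway file; the values are invented for illustration
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "summary_csv.csv")
    with open(demo_path, 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(['source', 'found', 'downloaded', 'keyword_combination', 'date'])
        writer.writerow(['ieee v1', 120, 120, '(a) AND (b)', '2024-01-01T09:00'])
        writer.writerow(['ieee v2', 95, 95, '(a) AND (c)', '2024-01-02T09:00'])
    rows = read_summary(demo_path)

for row in rows:
    print(f"{row['source']}: downloaded {row['downloaded']} of {row['found']}")
```

To inspect your own downloads, pass the path of `summary_csv.csv` in your summary folder to `read_summary`.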