
## PMID Citation Count Generator

This script fetches citation counts for a list of PubMed IDs (PMIDs) from the Europe PMC API and writes them to an Excel report.

### Key Features:

1.  **Flexible PMID Input**: You can provide PMIDs in two ways:
    *   **Manual String**: Paste a comma-separated string of PMIDs directly into the `pmid_string_manual` variable within the script.
    *   **File Upload**: If the manual string is left empty, the script will prompt you to upload a plain text file (`.txt`) containing PMIDs. This file can have PMIDs separated by commas on a single line, or one PMID per line.
2.  **Europe PMC API Integration**: The script queries the Europe PMC API to retrieve the `citedByCount` for each PMID. If a request fails or the response cannot be parsed, the PMID is still listed, with `Error` or `Parsing Error` in place of the count; a PMID with no matching record gets `N/A`. The script pauses 0.2 seconds between requests to avoid hitting rate limits.
3.  **Excel Report Generation**: All collected data (PMID and its citation count) is compiled into a pandas DataFrame, which is then saved as an Excel file (`citations.xlsx`).
4.  **Automatic Download**: The generated Excel file is automatically downloaded to your local machine upon completion.
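
The lookup in feature 2 can be sketched without a network call. The sample payloads below are made-up values trimmed to the fields the script reads, not real API output, and `extract_citation_count` is a hypothetical helper name, not a function defined in the script.

```python
def extract_citation_count(payload):
    """Return citedByCount from a Europe PMC search payload, or None if absent."""
    results = payload.get("resultList", {}).get("result", [])
    if not results:
        return None  # the search matched no record
    # Use the first hit, as the script does; .get() tolerates a missing field
    return results[0].get("citedByCount")


# Illustrative payloads (assumed shape, invented values)
found = {"resultList": {"result": [{"id": "12345678", "citedByCount": 7}]}}
missing = {"resultList": {"result": []}}

print(extract_citation_count(found))    # 7
print(extract_citation_count(missing))  # None
```

A `None` result here corresponds to the `N/A` value the script writes to the report.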

### How to Use:

1.  **Configure Input**: Decide whether to use manual input or file upload.
    *   **For Manual Input**: Locate the `pmid_string_manual` variable in the code cell below and replace the empty string with your comma-separated PMIDs (e.g., `pmid_string_manual = "12345678,98765432"`).
    *   **For File Upload**: Ensure `pmid_string_manual` is an empty string (`""`). When you run the code cell, a file upload dialog will appear. Select your `.txt` file containing PMIDs.
2.  **Run the Code**: Execute the code cell provided below.
3.  **Monitor Progress**: The script prints a progress line after every 50 PMIDs and again after the last one. Request and parsing errors are printed as they occur.
4.  **Receive Output**: Once all data is fetched and the Excel file is generated, it will be automatically downloaded to your browser's default download location. A confirmation message will also be printed.
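
The input formats described in step 1 can be handled by a single parsing routine. The following is a minimal sketch, not the script's own code: `parse_pmids` is a hypothetical helper, and the digits-only check is an added assumption (the script itself accepts any non-empty token).

```python
import re


def parse_pmids(text):
    """Split text on commas and/or whitespace; separate digit-only tokens from the rest."""
    tokens = [t for t in re.split(r"[,\s]+", text) if t]
    valid = [t for t in tokens if t.isdigit()]         # assumption: PMIDs are all digits
    rejected = [t for t in tokens if not t.isdigit()]  # kept so typos can be reported
    return valid, rejected


valid, rejected = parse_pmids("12345678, 98765432\n31580267,\nabc")
print(valid)     # ['12345678', '98765432', '31580267']
print(rejected)  # ['abc']
```

Returning the rejected tokens separately makes it possible to warn about a malformed entry before any API request is sent.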

This script is self-contained and ready for immediate use. For future reference, you can save this notebook to your Google Drive or GitHub.
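
The script records `Error` for a PMID whose request fails and moves on to the next one. If transient network failures are a concern, one option, which is not part of the script, is to retry with an increasing delay. The sketch below assumes a caller-supplied `fetch` function; `fetch_with_retry`, `flaky_fetch`, and the default delays are hypothetical names and values chosen for illustration.

```python
import time


def fetch_with_retry(fetch, pmid, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call fetch(pmid); on an exception, wait and retry, up to `attempts` calls."""
    for attempt in range(1, attempts + 1):
        try:
            return fetch(pmid)
        except Exception:
            if attempt == attempts:
                raise  # out of retries: let the caller record 'Error'
            sleep(base_delay * 2 ** (attempt - 1))  # 0.5 s, 1 s, 2 s, ...


# Demonstration with a stand-in fetcher that fails twice, then succeeds
calls = {"n": 0}


def flaky_fetch(pmid):
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("simulated transient failure")
    return 7  # made-up citation count


waits = []  # collect the delays instead of actually sleeping
result = fetch_with_retry(flaky_fetch, "12345678", sleep=waits.append)
print(result, waits)  # 7 [0.5, 1.0]
```

Passing `sleep` as a parameter keeps the demonstration instant; in real use the default `time.sleep` applies.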





In [None]:
import requests # Library for making HTTP requests to APIs
import time     # Library for adding delays to prevent rate limiting
import pandas as pd # Library for data manipulation and creating DataFrames
from google.colab import files # Colab-specific utility for file uploads and downloads

# --- Configuration Section ---
# Option 1: Manual PMID Input (Prioritized if not empty)
# Paste your comma-separated PMIDs here.
# If this string is not empty, it will be used instead of prompting for a file upload.
# Example: pmid_string_manual = "31580267,31581173,31595038"
pmid_string_manual = "" # Leave this empty to use the file upload option

# Europe PMC search endpoint (do not change unless the API structure changes)
# EXT_ID matches the record's external ID; SRC:MED restricts the match to PubMed/MEDLINE
# records, so an identical ID from another source is not returned instead.
EUROPE_PMC_API_BASE_URL = "https://www.ebi.ac.uk/europepmc/webservices/rest/search?query=EXT_ID:{}%20AND%20SRC:MED&resulttype=core&format=json"

# --- Script Logic ---
pmids_list = [] # Initialize an empty list to store parsed PMIDs

# Prioritize manual input if pmid_string_manual is not empty
if pmid_string_manual.strip():
    print("Using PMIDs from manual input.")
    # Split the manual string by comma and clean up each PMID (remove whitespace)
    pmids_list = [pmid.strip() for pmid in pmid_string_manual.split(',') if pmid.strip()]
else:
    # If manual input is empty, prompt the user to upload a text file
    print("Manual PMID string is empty. Please upload a text file containing PMIDs (either comma-separated or one per line).")

    # Display a file upload widget for the user
    uploaded = files.upload()

    # Check if any file was uploaded
    if not uploaded:
        print("No file was uploaded. Exiting.")
    else:
        # Read every uploaded file (usually just one) and collect its PMIDs
        for file_name, raw_bytes in uploaded.items():
            print(f"User uploaded file: '{file_name}'")
            # Decode bytes to text; 'utf-8-sig' also drops a leading byte-order mark
            pmid_content = raw_bytes.decode('utf-8-sig')
            # Treat commas as line breaks, then split on any whitespace, so
            # comma-separated, one-per-line, and mixed files all parse correctly
            pmids_list.extend(pmid_content.replace(',', '\n').split())

        if not pmids_list:
            print("Uploaded file contained no PMIDs. Exiting.")

# Proceed only if PMIDs were successfully parsed (either manually or from file)
if not pmids_list:
    print("No PMIDs were found to process. Please ensure you provide PMIDs either manually or via a file.")
else:
    print(f"\nTotal PMIDs to process: {len(pmids_list)}")
    # Print a preview of the parsed PMIDs
    if len(pmids_list) > 5:
        print(f"First 5 parsed PMIDs: {pmids_list[:5]}")
    else:
        print(f"Parsed PMIDs: {pmids_list}")

    citation_data = [] # Initialize a list to store dictionaries of PMID and citation counts

    # Fetch citation counts for each PMID from the Europe PMC API
    print("\nFetching citation counts from Europe PMC API...")
    for i, pmid in enumerate(pmids_list):
        api_url = EUROPE_PMC_API_BASE_URL.format(pmid) # Construct the API URL for the current PMID
        try:
            response = requests.get(api_url, timeout=30) # Make the API request; time out if the server stalls for 30 s
            response.raise_for_status()  # Raise an exception for HTTP errors (4xx or 5xx)
            data = response.json()       # Parse the JSON response

            citation_count = None
            # Extract 'citedByCount' from the JSON response
            # The path is typically data['resultList']['result'][0]['citedByCount']
            if 'resultList' in data and 'result' in data['resultList'] and len(data['resultList']['result']) > 0:
                citation_count = data['resultList']['result'][0].get('citedByCount')

            # Append the PMID and its citation count to our data list
            citation_data.append({'PMID': pmid, 'Citation Count': citation_count if citation_count is not None else 'N/A'})

            # Optional: print progress for large lists every 50 PMIDs or at the end
            if (i + 1) % 50 == 0 or (i + 1) == len(pmids_list):
                print(f"Processed {i + 1}/{len(pmids_list)} PMIDs. Last fetched PMID {pmid}: {citation_count if citation_count is not None else 'N/A'}")

        except requests.exceptions.RequestException as e:
            # Handle network-related errors during API request
            print(f"Error fetching data for PMID {pmid}: {e}")
            citation_data.append({'PMID': pmid, 'Citation Count': 'Error'})
        except (KeyError, IndexError, ValueError) as e:
            # Handle missing keys, plus the ValueError that response.json() can raise on a malformed body
            print(f"Error parsing JSON for PMID {pmid}: {e}")
            citation_data.append({'PMID': pmid, 'Citation Count': 'Parsing Error'})

        time.sleep(0.2)  # Introduce a small delay to avoid hitting API rate limits

    print("\nFinished fetching all citation counts.")

    # Create a pandas DataFrame from the collected citation data
    df_citations = pd.DataFrame(citation_data)
    output_filename = 'citations.xlsx' # Define the output Excel file name

    # Save the DataFrame to an Excel file, without including the pandas index
    df_citations.to_excel(output_filename, index=False)

    # Print a preview of the DataFrame and confirm file creation
    print(f"\nDataFrame created (first 5 rows):\n{df_citations.head()}")
    print(f"Citation data saved to '{output_filename}'.")

    # Automatically download the generated Excel file to the user's local machine
    files.download(output_filename)
    print(f"'{output_filename}' has been downloaded to your local machine.")

Manual PMID string is empty. Please upload a text file containing PMIDs (either comma-separated or one per line).
