GraphQL Coverage is a CLI tool that analyzes your GraphQL schema and queries to report on field usage and coverage, generating detailed CSV reports and visual charts.

GraphQL Coverage

Overview

GraphQL Coverage measures how much of your GraphQL schema your queries actually use. It compares the fields defined in the schema against the fields referenced in your queries, then reports the coverage percentage and the fields that no query touches.

The Jupyter Notebook (graphql_coverage.ipynb) is a playground for exploratory analysis. The primary interface is the command-line script (graphql_coverage.py), and this README covers its installation, usage, and features.

Features

  • Comprehensive Field Extraction: Extracts all fields or only leaf fields from your GraphQL schema.
  • Query Analysis: Parses and analyzes GraphQL queries to determine field usage.
  • Coverage Calculation: Computes the percentage of schema fields utilized by queries.
  • Detailed Reporting: Generates comprehensive reports in CSV format and visualizes coverage with charts.
  • Configurable Depth: Allows aggregation of fields at specified depths for streamlined reporting.
  • Normalization Option: Supports case-insensitive comparison of field names.

Installation

  1. Clone the Repository

    git clone https://github.com/pligor/graphql-coverage.git
    cd graphql-coverage
  2. Install Dependencies

    Ensure you have Python 3.7 or higher installed. Install the required Python packages using pip:

    pip install -r requirements.txt

Usage

Run the graphql_coverage.py script from the command line. The options and examples below cover the common cases.

Command-Line Interface

Available Options

| Option | Description | Default |
|---|---|---|
| --schema_path | Path to the GraphQL schema file. | GraphQLClients/spaceXplayground/schema.graphql |
| --queries_path | Path to the directory containing GraphQL queries. | GraphQLClients/spaceXplayground/Queries |
| --only_leafs | If set, only leaf fields will be considered. | False |
| --depth | Depth for reporting coverage. Aggregates fields at this level. | 1 |
| --normalize_field_names | If set, field names will be normalized (case-insensitive). | False |
| --csv_path | Path to the CSV file for the coverage report. | schema_coverage_report.csv |
| --plot_path | Path to the plot file for the coverage chart. | schema_coverage_chart.png |
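
To make the option semantics concrete, here is a minimal sketch of how these flags could be declared with Python's argparse. It only mirrors the table above. The function name build_parser and the help strings are assumptions, and the actual script may declare its options differently.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented options and defaults."""
    parser = argparse.ArgumentParser(description="Report GraphQL schema coverage.")
    parser.add_argument("--schema_path",
                        default="GraphQLClients/spaceXplayground/schema.graphql",
                        help="Path to the GraphQL schema file.")
    parser.add_argument("--queries_path",
                        default="GraphQLClients/spaceXplayground/Queries",
                        help="Directory containing GraphQL queries.")
    # Boolean flags default to False and become True when present.
    parser.add_argument("--only_leafs", action="store_true",
                        help="Consider only leaf fields.")
    parser.add_argument("--depth", type=int, default=1,
                        help="Depth at which fields are aggregated in the report.")
    parser.add_argument("--normalize_field_names", action="store_true",
                        help="Compare field names case-insensitively.")
    parser.add_argument("--csv_path", default="schema_coverage_report.csv",
                        help="Output path for the CSV report.")
    parser.add_argument("--plot_path", default="schema_coverage_chart.png",
                        help="Output path for the coverage chart.")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["--only_leafs", "--depth", "2"])
print(args.only_leafs, args.depth, args.csv_path)  # True 2 schema_coverage_report.csv
```

Options that are not passed keep the defaults listed in the table, which is why csv_path still points at schema_coverage_report.csv here.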

Examples

  1. Basic Usage

    Analyze the default schema and queries directory, extracting all fields:

    python graphql_coverage.py
  2. Only Leaf Fields

    Focus the analysis on leaf fields:

    python graphql_coverage.py --only_leafs
  3. Specify Custom Paths

    Provide custom paths for the schema and queries:

    python graphql_coverage.py --schema_path path/to/schema.graphql --queries_path path/to/queries/
  4. Normalize Field Names and Adjust Depth

    Normalize field names for case-insensitive comparison and aggregate fields at depth 2:

    python graphql_coverage.py --normalize_field_names --depth 2
  5. Custom Report and Plot Paths

    Define custom output paths for the CSV report and coverage chart:

    python graphql_coverage.py --csv_path output/report.csv --plot_path output/chart.png

Output

Upon execution, the script performs the following steps:

  1. Schema Loading

    • Loads the entire GraphQL schema from the specified file.
    • Extracts all fields or only leaf fields based on the --only_leafs flag.
  2. Query Loading

    • Recursively searches the specified directory for all .graphql query files.
    • Reads each query, storing its file path and content.
  3. Field Usage Extraction

    • Parses each query, handling fragments, and extracts hierarchical field names.
    • Counts how many queries each field appears in.
  4. Coverage Calculation

    • Compares the extracted schema fields against the fields used in queries.
    • Calculates the coverage percentage.
  5. Report Generation

    • Generates a CSV report detailing field usage and coverage.
    • Creates a visual chart representing the coverage.

After a successful run, the CSV report and the coverage chart are written to the paths given by --csv_path and --plot_path. With the defaults, these are schema_coverage_report.csv and schema_coverage_chart.png.
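
The query loading, usage counting, and coverage steps can be sketched with the standard library alone. This is an illustration under stated assumptions, not the script's actual code. The function names are hypothetical, fields are assumed to be represented as dotted Type.field strings, and the parsing of schema and queries into such strings is left out.

```python
from collections import Counter
from pathlib import Path


def load_queries(queries_dir):
    """Return {file path: query text} for every .graphql file under queries_dir."""
    return {
        str(path): path.read_text(encoding="utf-8")
        for path in sorted(Path(queries_dir).rglob("*.graphql"))
    }


def count_field_usage(fields_per_query):
    """Count how many queries each field appears in (at most once per query)."""
    usage = Counter()
    for fields in fields_per_query:
        usage.update(set(fields))
    return usage


def coverage_percentage(schema_fields, usage, normalize=False):
    """Percentage of schema fields that appear in at least one query."""
    def key(name):
        # Optional case-insensitive comparison, as with --normalize_field_names.
        return name.lower() if normalize else name

    schema = {key(field) for field in schema_fields}
    used = {key(field) for field in usage}
    if not schema:
        return 0.0
    return 100.0 * len(schema & used) / len(schema)


# Hypothetical data standing in for the output of the parsing steps.
schema_fields = ["Query.launches", "Launch.id", "Launch.missionName", "Rocket.name"]
fields_per_query = [
    ["Query.launches", "Launch.id"],
    ["Query.launches", "Launch.missionName", "Launch.missionName"],
]
usage = count_field_usage(fields_per_query)
print(usage["Query.launches"])                    # 2
print(coverage_percentage(schema_fields, usage))  # 75.0
```

Counting each field once per query keeps a field that is repeated inside one query from inflating its usage count.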

Coverage Statistics Calculator

The calculate_csv_coverage_stats.py script provides aggregate statistics across multiple CSV coverage reports. This post-processing tool is useful when you have generated multiple coverage reports (e.g., for different GraphQL clients) and want to analyze overall coverage metrics.

Purpose

The script reads each CSV report, computes its coverage fraction, and then combines the per-file results into two aggregate statistics. This lets you compare coverage across different schemas or clients.

How It Works

For each CSV file, the script:

  • Counts rows where the Covered column is True (numerator)
  • Counts the total number of data rows excluding the header (denominator)
  • Calculates the coverage fraction: numerator / denominator
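
The per-file counting can be sketched as follows. The script itself relies on pandas; this sketch swaps in the standard csv module so it runs without extra packages. Only the Covered column name comes from this README. The Field column, the sample rows, and the function name are hypothetical.

```python
import csv
import io


def coverage_counts(csv_text):
    """Return (covered, total) for one coverage report given as CSV text."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # DictReader consumes the header, so len(rows) counts data rows only.
    covered = sum(1 for row in rows if row["Covered"].strip() == "True")
    return covered, len(rows)


# Hypothetical report content: only the Covered column is documented.
sample = (
    "Field,Covered\n"
    "Launch.id,True\n"
    "Launch.missionName,False\n"
    "Rocket.name,True\n"
    "Rocket.mass,False\n"
)
covered, total = coverage_counts(sample)
print(covered, total, covered / total)  # 2 4 0.5
```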

The script then computes two aggregate statistics:

  1. Average of all fractions: Sum of all individual fractions divided by the number of CSV files
  2. Overall fraction: Sum of all numerators divided by the sum of all denominators
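
As a worked example, the two aggregates can be computed from the (covered, total) pairs that appear in the sample output in this README. The variable names are illustrative only.

```python
# (covered, total) pairs taken from the sample output in this README.
reports = {
    "ExternalAuthClient": (65, 260),
    "ExternalPublicClient": (15, 35),
    "InternalClient": (156, 270),
}

# 1. Average of all fractions: every file counts equally.
fractions = [covered / total for covered, total in reports.values()]
average_of_fractions = sum(fractions) / len(fractions)

# 2. Overall fraction: pool all rows, so larger files weigh more.
total_covered = sum(covered for covered, _ in reports.values())
total_entries = sum(total for _, total in reports.values())
overall_fraction = total_covered / total_entries

print(f"{average_of_fractions:.4f}")                                # 0.4188
print(f"{overall_fraction:.4f} ({total_covered}/{total_entries})")  # 0.4177 (236/565)
```

The two numbers differ because the small ExternalPublicClient report (35 rows) counts as much as the larger ones in the average, but contributes only 35 of the 565 rows in the pooled fraction.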

Prerequisites

  • Python 3.x
  • pandas library (install via pip install pandas)

Usage

Basic Usage (Default Directory)

By default, the script processes CSV files in the results/csv/ directory:

python calculate_csv_coverage_stats.py

Custom Directory

You can specify a custom directory containing CSV files:

python calculate_csv_coverage_stats.py results/csv

Or use an absolute path:

# Windows
python calculate_csv_coverage_stats.py "C:\path\to\csv\directory"

# Linux/Mac
python calculate_csv_coverage_stats.py /path/to/csv/directory

Output

The script prints detailed statistics for each CSV file and aggregate statistics at the end:

Processing 3 CSV file(s)...

ExternalAuthClient_schema_coverage_report.csv:
  Covered entries: 65
  Total entries: 260
  Fraction: 0.2500 (65/260)

ExternalPublicClient_schema_coverage_report.csv:
  Covered entries: 15
  Total entries: 35
  Fraction: 0.4286 (15/35)

InternalClient_schema_coverage_report.csv:
  Covered entries: 156
  Total entries: 270
  Fraction: 0.5778 (156/270)

==================================================
STATISTICS:
==================================================
1. Average of all fractions: 0.4188
2. Overall fraction (sum of numerators / sum of denominators):
   0.4177 (236/565)
==================================================

Understanding the Statistics

  • Average of all fractions: This metric treats each CSV file equally, regardless of size. It's useful when you want to see the average coverage across different schemas or clients.
  • Overall fraction: This metric weights each file by its size. It represents the true overall coverage when considering all fields across all files together.

Notes

  • The script automatically excludes lock files (files whose names start with .~lock).
  • Files with no data rows (denominator = 0) are skipped with a warning.
  • An error in one file does not stop the run; the script continues with the remaining files.
  • The script checks that the provided path exists and is a directory.
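
The directory check and lock-file filter described in these notes might look like the sketch below. The function name, the choice of exception, and the *.csv glob pattern are assumptions. The script's own implementation may differ.

```python
import tempfile
from pathlib import Path


def list_report_files(directory):
    """Return the CSV files in `directory`, skipping editor lock files."""
    root = Path(directory)
    if not root.is_dir():
        # Covers both a missing path and a path that is not a directory.
        raise NotADirectoryError(f"Not a valid directory: {directory}")
    return sorted(
        path
        for path in root.glob("*.csv")
        if not path.name.startswith(".~lock")
    )


# Demonstrate on a throwaway directory with one report and one lock file.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "client_a.csv").write_text("Field,Covered\n")
    (Path(tmp) / ".~lock.client_a.csv").write_text("")
    found = [path.name for path in list_report_files(tmp)]

print(found)  # ['client_a.csv']
```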

Jupyter Notebook Playground

The repository includes a Jupyter Notebook (graphql_coverage.ipynb) for experimenting with the coverage analysis interactively. The CLI script is intended for regular use. The notebook walks through each step of the process, with comments and cell outputs showing the intermediate results.

Contributing

Contributions are welcome! If you encounter issues, have questions, or want to suggest improvements, please raise an issue on GitHub.

License

This project is licensed under the GNU Affero General Public License v3.0.

Acknowledgements

Thank you for using GraphQL Coverage! Your feedback and contributions help improve the tool for everyone.
