GraphQL Coverage is a powerful tool designed to assess how extensively your GraphQL schema is utilized by your queries. By analyzing the fields defined in your schema and comparing them against the fields used in your queries, this tool provides valuable insights into the coverage and potential gaps within your GraphQL implementation.
While the Jupyter Notebook (graphql_coverage.ipynb) serves as a playground for exploratory analysis, the primary interface for users is the command-line script (graphql_coverage.py). This README will guide you through the installation, usage, and functionalities of the CLI tool.
- Comprehensive Field Extraction: Extracts all fields or only leaf fields from your GraphQL schema.
- Query Analysis: Parses and analyzes GraphQL queries to determine field usage.
- Coverage Calculation: Computes the percentage of schema fields utilized by queries.
- Detailed Reporting: Generates comprehensive reports in CSV format and visualizes coverage with charts.
- Configurable Depth: Allows aggregation of fields at specified depths for streamlined reporting.
- Normalization Option: Supports case-insensitive comparison of field names.
1. Clone the repository:

   ```bash
   git clone https://github.com/pligor/graphql-coverage.git
   cd graphql-coverage
   ```

2. Install dependencies:

   Ensure you have Python 3.7 or higher installed, then install the required packages with pip:

   ```bash
   pip install -r requirements.txt
   ```
The primary tool is the graphql_coverage.py script, which can be executed via the command line. Below is a step-by-step guide to using the script effectively.
| Option | Description | Default |
|---|---|---|
| `--schema_path` | Path to the GraphQL schema file. | `GraphQLClients/spaceXplayground/schema.graphql` |
| `--queries_path` | Path to the directory containing GraphQL queries. | `GraphQLClients/spaceXplayground/Queries` |
| `--only_leafs` | If set, only leaf fields are considered. | `False` |
| `--depth` | Depth for reporting coverage; aggregates fields at this level. | `1` |
| `--normalize_field_names` | If set, field names are normalized (case-insensitive). | `False` |
| `--csv_path` | Path to the CSV file for the coverage report. | `schema_coverage_report.csv` |
| `--plot_path` | Path to the plot file for the coverage chart. | `schema_coverage_chart.png` |
**Basic Usage**

Analyze the default schema and queries directory, extracting all fields:

```bash
python graphql_coverage.py
```

**Only Leaf Fields**

Focus the analysis on leaf fields:

```bash
python graphql_coverage.py --only_leafs
```

**Specify Custom Paths**

Provide custom paths for the schema and queries:

```bash
python graphql_coverage.py --schema_path path/to/schema.graphql --queries_path path/to/queries/
```

**Normalize Field Names and Adjust Depth**

Normalize field names for case-insensitive comparison and aggregate fields at depth 2:

```bash
python graphql_coverage.py --normalize_field_names --depth 2
```

**Custom Report and Plot Paths**

Define custom output paths for the CSV report and coverage chart:

```bash
python graphql_coverage.py --csv_path output/report.csv --plot_path output/chart.png
```
Upon execution, the script performs the following steps:
**1. Schema Loading**

- Loads the entire GraphQL schema from the specified file.
- Extracts all fields, or only leaf fields, depending on the `--only_leafs` flag.
**2. Query Loading**

- Recursively searches the specified directory for all `.graphql` query files.
- Reads each query, storing its file path and content.
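A minimal sketch of this discovery step, assuming nothing beyond the standard library (`load_queries` and the demo directory are invented for illustration):

```python
import tempfile
from pathlib import Path

def load_queries(queries_dir):
    """Map each .graphql file path (found recursively) to its text content."""
    return {str(p): p.read_text() for p in Path(queries_dir).rglob("*.graphql")}

# Demo with a throwaway directory standing in for a real queries folder:
with tempfile.TemporaryDirectory() as d:
    (Path(d) / "launches").mkdir()
    (Path(d) / "launches" / "getLaunches.graphql").write_text("{ launches { id } }")
    queries = load_queries(d)
    print(len(queries))  # 1
```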
**3. Field Usage Extraction**

- Parses each query, handling fragments, and extracts hierarchical field names.
- Counts how many queries each field appears in.
**4. Coverage Calculation**

- Compares the extracted schema fields against the fields used in queries.
- Calculates the coverage percentage.
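Stripped of reporting concerns, the calculation reduces to a set comparison (field names and numbers below are invented):

```python
# Toy coverage computation with invented field names.
schema_fields = {"Query.launches", "Launch.mission_name", "Launch.rocket"}
used_fields = {"Query.launches", "Launch.mission_name"}

covered = schema_fields & used_fields
coverage_pct = 100.0 * len(covered) / len(schema_fields)
print(f"{coverage_pct:.1f}% of schema fields are used")  # 2 of 3 -> 66.7%
```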
**5. Report Generation**

- Generates a CSV report detailing field usage and coverage.
- Creates a visual chart representing the coverage.
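The CSV step can be sketched with the standard `csv` module; apart from `Covered`, which `calculate_csv_coverage_stats.py` relies on, the column names here are assumptions:

```python
import csv

# Sketch of the CSV-report step; "Field" is an assumed column name.
rows = [
    {"Field": "Query.launches", "Covered": True},
    {"Field": "Launch.rocket", "Covered": False},
]
with open("schema_coverage_report.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["Field", "Covered"])
    writer.writeheader()
    writer.writerows(rows)
```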
After successful execution, you will find `schema_coverage_report.csv` and `schema_coverage_chart.png` at your specified output paths.
The calculate_csv_coverage_stats.py script provides aggregate statistics across multiple CSV coverage reports. This post-processing tool is useful when you have generated multiple coverage reports (e.g., for different GraphQL clients) and want to analyze overall coverage metrics.
The script aggregates coverage statistics from multiple CSV files, calculating both per-file metrics and overall aggregated statistics. This helps you understand coverage patterns across different schemas or clients.
For each CSV file, the script:
- Counts rows where the `Covered` column is `True` (numerator)
- Counts the total number of data rows, excluding the header (denominator)
- Calculates the coverage fraction: `numerator / denominator`
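Assuming the boolean `Covered` column described above, the per-file computation could look like this with pandas (`coverage_fraction` is a hypothetical helper, not the script's actual code):

```python
import pandas as pd

def coverage_fraction(csv_path):
    df = pd.read_csv(csv_path)
    numerator = int((df["Covered"] == True).sum())  # rows marked covered
    denominator = len(df)                           # data rows (header excluded)
    return numerator, denominator

# Tiny demo file standing in for a real report:
with open("demo_report.csv", "w") as f:
    f.write("Field,Covered\nQuery.launches,True\nLaunch.rocket,False\n")

num, den = coverage_fraction("demo_report.csv")
print(f"{num}/{den} = {num / den:.4f}")  # 1/2 = 0.5000
```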
The script then computes two aggregate statistics:
- Average of all fractions: Sum of all individual fractions divided by the number of CSV files
- Overall fraction: Sum of all numerators divided by the sum of all denominators
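Using the three sample reports from the example output shown further down, the two aggregates work out as:

```python
# Numerators/denominators taken from the sample output in this README.
numerators = [65, 15, 156]
denominators = [260, 35, 270]

fractions = [n / d for n, d in zip(numerators, denominators)]
average_of_fractions = sum(fractions) / len(fractions)  # each file weighted equally
overall_fraction = sum(numerators) / sum(denominators)  # each field weighted equally

print(f"{average_of_fractions:.4f}")  # 0.4188
print(f"{overall_fraction:.4f}")      # 0.4177
```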
- Python 3.x
- pandas library (install via `pip install pandas`)
By default, the script processes CSV files in the results/csv/ directory:
```bash
python calculate_csv_coverage_stats.py
```

You can specify a custom directory containing CSV files:

```bash
python calculate_csv_coverage_stats.py results/csv
```

Or use an absolute path:

```bash
# Windows
python calculate_csv_coverage_stats.py "C:\path\to\csv\directory"

# Linux/Mac
python calculate_csv_coverage_stats.py /path/to/csv/directory
```

The script prints detailed statistics for each CSV file, followed by aggregate statistics:
```
Processing 3 CSV file(s)...

ExternalAuthClient_schema_coverage_report.csv:
Covered entries: 65
Total entries: 260
Fraction: 0.2500 (65/260)

ExternalPublicClient_schema_coverage_report.csv:
Covered entries: 15
Total entries: 35
Fraction: 0.4286 (15/35)

InternalClient_schema_coverage_report.csv:
Covered entries: 156
Total entries: 270
Fraction: 0.5778 (156/270)

==================================================
STATISTICS:
==================================================
1. Average of all fractions: 0.4188
2. Overall fraction (sum of numerators / sum of denominators):
   0.4177 (236/565)
==================================================
```
- Average of all fractions: This metric treats each CSV file equally, regardless of size. It's useful when you want to see the average coverage across different schemas or clients.
- Overall fraction: This metric weights each file by its size. It represents the true overall coverage when considering all fields across all files together.
- The script automatically excludes lock files (files starting with `.~lock`)
- Files with no data rows (denominator = 0) are skipped with a warning
- Errors in individual files are handled gracefully, allowing the script to continue processing remaining files
- The script validates that the provided directory exists and is a valid directory
The repository includes a Jupyter Notebook (graphql_coverage.ipynb) that serves as an interactive environment for experimenting with the coverage analysis. While the CLI script is intended for regular use, the notebook provides a deeper dive into each step of the process, leveraging comments and outputs to enhance understanding.
Contributions are welcome! If you encounter issues, have questions, or want to suggest improvements, please raise an issue on GitHub.
This project is licensed under the GNU Affero General Public License v3.0.
Thank you for using GraphQL Coverage! Your feedback and contributions help improve the tool for everyone.