A fast and easy-to-use tool that analyzes CSV files for data completeness by checking for missing/invalid values across different column combinations.
- Checks for missing or invalid values across user defined column combinations.
- Saves and loads analysis configurations to and from JSON files for easy reuse.
- Interactive command line interface to guide you through the analysis setup.
- Full command line automation support for scripting and batch processing.
Just let CMake do its magic.
Dependencies (automatically fetched):
- fast-cpp-csv-parser
- nlohmann_json
- cxxopts
- fmt
Run the application without arguments to use the interactive mode:
./csv-completeness-checker
The application will guide you through the following steps:
-
Provide a CSV or JSON file:
- You can start with a
.csvfile to analyze. - Or, you can provide a
.jsonfile with a saved configuration from a previous session.
- You can start with a
-
Select Fields to Analyze:
- Choose the columns from your CSV file that you want to include in the analysis.
-
Define Invalid Values:
- For each selected field, you can specify a comma-separated list of values that should be considered invalid or empty (e.g.,
N/A,Unspecified,-1).
- For each selected field, you can specify a comma-separated list of values that should be considered invalid or empty (e.g.,
-
Define Column Combinations:
- Specify the combinations of columns you want to check for completeness. Use
:to require multiple fields together and/inside a group to indicate alternatives. For example:1:3checks rows where both column 1 and column 3 are valid.1:2:3/4checks rows where columns 1 and 2 are valid and at least one of columns 3 or 4 is valid.
- You can also provide multiple combinations separated by commas (e.g.,
1:2, 1:2:3/4).
- Specify the combinations of columns you want to check for completeness. Use
-
Save Configuration (Optional):
- You can save your configuration (selected fields, invalid values, and combinations) to a JSON file for later use.
-
Process the CSV:
- The tool will then process the CSV file and output the results for each column combination, showing both the raw counts (valid rows / total rows) and the completeness percentage.
Use command line arguments to automate the analysis without interactive prompts:
./csv-completeness-checker [OPTIONS]
Options:
| Argument | Description |
|---|---|
--csv, -c |
Path to CSV file to analyze |
--config, -C |
Path to JSON configuration file (overrides --csv) |
--output, -o |
Path to save output JSON configuration |
--fields, -f |
Comma-separated field numbers to analyze |
--invalid-values, -i |
Invalid values mapping (format: field:value1,value2:field:value3...) |
--combinations, -b |
Column combinations to check (format: 1:2,1:3/4) |
--format |
Output format: text, json, csv, or keyvalue (will also enable quiet mode) |
--silent, -q |
Minimal output (only results) |
--help, -h |
Display help message |