# Brief demonstration of ncompare: comparing the structure, groups, variables, and attributes of two netCDF files

Installation instructions for `ncompare` can be found in either of these locations:

- [GitHub repository](https://github.com/nasa/ncompare)
- [PyPI entry](https://pypi.org/project/ncompare/)

# Command Line Usage

## `ncompare`'s command line arguments, provided by the `--help` description

***✍️ Syntax Note:*** Commands preceded by an exclamation point "!" (which is needed to [run shell commands in a Jupyter notebook](https://stackoverflow.com/a/48529220)) can also be run from a terminal.
In a shell/terminal, the exclamation point should not be used.
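Outside a notebook, the same CLI call can also be made from a plain Python script. The following is a minimal sketch that uses the standard library's `subprocess` and `shutil.which`. The helper name `run_ncompare` is hypothetical and is not part of `ncompare` itself.

```python
import shutil
import subprocess


def run_ncompare(*args: str) -> subprocess.CompletedProcess:
    """Run the ncompare CLI with the given arguments (hypothetical helper).

    This is the script equivalent of `! ncompare ...` in a notebook cell.
    """
    executable = shutil.which("ncompare")  # locate the installed entry point on PATH
    if executable is None:
        raise FileNotFoundError("ncompare was not found on PATH; is it installed?")
    # Passing the arguments as a list means no shell quoting is needed.
    return subprocess.run([executable, *args], capture_output=True, text=True, check=False)
```

For example, `print(run_ncompare("--help").stdout)` prints the same help text that the next cell shows.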

In [1]:
! ncompare --help

usage: ncompare [-h] [--only-diffs] [--file-text FILE_TEXT]
                [--file-csv FILE_CSV] [--file-xlsx FILE_XLSX] [--no-color]
                [--show-attributes] [--show-chunks]
                [--column-widths COLUMN_WIDTHS COLUMN_WIDTHS COLUMN_WIDTHS]
                [--version]
                path_a path_b

Compare the variables contained within two different netCDF datasets

positional arguments:
  path_a                First (netCDF or HDF) file
  path_b                Second (netCDF or HDF) file

options:
  -h, --help            show this help message and exit
  --only-diffs          Only display variables and attributes that are
                        different
  --file-text FILE_TEXT
                        A text file to which the output will be written.
  --file-csv FILE_CSV   A csv (comma separated values) file to which the
                        output will be written.
  --file-xlsx FILE_XLSX
                        An Excel file to which the output will be written.

## Example 1: Two netCDF files with the same groups, variables, and attributes
----

Data files are first defined. The examples here rely on three files from the NOAA National Centers for Environmental Information (NCEI): two from (a) the _[Global Precipitation Climatology Project (GPCP) Climate Data Record (CDR), Monthly V2.3](https://doi.org/10.7289/V56971M6)_ and one from (b) the _[Climate Data Record (CDR) of Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks (PERSIANN-CDR), Version 1 Revision 1](https://doi.org/10.7289/V51V5BWQ)_ (a daily quasi-global precipitation product). The files are accessible via [this GPCP catalog](https://www.ncei.noaa.gov/thredds/catalog/cdr/gpcp_final/2023/catalog.html) and [this PERSIANN catalog](https://www.ncei.noaa.gov/thredds/catalog/cdr/persiann/catalog.html):

1. https://www.ncei.noaa.gov/thredds/catalog/cdr/gpcp_final/2023/catalog.html?dataset=cdr_gpcp_final/2023/gpcp_v02r03_monthly_d202301_c20230411.nc
2. https://www.ncei.noaa.gov/thredds/catalog/cdr/gpcp_final/2023/catalog.html?dataset=cdr_gpcp_final/2023/gpcp_v02r03_monthly_d202302_c20230505.nc
3. https://www.ncei.noaa.gov/thredds/fileServer/cdr/persiann/2023/PERSIANN-CDR_v01r01_20230419_c20231030.nc

In [2]:
from pathlib import Path

file_urls = [
    "https://www.ncei.noaa.gov/thredds/fileServer/cdr/gpcp_final/2023/gpcp_v02r03_monthly_d202301_c20230411.nc",
    "https://www.ncei.noaa.gov/thredds/fileServer/cdr/gpcp_final/2023/gpcp_v02r03_monthly_d202302_c20230505.nc",
    "https://www.ncei.noaa.gov/thredds/fileServer/cdr/persiann/2023/PERSIANN-CDR_v01r01_20230419_c20231030.nc",
]

file_names = [Path(url).name for url in file_urls]
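`Path(url).name` treats the whole URL as a file path. That works for these `fileServer` URLs because they end in the file name. If a URL carried a query string or fragment, however, it would leak into the name (for example, `file.nc?x=1`). The following is a minimal sketch of a stricter alternative that uses only the standard library. `filename_from_url` is a hypothetical helper name.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def filename_from_url(url: str) -> str:
    """Return the last component of a URL's path, ignoring any query string or fragment."""
    # urlparse separates the scheme, host, path, query, and fragment;
    # PurePosixPath then splits the path on "/" regardless of the operating system.
    return PurePosixPath(urlparse(url).path).name
```

For the three URLs above, `[filename_from_url(url) for url in file_urls]` yields the same three file names as `Path(url).name`.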

To download these files (e.g., the first time you run this notebook), run the following:

In [3]:
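import requests

# Download each file and save it in the current working directory.
for url, filename in zip(file_urls, file_names):
    r = requests.get(url, allow_redirects=True, timeout=60)
    r.raise_for_status()  # stop on HTTP errors instead of saving an error page
    Path(filename).write_bytes(r.content)  # opens, writes, and closes the file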
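The download loop reads each file fully into memory before writing it, and re-running the cell downloads every file again. The following is a minimal sketch of an alternative that skips files already on disk and streams each download to a temporary `.part` file. It swaps `requests` for the standard library's `urllib.request`, and the helper name `download_if_missing` is hypothetical.

```python
import shutil
import urllib.request
from pathlib import Path


def download_if_missing(url: str, dest: Path, timeout: float = 60.0) -> bool:
    """Download `url` to `dest` unless `dest` already exists (hypothetical helper).

    Returns True if a download happened, or False if the file was already present.
    """
    if dest.exists():
        return False
    partial = dest.with_name(dest.name + ".part")
    # urlopen raises HTTPError for 4xx/5xx responses, so error pages are not saved.
    with urllib.request.urlopen(url, timeout=timeout) as response, partial.open("wb") as out:
        shutil.copyfileobj(response, out)  # copy in chunks rather than all at once
    partial.replace(dest)  # give the file its final name only once it is complete
    return True
```

Usage with the variables above: `for url, name in zip(file_urls, file_names): download_if_missing(url, Path(name))`.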

Next, we pass the two file paths to `ncompare`. Any differences would be printed in red. In this case there are none, so all of the variables are printed in black.

***✍️ Syntax Note:*** the curly brackets, "{" and "}", that follow are simply a way to [substitute Python variables into a shell command](https://stackoverflow.com/a/35497161).
In a shell/terminal, one can just write out the full arguments, separated by spaces.
For example, the following command would be run at the terminal as `ncompare --column-widths 33 26 26 gpcp_v02r03_monthly_d202301_c20230411.nc gpcp_v02r03_monthly_d202302_c20230505.nc`

***✍️ `ncompare` Options Note:*** the `--column-widths 33 26 26` arguments are optional. They are used here to shrink the columns from their default widths so that the output fits better in the GitHub notebook renderer.
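IPython expands each `{...}` expression before passing the line to the shell. The following is a minimal sketch of how to build the equivalent terminal command in plain Python with `shlex.join` (Python 3.8+), for example to print it and paste it into a shell. `ncompare_command` is a hypothetical helper.

```python
import shlex
from typing import Sequence


def ncompare_command(path_a: str, path_b: str, column_widths: Sequence[int] = (33, 26, 26)) -> str:
    """Build a terminal command equivalent to `! ncompare --column-widths ... {a} {b}`."""
    args = ["ncompare", "--column-widths", *(str(w) for w in column_widths), path_a, path_b]
    # shlex.join quotes any argument that the shell would otherwise split or interpret,
    # such as a file name containing spaces.
    return shlex.join(args)
```

For example, `print(ncompare_command(file_names[0], file_names[1]))` prints a command that can be pasted directly into a terminal.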

In [4]:
! ncompare --column-widths 33 26 26 {file_names[0]} {file_names[1]}

[37m[0mFile A: gpcp_v02r03_monthly_d202301_c20230411.nc[0m
[0m[37m[0mFile B: gpcp_v02r03_monthly_d202302_c20230505.nc[0m
[0m[37m[0m[94m
Root-level Dimensions:[0m
[0m[37m[0m	[36mAre all items the same? ---> True.[0m
[0m[37m[0m	[36m[('latitude', 72), ('longitude', 144), ('nv', 2), ('time', 1)][0m
[0m[37m[0m[94m
Root-level Groups:[0m
[0m[37m[0m	[36mAre all items the same? ---> True.  (No items exist.)[0m
[0m[37m[0m[94m
All variables:[0m
[0m                                                       File A                     File B[0m
[0m                     All Variables                                                      [0m
[0m                                 - -------------------------- --------------------------[0m
[0m                                                                                        [0m
[0m                         GROUP #00 -------------------------/ -------------------------/[0m
[0m           num variables in group:  

## Example 2: Two netCDF files with different groups, variables, and attributes
----

In [5]:
! ncompare --column-widths 33 30 30 {file_names[0]} {file_names[2]}

[37m[0mFile A: gpcp_v02r03_monthly_d202301_c20230411.nc[0m
[0m[37m[0mFile B: PERSIANN-CDR_v01r01_20230419_c20231030.nc[0m
[0m[37m[0m[94m
Root-level Dimensions:[0m
[0m[37m[0m	Are all items the same? ---> [31mFalse.  (2 items are shared, out of 6 total.)[0m
[0m[37m[0m	[31mWhich items are different?[0m
[0m                                                           File A                         File B[0m
[0m[37m[0m                              [31m #00 ------------------------------ ------------------('lat', 480)[0m
[0m[37m[0m                              [31m #01 --------------('latitude', 72) ------------------------------[0m
[0m[37m[0m                              [31m #02 ------------------------------ -----------------('lon', 1440)[0m
[0m[37m[0m                              [31m #03 ------------('longitude', 144) ------------------------------[0m
[0m                               #04 ---------------------('nv', 2) ---------------------('nv'
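The summary line reports how many dimension items the two files share, out of the total number of distinct items. The following is a minimal sketch of one way such counts can be computed. It treats each dimension as a `(name, size)` tuple and uses Python set intersection and union. It illustrates the idea only and is not `ncompare`'s actual implementation. File A's dimensions are taken from Example 1. `dims_b` is hypothetical: it is File A's dimensions with `latitude`/`longitude` renamed, which shows that a renamed dimension counts as a difference even when its size matches.

```python
def summarize_items(items_a, items_b):
    """Return (shared, total, only_in_a, only_in_b) for two collections of hashable items."""
    set_a, set_b = set(items_a), set(items_b)
    shared = set_a & set_b  # items present in both files
    total = set_a | set_b   # every distinct item across both files
    return len(shared), len(total), sorted(set_a - set_b), sorted(set_b - set_a)


# File A's root-level dimensions, as printed in Example 1.
dims_a = [("latitude", 72), ("longitude", 144), ("nv", 2), ("time", 1)]
# Hypothetical File B: same sizes, but latitude/longitude renamed.
dims_b = [("lat", 72), ("lon", 144), ("nv", 2), ("time", 1)]

shared, total, only_a, only_b = summarize_items(dims_a, dims_b)
print(f"{shared} items are shared, out of {total} total.")
print("Only in A:", only_a)
print("Only in B:", only_b)
```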

#### More file details can be examined by using the `--show-attributes` and `--show-chunks` options

In [6]:
! ncompare --show-attributes --show-chunks --column-widths 33 30 30 {file_names[0]} {file_names[2]}

[37m[0mFile A: gpcp_v02r03_monthly_d202301_c20230411.nc[0m
[0m[37m[0mFile B: PERSIANN-CDR_v01r01_20230419_c20231030.nc[0m
[0m[37m[0m[94m
Root-level Dimensions:[0m
[0m[37m[0m	Are all items the same? ---> [31mFalse.  (2 items are shared, out of 6 total.)[0m
[0m[37m[0m	[31mWhich items are different?[0m
[0m                                                           File A                         File B[0m
[0m[37m[0m                              [31m #00 ------------------------------ ------------------('lat', 480)[0m
[0m[37m[0m                              [31m #01 --------------('latitude', 72) ------------------------------[0m
[0m[37m[0m                              [31m #02 ------------------------------ -----------------('lon', 1440)[0m
[0m[37m[0m                              [31m #03 ------------('longitude', 144) ------------------------------[0m
[0m                               #04 ---------------------('nv', 2) ---------------------('nv'
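With `--show-attributes`, `ncompare` also lists each variable's attributes side by side. Conceptually, this amounts to comparing two name-to-value mappings. The following is a minimal sketch of such a comparison with plain dictionaries. The attribute names and values below are hypothetical and are not read from the files. With the `netCDF4` package, a variable's attributes could be collected as `{name: var.getncattr(name) for name in var.ncattrs()}`.

```python
def diff_attributes(attrs_a: dict, attrs_b: dict) -> dict:
    """Map each differing attribute name to its (value_in_a, value_in_b) pair.

    An attribute missing from one side is reported as None on that side.
    Assumes scalar or string values; array-valued attributes would need an
    element-wise comparison (e.g. numpy.array_equal) instead of `!=`.
    """
    differences = {}
    for name in sorted(attrs_a.keys() | attrs_b.keys()):
        value_a, value_b = attrs_a.get(name), attrs_b.get(name)
        if value_a != value_b:
            differences[name] = (value_a, value_b)
    return differences


# Hypothetical attributes for a precipitation variable in two files.
attrs_a = {"units": "mm/day", "long_name": "precipitation", "valid_min": 0.0}
attrs_b = {"units": "mm", "long_name": "precipitation"}

for name, (a, b) in diff_attributes(attrs_a, attrs_b).items():
    print(f"{name}: {a!r} vs {b!r}")
```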

# Python Package Usage Example
----

In [7]:
from ncompare import compare

In [8]:
total_number_of_differences = compare(
    file_names[0],
    file_names[2],
    only_diffs=True,
    show_chunks=True,
    show_attributes=True,
    column_widths=[33, 30, 30],
)

File A: gpcp_v02r03_monthly_d202301_c20230411.nc
File B: PERSIANN-CDR_v01r01_20230419_c20231030.nc

Root-level Dimensions:
	Are all items the same? ---> False.  (2 items are shared, out of 6 total.)
	Which items are different?
                                                           File A                         File B
                               #00 ------------------------------ ------------------('lat', 480)
                               #01 --------------('latitude', 72) ------------------------------
                               #02 ------------------------------ -----------------('lon', 1440)
                               #03 ------------('longitude', 144) ------------------------------

Root-level Groups:
	Are all items the same? ---> True.  (No items exist.)

All variables:
                                                           File A                         File B
                     All Variables                                                              
   

The value returned by `compare()` is the total number of differences (across _variables_, _groups_, and _attributes_):

In [9]:
print(total_number_of_differences)

114
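Because `compare()` returns this count, it can serve as a simple check, for example in a regression test that expects two files to share the same structure. The following is a minimal sketch. The helper `assert_same_structure` is hypothetical. It takes the comparison function as a parameter so that it can be exercised without real files. Real use would pass `ncompare.compare`, with the keyword arguments used above.

```python
from typing import Callable


def assert_same_structure(path_a: str, path_b: str, compare_fn: Callable[..., int]) -> None:
    """Raise AssertionError if `compare_fn` reports any differences between two files."""
    # The keyword arguments mirror the compare() call shown earlier in this notebook.
    n_differences = compare_fn(path_a, path_b, only_diffs=True, show_attributes=True)
    if n_differences != 0:
        raise AssertionError(f"{path_a} and {path_b} differ in {n_differences} place(s)")
```

Usage: `from ncompare import compare`, then `assert_same_structure(file_names[0], file_names[1], compare)`.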


END of Notebook.