# H5XRAY

H5XRAY is a visualization / reporting tool to better understand the structure of an HDF5 file and the requests needed to read it.
A weekend project inspired by the h5cloud project at the 2023 ICESat-2 Hackweek.

__Jonathan Markel__  
3D Geospatial Laboratory  
The University of Texas at Austin  
09/09/2023

#### [Twitter](https://twitter.com/jonm3d) | [GitHub](https://github.com/jonm3d) | [Website](http://j3d.space) | [Google Scholar](https://scholar.google.com/citations?user=KwxwFgYAAAAJ&hl=en) | [LinkedIn](https://www.linkedin.com/in/j-markel/)

In [1]:
from h5xray import h5xray
import matplotlib.pyplot as plt


In [None]:
input_file = "data/atl03_4.h5"

In [None]:
# main function for notebook interaction
help(h5xray.analyze)

## Default Usage
At its core, h5xray is meant to quickly visualize and report on the structure of an HDF5 file and the requests needed to read it. The barcode plot below shows a block for each dataset within the H5 file. The width of a block represents the dataset's total size in bytes, and the color indicates how many GET requests are needed to read in that data (blue is few). For the same request size and colorbar range, more red = more requests = more $ to read from cloud storage.

In [None]:
h5xray.analyze(input_file) # default usage
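
The link between request count and cost can be made concrete with simple arithmetic. The sketch below assumes a flat price per GET request and ignores data transfer charges; `estimate_get_cost` and the price used in the example are placeholders chosen for illustration, not part of h5xray and not a quoted rate from any cloud provider.

```python
def estimate_get_cost(n_requests: int, price_per_1000_requests: float) -> float:
    """Cost of issuing n_requests GETs at a flat price per 1,000 requests.

    Illustrative helper only; h5xray does not provide this function.
    """
    if n_requests < 0 or price_per_1000_requests < 0:
        raise ValueError("n_requests and price_per_1000_requests must be non-negative")
    return n_requests / 1000 * price_per_1000_requests


# Placeholder price (hypothetical); substitute your provider's actual rate.
PLACEHOLDER_PRICE_PER_1000 = 0.0004

# Compare a file that needs few requests with one that needs many.
for n_requests in (50, 5000):
    cost = estimate_get_cost(n_requests, PLACEHOLDER_PRICE_PER_1000)
    print(f"{n_requests} GET requests -> ${cost:.6f}")
```

The cost of a single read is small, but it scales linearly with the number of requests and with how often the file is read.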

For more programmatic uses, the report can be silenced and the plot can be saved to disk.

In [None]:
h5xray.analyze(input_file, report=False, plotting_options={'output_file':'img/barcode.png'}) # simple barcode

## Plot Details
The debug plot option creates more detailed plots, adding the title, colorbar, and labels to identify large datasets.

In [None]:
h5xray.analyze(input_file, report=False, plotting_options={'debug':True, 'output_file':'img/options_labels.png'})


## Request Details
It may be helpful to manually control the size of the GET requests when reading in data. Let's see how using larger GET requests (3 MiB in the cell below) changes the number needed to read in all the data, especially for larger datasets. Here, we see that the largest datasets need fewer requests, and the barcode is lighter / bluer overall.


In [None]:
h5xray.analyze(input_file, request_byte_size=3*1024*1024, plotting_options={'debug':True})
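
To build intuition for why larger requests lighten the barcode, the sketch below assumes a dataset is read as contiguous, fixed-size byte ranges, so the request count is the dataset size divided by the request size, rounded up. This is a simplification and not necessarily how h5xray counts requests (the chunk layout of a dataset, for example, can change the number). `estimate_requests` and the 40 MiB dataset size are hypothetical, used only for illustration.

```python
MIB = 1024 * 1024


def estimate_requests(dataset_bytes: int, request_byte_size: int) -> int:
    """Naive count of fixed-size range requests needed to cover a dataset.

    Assumes contiguous storage; this is NOT h5xray's implementation.
    """
    if dataset_bytes < 0:
        raise ValueError("dataset_bytes must be non-negative")
    if request_byte_size <= 0:
        raise ValueError("request_byte_size must be positive")
    # Integer ceiling division avoids floating-point rounding.
    return (dataset_bytes + request_byte_size - 1) // request_byte_size


# Hypothetical 40 MiB dataset read with 1 MiB and 3 MiB requests.
dataset_bytes = 40 * MIB
for request_byte_size in (1 * MIB, 3 * MIB):
    n_requests = estimate_requests(dataset_bytes, request_byte_size)
    print(f"{request_byte_size // MIB} MiB requests -> {n_requests} GETs")
```

Under this model, tripling the request size cuts the request count to roughly a third for large datasets, while datasets already smaller than one request are unaffected.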

## Plot Customization
Minor plot details will likely differ between HDF5 files, including the range of the colorbar, the colormap, the title, and the figure size. The font size of the dataset labels and the threshold (in bytes) required to label a dataset can also be changed for smaller or larger files.

In [None]:
# path to save image
output_file = 'img/options_all.png'
output_file

In [None]:
# try changing these!
plotting_options = {'debug':True, # whether to include the title, colorbar, and labels
                    'cmap': plt.cm.RdYlBu_r, 
                    'byte_threshold':10 * 1024**2, # datasets with more than this get labeled
                    'font_size':9, # font size for dataset labels
                    'figsize':(10, 3),
                    'max_requests': 15, # specify colormap range
                    'title':'DEMO',
                    'output_file':output_file
                   }

h5xray.analyze(input_file, report=True, plotting_options=plotting_options)
