# Storing Histograms in Hepfiles
Hepfiles are well suited to storing many histograms with different numbers of bins in a single file. This tutorial walks through the process.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import hepfile as hf

## Generating a Dataset
We first generate five random datasets of different lengths. In practice you will probably already have your own data; these datasets are only for illustration.

In [None]:
# generate some random normally distributed datasets
datasets = []
for i in range(1,6):
    rand = np.random.normal(size=10**i)
    datasets.append(rand)
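
If you want the example to produce the same numbers on every run, one option is to draw from a seeded generator. The following is a minimal sketch, not part of the original tutorial: the seed value and the name `seeded_datasets` are arbitrary choices made here for illustration.

```python
import numpy as np

# Assumption: a fixed seed (12345 is arbitrary) so the example is repeatable
rng = np.random.default_rng(12345)

# Five standard-normal samples with 10, 100, ..., 100000 values
seeded_datasets = [rng.normal(size=10**i) for i in range(1, 6)]

print([len(d) for d in seeded_datasets])
```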

## Preparing the Dataset
Then, we can calculate/create the following information about each dataset:
* bin counts and bin edges from `numpy.histogram`
* x-label
* y-label
* title

You can think of each histogram as an "event": the bin and edge arrays go in a group called `histogram`, and the rest of the information is stored as singletons. To pack the data into a hepfile, we store all of this in a list of dictionaries, one per histogram, which we can then pass directly to `hepfile.dict_tools.dictlike_to_hepfile`.
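
As a rough illustration of that layout, the sketch below builds the dictionary for a single histogram from a small hand-written array. The values and label strings are made up for this example; the key names follow the ones used in this tutorial, where `bins` holds the per-bin counts returned by `np.histogram`.

```python
import numpy as np

values = np.array([0.1, 0.4, 0.5, 0.9, 1.3, 1.7])

# np.histogram returns the per-bin counts and the bin edges
counts, edges = np.histogram(values, bins=4)

# One "event": singletons at the top level, arrays grouped under 'histogram'
record = {
    'title': 'Example histogram',
    'xlabel': 'x',
    'ylabel': 'counts',
    'histogram': {'bins': counts, 'edges': edges},
}

print(len(counts), len(edges))
```

Note that there is always one more edge than there are bins, so the two arrays in the group differ in length by one.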

We also plot the histogram data we calculate so that you can see what the datasets look like and how they differ!

In [None]:
data_dicts = []

for rand in datasets:
    
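    # recover the exponent i from the dataset length (len(rand) == 10**i);
    # rounding guards against floating-point error in the logarithm
    i = int(round(np.log10(len(rand))))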
    
    # append an empty dictionary to store the data about this histogram in
    data_dicts.append({}) 
    
    # compute the histogram counts (stored as `bins`) and bin edges using numpy
    # vary the number of bins because that's what hepfile is good at!
    bins, edges = np.histogram(rand, bins=int(4**i)) 
    
    # plot this data
    plt.figure()
    plt.bar(edges[:-1], bins, align='edge', width=edges[1]-edges[0])
    
    ylabel = f'y-label {i}'
    plt.ylabel(ylabel)
    data_dicts[-1]['ylabel'] = ylabel # store the ylabel in the recently appended dict
    
    xlabel = f'x-label {i}'
    plt.xlabel(xlabel)
    data_dicts[-1]['xlabel'] = xlabel # store the xlabel in the recently appended dict
    
    title = f'Histogram with 10^{i} values'
    plt.title(title)
    data_dicts[-1]['title'] = title # store the title in the recently appended dict
    
    # store the bin and edge information in a sub dictionary
    data_dicts[-1]['histogram'] = {}
    data_dicts[-1]['histogram']['bins'] = bins
    data_dicts[-1]['histogram']['edges'] = edges

Below is the list of data dictionaries that we plan to store in the hepfile. Notice how it is heterogeneous and the lengths of the bins and edges arrays vary between histograms!

In [None]:
print(data_dicts)
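
Printing the full list can be hard to read once the arrays get long. A compact summary of the array lengths makes the variation easier to see. This is a minimal sketch: `summarize` and `example_records` are hypothetical names introduced here, and the stand-in records only mimic the structure of `data_dicts`.

```python
import numpy as np

def summarize(records):
    """Return (title, number of bins, number of edges) for each record."""
    return [
        (r['title'], len(r['histogram']['bins']), len(r['histogram']['edges']))
        for r in records
    ]

# Tiny stand-in records with the same structure as the entries of data_dicts
example_records = []
for n_bins in (4, 16):
    counts, edges = np.histogram(np.linspace(0.0, 1.0, 50), bins=n_bins)
    example_records.append({
        'title': f'{n_bins} bins',
        'histogram': {'bins': counts, 'edges': edges},
    })

for row in summarize(example_records):
    print(row)
```

In the notebook, calling `summarize(data_dicts)` should give one row per histogram.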

## Writing the Hepfile

Now that we have a list of dictionaries, we can easily write this to a hepfile using `hepfile.dict_tools.dictlike_to_hepfile`:

In [None]:
# data_dicts is in a format we can easily save to a hepfile!
filename = 'histogram-hepfile.h5'
data = hf.dict_tools.dictlike_to_hepfile(data_dicts, filename)

`dictlike_to_hepfile` returns an awkward array, which we can inspect directly. The data has also been saved to a hepfile called `histogram-hepfile.h5`, which can be read back in with `hepfile.load` if you wish.

In [None]:
data.histogram.bins
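
Once the counts and edges for one histogram are available again, the bin widths and centers needed for plotting can be recovered from the edges alone. The sketch below uses small stand-in arrays rather than the stored data; it assumes the edges for one histogram can be pulled out of the returned array in the same way as the bins.

```python
import numpy as np

# Stand-in arrays; in the tutorial these would come from one histogram's
# stored bins and edges
counts = np.array([2, 5, 3])
edges = np.array([0.0, 1.0, 2.0, 3.0])

widths = np.diff(edges)            # one width per bin
centers = edges[:-1] + widths / 2  # midpoint of each bin

print(centers)
```

Using `np.diff(edges)` rather than `edges[1] - edges[0]` also covers histograms whose bins are not all the same width.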