# How to initialize a PeakTree object
This tutorial shows how to format the input data for `PeakTree(data)`, the call that initializes an instance. `PeakTree` takes a single argument, `data`, which must be an iterable of `(label, value)` pairs or `(x, y)` points.

The examples below show how to do this for a range of data sources.
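The requirements this tutorial places on `data` (an iterable of pairs, with unique and hashable labels and numeric values) can be checked before calling `PeakTree`. The following is a minimal sketch in plain Python, not part of `hierarchical_peaks`: `check_pairs` is a hypothetical helper introduced here for illustration only.

```python
from numbers import Real

def check_pairs(data):
    """Hypothetical helper: return data as a list of (label, value) pairs, or raise."""
    pairs = []
    seen = set()
    for item in data:
        label, value = item  # fails if item is not a 2-element iterable
        if not isinstance(value, Real):
            raise TypeError(f"value for label {label!r} is not numeric: {value!r}")
        if label in seen:
            raise ValueError(f"duplicate label: {label!r}")
        seen.add(label)
        pairs.append((label, value))
    return pairs

pairs = check_pairs(zip("ABC", [8.5, 9.0, 5.0]))
print(pairs)  # [('A', 8.5), ('B', 9.0), ('C', 5.0)]
```

Because the helper returns a list, it also turns one-shot iterables such as `zip()` objects into something that can be inspected and reused.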

## Discrete data sets
The following cases demonstrate how to prepare a discrete sequence of data points (e.g. sampled points, discrete 1-D functions, or time series) so that its hierarchical peaks can be found:
1. An x-vector and a y-vector
2. Non-numeric x-coordinates
3. A list of (x,y) tuples
4. A dictionary of x: y
5. A y-vector only
6. A function y(x)
7. A generator function yielding (x, y)
8. Labels that are (label, value)-tuples themselves

In [1]:
# import the module
import hierarchical_peaks as hip

### Example 1: An x-vector and a y-vector
If data point coordinates are stored in two lists (`x_vector` and `y_vector`), then a `zip()` of the two lists can be used as input argument.

In [5]:
# small example data set:
x_vector = list(range(2005, 2022))
y_vector = [8.5, 9.0, 5.0, 10.0, 7.5, 3.0, 6.0, 6.0, 2.0, 7.5, 7.5, 8.0, 6.0, 7.0, 4.0, 4.0, 5.0]


# initialize a PeakTree:
tree1 = hip.PeakTree(zip(x_vector, y_vector))


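You can double-check which data points `PeakTree` used to initialize this instance, because they are stored (as a dict) in the attribute `data`. In the output below, 2012, 2015 and 2020 are absent: each of them repeats the y-value of the point before it, so `PeakTree` appears to keep only the first point of a run of equal values.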

In [6]:
print(tree1.data)

{2005: 8.5, 2006: 9.0, 2007: 5.0, 2008: 10.0, 2009: 7.5, 2010: 3.0, 2011: 6.0, 2013: 2.0, 2014: 7.5, 2016: 8.0, 2017: 6.0, 2018: 7.0, 2019: 4.0, 2021: 5.0}


### Example 2: Non-numeric x-coordinates
The y-values must be numeric, but the x-labels can be objects of any type, including non-numeric ones, as long as they can serve as keys in the `data` dict (that is, they must be unique and hashable).

In [7]:
x_nonnumeric = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

print(*zip(x_nonnumeric, y_vector))

('A', 8.5) ('B', 9.0) ('C', 5.0) ('D', 10.0) ('E', 7.5) ('F', 3.0) ('G', 6.0) ('H', 6.0) ('I', 2.0) ('J', 7.5) ('K', 7.5) ('L', 8.0) ('M', 6.0) ('N', 7.0) ('O', 4.0) ('P', 4.0) ('Q', 5.0)


In [8]:
# initialize a PeakTree:
tree2 = hip.PeakTree(zip(x_nonnumeric, y_vector))

print(tree2.data)

{'A': 8.5, 'B': 9.0, 'C': 5.0, 'D': 10.0, 'E': 7.5, 'F': 3.0, 'G': 6.0, 'I': 2.0, 'J': 7.5, 'L': 8.0, 'M': 6.0, 'N': 7.0, 'O': 4.0, 'Q': 5.0}
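
Note that `x_nonnumeric` holds 26 letters while `y_vector` holds 17 values. Python's built-in `zip()` stops at the shorter input, which is why the labels above end at `'Q'`. A short sketch of that behavior, using a small made-up data set:

```python
labels = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"  # 26 labels
values = [8.5, 9.0, 5.0]                # only 3 values (illustrative data)

pairs = list(zip(labels, values))
print(pairs)  # [('A', 8.5), ('B', 9.0), ('C', 5.0)]

# zip() silently drops the unmatched labels, so compare lengths if
# a mismatch would indicate a mistake in your data.
unused = len(labels) - len(pairs)
print(unused, "labels were not used")  # 23 labels were not used
```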


### Example 3: A list of (x,y) tuples
If data points are available as a sorted list of `(x, y)` tuples, then this list can be used directly as the input argument.

In [9]:
xy_list = list(zip(x_vector, y_vector))
print("xy_list = ", xy_list)

xy_list =  [(2005, 8.5), (2006, 9.0), (2007, 5.0), (2008, 10.0), (2009, 7.5), (2010, 3.0), (2011, 6.0), (2012, 6.0), (2013, 2.0), (2014, 7.5), (2015, 7.5), (2016, 8.0), (2017, 6.0), (2018, 7.0), (2019, 4.0), (2020, 4.0), (2021, 5.0)]


In [10]:
# initialize a PeakTree:
tree3 = hip.PeakTree(xy_list)

print(tree3.data)

{2005: 8.5, 2006: 9.0, 2007: 5.0, 2008: 10.0, 2009: 7.5, 2010: 3.0, 2011: 6.0, 2013: 2.0, 2014: 7.5, 2016: 8.0, 2017: 6.0, 2018: 7.0, 2019: 4.0, 2021: 5.0}


### Example 4: A dictionary of x: y
If data points are stored in a dictionary, and they were inserted in sorted order, then the dict's `.items()` can be used as input argument. (In the current implementation, however, the dict itself can also be used as argument.)

In [11]:
xy_dict = dict(zip(x_vector, y_vector))
print("xy_dict = ", xy_dict)

xy_dict =  {2005: 8.5, 2006: 9.0, 2007: 5.0, 2008: 10.0, 2009: 7.5, 2010: 3.0, 2011: 6.0, 2012: 6.0, 2013: 2.0, 2014: 7.5, 2015: 7.5, 2016: 8.0, 2017: 6.0, 2018: 7.0, 2019: 4.0, 2020: 4.0, 2021: 5.0}


In [12]:
# initialize a PeakTree:
tree4 = hip.PeakTree(xy_dict.items())

print(tree4.data)

{2005: 8.5, 2006: 9.0, 2007: 5.0, 2008: 10.0, 2009: 7.5, 2010: 3.0, 2011: 6.0, 2013: 2.0, 2014: 7.5, 2016: 8.0, 2017: 6.0, 2018: 7.0, 2019: 4.0, 2021: 5.0}
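
If the dictionary was not filled in sorted key order, the pairs can be sorted before they are passed on. The sketch below uses only built-in Python and a small made-up dict. It also shows that iterating over a dict directly yields its keys only, whereas `.items()` yields the `(x, y)` pairs.

```python
unsorted_dict = {2007: 5.0, 2005: 8.5, 2006: 9.0}  # illustrative data

# Iterating over a dict gives keys, in insertion order.
keys_only = list(unsorted_dict)
print(keys_only)  # [2007, 2005, 2006]

# sorted() orders the (x, y) tuples by x, since the keys are unique.
sorted_pairs = sorted(unsorted_dict.items())
print(sorted_pairs)  # [(2005, 8.5), (2006, 9.0), (2007, 5.0)]
```

The resulting `sorted_pairs` has the same shape as `xy_list` in Example 3.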


### Example 5: A y-vector only
If the data set is a list of y-values alone, then the index of the list is a natural x-coordinate, and an `enumerate()` of the list can be used as input argument.

In [13]:
print("y_vector = ", y_vector)

y_vector =  [8.5, 9.0, 5.0, 10.0, 7.5, 3.0, 6.0, 6.0, 2.0, 7.5, 7.5, 8.0, 6.0, 7.0, 4.0, 4.0, 5.0]


In [14]:
# initialize a PeakTree:
tree5 = hip.PeakTree(enumerate(y_vector))

print(tree5.data)

{0: 8.5, 1: 9.0, 2: 5.0, 3: 10.0, 4: 7.5, 5: 3.0, 6: 6.0, 8: 2.0, 9: 7.5, 11: 8.0, 12: 6.0, 13: 7.0, 14: 4.0, 16: 5.0}
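
The built-in `enumerate()` also accepts a `start` argument, which is handy when the implicit x-coordinate should begin somewhere other than 0. A minimal sketch with a shortened copy of the data:

```python
y_short = [8.5, 9.0, 5.0]  # first three values of y_vector

pairs_default = list(enumerate(y_short))
pairs_from_2005 = list(enumerate(y_short, start=2005))

print(pairs_default)    # [(0, 8.5), (1, 9.0), (2, 5.0)]
print(pairs_from_2005)  # [(2005, 8.5), (2006, 9.0), (2007, 5.0)]
```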


### Example 6: A function y(x)
If we only have the x-labels, but the y-values can be calculated with a function, then a generator expression like `((x, y(x)) for x in range(2005,2022))` can be used as input argument.

In [15]:
def y(x):
    return y_vector[x - 2005]

# For example,
print("In 2005, y was", y(2005))
print("In 2021, y was", y(2021))

In 2005, y was 8.5
In 2021, y was 5.0


In [16]:
# initialize a PeakTree:
tree6 = hip.PeakTree(((x, y(x)) for x in range(2005,2022)))

print(tree6.data)

{2005: 8.5, 2006: 9.0, 2007: 5.0, 2008: 10.0, 2009: 7.5, 2010: 3.0, 2011: 6.0, 2013: 2.0, 2014: 7.5, 2016: 8.0, 2017: 6.0, 2018: 7.0, 2019: 4.0, 2021: 5.0}


### Example 7: A generator function yielding (x, y)
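Sometimes a generator function is more convenient than a generator expression, for example when producing each pair takes several statements. Calling the function returns a generator, which is then used as input argument.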

In [17]:
def xy_generator():
    for x in range(2005,2022):
        yield x, y_vector[x - 2005]

In [18]:
# initialize a PeakTree:
tree7 = hip.PeakTree(xy_generator())

print(tree7.data)

{2005: 8.5, 2006: 9.0, 2007: 5.0, 2008: 10.0, 2009: 7.5, 2010: 3.0, 2011: 6.0, 2013: 2.0, 2014: 7.5, 2016: 8.0, 2017: 6.0, 2018: 7.0, 2019: 4.0, 2021: 5.0}
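
One property worth keeping in mind for Examples 6 and 7: a generator can be consumed only once. The sketch below uses plain Python (no `PeakTree` call, and an illustrative generator function) to show that a second pass over the same generator object yields nothing, so call the generator function again whenever fresh input is needed.

```python
def xy_pairs():
    """Illustrative generator function yielding (x, y) pairs."""
    values = [8.5, 9.0, 5.0]
    for offset, value in enumerate(values):
        yield 2005 + offset, value

gen = xy_pairs()
first_pass = list(gen)
second_pass = list(gen)        # the generator is already exhausted
fresh_pass = list(xy_pairs())  # a new call gives a new generator

print(first_pass)   # [(2005, 8.5), (2006, 9.0), (2007, 5.0)]
print(second_pass)  # []
print(fresh_pass)   # [(2005, 8.5), (2006, 9.0), (2007, 5.0)]
```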


### Example 8: Labels that are (label, value)-tuples themselves
The x-labels can be more complex objects. For example, they can be `(x, y)` coordinate pairs:

In [21]:
print("xy_pairs = ", *zip(x_vector, y_vector))

xy_pairs =  (2005, 8.5) (2006, 9.0) (2007, 5.0) (2008, 10.0) (2009, 7.5) (2010, 3.0) (2011, 6.0) (2012, 6.0) (2013, 2.0) (2014, 7.5) (2015, 7.5) (2016, 8.0) (2017, 6.0) (2018, 7.0) (2019, 4.0) (2020, 4.0) (2021, 5.0)


In [22]:
# initialize a PeakTree:

tree8 = hip.PeakTree(zip(zip(x_vector, y_vector), y_vector))

print(tree8.data)

{(2005, 8.5): 8.5, (2006, 9.0): 9.0, (2007, 5.0): 5.0, (2008, 10.0): 10.0, (2009, 7.5): 7.5, (2010, 3.0): 3.0, (2011, 6.0): 6.0, (2013, 2.0): 2.0, (2014, 7.5): 7.5, (2016, 8.0): 8.0, (2017, 6.0): 6.0, (2018, 7.0): 7.0, (2019, 4.0): 4.0, (2021, 5.0): 5.0}
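
Tuples are only one kind of composite label. As a further sketch, and assuming nothing beyond the hashable-and-unique requirement stated in Example 2, `datetime.date` objects from the standard library can be paired with values in the same way. The snippet only builds the pairs and confirms that the labels work as dict keys; it does not call `PeakTree`.

```python
from datetime import date

values = [8.5, 9.0, 5.0]  # illustrative data

# One (date, value) pair per year, starting on 1 January 2005.
date_pairs = [(date(2005 + i, 1, 1), v) for i, v in enumerate(values)]

# Dates are hashable, so they can serve as dict keys.
as_dict = dict(date_pairs)
print(as_dict[date(2006, 1, 1)])  # 9.0
```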


As these examples show, it is straightforward to adapt data from discrete data sources. (This step might become more automated in a future version.)

## Continuous functions
To find peaks in a continuous function, sample it densely enough to capture all hills and valleys; this brings you back to the discrete case.
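
A minimal sketch of such sampling, in plain Python: `sample` is a hypothetical helper (not part of `hierarchical_peaks`) that evaluates a function at `n` evenly spaced points and returns `(x, y)` pairs. The choice of `n` is an assumption you have to make from what you know about the function, since too few points can miss narrow hills or valleys.

```python
import math

def sample(f, start, stop, n):
    """Hypothetical helper: n evenly spaced (x, f(x)) pairs on [start, stop]."""
    if n < 2:
        raise ValueError("n must be at least 2")
    step = (stop - start) / (n - 1)
    return [(start + i * step, f(start + i * step)) for i in range(n)]

# Two full periods of sin(x), sampled at 201 points.
pairs = sample(math.sin, 0.0, 4 * math.pi, 201)
print(len(pairs), pairs[0])  # 201 (0.0, 0.0)
```

The resulting list has the same shape as `xy_list` in Example 3, so it should be usable as input in the same way.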