# The `trendlist` Package

If you can put your finger between any two elements of a sequence, `s`, and everything to your left is less than everything to your right, the sequence is sorted. A mathematician would say "It's **monotonically increasing**."

A sequence of $N$ random reals can be arranged in $N!$ ways. Only one of these arrangements is sorted, so the probability that such a sequence is sorted, by chance, is $1/N!$.

This probability gets small quickly.

In [None]:
import math

for n in range(1, 11):
    print(f"{n=}, P('sorted') = 1/{math.factorial(n)}")

While there's only a vanishingly small probability that a long sequence of distinct numbers is monotonic,
you can decompose such a sequence into a list of monotonic sequences.

For example, the list $w = [64, 2, 4, 8, 16, 1024, 32, 512, 128, 256]$ breaks into four sorted subsequences:
[64], [2, 4, 8, 16, 1024], [32, 512], [128, 256].
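
As a minimal sketch, here is one way to split a list into its maximal ascending runs. The helper name `ascents()` is ours, chosen for illustration; it is not part of `trendlist`.

```python
def ascents(seq):
    """Split seq into maximal strictly increasing runs."""
    runs = []
    for x in seq:
        if runs and runs[-1][-1] < x:
            runs[-1].append(x)   # x extends the current run
        else:
            runs.append([x])     # x starts a new run
    return runs

w = [64, 2, 4, 8, 16, 1024, 32, 512, 128, 256]
print(ascents(w))
```

Each element either extends the current run or starts a new one, so a single left-to-right pass is enough.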

Studying the statistical properties of such subsequences, called *ascents*, leads to the **Eulerian numbers**,
$\left<{n} \atop {k}\right>$ (using Don Knuth's notation), which are
defined by the recursion 
$\left<{n} \atop {k}\right> = k \cdot \left<{n-1} \atop {k}\right> + (n-k+1) \cdot \left<{n-1} \atop {k-1} \right>$
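
If you want to check the recursion numerically, here is a small sketch. It assumes the indexing used by OEIS A008292, where $k$ runs from $1$ to $n$, and it assumes the base case $\left<{1} \atop {1}\right> = 1$, which the text above doesn't state. The function name `eulerian()` is ours.

```python
import math
from functools import lru_cache

@lru_cache(maxsize=None)
def eulerian(n, k):
    """Eulerian number, indexed as in OEIS A008292 (1 <= k <= n)."""
    if k < 1 or k > n:
        return 0
    if n == 1:
        return 1   # assumed base case
    return k * eulerian(n - 1, k) + (n - k + 1) * eulerian(n - 1, k - 1)

for n in range(1, 6):
    row = [eulerian(n, k) for k in range(1, n + 1)]
    # Every permutation is counted in exactly one column, so each row sums to n!
    print(n, row, sum(row) == math.factorial(n))
```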

We won't discuss Eulerian numbers here, but you can find them in [Sloane's On-line Encyclopedia of Integer Sequences](https://en.wikipedia.org/wiki/On-Line_Encyclopedia_of_Integer_Sequences), where they're entry [OEIS A008292](https://oeis.org/A008292).


What if you lower your expectations by relaxing the criterion?
Suppose, for example, you just look for sequences in which the *average* of the numbers to the left of your finger (the **prefix**)
is always less than the average of those to your right (the **suffix**). 

For example, in the list $[2, 1, 8, 4, 16]$,

* $mean([2]) < mean([1, 8, 4, 16])$
* $mean([2, 1]) < mean([8, 4, 16])$
* $mean([2, 1, 8]) < mean([4, 16])$
* $mean([2, 1, 8, 4]) < mean([16])$

We'll call such a sequence a **trend**.
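
To see the numbers behind those four comparisons, here is a quick sketch that computes both means at every place you could put your finger. The helper name `split_means()` is ours, for illustration only.

```python
import statistics

def split_means(s):
    """Return (prefix mean, suffix mean) for every split point of s."""
    return [(statistics.mean(s[:i]), statistics.mean(s[i:]))
            for i in range(1, len(s))]

for left, right in split_means([2, 1, 8, 4, 16]):
    print(f"{float(left):6.3f} < {float(right):6.3f}: {left < right}")
```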

Just as sequences decompose uniquely into monotonic subsequences, they decompose uniquely into trends.

The `trendlist` package makes such decompositions easy. You can install it as a Python package from PyPI like this:

In [None]:
pip install trendlist

You can find [package documentation on ReadTheDocs](https://trendlist.readthedocs.io/en/latest/index.html).

This notebook sketches the package implementation, and its data structures and methods.

The sketches will often have less detail and precision than you'll find in the package itself because we want you to see the forest, not the trees and every detail of the biota. When you want to see those details, they're in
[the package source on GitHub](https://github.com/jsh/trendlist.git).

Our language will also be quite informal.

We don't, however, think either the sketches or the language are misleading.
We think reading the sketches and our explanations will make reading the code easier,
and let you construct precise proofs when you need to.

## Generating Lists

It's useful to have a couple of functions to use in the sections that follow.
One of these, `pows()`, returns lists of powers of a specified `base`, defaulting to powers of two.

In [None]:
def pows(n, base=2, start=0):
    start = start % n # in case start >= n
    for i in range(start, n):
        yield base**i
    for i in range(start):
        yield base**i
    
print(list(pows(10)))
print(list(pows(10, base=3, start=4)))

Setting the `start` argument lets you retrieve the same list, but starting somewhere else. When you reach the end, go back to the beginning and start again. This is a **circular permutation** of the list you'd have starting at position $0$ in the original list.

Having that ability will prove quite useful, down the road.
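
For a list you already have in memory, the same circular permutation is just two slices glued together. This is a sketch; `rotate_left()` is a name we made up for the illustration.

```python
def rotate_left(seq, start):
    """Return the circular permutation of seq that begins at index start."""
    start %= len(seq)            # wrap around, as pows() does
    return seq[start:] + seq[:start]

powers = [2**i for i in range(6)]
print(rotate_left(powers, 2))    # the same values as list(pows(6, start=2))
print(rotate_left(powers, 8))    # 8 % 6 == 2, so this is the same rotation
```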

The second function, `rands()`, generates lists of random numbers, with a given seed.

In [None]:
import random

def rands(n, seed=None, start=0):
    random.seed(seed)
    start = start % n         # in case start >= n
    for _ in range(start):
        random.random()       # ignore first `start` numbers
    for _ in range(start, n):
        yield random.random()
    random.seed(seed)         # go back to the beginning (only repeatable when seed is not None)
    for _ in range(start):
        yield random.random()

list(rands(5))

The `seed` argument helps testing, because it lets you generate a reproducible list of (pseudo-)random numbers.

In [None]:
print(list(rands(5, 0)))
print(list(rands(5, 0)))

In [None]:
print(list(rands(3, seed=0)))
print(list(rands(3, seed=0))) # exactly the same list of pseudo-random numbers
print(list(rands(3, seed=0, start=1))) # rotate the list by 1
print(list(rands(3, seed=0, start=2))) # rotate the same list by 2
print(list(rands(3, seed=0, start=3))) # rotate by 3, to bring you back to the beginning
print(list(rands(3, seed=0, start=4))) # rotating by 4 is just like rotating by 1

Both functions are written here as generators, so wrap a call in `list()` when you want an actual list.
For `pows()` that hardly matters: it's used in simple examples, but rarely invoked to create long sequences.
In practice, the largest integers you'd probably want are only about 64 bits long, so the largest lists from `pows()` are not much longer than five dozen elements.

In contrast, you'll find yourself wanting `rands()` to make sequences of a million or more floats, so a generator is the way to go. 

## Is This List a Trend?

Because the overall mean of a list, `s`, lies between the means of each prefix, `s[:i]`, and its suffix, `s[i:]`, here's a simple test of whether a sequence is a trend:

In [None]:
import statistics

def is_trend(s):
    mean_s = statistics.mean(s)
    for i in range(1, len(s)):
        # A prefix mean at or above the overall mean is at or above the suffix mean, too.
        if statistics.mean(s[:i]) >= mean_s:
            return False
    return True  # every prefix mean is less than its suffix mean
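
Why is comparing each prefix with the overall mean enough? Here is a short derivation, in notation of our own: $N$ is the length of `s`, $\bar{p}_i$ is the mean of the prefix `s[:i]`, and $\bar{q}_i$ is the mean of the suffix `s[i:]`. The overall mean is a weighted average of the two:

$$mean(s) = \frac{i}{N}\,\bar{p}_i + \frac{N-i}{N}\,\bar{q}_i, \qquad 0 < i < N$$

Subtracting $\bar{p}_i$ from both sides gives

$$mean(s) - \bar{p}_i = \frac{N-i}{N}\,(\bar{q}_i - \bar{p}_i)$$

Since $(N-i)/N$ is positive, $\bar{p}_i < mean(s)$ exactly when $\bar{p}_i < \bar{q}_i$, so the test never has to compute a suffix mean.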

In [None]:
s = [2, 1, 8, 4, 16]
print(is_trend(s))

Systematically testing all $4! = 24$ permutations of [1,2,4,8], you find these $6$ trends:

In [None]:
import itertools

s = [1, 2, 4, 8]
for perm in itertools.permutations(s):
    if is_trend(perm):
        print(perm)

You can show that for a set of size $N$, $(N-1)!$ of its permutations are trends.
So, for $N = 10$, only $1$ out of the $10! = 3628800$ arrangements is sorted,
but $1$ out of $10$ of those arrangements, or $9! = 362880$, are trends.

That's much better odds.
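
You can check the $(N-1)!$ count by brute force for small $N$. The sketch below uses a compact restatement of `is_trend()` so that it runs on its own, and it uses powers of two as the test values (our choice) because, for these sizes, no prefix mean ever ties with the overall mean. The name `count_trends()` is ours.

```python
import itertools
import math
import statistics

def is_trend(s):
    mean_s = statistics.mean(s)
    return all(statistics.mean(s[:i]) < mean_s for i in range(1, len(s)))

def count_trends(n):
    """Count the permutations of [1, 2, 4, ..., 2**(n-1)] that are trends."""
    items = [2**i for i in range(n)]
    return sum(is_trend(p) for p in itertools.permutations(items))

for n in range(1, 7):
    print(f"{n=}: {count_trends(n)} trends, (n-1)! = {math.factorial(n - 1)}")
```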

## Decomposing a List into Trends

Here's a simple, direct method to decompose a list into trends.

1. Find the longest prefix trend.
1. Next, find the longest trend in the remaining suffix.
1. Repeat this until you've carved the entire list into trends.

In [None]:
def pfx_trend(s):
    # Return the longest prefix of s that is a trend.
    t = list(s)  # work on a copy, so the caller's sequence is untouched
    while t:  # start with the whole sequence
        if is_trend(t):  # shrink from the right until you find a trend
            return t
        t.pop()

def decompose(s):
    # Decompose a sequence into its trends,
    # return the list of trends.
    trends = []  # a separate name, so we don't shadow the function
    while s:
        p = pfx_trend(s)    # find the longest, leftmost trend
        trends.append(p)    # tack it onto the end of the list of trends
        s = s[len(p):]      # decompose what remains
    return trends

In [None]:
import itertools

for perm in itertools.permutations(pows(3)):
        print(f"{perm} -> {decompose(list(perm))}") # itertools.permutations() returns tuples

This direct approach works fine for shorter lists of length 5 or 50, but when you want to decompose lists of $500$ or $5,000$, you're manipulating long, in-core lists. Even if you switch to floats, so you don't have list elements that are huge powers of two, this algorithm starts off like a herd of turtles. 

With $N=10,000$, it's unusable.

In [None]:
for n in [5, 50, 500]:
    print(f"{n=:4_d}: ", end="")
    %timeit decompose(list(rands(n)))

This approach is still useful for getting your feet wet, because it's easy to wrap your head around, there aren't many abstractions, and you can make sense of all the individual steps.
You'll find code you can use for it in the sub-package `trendlist.simple`.

We'll move on, though, to a different approach.

## Decomposing a List into Trends, Try Two

The key attributes of a trend are its mean and its length,
so let's define a `Trend` class with just those two attributes.

In [None]:
from dataclasses import dataclass

@dataclass(order=True)
class Trend:
    mean: float = None  # a mean is generally not an integer
    length: int = 1

In [None]:
t = Trend(mean=6, length=9)
print(t)
t2 = Trend(9, 6)
print(t2)

t3 = Trend(math.pi)
print(t3)

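`@dataclass(order=True)` generates comparison methods that compare `Trend` objects field by field, `mean` first and then `length`. In effect, `Trend` objects are ordered by their means, with ties broken by length. `<`, `>`, etc. will all work on `Trend` objects.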

In [None]:
t1 = Trend(6.9, 10)
t2 = Trend(9.6, 1)
t1 < t2
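
Here is a small, self-contained sketch of that ordering. It repeats the `Trend` definition so the block runs on its own; the example values are ours.

```python
from dataclasses import dataclass

@dataclass(order=True)
class Trend:
    mean: float = None
    length: int = 1

print(Trend(1.0, 5) < Trend(2.0, 1))   # means differ, so the means decide
print(Trend(2.0, 1) < Trend(2.0, 3))   # equal means, so length breaks the tie
print(sorted([Trend(3.0), Trend(1.0, 4), Trend(2.0, 2)]))
```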

It's also easy to merge `Trend` instances, like this:

In [None]:
import operator

def merge(self, other, reverse=False):
    can_merge = operator.gt if reverse else operator.lt
    if can_merge(self, other):
        length = self.length + other.length
        mean = (self.length*self.mean + other.length*other.mean)/length  # weighted average
        return Trend(mean, length)
    return None
    
Trend.merge = merge

merge(Trend(mean=2, length=2), Trend(mean=4, length=2))

(Ignore the `reverse` argument for a moment. We'll explain it when we need to use it, below.)
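
The weighted average is what lets a `Trend` forget its data: the mean of two concatenated runs can be recovered from their means and lengths alone. A quick sketch of that claim, with a helper name, `merged_mean()`, and sample data that are ours:

```python
import statistics

def merged_mean(mean_a, len_a, mean_b, len_b):
    """Mean of two concatenated runs, from their means and lengths alone."""
    return (len_a * mean_a + len_b * mean_b) / (len_a + len_b)

a, b = [1, 3], [3, 5, 10]
print(merged_mean(statistics.mean(a), len(a), statistics.mean(b), len(b)))
print(statistics.mean(a + b))   # computed from the raw data, for comparison
```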

Next, we'll define a `TrendList` as a list of `Trend` objects,
with a method that lets us merge a `Trend` onto the right end of a `TrendList`.

In [None]:
class TrendList(list):
    def __init__(self, s, reverse=False):
        self.reverse = reverse
        for elem in s:
            if not isinstance(elem, Trend): # accept either Trends or numbers
                elem = Trend(elem)
            self.append(elem)

    def append(self, other):
        if not self:
            super().append(other)
            return
        popped = self.pop()
        if merged := popped.merge(other, reverse=self.reverse):  # merge and recurse
            self.append(merged)
        else:  # new trend cannot merge
            self.extend([popped, other])  # push popped back on, then other

In [None]:
s = [1, 2, 4]
for perm in itertools.permutations(s):
        print(f"{perm} -> {TrendList(list(perm))}") # itertools.permutations() returns tuples

We could use a list of Trends as initializers, too!

In [None]:
s = [Trend(elem) for elem in [1, 2, 4]]  # now try a list of Trends
print(f"s is now {s}")
for perm in itertools.permutations(s):
    print(f"{perm} -> {TrendList(list(perm))}") # itertools.permutations() returns tuples

If we reverse a list, then decompose it into a `TrendList` of *downward* trends (`reverse=True`), we should get the same list of `Trend` objects, reversed.

In [None]:
s = [1, 2, 4]
for perm in itertools.permutations(s):
    trends = TrendList(perm)
    rperm = reversed(list(perm))
    rtrends = list(reversed(TrendList(rperm, reverse=True)))
    print(f"{perm=}: reversing everything gets you back to where you started: {rtrends == trends}")

But now, we can decompose even large sequences quite quickly. This algorithm is $O(N)$.
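
One way to see why the work is linear: every element is pushed once, and every merge removes one trend from the list, so a sequence of length $N$ can never trigger more than $N-1$ merges. The sketch below is an iterative variant that works on plain `(mean, length)` tuples and counts the merges. It is our illustration, not the package's code, and it compares means only, ignoring the length tie-break.

```python
def decompose_stats(seq):
    """Return ([(mean, length), ...], merge_count) for the rising trends of seq."""
    stack = []       # finished trends, left to right
    merges = 0
    for x in seq:
        mean, length = x, 1
        # Absorb trends to the left for as long as their mean is smaller.
        while stack and stack[-1][0] < mean:
            left_mean, left_len = stack.pop()
            mean = (left_len * left_mean + length * mean) / (left_len + length)
            length += left_len
            merges += 1
        stack.append((mean, length))
    return stack, merges

trends, merges = decompose_stats([2, 1, 8, 4, 16])
print(trends, merges)
```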

In [None]:
for n in pows(7, base=10):
    print(f"{n=:10_d}: ", end="")
    %timeit TrendList(rands(n))

On my laptop, the last line tells me this new approach decomposes a list of a *million* random floats into trends, in an average of one and a third seconds for seven runs. That's two-tenths of a second per run, without breaking a sweat.

In trade, what we give up is knowing the contents of those trends -- the actual random numbers in each trend.

## Creating Single Trends

Sorting a sequence of random numbers turns it into a single, monotonic sequence. Is there an efficient way to turn it into a single trend? Two ways come right to mind:

1. A sorted sequence is a trend. We could sort it. Sorting is $O(N \log N)$.

1. Exactly one circular permutation of every sequence is a single trend, which is why $1$ arrangement in $N$ is a trend. We could rotate the sequence, looking for the one rotation that decomposes into a single trend.

This brute-force approach would be slower -- $O(N^2)$ -- since there are $N$ rotations, and decomposing each is also $O(N)$. Fortunately, there's a faster approach here, too.

1. Decompose the sequence into trends. This is $O(N)$.
1. Rotate trends front-to-back to create a circular permutation that's a single trend. The average number of trends is only $O(\log N)$.

Using this approach, turning a random sequence into a trend should only be $O(N) + O(\log N) = O(N)$.
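
For comparison, the brute-force rotation search mentioned earlier might look like the sketch below. It is far too slow for long sequences and is meant only as a cross-check on small inputs. It restates `is_trend()` compactly so it runs on its own, and `trend_rotations()` is a name we made up.

```python
import statistics

def is_trend(s):
    mean_s = statistics.mean(s)
    return all(statistics.mean(s[:i]) < mean_s for i in range(1, len(s)))

def trend_rotations(s):
    """Return every start index whose rotation of s is a single trend."""
    return [k for k in range(len(s)) if is_trend(s[k:] + s[:k])]

s = [4, 8, 2, 1]
starts = trend_rotations(s)
print(starts, [s[k:] + s[:k] for k in starts])
```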

In [None]:
t = TrendList([2, 1])
left = t[0]
right = TrendList(t[1:])
right.append(left)
right

In [None]:
from collections import namedtuple

Rotations = namedtuple('Rotations', 'start rotations')

def rotate(self):
    # Move the leftmost trend to the right end, merging wherever possible.
    left = self[0]
    right = TrendList(self[1:])
    right.append(left)
    return right

def single(self):
    new = self
    rotations = 0   # how many rotations to get to a single trend
    orig_start = 0  # position in original list that will become new[0]
    while len(new) > 1:
        orig_start += new[0].length  # position in self that will become new[0] following rotation
        new = new.rotate()
        rotations += 1
    return Rotations(start=orig_start, rotations=rotations)

TrendList.rotate = rotate
TrendList.single = single

We can check the speed.

In [None]:
for n in pows(7, base=10):
    print(f"{n=:10_d}: ", end="")
    %timeit TrendList(rands(n)).single()

Yep. Still $O(N)$.

Here's a simple example, to illustrate how it works.

In [None]:
s = pows(4)
for perm in itertools.permutations(s):
        trends = TrendList(list(perm)) # itertools.permutations() returns tuples
        print(f"{perm} ->")
        print(f"\t{trends} ->")
        print(f"\t\t{trends.single()}")

## Bucking the Trend

We now have the tools we need to study trends: 

* two classes: `Trend` and `TrendList`
* operators in those two classes: `Trend.merge()`, `TrendList.append()`, `TrendList.rotate()`, `TrendList.single()`
* good performance, both in speed and in space
* a pair of utility routines to generate sequences to study: `pows(), rands()`

As an example, let's do something a little kinky. 
Instead of looking at falling trends inside of random sequences, we'll look at falling trends inside of sequences that we know are single, rising trends.  
That is, we'll

* decompose a sequence into trends
* rotate the trendlist to create a single trend
* decompose that sequence into falling trends, which go the other way
* look at those falling trends

We'll "buck the trend."

In [None]:
def buck(s):
    trends = TrendList(s)
    single_start = trends.single().start
    rising_trend = s[single_start:] + s[:single_start]
    print(f"{rising_trend=}")
    buck_trend = TrendList(rising_trend, reverse=True)
    print(f"decomposed into falling trends = {buck_trend}")
    

And now you see what the `reverse` argument is for in `TrendList` and `Trend.merge()`: using it generates **falling trends**:
trends where the average of everything to the left of your finger is *greater than* the average of everything to the right, wherever you put your finger. Things are always looking down.

In [None]:
buck([4,8,2,1])
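
A direct test for a falling trend is the mirror image of `is_trend()`. This is a self-contained sketch; the name `is_falling_trend()` and the sample lists are ours.

```python
import statistics

def is_falling_trend(s):
    """True if every prefix mean is greater than the overall mean."""
    mean_s = statistics.mean(s)
    return all(statistics.mean(s[:i]) > mean_s for i in range(1, len(s)))

print(is_falling_trend([16, 4, 8, 1, 2]))   # the rising example, reversed
print(is_falling_trend([2, 1, 8, 4, 16]))   # a rising trend is not a falling one
```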

What if, instead of lists of integers, we want to look at sequences of random floats?
We'll tweak `buck()` slightly, replacing the list argument, which specified the sequence to use,
with a length, and just generate a random sequence that long.

This, at last, shows off what that third argument of `rands()` is good for.
Though `TrendList` throws away the generated, random numbers of the original list,
keeping track of `seed` lets us regenerate the sequence, 
and then `start` lets us rotate it to a single trend.
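
Here is a small check of that claim: with a fixed seed, asking `rands()` for a different `start` yields the same numbers, rotated. The block repeats the `rands()` definition so that it runs on its own; the seed and sizes are arbitrary choices of ours.

```python
import random

def rands(n, seed=None, start=0):
    # Same generator as defined earlier, repeated so this block is self-contained.
    random.seed(seed)
    start = start % n
    for _ in range(start):
        random.random()          # skip the first `start` numbers
    for _ in range(start, n):
        yield random.random()
    random.seed(seed)            # replay the sequence from the beginning
    for _ in range(start):
        yield random.random()

full = list(rands(6, seed=42))
rotated = list(rands(6, seed=42, start=4))
print(rotated == full[4:] + full[:4])
```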


In [None]:
def buck(n):
    seed = random.random() # random seed
    s = rands(n, seed=seed)
    trends = TrendList(s)
    single_start = trends.single().start
    single_trend = rands(n, seed=seed, start=single_start) # same, random numbers, rotated to a single trend!
    return TrendList(single_trend, reverse=True) # and broken into trends in the opposite direction

buck(1_000)

Let's time that.

In [None]:
for n in pows(7, base=10):
    print(f"{n=:10_d}: ", end="")
    %timeit buck(n)

Which is what you'd expect: 
requiring twice as many decompositions into trends -- one rising, one falling -- roughly doubles the runtime.

And how many trends are we talking about?

In [None]:
for n in pows(7, base=10):
    print(f"{n=:10_d}: ", end="")
    print(f"random={len(TrendList(rands(n)))}, buck-trends={len(buck(n))}")

Also what you'd expect! If you start with a sequence that's already a trend in one direction, the trends in the other direction should be shorter, and therefore more frequent, than the ones you get starting with a simple, random sequence.

## Summary

You can import the `trendlist` package to play with sequence trends and lists of trends.

The main package defines two classes, `Trend` and `TrendList`, with methods on each, that make writing code to investigate these easy.


The subpackage, `trendlist.simple`, defines other methods and functions that let you do the same tasks with less sophistication, while preserving all the data so you can look at it while you're playing. It's a self-teaching tool.

`trendlist` also has a pair of convenient utility functions, `pows()` and `rands()`, that make it easy to create lists of numbers you can decompose.

