# Using the `trendlist` Package

`trendlist` is a Python package that lets you create, manipulate, and explore trends.
This notebook will show you how to use it, with half a dozen examples.

(If you aren't familiar with trends at all, you may want to start by working through the Trends_Tutorial notebook.)

If you want to look at the source, you can clone it with `git clone https://github.com/jsh/trendlist.git`,
but a ready-to-go package, `trendlist`, is already in the Python Package Index, PyPI, so you can install it directly like this:

In [None]:
pip install trendlist

Once installed, you can just import and use it in the usual way.

In [None]:
import trendlist

## How fast can you decompose a random sequence into trends?

### Brute Force Is Slow.

In the trends tutorial, you decomposed random sequences into maximal trends 
in two ways. 

The first used `trendlist.simple.trend_list`, which finds the longest prefix that is a trend, then does that again on the remaining suffix, and continues until the sequence is completely decomposed.

In [None]:
from trendlist import rands
from trendlist.simple import trend_list
%timeit trend_list(list(rands(600)))

`trend_list` is fast on short sequences, but slows down sharply once sequences grow beyond a few hundred elements. It's impractical for really long sequences of random numbers.
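
To make the prefix-by-prefix strategy concrete, here is a minimal pure-Python sketch. It is not the package's code: `is_trend` and `naive_trend_list` are hypothetical names, and the sketch assumes the tutorial's definition of a trend is "every proper prefix has a smaller mean than the suffix that follows it". Because it rescans ever-shorter prefixes from scratch, this version does far more than linear work, which gives a feel for why a brute-force approach bogs down.

```python
def is_trend(seq):
    """Assumed definition: every proper prefix has a smaller mean than the suffix after it."""
    for k in range(1, len(seq)):
        prefix, suffix = seq[:k], seq[k:]
        if sum(prefix) / len(prefix) >= sum(suffix) / len(suffix):
            return False
    return True  # a single element is trivially a trend


def naive_trend_list(seq):
    """Hypothetical helper: repeatedly peel off the longest prefix that is a trend."""
    seq = list(seq)
    trends = []
    while seq:
        # try the longest candidate prefix first, shrinking until one is a trend
        for end in range(len(seq), 0, -1):
            if is_trend(seq[:end]):
                trends.append(seq[:end])
                seq = seq[end:]
                break
    return trends


for s in ([0.1, 0.2, 0.4], [0.4, 0.1, 0.2], [0.4, 0.2, 0.1]):
    print(s, "->", naive_trend_list(s))
```

A one-element prefix always qualifies, so the inner loop is guaranteed to make progress.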

`trendlist.TrendList` offers a faster alternative.

### TrendList Is Fast

In [None]:
from trendlist import TrendList
%timeit TrendList(rands(1_000))

We expect longer sequences to take longer to decompose, 
but how quickly do `TrendList` run times grow with sequence size?

In [None]:
for n in range(1_000, 10_000, 1_000):
    print(f"{n=}: ", end="")
    %timeit TrendList(rands(n))

Looks quite linear. Unless this breaks down, you'd predict `TrendList` would decompose a sequence of a million floats in about $2\ \text{msec} \times 1000 = 2\ \text{sec}$. Let's try.
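
Spelled out as arithmetic, the back-of-the-envelope prediction is a simple proportion. The 2 msec figure is the rough per-thousand timing quoted above; your own machine's number will differ, so treat the variable values as placeholders.

```python
measured_n = 1_000        # sequence length that was timed
measured_seconds = 2e-3   # roughly 2 msec, the figure quoted in the text
target_n = 1_000_000      # the length we want a prediction for

# linear scaling assumption: run time grows in proportion to sequence length
predicted_seconds = measured_seconds * (target_n / measured_n)
print(f"predicted: {predicted_seconds:.1f} sec")
```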

In [None]:
%timeit TrendList(rands(1_000_000))

Sho 'nuff. `TrendList` can decompose a million-float sequence in less than half the time `trend_list()` takes to decompose a list of $1,000$.

It's peppy. You can work with it.

## How Many Trends in a Random Sequence?

That was fun. What else could you use the `trendlist` package to ask?
How about the average number of trends in a random sequence?

Again, this will depend on sequence length -- you'd expect longer sequences to usually have more trends -- 
and although the result doesn't seem obvious,
the code isn't hard.

Let's watch that.

In [None]:
def trend_count(seq):
    trends = TrendList(seq)
    return len(trends)

In [None]:
s_up = [0.1, 0.2, 0.4]    # an increasing trend
print(trend_count(s_up))
s_down = [0.4, 0.2, 0.1]  # the decreasing trend `list(reversed(s_up))`
print(trend_count(s_down))
s_mixed = [0.4, 0.1, 0.2] # neither increasing nor decreasing
print(trend_count(s_mixed))
s_random = rands(10) # a sequence of random floats
print(trend_count(s_random))

Try some yourself!

From here, it's easy. We'll walk through everything step-by-step.

First, the function to find the mean number of trends for a fixed-length sequence over a number of trials.

In [None]:
def mean_trend_count(n, trials):
    total_trends = 0  # a running total
    for _ in range(trials):
        s_random = rands(n) # generate a sequence of n floats
        total_trends += trend_count(s_random) # add the number of trends to the running total
    return round(total_trends / trials, 3)  # average number of trends to three decimal places

Trying it out:

In [None]:
mean_trend_count(20, 100)

Next, wrap that in another function, which returns the mean at a variety of sequence lengths.

In [None]:
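def trends_by_length(max_length, trials, npoints):
    increment = max_length // npoints  # spacing between sampled lengths
    ntrends = {}
    # stop at max_length + 1 so that max_length itself is sampled
    for nrands in range(increment, max_length + 1, increment):
        ntrends[nrands] = mean_trend_count(nrands, trials)
    return ntrends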

In [None]:
trends_by_length(10_000, 100, 20)

Unfortunately, that doesn't look cleanly linear. Not as lucky as when we were timing `TrendList`.

No reason to be discouraged. It turns out you can get a better line with a simple transformation. Instead of using $sequence\ length$ for the X axis, use $\ln(sequence\ length)$.
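
A small aside on the log scale, as a sketch using only the standard library: if you sample lengths by repeated doubling, the lengths themselves spread out fast, but their natural logs are evenly spaced, $\ln 2$ apart. That makes doubling a convenient way to pick points along a log axis.

```python
import math

lengths = [2**k for k in range(1, 8)]   # 2, 4, 8, ..., 128
logs = [math.log(n) for n in lengths]

# gaps between neighbouring lengths double each time ...
length_gaps = [b - a for a, b in zip(lengths, lengths[1:])]
# ... but gaps between their logs are all ln(2)
log_gaps = [b - a for a, b in zip(logs, logs[1:])]

print(length_gaps)
print([round(g, 3) for g in log_gaps])
```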

In [None]:
import math

def trends_by_log_length(max_length, trials):
    ntrends = {}
    nrands = 1         # start small
    while nrands <= max_length:
        log_length = round(math.log(nrands), 2)
        ntrends[log_length] = mean_trend_count(nrands, trials)
        nrands *= 2    # double nrands at each iteration
    return ntrends

In [None]:
nt = trends_by_log_length(10_000, 100)
print(nt)

That looks pretty linear. We'll use `numpy` to test it.
`numpy` is a big library with a ton o' stuff for solving mathematical problems with vectors and matrices, so we're killing a fly with a sledgehammer, but it's near-at-hand.

In [None]:
import numpy as np
x = list(nt.keys()) # log of the length
y = list(nt.values()) # number of trends
r = np.corrcoef(x, y)[0, 1]  # how closely are the two variables correlated?
print(f"coefficient of determination = {r**2}")

That's really quite a good fit. What's the equation of the line?

In [None]:
m, b = np.polyfit(x, y, deg=1)
print(f"trends = {m:#.2g}*ln(length) + {b:#.2g}")

And what a line! For a sequence of length $N$, $trends \approx \ln(N)$.

How about a really long sequence?

In [None]:
length = 1_000_000
count = mean_trend_count(length, 7)  # a million floats, 7 trials
print(f"Predicted: {math.log(length):#.2g}, Actual: {count:#.2g}")

How long a sequence would you need to average 7 trends?

Try decomposing a sequence that length to see if you're right.
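
If you'd like to check your guess, one way is to invert the fitted relationship. This sketch assumes the approximation $trends \approx \ln(N)$ from above holds, so $N \approx e^{trends}$; `target_trends` and `estimated_length` are just illustrative names.

```python
import math

target_trends = 7
# invert trends = ln(N) to get N = e**trends, rounded to a whole length
estimated_length = round(math.exp(target_trends))
print(estimated_length)
```

Feeding that length to `mean_trend_count` with a healthy number of trials is a quick way to see how close the approximation comes.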

## Turn a Random Sequence into a Single Trend with `TrendList.rotate`

Every random sequence has exactly one circular permutation that's a single trend.
After a sequence is decomposed into a list of `Trend` objects (a `TrendList`), a circular permutation of those objects, followed by any necessary merges, creates a new `TrendList` with fewer trends.

The `rotate()` method of a `TrendList` object does that rotation.

In [None]:
trends = TrendList(rands(1_000_000))
print("Initial trendlist:")
for trend in trends:
    print(trend)
print(f"{len(trends)=}")
print("\nBut with each rotation and merge, the number of trends shrinks.")
while len(trends) > 1:
    trends = trends.rotate()
    print(f"{len(trends)=}")
print("Until there's finally a single trend!")
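
The claim that exactly one circular permutation is a single trend can be checked by brute force on a small sequence. This is a sketch, not the package's method: it assumes a trend is a sequence in which every proper prefix has a smaller mean than the suffix after it, and `is_trend` and `rotations` are hypothetical helpers written for this example.

```python
def is_trend(seq):
    """Assumed definition: each proper prefix has a smaller mean than the suffix after it."""
    n = len(seq)
    # cross-multiplied comparison of the two means keeps integer inputs exact
    return all(
        sum(seq[:k]) * (n - k) < sum(seq[k:]) * k
        for k in range(1, n)
    )


def rotations(seq):
    """Yield (offset, rotated copy) for every circular permutation of seq."""
    for offset in range(len(seq)):
        yield offset, seq[offset:] + seq[:offset]


seq = [5, 9, 2, 6, 3, 1, 4, 1]
single_trends = [(off, rot) for off, rot in rotations(seq) if is_trend(rot)]
print(single_trends)
```

Only one offset survives the filter. Trying all offsets costs far more than rotating whole trends, which is the point of `rotate()`.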

## How Many Rotations to Get a Single Trend?

Okay, we can rotate to a single trend. How many rotations are needed?
For this, `TrendList` has a `single` method, which reports both the number of rotations it needs and the distance rotated.

In [None]:
trends = TrendList(rands(1_000))
print(f"{len(trends)=}")
print(f"{trends.single()=}")

Watch it in slow motion.

In [None]:
trends = TrendList(rands(1_000)) # make a random TrendList
print("Initial trendlist:") # and print out its trends.
for trend in trends:
    print(trend)
print(f"{trends.single()=}") # what does trends.single() report?

# now go back and do the same thing, one rotation at a time.
distance = 0
rotations = 0
print(f"{len(trends)=}, {distance=}, {rotations=}")
while len(trends) > 1:
    distance += trends[0].length # the distance the next rotation will take us
    trends = trends.rotate()
    rotations += 1 # count the rotations
    print(f"{len(trends)=}, {distance=}, {rotations=}")
# and finally, the trend we end up with
for trend in trends:
    print(trend)

But what do you *expect* those numbers to be?

The final, single trend is as long as the original sequence -- that's a relief, right? -- and has a mean of about the mean of a random, U(0,1) sequence, $0.5$.

But what about the average distance rotated?

In [None]:
from statistics import mean
n = 10_000
trials = 1000
distance = []
for _ in range(trials):
    trends = TrendList(rands(n)) # break a long, random sequence into trends
    distance.append(trends.single().start) # rotate to a single trend, save away the distance
print(f"{mean(distance)=}")    

Hah! About half the length of the sequence.

Makes sense. Pick a trend of length $N$. `single()` turns each of its circular permutations back into that trend, and the rotation distances run from $0$ to $N-1$. Averaged over all those rotations, the distance is about half the length of the sequence: $N/2$.
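
The arithmetic behind that $N/2$ is short enough to check directly. The sketch assumes every starting offset from $0$ to $N-1$ is equally likely, which is the argument made above rather than something measured here.

```python
n = 10_000
offsets = range(n)  # every possible rotation distance, each counted once

# mean of 0, 1, ..., n-1 is (n - 1) / 2, just under half the sequence length
mean_offset = sum(offsets) / len(offsets)
print(mean_offset, (n - 1) / 2)
```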

How about the number of rotations?

In [None]:
from statistics import mean
from math import log
n = 10_000
trials = 1000
rots = []
for _ in range(trials):
    trends = TrendList(rands(n)) # again, decompose a long, random sequence
    rots.append(trends.single().num_rots) # this time, save the rotations to get to 1 trend
print(f"{mean(rots)=}") # the average
print(f"{log(n)/2=:.2g}") # and ...?

Can you see why this makes sense?

## Wrap-Up

Taken together, this means `trendlist` lets you create a single trend from a random sequence of length $N$ in $O(N)$ time.

* You're guaranteed that one circular permutation will be a single trend.
* You can decompose the sequence into trends in $O(N)$ time.
* You can rotate that decomposition to a single trend in $O(\log(N))$ steps.
* You can use the trend-length data to know how much to go back and rotate the original sequence to get the single trend.
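
The last step, rotating the original sequence, is plain list slicing. In this sketch `rotate_left` is a hypothetical helper, and it assumes the reported distance counts elements moved from the front of the sequence to the back, which is how the slow-motion loop earlier accumulates it.

```python
def rotate_left(seq, distance):
    """Hypothetical helper: move the first `distance` elements of seq to the end."""
    if not seq:
        return seq
    distance %= len(seq)  # distances past the end wrap around
    return seq[distance:] + seq[:distance]


original = [5, 9, 2, 6, 3, 1, 4, 1]
rotated = rotate_left(original, 4)
print(rotated)
```

For very long sequences, `collections.deque.rotate` from the standard library is an alternative worth considering.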

With the `trendlist` toolkit, you've also had a chance to

* decompose sequences into trends, fast. You can break a million floats into maximal trends in a couple of seconds.
* see that the expected number of trends is, roughly, the natural log of the sequence length.
* rotate the trends to produce a single trend.
* realize that the final, single trend

    * is the same length as the original sequence,
    * has the same mean as the random numbers,
    * starts at a position uniformly distributed along the sequence, and
    * takes about $\ln(N)/2$ rotations to create.
    
Now, it's your turn. Go play with `trendlist`. Ask some questions of your own.