# Demo of USM Renyi Entropy in Python
## How to compute continuous Renyi entropy of USM using the pyusm library
#### By Katherine Wuestney

**Contents**
* [Introduction](#Intro)
    - [Terminology Introduction](#termintro)
* [Section 1](#Section1) Renyi continuous entropy of USM
    - [Section 1.1](#Section1.1) Renyi Continuous Entropy
    - [Section 1.2](#Section1.2) Parzen Window Kernel Density Estimation
    - [Section 1.3](#Section1.3) Equation of *d*-dimensional Renyi Entropy of USM
* [Section 2](#Section2) Demo of usm_entropy module

<a id='Intro'></a>
## Introduction

Universal Sequence Maps (USM) are a generalization of the chaos game representation (CGR) maps first introduced by Jeffrey, and they function as generalized order-free Markov transition matrices of symbolic sequences. Due to this property, and the others discussed in the notebook demonstration 'demo_usm_make' 

<a id='termintro'></a>
### Terminology Introduction - Use this section as a glossary
The topics presented in this notebook are highly cross-disciplinary, so the literature uses a variety of different terms for the same basic concepts. For clarity, we define the following terms within the context of symbolic sequence analysis:
* "Sequence" or "Symbolic sequence" - an indexed set of symbols in which the order of the symbols is integral and is thus the object of analysis. Sequences are commonly encountered in genomics, time series analysis, linguistics, and natural language processing, among others. Synonyms of sequence include strings, series, or sometimes vectors.
* "Symbol" - a nominal data element which has no numeric magnitude in the general Euclidean sense. Symbols are common data types in linguistics and natural language processing, information theory and cryptography, and genomics. A symbol is congruous with a single category of a categorical variable. A symbol can be anything, including a number, but it does not behave as ordinary numbers do, because it is generally a proxy "symbolizing" some other construct.
* "Generating function" - a process or phenomenon which produces a sequence of symbols.
* "Alphabet" - the set of all possible symbols a generating function may produce. For example, if our sequence is a paragraph of a Charles Dickens novel, our generating function could be considered 19th century English typography, and our alphabet would be the 26 letters of the English alphabet plus each punctuation character in use during that time. The term alphabet is congruous with the term "state space" from dynamical systems, in that the alphabet functions as the basic state space of a symbolic generating function. The size of an alphabet is referred to in many different ways in the literature, but in this discussion we will refer to an alphabet's size as its dimension *d* and  
* "k-gram" - a subsequence of a longer sequence consisting of k sequential symbols. For example, if our sequence is "ACTGGCA", "TG" would be a k-gram with k=2. In symbolic sequence analysis we are often most interested in the frequency and patterns of k-grams of various lengths. Synonyms for k-gram include L-tuple, subsequence, word, motif, sub-string, or vector. 
* "Suffix" - the k-gram occurring at the very end of a sequence. For "ACTGGCA", its length 3 suffix is the k-gram "GCA".
* "Prefix" - the k-gram occurring at the very beginning of a sequence. For "ACTGGCA", its length 3 prefix is the k-gram "ACT".
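
The glossary terms above can be made concrete with a few lines of plain Python. This is only an illustrative sketch using the example sequence "ACTGGCA"; the helper name `kgrams` is hypothetical and is not part of the pyusm library. It also assumes the alphabet can be read off the observed sequence, which in general gives only a subset of the symbols the generating function could produce.

```python
from collections import Counter


def kgrams(sequence, k):
    """Return every k-gram (length-k run of consecutive symbols), in order.

    Hypothetical helper for illustration only; not a pyusm function.
    """
    if k < 1 or k > len(sequence):
        raise ValueError("k must be between 1 and len(sequence)")
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


sequence = "ACTGGCA"

# Alphabet observed in this sequence (assumed here to be the full alphabet).
alphabet = sorted(set(sequence))
d = len(alphabet)  # the alphabet's dimension d

# All 2-grams of the sequence and how often each one occurs.
bigrams = kgrams(sequence, 2)
bigram_counts = Counter(bigrams)

# The length-3 prefix and suffix are the first and last 3-grams.
prefix = sequence[:3]
suffix = sequence[-3:]

print(alphabet, d)
print(bigrams)
print(prefix, suffix)
```

A sequence of length N contains N - k + 1 overlapping k-grams, which is why the list comprehension stops at `len(sequence) - k + 1`.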
