## Artificial Data

We typically want data in which the variance decays across components. For now, we consider an exponential decay with rate $\alpha\geq 0$; in the special case $\alpha=0$, the variance does not decay. The first component has the highest variance, which is set to $1$.
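Written out with a zero-based component index $k$ (the indexing is an assumption, chosen to match the code that follows), the intended variance of component $k$ is

$$\sigma_k^2 = e^{-\alpha k}, \qquad k = 0, 1, \dots, k_0 - 1,$$

so that $\sigma_0^2 = 1$ and each subsequent component's variance shrinks by a factor of $e^{-\alpha}$.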

In [5]:
k0 = 10

In [6]:
alpha = 1

In [7]:
n = 10000

In [8]:
import numpy as np

In [9]:
pc_variance = np.exp(-alpha*np.arange(k0))

In [10]:
pc_variance

array([1.00000000e+00, 3.67879441e-01, 1.35335283e-01, 4.97870684e-02,
       1.83156389e-02, 6.73794700e-03, 2.47875218e-03, 9.11881966e-04,
       3.35462628e-04, 1.23409804e-04])

In [11]:
import scipy.stats as stats

In [12]:
rand_ortho = stats.ortho_group.rvs(k0)

In [13]:
rand_ortho

array([[ 0.03113753, -0.10756708,  0.3829638 , -0.29449399,  0.38944405,
        -0.3968229 ,  0.04771298,  0.07001126, -0.11376546,  0.65177913],
       [-0.11525023, -0.02037656,  0.08927556, -0.08101602,  0.40664692,
         0.36917823, -0.46045378, -0.31050022,  0.59774144,  0.06626683],
       [ 0.14127141,  0.11664488, -0.63247937, -0.47459633,  0.01061824,
         0.19992794, -0.2108889 ,  0.44073605, -0.03033881,  0.24786642],
       [-0.16510134, -0.90756312, -0.05278363, -0.05206021, -0.23688768,
         0.07820758,  0.10236342,  0.15746166,  0.20424004,  0.06599796],
       [-0.43007134, -0.15025137, -0.07656355,  0.16677394,  0.32812153,
         0.41681248, -0.10812714, -0.08925167, -0.67251008,  0.07391964],
       [ 0.18364891,  0.05924229, -0.1539361 ,  0.01046509,  0.22094375,
         0.39632736,  0.79616699, -0.18713127,  0.1611838 ,  0.19541204],
       [ 0.19301993, -0.12784881, -0.27740139,  0.61053827,  0.52103364,
        -0.25648143, -0.03607002,  0.36786525

In [14]:
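# Note: `scale` is the standard deviation, not the variance, so component k
# ends up with variance pc_variance[k]**2. Use np.sqrt(pc_variance) instead
# to obtain variances equal to pc_variance.
rand_normal = np.random.normal(scale = pc_variance, size = (n,k0))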

In [15]:
rand_input = np.matmul(rand_normal, rand_ortho)

In [16]:
import sklearn.decomposition as decomp

In [17]:
pca = decomp.PCA()

In [18]:
pca_input = pca.fit(rand_input)

In [19]:
pca_input.explained_variance_

array([1.00577096e+00, 1.33666767e-01, 1.89121831e-02, 2.50179747e-03,
       3.37748930e-04, 4.50521934e-05, 6.13145041e-06, 8.59585235e-07,
       1.11560676e-07, 1.49624884e-08])
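The explained variances above are close to the squares of `pc_variance` rather than to `pc_variance` itself, because `np.random.normal` interprets `scale` as a standard deviation. The following is a minimal sketch of that effect, assuming a seeded `np.random.default_rng` generator (not used in the original cells) and comparing per-column sample variances directly instead of running PCA.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded generator, an assumption for reproducibility
alpha, k0, n = 1, 10, 10000
pc_variance = np.exp(-alpha * np.arange(k0))

# `scale` is a standard deviation, so these columns have variance pc_variance**2.
as_std = rng.normal(scale=pc_variance, size=(n, k0))

# Taking the square root first yields columns with variance pc_variance.
as_var = rng.normal(scale=np.sqrt(pc_variance), size=(n, k0))

# Both ratios should be close to 1 for every component.
ratio_std = as_std.var(axis=0) / pc_variance**2
ratio_var = as_var.var(axis=0) / pc_variance
print(np.round(ratio_std, 2))
print(np.round(ratio_var, 2))
```

If the goal is for the variances themselves to decay as $e^{-\alpha k}$, the `np.sqrt` variant is the one to use; the original call instead produces a decay of $e^{-2\alpha k}$.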

In [20]:
import lazytools
lazytools.matrix_heatmap(np.matmul(pca_input.components_, rand_ortho.T), pole = 0)

[Figure: heatmap of `np.matmul(pca_input.components_, rand_ortho.T)`]
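The heatmap compares the recovered principal axes with the rows of the random rotation. Since the data are a rotated version of independent components, the product should be close to a diagonal matrix with entries of $\pm 1$ (the sign of each principal axis is arbitrary). The sketch below checks this numerically using only NumPy. It swaps in a QR decomposition for `stats.ortho_group.rvs` and an eigendecomposition of the sample covariance for `decomp.PCA`, and it uses `np.sqrt(pc_variance)` as the scale; all three are assumptions made to keep the example self-contained.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, k0, n = 1, 10, 10000
pc_variance = np.exp(-alpha * np.arange(k0))

# Random orthogonal matrix from a QR decomposition (stand-in for ortho_group.rvs).
q, _ = np.linalg.qr(rng.normal(size=(k0, k0)))

# Independent components with decaying variance, then rotated.
latent = rng.normal(scale=np.sqrt(pc_variance), size=(n, k0))
data = latent @ q

# Principal axes: eigenvectors of the sample covariance, sorted by decreasing variance.
eigvals, eigvecs = np.linalg.eigh(np.cov(data, rowvar=False))
order = np.argsort(eigvals)[::-1]
components = eigvecs[:, order].T  # rows are principal axes, as in PCA.components_

# Up to sign, each recovered axis should match one row of q.
alignment = components @ q.T
max_dev = np.abs(np.abs(alignment) - np.eye(k0)).max()
print(max_dev)
```

A small `max_dev` indicates that the decomposition recovers the generating axes in order of decreasing variance.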

This is a simple way to create inherently high-dimensional data in which successive dimensions carry less and less weight, which makes it a convenient test case for studying linear predictive coding.

### Abstraction

In [3]:
import predicode as pc

In [4]:
art_data = pc.DecayingMultiNormal(dimensions = 10, samples = 10000, alpha = 1)

In [8]:
art_data.data.shape

(10000, 10)
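To make the connection to the steps above concrete, here is a rough sketch of what such an abstraction could look like. The function `decaying_multinormal` and its `seed` parameter are hypothetical and are not taken from `predicode`; the actual `pc.DecayingMultiNormal` class may be implemented differently.

```python
import numpy as np

def decaying_multinormal(dimensions, samples, alpha, seed=None):
    """Hypothetical helper (not predicode's implementation).

    Draws `samples` points whose independent components have variance
    exp(-alpha * k), then applies a random orthogonal rotation.
    """
    rng = np.random.default_rng(seed)
    variance = np.exp(-alpha * np.arange(dimensions))
    # Random orthogonal matrix obtained from a QR decomposition.
    rotation, _ = np.linalg.qr(rng.normal(size=(dimensions, dimensions)))
    # `scale` expects a standard deviation, hence the square root.
    latent = rng.normal(scale=np.sqrt(variance), size=(samples, dimensions))
    return latent @ rotation

data = decaying_multinormal(dimensions=10, samples=10000, alpha=1, seed=0)
print(data.shape)  # (10000, 10)
```

Because the rotation is orthogonal, the total variance of `data` stays close to the sum of the component variances.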