# Demo 1: Producers

- [Introduction](#Introduction)
- [Creation Routines](#Creation-Routines)
    - [From arrays](#From-arrays)
    - [From sequences](#From-sequences)
    - [From binary files](#From-binary-files)
    - [From generating functions](#From-generating-functions)
    - [From producers](#From-producers)
- [Producers-to-arrays](#Producers-to-arrays)
- [Masked Producers](#Masked-Producers) 

In [1]:
import numpy as np
import matplotlib.pyplot as plt

from openseize import producer
from openseize.io.edf import Reader

## Introduction

This document introduces Producers, one of the primary building blocks of OpenSeize. Producers address the problem of having to load a massive dataset into memory before analyzing or manipulating it. A Producer takes one of several forms of data together with a desired chunksize and yields the data as numpy arrays of that chunksize. Think of a Producer as a multipurpose PEZ dispenser: you slot in your input data and a chunksize to create a Producer object, and the object then hands out chunks one at a time as you iterate over it, for example with a for loop.
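
The chunk-at-a-time idea can be illustrated without OpenSeize at all. The sketch below is a plain Python generator, not the library's implementation; `iter_chunks` is a hypothetical helper name introduced only for illustration.

```python
from typing import Iterator, List, Sequence


def iter_chunks(data: Sequence[int], chunksize: int) -> Iterator[List[int]]:
    """Yield consecutive pieces of `data`, each holding at most `chunksize` items.

    Hypothetical helper for illustration only; not part of OpenSeize.
    """
    for start in range(0, len(data), chunksize):
        # Only one slice is materialised per iteration.
        yield list(data[start:start + chunksize])


# A range is lazy, so nothing is stored up front.
chunks = iter_chunks(range(12), chunksize=5)
first = next(chunks)  # the first chunk only
rest = list(chunks)   # the remaining chunks
print(first)  # [0, 1, 2, 3, 4]
print(rest)   # [[5, 6, 7, 8, 9], [10, 11]]
```

In this sketch the final chunk is shorter whenever the chunksize does not divide the length of the data evenly.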

Below, we show how to create Producers from the following input types:
 - Numpy Arrays
 - Python Sequences
 - Binary Files (e.g. EDF Files)
 - Python Generator Functions
 - Other Producers

For each of these, we also give examples of how to read and use them in practice.

In addition, we'll take a look at Masked Producers, which are a convenient way to filter the data that the Producer outputs based on criteria you might want to enforce.

## Creation Routines

### From arrays

First, we'll look at numpy arrays and how to wrap them in a Producer. Because a numpy array is already loaded in memory, you won't get the memory savings you would with, say, a large binary file, but you can still use the Producer to hand out the array one chunk at a time.

Here, we make a basic 2D array holding the values 0 through 4999.

In [24]:
data_array = np.arange(5000).reshape(20, 250)
data_array[:5]

array([[   0,    1,    2, ...,  247,  248,  249],
       [ 250,  251,  252, ...,  497,  498,  499],
       [ 500,  501,  502, ...,  747,  748,  749],
       [ 750,  751,  752, ...,  997,  998,  999],
       [1000, 1001, 1002, ..., 1247, 1248, 1249]])

Now we can create the Producer object.

We need to choose the chunksize, and the axis along which we're going to split the data. In this example, we will split the data along the first axis.

In [19]:
chunksize = 5
axis = 0
array_producer = producer(data_array, chunksize=chunksize, axis=axis)

Now that we have our Producer, we can iterate over it like any other Python iterable. Here, we use a for loop to access each chunk.

We can see the shape of each chunk; because we split along the first axis with a chunksize of 5, each chunk has 5 as its first dimension.
Because the output chunks are numpy arrays, you can perform any numpy operations on them as you see fit.

In [21]:
for chunk in array_producer:
    print(chunk.shape)
    print(np.average(chunk))

(5, 250)
624.5
(5, 250)
1874.5
(5, 250)
3124.5
(5, 250)
4374.5
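
Chunk-wise iteration is also enough to compute whole-dataset statistics without holding everything at once. The sketch below assumes only that chunks arrive one at a time; `row_chunks` is a hypothetical stand-in for a producer, written in plain Python, and it reproduces the per-chunk averages printed above while accumulating the overall mean.

```python
from typing import Iterator, List


def row_chunks(n_rows: int, n_cols: int, chunksize: int) -> Iterator[List[List[int]]]:
    """Yield `chunksize` rows at a time of an n_rows x n_cols grid counting up from 0.

    Hypothetical stand-in for a producer; not part of OpenSeize.
    """
    for start in range(0, n_rows, chunksize):
        stop = min(start + chunksize, n_rows)
        yield [list(range(r * n_cols, (r + 1) * n_cols)) for r in range(start, stop)]


total, count, chunk_means = 0, 0, []
for chunk in row_chunks(n_rows=20, n_cols=250, chunksize=5):
    values = [v for row in chunk for v in row]
    chunk_means.append(sum(values) / len(values))
    # Keep only a running sum and count, never the earlier chunks.
    total += sum(values)
    count += len(values)

print(chunk_means)    # [624.5, 1874.5, 3124.5, 4374.5]
print(total / count)  # 2499.5
```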


To see what happens when we split the data along the second axis, we pass the same array to a new Producer with axis set to 1. This time, each chunk keeps the full length of the original data along the first axis (20) but is only 5 long along the second axis.

In [31]:
axis = 1
array_producer2 = producer(data_array, chunksize=chunksize, axis=axis)

# Here, we convert the iterable Producer into a list so we can access just the first few chunks.
# Generally, this is not recommended because it loads all of the data into memory. For an in-memory array, it makes no difference.
array_producer2_list = list(array_producer2)

for i in range(5):
    print(array_producer2_list[i].shape)
    print(np.average(array_producer2_list[i]))

(20, 5)
2377.0
(20, 5)
2382.0
(20, 5)
2387.0
(20, 5)
2392.0
(20, 5)
2397.0
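
The shapes and averages above can be cross-checked with plain NumPy slicing. This is a sketch of what chunking along each axis means for this particular array, not a description of how OpenSeize implements it; `take_chunk` is a hypothetical helper.

```python
import numpy as np


def take_chunk(arr: np.ndarray, index: int, chunksize: int, axis: int) -> np.ndarray:
    """Return the `index`-th block of `chunksize` entries along `axis`.

    Hypothetical helper for illustration only; not part of OpenSeize.
    """
    start = index * chunksize
    # np.take with a run of indices selects a contiguous block along one axis.
    return np.take(arr, np.arange(start, start + chunksize), axis=axis)


data_array = np.arange(5000).reshape(20, 250)

rows = take_chunk(data_array, index=0, chunksize=5, axis=0)
cols = take_chunk(data_array, index=0, chunksize=5, axis=1)

print(rows.shape, rows.mean())  # (5, 250) 624.5
print(cols.shape, cols.mean())  # (20, 5) 2377.0
```

Each column block spans all 20 rows, which is why the averages along the second axis sit near the middle of the data and rise by 5 from one chunk to the next.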


### From sequences

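Producers can also take a Python sequence object as input. Common sequence types include strings, lists, tuples, and byte arrays. A Producer is created and iterated in the same way as in the arrays example. Note, however, that the chunks printed below are 1-D: the rows of the nested list appear to be joined end to end along the chosen axis before being split into chunks.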

In [54]:
data_sequence_list = [[num for num in range(10)] for _ in range(5000)]

# Print out the first five rows of the list.
data_sequence_list[:5]


[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]]

In [64]:
chunksize = 5
axis = 0

sequence_producer = producer(data_sequence_list, chunksize=chunksize, axis=axis)

# Here, again, we convert the iterable Producer into a list so we can access just the first few elements.
# Do not do this in practice.
sequence_producer_list = list(sequence_producer)

for i in range(5):
    print(sequence_producer_list[i])

[0 1 2 3 4]
[5 6 7 8 9]
[0 1 2 3 4]
[5 6 7 8 9]
[0 1 2 3 4]
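
The printed chunks are 1-D and cycle through 0 to 9, which is consistent with the 5000 rows being joined end to end before chunking. The sketch below reproduces that pattern with the standard library only; it is an assumption about the observed behaviour, not a statement of OpenSeize's internals, and `flat_chunks` is a hypothetical helper.

```python
from itertools import chain, islice
from typing import Iterable, Iterator, List, Sequence


def flat_chunks(rows: Iterable[Sequence[int]], chunksize: int) -> Iterator[List[int]]:
    """Join `rows` end to end, then yield pieces of `chunksize` items.

    Hypothetical helper for illustration only; not part of OpenSeize.
    """
    stream = chain.from_iterable(rows)
    while True:
        piece = list(islice(stream, chunksize))
        if not piece:
            return
        yield piece


data_sequence_list = [list(range(10)) for _ in range(5000)]
first_five = list(islice(flat_chunks(data_sequence_list, chunksize=5), 5))
for piece in first_five:
    print(piece)
# [0, 1, 2, 3, 4]
# [5, 6, 7, 8, 9]
# [0, 1, 2, 3, 4]
# [5, 6, 7, 8, 9]
# [0, 1, 2, 3, 4]
```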


We repeat the example once more, this time with a tuple, to show that the same approach works for other sequence types.

In [75]:
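arrs = [[num for num in range(10)] for _ in range(5000)]
# tuple() builds an actual tuple; parentheses around a list leave it a list.
data_sequence_tuple = tuple(arrs)

sequence_producer2 = producer(data_sequence_tuple, chunksize=chunksize, axis=axis)

# Iterate over the new tuple-based Producer, not the earlier list-based one.
sequence_producer2_list = list(sequence_producer2)

for i in range(5):
    print(sequence_producer2_list[i])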

[0 1 2 3 4]
[5 6 7 8 9]
[0 1 2 3 4]
[5 6 7 8 9]
[0 1 2 3 4]


### From binary files

### From generating functions

### From producers

One use case you might run into is wanting to re-chunk data that is already in a Producer, either with a different chunksize or along a different axis. You can do this by passing the existing Producer directly into the creation of another one, as we show below.

In [96]:
base_array = np.arange(500).reshape(20, -1)

chunksize = 5
axis = 0

initial_producer = producer(base_array, chunksize=chunksize, axis=axis)

producer_list = list(initial_producer)

for i in range(1):
    print(producer_list[i])

[[  0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17
   18  19  20  21  22  23  24]
 [ 25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41  42
   43  44  45  46  47  48  49]
 [ 50  51  52  53  54  55  56  57  58  59  60  61  62  63  64  65  66  67
   68  69  70  71  72  73  74]
 [ 75  76  77  78  79  80  81  82  83  84  85  86  87  88  89  90  91  92
   93  94  95  96  97  98  99]
 [100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117
  118 119 120 121 122 123 124]]


In [85]:
# Change the chunksize to something new; the axis stays at 0
chunksize = 10
axis = 0

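# Create a new producer from the old one.
# Note: the output shown below appears to come from a run in which the base array
# was np.arange(5000).reshape(20, -1), so its rows are 250 wide rather than 25.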
new_producer = producer(initial_producer, chunksize=chunksize, axis=axis)

producer_list = list(new_producer)

for i in range(2):
    print(producer_list[i])

[[   0    1    2 ...  247  248  249]
 [ 250  251  252 ...  497  498  499]
 [ 500  501  502 ...  747  748  749]
 ...
 [1750 1751 1752 ... 1997 1998 1999]
 [2000 2001 2002 ... 2247 2248 2249]
 [2250 2251 2252 ... 2497 2498 2499]]
[[2500 2501 2502 ... 2747 2748 2749]
 [2750 2751 2752 ... 2997 2998 2999]
 [3000 3001 3002 ... 3247 3248 3249]
 ...
 [4250 4251 4252 ... 4497 4498 4499]
 [4500 4501 4502 ... 4747 4748 4749]
 [4750 4751 4752 ... 4997 4998 4999]]
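
Re-chunking can also be pictured as buffering rows from one stream of chunks and re-emitting them in groups of a new size. The generator below is a sketch of that idea for chunking along the first axis only; it is not how OpenSeize handles a Producer input, and `rechunk` is a hypothetical helper.

```python
from typing import Iterable, Iterator, List


def rechunk(chunks: Iterable[List[list]], chunksize: int) -> Iterator[List[list]]:
    """Regroup a stream of row-chunks into chunks of `chunksize` rows.

    Hypothetical helper for illustration only; not part of OpenSeize.
    """
    buffer: List[list] = []
    for chunk in chunks:
        buffer.extend(chunk)
        # Emit as many full chunks as the buffer currently holds.
        while len(buffer) >= chunksize:
            yield buffer[:chunksize]
            buffer = buffer[chunksize:]
    if buffer:
        # Whatever is left forms a final, shorter chunk.
        yield buffer


# 20 rows of 25 values, first delivered 5 rows at a time.
rows = [list(range(r * 25, (r + 1) * 25)) for r in range(20)]
five_row_chunks = (rows[i:i + 5] for i in range(0, 20, 5))

ten_row_chunks = list(rechunk(five_row_chunks, chunksize=10))
print([len(c) for c in ten_row_chunks])                  # [10, 10]
print(ten_row_chunks[0][0][0], ten_row_chunks[1][0][0])  # 0 250
```

Only the rows still waiting to fill the next chunk are held in the buffer, so memory use stays bounded by the larger of the two chunk sizes.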
