In [16]:
from pyldpc import *
import numpy as np
from time import time
import pickle
from scipy.sparse import csr_matrix

<h1 align="center"> pyLDPC Matrices Tutorial: construct a specific code </h1> 


When using pyLDPC in image, sound or text transmission simulations, one needs to build matrices H and G with a given rate and a given k. For example, let's say I want to code and decode an RGB image pixel by pixel. Since each pixel is a 3-tuple of uint8 numbers, a pixel can be turned into an array of 24 bits. Hence, we'll need a coding matrix G with k (its number of rows) equal to 24. 
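To make the "24 bits per pixel" count concrete, here is a minimal sketch of the conversion. The helper names `pixel_to_bits` and `bits_to_pixel` are hypothetical (they are not pyLDPC functions), and the most-significant-bit-first order is an assumption; any fixed order works as long as coding and decoding agree.

```python
def pixel_to_bits(pixel):
    """Flatten a tuple of uint8 channel values into a list of bits (MSB first)."""
    bits = []
    for value in pixel:
        if not 0 <= value <= 255:
            raise ValueError("each channel must fit in a uint8")
        # format(value, "08b") gives the 8-character binary string of the channel.
        bits.extend(int(b) for b in format(value, "08b"))
    return bits


def bits_to_pixel(bits):
    """Inverse of pixel_to_bits: regroup the bits 8 at a time into channel values."""
    return tuple(int("".join(map(str, bits[i:i + 8])), 2)
                 for i in range(0, len(bits), 8))


bits = pixel_to_bits((255, 0, 128))
print(len(bits))            # 24
print(bits_to_pixel(bits))  # (255, 0, 128)
```
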

<br>

<font color="blue"><h2> Outline:</h2></font> 
<font color="blue"><h2> --------------------------------------------------</h2></font> 

**I - Users' guide: HtG function**

     I.1 HtG function 
     I.2 Applications
**II - Theoretical Intuition**

     II.1 Theory 
     II.2 How HtG works
    
**III - Introduction to scipy.sparse.csr_matrix format and pickle**

    III.1 CSR format .. what ?
    III.2 Save and load matrices from files. 

<font color="blue"><h2> --------------------------------------------------</h2></font> 




# I. User's guide

## I-1 The HtG function:

Use the *HtG* function: 

```python
def HtG(invrate,k,systematic=True)
```

- *invrate* is the inverse of the approximate rate (for example, if k = 10 and n = 28, the rate is approximately 1/3 and invrate = 3). 
- *k* is the message length. 
- *systematic*: boolean; if True, the coding matrix is systematic. 


In [9]:
H,tG = HtG(3,24)
print("H=\n", H)
print("\ntG=\n ", tG)
print("\nG's shape is:", tG.T.shape)

H=
 [[0 0 0 ..., 0 0 0]
 [0 0 0 ..., 0 0 0]
 [0 0 0 ..., 0 0 0]
 ..., 
 [0 0 0 ..., 0 0 1]
 [0 0 0 ..., 0 0 0]
 [0 0 0 ..., 0 0 0]]

tG=
  [[1 0 0 ..., 0 0 0]
 [0 1 0 ..., 0 0 0]
 [0 0 1 ..., 0 0 0]
 ..., 
 [0 0 0 ..., 0 0 0]
 [0 0 0 ..., 0 0 0]
 [0 0 0 ..., 0 1 1]]

G's shape is: (24, 69)


##  I-2 Applications

1. First, decide which rate you would like to use for your code.  
2. Second, fix k, the number of bits in one message:
    - Pixel by pixel image coding: k = 8 if grayscale ; k = 24 if RGB. 
    - Row by row image coding: k = number of bits in one row (width x 8 if grayscale, width x 24 if RGB)
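The two rules above boil down to a one-line computation. The sketch below uses a hypothetical helper `message_length` (not part of pyLDPC) and assumes 8 bits per channel, as for uint8 images.

```python
def message_length(width, channels, bits_per_channel=8, row_by_row=False):
    """Number of bits k in one message.

    channels is 1 for grayscale and 3 for RGB. With row_by_row=False one
    message is a single pixel; otherwise it is a full row of `width` pixels.
    """
    bits_per_pixel = channels * bits_per_channel
    return width * bits_per_pixel if row_by_row else bits_per_pixel


print(message_length(100, 1))                   # 8
print(message_length(100, 3))                   # 24
print(message_length(100, 1, row_by_row=True))  # 800
print(message_length(100, 3, row_by_row=True))  # 2400
```
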
   
Let's say we would like a code with rate 1/3 (which means n is approximately equal to 3k), and use it to code an image of size 200x100 row by row, each row holding 100 pixels. 

> Grayscale: k = 100 x 8 = 800

In [18]:
H,tG = HtG(3,800)
print(tG.shape)

(2397, 800)


Indeed, k = 800 and n = 2397, approximately 2400 = 800x3 ! 

> RGB: k = 100 x 24 = 2400

In [19]:
H,tG = HtG(3,2400)
print(tG.shape)

(7197, 2400)


Indeed, k = 2400 and n = 7197, approximately 7200 = 2400x3 ! 


# II. Theoretical intuition
Let's generate a regular parity-check matrix with (n,d_v,d_c) = (15,3,5). Let m be the number of rows of H. We know a priori that <font color="red"> m.d_c = n.d_v    (1) </font> because the matrix is regular: the number of ones in H counted row by row (m rows of d_c ones each) equals the number of ones counted column by column (n columns of d_v ones each). 
Gallager's method for constructing H (using sets) relies on the property <font color="red" > d_c divides n.   (2)</font> 

<img src="Equations/Matrices1.png"> 
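To see why condition (2) is needed, here is a minimal pure-Python sketch of a Gallager-style construction. It is my own illustration under stated assumptions, not pyLDPC's actual code: the name `gallager_regular_h` and the `seed` parameter are hypothetical. The first block of n/d_c rows places d_c consecutive ones per row, which only tiles the n columns exactly when d_c divides n; each of the remaining d_v - 1 blocks is a random column permutation of the first one.

```python
import random


def gallager_regular_h(n, d_v, d_c, seed=None):
    """Gallager-style regular parity-check matrix, returned as a list of 0/1 rows."""
    if n % d_c != 0:
        raise ValueError("d_c must divide n (condition (2))")
    rng = random.Random(seed)
    rows_per_block = n // d_c

    # First block: row i holds its d_c ones in columns i*d_c .. (i+1)*d_c - 1.
    first_block = [[1 if col // d_c == i else 0 for col in range(n)]
                   for i in range(rows_per_block)]

    H = [row[:] for row in first_block]
    # Every further block is the first block with its columns shuffled.
    for _ in range(d_v - 1):
        perm = list(range(n))
        rng.shuffle(perm)
        H.extend([[row[perm[col]] for col in range(n)] for row in first_block])
    return H


H_demo = gallager_regular_h(15, 3, 5, seed=0)
print(len(H_demo))                          # 9, i.e. m = n * d_v / d_c
print({sum(row) for row in H_demo})         # {5}: every row holds d_c ones
print({sum(col) for col in zip(*H_demo)})   # {3}: every column holds d_v ones
```

Whatever the seed, the row and column weights are fixed by the construction; only the rank of the resulting matrix depends on the permutations drawn.
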

Since G is full rank, the number of rows of the coding matrix G is <font color="red"> k = n - rank(H). (3)  </font> 

Using the <font color="red"> 3 relations </font> above, we can devise an effective method to build any pair of matrices (H, G).  

**1 - If H is full rank:**

This almost never happens, but let's assume it for the sake of argument. The rank of H is then equal to m, its number of rows, and we have: 

<img src="Equations/Matrices2.png"> 

The first equality is the result of the full rank assumption. 

**2 - If H is not full rank:**

In this case, rank(H) is smaller than m, so k is **greater** than in the full rank case above, which means:

<img src="Equations/Matrices3.png"> 

**3 - Conclusion:**

In practice, the number of linearly dependent rows is low compared to m. The main findings from the inequality above: 

1. Approximate rate k/n can be estimated by 1 - dv/dc
2. Specific k can be obtained by fixing dv and dc, and then decreasing n (by dc, 2*dc .. to keep condition (2) valid) 
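
For reference, the chain of relations behind these two findings can be written out as follows. It uses only relations (1) and (3) plus the fact that the rank of H cannot exceed its number of rows m:

$$
m\,d_c = n\,d_v \;\Longrightarrow\; m = n\,\frac{d_v}{d_c},
\qquad
k = n - \operatorname{rank}(H) \;\ge\; n - m = n\left(1 - \frac{d_v}{d_c}\right)
$$

$$
\frac{k}{n} \;\ge\; 1 - \frac{d_v}{d_c}, \quad \text{with equality exactly when } H \text{ is full rank.}
$$

With (n, d_v, d_c) = (15, 3, 5), this gives m = 9 and k ≥ 15 - 9 = 6.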

In the example above, H has 9 rows and rank(H) = 7, so k = 15 - 7 = 8. 
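
The rank in relation (3) is the rank over GF(2), where 1 + 1 = 0, not the usual rank over the real numbers. Below is a minimal sketch of such a rank computation by Gaussian elimination. The function `gf2_rank` is a hypothetical helper written for illustration; I am not claiming it matches how pyLDPC computes binary ranks internally.

```python
def gf2_rank(rows):
    """Rank over GF(2) of a binary matrix given as a list of 0/1 rows."""
    # Pack each row into one integer: XOR of two integers adds two rows modulo 2.
    packed = [sum(bit << pos for pos, bit in enumerate(row)) for row in rows]
    rank = 0
    while packed:
        pivot = packed.pop()
        if pivot == 0:
            continue                     # a zero row adds nothing to the rank
        rank += 1
        lowest_one = pivot & -pivot      # one set bit of the pivot row = pivot column
        # Clear the pivot column in every remaining row.
        packed = [row ^ pivot if row & lowest_one else row for row in packed]
    return rank


A = [[1, 1, 0],
     [0, 1, 1],
     [1, 0, 1]]
print(gf2_rank(A))   # 2: the three rows sum to zero modulo 2
```

Over the real numbers the same matrix A has rank 3 (its determinant is 2), which is why an ordinary floating-point rank routine is not a substitute here.
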

## II-2 How HtG operates:

### Small matrices

> Situation A: Generate a code of rate = 1/4 (approximate rate) and k = 24. 

#### 1 - Remember approximate rate = 1 - dv/dc. 
H must be as sparse as possible to guarantee good decoding, so choose the smallest possible d_v and d_c. A rate of 1/4 requires dv/dc = 3/4, therefore dv = 3, dc = 4. 


In [12]:
d_v, d_c = 3,4 

#### 2 - Remember k > n(1-dv/dc) 
This means that if you pick n such that n(1-dv/dc) = k, you will get a **higher final k** than expected. Let's try: 
n = 4k = 96 

In [13]:
n = 96 
H = RegularH(n,d_v,d_c)
n-BinaryRank(H)

26

This means that we should decrease n. 

#### 3 - Remember d_c must divide n
Try n = 92, 88 ... (Notice that we got 26 instead of 24, so you may anticipate that you will need to decrease n by 2 d_c.)


In [14]:
n = 92
H = RegularH(n,d_v,d_c)
n-BinaryRank(H)

25

In [15]:
n = 88
H = RegularH(n,d_v,d_c)
n-BinaryRank(H)

24

Here's your matrix ! (n,d_v,d_c) = (88,3,4).
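
The three steps of Situation A amount to a small search loop. The sketch below is hypothetical (`find_block_length` is not a pyLDPC function) and takes the "k obtained for a given n" computation as a callable, so it does not depend on any library. With pyLDPC one would presumably pass something like `lambda n: n - BinaryRank(RegularH(n, 3, 4))`. To keep the example self-contained, it simply replays the three values printed in the cells above.

```python
def find_block_length(k, d_c, n_start, k_of_n):
    """Decrease n in steps of d_c until k_of_n(n) equals the target k.

    k_of_n(n) must return n - rank(H) for a parity-check matrix H of length n.
    """
    if n_start % d_c != 0:
        raise ValueError("n_start must be a multiple of d_c (condition (2))")
    n = n_start
    while n > 0:
        found = k_of_n(n)
        if found == k:
            return n
        if found < k:
            break            # stop once k has dropped below the target
        n -= d_c             # keep d_c dividing n
    raise ValueError("no suitable n found; try again or change d_v, d_c")


# Values observed in the cells above, used here instead of calling pyLDPC.
observed_k = {96: 26, 92: 25, 88: 24}
print(find_block_length(24, 4, 96, observed_k.__getitem__))   # 88
```

Since the construction of H is random, the value returned by `k_of_n` for a given n may vary from one run to another; the loop only formalizes the trial-and-error shown above.
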

> Situation B: Generate a code of approximate rate 1/3, k = 24

Now that we've *seen* how this construction method operates, we can skip some useless tries:
1. Fix d_v = 2, d_c = 3.
2. Try n = 3k - d_c, 3k - 2d_c ... 

In [18]:
d_v,d_c = 2,3
n = 3*24 - d_c
H = RegularH(n,d_v,d_c)
n-BinaryRank(H)

24

###  Large Matrices: (Coding row by row) 

 The method is the same with large matrices, except that the calculations take more time ! That is why you should first check whether the matrix is already available in the *<a href="http://nbviewer.jupyter.org/github/janatiH/pyldpc/blob/master/pyLDPC-Tutorial-Library.ipynb?flush_cache=true/"> Library</a>*. If not, I recommend this *"empirical trick"*: 
 
 - rate = 1/3: use n = 3k - d_c = 3k - 3  
 - rate = 1/4: use n = 4k-2*d_c = 4k - 8 
 - rate = 1/5: use n = 5k - 3*d_c = 5k - 15
 
 and so on. 
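
 The pattern in the three cases above can be captured in one formula. This is my reading of the list, under the assumption that a rate of 1/invrate is always obtained with d_c = invrate and d_v = invrate - 1 (so that 1 - d_v/d_c = 1/invrate); the helper `empirical_n` is hypothetical and not part of pyLDPC.

```python
def empirical_n(invrate, k):
    """Block length suggested by the empirical trick: n = invrate*k - (d_v - 1)*d_c."""
    d_c = invrate            # assumption: smallest d_c giving rate 1/invrate
    d_v = invrate - 1        # so that 1 - d_v/d_c = 1/invrate
    return invrate * k - (d_v - 1) * d_c


print(empirical_n(3, 24))     # 69
print(empirical_n(4, 24))     # 88
print(empirical_n(3, 1600))   # 4797
print(empirical_n(5, 100))    # 485, i.e. 5k - 15
```

 The first two values agree with the n found in Situations A and B. This remains an empirical starting point: since H is drawn at random, it is worth checking n - BinaryRank(H) afterwards.
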

 
 > <font color="blue"> Situation C:</font> Generate a (1/3) code for a (100x200) image (row by row coding). 
 
 ### 1 - First evaluate k:
 
 - Grayscale image: each pixel is a uint8 number => one pixel = 8 bits => Each row has k = 200*8 = 1600 bits.
 - RGB image: each pixel has 3 uint8 numbers => one pixel = 24 bits => Each row has k = 200*24 = 4800 bits. 
 
 > <font color="blue"> C: </font>Grayscale 

In [22]:
d_v,d_c = 2,3
n = 3*1600 - 3 
H = RegularH(n,d_v,d_c)
n-BinaryRank(H)

1600

Bingo ! (Takes a few seconds to calculate)
 > <font color="blue"> C: </font>RGB 

In [23]:
d_v,d_c = 2,3
n = 3*4800 - 3 
H = RegularH(n,d_v,d_c)
n-BinaryRank(H)

4800

YES ! (Takes one or two minutes to calculate) 

With bigger images, waiting for the matrices to be constructed can get frustrating, particularly if n is greater than 10000. 

I introduced *<a href="http://nbviewer.jupyter.org/github/janatiH/pyldpc/blob/master/pyLDPC-Tutorial-Library.ipynb?flush_cache=true/">a cool library</a>*   of large matrices  ready to be downloaded. To save and load objects from files we'll use the package <font color="green"><b> pickle </b></font>.

# III - CSR objects, Pickle: 
When using large matrices, the **scipy.sparse.csr_matrix** format is highly recommended. Since v.0.7, pyLDPC supports both numpy arrays and scipy.sparse.csr_matrix objects. 
## III - 1 sparse csr .. what ?
The CSR format (*Compressed Sparse Row*) is a very effective way to store large matrices containing very few non-zero entries, to compute products with them and to access their rows. Example:

In [26]:
from scipy.sparse import csr_matrix 
Hs = csr_matrix(H)
print(Hs)

  (0, 0)	1
  (0, 1)	1
  (0, 2)	1
  (1, 3)	1
  (1, 4)	1
  (1, 5)	1
  (2, 6)	1
  (2, 7)	1
  (2, 8)	1
  (3, 9)	1
  (3, 10)	1
  (3, 11)	1
  (4, 12)	1
  (4, 13)	1
  (4, 14)	1
  (5, 15)	1
  (5, 16)	1
  (5, 17)	1
  (6, 18)	1
  (6, 19)	1
  (6, 20)	1
  (7, 21)	1
  (7, 22)	1
  (7, 23)	1
  (8, 24)	1
  :	:
  (9589, 13201)	1
  (9590, 3994)	1
  (9590, 8502)	1
  (9590, 14368)	1
  (9591, 1381)	1
  (9591, 1749)	1
  (9591, 3740)	1
  (9592, 1392)	1
  (9592, 4610)	1
  (9592, 11614)	1
  (9593, 9097)	1
  (9593, 9931)	1
  (9593, 10515)	1
  (9594, 2292)	1
  (9594, 10009)	1
  (9594, 13616)	1
  (9595, 5753)	1
  (9595, 6290)	1
  (9595, 12018)	1
  (9596, 1378)	1
  (9596, 2606)	1
  (9596, 12165)	1
  (9597, 1914)	1
  (9597, 4752)	1
  (9597, 10596)	1


See ? The CSR format stores only the positions and values of the non-zero entries. You can get the numpy array version using the toarray() method: Hs.toarray().

To use sparse matrices in coding and decoding, simply pass Hs instead of H to the coding/decoding functions.
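
For the curious, here is a minimal pure-Python sketch of what the CSR layout looks like. The helpers `dense_to_csr` and `csr_row` are hypothetical and only meant for illustration; scipy's csr_matrix exposes the same three arrays as its `data`, `indices` and `indptr` attributes.

```python
def dense_to_csr(dense):
    """Return the three CSR arrays (data, indices, indptr) of a dense matrix."""
    data, indices, indptr = [], [], [0]
    for row in dense:
        for col, value in enumerate(row):
            if value != 0:
                data.append(value)       # the non-zero value itself
                indices.append(col)      # its column index
        indptr.append(len(data))         # where the next row starts in data/indices
    return data, indices, indptr


def csr_row(data, indices, indptr, i):
    """(column, value) pairs of row i, read without touching the other rows."""
    start, stop = indptr[i], indptr[i + 1]
    return list(zip(indices[start:stop], data[start:stop]))


dense = [[1, 1, 0, 0],
         [0, 0, 1, 1],
         [1, 0, 0, 1]]
data, indices, indptr = dense_to_csr(dense)
print(indices)                             # [0, 1, 2, 3, 0, 3]
print(indptr)                              # [0, 2, 4, 6]
print(csr_row(data, indices, indptr, 2))   # [(0, 1), (3, 1)]
```

The row pointer array `indptr` is what makes row access cheap: row i is simply the slice between `indptr[i]` and `indptr[i + 1]`.
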

## III - 2 Using Pickle to save and load CSR objects

### Saving objects: 
Let's generate large matrices H and tG: 

<font color="red"> Note that since v0.7, pyLDPC functions (construction, coding and decoding) return and take the transposed matrix tG instead of G. </font>

In [27]:
d_v,d_c = 2,3
n = 3*1600 - 3 
H = RegularH(n,d_v,d_c)
H,tG = CodingMatrix_systematic(H)
Hs = csr_matrix(H)
tGs = csr_matrix(tG)
tGs.shape

(4797, 1600)

To save Hs, first create a folder; in my case it is Matrices/1-3/1600/ (1-3 for the rate, 1600 for k). Then open (create) a file, which I chose to name Hs, with the option "write in binary" (wb). Create an instance of a "Pickler" object associated with the file. This "Pickler" object will "dump" Hs into the file: 

In [28]:
import pickle
with open("Matrices/1-3/1600/Hs","wb") as myfileHs:
    mypickler = pickle.Pickler(myfileHs)
    mypickler.dump(Hs)

<img src="Equations/Matrices4.png">

You can actually dump as many objects as you want by adding more mypickler.dump(object) instructions. Personally, I prefer to save one object per file.

Let's try to load (read) Hs. It is similar to saving: an Unpickler object is used instead of a Pickler object, and the "read in binary" mode "rb" is passed to *open* instead of "wb".

<font color="red"> Careful with modes when using pickle ! If you pass "wb" (write mode) instead of "rb" (read mode) when opening a file you want to read, the file will be overwritten ! </font>

In [30]:
with open("Matrices/1-3/1600/Hs","rb") as myfileHs:
    myunpickler = pickle.Unpickler(myfileHs)
    HsRead = myunpickler.load()

Test if HsRead is equal to Hs: 

In [35]:
(HsRead.toarray() == Hs.toarray()).all()

True

Bingo ! You can now save and load large matrices, which not only saves a huge amount of time but also lets you reuse the same matrices at any time ! 
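
As a compact recap, the whole round trip can also be sketched with the module-level shortcuts `pickle.dump` and `pickle.load`, which wrap the Pickler and Unpickler objects used above. To stay self-contained, this sketch works in a temporary folder and pickles two small dictionaries as stand-ins for Hs and tGs (an assumption made for the example; csr_matrix objects are handled the same way). It also shows two objects dumped into one file and read back in the same order.

```python
import os
import pickle
import tempfile

# Stand-ins for Hs and tGs (assumption: any picklable object behaves the same way).
Hs_demo = {"shape": (2, 4), "nonzeros": [(0, 0), (0, 1), (1, 2), (1, 3)]}
tGs_demo = {"shape": (4, 2), "nonzeros": [(0, 0), (1, 1)]}

with tempfile.TemporaryDirectory() as root:
    # Create the nested folder first: open() does not create missing folders.
    folder = os.path.join(root, "Matrices", "1-3", "1600")
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, "both")

    with open(path, "wb") as f:        # "wb": write in binary
        pickle.dump(Hs_demo, f)
        pickle.dump(tGs_demo, f)

    with open(path, "rb") as f:        # "rb": read in binary
        Hs_read = pickle.load(f)       # objects come back in dump order
        tGs_read = pickle.load(f)

print(Hs_read == Hs_demo, tGs_read == tGs_demo)   # True True
```
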

## Go to other pyLDPC tutorials:


- Users' Guide: 

1- <a href=http://nbviewer.jupyter.org/github/janatiH/pyldpc/blob/master/pyLDPC-Tutorial-Basics.ipynb?flush_cache=true> LDPC Coding-Decoding Simulation </a> 

2- <a href=http://nbviewer.jupyter.org/github/janatiH/pyldpc/blob/master/pyLDPC-Tutorial-Images.ipynb?flush_cache=true> Images Coding-Decoding Tutorial </a>

3- <a href =http://nbviewer.jupyter.org/github/janatiH/pyldpc/blob/master/pyLDPC-Tutorial-Sound.ipynb?flush_cache=true> Sound Coding-Decoding Tutorial </a> 

4- <a href=http://nbviewer.jupyter.org/github/janatiH/pyldpc/blob/master/pyLDPC-Tutorial-Matrices.ipynb?flush_cache=true> LDPC Matrices Construction and User's Tutorial </a> 

5- <a href="http://nbviewer.jupyter.org/github/janatiH/pyldpc/blob/master/pyLDPC-Tutorial-Library.ipynb?flush_cache=true/"> Large Matrices Library </a>
- For LDPC construction details:

1- <a href=http://nbviewer.jupyter.org/github/janatiH/pyldpc/blob/master/pyLDPC-Presentation.ipynb?flush_cache=true>pyLDPC Construction </a>

2- <a href= http://nbviewer.jupyter.org/github/janatiH/pyldpc/blob/master/pyLDPC-Images-Construction.ipynb?flush_cache=true> LDPC Images Functions Construction </a> 
 
3- <a href = http://nbviewer.jupyter.org/github/janatiH/pyldpc/blob/master/pyLDPC-Sound-Construction.ipynb?flush_cache=true> LDPC Sound Functions Construction </a> 

4- <a href=http://nbviewer.jupyter.org/github/janatiH/pyldpc/blob/master/pyLDPC-Tutorial-Matrices.ipynb?flush_cache=true> LDPC Matrices Construction and User's Tutorial </a> 