# IERG4190 Multimedia Coding and Processing
---
## Chapter 3 Multimedia Coding (2)
[Lecture Note](https://blackboard.cuhk.edu.hk/bbcswebdav/pid-3095878-dt-content-rid-23712435_1/xid-23712435_1) (CUHK login required)

Multi-symbol Coding with Adaptive Dictionary
### LZ77
#### Basic Information
In the LZ77 approach, the dictionary is simply a portion of the previously encoded sequence. The encoder examines the input sequence through a sliding window.

*See the graphical explanation on slide 5 in the lecture note.*

Search buffer: contains a portion of the recently encoded sequence, usually several thousand characters long.

Look-ahead buffer: contains the next portion of the sequence to be encoded, usually ten to one hundred characters long.
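
As a rough illustration of the two buffers, the sketch below slices them out of an input string at a given position. The function name `split_window`, the parameter `pos`, and the tiny buffer sizes are assumptions made for this example only; they are not taken from the lecture note.

```python
def split_window(data, pos, search_size=7, lookahead_size=6):
    """Return (search_buffer, lookahead_buffer) for the window positioned at `pos`.

    `pos` is the index of the first symbol that has not been encoded yet.
    """
    # The search buffer holds the most recently encoded symbols, just before pos.
    search = data[max(0, pos - search_size):pos]
    # The look-ahead buffer holds the next symbols waiting to be encoded.
    lookahead = data[pos:pos + lookahead_size]
    return search, lookahead


search, lookahead = split_window("AABCBBABC", 6)
print(repr(search), repr(lookahead))
```

Sliding the window simply means increasing `pos`; symbols that fall more than `search_size` positions behind it drop out of the dictionary.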

To encode the sequence in the look-ahead buffer, the encoder moves a search pointer back through the search buffer until it encounters a match to the first symbol in the look-ahead buffer.

Offset: the distance of the pointer from the look-ahead buffer.

Length-of-match: the number of consecutive symbols in the search buffer that match consecutive symbols in the look-ahead buffer, starting with the first symbol.
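
The two definitions above can be sketched as a small search routine. This is only an illustration: the name `longest_match` is hypothetical, the match is kept entirely inside the search buffer as the definition is worded here (many implementations let it run on into the look-ahead buffer), and ties are resolved in favour of the smallest offset, which is a choice made for this sketch.

```python
def longest_match(search, lookahead):
    """Return (offset, length) of the longest match for the start of `lookahead`.

    The offset is counted backwards from the start of the look-ahead buffer.
    (0, 0) means the first look-ahead symbol does not occur in `search`.
    """
    best_offset, best_length = 0, 0
    for offset in range(1, len(search) + 1):
        start = len(search) - offset  # candidate match position in the search buffer
        length = 0
        while (length < len(lookahead)
               and start + length < len(search)
               and search[start + length] == lookahead[length]):
            length += 1
        if length > best_length:  # strict '>' keeps the nearest of equally long matches
            best_offset, best_length = offset, length
    return best_offset, best_length


print(longest_match("AABCBB", "ABC"))
```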

The encoder checks every such match in the search buffer and keeps the longest one. Once the longest match is found, the encoder encodes a triple:

`<offset, length of match, symbol following the match>`

*See the graphical explanation on slide 6 in the lecture note.*
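
A minimal sketch of an encoder that emits such triples, together with a decoder that replays them, might look as follows. The names `lz77_encode` and `lz77_decode` and the buffer sizes are assumptions for illustration. The sketch keeps each match inside the search buffer and reserves one look-ahead symbol so that every triple has a symbol following the match; both are simplifications chosen here, not details taken from the lecture note.

```python
def lz77_encode(data, search_size=7, lookahead_size=6):
    """Greedy parse of `data` into (offset, length, next_symbol) triples."""
    triples = []
    pos = 0  # boundary between the search buffer and the look-ahead buffer
    while pos < len(data):
        # Keep one symbol in reserve so the triple always has a next symbol.
        max_length = min(lookahead_size, len(data) - pos) - 1
        best_offset, best_length = 0, 0
        for offset in range(1, min(search_size, pos) + 1):
            length = 0
            while (length < max_length
                   and length < offset  # stay inside the search buffer
                   and data[pos - offset + length] == data[pos + length]):
                length += 1
            if length > best_length:
                best_offset, best_length = offset, length
        triples.append((best_offset, best_length, data[pos + best_length]))
        pos += best_length + 1  # skip the matched symbols and the literal
    return triples


def lz77_decode(triples):
    """Rebuild the original string by replaying the triples."""
    out = []
    for offset, length, symbol in triples:
        start = len(out) - offset
        for i in range(length):
            out.append(out[start + i])  # copy from already decoded output
        out.append(symbol)
    return "".join(out)


triples = lz77_encode("AABCBBABC")
print(triples)
print(lz77_decode(triples))
```

The decoder needs no dictionary of its own: the output it has already produced plays the role of the search buffer.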


#### Example
*See slides 7-11 in the lecture note.*

#### Limitations
LZ77 is a very simple adaptive scheme that requires no prior knowledge of the source. The dictionary is the search window.

Problems with LZ77:
* The algorithm uses only a small window into previously seen text, which means it continuously throws away valuable phrases because they slide out of the dictionary.
* The limited lengths of the two buffers limit the size of a phrase that can be matched.
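* The worst case is a sequence that is periodic with a period longer than the search buffer: each repeat has already slid out of the window by the time it recurs, so no match is found.
    > e.g. **abcdefghij**kabcdefghijk (the period is 11 symbols; with a 10-symbol search buffer, shown in bold, the first `a` has just slid out when the second `a` is reached)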
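
The worst case can be checked with a small experiment. The sketch below counts how many triples a greedy LZ77 parse emits for the periodic sequence above, once with a search buffer one symbol shorter than the period and once with a buffer as long as the period. The name `count_triples` and the look-ahead size of 6 are assumptions for illustration; as a simplification, each match stays inside the search buffer and one look-ahead symbol is reserved as the literal.

```python
def count_triples(data, search_size, lookahead_size=6):
    """Count the (offset, length, symbol) triples a greedy LZ77 parse emits."""
    count = 0
    pos = 0
    while pos < len(data):
        max_length = min(lookahead_size, len(data) - pos) - 1
        best_length = 0
        for offset in range(1, min(search_size, pos) + 1):
            length = 0
            while (length < max_length
                   and length < offset  # stay inside the search buffer
                   and data[pos - offset + length] == data[pos + length]):
                length += 1
            best_length = max(best_length, length)
        count += 1
        pos += best_length + 1
    return count


period = "abcdefghijk"  # 11 distinct symbols
data = period * 2
print(count_triples(data, search_size=10))  # buffer shorter than the period
print(count_triples(data, search_size=11))  # buffer as long as the period
```

With the shorter buffer every symbol is emitted as a literal, so the encoder produces one triple per input symbol and the output is larger than the input.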

```python
# Encoding example code from http://www.codeceo.com/article/python-lz77.html
# Note: this variant prints (offset, length) for a match and (0,0) followed by
# the literal symbol when there is no match, rather than a full triple.
class Lz77:
    def __init__(self, inputStr):
        self.inputStr = inputStr  # input stream
        self.searchSize = 7       # size of the search buffer (already encoded)
        self.aheadSize = 6        # size of the look-ahead buffer (to be encoded)
        self.windSpiltIndex = 0   # index where the look-ahead buffer starts
        self.move = 0             # how far the window slides on the next step
        self.notFind = -1         # sentinel: no matching string found

    # Index of the end of the sliding window
    def getWinEndIndex(self):
        return self.windSpiltIndex + self.aheadSize

    # Index of the start of the sliding window
    def getWinStartIndex(self):
        return self.windSpiltIndex - self.searchSize

    # Whether the look-ahead buffer is empty
    def isLookHeadEmpty(self):
        return self.windSpiltIndex + self.move > len(self.inputStr) - 1

    def encoding(self):
        step = 0
        print("Step   Position   Match   Output")
        while not self.isLookHeadEmpty():
            # 1. Slide the window
            self.winMove()
            # 2. Get the offset and length of the longest match
            (offset, matchLen) = self.findMaxMatch()
            # 3. Set how far the window has to slide next time
            self.setMoveSteps(matchLen)
            if matchLen == 0:
                # No match at all: output the next symbol to be encoded
                nextChar = self.inputStr[self.windSpiltIndex]
                result = (step, self.windSpiltIndex, '-', '(0,0)' + nextChar)
            else:
                match = self.inputStr[self.windSpiltIndex - offset: self.windSpiltIndex - offset + matchLen]
                result = (step, self.windSpiltIndex, match, '(' + str(offset) + ',' + str(matchLen) + ')')
            # 4. Print the result
            self.output(result)
            step = step + 1  # only used to number the steps

    # Slide the window (move the split point)
    def winMove(self):
        self.windSpiltIndex = self.windSpiltIndex + self.move

    # Find the longest match; return its offset relative to the split point and its length
    def findMaxMatch(self):
        matchLen = 0
        offset = 0
        minEdge = self.minEdge() + 1  # right edge of the region to encode
        # Grow the candidate string one symbol at a time to find the longest match
        for i in range(self.windSpiltIndex + 1, minEdge):
            offsetTemp = self.searchBufferOffest(i)
            if offsetTemp == self.notFind:
                return (offset, matchLen)
            offset = offsetTemp  # offset of the current match

            matchLen = matchLen + 1  # one more symbol matched

        return (offset, matchLen)

    # Check whether inputStr[windSpiltIndex:i] occurs in the search buffer;
    # if it does, return its offset counted back from the split point
    def searchBufferOffest(self, i):
        searchStart = self.getWinStartIndex()
        searchEnd = self.windSpiltIndex
        # The following checks handle the special cases at the very start
        if searchEnd < 1:
            return self.notFind
        if searchStart < 0:
            searchStart = 0
            if searchEnd == 0:
                searchEnd = 1
        searchStr = self.inputStr[searchStart : searchEnd]  # contents of the search buffer
        # str.find returns the leftmost occurrence, i.e. the match with the largest offset
        findIndex = searchStr.find(self.inputStr[self.windSpiltIndex : i])
        if findIndex == -1:
            return -1
        return len(searchStr) - findIndex

    # Set how many positions the window slides next time
    def setMoveSteps(self, matchLen):
        if matchLen == 0:
            self.move = 1
        else:
            self.move = matchLen

    def minEdge(self):
        return len(self.inputStr)  if len(self.inputStr) - 1 < self.getWinEndIndex() else self.getWinEndIndex() + 1

    def output(self, touple):
        print("%d      %d           %s     %s" % touple)

if __name__ == "__main__":
    lz77 = Lz77("AABCBBABC")
    lz77.encoding()
```

```
Step   Position   Match   Output
0      0           -     (0,0)A
1      1           A     (1,1)
2      2           -     (0,0)B
3      3           -     (0,0)C
4      4           B     (2,1)
5      5           B     (3,1)
6      6           ABC     (5,3)
```
