<a href="https://colab.research.google.com/github/shivavsrivastava/Algorithms/blob/main/Course3_W3_HuffmanCoding_DynamicProgramming.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Huffman Coding and Dynamic Programming


In [None]:
import numpy as np
import random
import urllib3
import math
from collections import defaultdict
import time
import heapq
from heapq import heappush, heappop, heapify
import pandas as pd
import sys
from itertools import count

## Programming Assignment: Part 1 & 2


In this programming problem and the next you'll code up the greedy algorithm from the lectures on Huffman coding.

Download the text file below. -- **huffman.txt**

This file describes an instance of the problem. It has the following format:

[number_of_symbols]

[weight of symbol #1]

[weight of symbol #2]

...

For example, the third line of the file is "6852892," indicating that the weight of the second symbol of the alphabet is 6852892.  (We're using weights instead of frequencies, like in the "A More Complex Example" video.)
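
The file format above can be read with a few lines of plain Python. The sketch below is only an illustration: `parse_weights` is a hypothetical helper (not part of the assignment), and the sample string is made up rather than taken from huffman.txt. It assumes one integer per line, with the first line giving the symbol count.

```python
def parse_weights(text):
    """Parse '[count]' followed by one integer weight per line (hypothetical helper)."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    declared = int(lines[0])
    weights = [int(line) for line in lines[1:]]
    # Guard against a truncated or mis-read file
    if len(weights) != declared:
        raise ValueError(f"expected {declared} weights, found {len(weights)}")
    return weights

# Illustrative input only; these are not the contents of huffman.txt.
sample = "3\n10\n20\n30\n"
print(parse_weights(sample))
```

Checking the declared count against the number of weights catches an accidental off-by-one when the header line is dropped.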

Your task in this problem is to run the Huffman coding algorithm from lecture on this data set. What is the maximum length of a codeword in the resulting Huffman code?

ADVICE: If you're not getting the correct answer, try debugging your algorithm using some small test cases. And then post them to the discussion forum!

## Huffman Coding
Huffman coding has two main parts:

1.   Build a Huffman Tree from input characters.
2.   Traverse the Huffman Tree and assign codes to characters.
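
As a compact illustration of these two parts, the sketch below merges the two lightest subtrees with a heap and tracks only how deep each symbol ends up, which is enough to answer questions about codeword lengths. The name `huffman_code_lengths` is hypothetical, and returning length 1 for a single-symbol alphabet is an assumed convention. Ties between equal weights are broken by insertion order, so the individual lengths may differ from another valid Huffman tree.

```python
import heapq
from itertools import count

def huffman_code_lengths(weights):
    """Return the codeword length of each symbol (hypothetical helper)."""
    if len(weights) == 1:
        return [1]  # assumed convention for a one-symbol alphabet
    tie = count()  # tie-breaker so the heap never compares the lists
    # Heap entries: (subtree weight, tie-breaker, symbol indices in the subtree)
    heap = [(w, next(tie), [i]) for i, w in enumerate(weights)]
    heapq.heapify(heap)
    lengths = [0] * len(weights)
    while len(heap) > 1:
        w1, _, s1 = heapq.heappop(heap)
        w2, _, s2 = heapq.heappop(heap)
        # Every symbol under the new parent moves one level deeper
        for i in s1 + s2:
            lengths[i] += 1
        heapq.heappush(heap, (w1 + w2, next(tie), s1 + s2))
    return lengths

lengths = huffman_code_lengths([1, 1, 2, 4])
print(lengths, max(lengths))
```

Carrying the symbol lists through the merges costs extra time on large inputs; an explicit tree, as built in the cells that follow, avoids that.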

#### Huffman Node

In [None]:
class HuffmanNode:
  def __init__(self, symbol, freq):
    self.symbol = symbol
    self.freq = freq
    self.left = None
    self.right = None
    self.huff = ''

  def __lt__(self, other):
    return self.freq < other.freq

#### Print Nodes Utility

In [None]:
def printNodes(node, val, outArr):

  # current node huffman code
  newVal = val + str(node.huff)

  # Traverse
  if node.left:
    printNodes(node.left, newVal, outArr)
  if node.right:
    printNodes(node.right, newVal, outArr)

  # If leaf node
  if not node.left and not node.right:
    print(f'{node.symbol} : {newVal}')
    outArr.append(newVal)


#### Huffman code function

In [None]:
def huffmanCode(inputArray):
  # Make a minHeap
  nodes = []
  for i in range(len(inputArray)):
    heappush(nodes, HuffmanNode(str(i), inputArray[i]))

  # Loop ends with a rootNode
  while len(nodes)>1:
    left = heappop(nodes)
    right = heappop(nodes)
    left.huff += '0'
    right.huff += '1'
    # combine the 2 smallest nodes and create a parent node
    parent = HuffmanNode(left.symbol+right.symbol, left.freq + right.freq)
    #print("Creating parent node with symbol {}, freq {}".format(parent.symbol, parent.freq))
    parent.left = left
    parent.right = right
    heappush(nodes, parent)

  # Traverse the tree and collect every codeword
  outArr = []
  printNodes(nodes[0], '', outArr)
  minLen = min(len(code) for code in outArr)
  maxLen = max(len(code) for code in outArr)
  print(f'Min Length: {minLen}')
  print(f'Max Length: {maxLen}')



### Small test cases

In [None]:
## Test 1
test1 = [37, 59, 43, 27, 30, 96, 96, 71, 8, 76]
huffmanCode(test1)

#### Assignment Part 1

In [None]:
df = pd.read_table('https://d3c33hcgiwev3.cloudfront.net/_eed1bd08e2fa58bbe94b24c06a20dcdb_huffman.txt?Expires=1714521600&Signature=ZFo9pSUcWJl0ztMkizRALWWpXFJkx18zzV1rnZNtf5gcjjMfteILHn-obabyr5~WYs0aoiSwDLUq-DZIknonRv7jws3zco-MpqN8HectGi5eNL2Q9cQtOK60WnPbtWH9SdVCpMK49VUqJF-LCe4WFOH9qGPkr-BcYIhs6kdbe7M_&Key-Pair-Id=APKAJLTNE6QMUY6HBC5A',header=None)

In [None]:
numNodes = df[0].iloc[0]
print(numNodes)


In [None]:
df = df.drop(df.index[0])
df.head()

In [None]:
inputArray = df[0].to_list()
inputArray[:5]

Answer (output of `huffmanCode(inputArray)`):
Min Length: 9
Max Length: 19

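#### Alternative: two-queue implementation
When the input is already sorted by weight, Huffman coding can be done with two queues instead of a heap:
https://www.geeksforgeeks.org/efficient-huffman-coding-for-sorted-input-greedy-algo-4/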
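
A minimal sketch of the two-queue idea follows. It is written for this notebook rather than copied from the linked article, and the function name is hypothetical. It sorts the weights first, keeps leaves in one queue and merged subtrees in another, and tracks only the shallowest and deepest leaf of each subtree. Because ties can be broken differently than in the heap version, the reported lengths may differ when several weights or sums are equal.

```python
from collections import deque

def huffman_depths_two_queues(weights):
    """Return (min_len, max_len) of the codewords (hypothetical helper)."""
    if len(weights) < 2:
        raise ValueError("need at least two symbols")
    # Each entry is (weight, min leaf depth, max leaf depth) of a subtree
    leaves = deque((w, 0, 0) for w in sorted(weights))
    merged = deque()

    def pop_smallest():
        # Merged weights are produced in non-decreasing order, so the smallest
        # remaining subtree is always at the front of one of the two queues.
        if not merged or (leaves and leaves[0][0] <= merged[0][0]):
            return leaves.popleft()
        return merged.popleft()

    while len(leaves) + len(merged) > 1:
        w1, lo1, hi1 = pop_smallest()
        w2, lo2, hi2 = pop_smallest()
        merged.append((w1 + w2, min(lo1, lo2) + 1, max(hi1, hi2) + 1))
    _, lo, hi = merged[0]
    return lo, hi

print(huffman_depths_two_queues([1, 1, 2, 4]))
```

After the initial sort, each merge takes constant time, so the merging phase is linear in the number of symbols.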



## Programming Assignment: Part 3

In this programming problem you'll code up the dynamic programming algorithm for computing a maximum-weight independent set of a path graph.

Download the text file below -- **mwis.txt**

This file describes the weights of the vertices in a path graph (with the weights listed in the order in which vertices appear in the path). It has the following format:

[number_of_vertices]

[weight of first vertex]

[weight of second vertex]

...

For example, the third line of the file is "6395702," indicating that the weight of the second vertex of the graph is 6395702.

Your task in this problem is to run the dynamic programming algorithm (and the reconstruction procedure) from lecture on this data set. The question is: of the vertices 1, 2, 3, 4, 17, 117, 517, and 997, which ones belong to the maximum-weight independent set? (By "vertex 1" we mean the first vertex of the graph; there is no vertex 0.) In the box below, enter an 8-bit string, where the ith bit should be 1 if the ith of these 8 vertices is in the maximum-weight independent set, and 0 otherwise. For example, if you think that the vertices 1, 4, 17, and 517 are in the maximum-weight independent set and the other four vertices are not, then you should enter the string 10011010 in the box below.
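
The task above can also be sketched bottom-up, without recursion. The helper names `mwis_path` and `membership_bits` are hypothetical, and the sketch assumes non-negative weights. `best[i]` holds the maximum weight achievable with the first `i` vertices, and the reconstruction walks back from the last vertex.

```python
def mwis_path(weights):
    """Return the 1-indexed vertices of a max-weight independent set of a path."""
    n = len(weights)
    # best[i] = maximum weight of an independent set among the first i vertices
    best = [0] * (n + 1)
    if n >= 1:
        best[1] = weights[0]
    for i in range(2, n + 1):
        best[i] = max(best[i - 1], best[i - 2] + weights[i - 1])
    # Reconstruction: walk back from vertex n
    chosen = set()
    i = n
    while i >= 1:
        if i >= 2 and best[i - 1] >= best[i - 2] + weights[i - 1]:
            i -= 1          # vertex i is not needed for the optimum
        else:
            chosen.add(i)   # vertex i is taken, so skip its neighbour
            i -= 2
    return chosen

def membership_bits(chosen, queries):
    """Encode membership of each queried vertex as a string of 0s and 1s."""
    return "".join("1" if v in chosen else "0" for v in queries)

print(sorted(mwis_path([8, 1, 1, 8, 1, 1, 8])))
print(membership_bits({1, 4, 17, 517}, [1, 2, 3, 4, 17, 117, 517, 997]))
```

The second call reproduces the encoding example from the problem statement (vertices 1, 4, 17, and 517 give 10011010).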

In [None]:
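def maxWISRecursive(inputArray, A, index):
  # A[i] caches the best total weight over the first i vertices.
  # A value of 0 is treated as "not computed yet" (weights are assumed positive).
  # Base cases: A[0] and A[1] are filled in by the caller
  if index == 0:
    return A[0]
  elif index == 1:
    return A[1]

  # Best weight if vertex `index` is excluded
  if A[index-1]:
    aIndexExcluding = A[index-1]
  else:
    aIndexExcluding = maxWISRecursive(inputArray, A, index-1)

  # Best weight of the remaining prefix if vertex `index` is included
  if A[index-2]:
    aIndexIncluding = A[index-2]
  else:
    aIndexIncluding = maxWISRecursive(inputArray, A, index-2)

  # Index into the full array instead of slicing a copy on every call
  A[index] = max(aIndexExcluding, aIndexIncluding + inputArray[index-1])
  return A[index]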


In [None]:
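def reconstruct(S, V, inA, A):
  # Walk back from the last vertex; A[i] is the best weight over the first i vertices
  i = len(A) - 1
  while i >= 1:
    if i >= 2 and A[i-1] >= A[i-2] + inA[i-1]:
      # Vertex i is not needed for the optimum: skip it
      i -= 1
    else:
      # Vertex i is in the set, so its neighbour i-1 cannot be
      S.append(inA[i-1])
      V.append(i)
      i -= 2
  return S, V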

In [None]:
def maxWeigthIndependentSet(inputArray):
  arrSize = len(inputArray)
  # Create an array arrWIS
  arrWIS = [0]*(arrSize+1)
  arrWIS[0] = 0
  arrWIS[1] = inputArray[0]

  maxWISRecursive(inputArray, arrWIS, arrSize)

  #print("Weights: {}".format(arrWIS))
  S = [] # Set
  V = [] # vertices
  reconstruct(S, V, inputArray, arrWIS)
  #print(" Max Weight Independent Set: {}".format(S))
  #print("Max Weight Vertices included: {}".format(V))
  return V

In [None]:
# Test 1
inputArray = [8, 1, 1, 8, 1, 1, 8 ]
maxWeigthIndependentSet(inputArray)

#### Read mwis.txt

In [None]:
df = pd.read_table('https://d3c33hcgiwev3.cloudfront.net/_790eb8b186eefb5b63d0bf38b5096873_mwis.txt?Expires=1714608000&Signature=R3KHNGGTtPgreMm2jwbKGpuiUkroNrtdcgg~adBzpOjHYOvoGaO0c05Dt9vohKC2ki~LY5OWuFyd2JU6jQr4nTR09bdbetpWe2xPrUOTjbDZiwp8qFrBq3cxlEh8kr2mJyAFQCYfLzydDzO-oRs~qlvLkNibOX0Mymet~pLcu94_&Key-Pair-Id=APKAJLTNE6QMUY6HBC5A', header=None)
df.head()

In [None]:
df = df.drop(index=0)
df.head()

In [None]:
# Now work on the df
inArray = df[0].to_list()
print(len(inArray))


In [None]:
print(inArray)

In [None]:
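# maxWISRecursive recurses roughly once per vertex, which can exceed Python's default recursion limit
sys.setrecursionlimit(10000)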

In [None]:
vertices = maxWeigthIndependentSet(inArray)

In [None]:
result = ""
for v in [1, 2, 3, 4, 17, 117, 517, 997]:
  if v in vertices:
    result += "1"
  else:
    result += "0"
print(result)

Answer : 10100110