### Shaleigh Smith

# <font color='red'>difflib Module</font>

---

# <font color='blue'>What is the difflib Module used for?</font>  

# <font color='navy'>Comparing Sequences: </font>

### <font color='purple'>difflib.Differ( )</font> 

### <font color='teal'>difflib.unified_diff( ) & difflib.context_diff( )</font> 

###  <font color='green'>difflib.SequenceMatcher( )</font> 

###  <font color='turquoise'>difflib.HtmlDiff( )</font> 

---

---

# <font color='purple'>difflib.Differ( )</font>

### Compares sequences of text lines

### Displays human-readable deltas (differences)

### Shows specific differences between individual lines

---

### <font color='purple'>Deltas are shown using symbols at the beginning of the line: </font>

#### - the line was in the 1st sequence but not the 2nd
    
#### + the line was in the 2nd sequence but not the 1st
    
#### ? a guide line, not present in either sequence, that marks where two similar lines differ
    
#### (two spaces) the line is the same in both sequences
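
The prefixes can also be read by a program. A minimal sketch, assuming two short made-up lists (`old` and `new` are illustrative names, not part of the notebook), that sorts the delta lines by their two-character prefix:

```python
import difflib

# Illustrative inputs (assumed for this sketch)
old = ["apple", "banana", "cherry"]
new = ["apple", "blueberry", "cherry"]

delta = list(difflib.Differ().compare(old, new))

# Every delta line starts with a two-character code; strip it to get the text
removed = [line[2:] for line in delta if line.startswith("- ")]
added = [line[2:] for line in delta if line.startswith("+ ")]
unchanged = [line[2:] for line in delta if line.startswith("  ")]

print("removed:  ", removed)
print("added:    ", added)
print("unchanged:", unchanged)
```
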



---

In [None]:
poem1 = """Sometimes coding is tough.
Sometimes coding is rough.
But other times coding is fun.
Although I would rather be coding in the sun."""

poem1_lines = poem1.splitlines()

poem2 = """Sometimes coding is tough.
Python makes me huff and puff,
And other times, coding is fun! :)
Although I would rather be coding in the sun."""

poem2_lines = poem2.splitlines()


In [None]:
import difflib

d = difflib.Differ()

poem_difference = d.compare(poem1_lines, poem2_lines)
print('\n'.join(poem_difference))


---


In [None]:
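import difflib

# Plain strings are sequences of characters, so Differ treats
# every character as its own "line" and reports one base per row.
dna1 = "AGAGCCGTCGGGTCAAAGTCAGTCAAGTTTGG"

dna2 = "AGAGCCGTCGGGTCAAAAGTCAGTCAAGTTGG"

d = difflib.Differ()

dna_difference = d.compare(dna1, dna2)
print('\n'.join(dna_difference))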

---


# <font color='teal'>difflib.unified_diff( )</font> 

# <font color='teal'>difflib.context_diff( )</font>

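### Similar to difflib.Differ, but in the compact formats used by diff tools

### Less information: only the changed lines plus a few lines of context, with no ? guide lines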
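
A rough way to see the "less information" point: a sketch, assuming a made-up 20-line input with a single changed line, that compares how many lines each approach emits. The names `before` and `after` are illustrative only.

```python
import difflib

# Illustrative input: 20 lines, one of which changes
before = [f"line {i}" for i in range(20)]
after = list(before)
after[10] = "a completely different line"

# Differ reports every line, changed or not
full = list(difflib.Differ().compare(before, after))

# unified_diff reports the change plus n lines of context on each side (n=3 by default)
compact = list(difflib.unified_diff(before, after, lineterm=""))

print(len(full), "lines from Differ")
print(len(compact), "lines from unified_diff")
print("\n".join(compact))
```
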

---

## <font color='teal'>difflib.unified_diff( )</font> 


In [None]:
poem1 = """Sometimes coding is tough.
Sometimes coding is rough.
But other times coding is fun.
Although I would rather be coding in the sun."""

poem1_lines = poem1.splitlines()

poem2 = """Sometimes coding is tough.
Python makes me huff and puff,
And other times, coding is fun! :)
Although I would rather be coding in the sun."""

poem2_lines = poem2.splitlines()

In [None]:
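# lineterm='' because the lines from splitlines() carry no trailing newline;
# the default '\n' would leave blank lines after the header lines.
poem_difference2 = difflib.unified_diff(poem1_lines, poem2_lines, lineterm='')
print('\n'.join(poem_difference2))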

---


## <font color='teal'>difflib.context_diff( )</font> 

In [None]:
poem1 = """Sometimes coding is tough.
Sometimes coding is rough.
But other times coding is fun.
Although I would rather be coding in the sun."""

poem1_lines = poem1.splitlines()

poem2 = """Sometimes coding is tough.
Python makes me huff and puff,
And other times, coding is fun! :)
Although I would rather be coding in the sun."""

poem2_lines = poem2.splitlines()

In [None]:
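# lineterm='' because the lines from splitlines() carry no trailing newline;
# the default '\n' would leave blank lines after the header lines.
poem_difference3 = difflib.context_diff(poem1_lines, poem2_lines, lineterm='')
print('\n'.join(poem_difference3))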

---

---


# <font color='green'>difflib.SequenceMatcher( )</font>


### Compares pairs of sequences of any type

### The sequence elements MUST be hashable 

---

## <font color='green'> An object is _hashable_ if it has a hash value which never changes during its lifetime</font>



---


### Hashable: 
#### Objects that are immutable
    - Strings
    - Integers
    - Booleans
    - Floats
    - Tuples (when every element is itself hashable)
    
    

### Not Hashable:
#### Objects that are mutable
    - Lists
    - Sets
    - Dictionaries
        

In [None]:
hash(560)

In [None]:
hash('Hello World')

In [None]:
w = [1, 2, 3, 4]
hash(w)  # raises TypeError: lists are mutable, so they are not hashable

In [None]:
x = (1, 2, 3, 4)
hash(x)

In [None]:
y = {'hi', 'hello', 'bonjour'}
hash(y)  # raises TypeError: sets are mutable, so they are not hashable

In [None]:
z = {'a': 1, 'b': 2}
hash(z)  # raises TypeError: dictionaries are mutable, so they are not hashable
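
The same checks can run in a single cell without stopping on the first error. A minimal sketch, assuming a small helper named `is_hashable` (a hypothetical name introduced here, not part of the standard library):

```python
def is_hashable(obj):
    """Return True if hash(obj) succeeds, False if it raises TypeError."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True

samples = [560, "Hello World", (1, 2, 3, 4), [1, 2, 3, 4],
           {"hi", "hello", "bonjour"}, {"a": 1, "b": 2}]

# Map each type name to whether the sample of that type could be hashed
results = {type(s).__name__: is_hashable(s) for s in samples}
for name, ok in results.items():
    print(f"{name:6} hashable: {ok}")

# A tuple is only hashable if everything inside it is hashable
print(is_hashable((1, [2, 3])))
```
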

---


## <font color='green'> Why is this important? </font>
### We can use any sequence with SequenceMatcher, as long as its elements are hashable (a list of numbers works because the numbers are hashable)
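
A sketch of where the requirement bites, assuming made-up row data (`rows_a` and `rows_b` are illustrative). A list of lists has unhashable elements, so SequenceMatcher rejects it; converting each inner list to a tuple is one possible workaround.

```python
import difflib

# Illustrative data: each element is a list, which is not hashable
rows_a = [[1, 2], [3, 4]]
rows_b = [[1, 2], [5, 6]]

try:
    difflib.SequenceMatcher(None, rows_a, rows_b)
    failed = False
except TypeError as err:
    failed = True
    print("SequenceMatcher refused the input:", err)

# Tuples of numbers are hashable, so the converted sequences can be compared
tuples_a = [tuple(row) for row in rows_a]
tuples_b = [tuple(row) for row in rows_b]
ratio = difflib.SequenceMatcher(None, tuples_a, tuples_b).ratio()
print(ratio * 100)
```
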

---


In [None]:
a = 'Hey how are you doing today?'
b = 'Good, how are you doing today?'

match = difflib.SequenceMatcher(None, a, b)
c = match.ratio()*100
print(c)

## <font color='green'> What does None mean? </font>

### <font color='green'> None is the default _isjunk_ argument: no elements are treated as junk </font>
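
For contrast, _isjunk_ can be a one-argument function that returns True for elements to ignore. A minimal sketch, where treating spaces as junk is only an illustrative choice; the two ratios are printed side by side rather than predicted.

```python
import difflib

a = 'Hey how are you doing today?'
b = 'Good,                     how are you doing today?'

# None: every character is considered when matching
plain = difflib.SequenceMatcher(None, a, b).ratio()

# A function: characters for which it returns True are treated as junk
ignore_spaces = difflib.SequenceMatcher(lambda ch: ch == " ", a, b).ratio()

print("isjunk=None:     ", plain * 100)
print("spaces are junk: ", ignore_spaces * 100)
```
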





---

In [None]:
a = 'Hey how are you doing today?'
b = 'Good,                     how are you doing today?'

match = difflib.SequenceMatcher(None, a, b)
c = match.ratio()*100
print(c)

---

In [None]:
d = 'Hey how are you doing today?'
e = 'Hey how are you doing today?'

match1 = difflib.SequenceMatcher(None, d, e)
f = match1.ratio()*100
print(f)

---

In [None]:
g = 'Hey how are you doing today?'
h = 'Fine, the weather is nice!'

match3 = difflib.SequenceMatcher(None, g, h)
i = match3.ratio()*100
print(i)

---

In [None]:
list1 = [1, 2, 3]
list2 = [4, 5, 6, 7, 9]

match4 = difflib.SequenceMatcher(None, list1, list2)
lists = match4.ratio()*100
print(lists)

In [None]:
list3 = [1, 2, 3]
list4 = [2, 3, 6, 7, 9]

match4 = difflib.SequenceMatcher(None, list3, list4)
lists1 = match4.ratio()*100
print(lists1)
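
Where does the number come from? The difflib documentation defines ratio() as 2.0*M / T, where M is the number of matching elements and T is the total number of elements in both sequences. A sketch that recomputes it by hand for the second pair of lists:

```python
import difflib

list3 = [1, 2, 3]
list4 = [2, 3, 6, 7, 9]

matcher = difflib.SequenceMatcher(None, list3, list4)

# M: total size of all matching blocks (here the shared run 2, 3)
matches = sum(block.size for block in matcher.get_matching_blocks())

# T: combined length of both sequences
total = len(list3) + len(list4)

manual = 2.0 * matches / total
print(manual * 100)           # computed by hand
print(matcher.ratio() * 100)  # computed by difflib
```
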

---

In [None]:
dna_1 = 'AGAGCCGTCGGGTCAAAGTCAGTCAAGTTTGG'

dna_2 = 'AGAGCCGTCGGGTCAAAAGTCAGTCAAGTTGG'

dna_match = difflib.SequenceMatcher(None, dna_1, dna_2)
dna_seq = dna_match.ratio()*100
print(dna_seq)
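
The ratio says how similar the two strands are, but not where they differ. A sketch using get_opcodes(), which describes how to turn the first sequence into the second; rebuilding the second strand from the opcodes is only a sanity check added for this example.

```python
import difflib

dna_1 = 'AGAGCCGTCGGGTCAAAGTCAGTCAAGTTTGG'
dna_2 = 'AGAGCCGTCGGGTCAAAAGTCAGTCAAGTTGG'

matcher = difflib.SequenceMatcher(None, dna_1, dna_2)

pieces = []
for tag, i1, i2, j1, j2 in matcher.get_opcodes():
    if tag == 'equal':
        pieces.append(dna_1[i1:i2])   # unchanged stretch, copied from dna_1
    else:
        # 'replace', 'delete' or 'insert': report it and take dna_2's version
        print(tag, 'dna_1[%d:%d]=%r' % (i1, i2, dna_1[i1:i2]),
              'dna_2[%d:%d]=%r' % (j1, j2, dna_2[j1:j2]))
        pieces.append(dna_2[j1:j2])

rebuilt = ''.join(pieces)
print(rebuilt == dna_2)
```
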


---

#  <font color='turquoise'>difflib.HtmlDiff( )</font> 
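
HtmlDiff renders a side-by-side comparison as an HTML table. A minimal sketch reusing the two poems; the column descriptions and the output file name are placeholders chosen for this example.

```python
import difflib

poem1_lines = """Sometimes coding is tough.
Sometimes coding is rough.
But other times coding is fun.
Although I would rather be coding in the sun.""".splitlines()

poem2_lines = """Sometimes coding is tough.
Python makes me huff and puff,
And other times, coding is fun! :)
Although I would rather be coding in the sun.""".splitlines()

# make_file() returns a complete HTML document as a string
html = difflib.HtmlDiff().make_file(poem1_lines, poem2_lines,
                                    fromdesc='poem1', todesc='poem2')

# Uncomment to save the page and open it in a browser (file name is a placeholder)
# with open('poem_diff.html', 'w') as f:
#     f.write(html)

print(html[:200])
```

In a notebook, the same string could likely be shown inline with `IPython.display.HTML(html)`, assuming IPython is available.
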

---


Now talk about other variations using ?difflib
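
One such variation is get_close_matches(), which returns the best "good enough" matches for a word from a list of candidates. A small sketch using the word list from the difflib documentation:

```python
import difflib

candidates = ['ape', 'apple', 'peach', 'puppy']

# Returns up to n=3 candidates scoring at least cutoff=0.6, best match first
suggestions = difflib.get_close_matches('appel', candidates)
print(suggestions)  # ['apple', 'ape']
```
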

### Sources

https://pymotw.com/3/difflib/index.html

https://docs.python.org/3.6/library/difflib.html

https://docs.python.org/3.6/glossary.html#term-hashable

https://docs.python.org/2.4/lib/sequence-matcher.html