### <font color='pink'>Shaleigh Smith</font>

---

# <font color='salmon'>difflib Module</font>

---

# <font color='scarlet'>What is the difflib Module used for?</font>  

---


# <font color='black'>Comparing Sequences: </font>

### <font color='indigo'>difflib.Differ( )</font> 

### <font color='purple'>difflib.unified_diff( ) & difflib.context_diff( )</font> 

###  <font color='navy'>difflib.SequenceMatcher( )</font> 

###  <font color='green'>difflib.get_close_matches( )</font> 

### <font color='teal'>difflib.ndiff( ) & difflib.restore( )</font> 

###  <font color='turquoise'>difflib.HtmlDiff( )</font> 

---

---

In [None]:
import difflib

---

# <font color='indigo'>difflib.Differ( )</font>

### Compares sequences of text lines

### Displays recognizable deltas (differences)

### Shows specific differences between individual lines

---

### <font color='indigo'>Deltas are shown using a two-character code at the beginning of each line: </font>

#### `- ` the line is in the 1st sequence but not the 2nd

#### `+ ` the line is in the 2nd sequence but not the 1st

#### `? ` a hint line (in neither sequence) that marks where the differences inside a line are

#### `  ` (two spaces) the line is the same in both sequences
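
A minimal sketch of these codes on two tiny made-up lists of lines (`old` and `new` are illustrative names, not from the original notebook). The unchanged line gets two spaces, the edited line appears once with `- ` and once with `+ `, and a `? ` hint line follows:

```python
import difflib

# Hypothetical two-line "files"; each string keeps its trailing newline
old = ['the\n', 'cat\n']
new = ['the\n', 'cast\n']

delta = list(difflib.Differ().compare(old, new))
print(''.join(delta), end='')
# '  the' is unchanged, '- cat' was removed, '+ cast' was added,
# and the '? ' line points at the inserted character in 'cast'
```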



---

# <font color='indigo'>difflib.Differ.compare(a,b)</font>



#### str.splitlines( ) returns a list of the lines as strings, with the line breaks removed (pass keepends=True to keep them)
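
Since Differ.compare( ) and the other diff helpers take lists of lines, it helps to see exactly what splitlines( ) produces. A quick sketch with a made-up two-line string:

```python
text = "line one\nline two\n"

# Default: the line breaks are removed
parts = text.splitlines()
print(parts)       # ['line one', 'line two']

# keepends=True keeps each line's break
parts_kept = text.splitlines(keepends=True)
print(parts_kept)  # ['line one\n', 'line two\n']
```

Whether the lines keep their `\n` decides how the diff output should be printed: join with `'\n'` when the breaks were dropped, or with `''` when they were kept.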

---

In [None]:
poem1 = """Sometimes coding is tough.
Sometimes coding is rough.
But other times coding is fun.
Although I would rather be coding in the sun."""

poem1_lines = poem1.splitlines()

poem2 = """Sometimes coding is tough.
Python makes me huff and puff,
And other times, coding is fun! :)
Although I would rather be coding in the sun."""

poem2_lines = poem2.splitlines()


In [None]:

d = difflib.Differ()

poem_difference = d.compare(poem1_lines, poem2_lines)
print('\n'.join(poem_difference))


---


In [None]:
dna1 = """AGAGCCGTCGGGTCAAAGTCAGTCAAGTTTGG"""

dna2 = """AGAGCCGTCGGGTCAAAAGTCAGTCAAGTTGG"""

d = difflib.Differ()

dna_difference = d.compare(dna1, dna2)
print('\n'.join(dna_difference))

---


# <font color='purple'>difflib.unified_diff( )</font> 

# <font color='purple'>difflib.context_diff( )</font>

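### Similar to difflib.Differ, but in the compact "unified" or "context" format used by tools such as diff and patch

### Less information: no ? hint lines, and unchanged lines appear only as context around each change (n=3 lines by default)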
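
A small sketch of the optional arguments, reusing the poem lines from the Differ example. The labels `poem1.txt` and `poem2.txt` are made-up names; unified_diff only prints them in the header. `n=0` asks for no context lines, and `lineterm=''` is used because these lines have no trailing newlines:

```python
import difflib

poem1_lines = ["Sometimes coding is tough.",
               "Sometimes coding is rough.",
               "But other times coding is fun.",
               "Although I would rather be coding in the sun."]
poem2_lines = ["Sometimes coding is tough.",
               "Python makes me huff and puff,",
               "And other times, coding is fun! :)",
               "Although I would rather be coding in the sun."]

# fromfile/tofile label the header, n sets the number of context lines,
# lineterm='' stops the header lines from ending in an extra '\n'
diff_lines = list(difflib.unified_diff(poem1_lines, poem2_lines,
                                       fromfile='poem1.txt', tofile='poem2.txt',
                                       n=0, lineterm=''))
print('\n'.join(diff_lines))
```

The `@@ -2,2 +2,2 @@` hunk header reads "2 lines starting at line 2 in each file", followed by the removed and added lines.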

---

## <font color='purple'>difflib.unified_diff(a,b)</font> 


In [None]:
poem1 = """Sometimes coding is tough.
Sometimes coding is rough.
But other times coding is fun.
Although I would rather be coding in the sun."""

poem1_lines = poem1.splitlines()

poem2 = """Sometimes coding is tough.
Python makes me huff and puff,
And other times, coding is fun! :)
Although I would rather be coding in the sun."""

poem2_lines = poem2.splitlines()

In [None]:
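# lineterm='' because the input lines have no trailing newlines;
# otherwise the header lines end in '\n' and print as extra blank lines
poem_difference2 = difflib.unified_diff(poem1_lines, poem2_lines, lineterm='')
print('\n'.join(poem_difference2))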

---


## <font color='purple'>difflib.context_diff(a,b)</font> 

In [None]:
poem1 = """Sometimes coding is tough.
Sometimes coding is rough.
But other times coding is fun.
Although I would rather be coding in the sun."""

poem1_lines = poem1.splitlines()

poem2 = """Sometimes coding is tough.
Python makes me huff and puff,
And other times, coding is fun! :)
Although I would rather be coding in the sun."""

poem2_lines = poem2.splitlines()

In [None]:
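# lineterm='' because the input lines have no trailing newlines;
# otherwise the header lines end in '\n' and print as extra blank lines
poem_difference2 = difflib.context_diff(poem1_lines, poem2_lines, lineterm='')
print('\n'.join(poem_difference2))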

---

---


# <font color='navy'>difflib.SequenceMatcher( )</font>


### Compares pairs of sequences of any type (strings, lists, tuples, ...)

### The sequence elements MUST be hashable 

---

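## <font color='navy'> An object is _hashable_ if it has a hash value which never changes during its lifetime (it needs a `__hash__()` method) and can be compared to other objects (it needs an `__eq__()` method)</font>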



---


### Hashable: 
#### Objects that are immutable
- Strings
- Integers
- Booleans
- Floats
- Tuples (only if every element inside them is also hashable)

### Not Hashable:
#### Objects that are mutable
- Lists
- Sets
- Dictionaries
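
The tuple caveat is easy to check directly. A minimal sketch; `is_hashable` is a hypothetical helper written for this example, not part of difflib:

```python
def is_hashable(obj):
    """Hypothetical helper: True if hash(obj) succeeds."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True

print(is_hashable((1, 2, 3)))          # True: every element is immutable
print(is_hashable(('a', ('b', 'c'))))  # True: nested tuples of strings
print(is_hashable((1, [2, 3])))        # False: the inner list is mutable
print(is_hashable([1, 2]))             # False: lists are mutable
```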
        

In [None]:
hash(560)

In [None]:
hash('Hello World')

In [None]:
w = [1, 2, 3, 4]
hash(w)  # raises TypeError: a list is not hashable

In [None]:
x = (1, 2, 3, 4)
hash(x)

In [None]:
y = set(['hi', 'hello', 'bonjour'])
hash(y)  # raises TypeError: a set is not hashable

In [None]:
z = {'a': 1, 'b': 2}
hash(z)  # raises TypeError: a dict is not hashable

---


## <font color='navy'> Why is this important? </font>
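### SequenceMatcher works with any sequence (string, list, tuple, ...) whose elements are hashable, because it stores the elements of the second sequence as dictionary keys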
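
A sketch of what goes wrong otherwise, using made-up "rows" of numbers: a list of lists cannot be compared, but converting each inner list to a tuple makes every element hashable:

```python
import difflib

# Elements are lists (unhashable): SequenceMatcher raises TypeError
list_rows_a = [[1, 2], [3]]
list_rows_b = [[1, 2], [4]]
try:
    difflib.SequenceMatcher(None, list_rows_a, list_rows_b)
    lists_failed = False
except TypeError:
    lists_failed = True

# Converting each element to a tuple makes it hashable, so this works
tuple_rows_a = [tuple(row) for row in list_rows_a]
tuple_rows_b = [tuple(row) for row in list_rows_b]
score = difflib.SequenceMatcher(None, tuple_rows_a, tuple_rows_b).ratio()
print(lists_failed, score)  # True 0.5
```

Only the first row matches, so 2 matched elements out of 4 in total gives 0.5.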

---


## <font color='navy'>difflib.SequenceMatcher(isjunk=None, a='', b='', autojunk=True )</font>


###  Default _isjunk_ argument: no elements are ignored 

### Default _autojunk_ argument (True): automatically treats "popular" items in the second sequence (b) as junk

- If b is at least 200 items long, any item whose duplicates make up more than 1% of b is marked 'popular' and ignored when searching for matches. Pass autojunk=False to turn this off.
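
A minimal sketch of the heuristic, assuming a made-up 203-item string `b` in which 'a' and 'b' each appear 100 times. It inspects `bpopular` (the items the heuristic set aside) and `b2j` (the remaining items indexed for matching), both documented SequenceMatcher attributes since Python 3.2:

```python
import difflib

# Hypothetical b: 203 items where 'a' and 'b' each appear 100 times
b = 'ab' * 100 + 'xyz'

with_heuristic = difflib.SequenceMatcher(None, 'abxyz', b)
without_heuristic = difflib.SequenceMatcher(None, 'abxyz', b, autojunk=False)

# bpopular holds the items the heuristic decided to ignore
print(sorted(with_heuristic.bpopular))     # ['a', 'b']
print(sorted(without_heuristic.bpopular))  # []

# Only the non-popular items are left in the index used for matching
print(sorted(with_heuristic.b2j))          # ['x', 'y', 'z']
```

For long, repetitive inputs this can noticeably change which matches are found, which is when autojunk=False is worth trying.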

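### .ratio() returns a float between 0 and 1; as a rule of thumb, a value over 0.6 means the sequences are close matches (the cells below multiply it by 100 to print a percentage)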
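
The Python docs define the ratio as 2.0*M/T, where M is the number of matched elements and T is the total number of elements in both sequences. A small worked check on two made-up strings, recomputing M from get_matching_blocks( ):

```python
import difflib

a, b = 'abcd', 'bcde'
matcher = difflib.SequenceMatcher(None, a, b)

# M = number of matched elements (the last block is a size-0 sentinel)
M = sum(block.size for block in matcher.get_matching_blocks())
# T = total number of elements in both sequences
T = len(a) + len(b)

print(matcher.get_matching_blocks())  # [Match(a=1, b=0, size=3), Match(a=4, b=4, size=0)]
print(2.0 * M / T, matcher.ratio())   # 0.75 0.75
```

Here 'bcd' is the only shared block, so M = 3, T = 8 and the ratio is 6/8 = 0.75, above the 0.6 rule of thumb.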

---

In [None]:
a = 'Hey how are you doing today?'
b = 'Good, how are you doing today?'

match = difflib.SequenceMatcher(None, a, b)
c = match.ratio()*100
print(c)

---

In [None]:
a = 'Hey how are you doing today?'
b = 'Good,                     how are you doing today?'

match = difflib.SequenceMatcher(None, a, b)
c = match.ratio()*100
print(c)

---

In [None]:
d = 'Hey how are you doing today?'
e = 'Hey how are you doing today?'

match1 = difflib.SequenceMatcher(None, d, e)
f = match1.ratio()*100
print(f)

---

In [None]:
g = 'Hey how are you doing today?'
h = 'Fine, the weather is nice!'

match3 = difflib.SequenceMatcher(None, g, h)
i = match3.ratio()*100
print(i)

---

In [None]:
list1 = [1, 2, 3]
list2 = [4, 5, 6, 7, 9]

match4 = difflib.SequenceMatcher(None, list1, list2)
lists = match4.ratio()*100
print(lists)

In [None]:
list3 = [1, 2, 3]
list4 = [2, 3, 6, 7, 9]

match4 = difflib.SequenceMatcher(None, list3, list4)
lists1 = match4.ratio()*100
print(lists1)

---

In [None]:
dna_1 = 'AGAGCCGTCGGGTCAAAGTCAGTCAAGTTTGG'

dna_2 = 'AGAGCCGTCGGGTCAAAAGTCAGTCAAGTTGG'

dna_match = difflib.SequenceMatcher(None, dna_1, dna_2)
dna_seq = dna_match.ratio()*100
print(dna_seq)


---

# <font color='green'>difflib.get_close_matches( )</font> 

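###  Compares a word against a list of possibilities using a similarity score (SequenceMatcher's ratio)

### Returns a list of the best matches (at most n) whose score is at least cutoff, most similar first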

---

## <font color='green'>difflib.get_close_matches(word, possibilities, n=3, cutoff=0.6)</font> 
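
To make the n and cutoff parameters concrete, here is a simplified sketch of the same idea. `close_matches_sketch` is a hypothetical function, not difflib's code: it scores each possibility with SequenceMatcher.ratio( ), keeps those at or above cutoff, and returns the n best:

```python
import difflib

def close_matches_sketch(word, possibilities, n=3, cutoff=0.6):
    """Hypothetical simplified version of difflib.get_close_matches."""
    scored = []
    for candidate in possibilities:
        # difflib uses the candidate as sequence a and the word as sequence b
        score = difflib.SequenceMatcher(None, candidate, word).ratio()
        if score >= cutoff:
            scored.append((score, candidate))
    # Highest score first; equal scores are ordered by comparing the strings
    scored.sort(reverse=True)
    return [candidate for score, candidate in scored[:n]]

words = ['Hi', 'Helo', 'Heyo', 'Hell', 'Bonjour', 'Bye']
print(close_matches_sketch('Hello', words))
print(difflib.get_close_matches('Hello', words))
```

CPython's version also runs the cheaper upper-bound checks real_quick_ratio( ) and quick_ratio( ) before calling ratio( ), which skips hopeless candidates faster without changing the result.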

In [None]:
difflib.get_close_matches('Hello', ['Hi', 'Helo', 'Heyo', 'Hell', 'Bonjour', 'Bye'])

---

In [None]:
difflib.get_close_matches('Hello', ['Hi', 'Hey', 'Heyo', 'Hell', 'Bonjour', 'Bye'], n = 1)

---

In [None]:
difflib.get_close_matches('Hello', ['Hi', 'Hey', 'Heyo', 'Hell', 'Bonjour', 'Bye'], cutoff=0.4)

---

# <font color='teal'>difflib.ndiff( )</font> 


### Compares lists of strings

### Returns a delta (like Differ)


# <font color='teal'>difflib.restore( )</font> 

### Returns one of the two sequences that generated a delta (which=1 or which=2)

### Used on the output of difflib.ndiff( ) or Differ.compare( )

## <font color='teal'>difflib.ndiff(a, b)</font> 



In [None]:
diff = difflib.ndiff('hello\nmy\nname\nis\nShaleigh\n'.splitlines(keepends=True), #line breaks included
                    'hey\nmi\nnae\nis\nShaliehg\n'.splitlines(keepends=True))
print(''.join(diff), end="")

---

## <font color='teal'>difflib.restore(delta, which)</font> 



In [None]:
diff = difflib.ndiff('hello\nmy\nname\nis\nShaleigh\n'.splitlines(keepends=True),
                     'hey\nmi\nnae\nis\nShaliehg\n'.splitlines(keepends=True))

diff = list(diff)
print(''.join(difflib.restore(diff, 1)), end="")
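
A short sketch of the round trip: restore( ) with which=1 rebuilds the first input and which=2 the second, and it accepts Differ.compare( ) output as well. The names `before`, `after` and `delta` are illustrative:

```python
import difflib

before = 'hello\nmy\nname\nis\nShaleigh\n'.splitlines(keepends=True)
after = 'hey\nmi\nnae\nis\nShaliehg\n'.splitlines(keepends=True)

# ndiff returns a generator, so store the delta as a list to reuse it
delta = list(difflib.ndiff(before, after))

restored_1 = list(difflib.restore(delta, 1))
restored_2 = list(difflib.restore(delta, 2))
print(restored_1 == before, restored_2 == after)  # True True

# Output from Differ.compare can be restored the same way
from_differ = list(difflib.restore(difflib.Differ().compare(before, after), 2))
print(from_differ == after)  # True
```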

---

#  <font color='turquoise'>difflib.HtmlDiff( )</font> 


### Creates an HTML table (make_table) or a complete HTML file containing the table (make_file)

### Compares text line by line, side by side, highlighting the changes within each line
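
A minimal sketch of both methods on the poem lines from earlier. The labels `poem1`/`poem2` and the output path `poem_diff.html` in the system temp directory are made-up choices for this example:

```python
import difflib
import os
import tempfile

poem1_lines = ["Sometimes coding is tough.",
               "Sometimes coding is rough.",
               "But other times coding is fun.",
               "Although I would rather be coding in the sun."]
poem2_lines = ["Sometimes coding is tough.",
               "Python makes me huff and puff,",
               "And other times, coding is fun! :)",
               "Although I would rather be coding in the sun."]

html_diff = difflib.HtmlDiff()

# make_table returns only the <table> markup, e.g. to embed in another page
table_html = html_diff.make_table(poem1_lines, poem2_lines,
                                  fromdesc='poem1', todesc='poem2')

# make_file returns a complete HTML document containing the same table
page_html = html_diff.make_file(poem1_lines, poem2_lines,
                                fromdesc='poem1', todesc='poem2')

# Hypothetical output location: a file in the system temp directory
out_path = os.path.join(tempfile.gettempdir(), 'poem_diff.html')
with open(out_path, 'w', encoding='utf-8') as f:
    f.write(page_html)
print('Wrote', out_path)
```

Opening the written file in a browser shows the two poems in side-by-side columns with the changed lines highlighted.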

---


In [None]:
?difflib

### Sources

https://pymotw.com/3/difflib/index.html

https://docs.python.org/3.6/library/difflib.html

https://docs.python.org/3.6/glossary.html#term-hashable

https://docs.python.org/2.4/lib/sequence-matcher.html