# Fuzzy Wuzzy at a glance

Fuzzy Wuzzy is a string matching library that uses Levenshtein distance to calculate the difference between two strings.
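To make the idea of Levenshtein distance concrete, here is a minimal pure-Python sketch that counts the single-character insertions, deletions and substitutions needed to turn one string into another. The function name `levenshtein` is chosen for illustration only; it is not part of the fuzzywuzzy API, and the package itself does not compute its scores with this exact routine.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits needed to turn a into b."""
    # previous[j] is the distance between the prefix of a processed so far and b[:j]
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into the empty string takes i deletions
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))                 # 3
print(levenshtein("Tech Boomerang", "tech boomerang"))  # 2 (the two capital letters)
```

The smaller the distance relative to the string lengths, the more similar the strings are; the ratio functions described later turn this kind of measure into a score between 0 and 100.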

### Requirements
The basic requirements for using the fuzzywuzzy package are listed below:
- **Python 2.7 or higher**
- **difflib**, a standard-library module that provides classes and functions for comparing sequences, files and directories, including HTML, context and unified diffs
- **python-Levenshtein** is optional, but can provide a 4-10x speedup in string matching

### Testing is performed with
- **pycodestyle**, a tool that checks Python code against some of the PEP 8 style conventions
- **hypothesis**, a property-based testing library for Python that generates test inputs automatically
- **pytest**, a testing framework for writing and running test code

### Installation
- **using pip**
   <br>pip install fuzzywuzzy
   <br>pip install python-Levenshtein
   
- **or directly by**
   <br> pip install fuzzywuzzy[speedup]

## How to use Fuzzy Wuzzy Package

In [2]:
from fuzzywuzzy import fuzz as f

### Simple Ratio 
It returns a score from 0 to 100 that expresses how similar the two strings are, based on the edit distance between them.


In [4]:
print(f.ratio("Tech Boomerang","tech boomerang"))
print(f.ratio("tech boomerang","tech boomerang"))
print(f.ratio("tech boomerang!","tech"))

86
100
42
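The scores above can be reproduced with the standard library alone. The sketch below assumes the behaviour fuzzywuzzy falls back to when python-Levenshtein is not installed, namely `difflib.SequenceMatcher`; the helper name `simple_ratio` is made up for this example.

```python
from difflib import SequenceMatcher


def simple_ratio(s1: str, s2: str) -> int:
    # SequenceMatcher.ratio() returns 2*M/T, where M is the number of
    # matched characters and T is the combined length of both strings.
    return int(round(100 * SequenceMatcher(None, s1, s2).ratio()))


print(simple_ratio("Tech Boomerang", "tech boomerang"))  # 12 of 14 characters match: 2*12/28
print(simple_ratio("tech boomerang", "tech boomerang"))  # identical strings
print(simple_ratio("tech boomerang!", "tech"))           # only 4 characters match: 2*4/19
```

Note that the comparison is case-sensitive, which is why the first pair does not reach 100.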


However, the simple ratio gives a low score when a substring of length x is matched against a longer string of length y (x < y). In that case another Fuzzy Wuzzy function can be used.

### Partial ratio
It is very useful for matching a substring against a longer string.

In [5]:
print(f.partial_ratio("Tech Boomerang!","tech boomerang"))
print(f.partial_ratio("Tech Boomerang!".lower(),"tech boomerang".lower()))
print(f.partial_ratio("tech boomerang".lower(),"boomerang tech".lower()))

86
100
64
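One way to picture partial matching is as a sliding window: the shorter string is compared against every slice of the longer string that has the same length, and the best score wins. The sketch below is a simplified illustration under that assumption. The package's own implementation may choose its candidate windows differently, and the names `simple_ratio` and `partial_ratio_sketch` are hypothetical.

```python
from difflib import SequenceMatcher


def simple_ratio(s1: str, s2: str) -> int:
    return int(round(100 * SequenceMatcher(None, s1, s2).ratio()))


def partial_ratio_sketch(s1: str, s2: str) -> int:
    shorter, longer = sorted((s1, s2), key=len)
    if not shorter:
        return 0
    width = len(shorter)
    # Slide a window of the shorter string's length across the longer string
    # and keep the best score found.
    return max(
        simple_ratio(shorter, longer[start:start + width])
        for start in range(len(longer) - width + 1)
    )


print(partial_ratio_sketch("tech boomerang!", "tech boomerang"))  # 100: the first window matches exactly
print(partial_ratio_sketch("tech", "tech boomerang!"))            # 100: "tech" is a substring
```

Because every window keeps the original word order, this approach cannot recover from reordered words such as "boomerang tech".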


But partial ratio also fails when the order of the words changes. To solve this problem we turn to another Fuzzy Wuzzy function, called Token Sort Ratio.

### Token Sort Ratio
This function works where partial ratio fails: it splits each string into words (tokens), sorts them, and compares the sorted strings, so word order no longer matters. It works best when both strings contain the same words, and it gives a lower score when a shorter string of length x is matched against a longer string of length y (x < y) that contains extra words. To overcome this limitation, there is another function named token set ratio.

In [20]:
print(f.token_sort_ratio("tech boomerang","boomerang tech"))
print(f.token_sort_ratio("tech boomerang","boomerang tech boomerang !"))

100
74
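The two scores above follow directly from sorting the words before comparing. Here is a minimal sketch of that idea, assuming a simple preprocessing step that lowercases the text and treats anything other than letters and digits as a separator; `sorted_tokens` and `token_sort_ratio_sketch` are illustrative names, not fuzzywuzzy functions.

```python
import re
from difflib import SequenceMatcher


def simple_ratio(s1: str, s2: str) -> int:
    return int(round(100 * SequenceMatcher(None, s1, s2).ratio()))


def sorted_tokens(text: str) -> str:
    # Lowercase, turn punctuation into spaces, split into words, sort them.
    words = re.sub(r"[^a-z0-9]+", " ", text.lower()).split()
    return " ".join(sorted(words))


def token_sort_ratio_sketch(s1: str, s2: str) -> int:
    return simple_ratio(sorted_tokens(s1), sorted_tokens(s2))


print(sorted_tokens("boomerang tech boomerang !"))  # boomerang boomerang tech
print(token_sort_ratio_sketch("tech boomerang", "boomerang tech"))              # 100
print(token_sort_ratio_sketch("tech boomerang", "boomerang tech boomerang !"))  # 74
```

The repeated word "boomerang" survives the sort, so the second pair is compared as 14 characters against 24, which is what pulls the score down.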


### Token Set Ratio
This function gives us more flexibility than the token sort function because it performs a set operation: it takes the intersection of the two token sets to find the common words, then applies the fuzzy ratio to make the comparison.

In [6]:
print(f.token_set_ratio("tech boomerang","boomerang tech boomerang !"))

100
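The set-based comparison can be sketched as follows: build one string from the shared words and two more from the shared words plus each side's leftovers, then keep the best pairwise score. This is an illustrative approximation under the same preprocessing assumption as before (lowercase, punctuation treated as separators); the helper names are hypothetical.

```python
import re
from difflib import SequenceMatcher


def simple_ratio(s1: str, s2: str) -> int:
    return int(round(100 * SequenceMatcher(None, s1, s2).ratio()))


def token_set(text: str) -> set:
    # Duplicated words collapse because the result is a set.
    return set(re.sub(r"[^a-z0-9]+", " ", text.lower()).split())


def token_set_ratio_sketch(s1: str, s2: str) -> int:
    tokens1, tokens2 = token_set(s1), token_set(s2)
    common = " ".join(sorted(tokens1 & tokens2))
    only1 = " ".join(sorted(tokens1 - tokens2))
    only2 = " ".join(sorted(tokens2 - tokens1))
    combined1 = (common + " " + only1).strip()
    combined2 = (common + " " + only2).strip()
    # The shared words alone can produce a perfect score even when one
    # string carries extra words.
    return max(
        simple_ratio(common, combined1),
        simple_ratio(common, combined2),
        simple_ratio(combined1, combined2),
    )


print(token_set_ratio_sketch("tech boomerang", "boomerang tech boomerang !"))  # 100
```

Since both strings reduce to the same set of words, {"boomerang", "tech"}, the duplicate word and the punctuation no longer affect the score.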



### Process
Process is one of the most powerful modules of Fuzzy Wuzzy. It performs string matching against a list of strings, returning either every candidate with its score or only the string with the highest percent similarity.

In [8]:
from fuzzywuzzy import process as p
key=["tech boomerang","boomerang tech boomerang!","tech boomerang is a tech related things","boomerang tech!"]
print(p.extract("tech boomerang",key))
print(p.extractOne("tech boomerang",key))

[('tech boomerang', 100), ('boomerang tech!', 95), ('boomerang tech boomerang!', 90), ('tech boomerang is a tech related things', 90)]
('tech boomerang', 100)


We can also limit how many strings are extracted from the list; the results are returned in decreasing order of percent similarity.

In [9]:
print(p.extract("tech boomerang",key,limit=3))

[('tech boomerang', 100), ('boomerang tech!', 95), ('boomerang tech boomerang!', 90)]
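Conceptually, extraction scores every candidate against the query, sorts by score, and keeps the top results. The sketch below illustrates that flow with a plain `difflib` scorer. The names `extract_sketch` and `extract_one_sketch` are made up for this example, and because the real `process.extract` uses a more elaborate weighted scorer by default, the scores printed here will not match the output shown above.

```python
from difflib import SequenceMatcher


def simple_ratio(s1: str, s2: str) -> int:
    return int(round(100 * SequenceMatcher(None, s1, s2).ratio()))


def extract_sketch(query, choices, scorer=simple_ratio, limit=5):
    # Score every choice, then return the best `limit` pairs, highest first.
    scored = [(choice, scorer(query, choice)) for choice in choices]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]


def extract_one_sketch(query, choices, scorer=simple_ratio):
    best = extract_sketch(query, choices, scorer, limit=1)
    return best[0] if best else None


key = ["tech boomerang", "boomerang tech boomerang!",
       "tech boomerang is a tech related things", "boomerang tech!"]
print(extract_sketch("tech boomerang", key, limit=3))
print(extract_one_sketch("tech boomerang", key))
```

Passing a different `scorer`, such as a token-based ratio, changes how the candidates are ranked without changing the extraction logic.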
