## String Matching

This notebook uses [fuzzywuzzy](https://github.com/seatgeek/fuzzywuzzy), a popular fuzzy string matching library from SeatGeek.

For details on the available scoring methods and how they differ, see [SeatGeek's blog post explaining the methodology](http://chairnerd.seatgeek.com/fuzzywuzzy-fuzzy-string-matching-in-python/).

In [3]:
from fuzzywuzzy import fuzz, process

In [4]:
berlin = ['Berlin, Germany', 
          'Berlin, Deutschland', 
          'Berlin', 
          'Berlin, DE']

#### Try matching the first and second strings: 'Berlin, Germany' and 'Berlin, Deutschland'

In [5]:
fuzz.partial_ratio(berlin[0], berlin[1])

60

In [6]:
fuzz.ratio(berlin[0], berlin[1])

65

In [7]:
fuzz.token_set_ratio(berlin[0], berlin[1])

62

In [8]:
fuzz.token_sort_ratio(berlin[0], berlin[1])

62
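To make the token-based scorers less of a black box, here is a minimal sketch of the idea behind `token_sort_ratio`, written with only the standard library's `difflib`. It assumes the score is a plain similarity ratio computed after lowercasing, stripping punctuation, and sorting the words. The helper names `normalize_tokens` and `token_sort_sketch` are made up for illustration, and fuzzywuzzy's own preprocessing may differ in the details.

```python
import re
from difflib import SequenceMatcher


def normalize_tokens(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and sort the words.

    Assumption: only ASCII letters and digits count as token characters.
    """
    cleaned = re.sub(r"[^a-z0-9]+", " ", text.lower())
    return " ".join(sorted(cleaned.split()))


def token_sort_sketch(a: str, b: str) -> int:
    """Similarity (0-100) of two strings after sorting their tokens."""
    matcher = SequenceMatcher(None, normalize_tokens(a), normalize_tokens(b))
    return round(100 * matcher.ratio())


# Word order and punctuation no longer matter after normalization.
print(normalize_tokens("Berlin, Germany"))
print(token_sort_sketch("Berlin, Germany", "Germany Berlin"))
```

Because both inputs are reduced to the same sorted token string, reordering words or changing punctuation should not lower the score in this sketch.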

#### Try matching the second and third strings: 'Berlin, Deutschland' and 'Berlin'

In [9]:
fuzz.partial_ratio(berlin[1], berlin[2])

100

In [10]:
fuzz.ratio(berlin[1], berlin[2])

48

In [11]:
fuzz.token_sort_ratio(berlin[1], berlin[2])

50
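The gap between the scores above (100 for `partial_ratio` versus 48 for `ratio`) comes from comparing the shorter string against substrings of the longer one rather than against the whole string. The following is a simplified sketch of that idea using the standard library's `difflib`. It slides a window over the longer string and keeps the best score, which is an assumption made for clarity and not necessarily how fuzzywuzzy selects candidate substrings. The name `partial_ratio_sketch` is hypothetical.

```python
from difflib import SequenceMatcher


def partial_ratio_sketch(a: str, b: str) -> int:
    """Best similarity (0-100) of the shorter string against any
    equally long window of the longer string."""
    shorter, longer = (a, b) if len(a) <= len(b) else (b, a)
    if not shorter:
        # Arbitrary choice for this sketch: an empty string matches nothing.
        return 0
    width = len(shorter)
    best = 0.0
    for start in range(len(longer) - width + 1):
        window = longer[start:start + width]
        best = max(best, SequenceMatcher(None, shorter, window).ratio())
    return round(100 * best)


# 'Berlin' appears verbatim inside the longer string, so one window matches it exactly.
print(partial_ratio_sketch("Berlin, Deutschland", "Berlin"))
```

A full match inside the longer string is enough for a perfect partial score, which is why partial matching is forgiving of extra trailing text such as a country name.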

#### Which scorer do you think will give the lowest and highest scores for the final two strings: 'Berlin' and 'Berlin, DE'?

In [12]:
fuzz.ratio(berlin[2], berlin[3])

75
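The score of 75 can be reproduced by hand. The sketch below assumes `fuzz.ratio` behaves like the standard library's `difflib.SequenceMatcher.ratio()`, which is defined as `2 * M / T`, where `M` is the number of matched characters and `T` is the combined length of both strings. Results from fuzzywuzzy may differ slightly depending on which backend is installed.

```python
from difflib import SequenceMatcher

a, b = "Berlin", "Berlin, DE"
matcher = SequenceMatcher(None, a, b)

# Total number of characters that fall inside matching blocks (M).
matched = sum(block.size for block in matcher.get_matching_blocks())

# Combined length of both strings (T).
total = len(a) + len(b)

# Scale 2 * M / T to the 0-100 range used in this notebook.
score = round(100 * 2 * matched / total)
print(matched, total, score)  # prints: 6 16 75
```

All six characters of 'Berlin' match, and the four extra characters in 'Berlin, DE' are what pull the score down from 100.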

### Extracting a guess out of a list

In [13]:
choices = ['Germany', 'Deutschland', 'France', 
           'United Kingdom', 'Great Britain', 
           'United States']

In [14]:
process.extract('DE', choices, limit=2)

[('Deutschland', 90), ('United States', 57)]

In [15]:
process.extract('UK', choices)

[('Deutschland', 45),
 ('United Kingdom', 45),
 ('United States', 45),
 ('Germany', 0),
 ('France', 0)]

In [16]:
process.extract('frankreich', choices)

[('France', 62),
 ('Great Britain', 41),
 ('Germany', 35),
 ('United Kingdom', 25),
 ('United States', 25)]

### Will this extract properly?

In [17]:
process.extract('USA', choices)

[('Deutschland', 60),
 ('United States', 60),
 ('Germany', 30),
 ('France', 30),
 ('United Kingdom', 30)]
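The tie between 'Deutschland' and 'United States' above suggests that a top score alone is not enough to trust a match. One possible guard, sketched here with the standard library's `difflib` rather than fuzzywuzzy's scorers, is to accept a match only when it clears a minimum score and beats the runner-up by some margin. The names `rank_choices` and `confident_match`, and the default `cutoff` and `margin` values, are illustrative assumptions and not part of fuzzywuzzy.

```python
from difflib import SequenceMatcher
from typing import List, Optional, Tuple


def rank_choices(query: str, choices: List[str]) -> List[Tuple[str, int]]:
    """Score every choice against the query (0-100), best first."""
    scored = [
        (choice, round(100 * SequenceMatcher(None, query.lower(), choice.lower()).ratio()))
        for choice in choices
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def confident_match(query: str, choices: List[str],
                    cutoff: int = 60, margin: int = 10) -> Optional[str]:
    """Return the best choice only if it is clearly better than the rest."""
    ranked = rank_choices(query, choices)
    if not ranked:
        return None
    best_choice, best_score = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0
    # Reject weak matches and matches that are tied or nearly tied.
    if best_score < cutoff or best_score - runner_up < margin:
        return None
    return best_choice


choices = ['Germany', 'Deutschland', 'France',
           'United Kingdom', 'Great Britain', 'United States']
print(confident_match('Deutschland', choices))
print(confident_match('zzz', choices))
```

When the function returns `None`, the query could be routed to manual review or to an explicit lookup table, which may suit short abbreviations better than fuzzy scoring.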