I don’t know rst
kvh committed Mar 26, 2017
1 parent 9d345a2 commit 7f4cc7f
Showing 1 changed file with 46 additions and 49 deletions.
95 changes: 46 additions & 49 deletions README.rst
@@ -35,65 +35,62 @@ Usage

Basic entity detection and matching for built-in types.

```python
>>> import match
>>> match.detect_type('608-555-5555')
(1, PhoneNumberType)
>>> match.detect_type('joe.van.gogh@example.com')
(1, EmailType)
>>> match.detect_type('John R. Smith')
(.95, FullNameType)
>>> match.detect_type('Hi, how are you?')
(1, StringType)
>>> match.score_similarity('Jonathon R. Smith', 'john r smith')
(.92, FullNameType)
>>> match.score_similarity('123 easy st, NY, NY', '123 Easy Street, New York City')
(.98, AddressType)
>>> match.score_similarity('Hi, how are you Joe?', 'hi how are you doing joe?')
(.81, StringType)
>>> match.score_similarity_as_type('608-555-5555', '608-555-5554', 'phonenumber')
.0
>>> match.score_similarity_as_type('608-555-5555', '608-555-5554', 'string')
.9
>>> match.parse('608-555-5555')
('+1 608 555 5555', PhoneNumberType)
>>> match.parse(' march 3rd, 1997', to_object=True)
(datetime.datetime(1997, 3, 3), DateTimeType)
>>> match.parse_as_type(' march 3rd, 1997', 'email')
None
```
.. code:: python
>>> import match
>>> match.detect_type('608-555-5555')
(1, PhoneNumberType)
>>> match.detect_type('joe.van.gogh@example.com')
(1, EmailType)
>>> match.detect_type('John R. Smith')
(.95, FullNameType)
>>> match.detect_type('Hi, how are you?')
(1, StringType)
>>> match.score_similarity('Jonathon R. Smith', 'john r smith')
(.92, FullNameType)
>>> match.score_similarity('123 easy st, NY, NY', '123 Easy Street, New York City')
(.98, AddressType)
>>> match.score_similarity('Hi, how are you Joe?', 'hi how are you doing joe?')
(.81, StringType)
>>> match.score_similarity_as_type('608-555-5555', '608-555-5554', 'phonenumber')
.0
>>> match.score_similarity_as_type('608-555-5555', '608-555-5554', 'string')
.9
>>> match.parse('608-555-5555')
('+1 608 555 5555', PhoneNumberType)
>>> match.parse(' march 3rd, 1997', to_object=True)
(datetime.datetime(1997, 3, 3), DateTimeType)
>>> match.parse_as_type(' march 3rd, 1997', 'email')
None
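
To make the `(confidence, type)` return shape in the examples above concrete, here is a minimal sketch of pattern-based detection. It is an assumption for illustration only: `detect_type_sketch`, `PHONE_RE`, and `EMAIL_RE` are hypothetical names, the regular expressions are deliberately simplistic, and nothing here reflects how `match.detect_type` is actually implemented.

```python
import re

# Hypothetical patterns for illustration; not the rules `match` uses.
PHONE_RE = re.compile(r"^\+?1?[\s.-]?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}$")
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def detect_type_sketch(value):
    """Return (confidence, type_name) for the first pattern that matches."""
    text = value.strip()
    if PHONE_RE.match(text):
        return (1.0, "phonenumber")
    if EMAIL_RE.match(text):
        return (1.0, "email")
    # Anything unrecognised falls back to a plain string.
    return (1.0, "string")
```

A real detector would presumably score several candidate types and return the best one, which is how a value such as `'John R. Smith'` could come back with a confidence below 1.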
Probabilistic matching, based on frequencies in a given corpus.

```python
.. code:: python
>>> from match import similarities
>>> import random
>>> corpus = random.sample('a'*10000 + ' '*10000 + 'b'*1000 + 'c'*100 + 'd'*10, k=21110)
>>> psim = similarities.ProbabilisticNgramSimilarity(corpus, grams=2)
>>> psim.similarity('ab ba c', 'ab ba d') # Lower similarity since 'a' is common
.6
>>> psim.similarity('db bd c', 'db bd a') # Higher similarity since 'd' is rare
.8
>>> from match import similarities
>>> import random
>>> corpus = random.sample('a'*10000 + ' '*10000 + 'b'*1000 + 'c'*100 + 'd'*10, k=21110)
>>> psim = similarities.ProbabilisticNgramSimilarity(corpus, grams=2)
>>> psim.similarity('ab ba c', 'ab ba d') # Lower similarity since 'a' is common
.6
>>> psim.similarity('db bd c', 'db bd a') # Higher similarity since 'd' is rare
.8
```
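
The idea behind the example above, that sharing rare n-grams should count for more than sharing common ones, can be sketched as a frequency-weighted Dice coefficient. This is an assumption about the general technique, not the code behind `ProbabilisticNgramSimilarity`: the class name `WeightedNgramSimilarity`, the add-one smoothing, and the negative-log weighting are all choices made for this illustration.

```python
from collections import Counter
import math


def ngrams(text, n=2):
    """Return the list of character n-grams in text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


class WeightedNgramSimilarity:
    """Hypothetical stand-in, not match's ProbabilisticNgramSimilarity."""

    def __init__(self, corpus, n=2):
        self.n = n
        self.counts = Counter(ngrams(corpus, n))
        self.total = sum(self.counts.values())

    def weight(self, gram):
        # Add-one smoothing keeps the weight finite for unseen n-grams.
        # Rare n-grams have low probability and therefore a high weight.
        p = (self.counts[gram] + 1) / (self.total + 1)
        return -math.log(p)

    def similarity(self, a, b):
        grams_a = set(ngrams(a, self.n))
        grams_b = set(ngrams(b, self.n))
        shared = sum(self.weight(g) for g in grams_a & grams_b)
        total = (sum(self.weight(g) for g in grams_a)
                 + sum(self.weight(g) for g in grams_b))
        if total == 0:
            # No informative n-grams on either side: compare literally.
            return 1.0 if a == b else 0.0
        # Dice coefficient over weights instead of raw n-gram counts.
        return 2 * shared / total
```

With a corpus dominated by one n-gram, two strings that share only that common n-gram score lower than two strings that share a rare one, which is the same ordering the `psim.similarity` calls above illustrate.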
Custom types

```python
.. code:: python
>>> from match.similarity import ProbabilisticDiceCoefficient
>>> corpus = ''.join(['cheddar', 'brie', 'guyere', 'mozzarella', 'parmesian', 'jack', 'colby'])
>>> cheese_sim = ProbabilisticDiceCoefficient(corpus)
>>> match.add_type('cheese', StringType(similarity_measure=cheese_sim))
>>> match.detect_type('colby jack')
(.8, 'cheese')
>>> from match.similarity import ProbabilisticDiceCoefficient
>>> corpus = ''.join(['cheddar', 'brie', 'guyere', 'mozzarella', 'parmesian', 'jack', 'colby'])
>>> cheese_sim = ProbabilisticDiceCoefficient(corpus)
>>> match.add_type('cheese', StringType(similarity_measure=cheese_sim))
>>> match.detect_type('colby jack')
(.8, 'cheese')
```
Credits
---------