Skip to content

slalit360/fuzzy-word-search

Repository files navigation

Write a HTTP service that provides an endpoint for fuzzy search / autocomplete of English words.

You are given a dataset that contains 333,333 English words and the frequency of their usage in some corpus. A very small sample is shown below:

track 112385243
australia 112197265
discussion 111973466
archive 111971865
once 111882023
others 111397714
entertainment 111394818
agreement 111356320
format 111279626

Let us assume we’re building a web app where the user types in a single word from this list in a search box. We wish to autocomplete the input in the search box.

Your objective is to write a Python app using Django framework that exposes a single endpoint:

GET /search?word=

where input is the (partial) word that the user has typed so far. For example, if the user is looking up procrastination, the service might receive this sequence of requests:

GET /search?word=pro

GET /search?word=procr

GET /search?word=procra

and so on.

The response should be a JSON array containing upto 25 results, ranked by some criteria (see below). Constraints

  1. Matches can occur anywhere in the string, not just at the beginning. For example, eryx should match archaeopteryx (among others).
  2. The ranking of results should satisfy the following:
  • [ 2.1 ] We assume that the user is typing the beginning of the word. Thus, matches at the start of a word should be ranked higher. For example, for the input pract, the result practical should be ranked higher than impractical.

  • [ 2.2 ] Common words (those with a higher usage count) should rank higher than rare words.

  • [ 2.3 ] Short words should rank higher than long words. For example, given the input environ, the result environment should rank higher than environmentalism.

  1. As a corollary to the above, an exact match should always be ranked as the first result. The search algorithm you develop should ideally incorporate some form of a weighted average of all qualifying parameters. The perfect weights, in production systems, are however derived through the use of ML algorithms.

Steps:

  1. install python and

    pip install virtualenv or pipenv
    
  2. cd to project dir and

    vitualenv .
    
  3. activate virtualenv

    .\Script\activate
    
  4. install django

    pip install Django
    
  5. move to django project

    cd WordSearchDjango 
    
  6. complete django code run django

    python manage.py runserver 8080
    
  7. install server

    pip install gunicorn
    
  8. django-heroku install

    pip install django-heroku
    
  9. save to requirements

    pip freeze > requirements.txt
    
  10. create heroku app after installing heroku cli

     heroku login
     heroku create fuzzy-word-search
    
  11. git setup >

     git init
     git add . or git add --all
     git commit -m "final upload"
     heroku git:remote -a fuzzy-word-search
    
  12. deploy and setup

     git push heroku master
     heroku run bash
     python manage.py migrate
     python manage.py createsuperuser
    
  13. visit app

     https://fuzzy-word-search.herokuapp.com/
    

project dir tree:-

+ Include
+ Lib
+ Scripts
- WordSearchDjango
    - Procfile
    - SearchApp/
        - templates/
            - index.html
        - admin.py
        - apps.py
        - fuzzy.py
        - views.py
        - models.py
        - tests.py
        - urls.py
    - WordSearchDjango/
        - settings.py
        - urls.py
        - __init__.py
    - db.sqlite3
    - manage.py
    - requirements.txt
    - word_search.tsv