
README

Thoughts
Used a primitive form of TDD with assert statements.
Integrated the code into the web UI later.
1 hour to get the filtered word count, but with duplicates.
1 hour to remove the duplicates, find the shortest and longest words, and limit the list to the top 10.
1 hour to get the web UI and the graph working.
Avoided reusing modules from the previous exercises since this one is quite different, and last time reuse cost an extra hour.

Simple Part:
- Use the Python website that you created in the week 2 assignment.
- Accept a URL as input.
- Read the page content of the URL.
- Split the content into words and find the number of occurrences of every word in the page.
- Do not consider the following words: a, an, the, on, in, for, and, to.
- Show a simple bar graph in the UI showing the top 10 words with their respective counts (X axis - words, Y axis - bar height as per the count of the word in the page). Do this using HTML5; add whatever transitions or effects you want to beautify it.
- Also find the longest and the shortest word in the page and show them in the UI (longest and shortest as per the number of characters in a word).
ONE BIG CONDITION - YOU CANNOT USE ANY LOOPING CONSTRUCTS LIKE FOR AND WHILE. (Hint: use map and lambda expressions.)
Try to use the map, filter, reduce functionality of Python. Explore the use of lambda, pure functions, generators, etc. (A rough sketch follows this list.)
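
A minimal loop-free sketch of the word counting described above, assuming the page is fetched with urllib; the function name word_counts and the regex used to split words are my own choices, not necessarily what wordstats.py actually does:

    import re
    import urllib.request
    from functools import reduce

    STOPWORDS = {"a", "an", "the", "on", "in", "for", "and", "to"}

    def word_counts(url):
        # fetch the page and pull out the words
        html = urllib.request.urlopen(url).read().decode("utf-8", errors="ignore")
        words = re.findall(r"[A-Za-z']+", html)
        lowered = map(str.lower, words)                       # normalise case
        kept = filter(lambda w: w not in STOPWORDS, lowered)  # drop excluded words
        # reduce builds the {word: count} dict without any explicit loop
        counts = reduce(lambda acc, w: {**acc, w: acc.get(w, 0) + 1}, kept, {})
        top10 = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:10]
        longest = max(counts, key=len)
        shortest = min(counts, key=len)
        return counts, top10, longest, shortest

The counting itself is done entirely with map, filter and reduce, so no for or while appears anywhere.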
Complex part 1: (Optional; if you complete the simple part and want to take up something more challenging)
- Store the word counts for the URL in a database, e.g. sqlite3.
- If the user enters more such URLs consecutively, do the same processing and store the word counts per site in the db.
- There will be a search textbox in the UI. The user can enter a word in the text box and click the "Search Url" button.
- On click of "Search Url", show the URLs from the db ordered by the number of occurrences of that word in each URL (see the sketch below).
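
A rough sketch of the sqlite3 storage and the "Search Url" query, reusing the counts dict from the sketch above; the table and column names here are assumptions, not the actual schema in model.py:

    import sqlite3

    def store_counts(conn, url, counts):
        conn.execute("CREATE TABLE IF NOT EXISTS wordcount "
                     "(url TEXT, word TEXT, count INTEGER)")
        conn.executemany("INSERT INTO wordcount VALUES (?, ?, ?)",
                         map(lambda kv: (url, kv[0], kv[1]), counts.items()))
        conn.commit()

    def search_urls(conn, word):
        # URLs ordered by how often the word occurs in each of them
        rows = conn.execute("SELECT url, count FROM wordcount "
                            "WHERE word = ? ORDER BY count DESC", (word,))
        return rows.fetchall()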
Yet more... Complex part 2:
Demonstrate how the MapReduce of the above example can be done in a distributed environment (a rough sketch follows).
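
As one way to demonstrate the idea, here is a rough sketch that splits the word list across worker processes, with multiprocessing standing in for real distributed nodes (e.g. a Hadoop or Spark cluster); all names are illustrative:

    from collections import Counter
    from functools import reduce
    from multiprocessing import Pool

    def map_phase(chunk):
        # each worker counts its own chunk of words independently
        return Counter(chunk)

    def reduce_phase(partials):
        # merge the per-worker counters into the global word count
        return reduce(lambda a, b: a + b, partials, Counter())

    def distributed_count(words, workers=4):
        size = max(1, len(words) // workers)
        chunks = [words[i:i + size] for i in range(0, len(words), size)]
        with Pool(workers) as pool:
            partials = pool.map(map_phase, chunks)
        return reduce_phase(partials)

The stop-word filtering from the simple part would be done in the map phase before counting.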