#Extracting college names and addresses from the UGC site
Author: Karambir Singh Nain
This repository includes a Python script that I wrote to extract college names and addresses from the main UGC site. It uses regular expressions and writes a file named colleges.txt containing every college name and address it finds. I was able to extract 7758 of the 8000 colleges in the list. Most of the ones I couldn't extract were bad data entries on UGC's site.
I wanted to practice regex a bit.
It can also be done with string find methods.
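
As a rough illustration of both approaches, here is a minimal sketch in Python 3. The markup in `SAMPLE_HTML`, the pattern, the function names, and the one-record-per-line output format are all assumptions made for this example. The real UGC pages and the actual script may look quite different.

```python
import re

# Hypothetical markup: the real UGC pages may be structured differently.
SAMPLE_HTML = """
<tr><td>1</td><td>Example College</td><td>12 Main Road, Sample Town</td></tr>
<tr><td>2</td><td>Another College</td><td>Station Road, Other City</td></tr>
"""

# One table row: serial number, college name, address (assumed layout).
ROW_PATTERN = re.compile(
    r"<tr>\s*<td>\d+</td>\s*"
    r"<td>(?P<name>[^<]+)</td>\s*"
    r"<td>(?P<address>[^<]+)</td>\s*</tr>"
)


def extract_with_regex(html):
    """Return (name, address) pairs using a regular expression."""
    return [
        (m.group("name").strip(), m.group("address").strip())
        for m in ROW_PATTERN.finditer(html)
    ]


def extract_with_find(html):
    """Return the same pairs using only str.find and slicing."""
    results = []
    pos = 0
    while True:
        row_start = html.find("<tr>", pos)
        if row_start == -1:
            break
        row_end = html.find("</tr>", row_start)
        if row_end == -1:
            break
        row = html[row_start:row_end]
        # Collect the text of every <td> cell in this row.
        cells = []
        cell_pos = 0
        while True:
            open_idx = row.find("<td>", cell_pos)
            if open_idx == -1:
                break
            close_idx = row.find("</td>", open_idx)
            if close_idx == -1:
                break
            cells.append(row[open_idx + len("<td>"):close_idx].strip())
            cell_pos = close_idx + len("</td>")
        # Keep only rows with the assumed three-cell layout.
        if len(cells) == 3:
            results.append((cells[1], cells[2]))
        pos = row_end + len("</tr>")
    return results


def write_colleges(records, path="colleges.txt"):
    """Write one 'name, address' line per college (assumed format)."""
    with open(path, "w", encoding="utf-8") as out:
        for name, address in records:
            out.write(f"{name}, {address}\n")
```

On well-formed rows both functions return the same pairs. The regex version is shorter, while the `str.find` version makes each step explicit, and both skip rows that don't fit the assumed layout.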
##Requirements:
- urllib2 - for downloading HTML files from the UGC website.
- re - regular expressions module.
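
Note that urllib2 exists only in Python 2. In Python 3 the same functionality lives in urllib.request. A minimal download sketch for Python 3 might look like the following. The function name `fetch_html`, the timeout, and the UTF-8 default are assumptions for this example, and the URL of the page to download is left for the caller to supply.

```python
import urllib.request


def fetch_html(url, encoding="utf-8"):
    """Download a page and return its body as text.

    The encoding is an assumption; undecodable bytes are replaced
    rather than raising an error.
    """
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read().decode(encoding, errors="replace")
```

The returned string can then be passed to whatever extraction step is used on the page.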
If you have any queries, send a pull request.