karambir/ugc-colleges

#Extracting college names and addresses from the UGC site

Author: Karambir Singh Nain

This repository includes a Python script that I wrote to extract college names from the main UGC site. It uses regular expressions. It outputs a file named colleges.txt containing every college name and address it finds. I was able to extract 7758 of the 8000 colleges in the list. Most of the ones I couldn't extract were bad data entries on UGC's site.
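To illustrate the idea, here is a minimal sketch of regex-based extraction. It is not the script from this repository: the table markup in `SAMPLE_HTML`, the pattern, the one-line-per-college output format, and the helper names `extract_colleges` and `write_colleges` are all assumptions made for the example, and the real UGC pages may be laid out differently.

```python
import re

# Hypothetical markup: the real UGC pages may be structured differently.
SAMPLE_HTML = """
<table>
  <tr><td>1</td><td>ABC College of Arts</td><td>Main Road, Rohtak, Haryana</td></tr>
  <tr><td>2</td><td>XYZ Institute of Science</td><td>Station Road, Pune, Maharashtra</td></tr>
  <tr><td>3</td><td>Broken entry</td></tr>
</table>
"""

# One row = serial number, college name, address (assumed layout).
ROW_PATTERN = re.compile(
    r"<tr>\s*<td>\d+</td>\s*"
    r"<td>(?P<name>[^<]+)</td>\s*"
    r"<td>(?P<address>[^<]+)</td>\s*</tr>",
    re.IGNORECASE,
)


def extract_colleges(html):
    """Return a list of (name, address) tuples for rows that match the pattern."""
    return [
        (m.group("name").strip(), m.group("address").strip())
        for m in ROW_PATTERN.finditer(html)
    ]


def write_colleges(colleges, path="colleges.txt"):
    """Write one 'name, address' line per college (assumed output format)."""
    with open(path, "w", encoding="utf-8") as f:
        for name, address in colleges:
            f.write(f"{name}, {address}\n")
```

Rows that do not fit the pattern, such as the third sample row with its missing address cell, are skipped. That is the same way badly entered records would fall out of a regex-based extraction.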

I wanted to practice regex a bit.

The same extraction can also be done with string find methods.
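As a rough sketch of that alternative, the functions below pull the same fields out using only `str.find` and slicing. The row layout (serial number, name, address in plain `<td>` cells) and the function names are hypothetical, chosen for illustration only.

```python
def extract_cells(row_html):
    """Return the text of each <td>...</td> cell, using str.find only."""
    cells = []
    pos = 0
    while True:
        start = row_html.find("<td>", pos)
        if start == -1:
            break
        start += len("<td>")
        end = row_html.find("</td>", start)
        if end == -1:
            break
        cells.append(row_html[start:end].strip())
        pos = end + len("</td>")
    return cells


def extract_colleges_with_find(html):
    """Return (name, address) tuples for rows that have exactly three cells."""
    colleges = []
    pos = 0
    while True:
        row_start = html.find("<tr>", pos)
        if row_start == -1:
            break
        row_end = html.find("</tr>", row_start)
        if row_end == -1:
            break
        cells = extract_cells(html[row_start:row_end])
        # Assumed layout: serial number, college name, address.
        if len(cells) == 3:
            colleges.append((cells[1], cells[2]))
        pos = row_end + len("</tr>")
    return colleges
```

This version is more verbose than a regex, but each step is explicit, which can make malformed rows easier to debug.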

##Requirements:

  1. urllib2 - for downloading HTML files from the UGC website.

  2. re - the regular expressions module.
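Both modules ship with Python, so nothing needs to be installed. `urllib2` exists only in Python 2; on Python 3 the equivalent is `urllib.request`. The sketch below uses the Python 3 module and shows how the two pieces might fit together. The function names are hypothetical, and no real UGC URL is assumed.

```python
import re
import urllib.request


def download_html(url, encoding="utf-8", timeout=30):
    """Download a page and return its body as text.

    Undecodable bytes are replaced rather than raising, since scraped
    pages do not always match their declared encoding.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode(encoding, errors="replace")


def cell_texts(html):
    """Return the text of every simple <td>...</td> cell (assumed markup)."""
    return [text.strip() for text in re.findall(r"<td>([^<]+)</td>", html)]


# Example usage (placeholder URL, not the actual UGC address):
# html = download_html("https://example.org/colleges.html")
# print(cell_texts(html))
```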

If you have any queries or suggestions, send a pull request.
