Removes HTML tags from a column in a .csv file

About:

The Python script cleans the chosen column in 2 different ways and returns a file with 4 additional columns:

A regex-cleaned column. Anything between "<" and ">" is removed, as is anything between "&" and ";" with 4 or 5 characters in between, and "\*" is replaced with a whitespace character. Note: special characters (e.g. &rpos;) are simply removed, not converted.
A BeautifulSoup column, produced by HTML-to-text conversion. This removes HTML tags and converts special characters into the characters they represent.
2 parity columns, one per cleaned column, giving the difference in the number of characters between the newly generated column and the original column. (This is basically a flag you can check to see whether too many characters have been removed.)
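
The regex cleaning and the parity count described above can be sketched roughly as follows. This is a minimal illustration and not the code in remove_html.py: the patterns, the function names `regex_clean` and `parity`, and the sample string are all assumptions made for the example.

```python
import re

# Assumed patterns, written from the description above (not taken from the script).
TAG_RE = re.compile(r"<[^>]*>")          # anything between "<" and ">"
ENTITY_RE = re.compile(r"&[#\w]{4,5};")  # "&", then 4 or 5 characters, then ";"


def regex_clean(text: str) -> str:
    """Remove tags and entities, then replace "*" with a space."""
    text = TAG_RE.sub("", text)
    text = ENTITY_RE.sub("", text)
    return text.replace("*", " ")


def parity(original: str, cleaned: str) -> int:
    """Number of characters that cleaning removed from the original."""
    return len(original) - len(cleaned)


# Hypothetical cell value, for illustration only.
sample = "<p>Hello&nbsp;<b>world</b>*</p>"
cleaned = regex_clean(sample)
print(repr(cleaned), parity(sample, cleaned))
```

Note that an entity with fewer than 4 characters between "&" and ";" (such as `&amp;`) is left in place by this pattern, which matches the 4-or-5-character rule above. The BeautifulSoup column could be produced in the same way with `BeautifulSoup(text, "html.parser").get_text()`, which decodes entities instead of dropping them.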

You need to install these modules:

Place remove_html.py in the same directory as the CSV file.
Open a terminal at the file location. On Windows: press Win+R, type cmd, press Enter, then run cd <path to file>
Type: python remove_html.py and hit Enter.
Follow the on-screen instructions.
You are done.
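
For readers who want to see what such a run does to the CSV, here is a rough sketch of the read, clean, and write loop. It is an assumption-laden illustration, not the actual script: the function `add_clean_column`, the `_clean` and `_parity` column suffixes, and the sample data are hypothetical, and only a simple tag-stripping regex is applied.

```python
import csv
import io
import re


def clean(text: str) -> str:
    """Strip anything between "<" and ">" (simplified cleaning step)."""
    return re.sub(r"<[^>]*>", "", text)


def add_clean_column(src, dst, column: str) -> None:
    """Copy CSV rows from src to dst, appending a cleaned and a parity column."""
    reader = csv.DictReader(src)
    fields = list(reader.fieldnames) + [column + "_clean", column + "_parity"]
    writer = csv.DictWriter(dst, fieldnames=fields)
    writer.writeheader()
    for row in reader:
        original = row[column] or ""
        cleaned = clean(original)
        row[column + "_clean"] = cleaned
        row[column + "_parity"] = len(original) - len(cleaned)
        writer.writerow(row)


# In-memory stand-ins for the input and output files (hypothetical data).
# With real files, open both with newline="" as the csv module recommends.
src = io.StringIO("id,body\n1,<p>Hi</p>\n")
dst = io.StringIO()
add_clean_column(src, dst, "body")
print(dst.getvalue())
```

The original column is kept untouched so that the parity value can always be checked against it.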
