Data Cleaning of 250 Resumes
Completed as part of an assessment for FIT5196 - Data Wrangling, in which 250 resumes are provided and must be converted into numerical representations suitable as input to recommender-system / information-retrieval algorithms.
Performed text preprocessing (sentence segmentation, case normalisation, word tokenisation, stopword removal, stemming, and bigram generation) using the Python NLTK library; a sketch of these steps follows.
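A minimal sketch of the preprocessing pipeline described above, assuming the Porter stemmer, NLTK's English stopword list, and a simple regexp token pattern; the assessment's exact choices of stemmer, stopword list, and tokeniser may differ.

```python
import nltk
from nltk.tokenize import sent_tokenize, RegexpTokenizer
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.util import ngrams

nltk.download("punkt", quiet=True)
nltk.download("stopwords", quiet=True)

def preprocess(resume_text):
    """Turn one raw resume into a list of stemmed unigram and bigram tokens."""
    stemmer = PorterStemmer()
    stop_words = set(stopwords.words("english"))
    tokenizer = RegexpTokenizer(r"[A-Za-z]\w+")  # assumed token pattern

    tokens = []
    for sentence in sent_tokenize(resume_text):           # sentence segmentation
        words = tokenizer.tokenize(sentence.lower())       # case normalisation + word tokenisation
        words = [w for w in words if w not in stop_words]  # stopword removal
        stems = [stemmer.stem(w) for w in words]           # stemming
        tokens.extend(stems)
        tokens.extend("_".join(bg) for bg in ngrams(stems, 2))  # bigram generation
    return tokens

# Example:
# preprocess("Developed ETL pipelines in Python. Led a team of data analysts.")
```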
Generated sparse vector representations of the pre-processed resumes in Python for downstream use in recommender-system / information-retrieval algorithms; a sketch of the vectorisation step is shown below.
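A minimal sketch of the sparse (bag-of-words) representation, assuming token lists produced by a preprocessing step like `preprocess()` above; the resume identifiers and the `{token_index: count}` output format are illustrative, not the assessment's prescribed specification.

```python
from collections import Counter

def build_vocab(tokenised_resumes):
    """Map each distinct token across all resumes to an integer index."""
    vocab = sorted({tok for tokens in tokenised_resumes.values() for tok in tokens})
    return {tok: idx for idx, tok in enumerate(vocab)}

def sparse_count_vectors(tokenised_resumes, vocab):
    """Return {resume_id: {token_index: count}}, keeping only non-zero entries."""
    vectors = {}
    for resume_id, tokens in tokenised_resumes.items():
        counts = Counter(tokens)
        vectors[resume_id] = {vocab[tok]: n for tok, n in counts.items()}
    return vectors

# Example (hypothetical file name):
# tokenised = {"resume_001": preprocess(open("resume_001.txt").read())}
# vocab = build_vocab(tokenised)
# vectors = sparse_count_vectors(tokenised, vocab)
```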