
kahwangt/resume-data-preprocessing


Resume data preprocessing

Data cleaning of 250 resumes.

Completed as part of an assessment for FIT5196 - Data Wrangling, in which 250 resumes are provided and must be converted into numerical representations suitable as input to recommender-system and information-retrieval algorithms.

Performed text preprocessing with the Python nltk library: sentence segmentation, case normalisation, word tokenisation, stopword removal, stemming and bigram generation.
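A minimal sketch of these steps using nltk is given below. It is an illustration under stated assumptions, not the repository's code: the stopword set is a tiny hypothetical list (nltk's own `stopwords` corpus needs a separate download), every token is lowercased, Porter stemming is used, and bigrams are built from adjacent stems within each sentence and joined with `_`. The function name `preprocess` is likewise hypothetical.

```python
from nltk.stem import PorterStemmer
from nltk.tokenize import RegexpTokenizer
from nltk.tokenize.punkt import PunktSentenceTokenizer
from nltk.util import bigrams

# Hypothetical stopword list (assumption): nltk.corpus.stopwords requires a
# corpus download, so a small illustrative set is used instead.
STOPWORDS = {"a", "an", "and", "for", "have", "i", "in", "of", "the", "to", "with"}

sentence_splitter = PunktSentenceTokenizer()  # untrained Punkt, default parameters
word_tokenizer = RegexpTokenizer(r"[A-Za-z]+(?:[-'][A-Za-z]+)?")  # alphabetic words
stemmer = PorterStemmer()


def preprocess(resume_text):
    """Return stemmed unigrams followed by their within-sentence bigrams."""
    tokens = []
    for sentence in sentence_splitter.tokenize(resume_text):
        # Case normalisation (assumption: lowercase every token).
        words = [w.lower() for w in word_tokenizer.tokenize(sentence)]
        # Stopword removal, then stemming.
        stems = [stemmer.stem(w) for w in words if w not in STOPWORDS]
        tokens.extend(stems)
        # Bigrams are formed per sentence so none straddle a sentence boundary.
        tokens.extend(f"{a}_{b}" for a, b in bigrams(stems))
    return tokens


print(preprocess("I have experience in data analysis. Skilled in Python programming."))
```

Generating bigrams after stopword removal means two content words separated by a stopword become a bigram; building them before filtering, or keeping only statistically strong collocations, are equally valid choices.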

After preprocessing, generated sparse vector representations of the resumes in Python for use as input to recommender-system and information-retrieval algorithms.
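One common way to turn the token lists into sparse vectors is to build a shared vocabulary and then store, for each resume, only the non-zero counts. The sketch below assumes raw term counts as the weighting; the resume IDs and the helper names `build_vocab` and `to_sparse` are hypothetical, and the repository's actual weighting and output format may differ.

```python
from collections import Counter


def build_vocab(docs):
    """Map every distinct token across all documents to a column index (sorted order)."""
    all_tokens = {tok for toks in docs.values() for tok in toks}
    return {tok: i for i, tok in enumerate(sorted(all_tokens))}


def to_sparse(docs, vocab):
    """Represent each document as {column_index: count}, keeping non-zero entries only."""
    return {
        doc_id: dict(sorted((vocab[tok], n) for tok, n in Counter(toks).items()))
        for doc_id, toks in docs.items()
    }


# Hypothetical preprocessed resumes (token lists such as a preprocessing step would emit).
docs = {"resume_001": ["data", "python", "data"], "resume_002": ["java", "python"]}
vocab = build_vocab(docs)
print(to_sparse(docs, vocab))
# {'resume_001': {0: 2, 2: 1}, 'resume_002': {1: 1, 2: 1}}
```

Storing only non-zero entries keeps each vector small, since a single resume uses a tiny fraction of the shared vocabulary.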
