resume data preprocessing

data cleaning of 250 resumes

Completed as part of an assessment for FIT5196 - Data Wrangling. Given 250 resumes, the task is to convert them into numerical representations suitable as input to recommender-system and information-retrieval algorithms.

Performed text preprocessing with the Python nltk library: sentence segmentation, case normalisation, word tokenisation, stopword removal, stemming and bigram generation.
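The order of these steps can be sketched as below. This is a minimal illustration, not the notebook's code: it uses standard-library stand-ins so that it runs without nltk or its data downloads. The names `STOPWORDS`, `split_sentences`, `preprocess` and `sentence_bigrams` are hypothetical, the stopword list is deliberately tiny, and the stemmer is a pluggable argument that defaults to doing nothing.

```python
import re
from typing import Callable, List, Tuple

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch);
# a real run would use a full list such as the one shipped with nltk.
STOPWORDS = {"a", "an", "and", "the", "in", "of", "to", "with", "is", "for"}


def split_sentences(text: str) -> List[str]:
    """Naive sentence segmentation: split after '.', '!' or '?' plus whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def preprocess(text: str, stem: Callable[[str], str] = lambda t: t) -> List[List[str]]:
    """Return one list of cleaned tokens per sentence.

    `stem` is any function mapping a token to its stem; the default leaves
    tokens unchanged.
    """
    sentences = []
    for sentence in split_sentences(text):
        # Case normalisation, then tokenisation into alphabetic words
        # (internal hyphens are kept, e.g. "machine-learning").
        words = re.findall(r"[a-z]+(?:-[a-z]+)*", sentence.lower())
        # Stopword removal followed by stemming.
        words = [stem(w) for w in words if w not in STOPWORDS]
        if words:
            sentences.append(words)
    return sentences


def sentence_bigrams(sentences: List[List[str]]) -> List[Tuple[str, str]]:
    """Adjacent token pairs, never crossing a sentence boundary."""
    pairs = []
    for tokens in sentences:
        pairs.extend(zip(tokens, tokens[1:]))
    return pairs


sample = "Experienced in Python. Built data pipelines with Spark!"
tokens_per_sentence = preprocess(sample)
print(tokens_per_sentence)
print(sentence_bigrams(tokens_per_sentence))
```

With nltk installed, `nltk.stem.PorterStemmer().stem` can be passed as the `stem` argument. Whether bigrams should be formed within sentences or across the whole document is a design choice that the description above does not settle; this sketch keeps them within sentences.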

Generated sparse vector representations of the preprocessed resumes in Python, for later use in recommender-system and information-retrieval algorithms.
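One way to picture a sparse count vector is as a list of (vocabulary index, count) pairs that records only the terms a document actually contains. The sketch below illustrates that idea with the standard library. The function names `build_vocab` and `to_sparse` are hypothetical, and the layout shown is an assumption; it is not claimed to match the format of the repository's `_vocab.txt` and `_countVec.txt` files.

```python
from collections import Counter
from typing import Dict, List, Tuple


def build_vocab(docs: List[List[str]]) -> Dict[str, int]:
    """Map every distinct token to an integer index, in alphabetical order."""
    vocab = sorted({tok for doc in docs for tok in doc})
    return {tok: i for i, tok in enumerate(vocab)}


def to_sparse(doc: List[str], vocab: Dict[str, int]) -> List[Tuple[int, int]]:
    """Return (index, count) pairs for the tokens present in `doc`.

    Tokens missing from the vocabulary are ignored, and zero counts are
    never stored, which is what makes the representation sparse.
    """
    counts = Counter(tok for tok in doc if tok in vocab)
    return sorted((vocab[tok], n) for tok, n in counts.items())


# Two toy "resumes", already tokenised (made-up data for illustration).
docs = [["python", "data", "python"], ["data", "sql"]]
vocab = build_vocab(docs)
for doc in docs:
    print(to_sparse(doc, vocab))
```

Storing only non-zero counts keeps each vector short even when the vocabulary built from all documents is large, since any single resume uses a small fraction of it.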

Files in the repository:

.gitattributes
29442826_countVec.txt
29442826_vocab.txt
README.md
task2_29442826_1.ipynb

kahwangt/resume-data-preprocessing
