This mini project implements the Term Frequency-Inverse Document Frequency (TF-IDF) algorithm, together with the Natural Language Toolkit (NLTK), to classify resumes based on certain parameters.
This project is divided into two parts. First, I have designed a TF-IDF (Term Frequency-Inverse Document Frequency) implementation that works on the bag-of-words principle: I count the frequencies of the unique words and of the most frequently occurring words. From these counts I compute the logarithmic weight, which gives a score for each unique and frequently occurring word/token within the document. Second, I have applied this TF-IDF implementation to other resume sets and categorised the different factors that affect resume classification/selection. I have used stop-word removal and word tokenization to find the characteristics and parameters used for classification.
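As a rough illustration of the first part, here is a minimal sketch of a TF-IDF calculation. It is not the notebook's exact code: it assumes the common `tf * log(N / df)` weighting, and the names `STOPWORDS`, `tokenize` and `tf_idf`, the tiny stop-word set, and the sample resume strings are all made up for this example. A plain regular expression stands in for NLTK's tokenizer and stop-word list so that the snippet runs without any downloads.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word set (an assumption for this sketch);
# a real run would use a much fuller list, such as NLTK's.
STOPWORDS = {"a", "an", "and", "the", "in", "of", "with", "to", "for", "is"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def tf_idf(documents):
    """Return one {term: score} dict per document, using tf * log(N / df)."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # guard against empty documents
        scores.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


# Hypothetical resume snippets, only to show the call.
resumes = [
    "Python developer with experience in machine learning",
    "Java developer with experience in web services",
    "Data analyst skilled in Python and statistics",
]

for index, doc_scores in enumerate(tf_idf(resumes)):
    top_terms = sorted(doc_scores.items(), key=lambda kv: kv[1], reverse=True)[:3]
    print(index, top_terms)
```

With this weighting, a word that appears in every document gets a score of zero, while words that are specific to one resume score highest, which is what makes them useful as classification parameters.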
The datasets are included in the repository files in .txt format.
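If you want to read the .txt datasets outside the notebook, something like the following sketch should work. The helper name `load_resumes` is hypothetical, and the UTF-8 encoding is an assumption about the files, so adjust both to match your copy of the data.

```python
from pathlib import Path


def load_resumes(folder):
    """Read every .txt file in `folder` and return {file name: text}."""
    resumes = {}
    for path in sorted(Path(folder).glob("*.txt")):
        # errors="ignore" drops bytes that are not valid UTF-8
        # (an assumption; the real files may use another encoding).
        resumes[path.name] = path.read_text(encoding="utf-8", errors="ignore")
    return resumes
```

Call it with the folder that holds the dataset files, for example `load_resumes(".")` for the current directory.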
To use this repo, download the repository and open it in Jupyter Notebook. Start creating something awesome! Good luck!
Prerequisites:
- Python3
- Jupyter Notebook
- Matplotlib
- Pandas
- NLTK
- Other dependencies
N.B.: If you like my work, please show your appreciation by giving the repository a star. It motivates me to work on more problems.