This project runs the following Natural Language Processing (NLP) steps on a PDF document:
- PDF Reading
- Text Extraction
- Lowercasing
- Remove Numbers using Regex
- Remove Special Symbols
- Remove Extra Spaces
- Tokenization
- Stopword Removal
- Stemming
- Lemmatization
- One Hot Encoding
- TF-IDF
- Plotly Visualization
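
The cleaning steps in the list above (lowercasing through stopword removal) can be sketched with the standard library alone. This is a minimal illustration, not the notebook's actual code: the function names, the sample sentence, and the tiny `STOPWORDS` set are assumptions made for the example, and the notebook presumably relies on nltk or spacy for tokenization, stopwords, stemming, and lemmatization.

```python
import re

# Tiny illustrative stopword set (an assumption for this sketch); a full
# list would normally come from a library such as nltk.
STOPWORDS = {"the", "is", "a", "an", "of", "and", "in", "to"}


def clean_text(text):
    """Lowercase the text, then strip numbers, symbols, and extra spaces."""
    text = text.lower()                        # lowercasing
    text = re.sub(r"\d+", "", text)            # remove numbers using regex
    text = re.sub(r"[^a-z\s]", "", text)       # remove special symbols
    text = re.sub(r"\s+", " ", text).strip()   # remove extra spaces
    return text


def tokenize(text):
    """Split already-cleaned text on whitespace."""
    return text.split()


def remove_stopwords(tokens):
    """Drop tokens that appear in the stopword set."""
    return [t for t in tokens if t not in STOPWORDS]


sample = "Chapter 1: The way of the program -- in 2015!"
cleaned = clean_text(sample)
tokens = remove_stopwords(tokenize(cleaned))

print(cleaned)  # chapter the way of the program in
print(tokens)   # ['chapter', 'way', 'program']
```

The order matters here: lowercasing first lets the symbol filter keep only `a-z` and whitespace, and collapsing spaces last cleans up the gaps left by the earlier removals.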
Required libraries:
- PyPDF2
- nltk
- spacy
- scikit-learn
- pandas
- plotly
Input document (Think Python PDF): https://greenteapress.com/thinkpython2/thinkpython2.pdf
How to run:
- Install the required libraries
- Open the notebook
- Run all cells
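
Before running all cells, a quick check like the one below can confirm that the libraries are importable. This is a sketch under assumptions: the `REQUIRED` mapping and the `missing_packages` helper are made up for the example, and the mapping assumes the usual import names (for instance, scikit-learn is imported as `sklearn`). NLTK corpora and spaCy models are downloaded separately from the pip packages, so this check does not cover them.

```python
from importlib.util import find_spec

# pip package name -> import name (assumed from common usage)
REQUIRED = {
    "PyPDF2": "PyPDF2",
    "nltk": "nltk",
    "spacy": "spacy",
    "scikit-learn": "sklearn",
    "pandas": "pandas",
    "plotly": "plotly",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose module cannot be found."""
    return [name for name, module in required.items() if find_spec(module) is None]


missing = missing_packages()
if missing:
    print("Install first: pip install " + " ".join(missing))
else:
    print("All required libraries are importable.")
```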
Author: Sumiya Riaz