
sumiyariaz6/NLP-PDF-Assignment

NLP PDF Assignment

Objective

This project applies Natural Language Processing (NLP) preprocessing and feature-extraction tasks to the text of a PDF document.

Tasks Performed

  • PDF Reading
  • Text Extraction
  • Lowercasing
  • Number Removal (Regex)
  • Special Symbol Removal
  • Extra Whitespace Removal
  • Tokenization
  • Stopword Removal
  • Stemming
  • Lemmatization
  • One Hot Encoding
  • TF-IDF
  • Plotly Visualization
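
The cleaning, tokenization, and stopword steps in the list above can be sketched as a small pipeline. The listed libraries suggest the notebook uses nltk for tokenization and stopwords, but its exact calls are not shown here. This version uses only Python's `re` module and a small hand-picked stopword set (`STOPWORDS`, hypothetical) so it runs without downloading any NLTK data.

```python
import re

# Hypothetical, deliberately tiny stopword set; nltk's English list is much longer.
STOPWORDS = {"a", "an", "the", "is", "are", "and", "or", "of", "to", "in", "it", "this"}


def clean_text(text: str) -> str:
    """Lowercase, then strip digits, special symbols, and extra whitespace."""
    text = text.lower()                       # Lowercasing
    text = re.sub(r"\d+", " ", text)          # Remove numbers with a regex
    text = re.sub(r"[^a-z\s]", " ", text)     # Remove symbols: keep ASCII letters and whitespace
    text = re.sub(r"\s+", " ", text).strip()  # Collapse runs of whitespace into one space
    return text


def tokenize(text: str) -> list[str]:
    """Split on spaces; enough here because clean_text leaves only letters and single spaces."""
    return text.split()


def remove_stopwords(tokens: list[str]) -> list[str]:
    """Drop any token found in STOPWORDS."""
    return [t for t in tokens if t not in STOPWORDS]


raw = "Chapter 1: The Way of the Program!  Python is a   programming language (version 3)."
tokens = remove_stopwords(tokenize(clean_text(raw)))
print(tokens)
```

Order matters: lowercasing first lets the symbol regex use a lowercase-only character class, and whitespace is collapsed last because the earlier substitutions leave extra spaces behind.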
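
One Hot Encoding, as listed above, represents each token as a vector with a single 1 at that token's position in the vocabulary. The notebook may rely on pandas or scikit-learn for this (which one is not stated), so the following is only a minimal pure-Python sketch; `one_hot_encode` is a hypothetical name.

```python
def one_hot_encode(tokens: list[str]) -> tuple[list[str], list[list[int]]]:
    """Return the sorted vocabulary and one binary vector per input token."""
    vocab = sorted(set(tokens))
    index = {word: i for i, word in enumerate(vocab)}  # word -> column position
    vectors: list[list[int]] = []
    for token in tokens:
        row = [0] * len(vocab)
        row[index[token]] = 1  # exactly one 1, at this token's column
        vectors.append(row)
    return vocab, vectors


vocab, vectors = one_hot_encode(["python", "program", "python"])
print(vocab)    # ['program', 'python']
print(vectors)  # [[0, 1], [1, 0], [0, 1]]
```

Vector length equals vocabulary size, which is why one-hot vectors for a whole book become very wide and sparse.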
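
The TF-IDF step most likely uses scikit-learn (it appears under Libraries Used), but the notebook's settings are not given. The pure-Python sketch below reproduces the formula that scikit-learn's `TfidfVectorizer` applies with its defaults (raw term counts, smoothed IDF, L2-normalized rows) so each weight can be traced by hand; the `tfidf` function and the sample documents are illustrative.

```python
import math
from collections import Counter


def tfidf(docs: list[list[str]]) -> list[dict[str, float]]:
    """TF-IDF with smoothed IDF and L2 row normalization.

    idf(t) = ln((1 + n) / (1 + df(t))) + 1, matching scikit-learn's
    TfidfVectorizer defaults (smooth_idf=True, norm="l2").
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))  # documents containing each term
    idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}

    rows = []
    for doc in docs:
        counts = Counter(doc)  # raw counts serve as term frequency
        weights = {t: c * idf[t] for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        rows.append({t: w / norm for t, w in weights.items()} if norm else {})
    return rows


docs = [["python", "program"], ["python", "function", "function"]]
for row in tfidf(docs):
    print({t: round(w, 3) for t, w in row.items()})
```

Because "python" occurs in every document, its IDF is the minimum value of 1, so it ends up with a lower weight than the terms unique to each document.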
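
For the Plotly Visualization step, one common chart is a bar plot of the highest-weighted terms. This sketch assumes `scores` maps each term to a TF-IDF weight (for example, one row of a TF-IDF matrix); `top_terms` and `plot_top_terms` are hypothetical helpers, not names taken from the project.

```python
def top_terms(scores: dict[str, float], k: int = 10) -> list[tuple[str, float]]:
    """Return the k highest-scoring terms, highest first (ties broken alphabetically)."""
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:k]


def plot_top_terms(scores: dict[str, float], k: int = 10):
    """Build a Plotly bar chart of the top-k terms; call .show() on the result."""
    import plotly.express as px  # imported lazily so top_terms works without plotly installed

    pairs = top_terms(scores, k)
    terms = [term for term, _ in pairs]
    values = [value for _, value in pairs]
    return px.bar(
        x=terms,
        y=values,
        labels={"x": "term", "y": "TF-IDF score"},
        title=f"Top {k} terms by TF-IDF",
    )
```

With plotly installed, `plot_top_terms({"python": 0.6, "program": 0.8}).show()` renders the chart (the scores here are made-up sample values).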

Libraries Used

  • PyPDF2
  • nltk
  • spacy
  • scikit-learn
  • pandas
  • plotly

PDF Source

Think Python PDF: https://greenteapress.com/thinkpython2/thinkpython2.pdf
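
PDF Reading and Text Extraction can be sketched with PyPDF2's `PdfReader` (PyPDF2 2.x or later; older releases named the class `PdfFileReader`). The function name `extract_pdf_text` is hypothetical, and the file must first be downloaded from the URL above.

```python
def extract_pdf_text(path: str) -> str:
    """Concatenate the extracted text of every page, one page per line block."""
    from PyPDF2 import PdfReader  # lazy import: PyPDF2 is only needed when a PDF is read

    reader = PdfReader(path)
    # `or ""` is a harmless guard in case a page yields no text (e.g. image-only pages).
    return "\n".join(page.extract_text() or "" for page in reader.pages)
```

For example, `text = extract_pdf_text("thinkpython2.pdf")`, where `thinkpython2.pdf` is an assumed local filename for the downloaded book.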

How to Run

  1. Install the libraries listed under Libraries Used (for example with pip)
  2. Open the notebook
  3. Run all cells
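
Before running the cells, a quick check like the one below (a sketch; the list mirrors Libraries Used) reports which packages are still missing. Note that scikit-learn is imported as `sklearn`, and that NLTK data (such as stopword lists) and spaCy language models are installed separately, so this check does not detect them.

```python
import importlib.util

# Import names for the libraries listed above; scikit-learn's import name is "sklearn".
REQUIRED = ["PyPDF2", "nltk", "spacy", "sklearn", "pandas", "plotly"]


def missing_modules(names: list[str]) -> list[str]:
    """Return the names that cannot be found by the current Python interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED)
print("Missing:", missing if missing else "none")
```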

Author

Sumiya Riaz

About

NLP assignment on PDF text preprocessing, feature extraction, TF-IDF, and Plotly visualization using Python.
