
edumagalhaes10/tp_spln

Python pytesseract gramformer nltk streamlit License

Welcome to HydraText,

A Natural Language Processing tool developed in the context of a Software Engineering Master's course: Scripting in Natural Language Processing.

The tool provides the functionalities described below and is designed to be accessible to anyone, with or without a background in Software Engineering or Natural Language Processing.

Have fun using our tool!!! We hope it's useful!!! 😁

⚙️ Installation

To install and use HydraText, follow the steps below:

  • Clone this repository
  • Run either flit install or pip install -e . in the project directory

Note: After installing the tool, you may also run the hydratxt_post_install command.

How to use: Run the command hydratxt in your command prompt.

📸 Extract text from images and PDFs

Provide an image of any type and the text will be extracted from it. The default language is English, but other languages are available: in the image section, select the language you want to use and download it.

Note: This feature uses the Tesseract OCR engine, which you need to install first. See installation instructions.
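As a rough illustration of how the image step can be driven from Python, here is a minimal sketch that assumes pytesseract and Pillow are installed alongside Tesseract. The names LANGUAGE_CODES, tesseract_lang and extract_text are hypothetical and are not taken from HydraText's source; only pytesseract.image_to_string and its lang parameter are real library API.

```python
from pathlib import Path

# Hypothetical mapping from display names to Tesseract language codes.
LANGUAGE_CODES = {"English": "eng", "Portuguese": "por", "Spanish": "spa"}


def tesseract_lang(*names: str) -> str:
    """Build the `lang` argument Tesseract expects, e.g. "eng+por"."""
    return "+".join(LANGUAGE_CODES[name] for name in names)


def extract_text(image_path: str, *languages: str) -> str:
    """Run OCR on one image; falls back to English when no language is given."""
    # Imported lazily so tesseract_lang works without the OCR stack installed.
    from PIL import Image
    import pytesseract

    lang = tesseract_lang(*(languages or ("English",)))
    with Image.open(Path(image_path)) as image:
        return pytesseract.image_to_string(image, lang=lang)
```

Calling extract_text("scan.png", "Portuguese") would only work once the matching Tesseract language data has been downloaded, which is presumably what the language download in the image section takes care of.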

You can also extract text from a PDF file. Just upload it and the text will be extracted.

✅ Correct grammar

Provide a text, or use the text extracted from an image or PDF, and the grammar will be corrected. The corrected text can be seen in two ways:

  • Highlights: The words that were removed will be highlighted in red and the words that were added will be highlighted in green.

  • Raw: The corrected text with no highlights.
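The two views above can be sketched with a word-level diff between the original and the corrected text. This is only an illustration built on the standard library's difflib, not necessarily how HydraText computes its highlights; diff_words and render are hypothetical names, and the bracket markers stand in for the red and green colors.

```python
import difflib


def diff_words(original: str, corrected: str) -> list[tuple[str, str]]:
    """Return (tag, word) pairs where tag is "removed", "added", or "kept"."""
    before, after = original.split(), corrected.split()
    matcher = difflib.SequenceMatcher(a=before, b=after, autojunk=False)
    result = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            result += [("kept", w) for w in before[i1:i2]]
        else:
            # "replace" yields both removed and added words;
            # "delete" and "insert" yield only one of the two.
            result += [("removed", w) for w in before[i1:i2]]
            result += [("added", w) for w in after[j1:j2]]
    return result


def render(pairs: list[tuple[str, str]]) -> str:
    """Mark removals as [-word-] and additions as {+word+}."""
    marks = {"removed": "[-{}-]", "added": "{{+{}+}}", "kept": "{}"}
    return " ".join(marks[tag].format(word) for tag, word in pairs)


print(render(diff_words("He go to school", "He goes to school")))
# He [-go-] {+goes+} to school
```

The Raw view is simply the corrected string itself, so only the Highlights view needs the diff.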

📄 Summarize text

Provide a text, or use the text extracted from an image or PDF or the grammar-corrected text, and the tool will produce a summary of it. You can choose the length of the summary: the default is 50% of the original text, and it can be changed to 25% or 75%.
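To make the length setting concrete, here is a minimal sketch of an extractive summarizer that keeps a fraction of the sentences. It is a stand-in technique (word-frequency scoring), not HydraText's actual summarization method, and it assumes the percentage applies to the number of sentences; the function name summarize is hypothetical.

```python
import math
import re
from collections import Counter


def summarize(text: str, ratio: float = 0.5) -> str:
    """Keep roughly `ratio` of the sentences, chosen by word-frequency score."""
    if ratio not in (0.25, 0.5, 0.75):
        raise ValueError("ratio must be 0.25, 0.5, or 0.75")
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if not sentences:
        return ""
    freq = Counter(re.findall(r"\w+", text.lower()))

    def score(sentence: str) -> float:
        # Average document-wide frequency of the words in this sentence.
        words = re.findall(r"\w+", sentence.lower())
        return sum(freq[w] for w in words) / max(len(words), 1)

    keep = max(1, math.ceil(len(sentences) * ratio))
    # Pick the highest-scoring sentences, then restore the original order.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return " ".join(sentences[i] for i in sorted(ranked[:keep]))
```

With a four-sentence input, the three settings keep one, two, and three sentences respectively.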

📊 Analyze text sentiment

Provide a text, or use the text extracted from an image or PDF or the grammar-corrected text, and the tool will tell you whether the sentiment of the text is neutral, positive, or negative.
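One common way to get a three-way label like this is NLTK's VADER analyzer, whose compound score is conventionally cut at ±0.05. The sketch below assumes that approach; this README does not say which analyzer HydraText uses, and label_from_compound and analyze_sentiment are hypothetical names.

```python
def label_from_compound(compound: float, threshold: float = 0.05) -> str:
    """Map a VADER compound score in [-1, 1] to a sentiment label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def analyze_sentiment(text: str) -> str:
    """Score `text` with NLTK's VADER analyzer and return its label."""
    # Imported lazily: needs nltk plus the "vader_lexicon" resource,
    # which can be fetched with nltk.download("vader_lexicon").
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    scores = SentimentIntensityAnalyzer().polarity_scores(text)
    return label_from_compound(scores["compound"])
```

Keeping the threshold as a parameter makes it easy to widen or narrow the neutral band for a given kind of text.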

📊 Text statistics

Provide a text, or use the text extracted from an image or PDF, and the tool will show the following statistics:

  • Vocabulary size (number of unique words)
  • Top 10 most used words
  • Top 10 most used words without stopwords
  • List of collocations (sequences of words that occur together unusually often)
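The statistics listed above can be approximated with the standard library alone. This is a simplified sketch rather than HydraText's implementation: the stopword list is a tiny illustrative sample, repeated adjacent word pairs are used as a rough stand-in for proper collocation scoring, and text_statistics is a hypothetical name.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real one (e.g. NLTK's) is much longer.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "on"}


def text_statistics(text: str, top_n: int = 10) -> dict:
    """Compute vocabulary size, top words, and repeated word pairs."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    content = Counter(w for w in words if w not in STOPWORDS)
    # Adjacent word pairs seen more than once, as a rough collocation proxy.
    bigrams = Counter(zip(words, words[1:]))
    return {
        "vocabulary_size": len(counts),
        "top_words": counts.most_common(top_n),
        "top_words_no_stopwords": content.most_common(top_n),
        "collocations": [" ".join(pair) for pair, n in bigrams.most_common() if n > 1],
    }


stats = text_statistics("the red car and the red bus")
print(stats["vocabulary_size"])  # 5
print(stats["collocations"])     # ['the red']
```

A statistical measure such as pointwise mutual information, available through NLTK's collocation finders, would rank pairs by how unusual their co-occurrence is instead of by raw count.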

Authors
