![Logo](https://private-user-images.githubusercontent.com/84189062/262231381-648d883e-4bdc-4e86-9509-d45a2b26d318.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTk5MzI4MTIsIm5iZiI6MTcxOTkzMjUxMiwicGF0aCI6Ii84NDE4OTA2Mi8yNjIyMzEzODEtNjQ4ZDg4M2UtNGJkYy00ZTg2LTk1MDktZDQ1YTJiMjZkMzE4LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA3MDIlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNzAyVDE1MDE1MlomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTFhYjRhMTNjOWMyM2YxNjg1ZDBhOWYyMTJmZDhiNWMyOTFkYTIxNmExMWZhOWVkM2MyYzA5NjYwNzgxYWIxMjEmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.nJCI8fXYVKomPc9cLeDkXyJZzfjPvInru_F3IcKbuWs)
Transform images with text into a concise summary using Tesseract OCR and Google's Pegasus model
VIDEO DEMO · Report Bug · Request Feature
Summarize Snap turns images that contain text into concise summaries. It extracts the text from the image with Tesseract OCR and then summarizes it with Google's Pegasus model. Whether it's a snapshot of a magazine article, a wiki page, or any other image containing text, Summarize Snap lets users obtain a summary quickly.
Key Features:
- Image-to-Text Conversion: Summarize Snap uses Tesseract OCR (Optical Character Recognition) to convert images containing text into editable text. This first step extracts the textual content from the image and provides the input for summarization.
- Advanced Text Summarization: The extracted text is summarized with Google's Pegasus model, which is trained on large amounts of data and condenses lengthy passages into concise, coherent summaries. The model I used was specifically trained on the cnn_dailymail dataset.
- User-Friendly Interface: Summarize Snap has a simple interface that is accessible to both tech-savvy users and newcomers. Upload an image with text, and the tool takes care of the rest.
- Versatility and Application: From students who want the main ideas of dense academic texts to professionals who need quick insights from business documents, Summarize Snap applies across many domains.
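
The first two features above can be sketched as a small pipeline. This is a minimal sketch, not the project's actual code: the function names are hypothetical, and the checkpoint name `google/pegasus-cnn_dailymail` is an assumption based on the cnn_dailymail dataset mentioned above.

```python
import re


def clean_ocr_text(raw: str) -> str:
    """Collapse the line breaks and repeated spaces that OCR output usually contains."""
    # Rejoin words split by a hyphen at a line break, e.g. "Summa-\nrize" -> "Summarize".
    # This is a simple heuristic and also joins genuinely hyphenated compounds.
    joined = re.sub(r"(\w)-\n(\w)", r"\1\2", raw)
    # Collapse every remaining whitespace run (including newlines) to one space.
    return re.sub(r"\s+", " ", joined).strip()


def image_to_text(image_path: str) -> str:
    """Run Tesseract on an image file and return the cleaned text."""
    # Imported lazily so clean_ocr_text works without the OCR stack installed.
    import pytesseract
    from PIL import Image

    with Image.open(image_path) as img:
        return clean_ocr_text(pytesseract.image_to_string(img))


def summarize(text: str, model_name: str = "google/pegasus-cnn_dailymail") -> str:
    """Summarize text with a Pegasus checkpoint (model_name is an assumption)."""
    from transformers import PegasusForConditionalGeneration, PegasusTokenizer

    tokenizer = PegasusTokenizer.from_pretrained(model_name)
    model = PegasusForConditionalGeneration.from_pretrained(model_name)
    # Truncate long OCR output to the tokenizer's maximum input length.
    inputs = tokenizer(text, truncation=True, return_tensors="pt")
    summary_ids = model.generate(**inputs, max_new_tokens=128, num_beams=4)
    return tokenizer.decode(summary_ids[0], skip_special_tokens=True)


# Example with a hypothetical file name:
# print(summarize(image_to_text("article.png")))
```

The OCR and model imports sit inside the functions so that the text-cleaning helper can be used and tested on its own. The first call to `summarize` downloads the checkpoint, which can take a while.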
Whether you're a researcher, a student, a professional, or simply someone who wants to extract information from images, Summarize Snap combines Tesseract OCR and Google's Pegasus model to turn visual content into short, readable summaries.
A live demo isn't available because, unfortunately, I couldn't get Tesseract installed properly on Render.com. The deployment doesn't work, but the link is here regardless: NOT WORKING
VIDEO DEMO HERE
The local version works fine. The instructions are below.
To get a local copy up and running, follow these steps.
- Click the green Code button on the repository page
- Download ZIP
- Extract the file

  Make sure all of the files are in the same folder!
- Install Tesseract manually

  Latest installer for Windows: https://github.com/UB-Mannheim/tesseract/wiki

  For other OS: https://tesseract-ocr.github.io/tessdoc/Installation.html

  On Windows, search for "Edit the system environment variables" -> Environment Variables -> PATH -> New -> add the path to Tesseract-OCR (usually C:\Program Files\Tesseract-OCR) -> OK

  In Environment Variables -> New -> Variable name: TESSDATA_PREFIX | Variable value: C:\Program Files\Tesseract-OCR\tessdata -> OK
- Open cmd -> change directory to the "src" folder -> create and activate a virtual environment (the commands below are for Windows)

      py -3 -m venv .venv
      .venv\Scripts\activate
- Install all the dependencies

      pip install -r requirements.txt

  If this doesn't work, try this instead:

      pip install transformers torch sentencepiece pytesseract Flask Flask-Reuploaded Flask-WTF
- Run the command below in the terminal

      flask --app app run
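
If the app fails to start, it can help to confirm that the setup steps above took effect. The following is a hypothetical pre-flight check, not part of the project: the function name `check_setup` is made up for illustration, and the module list assumes the usual import names for the packages in the fallback pip command (`flask_uploads` for Flask-Reuploaded, `flask_wtf` for Flask-WTF).

```python
import importlib.util
import os
import shutil

# Assumed import names for the packages installed in the steps above.
REQUIRED_MODULES = [
    "transformers", "torch", "sentencepiece", "pytesseract",
    "flask", "flask_uploads", "flask_wtf",
]


def check_setup(env=None):
    """Return a list of problems found; an empty list means the setup looks fine."""
    env = os.environ if env is None else env
    problems = []

    # The tesseract executable must be reachable through PATH.
    if shutil.which("tesseract", path=env.get("PATH", "")) is None:
        problems.append("tesseract executable not found on PATH")

    # TESSDATA_PREFIX should point at an existing tessdata folder.
    tessdata = env.get("TESSDATA_PREFIX")
    if not tessdata:
        problems.append("TESSDATA_PREFIX is not set")
    elif not os.path.isdir(tessdata):
        problems.append(f"TESSDATA_PREFIX is not a directory: {tessdata}")

    # find_spec locates a package without importing it, so the check stays fast.
    for name in REQUIRED_MODULES:
        if importlib.util.find_spec(name) is None:
            problems.append(f"Python package not installed: {name}")

    return problems
```

Calling `check_setup()` from the activated virtual environment returns an empty list when Tesseract, `TESSDATA_PREFIX`, and all packages were found. Otherwise each entry names the step to revisit.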
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!
- Fork the Project
- Create your Feature Branch (git checkout -b feature/AmazingFeature)
- Commit your Changes (git commit -m 'Add some AmazingFeature')
- Push to the Branch (git push origin feature/AmazingFeature)
- Open a Pull Request
Duong Hoang - LinkedIn
Project Link: github.com/skald1311/Summerize-Snap