SATS

Source Agnostic Text Summarizer (SATS): an application designed to automate text-based data extraction and summarisation at volume.

First Principles

Most communication over the internet happens through unstructured text.
To form an informed view of general trends in feedback on a given topic (e.g. a product or an event), it is valuable to extract aggregated themes from a large number of text documents.

Design overview

Set-up

This project's environment requires tesseract-ocr for text recognition. If you use Windows, you can install it from here: https://github.com/UB-Mannheim/tesseract/wiki

Take note of the destination folder you choose during installation. It will likely look like C:\Users\username\AppData\Local\Programs\Tesseract-OCR, and it is required when running Tesseract-OCR through the Python script.
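As a minimal sketch of how the install folder might be passed to Python, the snippet below assumes the script uses the `pytesseract` wrapper, which the text above does not confirm. The helper names `tesseract_executable` and `configure_pytesseract` are hypothetical, and the path is the placeholder from the paragraph above.

```python
from pathlib import PureWindowsPath


def tesseract_executable(install_dir: str) -> str:
    """Return the full path to tesseract.exe inside the install folder."""
    return str(PureWindowsPath(install_dir) / "tesseract.exe")


def configure_pytesseract(install_dir: str) -> None:
    """Point pytesseract at the Tesseract binary (assumes pytesseract is installed)."""
    import pytesseract  # imported lazily so the path helper works without it

    pytesseract.pytesseract.tesseract_cmd = tesseract_executable(install_dir)


# Example usage (replace "username" with your own Windows account name):
# configure_pytesseract(r"C:\Users\username\AppData\Local\Programs\Tesseract-OCR")
```

Setting `tesseract_cmd` explicitly avoids having to add the install folder to the system PATH.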

You will also need the Poppler library if you are converting PDFs to images. It can be downloaded here: [install poppler](https://github.com/oschwartz10612/poppler-windows/releases/r. Read the instructions to ensure you extract all of the files into the correct pkgs or library folder.
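The PDF-to-image step could look roughly like the sketch below. It assumes the `pdf2image` package (which relies on Poppler) together with `pytesseract`, neither of which is confirmed by the text above. The function names, the file name `example.pdf` and the Poppler folder are hypothetical placeholders.

```python
from typing import Callable, Iterable, List


def pdf_to_images(pdf_path: str, poppler_bin: str) -> List:
    """Render each PDF page to a PIL image (assumes pdf2image is installed)."""
    from pdf2image import convert_from_path  # lazy import: needs Poppler at runtime

    # poppler_path tells pdf2image where the extracted Poppler binaries live.
    return convert_from_path(pdf_path, poppler_path=poppler_bin)


def ocr_pages(pages: Iterable, ocr: Callable[[object], str]) -> str:
    """Run the OCR callable on every page and join the stripped page texts."""
    return "\n".join(ocr(page).strip() for page in pages)


# Example usage (paths are placeholders; adjust to your own extraction folder):
# import pytesseract
# pages = pdf_to_images("example.pdf", r"C:\path\to\poppler\Library\bin")
# text = ocr_pages(pages, pytesseract.image_to_string)
```

Passing the OCR function in as an argument keeps the page-joining logic independent of any particular OCR backend.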

