Document Archive Browser as Static Website With Keyword Search

The following description is PART TWO of a two-part description of a complete document archive system. PART TWO covers the generation of a static website that serves as a simple yet powerful document search and retrieval system, letting you browse your documents as PDFs in your web browser.

Because the Document Archive Browser is a static website, it can reside either on any file system (e.g. a USB stick) or on a simple web server or web hosting service.

PART ONE covers scanning the documents, collecting meta information for later use, enhancing the quality of the documents, drastically reducing their file size, extracting plain text with OCR tools (Tesseract), combining the scanned images and extracted text into one PDF together with the previously collected meta information, and finally organizing all files in a simple tree structure on your file system. The two parts are fully decoupled: the only interface between PART ONE and PART TWO is the file tree structure, consisting of document IDs, metadata as JSON files, and the PDF files.

Feature List

Custom, simple template system
Responsive web design (mobile first)
Static web pages
As few dependencies as possible
Search via a keyword catalog
Generation works like a build process, including initialization, cleanup, etc.
Meta information for the documents is provided as JSON files
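
The feature list mentions a custom template system with as few dependencies as possible. A minimal sketch of how such a system could look, assuming plain placeholder substitution with `string.Template` from the Python standard library. The function name `render_page`, the placeholder names, and the inline template are hypothetical and not taken from `templatehandler.py`.

```python
from html import escape
from string import Template

# Hypothetical page template; a real build would load it from the templates directory.
PAGE_TEMPLATE = Template(
    "<!DOCTYPE html>\n"
    "<html><head><title>$title</title>"
    '<link rel="stylesheet" href="pages.css"></head>\n'
    "<body><h1>$title</h1>\n<ul>\n$items\n</ul></body></html>\n"
)


def render_page(title, links):
    """Render a simple list page from (href, label) pairs."""
    items = "\n".join(
        f'<li><a href="{escape(href, quote=True)}">{escape(label)}</a></li>'
        for href, label in links
    )
    # Escape user-visible text so metadata cannot break the HTML.
    return PAGE_TEMPLATE.substitute(title=escape(title), items=items)
```

Keeping the templates as plain text with `$name` placeholders avoids a third-party template engine, which fits the goal of minimal dependencies.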

SiteMap of Website

A rough site map of the static document archive looks as follows:

doc_archive_root
├── index.html (list of all documents for the current year <YYYY>)
├── pages.css
├── keyword_catalog.html (list of all keywords)
├── keyword_<xxx>.html (list of all documents for a keyword <xxx>)
├── archive.html (list of all yearly archives)
├── archive_<YYYY>.html (list of all documents for a year <YYYY>)
└── archive
    ├── <yyyymmdd_xx>.* (linked document, e.g. PDF, PNG, JPG)
    └── ...
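
The naming scheme in the site map can be expressed as a few small helper functions. This is only a sketch: the function names are hypothetical, and the way a keyword is turned into a file name (lowercasing and replacing non-word characters with underscores) is an assumption, since the site map only shows the pattern `keyword_<xxx>.html`.

```python
import re


def year_page(year):
    """File name of the page listing all documents of one year."""
    return f"archive_{year}.html"


def keyword_page(keyword):
    """File name of the page listing all documents for one keyword.

    Assumption: keywords are lowercased and runs of non-word characters
    become a single underscore. Two keywords that differ only in such
    characters would collide under this rule.
    """
    slug = re.sub(r"\W+", "_", keyword.lower()).strip("_")
    return f"keyword_{slug}.html"


def document_link(doc_id, extension="pdf"):
    """Relative link from a list page to a document in the archive folder."""
    return f"archive/{doc_id}.{extension}"
```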

SiteMap of Build

A rough site map of the generator's build environment looks as follows:

project_root
├── source
│   ├── config_template.py
│   ├── build.py
│   ├── templatehandler.py
│   ├── oneyear.py
│   ├── onekeyword.py
│   ├── allkeywords.py
│   ├── allyears.py
│   ├── jsontreewalker.py
│   └── dirtreewalker.py
├── doc
│   ├── 
│   └── 
├── pages.css
├── requirements.txt
├── README.md
├── LICENSE
├── .gitignore
└── templates
    ├── 
    ├── 
    └──

SiteMap of Scan Archive

scan_archive_root
├── YYYYMMDD_01
│   ├── YYYYMMDD_01.json
│   ├── YYYYMMDD_01.pdf
│   └── ...
├── YYYYMMDD_02
│   ├── YYYYMMDD_02.json
...
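
Since each document folder carries a JSON file named after its document ID, reading the scan archive amounts to one directory pass. A minimal sketch under that assumption; the function name `load_metadata` is hypothetical and not taken from `jsontreewalker.py`, and the content of the JSON files is treated as opaque.

```python
import json
from pathlib import Path


def load_metadata(scan_archive_root):
    """Return {document_id: metadata} for every document folder.

    Assumes the layout shown above: one folder per document ID that
    contains a JSON file with the same base name.
    """
    documents = {}
    root = Path(scan_archive_root)
    for folder in sorted(p for p in root.iterdir() if p.is_dir()):
        json_file = folder / f"{folder.name}.json"
        if not json_file.is_file():
            continue  # skip folders that have no metadata file
        with json_file.open(encoding="utf-8") as handle:
            documents[folder.name] = json.load(handle)
    return documents
```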

Process of Build

Rough outline of the build process:

Initialization
Clean up the previous build (delete the directory tree of the document archive)
Create the target directories
Walk the scan archive tree, read the JSON files in all subdirectories, and add their metadata to the global data structure
Build the data structure for the yearly archives (year --> document ID)
Build the data structure for the keyword indexes (keyword --> document ID)
Generate index.html
Generate archive.html
Generate .html (optional)
Generate keyword_catalog.html
Copy linked files (images, PDFs, etc.) from the scan archive to the document archive
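
The two indexing steps of the build (year to document IDs, keyword to document IDs) could be sketched as follows. The year is taken from the first four characters of the document ID, which follows from the `YYYYMMDD_NN` naming. The function name `build_indexes` and the metadata field name `keywords` are assumptions, since the structure of the JSON files is not described here.

```python
from collections import defaultdict


def build_indexes(documents):
    """Build year and keyword lookups from {document_id: metadata}.

    Returns two dicts: year -> [document IDs] and keyword -> [document IDs].
    """
    by_year = defaultdict(list)
    by_keyword = defaultdict(list)
    for doc_id in sorted(documents):
        # Document IDs start with YYYYMMDD, so the year is the first four characters.
        by_year[doc_id[:4]].append(doc_id)
        # "keywords" is an assumed field name; documents without it are only indexed by year.
        for keyword in documents[doc_id].get("keywords", []):
            by_keyword[keyword].append(doc_id)
    return dict(by_year), dict(by_keyword)
```

Each entry in the year index would then feed one `archive_<YYYY>.html` page, and each entry in the keyword index one `keyword_<xxx>.html` page.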

marctrommen/docarchivebrowser
