This repository has been archived by the owner on Feb 16, 2023. It is now read-only.

Merge 68cfde5 into 5612ad0
shamoon authored Feb 15, 2022
2 parents 5612ad0 + 68cfde5 commit c796890
Showing 22 changed files with 687 additions and 348 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -62,7 +62,7 @@ If you want to see paperless-ng in action, [more screenshots are available in th

# Getting started

The recommended way to deploy paperless is docker-compose. The files in the /docker/hub directory are configured to pull the image from Docker Hub.
The recommended way to deploy paperless is docker-compose. The files in the /docker/compose directory are configured to pull the image from Docker Hub.

Read the [documentation](https://paperless-ng.readthedocs.io/en/latest/setup.html#installation) on how to get started.

2 changes: 1 addition & 1 deletion ansible/tasks/main.yml
@@ -329,7 +329,7 @@
- regexp: PAPERLESS_TIKA_ENDPOINT
line: "PAPERLESS_TIKA_ENDPOINT={{ paperlessng_tika_endpoint }}"
- regexp: PAPERLESS_TIKA_GOTENBERG_ENDPOINT
line: "PAPERLESS_TIKA_GOTENBERG_ENDPOINT={{ paperlessng_tika_endpoint }}"
line: "PAPERLESS_TIKA_GOTENBERG_ENDPOINT={{ paperlessng_tika_gotenberg_endpoint }}"
# Software tweaks
- regexp: PAPERLESS_TIME_ZONE
line: "PAPERLESS_TIME_ZONE={{ paperlessng_time_zone }}"
22 changes: 11 additions & 11 deletions docs/advanced_usage.rst
@@ -18,13 +18,13 @@ that had a ``match`` property of ``bc hydro`` and a ``matching_algorithm`` of
your ``Home Utility`` tag so long as the text ``bc hydro`` appears in the body
of the document somewhere.

The matching logic is quite powerful, and supports searching the text of your
The matching logic is quite powerful. It supports searching the text of your
document with different algorithms, and as such, some experimentation may be
necessary to get things right.

In order to have a tag, correspondent or type assigned automatically to newly
In order to have a tag, correspondent, or type assigned automatically to newly
consumed documents, assign a match and matching algorithm using the web
interface. These settings define when to assign correspondents, tags and types
interface. These settings define when to assign correspondents, tags, and types
to documents.

The following algorithms are available:
@@ -34,16 +34,16 @@ The following algorithms are available:
either of these terms.
* **All:** Requires that every word provided appears in the PDF, albeit not in the
order provided.
* **Literal:** Matches only if the match appears exactly as provided in the PDF.
* **Literal:** Matches only if the match appears exactly as provided (i.e. preserve ordering) in the PDF.
* **Regular expression:** Parses the match as a regular expression and tries to
find a match within the document.
* **Fuzzy match:** I dont know. Look at the source.
* **Auto:** Tries to automatically match new documents. This does not require you
to set a match. See the notes below.

When using the "any" or "all" matching algorithms, you can search for terms
When using the *any* or *all* matching algorithms, you can search for terms
that consist of multiple words by enclosing them in double quotes. For example,
defining a match text of ``"Bank of America" BofA`` using the "any" algorithm,
defining a match text of ``"Bank of America" BofA`` using the *any* algorithm,
will match documents that contain either "Bank of America" or "BofA", but will
not match documents containing "Bank of South America".
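
The quoting rule described above can be illustrated with a minimal Python sketch. This is not the project's implementation: the function names are hypothetical, and whole-word, case-insensitive comparison is an assumption made here for illustration.

```python
import re
import shlex


def split_match_terms(match_text):
    """Split a match string into terms, keeping double-quoted phrases whole.

    Assumption: shell-style quoting via shlex is close enough for illustration.
    shlex raises ValueError on unbalanced quotes, which this sketch does not handle.
    """
    return shlex.split(match_text)


def term_in_document(term, document_text):
    # Assumption: terms match on word boundaries and ignore case.
    pattern = r"\b" + re.escape(term) + r"\b"
    return re.search(pattern, document_text, re.IGNORECASE) is not None


def matches_any(match_text, document_text):
    """True if at least one term or quoted phrase occurs in the document."""
    return any(
        term_in_document(term, document_text)
        for term in split_match_terms(match_text)
    )


def matches_all(match_text, document_text):
    """True if every term or quoted phrase occurs in the document, in any order."""
    return all(
        term_in_document(term, document_text)
        for term in split_match_terms(match_text)
    )
```

With the match text ``"Bank of America" BofA``, `matches_any` accepts a document containing either the full phrase or "BofA", and rejects one that only mentions "Bank of South America", because the quoted phrase is compared as a single unit.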

@@ -58,8 +58,8 @@ Automatic matching
==================

Paperless-ng comes with a new matching algorithm called *Auto*. This matching
algorithm tries to assign tags, correspondents and document types to your
documents based on how you have assigned these on existing documents. It
algorithm tries to assign tags, correspondents, and document types to your
documents based on how you have already assigned these on existing documents. It
uses a neural network under the hood.

If, for example, all your bank statements of your account 123 at the Bank of
@@ -76,11 +76,11 @@ feature:
changes. Paperless periodically (default: once each hour) checks for changes
and does this automatically for you.
* The Auto matching algorithm only takes documents into account which are NOT
placed in your inbox (i.e., have inbox tags assigned to them). This ensures
placed in your inbox (i.e. have any inbox tags assigned to them). This ensures
that the neural network only learns from documents which you have correctly
tagged before.
* The matching algorithm can only work if there is a correlation between the
tag, correspondent or document type and the document itself. Your bank
tag, correspondent, or document type and the document itself. Your bank
statements usually contain your bank account number and the name of the bank,
so this works reasonably well, However, tags such as "TODO" cannot be
automatically assigned.
@@ -167,7 +167,7 @@ into paperless. It receives the following arguments:
* Correspondent
* Tags

The script can be in any language you like, but for a simple shell script
The script can be written in any language, but for a simple shell script
example, you can take a look at ``post-consumption-example.sh`` in the
``scripts`` directory in this project.

7 changes: 7 additions & 0 deletions docs/configuration.rst
@@ -233,6 +233,13 @@ PAPERLESS_HTTP_REMOTE_USER_HEADER_NAME=<str>

Defaults to `HTTP_REMOTE_USER`.

PAPERLESS_LOGOUT_REDIRECT_URL=<str>
URL to redirect the user to after a logout. This can be used together with
`PAPERLESS_ENABLE_HTTP_REMOTE_USER` to redirect the user back to the SSO
application's logout page.

Defaults to None, which disables this feature.

.. _configuration-ocr:

OCR settings
34 changes: 24 additions & 10 deletions docs/scanners.rst
@@ -18,55 +18,69 @@ Physical scanners
+---------+----------------+-----+-----+-----+------+----------+----------------+
| | | FTP | NFS | SMB | SMTP | API [1]_ | |
+=========+================+=====+=====+=====+======+==========+================+
| Brother | `ADS-1700W`_ | yes | no | yes | yes | |`holzhannes`_ |
| Brother | `ADS-1700W`_ | yes | | yes | yes | |`holzhannes`_ |
+---------+----------------+-----+-----+-----+------+----------+----------------+
| Brother | `ADS-1600W`_ | yes | no | yes | yes | |`holzhannes`_ |
| Brother | `ADS-1600W`_ | yes | | yes | yes | |`holzhannes`_ |
+---------+----------------+-----+-----+-----+------+----------+----------------+
| Brother | `ADS-1500W`_ | yes | no | yes | yes | |`danielquinn`_ |
| Brother | `ADS-1500W`_ | yes | | yes | yes | |`danielquinn`_ |
+---------+----------------+-----+-----+-----+------+----------+----------------+
| Brother | `ADS-1100W`_ | yes | no | no | no | |`ytzelf`_ |
| Brother | `ADS-1100W`_ | yes | | | | |`ytzelf`_ |
+---------+----------------+-----+-----+-----+------+----------+----------------+
| Brother | `ADS-2800W`_ | yes | yes | | yes | yes |`philpagel`_ |
+---------+----------------+-----+-----+-----+------+----------+----------------+
| Brother | `MFC-J6930DW`_ | yes | | | | |`ayounggun`_ |
+---------+----------------+-----+-----+-----+------+----------+----------------+
| Brother | `MFC-L5850DW`_ | yes | | | yes | |`holzhannes`_ |
+---------+----------------+-----+-----+-----+------+----------+----------------+
| Brother | `MFC-L2750DW`_ | yes | | yes | yes | |`muued`_ |
+---------+----------------+-----+-----+-----+------+----------+----------------+
| Brother | `MFC-J5910DW`_ | yes | | | | |`bmsleight`_ |
+---------+----------------+-----+-----+-----+------+----------+----------------+
| Brother | `MFC-8950DW`_ | yes | | | yes | yes |`philpagel`_ |
+---------+----------------+-----+-----+-----+------+----------+----------------+
| Brother | `MFC-9142CDN`_ | yes | | yes | | |`REOLDEV`_ |
+---------+----------------+-----+-----+-----+------+----------+----------------+
| Fujitsu | `ix500`_ | yes | | yes | | |`eonist`_ |
+---------+----------------+-----+-----+-----+------+----------+----------------+
| Epson | `ES-580W`_ | yes | | yes | yes | |`fignew`_ |
+---------+----------------+-----+-----+-----+------+----------+----------------+
| Epson | `WF-7710DWF`_ | yes | | yes | | |`Skylinar`_ |
+---------+----------------+-----+-----+-----+------+----------+----------------+
| Fujitsu | `S1300i`_ | yes | | yes | | |`jonaswinkler`_ |
+---------+----------------+-----+-----+-----+------+----------+----------------+
| Doxie | `Q2`_ | no | no | no | no | yes |`Unkn0wnCat`_ |
| Doxie | `Q2`_ | | | | | yes |`Unkn0wnCat`_ |
+---------+----------------+-----+-----+-----+------+----------+----------------+

.. _MFC-L5850DW: https://www.brother-usa.com/products/mfcl5850dw
.. _MFC-L2750DW: https://www.brother.de/drucker/laserdrucker/mfc-l2750dw
.. _ADS-1700W: https://www.brother-usa.com/products/ads1700w
.. _ADS-1600W: https://www.brother-usa.com/products/ads1600w
.. _ADS-1500W: https://www.brother.ca/en/p/ads1500w
.. _ADS-1100W: https://support.brother.com/g/b/downloadtop.aspx?c=fr&lang=fr&prod=ads1100w_eu_as_cn
.. _ADS-2800W: https://www.brother-usa.com/products/ads2800w
.. _MFC-J6930DW: https://www.brother.ca/en/p/MFCJ6930DW
.. _MFC-J5910DW: https://www.brother.co.uk/printers/inkjet-printers/mfcj5910dw
.. _MFC-8950DW: https://www.brother-usa.com/products/mfc8950dw
.. _MFC-9142CDN: https://www.brother.co.uk/printers/laser-printers/mfc9140cdn
.. _ix500: http://www.fujitsu.com/us/products/computing/peripheral/scanners/scansnap/ix500/
.. _ES-580W: https://epson.com/Support/Scanners/ES-Series/Epson-WorkForce-ES-580W/s/SPT_B11B258201
.. _WF-7710DWF: https://www.epson.de/en/products/printers/inkjet-printers/for-home/workforce-wf-7710dwf
.. _ix500: http://www.fujitsu.com/us/products/computing/peripheral/scanners/scansnap/ix500/
.. _S1300i: https://www.fujitsu.com/global/products/computing/peripheral/scanners/soho/s1300i/
.. _Q2: https://www.getdoxie.com/product/doxie-q/


.. _danielquinn: https://github.com/danielquinn
.. _ayounggun: https://github.com/ayounggun
.. _bmsleight: https://github.com/bmsleight
.. _danielquinn: https://github.com/danielquinn
.. _eonist: https://github.com/eonist
.. _fignew: https://github.com/fignew
.. _holzhannes: https://github.com/holzhannes
.. _jonaswinkler: https://github.com/jonaswinkler
.. _REOLDEV: https://github.com/REOLDEV
.. _Skylinar: https://github.com/Skylinar
.. _jonaswinkler: https://github.com/jonaswinkler
.. _holzhannes: https://github.com/holzhannes
.. _ytzelf: https://github.com/ytzelf
.. _Unkn0wnCat: https://github.com/Unkn0wnCat
.. _muued: https://github.com/muued
.. _philpagel: https://github.com/philpagel

.. [1] Scanners with API Integration allow to push scanned documents directly to :ref:`Paperless API <api-file_uploads>`, sometimes referred to as Webhook or Document POST.
24 changes: 21 additions & 3 deletions docs/setup.rst
@@ -89,7 +89,7 @@ You can go multiple routes to setup and run Paperless:

The Docker routes are quick & easy. These are the recommended routes. This configures all the stuff
from the above automatically so that it just works and uses sensible defaults for all configuration options.
Here you find a cheat-sheet for docker beginners: `CLI Basics <https://sehn.tech/post/devops-with-docker/>`_
Here you find a cheat-sheet for docker beginners: `CLI Basics <https://www.sehn.tech/refs/devops-with-docker/>`_

The bare metal route is complicated to setup but makes it easier
should you want to contribute some code back. You need to configure and
@@ -99,7 +99,7 @@ The ansible route combines benefits of both options:
the setup process is fully automated, reproducible and `idempotent <https://docs.ansible.com/ansible/latest/reference_appendices/glossary.html#Idempotency>`_,
it includes the same sensible defaults, and it simultaneously provides the flexibility of a bare metal installation.

.. _CLI Basics: https://sehn.tech/post/devops-with-docker/
.. _CLI Basics: https://www.sehn.tech/refs/devops-with-docker/
.. _idempotent: https://docs.ansible.com/ansible/latest/reference_appendices/glossary.html#Idempotency

.. _setup-docker_script:
@@ -116,7 +116,7 @@ performs all the steps described in :ref:`setup-docker_hub` automatically.

.. code:: shell-session
$ curl -L https://raw.githubusercontent.com/jonaswinkler/paperless-ng/master/install-paperless-ng.sh | bash
$ bash <(curl -L https://raw.githubusercontent.com/jonaswinkler/paperless-ng/master/install-paperless-ng.sh)
.. _setup-docker_hub:

@@ -171,6 +171,24 @@ Install Paperless from Docker Hub
Don't change the part after the colon or paperless wont find your documents.

You may also need to change the default port that the webserver will use
from the default (8000):

.. code::
ports:
- 8000:8000
Replace the part BEFORE the colon with a port of your choice:

.. code::
ports:
- 8010:8000
Don't change the part after the colon or edit other lines that refer to
port 8000. Modifying the part before the colon will map requests on another
port to the webserver running on the default port.

5. Modify ``docker-compose.env``, following the comments in the file. The
most important change is to set ``USERMAP_UID`` and ``USERMAP_GID``
13 changes: 6 additions & 7 deletions docs/usage_overview.rst
@@ -24,7 +24,7 @@ Each document has a couple of fields that you can assign to them:
* A *Document* is a piece of paper that sometimes contains valuable
information.
* The *correspondent* of a document is the person, institution or company that
a document either originates form, or is sent to.
a document either originates from, or is sent to.
* A *tag* is a label that you can assign to documents. Think of labels as more
powerful folders: Multiple documents can be grouped together with a single
tag, however, a single document can also have multiple tags. This is not
@@ -86,10 +86,9 @@ The consumption directory
=========================

The primary method of getting documents into your database is by putting them in
the consumption directory. The consumer runs in an infinite
loop looking for new additions to this directory and when it finds them, it goes
about the process of parsing them with the OCR, indexing what it finds, and storing
it in the media directory.
the consumption directory. The consumer runs in an infinite loop, looking for new
additions to this directory. When it finds them, the consumer goes about the process
of parsing them with the OCR, indexing what it finds, and storing it in the media directory.

Getting stuff into this directory is up to you. If you're running Paperless
on your local computer, you might just want to drag and drop files there, but if
@@ -128,7 +127,7 @@ IMAP (Email)
============

You can tell paperless-ng to consume documents from your email accounts.
This is a very flexible and powerful feature, if you regularly received documents
This is a very flexible and powerful feature if you regularly received documents
via mail that you need to archive. The mail consumer can be configured by using the
admin interface in the following manner:

@@ -396,7 +395,7 @@ Task management

Some documents require attention and require you to act on the document. You
may take two different approaches to handle these documents based on how
regularly you intent to use paperless and scan documents.
regularly you intend to scan documents and use paperless.

* If you scan and process your documents in paperless regularly, assign a
TODO tag to all scanned documents that you need to process. Create a saved
1 change: 1 addition & 0 deletions install-paperless-ng.sh
@@ -179,6 +179,7 @@ echo "Specify the default language that most of your documents are written in."
echo "Use ISO 639-2, (T) variant language codes: "
echo "https://www.loc.gov/standards/iso639-2/php/code_list.php"
echo "Common values: eng (English) deu (German) nld (Dutch) fra (French)"
echo "This can be a combination of multiple languages such as deu+eng"
echo ""

ask "OCR language" "eng"