This repository has been archived by the owner on Feb 16, 2023. It is now read-only.

Merge branch 'dev'
Jonas Winkler committed Nov 18, 2020
2 parents 85721f1 + d7a0848 commit 8395bdf
Showing 56 changed files with 2,165 additions and 8,914 deletions.
5 changes: 5 additions & 0 deletions .dockerignore
@@ -1,7 +1,12 @@
/src-ui/.vscode
/src-ui/node_modules
/src-ui/dist
.git
/export
/consume
/media
/data
/docs
.pytest_cache
/dist
/scripts
7 changes: 1 addition & 6 deletions .travis.yml
@@ -5,23 +5,18 @@ python:
- "3.7"
- "3.8"

services:
- docker

before_install:
- sudo apt-get update -qq
- sudo apt-get install -qq libpoppler-cpp-dev unpaper tesseract-ocr

install:
- pip install --upgrade pipenv
- pipenv install --dev
- pipenv install --system --dev

script:
- cd src/
- pipenv run pytest --cov
- pipenv run pycodestyle
- cd ..
- docker build --tag=jonaswinkler/paperless-ng .

after_success:
- pipenv run coveralls
1 change: 1 addition & 0 deletions Pipfile
@@ -29,6 +29,7 @@ watchdog = "*"
pathvalidate = "*"
django-q = "*"
redis = "*"
imap-tools = "*"

[dev-packages]
coveralls = "*"
10 changes: 9 additions & 1 deletion Pipfile.lock

Binary file added docs/_static/paperless-11-mail-filters.png
Binary file added docs/_static/recommended_workflow.png
10 changes: 7 additions & 3 deletions docs/administration.rst
@@ -294,10 +294,14 @@ Documents can be stored in Paperless using GnuPG encryption.

.. danger::

Decryption is deprecated since paperless-ng 1.0 and doesn't really provide any
Decryption is deprecated since paperless-ng 0.9 and doesn't really provide any
additional security, since you have to store the passphrase in a configuration
file on the same system as the encrypted documents for paperless to work. Also,
paperless provides transparent access to your encrypted documents.
file on the same system as the encrypted documents for paperless to work.
Furthermore, the entire text content of the documents is stored in plain text in
the database, even if your documents are encrypted. Filenames are not encrypted
either.

Also, the web server provides transparent access to your encrypted documents.

Consider running paperless on an encrypted filesystem instead, which will then
at least provide security against physical hardware theft.
171 changes: 157 additions & 14 deletions docs/api.rst
@@ -3,25 +3,168 @@
The REST API
************

.. warning::

This section is not updated to paperless-ng yet.

Paperless makes use of the `Django REST Framework`_ standard API interface
because of its inherent awesomeness. Conveniently, the system is also
self-documenting, so to learn more about the access points, schema, what's
accepted and what isn't, you need only visit ``/api`` on your local Paperless
installation.
Paperless makes use of the `Django REST Framework`_ standard API interface.
It provides a browsable API for most of its endpoints, which you can inspect
at ``http://<paperless-host>:<port>/api/``. This also documents most of the
available filters and ordering fields.

.. _Django REST Framework: http://django-rest-framework.org/

The API provides 5 main endpoints:

* ``/api/correspondents/``: Full CRUD support.
* ``/api/document_types/``: Full CRUD support.
* ``/api/documents/``: Full CRUD support, except POSTing new documents. See below.
* ``/api/logs/``: Read-Only.
* ``/api/tags/``: Full CRUD support.

All of these endpoints except for the logging endpoint
allow you to fetch, edit and delete individual objects
by appending their primary key to the path, for example ``/api/documents/454/``.

In addition to that, the document endpoint offers these additional actions on
individual documents:

* ``/api/documents/<pk>/download/``: Download the original document.
* ``/api/documents/<pk>/thumb/``: Download the PNG thumbnail of a document.
* ``/api/documents/<pk>/preview/``: Display the original document inline,
without downloading it.
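
The URL scheme described above can be sketched as a pair of small helpers. This is only an illustration: ``BASE_URL`` and both function names are assumptions made for this example and are not part of paperless. Authentication is left out entirely.

```python
from urllib.parse import urljoin

# Hypothetical host and port; replace with your own installation.
BASE_URL = "http://localhost:8000"

# The per-document actions listed above.
DOCUMENT_ACTIONS = ("download", "thumb", "preview")


def object_url(endpoint: str, pk: int, base_url: str = BASE_URL) -> str:
    """Return the URL of a single object, e.g. /api/documents/454/."""
    return urljoin(base_url, f"/api/{endpoint}/{pk}/")


def document_action_url(pk: int, action: str, base_url: str = BASE_URL) -> str:
    """Return the URL of one of the per-document actions."""
    if action not in DOCUMENT_ACTIONS:
        raise ValueError(f"unknown document action: {action!r}")
    return urljoin(object_url("documents", pk, base_url), f"{action}/")
```

For example, ``document_action_url(454, "thumb")`` yields the thumbnail URL of the document with primary key 454.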

.. hint::

Paperless used to provide this functionality at ``/fetch/<pk>/preview``,
``/fetch/<pk>/thumb`` and ``/fetch/<pk>/doc``. Redirects to the new URLs
are in place. However, if you use these old URLs to access documents, you
should update your app or script to use the new URLs.

Searching for documents
#######################

Paperless-ng offers API endpoints for full text search. These are as follows:

``/api/search/``
================

Get search results based on a query.

Query parameters:

* ``query``: The query string. See
`here <https://whoosh.readthedocs.io/en/latest/querylang.html>`_
for details on the syntax.
* ``page``: Specify the page you want to retrieve. Each page
contains 10 search results and the first page is ``page=1``, which
is the default if this is omitted.

Result list object returned by the endpoint:

.. code:: json

    {
        "count": 1,
        "page": 1,
        "page_count": 1,
        "results": [

        ]
    }

* ``count``: The approximate total number of results.
* ``page``: The page returned to you. This might be different from
  the page you requested, if you requested a page beyond
  the last page. In that case, the last page is returned.
* ``page_count``: The total number of pages.
* ``results``: A list of result objects on the current page.
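
A client that walks through all result pages might build its requests roughly as follows. This is a minimal sketch: ``BASE_URL``, ``search_url`` and ``next_page`` are hypothetical names chosen for this example, and the HTTP request itself (including authentication) is omitted. Because the endpoint may return a different page than the one requested, the sketch relies on the ``page`` value from the response rather than on the requested page.

```python
from typing import Optional
from urllib.parse import urlencode

# Hypothetical host and port; replace with your own installation.
BASE_URL = "http://localhost:8000"


def search_url(query: str, page: int = 1, base_url: str = BASE_URL) -> str:
    """Build the search URL for a query string and a 1-based page number."""
    params = urlencode({"query": query, "page": page})
    return f"{base_url}/api/search/?{params}"


def next_page(result_list: dict) -> Optional[int]:
    """Return the number of the next page, or None on the last page.

    ``result_list`` is the decoded result list object described above.
    """
    page = result_list["page"]
    return page + 1 if page < result_list["page_count"] else None
```

Looping until ``next_page`` returns ``None`` visits every page exactly once, even if the first request asked for a page past the end.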

Result object:

.. code:: json

    {
        "id": 1,
        "highlights": [

        ],
        "score": 6.34234,
        "rank": 23,
        "document": {

        }
    }

* ``id``: the primary key of the found document
* ``highlights``: a list containing parseable highlights for the result.
  See below.
* ``score``: The score assigned to the document. A higher score indicates a
  better match with the query. Search results are sorted descending by score.
* ``rank``: the position of the document within the entire search results list.
* ``document``: The full json of the document, as returned by
  ``/api/documents/<id>/``.

Highlights object:

Highlights are provided as a list of fragments. A fragment is a longer section of
text from the original document.
Each fragment contains a list of strings, and some of them are marked as a highlight.

.. code:: json

    "highlights": [
        [
            {"text": "This is a sample text with a "},
            {"text": "highlighted", "term": 0},
            {"text": " word."}
        ],
        [
            {"text": "Another", "term": 1},
            {"text": " fragment with a highlight."}
        ]
    ]

When ``term`` is present within a string, the word within ``text`` should be highlighted.
The term index groups multiple matches together and words with the same index
should get identical highlighting.
A client may use this example to produce the following output:

... This is a sample text with a **highlighted** word. ... **Another** fragment with a highlight. ...
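
One way a client could turn the highlights list into that output is sketched below. The function name ``render_highlights`` and the choice of ``**`` as the highlight marker are assumptions made for this example. The sketch ignores the value of ``term`` and marks every highlighted string the same way.

```python
from typing import Dict, List, Union

Part = Dict[str, Union[str, int]]


def render_highlights(highlights: List[List[Part]]) -> str:
    """Join all fragments into one line, wrapping highlighted parts in **."""
    rendered = []
    for fragment in highlights:
        # A part is highlighted if (and only if) it carries a "term" key.
        text = "".join(
            f"**{part['text']}**" if "term" in part else str(part["text"])
            for part in fragment
        )
        rendered.append(text)
    if not rendered:
        return ""
    return "... " + " ... ".join(rendered) + " ..."


example = [
    [
        {"text": "This is a sample text with a "},
        {"text": "highlighted", "term": 0},
        {"text": " word."},
    ],
    [
        {"text": "Another", "term": 1},
        {"text": " fragment with a highlight."},
    ],
]
print(render_highlights(example))
```

A client that wants a distinct style per term could use the ``term`` index to pick a color or CSS class instead of a fixed marker.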

``/api/search/autocomplete/``
=============================

Get auto completions for a partial search term.

Query parameters:

* ``term``: The incomplete term.
* ``limit``: Number of results. Defaults to 10.

Results returned by the endpoint are ordered by importance of the term in the
document index. The first result is the term that has the highest Tf/Idf score
in the index.

.. code:: json

    [
        "term1",
        "term3",
        "term6",
        "term4"
    ]
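
A search box could use this endpoint to complete the word the user is currently typing. The sketch below assumes that only the last word of the input is incomplete. ``BASE_URL``, ``autocomplete_url`` and ``apply_suggestion`` are hypothetical names for this example, and the HTTP request itself is omitted.

```python
from urllib.parse import urlencode

# Hypothetical host and port; replace with your own installation.
BASE_URL = "http://localhost:8000"


def autocomplete_url(query: str, limit: int = 10, base_url: str = BASE_URL) -> str:
    """Build the autocomplete URL for the last (incomplete) word of the query."""
    term = query.split(" ")[-1]
    params = urlencode({"term": term, "limit": limit})
    return f"{base_url}/api/search/autocomplete/?{params}"


def apply_suggestion(query: str, suggestion: str) -> str:
    """Replace the last word of the query with the chosen suggestion."""
    head, _, _ = query.rpartition(" ")
    return f"{head} {suggestion}" if head else suggestion
```

Since the results are ordered by importance, offering the first returned term as the default suggestion is a reasonable choice.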

.. _api-file_uploads:

POSTing Documents
=================

POSTing documents
#################

The API provides a special endpoint for file uploads:

``/api/documents/post_document/``

POST a multipart form to this endpoint, where the form field ``document`` contains
the document that you want to upload to paperless. The filename is sanitized and
then used to store the document in the consumption folder, where the consumer will
detect the document and process it as any other document.

File uploads in an API are hard and so far as I've been able to tell, there's
no standard way of accepting them, so rather than crowbar file uploads into the
REST API and endure that headache, I've left that process to a simple HTTP
POST.

The endpoint will immediately return "OK." if the document was stored in the
consumption directory.
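
An upload request of this shape can be assembled with the Python standard library alone, as sketched below. ``BASE_URL`` and ``build_upload_request`` are hypothetical names for this example. The sketch only builds the request and does not send it, and it leaves out authentication, which a real installation will most likely require. It also assumes that the filename contains no double quotes.

```python
import uuid
from urllib.request import Request

# Hypothetical host and port; replace with your own installation.
BASE_URL = "http://localhost:8000"


def build_upload_request(filename: str, content: bytes,
                         base_url: str = BASE_URL) -> Request:
    """Build a multipart POST whose form field ``document`` holds the file."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        # Assumes the filename contains no double quotes.
        (f'Content-Disposition: form-data; name="document"; '
         f'filename="{filename}"\r\n').encode(),
        b"Content-Type: application/octet-stream\r\n\r\n",
        content,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return Request(
        f"{base_url}/api/documents/post_document/",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
```

Passing the returned object to ``urllib.request.urlopen`` would send the upload. According to the description above, the response body should then be "OK." if the file was stored in the consumption directory.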
31 changes: 25 additions & 6 deletions docs/changelog.rst
@@ -8,10 +8,8 @@ Changelog
paperless-ng 0.9.0
##################

* **Deprecated:** GnuPG. Don't use it. If you're still using it, be aware that it
offers no protection at all, since the passphrase is stored alongside the
encrypted documents themselves. This feature will most likely be removed in future
versions.
* **Deprecated:** GnuPG. :ref:`See this note on the state of GnuPG in paperless-ng. <utilities-encyption>`
This feature will most likely be removed in future versions.

* **Added:** New frontend. Features:

@@ -38,6 +36,25 @@ paperless-ng 0.9.0
multi user solution, however, it allows more than one user to access the website
and set some basic permissions / renew passwords.

* **Modified [breaking]:** All-new mail consumer with customizable filters, actions and
multiple account support. Replaces the old mail consumer. The new mail consumer
needs different configuration but can be configured to act exactly like the old
consumer.


* **Modified:** Changes to the consumer:

* Now uses the excellent watchdog library that should make sure files are
discovered no matter what the platform is.
* The consumer now uses a task scheduler to run consumption processes in parallel.
This means that consuming many documents should be much faster on systems with
many cores.
* Concurrency is controlled with the new settings ``PAPERLESS_TASK_WORKERS``
and ``PAPERLESS_THREADS_PER_WORKER``. See TODO for details on concurrency.
* The consumer no longer blocks the database for extended periods of time.
* An issue with tesseract running multiple threads per page and slowing down
the consumer was fixed.

* **Modified [breaking]:** REST API changes:

* New filters added, other filters removed (case sensitive filters, slug filters)
@@ -64,8 +81,8 @@ paperless-ng 0.9.0
* Rework of the code of the tesseract parser. This is now a lot cleaner.
* Rework of the filename handling code. It was a mess.
* Fixed some issues with the document exporter not exporting all documents when encountering duplicate filenames.
* Consumer rework: now uses the excellent watchdog library, lots of code removed.
* Added a task scheduler that takes care of checking mail, training the classifier and maintaining the document search index.
* Added a task scheduler that takes care of checking mail, training the classifier, maintaining the document search index
and consuming documents.
* Updated dependencies. Now uses Pipenv all around.
* Updated Dockerfile and docker-compose. Now uses ``supervisord`` to run everything paperless-related in a single container.

@@ -77,6 +94,8 @@ paperless-ng 0.9.0
* ``PAPERLESS_DEBUG`` defaults to ``false``.
* The presence of ``PAPERLESS_DBHOST`` now determines whether to use PostgreSQL or
sqlite.
* ``PAPERLESS_OCR_THREADS`` is gone and replaced with ``PAPERLESS_TASK_WORKERS`` and
``PAPERLESS_THREADS_PER_WORKER``. Refer to the config example for details.

* Many more small changes here and there. The usual stuff.

4 changes: 0 additions & 4 deletions docs/configuration.rst
@@ -20,7 +20,3 @@ places.
Copy ``paperless.conf.example`` to any of these locations and adjust it to your
needs.

.. warning::

TBD: explain config options.
4 changes: 4 additions & 0 deletions docs/screenshots.rst
@@ -36,6 +36,10 @@ The old admin is still there and accessible!

.. image:: _static/paperless-9-admin.png

Fancy mail filters!

.. image:: _static/paperless-11-mail-filters.png

Mobile support in the future? This doesn't really work yet.

.. image:: _static/paperless-10-mobile.png