Web application offering browsing, search, retrieval, addition, and deletion of documents in a repository, with user registration, authentication, and directory-specific authorization.
This application is a web server that manages, and provides selective access to, a repository of documents.
The intended use case is a person or organization that has possession, on its own server, of a collection of documents in various formats and wants to make various parts of the collection accessible for various actions by various categories of users using web browsers.
This project was initially developed at Learners Guild in the course of an apprenticeship in full-stack web development. The learning objectives served by the project included:
- Encrypted server-client communication
- Cookie-based session persistence
- Role-based authorization
- Web-database integration
- Web-email integration
- User administration
- Web-filesystem integration
- File-access permission management
- Cross-format document relevance discovery
- Document display and delivery
- Controlled distributed document repository modification
- Security of administrative and user secrets
- Protection of customizations from deletion by updates
Tools used by the application include `pg` (node-postgres), the SendGrid Web API, and `pdftotext`, a utility available in the `poppler-utils` package. Searching is currently based on those operations.
The application is a work in progress. Its intended functionalities include the following (“*” = not yet implemented):
User identity capabilities:
- Login with temporary username (“UID”) issued on registration.
User document capabilities:
- Browse through the directory tree.
- Browser-based return to previous tree nodes.
- *Breadcrumb-based return to previous tree nodes.
- Display and download specific documents.
- Search with query strings for documents a user is authorized to see.
- Filesystem-based document addition and deletion.
- *Browser-based document addition and deletion.
Role-based document access:
- Distinct permissions for reading, adding, and deleting.
- Directory-specific permissions.
- Propagation of permissions to subdirectories.
- Multi-role users having the union of their role permissions.
- Pruning of redundant entries in displayed directory trees.
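The access rules listed above can be illustrated with a minimal sketch. It is not the application's actual implementation: the data model, the category names, the directory names, and all function names below are hypothetical and chosen only to show how a union of role permissions, propagation to subdirectories, and pruning of redundant entries might fit together.

```javascript
// Hypothetical data model: each category maps an action to the directories
// on which that action is explicitly permitted. All names are illustrative.
const categoryPermissions = {
  public: { read: ['reports/public'], add: [], delete: [] },
  staff: { read: ['reports', 'manuals'], add: ['manuals'], delete: [] },
};

// True if `path` is `dir` itself or lies anywhere beneath it.
function isWithin(dir, path) {
  return path === dir || path.startsWith(dir + '/');
}

// Union of the directories that any of the user's categories permits for the action.
function permittedDirs(userCategories, action) {
  const dirs = new Set();
  for (const category of userCategories) {
    const perms = categoryPermissions[category];
    const allowed = (perms && perms[action]) || [];
    allowed.forEach(dir => dirs.add(dir));
  }
  return [...dirs];
}

// A permission on a directory propagates to everything inside it.
function mayAct(userCategories, action, path) {
  return permittedDirs(userCategories, action).some(dir => isWithin(dir, path));
}

// Drop directories already covered by a permitted ancestor, so that a
// displayed tree lists each accessible subtree only once.
function pruneRedundant(dirs) {
  return dirs
    .filter(dir => !dirs.some(other => other !== dir && isWithin(other, dir)))
    .sort();
}
```

In this sketch a user in both `public` and `staff` may read anything under `reports` or `manuals`, and the pruned list of readable roots omits `reports/public` because `reports` already covers it.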
Administrator (“curator”) capabilities:
- File-based customization of the application configuration (see below).
- Registration as a curator with a secret code.
- *Web-based definition of user roles (“categories”).
- *Web-based assignment of permissions to categories.
- Assignment of users to categories.
- Assignment of permanent UIDs to users.
- Editing of user registration records.
- Deregistration of users.
- Events triggering notices:
  - User registration.
  - User deregistration.
  - Curator editing of a user registration record.
  - Curator deregistration of a user.
- Parties receiving notices:
  - Affected user.
  - Performing curator.
  - Application administrator.
Localization capabilities:
- File-based whole-application language localization.
- *User-based dynamic localization.
Suggestions on priorities for the further development of the project, and of course bug reports, are welcome. Feel free to file issues at the repository.
Efforts have been made, and are continuing, to make the user interface comply with level AAA of [WCAG 2.1][wcag] so that the application is reasonably accessible to persons with disabilities.
Accessibility features include:
- Natural navigation order
- Explicit main and sectional region structure
- Visible focus
- Mouse-free operability
- Semantic headings
- Descriptive titling
- Contrastive colors
- Purpose-labeled controls
- Appearance of form-error messages after the offending elements
- Declared page language
The application is designed so that the texts in its interface can exist in multiple, linguistically distinct, versions, and choices among the versions can be made. This feature is described in the configuration instructions.
As distributed for installation, the application is configured to allow you to replicate the demonstration cited above, including the sample documents.
To navigate back up the document tree when browsing, use the browser’s back button.
These instructions presuppose that (1) npm, PostgreSQL, and pdftotext are installed, (2) there is a PostgreSQL database cluster, (3) PostgreSQL is running, (4) when you connect to the cluster you are a PostgreSQL superuser, and (5) your PostgreSQL configuration permits trusted local IPv4 connections from you and from the PostgreSQL user that this application will create. If you get authentication errors running the `revive_db` script described below, you can edit your `pg_hba.conf` file, which may be located in `/usr/local/var/postgres`. Insert the following lines above the existing similar line of type `host`, then restart PostgreSQL with the applicable command on your server, such as `sudo service postgresql restart` or `pg_ctl restart`. You will replace «docsearchowner» with the value of `PGUSER` that you choose (see below).

    host all «you» 127.0.0.1/32 trust
    host all «docsearchowner» 127.0.0.1/32 trust

Your copy of this project will be located in its own directory, inside some other directory that you may choose or create. For example, you can create that parent directory inside your own home directory’s `Documents` subdirectory and call it `projects`.
Make that parent directory your working directory.
Clone this project’s repository into it, thereby creating the project directory, named `docsearch`, by executing:

    git clone https://github.com/jrpool/docsearch.git docsearch

Make the project directory your working directory.
Create a directory named
If there is no `access.log` file in the `logs` directory, rename the `access-init.log` file there to `access.log`.
Obtain an account at SendGrid. For development or light production use, the free plan with a limit of 100 messages per day will suffice. (Each complete user registration entails sending 4 messages.) Note the API key that SendGrid issues to you.
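As a rough check on whether the free plan suffices, the figures above can be turned into a daily capacity estimate. This is only an arithmetic sketch; the constant and function names are hypothetical, and the estimate ignores the other notices the application sends (for example on deregistration or on curator edits), which also count against the limit.

```javascript
// Figures taken from the text: the free plan allows 100 messages per day,
// and each complete user registration sends 4 messages.
const DAILY_MESSAGE_LIMIT = 100;
const MESSAGES_PER_REGISTRATION = 4;

// Largest number of complete registrations whose messages fit in the daily limit.
function maxDailyRegistrations(limit, perRegistration) {
  return Math.floor(limit / perRegistration);
}

console.log(maxDailyRegistrations(DAILY_MESSAGE_LIMIT, MESSAGES_PER_REGISTRATION)); // prints 25
```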
Create a file named `.env` at the root of your project directory and populate it with the following content, amended as you wish. This file will be protected from modification by any updates of the application. Details:
- `CURATOR_CAT` and `PUBLIC_CAT` are the categories whose users are to have the access rights of curators (maximum rights) and of the general public (minimum rights), respectively.
- `DAEMON` can be left as is, but, if you install two or more instances of this application on the same server, each must have a distinct value of this variable.
- `MSGS` should have the values `demomsgs` while you are running the demonstration. When you add your own data and configuration, change these to match the names you give to your directories in the `src/db` directories and the file containing your messages. Updates of the application may update `demomsgs`, but will not interfere with your own customizations of these, as long as you give them different names.
- `LINK_PREFIX` is equal to any application prefix you use with a reverse proxy server, or `''` if none. For example, if requests to `https://yourdomain.org/docs/…` are passed to the application, the value should be `/docs`.
- If you are doing development on the application, change the value of `NODE_ENV`.
- See below for information about the `LANG` variable, and above for information about the `SENDGRID_API_KEY` variable.
- `PGDATABASE` and `PGUSER` must be unique to this installation if you have multiple installations on the same host. They both are deleted and recreated in the course of installation, so they should exist only for this installation. `PGUSER` is a PostgreSQL user, but not necessarily an operating-system user.
- `PORT` is the port on which the application will listen for requests. If users will connect via a reverse proxy server, make it a port that the host’s firewall does not permit incoming traffic to address. (Letting users connect directly to the port is considered secure only if user clients are on the same host as the application, because otherwise unencrypted transmission of all content, including passwords and confidential documents, will occur.)
- `STYLESHEET` is the base of the name of your stylesheet file in `public`. You can leave it as `demostyle`. If you want to customize any styles, copy `demostyle.css` to a differently named file, customize the copy, and reference its filename in `STYLESHEET`.
- The `TEMP_UID_MAX` value is the largest number of registrants you expect to still have temporary UIDs at the same time, before curators assign permanent UIDs to them.
- `URL` is the URL the application will tell users to use in reaching the application. Whether it specifies `https` depends on the user’s required behavior, not on the protocol used by the application itself (see the next paragraph).
- Decide whether to make the application require the `https` protocol. You may have it use `http` and still require users to connect with `https`, by passing all requests through a reverse proxy server that communicates with users via `https` but with the application via `http`. The deployed live demonstration does this. It uses Nginx as a reverse proxy server, with credentials obtained from
- If users connect with
  - `HTTPS_CERT` to the path to your SSL/TLS certificate.
  - `HTTPS_KEY` to the path to your SSL/TLS private key.
- If users connect with
- If users connect with

    COOKIE_EXPIRE_DAYS=7
    CURATOR_CAT=0
    CURATOR_KEY=ASecretKey
    DAEMON=demodocsearch
    DOC_DIR=docs
    DOMAIN=yourdomain.org
    FROM_EMAIL=firstname.lastname@example.org
    FROM_NAME='Documents from Your Organization'
    HTTPS_CERT=/etc/letsencrypt/live/yourdomain.org/fullchain.pem
    HTTPS_KEY=/etc/letsencrypt/live/yourdomain.org/privkey.pem
    LANG=eng
    LINK_PREFIX=/ds
    MSGS=msgs
    NODE_ENV=production
    PGDATABASE=demodocs
    PGHOST=localhost
    PGPASSWORD=null
    PGPORT=5432
    PGUSER=demodocmaster
    # PORT must be 1024 or greater to allow a non-root process owner.
    PORT=3000
    PROTOCOL=https
    PUBLIC_CAT=1
    REG_EMAIL=email@example.com
    REG_NAME='Your Administrator'
    SECRET=AnAuthenticationSecret
    SEED_DIR=seed
    SENDGRID_API_KEY=wHaTeVer.SenDGriDgIvEs.YoU
    STYLESHEET=demostyle
    TEMP_UID_MAX=3
    URL=https://www.yourdomain.org/ds/

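Before starting the application, it can help to sanity-check a few of these values. The sketch below is not part of the application; `checkSettings` is a hypothetical helper. Two of its checks rest on assumptions that are inferred from the sample values rather than stated by the application: that `PROTOCOL=https` requires both `HTTPS_CERT` and `HTTPS_KEY`, and that the path part of `URL` corresponds to `LINK_PREFIX`.

```javascript
// Returns a list of problems found in a plain object of settings,
// such as process.env after the .env file has been loaded.
function checkSettings(env) {
  const problems = [];

  // The sample file notes that PORT must be 1024 or greater for a non-root owner.
  const port = Number(env.PORT);
  if (!Number.isInteger(port) || port < 1024 || port > 65535) {
    problems.push('PORT should be an integer from 1024 to 65535');
  }

  // TEMP_UID_MAX is a count of registrants, so it should be a positive integer.
  const tempUidMax = Number(env.TEMP_UID_MAX);
  if (!Number.isInteger(tempUidMax) || tempUidMax < 1) {
    problems.push('TEMP_UID_MAX should be a positive integer');
  }

  // Assumption: serving https directly needs both a certificate and a key path.
  if (env.PROTOCOL === 'https' && !(env.HTTPS_CERT && env.HTTPS_KEY)) {
    problems.push('PROTOCOL=https needs HTTPS_CERT and HTTPS_KEY');
  }

  // Assumption: the path part of URL matches LINK_PREFIX, as in the sample values.
  try {
    const path = new URL(env.URL).pathname.replace(/\/+$/, '');
    if (path !== (env.LINK_PREFIX || '')) {
      problems.push('URL path does not match LINK_PREFIX');
    }
  } catch (error) {
    problems.push('URL is not a valid absolute URL');
  }

  return problems;
}

// Usage with a subset of the sample values shown above.
const sample = {
  PORT: '3000',
  TEMP_UID_MAX: '3',
  PROTOCOL: 'https',
  HTTPS_CERT: '/etc/letsencrypt/live/yourdomain.org/fullchain.pem',
  HTTPS_KEY: '/etc/letsencrypt/live/yourdomain.org/privkey.pem',
  LINK_PREFIX: '/ds',
  URL: 'https://www.yourdomain.org/ds/',
};
console.log(checkSettings(sample)); // prints []
```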
Install required dependencies (you can see them listed in `package.json`) by executing `npm i`. The dependencies that this installs will depend on whether you defined the Node environment as `production` in step 0.
Create your document directory (named in `DOC_DIR`) in `public`, as the root of your repository. Populate it with subdirectories and files. You may include symbolic links in it, and users with access to those links will also have access to the files and directories that they reference. This feature lets you grant multiple categories of users access to a particular file or directory without making copies of it. But the feature requires care, because it is possible to mistakenly include a symbolic link to directories and files, anywhere in your file system, that you do not intend to disclose.
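One way to guard against accidental disclosure is to review which symbolic links lead outside the repository root. The sketch below shows only the containment test; it is not part of the application, and the function names are hypothetical. It assumes you have already resolved each link to an absolute, symlink-free path with `/` separators (on Unix-like systems, Node's `fs.realpathSync` returns such paths). A link that leaves the root is not necessarily a mistake, so the result is a list to review rather than a list of errors.

```javascript
// True if `targetRealPath` is the repository root or lies beneath it.
// Both arguments are assumed to be absolute, already-resolved paths.
function isInsideRoot(rootRealPath, targetRealPath) {
  const root = rootRealPath.replace(/\/+$/, '');
  return targetRealPath === root || targetRealPath.startsWith(root + '/');
}

// Given [linkPath, resolvedPath] pairs, list the links that lead outside the root.
function escapingLinks(rootRealPath, resolvedLinks) {
  return resolvedLinks
    .filter(([, resolved]) => !isInsideRoot(rootRealPath, resolved))
    .map(([link]) => link);
}
```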
Create your seed directory (named in `SEED_DIR`) in `src/db`. Copy the `demoseed` files into it. Edit them to define the categories of users you want to have and their access rights to directories in your repository. The user access rights must conform to this application’s fundamental principle that permission to do something to a directory implies permission to do the same thing to all of its descendants. The names of categories in `seedcat.sql` are internal to the database, so they should each begin with a letter or `_` and contain only letters, digits, and `_` (thus, no spaces).
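The naming rule for categories can be expressed as a regular expression. This is a minimal sketch, with a hypothetical function name, and it assumes that “letters” means unaccented ASCII letters.

```javascript
// Begins with a letter or underscore; continues with letters, digits, or underscores.
const CATEGORY_NAME = /^[A-Za-z_][A-Za-z0-9_]*$/;

function isValidCategoryName(name) {
  return CATEGORY_NAME.test(name);
}
```

Under this rule `staff_2` and `_guest` are acceptable, while `2staff` and `board members` are not.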
Copy `src/server/demomsgs.js` into the same `src/server` directory, giving your copy the name you specified as `MSGS`. In your copy, modify the values of the properties in the `eng` object to conform to your requirements. Among the properties that you will probably need to redefine are
If you wish to add an additional language, add an object like `eng` to your message file, replacing the English values of the properties with strings in the other language. Name the new object with the ISO 639-3 alpha-3 code of that language. Add it to the export list at the end of the file. To make that language the language of the application’s user interface, replace `eng` with that code as the value of the `LANG` environment variable in your `.env` file. This version of the application does not yet support on-the-fly localization per user or browser preferences.
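The shape of such a message file might resemble the following sketch. The property names and texts are invented for illustration and are not the ones in `demomsgs.js`; `fra` is the ISO 639-3 code for French. The `messagesFor` helper is hypothetical and only shows how a `LANG` value could select one of the objects. How the application itself handles an undefined language may differ.

```javascript
// Hypothetical message objects, one per language, named by ISO 639-3 code.
const eng = { greeting: 'Welcome', logout: 'Log out' };
const fra = { greeting: 'Bienvenue', logout: 'Se déconnecter' };

// Stand-in for the export list at the end of a message file.
const messages = { eng, fra };

// Select the object named by the LANG setting, failing loudly if it is absent.
function messagesFor(lang) {
  if (!Object.prototype.hasOwnProperty.call(messages, lang)) {
    throw new Error(`No messages defined for language "${lang}"`);
  }
  return messages[lang];
}

console.log(messagesFor('fra').greeting); // prints Bienvenue
```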
Once the application is installed, create and populate the database by executing `npm run revive_db`.
There are 3 ways to start the application. In each case, make the project directory your working directory first.
If you have chosen to install a development environment, execute `npm run start_dev`. This will run the application under `nodemon`, automatically restarting the application when you change files or their content, to ensure that the changes are live.
If you have installed a production environment and want to test it, execute
If you have installed a production environment and want to launch it as a daemon, so that it is detached from your command-line environment and restarts when the server reboots, execute `npm run start_daemon`. To stop the application after that, execute `npm run stop_daemon`. (On some systems it is necessary to execute these commands as a superuser, namely as `sudo npm run start_daemon` and `sudo npm run stop_daemon`.)
In a production environment, neither start method can be relied on to adapt to changes you make in the code. So, if you have made changes and want to test them, stop the application with `npm run stop_daemon` and then start it again.
To access the application while it is running, use a web browser to request the application’s port on your server, such as:
When you access the application with your browser, register yourself as a curator. To obtain curator status, enter the `CURATOR_KEY` value into the “For administrative use” text field. Then, when you log in, you will be a curator.