Documentation with recommendations on running Datasette in production without using Docker #514

Open
chrismp opened this issue Jun 21, 2019 · 23 comments

Comments

@chrismp
commented Jun 21, 2019

I've got some SQLite databases too big to push to Heroku or the other services with built-in support in datasette.

So instead I moved my datasette code and databases to a remote server on Kimsufi. In the folder containing the SQLite databases I run the following code.

nohup datasette serve -h 0.0.0.0 *.db --cors --port 8000 --metadata metadata.json > output.log 2>&1 &

When I go to http://my-remote-server.com:8000, the site loads. But I know this is not a good long-term solution to running datasette on this server.

What is the "correct" way to have this site run, preferably on server port 80?

@simonw

Owner

commented Jun 22, 2019

I'm still trying to figure this out myself.

I'm confident that running nginx on port 80 and using it to proxy traffic to Datasette is a sensible way to solve the port problem.

As for running Datasette itself: the two options that seem best to me are some kind of Init.d service or running it under supervisord. I have to admit I haven't worked out the necessary incantation for either of those yet: the solitary instance I have that's not running as a Docker container is sitting in a "screen" instance for the moment!

@simonw

Owner

commented Jun 22, 2019

A section in the Datasette docs that acts as recommendations plus a tutorial for running Datasette on a VPS without using Docker would be excellent.

@simonw changed the title from "Looking for help setting up datasette on server that isn't easily supported in the docs" to "Documentation with recommendations on running Datasette in production without using Docker" on Jun 22, 2019
@simonw

Owner

commented Jun 22, 2019

This is also relevant to Datasette Library #417

@russss

Contributor

commented Jun 22, 2019

On most modern Linux distros, systemd is the easiest answer.

Example systemd unit file (save to /etc/systemd/system/datasette.service):

[Unit]
Description=Datasette
After=network.target

[Service]
Type=simple
User=<username>
WorkingDirectory=/path/to/data
ExecStart=/path/to/datasette serve -h 0.0.0.0 ./my.db
Restart=on-failure

[Install]
WantedBy=multi-user.target

Activate it with:

$ sudo systemctl daemon-reload
$ sudo systemctl enable datasette
$ sudo systemctl start datasette

Logs are best viewed using journalctl -u datasette -f.
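
If the unit file needs to be produced for more than one machine, it can be generated from a few values. The following is only a minimal sketch: `render_unit` is a hypothetical helper (not part of Datasette or systemd), and the user name and paths passed to it are placeholder assumptions to replace with your own.

```shell
#!/bin/sh
# Hypothetical helper (not part of Datasette): prints a unit file in the shape
# shown above. Arguments: service user, data directory, datasette path, db file.
render_unit() {
    cat <<EOF
[Unit]
Description=Datasette
After=network.target

[Service]
Type=simple
User=$1
WorkingDirectory=$2
ExecStart=$3 serve -h 0.0.0.0 ./$4
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
}

# Example values are placeholders; substitute your own.
unit_text=$(render_unit datasette /srv/datasette /srv/datasette/venv/bin/datasette my.db)
printf '%s\n' "$unit_text"
```

The printed text can be reviewed and then saved to /etc/systemd/system/datasette.service before running the activation commands above.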

@simonw

Owner

commented Jun 22, 2019

@simonw

Owner

commented Jun 22, 2019

... and @russss also suggested systemd 21 seconds before I posted that!

@simonw

Owner

commented Jun 22, 2019

Here are some partial notes I have saved from an nginx configuration I've used in the past:

cat /etc/nginx/sites-available/default
server {
    listen 80 default_server;
    listen [::]:80 default_server;

    location / {
        proxy_pass http://127.0.0.1:8001/;
        proxy_set_header Host $host;
    }
...
@simonw

Owner

commented Jun 22, 2019

This example is useful too - I like how it has a Makefile that knows how to set up systemd: https://github.com/pikesley/Queube

@russss

Contributor

commented Jun 22, 2019

> This example is useful too - I like how it has a Makefile that knows how to set up systemd: https://github.com/pikesley/Queube

I wasn't even aware it was possible to add a systemd service at an arbitrary path, but it seems a little messy to me.

Maybe worth noting that systemd does support per-user services which don't require root access. Cool but probably overkill for most people (especially when you're going to need root to listen on port 80 anyway, directly or via a reverse proxy).

@chrismp

Author

commented Jun 22, 2019

> WorkingDirectory=/path/to/data

@russss, Which directory does this represent?

@russss

Contributor

commented Jun 22, 2019

> > WorkingDirectory=/path/to/data
>
> @russss, Which directory does this represent?

It's the working directory (cwd) of the spawned process. In this case if you set it to the directory your data is in, you can use relative paths to the db (and metadata/templates/etc) in the ExecStart command.
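
The effect of the working directory on relative paths can be seen without systemd at all. This is a small sketch using throwaway temporary directories (the directory and file names are made up for the demonstration): the same relative path `./my.db` resolves only when the current directory is the one holding the file.

```shell
#!/bin/sh
# Scratch directories standing in for the data directory and "somewhere else".
data_dir=$(mktemp -d)
other_dir=$(mktemp -d)
touch "$data_dir/my.db"

# From the data directory, the relative path ./my.db resolves to the file.
found_in_data=$(cd "$data_dir" && [ -e ./my.db ] && echo yes || echo no)

# From any other directory, the same relative path points at nothing.
found_elsewhere=$(cd "$other_dir" && [ -e ./my.db ] && echo yes || echo no)

echo "from data dir: $found_in_data"
echo "from elsewhere: $found_elsewhere"
```

WorkingDirectory plays the role of the `cd` here: it decides where relative arguments such as `./my.db` or `metadata.json` are looked up.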

@chrismp

Author

commented Jun 22, 2019

> > > WorkingDirectory=/path/to/data
> >
> > @russss, Which directory does this represent?
>
> It's the working directory (cwd) of the spawned process. In this case if you set it to the directory your data is in, you can use relative paths to the db (and metadata/templates/etc) in the ExecStart command.

In my case, on a remote server, I set up a virtual environment in /home/chris/Env/datasette, and when I activated that environment I ran pip install datasette.

My datasette project is in /home/chris/datatsette-project, so I guess I'd use that directory in the WorkingDirectory parameter?

And the ExecStart parameter would be /home/chris/Env/datasette/lib/python3.7/site-packages/datasette serve -h 0.0.0.0 my.db I'm guessing?

@chrismp

Author

commented Jun 22, 2019

@russss

Actually, here's what I've got in /etc/systemd/system/datasette.service

[Unit]
Description=Datasette
After=network.target

[Service]
Type=simple
User=chris
WorkingDirectory=/home/chris/digital-library
ExecStart=/home/chris/Env/datasette/lib/python3.7/site-packages/datasette serve -h 0.0.0.0 databases/*.db --cors --metadata metadata.json
Restart=on-failure

[Install]
WantedBy=multi-user.target

I ran:

$ sudo systemctl daemon-reload
$ sudo systemctl enable datasette
$ sudo systemctl start datasette

Then I ran:
$ journalctl -u datasette -f

Got this message.

Hint: You are currently not seeing messages from other users and the system.
      Users in groups 'adm', 'systemd-journal', 'wheel' can see all messages.
      Pass -q to turn off this notice.
-- Logs begin at Thu 2019-06-20 00:05:23 CEST. --
Jun 22 19:55:57 ns331247 systemd[16176]: datasette.service: Failed to execute command: Permission denied
Jun 22 19:55:57 ns331247 systemd[16176]: datasette.service: Failed at step EXEC spawning /home/chris/Env/datasette/lib/python3.7/site-packages/datasette: Permission denied
Jun 22 19:55:57 ns331247 systemd[16184]: datasette.service: Failed to execute command: Permission denied
Jun 22 19:55:57 ns331247 systemd[16184]: datasette.service: Failed at step EXEC spawning /home/chris/Env/datasette/lib/python3.7/site-packages/datasette: Permission denied
Jun 22 19:55:58 ns331247 systemd[16186]: datasette.service: Failed to execute command: Permission denied
Jun 22 19:55:58 ns331247 systemd[16186]: datasette.service: Failed at step EXEC spawning /home/chris/Env/datasette/lib/python3.7/site-packages/datasette: Permission denied
Jun 22 19:55:58 ns331247 systemd[16190]: datasette.service: Failed to execute command: Permission denied
Jun 22 19:55:58 ns331247 systemd[16190]: datasette.service: Failed at step EXEC spawning /home/chris/Env/datasette/lib/python3.7/site-packages/datasette: Permission denied
Jun 22 19:55:58 ns331247 systemd[16191]: datasette.service: Failed to execute command: Permission denied
Jun 22 19:55:58 ns331247 systemd[16191]: datasette.service: Failed at step EXEC spawning /home/chris/Env/datasette/lib/python3.7/site-packages/datasette: Permission denied

When I go to the address for my server, I am met with the standard "Welcome to nginx" message:

Welcome to nginx!
If you see this page, the nginx web server is successfully installed and working. Further configuration is required.

For online documentation and support please refer to nginx.org.
Commercial support is available at nginx.com.

Thank you for using nginx.
@russss

Contributor

commented Jun 22, 2019

I'd rather not turn this into a systemd support thread, but you're trying to execute the package directory there. Your datasette executable is probably at /home/chris/Env/datasette/bin/datasette.
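
A quick way to tell the two paths apart before putting one in ExecStart is to check whether it is a directory or an executable file. The sketch below is an illustration only: `describe_exec_target` is a hypothetical helper, and the virtualenv layout is recreated in a temporary directory with an empty stand-in script rather than a real Datasette install.

```shell
#!/bin/sh
# Hypothetical helper: classify a path the way it matters for ExecStart.
describe_exec_target() {
    if [ -d "$1" ]; then
        echo "directory"
    elif [ -f "$1" ] && [ -x "$1" ]; then
        echo "executable"
    elif [ -e "$1" ]; then
        echo "not-executable"
    else
        echo "missing"
    fi
}

# Stand-in layout mimicking a virtualenv (created only for this demo).
venv=$(mktemp -d)
mkdir -p "$venv/bin" "$venv/lib/python3.7/site-packages/datasette"
printf '#!/bin/sh\n' > "$venv/bin/datasette"
chmod +x "$venv/bin/datasette"

package_kind=$(describe_exec_target "$venv/lib/python3.7/site-packages/datasette")
script_kind=$(describe_exec_target "$venv/bin/datasette")

echo "site-packages/datasette: $package_kind"
echo "bin/datasette: $script_kind"
```

On a real server, pointing the same check at the path in ExecStart should report "executable"; with the virtualenv activated, `command -v datasette` prints the path that the shell would run.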

@chrismp

Author

commented Jun 23, 2019

@russss

Thanks, just one more thing.

I edited datasette.service:

[Unit]
Description=Datasette
After=network.target

[Service]
Type=simple
User=chris
WorkingDirectory=/home/chris/digital-library
ExecStart=/home/chris/Env/datasette/bin/datasette serve -h 0.0.0.0 databases/*.db --cors --metadata metadata.json
Restart=on-failure

[Install]
WantedBy=multi-user.target

Then ran:

$ sudo systemctl daemon-reload
$ sudo systemctl enable datasette
$ sudo systemctl start datasette

But the logs from journalctl show this datasette error:

Jun 23 23:31:41 ns331247 datasette[1771]: Error: Invalid value for "[FILES]...": Path "databases/*.db" does not exist.
Jun 23 23:31:44 ns331247 datasette[1778]: Usage: datasette serve [OPTIONS] [FILES]...
Jun 23 23:31:44 ns331247 datasette[1778]: Try "datasette serve --help" for help.

But the databases directory does exist in the directory specified by WorkingDirectory. Is this a datasette problem or did I write something incorrectly in the .service file?

@simonw

Owner

commented Jun 23, 2019

I suggest trying a full path in ExecStart like this:

ExecStart=/home/chris/Env/datasette/bin/datasette serve -h 0.0.0.0 /home/chris/digital-library/databases/*.db --cors --metadata /home/chris/digital-library/metadata.json

That should eliminate the chance of some kind of path confusion.

@chrismp

Author

commented Jun 24, 2019

@simonw

Owner

commented Jun 24, 2019

I'm suspicious of the wildcard. Does it work if you do the following?

ExecStart=/home/chris/Env/datasette/bin/datasette serve -h 0.0.0.0 /home/chris/digital-library/databases/actual-database.db --cors --metadata /home/chris/digital-library/metadata.json

If that does work then it means the ExecStart line doesn't support bash wildcard expansion. You'll need to create a separate script like this:

#!/bin/bash
/home/chris/Env/datasette/bin/datasette serve -h 0.0.0.0 /home/chris/digital-library/databases/*.db --cors --metadata /home/chris/digital-library/metadata.json

Then save that as /home/chris/digital-library/run-datasette.sh and try this:

ExecStart=/home/chris/digital-library/run-datasette.sh
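
The difference between the two cases can be reproduced in a scratch directory. This is a minimal sketch with made-up file names: a pattern such as `databases/*.db` only turns into file names when a shell expands it, and a program started without a shell receives the pattern as one literal argument, which is what the "Path does not exist" error above suggests happened.

```shell
#!/bin/sh
# Scratch directory with two stand-in database files (names are made up).
demo_dir=$(mktemp -d)
mkdir "$demo_dir/databases"
touch "$demo_dir/databases/one.db" "$demo_dir/databases/two.db"
cd "$demo_dir" || exit 1

# Report how many arguments a program would receive.
count_args() { echo "$#"; }

# Through a shell: the pattern expands to one argument per matching file.
with_shell=$(count_args databases/*.db)

# Without shell expansion (simulated here by quoting): one literal argument.
without_shell=$(count_args 'databases/*.db')

echo "with shell expansion: $with_shell argument(s)"
echo "without shell expansion: $without_shell argument(s)"
```

A wrapper script works because bash performs the expansion before Datasette starts. The script also needs execute permission (for example `chmod +x /home/chris/digital-library/run-datasette.sh`), otherwise systemd will report "Permission denied" again.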
@chrismp

Author

commented Jun 25, 2019

Yep, that worked to get the site up and running at my-server.com:8000 but when I edited run-datasette.sh to contain this...

#!/bin/bash
/home/chris/Env/datasette/bin/datasette serve -h 0.0.0.0 -p 80 /home/chris/digital-library/databases/*.db --cors --metadata /home/chris/digital-library/metadata.json

I got this error.

Jun 25 02:42:41 ns331247 run-datasette.sh[747]: [2019-06-25 02:42:41 +0200] [752] [INFO] Goin' Fast @ http://0.0.0.0:80
Jun 25 02:42:41 ns331247 run-datasette.sh[747]: [2019-06-25 02:42:41 +0200] [752] [ERROR] Unable to start server
Jun 25 02:42:41 ns331247 run-datasette.sh[747]: Traceback (most recent call last):
Jun 25 02:42:41 ns331247 run-datasette.sh[747]:   File "uvloop/loop.pyx", line 1111, in uvloop.loop.Loop._create_server
Jun 25 02:42:41 ns331247 run-datasette.sh[747]:   File "uvloop/handles/tcp.pyx", line 89, in uvloop.loop.TCPServer.bind
Jun 25 02:42:41 ns331247 run-datasette.sh[747]:   File "uvloop/handles/streamserver.pyx", line 95, in uvloop.loop.UVStreamServer._fatal_error
Jun 25 02:42:41 ns331247 run-datasette.sh[747]:   File "uvloop/handles/tcp.pyx", line 87, in uvloop.loop.TCPServer.bind
Jun 25 02:42:41 ns331247 run-datasette.sh[747]:   File "uvloop/handles/tcp.pyx", line 26, in uvloop.loop.__tcp_bind
Jun 25 02:42:41 ns331247 run-datasette.sh[747]: PermissionError: [Errno 13] Permission denied
Jun 25 02:42:41 ns331247 run-datasette.sh[747]: During handling of the above exception, another exception occurred:
Jun 25 02:42:41 ns331247 run-datasette.sh[747]: Traceback (most recent call last):
Jun 25 02:42:41 ns331247 run-datasette.sh[747]:   File "/home/chris/Env/datasette/lib/python3.7/site-packages/sanic/server.py", line 591, in serve
Jun 25 02:42:41 ns331247 run-datasette.sh[747]:     http_server = loop.run_until_complete(server_coroutine)
Jun 25 02:42:41 ns331247 run-datasette.sh[747]:   File "uvloop/loop.pyx", line 1451, in uvloop.loop.Loop.run_until_complete
Jun 25 02:42:41 ns331247 run-datasette.sh[747]:   File "uvloop/loop.pyx", line 1684, in create_server
Jun 25 02:42:41 ns331247 run-datasette.sh[747]:   File "uvloop/loop.pyx", line 1116, in uvloop.loop.Loop._create_server
Jun 25 02:42:41 ns331247 run-datasette.sh[747]: PermissionError: [Errno 13] error while attempting to bind on address ('0.0.0.0', 80): permission denied
Jun 25 02:42:41 ns331247 run-datasette.sh[747]: [2019-06-25 02:42:41 +0200] [752] [INFO] Server Stopped
@JesperTreetop

commented Jul 8, 2019

@chrismp: Ports below 1024 are privileged and can usually only be bound by root, so it makes sense that port 8000 works but 80 doesn't when you're running as the user chris.

See this generic question-and-answer and this systemd question-and-answer for more information about ways to skin this cat. Without knowing your specific circumstances, extending those privileges to that service/executable/user, proxying requests through something like nginx, or looking at what the nginx systemd job does to listen on port 80 all sound like good ways to start.

At this point, this is more generic systemd/Linux support than a Datasette issue, which is why a complete rando like me is able to contribute anything.
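
As a rough illustration of the rule, the check below treats any port under a threshold as privileged. It is a sketch, not a definitive test: `is_privileged_port` is a hypothetical helper, and the default threshold of 1024 is the conventional Linux value, which is assumed here rather than read from the running system.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if PORT is in the privileged range.
# The optional second argument overrides the assumed threshold of 1024.
is_privileged_port() {
    port=$1
    start=${2:-1024}
    [ "$port" -lt "$start" ]
}

for p in 80 8000; do
    if is_privileged_port "$p"; then
        echo "port $p: privileged, an ordinary user cannot bind it by default"
    else
        echo "port $p: unprivileged, an ordinary user can bind it"
    fi
done
```

On Linux kernels that provide it, the actual threshold is exposed as the `net.ipv4.ip_unprivileged_port_start` sysctl, so the assumed default may not match every system.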

@chrismp

Author

commented Jul 8, 2019

In datasette.service, I edited

User=chris

To...

User=root

It worked. I can access http://my-server.com. I hope this is safe. Thanks for all the help, everyone.

@simonw

Owner

commented Jul 9, 2019

Running as root isn't ideal because it means that if there are any security vulnerabilities in Datasette an attacker could use them to execute any command they like as root on your machine.

I'm moderately confident there aren't any vulnerabilities like that, but I'm definitely not 100% certain!

My recommendation is to run Datasette on 127.0.0.1 port 8001 and then have nginx proxy port 80 to it. See #514 (comment) for suggested nginx configuration.
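
Applied to the wrapper script from earlier in this thread, that recommendation could look like the sketch below. It reuses the paths and the `-h`/`-p` options already shown above, and writes the script to a temporary directory so it can be reviewed first; the `exec` line and the scratch location are assumptions of this example, not something the thread prescribes.

```shell
#!/bin/sh
# Write the wrapper to a scratch directory so it can be reviewed before installing.
out_dir=$(mktemp -d)
script="$out_dir/run-datasette.sh"

cat > "$script" <<'EOF'
#!/bin/bash
# Listen on loopback only; nginx on port 80 forwards requests to this port.
exec /home/chris/Env/datasette/bin/datasette serve -h 127.0.0.1 -p 8001 \
    /home/chris/digital-library/databases/*.db \
    --cors --metadata /home/chris/digital-library/metadata.json
EOF
chmod +x "$script"

# Parse-check the wrapper without running it.
sh -n "$script" && syntax_ok=yes || syntax_ok=no
echo "wrote $script (syntax ok: $syntax_ok)"
```

Because port 8001 is unprivileged, the service no longer needs root, so `User=` in datasette.service can go back to an ordinary account such as chris.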

@ipmb

commented Oct 8, 2019

If you are just using Nginx to open a reserved port, systemd can do that on its own: https://www.freedesktop.org/software/systemd/man/systemd.socket.html
