Setting up a Zotero translation-server instance for citation metadata #82
Comments
Here is a brief summary of the web server config (I will put more details later):

#!/bin/bash
# Update packages
sudo apt update
sudo apt upgrade -y
# Install nginx, nodejs, npm and supervisor
sudo apt install -y nginx supervisor nodejs npm
# Update npm to the latest version
sudo npm i npm -g
# Add new user "translate"
...
# Clone translation-server repo and set it up
...
# Configure supervisor to run translation-server at boot time
...
# Install certbot to take care of SSL
...
# Configure Nginx
...
# Reboot the box

Here is nginx config file:

# HTTP/HTTPS translation-server
server {
    if ($host != translate.manubot.org) {
        return 404;
    }
    listen 80;
    listen 443 ssl;
    server_name translate.manubot.org;
    ssl_certificate /etc/letsencrypt/live/translate.manubot.org/fullchain.pem; # managed by Certbot
    ssl_certificate_key /etc/letsencrypt/live/translate.manubot.org/privkey.pem; # managed by Certbot
    include /etc/letsencrypt/options-ssl-nginx.conf; # managed by Certbot
    ssl_dhparam /etc/letsencrypt/ssl-dhparams.pem; # managed by Certbot
    location / {
        proxy_pass http://127.0.0.1:1969;
        proxy_set_header X-Forwarded-Host $server_name;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}

And here is the supervisor config file which starts the translation-server process at boot time:

[program:translation-server]
command=node src/server.js
directory=/home/translate/translation-server
stdout_logfile=/var/log/supervisor/translation-server.log
stderr_logfile=/var/log/supervisor/translation-server.err
user=translate
group=translate
autostart=true
autorestart=true
priority=991
stopsignal=KILL
|
Thanks @dongbohu for the setup protocol. I've got a few questions/ideas for additional improvements.
What command do we run to add an ssh public key to allow administrator login?
If the npm application crashes, will supervisor restart it? The goal here should be to make this server very robust without the need for human attention.
Is there a way we can monitor usage? Specifically, it'd be nice to see a summary of what User Agents were calling the API (see #83) and what API calls users are making.
When I run
We might want to implement caching, since it's best to minimize the outgoing API calls such that we don't exceed rate limits. Where is the right place to implement caching? On the nginx level or at the translation-server level? |
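One way to approach the usage-monitoring question is to summarize the nginx access log. The sketch below assumes nginx is writing its default "combined" log format (the config posted in this thread doesn't set a custom one); the function name and the sample log lines are made up for illustration.

```python
import re
from collections import Counter

# Pattern for nginx's default "combined" log format (an assumption about this server).
# The last quoted field on each line is the User-Agent header.
COMBINED = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def summarize(lines):
    """Count requests per User-Agent and per (method, endpoint) pair."""
    agents = Counter()
    endpoints = Counter()
    for line in lines:
        match = COMBINED.match(line)
        if match is None:
            continue  # skip lines that are not in the combined format
        agents[match["agent"]] += 1
        # Drop the query string so /export?format=... is counted as /export
        endpoints[(match["method"], match["path"].split("?")[0])] += 1
    return agents, endpoints


# Made-up sample lines, for illustration only
sample = [
    '203.0.113.5 - - [10/Dec/2018:12:00:00 +0000] "POST /web HTTP/1.1" 200 512 "-" "python-requests/2.20.0"',
    '203.0.113.5 - - [10/Dec/2018:12:00:01 +0000] "POST /export?format=bibtex HTTP/1.1" 200 734 "-" "python-requests/2.20.0"',
    '198.51.100.7 - - [10/Dec/2018:12:05:00 +0000] "POST /search HTTP/1.1" 200 128 "-" "curl/7.58.0"',
]
agents, endpoints = summarize(sample)
```

On the server, the same function could be fed the lines of the real access log (on Ubuntu this is typically /var/log/nginx/access.log, though that path is also an assumption here).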
@dhimmel I tweaked the supervisor config file so that the translation-server process will automatically restart after a crash. The logging output is now in
About the cache, it is probably easier to configure it in Nginx. I will try it. |
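To double-check the restart behavior, the supervisor program section can be inspected programmatically. This is a minimal sketch using Python's standard configparser: the helper name is hypothetical, and the embedded config text is an abbreviated copy of the one posted earlier in this thread rather than the file on the server. The fallback values reflect supervisor's documented defaults (autostart=true, autorestart=unexpected).

```python
import configparser

# Abbreviated copy of the supervisor config posted in this thread
SAMPLE = """
[program:translation-server]
command=node src/server.js
directory=/home/translate/translation-server
user=translate
autostart=true
autorestart=true
stopsignal=KILL
"""


def restart_settings(config_text, program="translation-server"):
    """Return the boot/restart settings for one supervisor program section."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    section = parser[f"program:{program}"]
    autostart = section.get("autostart", "true").lower()
    autorestart = section.get("autorestart", "unexpected").lower()
    return {
        "starts_at_boot": autostart == "true",
        # "true" restarts after any exit; "unexpected" only after exit codes
        # that are not listed in the program's exitcodes setting
        "restarts_after_any_exit": autorestart == "true",
        "autorestart": autorestart,
    }


settings = restart_settings(SAMPLE)
```

With autorestart=true, supervisor restarts the process whenever it exits, which matches the goal of keeping the server running without human attention.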
@dongbohu and I looked into the cache, but it wasn't working. I am thinking it is because our requests are passed as the POST body. Do we need to do something along the lines of the approach here? |
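The POST-body suspicion is plausible: nginx's `proxy_cache_methods` defaults to GET and HEAD, and its default cache key is built from the request URI rather than the body, so two different `/web` queries would look identical to a URL-keyed cache. The sketch below illustrates the general idea of keying a cache on the endpoint plus a hash of the POST body. It is an in-memory illustration only, not what was deployed; `PostCache` and `fake_fetch` are hypothetical names.

```python
import hashlib
import time


class PostCache:
    """Tiny in-memory cache keyed on the endpoint path and the POST body."""

    def __init__(self, fetch, ttl_seconds=3600, clock=time.monotonic):
        self._fetch = fetch      # callable(path, body) that performs the real request
        self._ttl = ttl_seconds  # how long a cached response stays valid
        self._clock = clock
        self._store = {}

    @staticmethod
    def key(path, body):
        # Hash the body so that long URLs or JSON payloads give short, fixed-size keys
        return f"{path}:{hashlib.sha256(body).hexdigest()}"

    def post(self, path, body):
        cache_key = self.key(path, body)
        now = self._clock()
        hit = self._store.get(cache_key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # still fresh: no outgoing request
        response = self._fetch(path, body)
        self._store[cache_key] = (now, response)
        return response


# Stand-in for the real HTTP call, so the sketch runs without network access
calls = []


def fake_fetch(path, body):
    calls.append((path, body))
    return f"response {len(calls)}"


cache = PostCache(fake_fetch, ttl_seconds=60)
first = cache.post("/web", b"https://example.com/a")
second = cache.post("/web", b"https://example.com/a")  # same body: served from cache
third = cache.post("/web", b"https://example.com/b")   # different body: new request
```

The same principle would apply at the nginx level: the cache key has to incorporate the request body, and POST has to be enabled as a cacheable method.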
Merges #84
Closes #70
Refs zotero/translation-server#51

Queries Manubot's translation-server instance at https://translate.manubot.org. Setup information for this instance is available at #82.

Adds functions in manubot.cite.zotero to query the web, search, and export endpoints of a translation-server. Adds tests in test_zotero.py covering this functionality.

translation-server queries are now used for URL and ISBN citation metadata retrieval. These methods now use fallbacks: metadata is first collected via translation-server, and if that fails, the preexisting methods are used.
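The fallback behavior described above can be sketched as a loop over retrieval functions that returns the first usable result. This is an illustration of the pattern only, not the actual Manubot implementation; every function name and the placeholder identifier below are hypothetical.

```python
def first_successful(identifier, retrievers):
    """Try each (name, function) pair in order and return the first non-empty result."""
    failures = []
    for name, retrieve in retrievers:
        try:
            metadata = retrieve(identifier)
        except Exception as error:
            # Record the failure and fall through to the next method
            failures.append(f"{name}: {error}")
            continue
        if metadata:
            return name, metadata
        failures.append(f"{name}: empty result")
    raise LookupError(f"no metadata for {identifier!r}; " + "; ".join(failures))


def via_translation_server(identifier):
    # Hypothetical stand-in that simulates the server being unavailable
    raise ConnectionError("translation-server unreachable")


def via_preexisting_method(identifier):
    # Hypothetical stand-in for a method that predates translation-server support
    return {"title": "Example title", "ISBN": identifier}


source, metadata = first_successful(
    "isbn-placeholder",
    [
        ("translation-server", via_translation_server),
        ("preexisting", via_preexisting_method),
    ],
)
```

Ordering the list puts translation-server first, so the older methods are only reached when it raises or returns nothing.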
Updating translation-server

Here's the sequence of commands @dongbohu and I used today to update translation-server with new commits.

sudo su translate
cd /home/translate/translation-server
git checkout -- package-lock.json
git pull --ff-only --recurse-submodules
# update just the translators submodule for the latest translators
git submodule update --remote --merge modules/translators
npm install
exit
# Restart supervisor which restarts translation-server
sudo systemctl restart supervisor
|
Noting the location of log output according to
|
Release Upgrade Fail

Looking to upgrade node to resolve zotero/translation-server#140, I upgraded the
This worked via SSH. I then ran Looking at my Google Compute Engine console, I don't think I have permission to see the VM for |
Hey @dhimmel, sorry for the delay. I've given you owner permissions on the |
Where would I see that? I don't think I can see the GCS project the VM is under.
I think the upgrade only partially worked, since |
Right, sorry, I guess entity-level permissions are hard to see if you don't have permission to view the project. I've added you as a Compute Engine owner to the project-level IAM, so you should be able to see (and, if you like, reboot) the translation server at https://console.cloud.google.com/compute/instances?project=fast-fire-224415.

Regarding the update, good point there; I didn't catch it hadn't finished updating. I've gone ahead and updated all the packages that were candidates for installation.

Also, just FYI, I've also enabled serial connections on the VM, so if you find yourself unable to connect again via SSH you could attempt to debug it via the serial interface. You can access that using the "Connect to Serial Console" on the VM details page near the top. By default it'll show the kernel log, but if you press enter you'll see a prompt for your username (same as the one you usually use) and then your password. If you'd like to set your password yourself, you should be able to do so via

Finally, what would you think about starting over with the translation server in Docker with Google's Container-Optimized OS? Basically, you can give it a Docker image and it'll deploy it to the machine. It'll handle security updates, etc. on its own; generally it's less flexible, but also more secure and less fussy than managing a full Ubuntu installation ourselves. |
That sounds great. We don't need much flexibility. It will also be nice if we can restore old containers should a new one break. Would SSL be handled by Google? There's a basic Dockerfile provided in the translation-server repo. The node base image is outdated, so I made a PR to update it at zotero/translation-server#142. Where would we build the Docker image? Locally or some other way? Or we could use the version on Docker Hub if they update it. |
So the GC VM is now running Ubuntu 22.04, but @falquaddoomi, I see there is translation-container in addition to the existing VM. Is that the Google Container-Optimized OS you mentioned? And if so, are we ready to switch |
Hey @dhimmel, apologies again for the delay. That machine was just to test a Docker-based setup for the translation server; the VM is running Container-optimized OS, and the image that's deployed to it is

I see that you've been working with the Zotero folks on updates to the translation server, and it looks like your latest changes have been merged into master, but apparently not built and deployed to DockerHub yet. We could check out the translation server repo on the new VM, build a fresh image from master (or whichever version/commit you think is stable enough), and use that until they update DockerHub.

Another thing that's blocking me from just switching to Container-optimized OS is the TLS termination question: we could either use a GCP load balancer for an additional cost that'll handle TLS termination, or we could run nginx + certbot as we're doing now. I'd prefer the latter for the lower cost, but it's not straightforward to launch multiple containers automatically (via the GCP instance metadata) on a single VM. I've looked into it a bit, and it is possible to rewrite the metadata manually to launch multiple containers, but I haven't had time lately to work on and test it.

On the other hand, if we just want something quick and (mildly) dirty, it is easy to manually launch additional containers (COS is just running Docker, after all), so I could just launch nginx and certbot and have them set to start at boot separately from the instance metadata mechanism. |
I am supportive of building an image ourselves using the latest master commit from
I'm not going to be of much help here. So feel free to proceed as you see fit. It's also fine with me if we get node 14+ on the current VM with Ubuntu 22.04, since that would unblock us from upgrading translation-server to the latest master commit. |
I've upgraded nodejs on the Ubuntu 22.04 VM to v14.21.2, then followed the procedure from #82 (comment) to update the translation server. It seems to have upgraded without issue, and the translation server at https://translate.manubot.org seems to be functional, at least for the sample queries listed in Zotero's readme. Let me know if anything seems to be amiss with it and I'll look into it ASAP.

Regarding moving to container-optimized OS (COS), I still think it's a desirable option since we'd get automatic security updates and the convenience of just launching a container. I'd still like to run nginx alongside the translation-server container so we don't have to pay for a load balancer just to provide SSL, but I haven't had time to look into a nice way to run multiple containers on COS.

Anyway, I built an image from Zotero's current translation-server master and pushed it to a GCP artifact repo I set up under the manubot project; the image is at |
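For repeating this kind of post-upgrade check, the sample queries can be expressed as small request builders. The sketch below follows the request shapes in the translation-server readme (plain-text POST bodies for /web and /search, JSON for /export with a format query parameter); the helper names and the example article URL are assumptions, and nothing is sent until `send` is called.

```python
import json
import urllib.request

BASE_URL = "https://translate.manubot.org"


def web_request(url, base_url=BASE_URL):
    """Build a POST to /web, with the page URL as a plain-text body."""
    return urllib.request.Request(
        f"{base_url}/web",
        data=url.encode("utf-8"),
        headers={"Content-Type": "text/plain"},
        method="POST",
    )


def search_request(identifier, base_url=BASE_URL):
    """Build a POST to /search, with an identifier (DOI, ISBN, ...) as the body."""
    return urllib.request.Request(
        f"{base_url}/search",
        data=identifier.encode("utf-8"),
        headers={"Content-Type": "text/plain"},
        method="POST",
    )


def export_request(items, fmt="bibtex", base_url=BASE_URL):
    """Build a POST to /export, with Zotero items as JSON and the target format in the query."""
    return urllib.request.Request(
        f"{base_url}/export?format={fmt}",
        data=json.dumps(items).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=30):
    """Perform the request and return (status code, response text)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, response.read().decode("utf-8")


# Building a request does not touch the network; call send(request) to run it
request = web_request("https://example.com/article")
```

A quick smoke test after an upgrade would then be `send(web_request(...))` with a real article URL, checking that the status code is 200.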
In #70, we discussed using some of Zotero's infrastructure for generating citation metadata. Specifically, the translation-server, which has translators for many webpages.
I touched base with the translation-server community in zotero/translation-server#51 and there didn't seem to be any public instances. Therefore, we could either host a public instance for Manubot users or have Manubot users set up their own instances. While installing the nodejs package is not difficult, it does create a node dependency in an otherwise Python repository.
Therefore @dongbohu and I decided to spin up a translation-server at https://translate.manubot.org. It's currently hosted in a Google Cloud instance and we're still finalizing the setup. I'll let @dongbohu comment with the configuration details. While it'd be great to have a reproducible way to exactly configure this instance (perhaps via Terraform), we opted for the convenience of a traditional instantiation process.