fixes bug 1106992, 1145487, 1146984 - Centos7 major packaging / infra update #2685
Install the Socorro repository.
::
sudo rpm -ivh https://s3-us-west-2.amazonaws.com/org.mozilla.crash-stats.packages-public/el/7/noarch/socorro-public-repo-1-1.el7.centos.noarch.rpm

Now you can actually install the packages:
::
Consider adding a sudo yum makecache step before install, just to be sure.
Concerning the systemd unit files: Given the environmental requirements and the length of the

Per-service example:

# socorro-collector.service (snipped for brevity)
EnvironmentFile=-/etc/sysconfig/socorro-collector
ExecStart=$CMD $CMD_OPTS

# sysconfig/socorro-collector
VENV="/data/socorro/socorro-virtualenv"
CMD="/usr/bin/envconsul"
CMD_OPTS="-once -upcase=false -prefix socorro/collector $VENV/bin/uwsgi -H $VENV -M --need-app -w wsgi.collector -s /var/run/uwsgi/socorro-collector.sock --chmod-socket=664 --uid=socorro --gid=nginx"

Global example:

# socorro-collector.service (snipped for brevity)
EnvironmentFile=-/etc/sysconfig/socorro
ExecStart=$ENVCONSUL_BIN $ENVCONSUL_OPTS -prefix socorro/collector $VENV/bin/uwsgi -H $VENV -M --need-app -w wsgi.collector -s $UWSGI_DIR/socorro-collector.sock --chmod-socket=$SOCKET_MODE --uid=$USER --gid=$GROUP

# sysconfig/socorro
VENV="/data/socorro/socorro-virtualenv"
ENVCONSUL_BIN="/usr/bin/envconsul"
ENVCONSUL_OPTS="-once -upcase=false"
UWSGI_DIR="/var/run/uwsgi"
USER="socorro"
GROUP="nginx"
SOCKET_MODE="664"

Note that these are strictly off the top of my head, so YMMV. 😁

Update: I hadn't yet read about the so-called "Emperor and Vassal" approach, which definitely has merit and would play into the suggestions I've made above, should we choose to roll that way (cf. bug 1145487).
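systemd does not allow the program in ExecStart= to be a variable, and it only substitutes $VAR when it stands as a separate word (so $VENV/bin/uwsgi would need to be written ${VENV}/bin/uwsgi). One way around both limits, sketched here as a suggestion rather than anything this PR ships, is a small wrapper script that sources the sysconfig file with a real shell and then execs the command. The wrapper path and the function names build_collector_cmd and run_collector are hypothetical; the variable names come from the global example above. Requires bash 4+ (for mapfile).

```shell
# Hypothetical wrapper script, e.g. /usr/local/bin/socorro-collector-start,
# which a unit file would call directly as its ExecStart= program.

# Print the collector command line, one argument per line, using the
# variables from a sysconfig file (default: /etc/sysconfig/socorro).
# The body runs in a subshell so USER/GROUP etc. don't leak into the caller.
build_collector_cmd() (
    conf="${1:-/etc/sysconfig/socorro}"
    # Sourcing with a real shell means values may refer to earlier ones (e.g. $VENV).
    # shellcheck source=/dev/null
    . "$conf" || exit 1   # exit only leaves the subshell
    # ENVCONSUL_OPTS is left unquoted on purpose so it splits into separate flags.
    # shellcheck disable=SC2086
    printf '%s\n' "$ENVCONSUL_BIN" $ENVCONSUL_OPTS -prefix socorro/collector \
        "$VENV/bin/uwsgi" -H "$VENV" -M --need-app -w wsgi.collector \
        -s "$UWSGI_DIR/socorro-collector.sock" \
        --chmod-socket="$SOCKET_MODE" --uid="$USER" --gid="$GROUP"
)

# Build the command and replace the current process with it,
# so systemd tracks envconsul as the service's main process.
run_collector() {
    local -a args
    mapfile -t args < <(build_collector_cmd "$@")
    [ "${#args[@]}" -gt 0 ] || return 1
    exec "${args[@]}"
}
```

The wrapper file would end with a call to run_collector, and the unit would point ExecStart= at that file; the per-service settings then live only in the sysconfig file.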
::
sudo service httpd start
sudo chkconfig httpd on
rabbitmq-server elasticsearch httpd mod_wsgi memcached socorro
Are you sure about httpd and mod_wsgi here, given that we're now using Nginx and uwsgi? Maybe s/httpd\smod_wsgi/nginx/; or so?
Concerning Also, the name
fi

# create ElasticSearch indexes
echo "Creating ElasticSearch indexes"
Elasticsearch (below, too). 😄
Concerning scripts/install.sh and scripts/package.sh (and yes, I've said this before): at some point we'll want to replace FPM with proper packaging. We don't need to fix this today, but we should probably create a bug for it - could be good for an intern or junior?
sgtm, but some things I'd note:
- we're going to start breaking up the repo into separate modules and therefore will need separate packages (planning to do socorro-collector first)
  - the build system in the current socorro module is difficult to deal with; it'll be easier with the new split-up modules
- we're probably going to want Debian packages too (which was part of the impetus for fpm)
- the new split-up socorro modules are going to be proper Python packages, so we might be able to get away from having to do distro-specific packaging (say we used something like https://github.com/progrium/buildstep and had generic treatment of "web" and "worker" classes of app)
Concerning

#!/usr/bin/env bash
# Valid roles; each one needs a matching function below.
ROLES="postgres webapp whatever"

function help {
    echo "USAGE: ${0} <role>" >&2
    echo "Valid roles are: ${ROLES// /, }." >&2
    exit 1
}

function validate {
    # No argument? That's a paddlin'.
    if [ -z "${1:-}" ]; then
        help
    fi
    # Not a known role, or no function defined for it? That's a paddlin'.
    # (The role list also stops "help" or "validate" being run as roles.)
    case " ${ROLES} " in
        *" ${1} "*) ;;
        *) help ;;
    esac
    if ! declare -F "${1}" > /dev/null; then
        help
    fi
}

function postgres {
    echo "this is the postgres function"
}
function webapp {
    echo "this is the webapp function"
}
function whatever {
    echo "this is the whatever function"
}

# Aaaaand go!
validate "${1:-}"
echo "Initialising ${1}."
"${1}"
exit 0

# create DB if it does not exist
# TODO handle DB not on localhost - could use setupdb for this
su - postgres -c "psql breakpad -c ''" > /var/log/socorro/setupdb.log 2>&1
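The psql breakpad -c '' probe works by letting psql fail when the database is missing. A slightly more explicit pattern, sketched here as an alternative rather than what the script currently does, asks pg_database directly and only calls createdb when the row is absent, which keeps the step safe to re-run. db_exists and ensure_db are hypothetical helper names; they must run as a user allowed to create databases (the script above uses su - postgres for that).

```shell
# Return 0 if a database named $1 exists on the default server, 1 otherwise.
# The name is interpolated directly into SQL, so only pass fixed, trusted
# names such as "breakpad".
db_exists() {
    local name="$1"
    # -t: tuples only, -A: unaligned, so the output is just "1" or empty.
    [ "$(psql -tA -d postgres -c "SELECT 1 FROM pg_database WHERE datname = '${name}'")" = "1" ]
}

# Create the database only if it is not already there (safe to re-run).
ensure_db() {
    local name="$1"
    if db_exists "$name"; then
        echo "Database ${name} already exists, skipping."
    else
        echo "Creating database ${name}."
        createdb "$name"
    fi
}
```

For example: su - postgres -c "$(declare -f db_exists ensure_db); ensure_db breakpad", or simply put both functions in a script that is itself run as postgres.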
Previously this stuff was run as part of the package install, and thus implicitly as root; however, now that it has been split out into a separate script, the su calls should likely be run via sudo. Either that, or we mandate that the entire script be run via sudo.
I don't want to embed sudo in the script, that seems too surprising - I'd rather the script just check if you're running as root and fail if not; then people can sudo or su as appropriate.
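The root check described above could look something like this minimal sketch. require_root is a hypothetical helper name; it returns non-zero instead of exiting so the calling script decides how to bail out.

```shell
# Fail early unless running as root; the caller chooses sudo or su.
require_root() {
    if [ "$(id -u)" -ne 0 ]; then
        echo "ERROR: ${0##*/} must be run as root (try sudo or su)." >&2
        return 1
    fi
}
```

At the top of the setup script this would be used as require_root || exit 1.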
I realise that this is contextual and explanatory, but we should be careful here going forward: the example above implies that each node is running a Consul server when in fact, each node needs to have (at least) a Consul agent. Globally this speaks to a larger issue we have with regards to our packaging, install scripts, documentation, etc., where we can't decide whether everything is supposed to run on one node by default or not. I think we should strongly consider elevating the idea of "roles" throughout - it would allow us to more cleanly describe every element both programmatically and in documentation. Thoughts ❓
Yes, this is tricky. I have found that most people besides us who want to try out Socorro (and even run it in production) tend to have low volume, and want to run it on a single node or maybe just a few. I want to make initial setup as simple as we can, while making it clear how to distribute services if that's what you need - any ideas for how to expose this consistently in the docs would be great!
daad0be adopts the sysconfig model for systemd unit files (yay) - but where is the sysconfig file itself? We should at least include a functioning sample.
@phrawzty just testing this, and one thing we're missing - we need to install Should we just make these a dependency for the RPM?
@phrawzty also - this drops in a nginx config fragment to
@phrawzty btw systemd doesn't seem to be playing nice with Having similar problems running the actual command, haven't worked those out quite yet...
@phrawzty http://0pointer.de/blog/projects/on-etc-sysinit.html talks about "what's wrong with /etc/sysconfig (...) Why might it make sense to fade out use of these files in a systemd world"
@phrawzty so (as you are no doubt aware) Fedora recommends sysconfig, although Lennart obviously doesn't: I think the problem is we can't use it for
OK - tested all Socorro-specific roles and PR currently works OK - Socorro apps start up fine, but have no Consul agent to connect to; work on that is happening elsewhere.
Oh BTW - crontabber is the only remaining app that isn't running under
PR 40 addresses a bunch of Consul-related stuff, including the addition of both consul (agent) and envconsul to the Base AMI. We should be careful about making them dependencies for the RPM since not everybody will be using our Yum repo - this is a policy decision more than a technical one. I don't have time to form an opinion on this right now, heh.
👍
This is a battle that pits a young tool with no established best practices against a legacy of behaviour that's never been particularly consistent. There are no winners. 😾 I don't understand why those variables aren't working (the In the interest of ⏩ let's keep the reverted (non-sysconfig) version for now. We can re-examine this issue later - in fact, we should, since it may help to explain how to better configure and run stuff under systemd in general.
I would guess that the issue with systemd is it is parsing the config and seeing that the string
Everybody using our docs should be using our repo. If they don't want to that's fine, but they need to figure it out.
I take this back - turns out it wasn't hard at all. I think I can get this too, I just need to adjust the
102f1ae to cf29cbe
23e5419 to 59f93cb
@phrawzty OK tested this by activating all of the roles on a single node, and (given a correct config) we seem to work in "distributed mode" (using S3 and RabbitMQ). I found a problem though - processor can't push crashes into ES; we get:
We've got ES 1.4 in AWS, so I tried swapping out the
OK I can:
The only problem I've found is supersearch doesn't seem to work:
@phrawzty @AdrianGaudebert do you know if the middleware has been tested with the new ES classes ^? Middleware is configured like this:
Yep! The former is for <= 0.90, while the latter is for >= 1.0.
@rhelmer That should be a configuration problem. Make sure that
Cool, thanks! I set this and it works, I saved it in the
c7d000e to 8f75dd2
@phrawzty ok this is all rebased down, I've done a lot of testing on it and I think it's good to go. I am not thrilled with the docs, but I think they are good enough for the moment except for one thing - we need to provide an example of how to use consul, since that's what our RPM supports now. One advantage we have now is that our infra is all public, I think what I should do is make the
I dared not believe this day would ever come. 😂
We need to establish how we intend to use Consul before we can craft examples. Let's do some light policy work here and then we'll put in a new PR for the docs, OK?
Sure, we just need to be really careful about the contents of that repo - for example, revealing the names of private buckets.
Excepting the talking points above (which don't prevent this from landing, imho), this PR is
48d7e43 to d9b56f8
r? @phrawzty - I need to figure out how to get this associated with bugs properly (it fixes several), so don't merge yet :) I think we should land them all together to minimize breaking external users - I think the docs aren't quite good enough yet in particular, but the following all WFM:
/usr/bin/setup-socorro.sh
I've been testing this out on our AMI and it seems to work! Now that some of the "socorro lite" work has landed, you can run collection+processing without any additional services running, except for consul, which is now required for all Socorro services:
First you must configure the collector to use WSGI rather than the default built-in web.py server:
(@twobraids we should really change the name of that config key to remove the ApacheMod part and just call it socorro.webapi.servers.WSGI)
Then just start the services:
This will store both raw and processed crashes to ~socorro/crashes, and will scan the filesystem rather than using a queue. Storing crashes to ES/S3/PG and enabling RabbitMQ are just a matter of setting the right keys/values in consul.
You should be able to submit crashes, and they should be processed successfully (both raw .json and .dump and processed .jsonz files are stored in ~socorro/crashes):
For a distributed setup we don't want to share a filesystem, so you'll need to turn RabbitMQ and S3 on via consul.
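For the single-node filesystem setup described above, one quick check that both collection and processing happened is to count the stored files by type. A minimal sketch: summarize_crash_store is a hypothetical helper, and it assumes nothing about the directory layout under ~socorro/crashes beyond the .json, .dump and .jsonz extensions mentioned above, so it searches recursively.

```shell
# Print counts of raw (.json, .dump) and processed (.jsonz) crash files
# found anywhere under the given directory.
summarize_crash_store() {
    local dir="$1" ext count
    [ -d "$dir" ] || { echo "No such directory: $dir" >&2; return 1; }
    for ext in json dump jsonz; do
        # -name '*.json' does not match '*.jsonz', so each type is counted separately.
        count=$(find "$dir" -type f -name "*.${ext}" | wc -l)
        # $((...)) strips any padding some wc implementations add.
        printf '%s %d\n' "$ext" "$((count))"
    done
}
```

Run it as summarize_crash_store ~socorro/crashes; after submitting a crash that gets processed, all three counts should be non-zero.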
The webapp has more dependencies before it'll work:
PG and ES need to be set up (NOTE this script assumes they are on localhost, it should fail gracefully if that's not the case. Also it's safe to re-run the script, it won't destroy anything already set up):
The RPM drops nginx sample configs into /etc/nginx/conf.d that listen on the vhosts crash-reports (collector), crash-stats (webapp), and socorro-middleware (this one only listens on localhost since it's not safe; we don't want anyone to accidentally expose it).
The webapp will give you 404s for the default WaterWolf product unless you either use socorro setupdb's --fakedata option, or set up a new product via the admin UI per http://socorro.readthedocs.org/en/latest/configuring-socorro.html (maybe we should have the setup-socorro script do some/all of this?)
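For a single-node trial, the vhost names crash-reports, crash-stats and socorro-middleware from the nginx sample configs need to resolve to the box before nginx will route requests to them. One low-tech option, an assumption on our part rather than something the RPM or setup-socorro.sh does, is to map them to loopback in /etc/hosts, which also keeps socorro-middleware local-only. add_socorro_hosts is a hypothetical helper; it only appends names that are not already listed, so it is safe to re-run.

```shell
# Append a loopback mapping for each Socorro vhost that is not already listed.
# Usage: add_socorro_hosts [hosts_file]  (defaults to /etc/hosts, which needs root)
add_socorro_hosts() {
    local file="${1:-/etc/hosts}" name
    for name in crash-reports crash-stats socorro-middleware; do
        # Match the name as a whole field so e.g. "crash-stats-old" doesn't count.
        if ! grep -Eq "(^|[[:space:]])${name}([[:space:]]|\$)" "$file"; then
            printf '127.0.0.1\t%s\n' "$name" >> "$file"
        fi
    done
}
```

For a real multi-node deployment these names would instead come from DNS, pointing crash-reports and crash-stats at whichever nodes run the collector and webapp roles.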