Join GitHub today
GitHub is home to over 28 million developers working together to host and review code, manage projects, and build software together.Sign up
Email alert tasks
Skipping email alerts
Want to load stuff into OpenAustralia.org but don't want alerts going out?
tl;dr Bump the timestamp and batch number in
alerts-lastsent which is in
alerts-lastsent file has two lines, the first is a Unix timestamp and the second is a batch ID.
After you've loaded debates into the DB and run indexing (
twfy/search/index.pl (indexing is run from
../search/index.pl /srv/www/www.openaustralia.org/shared/searchdb sincefile /srv/www/www.openaustralia.org/shared/searchdb/../searchdb-lastupdated)) update the first line to the current time as this will be later than the index you just did - you can find out the time with
date +%s. Then increment the batch ID on the second line and save the file.
Now when you run
alertmailer.php it shouldn't send out alerts for any of the debates you've just loaded. There's more detail in this helpful email from mySociety's Matthew Somerville.
I've never quite understood the relationship between the alert emailer and the xapian index. Unfortunately, looking through the code hasn't helped me much. :-(
Sorry, it's all a bit strewn about. I'll try and explain below.
Let's say I make some fixes to the parser and I regenerate all the xml files. I then reload this into the database and do a xapian reindex on everything. How can I do this without generating spurious email alerts the next day for any pages that have changed?
Unbelievably, it might just work, although there are a number of extra things you can do if you're worried about it (or just test it better than we ever did ;) ). Here's the full process of how email alerts runs, then some history, and then some actually maybe useful stuff ;)
Firstly, the database. When you load data with xml2db.pl, a brand new GID gets new created and modified timestamps set to NOW(). When you reload data, if the GID is the same then only the modified timestamp is updated to NOW(), the created doesn't change ever.
When running a Xapian index with index.pl, if you use "sincefile" as we do, then it only indexes GIDs that have a modified column dated since the last time it ran. If you run Xapian index with a date range etc., then it will index or reindex everything on those dates.
Whatever is decided to be indexed, each gets a new batch number (that goes up 1 each time you run Xapian indexing and it indexes something), and also gets a "created" value consisting of its hansard created column concatenated with its hpos.
When you run the alert email script, it only fetches those GIDs that have: a) a batch number larger than the highest batch number last time the script was run; b) a created higher than the last time the script was run; c) is later than the manual date given in the script (line 174ish).
So let's do a worked example. You've fixed a parser bug which has caused 10 GIDs on 2001-01-01 to change, and 5 new GIDs to appear on 2002-02-02. You ran the email alert script yesterday, before doing this, up to batch 100. You reload the XML for those two days, the 10 GIDs on 2001-01-01 get their modified column updated, the 5 new GIDS on 2002-02-02 get inserted. You reindex with sincefile, and those 15 GIDs get indexed (or you reindex with those two dates and everything on those 2 days gets indexed), with batch ID 101. You run the email alert script, which will only return things: * in batch ID 101 * created since yesterday - ie the 5 new GIDs from 2002-02-02, but not the 10 from 2001-01-01.
Hope that makes sense. Now, the original theory was that people would want to be alerted to new stuff even if it was old. But it turns out that a) they don't (they'll complain saying why have they got an alert for old stuff), b) all of our old "new" stuff wasn't actually new, but was because of parser fixes or because they make spelling corrections and our model means everything on that day gets a new GID in that case - which is awful but it's too late to change now...
So what we do. I try to make sure any changing of old content isn't added/indexed along with new stuff - ie. run major changes during holidays etc.. Then when the loading/indexing is done, I can manually bump the batch number/ timestamp in the alerts-lastsent file so that the alert script thinks it's been run on that data already and will just ignore it.
If there's some new stuff indexed as well, which you do want to send alerts for, then that's what the manual date in the alert script is for, to just say "ignore anything before this date completely". Bit yucky, I know, but then so's everything.
Phew, hope that's of use to you, and happy Easter!
Resetting email alerts
In development you want to send alerts multiple times. These commands will help:
# Totally reset (will send an alert for everything, ever) echo -e "0\n0" > twfy/scripts/alerts-lastsent # Reset to a certain point in time (will send alerts for all content since that date) echo -e "`date --date=20130823 +%s`\n0" > twfy/scripts/alerts-lastsent