diff --git a/build/getlogs b/build/getlogs
index 4899210b..4bce01e7 100755
--- a/build/getlogs
+++ b/build/getlogs
@@ -10,7 +10,7 @@
 cd logs
 mkdir -p tmpdownload
 cd tmpdownload
-rsync -v --progress --delete -az -e "ssh" $USER:./www.publicwhip.org.uk_logs/ .
+rsync -v --progress --delete -az -e "ssh" $USER:./www.publicwhip.org.uk_logs/*.gz .
 
 for X in access_log.*.gz
 do
diff --git a/errata.txt b/errata.txt
index f6ded631..02009577 100644
--- a/errata.txt
+++ b/errata.txt
@@ -1,24 +1,11 @@
 Hansard errata
 --------------
 
-Ask about Gareth Thomas ambiguity:
-http://www.publications.parliament.uk/pa/cm200203/cmhansrd/cm030120/debtext/30120-25.htm#30120-25_div56
-
 27 Nov 2002, Division 10 has wrong ayes count or wrong vote list
-- it says 341 ayes, but 240 are listed!
+- it says 240 ayes, but 341 are listed!
 http://www.publications.parliament.uk/pa/cm200203/cmhansrd/vo021127/debtext/21127-30.htm
 
-Why does a division 73 appear twice here, and where is division 74? (3 Feb 2003):
+Why does a division 73 appear twice (identical!) here, and where is division 74? (3 Feb 2003):
 http://www.publications.parliament.uk/pa/cm200203/cmhansrd/cm030203/debindx/30203-x.htm
 http://www.publications.parliament.uk/pa/cm200203/cmhansrd/cm030204/debindx/30204-x.htm
 
-Ask about corrections of divisions - are they fixed up properly in bound volume?
-329 2002-10-28 is correction of 329 2002-10-23
-99 2003-03-06 is correctoin of 99 2003-03-04
-
-Trivial
--------
-
-Ask about 253 2003-06-24 division labelled wrongly in index
-
-
diff --git a/ideas.txt b/ideas.txt
index 9f0c7525..d45f7279 100644
--- a/ideas.txt
+++ b/ideas.txt
@@ -1,10 +1,3 @@
-Data integrity
--------------
-
-Search for MPs who voted on both sides in one division
-Check for an MP voting both for and against! See "Abstention" here:
-http://www.parliament.uk/documents/upload/p09.pdf
-
 Party politics
 --------------
 
@@ -37,6 +30,10 @@ Table of all MPs which have never voted - interesting to see
 
 Tables of worst attendance record (after those with an excuse, such as ministry posts, speaker, ill, SF...)
 
+"Performance tests" for government - turning excessive monitoring and
+testing back onto them.
+corruptometer, loyaltometer, evilness, sleepometer, waffle-meter
+
 Data anlaysis (using existing data)
 -------------
 
@@ -56,6 +53,14 @@ variation." - we could do this with MP clustering.
 Improve clustering distance algorithm
 See J Vaughan suggestions
 
+Colour dots in cluster diagram by how many times they have voted.
+Bright colours for more relevant the data - i.e. how many intersections
+with others' votes there are.
+
+Play with stuff in vector search article
+http://www.perl.com/pub/a/2003/02/19/engine.html
+In particular PDL for speeding up octave algebra stuff
+
 > Idea 2. Darren suggested that the reason Tony Blair is an outlier
 > in the java app is coz he only turns up to votes he thinks are
 > going to be controversial, hence ones that people are probably
@@ -86,6 +91,9 @@ Why did this happen? Anomalies in Hansard. Email them to complain.
 Find three line whip definition
 Infer no Whip if 10% +- from base? Or at least +-1
 
+Animated cluster diagram over last 15 years.
+3 month window moving week by week
+
 Additional numeric data
 -----------------------
 
@@ -104,11 +112,13 @@ It is worth looking for MPs who spoke but did not vote. This is a good
 way to detect active abstentions. It may also have all sorts of other
 interesting meanings.
 division.php?date=2003-06-10&number=224&showall=yes
+Count how many times MP spoke in a debate, or on the day
 
 Integrate parliamentary majority, and look for correlations with
 rebelliousness? Majorities here:
 http://www.psr.keele.ac.uk/area/uk/mps.htm
 (Should be no correlation, as reselection more important?)
+Plot majority as a colour on the cluster diagram
 
 Analyse if MPs who are "sir" vote differently in anyway
 first check data integrity that title always has "Sir" for knights
@@ -127,9 +137,27 @@ Collate all MPs articles in newspapers
 
 Regional analysis. Scotland, NI, Whales, North v South. Urban v. Rural.
 
+Area of land for constituency. This gives a "ruralness" measure.
+Population of constituency.
+
+Make cluster diagram for just divisions relating to one issue. Or
+for one person's interested issues. Plot point on cluster diagram for
+issues themselves.
+
+Value of the vote. What is the monetary expenditure cost of agreeing
+the motion? Graph against time spent discussing, and see how silly the
+correlation is.
+
+Measure lobbying power behind each issue (expenditure by interested
+parties). Again, correlate to time spent on it.
+
 Additional text content
 -----------------------
 
+Issue sub-selector. User can log in, name an issue, and say which way
+votes should have gone to satisfy him on that issue. Get all manner of
+people to make issues for next general election.
+
 Software to follow legislation from Queen's speech
 
 Group votes by department, so you can see areas of interest (Sirius
@@ -151,6 +179,10 @@ Link to draft of Bill which is being debated
 Usability
 ---------
 
+Email reports to people when search queries change
+e.g. When your MP has voted. When he has rebelled. When an issue is
+voted on, and so on.
+
 Show majority in division table - sort by which ones majority was least on?
 
 Link from MP to other sources of info
@@ -162,6 +194,38 @@ Link from search engine to
 
 Links to other political resource websites
 
+hansard.php - takes links to days and chunks, does a redirect
+reduce bandwidth, and do tracking of where people link through to
+
+Log failed searches so we can improve the search engine
+
+Detect MS Java applet and upgrade it
+FastCGI if our load gets high
+mod_gzip to reduce bandwidth
+
+Usability (some sort of done - this is just some notes)
+ - make website name link back to homepage
+ - consider link titles http://www.useit.com/alertbox/980111.html
+ - about the authors, so feels personal to people
+ - consider breadcrumb trail
+ - about section (not all FAQ?)
+ - company name/logo at topleft, search at topright
+ - search input box on front page (http://www.useit.com/alertbox/20010513.html)
+ - print stylesheet media="print" removing menus
+
+Physical gimmicks
+-----------------
+
+Actually post a whipping sheet to MPs. This would arrive every week at
+the same time as their party whipping sheet. It would tell them how
+many voters in their constituency have registered with organisations which
+would like them to vote particular ways.
+
+Make big wall chart of cluster diagram - colour, pretty
+Maybe even sell it to people
+
+Newsletter (may be better than blog that you have)
+
 About one MP
 ------------
 
diff --git a/todo.txt b/todo.txt
index 3a2dc1f4..e886a43f 100644
--- a/todo.txt
+++ b/todo.txt
@@ -23,72 +23,36 @@ Facility to register and define voting subset for a particular issue
 
 Do Iraq subselection ourselves
 Do climate change subselection
-Investigate EDMs
+Investigate EDMs, legality of using them
 Look at majorities
 
 Website
 -------
 
+Put news (at least headlines) on front page
+
 Make cluster diagram clearer in "highlights" section on front page
 
 Sort cluster diagram name entries by surname - rather than no order
 
-Think more about excess motion text on bills
-
 Consider changing support@ email addr to something less corporate sounding
 
-Move distant metric stuff so it is uses the initial data, not the munged
-data from the cluster diagram
-
-Change our use of the word "rebel" more consistently
-
 Trim some opinion waffle out of Cluster text
 
-News/blog of observations we make about things
-Some kind of news system
+Newsletter
 Some kind of comments system
-Solicit help in some way?
-Email address more prominant everywhere
-
-Log failed searches so we can improve the search engine
 
 Colour blind people, or indeed blind people, need a better rebel marker than redness in MPs division list. Boldness is one idea.
 
-Log file grabbing for permanent keeps?
-
-hansard.php - takes links to days and chunks, does a redirect
-reduce bandwidth, and do tracking of where people link through to
-
 Find logo
 
-Detect MS Java applet and upgrade it
-
-FastCGI if our load gets high
-mod_gzip to reduce bandwidth
-
-Usability (some sort of done - this is just some notes)
- - make website name link back to homepage
- - consider link titles http://www.useit.com/alertbox/980111.html
- - about the authors, so feels personal to people
- - license link violates by popping up new window - should be in main window
- - consider breadcrumb trail
- - about section (not all FAQ?)
- - company name/logo at topleft, search at topright
- - search input box on front page (http://www.useit.com/alertbox/20010513.html)
- - print stylesheet media="print" removing menus
-
-Make big wall chart of cluster diagram - colour, pretty
-Maybe even sell it to people
-
-Play with stuff in vector search article
-http://www.perl.com/pub/a/2003/02/19/engine.html
-In particular PDL for speeding up octave algebra stuff
-
 Scraper
 -------
 
+Check out tapiR, see if useful
+
 Finish last few divisions that you don't have right
 Missing division 74
@@ -99,9 +63,17 @@ Check "Question accordingly..." fits with our counting
 Tally vote numbers in text and check they fit with our counting
 Deal with when an MP voted twice in one division
+Search for MPs who voted on both sides in one division
+Check for an MP voting both for and against! See "Abstention" here:
+http://www.parliament.uk/documents/upload/p09.pdf
 
 Also I need to tidy the whole thing up to be more usable. Reduce the number of commands, make the pipeline more straightforward, and so it doesn't go wrong if you do things in the wrong order.
 
+Make one script I have to run which just does everything (backup logs,
+backup sf cvs repository, get latest divisions, upload to db)
+
+Improve motion text extraction
+
diff --git a/website/division.php b/website/division.php
index c194e84c..c8cbe392 100644
--- a/website/division.php
+++ b/website/division.php
@@ -1,5 +1,5 @@ Party Summary"; print "

Votes by party, bold entries are a guess at the party
-    whip, red entries a guess at rebels.";
+    whip, red entries a guess at rebels. Abstentions are calculated
+    from the expected turnout, which is statistically based on the
+    average proportionate turnout for that party in all divisions. A
+    negative abstention indicates that more members of that party than
+    expected voted; this is always relative, so it could be that another
+    party has failed to turn out en masse.";
     # Precalc values
     $ayes = array();
@@ -102,8 +107,9 @@
 
     # Make table
     print "";
-    print "";
-    $allparties = array_unique(array_merge(array_keys($ayes), array_keys($noes)));
+    print "";
+    #$allparties = array_unique(array_merge(array_keys($ayes), array_keys($noes)));
+    $allparties = array_keys($alldivs);
     usort($allparties, strcasecmp);
     $votes = array_sum(array_values($ayes)) + array_sum(array_values($noes));
     if ($votes <> $turnout)
@@ -125,20 +131,21 @@
         $alldiv = $alldivs[$party];
         $expected = round($votes * ($alldiv / $alldivs_total), 1);
-        $extra = number_format(100 * $total / ($votes * ($alldiv / $alldivs_total)) - 100, 1);
-        if ($extra > 0)
+        $abstentions = $expected - $total;
+        $classabs = "normal";
+        if (abs($abstentions) >= 2) { $classabs = "important"; }
+
+        if ($aye > 0 or $noe > 0 or $abstentions >= 2)
         {
-            $extra = "+" . $extra;
+            $prettyrow = pretty_row_start($prettyrow);
+            print "";
+            print "";
+            print "";
+            print "";
+            print "";
+            print "";
+            print "";
         }
-
-        $prettyrow = pretty_row_start($prettyrow);
-        print "";
-        print "";
-        print "";
-        print "";
-        print "";
-        print "";
-        print "";
     }
     print "
PartyAyesNoesTurnoutExpectedExtra Turnout
ExpectedAbstain
" . pretty_party($party) . "$aye$noe$total$expected$abstentions
" . pretty_party($party) . "$aye$noe$total$expected$extra%
";
diff --git a/website/news.php b/website/news.php
index ca700d18..e1faa7cd 100644
--- a/website/news.php
+++ b/website/news.php
@@ -1,5 +1,5 @@
+

Detecting abstentions - 16 September 2003 by Francis

+

Quite often members deliberately refrain from voting in a division,
+even if they are in the house and so could have done so. Conversely, on an
+important vote, the whip of one party will deliberately try and get a
+higher turnout. A while ago Becka suggested a way of detecting these
+effects.

+
+

You add up the turnouts for each party across all divisions
+and end up with a percentage expected vote share per party. Then you
+calculate, given the total turnout for this particular division, what
+the percentage would lead you to expect. If the number of voters in
+the party is much different from your expectation, then something
+interesting is happening.
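The expected-turnout method described in this post can be illustrated with a short PHP sketch. It is a hypothetical reconstruction, not the division.php code. The function `estimate_abstentions`, the `$thisdiv` array and the sample figures are invented for illustration. `$alldivs` is assumed to map each party to its votes summed over all divisions, as the `$alldivs` array in the diff appears to. `$thisdiv` is assumed to hold each party's ayes plus noes in one division.

```php
<?php
// Hypothetical sketch of the expected-turnout / abstention estimate.
// Assumptions: $alldivs maps party => votes cast by that party summed over
// all divisions; $thisdiv maps party => ayes + noes in one division.
function estimate_abstentions(array $alldivs, array $thisdiv): array
{
    $alldivs_total = array_sum($alldivs);
    if ($alldivs_total <= 0) {
        return []; // no history, so no expected share can be computed
    }
    $votes = array_sum($thisdiv); // total turnout in this division
    $rows = [];
    foreach ($alldivs as $party => $alldiv) {
        $total = $thisdiv[$party] ?? 0;
        // Party's share of all historical votes, scaled to this turnout.
        $expected = round($votes * ($alldiv / $alldivs_total), 1);
        // Positive: fewer members voted than expected; negative: more did.
        $abstentions = $expected - $total;
        $rows[$party] = [
            'total' => $total,
            'expected' => $expected,
            'abstentions' => $abstentions,
            // Same 2-vote threshold the diff uses for the "important" class.
            'important' => abs($abstentions) >= 2,
        ];
    }
    return $rows;
}

// Made-up figures, for illustration only.
$alldivs = ['Lab' => 6000, 'Con' => 3000, 'LDem' => 1000];
$thisdiv = ['Lab' => 300, 'LDem' => 50]; // no Conservatives voted
foreach (estimate_abstentions($alldivs, $thisdiv) as $party => $row) {
    printf("%-5s voted %3d, expected %6.1f, abstain %6.1f\n",
        $party, $row['total'], $row['expected'], $row['abstentions']);
}
```

With these made-up numbers, the Conservatives have an expected 105 votes but cast none, which gives a large positive abstention. Labour (-90) and the Lib Dems (-15) turned out more heavily than their historical share. The expected figures are scaled to the division's own turnout, so the abstentions sum to zero here. This is why the division page describes the measure as always relative.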

+
+

This calculation has been in Public Whip for a while, manifest as a
+mysterious column of numbers on the party table in the division listing.
+I've hopefully made it a bit clearer, using the terminology of
+abstentions, and displaying high abstention parties even if nobody in them
+voted. Have a look at the recent Iraq and the UN vote,
+where the Lib Dems proposed a motion. You can see from the large
+abstention number for the Conservatives that the party whip must have
+been to abstain. Indeed none of them voted at all.

+

Which Gareth Thomas? - 12 September 2003 by Francis

One of the things I'm doing at the moment is improving the quality of data for the current parliament. There are sometimes ommissions or
diff --git a/website/publicwhip.css b/website/publicwhip.css
index 4d953042..5074f03a 100644
--- a/website/publicwhip.css
+++ b/website/publicwhip.css
@@ -1,4 +1,4 @@
-/* $Id: publicwhip.css,v 1.3 2003/09/09 14:26:08 frabcus Exp $
+/* $Id: publicwhip.css,v 1.4 2003/09/17 12:01:33 frabcus Exp $
 
 The Public Whip, Copyright (C) 2003 Francis Irving and Julian Todd
 This is free software, and you are welcome to redistribute it under
@@ -68,6 +68,7 @@ hr.topline */
 table td.rebel { background-color: #ee7777; }
 table td.whip { font-weight: bold; }
 table td.percent { text-align: right; }
+table td.important { font-style: italic; font-weight: bold; }
 table {
   margin: 0;
   padding: 0;