
Added Goa stuff

raphael-susewind committed Aug 9, 2017
1 parent 0b2cf15 commit e08a520eea082ea8ac339afcbfc040c67380fba1
@@ -16,6 +16,9 @@ table | description
[delhiid]( | ID matching and integration table for Delhi (see below)
[delhigis]( | GIS coordinates and other spatial characteristics of polling booths in Delhi
[delhirolls2014]( | Booth-level estimates of religious demography for 2014 across Delhi
[goaid]( | ID matching and integration table for Goa (see below)
[goagis]( | GIS coordinates and other spatial characteristics of polling booths in Goa
[goarolls2014]( | Booth-level estimates of religious demography for 2014 across Goa
[gujid]( | ID matching and integration table for Gujarat (see below)
[gujgis]( | GIS coordinates and other spatial characteristics of polling booths in Gujarat
[gujloksabha2014]( | Booth-level (form 20) results for the 2014 Lok Sabha election from Gujarat
@@ -74,3 +74,6 @@ pragma temp_store_directory = '.';
.read wbrolls2014/wbrolls2014-a.sql
.read wbrolls2014/wbrolls2014-b.sql
.read wbgis/wbgis.sql
.read goarolls2014/goarolls2014-a.sql
.read goarolls2014/goarolls2014-b.sql
.read goagis/goagis.sql
@@ -13,3 +13,4 @@ pragma temp_store_directory = '.';
.read orid/orid-b.sql
.read rajid/rajid-b.sql
.read wbid/wbid-b.sql
.read goaid/goaid-b.sql
@@ -0,0 +1,52 @@
## ODC Database Contents License
The Licensor and You agree as follows:
### 1.0 Definitions of Capitalised Words
The definitions of the Open Database License (ODbL) 1.0 are incorporated
by reference into the Database Contents License.
### 2.0 Rights granted and Conditions of Use
2.1 Rights granted. The Licensor grants to You a worldwide,
royalty-free, non-exclusive, perpetual, irrevocable copyright license to
do any act that is restricted by copyright over anything within the
Contents, whether in the original medium or any other. These rights
explicitly include commercial use, and do not exclude any field of
endeavour. These rights include, without limitation, the right to
sublicense the work.
2.2 Conditions of Use. You must comply with the ODbL.
2.3 Relationship to Databases and ODbL. This license does not cover any
Database Rights, Database copyright, or contract over the Contents as
part of the Database. Please see the ODbL covering the Database for more
details about Your rights and obligations.
2.4 Non-assertion of copyright over facts. The Licensor takes the
position that factual information is not covered by copyright. The DbCL
grants you permission for any information having copyright contained in
the Contents.
### 3.0 Warranties, disclaimer, and limitation of liability
3.1 The Contents are licensed by the Licensor "as is" and without any
warranty of any kind, either express or implied, whether of title, of
accuracy, of the presence of absence of errors, of fitness for purpose,
or otherwise. Some jurisdictions do not allow the exclusion of implied
warranties, so this exclusion may not apply to You.
3.2 Subject to any liability that may not be excluded or limited by law,
the Licensor is not liable for, and expressly excludes, all liability
for loss or damage however and whenever caused to anyone by any use
under this License, whether by You or by anyone else, and whether caused
by any fault on the part of the Licensor or not. This exclusion of
liability includes, but is not limited to, any special, incidental,
consequential, punitive, or exemplary damages. This exclusion applies
even if the Licensor has been advised of the possibility of such damages.
3.3 If liability may not be excluded by law, it is limited to actual and
direct financial loss to the extent it is caused by proved negligence on
the part of the Licensor.
@@ -0,0 +1,33 @@
# Data on religion and politics in India
## goagis
This table contains GIS coordinates and other spatial characteristics of polling booths in Goa
## Variables
name | description
--- | ---
ac_id_09 | ID code of the assembly segment assigned by the Election Commission (identical with all other post-delimitation codes, hence the _09)
booth_id_14 | ID code of the polling booth assigned by the Election Commission for 2014 booths (together with ac_id_09, this should suffice for matching with other tables)
booth_name_14 | Name of the polling booth assigned by the Election Commission for 2014 booths
district_name_14 | Name of the district into which this polling booth is supposed to fall in 2014 (could be used for cleaning the data)
latitude | Geographical latitude
longitude | Geographical longitude
modis | Urban area or not? Derived from MODIS polygon (see below)
modis_rank | How urban? MODIS Scalerank (see below)
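
As a minimal sketch of the matching described for `booth_id_14`, the following Python snippet joins `goagis` to a second booth-level table on the pair (`ac_id_09`, `booth_id_14`). The in-memory database, the `example_rolls` table, its `example_value` column and all sample rows are hypothetical and only serve to illustrate the join; the real tables have their own columns.

```python
import sqlite3


def match_booths(conn):
    """Join goagis to a second booth-level table on (ac_id_09, booth_id_14)."""
    query = """
        SELECT g.ac_id_09, g.booth_id_14, g.latitude, g.longitude, r.example_value
        FROM goagis AS g
        JOIN example_rolls AS r
          ON g.ac_id_09 = r.ac_id_09
         AND g.booth_id_14 = r.booth_id_14
        ORDER BY g.ac_id_09, g.booth_id_14
    """
    return conn.execute(query).fetchall()


# Hypothetical toy data, not taken from the real tables.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE goagis (ac_id_09 INTEGER, booth_id_14 INTEGER, "
    "latitude FLOAT, longitude FLOAT)"
)
conn.execute(
    "CREATE TABLE example_rolls (ac_id_09 INTEGER, booth_id_14 INTEGER, "
    "example_value FLOAT)"
)
conn.executemany(
    "INSERT INTO goagis VALUES (?, ?, ?, ?)",
    [(1, 1, 15.5, 73.8), (1, 2, 15.6, 73.9), (2, 1, 15.3, 74.0)],
)
conn.executemany(
    "INSERT INTO example_rolls VALUES (?, ?, ?)",
    [(1, 1, 0.1), (2, 1, 0.2)],
)

# Only booths present in both tables survive the inner join.
matched = match_booths(conn)
for row in matched:
    print(row)
```

An inner join silently drops booths that appear in only one table, so comparing the row counts before and after the join is a cheap way to spot ID mismatches.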
## Raw data
The 2014 data was originally scraped using the Firefox MozRepl plugin in conjunction with and the custom proxy server at on May 5, 2014 from " The data used here has NOT been cleaned up, and its quality varies from district to district, so use it with care. The ID codes are the same as those used for the 2014 Lok Sabha elections. This dataset is identical to the data included in my (more comprehensive) [GIS Shapefiles](
All three sets of point data were then dumped into CSVs, transformed into ESRI shapefiles using `ogr2ogr booths-locality.shp booths-locality.vrt` and matched manually against the MODIS polygon from [Naturalearth]( using QGIS. The result was then exported back into booths-locality-modis.sqlite.
The final table was put together using `cat transform.sql | sqlite3`.
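
The CSV-to-point conversion above is done by `ogr2ogr` and a VRT file that builds point geometries from the `longitude` and `latitude` columns. The Python sketch below only illustrates that idea with the standard library and emits GeoJSON instead of an ESRI shapefile; it is not the conversion used for this table, and the sample rows are hypothetical.

```python
import csv
import io
import json


def rows_to_geojson(csv_text, x_field="longitude", y_field="latitude"):
    """Turn CSV rows into a GeoJSON FeatureCollection of points."""
    features = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            x = float(row[x_field])
            y = float(row[y_field])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows without usable coordinates
        # Everything except the coordinate columns becomes an attribute.
        properties = {
            key: value
            for key, value in row.items()
            if key not in (x_field, y_field)
        }
        features.append(
            {
                "type": "Feature",
                "geometry": {"type": "Point", "coordinates": [x, y]},
                "properties": properties,
            }
        )
    return {"type": "FeatureCollection", "features": features}


# Hypothetical sample rows; the second one has no coordinates and is skipped.
sample = (
    "constituency,booth,station_name,latitude,longitude\n"
    "1,1,Example School,15.5,73.8\n"
    "1,2,Missing Coordinates,,\n"
)
collection = rows_to_geojson(sample)
print(json.dumps(collection, indent=2))
```

Note the coordinate order: GeoJSON, like the VRT's `x`/`y` attributes, expects longitude first and latitude second.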
## License
While the database in its entirety is subject to an [ODC Open Database License](, as explained in the main [README]( and [LICENSE]( files, the content of this specific table is factual data, and as such only subject to a simple [ODC Database Contents License]( (at the time of scraping, the respective websites did not display any copyright information). Code used for crawling and compilation is subject to a [CC-BY-NC-SA 4.0]( license. If you use the modis and modis_rank variables, the original authors ask that you additionally attribute them:
> Schneider, A., M. A. Friedl, D. K. McIver, and C. E. Woodcock (2003) Mapping urban areas by fusing multiple sources of coarse resolution remotely sensed data. Photogrammetric Engineering and Remote Sensing, volume 69, pages 1377-1386.
@@ -0,0 +1,14 @@
<OGRVRTLayer name="booths-locality">
<LayerSRS>+proj=latlong +datum=WGS84</LayerSRS>
<GeometryField encoding="PointFromColumns" x="longitude" y="latitude"/>
<Field name="constituency" type="Integer"/>
<Field name="station_name" type="String"/>
<Field name="booth" type="Integer"/>
<Field name="district_name" type="String"/>
<Field name="latitude" type="String"/>
<Field name="longitude" type="String"/>
@@ -0,0 +1,105 @@
# system("rm -f booths-locality.sqlite");
system("rm -f touched");
use DBI;
our $dbh = DBI->connect("dbi:SQLite:dbname=booths-locality.sqlite","","",{sqlite_unicode => 1});
$dbh->do ("CREATE TABLE booths (state CHAR, district INTEGER, district_name CHAR, constituency INTEGER, constituency_name CHAR, booth INTEGER, station_name CHAR, latitude FLOAT, longitude FLOAT)");
use utf8;
use WWW::Mechanize::Firefox;
# Iterate through everything while simultaneously saving everything of relevance with a hidden proxy - pretty neat!
system("./ &");
my $ua = WWW::Mechanize::Firefox->new(autodie=>0,activate=>1,autoclose=>0);
our $pageloaded=0;
my @statesraw = $ua->xpath(".//select[\@name='ddlState']/option");
my @states; my %statesname; foreach my $state (@statesraw) {next if $state->{'textContent'} =~ /Select/; push(@states,$state->{'value'}); $statesname{$state->{'value'}}=$state->{'textContent'};}
my $done=0;
foreach my $state (@states) {
next if ($done == 0 && $state ne "S25"); # TODO - to speed up crawling if almost all states are done ;-)
print "Processing State ".$statesname{$state}."\n";
my @forms = $ua->forms();
if (scalar(@forms) == 0) {$ua->get(""); goto repeat}
$ua->field('ddlState' => $state);
$ua->eval('javascript:setTimeout("__doPostBack(\"ddlState\",\"\")", 0)');
my $waittime=0;
districtsraw: sleep 1; $waittime++; if ($waittime > 180) {goto repeat}
my @districtsraw = $ua->xpath(".//select[\@name='ddlDistrict']/option");
if (scalar(@districtsraw) == 1) {goto districtsraw}
sleep 1;
my @districts=(); my %districtname; foreach my $district (@districtsraw) {next if $district->{'textContent'} =~ /Select/; push(@districts,$district->{'value'}); $districtname{$district->{'value'}}=$district->{'textContent'}}
if (scalar(@districts) ==0) {goto repeat}
district: foreach my $district (@districts) {
my @check = $dbh->selectrow_array("SELECT * from booths WHERE state = ? AND district = ?",undef,$statesname{$state},$district);
if (scalar(@check)>1) { print "|--> District ".$districtname{$district}." skipped\n"; next}
print "|--> District ".$districtname{$district}."\n";
if ($ua->value('ddlState') ne $state) { # crashed somehow, redo it
$ua->field('ddlState' => $state);
$ua->eval('javascript:setTimeout("__doPostBack(\"ddlState\",\"\")", 0)');
my $waittime=0;
districtsrawagain: sleep 1; $waittime++; if ($waittime > 180) {goto repeatdistrict}
my @districtsraw = $ua->xpath(".//select[\@name='ddlDistrict']/option");
if (scalar(@districtsraw) == 1) {goto districtsrawagain}
sleep 1;
$ua->field('ddlDistrict' => $district);
$ua->eval('javascript:setTimeout("__doPostBack(\"ddlDistrict\",\"\")", 0)');
my $waittime=0;
acraw: sleep 1;
my @acraw = $ua->xpath(".//select[\@name='ddlAC']/option");
$waittime++; if ($waittime > 180) {goto repeatdistrict}
if (scalar(@acraw) == 1) {goto acraw}
sleep 5;
my $waittime=0;
while (!-e "touched") {
sleep 1; $waittime++; if ($waittime > 180) {
system("echo '".$statesname{$state}." ($state) - ".$districtname{$district}." ($district)' >> crashlog");
print "|--> Crashed, continue with next district\n";
next district;
system("rm -f touched");
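
The scraper above repeatedly sleeps for one second and gives up after 180 attempts, both when waiting for a dropdown to fill and when waiting for the `touched` marker file. As a rough illustration of that poll-with-timeout pattern, here is a minimal Python sketch; the `wait_until` helper and its parameters are hypothetical and are not part of the Perl script.

```python
import time


def wait_until(condition, timeout=180.0, interval=1.0):
    """Poll condition() until it is true or timeout seconds have passed.

    Returns True if the condition became true, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Hypothetical condition that becomes true on the third check.
calls = {"n": 0}


def ready_on_third_check():
    calls["n"] += 1
    return calls["n"] >= 3


result = wait_until(ready_on_third_check, timeout=5, interval=0.01)
print(result, calls["n"])
```

For the marker file, the condition could be `lambda: os.path.exists("touched")`; returning a boolean instead of jumping to a label lets the caller decide whether to retry, log the failure, or move on to the next district.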