Commit
1 parent 0b2cf15, commit e08a520
Showing 38 changed files with 106,679 additions and 0 deletions.
Original file line number | Original file line | Diff line number | Diff line change |
---|---|---|---|
@@ -0,0 +1,52 @@ | |||
## ODC Database Contents License | |||
|
|||
The Licensor and You agree as follows: | |||
|
|||
### 1.0 Definitions of Capitalised Words | |||
|
|||
The definitions of the Open Database License (ODbL) 1.0 are incorporated | |||
by reference into the Database Contents License. | |||
|
|||
### 2.0 Rights granted and Conditions of Use | |||
|
|||
2.1 Rights granted. The Licensor grants to You a worldwide, | |||
royalty-free, non-exclusive, perpetual, irrevocable copyright license to | |||
do any act that is restricted by copyright over anything within the | |||
Contents, whether in the original medium or any other. These rights | |||
explicitly include commercial use, and do not exclude any field of | |||
endeavour. These rights include, without limitation, the right to | |||
sublicense the work. | |||
|
|||
2.2 Conditions of Use. You must comply with the ODbL. | |||
|
|||
2.3 Relationship to Databases and ODbL. This license does not cover any | |||
Database Rights, Database copyright, or contract over the Contents as | |||
part of the Database. Please see the ODbL covering the Database for more | |||
details about Your rights and obligations. | |||
|
|||
2.4 Non-assertion of copyright over facts. The Licensor takes the | |||
position that factual information is not covered by copyright. The DbCL | |||
grants you permission for any information having copyright contained in | |||
the Contents. | |||
|
|||
### 3.0 Warranties, disclaimer, and limitation of liability | |||
|
|||
3.1 The Contents are licensed by the Licensor "as is" and without any | |||
warranty of any kind, either express or implied, whether of title, of | |||
accuracy, of the presence or absence of errors, of fitness for purpose, | |||
or otherwise. Some jurisdictions do not allow the exclusion of implied | |||
warranties, so this exclusion may not apply to You. | |||
|
|||
3.2 Subject to any liability that may not be excluded or limited by law, | |||
the Licensor is not liable for, and expressly excludes, all liability | |||
for loss or damage however and whenever caused to anyone by any use | |||
under this License, whether by You or by anyone else, and whether caused | |||
by any fault on the part of the Licensor or not. This exclusion of | |||
liability includes, but is not limited to, any special, incidental, | |||
consequential, punitive, or exemplary damages. This exclusion applies | |||
even if the Licensor has been advised of the possibility of such | |||
damages. | |||
|
|||
3.3 If liability may not be excluded by law, it is limited to actual and | |||
direct financial loss to the extent it is caused by proved negligence on | |||
the part of the Licensor. |
Original file line number | Original file line | Diff line number | Diff line change |
---|---|---|---|
@@ -0,0 +1,33 @@ | |||
# Data on religion and politics in India | |||
|
|||
## goagis | |||
|
|||
This table contains GIS coordinates and other spatial characteristics of polling booths in Goa. | |||
|
|||
## Variables | |||
|
|||
name | description | |||
--- | --- | |||
ac_id_09 | ID code of the assembly segment assigned by the Election Commission (identical with all other post-delimitation codes, hence the _09) | |||
booth_id_14 | ID code of the polling booth assigned by the Election Commission for 2014 booths (together with ac_id_09, this should suffice for matching with other tables) | |||
booth_name_14 | Name of the polling booth assigned by the Election Commission for 2014 booths | |||
district_name_14 | Name of the district into which this polling booth is supposed to fall in 2014 (could be used for cleaning the data) | |||
latitude | Geographical latitude | |||
longitude | Geographical longitude | |||
modis | Urban area or not? Derived from MODIS polygon (see below) | |||
modis_rank | How urban? MODIS Scalerank (see below) | |||
|
|||
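The note on `booth_id_14` suggests that a booth is only identified by the pair (`ac_id_09`, `booth_id_14`), presumably because booth IDs restart in each assembly segment. A minimal sketch of such a match follows, assuming the table is called `goagis` and using a hypothetical second table `other` with a hypothetical `value` column; all rows are placeholders, not real data.

```python
import sqlite3

# In-memory database with placeholder rows (NOT real data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE goagis (ac_id_09 INTEGER, booth_id_14 INTEGER, latitude FLOAT, longitude FLOAT);
CREATE TABLE other (ac_id_09 INTEGER, booth_id_14 INTEGER, value TEXT);
INSERT INTO goagis VALUES (1, 1, 15.0, 74.0), (1, 2, 15.1, 74.1), (2, 1, 15.2, 74.2);
INSERT INTO other VALUES (1, 2, 'a'), (2, 1, 'b');
""")

# Join on BOTH keys: booth 1 exists in segment 1 and in segment 2,
# so booth_id_14 alone would match the wrong rows.
rows = conn.execute("""
SELECT g.ac_id_09, g.booth_id_14, g.latitude, g.longitude, o.value
FROM goagis AS g
JOIN other AS o
  ON o.ac_id_09 = g.ac_id_09 AND o.booth_id_14 = g.booth_id_14
ORDER BY g.ac_id_09, g.booth_id_14
""").fetchall()

for row in rows:
    print(row)
```

Joining on `booth_id_14` alone would pair the row for booth 1 in segment 2 with booth 1 in segment 1 as well, which is why the sketch uses the composite key.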
## Raw data | |||
|
|||
The 2014 data was originally scraped on May 5, 2014 from http://www.eci-polldaymonitoring.nic.in/psleci, using the Firefox MozRepl plugin in conjunction with download.pl and the custom proxy server at proxy.pl. The data used here is NOT cleaned up, and quality varies from district to district, so you need to be careful. The ID codes are the same as those used for the 2014 Lok Sabha elections. This dataset is identical to the data included in my (more comprehensive) [GIS Shapefiles](http://dx.doi.org/10.4119/unibi/2674065). | |||
|
|||
All three sets of point data were then dumped into CSVs, transformed into ESRI shapefiles using `ogr2ogr booths-locality.shp booths-locality.vrt` and matched manually against the MODIS polygon from [Naturalearth](http://www.naturalearthdata.com/downloads/10m-cultural-vectors/10m-urban-area/) using QGIS. The result was then exported back into booths-locality-modis.sqlite. | |||
|
|||
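The matching against the MODIS urban polygon was done manually in QGIS. For readers who want to reproduce the idea without a GIS, the following is a rough sketch of a different technique, a plain ray-casting point-in-polygon test. The function names, the 0/1 coding of `modis`, and the sample coordinates are assumptions for illustration only; real urban polygons may have holes and multiple parts, which this sketch ignores.

```python
def point_in_ring(lon, lat, ring):
    """Ray-casting test: is (lon, lat) inside the ring of (lon, lat) vertices?"""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point count.
        if (y1 > lat) != (y2 > lat):
            # Longitude at which this edge crosses that horizontal line.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def flag_urban(booths, urban_rings):
    """Return copies of the booth dicts with a hypothetical 0/1 'modis' flag."""
    flagged = []
    for booth in booths:
        # The VRT declares latitude/longitude as strings, so convert first.
        lon, lat = float(booth["longitude"]), float(booth["latitude"])
        hit = any(point_in_ring(lon, lat, ring) for ring in urban_rings)
        flagged.append({**booth, "modis": int(hit)})
    return flagged


# Placeholder square "urban area" and two placeholder booths (not real data).
square = [(74.0, 15.0), (74.2, 15.0), (74.2, 15.2), (74.0, 15.2)]
booths = [
    {"booth": 1, "longitude": "74.1", "latitude": "15.1"},
    {"booth": 2, "longitude": "74.5", "latitude": "15.5"},
]
flagged = flag_urban(booths, [square])
print([b["modis"] for b in flagged])
```

A spatial join in QGIS or a geometry library would handle edge cases (points on a boundary, multipolygons) more carefully than this sketch does.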
The final table was put together using `cat transform.sql | sqlite3`. | |||
|
|||
## License | |||
|
|||
While the database in its entirety is subject to an [ODC Open Database License](http://opendatacommons.org/licenses/odbl/), as explained in the main [README](https://github.com/raphael-susewind/india-religion-politics/blob/master/README.md) and [LICENSE](https://github.com/raphael-susewind/india-religion-politics/blob/master/LICENSE.md) files, the content of this specific table is factual data, and as such only subject to a simple [ODC Database Contents License](http://opendatacommons.org/licenses/dbcl/) (at the time of scraping, the respective websites did not display any copyright information). Code used for crawling and compilation is subject to a [CC-BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/) license. If you use the modis and modis_rank variables, the original authors ask that you additionally attribute them: | |||
|
|||
> Schneider, A., M. A. Friedl, D. K. McIver, and C. E. Woodcock (2003) Mapping urban areas by fusing multiple sources of coarse resolution remotely sensed data. Photogrammetric Engineering and Remote Sensing, volume 69, pages 1377-1386. | |||
Original file line number | Original file line | Diff line number | Diff line change |
---|---|---|---|
@@ -0,0 +1,14 @@ | |||
<OGRVRTDataSource> | |||
<OGRVRTLayer name="booths-locality"> | |||
<SrcDataSource>booths-locality.csv</SrcDataSource> | |||
<GeometryType>wkbPoint</GeometryType> | |||
<LayerSRS>+proj=latlong +datum=WGS84</LayerSRS> | |||
<GeometryField encoding="PointFromColumns" x="longitude" y="latitude"/> | |||
<Field name="constituency" type="Integer"/> | |||
<Field name="station_name" type="String"/> | |||
<Field name="booth" type="Integer"/> | |||
<Field name="district_name" type="String"/> | |||
<Field name="latitude" type="String"/> | |||
<Field name="longitude" type="String"/> | |||
</OGRVRTLayer> | |||
</OGRVRTDataSource> |
Original file line number | Original file line | Diff line number | Diff line change |
---|---|---|---|
@@ -0,0 +1,105 @@ | |||
#!/usr/bin/perl | |||
|
|||
# system("rm -f booths-locality.sqlite"); | |||
system("rm -f touched"); | |||
|
|||
use DBI; | |||
our $dbh = DBI->connect("dbi:SQLite:dbname=booths-locality.sqlite","","",{sqlite_unicode => 1}); | |||
$dbh->do("CREATE TABLE IF NOT EXISTS booths (state CHAR, district INTEGER, district_name CHAR, constituency INTEGER, constituency_name CHAR, booth INTEGER, station_name CHAR, latitude FLOAT, longitude FLOAT)"); | |||
|
|||
$|=1; | |||
use utf8; | |||
|
|||
use WWW::Mechanize::Firefox; | |||
|
|||
# | |||
# Iterate through everything while simultaneously saving everything of relevance with a hidden proxy - pretty neat! | |||
# | |||
|
|||
system("./proxy.pl &"); | |||
|
|||
my $ua = WWW::Mechanize::Firefox->new(autodie=>0,activate=>1,autoclose=>0); | |||
our $pageloaded=0; | |||
|
|||
$ua->get("http://www.eci-polldaymonitoring.nic.in/psleci/default.aspx"); | |||
|
|||
my @statesraw = $ua->xpath(".//select[\@name='ddlState']/option"); | |||
my @states; my %statesname; foreach my $state (@statesraw) {next if $state->{'textContent'} =~ /Select/; push(@states,$state->{'value'}); $statesname{$state->{'value'}}=$state->{'textContent'};} | |||
|
|||
my $done=0; | |||
foreach my $state (@states) { | |||
|
|||
repeat: | |||
|
|||
next if ($done == 0 && $state ne "S25"); # TODO - to speed up crawling if almost all states are done ;-) | |||
$done=1; | |||
|
|||
print "Processing State ".$statesname{$state}."\n"; | |||
|
|||
my @forms = $ua->forms(); | |||
if (scalar(@forms) == 0) {$ua->get("http://www.eci-polldaymonitoring.nic.in/psleci/default.aspx"); goto repeat} | |||
|
|||
$ua->form_name('form1'); | |||
$ua->field('ddlState' => $state); | |||
$ua->eval('javascript:setTimeout("__doPostBack(\"ddlState\",\"\")", 0)'); | |||
|
|||
my $waittime=0; | |||
districtsraw: sleep 1; $waittime++; if ($waittime > 180) {goto repeat} | |||
my @districtsraw = $ua->xpath(".//select[\@name='ddlDistrict']/option"); | |||
if (scalar(@districtsraw) == 1) {goto districtsraw} | |||
sleep 1; | |||
|
|||
my @districts=(); my %districtname; foreach my $district (@districtsraw) {next if $district->{'textContent'} =~ /Select/; push(@districts,$district->{'value'}); $districtname{$district->{'value'}}=$district->{'textContent'}} | |||
if (scalar(@districts) ==0) {goto repeat} | |||
|
|||
district: foreach my $district (@districts) { | |||
my @check = $dbh->selectrow_array("SELECT * from booths WHERE state = ? AND district = ?",undef,$statesname{$state},$district); | |||
if (scalar(@check)>1) { print "|--> District ".$districtname{$district}." skipped\n"; next} | |||
|
|||
repeatdistrict: | |||
|
|||
print "|--> District ".$districtname{$district}."\n"; | |||
|
|||
$ua->form_name('form1'); | |||
|
|||
if ($ua->value('ddlState') ne $state) { # crashed somehow, redo it | |||
|
|||
$ua->form_name('form1'); | |||
$ua->field('ddlState' => $state); | |||
$ua->eval('javascript:setTimeout("__doPostBack(\"ddlState\",\"\")", 0)'); | |||
my $waittime=0; | |||
districtsrawagain: sleep 1; $waittime++; if ($waittime > 180) {goto repeatdistrict} | |||
my @districtsraw = $ua->xpath(".//select[\@name='ddlDistrict']/option"); | |||
if (scalar(@districtsraw) == 1) {goto districtsrawagain} | |||
sleep 1; | |||
} | |||
|
|||
$ua->form_name('form1'); | |||
$ua->field('ddlDistrict' => $district); | |||
$ua->eval('javascript:setTimeout("__doPostBack(\"ddlDistrict\",\"\")", 0)'); | |||
|
|||
my $waittime=0; | |||
acraw: sleep 1; | |||
my @acraw = $ua->xpath(".//select[\@name='ddlAC']/option"); | |||
$waittime++; if ($waittime > 180) {goto repeatdistrict} | |||
if (scalar(@acraw) == 1) {goto acraw} | |||
sleep 5; | |||
|
|||
$ua->click({xpath=>".//input[\@name='imgbtnFind']",synchronize=>0}); | |||
|
|||
$waittime=0; # reuse the counter declared above instead of redeclaring it in the same scope | |||
while (!-e "touched") { | |||
sleep 1; $waittime++; if ($waittime > 180) { | |||
system("echo '".$statesname{$state}." ($state) - ".$districtname{$district}." ($district)' >> crashlog"); | |||
print "|--> Crashed, continue with next district\n"; | |||
next district; | |||
} | |||
} | |||
system("rm -f touched"); | |||
|
|||
} | |||
} | |||
|
|||
$dbh->disconnect; | |||
|
|||
system("killall proxy.pl"); |
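The script above repeatedly uses one pattern: poll once per second for a condition and give up after 180 tries. For the `touched` file, it appears that proxy.pl creates the marker once it has saved a response, although that script is not shown here. A minimal Python sketch of the same wait loop follows; the function name and parameters are hypothetical and not part of the original code.

```python
import os
import time


def wait_for_marker(path, max_polls=180, interval=1.0, sleep=time.sleep):
    """Poll until `path` exists. Return True if it appeared, False on timeout."""
    polls = 0
    while not os.path.exists(path):
        if polls >= max_polls:
            return False  # caller can log the failure and move on
        sleep(interval)
        polls += 1
    os.remove(path)  # consume the marker so the next wait starts clean
    return True
```

A caller would treat a `False` return the way the Perl script treats its timeout: write the state and district to a crash log and continue with the next district.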