# Predict poverty from space

Folder organization:

* LICENSE  
* README.md
* Gaussian_Process.ipynb
* 0_data.ipynb
* data/ 
    *    daytime_images/  
    *    nightlights_intensities/  
        * F182010.v4d_web.stable_lights.avg_vis.tif  
    *    surveys/
        * DHS/
            * Rwanda_2010/
                * RWGE61FL.dbf
                * RWHR61FL.DTA
            * ...
        * LSMS-ISA/
            * ...
* models/
* papers/

In [18]:
import pandas as pd
import numpy as np
from simpledbf import Dbf5

PyTables is not installed. No support for HDF output.
SQLalchemy is not installed. No support for SQL output.


In [19]:
wealth_file = "./data/surveys/DHS/Rwanda_2010/RWHR61FL.DTA"
geo_file = "./data/surveys/DHS/Rwanda_2010/RWGE61FL.dbf"

## 1) Import wealth data

The first file (RWHR61FL.DTA) contains the survey results. Two columns are of particular interest: HV001 is the cluster identifier that links each record to the geographical data, and HV271 contains the wealth index, which is aggregated below into one median value per cluster.

In [19]:
df_wealth = pd.read_stata(wealth_file)

In [39]:
result = df_wealth.groupby('hv001')['hv271'].median().reset_index()
result['hv271'] /= 100000.
print(result.columns)

Index(['hv001', 'hv271'], dtype='object')


## 2) Import geographical data

The second file contains the geographical coordinates of the surveys. The relevant columns are DHSCLUST, which links each row to the clusters of the previous file, and LATNUM and LONGNUM, which will be used to look up the nightlight intensity of each location and to fetch the corresponding daytime satellite images.

In [23]:
dbf = Dbf5(geo_file)

In [24]:
df_geo = dbf.to_dataframe()

In [40]:
# Select from df_geo (loaded above) and take an explicit copy,
# so the cast below does not write to a view of the original frame
result2 = df_geo[['DHSCLUST', 'LATNUM', 'LONGNUM']].copy()
result2['DHSCLUST'] = result2['DHSCLUST'].astype(int)
print(result2.columns)

Index(['DHSCLUST', 'LATNUM', 'LONGNUM'], dtype='object')


## 3) Merge wealth and geographical data

The goal of this section is to build a table that gives the wealth index as a function of the geographical coordinates.

In [43]:
final_result = pd.merge(result, result2, how='inner', left_on='hv001', right_on='DHSCLUST')[['DHSCLUST', 'LATNUM', 'LONGNUM', 'hv271']]
print(final_result.columns)

Index(['DHSCLUST', 'LATNUM', 'LONGNUM', 'hv271'], dtype='object')


In [45]:
final_result = final_result.rename(columns={'DHSCLUST': 'cluster', 'LATNUM': 'latitude', 'LONGNUM': 'longitude', 'hv271': 'wealth_index'})

In [48]:
print(final_result.columns)

Index(['cluster', 'latitude', 'longitude', 'wealth_index'], dtype='object')
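
An inner merge silently drops any cluster that appears in only one of the two files. The sketch below shows one way to check for such clusters with an outer merge and `indicator=True`. The two small frames are made-up stand-ins for `result` and `result2`, not real survey values.

```python
import pandas as pd

# Toy stand-ins for `result` and `result2` (values invented for illustration)
wealth = pd.DataFrame({'hv001': [1, 2, 3], 'hv271': [-0.5, 0.2, 1.1]})
geo = pd.DataFrame({'DHSCLUST': [1, 2, 4],
                    'LATNUM': [-1.9, -2.0, -2.3],
                    'LONGNUM': [30.1, 29.7, 29.9]})

# An outer merge keeps every cluster; the `_merge` column records where each row came from
check = pd.merge(wealth, geo, how='outer',
                 left_on='hv001', right_on='DHSCLUST', indicator=True)

# Rows present in only one of the two inputs would be lost by how='inner'
unmatched = check[check['_merge'] != 'both']
print(unmatched[['hv001', 'DHSCLUST', '_merge']])
```

If `unmatched` is empty on the real data, the inner merge above loses nothing.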


## 4) Import nightlights intensities data

In [2]:
night_file = './data/nightlights_intensities/F182010.v4d_web.stable_lights.avg_vis.tif'

In [1]:
from osgeo import gdal, ogr, osr

In [8]:
dataset = gdal.Open(night_file, gdal.GA_ReadOnly)
print(type(dataset))

<class 'osgeo.gdal.Dataset'>


In [13]:
print("Driver: {}/{}".format(dataset.GetDriver().ShortName,
                             dataset.GetDriver().LongName))
print("Size is {} x {} x {}".format(dataset.RasterXSize,
                                    dataset.RasterYSize,
                                    dataset.RasterCount))
print("Projection is {}".format(dataset.GetProjection()))
geotransform = dataset.GetGeoTransform()
if geotransform:
    print("Origin = ({}, {})".format(geotransform[0], geotransform[3]))
    print("Pixel Size = ({}, {})".format(geotransform[1], geotransform[5]))

Driver: GTiff/GeoTIFF
Size is 43201 x 16801 x 1
Projection is GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0],UNIT["degree",0.0174532925199433],AUTHORITY["EPSG","4326"]]
Origin = (-180.00416666665, 75.00416666665)
Pixel Size = (0.0083333333, -0.0083333333)


In [14]:
band = dataset.GetRasterBand(1)
print("Band Type={}".format(gdal.GetDataTypeName(band.DataType)))
      
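# Use names that do not shadow the built-ins min() and max()
band_min = band.GetMinimum()
band_max = band.GetMaximum()
# Both calls return None when no statistics are stored; test for None because 0 is a valid minimum
if band_min is None or band_max is None:
    (band_min, band_max) = band.ComputeRasterMinMax(True)
print("Min={:.3f}, Max={:.3f}".format(band_min, band_max))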
      
if band.GetOverviewCount() > 0:
    print("Band has {} overviews".format(band.GetOverviewCount()))
      
if band.GetRasterColorTable():
    print("Band has a color table with {} entries".format(band.GetRasterColorTable().GetCount()))

Band Type=Byte
Min=0.000, Max=63.000


In [42]:
band = dataset.GetRasterBand(1)
band_array = band.ReadAsArray()
print(band_array.mean())
print(np.shape(band_array))

0.6435769258995662
(16801, 43201)
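
Reading the whole band loads about 726 million one-byte pixels, although the survey only covers Rwanda. One possible alternative is to read only a window around the country. The sketch below computes such a window from a bounding box, using the origin and pixel size printed earlier. The function name `bbox_to_window` and the bounding box are illustrative assumptions (the box is only a rough approximation of Rwanda), and the rotation terms of the geotransform are assumed to be zero.

```python
import math

def bbox_to_window(geotransform, lon_min, lat_min, lon_max, lat_max):
    """Return (xoff, yoff, xsize, ysize) of the pixel window covering a bounding box.

    Assumes a north-up raster: no rotation terms and a negative pixel height.
    """
    origin_x, pixel_w, _, origin_y, _, pixel_h = geotransform
    xoff = math.floor((lon_min - origin_x) / pixel_w)
    yoff = math.floor((lat_max - origin_y) / pixel_h)   # top edge of the box
    xend = math.ceil((lon_max - origin_x) / pixel_w)
    yend = math.ceil((lat_min - origin_y) / pixel_h)    # bottom edge of the box
    return xoff, yoff, xend - xoff, yend - yoff

# Origin and pixel size as printed by the notebook; rotation terms assumed to be 0
geotransform = (-180.00416666665, 0.0083333333, 0.0,
                75.00416666665, 0.0, -0.0083333333)

# Rough bounding box around Rwanda (lon_min, lat_min, lon_max, lat_max), an assumption
rwanda_bbox = (28.8, -2.9, 31.0, -1.0)

xoff, yoff, xsize, ysize = bbox_to_window(geotransform, *rwanda_bbox)
print(xoff, yoff, xsize, ysize)
```

The four values can then be passed to `band.ReadAsArray(xoff, yoff, xsize, ysize)`. Any later pixel lookup must subtract `xoff` and `yoff`, because the returned array starts at the window's top-left corner.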


## 5) Merge wealth, geographical and nightlights intensities data

The goal of this section is to build a table that gives, for each location identified by its coordinates, the wealth index and the nightlight intensity.
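
A minimal sketch of how this lookup could work, assuming a north-up raster with no rotation terms: each latitude/longitude pair is converted to a row and column through the geotransform, and the intensity is averaged over a small window around that pixel. Averaging over a window rather than reading a single pixel is a design choice made here because DHS cluster coordinates are randomly displaced by a few kilometres for confidentiality. The helper names `latlon_to_pixel` and `mean_light`, the `half_window` default, and the toy raster are all illustrative and not part of the original notebook.

```python
import math
import numpy as np

def latlon_to_pixel(lat, lon, geotransform):
    """Convert WGS 84 coordinates to (row, col) indices of a north-up raster."""
    origin_x, pixel_w, _, origin_y, _, pixel_h = geotransform  # pixel_h is negative
    col = math.floor((lon - origin_x) / pixel_w)
    row = math.floor((lat - origin_y) / pixel_h)
    return row, col

def mean_light(band_array, geotransform, lat, lon, half_window=5):
    """Mean intensity in a (2 * half_window + 1)-pixel square centred on the point."""
    row, col = latlon_to_pixel(lat, lon, geotransform)
    n_rows, n_cols = band_array.shape
    if not (0 <= row < n_rows and 0 <= col < n_cols):
        raise ValueError("point lies outside the raster")
    # Clip the window at the raster edges
    r0, r1 = max(row - half_window, 0), min(row + half_window + 1, n_rows)
    c0, c1 = max(col - half_window, 0), min(col + half_window + 1, n_cols)
    return float(band_array[r0:r1, c0:c1].mean())

# Toy 100 x 100 raster with a bright square, standing in for the real band_array
toy_geotransform = (29.0, 0.01, 0.0, -1.0, 0.0, -0.01)
toy_array = np.zeros((100, 100), dtype=np.uint8)
toy_array[40:60, 40:60] = 63

bright = mean_light(toy_array, toy_geotransform, lat=-1.505, lon=29.505)
dark = mean_light(toy_array, toy_geotransform, lat=-1.105, lon=29.105)
print(bright, dark)
```

On the real data, the same function could be applied to each row of `final_result` with `band_array` and `geotransform` from the previous section, storing the result in a new column.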

## 6) Predict wealth from nightlights intensities

## 7) Download daytime images

## 8) Merge wealth, geographical, nightlights intensities and daytime images data

## 9) Predict wealth from deep features of daytime images

## 10) Retrain VGG to predict nightlights intensities from daytime images

## 11) Predict wealth from the new deep features of daytime images