A hierarchical architecture of deconvolution networks trained to identify cars in aerial images. The architecture looks at the image at several resolutions, which lets it take in context and then focus on promising areas. Each level of the hierarchy predicts a heatmap at its own resolution.
A report is available at this link.
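
The coarse-to-fine idea described above can be illustrated with a small NumPy sketch. This is not the project's code: the function names, the `factors` and `threshold` parameters, and the gating rule are all assumptions made for illustration, and `placeholder_scorer` is a hypothetical stand-in for a trained deconvolution network (it only rescales pixel values). The sketch shows one possible way that each level could predict a heatmap at its own resolution while the coarser level's heatmap restricts where the finer level looks.

```python
import numpy as np


def downsample(image, factor):
    """Average-pool a 2-D array by an integer factor (crops any remainder)."""
    h, w = image.shape
    h, w = h - h % factor, w - w % factor
    blocks = image[:h, :w].reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))


def upsample(heatmap, shape):
    """Nearest-neighbour resize of a 2-D array to `shape`."""
    rows = np.arange(shape[0]) * heatmap.shape[0] // shape[0]
    cols = np.arange(shape[1]) * heatmap.shape[1] // shape[1]
    return heatmap[np.ix_(rows, cols)]


def placeholder_scorer(image):
    """Hypothetical stand-in for one level's trained network.

    It only rescales pixel values to [0, 1]; a real level would run a
    learned model here.
    """
    lo, hi = image.min(), image.max()
    if hi == lo:
        return np.zeros_like(image, dtype=float)
    return (image - lo) / (hi - lo)


def hierarchical_heatmaps(image, factors=(4, 2, 1), threshold=0.5,
                          scorer=placeholder_scorer):
    """Return one heatmap per level, coarsest first.

    Assumed gating rule: a finer level keeps a score only where the
    previous (coarser) heatmap reached `threshold`.
    """
    heatmaps = []
    focus = None
    for factor in factors:
        level_image = downsample(image, factor)
        heatmap = scorer(level_image)
        if focus is not None:
            # Restrict this level to the areas the coarser level flagged.
            mask = upsample(focus, heatmap.shape) >= threshold
            heatmap = heatmap * mask
        heatmaps.append(heatmap)
        focus = heatmap
    return heatmaps
```

Calling `hierarchical_heatmaps(image)` on a 2-D grayscale array returns a list of heatmaps, one per entry in `factors`. Swapping `placeholder_scorer` for a function that wraps a trained model would keep the same control flow.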