Google Summer of Code 2017
PySAL is inviting students to join in PySAL's development by applying for Google Summer of Code 2017. This is the second year PySAL will be seeking to participate, and we hope to again work under the umbrella of the Python Software Foundation (PSF).
PySAL is an open source library of spatial analysis functions written in Python, intended to support the development of high-level applications. See our documentation for more details. The developer guide describes in more detail how to contribute to PySAL and our workflow for contributions. Our issues are also on GitHub; they include bug reports, 'wishlist' items, and enhancement plans and ideas.
If you are interested in participating in GSoC as a student, the best approach is to become an active and engaged contributor to the project right away. You should take a look at some of the existing issues on GitHub and see if there are any you think you might be able to take a crack at. Try submitting a pull request for something and start getting the hang of the process and interacting with the PySAL code base and development community.
Guidelines and Prerequisites
Students should start by reading the guidelines for participation. Google also provides guidelines to help with writing a proposal as part of their GSoC Student Guide. It is a good idea to start on your proposal early, post a draft to the pysal-dev mailing list and iterate based on the feedback you receive. This will not only improve the quality of your proposal, but also help you find a suitable mentor.
Please note that as a sub-organization of the PSF (and active members of the Python community), we ask that all mentors and students working with PySAL abide by the Python Community Code of Conduct.
Below is a list of possible projects that students might consider. We also encourage students to propose their own projects, though several of the following topics are relatively high on our priority list. Our priority list is flexible, and it is important that the topic matches the interest and background of the student.
When considering the following projects, don't be put off by the knowledge prerequisites -- you don't need to be an expert, and there is some scope for research and learning within the GSoC period. However, familiarity with and interest in the subject area and involved technologies will be helpful!
Point Pattern Analysis (PPA) Module
Point pattern analysis (PPA) is the study of the spatial arrangements of points in (usually 2D) space. Currently, there are very few options for conducting comprehensive PPA in Python. A preliminary module has been developed for PySAL as a first step in this direction. Extending this module with unit tests, examples, new functions and statistical tests, and so on would be an excellent GSoC project. The goal is not necessarily to be as comprehensive as, say, R's spatstat* package, but to support as much of the PPA workflow as possible in Python.
Specific activities/goals include:
- additional tests/additional test coverage
- optimization of envelopes and simulation-based inference
- algorithmic improvements and speedups
- additional statistical tests and generating processes
- development of educational resources
Difficulty level: beginner to intermediate
Knowledge of PPA theory and the mathematical/statistical properties of 2D point processes is required. The primary goal at this stage is API development and extension of the tests; optimizations may be done as needed.
Expected outcomes: a set of production-ready tests and data generating functions for PPA to rival other languages/packages!
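To make the "envelopes and simulation-based inference" item above concrete, here is a minimal sketch of a Monte Carlo envelope for one simple statistic, the mean nearest-neighbor distance, under complete spatial randomness (CSR) in the unit square. It uses only NumPy. The function names `mean_nn_distance` and `csr_envelope` and the clustered test pattern are hypothetical illustrations, not part of the existing PySAL module.

```python
import numpy as np


def mean_nn_distance(points):
    """Mean distance from each point to its nearest neighbor (brute force)."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # ignore each point's distance to itself
    return dist.min(axis=1).mean()


def csr_envelope(n_points, n_sims=99, seed=0):
    """Simulate CSR patterns in the unit square and return the
    (lowest, highest) simulated value of the statistic."""
    rng = np.random.default_rng(seed)
    sims = [mean_nn_distance(rng.random((n_points, 2))) for _ in range(n_sims)]
    return min(sims), max(sims)


# Hypothetical clustered pattern: 60 points scattered tightly around 3 parents.
rng = np.random.default_rng(1)
parents = rng.random((3, 2))
clustered = parents[rng.integers(0, 3, size=60)] + rng.normal(scale=0.02, size=(60, 2))

low, high = csr_envelope(n_points=60)
observed = mean_nn_distance(clustered)
print("observed:", observed, "CSR envelope:", (low, high))
```

An observed value below the envelope suggests clustering relative to CSR, and one above it suggests dispersion. The brute-force distance matrix is quadratic in the number of points, which is one place where the algorithmic speedups listed above (for example, a spatial index) would matter.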
PySAL was originally conceived as a library implementing advanced spatial statistics and econometric methods. Given that there were many different visualization toolkits and GIS packages in the Python ecosystem, visualization was not a focus of our library. Over time, however, users of PySAL wanted the ability to visualize the results of the computations that the analytical components provided. In response, a contributed module, viz, was developed to explore alternative approaches to providing lightweight visualization for PySAL.
The goal of the viz module is to provide a simple-to-use and lightweight interface that connects PySAL to different popular visualization toolkits. While much progress has been made, there is more that can be done on the viz project, as the visualization space is constantly evolving.
Specific activities for the viz project include:
- Refinement and extension of the matplotlib interface (e.g. legends, views for analytics, regression object plots)
- Development of interactive visualizations in Jupyter
- Exploration of potential interfaces for alternative packages (e.g., Bokeh, folium, D3)
Difficulty level: intermediate
Bayesian Spatial Models
It has long been possible to estimate many of the models in pysal.spreg using Bayesian methods. However, due to the lack of support for simultaneous autoregressive (SAR) specifications in common Bayesian spatial analysis packages, many statistical users end up writing custom Gibbs samplers for new model specifications.
To help the Bayesian computation community in Python and the spatial analysis community generally, a project demonstrating implementations of the common SAR specifications in pysal.spreg, in addition to spatial Gaussian process models, would provide a set of common reference implementations for Bayesian spatial econometrics. These implementations could target either PyMC3 or Stan, but the goal would be to provide examples that allow HMC techniques to be used to estimate common spatial econometric models.
To make these estimation techniques efficient, we anticipate that interested candidates may need familiarity with sparse matrix techniques and libraries in Python, namely scipy.sparse. This module may be rolled together with the new multilevel SAR-Error model estimators in spvcm. Together, this would include any custom classes, distributions, or utilities required to state and estimate models efficiently in either PyMC3 or Stan, as well as examples demonstrating how to do so.
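As an illustration of why sparse matrix techniques matter here, the following sketch evaluates the Gaussian log-likelihood of a spatial lag (SAR) model, y = rho W y + X beta + e, using scipy.sparse. The costly term is the log-determinant of (I - rho W), which is computed from a sparse LU factorization. The helpers `lattice_weights` and `sar_loglik` are hypothetical names written for this example; they are not pysal.spreg, spvcm, PyMC3, or Stan APIs, and the data are simulated.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import splu


def lattice_weights(nrows, ncols):
    """Row-standardized rook-contiguity weights for a regular lattice."""
    n = nrows * ncols
    rows, cols = [], []
    for r in range(nrows):
        for c in range(ncols):
            i = r * ncols + c
            if r + 1 < nrows:  # neighbor below
                rows += [i, i + ncols]
                cols += [i + ncols, i]
            if c + 1 < ncols:  # neighbor to the right
                rows += [i, i + 1]
                cols += [i + 1, i]
    W = sparse.csr_matrix((np.ones(len(rows)), (rows, cols)), shape=(n, n))
    row_sums = np.asarray(W.sum(axis=1)).ravel()
    return (sparse.diags(1.0 / row_sums) @ W).tocsr()


def sar_loglik(rho, beta, sigma2, y, X, W):
    """Log-likelihood of y = rho*W*y + X*beta + e, with e ~ N(0, sigma2*I)."""
    n = W.shape[0]
    A = (sparse.identity(n, format="csc") - rho * W).tocsc()
    # log|det A| is the sum of log|U_ii| because L has a unit diagonal.
    logdet = np.sum(np.log(np.abs(splu(A).U.diagonal())))
    resid = A @ y - X @ beta
    return logdet - 0.5 * n * np.log(2 * np.pi * sigma2) - resid @ resid / (2 * sigma2)


# Simulate data from the model on a 10 x 10 lattice (assumed parameter values).
rng = np.random.default_rng(0)
W = lattice_weights(10, 10)
n = W.shape[0]
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta, rho_true, sigma2 = np.array([1.0, 2.0]), 0.5, 1.0
A_true = (sparse.identity(n, format="csc") - rho_true * W).tocsc()
y = splu(A_true).solve(X @ beta + rng.normal(scale=np.sqrt(sigma2), size=n))

for rho in (0.0, 0.5):
    print(rho, sar_loglik(rho, beta, sigma2, y, X, W))
```

A sampler would evaluate a function like this many times with different values of rho, which is why avoiding a dense determinant is important for larger n. How such a term would be wrapped as a custom distribution in PyMC3 or Stan is left open here.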
- Familiarity with Theano, NumPy, Stan, and PyMC3
- Background or familiarity with econometric methods and techniques
- Basic understanding of Bayesian statistics, particularly Bayesian linear models or Gaussian process models
- Banerjee, S., B. Carlin, and A. Gelfand. 2014. Hierarchical Modeling and Analysis for Spatial Data
- LeSage, J. and R.K. Pace. 2010. Introduction to Spatial Econometrics
Difficulty Level: intermediate
Explicitly spatial unsupervised learning (regionalization)
The field of regionalization (Duque, Ramos, & Suriñach, 2007) is a subdomain that aims to bring space explicitly into the grouping of observations into consistent categories. In essence, the idea is to cluster observations based on a given set of attributes, much as in traditional unsupervised learning, but to restrict the groupings by imposing a spatial constraint (usually, that the observations in a group be geographically contiguous). The result is thus the geographic aggregation of small areas into consistent and coherent regions.
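The idea of clustering under a contiguity constraint can be sketched in a few lines of plain Python. The toy function below greedily merges the two neighboring regions whose mean attribute values are closest, and never merges regions that do not share a border. It is a deliberately simple illustration written for this page; the name `regionalize` is hypothetical and this is not one of the algorithms implemented in clusterpy.

```python
def regionalize(values, neighbors, n_regions):
    """Greedy spatially constrained agglomeration.

    values:    dict mapping area id -> attribute value
    neighbors: dict mapping area id -> set of contiguous area ids
    Returns a dict mapping area id -> region label.
    """
    members = {a: {a} for a in values}  # every area starts as its own region
    label = {a: a for a in values}

    def mean(region):
        return sum(values[a] for a in members[region]) / len(members[region])

    while len(members) > n_regions:
        # Only pairs of distinct regions that touch are candidates for merging.
        candidates = {
            tuple(sorted((label[a], label[b])))
            for a in values
            for b in neighbors[a]
            if label[a] != label[b]
        }
        if not candidates:
            break  # the remaining regions are not connected to each other
        r, s = min(candidates, key=lambda p: (abs(mean(p[0]) - mean(p[1])), p))
        members[r] |= members.pop(s)
        for a in members[r]:
            label[a] = r
    return label


# Six areas in a row (0-1-2-3-4-5), each contiguous with its immediate neighbors.
values = {0: 1.0, 1: 1.2, 2: 0.9, 3: 5.0, 4: 5.1, 5: 4.8}
neighbors = {i: {j for j in (i - 1, i + 1) if j in values} for i in values}
print(regionalize(values, neighbors, n_regions=2))
```

Without the constraint, two distant areas with similar values could end up in the same group; with it, every region is a connected set of areas.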
Currently, there is an excellent package written purely in Python (clusterpy). However, it is compatible only with Python 2 and is not fully integrated with PySAL, so it does not work smoothly with the rest of the ecosystem (e.g. geopandas data structures), which ultimately compromises its more general adoption.
This project will focus on three specific lines of work:
- Designing and implementing an architecture for clusterpy that allows it to be fully integrated in the pydata-geo ecosystem (e.g. ...).
- Updating clusterpy functionality to be Python 3 compatible.
- Extending the suite of regionalization algorithms implemented.
Difficulty level: intermediate/advanced
PySAL is an open source project and as such we invite contributions from any interested developer. If you have an idea for an enhancement to PySAL, please contact one of the developers to discuss the possibilities for the project in GSoC 2017.
* Note: spatstat is licensed under the GNU GPL, so its code base is not compatible with that of PySAL.
- January 19-February 9 organizations apply
- February 27 organizations announced
- February 27-March 20 students discuss applications with mentoring organizations
- March 20 - April 3 Student application period
- May 4 Accepted student proposals announced
- May 5 - May 29 community bonding
- May 30 - Aug 29 coding
- September 6 results announced
Student Application Template
Python Software Foundation's student application template.