Google Summer of Code 2016

neohe23 edited this page Mar 2, 2016 · 17 revisions

PySAL is inviting students to join in PySAL's development by applying for Google Summer of Code 2016. As this is the first year PySAL will be seeking to participate, we hope to work under the umbrella of the Python Software Foundation (PSF).

Introduction

PySAL is an open source library of spatial analysis functions written in Python, intended to support the development of high-level applications. See our documentation for more details. The developer guide describes in more detail how to contribute to PySAL and the workflow we use for contributions. Our issues are also on GitHub; they include bug reports, 'wishlist' items, and enhancement plans and ideas.

If you are interested in participating in GSoC as a student, the best approach is to become an active and engaged contributor to the project right away. You should take a look at some of the existing issues on GitHub and see if there are any you think you might be able to take a crack at. Try submitting a pull request for something and start getting the hang of the process and interacting with the PySAL code base and development community.

Guidelines and Prerequisites

Students should start by reading the guidelines for participation. Google also provides guidelines to help with writing a proposal as part of their GSoC Student Guide. It is a good idea to start on your proposal early, post a draft to the pysal-dev mailing list and iterate based on the feedback you receive. This will not only improve the quality of your proposal, but also help you find a suitable mentor.

Please note that as a sub-organization of the PSF (and active members of the Python community), we ask that all mentors and students working with PySAL abide by the Python Community Code of Conduct.

Project Ideas

Below is a list of possible projects that students might consider. We also encourage students to propose their own projects, though several of the following topics are relatively high on our priority list. Our priority list is flexible, and it is important that the topic matches the interest and background of the student.

When considering the following projects, don't be put off by the knowledge prerequisites -- you don't need to be an expert, and there is some scope for research and learning within the GSoC period. However, familiarity with and interest in the subject area and involved technologies will be helpful!

Point Pattern Analysis (PPA) Module

Point pattern analysis (PPA) is the study of the spatial arrangements of points in (usually 2D) space. Currently, there are very few options for conducting comprehensive PPA in Python. A preliminary module has been developed for PySAL as a first step in this direction; extending this module with unit tests, examples, new functions and statistical tests, etc. would be an excellent GSoC project. The goal is not necessarily to be as comprehensive as, say, R's spatstat* package, but to support as much of the PPA workflow as possible in Python.

Specific activities/goals include:

  • additional tests/additional test coverage
  • optimization of envelopes and simulation based inference
  • algorithmic improvements and speedups
  • additional statistical tests and generating processes
  • development of educational resources
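To give a feel for the simulation-based inference mentioned above, here is a minimal sketch in pure Python. It is not the existing PySAL module: the function names are hypothetical, the study area is assumed to be the unit square, and the statistic (mean nearest-neighbor distance) is just one simple choice. The observed value is compared against an envelope built from patterns simulated under complete spatial randomness (CSR).

```python
import math
import random


def mean_nn_distance(points):
    """Mean distance from each point to its nearest neighbor (brute force)."""
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        total += min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
    return total / len(points)


def csr_points(n, rng):
    """Simulate n points under CSR in the unit square (an assumed window)."""
    return [(rng.random(), rng.random()) for _ in range(n)]


def simulation_envelope(n, n_sims, rng):
    """Min and max of the statistic over n_sims simulated CSR patterns."""
    sims = [mean_nn_distance(csr_points(n, rng)) for _ in range(n_sims)]
    return min(sims), max(sims)


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# A deliberately clustered pattern: 30 points tightly packed around (0.5, 0.5).
clustered = [
    (0.5 + rng.gauss(0, 0.01), 0.5 + rng.gauss(0, 0.01)) for _ in range(30)
]

observed = mean_nn_distance(clustered)
low, high = simulation_envelope(n=30, n_sims=99, rng=rng)

# An observed value below the envelope suggests clustering relative to CSR.
print("observed:", observed, "envelope:", (low, high))
```

The brute-force nearest-neighbor search is quadratic in the number of points, which is exactly the kind of step that the algorithmic improvements listed above could target.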

Difficulty level: beginner to intermediate

Knowledge of PPA theory and the mathematical/statistical properties of 2D point processes is required. The primary goal at this stage is API development and the extension of tests; optimizations may be done as needed.

Expected outcomes: a set of production-ready tests and data generating functions for PPA to rival other languages/packages!

Mentors: David Folch, Jay Laura

Spatial Interaction Modeling

Spatial Interaction (SpInt) modeling seeks to model and predict the flow or movement of individuals, goods, or information through space. It provides a common conceptual framework that is used across a range of disciplines such as economics, geography, and urban planning, to name a few. At the same time, numerous technical specifications have been proposed, along with various conceptual enhancements based on geographic and statistical theory. At present, Python code is available only for legacy methods, so the primary goal of this project would be to make different spatial interaction modeling methods more widely available.
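As an illustration of what a flow model computes, the sketch below implements the classic unconstrained gravity model, T_ij = k * O_i^alpha * D_j^beta * d_ij^(-gamma), in pure Python. This is a textbook form chosen for illustration, not an existing PySAL API; the function name, parameter defaults, and input values are all hypothetical.

```python
def gravity_flows(origins, destinations, distances,
                  k=1.0, alpha=1.0, beta=1.0, gamma=2.0):
    """Predict flows T[i][j] with an unconstrained gravity model.

    origins[i]      -- size (e.g. population) of origin i
    destinations[j] -- attractiveness of destination j
    distances[i][j] -- positive separation between origin i and destination j
    """
    flows = []
    for i, o_i in enumerate(origins):
        row = []
        for j, d_j in enumerate(destinations):
            # Power-law distance decay: larger gamma penalizes distance more.
            row.append(k * o_i ** alpha * d_j ** beta * distances[i][j] ** -gamma)
        flows.append(row)
    return flows


# Made-up inputs: two origins, two destinations.
flows = gravity_flows(
    origins=[100, 200],
    destinations=[50, 10],
    distances=[[1, 2], [2, 1]],
)
for row in flows:
    print(row)
```

In practice the parameters would be estimated from observed flows rather than fixed by hand, which is where a regression-based approach comes in.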

Specific activities/goals include:

  • Extend PySAL's spatial regression classes for econometric flow models.
  • Implement flow-based spatial weights.
  • Add functions to compute spatial eigenvector filter and competing destination variables for flow models.
  • Develop new classes for deterministic "universal" models and neural network flow models.
  • Explore exploratory data analysis and diagnostic tools for flows.
  • Design associated unit tests and educational resources.

Difficulty level: Intermediate

This project will require some knowledge of the relevant literature on spatial interaction models, specifically from a spatial econometrics and spatial statistics viewpoint. It would also be helpful to have some familiarity with existing PySAL infrastructure.

Expected outcomes: a collection of tools for spatial interaction modeling that utilize existing PySAL infrastructure, build upon its spatial regression classes, and provide additional techniques that are otherwise unavailable.

Mentors: Carson Farmer, Dani Arribas-Bel

Geovisualization Module

PySAL was originally conceived as a library implementing advanced spatial statistics and econometric methods. Given that there were many different visualization toolkits in the Python ecosystem as well as GIS packages, visualization was not a focus of our library. However, over time users of PySAL wanted the ability to visualize the results of the computations that the analytical components provided. In response, a contributed module, viz, was developed to explore alternative approaches to providing lightweight visualization for PySAL.

The goal of the viz module is to provide a simple to use and lightweight interface that connects PySAL to different popular visualization toolkits. While much progress has been made, there is more that can be done on the viz project as the visualization space is one that is constantly evolving.

Specific activities for the viz project include:

  • Refinement and extension of the matplotlib interface
  • Development of interactive visualizations in jupyter
  • Exploration of potential interfaces for alternative packages (e.g., Bokeh, folium, D3)

Difficulty level: intermediate

Mentors: Dani Arribas-Bel, Philip Stephens

Space-Time Analytics and Visualization

The package Space-Time Analysis of Regional Systems (STARS) served as one of the motivations for PySAL as many of the space-time analytical methods from STARS formed the beginnings of the spatial-dynamics module of PySAL. STARS development was put on hold so that efforts could be concentrated on the growth of PySAL. The plan has always been to return to STARS and rebuild a new version by replacing the internal computational modules with the enhanced versions in PySAL.

This effort will align with a recent NSF project to build a new version of STARS that uses PySAL as its main dependency. This will be in the same spirit as the approach used to build the GeoDaSpace project that wraps key functionality from the spreg module in PySAL with a user friendly interface.

Specific activities:

  • Refactor core analytical functionality of STARS to use PySAL.spatial_dynamics (and related modules)
  • Explore alternative visualization toolkits that update the current visualization capabilities of STARS
  • Redesign space-time data structures of STARS with a possible upstream push back to PySAL

Difficulty level: intermediate

Mentors: Serge Rey, Philip Stephens

Integrating R-style Formulas Into PySAL

Since version 0.5.0, statsmodels allows users to fit statistical models using R-style formulas. Internally, statsmodels uses the patsy package to convert formulas and data to the matrices that are used in model fitting. The formula framework is quite powerful, and a full description of the formula language can be found in the patsy docs. The goal of this GSoC project idea is to integrate R-style formulas into PySAL in a similar way to statsmodels.

Statsmodels uses a separate formula module instead of the usual statsmodels API. The models wrapped in the formula module accept a string which describes the model in terms of a patsy formula. They also accept a data argument which takes a pandas data frame. There are a few key differences between statsmodels and PySAL (namely the spatial component), and so this project will likely involve a significant amount of 'scoping' of the problem to decide exactly how to proceed (e.g., do we integrate GeoPandas/Pandas data frames or stick with existing internal PySAL data structures?).
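To make the idea of turning a formula plus data into matrices concrete, here is a toy sketch in pure Python. It is not patsy and not a proposed PySAL API: the function name is hypothetical, it only understands plain column names joined by +, it always adds an intercept, and the data values are made up.

```python
def design_matrices(formula, data):
    """Split 'y ~ x1 + x2' into a response vector and a design matrix.

    data maps column names to equal-length lists of numbers, standing in
    for a data frame. Returns (y, X, column_names), where X has a leading
    intercept column of ones.
    """
    lhs, rhs = (side.strip() for side in formula.split("~"))
    terms = [term.strip() for term in rhs.split("+")]
    y = [float(value) for value in data[lhs]]
    X = [
        [1.0] + [float(data[term][i]) for term in terms]
        for i in range(len(y))
    ]
    return y, X, ["Intercept"] + terms


# Illustrative column names and values only.
data = {
    "crime": [15.7, 18.8],
    "income": [19.5, 21.2],
    "hoval": [80.5, 44.6],
}
y, X, names = design_matrices("crime ~ income + hoval", data)
print(names)
print(y)
print(X)
```

A formula-based interface for PySAL would additionally need some way to pass the spatial component (for example, a weights object) alongside the formula and data, which is part of the scoping question raised above.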

Specific activities/goals include:

  • scoping activities (i.e., how to proceed)
  • development of formula-based API
  • integration of patsy, pandas, geopandas?
  • test building
  • developing formula-based documentation

Difficulty level: advanced

This is a large project. Knowledge of R-style formulas, model-building, code refactoring, documentation writing, etc. are all required. This project will likely require significant work beyond GSoC, so be prepared!

Expected outcomes: a new R-style formula API for PySAL, likely implemented as a contrib module.

Mentors: Carson Farmer, David Folch

Light & Fast Geocomputation

At our core, PySAL is a spatial analytics library. Spatial statistical techniques often require a way to quantify spatial relationships. This usually takes the form of constructing "Spatial Weights" that describe the topological and geographic relationships between entities. Right now, our weights construction tools are strongly reliant on a single input data type: shapefiles. But, shapefiles are not the future of spatial data, due to strong limitations on their size and their structure. Thus, we need to generalize the spatial weights object, and build tools that can construct spatial weights from arbitrary collections of geometries.
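As a rough sketch of what building weights from arbitrary geometries could look like, the pure-Python example below derives contiguity neighbors from polygons given as plain vertex lists, with no shapefile involved. The function name and input layout are hypothetical, and it makes a strong simplifying assumption: two polygons count as neighbors only if they share an identical vertex, so polygons that touch along an edge without a common vertex would be missed.

```python
from collections import defaultdict


def vertex_contiguity(polygons):
    """Neighbor sets for polygons that share at least one exact vertex.

    polygons maps an id to a list of (x, y) vertex tuples.
    """
    # Index every vertex by the set of polygons that use it.
    vertex_owners = defaultdict(set)
    for pid, vertices in polygons.items():
        for vertex in vertices:
            vertex_owners[vertex].add(pid)

    # Polygons that own the same vertex are neighbors of each other.
    neighbors = {pid: set() for pid in polygons}
    for owners in vertex_owners.values():
        for pid in owners:
            neighbors[pid] |= owners - {pid}
    return neighbors


# Three unit squares: a and b share an edge, b and c share only a corner.
polygons = {
    "a": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "b": [(1, 0), (2, 0), (2, 1), (1, 1)],
    "c": [(2, 1), (3, 1), (3, 2), (2, 2)],
}
neighbors = vertex_contiguity(polygons)
print(neighbors)
```

Because the input is just a mapping of ids to coordinates, the same function would work on geometries read from any source, which is the kind of generalization this project is after.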

To make these weights constructors fast and efficient, this project could also encompass constructing computational geometry tools that preserve our goal of implementations that stay firmly within the set of easily-accessible PyData tools. Thus, implementing the Dimensionally-Extended Nine-Intersection Model, commonly used to describe planar geometric operations, either in Python or Cython would help provide for flexible spatial weights constructors, and open the possibility of using spatial weights in configurations where topological relationships are flexible.

Specific activities include:

  • Construction of fast and robust spatial weighting algorithms
  • Coverage of a full range of weighting functions:
    • Contiguity/Adjacency
    • Distance with kernel function
    • Binary Minimum Threshold Distance
    • Continuous Minimum Threshold Distance
    • K-Nearest Neighbors
  • Full coverage of the DE9IM topological relation set
  • Higher-level tabular functions useful in spatial data science:
    • spatial query
    • spatial join
    • spatial groupby
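Two of the weighting schemes listed above can be sketched for point data in a few lines of pure Python. The function names are hypothetical and the search is brute force; the reading of "minimum threshold distance" as the largest nearest-neighbor distance (the smallest cutoff that leaves no point without a neighbor) is an assumption made for this example.

```python
import math


def _distance(p, q):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def knn_weights(points, k):
    """Map each point index to the indices of its k nearest neighbors."""
    neighbors = {}
    for i, p in enumerate(points):
        ranked = sorted(
            (_distance(p, q), j) for j, q in enumerate(points) if j != i
        )
        neighbors[i] = [j for _, j in ranked[:k]]
    return neighbors


def threshold_weights(points, threshold):
    """Binary distance band: neighbors are all points within threshold."""
    return {
        i: [
            j for j, q in enumerate(points)
            if j != i and _distance(p, q) <= threshold
        ]
        for i, p in enumerate(points)
    }


def min_threshold_distance(points):
    """Largest nearest-neighbor distance across all points."""
    return max(
        min(_distance(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    )


points = [(0, 0), (1, 0), (3, 0), (10, 0)]
print(knn_weights(points, k=1))
print(threshold_weights(points, threshold=2.0))
print(min_threshold_distance(points))
```

A production version would replace the brute-force search with a spatial index such as a KD-tree, since the quadratic cost becomes prohibitive on large datasets.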

Difficulty level: advanced

Mentors: Serge Rey, Jay Laura

Other

PySAL is an open source project and as such we invite contributions from any interested developer. If you have an idea for an enhancement for PySAL, please contact one of the developers to discuss the possibilities for the project in GSoC 2016.

Some of the above guidelines were 'borrowed' from previously successful GSoC Mentoring Organizations, such as Julia and Statsmodels.

* Note: spatstat is licensed under the GNU GPL, so its code base is not compatible with that of PySAL.

Timeline

  • February 8-19 organizations apply
  • February 29 organizations announced
  • March 14-25 students apply
  • April 22 students announced
  • April 22 - May 22 community bonding
  • May 23 - Aug 23 coding
  • Aug 30 results announced

Source: https://summerofcode.withgoogle.com

Student Application Template

Python Software Foundation's student application template.
