
Merge pull request #127 from jbouffard/performance-master
Performance Refactor
Jacob Bouffard committed Apr 25, 2017
2 parents 15b395c + 0272b6d commit fa15e1d
Showing 90 changed files with 4,500 additions and 4,343 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -100,3 +100,4 @@ geopyspark/jars/*.jar

# Unit test performance results
prof/
.ensime_cache
9 changes: 3 additions & 6 deletions .travis.yml
@@ -9,6 +9,7 @@ branches:
only:
- io-refactor
- master
- refactor/performance-refactor

addons:
apt:
@@ -38,14 +39,10 @@ cache:
- $HOME/.ivy2
- $HOME/.cache/pip

notifications:
email:
recipients:
- jbouffard@azavea.com

script:
- "if [ ! -f archives/spark-2.1.0-bin-hadoop2.7.tgz ]; then pushd archives ; wget http://d3kbcqa49mib13.cloudfront.net/spark-2.1.0-bin-hadoop2.7.tgz; popd; fi"
- "tar -xvf archives/spark-2.1.0-bin-hadoop2.7.tgz"
- "export SPARK_HOME=./spark-2.1.0-bin-hadoop2.7/"
- "export JAVA_HOME=/usr/lib/jvm/java-8-oracle"
- pytest
- pytest -k "schema" geopyspark/tests/schema_tests/
- pytest -k "not schema" geopyspark/tests/*test.py
12 changes: 0 additions & 12 deletions README.md

This file was deleted.

96 changes: 96 additions & 0 deletions README.rst
@@ -0,0 +1,96 @@
GeoPySpark
***********
.. image:: https://travis-ci.org/locationtech-labs/geopyspark.svg?branch=master
:target: https://travis-ci.org/locationtech-labs/geopyspark

``GeoPySpark`` provides Python bindings for working with geospatial data using
`PySpark <http://spark.apache.org/docs/latest/api/python/pyspark.html>`_.
It will provide interfaces into the GeoTrellis and GeoMesa LocationTech
frameworks. It is currently under development, and has just entered alpha.

Currently, only functionality from GeoTrellis is supported; support for the
GeoMesa LocationTech framework will be added at a later date.

Contact and Support
--------------------

If you need help, have questions, or would like to talk to the developers (let
us know what you're working on!), you can contact us at:

* `Gitter <https://gitter.im/geotrellis/geotrellis>`_
* `Mailing list <https://locationtech.org/mailman/listinfo/geotrellis-user>`_

As you may have noticed, the links above point to the GeoTrellis Gitter channel
and mailing list. Because this project is currently an offshoot of GeoTrellis,
we use its mailing list and Gitter channel as our means of contact; we will
create our own if the need arises.

Setup
------

GeoPySpark Requirements
^^^^^^^^^^^^^^^^^^^^^^^^

============ ============
Requirement Version
============ ============
Java >=1.8
Scala 2.11.8
Python 3.3 - 3.5
Hadoop >=2.0.1
============ ============

Java 8 and Scala 2.11 are needed for GeoPySpark to work, as they are required
by GeoTrellis. In addition, Spark needs to be installed and configured, with
the environment variable ``SPARK_HOME`` set.

You can check whether Spark is installed properly by running the following in
the terminal:

.. code:: console

   > echo $SPARK_HOME
   /usr/local/bin/spark

If a path leading to your Spark folder is printed, then Spark has been
configured correctly.
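
If nothing is printed, set the variable to point at your Spark installation
(the path below is just an example):

.. code:: console

   > export SPARK_HOME=/usr/local/bin/spark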

How to Install
^^^^^^^^^^^^^^^

Before installing, check the above table to make sure that the
requirements are met.

To install via ``pip``, open the terminal and run the following:

.. code:: console

   pip install geopyspark

If you would rather install from source, you can do so by running the following
in the terminal:

.. code:: console

   git clone https://github.com/locationtech-labs/geopyspark.git
   cd geopyspark
   make install

This will assemble the back-end ``jar`` that contains the Scala code, move it
to the ``jars`` module, and then run the ``setup.py`` script.

Make Targets
^^^^^^^^^^^^

- **install** - install ``GeoPySpark`` python package locally
- **wheel** - build python ``GeoPySpark`` wheel for distribution
- **pyspark** - start pyspark shell with project jars
- **docker-build** - build docker image for Jupyter with ``GeoPySpark``
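
For example, to launch a PySpark shell with the project jars loaded (assuming
the package has already been built and installed):

.. code:: console

   make pyspark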

Contributing
------------

Any kind of feedback and contribution to GeoPySpark is always welcome.
A CLA is required for contribution; see `Contributing <docs/contributing.rst>`_
for more information.
20 changes: 20 additions & 0 deletions docs/Makefile
@@ -0,0 +1,20 @@
# Minimal makefile for Sphinx documentation
#

# You can set these variables from the command line.
SPHINXOPTS =
SPHINXBUILD = sphinx-build
SPHINXPROJ = GeoPySpark
SOURCEDIR = .
BUILDDIR = _build

# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
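
With this Makefile, any Sphinx builder can be invoked by name through the
catch-all target; for example (assuming Sphinx is installed):

.. code:: console

   make html

The rendered pages are written to ``_build/html``.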
33 changes: 33 additions & 0 deletions docs/changelog.rst
@@ -0,0 +1,33 @@
Changelog
==========

0.1.0
------

The first release of GeoPySpark! After five months of development, it is now
ready for its initial release! Since nothing has been changed or updated per
se, we'll just go over the features that will be present in 0.1.0.


**geopyspark.geotrellis**

- Create a ``RasterRDD`` from GeoTiffs that are stored locally, on S3, or on
HDFS.
- Serialize Python RDDs to Scala and back.
- Perform various tiling operations such as ``tile_to_layout``, ``cut_tiles``,
and ``pyramid``.
- Stitch together a ``TiledRasterRDD`` to create one ``Raster``.
- ``rasterize`` geometries and turn them into ``RasterRDD``.
- ``reclassify`` values of Rasters in RDDs.
- Calculate ``cost_distance`` on a ``TiledRasterRDD``.
- Perform local and focal operations on ``TiledRasterRDD``.
- Read, write, and query GeoTrellis tile layers.
- Read tiles from a layer.
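
A minimal sketch of how these operations might fit together (module paths and
call signatures below are illustrative assumptions, not the published 0.1.0
API; see the docstrings for the exact calls):

.. code:: python

   # Hypothetical sketch only: the names below are assumptions made for
   # illustration and may differ from the real geopyspark API.
   from geopyspark.geotrellis import geotiff_rdd  # assumed module path

   # Create a RasterRDD from GeoTiffs stored locally, on S3, or on HDFS.
   raster_rdd = geotiff_rdd.get(sc, "spatial", "file:///tmp/scene.tif")

   # Tile to a layout, then stitch the tiles back into a single Raster.
   tiled_rdd = raster_rdd.tile_to_layout()
   raster = tiled_rdd.stitch()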

**Documentation**

- Added docstrings to all python classes, methods, etc.
- Core-Concepts.
- Ingesting and creating a tile server with greyscale data.
- Ingesting and creating a tile server with data from Sentinel.
162 changes: 162 additions & 0 deletions docs/conf.py
@@ -0,0 +1,162 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
#
# GeoPySpark documentation build configuration file, created by
# sphinx-quickstart on Wed Apr 12 16:16:48 2017.
#
# This file is execfile()d with the current directory set to its
# containing dir.
#
# Note that not all possible configuration values are present in this
# autogenerated file.
#
# All configuration values have a default; values that are commented out
# serve to show the default.

# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
#
import os
import sys
from geopyspark.geopyspark_utils import setup_environment

setup_environment()
sys.path.insert(0, os.path.abspath('../geopyspark/'))


# -- General configuration ------------------------------------------------

# If your documentation needs a minimal Sphinx version, state it here.
#
# needs_sphinx = '1.0'

# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = ['sphinx.ext.autodoc', 'sphinx.ext.napoleon']

napoleon_google_docstring = True

# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']

# The suffix(es) of source filenames.
# You can specify multiple suffix as a list of string:
#
# source_suffix = ['.rst', '.md']
source_suffix = '.rst'

# The master toctree document.
master_doc = 'index'

# General information about the project.
project = 'GeoPySpark'
copyright = '2017, Jacob Bouffard, James McClean, Eugene Cheipesh'
author = 'Jacob Bouffard, James McClean, Eugene Cheipesh'

# The version info for the project you're documenting, acts as replacement for
# |version| and |release|, also used in various other places throughout the
# built documents.
#
# The short X.Y version.
version = '0.1.0'
# The full version, including alpha/beta/rc tags.
release = '0.1.0'

# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
#
# This is also used if you do content translation via gettext catalogs.
# Usually you set "language" from the command line for these cases.
language = None

# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
# This patterns also effect to html_static_path and html_extra_path
exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store']

# The name of the Pygments (syntax highlighting) style to use.
pygments_style = 'sphinx'

# If true, `todo` and `todoList` produce output, else they produce nothing.
todo_include_todos = False


# -- Options for HTML output ----------------------------------------------

# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
#
html_theme = 'alabaster'

# Theme options are theme-specific and customize the look and feel of a theme
# further. For a list of options available for each theme, see the
# documentation.
#
# html_theme_options = {}

# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ['_static']


# -- Options for HTMLHelp output ------------------------------------------

# Output file base name for HTML help builder.
htmlhelp_basename = 'GeoPySparkdoc'


# -- Options for LaTeX output ---------------------------------------------

latex_elements = {
# The paper size ('letterpaper' or 'a4paper').
#
# 'papersize': 'letterpaper',

# The font size ('10pt', '11pt' or '12pt').
#
# 'pointsize': '10pt',

# Additional stuff for the LaTeX preamble.
#
# 'preamble': '',

# Latex figure (float) alignment
#
# 'figure_align': 'htbp',
}

# Grouping the document tree into LaTeX files. List of tuples
# (source start file, target name, title,
# author, documentclass [howto, manual, or own class]).
latex_documents = [
(master_doc, 'GeoPySpark.tex', 'GeoPySpark Documentation',
'Jacob Bouffard, James McClean, Eugene Cheipesh', 'manual'),
]


# -- Options for manual page output ---------------------------------------

# One entry per manual page. List of tuples
# (source start file, name, description, authors, manual section).
man_pages = [
(master_doc, 'geopyspark', 'GeoPySpark Documentation',
[author], 1)
]


# -- Options for Texinfo output -------------------------------------------

# Grouping the document tree into Texinfo files. List of tuples
# (source start file, target name, title, author,
# dir menu entry, description, category)
texinfo_documents = [
(master_doc, 'GeoPySpark', 'GeoPySpark Documentation',
author, 'GeoPySpark', 'One line description of project.',
'Miscellaneous'),
]
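
Since ``sphinx.ext.napoleon`` is enabled with ``napoleon_google_docstring =
True``, autodoc will render Google-style docstrings such as this hypothetical
one:

.. code:: python

   def reclassify(value_map):
       """Reclassify cell values. (Hypothetical example for illustration.)

       Args:
           value_map (dict): Mapping from old cell values to new ones.

       Returns:
           dict: The mapping that was applied.
       """
       return value_map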


