Merge pull request #100 from synthicity/high-level-docs

High level docs

fscottfoti committed Aug 13, 2014
2 parents 5410135 + e798c69 commit daf9fd7
Showing 10 changed files with 404 additions and 101 deletions.
9 changes: 0 additions & 9 deletions docs/developer/developer.rst

This file was deleted.

136 changes: 134 additions & 2 deletions docs/developer/index.rst
@@ -1,8 +1,140 @@
Real Estate Development Models
==============================

The real estate development models included in this module are designed to
implement pencil-out pro formas, which generally measure the cash inflows and
outflows of a potential investment (in this case, real estate development),
with the outcome being some measure of profitability or return on investment.
Pro formas are normally performed in a spreadsheet program (e.g. Excel), but
here they are implemented in vectorized Python so that many pro formas (think
millions) can be computed at once.

The functionality is split into two modules - the square foot pro forma and
the developer model - as there are many use cases that call for the pro formas
without the developer model. The ``sqftproforma`` module computes real
estate feasibility for a set of parcels depending on allowed uses, prices,
and building costs, but does not actually *build* anything (either
figuratively or literally). The ``developer model`` decides how much to build,
picks among the set of feasible buildings in an attempt to meet demand,
and adds the new buildings to the set of current buildings. Thus the
``developer model`` is primarily useful in the context of an urban forecast.

An example of the code required to generate the set of feasible
buildings is shown below. This code comes from the ``utils`` module of the
current `sanfran_urbansim <https://github.com/synthicity/sanfran_urbansim>`_
demo. Notice that the SqFtProForma is first initialized, and then each
individual parcel in a DataFrame of parcels is tested for feasibility.
Each *use* (e.g. retail, office, residential, etc.) is assigned a price per
parcel, typically from empirical data on current rents and prices in the
city, but possibly from forecast rents and prices as well. The ``lookup``
function is then called with a specific building ``form``, and the pro forma
returns whether that form is profitable for each parcel.

A large number of assumptions enter into the computation of profitability,
and these are set in the `SqFtProFormaConfig <#urbansim.developer.sqftproforma.SqFtProFormaConfig>`_ class. They include such things
as the set of ``uses`` to model, the mix of ``uses`` into ``forms``,
the impact of parking requirements, parking costs,
building costs at different heights (taller buildings typically require
more expensive construction methods), the required profit ratio,
building efficiency, parcel coverage, and the cap rate, to name a few. See
the API documentation for the complete list and detailed descriptions.

Note that unit mixes don't typically enter into the square foot pro forma
(hence the name). After discussions with numerous real estate developers,
we found that most developers think first and foremost in terms of price and
cost per square foot and the arbitrage between the two, and only second in
terms of the translation to unit sizes and mixes in a given market (larger
and smaller units of a given unit type will also typically lower and raise
their per-square-foot prices, as stands to reason). Since getting data on
unit mixes in the current building stock is extremely difficult, most
feasibility computations here happen on a square foot basis, and the
``developer`` model below handles the translation to units. ::

    pf = sqftproforma.SqFtProForma()

    df = parcels.to_frame()

    # add prices for each use
    for use in pf.config.uses:
        df[use] = parcel_price_callback(use)

    # convert from cost to yearly rent
    if residential_to_yearly:
        df["residential"] *= pf.config.cap_rate

    # test feasibility of each allowed form on each parcel
    d = {}
    for form in pf.config.forms:
        print("Computing feasibility for form %s" % form)
        d[form] = pf.lookup(form, df[parcel_use_allowed_callback(form)])

    far_predictions = pd.concat(list(d.values()), keys=list(d.keys()), axis=1)

    sim.add_table("feasibility", far_predictions)


The ``developer model`` is responsible for picking among feasible buildings
in order to meet demand. An example usage of the model is shown below, which
is also lifted from the `sanfran_urbansim <https://github.com/synthicity/sanfran_urbansim>`_ demo.

This module provides a simple utility to compute the number of units (or
amount of floorspace) to build. Although the vacancy rate *can* be applied
at the regional level, it can also be used to meet vacancy rates at a
sub-regional level. The developer model itself is agnostic to which parcels
the user passes it, and the user is responsible for knowing at which level of
geography demand is assumed to operate. The developer model then chooses
which buildings to "build," usually as a random choice weighted by
profitability. This means more profitable buildings are more likely to be
built, although the results are somewhat stochastic.
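Both steps can be sketched in a few lines, a minimal illustration of the idea, not the developer model's actual code; the numbers, column names, and the selection rule here are all invented for the example:

```python
import numpy as np
import pandas as pd

# Target units implied by a target vacancy rate: demand scaled up by
# the vacancy buffer, minus existing supply (illustrative numbers).
num_agents, current_units, target_vacancy = 1000, 950, 0.10
target = int(num_agents / (1 - target_vacancy) - current_units)

# Profit-weighted random draw among feasible projects: more profitable
# projects are more likely to be drawn early, but the order is random.
feasible = pd.DataFrame({"net_units": [50, 80, 120, 40],
                         "profit": [1.0, 4.0, 10.0, 0.5]})
rng = np.random.default_rng(0)
order = rng.choice(feasible.index, size=len(feasible), replace=False,
                   p=feasible.profit / feasible.profit.sum())

# take projects in the drawn order until the target is reached
built = feasible.loc[order]
built = built[built.net_units.cumsum() - built.net_units < target]
```

Rerunning without the fixed seed gives a different build order, which is the "somewhat stochastic" behavior described above.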

The only remaining steps are "bookkeeping" in the sense that some
additional fields might need to be added (``year_built`` or a conversion from
developer ``forms`` to ``building_type_ids``). Finally, the new buildings
and old buildings need to be merged in such a way that the old ids are
preserved and not duplicated (new ids start at the max of the old ids plus
one and are incremented from there). ::

    dev = developer.Developer(feasibility.to_frame())

    target_units = dev.compute_units_to_build(len(agents),
                                              buildings[supply_fname].sum(),
                                              target_vacancy)

    new_buildings = dev.pick(forms,
                             target_units,
                             parcel_size,
                             ave_unit_size,
                             total_units,
                             max_parcel_size=max_parcel_size,
                             drop_after_build=True,
                             residential=residential,
                             bldg_sqft_per_job=bldg_sqft_per_job)

    if year is not None:
        new_buildings["year_built"] = year

    if form_to_btype_callback is not None:
        new_buildings["building_type_id"] = \
            new_buildings["form"].apply(form_to_btype_callback)

    all_buildings = dev.merge(buildings.to_frame(buildings.local_columns),
                              new_buildings[buildings.local_columns])

    sim.add_table("buildings", all_buildings)
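The id bookkeeping performed by the merge step can be sketched in plain pandas; this is an illustration of the convention described above (old ids preserved, new ids starting at the old max plus one), not the library's actual implementation, and the data is invented:

```python
import pandas as pd

# Existing stock keeps its ids; new buildings get fresh ids above the
# current maximum so nothing is duplicated.
old = pd.DataFrame({"residential_units": [10, 24]},
                   index=pd.Index([3, 7], name="building_id"))
new = pd.DataFrame({"residential_units": [16, 40]})

start = old.index.max() + 1
new.index = pd.Index(range(start, start + len(new)), name="building_id")

all_buildings = pd.concat([old, new])
```

After the merge, ids 3 and 7 survive unchanged and the new rows receive ids 8 and 9.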

.. toctree::
:maxdepth: 2

developer
sqftproforma

Square Foot Pro Forma API
~~~~~~~~~~~~~~~~~~~~~~~~~

.. automodule:: urbansim.developer.sqftproforma
:members:

Developer Model API
~~~~~~~~~~~~~~~~~~~

.. automodule:: urbansim.developer.developer
:members:
9 changes: 0 additions & 9 deletions docs/developer/sqftproforma.rst

This file was deleted.

130 changes: 85 additions & 45 deletions docs/examples.rst
@@ -219,75 +219,115 @@ A sample simulation workflow (a complete UrbanSim simulation is available `in th

This notebook is possibly even simpler than the estimation workflow as it has only one substantive cell which runs all of the available models in the appropriate sequence. Passing a range of years will run the simulation for multiple years (the example simply runs the simulation for a single year). Other parameters are available to the `sim.run <sim/index.html#running-simulations>`_ method which write the output to an HDF5 file.

.. _exploration-workflow:

Exploration Workflow
~~~~~~~~~~~~~~~~~~~~

UrbanSim now also provides a method to interactively explore UrbanSim inputs and outputs using web mapping tools, and the `exploration notebook <http://nbviewer.ipython.org/github/synthicity/sanfran_urbansim/blob/462f1f9f7286ffbaf83ae5ad04775494bf4d1677/Exploration.ipynb>`_ demonstrates how to set up and use this interactive display tool.

This is another simple and powerful notebook which can be used to quickly map variables of both base year and simulated data without leaving the workflow to use GIS tools. This example first creates the DataFrames for many of the UrbanSim tables that have been registered (``buildings``, ``households``, ``jobs``, and others). Once the DataFrames have been created, they are passed to the `dframe_explorer.start <maps/dframe_explorer.html#urbansim.maps.dframe_explorer.start>`_ method.

The dframe_explorer takes a dictionary of DataFrames which are joined to a set of shapes for visualization. The most common case is to use a `geojson <http://geojson.org/>`_ format shapefile of zones to join to any DataFrame that has a ``zone_id`` (the dframe_explorer module does the join for you). Here the center and zoom level are set for the map, the name of the geojson shapefile is passed, as are the join keys, both in the geojson file and in the DataFrames.

Once that is accomplished, the cell can be executed and the IPython Notebook is now running a web service which will respond to queries from a web browser. Try it out - open your web browser and navigate to http://localhost:8765/ or follow the same link embedded in your notebook. Note the link won't work on the web example - you need to have the example running on your local machine - all queries are run interactively between your web browser and the IPython Notebook. Your web browser should show a page like the following:

.. image:: ./screenshots/dframe_explorer_screenshot.png

Here is what each dropdown on the web page does:

* The first dropdown gives the names of the DataFrames you have passed to ``dframe_explorer.start``
* The second dropdown allows you to choose between each of the columns in the DataFrame with the name from the first dropdown
* The third dropdown selects the color scheme from the `colorbrewer <http://colorbrewer2.org/>`_ color schemes
* The fourth dropdown selects between ``quantile`` and ``equal_interval`` `color schemes <http://www.ncgia.ucsb.edu/cctp/units/unit47/html/quanteq.html>`_
* The fifth dropdown selects the Pandas aggregation method to use
* The sixth dropdown executes the `.query <http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.query.html>`_ method on the Pandas DataFrame in order to filter the input data
* The seventh dropdown executes the `.eval <http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.eval.html>`_ method on the Pandas DataFrame in order to create simple computed variables that are not already columns on the DataFrame.

So what is this doing? The web service is translating the dropdowns into a simple interactive Pandas statement, for example: ::

    df.groupby('zone_id')['residential_units'].sum()

The notebook will print out each statement it executes. The website then transparently joins the output Pandas series to the shapes and creates an interactive *slippy* web map using the `Leaflet <http://leafletjs.com/>`_ Javascript library. The code for this map is really `quite simple <https://github.com/synthicity/urbansim/tree/master/urbansim/maps>`_ - feel free to browse the code and add functionality as required.

To be clear, the website is performing a Pandas aggregation on the fly. If you have a buildings DataFrame with millions of records, Pandas will ``groupby`` the ``zone_id`` and perform an aggregation of your choice. This is designed to give you a quickly navigable map interface to understand the underlying disaggregate data, similar to that supplied by commercial projects such as `Tableau <http://kb.tableausoftware.com/articles/knowledgebase/mapping-basics>`_.

As a concrete example, note that the ``households`` table has a ``zone_id`` and is thus available for aggregation in ``dframe_explorer``. Since the web service is running aggregations on the *disaggregate* data, selecting the ``households`` table, the ``persons`` attribute, and the ``sum`` aggregation will run: ::
This is another simple and powerful notebook which can be used to quickly map variables of both base year and simulated data without leaving the workflow to use GIS tools. This example first creates the DataFrames for many of the UrbanSim tables that have been registered (``buildings``, ``househlds``, ``jobs``, and others). Once the DataFrames have been created, they are passed to the `start <maps/index.html#module-urbansim.maps.dframe_explorer>`_ method.

    households.groupby('zone_id').persons.sum()
See :ref:`dframe-explorer` for detailed information on how to call the ``start`` method and what queries the website is performing.

This computes the sum of persons in each household by zone, or more simply, the population of each zone. If the aggregation is changed to mean, the service will run: ::
Once the ``start`` method has been called, the IPython Notebook is running a web service which will respond to queries from a web browser. Try it out - open your web browser and navigate to http://localhost:8765/ or follow the same link embedded in your notebook. Note the link won't work on the web example - you need to have the example running on your local machine - all queries are run interactively between your web browser and the IPython Notebook. Your web browser should show a page like the following:

    households.groupby('zone_id').persons.mean()
.. image:: screenshots/dframe_explorer_screenshot.png

What does this compute exactly? It computes the average number of persons per household in each zone, or the average household size by zone.
See :ref:`dframe-explorer-website` for a description of how to use the website that is rendered.

Because this is serving these queries directly from the IPython Notebook, you can execute some part of a data processing workflow, then run ``dframe_explorer`` and look at the results. If something needs modification, simply hit the ``interrupt kernel`` menu item in the IPython Notebook. You can now execute more Notebook cells and return to ``dframe_explorer`` at any time by running the appropriate cell again. Now map exploration is simply another interactive step in your data processing workflow.
Because the web service is serving these queries directly from the IPython Notebook, you can execute some part of a data processing workflow, then run ``dframe_explorer`` and look at the results. If something needs modification, simply hit the ``interrupt kernel`` menu item in the IPython Notebook. You can now execute more Notebook cells and return to ``dframe_explorer`` at any time by running the appropriate cell again. Now map exploration is simply another interactive step in your data processing workflow.

Specifying Scenario Inputs
--------------------------
Model Implementation Choices
----------------------------

Control Totals
~~~~~~~~~~~~~~
There are a number of model implementation choices that can be made in
implementing an UrbanSim regional forecasting tool, and this section
describes a few of the possibilities. There is definitely a set of best
practices, though, so shoot us an email if you want more detail.

Zoning Changes
~~~~~~~~~~~~~~
Geographic Detail
~~~~~~~~~~~~~~~~~

Fees and Subsidies
~~~~~~~~~~~~~~~~~~
Although zone- or block-level models can be built (and gridcells have been
used historically), at this point the geographic detail is typically at the
parcel or building level. If good information is available for individual
units, this level of detail is actually ideal.

Model Implementation Choices
----------------------------
Most household and employment location choice models now choose among
``building_ids``, and the number of available units is measured as the
supply of units/job_spaces in the building minus the number of
households/jobs in the building.
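That accounting can be sketched directly in pandas; the data and column names below are illustrative, not drawn from an actual UrbanSim schema:

```python
import pandas as pd

# Supply side: units per building; demand side: which building each
# household currently occupies (invented example data).
buildings = pd.DataFrame({"residential_units": [10, 4, 6]},
                         index=pd.Index([1, 2, 3], name="building_id"))
households = pd.DataFrame({"building_id": [1, 1, 1, 2, 2, 2, 2]})

# vacant units = supply minus current occupants, per building
occupied = households.building_id.value_counts()
vacant = (buildings.residential_units
          .sub(occupied, fill_value=0)   # buildings with no occupants
          .clip(lower=0))                # overfull buildings offer none
```

Building 3 has no households, so ``fill_value=0`` keeps its full supply available; a fully occupied building contributes zero vacancies.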

UrbanAccess or Zones
~~~~~~~~~~~~~~~~~~~~

Geographic Detail
~~~~~~~~~~~~~~~~~
It is fairly standard to combine the buildings from the locations discussed
above with some measure of the neighborhood around each building. The simplest
implementation of this idea is used in the sanfran_urbansim example - and is
typical of traditional GIS - which is to use aggregations within some
higher-level polygon. In the most common case, the region has zones assigned
and every parcel is assigned a ``zone_id`` (the ``zone_id`` is then available
on the other related tables). Once ``zone_ids`` are available, vanilla Pandas
is usable and GIS is not strictly required.
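The zone-based approach amounts to broadcasting the parcel's ``zone_id`` onto related tables and then aggregating, as this small sketch shows (all data here is invented for illustration):

```python
import pandas as pd

# Parcels carry the zone assignment; buildings reference parcels.
parcels = pd.DataFrame({"zone_id": [10, 10, 20]},
                       index=pd.Index([1, 2, 3], name="parcel_id"))
buildings = pd.DataFrame({"parcel_id": [1, 2, 2, 3],
                          "residential_units": [5, 8, 2, 12]})

# broadcast zone_id from parcels onto buildings, then aggregate by zone
buildings["zone_id"] = buildings.parcel_id.map(parcels.zone_id)
units_by_zone = buildings.groupby("zone_id").residential_units.sum()
```

This is the "vanilla Pandas" workflow described above: once every table can reach a ``zone_id``, neighborhood measures are one ``groupby`` away.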

Although this is the easiest implementation method, a pedestrian-scale
network-based method is perhaps more appropriate when analyses happen
at the parcel and building scale, and this is exactly the intended purpose
of the `urbanaccess <https://github.com/synthicity/urbanaccess>`_ framework.
Most full UrbanSim implementations now use aggregations along the local street
network, and ``urbanaccess`` will be released as an official product by the
end of 2014.

Jobs or Establishments
~~~~~~~~~~~~~~~~~~~~~~

Jobs by sector is often the unit of analysis for the non-residential side,
as this kind of model is completely analogous to the residential side and is
perhaps the easiest to understand. In some cases establishments can be used
instead of jobs to capture the different behavior of different-size
establishments, but fitting establishments into buildings then becomes a
tricky endeavor (and modeling the movements of large employers should not
really be within the scope of the model system).

Configuration of Models
~~~~~~~~~~~~~~~~~~~~~~~

Some choices need to be made on the configuration of models. For instance,
is there a single hedonic for residential sales price, or is there a second
model for rent? Is non-residential rent segmented by building type? How many
different uses are there in the pro forma, and what forms (mixes of uses)
will be tested? The simplest model configuration is shown in the
sanfran_urbansim example, and additional behavior can be captured to answer
specific research questions.

Dealing with NaNs
~~~~~~~~~~~~~~~~~

There is no standard method for dealing with NaNs (typically indicating
missing data) within UrbanSim, but there is a good convention that can be
used. First, an injectable can be set with an object of this form (make sure
to set the name appropriately): ::

    sim.add_injectable("fillna_config", {
        "buildings": {
            "residential_sales_price": ("zero", "int"),
            "non_residential_rent": ("zero", "int"),
            "residential_units": ("zero", "int"),
            "non_residential_sqft": ("zero", "int"),
            "year_built": ("median", "int"),
            "building_type_id": ("mode", "int")
        },
        "jobs": {
            "job_category": ("mode", "str"),
        }
    })


The keys in this object are table names, and the values are also
dictionaries, where the keys are column names and the values are tuples. The
first value of the tuple tells the Pandas ``fillna`` function what to fill
with, and can be one of "zero," "median," or "mode"; it should be set
appropriately by the user for the specific column. The second value is the
data type to convert to. The user can then call
``utils.fill_na_from_config`` as in the `example <https://github.com/synthicity/sanfran_urbansim/blob/98b308f795c73ffc36c420845f394cbe3322b11b/dataset.py#L22>`_ with a DataFrame and table name, and all NaNs will be filled. This
functionality will eventually be moved into UrbanSim.
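A minimal version of what such a helper might do for a single table is sketched below; this is an illustration of the convention, not the actual ``utils.fill_na_from_config`` code, and it takes the per-table ``{column: (how, dtype)}`` dictionary rather than the full nested config:

```python
import pandas as pd

def fill_na_from_config(df, config):
    # config maps column -> (how, dtype), how is "zero"/"median"/"mode"
    for col, (how, dtype) in config.items():
        if how == "zero":
            val = 0
        elif how == "median":
            val = df[col].median()
        else:  # "mode"
            val = df[col].mode().iloc[0]
        df[col] = df[col].fillna(val).astype(dtype)
    return df

# invented example data with missing values
buildings = pd.DataFrame({"residential_units": [4, None, 2],
                          "year_built": [1980, 2000, None]})
buildings = fill_na_from_config(buildings, {
    "residential_units": ("zero", "int"),
    "year_built": ("median", "int"),
})
```

Here the missing unit count is filled with zero and the missing ``year_built`` with the median of the observed years, then both columns are cast to integers.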