Merge pull request #100 from synthicity/high-level-docs

High level docs

fscottfoti committed Aug 13, 2014
2 parents 5410135 + e798c69 commit daf9fd7
Showing 10 changed files with 404 additions and 101 deletions.
9 changes: 0 additions & 9 deletions docs/developer/developer.rst

This file was deleted.

136 changes: 134 additions & 2 deletions docs/developer/index.rst
@@ -1,8 +1,140 @@
Real Estate Development Models
==============================

The real estate development models included in this module are designed to
implement pencil-out pro formas, which generally measure the cash inflows and
outflows of a potential investment (in this case, real estate development),
with the outcome being some measure of profitability or return on investment.
Pro formas are normally performed in a spreadsheet program (e.g. Excel), but
here they are implemented in vectorized Python so that many pro formas (think
millions) can be computed at once.

The functionality is split into two modules - the square foot pro forma and
the developer model - as there are many use cases that call for the pro formas
without the developer model. The ``sqftproforma`` module computes real
estate feasibility for a set of parcels depending on allowed uses, prices,
and building costs, but does not actually *build* anything (either
figuratively or literally). The ``developer model`` decides how much to build,
picks among the set of feasible buildings in an attempt to meet demand,
and adds the new buildings to the set of current buildings. Thus the
``developer model`` is primarily useful in the context of an urban forecast.

An example of the code required to generate the set of feasible
buildings is shown below. This code comes from the ``utils`` module of the
current `sanfran_urbansim <https://github.com/synthicity/sanfran_urbansim>`_
demo. Notice that the SqFtProForma is first initialized, and then each
individual parcel in a DataFrame of parcels is tested for feasibility.
Each *use* (e.g. retail, office, residential, etc.) is assigned a price per
parcel, typically from empirical data on current rents and prices in the
city, but possibly from forecast rents and prices as well. The ``lookup``
function is then called with a specific building ``form``, and the pro forma
returns whether that form is profitable for each parcel.

A large number of assumptions enter into the computation of profitability,
and these are set in the `SqFtProFormaConfig <#urbansim.developer.sqftproforma.SqFtProFormaConfig>`_ class. They include such things
as the set of ``uses`` to model, the mix of ``uses`` into ``forms``,
the impact of parking requirements, parking costs,
building costs at different heights (taller buildings typically require
more expensive construction methods), the required profit ratio,
building efficiency, parcel coverage, and the cap rate, to name a few. See
the API documentation for the complete list and detailed descriptions.

Note that unit mixes don't typically enter into the square foot pro forma
(hence the name). After discussions with numerous real estate developers,
we found that most developers think first and foremost in terms of price and
cost per square foot and the arbitrage between the two, and only second in
terms of the translation to unit sizes and mixes in a given market (larger
and smaller units of a given unit type will also typically lower and raise
their per-square-foot prices, as stands to reason). Since getting data on
unit mixes in the current building stock is extremely difficult, most
feasibility computations here happen on a square foot basis, and the
``developer`` model below handles the translation to units. ::

    pf = sqftproforma.SqFtProForma()

    df = parcels.to_frame()

    # add prices for each use
    for use in pf.config.uses:
        df[use] = parcel_price_callback(use)

    # convert from cost to yearly rent
    if residential_to_yearly:
        df["residential"] *= pf.config.cap_rate

    # test feasibility of each allowed form on each parcel
    d = {}
    for form in pf.config.forms:
        print("Computing feasibility for form %s" % form)
        d[form] = pf.lookup(form, df[parcel_use_allowed_callback(form)])

    far_predictions = pd.concat(list(d.values()), keys=list(d.keys()), axis=1)

    sim.add_table("feasibility", far_predictions)


The ``developer model`` is responsible for picking among feasible buildings
in order to meet demand. An example usage of the model is shown below, which
is also lifted from the `sanfran_urbansim <https://github.com/synthicity/sanfran_urbansim>`_ demo.

This module provides a simple utility to compute the number of units (or
amount of floorspace) to build. Although the vacancy rate *can* be applied
at the regional level, it can also be used to meet vacancy rates at a
sub-regional level. The developer model itself is agnostic to which parcels
the user passes it, and the user is responsible for knowing at which level of
geography demand is assumed to operate. The developer model then chooses
which buildings to "build," usually as a random choice weighted by
profitability. This means more profitable buildings are more likely to be
built, although the results are somewhat stochastic.
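Both steps can be sketched in a few lines, a minimal illustration of the idea, not the developer model's actual code; the numbers, column names, and the selection rule here are all invented for the example:

```python
import numpy as np
import pandas as pd

# Target units implied by a target vacancy rate: demand scaled up by
# the vacancy buffer, minus existing supply (illustrative numbers).
num_agents, current_units, target_vacancy = 1000, 950, 0.10
target = int(num_agents / (1 - target_vacancy) - current_units)

# Profit-weighted random draw among feasible projects: more profitable
# projects are more likely to be drawn early, but the order is random.
feasible = pd.DataFrame({"net_units": [50, 80, 120, 40],
                         "profit": [1.0, 4.0, 10.0, 0.5]})
rng = np.random.default_rng(0)
order = rng.choice(feasible.index, size=len(feasible), replace=False,
                   p=feasible.profit / feasible.profit.sum())

# take projects in the drawn order until the target is reached
built = feasible.loc[order]
built = built[built.net_units.cumsum() - built.net_units < target]
```

Rerunning without the fixed seed gives a different build order, which is the "somewhat stochastic" behavior described above.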

The only remaining steps are "bookkeeping" in the sense that some
additional fields might need to be added (``year_built`` or a conversion from
developer ``forms`` to ``building_type_ids``). Finally, the new buildings
and old buildings need to be merged in such a way that the old ids are
preserved and not duplicated (new ids start at the max of the old ids plus
one and are incremented from there). ::

    dev = developer.Developer(feasibility.to_frame())

    target_units = dev.compute_units_to_build(len(agents),
                                              buildings[supply_fname].sum(),
                                              target_vacancy)

    new_buildings = dev.pick(forms,
                             target_units,
                             parcel_size,
                             ave_unit_size,
                             total_units,
                             max_parcel_size=max_parcel_size,
                             drop_after_build=True,
                             residential=residential,
                             bldg_sqft_per_job=bldg_sqft_per_job)

    if year is not None:
        new_buildings["year_built"] = year

    if form_to_btype_callback is not None:
        new_buildings["building_type_id"] = \
            new_buildings["form"].apply(form_to_btype_callback)

    all_buildings = dev.merge(buildings.to_frame(buildings.local_columns),
                              new_buildings[buildings.local_columns])

    sim.add_table("buildings", all_buildings)
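The id bookkeeping performed by the merge step can be sketched in plain pandas; this is an illustration of the convention described above (old ids preserved, new ids starting at the old max plus one), not the library's actual implementation, and the data is invented:

```python
import pandas as pd

# Existing stock keeps its ids; new buildings get fresh ids above the
# current maximum so nothing is duplicated.
old = pd.DataFrame({"residential_units": [10, 24]},
                   index=pd.Index([3, 7], name="building_id"))
new = pd.DataFrame({"residential_units": [16, 40]})

start = old.index.max() + 1
new.index = pd.Index(range(start, start + len(new)), name="building_id")

all_buildings = pd.concat([old, new])
```

After the merge, ids 3 and 7 survive unchanged and the new rows receive ids 8 and 9.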

.. toctree::
:maxdepth: 2

developer
sqftproforma

Square Foot Pro Forma API
~~~~~~~~~~~~~~~~~~~~~~~~~

.. automodule:: urbansim.developer.sqftproforma
:members:

Developer Model API
~~~~~~~~~~~~~~~~~~~

.. automodule:: urbansim.developer.developer
:members:
9 changes: 0 additions & 9 deletions docs/developer/sqftproforma.rst

This file was deleted.

130 changes: 85 additions & 45 deletions docs/examples.rst
@@ -219,75 +219,115 @@ A sample simulation workflow (a complete UrbanSim simulation is available `in th

This notebook is possibly even simpler than the estimation workflow as it has only one substantive cell which runs all of the available models in the appropriate sequence. Passing a range of years will run the simulation for multiple years (the example simply runs the simulation for a single year). Other parameters are available to the `sim.run <sim/index.html#running-simulations>`_ method which write the output to an HDF5 file.

.. _exploration-workflow:

Exploration Workflow
~~~~~~~~~~~~~~~~~~~~

UrbanSim now also provides a method to interactively explore UrbanSim inputs and outputs using web mapping tools, and the `exploration notebook <http://nbviewer.ipython.org/github/synthicity/sanfran_urbansim/blob/462f1f9f7286ffbaf83ae5ad04775494bf4d1677/Exploration.ipynb>`_ demonstrates how to set up and use this interactive display tool.

This is another simple and powerful notebook which can be used to quickly map variables of both base year and simulated data without leaving the workflow to use GIS tools. This example first creates the DataFrames for many of the UrbanSim tables that have been registered (``buildings``, ``households``, ``jobs``, and others). Once the DataFrames have been created, they are passed to the `dframe_explorer.start <maps/dframe_explorer.html#urbansim.maps.dframe_explorer.start>`_ method.

The dframe_explorer takes a dictionary of DataFrames which are joined to a set of shapes for visualization. The most common case is to use a `geojson <http://geojson.org/>`_ format shapefile of zones to join to any DataFrame that has a ``zone_id`` (the dframe_explorer module does the join for you). Here the center and zoom level are set for the map, the name of the geojson shapefile is passed, as are the join keys, both in the geojson file and in the DataFrames.

Once that is accomplished, the cell can be executed and the IPython Notebook is now running a web service which will respond to queries from a web browser. Try it out - open your web browser and navigate to http://localhost:8765/ or follow the same link embedded in your notebook. Note the link won't work on the web example - you need to have the example running on your local machine - all queries are run interactively between your web browser and the IPython Notebook. Your web browser should show a page like the following:

.. image:: ./screenshots/dframe_explorer_screenshot.png

Here is what each dropdown on the web page does:

* The first dropdown gives the names of the DataFrames you have passed to ``dframe_explorer.start``
* The second dropdown allows you to choose between each of the columns in the DataFrame with the name from the first dropdown
* The third dropdown selects the color scheme from the `colorbrewer <http://colorbrewer2.org/>`_ color schemes
* The fourth dropdown selects between ``quantile`` and ``equal_interval`` `color schemes <http://www.ncgia.ucsb.edu/cctp/units/unit47/html/quanteq.html>`_
* The fifth dropdown selects the Pandas aggregation method to use
* The sixth dropdown executes the `.query <http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.query.html>`_ method on the Pandas DataFrame in order to filter the input data
* The seventh dropdown executes the `.eval <http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.eval.html>`_ method on the Pandas DataFrame in order to create simple computed variables that are not already columns on the DataFrame.

So what is this doing? The web service is translating the dropdowns into a simple interactive Pandas statement, for example: ::

    df.groupby('zone_id')['residential_units'].sum()

The notebook will print out each statement it executes. The website then transparently joins the output Pandas series to the shapes and creates an interactive *slippy* web map using the `Leaflet <http://leafletjs.com/>`_ Javascript library. The code for this map is really `quite simple <https://github.com/synthicity/urbansim/tree/master/urbansim/maps>`_ - feel free to browse the code and add functionality as required.

To be clear, the website is performing a Pandas aggregation on the fly. If you have a buildings DataFrame with millions of records, Pandas will ``groupby`` the ``zone_id`` and perform an aggregation of your choice. This is designed to give you a quickly navigable map interface to understand the underlying disaggregate data, similar to that supplied by commercial projects such as `Tableau <http://kb.tableausoftware.com/articles/knowledgebase/mapping-basics>`_.

As a concrete example, note that the ``households`` table has a ``zone_id`` and is thus available for aggregation in ``dframe_explorer``. Since the web service is running aggregations on the *disaggregate* data, selecting the ``households`` table, the ``persons`` attribute, and the ``sum`` aggregation will run: ::
This is another simple and powerful notebook which can be used to quickly map variables of both base year and simulated data without leaving the workflow to use GIS tools. This example first creates the DataFrames for many of the UrbanSim tables that have been registered (``buildings``, ``househlds``, ``jobs``, and others). Once the DataFrames have been created, they are passed to the `start <maps/index.html#module-urbansim.maps.dframe_explorer>`_ method.

    households.groupby('zone_id').persons.sum()
See :ref:`dframe-explorer` for detailed information on how to call the ``start`` method and what queries the website is performing.

This computes the sum of persons in each household by zone, or more simply, the population of each zone. If the aggregation is changed to mean, the service will run: ::
Once the ``start`` method has been called, the IPython Notebook is running a web service which will respond to queries from a web browser. Try it out - open your web browser and navigate to http://localhost:8765/ or follow the same link embedded in your notebook. Note the link won't work on the web example - you need to have the example running on your local machine - all queries are run interactively between your web browser and the IPython Notebook. Your web browser should show a page like the following:

    households.groupby('zone_id').persons.mean()
.. image:: screenshots/dframe_explorer_screenshot.png

What does this compute exactly? It computes the average number of persons per household in each zone, or the average household size by zone.
See :ref:`dframe-explorer-website` for a description of how to use the website that is rendered.

Because this is serving these queries directly from the IPython Notebook, you can execute some part of a data processing workflow, then run ``dframe_explorer`` and look at the results. If something needs modification, simply hit the ``interrupt kernel`` menu item in the IPython Notebook. You can now execute more Notebook cells and return to ``dframe_explorer`` at any time by running the appropriate cell again. Now map exploration is simply another interactive step in your data processing workflow.
Because the web service is serving these queries directly from the IPython Notebook, you can execute some part of a data processing workflow, then run ``dframe_explorer`` and look at the results. If something needs modification, simply hit the ``interrupt kernel`` menu item in the IPython Notebook. You can now execute more Notebook cells and return to ``dframe_explorer`` at any time by running the appropriate cell again. Now map exploration is simply another interactive step in your data processing workflow.

Specifying Scenario Inputs
--------------------------
Model Implementation Choices
----------------------------

Control Totals
~~~~~~~~~~~~~~
There are a number of model implementation choices that can be made in
implementing an UrbanSim regional forecasting tool, and this section
describes a few of the possibilities. There is definitely a set of best
practices, though, so shoot us an email if you want more detail.

Zoning Changes
~~~~~~~~~~~~~~
Geographic Detail
~~~~~~~~~~~~~~~~~

Fees and Subsidies
~~~~~~~~~~~~~~~~~~
Although zone- or block-level models can be built (and gridcells have been
used historically), at this point the geographic detail is typically at the
parcel or building level. If good information is available for individual
units, this level of detail is actually ideal.

Model Implementation Choices
----------------------------
Most household and employment location choice models now choose among
``building_ids``, and the number of available units is measured as the
supply of units/job_spaces in the building minus the number of
households/jobs in the building.
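That accounting can be sketched directly in pandas; the data and column names below are illustrative, not drawn from an actual UrbanSim schema:

```python
import pandas as pd

# Supply side: units per building; demand side: which building each
# household currently occupies (invented example data).
buildings = pd.DataFrame({"residential_units": [10, 4, 6]},
                         index=pd.Index([1, 2, 3], name="building_id"))
households = pd.DataFrame({"building_id": [1, 1, 1, 2, 2, 2, 2]})

# vacant units = supply minus current occupants, per building
occupied = households.building_id.value_counts()
vacant = (buildings.residential_units
          .sub(occupied, fill_value=0)   # buildings with no occupants
          .clip(lower=0))                # overfull buildings offer none
```

Building 3 has no households, so ``fill_value=0`` keeps its full supply available; a fully occupied building contributes zero vacancies.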

UrbanAccess or Zones
~~~~~~~~~~~~~~~~~~~~

Geographic Detail
~~~~~~~~~~~~~~~~~
It is fairly standard to combine the buildings from the locations discussed
above with some measure of the neighborhood around each building. The simplest
implementation of this idea is used in the sanfran_urbansim example - and is
typical of traditional GIS - which is to use aggregations within some
higher-level polygon. In the most common case, the region has zones assigned
and every parcel is assigned a ``zone_id`` (the ``zone_id`` is then available
on the other related tables). Once ``zone_ids`` are available, vanilla Pandas
is usable and GIS is not strictly required.
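The zone-based approach amounts to broadcasting the parcel's ``zone_id`` onto related tables and then aggregating, as this small sketch shows (all data here is invented for illustration):

```python
import pandas as pd

# Parcels carry the zone assignment; buildings reference parcels.
parcels = pd.DataFrame({"zone_id": [10, 10, 20]},
                       index=pd.Index([1, 2, 3], name="parcel_id"))
buildings = pd.DataFrame({"parcel_id": [1, 2, 2, 3],
                          "residential_units": [5, 8, 2, 12]})

# broadcast zone_id from parcels onto buildings, then aggregate by zone
buildings["zone_id"] = buildings.parcel_id.map(parcels.zone_id)
units_by_zone = buildings.groupby("zone_id").residential_units.sum()
```

This is the "vanilla Pandas" workflow described above: once every table can reach a ``zone_id``, neighborhood measures are one ``groupby`` away.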

Although this is the easiest implementation method, a pedestrian-scale
network-based method is perhaps more appropriate when analyses happen
at the parcel and building scale, and this is exactly the intended purpose
of the `urbanaccess <https://github.com/synthicity/urbanaccess>`_ framework.
Most full UrbanSim implementations now use aggregations along the local street
network, and ``urbanaccess`` will be released as an official product by the
end of 2014.

Jobs or Establishments
~~~~~~~~~~~~~~~~~~~~~~

Jobs by sector is often the unit of analysis for the non-residential side,
as this kind of model is completely analogous to the residential side and is
perhaps the easiest to understand. In some cases establishments can be used
instead of jobs to capture the different behavior of different-size
establishments, but fitting establishments into buildings then becomes a
tricky endeavor (and modeling the movements of large employers should not
really be within the scope of the model system).

Configuration of Models
~~~~~~~~~~~~~~~~~~~~~~~

Some choices need to be made on the configuration of models. For instance,
is there a single hedonic for residential sales price, or is there a second
model for rent? Is non-residential rent segmented by building type? How many
different uses are there in the pro forma, and what forms (mixes of uses)
will be tested? The simplest model configuration is shown in the
sanfran_urbansim example, and additional behavior can be captured to answer
specific research questions.

Dealing with NaNs
~~~~~~~~~~~~~~~~~

There is no standard method for dealing with NaNs (typically indicating
missing data) within UrbanSim, but there is a good convention that can be
used. First, an injectable can be set with an object of this form (make sure
to set the name appropriately): ::

    sim.add_injectable("fillna_config", {
        "buildings": {
            "residential_sales_price": ("zero", "int"),
            "non_residential_rent": ("zero", "int"),
            "residential_units": ("zero", "int"),
            "non_residential_sqft": ("zero", "int"),
            "year_built": ("median", "int"),
            "building_type_id": ("mode", "int")
        },
        "jobs": {
            "job_category": ("mode", "str"),
        }
    })


The keys in this object are table names, and the values are also
dictionaries, where the keys are column names and the values are tuples. The
first value of the tuple tells the Pandas ``fillna`` function what to fill
with, and can be one of "zero," "median," or "mode"; it should be set
appropriately by the user for the specific column. The second value is the
data type to convert to. The user can then call
``utils.fill_na_from_config`` as in the `example <https://github.com/synthicity/sanfran_urbansim/blob/98b308f795c73ffc36c420845f394cbe3322b11b/dataset.py#L22>`_ with a DataFrame and table name, and all NaNs will be filled. This
functionality will eventually be moved into UrbanSim.
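A minimal version of what such a helper might do for a single table is sketched below; this is an illustration of the convention, not the actual ``utils.fill_na_from_config`` code, and it takes the per-table ``{column: (how, dtype)}`` dictionary rather than the full nested config:

```python
import pandas as pd

def fill_na_from_config(df, config):
    # config maps column -> (how, dtype), how is "zero"/"median"/"mode"
    for col, (how, dtype) in config.items():
        if how == "zero":
            val = 0
        elif how == "median":
            val = df[col].median()
        else:  # "mode"
            val = df[col].mode().iloc[0]
        df[col] = df[col].fillna(val).astype(dtype)
    return df

# invented example data with missing values
buildings = pd.DataFrame({"residential_units": [4, None, 2],
                          "year_built": [1980, 2000, None]})
buildings = fill_na_from_config(buildings, {
    "residential_units": ("zero", "int"),
    "year_built": ("median", "int"),
})
```

Here the missing unit count is filled with zero and the missing ``year_built`` with the median of the observed years, then both columns are cast to integers.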