Commit

Merge pull request #101 from synthicity/ej-doc-edits
A set of very minor doc edits
jiffyclub committed Aug 14, 2014
2 parents daf9fd7 + d1ee62c commit 810172d
Showing 4 changed files with 18 additions and 18 deletions.
16 changes: 8 additions & 8 deletions docs/examples.rst
@@ -8,7 +8,7 @@ A fairly complete case study of using UrbanSim can be shown entirely within a si

As the canonical example of using UrbanSim, take the case of a residential sales hedonic model used to perform an ordinary least squares regression on a table of building price data. The best practice would be to store the building data in a Pandas HDFStore, and the buildings table can include millions of rows (all of the buildings in a region) and attributes like square footage, lot size, number of bedrooms and bathrooms and the like. Importantly, the dependent variable should also be included which in this case might be the assessed or observed price of each unit. The example repository includes sample data so that this Notebook can be executed.

-This Notebook performs the exact same residential price hedonic as in the complete example below, but all entirely within the same IPython Notebook (and without explicitly using the ``sim.model`` decorator). The simplest use case of the UrbanSim methodology is to create a single model to study an emperical behavior or interest to the modeler, and a good place to start in building such a model is this example.
+This Notebook performs the exact same residential price hedonic as in the complete example below, but all entirely within the same IPython Notebook (and without explicitly using the ``sim.model`` decorator). The simplest use case of the UrbanSim methodology is to create a single model to study an emperical behavior of interest to the modeler, and a good place to start in building such a model is this example.

Note that the flow of the notebook is one often followed in statistical modeling:

@@ -107,7 +107,7 @@ The ``buildings`` object that gets passed in is a `Table Wrapper <sim/index.html

To convert a ``Table Wrapper`` to a DataFrame, the user can simply call `to_frame <sim/index.html#urbansim.sim.simulation.DataFrameWrapper.to_frame>`_ but this returns *all* computed columns on the table and so has performance implications. In general it's better to use the Series objects directly where possible.

-As a concrete example, the above code is recommended: ::
+As a concrete example, the following code is recommended: ::

return buildings.residential_units.groupby(buildings.zone_id).sum()
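
The recommended pattern can be tried on a plain pandas DataFrame. This is a minimal sketch, assuming a small made-up ``buildings`` table with ``zone_id`` and ``residential_units`` columns. It uses an ordinary DataFrame in place of the UrbanSim Table Wrapper, so it only illustrates the Series-level ``groupby`` and does not exercise any UrbanSim API.

```python
import pandas as pd

# Hypothetical sample data; the real buildings table would come from the HDFStore.
buildings = pd.DataFrame(
    {
        "zone_id": [1, 1, 2, 2, 2],
        "residential_units": [10, 5, 0, 3, 7],
    },
    index=pd.Index([100, 101, 102, 103, 104], name="building_id"),
)

# Group one Series by another Series that shares the same index,
# then sum the residential units within each zone.
units_by_zone = buildings["residential_units"].groupby(buildings["zone_id"]).sum()

print(units_by_zone)
```

Working with the two Series directly avoids materializing every computed column, which is the performance concern raised about ``to_frame``.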

@@ -130,13 +130,13 @@ Finally, if all the attributes being used are primary, the user can call ``local
Models
~~~~~~

-The main objective of the `models.py <https://github.com/synthicity/sanfran_urbansim/blob/462f1f9f7286ffbaf83ae5ad04775494bf4d1677/models.py>`_ file is to define the "entry points" into the model system. Although UrbanSim provides the direct API for a `Regression Model <models/statistical.html#urbansim.models.regression.RegressionModel>`_ a `Location Choice Model <models/statistical.html#urbansim.models.lcm.MNLLocationChoiceModel>`_, etc, it is the models.py file which defines the specific *steps* that outline a simulation or even a more general data processing workflow.
+The main objective of the `models.py <https://github.com/synthicity/sanfran_urbansim/blob/462f1f9f7286ffbaf83ae5ad04775494bf4d1677/models.py>`_ file is to define the "entry points" into the model system. Although UrbanSim provides the direct API for a `Regression Model <models/statistical.html#urbansim.models.regression.RegressionModel>`_, a `Location Choice Model <models/statistical.html#urbansim.models.lcm.MNLLocationChoiceModel>`_, etc, it is the models.py file which defines the specific *steps* that outline a simulation or even a more general data processing workflow.

In the San Francisco example, there are two price/rent `hedonic models <http://en.wikipedia.org/wiki/Hedonic_regression>`_ which both use the RegressionModel, one which is the residential sales hedonic which is estimated with the entry point `rsh_estimate <https://github.com/synthicity/sanfran_urbansim/blob/462f1f9f7286ffbaf83ae5ad04775494bf4d1677/models.py#L9>`_ and then run in simulation mode with the entry point rsh_simulate. The non-residential rent hedonic has similar entry points `nrh_estimate <https://github.com/synthicity/sanfran_urbansim/blob/462f1f9f7286ffbaf83ae5ad04775494bf4d1677/models.py#L20>`_ and nrh_simulate. Note that both functions call `hedonic_estimate <https://github.com/synthicity/sanfran_urbansim/blob/master/utils.py#L110>`_ and hedonic_simulate in `utils.py <https://github.com/synthicity/sanfran_urbansim/blob/462f1f9f7286ffbaf83ae5ad04775494bf4d1677/utils.py>`_. In this case ``utils.py`` actually uses the UrbanSim API by calling the `fit_from_cfg <models/statistical.html#urbansim.models.regression.RegressionModel.fit_from_cfg>`_ method on the Regressionmodel.

There are two things that warrant further explanation at this point.

-* ``utils.py`` is a set of helper functions that assist with merging data and running models from configuration files. Note that the code in this file is generally sharable across UrbanSim implementations (in fact, this exact code is in use in multiple live simulations). It defines a certain style of UrbanSim and handles a number of boundary cases in a transparent way. In the long run, this kind of functionality might be unit tested and moved to UrbanSim, but for now we think it helps with transparency, flexibility, and debugging to keep this file with the specific client implementations.
+* ``utils.py`` is a set of helper functions that assist with merging data and running models from configuration files. Note that the code in this file is generally shareable across UrbanSim implementations (in fact, this exact code is in use in multiple live simulations). It defines a certain style of UrbanSim and handles a number of boundary cases in a transparent way. In the long run, this kind of functionality might be unit tested and moved to UrbanSim, but for now we think it helps with transparency, flexibility, and debugging to keep this file with the specific client implementations.

* Many of the models use configuration files to define the actual model configuration. In fact, most models in this file are very short *stub* functions which pass a Pandas DataFrame into the estimation and configure the model using a configuration file in the `YAML file format <http://en.wikipedia.org/wiki/YAML>`_. For instance, the ``rsh_estimate`` function knows to read the configuration file, estimate the model defined in the configuration on the dataframe passed in, and write the estimated coefficients back to the same configuration file, and the complete method is pasted below::

@@ -215,7 +215,7 @@ This notebook estimates all of the models in the example that need estimation (b
Simulation Workflow
~~~~~~~~~~~~~~~~~~~

-A sample simulation workflow (a complete UrbanSim simulation is available `in this Notebook <http://nbviewer.ipython.org/github/synthicity/sanfran_urbansim/blob/462f1f9f7286ffbaf83ae5ad04775494bf4d1677/Simulation.ipynb>`__.
+A sample simulation workflow (a complete UrbanSim simulation) is available `in this Notebook <http://nbviewer.ipython.org/github/synthicity/sanfran_urbansim/blob/462f1f9f7286ffbaf83ae5ad04775494bf4d1677/Simulation.ipynb>`__.

This notebook is possibly even simpler than the estimation workflow as it has only one substantive cell which runs all of the available models in the appropriate sequence. Passing a range of years will run the simulation for multiple years (the example simply runs the simulation for a single year). Other parameters are available to the `sim.run <sim/index.html#running-simulations>`_ method which write the output to an HDF5 file.

@@ -230,7 +230,7 @@ This is another simple and powerful notebook which can be used to quickly map va

See :ref:`dframe-explorer` for detailed information on how to call the ``start`` method and what queries the website is performing.

-Once the ``start`` method has been called, the IPython Notebook is running a web service which will respond to queries from a web browser. Try is out - open your web browser and navigate to http://localhost:8765/ or follow the same link embedded in your notebook. Note the link won't work on the web example - you need to have the example running on your local machine - all queries are run interactively between your web browser and the IPython Notebook. Your web browser should show a page like the following:
+Once the ``start`` method has been called, the IPython Notebook is running a web service which will respond to queries from a web browser. Try it out - open your web browser and navigate to http://localhost:8765/ or follow the same link embedded in your notebook. Note the link won't work on the web example - you need to have the example running on your local machine - all queries are run interactively between your web browser and the IPython Notebook. Your web browser should show a page like the following:

.. image:: screenshots/dframe_explorer_screenshot.png

@@ -323,11 +323,11 @@ to set the name appropriately): ::
}
})

-The keys in this object are table names, the values are also dictionary
+The keys in this object are table names, the values are also a dictionary
where the keys are column names and the values are a tuple. The first value
of the tuple is what to call the Pandas ``fillna`` function with,
and can be a choice of "zero," "median," or "mode" and should be set
appropriately by the user for the specific column. The second argument is
-the data type to conver to. The user can then call
+the data type to convert to. The user can then call
``utils.fill_na_from_config`` as in the `example <https://github.com/synthicity/sanfran_urbansim/blob/98b308f795c73ffc36c420845f394cbe3322b11b/dataset.py#L22>`_ with a DataFrame and table name and all NaNs will be filled. This
functionality will eventually be moved into UrbanSim.
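
The configuration shape described above can be sketched as follows. This is an illustrative re-implementation under stated assumptions, not the actual ``utils.fill_na_from_config`` from the example repository: the function name ``fill_na_sketch``, the ``FILL_NA_CONFIG`` dictionary, and the sample columns are all hypothetical.

```python
import pandas as pd

# Hypothetical configuration in the shape described in the text:
# table name -> column name -> (fill strategy, data type to convert to)
FILL_NA_CONFIG = {
    "buildings": {
        "residential_units": ("zero", "int"),
        "year_built": ("median", "int"),
        "stories": ("mode", "int"),
    }
}


def fill_na_sketch(df, table_name, config=FILL_NA_CONFIG):
    """Return a copy of df with NaNs filled per column, then cast to the configured dtype."""
    df = df.copy()
    for column, (strategy, dtype) in config[table_name].items():
        if strategy == "zero":
            fill_value = 0
        elif strategy == "median":
            fill_value = df[column].median()  # NaNs are skipped by default
        elif strategy == "mode":
            fill_value = df[column].mode().iloc[0]  # first of the most frequent values
        else:
            raise ValueError("unknown fill strategy: %s" % strategy)
        df[column] = df[column].fillna(fill_value).astype(dtype)
    return df


# Hypothetical sample table with one missing value per column.
buildings = pd.DataFrame(
    {
        "residential_units": [1.0, None, 3.0],
        "year_built": [1990.0, None, 2000.0],
        "stories": [2.0, 2.0, None],
    }
)

filled = fill_na_sketch(buildings, "buildings")
print(filled)
```

Keeping the strategy and the target type together per column means the cast to an integer type only happens after the NaNs are gone, which is the step that would otherwise fail.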
8 changes: 4 additions & 4 deletions docs/gettingstarted.rst
@@ -48,7 +48,7 @@ One of the main motivations for the current implementation of UrbanSim is to ref
A Note on Pandas Indexing
~~~~~~~~~~~~~~~~~~~~~~~~~

-One very import note about Pandas - the real genius of the abstraction is that all records in a table are viewed as key-value pairs. Every table has an `index <http://pandas.pydata.org/pandas-docs/stable/indexing.html>`_ or a `multi-index <http://pandas.pydata.org/pandas-docs/stable/indexing.html#hierarchical-indexing-multiindex>`_ which is used to `align <http://pandas.pydata.org/pandas-docs/stable/basics.html#aligning-objects-with-each-other-with-align>`_ the table on the key for that table.
+One very important note about Pandas - the real genius of the abstraction is that all records in a table are viewed as key-value pairs. Every table has an `index <http://pandas.pydata.org/pandas-docs/stable/indexing.html>`_ or a `multi-index <http://pandas.pydata.org/pandas-docs/stable/indexing.html#hierarchical-indexing-multiindex>`_ which is used to `align <http://pandas.pydata.org/pandas-docs/stable/basics.html#aligning-objects-with-each-other-with-align>`_ the table on the key for that table.

This is similar to having a `primary key <http://en.wikipedia.org/wiki/Unique_key>`_ in a database except that now you can do mathematical operations with columns. For instance, you can now take a column from one table and a column from another table and add or multiply them and the operation will automatically align on the key (i.e. it will add elements with the same index value).
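
The alignment behavior described here can be seen in a small sketch. The Series names and values below are made up for illustration; the point is only that pandas matches elements by index label rather than by position.

```python
import pandas as pd

# Two hypothetical columns keyed by building id, stored in different orders.
price_per_sqft = pd.Series([200.0, 350.0, 500.0], index=[1, 2, 3])
building_sqft = pd.Series([1000.0, 2000.0, 1500.0, 800.0], index=[3, 1, 2, 4])

# Multiplication aligns on the index: building 1 is multiplied with building 1,
# regardless of where each value sits in its Series.
total_price = price_per_sqft * building_sqft

print(total_price)
```

Building 4 appears in only one of the two Series, so its result is ``NaN``. Checking for unexpected ``NaN`` values after such an operation is a quick way to spot keys that failed to match.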

@@ -72,7 +72,7 @@ IPython

One of the most useful features of IPython is the `IPython notebook <http://ipython.org/notebook.html>`_, which is perfect for interactively executing small cells of Python code. We use notebooks a LOT, and they are a wonderful way to avoid the command line in a cross-platform way. The notebook is a fantastic tool to develop snippets of code a few lines at a time, and to capture and communicate higher-level workflows.

-This also makes the notebook a fantastic pedagogical tool - in other words it's great for demos and communicating both the input and output of cells of Python code (e.g. `nbviewer <http://nbviewer.ipython.org/>`_. Many of the full-size examples of UrbanSim on this site are presented in notebooks.
+This also makes the notebook a fantastic pedagogical tool - in other words it's great for demos and communicating both the input and output of cells of Python code (e.g. `nbviewer <http://nbviewer.ipython.org/>`_). Many of the full-size examples of UrbanSim on this site are presented in notebooks.

In many cases, you can write entire UrbanSim models in the notebook, but this is not generally considered the best practice. It's entirely up to you though, and we are happy to share with you our insights from many hours of developing and using this set of tools.

@@ -90,7 +90,7 @@ UrbanSim has been an active research project since the late 1990's, and has unde
for model in models:
model.simulate(model_configuration_parameters)

-The set of models varies among the many UrbanSim applications to different regions, due to the data availability and cleanliness, the time and resources that can be devoted to the project, and specific research questions that motivated the projects. The set of models almost always includes at least the following:
+The set of models varies among the many UrbanSim applications to different regions, due to data availability and cleanliness, the time and resources that can be devoted to the project, and specific research questions that motivated the projects. The set of models almost always includes at least the following:

Residential Real Estate Models
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -127,7 +127,7 @@ Some representation of real estate development must be modeled to accurately rep

It should be noted that many other kinds of models can be included in the simulation loop as well. For instance, inclusion of scheduled development events is a key element to representing known future development projects.

-In general, any Python script that reads and writes data can be included to help answer a specific research question or to model a certain real-world behavior - models can even be parameterized in JSON or YAML and included in the standard model set and an ever-increasing set of functionality will be added over time.
+In general, any Python script that reads and writes data can be included to help answer a specific research question or to model a certain real-world behavior - models can even be parameterized in JSON or YAML and included in the standard model set, and an ever-increasing set of functionality will be added over time.

Specifying Scenario Inputs
--------------------------
2 changes: 1 addition & 1 deletion urbansim/developer/sqftproforma.py
@@ -13,7 +13,7 @@ class SqFtProFormaConfig(object):
parcel_sizes : list
A list of parcel sizes to test. Interestingly, right now
-the parcel sizes cancel is this style of pro forma computation so
+the parcel sizes cancel in this style of pro forma computation so
you can set this to something reasonable for debugging purposes -
e.g. [10000]. All sizes can be feet or meters as long as they are
consistently used.
10 changes: 5 additions & 5 deletions urbansim/models/lcm.py
@@ -311,7 +311,7 @@ def predict(self, choosers, alternatives, debug=False):
alternatives : pandas.DataFrame
Table describing the things from which agents are choosing.
debug : bool
-If debug is set to true, well set the variable "sim_pdf" on
+If debug is set to true, will set the variable "sim_pdf" on
the object to store the probabilities for mapping of the
outcome.
@@ -504,7 +504,7 @@ def predict_from_cfg(cls, movers, locations, cfgname,
movers : DataFrame
A dataframe of agents doing the choosing.
locations : DataFrame
-A dataframe of locations which the choosers are location in and which
+A dataframe of locations which the choosers are locating in and which
have a supply.
cfgname : string
The name of the yaml config file from which to read the location
@@ -697,7 +697,7 @@ def predict(self, choosers, alternatives, debug=False):
alternatives : pandas.DataFrame
Table describing the things from which agents are choosing.
debug : bool
-If debug is set to true, well set the variable "sim_pdf" on
+If debug is set to true, will set the variable "sim_pdf" on
the object to store the probabilities for mapping of the
outcome.
@@ -985,7 +985,7 @@ def predict(self, choosers, alternatives, debug=False):
alternatives : pandas.DataFrame
Table describing the things from which agents are choosing.
debug : bool
-If debug is set to true, well set the variable "sim_pdf" on
+If debug is set to true, will set the variable "sim_pdf" on
the object to store the probabilities for mapping of the
outcome.
@@ -1175,7 +1175,7 @@ def predict_from_cfg(cls, movers, locations, cfgname,
movers : DataFrame
A dataframe of agents doing the choosing.
locations : DataFrame
-A dataframe of locations which the choosers are location in and which
+A dataframe of locations which the choosers are locating in and which
have a supply.
cfgname : string
The name of the yaml config file from which to read the location