Quickstart

Basic

The best way to learn the R-style formula syntax used by ydot is to read the patsy documentation :cite:`2020:patsy`. The example below transforms a Spark dataframe into two design matrices, y and X (both are also Spark dataframes), using a formula that defines a model with up to two-way interactions.

.. literalinclude:: _code/demo.py
   :language: python
   :linenos:

More

The code below generates the model data for several formulas. The notes that follow describe the formula features it exercises.

.. literalinclude:: _code/demo-formulas.py
   :language: python
   :linenos:

You can apply numpy functions to continuous variables inside a formula.
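As a rough illustration of what a transformed term contributes, the sketch below evaluates formula terms as Python expressions over named columns. It is not ydot's or patsy's implementation. The column names ``x1`` and ``x2``, their values, and the helper ``evaluate_term`` are all hypothetical.

.. code-block:: python

   import numpy as np

   # Hypothetical continuous columns standing in for dataframe variables.
   columns = {
       "x1": np.array([1.0, 10.0, 100.0]),
       "x2": np.array([4.0, 9.0, 16.0]),
   }

   def evaluate_term(term, data):
       """Evaluate one formula term as a Python expression over the columns."""
       # `np` is exposed so that terms such as "np.sqrt(x2)" resolve.
       return eval(term, {"np": np}, data)

   # Terms as they might appear in a formula like "y ~ np.log10(x1) + np.sqrt(x2)".
   log_x1 = evaluate_term("np.log10(x1)", columns)
   sqrt_x2 = evaluate_term("np.sqrt(x2)", columns)

   print(log_x1)
   print(sqrt_x2)

Each transformed term yields one column with the same number of rows as the input, which is what ends up in the design matrix in place of the raw variable.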

The ``*`` operator specifies interactions and keeps the lower-order terms, so ``a * b`` expands to ``a + b + a:b``.

The ``:`` operator specifies interactions and drops the lower-order terms, so ``a:b`` contributes only the interaction.

The ``/`` operator is described as quirky in the patsy documentation; ``a / b`` is shorthand for ``a + a:b``.
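The three operators can be compared side by side with a small term-expansion sketch. This is a simplified illustration in plain Python, not patsy's parser. The helpers ``interact``, ``cross``, ``nest``, ``unique`` and ``render`` are hypothetical names, and ``nest`` is only meant for a single-factor left operand.

.. code-block:: python

   # A term is a tuple of factor names; an operand is a list of terms.

   def unique(terms):
       """Drop repeated terms while keeping their order."""
       return list(dict.fromkeys(terms))

   def interact(left, right):
       """`:` keeps only the interactions between left and right terms."""
       terms = []
       for l in left:
           for r in right:
               # Merge the factors of both terms, dropping repeated factors.
               terms.append(tuple(dict.fromkeys(l + r)))
       return unique(terms)

   def cross(left, right):
       """`*` keeps both operands and adds their interaction."""
       return unique(left + right + interact(left, right))

   def nest(left, right):
       """`/` keeps the left operand and adds its interaction with the right."""
       return unique(left + interact(left, right))

   def render(terms):
       """Write a list of terms back in formula notation."""
       return " + ".join(":".join(term) for term in terms)

   a, b = [("a",)], [("b",)]
   print(render(cross(a, b)))     # a + b + a:b
   print(render(interact(a, b)))  # a:b
   print(render(nest(a, b)))      # a + a:b

The only difference between the operators is which lower-order terms survive: ``*`` keeps both, ``/`` keeps only the left one, and ``:`` keeps neither.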

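To drop the intercept, add ``- 1`` at the end of the formula. Note that in this case the reference-level dummy variable for ``a`` is kept rather than dropped. This is documented patsy behavior rather than a bug: without an intercept, patsy codes the first categorical variable with one indicator column per level so that the design matrix still spans the same space.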
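The effect on the dummy columns can be sketched in plain Python. The helper ``dummy_columns`` and the sample values are hypothetical, and the column names only mimic patsy's naming convention. The code builds indicator columns by hand and does not call patsy.

.. code-block:: python

   def dummy_columns(values, with_intercept):
       """Build indicator columns for one categorical variable named `a`."""
       levels = sorted(set(values))
       if with_intercept:
           # Reference level is absorbed by the intercept, so it is dropped.
           names = ["Intercept"] + [f"a[T.{lv}]" for lv in levels[1:]]
           rows = [[1] + [int(v == lv) for lv in levels[1:]] for v in values]
       else:
           # No intercept: every level gets its own indicator column.
           names = [f"a[{lv}]" for lv in levels]
           rows = [[int(v == lv) for lv in levels] for v in values]
       return names, rows

   values = ["a1", "a2", "a3", "a1"]

   names_with, rows_with = dummy_columns(values, with_intercept=True)
   names_without, rows_without = dummy_columns(values, with_intercept=False)

   print(names_with)     # ['Intercept', 'a[T.a2]', 'a[T.a3]']
   print(names_without)  # ['a[a1]', 'a[a2]', 'a[a3]']

Both variants produce the same number of columns. Removing the intercept moves the reference level into its own indicator column and does not reduce the column count.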