pandas-dev · TomAugspurger · Jan 25, 2019 · Jan 24, 2019 · Jan 25, 2019 · jorisvandenbossche
diff --git a/doc/redirects.csv b/doc/redirects.csv
@@ -4,6 +4,10 @@
 # getting started
 10min,getting_started/10min
 basics,getting_started/basics
+comparison_with_r,getting_started/comparison/comparison_with_r
+comparison_with_sql,getting_started/comparison/comparison_with_sql
+comparison_with_sas,getting_started/comparison/comparison_with_sas
+comparison_with_stata,getting_started/comparison/comparison_with_stata
 dsintro,getting_started/dsintro
 overview,getting_started/overview
 tutorials,getting_started/tutorials
@@ -12,6 +16,7 @@ tutorials,getting_started/tutorials
 advanced,user_guide/advanced
 categorical,user_guide/categorical
 computation,user_guide/computation
+cookbook,user_guide/cookbook
 enhancingperf,user_guide/enhancingperf
 gotchas,user_guide/gotchas
 groupby,user_guide/groupby
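
Each data line in `doc/redirects.csv` pairs an old top-level page name with its new location, and `#` lines are section comments. The sketch below reads a file of that shape into a mapping, which is handy for checking that entries such as `cookbook` and the four `comparison_with_*` pages point where this PR moves them. `parse_redirects` and `SAMPLE` are hypothetical names used only for illustration. This is not the code the pandas doc build uses to consume the file.

```python
import csv
import io

# Hypothetical sample in the same shape as doc/redirects.csv:
# comment lines start with "#", data lines are "old_page,new_page".
SAMPLE = """\
# getting started
10min,getting_started/10min
comparison_with_r,getting_started/comparison/comparison_with_r
cookbook,user_guide/cookbook
"""


def parse_redirects(text):
    """Return a dict mapping old page names to new page paths.

    Hypothetical helper for illustration only.
    """
    redirects = {}
    for row in csv.reader(io.StringIO(text)):
        # Skip blank lines (empty rows) and comments such as "# getting started".
        if not row or row[0].startswith("#"):
            continue
        old, new = (field.strip() for field in row)
        redirects[old] = new
    return redirects


redirects = parse_redirects(SAMPLE)
```

For example, `redirects["cookbook"]` gives `"user_guide/cookbook"`, matching the entry added in the second hunk above.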

diff --git a/doc/source/comparison_with_r.rst → ..._started/comparison/comparison_with_r.rst b/doc/source/comparison_with_r.rst → ..._started/comparison/comparison_with_r.rst
diff --git a/doc/source/comparison_with_sas.rst → ...tarted/comparison/comparison_with_sas.rst b/doc/source/comparison_with_sas.rst → ...tarted/comparison/comparison_with_sas.rst
diff --git a/doc/source/comparison_with_sql.rst → ...tarted/comparison/comparison_with_sql.rst b/doc/source/comparison_with_sql.rst → ...tarted/comparison/comparison_with_sql.rst
diff --git a/doc/source/comparison_with_stata.rst → ...rted/comparison/comparison_with_stata.rst b/doc/source/comparison_with_stata.rst → ...rted/comparison/comparison_with_stata.rst
diff --git a/doc/source/getting_started/comparison/index.rst b/doc/source/getting_started/comparison/index.rst
@@ -0,0 +1,15 @@
+{{ header }}
+
+.. _comparison:
+
+===========================
+Comparison with other tools
+===========================
+
+.. toctree::
+    :maxdepth: 2
+
+    comparison_with_r
+    comparison_with_sql
+    comparison_with_sas
+    comparison_with_stata
diff --git a/doc/source/getting_started/index.rst b/doc/source/getting_started/index.rst
@@ -13,4 +13,5 @@ Getting started
     10min
     basics
     dsintro
+    comparison/index
     tutorials
diff --git a/doc/source/getting_started/overview.rst b/doc/source/getting_started/overview.rst
@@ -6,25 +6,80 @@
 Package overview
 ****************
 
-:mod:`pandas` is an open source, BSD-licensed library providing high-performance,
-easy-to-use data structures and data analysis tools for the `Python <https://www.python.org/>`__
-programming language.
-
-:mod:`pandas` consists of the following elements:
-
-* A set of labeled array data structures, the primary of which are
-  Series and DataFrame.
-* Index objects enabling both simple axis indexing and multi-level /
-  hierarchical axis indexing.
-* An integrated group by engine for aggregating and transforming data sets.
-* Date range generation (date_range) and custom date offsets enabling the
-  implementation of customized frequencies.
-* Input/Output tools: loading tabular data from flat files (CSV, delimited,
-  Excel 2003), and saving and loading pandas objects from the fast and
-  efficient PyTables/HDF5 format.
-* Memory-efficient "sparse" versions of the standard data structures for storing
-  data that is mostly missing or mostly constant (some fixed value).
-* Moving window statistics (rolling mean, rolling standard deviation, etc.).
+**pandas** is a `Python <https://www.python.org>`__ package providing fast,
+flexible, and expressive data structures designed to make working with
+"relational" or "labeled" data both easy and intuitive. It aims to be the
+fundamental high-level building block for doing practical, **real world** data
+analysis in Python. Additionally, it has the broader goal of becoming **the
+most powerful and flexible open source data analysis / manipulation tool
+available in any language**. It is already well on its way toward this goal.
+
+pandas is well suited for many different kinds of data:
+
+  - Tabular data with heterogeneously typed columns, as in an SQL table or
+    Excel spreadsheet
+  - Ordered and unordered (not necessarily fixed-frequency) time series data
+  - Arbitrary matrix data (homogeneously typed or heterogeneous) with row and
+    column labels
+  - Any other form of observational / statistical data sets. The data actually
+    need not be labeled at all to be placed into a pandas data structure
+
+The two primary data structures of pandas, :class:`Series` (1-dimensional)
+and :class:`DataFrame` (2-dimensional), handle the vast majority of typical use
+cases in finance, statistics, social science, and many areas of
+engineering. For R users, :class:`DataFrame` provides everything that R's
+``data.frame`` provides and much more. pandas is built on top of `NumPy
+<https://www.numpy.org>`__ and is intended to integrate well within a scientific
+computing environment with many other third-party libraries.
+
+Here are just a few of the things that pandas does well:
+
+  - Easy handling of **missing data** (represented as NaN) in floating point as
+    well as non-floating point data
+  - Size mutability: columns can be **inserted and deleted** from DataFrame and
+    higher dimensional objects
+  - Automatic and explicit **data alignment**: objects can be explicitly
+    aligned to a set of labels, or the user can simply ignore the labels and
+    let :class:`Series`, :class:`DataFrame`, etc. automatically align the data for you in
+    computations
+  - Powerful, flexible **group by** functionality to perform
+    split-apply-combine operations on data sets, for both aggregating and
+    transforming data
+  - **Easy conversion** of ragged, differently-indexed data in other
+    Python and NumPy data structures into DataFrame objects
+  - Intelligent label-based **slicing**, **fancy indexing**, and **subsetting**
+    of large data sets
+  - Intuitive **merging** and **joining** data sets
+  - Flexible **reshaping** and pivoting of data sets
+  - **Hierarchical** labeling of axes (possible to have multiple labels per
+    tick)
+  - Robust IO tools for loading data from **flat files** (CSV and delimited),
+    Excel files, databases, and saving / loading data from the ultrafast **HDF5
+    format**
+  - **Time series**-specific functionality: date range generation and frequency
+    conversion, moving window statistics,
+    date shifting and lagging, etc.
+
+Many of these principles are here to address the shortcomings frequently
+experienced when using other languages / scientific research environments. For data
+scientists, working with data is typically divided into multiple stages:
+munging and cleaning data, analyzing / modeling it, then organizing the results
+of the analysis into a form suitable for plotting or tabular display. pandas
+is the ideal tool for all of these tasks.
+
+Some other notes:
+
+ - pandas is **fast**. Many of the low-level algorithmic bits have been
+   extensively tweaked in `Cython <https://cython.org>`__ code. However, as with
+   anything else, generalization usually sacrifices performance. So if you focus
+   on one feature for your application you may be able to create a faster
+   specialized tool.
+
+ - pandas is a dependency of `statsmodels
+   <https://www.statsmodels.org/stable/index.html>`__, making it an important part of the
+   statistical computing ecosystem in Python.
+
+ - pandas has been used extensively in production in financial applications.
 
 Data Structures
 ---------------
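
The **data alignment**, **missing data**, and **group by** bullets in the new overview text are easiest to see in a small example. The sketch below uses only public pandas API (`Series`, `DataFrame`, `isna`, `fillna`, `groupby`). The data and variable names are made up for illustration.

```python
import pandas as pd

# Two Series whose index labels only partially overlap.
s1 = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])
s2 = pd.Series([10.0, 20.0, 30.0], index=["b", "c", "d"])

# Arithmetic aligns on labels, not positions: the result's index is the
# union of both indexes, and a label present in only one operand gives NaN.
total = s1 + s2
# total: a NaN, b 12.0, c 23.0, d NaN

# Missing values can be counted and filled explicitly.
n_missing = int(total.isna().sum())  # 2
filled = total.fillna(0.0)

# Split-apply-combine: split rows by "key", sum "value" within each group,
# and combine the per-group results into a Series indexed by group label.
df = pd.DataFrame({"key": ["x", "y", "x", "y"], "value": [1, 2, 3, 4]})
sums = df.groupby("key")["value"].sum()
# sums: x 4, y 6
```

Note that no explicit reindexing is needed before `s1 + s2`. Alignment happens automatically, which is why the unmatched labels `a` and `d` surface as missing data rather than raising an error.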

diff --git a/doc/source/index.rst.template b/doc/source/index.rst.template
@@ -22,93 +22,15 @@ pandas: powerful Python data analysis toolkit
 
 **Developer Mailing List:** https://groups.google.com/forum/#!forum/pydata
 
-**pandas** is a `Python <https://www.python.org>`__ package providing fast,
-flexible, and expressive data structures designed to make working with
-"relational" or "labeled" data both easy and intuitive. It aims to be the
-fundamental high-level building block for doing practical, **real world** data
-analysis in Python. Additionally, it has the broader goal of becoming **the
-most powerful and flexible open source data analysis / manipulation tool
-available in any language**. It is already well on its way toward this goal.
-
-pandas is well suited for many different kinds of data:
-
-  - Tabular data with heterogeneously-typed columns, as in an SQL table or
-    Excel spreadsheet
-  - Ordered and unordered (not necessarily fixed-frequency) time series data.
-  - Arbitrary matrix data (homogeneously typed or heterogeneous) with row and
-    column labels
-  - Any other form of observational / statistical data sets. The data actually
-    need not be labeled at all to be placed into a pandas data structure
-
-The two primary data structures of pandas, :class:`Series` (1-dimensional)
-and :class:`DataFrame` (2-dimensional), handle the vast majority of typical use
-cases in finance, statistics, social science, and many areas of
-engineering. For R users, :class:`DataFrame` provides everything that R's
-``data.frame`` provides and much more. pandas is built on top of `NumPy
-<https://www.numpy.org>`__ and is intended to integrate well within a scientific
-computing environment with many other 3rd party libraries.
-
-Here are just a few of the things that pandas does well:
-
-  - Easy handling of **missing data** (represented as NaN) in floating point as
-    well as non-floating point data
-  - Size mutability: columns can be **inserted and deleted** from DataFrame and
-    higher dimensional objects
-  - Automatic and explicit **data alignment**: objects can be explicitly
-    aligned to a set of labels, or the user can simply ignore the labels and
-    let `Series`, `DataFrame`, etc. automatically align the data for you in
-    computations
-  - Powerful, flexible **group by** functionality to perform
-    split-apply-combine operations on data sets, for both aggregating and
-    transforming data
-  - Make it **easy to convert** ragged, differently-indexed data in other
-    Python and NumPy data structures into DataFrame objects
-  - Intelligent label-based **slicing**, **fancy indexing**, and **subsetting**
-    of large data sets
-  - Intuitive **merging** and **joining** data sets
-  - Flexible **reshaping** and pivoting of data sets
-  - **Hierarchical** labeling of axes (possible to have multiple labels per
-    tick)
-  - Robust IO tools for loading data from **flat files** (CSV and delimited),
-    Excel files, databases, and saving / loading data from the ultrafast **HDF5
-    format**
-  - **Time series**-specific functionality: date range generation and frequency
-    conversion, moving window statistics, moving window linear regressions,
-    date shifting and lagging, etc.
-
-Many of these principles are here to address the shortcomings frequently
-experienced using other languages / scientific research environments. For data
-scientists, working with data is typically divided into multiple stages:
-munging and cleaning data, analyzing / modeling it, then organizing the results
-of the analysis into a form suitable for plotting or tabular display. pandas
-is the ideal tool for all of these tasks.
-
-Some other notes
-
- - pandas is **fast**. Many of the low-level algorithmic bits have been
-   extensively tweaked in `Cython <https://cython.org>`__ code. However, as with
-   anything else generalization usually sacrifices performance. So if you focus
-   on one feature for your application you may be able to create a faster
-   specialized tool.
-
- - pandas is a dependency of `statsmodels
-   <https://www.statsmodels.org/stable/index.html>`__, making it an important part of the
-   statistical computing ecosystem in Python.
-
- - pandas has been used extensively in production in financial applications.
-
-.. note::
-
-   This documentation assumes general familiarity with NumPy. If you haven't
-   used NumPy much or at all, do invest some time in `learning about NumPy
-   <https://docs.scipy.org>`__ first.
-
-See the package overview for more detail about what's in the library.
+:mod:`pandas` is an open source, BSD-licensed library providing high-performance,
+easy-to-use data structures and data analysis tools for the `Python <https://www.python.org/>`__
+programming language.
 
+See the :ref:`overview` for more detail about what's in the library.
 
 {% if single_doc and single_doc.endswith('.rst') -%}
 .. toctree::
-    :maxdepth: 4
+    :maxdepth: 2
 
     {{ single_doc[:-4] }}
 {% elif single_doc %}
@@ -118,21 +40,15 @@ See the package overview for more detail about what's in the library.
     {{ single_doc }}
 {% else -%}
 .. toctree::
-    :maxdepth: 4
+    :maxdepth: 2
 {% endif %}
 
     {% if not single_doc -%}
     What's New <whatsnew/v0.24.0>
     install
     getting_started/index
-    cookbook
     user_guide/index
-    r_interface
     ecosystem
-    comparison_with_r
-    comparison_with_sql
-    comparison_with_sas
-    comparison_with_stata
     {% endif -%}
     {% if include_api -%}
     api/index

diff --git a/doc/source/r_interface.rst b/doc/source/r_interface.rst
diff --git a/doc/source/cookbook.rst → doc/source/user_guide/cookbook.rst b/doc/source/cookbook.rst → doc/source/user_guide/cookbook.rst
diff --git a/doc/source/user_guide/index.rst b/doc/source/user_guide/index.rst
@@ -37,3 +37,4 @@ Further information on any specific method can be obtained in the
     enhancingperf
     sparse
     gotchas
+    cookbook
diff --git a/doc/source/user_guide/style.ipynb b/doc/source/user_guide/style.ipynb
@@ -1133,7 +1133,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "with open(\"template_structure.html\") as f:\n",
+    "with open(\"templates/template_structure.html\") as f:\n",
     "    structure = f.read()\n",
     "    \n",
     "HTML(structure)"

diff --git a/doc/source/templates/myhtml.tpl → doc/source/user_guide/templates/myhtml.tpl b/doc/source/templates/myhtml.tpl → doc/source/user_guide/templates/myhtml.tpl
diff --git a/doc/source/template_structure.html → ...r_guide/templates/template_structure.html b/doc/source/template_structure.html → ...r_guide/templates/template_structure.html