diff --git a/doc/redirects.csv b/doc/redirects.csv
index 4f4b3d7fc0780..8c62ecc362ccd 100644
--- a/doc/redirects.csv
+++ b/doc/redirects.csv
@@ -4,6 +4,10 @@
# getting started
10min,getting_started/10min
basics,getting_started/basics
+comparison_with_r,getting_started/comparison/comparison_with_r
+comparison_with_sql,getting_started/comparison/comparison_with_sql
+comparison_with_sas,getting_started/comparison/comparison_with_sas
+comparison_with_stata,getting_started/comparison/comparison_with_stata
dsintro,getting_started/dsintro
overview,getting_started/overview
tutorials,getting_started/tutorials
@@ -12,6 +16,7 @@ tutorials,getting_started/tutorials
advanced,user_guide/advanced
categorical,user_guide/categorical
computation,user_guide/computation
+cookbook,user_guide/cookbook
enhancingperf,user_guide/enhancingperf
gotchas,user_guide/gotchas
groupby,user_guide/groupby
diff --git a/doc/source/comparison_with_r.rst b/doc/source/getting_started/comparison/comparison_with_r.rst
similarity index 100%
rename from doc/source/comparison_with_r.rst
rename to doc/source/getting_started/comparison/comparison_with_r.rst
diff --git a/doc/source/comparison_with_sas.rst b/doc/source/getting_started/comparison/comparison_with_sas.rst
similarity index 100%
rename from doc/source/comparison_with_sas.rst
rename to doc/source/getting_started/comparison/comparison_with_sas.rst
diff --git a/doc/source/comparison_with_sql.rst b/doc/source/getting_started/comparison/comparison_with_sql.rst
similarity index 100%
rename from doc/source/comparison_with_sql.rst
rename to doc/source/getting_started/comparison/comparison_with_sql.rst
diff --git a/doc/source/comparison_with_stata.rst b/doc/source/getting_started/comparison/comparison_with_stata.rst
similarity index 100%
rename from doc/source/comparison_with_stata.rst
rename to doc/source/getting_started/comparison/comparison_with_stata.rst
diff --git a/doc/source/getting_started/comparison/index.rst b/doc/source/getting_started/comparison/index.rst
new file mode 100644
index 0000000000000..998706ce0c639
--- /dev/null
+++ b/doc/source/getting_started/comparison/index.rst
@@ -0,0 +1,15 @@
+{{ header }}
+
+.. _comparison:
+
+===========================
+Comparison with other tools
+===========================
+
+.. toctree::
+ :maxdepth: 2
+
+ comparison_with_r
+ comparison_with_sql
+ comparison_with_sas
+ comparison_with_stata
diff --git a/doc/source/getting_started/index.rst b/doc/source/getting_started/index.rst
index 116efe79beef1..4c5d26461a667 100644
--- a/doc/source/getting_started/index.rst
+++ b/doc/source/getting_started/index.rst
@@ -13,4 +13,5 @@ Getting started
10min
basics
dsintro
+ comparison/index
tutorials
diff --git a/doc/source/getting_started/overview.rst b/doc/source/getting_started/overview.rst
index 1e07df47aadca..b531f686951fc 100644
--- a/doc/source/getting_started/overview.rst
+++ b/doc/source/getting_started/overview.rst
@@ -6,25 +6,80 @@
Package overview
****************
-:mod:`pandas` is an open source, BSD-licensed library providing high-performance,
-easy-to-use data structures and data analysis tools for the `Python `__
-programming language.
-
-:mod:`pandas` consists of the following elements:
-
-* A set of labeled array data structures, the primary of which are
- Series and DataFrame.
-* Index objects enabling both simple axis indexing and multi-level /
- hierarchical axis indexing.
-* An integrated group by engine for aggregating and transforming data sets.
-* Date range generation (date_range) and custom date offsets enabling the
- implementation of customized frequencies.
-* Input/Output tools: loading tabular data from flat files (CSV, delimited,
- Excel 2003), and saving and loading pandas objects from the fast and
- efficient PyTables/HDF5 format.
-* Memory-efficient "sparse" versions of the standard data structures for storing
- data that is mostly missing or mostly constant (some fixed value).
-* Moving window statistics (rolling mean, rolling standard deviation, etc.).
+**pandas** is a `Python `__ package providing fast,
+flexible, and expressive data structures designed to make working with
+"relational" or "labeled" data both easy and intuitive. It aims to be the
+fundamental high-level building block for doing practical, **real world** data
+analysis in Python. Additionally, it has the broader goal of becoming **the
+most powerful and flexible open source data analysis / manipulation tool
+available in any language**. It is already well on its way toward this goal.
+
+pandas is well suited for many different kinds of data:
+
+ - Tabular data with heterogeneously-typed columns, as in an SQL table or
+ Excel spreadsheet
+ - Ordered and unordered (not necessarily fixed-frequency) time series data.
+ - Arbitrary matrix data (homogeneously typed or heterogeneous) with row and
+ column labels
+ - Any other form of observational / statistical data sets. The data actually
+ need not be labeled at all to be placed into a pandas data structure
+
+The two primary data structures of pandas, :class:`Series` (1-dimensional)
+and :class:`DataFrame` (2-dimensional), handle the vast majority of typical use
+cases in finance, statistics, social science, and many areas of
+engineering. For R users, :class:`DataFrame` provides everything that R's
+``data.frame`` provides and much more. pandas is built on top of `NumPy
+`__ and is intended to integrate well within a scientific
+computing environment with many other 3rd party libraries.
+
+Here are just a few of the things that pandas does well:
+
+ - Easy handling of **missing data** (represented as NaN) in floating point as
+ well as non-floating point data
+ - Size mutability: columns can be **inserted and deleted** from DataFrame and
+ higher dimensional objects
+ - Automatic and explicit **data alignment**: objects can be explicitly
+ aligned to a set of labels, or the user can simply ignore the labels and
+ let `Series`, `DataFrame`, etc. automatically align the data for you in
+ computations
+ - Powerful, flexible **group by** functionality to perform
+ split-apply-combine operations on data sets, for both aggregating and
+ transforming data
+ - Make it **easy to convert** ragged, differently-indexed data in other
+ Python and NumPy data structures into DataFrame objects
+ - Intelligent label-based **slicing**, **fancy indexing**, and **subsetting**
+ of large data sets
+ - Intuitive **merging** and **joining** data sets
+ - Flexible **reshaping** and pivoting of data sets
+ - **Hierarchical** labeling of axes (possible to have multiple labels per
+ tick)
+ - Robust IO tools for loading data from **flat files** (CSV and delimited),
+ Excel files, databases, and saving / loading data from the ultrafast **HDF5
+ format**
+ - **Time series**-specific functionality: date range generation and frequency
+ conversion, moving window statistics, moving window linear regressions,
+ date shifting and lagging, etc.
+
+Many of these principles are here to address the shortcomings frequently
+experienced using other languages / scientific research environments. For data
+scientists, working with data is typically divided into multiple stages:
+munging and cleaning data, analyzing / modeling it, then organizing the results
+of the analysis into a form suitable for plotting or tabular display. pandas
+is the ideal tool for all of these tasks.
+
+Some other notes
+
+ - pandas is **fast**. Many of the low-level algorithmic bits have been
+ extensively tweaked in `Cython `__ code. However, as with
+ anything else generalization usually sacrifices performance. So if you focus
+ on one feature for your application you may be able to create a faster
+ specialized tool.
+
+ - pandas is a dependency of `statsmodels
+ `__, making it an important part of the
+ statistical computing ecosystem in Python.
+
+ - pandas has been used extensively in production in financial applications.
Data Structures
---------------
diff --git a/doc/source/index.rst.template b/doc/source/index.rst.template
index bc420a906b59c..ab51911a610e3 100644
--- a/doc/source/index.rst.template
+++ b/doc/source/index.rst.template
@@ -22,93 +22,15 @@ pandas: powerful Python data analysis toolkit
**Developer Mailing List:** https://groups.google.com/forum/#!forum/pydata
-**pandas** is a `Python `__ package providing fast,
-flexible, and expressive data structures designed to make working with
-"relational" or "labeled" data both easy and intuitive. It aims to be the
-fundamental high-level building block for doing practical, **real world** data
-analysis in Python. Additionally, it has the broader goal of becoming **the
-most powerful and flexible open source data analysis / manipulation tool
-available in any language**. It is already well on its way toward this goal.
-
-pandas is well suited for many different kinds of data:
-
- - Tabular data with heterogeneously-typed columns, as in an SQL table or
- Excel spreadsheet
- - Ordered and unordered (not necessarily fixed-frequency) time series data.
- - Arbitrary matrix data (homogeneously typed or heterogeneous) with row and
- column labels
- - Any other form of observational / statistical data sets. The data actually
- need not be labeled at all to be placed into a pandas data structure
-
-The two primary data structures of pandas, :class:`Series` (1-dimensional)
-and :class:`DataFrame` (2-dimensional), handle the vast majority of typical use
-cases in finance, statistics, social science, and many areas of
-engineering. For R users, :class:`DataFrame` provides everything that R's
-``data.frame`` provides and much more. pandas is built on top of `NumPy
-`__ and is intended to integrate well within a scientific
-computing environment with many other 3rd party libraries.
-
-Here are just a few of the things that pandas does well:
-
- - Easy handling of **missing data** (represented as NaN) in floating point as
- well as non-floating point data
- - Size mutability: columns can be **inserted and deleted** from DataFrame and
- higher dimensional objects
- - Automatic and explicit **data alignment**: objects can be explicitly
- aligned to a set of labels, or the user can simply ignore the labels and
- let `Series`, `DataFrame`, etc. automatically align the data for you in
- computations
- - Powerful, flexible **group by** functionality to perform
- split-apply-combine operations on data sets, for both aggregating and
- transforming data
- - Make it **easy to convert** ragged, differently-indexed data in other
- Python and NumPy data structures into DataFrame objects
- - Intelligent label-based **slicing**, **fancy indexing**, and **subsetting**
- of large data sets
- - Intuitive **merging** and **joining** data sets
- - Flexible **reshaping** and pivoting of data sets
- - **Hierarchical** labeling of axes (possible to have multiple labels per
- tick)
- - Robust IO tools for loading data from **flat files** (CSV and delimited),
- Excel files, databases, and saving / loading data from the ultrafast **HDF5
- format**
- - **Time series**-specific functionality: date range generation and frequency
- conversion, moving window statistics, moving window linear regressions,
- date shifting and lagging, etc.
-
-Many of these principles are here to address the shortcomings frequently
-experienced using other languages / scientific research environments. For data
-scientists, working with data is typically divided into multiple stages:
-munging and cleaning data, analyzing / modeling it, then organizing the results
-of the analysis into a form suitable for plotting or tabular display. pandas
-is the ideal tool for all of these tasks.
-
-Some other notes
-
- - pandas is **fast**. Many of the low-level algorithmic bits have been
- extensively tweaked in `Cython `__ code. However, as with
- anything else generalization usually sacrifices performance. So if you focus
- on one feature for your application you may be able to create a faster
- specialized tool.
-
- - pandas is a dependency of `statsmodels
- `__, making it an important part of the
- statistical computing ecosystem in Python.
-
- - pandas has been used extensively in production in financial applications.
-
-.. note::
-
- This documentation assumes general familiarity with NumPy. If you haven't
- used NumPy much or at all, do invest some time in `learning about NumPy
- `__ first.
-
-See the package overview for more detail about what's in the library.
+:mod:`pandas` is an open source, BSD-licensed library providing high-performance,
+easy-to-use data structures and data analysis tools for the `Python `__
+programming language.
+See the :ref:`overview` for more detail about what's in the library.
{% if single_doc and single_doc.endswith('.rst') -%}
.. toctree::
- :maxdepth: 4
+ :maxdepth: 2
{{ single_doc[:-4] }}
{% elif single_doc %}
@@ -118,21 +40,15 @@ See the package overview for more detail about what's in the library.
{{ single_doc }}
{% else -%}
.. toctree::
- :maxdepth: 4
+ :maxdepth: 2
{% endif %}
{% if not single_doc -%}
What's New
install
getting_started/index
- cookbook
user_guide/index
- r_interface
ecosystem
- comparison_with_r
- comparison_with_sql
- comparison_with_sas
- comparison_with_stata
{% endif -%}
{% if include_api -%}
api/index
diff --git a/doc/source/r_interface.rst b/doc/source/r_interface.rst
deleted file mode 100644
index 9839bba4884d4..0000000000000
--- a/doc/source/r_interface.rst
+++ /dev/null
@@ -1,94 +0,0 @@
-.. _rpy:
-
-{{ header }}
-
-******************
-rpy2 / R interface
-******************
-
-.. warning::
-
- Up to pandas 0.19, a ``pandas.rpy`` module existed with functionality to
- convert between pandas and ``rpy2`` objects. This functionality now lives in
- the `rpy2 `__ project itself.
- See the `updating section `__
- of the previous documentation for a guide to port your code from the
- removed ``pandas.rpy`` to ``rpy2`` functions.
-
-
-`rpy2 `__ is an interface to R running embedded in a Python process, and also includes functionality to deal with pandas DataFrames.
-Converting data frames back and forth between rpy2 and pandas should be largely
-automated (no need to convert explicitly, it will be done on the fly in most
-rpy2 functions).
-To convert explicitly, the functions are ``pandas2ri.py2ri()`` and
-``pandas2ri.ri2py()``.
-
-
-See also the documentation of the `rpy2 `__ project: https://rpy2.readthedocs.io.
-
-In the remainder of this page, a few examples of explicit conversion is given. The pandas conversion of rpy2 needs first to be activated:
-
-.. ipython::
- :verbatim:
-
- In [1]: from rpy2.robjects import pandas2ri
- ...: pandas2ri.activate()
-
-Transferring R data sets into Python
-------------------------------------
-
-Once the pandas conversion is activated (``pandas2ri.activate()``), many conversions
-of R to pandas objects will be done automatically. For example, to obtain the 'iris' dataset as a pandas DataFrame:
-
-.. ipython::
- :verbatim:
-
- In [2]: from rpy2.robjects import r
-
- In [3]: r.data('iris')
-
- In [4]: r['iris'].head()
- Out[4]:
- Sepal.Length Sepal.Width Petal.Length Petal.Width Species
- 0 5.1 3.5 1.4 0.2 setosa
- 1 4.9 3.0 1.4 0.2 setosa
- 2 4.7 3.2 1.3 0.2 setosa
- 3 4.6 3.1 1.5 0.2 setosa
- 4 5.0 3.6 1.4 0.2 setosa
-
-If the pandas conversion was not activated, the above could also be accomplished
-by explicitly converting it with the ``pandas2ri.ri2py`` function
-(``pandas2ri.ri2py(r['iris'])``).
-
-Converting DataFrames into R objects
-------------------------------------
-
-The ``pandas2ri.py2ri`` function support the reverse operation to convert
-DataFrames into the equivalent R object (that is, **data.frame**):
-
-.. ipython::
- :verbatim:
-
- In [5]: df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]},
- ...: index=["one", "two", "three"])
-
- In [6]: r_dataframe = pandas2ri.py2ri(df)
-
- In [7]: print(type(r_dataframe))
- Out[7]:
-
- In [8]: print(r_dataframe)
- Out[8]:
- A B C
- one 1 4 7
- two 2 5 8
- three 3 6 9
-
-
-The DataFrame's index is stored as the ``rownames`` attribute of the
-data.frame instance.
-
-
-..
- Calling R functions with pandas objects
- High-level interface to R estimators
diff --git a/doc/source/cookbook.rst b/doc/source/user_guide/cookbook.rst
similarity index 100%
rename from doc/source/cookbook.rst
rename to doc/source/user_guide/cookbook.rst
diff --git a/doc/source/user_guide/index.rst b/doc/source/user_guide/index.rst
index 60e722808d647..d39cf7103ab63 100644
--- a/doc/source/user_guide/index.rst
+++ b/doc/source/user_guide/index.rst
@@ -37,3 +37,4 @@ Further information on any specific method can be obtained in the
enhancingperf
sparse
gotchas
+ cookbook
diff --git a/doc/source/user_guide/style.ipynb b/doc/source/user_guide/style.ipynb
index a238c3b16e9ad..79a9848704eec 100644
--- a/doc/source/user_guide/style.ipynb
+++ b/doc/source/user_guide/style.ipynb
@@ -1133,7 +1133,7 @@
"metadata": {},
"outputs": [],
"source": [
- "with open(\"template_structure.html\") as f:\n",
+ "with open(\"templates/template_structure.html\") as f:\n",
" structure = f.read()\n",
" \n",
"HTML(structure)"
diff --git a/doc/source/templates/myhtml.tpl b/doc/source/user_guide/templates/myhtml.tpl
similarity index 100%
rename from doc/source/templates/myhtml.tpl
rename to doc/source/user_guide/templates/myhtml.tpl
diff --git a/doc/source/template_structure.html b/doc/source/user_guide/templates/template_structure.html
similarity index 100%
rename from doc/source/template_structure.html
rename to doc/source/user_guide/templates/template_structure.html