DOC: Final reorganization of documentation pages #24890

Merged
merged 2 commits into from Jan 25, 2019
5 changes: 5 additions & 0 deletions doc/redirects.csv
@@ -4,6 +4,10 @@
# getting started
10min,getting_started/10min
basics,getting_started/basics
comparison_with_r,getting_started/comparison/comparison_with_r
comparison_with_sql,getting_started/comparison/comparison_with_sql
comparison_with_sas,getting_started/comparison/comparison_with_sas
comparison_with_stata,getting_started/comparison/comparison_with_stata
dsintro,getting_started/dsintro
overview,getting_started/overview
tutorials,getting_started/tutorials
@@ -12,6 +16,7 @@ tutorials,getting_started/tutorials
advanced,user_guide/advanced
categorical,user_guide/categorical
computation,user_guide/computation
cookbook,user_guide/cookbook
enhancingperf,user_guide/enhancingperf
gotchas,user_guide/gotchas
groupby,user_guide/groupby
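
The rows above pair an old page name with its new location. As a minimal sketch of how such a file could be read, assuming each data row is `old,new` and that rows starting with `#` are comments (the function name `load_redirects` and the parsing rules are illustrative assumptions, not the project's build code):

```python
import csv
import io

# Sample rows copied from the diff above; the parsing rules are assumptions.
SAMPLE = """\
# getting started
10min,getting_started/10min
comparison_with_r,getting_started/comparison/comparison_with_r
cookbook,user_guide/cookbook
"""


def load_redirects(text):
    """Return a dict mapping each old page name to its new location."""
    redirects = {}
    for row in csv.reader(io.StringIO(text)):
        # Skip blank lines and comment rows such as "# getting started".
        if not row or row[0].startswith("#"):
            continue
        old, new = row
        redirects[old] = new
    return redirects


redirects = load_redirects(SAMPLE)
print(redirects["cookbook"])
```

A lookup table of this shape would let a documentation build emit a stub page at each old path that points to the new one.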
15 changes: 15 additions & 0 deletions doc/source/getting_started/comparison/index.rst
@@ -0,0 +1,15 @@
{{ header }}

.. _comparison:

===========================
Comparison with other tools
===========================

.. toctree::
    :maxdepth: 2

    comparison_with_r
    comparison_with_sql
    comparison_with_sas
    comparison_with_stata
1 change: 1 addition & 0 deletions doc/source/getting_started/index.rst
@@ -13,4 +13,5 @@ Getting started
10min
basics
dsintro
comparison/index
tutorials
93 changes: 74 additions & 19 deletions doc/source/getting_started/overview.rst
@@ -6,25 +6,80 @@
Package overview
****************

:mod:`pandas` is an open source, BSD-licensed library providing high-performance,
easy-to-use data structures and data analysis tools for the `Python <https://www.python.org/>`__
programming language.

:mod:`pandas` consists of the following elements:

* A set of labeled array data structures, the primary of which are
Series and DataFrame.
* Index objects enabling both simple axis indexing and multi-level /
hierarchical axis indexing.
* An integrated group by engine for aggregating and transforming data sets.
* Date range generation (date_range) and custom date offsets enabling the
implementation of customized frequencies.
* Input/Output tools: loading tabular data from flat files (CSV, delimited,
Excel 2003), and saving and loading pandas objects from the fast and
efficient PyTables/HDF5 format.
* Memory-efficient "sparse" versions of the standard data structures for storing
data that is mostly missing or mostly constant (some fixed value).
* Moving window statistics (rolling mean, rolling standard deviation, etc.).
**pandas** is a `Python <https://www.python.org>`__ package providing fast,
flexible, and expressive data structures designed to make working with
"relational" or "labeled" data both easy and intuitive. It aims to be the
Member

This is just a copy-paste of what was there before, correct?

(before I start nitpicking on it :-))

Member Author

Yes, sorry, this was moved from the home page to the overview without any changes.

I didn't move the note saying that familiarity with NumPy is required; I didn't feel it made sense on any of the pages.

fundamental high-level building block for doing practical, **real world** data
analysis in Python. Additionally, it has the broader goal of becoming **the
most powerful and flexible open source data analysis / manipulation tool
available in any language**. It is already well on its way toward this goal.

pandas is well suited for many different kinds of data:

- Tabular data with heterogeneously-typed columns, as in an SQL table or
Excel spreadsheet
- Ordered and unordered (not necessarily fixed-frequency) time series data.
- Arbitrary matrix data (homogeneously typed or heterogeneous) with row and
column labels
- Any other form of observational / statistical data sets. The data actually
need not be labeled at all to be placed into a pandas data structure

The two primary data structures of pandas, :class:`Series` (1-dimensional)
and :class:`DataFrame` (2-dimensional), handle the vast majority of typical use
cases in finance, statistics, social science, and many areas of
engineering. For R users, :class:`DataFrame` provides everything that R's
``data.frame`` provides and much more. pandas is built on top of `NumPy
<https://www.numpy.org>`__ and is intended to integrate well within a scientific
computing environment with many other 3rd party libraries.

Here are just a few of the things that pandas does well:

- Easy handling of **missing data** (represented as NaN) in floating point as
well as non-floating point data
- Size mutability: columns can be **inserted and deleted** from DataFrame and
higher dimensional objects
- Automatic and explicit **data alignment**: objects can be explicitly
aligned to a set of labels, or the user can simply ignore the labels and
let `Series`, `DataFrame`, etc. automatically align the data for you in
computations
- Powerful, flexible **group by** functionality to perform
split-apply-combine operations on data sets, for both aggregating and
transforming data
- Make it **easy to convert** ragged, differently-indexed data in other
Python and NumPy data structures into DataFrame objects
- Intelligent label-based **slicing**, **fancy indexing**, and **subsetting**
of large data sets
- Intuitive **merging** and **joining** data sets
- Flexible **reshaping** and pivoting of data sets
- **Hierarchical** labeling of axes (possible to have multiple labels per
tick)
- Robust IO tools for loading data from **flat files** (CSV and delimited),
Excel files, databases, and saving / loading data from the ultrafast **HDF5
format**
- **Time series**-specific functionality: date range generation and frequency
conversion, moving window statistics, moving window linear regressions,
date shifting and lagging, etc.
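
A few of the items in the list above (label-based alignment, missing data, and group by) can be illustrated with a short sketch. The data and variable names are made up for illustration and are not part of the pull request:

```python
import pandas as pd

# Two Series with partially overlapping labels (illustrative data).
a = pd.Series([1.0, 2.0, 3.0], index=["x", "y", "z"])
b = pd.Series([10.0, 20.0], index=["y", "z"])

# Automatic alignment: values are matched by label, not by position.
# "x" has no partner in b, so the result there is missing (NaN).
total = a + b

# Missing data handling: fill the gap explicitly.
filled = total.fillna(0.0)

# Group by: split-apply-combine on a small DataFrame.
df = pd.DataFrame({"key": ["p", "q", "p", "q"], "value": [1, 2, 3, 4]})
sums = df.groupby("key")["value"].sum()

print(total)
print(filled)
print(sums)
```

The addition never requires the two Series to have the same length or order; the index labels decide which values meet.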

Many of these principles are here to address the shortcomings frequently
experienced using other languages / scientific research environments. For data
scientists, working with data is typically divided into multiple stages:
munging and cleaning data, analyzing / modeling it, then organizing the results
of the analysis into a form suitable for plotting or tabular display. pandas
is the ideal tool for all of these tasks.

Some other notes

- pandas is **fast**. Many of the low-level algorithmic bits have been
extensively tweaked in `Cython <https://cython.org>`__ code. However, as with
anything else generalization usually sacrifices performance. So if you focus
on one feature for your application you may be able to create a faster
specialized tool.

- pandas is a dependency of `statsmodels
<https://www.statsmodels.org/stable/index.html>`__, making it an important part of the
statistical computing ecosystem in Python.

- pandas has been used extensively in production in financial applications.

Data Structures
---------------
96 changes: 6 additions & 90 deletions doc/source/index.rst.template
@@ -22,93 +22,15 @@ pandas: powerful Python data analysis toolkit

**Developer Mailing List:** https://groups.google.com/forum/#!forum/pydata

**pandas** is a `Python <https://www.python.org>`__ package providing fast,
flexible, and expressive data structures designed to make working with
"relational" or "labeled" data both easy and intuitive. It aims to be the
fundamental high-level building block for doing practical, **real world** data
analysis in Python. Additionally, it has the broader goal of becoming **the
most powerful and flexible open source data analysis / manipulation tool
available in any language**. It is already well on its way toward this goal.

pandas is well suited for many different kinds of data:

- Tabular data with heterogeneously-typed columns, as in an SQL table or
Excel spreadsheet
- Ordered and unordered (not necessarily fixed-frequency) time series data.
- Arbitrary matrix data (homogeneously typed or heterogeneous) with row and
column labels
- Any other form of observational / statistical data sets. The data actually
need not be labeled at all to be placed into a pandas data structure

The two primary data structures of pandas, :class:`Series` (1-dimensional)
and :class:`DataFrame` (2-dimensional), handle the vast majority of typical use
cases in finance, statistics, social science, and many areas of
engineering. For R users, :class:`DataFrame` provides everything that R's
``data.frame`` provides and much more. pandas is built on top of `NumPy
<https://www.numpy.org>`__ and is intended to integrate well within a scientific
computing environment with many other 3rd party libraries.

Here are just a few of the things that pandas does well:

- Easy handling of **missing data** (represented as NaN) in floating point as
well as non-floating point data
- Size mutability: columns can be **inserted and deleted** from DataFrame and
higher dimensional objects
- Automatic and explicit **data alignment**: objects can be explicitly
aligned to a set of labels, or the user can simply ignore the labels and
let `Series`, `DataFrame`, etc. automatically align the data for you in
computations
- Powerful, flexible **group by** functionality to perform
split-apply-combine operations on data sets, for both aggregating and
transforming data
- Make it **easy to convert** ragged, differently-indexed data in other
Python and NumPy data structures into DataFrame objects
- Intelligent label-based **slicing**, **fancy indexing**, and **subsetting**
of large data sets
- Intuitive **merging** and **joining** data sets
- Flexible **reshaping** and pivoting of data sets
- **Hierarchical** labeling of axes (possible to have multiple labels per
tick)
- Robust IO tools for loading data from **flat files** (CSV and delimited),
Excel files, databases, and saving / loading data from the ultrafast **HDF5
format**
- **Time series**-specific functionality: date range generation and frequency
conversion, moving window statistics, moving window linear regressions,
date shifting and lagging, etc.

Many of these principles are here to address the shortcomings frequently
experienced using other languages / scientific research environments. For data
scientists, working with data is typically divided into multiple stages:
munging and cleaning data, analyzing / modeling it, then organizing the results
of the analysis into a form suitable for plotting or tabular display. pandas
is the ideal tool for all of these tasks.

Some other notes

- pandas is **fast**. Many of the low-level algorithmic bits have been
extensively tweaked in `Cython <https://cython.org>`__ code. However, as with
anything else generalization usually sacrifices performance. So if you focus
on one feature for your application you may be able to create a faster
specialized tool.

- pandas is a dependency of `statsmodels
<https://www.statsmodels.org/stable/index.html>`__, making it an important part of the
statistical computing ecosystem in Python.

- pandas has been used extensively in production in financial applications.

.. note::

This documentation assumes general familiarity with NumPy. If you haven't
used NumPy much or at all, do invest some time in `learning about NumPy
<https://docs.scipy.org>`__ first.

See the package overview for more detail about what's in the library.
:mod:`pandas` is an open source, BSD-licensed library providing high-performance,
easy-to-use data structures and data analysis tools for the `Python <https://www.python.org/>`__
programming language.

See the :ref:`overview` for more detail about what's in the library.

{% if single_doc and single_doc.endswith('.rst') -%}
.. toctree::
:maxdepth: 4
:maxdepth: 2

{{ single_doc[:-4] }}
{% elif single_doc %}
@@ -118,21 +40,15 @@ See the package overview for more detail about what's in the library.
{{ single_doc }}
{% else -%}
.. toctree::
:maxdepth: 4
:maxdepth: 2
{% endif %}

{% if not single_doc -%}
What's New <whatsnew/v0.24.0>
install
getting_started/index
cookbook
user_guide/index
r_interface
ecosystem
comparison_with_r
comparison_with_sql
comparison_with_sas
comparison_with_stata
{% endif -%}
{% if include_api -%}
api/index
94 changes: 0 additions & 94 deletions doc/source/r_interface.rst

This file was deleted.

File renamed without changes.
1 change: 1 addition & 0 deletions doc/source/user_guide/index.rst
@@ -37,3 +37,4 @@ Further information on any specific method can be obtained in the
enhancingperf
sparse
gotchas
cookbook
2 changes: 1 addition & 1 deletion doc/source/user_guide/style.ipynb
@@ -1133,7 +1133,7 @@
"metadata": {},
"outputs": [],
"source": [
"with open(\"template_structure.html\") as f:\n",
"with open(\"templates/template_structure.html\") as f:\n",
" structure = f.read()\n",
" \n",
"HTML(structure)"