DOCS-#3988: Rename Developer page to Development in docs (#3989)
Signed-off-by: Igoshev, Yaroslav <yaroslav.igoshev@intel.com>
YarShev committed Jan 21, 2022
1 parent 6224aba commit 406af7c
Showing 19 changed files with 71 additions and 71 deletions.
2 changes: 1 addition & 1 deletion .github/PULL_REQUEST_TEMPLATE.md
@@ -14,4 +14,4 @@ if you have questions about contributing.
- [ ] signed commit with `git commit -s` <!-- you can amend your commit with a signature via `git commit --amend -s` -->
- [ ] Resolves #? <!-- issue must be created for each patch -->
- [ ] tests added and passing
- [ ] module layout described at `docs/developer/architecture.rst` is up-to-date <!-- if you have added, renamed or removed files or directories please update the documentation accordingly -->
- [ ] module layout described at `docs/development/architecture.rst` is up-to-date <!-- if you have added, renamed or removed files or directories please update the documentation accordingly -->
2 changes: 1 addition & 1 deletion README.md
@@ -208,7 +208,7 @@ and improve:

![Architecture](docs/img/modin_architecture.png)

Visit the [Documentation](https://modin.readthedocs.io/en/latest/developer/architecture.html) for
Visit the [Documentation](https://modin.readthedocs.io/en/latest/development/architecture.html) for
more information, and checkout [the difference between Modin and Dask!](https://github.com/modin-project/modin/tree/master/docs/modin_vs_dask.md)

**`modin.pandas` is currently under active development. Requests and contributions are welcome!**
@@ -206,29 +206,29 @@ Supported Execution Engines and Storage Formats

This is a list of execution engines and in-memory formats supported in Modin. If you
would like to contribute a new execution engine or in-memory format, please see the
documentation page on :doc:`contributing </developer/contributing>`.
documentation page on :doc:`contributing </development/contributing>`.

- :doc:`pandas on Ray </developer/using_pandas_on_ray>`
- :doc:`pandas on Ray </development/using_pandas_on_ray>`
- Uses the Ray_ execution framework.
- The storage format is `pandas` and the in-memory partition type is a pandas DataFrame.
- For more information on the execution path, see the :doc:`pandas on Ray </flow/modin/core/execution/ray/implementations/pandas_on_ray/index>` page.
- :doc:`pandas on Dask </developer/using_pandas_on_dask>`
- :doc:`pandas on Dask </development/using_pandas_on_dask>`
- Uses the `Dask Futures`_ execution framework.
- The storage format is `pandas` and the in-memory partition type is a pandas DataFrame.
- For more information on the execution path, see the :doc:`pandas on Dask </flow/modin/core/execution/dask/implementations/pandas_on_dask/index>` page.
- :doc:`pandas on Python </developer/using_pandas_on_python>`
- :doc:`pandas on Python </development/using_pandas_on_python>`
- Uses native python execution - mainly used for debugging.
- The storage format is `pandas` and the in-memory partition type is a pandas DataFrame.
- For more information on the execution path, see the :doc:`pandas on Python </flow/modin/core/execution/python/implementations/pandas_on_python/index>` page.
- :doc:`pandas on Ray` (experimental)
- Uses the Ray_ execution framework.
- The storage format is `pandas` and the in-memory partition type is a pandas DataFrame.
- For more information on the execution path, see the :doc:`experimental pandas on Ray </flow/modin/experimental/core/execution/ray/implementations/pandas_on_ray/index>` page.
- :doc:`OmniSci on Native </developer/using_omnisci>` (experimental)
- :doc:`OmniSci on Native </development/using_omnisci>` (experimental)
- Uses OmniSciDB as an engine.
- The storage format is `omnisci` and the in-memory partition type is a pyarrow Table. When defaulting to pandas, the pandas DataFrame is used.
- For more information on the execution path, see the :doc:`OmniSci on Native </flow/modin/experimental/core/execution/native/implementations/omnisci_on_native/index>` page.
- :doc:`Pyarrow on Ray </developer/using_pyarrow_on_ray>` (experimental)
- :doc:`Pyarrow on Ray </development/using_pyarrow_on_ray>` (experimental)
- Uses the Ray_ execution framework.
- The storage format is `pyarrow` and the in-memory partition type is a pyarrow Table.
- For more information on the execution path, see the :doc:`Pyarrow on Ray </flow/modin/experimental/core/execution/ray/implementations/pyarrow_on_ray>` page.
@@ -8,7 +8,7 @@ If you're interested in getting involved in the development of Modin, but aren't
where start, take a look at the issues tagged `Good first issue`_ or Documentation_.
These are issues that would be good for getting familiar with the codebase and better
understanding some of the more complex components of the architecture. There is
documentation here about the :doc:`architecture </developer/architecture>` that you will
documentation here about the :doc:`architecture </development/architecture>` that you will
want to review in order to get started.

Also, feel free to join the discussions on the `developer mailing list`_.
@@ -208,7 +208,7 @@ Contributing a new execution framework or in-memory format
----------------------------------------------------------

If you are interested in contributing support for a new execution framework or in-memory
format, please make sure you understand the :doc:`architecture </developer/architecture>` of Modin.
format, please make sure you understand the :doc:`architecture </development/architecture>` of Modin.

The best place to start the discussion for adding a new execution framework or in-memory
format is the `developer mailing list`_.
8 changes: 4 additions & 4 deletions docs/developer/index.rst → docs/development/index.rst
@@ -1,9 +1,9 @@
Developer
=========
Development
===========

.. toctree::
:maxdepth: 4

contributing
architecture
partition_api
@@ -16,4 +16,4 @@ Developer

.. meta::
:description lang=en:
Developer-specific documentation.
Development-specific documentation.
File renamed without changes.
@@ -23,6 +23,6 @@ and you might not specify it explicitly.

.. note::
If you encounter ``LLVM ERROR: inconsistency in registered CommandLine options`` error when using OmniSci,
please refer to the respective section in :doc:`Troubleshooting </developer/troubleshooting>` page to avoid the issue.
please refer to the respective section in :doc:`Troubleshooting </development/troubleshooting>` page to avoid the issue.

.. _OmnisciDB: https://www.omnisci.com/platform/omniscidb
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
@@ -12,7 +12,7 @@ Our plans with the SQL API for Modin are to create an interface that allows you
intermix SQL and pandas operations without copying the entire dataset into a new
structure between the two. This is possible due to the architecture of Modin. Currently,
Modin has a query compiler that acts as an intermediate layer between the query language
(e.g. SQL, pandas) and the execution (See :doc:`architecture </developer/architecture>` documentation for details).
(e.g. SQL, pandas) and the execution (See :doc:`architecture </development/architecture>` documentation for details).

*We have implemented a simple example that can be found below. Feedback welcome!*

@@ -6,7 +6,7 @@ PandasOnDask Execution
Queries that perform data transformation, data ingress or data egress using the `pandas on Dask` execution
pass through the Modin components detailed below.

To enable `pandas on Dask` execution, please refer to the usage section in :doc:`pandas on Dask </developer/using_pandas_on_dask>`.
To enable `pandas on Dask` execution, please refer to the usage section in :doc:`pandas on Dask </development/using_pandas_on_dask>`.

Data Transformation
'''''''''''''''''''
@@ -7,7 +7,7 @@ Queries that perform data transformation, data ingress or data egress using the
pass through the Modin components detailed below.

`pandas on Python` execution is sequential and it's used for the debug purposes. To enable `pandas on Python` execution,
please refer to the usage section in :doc:`pandas on Python </developer/using_pandas_on_python>`.
please refer to the usage section in :doc:`pandas on Python </development/using_pandas_on_python>`.

Data Transformation
'''''''''''''''''''
@@ -6,7 +6,7 @@ PandasOnRay Execution
Queries that perform data transformation, data ingress or data egress using the `pandas on Ray` execution
pass through the Modin components detailed below.

To enable `pandas on Ray` execution, please refer to the usage section in :doc:`pandas on Ray </developer/using_pandas_on_ray>`.
To enable `pandas on Ray` execution, please refer to the usage section in :doc:`pandas on Ray </development/using_pandas_on_ray>`.

Data Transformation
'''''''''''''''''''
86 changes: 43 additions & 43 deletions docs/getting_started/faq.rst
@@ -8,18 +8,18 @@ question on the #support channel on our Slack_ community or open a Github issue_
What’s wrong with pandas and why should I use Modin?
""""""""""""""""""""""""""""""""""""""""""""""""""""

While pandas works extremely well on small datasets, as soon as you start working with
medium to large datasets that are more than a few GBs, pandas can become painfully
slow or run out of memory. This is because pandas is single-threaded. In other words,
you can only process your data with one core at a time. This approach does not scale to
larger data sets and adding more hardware does not lead to more performance gain.

The :py:class:`~modin.pandas.dataframe.DataFrame` is a highly
scalable, parallel DataFrame. Modin transparently distributes the data and computation so
that you can continue using the same pandas API while being able to work with more data faster.
Modin lets you use all the CPU cores on your machine, and because it is lightweight, it
often has less memory overhead than pandas. See this :doc:`page </getting_started/pandas>` to
learn more about how Modin is different from pandas.
While pandas works extremely well on small datasets, as soon as you start working with
medium to large datasets that are more than a few GBs, pandas can become painfully
slow or run out of memory. This is because pandas is single-threaded. In other words,
you can only process your data with one core at a time. This approach does not scale to
larger data sets and adding more hardware does not lead to more performance gain.

The :py:class:`~modin.pandas.dataframe.DataFrame` is a highly
scalable, parallel DataFrame. Modin transparently distributes the data and computation so
that you can continue using the same pandas API while being able to work with more data faster.
Modin lets you use all the CPU cores on your machine, and because it is lightweight, it
often has less memory overhead than pandas. See this :doc:`page </getting_started/pandas>` to
learn more about how Modin is different from pandas.

Why not just improve pandas?
""""""""""""""""""""""""""""
@@ -30,49 +30,49 @@ implementation. While we would be happy to donate parts of Modin that
make sense in pandas, many of these components would require significant (or
total) redesign of the pandas architecture. Modin's architecture goes beyond
pandas, which is why the pandas API is just a thin layer at the user level. To learn
more about Modin's architecture, see the :doc:`architecture </developer/architecture>` documentation.
more about Modin's architecture, see the :doc:`architecture </development/architecture>` documentation.

How much faster can I go with Modin compared to pandas?
"""""""""""""""""""""""""""""""""""""""""""""""""""""""

Modin is designed to scale with the amount of hardware available.
Even in a traditionally serial task like ``read_csv``, we see large gains by efficiently
distributing the work across your entire machine. Because it is so light-weight,
Even in a traditionally serial task like ``read_csv``, we see large gains by efficiently
distributing the work across your entire machine. Because it is so light-weight,
Modin provides speed-ups of up to 4x on a laptop with 4 physical cores. This speedup scales
efficiently to larger machines with more cores. We have several published papers_ that
include performance results and comparisons against pandas.

How much more data would I be able to process with Modin?
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""

Often data scientists have to use different tools for operating on datasets of different sizes.
This is not only because processing large dataframes is slow, but also pandas does not support working
with dataframes that don't fit into the available memory. As a result, pandas workflows that work well
for prototyping on a few MBs of data do not scale to tens or hundreds of GBs (depending on the size
of your machine). Modin supports operating on data that does not fit in memory, so that you can comfortably
work with hundreds of GBs without worrying about substantial slowdown or memory errors. For more information,
Often data scientists have to use different tools for operating on datasets of different sizes.
This is not only because processing large dataframes is slow, but also pandas does not support working
with dataframes that don't fit into the available memory. As a result, pandas workflows that work well
for prototyping on a few MBs of data do not scale to tens or hundreds of GBs (depending on the size
of your machine). Modin supports operating on data that does not fit in memory, so that you can comfortably
work with hundreds of GBs without worrying about substantial slowdown or memory errors. For more information,
see :doc:`out-of-memory support </getting_started/out_of_core>` for Modin.

How does Modin work under the hood?
"""""""""""""""""""""""""""""""""""

Modin is logically separated into different layers that represent the hierarchy of a
typical Database Management System. User queries which perform data transformation,
data ingress or data egress pass through the Modin Query Compiler which translates
queries from the top-level pandas API Layer that users interact with to the Modin Core
Dataframe layer.
The Modin Core DataFrame is our efficient DataFrame implementation that utilizes a partitioning schema
Modin is logically separated into different layers that represent the hierarchy of a
typical Database Management System. User queries which perform data transformation,
data ingress or data egress pass through the Modin Query Compiler which translates
queries from the top-level pandas API Layer that users interact with to the Modin Core
Dataframe layer.
The Modin Core DataFrame is our efficient DataFrame implementation that utilizes a partitioning schema
which allows for distributing tasks and queries. From here, the Modin DataFrame works with engines like
Ray or Dask to execute computation, and then return the results to the user.

For more details, take a look at our system :doc:`architecture </developer/architecture>`.
For more details, take a look at our system :doc:`architecture </development/architecture>`.

If I’m only using my laptop, can I still get the benefits of Modin?
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""

Absolutely! Unlike other parallel DataFrame systems, Modin is an extremely
light-weight, robust DataFrame. Because it is so light-weight, Modin provides
speed-ups of up to 4x on a laptop with 4 physical cores
Absolutely! Unlike other parallel DataFrame systems, Modin is an extremely
light-weight, robust DataFrame. Because it is so light-weight, Modin provides
speed-ups of up to 4x on a laptop with 4 physical cores
and allows you to work on data that doesn't fit in your laptop's RAM.

How do I use Jupyter or Colab notebooks with Modin?
@@ -90,13 +90,13 @@ import with Modin import:
Which execution engine (Ray or Dask) should I use for Modin?
""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""

Whichever one you want! Modin supports Ray_ and Dask_ execution engines to provide an effortless way
to speed up your pandas workflows. The best thing is that you don't need to know
anything about Ray and Dask in order to use Modin and Modin will automatically
detect which engine you have
installed and use that for scheduling computation. If you don't have a preference, we recommend
starting with Modin's default Ray engine. If you want to use a specific
compute engine, you can set the environment variable ``MODIN_ENGINE`` and
Whichever one you want! Modin supports Ray_ and Dask_ execution engines to provide an effortless way
to speed up your pandas workflows. The best thing is that you don't need to know
anything about Ray and Dask in order to use Modin and Modin will automatically
detect which engine you have
installed and use that for scheduling computation. If you don't have a preference, we recommend
starting with Modin's default Ray engine. If you want to use a specific
compute engine, you can set the environment variable ``MODIN_ENGINE`` and
Modin will do computation with that engine:

.. code-block:: bash
@@ -107,17 +107,17 @@ Modin will do computation with that engine:
pip install "modin[dask]" # Install Modin dependencies and Dask to run on Dask
export MODIN_ENGINE=dask # Modin will use Dask
We also have an experimental OmniSciDB-based engine of Modin you can read about :doc:`here </developer/using_omnisci>`.
We plan to support more execution engines in future. If you have a specific request,
please post on the #feature-requests channel on our Slack_ community.
We also have an experimental OmniSciDB-based engine of Modin you can read about :doc:`here </development/using_omnisci>`.
We plan to support more execution engines in future. If you have a specific request,
please post on the #feature-requests channel on our Slack_ community.

How can I contribute to Modin?
""""""""""""""""""""""""""""""

**Modin is currently under active development. Requests and contributions are welcome!**

If you are interested in contributing please check out the :doc:`Getting Started</getting_started/index>`
guide then refer to the :doc:`Developer Documentation</developer/index>` section,
guide then refer to the :doc:`Development Documentation</development/index>` section,
where you can find system architecture, internal implementation details, and other useful information.
Also check out the `Github`_ to view open issues and make contributions.

@@ -128,4 +128,4 @@ Also check out the `Github`_ to view open issues and make contributions.
.. _Dask: https://dask.org/
.. _papers: https://arxiv.org/abs/2001.00888
.. _guide: https://modin.readthedocs.io/en/stable/installation.html?#installing-on-google-colab
.. _tutorial: https://github.com/modin-project/modin/tree/master/examples/tutorial
.. _tutorial: https://github.com/modin-project/modin/tree/master/examples/tutorial