Add files via upload
g-insana committed Oct 19, 2019
1 parent d030148 commit 9e06b31
Showing 12 changed files with 463 additions and 0 deletions.
20 changes: 20 additions & 0 deletions Makefile
@@ -0,0 +1,20 @@
# Minimal makefile for Sphinx documentation
#

# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS ?=
SPHINXBUILD ?= sphinx-build
SOURCEDIR = .
BUILDDIR = _build

# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
4 changes: 4 additions & 0 deletions author.rst
@@ -0,0 +1,4 @@
Author
======

* Dr Giuseppe Insana `(website) <http://insana.net>`_ `(contact) <http://insana.net/i/#contact>`_.
56 changes: 56 additions & 0 deletions conf.py
@@ -0,0 +1,56 @@
# Configuration file for the Sphinx documentation builder.
#
# This file only contains a selection of the most common options. For a full
# list see the documentation:
# https://www.sphinx-doc.org/en/master/usage/configuration.html

# -- Path setup --------------------------------------------------------------

# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
#
# import os
# import sys
# sys.path.insert(0, os.path.abspath('.'))


# -- Project information -----------------------------------------------------

project = 'Cloudy Mountain Plot'
copyright = '2019, Dr Giuseppe Insana'
author = 'Dr Giuseppe Insana'
master_doc = 'index'

# The full version, including alpha/beta/rc tags
release = '0.9.2'


# -- General configuration ---------------------------------------------------

# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = [
]

# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']

# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
# This pattern also affects html_static_path and html_extra_path.
exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store']


# -- Options for HTML output -------------------------------------------------

# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
#
html_theme = 'sphinx_rtd_theme'

# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
#html_static_path = ['_static']
36 changes: 36 additions & 0 deletions index.rst
@@ -0,0 +1,36 @@
====================
Cloudy Mountain Plot
====================

.. toctree::
   :maxdepth: 2
   :caption: Contents:

   introduction
   installation
   quickstart
   tutorial
   options
   plotly
   author
   references

Contribute
----------

- `Python Source Code on GitHub <https://github.com/g-insana/cmplot.py>`_
- `Julia Source Code on GitHub <https://github.com/g-insana/CMplot.jl>`_

Support
-------

- `Python Issue Tracker <https://github.com/g-insana/cmplot.py/issues>`_
- `Julia Issue Tracker <https://github.com/g-insana/CMplot.jl/issues>`_

Copyright
---------

Licensed under the `GNU Affero General Public License <https://choosealicense.com/licenses/agpl-3.0/>`_.

Copyright © `Giuseppe Insana <http://insana.net>`_, 2019-

30 changes: 30 additions & 0 deletions installation.rst
@@ -0,0 +1,30 @@
Download and installation
=========================

cmplot has no platform-specific dependencies and should thus work on all platforms.

Python instructions
-------------------

The latest version of cmplot can be installed by typing either:

.. code-block:: bash

   pip3 install --upgrade cmplot

(from the `Python Package Index <https://pypi.org/project/cmplot/>`_)

or:

.. code-block:: bash

   pip3 install git+https://github.com/g-insana/cmplot.py.git

(from `GitHub <https://github.com/g-insana/cmplot.py/>`_).
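
As a quick sanity check after either command, one can ask Python whether the package is importable. The sketch below is an illustration and not part of cmplot: the helper name ``is_installed`` is hypothetical, and it assumes the import name matches the package name ``cmplot``.

```python
from importlib.util import find_spec


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module with this name can be located."""
    return find_spec(module_name) is not None


if __name__ == "__main__":
    # Assumption: the package is imported under the name "cmplot".
    print("cmplot importable:", is_installed("cmplot"))
```

If this prints ``False``, the package was probably installed for a different Python interpreter than the one running the check.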

Julia instructions
------------------

.. code-block:: julia

   julia> ] dev https://github.com/g-insana/CMPlot.jl.git
60 changes: 60 additions & 0 deletions introduction.rst
@@ -0,0 +1,60 @@
Introduction
============

A Cloudy Mountain Plot is an informative RDI [#f1]_ `categorical distribution <https://en.wikipedia.org/wiki/Categorical_distribution>`_ plot inspired by Violin, Bean and Pirate Plots.

* Like `Violin plots <https://en.wikipedia.org/wiki/Violin_plot>`_ [Hintze_Nelson_1998]_, it shows smoothed kernel density curves, revealing information which would be hidden in boxplots, for example the presence of multiple "peaks" ("modes") in the distribution "mountain".

* Like `Bean plots <https://www.jstatsoft.org/article/view/v028c01>`_ [Kampstra_2008]_, it shows the raw data, drawn as a cloud of points. By default all data points are shown but you can optionally control this and limit the display to a subset of the data.

* Like `Pirate plots <https://github.com/ndphillips/yarrr>`_ [Phillips_2017]_, it marks confidence intervals (either from Student's T or as Bayesian Highest Density Intervals or as interquartile ranges) for the probable position of the true population mean.

Since by default it does not symmetrically mirror the density curves, it allows immediate side-by-side comparisons of distributions.

The present documentation introduces both what cloudy mountain plots are
and how to create them, using a plotting function which has been coded in both Julia
and Python, built on top of the freely available :doc:`plotly` graphic library.

Elements of the plot
--------------------

.. figure:: img/cloudy_mountain_plot_elements.png
   :alt: elements of a cloudy mountain plot

.. glossary::

   cloud
      Marker symbols show the number and location of the raw data points.
      They are shown jittered for clarity.
      It is possible to fully control both the appearance (:option:`opacity <pointsopacity>` and :option:`shapes <pointshapes>`) of
      the markers and their :option:`number <pointsmaxdisplayed>` (in case showing them all would prove too slow or
      inelegant). It is also possible :option:`not to show <showpoints>` any points.
      For clarity, by default the points are plotted on the opposite side of the kernel density curve. They can alternatively be plotted :option:`over the density curve <pointsoverdens>`, as in the above image.

   mountain
      `Kernel density estimation <https://en.wikipedia.org/wiki/Kernel_density_estimation>`_ curve.

   line
      Indicates the mean of the distribution.

   band
      Probable position of the true population mean, to the desired level of confidence.
      The method used can be :option:`specified <inf>` as either CI [#f2]_, HDI [#f3]_ or IQR [#f4]_.
      It is also possible not to show the band.

   boxplot
      A small `boxplot <https://en.wikipedia.org/wiki/Boxplot>`_. It can be
      :option:`shown or hidden <showboxplot>`, as desired.

   outliers
      The `outliers <https://en.wikipedia.org/wiki/Outlier>`_ are marked without
      jitter, on the baseline, and with less transparency. It is of course possible
      to choose :option:`whether to show <markoutliers>` the outliers.
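
To make the "cloud" and "mountain" elements concrete, here is a minimal pure-Python sketch of a Gaussian kernel density estimate and of jittering. It is not cmplot's implementation: the function names, the fixed bandwidth and the uniform jitter are all assumptions chosen for illustration.

```python
import math
import random


def gaussian_kde(data, bandwidth):
    """Return a function that estimates the density of `data` at a point x."""
    n = len(data)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Sum one Gaussian "bump" per data point, centred on that point.
        return norm * sum(
            math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data
        )

    return density


def jitter(n, width, seed=0):
    """Return n random offsets in [-width/2, width/2] to spread markers apart."""
    rng = random.Random(seed)
    return [rng.uniform(-width / 2, width / 2) for _ in range(n)]


if __name__ == "__main__":
    values = [1.0, 1.2, 1.1, 3.9, 4.0, 4.2]  # two clusters give two "peaks"
    density = gaussian_kde(values, bandwidth=0.3)
    for x in (1.1, 2.5, 4.0):
        print(f"density({x}) = {density(x):.3f}")
    print(jitter(len(values), width=0.2))
```

The density is higher near each cluster than in the gap between them, which is the bimodality a boxplot alone would hide.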

.. rubric:: Footnotes

.. [#f1] RDI: Raw data + Descriptive statistics + Inferential statistics
.. [#f2] CI: `Confidence Interval <https://en.wikipedia.org/wiki/Confidence_interval>`_, from Student's T distribution
.. [#f3] HDI: `Bayesian Highest Density Intervals <https://en.wikipedia.org/wiki/Credible_interval>`_
.. [#f4] IQR: `Interquartile range <https://en.wikipedia.org/wiki/IQR>`_
35 changes: 35 additions & 0 deletions make.bat
@@ -0,0 +1,35 @@
@ECHO OFF

pushd %~dp0

REM Command file for Sphinx documentation

if "%SPHINXBUILD%" == "" (
set SPHINXBUILD=sphinx-build
)
set SOURCEDIR=.
set BUILDDIR=_build

if "%1" == "" goto help

%SPHINXBUILD% >NUL 2>NUL
if errorlevel 9009 (
echo.
echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
echo.installed, then set the SPHINXBUILD environment variable to point
echo.to the full path of the 'sphinx-build' executable. Alternatively you
echo.may add the Sphinx directory to PATH.
echo.
echo.If you don't have Sphinx installed, grab it from
echo.http://sphinx-doc.org/
exit /b 1
)

%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
goto end

:help
%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%

:end
popd
144 changes: 144 additions & 0 deletions options.rst
@@ -0,0 +1,144 @@
Options
=======

The only mandatory arguments for cmplot are a dataframe containing the data and either a string or a list of strings naming the columns that hold the discrete independent variables, as shown in the :doc:`quickstart` section.

Several additional optional arguments can be specified to customize the result, both in terms of content and of form.
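
As an overview, the defaults documented on this page can be collected into a dictionary of keyword arguments. This is only a sketch: ``build_options`` is a hypothetical helper, not part of cmplot, and passing the result as keyword arguments to cmplot is an assumption based on the option names listed here.

```python
# Defaults as stated on this page; options without a documented default
# (ycol, xlabel, title, pointshapes) are left out.
DOCUMENTED_DEFAULTS = {
    "orientation": "h",
    "xsuperimposed": False,
    "side": "alt",
    "altsidesflip": False,
    "ycolorgroups": True,
    "pointsoverdens": False,
    "showpoints": True,
    "pointsopacity": 0.4,
    "inf": "hdi",
    "conf_level": 0.95,
    "hdi_iter": 10000,
    "showboxplot": True,
    "markoutliers": True,
    "pointsdistance": 0.6,
    "pointsmaxdisplayed": 0,
    "colorrange": None,
    "colorshift": 0,
}
CHOICES = {
    "orientation": {"h", "v"},
    "side": {"pos", "neg", "both", "alt"},
    "inf": {"hdi", "ci", "iqr", "none"},
}
UNIT_INTERVAL = {"pointsopacity", "conf_level", "pointsdistance"}


def build_options(**overrides):
    """Merge overrides into the documented defaults, rejecting bad values."""
    opts = dict(DOCUMENTED_DEFAULTS)
    for name, value in overrides.items():
        if name not in opts:
            raise ValueError(f"unknown option: {name}")
        if name in CHOICES and value not in CHOICES[name]:
            raise ValueError(f"{name} must be one of {sorted(CHOICES[name])}")
        if name in UNIT_INTERVAL and not 0 <= value <= 1:
            raise ValueError(f"{name} must lie between 0 and 1")
        opts[name] = value
    return opts


if __name__ == "__main__":
    opts = build_options(side="both", pointsopacity=0.8)
    print(opts["side"], opts["pointsopacity"], opts["inf"])
    # Hypothetical usage, not executed here:
    #     cmplot(df, xcol="Species", **opts)
```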

.. option:: xcol

   a string or an array of strings, column name(s) of the dataframe that you wish to plot as "x".

   This should be the categorical independent variable. If more than one column name is given, the combination of these will be used as "x". See the examples for interpretation.

   e.g. xcol="Species"

.. option:: ycol

   a string or an array of strings, column name(s) of the dataframe that you wish to plot as "y". Optional.

   These should be the continuous dependent variables. If ycol is not specified, then the function will plot all the columns of the dataframe except those specified in xcol.

   e.g. ycol=["Sepal.Length","Sepal.Width"] would plot the sepals' length and width as a function of the flower species.
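
The column-selection rules for xcol and ycol can be sketched in a few lines. The helper names below are hypothetical, and the ``+`` separator used to join several categorical values into one label is an assumption; cmplot may label combinations differently.

```python
def resolve_columns(all_columns, xcol, ycol=None):
    """Normalise xcol/ycol to lists; ycol defaults to every non-x column."""
    xcols = [xcol] if isinstance(xcol, str) else list(xcol)
    if ycol is None:
        ycols = [c for c in all_columns if c not in xcols]
    else:
        ycols = [ycol] if isinstance(ycol, str) else list(ycol)
    return xcols, ycols


def combined_category(row, xcols, sep="+"):
    """Join the values of several categorical columns into one "x" label."""
    return sep.join(str(row[c]) for c in xcols)


if __name__ == "__main__":
    columns = ["Species", "Sepal.Length", "Sepal.Width"]
    print(resolve_columns(columns, xcol="Species"))
    row = {"Married": "yes", "Gender": "F", "Wage": 31.5}
    print(combined_category(row, ["Married", "Gender"]))
```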

.. option:: orientation

   'h' | 'v', default is 'h'

   Orientation of the plot (horizontal or vertical).

.. option:: xsuperimposed

   boolean, default is False

   The default behaviour is to plot each value of the categorical variable (or each combination of values for multiple categorical variables) in a separate position. Set to True to superimpose the plots. This is useful in combination with "side='alt'" to create asymmetrical plots and compare combinations of categorical variables (e.g. Married + Gender ~ Wage).

.. option:: xlabel

   string or list of strings

   Override for labelling (and placing) the plots of the categorical variables. Only relevant when using xsuperimposed.

.. option:: title

   string

   If not specified, the plot title will be automatically created from the names of the variables plotted.

   e.g. title="Length of petals for the three species"

.. option:: side

   'pos' | 'neg' | 'both' | 'alt', default is 'alt'

   'pos' creates kernel density curves rising towards the positive end of the axis, 'neg' towards the negative end, and 'both' creates symmetric curves (like violin/bean/pirate plots). 'alt' alternates between 'pos' and 'neg' when multiple ycol are plotted.

   e.g. side='both'

.. option:: altsidesflip

   boolean, default is False

   Set to True to flip the order of alternation between sides for the kernel density curves. Only relevant when side='alt'.
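
The interplay of side and altsidesflip can be illustrated with a small helper that assigns a side to each curve. This is a sketch under assumptions: ``assign_sides`` is a hypothetical name, and starting the alternation with 'pos' is a guess, since this page does not state which side comes first.

```python
def assign_sides(n_curves, side="alt", altsidesflip=False):
    """Return the side ('pos', 'neg' or 'both') used for each of n curves."""
    if side not in {"pos", "neg", "both", "alt"}:
        raise ValueError(f"invalid side: {side}")
    if side != "alt":
        # A fixed side applies to every curve; altsidesflip is irrelevant.
        return [side] * n_curves
    # Assumption: alternation starts on 'pos' unless flipped.
    first, second = ("neg", "pos") if altsidesflip else ("pos", "neg")
    return [first if i % 2 == 0 else second for i in range(n_curves)]


if __name__ == "__main__":
    print(assign_sides(3))
    print(assign_sides(3, altsidesflip=True))
    print(assign_sides(2, side="both"))
```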

.. option:: ycolorgroups

   boolean, default is True

   Set to False to have the function assign a separate colour when plotting different values of the categorical variable. Leave as True if all should be coloured the same.

.. option:: pointsoverdens

   boolean, default is False

   Set to True to plot the raw data points over the kernel density curves. This is always the case when side='both'; otherwise, by default, points are plotted on the opposite side.

.. option:: showpoints

   boolean, default is True

   Set to False to avoid plotting the cloud of data points.

.. option:: pointsopacity

   float, range 0-1, default is 0.4

   The default is to plot the data points at 40% opacity. 1 would make points completely opaque and 0 completely transparent (in that case you'd be better served by setting showpoints to False).

.. option:: inf

   'hdi' | 'ci' | 'iqr' | 'none', default is 'hdi'

   Selects the method used to calculate the inference band around the mean: 'hdi' for Bayesian Highest Density Interval, 'ci' for Confidence Interval based on Student's T, 'iqr' for Interquartile Range. Use 'none' to avoid plotting the inference band.

.. option:: conf_level

   float, range 0-1, default is 0.95

   Confidence level to use when inf='ci', credible mass for inf='hdi'.
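
To give a feel for what the band represents, the sketch below computes two simple intervals with the Python standard library. It is not cmplot's code. The function names are hypothetical, and the confidence interval uses a normal approximation instead of the Student's T distribution named above, because the standard library has no T quantile function; for small samples the T-based interval would be somewhat wider.

```python
from statistics import NormalDist, mean, quantiles, stdev


def iqr_band(data):
    """Return (first quartile, third quartile) of the data."""
    q1, _median, q3 = quantiles(data, n=4)
    return q1, q3


def normal_ci(data, conf_level=0.95):
    """Normal-approximation confidence interval for the mean (not Student's T)."""
    m = mean(data)
    sem = stdev(data) / len(data) ** 0.5  # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + conf_level / 2)
    return m - z * sem, m + z * sem


if __name__ == "__main__":
    sample = [4.9, 5.1, 5.0, 5.4, 4.6, 5.0, 4.4, 5.2]
    print("IQR band:", iqr_band(sample))
    print("95% CI  :", normal_ci(sample, conf_level=0.95))
```

Raising ``conf_level`` widens the confidence interval, while the interquartile range does not depend on it.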

.. option:: hdi_iter

   integer, default is 10000

   Number of iterations to use when performing the Bayesian t-test when inf='hdi'.

.. option:: showboxplot

   boolean, default is True

   Set to False to avoid displaying the mini :term:`boxplot`.

.. option:: markoutliers

   boolean, default is True

   Set to False to avoid marking the :term:`outliers`.

.. option:: pointshapes

   array of strings

   You can manually specify which symbols to use for each distribution plotted. If not specified, a random symbol is chosen for each distribution.

.. option:: pointsdistance

   float, range 0-1, default is 0.6

   Distance at which data points will be plotted, measured from the base of the density curve. 0 is at the base, 1 is at the top.

.. option:: pointsmaxdisplayed

   integer, default is 0

   This option sets the maximum number of points to be drawn on the graph. The default value '0' corresponds to no limit (plot all points). This option can be useful when the dataset is very large and plotting every point would be slow or inelegant.
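
One plausible way to honour such a limit is to draw a random subsample, as sketched below. The helper ``limit_points`` is hypothetical, and uniform random sampling is an assumption; this page does not say how cmplot chooses which points to keep.

```python
import random


def limit_points(points, pointsmaxdisplayed=0, seed=None):
    """Return at most `pointsmaxdisplayed` points; 0 means no limit."""
    points = list(points)
    if pointsmaxdisplayed <= 0 or len(points) <= pointsmaxdisplayed:
        return points
    # Assumption: a uniform random subsample without replacement.
    return random.Random(seed).sample(points, pointsmaxdisplayed)


if __name__ == "__main__":
    data = list(range(1000))
    print(len(limit_points(data)))                          # no limit
    print(len(limit_points(data, pointsmaxdisplayed=50)))   # capped
```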

.. option:: colorrange

   integer, default is None

   By default, the distributions will be coloured independently, with the colours automatically chosen as needed for a single plot, maximising the difference in hue across the colour spectrum. You can override this by specifying the number of colours to accommodate. This is useful when joining different plots together. E.g. if the total number of colours to accommodate, after joining two plots, is 4, then set colorrange=4.

.. option:: colorshift

   integer, default is 0

   This option is used in combination with colorrange to skip a certain number of colours when they are assigned to the distributions to be plotted. This is useful when joining different plots together, to avoid having distributions plotted with the same colour.
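
The following sketch shows how a colour range and shift could keep two joined plots from reusing colours. It is an illustration only: ``distribution_colours`` is a hypothetical helper, and spacing hues evenly around the HSV colour wheel is an assumption about how "maximising the difference in hue" might be done.

```python
import colorsys


def distribution_colours(n, colorrange=None, colorshift=0):
    """Return n 'rgb(r, g, b)' strings with evenly spaced hues."""
    total = colorrange if colorrange is not None else n
    colours = []
    for i in range(n):
        # Skip `colorshift` slots, then take one hue slot per distribution.
        hue = ((i + colorshift) % total) / total
        r, g, b = colorsys.hsv_to_rgb(hue, 0.8, 0.9)
        colours.append(f"rgb({round(r * 255)}, {round(g * 255)}, {round(b * 255)})")
    return colours


if __name__ == "__main__":
    # Two plots with two distributions each, to be joined into one figure.
    first = distribution_colours(2, colorrange=4, colorshift=0)
    second = distribution_colours(2, colorrange=4, colorshift=2)
    print(first)
    print(second)
```

With ``colorrange=4`` both plots draw from the same four hue slots, and ``colorshift=2`` makes the second plot start where the first one stopped.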
