# Introduction

In [None]:
# Generate notebook download link
from IPython.display import FileLink
print('To download this notebook, right click on the link and Save link as...')
FileLink('tutorial_intro.ipynb')

To download this notebook, right click on the link and Save link as...


## What Is Riptable?

Riptable is an open-source library built for high-performance data analysis. It's similar to Pandas by design, but it's been optimized to meet the needs of Riptable's core users: quantitative traders analyzing large volumes of trading data interactively, in real time.

Riptable is based on NumPy, so it shares many core NumPy methods for array-based operations. For users who work with large datasets, Riptable improves on NumPy and Pandas by using multi-threading and efficient memory management, much of it implemented at the C++ level.

Riptable has implemented its own Pandas-like functions for grouping and aggregation. See Riptable for Pandas Users to get an overview of key differences.

Riptable's APIs are designed to be more feature-rich and easier to work with than those provided by Pandas and other existing libraries. An additional package, Playa, builds upon Riptable to offer supplemental data manipulation and analysis capabilities, in some cases providing APIs that more closely resemble those of the standard packages.

NumPy and Pandas users will find it easy to convert their data to Riptable (and back again if need be). You can also convert data from CSV, SQL, and h5 files to Riptable's format. MATLAB users, who will generally find similar syntax and functionality in Riptable, can use special keyword arguments to convert MATLAB data to Riptable's format. See [IO Tools and Working with Other File Types](tutorial_io.ipynb) for details.

For data visualization, any of the standard plotting tools (for example, matplotlib.pyplot) will work out of the box. To see a few basic examples, check out the [Visualize Data](tutorial_visualize.ipynb) section.

## A Note to Pandas Users

If you've used Pandas, you'll notice many similarities in Riptable, but be aware that some of the differences aren't immediately obvious. This tutorial doesn't call out those differences specifically; they're covered separately in [Comparison to Pandas](https://confluence/display/SOT/Comparison+to+Pandas) and [Riptable "Gotchas" for Pandas Users](https://confluence/pages/viewpage.action?pageId=148702274).

## Who This Tutorial Is For

If you're new to Riptable, this tutorial is for you. It's intended to help you get familiar with Riptable's basic functionality and syntax.

Some experience with Python will be helpful, especially familiarity with dictionary syntax, sequences (lists, tuples, etc.), and basic functions and arguments. 

## Install and Import Riptable

To get started with Riptable and JupyterLab on Linux or Windows, see these [setup instructions](https://confluence/display/SQI/Setup).

To access Riptable and its functions, add these lines to your Python code:

In [2]:
import riptable as rt
import numpy as np

## Display Options

You can modify Riptable's default display options using the attributes offered in [rt.Display.options](https://github.com/rtosholdings/riptable/blob/master/riptable/Utils/display_options.py). Here are a few you might find useful.


### General Display Options

Some general options you can set for a session:

In [7]:
# Display all Dataset columns -- the default max is 9.
rt.Display.options.COL_ALL = True

# Display numbers up to 100 million (100MM) in full before switching to scientific notation.
rt.Display.options.E_MAX = 100_000_000

# Truncate small decimals, rather than showing infinitesimal scientific notation.
rt.Display.options.P_THRESHOLD = 0

# Put commas in numbers.
rt.Display.options.NUMBER_SEPARATOR = True

# Turn on Riptable autocomplete (start typing, then press Tab to see options).
rt.autocomplete()
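
If you want a setting to apply only briefly rather than for the whole session, one approach is to wrap the change in a context manager that restores the previous value afterward. The sketch below is not part of Riptable: `temporary_options` is a hypothetical helper, and it assumes the options are ordinary attributes that can be read with `getattr` and written with `setattr`, as the assignments above suggest. A `SimpleNamespace` stands in for `rt.Display.options` so the example runs without Riptable installed.

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def temporary_options(options, **overrides):
    """Set attributes on `options`, then restore the old values on exit.

    Hypothetical helper for illustration; not part of Riptable.
    """
    # Remember the current values so they can be put back later.
    saved = {name: getattr(options, name) for name in overrides}
    try:
        for name, value in overrides.items():
            setattr(options, name, value)
        yield options
    finally:
        # Runs even if the body of the `with` block raises.
        for name, value in saved.items():
            setattr(options, name, value)


# Stand-in for rt.Display.options; the values here are placeholders.
demo_options = SimpleNamespace(COL_ALL=False, NUMBER_SEPARATOR=False)

with temporary_options(demo_options, COL_ALL=True):
    inside = demo_options.COL_ALL  # the override is active here

after = demo_options.COL_ALL  # the original value has been restored
```

With Riptable imported, the equivalent call would presumably be `with temporary_options(rt.Display.options, COL_ALL=True): ...`, though that usage has not been checked against the library.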

### Contextual Help

The `rt.autocomplete()` option listed above can be used as an alternative to Python's built-in `dir()` function, which shows various attributes and methods associated with an object.

For example, to see the attributes and methods of Riptable's Date object, you can use `dir()`:

In [5]:
# Limit and format the output.
dir_date = dir(rt.Date)
print("Some of the attributes and methods include...\n")
print(", ".join(list(dir_date)[::10]))

Some of the attributes and methods include...

CompressPickle, T, _LDUMP, _TON, __array_function__, __class__, __doc__, __hash__, __init_subclass__, __le__, __new__, __rfloordiv__, __rsub__, __truediv__, _check_mathops, _fa_keyword_wrapper, _max, _nanstd, _reduce_op_identity_value, _yearday_splits_leap, argpartition, clip_upper, cummin, differs, ema_decay, format_date_num, is_leapyear, isnormal, map_old, move_mean, nanmean, nonzero, push, reshape, round, sign, strides, tolist, year


Note: The resulting list may not be complete. For details, see Python's documentation for `dir()` in the section on built-in functions.
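
Because `dir()` returns a plain list of strings, its output is easy to filter. The sketch below separates public names from those that start with an underscore, and searches by prefix. `public_names` and `names_starting_with` are hypothetical helpers, not Riptable functions. Python's built-in `str` is used as the target so the example runs without Riptable; passing `rt.Date` instead should work the same way.

```python
def public_names(obj):
    """Return the names from dir(obj) that don't start with an underscore."""
    return [name for name in dir(obj) if not name.startswith("_")]


def names_starting_with(obj, prefix):
    """Return the names from dir(obj) that begin with `prefix`."""
    return [name for name in dir(obj) if name.startswith(prefix)]


public = public_names(str)
split_like = names_starting_with(str, "split")

# Show a handful of public names, then everything beginning with "split".
print(", ".join(public[:5]))
print(split_like)
```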

Alternatively, you can use Riptable's autocomplete interface. With `rt.autocomplete()` turned on, type `rt.Date.<TAB>` where `<TAB>` is the Tab key. You'll see a pop-up list of attributes and methods. Keep typing to narrow down the list.

<img src="rt_autocomplete.png" width="224" height="220">

Note that private/internal attributes and methods (those whose names are preceded by an underscore) are omitted by default, but you can access them by typing the underscore. For example: `rt.Date._fa<TAB>`.

<img src="rt_autocomplete_internal.png" width="286" height="116">

You can access the docstring of any (documented) function or object with the following syntax:

* IPython prompt: `my_func?`
* Python prompt: `help(my_obj)`
    
For example:

In [9]:
rt.sum?

Signature: rt.sum(*args, filter=None, dtype=None, **kwargs)
Docstring:
Computes the sum of the first argument. For example
>>> a = rt.FastArray( [1,2,3])
>>> rt.sum(a)
6

If possible, rt.sum(x, *args) calls x.sum(*args). If possible, look there for
documentation. In particular, rt.sum(x) may NOT accept a filter argument, depending
on the type of x. If x is a FastArray, then mean accepts the following keywords.

filter : array of bool, optional
Specifies which elements to include in the mean. For example,
>>> a = rt.FastArray( [1,3,5,7])
>>> b = rt.FastArray( [False, True, False, True,True] )
>>> rt.sum(a, filter = b)
10
If the filter is uniformly False, this will return 0.


dtype : optional
What datatype should the result be returned as. For a FastArray x,
x.su

You can access the source code with `??`:

In [10]:
rt.sum??

Signature: rt.sum(*args, filter=None, dtype=None, **kwargs)
Source:   
def sum(*args,filter = None, dtype = None,**kwargs):
    '''
    Computes the sum of the first argument. For example
    >>> a = rt.FastArray( [1,2,3])
    >>> rt.sum(a)
    6

    If possible, rt.sum(x, *args) calls x.sum(*args). If possible, look there for
    documentation. In particular, rt.sum(x) may NOT accept a filter argument, depending
    on the type of x. If x is a FastArray, then mean accepts the following keywords.

    filter : array of bool, optional
    Specifies which elements t
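
Outside IPython, where `?` and `??` aren't available, Python's standard `inspect` module offers rough equivalents. The sketch below uses `textwrap.dedent` from the standard library as the target so it runs without Riptable. It assumes the target is written in Python with its source file available on disk; `inspect.getsource` raises an error for objects implemented in C. Passing a Riptable function such as `rt.sum` should work similarly, though that has not been verified here.

```python
import inspect
import textwrap

target = textwrap.dedent  # stand-in for a function such as rt.sum

# Roughly what `target?` shows: the call signature and the docstring.
signature = inspect.signature(target)
doc = inspect.getdoc(target)

# Roughly what `target??` shows: the source code of the function.
source = inspect.getsource(target)

print(f"Signature: {target.__name__}{signature}")
print(doc.splitlines()[0])       # first line of the docstring
print(source.splitlines()[0])    # first line of the source
```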

### Dataset Display Options

When you view a Dataset, some data might be elided or truncated. By default:

- Up to 9 columns are shown. If the Dataset has more than 9 columns, the middle columns are elided (with a "..." column displayed).
- Up to 30 rows are shown. If the Dataset has more than 30 rows, the middle rows are elided (with a "..." row displayed).
- Strings are displayed up to 15 characters, with additional characters truncated.

The following internal/private methods override the defaults on a per-display basis:

- Show all columns and rows (up to 10,000 rows), as well as long strings: `ds._A`
- Show all columns and long strings: `ds._H`
- Show all columns with wrapping, and long strings: `ds._G`
- Show all rows (up to 10,000): `ds._V`
- Transpose columns and rows: `ds._T`

Now that we're all set up, we're ready to look at Riptable's foundational data structures: [Intro to Riptable Datasets, FastArrays, and Structs](tutorial_datasets.ipynb).

<br>
<br>

---

Questions or comments about this guide? Email RiptableDocumentation@sig.com.