
> ⚠️ **Pre-built binaries for ArcticDB are only available for Linux and Windows**: MacOS binaries are coming soon!
It's been almost 4 years since the original version of D-Tale was released as a way for pandas users to navigate their dataframes. Many new features have been added and lots of bugs fixed. One aspect of D-Tale's construction that has consistently been a point of contention is its need to work from dataframes stored in memory. For small dataframes this is by far the best solution, but users have become accustomed to using larger (north of 10 million rows) and wider (sometimes as high as 300K columns) dataframes, and storing dataframes of these dimensions in memory proves unwieldy. Thankfully, a solution to this problem has come in the form of ArcticDB!

At long last! There is finally a solution for navigating the underlying data of your ArcticDB databases. Once again, the solution is D-Tale!

This will install D-Tale and ArcticDB. From there, you can set up a local database:

```python
import pandas as pd
import numpy as np
from datetime import datetime
from arcticdb import Arctic

uri = "lmdb:///tmp/dtale/arcticdb"
conn = Arctic(uri)
conn.create_library('lib1')
lib1 = conn['lib1']

lib1.write('symbol1', pd.DataFrame([
    {'col1': 1, 'col2': 1.1, 'col3': pd.Timestamp('20230101')}
]))

# a 25-row x 50-column dataframe of random integers with an hourly datetime index
cols = ['COL_%d' % i for i in range(50)]
df = pd.DataFrame(np.random.randint(0, 50, size=(25, 50)), columns=cols)
df.index = pd.date_range(datetime(2000, 1, 1, 5), periods=25, freq="H")

lib1.write('symbol2', df)
lib1.write('symbol3', pd.DataFrame([
    {'col1': 2, 'col2': 2.2, 'col3': pd.Timestamp('20230102')}
]))

conn.create_library('lib2')
lib2 = conn['lib2']
lib2.write('symbol3', pd.DataFrame([
    {'col1': 3, 'col2': 3.3, 'col3': pd.Timestamp('20230103')}
]))

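# a much larger 20,000-row x 1,000-column dataframe of normally-distributed floats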
N_COLS = 1000
N_ROWS = 20_000
cols = ['COL_%d' % i for i in range(N_COLS)]
big_df = pd.DataFrame(np.random.normal(loc=10.0, scale=2.5, size=(N_ROWS, N_COLS)), columns=cols)
big_df.index = pd.date_range(datetime(2000, 1, 1, 5), periods=N_ROWS, freq="H")

lib2.write('symbol4', big_df)

conn.create_library('lib3')
lib3 = conn['lib3']
lib3.write('symbol5', pd.DataFrame(
))
```
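Continuing the session above, a quick sanity check (a hypothetical follow-up, not part of the demo script) can confirm what was written before wiring up D-Tale:

```python
print(conn.list_libraries())            # e.g. ['lib1', 'lib2', 'lib3']
print(lib1.list_symbols())              # e.g. ['symbol1', 'symbol2', 'symbol3']
print(lib2.read('symbol4').data.shape)  # (20000, 1000)
```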

Here is a link to the [source code](https://github.com/man-group/dtale/tree/master/docs/arcticdb/arcticdb_demo.py), which can be run as a script or bundled within gunicorn (`gunicorn --workers=5 --preload arcticdb_demo:app -b 0.0.0.0:9207`).
You can then spin up the D-Tale backend:

```python
import dtale.global_state as global_state
from dtale.app import build_app

global_state.use_arcticdb_store(uri=uri)
app = build_app(reaper_on=False)

app.run(host="0.0.0.0", port=9207)
```

Once this is complete, you can connect D-Tale to this DB and start navigating the data:
```python
>>> import dtale
>>> uri = "lmdb:///tmp/dtale/arcticdb"
>>> dtale.show_arcticdb(uri=uri, use_store=True)
<URL to access D-Tale UI>
```

`use_store=True` forces the mechanism D-Tale uses for storing/reading/writing data to be ArcticDB. This seems insignificant, but it actually provides a huge enhancement to D-Tale's infrastructure: you can now read dataframes of any size without memory constraints. D-Tale will simply read the rows/columns of your dataframe that need to be displayed in your browser, rather than loading the entire dataframe into memory.
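To illustrate the access pattern this enables, here is a minimal sketch of the kind of partial read ArcticDB supports, assuming the demo database created above (the column names and row window are just examples):

```python
from arcticdb import Arctic

conn = Arctic("lmdb:///tmp/dtale/arcticdb")
lib2 = conn['lib2']

# fetch only two columns and a 50-row window of the 20,000-row 'symbol4',
# instead of materializing all 1,000 columns in memory
slice_df = lib2.read(
    'symbol4',
    columns=['COL_0', 'COL_1'],
    row_range=(0, 50),
).data
print(slice_df.shape)  # (50, 2)
```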

### Why back your data with ArcticDB?

And now you'll see only the columns you've had locked (we've locked no columns in this example):

![](https://raw.githubusercontent.com/aschonfeld/dtale-media/master/images/arcticdb/demo/arcticdb_symbol5_col100.png)

## Going Further with ArcticDB

See the ArcticDB [docs](https://docs.arcticdb.io/) and [website](https://arcticdb.io/) for more information. The LMDB backend for ArcticDB is recommended for local testing, but for real workloads we recommend using an S3 backend. You can simply point D-Tale at an S3-backed ArcticDB instance and browse the data being written by processes across your system.
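As a rough sketch, pointing D-Tale at S3 is just a matter of changing the URI. The endpoint, region, and bucket below are placeholders, following ArcticDB's S3 URI format:

```python
import dtale

# ArcticDB S3 URIs take the form s3://<endpoint>:<bucket>?<options>;
# 'my-bucket' and the region are hypothetical, and aws_auth=true tells
# ArcticDB to pick up credentials from the standard AWS config/environment
uri = "s3://s3.eu-west-2.amazonaws.com:my-bucket?aws_auth=true"
dtale.show_arcticdb(uri=uri, use_store=True)
```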

## Differences in Functionality from the Original D-Tale

Unfortunately, since we are leveraging ArcticDB for our data fetching, some functionality that is available in the standard (in-memory) version of D-Tale is absent:
* Custom Filtering - the ability to specify custom pandas queries
* Column Filters
* Numeric - range filters (`[]`, `()`) are no longer available
* Data Reshapers
* Dataframe Functions

Functionality that is still available, but only for dataframes with fewer than 1 million rows and no more than 100 columns:
* Unique Count
* Outliers & Outlier Highlighting
* Much of the initial detail in the "Describe" popup

If there is any functionality (which isn't controlled by ArcticDB itself) that you want added back in for ArcticDB, please submit an issue. Thanks :pray:
