Commit

correlations "numpy.corrcoef" performance boost and remapping of github location
Andrew Schonfeld authored and aschonfeld committed Nov 7, 2019
1 parent c01b061 commit 934212f
Showing 9 changed files with 72 additions and 51 deletions.
5 changes: 3 additions & 2 deletions .circleci/config.yml
@@ -100,15 +100,16 @@ defaults: &defaults
path: /tmp/circleci-test-results



version: 2
jobs:
build_2_7:
working_directory: ~/manahl/dtale_2_7
working_directory: ~/man-group/dtale_2_7
docker:
- image: circleci/python:2.7-stretch-node-browsers
<<: *defaults
build_3:
working_directory: ~/manahl/dtale_3
working_directory: ~/man-group/dtale_3
docker:
- image: circleci/python:3.6-stretch-node-browsers
<<: *defaults
10 changes: 7 additions & 3 deletions CHANGES.md
@@ -12,12 +12,12 @@ Changelog

### 1.1.1 (2019-10-23)

* [#13](https://github.com/manahl/dtale/issues/13): fix for auto-detection of column widths for strings and floats
* [#13](https://github.com/man-group/dtale/issues/13): fix for auto-detection of column widths for strings and floats

### 1.2.0 (2019-10-24)

* [#20](https://github.com/manahl/dtale/issues/13): fix for data being overriden with each new instance
* [#21](https://github.com/manahl/dtale/issues/13): fix for displaying timestamps if they exist
* [#20](https://github.com/man-group/dtale/issues/13): fix for data being overriden with each new instance
* [#21](https://github.com/man-group/dtale/issues/13): fix for displaying timestamps if they exist
* calling `show()` now returns an object which can alter the state of a process
* accessing/altering state through the `data` property
* shutting down a process using the `kill()` function
@@ -42,3 +42,7 @@ Changelog
### 1.3.3 (2019-11-05)

* hotfix for failing test under certain versions of `future` package

### 1.3.4 (2019-11-07)

* updated correlation calculation to use `numpy.corrcoef` for performance purposes
46 changes: 23 additions & 23 deletions README.md
@@ -1,18 +1,18 @@
[![](https://raw.githubusercontent.com/manahl/dtale/master/docs/images/Title.png)](https://github.com/manahl/dtale)
[![](https://raw.githubusercontent.com/man-group/dtale/master/docs/images/Title.png)](https://github.com/man-group/dtale)

[Live Demo](http://andrewschonfeld.pythonanywhere.com/dtale/main)

-----------------

[![CircleCI](https://circleci.com/gh/manahl/dtale.svg?style=shield&circle-token=4b67588a87157cc03b484fb96be438f70b5cd151)](https://circleci.com/gh/manahl/dtale)
[![CircleCI](https://circleci.com/gh/man-group/dtale.svg?style=shield&circle-token=4b67588a87157cc03b484fb96be438f70b5cd151)](https://circleci.com/gh/man-group/dtale)
[![PyPI](https://img.shields.io/pypi/pyversions/dtale.svg)](https://pypi.python.org/pypi/dtale/)
[![ReadTheDocs](https://readthedocs.org/projects/dtale/badge)](https://dtale.readthedocs.io)
[![codecov](https://codecov.io/gh/manahl/dtale/branch/master/graph/badge.svg)](https://codecov.io/gh/manahl/dtale)
[![Downloads](https://pepy.tech/badge/dtale)](https://pepy.tech/project/dtale)

## Getting Started

![](https://raw.githubusercontent.com/manahl/dtale/master/docs/images/blog/dtale_demo_mini.gif)
![](https://raw.githubusercontent.com/man-group/dtale/master/docs/images/blog/dtale_demo_mini.gif)

Setup/Activate your environment and install the egg

@@ -123,7 +123,7 @@ DTALE_CLI_LOADERS=./path_to_loaders bash -c 'dtale --testdata-rows 10 --testdata

### Python Terminal
This comes courtesy of PyCharm
![](https://raw.githubusercontent.com/manahl/dtale/master/docs/images/Python_Terminal.png)
![](https://raw.githubusercontent.com/man-group/dtale/master/docs/images/Python_Terminal.png)
Feel free to invoke `python` or `ipython` directly and use the commands in the screenshot above and it should work
#####Additional functions available programatically
```python
@@ -157,39 +157,39 @@ d._url # the url to access the process

## UI
Once you have kicked off your D-Tale session please copy & paste the link on the last line of output in your browser
![](https://raw.githubusercontent.com/manahl/dtale/master/docs/images/Browser1.png)
![](https://raw.githubusercontent.com/man-group/dtale/master/docs/images/Browser1.png)

The information in the upper right-hand corner is similar to saslook ![](https://raw.githubusercontent.com/manahl/dtale/master/docs/images/Info_cell.png)
The information in the upper right-hand corner is similar to saslook ![](https://raw.githubusercontent.com/man-group/dtale/master/docs/images/Info_cell.png)
- lower-left => row count
- upper-right => column count
- clicking the triangle displays the menu of standard functions (click outside menu to close it)
![](https://raw.githubusercontent.com/manahl/dtale/master/docs/images/Info_menu_small.png)
![](https://raw.githubusercontent.com/man-group/dtale/master/docs/images/Info_menu_small.png)

Selecting/Deselecting Columns
- to select a column, simply click on the column header (to deselect, click the column header again)
- You'll notice that the columns you've selected will display in the top of your browser
![](https://raw.githubusercontent.com/manahl/dtale/master/docs/images/Col_select.png)
![](https://raw.githubusercontent.com/man-group/dtale/master/docs/images/Col_select.png)

### Menu functions w/ no columns selected

![](https://raw.githubusercontent.com/manahl/dtale/master/docs/images/Info_menu.png)
![](https://raw.githubusercontent.com/man-group/dtale/master/docs/images/Info_menu.png)

- **Describe**: view all the columns & their data types as well as individual details of each column

![](https://raw.githubusercontent.com/manahl/dtale/master/docs/images/Describe.png)
![](https://raw.githubusercontent.com/man-group/dtale/master/docs/images/Describe.png)

|Data Type|Display|Notes|
|--------|:------:|:------:|
|date|![](https://raw.githubusercontent.com/manahl/dtale/master/docs/images/Describe_date.png)||
|string|![](https://raw.githubusercontent.com/manahl/dtale/master/docs/images/Describe_string.png)|If you have less than or equal to 100 unique values they will be displayed at the bottom of your popup|
|int|![](https://raw.githubusercontent.com/manahl/dtale/master/docs/images/Describe_int.png)|Anything with standard numeric classifications (min, max, 25%, 50%, 75%) will have a nice boxplot with the mean (if it exists) displayed as an outlier if you look closely.|
|float|![](https://raw.githubusercontent.com/manahl/dtale/master/docs/images/Describe_float.png)||
|date|![](https://raw.githubusercontent.com/man-group/dtale/master/docs/images/Describe_date.png)||
|string|![](https://raw.githubusercontent.com/man-group/dtale/master/docs/images/Describe_string.png)|If you have less than or equal to 100 unique values they will be displayed at the bottom of your popup|
|int|![](https://raw.githubusercontent.com/man-group/dtale/master/docs/images/Describe_int.png)|Anything with standard numeric classifications (min, max, 25%, 50%, 75%) will have a nice boxplot with the mean (if it exists) displayed as an outlier if you look closely.|
|float|![](https://raw.githubusercontent.com/man-group/dtale/master/docs/images/Describe_float.png)||

- **Filter**: apply a simple pandas `query` to your data (link to pandas documentation included in popup)

|Editing|Result|
|--------|:------:|
|![](https://raw.githubusercontent.com/manahl/dtale/master/docs/images/Filter_apply.png)|![](https://raw.githubusercontent.com/manahl/dtale/master/docs/images/Post_filter.png)|
|![](https://raw.githubusercontent.com/man-group/dtale/master/docs/images/Filter_apply.png)|![](https://raw.githubusercontent.com/man-group/dtale/master/docs/images/Post_filter.png)|

- **Coverage**: check for coverage gaps on column(s) by way of other column(s) as group(s)
- Select column(s) in "Group(s)" & "Col(s)"
@@ -200,7 +200,7 @@ Selecting/Deselecting Columns

|Daily|Daily Regional|
|-----|:-------------:|
|![](https://raw.githubusercontent.com/manahl/dtale/master/docs/images/Coverage_daily.png)|![](https://raw.githubusercontent.com/manahl/dtale/master/docs/images/Coverage_daily_regions.png)|
|![](https://raw.githubusercontent.com/man-group/dtale/master/docs/images/Coverage_daily.png)|![](https://raw.githubusercontent.com/man-group/dtale/master/docs/images/Coverage_daily_regions.png)|

- **Correlations**: shows a pearson correlation matrix of all numeric columns against all other numeric columns
- By deafult, it will show a grid of pearson correlations
@@ -210,13 +210,13 @@ Selecting/Deselecting Columns

|Matrix|Timeseries|Scatter|
|------|----------|-------|
|![](https://raw.githubusercontent.com/manahl/dtale/master/docs/images/Correlations.png)|![](https://raw.githubusercontent.com/manahl/dtale/master/docs/images/Correlations_ts.png)|![](https://raw.githubusercontent.com/manahl/dtale/master/docs/images/Correlations_scatter.png)|
|![](https://raw.githubusercontent.com/man-group/dtale/master/docs/images/Correlations.png)|![](https://raw.githubusercontent.com/man-group/dtale/master/docs/images/Correlations_ts.png)|![](https://raw.githubusercontent.com/man-group/dtale/master/docs/images/Correlations_scatter.png)|

- **About**: This will give you information about what version of D-Tale you're running as well as if its out of date to whats on PyPi.

|Up To Date|Out Of Date|
|--------|:------:|
|![](https://raw.githubusercontent.com/manahl/dtale/master/docs/images/About-up-to-date.png)|![](https://raw.githubusercontent.com/manahl/dtale/master/docs/images/About-out-of-date.png)|
|![](https://raw.githubusercontent.com/man-group/dtale/master/docs/images/About-up-to-date.png)|![](https://raw.githubusercontent.com/man-group/dtale/master/docs/images/About-out-of-date.png)|

- **Instances**: this will give you information about other D-Tale instances are running under your current Python process.

@@ -236,7 +236,7 @@ dtale.show(pd.DataFrame([range(6), range(6), range(6), range(6), range(6), range
```
This will make the **Instances** button available in all 3 of these D-Tale instances. Clicking that button while in the first instance invoked above will give you this popup:

![](https://raw.githubusercontent.com/manahl/dtale/master/docs/images/Instances.png)
![](https://raw.githubusercontent.com/man-group/dtale/master/docs/images/Instances.png)

The grid above contains the following information:
- Process: timestamp when the process was started along with the name (if specified in `dtale.show()`)
@@ -249,14 +249,14 @@ The grid above contains the following information:

Here is an example of clicking the "Preview" button:

![](https://raw.githubusercontent.com/manahl/dtale/master/docs/images/Instances_preview.png)
![](https://raw.githubusercontent.com/man-group/dtale/master/docs/images/Instances_preview.png)

- **Resize**: mostly a fail-safe in the event that your columns are no longer lining up. Click this and should fix that
- **Shutdown**: pretty self-explanatory, kills your D-Tale session (there is also an auto-kill process that will kill your D-Tale after an hour of inactivity)

### Menu functions w/ one column is selected

![](https://raw.githubusercontent.com/manahl/dtale/master/docs/images/Menu_one_col.png)
![](https://raw.githubusercontent.com/man-group/dtale/master/docs/images/Menu_one_col.png)

- **Move To Front**: moves your column to the front of the "unlocked" columns
- **Lock**: adds your column to "locked" columns
@@ -285,7 +285,7 @@ Here is an example of clicking the "Preview" button:

|Editing|Result|
|--------|:------:|
|![](https://raw.githubusercontent.com/manahl/dtale/master/docs/images/Formatting_apply.png)|![](https://raw.githubusercontent.com/manahl/dtale/master/docs/images/Post_formatting.png)|
|![](https://raw.githubusercontent.com/man-group/dtale/master/docs/images/Formatting_apply.png)|![](https://raw.githubusercontent.com/man-group/dtale/master/docs/images/Post_formatting.png)|

Here's a grid of all the formats available with -123456.789 as input:

@@ -300,7 +300,7 @@ Here's a grid of all the formats available with -123456.789 as input:

- **Histogram**: display histograms in bins of 5, 10, 20 or 50 for any numeric column

![](https://raw.githubusercontent.com/manahl/dtale/master/docs/images/Histogram.png)
![](https://raw.githubusercontent.com/man-group/dtale/master/docs/images/Histogram.png)

## For Developers

4 changes: 2 additions & 2 deletions docs/source/conf.py
@@ -63,9 +63,9 @@
# built documents.
#
# The short X.Y version.
version = u'1.3.3'
version = u'1.3.4'
# The full version, including alpha/beta/rc tags.
release = u'1.3.3'
release = u'1.3.4'

# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
2 changes: 1 addition & 1 deletion docs/source/index.rst
@@ -3,7 +3,7 @@ Welcome to D-Tale's documentation!

Installation
------------
For installation steps, please refer to the project `README <https://github.com/manahl/dtale>`_.
For installation steps, please refer to the project `README <https://github.com/man-group/dtale>`_.

General use
-----------
2 changes: 1 addition & 1 deletion dtale/app.py
@@ -192,7 +192,7 @@ def build_app(reaper_on=True, hide_shutdown=False):
'contact': {
'name': 'Man Alpha Technology',
'email': 'ManAlphaTech@man.com',
'url': 'https://github.com/manahl/dtale'
'url': 'https://github.com/man-group/dtale'
},
},
host=socket.gethostname(),
39 changes: 25 additions & 14 deletions dtale/views.py
@@ -158,6 +158,10 @@ def cleanup(port):
DTYPES.pop(port, None)


def get_port():
return get_str_arg(request, 'port', request.environ.get('SERVER_PORT', 'curr'))


@dtale.route('/main')
@swag_from('swagger/dtale/views/main.yml')
def view_main():
@@ -166,7 +170,7 @@ def view_main():
:return: HTML
"""
curr_settings = SETTINGS.get(request.environ.get('SERVER_PORT', 'curr'), {})
curr_settings = SETTINGS.get(get_port(), {})
_, version = retrieve_meta_info_and_version('dtale')
return render_template(
'dtale/main.html', settings=json.dumps(curr_settings), version=str(version), processes=len(DATA)
@@ -228,10 +232,10 @@ def update_settings():
try:
global SETTINGS

server_port = request.environ.get('SERVER_PORT', 'curr')
curr_settings = SETTINGS.get(server_port, {})
port = get_port()
curr_settings = SETTINGS.get(port, {})
updated_settings = dict_merge(curr_settings, json.loads(get_str_arg(request, 'settings', '{}')))
SETTINGS[server_port] = updated_settings
SETTINGS[port] = updated_settings
return jsonify(dict(success=True))
except BaseException as e:
return jsonify(dict(error=str(e), traceback=str(traceback.format_exc())))
@@ -254,7 +258,7 @@ def test_filter():
"""
try:
query = get_str_arg(request, 'query')
_test_filter(DATA[request.environ.get('SERVER_PORT', 'curr')], query)
_test_filter(DATA[get_port()], query)
return jsonify(dict(success=True))
except BaseException as e:
return jsonify(dict(error=str(e), traceback=str(traceback.format_exc())))
@@ -276,7 +280,7 @@ def dtypes():
}
"""
try:
return jsonify(dtypes=DTYPES[request.environ.get('SERVER_PORT', 'curr')], success=True)
return jsonify(dtypes=DTYPES[get_port()], success=True)
except BaseException as e:
return jsonify(error=str(e), traceback=str(traceback.format_exc()))

@@ -319,7 +323,7 @@ def describe(column):
"""
try:
data = DATA[request.environ.get('SERVER_PORT', 'curr')]
data = DATA[get_port()]
desc = load_describe(data[column])
return_data = dict(describe=desc, success=True)
uniq_vals = data[column].unique()
@@ -358,7 +362,7 @@ def get_data():
"""
try:
global SETTINGS, DATA, DTYPES
port = get_str_arg(request, 'port', request.environ.get('SERVER_PORT', 'curr'))
port = get_port()
data = DATA[port]

# this will check for when someone instantiates D-Tale programatically and directly alters the internal
@@ -433,7 +437,7 @@ def get_histogram():
query = get_str_arg(request, 'query')
bins = get_int_arg(request, 'bins', 20)
try:
data = DATA[request.environ.get('SERVER_PORT', 'curr')]
data = DATA[get_port()]
if query:
data = data.query(query)

Expand Down Expand Up @@ -461,9 +465,16 @@ def get_correlations():
"""
try:
query = get_str_arg(request, 'query')
data = DATA[request.environ.get('SERVER_PORT', 'curr')]
port = get_port()
data = DATA[port]
data = data.query(query) if query is not None else data
data = data.corr(method='pearson')

# using pandas.corr proved to be quite slow on large datasets so I moved to numpy:
# https://stackoverflow.com/questions/48270953/pandas-corr-and-corrwith-very-slow
valid_corr_cols = [c['name'] for c in DTYPES[port] if any((c['dtype'].startswith(s) for s in ['int', 'float']))]
data = np.corrcoef(data[valid_corr_cols].values, rowvar=False)
data = pd.DataFrame(data, columns=valid_corr_cols, index=valid_corr_cols)

data.index.name = str('column')
data = data.reset_index()
col_types = grid_columns(data)
@@ -520,7 +531,7 @@ def get_correlations_ts():
"""
try:
query = get_str_arg(request, 'query')
data = DATA[request.environ.get('SERVER_PORT', 'curr')]
data = DATA[get_port()]
data = data.query(query) if query is not None else data
cols = get_str_arg(request, 'cols')
cols = cols.split(',')
@@ -565,7 +576,7 @@ def get_scatter():
date = get_str_arg(request, 'date')
date_col = get_str_arg(request, 'dateCol')
try:
data = DATA[request.environ.get('SERVER_PORT', 'curr')]
data = DATA[get_port()]
data = data[data[date_col] == date] if date else data
if query:
data = data.query(query)
@@ -645,7 +656,7 @@ def filter_data(df, req, groups, query=None):
groups = get_str_arg(request, 'group')
if groups:
groups = json.loads(groups)
data = DATA[request.environ.get('SERVER_PORT', 'curr')]
data = DATA[get_port()]
data, groups, query = filter_data(data, request, groups, query=get_str_arg(request, 'query'))
grouper = []
for g_cfg in groups:
4 changes: 2 additions & 2 deletions setup.py
@@ -50,14 +50,14 @@ def run_tests(self):

setup(
name="dtale",
version="1.3.3",
version="1.3.4",
author="MAN Alpha Technology",
author_email="ManAlphaTech@man.com",
description="Web Client for Visualizing Pandas Objects",
license="LGPL",
long_description='\n'.join((long_description, changelog)),
keywords=["numeric", "pandas", "visualization", "flask"],
url="https://github.com/manahl/dtale",
url="https://github.com/man-group/dtale",
install_requires=[
"arctic",
"jsonschema<3.0.0",
11 changes: 8 additions & 3 deletions tests/dtale/test_views.py
@@ -353,8 +353,10 @@ def test_get_correlations(unittest, test_data):
import dtale.views as views

with app.test_client() as c:
test_data, _ = views.format_data(test_data)
with mock.patch('dtale.views.DATA', {c.port: test_data}):
with ExitStack() as stack:
test_data, _ = views.format_data(test_data)
stack.enter_context(mock.patch('dtale.views.DATA', {c.port: test_data}))
stack.enter_context(mock.patch('dtale.views.DTYPES', {c.port: views.build_dtypes_state(test_data)}))
response = c.get('/dtale/correlations')
response_data = json.loads(response.data)
expected = dict(data=[
@@ -365,7 +367,10 @@ def test_get_correlations(unittest, test_data):
unittest.assertEqual(response_data, expected, 'should return correlations')

with app.test_client() as c:
with mock.patch('dtale.views.DATA', {c.port: test_data}):
with ExitStack() as stack:
test_data, _ = views.format_data(test_data)
stack.enter_context(mock.patch('dtale.views.DATA', {c.port: test_data}))
stack.enter_context(mock.patch('dtale.views.DTYPES', {c.port: views.build_dtypes_state(test_data)}))
response = c.get('/dtale/correlations', query_string=dict(query="missing_col == 'blah'"))
response_data = json.loads(response.data)
unittest.assertEqual(
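
The correlation change in `dtale/views.py` can be looked at in isolation. What follows is a minimal sketch rather than the project's code: `corr_via_numpy`, the sample frame and its column names are all made up for illustration, and it assumes the selected columns are numeric and contain no missing values. It computes the Pearson matrix with `numpy.corrcoef` and checks the result against `DataFrame.corr`.

```python
import numpy as np
import pandas as pd


def corr_via_numpy(df, cols):
    """Pearson correlation matrix for `cols` using numpy.corrcoef.

    Hypothetical helper for illustration; `cols` must name numeric columns.
    """
    # rowvar=False treats each column as a variable and each row as an observation
    matrix = np.corrcoef(df[cols].values, rowvar=False)
    return pd.DataFrame(matrix, columns=cols, index=cols)


# made-up sample data
rng = np.random.RandomState(0)
df = pd.DataFrame({
    'a': rng.rand(1000),
    'b': rng.rand(1000),
    'label': ['x'] * 1000,  # non-numeric column, has to be excluded
})
df['c'] = df['a'] * 2 + rng.rand(1000) * 0.1

# keep integer and float columns only (assumption: dtype kind is a good enough filter here)
numeric_cols = [c for c in df.columns if df[c].dtype.kind in 'if']
fast = corr_via_numpy(df, numeric_cols)
slow = df[numeric_cols].corr(method='pearson')

# on NaN-free data both approaches agree to floating-point tolerance
assert np.allclose(fast.values, slow.values)
```

One difference worth checking before relying on this: `DataFrame.corr` skips missing values pair by pair, while `numpy.corrcoef` does not, so a column containing NaN would be expected to produce NaN entries in the numpy-based matrix.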