EDA plot function #5

Closed
dovahcrow opened this issue May 23, 2019 · 0 comments

Goal: Plot function includes plot(df), plot(df, x="x") and plot(df, x="x", y="y")

Step 1: create intermediates
Step 2: plot graphs based on intermediates
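The two steps above can be sketched roughly as follows. This is only an illustration of the "intermediates first, rendering second" split, not the project's actual code: the names compute_intermediates and render, the dict-of-lists stand-in for a DataFrame, and the text-based rendering are all assumptions made so the example stays self-contained.

```python
from collections import Counter
from typing import Dict, List, Optional, Union

# Hypothetical stand-in for a DataFrame: column name -> list of values.
Frame = Dict[str, List[Union[int, float, str]]]


def is_numeric(values) -> bool:
    """True if every value is an int or float (bools excluded)."""
    return all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values)


def histogram(values, bins: int = 4) -> List[int]:
    """Count values into equal-width bins; the max value goes in the last bin."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid zero width when all values are equal
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / width), bins - 1)] += 1
    return counts


def summarize(name: str, values) -> dict:
    """Univariate intermediate: histogram for numbers, bar counts otherwise."""
    if is_numeric(values):
        return {"kind": "histogram", "column": name, "counts": histogram(values)}
    return {"kind": "bar", "column": name, "counts": dict(Counter(values))}


def compute_intermediates(df: Frame, x: Optional[str] = None, y: Optional[str] = None) -> dict:
    """Step 1: reduce the data to plain structures, with no drawing involved."""
    if x is None:  # plot(df): one summary per column
        return {"kind": "overview", "columns": [summarize(n, v) for n, v in df.items()]}
    if y is None:  # plot(df, x="x"): a single column
        return summarize(x, df[x])
    # plot(df, x="x", y="y"): pair the two columns row by row
    return {"kind": "scatter", "x": x, "y": y, "points": list(zip(df[x], df[y]))}


def render(inter: dict) -> str:
    """Step 2: turn intermediates into output (text here, a figure in practice)."""
    kind = inter["kind"]
    if kind == "overview":
        return "\n\n".join(render(sub) for sub in inter["columns"])
    if kind == "scatter":
        header = f"scatter: {inter['x']} vs {inter['y']}"
        return "\n".join([header] + [f"  {p}" for p in inter["points"]])
    counts = inter["counts"]
    items = counts.items() if isinstance(counts, dict) else enumerate(counts)
    lines = [f"{kind}: {inter['column']}"]
    lines += [f"  {label}: {'#' * n}" for label, n in items]
    return "\n".join(lines)


def plot(df: Frame, x: Optional[str] = None, y: Optional[str] = None) -> str:
    """Single entry point covering plot(df), plot(df, x) and plot(df, x, y)."""
    return render(compute_intermediates(df, x, y))
```

Keeping the intermediates as plain data means they can be tested without any plotting backend, and the rendering step could be swapped out without touching the computation.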

shub970 added a commit that referenced this issue May 28, 2019
Added the function plot(df, x, y) to plot bi-variate graphs.

Close #5
shub970 added a commit that referenced this issue Jun 3, 2019
code refactoring #5

Signed-off-by: shub970 <laddha.shubham97@gmail.com>
dovahcrow pushed a commit that referenced this issue Jun 11, 2019
Added the function plot(df, x, y) to plot bi-variate graphs.
Close #5
peiwangdb pushed a commit that referenced this issue Mar 6, 2020
committer pei wang <pennyiscomputing@gmail.com> 1583453127 -0800

Initial commit

doc(meta): Add contributing guide and changelog

refactor(dataprep): Add project structure

feat(eda.plot): Implement plot(df) and plot(df,x,y)

Close #5

refactor(dataprep): creating package structure

ci(CircleCI): Add circleci

build(dataprep): Add pytype & pylint & pipenv

refactor(eda.plot): plot function combined into one.

feat(eda.plot): Implement QQ norm plot

And add mypy check

chore(dataprep): Add pull request template and chore commit type.

chore(dataprep): Add editor related directory to .gitignore

E.g. ".idea" and ".vscode"

feat(eda.plot_correlation): Implement the calculation of intermediates

feat(eda.plot_correlation): Implement visualization

docs(CONTRIBUTING): add more guidelines about PR

feat(eda.plot_missing, eda.plot_correlation): implement plot_missing and plot_correlation

feat(eda.plot): Add visualization code for plot(df, ...)

fix(eda.plot_missing): fix the plot issue of categorical data

style(eda.plot_missing): revise the import order

build: lock in dependency versions and make setup.py Pipenv aware

fix(eda.plot): fix the number of bars and bins of plot

fix(eda.plot_missing): fix the color, too many categories and actual name

style(eda.plot_missing): disable too-many-statements

fix(eda.plot_correlation): node size, color, x-label and order

fix(eda.plot_correlation): acceleration and polish figures

fix(eda.plot_correlation): polish figures

fix(eda.plot_missing): fix the color, too many categories and actual name

style(eda): format the code using black

chore(CONTRIBUTING): suggesting not having merge commits in PR

chore(README): add build status badge and contribution guideline

build: use black to do formatting

fix(eda.plot): Changed the histogram visualization, formatted x axis labels, added bars for missing values

Made aesthetic changes to the plot(df) function: changed the histogram visualization, formatted labels, added additional bars for missing values, and made the number of plots on each row configurable, e.g. plot(df, ncolumns=4).

fix(eda.plot): Fixed comments from the pull request

Fixed problems from the last pull request

fix(eda.plot): Made changes to test_plot.py so that all tests pass

Fixed comments. All tests passed with the changes I made to test_plot.py which were necessary to deal with missing values.

fix(eda.plot): Forgot to add viz_uni.py to earlier commit

Forgot to add viz_uni.py to my earlier commit

fix(eda.plot): Changed colours to match with new palette

Changed colours to match with new colour palette

fix(eda.plot): Ran black so the code is formatted

build(CircleCI): Remove docker cache

We reached the limit of the free plan.

fix(eda.plot): fixed problems with plot(df, x)

fix(eda.plot): fixed comments from pull request

fix(eda.plot): fixed additional comments from pull request

fix(eda.plot): removed unnecessary lines of code and changed sorting key

fix(eda.plot_correlation): Fixed comments from the Trello

fix(plot_correlation): fix corresponding test

Fixed comments from the Trello

fix(eda.plot_correlation): Fixed comments from Trello

fix(eda.plot_correlation): Fixed comments from Trello

fix(eda.plot_correlation): Fixed comments from Trello

fix(eda.plot_correlation): Fixed comments from the Trello

fix(eda.plot_correlation): move sample_size to user function

docs: use sphinx to generate the doc

build: More compatibility for the dependencies

build: use anchor in the circleci yaml

fix(eda.plot_correlation): remove x label and self attention

feat(DataConnector): Merge DataConnector into Dataprep

Link to DataConnectorConfigs repo

build: switch to poetry from pipenv

fix(eda.plot_correlation): add document

fix(eda.plot_missing): add document

fix(eda.plot): merge conflict with render.py

fix(eda.plot): fixed comments from pull request and bugs found in testing

build: More compatibility for the dependencies

refactor(eda.correlation): refactor computing code

refactor(eda.correlation): refactor rendering code

fix(eda): make rest of the code runnable with the new code

feat(data_connector): implement OAuth2 ClientCredentials

implement show_schema

test

fix(eda.plot_missing): add row and columns limitation

fix(eda.plot_missing): add document

fix(eda.plot_missing): change num_rows to bins_num

fix(eda.plot_missing): fix bugs of bins_num

fix(eda.plot_missing): fix details according to Jinglin's comment

fix(eda.plot_missing): fix color, colorbar and x-label

fix(eda.plot_missing): consistent with viz_uni.py

refactor(eda.missing): refactor the code into compute_* and render_*

Also removes holoview

feat(build): add poetry build to justfile

fix(eda.correlation): slightly tweak the heatmap visualization

fix(eda.correlation): Show top 30 if the cardinality is too large in the column

refactor(eda-basic): refactored some of the basic plot functions

refactor(eda-basic): finished refactoring the basic functions

refactor(eda-basic): removed show() from render

refactor(eda-basic): added the plot() function

fix(eda.basic): add sanity tests and minor fixes for CI

refactor(eda-basic): fixed comments from the pull request

feat(data_connector): Implement Github config file

fix(eda-basic): fixed the hover tooltip problem of when a column name contains a dash

feat(data_connector): Implement Github config file

chore: modify project info for first release

fix(eda): fix not recognizing categorical dtype

fix(eda-basic): fixed bugs when a numerical column is set as categorical

fix(eda-basic): added ngroups for boxplot and formatted intervals for histogram

fix(eda-correlation): optimize the correlation calculation of scatter

refactor(eda.correlation): refactor for readability

fix(eda-basic): fixed categorical column when values are non-strings, and updated the box plots

fix(eda-basic): made plots get larger if ngroups or nsubgroups is large

fix(eda.missing): decrease the min y_range to 0 for histograms

The y_range min should be smaller than the minimal value in the column for histogram otherwise some bars are compressed down to the x-axis and are not visible.
Also fixes not cutting off # of bars in plot_missing(df, x, y).

fix(eda.missing): Fix the label order and boxplot color

fix(eda.missing): accurately calculate missing spectrum using map_blocks

fix(eda.missing): correctly handle categorical data type

fix(data_connector): should auto re-download config files if there's an update in the config repo

Update readme and examples

Add images

Fix readme image not working on pypi

v0.1.0

fix(eda.missing): make the tooltip style align with plot(df)

fix(eda.correlation): it works for the columns with missing values

fix(eda.correlation): plot_correlation only supports numerical data

fix(eda-basic): fixed xtics for histograms

fix(eda-basic): commented code

fix(eda-basic): add variables for Jinglin's comment

fix(eda.correlation): fix scatter and top-k nan

Committer: waterpine <songbian@zju.edu.cn>

docs(dataprep.eda): add documentation

add documentation for eda, plot, plot_correlation and plot_missing

fix(docs): fix warnings

fix(eda-basic): fixed xtic rounding

fix(eda-basic): improved plot(df) efficiency

feat(data-connector): support template

fix(eda.missing): fix parameter names

renew show schema

simple .info demo

refined show_schema and info methods

fix(dc.info): refined data_connector.info format

fix(dc.info): code revision

resolve conflict with master

reformat code

fixed type issue

further improve code

further improve code

blank

further improve code
fatbuddy added a commit to fatbuddy/dataprep that referenced this issue Feb 10, 2024
fatbuddy added a commit to fatbuddy/dataprep that referenced this issue Mar 15, 2024