How to speed the data in lot of data #293

Benjamin-zhangjb · 2019-12-10T06:29:47Z

No description provided.

Benjamin-zhangjb · 2019-12-10T06:30:52Z

Where should plot={'histogram': {'bayesian_blocks_bins': False}} be added?

neomatrix369 · 2019-12-12T22:39:22Z

> Where should plot={'histogram': {'bayesian_blocks_bins': False}} be added?

Here is an example:

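import pandas_profiling  # registers DataFrame.profile_report
# no_processors: number of worker processes (assumed here, e.g. os.cpu_count())
training_profile = train.profile_report(
    title='Pandas Profiling on training set',
    plot={'histogram': {'bins': 8}},
    style={'full_width': True}, minify_html=True, pool_size=no_processors)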

That's how you pass the parameters.
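For anyone copying the example above: no_processors is never defined in the snippet. A minimal sketch that fills it in from the CPU count is below. The helper names build_report_kwargs and profile_training_set are hypothetical, and the keyword arguments are taken from this thread as-is, so they may not match every pandas_profiling version.

```python
import os


def build_report_kwargs(bins=8, pool_size=None):
    """Collect the keyword arguments used in the example above.

    Hypothetical helper: the argument names come from this thread and
    may differ between pandas_profiling releases.
    """
    if pool_size is None:
        # os.cpu_count() can return None, so fall back to a single worker.
        pool_size = os.cpu_count() or 1
    return {
        "title": "Pandas Profiling on training set",
        "plot": {"histogram": {"bins": bins}},
        "style": {"full_width": True},
        "minify_html": True,
        "pool_size": pool_size,
    }


def profile_training_set(train):
    """Build the report for a DataFrame using the settings above."""
    # Imported lazily so build_report_kwargs works without the package.
    import pandas_profiling  # noqa: F401  (registers DataFrame.profile_report)

    return train.profile_report(**build_report_kwargs())
```

Keeping the settings in one dictionary makes it easier to try a different bin count or pool size without editing the call itself.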

rmokros · 2020-02-13T22:33:44Z

df.shape: (6370599, 33)

profile = ProfileReport(
    df,
    # the two plot settings are merged into one dict; a repeated keyword is a syntax error
    plot={'histogram': {'bins': None, 'bayesian_blocks_bins': False}},
    check_correlation_pearson=False,
    correlations={
        "pearson": False,
        "spearman": False,
        "kendall": False,
        "phi_k": False,
        "cramers": False,
        "recoded": False,
    },
)

Machine: AWS ml.m5.24xlarge (384 GB memory)
Result: a memory error with 4 float columns. After converting those float columns to strings, it runs OK.

sbrugman · 2020-05-12T00:03:37Z

Minimal mode uses even less computation as of v2.8.0. The release before that also contained numerous performance optimizations, including disabling bayesian_blocks by default.
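As a rough illustration of the minimal mode mentioned above, the sketch below turns on minimal=True once a DataFrame passes a row-count threshold. The threshold of 100,000 rows and the helper names are assumptions made for this example, not values recommended anywhere in the thread.

```python
def profile_kwargs_for(n_rows, large_threshold=100_000):
    """Return ProfileReport keyword arguments sized to the data.

    large_threshold is a hypothetical cut-off chosen for illustration;
    tune it to your own hardware and column count.
    """
    if n_rows >= large_threshold:
        # Minimal mode skips the more expensive computations.
        return {"minimal": True}
    return {}


def make_report(df):
    """Build a report, using minimal mode for large frames."""
    # Imported lazily so profile_kwargs_for works without the package.
    from pandas_profiling import ProfileReport

    return ProfileReport(df, **profile_kwargs_for(len(df)))
```

With this sketch, a frame the size of the 6,370,599-row one reported earlier would be profiled in minimal mode, while small frames keep the full report.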

mthomp89 · 2020-07-15T20:13:18Z

Running with minimal=True on a dataframe of shape (326878, 38) and writing the profile report to an HTML file, the report is never produced. On a 5% sample of the original dataframe, the report is produced in 2-3 minutes. What other configuration settings can be turned off?

pandas_profiling v2.4
pandas v1.0.1

sbrugman · 2020-07-15T20:29:00Z

> Running with minimal=True on a dataframe of shape (326878, 38) and writing the profile report to an HTML file, the report is never produced. On a 5% sample of the original dataframe, the report is produced in 2-3 minutes. What other configuration settings can be turned off?
>
> pandas_profiling v2.4
> pandas v1.0.1

Upgrading to the latest version will speed this up significantly.
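To check whether an environment predates the v2.8.0 improvements mentioned earlier in this thread, something like the following sketch could be used. It assumes the distribution is installed under the name pandas-profiling, and the simple version parser is a stand-in for illustration rather than a full version-specifier implementation.

```python
from importlib import metadata

# v2.8.0 is the release named in this thread for the faster minimal mode.
MINIMUM = (2, 8, 0)


def parse_version(text):
    """Turn '2.8.0' into (2, 8, 0); suffixes such as 'rc1' are ignored."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def needs_upgrade(installed, minimum=MINIMUM):
    """True when the installed version string is older than minimum."""
    return parse_version(installed) < minimum


def installed_version(package="pandas-profiling"):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None
```

For the v2.4 install reported above, needs_upgrade("2.4") is true, so upgrading would be the first thing to try before turning off further settings.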


Benjamin-zhangjb added the "feature request 💬" label (Requests for new features) Dec 10, 2019

sbrugman added a commit that referenced this issue Jan 2, 2020

Performance: introduce minimal mode. (#76, #222, #258, #261, #293)

3f099c4

neomatrix369 mentioned this issue Jan 13, 2020

get_rejected_variables missing after release of v2.4.0 #315

Closed

neomatrix369 mentioned this issue Mar 27, 2020

[Question] What are the different options we can use when running analysis on a big/large datasets? #420

Closed

sbrugman added the "help wanted 🙋" label (Contributions are welcome!) Apr 13, 2020

sbrugman closed this as completed May 12, 2020

chanedwin pushed a commit to chanedwin/pandas-profiling that referenced this issue Oct 11, 2020

Performance: introduce minimal mode. (ydataai#76, ydataai#222, ydataa…

fa60395

…i#258, ydataai#261, ydataai#293)
