This repository has been archived by the owner on Jun 3, 2024. It is now read-only.

How to use countplot() in plotly with VAEX data frame? #174

Open
bhargav-inthezone opened this issue Jun 9, 2021 · 9 comments

Comments

@bhargav-inthezone

Could someone please suggest equivalent Plotly code for this seaborn call?
sns.countplot(x='Census_ProcessorClass', hue='HasDetections', data=df_train)
plt.show()

Both columns are int64.

@nicolaskruchten
Contributor

This is basically px.histogram.

@bhargav-inthezone
Author

This is basically px.histogram.

df_train is a Vaex DataFrame. When I tried this:

fig = px.histogram(df_train, x='Census_ProcessorClass', color='HasDetections', barmode='relative')
fig.show()

I got this ValueError:
ValueError Traceback (most recent call last)
in
----> 1 fig = px.histogram(df_train, x ='Census_ProcessorClass' , color = 'HasDetections', barmode = 'relative')
2 fig.show()

/opt/conda/lib/python3.7/site-packages/plotly/express/_chart_types.py in histogram(data_frame, x, y, color, facet_row, facet_col, facet_col_wrap, facet_row_spacing, facet_col_spacing, hover_name, hover_data, animation_frame, animation_group, category_orders, labels, color_discrete_sequence, color_discrete_map, marginal, opacity, orientation, barmode, barnorm, histnorm, log_x, log_y, range_x, range_y, histfunc, cumulative, nbins, title, template, width, height)
454 histnorm=histnorm, histfunc=histfunc, cumulative=dict(enabled=cumulative),
455 ),
--> 456 layout_patch=dict(barmode=barmode, barnorm=barnorm),
457 )
458

/opt/conda/lib/python3.7/site-packages/plotly/express/_core.py in make_figure(args, constructor, trace_patch, layout_patch)
1859 apply_default_cascade(args)
1860
-> 1861 args = build_dataframe(args, constructor)
1862 if constructor in [go.Treemap, go.Sunburst] and args["path"] is not None:
1863 args = process_dataframe_hierarchy(args)

/opt/conda/lib/python3.7/site-packages/plotly/express/_core.py in build_dataframe(args, constructor)
1376
1377 df_output, wide_id_vars = process_args_into_dataframe(
-> 1378 args, wide_mode, var_name, value_name
1379 )
1380

/opt/conda/lib/python3.7/site-packages/plotly/express/_core.py in process_args_into_dataframe(args, wide_mode, var_name, value_name)
1181 if argument == "index":
1182 err_msg += "\n To use the index, pass it in directly as df.index."
-> 1183 raise ValueError(err_msg)
1184 elif length and len(df_input[argument]) != length:
1185 raise ValueError(

ValueError: Value of 'x' is not the name of a column in 'data_frame'. Expected one of [0] but received: Census_ProcessorClass

@nicolaskruchten
Contributor

Try converting your Vaex df to a Pandas one to see if that resolves things?

@bhargav-inthezone
Author

Try converting your Vaex df to a Pandas one to see if that resolves things?

Yeah Nic, I am pretty sure it will resolve the issue, but converting my data to a pandas DataFrame will take a lot of time and memory. I think my system may crash.

I am looking for a more efficient way. Is there any way to make Plotly accept a Vaex DataFrame?

@nicolaskruchten
Contributor

PX doesn't natively accept Vaex data frames at the moment, no. Part of the reason for that is that for plots like these histograms, it doesn't do Python-side aggregation: all the data is sent to the browser for aggregation, so there's a bit of an upper bound on the dataset size that px.histogram can handle anyway.
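
To illustrate what Python-side aggregation could look like: a minimal sketch, assuming the counts are computed before plotting so that only one number per bar is sent to the browser. `collections.Counter` stands in for whatever aggregation the data frame library provides, and `count_bar_figure` plus the toy data are hypothetical names and values for illustration, not part of Plotly or Vaex.

```python
from collections import Counter


def count_bar_figure(x_values, hue_values, x_title="x", hue_title="hue"):
    """Count (x, hue) pairs in Python and return a Plotly figure as a plain dict."""
    # Aggregate first: one count per (category, hue) pair.
    counts = Counter(zip(x_values, hue_values))
    categories = sorted({x for x, _ in counts})

    # One bar trace per hue value, with a zero where a pair never occurs.
    traces = []
    for hue in sorted({h for _, h in counts}):
        traces.append({
            "type": "bar",
            "name": f"{hue_title}={hue}",
            "x": categories,
            "y": [counts.get((x, hue), 0) for x in categories],
        })

    return {
        "data": traces,
        "layout": {
            "barmode": "group",  # side-by-side bars, as in a count plot with hue
            "xaxis": {"title": {"text": x_title}},
            "yaxis": {"title": {"text": "count"}},
        },
    }


# Hypothetical toy data standing in for the two int64 columns.
processor_class = [1, 1, 2, 2, 2, 3]
has_detections = [0, 1, 0, 0, 1, 1]

fig = count_bar_figure(
    processor_class, has_detections, "Census_ProcessorClass", "HasDetections"
)
```

With Plotly installed, a dict of this shape can be rendered with `plotly.io.show(fig)`. The figure then carries a handful of counts rather than every row of the data set.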

@nicolaskruchten
Contributor

See plotly/plotly.py#2649 for more details

@bhargav-inthezone
Author

See plotly/plotly.py#2649 for more details

Thanks, will check this.

@bhargav-inthezone
Author

See plotly/plotly.py#2649 for more details

Hey, after a lot of trial and error, I think I found a better way. This code worked:

fig = px.histogram(x=df_train['Census_ProcessorClass'].tolist(), color=df_train['HasDetections'].tolist())
fig.show()

[image: newplot]

@bhargav-inthezone
Author

I found a much better method:

df_train.select(df_train['Census_ProcessorClass'] ,'Census_ProcessorClass' != 'None' )
x_axis = df_train.evaluate(df_train['Census_ProcessorClass'], selection = True)
color_axis = df_train.evaluate(df_train['HasDetections'], selection = True)

%%time
fig = px.histogram (x = x_axis, color= color_axis, width = 300, height = 400)
fig.show()
[image: newplot (1)]

CPU times: user 761 ms, sys: 33.1 ms, total: 794 ms
Wall time: 811 ms
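
One caveat about the select call above: its second argument compares two string literals in plain Python, so it is evaluated to the constant True before Vaex receives it and cannot act as a per-row filter. A minimal sketch of the difference, using a hypothetical list of records (with made-up string values) in place of the Vaex DataFrame:

```python
# Hypothetical stand-in rows; the real data lives in a Vaex DataFrame.
rows = [
    {"Census_ProcessorClass": "None", "HasDetections": 0},
    {"Census_ProcessorClass": "mid", "HasDetections": 1},
    {"Census_ProcessorClass": "high", "HasDetections": 0},
]

# Comparing two different string literals: always True, whatever the data holds.
literal_comparison = 'Census_ProcessorClass' != 'None'

# A per-row condition has to look at each row's value instead.
kept = [row for row in rows if row["Census_ProcessorClass"] != "None"]

print(literal_comparison)
print([row["Census_ProcessorClass"] for row in kept])
```

If the goal is to drop rows by value, the condition needs to be built from the column itself rather than from its name as a string.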
