Pandas extension dtypes cause failure when generating profile report #251
Comments
Just encountered this bug in a current project. It was solved when I removed unused categories in all my categorical columns (using https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.CategoricalIndex.remove_unused_categories.html).
Stale issue
Could not reproduce this issue with the latest version, closing for now.
I am having this issue. However, it happens inconsistently. The issue is not present until I subset my data to conversion=1. This works fine:
profile = ProfileReport(
df, title="Profile Report of the January Conversion Dataset"
)
profile.to_file(Path("../../../products/jan_cvr_report.html"))
profile0 = ProfileReport(
df[df['conversion']==0], title="Profile Report of the January Conversion==0 Dataset"
)
profile0.to_file(Path("../../../products/jan_cvr0_report.html"))
This is when it breaks:
profile1 = ProfileReport(
df[df['conversion']==1], title="Profile Report of the January Conversion==1 Dataset"
)
profile1.to_file(Path("../../../products/jan_cvr1_report.html"))
The only difference is which subset of the data is profiled. This is the stack trace I get:
Summarize dataset: 100%
32/32 [00:31<00:00, 1.03it/s, Completed]
Generate report structure: 100%
1/1 [00:04<00:00, 4.85s/it]
Render HTML: 0%
0/1 [00:00<?, ?it/s]
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-60-d419b0248170> in <module>
2 df[df['conversion']==1], title="Profile Report of the January Conversion==1 Dataset"
3 )
----> 4 profile1.to_file(Path("../../../products/jan_cvr1_report.html"))
~/opt/anaconda3/lib/python3.7/site-packages/pandas_profiling/profile_report.py in to_file(self, output_file, silent)
243 create_html_assets(output_file)
244
--> 245 data = self.to_html()
246
247 if output_file.suffix != ".html":
~/opt/anaconda3/lib/python3.7/site-packages/pandas_profiling/profile_report.py in to_html(self)
346
347 """
--> 348 return self.html
349
350 def to_json(self) -> str:
~/opt/anaconda3/lib/python3.7/site-packages/pandas_profiling/profile_report.py in html(self)
166 def html(self):
167 if self._html is None:
--> 168 self._html = self._render_html()
169 return self._html
170
~/opt/anaconda3/lib/python3.7/site-packages/pandas_profiling/profile_report.py in _render_html(self)
287 title=self.description_set["analysis"]["title"],
288 date=self.description_set["analysis"]["date_start"],
--> 289 version=self.description_set["package"]["pandas_profiling_version"],
290 )
291
~/opt/anaconda3/lib/python3.7/site-packages/pandas_profiling/report/presentation/flavours/html/root.py in render(self, **kwargs)
11
12 return templates.template("report.html").render(
---> 13 **self.content, nav_items=nav_items, **kwargs
14 )
~/opt/anaconda3/lib/python3.7/site-packages/jinja2/environment.py in render(self, *args, **kwargs)
1088 return concat(self.root_render_func(self.new_context(vars)))
1089 except Exception:
-> 1090 self.environment.handle_exception()
1091
1092 def render_async(self, *args, **kwargs):
~/opt/anaconda3/lib/python3.7/site-packages/jinja2/environment.py in handle_exception(self, source)
830 from .debug import rewrite_traceback_stack
831
--> 832 reraise(*rewrite_traceback_stack(source=source))
833
834 def join_path(self, template, parent):
~/opt/anaconda3/lib/python3.7/site-packages/jinja2/_compat.py in reraise(tp, value, tb)
26 def reraise(tp, value, tb=None):
27 if value.__traceback__ is not tb:
---> 28 raise value.with_traceback(tb)
29 raise value
30
~/opt/anaconda3/lib/python3.7/site-packages/pandas_profiling/report/presentation/flavours/html/templates/report.html in top-level template code()
20 {% endif %}
21 <div class="content">
---> 22 {{ body.render() }}
23 </div>
24 {% include 'wrapper/footer.html' %}
~/opt/anaconda3/lib/python3.7/site-packages/pandas_profiling/report/presentation/flavours/html/container.py in render(self)
29 return templates.template("sequence/sections.html").render(
30 sections=self.content["items"],
---> 31 full_width=config["html"]["style"]["full_width"].get(bool),
32 )
33 elif self.sequence_type == "grid":
~/opt/anaconda3/lib/python3.7/site-packages/jinja2/environment.py in render(self, *args, **kwargs)
1088 return concat(self.root_render_func(self.new_context(vars)))
1089 except Exception:
-> 1090 self.environment.handle_exception()
1091
1092 def render_async(self, *args, **kwargs):
~/opt/anaconda3/lib/python3.7/site-packages/jinja2/environment.py in handle_exception(self, source)
830 from .debug import rewrite_traceback_stack
831
--> 832 reraise(*rewrite_traceback_stack(source=source))
833
834 def join_path(self, template, parent):
~/opt/anaconda3/lib/python3.7/site-packages/jinja2/_compat.py in reraise(tp, value, tb)
26 def reraise(tp, value, tb=None):
27 if value.__traceback__ is not tb:
---> 28 raise value.with_traceback(tb)
29 raise value
30
~/opt/anaconda3/lib/python3.7/site-packages/pandas_profiling/report/presentation/flavours/html/templates/sequence/sections.html in top-level template code()
1 <div class="{% if full_width %}container-fluid{% else %}container{% endif %}">
2 {% for section in sections %}
----> 3 {% set html = section.render() %}
4 {% if html | length > 0 %}
5 <div class="row header">
~/opt/anaconda3/lib/python3.7/site-packages/pandas_profiling/report/presentation/flavours/html/container.py in render(self)
8 if self.sequence_type in ["list", "accordion"]:
9 return templates.template("sequence/list.html").render(
---> 10 anchor_id=self.content["anchor_id"], items=self.content["items"]
11 )
12 elif self.sequence_type == "named_list":
~/opt/anaconda3/lib/python3.7/site-packages/jinja2/environment.py in render(self, *args, **kwargs)
1088 return concat(self.root_render_func(self.new_context(vars)))
1089 except Exception:
-> 1090 self.environment.handle_exception()
1091
1092 def render_async(self, *args, **kwargs):
~/opt/anaconda3/lib/python3.7/site-packages/jinja2/environment.py in handle_exception(self, source)
830 from .debug import rewrite_traceback_stack
831
--> 832 reraise(*rewrite_traceback_stack(source=source))
833
834 def join_path(self, template, parent):
~/opt/anaconda3/lib/python3.7/site-packages/jinja2/_compat.py in reraise(tp, value, tb)
26 def reraise(tp, value, tb=None):
27 if value.__traceback__ is not tb:
---> 28 raise value.with_traceback(tb)
29 raise value
30
~/opt/anaconda3/lib/python3.7/site-packages/pandas_profiling/report/presentation/flavours/html/templates/sequence/list.html in top-level template code()
2 {% for item in items %}
3 <div class="row spacing">
----> 4 {{ item.render() }}
5 </div>
6 {% endfor %}
~/opt/anaconda3/lib/python3.7/site-packages/pandas_profiling/report/presentation/flavours/html/variable.py in render(self)
5 class HTMLVariable(Variable):
6 def render(self):
----> 7 return templates.template("variable.html").render(**self.content)
~/opt/anaconda3/lib/python3.7/site-packages/jinja2/environment.py in render(self, *args, **kwargs)
1088 return concat(self.root_render_func(self.new_context(vars)))
1089 except Exception:
-> 1090 self.environment.handle_exception()
1091
1092 def render_async(self, *args, **kwargs):
~/opt/anaconda3/lib/python3.7/site-packages/jinja2/environment.py in handle_exception(self, source)
830 from .debug import rewrite_traceback_stack
831
--> 832 reraise(*rewrite_traceback_stack(source=source))
833
834 def join_path(self, template, parent):
~/opt/anaconda3/lib/python3.7/site-packages/jinja2/_compat.py in reraise(tp, value, tb)
26 def reraise(tp, value, tb=None):
27 if value.__traceback__ is not tb:
---> 28 raise value.with_traceback(tb)
29 raise value
30
~/opt/anaconda3/lib/python3.7/site-packages/pandas_profiling/report/presentation/flavours/html/templates/variable.html in top-level template code()
1 <a class="anchor-pos anchor-pos-variable" id="pp_var_{{ anchor_id }}"></a>
2 <div class="variable{% if ignore %} ignore{% endif %}">
----> 3 {{ top.render() }}
4
5 {% if bottom is not none %}
~/opt/anaconda3/lib/python3.7/site-packages/pandas_profiling/report/presentation/flavours/html/container.py in render(self)
33 elif self.sequence_type == "grid":
34 return templates.template("sequence/grid.html").render(
---> 35 items=self.content["items"]
36 )
37
~/opt/anaconda3/lib/python3.7/site-packages/jinja2/environment.py in render(self, *args, **kwargs)
1088 return concat(self.root_render_func(self.new_context(vars)))
1089 except Exception:
-> 1090 self.environment.handle_exception()
1091
1092 def render_async(self, *args, **kwargs):
~/opt/anaconda3/lib/python3.7/site-packages/jinja2/environment.py in handle_exception(self, source)
830 from .debug import rewrite_traceback_stack
831
--> 832 reraise(*rewrite_traceback_stack(source=source))
833
834 def join_path(self, template, parent):
~/opt/anaconda3/lib/python3.7/site-packages/jinja2/_compat.py in reraise(tp, value, tb)
26 def reraise(tp, value, tb=None):
27 if value.__traceback__ is not tb:
---> 28 raise value.with_traceback(tb)
29 raise value
30
~/opt/anaconda3/lib/python3.7/site-packages/pandas_profiling/report/presentation/flavours/html/templates/sequence/grid.html in top-level template code()
1 {% for item in items %}
2 <div class="col-sm-{% if (loop.last and loop.length == 3) or loop.length == 2 %}6{% else %}3{% endif %}{% if item.content['classes'] %} {{ item.content['classes'] }}{% endif %}">
----> 3 {{ item.render() }}
4 </div>
5 {% endfor %}
~/opt/anaconda3/lib/python3.7/site-packages/pandas_profiling/report/presentation/flavours/html/table.py in render(self)
5 class HTMLTable(Table):
6 def render(self):
----> 7 return templates.template("table.html").render(**self.content)
~/opt/anaconda3/lib/python3.7/site-packages/jinja2/environment.py in render(self, *args, **kwargs)
1088 return concat(self.root_render_func(self.new_context(vars)))
1089 except Exception:
-> 1090 self.environment.handle_exception()
1091
1092 def render_async(self, *args, **kwargs):
~/opt/anaconda3/lib/python3.7/site-packages/jinja2/environment.py in handle_exception(self, source)
830 from .debug import rewrite_traceback_stack
831
--> 832 reraise(*rewrite_traceback_stack(source=source))
833
834 def join_path(self, template, parent):
~/opt/anaconda3/lib/python3.7/site-packages/jinja2/_compat.py in reraise(tp, value, tb)
26 def reraise(tp, value, tb=None):
27 if value.__traceback__ is not tb:
---> 28 raise value.with_traceback(tb)
29 raise value
30
~/opt/anaconda3/lib/python3.7/site-packages/pandas_profiling/report/presentation/flavours/html/templates/table.html in top-level template code()
7 <tr{% if 'alert' in row and row['alert'] %} class="alert"{% endif %}>
8 <th>{{ row['name'] }}</th>
----> 9 <td>{{ row['value'] | dynamic_filter(row['fmt']) }}</td>
10 </tr>
11 {% endfor %}
~/opt/anaconda3/lib/python3.7/site-packages/pandas_profiling/report/presentation/flavours/html/templates.py in <lambda>(x, v)
24 r"\((\d+)\)", r'<span class="badge">\1</span>', x
25 )
---> 26 jinja2_env.filters["dynamic_filter"] = lambda x, v: fmt_mapping[v](x)
27
28
~/opt/anaconda3/lib/python3.7/site-packages/pandas_profiling/report/formatters.py in fmt_percent(value, edge_cases)
60 """
61 if not (1.0 >= value >= 0.0):
---> 62 raise ValueError(f"Value '{value}' should be a ratio between 1 and 0.")
63 if edge_cases and round(value, 3) == 0 and value > 0:
64 return "< 0.1%"
ValueError: Value '6.180529706513958' should be a ratio between 1 and 0.
I tried removing unused categories and dropping the only constant column, but no luck.
df1 = df[df['conversion']==1].copy(deep=True)
df1.source.cat.remove_unused_categories(inplace=True) #note I only have 2 categorical vars
profile1 = ProfileReport(
df1.drop('conversion',axis=1), title="Profile Report of the January Conversion==1 Dataset"
)
profile1.to_file(Path("../../../products/jan_cvr1_report.html"))
UPDATE: Solution Found
It works with df1.drop('user_id',axis=1), so I tried df1.user_id.cat.remove_unused_categories(inplace=True) and it works! I didn't realize my user_id column was being treated as a category. I added this to Stack Overflow in case anyone else runs into this.
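For anyone hitting the same thing, here is a minimal sketch of why a subset can still carry categories that no longer occur in it, and one way to drop them for every categorical column at once. The frame, column names, and values are made up for illustration. It uses the assignment form instead of inplace=True because, as far as I know, pandas 2.0 and later no longer accept that argument.

```python
import pandas as pd

# Hypothetical data: user_id is stored as a categorical, as in the report above.
df = pd.DataFrame({
    "user_id": pd.Categorical(["u1", "u2", "u3", "u4"]),
    "conversion": [0, 1, 0, 1],
})

# Filtering rows does not shrink the category list of a categorical column.
subset = df[df["conversion"] == 1].copy()
n_before = len(subset["user_id"].cat.categories)

# Find every categorical column, including ones you did not expect.
cat_cols = subset.select_dtypes(include="category").columns

# remove_unused_categories() returns a new Series, so assign it back.
for col in cat_cols:
    subset[col] = subset[col].cat.remove_unused_categories()

n_after = len(subset["user_id"].cat.categories)
print(n_before, n_after)  # 4 2
```

Listing the categorical columns with select_dtypes first is useful when a column such as user_id turns out to be a category without you noticing.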
In my case, it also worked by setting |
When attempting to profile a data frame that uses an extension dtype (such as Int64, in order to be able to represent missing values), a ValueError is raised.
To Reproduce
The following is a self-contained example that demonstrates the problem:
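The reporter's snippet is not included in this copy of the thread. As a rough sketch only, a frame of the kind described could be built as follows. The column name x and the values are hypothetical, and the ProfileReport call is left as a comment because it is the step the issue says fails.

```python
import pandas as pd

# Hypothetical frame: one nullable-integer column with a missing value.
# "Int64" (capital I) is the pandas extension dtype, not NumPy's int64.
df = pd.DataFrame({"x": pd.array([1, 2, None, 4], dtype="Int64")})

dtype_name = str(df["x"].dtype)
n_missing = int(df["x"].isna().sum())
print(dtype_name, n_missing)  # Int64 1

# Per the issue description, profiling a frame like this raised a ValueError
# (assumption: the import path of the version used in this thread):
# from pandas_profiling import ProfileReport
# ProfileReport(df).to_file("report.html")
```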
Version information:
pip: If you are using pip, run pip freeze in your environment and report the results. The list of packages can be rather long, you can use the snippet below to collapse the output.