Scattertext has an issue with my code! #3

Closed
shyamalschandra opened this issue Oct 23, 2016 · 5 comments

Comments

@shyamalschandra

shyamalschandra commented Oct 23, 2016

I took two text datasets and fed them into the boilerplate code from the Jupyter notebook example, but I am getting the following error:

Traceback (most recent call last):
  File "stextt.py", line 1, in <module>
    import scattertext as ST
  File "/Users/shyamalsuhanachandra/Desktop/scattertext.py", line 12, in <module>
AttributeError: 'module' object has no attribute 'TermDocMatrixFromPandas'

Do you know what could be the problem? What should I do?

Here is the code:

import scattertext as ST
import pandas as pd
import io
from IPython.display import IFrame
# Read both corpora as UTF-8, dropping any undecodable bytes (works on Python 2 and 3)
text1 = io.open("text1.txt", "r", encoding="utf-8", errors="ignore").read()
text2 = io.open("text2.txt", "r", encoding="utf-8", errors="ignore").read()
# One document per line, labelled with the file it came from
df = pd.DataFrame(
    [{'text': text.strip(), 'label': 'text1'} for text in text1.split('\n')]
    + [{'text': text.strip(), 'label': 'text2'} for text in text2.split('\n')]
)
term_doc_mat = ST.TermDocMatrixFromPandas(data_frame=df, category_col='label', text_col='text', nlp=ST.fast_but_crap_nlp).build()
filtered_term_doc_mat = (ST.TermDocMatrixFilter(pmi_threshold_coef=3, min_freq=10).filter(term_doc_mat))
scatter_chart_data = (ST.ScatterChart(filtered_term_doc_mat).to_dict('text1', category_name='text1', not_category_name='text2'))
viz_data_adapter = ST.viz.VizDataAdapter(scatter_chart_data)
html = ST.viz.HTMLVisualizationAssembly(viz_data_adapter).to_html()
open('subj_obj_scatter.html', 'wb').write(html.encode('utf-8'))
# IFrame only renders inside a Jupyter notebook; from a plain script, open the HTML file in a browser instead
IFrame(src='subj_obj_scatter.html', width=1000, height=1000)
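The loading step above keeps every line of each file, including blank ones, as a separate document. A hedged alternative is sketched below: it streams each file line by line and skips blank lines. `load_labelled_lines` is a hypothetical helper, not part of scattertext, and this thread does not establish whether empty documents cause problems in scattertext.

```python
import io
import pandas as pd


def load_labelled_lines(paths_by_label):
    """Build a DataFrame with one row per non-blank line, labelled by source file.

    `paths_by_label` maps a category label (e.g. 'text1') to a UTF-8 text file.
    Undecodable bytes are dropped, mirroring errors='ignore' in the script above.
    """
    rows = []
    for label, path in paths_by_label.items():
        with io.open(path, "r", encoding="utf-8", errors="ignore") as fh:
            for line in fh:
                line = line.strip()
                if line:  # skip blank lines instead of creating empty documents
                    rows.append({'text': line, 'label': label})
    # Fix the column order so the frame looks the same even when `rows` is empty
    return pd.DataFrame(rows, columns=['text', 'label'])
```

Calling `load_labelled_lines({'text1': 'text1.txt', 'text2': 'text2.txt'})` yields the same `text` and `label` columns that the script passes to `TermDocMatrixFromPandas`.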

@shyamalschandra changed the title from "Scattertext has a problem!" to "Scattertext has an issue with my code!" on Oct 23, 2016
@JasonKessler
Owner

I can't replicate this. How did you install the package?

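Also, make sure there's no file or folder named "scattertext.py" or "scattertext" in your working directory. Your traceback shows Python importing /Users/shyamalsuhanachandra/Desktop/scattertext.py instead of the installed package.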
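One way to act on that advice is to list any local files that Python could pick up instead of the pip-installed package. The helper below is a hypothetical sketch (`shadowing_candidates` is not part of scattertext) and only checks the three patterns that come up in this thread: a `.py` file, a stale `.pyc` file, and a folder.

```python
import os


def shadowing_candidates(module_name, directory="."):
    """Return paths in `directory` that could shadow an installed package.

    When a script runs, Python puts the script's directory at the front of
    sys.path, so a local `<name>.py` or a leftover `<name>.pyc` there can be
    imported instead of the package of the same name. A `<name>/` folder is
    reported too, since it may also interfere.
    """
    names = [module_name + ".py", module_name + ".pyc", module_name]
    found = []
    for name in names:
        path = os.path.join(directory, name)
        if os.path.exists(path):
            found.append(path)
    return found


if __name__ == "__main__":
    hits = shadowing_candidates("scattertext")
    if hits:
        print("These local files may shadow the installed package:", hits)
    else:
        print("No local 'scattertext' files found in the working directory.")
```

As a complementary check, running `import scattertext; print(scattertext.__file__)` from the same directory shows which file was actually imported. A path outside `site-packages` points to a shadowing file.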

@shyamalschandra
Author

I used pip.

Here is the error I am getting now that I removed the scattertext.pyc file from the cwd:

Traceback (most recent call last):
  File "stextt.py", line 13, in <module>
    filtered_term_doc_mat = (ST.TermDocMatrixFilter(pmi_threshold_coef = 3, min_freq = 10).filter(term_doc_mat))
TypeError: __init__() got an unexpected keyword argument 'min_freq'

I will look into the code later today. Thanks for responding so quickly!

@JasonKessler
Owner

Ah. It looks like I forgot to update the parameter name in the example after renaming it in a new version. I'll go ahead and fix the example; in the meantime, use minimum_term_freq instead of min_freq.
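When a keyword argument is renamed between releases, a quick diagnostic is to ask the installed class which keywords it accepts. The sketch below uses the standard-library `inspect` module (Python 3.3+). `accepted_keywords` and `ExampleFilter` are hypothetical names; `ExampleFilter` only stands in for a class like `TermDocMatrixFilter` whose parameter was renamed.

```python
import inspect


def accepted_keywords(func):
    """Explicitly named parameters that `func` accepts by keyword.

    *args and **kwargs are not listed, and for a class the implicit `self`
    of __init__ is already excluded by inspect.signature.
    """
    params = inspect.signature(func).parameters.values()
    return [p.name for p in params
            if p.kind in (inspect.Parameter.POSITIONAL_OR_KEYWORD,
                          inspect.Parameter.KEYWORD_ONLY)]


# Hypothetical stand-in for a class whose keyword was renamed from min_freq
class ExampleFilter(object):
    def __init__(self, pmi_threshold_coef=3, minimum_term_freq=3):
        self.pmi_threshold_coef = pmi_threshold_coef
        self.minimum_term_freq = minimum_term_freq


print(accepted_keywords(ExampleFilter))  # ['pmi_threshold_coef', 'minimum_term_freq']
```

Against the installed package, `accepted_keywords(ST.TermDocMatrixFilter)` should list the names that version expects. On Python 2.7, which this thread uses, `inspect.signature` is unavailable; `inspect.getargspec(ST.TermDocMatrixFilter.__init__).args` returns a similar list that also includes `self`.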

@shyamalschandra
Author

Okay, I renamed the parameter from min_freq to minimum_term_freq, reran the code, and got the following error:

iMac:Desktop shyamalsuhanachandra$ python stextt.py 
Traceback (most recent call last):
  File "stextt.py", line 15, in <module>
    scatter_chart_data = (ST.ScatterChart(filtered_term_doc_mat).to_dict('text1', category_name='text1', not_category_name='text2')) 
  File "/usr/local/lib/python2.7/site-packages/scattertext/ScatterChart.py", line 61, in to_dict
    df = self._build_dataframe_for_drawing(all_categories, category, scores)
  File "/usr/local/lib/python2.7/site-packages/scattertext/ScatterChart.py", line 188, in _build_dataframe_for_drawing
    df[df[all_categories].sum(axis=1) > self.minimum_term_frequency],
  File "/usr/local/lib/python2.7/site-packages/pandas/core/frame.py", line 1991, in __getitem__
    return self._getitem_array(key)
  File "/usr/local/lib/python2.7/site-packages/pandas/core/frame.py", line 2035, in _getitem_array
    indexer = self.ix._convert_to_indexer(key, axis=1)
  File "/usr/local/lib/python2.7/site-packages/pandas/core/indexing.py", line 1214, in _convert_to_indexer
    raise KeyError('%s not in index' % objarr[mask])
KeyError: "['text1 freq'] not in index"

Any thoughts?

@shyamalschandra
Author

I changed the names to text1 and text2 and it runs successfully.
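The thread does not pin down the root cause of the KeyError. However, the fix (renaming to text1 and text2) is consistent with the category passed to `to_dict()` not matching any value in the label column, and the traceback shows ScatterChart looking for a column named 'text1 freq'. The pre-flight check below is a hypothetical sketch, not part of scattertext. It fails with a clearer message before the chart is built.

```python
import pandas as pd


def check_category(df, category_col, category, not_category):
    """Raise early if a requested category is not among the labels in df.

    Hypothetical helper: judging from the traceback in this thread, asking
    ScatterChart.to_dict() for a category that does not match the labels
    can surface later as a KeyError like "['text1 freq'] not in index".
    """
    labels = set(df[category_col].unique())
    missing = [c for c in (category, not_category) if c not in labels]
    if missing:
        raise ValueError("Categories %r not found; available labels: %r"
                         % (missing, sorted(labels)))


df = pd.DataFrame({'text': ['a b', 'c d'], 'label': ['text1', 'text2']})
check_category(df, 'label', 'text1', 'text2')  # passes silently
```

In the script above, calling `check_category(df, 'label', 'text1', 'text2')` right after building `df` would show the labels actually present before any scattertext code runs.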
