DOC fetch_20newsgroups_vectorized is based on CountVectorizer (#11685)
qinhanmin2014 committed Jul 26, 2018
1 parent 9d649c5 commit f1c9678
Showing 2 changed files with 10 additions and 5 deletions.
2 changes: 1 addition & 1 deletion sklearn/datasets/descr/twenty_newsgroups.rst
@@ -117,7 +117,7 @@ components by sample in a more than 30000-dimensional space
159.01327...

:func:`sklearn.datasets.fetch_20newsgroups_vectorized` is a function which
-returns ready-to-use tfidf features instead of file names.
+returns ready-to-use token counts features instead of file names.

.. _`20 newsgroups website`: http://people.csail.mit.edu/jrennie/20Newsgroups/
.. _`TF-IDF`: https://en.wikipedia.org/wiki/Tf-idf
13 changes: 9 additions & 4 deletions sklearn/datasets/twenty_newsgroups.py
@@ -313,15 +313,20 @@ def fetch_20newsgroups(data_home=None, subset='train', categories=None,

def fetch_20newsgroups_vectorized(subset="train", remove=(), data_home=None,
download_if_missing=True, return_X_y=False):
-"""Load the 20 newsgroups dataset and transform it into tf-idf vectors \
+"""Load the 20 newsgroups dataset and vectorize it into token counts \
(classification).
Download it if necessary.
-This is a convenience function; the tf-idf transformation is done using the
-default settings for `sklearn.feature_extraction.text.Vectorizer`. For more
+This is a convenience function; the transformation is done using the
+default settings for
+:class:`sklearn.feature_extraction.text.CountVectorizer`. For more
advanced usage (stopword filtering, n-gram extraction, etc.), combine
-fetch_20newsgroups with a custom `Vectorizer` or `CountVectorizer`.
+fetch_20newsgroups with a custom
+:class:`sklearn.feature_extraction.text.CountVectorizer`,
+:class:`sklearn.feature_extraction.text.HashingVectorizer`,
+:class:`sklearn.feature_extraction.text.TfidfTransformer` or
+:class:`sklearn.feature_extraction.text.TfidfVectorizer`.
================= ==========
Classes 20
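The distinction this commit documents (token counts versus tf-idf weights) can be illustrated with a small sketch. It is not part of the commit: the three-document corpus `docs` and the variable names are made up for illustration, and the in-memory corpus stands in for the list of raw texts that `fetch_20newsgroups(...).data` would provide, so that nothing has to be downloaded. All three classes are used with their default settings.

```python
from sklearn.feature_extraction.text import (
    CountVectorizer,
    TfidfTransformer,
    TfidfVectorizer,
)

# Hypothetical stand-in corpus; with the real dataset one would pass
# fetch_20newsgroups(subset="train").data here instead (this downloads data).
docs = [
    "the rocket launch was delayed",
    "the hockey game was delayed",
    "rocket engines and rocket fuel",
]

# Token counts: a sparse integer matrix, one column per vocabulary term.
counts = CountVectorizer().fit_transform(docs)

# tf-idf obtained by reweighting the counts in a second step.
tfidf_from_counts = TfidfTransformer().fit_transform(counts)

# tf-idf obtained in one step directly from the raw text.
tfidf_direct = TfidfVectorizer().fit_transform(docs)

print(counts.dtype.kind)    # integer counts
print(tfidf_direct.dtype)   # floating-point weights
```

With default settings, the two tf-idf routes should give the same matrix, so callers who want tf-idf features rather than counts can pick whichever of the two fits their pipeline.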
