How to input tokenized sentences into sklearn.feature_extraction.text.TfidfVectorizer  #17279

@DachuanZhao

Description

I found this example in the documentation:

from sklearn.feature_extraction.text import TfidfVectorizer
corpus = [
    'This is the first document.',
    'This document is the second document.',
    'And this is the third one.',
    'Is this the first document?',
]
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(corpus)

But in my real NLP project, I tokenize sentences in my own way, which means the corpus is a list of token lists, like this:

from sklearn.feature_extraction.text import TfidfVectorizer
corpus_list = [
    ['This', 'is', 'the', 'first', 'document', '.'],
    ['This', 'document', 'is', 'the', 'second', 'document', '.'],
    ['And', 'this', 'is', 'the', 'third', 'one', '.'],
    ['Is', 'this', 'the', 'first', 'document', '?']
]
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(corpus_list)

I think TfidfVectorizer should support this input, because we often need a complex tokenization strategy.
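One known workaround is to pass an identity function as the `analyzer`, so that `TfidfVectorizer` treats each document as an already-tokenized list instead of a raw string. This is a sketch of that approach, not an official example; note that it also bypasses lowercasing, stop-word filtering, and n-gram extraction, since those all happen inside the default analyzer.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

corpus_list = [
    ['This', 'is', 'the', 'first', 'document', '.'],
    ['This', 'document', 'is', 'the', 'second', 'document', '.'],
    ['And', 'this', 'is', 'the', 'third', 'one', '.'],
    ['Is', 'this', 'the', 'first', 'document', '?'],
]

# The analyzer receives each document and must return its tokens;
# an identity function passes the pre-tokenized list through unchanged.
vectorizer = TfidfVectorizer(analyzer=lambda tokens: tokens)
X = vectorizer.fit_transform(corpus_list)

# Tokens are kept exactly as given, case and punctuation included.
print(sorted(vectorizer.vocabulary_))
```

One caveat: a `lambda` in the constructor cannot be pickled, so if the fitted vectorizer needs to be serialized, define the identity function at module level instead.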
