Preprocessing: explain how to create a custom pipeline #38

Open
jbesomi opened this issue Jul 8, 2020 · 5 comments
Labels
documentation Improvements or additions to documentation

Comments

@jbesomi
Owner

jbesomi commented Jul 8, 2020

Add a section to Getting Started - Preprocessing that explains how to create a custom pipeline. This solution is simpler than #9.

Explain in the docstring of clean how to create a custom pipeline. Code example:

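import texthero as hero
import pandas as pd

s = pd.Series(["is is a stopword"])
custom_set_of_stopwords = {'is'}

# Each pipeline step is called with the Series only, so a step that needs
# extra arguments (here, the custom stopwords) is wrapped in a lambda.
pipeline = [
    lambda s: hero.remove_stopwords(s, stopwords=custom_set_of_stopwords)
]

s.pipe(hero.clean, pipeline=pipeline)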
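A minimal, self-contained sketch of why the lambda is needed, assuming clean simply pipes the Series through each step in order and calls every step with the Series as its only argument. The clean and remove_stopwords functions below are simplified stand-ins written for illustration (they only split on whitespace), not texthero's actual implementations, so texthero's real output may differ, for example in spacing:

```python
import pandas as pd


def remove_stopwords(s: pd.Series, stopwords=None) -> pd.Series:
    # Simplified stand-in for hero.remove_stopwords: whitespace tokenization only.
    stopwords = set(stopwords or [])
    return s.apply(lambda text: " ".join(w for w in text.split() if w not in stopwords))


def clean(s: pd.Series, pipeline) -> pd.Series:
    # Simplified stand-in for hero.clean: each step receives only the Series.
    for step in pipeline:
        s = s.pipe(step)
    return s


custom_set_of_stopwords = {"is"}

# The lambda turns remove_stopwords into a one-argument function of the
# Series while still passing the custom stopwords.
pipeline = [lambda x: remove_stopwords(x, stopwords=custom_set_of_stopwords)]

s = pd.Series(["is is a stopword"])
result = s.pipe(clean, pipeline=pipeline)
print(result[0])  # a stopword
```

Because clean never forwards extra arguments to the steps, any parameter a step needs has to be bound before the step goes into the list.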
@cedricconol
Contributor

Hey @jbesomi, should we include all methods from the default clean pipeline in the example?

@jbesomi
Owner Author

jbesomi commented Jul 21, 2020

Hey Cedric, what do you mean by that? The idea here is to explain how to create a custom pipeline ...

@cedricconol
Contributor

cedricconol commented Jul 21, 2020

@jbesomi, sorry, that wasn't clear. I'm asking how we should explain creating a custom pipeline in clean's docstring.

import texthero as hero
import pandas as pd

s = pd.Series(["is is a stopword"])
custom_set_of_stopwords = {'is'}

pipeline = [
    lambda s: hero.remove_stopwords(s, stopwords=custom_set_of_stopwords)
]

s.pipe(hero.clean, pipeline=pipeline)

^^ The example you gave shows how to customize the pipeline by passing a custom set of stopwords to remove_stopwords. I'm wondering whether you want to add more examples to the docstring that show other ways to customize the pipeline.
For example, showing how to use only some of the methods in the default pipeline:

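import pandas as pd
import texthero as hero
from texthero import preprocessing

df = pd.DataFrame({'text': ["  Some TEXT  ", None]})

# Keep only a subset of the default cleaning steps
custom_pipeline = [preprocessing.fillna,
                   preprocessing.lowercase,
                   preprocessing.remove_whitespace]
df['clean_text'] = hero.clean(df['text'], custom_pipeline)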

Actually, I'm not sure it's a good idea to show more than one example in the docstring; maybe a more detailed explanation belongs in a separate Getting Started - Preprocessing section, as you suggested.
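A self-contained sketch of the subset idea, assuming clean applies the listed steps in order and nothing else. The fillna, lowercase, remove_whitespace and clean functions below are simplified stand-ins for the texthero ones, written for illustration only:

```python
import pandas as pd


# Simplified stand-ins for texthero.preprocessing.fillna, lowercase and
# remove_whitespace (illustrative only, not texthero's implementations).
def fillna(s: pd.Series) -> pd.Series:
    return s.fillna("")


def lowercase(s: pd.Series) -> pd.Series:
    return s.str.lower()


def remove_whitespace(s: pd.Series) -> pd.Series:
    # Collapse runs of whitespace into single spaces and strip the ends.
    return s.apply(lambda text: " ".join(text.split()))


def clean(s: pd.Series, pipeline) -> pd.Series:
    # Stand-in for hero.clean: apply each step to the Series in order.
    for step in pipeline:
        s = s.pipe(step)
    return s


df = pd.DataFrame({"text": ["  Hello   WORLD ", None]})

# Only three steps, so the other default cleaning steps are skipped.
custom_pipeline = [fillna, lowercase, remove_whitespace]
df["clean_text"] = clean(df["text"], custom_pipeline)
print(df["clean_text"].tolist())  # ['hello world', '']
```

fillna comes first so that the later string steps never see a missing value.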

@jbesomi
Owner Author

jbesomi commented Jul 24, 2020

The second snippet you showed is basically the one from get_default_pipeline. This issue is rather meant to show how to create a custom pipeline with functions that require additional arguments, such as a custom set of stopwords ...
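Besides a lambda, Python's standard functools.partial can bind the extra arguments ahead of time. A minimal sketch, assuming clean calls each step with the Series only; lowercase, remove_stopwords and clean below are simplified stand-ins for the texthero functions, written for illustration:

```python
from functools import partial

import pandas as pd


def remove_stopwords(s: pd.Series, stopwords=None) -> pd.Series:
    # Simplified stand-in for hero.remove_stopwords: whitespace tokenization only.
    stopwords = set(stopwords or [])
    return s.apply(lambda text: " ".join(w for w in text.split() if w not in stopwords))


def lowercase(s: pd.Series) -> pd.Series:
    # Stand-in for hero.lowercase.
    return s.str.lower()


def clean(s: pd.Series, pipeline) -> pd.Series:
    # Stand-in for hero.clean: call each step with the Series only.
    for step in pipeline:
        s = s.pipe(step)
    return s


# partial() binds the keyword argument up front, so every entry is still a
# function of the Series alone, without writing a lambda.
pipeline = [
    lowercase,
    partial(remove_stopwords, stopwords={"is"}),
]

result = clean(pd.Series(["This IS a stopword"]), pipeline)
print(result[0])  # this a stopword
```

Order matters here: lowercase runs before remove_stopwords, so "IS" is lowercased and then removed.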

@cedricconol
Contributor

I see, that is much clearer now. Can you give me some pointers on what should be added to the docstring? Is adding the code below to the Examples section enough?

import texthero as hero
import pandas as pd

s = pd.Series(["is is a stopword"])
custom_set_of_stopwords = {'is'}

pipeline = [
    lambda s: hero.remove_stopwords(s, stopwords=custom_set_of_stopwords)
]

s.pipe(hero.clean, pipeline=pipeline)
