Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added Remove Tags and Replace Tags #50

Merged
merged 2 commits into from
Jul 9, 2020

Conversation

ishanarora04
Copy link
Contributor

No description provided.

@ishanarora04
Copy link
Contributor Author

Hey @jbesomi , Can you check this PR ?

@jbesomi
Copy link
Owner

jbesomi commented Jul 9, 2020

Hey @ishanarora04, thank you for your PR! Great job 👍

This PR looks very similar to PR #47 and #35. Please, have a look at it for some important feedbacks.

Unfortunately, this PR cannot be merged it's failing the CI test and it has also some minor problems (see implementation issues).

Furthermore, may you please add some extra unit-tests for checking all the edge cases (@abc should be removed, whereas abc@def no, @ abc should not be removed (extra space between @ and abc) , #123 should be removed, ... and so on )?

Implementation issues

  1. Before open a new PR, it's always really useful to look at previous code. If look carefully, you will notice that all replace_x, remove_x follow the same patter: remove_x just call replace_x whth symbol= . We should stick to the same format.
  2. The regular expression does not take into account numbers in the tags, right? I would say a tag must be formed of alphanumeric character (upper and lowercase) + numeric character

If anything is unclear, please just let me know!

Read contribution.md
If you haven't already done, have a look at CONTRIBUTING.md

@ishanarora04
Copy link
Contributor Author

Hey @jbesomi Can I have a recheck over here ?

@jbesomi
Copy link
Owner

jbesomi commented Jul 9, 2020

Hey @ishanarora04, that looks super great!!

As we aim for great documentation (Texthero's users will be very grateful for your work), there are still some details we can improve.

  1. Unit tests + code looks great!
  2. Add Parameters under the docstring for replace_tags (for the remove_tags this is not necessary).
    You can look at texthero.preprocessing.replace_stopwords for an example
  3. replace_tags: it's written "replace_tags replace any tags with an empty space.", this is actually not what it's doing. Right? Should be replaced with something like "Replace all tags in the given Pandas Series with symbol".
  4. remove_tags: it's written "remove_tags removes any tags with an empty space.". Maybe "remove_tags remove any tags and it replaces it with an empty space." Gives a better idea?
  5. In both docstring, it's for sure useful to define what it is a tag. "A tag is a string formed by @ concatenated with a sequence composed of characters and digits. Example: @hero21" (feel free to rewrite it yourself to make it even more clear)

After these last changes, probably the PR will be good to be merged 👍

@ishanarora04
Copy link
Contributor Author

Documentation updated

@jbesomi
Copy link
Owner

jbesomi commented Jul 9, 2020

Thank you, great job!

I will merge it now but will change some minor details later today:

  • the word "them" can probably not be used to refer inanimate object, it's that possible?
  • Under parameters, there should not be the tab space (Have a look at the numpy docstring guide)
  • The docstring should be just after the function declaration, no extra empty line there (this is a best practice)

So, congrats on your first contribution!

Next step? You can add your name under "Contributors" on the README and send us a PR. I will be glad to accept it!

Next next step? You can do about the same you did now but for hashtags ( PR #30) 🎉🎉

@jbesomi
Copy link
Owner

jbesomi commented Jul 9, 2020

(Edited)

Just noticed you added an extra commit. In general, please keep all commits and do not "force-push". That makes it easier to see what you changed from one commit to the other. When the PR will be merged, only at this point we will squash all commit.

Also, for each new commit, it would be great if you can write a custom title that well-represent what you changed/improved, again: to make the process more simple to all :)

@ishanarora04
Copy link
Contributor Author

ishanarora04 commented Jul 9, 2020

Thanks @jbesomi. I will keep in mind for future. Just wanted to keep it to single commit for the feature. I have updated the README.md. please consider the PR.

@jbesomi
Copy link
Owner

jbesomi commented Jul 9, 2020

Oh, sorry, my bad; I wasn't clear enough. The PR for the change of the README should be another one. One PR should be single scoped; in this case solve issue #49.

@ishanarora04
Copy link
Contributor Author

I hope it's all clear now. Raised a separate PR for Contributors.

@jbesomi jbesomi merged commit ccdd3af into jbesomi:master Jul 9, 2020
@jbesomi
Copy link
Owner

jbesomi commented Jul 9, 2020

All clear! Thank you!

henrifroese pushed a commit to SummerOfCode-NoHate/texthero that referenced this pull request Jul 12, 2020
* Added Remove Tags and Replace Tags

* removed contributor
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add replace_tags, remove_tags
2 participants