Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update documentation docstrings etc #107

Conversation

henrifroese
Copy link
Collaborator

@henrifroese henrifroese commented Jul 19, 2020

So this is quite a big PR that will finish the first part of #85 . We went through all docstrings and added examples/tests, added other arguments, and fixed some stuff along the way. We also updated the README.md and the getting-started.md

Besides the docstrings updates, some small code changes are:

  • more parameters for the representation functions
  • change to scatterplot to support 3d visualization and return figure correctly

I just went through some other issues and think that additionally this fixes

After this, in line with #85 , a new version should be deployed / published.

We can safely skip these doctests as they're only a wrapper around other modules. the doctests give rounding errors in travis CI anyway
@jbesomi
Copy link
Owner

jbesomi commented Jul 19, 2020

That's sounds amazing! Thank you, Henri and Max!! 🎉

Review:

  1. We should probably have separate PR for documentation improvements and API changes as this makes it easier to see the changes step-by-step. This for helping other contributors to quickly understand changes.
  2. You are adding many new arguments to the functions; that's super great! The only problem? We should unit-test all :S

What are your valuable opinions on that?

Thank you again, I'm impressed!! 👍

also expand "See Also" in representation.py
@henrifroese
Copy link
Collaborator Author

We should probably have separate PR for documentation improvements and API changes as this makes it easier to see the changes step-by-step. This for helping other contributors to quickly understand changes.

As we discussed, we'll do this in the future! 🙃

You are adding many new arguments to the functions; that's super great! The only problem? We should unit-test all :S

We only added arguments to functions from sklearn (CountVectorizer, PCA, Kmeans, ...) where the code calls the sklearn function with exactly the same arguments, so the only errors can come if sklearn is wrong. The only other function where arguments were added is scatterplot, but for this, we also only added an argument exactly from Plotly, so we should be good!

Soon Max and I will go through the code and the getting-started and README again to test out whether everything really works, and then I'll write here and convert this from a draft to a PR to be merged 🥅

@henrifroese
Copy link
Collaborator Author

Soon Max and I will go through the code and the getting-started and README again to test out whether everything really works, and then I'll write here and convert this from a draft to a PR to be merged goal_net

Just did this, fixed some typos, went through the whole code and the getting-started and REAMDE, added a nice 3d-plot to the getting-started 🦅 . We don't see any further issues 🤞

@henrifroese henrifroese marked this pull request as ready for review July 20, 2020 20:24
@henrifroese
Copy link
Collaborator Author

We're now wondering why we can't see our changes in "view deployment" in vercel above (the getting-started and the README which we both edited)?

@jbesomi
Copy link
Owner

jbesomi commented Jul 20, 2020

Good question. I don't really know ... :S
Did you check your changes locally btw? After having installed all dependencies:

cd website
npm run start

@jbesomi
Copy link
Owner

jbesomi commented Jul 20, 2020

It looks like your changes are visible: Getting Started

@henrifroese
Copy link
Collaborator Author

henrifroese commented Jul 21, 2020

Did you check your changes locally btw? After having installed all dependencies:
It looks like your changes are visible: Getting Started

They're not visible there, and they don't appear locally with npm run start, it's still the old version. There should be a second picture at the bottom of getting-started.md. Locally, in website/build/texthero, it's still the old files. Might be that we have to re-build or something? Or maybe that's done automatically when merging? (Sorry, my web dev knowledge is quite limited)

@jbesomi
Copy link
Owner

jbesomi commented Jul 21, 2020

Hey Henri!

It looks like your changes are visible: Getting Started

That's why if you look carefully you will see that actually the changes are there. Example: "we createdThis" => "we created this".

And, if you look at the source code(right-click "view source code" in Chrome), you will see that the image is there too!

The updated file is "scatterplot_bbcsport_3d.png" and not "scatterplot_bccsport_3d.png", that's why you don't see the images appear both locally and online. But the HTML code is there.

Review:

  1. In the README/getting-started, you are not mentioning and neither updating the code so that the representation functions accept a TokenSeries. Is that right? (for me is okay for now as long as we don't plan to push a new version on pypi)

@henrifroese
Copy link
Collaborator Author

The updated file is "scatterplot_bbcsport_3d.png" and not "scatterplot_bccsport_3d.png", that's why you don't see the images appear both locally and online. But the HTML code is there.

Wow, thanks, we would probably still wonder what's wrong without your help! 🥴 🤦‍♂️

We fixed this.

In the README/getting-started, you are not mentioning and neither updating the code so that the representation functions accept a TokenSeries. Is that right? (for me is okay for now as long as we don't plan to push a new version on pypi)

Just pushed changes to fix this. We now show in the README/getting-started how to use hero.tokenize and that it should be used.

@jbesomi
Copy link
Owner

jbesomi commented Jul 22, 2020

Hey Henri and Max!

I'm very sorry but I cannot accept this PR as it is.

Although your work is super good, this is too much for a single PR. There are too many different things going on and without caution, I would lose control over the codebase.

Initially, I thought we would be able to merge it all together, but by looking carefully I believe we will need to split in into self-contained and shorter PRs.

For example, we might split this PR into this sub-PRs:

  1. Improve term_frequency: add new arguments, better docstring
  2. Improve pca: add ...
  3. ...
  4. README.md fix mistakes
  5. README.md + getting-started: add tokenization (this will arrive just at the end, just before submitting a new version) and after having made sure everything (documentation + API) is correct (I will add some extra unit tests that check some pipelines.)

Overall review:

  1. Texthero's purpose is to avoid the complexity of scikit-learn. Texthero functions should be minimal and simple, i.e accepting just a few parameters. In most cases, the default settings should work for most of the users, otherwise, for custom work they can use scikit-learn.

For instance, for the tsne function, we need to understand which arguments are crucial and which are not. At this point, we can add the docstring for the selected ones and remove the others from the function signature.

  1. Sometimes the function Signature is enough explicit:
hero.do_something(s:pd.Series) -> pd.Series

Having "Parameters": Pandas Series and "Return": Pandas Series do not add anything.
With representation functions it might be useful as long as we add more information, for instance by saying that Input Pandas Series is a TokenSeries, the output is RepresentationSeries, etc (this will happen once we define all the different kind of Series ...)

  1. getting-started.md: the "Summary" part contains different code wrt to the tutorial (3d plot instead of 2d). This might be confusing. What if we keep it as it was and under Getting started - representation - scatterplot we will show an example with a 3D plot?

I'm sorry that this review is quite strict, I'm sure you will understand that this is for the good of ours community. If you believe the CONTRIBUTING.md was not clear enough, feel free to tell me and I will try to update the document according to your feedback! 👍

Thank you for your help!! :)

@henrifroese
Copy link
Collaborator Author

henrifroese commented Jul 22, 2020

I'm very sorry but I cannot accept this PR as it is.

No worries 🤓

Although your work is super good, this is too much for a single PR. There are too many different things going on and without caution, I would lose control over the codebase.
Initially, I thought we would be able to merge it all together, but by looking carefully I believe we will need to split in into self-contained and shorter PRs.

We completely understand! We'll split everything into small, self-contained PRs over the next days!

Texthero's purpose is to avoid the complexity of scikit-learn. Texthero functions should be minimal and simple, i.e accepting just a few parameters. In most cases, the default settings should work for most of the users, otherwise, for custom work they can use scikit-learn.

That makes sense, we'll take a close look at what parameters are really needed for most users to keep things simple!

Having "Parameters": Pandas Series and "Return": Pandas Series do not add anything.

Yes, we just saw that in the documentation you can see the types (e.g. f(s: pd.Series) -> pd.Series) anyway, so we'll leave this out for the simple functions.

getting-started.md: the "Summary" part contains different code wrt to the tutorial (3d plot instead of 2d). This might be confusing. What if we keep it as it was and under Getting started - representation - scatterplot we will show an example with a 3D plot?

Makes sense, we'll leave it out. It might make sense to leave it out completely from the getting-started as we'll write tutorials etc. later on anyways that showcase this in more detail, and it's not really "needed" to get started.

Also, I'll close this PR.

@mk2510

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants