Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite the top-level README to be more focused #1081

Merged
merged 23 commits into from Mar 24, 2020
Merged

Conversation

huonw
Copy link
Member

@huonw huonw commented Mar 15, 2020

This tweaks our top-level README to try to be a bit more focused and helpful:

  • rewrite some of the text to be a bit more relevant to data scientists, e.g. reading a bit more naturally by being slightly less precise/exact, and trying to draw the connection to their real data and problems
  • providing a concrete workflow to get started in two forms:
    • a Getting Started section: this includes providing basic Installation instructions, in addition to the detailed ones (users familiar with their environment and Python tooling can start with one of the basic ones, while the expanded section provides more guidance for those who need it)
    • a code example (GCN), that tries to emphasise how much of the workflow just builds on the normal ML/TF workflow, and finishes with links to the full GCN demo as well as the other examples
  • providing a "quick links" for getting help, as well as making sure that section can be found easily with a search (it mentions "help", "support", "contact us")
  • removing/moving some of the content that's more internal/developer focused

This doesn't touch the algorithms or references sections, since these are somewhat standalone, are good concepts, and can be tweaked as independent work. This also doesn't touch any other READMEs, such as those for the demos.

There's potentially many more tweaks we could do. Let me know if you think any are good.

Rendered form: https://github.com/stellargraph/stellargraph/blob/feature/725-readme/README.md

See: #725

@codeclimate
Copy link

codeclimate bot commented Mar 15, 2020

Code Climate has analyzed commit 905b17d and detected 0 issues on this pull request.

View more on Code Climate.

@huonw huonw marked this pull request as ready for review March 16, 2020 01:26
Copy link
Contributor

@kjun9 kjun9 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! I like the new way it's organised, with a better focus on getting started and getting help, and removing guiding principles from the top.

Some of my suggestions are based around trying to make the minimal example appear more minimal

README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
CONTRIBUTING.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
Co-Authored-By: kevin <33508488+kjun9@users.noreply.github.com>
Copy link
Member Author

@huonw huonw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @kjun9, I much prefer the new more minimal one.

README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
@huonw
Copy link
Member Author

huonw commented Mar 16, 2020

I've updated this with a few changes:

  • added @kjun9's suggestions
  • tweaked the comments/exact code from those suggestions very slightly
  • expanded the "StellarGraph supports analysis of ..." sentence into a list, covering more than just the homogeneous vs. heterogeneous split
  • moved the "The StellarGraph library offers state-of-the-art algorithms ..." section to be first in the Introduction, since it's the important bit, and the discussion of what graph data is is more clarification (I could easily be convinced this isn't an improvement, and some other approach is better 😄)
  • added more links, including to some of our blog posts

@huonw huonw requested a review from kjun9 March 16, 2020 05:09
Copy link
Contributor

@kjun9 kjun9 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great 👍

README.md Outdated
@@ -38,90 +38,132 @@
</p>


# Table of Contents
**StellarGraph** is a Python library for machine learning on [graphs and networks](https://en.wikipedia.org/wiki/Graph_\(discrete_mathematics\)).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great concise summary, however the line of text gets a bit lost between the badges and contents.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll try moving the badges above the # StellarGraph Machine Learning Library title, and see if that works better.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I forgot about this. @timpitman what do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it looks good, thanks!

README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
Copy link
Contributor

@timpitman timpitman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great refactor of the readme, well done! I've found a few minor issues. There's future work in expanding "getting started" that we're already looking at for #992.

README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
README.md Show resolved Hide resolved
@huonw huonw requested a review from timpitman March 17, 2020 03:37
Copy link
Contributor

@timpitman timpitman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

excellent!

@huonw huonw requested a review from habiba-h March 18, 2020 03:18
Copy link
Contributor

@kieranricardo kieranricardo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good!

README.md Show resolved Hide resolved
README.md Show resolved Hide resolved
README.md Outdated

Graph-structured data represent entities as nodes (or vertices) and relationships between them as edges (or links), along with associated data as attributes. For example, a graph can contain people as nodes and friendships between them as links, with data like a person's age and the date a friendship was established. StellarGraph supports analysis of many kinds of graphs:
Copy link
Contributor

@habiba-h habiba-h Mar 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

data as attributes

associated with nodes and/or edges. (Didn't sound complete to me without whose attributes we are talking about.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know how to phrase this without being a bit awkward. I went with

and can data include data associated with either as attributes

but I don't particularly like it. What do you think? Do you have a suggestion?


## Guiding Principles
- homogeneous (with nodes and links of one type),
- heterogeneous (with more than one type of nodes and/or links)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe a comma after all the bullets and an ', and' on the second last one and a full stop after the last.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm in two minds about this style. It does make it read as a sentence, but the hanging and and inconsistent punctuation can be disconcerting.

For instance, https://developers.google.com/tech-writing/one/lists-and-tables doesn't use this style, and the discussion of "keep items parallel" somewhat implies against it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a confident opinion about the style but having only bullet point with a comma at the end bothered me :-).
I usually do commas and the comma and when having bullets that are not starting as standalone sentences. I don't know when I started doing that but held on to that convention. I am down with any popular style guide :-).

README.md Show resolved Hide resolved
supervised classifier training for the downstream task.
- See the demo in folder `demos/node-classification-hinsage` for examples of how to predict attributes of nodes
using the HinSAGE algorithm for given node features and training labels.
## Getting Help
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Getting Help section would look more appropriate at the end before the references?
Usually Help tab or link is towards the end. Also, the way I see it is that the Getting Help is for everything, code, installation, and all else. So once, all the things are already mentioned here in the readme, if still a user needs here, here are all the ways they can get it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I replied to this on your comment on the table of contents)


**StellarGraph** is a Python library for machine learning on [graphs and networks](https://en.wikipedia.org/wiki/Graph_\(discrete_mathematics\)).

## Table of Contents
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@huonw I have a bit of a different take on the structure and this is only meant as a suggestion, so take whatever makes sense in the end.

  1. Introduction the overarching explanation of what Graph ML is and what StellarGraph does
  2. Algorithms short teaser of all the methods implemented in the library
  3. Installation the structure right now tells a user how to install with demo before installing the library. I think the basic library installation should be upfront. The demo are the optional bit that if a user is interested in testing the functionality of the library before they can install with the demos.
  4. Sample demos here you point out to the demos directory.
    4.1 Example The detailed example should follow right after the general introduction to the sample demos. (the example looks much more user friendly to me now :-)). I have some other comments that I will add to the example.
  5. Getting Help _I think the help is for all the things above, installation, understanding of approaches, demos etc. so should come at the end of everything. _
  6. Citing
  7. References

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Algorithms short teaser of all the methods implemented in the library

I put this towards the end because I felt it's mainly helpful for someone familiar with graphs and graph ML. For instance, someone who is looking for a specific algorithm. They might either come here through a google search for that algorithm name or searching for graph ML libraries, and then either ctrl-F for the specific algorithm or jump to the section.

For someone unfamiliar with graph ML (or, at least, unfamiliar with the specific algorithms), I don't think the current table of algorithms is very easily digested, and so I'm concerned readers will get distracted before wading through it all.

Installation the structure right now tells a user how to install with demo before installing the library. I think the basic library installation should be upfront. The demo are the optional bit that if a user is interested in testing the functionality of the library before they can install with the demos.

I put this later because it's a bit of a "reference" section. If a user has decided to install StellarGraph, they can easily jump to the relevant section; if they haven't decided to install it, it's mostly distraction. For the latter, one thing it is useful for might be to emphasise how easy it is to install, so I've added:

It is thus also [easy to install with `pip` or Anaconda](#Installation).

to the end of the last paragraph of the introduction.

Sample demos here you point out to the demos directory.
4.1 Example The detailed example should follow right after the general introduction to the sample demos. (the example looks much more user friendly to me now :-)). I have some other comments that I will add to the example.

To be specific, you think it's better to have a "Sample demos" section than a "Getting started" one? I feel that the "getting started" phrasing is more obvious as a place for users to look: it is talking about what they want to do (start using StellarGraph), rather than the route we're suggesting they use (look at demos).

Getting Help _I think the help is for all the things above, installation, understanding of approaches, demos etc. so should come at the end of everything. _

As touched upon my some of my earlier comments, I don't see this as a linear document because there's some sections that many uses will not look at (because they don't need to, for their particular use-case/background, e.g. if pip install stellargraph is fine for a user, they don't need to look at the Anaconda or docker installation instructions). In addition, I think we should be emphasising the getting help section: we are not getting too many issues at the moment, so we should be happy to help anyone with any problem related to StellarGraph. It's better for someone to ask a question that has already been answered (but they didn't find) than to not ask and give up using StellarGraph.


# convert the raw data into StellarGraph's graph format for faster operations
graph = sg.StellarGraph(nodes, edges)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this part should be split. Creating StellarGraph is an independent task than the Generator and other model specific tasks.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in theory it is, but I put this here for 3 reasons:

  • three has a reputation as being a good number for writing https://en.wikipedia.org/wiki/Rule_of_three_(writing)
  • put all of the StellarGraph stuff together so that we can say that the first and third sections are just normal data science, to emphasise that graph ML is not a new workflow
  • I couldn't think of a third reason, but 3 is nice 😄

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't know about the rule of three. Learnt something new :-D.

README.md Show resolved Hide resolved
README.md Show resolved Hide resolved
README.md Show resolved Hide resolved
README.md Show resolved Hide resolved
Copy link
Member Author

@huonw huonw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the thoughtful review @habiba-h, I've made some updates based on some, and replied with some questions for the others.

README.md Outdated
@@ -38,90 +38,132 @@
</p>


# Table of Contents
**StellarGraph** is a Python library for machine learning on [graphs and networks](https://en.wikipedia.org/wiki/Graph_\(discrete_mathematics\)).
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I forgot about this. @timpitman what do you think?

README.md Outdated

Graph-structured data represent entities as nodes (or vertices) and relationships between them as edges (or links), along with associated data as attributes. For example, a graph can contain people as nodes and friendships between them as links, with data like a person's age and the date a friendship was established. StellarGraph supports analysis of many kinds of graphs:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know how to phrase this without being a bit awkward. I went with

and can data include data associated with either as attributes

but I don't particularly like it. What do you think? Do you have a suggestion?


## Guiding Principles
- homogeneous (with nodes and links of one type),
- heterogeneous (with more than one type of nodes and/or links)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm in two minds about this style. It does make it read as a sentence, but the hanging and and inconsistent punctuation can be disconcerting.

For instance, https://developers.google.com/tech-writing/one/lists-and-tables doesn't use this style, and the discussion of "keep items parallel" somewhat implies against it.

from sklearn import model_selection

# load data into Pandas DataFrames, e.g. from CSV files or a database
nodes, edges, targets = load_my_data()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, there's no load_my_data function in StellarGraph. We previously spelled this out in more detail, e.g. https://github.com/stellargraph/stellargraph/blob/1aa1188fbc0ea5ae7f65f1861dc5e8f31f44f3e9/README.md#example-gcn shows the pre-review version of the README, but we cut it out because it allowed reducing the amount of code significantly (see #1081 (comment)).

As before, this example is focusing on giving the high-level taster for StellarGraph; the demo notebook seems like a better place to look for copy/pasteable code, and so I think the trade-off of having an implied "write your own code" section is ok. You make a good point that it's a little ambiguous, so I've added a stub function:

import pandas as pd
from sklearn import model_selection

def load_my_data():
    # your own code to load data into Pandas DataFrames, e.g. from CSV files or a database
    ...

nodes, edges, targets = load_my_data()

# Use scikit-learn to compute training and test sets
train_targets, test_targets = model_selection.train_test_split(targets, train_size=0.5)


# convert the raw data into StellarGraph's graph format for faster operations
graph = sg.StellarGraph(nodes, edges)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in theory it is, but I put this here for 3 reasons:

  • three has a reputation as being a good number for writing https://en.wikipedia.org/wiki/Rule_of_three_(writing)
  • put all of the StellarGraph stuff together so that we can say that the first and third sections are just normal data science, to emphasise that graph ML is not a new workflow
  • I couldn't think of a third reason, but 3 is nice 😄


**StellarGraph** is a Python library for machine learning on [graphs and networks](https://en.wikipedia.org/wiki/Graph_\(discrete_mathematics\)).

## Table of Contents
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Algorithms short teaser of all the methods implemented in the library

I put this towards the end because I felt it's mainly helpful for someone familiar with graphs and graph ML. For instance, someone who is looking for a specific algorithm. They might either come here through a google search for that algorithm name or searching for graph ML libraries, and then either ctrl-F for the specific algorithm or jump to the section.

For someone unfamiliar with graph ML (or, at least, unfamiliar with the specific algorithms), I don't think the current table of algorithms is very easily digested, and so I'm concerned readers will get distracted before wading through it all.

Installation the structure right now tells a user how to install with demo before installing the library. I think the basic library installation should be upfront. The demo are the optional bit that if a user is interested in testing the functionality of the library before they can install with the demos.

I put this later because it's a bit of a "reference" section. If a user has decided to install StellarGraph, they can easily jump to the relevant section; if they haven't decided to install it, it's mostly distraction. For the latter, one thing it is useful for might be to emphasise how easy it is to install, so I've added:

It is thus also [easy to install with `pip` or Anaconda](#Installation).

to the end of the last paragraph of the introduction.

Sample demos here you point out to the demos directory.
4.1 Example The detailed example should follow right after the general introduction to the sample demos. (the example looks much more user friendly to me now :-)). I have some other comments that I will add to the example.

To be specific, you think it's better to have a "Sample demos" section than a "Getting started" one? I feel that the "getting started" phrasing is more obvious as a place for users to look: it is talking about what they want to do (start using StellarGraph), rather than the route we're suggesting they use (look at demos).

Getting Help _I think the help is for all the things above, installation, understanding of approaches, demos etc. so should come at the end of everything. _

As touched upon my some of my earlier comments, I don't see this as a linear document because there's some sections that many uses will not look at (because they don't need to, for their particular use-case/background, e.g. if pip install stellargraph is fine for a user, they don't need to look at the Anaconda or docker installation instructions). In addition, I think we should be emphasising the getting help section: we are not getting too many issues at the moment, so we should be happy to help anyone with any problem related to StellarGraph. It's better for someone to ask a question that has already been answered (but they didn't find) than to not ask and give up using StellarGraph.

README.md Show resolved Hide resolved
supervised classifier training for the downstream task.
- See the demo in folder `demos/node-classification-hinsage` for examples of how to predict attributes of nodes
using the HinSAGE algorithm for given node features and training labels.
## Getting Help
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I replied to this on your comment on the table of contents)

README.md Show resolved Hide resolved
README.md Show resolved Hide resolved
Copy link
Contributor

@timpitman timpitman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The addition of our email contact to the README looks good. This did get me thinking about CONTRIBUTING.md though - should it also have the contact details (or a link back to the README?), and also the README itself doesn't link to CONTRIBUTING.md.

@huonw
Copy link
Member Author

huonw commented Mar 24, 2020

Thanks for the reviews everyone, I think it's now much better than my first version! 😄

@huonw huonw merged commit 9ef9a1f into develop Mar 24, 2020
@huonw huonw deleted the feature/725-readme branch March 31, 2020 03:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants