Documentation structure #8
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,299 @@ | ||
# Documentation | ||
|
||
For the purpose of this proposal document we consider all of the following as documentation: | ||
|
||
- documentation in our code within the Fairlearn repository | ||
- Jupyter notebooks | ||
- project website | ||
- blog posts | ||
|
||
Throughout the document we compare with the documentation of other popular | ||
projects such as | ||
|
||
- [scikit-learn](https://scikit-learn.org/) | ||
- [pandas](https://pandas.pydata.org/) | ||
|
||
## Goals | ||
|
||
The documentation should be | ||
|
||
- discoverable, ideally in a single place as opposed to multiple | ||
- clear | ||
- concise when describing individual pieces of functionality | ||
- detailed when describing entire application scenarios, e.g. in the form of | ||
example notebooks | ||
- available for the latest version, but if possible also for past versions | ||
([example](https://scikit-learn.org/dev/versions.html)) | ||
- maintainable: it should be simple or at least clear for maintainers how to | ||
update/validate | ||
- without ads (readthedocs always has ads that are shown alongside our | ||
documentation) | ||
|
||
## Proposal | ||
|
||
As with most projects, the website, [fairlearn.org](http://fairlearn.org), will | ||
be the central place to look for documentation. | ||
From there visitors have various paths to explore content depending on what | ||
they are looking for, as detailed in the following subsections. | ||
|
||
### Homepage | ||
|
||
``` | ||
website | ||
|--- About | ||
|--- Quickstart | ||
|--- API reference | ||
|--- User guide | ||
|--- Example Notebooks | ||
|--- Contributor guide | ||
``` | ||
|
||
### About | ||
|
||
This page provides a high-level overview of Fairlearn including | ||
|
||
- mission | ||
- Who does the project serve? Who are current users? | ||
- project roadmap | ||
- governance structure | ||
- history of the project | ||
- FAQ section | ||
|
||
It serves as a primary entrypoint for people who want to understand what | ||
Fairlearn's purpose is (mission), what's coming up (roadmap), and how the | ||
project is set up and controlled (governance). The FAQ section should round | ||
this out by providing answers to frequently asked questions. We already have | ||
lots of recurring questions that would make sense there. | ||
|
||
### Quickstart | ||
|
||
This page provides information on | ||
|
||
- Installation | ||
- brief introduction/framing of fairness in ML | ||
- including basic terminology (perhaps link to more comprehensive section on a | ||
different page) | ||
- walk-through | ||
- load data | ||
- mitigate disparity of an estimator | ||
- evaluate a few metrics | ||
- run the dashboard | ||
- links showing where to go next, e.g. links to section of the user guide | ||
|
||
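As a rough illustration of the "evaluate a few metrics" step in the walk-through above, the sketch below computes a per-group selection rate and the gap between groups in plain Python. It deliberately does not use Fairlearn's API; the function names `selection_rate_by_group` and `selection_rate_gap` and the toy data are assumptions made up for this example.

```python
from collections import defaultdict


def selection_rate_by_group(y_pred, sensitive_features):
    """Return the fraction of positive predictions (1s) for each group."""
    positives = defaultdict(int)
    totals = defaultdict(int)
    for pred, group in zip(y_pred, sensitive_features):
        totals[group] += 1
        positives[group] += int(pred == 1)
    return {group: positives[group] / totals[group] for group in totals}


def selection_rate_gap(y_pred, sensitive_features):
    """Return the largest difference in selection rate between any two groups."""
    rates = selection_rate_by_group(y_pred, sensitive_features)
    return max(rates.values()) - min(rates.values())


# Toy predictions and group labels, invented for illustration only.
y_pred = [1, 0, 1, 1, 0, 0, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

rates = selection_rate_by_group(y_pred, groups)
gap = selection_rate_gap(y_pred, groups)
print(rates)
print(gap)
```

A real quickstart would presumably call the corresponding Fairlearn metrics instead; the point here is only how small the code for this step can be.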
Installation should cover various platforms, which should be very | ||
straightforward. Any recurring patterns in reported issues should be listed, | ||
as well as how to troubleshoot them. It may end up similar to | ||
[this guide](https://scikit-learn.org/dev/install.html). | ||
|
||
Examples: | ||
|
||
- [pandas](https://pandas.pydata.org/getting_started.html) | ||
- [scikit-learn](https://scikit-learn.org/dev/getting_started.html) | ||
|
||
This should be very similar to what we currently have in our README, so a lot | ||
of the content won't be entirely new. | ||
|
||
### API reference | ||
|
||
|
||
This is simply the generated documentation from our code using docstrings. | ||
Currently we host this in readthedocs, but we want to include it on our | ||
webpage. A good example of this is | ||
[scikit-learn](https://scikit-learn.org/dev/modules/classes.html). | ||
|
||
### User guide | ||
|
||
The user guide explains all parts of Fairlearn by providing context that | ||
wouldn't fit into the code documentation such as mathematical derivations, | ||
but without using application-specific context (as we'd find it in the | ||
"example notebooks"). The guides are grouped by topic, e.g. | ||
|
||
1. What we mean by fairness in ML - should properly frame fairness as a | ||
sociotechnical challenge incl. | ||
- considering harms instead of biases | ||
- why "debiasing" is not possible | ||
- fairness through unawareness and why it is not sufficient | ||
- the ML lifecycle and how individual stages can affect fairness | ||
- AI systems need to be designed around the people they affect | ||
(specifically subpopulations that may be harmed by a system) | ||
- some reference to the | ||
[fairness checklist](https://www.microsoft.com/en-us/research/publication/co-designing-checklists-to-understand-organizational-challenges-and-opportunities-around-fairness-in-ai/) | ||
- ... | ||
1. Assessment | ||
1. Fairness definitions | ||
1. ... | ||
1. Metrics | ||
1. ... | ||
1. Dashboard | ||
1. ... | ||
1. Mitigation | ||
1. Postprocessing | ||
1. Threshold Optimizer | ||
1. Reductions methods | ||
1. Exponentiated Gradient | ||
1. Grid Search | ||
|
||
Importantly, the code samples should be minimal. For comprehensive examples | ||
we have the "Example Notebooks" section. In comparison, this section is more | ||
like a tutorial. It's about showing how to use our API while elaborating on | ||
mathematical background that we can't explain in API documentation. | ||
|
||
Examples: | ||
|
||
- [scikit-learn](https://scikit-learn.org/dev/user_guide.html) | ||
- [pandas](https://pandas.pydata.org/docs/user_guide/index.html) | ||
|
||
Of our current notebooks the following may be most suitable as "user guides": | ||
|
||
- [Group Metrics](https://github.com/fairlearn/fairlearn/blob/master/notebooks/Group%20Metrics.ipynb) - | ||
a great example for something that should be a user guide | ||
- [Grid Search for Binary Classification](https://github.com/fairlearn/fairlearn/blob/master/notebooks/Grid%20Search%20for%20Binary%20Classification.ipynb) - | ||
the purpose is mostly to show Grid Search's functionality; perhaps it may need to be trimmed down to the essentials about Grid Search | ||
- [Grid Search with Census Data](https://github.com/fairlearn/fairlearn/blob/master/notebooks/Grid%20Search%20with%20Census%20Data.ipynb) - | ||
similar to the previous notebook it covers Grid Search; we could leverage some of this for a user guide for Grid Search, or alternatively for the dashboard visualizations | ||
|
||
### Example Notebooks | ||
|
||
The purpose of the example notebooks is to walk through an application of | ||
Fairlearn in detail. Any application of a fairness toolkit needs to be done | ||
with great care while taking into account an entire range of concerns due to | ||
the sociotechnical nature of fairness. The showcased notebooks will provide | ||
the space to cover scenarios in depth. The focus is not only on showing | ||
example usage of the Fairlearn toolkit, but on how to approach fairness in ML | ||
in general. We may want to add a scenario even if it uses only a few of | ||
Fairlearn's capabilities, as long as it otherwise serves as a great example of | ||
how to build AI responsibly. | ||
|
||
All the example notebooks should be downloadable as Jupyter notebooks and | ||
Python source code, and be launchable in [Binder](https://mybinder.org/) or a | ||
similar platform. | ||
|
||
Note: [scikit-learn](https://scikit-learn.org/dev/auto_examples/index.html) | ||
refers to these as "Examples". However, they use them to highlight a specific | ||
aspect of a feature/model. For Fairlearn it would be more about a properly | ||
framed example from a fairness point of view. | ||
|
||
Of our current notebooks the following would be most closely aligned with | ||
this section: | ||
|
||
- [Binary Classification on COMPAS dataset](https://github.com/fairlearn/fairlearn/blob/master/notebooks/Binary%20Classification%20on%20COMPAS%20dataset.ipynb), | ||
although we should perhaps consider removing it since it may not do | ||
justice to this complex setup | ||
- [Binary Classification with the UCI Credit-card Default Dataset](https://github.com/fairlearn/fairlearn/blob/master/notebooks/Binary%20Classification%20with%20the%20UCI%20Credit-card%20Default%20Dataset.ipynb) | ||
- [Mitigating Disparities in Ranking from Binary Data](https://github.com/fairlearn/fairlearn/blob/master/notebooks/Mitigating%20Disparities%20in%20Ranking%20from%20Binary%20Data.ipynb) | ||
|
||
### Contributor Guide | ||
|
||
We want to ensure people know | ||
|
||
- ways to contribute | ||
- It should be clear how people can reach out if they want to contribute | ||
to Fairlearn, and where they can find small items to get started. | ||
- the repository structure / organization of work | ||
- Fairlearn proposals | ||
- how to contribute code | ||
- Moments | ||
- ... | ||
- how to contribute notebooks | ||
- style guide | ||
- good workflow for editing (from `.ipynb` to `.py` etc.) | ||
- ... | ||
|
||
[This](https://scikit-learn.org/dev/developers/contributing.html) is an | ||
example of how scikit-learn handles it through a contribution guide that is | ||
somewhat similar to the one we currently have in the repo. | ||
|
||
Some projects have a page showing the maintainers as well: | ||
|
||
- [scikit-learn](https://scikit-learn.org/stable/about.html#people) | ||
- [pandas](https://pandas.pydata.org/about/team.html) | ||
|
||
## Required steps | ||
|
||
1. Get GitHub Pages page/repository up and running | ||
1. Set up CI to deploy current documentation there automatically | ||
1. Set up CI to make documentation changes viewable, i.e. the generated | ||
HTML pages need to be visible (CircleCI) | ||
1. Establish webpage section as outlined above (Quickstart, User guide, etc.) | ||
1. Convert existing content, including reformatting markdown as | ||
ReST. | ||
- We already have the examples gallery thanks to Adrin's work on | ||
`sphinx-gallery`. There will be plenty of work to convert existing | ||
notebooks to ReST example notebooks (or user guides) as mentioned in | ||
earlier sections. If this is very laborious we can consider shortcuts | ||
for the short-term such as linking to GitHub notebooks, or using a | ||
Jupyter plugin for `sphinx`. | ||
- Related: Document notebook development process (see separate section | ||
below) | ||
1. Write the remaining content for all sections. | ||
1. We need to find a way to present the dashboard in a website | ||
where it can't be interactive. Perhaps with screenshots for the user | ||
guides, but the example notebooks are downloadable as Jupyter notebooks. | ||
[Could we perhaps pre-calculate all metrics and show the interactive | ||
dashboard in the example notebooks? There may be a sphinx extension for | ||
typescript] | ||
1. integrate landing page (will be provided by a designer), everything | ||
else from the Fairlearn project repositories | ||
1. add style template from pandas | ||
1. remove API doc from readthedocs | ||
1. ensure the navigation from homepage to the other sections works | ||
- manual testing | ||
1. automated testing of navigation/broken links | ||
1. Add an example notebook to show how to use estimators from various packages | ||
with our mitigation techniques. | ||
|
||
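The "automated testing of navigation/broken links" step could start from something like the sketch below, which uses only the Python standard library. It is a minimal illustration, not a proposal for a specific tool: the `LinkCollector` class, the `find_broken_internal_links` function and the miniature site are all hypothetical, and a Sphinx-based build may already offer a link check that makes this unnecessary.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href targets of all <a> tags in one HTML page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def find_broken_internal_links(pages, base_url):
    """Return sorted (page, href) pairs whose internal target is not in `pages`.

    `pages` maps a site-relative path such as "/quickstart.html" to its HTML.
    External links are skipped; checking them would need network access.
    """
    base_host = urlparse(base_url).netloc
    broken = []
    for path, html in pages.items():
        collector = LinkCollector()
        collector.feed(html)
        for href in collector.hrefs:
            # Resolve relative links against the current page and drop "#fragment".
            target, _fragment = urldefrag(urljoin(base_url + path, href))
            parsed = urlparse(target)
            if parsed.netloc != base_host:
                continue  # external link, out of scope for this sketch
            if parsed.path not in pages:
                broken.append((path, href))
    return sorted(broken)


# Hypothetical miniature site used only to exercise the checker.
site = {
    "/index.html": '<a href="quickstart.html">Quickstart</a> <a href="/user_guide.html">Guide</a>',
    "/quickstart.html": '<a href="index.html#about">Home</a> <a href="api.html">API</a>',
    "/user_guide.html": '<a href="https://scikit-learn.org/">scikit-learn</a>',
}
broken_links = find_broken_internal_links(site, "https://fairlearn.org")
print(broken_links)
```

In CI, the `pages` mapping would be built from the generated HTML files, and a non-empty result would fail the build.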
The repository structure should look similar to | ||
[what scikit-learn has](https://github.com/scikit-learn/scikit-learn/tree/master/doc): | ||
|
||
- top-level doc directory has all the ReST files for the webpage except for | ||
API documentation and example notebooks | ||
- API documentation comes directly from the code documentation | ||
- example notebooks live in a separate top-level directory | ||
(scikit-learn calls it `examples`) as python files | ||
|
||
### Less urgent | ||
|
||
- Switch to numpy doc format; benefits explained in | ||
[this issue](https://github.com/fairlearn/fairlearn/issues/314); | ||
definitely worthwhile, but not as urgent as other items that get the webpage | ||
started. Proposed solution in that issue was to switch over piece-by-piece. | ||
|
||
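For reference, a docstring in the numpydoc format looks roughly like the sketch below. The function `count_by_group` is a hypothetical helper invented for this illustration and is not part of Fairlearn.

```python
def count_by_group(sensitive_features):
    """Count how many samples fall into each group.

    Parameters
    ----------
    sensitive_features : list
        Group label for each sample, e.g. ``["a", "b", "a"]``.

    Returns
    -------
    dict
        Mapping from each group label to its number of samples.

    Examples
    --------
    >>> count_by_group(["a", "b", "a"])
    {'a': 2, 'b': 1}
    """
    counts = {}
    for group in sensitive_features:
        counts[group] = counts.get(group, 0) + 1
    return counts
```

The underlined section headers (`Parameters`, `Returns`, `Examples`) are what the documentation build parses, which is why a piece-by-piece switch per module is feasible.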
## Development process | ||
|
||
We need to document all the processes around generating documentation. | ||
Specifically, we need to document how one can | ||
|
||
- build/generate the documentation and subsequently view it | ||
- develop example notebooks that end up in python files, but perhaps while | ||
creating them using Jupyter | ||
- get documentation changes into the Fairlearn repository | ||
(PR with CI generating documentation and storing it as artifacts) | ||
|
||
Document exactly which tools/plugins we recommend, e.g. VSCode extensions | ||
or Jupytext, etc. | ||
|
||
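To make the `.ipynb` to `.py` workflow mentioned above more concrete, here is a minimal standard-library sketch of the conversion. It assumes the usual notebook JSON layout (a `cells` list whose entries have `cell_type` and `source`) and writes `# %%` cell markers, a convention several editors recognise. The function name `notebook_to_script` and the inline notebook are hypothetical; this is not how Jupytext is implemented, and a dedicated tool would likely be the better choice in practice.

```python
import json


def notebook_to_script(notebook_json):
    """Convert the JSON text of a notebook into Python source text.

    Code cells are kept as code; markdown cells become comment lines.
    Each cell starts with a "# %%" marker so editors can show cell boundaries.
    """
    notebook = json.loads(notebook_json)
    chunks = []
    for cell in notebook.get("cells", []):
        source = cell.get("source", "")
        # The notebook format allows `source` to be a string or a list of lines.
        text = "".join(source) if isinstance(source, list) else source
        if cell.get("cell_type") == "code":
            chunks.append("# %%\n" + text)
        elif cell.get("cell_type") == "markdown":
            commented = "\n".join(
                "# " + line if line else "#" for line in text.split("\n")
            )
            chunks.append("# %% [markdown]\n" + commented)
    return "\n\n".join(chunks) + "\n"


# Hypothetical two-cell notebook, written inline to keep the example self-contained.
example_notebook = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {},
         "source": ["# Load data\n", "Short intro."]},
        {"cell_type": "code", "metadata": {}, "outputs": [],
         "execution_count": None, "source": ["x = 1 + 1\n", "print(x)"]},
    ],
})
script = notebook_to_script(example_notebook)
print(script)
```

Outputs and metadata are dropped on purpose, which keeps the resulting `.py` file easy to review in a PR.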
## Outstanding questions / tasks | ||
|
||
Documentation infrastructure related tasks should be tracked through the | ||
corresponding [GitHub project](https://github.com/fairlearn/fairlearn/projects/6) | ||
in the Fairlearn repository. | ||
|
||
1. Do we want users to cite us in any way? See | ||
[this example](https://scikit-learn.org/dev/about.html#citing-scikit-learn). | ||
1. Do we want user testimonials? It definitely provides credibility (assuming | ||
users are willing) | ||
1. Do we want a "News" section? It could list recent updates such as new | ||
versions (link to changelog), but also upcoming presentations, references | ||
to conference papers, blog posts, etc. | ||
1. Do we want a blog? [Example: pandas](https://pandas.pydata.org/community/blog/) | ||
1. Do we want to highlight differences to other fairness toolkits anywhere? | ||
Review comment: it'll probably come when users start asking the questions
Review comment: @kevinrobinson mentioned this somewhere else in this PR. This is actually a frequently encountered question, and perhaps fits into the "About Fairlearn" page or a related one.
|
||
1. Do we want to have an "ecosystem" page where we mention our relationship | ||
with other projects such as [InterpretML](https://github.com/interpretml)? | ||
1. Should we have a glossary? | ||
[Example: scikit-learn](https://scikit-learn.org/dev/glossary.html) | ||
1. Do we want any kind of website analytics to figure out how users interact | ||
with the content? Given that this project's goal is to be about more than | ||
just code, we should have mechanisms to understand whether our educational | ||
material is actually useful (and used). Any suggestions? | ||
Review comment: @adrinjalali do you have any analytics for As I wrote in the PR itself, it's probably very useful for the educational resources (user guide, notebooks) to know how many people view it, where we lose people (could be an indication to redesign material) and how people navigate through the page. I want to be super careful about this, though, because this sounds like data collection (GDPR? Privacy?). Beyond that, if we end up collecting data this can't just be accessible to, say, MSFT contributors, but it should be accessible by the entire community. Any thoughts?
Review comment: I think we have google analytics as well in sklearn, but I had to check and I didn't know we have it, and I've never looked at the data. I think the GDPR issues are handled by now through notifications? I'd agree that the data should be available to the community, but it can be that it's available if they ask for it. Or at least I wouldn't mind it if that's the case. In terms of privacy, that's why I have all the blockers I've got and no data gets sent to any analytics server from my side lol What I mean is that I understand the value of those data, and I'm okay if we collect them in this project, but I also recommend people to protect themselves especially since it's not too hard for people to do so via installing one or more add-ons. I wouldn't worry about this too much, but I also wouldn't rely on the data too much. Having user surveys such as the ones done by dask and pandas people is a better way of understanding what the users do and want from the library.
Review comment: Perhaps if we do start collecting data, the 'agree' page should contain links to the various plugins which can thwart the trackers :-) Less tongue-in-cheek, we are particularly concerned about this because of the potentially contradictory message it sends, as compared to the purpose of the repo (Differential Privacy would be the one worse project to have a tracker quietly appear). Thank you @adrinjalali for your thoughts - we may have been too paranoid. And your point about how reliable the data will be is well taken - it will take some time for us to get to our first million users, and hence for \sqrt{n} to beat down the errors. We would welcome further outside perspectives on adding tracking (how much, sharing the data etc.).
Review comment: I hadn't even heard of these before! Very interesting, thanks Adrin! https://pandas.pydata.org/community/blog/2019-user-survey.html
1. [Currently using fairlearn.github.io] deploy through fairlearn.org | ||
- check that all pages are reachable through fairlearn.org | ||
- https for fairlearn.org (currently only http works) |
Review comment: Some other substance that would be helpful in some place is:
Review comment: @romanlutz added thoughts on "What feels like success" to the team in https://github.com/kevinrobinson/fairlearn/pull/1/files#r420978039
Review comment: @romanlutz also added thoughts on the roadmap and milestones in https://github.com/kevinrobinson/fairlearn/pull/1/files#r420980548.
FWIW, as an interested outsider, one way to evaluate what's happening in a project is to look at open issues and milestone issues in GitHub, and then compare that to what's talked about in chat rooms, and what PRs and commits are actually shipping. After doing those things myself, I took my best guess at what the project's goals are for the next three months, and for the next year, and I wrote them out here: https://github.com/kevinrobinson/fairlearn/pull/1/files#diff-a594ce9af6d9751647bbd4efefa65312R63. But it's mostly guessing and I am sure it's inaccurate :)
Review comment: The point is you shouldn't have to guess :-) Thanks for raising this point. I wrote a little bit above about how we should have some sort of "About Fairlearn" page which very much ties into this. I mentioned a few of these things in the Community section already, so let me expand on that. Still, I think highlighting that on the landing page and directly linking to it from the landing page would be prudent.
Review comment: @MiroDudik do you have thoughts on this? The more I work on this the more I agree that we need some kind of "About" page that outlines
That could just be a separate top-level section. I do feel there's some overlap with what we have in "community", which in turn has some overlap with "contributing". Maybe splitting the "community" content into "about" and "contributing" clears this up? Just thinking out loud here, really open to pretty much anything that improves this.
Review comment: As detailed in another comment I've restructured this a little bit, and added an "About" section for this purpose. @riedgar-ms @MiroDudik please lmk if you have comments