Code documentation #14

Closed
miquelduranfrigola opened this issue Mar 23, 2022 · 21 comments
Labels: documentation, good first issue

Comments

@miquelduranfrigola
Member

miquelduranfrigola commented Mar 23, 2022

Background

Our high-level documentation is built with GitBook. In addition, we have automatically generated documentation (a package index) created by Sphinx. This low-level documentation can be seen here.
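
For readers unfamiliar with how Sphinx turns docstrings into a package index, the following is a minimal sketch of a `conf.py`. It relies on the standard `sphinx.ext.autodoc` and `sphinx.ext.napoleon` extensions. The project name and the source path are placeholders chosen for illustration and are not taken from the project's actual configuration.

```python
# docs/conf.py: illustrative sketch only, not the project's real configuration.
import os
import sys

# Assumption: the package lives one directory above the docs folder.
sys.path.insert(0, os.path.abspath(".."))

project = "example-project"  # placeholder name

extensions = [
    "sphinx.ext.autodoc",   # pulls documentation out of docstrings
    "sphinx.ext.napoleon",  # understands NumPy- and Google-style docstrings
]

# List members in the generated pages, including ones without a docstring.
autodoc_default_options = {"members": True, "undoc-members": True}
```

With a setup along these lines, every docstring added to the code would show up in the generated pages on the next build.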

Help needed

We currently lack docstrings for most of the classes in ersilia, which results in poor code documentation. We are seeking advice on how to write code documentation efficiently, as well as contributors who can help us write it.
Please comment under this issue with good resources on how to write documentation.
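
As a starting point for finding where docstrings are missing, a small audit script could list the undocumented classes in a module. The sketch below uses only the standard library. The function name `classes_missing_docstrings` is hypothetical, and `json.decoder` is just a stand-in for whichever module is being audited.

```python
import importlib
import inspect


def classes_missing_docstrings(module_name):
    """Return the names of classes defined in a module that lack a docstring."""
    module = importlib.import_module(module_name)
    missing = []
    for name, obj in inspect.getmembers(module, inspect.isclass):
        # Skip classes that were merely imported into the module.
        if obj.__module__ != module.__name__:
            continue
        # Read the class's own __doc__ so inherited docstrings do not count.
        if not (obj.__doc__ or "").strip():
            missing.append(name)
    return missing


if __name__ == "__main__":
    # "json.decoder" is a stand-in; replace it with the module to audit.
    print(classes_missing_docstrings("json.decoder"))
```

Running this over each module would give contributors a concrete checklist of classes to document.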

@GemmaTuron GemmaTuron added documentation Improvements or additions to documentation good first issue Good for newcomers labels Mar 23, 2022
@AviatorIfeanyi

Would it be possible to implement the OpenAPI Specification?

@AviatorIfeanyi

I am still trying to wrap my head around this issue 😎😎

I found this, though:

https://www.writethedocs.org/guide/writing/beginners-guide-to-docs/

I would love to know how I can be of help to the project.

@yigakpoa

Hi @GemmaTuron @miquelduranfrigola, please find below a few amazing resources on how to write documentation:

Victoria of freeCodeCamp advises curating accurate notes, explaining decisions, not neglecting prerequisite knowledge, and documenting everything: https://www.freecodecamp.org/news/how-to-write-good-documentation/

HeroThemes covers the various types of documentation, when and what to document, reviewing and testing, update schedules, etc.: https://herothemes.com/blog/how-to-write-documentation/

Here Lari describes the steps she uses in a simplified way: https://medium.com/larimaza-en/how-to-write-good-documentation-e19c70dc67f0

Whatfix gives insight into various documentation tools that can help streamline the documentation process, making writing and distribution easier: https://whatfix.com/blog/software-documentation-tools/

And finally, there is Daniele Procida’s talk at the Write the Docs EU conference in 2017, where he focuses on the four kinds of documentation and how they work:
https://www.writethedocs.org/videos/eu/2017/the-four-kinds-of-documentation-and-why-you-need-to-understand-what-they-are-daniele-procida/

@victorabba
Contributor

victorabba commented Mar 26, 2022

The resource below shows an efficient way of writing code documentation:

https://guides.lib.berkeley.edu/how-to-write-good-documentation

A sample of how to write class docstrings:

https://realpython.com/documenting-python-code/#documenting-your-python-code-base-using-docstrings
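
To make the idea of a class docstring concrete, here is a small sketch in NumPy style, which `sphinx.ext.napoleon` can render. The class `RunningMean` is invented purely for illustration and does not come from the linked article or from this project.

```python
class RunningMean:
    """Track the mean of a stream of numbers.

    Parameters
    ----------
    initial : float, optional
        Value reported by :attr:`value` before any number is added.

    Attributes
    ----------
    count : int
        How many numbers have been added so far.
    """

    def __init__(self, initial=0.0):
        self._initial = initial
        self._total = 0.0
        self.count = 0

    def add(self, x):
        """Add one number to the stream.

        Parameters
        ----------
        x : float
            The number to include in the mean.
        """
        self._total += x
        self.count += 1

    @property
    def value(self):
        """float: Current mean, or the initial value if nothing was added."""
        if self.count == 0:
            return self._initial
        return self._total / self.count
```

The class-level docstring describes the constructor parameters and public attributes, while each method carries its own shorter docstring.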

@victorabba
Contributor

I would love to help with the code documentation. Docstrings for classes and modules are written differently. However, we may not need to write all of the documentation by hand. If the code is written in Python, we can use the Python console or shell to generate the code description, while we write the docstrings themselves manually.
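
One reading of this suggestion is that the Python shell can already display signatures and docstrings through `help()`, so only the docstring text has to be written by hand. The sketch below shows the standard-library calls involved. The function `greet` is a made-up example.

```python
import inspect
import pydoc


def greet(name):
    """Return a greeting for ``name``."""
    return f"Hello, {name}!"


# inspect.getdoc returns the cleaned-up docstring as a string.
print(inspect.getdoc(greet))

# pydoc.render_doc builds the same text that help(greet) would page through;
# the plaintext renderer avoids terminal bold sequences.
text = pydoc.render_doc(greet, renderer=pydoc.plaintext)
print(text)
```

In an interactive session, `help(greet)` gives the same information directly.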

@AishaSaman

Hi, I am Aisha and I am an Outreachy internship applicant.

I am willing to help and contribute to writing documentation.

I found these resources helpful:
https://docs.readthedocs.io/en/stable/tutorial/
https://www.geeksforgeeks.org/python-docstrings/

@GemmaTuron
Member

> I am still trying to wrap my head around this issue 😎😎
>
> I found this, though:
>
> https://www.writethedocs.org/guide/writing/beginners-guide-to-docs/
>
> I would love to know how I can be of help to the project.

Hello @AviatorIfeanyi, this guide to docs refers to high-level documentation (such as README files). Here we are thinking about low-level, technical documentation. If you are interested in high-level docs, there are other issues focused on them, such as #16.

@GemmaTuron
Member

GemmaTuron commented Mar 28, 2022

> Hi @GemmaTuron @miquelduranfrigola, please find below a few amazing resources on how to write documentation:
>
> Victoria of freeCodeCamp advises curating accurate notes, explaining decisions, not neglecting prerequisite knowledge, and documenting everything: https://www.freecodecamp.org/news/how-to-write-good-documentation/
>
> HeroThemes covers the various types of documentation, when and what to document, reviewing and testing, update schedules, etc.: https://herothemes.com/blog/how-to-write-documentation/
>
> Here Lari describes the steps she uses in a simplified way: https://medium.com/larimaza-en/how-to-write-good-documentation-e19c70dc67f0
>
> Whatfix gives insight into various documentation tools that can help streamline the documentation process, making writing and distribution easier: https://whatfix.com/blog/software-documentation-tools/
>
> And finally, there is Daniele Procida’s talk at the Write the Docs EU conference in 2017, where he focuses on the four kinds of documentation and how they work: https://www.writethedocs.org/videos/eu/2017/the-four-kinds-of-documentation-and-why-you-need-to-understand-what-they-are-daniele-procida/

Hello @yigakpoa, thanks for the resources, well found. Would you be interested in pursuing this issue further? If so, let us know and we will assign you a specific contribution regarding low-level documentation.
Please note that Python knowledge is necessary for this issue.

@GemmaTuron
Member

GemmaTuron commented Mar 28, 2022

> I would love to help with the code documentation. Docstrings for classes and modules are written differently. However, we may not need to write all of the documentation by hand. If the code is written in Python, we can use the Python console or shell to generate the code description, while we write the docstrings themselves manually.

Hello @victorabba, indeed. When you finish your contributions to other issues, let us know and we will assign you a specific task for low-level documentation.
Please note that Python knowledge is necessary for this issue.

@GemmaTuron
Member

Hello @AishaSaman

Thanks, are you interested in contributing to low-level documentation? Please note that for technical documents Python knowledge is required

@miquelduranfrigola
Member Author

miquelduranfrigola commented Mar 30, 2022

@GemmaTuron what is the current status of this issue? People have provided great resources, and I am very grateful for them. We need to identify a couple of easy-to-document scripts so that people can start making their contributions. Assigning this to myself for now. As soon as I identify a few exemplary scripts, I will notify folks and ask for help.

@miquelduranfrigola miquelduranfrigola self-assigned this Mar 30, 2022
@arushi2715
Contributor

Hello @miquelduranfrigola I would like to work on this

@pauline-banye
Contributor

pauline-banye commented Mar 30, 2022

Hi @miquelduranfrigola @GemmaTuron I would love to volunteer to work on this. I'll hold on until @miquelduranfrigola has identified the exemplary scripts.

I do have some experience with Python/Django and I have contributed to open source projects before.

@delphine-boke

Hello. I think we can use:
https://medium.com/technical-writing-is-easy/tools-for-code-documentation-4fd9e8e39eed

or use an OpenAPI tool like Swagger to document this. I can be of help there.

@victorabba
Contributor

Hi @GemmaTuron. I'm currently done with my previous issues, can you assign me a task on this issue?

@victorabba
Contributor

I forgot to mention I have knowledge in Python and Django.

@delphine-boke

I am familiar with Python (Django, Flask); Java and Python for ML models.

@delphine-boke

@GemmaTuron @miquelduranfrigola, I would love to contribute to this issue.

@camus60

camus60 commented Mar 31, 2022

@GemmaTuron @miquelduranfrigola are you referring to something similar to this?

reference: https://github.com/RaRe-Technologies/gensim/blob/develop/gensim/models/atmodel.py

"""Author-topic model.

This module trains the author-topic model on documents and corresponding author-document dictionaries.
The training is online and is constant in memory w.r.t. the number of documents.
The model is *not* constant in memory w.r.t. the number of authors.

The model can be updated with additional documents after training has been completed. It is
also possible to continue training on the existing data.

The model is closely related to :class:`~gensim.models.ldamodel.LdaModel`.
The :class:`~gensim.models.atmodel.AuthorTopicModel` class inherits :class:`~gensim.models.ldamodel.LdaModel`,
and its usage is thus similar.

The model was introduced by `Rosen-Zvi and co-authors: "The Author-Topic Model for Authors and Documents"
<https://arxiv.org/abs/1207.4169>`_. The model correlates the authorship information with the topics to give a better
insight on the subject knowledge of an author.

.. _'Online Learning for LDA' by Hoffman et al.: online-lda_
.. _online-lda: https://papers.neurips.cc/paper/2010/file/71f6278d140af599e06ad9bf1ba03cb0-Paper.pdf

Example
-------
.. sourcecode:: pycon

    >>> from gensim.models import AuthorTopicModel
    >>> from gensim.corpora import mmcorpus
    >>> from gensim.test.utils import common_dictionary, datapath, temporary_file
    >>> author2doc = {
    ...     'john': [0, 1, 2, 3, 4, 5, 6],
    ...     'jane': [2, 3, 4, 5, 6, 7, 8],
    ...     'jack': [0, 2, 4, 6, 8]
    ... }
    >>>
    >>> corpus = mmcorpus.MmCorpus(datapath('testcorpus.mm'))
    >>>
    >>> with temporary_file("serialized") as s_path:
    ...     model = AuthorTopicModel(
    ...         corpus, author2doc=author2doc, id2word=common_dictionary, num_topics=4,
    ...         serialized=True, serialization_path=s_path
    ...     )
    ...
    ...     model.update(corpus, author2doc)  # update the author-topic model with additional documents
    >>>
    >>> # construct vectors for authors
    >>> author_vecs = [model.get_author_topics(author) for author in model.id2author.values()]

"""

# TODO: this class inherits LdaModel and overwrites some methods. There is some code
# duplication still, and a refactor could be made to avoid this. Comments with "TODOs"
# are included in the code where this is the case, for example in the log_perplexity
# and do_estep methods.

import logging
from itertools import chain
from copy import deepcopy
from shutil import copyfile
from os.path import isfile
from os import remove

import numpy as np  # for arrays, array broadcasting etc.
from scipy.special import gammaln  # gamma function utils

from gensim import utils
from gensim.models import LdaModel
from gensim.models.ldamodel import LdaState
from gensim.matutils import dirichlet_expectation, mean_absolute_difference
from gensim.corpora import MmCorpus

logger = logging.getLogger(__name__)


class AuthorTopicState(LdaState):
    """Encapsulate information for computation of :class:`~gensim.models.atmodel.AuthorTopicModel`."""

    def __init__(self, eta, lambda_shape, gamma_shape):
        """
        Parameters
        ----------
        eta: numpy.ndarray
            Dirichlet topic parameter for sparsity.
        lambda_shape: (int, int)
            Initialize topic parameters.
        gamma_shape: int
            Initialize topic parameters.
        """
        self.eta = eta
        self.sstats = np.zeros(lambda_shape)
        self.gamma = np.zeros(gamma_shape)
        self.numdocs = 0
        self.dtype = np.float64  # To be compatible with LdaState


def construct_doc2author(corpus, author2doc):
    """Create a mapping from document IDs to author IDs.

    Parameters
    ----------
    corpus: iterable of list of (int, float)
        Corpus in BoW format.
    author2doc: dict of (str, list of int)
        Mapping of authors to documents.

    Returns
    -------
    dict of (int, list of str)
        Document to Author mapping.
    """
    doc2author = {}
    for d, _ in enumerate(corpus):
        author_ids = []
        for a, a_doc_ids in author2doc.items():
            if d in a_doc_ids:
                author_ids.append(a)
        doc2author[d] = author_ids
    return doc2author
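
The `>>>` lines in the module docstring above are doctest-style examples, which the standard `doctest` module can execute as a check that the documentation stays accurate. Below is a minimal sketch of that idea on a much smaller function. `invert_mapping` is a hypothetical helper written for this illustration, not part of gensim, and unlike `construct_doc2author` it only includes documents that have at least one author.

```python
import doctest


def invert_mapping(author2doc):
    """Map each document ID to the authors who wrote it.

    Parameters
    ----------
    author2doc : dict of (str, list of int)
        Mapping of authors to document IDs.

    Returns
    -------
    dict of (int, list of str)
        Mapping of document IDs to authors.

    Examples
    --------
    >>> invert_mapping({"jane": [0, 1], "john": [1]})
    {0: ['jane'], 1: ['jane', 'john']}
    """
    doc2author = {}
    for author, doc_ids in author2doc.items():
        for doc_id in doc_ids:
            doc2author.setdefault(doc_id, []).append(author)
    return doc2author


if __name__ == "__main__":
    # Runs every ">>>" example found in this module's docstrings.
    print(doctest.testmod())
```

Keeping a short example in the docstring serves both as usage documentation and as a lightweight regression test.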

@AishaSaman

> Hello @AishaSaman
>
> Thanks, are you interested in contributing to low-level documentation? Please note that for technical documents Python knowledge is required

Yes, I am interested. I have basic Python knowledge.

@GemmaTuron
Member

Hello all!
@camus60: yes, this is a good example.
@AishaSaman, @delphine-boke, @victorabba: please check the new contribution guidelines for Outreachy participants and follow them. The low-level documentation is related to the #code project rather than the documentation one, but if you want to contribute as part of the documentation project, that would also be possible. Follow the new instructions!

I am closing this issue to avoid duplicating information

10 participants