# DS-210: Programming for Data Science


# Lecture 13: Documentation generation in Python. Version control.


* Stanford Large Network Dataset Collection: https://snap.stanford.edu/data/
* Big Graph Analytics: https://lgylym.github.io/big-graph/dataset.html
* UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/datasets.php
* Kaggle: https://www.kaggle.com/datasets
* UNdata: https://data.un.org/
* Free Public Data Sets for Analysis: https://www.tableau.com/learn/articles/free-public-data-sets
* Crowdsourced list of large data sets: https://www.quora.com/Where-can-I-find-large-datasets-open-to-the-public?share=1
* Interesting Data Sets: https://piktochart.com/blog/100-data-sets/
* Earthdata: https://search.earthdata.nasa.gov/search
* Nasdaq Data Link: https://data.nasdaq.com/search?filters=%5B%22Free%22%5D
* US Government’s open data: https://www.data.gov
* Harvard Dataverse: https://dataverse.harvard.edu




* <b> Some interesting project ideas in https://github.com/kthanasi-git/ds210-demo </b>  
* Project proposal due Nov 14th
* Project implementation due Dec 14th

* Installation instructions: https://git-scm.com/book/en/v2/Getting-Started-Installing-Git  
* Some of the most useful commands:  
```
git --version   # show the installed Git version
git init        # initialize a local repository in the current directory
git status      # show the status of the repository/branch
git add -A      # stage all new, modified, and deleted files for the next commit
git commit -m "My first commit in the local repository"   # record the staged files in the repository
git branch mynewbranch     # make a new branch
git checkout mynewbranch   # switch to the new branch
# Now you can add and commit files in the new branch mynewbranch
# You can move between branches using the checkout command
# Merge the branch into master and delete it
# (on newer setups the default branch may be called main instead of master)
git checkout master
git merge mynewbranch
git branch -d mynewbranch
# Conflicts can arise when merging branches. You may need to resolve them using an editor
git remote add origin https://github.com/kthanasi-git/My-new-repo   # add a remote named origin (a non-local copy, for safety)
git push --set-upstream origin master   # push the current local branch to origin
git fetch origin          # fetch from origin into the local repository
git merge origin/master   # merge origin's master into the current branch
git pull origin           # fetch and merge in a single operation
git push origin           # push to origin after making local changes
```
* You can read a lot more at https://git-scm.com/ or https://www.w3schools.com/git/  

<div align="center">
    <h3>What are the goals of documentation?</h3>
</div>

<br><br>


* **Users:** how to use your software

<br>

* **Developers:** how to extend it and maintain it

## Challenges of good documentation

Keeping it
* up to date
* concise
* exhaustive

## Various types

* Code commenting
* Reference
* Tutorial
* Quick HOWTOs
* ...

## A few tips on code commenting

* Make code explain itself as much as possible
  - Code refactoring is very important
  
* Explain only non-trivial parts
  - The reader should already know this programming language

* Keep comments close to the related code

In [None]:
# DON'T DO THIS!!!
def add(x,y):
    # we add the first parameter to the second one and return their sum
    return x + y
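By contrast with the redundant comment above, here is a minimal sketch of the tips in practice. The function name `mean_ignoring_missing` and the choice to return `None` for empty input are assumptions made for illustration. The names carry the "what", and the single comment explains a decision the code cannot explain by itself.

```python
def mean_ignoring_missing(values):
    present = [v for v in values if v is not None]
    # With no usable values there is no meaningful mean. Returning None
    # (instead of letting ZeroDivisionError escape) lets callers treat
    # "no data" like any other missing value.
    if not present:
        return None
    return sum(present) / len(present)


print(mean_ignoring_missing([1, None, 3]))
```

The list comprehension and the final division need no comment because a reader who knows Python can see what they do.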

* Can be used for tagging parts of the code (`TODO` or `FIXME`)

In [1]:
def return_one(x):
    # FIXME: properly handle the corner case of x = 0
    return x/x

# The Linux codebase has >3,000 FIXME/TODO comments, some >10 years old
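Because such tags follow a fixed pattern, they are easy to collect automatically. The following is a rough sketch, not a robust tool: the helper name `find_tags` and the regular expression are assumptions for illustration, and the pattern would also match a `#` that happens to sit inside a string literal.

```python
import re

# Matches comments such as "# TODO: ..." or "# FIXME ..." (illustrative pattern).
TAG_PATTERN = re.compile(r"#\s*(TODO|FIXME)\b:?\s*(.*)")


def find_tags(source):
    """Return a list of (line_number, tag, message) for tagged comments."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        match = TAG_PATTERN.search(line)
        if match:
            hits.append((number, match.group(1), match.group(2).strip()))
    return hits


example = '''def return_one(x):
    # FIXME: properly handle the corner case of x = 0
    return x/x
'''

tags = find_tags(example)
for number, tag, message in tags:
    print(f"line {number}: {tag}: {message}")
```

Running a scan like this from time to time is one way to keep old tags from piling up unnoticed.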

## Today's focus: docstrings

Main idea:
 * put the description of a function, class, or module in a string literal placed as the first statement of its body
 
Advantages:
 * for someone reading or maintaining the code: the information is right where they need it
 * encourages the developer to update it after making changes
 * can be extracted automatically to produce nice docs (we'll look at examples today)

In [None]:
def addition(x,y):
    """(one-line summary) Addition of two numbers.
    
    Here is where you can get into more details. `x`
    is an integer and so is `y`. The output is the
    sum of `x` and `y`.
    """
    return x + y
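A docstring is stored on the object itself, which is what makes automatic extraction possible. The sketch below repeats the `addition` function so that it runs standalone, and reads the docstring back with the `__doc__` attribute and the standard-library helper `inspect.getdoc`. The variable names `raw`, `cleaned`, and `summary` are chosen for illustration.

```python
import inspect


def addition(x, y):
    """Addition of two numbers.

    `x` is an integer and so is `y`. The output is the
    sum of `x` and `y`.
    """
    return x + y


raw = addition.__doc__              # the string exactly as written, indentation included
cleaned = inspect.getdoc(addition)  # the same text with the common indentation removed
summary = cleaned.splitlines()[0]   # by convention, the first line is the one-line summary

print(summary)
print(cleaned)
```

Documentation tools generally build on this mechanism, which is why the one-line summary convention matters: many of them show only that first line in overview tables.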

Popular in many languages:
 * easy to add even where the language does not officially support it


## Simplest tool: `pydoc`

* Should be included in your Python installation

* Go to the command line and just type: `pydoc WHATEVER-YOU-ARE-INTERESTED-IN` (or `python -m pydoc WHATEVER-YOU-ARE-INTERESTED-IN` if the `pydoc` command is not on your path)

* Example: `pydoc matplotlib.pyplot.scatter` or `pydoc numpy.array`
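The same machinery is available from inside Python through the standard-library `pydoc` module. As a small sketch, the example below renders the documentation of a locally defined function as plain text. The function `scale` is hypothetical and exists only to have something to document.

```python
import pydoc


def scale(values, factor):
    """Multiply every element of `values` by `factor`."""
    return [v * factor for v in values]


# render_doc returns the text that the pydoc command would display;
# the plaintext renderer avoids terminal bold/underline control characters.
text = pydoc.render_doc(scale, renderer=pydoc.plaintext)
print(text)
```

The built-in `help(scale)` shows essentially the same information interactively.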

## Using `pdoc` for generating HTML/Markdown/PDF

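**Installation:** `pip install pdoc3` (adjust to your package manager!). Note that the package is called `pdoc3`; the separate `pdoc` package has a different command-line interface.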

**Generating HTML pages:** `pdoc --html MODULE-NAME` (with `--force` to overwrite an earlier version)

<div align="center">
    <b>[Demo in the terminal]</b>
</div>

## Many other tools for processing docstrings in Python

* Explore them before committing to one of them

* Check out Sphinx!
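One such tool ships with Python itself: the standard-library `doctest` module runs the interactive examples found in a docstring and checks their output, so examples in the documentation cannot silently go stale. The sketch below is only an illustration; the function `add_one` is a hypothetical example, and the examples are run explicitly through `DocTestParser` and `DocTestRunner` so that the snippet behaves the same in a script or a notebook.

```python
import doctest


def add_one(x):
    """Add one to the number given.

    >>> add_one(5)
    6
    >>> add_one(-1)
    0
    """
    return x + 1


# Extract the ">>>" examples from the docstring and execute them.
parser = doctest.DocTestParser()
test = parser.get_doctest(add_one.__doc__, {"add_one": add_one}, "add_one", None, 0)
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test)

print(f"{results.attempted} examples run, {results.failed} failed")
```

In a regular module, `python -m doctest yourfile.py` runs every docstring example in the file without any of this setup code.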


## Similar tools for Rust

**Use `///` to identify comments that can be translated to documentation**  
**and `//` for regular comments**

````
/// Adds one to the number given.
///
/// # Examples
///
/// ```
/// let arg = 5;
/// let answer = my_crate::add_one(arg);
///
/// assert_eq!(6, answer);
/// ```
pub fn add_one(x: i32) -> i32 {
    x + 1
}
````


Run `cargo doc` to generate the documentation (add `--open` to view it in your browser).

* Installing Rust: `curl --proto '=https' --tlsv1.2 https://sh.rustup.rs -sSf | sh`
* The Rust Bible (*The Rust Programming Language* book): https://doc.rust-lang.org/stable/book/
* To execute Rust in Jupyter Notebooks, follow these somewhat tricky steps:
  * Install cmake (https://cmake.org/install/), following the instructions for Mac or Windows
  * `cargo install evcxr_jupyter`
  * `evcxr_jupyter --install`
