# Data management

Sharing data matters not only for software development but also for project management, and this is particularly true in scientific fields. The following rough guidelines make project maintenance easier:

1. **Use descriptive and informative file names.** 
2. **Choose file formats that will ensure long-term access.**
3. **Track different versions of your documents.**
4. **Create metadata for every experiment or analysis you run.**
5. **Find helpful tools for analyzing your data.**
6. **Handle sensitive data in an appropriate manner.**

## Information for file names

File names should allow you to identify a precise experiment from the name. Choose a format for naming your files and use it consistently. 

You might consider including some of the following information in your file names, but you can include any information that will allow you to distinguish your files from one another. 

- Project or experiment name or acronym
- Location/spatial coordinates
- Researcher name/initials
- Date or date range of experiment
- Type of data
- Conditions
- Version number of file
- Three-letter file extension for application-specific files
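As a minimal sketch of a consistent naming scheme, the helper below joins a few of the components listed above with underscores. The function name `build_file_name`, the component order, and the sample values are all assumptions made for illustration; any order works as long as it is applied consistently.

```python
from datetime import date


def build_file_name(project, researcher, run_date, data_type, version, extension):
    """Join the naming components into one consistent file name."""
    parts = [
        project,
        researcher,
        run_date.strftime("%Y%m%d"),  # year-month-day sorts chronologically
        data_type,
        f"v{version}",
    ]
    return "_".join(parts) + "." + extension


# Hypothetical project, initials, and date, used only for illustration.
name = build_file_name("soilph", "jd", date(2021, 3, 14), "raw", 2, "csv")
print(name)  # soilph_jd_20210314_raw_v2.csv
```

Putting the date in year-month-day order means an alphabetical file listing is also a chronological one.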

## Guidelines for choosing file formats

When selecting file formats for archiving, the formats should ideally be:

- Non-proprietary
- Unencrypted
- Uncompressed
- In common usage by the research community
- Adherent to an open, documented standard:
  - Interoperable among diverse platforms and applications
  - Fully published and available royalty-free
  - Fully and independently implementable by multiple software providers on multiple platforms without any intellectual property restrictions for necessary technology
  - Developed and maintained by an open standards organization with a well-defined inclusive process for evolution of the standard.
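Plain-text CSV is one format that meets most of these criteria. The sketch below writes a small table with Python's standard `csv` module; the column names and values are hypothetical, and an in-memory buffer stands in for a real file so the example has no side effects.

```python
import csv
import io

# Hypothetical measurements; a real project would load its own data.
rows = [
    {"sample_id": "s01", "ph": 6.8},
    {"sample_id": "s02", "ph": 7.1},
]

# Write to an in-memory buffer; swap in
# open("results.csv", "w", newline="") to write a real file.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["sample_id", "ph"], lineterminator="\n")
writer.writeheader()
writer.writerows(rows)

csv_text = buffer.getvalue()
print(csv_text)
```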
  
## File versioning

When creating new versions of your files, record what changes are being made to the files and give the new files a unique name. Follow the general advice on the site for naming files, but also consider the following:

- Include a version number, e.g. "v1," "v2," or "v2.1".
- Include information about the status of the file, e.g. "draft" or "final," as long as you don't end up with confusing names like "final2" or "final_revised".
- Include information about what changes were made, e.g. "cropped" or "normalized".

### Simple file versioning

One simple way to version files is to manually save new versions when you make significant changes. This works well if:

- You don't need to keep a lot of different versions.
- Only one person is working on the files.
- The files are always accessed from one location.
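For manual versioning like this, a small helper can take the guesswork out of picking the next name. The sketch below assumes a hypothetical convention in which names end in `_v` followed by a whole number before the extension; it is one possible scheme, not a standard.

```python
import re
from pathlib import Path


def next_version_name(file_name):
    """Return file_name with its trailing _vN suffix incremented.

    A name without a version suffix gets _v1 appended.
    """
    path = Path(file_name)
    match = re.search(r"_v(\d+)$", path.stem)
    if match:
        # Keep everything before the suffix and bump the number.
        new_stem = path.stem[: match.start()] + f"_v{int(match.group(1)) + 1}"
    else:
        new_stem = path.stem + "_v1"
    return new_stem + path.suffix


print(next_version_name("soilph_raw_v2.csv"))  # soilph_raw_v3.csv
print(next_version_name("soilph_raw.csv"))     # soilph_raw_v1.csv
```

This sketch only handles whole-number versions; a scheme with minor versions such as "v2.1" would need a different pattern.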

## Tracking metadata

In its most basic sense, metadata is information about data, and describes basic characteristics of the data, such as

- Who created the data
- What the data file contains
- When the data were generated
- Where the data were generated
- Why the data were generated
- How the data were generated

Metadata makes it easier for you and others to identify and reuse data correctly at a later date.
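One lightweight way to record these characteristics is a small JSON "sidecar" file stored next to the data. The sketch below uses who/what/when/where/why/how as keys; the key names and every value are hypothetical placeholders, not a formal metadata standard.

```python
import json

# All values below are hypothetical placeholders.
metadata = {
    "who": "J. Doe",
    "what": "Soil pH readings, one row per sample",
    "when": "2021-03-14",
    "where": "Field station A",
    "why": "Baseline survey before treatment",
    "how": "Handheld pH meter, three readings averaged per sample",
}

# Serialize to text; saving this next to the data file (for example
# results.json beside results.csv) keeps the two together.
metadata_text = json.dumps(metadata, indent=2)
print(metadata_text)
```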


# Coding best practices (for Python)

When writing code in Python, it’s important to make sure that your code can be easily understood by others. Giving variables obvious names, defining explicit functions, and organizing your code are all great ways to do this.

Another *awesome* and easy way to increase the readability of your code is by using comments!

In this tutorial, you’ll cover some of the basics of writing comments in Python. You’ll learn how to write comments that are clean and concise, and when you might not need to write any comments at all.

You’ll also learn:

- Why it’s so important to comment your code
- Best practices for writing comments in Python
- Types of comments you might want to avoid
- How to practice writing cleaner comments

Before getting to best practices, let’s go over some basics of how comments work in Python.

# Python Commenting Basics

```
"Readability counts." -- The Zen of Python
```

As Guido van Rossum said, “Code is read much more often than it is written.” You may spend a few minutes, or a whole day, writing a piece of code to process user authentication. Once you’ve written it, you’re never going to write it again. But you’ll definitely have to read it again. That piece of code might remain part of a project you’re working on. Every time you go back to that file, you’ll have to remember what that code does and why you wrote it, so readability matters.

<figure>
  <img alt="Coding Best Practices" width="75%" height="75%" align="center" src="https://live.staticflickr.com/2203/2245445147_ff54c5997d.jpg"/>
  <figcaption><b>Figure 1: </b><i>Code Metrics: <b>W</b>ell, <b>T</b>hat's <b>F</b>unny!</i></figcaption>
</figure>

Comments are for developers. They describe parts of the code where necessary to facilitate the understanding of programmers, including yourself.

To write a comment in Python, simply put the hash mark `#` before your desired comment:

```python
# This is a comment
```

Python ignores everything after the hash mark and up to the end of the line. You can insert them anywhere in your code, even inline with other code:

```python
print("This will run.")  # This text after `print` won't run
```

When you run the above code, you will only see the output `This will run.` Everything else is ignored.

Comments should be short, sweet, and to the point. While [Python Enhancement Proposal (PEP) 8](https://www.python.org/dev/peps/pep-0008/) advises keeping code at 79 characters or fewer per line, it suggests a maximum of 72 characters for comments and docstrings. If your comment is approaching or exceeding that length, then you’ll want to spread it out over multiple lines.


# Python Multiline Comments

Unfortunately, Python doesn’t have a way to write multiline comments as you can in languages such as C, Java, and Go:

```
# So you can't
just do this
in python
```

In the above example, the first line is a valid comment, but the lines that follow are not valid Python and will raise a `SyntaxError`.

In contrast, a language like Java will allow you to spread a comment out over multiple lines quite easily:

```java
/* You can easily
write multiline
comments in Java */
```

Everything between `/*` and `*/` is ignored by the program.

While Python doesn’t have native multiline commenting functionality, you can create multiline comments in Python. There are two simple ways to do so.

The first way is simply by pressing the return key after each line, adding a new hash mark and continuing your comment from there:

```python
def multiline_example():
    # This is a pretty good example
    # of how you can spread comments
    # over multiple lines in Python
    pass  # a function body needs at least one statement
```

Each line that starts with a hash mark will be ignored by the program.
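If you would rather not break long comments by hand, the standard `textwrap` module can do it. The sketch below wraps a long sentence into `# `-prefixed lines; the helper name `wrap_comment` and the sample text are hypothetical, and the default width of 72 is an assumption based on the comment length PEP 8 recommends.

```python
import textwrap


def wrap_comment(text, width=72):
    """Wrap text into '# '-prefixed lines no longer than width characters."""
    return textwrap.fill(
        text,
        width=width,
        initial_indent="# ",
        subsequent_indent="# ",
    )


# Hypothetical note that is too long for a single comment line.
long_note = (
    "Readings below the detection limit are recorded as zero by the "
    "instrument, so they are treated as missing values rather than as "
    "true zeros in the analysis."
)
print(wrap_comment(long_note))
```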

Another thing you can do is use multiline strings by wrapping your comment inside a set of triple quotes:

```python
"""
If I really hate pressing `enter` and
typing all those hash marks, I could
just do this instead
"""
```

This is like multiline comments in Java, where everything enclosed in the triple quotes will function as a comment.

While this gives you the multiline functionality, this isn’t technically a comment. It’s a string that’s not assigned to any variable, so it’s not called or referenced by your program. Still, since it’ll be ignored at runtime and won’t appear in the bytecode, it can effectively act as a comment. (You can take a look at [this article](https://dbader.org/blog/python-multiline-comment) for proof that these strings won’t show up in the bytecode.)

However, be careful where you place these multiline “comments.” Depending on where they sit in your program, they could turn into docstrings, which are pieces of documentation that are associated with a function or method. If you slip one of these bad boys right after a function definition, then what you intended to be a comment will become associated with that object.

Be careful where you use these, and when in doubt, just put a hash mark on each subsequent line.
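The difference is easy to check. In the sketch below (both function names are hypothetical), a triple-quoted string placed right after the `def` line ends up in `__doc__`, while a hash-mark comment in the same position does not:

```python
def load_readings():
    """
    Meant as a throwaway note, but because it is the first
    statement in the function, Python stores it as the docstring.
    """
    return []


def load_readings_commented():
    # A real comment: ignored entirely, never attached to the function.
    return []


print(load_readings.__doc__ is not None)        # True
print(load_readings_commented.__doc__ is None)  # True
```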

# Python Commenting Best Practices

While it’s good to know how to write comments in Python, it’s just as vital to make sure that your comments are readable and easy to understand.

Take a look at these tips to help you write comments that really support your code.

## When Writing Code for Yourself

You can make life easier for yourself by commenting your own code properly. Even if no one else will ever see it, you’ll see it, and that’s enough reason to make it right. You’re a developer after all, so your code should be easy for you to understand as well.

One extremely useful way to use comments for yourself is as an outline for your code. If you’re not sure how your program is going to turn out, then you can use comments as a way to keep track of what’s left to do, or even as a way of tracking the high-level flow of your program. For instance, use comments to outline a function in pseudo-code:

```python
from collections import defaultdict

def get_top_cities(prices):
    top_cities = defaultdict(int)

    # For each price range
    # Get city searches in that price
    # Count num times city was searched
    # Take top 3 cities & add to dict

    return dict(top_cities)
```

These comments plan out `get_top_cities()`. Once you know exactly what you want your function to do, you can work on translating that to code.

Using comments like this can help keep everything straight in your head. As you walk through your program, you’ll know what’s left to do in order to have a fully functional script. After “translating” the comments to code, remember to remove any comments that have become redundant so that your code stays crisp and clean.

You can also use comments as part of the debugging process. Comment out the old code and see how that affects your output. If you agree with the change, then don’t leave the code commented out in your program, as it decreases readability. Delete it and use version control if you need to bring it back.

Finally, **use comments to explain tricky parts of your own code.** If you put a project down and come back to it months or years later, you’ll spend a lot of time trying to get reacquainted with what you wrote. In case you forget what your own code does, do Future You a favor and mark it down so that it will be easier to get back up to speed later on.

## When Writing Code for Others

People like to skim and jump back and forth through text, and reading code is no different. The only time you’ll probably read through code line by line is when it isn’t working and you have to figure out what’s going on.

In most other cases, you’ll take a quick glance at variables and function definitions in order to get the gist. Having comments to explain what’s happening in plain English can really assist a developer in this position.

Be nice to your fellow devs and use comments to help them skim through your code. Inline comments should be used sparingly to clear up bits of code that aren’t obvious on their own. (Of course, your first priority should be to make your code stand on its own, but inline comments can be useful in this regard.)

If you have a complicated method or function whose name isn’t easily understandable, you may want to include a short comment after the def line to shed some light:

```python
def complicated_function(s):
    # This function does something complicated
    pass
```

However, avoid writing comments that indicate something obvious to the reader. A better comment would explain _why_ a particular action was done, rather than _how_.

So, a better version of a comment for `complicated_function` would be:

```python
def complicated_function(s):
    # This function takes a string to compute X
    # because the input provided by the user
    # arrives as a string. A complicated
    # function is then evaluated on it.
    pass
```

This can help other devs who are skimming your code get a feel for what the function does.

For any public functions, you’ll want to include an associated docstring, whether it’s complicated or not:

```python
import numpy as np


def sparsity_ratio(x: np.ndarray) -> float:
    """Return the percentage of values in x that are zero or NaN."""
    pass
```

This string will become the `.__doc__` attribute of your function and will officially be associated with that specific method. The [PEP 257 docstring guidelines](https://www.python.org/dev/peps/pep-0257/#one-line-docstrings) will help you to structure your docstring. These are a set of conventions that developers generally use when structuring docstrings.

The [PEP 257 guidelines have conventions for multiline docstrings](https://www.python.org/dev/peps/pep-0257/#multi-line-docstrings) as well. A common case is the module-level docstring, which appears right at the top of a file and gives a high-level overview of the entire script and what it’s supposed to do:

```python
# -*- coding: utf-8 -*-
"""A module-level docstring

Notice the comment above the docstring specifying the encoding.
Docstrings do appear in the bytecode, so you can access this through
the ``__doc__`` attribute. This is also what you'll see if you call
help() on a module or any other Python object.
"""
```

A module-level docstring like this one will contain any pertinent or need-to-know information for the developer reading it. When writing one, it’s recommended to list out all classes, exceptions, and functions as well as a one-line summary for each.
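As a rough illustration, the sketch below keeps a small, hypothetical module in a string, with a module docstring that lists its functions, and reads the docstrings back with the standard `ast` module. The module contents and function names are invented for the example.

```python
import ast

# Hypothetical module source, held in a string so the sketch is self-contained.
source = '''"""Tools for summarizing soil pH readings.

Functions:
    load_readings: read raw readings from a CSV file
    mean_ph: average pH across samples
"""

def load_readings(path):
    """Read raw readings from a CSV file."""


def mean_ph(readings):
    """Return the average pH across samples."""
'''

module = ast.parse(source)
module_doc = ast.get_docstring(module)
print(module_doc.splitlines()[0])  # Tools for summarizing soil pH readings.

# Collect each function's one-line summary from its own docstring.
summaries = {
    node.name: ast.get_docstring(node).splitlines()[0]
    for node in module.body
    if isinstance(node, ast.FunctionDef)
}
print(summaries)
```

Reading the summaries back from the function docstrings is one way to check that the list in the module docstring has not drifted out of date.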



# Python Commenting - _Worst_ Practices

Just as there are standards for writing Python comments, there are a few types of comments that don’t lead to Pythonic code. Here are just a few.

## Avoid: W.E.T. Comments

Your comments should be D.R.Y. The acronym stands for the programming maxim “Don’t Repeat Yourself.” This means that your code should have little to no redundancy. You don’t need to comment a piece of code that sufficiently explains itself, like this one:

```python
return a  # Returns a
```

We can clearly see that `a` is returned, so there’s no need to explicitly state this in a comment. This makes comments W.E.T., meaning you “wrote everything twice.” (Or, for the more cynical out there, “wasted everyone’s time.”)

W.E.T. comments can be a simple mistake, especially if you used comments to plan out your code before writing it. But once you’ve got the code running well, be sure to go back and remove comments that have become unnecessary.

## Avoid: Smelly Comments

Comments can be a sign of “code smell,” which is anything that indicates there might be a deeper problem with your code. Heavy commenting is sometimes used to mask a program’s underlying issues rather than fix them. Comments should support your code, not try to explain it away. If your code is poorly written, no amount of commenting is going to fix it.

Let’s take this simple example:

```python
# A dictionary of families who live in each city
mydict = {
    "Midtown": ["Powell", "Brantley", "Young"],
    "Norcross": ["Montgomery"], 
    "Ackworth": []
}

def a(dict):
    # For each city
    for p in dict:
        # If there are no families in the city
        if not mydict[p]:
            # Say that there are no families
            print("None.")
```

This code is quite unruly. There’s a comment before every line explaining what the code does. This script could have been made simpler by assigning obvious names to variables, functions, and collections, like so:

```python
families_by_city = {
    "Midtown": ["Powell", "Brantley", "Young"],
    "Norcross": ["Montgomery"],
    "Ackworth": [],
}

def no_families(families_by_city):
    for city, families in families_by_city.items():
        if not families:
            print(f"No families in {city}.")
```

By using obvious naming conventions, we were able to remove all unnecessary comments and reduce the length of the code as well!

Your comments should rarely be longer than the code they support. If you’re spending too much time explaining what you did, then you need to go back and refactor to make your code more clear and concise.


## Avoid: Rude Comments

This is something that’s likely to come up when working on a development team. When several people are all working on the same code, others are going to be going in and reviewing what you’ve written and making changes. From time to time, you might come across someone who dared to write a comment like this one:

```python
# Put this here to fix Ryan's stupid-a** mistake
```

Honestly, it’s just a good idea to not do this. It’s not okay even if it’s your friend’s code and you’re sure they won’t be offended by it. You never know what might get shipped to production, and how is it going to look if you accidentally left that comment in there and a client discovered it down the road? You’re a professional, and including vulgar words in your comments is not the way to show that.





# How to Practice Commenting

The simplest way to start writing more Pythonic comments is just to do it!

Start writing comments for yourself in your own code. Make it a point to include simple comments from now on where necessary. Add some clarity to complex functions, and put a docstring at the top of all your scripts.

Another good way to practice is to go back and review old code that you’ve written. See where anything might not make sense, and clean up the code. If it still needs some extra support, add a quick comment to help clarify the code’s purpose.

This is an especially good idea if your code is up on GitHub and people are forking your repo. Help them get started by guiding them through what you’ve already done.

You can also give back to the community by commenting other people’s code. If you’ve downloaded something from GitHub and had trouble sifting through it, add comments as you come to understand what each piece of code does.

“Sign” your comment with your initials and the date, and then submit your changes as a pull request. If your changes are merged, you could be helping dozens if not hundreds of developers like yourself get a leg up on their next project.

Lastly, one can consider this maxim from the **Zen of Python** as a summary of everything we've covered so far: "Simple is better than complex".


# References and Further Reading

1. The ten commandments for learning how to code: https://www.nature.com/articles/d41586-019-00653-5
2. Python comments guide: https://realpython.com/python-comments-guide/
3. PEP 8 guidelines: https://realpython.com/python-pep8/#why-we-need-pep-8
4. Data management guidelines: https://library.stanford.edu/research/data-management-services/data-best-practices