# Readability

When writing code, it can often feel like you're in a conversation (or fight) with your computer. But if you take nothing else away from this reading, let it be this: **we write code for people, not computers**. 

Your computer doesn't need all the neat words you get in a language like Python, such as `print`, `in`, or `list`. It's just moving electrons through trillions of little circuits. Programming languages are there to make what a computer does legible to *people*. And yet, despite that, we so often find ways to make the code we write utterly impossible for people to read.

That's a problem, because if you can't easily understand what's going on when you look at code, there's no chance you'll be able to identify mistakes. That's why writing *readable* code isn't just about aesthetics; it's key to avoiding mistakes.

So what makes code readable?

## Use Informative Variable Names

Don't call something `var212` if you can call it `unemployment_percentage`. Informative names require more typing, but they make your code so much easier to read. Moreover, including units in your variable names (`percentage`, `km`, etc.) can also help avoid confusion.


In [2]:
# Bad
def foo(a, b):
    return a * 12 + b


# Good
def convert_ft_and_inches_to_inches(feet, inches):
    return feet * 12 + inches

The two functions above do exactly the same thing, but when you see the second function, it's immediately clear what the function does and how to use it. With the first function, not only is it unclear what it does, but it can also be confusing to use even when you know its purpose: which argument is for feet and which is for inches? The names of the arguments (`a` and `b`) don't tell you!

Code with really clear variable names is sometimes called "self-documenting" because a reader can understand its purpose without having to reference comments or additional documentation.
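
As a small sketch of the point about units (the function and variable names here are hypothetical, chosen only for illustration), putting the unit in each name makes it obvious at a glance what is being passed in and what comes back:

```python
KM_PER_MILE = 1.609344


def convert_miles_to_km(distance_miles):
    """Convert a distance in miles to kilometers."""
    return distance_miles * KM_PER_MILE


# Illustrative value, not real data.
commute_miles = 12.5
commute_km = convert_miles_to_km(commute_miles)
print(f"The commute is {commute_km:.1f} km")
```

If someone later writes `convert_miles_to_km(commute_km)`, the mismatch between the argument name and the function name is visible without running anything.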

## Comment

Comment your code! Comments help in two ways. 

First, and most obviously, they make it easy to figure out what's going on when you come back to code days, weeks, or months after it was originally written. 

Second, writing comments forces you to think about what you're doing in _substantive_ terms ("This section calculates the share of people within each occupation who have college degrees") rather than just in terms of programming logic. That can help you catch _substantive_ problems: code that runs without errors but does not actually generate the quantity of interest.

To be clear, not everything requires a comment, and the more informative your variable and function names, the fewer comments you need to put around your code to make it legible. But because much of what we do as data analysts is motivated by what we know about the underlying data — and that isn't self-evident in the code we write — comments about the motivation for data manipulations are critical.
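
Here is a minimal sketch of a comment that records the motivation for a data manipulation. The data and the `-99` missing-value code are made up for illustration; check your own dataset's documentation for how it encodes missing answers.

```python
# Hypothetical survey responses: annual household income in US dollars.
# Assumption for this sketch: the survey codes "refused to answer" as -99.
reported_incomes_usd = [42_000, -99, 58_500, 31_200, -99, 75_000]

# Drop the -99 codes before averaging. They are placeholders for missing
# answers, not real incomes, and leaving them in would pull the mean down.
valid_incomes_usd = [income for income in reported_incomes_usd if income != -99]

mean_income_usd = sum(valid_incomes_usd) / len(valid_incomes_usd)
print(f"Mean reported household income: ${mean_income_usd:,.0f}")
```

Nothing in the list comprehension itself tells a reader *why* `-99` is being excluded. That knowledge lives in the survey's codebook, so the comment is the only place the code can carry it.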

## Format Your Code

Humans are visual creatures with superb visual pattern recognition skills (our ancestors wouldn't have survived if they weren't able to find camouflaged lions in the grass!). Formatting your code in a consistent manner allows you to leverage those innate abilities to notice when things don't look right.

Once upon a time, I'd point you to a document about principles of good formatting. Today, though, I don't have to, because there are programs you can use to format your code automatically. In the world of Python, nearly everyone formats with a program called `black` or a similar formatter (like [Ruff](https://docs.astral.sh/ruff/formatter/)) designed to do the same thing as Black. I *strongly* recommend setting up "format on save" in your editor. For directions on how to do that in VS Code, [go here](../00_setup_env/setup_vscode.html#format-on-save-with-black-extension).

> What is "Black"? [Black](https://black.readthedocs.io/en/stable/) is an "opinionated code formatter." As they say on their website, "By using Black, you agree to cede control over minutiae of hand-formatting. In return, Black gives you speed, determinism, and freedom from pycodestyle nagging about formatting. You will save time and mental energy for more important matters." It's become the standard for formatting Python code, not necessarily because everyone thinks the way it styles code is "the best," but rather because it's put an end to fights over code format that used to arise in every open-source project. The name for the package comes from a quotation from Henry Ford, who once said of the first mass-produced automobile, the Model T, "Any customer can have a car painted any color that he wants so long as it is black."
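
To give a sense of what a formatter does, here is a hand-written before-and-after sketch. The function names are hypothetical, and the second version shows roughly the layout a formatter like Black would produce, not its literal output.

```python
# Before: valid Python, but inconsistent spacing makes it hard to scan.
def mean_unformatted( values ):
    total=0
    for value in values: total+=value
    return total/len( values )


# After: roughly the layout an automatic formatter would produce.
def mean_formatted(values):
    total = 0
    for value in values:
        total += value
    return total / len(values)


# Formatting changes how the code looks, not what it computes.
print(mean_unformatted([1, 2, 3]) == mean_formatted([1, 2, 3]))
```

Both functions behave identically. The difference is that in the second one the loop body and the division are easy to pick out visually.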

## Interpret Your Numbers

When you print out numbers in a data analysis workflow, don't just throw them out there without context! Data science is about *interpreting* data to help improve our understanding of the world around us. To that end, any time you are sharing a result, you **absolutely must** interpret it for the reader. That means never answering a question by just having Python print out a number; you should also tell the reader what, in substantive terms, that number means. And honestly, even if you *aren't* sharing a result (you're just trying to print out an important quantity for yourself), making yourself interpret it explicitly will force you to think about what you're doing in substantive terms.

For example, if you're asked what share of US households are living below the poverty line, don't answer `0.183`. 

- First, it's not clear whether that's a share / proportion (meaning 18.3% of households) or a percentage (meaning 0.183% of households).
- Second, one should always include units when presenting numbers (in this case "US Households"). The share of households living below the poverty line worldwide, the share of households living below the poverty line in the US, and the share of _people_ living below the poverty line in the US are three different quantities.

I recognize that when an exercise prompt is pretty clear this may feel unnecessary / pedantic, but it's a good practice to get used to for the real world. Get your units wrong and... [bad things can happen.](https://www.wired.com/2010/11/1110mars-climate-observer-report/)

My personal suggestion for how to do this is to present answers with f-strings (not familiar with f-strings? Best Python feature in years. [Learn all the tricks here!](https://nickeubank.github.io/practicaldatascience_book/notebooks/other/fstrings.html)). These not only allow you to combine interpretation / units with your result, but also let you format it (so it doesn't have 20 decimal places), and they give you an output that will automatically update if you change code further up in the notebook. For example:


In [3]:
below_poverty = 0.183
print(
    "The share of US Households earning less than "
    f"$20,000 in 2008 was {below_poverty:.2f}"
)

The share of US Households earning less than $20,000 in 2008 was 0.18


And for money values, you can easily add commas and round off to two decimals:

In [4]:
amount_of_money = 30_021_032.2398823
print(f"The price paid for the company was ${amount_of_money:,.2f}")

The price paid for the company was $30,021,032.24
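
The same formatting syntax can also remove the share-versus-percentage ambiguity discussed earlier. As a small sketch reusing the illustrative `0.183` value from above (the variable name is my own), the `%` format type multiplies by 100 and appends a percent sign:

```python
below_poverty_share = 0.183
print(
    "The share of US households earning less than "
    f"$20,000 in 2008 was {below_poverty_share:.1%}"
)
```

This prints the value as `18.3%`, so the reader never has to guess whether `0.18` was meant as a proportion or a percentage.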
