# Module 8: File Names, Project Organization, Virtual Environments

In this module, we look at why good file names and a sensible project layout matter, and at the benefits of using virtual environments.

## File Names

When it comes to naming conventions, there are a few guidelines worth following.

In particular, there are three main principles for effective file names.

File names should be:
1. Informative and easy for humans to read
2. Interpretable by machines
3. Sortable in a useful order




## 1. Readable by Humans

Filenames are a chance to describe a file's contents, so that you don't have to open each file to find out what it holds (which would be incredibly tedious!).

It is therefore crucial to write filenames that humans can interpret at a glance. Always aim for filenames that reflect key aspects of the file's content. For example, `2023-05-01_survey-responses-cleaned.csv` tells a reader far more than `data2.csv`.

The informative part of a filename is often called a "slug." The same term describes the last part of a URL, which is meant to reflect the webpage's content. A readable slug that matches the content makes a link more likely to be clicked and shared, and it helps search engines identify what the page is about.
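To make the idea of a slug concrete, here is a minimal sketch that turns a free-text description into a filename-friendly slug. The helper name `slugify` and its rules are our own assumptions for illustration, not a standard library function. The rules are: lowercase everything, and collapse any run of characters other than `a-z` and `0-9` into a single hyphen.

```python
import re


def slugify(description: str) -> str:
    """Turn a free-text description into a lowercase, hyphen-separated slug.

    Hypothetical helper: every run of characters outside a-z and 0-9
    (spaces, punctuation, accented letters) becomes a single hyphen.
    """
    lowered = description.lower()
    # Replace each run of non-alphanumeric characters with one hyphen
    slug = re.sub(r"[^a-z0-9]+", "-", lowered)
    # Remove hyphens left over at the start or end
    return slug.strip("-")


print(slugify("Survey Responses (Cleaned)!"))  # survey-responses-cleaned
```

Note that these rules drop non-ASCII letters, which keeps names safe for machines at some cost to readability in other languages.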


## 2. Readable by Machines

It is crucial to use filenames that are both machine-friendly and consistent across files with similar or related content.

In particular, avoid spaces and most punctuation in filenames. These characters often have special meanings in the shell and in programming languages, so filenames containing them need quoting or escaping to keep scripts or the terminal from misinterpreting them.

For example, `figure1-scatterplot-*-vs-&.png` is not an effective filename. Some of its characters have specific functions in the shell: `*` is a wildcard that matches other file names, and `&` runs a command in the background. Typing the name unquoted can therefore match the wrong files or run unintended commands.

Another candidate, `Figure 1 scatterplot of asterisks vs ampersands.png`, might seem suitable because it is easy to read and doesn't appear to contain special characters. The problem is that it contains spaces. In the shell, an unquoted space splits the name into several separate arguments, and spaces are easy to mishandle in scripts written in languages like Python. It is recommended to use hyphens instead of spaces to separate words.

The filename should also stay easy for humans to read and should describe the file's contents. For example, `fig1_scatterplot-asterisks-vs-ampersands.png` addresses all these issues. It has no spaces or special characters, and it still describes the figure. Here the underscore separates the figure number from the description, while hyphens separate the words.
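These rules can be checked automatically. The sketch below flags any name containing characters other than letters, digits, `.`, `_` and `-`. This allowed character set is our assumption of a conservative, machine-friendly policy, not an official standard.

```python
import re

# Assumed policy: ASCII letters, digits, dot, underscore and hyphen only
SAFE_NAME = re.compile(r"[A-Za-z0-9._-]+")


def is_machine_friendly(filename: str) -> bool:
    """Return True if the filename uses only the conservative character set above."""
    return SAFE_NAME.fullmatch(filename) is not None


for name in [
    "figure1-scatterplot-*-vs-&.png",
    "Figure 1 scatterplot of asterisks vs ampersands.png",
    "fig1_scatterplot-asterisks-vs-ampersands.png",
]:
    print(f"{name!r}: {is_machine_friendly(name)}")
```

For the three names discussed in this section, the loop reports `False`, `False` and `True`.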


## 3. Plays Well with Ordering

Another helpful convention is to choose filenames that sort in a useful order by default. Depending on the context, this involves several techniques:

- starting filenames with something numeric
- left-padding numbers with zeros, or listing files in "natural" order
- writing dates in the `YYYY-MM-DD` (ISO 8601) format

These practices keep files in a logical order, so they are easy to sort and find when needed.

As an example, consider three files: `figure03.png`, `figure23.png`, and `figure3.png`.

If we type `ls` in the terminal, the output is:

```
figure03.png
figure23.png
figure3.png
```

If we're not careful with our naming convention, `figure23.png` comes before `figure3.png`, even though 3 is a smaller number than 23. This happens because `ls` compares names character by character, and the character `2` sorts before `3`. This is why we use leading zeros: as the listing shows, `figure03.png` correctly comes before `figure23.png`.
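Python's built-in `sorted()` also compares strings character by character, so it reproduces the problem shown above. This sketch shows how to make plain alphabetical order match the order you want:

- Zero-padding numbers with the `:02d` format specifier makes them sort in numeric order.
- Writing dates as `YYYY-MM-DD` with `date.isoformat()` makes them sort in chronological order.

The file names are made up for illustration.

```python
from datetime import date

# Unpadded numbers: "figure23" sorts before "figure3" because "2" < "3"
unpadded = [f"figure{n}.png" for n in (3, 23, 1)]
print(sorted(unpadded))  # ['figure1.png', 'figure23.png', 'figure3.png']

# Zero-padded to two digits: alphabetical order now matches numeric order
padded = [f"figure{n:02d}.png" for n in (3, 23, 1)]
print(sorted(padded))  # ['figure01.png', 'figure03.png', 'figure23.png']

# ISO 8601 dates (YYYY-MM-DD) sort chronologically as plain strings
days = [date(2023, 11, 2), date(2023, 2, 15), date(2022, 12, 31)]
reports = [f"{d.isoformat()}_report.csv" for d in days]
print(sorted(reports))
# ['2022-12-31_report.csv', '2023-02-15_report.csv', '2023-11-02_report.csv']
```

Pick the padding width from the largest number you expect. With two digits, `figure100.png` would again sort out of place.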

Another option is to pass the `-v` option to GNU `ls` (`ls -v`), which lists files in "natural" (version) order, so that `figure3.png` comes before `figure23.png`.
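If you can't rename the files, you can also sort them "naturally" in code. Below is a minimal sketch of a natural sort key that compares runs of digits as integers. It approximates the idea behind `ls -v` but is not guaranteed to match GNU's version sort in every case. The helper name `natural_key` is our own.

```python
import re


def natural_key(name: str) -> list:
    """Split a name into text and integer chunks so that 3 sorts before 23."""
    # A capturing group makes re.split keep the digit runs, so the parts
    # always alternate: text, digits, text, digits, ..., text
    parts = re.split(r"(\d+)", name)
    # Odd positions hold digit runs (compare as numbers); even positions hold text
    return [int(part) if i % 2 else part.lower() for i, part in enumerate(parts)]


files = ["figure03.png", "figure23.png", "figure3.png"]
print(sorted(files, key=natural_key))
```

Here `figure03.png` and `figure3.png` get equal keys. Python's sort is stable, so they keep their input order, and both come before `figure23.png`.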

## Project Organization

Earlier in this module, we discussed how certain filename conventions make it easier to find relevant files. When it comes to organizing an entire project, it's even more important to structure files and folders in a logical, widely recognized way. Maintaining a standard project structure offers several benefits:

1. Well-organized code is self-documenting, making it easier to understand.
2. A new collaborator can quickly locate specific components.
3. Your future self will find it much simpler to reproduce your own results.

It's important to recognize that the code and data that produce the end result are as significant as the final product itself. Because it's hard to change your approach halfway through a project, it's best to start with a standard project structure from the very beginning.

## The Files & Folders in a Standard Project Structure

The `.gitignore` file lists files and folders (or patterns matching them) that Git should ignore, so they are not committed to your repository.

The `README.md` file provides general explanations about the project.

The `environment.yaml` file details the packages and libraries needed to reproduce the computational environment.

The `data/` directory holds raw and processed data. Data files are often large and rarely need version control, so this folder is commonly listed in the `.gitignore` file.

The `docs/` directory holds project documentation, which can sometimes be generated automatically.

Jupyter notebook files, used for exploration and communication, are stored in the `notebooks/` directory.

Code files are usually kept in the `src/` directory, which can include subfolders as necessary.
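The layout described above can be scaffolded with a few lines of Python. This is a minimal sketch using `pathlib`, with several assumptions labeled as illustrative:

- The helper names `create_project_skeleton` and `write_if_missing` are hypothetical.
- The starter file contents are placeholders, and `environment.yaml` is left for you to fill in.
- Real projects often add more folders.

```python
import tempfile
from pathlib import Path

# The folders described above
FOLDERS = ["data", "docs", "notebooks", "src"]


def write_if_missing(path: Path, text: str) -> None:
    """Write text to path unless the file already exists (never overwrite edits)."""
    if not path.exists():
        path.write_text(text)


def create_project_skeleton(root: Path) -> None:
    """Create the standard folders and starter files under root (hypothetical helper)."""
    for folder in FOLDERS:
        # parents=True also creates root; exist_ok=True makes reruns safe
        (root / folder).mkdir(parents=True, exist_ok=True)
    # Keep data out of version control, as suggested above
    write_if_missing(root / ".gitignore", "data/\n")
    write_if_missing(root / "README.md", f"# {root.name}\n\nGeneral explanation of the project.\n")
    # Placeholder: list the packages your project needs under dependencies
    write_if_missing(root / "environment.yaml", f"name: {root.name}\ndependencies:\n")


# Demo in a throwaway folder; "my-project" is an example name
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp) / "my-project"
    create_project_skeleton(project)
    for path in sorted(project.iterdir()):
        print(path.name + ("/" if path.is_dir() else ""))
```

Because existing files are never overwritten, running the function again on an existing project only adds whatever is missing.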

## Virtual Environments

Put simply, virtual environments let us use different versions of the same package on one computer. Without them, we would have to uninstall and reinstall a package every time a project needed a different version.

Most users work on many projects on the same computer, and those projects may need different package versions. The computational environments in which the projects run must therefore be isolated from each other.

Isolated environments also pave the way for reproducibility, because each project's environment can be described exactly and recreated elsewhere.
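Python's standard library includes a `venv` module that creates such isolated environments, which makes the idea easy to see. This sketch creates a throwaway environment in a temporary folder. It skips installing pip (`with_pip=False`) so that the demo is quick and needs no network. It then checks for the `pyvenv.cfg` file that marks a folder as a virtual environment.

The sketch also shows the common check `sys.prefix != sys.base_prefix`, which is true when the running interpreter belongs to a `venv`/`virtualenv` environment. This check is specific to that kind of environment. A Conda environment is a separate Python installation, so the check reports `False` inside one.

```python
import sys
import tempfile
import venv
from pathlib import Path


def in_virtualenv() -> bool:
    """True when this interpreter runs inside a venv/virtualenv environment."""
    # In a venv, sys.prefix points at the environment and base_prefix at the base Python
    return sys.prefix != sys.base_prefix


with tempfile.TemporaryDirectory() as tmp:
    env_dir = Path(tmp) / "demo_env"
    # with_pip=False skips installing pip, so this is quick and offline
    venv.create(env_dir, with_pip=False)
    # Every environment gets its own config file and its own package folder
    print((env_dir / "pyvenv.cfg").exists())  # True
    print("Running inside a virtual environment:", in_virtualenv())
```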


## Conda


To create and use these virtual environments, we have a few options. `virtualenv`, `venv`, `pipenv`, and `conda` are just some notable examples.

Conda in particular is a very popular open-source package and environment management system. It is widely used for Python, though it can also manage packages for other languages.

## Managing Conda Environments

A Conda environment is a collection of packages that can be used for one or multiple projects. By using Conda environments, you can create an isolated Python environment tailored to your specific project.

Major Benefits of Using Conda Environments:
- Reproducibility: Ensures that others can reproduce your project by specifying the exact package versions you used, making it easy for them to install the same versions.
- Manage Different Versions: Allows you to handle different versions of the same package across projects by installing them in separate environments.
- Create Isolated Environments: Enables you to experiment with new packages without affecting the packages used for your data analysis, so an experiment can't break an analysis that already works.

## Creating a Conda environment

There are two ways to set up a Conda environment.

The first method is to create the environment and install packages manually. These packages can be installed either at the time of environment creation or after creating the environment.

The second method is to create the environment from a file in `.yaml` format, for example with `conda env create -f environment.yaml`.
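An environment file lists the environment's name, the channels to install from, and the packages (optionally with versions). Below is a minimal sketch that builds such a file's text in Python, reusing the packages from the `conda create` example later in this section. Several details are assumptions:

- The helper name `environment_yaml` is hypothetical.
- The `conda-forge` channel is an example choice.
- You could just as well write the file by hand.

```python
def environment_yaml(name: str, packages: dict, channels: list) -> str:
    """Build the text of a minimal Conda environment file (hypothetical helper).

    packages maps a package name to a version string, or None for any version.
    """
    lines = [f"name: {name}", "channels:"]
    lines += [f"  - {channel}" for channel in channels]
    lines.append("dependencies:")
    for package, version in packages.items():
        # Same package=version form used on the conda create command line
        spec = package if version is None else f"{package}={version}"
        lines.append(f"  - {spec}")
    return "\n".join(lines) + "\n"


text = environment_yaml(
    "test_env",
    {"python": "3.9", "jupyterlab": None, "pandas": "1.3.0"},
    channels=["conda-forge"],  # assumed channel
)
# Save this text as environment.yaml, then run: conda env create -f environment.yaml
print(text)
```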

The following command creates a new environment called `test_env`:

`conda create -n test_env`

When you see the prompt `Proceed ([y]/n)?`, press Enter to continue.

The `-n` option in this command stands for "name." It specifies the name of the new Conda environment, in this case `test_env`. Without a list of packages, the new environment starts out empty: by default, not even Python is installed.

It's therefore helpful to specify additional details when creating the environment. These include the channel to install packages from (with `-c`), the Python version, and a list of packages to include.

The command below creates a `test_env` environment with Python 3.9, the latest available version of JupyterLab, and pandas 1.3.0.

`conda create -n test_env python=3.9 jupyterlab pandas=1.3.0`
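If you script your setup, it can help to assemble this command programmatically. Here is a minimal sketch that builds the argument list for `conda create` using three options:

- `-n` names the environment.
- `-c` sets the channel.
- `-y` answers "yes" to the confirmation prompt.

The helper name `conda_create_args` is hypothetical. The command is only printed, because actually running it requires Conda to be installed.

```python
import shlex
from typing import Optional


def conda_create_args(name: str, packages: dict, channel: Optional[str] = None, yes: bool = False) -> list:
    """Build the argument list for `conda create` (hypothetical helper).

    packages maps a package name to a version string, or None for the latest version.
    """
    args = ["conda", "create", "-n", name]
    if channel is not None:
        args += ["-c", channel]  # channel to install packages from
    if yes:
        args.append("-y")  # skip the "Proceed ([y]/n)?" prompt
    for package, version in packages.items():
        args.append(package if version is None else f"{package}={version}")
    return args


args = conda_create_args("test_env", {"python": "3.9", "jupyterlab": None, "pandas": "1.3.0"})
print(shlex.join(args))  # conda create -n test_env python=3.9 jupyterlab pandas=1.3.0
# To run it for real (requires Conda): subprocess.run(args, check=True)
```

Passing a list of arguments (rather than one long string) avoids the quoting problems discussed in the file-naming section.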

## Activating a Conda Environment

The default Conda environment is called `base`. It contains Conda itself and its dependencies, plus many more packages if you installed the full Anaconda distribution. While this environment is active, the shell's prompt string is prefixed with `(base)`.

To activate the new environment we just created, type `conda activate test_env` (and `conda deactivate` to deactivate it).

Notice how the prompt string prefix in your shell changes from `(base)` to `(test_env)`, indicating that the `test_env` environment is now active.
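You can also confirm the active environment from inside Python. When an environment is activated, `conda activate` sets two environment variables:

- `CONDA_DEFAULT_ENV` holds the environment's name.
- `CONDA_PREFIX` holds its folder.

Separately, `sys.prefix` shows which installation the running interpreter actually belongs to. The two can differ if you call a Python from another environment directly. The helper name `active_conda_env` in this minimal sketch is our own.

```python
import os
import sys
from typing import Optional


def active_conda_env() -> Optional[str]:
    """Name of the activated Conda environment, or None if none is active."""
    # Set by `conda activate`; absent in shells where Conda isn't activated
    return os.environ.get("CONDA_DEFAULT_ENV")


print("Active Conda environment:", active_conda_env())
print("Interpreter prefix:", sys.prefix)
```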

To view all your environments, type `conda env list`.

In the output, an asterisk (`*`) marks the currently active environment.