# Files and File Types

## Prerequisites
- None

## Learning Outcomes
- File Types
- Organising your files

## Organising files

Files are organised using <b>directories</b>. These will often appear on your computer as folder symbols. 

Directories branch downwards from the root directory and can contain both files and subdirectories. For example, you may have a directory called 'Lab Documents', which might contain a subdirectory 'Module A Practical B', in which you hold all your results, risk assessments, and write-ups for Practical B of Module A. The directory 'Lab Documents' is itself a subdirectory of something else, and so on back to the root directory, which is usually ``C:\`` on Windows and ``/`` on macOS and Linux. This way, we can cluster groups of related files together to keep ourselves organised.
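To see this branching structure from inside Python, here is a minimal sketch using the standard-library `pathlib` module. It assumes a Windows path like the one used for Bella's lab notes in this lesson; "pure" paths only describe a location, so none of these folders need to exist on your computer.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Bella's lab-notes folder on Windows. Pure paths only describe a location,
# so the folders do not need to exist on this computer.
practical = PureWindowsPath(r"C:\Users\Bella\Documents\University\Lab Documents\Module A\Practical B")

# .parents lists each enclosing directory, from the nearest one up to the root
for parent in practical.parents:
    print(parent)

# .anchor gives the root of the path
print("Windows root:", practical.anchor)  # C:\

# The same idea on macOS or Linux (hypothetical folder names)
mac_notes = PurePosixPath("/Users/bella/Documents/University")
print("macOS root:", mac_notes.anchor)  # /
```

The loop walks up one level at a time (``Module A``, ``Lab Documents``, ``University``, and so on) until it reaches ``C:\``.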

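When you are programming, it is important to know which directory you are in (your <b>working directory</b>) for a number of reasons.
- Python has to be able to find any libraries you import. Installed libraries such as NumPy and Matplotlib live in locations Python searches automatically, but your own modules need to be in your working directory or somewhere else on Python's search path.
- If you are running code from the command line (more information in another lesson), you need to be able to move up and down through directories to find your code.
- If you want to read or write a file by giving only its name, Python looks for it in the working directory. Keeping data files in the <b>same</b> directory as your code, and running your code from there, is the simplest way to make this work; otherwise you need to give the file's full path.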
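A quick way to check where you are is to ask Python for its current working directory, the folder it uses for any file given only by name. This sketch uses only the standard `os` and `pathlib` modules; the directory name in the commented-out `os.chdir()` line is just an example.

```python
import os
from pathlib import Path

# Two equivalent ways to ask Python for the current working directory
print(os.getcwd())  # as a plain string
print(Path.cwd())   # as a Path object

# List what the working directory contains, marking subdirectories
for entry in sorted(Path.cwd().iterdir()):
    kind = "dir " if entry.is_dir() else "file"
    print(kind, entry.name)

# os.chdir() changes the working directory, like `cd` on the command line.
# It is commented out here so that running this sketch does not move you:
# os.chdir("Lab Documents")
```

If a file you expect is missing from this listing, Python will not find it by name alone.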

Directories are separated by a backslash, \, on Windows and by a forward slash, /, on macOS and Linux (Python on Windows also accepts forward slashes). If I were looking at my lab notes, I might be in the directory ``C:\Users\Bella\Documents\University\Lab Documents\Module A\Practical B``. Paths can get very long very quickly, so it is worth naming your files and directories accurately and consistently. Also try to avoid creating unnecessary, mostly empty directories.
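Rather than typing separators by hand, you can let `pathlib` choose the right one. This sketch builds the same relative path in Windows and POSIX (macOS/Linux) style, then with `Path`, which uses whatever your own system expects. The folder names are taken from the example above.

```python
from pathlib import Path, PurePosixPath, PureWindowsPath

# The same relative location written with each operating system's separator
parts = ["University", "Lab Documents", "Module A", "Practical B"]
print(PureWindowsPath(*parts))  # University\Lab Documents\Module A\Practical B
print(PurePosixPath(*parts))    # University/Lab Documents/Module A/Practical B

# Forward slashes in a Windows path are converted to backslashes
print(PureWindowsPath("C:/Users/Bella"))  # C:\Users\Bella

# Path picks your system's separator, so joining with the / operator
# works on Windows, macOS and Linux alike
notes = Path("University") / "Lab Documents" / "Module A" / "Practical B"
print(notes)
print(notes.parts)  # ('University', 'Lab Documents', 'Module A', 'Practical B')
```

If you do write a Windows path with backslashes in Python, put an ``r`` in front of the string (for example ``r"C:\Users\Bella"``). Without it, Python treats sequences such as ``\U`` as escape codes and the path will not work.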

### Directory branch diagram
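As a text version of a branch diagram, the sketch below prints a directory tree with one extra level of indentation per subdirectory. `tree_lines` is a hypothetical helper written for this example, and the folder and file names are made up; the tree is built in a temporary folder that is deleted afterwards.

```python
import tempfile
from pathlib import Path

def tree_lines(directory, indent=""):
    """Return one line per file or subdirectory below `directory`, indented by depth."""
    lines = []
    for entry in sorted(Path(directory).iterdir()):
        if entry.is_dir():
            lines.append(f"{indent}{entry.name}/")  # trailing / marks a directory
            lines.extend(tree_lines(entry, indent + "    "))
        else:
            lines.append(f"{indent}{entry.name}")
    return lines

# Build a small, throwaway example tree (hypothetical names)
with tempfile.TemporaryDirectory() as root:
    practical = Path(root) / "Lab Documents" / "Module A" / "Practical B"
    practical.mkdir(parents=True)
    for name in ["results.csv", "risk_assessment.txt", "writeup.txt"]:
        (practical / name).touch()  # create an empty file
    example = tree_lines(root)

print("\n".join(example))
```

This prints:

```
Lab Documents/
    Module A/
        Practical B/
            results.csv
            risk_assessment.txt
            writeup.txt
```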

## File types

If you are running a practical in the lab, you might take a measurement once a minute, or many times a second if, for example, you are running cyclic voltammetry. You then need somewhere to store all those data points. Different file types are marked by the extension at the end of the file name, for example ``.txt``, ``.dat``, ``.cif``, or ``.xyz``. If you have used Excel for data processing in the past, you might have exported data as a ``.csv`` file (Excel's own workbook format is ``.xlsx``). Each file type has a purpose, and each has its own formatting depending on that purpose.

| Name | Extension | Uses | Format |
| -----| ----- | ----- | --- |
| Text file | ``.txt`` | Can take any values | Plain text, for example IMAGE |
| Comma-separated values (CSV) | ``.csv`` | Data storage, delimiter is a comma | IMAGE |
| Data file (DAT) | ``.dat`` | Generic data file; can contain any kind of data (plain text, PDF, audio, etc.). Often not human-readable. | Any |
| Crystallographic Information File (CIF) | ``.cif`` | Standard format for storing crystallographic structural data | IMAGE |
| XYZ file | ``.xyz`` | Standard format for storing atomic coordinates | IMAGE |
| Python file (PY) | ``.py`` | Text file containing Python code, run by the Python interpreter and usually edited in a Python IDE | Plain text |
| Jupyter Notebook (IPYNB) | ``.ipynb`` | Text-based (JSON) file used by Jupyter Notebook to store code, outputs, and notes | IMAGE |
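Your program can read a file's extension in the same way you do. This sketch uses `Path.suffix` to pull out the extension and looks it up in a dictionary built from the table above; `FILE_TYPES` and `describe` are hypothetical names for this example, and the file names are made up.

```python
from pathlib import Path

# Names taken from the table above, keyed by extension
FILE_TYPES = {
    ".txt": "Text file",
    ".csv": "Comma-separated values",
    ".dat": "Data file",
    ".cif": "Crystallographic Information File",
    ".xyz": "XYZ file",
    ".py": "Python file",
    ".ipynb": "Jupyter Notebook",
}

def describe(filename):
    """Return a description of a file based only on its extension."""
    suffix = Path(filename).suffix.lower()  # "results.CSV" -> ".csv"
    return FILE_TYPES.get(suffix, "Unknown file type")

for name in ["voltammetry_run1.csv", "benzene.xyz", "analysis.py", "notes"]:
    print(name, "->", describe(name))
```

Keep in mind that an extension is only a naming convention: renaming a file does not change what is inside it.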

Each file type serves a different purpose. Try opening files with different programs. For example, open a ``.py`` file in your Python IDE (e.g. Spyder, IDLE, or VS Code), then open it in a plain text editor (e.g. Notepad or TextEdit), and look at the difference. The IDE highlights the code and lets you run it and save your changes, but underneath it is the same plain text. Next, open one of the data files in a text editor. With the right code, you can extract the data from a file such as a ``.xyz`` file and make it usable by your program.
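As an example of extracting data, here is a minimal sketch of reading the XYZ format. Line 1 holds the number of atoms, line 2 is a free-text comment, and each remaining line gives an element symbol followed by x, y and z coordinates, separated by whitespace. The water coordinates are illustrative values, and `read_xyz` is a hypothetical helper, not part of any library.

```python
# A water molecule in XYZ format (illustrative coordinates in ångströms)
xyz_text = """3
Water molecule
O    0.000   0.000   0.117
H    0.000   0.757  -0.469
H    0.000  -0.757  -0.469
"""

def read_xyz(text):
    """Return (comment, atoms), where atoms is a list of (element, x, y, z)."""
    lines = text.strip().splitlines()
    n_atoms = int(lines[0])  # line 1: number of atoms
    comment = lines[1]       # line 2: free-text comment
    atoms = []
    for line in lines[2:2 + n_atoms]:    # one atom per remaining line
        element, x, y, z = line.split()  # split on any whitespace
        atoms.append((element, float(x), float(y), float(z)))
    return comment, atoms

comment, atoms = read_xyz(xyz_text)
print(comment)
for element, x, y, z in atoms:
    print(f"{element}: x={x}, y={y}, z={z}")
```

To read a real file instead, you could pass its contents in, e.g. ``read_xyz(open("molecule.xyz").read())``, where ``molecule.xyz`` is a placeholder name.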

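If you want to use information from a file in your program, the simplest approach is to keep the file in the same <b>directory</b> as your program (and run the program from there); otherwise, give the file's full path.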
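The difference between a bare file name and a full path can be tried out safely in a temporary folder. In this sketch the file name, column titles and values are made up. Opening by full path works from anywhere, while opening by name alone only works after changing the working directory to the folder that holds the file.

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as folder:
    # Create a small data file (hypothetical name and values) in a temporary folder
    data_path = Path(folder) / "practical_b_results.txt"
    data_path.write_text("time/s,current/uA\n0,0.12\n1,0.15\n")

    # 1. The full path works whatever the working directory is
    with open(data_path) as f:
        first_line = f.readline().strip()

    # 2. The bare file name only works once the working directory is that folder
    original = os.getcwd()
    os.chdir(folder)
    try:
        with open("practical_b_results.txt") as f:
            n_lines = len(f.readlines())
    finally:
        os.chdir(original)  # always change back afterwards

print(first_line)  # time/s,current/uA
print(n_lines)     # 3
```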

## Delimiters

Values in different kinds of files are separated in different ways. In text files, words are separated by spaces. In .csv files, cells are separated by commas, or sometimes semicolons. The <b>delimiter</b> is the character or sequence of characters used to mark the boundary between separate, independent values.
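In Python, the usual way to split a line of text on a delimiter is the built-in `str.split()` method. The numbers in this sketch are made up.

```python
# Splitting lines of text on different delimiters with str.split()
sentence = "the quick brown fox"
print(sentence.split(" "))       # ['the', 'quick', 'brown', 'fox']

csv_row = "0.0,1.25,2.50"
print(csv_row.split(","))        # ['0.0', '1.25', '2.50']

semicolon_row = "0.0;1.25;2.50"
print(semicolon_row.split(";"))  # ['0.0', '1.25', '2.50']

# The pieces are still strings; convert them before doing arithmetic
values = [float(v) for v in csv_row.split(",")]
print(sum(values))               # 3.75
```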

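Other kinds of delimiter include new lines, written as "\n", tabs, written as "\t" (a single character that editors usually display as 4 or 8 spaces), or occasionally a sequence of characters such as "&#&#". A multi-character delimiter is useful when the values may themselves contain a simpler delimiter: for example, a comma inside an Excel cell would otherwise be mistaken for a cell boundary in a CSV file. (CSV files usually handle this case by wrapping such values in double quotes.)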
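The sketch below shows each of these delimiters in action, using made-up values. The last part compares a plain `split(",")` with the standard-library `csv` module on a line where one value is wrapped in double quotes because it contains a comma.

```python
import csv

# New lines separate rows; tabs separate the values within each row
table_text = "time\tcurrent\n0\t0.12\n1\t0.15"
rows = [line.split("\t") for line in table_text.split("\n")]
print(rows)  # [['time', 'current'], ['0', '0.12'], ['1', '0.15']]

# A multi-character delimiter copes with values that contain commas
record = "Smith, J.&#&#Cyclic voltammetry, run 1&#&#0.15"
print(record.split("&#&#"))  # ['Smith, J.', 'Cyclic voltammetry, run 1', '0.15']

# CSV files usually wrap such values in double quotes instead.
# A plain split(",") breaks the quoted value apart; the csv module does not.
line = '"Smith, J.",run 1,0.15'
print(line.split(","))           # ['"Smith', ' J."', 'run 1', '0.15']
print(next(csv.reader([line])))  # ['Smith, J.', 'run 1', '0.15']
```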

You will see these appear in the files below.

## Common file types

### .csv files

CSV files are among the most common data file types. You will often see data from Excel saved in this format.

IMAGE = excel doc and associated csv format

As you can see, each cell is separated from the cells beside it by a comma, and from the cells above and below it by a new line. The delimiter between cells is a comma.
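Once a sheet is in CSV form, Python's `csv` module can split it back into rows and cells. In this sketch the column titles and values are illustrative, and the CSV text is held in a string (via `io.StringIO`) so the example is self-contained.

```python
import csv
import io

# What a small Excel sheet might look like once saved as CSV (illustrative values):
# one row per line, one comma between neighbouring cells
csv_text = """Potential (V),Current (uA)
-0.2,-1.05
0.0,0.12
0.2,1.31
"""

reader = csv.reader(io.StringIO(csv_text))
header = next(reader)                                     # first row: column titles
rows = [[float(cell) for cell in row] for row in reader]  # remaining rows: numbers

potentials = [row[0] for row in rows]  # first column
currents = [row[1] for row in rows]    # second column
print(header)
print(potentials)  # [-0.2, 0.0, 0.2]
print(currents)    # [-1.05, 0.12, 1.31]
```

For a real file, replace the `io.StringIO(...)` part with a file opened as ``open("results.csv", newline="")``, where ``results.csv`` is a placeholder name. The ``newline=""`` argument is what the `csv` module documentation recommends.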

If you want to take data values from an Excel sheet, make sure it is saved in .csv format. If it isn't, click "Save As" and change the file type to CSV. However, avoid the "CSV UTF-8" option: it adds an invisible byte-order mark (BOM) to the start of the file, which can show up as stray characters (such as ``ï»¿``) at the start of the first value when you extract information from your file.
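The unexpected formatting from a "CSV UTF-8" file usually comes from a byte-order mark (the character ``\ufeff``) that Excel adds to the start of the file. This sketch assumes that is the cause. It imitates such a file by writing it with Python's ``utf-8-sig`` encoding, which adds the mark, and then reads it back in two ways. The file name and contents are made up.

```python
import csv
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "results.csv")

    # Imitate a "CSV UTF-8" file: utf-8-sig writes a byte-order mark first
    with open(path, "w", encoding="utf-8-sig", newline="") as f:
        f.write("Time (s),Current (uA)\n0,0.12\n")

    # Read as plain UTF-8: the mark stays glued to the first title
    with open(path, encoding="utf-8", newline="") as f:
        header_plain = next(csv.reader(f))

    # Read as utf-8-sig: the mark is removed if it is present
    with open(path, encoding="utf-8-sig", newline="") as f:
        header_clean = next(csv.reader(f))

print(header_plain)  # first title starts with the invisible '\ufeff'
print(header_clean)  # ['Time (s)', 'Current (uA)']
```

So if you are sent a CSV UTF-8 file and cannot re-save it, opening it with ``encoding="utf-8-sig"`` is one way to get clean values.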