# File Formats
## Try me
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ffraile/computer_science_tutorials/blob/main/source/Data%20Manipulation/tutorials/Files.ipynb)[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/ffraile/computer_science_tutorials/main?labpath=source%2FData%20Manipulation%2Ftutorials%2FFiles.ipynb)
## Introduction
Before we dive into data processing, let us discuss some common file formats used to store data, set the basic terminology, and describe the main steps involved when dealing with data files in computer programming.

### Basic explanation of how Python reads files
In the end, a file is just a collection of bytes containing information for a specific purpose. In this Notebook, we will address different common file formats that contain information represented as **text**. Text files are composed of characters and organized in **lines**. In storage, characters need to be **encoded** into bytes. This process is called character encoding and each file may use a different character encoding, although your operating system will define a default character encoding to be used.
Line breaks are stored using a special character (```"\n"```, or the pair ```"\r\n"``` on Windows). The end of the file, by contrast, is not stored as a character: the operating system knows how long the file is and tells Python when there is nothing left to read.
So basically, when reading a text file in Python, we read the contents line by line until the end of the file is reached. But before we are able to read anything, we need to open the file.
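
To make character encoding more concrete, the following minimal sketch encodes the same text with two different encodings and compares the resulting bytes. The sample string and the variable names are illustrative and not part of the original example files.

```python
# The same text becomes different bytes depending on the character encoding.
text = "caf\u00e9\n"  # illustrative sample: "café" followed by a line break

utf8_bytes = text.encode("utf-8")      # the accented letter takes two bytes in UTF-8
latin1_bytes = text.encode("latin-1")  # the accented letter takes one byte in Latin-1

print(utf8_bytes)    # b'caf\xc3\xa9\n'
print(latin1_bytes)  # b'caf\xe9\n'

# Decoding with the wrong encoding silently produces different text.
garbled = utf8_bytes.decode("latin-1")
print(garbled == text)  # False

# Decoding with the matching encoding recovers the original string.
restored = utf8_bytes.decode("utf-8")
print(restored == text)  # True
```

When the default encoding of the operating system does not match the encoding of a file, the built-in ```open()``` function accepts an ```encoding``` argument (for instance ```encoding="utf-8"```) to state it explicitly.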

#### Opening files
After this brief explanation, and without further ado, let's start practicing. Copy the content of the next cell into a plain text editor (like Notepad or TextEdit) and save it in a file named `example.txt`:

```text
Hello,
This is the first file to try in Python.
Best luck!
```
Once you have saved it, you need to make it available to your Python runtime. If you have opened this Notebook in Colab, open the side menu *Files* (the one with the folder 📁 icon), and either drag and drop the file into the area where the files and folders in your runtime are listed, or click on the upload button.

![Import file in colabs](img/colabs_import.png)

> ☝ Note that you can also connect your Google Drive folder to your runtime and use any file you have stored in there!

Once you have uploaded the file (and the example.txt file is available in the file system of your Python runtime, as in the figure), you are ready to test the following cell:


```python
f = open("example.txt")
line = f.readline()  # read one line
while line:  # an empty string evaluates to False, which ends the loop
    print(line)
    line = f.readline()  # read the next line
f.close()  # close the file
```

```text
Hello,

This is the first file to try in Python.

Best luck!

```


Note that we used the built-in function ```open()``` to open the file. Its first argument is the location of the file you want to open, either relative to the current working directory of your Python runtime, or absolute, from the root directory of your file system. You need to have permissions in your file system to open the file, otherwise this line will raise an error.
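
The following sketch illustrates both points: how a path is resolved and what happens when a file cannot be opened. It is a minimal example under some assumptions: it creates its own temporary folder with the standard ```tempfile``` module, the file name ```missing.txt``` is made up, and it uses the mode ```"w"``` (explained in the Modes section) only to create the sample file.

```python
import os
import tempfile

# Create a throwaway folder and file so that the sketch is self-contained.
tmp_dir = tempfile.mkdtemp()
abs_path = os.path.join(tmp_dir, "example.txt")  # an absolute path
f = open(abs_path, "w")
f.write("Hello,\n")
f.close()

# Relative paths are resolved against the current working directory.
print(os.getcwd())

# An absolute path works regardless of the current working directory.
f = open(abs_path)
first_line = f.readline()
f.close()
print(first_line)

# A missing file raises FileNotFoundError and a lack of permissions raises
# PermissionError; both are subclasses of OSError.
try:
    open(os.path.join(tmp_dir, "missing.txt"))  # hypothetical file name
    error_name = None
except OSError as err:
    error_name = type(err).__name__
print(error_name)  # FileNotFoundError
```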

The ```open()``` function returns a file object (assigned to variable ```f``` in the example), which has a ```readline()``` method that returns a string with the content of the next line (the first line after calling open and subsequent lines thereafter), until the end of the file is reached, in which case an empty string is returned. In the example, we assign the result to the variable ```line``` in a while loop. Since an empty string evaluates to false, the example prints the file line by line and exits the loop when the end of the file is reached.
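
The blank lines in the output above appear because each string returned by ```readline()``` keeps its trailing line break, and ```print()``` adds another one. The sketch below makes this visible with ```repr()```. It writes its own small sample file in a temporary folder, which is an assumption made only to keep the example self-contained.

```python
import os
import tempfile

# Write a two-line sample file in a temporary folder.
path = os.path.join(tempfile.mkdtemp(), "example.txt")
f = open(path, "w")
f.write("Hello,\nBest luck!\n")
f.close()

f = open(path)
first = f.readline()
second = f.readline()
at_end = f.readline()  # nothing left to read
f.close()

print(repr(first))   # 'Hello,\n'
print(repr(second))  # 'Best luck!\n'
print(repr(at_end))  # ''

# Passing end="" to print() avoids the extra blank line.
print(first, end="")
```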

Finally, we use the method ```close()``` to close the file. Closing the file releases the resources that the operating system reserved for it and makes sure that any pending changes are written to disk, which avoids inconsistencies when other applications use the same file (more on this below).

In some examples, you may find that the file is opened using the keyword ```with```, as in the following example:


```python
with open("example.txt") as f:
    line = f.readline()  # read one line
    while line:  # an empty string evaluates to False, which ends the loop
        print(line)
        line = f.readline()  # read the next line
# No call to f.close() is needed: the with block closes the file on exit
```

```text
Hello,

This is the first file to try in Python.

Best luck!

```


The ```with``` statement assigns the result of the ```open()``` function to the variable ```f``` and closes the file automatically when the indented block below it finishes, even if an error occurs inside the block. For this reason, no explicit call to ```close()``` is needed, and the file stays open only while it is required.
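
A quick way to check this behaviour is the ```closed``` attribute of the file object. The sketch below is a minimal illustration that creates its own sample file in a temporary folder; the simulated error is only there to show that the file is closed in that case too.

```python
import os
import tempfile

# Create a small sample file in a temporary folder.
path = os.path.join(tempfile.mkdtemp(), "example.txt")
with open(path, "w") as f:
    f.write("Hello,\n")

with open(path) as f:
    inside = f.closed  # the file is still open inside the block
    first_line = f.readline()
after = f.closed       # the file has been closed on leaving the block

print(inside)  # False
print(after)   # True

# The file is also closed when the block ends because of an error.
try:
    with open(path) as g:
        raise ValueError("simulated failure")
except ValueError:
    pass
closed_after_error = g.closed
print(closed_after_error)  # True
```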

#### Modes
The ```open()``` function has some additional arguments worth highlighting; one of them is the opening mode. This argument gives additional control over how the file is opened, explicitly indicating what we want to do with the file in our program, so that, for instance, we cannot accidentally write to a file that we opened only for reading. The opening mode is specified using the characters in the table below, extracted from the official Python documentation:

| Character | Meaning                                                         |
|-----------|-----------------------------------------------------------------|
| 'r'       | open for reading (default)                                      |
| 'w'       | open for writing, truncating the file first                     |
| 'x'       | open for exclusive creation, failing if the file already exists |
| 'a'       | open for writing, appending to the end of file if it exists     |
| 'b'       | binary mode                                                     |
| 't'       | text mode (default)                                             |
| '+'       | open for updating (reading and writing)                         |

By default, files are opened with mode 'rt' (or 'r', which is equivalent), so that we can only read the file and cannot write to it. The mode 'w' allows us to write to the file using the ```write()``` method, but first it *truncates* the file, meaning that in practice we overwrite its contents. If we do not want to overwrite the contents of the file, we can either use mode 'a' (to append content after the last line of the file), or mode 'r+', which opens the file for both reading and writing without truncating it, starting at the beginning of the file; in this mode, ```write()``` replaces the existing characters at the current position.
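
The differences between these modes can be checked with a short experiment. The sketch below is only an illustration: the file name ```modes.txt``` and its contents are made up, the file lives in a temporary folder, and ```read()``` is used to get the whole content of the file as a single string.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "modes.txt")  # hypothetical file name

# 'w' creates the file (or empties it if it already exists) before writing.
with open(path, "w") as f:
    f.write("apples\nbread\n")

# 'a' keeps the existing content and writes at the end of the file.
with open(path, "a") as f:
    f.write("cheese\n")

# 'r+' keeps the content and starts at the beginning, so writing replaces
# the first characters in place.
with open(path, "r+") as f:
    f.write("APPLES")

with open(path) as f:
    content = f.read()
print(repr(content))  # 'APPLES\nbread\ncheese\n'

# Opening with 'w' again discards everything that was in the file.
with open(path, "w") as f:
    f.write("milk\n")
with open(path) as f:
    after_truncate = f.read()
print(repr(after_truncate))  # 'milk\n'

# 'x' refuses to open a file that already exists.
try:
    open(path, "x")
    x_error = None
except FileExistsError as err:
    x_error = type(err).__name__
print(x_error)  # FileExistsError
```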

In the example below, we write a small program to write a shopping list into a file using the input provided by the user:

```python
with open("list.txt", 'a') as f:
    while True:
        line = input("Write something to append to the list or press Enter to exit: ")
        if not line:
            break  # an empty input ends the loop; the with block closes the file
        f.write(line + "\n")  # add a line break after each entry
```

Note that we appended the special character ```"\n"``` to the string passed to ```write()``` so that each entry in the list is written on a new line.
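
To see the effect of the line break without typing anything, here is a non-interactive sketch of the same idea. It assumes a fixed list of items instead of user input and a temporary folder for ```list.txt```, then reads the file back. Iterating over the file object with ```for``` yields one line at a time, and ```rstrip("\n")``` removes the trailing line break.

```python
import os
import tempfile

list_path = os.path.join(tempfile.mkdtemp(), "list.txt")
items = ["milk", "eggs", "bread"]  # stands in for the user input

# Append each item followed by a line break.
with open(list_path, "a") as f:
    for item in items:
        f.write(item + "\n")

# The raw content shows one line break after each entry.
with open(list_path) as f:
    raw = f.read()
print(repr(raw))  # 'milk\neggs\nbread\n'

# Read the list back, one entry per line, without the line breaks.
with open(list_path) as f:
    read_back = [line.rstrip("\n") for line in f]
print(read_back)  # ['milk', 'eggs', 'bread']
```

Without the ```"\n"```, all entries would end up on a single line (```milkeggsbread```), and it would no longer be possible to tell them apart when reading the file.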
