## **D3TOP - Tópicos em Ciência de Dados (IFSP Campinas)**
**Prof. Dr. Samuel Martins (@iamsamucoding @samucoding @xavecoding)** <br/>
xavecoding: https://youtube.com/c/xavecoding <br/><br/>

<a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>.

<hr/>

# Files

Python uses file objects to interact with external files on your computer. These files can be of any type: audio files, text files, emails, Excel documents, etc. Note: you will probably need to install certain libraries or modules to work with some of these file types, but they are easily available. (We will cover installing modules later on in the course.)

Python has a built-in `open()` function that allows us to open and work with basic file types.

## Opening a File in Python

### Know Your File's Location

In [None]:
# show the absolute pathname of the folder that contains
# this Jupyter notebook
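The notebook's own code for this cell is not shown. As a minimal sketch, the standard library reports the current working directory, which is the folder relative pathnames are resolved from. In Jupyter this is typically the folder that contains the notebook (an assumption that holds with default kernel settings).

```python
import os
from pathlib import Path

# Current working directory: relative pathnames are resolved from here
cwd_str = os.getcwd()   # as a plain string
cwd_path = Path.cwd()   # as a pathlib.Path object

print(cwd_str)
print(cwd_path)
```

Inside IPython/Jupyter, the `%pwd` magic prints the same folder.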

### Opening a file

Be sure you are passing the **correct pathname** for the file, which can be:
- the _relative pathname_ from the current folder of this Jupyter notebook
- the _absolute pathname_ of the file

In [None]:
# incomplete pathname


In [None]:
# correct relative pathname


In [None]:
# or correct absolute pathname
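The three cells above are empty in this copy, so here is a hedged sketch of the same idea. It builds a hypothetical temporary folder containing `demos/example.txt`, treats it as the notebook's folder, and compares a correct relative pathname, its absolute version, and an incomplete pathname. `pathlib` and `tempfile` are used only to keep the example self-contained.

```python
import os
import tempfile
from pathlib import Path

# Hypothetical setup: a temporary folder containing demos/example.txt
base = Path(tempfile.mkdtemp())
(base / "demos").mkdir()
(base / "demos" / "example.txt").write_text("hello\n")

original_cwd = os.getcwd()
os.chdir(base)  # pretend this temporary folder holds the notebook
missing_error = False
try:
    relative = Path("demos/example.txt")  # resolved from the current folder
    absolute = relative.resolve()         # full path from the filesystem root
    incomplete = Path("example.txt")      # the 'demos/' folder is missing

    found = [relative.exists(), absolute.exists(), incomplete.exists()]
    print(found)  # [True, True, False]

    try:
        open(incomplete)
    except FileNotFoundError as err:
        print("Error:", err)
        missing_error = True
finally:
    os.chdir(original_cwd)  # restore the original working directory
```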

The `open` function receives two main parameters: `file pathname` and `access mode`. The available `access modes` are:

- `"r"`: **Read** - Default value. Opens a file for reading; raises `FileNotFoundError` if the file does not exist
- `"a"`: **Append** - Opens a file for appending (writing at its end); creates the file if it does not exist
- `"w"`: **Write** - Opens a file for writing; creates the file if it does not exist and **erases its content** if it does
- `"x"`: **Create** - Creates the specified file; raises `FileExistsError` if the file already exists

In addition, you can specify whether the file should be handled in **binary** or **text mode**:

- `"t"`: **Text** - Default value. Text mode
- `"b"`: **Binary** - Binary mode (e.g. images)

By default, files are opened in **text reading mode** - `rt`.
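A small sketch of some of the modes listed above, using a hypothetical file `notes.txt` in a temporary folder: `'x'` refuses to touch an existing file, `'r'` fails on a missing file, and `'rb'` returns `bytes` while the default `'rt'` returns `str`.

```python
import tempfile
from pathlib import Path

path = Path(tempfile.mkdtemp()) / "notes.txt"  # hypothetical file name
x_failed = False
r_failed = False

# 'x' creates a new file...
with open(path, "x") as f:
    f.write("first line\n")

# ...and fails if the file already exists
try:
    open(path, "x")
except FileExistsError:
    x_failed = True

# 'r' on a missing file raises FileNotFoundError
try:
    open(path.with_name("missing.txt"))
except FileNotFoundError:
    r_failed = True

# default text mode ('rt') returns str; 'rb' returns raw bytes
with open(path) as f:
    as_text = f.read()
with open(path, "rb") as f:
    as_bytes = f.read()

print(repr(as_text))   # 'first line\n'
print(repr(as_bytes))  # b'first line\n' on Linux/macOS (b'first line\r\n' on Windows)
```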

In [None]:
# alternative one


In [None]:
# alternative two


In [None]:
# alternative three
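The three alternative cells are empty here. A hedged guess at what they illustrate: omitting the mode, passing `'r'`, and passing `'rt'` all open a file for reading in text mode, so they read the same content. The file below is a hypothetical stand-in.

```python
import tempfile
from pathlib import Path

path = Path(tempfile.mkdtemp()) / "sample.txt"  # hypothetical file
path.write_text("May the Force be with you.\n")

f1 = open(path)        # mode omitted: defaults to 'r' (text reading)
f2 = open(path, "r")   # explicit read mode; text is still the default
f3 = open(path, "rt")  # fully explicit: read + text

contents = [f.read() for f in (f1, f2, f3)]
print(contents[0] == contents[1] == contents[2])  # True

for f in (f1, f2, f3):
    f.close()
```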


## Closing a file
Once you no longer need to interact with a file, it should be **closed** with the `.close()` method, which releases the file and makes sure any written content is saved.
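A minimal sketch of closing a file, using a hypothetical temporary file: the `.closed` attribute tells whether the file object is still open, and any operation on a closed file raises `ValueError`.

```python
import tempfile
from pathlib import Path

path = Path(tempfile.mkdtemp()) / "sample.txt"  # hypothetical file
path.write_text("some text\n")

myfile = open(path)
print(myfile.closed)  # False: the file is still open
myfile.close()        # release the file
print(myfile.closed)  # True

closed_error = False
try:
    myfile.read()     # any operation on a closed file fails
except ValueError as err:
    print("Error:", err)
    closed_error = True
```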

## Reading the content of a file: `.read()`

In [None]:
myfile = open('demos/star-wars-wikipedia.txt')
myfile

The `.read()` method reads the *entire content* of the file and returns it as a single string.

Note that the file has three paragraphs, each separated by an empty line. Every line break is stored as the `\n` character.
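Since `demos/star-wars-wikipedia.txt` is not available here, this sketch writes a hypothetical three-paragraph text to a temporary file. It shows that `.read()` returns everything as one string with `\n` line breaks, so an empty line between paragraphs appears as `\n\n`.

```python
import tempfile
from pathlib import Path

# Hypothetical stand-in for the Star Wars file: three short paragraphs
text = "First paragraph.\n\nSecond paragraph.\n\nThird paragraph.\n"
path = Path(tempfile.mkdtemp()) / "paragraphs.txt"
path.write_text(text)

myfile = open(path)
content = myfile.read()  # the whole file as a single string
myfile.close()

print(repr(content))
paragraphs = content.strip().split("\n\n")  # split on the empty lines
print(paragraphs)  # ['First paragraph.', 'Second paragraph.', 'Third paragraph.']
```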

<br/>

If you try to read the same file again:

**There is no content!**

This happens because the **reading "cursor"** is _at the end of the file_ after having read it. **So there is nothing left to read**, and `.read()` returns an empty string `''`.

When opening a file, the reading "cursor" is at the _beginning of it_.

<br/>

We can reset the reading "cursor" by using the `.seek()` method:

In [None]:
# Seek to the start of file (index 0)


This method **moves** the _reading "cursor"_ to the position of **byte 0**, that is, the beginning of the file.

<br/>

Now we can read the entire file again!
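A self-contained sketch of the whole cycle, with a hypothetical temporary file: the first `.read()` consumes everything, the second returns `''`, and `.seek(0)` rewinds so the content can be read again.

```python
import tempfile
from pathlib import Path

path = Path(tempfile.mkdtemp()) / "sample.txt"  # hypothetical file
path.write_text("Star Wars is an example sentence.\n")

myfile = open(path)
first = myfile.read()   # reads everything; the cursor ends at the end of the file
second = myfile.read()  # nothing left to read: returns ''
myfile.seek(0)          # move the cursor back to byte 0
third = myfile.read()   # the whole content again
myfile.close()

print(repr(second))    # ''
print(first == third)  # True
```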

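We can move the reading cursor to any (valid) position. In plain ASCII text, each character takes one byte of space; in encodings such as UTF-8, non-ASCII characters (e.g., `ã`, `ç`) take more than one byte, so byte and character positions may differ. (Strictly speaking, in text mode Python only guarantees `.seek()` with `0` or with a value previously returned by `.tell()`.)

Thus, if you pass 16 to `.seek()`, the _reading cursor_ moves 16 bytes (here, 16 characters) from the beginning of the file.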

In [None]:
content_again = myfile.read()
content_again

<br/>

Note that the first 16 characters, including the spaces, were skipped: `'Star Wars is an '`

<br/>

To find out the current position of the reading cursor, we use the `.tell()` command:

In [None]:
# move the cursor to byte/character 16


In [None]:
# show the current position of the file
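A sketch of `.seek(16)` and `.tell()` on a hypothetical ASCII file that starts with the same 16 characters quoted above. The last line shows why the byte/character equivalence only holds for ASCII: in UTF-8, `ã` needs two bytes.

```python
import tempfile
from pathlib import Path

# Hypothetical ASCII text sharing the first 16 characters 'Star Wars is an '
path = Path(tempfile.mkdtemp()) / "sample.txt"
path.write_text("Star Wars is an example sentence.\n")

myfile = open(path)
pos_start = myfile.tell()  # 0: a freshly opened file starts at the beginning
myfile.seek(16)            # skip 'Star Wars is an ' (16 ASCII chars = 16 bytes)
pos_after = myfile.tell()  # 16
rest = myfile.read()
myfile.close()

print(pos_start, pos_after)  # 0 16
print(repr(rest))            # 'example sentence.\n'

print(len("ã".encode("utf-8")))  # 2 bytes for a single character
```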


### `.readlines()`
You can read a file **line by line** using the `.readlines()` method, which returns a list where each element is a line of the file (including its trailing `\n`).

**Use caution with *large files***, since everything will be held in memory.
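A sketch contrasting `.readlines()`, which loads every line into a list, with iterating over the file object, which reads one line at a time and is the usual choice for large files. The file `animals.txt` here is a hypothetical temporary file.

```python
import tempfile
from pathlib import Path

path = Path(tempfile.mkdtemp()) / "animals.txt"  # hypothetical file
path.write_text("cat\ndog\nbird\n")

myfile = open(path)
lines = myfile.readlines()  # every line in memory at once
myfile.close()
print(lines)  # ['cat\n', 'dog\n', 'bird\n']

# Memory-friendly alternative: iterate over the file object
myfile = open(path)
stripped = []
for line in myfile:
    stripped.append(line.rstrip("\n"))  # drop the trailing line break
myfile.close()
print(stripped)  # ['cat', 'dog', 'bird']
```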

## Writing to a File

By default, the `open()` function will only allow us to read the file.

We need to pass the argument `'w'` to write to the file. If the file does not exist, an ***empty text file*** is created; if it already exists, its content is erased.

PS: you can choose any extension you wish.

<br/>

Now, use the `.write()` method **to write** a string to the file.

PS: the string can also be built with the _f-string_ convention.

We've just written 35 characters (`.write()` returns the number of characters written). <br/>
Now, open the file in a text editor and check it. The file is probably still empty. This happens because we **haven't closed** the file yet: the written content is buffered in memory and is only guaranteed to be saved to the file when it is closed (or flushed with `.flush()`).

Now, after closing the file with `.close()`, open it in a text editor again and check its content. The content is now there!
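A minimal sketch of the write-then-close sequence, with a hypothetical file name and text: `.write()` returns the number of characters written, and reading the file back after `.close()` shows the saved content.

```python
import tempfile
from pathlib import Path

path = Path(tempfile.mkdtemp()) / "love_letter.txt"  # hypothetical file name

love_letter = open(path, "w")  # creates an empty file (or erases an existing one)
n_chars = love_letter.write("Dear, I miss you so much.\n")  # number of characters written
print(n_chars)

# The text may still sit in Python's buffer; closing flushes it to disk
love_letter.close()
saved = path.read_text()
print(repr(saved))
```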

<br/>

If you open the file again in the **writing mode**, its **content will be overwritten** since the file already exists, that is, all previous content will be lost.

Open the file in the text editor again and check it.
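A sketch of the overwrite behavior with a hypothetical file: reopening an existing file in `'w'` mode truncates it immediately, so only the newly written text remains.

```python
import tempfile
from pathlib import Path

path = Path(tempfile.mkdtemp()) / "love_letter.txt"  # hypothetical file name

f = open(path, "w")
f.write("Old content.\n")
f.close()

f = open(path, "w")  # reopening in 'w' mode erases the previous content
f.write("New content.\n")
f.close()

final = path.read_text()
print(repr(final))  # 'New content.\n': the old text is gone
```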

### Using `f-string`
The `.write()` method also accepts strings built with the `f-string` pattern:

In [None]:
love_letter.write('I walked alone on the street.\n')
love_letter.write('I spoke to the stars and the moon.\n')
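The cell above writes plain strings. A sketch that actually uses f-strings, with hypothetical variables `name` and `n_stars` interpolated into the text, and a hypothetical temporary file:

```python
import tempfile
from pathlib import Path

path = Path(tempfile.mkdtemp()) / "love_letter.txt"  # hypothetical file name
name = "Leia"  # hypothetical values interpolated with f-strings
n_stars = 3

love_letter = open(path, "w")
love_letter.write(f"Dear {name},\n")
love_letter.write(f"I counted {n_stars} stars tonight.\n")
love_letter.close()

letter = path.read_text()
print(letter)
```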

### Appending to a File
Passing the mode `'a'` opens the file and puts the cursor **at the end**, so anything written is **appended**. If the file does not exist, it will be created.
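A sketch of append mode with a hypothetical `diary.txt`: the first `'a'` open creates the file, and the second keeps the existing line and adds the new one after it.

```python
import tempfile
from pathlib import Path

path = Path(tempfile.mkdtemp()) / "diary.txt"  # hypothetical file name

f = open(path, "a")  # the file does not exist yet, so 'a' creates it
f.write("Day 1: started the course.\n")
f.close()

f = open(path, "a")  # reopening in 'a' keeps the old content
f.write("Day 2: learned about files.\n")
f.close()

diary = path.read_text()
print(diary)
```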

### `.writelines()`
We can write a list (or any iterable) of strings to a file with `.writelines()`.

In [None]:
products = ['iPhone', 'Xbox', 'Playstation', 'Nintendo Switch', 'Fusca']

Note that no line separator (e.g., `\n`) was added to each element.
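A sketch using the `products` list from the cell above and a hypothetical output file: plain `.writelines()` glues the strings together, so the line break has to be added to each element, here with a generator expression.

```python
import tempfile
from pathlib import Path

products = ['iPhone', 'Xbox', 'Playstation', 'Nintendo Switch', 'Fusca']
path = Path(tempfile.mkdtemp()) / "products.txt"  # hypothetical file name

f = open(path, "w")
f.writelines(products)  # strings written back to back, no separator
f.close()
glued = path.read_text()
print(glued)  # iPhoneXboxPlaystationNintendo SwitchFusca

f = open(path, "w")
f.writelines(f"{p}\n" for p in products)  # add the line break yourself
f.close()
one_per_line = path.read_text()
print(one_per_line)
```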

## Aliases and Context Managers
You can give the opened file a _temporary variable_ name as an **alias**, and manage the opening and closing of files **automatically** using a **context manager** (`with`):

Note that the `with ... as ...:` *context manager* **automatically closed** `animals.txt` after assigning the first line of text to `first_animal`:
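A sketch of the context manager with a hypothetical stand-in for `animals.txt`: inside the `with` block the file is open; as soon as the block ends, it is closed automatically, while `first_animal` keeps the line that was read.

```python
import tempfile
from pathlib import Path

path = Path(tempfile.mkdtemp()) / "animals.txt"  # hypothetical stand-in
path.write_text("cat\ndog\nbird\n")

with open(path) as animals:            # 'animals' is the alias for the file object
    first_animal = animals.readline()  # read only the first line
    inside_closed = animals.closed     # False: still inside the block

print(inside_closed)       # False
print(animals.closed)      # True: closed automatically when the block ended
print(repr(first_animal))  # 'cat\n'
```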

## Iterating through a File

### Reading a CSV from scratch

In [15]:
# package for debugging


In [26]:
playstore_dict = {
    'apps': [],
    'categories': [],
    'ratings': [],
    'reviews': []
}
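The cell that fills `playstore_dict` is not shown. A hedged from-scratch sketch, assuming the first four columns of the CSV are App, Category, Rating and Reviews (as the header values in the output suggest) and using a small hypothetical sample file: each line is stripped of `\n`, split on commas, and its fields appended to the lists, header included.

```python
import tempfile
from pathlib import Path

# Hypothetical sample with the same first four columns as the Play Store CSV
sample = (
    "App,Category,Rating,Reviews\n"
    "Sample App A,CATEGORY_A,4.1,159\n"
    "Sample App B,CATEGORY_B,4.5,87\n"
)
path = Path(tempfile.mkdtemp()) / "playstore_sample.csv"
path.write_text(sample)

playstore_dict = {'apps': [], 'categories': [], 'ratings': [], 'reviews': []}

with open(path) as f:
    for line in f:  # one line at a time, header included
        app, category, rating, reviews = line.rstrip("\n").split(",")[:4]
        playstore_dict['apps'].append(app)
        playstore_dict['categories'].append(category)
        playstore_dict['ratings'].append(rating)
        playstore_dict['reviews'].append(reviews)

print(playstore_dict['apps'])  # ['App', 'Sample App A', 'Sample App B']

# Pitfall: a quoted field containing a comma breaks the naive split
naive = '"Draw, Paint & Share",CATEGORY_A,4.5,87'.split(",")
print(len(naive))  # 5 pieces instead of 4
```

For real-world CSVs with quoted fields, the standard `csv` module (`csv.reader`) handles this case.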



In [27]:
from pprint import pprint

pprint(playstore_dict)

{'apps': ['App',
          'Photo Editor & Candy Camera & Grid & ScrapBook',
          'Coloring book moana',
          'Gas Prices (Germany only)',
          'Sketch - Draw & Paint',
          'Pixel Draw - Number Art Coloring Book',
          'Paper flowers instructions',
          'Smoke Effect Photo Maker - Smoke Editor',
          'Infinite Painter',
          'Garden Coloring Book',
          'Kids Paint Free - Drawing Fun',
          'Text on Photo - Fonteee',
          'Name Art Photo Editor - Focus n Filters',
          'Tattoo Name On My Photo Editor',
          'Mandala Coloring Book',
          '3D Color Pixel by Number - Sandbox Art Coloring',
          'Learn To Draw Kawaii Characters',
          'Photo Designer - Write your name with shapes',
          '350 Diy Room Decor Ideas',
          'FlipaClip - Cartoon animation',
          'ibis Paint X',
          'Logo Maker - Small Business',
          "Boys Photo Editor - Six Pack & Men's Suit",
          'Superheroes Wallpa

Let's remove the _header_ (first element of each list):

'Reviews'
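The removal cell is not shown; the lone `'Reviews'` output is consistent with `.pop(0)`, which removes and returns the first element of a list. A sketch on a small hypothetical dictionary with the same shape:

```python
# Hypothetical small dictionary shaped like playstore_dict, header still first
playstore_dict = {
    'apps': ['App', 'Sample App A'],
    'categories': ['Category', 'CATEGORY_A'],
    'ratings': ['Rating', '4.1'],
    'reviews': ['Reviews', '159'],
}

removed = []
for column in playstore_dict.values():
    removed.append(column.pop(0))  # remove and return the header value

print(removed)                 # ['App', 'Category', 'Rating', 'Reviews']
print(playstore_dict['apps'])  # ['Sample App A']
```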

In [29]:
pprint(playstore_dict)

{'apps': ['Photo Editor & Candy Camera & Grid & ScrapBook',
          'Coloring book moana',
          'Gas Prices (Germany only)',
          'Sketch - Draw & Paint',
          'Pixel Draw - Number Art Coloring Book',
          'Paper flowers instructions',
          'Smoke Effect Photo Maker - Smoke Editor',
          'Infinite Painter',
          'Garden Coloring Book',
          'Kids Paint Free - Drawing Fun',
          'Text on Photo - Fonteee',
          'Name Art Photo Editor - Focus n Filters',
          'Tattoo Name On My Photo Editor',
          'Mandala Coloring Book',
          '3D Color Pixel by Number - Sandbox Art Coloring',
          'Learn To Draw Kawaii Characters',
          'Photo Designer - Write your name with shapes',
          '350 Diy Room Decor Ideas',
          'FlipaClip - Cartoon animation',
          'ibis Paint X',
          'Logo Maker - Small Business',
          "Boys Photo Editor - Six Pack & Men's Suit",
          'Superheroes Wallpapers | 4K Backgro

### Writing a CSV from scratch
Let's create a CSV file from scratch for the apps we read previously, considering only the columns `'apps'` and `'ratings'`.

In [33]:
# writing the header


11
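A hedged sketch of the full write, with hypothetical data and output file name. A header such as `'App,Rating\n'` has 11 characters, which would match the `11` returned above, but the notebook's exact header is not shown. Each row is then written as `app,rating` followed by `\n`.

```python
import tempfile
from pathlib import Path

# Hypothetical data shaped like playstore_dict after removing the header
playstore_dict = {
    'apps': ['Sample App A', 'Sample App B'],
    'ratings': ['4.1', '4.5'],
}
path = Path(tempfile.mkdtemp()) / "apps_ratings.csv"  # hypothetical output file

with open(path, "w") as f:
    n_header = f.write("App,Rating\n")  # number of characters written
    for app, rating in zip(playstore_dict['apps'], playstore_dict['ratings']):
        f.write(f"{app},{rating}\n")

print(n_header)
print(path.read_text())
```

If an app name contains a comma, this naive approach produces an extra column; `csv.writer` from the standard library quotes such fields automatically.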