# ⇝🥗 Summary Week 03 🥗⇜
## 👨🏽‍🚀🏄‍♂️🚀👨‍🎓🔰

![unpackAI Logo](images/unpackAI_logo_whiteBG.svg)

# Lesson 10: Files 📁
💡📌📦❔ 💻🧙⚠🔖

## 📌To process a file (read its content or write to it), you need 3 steps

1. Open the file first: this creates a "file handler": `<file handler> = open(<path>)`
2. Use the file handler for your operations (read or write)
   - Read the whole content: `<file handler>.read()`
   - Write some content: `<file handler>.write(<some string>)`
   - ... or some other options we will see later
3. Optionally (but recommended), close the file so other programs can read/write it: `<file handler>.close()`

💡When reading a file with **non-ASCII characters** (anything beyond plain English letters, for example icons like this 🧙, Chinese characters, or even accented French or Russian letters), you will usually need to specify an **encoding** so the characters are decoded correctly. UTF-8 works in most cases: to use it, write `open(<path>, encoding="utf-8")` when opening the file.
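The 3 steps and the encoding tip above can be sketched as a small round trip. This is only an illustration: the file name `demo_encoding.txt` and the temporary folder are assumptions made for the example, and the `"w"` mode used for writing is explained in Lesson 11.

```python
import tempfile
from pathlib import Path

# Hypothetical file in a temporary folder, so none of your own files are touched
demo_path = Path(tempfile.mkdtemp()) / "demo_encoding.txt"
original = "Café 🧙 你好"

# Step 1: open (here in write mode, with an explicit encoding)
f = open(demo_path, "w", encoding="utf-8")
# Step 2: use the file handler
f.write(original)
# Step 3: close
f.close()

# Same 3 steps for reading, with the same encoding
f = open(demo_path, encoding="utf-8")
text = f.read()
f.close()

print(text == original)  # True: the characters survived the round trip
```

Using the same encoding for writing and reading is what makes the text come back unchanged.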

In [9]:
# Usually, we use variable "f" for file handler
# We open this notebook with plenty of icons so we use utf-8
f = open("Summary_W03.ipynb", encoding="utf-8")  
content = f.read()
content[:600] # we will just show the first characters

'{\n "metadata": {\n  "language_info": {\n   "codemirror_mode": {\n    "name": "ipython",\n    "version": 3\n   },\n   "file_extension": ".py",\n   "mimetype": "text/x-python",\n   "name": "python",\n   "nbconvert_exporter": "python",\n   "pygments_lexer": "ipython3",\n   "version": "3.7.1"\n  },\n  "orig_nbformat": 2,\n  "kernelspec": {\n   "name": "python371jvsc74a57bd08a428f088ac08f7fdf89fb2f4bb23c2bc8a83acc5d068a78d54410b8e3eeec73",\n   "display_name": "Python 3.7.1 32-bit"\n  }\n },\n "nbformat": 4,\n "nbformat_minor": 2,\n "cells": [\n  {\n   "source": [\n    "# ⇝🥗 Summary Week 03 🥗⇜\\n",\n    "## 👨🏽\u200d🚀🏄\u200d♂️🚀👨\u200d🎓🔰\\n",'

## 💡 Easier way to manipulate files: with `pathlib` 🧙💻

Python version 3.4 introduced an easier way to manipulate paths with the module `pathlib`.

It works like this:

1. Import `Path` from `pathlib` module: `from pathlib import Path`
2. Create the path and store in a variable: `<path variable> = Path(<path>)`
3. Use the path variable, like for example:
   - Read the whole content of the file (no need to open it): `<path variable>.read_text()`
   - Write content to the file (no need to open it): `<path variable>.write_text(<string content>)`
   - ... and much more you can do

All methods available for paths can be found in the [official documentation](https://docs.python.org/3/library/pathlib.html).
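As a quick taste of the "much more" mentioned above, here is a minimal sketch of a few other `Path` features. The folder and the file name `notes.txt` are made up for the example.

```python
import tempfile
from pathlib import Path

# Hypothetical folder and file, created in a temporary location
folder = Path(tempfile.mkdtemp())
note_path = folder / "notes.txt"  # the "/" operator joins parts of a path

print(note_path.exists())  # False: the file is not created yet
note_path.write_text("first line\nsecond line", encoding="utf-8")
print(note_path.exists())  # True

print(note_path.name)    # notes.txt
print(note_path.stem)    # notes
print(note_path.suffix)  # .txt

# read_text() returns one string; splitlines() cuts it into lines (without "\n")
lines = note_path.read_text(encoding="utf-8").splitlines()
print(lines)  # ['first line', 'second line']
```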

In [56]:
# Example to create a file with 10,000 lines: random integers between 1 and 100, with an empty line every 20 lines
from pathlib import Path
from random import randint

txt_path = Path("list_int.txt") 
txt_path.write_text("\n".join(
    str(randint(1, 100)) if i % 20 else "" 
    for i in range(10_000)
))

28222

In [63]:
# You can then read the content of this file
# We want to read the first 10 lines
i = 0
list_lines = list()
for line in open("list_int.txt"):
    list_lines.append(line)
    i += 1
    if i >= 10:
        break

print(list_lines)

# ... or with list comprehensions and enumerate
list_lines = [line for i, line in enumerate(open("list_int.txt")) if i < 10]
print(list_lines)

['\n', '14\n', '63\n', '1\n', '81\n', '74\n', '61\n', '49\n', '98\n', '74\n']
['\n', '14\n', '63\n', '1\n', '81\n', '74\n', '61\n', '49\n', '98\n', '74\n']


## 📌Notice that lines end with an "EOL"

**EOL** = _End of Line_. An EOL character is a character that marks the end of a line. This is the equivalent of line return ("ENTER") in MS Word or a typewriter.

In a program, it is usually the character `\n`; on Windows, files often use the two characters `\r` followed by `\n` instead.


You can remove this character with the `str.strip()` or `str.rstrip()` methods.


In [62]:
[line.rstrip() for i, line in enumerate(open("list_int.txt")) if i < 10]

['', '14', '63', '1', '81', '74', '61', '49', '98', '74']
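The difference between `str.strip()` and `str.rstrip()` mentioned above can be seen on a single made-up line (the string below is only an example, not taken from the file). Note that when you read a file in text mode, Python normally converts `\r\n` to `\n` for you, so you will mostly see `\n`.

```python
# A hypothetical line with spaces on both sides and a Windows-style EOL
line = "  42 \r\n"

right_stripped = line.rstrip()      # removes spaces and EOL on the right only
both_stripped = line.strip()        # removes spaces and EOL on both sides
eol_only = line.rstrip("\r\n")      # removes only the EOL characters on the right

print(repr(right_stripped))  # '  42'
print(repr(both_stripped))   # '42'
print(repr(eol_only))        # '  42 '
```

Passing the characters to remove (here `"\r\n"`) is handy when the spaces at the end of a line matter.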

### We have several ways to check if a line is empty:

* Check that it starts with `\n` character (`line.startswith("\n")`)
* Check that after removing EOL at the end, it's the empty string (`line.rstrip() == ""`)

💡 The counterpart of `str.startswith()` is `str.endswith()`, which checks the end of the string :)

In [64]:
[line.rstrip() for line in open("list_int.txt") if not line.startswith("\n")][:10]

['14', '63', '1', '81', '74', '61', '49', '98', '74', '14']
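The two checks listed above do not always agree. A small sketch with made-up lines shows the difference when a line contains only spaces (such lines do not appear in `list_int.txt`, this is just an assumption for illustration):

```python
# Hypothetical lines: empty, a number, only spaces, and a last line without EOL
lines = ["\n", "14\n", "   \n", "63"]

starts_with_eol = [line.startswith("\n") for line in lines]
empty_after_rstrip = [line.rstrip() == "" for line in lines]
ends_with_eol = [line.endswith("\n") for line in lines]

print(starts_with_eol)     # [True, False, False, False]
print(empty_after_rstrip)  # [True, False, True, False]
print(ends_with_eol)       # [True, True, True, False]
```

So `line.rstrip() == ""` also catches lines made only of spaces, while `line.startswith("\n")` only catches truly empty lines.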

In [57]:
# ❔ QUESTION: How to count how many times each number is present in the file "list_int.txt"?
# Note: we need to skip empty lines :)

# We will use `Counter` to help count (without having to check if we already have the key)
from collections import Counter

count_nb = Counter()
for line in open("list_int.txt"):
    if line.strip():  # meaning the stripped line has some content
        count_nb[int(line)] += 1  # note: int("10\n") will return integer 10 ("\n" is dropped)

# ... or even shorter because "Counter" can count automatically from a list (or any other iterable) ;)
count_nb = Counter(int(line) for line in open("list_int.txt") if line.strip())

# Numbers sorted by frequency (by default when you print a counter)
print(count_nb)

print("\n--- Average ---")
average_count = sum(count_nb.values()) / len(count_nb)
print(average_count)

print("\n--- 10 most common numbers ---")
count_nb.most_common(10)

Counter({91: 116, 99: 113, 3: 113, 31: 112, 11: 111, 44: 111, 9: 110, 74: 109, 73: 109, 70: 109, 13: 107, 90: 107, 54: 107, 80: 106, 10: 104, 16: 104, 38: 104, 72: 103, 57: 103, 89: 103, 24: 103, 21: 102, 94: 102, 69: 102, 6: 102, 81: 101, 78: 101, 50: 101, 43: 101, 47: 100, 33: 100, 82: 99, 87: 99, 27: 98, 41: 98, 42: 98, 58: 98, 96: 97, 12: 97, 95: 97, 37: 97, 23: 97, 49: 96, 98: 96, 52: 96, 92: 96, 32: 96, 1: 95, 88: 95, 53: 95, 71: 95, 25: 95, 20: 95, 66: 94, 8: 94, 46: 93, 59: 93, 63: 92, 77: 92, 100: 92, 22: 92, 83: 92, 60: 92, 67: 92, 65: 91, 5: 91, 48: 91, 2: 91, 68: 90, 29: 90, 19: 89, 36: 88, 30: 88, 28: 88, 17: 87, 39: 87, 4: 87, 34: 87, 64: 87, 79: 87, 56: 87, 40: 87, 86: 87, 85: 86, 7: 86, 93: 86, 55: 85, 45: 85, 75: 85, 14: 84, 61: 84, 15: 84, 26: 83, 62: 81, 51: 81, 76: 81, 35: 80, 97: 77, 84: 77, 18: 76})

--- Average ---
95.0

--- 10 most common numbers ---


[(91, 116),
 (99, 113),
 (3, 113),
 (31, 112),
 (11, 111),
 (44, 111),
 (9, 110),
 (74, 109),
 (73, 109),
 (70, 109)]
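To see what "without having to check if we already have the key" means, here is a minimal comparison between a plain `dict` and a `Counter`, using a short made-up list instead of the file:

```python
from collections import Counter

numbers = [14, 63, 14, 1, 14, 63]  # hypothetical sample data

# With a plain dict, we must create the key before adding to it
count_dict = {}
for n in numbers:
    if n not in count_dict:
        count_dict[n] = 0
    count_dict[n] += 1

# With a Counter, missing keys simply count as 0
count_nb = Counter(numbers)

print(count_nb[14])             # 3
print(count_nb[999])            # 0 (no error for a number we never saw)
print(count_nb.most_common(1))  # [(14, 3)]
print(dict(count_nb) == count_dict)  # True: same counts either way
```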

### You can also read lines one by one with `readline()`

In [60]:
f = open("list_int.txt")
line_1 = f.readline().rstrip()
line_2 = f.readline().rstrip()
line_3 = f.readline().rstrip()
print(f"First line: {line_1}")
print(f"Second line: {line_2}")
print(f"Third line: {line_3}")

First line: 
Second line: 14
Third line: 63
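One detail worth knowing about `readline()`: at the end of the file it returns the empty string `""`, whereas an empty line in the file is returned as `"\n"`. A minimal sketch, using a small made-up file (`three_lines.txt` is an assumption for the example):

```python
import tempfile
from pathlib import Path

# Hypothetical 3-line file (the second line is empty) in a temporary folder
demo_path = Path(tempfile.mkdtemp()) / "three_lines.txt"
demo_path.write_text("a\n\nc\n")

lines_read = []
f = open(demo_path)
while True:
    line = f.readline()
    if line == "":  # empty string = end of file (an empty line would be "\n")
        break
    lines_read.append(line)
f.close()

print(lines_read)  # ['a\n', '\n', 'c\n']
```

In practice, looping with `for line in f:` is simpler and does the same job.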


# Lesson 11: Writing to Files 📝

When we open a file, we need to specify what we want to do with it by writing `open(<file>, <mode>)` with `<mode>` being:

* Read content (the default) => mode `r`
* Write content AND ERASE EVERYTHING THAT WAS THERE BEFORE => mode `w`
* Write content after the existing content (i.e. "append") => mode `a`

If no mode is given, the "reading" mode is selected (as we have seen in the previous lesson).

👨🏽‍🚀Actually, there are also 3 other modes, `rb`, `wb`, and `ab`, to use when you want to read or write the raw bytes of a file (binary mode)... but we will ignore them for the time being ;) There is also the read-write mode `r+`, but it's a nightmare to use and you had better avoid it ;)

In [67]:
f = open("list_int.txt")
for _ in range(5):
    print(f.readline().rstrip())


14
63
1
81
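The difference between the modes `w` and `a` described above can be sketched as follows. The file name `modes.txt` is made up, and the file lives in a temporary folder so nothing of yours gets erased.

```python
import tempfile
from pathlib import Path

# Hypothetical file in a temporary folder
demo_path = Path(tempfile.mkdtemp()) / "modes.txt"

f = open(demo_path, "w")  # "w": create the file (or erase it if it exists)
f.write("first\n")
f.close()

f = open(demo_path, "a")  # "a": keep the content and write after it
f.write("second\n")
f.close()

f = open(demo_path)       # no mode: same as "r"
after_append = f.read()
f.close()
print(repr(after_append))  # 'first\nsecond\n'

f = open(demo_path, "w")  # "w" again: everything written before is lost
f.write("third\n")
f.close()

f = open(demo_path, "r")
after_overwrite = f.read()
f.close()
print(repr(after_overwrite))  # 'third\n'
```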


### To write in a file, you need to open with proper mode and use `write()` method

⚠ Unlike `print`, which adds a line return at the end, `write()` does not.
The reason is that we more often need to write the content of a file piece by piece (and not line by line).
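A small sketch of writing "piece by piece", and of the difference with `print`. The file name `pieces.txt` is an assumption for the example; `print(..., file=f)` is a standard option of `print` that sends the text to a file instead of the screen.

```python
import tempfile
from pathlib import Path

# Hypothetical file in a temporary folder
demo_path = Path(tempfile.mkdtemp()) / "pieces.txt"

f = open(demo_path, "w")
f.write("2 + 2")        # no line return is added...
f.write(" = ")          # ...so the pieces end up on the same line
f.write("4\n")          # we add the line return ourselves
print("done", file=f)   # print can also write to a file, and it adds "\n"
f.close()

content = demo_path.read_text()
print(repr(content))  # '2 + 2 = 4\ndone\n'
```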

In [87]:
f = open("my_file.txt", "w")
for i in range(5):
    f.write(f"{i}\n")

# print("I have finished writing in the file... or have I?")

### After you have written to a file with `write()`, the file might look empty...

If you open the file after running the previous cell, you will notice that this file (_my_file.txt_) is empty (at least, sometimes it is).

What happens is that the content is first stored in memory and needs to be pushed to the file (i.e. "flushed", like we do for the toilets🚽).

You can manually flush the toilets, euh sorry the file, by using the `<file handler>.flush()` method (i.e. `f.flush()` in our previous case).
But usually, we simply close the file once we have finished writing (and this will flush automatically).


In [88]:
f = open("my_file.txt", "w")
for i in range(5):
    f.write(f"{i**2}\n")
f.close()
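The buffering described above can be observed with a minimal sketch like this one. The file name `buffered.txt` is made up, and what you see before the flush may vary (that part is only "usually" empty), so treat it as an illustration rather than a guarantee.

```python
import tempfile
from pathlib import Path

# Hypothetical file in a temporary folder
demo_path = Path(tempfile.mkdtemp()) / "buffered.txt"

f = open(demo_path, "w")
f.write("hello")
before_flush = demo_path.read_text()  # usually "": the text is still in memory
f.flush()
after_flush = demo_path.read_text()   # "hello": the text was pushed to the file

f.write(" world")
f.close()                             # closing also flushes
after_close = demo_path.read_text()

print(repr(before_flush))
print(repr(after_flush))  # 'hello'
print(repr(after_close))  # 'hello world'
```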

### There is a better way to close the file

Because it's so important to close the file, even when there is an error somewhere in the code, and because we don't want to type `f.close()` each time we open a file (remember: coders and mathematicians are lazy), there is a simplified syntax that is the gold standard:

```
with open(<path>, <mode>) as f:
    ... (the code about the file)
```

Even if an exception occurs in the code (e.g. a missing index in a list), the file will be closed cleanly: this is the magic of `with`.

In [90]:
# ... so the previous code would be re-written like this:

with open("my_file.txt", "w") as f:
    for i in range(5):
        f.write(f"{i**2}\n")
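To check the "magic of `with`" when something goes wrong, here is a minimal sketch that triggers an exception on purpose. The file name `safe.txt` and the list are made up, and `try` / `except` is used only to catch the error so the example can continue; `f.closed` tells whether a file handler is closed.

```python
import tempfile
from pathlib import Path

# Hypothetical file in a temporary folder
demo_path = Path(tempfile.mkdtemp()) / "safe.txt"
numbers = [1, 2, 3]

try:
    with open(demo_path, "w") as f:
        f.write("start\n")
        f.write(str(numbers[10]))  # IndexError: there is no index 10
except IndexError:
    print("Oops, an exception happened")

print(f.closed)  # True: "with" closed the file despite the exception

content = demo_path.read_text()
print(repr(content))  # 'start\n': what was written before the error is saved
```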

## Going back to `pathlib.Path`

Once again, I personally recommend using `pathlib` when you work with files, paths, and read/write content.

Look how easy it is:
* A first line to import (i.e. "load") the module: `from pathlib import Path` 
* `Path("list_int.txt").read_text()`: open the file, read the content, and close it
* `Path("my_file.txt").write_text("Hello\nunpackAI")`: open the file for writing, write the content, flush, and close it

You can learn more about it in the [official documentation](https://docs.python.org/3/library/pathlib.html) or in one of the many tutorials available.

In [92]:
from pathlib import Path

file_path = Path("my_file.txt")  # create the path
file_path.write_text("\n".join(str(i**2) for i in range(5)))  # write
print(file_path.read_text())  # read and print

0
1
4
9
16


# Lesson 12: Functions ℱ


# Lesson 13: More about Functions ℱ🥏




# ⇝🔰 THE 🧙 END 🔰⇜