# In the previous episode...

### quick overview of Classes
```python
d = Dog('Fido')
d.add_trick('roll over')
d.tricks
```

and re-implemented the High-Low card game

# .. and now

# File system
Interacting with the file system is one of the most important things a program can do. Almost every application relies on it, because it makes it possible to:
- let the user input a large quantity of data at once
- save temporary intermediate files
- store final output files
- keep a log of what has been done, which is useful for troubleshooting

but first... let's download a silly little file from the internet

In [None]:
# Don't worry about the code in this cell for now, we'll get to this stuff in a future lesson
import urllib.request   
urllib.request.urlretrieve("https://raw.githubusercontent.com/gabrielecalvo/Language4Water/master/assets/cat_haiku.txt", 'cat_haiku.txt')
print("downloaded :)")

## Opening a file
This can be achieved using the `open` function, which takes the **path to the file** as an input and returns a **file handle**
```python
file_handle = open('myfile.txt')
```
`open` also takes a second parameter, **mode**, which specifies how the file should be opened. The most important modes for us now are:
- `"r"`: the default, which stands for **read** mode (load data from an existing file)
- `"w"`: which stands for **write** mode (create a new file, or overwrite an existing one)
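
As a minimal sketch of the two modes, the snippet below writes a small file in `"w"` mode and reads it back in `"r"` mode. The file name `modes_demo.txt` and the temporary folder are only illustrative assumptions, chosen so that nothing in your working directory is touched.

```python
import os
import tempfile

# work in a throw-away folder so no real file is overwritten
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "modes_demo.txt")  # hypothetical file name

fh = open(demo_path, "w")   # "w": creates the file (or empties an existing one)
fh.write("hello file\n")
fh.close()

fh = open(demo_path, "r")   # "r": same as open(demo_path), since "r" is the default
text = fh.read()
fh.close()

print(repr(text))
```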

In [None]:
fh = open('cat_haiku.txt')
fh

To read the content, we can use the `.read()` method of the file handle.

## Reading the file content

In [None]:
content = fh.read()
content

Those ugly `\n` are newline characters. If you use the `print` function, the text is displayed with proper line breaks:

In [None]:
print(content)

## Closing the file
It is **important** to ***close*** the handle once you are done with the file.
- If you opened it in read mode, others might not be able to edit it while it is open (this depends on the operating system).
- If you opened it in write mode, data might not be written to the file until you close it.

In [None]:
fh.close()
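
If an error happens between `open` and `close`, the `close` line is never reached. One way to guard against that by hand, shown here only as a sketch, is a `try`/`finally` block. The file name `close_demo.txt` is a made-up example placed in a temporary folder.

```python
import os
import tempfile

demo_path = os.path.join(tempfile.mkdtemp(), "close_demo.txt")  # hypothetical file

fh = open(demo_path, "w")
try:
    fh.write("some data\n")
finally:
    fh.close()          # runs even if the write above raised an error

print(fh.closed)        # the `closed` attribute tells you the state of the handle
```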

## Using the context manager
To take all this headache away from you, it is ***strongly*** recommended to use the *context manager* (using the `with` keyword) which will handle the closing for you so you don't forget.

In [None]:
with open('cat_haiku.txt') as fh:
    content = fh.read()
    
print(content)

## reading content as list of lines
Sometimes it is helpful to read the content as a list of lines, for example during pre-processing.

The file handle can be iterated over line by line (each line is a string that ends with the *newline* character `\n`), so we can use a for loop to extract each line.

In [None]:
clean_lines = []
with open('cat_haiku.txt') as fh:
    for line in fh:
        if not line.startswith("#") and line != "\n":
            clean_lines.append(line)
            
clean_lines
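
Each line kept this way still ends with its `\n`. If that gets in the way, one option is to strip it while filtering. The sketch below uses a small made-up file (`lines_demo.txt`, not the real haiku) so that it can run on its own.

```python
import os
import tempfile

demo_path = os.path.join(tempfile.mkdtemp(), "lines_demo.txt")  # made-up sample file
with open(demo_path, "w") as fh:
    fh.write("# a comment\n\nfirst line\nsecond line\n")

stripped_lines = []
with open(demo_path) as fh:
    for line in fh:
        line = line.rstrip("\n")            # drop the trailing newline
        if line and not line.startswith("#"):
            stripped_lines.append(line)     # keep non-empty, non-comment lines

print(stripped_lines)  # ['first line', 'second line']
```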

## writing a file
All that changes when writing a file is the mode argument (`"w"` instead of `"r"`) and the use of the `write` method instead of `read`.

In [None]:
with open("new_shiny_file.txt", "w") as fh:
    fh.write("hey, this is cool!!\n" * 20)
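
Two details are easy to miss: `write` does not add newlines for you, and opening an existing file in `"w"` mode discards what was in it. The sketch below illustrates both on a made-up file (`write_demo.txt`) in a temporary folder.

```python
import os
import tempfile

demo_path = os.path.join(tempfile.mkdtemp(), "write_demo.txt")  # hypothetical file

lines = ["first", "second", "third"]
with open(demo_path, "w") as fh:
    fh.write("\n".join(lines) + "\n")       # newlines must be added explicitly

with open(demo_path, "w") as fh:            # "w" again: the old content is wiped
    n_chars = fh.write("only this remains\n")

with open(demo_path) as fh:
    final_content = fh.read()

print(n_chars)              # `write` returns the number of characters written
print(repr(final_content))
```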

## warning about Windows file paths
Paths are separated by different characters on Windows (`\`) compared to Linux/macOS (`/`).

When using paths on Windows that contain `\`, they need to be prefixed by `r` (which makes them *raw strings*) so that the backslash is not interpreted as the start of a special character (e.g. `\n`). So use:

open(**r**".\new_shiny_file.txt")

In [None]:
# this will work
with open(r".\new_shiny_file.txt") as f:
    print(f.readlines()[0])

# this won't work: `\n` is read as a newline, so the path is wrong and `open` raises an error
with open(".\new_shiny_file.txt") as f:
    print(f.readlines()[0])
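
To see what actually goes wrong, it can help to compare the two strings directly. The sketch below also shows two alternatives from the standard library, `pathlib.Path` and `os.path.join`, which avoid typing backslashes at all. They are offered as options, not as the only right way. The `data` folder name is made up for illustration.

```python
import os
from pathlib import Path

raw_path = r".\new_shiny_file.txt"     # backslash kept as-is
plain_path = ".\new_shiny_file.txt"    # `\n` silently becomes a newline character

print(repr(raw_path))
print(repr(plain_path))
print("\n" in plain_path)              # True: the path now contains a newline

# forward slashes contain no escapes, and Python generally accepts them on Windows too
portable = Path("./new_shiny_file.txt")
print(portable.name)

# os.path.join inserts the separator of the current operating system
print(os.path.join("data", "new_shiny_file.txt"))  # "data" is a hypothetical folder
```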

# Exercise: Ishmael Counter
Count "Ishmael"s in Moby Dick and write the count to another file called `ishmael_counts.txt`

For this exercise we'll need an example file. Let's use the book ["Moby Dick"](http://www.gutenberg.org/cache/epub/2701/pg2701.txt) and download it.

The actual text starts at line 536 and ends at line 21743, but for this exercise it doesn't matter.

In [None]:
# Don't worry about the code in this cell for now, we'll get to this stuff in a future lesson
import urllib.request
urllib.request.urlretrieve("https://raw.githubusercontent.com/egh/moby-dick/master/mobydick.txt", "moby_dick.txt")
print("downloaded :)")

In [None]:
# enter your solution here
...

#### possible solution

In [None]:
with open("moby_dick.txt") as fh:
    content = fh.read()

counter = 0
for word in content.split():
    if "Ishmael" in word:
        counter += 1
# or just `content.count("Ishmael")`        

result = f"There are {counter} `Ishmael`s in Moby Dick"
with open("ishmael_counts.txt", "w") as fh:
    fh.write(result)
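
Reading the whole book with `.read()` is fine at this size. For much larger files, a possible variation is to count line by line so that only one line is in memory at a time. The sketch below uses a tiny made-up stand-in (`sample_book.txt`) instead of the real download, so the numbers are not those of Moby Dick.

```python
import os
import tempfile

# tiny made-up stand-in for the book, so the sketch runs without the download
sample_path = os.path.join(tempfile.mkdtemp(), "sample_book.txt")
with open(sample_path, "w") as fh:
    fh.write("Call me Ishmael.\nSome years ago...\nIshmael, said Ishmael.\n")

line_counter = 0
with open(sample_path) as fh:
    for line in fh:                              # one line at a time
        line_counter += line.count("Ishmael")    # occurrences within this line

print(line_counter)  # 3 for the sample text above
```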

# Exercise: Haiku Checker
Let's create a small program that will:

- open the `cat_haiku.txt` file
- take the haiku part and clean up empty rows
- check if it matches the following criteria:
  - 3 sentences
  - pattern 5, 7, 5 syllables

To start, use the `sentence_syllable_count` and `remove_punctuation` functions given below. If you want a challenge, you can try to implement them from scratch.

### sub-exercise: syllable counting
Try to implement it using the following rules ([1-3](https://personal.utdallas.edu/~pervin/Flesch.txt), [4](http://english.glendale.cc.ca.us/phonics.rules.html)):
  1. Each group of adjacent vowels {a,e,i,o,u,y} counts as one syllable (for example, the "ea" in "real" contributes one syllable, but the "e..a" in "regal" counts as two syllables).
  2. An "e" at the end of a word doesn't count as a syllable.
  3. Each word has at least one syllable, even if the previous rules give a count of 0.
  4. The diphthongs "oi, oy, ou, ow, au, aw, oo" always count as 1 syllable each.

In [None]:
def word_syllable_count(word):
    """simplified syllable counting, won't work every time: e.g. `vehicle`"""
    word = word.lower()
    vowels = "aeiouy"
    diphthongs = "oi,oy,ou,ow,au,aw,oo".split(',')
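    syllables = 0
    previous_was_vowel = False
    
    # rule 4: turn each diphthong into a single vowel surrounded by consonants
    for diphthong in diphthongs:
        word = word.replace(diphthong, "xox")
    
    # rule 1: count one syllable at the start of each group of adjacent vowels
    for char in word:
        is_vowel = char in vowels
        if is_vowel and not previous_was_vowel:
            syllables += 1
        previous_was_vowel = is_vowel
            
    # rule 2: an "e" at the end of the word does not count
    if word.endswith('e'):
        syllables -= 1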
        
    return max(1, syllables)

def sentence_syllable_count(sentence):
    syllable_count = 0
    for word in sentence.split():
        syllable_count += word_syllable_count(word)
    return syllable_count

def remove_punctuation(s):
    to_remove = '!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~\n'
    for i in to_remove:
        s = s.replace(i, "")
    return s
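
For comparison, rules 1 to 3 can also be expressed with a regular expression. This is only a sketch of an alternative. The name `regex_syllable_count` is made up here, and it deliberately ignores rule 4 (diphthongs), so it will disagree with `word_syllable_count` on words like "meow".

```python
import re

def regex_syllable_count(word):
    """hypothetical alternative: rules 1-3 only, no diphthong handling"""
    word = word.lower()
    groups = re.findall(r"[aeiouy]+", word)   # rule 1: runs of adjacent vowels
    count = len(groups)
    if word.endswith("e"):                    # rule 2: a final "e" does not count
        count -= 1
    return max(1, count)                      # rule 3: at least one syllable

print(regex_syllable_count("real"))   # 1
print(regex_syllable_count("regal"))  # 2
print(regex_syllable_count("face"))   # 1
print(regex_syllable_count("the"))    # 1
```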

In [None]:
# tests
print(word_syllable_count(word="flower") == 2)
print(word_syllable_count(word="thought") == 1)
print(word_syllable_count(word="teacher") == 2)
print(word_syllable_count(word="broadcast") == 2)
print(word_syllable_count(word="dreamed") == 2)
print(word_syllable_count(word="face") == 1)
print(word_syllable_count(word="meow") == 2)
print(sentence_syllable_count(sentence="cat in a hat") == 4)
print(remove_punctuation("my.ha;i#ku\n") == "myhaiku")

In [None]:
with open("cat_haiku.txt") as fh:
    lines = fh.readlines()
    
clean_lines = []
for item in lines:
    if item.startswith("#") or item.startswith("\n"):
        continue
    clean_lines.append(item)
    
is_haiku = True

if len(clean_lines) != 3:
    is_haiku = False
    print("NOT A HAIKU: it does not have exactly 3 lines")
else:
    # only check the syllables when there are exactly 3 lines (avoids an IndexError)
    if sentence_syllable_count(remove_punctuation(clean_lines[0])) != 5:
        is_haiku = False
        print("NOT A HAIKU: the first sentence does not have 5 syllables")
    if sentence_syllable_count(remove_punctuation(clean_lines[1])) != 7:
        is_haiku = False
        print("NOT A HAIKU: the second sentence does not have 7 syllables")
    if sentence_syllable_count(remove_punctuation(clean_lines[2])) != 5:
        is_haiku = False
        print("NOT A HAIKU: the third sentence does not have 5 syllables")

if is_haiku:
    print("IT IS A HAIKU")

### Possible Alternative Solution

In [None]:
def load_valid_lines(filepath):
    clean_lines = []
    with open(filepath) as fh:
        for line in fh.readlines():
            if not line.startswith("#") and line != "\n":
                clean_lines.append(line)
    return clean_lines

def check_sentence(sentence, expected_count):
    clean_sentence = remove_punctuation(sentence)
    actual_count = sentence_syllable_count(clean_sentence)
    if actual_count != expected_count: 
        print(f"The following sentence in the Haiku should have {expected_count} syllables but has {actual_count}:\n`{clean_sentence}`")
        return False
    return True

def is_haiku(filepath):
    clean_lines = load_valid_lines(filepath)
    print(''.join(clean_lines))
    
    if len(clean_lines) != 3: 
        print(f"Haiku must have 3 sentences! This has {len(clean_lines)}")
        return False
    
    # the combined condition is already a boolean, so it can be returned directly
    return (
        check_sentence(sentence=clean_lines[0], expected_count=5) and 
        check_sentence(sentence=clean_lines[1], expected_count=7) and 
        check_sentence(sentence=clean_lines[2], expected_count=5) 
    )
    
is_haiku(filepath='cat_haiku.txt')
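
If the pattern ever needs to change (say, a different number of lines), one possible generalisation is to pair lines and expected counts with `zip`. The helper name `matches_pattern` is made up for this sketch. To keep the block runnable on its own, it takes the counting function as a parameter, and the demo uses a stand-in that counts words rather than real syllables.

```python
def matches_pattern(lines, count_syllables, pattern=(5, 7, 5)):
    """hypothetical helper: True if each line has the expected syllable count"""
    if len(lines) != len(pattern):
        return False
    return all(
        count_syllables(line) == expected
        for line, expected in zip(lines, pattern)
    )

# stand-in counter for the demo: one "syllable" per word (NOT a real syllable counter)
def count_words(line):
    return len(line.split())

demo_lines = ["a b c d e", "a b c d e f g", "a b c d e"]
print(matches_pattern(demo_lines, count_words))      # True
print(matches_pattern(demo_lines[:2], count_words))  # False: only 2 lines
```

In the notebook, `sentence_syllable_count` could be passed in place of `count_words`.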