# File handling

Error handling with the with statement, which also closes the file automatically when the block ends

```
with open(path, option) as name:
    statements
```
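
The automatic closing can be checked with a small sketch. The scratch file and the variable names below are made up for this demonstration and are not part of the notes above; the file is created in a temporary directory so nothing in ../Data is touched.

```python
import tempfile
from pathlib import Path

# Hypothetical scratch file, created only for this demonstration
path = Path(tempfile.mkdtemp()) / "demo.txt"

with open(path, "w") as f:
    f.write("hello\n")

# The file object still exists after the block, but it has been closed
closed_after_block = f.closed

try:
    with open(path, "r") as g:
        raise ValueError("something went wrong while reading")
except ValueError:
    pass

# The file is closed even though the block was left through an exception
closed_after_error = g.closed

print(closed_after_block, closed_after_error)
```

Both values should be True, which is the main reason to prefer with over calling open() and close() by hand.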

options

- "r": read (error if the file doesn't exist)
- "a": append to a file (if it doesn't exist, it will create the file)
- "w": write (if it doesn't exist, it will create the file; if it does exist, its contents are overwritten)
- "x": create a file, error if it already exists
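
A minimal sketch of how the modes differ, run against a hypothetical scratch file in a temporary directory (the file name and variable names are assumptions made for this example):

```python
import tempfile
from pathlib import Path

# Hypothetical scratch file; it does not exist yet
path = Path(tempfile.mkdtemp()) / "modes.txt"

with open(path, "x") as f:  # "x": creates the file, fails if it already exists
    f.write("first\n")

with open(path, "a") as f:  # "a": keeps existing content and adds to the end
    f.write("second\n")

after_append = path.read_text()

with open(path, "w") as f:  # "w": empties the file before writing
    f.write("third\n")

after_write = path.read_text()

try:
    with open(path, "x") as f:  # the file exists now, so "x" raises
        f.write("never written\n")
    x_failed = False
except FileExistsError:
    x_failed = True

print(repr(after_append), repr(after_write), x_failed)
```

The practical difference is that "w" silently discards old content, while "x" refuses to touch an existing file.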


In [6]:
with open("../Data/quotes.txt", 'r') as f:
    text = f.read()

print(text)

  If     we     knew what it was      we were doing, it would not be called research,          would it?     - Albert Einstein

Time is a drug. Too       much of it kills you.  -  Terry Pratchett


 An expert is a person who       has made all the mistakes that           can be made in a          very narrow field - Niels Bohr

   Everything must be made as simple as possible. But not simpler. - Albert Einstein     


  Nothing in life                is to be feared, it is only to be understood. Now is the time to understand more, so that we may fear less. - Marie  Curie  

If I have seen further     it is by standing on the shoulders of Giants. - Isaac Newton


# Cleaning up quotes.txt

- inspect the txt file manually (some prankster has added random noise in the form of whitespace and newlines)
- remove leading and trailing whitespace
- remove excessive whitespace between words
- add quote numbers

In [111]:
import re

with open("../Data/quotes.txt", 'r') as f_read, \
    open("../Data/quotes_clean.txt", 'w') as f_write:

    quote_number = 1

    # loop through the text file one line at a time
    for quote in f_read:
        quote = quote.strip(" \n") # remove leading and trailing spaces and newlines
        quote = re.sub(" +", " ", quote) # regex: replace each run of one or more spaces with a single space

        # skip blank lines and write the rest to the new file
        # opening with 'w' empties the file once, at open; each write() call then adds to it
        # keep all processing inside the with block, since the files are closed when it ends
        if quote != "":
            f_write.write(f"{quote_number}. {quote}\n")
            quote_number += 1

        print(repr(quote))

'If we knew what it was we were doing, it would not be called research, would it? - Albert Einstein'
''
'Time is a drug. Too much of it kills you. - Terry Pratchett'
''
''
'An expert is a person who has made all the mistakes that can be made in a very narrow field - Niels Bohr'
''
'Everything must be made as simple as possible. But not simpler. - Albert Einstein'
''
''
'Nothing in life is to be feared, it is only to be understood. Now is the time to understand more, so that we may fear less. - Marie Curie'
''
'If I have seen further it is by standing on the shoulders of Giants. - Isaac Newton'
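
As a possible alternative to the regex, the same cleanup can be sketched with str.split() and " ".join(). The function names here are made up for the comparison. Note one difference: split() with no argument treats tabs and other whitespace as separators too, whereas the regex above only collapses spaces.

```python
import re

def clean_with_regex(line: str) -> str:
    """Same steps as in the cell above: strip, then collapse runs of spaces."""
    return re.sub(" +", " ", line.strip(" \n"))

def clean_with_split(line: str) -> str:
    """Split on any whitespace and join the words with single spaces."""
    return " ".join(line.split())

raw = "  If     we     knew what it was      we were doing  \n"
regex_result = clean_with_regex(raw)
split_result = clean_with_split(raw)

# A made-up line containing tabs shows where the two approaches differ
tabbed = "a\t\tb  c\n"
regex_tabbed = clean_with_regex(tabbed)
split_tabbed = clean_with_split(tabbed)

print(repr(regex_result), repr(split_result))
print(repr(regex_tabbed), repr(split_tabbed))
```

For quotes.txt as shown above, which contains only spaces and newlines, either version should give the same cleaned lines.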


## Pick out the authors

- find the leading digit to identify each quote line
- extract first and last names
- join into full name
- get unique values

In [112]:
with open("../Data/quotes_clean.txt", 'r') as f_quotes, \
    open("../Data/quotes_clean.txt", 'a') as f_append:

    # quotes = [quote.strip("\n") for quote in f_quotes.read()] # here quote is a single character
    # quotes = [quote.strip("\n") for quote in f_quotes] # here quote is a line

    # .readlines() returns every line as a list of strings
    # strip away the trailing "\n" from each line
    quotes = [quote.strip("\n") for quote in f_quotes.readlines()] # quote is a line
    print(quotes)
    authors = [quote.split()[-2:] for quote in quotes]
    print(authors)

    # set - gives the unique elements (random order)
    authors = {" ".join(author) for author in authors}
    print(authors)

    # join avoids a trailing comma after the last author
    # note: re-running this cell appends another "Authors:" line to the same file
    f_append.write("\nAuthors: " + ", ".join(authors))


['1. If we knew what it was we were doing, it would not be called research, would it? - Albert Einstein', '2. Time is a drug. Too much of it kills you. - Terry Pratchett', '3. An expert is a person who has made all the mistakes that can be made in a very narrow field - Niels Bohr', '4. Everything must be made as simple as possible. But not simpler. - Albert Einstein', '5. Nothing in life is to be feared, it is only to be understood. Now is the time to understand more, so that we may fear less. - Marie Curie', '6. If I have seen further it is by standing on the shoulders of Giants. - Isaac Newton']
[['Albert', 'Einstein'], ['Terry', 'Pratchett'], ['Niels', 'Bohr'], ['Albert', 'Einstein'], ['Marie', 'Curie'], ['Isaac', 'Newton']]
{'Terry Pratchett', 'Isaac Newton', 'Niels Bohr', 'Marie Curie', 'Albert Einstein'}
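
Taking the last two words works for the six quotes above, but it assumes every author has exactly two names. A hedged sketch of another approach is to split on the last " - " separator with str.rpartition. The helper name extract_author and the third sample line are made up for illustration; the other lines are taken from the cleaned file. dict.fromkeys is used instead of set so that the order of first appearance is kept.

```python
def extract_author(quote: str) -> str:
    """Return the text after the last ' - ' in the line.

    If the separator is missing, rpartition returns the whole line as the
    last element, so the whole line comes back unchanged.
    """
    _, _, author = quote.rpartition(" - ")
    return author.strip()

quotes = [
    "2. Time is a drug. Too much of it kills you. - Terry Pratchett",
    "4. Everything must be made as simple as possible. But not simpler. - Albert Einstein",
    "7. A made-up line for this example only. - Jane Q Public",  # hypothetical
    "1. If we knew what it was we were doing, it would not be called research, would it? - Albert Einstein",
]

# dict.fromkeys removes duplicates and keeps the order of first appearance
authors = list(dict.fromkeys(extract_author(quote) for quote in quotes))
authors_line = "Authors: " + ", ".join(authors)

print(authors_line)
```

Because the order is deterministic here, the resulting line looks the same on every run, unlike the set-based version.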


In [97]:
name = [["Johan", "Sandberg"]]

" ".join(name[0])

'Johan Sandberg'