### Read a file

* open a file in the same directory for reading, using mode 'r'
* read its contents with the read() method
* close the file

Before running the code below, make a file named 'sample1.txt' in the same folder as this code. Type anything in that file, save, and exit.

In [None]:
f = open('sample1.txt','r') 
text = f.read()
print('You read:\n', text)
f.close()

You read:
 Natural language processing (NLP) is a field of computer science, artificial intelligence and computational linguistics concerned with the interactions between computers and human (natural) languages, and, in particular, concerned with programming computers to fruitfully process large natural language corpora. Challenges in natural language processing frequently involve natural language understanding, natural language generation (frequently from formal, machine-readable logical forms), connecting language and machine perception, managing human-computer dialog systems, or some combination thereof.
Source: https://en.wikipedia.org/wiki/Natural_language_processing


### Read a line at a time

The following code uses a *for* loop to process one line at a time. Each line keeps its trailing newline and print() adds another, so the output is double-spaced.

In [None]:
f = open('sample1.txt', 'r')
for line in f:
    print(line)
f.close()

Natural language processing (NLP) is a field of computer science, artificial intelligence and computational linguistics concerned with the interactions between computers and human (natural) languages, and, in particular, concerned with programming computers to fruitfully process large natural language corpora. Challenges in natural language processing frequently involve natural language understanding, natural language generation (frequently from formal, machine-readable logical forms), connecting language and machine perception, managing human-computer dialog systems, or some combination thereof.

Source: https://en.wikipedia.org/wiki/Natural_language_processing
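
If the blank lines between printed lines are not wanted, one option is to tell print() not to add its own newline. The sketch below is self-contained and uses a hypothetical file, 'two_lines.txt', created in a temporary folder rather than the sample file above.

```python
import os
import tempfile

# Hypothetical two-line file, created here so the sketch runs on its own.
path = os.path.join(tempfile.mkdtemp(), 'two_lines.txt')
with open(path, 'w', encoding='utf-8') as f:
    f.write('first line\nsecond line\n')

shown = []
with open(path, 'r', encoding='utf-8') as f:
    for line in f:
        shown.append(line)     # each line still ends with '\n'
        print(line, end='')    # end='' stops print() from adding a second newline
```

The same effect can be had by stripping the newline from each line before printing.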


### Using "with"

The *with* statement starts a block of code. When the block ends, Python closes the file automatically, even if an error occurs inside the block.

In [None]:
with open('sample1.txt', 'r') as f:
    text = f.read()
print("You read:\n", text)

You read:
 Natural language processing (NLP) is a field of computer science, artificial intelligence and computational linguistics concerned with the interactions between computers and human (natural) languages, and, in particular, concerned with programming computers to fruitfully process large natural language corpora. Challenges in natural language processing frequently involve natural language understanding, natural language generation (frequently from formal, machine-readable logical forms), connecting language and machine perception, managing human-computer dialog systems, or some combination thereof.
Source: https://en.wikipedia.org/wiki/Natural_language_processing
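
To see the automatic close for ourselves, we can check the file object's closed attribute inside and after the block. This is a small sketch that assumes a hypothetical file, 'hello.txt', created in a temporary folder.

```python
import os
import tempfile

# Hypothetical file, created here so the sketch runs on its own.
path = os.path.join(tempfile.mkdtemp(), 'hello.txt')
with open(path, 'w', encoding='utf-8') as f:
    f.write('hello\n')

with open(path, 'r', encoding='utf-8') as f:
    text = f.read()
    inside = f.closed    # still open while we are in the block

after = f.closed         # closed once the block has ended
print('closed inside the block?', inside)
print('closed after the block? ', after)
```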


### Encoding

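Encoding used to be a pain in Python 2 but is less of a problem in Python 3, where strings are Unicode. A text file opened without an encoding argument uses a platform-dependent default (often, but not always, utf-8), so specify the encoding when it matters. The strip() method removes leading and trailing whitespace, including the newline at the end of each line.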

In [None]:
with open('sample1.txt', 'r', encoding='utf-8') as f:
    for line in f:
        print(line.strip())

Natural language processing (NLP) is a field of computer science, artificial intelligence and computational linguistics concerned with the interactions between computers and human (natural) languages, and, in particular, concerned with programming computers to fruitfully process large natural language corpora. Challenges in natural language processing frequently involve natural language understanding, natural language generation (frequently from formal, machine-readable logical forms), connecting language and machine perception, managing human-computer dialog systems, or some combination thereof.
Source: https://en.wikipedia.org/wiki/Natural_language_processing
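
The sample text above is plain ASCII, so the choice of encoding makes no visible difference. The sketch below uses a hypothetical file containing accented characters to show what can happen when the encoding used for reading does not match the one used for writing, and what strip() does to a padded line.

```python
import os
import tempfile

# Hypothetical file name and contents, used only for this sketch.
path = os.path.join(tempfile.mkdtemp(), 'accents.txt')
original = 'café naïve\n'

with open(path, 'w', encoding='utf-8') as f:
    f.write(original)

# Reading with the matching encoding recovers the text.
with open(path, 'r', encoding='utf-8') as f:
    right = f.read()

# Reading with a different encoding raises no error here,
# but the accented characters come back garbled.
with open(path, 'r', encoding='latin-1') as f:
    wrong = f.read()

print(right == original)
print(wrong == original)

# strip() removes whitespace from both ends, not only the newline.
stripped = '  padded line \n'.strip()
print(repr(stripped))
```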


### Get text from the web

The urllib library contains functions to handle URLs. Below we read text from a web page.

In [1]:
from urllib import request
url = "http://www.gutenberg.org/files/2554/2554-0.txt"
crime = request.urlopen(url).read().decode('utf8')
crime[:1000]

'\ufeffThe Project Gutenberg eBook of Crime and Punishment, by Fyodor Dostoevsky\r\n\r\nThis eBook is for the use of anyone anywhere in the United States and\r\nmost other parts of the world at no cost and with almost no restrictions\r\nwhatsoever. You may copy it, give it away or re-use it under the terms\r\nof the Project Gutenberg License included with this eBook or online at\r\nwww.gutenberg.org. If you are not located in the United States, you\r\nwill have to check the laws of the country where you are located before\r\nusing this eBook.\r\n\r\nTitle: Crime and Punishment\r\n\r\nAuthor: Fyodor Dostoevsky\r\n\r\nTranslator: Constance Garnett\r\n\r\nRelease Date: March, 2001 [eBook #2554]\r\n[Most recently updated: August 6, 2021]\r\n\r\nLanguage: English\r\n\r\nCharacter set encoding: UTF-8\r\n\r\nProduced by: John Bickers, Dagny and David Widger\r\n\r\n*** START OF THE PROJECT GUTENBERG EBOOK CRIME AND PUNISHMENT ***\r\n\r\n\r\n\r\n\r\nCRIME AND PUNISHMENT\r\n\r\nBy Fyodor Dostoev
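
The '\ufeff' at the start of the output is a byte order mark (BOM) that some UTF-8 files begin with. One way to drop it is to decode with the 'utf-8-sig' codec. The sketch below does not touch the network; it builds a short stand-in byte string with a BOM, so the bytes here are an assumption and not the actual download.

```python
# Stand-in for downloaded bytes: encoding with 'utf-8-sig' prepends a BOM.
raw = 'Title: Crime and Punishment'.encode('utf-8-sig')

plain = raw.decode('utf-8')        # keeps the BOM as '\ufeff'
clean = raw.decode('utf-8-sig')    # removes the BOM if one is present

print(repr(plain[:8]))
print(repr(clean[:8]))
```

The 'utf-8-sig' codec also decodes text that has no BOM, so it should be safe to use when we are unsure whether one is present.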

# Write to a file

Writing to a file involves 3 steps:
* open the file
* write to the file
* close the file

All 3 are demonstrated below. Note that write() does not add a newline, so we include '\n' ourselves.

In [None]:
f = open('temp.txt', 'w')
f.write('This is the first line\n')
f.write('This is another line\n')
f.close()

Let's read the file in and print each line to the screen.

f.read() reads the whole file as one string, and .splitlines() splits it at line boundaries, dropping the newline characters in the process.

In [None]:
with open('temp.txt','r') as f:
    lines = f.read().splitlines()
for line in lines:
    print(line)

This is the first line
This is another line
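
Mode 'w' starts the file from scratch each time it is opened. To add to an existing file instead, we can open it in append mode, 'a'. The sketch below uses a hypothetical file, 'modes.txt', in a temporary folder and combines writing with the *with* statement so that the files are closed for us.

```python
import os
import tempfile

# Hypothetical file, created in a temporary folder for this sketch.
path = os.path.join(tempfile.mkdtemp(), 'modes.txt')

with open(path, 'w') as f:
    f.write('old contents\n')

with open(path, 'w') as f:    # 'w' again: the old contents are discarded
    f.write('This is the first line\n')

with open(path, 'a') as f:    # 'a': new text goes at the end of the file
    f.write('This is another line\n')

with open(path, 'r') as f:
    lines = f.read().splitlines()
print(lines)
```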


## Formatting output

There are a few ways to format output:

* the old way: '%d %s' % (number, name)
* a newer way: '{} {}'.format(number, name)
* the f-string way: f'{number} {name}'

Python f-strings have been available since Python 3.6. The {} in f-strings can contain variables or expressions.


In [None]:
number = 42
name = 'Who'

print('%d %s' % (number, name))
print('{} {}'.format(number, name))
print(f'{number} {name}')

42 Who
42 Who
42 Who
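
The three calls above only insert variables. As a small sketch of the expression case, the {} in an f-string can also hold arithmetic, method calls, and function calls. The variable names doubled, shout, and count are made up for this example.

```python
number = 42
name = 'Who'

doubled = f'{number * 2}'          # arithmetic inside the braces
shout = f'{name.upper()}!'         # a method call inside the braces
count = f'{len(name)} letters'     # a function call inside the braces

print(doubled)
print(shout)
print(count)
```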


In [None]:
num = 3
gpa = 3.7
name = 'Ralph'
f = open('temp.txt', 'w')
f.write(f'Name: {name:8} Favorite number is {num} GPA is {gpa:.2f}')
f.close()  # call close() so the text is flushed to disk

# read back in
with open('temp.txt', 'r') as f:
    lines = f.read().splitlines()
for line in lines:
    print(line)

Name: Ralph    Favorite number is 3 GPA is 3.70
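
In the f-string above, the part after the colon is a format specification: {name:8} pads the name to a width of 8, and {gpa:.2f} shows two digits after the decimal point. The sketch below tries a few more specifications; the '|' characters are there only to make the padding visible.

```python
num = 3
gpa = 3.7
name = 'Ralph'

left = f'|{name:8}|'       # strings are left-aligned by default
right = f'|{name:>8}|'     # '>' right-aligns within the width
center = f'|{name:^9}|'    # '^' centers within the width
two_dp = f'{gpa:.2f}'      # fixed-point with 2 decimal places
padded = f'{num:03d}'      # integer padded with zeros to width 3
wide = f'|{gpa:6.1f}|'     # width 6, 1 decimal place, right-aligned

for s in (left, right, center, two_dp, padded, wide):
    print(s)
```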


Formatting is a lengthy and boring subject. When you need to know details, refer to the Python documentation or [this link](https://realpython.com/python-f-strings/).

### Path

Path objects from pathlib are useful for on-disk programs that need to run on both Mac and Windows computers, because they handle the different path separators for you. We will cover importing files into Colab later in the course.

The following code blocks are for reference only, for when you write on-disk programs.

In [None]:
from pathlib import Path

folder = Path('data/')
file_path = folder / 'titanic3.csv'

with open(file_path) as f:   # the file is closed when the block ends
    contents = f.read()

print(contents[:200])

pclass,survived,name,sex,age,sibsp,parch,ticket,fare,cabin,embarked,boat,body,home.dest
1,1,"Allen, Miss. Elisabeth Walton",female,29,0,0,24160,211.3375,B5,S,2,,"St Louis, MO"
1,1,"Allison, Master. Hu


In [None]:
# get current working directory
import pathlib

pathlib.Path.cwd()

PosixPath('/Users/mazidi/Dropbox/NSL/Python_Fundamentals/notebooks')
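
Path objects offer more than joining folder and file names. The sketch below shows a few attributes and methods that we find handy; it assumes a hypothetical file, 'example.csv', written to a temporary folder, not the titanic data used above.

```python
import tempfile
from pathlib import Path

# Hypothetical folder and file, created here so the sketch runs on its own.
folder = Path(tempfile.mkdtemp())
file_path = folder / 'example.csv'

# write_text() and read_text() open and close the file for us.
file_path.write_text('a,b\n1,2\n', encoding='utf-8')
contents = file_path.read_text(encoding='utf-8')

print(file_path.name)       # file name with extension
print(file_path.stem)       # file name without extension
print(file_path.suffix)     # extension, including the dot
print(file_path.exists())   # whether the file is on disk
print(contents.splitlines())
```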