# Week 5 - File Management, Input / Output, Web Interaction

## File Manipulation

Every file on your computer has an address, called its _path_. Generally you will use Windows Explorer or a similar program to manipulate files. Open Windows Explorer and navigate to the directory where you saved this file. Try to find the path to this file: typically this is done by right-clicking the file and selecting 'Properties'. The path should then read something like:

`C:\Users\Admin\PycharmProjects\DT9426`

You can see a hierarchy. The first part is called the `root`, which on Windows is usually _C:_. Each folder or directory below it is then separated by a backslash on Windows (a forward slash on macOS and Linux).

In Python, files and folders are usually located and manipulated via the `os` module. This is part of the standard library, so it is included with every standard installation of Python.

In [None]:
import os

We will use the `os.getcwd()` function to **get** the **c**urrent **w**orking **d**irectory. This returns the directory (or folder) in which the current program is running. When you give Python a file name without a path, it looks for that file in the current working directory.

In [None]:
os.getcwd()

We can build paths to files using the `os.path.join()` function from the `os` module. By doing this we can build paths that are independent of the operating system, i.e. it will insert `/` or `\` depending on the system the code runs on. Remember that `\` is a special (escape) character in Python strings, so to include a literal backslash you have to write it twice, as `\\` (see the Windows path passed to `os.listdir()` below). To build a path to the `week5` folder inside the directory above, we would use:

`os.path.join("C:\\", "Users", "Admin", "PycharmProjects", "DT9426", "week5")`
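As a minimal sketch of how this plays out on different systems, the block below uses `os.path.join()` and also the standard-library modules `ntpath` and `posixpath`, which are the Windows and macOS/Linux flavours of `os.path` and can be imported on any operating system. The Linux path `/home/admin/...` is a hypothetical example.

```python
import os
import ntpath     # the Windows flavour of os.path (importable on any OS)
import posixpath  # the macOS/Linux flavour of os.path

# os.path.join uses the separator of the system the code is running on
data_file = os.path.join("data", "testfile.txt")
print(data_file)   # data\testfile.txt on Windows, data/testfile.txt elsewhere
print(os.sep)      # the separator itself

# The same kind of call with each flavour, to show the difference explicitly
windows_path = ntpath.join("C:\\", "Users", "Admin", "PycharmProjects", "DT9426", "week5")
linux_path = posixpath.join("/home", "admin", "PycharmProjects", "DT9426", "week5")  # hypothetical path
print(windows_path)  # C:\Users\Admin\PycharmProjects\DT9426\week5
print(linux_path)    # /home/admin/PycharmProjects/DT9426/week5

# Caution: joining onto a bare "C:" (no backslash) gives the drive-relative "C:Users"
print(ntpath.join("C:", "Users"))  # C:Users
```

This is why the drive is written as `"C:\\"` when building a full Windows path.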

The `os.listdir()` function returns a list of the names of the files and folders inside the directory whose path is supplied as an argument. See below.

In [None]:
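# list the contents of the current working directory
this_dir = os.getcwd()
print(os.listdir(this_dir))

# an absolute path (Windows style, so each backslash is escaped as \\)
print(os.listdir("C:\\Users\\Admin\\PycharmProjects\\DT9426\\week5"))

# a relative path: the 'data' folder inside the current working directory
os.listdir('data')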

There are two ways to specify a file path.

 - An absolute path, which always begins with the root folder

 - A relative path, which is relative to the program’s current working directory

There are also the dot (.) and dot-dot (..) folders. These are not real folders but special names that can be used in a path. A single period (“dot”) for a folder name is shorthand for “this directory.” Two periods (“dot-dot”) means “the parent folder.”
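A short sketch of relative paths and the dot folders, using `os.path.abspath()` (turns a relative path into an absolute one, based on the current working directory) and `os.path.normpath()` (tidies away `.` and `..`). The folder and file names `week5`, `week4` and `notes.txt` are hypothetical.

```python
import os

# A relative path is interpreted starting from the current working directory
relative = os.path.join("data", "testfile.txt")
print(os.path.isabs(relative))    # False
print(os.path.abspath(relative))  # the current working directory + data + testfile.txt

# "." is this directory and ".." is its parent
here = os.path.abspath(".")
parent = os.path.abspath("..")
print(here == os.getcwd())        # True
print(parent)

# normpath tidies "." and ".." as text only; it does not check that the folders exist
tidied = os.path.normpath(os.path.join("week5", "..", "week4", ".", "notes.txt"))
print(tidied)                     # week4/notes.txt (week4\notes.txt on Windows)
```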

Your programs can create new folders (directories) with the `os.makedirs()` function, which also creates any missing intermediate folders along the path.
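A minimal sketch of `os.makedirs()`, assuming you are happy to work inside a throwaway folder created by `tempfile.mkdtemp()` so that nothing is left in your project. The nested folder names `projects/week5/output` are hypothetical.

```python
import os
import tempfile

# Work inside a temporary folder so the demo leaves nothing in your project
base = tempfile.mkdtemp()

# makedirs creates every missing folder along the path in one call
nested = os.path.join(base, "projects", "week5", "output")
os.makedirs(nested)
print(os.path.isdir(nested))  # True

# Calling it again would raise FileExistsError; exist_ok=True skips that error
os.makedirs(nested, exist_ok=True)
```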

#### Try making a folder for a project that we're going to do today. Call it "exercise_data" and create it in the current working directory.

The `os.path` module has lots of functions to help with handling paths. It is not something we need to go into right now, but it is useful for scripting and for making your code portable, i.e. so you can give your code to someone else and they can run it without problems on their machine. Lots of detail at https://automatetheboringstuff.com/chapter8/
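A few of the `os.path` helpers in action, as a sketch. The path to the Alice in Wonderland file is only an example; most of these calls work on the path string alone, so the file does not need to exist (`os.path.exists()` and `os.path.isfile()` are the ones that actually check the disk).

```python
import os

path = os.path.join("data", "Alice_In_Wonderland.txt")

name = os.path.basename(path)              # the final part: the file name
folder = os.path.dirname(path)             # everything before the final part
stem, extension = os.path.splitext(path)   # split off the file extension

print(name)       # Alice_In_Wonderland.txt
print(folder)     # data
print(extension)  # .txt

# These two actually look at the disk
print(os.path.exists(path))  # True only if the file is really there
print(os.path.isfile(path))  # True only if it exists and is a regular file
```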

# Reading and Writing Files

A very important part of almost any program is reading data in from a file and writing data out to a file: this allows us to persist data beyond the runtime of the program you are executing.

### The open() function

To open a file for reading or writing in Python, you use the built-in `open()` function.

`open()` is passed the path to a file and returns a _file object_. It is also passed a second argument, the _mode_, which specifies how the program wants to use the file.

An argument is simply a value that you pass to a function when you call it. For instance, in `open("testfile.txt", "r")` both the file name `"testfile.txt"` and the mode `"r"` are arguments.

The syntax to open a file object in Python is: 

`file_object = open("filename", "mode")`

where
 - `file_object` is the variable that holds the returned file object.
 - `filename` is the path to the file.
 - `mode` is the way the file will be used.


#### Mode

Including a mode argument is optional: if it is omitted, the default value `'r'` is assumed. The `'r'` value stands for read mode, which is just one of several modes.

The most common modes are:

 - `'r'`: read mode, used when the file is only being read.
 - `'w'`: write mode, used to write new information to the file. If a file with the same name already exists, its contents are erased when it is opened in this mode; if not, a new file is created.
 - `'a'`: append mode, used to add new data to the end of the file; new information is automatically appended after the existing contents (the file is created if it does not exist).
 - `'r+'`: read and write mode, used to handle both actions on a file that already exists.
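A small sketch comparing `'w'` and `'a'`, assuming a throwaway file in a temporary folder (the file name `modes_demo.txt` is hypothetical):

```python
import os
import tempfile

# A throwaway file in a temporary folder
path = os.path.join(tempfile.mkdtemp(), "modes_demo.txt")

with open(path, "w") as f:   # 'w' creates the file (or empties an existing one)
    f.write("first line\n")

with open(path, "a") as f:   # 'a' adds to the end and keeps what is already there
    f.write("second line\n")

with open(path) as f:        # no mode given, so 'r' is used
    after_append = f.read()
print(after_append)          # both lines

with open(path, "w") as f:   # 'w' again: the previous contents are erased
    f.write("fresh start\n")

with open(path) as f:
    after_rewrite = f.read()
print(after_rewrite)         # only "fresh start"
```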

In [None]:
test_file = open("data\\testfile.txt", "r") 
print(test_file.read())
test_file.close()

Note: it is important to `close` your file after using it. Not doing so _can_ cause problems: data you have written may never be flushed from memory to disk, leaving the file incomplete or corrupted, much like removing a USB drive without ejecting it first.

To avoid having to remember to close the file, it is better to get into the habit of using a `with` statement to handle files. The file object acts as a `context manager`: it is closed automatically once the indented block has finished, even if an error occurs partway through. The syntax is as follows (note the indentation):

In [None]:
with open("data\\testfile.txt") as test_file:  
    data = test_file.readlines() 
    print(data)

Note that above I used `readlines()`.

What does this return? 

What does `readline()` do? 

What advantage do they give?
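To check that the `with` statement really does close the file, the sketch below inspects the file object's `closed` attribute inside and after the block, and shows the `try`/`finally` pattern you would need to get the same guarantee without `with`. It uses a hypothetical throwaway file in a temporary folder rather than touching your `data` folder.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "closing_demo.txt")

with open(path, "w") as f:
    f.write("hello\n")
    closed_inside = f.closed      # False: still inside the with block

closed_after_with = f.closed      # True: closed automatically when the block ended
print(closed_inside, closed_after_with)

# Without "with", try/finally is needed to guarantee the file gets closed
f = open(path)
try:
    text = f.read()
finally:
    f.close()                     # runs even if read() raises an error
print(repr(text), f.closed)
```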

### Writing files

This is very similar to reading: we open a file (usually one that does not exist yet) and do something with it. Instead of the mode being `r`, we now use `w`:

In [None]:
sonnet_1 = [
"From fairest creatures we desire increase,",
"That thereby beauty's rose might never die,",
"But as the riper should by time decease,",
"His tender heir might bear his memory:",
"But thou contracted to thine own bright eyes,",
"Feed'st thy light's flame with self-substantial fuel,",
"Making a famine where abundance lies,",
"Thy self thy foe, to thy sweet self too cruel:",
"Thou that art now the world's fresh ornament,",
"And only herald to the gaudy spring,",
"Within thine own bud buriest thy content,",
"And, tender churl, mak'st waste in niggarding:",
"Pity the world, or else this glutton be,",
"To eat the world's due, by the grave and thee."]

with open("data\\new_file.txt", 'w') as new_file:
    for line in sonnet_1:
        # write() does not add a newline, so add one after each line
        new_file.write(line + '\n')
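`writelines()` expects an iterable of strings (such as a list) and, despite its name, does not add newline characters itself. A sketch that uses it on a whole collection at once and then reads the file back to check the result, assuming a temporary file (hypothetical name `sonnet_lines.txt`) and just the first two lines of the sonnet:

```python
import os
import tempfile

lines = ["From fairest creatures we desire increase,",
         "That thereby beauty's rose might never die,"]

path = os.path.join(tempfile.mkdtemp(), "sonnet_lines.txt")

# writelines() writes each string in turn; the "\n" must be added by us
with open(path, "w") as out_file:
    out_file.writelines(line + "\n" for line in lines)

# Iterating over a file yields one line at a time; strip the trailing newline
with open(path) as in_file:
    read_back = [line.rstrip("\n") for line in in_file]

print(read_back == lines)  # True
```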

You might also want to handle binary files: files made up of raw bytes rather than text, such as an image file. In this case, when you supply the `mode` argument to `open()`, you add a `b` to it (for example `'rb'` or `'wb'`), and the file is then read and written as `bytes` rather than `str`. Binary mode is also an option if you want to be very particular about the text encoding of the data, since you then handle the raw bytes yourself; for ordinary text files, passing an `encoding` argument such as `encoding="utf-8"` to `open()` is usually simpler.
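A minimal sketch of binary mode, assuming a throwaway temporary folder (the file names are hypothetical): bytes go in and the same bytes come out. The second half writes a short string in text mode with an explicit `encoding` and then reads it back in binary mode to show the bytes that UTF-8 actually stored.

```python
import os
import tempfile

folder = tempfile.mkdtemp()

# In binary mode you write and read bytes objects, not str
raw = bytes([0, 1, 2, 255])
with open(os.path.join(folder, "bytes_demo.bin"), "wb") as f:
    f.write(raw)

with open(os.path.join(folder, "bytes_demo.bin"), "rb") as f:
    data = f.read()
print(data)     # b'\x00\x01\x02\xff'

# Text mode with an explicit encoding, then a look at the raw bytes on disk
text_path = os.path.join(folder, "text_demo.txt")
with open(text_path, "w", encoding="utf-8") as f:
    f.write("café")
with open(text_path, "rb") as f:
    encoded = f.read()
print(encoded)  # b'caf\xc3\xa9' (the é takes two bytes in UTF-8)
```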

# Downloading Data from the Web

Most of us use the web every day (every hour???!!!). It is a great source of knowledge and information, especially for pictures of cats...

So it would be very useful to be able to access the web from our Python program. Luckily for us, someone has already thought of this....

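The `requests` module is a treasure trove of functions that can be used to access and download from the web. Unlike `os`, it is not part of the standard library, so you may need to install it first (for example with `pip install requests`). Raw web pages can be downloaded (ready to be parsed with other tools), and it also provides easy ways to download data files.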

In [None]:
import requests

The requests module lets you easily download files from the Web without having to worry about complicated issues such as network errors, connection problems, and data compression.

The `requests.get()` function takes a string of a URL to download. By calling `type()` on `requests.get()`’s return value, you can see that it returns a `Response` object, which contains the response that the web server gave for your request.

In [None]:
url = "https://www.gutenberg.org/files/11/11-0.txt"  # Alice in Wonderland, Project Gutenberg

web_request = requests.get(url)

print("Type:    " + str(type(web_request)))
print("\n\nLength:    " + str(len(web_request.text)))

print("\n\nFirst 500 characters:\n\n" + web_request.text[:500])

Let's now save this to a file. We will use binary mode so that the bytes are saved exactly as the server sent them, preserving the original text encoding. The details are beyond the scope of this course; suffice it to say that if you are having trouble reading or writing a file, try binary mode. We will use the `.content` attribute of the `Response` object, which holds the body of the response as raw bytes (whereas `.text` holds it decoded into a string).

In [None]:
with open("data\\Alice_In_Wonderland.txt", 'wb') as write_file:
    write_file.write(web_request.content)
    
# check the file now exists
os.listdir("data")

Now, have a go at downloading and saving a book that you would like from https://www.gutenberg.org/wiki/Main_Page

Then have a go at reading the contents of the book into a program and counting the frequency of each word in the document. Hint: a dict would be really useful here.

Lots more detail at https://automatetheboringstuff.com/chapter11/
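If you get stuck on the word-count exercise, here is one possible sketch. It assumes the book's text is already in a string (for example read with `open(..., encoding="utf-8")`), and it uses a deliberately simple rule for what counts as a word (runs of letters and apostrophes after lower-casing). The function name `word_frequencies` and the `sample` sentence are hypothetical stand-ins for your own code and a real book.

```python
import re

def word_frequencies(text):
    """Count how often each word appears, ignoring case."""
    counts = {}
    # Simplified rule: a word is a run of lowercase letters and apostrophes
    for word in re.findall(r"[a-z']+", text.lower()):
        counts[word] = counts.get(word, 0) + 1  # start at 0 the first time a word is seen
    return counts

sample = "Alice was beginning to get very tired. Alice, very tired, sat down."
freqs = word_frequencies(sample)

# The five most common words, highest count first
top = sorted(freqs.items(), key=lambda item: item[1], reverse=True)[:5]
print(top)
```

The standard library's `collections.Counter` can do the counting and ranking for you (`Counter(words).most_common(5)`), but writing the loop with a dict yourself is the point of the exercise.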