# Managing files in Python

## What are files, directories and paths?

These are simple things that many computer users already know, but I'll go
through them just to make sure you know them too.

### Files

- Each file has a **name**, like `hello.py`, `mytext.txt` or
    `coolimage.png`. Usually the name ends with an **extension** that
    describes the content, like `py` for Python, `txt` for text or `png`
    for "portable network graphic".
- With just names identifying the files, it wouldn't be possible to have
    two files with the same name. That's why files also have a
    **location**. We'll talk more about this in a moment.
- Files have **content** that consists of
    [8-bit bytes](https://www.youtube.com/watch?v=Dnd28lQHquU).

### Directories/folders

Directories are a way to group files. They also have a name and a location
like files, but instead of containing data directly like files do they
contain other files and directories.

### Paths

Directories and files have a path, like `C:\Users\me\hello.py`. That just
means that there's a folder called `C:`, and inside it there's a folder
called `Users`, and inside it there's a folder called `me` and inside it
there's a `hello.py`. Like this:

```
C:
└── Users
    └── me
        └── hello.py
```

`C:\Users\me\hello.py` is an **absolute path**. But there are also
**relative paths**. For example, if we are in `C:\Users`, `me\hello.py`
is the same as `C:\Users\me\hello.py`.

The place/path we are in is sometimes
called **current directory**, **working directory** or
**current working directory**.

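On a Mac (or any other UNIX-like operating system) paths are slightly different: they use `/` instead of `\`, and they start from `/` instead of a drive letter. The equivalent of the above path would be something like `/home/me/hello.py` (on a Mac, home folders live under `/Users` instead of `/home`). If you're in `/home`, `me/hello.py` is the same as `/home/me/hello.py`.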

```
/
└── home
    └── me
        └── hello.py
```
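
To make the absolute/relative distinction concrete, here is a small sketch using the standard library's `pathlib` module, which this tutorial doesn't otherwise cover. The `Pure...Path` classes only work with the text of a path and never touch the disk, so the example should behave the same on any operating system. The variable names are made up for illustration.

```python
from pathlib import PurePosixPath, PureWindowsPath

# UNIX-style: pretend the current directory is /home.
current_dir = PurePosixPath('/home')
relative = PurePosixPath('me/hello.py')
absolute = current_dir / relative     # "/" joins path parts

print(relative.is_absolute())   # False
print(absolute.is_absolute())   # True
print(absolute)                 # /home/me/hello.py

# Windows-style: the same idea with a drive letter and backslashes.
windows_absolute = PureWindowsPath(r'C:\Users') / PureWindowsPath(r'me\hello.py')
print(windows_absolute)         # C:\Users\me\hello.py
```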

## Writing to a file

Let's create a file and write a hello world to it.


In [None]:
with open('hello.txt', 'w') as f:
    f.write("Hello World!")

It doesn't seem like it did anything, but it actually created a `hello.txt`
somewhere on our system.

Since we're using a Jupyter notebook, the code above created the file in the directory where the notebook is running (a.k.a. this directory).

If you run the same code from a plain Python prompt instead, the file ends up in the directory Python was started from. On Windows that's probably `C:\Users\YourName`,
and on most other systems it should be `/home/yourname`.

You can open it with notepad or any other plain text editor your system comes with by opening the folder that contains the file and then double-clicking the file.


#### So how does that code work?

This is what's happening - 
* we open a path with `open`, and it gives us a Python file object that is assigned to the variable `f`.
    * The first argument / input we passed to `open` was the path we wanted to write to.
    * The second argument was `'w'`. It's short for write, and it means that we'll create a new file, or replace the content of an existing one. There are some other modes we can use as well:

| Mode  | Short for | Meaning                                                               |
|-------|-----------|-----------------------------------------------------------------------|
| `r`   | read      | Read from an existing file.                                           |
| `w`   | write     | Write to a file. **If the file exists, its old content is removed.**  |
| `a`   | append    | Write to the end of a file, and keep the old content.                 |

* the file object `f` has a method called `write`, which writes strings to the file.
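
The difference between the `w` and `a` modes is easiest to see by trying them. This is a minimal sketch, not part of the original notebook: it uses the `tempfile` and `os` modules to work in a throwaway folder so that no real files are touched, and the file name `modes-demo.txt` is just a made-up example.

```python
import os
import tempfile

# Work in a temporary folder that is deleted when we're done.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'modes-demo.txt')

    with open(path, 'w') as f:      # the file doesn't exist yet, so it's created
        f.write("first\n")
    with open(path, 'a') as f:      # keeps "first" and adds to the end
        f.write("second\n")
    with open(path, 'r') as f:
        after_append = f.read()

    with open(path, 'w') as f:      # the old content is thrown away
        f.write("third\n")
    with open(path, 'r') as f:
        after_write = f.read()

print(repr(after_append))   # 'first\nsecond\n'
print(repr(after_write))    # 'third\n'
```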

But what is that `with open(...) as f` thing? That's just a fancy way to make sure that the file gets closed, no matter what happens. As we can see, the file was indeed closed:


In [None]:
f.closed
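
For comparison, here is roughly what we would have to write without `with`. This is a sketch of the idea rather than exactly what Python does internally: `try` and `finally` (not covered in this notebook) make sure `f.close()` runs even if writing fails. The temporary folder is only there so the example doesn't leave files behind.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'hello.txt')

    f = open(path, 'w')
    try:
        f.write("Hello World!")
    finally:
        # This runs whether or not the write above succeeded.
        f.close()

print(f.closed)   # True
```

The `with` version is shorter and harder to get wrong, which is why it's the usual way to open files.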

In Python 3 we can also write to a file using the `print` function. It works just like any other print, but we also need to say that we want to print to the file we opened, using `file=f`.


In [None]:
with open('hello1.txt', 'w') as f:
    print("Hello again, World!", file=f)

## Reading from files

After opening a file with the `r` mode we can for loop over it, just
like it was a list. So let's go ahead and read everything in the file
we created to a list of lines.


In [None]:
lines = []
with open('hello1.txt', 'r') as f:
    for line in f:
        lines.append(line)

In [None]:
lines

So now we have the hello world in the `lines` variable, but it's
`['Hello again, World!\n']` instead of `['Hello again, World!']`. 

`\n` means newline. Note that it needs to be a backslash, so `/n`
doesn't have any special meaning like `\n` has. When we wrote the file with print it actually added a `\n` to the end of it. It's recommended to end the content of files with a newline character, but it's not necessary.

However, when we first wrote to the `hello.txt` file using the `write` method, no newline was added:


In [None]:
lines = []
with open('hello.txt', 'r') as f:
    for line in f:
        lines.append(line)

In [None]:
lines
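
The same difference can be seen without creating any files. This sketch uses `io.StringIO`, an in-memory object from the standard library that behaves like a text file; it is not used elsewhere in this notebook and is only a stand-in here.

```python
import io

buffer = io.StringIO()                      # an in-memory "file"
buffer.write("Hello World!")                # write adds nothing extra
print("Hello again, World!", file=buffer)   # print adds '\n' at the end

print(repr(buffer.getvalue()))   # 'Hello World!Hello again, World!\n'
```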

Trying to open a non-existent file with `w` or `a` creates the file for
us, but doing that with `r` gives us an error instead:

In [None]:
with open('this-doesnt-exist.txt', 'r') as f:
    print('Is it working?')
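
If a program should keep going when the file is missing, one option is to catch the error. This is only a sketch: `try` and `except` are not explained in this notebook, and the `missing` variable is a made-up name for illustration.

```python
missing = None

try:
    with open('this-doesnt-exist.txt', 'r') as f:
        print('Is it working?')
except FileNotFoundError as error:
    # error.filename is the path that could not be opened
    missing = error.filename
    print("Could not find", missing)
```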

Let's see how that works if we have more than one line in the file.


In [None]:
with open('hello.txt', 'w') as f:
    print("Hello one!", file=f)
    print("Hello two!", file=f)
    print("Hello three!", file=f)

In [None]:
lines = []
with open('hello.txt', 'r') as f:
    for line in f:
        lines.append(line)

In [None]:
lines

There we go, each of our lines now ends with a `\n`. When we for
loop over the file it's divided into lines based on where the `\n`
characters are, not based on how we printed to it.

But how to get rid of that `\n`? 

Strings have a method called `rstrip` which removes the given characters from the right end of the string.


In [None]:
stripped = []
for line in lines:
    stripped.append(line.rstrip('\n'))


In [None]:
stripped

The code above can also be written as a **list comprehension**:

In [None]:
[line.rstrip('\n') for line in lines]
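
A few quick experiments show what `rstrip` does and doesn't remove. Note that it keeps removing matching characters until it hits something else, and that calling it with no argument removes all whitespace from the right end, not just newlines.

```python
print(repr('hello\n'.rstrip('\n')))     # 'hello'
print(repr('hello\n\n'.rstrip('\n')))   # 'hello'
print(repr('hello  \n'.rstrip('\n')))   # 'hello  '
print(repr('hello  \n'.rstrip()))       # 'hello'
print(repr('  hello\n'.rstrip()))       # '  hello'
```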

There's only one confusing thing about reading files. If we try
to read the same file object twice we'll find out that it only gets read
once:

In [None]:
first = []
second = []
with open('hello.txt', 'r') as f:
    for line in f:
        first.append(line)
    for line in f:
        second.append(line)


In [None]:
first

In [None]:
second

File objects remember their position. When we tried to read the
file again it was already at the end, and there was nothing left
to read. But if we open the file again, we get a new file object that
is in the beginning and everything works.


In [None]:
first = []
second = []
with open('hello.txt', 'r') as f:
    for line in f:
        first.append(line)

with open('hello.txt', 'r') as f:
    for line in f:
        second.append(line)

In [None]:
first

In [None]:
second

Usually it's best to just read the file once, and use the
content we have read from it multiple times.
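
If we really do want to read the same file object twice, file objects also have a `seek` method that moves the position. The sketch below assumes a small two-line file written to a temporary folder; `f.seek(0)` jumps back to the beginning before the second loop.

```python
import os
import tempfile

first = []
second = []
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'hello.txt')
    with open(path, 'w') as f:
        print("Hello one!", file=f)
        print("Hello two!", file=f)

    with open(path, 'r') as f:
        for line in f:
            first.append(line)
        f.seek(0)               # go back to the beginning of the file
        for line in f:
            second.append(line)

print(first == second)   # True
```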

If we need all of the content as a string, we can use [the read
method](https://docs.python.org/3/library/io.html#io.TextIOBase.read).


In [None]:
with open('hello.txt', 'r') as f:
    full_content = f.read()


In [None]:
full_content

We can also open full paths, like `open('/home/me/myfile.txt', 'r')`.


## Examples

This program prints the contents of files:


In [None]:
while True:
    filename = input("Filename or path, or nothing at all to exit: ")
    if filename == '':
        break

    with open(filename, 'r') as f:
        # We could read the whole file at once, but going line by
        # line uses less memory if the file is very large.
        for line in f:
            print(line.rstrip('\n'))


This program stores the user's username and password in a file.
Plain text files are definitely not a good way to store usernames
and passwords, but this is just an example.


In [None]:
# Ask repeatedly until the user answers 'y' or 'n'.
while True:
    answer = input("Have you been here before? (y/n) ")
    if answer == 'Y' or answer == 'y':
        been_here_before = True
        break
    elif answer == 'N' or answer == 'n':
        been_here_before = False
        break
    else:
        print("Enter 'y' or 'n'.")

if been_here_before:
    # Read username and password from a file.
    with open('userinfo.txt', 'r') as f:
        username = f.readline().rstrip('\n')
        password = f.readline().rstrip('\n')

    if input("Username: ") != username:
        print("Wrong username!")
    elif input("Password: ") != password:
        print("Wrong password!")
    else:
        print("Correct username and password, welcome!")

else:
    # Write username and password to a file.
    username = input("Username: ")
    password = input("Password: ")
    with open('userinfo.txt', 'w') as f:
        print(username, file=f)
        print(password, file=f)

    print("Done! Now run this program again and select 'y'.")


# Modules

Let's say we want to generate a random number between 1 and 3. The random module is a really easy way to do this:


In [None]:
import random


In [None]:
random.randint(1, 3)


In [None]:
random.randint(1, 3)


In [None]:
random.randint(1, 3)


In [None]:
random.randint(1, 3)


That's cool... but how does that work?



## What are modules?

The first line in the example, `import random`, was an
**import statement.** But what is that random thing that it
gave us?


In [None]:
random

So it's a module, and it comes from a path... but what does
all that mean?

Now open the folder that contains your `random.py`. On my
system it's `/usr/lib/python3.6`.

You'll see a bunch of files and a few directories in the folder
that opens.

All of these `.py` files can be imported like we just imported
`random.py`. In random.py, there's a line like `randint = something`,
so we can use its randint variable with `random.randint` after
importing it.
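
We can also ask the module itself where it came from. Modules that are loaded from a file usually have a `__file__` attribute containing the path; the exact path printed depends on your system, so it isn't shown here.

```python
import random

print(random.__file__)            # path of the random.py that was imported
print(callable(random.randint))   # True, randint is something we can call
```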

You're probably wondering how a computer can generate random numbers.
The random module computes them with a pseudo-random number generator
(the Mersenne Twister). By default it picks its starting point from a
randomness source provided by the operating system, which on most systems
is based on noise collected from what the computer is doing.
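
One way to see that the numbers are computed rather than pulled out of thin air is `random.seed`, which sets the generator's starting point by hand. This is a small sketch with an arbitrary seed value; starting from the same seed twice gives the same "random" numbers both times.

```python
import random

random.seed(12345)
first_run = [random.randint(1, 3) for _ in range(5)]

random.seed(12345)
second_run = [random.randint(1, 3) for _ in range(5)]

print(first_run == second_run)   # True

random.seed()   # go back to an unpredictable starting point
```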

## Where do modules come from?

Create a `random.py` file with the following content:

```python
import random

print("A random number between 1 and 3:", random.randint(1, 3))
```
Now run the program by going to the command line, `cd` into the directory where the file was created and type:

```
python3 random.py
```

This is what we get - 
```python
Traceback (most recent call last):
  File "random.py", line 1, in <module>
    import random
  File "/home/akuli/random.py", line 4, in <module>
    print("A random number between 1 and 3:", random.randint(1, 3))
AttributeError: 'module' object has no attribute 'randint'
```

But what was that? Why didn't it work?

What the heck? According to the traceback, `import random` ran
`/home/akuli/random.py`. It's a module called random... but it's not the
`random.py` we thought it was. **Our** `random.py` has imported
itself!

So let's go ahead and rename our file from `random.py` to
something like `ourrandom.py` and try again:

```
A random number between 1 and 3: 3
```

There we go, now we don't have our own `random.py` so it works.

So it seems that modules can be imported from the directory that
our Python file is in, and also from the directory that the real
`random.py` is in. But where else can they come from?


There's a module called **sys** that contains various things built
into Python. Actually the whole module is built-in, so there's no
`sys.py` anywhere. The sys module has a list that contains all
places that modules are searched from:



In [None]:
import sys

sys.path

So that's where my Python finds its modules. The first thing in my
sys.path is an empty string, and in this case it means the current
working directory.
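
To check which file a given import would actually use, the standard library has `importlib.util.find_spec`. It isn't covered elsewhere in this notebook, so treat this as a sketch: it looks a module up by name and reports where it was found in the `origin` attribute. The `origins` dictionary is just a made-up name for collecting the results.

```python
import importlib.util

origins = {}
for name in ['random', 'sys']:
    spec = importlib.util.find_spec(name)
    origins[name] = spec.origin
    print(name, 'comes from', spec.origin)
```

For `random` this prints a path to a `random.py` file, and for `sys` it prints `built-in` because there is no file. Running this in a folder that contains your own `random.py` is a quick way to spot the naming problem described above.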

## Caching modules

Let's create a file called `hello.py` that contains a classic greeting:
```
print("Hello World!")
```

Let's go ahead and import it, and see how it works.

In [None]:

import hello

Works as expected, but what happens if we try to import it again?


In [None]:
import hello

Nothing happened at all.

The reason why the module wasn't loaded twice is simple. In a
large project with many files it's normal to import the same
module in many files, so it gets imported multiple times. If
Python reloaded the module every time it was imported,
dividing code into multiple files would make the code run slower.

If we need to load the module again we can just exit out of Python and
launch it again. In a Jupyter notebook that means restarting the kernel.
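
The cache itself can be inspected. As a sketch (neither of these is used elsewhere in this notebook): `sys.modules` is a dictionary of the modules that have been loaded so far, and `importlib.reload` runs a module's code again without restarting Python. The example uses `random` so that it works without creating any files.

```python
import importlib
import sys
import random

print('random' in sys.modules)    # True, it has been loaded

import random as random_again     # found in the cache, nothing is loaded
print(random_again is random)     # True, it's the very same module object

reloaded = importlib.reload(random)   # run random.py again
print(reloaded is random)             # True, the same object was updated
```

Reloading can behave surprisingly when other modules are still holding on to old functions or values, so restarting is usually the safer choice.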


## Brief overview of the standard library

The **standard library** consists of modules that Python comes
with. Here's a very brief overview of what it can do. All of
these modules can also do other things, and you can read more
about that in the official documentation.

### Random numbers

The official documentation is
[here](https://docs.python.org/3/library/random.html).



In [None]:
import random

In [None]:
random.randint(1, 3)      # 1, 2 or 3

In [None]:
colors = ['red', 'blue', 'yellow']
random.choice(colors)     # choose one color

In [None]:
random.sample(colors, 2)  # choose two different colors


In [None]:
random.shuffle(colors)    # mix the color list in-place
print(colors)

### Things that are built into Python

The module name "sys" is short for "system", and it contains things
that are built into Python. The official documentation is
[here](https://docs.python.org/3/library/sys.html).

`sys.stdin`, `sys.stdout` and `sys.stderr` are file objects, just like the file objects that `open()` gives us.



In [None]:
import sys

print("Hello!", file=sys.stdout)  # this is where prints go by default


In [None]:
print("Hello!", file=sys.stderr)  # use this for error messages

In [None]:
line = sys.stdin.readline()  # type hello and press enter

In [None]:
# information about Python's version, behaves like a tuple
sys.version_info

In [None]:
sys.version_info[:3] 

In [None]:
sys.exit()  # exit out of Python


### Mathematics

There's no math.py anywhere, math is a built-in module like
sys. The official documentation is
[here](https://docs.python.org/3/library/math.html).


In [None]:
import math

In [None]:
math

In [None]:
math.pi # approximate value of π

In [None]:
math.sqrt(2)             # square root of 2

In [None]:
math.sin(math.pi/2)      # sin of 90 degrees or 1/2 π radians

### Time-related things

The official documentation for the time module is
[here](https://docs.python.org/3/library/time.html).


In [None]:
import time

In [None]:
time.sleep(1)   # wait one second

In [None]:
time.time()     # return time in seconds since beginning of the year 1970

In [None]:
time.strftime('%d.%m.%Y %H:%M:%S')  # format current time nicely

You are probably wondering how `time.time()` can be used and why its
timing starts from the beginning of 1970. `time.time()` is useful for
measuring time differences because we can save its return value to a
variable before doing something, and then afterwards check how much it
changed.
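
Here is a minimal sketch of that pattern, timing a short `time.sleep` call. The variable names are made up, and the measured time will be slightly more than the time we asked to sleep.

```python
import time

start = time.time()
time.sleep(0.1)                  # the thing we want to measure
elapsed = time.time() - start

print("That took {:.2f} seconds.".format(elapsed))
```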

### Operating system related things
The module name "os" is short for "operating system", and it contains
handy functions for interacting with the operating system that Python
is running on. The official documentation is
[here](https://docs.python.org/3/library/os.html).


In [None]:
import os

In [None]:
os.getcwd()        # short for "get current working directory"

In [None]:
os.mkdir('stuff')  # create a folder, short for "make directory"

In [None]:
os.path.isfile('hello.txt')  # check if it's a file


In [None]:
os.path.isfile('stuff')

In [None]:
os.path.isdir('hello.txt')   # check if it's a directory

In [None]:
os.path.isdir('stuff')

In [None]:
os.path.exists('hello.txt')  # check if it's anything

In [None]:
os.path.exists('stuff')

In [None]:
# this joins the parts with '/' on Mac and Linux, and with '\' on Windows
path = os.path.join('stuff', 'hello-world.txt')
print(path)

In [None]:
with open(path, 'w') as f:
    # now this goes to the stuff folder we created
    print("Hello World!", file=f)

In [None]:
os.listdir('stuff') # create a list of everything in stuff

## Examples

#### Mix a list of things.


In [None]:
import random

print("Enter things to mix, and press Enter without typing",
      "anything when you're done.")
things = []
while True:
    thing = input("Next thing: ")
    if thing == "":
        break
    things.append(thing)

random.shuffle(things)

print("After mixing:")
for thing in things:
    print(thing)



#### Measure how long it takes for the user to answer a question.
The `{:.2f}` rounds to 2 decimals, and you can find more formatting
tricks [here](https://pyformat.info/).


In [None]:
import time

start = time.time()
answer = input("What is 1 + 2? ")
end = time.time()
difference = end - start

if answer == '3':
    print("Correct! That took {:.2f} seconds.".format(difference))
else:
    print("That's not correct...")
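
The dot in the format specification matters. In this small sketch with a made-up value, `.2f` means two digits after the decimal point, while `2f` without the dot means a minimum width of 2 characters and leaves the default of six decimals.

```python
difference = 3.14159

print("{:.2f}".format(difference))   # 3.14
print("{:2f}".format(difference))    # 3.141590
print("{:.0f}".format(difference))   # 3
```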


#### Wait a given number of seconds.


In [None]:
import sys
import time


answer = input("How long do you want to wait in seconds? ")
waitingtime = float(answer)
if waitingtime < 0:
    print("Error: cannot wait a negative time.", file=sys.stderr)
    sys.exit(1)

print("Waiting...")
time.sleep(waitingtime)
print("Done!")


#### Check what a path points to.

In [None]:
import os
import sys

print("You are currently in %s." % os.getcwd())

while True:
    path = input("A path, or nothing at all to quit: ")
    if path == '':
        # We could just break out of the loop, but I'll show how
        # this can be done with sys.exit. The difference is that
        # break only breaks the innermost loop it is in, and
        # sys.exit ends the whole program.
        sys.exit()
    if os.path.isfile(path):
        print("It's a file!")
    elif os.path.isdir(path):
        print("It's a folder!")
    elif os.path.exists(path):
        # this can run for special files, like /dev/null on Linux and Mac
        print("Interesting, it exists but it's not a file or a folder.")
    else:
        print("I can't find it :(", file=sys.stderr)


## More modules!

Python's standard library has many awesome modules and I just
can't cover each and every module I use here. Here are some of
my favorite modules from the standard library. Don't study them
one by one, but look into them when you think you might need them.
When reading the documentation it's usually easiest to find what
you are looking for by pressing Ctrl+F in your web browser, and
then typing in what you want to search for.

- [argparse](https://docs.python.org/3/howto/argparse.html):
    a full-featured command-line argument parser
- [collections](https://docs.python.org/3/library/collections.html),
    [functools](https://docs.python.org/3/library/functools.html) and
    [itertools](https://docs.python.org/3/library/itertools.html):
    handy utilities
- [configparser](https://docs.python.org/3/library/configparser.html):
    load and save setting files
- [csv](https://docs.python.org/3/library/csv.html):
    store comma-separated lines in files
- [json](https://docs.python.org/3/library/json.html):
    yet another way to store data in files and strings
- [textwrap](https://docs.python.org/3/library/textwrap.html):
    break long text into multiple lines
- [warnings](https://pymotw.com/3/warnings/):
    like [exceptions](exceptions.md), but they don't interrupt the
    whole program
- [webbrowser](https://pymotw.com/3/webbrowser/):
    open a web browser from Python

I also use these modules, but they don't come with Python so you'll
need to install them yourself if you want to use them:

- [appdirs](https://github.com/activestate/appdirs):
    an easy way to find out where to put setting files
- [requests](http://docs.python-requests.org/en/master/user/quickstart/):
    an awesome networking library

I recommend reading [the official documentation about installing
modules](https://docs.python.org/3/installing/). If you're using
GNU/Linux also read the "Installing into the system Python on Linux"
section at the bottom.

## Summary

- Most modules are files on our computers, but some of them are built
    in to Python. We can use modules in our projects by importing them,
    and after that using `modulename.variable` to get a variable from
    the module.
- Some of the most commonly used modules are random, sys, math, time
    and os.
- Avoid creating `.py` files that have the same name as a module
    you want to use.
- Python comes with many modules, and we can install even more modules
    if we want to.
