# Files and Directories

Your computer is full of files, and those files are organized into folders (also called directories). You will often need to move, copy, rename, create, or delete files and directories, and Python provides a number of functions for doing this.

Even if you don't realize it, your notebook is always "running" in a particular folder on your computer. By default, this is the folder that the notebook file is saved in. You can use one of the IPython "magic" functions, `%pwd`, to see what your "Present Working Directory" is. Just run the cell below.

In [None]:
%pwd

## File Paths

You may be unfamiliar with the way this prints out. The idea is that your folders are organized in a hierarchy: one big "root" folder, with subfolders inside of it, and subfolders inside of those. In Windows, if you click on the Start button, then "Computer", you can double-click on the C drive, which is the "root" folder of your system. A system can also have other drives, like the R drive, which we've been using for this class.

On Mac, you navigate through your folders using the Finder. If you open the Finder, it might start in a different folder (like your Home folder, which matches your username). You might have a link on the left that points to the highest level. It will be something like "Bob's Macbook Pro". If you click on that, then you will see "Macintosh HD" which is your hard drive. This is the "root" folder for your system. Here's another trick: whatever folder you're in, you can move "up" in the hierarchy by pressing Cmd+[up arrow]. Just keep doing it until you see the link to the Macintosh HD. 

Each time you double-click on a folder name, you are moving inside of that folder and seeing what's inside of it, usually a mixture of files and other folders. You can represent your current location in this hierarchy using a **path**, which spells out where you are in the system, starting from the root directory. Folder names are separated by a slash: on Windows this is a backslash `\`, and on Mac and Linux it's a forward slash `/`. So, if my path is:

**R:\Psy407_9\Class_Data\homework**

I can navigate to that folder by clicking the Start button, then "Computer", then "R", "Psy407_9", "Class_Data", and finally "homework".

On a Mac, your path might look like:

**/Volumes/Psy407_9/Class_Data/homework**

On Windows, every full path starts with a drive letter. C is the default drive, where most things are kept, and each additional drive connected to the computer is given its own letter (like R).

On Mac, **all** paths start with the root folder `/`. If you connect a different hard drive, or connect to a remote drive (like you do for the class), it appears inside the `Volumes` folder directly under `/`. When you are clicking around in the Finder you can't see the `Volumes` folder, because the Finder hides it, but it's there.
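
To make the idea of a path as a chain of folder names concrete, here is a small sketch that breaks the two example paths above into their pieces. It uses the standard-library `pathlib` module, which this lesson doesn't otherwise cover, so treat it as an optional aside. The `Pure...Path` classes only work with the text of a path and never touch the disk, so this should run on any system.

```python
from pathlib import PurePosixPath, PureWindowsPath

# The Windows-style path from above (raw string so the backslashes stay literal)
win_parts = PureWindowsPath(r'R:\Psy407_9\Class_Data\homework').parts
print(win_parts)  # ('R:\\', 'Psy407_9', 'Class_Data', 'homework')

# The Mac-style path from above
mac_parts = PurePosixPath('/Volumes/Psy407_9/Class_Data/homework').parts
print(mac_parts)  # ('/', 'Volumes', 'Psy407_9', 'Class_Data', 'homework')
```

Notice that the first piece is the starting point in both cases: the drive on Windows, and the root folder `/` on Mac.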


## Listing files and Changing Folders in Python

Now let's see what's inside of our present working directory. If we just want to look at the contents (and not use them in our script), we can use the `%ls` magic function, which "lists" everything inside of the current folder. Run the cell below, then navigate to the same folder outside of Jupyter and check that you see the same files. Notice that folders are printed in blue and have a slash after their name.

In [None]:
%ls
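
If you do want the folder contents available to your code, one option is `os.listdir`, which returns the names as a plain Python list. The sketch below is only an illustration: it lists whatever folder you happen to be in, so your output will differ, and the trailing slash is something I add by hand to mimic `%ls`.

```python
import os

entries = sorted(os.listdir('.'))  # names of everything in the current folder

for name in entries:
    # add a trailing slash to folders, the way %ls marks them
    label = name + '/' if os.path.isdir(name) else name
    print(label)
```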

Now let's move our working directory to a different folder. Let's try to move to **R:\Psy407_9\Class_Data\homework**.

This can be accomplished using the `os` module (the name stands for Operating System), which has a function called `chdir` that changes the working directory.

In [None]:
import os

os.chdir(r'R:\Psy407_9\Class_Data\homework')  # raw string (r'...') keeps the backslashes literal
%pwd
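
The R drive only exists on the class machines, so the cell above will raise an error anywhere else. Here is a sketch of the same idea that should run on any computer. It creates a throwaway folder with the standard `tempfile` module (my choice for the example, not part of the lesson), moves into it, and then moves back. I use `os.getcwd()`, which is covered later in this lesson, instead of `%pwd` so that the code also works outside of a notebook.

```python
import os
import tempfile

start_dir = os.getcwd()                         # remember where we began
scratch = os.path.realpath(tempfile.mkdtemp())  # throwaway folder for the demo

os.chdir(scratch)                               # move into the new folder
moved_to = os.getcwd()
print(moved_to)

os.chdir(start_dir)                             # move back to where we started
os.rmdir(scratch)                               # clean up the empty folder
```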


## Folder shortcuts: `.` and `..`

There are 2 useful shortcuts for moving between folders. Whatever folder you are currently in, you can represent it using the `.` character. Your system understands this as "my current folder". So, if you want to start in the current folder, and go to a subfolder called "answers", you can type: 

```python
os.chdir('./answers')

```
This means "start wherever I happen to be, and move to the subfolder `answers`". The benefit is portability: if I copy this whole folder to a different location, like the C drive instead of the R drive, the command still works. If I had instead hard-coded `R:\Psy407_9\Class_Data\homework\answers` and then moved the files, the command would fail.


Another useful shortcut is `..`, which translates to "move one folder up from wherever I am now". So if we are in the `answers` subfolder and want to go *back* up to `R:\Psy407_9\Class_Data\homework`, all we have to do is:

```python
os.chdir('..')

```

You can chain `..` segments together to keep moving up through the hierarchy. So if you want to move up by three folders, you can do:

```python
os.chdir('../../../')

```

Run the cells below and see if you understand what's going on. 

In [None]:
os.chdir(r'R:\Psy407_9\Class_Data\homework')  # raw string keeps the backslashes literal
%pwd



In [None]:
os.chdir('./answers') #go to the subfolder
%pwd

In [None]:
os.chdir('..') #go back where you started
%pwd

In [None]:
os.chdir('../../') #now go up TWO levels 
%pwd
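
If you aren't on the class machines, you can still check how `.` and `..` resolve without changing folders at all. The sketch below uses `normpath`, which simplifies a path purely as text. I import it from `ntpath`, the Windows flavor of `os.path`, so that the backslash examples behave the same on a Mac; that import is my choice for the illustration.

```python
import ntpath  # the Windows implementation of os.path, importable on any system

homework = r'R:\Psy407_9\Class_Data\homework'

# '.' means "stay here", so this is just the answers subfolder
down = ntpath.normpath(homework + r'\.\answers')
print(down)    # R:\Psy407_9\Class_Data\homework\answers

# each '..' removes one folder from the end of the path
back = ntpath.normpath(homework + r'\answers\..')
print(back)    # R:\Psy407_9\Class_Data\homework

up_two = ntpath.normpath(homework + r'\..\..')
print(up_two)  # R:\Psy407_9
```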


The nice thing about paths is that they're just strings! We can construct paths using ordinary string operations and direct Python wherever we want. The `os.path` module has a number of functions that make this convenient, although we could do it by hand with string methods like `split` and `join` if we wanted.

Notice that the argument we give `os.chdir` is just a string. That string happens to be a path. Remember, wherever we use a string or number, we can use a variable instead!

In [None]:
mypath = r'R:\Psy407_9\Class_Data\homework'  # raw string keeps the backslashes literal

os.chdir(mypath)
%pwd

The `join` function from `os.path` is great for creating a file path from a bunch of folder and file names. It inserts the appropriate separator between them: a forward slash on Mac and Linux, a backslash on Windows.

In [None]:

mypath = os.path.join('R:\\','Psy407_9','Class_Data','homework','answers')
print(mypath)

os.chdir(mypath)
%pwd
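
To see the separator switch without owning both kinds of computer, here is a sketch that calls the Windows and Mac/Linux versions of `join` directly. `ntpath` and `posixpath` are the standard-library modules that `os.path` points to on each system. Importing them by name is just a device for this illustration; in real code you would keep using `os.path.join`.

```python
import ntpath     # what os.path is on Windows
import posixpath  # what os.path is on Mac and Linux

folders = ['Psy407_9', 'Class_Data', 'homework', 'answers']

# *folders passes each name in the list as a separate argument
win_path = ntpath.join('R:\\', *folders)
mac_path = posixpath.join('/Volumes', *folders)

print(win_path)  # R:\Psy407_9\Class_Data\homework\answers
print(mac_path)  # /Volumes/Psy407_9/Class_Data/homework/answers
```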

We can check if a file or directory exists using `os.path.exists`. This is great for `if` statements. 

In [None]:
os.path.exists(mypath)
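
Here is a sketch of the `if` statement pattern. To keep it runnable anywhere, it checks a throwaway folder made with the standard `tempfile` module and a made-up subfolder name, `no_such_folder`. Both are placeholders for whatever paths you actually care about.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as scratch:         # a folder that exists for now
    missing = os.path.join(scratch, 'no_such_folder')  # hypothetical name, never created

    results = {}
    for label, folder in [('scratch', scratch), ('missing', missing)]:
        if os.path.exists(folder):
            results[label] = 'found'      # safe to chdir into it or read from it
        else:
            results[label] = 'not found'  # skip it, or create it first

print(results)  # {'scratch': 'found', 'missing': 'not found'}
```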

A few other useful functions are shown below, and the full list is on this page: <https://docs.python.org/2/library/os.path.html>. I will let you study these on your own.

In [None]:

print(os.path.split(mypath))     # splits into (parent folder, last component)
print(os.path.basename(mypath))  # just the last component (the file or folder name)
print(os.path.isdir(mypath))     # True if the path is an existing directory
print(os.path.isfile(mypath))    # True if the path is an existing file
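
Those functions are easier to read when the path points at a file. The sketch below uses a made-up file name, `hw1.txt`, inside the Mac-style class folder. None of these calls touch the disk, so the file doesn't need to exist. `os.path.splitext` is one more function from the same module that I'm adding here because it pairs naturally with the others.

```python
import os

filepath = '/Volumes/Psy407_9/Class_Data/homework/hw1.txt'  # hypothetical file

folder, filename = os.path.split(filepath)
print(folder)    # /Volumes/Psy407_9/Class_Data/homework
print(filename)  # hw1.txt

print(os.path.basename(filepath))  # hw1.txt (same as the second piece above)

stem, extension = os.path.splitext(filename)
print(stem, extension)  # hw1 .txt
```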


### Saving file and directory names into a list

The `glob` function from the `glob` module is great if you want to list the files in the current working directory and save their names in a Python list. Why would you want to do this? Well, maybe you have a folder full of text files, and you want to loop through each one and read it into Python.

`glob` understands the wildcard character `*`, which matches any sequence of characters. For instance, if we wanted to list everything in the current folder (files and folders alike, apart from hidden ones whose names start with a dot) and save the names as a list, we would just do:

In [None]:
from glob import glob  # with a plain `import glob` we would have to write glob.glob every time

allfiles = glob('*')  # list every file and folder in the current directory

allfiles

If we wanted to list all files and directories that start with the letter h, we could do it like this: 

In [None]:
hfiles = glob('h*') #lowercase h, followed by anything else
hfiles

This is great if you only want certain file types. A file's type is usually indicated by its extension (.txt, .csv, .ipynb, .docx, and so on). So, if we wanted to see just the Jupyter notebooks in the current directory, we would search for anything that ends in `.ipynb`:

In [None]:
notebooks = glob('*.ipynb')
notebooks

The output is just a list, and each element in the list is a string. There is nothing special about this output, except that the strings happen to be filenames. We can then use that information to load files into Python.
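
Since I can't know what is in your folder, here is a self-contained sketch that makes a throwaway folder with three made-up files and runs the same kinds of patterns on it. The folder and file names are invented for the example. Note that `glob` accepts a pattern with a folder in front of it, and that I sort the results because `glob` doesn't promise any particular order.

```python
import os
import tempfile
from glob import glob

with tempfile.TemporaryDirectory() as scratch:
    # create three empty files with hypothetical names
    for name in ['homework.txt', 'hello.csv', 'notes.txt']:
        open(os.path.join(scratch, name), 'w').close()

    # glob returns full paths here, so keep just the file names
    txt_names = sorted(os.path.basename(p) for p in glob(os.path.join(scratch, '*.txt')))
    h_names = sorted(os.path.basename(p) for p in glob(os.path.join(scratch, 'h*')))

print(txt_names)  # ['homework.txt', 'notes.txt']
print(h_names)    # ['hello.csv', 'homework.txt']
```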

Let's say we want to list all .txt files and load one of them in. Before, we always specified a filename based on a string, but remember, we can use a variable anywhere we use a string: 

In [None]:

os.chdir('./datasets/')   # move to the datasets folder
txtfiles = glob('*.txt')  # list all txt files
print(txtfiles)           # print all of them

print(txtfiles[0])        # print just the first



In [None]:

with open(txtfiles[0]) as f:  # open the first text file
    lines = f.readlines()     # read every line into a list of strings

print(lines[:10])  # print out the first 10 lines
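
Reading only the first file is a start, but the original motivation was to loop through every file. Here is a sketch of that loop. It builds its own throwaway folder with two made-up text files so that it runs anywhere; in your own work you would drop that setup and loop over `txtfiles` directly.

```python
import os
import tempfile
from glob import glob

with tempfile.TemporaryDirectory() as scratch:
    # setup: two small text files with hypothetical names and contents
    with open(os.path.join(scratch, 'a.txt'), 'w') as f:
        f.write('one\ntwo\n')
    with open(os.path.join(scratch, 'b.txt'), 'w') as f:
        f.write('one\ntwo\nthree\n')

    line_counts = {}
    for path in sorted(glob(os.path.join(scratch, '*.txt'))):
        with open(path) as f:  # same pattern as reading a single file
            lines = f.readlines()
        line_counts[os.path.basename(path)] = len(lines)

print(line_counts)  # {'a.txt': 2, 'b.txt': 3}
```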



The `%pwd` magic is nice for telling you what folder you're in, but it is an IPython feature, so it won't work in a plain Python script. If you want the current working directory as a string that your code can use, call `os.getcwd()`:

In [None]:
current_path = os.getcwd()

print(current_path)

The nice thing is that we can combine this with `os.path.join` to create a full path to one of our text files:

In [None]:

print(os.path.join(current_path, txtfiles[0]))



A nice shortcut, though, is to use `os.path.abspath` to get the "absolute path" to a file:

In [None]:
print(os.path.abspath(txtfiles[0]))
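
To convince yourself that the shortcut and the longer version agree, you can compare them directly. In this sketch `scores.txt` is a made-up name; `abspath` only builds the string, so the file doesn't have to exist. I wrap the joined version in `os.path.normpath` because `abspath` also tidies up the result.

```python
import os

filename = 'scores.txt'  # hypothetical file name

by_hand = os.path.normpath(os.path.join(os.getcwd(), filename))
shortcut = os.path.abspath(filename)

print(by_hand == shortcut)      # True
print(os.path.isabs(shortcut))  # True
```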