<h1 id="tocheading">Table of Contents</h1>
<div id="toc"></div>

In [92]:
%%javascript
$.getScript('https://kmahelona.github.io/ipython_notebook_goodies/ipython_notebook_toc.js')


# From Python, access scripts and files in other folders on my computer
Author: Patricia Schuster  
Affiliation: University of Michigan  
Date: January 2017

# Goal

While developing my data analysis scripts, I want to keep them under version control in a git repository on my computer, but operate on data stored in a different folder. Thus, I need a way to run Python scripts that live in other folders on my computer.

My workplan will be:  
- Edit scripts in the version-controlled scripts folder  
- Run a Python instance from the data folder  
- Add the scripts folder to the current system path (`sys.path`) so that I can import the scripts into my current Python session

In [9]:
import sys
import os

# Access files beyond your working directory
The working directory is wherever the current instance of Python is running. It can be inspected and changed with the `os` module.

If I want to access files outside of the current working directory, I can:

1. Use relative paths. By default, relative paths are resolved against the current working directory, so I must always launch Python from the same directory. This works well if I have the same folder structure on different computers: the absolute paths differ, but the relative paths are the same.
2. Change the working directory during the data analysis. That can get messy because I have to keep track of which directory I am in.
3. A third and more stable option is to use the full (absolute) path for any file in another directory that I want to access. This makes the code work regardless of the working directory. However, it may break later if I move my documents from one computer to another or from one parent directory to another.
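
To see why option 1 ties the code to the launch directory, here is a minimal sketch using only the standard library: the same relative path `'data.txt'` (a hypothetical file name) resolves to a different absolute path depending on the current working directory. Two temporary folders stand in for real project folders.

```python
import os
import tempfile

# The same relative path resolves differently depending on the working directory.
relative = "data.txt"  # hypothetical file name, used only for illustration

original_cwd = os.getcwd()
with tempfile.TemporaryDirectory() as dir_a, tempfile.TemporaryDirectory() as dir_b:
    try:
        os.chdir(dir_a)
        resolved_a = os.path.abspath(relative)
        os.chdir(dir_b)
        resolved_b = os.path.abspath(relative)
    finally:
        # Always return to the starting folder before the temporary folders are deleted
        os.chdir(original_cwd)

print(resolved_a)
print(resolved_b)
print(resolved_a == resolved_b)  # False: the result depends on os.getcwd()
```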

Let's try each of these.

First, what is the current working directory?

In [48]:
# (Make sure to use import os above)
os.getcwd()

'C:\\Users\\pfsch\\Box Sync\\tutorial\\python\\2017_01_28_python_access_other_folders'

In [49]:
# Store this to a variable
starting_path = os.getcwd()

In [14]:
# What files are in my current directory?
os.listdir(os.getcwd())

['.ipynb_checkpoints', 'python_access_other_folders.ipynb']

## 1. Use a relative path:

In [30]:
# What is the path of the parent directory?
# Relative to current working directory
os.path.pardir

'..'

In [32]:
# Relative to the absolute path of the current working directory
os.path.join(os.getcwd(),os.path.pardir)

'C:\\Users\\pfsch\\Box Sync\\tutorial\\python\\2017_01_28_python_access_other_folders\\..'

In [34]:
# Print the contents of the parent directory
os.listdir(os.path.pardir)

['2017_01_28_jupyter_table_of_contents',
 '2017_01_28_python_access_other_folders',
 'python-tutorial-plotting']

In [46]:
# Alternately, use relative path `..`
os.listdir('..')

['2017_01_28_jupyter_table_of_contents',
 '2017_01_28_python_access_other_folders',
 'python-tutorial-plotting']

In [40]:
# What is in the first folder of the parent directory?
# Store the folder name in a variable to keep the nested calls readable
first_folder = os.listdir(os.path.pardir)[0]
print('Folder to investigate: ', first_folder)
print('Folder contents:', os.listdir(os.path.join(os.path.pardir, first_folder)))

Folder to investigate:  2017_01_28_jupyter_table_of_contents
Folder contents: ['ipython_notebook_toc.js']


In [42]:
# What is the absolute path of a folder, given its relative path
os.path.abspath(os.path.join(os.path.pardir,os.listdir(os.path.pardir)[0]))

'C:\\Users\\pfsch\\Box Sync\\tutorial\\python\\2017_01_28_jupyter_table_of_contents'

In [47]:
os.path.join(os.path.pardir,'.')

'..\\.'
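
The trailing `.` in `'..\\.'` is redundant. As a small sketch, `os.path.normpath` collapses `.` and `..` components. It works purely on the string and does not touch the file system, so it can change the meaning of a path that passes through a symbolic link, but for ordinary folders it makes such paths easier to read:

```python
import os

# normpath removes the redundant "." component
messy = os.path.join(os.path.pardir, ".")
print(os.path.normpath(messy))  # -> ..

# Combined with the working directory, normpath gives the parent folder's absolute path
parent = os.path.normpath(os.path.join(os.getcwd(), os.path.pardir))
print(parent == os.path.dirname(os.getcwd()))  # True
```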

## 2. Change the current working directory
In some cases it may be easier simply to switch into another folder. This can be done using either absolute or relative paths.

Start with **absolute paths**.

In [61]:
os.getcwd()

'C:\\Users\\pfsch\\Box Sync\\tutorial\\python\\2017_01_28_python_access_other_folders'

In [62]:
os.chdir(r'C:\Users\pfsch\Box Sync')
os.getcwd()

'C:\\Users\\pfsch\\Box Sync'

In [64]:
# Go back to the folder we started in
os.chdir(starting_path)
os.getcwd()

'C:\\Users\\pfsch\\Box Sync\\tutorial\\python\\2017_01_28_python_access_other_folders'

Now try with **relative paths**

In [65]:
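# Go up two levels. os.path.join avoids writing '..\..', where '\.' is an invalid escape sequence
os.chdir(os.path.join('..', '..'))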
os.getcwd()

'C:\\Users\\pfsch\\Box Sync\\tutorial'

In [66]:
os.chdir('python')
os.getcwd()

'C:\\Users\\pfsch\\Box Sync\\tutorial\\python'

In [68]:
# Go back to the starting folder
os.chdir(starting_path)
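
Each `os.chdir` in this section had to be undone by hand with `os.chdir(starting_path)`. One way to keep track of the directory automatically is a small context manager that switches into a folder and always switches back, even if an error occurs. This is a sketch with a hypothetical helper name, `working_directory`; Python 3.11 and later ship an equivalent as `contextlib.chdir`.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def working_directory(path):
    """Temporarily switch into `path`, then always switch back (hypothetical helper)."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield path
    finally:
        os.chdir(previous)


# Usage: a temporary folder stands in for a real data folder
start = os.getcwd()
with tempfile.TemporaryDirectory() as data_folder:
    with working_directory(data_folder):
        inside = os.getcwd()  # code here runs inside data_folder
    after = os.getcwd()

print(after == start)  # True: the original directory is restored
```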

## 3. Use absolute paths
This one is simpler, but more susceptible to breaking if you move your folder around on your computer or between computers. Regardless, here you go:

In [70]:
os.getcwd()

'C:\\Users\\pfsch\\Box Sync\\tutorial\\python\\2017_01_28_python_access_other_folders'

Try opening a text file I have elsewhere on my machine. I'll look at the readme markdown file for my resources folder.

In [76]:
# A with-block closes the file automatically; end='' avoids doubling the newline each line already ends with
with open(r'C:\Users\pfsch\Box Sync\resources\readme.md', 'r') as f:
    for line in f:
        print(line, end='')

# Resources folder

My plan is to keep the resources folder full of content that I did not produce.
Articles, books, theses by other authors. Not my own.
Journal articles organized in Mendeley

Installation information for my computer

That looks successful, and I did not have to change directory. 

What about using a path that is half absolute and half relative? This could be accomplished by using a path that is relative to my home directory. This way, if I install my Box Sync folder in my home directory on multiple computers, the paths will not break.

In [78]:
os.path.expanduser('~')

'C:\\Users\\pfsch'

In [86]:
home_dir = os.path.expanduser('~')
# Join each folder separately so the path works with both \ and / separators
os.chdir(os.path.join(home_dir, 'Box Sync', 'resources', 'templates', 'python'))
os.getcwd()

'C:\\Users\\pfsch\\Box Sync\\resources\\templates\\python'
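
Building the path from separate components with `os.path.join`, rather than writing backslashes into a single string, keeps the home-relative approach working on both Windows and Linux. A minimal sketch with a hypothetical helper, `home_relative`; the folder names are just the ones used above and may not exist on every machine:

```python
import os


def home_relative(*parts):
    """Return an absolute path built from the user's home directory (hypothetical helper)."""
    return os.path.join(os.path.expanduser("~"), *parts)


templates = home_relative("Box Sync", "resources", "templates", "python")
print(templates)
# Check before changing into it, since the folder may not exist on this machine
print(os.path.isdir(templates))
```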

# Managing modules and scripts

## Import a .py module from another folder: sys.path
`sys.path` is the list of directories that Python searches for modules when you write `import module_name`.

In [2]:
sys.path

['',
 'C:\\Users\\pfsch\\Anaconda3\\python35.zip',
 'C:\\Users\\pfsch\\Anaconda3\\DLLs',
 'C:\\Users\\pfsch\\Anaconda3\\lib',
 'C:\\Users\\pfsch\\Anaconda3',
 'C:\\Users\\pfsch\\Anaconda3\\lib\\site-packages',
 'C:\\Users\\pfsch\\Anaconda3\\lib\\site-packages\\Sphinx-1.4.6-py3.5.egg',
 'C:\\Users\\pfsch\\Anaconda3\\lib\\site-packages\\win32',
 'C:\\Users\\pfsch\\Anaconda3\\lib\\site-packages\\win32\\lib',
 'C:\\Users\\pfsch\\Anaconda3\\lib\\site-packages\\Pythonwin',
 'C:\\Users\\pfsch\\Anaconda3\\lib\\site-packages\\setuptools-27.2.0-py3.5.egg',
 'C:\\Users\\pfsch\\Anaconda3\\lib\\site-packages\\IPython\\extensions',
 'C:\\Users\\pfsch\\.ipython']

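This is a list of paths. Add a new directory to it using `sys.path.append(r'path')`. The `r` prefix makes this a raw string, in which a backslash is kept as a literal character instead of starting an escape sequence. In a normal string, `\` escapes special characters (for example, `\n` is a newline), so a Windows path must be written either with doubled backslashes `\\` or as a raw string. Without the `r`, a path such as `'C:\Users'` is even a syntax error, because `\U` starts a Unicode escape. Python displays paths with doubled backslashes `\\` because the repr of a string escapes each literal backslash; the raw string itself does not add any. Documentation on raw strings [here](https://docs.python.org/3.3/library/re.html#raw-string-notation).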

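Note: Windows uses backslashes `\` to separate directories, while Linux uses forward slashes `/`. On Linux, you don't have to worry about escaping backslashes, because directory paths use a different character: the forward slash. (Python on Windows also accepts forward slashes in most paths, and `os.path.join` picks the right separator automatically.)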
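
A quick sketch to check these claims in any Python 3 session; the path is just the example path from this notebook:

```python
# A raw string and a string with doubled backslashes hold exactly the same characters
raw = r"C:\Users\pfsch\Box Sync"
escaped = "C:\\Users\\pfsch\\Box Sync"
print(raw == escaped)  # True

# repr escapes each backslash, which is why paths are displayed with \\
print(repr(raw))  # 'C:\\Users\\pfsch\\Box Sync'

# A raw string keeps the backslash; a normal string turns \n into a single newline character
print(len(r"\n"), len("\n"))  # 2 1

# Pitfall: without the r prefix, "\t" silently becomes a tab character
broken = "C:\temp"
print("\t" in broken)  # True
```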

For instance, add the directory where I keep my UROP_DNNG scripts: `C:\Users\pfsch\Box Sync\Projects\urop\GitHub_repo\UROP_DNNG`

In [6]:
new_path = r'C:\Users\pfsch\Box Sync\Projects\urop\GitHub_repo\UROP_DNNG'
# Add if logic so that I don't add it numerous times
# Even if I run this cell more than once, it won't add the path more than once
if new_path not in sys.path:
    sys.path.append(new_path)

In [7]:
sys.path

['',
 'C:\\Users\\pfsch\\Anaconda3\\python35.zip',
 'C:\\Users\\pfsch\\Anaconda3\\DLLs',
 'C:\\Users\\pfsch\\Anaconda3\\lib',
 'C:\\Users\\pfsch\\Anaconda3',
 'C:\\Users\\pfsch\\Anaconda3\\lib\\site-packages',
 'C:\\Users\\pfsch\\Anaconda3\\lib\\site-packages\\Sphinx-1.4.6-py3.5.egg',
 'C:\\Users\\pfsch\\Anaconda3\\lib\\site-packages\\win32',
 'C:\\Users\\pfsch\\Anaconda3\\lib\\site-packages\\win32\\lib',
 'C:\\Users\\pfsch\\Anaconda3\\lib\\site-packages\\Pythonwin',
 'C:\\Users\\pfsch\\Anaconda3\\lib\\site-packages\\setuptools-27.2.0-py3.5.egg',
 'C:\\Users\\pfsch\\Anaconda3\\lib\\site-packages\\IPython\\extensions',
 'C:\\Users\\pfsch\\.ipython',
 'C:\\Users\\pfsch\\Box Sync\\Projects\\urop\\GitHub_repo\\UROP_DNNG']
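
Once the folder is on `sys.path`, any `.py` file in it can be imported by name. Here is a self-contained sketch of the whole round trip, assuming a throwaway temporary folder and a hypothetical module `demo_helpers` in place of the real scripts repository. Because `append` puts the folder at the end of `sys.path`, a module with the same name earlier in the list would win; `sys.path.insert(0, path)` gives the folder priority instead.

```python
import importlib
import os
import sys
import tempfile

# Create a throwaway folder with a tiny module (left on disk for simplicity)
scripts_dir = tempfile.mkdtemp()
with open(os.path.join(scripts_dir, "demo_helpers.py"), "w") as f:
    f.write("def greet(name):\n    return 'Hello, ' + name\n")

# Same guard as above: only append the folder once
if scripts_dir not in sys.path:
    sys.path.append(scripts_dir)

importlib.invalidate_caches()  # make sure the import system sees the newly written file
import demo_helpers

print(demo_helpers.greet("UROP"))  # Hello, UROP
print(demo_helpers.__file__)  # a path inside scripts_dir
```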

## Find the path of a module
Python has a handy built-in way to find where a module's source code is located: if I import a module called `new_module`, then `new_module.__file__` gives the path of its source file on my machine. Try this out with the `os` module:

In [87]:
os.__file__

'C:\\Users\\pfsch\\Anaconda3\\lib\\os.py'

In [90]:
import numpy as np
np.__file__

'C:\\Users\\pfsch\\Anaconda3\\lib\\site-packages\\numpy\\__init__.py'

Imagine I write a module that depends on static text files. I want the module to find those files regardless of where it is deployed (assuming the same directory structure). The solution is to use the module's `__file__` attribute as the base path for all relative paths: inside the module, `os.path.dirname(os.path.abspath(__file__))` is the folder that contains the module's source.
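
A sketch of this pattern, assuming a hypothetical module `data_module` with a `static/labels.txt` file next to it; both are written into a temporary folder so the example is self-contained. The module builds its data path from `__file__`, so the call still works after changing into an unrelated directory.

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway module folder with a data file next to the module (left on disk for simplicity)
module_dir = tempfile.mkdtemp()
os.mkdir(os.path.join(module_dir, "static"))
with open(os.path.join(module_dir, "static", "labels.txt"), "w") as f:
    f.write("detector_1\ndetector_2\n")

# The module's source: paths are built from the module's own location, not from os.getcwd()
module_source = '''
import os

_HERE = os.path.dirname(os.path.abspath(__file__))

def read_labels():
    path = os.path.join(_HERE, "static", "labels.txt")
    with open(path) as f:
        return f.read().split()
'''
with open(os.path.join(module_dir, "data_module.py"), "w") as f:
    f.write(module_source)

sys.path.append(module_dir)
importlib.invalidate_caches()
import data_module

# Call the function from an unrelated working directory
original_cwd = os.getcwd()
with tempfile.TemporaryDirectory() as elsewhere:
    os.chdir(elsewhere)
    try:
        labels = data_module.read_labels()
    finally:
        os.chdir(original_cwd)

print(labels)  # ['detector_1', 'detector_2']
```

For data files shipped inside an installed package, the standard library's `importlib.resources` module is another option.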

## Future option: Pip packages
In the future, when my packages are stable, I can install them into Python using pip. Then I would not have to add the folder that holds my scripts to `sys.path` in each Python session, because pip installs packages into the `site-packages` directory, which is already part of `sys.path`.
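
To check where that `site-packages` folder is, here is a small sketch using the standard library's `sysconfig` module, where `"purelib"` is the install location for pure-Python packages. The exact folder varies between Anaconda, system Pythons, and virtual environments, so the printed values are machine-specific.

```python
import os
import sys
import sysconfig

# Folder where pip installs pure-Python packages for this interpreter
site_packages = sysconfig.get_paths()["purelib"]
print(site_packages)

# Compare normalized paths, since case and separators can differ on Windows
normalized = {os.path.normcase(os.path.abspath(p)) for p in sys.path}
# Usually True, which is why pip-installed packages import without editing sys.path
print(os.path.normcase(os.path.abspath(site_packages)) in normalized)
```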

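If I later needed to modify a package that I had already installed with pip, I would have to reinstall it, and pip would overwrite the previous version. (Installing in editable mode with `pip install -e .` avoids this: Python imports the package straight from the source folder, so edits take effect without reinstalling.)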

This is the same logic as using `pip install numpy` or any other similar package.

To do this, I would need to look up how to write a `setup.py` file, which describes how to package my code so that pip can install it.