<p style="text-align:center">
PSY 341K <b>Python Coding for Psychological Sciences</b>, Fall 2018

<img src="https://www.python.org/static/community_logos/python-logo-master-v3-TM.png" alt="Python logo" width="200">
</p>

<h1 style="text-align:center"> Files </h1>

<h4 style="text-align:center"> October 11 - 18, 2018 </h4>
<hr style="height:5px;border:none" />
<p>

# 1. Directories, files, and paths
<hr style="height:1px;border:none" />

### Path

On a computer, files are organized under different directories. For example,

<img src="https://github.com/sathayas/JupyterPythonFall2018/blob/master/images/File_Folders.png?raw=true" alt="Python shell" style="width: 500px; float: center;"/>

The document **`project.docx`** is under a directory (or folder) **`Documents`**, under a directory **`asweigart`**, under a folder **`Users`**, under the drive **`C:\`**. The file and all the directories leading up to the file can be addressed by
```
C:\Users\asweigart\Documents\project.docx
```
This notation is known as a **file path**. This type of path is used by Windows computers. For Mac OS X and Linux, let’s say the file `project.docx` is located under a directory **`Documents`**, under a directory **`hayasaka`**, under a directory **`Users`**, under the root directory **`/`**. The path to this file is denoted as
```
/Users/hayasaka/Documents/project.docx
```

### Creating a path with `os.path.join()`

On Windows, a backslash (`\`) is used between directories on a path, whereas a slash (`/`) is used on Mac OS X and Linux computers. If you are writing a Python program that can be run on any platform, you can use the **`os.path.join()`** function to create a path. It inserts the correct separator for the computer the program is running on. The `os.path.join()` function is in the module **`os`** (more precisely, in its submodule `os.path`).

Here is an example to demonstrate the use of `os.path.join()`.

`<OsPathJoin.py>`

In [1]:
# import the os module
import os

# Home directory
homeDir = 'Some_home_directory'
# Directories for subjects 
subjects = ['sub001', 'sub003', 'sub005']
# Experiment outcome data files
expData = ['congruent.txt', 'incongruent.txt', 'mixed.txt']

for iSubj in subjects:
    for iExp in expData:
        dataFullPath = os.path.join(homeDir, iSubj, iExp)
        print(dataFullPath)

Some_home_directory/sub001/congruent.txt
Some_home_directory/sub001/incongruent.txt
Some_home_directory/sub001/mixed.txt
Some_home_directory/sub003/congruent.txt
Some_home_directory/sub003/incongruent.txt
Some_home_directory/sub003/mixed.txt
Some_home_directory/sub005/congruent.txt
Some_home_directory/sub005/incongruent.txt
Some_home_directory/sub005/mixed.txt


Your output may differ, depending on your computer's OS. On Windows, for example, the directories are separated by backslashes (`\`) instead of slashes.
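To see both conventions on any computer, you can call the platform-specific versions of `join()` directly. The sketch below uses the standard-library modules `ntpath` (Windows rules) and `posixpath` (Mac OS X and Linux rules). `os.path` is simply whichever of the two matches the OS you are running on.

```python
import ntpath     # path rules used on Windows
import posixpath  # path rules used on Mac OS X and Linux

winPath = ntpath.join('Some_home_directory', 'sub001', 'congruent.txt')
unixPath = posixpath.join('Some_home_directory', 'sub001', 'congruent.txt')
print(winPath)    # Some_home_directory\sub001\congruent.txt
print(unixPath)   # Some_home_directory/sub001/congruent.txt
```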

### Current working directory
Your IDLE Python shell runs in a current working directory. To see which directory you are working in, you can use **`os.getcwd()`**. 

In [2]:
os.getcwd()

'/Users/sh45474/Documents/Teaching/Python_Fall_2018/Notes'

When you run a program in IDLE, the current working directory is automatically
changed to the directory where your program is saved.

### Exercise
1. Using `os.getcwd()`, find out which directory you are working in right now on the Python shell.

### Absolute and relative paths

<img src="https://github.com/sathayas/JupyterPythonFall2018/blob/master/images/File_Paths.png?raw=true" alt="Python shell" style="width: 500px; float: center;"/>


To change the current working directory, you can use the **`os.chdir()`** function.
Say I want my Python shell to work in a directory named `images`, located under
the current working directory. Then

In [3]:
os.chdir('images')
os.getcwd()

'/Users/sh45474/Documents/Teaching/Python_Fall_2018/Notes/images'

If I want to go up one directory, then I can do

In [4]:
os.chdir('..')
os.getcwd()

'/Users/sh45474/Documents/Teaching/Python_Fall_2018/Notes'

The notation **`'..'`** refers to the parent directory, or the directory above the
current directory. Say, under the directory `Python_Fall_2018`, there is a directory called `Codes`. To change the working directory to `Codes`, you can do one of the following:

In [5]:
os.chdir('../Codes')
os.getcwd()

'/Users/sh45474/Documents/Teaching/Python_Fall_2018/Codes'

Or

In [10]:
os.chdir('/Users/sh45474/Documents/Teaching/Python_Fall_2018/Codes/')
os.getcwd()

'/Users/sh45474/Documents/Teaching/Python_Fall_2018/Codes'

The first one is an example of a ***relative path***, whereas the second is an example of an ***absolute path***. A *relative path* describes the location of a file or a directory relative to the current working directory, whereas an *absolute path* describes the location starting from the root directory (`/`) or a drive (such as `C:\`), so it does not depend on the current working directory. Conceptually, you can think of them as follows:
  * Relative path -- directions to get to a desired location
  * Absolute path -- the physical address of a desired location
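You can check which kind of path you have, and turn a relative path into the equivalent absolute one, with `os.path.isabs()` and `os.path.abspath()`. A minimal sketch; the directory `Codes` is just the example name from above and does not need to exist:

```python
import os

relPath = os.path.join('..', 'Codes')   # a relative path (directions)
absPath = os.path.abspath(relPath)      # the same location as an absolute path (address)

print(os.path.isabs(relPath))   # False
print(os.path.isabs(absPath))   # True
print(absPath)                  # depends on your current working directory
```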
  
### Exercise
1. Change the current working directory to its parent directory, using a
relative path.
2. Change the current working directory again to its parent directory (or the
grandparent directory of where you started), using a relative path.
3. Go back to the directory where you started (recall the earlier exercise), using a
relative path.
4. Change the current working directory to its parent directory, using an
absolute path.
5. Change the current working directory again to its parent directory (or the
grandparent directory of where you started), using an absolute path.
6. Go back to the directory where you started (recall the earlier exercise), using an
absolute path.

# 2. Reading a file
<hr style="height:1px;border:none" />

### Preparation
Before we get into this topic, please download the file [**`TestData.zip`**](https://github.com/sathayas/PythonClassFall2018/blob/master/fileExamples/TestData.zip) under
**`fileExamples`** on the course **GitHub** page and save it in the directory where you save your code. Then unzip the file. That should create a directory called **`TestData`** containing test data sets for this module.

If you have cloned my note repository or my code repository from GitHub, then the `TestData` directory is already available in those repositories. 

### Reading one line of a file
To read a file `SingleLineData.txt`, which contains one line of data, you can do the
following:

`<ReadOneLine.py>`

In [14]:
import os

# first, open the file
f = open(os.path.join('TestData','SingleLineData.txt'), 'r')
# read a line, then put in a variable called line
line = f.readline()
# always close the file you opened
f.close()

To read a file, first you need to open the file by the **`open()`** function. You supply the file name (with a path if not located in the current directory), and the mode **`'r'`** for reading.
```python
f = open(os.path.join('TestData','SingleLineData.txt'), 'r')
```
The `open()` function returns a file object (in this case, called **`f`**). You can give the file object any name you like, just as you name a
variable. Once the file is open, you can read one line of data with the
**`readline()`** method of the file object.
```python
line = f.readline()
```
Here, the variable called **`line`** is storing the content of the read line. At this point, you are done reading the file, so you must close the file with the **`close()`** method.
```python
f.close()
```
Now you can take a look at what was read from the file on the Python shell.

In [15]:
line

'  1  35  98  '

The string `line` has spaces at the beginning and at the end. The
**`strip()`** method for a string removes this leading and trailing whitespace (spaces, tabs, and newline characters).

In [16]:
line.strip()

'1  35  98'

Next, you want to split this single string into individual numbers. The `split()`
method, called without an argument, splits a string wherever there is whitespace.

In [17]:
line.strip().split()

['1', '35', '98']
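Calling `split()` with no argument treats any run of whitespace as a single separator. Passing an explicit `' '` separator behaves differently on this line, because the values are separated by two spaces:

```python
line = '  1  35  98  '   # the line read from SingleLineData.txt

noArg = line.strip().split()        # ['1', '35', '98']
oneSpace = line.strip().split(' ')  # ['1', '', '35', '', '98'] (empty strings between double spaces)
print(noArg)
print(oneSpace)
```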

Notice that these numbers are still strings, so you need to convert them into
integers with the `int()` function. You can write a `for` loop to do that, or you
can do this:

In [18]:
inString = line.strip().split()
inNumber = [int(i) for i in inString]
inNumber

[1, 35, 98]

So the expression 
```python
inNumber = [int(i) for i in inString] 
```
is equivalent to
```python
inNumber = []
for i in inString:
    inNumber.append(int(i))
```
This short-hand notation is called a **list comprehension**. At this point, you can assign these numbers to
separate variables. For example,

In [19]:
trial = inNumber[0]
onsetTime = inNumber[1]
respTime = inNumber[2]
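Since `inNumber` has exactly three items, the three assignments above can also be written as a single *unpacking* assignment. This is the same idiom used later in the `with` statement example (`state, capital = ...`). It raises a `ValueError` if the list does not have exactly three items.

```python
inNumber = [1, 35, 98]   # the list built above

# unpack the three items into three variables at once
trial, onsetTime, respTime = inNumber
print(trial, onsetTime, respTime)   # 1 35 98
```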

### Exercise
1. **Read one-line data**. Write a program to read the data file
`ExerciseData_OneLine.txt` under the `TestData` directory. Upon reading the file, store the data into the following lists:
  * First set of 3 numbers to a list `trial`
  * Second set of 3 numbers to a list `onsetTime`
  * Third set of 3 numbers to a list `respTime`
  * Last set of 3 numbers to a list `score`
2. **State list**. Write a program to read the data file `StateList.txt` under the `TestData` directory, which includes the names of 50 states separated by commas. Then print out a list of the states.

### Reading a file with multiple lines
There are at least three different approaches to read a file with multiple lines.
You can read everything in the file at once by the **`read()`** method for a file
object.

`<ReadWithRead.py>`

In [20]:
import os

# first, open the file
f = open(os.path.join('TestData','MultiLineData.txt'), 'r')
# read the entire file content into a string variable fileData
fileData = f.read()
# always close the file you opened
f.close()

This will result in a string variable containing the entire content of a file.

In [21]:
fileData

'  1  35  98  \n  2  45  102  \n  3  55  101  \n  4  65  89  \n  5  75  93  '

You can split the string into lines with the `split()` method, using the newline character `'\n'` as the separator.

In [22]:
fileData.split('\n')

['  1  35  98  ',
 '  2  45  102  ',
 '  3  55  101  ',
 '  4  65  89  ',
 '  5  75  93  ']
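Strings also have a `splitlines()` method that splits at line breaks. The two approaches differ when the file ends with a newline: `split('\n')` then leaves an empty string at the end of the list, whereas `splitlines()` does not. The string below is the `fileData` shown above:

```python
fileData = '  1  35  98  \n  2  45  102  \n  3  55  101  \n  4  65  89  \n  5  75  93  '

same = fileData.split('\n') == fileData.splitlines()
print(same)                           # True: no trailing newline here

withNewline = fileData + '\n'         # same data, but ending with a newline
print(len(withNewline.split('\n')))   # 6: the last item is an empty string ''
print(len(withNewline.splitlines()))  # 5
```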

Alternatively, you can read the content of a file into a list in which each item is one line.
To do this, use the **`readlines()`** method.

`<ReadWithReadlines.py>`

In [23]:
import os

# first, open the file
f = open(os.path.join('TestData','MultiLineData.txt'), 'r')
# read the file into a list called fileData.
fileData = f.readlines()
# always close the file you opened
f.close()

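You can see that the lines are broken up into a list, one item per line. Each item still ends with the newline character `'\n'` (except the last one here, because the file does not end with a newline), so you will usually need `strip()` when processing each line.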

In [24]:
fileData

['  1  35  98  \n',
 '  2  45  102  \n',
 '  3  55  101  \n',
 '  4  65  89  \n',
 '  5  75  93  ']

Finally, you can read one line at a time using the **`readline()`** method.

`<ReadLineByLine.py>`

In [25]:
import os

# first, open the file
f = open(os.path.join('TestData','MultiLineData.txt'), 'r')
# initializing lists
trial = []
onsetTime = []
respTime = []
# reading one line at a time, then processing
line = f.readline()
while line:
    inString = line.strip().split()
    inNumber = [int(i) for i in inString]
    trial.append(inNumber[0])
    onsetTime.append(inNumber[1])
    respTime.append(inNumber[2])
    # read the next line
    line = f.readline()
    
# always close the file you opened
f.close()

Notice that in the `while` loop, **`while line:`** is **True** as long as the variable `line` is a
non-empty string. Each line is processed right after it is read, and its values are
placed into the appropriate lists. At the end of the file, `readline()` returns an empty
string `''`, which makes the condition **False** and ends the loop. (A blank line in the middle of a file is read as `'\n'`, which is not empty.)

In [26]:
trial

[1, 2, 3, 4, 5]

In [27]:
onsetTime

[35, 45, 55, 65, 75]

In [28]:
respTime

[98, 102, 101, 89, 93]
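To see why the `while` loop above stops at the end of the file and not at a blank line, here is a small check. It uses `io.StringIO`, which behaves like a file opened for reading, so no data file is needed. The text in it is made up for illustration.

```python
import io

f = io.StringIO('1 35 98\n\n2 45 102\n')   # three lines; the middle one is blank
first = f.readline()    # '1 35 98\n'
blank = f.readline()    # '\n' (a blank line still contains its newline)
second = f.readline()   # '2 45 102\n'
end = f.readline()      # '' (only the end of the file gives an empty string)
f.close()

print(repr(blank), bool(blank))   # '\n' True
print(repr(end), bool(end))       # '' False
```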

### Exercise
1. **State capitals**. Write a program to read the data file
`StateCapitalList.txt` under the `TestData` directory. In this file, each line lists a state and its capital, separated by a comma. Then print out the state capitals and states in the following format:
```
Montgomery (Alabama)
Juneau (Alaska)
...
Cheyenne (Wyoming)
```

# 3. Content of a directory
<hr style="height:1px;border:none" />

You can get file names in a directory directly using the **`os.listdir()`** function, which returns the content of the directory as a list.

In [29]:
import os

os.listdir()

['.git',
 '.gitignore',
 '.ipynb_checkpoints',
 'Dictionary.ipynb',
 'File.ipynb',
 'For.ipynb',
 'Function.ipynb',
 'HelloWorld.ipynb',
 'If.ipynb',
 'images',
 'Installation.ipynb',
 'List.ipynb',
 'Operators.ipynb',
 'README.md',
 'String.ipynb',
 'TestData',
 'While.ipynb']

Without an argument, `os.listdir()` returns the content of the current working directory.
You can also specify another directory, using either a relative or an absolute path.

In [30]:
os.listdir('TestData')

['ExerciseData_OneLine.txt',
 'MultiLineData.txt',
 'SingleLineData.txt',
 'StateCapitalList.csv',
 'StateCapitalList.txt',
 'StateList.txt',
 'StudySubjects.csv']
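`os.listdir()` makes no promise about the order of the names, and it mixes files with directories. The sketch below keeps only certain entries. It assumes a throwaway directory created with `tempfile.mkdtemp()`, so it does not depend on `TestData`, and the file names in it are hypothetical.

```python
import os
import tempfile

# create a temporary directory holding hypothetical files and one subdirectory
tmpDir = tempfile.mkdtemp()
for name in ['b.txt', 'a.txt', 'notes.csv']:
    f = open(os.path.join(tmpDir, name), 'w')   # create an empty file
    f.close()
os.mkdir(os.path.join(tmpDir, 'subdir'))

allNames = sorted(os.listdir(tmpDir))                    # sort for a predictable order
txtFiles = [n for n in allNames if n.endswith('.txt')]   # only the .txt files
dirs = [n for n in allNames if os.path.isdir(os.path.join(tmpDir, n))]  # only directories

print(allNames)   # ['a.txt', 'b.txt', 'notes.csv', 'subdir']
print(txtFiles)   # ['a.txt', 'b.txt']
print(dirs)       # ['subdir']
```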

# 4. Formatting numbers
<hr style="height:1px;border:none" />

As you may remember from earlier exercises in the semester, a simple calculation in
Python may produce a number with many decimal places. For example,

In [31]:
tempF=70
tempC=(tempF-32)*5/9
print(tempC)

21.11111111111111


To limit decimal places, you can do this:

In [32]:
print('%.1f' % tempC)

21.1


In this case, the number between **`.`** and **`f`** inside the quotation marks sets how many
decimal places you want in your output, and the value is rounded to that many places. Then the variable `tempC` is printed with
this specification.

In [33]:
print('%.3f' % tempC)

21.111


In [34]:
print('%.5f' % tempC)

21.11111


In [35]:
print('%.10f' % tempC)

21.1111111111
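The `%` operator is not the only way to do this. For comparison, the same `.1f` specification also works with the string `format()` method and with f-strings (available in Python 3.6 and later):

```python
tempF = 70
tempC = (tempF - 32) * 5 / 9

a = '%.1f' % tempC           # %-operator, as above
b = '{:.1f}'.format(tempC)   # str.format() method
c = f'{tempC:.1f}'           # f-string
print(a, b, c)               # 21.1 21.1 21.1
```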


You can also specify the minimum total width of the output (counting the digits, the decimal point, and any sign) with a number before
the **`'.'`** inside the quotation marks.

`<PrintRandNum.py>`

In [36]:
import random

for i in range(3):
    a = random.random() * 300
    print('%8.3f' % a)

for i in range(3):
    a = random.random() * 300
    print('%10.3f' % a)

for i in range(3):
    a = random.random() * 300
    print('%12.3f' % a)

 101.599
 172.275
 121.342
    78.164
   169.240
   179.798
     194.987
     241.202
      25.231


If the total width is 12 but the number takes up only 7 characters (such as `194.987`), then spaces are added at the
beginning. Alternatively, you can pad a number with leading zeros by adding a
zero right after **`%`**.

In [37]:
'%04.0f' % 12

'0012'

In [38]:
'%08.0f' % 12

'00000012'

If you do not need any decimal places, you can also use **`d`** (for integers):

In [39]:
'%08d' % 12

'00000012'

In [40]:
'%08d' % 12.5

'00000012'
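The last example shows that `%d` does not round a float. It drops the fractional part. If you want rounding, use `%.0f` instead. Here is a quick comparison with a made-up value:

```python
x = 12.9
truncated = '%08d' % x     # %d drops the fractional part
rounded = '%08.0f' % x     # %.0f rounds to the nearest integer
print(truncated)           # 00000012
print(rounded)             # 00000013
```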

### Exercise
1. In the module **`math`**, there is a constant **`math.pi = 3.141592653...`**. Write expressions to print out $\pi$ with the following precision
  1. '3.14'
  2. '3.1416'
  3. '3.14159265'
2. **ID numbers & runs**. Say, in your experiment, there are 10 subjects and each subject undergoes 3 runs of a certain experimental paradigm. Each subject is assigned an ID with the format `001`, `002`, ... `010`. And runs are identified by `01`, `02`, `03`. Write a program to print out all possible combinations of subject IDs and runs in the following format.
```
001-01
001-02
001-03
002-01
...
010-02
010-03
```

# 5. Writing a file
<hr style="height:1px;border:none" />

Say you want to write 5-digit ID numbers, ranging from `00010` to `00025`, to a
file. The process is very similar to reading a file.

`<WriteIDs.py>`

In [41]:
f = open('IDs.txt','w')
for i in range(10,26):
    f.write('%05d' % i + '\n')
    
f.close()

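The file is opened with the **`'w'`** option, which means that the file is opened for writing. If the file already exists, its previous content is erased; otherwise, a new file is created. Each line of the file is written by the **`write()`** method.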
```python
f.write('%05d' % i + '\n')
```
The `write()` method writes the string passed inside the parentheses. It is similar to the
`print()` function, but you have to output the newline character **`\n`** *manually*. You can
check the content of the `IDs.txt` file.
```
00010
00011
00012
. . .
00024
00025
```
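Alternatively, `print()` can write to a file through its `file` parameter, and then it adds the newline for you. The sketch below uses `io.StringIO` to stand in for the opened file, so you can inspect what would be written. It covers only the IDs `00010` to `00012` to keep the output short.

```python
import io

f = io.StringIO()               # stands in for open('IDs.txt', 'w')
for i in range(10, 13):
    print('%05d' % i, file=f)   # print() adds '\n' after each ID
content = f.getvalue()          # read what was written (before closing)
f.close()
print(repr(content))            # '00010\n00011\n00012\n'
```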
To add data to the end of an existing file without erasing what is already there, open it with the **`'a'`** (append) option and
write with the `write()` method. For example,

`<AppendIDs.py>`

In [42]:
f = open('IDs.txt','a')
for i in range(110,126):
    f.write('%05d' % i + '\n')
    
f.close()

This program adds ID numbers ranging from `00110` to `00125` to the existing file
`IDs.txt`.
```
00010
00011
. . .
00024
00025
00110
00111
00112
. . .
00124
00125
```

### Exercise
1. **State capitals, revisited**. Earlier you wrote a program to read the data file `StateCapitalList.txt` under the `TestData` directory, which lists states and their capitals. That program printed out the state capitals in the format:
```
Montgomery (Alabama)
Juneau (Alaska)
...
Cheyenne (Wyoming)
```
Modify the program so that it writes to a file in the same format.

# 6. With statement
<hr style="height:1px;border:none" />

Instead of opening a file with the `open()` function and closing it with the `close()` method,
you can use a **`with`** statement to open a file, perform the necessary operations, and
close the file. A `with` statement includes a block of code to be executed while
the file is open. When the block is done, the file is closed automatically,
even if an error occurs inside the block. Here is an example.

`<WithExample.py>`

In [43]:
import os

stateList = []
capitalList = []
with open(os.path.join('TestData','StateCapitalList.txt'),'r') as infile:
    for line in infile:
        state, capital = line.strip().split(',')
        stateList.append(state)
        capitalList.append(capital)
        print('State: ' + '%-15s' % state + '\tCapital: ' + capital)

with open(os.path.join('TestData','FormatStateList.txt'),'w') as outfile:
    for i, state in enumerate(stateList):
        outfile.write('State: ' + '%-15s' % state + '\tCapital: ' + capitalList[i])
        outfile.write('\n')

State: Alabama        	Capital: Montgomery
State: Alaska         	Capital: Juneau
State: Arizona        	Capital: Phoenix
State: Arkansas       	Capital: Little Rock
State: California     	Capital: Sacramento
State: Colorado       	Capital: Denver
State: Connecticut    	Capital: Hartford
State: Delaware       	Capital: Dover
State: Florida        	Capital: Tallahassee
State: Georgia        	Capital: Atlanta
State: Hawaii         	Capital: Honolulu
State: Idaho          	Capital: Boise
State: Illinois       	Capital: Springfield
State: Indiana        	Capital: Indianapolis
State: Iowa           	Capital: Des Moines
State: Kansas         	Capital: Topeka
State: Kentucky       	Capital: Frankfort
State: Louisiana      	Capital: Baton Rouge
State: Maine          	Capital: Augusta
State: Maryland       	Capital: Annapolis
State: Massachusetts  	Capital: Boston
State: Michigan       	Capital: Lansing
State: Minnesota      	Capital: Saint Paul
State: Mississippi    	Capital: Jackson
...

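The first `with` statement opens the file `StateCapitalList.txt` under the `TestData`
directory and names the resulting file object `infile`. The file is read line by
line using a `for` loop. Iterating over a file object gives one line at a time, and each line is split at the comma into a state and its capital.

Then the file `FormatStateList.txt` under the `TestData` directory is opened for writing as the file object `outfile`. In a `for` loop, each state and its capital are written out. `enumerate()` supplies the index `i`, so that `capitalList[i]` is the capital that matches `state`.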
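You can confirm that the `with` statement really closes the file by checking the file object's `closed` attribute. The sketch below writes and then reads back a hypothetical file `with_demo.txt` in your system's temporary directory (found with `tempfile.gettempdir()`), so it does not touch `TestData`:

```python
import os
import tempfile

path = os.path.join(tempfile.gettempdir(), 'with_demo.txt')   # hypothetical file name

with open(path, 'w') as outfile:
    outfile.write('Alabama,Montgomery\n')
    outfile.write('Alaska,Juneau\n')
    print(outfile.closed)   # False: still inside the block
print(outfile.closed)       # True: closed automatically at the end of the block

stateList = []
capitalList = []
with open(path, 'r') as infile:
    for line in infile:     # iterating over a file object gives one line at a time
        state, capital = line.strip().split(',')
        stateList.append(state)
        capitalList.append(capital)
print(stateList, capitalList)   # ['Alabama', 'Alaska'] ['Montgomery', 'Juneau']
```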

By the way, you may have noticed the format **`'%-15s' % state`**. Here `s` stands for string and `15` is the field width: the string variable `state` is printed in a space at least 15 characters wide. If the string is shorter than 15 characters, the remainder is padded with spaces. In the examples below, `state` and `capital` still hold the values from the last line of the file (`Wyoming` and `Cheyenne`).

In [44]:
print('123456789012345678901234567890')
print('%15s' % state + '%15s' % capital)

123456789012345678901234567890
        Wyoming       Cheyenne


By default, the string is aligned to the right within its field. If you want it to be aligned to the left, add `-` (a minus sign) in front of the number.

In [45]:
print('123456789012345678901234567890')
print('%-15s' % state + '%-15s' % capital)

123456789012345678901234567890
Wyoming        Cheyenne       
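The number in `'%-15s'` is a minimum width, not a maximum. A string longer than the field is printed in full, which pushes the following columns out of line. Here is a quick check with two state names and a width of 8:

```python
names = ['Ohio', 'Massachusetts']
cells = ['[' + '%-8s' % name + ']' for name in names]   # left-aligned, width 8
for cell in cells:
    print(cell)
# [Ohio    ]
# [Massachusetts]
```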


# 7. Reading and writing CSV files 
<hr style="height:1px;border:none" />

A popular file format for data is the **CSV (comma-separated values)** format. CSV
files can be read and written by Microsoft Excel, as well as by many programming
languages. Python has a module called **`csv`** to facilitate reading and writing
CSV files. Here is an example of how to read a CSV file. There is a CSV file called
`StudySubjects.csv` under the `TestData` directory under `fileExamples` on
**GitHub**.

`<ReadCSV.py>`

In [46]:
import csv
import os

# the csv module documentation recommends opening CSV files with newline=''
f = open(os.path.join('TestData','StudySubjects.csv'), 'r', newline='')
reader = csv.reader(f)
for row in reader:
    print(row)

f.close()

['ID', 'Age', 'Handedness', 'AvgRT', 'AvgScore']
['003', '28', 'right', '148.41', '8.4']
['005', '24', 'right', '248.98', '12.32']
['008', '23', 'right', '185.64', '11.8']
['010', '24', 'right', '195.63', '18']
['013', '24', 'left', '192.83', '19.44']
['018', '26', 'right', '178.12', '6.52']
['020', '31', 'right', '239.09', '8.04']
['021', '32', 'left', '219.19', '13']


This program uses the **`reader()`** function in the module **`csv`**. The resulting **`reader`** object can be used in a `for` loop to read the content of the file, one line at a time. As you can see, each row is read as a list. Notice that every value in each list is a string, so you still need to convert the data into appropriate types.
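A minimal sketch of that conversion step, assuming a few rows copied from the output above and using `io.StringIO` in place of the opened file. The heading row is skipped with `next()`. `ID` is left as a string so that its leading zeros are kept.

```python
import csv
import io

# io.StringIO stands in for the opened file; the rows are copied from the output above
f = io.StringIO('ID,Age,Handedness,AvgRT,AvgScore\n'
                '003,28,right,148.41,8.4\n'
                '005,24,right,248.98,12.32\n')
reader = csv.reader(f)
header = next(reader)   # the first row holds the variable names, not data
ages = []
avgRTs = []
for row in reader:
    ages.append(int(row[1]))       # Age is a whole number
    avgRTs.append(float(row[3]))   # AvgRT has decimal places
f.close()
print(header)   # ['ID', 'Age', 'Handedness', 'AvgRT', 'AvgScore']
print(ages)     # [28, 24]
print(avgRTs)   # [148.41, 248.98]
```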

If the first line of a CSV file is a *heading* with variable names, then you
can read each line in the file as a dictionary using the **`DictReader()`** function,
instead of the **`reader()`** function.

`<ReadCSVwithDict.py>`

In [47]:
import csv
import os

f = open(os.path.join('TestData','StudySubjects.csv'), 'r', newline='')
reader = csv.DictReader(f)
for row in reader:
    print(row)

f.close()

OrderedDict([('ID', '003'), ('Age', '28'), ('Handedness', 'right'), ('AvgRT', '148.41'), ('AvgScore', '8.4')])
OrderedDict([('ID', '005'), ('Age', '24'), ('Handedness', 'right'), ('AvgRT', '248.98'), ('AvgScore', '12.32')])
OrderedDict([('ID', '008'), ('Age', '23'), ('Handedness', 'right'), ('AvgRT', '185.64'), ('AvgScore', '11.8')])
OrderedDict([('ID', '010'), ('Age', '24'), ('Handedness', 'right'), ('AvgRT', '195.63'), ('AvgScore', '18')])
OrderedDict([('ID', '013'), ('Age', '24'), ('Handedness', 'left'), ('AvgRT', '192.83'), ('AvgScore', '19.44')])
OrderedDict([('ID', '018'), ('Age', '26'), ('Handedness', 'right'), ('AvgRT', '178.12'), ('AvgScore', '6.52')])
OrderedDict([('ID', '020'), ('Age', '31'), ('Handedness', 'right'), ('AvgRT', '239.09'), ('AvgScore', '8.04')])
OrderedDict([('ID', '021'), ('Age', '32'), ('Handedness', 'left'), ('AvgRT', '219.19'), ('AvgScore', '13')])


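In other words, each row is read into a dictionary, with the keys specified by the
first line and the values specified by the subsequent lines. (The output above shows `OrderedDict`, which older versions of Python return; since Python 3.8, each row is a regular `dict`.) After the loop, the variable `row` still holds the last row of the file, so you can look up its values by key: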

In [48]:
row['ID']

'021'

In [49]:
row['AvgScore']

'13'
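Because each row is a dictionary, you can refer to columns by name rather than by position. A minimal sketch that collects the left-handed subjects, assuming three rows copied from the output above and `io.StringIO` in place of the opened file:

```python
import csv
import io

# io.StringIO stands in for StudySubjects.csv; these rows are copied from the output above
f = io.StringIO('ID,Age,Handedness,AvgRT,AvgScore\n'
                '003,28,right,148.41,8.4\n'
                '013,24,left,192.83,19.44\n'
                '021,32,left,219.19,13\n')
leftIDs = []
leftAges = []
for row in csv.DictReader(f):          # the heading row supplies the keys
    if row['Handedness'] == 'left':
        leftIDs.append(row['ID'])      # keep the ID as a string ('013')
        leftAges.append(int(row['Age']))
f.close()
print(leftIDs)    # ['013', '021']
print(leftAges)   # [24, 32]
```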

To write a CSV file, you can use the **`writer()`** function in the `csv` module to create a writer object. Then each line is written by the **`writerow()`** method of the writer object. You pass a list to the **`writerow()`** method to write one line of data. For example,

`<WriteCSV.py>`

In [50]:
import csv
import random

# newline='' prevents blank lines between rows on Windows
f = open('RandomData.csv', 'w', newline='')
writer = csv.writer(f)
writer.writerow(['ID', 'Rand1', 'Rand2'])
for i in range(1,11):
    writer.writerow(['%04d' % i, random.randint(0,10), random.randint(10,20)])

f.close()

This program produces a CSV file called `RandomData.csv`, with content like the following (your random numbers will differ):
```
ID,Rand1,Rand2
0001,10,16
0002,8,12
. . .
0010,4,17
```
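The `csv` module also provides `DictWriter`, the writing counterpart of `DictReader`. It takes each row as a dictionary and writes the heading row from its `fieldnames`. The sketch below uses `io.StringIO` in place of a real file and the first two example rows above as fixed values. Note that the csv writer ends each row with `'\r\n'` by default, which is why files for it should be opened with `newline=''`.

```python
import csv
import io

out = io.StringIO()   # stands in for open('RandomData.csv', 'w', newline='')
writer = csv.DictWriter(out, fieldnames=['ID', 'Rand1', 'Rand2'])
writer.writeheader()                                        # heading row from fieldnames
writer.writerow({'ID': '0001', 'Rand1': 10, 'Rand2': 16})   # example values
writer.writerow({'ID': '0002', 'Rand1': 8, 'Rand2': 12})
text = out.getvalue()
print(repr(text))   # 'ID,Rand1,Rand2\r\n0001,10,16\r\n0002,8,12\r\n'
```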

### Exercise
1. **State capital list, as a dictionary**. Under the `TestData` directory, there
is a CSV file called `StateCapitalList.csv`. Read this file and create a dictionary with items:
```python
StateCapDict = {
        'Alabama':'Montgomery',
        'Alaska':'Juneau',
        ...
        'Wyoming':'Cheyenne'
}
```
In other words, in each item of this dictionary, the key is a state and the value is its capital.