In [1]:
from IPython.display import HTML
HTML('<iframe width="560" height="315" src="https://www.youtube.com/embed/zV949buXdSg?autoplay=1&loop=1" frameborder="0" allowfullscreen></iframe>')

### Structures like these are encoded in "PDB" files

![pdb_header](https://raw.githubusercontent.com/harmsm/pythonic-science/master/chapters/03_dealing-with-files/data/pdb_header.png)

![pdb_atoms](https://raw.githubusercontent.com/harmsm/pythonic-science/master/chapters/03_dealing-with-files/data/pdb_atoms.png)

### How can we parse a complicated file like this one?

In [2]:
import pandas as pd
pd.read_table("data/1stn.pdb")

Unnamed: 0,HEADER HYDROLASE(PHOSPHORIC DIESTER) 17-FEB-93 1STN
0,TITLE THE CRYSTAL STRUCTURE OF STAPHYLOCOC...
1,TITLE 2 1.7 ANGSTROMS RESOLUTION ...
2,COMPND MOL_ID: 1; ...
3,COMPND 2 MOLECULE: STAPHYLOCOCCAL NUCLEASE; ...
4,COMPND 3 CHAIN: A; ...
5,COMPND 4 EC: 3.1.31.1; ...
6,COMPND 5 ENGINEERED: YES ...
7,SOURCE MOL_ID: 1; ...
8,SOURCE 2 ORGANISM_SCIENTIFIC: STAPHYLOCOCCUS...
9,SOURCE 3 ORGANISM_TAXID: 1280 ...


### We can do better by manually *parsing* the file.

### Our test file
![test_file](https://raw.githubusercontent.com/harmsm/pythonic-science/master/chapters/03_dealing-with-files/data/test-file.png)

### Predict what this will print

In [8]:
f = open("test-file.txt")
print(f.readlines())
f.close()

['Line 1 of a file\n', 'It can be a very long line -- up to a basically unlimited number of characters.\n', '\n', 'just passed a blank line!\n', '1.5 2 20,32|5']


### Predict what this will print

In [9]:
f = open("test-file.txt")
for line in f.readlines():
    print(line)
f.close()

Line 1 of a file

It can be a very long line -- up to a basically unlimited number of characters.



just passed a blank line!

1.5 2 20,32|5


### Predict what this will print

In [12]:
f = open("test-file.txt")
for line in f.readlines():
    print(line,end="")
f.close()

Line 1 of a file
It can be a very long line -- up to a basically unlimited number of characters.

just passed a blank line!
1.5 2 20,32|5

### Basic file reading operations: 

+ Open a file for reading: `f = open(SOME_FILE_NAME)` 
+ Read all lines of the file into a list: `f.readlines()`
+ Read one line from the file: `f.readline()`
+ Read the whole file into a string: `f.read()`
+ Close the file: `f.close()`
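
The three read calls above return different things, which is easiest to see side by side. The sketch below is only an illustration: the file name `demo-read.txt` and its contents are made up, and the file is written to a temporary directory first (using the `'w'` mode covered later in this notebook) so the cell runs on its own.

```python
import os
import tempfile

# Create a small throwaway file so the example is self-contained.
# (The name and contents are invented for this demo.)
tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, "demo-read.txt")
f = open(demo_path, "w")
f.write("first line\nsecond line\nthird line\n")
f.close()

# read() returns the whole file as ONE string.
f = open(demo_path)
whole_file = f.read()
f.close()

# readline() returns one line per call, including its trailing "\n".
f = open(demo_path)
first = f.readline()
second = f.readline()
f.close()

# readlines() returns a LIST with one string per line.
f = open(demo_path)
all_lines = f.readlines()
f.close()

print(repr(whole_file))
print(repr(first), repr(second))
print(all_lines)
```

Each call picks up where the previous read left off, which is why the file is reopened before each demonstration.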



### Now what do we do with each line?

### Predict what the following program will do

In [13]:
f = open("test-file.txt")
for line in f.readlines():
    print(line.split())
f.close()   

['Line', '1', 'of', 'a', 'file']
['It', 'can', 'be', 'a', 'very', 'long', 'line', '--', 'up', 'to', 'a', 'basically', 'unlimited', 'number', 'of', 'characters.']
[]
['just', 'passed', 'a', 'blank', 'line!']
['1.5', '2', '20,32|5']


### Predict what the following program will do

In [14]:
f = open("test-file.txt")
for line in f.readlines():
    print(line.split("1"))
f.close()   

['Line ', ' of a file\n']
['It can be a very long line -- up to a basically unlimited number of characters.\n']
['\n']
['just passed a blank line!\n']
['', '.5 2 20,32|5']


### Splitting strings

+ `SOME_STRING.split(CHAR_TO_SPLIT_ON)` allows you to split strings into a list.  
+ If `CHAR_TO_SPLIT_ON` is not given, the string is split on any run of whitespace (" ","\t","\n","\r"), and leading and trailing whitespace is dropped.
+ "\t" is TAB, "\n" is NEWLINE, "\r" is CARRIAGE_RETURN. 
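
A short sketch of how the argument to `split` changes the result, using the last line of the test file plus a made-up string with two spaces in it. Note the difference between `split()` and `split(" ")`: only the first one discards the trailing newline and collapses repeated spaces.

```python
line = "1.5 2 20,32|5\n"

no_arg = line.split()        # split on any whitespace
on_space = line.split(" ")   # split on single spaces only
on_comma = line.split(",")   # split on commas

print(no_arg)     # ['1.5', '2', '20,32|5']
print(on_space)   # ['1.5', '2', '20,32|5\n']
print(on_comma)   # ['1.5 2 20', '32|5\n']

# Two spaces in a row: split() collapses them, split(" ") does not.
double_space = "a  b"
print(double_space.split())      # ['a', 'b']
print(double_space.split(" "))   # ['a', '', 'b']
```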

### Predict what the following will do
![test_file](https://raw.githubusercontent.com/harmsm/pythonic-science/master/chapters/03_dealing-with-files/data/test-file.png)

In [16]:
f = open("test-file.txt")
lines = f.readlines()
f.close()

line_of_interest = lines[-1]
value = line_of_interest.split()[0]
print(value)

1.5


### Predict what will happen:

In [17]:
print(value*5)

1.51.51.51.51.5


`value` is the string `"1.5"`, not a number. Multiplying a string by 5 repeats it five times, so you can't do math on it yet.

### The solution is to *cast* it into a float

In [19]:
value_as_float = float(value) 
print(value_as_float*5)

7.5


### Cast calls:

`float`, `int`, `str`, `list`, `tuple`

In [20]:
list("1.5")

['1', '.', '5']
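
A few more casts, as a minimal sketch. The values are made up for illustration. The one surprise worth knowing about is that `int` refuses a string that looks like a decimal number, so such a string has to go through `float` first.

```python
as_int = int("42")
as_float = float("1.5")
as_str = str(7.5)
as_tuple = tuple("abc")

print(as_int + 1)          # 43
print(as_float * 2)        # 3.0
print(as_str + " units")   # 7.5 units
print(as_tuple)            # ('a', 'b', 'c')

# int() cannot parse a string containing a decimal point.
try:
    int("1.5")
    int_failed = False
except ValueError:
    int_failed = True
print(int_failed)          # True

# Going through float first works; int() then drops the fractional part.
truncated = int(float("1.5"))
print(truncated)           # 1
```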

### Write a program that grabs the "1" from the first line in the file and multiplies it by 75. 

In [21]:
f = open("test-file.txt")
lines = f.readlines()
f.close()

value = lines[0].split(" ")[1]
value_as_int = int(value)
print(value_as_int*75)

75


## What about *writing* to files?

### Basic file writing operations: 

+ Open a file for writing: `f = open(SOME_FILE_NAME,'w')` **will wipe out file immediately!**
+ Open a file to append: `f = open(SOME_FILE_NAME,'a')`
+ Write a string to a file: `f.write(SOME_STRING)`
+ Write a list of strings (no newlines are added for you): `f.writelines([STRING1,STRING2,...])`
+ Close the file: `f.close()`
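
The difference between `'w'` and `'a'` can be sketched as follows. The file name `demo-write.txt` is hypothetical and lives in a temporary directory so that nothing of yours gets overwritten.

```python
import os
import tempfile

# Hypothetical scratch file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "demo-write.txt")

# 'w' creates the file (or empties it if it already exists).
f = open(path, "w")
f.write("first\n")
f.close()

# 'a' keeps the existing contents and adds to the end.
# writelines() writes the strings exactly as given, so each needs its own "\n".
f = open(path, "a")
f.writelines(["second\n", "third\n"])
f.close()

f = open(path)
after_append = f.read()
f.close()
print(repr(after_append))    # 'first\nsecond\nthird\n'

# Opening with 'w' again wipes out everything written so far.
f = open(path, "w")
f.write("replaced\n")
f.close()

f = open(path)
after_rewrite = f.read()
f.close()
print(repr(after_rewrite))   # 'replaced\n'
```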

In [22]:
def file_printer(file_name):
    f = open(file_name)
    for line in f.readlines():
        print(line,end="")
    f.close()

### Predict what this code will do

In [23]:
a_list = ["a","b","c"]
f = open("another-file.txt","w")
for a in a_list:
    f.write(a)
f.close()
file_printer("another-file.txt")

abc

### Predict what this code will do

In [24]:
a_list = ["a","b","c"]
f = open("another-file.txt","w")
for a in a_list:
    f.write(a)
    f.write("\n")
f.close()
file_printer("another-file.txt")

a
b
c


### Predict what this code will do

In [38]:
a_list = ["a","b","ccat"]
f = open("another-file.txt","w")
for a in a_list:
    f.write("A test {{}} {}\n".format(a))
f.close()
file_printer("another-file.txt")

A test {} a
A test {} b
A test {} ccat


### `format` lets you make pretty strings

In [35]:
print("The value is: {:}".format(10.35151))
print("The value is: {:.2f}".format(10.35151))
print("The value is: {:20.2f}".format(10.35151))

The value is: 10.35151
The value is: 10.35
The value is:                10.35


In [36]:
print("The value is: {:}".format(10))
print("The value is: {:20d}".format(10))

The value is: 10
The value is:                   10


### String formatting

+ Pretty decimal printing: `"{:LENGTH_OF_STRING.NUM_DECIMALSf}".format(FLOAT)`
+ Pretty integer printing: `"{:LENGTH_OF_STRINGd}".format(INT)`
+ Pretty string printing:  `"{:LENGTH_OF_STRINGs}".format(STRING)`
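
The three patterns above, sketched with made-up values. The square brackets are only there to make the padding visible. Numbers are right-aligned within the requested width by default, while strings are left-aligned unless you add `>`.

```python
padded_float = "[{:10.3f}]".format(3.14159)
padded_int = "[{:5d}]".format(42)
padded_str = "[{:8s}]".format("abc")
right_str = "[{:>8s}]".format("abc")

print(padded_float)   # [     3.142]
print(padded_int)     # [   42]
print(padded_str)     # [abc     ]
print(right_str)      # [     abc]
```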

### Create a loop that writes 0 to 9 to a file.  Each number should be on its own line, written to 3 decimal places. 

In [None]:
f = open("junk","w")
for i in range(10):
    f.write("{:.3f}\n".format(i))
f.close()
file_printer("junk")

### Basic file reading operations: 

+ Open a file for reading: `f = open(SOME_FILE_NAME)` 
+ Read all lines of the file into a list: `f.readlines()`
+ Read one line from the file: `f.readline()`
+ Read the whole file into a string: `f.read()`
+ Close the file: `f.close()`



### Basic file writing operations: 

+ Open a file for writing: `f = open(SOME_FILE_NAME,'w')` **will wipe out file immediately!**
+ Open a file to append: `f = open(SOME_FILE_NAME,'a')`
+ Write a string to a file: `f.write(SOME_STRING)`
+ Write a list of strings (no newlines are added for you): `f.writelines([STRING1,STRING2,...])`
+ Close the file: `f.close()`

### Splitting strings

+ `SOME_STRING.split(CHAR_TO_SPLIT_ON)` allows you to split strings into a list.  
+ If `CHAR_TO_SPLIT_ON` is not given, the string is split on any run of whitespace (" ","\t","\n","\r"), and leading and trailing whitespace is dropped.
+ "\t" is TAB, "\n" is NEWLINE, "\r" is CARRIAGE_RETURN. 

### String formatting

+ Pretty decimal printing: `"{:LENGTH_OF_STRING.NUM_DECIMALSf}".format(FLOAT)`
+ Pretty integer printing: `"{:LENGTH_OF_STRINGd}".format(INT)`
+ Pretty string printing:  `"{:LENGTH_OF_STRINGs}".format(STRING)`
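
To tie these pieces back to the PDB question at the top, here is a rough sketch that splits one atom line, casts the coordinates to floats, and formats them. The `atom_line` string is a made-up line in the style of a PDB `ATOM` record, not a line copied from `1stn.pdb`, and the field positions assume every field is separated by whitespace. Real PDB files use fixed-width columns in which neighboring fields can touch, so slicing by column position is the more robust approach for real data.

```python
# A made-up line in the style of a PDB ATOM record (NOT taken from 1stn.pdb).
atom_line = "ATOM      1  N   ALA A   1      11.104   6.134  -6.504  1.00  0.00           N\n"

# Assumption: every field is separated by whitespace, so split() is enough here.
fields = atom_line.split()

record_type = fields[0]   # "ATOM"
atom_name = fields[2]     # "N"
residue = fields[3]       # "ALA"

# Coordinates arrive as strings and must be cast before doing math.
x = float(fields[6])
y = float(fields[7])
z = float(fields[8])

summary = "{:4s}{:4s}{:10.3f}{:10.3f}{:10.3f}".format(atom_name, residue, x, y, z)
print(summary)
```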