# Basic Python - Files

Python has several functions for creating, reading, updating, and deleting files. Together these are known as input and output (I/O) functions.

The key function for working with files in Python is the open() function.  
The open() function takes two main parameters: filename and mode.  

There are four different modes for opening a file:  
"r" - Read - Default value. Opens a file for reading, raises an error if the file does not exist  

"a" - Append - Opens a file for appending, creates the file if it does not exist  

"w" - Write - Opens a file for writing and overwrites any existing content, creates the file if it does not exist  

"x" - Create - Creates the specified file, raises an error if the file already exists  

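As a quick illustration of the default mode, here is a minimal sketch. It works in a temporary directory so no real files are touched, and the file name `demo.txt` is made up for this example.

```python
import os
import tempfile

# Work in a throwaway directory; "demo.txt" is a hypothetical file name.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "demo.txt")

    # "r" raises FileNotFoundError because the file does not exist yet
    try:
        open(path, "r")
        result = "opened"
    except FileNotFoundError:
        result = "FileNotFoundError"

    # "w" creates the missing file
    f = open(path, "w")
    f.write("hello\n")
    f.close()

    # Leaving out the mode is the same as passing "r"
    f = open(path)
    mode = f.mode
    content = f.read()
    f.close()

print(result)
print(mode)
print(content)
```
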
In [1]:
# To open a file for reading it is enough to specify the name of the file:
infile = open("test_dnapar.par", "r")

# The variable now holds a file object, but printing it shows the object itself, not the file contents
print(infile)

<_io.TextIOWrapper name='test_dnapar.par' mode='r' encoding='UTF-8'>


In [2]:
# We need to specifically tell Python what to do with this file
infile = open("test_dnapar.par", "r")

print(infile.read())

 141 # base-pairs
   0 # ***local base-pair & step parameters***
#        Shear    Stretch   Stagger   Buckle   Prop-Tw   Opening     Shift     Slide     Rise      Tilt      Roll      Twist
A-T      0.066    -0.105    -0.325     7.186   -14.363     5.753     0.000     0.000     0.000     0.000     0.000     0.000
A-T     -0.003    -0.100     0.192     3.612   -10.756     1.663    -0.538     0.255     3.265    -5.567     2.173    35.664
T-A      0.002    -0.405     0.101     9.230    -6.731     2.162     0.587    -0.544     3.248     2.935    -2.176    33.502
A-T      0.343    -0.049     0.485     5.135   -15.533     0.178     0.062    -0.175     3.333    -1.970    -5.696    41.489
T-A      0.205    -0.083     0.300     2.460   -10.158     0.341     0.241    -0.797     3.349     0.511    -3.190    32.852
C-G      0.360    -0.096     0.040    -1.162    -8.479     3.251     0.196    -0.305     3.244     2.480    -1.180    40.634
C-G      0.255    -0.242    -0.159     1.269    -0.885    -4

In [3]:
# Load the file and return the first 100 characters of the file

infile = open("test_dnapar.par", "r")

print(infile.read(100))

 141 # base-pairs
   0 # ***local base-pair & step parameters***
#        Shear    Stretch   Stagger


In [4]:
# Load the file and return one line by using the readline() method:

infile = open("test_dnapar.par", "r")

print(infile.readline())

 141 # base-pairs



In [5]:
# Read two lines of the file:

infile = open("test_dnapar.par", "r")

print(infile.readline())
print(infile.readline())

 141 # base-pairs

   0 # ***local base-pair & step parameters***



In the first notebook you saw that print() ends its output with a line break.  
However, notice above that there is an extra blank line after each printed line.  
Text files contain special characters, written as escape sequences, that represent non-visible characters:  
"\t" is a tab  
"\n" is a new line, equivalent to pressing Enter in a text document  
A plain space is simply " ". ("\s" is a regular-expression pattern for whitespace, not a Python escape sequence.)

Each line returned by readline() still ends with "\n". print() then adds its own line break, which produces the extra blank line above.
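
One way to see the hidden "\n" is repr(), which displays escape sequences instead of acting on them. The sketch below uses an in-memory file (io.StringIO) holding the first two lines shown above, so it runs without the .par file. That stand-in is an assumption made for illustration.

```python
import io

# In-memory stand-in for the first two lines of the file
infile = io.StringIO(
    " 141 # base-pairs\n"
    "   0 # ***local base-pair & step parameters***\n"
)

line = infile.readline()

# repr() shows the trailing "\n" that print() would turn into a line break
shown = repr(line)
print(shown)

# Two ways to avoid the extra blank line
print(line, end="")            # tell print() not to add its own line break
stripped = line.rstrip("\n")   # or remove the line break from the string
print(stripped)

infile.close()
```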

In [6]:
# -!- Be nice to your memory
# Always close your files when you are done with them

infile = open("test_dnarefframe.dat", "r")

print(infile.read(100))

infile.close()

print(infile.read(100))

200 base pairs 
... 1 C-G ...
0.000000        0.000000        0.000000
1.000000        0.000000     


ValueError: I/O operation on closed file.
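
A common way to make sure a file is always closed is the `with` statement, which closes the file when the indented block ends, even if an error occurs inside it. A minimal sketch, using a throwaway file with made-up name and contents rather than the .dat file above:

```python
import os
import tempfile

# Create a small throwaway file; the name and contents are hypothetical.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "example.dat")
with open(path, "w") as outfile:
    outfile.write("200 base pairs\n")

# The file is open only inside the with block
with open(path, "r") as infile:
    text = infile.read(100)

print(text)
print(infile.closed)   # the file object reports that it is closed

# Clean up the throwaway file and directory
os.remove(path)
os.rmdir(tmpdir)
```

With this pattern there is no separate close() call to forget.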

## Write to an Existing File
To write to an existing file, pass one of the following modes to the open() function:  

"a" - Append - will append to the end of the file  

"w" - Write - will overwrite any existing content  

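The difference between the two modes is easy to check on a small file. This is a sketch with a made-up file name in a temporary directory. The `with` blocks close each file automatically when they end.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "notes.txt")  # hypothetical file name

    # Start with a file that already has one line
    with open(path, "w") as f:
        f.write("original\n")

    # "a" keeps the existing content and adds to the end
    with open(path, "a") as f:
        f.write("appended\n")
    with open(path, "r") as f:
        after_append = f.read()

    # "w" empties the file before writing
    with open(path, "w") as f:
        f.write("replaced\n")
    with open(path, "r") as f:
        after_write = f.read()

print(after_append)
print(after_write)
```
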
## Create a New File
To create a new file in Python, use the open() function with one of the following modes:  

"x" - Create - will create a file, raises an error if the file exists  

"a" - Append - will create a file if the specified file does not exist  

"w" - Write - will create a file if the specified file does not exist  

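Mode "x" is useful when you want to be sure you never overwrite an existing file by accident. A short sketch, with a made-up file name in a temporary directory:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "results.txt")  # hypothetical file name

    # First call: the file does not exist, so "x" creates it
    f = open(path, "x")
    f.write("important data\n")
    f.close()

    # Second call: the file exists, so "x" refuses to touch it
    try:
        f = open(path, "x")
        f.close()
        outcome = "created again"
    except FileExistsError:
        outcome = "FileExistsError"

    # The original content is untouched
    f = open(path, "r")
    content = f.read()
    f.close()

print(outcome)
print(content)
```
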
In [7]:
f = open("myfile.txt", "w")

Check the directory you are working from. You should see a new empty file called "myfile.txt".

Try to delete it manually right now.

On Windows you cannot, because the file is still in use by Python. You must close the file before you can delete it manually. Other operating systems may allow the deletion, but closing the files you open is still good practice.

In [8]:
f.close()

# Now try to delete it and you will see it works

In [9]:
# I am going to make a header text file by copying the first two lines of the par file into a separate file

infile  = open("test_dnapar.par", "r")
outfile = open("test_header.txt", "w")

outfile.write(infile.readline())
outfile.write(infile.readline())

infile.close()
outfile.close()

In [10]:
# Deleting the file requires the "os" module

import os

os.remove("test_header.txt")

# You can check if the file exists

if os.path.exists("test_header.txt"):
    os.remove("test_header.txt")
else:
    print("The file does not exist")

The file does not exist
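
An alternative to checking first is to attempt the removal and handle the error if the file is missing. The sketch below wraps this in a helper called `remove_if_present`, a name made up for this example, and runs it on a throwaway file in a temporary directory.

```python
import os
import tempfile

def remove_if_present(path):
    """Delete path and report whether a file was actually removed."""
    try:
        os.remove(path)
        return True
    except FileNotFoundError:
        return False

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "test_header.txt")
    open(path, "w").close()            # create an empty file

    first = remove_if_present(path)    # the file exists, so it is removed
    second = remove_if_present(path)   # the file is already gone

print(first, second)
```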


In [2]:
# TASK: Read and print the first 20 lines of the "test_dnapdb.pdb" file


infile = open("test_dnapdb.pdb", "r")

lst = infile.readlines()

infile.close()

#print(lst)





In [None]:
for line in lst:
    if 'OP1' in line:
        print(line)

In [None]:

lst2 = [i.rstrip('\n') for i in lst]

print(lst2)


In [None]:
for line in lst2:
    if 'OP1' in line:
        print(line)

In [15]:
# TASK: Read the first 20 lines of the "test_dnapdb.pdb" file
# ... and make a new file "test_dna_firstbasepair.pdb"

infile = open("test_dnapdb.pdb", "r")
outfile = open("test_dna_firstbasepair.pdb", "w")

n_lines = 20
lst = infile.readlines()

# Slicing keeps exactly the first 20 lines (indices 0 to 19)
for line in lst[:n_lines]:
    print(line)
    outfile.write(line)

del lst
infile.close()
outfile.close()

REMARK    3DNA v2.3.4-2018nov06, created and maintained by Xiang-Jun Lu (PhD)

ATOM      1  P    DC A   1      -0.275   9.443  -1.528  1.00  1.00           P  

ATOM      2  OP1  DC A   1      -0.347  10.779  -2.161  1.00  1.00           O  

ATOM      3  OP2  DC A   1       0.736   9.284  -0.459  1.00  1.00           O  

ATOM      4  O5'  DC A   1      -1.717   9.036  -0.967  1.00  1.00           O  

ATOM      5  C5'  DC A   1      -2.657   8.399  -1.854  1.00  1.00           C  

ATOM      6  C4'  DC A   1      -3.334   7.242  -1.148  1.00  1.00           C  

ATOM      7  O4'  DC A   1      -2.590   5.997  -1.284  1.00  1.00           O  

ATOM      8  C3'  DC A   1      -3.517   7.399   0.362  1.00  1.00           C  

ATOM      9  O3'  DC A   1      -4.762   6.819   0.737  1.00  1.00           O  

ATOM     10  C2'  DC A   1      -2.390   6.569   0.977  1.00  1.00           C  

ATOM     11  C1'  DC A   1      -2.477   5.402   0.000  1.00  1.00           C  

ATOM     12  N1   D
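
readlines() loads the whole file into memory before the first 20 lines are picked out. For large files, one alternative is to stop reading once you have the lines you need. The sketch below uses itertools.islice for this. The helper name `copy_first_lines` and both file names are made up, and the input is a small generated stand-in rather than the real .pdb file.

```python
import itertools
import os
import tempfile

def copy_first_lines(src, dst, n):
    """Copy the first n lines of src to dst and return them as a list."""
    with open(src, "r") as infile, open(dst, "w") as outfile:
        # islice stops after n lines instead of looping over the whole file
        lines = list(itertools.islice(infile, n))
        outfile.writelines(lines)
    return lines

with tempfile.TemporaryDirectory() as tmpdir:
    src = os.path.join(tmpdir, "numbers.txt")   # hypothetical input
    dst = os.path.join(tmpdir, "first.txt")     # hypothetical output

    # Generate a 100-line input file
    with open(src, "w") as f:
        for k in range(100):
            f.write(f"line {k}\n")

    head = copy_first_lines(src, dst, 20)
    with open(dst, "r") as f:
        copied = f.read()

print(len(head))
print(head[0], end="")
print(head[-1], end="")
```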

In [16]:
infile  = open("test_dnapdb.pdb", "r")
outfile = open("test_dna_PHOSPHATES.pdb", "w")

lst = infile.readlines()
infile.close()

# Loop over the lines directly instead of over their indices
for line in lst:
    if " P " in line:
        print(line)
        outfile.write(line)

del lst
outfile.close()

ATOM      1  P    DC A   1      -0.275   9.443  -1.528  1.00  1.00           P  

ATOM     20  P    DT A   2      -5.446   7.530   2.268  1.00  1.00           P  

ATOM     40  P    DG A   3      -8.125   4.140   6.617  1.00  1.00           P  

ATOM     62  P    DT A   4      -9.620  -3.324  10.494  1.00  1.00           P  

ATOM     82  P    DC A   5      -7.268  -6.891  13.210  1.00  1.00           P  

ATOM    101  P    DC A   6       0.186 -10.839  14.426  1.00  1.00           P  

ATOM    120  P    DC A   7       6.748 -10.398  16.342  1.00  1.00           P  

ATOM    139  P    DC A   8      11.068  -5.215  17.386  1.00  1.00           P  

ATOM    158  P    DC A   9      12.514  -0.047  22.116  1.00  1.00           P  

ATOM    177  P    DG A  10      11.695   2.465  27.745  1.00  1.00           P  

ATOM    199  P    DC A  11       8.741   2.199  33.126  1.00  1.00           P  

ATOM    218  P    DG A  12       5.550  -0.703  39.042  1.00  1.00           P  

ATOM    240  P  
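
The test `" P " in line` looks for that text anywhere in the line, so it can also match other columns, such as the element symbol near the end of each record. A stricter alternative is to read the atom name from its fixed position. The sketch below assumes the standard PDB column layout, in which the atom name occupies columns 13 to 16. The helper name `atom_name` is made up, and the sample records are copied from the outputs above.

```python
def atom_name(line):
    """Return the atom name of an ATOM record (PDB columns 13-16)."""
    return line[12:16].strip()

# Three records copied from the listings above
sample = [
    "ATOM      1  P    DC A   1      -0.275   9.443  -1.528  1.00  1.00           P  \n",
    "ATOM      2  OP1  DC A   1      -0.347  10.779  -2.161  1.00  1.00           O  \n",
    "ATOM     20  P    DT A   2      -5.446   7.530   2.268  1.00  1.00           P  \n",
]

# Keep only ATOM records whose atom name is exactly "P"
phosphates = [line for line in sample
              if line.startswith("ATOM") and atom_name(line) == "P"]

for line in phosphates:
    print(line, end="")
```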

In [3]:
infile  = open("test_dnapdb.pdb", "r")

# I want DC 74 in chain A and DG 327 in chain B
outfile = open("test_dna_basepair.pdb", "w")

lst = infile.readlines()
infile.close()

# Loop over the lines directly instead of over their indices
for line in lst:
    if " DC A  74 " in line or " DG B 327" in line:
        print(line)
        outfile.write(line)

del lst
outfile.close()

ATOM   1480  P    DC A  74     -11.431 -12.626 -37.688  1.00  1.00           P  

ATOM   1481  OP1  DC A  74     -10.673 -12.433 -38.944  1.00  1.00           O  

ATOM   1482  OP2  DC A  74     -11.888 -11.393 -37.008  1.00  1.00           O  

ATOM   1483  O5'  DC A  74     -12.683 -13.585 -37.957  1.00  1.00           O  

ATOM   1484  C5'  DC A  74     -12.497 -15.012 -37.888  1.00  1.00           C  

ATOM   1485  C4'  DC A  74     -13.646 -15.648 -37.132  1.00  1.00           C  

ATOM   1486  O4'  DC A  74     -13.404 -15.697 -35.696  1.00  1.00           O  

ATOM   1487  C3'  DC A  74     -14.996 -14.942 -37.255  1.00  1.00           C  

ATOM   1488  O3'  DC A  74     -16.027 -15.923 -37.293  1.00  1.00           O  

ATOM   1489  C2'  DC A  74     -15.135 -14.143 -35.958  1.00  1.00           C  

ATOM   1490  C1'  DC A  74     -14.560 -15.208 -35.032  1.00  1.00           C  

ATOM   1491  N1   DC A  74     -14.147 -14.690 -33.720  1.00  1.00           N  

ATOM   1492  C2 
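
Matching on text such as " DC A  74 " depends on getting the number of spaces exactly right. A possible alternative is to read the chain and residue number from their fixed columns. The sketch below assumes the standard PDB layout (chain ID in column 22, residue number in columns 23 to 26). The names `residue_key` and `wanted` are introduced here for illustration, and the sample records are copied from the outputs above.

```python
def residue_key(line):
    """Return (chain ID, residue number) for an ATOM record."""
    chain = line[21]             # PDB column 22
    number = int(line[22:26])    # PDB columns 23-26
    return chain, number

# The residues to keep, as (chain, residue number) pairs
wanted = {("A", 74), ("B", 327)}

# Sample records copied from the listings above
sample = [
    "ATOM      1  P    DC A   1      -0.275   9.443  -1.528  1.00  1.00           P  \n",
    "ATOM   1480  P    DC A  74     -11.431 -12.626 -37.688  1.00  1.00           P  \n",
    "ATOM   1481  OP1  DC A  74     -10.673 -12.433 -38.944  1.00  1.00           O  \n",
]

# Keep only ATOM records that belong to one of the wanted residues
selected = [line for line in sample
            if line.startswith("ATOM") and residue_key(line) in wanted]

for line in selected:
    print(line, end="")
```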