## Notebook 2.3: File I/O

This notebook corresponds to chapter 7 ("Input and Output") of the official Python tutorial: https://docs.python.org/3/tutorial/.  


### Learning objectives: 

By the end of this exercise you should:

1. Understand how to import libraries.
2. Read and write data to files. 
3. Be able to load fastq genomic data from a file to a Python object.

### Importing a package
Python is a very *atomic* language, meaning that many of the utilities in the standard library are organized into individual packages that must be loaded before we can use them. This keeps Python lightweight, since the base language does not load these extra utilities unless we ask for them. To load a package that is installed on our system we use the `import` statement, as shown below. Here we also import `requests`, a package for downloading data from the web that is not part of the standard library and was installed separately.

In [1]:
import os
import gzip
import requests
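
The `import` statement comes in a few forms. The short sketch below uses only the standard library to show three of them; the alias `op` and the example path are arbitrary names chosen for illustration.

```python
# import a whole module and access its contents with dot notation
import os

# import a module under a shorter alias (the alias 'op' is arbitrary)
import os.path as op

# import a single function directly into the current namespace
from os.path import basename

# all three forms reach the same function
path = "folder/example.txt"
print(os.path.basename(path))
print(op.basename(path))
print(basename(path))
```

Which form to use is mostly a matter of readability; importing the whole module makes it clear where each function comes from.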

### Download data files for this notebook
Run the bash script below to create a new folder and download two files that we will use in this notebook into that folder. This code should look familiar: we used very similar bash commands in the notebooks from session 1. 

In [2]:
%%bash
mkdir -p datafiles/
wget http://eaton-lab.org/data/40578.fastq.gz -q -O datafiles/40578.fastq.gz
wget http://eaton-lab.org/data/iris-data-dirty.csv -q -O datafiles/iris-data-dirty.csv

We can perform the same task using Python. Here we will name the directory for the files "datafiles2" to differentiate it. In this case the Python version of the code looks quite a bit more complicated than the bash script. This isn't always the case; indeed, Python code is often much simpler to read. By the end of this notebook you should be able to understand the code below.

In [3]:
# make a new directory
os.makedirs("datafiles2", exist_ok=True)

# download files to that directory
url1 = "https://eaton-lab.org/data/40578.fastq.gz"
with open("./datafiles2/40578.fastq.gz", 'wb') as ffile:
    ffile.write(requests.get(url1).content)

url2 = "https://eaton-lab.org/data/iris-data-dirty.csv"
with open("./datafiles2/iris-data-dirty.csv", 'wb') as ffile:
    ffile.write(requests.get(url2).content)

### List directories
Another common tool that we used in the bash terminal is the `ls` command, which lists the files at a given location in the filesystem. Below is the `ls` command as well as a Python equivalent. The `os.listdir()` function in Python returns the contents as a `list`. 

In [4]:
%%bash
ls datafiles/

40578.fastq.gz
helloworld.txt
iris-data-dirty.csv
newfiles.txt
newfile.txt


In [5]:
os.listdir("datafiles2/")

['40578.fastq.gz', 'iris-data-dirty.csv']
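
Because `os.listdir()` returns an ordinary list, we can sort and filter it like any other list. The sketch below works in a temporary directory so that it does not touch our data folders; the file names in it are made up for illustration.

```python
import os
import tempfile

# create a temporary directory with a few empty files (names are hypothetical)
with tempfile.TemporaryDirectory() as tmpdir:
    for name in ["b.csv", "a.fastq.gz", "c.txt"]:
        with open(os.path.join(tmpdir, name), 'w') as out:
            out.write("")

    # os.listdir returns names in arbitrary order, so sort them
    names = sorted(os.listdir(tmpdir))

    # keep only the names that end with '.csv'
    csv_names = [name for name in names if name.endswith(".csv")]

print(names)
print(csv_names)
```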

### Using packages
The `os` package has many functions, but we will be using just a small part of it today, primarily the `path` submodule. Just like everything else in Python, packages are also objects, so we can explore all of the functions in this package using tab completion. Put your cursor after the period in the cell below and press `<tab>` to see available options in `os`. There are many!

In [None]:
## use tab-completion after the '.' to see available options in os
os.

### Filepath operations with the `os` package
One type of string that is often difficult to format properly when writing code is a filepath. If the string representation of a filepath is off by even a single character, the path will not be found. This becomes extra tricky when a program needs to access filepaths on different types of computers, since filepaths look different on a Mac and a PC: Windows separates directories with backslashes, for example, while macOS and Linux use forward slashes. Here, understanding the filesystem hierarchy that we learned in lesson 1 becomes important. Fortunately, the `os.path` module makes this easy. 

### Using `os.path`
The `os.path` submodule is used to format filepaths. We can expand shortened path names, join together multiple paths, and look up special directories like `$HOME` or the current directory. Essentially, the module does the same jobs as tools we learned in bash scripting last week, such as `pwd` to show your current directory, or `~` as shorthand for your home directory. Here we can access those filepaths as string variables and work with them very easily. 

NB: The goal here is not for you to master the `os` package, but to understand that many such packages exist in the Python standard library and that you can use tab-completion, google search, and other sources to find them and how to use them.

In [7]:
# return my $HOME directory
os.path.expanduser("~")

'/home/deren'

In [8]:
# convert relative path to a full path
os.path.abspath('./')

'/home/deren/Documents/genomics-course/2-genes/notebooks'

<div class="alert alert-success">
    <b>Action:</b> Write a relative path to the iris-data-dirty.csv file that we downloaded earlier and expand it to a full path using the `os.path.abspath()` function.
</div>

In [10]:
os.path.abspath("./datafiles/iris-data-dirty.csv")

'/home/deren/Documents/genomics-course/2-genes/notebooks/datafiles/iris-data-dirty.csv'

### Operations on filepaths

In [11]:
# assign my current dir to a variable
curdir = os.path.abspath('.')
curdir

'/home/deren/Documents/genomics-course/2-genes/notebooks'

In [12]:
# get the lowest level directory in curdir
os.path.basename(curdir)

'notebooks'

In [13]:
# get the directory structure above curdir
os.path.dirname(curdir)

'/home/deren/Documents/genomics-course/2-genes'
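
The same functions work on any path string, whether or not the file exists. As a small sketch, the cell below takes a made-up filepath apart into its directory, file name, and extension; `os.path.splitext` is one more function from the same submodule that was not shown above.

```python
import os

# a hypothetical filepath used only for illustration
path = "/home/user/data/sample.fastq.gz"

# split into the directory part and the file name
print(os.path.dirname(path))
print(os.path.basename(path))

# split the last extension from the rest of the path
root, ext = os.path.splitext(path)
print(root)
print(ext)
```

Note that only the final extension is split off, so a name with two extensions keeps the first one in `root`.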

### Joining filepaths
Because it can be hard to keep track of the "/" characters between directories and filepaths, it is useful to use the `join` function of the `os.path` module to join together path names. Here we will create a string variable with a new pathname for a file that doesn't yet exist in our current directory. You can see in the three examples below that it doesn't matter whether we include a "/" after a directory name or not; the `join` function figures it out for us. 

In [14]:
# see how os.path.join handles '/' characters in path names
print(os.path.join("/home/user/fakeuser", "folder1/", "folder2", "newfile.txt"))
print(os.path.join("/home/user/fakeuser", "folder1", "folder2", "newfile.txt"))
print(os.path.join("/home/user/fakeuser/", "folder1/", "folder2/", "newfile.txt"))

/home/user/fakeuser/folder1/folder2/newfile.txt
/home/user/fakeuser/folder1/folder2/newfile.txt
/home/user/fakeuser/folder1/folder2/newfile.txt


In [15]:
# get the full path name to a newfile in our current directory
newfile = os.path.join(curdir, "newfile.txt")
newfile

'/home/deren/Documents/genomics-course/2-genes/notebooks/newfile.txt'
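
One behavior of `os.path.join` is worth knowing about: a trailing "/" is harmless, but a *leading* "/" on a later argument marks it as an absolute path, and the arguments before it are discarded. The paths in this sketch are hypothetical, and the description assumes a Linux or macOS system.

```python
import os

# trailing slashes on earlier parts are handled for us
joined = os.path.join("/home/user/", "data", "file.txt")
print(joined)

# but a later part that begins with "/" is treated as absolute,
# so the parts before it are dropped
joined_abs = os.path.join("/home/user", "/data", "file.txt")
print(joined_abs)
```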

### Writing files

The function `open` returns a file object that we can use to read from or write to a file. The format is `open(filename, mode)`, where mode describes what you plan to do with the file. The main modes are `w` for 'write', `r` for 'read', and `a` for 'append'. Below we will use `w`, which creates a new file, or erases the contents of the file if it already exists. 

In [50]:
# get an open file object
ofile = open("./datafiles/helloworld.txt", 'w')

# return the file object
ofile

<_io.TextIOWrapper name='./datafiles/helloworld.txt' mode='w' encoding='UTF-8'>

#### File objects
As with other objects, this variable `ofile` has attributes and functions that we can access and see by using tab-completion. Move your cursor to the end of the object below after the period and use tab to see some of the options. 

In [None]:
## use tab to see options associated with open file objects
ofile.

Use the `.write()` function to write a string to the file. 

In [18]:
# write a string to the file. 
# It returns the number of characters written, which we can ignore for now.
ofile.write("Hello world")

11

In [19]:
# when we are done writing to the file use .close()
ofile.close()
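
To see how the `w` and `a` modes differ, here is a minimal sketch that writes to a file, appends to it, and reads the result back. It uses a temporary directory and a made-up file name so that nothing in our data folders is changed.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "modes.txt")  # hypothetical file name

    # 'w' creates the file (or erases it if it already exists)
    out = open(path, 'w')
    out.write("first line\n")
    out.close()

    # 'a' keeps the existing contents and adds to the end
    out = open(path, 'a')
    out.write("second line\n")
    out.close()

    # read the file back to check that both lines are there
    infile = open(path, 'r')
    contents = infile.read()
    infile.close()

print(contents)
```

If the second `open` call had used `w` instead of `a`, the first line would have been lost.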

### Reading files
To read data from a file we first need to open a file object, just like when we wrote to a file, but now we use the mode flag `r`. We can then access the data in the file using the `.read()` function. Below we read data from the file and store the result as a string variable called `idata`. 

In [41]:
# open a file object for reading
ifile = open("./datafiles/iris-data-dirty.csv", 'r')

# read all contents of the file as a string
idata = ifile.read()

# close the file object
ifile.close()

Now that we've stored the contents of the file in the variable `idata` we can interact with it just like it is any other string. 

In [51]:
## show the first 50 characters
idata[:50]

'5.1,3.5,1.4,0.2,Iris-setosa\n4.9,3.0,1.4,0.2,Iris-s'

### Gzip compressed files
Gzip compression is easily handled in Python using the standard library. The `gzip` module has an `open()` function that acts just like the regular `open` function to create a file object. We need to use the `gzip` version instead of the regular `open` function to open and read a gzipped file properly. When the file is opened in mode `rb` the data is returned as bytes, so we call `.decode()` to convert it to a string. Let's try it out on the compressed fastq file we downloaded earlier. We'll also practice using `os.path` to find the full filepath of the `40578.fastq.gz` file. 


In [52]:
# get full path to the file in our current directory
gzfile = os.path.abspath("./datafiles/40578.fastq.gz")


In [53]:
# read the decompressed data as bytes and decode it to a string
ffile = gzip.open(gzfile, 'rb')
fdata = ffile.read().decode()
ffile.close()

In [54]:
# show some data from the file
print(fdata[:200])

@40578_rex.1 GRC13_0027_FC:4:1:10524:1181 length=74
TGCAGCATAGCATAGATAATACAAGGTTNNNNNNNNNNNNNNTTTNCACAGTNTNNNATTAAACCCGGTAGNTN
+40578_rex.1 GRC13_0027_FC:4:1:10524:1181 length=74
IIIIIIHIIIIIIIIIGIIIH
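
As an alternative sketch, `gzip.open` also accepts the text modes `wt` and `rt`, which read and write strings directly so that no `.decode()` step is needed. The example below round-trips two lines of made-up text through a temporary file with a hypothetical name.

```python
import gzip
import os
import tempfile

# two lines of made-up text to compress (not real sequence data)
text = "line one\nline two\n"

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "example.txt.gz")  # hypothetical file name

    # 'wt' writes text to a gzip-compressed file
    out = gzip.open(path, 'wt')
    out.write(text)
    out.close()

    # 'rt' reads it back as a string, so no .decode() is needed
    infile = gzip.open(path, 'rt')
    result = infile.read()
    infile.close()

# the text survives the round trip unchanged
print(result == text)
```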


### Reading and parsing data with the `read()` function
The `read()` function is nice for reading in a large chunk of text, but because all of the data is loaded as one big string, we then need to parse it using string processing. This usually means using `split` to break the text apart on some kind of delimiter. Let's try splitting the fastq data on newline characters (`"\n"`). 

In [57]:
# split fastq string data on newline characters to return a list
fastqlines = fdata.split("\n")

# print the first 10 list elements
print(fastqlines[:10])

['@40578_rex.1 GRC13_0027_FC:4:1:10524:1181 length=74', 'TGCAGCATAGCATAGATAATACAAGGTTNNNNNNNNNNNNNNTTTNCACAGTNTNNNATTAAACCCGGTAGNTN', '+40578_rex.1 GRC13_0027_FC:4:1:10524:1181 length=74', 'IIIIIIHIIIIIIIIIGIIIHIIIBB:B##############################################', '@40578_rex.2 GRC13_0027_FC:4:1:13011:1181 length=74', 'TGCAGAGTCTACCCAAAGGTTCAGGCCGNNNNNNNNNNNNNNGTTNATACGTNTNNNTATTTCTATGAGAANCN', '+40578_rex.2 GRC13_0027_FC:4:1:13011:1181 length=74', 'GGGGHHHHHHHHHHHHHHHEBG<G?;??##############################################', '@40578_rex.3 GRC13_0027_FC:4:1:15237:1184 length=74', 'TGCAGAGTCCTAAATCTATTTCCTCTTCNNNNGNNNNNNNATGCATGCAACCTCCNNTCGCCACCTGTACGNAN']


## The fastq file format
Let's take a side quest now and read some details of the [fastq file format here](https://en.wikipedia.org/wiki/FASTQ_format). This is a file format for next-generation sequence data that we will use frequently throughout this course. Fastq files can be really large, often multiple gigabytes (GB) of data. Our downloaded example file is relatively small, though. 

As the link above describes, the fastq format stores labeled sequence data four lines at a time. That is, one sequenced read (a length of DNA information) is written over four lines. The first line starts with "@" and labels the read with unique identifying information. The second line contains the sequence data. The third line is a spacer that starts with "+" and may repeat the identifier. And the fourth line contains a quality score for each base in the read. 

Because we split the file at every line break, we can easily look at the first four lines of data using indexing on the list object `fastqlines`. 

In [61]:
# the first line: identifier
fastqlines[0]

'@40578_rex.1 GRC13_0027_FC:4:1:10524:1181 length=74'

In [63]:
# the second line: sequence data
fastqlines[1]

'TGCAGCATAGCATAGATAATACAAGGTTNNNNNNNNNNNNNNTTTNCACAGTNTNNNATTAAACCCGGTAGNTN'

In [66]:
# the third line: spacer/repeat
fastqlines[2]

'+40578_rex.1 GRC13_0027_FC:4:1:10524:1181 length=74'

In [67]:
# the fourth line: quality scores
fastqlines[3]

'IIIIIIHIIIIIIIIIGIIIHIIIBB:B##############################################'
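
Since every read has the same four-line layout, we can unpack a read into named variables rather than remembering index numbers. This is a small sketch using the four lines of the first read, copied from the output above; the variable names are our own choice.

```python
# the four lines of the first read, copied from the output above
record = [
    "@40578_rex.1 GRC13_0027_FC:4:1:10524:1181 length=74",
    "TGCAGCATAGCATAGATAATACAAGGTTNNNNNNNNNNNNNNTTTNCACAGTNTNNNATTAAACCCGGTAGNTN",
    "+40578_rex.1 GRC13_0027_FC:4:1:10524:1181 length=74",
    "IIIIIIHIIIIIIIIIGIIIHIIIBB:B##############################################",
]

# unpack the four lines into named variables
identifier, sequence, spacer, quality = record

# the identifier starts with "@" and the spacer starts with "+"
print(identifier.startswith("@"), spacer.startswith("+"))

# compare the number of bases to the number of quality characters
print(len(sequence), len(quality))
```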

### Phred quality scores
The quality scores in the fastq sequence format are stored using an ASCII encoding, which represents each score as a single character of text. This data was generated on a modern Illumina machine, which uses the Phred+33 encoding: the ASCII code of each character is the quality score plus 33, so we subtract 33 to recover the score. A score of Q corresponds to a probability of 10^(-Q/10) that the base call is wrong. Python has the function `ord()` to convert string characters to ints, and `chr()` to convert ints to ASCII character strings. 

In [74]:
## convert string to int
ord("A")


65

In [76]:
## convert int to str
chr(65)

'A'

In [82]:
## get first 10 phred scores from a line from the fastq file
phreds = fastqlines[3][:10]

## get ASCII for a string of phred scores
[ord(i) for i in phreds]

[73, 73, 73, 73, 73, 73, 72, 73, 73, 73]

In [87]:
## subtract the built-in offset amount from each number 
phred33 = [ord(i) - 33 for i in phreds]



In [89]:
# convert to probabilities of being wrong
[10 ** (-i / 10) for i in phred33]

[0.0001,
 0.0001,
 0.0001,
 0.0001,
 0.0001,
 0.0001,
 0.00012589254117941674,
 0.0001,
 0.0001,
 0.0001]
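
The steps above can be collected into one small function. This is only a sketch: the function name `phred_to_error` and its `offset` argument are our own inventions, and the default offset of 33 assumes the encoding used by this file.

```python
def phred_to_error(quality, offset=33):
    """Convert a fastq quality string to a list of error probabilities.

    The default offset of 33 matches the encoding described above; data
    that uses a different offset would need a different value.
    """
    # convert each character to its quality score
    scores = [ord(char) - offset for char in quality]

    # convert each score to the probability that the base call is wrong
    return [10 ** (-score / 10) for score in scores]


# "I" encodes a score of 40 and "#" encodes a score of 2
errors = phred_to_error("I#")
print(errors)
```

The "#" characters that fill the end of many reads in this file therefore mark bases that are more likely wrong than right.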

### Parsing (splitting) text on different characters

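From looking at the fastq file data we can see that each four-line element could also be separated on the two-character string `"\n@"`. This is because the identifier in the first line will always start with a "@" character. Splitting the file into string objects that represent separate reads of the file, instead of just lines, can make it easier to parse and read the file. Note that the delimiter is removed by the split, so every read except the first loses its leading "@". This shortcut works for our file, but "@" is also a valid quality score character, so it is not guaranteed to work for every fastq file. Let's try this now by parsing the file and counting the number of reads. 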

In [90]:
## split the fdata string on each occurrence of "\n@"
freads = fdata.split("\n@")

## print the first element in the list
print("The first read: \n{}".format(freads[0]))

## print the last element in the list
print("\nThe last read: \n{}".format(freads[-1]))

## print the number of reads in the file
print("\nN reads in the file = {}".format(len(freads)))

The first read: 
@40578_rex.1 GRC13_0027_FC:4:1:10524:1181 length=74
TGCAGCATAGCATAGATAATACAAGGTTNNNNNNNNNNNNNNTTTNCACAGTNTNNNATTAAACCCGGTAGNTN
+40578_rex.1 GRC13_0027_FC:4:1:10524:1181 length=74
IIIIIIHIIIIIIIIIGIIIHIIIBB:B##############################################

The last read: 
40578_rex.125 GRC13_0027_FC:4:1:2571:1496 length=74
TGCAGCTCACGGTCGTGAGGGTGAGCTTATTTTTTTGTGAACTGTCTCAACTGCTCGTGAGGGTCCTCACGATT
+40578_rex.125 GRC13_0027_FC:4:1:2571:1496 length=74
IIIIIGHIIIIIHIIIIFIIIDIHGIIIBGIIFIDIDIHHIDIHEIHIIIEEEIHIIE>CEEE:DDBDDFECC8


N reads in the file = 125
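
Splitting on `"\n@"` relies on "@" appearing at the start of a line only in identifiers. Because "@" is itself a valid quality character, a more cautious way to count reads is to count lines and divide by four. The sketch below uses two made-up reads to show where the two approaches can disagree.

```python
# two made-up reads; the quality line of the second one starts with "@"
fastq_text = "@read1\nACGT\n+\nIIII\n@read2\nACGT\n+\n@III\n"

# splitting on "\n@" finds three pieces instead of two,
# because it also splits in front of the quality line
n_split = len(fastq_text.split("\n@"))

# counting lines and dividing by four gives the number of reads
lines = fastq_text.strip().split("\n")
n_reads = len(lines) // 4

print(n_split, n_reads)
```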


### Using context to automatically open & close files

In Python there is a special keyword called `with` that runs a block of code inside a *context*. The object created in the `with` line (here, an open file) is available to everything indented beneath it, and it is cleaned up automatically when the indented block ends. This is most often used for opening file objects: a file opened using a `with` statement is closed automatically when the block ends, even if an error occurs inside it. In other words, it makes your code a little shorter and safer by removing the need to call `.close()` on every file. 

In [91]:
## infile will automatically close when finished.
with open("./datafiles/iris-data-dirty.csv", 'r') as infile:
    data = infile.readlines()

In [92]:
data[:10]

['5.1,3.5,1.4,0.2,Iris-setosa\n',
 '4.9,3.0,1.4,0.2,Iris-setosa\n',
 '4.7,3.2,1.3,0.2,Iris-setosa\n',
 '4.6,3.1,1.5,0.2,Iris-setosa\n',
 '5.0,3.6,1.4,0.2,Iris-setosa\n',
 '5.4,3.9,1.7,0.4,Iris-setosa\n',
 '4.6,3.4,1.4,0.3,Iris-setosa\n',
 '5.0,3.4,1.5,0.2,Iris-setosa\n',
 '4.4,2.9,1.4,0.2,Iris-setosa\n',
 '4.9,3.1,1.5,0.1,Iris-setosa\n']
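
We can confirm that the file really is closed by checking the `.closed` attribute of the file object inside and after the `with` block. This sketch writes to a temporary file with a made-up name.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "example.txt")  # hypothetical file name

    # the file is open while we are inside the indented block
    with open(path, 'w') as out:
        out.write("Hello world")
        closed_inside = out.closed

    # the file was closed automatically when the block ended
    closed_after = out.closed

# False inside the block, True after it
print(closed_inside, closed_after)
```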

## Downloading data from the web in Python

The standard way to use the `requests` library is to make a GET request to a url, which is a request to read the data from that page. This returns a `response` object that we can then inspect. The `response` object contains a status code that reports whether the request succeeded (200) or failed (for example, 404 if the page was not found) and, if the request was successful, the contents of the page, such as the HTML text of a webpage or the text of a data file. 

We used this method to download data at the top of this notebook. Now we'll look at it in a bit more detail. 

In [97]:
# store urls as strings
url1 = "https://eaton-lab.org/data/40578.fastq.gz"
url2 = "https://eaton-lab.org/data/iris-data-dirty.csv"

The `requests.get()` function returns a new variable 'response', which is a Python object just like the other object types we've learned about. We can access functions of this object using tab completion. 

In [98]:
# see the response object (200 means successful GET)
response = requests.get(url2)
response

<Response [200]>

In [99]:
# show the first 50 characters of data
response.text[:50]

'5.1,3.5,1.4,0.2,Iris-setosa\n4.9,3.0,1.4,0.2,Iris-s'

In [100]:
# split the string of text on each newline character and keep the first 10 lines
lines = response.text.split("\n")[:10]
lines

['5.1,3.5,1.4,0.2,Iris-setosa',
 '4.9,3.0,1.4,0.2,Iris-setosa',
 '4.7,3.2,1.3,0.2,Iris-setosa',
 '4.6,3.1,1.5,0.2,Iris-setosa',
 '5.0,3.6,1.4,0.2,Iris-setosa',
 '5.4,3.9,1.7,0.4,Iris-setosa',
 '4.6,3.4,1.4,0.3,Iris-setosa',
 '5.0,3.4,1.5,0.2,Iris-setosa',
 '4.4,2.9,1.4,0.2,Iris-setosa',
 '4.9,3.1,1.5,0.1,Iris-setosa']
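
Each line can be split again, this time on commas, to get at the individual values. The sketch below does this for two lines copied from the output above and converts the measurements to floats. Since this file is described as "dirty", some of its lines may not convert this cleanly; the names `sample_lines` and `rows` are our own.

```python
# two lines copied from the output above
sample_lines = [
    "5.1,3.5,1.4,0.2,Iris-setosa",
    "4.9,3.0,1.4,0.2,Iris-setosa",
]

rows = []
for line in sample_lines:
    # split the line into five fields on commas
    fields = line.split(",")

    # the first four fields are numbers and the last is the species name
    measurements = [float(value) for value in fields[:4]]
    species = fields[4]
    rows.append((measurements, species))

print(rows[0])
```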

### Join: combine multiple string elements into a single string.

It is often useful to split a string into separate elements as a list, and then operate on those list elements. When finished, we often want to join the list elements back together into a single string. This can be done using the `.join()` function, which is a function of string objects. The string that calls `join` is the one that will be placed between each element of the list being joined. Some examples are shown below. 

In [103]:
# join together lines with no separator
"".join(lines)

'5.1,3.5,1.4,0.2,Iris-setosa4.9,3.0,1.4,0.2,Iris-setosa4.7,3.2,1.3,0.2,Iris-setosa4.6,3.1,1.5,0.2,Iris-setosa5.0,3.6,1.4,0.2,Iris-setosa5.4,3.9,1.7,0.4,Iris-setosa4.6,3.4,1.4,0.3,Iris-setosa5.0,3.4,1.5,0.2,Iris-setosa4.4,2.9,1.4,0.2,Iris-setosa4.9,3.1,1.5,0.1,Iris-setosa'

In [104]:
# join on newline characters
"\n".join(lines)

'5.1,3.5,1.4,0.2,Iris-setosa\n4.9,3.0,1.4,0.2,Iris-setosa\n4.7,3.2,1.3,0.2,Iris-setosa\n4.6,3.1,1.5,0.2,Iris-setosa\n5.0,3.6,1.4,0.2,Iris-setosa\n5.4,3.9,1.7,0.4,Iris-setosa\n4.6,3.4,1.4,0.3,Iris-setosa\n5.0,3.4,1.5,0.2,Iris-setosa\n4.4,2.9,1.4,0.2,Iris-setosa\n4.9,3.1,1.5,0.1,Iris-setosa'

In [105]:
# remember newlines are only rendered when you print
print("\n".join(lines))

5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
4.6,3.1,1.5,0.2,Iris-setosa
5.0,3.6,1.4,0.2,Iris-setosa
5.4,3.9,1.7,0.4,Iris-setosa
4.6,3.4,1.4,0.3,Iris-setosa
5.0,3.4,1.5,0.2,Iris-setosa
4.4,2.9,1.4,0.2,Iris-setosa
4.9,3.1,1.5,0.1,Iris-setosa


In [106]:
# join on an arbitrary phrase
"Helloworld".join(lines)

'5.1,3.5,1.4,0.2,Iris-setosaHelloworld4.9,3.0,1.4,0.2,Iris-setosaHelloworld4.7,3.2,1.3,0.2,Iris-setosaHelloworld4.6,3.1,1.5,0.2,Iris-setosaHelloworld5.0,3.6,1.4,0.2,Iris-setosaHelloworld5.4,3.9,1.7,0.4,Iris-setosaHelloworld4.6,3.4,1.4,0.3,Iris-setosaHelloworld5.0,3.4,1.5,0.2,Iris-setosaHelloworld4.4,2.9,1.4,0.2,Iris-setosaHelloworld4.9,3.1,1.5,0.1,Iris-setosa'
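
Used together, `split` and `join` give a simple way to swap one delimiter for another. As a small sketch, the cell below converts one comma-separated line (copied from the output above) into a tab-separated line.

```python
# one line copied from the output above
line = "5.1,3.5,1.4,0.2,Iris-setosa"

# split on commas, then join the pieces with tabs instead
fields = line.split(",")
tabbed = "\t".join(fields)

# printing renders each tab character as whitespace
print(tabbed)

# splitting and joining on the same separator returns the original string
print(",".join(line.split(",")) == line)
```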

### Challenges

<div class="alert alert-success">
    <b>Action:</b> 
    This challenge builds on the last challenge from the last notebook. You can reuse your function from the last notebook to generate random sequence data. Write code below to combine a fasta header (e.g., "> sequence name") and random sequence data to create valid fasta data. Then write the data to a file and save it as "datafiles/sequence.fasta". 
</div>

In [110]:
import random

# create a fasta header
header = "> yeast chromosome 1"

# create random sequence data
random_dna = "".join(random.choice("ACGT") for i in range(20))

# combine with a \n separator to create valid fasta
fasta = header + "\n" + random_dna

# write to a file
with open("./datafiles/sequence.fasta", 'w') as out:
    out.write(fasta)

<div class="alert alert-success">
    <b>Action:</b> 
    You have now learned about two sequence file formats, fasta and fastq. If you do not remember the details of fasta then use google or look back at your notebooks from session 1. Fastq contains more information than fasta since it also stores quality information for each base. Your challenge here is to write a function to convert fastq to fasta. All of the code you need appears in snippets in the examples above. Feel free to use google or the chatroom to seek further help if needed. Your answer must: (1) Write a function; (2) The function must read the 'datafiles/40578.fastq.gz' file from disk; (3) It must convert the data to fasta format; and (4) It must write the result to a file "datafiles/40578.fasta".     
    
Be sure you look at your fasta file after you write it to check that it looks how you expect. If not, modify your code and try again. 
</div>

In [230]:
# (1) create a function
def converter(fastq_file):
    
    # (2) read the fastq file and store data
    with gzip.open(fastq_file, 'rb') as infile:
        fastq_data = infile.read().decode()
    
    # (3) convert the file to fasta...
    # make a list to store data
    fasta_data = []
    
    # split into separate reads and iterate over list
    for read in fastq_data.split("\n@"):

        # split this read into 4 lines
        rlines = read.split("\n")
        
        # convert each read to fasta
        header = "> " + rlines[0]
        sequence = rlines[1]
        fasta = header + "\n" + sequence + "\n"
        
        # store in the new fasta list
        fasta_data.append(fasta)
        
    # (4) write the result to a file (as a string)
    with open("./datafiles/40578.fasta", 'w') as out:
        out.write("".join(fasta_data))


In [233]:
# run the function for testing
converter("./datafiles/40578.fastq.gz")
    

In [240]:
# read the output file to check if it worked
with open("./datafiles/40578.fasta", 'r') as infile:
    # load the data
    data = infile.read()
    # print first 1000 characters
    print(data[:1000])

> @40578_rex.1 GRC13_0027_FC:4:1:10524:1181 length=74
TGCAGCATAGCATAGATAATACAAGGTTNNNNNNNNNNNNNNTTTNCACAGTNTNNNATTAAACCCGGTAGNTN
> 40578_rex.2 GRC13_0027_FC:4:1:13011:1181 length=74
TGCAGAGTCTACCCAAAGGTTCAGGCCGNNNNNNNNNNNNNNGTTNATACGTNTNNNTATTTCTATGAGAANCN
> 40578_rex.3 GRC13_0027_FC:4:1:15237:1184 length=74
TGCAGAGTCCTAAATCTATTTCCTCTTCNNNNGNNNNNNNATGCATGCAACCTCCNNTCGCCACCTGTACGNAN
> 40578_rex.4 GRC13_0027_FC:4:1:4657:1192 length=74
TGCAGGGTATAAATGTTTATTAGAAGATTAAGANNNNGCTGCACAAAAACCATATGACATTAAAAGAAACTCAC
> 40578_rex.5 GRC13_0027_FC:4:1:6218:1191 length=74
TGCAGTATAGGTGCTAAAATACATCATTAACAANNNNCTTTCTTATAATTATTTAATGTTTCATAGCATTTAAN
> 40578_rex.6 GRC13_0027_FC:4:1:11872:1189 length=74
TGCAGGCAAATTATGGCAGTTGAAATGAAGAAANNNNNNTAAAATGACTGCTAATTTTTTGTTAAAATGTAATN
> 40578_rex.7 GRC13_0027_FC:4:1:15437:1199 length=74
TGCAGTGTTTATTCTTTTGTTTGACACAAATTAANTCCTTTAGTTGGTGAACGACCAAACTCGACCAAACTCAA
> 40578_rex.8 GRC13_0027_FC:4:1:17455:1193 length=74
TGCAGAGCAAATAATTCTGCTAAATCTACTGAANNNNTTCTTGTTTGAGAAC

<div class="alert alert-success">
    <b>Question:</b> 
   Describe each step of your function above verbally, in other words, explain how and why it works. Describe any parts that gave you trouble and how you found a solution. Enter your answer below using Markdown. 
   </div>

I created a function by wrapping the code in a `def` statement. Inside of the function I first loaded the gzip fastq file, then I converted it by splitting the file into separate reads on the "\n@" string, and keeping just the first two lines of each read. I also prepended "> " to the identifier of the sequence. As I iterated through the reads in the fastq file I stored each converted read in a list. In the end, I used `.join` to turn the list back into a string, and wrote it to a file. After testing it looks like the function works. I read in the output file and it looks like fasta. Looking closely, the first header still contains the "@" character, because splitting on "\n@" only removes it from the later reads; calling `.lstrip("@")` on the identifier would make the headers consistent.

<div class="alert alert-success">
    <b>Action:</b> 
    Save your notebook and download as HTML to upload to courseworks.
</div>