# Intro to Strings 
## with DNA Sequences

`str` objects (strings) in Python represent text.  They can be created in several ways:

```python
>>> 'Hello'  # with apostrophes ("single-quotes")
'Hello'

>>> "Hello"  # with quotation marks ("double-quotes")
'Hello'

>>> """Hello,
... my name is
... Nick"""   # with triple quotes (allows multi-line text; also used for docstrings)
'Hello,\nmy name is\nNick'

>>> str(32)  # using the str() function to convert another object into a string
'32'
```
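
The quote style is only a matter of notation: the resulting `str` objects are the same. A minimal sketch of this (the variable names and example values are illustrative, not part of the exercises):

```python
# The same text written with different quote styles yields equal strings.
single = 'GATTACA'
double = "GATTACA"
print(single == double)   # True

# Choosing the other quote style avoids escaping a quote inside the text.
label = "5' end"
print(label)              # 5' end

# str() converts other objects, such as numbers, into their text form.
length_text = str(len(single))
print(length_text)        # 7
print(type(length_text))  # <class 'str'>
```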

Nucleotide sequences are often represented as strings:

```python
>>> seq = 'GCATTGGCT'
```

## String Operation Exercises

Modify the DNA sequences below, each in a single line of code, to match what is asked for.  The following operations, functions, and methods may be useful:

### Operations
  - `'GTC' * 3`
  - `'GTC' + 'GTC'`
  - `'GTC'[0]`
  - `'GTC'[-1]`
  - `'GTC'[1:]`
  - `'GTC'[:-1]`
  - `'GTC'[::-1]` (reverses the sequence)
  - `'GTC' == 'GTC'`
  - `'GTC' != 'GTC'`

### Functions
  - `len('GTC')`

### Methods
  - `'GTC'.count('A')`
  - `'GtC'.upper()`
  - `'GTc'.lower()`
  - `'GTC'.isdigit()`
  - `'GTC'.index('T')`
  - `'GTC'.replace('G', 'C')`
  - `'GTC-CCA'.split('-')`
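
As a quick reference, the sketch below runs most of the operations, functions, and methods listed above on the short example string `'GTC'` and shows what each one returns. The variable name `seq` is just for illustration.

```python
seq = 'GTC'

# Operations
print(seq * 3)          # GTCGTCGTC  (repetition)
print(seq + 'AAA')      # GTCAAA     (concatenation)
print(seq[0])           # G          (first character; indices start at 0)
print(seq[-1])          # C          (last character)
print(seq[1:])          # TC         (everything from index 1 onward)
print(seq[:-1])         # GT         (everything except the last character)
print(seq[::-1])        # CTG        (reversed)
print(seq == 'GTC')     # True

# Function
print(len(seq))         # 3

# Methods (each returns a new value; the original string is not modified)
print(seq.count('A'))         # 0
print('GtC'.upper())          # GTC
print(seq.isdigit())          # False
print(seq.index('T'))         # 1
print(seq.replace('G', 'C'))  # CTC
print('GTC-CCA'.split('-'))   # ['GTC', 'CCA']
print(seq)                    # GTC  (still unchanged)
```

Note that `index()` raises a `ValueError` when the substring is not found, whereas `count()` simply returns `0`.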



**Exercises**

Count the number of "G" nucleotides in the sequence

In [None]:
seq = "GTGTCAGTCCCCATGAATCGATAG"

Count the number of "AT" repeats in the sequence

In [None]:
seq = "GTGTCAGTCCCCATGAATCGATAG"

Concatenate the following two sequences (i.e. combine them into one sequence)

In [None]:
seq1 = "GTGTCAGT"
seq2 = "TGAATCGATAG"

How long is the following sequence?

In [None]:
seq = "GTGTCAGTCCCCATGAATCGATAG"

What is the 7th nucleotide in this sequence?

In [None]:
seq = "GTGTCAGTCCCCATGAATCGATAG"

What is the 3rd-from-the-last nucleotide in this sequence?

In [None]:
seq = "GTGTCAGTCCCCATGAATCGATAG"

Repeat the following sequence 13 times

In [None]:
gc = "GC"

Replace the incorrect letter ("X") with an empty string (i.e. delete every occurrence of it)

In [None]:
seq = "GTGXXGTXCCXCCATGXAATCGXATA"

Keep only the first six nucleotides in this sequence

In [None]:
seq = "GTGTCAGTCCCCATGAATCGATAG"

Standardize the formatting of this sequence (e.g. make every letter uppercase)

In [None]:
seq = "GtCGAaaCCgTaGcTAgc"

Split the following string at the spaces into a list of sequences

In [None]:
seqs = "GTTCGAAAG GACCTGATTATAG AACCGATTTA"

Reverse this sequence

In [None]:
seq = "GTGTCAGTCCCCATGAATCGATAG"

What percentage of the nucleotides in this sequence are strong (G or C)?

In [None]:
seq = "GTGTCAGTCCCCATGAATCGATAG"

Is this sequence the same forwards and backwards (i.e. a palindrome)?

In [None]:
seq = "TCGATCTAGCGCGAATATCGGAGAAGAGGCTATAAGCGCGATCTAGCT"

## Files

### Writing Strings to Files

Strings can be saved to text files by creating a file object with the `open()` function and writing the string to it.  Here are three ways to do it:

```python
my_file = open('myfile.txt', 'w')  # open in 'write' mode
my_file.write('This is my text')
my_file.close()
```

A shorter version uses a `with` block, which closes the file automatically:
```python
with open('myfile.txt', 'w') as my_file:
    my_file.write('This is my text')
```

An even shorter version uses the `Path` class from the standard-library `pathlib` module:
```python
from pathlib import Path
Path('myfile.txt').write_text('This is my text')
```
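
One detail worth knowing about `'w'` mode is that it replaces any existing contents, while `'a'` (append) mode adds to the end of the file. The sketch below illustrates the difference. It works in a temporary directory so no real files are touched, and it uses `Path.read_text()` only to check the results. The file name and the example lines are illustrative.

```python
import tempfile
from pathlib import Path

# Work in a throwaway directory so no real files are touched.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / 'myfile.txt'

    # 'w' mode creates the file, or replaces its contents if it exists.
    with open(path, 'w') as my_file:
        my_file.write('first line\n')
    with open(path, 'w') as my_file:
        my_file.write('second line\n')
    after_write = path.read_text()

    # 'a' mode adds to the end of the existing contents instead.
    with open(path, 'a') as my_file:
        my_file.write('third line\n')
    after_append = path.read_text()

print(repr(after_write))   # 'second line\n'
print(repr(after_append))  # 'second line\nthird line\n'
```

`write()` does not add a line break on its own, which is why each string above ends with an explicit `'\n'`.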

### Reading Strings from Files

Reading works in a similar way:

```python
my_file = open('myfile.txt')
text = my_file.read()
my_file.close()
```

A shorter version, again using a `with` block, is:
```python
with open('myfile.txt') as my_file:
    text = my_file.read()
```

Even shorter, using `Path` from `pathlib`:
```python
from pathlib import Path
text = Path('myfile.txt').read_text()
```
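
Both `read()` and `read_text()` return the whole file as a single string, including any line breaks. The sketch below shows one way to split such text into separate lines with `splitlines()`. It assumes a small two-line example file (the file name and its contents are made up for illustration), written to a temporary directory first so the example is self-contained.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / 'example_sequences.txt'  # hypothetical file name
    path.write_text('GTGTCA\nTTAGGC\n')

    # The whole file comes back as one string, line breaks included.
    text = path.read_text()

print(repr(text))   # 'GTGTCA\nTTAGGC\n'

# splitlines() breaks the text into a list of lines without the '\n'.
lines = text.splitlines()
print(lines)        # ['GTGTCA', 'TTAGGC']

# Line breaks count as characters, so the two lengths differ.
print(len(text))                        # 14
print(sum(len(line) for line in lines)) # 12
```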

**Exercises**

Write the following sequence to a file called "sequence1.txt":

In [None]:
seq = "GTGTCAGTCCCCATGAATCGATAG"

Read the sequence from the file back into Python

### Online Text

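To get text data from the internet, we can use the [requests](https://docs.python-requests.org/en/master/) package, which comes with Anaconda. `requests.get()` returns a response object whose `.text` attribute holds the downloaded content as a string.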

```python
import requests
url = "https://docs.python-requests.org/en/master/"
r = requests.get(url)
text = r.text
```

**Exercises**

Roughly how many letters are in William Shakespeare's play "Romeo and Juliet"?

In [None]:
url = "https://raw.githubusercontent.com/cgovella/learning/master/edx-python/case%20studies/gutenverg/Books/English/shakespeare/Romeo%20and%20Juliet.txt"
import requests
r = requests.get(url)
len(r.text)

178981

Is Romeo or Juliet mentioned more often?

What genome sequence does this URL point to?

In [None]:
url = "https://www.ncbi.nlm.nih.gov/nuccore/NC_045512.2?report=fasta&log$=seqview&format=text"


Write the genome sequence to a file called "sequence2.fasta"

How long is the sequence?
