<small><small><i>
All of these python notebooks are available at https://github.com/GunzIvan28/MScMak2023-IntroductionToPython
</i></small></small>

# Working with strings

## The Print Statement

As seen previously, the **print()** function prints all of its arguments as strings, separated by spaces and followed by a line break:

    - print("Hello World")
    - print("Hello",'World')
    - print("Hello", <Variable Containing the String>)

Note that **print** is different in old versions of Python (2.7), where it was a statement and did not need parentheses around its arguments.

In [None]:
print("Hello","World")

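The **print()** function has some optional arguments to control where and how to print. These include `sep`, the separator (default: a space); `end`, the string printed after the last argument (default: a newline); and `file`, the file or stream to write to (default: standard output).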

In [3]:
dna="ACGTATA"

In [None]:
?dna.count

In [None]:
dna.count('TA')

In [None]:
print("Hello","World",sep='...',end='!!')
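The `file` argument is not shown in the cell above. The following is a minimal sketch of how it can be used; the in-memory `io.StringIO` buffer and the names `buffer` and `captured` are assumptions chosen for illustration, and an open file object would be used the same way.

```python
import io

# An in-memory text buffer stands in for a real file here (an assumption for
# illustration); print() accepts any object that has a write() method.
buffer = io.StringIO()
print("Hello", "World", sep="-", end="!\n", file=buffer)

# Nothing was shown on screen; the text went into the buffer instead.
captured = buffer.getvalue()
print(repr(captured))
```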

In IPython or Jupyter, you can find the additional arguments and help on the usage of print, or of any other function, by placing a ? before its name.

In [None]:
?print

## String Formatting

There are lots of methods for formatting and manipulating strings built into Python. Some of these are illustrated here.

String concatenation is the "addition" of two strings. Observe that while concatenating there will be no space between the strings.

In [None]:
string1='World'
string2='!'
print('Hello' + " "+ string1 + string2 + str(267.00))

The **%** operator formats a string by inserting the value that comes after it. It relies on the string containing a format specifier that identifies where to insert the value. The most common format specifiers are:

    - %s -> string
    - %d -> integer
    - %f -> float
    - %o -> octal
    - %x -> hexadecimal
    - %e -> exponential

In [None]:
print("Hello %s" % string1)
print("Actual Number = %d" %18)
print("Float of the number = %.3f" % 18.87687)
print("Exponential equivalent of the number = %e" %18)
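The octal and hexadecimal specifiers from the list are not used in the cell above. A small sketch, with the variable names chosen here purely for illustration, shows what they produce:

```python
number = 255

octal_text = "%o" % number      # base 8
hex_text = "%x" % number        # base 16, lowercase digits
exp_text = "%e" % 1234.5        # scientific notation, 6 decimal places by default

print("Octal: %s, hexadecimal: %s" % (octal_text, hex_text))  # Octal: 377, hexadecimal: ff
print("Exponential:", exp_text)                               # Exponential: 1.234500e+03
```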

When referring to multiple variables, parentheses are used. Values are inserted in the order they appear in the parentheses (more on tuples in the next lecture).

In [None]:
print("Hello %s %s. The meaning of life is %d" %(string1,string2,42))

We can also specify the width of the field and the number of decimal places to be used. For example:

In [None]:
print('Print width 10: |%10s|'%'my') # right justified (the default)
print('Print width 10: |%-10s|'%'name') # left justified
print("The number pi = %.2f to 2 decimal places"%3.1415)
print("More space pi = %10.2f"%3.1415)
print("Pad pi with 0 = %010.2f"%3.1415) # pad with zeros

## Other String Methods

Multiplying a string by an integer simply repeats it:

In [None]:
print("Hello World! "*5)

Strings can be transformed by a variety of methods. We start with a simple example string and return to our trna example afterwards.

In [None]:
s="hello wOrld"
print(s.capitalize())
print(s.upper())
print(s.lower())
print('|%s|' % "Hello World".center(30)) # center in 30 characters
print('|%s|'% "     lots of space             ".strip()) # remove leading and trailing whitespace
print("Hello World".replace("World","Class"))

There are also lots of ways to inspect or check strings. Examples of a few of these are given here:

In [None]:
help(str)

In [None]:
trna='AAGGGCTTAGCTTAATTAAAGTGGCTGATTTGCGTTCAGTTGATGCAGAGTGGGGTTTTGCAGTCCTTA'
print("The length of the sequence is %i" % len(trna),"nucleotides") # len() gives length

# count occurrences of a substring
print("There are %d G's but only %d C's in the sequence" % (trna.count('G'),trna.count('C')))
print('The "ATTAA" motif is at index',trna.find('ATTAA')) # index counted from 0, or -1 if not found
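A few more inspection methods can be sketched in the same spirit. The short sequence `seq` and the result names below are made up for illustration; all methods used are standard `str` methods.

```python
seq = "AAGGGCTTAGC"  # a short, hypothetical sequence used only for this sketch

starts = seq.startswith("AAG")   # does the string begin with this prefix?
ends = seq.endswith("AGC")       # does it end with this suffix?
all_upper = seq.isupper()        # are all cased characters uppercase?
only_letters = seq.isalpha()     # does it contain letters only?
missing = seq.find("XYZ")        # find() returns -1 when nothing matches

print(starts, ends, all_upper, only_letters, missing)  # True True True True -1
```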

### Exercise

Calculate the % GC and % AT content in the trna sequence

In [11]:
A_count=trna.count('A')
C_count=trna.count('C')
G_count=trna.count('G')
T_count=trna.count('T')
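One possible way to finish the exercise from these counts is sketched below. It assumes the sequence contains only the letters A, C, G and T, and the names `gc_percent` and `at_percent` are chosen here for illustration.

```python
# Sequence copied from the earlier cell so that this sketch runs on its own.
trna = 'AAGGGCTTAGCTTAATTAAAGTGGCTGATTTGCGTTCAGTTGATGCAGAGTGGGGTTTTGCAGTCCTTA'

A_count = trna.count('A')
C_count = trna.count('C')
G_count = trna.count('G')
T_count = trna.count('T')

total = len(trna)
gc_percent = 100 * (G_count + C_count) / total
at_percent = 100 * (A_count + T_count) / total

# '%%' inside a format string prints a literal percent sign.
print("GC content = %.1f%%" % gc_percent)
print("AT content = %.1f%%" % at_percent)
```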

## String comparison operations
Strings can be compared in lexicographical order with the usual comparisons. In addition the `in` operator checks for substrings:

In [None]:
'abc' < 'bbc' <= 'bbc'

In [13]:
"ABC" in "This is the ABC of Python"

True
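Both kinds of comparison are case sensitive, which is easy to overlook. The sketch below uses made-up example strings; converting both sides with `lower()` is one common way to compare while ignoring case.

```python
# Uppercase letters sort before lowercase ones, because comparison uses
# character codes ('Z' is 90, 'a' is 97).
upper_first = "Zebra" < "apple"
same_case = "zebra" < "apple"

# The `in` operator is case sensitive as well.
found_raw = "abc" in "This is the ABC of Python"
found_folded = "abc" in "This is the ABC of Python".lower()

print(upper_first, same_case)    # True False
print(found_raw, found_folded)   # False True
```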

## Accessing parts of strings

Strings can be indexed with square brackets. Indexing starts from zero in Python. 

In [None]:
s = 'AAGGGCTTAGCTTAATTAAAGTGGCTGATTTGCGTTCAGTTGATGCAGAGTGGGGTTTTGCAGTCCTT'
print('First nucleotide of the sequence is',s[0])
print('Last nucleotide of the sequence is',s[len(s)-1])

Negative indices can be used to start counting from the back

In [None]:
print('First nucleotide of the sequence is',s[-len(s)])
print('Last nucleotide of the sequence is',s[-1])

#### Slicing
Finally, a substring (a range of characters) can be specified using $a:b$, which selects the characters at indices $a,a+1,\ldots,b-1$. Note that the character at index $b$ is *not* included. Now we can find the first codon in the sequence:

In [None]:
print("First codon in the sequence is",s[0:3])
print("The second codon in the sequence is",s[3:6])

An empty beginning and end of the range denotes the beginning/end of the string:

In [None]:
print("First codon in the sequence is", s[:3])
print("Last codon in the sequence is", s[-3:])
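Because the end index is excluded, the two slices `s[:k]` and `s[k:]` always split a string into two pieces with no overlap and nothing missing. A short sketch, using a shortened, hypothetical sequence:

```python
s = "AAGGGCTTAGC"  # shortened example sequence, for illustration only
k = 3

head = s[:k]   # characters 0 .. k-1
tail = s[k:]   # characters k .. end

print(head, tail)          # AAG GGCTTAGC
print(head + tail == s)    # True

# Slice bounds beyond the end of the string are clipped rather than raising an error.
clipped = s[3:100]
print(clipped)             # GGCTTAGC
```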

A colon without any index returns the whole string.

In [None]:
s[:]

## Strings are immutable

It is important to remember that strings are constant, immutable values in Python. While new strings can easily be created, it is not possible to modify an existing string:

In [None]:
s='012345'
sX=s[:2]+'X'+s[3:] # this creates a new string with 2 replaced by X
print("creating new string",sX,"OK")
sX=s.replace('2','X') # the same thing
print(sX,"still OK")
s[2] = 'X' # an error!!! (TypeError: strings do not support item assignment)
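To see the error without stopping the notebook, the assignment can be wrapped in `try`/`except`. This is only a sketch: exception handling may not have been covered yet, and the name `message` is chosen here for illustration.

```python
s = "012345"

try:
    s[2] = "X"  # strings do not allow assignment to an index
except TypeError as err:
    message = str(err)
    print("Cannot modify a string in place:", message)

print(s)  # the original string is unchanged: 012345
```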

### Exercise:

1. Given the following amino acid sequence (MNKMDLVADVAEKTDLSKAKATEVIDAVFA), find the first, last and the 5th amino acids in the sequence. 
2. The above amino acid sequence is from a bacterial restriction enzyme that recognizes "TCCGGA". Find the first restriction site in the following sequence: AAAAATCCCGAGGCGGCTATATAGGGCTCCGGAGGCGTAATATAAAA