# for loop

In [2]:
seq = 'GACTACAGAGACGCGATATCGATAGAGAGACACACTATATGA'

# Initialize GC counter
n_gc = 0

# Initialize sequence length
len_seq = 0

# Iterate through the sequence and count G's and C's
for base in seq:
    len_seq += 1
    if base in 'GCgc':
        n_gc += 1
        
# Get GC content by dividing
n_gc/len_seq

0.42857142857142855

Python has a built-in function, len, that returns the length of a sequence, so we don't need to count it ourselves.

In [3]:
len(seq)

42

In [4]:
seq = 'GACTACAGAGACGCGATATCGATAGAGAGACACACTATATGA'

# Initialize GC counter
n_gc = 0

# Iterate through the sequence and count G's and C's
for base in seq:
    if base in 'GCgc':
        n_gc += 1
        
# Get GC content by dividing
n_gc/len(seq)

0.42857142857142855

In [5]:
my_integers = [1, 2, 3, 4, 5]

for n in my_integers:
    n *= 2
    
my_integers

[1, 2, 3, 4, 5]

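This doesn't work. On each pass, the loop variable n is bound to the next member of the list; n *= 2 rebinds n to a new value, but that value is never stored back in the list. To change the list itself, we need to iterate over its indices instead.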
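A minimal sketch of why the list is left untouched, using illustrative names (values, doubled_copies) that are not part of the notebook above. The doubled numbers exist only as new objects bound to n; they show up only if we store them somewhere ourselves.

```python
values = [1, 2, 3]
doubled_copies = []

for n in values:
    n *= 2                    # rebinds the name n; values is not modified
    doubled_copies.append(n)  # store the new value only so we can inspect it

print(values)          # [1, 2, 3]
print(doubled_copies)  # [2, 4, 6]
```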

In [8]:
for i in range(5):
    print(i)

0
1
2
3
4


In [9]:
for i in range(2,10):
    print(i)

2
3
4
5
6
7
8
9


In [10]:
for i in range(2, 10, 2):
    print(i)

2
4
6
8


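range takes its arguments the same way slicing does: range(start, stop, step) includes start, excludes stop, and advances by step.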
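To make the comparison with slicing concrete, here is a small sketch that uses the same start, stop, and step in both forms. The name digits is only for illustration.

```python
digits = list(range(10))  # [0, 1, 2, ..., 9]

# Same three numbers in both cases: start is included, stop is excluded
from_range = list(range(2, 10, 2))
from_slice = digits[2:10:2]

print(from_range)  # [2, 4, 6, 8]
print(from_slice)  # [2, 4, 6, 8]
```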

In [6]:
for i in range(5):
    my_integers[i] *= 2
    
my_integers

[2, 4, 6, 8, 10]

In [11]:
# Find index of all G's in a sequence

len_seq = 0

for base in seq:
    if base == 'G':
        print(len_seq)
    len_seq += 1


0
7
9
12
14
20
24
26
28
40


This works, but it's cumbersome to track the index by hand in len_seq and increment it on every pass. The built-in enumerate gives us the index together with each item.

In [12]:
for i, base in enumerate(seq):
    if base == 'G':
        print(i)

0
7
9
12
14
20
24
26
28
40


In [15]:
# Let's use enumerate in a previous example
my_integers = [1, 2, 3, 4, 5]

for i, _ in enumerate(my_integers): # Style: we name variables _ when we plan to ignore them
    my_integers[i] *= 2
    
my_integers

[2, 4, 6, 8, 10]

Now we don't have to figure out how long the sequence is in order to use range. (We could use len to find that out, but len does not work with certain types of data structures, so enumerate is more general.)
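One example of a data structure without a length is a generator, which produces its items one at a time. The sketch below, with the illustrative name squares, shows len failing on a generator while enumerate still pairs each item with its index.

```python
# A generator yields items one at a time and has no length
squares = (n * n for n in range(4))

try:
    len(squares)
except TypeError as err:
    print('len failed:', err)

# enumerate still pairs each item with its index
indexed = list(enumerate(squares))
print(indexed)  # [(0, 0), (1, 1), (2, 4), (3, 9)]
```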

In [18]:
names = ('Dunn', 'Ertz', 'Lavelle', 'Rapinoe')
positions = ('D', 'MF', 'MF', 'F')
numbers = (19, 8, 16, 15)

# Print a list of jersey number, name, position

for num, pos, name in zip(numbers, positions, names):
    print(num, name, pos)

19 Dunn D
8 Ertz MF
16 Lavelle MF
15 Rapinoe F


In [19]:
count_up = ('ignition', 1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

for count in reversed(count_up):
    print(count)

10
9
8
7
6
5
4
3
2
1
ignition


# while loop

In [20]:
seq = 'ACUGAUCGUAGUCACUCUGAAUGAUAGGAGAGUGAUGAU'

In [23]:
start_codon = 'AUG'

# Initialize sequence index
i = 0

# Scan the sequence until I hit the start codon
while seq[i:i+3] != start_codon:
    i += 1
    
print('The start codon starts at index', i)

The start codon starts at index 20


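The problem arises when the sequence contains no AUG. Once i passes the end of the sequence, the slice seq[i:i+3] is an empty string, which never equals the start codon, so the while loop becomes an infinite loop!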
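The loop never raises an error on its own because slicing past the end of a string is allowed. A short sketch with a hypothetical sequence, short_seq, that has no start codon:

```python
short_seq = 'ACUGA'  # hypothetical sequence with no start codon

# Slices that run past the end are truncated rather than raising an error
partial = short_seq[3:6]
beyond = short_seq[10:13]

print(repr(partial))  # 'GA'
print(repr(beyond))   # ''

# An empty string never equals 'AUG', so the loop condition stays True
print(beyond != 'AUG')  # True
```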

In [25]:
start_codon = 'AUG'

# Initialize sequence index
i = 0

# Scan the sequence until I hit the start codon
while seq[i:i+3] != start_codon and i < len(seq):
    i += 1
    
print('The start codon starts at index', i)

The start codon starts at index 20


In [29]:
seq = 'ACUGAUCGUAGUCACUCUGAAGAUAGGAGAGUGAGAU'

# Initialize sequence index
i = 0

# Scan the sequence until I hit the start codon
while seq[i:i+3] != start_codon and i < len(seq):
    i += 1
    
if i == len(seq):
    print('Start codon was not found.')
else:
    print('The start codon starts at index', i)

Start codon was not found.


If you know how many times you need to do something, use a **for** loop. If you don't know, use a **while** loop.
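As a rough side-by-side sketch of that rule, the example below uses a hypothetical sequence, mrna. Counting bases visits every character exactly once, so a for loop fits. Reading codons until a stop codon appears takes an unknown number of steps, so a while loop fits, with a bound on i to rule out an infinite loop.

```python
mrna = 'AUGGCCUAAGGA'  # hypothetical sequence for illustration

# Known number of iterations: visit every base exactly once
n_a = 0
for base in mrna:
    if base == 'A':
        n_a += 1

# Unknown number of iterations: step codon by codon until UAA or the end
i = 0
while i + 3 <= len(mrna) and mrna[i:i+3] != 'UAA':
    i += 3

print(n_a)  # 4
print(i)    # 6
```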

In [30]:
%load_ext watermark
%watermark -v -p jupyterlab

CPython 3.7.7
IPython 7.13.0

jupyterlab 1.2.6
