# PART 4

## **[Dictionaries](https://docs.python.org/3/library/stdtypes.html#dict)**

In [106]:
my_first_dict = {}  #or another method: my_first_dict = dict()
print('dict: {}, type: {}'.format(my_first_dict, type(my_first_dict)))

dict: {}, type: <class 'dict'>


### Initialization

In [107]:
dict1 = {'key1': 2, 'key2': 4, 'key3': 6}
my_second_dict = dict(key1 = 2, key2 = 4, key3 = 6)

print(dict1)
print(my_second_dict)

print('equal: {}'.format(dict1 == my_second_dict))
print('length: {}'.format(len(dict1)))

{'key1': 2, 'key2': 4, 'key3': 6}
{'key1': 2, 'key2': 4, 'key3': 6}
equal: True
length: 3


In [111]:
my_dict = {
    'Subaru' : 'Japanese',
    'Tesla' : 'American',
    'Mercedes' : 'German',
}

print(my_dict)

{'Subaru': 'Japanese', 'Tesla': 'American', 'Mercedes': 'German'}


In [112]:
help(format)
help('FORMATTING')

Help on built-in function format in module builtins:

format(value, format_spec='', /)
    Return value.__format__(format_spec)
    
    format_spec defaults to the empty string.
    See the Format Specification Mini-Language section of help('FORMATTING') for
    details.

Format String Syntax
********************

The "str.format()" method and the "Formatter" class share the same
syntax for format strings (although in the case of "Formatter",
subclasses can define their own format string syntax).  The syntax is
related to that of formatted string literals, but there are
differences.

Format strings contain “replacement fields” surrounded by curly braces
"{}". Anything that is not contained in braces is considered literal
text, which is copied unchanged to the output.  If you need to include
a brace character in the literal text, it can be escaped by doubling:
"{{" and "}}".

The grammar for a replacement field is as follows:

      replacement_field ::= "{" [field_name] ["!" conversio

## `dict.keys(), dict.values(), dict.items()`

In [113]:
print('keys: {}'.format(my_second_dict.keys()))
print('values: {}'.format(my_second_dict.values()))
print('items: {}'.format(my_second_dict.items()))

keys: dict_keys(['key1', 'key2', 'key3'])
values: dict_values([2, 4, 6])
items: dict_items([('key1', 2), ('key2', 4), ('key3', 6)])


### How to access the values of a dictionary and how to set new values?

In [117]:
recipe_dict = dict()
recipe_dict['ratatouille'] = ['eggplant', 'tomato', 'zucchini', 'onion', 'olive oil', 'garlic']
recipe_dict['homemade pasta'] = ['semolina flour', 'olive oil', 'sea salt']
recipe_dict['homemade pasta'] = ['wheat', 'olive oil', 'sea salt']  # we are changing the value that exists
print(type(recipe_dict))
print(recipe_dict)
print('ingredients for homemade pasta: {}'.format(recipe_dict['homemade pasta']))


<class 'dict'>
{'ratatouille': ['eggplant', 'tomato', 'zucchini', 'onion', 'olive oil', 'garlic'], 'homemade pasta': ['wheat', 'olive oil', 'sea salt']}
ingredients for homemade pasta: ['wheat', 'olive oil', 'sea salt']


Accessing a nonexistent key will raise `KeyError`:

In [120]:
print(recipe_dict['soup'][0])

KeyError: 'soup'
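
One way to guard against the `KeyError` above is to test membership with `in`, or to catch the exception. The following is a minimal sketch; `first_ingredient` is a hypothetical helper name introduced only for this illustration, and the small `recipes` dictionary stands in for `recipe_dict`.

```python
recipes = {'homemade pasta': ['wheat', 'olive oil', 'sea salt']}

def first_ingredient(recipes, dish):
    """Return the first ingredient of `dish`, or None if the dish is unknown."""
    try:
        return recipes[dish][0]
    except KeyError:
        # the dish is not a key of the dictionary
        return None

print('soup' in recipes)                            # membership test, no exception
print(first_ingredient(recipes, 'soup'))            # None
print(first_ingredient(recipes, 'homemade pasta'))  # wheat
```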

### Deleting elements of dictionary

In [None]:
my_dict = {'key1': 3, 'key2': 9, 'key3': 15}
del my_dict['key3']
print(my_dict)

# First check: Does the key exist? (hints: pop() and popitem())
key_to_delete = 'key3'
if key_to_delete in my_dict:
    del my_dict[key_to_delete]
else:
    print('{key} is not in {dictionary}'.format(key=key_to_delete, dictionary=my_dict)) #remember format
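
Following the `pop()` hint above, here is a small sketch of deleting a key without checking for it first. It assumes we are happy to get `None` back when the key is missing; the second argument of `pop()` is the value returned in that case.

```python
my_dict = {'key1': 3, 'key2': 9}

# pop() with a default does not raise, even when the key is missing
missing = my_dict.pop('key3', None)
print(missing)   # None, because 'key3' was not present

removed = my_dict.pop('key2', None)
print(removed)   # 9
print(my_dict)   # {'key1': 3}
```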

In [None]:
squares = {1: 1, 2: 4, 3: 9, 4: 16, 5: 25}
print('dictionary before modification: ', squares)

#let's remove a given element using pop method 
print(squares.pop(1))
print(squares)

#let's remove the most recently inserted item (popitem() is last-in, first-out since Python 3.7)
print(squares.popitem()) # returns (key, value)
print(squares)

#let's remove all elements
squares.clear() #result {}
print(squares)

#let's delete the dictionary
del squares
print(squares) # raises NameError: the name squares no longer exists

### Dictionaries are mutable

Which other data types are mutable?

In [None]:
grades = {}.fromkeys(['week 1', 'week 2', 'week 3'], 88)
print(grades)

updated_grades = grades
grades['week 2'] = 90
grades['week 3'] = 92
grades['week 4'] = 82
print('grades: {}\nupdated_grades: {}'.format(grades, updated_grades))
print('equal: {}'.format(grades == updated_grades))

In [None]:
help(dict.fromkeys)

If we prefer a copy, let's make a new `dict`:

In [None]:
grades = {}.fromkeys(['week 1', 'week 2', 'week 3'], 88)
updated_grades = dict(grades)
grades['week 2'] = 90
grades['week 3'] = 92
print('grades: {}\nupdated_grades: {}'.format(grades, updated_grades))
print('equal: {}'.format(grades == updated_grades))
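
A caveat worth a quick sketch: `dict(grades)` makes a shallow copy. When the values are themselves mutable (lists here, which is an assumption made for this illustration), the copy still shares them with the original. `copy.deepcopy` from the standard library copies the nested objects too.

```python
import copy

grades = {'week 1': [88, 90], 'week 2': [75]}
shallow = dict(grades)          # new dict, but the lists inside are shared
deep = copy.deepcopy(grades)    # new dict and new lists

grades['week 1'].append(100)
print(shallow['week 1'])  # [88, 90, 100]
print(deep['week 1'])     # [88, 90]
```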

### `dict.get()`

Returns `None` if `key` is not in the `dict`. You can also specify a default value, which is returned instead of `None` when `key` is not present.

In [None]:
my_dict = {'a': 1, 'b': 2, 'c': 3}
d = my_dict.get('d')
print('d: {}'.format(d))

d = my_dict.get('d', 'temporary values for d')
print('d: {}'.format(d))

In [None]:
help(dict.get)
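
A common use of the default value is counting. The sketch below counts each base in a short sequence, using `0` as the default for bases we have not seen yet.

```python
seq = 'AATGATGAACGAC'
counts = {}
for base in seq:
    # get() returns 0 the first time a base is seen
    counts[base] = counts.get(base, 0) + 1
print(counts)  # {'A': 6, 'T': 2, 'G': 3, 'C': 2}
```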

## `dict.pop()`

In [None]:
enzymes = {}
enzymes['EcoRI'] = r'GAATTC'
enzymes['AvaII'] =  r'GG(A|T)CC'
enzymes['BisI'] =  r'GC[ATGC]GC'
print('enzymes dictionary before pop method: {}'.format(enzymes))


#let's remove an element using pop; it returns the removed value
EcoRI = enzymes.pop('EcoRI')
print('EcoRI: {}'.format(EcoRI))
print('enzymes dictionary after removing EcoRI: {}'.format(enzymes))



### `dict.setdefault()`

Returns the `value` of the `key` given as the first parameter. If the `key` does not exist in the dict, inserts `key` with the `default value` (second parameter) and returns that default.

In [None]:
new_dict = {'a': 1, 'b': 2, 'c': 3}
a = new_dict.setdefault('a', 'default value')
d = new_dict.setdefault('d', 'default value')
print('a: {}\nd: {}\nnew_dict: {}'.format(a, d, new_dict))
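
`setdefault()` is handy for grouping items into lists. In this sketch (the word list is made up for illustration) each word is filed under its first letter; the empty list is only inserted the first time a letter appears.

```python
words = ['apple', 'avocado', 'banana', 'blueberry', 'cherry']
by_letter = {}
for word in words:
    # returns the existing list for this letter, or inserts and returns a new one
    by_letter.setdefault(word[0], []).append(word)
print(by_letter)
# {'a': ['apple', 'avocado'], 'b': ['banana', 'blueberry'], 'c': ['cherry']}
```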

### `dict.update()`

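`update()` modifies the dictionary in place: 1) values of keys that already exist are overwritten, 2) keys that do not exist yet are added, so two `dictionaries` get combined.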

In [None]:
d = {1: "one", 2: "three"}
d1 = {2: "two"}

# updates the value of key 2
d.update(d1)
print(d)

d1 = {3: "three"}

# adds element with key 3
d.update(d1)
print(d)
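
If we would rather keep `d` unchanged, one option is to build a new dictionary by unpacking both. This is a sketch of an alternative to `update()`, not a replacement for it; when a key appears in both, the value from the right-hand dictionary wins.

```python
d = {1: 'one', 2: 'three'}
d1 = {2: 'two', 3: 'three'}

merged = {**d, **d1}   # new dict; d and d1 are left untouched
print(merged)  # {1: 'one', 2: 'two', 3: 'three'}
print(d)       # {1: 'one', 2: 'three'}
```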

## Keys of a `dict` must be immutable

Keys must be hashable, so we cannot use mutable data types such as a list or a dictionary as keys of a dictionary:

In [None]:
bad_dict = {['my_list']: 'value'}  # raises TypeError: unhashable type: 'list'

Values, on the other hand, can be mutable:

In [None]:
good_dict = {'my key': ['Python', ' can be', 'difficult']}
print(good_dict)
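
When we need a compound key, a tuple works because it is immutable and therefore hashable. A short sketch with made-up grid coordinates, which also shows the `TypeError` being caught instead of stopping the cell:

```python
grid = {(0, 0): 'origin', (1, 2): 'point A'}   # tuples are valid keys
print(grid[(1, 2)])  # point A

try:
    hash(['my_list'])          # lists are mutable, so they cannot be hashed
except TypeError as err:
    print('TypeError:', err)
```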

In [None]:
#Introduction to bioinformatics: Using dictionaries

dna = "AATGATGAACGAC"
dinucleotides = ['AA','AT','AG','AC',
                 'TA','TT','TG','TC',
                 'GA','GT','GG','GC',
                 'CA','CT','CG','CC']  # all 16 dinucleotides
all_counts = {}  #initialize dictionary
for dinucleotide in dinucleotides: #for each element in the list
    count = dna.count(dinucleotide) #count non-overlapping occurrences of that element
    print("count is " + str(count) + " for " + dinucleotide)
    all_counts[dinucleotide] = count #each dinucleotide becomes a key; its count is the value
print(all_counts)

# PART 5

## **[`for` loops](https://docs.python.org/3/tutorial/controlflow.html#for-statements)**

### For loops in lists 

Example: Our goal is to convert each value given in cm to inches and print the output.

info: 1 inch = 2.54 cm

Algorithm:
length_cm=[158, 165, 168, 172, 183, 190]

For each element in length_cm
          inc = element/2.54

          print cm and inch values
          Move to the next element

Every statement that belongs to the body of the for loop must have the same indentation.
The amount is not fixed by the language, but the convention is four spaces (or one <TAB>) per level.

In [None]:
print("------------------") # beginning of the output
length_cm = [158, 165, 168, 172, 183, 190]
for item in length_cm: # for each element in length_cm
    # entering for loop
    # things to do in for loop
    inc = item/2.54 # convert to inch
    print('value in cm: ', item, ',', 'value in inch: ', inc) # Print
    # exit for loop
# things to do outside for loop
print("------------------") # end of the output

### `break`
Using `break`, we can stop the loop as soon as a given condition is satisfied.

In [None]:
for item in length_cm:
    if item == 165:
        break
    print(item)

### `continue`
Continue to the next item without executing the lines occurring after `continue` inside the loop:

In [None]:
for item in length_cm:
    if item == 165:  # 165 is skipped; every other value is printed
        continue
    print(item)

### `enumerate()` 
The `enumerate()` function gives us both the index and the value of each element as we loop.

In [None]:
for idx, val in enumerate(length_cm):
    print('idx: {}, value: {}'.format(idx, val))

In [None]:
Turkish_consonants = "bcçdfgğhjklmnprsştvyz"
Turkish_consonants = list(Turkish_consonants)
print(Turkish_consonants)

count = 0
for item in Turkish_consonants:
    print(count, item)
    count +=1
#What happens if count+=1 is outside of the loop?

In [None]:
#We can rewrite the above for loop more neatly by using enumerate() function.

for count, item in enumerate(Turkish_consonants):
    print(count, item)


In [None]:
# Python program to illustrate
# enumerate function
l1 = ["Moderna","BioNTech","Sinovac"]
s1 = "available vaccines"

# creating enumerate objects
obj1 = enumerate(l1)
obj2 = enumerate(s1)

print("Return type:",type(obj1))
print(list(enumerate(l1)))

# changing start index to 2 from 0
print(list(enumerate(s1,2)))



## for loops for dictionaries

In [None]:
recipe_dict = {} 
recipe_dict['ratatouille'] = ['eggplant', 'tomato', 'zucchini', 'onion', 'olive oil', 'garlic']
recipe_dict['homemade pasta'] = ['semolina flour', 'olive oil', 'sea salt']
for key in recipe_dict.keys(): 
    print(key)

## `range()`

The `range()` function generates a sequence of integers, by default starting from 0 and increasing by 1 up to, but not including, the stop value.

In [None]:
for number in range(5):
    print(number) #range(5) returns 0-4, not 0-5.

In [None]:
for number in range(2, 5):
    print(number) 

#range() function's default initial value is 0,
#but we can specify a different initial value:
#range(2, 5) returns values from 2 to 5 (exclusive of 5):

In [None]:
for number in range(0, 10, 2):  # the third argument is the step size
    print(number)
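
Wrapping `range()` in `list()` is a quick way to see all the values at once. The sketch below repeats the three calls used above and adds a negative step as one more illustration.

```python
print(list(range(5)))          # [0, 1, 2, 3, 4]
print(list(range(2, 5)))       # [2, 3, 4]
print(list(range(0, 10, 2)))   # [0, 2, 4, 6, 8]
print(list(range(10, 0, -3)))  # [10, 7, 4, 1]  counting down, 0 is excluded
```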

In [None]:
# generate a random sequence
import random
seq = ''.join([random.choice('ACGT') for _ in range(10)])
print(seq)
print(len(seq))

# What is happening here?

In [None]:
for number in range(10): 
    print(number)

In [None]:
help(random.choice)

In [None]:
print(random.choice('ACGT'))

In [None]:
#Let's practice some

my_numbers = [1,3,5,7,9,12,19,21]

#which numbers in the `my_numbers` list are multiples of 3? Please answer with a for loop.

for item in my_numbers:
    if item%3 ==0:
        print(item)

In [None]:
#The sum of values in my_numbers list?

total  = 0
for item in my_numbers:
    total += item
print('total:', total)


In [None]:
for num in range(10, 20):  # loop through values between 10 and 20 (exclusive of 20)
    for i in range(2, num):  # try every candidate divisor from 2 up to num - 1
        if num % i == 0:
            j = num // i  # integer division keeps j an int
            print('%d is equal to %d * %d' % (num, i, j))
            break  # a divisor was found, move to the next number
    else:  # runs only if the inner loop finished without break
        print(num, ' is a prime number')
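
The `else` attached to a `for` loop runs only when the loop ends without hitting `break`. The same pattern can be wrapped in a function; `is_prime` is a hypothetical name for this sketch, and the trial-division approach is kept deliberately simple rather than fast.

```python
def is_prime(num):
    """Return True if num is prime, using the for/else pattern."""
    if num < 2:
        return False
    for i in range(2, num):
        if num % i == 0:
            break          # found a divisor, so the else branch is skipped
    else:
        return True        # loop finished without break: no divisor found
    return False

print([n for n in range(10, 20) if is_prime(n)])  # [11, 13, 17, 19]
```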

## The importance of pointers in Python: What is the point?

So far, we have seen that everything in Python is indeed an object. Each object contains at least three pieces of data:

    Reference count
    Type
    Value


A variable in Python does not hold a value directly; it refers to an object stored somewhere in memory. For example, after x = 1 the variable x refers to the memory location where the integer object 1 is stored.

### How do we find the memory address that the variable x points to?

    id() returns the object’s identity, which in CPython is its memory address.
    is returns True if and only if both operands are the same object (same id).
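
A short sketch of the difference between equality (`==`) and identity (`is`), using three names chosen for this illustration:

```python
a = [1, 2, 3]
b = a          # a second name for the same object
c = list(a)    # equal contents, but a different object

print(a == c, a is c)           # True False
print(a is b, id(a) == id(b))   # True True
```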


In [2]:
# Create a dictionary and populate with elements
x = {}
x['val1'] = list(range(1,10))
y = 1

print('address of x', id(x))
print('value of x', x)
print('address of y', id(y))
print('value of y', y)

address of x 140070764232136
value of x {'val1': [1, 2, 3, 4, 5, 6, 7, 8, 9]}
address of y 140071108720544
value of y 1


In [141]:
# reassign value to y
y = x['val1']
print('address of x', id(x))
print('value of x', x)
print('address of val1 in dict x', id(x['val1']))
print('address of y', id(y))
print('value of y', y)

address of x 139892595062824
value of x {'val1': [1, 2, 3, 4, 5, 6, 7, 8, 9]}
address of val1 in dict x 139892594668424
address of y 139892594668424
value of y [1, 2, 3, 4, 5, 6, 7, 8, 9]


In [142]:
y[2] = 99
print('address of x', id(x))
print('value of x', x)
print('address of val1 in dict x', id(x['val1']))
print('address of y', id(y))
print('value of y', y)

address of x 139892595062824
value of x {'val1': [1, 2, 99, 4, 5, 6, 7, 8, 9]}
address of val1 in dict x 139892594668424
address of y 139892594668424
value of y [1, 2, 99, 4, 5, 6, 7, 8, 9]


In [144]:
x['val1'] = list(range(1,10))
y = [i for i in x['val1']]
print('address of x', id(x))
print('value of x', x)
print('address of val1 in dict x', id(x['val1']))
print('address of y', id(y))
print('value of y', y)

address of x 139892595062824
value of x {'val1': [1, 2, 3, 4, 5, 6, 7, 8, 9]}
address of val1 in dict x 139892594669512
address of y 139892595492040
value of y [1, 2, 3, 4, 5, 6, 7, 8, 9]


In [145]:
y[2] = 99
print('address of x', id(x))
print('value of x', x)
print('address of val1 in dict x', id(x['val1']))
print('address of y', id(y))
print('value of y', y)

address of x 139892595062824
value of x {'val1': [1, 2, 3, 4, 5, 6, 7, 8, 9]}
address of val1 in dict x 139892594669512
address of y 139892595492040
value of y [1, 2, 99, 4, 5, 6, 7, 8, 9]


In [197]:
x = {}
x['val1'] = [0,1]
x

{'val1': [0, 1]}

In [198]:
x['val1'] = [0,1]
y = [x['val1']]*10 # a list that holds 10 references to the same inner list, not 10 copies
print('value of y', y)
print(id(y[0]), id(y[1])) # every element of y has the same memory address. Why?

y[0][0] = 0
y[0][1] = 0
y[1][0] = 1
y[1][1] = 1
y[2][0] = 2
y[2][1] = 2 # we were hoping to get [[0, 0], [1, 1], [2, 2], ...]. What happened?
print('value of y', y)
print('value of y', y)

value of y [[0, 1], [0, 1], [0, 1], [0, 1], [0, 1], [0, 1], [0, 1], [0, 1], [0, 1], [0, 1]]
139889966815688 139889966815688
value of y [[2, 2], [2, 2], [2, 2], [2, 2], [2, 2], [2, 2], [2, 2], [2, 2], [2, 2], [2, 2]]
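
One way to get independent inner lists is to build each one separately, for example with a list comprehension. This is a sketch with three rows instead of ten; `template` is a name introduced for the example.

```python
template = [0, 1]
y = [list(template) for _ in range(3)]   # three independent copies

for i, row in enumerate(y):
    row[0] = i
    row[1] = i

print(y)                             # [[0, 0], [1, 1], [2, 2]]
print(len({id(row) for row in y}))   # 3 distinct objects
```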


In [193]:
import pandas as pd
x = [[None, None]]
x = pd.DataFrame(x, columns=['gender','age'])
[david, brooke, bilge] = [x, x, x]

brooke.gender = 'female'
brooke.age = 35
bilge.gender = 'kadin'
bilge.age = 21
david.gender = 'male'
david.age = 105

In [194]:
y = ['david', 'brooke', 'bilge']
x = [david, brooke, bilge]
for i in range(len(x)):
    print(y[i])
    print(x[i])

david
  gender  age
0   male  105
brooke
  gender  age
0   male  105
bilge
  gender  age
0   male  105


In [191]:
print('memory address of david', id(david))
print('memory address of brooke', id(brooke))
print('memory address of bilge', id(bilge))

memory address of david 139889966847144
memory address of brooke 139889966847144
memory address of bilge 139889966847144


In [5]:
import pandas as pd
x = [[None, None]]
x = pd.DataFrame(x, columns=['gender','age'])
# initialize with copies
[david, brooke, bilge] = [x.copy(), x.copy(), x.copy()]

david.gender = 'male'
david.age = 43
brooke.gender = 'female'
brooke.age = 28
bilge.gender = 'female'
bilge.age = 79

In [189]:
y = ['david', 'brooke', 'bilge']
x = [david, brooke, bilge]
for i in range(len(x)):
    print(y[i])
    print(x[i])

david
  gender  age
0   male   43
brooke
   gender  age
0  female   28
bilge
   gender  age
0  female   79


In [192]:
print('memory address of david', id(david))
print('memory address of brooke', id(brooke))
print('memory address of bilge', id(bilge))

Because each name now refers to its own copy, the three ids printed by this cell differ from one another; the exact values vary from run to run.


# Functions

In [None]:
def hello_my_first_function():
    print('Hello world!')

print('type: {}'.format(type(hello_my_first_function)))

hello_my_first_function()  # Let's call the function

### Arguments

In [None]:
def key_Python_libraries(name1, name2, name3): #required arguments
    print('These are great libraries for data visualization in Python:  {}, {}, {}'.format(name1, name2, name3))

key_Python_libraries('matplotlib', 'seaborn', 'Bokeh')

In [None]:
# functions that return something we define 

def strip_and_lowercase(original):
    modified = original.lower().strip()
    return modified

ugly_string = '  MixED CaSe '
pretty = strip_and_lowercase(ugly_string)
print('pretty: {}'.format(pretty))

#In general, a function takes arguments (if any), performs some operations, and returns a value (or object). 
#The value that a function returns to the caller is generally known as the function’s return value. 
#All Python functions have a return value, either explicit or implicit. 
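
The implicit case can be seen with a function that has no `return` statement: calling it still produces a value, namely `None`. `shout` is a hypothetical function used only for this sketch.

```python
def shout(text):
    print(text.upper())   # prints, but has no return statement

result = shout('hello')   # prints HELLO
print(result)             # None
```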

In [None]:
def word_uppercase(word):
    modified = word.upper()
    return modified

mixed = 'AAgcgctgagtcTGC'
word_uppercase(mixed)

In [None]:
# Introduction to bioinformatics: Example of the use of dictionaries

dna = "AATGATGAACGAC"
dinucleotides = ['AA','AT','AG','AC',
                 'TA','TT','TG','TC',
                 'GA','GT','GG','GC',
                 'CA','CT','CG','CC']  # all 16 dinucleotides
all_counts = {}  #initialize the dictionary
for dinucleotide in dinucleotides: #for each element in the list
    count = dna.count(dinucleotide) #count non-overlapping occurrences of that element
    print("count is " + str(count) + " for " + dinucleotide)
    all_counts[dinucleotide] = count #each dinucleotide becomes a key; its count is the value
print(all_counts)


In [None]:
def dinucleotide_counting(seq):
    """This function gets a DNA sequence and returns the count of each dinucleotide"""
    #convert sequence to upper-case letters
    seq = seq.upper()
    counting = {k: 0 for k in dinucleotides}  # dict membership tests are faster than list lookups
    #Slides a window of width 2 along the sequence, so overlapping dinucleotides are counted
    for i in range(len(seq)-1):  # the last dinucleotide starts at index len(seq)-2
        if seq[i:i+2] in counting:
            counting[seq[i:i+2]] += 1
    return counting

In [None]:
dinucleotide_counting('AAgcgctgagtcTGC')

In [None]:
seq2 = 'AACTG'
range(len(seq2)-1)

In [None]:
#TGGA
b = {k: 0 for k in dinucleotides}
b


In [None]:
seq = 'AATGC'
seq[2:4]

In [None]:
dinucleotides = ['AA','AT','AG','AC',
                 'TA','TT','TG','TC', 
                 'GA','GT','GG','GC', 
                 'CA','CT','CG','CC']
dinucleotide_counting("AATGATGAACGAC")

### Keyword arguments

In [None]:
def first_algorithm(first, second, third):
    return (first + second)** third 

print(first_algorithm(2, 3, 4))

print(first_algorithm(first=2, second=3, third=4))

# using keyword arguments, we can change the ordering
print(first_algorithm(third=4, first=2, second=3))

# positional and keyword arguments can be mixed, but positional arguments must come first
print(first_algorithm(2, third=4, second=3))

# print(first_algorithm(third=4, second=3, 2))  # SyntaxError: positional argument follows keyword argument


### Default arguments 

In [None]:
def greet(name, msg="good morning!"):
    """
    This function greets
    the person with the
    provided message.

    If the message is not provided,
    it defaults to "good
    morning!"
    """

    print("Hello", name + ', ' + msg)


greet("Alex")
greet("Krishna", "how are you?")

**Do not use mutable objects as default arguments!**

In [None]:
def compute_patterns(inputs=[]):
    inputs.append("some stuff")
    patterns = ["a list is based on"] + inputs
    return patterns

In [None]:
compute_patterns()

In [None]:
def append(element, seq=[]):
    seq.append(element)
    return seq


In [None]:
append(1) # seq is assigned to []

# This returns a reference to the *same* list as the default for `seq`

In [None]:
append(4) # `seq` is now given [1] as a default!

Let's fix the above problem like this:

In [None]:
def append(element, seq=None):
    if seq is None:  
        seq = []
    seq.append(element)
    return seq

In [None]:
append(4)

In [None]:
#a more compact variant; note that it also replaces an empty list passed by the caller
def append(element, seq=None):
    seq = seq if seq else []
    seq.append(element)
    return seq

In [None]:
append(3)
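
The two fixes above are not fully equivalent. An empty list is falsy, so the `seq if seq else []` form swaps in a new list even when the caller passed one. The sketch below renames the two versions to `append_is_none` and `append_truthy` (hypothetical names) so both can be called side by side.

```python
def append_is_none(element, seq=None):
    if seq is None:
        seq = []
    seq.append(element)
    return seq

def append_truthy(element, seq=None):
    seq = seq if seq else []
    seq.append(element)
    return seq

mine_a = []
append_is_none(1, mine_a)
print(mine_a)   # [1]: the caller's list was extended

mine_b = []
append_truthy(1, mine_b)
print(mine_b)   # []: a new list was used, the caller's list is untouched

# neither version shares state between calls
print(append_is_none(4), append_truthy(4))   # [4] [4]
```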

### Docstrings
Python docstrings are the string literals that appear right after the definition of a function, method, class, or module. They are used to document our code.

In [None]:
filename = 'fastq_runid_687811ddee13bdfed6e08d8f5d403e432f41ebd8_0_0.fastq'

In [None]:
import bilge_pype as bpy

def load_file(filename, run_info=None):
    '''
    Checks the file type and loads the data
    '''
    file_format = filename.split('.')
    # check if it is a fasta file
    if ('fasta' in file_format) or ('fa' in file_format):
        A = bpy.read_fasta(filename)
    # check if it is a fastq file
    elif ('fastq' in file_format) or ('fq' in file_format):
        A = bpy.read_fastq(filename)
    # anything else is unsupported; fail early instead of hitting an unbound A
    else:
        raise ValueError('unsupported file format: {}'.format(filename))
    return A

load_file(filename)

In [None]:
def print_sum(val1, val2):
    """Print the sum of two given values"""
    print('total: {}'.format(val1 + val2))

print_sum(308, 1897)
help(print_sum)  # help() prints the docstring itself and returns None

### [`pass`](https://docs.python.org/3/reference/simple_stmts.html#the-pass-statement) statement
`pass` is a null statement. The interpreter does not skip it the way it skips a comment, but executing it does nothing. It is useful as a placeholder when a function body is syntactically required but you plan to write the implementation later.

In [None]:
def my_function(bla_bla_bla):
    pass

def my_other_function():
    pass