# <font color="blue">1.01- Data Types</font>

In Python, a data type is a classification of data that tells the interpreter how the program intends to use the data.

<br>
<font color="red"><u><b>Learning Objectives</b></u>
<div class="panel-body">
<ul>
<li>Importance and use of complex data types.</li>
<li>Use of five data structures: Strings, Lists, Tuples, Dictionaries and Sets.</li>
</ul>
</div>
</font>

In general, there are two kinds of data types: primitive and complex (composite) data types.
The primitive types are the numeric types: integers, floating-point numbers, Booleans and complex numbers.

**Primitive types:**

<font color="blue">
* int
* float
* bool
* complex value

</font>

**Complex types:**

A complex data type is any data type that can be constructed in a program from the primitive data types and other composite types. Such types are also called data structures: particular ways of organizing data in a computer so that it can be used efficiently. Implementing one also requires **operations** that efficiently manipulate its instances. In the last week we will teach you how to create your own data structures, but the following are the most commonly used built-in ones.

<font color="blue">
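* string: an ordered sequence of characters
* list: a finite, ordered sequence of items that can be modified
* tuple: a finite, ordered sequence of items that cannot be modified
* dictionary: a mapping from keys to values (also called an associative array). Its members have the form *(key, value)*.
* set: an unordered collection of distinct objects.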
</font>
Not all data types handle changes the same way. Some are **mutable**: they can be altered after their instantiation.  Others are **immutable**: they cannot be altered. 

**All the primitive types, strings and tuples are immutable, while lists, dictionaries and sets are mutable.**
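To see the difference without reading raw memory addresses, you can compare object identities with the `is` operator. In this small sketch (the names `n`, `old_n`, `items` and `same_items` are made up for illustration), an integer name is rebound to a new object while a list is modified in place:

```python
# Immutable: "changing" an int rebinds the name to a new object
n = 1000
old_n = n                   # a second reference to the original int object
n += 1                      # creates a new int (1001); old_n still refers to 1000
print(n is old_n)           # False: n now refers to a different object
print(old_n)                # 1000: the original object was not modified

# Mutable: a list is changed in place, so its identity is preserved
items = [1, 2, 3]
same_items = items          # a second name for the same list object
items.append(4)             # modifies the existing list object
print(items is same_items)  # True: still one and the same object
print(same_items)           # [1, 2, 3, 4]: the change is visible through both names
```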

### Examples

#### With a string and an integer

In [11]:
name = "Jordan"
age=18
id(name), id(age)

(139657315698256, 10914912)

#### After the change

In [12]:
name='Masakuna'
age=20
id(name), id(age)

(139657306824944, 10914976)

#### With a float and a complex value

In [13]:
height=1.99
z=2+3j
id(height), id(z)

(139657429462472, 139657316056656)

In [14]:
height+=0.8 # height=height+0.8
z=3+6j
id(height), id(z), id(z.real), id(z.imag)

(139657429462520, 139657316056752, 139657429462472, 139657429462472)

#### With a tuple

In [15]:
countries=("Benin", "Ghana", "Botswana", "Zimbabwe")
id(countries)

139657306779304

##### Once a tuple is defined, you can neither change its existing values nor add new ones

In [None]:
countries[1]="Namibia"  # raises TypeError: 'tuple' object does not support item assignment

#### With a list

In [16]:
students=['Timothy', 'Jeromy', 'Lambert']
id(students)

139657306827592

#### After the change

In [17]:
students[2]='Yasser'
id(students)

139657306827592

In [20]:
students.append('Siyabonga')
id(students)

139657306827592

#### With a dictionary

In [22]:
student_information={"Siyabonga":['South Africa', 'Pure Math', 26, 'English'], 
                    "Kelone":['Botswana', 'Applied Math', 24, 'English']}
student_information

{'Kelone': ['Botswana', 'Applied Math', 24, 'English'],
 'Siyabonga': ['South Africa', 'Pure Math', 26, 'English']}

In [23]:
id(student_information)

139657375220616

#### After the change

In [24]:
student_information["Farid"]=['Algeria', 'Pure Math', 26, 'Arabic']
id(student_information)

139657375220616

## <font color="blue">A. Strings</font>

To start understanding the string type, let's use the built-in help system.

In [None]:
help(str)

The help page for string is very long, and it may be easier to keep it open
in a browser window by going to the [online Python
documentation](http://docs.python.org/library/stdtypes.html#sequence-types-str-unicode-list-tuple-bytearray-buffer-xrange)
while we talk about its properties.

At its heart, a string is just a sequence of characters. Basic strings are
defined using single or double quotes.

In [None]:
s = "This is a string."
s2 = 'This is another string that uses single quotes'

The reason for having two types of quotes to define a string is
emphasized in these examples:

In [None]:
s = "Bob's mom called to say hello."
s = 'Bob's mom called to say hello.'

The second one is an error: Python reads `'Bob'` as a complete string, so the rest of the line (`s mom called to say hello.'`) is not valid Python and raises a `SyntaxError`.
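If you need quotes inside a string, there are a few standard options. This sketch (variable names are just for illustration) shows the mixed-quote approach above, a backslash escape, and triple quotes:

```python
s1 = "Bob's mom called to say hello."    # double quotes around an apostrophe
s2 = 'Bob\'s mom called to say hello.'   # a backslash escapes the single quote
s3 = """She said "hi" to Bob's mom."""   # triple quotes allow both kinds of quote
print(s1 == s2)  # True: both contain exactly the same characters
print(s3)        # She said "hi" to Bob's mom.
```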



### Working with Strings

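Strings are sequences (and therefore iterables), which means their characters are stored in order and can be looped over or accessed by position. For instance, characters can be accessed individually or in sequences (slices):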

In [None]:
s = 'abcdefghijklmnopqrstuvwxyz'
s[0]

In [None]:
print(s[-1])

In [None]:
s[1:4]
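Slicing follows the pattern `s[start:stop:step]`: `start` is included, `stop` is excluded, and any of the three parts can be left out. A few more examples on the same alphabet string, with the printed result in each comment:

```python
s = 'abcdefghijklmnopqrstuvwxyz'
print(s[:3])      # abc: omitting start means "from the beginning"
print(s[23:])     # xyz: omitting stop means "to the end"
print(s[0:10:2])  # acegi: every 2nd character among indices 0 to 9
print(s[::5])     # afkpuz: every 5th character of the whole string
print(s[::-1])    # zyxwvutsrqponmlkjihgfedcba: a negative step walks backwards
```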

#### What will happen with the following command?

In [None]:
s[2]='j'
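Because strings are immutable, item assignment like the one above raises a `TypeError`. A sketch of the usual workaround, which builds a new string instead (the names `t` and `u` are illustrative):

```python
s = 'abcdefghijklmnopqrstuvwxyz'
# Slice around index 2 and glue in the new character
t = s[:2] + 'j' + s[3:]
print(t[:5])    # abjde
# str.replace also returns a new string (it replaces every occurrence)
u = s.replace('c', 'j')
print(t == u)   # True here, because 'c' occurs only once in s
print(s[:5])    # abcde: the original string is unchanged
```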

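Strings can also be compared with the equality operator `==` and the ordering operators `<` and `>`, which compare them alphabetically (lexicographically).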

In [None]:
'str1' == 'str2'

In [None]:
'str1' == 'str1'

In [None]:
'str1' < 'str2'
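Ordering compares strings character by character using each character's Unicode code point (available via `ord`), which is why uppercase letters sort before lowercase ones. A quick sketch:

```python
print('str1' < 'str2')     # True: the first difference is '1' versus '2'
print('apple' < 'banana')  # True: 'a' comes before 'b'
print(ord('Z'), ord('a'))  # 90 97
print('Zebra' < 'apple')   # True: 'Z' (90) has a smaller code point than 'a' (97)
ordered = sorted(['pear', 'Apple', 'banana'])
print(ordered)             # ['Apple', 'banana', 'pear']
```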

**Hands-on example**

Try each of the following functions on a few strings. What does the
function do?

In [None]:
s = "This is a string"

In [None]:
s.startswith("This")

In [None]:
s.split(" ")

In [None]:
s.strip() # Also try it on a string with spaces at both ends: it won't change every string!

In [None]:
s.capitalize()

In [None]:
s.lower()   #.lower() string method

In [None]:
s.upper()   #.upper() string method

Investigate what the `.count()` and `.find()` string methods do and test them. The `.replace()` string method is also worth a look.




Strings also support operations such as concatenation.

In [None]:
firstname = "Jordan"
lastname = "Masakuna"

To concatenate strings, you must explicitly use the concatenation operator `+`. Python joins the strings exactly as they are; it does not add a space between words for you.

In [None]:
fullname = firstname + lastname
print (fullname)

In [None]:
fullname = firstname + " " + lastname
print (fullname)
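Two other common ways to build the same string are `str.join` and f-strings (Python 3.6+). Note also that `+` only works between strings, so numbers must be converted with `str()` first. A sketch (the `age` value here is just an example):

```python
firstname = "Jordan"
lastname = "Masakuna"
age = 18  # example value

joined = " ".join([firstname, lastname])  # the separator goes between the pieces
formatted = f"{firstname} {lastname}"      # expressions inside {} are substituted
print(joined)                              # Jordan Masakuna
print(formatted)                           # Jordan Masakuna
print(firstname + " is " + str(age))       # Jordan is 18 (str() converts the int)
```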

### Exercise

Jordan's father's name is Augustin. Print his full name, including his title, in the same format as this example: Jordan Masakuna, Mr.

### Bonus Exercise: Transcribe DNA to RNA
#### Motivation:
During transcription, an enzyme called RNA Polymerase reads the DNA sequence and creates a complementary RNA sequence. Furthermore, RNA has the nucleotide uracil (U) instead of thymine (T). 
#### Task:
Write a function that mimics transcription. The input argument is a string that contains the letters A, T, G, and C. Create a new string following these rules: 

* Convert A to U

* Convert T to A

* Convert G to C

* Convert C to G

Hint: You can iterate through a string with a for loop, just as you loop through a list.

In [None]:
def transcribe(seq):
    # Your code here: build and return the RNA string for seq
    pass

Check your work:

In [None]:
transcribe('ATGC') == 'UACG'

In [None]:
transcribe('ATGCAGTCAGTGCAGTCAGT') == 'UACGUCAGUCACGUCAGUCA'

# <font color="blue">B. Lists</font>

Python would be a far less useful language without its compound data types. As mathematicians and scientists, you will also find **NumPy arrays** useful, because they are designed to handle numeric data; we will cover them later.

A list is an ordered, indexable collection of data. Let's say you have collected some current and voltage data that looks like this:

    voltage:
        -2.0
        -1.0
        0.0
        1.0
        2.0

    current:
        -1.0
        -0.5
        0.0
        0.5
        1.0

So you could put that data into lists like:

In [None]:
voltage = [-2.0, -1.0, 0.0, 1.0, 2.0]

current = [-1.0, -0.5, 0.0, 0.5, 1.0]

**voltage** is of type list:

In [None]:
type(voltage)

In [None]:
# Find the value of the third item (indices start at 0)
voltage[2]

Lists can be indexed from the back using a negative index. The last item of current


In [None]:
current[-1]

and the next-to-last

In [None]:
current[-2]

You can "slice" items from within a list. Let's say we want the second through fourth items of voltage.



In [None]:
voltage[1:4]


As you can see, this slice returns all items whose index $i$ satisfies $1 \le i < 4$. You can leave the second index blank to get all remaining items. For example, the items from the third one to the end can be obtained with

In [None]:
voltage[2:]
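Lists use the same `[start:stop:step]` slicing rules as strings. A short sketch on a copy of the voltage values (the name `readings` is illustrative, so `voltage` itself is left alone):

```python
readings = [-2.0, -1.0, 0.0, 1.0, 2.0]  # same values as voltage
print(readings[:2])    # [-2.0, -1.0]: the first two items
print(readings[-2:])   # [1.0, 2.0]: the last two items
print(readings[::2])   # [-2.0, 0.0, 2.0]: every second item
print(readings[::-1])  # [2.0, 1.0, 0.0, -1.0, -2.0]: reversed copy
```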

### Exercise

Power is defined as voltage multiplied by current. What is the power for the second entry? For the fourth? Print a list that stores the power for all the entries.

### Append and Extend

Just like strings have methods, lists do too.

In [None]:
dir(list)

One useful method is `append`. Let's say we want to add the following data to the end of both our lists:

    voltage:
        3.0
        4.0
        
    current:
        1.5
        2.0

If you want to append items to the end of a list, use the append method.

In [None]:
voltage.append(3.0)

In [None]:
voltage.append(4.0)
voltage

You can see how that approach might be tedious in certain cases. If you want to concatenate a list onto the end of another one, use extend.

In [None]:
current.extend([1.5, 2.0])

In [None]:
current
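Be careful not to confuse the two methods: `append` adds its argument as a single element, even when that argument is a list, whereas `extend` adds each element of the argument. A small sketch with illustrative names:

```python
with_append = [1.5, 2.0]
with_append.append([3.0, 4.0])  # the whole list becomes ONE new element
print(with_append)              # [1.5, 2.0, [3.0, 4.0]]
print(len(with_append))         # 3

with_extend = [1.5, 2.0]
with_extend.extend([3.0, 4.0])  # each element is added separately
print(with_extend)              # [1.5, 2.0, 3.0, 4.0]
print(len(with_extend))         # 4
```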

Python lists can also be concatenated using the addition operator `+`:

In [None]:
[3, "Yes"] + ["AIMS", 6]
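Unlike `extend`, the `+` operator builds a new list and leaves both operands unchanged, while the augmented form `+=` extends the left-hand list in place. A sketch with illustrative names:

```python
first = [3, "Yes"]
second = ["AIMS", 6]

combined = first + second  # builds a brand-new list
print(combined)            # [3, 'Yes', 'AIMS', 6]
print(first)               # [3, 'Yes']: the operands are unchanged

alias = first              # a second name for the same list
first += second            # += extends the existing list in place
print(alias)               # [3, 'Yes', 'AIMS', 6]: alias sees the change
print(first is alias)      # True
```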

### Heterogeneous Data

Lists can contain heterogeneous data.

In [None]:
data = ["experiment: current vs. voltage", 
        "run", 47,
        "temperature", 372.756, 
        "current", [-1.0, -0.5, 0.0, 0.5, 1.0], 
        "voltage", [-2.0, -1.0, 0.0, 1.0, 2.0],
        ]

In [None]:
print(data)

We've got strings, ints, floats, and even other lists in there. 

### Exercise

Make a list that contains the following: your first name, your age as an integer, and your last name. Then, using that list, print a single string that looks like

```
<first name> <last name> is <age> years old.
```

### Length of Lists

Sometimes you want to know how many items are in a list. Use the built-in `len` function.

In [None]:
len(voltage)

### Assigning Variables to Other Variables

Something that might cause you headaches in the future is how Python handles assigning one variable to another. When you write `b = a`, both names refer to the same object; nothing is copied. If that object is mutable (like a list), changing it through one name also changes what you see through the other. Be careful about this.

In [None]:
a = [1, 2]

#### This does not copy the values: `b` refers to the same list object as `a`

In [None]:
b = a

In [None]:
a.append(10)

In [None]:
a, b

To get an independent copy instead, slice the whole list with `[:]`:

In [None]:
x = [1, 2]
y=x[:]
x.append(10)
x, y

### Or

In [None]:
import copy as cp

c = [1, 2]
d = cp.copy(c)
d.append(10)
c, d
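Both `x[:]` and `copy.copy` make a *shallow* copy: the outer list is new, but any nested lists are still shared. For nested data, `copy.deepcopy` copies recursively. A sketch with an illustrative nested list `grid`:

```python
import copy

grid = [[1, 2], [3, 4]]
shallow = copy.copy(grid)    # new outer list, but the inner lists are shared
deep = copy.deepcopy(grid)   # the inner lists are copied as well

grid[0].append(99)           # mutate an inner list of the original
print(shallow)               # [[1, 2, 99], [3, 4]]: shares grid[0]
print(deep)                  # [[1, 2], [3, 4]]: fully independent
```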

# <font color="blue">C. Tuples</font>

Tuples are one of Python's basic container data types. Tuples are **immutable**. Once data is placed into a tuple, the tuple cannot be changed. You define a tuple as follows:

In [None]:
tup = ("red", "white", "blue") 

In [None]:
type(tup)

In [None]:
for i in tup:
    print (i)
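Besides looping, two tuple features come up often: unpacking a tuple into separate names, and writing a one-element tuple, which needs a trailing comma. A sketch (the names are illustrative):

```python
tup = ("red", "white", "blue")

# Unpacking: one name per element (the counts must match)
first, second, third = tup
print(second)             # white

# Parentheses alone do not make a tuple; the comma does
not_a_tuple = ("purple")
one_tuple = ("purple",)
print(type(not_a_tuple))  # <class 'str'>
print(type(one_tuple))    # <class 'tuple'>

# Indexing and slicing work as for lists
print(tup[0], tup[1:])    # red ('white', 'blue')
```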

### Exercise 

Make a tuple of the _remaining colors_ (not red, white, and blue) of the South African flag. Can you figure out a way to combine the two groups of colors into a single tuple?

<img src=za_flag.png>

# <font color="blue">D. Dictionaries</font>

A Python dictionary is a collection of key-value pairs. (Since Python 3.7, dictionaries keep their items in insertion order, but you access them by key, not by position.) Dictionaries are probably the data type you will use most in everyday Python programming. The key is a way to name the data, and the value is the data itself. Here's a way to create a dictionary that holds the same data as our heterogeneous `data` list above, organized more sensibly than in a list.

In [None]:
data = {"experiment": "current vs. voltage",
        "run": 47,
        "temperature": 372.756, 
        "current": [-1.0, -0.5, 0.0, 0.5, 1.0], 
        "voltage": [-2.0, -1.0, 0.0, 1.0, 2.0],
        }

In [None]:
data

This model is clearly better because you no longer have to remember that the run number sits at index 2 of the list; you simply refer to the key "run":

In [None]:
data["run"]

If you wanted the voltage data list:

In [None]:
data["voltage"]

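Or perhaps you wanted the last element of the current data list:

In [None]:
data["current"][-1]

You can also change the value stored under an existing key, for example the temperature:

In [None]:
data["temperature"] = 3275.325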

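You can also add new keys to the dictionary by assigning a value to a key that does not exist yet. Note that dictionaries are indexed with square brackets, just like lists; they look the same, even though they work very differently. To see all the keys, use the `keys()` method: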

In [None]:
data.keys()

Similarly, `values()` returns all the values:

In [None]:
data.values()
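A few other dictionary operations are handy in practice: `in` tests whether a key exists, `get` returns a default instead of raising `KeyError` for a missing key, and `items()` yields `(key, value)` pairs for looping. This sketch uses a small separate dictionary, `sample`, so that `data` is left unchanged:

```python
sample = {"experiment": "current vs. voltage", "run": 47}

print("run" in sample)                    # True: `in` checks the keys
print(47 in sample)                       # False: 47 is a value, not a key
print(sample.get("operator", "unknown"))  # unknown: default for a missing key

for key, value in sample.items():         # (key, value) pairs, in insertion order
    print(key, "->", value)
```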

### Exercise

You have an additional voltage and current reading of 1.5 and 3, respectively. Add this observation to the `data` dictionary and show that it has been added. 

Now, print the _difference_ between the power observed in the last two readings.

# <font color="blue">E. Sets</font>

Most introductory Python courses do not cover sets this early (or at all), but I've found this data type to be useful. The Python set type is similar to the idea of a mathematical set: it is an unordered collection of unique things. Consider:

In [None]:
fruits = {"apple", "banana", "pear", "banana"}

Since sets contain only unique items, there's only one banana in the set fruits.

In [None]:
fruits
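Two related points are worth knowing: `{}` creates an empty *dictionary*, so an empty set must be written `set()`, and converting a list to a set is a quick way to drop duplicates (the original order is not kept). A sketch with illustrative data:

```python
print(type({}))       # <class 'dict'>: {} is an empty dictionary
empty = set()         # an empty set has to be created with set()
print(type(empty))    # <class 'set'>

grades = ["A", "B", "A", "C", "B"]
unique_grades = set(grades)   # duplicates are dropped
print(len(unique_grades))     # 3
print("C" in unique_grades)   # True: fast membership tests are a common use of sets
```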

In [None]:
id(fruits)

You can also add things to sets.

In [None]:
fruits.add('pineapple')
fruits

#### What will happen after the following command?

In [None]:
fruits.add('banana')
fruits

#### Mutability

In [None]:
id(fruits)

You can do things like intersections, unions, etc. on sets just like in math. Here's an example of an intersection of two sets (the common items in both sets).

In [None]:
A = {i for i in range(10)}
B = {i for i in range(1, 15, 2)}

In [None]:
A

In [None]:
B

#### Intersection

In [None]:
A & B

In [None]:
A.intersection(B)

#### Union

In [None]:
A | B 

In [None]:
A.union(B)

#### Difference and Symmetric difference

In [None]:
A.difference(B)

In [None]:
A.symmetric_difference(B)
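The difference and symmetric difference also have operator forms, `-` and `^`. This sketch rebuilds `A` and `B` with `set(range(...))` and prints sorted results, since the printed order of a set is not guaranteed:

```python
A = set(range(10))          # {0, 1, ..., 9}, same as the comprehension above
B = set(range(1, 15, 2))    # {1, 3, 5, 7, 9, 11, 13}

print(sorted(A & B))        # [1, 3, 5, 7, 9]
print(sorted(A - B))        # [0, 2, 4, 6, 8]: same as A.difference(B)
print(sorted(A ^ B))        # [0, 2, 4, 6, 8, 11, 13]: same as A.symmetric_difference(B)
print((A | B) - (A & B) == A ^ B)  # True: elements in exactly one of the two sets
print({1, 3} <= A)          # True: subset test, same as {1, 3}.issubset(A)
```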