<div align="center">
    <h1><a href="index.ipynb">Knowledge Discovery in Digital Humanities</a></h1>
</div>

<div align="center">
    <h2>Class 06. Python III: Data types</h2>
    <img src="img/python.png" width="300">
</div>

###Table of contents

- [Basic data types](#Basic-data-types)
- [Strings](#Strings)
- [Lists](#Lists)
- [Tuples](#Tuples)
- [Dictionaries](#Dictionaries)

###Basic data types

####`int`
- Type for integer numbers
- Examples: `1`, `1234567890`

####`long`
- Type for long integer numbers of arbitrary size (Python 2 only; Python 3 merges it into `int`)
- Examples: `10`<sup>`1000`</sup> (a one followed by a thousand zeros)

####`float`
- Type for floating-point numbers
- Examples: `1.0`, `3.1416`

####`bool`
- Type for logical (Boolean) values
- Examples: `True`, `False`

####`str`
- Type for strings
- Examples: `'Hello world!'`, `"Hello world!"`

####The `type` function
Returns the type of a value, variable or expression.

Examples:

In [1]:
type(1)

int

In [2]:
x = 10**1000 + 1
type(x)

long

In [3]:
y = 3.1 + 2.21
type(y)

float

In [4]:
type(x == y)

bool

####Type conversion functions
- `int` converts to `int` type (if possible)
- `float` converts to `float` type (if possible)
- `bool` converts to `bool` type (if possible)
- `str` converts to `str` type (if possible)

In [5]:
int('123')

123

In [6]:
int(3.1416)

3

In [7]:
float(123)

123.0

In [8]:
float('3.1416')

3.1416

In [9]:
bool([1, 2, 3])

True

In [10]:
bool(0)

False

In [11]:
str(123)

'123'

In [12]:
str(not True)

'False'
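
The "(if possible)" qualifier matters: when a value cannot be converted, Python raises an exception instead of returning a result. A minimal sketch of that behaviour follows; `try_int` is a hypothetical helper introduced only for this illustration, not part of the class material.

```python
def try_int(value):
    """Return int(value), or None when the conversion is not possible.

    `try_int` is a hypothetical helper used only for this illustration.
    """
    try:
        return int(value)
    except (TypeError, ValueError):
        return None

converted = try_int('123')    # a string of digits converts cleanly
failed = try_int('abc')       # int('abc') raises ValueError
truncated = try_int(-3.99)    # floats are truncated toward zero, not rounded
```

Note that `int` truncates a float rather than rounding it, which is consistent with `int(3.1416)` giving `3` above.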

###Strings

####Notes on debugging
Common errors:
- Syntax:
    - not closing quotation marks, `''`/`""`
- Logic:
    - trying to modify a character in place (strings are immutable)
    - accessing a non-existent element: index out of range
- Semantic:
    - not accessing the first and/or last element
    - not considering the empty string, `''`/`""`

####Description
- A string is a sequence of characters
- Suitable to represent texts
- Type: `str`
- Examples: `'Hello world!'`, `"Hello world!"`

####Indices
- Three ways to access a string:
    - As a whole
    - One character
        - Syntax: `string[index]`
    - Slices
        - Syntax: `string[index_1: index_2]`

Example:

<table align="left">
    <tr><td>character</td><td><b>k</b></td><td><b>n</b></td><td><b>o</b></td><td><b>w</b></td><td><b>l</b></td><td><b>e</b></td><td><b>d</b></td><td><b>g</b></td><td><b>e</b></td></tr>
    <tr><td>index</td><td>0</td><td>1</td><td>2</td><td>3</td><td>4</td><td>5</td><td>6</td><td>7</td><td>8</td></tr>
</table>

In [13]:
word = 'knowledge'

In [14]:
word

'knowledge'

In [15]:
word[1]

'n'

In [16]:
word[2: 5]

'owl'
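
The "index out of range" error from the debugging notes can be reproduced with the same word. The sketch below assumes nothing beyond built-in string indexing; the variable names are chosen for illustration only.

```python
word = 'knowledge'

last = word[len(word) - 1]    # the last valid index is len(word) - 1
also_last = word[-1]          # negative indices count from the end

try:
    word[len(word)]           # one position past the end
    out_of_range = False
except IndexError:
    out_of_range = True

clipped = word[5:100]         # slices are clipped to the string, no error
```

Single-character access is strict about its index, while slicing silently stops at the end of the string.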

####Immutability
- Strings are immutable (cannot be modified)
- To modify a string, it is necessary to build a new string and reassign it to a new (or the same) variable
- Example: `word += 's'`
    - Equivalent to `word = word + 's'`
    - Accesses the value of the variable `word`, concatenates an `'s'`, and reassigns the result to the variable `word` again
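
As a small illustration of immutability, the sketch below first tries to change a character in place and then builds a new string from a slice instead. The variable `modified_in_place` is only there to record what happened.

```python
word = 'knowledge'

try:
    word[0] = 'K'             # strings do not support item assignment
    modified_in_place = True
except TypeError:
    modified_in_place = False

# Build a new string from a slice and rebind the name instead.
word = 'K' + word[1:]
```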

####Other functions and operators
- The `len` function returns the number of characters in a string
- The operator `in` checks if a string is contained in another string

Examples:

In [17]:
word = 'knowledge'

In [18]:
len(word)

9

In [19]:
'owl' in word

True

In [20]:
'abcd' in word

False

####Exercise 1
Write a function that receives a string and returns the number of characters (do not use the `len` function, use a `for` loop instead).

In [21]:
def length(s):
    counter = 0
    for c in s:
        counter += 1
    return counter

In [22]:
length('abcde')

5

####Exercise 2
What does this function do?

In [23]:
def a_function(string, char):
    result = -1
    index = 0
    while index < len(string):
        if string[index] == char:
            result = index
            break
        index += 1
    return result

In [24]:
a_function('knowledge', 'e')

5

In [25]:
a_function('knowledge', 'a')

-1

It returns the index of the first occurrence of `char` in `string`, or `-1` if `char` is not found.
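
For comparison, the built-in string method `find` follows the same convention, while `index` raises an error when nothing is found. This sketch is not part of the exercise; it only checks the built-ins against the results shown above.

```python
word = 'knowledge'

first_e = word.find('e')      # index of the first 'e'
missing = word.find('a')      # 'a' does not occur in the word

try:
    word.index('a')           # index() raises instead of returning -1
    index_raised = False
except ValueError:
    index_raised = True
```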

####Exercise 3
Write a function that counts the number of occurrences of a character in a string.

In [26]:
def count(s, ch):
    counter = 0
    for c in s:
        if c == ch:
            counter += 1
    return counter

In [27]:
count('knowledge', 'e')

2

In [28]:
count('knowledge', 'a')

0

###Lists

####Notes on debugging
Common errors:
- Syntax:
    - not closing brackets, `[]`
- Logic:
    - accessing a non-existent element: index out of range
- Semantic:
    - not accessing the first and/or last element
    - not considering the empty list, `[]`
    - modifying a list inside a function: side effects

####Description
- A list is a sequence of values
- Type: `list`
- Examples: `[1, 2, 3]`, `['a', 'b', 'c']`, `[1, 'abc', [], True]`

####Indices
- Same index system as strings
- Three ways to access a list:
    - As a whole
    - One element
        - Syntax: `list[index]`
    - Slices
        - Syntax: `list[index_1: index_2]`
- Range: `0 .. len(list) - 1`
- The index `[-1]` accesses the last element

Example:

<table align="left">
    <tr><td>element</td><td><b>1</b></td><td><b>2</b></td><td><b>3</b></td></tr>
    <tr><td>index</td><td>0</td><td>1</td><td>2</td></tr>
</table>

In [29]:
l = [1, 2, 3]

In [30]:
l[0]

1

In [31]:
l[-1]

3

####Mutability
- Lists are mutable (can be modified)

Example:

In [32]:
l = [1, 'abc', [], True]
l

[1, 'abc', [], True]

In [33]:
l[1] = 3.1416
l

[1, 3.1416, [], True]

####Other functions and operators
- `append(x)`: appends the element `x` at the end of the list
- `count(x)`: counts the number of occurrences of the element `x` in the list
- `extend(l)`: appends the elements contained in the list `l` at the end of the list
- `index(x)`: returns the lowest index of the element `x` in the list
- `insert(i, x)`: inserts the element `x` at position `i` of the list
- `pop(i)`: removes and returns the element at position `i` of the list
- `remove(x)`: removes the first occurrence of the element `x` from the list
- `reverse()`: reverses the order of the list in place
- `sort()`: sorts the list in place, in ascending order
- `in`: checks if an element is contained in the list
- `+`: concatenates two lists
- `*`: repeats a list a given number of times
- `[:]`: slices a list
    - General syntax: `l[n: m]` (includes the element in position `n`, excludes the element in position `m`)
    - `l[n:]` $\equiv$ `l[n: len(l)]`
    - `l[:m]` $\equiv$ `l[0: m]`
    - `l[:]` $\equiv$ `l[0: len(l)]` (a copy with the same elements as `l`)

Examples:

In [34]:
l = [2, 8, 4, 6]

In [35]:
l.append(8)
l

[2, 8, 4, 6, 8]

In [36]:
l.count(8)

2

In [37]:
l.extend([8, 7, 6])
l

[2, 8, 4, 6, 8, 8, 7, 6]

In [38]:
l.index(8)

1

In [39]:
l.insert(2, 9)
l

[2, 8, 9, 4, 6, 8, 8, 7, 6]

In [40]:
l.pop(2)

9

In [41]:
l.remove(8)
l

[2, 4, 6, 8, 8, 7, 6]

In [42]:
l.reverse()
l

[6, 7, 8, 8, 6, 4, 2]

In [43]:
l.sort()
l

[2, 4, 6, 6, 7, 8, 8]

In [44]:
3 in l

False

In [45]:
6 in l

True

In [46]:
l + [1, 2, 3]

[2, 4, 6, 6, 7, 8, 8, 1, 2, 3]

In [47]:
l * 2

[2, 4, 6, 6, 7, 8, 8, 2, 4, 6, 6, 7, 8, 8]

In [48]:
l[2: 6]

[6, 6, 7, 8]
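
The slice shorthands listed above can be checked on the same list. A minimal sketch; the names `tail`, `head` and `duplicate` are chosen here for illustration.

```python
l = [2, 4, 6, 6, 7, 8, 8]

tail = l[2:]          # same as l[2: len(l)]
head = l[:3]          # same as l[0: 3]
duplicate = l[:]      # same elements as l, but a different list object

duplicate.append(0)   # changing the copy leaves l untouched
```

After the last line, `l` still has seven elements while `duplicate` has eight, which is why `[:]` is a convenient way to copy a list.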

####Exercise 4
Write a function called `invert` that returns the reverse of a list (do not use the `reverse` method of lists). Hint: create a new list and use the `insert` method.

In [49]:
def invert(l):
    result = []
    for elem in l:
        result.insert(0, elem)
    return result

In [50]:
invert([1, 3, 2, 4])

[4, 2, 3, 1]

####Exercise 5
Write a function called `repeat` that returns the result of repeating a list `n` times (do not use the `*` operator of lists). Consider the case `n = 0`. Hint: use the `+` operator.

In [51]:
def repeat(l, n):
    result = []
    i = 1
    while i <= n:
        result += l
        i += 1
    return result

In [52]:
repeat([1, 2, 3], 2)

[1, 2, 3, 1, 2, 3]

####Lists as arguments/parameters of functions
- The parameter is a reference to the list
- Modifying the parameter (the list inside the function) implies modifying the argument (the list outside the function)
- To avoid this, make a copy of the list with `list()` or `[:]`
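
The side effect described above, and the copy that avoids it, can be sketched as follows. Both function names are hypothetical and exist only for this illustration.

```python
def add_zero_in_place(values):
    """Append 0 to the list that was passed in (side effect)."""
    values.append(0)
    return values

def add_zero_to_copy(values):
    """Append 0 to a copy, leaving the caller's list unchanged."""
    result = values[:]        # list(values) would work as well
    result.append(0)
    return result

original = [1, 2, 3]
safe = add_zero_to_copy(original)   # original is still [1, 2, 3]
snapshot = original[:]              # remember the state at this point
add_zero_in_place(original)         # original is now [1, 2, 3, 0]
```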

####Lists vs strings
- Lists are mutable
- Strings are immutable
- A string is a sequence of characters
- A list is a sequence of values
- A list of characters is not a string
- The function `list` converts a string to a list
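
A short sketch of the differences listed above. The string method `join` is not covered in this class; it is used here only as one way to get back from a list of characters to a string.

```python
chars = list('owl')           # ['o', 'w', 'l']
is_string = chars == 'owl'    # a list of characters is not a string

chars[0] = 'O'                # allowed: lists are mutable
rejoined = ''.join(chars)     # builds a string from the characters
```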

###Tuples

####Description
- A tuple is an immutable sequence of values (like a list that cannot be modified)
- Type: `tuple`
- Examples: `(1, 2, 3)`, `('a', 'b', 'c')`, `(1, 'abc', [], True)`
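
A minimal sketch of how tuples behave, using one of the examples above. Indexing and slicing work as with lists, but item assignment fails.

```python
t = (1, 'abc', [], True)

first = t[0]                  # indexing works as with lists
pair = t[1:3]                 # slicing returns a new tuple

try:
    t[0] = 2                  # tuples do not support item assignment
    assigned = True
except TypeError:
    assigned = False

single = (5,)                 # a one-element tuple needs the trailing comma
```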

###Dictionaries

####Notes on debugging
Common errors:
- Syntax:
    - not closing curly brackets, `{}`
- Logic:
    - accessing a non-existent element: key not found
- Semantic:
    - modifying a dictionary inside a function: side effects

####Description
- A dictionary is like a list, but more general: it maps a set of indices (called keys) to a set of values
- Keys can be of any hashable type (roughly speaking, immutable types whose value never changes, such as integers or strings)
- Values can be of any type
- Each key-value pair is called an item
- Type: `dict`
- Examples:
```
{1: 'a', 2: 'b', 3: 'c'}
```
```
{'one':'uno', 'two':'dos',
'three':'tres', 'four':'cuatro',
'five':'cinco', 'six':'seis',
'seven':'siete', 'eight':'ocho',
'nine':'nueve', 'ten':'diez',}
```
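
The restriction on keys can be seen by trying different key types. This is a sketch for illustration; `list_key_accepted` only records whether the assignment succeeded.

```python
d = {}
d[(1, 2)] = 'tuple key'       # tuples of hashable values can be keys
d['one'] = 'string key'       # strings can be keys

try:
    d[[1, 2]] = 'list key'    # lists are mutable, hence unhashable
    list_key_accepted = True
except TypeError:
    list_key_accepted = False
```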

####Access
- Several ways to access a dictionary:
    - As a whole
    - One element (lookup)
        - Syntax: `dictionary[key]`
    - List of items
        - Syntax: `dictionary.items()` (it returns a list of tuples `(key, value)`)
    - List of keys
        - Syntax: `dictionary.keys()`
    - List of values
        - Syntax: `dictionary.values()`

Example:

In [53]:
d = {1: 'a', 2: 'b', 3: 'c'}

In [54]:
d

{1: 'a', 2: 'b', 3: 'c'}

In [55]:
d[1]

'a'

In [56]:
d.items()

[(1, 'a'), (2, 'b'), (3, 'c')]

In [57]:
d.keys()

[1, 2, 3]

In [58]:
d.values()

['a', 'b', 'c']
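
The outputs above are plain lists, as in Python 2, which this notebook appears to use (see the `long` type earlier). Under Python 3 the same methods return view objects instead. A sketch that gives the same results under both versions, assuming only the built-in `sorted` function:

```python
d = {1: 'a', 2: 'b', 3: 'c'}

items = sorted(d.items())     # list of (key, value) tuples
keys = sorted(d.keys())       # list of keys
values = sorted(d.values())   # list of values

# Look up each key in turn; the results match the sorted values here.
looked_up = []
for key in keys:
    looked_up.append(d[key])
```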

####Modifying dictionaries
- Adding new item
    - Syntax: `dictionary[new_key] = value`
- Modifying existing item
    - Syntax: `dictionary[key] = new_value`
- Deleting existing item
    - Syntax: `del dictionary[key]`

Example:

In [59]:
d[4] = 'd'
d

{1: 'a', 2: 'b', 3: 'c', 4: 'd'}

In [60]:
d[1] = 'x'
d

{1: 'x', 2: 'b', 3: 'c', 4: 'd'}

In [61]:
del d[2]
d

{1: 'x', 3: 'c', 4: 'd'}

####Other functions and operators
- `in`: checks if a key is contained in a dictionary

Example:

In [62]:
2 in d

False

In [63]:
4 in d

True
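
The "key not found" error from the debugging notes appears when looking up a missing key directly. The sketch below shows the error and two ways around it: checking with `in` first, or using the dictionary method `get`, which is not covered above but accepts a default value.

```python
d = {1: 'x', 3: 'c', 4: 'd'}

try:
    d[2]                      # key 2 is not in the dictionary
    found = True
except KeyError:
    found = False

# Check with `in` before the lookup.
if 2 in d:
    value = d[2]
else:
    value = None

fallback = d.get(2, '?')      # returns the default when the key is missing
```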

####Exercise 6
Write a function called `histogram` that receives a string and returns the frequency of each letter.

In [64]:
def histogram(word):
    d = {}
    for letter in word:
        if letter in d:
            d[letter] += 1
        else:
            d[letter] = 1
    return d

In [65]:
histogram('banana')

{'a': 3, 'b': 1, 'n': 2}

####Exercise 7
Write a function called `invert_histogram` that receives a histogram and returns the inverted histogram, where the keys are the frequencies and the values are lists of the letters that have that frequency. Example: `portar` (Spanish for *to carry*).

<div align="center">
    <figure>
        <img src="img/inverted_histogram.png">
        <figcaption>Inverted histogram</figcaption>
    </figure>
</div>

In [66]:
def invert_histogram(h):
    inverse = {}
    for key in h:
        value = h[key]
        if value in inverse:
            inverse[value].append(key)
        else:
            inverse[value] = [key]
    return inverse

In [67]:
h = histogram('portar')
h

{'a': 1, 'o': 1, 'p': 1, 'r': 2, 't': 1}

In [68]:
i = invert_histogram(h)
i

{1: ['a', 'p', 't', 'o'], 2: ['r']}