<!--NAVIGATION-->
<span style='background: rgb(128, 128, 128, .15); width: 100%; display: block; padding: 10px 0 10px 10px'>< [Quiz 1](01.05-Quiz.ipynb) | [Contents](00.00-Index.ipynb) | [Flow Control & Functions](02.02-Flow-Functions.ipynb) ></span>

<a href="https://colab.research.google.com/github/eurostat/e-learning/blob/main/python-official-statistics/02.01-Data-Types.ipynb"><img align="left" src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open in Colab" title="Open and Execute in Google Colaboratory"></a>

<a id='top'></a>

# Data Types
## Content  
- [String](#string)  
- [Numbers](#numbers)  
- [Booleans](#booleans)  
- [Assignment](#assignment)  
- [Sequences](#sequences)  
- [Collections](#collections)
- [Truthy and Falsy Values](#truthy)

"Nothing" is represented in Python by the object `None`, whose type is `NoneType`. Use `None` when a variable must exist but has no meaningful value yet, for example as the default value of a function parameter:  

In [1]:
a = None
type(a)

NoneType
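
The default-parameter use mentioned above could look like the following minimal sketch. The function `greet` and its parameter names are invented for this illustration only:

```python
def greet(name, greeting=None):
    # None signals "no greeting supplied"; test for it with `is`, not `==`
    if greeting is None:
        greeting = 'Hello'
    return greeting + ', ' + name + '!'

print(greet('Alice'))                 # Hello, Alice!
print(greet('Bob', 'Good morning'))   # Good morning, Bob!
```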

<a id='string'></a>

## String
Single quotes, double quotes, triple quotes ...  
### Single quotes and double quotes
Use either kind of quote for simple strings. To include a single quote `'` in a string, delimit the string with double quotes, and vice versa. Some examples:

In [2]:
print('a')
print('aa')
print("aaa")
print('Hello!')
print("don't worry")

a
aa
aaa
Hello!
don't worry


Alternatively, a quote can be escaped with a backslash.  
Example: 


In [3]:
'don\'t'

"don't"

### Triple quotes (or triple double quotes)  
A triple-quoted string can contain both kinds of quotes without escaping and can span multiple lines:

In [4]:
haiku = '''"The Old Pond" by Matsuo Bashō
An old silent pond
A frog jumps into the pond —
Splash! Silence again.'''

print(haiku)

"The Old Pond" by Matsuo Bashō
An old silent pond
A frog jumps into the pond —
Splash! Silence again.


### F-strings
Python 3.6 introduced f-strings, a faster and more elegant way to format text.  
An f-string embeds expressions directly inside a string literal, with a clearer syntax than the `format()` method:


In [5]:
a = 3
print(f'1/a = {1/a:.2f}')

1/a = 0.33
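
To make the comparison with `format()` concrete, here is a small sketch that builds the same text both ways. The variable names and values are made up for the example:

```python
item = 'coffee'
price = 2.5
quantity = 3

# str.format(): placeholders are filled from the arguments, in order
with_format = '{} x {}: {:.2f} EUR'.format(quantity, item, quantity * price)
# f-string: the expressions sit directly inside the braces
with_fstring = f'{quantity} x {item}: {quantity * price:.2f} EUR'

print(with_format)    # 3 x coffee: 7.50 EUR
print(with_fstring)   # 3 x coffee: 7.50 EUR
```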


### Raw Strings
When you prefix a string with the letter r or R, that string becomes a raw string.  
Unlike a regular string, a raw string treats the backslashes `\` as literal characters.  
Raw strings are useful when you deal with strings that have many backslashes, for example, regular expressions or directory paths on Windows.  


In [6]:
print(r'That is Carol\'s cat.')

That is Carol\'s cat.
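
The Windows path case can be sketched as follows. The path itself is hypothetical; it was chosen because it contains `\n` and `\t`, which a regular string reads as a newline and a tab:

```python
normal = 'C:\new_folder\table.csv'   # \n and \t become newline and tab
raw = r'C:\new_folder\table.csv'     # backslashes are kept as they are

print(normal)
print(raw)
# each escape sequence collapses two characters into one
print(len(raw) - len(normal))        # 2
```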


### Concatenation & Replication

In [7]:
# String Concatenation
str1 = 'aaa' "bbb"
str2 = 'aaa' + "bbb"
print(str1)
print(str2)

# String Replication
str3 = 'Alice' * 5
print(str3)

aaabbb
aaabbb
AliceAliceAliceAliceAlice


### The `in` Operator with Strings
Check for the existence of a substring in a string (case sensitive).

In [8]:
print('Hello' in 'Hello World')
print('Hello' in 'Hello')
print('HELLO' in 'Hello World')
print('' in 'spam')
'cats' not in 'cats and dogs'

True
True
False
True


False

### Some more functionality of the class `str` by examples

In [9]:
spam = 'Hello world!'
# convert to capital letters
print(spam.upper())
# check whether all cased characters are lowercase
print(spam.islower())
# check whether the string starts with a specific sequence
print(spam.startswith('Hello'))
# joining a list of strings into a string separated by space
print(' '.join(['My', 'name', 'is', 'Simon']))
# split: the opposite of join
print('My name is Simon'.split())
# centering a string into a predefined size string
c = 'Hello'.center(20)
print('"', c, '"')
# removing spaces from right
c.rstrip()

HELLO WORLD!
False
True
My name is Simon
['My', 'name', 'is', 'Simon']
"        Hello         "


'       Hello'

<a id='numbers'></a>

## Numbers
There are three numeric types: integer, floating-point, and complex.  

### Integer Numbers Examples

In [10]:
-69, -6, 0, 1_000, 0xa, 0o12, 0B1010, int(5), int(-9.3), int('56'), int('100', base=3)

(-69, -6, 0, 1000, 10, 10, 10, 5, -9, 56, 9)

### Floating-point Numbers Examples

In [11]:
-1.25, --0.5, 0.0, 0.5, float("4.5"), float(3 + 6.9)

(-1.25, 0.5, 0.0, 0.5, 4.5, 9.9)

### Complex Numbers Examples

In [12]:
3.2 + 2j, 1j, -1 + 0j, complex('3-5.7j'), complex()

((3.2+2j), 1j, (-1+0j), (3-5.7j), 0j)

### Arithmetics
The table lists all the arithmetic operators. Some work for every numeric type, some do not:  

| Operation | Symbol |
| :- | :-: |
| Addition | + |
| Subtraction | - |
| Multiplication | * |
| Division | / |
| Floor division: division quotient | // |
| Modulus: division remainder | % |
| Exponentiation | ** |

#### Narrower, Wider
These notions matter when numbers of different types are mixed in an arithmetic expression. Python always performs arithmetic on operands of the same type, so the narrower operand is converted to the wider type before the operation.  
Complex numbers are wider than floats, which are wider than integers (`complex` > `float` > `int`).  
Some examples:

In [23]:
print(1 + .2)
print((1 + 1j)/2)
print(2**3)

1.2
(0.5+0.5j)
8


Some operations are not defined for every numeric type and raise an error (`TypeError`). For example, floor division and modulus do not accept complex operands:
> float // complex  
> float % complex
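
A short sketch of both points, the conversion to the wider type and the error for unsupported operations. The exact wording of the error message depends on the Python version, so it is printed rather than quoted here:

```python
a = 1 + 2.0        # int + float   -> float
b = 2.0 + 1j       # float + complex -> complex
print(type(a), type(b))

# true division always gives a float, floor division of ints stays int
print(7 / 2, 7 // 2, 7 % 2)   # 3.5 3 1

# floor division is not defined when one operand is complex
try:
    2.5 // (1 + 1j)
except TypeError as err:
    print('TypeError:', err)
```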

### Integer bitwise operations
Integer values can also be manipulated at the bit level. The table lists all the bitwise operators:  

| Operation | Symbol |
| :- | :-: |
| AND | & |
| OR | \| |
| XOR (exclusive OR) | ^ |
| NOT | ~ |
| Left shift | << |
| Right shift | >> |

Some examples here:

In [13]:
print(bin(0b1011 & 0b1010))
print(bin(0b1011 | 0b1010))
print(bin(0b1011 ^ 0b1010))
print(bin(~ 0b1010))
print(bin(0b1010 << 2))
print(bin(0b1010 >> 2))

0b1010
0b1011
0b1
-0b1011
0b101000
0b10


For more about bitwise operations, see the [Python wiki page on bitwise operators](https://wiki.python.org/moin/BitwiseOperators).

<a id='booleans'></a>

## Booleans
Python has a single boolean type, `bool`, with two values: `True` and `False`.  
### Comparison operators
They work for most types, but the operands must be of the same type or convertible to a common wider type.  
 
| Operation | Symbol |
| :- | :-: |
| Equal | == |
| Not equal | != |
| Greater than | > |
| Less than | < |
| Greater than or equal to | >= |
| Less than or equal to | <= |  

### Logical operators
Logical operators combine or negate conditions (boolean values, expressions, variables).

| Operation | Symbol |
| :- | :-: |
| True if both statements are true | and |
| True if at least one of the statements is true | or |
| Reverses the result: False if the statement is true, and vice versa | not |
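
The comparison and logical operators from the two tables above can be tried out with a few lines. This is only an illustrative sketch; the variable `age` and the thresholds are arbitrary:

```python
age = 25

in_range = age >= 18 and age < 65     # both conditions must hold
outside = age < 18 or age >= 65       # at least one must hold
not_25 = not age == 25                # negates the comparison
chained = 18 <= age < 65              # comparisons can be chained
print(in_range, outside, not_25, chained)   # True False False True

# int and float are compared after conversion to the wider type
print(1 == 1.0)                       # True

# ordering an int against a str is not defined
try:
    1 < 'a'
except TypeError as err:
    print('TypeError:', err)
```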

### Identity operator(s)
Identity operators test whether two names refer to the very same object (the same memory location), not merely whether the objects are equal.  
The operator is: `is`; and can be used in the negative form too: `is not`.
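
The difference between equality and identity is easiest to see with two lists that have the same content. A minimal sketch:

```python
a = [1, 2, 3]
b = [1, 2, 3]   # equal content, but a separate object
c = a           # a second name for the same object

print(a == b, a is b)   # True False
print(a == c, a is c)   # True True
print(a is not b)       # True
```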

### Membership operator(s)
Membership operators test whether a value is present in a container. We already used them with strings; they are equally useful with compound types such as lists and dictionaries.  
The operator is: `in`; and can be used in the negative form too: `not in`.

<a id='assignment'></a>

## Assignment
Assignment binds a value to a variable.  
The main operator is the `=` sign. Using a variable before it has been assigned a value raises an error (`NameError`). A re-assignment can change the type of a variable.  

Python also provides shortcuts for assignment, called compound assignment operators.  

Example:  

In [21]:
x = 1
x = x + 5
x

6

The same update can be written with a compound operator like this:  

In [20]:
x = 1
x += 5
x

6

There is a compound operator for each binary arithmetic and bitwise operator.
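
As a quick illustration, the sketch below chains several compound operators on one variable; the comments show the value of `x` after each step:

```python
x = 7
x -= 2     # 5
x *= 3     # 15
x //= 4    # 3
x **= 2    # 9
x %= 5     # 4
x <<= 2    # 16
x |= 0b1   # 17
print(x)
```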

### Walrus operator
The walrus operator, written `:=`, was introduced in Python 3.8. It assigns a value to a variable as part of a larger expression, so it can be used in loop conditions, `if` statements, list comprehensions and function arguments.  

Example: A while loop.  
Without walrus operator:

In [15]:
i = 1
while i < 5:
    print(i)
    i += 1

1
2
3
4


now with the walrus:

In [16]:
i = 0
while (i := i + 1) < 5:
    print(i)

1
2
3
4


It sparked a bit of controversy in the Python community.  
For more, see [PEP 572](https://www.python.org/dev/peps/pep-0572/).
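
Besides loops, the walrus operator is often seen in `if` statements and list comprehensions. The following sketch, with an invented word list, shows both; it needs Python 3.8 or later:

```python
words = ['statistics', 'data', 'python', 'eu']

# assign the length and test it in one expression
if (n := len(words)) > 3:
    print(f'{n} words in the list')

# compute len(w) once per word and reuse it in the result
long_words = [(w, size) for w in words if (size := len(w)) > 4]
print(long_words)   # [('statistics', 10), ('python', 6)]
```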

<a id='sequences'></a>

## Sequences
In Python, *sequence* is the generic term for an ordered collection: the items are accessed in the same order in which they were stored.  
Python supports six different types of sequences: strings (`str`), lists (`list`), tuples (`tuple`), byte sequences (`bytes`), byte arrays (`bytearray`) and `range` objects.  

### Lists
Python lists are similar to arrays in other languages, but a list can hold items of different types.  
Here are some examples:

In [23]:
list1 = [1,2,3,4]
list2 = ['red', 'green', 'blue']
list3 = ['hello', 100, 3.14, [1,2,3] ]

print(list1, list2, list3)

[1, 2, 3, 4] ['red', 'green', 'blue'] ['hello', 100, 3.14, [1, 2, 3]]


A quick tour of more list operations follows.
### Changing a value in a list:  

In [24]:
list1[2] = 34
list1

[1, 2, 34, 4]


### Removing Values from Lists with `del` Statements:

In [25]:
del list1[2]
list1

[1, 2, 4]


### Finding a Value in a List with the ``index()`` Method

In [26]:
spam = ['Zophie', 'Pooka', 'Fat-tail', 'Pooka']
spam.index('Pooka')

1


### Adding Values to Lists with the ``append()`` and ``insert()`` Methods
- append():

In [27]:
spam = ['cat', 'dog', 'bat']
spam.append('moose')
spam

['cat', 'dog', 'bat', 'moose']


- insert():

In [28]:
spam.insert(1, 'chicken')
spam

['cat', 'chicken', 'dog', 'bat', 'moose']

### Removing Values from Lists with ``remove()``

In [29]:
spam = ['cat', 'bat', 'rat', 'elephant']
spam.remove('bat')
spam

['cat', 'rat', 'elephant']


### Sorting the Values in a List with the ``sort()`` Method

In [30]:
spam = [2, 5, 3.14, 1, -7]
spam.sort()
spam

[-7, 1, 2, 3.14, 5]
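
`sort()` changes the list in place. When the original order must be kept, the built-in `sorted()` returns a new list instead. A short sketch, also showing the `reverse` and `key` arguments (the sample data is invented):

```python
scores = [2, 5, 3.14, 1, -7]
ordered = sorted(scores)        # new list, scores is unchanged
print(ordered)                  # [-7, 1, 2, 3.14, 5]

scores.sort(reverse=True)       # in place, descending
print(scores)                   # [5, 3.14, 2, 1, -7]

names = ['bob', 'Alice', 'carol']
names.sort(key=str.lower)       # compare the lowercase versions
print(names)                    # ['Alice', 'bob', 'carol']
```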

### Tuples
Tuples are also sequences of Python objects. A tuple is created by separating items with commas. The enclosing parentheses `()` are optional, except for the empty tuple, which must be written `()`.  
Like strings, tuples are immutable: a variable holding a tuple can be reassigned, but its elements cannot be changed, added or removed.  
Some examples:

In [31]:
print(())
print((1,2,3,4,5))
print(( "78 Street", 3.8, 9826 ))

()
(1, 2, 3, 4, 5)
('78 Street', 3.8, 9826)
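
Immutability and the role of the comma can be checked with a few lines. This is a small sketch with made-up values:

```python
point = (3, 4)
try:
    point[0] = 10          # item assignment is not supported
except TypeError as err:
    print('TypeError:', err)

point = (10, 4)            # reassigning the variable is fine
single = (5,)              # the trailing comma makes this a tuple
not_a_tuple = (5)          # just the integer 5 in parentheses
print(point, type(single), type(not_a_tuple))
```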


#### Boxing, unboxing
This is valid Python code:

In [32]:
a, b = 'Alice', 'Bob'
print(a, b)
a, b = b, a
print(a, b)

Alice Bob
Bob Alice


How does this work? Some mysterious multiple assignment? No.  
Take the line `a, b = 'Alice', 'Bob'`.  
On the right side of the `=` sign, the two strings are first combined into the tuple `('Alice', 'Bob')`. This is called boxing (the Python documentation calls it tuple packing).  
On the left side of the `=` sign, if the number of variables matches the number of elements in the tuple, each variable is assigned, in order, one element of the sequence. This is called unboxing (sequence unpacking in the Python documentation), and it works with all sequence types, not only tuples.
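
A few more unboxing (unpacking) cases, as a sketch: other sequence types on the right side, a starred name that collects the remaining items, and the error raised when the counts do not match:

```python
first, second, third = [10, 20, 30]   # unpacking a list
x, y = 'ab'                           # unpacking a string
head, *rest = (1, 2, 3, 4)            # the starred name collects the rest in a list
print(first, second, third, x, y, head, rest)

try:
    a, b = (1, 2, 3)                  # three values, two names
except ValueError as err:
    print('ValueError:', err)
```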

### Python ``range()`` objects
`range()` is a built-in function that returns a range object: an immutable sequence of integers from a start value up to, but not including, a stop value.  
Example:

In [33]:
for i in range(3):
    print(i)

0
1
2
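
`range()` also accepts a start value and a step. The sketch below converts each range to a list only to make the generated numbers visible:

```python
up_to_five = list(range(5))         # stop only
from_two = list(range(2, 6))        # start, stop
by_three = list(range(0, 10, 3))    # start, stop, step
countdown = list(range(5, 0, -1))   # a negative step counts down

print(up_to_five)   # [0, 1, 2, 3, 4]
print(from_two)     # [2, 3, 4, 5]
print(by_three)     # [0, 3, 6, 9]
print(countdown)    # [5, 4, 3, 2, 1]
```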


## Common Operations for Sequence Types

### Getting Individual Values
Done with the help of square brackets.  
Example: 

In [34]:
spam = ['cat', 'bat', 'rat', 'elephant']
print(spam[0])
print(spam[3])
# negative indexes are allowed: -1 is the last element, -2 is the element before the last one
print(spam[-1])
# error: index out of range
spam[4]

cat
elephant
elephant


IndexError: list index out of range

### Concatenation
The operator (``+``) concatenates two sequences of the same type into a new sequence:

In [35]:
[1,3,4] + [1,1,1]

[1, 3, 4, 1, 1, 1]


### Repeat
The operator (``*``) repeats a sequence a given number of times:

In [36]:
(1,2,3) * 3

(1, 2, 3, 1, 2, 3, 1, 2, 3)

### Membership Operators
Membership operators `in` and `not in` are used to check whether an item is present in the sequence or not. They return True or False. Already seen in the string type context.

### Slicing Operator
All sequences in Python can be sliced. The slicing operator `[start:stop:step]` extracts part of a sequence; the element at position `stop` is not included.  

Examples:

In [37]:
# avoid naming a variable 'str': it would hide the built-in type
headline = "The new york times"
print(headline[4:10])
# implicit start is position 0
print(headline[:6])
tup = (1,2,3,4,5)
print(tup[1:3])
# the third value in a slice is the step
print(tup[::2])


new yo
The ne
(2, 3)
(1, 3, 5)
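
Negative indexes and negative steps work in slices too. A short sketch on a list of letters:

```python
letters = ['a', 'b', 'c', 'd', 'e', 'f']

last_two = letters[-2:]        # ['e', 'f']
middle = letters[1:-1]         # ['b', 'c', 'd', 'e']
backwards = letters[::-1]      # reversed copy
downwards = letters[4:1:-1]    # from index 4 down to index 2
shallow_copy = letters[:]      # a new list with the same items

print(last_two, middle, backwards, downwards)
```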


### List Comprehension
A list comprehension is a compact way to iterate over a sequence, apply a quick operation to its elements and collect the results in a new list. It works with any iterable, including a `set` or a `dict`.  
Examples:

In [38]:
nums = [1,2,3,4,5]
# create a new list from list nums adding 10 to each element
print([i+10 for i in nums])
# new list with squares for all even numbers from nums
print([i**2 for i in nums if i % 2 == 0])

[11, 12, 13, 14, 15]
[4, 16]
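
Comprehensions over a dictionary, a string and a set could look like the sketch below. The sample data is invented; `sorted()` is applied to the set only to get a predictable order:

```python
prices = {'apple': 1.2, 'banana': 0.5, 'cherry': 3.0}

# iterating over a dictionary yields its keys
upper_names = [name.upper() for name in prices]
# items() yields (key, value) pairs
expensive = [name for name, price in prices.items() if price > 1]
# a string is iterated character by character
digits = [ch for ch in 'a1b2c3' if ch.isdigit()]
# a set has no fixed order, so sort it first
doubled = [n * 2 for n in sorted({3, 1, 2})]

print(upper_names, expensive, digits, doubled)
```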


<a id='collections'></a>

## Collections

### Sets
Sets in Python are unordered, unindexed collections of objects. Sets are mutable and iterable, and they contain no duplicate values, much like mathematical sets. Here is a nice [tutorial](https://techvidvan.com/tutorials/python-sets/).  

Create some:  

In [39]:
fruits = {"apple", "banana", "cherry", "apple"}
empty_set = set()
s = {1, 2, 3}

As an unordered data type, sets cannot be indexed:  

In [40]:
s = {1, 2, 3}
s[0]

TypeError: 'set' object is not subscriptable

Add an element to a set:

In [41]:
s = {1, 2, 3}
s.add(4)
s

{1, 2, 3, 4}

or, several at once:

In [42]:
s = {1, 2, 3}
s.update([2, 3, 4, 5, 6])
s

{1, 2, 3, 4, 5, 6}

Remember: sets automatically remove duplicates.  


Removing an element from a set:  
There are two methods: `remove()` and `discard()`. Both remove an element from the set, but `remove()` raises a `KeyError` if the value does not exist.

In [43]:
s = {1, 2, 3}
s.remove(3)
s

{1, 2}

In [44]:
s.remove(3)

KeyError: 3

In [45]:
s.discard(3) # no error
s

{1, 2}

#### Set Union  
`union()` or `|` will create a new set that contains all the elements from the sets provided.

In [46]:
s1 = {1, 2, 3}
s2 = {3, 4, 5}
s1.union(s2)  # or 's1 | s2'

{1, 2, 3, 4, 5}

#### Set Intersection  
`intersection()` or `&` will return a set containing only the elements that are common to all the sets.

In [47]:
s1 = {1, 2, 3}
s2 = {2, 3, 4}
s3 = {3, 4, 5}
s1 & s2 & s3  # or 's1.intersection(s2, s3)'

{3}

#### Set Difference
`difference()` or `-` will return only the elements that are unique to the first set (invoked set).

In [48]:
s1 = {1, 2, 3}
s2 = {2, 3, 4}
print(s1.difference(s2))  # or 's1 - s2'
print(s2.difference(s1)) # or 's2 - s1'

{1}
{4}


#### Set Symmetric Difference  
`symmetric_difference()` or `^` will return the elements that belong to exactly one of the two sets.

In [49]:
s1 = {1, 2, 3}
s2 = {2, 3, 4}
s1.symmetric_difference(s2)  # or 's1 ^ s2'

{1, 4}

#### Set Comprehension
As with list comprehensions, a new set can be created by iterating over and processing the elements of another set (or list):

In [50]:
b = {"abc", "def"}
{s.upper() for s in b}

{'ABC', 'DEF'}

### Dictionaries
Dictionaries in Python are mutable collections of key-value pairs, indexed by key. Since Python 3.7 they preserve the insertion order of their keys. Here is a nice [tutorial](https://techvidvan.com/tutorials/python-dictionaries/). 

Let's create some:

In [51]:
x = {"name" : "John", "age" : 36}
x

{'name': 'John', 'age': 36}

In [52]:
x = dict(name="John", age=36)
x

{'name': 'John', 'age': 36}
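
Values are read, changed, added and deleted through their key, using square brackets. A minimal sketch; the dictionary `person` and its content are made up for the example:

```python
person = {'name': 'John', 'age': 36}

print(person['name'])            # read a value by key
person['age'] = 37               # change an existing value
person['city'] = 'Luxembourg'    # add a new key-value pair
del person['name']               # remove a pair
print(person)                    # {'age': 37, 'city': 'Luxembourg'}

# reading a missing key raises KeyError
try:
    person['email']
except KeyError as err:
    print('KeyError:', err)
```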

#### The keys(), values(), and items() Methods:
These methods give access to the keys, the values, or the (key, value) pairs as tuples.
- Method values():

In [53]:
spam = {'color': 'red', 'age': 42}
for v in spam.values():
    print(v)

red
42


- Method keys():

In [54]:
for k in spam.keys():
    print(k)

color
age


- Method items():

In [55]:
for i in spam.items():
    print(i)

('color', 'red')
('age', 42)


Because each item is a key-value pair, that is a tuple, it can be unboxed:

In [58]:
for k, v in spam.items():
    print(f'Key: {k}, Value: {v}')

Key: color, Value: red
Key: age, Value: 42


Checking whether a key or value exists in a dictionary

In [59]:
spam = {'name': 'Zophie', 'age': 7}
'name' in spam.keys()

True

In [60]:
'Zophie' in spam.values()

True

You can omit the call to ``keys()`` when checking for a key:

In [64]:
'color' in spam # equivalent to: 'color' in spam.keys()

False

In [62]:
'color' not in spam

True

#### The ``get()`` Method
`get()` takes two parameters: the key, and a default value to return if the key does not exist (`None` when the default is omitted).

In [66]:
picnic_items = {'apples': 5, 'cups': 2}
print(f'I am bringing {picnic_items.get("cups", 0)} cups.')
print(f'I am bringing {picnic_items.get("eggs", 0)} eggs.')

I am bringing 2 cups.
I am bringing 0 eggs.


#### The ``setdefault()`` Method
Adding a key/value pair only when the key is not yet present can be done like this:

In [67]:
spam = {'name': 'Pooka', 'age': 5}
if 'color' not in spam:
    spam['color'] = 'black'
spam

{'name': 'Pooka', 'age': 5, 'color': 'black'}

Using setdefault() we could write the same code more succinctly:

In [68]:
spam = {'name': 'Pooka', 'age': 5}
spam.setdefault('color', 'black')
spam

{'name': 'Pooka', 'age': 5, 'color': 'black'}

If the key already exists, `setdefault()` leaves the dictionary unchanged:

In [69]:
spam.setdefault('color', 'white')
spam

{'name': 'Pooka', 'age': 5, 'color': 'black'}

#### Merge two dictionaries

In [70]:
x = {'a': 1, 'b': 2}
y = {'b': 3, 'c': 4}
z = {**x, **y}
z

{'a': 1, 'b': 3, 'c': 4}
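
Two alternatives to the `**` form, shown as a sketch: the `|` operator, which is only available from Python 3.9 on, and the `update()` method, which changes the dictionary in place. In both cases the value from the right-hand dictionary wins for duplicate keys:

```python
x = {'a': 1, 'b': 2}
y = {'b': 3, 'c': 4}

z = x | y        # new dictionary, requires Python 3.9+
print(z)         # {'a': 1, 'b': 3, 'c': 4}

x.update(y)      # modifies x in place
print(x)         # {'a': 1, 'b': 3, 'c': 4}
```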

#### Dictionary comprehension
Finally, comprehensions work for dictionaries in the same way as for sequences and sets.

In [71]:
# swap keys and values: each value becomes a key and each key a value
c = {'name': 'Pooka', 'age': 5}
{v: k for k, v in c.items()}

{'Pooka': 'name', 5: 'age'}

<a id='truthy'></a>

## Truthy and Falsy Values
Any object can be tested for truth value (not just booleans), for use in an `if` or `while` condition or as an operand of the logical operators (`and`, `or`, `not`).  
This allows some shortcuts, but beware of a small drawback: the code can become less readable.  

Example: Checking the length of a string.  
Without relying on `falsy`:

In [72]:
text = ''
if len(text) > 0:
    print('string not empty')
else:
    print('string empty')

string empty


Now the same check relying on falsy values:

In [73]:
text = ''
if text:
    print('string not empty')
else:
    print('string empty')

string empty


Here are some `falsy` values:  
- Sequences and Collections:
    - Empty lists []
    - Empty tuples ()
    - Empty dictionaries {}
    - Empty sets set()
    - Empty strings ""
    - Empty ranges range(0)
- Numbers, zero of any numeric type:
    - Integer: 0
    - Float: 0.0
    - Complex: 0j
- Constants
    - None
    - False
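
The list above can be verified with `bool()`. The sketch also shows a common shortcut, `or` returning its first truthy operand, and two values that look empty but are truthy. The names `user_input` and `name` are illustrative only:

```python
falsy_values = [[], (), {}, set(), '', range(0), 0, 0.0, 0j, None, False]
print(any(bool(v) for v in falsy_values))   # False: none of them is truthy

# 'or' returns its first truthy operand, handy for default values
user_input = ''
name = user_input or 'anonymous'
print(name)                                 # anonymous

# non-empty containers and strings are truthy, whatever they contain
print(bool('0'), bool([0]), bool(' '))      # True True True
```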


<span style='background: rgb(128, 128, 128, .15); width: 100%; display: block; padding: 10px 0 10px 10px'>This is the Jupyter notebook version of the __Python for Official Statistics__ produced by Eurostat; the content is available [on GitHub](https://github.com/eurostat/e-learning/tree/main/python-official-statistics).
<br>The text and code are released under the [EUPL-1.2 license](https://github.com/eurostat/e-learning/blob/main/LICENSE).</span>