# Data Science and Visualization (RUC F2023)

## Python Programming Basics

## 1. Comment, Indentation and Function

### A comment starts with #. The commented text is not interpreted as Python code.

In [1]:
#variable = 10 # See the effect of having and not having # at the beginning of this line
variable = variable +1

NameError: name 'variable' is not defined

### Python uses **def** to define functions and *Indentation* (*tab*) to structure code blocks.

Also pay attention to the use of colon (**:**)

In [None]:
def say_hello():
    # block belonging to the function
    # indentation is necessary for indicating the code block
    print('Hello world!') # See the effect of removing the indentation at the beginning of this line
    # we can have more lines for the function
    # End of function

say_hello()

Compare the following two cells and note the different order of the two outputs. Why does the order differ?

In [None]:
def say_hello(name):
    print('Hello, ' + name)
    print("Hello world")
    
say_hello('Mike')

In [None]:
def say_hello(name):
    print('Hello, ' + name)
print("Hello world")
    
say_hello('Mike')

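An example of a function that finds the larger of two given numbers. Note that naming the function `max` hides Python's built-in `max` for the rest of the session.

*NB*: We can use triple-quoted string literals to write a block comment. When such a string is the first statement in a function, Python treats it as the function's *docstring*.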

In [2]:
def max(a, b):
    """
    Between these two triple-quoted strings,
    you can write as much text as you want.
    It will all be treated as a comment,
    even though you don't use #.
    """
    if (a > b):
        return a
    else:
        return b

num1 = int(input("Number 1: "))
num2 = int(input("Number 2: "))
print("The biggest number is", str(max(num1, num2)))

Number 1: 3
Number 2: g


ValueError: invalid literal for int() with base 10: 'g'
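
A triple-quoted string placed as the first statement of a function is, strictly speaking, a *docstring* rather than a plain comment: Python stores it on the function object. The sketch below illustrates this with a hypothetical function `larger` (a name chosen here so that the built-in `max` is not hidden).

```python
def larger(a, b):
    """Return the larger of two numbers."""
    if a > b:
        return a
    else:
        return b

# The docstring is kept by Python and can be read back at run time
print(larger.__doc__)
print(larger(3, 7))
```

In a notebook, `help(larger)` displays the same text.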

## 2. Control Flow

In computer science, control flow (or flow of control) is the order in which individual statements, instructions, or function calls are executed. By default, they are executed *sequentially*. However, programming languages like Python provide ways to change this sequential order.

* **if** statement: conditional
* **while** loop: repeat something as long as a condition is satisfied
* **for** loop: repeat something according to a range
* **break**: exit a loop early
* **continue**: skip to the next iteration of a loop

### **if** statement: conditional

*NB*: Pay attention to **else** and **elif**. They are optional, but if you use them, they must correspond to an **if** correctly. Also remember to indent correctly when you have *nested* **if-elif-else** structures.

Again, pay attention to the use of the colon (**:**)

In [3]:
def maxplus(a, b, c):
    if (a > b):
        if (a > c):
            return a
        else:
            return c
    elif (b > c):
        return b
    else:
        return c

In [6]:
maxplus(10, 30, 12)

30
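
One way to gain confidence in a nested **if-elif-else** structure is to try every ordering of the inputs. The following is only a sketch: it repeats the `maxplus` definition from above and compares each result with the last item of the sorted inputs (an arbitrary choice of check made here, not part of the original exercise).

```python
from itertools import permutations

def maxplus(a, b, c):
    if a > b:
        if a > c:
            return a
        else:
            return c
    elif b > c:
        return b
    else:
        return c

# Try every ordering of the same three numbers
for a, b, c in permutations([10, 30, 12]):
    expected = sorted([a, b, c])[-1]  # the last item after sorting is the largest
    print(a, b, c, "->", maxplus(a, b, c), maxplus(a, b, c) == expected)
```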

A condition in Python evaluates to a Boolean value: either True or False. See the examples below.

In [134]:
3 > 4

False

In [135]:
2 == 2

True

In [136]:
308 < 1000

True

The % operator (modulus) returns the remainder of dividing the first operand by the second.

The // operator (floor division) returns the quotient rounded down to the nearest integer.

The / operator performs 'true' division and always returns a float.

In [133]:
print(32 % 5 == 0)

False


In [139]:
print(32 // 5)

6


In [140]:
print(32 / 5)

6.4
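
The three operators are related: the floor quotient times the divisor, plus the remainder, gives back the original number. A minimal sketch with the same numbers as above (the variable names are chosen here for illustration):

```python
a, b = 32, 5

quotient = a // b    # floor division
remainder = a % b    # remainder of the division

print(quotient, remainder)
# The two results always recombine into the original number
print(quotient * b + remainder == a)
# The built-in divmod() returns both values at once as a tuple
print(divmod(a, b))
```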


### **while** loop: repeat something as long as a condition is satisfied

This example prints the integers from 1 to 5:

In [8]:
i = 1
while i < 6:
    print(i)
    i += 1

1
2
3
4
5


See the effect of **break**

In [13]:
i = 1
while i < 6:
    if i == 4:
        break
    print(i)
    i += 1

1
2
3


See the effect of **continue**

*NB*: Think about why we need 'i += 1' before **continue**. What happens if it is removed?

In [15]:
i = 1
while i < 6:
    if i == 4:
        i += 1
        continue
    print(i)
    i += 1

1
2
3
5


### **for** loop: repeat something according to a range

A **for** loop is used for iterating over the items of an iterable object, such as a list, a tuple, a dictionary, a set, or a string.

Recall that **range(a, b)** produces the integers in the half-open interval **[a, b)**: it starts at a and stops just before b.

In [17]:
for i in range(1, 6):
    print(i)

1
2
3
4
5
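
To see exactly which integers a range contains, it can be converted to a list. The sketch below also shows two variations of the call, a single argument and a third step argument, which are standard features of `range` but not used elsewhere in this notebook.

```python
# Converting a range to a list shows which integers it contains
one_to_five = list(range(1, 6))      # the end value 6 is excluded
zero_to_five = list(range(6))        # the start defaults to 0
even_numbers = list(range(0, 10, 2)) # an optional third argument is the step

print(one_to_five)
print(zero_to_five)
print(even_numbers)
```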


A string in Python is also a sequence.

In [21]:
for x in "banana":
    print(x)

b
a
n
a
n
a


A list in Python is also a sequence.

In [20]:
fruits = ["apple", "banana", "cherry"]
for x in fruits:
    print(x)

apple
banana
cherry


See the effect of **break**:

In [18]:
fruits = ["apple", "banana", "cherry"]
for x in fruits:
    if x == "banana":
        break
    print(x)

apple


See the effect of **continue**:

In [19]:
fruits = ["apple", "banana", "cherry"]
for x in fruits:
    if x == "banana":
        continue
    print(x)

apple
cherry


## 3. Built-In Data Structures

In computer science, a data structure is a data organization, management, and storage format that is usually chosen for efficient access to data. Python provides a number of built-in data structures for efficient data access.

The following three are sequences, i.e., the data objects inside have an order.
* **String**
* **List**: [...]
* **Tuple**: (...)

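The following two are not sequences: their items cannot be accessed by position. A dictionary keeps its pairs in insertion order (since Python 3.7), while a set guarantees no order at all.
* **Dictionary**: {key: value, key: value, ...}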
* **Set**: {...}


### List examples

In [22]:
# A list of strings (string objects)
shop_list = ['apple', 'mango', 'carrot', 'banana']

print(shop_list)

['apple', 'mango', 'carrot', 'banana']


The indexes of the objects in a list start at 0.

In [24]:
print(shop_list[0])

apple


In [25]:
print(shop_list[4])

IndexError: list index out of range

We can remove an object from a list.

In [27]:
shop_list.remove('mango')
print(shop_list)

['apple', 'carrot', 'banana']


We can append an object to (the end of) a list.

In [29]:
shop_list.append('rice')
print(shop_list)

['apple', 'carrot', 'banana', 'rice']


We can change the object at a particular position in a list.

In [35]:
shop_list[3] = 'avocado'
print(shop_list)

['apple', 'carrot', 'banana', 'avocado']


We can sort the objects in a list according to their natural ordering (alphabetical for strings, numerical for numbers).

In [30]:
shop_list.sort()
print(shop_list)

['apple', 'avocado', 'banana', 'carrot']


Another example of a list. Note that the items appear in ascending order only because we wrote them that way; a list keeps the order in which items were added and does not sort them automatically.

In [23]:
# A list of integers (integer objects)
grades = [-3, 0, 2, 4, 7, 10, 12]

print(grades)

[-3, 0, 2, 4, 7, 10, 12]
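
Since a list never sorts itself, sorting has to be requested explicitly. The sketch below contrasts the list method `sort()` with the built-in function `sorted()`, using a small list of grades made up for illustration.

```python
grades = [7, -3, 12, 0]

# sorted() returns a new sorted list and leaves the original untouched
ordered = sorted(grades)
print(grades)
print(ordered)

# sort() reorders the list itself and returns None
result = grades.sort()
print(grades)
print(result)
```

A common mistake is writing `grades = grades.sort()`, which replaces the list with `None`.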


### Tuple examples

A tuple is basically the same as a list, but you cannot change the items of a tuple.
Lists are mutable, whereas tuples are immutable.


In [38]:
# A tuple containing three integer objects.
tuple1 = (1, 2, 3)
var1 = tuple1[0]
print(var1)

1


In [34]:
# This is not allowed for a Tuple object
tuple1[0] = 5

TypeError: 'tuple' object does not support item assignment
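
Because a tuple cannot be modified, the usual workaround is to build a new tuple. The following sketch assumes a hypothetical tuple `point`; the `try`/`except` statement is not covered in this notebook and is used here only so that the cell keeps running after the error.

```python
point = (1, 2, 3)

try:
    point[0] = 5            # not allowed: tuples are immutable
except TypeError as err:
    print("TypeError:", err)

# Build a new tuple from pieces of the old one instead
new_point = (5,) + point[1:]
print(new_point)
print(point)                # the original tuple is unchanged
```

Note the trailing comma in `(5,)`: without it, `(5)` is just the integer 5.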

### Sequence

Lists, tuples, and strings are all sequences. They share several common operations:

* membership test: **in**, **not in**
* indexing operations: [i] (see examples above)
* concatenation
* slicing: This is often used in Data Science :)

#### Membership test

In [41]:
list2 = [1, 2, 3, 4]
print(11 not in list2)

True


In [42]:
tuple2 = ('apple', 'banana', 'carrot', 'rice')
print('rice' in tuple2)

True


#### Concatenation

In [48]:
t1 = ("Canada", "Japan")
t2 = ("UK", "Germany")
t3 = t1 + t2
print(t3)

('Canada', 'Japan', 'UK', 'Germany')


In [50]:
l1 = ["Canada", "US"]
l2 = ["UK", "Germany"]
l3 = l1 + l2
print(l3)

['Canada', 'US', 'UK', 'Germany']


In [108]:
l4 = [1, 4, 7]
l5 = l1 + l4
print(l5)

['Canada', 'US', 1, 4, 7]
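
Concatenation with **+** needs both operands to be of the same sequence type. The sketch below, with variable names made up for illustration, shows that a list and a tuple cannot be added directly, and one possible way around it. The `try`/`except` statement is used only to keep the cell running after the error.

```python
countries = ["Canada", "US"]
more = ("UK", "Germany")

try:
    combined = countries + more      # list + tuple is not allowed
except TypeError as err:
    print("TypeError:", err)

# Convert one operand so that both are lists
combined = countries + list(more)
print(combined)
```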


#### Slicing in Sequence with one colon **:**

In [54]:
items = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
# From index 4 to the last index 
print(items[4:])

[4, 5, 6, 7, 8, 9]


In [57]:
# From index 0 to index 4 (excl.)
print(items[:4])

[0, 1, 2, 3]


In [58]:
# From index 4 (incl.) up to index 7 (excl.)
print(items[4:7])

[4, 5, 6]


In [59]:
# Exclude the last item
print(items[:-1])

[0, 1, 2, 3, 4, 5, 6, 7, 8]


In [60]:
# Exclude the last two items
print(items[:-2])

[0, 1, 2, 3, 4, 5, 6, 7]


#### Slicing in Sequence with two colons **::**

In [62]:
# All items in reverse order, from last to first (negative step)
print(items[::-1])

[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]


In [64]:
# From the last item, every other item in reversed order 
print(items[::-2])

[9, 7, 5, 3, 1]


In [66]:
# From the 2nd to last item, every other item in reversed order 
print(items[-2::-2])

[8, 6, 4, 2, 0]


In [68]:
# All items in the normal order
print(items[::])

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
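
All the slices above are special cases of the general form `sequence[start:stop:step]`, where any of the three parts may be left out. As a brief sketch, the same notation applies to strings and tuples as well (the variables `word` and `numbers` are made up for illustration):

```python
items = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# General form: sequence[start:stop:step]
every_third = items[1:8:3]   # indexes 1, 4, 7
print(every_third)

# Slicing works the same way on strings and tuples
word = "banana"
print(word[1:4])             # characters at indexes 1, 2, 3
print(word[::-1])            # the string reversed

numbers = (10, 20, 30, 40)
print(numbers[:2])           # a slice of a tuple is again a tuple
```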


### Dictionary

A dictionary stores key-value pairs. Every key in a dictionary must be unique. The value holds the (detailed) information for a given key. Keys must be *immutable* objects (e.g., strings or numbers), i.e., objects whose content cannot be changed after creation.

Format of a dictionary: **dict = {key1 : value1, key2 : value2}**. 

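The pairs cannot be accessed by position! Since Python 3.7, however, a dictionary does preserve the order in which its pairs were inserted.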


In [74]:
# We create a dictionary in this way
d1 = {'Canada': 100, 'Japan': 500}
print(d1)

{'Canada': 100, 'Japan': 500}


In [75]:
# We can add key-value pair into a dictionary
d1['Italy'] = 400
print(d1)

{'Canada': 100, 'Japan': 500, 'Italy': 400}


In [77]:
# Remove a key-value pair
del d1['Japan']
print(d1)

{'Canada': 100, 'Italy': 400}


In [78]:
# pop() also removes a key-value pair (and returns its value)
d1.pop('Italy')
print(d1)

{'Canada': 100}


In [80]:
# We can change a pair's value
d1['Canada'] = 200
print(d1)

{'Canada': 200}


In [82]:
d1['Italy'] = 250
print(d1)

# popitem() removes the most recently inserted pair
d1.popitem()
print(d1)

{'Canada': 200, 'Italy': 250}
{'Canada': 200}
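
The removal methods also hand back what they removed, which is easy to miss when only the dictionary is printed. The sketch below uses a fresh dictionary `d` (a name chosen here) and also shows `get()`, a standard dictionary method not used above, for reading a key that may be missing.

```python
d = {'Canada': 100, 'Japan': 500, 'Italy': 400}

# pop() returns the value that belonged to the removed key
removed_value = d.pop('Japan')
print(removed_value)

# popitem() returns the most recently inserted pair as a (key, value) tuple
last_pair = d.popitem()
print(last_pair)

# get() reads a value without raising KeyError when the key is missing
print(d.get('Canada'))
print(d.get('France', 0))   # 0 is the default returned for a missing key
print(d)
```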


#### Iterate over a dictionary using for loop

In [84]:
d1 = {'Canada': 100, 'Japan': 200, 'Germany': 300, 'Italy': 400}

Print only keys

In [88]:
for x in d1:
    print(x)

Canada
Japan
Germany
Italy


Print only values 

In [89]:
for x in d1:
    print(d1[x])

100
200
300
400


In [90]:
for x in d1.values():
    print(x)

100
200
300
400


Print keys and values

In [92]:
for x, y in d1.items():
    print(x, "=>", y)

Canada => 100
Japan => 200
Germany => 300
Italy => 400


#### Sorting and Dictionary

The built-in sorted() function returns a new list containing the keys of a dictionary in sorted order. The dictionary itself is not changed.

**Syntax**: sorted(iterable, key=key, reverse=reverse)

In [94]:
d1 = {'Canada': 100, 'Japan': 200, 'Germany': 300, 'Italy': 400}

In [95]:
for key in sorted(d1):
    print("%s: %s" % (key, d1[key]))

Canada: 100
Germany: 300
Italy: 400
Japan: 200


The 2nd argument, key=d1.get, tells sorted() to compare the dictionary keys by their values (d1.get(k) returns the value stored for key k). The 3rd argument, reverse=True, sorts in descending order. As a result, we list the pairs of the dictionary by decreasing value.

In [103]:
for x in sorted(d1, key=d1.get, reverse=True):
    print("%s: %s" % (x, d1[x]))    

Italy: 400
Germany: 300
Japan: 200
Canada: 100
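
An alternative way to sort by value is to sort the (key, value) pairs directly. This is a sketch rather than the notebook's own approach: it uses a small `lambda` function (not introduced in this notebook) to pick the value out of each pair, and it relies on Python 3.7 or later keeping dictionary insertion order.

```python
d1 = {'Canada': 100, 'Japan': 200, 'Germany': 300, 'Italy': 400}

# Sort the (key, value) pairs by value, largest first
pairs = sorted(d1.items(), key=lambda pair: pair[1], reverse=True)
print(pairs)

# dict() turns the sorted pairs back into a dictionary in that order
by_value = dict(pairs)
print(by_value)
```

The original `d1` keeps its own order; only the new dictionary `by_value` is ordered by value.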


#### Merge two dictionaries

In [105]:
d1 = {'Canada': 100, 'Japan': 200, 'Germany': 300, 'Italy': 400}
d2 = {'Australia': 500, 'India': 600}

# The update() method merges dictionary d1 into d2. 
d2.update(d1)

# d1 is not changed
print(d1)
print(d2)

{'Canada': 100, 'Japan': 200, 'Germany': 300, 'Italy': 400}
{'Australia': 500, 'India': 600, 'Canada': 100, 'Japan': 200, 'Germany': 300, 'Italy': 400}
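
The example above has no key in common between the two dictionaries. When a key appears in both, the merged result keeps only one value for it. The sketch below uses two small made-up dictionaries to show which value wins, and also shows dictionary unpacking with `**` as one way to merge without changing either input.

```python
d1 = {'Canada': 100, 'Japan': 200}
d2 = {'Japan': 999, 'India': 600}

# Unpacking both dictionaries into a new one leaves d1 and d2 unchanged;
# for a shared key, the value from the right-most dictionary wins
merged = {**d2, **d1}
print(merged)

# update() changes d2 in place and also overwrites the shared key
d2.update(d1)
print(d2)
```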


### Set

Sets are unordered and unindexed collections of unique objects. Those objects can be of different types.

In [107]:
set1 = {'apple', 'banana', 'cherry'}
print(set1)

{'cherry', 'banana', 'apple'}


We cannot access items in a set by referring to an index or a key. But we can still loop over a set with **for** and test membership with the **in** operator.

In [110]:
for x in set1:
    print(x)

cherry
banana
apple


Once a set is created, we cannot change its items, but we can add new items to it and remove items from it.

In [111]:
set1.add('orange')
print(set1)

{'cherry', 'banana', 'apple', 'orange'}


In [113]:
set1.remove('apple') # or set1.discard('apple')
print(set1)

{'cherry', 'banana', 'orange'}
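
The two removal methods differ when the item is not in the set. A minimal sketch with a made-up set `fruits`; the `try`/`except` statement is used only to keep the cell running after the error.

```python
fruits = {'apple', 'banana', 'cherry'}

# discard() silently does nothing when the item is absent
fruits.discard('mango')

# remove() raises KeyError when the item is absent
try:
    fruits.remove('mango')
except KeyError as err:
    print("KeyError:", err)

print(len(fruits))   # still 3 items
```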


Union, intersection and difference of sets.

In [124]:
s1 = {"a", "b" , "c"}
s2 = {1, 2, 3}

In [117]:
s3 = s1.union(s2)
print(s3)

{1, 2, 3, 'a', 'b', 'c'}


Note s1 is not changed.

In [118]:
print(s1)

{'a', 'b', 'c'}


Compare this with **update()**, which changes s1 in place:

In [119]:
s1.update(s2)
print(s1)

{1, 2, 3, 'a', 'b', 'c'}


In [127]:
# Start again from fresh sets
s1 = {"a", "b", "c"}
s2 = {1, 2, 3, 4, 6}
s1.add(4)
s1.add(2)
s4 = s1.intersection(s2)
print(s4)

{2, 4}


In [129]:
print(s1)
print(s2)
s5 = s1.difference(s2)
print(s5)

{2, 4, 'a', 'b', 'c'}
{1, 2, 3, 4, 6}
{'a', 'b', 'c'}
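
Python also offers operator forms of these set methods. The sketch below starts from the same set contents as printed above and checks that each operator gives the same result as the corresponding method.

```python
s1 = {2, 4, "a", "b", "c"}
s2 = {1, 2, 3, 4, 6}

print(s1 | s2 == s1.union(s2))          # | is union
print(s1 & s2 == s1.intersection(s2))   # & is intersection
print(s1 - s2 == s1.difference(s2))     # - is difference

# Difference depends on the order of the operands
print(s2 - s1)
```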


#### Creating set from sequence

In [130]:
tuple1 = ('Canada', 'Japan', 'Canada', 'Germany', 'Japan', 'Italy')
list2 = ['Germany', 'France', 'Poland', 'Italy', 'India', 'Poland']
set1 = set(tuple1)
set2 = set(list2)
print(set1)
print(set2)

{'Germany', 'Canada', 'Japan', 'Italy'}
{'Germany', 'Italy', 'India', 'Poland', 'France'}
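
Converting a sequence to a set removes duplicates but loses the original order. If the order of first appearance matters, one possible alternative (a sketch, relying on dictionaries keeping insertion order in Python 3.7 or later) is `dict.fromkeys()`:

```python
countries = ('Canada', 'Japan', 'Canada', 'Germany', 'Japan', 'Italy')

# A set removes duplicates but does not keep the original order
unique = set(countries)
print(len(unique))

# dict.fromkeys() also removes duplicates, and keeps first-seen order
ordered_unique = list(dict.fromkeys(countries))
print(ordered_unique)
```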
