<a href="https://colab.research.google.com/github/natthamonm/PDMO/blob/main/WB_Thailand_Python_data_types.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Introduction to Python's built-in data types**
----
In the introductory example, we briefly discussed how to construct customized data structures for our application (such as the "Bond" class).

However, to be able to build such model components, we first need to learn more about Python's standard, built-in data types.

#### This notebook covers the following fundamental Python data types: 
- integers
- real numbers
- lists
- strings
- dates

## **Integers**: datatype 'int'

In [1]:
x = 4
y = 2
type(x)

int

In [2]:
print("x + y :", x + y)
print("x - y :", x - y)
print("x * y :", x * y)
print("x / y :", x / y)

x + y : 6
x - y : 2
x * y : 8
x / y : 2.0
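
Note that `/` returned a float (`2.0`) even though both operands are integers. As a small supplementary sketch (the values are chosen only for illustration), floor division `//` and the modulo operator `%` keep the result as an integer:

```python
x = 7
y = 2

true_div = x / y      # true division always gives a float
floor_div = x // y    # floor division gives an int for int operands
remainder = x % y     # remainder of the division

print("x / y  :", true_div)
print("x // y :", floor_div)
print("x % y  :", remainder)
```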


In [3]:
x = 2
y = 3
x**y # Notation for exponentiation

8

## **Real numbers**: datatype 'float'

In [8]:
a = 0.016
print(a)
type(a)

0.016


float

In [9]:
# Round a real number to a specific number of decimals
round(20.675, 2)

20.68
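
Floats are stored in binary, so many decimal numbers cannot be represented exactly and `round` can occasionally give a surprising result. The sketch below is an illustration only: it uses the standard-library `decimal` module, which is one possible way to get exact decimal rounding when that matters, for example for monetary amounts.

```python
from decimal import Decimal, ROUND_HALF_UP

# 2.675 is stored as a binary float slightly below 2.675,
# so round() goes down rather than up
rounded_float = round(2.675, 2)

# A Decimal built from a string keeps the exact decimal value
rounded_decimal = Decimal("2.675").quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

print(rounded_float)
print(rounded_decimal)
```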

In [6]:
# Convert from float to int
int(3.1415)

3

In [10]:
# And the other way around
float(3)

3.0

## **Lists**: Datatype 'list'

Lists are useful for storing collections of things, e.g. bonds in a debt portfolio.

In [11]:
# Lists are created with brackets
L = []
type(L)

list

In [12]:
# Create a list of integers
L = [2, 4, 6]

In [13]:
# Number of elements
len(L)

3

In [14]:
# Lists can contain anything, even other lists
L = [0.016, [1, 2, 3], 'PDMO']
L

[0.016, [1, 2, 3], 'PDMO']

In [15]:
# Accessing individual elements with brackets []
# NOTE! Python uses zero-based indexing

# first element
print(L[0])

# last element
print(L[-1])

0.016
PDMO


> ### <font color='steelblue'>**Exercise 1**: List basics</font>
(Note: You can add a new cell by placing the cursor between any two cells and clicking "+ Code" or "+ Text")
1. Make a Python list containing five different numbers
2. Store the list in a variable called 'L'
3. Print the length of the list
4. Print the middle element
5. Print the last element

In [16]:
# Slicing lists

# pos: 0    1    2    3    4
#      |    |    |    |    |
L =     ['a', 'b', 'c', 'd', 'e']

# first index included in slice, last index is not
print(L[1:3])

['b', 'c']


In [None]:
# Slicing from the beginning
print(L[:3])

In [None]:
# Slicing until the end
print(L[2:])

### Comparison with 1-based indexing (Matlab-slicing as example)

> First 3 elements
* Matlab: ``A(1:3)``
* Python: ``A[:3]``

> Last 3 elements
* Matlab: ``A(end-2:end)``
* Python: ``A[-3:]``
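
The Python side of the comparison above can be tried out directly. This is a minimal sketch with made-up values:

```python
A = [10, 20, 30, 40, 50, 60]

first3 = A[:3]    # elements at positions 0, 1, 2
last3 = A[-3:]    # the final three elements

print(first3)
print(last3)
```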

In [None]:
# Adding lists is straightforward

L1 = [4, 6, 8]
L2 = [10, 20, 30, 40]

L = L1 + L2
L

In [None]:
# Adding elements to a list

L1 = [4, 6, 8]
L1.append(10)
L1

> #### <font color='steelblue'>**Exercise 2**: Slicing and combining lists</font>
1. Make a list with 6 elements: 3 integers and 3 floats
2. Make a new list called 'first2' by slicing the __first two__ elements in the original list
3. Make a new list called 'last2' by slicing the __last two__ elements in the original list
4. Combine 'first2' and 'last2' to form a new list (with 4 elements)
5. Print the new list to the screen

In [None]:
# Append an element to list

L = [10, 5, 8]

# append inserts an element at the end of the list
L.append(4)
L

----
> ### <font color='forestgreen'>**Intermezzo**: Revisiting the DebtPortfolio code</font>

In [None]:
import pandas as pd  # needed for pd.concat in get_CFs

class DebtPortfolio:
    def __init__(self):
        self.positions = [] # <-- An empty list is created
        
    def add_position(self, position):
        self.positions.append(position) # <-- Elements are added using 'append'
        
    def get_CFs(self, eval_date):
        CFs = [p.bond.get_CFs(eval_date) * p.amount for p in self.positions]
        df = pd.concat(CFs, axis=1)
        df = df.fillna(value=0)
        return df
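
To see the list mechanics in isolation, here is a stripped-down sketch that runs without pandas. The names `Position`, `SimplePortfolio` and `total_amount`, as well as the symbols and amounts, are hypothetical and only serve to illustrate the pattern of an empty list, `append`, and a list comprehension; this is not the actual model code.

```python
class Position:
    # Hypothetical stand-in: just a bond symbol and an amount held
    def __init__(self, symbol, amount):
        self.symbol = symbol
        self.amount = amount


class SimplePortfolio:
    def __init__(self):
        self.positions = []  # start with an empty list

    def add_position(self, position):
        self.positions.append(position)  # grow the list one element at a time

    def total_amount(self):
        # List comprehension: build a list of amounts, then sum it
        return sum([p.amount for p in self.positions])


pf = SimplePortfolio()
pf.add_position(Position("BOND_A", 1000))
pf.add_position(Position("BOND_B", 2500))

print(len(pf.positions))
print(pf.total_amount())
```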

----
#### Continuing with list methods

In [None]:
# Insert (before a specified index)
L = ['A', 'C', 'D']
L.insert(0, 'X') # 0 here means before index 0, i.e. as the first element
L

In [None]:
L = ['A', 'C', 'D']
L.insert(1, 'X')
L

In [None]:
# Use 'pop' to return and remove an element 

# pop(index), where parameter 'index' defaults to the last item 
L.pop()

In [None]:
L

In [None]:
L = ['A', 'B', 'C', 'D']
L.remove('A') # Remove (first occurrence only) without returning anything

In [None]:
L

In [None]:
# Check if an element is contained in the list
'D' in L

In [None]:
# Reversing lists 

# in-place
L.reverse()
L

In [None]:
# Reversing but keeping list itself unchanged
L = list(range(20)) # In Python 3, range() must be converted to get a list
print(L)
print(L[::-1]) # Slices are always copies

In [None]:
# Sorting lists

# Sorting in-place (i.e. the original list is modified)
# Note: sort() returns None, so printing its result shows 'None'
print(L.sort())
print(L)
print(L.sort(reverse=True))
print(L)

In [None]:
# Returning a sorted copy of the list
L = [5, 22, 3, 1, 4]
print(sorted(L))
print(sorted(L, reverse=True))

In [None]:
# The list itself remains unchanged when using 'sorted'
L
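
Both `sorted` and `list.sort` also accept a `key` argument, which is handy when the elements are not plain numbers. The sketch below is illustrative only: the (symbol, coupon) pairs are made up, and it uses a `lambda` (a small anonymous function), which has not been introduced yet.

```python
# (symbol, coupon) pairs; the values are made up for illustration
bonds = [("BOND_A", 0.025), ("BOND_B", 0.016), ("BOND_C", 0.031)]

# key tells sorted() which value to compare: here the coupon
by_coupon = sorted(bonds, key=lambda pair: pair[1])
symbols = [symbol for symbol, coupon in by_coupon]

print(symbols)
```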

> ### <font color='steelblue'>**Exercise 3**: Sorting lists</font>
This is an exercise in sorting lists
1. Create an unsorted list of 10 numbers and call it 'my_list' 
2. Print the list
3. Print the list in reversed order
4. Make a new list containing the numbers of the original list sorted in __ascending__ order

In [None]:
L = [4,5,3,2,6,9,7,8,9,10]

print(L)
L.reverse()
print(L)

L2 = sorted(L)
L2

In [None]:
L3 = sorted(L, reverse=True)
L3

### copy vs. reference

In [None]:
# Simple assignment creates a reference: A and B now point to the same thing!
A = [2, 4, 6]
B = A
B.append(99)
print(B)
print(A)

In [None]:
# If you really want a copy, use [:] instead
A = [2, 4, 6]
B = A[:]
B.append(99)
print(B)
print(A)
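
One caveat worth knowing: a slice copy is shallow, meaning that nested lists inside the copy are still shared with the original. The following sketch, with made-up values, contrasts a slice copy with `copy.deepcopy` from the standard library:

```python
import copy

A = [1, [2, 3]]

shallow = A[:]             # new outer list, but the inner list is shared
deep = copy.deepcopy(A)    # the inner list is copied as well

A[1].append(99)            # modify the nested list through A

print(shallow)
print(deep)
```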

## **Strings**: Datatype 'str'

In [None]:
# Multiple types of quotes can be used
a = 'Isaac'
b = "Newton"  # <<<< Note: double quotes
print(a)
print(type(a))
print(b)
print(type(b))

In [None]:
d = "Moody's"
print(d)

In [None]:
# Use """...""" for long text spanning multiple lines
# Note: a "\" at the end of a line inside the string suppresses the newline, so the text continues on the same line

long_text = """With 189 member countries, staff from more than 170 countries, \
and offices in over 130 locations, the World Bank Group is a unique global \
partnership: five institutions working for sustainable solutions that reduce \
poverty and build shared prosperity in developing countries."""

print(long_text)

### Strings behave a lot like lists

In [None]:
# Slicing

print(a)
print(a[:2])   # First two characters
print(a[-2:])  # Last two characters

In [None]:
# Adding strings

print(a + b)
print(a + ' ' + b)

In [None]:
# Splitting (turn string into a list)

words = long_text.split()
print(words)
print()
print(type(words))
len(words)

In [None]:
# Example: chaining of string operations

# Task: extract the domain from an arbitrary email address

x = 'billgates@microsoft.com'

x.split('@')[-1].split('.')[0]

In [None]:
# Step-by-step

# Split email in first and second part
print(x.split('@'))

In [None]:
# get last element in list
x.split('@')[-1]

In [None]:
# Split into domain name and top-level domain
x.split('@')[-1].split('.')

In [None]:
# Get the first element, i.e. the domain
x.split('@')[-1].split('.')[0]

In [None]:
# Alternative to chaining would be unpleasant nested function calls like this
str.split(str.split(x, '@')[-1], '.')[0]
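
If the chain becomes hard to read, the same steps can be wrapped in a small function with named intermediate results. This is only a sketch: the function name `email_domain` is hypothetical, and it assumes the address contains exactly one '@'.

```python
def email_domain(address):
    # Hypothetical helper: returns the part between '@' and the first '.'
    user, host = address.split('@')   # unpack the two parts into names
    return host.split('.')[0]

print(email_domain('billgates@microsoft.com'))
```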

#### <font color='steelblue'>**Exercise 4** (Optional): Extracting information from strings</font>

U.S. social security numbers have the form "AAA-GG-SSSS", where AAA is an area code, GG is a group code and SSSS is a serial code.  

Write some code that can extract the group code (i.e. the number in the middle) from a social security number such as '409-52-2002' in two different ways:
* Method 1: use slicing directly on the string
* Method 2: use the 'split' function

Hint: Start by declaring a string variable, s,  containing the social security number.

## **Dates**: Datatype 'date' and 'datetime'

In [None]:
from datetime import date, datetime

In [None]:
dt = date(2015, 2, 4)
dt

In [None]:
# Print shows dates in a human-friendly format
print(dt)

In [None]:
# A date has attributes such as day, month, year
type(dt)
dt.month

In [None]:
# Subtracting dates yields a 'timedelta' object
date(2016, 2, 4) - date(2016, 1, 13)

In [None]:
td = (date(2016, 2, 4) - date(2016, 1, 13))
print(type(td))

In [None]:
# We can e.g. extract number of days from this object
td.days
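
A `timedelta` also works in the other direction: adding one to a date shifts the date. A minimal sketch, reusing the dates from above:

```python
from datetime import date, timedelta

start = date(2016, 1, 13)
end = start + timedelta(days=22)   # shift the date 22 days forward

print(end)
print((end - start).days)
```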

In [None]:
# Converting from date to string (see http://strftime.org/)
dt = date(2020, 10, 20)
print(dt.strftime("%Y-%m-%d"))
print(dt.strftime("%d %B %Y"))

In [None]:
# Converting from string to datetime
s = "20 October 2020 (Tuesday)"
dt = datetime.strptime(s, "%d %B %Y (%A)")
dt
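
Note that `strptime` returns a `datetime` (date plus time), not a `date`. As a small sketch with an illustrative input string, the date part can be extracted with `.date()`, and formatting with the same pattern reverses the parsing:

```python
from datetime import datetime

s = "2029-12-17"
parsed = datetime.strptime(s, "%Y-%m-%d")   # a datetime with time 00:00:00
d = parsed.date()                           # keep only the date part

print(type(parsed).__name__, type(d).__name__)
print(d.strftime("%Y-%m-%d") == s)          # formatting reverses the parsing
```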

### <font color='steelblue'>Exercise 5: Wrangling dates</font>
>Turn a string into a Python datetime datatype 

Write code that converts the string '20 Oct 2020, 13:00' into a datetime.

## **String substitution** using "f-strings"

We often need to insert values into a string, e.g. to build plot legends dynamically.
Let's generate text from some bond data.

In [None]:
class Bond:
    def __init__(self, maturity, coupon, symbol):
        self.maturity = maturity
        self.coupon = coupon
        self.symbol = symbol

In [None]:
# Create an instance of a specific bond
b = Bond(maturity=date(2029, 12, 17), coupon=0.016, symbol="LB29DA")

In [None]:
bond_text = f"The bond {b.symbol} matures on {b.maturity:%d %B %Y} and has a coupon of {100 * b.coupon:.2f}%"

In [None]:
print(bond_text)
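
The part after the colon inside the braces is a format specification. A few commonly used ones are sketched below; the values are made up for illustration.

```python
coupon = 0.016
amount = 1250000.5

pct = f"{coupon:.2%}"        # multiply by 100 and append a percent sign
money = f"{amount:,.2f}"     # thousands separators, two decimals
padded = f"{7:03d}"          # zero-pad an integer to width 3

print(pct, money, padded)
```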