# Python: Data Types

## 1. Numeric classes

### 1.1. Creating numbers
Python supports 3 numeric classes: `int` (integer numbers), `float` (decimal or floating point numbers) and `complex` (complex numbers).

Numbers can be defined directly using the literal syntax of each class.

In [1]:
my_integer = 10
type(my_integer)

int

In [2]:
my_decimal = 1.25
type(my_decimal)

float

In [3]:
my_complex = 3 + 4j
type(my_complex)

complex

In [4]:
1e5

100000.0

Each class provides a constructor function returning an instance.

In [5]:
int(3.14)

3

In [6]:
int('3f', base=16)

63

In [7]:
float(5)

5.0

In [1]:
float('inf')

inf

In [2]:
float('nan')

nan

In [8]:
complex(7)

(7+0j)

In [9]:
complex(3, 4)

(3+4j)

### 1.2. Numeric operators

In [10]:
# the sum
5 + 3.14

8.14

In [11]:
# the difference
6 - 9

-3

In [12]:
# the product
3 * (5+2.5j)

(15+7.5j)

In [13]:
# the quotient
40 / 8

5.0

In [14]:
# the floored quotient
41 // 5

8

In [15]:
# the division remainder
41 % 5

1

In [16]:
dividend = 41
divisor = 5

quotient, remainder = divmod(dividend, divisor)

print(f'{dividend} = {divisor} * {quotient} + {remainder}')

41 = 5 * 8 + 1
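
The examples above use positive operands only. The following sketch shows how `//`, `%` and `int()` behave with a negative dividend; the variable names are illustrative.

```python
# Floor division rounds toward negative infinity, not toward zero.
floor_quotient = -41 // 5        # -9
remainder = -41 % 5              # 4, same sign as the divisor
truncated = int(-41 / 5)         # -8, int() truncates toward zero

# The identity dividend == divisor * quotient + remainder still holds.
identity_holds = (-41 == 5 * floor_quotient + remainder)
print(floor_quotient, remainder, truncated, identity_holds)  # -9 4 -8 True
```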


In [17]:
# 2 to the power 5
2**5

32

In [18]:
pow(2, 5)

32

In [19]:
# the square root
from math import sqrt
sqrt(25)

5.0

In [20]:
# the absolute value
abs(5+12j)

13.0

### 1.3. Constants and functions

In [21]:
import math

In [22]:
# the pi constant
math.pi

3.141592653589793

In [23]:
# the e constant
math.e

2.718281828459045

In [24]:
# the infinity
1/math.inf

0.0

In [25]:
# the factorial
math.factorial(10)

3628800

In [26]:
# the smallest integer greater than or equal to 5.6
math.ceil(5.6)

6

In [27]:
# the largest integer less than or equal to 5.6
math.floor(5.6)

5

In [28]:
# e to the power of 2
math.exp(2)

7.38905609893065

In [29]:
# the logarithm to the base of 10
math.log(100, 10)

2.0

In [30]:
# the logarithm to the base of e, or natural logarithm
math.log(math.e)

1.0

In [31]:
# the cosine of pi
math.cos(math.pi)

-1.0

In [32]:
# round to 2 decimal places
round(1.4825, 2)

1.48

In [33]:
# round to the nearest ten (1 digit to the left of the decimal point)
round(133.45, -1)

130.0

In [34]:
# round to integer
round(1.85)

2
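
One detail the `round()` examples do not show is how exact ties are resolved. The sketch below illustrates that the built-in `round()` sends ties to the nearest even integer, and uses the standard `decimal` module as one possible way to get half-up rounding when that convention is needed.

```python
from decimal import Decimal, ROUND_HALF_UP

# Built-in round() resolves exact ties to the nearest even integer.
ties = [round(0.5), round(1.5), round(2.5), round(3.5)]
print(ties)  # [0, 2, 2, 4]

# Decimal can round ties away from zero when that convention is required.
half_up = Decimal('2.5').quantize(Decimal('1'), rounding=ROUND_HALF_UP)
print(half_up)  # 3
```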

### 1.4. Methods and attributes

In [36]:
my_decimal = 7.0
my_decimal.is_integer()

True

In [37]:
my_complex = 3+4j
my_complex.real

3.0

In [38]:
my_complex = 3+4j
my_complex.imag

4.0

## 2. Boolean and None

### 2.1. Boolean class
The only two Boolean values in Python are written as `True` and `False` (capitalized). Comparisons and other conditional expressions evaluate to either `True` or `False`. Boolean values are combined with the `and` and `or` keywords and negated with `not`.

#### Usage

In [39]:
1 == 2

False

In [40]:
1 < 2

True

In [41]:
'a' in 'anaconda'

True

In [42]:
'b' not in 'anaconda'

True

In [43]:
'1996'.isdigit()

True

In [44]:
# check if 7. is an instance of "float" class
isinstance(7., float)

True

#### Rules
```python
True  and True  = True
True  or  True  = True
False and False = False
False or  False = False
True  and False = False
True  or  False = True
```
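
The rules above can be checked mechanically. This is a small sketch that enumerates every pair of operands with `itertools.product`; the name `truth_table` is only illustrative.

```python
from itertools import product

# Enumerate every pair of Boolean operands and evaluate both operators.
truth_table = [(a, b, a and b, a or b) for a, b in product([True, False], repeat=2)]

for a, b, conjunction, disjunction in truth_table:
    print(f'{a!s:5} {b!s:5} and={conjunction!s:5} or={disjunction!s:5}')
```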

In [45]:
True and False and False or True
# = False and False or True
# = False or True
# = True

True

In [46]:
True and (False and True) or (False or True)
# = True and False or True
# = False or True
# = True

True
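
The step-by-step evaluations above rely on operator precedence: `not` binds most tightly, then `and`, then `or`. A short sketch with illustrative variable names:

```python
# `and` binds more tightly than `or`, so this parses as True or (False and False).
without_parentheses = True or False and False
with_parentheses = (True or False) and False
print(without_parentheses, with_parentheses)  # True False

# `not` binds more tightly than both, so this parses as (not False) and False.
negated = not False and False
print(negated)  # False
```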

In [47]:
sum([True, False, True, True])

3

### 2.2. None class

In [48]:
# "None" must be capitalized
a = None
type(a)

NoneType

In [49]:
None == 0

False
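
Since `None` is not equal to `0` or to an empty string, it is commonly used to mean "no value supplied". The sketch below tests for it with `is`, the conventional identity check; `describe` is a hypothetical helper written only for this illustration.

```python
def describe(value=None):
    """Hypothetical helper: report whether an argument was supplied."""
    # Identity comparison is the conventional way to test for None.
    if value is None:
        return 'no value'
    return f'value: {value}'

print(describe())      # no value
print(describe(0))     # value: 0
```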

## Recap
A recap of the differences between functions, methods, constants and attributes.

Object       |Functionality    |Scope                    |Syntax                             |
:------------|:----------------|:------------------------|:----------------------------------|
**Function** |Perform an action|Independent              |`function(parameter=argument)`     |
**Method**   |Perform an action|Associated with an object|`object.method(parameter=argument)`|
**Constant** |Return a value   |Independent              |`constant`                         |
**Attribute**|Return a value   |Associated with an object|`object.attribute`                 |
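
To make the four rows concrete, here is one example of each kind applied to a complex number. The variable names are illustrative.

```python
import math

number = 3 + 4j

function_result = abs(number)            # function: called on its own
method_result = number.conjugate()       # method: called on an object
constant_value = math.pi                 # constant: a plain name in a module
attribute_value = number.imag            # attribute: read from an object

print(function_result, method_result, constant_value, attribute_value)
```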

## 3. String class
A string is an immutable sequence of characters.
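
A minimal sketch of what immutability means in practice: assigning to a position raises an error, so a modified string is always a new object. The variable names are illustrative.

```python
word = 'python'
message = ''

# Item assignment is rejected because str objects cannot change in place.
try:
    word[0] = 'P'
except TypeError as error:
    message = str(error)
    print(message)

# Build a new string instead; the original is left untouched.
capitalized = word.capitalize()
print(word, capitalized)  # python Python
```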

### 3.1. Creating strings
Strings are created using single quotes `''`, double quotes `""`, triple quotes for multi-line text, or the `str()` constructor.

In [50]:
'python 3.7'

'python 3.7'

In [51]:
print('''This string
spans multiple lines''')

This string
spans multiple lines


In [52]:
str(245)

'245'

### 3.2. String manipulation

#### String operations

In [53]:
# concatenating
'the pi number ' + 'is ' + '3.14'

'the pi number is 3.14'

In [54]:
# repeating
'abc' * 3

'abcabcabc'

In [55]:
'py' in 'jupyter'

True

#### Functions

In [57]:
len('123456789')

9

In [58]:
eval('1+2')
# the passed string must represent an expression

3
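
Because `eval()` executes whatever expression it receives, it should not be applied to untrusted input. When the string is expected to hold only a literal (a number, string, list and so on), the standard library's `ast.literal_eval()` is one safer option, sketched here with illustrative names.

```python
import ast

# literal_eval accepts only Python literals, so arbitrary code is rejected.
parsed = ast.literal_eval('[1, 2.5, "three"]')
print(parsed)  # [1, 2.5, 'three']

rejected = False
try:
    ast.literal_eval('__import__("os").getcwd()')
except ValueError:
    rejected = True
    print('rejected')
```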

In [59]:
max('jupyter, notebook')

'y'

#### String methods

In [61]:
'jack o lantern'.split(sep=' ')

['jack', 'o', 'lantern']

In [62]:
'-'.join(['jack', 'o', 'lantern'])

'jack-o-lantern'

In [64]:
'old string old'.replace('old', 'new', 1)

'new string old'

In [66]:
'jupyter noteBOOK'.title()

'Jupyter Notebook'

In [67]:
'jupyter'.upper()

'JUPYTER'

In [71]:
'notebook'.startswith('no')

True

In [72]:
'anaconda'.count('a')

3

In [73]:
'1234'.isnumeric()

True

In [74]:
'abc'.isalpha()

True

In [75]:
'ab12'.isalnum()

True

In [78]:
'Abcd Abcd'.istitle()

True

### 3.3. Index and slicing
Python strings support indexing and slicing to extract substrings. Slicing returns a new object and leaves the original string unchanged.

#### Index
Indexing rules for the string "ANACONDA":

```
positive index:   0  1  2  3  4  5  6  7
string:           A  N  A  C  O  N  D  A
negative index:  -8 -7 -6 -5 -4 -3 -2 -1
```

In [79]:
my_string = 'abcdef'
my_string.index('cd')

2

#### Slicing
Slicing notation: `string[start:stop:step]`, where the part `start:stop` uses character indices and represents a half-open interval `[start; stop)` that includes the start index and does not include the stop index. If not specified, `start` defaults to 0, `stop` defaults to the length of the string and `step` defaults to 1.

In [80]:
digits = '0123456789'
digits[2:7]

'23456'

In [81]:
digits = '0123456789'
digits[7:]

'789'

In [82]:
digits = '0123456789'
digits[:7]

'0123456'

In [83]:
digits = '0123456789'
digits[7]

'7'

In [84]:
digits = '0123456789'
digits[::2]

'02468'

In [85]:
digits = '0123456789'
digits[1:-1]

'12345678'

In [86]:
digits = '0123456789'
digits[:7] + digits[7:] == digits

True

In [87]:
# reverse a string
digits = '0123456789'
digits[::-1]

'9876543210'
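
A few more cases of the `start:stop:step` notation, sketched with the same `digits` string: a negative step with explicit bounds, negative indices, and bounds that fall outside the string.

```python
digits = '0123456789'

# With a negative step the slice walks backwards, so start must be the larger index.
backwards = digits[7:2:-1]
print(backwards)  # 76543

# Negative indices count from the end of the string.
last_three = digits[-3:]
print(last_three)  # 789

# Out-of-range slice bounds are clipped, whereas an out-of-range index raises IndexError.
clipped = digits[5:100]
print(clipped)  # 56789
```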

### 3.4. String literals

#### Escape character
In strings, the backslash `\` behaves as the escape character and tells Python that the following character `n`, `t`, `'`, `"`, `\`, `u` or `U` has a special meaning.

In [88]:
# new line
print('aaa\nbbb')

aaa
bbb


In [89]:
# tab
print('aaa\tbbb')

aaa	bbb


In [90]:
# backslash
print('C:\\Users')

C:\Users


In [91]:
# single quote
print('It\'s raining')
print("It's raining")

It's raining
It's raining


In [92]:
# Greek letters, using Unicode
print('''alpha: \u03b1 \t beta: \u03b2 \t gamma: \u03b3
delta: \u03b4 \t epsilon: \u03b5 \t theta: \u03b8
lambda: \u03bb \t mu: \u03bc \t\t pi: \u03c0
sigma: \u03c3 \t phi: \u03c6 \t omega: \u03c9''')

alpha: α 	 beta: β 	 gamma: γ
delta: δ 	 epsilon: ε 	 theta: θ
lambda: λ 	 mu: μ 		 pi: π
sigma: σ 	 phi: φ 	 omega: ω


#### Bytes
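A string literal prefixed with `b` creates an instance of the `bytes` type instead of the `str` type. A bytes literal may contain ASCII characters only; other byte values are written with escapes such as `\xe1`. A `str` is converted to `bytes` by encoding it.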

In [93]:
bytes('Tiếng Việt', 'utf-8')

b'Ti\xe1\xba\xbfng Vi\xe1\xbb\x87t'

In [94]:
'Tiếng Việt'.encode('utf-8')

b'Ti\xe1\xba\xbfng Vi\xe1\xbb\x87t'

In [95]:
b'Ti\xe1\xba\xbfng Vi\xe1\xbb\x87t'.decode('utf-8')

'Tiếng Việt'
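
The following sketch reuses the encoded value shown above to illustrate two properties of `bytes`: a non-ASCII character occupies several bytes in UTF-8, and indexing yields an integer rather than a character.

```python
encoded = b'Ti\xe1\xba\xbfng Vi\xe1\xbb\x87t'
text = encoded.decode('utf-8')

# Non-ASCII characters take more than one byte in UTF-8.
print(len(text), len(encoded))  # 10 14

# Indexing a bytes object yields an integer, not a one-character string.
first_byte = encoded[0]
print(first_byte)  # 84, the code of 'T'
```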

#### Raw string
A string prefixed with `r` is a raw string: it treats backslashes as literal characters. This is very useful for Windows paths and regex patterns.

In [96]:
print(r'a\nb')

a\nb


In [97]:
r'C:\Users'

'C:\\Users'
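
Comparing lengths makes the difference between an escaped and a raw string visible. A small sketch with illustrative names:

```python
escaped = 'a\nb'    # backslash-n is one newline character
raw = r'a\nb'       # backslash and n are two separate characters

print(len(escaped), len(raw))  # 3 4
print(raw == 'a\\nb')          # True
```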

#### Formatted string
Formatted strings are `f` prefixed, and the formatted part is placed inside braces `{}`.

In [98]:
f'1 + 2 = {1+2}'

'1 + 2 = 3'

In [99]:
name = 'Hung'
f'My name is {name}'

'My name is Hung'

### 3.5. Formatting

#### Basic formatting
The `str.format()` method fills its arguments into the placeholders (marked with braces `{}`):
- If a placeholder is empty, it is filled with the next positional argument in order
- If a placeholder contains a number, that number is the index of the positional argument to use
- If a placeholder contains a name, it is filled with the keyword argument of that name

In [100]:
'The answer is {}'.format(3+4)

'The answer is 7'

In [101]:
'{} and {} and {}'.format('a', 1, None)

'a and 1 and None'

In [102]:
'{1} {0} {0} {1}'.format('three', 'seven')

'seven three three seven'

In [103]:
'The {cat} is {color}'.format(cat='panther', color='black')

'The panther is black'

The prefix `f` can be used instead of the `str.format()` method.

In [104]:
cat = 'panther'
color = 'black'
f'The {cat} is {color}'

'The panther is black'

#### Padding and truncating

In [105]:
# default padding uses whitespace characters and left aligns
'{:10}'.format('test')

'test      '

In [106]:
# right align
print('{:*>10}'.format('test'))

# left align
print('{:*<10}'.format('test'))

# center align
print('{:*^10}'.format('test'))

******test
test******
***test***


In [107]:
# truncate to a 6-character long string
'{:.6}'.format('google chrome')

'google'

In [108]:
# combine padding and truncating
'{:*>10.6}'.format('google chrome')

'****google'

#### Number formatting

In [109]:
# using comma (,) as thousands separator
'{x:,}'.format(x=123456789)

'123,456,789'

For integers, use `d`.

In [110]:
'{: 05d}'.format(85)

# " " puts a leading space before positive numbers and a minus sign "-" before negative numbers
# "0" pads with zeros
# "5" is the minimum total width

' 0085'

For floats, use `f`.

In [111]:
# round to 2 decimal places
'{pi:.2f}'.format(pi=3.14159265359)

'3.14'

In [112]:
# "#" option causes the output to always contain a decimal point 
'{pi:#.0f}'.format(pi=3.14159265359)

'3.'

In [113]:
# "%" multiplies the value by 100 and appends a percent sign; ".2" keeps 2 decimal places
'{pi:.2%}'.format(pi=3.14159265359)

'314.16%'
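
The same format specifications work inside f-strings, placed after a colon following the expression. A brief sketch; `population` is an illustrative variable.

```python
pi = 3.14159265359
population = 123456789

# The text after the colon is the same format specification used by str.format().
print(f'{pi:.2f}')          # 3.14
print(f'{population:,}')    # 123,456,789
print(f'{"test":*^10}')     # ***test***
print(f'{0.256:.1%}')       # 25.6%
```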

### 3.6. Regular expression
A regular expression (regex) is a sequence of characters that defines a search pattern. For example, `^a...s$` matches any five-character string starting with `a` and ending with `s`. Python provides the `re` module for working with regexes.

#### Search patterns
Metacharacters are characters that are interpreted in a special way by a regex engine. The metacharacters are: `.`, `^`, `$`, `*`, `+`, `?`, `[]`, `()`, `{}`, `\`, `|`.

In [114]:
import re

A period `.` matches any single character, except for `\n`. The `re.findall()` function finds all the substrings that match the search pattern.

In [115]:
# return all 3-character substrings
re.findall('...', 'abcd\n1234')

['abc', '123']

In [116]:
# return all 3-character substrings, including "\n"
re.findall('...', 'abcd\n1234', flags=re.DOTALL)

['abc', 'd\n1', '234']

The caret `^` is used to check if the string starts with certain characters. The dollar sign `$` is used to check if the string ends with certain characters.

In [117]:
# return the first 3 characters if the original string starts with "ab"
re.findall('^ab.', 'abcdefgf')

['abc']

In [118]:
# return the last 4 characters if the original string ends with "89"
re.findall('..89$', '0123456789')

['6789']

Adding `flags=re.MULTILINE` makes `^` and `$` match at the start and end of every line, instead of only at the start and end of the entire string.

In [119]:
string = '''
abcd
amnd
abcde
aabcdd
'''
re.findall('^a..d$', string, flags=re.MULTILINE)

['abcd', 'amnd']
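
To see the effect of the flag in isolation, this sketch runs the same pattern with and without `re.MULTILINE` on a short three-line string.

```python
import re

text = 'abcd\namnd\nabcde'

# Without the flag, ^ and $ anchor only to the start and end of the whole string.
without_flag = re.findall(r'^a..d$', text)
# With the flag, they also anchor at the start and end of each line.
with_flag = re.findall(r'^a..d$', text, flags=re.MULTILINE)

print(without_flag)  # []
print(with_flag)     # ['abcd', 'amnd']
```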

The following metacharacters match a number of occurrences of the preceding character:  
- The asterisk `*` matches 0 or more repetitions.
- The plus sign `+` matches 1 or more repetitions.
- The question mark `?` matches 0 or 1 repetitions.
- The braces `{x,y}` match at least x and at most y repetitions.

For example:
- `ma*n` matches `mn`, `man`, `maan`, `maaan` and so on
- `ma+n` matches `man`, `maan`, `maaan` and so on
- `ma?n` matches `mn` and `man` only
- `ma{2,4}n` matches `maan`, `maaan` and `maaaan` only

In [120]:
re.findall('ma+n', 'mn man maan maaan main')

['man', 'maan', 'maaan']

In [121]:
re.findall('ma{2,3}n', 'mn man maan maaan main')

['maan', 'maaan']
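
The quantifier examples listed earlier can be verified with `re.fullmatch()`, which succeeds only when the entire string matches. The candidate list and the `accepted` dictionary are illustrative.

```python
import re

candidates = ['mn', 'man', 'maan', 'maaan', 'maaaan', 'maaaaan']
patterns = [r'ma*n', r'ma+n', r'ma?n', r'ma{2,4}n']

# re.fullmatch succeeds only when the whole candidate matches the pattern.
accepted = {
    pattern: [word for word in candidates if re.fullmatch(pattern, word)]
    for pattern in patterns
}

for pattern, words in accepted.items():
    print(pattern, words)
```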

The brackets `[]` specify a set of characters to match. Inside the brackets, the hyphen `-` specifies a range of characters, and a leading caret `^` negates the set so that characters not in it are matched.

In [122]:
re.findall('[Aa]', 'Anaconda')

['A', 'a', 'a']

In [123]:
re.findall('[a-e]', 'jupyter notebook')

['e', 'e', 'b']

In [124]:
re.findall('[^1-9]', 'abc123')

['a', 'b', 'c']

The backslash `\` also introduces special sequences:
- `\d` matches any digit. Equivalent to `[0-9]` for ASCII text. The opposite: `\D`.
- `\s` matches any whitespace character such as space ` `, new line `\n` and tab `\t`. The opposite: `\S`.
- `\w` matches any word character (letters, digits and the underscore). Equivalent to `[a-zA-Z0-9_]` for ASCII text. The opposite: `\W`.

In [125]:
re.findall(r'\d', 'a1b2c34')

['1', '2', '3', '4']

In [126]:
re.findall(r'\s', 'a string that spans\nmultiple lines')

[' ', ' ', ' ', '\n', ' ']
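
A short sketch of what `\w` covers: it includes the underscore and, for `str` patterns, non-ASCII letters unless the `re.ASCII` flag is given. The sample string is illustrative.

```python
import re

sample = 'a_1-\u00e9'  # the last character is "é"

# \w matches the underscore and, by default, non-ASCII letters too.
unicode_words = re.findall(r'\w', sample)
# re.ASCII restricts \w to [a-zA-Z0-9_].
ascii_words = re.findall(r'\w', sample, flags=re.ASCII)

print(unicode_words)  # ['a', '_', '1', 'é']
print(ascii_words)    # ['a', '_', '1']
```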

#### Regex functions

In [127]:
import re

The `re.findall()` function returns a list of all substrings that match the search pattern.

In [128]:
re.findall(r'\d+', 'date:08month:12year:2019')

['08', '12', '2019']

The `re.split()` function splits the original string, using matches of the search pattern as separators.

In [129]:
re.split(r':\d+', 'date:08month:12year:2019')

['date', 'month', 'year', '']

In [130]:
re.split(r':\d+', 'date:08month:12year:2019', maxsplit=2)

['date', 'month', 'year:2019']

The `re.sub()` function replaces all matches with the content of the `repl` parameter. The `re.subn()` function works the same, but it returns both the replaced string and the number of substitutions made.

In [131]:
re.sub(
    pattern=r'\s+',
    repl='_',
    string='tiger  lion cheetah \n leopard \t jaguar'
)

'tiger_lion_cheetah_leopard_jaguar'

In [132]:
re.sub(
    pattern=r'\s+',
    repl='_',
    string='tiger  lion cheetah \n leopard \t jaguar',
    count=2
)

'tiger_lion_cheetah \n leopard \t jaguar'

In [133]:
re.subn(
    pattern=r'\s+',
    repl='_',
    string='tiger  lion cheetah \n leopard \t jaguar'
)

('tiger_lion_cheetah_leopard_jaguar', 4)

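The `re.search()` and `re.match()` functions both return an `re.Match` object, or `None` when nothing matches. `re.match()` only tries the pattern at the beginning of the string, whereas `re.search()` scans the whole string.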

In [134]:
if re.match('...', 'abcdef'):
    print(True)
else:
    print(False)

True
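
The two functions differ in where they look for the pattern. This sketch applies both to the same illustrative string, in which the digits do not appear at the start.

```python
import re

text = 'abc123'

# re.match only tries the pattern at the very start of the string.
at_start = re.match(r'\d+', text)
# re.search scans forward until the pattern matches somewhere.
anywhere = re.search(r'\d+', text)

print(at_start)          # None
print(anywhere.group())  # 123
```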


In [135]:
# return 2 subgroups
match = re.search(r'(\d{3}) (\d{2})', '39801 356, 2102 1111')
print(match.group())
print(match.group(1))
print(match.groups())

801 35
801
('801', '35')


In [136]:
# return the start index and the end index (exclusive) of the match
match = re.search(r'\d+', 'abc12345de')
print(match.start())
print(match.end())
print(match.span())

3
8
(3, 8)


## 4. Date and time
The built-in Python `datetime` module provides a variety of date and time classes.

In [137]:
import datetime

In [138]:
# list all names defined in the datetime module
dir(datetime)

['MAXYEAR',
 'MINYEAR',
 '__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 'date',
 'datetime',
 'datetime_CAPI',
 'sys',
 'time',
 'timedelta',
 'timezone',
 'tzinfo']

### 4.1. Datetime

In [139]:
import datetime as dt

In [141]:
dt.datetime.now()

datetime.datetime(2021, 6, 25, 0, 4, 18, 465375)

In [142]:
dt.date.today()

datetime.date(2021, 6, 25)

A `dt.datetime` object combines a `dt.date` part and a `dt.time` part. The three classes are constructed and accessed in the same way.

In [143]:
moment = dt.datetime(2019, 9, 12, 1, 23, 45)
print(moment)
moment

2019-09-12 01:23:45


datetime.datetime(2019, 9, 12, 1, 23, 45)

In [144]:
moment = dt.datetime(year=2019, month=9, day=12, hour=1, minute=23, second=45)
moment

datetime.datetime(2019, 9, 12, 1, 23, 45)

#### Manipulation

In [145]:
moment.year

2019

In [146]:
moment.month

9

In [147]:
moment.day

12

In [148]:
moment.hour

1

In [149]:
moment.minute

23

In [150]:
moment.second

45

In [151]:
moment.date()

datetime.date(2019, 9, 12)

In [152]:
moment.time()

datetime.time(1, 23, 45)

In [153]:
# replace some fields; a new object is returned
moment.replace(hour=0, minute=0, second=0)

datetime.datetime(2019, 9, 12, 0, 0)
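
The `date()` and `time()` methods split a datetime into its parts. The reverse operation is `dt.datetime.combine()`, sketched here with illustrative variable names.

```python
import datetime as dt

day = dt.date(2019, 9, 12)
clock = dt.time(1, 23, 45)

# datetime.combine() joins a date and a time into a single datetime.
combined = dt.datetime.combine(day, clock)
print(combined)  # 2019-09-12 01:23:45

# The parts can be recovered with the date() and time() methods.
print(combined.date() == day, combined.time() == clock)  # True True
```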

### 4.2. Timedelta
The difference between two `dt.datetime` objects is a `dt.timedelta` object. A `timedelta` stores only days, seconds and microseconds, so it cannot express calendar units such as months or years; for those we recommend the [`dateutil.relativedelta`] module instead.

[`dateutil.relativedelta`]: https://dateutil.readthedocs.io/en/stable/relativedelta.html

In [7]:
import datetime as dt
from dateutil import relativedelta as du

In [40]:
date1 = dt.date(1945, 9, 2)
date2 = dt.date.today()
date2 - date1

datetime.timedelta(days=28190)
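
A `timedelta` can also be built directly and added to a datetime. The sketch below uses only the standard library; the variable names are illustrative.

```python
import datetime as dt

start = dt.datetime(2019, 9, 12, 1, 23, 45)
delta = dt.timedelta(weeks=2, days=3, hours=12)

# Adding a timedelta shifts the datetime by an exact duration.
shifted = start + delta
print(shifted)  # 2019-09-29 13:23:45

# A timedelta normalises itself to days, seconds and microseconds.
print(delta.days, delta.seconds)    # 17 43200
print(delta.total_seconds())        # 1512000.0
```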

In [41]:
du.relativedelta(date2, date1)

relativedelta(years=+77, months=+2, days=+5)

In [32]:
# calculating with various frequencies
date1 + du.relativedelta(years=5, months=-3, weeks=1, days=-10)

datetime.date(1950, 5, 30)

In [35]:
# substitue with a specific day
date1 + du.relativedelta(day=31)

datetime.date(1945, 9, 30)

In [48]:
# the second Monday on or after date1
date1 + du.relativedelta(weekday=du.MO(2))

datetime.date(1945, 9, 10)

In [49]:
# date1 is already a Sunday,
# so step 1 day forward before looking for the next Sunday
date1 + du.relativedelta(weekday=du.SU, days=1)

datetime.date(1945, 9, 9)

:::{note}

- Explicitly substituting day 31 always returns the last day of that month.
- Because 1945-09-02 is already a Sunday, dateutil treats it as the next Sunday. To actually move to the following Sunday, we add 1 day first.
- When using dateutil, try not to mix too many date/time frequencies.

:::
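
For comparison, the first two behaviours in the note can be reproduced with the standard library alone. This is a sketch, not how dateutil is implemented; `last_day_of_month` and `next_weekday` are hypothetical helpers written for this illustration.

```python
import calendar
import datetime as dt

def last_day_of_month(day):
    """Hypothetical helper: return the last date of the month containing `day`."""
    # monthrange() returns (weekday of the 1st, number of days in the month).
    days_in_month = calendar.monthrange(day.year, day.month)[1]
    return day.replace(day=days_in_month)

def next_weekday(day, weekday, inclusive=True):
    """Hypothetical helper: first date falling on `weekday` (Monday is 0)."""
    offset = (weekday - day.weekday()) % 7
    if offset == 0 and not inclusive:
        offset = 7
    return day + dt.timedelta(days=offset)

date1 = dt.date(1945, 9, 2)  # a Sunday
print(last_day_of_month(date1))                  # 1945-09-30
print(next_weekday(date1, 6))                    # 1945-09-02
print(next_weekday(date1, 6, inclusive=False))   # 1945-09-09
```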

### 4.3. Datetime formatting
Formatting and parsing use the `dt.datetime.strftime()` and `dt.datetime.strptime()` methods. *strftime* stands for *string-format-time* and *strptime* stands for *string-parse-time*.

#### Format codes
Code|Meaning                         |Example                             |
:---|:-------------------------------|:-----------------------------------|
`%y`|Year, 2 last digits             |00, 01, 02,..., 99                  |
`%Y`|Year, full                      |0001, 0002,..., 2019, 2020,..., 9999|
`%b`|Month name, abbreviated         |Jan, Feb,..., Dec                   |
`%B`|Month name, full                |January, February,..., December     |
`%m`|Month of the year, zero-padded  |01, 02, 03,..., 12                  |
`%W`|Week of the year, zero-padded   |00, 01, 02,..., 53                  |
`%d`|Day of the month, zero-padded   |01, 02, 03,..., 31                  |
`%a`|Weekday name, abbreviated       |Mon, Tue,..., Sun                   |
`%A`|Weekday name, full              |Monday, Tuesday,..., Sunday         |
`%H`|Hour, 24-hour clock, zero-padded|00, 01, 02,..., 23                  |
`%I`|Hour, 12-hour clock, zero-padded|01, 02, 03,..., 12                  |
`%p`|AM or PM                        |AM, PM                              |
`%M`|Minute, zero-padded             |00, 01, 02,..., 59                  |
`%S`|Second, zero-padded             |00, 01, 02,..., 59                  |

In [158]:
import datetime as dt

In [159]:
# strptime() converts a string to a datetime object
string_1 = '11/8/2018 12:18:00 PM'
datetime_1 = dt.datetime.strptime(string_1, '%m/%d/%Y %I:%M:%S %p')
datetime_1

datetime.datetime(2018, 11, 8, 12, 18)

In [160]:
# strftime() converts a datetime object to a string
datetime_2 = dt.datetime(2018, 11, 8, 12, 18)
string_2 = datetime_2.strftime('%m/%d/%Y %I:%M:%S %p')
string_2

'11/08/2018 12:18:00 PM'

In [161]:
# format the current date as "DD-MM-YYYY"
date = dt.date.today()
date.strftime('%d-%m-%Y')

'25-06-2021'
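
The two methods are often used together to convert between string formats. The sketch below parses with one format and writes with another, and shows that a string that does not fit the format raises `ValueError`. Only numeric format codes are used so the result does not depend on the locale.

```python
import datetime as dt

# Parse with one format, then emit with another.
parsed = dt.datetime.strptime('2018-11-08 14:05', '%Y-%m-%d %H:%M')
reformatted = parsed.strftime('%d/%m/%Y %H:%M')
print(reformatted)  # 08/11/2018 14:05

# A string that does not fit the format raises ValueError.
mismatch = False
try:
    dt.datetime.strptime('08/11/2018', '%Y-%m-%d')
except ValueError:
    mismatch = True
    print('format mismatch')
```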

### 4.4. Timestamp
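A timestamp, also known as a Unix timestamp, is the number of seconds that have elapsed since 1970-01-01 00:00:00 UTC. `fromtimestamp()` returns local time by default, so on a machine set to UTC+7 (as in the output below) timestamp 0 is shown as 07:00.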

In [162]:
import datetime as dt

In [163]:
dt.datetime.now().timestamp()

1624554258.892254

In [164]:
dt.datetime.fromtimestamp(0)

datetime.datetime(1970, 1, 1, 7, 0)
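
To get a result that does not depend on the machine's time zone, a `tz` argument can be passed. A minimal sketch using the standard `dt.timezone.utc`; the variable names are illustrative.

```python
import datetime as dt

# Passing tz makes the result independent of the machine's local time zone.
epoch = dt.datetime.fromtimestamp(0, tz=dt.timezone.utc)
print(epoch)  # 1970-01-01 00:00:00+00:00

# An aware datetime converts back to the same timestamp on any machine.
aware_moment = dt.datetime(2021, 6, 25, tzinfo=dt.timezone.utc)
print(aware_moment.timestamp())  # 1624579200.0
```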
