# FIT1043 Introduction to Data Science
Week 2 Lecture

## Basic Python Data Types

Like most programming languages, Python has a few basic data types. The computer needs to know whether we are dealing with whole numbers (generally referred to as integers in computing) or with values that have a fractional part, such as currency. Some programming languages support fixed-point decimals, but fractional values are most commonly stored as floating-point numbers.

Another important data type is the Boolean (logical) type, which holds a true or false value.

Finally, most modern programming languages provide some means of handling strings. The C programming language is a notable exception: it has no built-in string data type, and strings are instead represented as arrays of characters.

The basic data types in Python are:
- Integers
- Floating-Point Numbers
- Boolean
- Strings

### Integer (int)

Python interprets a sequence of decimal (base 10) digits without any prefix (``0b``, ``0o`` or ``0x``) as a decimal number.

In [1]:
print(10)

10


A prefix of ``0b`` marks the digits as a binary (base 2) number.

In [9]:
print(0b10)

2


A prefix of ``0o`` marks the digits as an octal (base 8) number (rarely used).

In [10]:
print(0o10)

8


A prefix of ``0x`` marks the digits as a hexadecimal (base 16) number.

In [11]:
print(0x10)

16


In [13]:
type(0x10)

int
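
The prefixes also work in the other direction. As a small sketch (not one of the original lecture cells), the built-in functions ``bin()``, ``oct()`` and ``hex()`` return the prefixed text form of an integer, and ``int()`` with a base argument parses a string of digits written in that base. The variable names are chosen only for illustration.

```python
# One integer value shown in several bases
n = 0x10                      # hexadecimal literal, same value as decimal 16

as_binary = bin(n)            # '0b10000'
as_octal = oct(n)             # '0o20'
as_hex = hex(n)               # '0x10'

# int() can parse a string of digits written in a given base
from_binary = int('10', 2)    # 2
from_octal = int('10', 8)     # 8
from_hex = int('10', 16)      # 16

print(n, as_binary, as_octal, as_hex)
print(from_binary, from_octal, from_hex)
```

Whatever base the literal is written in, the stored value is an ordinary ``int``.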

### Floating Point (float)

Values written with a decimal point are floating-point numbers.

In [4]:
print(4.2)

4.2


In [5]:
type(4.2)

float

In [6]:
4.

4.0

For scientific notation, the character ``e`` followed by a positive or negative integer exponent may be used. The number before ``e`` is multiplied by 10 raised to that exponent.

In [7]:
.4e7

4000000.0

In [8]:
type(.4e7)

float

In [21]:
4.2e-4

0.00042
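
One consequence of floating-point storage is worth seeing early, especially for values like currency: some decimal fractions cannot be represented exactly in binary. The sketch below is an illustration added here, not one of the original cells. It uses ``math.isclose()`` from the standard library to compare floats with a tolerance instead of ``==``.

```python
import math

total = 0.1 + 0.2

is_equal = (total == 0.3)             # exact comparison
is_close = math.isclose(total, 0.3)   # comparison with a small tolerance

print(total)             # slightly more than 0.3
print(is_equal)          # False
print(is_close)          # True
print(round(total, 2))   # 0.3
```

When comparing the results of float arithmetic, a tolerance-based check or rounding is usually safer than an exact equality test.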

### Boolean (bool)

The Boolean type has one of two values, ``True`` or ``False``. Python has had a ``bool`` type since version 2.3; in Python 3, ``True`` and ``False`` are also reserved keywords that cannot be reassigned.

In [22]:
type(True)

bool

In [23]:
type(False)

bool

In [24]:
print(True | False)

True


In [25]:
print(True & False)

False
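
The cells above use ``|`` and ``&``, which are bitwise operators that also work on Boolean values. In everyday Python code the keywords ``or``, ``and`` and ``not`` are more common. A minimal sketch, with variable names picked only for illustration:

```python
a = True
b = False

either = a or b      # True if at least one is True
both = a and b       # True only if both are True
negated = not a      # flips the value

print(either)        # True
print(both)          # False
print(negated)       # False

# Comparisons also produce Boolean values
is_adult = 19 >= 18
print(is_adult, type(is_adult))   # True <class 'bool'>
```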


### Strings (str)

Strings are delimited using either single or double quotes.

Only the characters between the opening delimiter and the matching closing delimiter are part of the string. Text without delimiters is not a string and causes an error.

In [26]:
print(I am here)

SyntaxError: invalid syntax (1203604894.py, line 1)

In [27]:
print("I am a string.")

I am a string.


In [28]:
print('I am a string in single quotes')

I am a string in single quotes


In [29]:
type("I am a string.")

str

Handling strings can be more complicated than it first appears, for example when the sentence itself contains quotes:
- You aren't simple
- He said "Hello!"

In [30]:
print('you aren't simple')

SyntaxError: invalid syntax (3143795980.py, line 1)

In [31]:
print("He said "Hello!"")

SyntaxError: invalid syntax (1259159016.py, line 1)

In [32]:
print("you aren't simple")

you aren't simple


In [33]:
print('He said "Hello!"')

He said "Hello!"


The examples above work because each string is delimited with the type of quote that does not appear inside it.

What about a sentence that contains both types of quotes?
- She ain't happy and shouted "Arghhhh!"

Special characters such as these need extra handling: use ``\`` (backslash) as the escape character.

In [34]:
print("She ain't happy and shouted \"Arghhhh!\"")

She ain't happy and shouted "Arghhhh!"


In [35]:
print('you aren\'t simple')

you aren't simple


There are a few special escape sequences:

- ``\t``        Tab
- ``\n``        New line
- ``\uxxxx``    Unicode character with the 16-bit hexadecimal code ``xxxx``. <https://en.wikipedia.org/wiki/List_of_Unicode_characters>

In [47]:
print("This is a \t\t\t triple tab example\na new line\nand a Yen sign \u00A5")

This is a 			 triple tab example
a new line
and a Yen sign ¥
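
It may help to confirm that an escape sequence is stored as a single character even though it is typed as two. The following sketch uses ``len()`` to check this; the strings, including the Windows-style path, are made-up examples.

```python
tab_example = "a\tb"
print(len(tab_example))        # 3: 'a', one tab character, 'b'

quote_example = 'you aren\'t simple'
same_text = (quote_example == "you aren't simple")
print(same_text)               # True: the backslash is not part of the string

# A doubled backslash produces one literal backslash
path_example = "C:\\temp"
print(path_example)            # C:\temp
print(len(path_example))       # 7
```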


### Dynamically Typed Language

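In statically typed languages such as C, a variable must be declared with its type before it is used, e.g.,

``int x;``

In Python, there is no declaration. A variable's type is determined at run-time by the value assigned to it, and the same name can later be bound to a value of a different type.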

In [48]:
x = 10
print(type(x))

<class 'int'>


In [49]:
y = x + 1.1
print(y)

11.1


In [50]:
type(x)

int

In [51]:
type(y)

float

In [52]:
x = 'Hello, world'
print(type(x))

<class 'str'>
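
Dynamic typing does not mean that types are ignored. As a hedged illustration (the name ``value`` and the steps are chosen for this sketch), the code below rebinds one name to three different types and shows that mixing incompatible types still fails at run-time.

```python
value = 10
print(isinstance(value, int))      # True

value = value / 4                  # the / operator always returns a float
print(value, type(value))          # 2.5 <class 'float'>

value = str(value)                 # rebind the same name to a string
print(value, type(value))          # 2.5 <class 'str'>

# The type still matters at run-time: adding a str and an int raises an error
try:
    result = value + 1
except TypeError as err:
    result = None
    print("TypeError:", err)
```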


## Built-In Functions

There are more than 65 built-in functions in the current Python version.  These functions cover
- Maths
- Type Conversions
- Iterators
- Composite Data Types
- Classes, Attributes, and Inheritance
- Input/Output
- Variables, References, and Scope
- Others

You can refer to them [here](https://docs.python.org/3.8/library/functions.html).
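
To give a feel for a few of these categories, here is a short sketch using some maths and type-conversion built-ins. The list ``values`` is sample data invented for the example.

```python
values = [3, -7, 10, 4]

# Maths
largest = max(values)         # 10
smallest = min(values)        # -7
total = sum(values)           # 10
ordered = sorted(values)      # [-7, 3, 4, 10]
print(abs(-7), largest, smallest, total, ordered, len(values))

# Type conversions
number = int('42') + 1        # 43
fraction = float('4.2')       # 4.2
label = str(10) + '%'         # '10%'
rounded = round(3.14159, 2)   # 3.14
print(number, fraction, label, rounded)
```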

*************

## Operators and String Manipulation

Arithmetic operators
- ``+``, ``-``, ``*``, ``/``, ``%`` etc.

Comparison operators
- ``>``, ``<``, ``<=``, ``>=``, ``!=``, ``==``

String operators
- ``+``, ``*``, ``in``
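
A quick sketch of these operators in action. The ``//`` (floor division) and ``**`` (power) operators are not named in the list above; they are included here as two of the "etc." arithmetic operators.

```python
# Arithmetic operators
print(7 + 2)    # 9
print(7 - 2)    # 5
print(7 * 2)    # 14
print(7 / 2)    # 3.5  (true division)
print(7 // 2)   # 3    (floor division)
print(7 % 2)    # 1    (remainder)
print(7 ** 2)   # 49   (power)

# Comparison operators return Boolean values
print(7 > 2, 7 == 2, 7 != 2)   # True False True

# String operators
greeting = 'data' + ' ' + 'science'   # concatenation
line = '-' * 10                       # repetition
has_sci = 'sci' in greeting           # membership test
print(greeting, line, has_sci)
```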


## String Manipulation
Some useful string operations for Data Science.

Indexing.  The number inside the ``[]`` is the position of the character that we would like to access.  Note that, as in most languages, the first index is 0.

In [56]:
s = 'foobar'
s[0]

'f'

In [60]:
'f' in s

True

In [61]:
s[5]

'r'

The built-in function ``len()`` returns the length of its argument, in this case the number of characters in the string.

In [62]:
len(s)

6

We can also use the output of a function as the index.  The following two expressions are equivalent, because a negative index counts from the end of the string.

In [66]:
s[len(s)-1]

'r'

In [67]:
s[-1]

'r'

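### Subsetting strings

A slice ``s[m:n]`` returns the characters from position ``m`` up to, but not including, position ``n``. Omitting ``m`` means "from the start" and omitting ``n`` means "to the end".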

In [35]:
s = 'foobar'
s[2:5]

'oba'

In [69]:
s[0:4]

'foob'

In [70]:
s[2:]

'obar'

In [38]:
s[:4] + s[4:]

'foobar'

In [39]:
s[:4] + s[4:] == s

True

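Striding the string: a third number in the slice, ``s[m:n:k]``, is the step, so only every ``k``-th character is taken.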

In [79]:
s = 'foobar'
s[0:6:2]

'foa'

In [80]:
s[1:6:2]

'obr'

In [81]:
s[0:7:3]

'fb'
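
Slices combine with omitted bounds and negative numbers as well. The following sketch reuses the same string ``s``; the particular slices are chosen only for illustration.

```python
s = 'foobar'

every_second = s[::2]    # start and stop omitted: whole string, step 2
last_three = s[-3:]      # negative start counts from the end
backwards = s[::-1]      # negative step walks the string in reverse
clipped = s[2:100]       # a stop beyond the end is clipped, no error

print(every_second)   # foa
print(last_three)     # bar
print(backwards)      # raboof
print(clipped)        # obar
```

Unlike indexing a single position, slicing past the end of a string does not raise an error, which is why ``s[0:7:3]`` above works even though ``s`` has only six characters.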

*************

## More Python Data Types

There are further Python data types that are useful for data science:
- list
- tuple
- dictionary

### List

A Python ``list`` is a collection of objects (not necessarily of the same type). Lists are defined by square brackets (``[]``) that enclose a comma-separated sequence of objects.


In [82]:
a = ['foo', 'bar', 'baz', 'qux']
print(a)

['foo', 'bar', 'baz', 'qux']


In [83]:
b = ['foo', 1, False, 9.9]
print(b)

['foo', 1, False, 9.9]


In [86]:
print(b[3])

9.9


In [87]:
type(b[3])

float

In [88]:
type(b[2])

bool

Note that:
- Lists are ordered.
- Lists can contain any arbitrary objects. (see variable ``b`` above)
- List elements can be accessed by index. (see examples above)
- Lists can be nested to arbitrary depth.
- Lists are mutable.
- Lists are dynamic.

Lists can be nested to arbitrary depth:

In [89]:
c = [['foo', 1, False, 9.9],'bar', 'baz', 'qux']
print(c)

[['foo', 1, False, 9.9], 'bar', 'baz', 'qux']


Lists are mutable: This means we can change an item in a list by accessing it directly as part of the assignment statement.

In [92]:
c[1] = 'bus'
print(c)

[['foo', 1, False, 9.9], 'bus', 'baz', 'qux']


Lists are dynamic: you can add elements to a list or remove elements from it. For example, the ``+`` operator concatenates lists into a new, longer list.

In [98]:
d = a + b + c
print(d)

['foo', 'bar', 'baz', 'qux', 'foo', 1, False, 9.9, ['foo', 1, False, 9.9], 'bus', 'baz', 'qux']
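
Concatenation builds a new list. To grow or shrink an existing list in place, the list methods ``append()``, ``insert()``, ``remove()`` and ``pop()``, and the ``del`` statement, can be used. A minimal sketch with a made-up ``fruits`` list:

```python
fruits = ['mango', 'banana']

fruits.append('pineapple')    # add to the end
fruits.insert(1, 'durian')    # add at position 1
print(fruits)                 # ['mango', 'durian', 'banana', 'pineapple']

fruits.remove('banana')       # remove by value
last = fruits.pop()           # remove and return the last element
del fruits[0]                 # remove by position

print(fruits)                 # ['durian']
print(last)                   # pineapple
print(len(fruits))            # 1
```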


### Tuple

Tuples are similar to lists in most respects, except that their contents are immutable (fixed). Tuples are defined by round brackets (parentheses, ``()``) that enclose a comma-separated sequence of objects.


In [99]:
a = ('foo', 'bar', 'baz', 'qux')
print(a)

('foo', 'bar', 'baz', 'qux')


In [100]:
b = ('foo', 1, False, 9.9)
print(b)

('foo', 1, False, 9.9)


In [101]:
c = (['foo', 1, False, 9.9],'bar', 'baz', 'qux')
print(c)

(['foo', 1, False, 9.9], 'bar', 'baz', 'qux')


In [102]:
c[0]

['foo', 1, False, 9.9]

In [103]:
c[1] = 'bus'

TypeError: 'tuple' object does not support item assignment

In [104]:
d = a + b + c
print(d)

('foo', 'bar', 'baz', 'qux', 'foo', 1, False, 9.9, ['foo', 1, False, 9.9], 'bar', 'baz', 'qux')
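
Immutability applies to the tuple itself, not to mutable objects stored inside it. The sketch below (names such as ``point`` and ``record`` are invented for illustration) shows tuple unpacking, the error raised on item assignment, and a list inside a tuple being modified.

```python
point = (3, 4)
x, y = point                  # unpacking: x is 3, y is 4

# Item assignment on a tuple raises TypeError
try:
    point[0] = 10
except TypeError as err:
    print("Cannot modify a tuple:", err)

# The tuple holds a reference to the list, and the list itself is still mutable
record = (['foo', 1], 'bar')
record[0].append(False)
print(record)                 # (['foo', 1, False], 'bar')

# To get a changed tuple, build a new one
as_list = list(point)
as_list[0] = 10
new_point = tuple(as_list)
print(point, new_point)       # (3, 4) (10, 4)
```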


### Dictionary

A dictionary is similar to a list in that it is a collection of objects. The main difference is that list items are accessed by their position, whereas dictionary items are accessed by a key.


- Think of each item as a key-value pair.
- This maps nicely to Data Science work with NoSQL databases that store items as key-value pairs.

``
d = dict([
    (<key>, <value>),
    (<key>, <value>),
      .
      .
      .
    (<key>, <value>)
])
``


In [106]:
person = {}
person['fname'] = 'Ian'
person['lname'] = 'Tan'
person['age'] = 19
person['pets'] = {'dog': 'Barney', 'cat': 'Dino'}
person

{'fname': 'Ian',
 'lname': 'Tan',
 'age': 19,
 'pets': {'dog': 'Barney', 'cat': 'Dino'}}

In [58]:

print(person['fname'])
# person[0] would raise a KeyError: dictionaries are indexed by key, not by position

Ian


In [59]:
print(person['pets'])

{'dog': 'Barney', 'cat': 'Dino'}


In [60]:
print(person['pets'].items())

dict_items([('dog', 'Barney'), ('cat', 'Dino')])


In [61]:
print(person['pets']['dog'])

Barney


In [107]:
type(person)

dict

There are many options on how to use the dictionary type and it will be left for you to explore.
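
As a starting point for that exploration, here is a hedged sketch of a few common dictionary operations: testing for a key with ``in``, reading with a default via ``get()``, updating, deleting, and looping over ``items()``. The ``'email'`` key is hypothetical and deliberately absent.

```python
person = {'fname': 'Ian', 'lname': 'Tan', 'age': 19}

has_age = 'age' in person                  # membership test on keys
email = person.get('email', 'unknown')     # default instead of a KeyError
print(has_age, email)                      # True unknown

person['age'] += 1                         # update a value
del person['lname']                        # remove a key-value pair

# Loop over key-value pairs
for key, value in person.items():
    print(key, '->', value)

keys = list(person.keys())                 # ['fname', 'age']
values = list(person.values())             # ['Ian', 20]
print(keys, values)
```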

*************


## Controls
By default, code runs in sequence, one statement after another. As in most programming languages, there are two other control techniques you need to know:
- conditions (if, elif, else)
- iterations (for, while)

### Conditions

``
if <expr>:
    <statement(s)>
elif <expr>:
    <statement(s)>
elif <expr>:
    <statement(s)>
else:
    <statement(s)>
``

Note: Python uses indentation, rather than braces or keywords, to mark which statements belong to each branch.

In [123]:
a = 1
b = 20
if b > a:
    print("b is greater than a")

b is greater than a


In [124]:
a = 30
if b > a:
    print("b is greater than a")

In [125]:
if b > a:
    print("b is greater than a")
else:
    print("a is greater than b")

a is greater than b


In [126]:
a = 20
if b > a:
    print("b is greater than a")
else:
    print("a is greater than b")

a is greater than b

Note that ``a`` and ``b`` are both 20 here, so the message printed by the ``else`` branch is misleading. An ``elif`` branch lets us handle the equal case separately.


In [127]:
if b > a:
    print("b is greater than a")
elif a > b:
    print("a is greater than b")
else:
    print("a and b are equal")

a and b are equal
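
The same three-way check can be wrapped in a function so it can be reused with different values. Functions are not covered in this lecture, so treat the ``def`` syntax and the helper name ``compare`` as assumptions made for this sketch.

```python
def compare(a, b):
    """Return a message describing how a relates to b."""
    if b > a:
        return "b is greater than a"
    elif a > b:
        return "a is greater than b"
    else:
        return "a and b are equal"


print(compare(1, 20))    # b is greater than a
print(compare(30, 20))   # a is greater than b
print(compare(20, 20))   # a and b are equal
```

Only one branch runs for each call: Python tests the conditions from top to bottom and stops at the first one that is true.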


### Iterations

``
while <expr>:
    <statement(s)>
``

or (one form of the ``for`` loop)

``
for x in range(<n>):
    <statement(s)>
``

In [128]:
i = 1
while i < 6:
    print(i)
    i += 1

1
2
3
4
5


Python's ``for`` loop can iterate over many kinds of sequences.

``for`` iterating over a list (lists were introduced above)

In [129]:
fruits = ["mango", "banana", "pineapple", "durians"]
for x in fruits:
  print(x)

mango
banana
pineapple
durians


``for`` iterating over a string

In [130]:
for x in "banana":
  print(x)

b
a
n
a
n
a


``for`` iterating over a numeric range

In [131]:
for x in range(3):
  print(x)

0
1
2
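
Loops become more useful when combined with the other building blocks covered above. The sketch below is an added illustration: it uses ``enumerate()`` to get the position together with each item, puts a condition inside a loop to count vowels, and calls ``range()`` with start, stop and step arguments.

```python
fruits = ["mango", "banana", "pineapple", "durians"]

# enumerate() yields (position, item) pairs
for position, fruit in enumerate(fruits):
    print(position, fruit)

# A condition inside a loop: count the vowels in a string
vowel_count = 0
for letter in "banana":
    if letter in "aeiou":
        vowel_count += 1
print(vowel_count)        # 3

# range(start, stop, step): stop is not included
evens = []
for n in range(0, 10, 2):
    evens.append(n)
print(evens)              # [0, 2, 4, 6, 8]
```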


That's the **end** of the Python basics.