# UBC
## Programming in Python for DS
### Week 4
Instructor: Socorro Dominguez-Vidana

## Overview

- [ ] Compare and contrast Python's key **data types**.
- [ ] Compare and contrast Python's key **data structures**.
- [ ] Use Python to determine the type and structure of an object.
- [ ] Demonstrate how to create data structures and convert them from one to another.
- [ ] Identify which operations can be applied to different data types and column dtypes.

### Python Built-in Datatypes

- Integers
```python
5
```

- Floats
```python
5.0
```

- Booleans (capitalized, no `'` or `"` needed)
```python
True
False
```

- Strings (ordered, immutable)
```python
"Hello"
```

To see what data type an object has, use the built-in function `type()`:

```python
type(5)
```

```output
int
```

In [1]:
type(3)

int
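Beyond `type()`, the built-in `isinstance()` is a common way to check a type inside a condition. A minimal sketch follows; the variable names are illustrative and not part of the lecture.

```python
# Illustrative values; the names are assumptions made for this sketch.
values = [5, 5.0, True, "Hello"]

# type() reports the exact type of each object.
type_names = [type(v).__name__ for v in values]
print(type_names)  # ['int', 'float', 'bool', 'str']

# isinstance() answers a yes/no question about a type.
is_text = isinstance("Hello", str)
print(is_text)  # True

# bool is a subclass of int, so True also counts as an int here.
bool_is_int = isinstance(True, int)
print(bool_is_int)  # True
```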

#### Casting Datatypes
Sometimes the information we need arrives in a different data type from the one we want to work with. For example, a number may be stored as a string.

In [2]:
'5'

'5'

In [3]:
2 +'5'

TypeError: unsupported operand type(s) for +: 'int' and 'str'

In [4]:
int('5')

5

In [5]:
type(int('5'))

int

In [6]:
int('5')+2

7

In [7]:
float('5')

5.0

In [8]:
int(1.9999)

1
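The last cell shows that `int()` drops the fractional part rather than rounding. The sketch below contrasts it with the built-in `round()` and shows what happens when a string holding a decimal number is passed straight to `int()`. The example values are chosen for illustration.

```python
# int() truncates toward zero; it does not round.
truncated_pos = int(1.9999)    # 1
truncated_neg = int(-1.9999)   # -1

# round() returns the nearest integer instead.
rounded = round(1.9999)        # 2

# A string holding a decimal number cannot go straight to int.
try:
    int("2.5")
except ValueError as err:
    message = str(err)
    print("ValueError:", message)

# Going through float first works.
via_float = int(float("2.5"))  # 2
print(truncated_pos, truncated_neg, rounded, via_float)
```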

##### Example of Casting

In [9]:
import re

In [10]:
sentence = "I bought 2.5 Lbs of apples and 4 lbs Of oranges"

In [11]:
numbers = re.findall(r"(?:\d+(?:\.\d*)?|\.\d+)", sentence)
numbers

['2.5', '4']

In [12]:
numbers[1]

'4'

In [13]:
float(numbers[0]), int(numbers[1])

(2.5, 4)

In [14]:
float(numbers[0]) + int(numbers[1])

6.5
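When the number of matches is not known in advance, each string can be cast in a loop or comprehension. The sketch below uses a hypothetical helper, `to_number`, which is not part of the lecture: it returns an `int` when the text has no decimal point and a `float` otherwise.

```python
import re

sentence = "I bought 2.5 Lbs of apples and 4 lbs Of oranges"
numbers = re.findall(r"(?:\d+(?:\.\d*)?|\.\d+)", sentence)


def to_number(text):
    """Hypothetical helper: cast a numeric string to int or float."""
    return float(text) if "." in text else int(text)


converted = [to_number(n) for n in numbers]
print(converted)       # [2.5, 4]
print(sum(converted))  # 6.5
```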

#### Strings
##### Characteristics

- Written with either `'` or `"`; the choice is up to your preference.
    - However, if the text contains `'` as an apostrophe, wrap the string in `"`.
- Ordered
- Immutable

In [15]:
"He's John"

"He's John"

In [16]:
"Hello!"

'Hello!'

In [17]:
'Hello'

'Hello'

In [18]:
sentence = "I bought 2.5 Lbs of apples and 4 lbs Of oranges"

##### Order allows access to different index positions.

In [19]:
sentence[4]

'u'

In [20]:
sentence[4:15]

'ught 2.5 Lb'

##### Immutable
Immutable means that we cannot change or modify parts of a string once it has been created.

In [21]:
sentence[0] = 'U'

TypeError: 'str' object does not support item assignment
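Because strings are immutable, changing one means building a new string. A minimal sketch of two common ways to do that, using a shorter example string chosen for this illustration:

```python
text = "I bought apples"

# Slicing plus concatenation creates a new string.
changed = "U" + text[1:]
print(changed)  # U bought apples

# str.replace() also returns a new string.
replaced = text.replace("apples", "oranges")
print(replaced)  # I bought oranges

# The original string is untouched.
print(text)  # I bought apples
```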

##### Methods

In [22]:
sentence.split()

['I',
 'bought',
 '2.5',
 'Lbs',
 'of',
 'apples',
 'and',
 '4',
 'lbs',
 'Of',
 'oranges']

In [23]:
sentence.title()

'I Bought 2.5 Lbs Of Apples And 4 Lbs Of Oranges'

In [24]:
sentence.upper()

'I BOUGHT 2.5 LBS OF APPLES AND 4 LBS OF ORANGES'

In [25]:
sentence.lower()

'i bought 2.5 lbs of apples and 4 lbs of oranges'

In [26]:
sentence

'I bought 2.5 Lbs of apples and 4 lbs Of oranges'

##### `.upper()` and `.lower()` as normalization tools

In [27]:
from word_count import word_count

In [28]:
word_count(sentence)

{'I': 1,
 'bought': 1,
 '2.5': 1,
 'Lbs': 1,
 'of': 1,
 'apples': 1,
 'and': 1,
 '4': 1,
 'lbs': 1,
 'Of': 1,
 'oranges': 1}

In [29]:
word_count(sentence.lower())

{'lbs': 2,
 'of': 2,
 'i': 1,
 'bought': 1,
 '2.5': 1,
 'apples': 1,
 'and': 1,
 '4': 1,
 'oranges': 1}

`.upper()` is useful when we are working with codes (such as ISBNs or DOIs), headers, or commands.
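The `word_count` module imported above is a local file that is not shown in these notes. One way such a function could be written, assuming it splits on whitespace and orders words from most to least frequent, is sketched below with `collections.Counter`. The actual implementation may differ.

```python
from collections import Counter


def word_count(text):
    """Assumed behaviour: count whitespace-separated words,
    most frequent first (ties keep their first-seen order)."""
    return dict(Counter(text.split()).most_common())


sentence = "I bought 2.5 Lbs of apples and 4 lbs Of oranges"

raw_counts = word_count(sentence)
normalized_counts = word_count(sentence.lower())

print(raw_counts["Lbs"], raw_counts["lbs"])  # 1 1
print(normalized_counts["lbs"])              # 2
```

Without normalization, `Lbs` and `lbs` are counted as two different words; lowercasing first merges them.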

## Data Structures

#### Lists

- Ordered sequence of items
- `[` and `]`
- Different data types can be stored
- Often passed as arguments to pandas methods, for example a list of column names:

```python
df.groupby(by=['Col1', 'Col2']).mean()
```

##### Methods

In [30]:
a_string = "Hello world! How are you?"

In [31]:
a_list = a_string.split()
a_list

['Hello', 'world!', 'How', 'are', 'you?']

In [32]:
a_new_list = []
b_new_list = []
for word in a_list:
    a_new_list.append(word)
    print(word)
    # Do not do this: print() returns None, so None would be appended
    # b_new_list.append(print(word))

Hello
world!
How
are
you?


In [33]:
a_new_list

['Hello', 'world!', 'How', 'are', 'you?']

In [34]:
type(print("hello"))

hello


NoneType

In [35]:
a_new_list.append("Good bye")
a_new_list.append(print("Good bye"))

Good bye


In [36]:
a_new_list

['Hello', 'world!', 'How', 'are', 'you?', 'Good bye', None]

In [37]:
my_list = [["hello", "world"], 5, 3.4, "hello", True]
my_list

[['hello', 'world'], 5, 3.4, 'hello', True]

We can access individual elements of a list or slice it:

In [38]:
my_list[0]

['hello', 'world']

In [39]:
my_list[0][0]

'hello'

In [40]:
my_list[1:3]

[5, 3.4]
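Indexing and slicing also accept negative positions and a step. A short sketch using the same list, redefined here so the block runs on its own:

```python
my_list = [["hello", "world"], 5, 3.4, "hello", True]

# A negative index counts from the end of the list.
last_item = my_list[-1]      # True

# Omitting the start of a slice means "from the beginning".
first_two = my_list[:2]      # [['hello', 'world'], 5]

# A third value in the slice sets the step.
every_other = my_list[::2]   # [['hello', 'world'], 3.4, True]

print(last_item, first_two, every_other)
```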

In [41]:
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'Col1': ['A', 'A', 'B', 'B', 'D', 'C'],
    'Col2': [2, 1, 9, 8, 7, 4],
    'Col3': [0, 1, 9, 4, 2, 3],
    'Col4': [['a','A'], ['B'], ['c'], ['D'], ['e'], ['F']]})
df

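|   | Col1 | Col2 | Col3 | Col4 |
|---|------|------|------|------|
| 0 | A | 2 | 0 | [a, A] |
| 1 | A | 1 | 1 | [B] |
| 2 | B | 9 | 9 | [c] |
| 3 | B | 8 | 4 | [D] |
| 4 | D | 7 | 2 | [e] |
| 5 | C | 4 | 3 | [F] |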


In [42]:
df['Col1']

0    A
1    A
2    B
3    B
4    D
5    C
Name: Col1, dtype: object

In [43]:
a_list = ['Col1', 'Col2']

In [44]:
df[a_list]

|   | Col1 | Col2 |
|---|------|------|
| 0 | A | 2 |
| 1 | A | 1 |
| 2 | B | 9 |
| 3 | B | 8 |
| 4 | D | 7 |
| 5 | C | 4 |


In [45]:
df.sort_values(by='Col1')

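|   | Col1 | Col2 | Col3 | Col4 |
|---|------|------|------|------|
| 0 | A | 2 | 0 | [a, A] |
| 1 | A | 1 | 1 | [B] |
| 2 | B | 9 | 9 | [c] |
| 3 | B | 8 | 4 | [D] |
| 5 | C | 4 | 3 | [F] |
| 4 | D | 7 | 2 | [e] |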


In [46]:
df.sort_values(by=['Col1', 'Col2'])

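|   | Col1 | Col2 | Col3 | Col4 |
|---|------|------|------|------|
| 1 | A | 1 | 1 | [B] |
| 0 | A | 2 | 0 | [a, A] |
| 3 | B | 8 | 4 | [D] |
| 2 | B | 9 | 9 | [c] |
| 5 | C | 4 | 3 | [F] |
| 4 | D | 7 | 2 | [e] |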


In [47]:
# There is a `sort_values()` for Series and one for DataFrames
df['Col3'].sort_values()

0    0
1    1
4    2
5    3
3    4
2    9
Name: Col3, dtype: int64

[pd.DataFrame.sort_values()](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.sort_values.html)

[pd.Series.sort_values()](https://pandas.pydata.org/docs/reference/api/pandas.Series.sort_values.html)

##### List Comprehension

We can transform a whole list in one expression using a list comprehension:

In [48]:
# Add 5 to each element of the following list

new_list = [0,3,4,5]

other_new_list = []

for i in new_list:
    x = i+5
    other_new_list.append(x)

other_new_list

[5, 8, 9, 10]

In [49]:
# Using list comprehension 
other_new_list2 = [i+5 for i in new_list]

other_new_list2

[5, 8, 9, 10]
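A list comprehension can also filter elements with an `if` clause. A minimal sketch, with variable names chosen for this illustration:

```python
new_list = [0, 3, 4, 5]

# Keep only the even numbers, then add 5 to each.
even_plus_five = [i + 5 for i in new_list if i % 2 == 0]
print(even_plus_five)  # [5, 9]

# Comprehensions work on lists of strings too.
words = ["Hello", "world!", "How", "are", "you?"]
lowered = [word.lower() for word in words]
print(lowered)  # ['hello', 'world!', 'how', 'are', 'you?']
```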

In [50]:
new_list

[0, 3, 4, 5]

In [51]:
new_list[2] = 10

In [52]:
new_list

[0, 3, 10, 5]

In [53]:
my_list

[['hello', 'world'], 5, 3.4, 'hello', True]

In [54]:
my_list.append(4)

In [55]:
my_list

[['hello', 'world'], 5, 3.4, 'hello', True, 4]

In [56]:
my_list = my_list.append(4)

In [57]:
my_list

**Note:** `append()` modifies the list in place and returns `None`, so we do not reassign its result:
```python
my_list = my_list.append(4)
```
After this assignment, `my_list` is `None` and the original list is lost.

**Note:** If you rerun a cell that calls `append()`, the value is appended again on every run.
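The difference between modifying a list in place and building a new one can be sketched as follows; the variable names are illustrative.

```python
numbers = [1, 2, 3]

# append() changes the list in place and returns None.
result = numbers.append(4)
print(result)   # None
print(numbers)  # [1, 2, 3, 4]

# The + operator builds a new list and leaves the original alone.
longer = numbers + [5]
print(longer)   # [1, 2, 3, 4, 5]
print(numbers)  # [1, 2, 3, 4]
```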

#### Tuples
Tuples are a data structure very similar to lists but with two main differences:

- Represented with parentheses `(` `)`
- Immutable

In [58]:
my_tuple = ('I', None,  'do', 1, False)
my_tuple

('I', None, 'do', 1, False)

In [59]:
my_list = ['I', None,  'do', 1, False]
my_list

['I', None, 'do', 1, False]

In [60]:
my_list[0] = "You"

In [61]:
my_list

['You', None, 'do', 1, False]

In [62]:
my_tuple[0] = "You"

TypeError: 'tuple' object does not support item assignment
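If the contents of a tuple need to change, one option is to convert it to a list, edit the list, and convert back. Tuples also support unpacking into separate names. A minimal sketch:

```python
my_tuple = ('I', None, 'do', 1, False)

# Convert to a list, edit, and convert back to a tuple.
as_list = list(my_tuple)
as_list[0] = "You"
new_tuple = tuple(as_list)
print(new_tuple)  # ('You', None, 'do', 1, False)

# The original tuple is unchanged.
print(my_tuple)   # ('I', None, 'do', 1, False)

# Unpacking assigns each position to its own name.
rows, cols = (4, 2)
print(rows, cols)  # 4 2
```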

##### Modifying pd.DataFrames with lists and tuples

In [63]:
df = {'Col1': ['A', 'B', 'B', 'A'],
      'Col2': [2,3,4,4]}
df = pd.DataFrame(df)
df

|   | Col1 | Col2 |
|---|------|------|
| 0 | A | 2 |
| 1 | B | 3 |
| 2 | B | 4 |
| 3 | A | 4 |


In [64]:
df.columns

Index(['Col1', 'Col2'], dtype='object')

In [65]:
a_list = ['a', 'b']

In [66]:
df.columns = a_list

In [67]:
df

|   | a | b |
|---|---|---|
| 0 | A | 2 |
| 1 | B | 3 |
| 2 | B | 4 |
| 3 | A | 4 |


In [68]:
df.shape

(4, 2)

In [69]:
df.shape = (2,4)

AttributeError: property 'shape' of 'DataFrame' object has no setter

#### Sets

- Unordered
- The values contained are unique - meaning there are no duplicate entries.
- Sets are made with curly brackets `{`, `}`.

In [70]:
my_set = {2, 1.0, 'aPple', 1.0, 'apple'}
my_set

{1.0, 2, 'aPple', 'apple'}

In [71]:
my_set[0]

TypeError: 'set' object is not subscriptable

In [72]:
list(my_set)

[1.0, 2, 'aPple', 'apple']

In [73]:
type(my_set)

set

**Sets are mutable: for example, you can remove elements from them.**

In [74]:
my_set.remove('aPple')
my_set

{1.0, 2, 'apple'}

In [75]:
# An element can be removed only once; removing it again raises a KeyError
my_set.remove('aPple')
my_set

KeyError: 'aPple'

In [76]:
my_set

{1.0, 2, 'apple'}
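Two related set operations, sketched with made-up values: `discard()` removes an element without raising an error when it is missing, and converting a list to a set is a quick way to drop duplicates (the original order is not guaranteed to survive).

```python
my_set = {2, 1.0, 'aPple', 'apple'}

# discard() is silent if the element is absent; remove() raises KeyError.
my_set.discard('aPple')
my_set.discard('aPple')  # no error the second time
print(len(my_set))  # 3

# Membership tests use `in`.
has_apple = 'apple' in my_set
print(has_apple)  # True

# Dropping duplicates from a list; sorted() gives a predictable order.
labels = ['A', 'A', 'B', 'B', 'D', 'C']
unique_labels = sorted(set(labels))
print(unique_labels)  # ['A', 'B', 'C', 'D']
```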

#### Dictionaries
- Represented by `{ key : value, key2: value2}`
- Pairs of keys and corresponding values
- Keys must be unique; if a key is repeated, only the last value assigned to it is kept.
- If all the values are lists of the same length, you can pass the dictionary to `pandas.DataFrame` to create a DataFrame.
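The rule about repeated keys can be checked directly. In this sketch (the dictionary contents are made up), the later value for a repeated key wins:

```python
# 'Name' appears twice; Python keeps the last value assigned to it.
account = {'Name': 'Jack', 'Branch': 13, 'Name': 'John'}
print(account)       # {'Name': 'John', 'Branch': 13}
print(len(account))  # 2
```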

In [77]:
data = {'Col1' : [2,3,4],
        'Col2' : [3,4,5]}

In [78]:
pd.DataFrame(data)

|   | Col1 | Col2 |
|---|------|------|
| 0 | 2 | 3 |
| 1 | 3 | 4 |
| 2 | 4 | 5 |


In [79]:
account_details = {'Name':['Jack', 'John'],
                   'Account_Type':['Checking', 'Checking'],
                   'Branch': [13, 12],
                   'Age': [23, 22]}

df = pd.DataFrame(account_details)
df

|   | Name | Account_Type | Branch | Age |
|---|------|--------------|--------|-----|
| 0 | Jack | Checking | 13 | 23 |
| 1 | John | Checking | 12 | 22 |


In [80]:
account_details.keys()

dict_keys(['Name', 'Account_Type', 'Branch', 'Age'])

In [81]:
account_details.values()

dict_values([['Jack', 'John'], ['Checking', 'Checking'], [13, 12], [23, 22]])

In [82]:
account_details.items()

dict_items([('Name', ['Jack', 'John']), ('Account_Type', ['Checking', 'Checking']), ('Branch', [13, 12]), ('Age', [23, 22])])
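These views are most often used in loops. A minimal sketch that walks over the key/value pairs, using the dictionary from the cells above (the `lengths` name is introduced only for this illustration):

```python
account_details = {'Name': ['Jack', 'John'],
                   'Account_Type': ['Checking', 'Checking'],
                   'Branch': [13, 12],
                   'Age': [23, 22]}

# items() yields (key, value) pairs that can be unpacked in a for loop.
lengths = {}
for key, value in account_details.items():
    lengths[key] = len(value)
    print(key, value)

print(lengths)  # {'Name': 2, 'Account_Type': 2, 'Branch': 2, 'Age': 2}
```

Checking that every value has the same length is also a quick way to confirm the dictionary can be turned into a DataFrame.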

##### Accessing Values via the Keys

In [83]:
account_details = {'Name':'Jane',
                   'Account_Type':'Checking',
                   'USD': 20,}

In [84]:
account_details2 = {'Name': 'Jane',
 'Account_Type': 'Savings',
 'USD': 15}

In [85]:
account_details['Name']

'Jane'

In [86]:
account_details2['Name']

'Jane'

In [87]:
key = 'USD'
account_details[key] - account_details2[key]

5
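Asking for a key that does not exist raises a `KeyError`. The dictionary method `get()` returns a default instead. A short sketch; the `'EUR'` key is made up for illustration.

```python
account_details = {'Name': 'Jane',
                   'Account_Type': 'Checking',
                   'USD': 20}

# Square brackets raise KeyError for a missing key.
try:
    account_details['EUR']
except KeyError as err:
    missing = err.args[0]
    print("Missing key:", missing)

# get() returns None, or a default you supply, instead of raising.
eur_balance = account_details.get('EUR', 0)
print(eur_balance)  # 0

# Assigning to a key adds it or updates its value.
account_details['USD'] = account_details['USD'] + 5
print(account_details['USD'])  # 25
```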