
# Python Types

These first lectures are meant to make sure you have your Python fundamentals down.

# Data Types

Python has a few basic data types: integers, floating point numbers, Booleans and strings. Integers are ***whole*** numbers and do not include decimals. Floating point numbers are decimal numbers. Booleans are the values `True` and `False`. Lastly, strings hold text data.


We can check the data type of a value in Python with the `type(x)` function:

In [1]:
# integers
a = 5
print(f"a is a {type(a)}")

# floating point (real) numbers
b = 5.5
print(f"b is a {type(b)}")

# Boolean True/False
c = True
print(f"c is a {type(c)}")

# strings
d = "String"
print(f"d is a {type(d)}")


a is a <class 'int'>
b is a <class 'float'>
c is a <class 'bool'>
d is a <class 'str'>


Knowing which data type you are dealing with is important when writing code, as each type supports only certain operations.

## Arithmetic Operations
Just as with regular numbers, you can add, subtract, multiply, divide and exponentiate integers and floating point numbers in Python.

In [3]:
#Addition/Subtraction
print(1+1)
print(1.+1.)

# multiplication/division
print(10./5.)
print(10//5)   # floor division: two integers give an integer result
print(5*3)

# exponentiation
print(3**2)
print(3.**2)

2
2.0
2.0
2
15
9
9.0


Notice that floating point numbers are always printed with a decimal point, even when their value is whole.
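The short sketch below shows how Python 3 decides whether a result is an `int` or a `float`. The variable names are only for illustration.

```python
# True division (/) always returns a float, even when the result is whole
true_div = 10 / 5
# Floor division (//) of two integers returns an integer
floor_div = 10 // 5
# Mixing an int and a float promotes the result to a float
mixed = 2 + 3.0

print(true_div, type(true_div))    # 2.0 <class 'float'>
print(floor_div, type(floor_div))  # 2 <class 'int'>
print(mixed, type(mixed))          # 5.0 <class 'float'>
```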

In Python, addition isn't limited to numbers. We can also do so with strings! 

**Note:** Addition (concatenation) and multiplication by an integer (repetition) are the only arithmetic operations we can do with strings. Subtraction, division, etc. are not defined for this data type.

In [5]:
print("python" + "notebook")
print("python" * 2)


pythonnotebook
pythonpython
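As a quick sketch of the note above, the example below concatenates and repeats strings, then shows that subtraction raises a `TypeError`. The strings and variable names are made up for illustration.

```python
greeting = "python" + " " + "notebook"   # concatenation
repeated = "ab" * 3                       # repetition by an integer

try:
    "python" - "py"                       # subtraction is not defined for strings
except TypeError as err:
    error_message = str(err)

print(greeting)        # python notebook
print(repeated)        # ababab
print(error_message)   # the message explains that '-' is unsupported for two strings
```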


Comparison expressions evaluate to a Boolean value:

In [7]:
y = 5 < 10

print(f"y is: {y}")
type(y)

y is: True


bool

In arithmetic expressions, `True` is converted to `1` and `False` is converted to `0`.

This is called **Boolean arithmetic**. It's a surprisingly rich field of mathematics and very useful in programming!

Here are some examples:

In [8]:
bool_list = [True, False, False, True]
sum(bool_list)

2

In [10]:
True + True

2

In [11]:
True*5*False

0
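One common use of this behaviour is counting how many values satisfy a condition. The sketch below uses a made-up list of ages; the names `ages` and `adults` are illustrative only.

```python
ages = [12, 25, 31, 17, 40]   # made-up example data

# Each comparison yields True or False; sum() adds them up as 1s and 0s
adults = sum(age >= 18 for age in ages)
share = adults / len(ages)

print(adults)  # 3
print(share)   # 0.6
```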

# Boolean Conditionals

As mentioned above, Boolean expressions evaluate to either True or False. Using this, we can write code that runs only when the data we're given meets a certain criterion.

For example:

In [13]:
x = 5
if x == 5:
  print("x is 5!")

x is 5!


Since the expression `x == 5` evaluated to True, Python printed the message. Let's see what happens if x isn't equal to 5.

In [14]:
x = 3
if x == 5:
  print("x is 5!")

As you can see, since x was not equal to 5, the Boolean condition `x == 5` evaluated to False and Python did not execute the print command. We will see more of this when we talk about loops.
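If we want something to happen in the False case as well, one option is to add `elif` and `else` branches. The sketch below wraps the check in a small helper function, `describe`, which is a hypothetical name used only for this illustration.

```python
def describe(x):
    """Return a message depending on how x compares to 5."""
    if x == 5:
        return "x is 5!"
    elif x < 5:
        return "x is less than 5"
    else:
        return "x is greater than 5"

print(describe(5))  # x is 5!
print(describe(3))  # x is less than 5
print(describe(8))  # x is greater than 5
```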

### Converting Between Data Types

Sometimes when programming, we need to convert between data types. For example, suppose we want to add two numbers but one of them is stored as a string, so the operation fails:

In [15]:
"5" + 0

TypeError: can only concatenate str (not "int") to str

Instead, we can use Python to convert between data types when such an issue arises.

In [16]:
int("5") - 0

5

In [None]:
# examples of data conversions
A = float(5)   # int to float
B = str(5)     # int to string
C = int(4.5)   # float to int (drops the decimal part)

print(A)
print(B)
print(C)

5.0
5
4
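A few conversion details are easy to trip over, so here is a brief sketch of them. The values are arbitrary and `converted` is just an illustrative name.

```python
print(int(4.9))        # 4   (int() truncates toward zero, it does not round)
print(round(4.9))      # 5
print(int("42") + 1)   # 43
print(float("3.5"))    # 3.5
print(str(5) + "5")    # 55  (string concatenation, not addition)

try:
    int("4.5")         # a string containing a decimal point is not a valid integer
except ValueError:
    print("int('4.5') raises ValueError")

# Going through float first works
converted = int(float("4.5"))
print(converted)       # 4
```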


# Containers

Containers are types that store collections of (not necessarily type-homogeneous) data. 

**NOTE: The individual entries in a list (or any other container) can be referred to as "elements".**

# Lists

Square brackets denote a Python `list`:

In [17]:
A = [3,5,6,7,8,9,3, False, True]
A

[3, 5, 6, 7, 8, 9, 3, False, True]

In [18]:
#to add to a list
A.append("String")
A

[3, 5, 6, 7, 8, 9, 3, False, True, 'String']

You can put objects into lists. Note that the object in the list is just a **reference** to the underlying object:

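In [20]:
b = ["cat", "dog"]
A.append(b.copy())  # a separate copy: later changes to b will not show up in it
A.append(b)         # a reference to b itself

# update b after it has been appended
b.append("mouse")
A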

[3,
 5,
 6,
 7,
 8,
 9,
 3,
 False,
 True,
 'String',
 ['cat', 'dog'],
 ['cat', 'dog', 'mouse']]

Because the list stores a reference to `b`, changing `b` also changes the corresponding entry in `A`.
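To make the difference between a reference and a copy concrete, here is a minimal sketch. The names `pets`, `alias` and `snapshot` are invented for this example.

```python
pets = ["cat", "dog"]
alias = pets             # a second name for the same list object
snapshot = pets.copy()   # a new list with the same contents

pets.append("mouse")

print(alias)                             # ['cat', 'dog', 'mouse']
print(snapshot)                          # ['cat', 'dog']
print(alias is pets, snapshot is pets)   # True False
```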

In [21]:
# indexing starts at 0, so the 3rd element has index 2
A[2]

6

A handy feature common to most containers is that you can "pick out" individual elements using square brackets. **NOTE: The type of bracket matters when creating a container (square for lists, parentheses for tuples, curly for sets and dictionaries), but elements are always accessed with square brackets.**

In [23]:
# curly brackets create dictionaries and sets; element access uses square brackets
A[2]

6

Note that element `#2` is actually the **third element** because Python *always counts from zero*.

There are certain advantages to starting from 0 rather than 1, but overall it's not any better or worse than coding in a language that indexes starting from 1.

We can also **slice** using a colon inside the square brackets. Slicing lets us pick out multiple entries in a list between a starting and an ending index that we set:

In [24]:
A[0:4]

[3, 5, 6, 7]

**NOTE:** A slice includes the element at the starting index but *not* the element at the ending index, so `A[0:4]` returns the elements at indices 0, 1, 2 and 3.

In [26]:
A[4]

8

As you can see, the element at index 4 is 8, and it was not included in the slice `A[0:4]` above.

The general rule is that `a[m:n]` returns `n - m` elements, starting at `a[m]`.
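The rule can be checked directly on a small list. This is only a sketch with made-up data, and it assumes `0 <= m <= n <= len(letters)`.

```python
letters = ["a", "b", "c", "d", "e", "f"]
m, n = 1, 4

piece = letters[m:n]
print(piece)                      # ['b', 'c', 'd']
print(len(piece) == n - m)        # True

# Leaving out an index means "from the start" or "to the end"
print(letters[:2], letters[2:])   # ['a', 'b'] ['c', 'd', 'e', 'f']
```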

Negative indices also work and **count back from the end of the list**, so `-1` is the last element:

In [27]:
A[-1]

['cat', 'dog', 'mouse']

In [33]:
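A[-4:-1]  # the three elements before the last one (not displayed: a cell only shows its last expression)
A[2:]     # from index 2 to the end of the list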

[6, 7, 8, 9, 3, False, True, 'String', ['cat', 'dog'], ['cat', 'dog', 'mouse']]
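Since only the last expression of a cell is displayed, here is a small sketch that prints each negative index and slice separately. The list `nums` is made up for this example.

```python
nums = [10, 20, 30, 40, 50]

print(nums[-1])      # 50  (last element)
print(nums[-2])      # 40  (second to last)
print(nums[-4:-1])   # [20, 30, 40]  (the end index -1 is excluded)
print(nums[-2:])     # [40, 50]      (the last two elements)

# nums[-k] is the same element as nums[len(nums) - k]
print(nums[-1] == nums[len(nums) - 1])   # True
```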

Elements in lists can also be overwritten:

In [35]:
A[0] = "First"
A

['First',
 5,
 6,
 7,
 8,
 9,
 3,
 False,
 True,
 'String',
 ['cat', 'dog'],
 ['cat', 'dog', 'mouse']]

Here, I replaced the first element in our list with the string `"First"`.

### List of Lists

A list of lists is a list whose elements are lists themselves. Picking out an individual entry in this data type requires an additional indexing step, as you can see below.

In [40]:
a = [[1,2],[34,4],[3,4],[5,6]]
# Slicing the first three elements of a
print(a[:3])
# Calling the second element of a
print(a[1])
# Calling the first entry in the second element of a
print(a[1][0])
# a[2:3] is the one-element list [[3, 4]], so [0] picks out its only element
print(a[2:3][0])


[[1, 2], [34, 4], [3, 4]]
[34, 4]
34
[3, 4]
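Indexing a slice, as in `a[2:3][0]`, picks out a whole inner list rather than one entry from each inner list. If the goal is the second entry of each of the first three elements, one possible approach is a list comprehension, sketched below. The name `rows` is used here instead of `a` purely for illustration.

```python
rows = [[1, 2], [34, 4], [3, 4], [5, 6]]

# Take the entry at index 1 from each of the first three inner lists
second_entries = [row[1] for row in rows[:3]]
print(second_entries)   # [2, 4, 4]

# By contrast, indexing the slice returns one whole inner list
print(rows[:3][1])      # [34, 4]
```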


Note that since strings are effectively containers of single characters we can slice strings as well:

In [41]:
Mystring = "Journey"
Mystring[0:4]

'Jour'

# Tuples

Parentheses denote a Python `tuple`. Tuples are the default container: if you put commas between objects without any brackets, you get a tuple.

In [None]:
x = ('a', 'b')
y = 'a', 'b' # You can skip the parentheses
print(x)
print(x == y)

('a', 'b')
True


Tuples are like lists except they're immutable (i.e. not mutable; they can't be modified after creation).

In [44]:
x = ('a', 'b')
# x[0] = "this will fail"  # uncomment to see the TypeError: tuples can't be updated
x[0:2] # tuples can be sliced

('a', 'b')

In [None]:
l = [True, 0, 5.5]
l[0] = "This works on a list"
l

Watch out: Leaving an accidental comma after an expression will turn it into a one-element tuple:

In [47]:
x = 5,
print(f"X is a {type(x)}")
x

X is a <class 'tuple'>


(5,)
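The sketch below shows the immutability error without stopping the program, along with the difference between `(5,)` and `(5)`. All names are illustrative.

```python
point = (3, 4)

try:
    point[0] = 10              # tuples do not support item assignment
except TypeError as err:
    print(f"TypeError: {err}")

# To "change" a tuple, build a new one instead
moved = (10,) + point[1:]
print(moved)                   # (10, 4)

single = (5,)                  # the trailing comma makes a one-element tuple
not_a_tuple = (5)              # parentheses alone are just grouping
print(type(single).__name__, len(single))   # tuple 1
print(type(not_a_tuple).__name__)           # int
```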

# Set

A **set** is a container where objects are forced to be unique. It's denoted by *curly brackets*.

In [50]:
s = {1,1,1,1,2}
print(type(s))
s

<class 'set'>


{1, 2}

In [52]:
# remove duplicates by converting the list into a set and back into a list (the original order is not guaranteed to be kept)
d = [1,2,34,2,1,1,1,5]
list(set(d))

[1, 2, 34, 5]
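Sets are unordered, so converting to a set and back may reorder the elements. When the original order matters, one alternative (a sketch, relying on dictionaries preserving insertion order in Python 3.7+) is `dict.fromkeys`. The list `values` is made up for this example.

```python
values = [3, 1, 3, 2, 1]

# Dictionary keys are unique and keep insertion order (Python 3.7+)
unique_in_order = list(dict.fromkeys(values))
print(unique_in_order)        # [3, 1, 2]

# A set is still the natural choice for membership tests and counting distinct values
distinct = set(values)
print(2 in distinct, len(distinct))   # True 3
```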

# Associative Containers

The Python `dict` is an "associative array" or a "map": it associates (maps) values to other values.

In [55]:
d = {'name': 'Sam', 
     'age': 31,
     'race': 'hobbit',
}
type(d)

dict

One way to think of dictionaries is that they are like lists except that the items are named instead of numbered

In [56]:
d["name"]

'Sam'

The names `'name'`, `'age'` and `'race'` are called the *keys*.

Keys are unique in a dictionary, so assigning to an existing key replaces its value.

This is why sets and dictionaries both use curly brackets (the keys behave like a set).

In [None]:
# print all keys
print(d.keys())
# print all values
print(d.values())



In [58]:
#dictionaries can be changed
d["name"] = "Samantha"
d

{'age': 31, 'name': 'Samantha', 'race': 'hobbit'}
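A few more everyday dictionary operations are sketched below: adding a key, testing for a key, reading a key that may be missing, and looping over the entries. The `"home"` entry is made-up data for this example.

```python
d = {"name": "Sam", "age": 31, "race": "hobbit"}

d["home"] = "The Shire"   # assigning to a new key adds an entry
d["age"] = 32             # assigning to an existing key replaces its value

print("age" in d)                   # True     (membership tests look at the keys)
print(d.get("height", "unknown"))   # unknown  (get() avoids a KeyError for missing keys)

# items() yields (key, value) pairs
for key, value in d.items():
    print(f"{key}: {value}")
```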

## Arrays

Arrays are containers that carry mainly numeric information. For example, if I had a matrix:

$$\begin{bmatrix} 1 & 2 & 1 \\ 3 & 0 & 1 \\ 0 & 2 & 4 \end{bmatrix}$$

One way to represent this in Python is with an array. The package that deals with arrays and array calculations is called "NumPy".

In [63]:
import numpy as np
matrix = [[1,2,1],[3,0,1],[0,2,4]]
print(matrix)
Matrix = np.array(matrix)
print(Matrix)

[[1, 2, 1], [3, 0, 1], [0, 2, 4]]
[[1 2 1]
 [3 0 1]
 [0 2 4]]


Arrays differ from lists of lists in that operations on them are carried out **elementwise**:

In [69]:
#works on arrays
Matrix + 5

array([[6, 7, 6],
       [8, 5, 6],
       [5, 7, 9]])

In [68]:
# doesn't work on lists
matrix + 5

TypeError: can only concatenate list (not "int") to list

As you can see above, I was able to add 5 to every element of the array but ran into an error when I ran the same code with a list of lists. Arrays can also be indexed along several dimensions at once, and their elements can be selected with Boolean conditions.

In [75]:
Matrix[:2,1] 


array([2, 0])

In [82]:
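Matrix[Matrix < 5]      # every entry of Matrix smaller than 5 (not displayed: only print output is shown here)
lis = [2,4,5,6,7,8,9]
lis = np.array(lis)     # convert the list to an array so it accepts a Boolean condition
x = lis[lis<=6]         # keep only the entries that are <= 6
lis = list(x)           # convert the result back to a list
print(x)
# or do the filtering in one line (while lis is still an array):
# lis = list(lis[lis<=6])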

[2 4 5 6]
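To see what the Boolean condition does under the hood, the sketch below builds the mask explicitly. It assumes NumPy is installed and reuses the matrix from above; the name `mask` is illustrative.

```python
import numpy as np

Matrix = np.array([[1, 2, 1], [3, 0, 1], [0, 2, 4]])

# A comparison on an array produces an array of Booleans with the same shape
mask = Matrix >= 2
print(mask)

# Indexing with the mask keeps the entries where the mask is True (row by row)
print(Matrix[mask])    # [2 3 2 4]

# Boolean arithmetic works here too: count the entries that satisfy the condition
print(mask.sum())      # 4

# A colon selects everything along that axis: all rows, first column
print(Matrix[:, 0])    # [1 3 0]
```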


# Data Frames

Lastly, we have the most commonly used data container in data science: the data frame. Data frames are versatile because they can store several types of data and attach labels to the rows and columns. The package that handles data frames is "pandas".

In [None]:
#Creating a data frame
import pandas as pd
s = pd.Series([1, 3, 5, np.nan, 6, 8])
dates = pd.date_range("20130101", periods=6)
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list("ABCD"))
df


| | A | B | C | D |
|---|---|---|---|---|
| 2013-01-01 | -0.873197 | 0.883904 | 0.804738 | -1.135271 |
| 2013-01-02 | 2.902201 | 1.507966 | 0.626154 | -0.834217 |
| 2013-01-03 | 2.562273 | -1.353514 | -1.433148 | -0.246318 |
| 2013-01-04 | 0.574619 | -0.201205 | -1.657429 | 0.077034 |
| 2013-01-05 | -0.058098 | 2.25163 | 0.188237 | 0.103833 |
| 2013-01-06 | -1.014063 | 0.239728 | -0.434771 | -1.14692 |


Here, I created a data frame whose four columns are filled with random numbers. We can also add columns to a pandas data frame as needed. Each individual column is known as a "Series".

In [None]:
#Adding a List of fruits as a column in the data frame
df['E'] = ['Apples', 'Oranges', 'Pears', 'Grapes', 'Banana', 'Guava']
df

| | A | B | C | D | E |
|---|---|---|---|---|---|
| 2013-01-01 | -0.873197 | 0.883904 | 0.804738 | -1.135271 | Apples |
| 2013-01-02 | 2.902201 | 1.507966 | 0.626154 | -0.834217 | Oranges |
| 2013-01-03 | 2.562273 | -1.353514 | -1.433148 | -0.246318 | Pears |
| 2013-01-04 | 0.574619 | -0.201205 | -1.657429 | 0.077034 | Grapes |
| 2013-01-05 | -0.058098 | 2.25163 | 0.188237 | 0.103833 | Banana |
| 2013-01-06 | -1.014063 | 0.239728 | -0.434771 | -1.14692 | Guava |
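As a small sketch of how columns behave, the example below builds a tiny data frame with fixed, made-up values (rather than random numbers) and pulls out a single column. It assumes pandas is installed; the names `small`, `col` and `positive` are illustrative.

```python
import pandas as pd

# A tiny data frame with one numeric and one text column (made-up values)
small = pd.DataFrame({"A": [1.5, -0.2, 3.0],
                      "E": ["Apples", "Oranges", "Pears"]})

# Selecting one column with square brackets returns a Series
col = small["A"]
print(type(col).__name__)        # Series

# Each column keeps its own data type
print(small.dtypes)

# Boolean conditions select rows, much like with arrays
positive = small[small["A"] > 0]
print(positive["E"].tolist())    # ['Apples', 'Pears']
```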


We'll dive deeper into data frames in later workshops, as there is quite a bit to learn when it comes to the specifics of manipulating data in a data frame.

# Converting Between Data Containers

In the same way we can convert between data types, we can convert between data containers as well, as you can see below:

In [None]:
# List to Set
a = [1,2,3,2,1,4,5,4,2,3,5,77,33]
a = set(a)
print(a)

{1, 2, 3, 4, 5, 33, 77}


In [None]:
# Tuple to List
a = (1,2,3,4,'a','d','f')
a = list(a)
print(a)

[1, 2, 3, 4, 'a', 'd', 'f']


In [None]:
#List to Tuple
a = [1,2,3,4,5,5,6,67,7]
a = tuple(a)
print(a)

(1, 2, 3, 4, 5, 5, 6, 67, 7)
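Dictionaries and strings can take part in these conversions too. The sketch below shows a few common cases using the built-in `dict`, `list` and `zip` functions; the data and variable names are made up.

```python
# List of (key, value) tuples to dictionary, and back again
pairs = [("name", "Sam"), ("age", 31)]
as_dict = dict(pairs)
back_to_pairs = list(as_dict.items())

# Two parallel lists to dictionary
keys = ["a", "b", "c"]
values = [1, 2, 3]
zipped = dict(zip(keys, values))

# String to list of single characters
chars = list("abc")

print(as_dict)                  # {'name': 'Sam', 'age': 31}
print(back_to_pairs == pairs)   # True
print(zipped)                   # {'a': 1, 'b': 2, 'c': 3}
print(chars)                    # ['a', 'b', 'c']
```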
