<a href="https://colab.research.google.com/gist/dakilaledesma/c6b7a59b740caf8a2298cd75826f0f8f/python-basics.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Notebook prepared by Mathieu Blondel.

# Welcome

We will learn about the programming language Python.

# Notebooks

We will use Jupyter notebooks and Google Colab to learn Python. Notebooks are a great way to mix executable code with rich content. Colab lets you run notebooks in the cloud for free, without any prior installation.

The document that you are reading is not a static web page, but an interactive environment called a notebook, that lets you write and execute code. Notebooks consist of so-called code cells, blocks of one or more Python instructions. For example, here is a code cell that stores the result of a computation (the number of seconds in a day) in a variable and prints its value:

In [None]:
seconds_in_a_day = 24 * 60 * 60
seconds_in_a_day

86400

Click on the "play" button to execute the cell. You should be able to see the result. Alternatively, you can also execute the cell by pressing Ctrl + Enter if you are on Windows / Linux or Command + Enter if you are on a Mac.

Variables that you defined in one cell can later be used in other cells:

In [None]:
seconds_in_a_week = 7 * seconds_in_a_day
seconds_in_a_week

604800

Note that the order of execution is important. For instance, if we do not run the cell storing *seconds_in_a_day* beforehand, the above cell will raise an error, as it depends on this variable. To make sure that you run all the cells in the correct order, you can also click on "Runtime" in the top-level menu, then "Run all".

**Exercise.** Add a cell below this cell: click on this cell then click on "+ Code". In the new cell, compute the number of seconds in a year by reusing the variable *seconds_in_a_day*. Run the new cell.

# Python

Python is one of the most popular programming languages for data science and machine learning, both in academia and in industry. As such, it is essential to learn this language for anyone interested in machine learning. In this section, we will review Python basics.

## Arithmetic operations

Python supports the usual arithmetic operators: + (addition), - (subtraction), * (multiplication), / (division), ** (power), // (integer division, which rounds the result down).

In [None]:
1 + 2

3

In [None]:
3-1

2

In [None]:
2*4

8

In [None]:
5/2

2.5

In [None]:
2**3

8

In [None]:
5//2

2
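
As a small additional sketch (not part of the original cells), the difference between `/` and `//` is easiest to see side by side, especially with negative numbers. The remainder operator `%` is not listed above and is included here only for illustration.

```python
# True division always produces a float, even when the result is a whole number.
true_division = 4 / 2      # 2.0

# Integer (floor) division rounds down, toward negative infinity.
floor_positive = 5 // 2    # 2
floor_negative = -5 // 2   # -3, not -2

# The remainder operator % pairs with //: a == (a // b) * b + (a % b)
remainder = 5 % 2          # 1

print(true_division, floor_positive, floor_negative, remainder)
```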

## Lists

Lists are a container type for ordered sequences of elements. Lists can be initialized empty

In [None]:
my_list = []

or with some initial elements

In [None]:
my_list = [1, 2, 3]

Lists have a dynamic size and elements can be added (appended) to them

In [None]:
my_list.append(4)
my_list

[1, 2, 3, 4]

We can access individual elements of a list (indexing starts from 0)

In [None]:
my_list[2]

3

We can access "slices" of a list using `my_list[i:j]` where `i` is the start of the slice (again, indexing starts from 0) and `j` is the end of the slice, which is not included. For instance:

In [None]:
my_list[1:3]

[2, 3]

Omitting the second index means that the slice should run until the end of the list

In [None]:
my_list[1:]

[2, 3, 4]
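
The following sketch collects a few slicing rules in one place. It uses a separate list named `numbers` (a name chosen for this example) so that `my_list` is left untouched. Negative indices are not covered above and are shown only as an aside.

```python
numbers = [1, 2, 3, 4]

# The end index is exclusive: numbers[1:3] holds the elements at positions 1 and 2.
middle = numbers[1:3]      # [2, 3]

# Omitting the first index starts the slice at the beginning of the list.
first_two = numbers[:2]    # [1, 2]

# Negative indices count from the end of the list.
last = numbers[-1]         # 4

print(middle, first_two, last)
```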

We can check if an element is in the list using `in`

In [None]:
5 in my_list

False

The length of a list can be obtained using the `len` function

In [None]:
len(my_list)

4

## Strings

Strings are used to store text. They can be delimited using either single quotes or double quotes

In [None]:
string1 = "some text"
string2 = 'some other text'

Strings behave similarly to lists. As such we can access individual elements in exactly the same way

In [None]:
string1[3]

'e'

and similarly for slices

In [None]:
string1[5:]

'text'

String concatenation is performed using the `+` operator

In [None]:
string1 + " " + string2

'some text some other text'
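
Strings resemble lists in more ways than indexing, but there is one important difference: a string cannot be changed in place. The sketch below illustrates this with a variable named `greeting`, chosen for this example.

```python
greeting = "some text"

# len and in work on strings just as they do on lists.
print(len(greeting))        # 9
print("text" in greeting)   # True

# Unlike lists, strings cannot be modified in place.
try:
  greeting[0] = "S"
except TypeError as error:
  print("cannot modify a string:", error)

# Build a new string instead.
capitalized = "S" + greeting[1:]
print(capitalized)          # Some text
```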

## Conditionals

## In Python, indentation matters!

As their name indicates, conditionals are a way to execute code depending on whether a condition is True or False. As in other languages, Python supports `if` and `else` but `else if` is contracted into `elif`, as the example below demonstrates. 

In [None]:
my_variable = 5
if my_variable < 0:
  print("negative")
elif my_variable == 0:
  print("null")
else: # my_variable > 0
  print("positive")

positive


Here `<` and `>` are the strict `less than` and `greater than` operators, while `==` is the equality operator (not to be confused with `=`, the variable assignment operator). The operators `<=` and `>=` can be used for less than or equal (resp. greater than or equal) comparisons.

Contrary to many other languages, blocks of code are delimited using indentation. Here, we use 2-space indentation, but many programmers use 4-space indentation. Either is fine as long as you are consistent throughout your code.
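
To see how indentation decides which lines belong to which block, here is a small sketch with a nested conditional. The variable names and the threshold values are made up for this example.

```python
temperature = 25  # hypothetical value chosen for this example

if temperature >= 20:
  # These lines share the same indentation, so they all belong to the if block.
  label = "warm"
  if temperature >= 30:
    # One level deeper: this line runs only when both conditions hold.
    label = "hot"
else:
  label = "cold"

# Back at the outer level: this line runs regardless of the branch taken.
print(label)
```

With `temperature = 25`, the outer condition holds but the inner one does not, so the cell prints `warm`.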

## Loops

Loops are a way to execute a block of code multiple times. There are two main types of loops: while loops and for loops.

While loop

In [None]:
i = 0
while i < len(my_list):
  print(my_list[i])
  i += 1 # equivalent to i = i + 1

1
2
3
4


For loop

In [None]:
for i in range(len(my_list)):
  print(my_list[i])

1
2
3
4


If the goal is simply to iterate over a list, we can do so directly as follows

In [None]:
for element in my_list:
  print(element)

1
2
3
4
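
The loops above rely on `range`, and sometimes we need both the index and the element. A minimal sketch of both follows; `enumerate` is a Python built-in that is not covered above, and the list `letters` is made up for this example.

```python
letters = ["a", "b", "c"]

# range(n) produces the integers 0, 1, ..., n - 1.
indices = list(range(len(letters)))   # [0, 1, 2]
print(indices)

# enumerate yields (index, element) pairs, so no manual counter is needed.
pairs = []
for index, letter in enumerate(letters):
  pairs.append((index, letter))
  print(index, letter)
```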


## Functions

To improve code readability, it is common to separate the code into different blocks, responsible for performing precise actions: functions. A function takes some inputs and processes them to return some outputs.

In [None]:
def square(x):
  return x ** 2

def multiply(a, b):
  return a * b

# Functions can be composed.
square(multiply(3, 2))

36

To improve code readability, it is sometimes useful to explicitly name the arguments

In [None]:
square(multiply(a=3, b=2))

36
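
Naming the arguments matters most when their order is easy to mix up. The sketch below uses a hypothetical `power` function (not defined elsewhere in this notebook) to show that named arguments can be passed in any order.

```python
def power(base, exponent):
  return base ** exponent

# Positional arguments are matched by position...
positional = power(2, 3)               # 2 ** 3 = 8

# ...whereas named arguments are matched by name, so their order is free.
named = power(exponent=3, base=2)      # also 2 ** 3 = 8

print(positional, named)
```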

## Classes and Objects

In Python (and other object-oriented programming languages), a class is a blueprint for creating *objects*. An object has properties associated with it and can be passed around to different functions.

For example, we can define a `Person` class with the properties "name" and "age".

In [None]:
class Person:
  name = "Dax"
  age = 25

print(Person.name)
print(Person.age)

Dax
25


However, defining a `Person` like above isn't very useful. Why? Well, not every `Person` is named Dax, nor is every `Person` 25 years old. A more useful way of defining a class is through an `__init__` function. This function lets us set an object's properties when the object is created. To create a `Person`, we call the `Person` class like a function and pass it the arguments declared in `__init__` (except `self`, which Python fills in automatically). For example:

In [None]:
class Person:
  def __init__(self, name, age):
    self.name = name
    self.age = age

mentor = Person("Dax", 25)

print(mentor.name, mentor.age)

Dax 25


We can now extend the `Person` class to have different functionality and allow it to change internal variables. For example, I extend the `Person` class to have a `birthday()` function that increments that person's age by one.

In [None]:
class Person:
  def __init__(self, name, age):
    self.name = name
    self.age = age

  def birthday(self):
    self.age += 1

mentor = Person("Dax", 25)
print(mentor.age)

mentor.birthday()
print(mentor.age)

25
26
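
One point worth making explicit is that each object keeps its own copy of the properties set in `__init__`. The sketch below repeats the class definition so that it runs on its own; the second person's name and age are made up for this example.

```python
class Person:
  def __init__(self, name, age):
    self.name = name
    self.age = age

  def birthday(self):
    self.age += 1

# Two objects created from the same class.
first = Person("Dax", 25)
second = Person("Sam", 20)

# Only the object whose function is called changes.
second.birthday()
print(first.name, first.age)    # Dax 25
print(second.name, second.age)  # Sam 21
```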


# Numpy

In [8]:
import numpy as np

A very important (and common) data type that you will encounter is the multi-dimensional list. Put simply, a two-dimensional list is a list of lists, a three-dimensional list is a list of lists of lists, and so on. Let's visualize what that is.

In [2]:
twod_list = []
for word in "Hello, my name is Dax".split(' '):
  twod_list.append([ord(c) for c in word])

print(twod_list)

[[72, 101, 108, 108, 111, 44], [109, 121], [110, 97, 109, 101], [105, 115], [68, 97, 120]]
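
Elements of a two-dimensional list are reached by indexing twice: first the outer list, then the inner one. Here is a minimal sketch, using small made-up lists named `grid` and `ragged`.

```python
# A small two-dimensional list: a list whose elements are themselves lists.
grid = [[1, 2, 3],
        [4, 5, 6]]

second_row = grid[1]      # [4, 5, 6]
element = grid[1][2]      # row 1, column 2 -> 6

# The inner lists need not have the same length, as in twod_list above.
ragged = [[72, 101], [109]]
lengths = [len(row) for row in ragged]   # [2, 1]

print(second_row, element, lengths)
```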


It's common for images to be stored in a 3-dimensional list. Why? Because an image has rows, columns, and R, G, B values. So the outer list contains all the rows of the image, each row is a list with one entry per column, and each entry is itself a list in the format [R, G, B].

In [9]:
from PIL import Image


im = Image.open("v.png")
print(np.array(im))

[[[151 135 122]
  [154 138 125]
  [151 135 122]
  ...
  [170 151 139]
  [170 151 139]
  [170 151 139]]

 [[152 136 123]
  [156 140 127]
  [155 139 126]
  ...
  [170 151 139]
  [170 151 139]
  [170 151 139]]

 [[152 136 123]
  [157 141 128]
  [157 141 128]
  ...
  [167 148 136]
  [167 148 136]
  [167 148 136]]

 ...

 [[174 151 136]
  [174 151 136]
  [175 152 137]
  ...
  [124 108  95]
  [123 107  94]
  [127 111  98]]

 [[174 151 136]
  [174 151 136]
  [175 152 137]
  ...
  [126 110  97]
  [126 110  97]
  [130 114 101]]

 [[174 151 136]
  [174 151 136]
  [175 152 137]
  ...
  [123 107  94]
  [126 110  97]
  [133 117 104]]]
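
To make the rows, columns, and channels layout concrete without needing an image file, the sketch below builds a tiny made-up image by hand. The array `tiny_image` and its pixel values are invented for this example and are unrelated to `v.png`.

```python
import numpy as np

# A made-up image with 2 rows, 3 columns, and 3 channels (R, G, B) per pixel.
tiny_image = np.array([
  [[255, 0, 0], [0, 255, 0], [0, 0, 255]],
  [[0, 0, 0], [128, 128, 128], [255, 255, 255]],
], dtype=np.uint8)

print(tiny_image.shape)      # (2, 3, 3): (rows, columns, channels)

# Index with [row, column] to get one pixel's [R, G, B] values.
pixel = tiny_image[0, 1]
print(pixel)

# Add a channel index to get a single value, here the green channel.
green = tiny_image[0, 1, 1]
print(green)                 # 255
```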


Numpy is a really convenient library to manipulate multi-dimensional data. Let's explore what it can do:

## Defining an array

In [12]:
a = np.array([1, 2, 3])
b = np.array([(1.5, 2, 3), (4, 5, 6)], dtype=float)
c = np.array([[(1.5, 2, 3), (4, 5, 6)], [(3, 2, 1), (4, 5, 6)]], dtype=float)

d = np.zeros((3, 4))  # Create an array of zeros
e = np.ones((2, 3, 4), dtype=np.int16)  # Create an array of ones

f = np.arange(10, 25, 5)  # Create an array of evenly spaced values (step value)
g = np.linspace(0, 2, 9)  # Create an array of evenly spaced values (number of samples)

h = np.full((2, 2), 7)  # Create a constant array
i = np.eye(2)  # Create a 2x2 identity matrix
j = np.random.random((2, 2))  # Create an array with random values
k = np.empty((3, 2))  # Create an uninitialized array (its contents are arbitrary)

In [17]:
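def namestr(obj, namespace):
  # Return every name in the namespace that is bound to exactly this object.
  return [name for name in namespace if namespace[name] is obj]

# Print each array next to the name of the variable that holds it.
for arr in [a, b, c, d, e, f, g, h, i, j, k]:
  print(namestr(arr, globals())[0], arr, "\n")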

a [1 2 3] 

b [[1.5 2.  3. ]
 [4.  5.  6. ]] 

c [[[1.5 2.  3. ]
  [4.  5.  6. ]]

 [[3.  2.  1. ]
  [4.  5.  6. ]]] 

d [[0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]] 

e [[[1 1 1 1]
  [1 1 1 1]
  [1 1 1 1]]

 [[1 1 1 1]
  [1 1 1 1]
  [1 1 1 1]]] 

f [10 15 20] 

g [0.   0.25 0.5  0.75 1.   1.25 1.5  1.75 2.  ] 

h [[7 7]
 [7 7]] 

i [[1. 0.]
 [0. 1.]] 

j [[0.96934343 0.62986747]
 [0.19670802 0.43431563]] 

k [[1.39069238e-309 1.39069238e-309]
 [1.39069238e-309 1.39069238e-309]
 [1.39069238e-309 1.39069238e-309]] 
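
Every array carries a few attributes that describe its layout. The sketch below inspects them for two of the arrays defined above, which are repeated here so that the cell runs on its own.

```python
import numpy as np

b = np.array([(1.5, 2, 3), (4, 5, 6)], dtype=float)
e = np.ones((2, 3, 4), dtype=np.int16)

# shape: size along each dimension; ndim: number of dimensions; dtype: element type.
print(b.shape, b.ndim, b.dtype)   # (2, 3) 2 float64
print(e.shape, e.ndim, e.dtype)   # (2, 3, 4) 3 int16

# size is the total number of elements (2 * 3 * 4 = 24 for e).
print(e.size)
```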



In [24]:
l = a - b  # Subtraction
np.subtract(a, b)  # Subtraction

m = b + a  # Addition
np.add(b, a)  # Addition

a / b  # Division
np.divide(a, b)  # Division

a * b  # Multiplication
np.multiply(a, b)  # Multiplication

np.exp(b)  # Elementwise exponential
np.sqrt(b)  # Elementwise square root
np.sin(a)  # Elementwise sine
np.cos(b)  # Elementwise cosine
np.log(a)  # Elementwise natural logarithm

array([0.        , 0.69314718, 1.09861229])
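
Note that `a` has 3 elements while `b` has 2 rows of 3 elements, yet `a - b` works. NumPy "broadcasts" the smaller array across the rows of the larger one. The sketch below redefines `a` and `b` so that it runs on its own and compares the result with an explicitly repeated version of `a`.

```python
import numpy as np

a = np.array([1, 2, 3])                               # shape (3,)
b = np.array([(1.5, 2, 3), (4, 5, 6)], dtype=float)   # shape (2, 3)

# a is treated as if it were repeated for each row of b.
difference = a - b
print(difference.shape)   # (2, 3)
print(difference)

# The same result, with the repetition written out explicitly.
explicit = np.array([a, a]) - b
print(np.array_equal(difference, explicit))   # True
```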

# Pandas

## Introduction


Pandas is best known as a library for creating *DataFrames*, which hold 2-dimensional (row, column) tabular data. You can think of a DataFrame as being a lot like an Excel spreadsheet, where each row and column meet in a single cell that may contain data.

DataFrames are not unlike spreadsheets; however, they have their own ways of defining, placing, analyzing, and modifying data.

To get started, we need to import Pandas. I use the common `pd` alias to reduce the number of characters I have to type later.

In [None]:
import pandas as pd

## Defining a DataFrame

There are numerous ways to define a DataFrame. Below are some you may be interested in using.

### Creating your own DataFrames

In [None]:
d = {'col1': [1, 2], 'col2': [3, 4]}
df = pd.DataFrame(data=d)
display(df)

Unnamed: 0,col1,col2
0,1,3
1,2,4


In [None]:
rows = []
for i in range(2):
  rows.append({"col1": i, "col2": i * 2})

df = pd.DataFrame(rows)
display(df)

Unnamed: 0,col1,col2
0,0,0
1,1,2
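
Once a DataFrame exists, a few attributes and indexing forms are enough to look inside it. This is a minimal sketch using a separate DataFrame named `small_df` (a name chosen for this example) so that `df` is left untouched.

```python
import pandas as pd

small_df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})

print(small_df.shape)            # (2, 2): (rows, columns)
print(list(small_df.columns))    # ['col1', 'col2']

# Selecting one column returns a Series, a one-dimensional labeled array.
col1 = small_df["col1"]
print(col1.tolist())             # [1, 2]

# loc selects a single cell by row label and column name.
cell = small_df.loc[1, "col2"]
print(cell)                      # 4
```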


### Loading a DataFrame from a file

In [None]:
df = pd.read_excel("example.xlsx")
display(df)

Unnamed: 0,col1,col2
0,1,3
1,2,4


In [None]:
df = pd.read_csv("example.csv")
display(df)

Unnamed: 0,col1,col2
0,1,3
1,2,4
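
The files `example.xlsx` and `example.csv` are assumed to exist next to the notebook. If they are not available, the sketch below shows the same round trip without touching the disk: it writes a made-up DataFrame as CSV text and reads it back through an in-memory buffer.

```python
import io
import pandas as pd

original = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})

# Write the DataFrame as CSV text; index=False leaves out the row labels.
csv_text = original.to_csv(index=False)
print(csv_text)

# read_csv accepts a file path or a file-like object such as StringIO.
loaded = pd.read_csv(io.StringIO(csv_text))
print(loaded.equals(original))   # True
```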


## Manipulating Data

Once you have a DataFrame defined with data, here are common ways that you can modify that data. Let's define a sample DataFrame that we can manipulate.

In [None]:
summer_df = pd.read_excel("summer.xlsx")
display(summer_df)

Unnamed: 0,Role,Name,Preferred Name,Email,GitHub Username,Project
0,Mentor,Ahmed Mohamed,Shadi,wff394@mocs.utc.edu,A7med3liamin,
1,Student,Amaya Caudel,,acaudel@my.apsu.edu,llamayall,MusicAI
2,Mentor,Amin Amiri,,tvp372@mocs.utc.edu,amiri-amin (ID:107808541),"LIDAR, MLPPC (Pavement)"
3,Student,Andres Angel,Andres,andres-angel@utc.edu,angeland001,VIGOR
4,Student,Bansari Patel,,bxy539@mocs.utc.edu,Bansari26700,Transportation
5,Student,Collin Matthews,,collin.j.matthews@gmail.com,collinjm,Transportation
6,Mentor/Staff,Dakila Ledesma,Dax,dakila-ledesma@utc.edu\ndakila_ledesma@bcbst.com,dakilaledesma,"VIGOR, MusicAI, TDoT GI, PTP"
7,Professor,Dalei Wu,,dalei-wu@utc.edu,utcwu2019,
8,Student,David Dias,,DTH931@mocs.utc.edu,DavidDiasN,LIDAR
9,Student,Freddy Su,,freddysu06@gmail.com,freddysu06,MusicAI


### Adding a column

In [None]:
# Generate a list of first names
first_names = []
for index, row in summer_df.iterrows():
  first_names.append(row["Name"].split(' ')[0])
print(first_names)

# Assign a "First Name" column using the list of first_names
summer_df["First Name"] = first_names
display(summer_df)

['Ahmed', 'Amaya', 'Amin', 'Andres', 'Bansari', 'Collin', 'Dakila', 'Dalei', 'David', 'Freddy', 'Jennifer', 'Kalon', 'Megan', 'Ruhan', 'Rutvi', 'Sophia', 'Stanley', 'Veer', 'Viji', 'Yu']


Unnamed: 0,Role,Name,Preferred Name,Email,GitHub Username,Project,First Name
0,Mentor,Ahmed Mohamed,Shadi,wff394@mocs.utc.edu,A7med3liamin,,Ahmed
1,Student,Amaya Caudel,,acaudel@my.apsu.edu,llamayall,MusicAI,Amaya
2,Mentor,Amin Amiri,,tvp372@mocs.utc.edu,amiri-amin (ID:107808541),"LIDAR, MLPPC (Pavement)",Amin
3,Student,Andres Angel,Andres,andres-angel@utc.edu,angeland001,VIGOR,Andres
4,Student,Bansari Patel,,bxy539@mocs.utc.edu,Bansari26700,Transportation,Bansari
5,Student,Collin Matthews,,collin.j.matthews@gmail.com,collinjm,Transportation,Collin
6,Mentor/Staff,Dakila Ledesma,Dax,dakila-ledesma@utc.edu\ndakila_ledesma@bcbst.com,dakilaledesma,"VIGOR, MusicAI, TDoT GI, PTP",Dakila
7,Professor,Dalei Wu,,dalei-wu@utc.edu,utcwu2019,,Dalei
8,Student,David Dias,,DTH931@mocs.utc.edu,DavidDiasN,LIDAR,David
9,Student,Freddy Su,,freddysu06@gmail.com,freddysu06,MusicAI,Freddy
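
The loop above is easy to follow, but pandas can also apply string operations to a whole column at once through the `.str` accessor. Here is a minimal sketch on a made-up two-row DataFrame named `people_df`, standing in for `summer_df`.

```python
import pandas as pd

# A made-up two-row stand-in for summer_df.
people_df = pd.DataFrame({"Name": ["Ada Lovelace", "Alan Turing"]})

# .str applies string methods to every value in the column;
# .str[0] then picks the first piece of each split.
people_df["First Name"] = people_df["Name"].str.split(" ").str[0]
print(people_df["First Name"].tolist())   # ['Ada', 'Alan']
```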


### Filtering and changing data through a conditional

Sometimes, it's important to filter data for viewing. For example, I may want to keep only the people who are working on VIGOR:

In [None]:
vigor_students_df = summer_df[summer_df["Project"].astype(str).str.contains('VIGOR')]
display(vigor_students_df)

Unnamed: 0,Role,Name,Preferred Name,Email,GitHub Username,Project,First Name
3,Student,Andres Angel,Andres,andres-angel@utc.edu,angeland001,VIGOR,Andres
6,Mentor/Staff,Dakila Ledesma,Dax,dakila-ledesma@utc.edu\ndakila_ledesma@bcbst.com,dakilaledesma,"VIGOR, MusicAI, TDoT GI, PTP",Dakila
10,Student,Jennifer Wu,,wu1745@purdue.edu,JenniferWu25,VIGOR,Jennifer
12,Student,Megan Burns,,Lrt935@mocs.utc.edu,ms-burns,VIGOR,Megan
15,Student,Sophia Su,,su288@purdue.edu,SophiaSu1124,VIGOR,Sophia
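
The filter works in two steps: the condition produces one True/False value per row, and indexing with those values keeps the rows marked True. The sketch below separates the two steps on made-up data. It passes `na=False` to `str.contains`, an alternative to `astype(str)` for handling rows with no project.

```python
import pandas as pd

# Made-up data standing in for summer_df; None marks a missing project.
projects_df = pd.DataFrame({
  "Name": ["A", "B", "C"],
  "Project": ["VIGOR", None, "VIGOR, MusicAI"],
})

# Step 1: the condition produces one True/False value per row (a boolean mask).
# na=False treats missing values as non-matches.
mask = projects_df["Project"].str.contains("VIGOR", na=False)
print(mask.tolist())               # [True, False, True]

# Step 2: indexing with the mask keeps only the rows where it is True.
vigor_df = projects_df[mask]
print(vigor_df["Name"].tolist())   # ['A', 'C']
```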


## Common Functions

### Sorting a DataFrame

In [None]:
summer_proj_df = summer_df.sort_values(["Project", "Name"])
display(summer_proj_df)

Unnamed: 0,Role,Name,Preferred Name,Email,GitHub Username,Project,First Name
8,Student,David Dias,,DTH931@mocs.utc.edu,DavidDiasN,LIDAR,David
17,Student,Veer Sahasi,,veersahasi@icloud.com,vsahasi,LIDAR Drone Project,Veer
2,Mentor,Amin Amiri,,tvp372@mocs.utc.edu,amiri-amin (ID:107808541),"LIDAR, MLPPC (Pavement)",Amin
1,Student,Amaya Caudel,,acaudel@my.apsu.edu,llamayall,MusicAI,Amaya
9,Student,Freddy Su,,freddysu06@gmail.com,freddysu06,MusicAI,Freddy
11,Student,Kalon Camkal,,kalon.camkal@gmail.com,,MusicAI,Kalon
16,Student,Stanley Wang,,stanleymaverwang@gmail.com,stanleymw,MusicAI,Stanley
13,Student,Ruhan Sahasi,,ruhansahasi@icloud.com,rsahasi,Pavement Infrastructure,Ruhan
14,Student,Rutvi Shah,,nmg669@mocs.utc.edu,rutvishah30,TDoT GI,Rutvi
4,Student,Bansari Patel,,bxy539@mocs.utc.edu,Bansari26700,Transportation,Bansari
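
When several columns are given, `sort_values` sorts by the first one and uses the following ones to break ties. The sketch below shows this on made-up data, along with the `ascending` parameter for reversing the order.

```python
import pandas as pd

# Made-up data for illustration.
teams_df = pd.DataFrame({
  "Project": ["B", "A", "B", "A"],
  "Name": ["Zoe", "Yan", "Abe", "Bo"],
})

# Sort by Project first; rows with the same Project are then ordered by Name.
sorted_df = teams_df.sort_values(["Project", "Name"])
print(sorted_df["Name"].tolist())         # ['Bo', 'Yan', 'Abe', 'Zoe']

# sort_values returns a new DataFrame and keeps the original row labels.
print(sorted_df.index.tolist())           # [3, 1, 2, 0]

# ascending=False reverses the order.
descending_df = teams_df.sort_values("Name", ascending=False)
print(descending_df["Name"].tolist())     # ['Zoe', 'Yan', 'Bo', 'Abe']
```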


# Acknowledgement 
Modified from materials prepared by Mathieu Blondel.