# Introduction to Python

Key concepts in procedural programming (think COMP 1900)
* Block
* Conditional statement
* Iterative statement
* Function definition and invocation
    
Key concepts in OOP (think COMP 2150)
* Class and objects
* Attributes and methods

Data structures in Python
* Number and string
* Linear ordered structures:  list, array, tuple
* Unordered structure: set
* Map or dictionary
* File

# Blocks

How is this done in Java?

In most languages, e.g. Java, a block is whatever sits between { and }.

In Python, a block is defined by indentation.  Statements in the same block have the same level of indentation.
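
A minimal sketch of an indentation-defined block. The names `temperature` and `messages` are made up for illustration and do not come from the course material.

```python
# Hypothetical values, chosen only to illustrate indentation.
temperature = 30
messages = []

if temperature > 25:
    # These two lines share one indentation level, so they form one block.
    messages.append('It is warm.')
    messages.append('Drink some water.')
# Back at the outer level: this line runs whether or not the condition held.
messages.append('Done.')

print(messages)
```

In Java the two indented lines would sit between { and }; here the indentation alone marks where the block starts and ends.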



# Conditional statements

Check if a string contains the word "Python".

Check if a string is a palindrome.

Check if a list contains a number.


In [1]:
x = 'I like Python.'
y = "I don't like Python."

In [2]:
type(x), type(y)

(str, str)

In [6]:
'Pythog' in y

False

In [7]:
if 'Python' in x:
    print('Yes')
    print('x is', x)

Yes
x is I like Python.
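
The other two checks listed above could be sketched as follows. The helper name `is_palindrome` and the sample values are assumptions for illustration; reversing with a slice is one possible approach, not the only one.

```python
def is_palindrome(text):
    # Compare the string with its reverse (a slice with step -1).
    return text == text[::-1]

word = 'racecar'
if is_palindrome(word):
    print(word, 'is a palindrome')
else:
    print(word, 'is not a palindrome')

# The "in" operator also tests membership in a list.
numbers = [3, 8, 15]
if 8 in numbers:
    print('8 is in the list')
```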


# Iterative statements

Print out all strings in a list that contain the word "Python".

Go through each item of a list.

Go through each index of a list.

Print out all numbers in a list that are even.

Go through a list of strings and stop when an item has the word "Python".

In [8]:
nums = [1,2,3,4,5]
for x in nums:
    print(x)
    print(x*x)

1
1
2
4
3
9
4
16
5
25
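
A rough sketch of the remaining loops listed at the top of this section. The list `sentences` is made-up sample data, and the result lists (`with_python`, `evens`, `first_match`) are names introduced here only so the results can be inspected.

```python
sentences = ['I like Java.', 'I like Python.', 'Python is fun.']  # made-up sample data

# Go through each item and keep the strings that contain "Python".
with_python = []
for s in sentences:
    if 'Python' in s:
        with_python.append(s)
        print(s)

# Go through each index of the list.
for i in range(len(sentences)):
    print(i, sentences[i])

# Print the even numbers in a list.
nums = [1, 2, 3, 4, 5]
evens = []
for n in nums:
    if n % 2 == 0:
        evens.append(n)
        print(n)

# Stop at the first string that contains "Python".
first_match = None
for s in sentences:
    if 'Python' in s:
        first_match = s
        break
print(first_match)
```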


# Functions

Input and output of functions.

Define a function that counts the frequency of the word "Python" in a list of strings.

Exercise: define a Python function that returns a list of squares from a given list.

In [13]:
def square_of_items(L):
    output = []
    for x in L:
        output.append(x*x)
    return output


In [15]:
squares = square_of_items([1,3,4])
print(squares)

[1, 9, 16]
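
One possible take on the word-counting function mentioned above. The name `count_python` is an assumption, and this sketch counts every occurrence of the substring "Python" (a string that mentions it twice counts twice), which is only one reading of "frequency".

```python
def count_python(strings):
    # Add up how many times "Python" occurs in each string.
    total = 0
    for s in strings:
        total += s.count('Python')
    return total

print(count_python(['I like Python.', 'Java', 'Python, Python!']))
```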


# Objects

Capitalize a string.

What are useful string methods?

What are useful list methods? 

In [16]:
type(squares)

list

To inspect an object, we can use the "dir" function.

In [17]:
dir(squares)

['__add__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__delitem__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__gt__',
 '__hash__',
 '__iadd__',
 '__imul__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__reversed__',
 '__rmul__',
 '__setattr__',
 '__setitem__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'append',
 'clear',
 'copy',
 'count',
 'extend',
 'index',
 'insert',
 'pop',
 'remove',
 'reverse',
 'sort']

In [18]:
squares

[1, 9, 16]

In [19]:
squares.reverse()

In [20]:
squares

[16, 9, 1]
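
A short sketch touching the string questions above. The sample string is made up; `capitalize`, `upper`, and `split` are standard `str` methods. Note the difference from lists: string methods return a new string, while `reverse()` and `sort()` change the list itself.

```python
s = 'i like python.'
print(s.capitalize())   # a new string with the first character upper-cased
print(s.upper())        # a new string, all upper case
print(s.split())        # a list of the words, split on whitespace
print(s)                # the original string is unchanged

# Collect the public (non-underscore) method names of a string.
public_methods = []
for name in dir(s):
    if not name.startswith('_'):
        public_methods.append(name)
print(public_methods)

# list.sort() changes the list in place, like reverse().
squares = [16, 9, 1]
squares.sort()
print(squares)
```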

# Class

Create a class IrisData with attributes:
* species
* number of data points
* list of data points


Understand "self" and how methods are invoked.
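
A minimal sketch of such a class, not a definitive design. The method names `add_point` and `num_points` are assumptions; here the number of data points is computed from the list rather than stored separately.

```python
class IrisData:
    def __init__(self, species):
        # "self" is the object being created; attributes are stored on it.
        self.species = species
        self.points = []          # list of data points

    def add_point(self, point):
        self.points.append(point)

    def num_points(self):
        # Number of data points currently stored.
        return len(self.points)

setosa = IrisData('setosa')
setosa.add_point([5.1, 3.5, 1.4, 0.2])
setosa.add_point([4.9, 3.0, 1.4, 0.2])

# setosa.num_points() is shorthand for IrisData.num_points(setosa):
# Python passes the object itself as the first argument, "self".
print(setosa.species, setosa.num_points())
print(IrisData.num_points(setosa))
```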

#### The Iris Dataset (iris.csv)

The Iris dataset covers 3 species: setosa, versicolor, and virginica.

<img src="https://miro.medium.com/max/700/0*Uw37vrrKzeEWahdB">

# More on lists

Concatenate lists.

Sort lists.

List comprehension.
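
A quick sketch of concatenating and sorting, using made-up lists `a` and `b`.

```python
a = [3, 1, 2]
b = [9, 7]

combined = a + b              # concatenation builds a new list
print(combined)

ordered = sorted(combined)    # sorted() returns a new sorted list
print(ordered)

combined.sort(reverse=True)   # sort() reorders the list in place
print(combined)
```

`sorted()` leaves its argument untouched, while `sort()` returns `None` and modifies the list it is called on.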

In [21]:
iris_data = open('../../Datasets/iris.csv').readlines()

In [24]:
print(iris_data[0])
print(iris_data[1])


SepalLength,SepalWidth,PetalLength,PetalWidth,Species

5.1,3.5,1.4,0.2,setosa



Let's clean up this data.  (Note: it's already clean.  We'll just tidy it up a little.)

In [61]:
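def remove_newlines(lines):
    # Note: pop(0) also removes the header from the caller's list.
    output = []
    header = lines.pop(0).strip()
    for line in lines:
        # Strip the newline, then split the line into its comma-separated fields.
        output.append(line.strip().split(','))
    return header, output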

In [62]:
iris_data = open('../../Datasets/iris.csv').readlines()
header, iris_data = remove_newlines(iris_data)

In [63]:
header

'SepalLength,SepalWidth,PetalLength,PetalWidth,Species'

In [64]:
iris_data

[['5.1', '3.5', '1.4', '0.2', 'setosa'],
 ['4.9', '3', '1.4', '0.2', 'setosa'],
 ['4.7', '3.2', '1.3', '0.2', 'setosa'],
 ['4.6', '3.1', '1.5', '0.2', 'setosa'],
 ['5', '3.6', '1.4', '0.2', 'setosa'],
 ['5.4', '3.9', '1.7', '0.4', 'setosa'],
 ['4.6', '3.4', '1.4', '0.3', 'setosa'],
 ['5', '3.4', '1.5', '0.2', 'setosa'],
 ['4.4', '2.9', '1.4', '0.2', 'setosa'],
 ['4.9', '3.1', '1.5', '0.1', 'setosa'],
 ['5.4', '3.7', '1.5', '0.2', 'setosa'],
 ['4.8', '3.4', '1.6', '0.2', 'setosa'],
 ['4.8', '3', '1.4', '0.1', 'setosa'],
 ['4.3', '3', '1.1', '0.1', 'setosa'],
 ['5.8', '4', '1.2', '0.2', 'setosa'],
 ['5.7', '4.4', '1.5', '0.4', 'setosa'],
 ['5.4', '3.9', '1.3', '0.4', 'setosa'],
 ['5.1', '3.5', '1.4', '0.3', 'setosa'],
 ['5.7', '3.8', '1.7', '0.3', 'setosa'],
 ['5.1', '3.8', '1.5', '0.3', 'setosa'],
 ['5.4', '3.4', '1.7', '0.2', 'setosa'],
 ['5.1', '3.7', '1.5', '0.4', 'setosa'],
 ['4.6', '3.6', '1', '0.2', 'setosa'],
 ['5.1', '3.3', '1.7', '0.5', 'setosa'],
 ['4.8', '3.4', '1.9', '0.2', 

In [58]:
lines = ['5.1,3.5,1.4,0.2,setosa\n', '5.5,5.5,0.4,2.2,setosa\n']
lines.pop(0).strip()


'5.1,3.5,1.4,0.2,setosa'

In [60]:
lines = ['5.1,3.5,1.4,0.2,setosa\n', '5.5,5.5,0.4,2.2,setosa\n']
lines.pop(0).strip().split(',')

['5.1', '3.5', '1.4', '0.2', 'setosa']

## List Comprehension

In [67]:
iris_data = open('../../Datasets/iris.csv').readlines()

# a slice of the first 10 items.
iris_data[0:10]

['SepalLength,SepalWidth,PetalLength,PetalWidth,Species\n',
 '5.1,3.5,1.4,0.2,setosa\n',
 '4.9,3,1.4,0.2,setosa\n',
 '4.7,3.2,1.3,0.2,setosa\n',
 '4.6,3.1,1.5,0.2,setosa\n',
 '5,3.6,1.4,0.2,setosa\n',
 '5.4,3.9,1.7,0.4,setosa\n',
 '4.6,3.4,1.4,0.3,setosa\n',
 '5,3.4,1.5,0.2,setosa\n',
 '4.4,2.9,1.4,0.2,setosa\n']

In [71]:
nums = list(range(20))
print(nums)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]


Let's say we want to square each item in nums.

"squares" is a list that holds the square of each item x of "nums".

In [72]:
squares = [ x*x for x in nums ]

In [73]:
squares

[0,
 1,
 4,
 9,
 16,
 25,
 36,
 49,
 64,
 81,
 100,
 121,
 144,
 169,
 196,
 225,
 256,
 289,
 324,
 361]

Now I only want the squares of the odd numbers in nums.

squares is a list of x*x for every x in nums that is odd.

In [74]:
squares = [ x*x for x in nums if x%2==1]

In [75]:
squares

[1, 9, 25, 49, 81, 121, 169, 225, 289, 361]
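
To see what the comprehension does, it may help to compare it with the equivalent loop. The names `odd_squares_loop` and `odd_squares` are introduced here only for the comparison.

```python
nums = list(range(20))

# Loop form: start with an empty list and append matching items.
odd_squares_loop = []
for x in nums:
    if x % 2 == 1:
        odd_squares_loop.append(x * x)

# Comprehension form: the same expression, loop, and condition on one line.
odd_squares = [x * x for x in nums if x % 2 == 1]

print(odd_squares == odd_squares_loop)
```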

In [77]:
header = iris_data.pop(0).strip()
header

'SepalLength,SepalWidth,PetalLength,PetalWidth,Species'

In [81]:
data = [ line.strip() for line in iris_data ]

In [83]:
data[0:3]

['5.1,3.5,1.4,0.2,setosa', '4.9,3,1.4,0.2,setosa', '4.7,3.2,1.3,0.2,setosa']

In [96]:
def process_line(line):
    # Split a CSV line into four float measurements and a species name.
    things = line.strip().split(',')
    things = [float(t) for t in things[:4]] + [things[4]]
    return things

In [97]:
data = [ process_line(line) for line in iris_data ]

In [99]:
versicolor = [ r for r in data if r[-1] == 'versicolor']

In [100]:
versicolor

[[7.0, 3.2, 4.7, 1.4, 'versicolor'],
 [6.4, 3.2, 4.5, 1.5, 'versicolor'],
 [6.9, 3.1, 4.9, 1.5, 'versicolor'],
 [5.5, 2.3, 4.0, 1.3, 'versicolor'],
 [6.5, 2.8, 4.6, 1.5, 'versicolor'],
 [5.7, 2.8, 4.5, 1.3, 'versicolor'],
 [6.3, 3.3, 4.7, 1.6, 'versicolor'],
 [4.9, 2.4, 3.3, 1.0, 'versicolor'],
 [6.6, 2.9, 4.6, 1.3, 'versicolor'],
 [5.2, 2.7, 3.9, 1.4, 'versicolor'],
 [5.0, 2.0, 3.5, 1.0, 'versicolor'],
 [5.9, 3.0, 4.2, 1.5, 'versicolor'],
 [6.0, 2.2, 4.0, 1.0, 'versicolor'],
 [6.1, 2.9, 4.7, 1.4, 'versicolor'],
 [5.6, 2.9, 3.6, 1.3, 'versicolor'],
 [6.7, 3.1, 4.4, 1.4, 'versicolor'],
 [5.6, 3.0, 4.5, 1.5, 'versicolor'],
 [5.8, 2.7, 4.1, 1.0, 'versicolor'],
 [6.2, 2.2, 4.5, 1.5, 'versicolor'],
 [5.6, 2.5, 3.9, 1.1, 'versicolor'],
 [5.9, 3.2, 4.8, 1.8, 'versicolor'],
 [6.1, 2.8, 4.0, 1.3, 'versicolor'],
 [6.3, 2.5, 4.9, 1.5, 'versicolor'],
 [6.1, 2.8, 4.7, 1.2, 'versicolor'],
 [6.4, 2.9, 4.3, 1.3, 'versicolor'],
 [6.6, 3.0, 4.4, 1.4, 'versicolor'],
 [6.8, 2.8, 4.8, 1.4, 'versicolor'],
 

# Files

Read lines from a file.

"with" block.

Remove the header of iris.csv file.

Traverse and print iris data points.

Describe the procedure in English first.

In [8]:
data_src = '../../Datasets/iris.csv'
with open(data_src) as fp:
#     everything = fp.read()
#     print(everything)
    for line in fp:
        print(line)

SepalLength,SepalWidth,PetalLength,PetalWidth,Species

5.1,3.5,1.4,0.2,setosa

4.9,3,1.4,0.2,setosa

4.7,3.2,1.3,0.2,setosa

4.6,3.1,1.5,0.2,setosa

5,3.6,1.4,0.2,setosa

5.4,3.9,1.7,0.4,setosa

4.6,3.4,1.4,0.3,setosa

5,3.4,1.5,0.2,setosa

4.4,2.9,1.4,0.2,setosa

4.9,3.1,1.5,0.1,setosa

5.4,3.7,1.5,0.2,setosa

4.8,3.4,1.6,0.2,setosa

4.8,3,1.4,0.1,setosa

4.3,3,1.1,0.1,setosa

5.8,4,1.2,0.2,setosa

5.7,4.4,1.5,0.4,setosa

5.4,3.9,1.3,0.4,setosa

5.1,3.5,1.4,0.3,setosa

5.7,3.8,1.7,0.3,setosa

5.1,3.8,1.5,0.3,setosa

5.4,3.4,1.7,0.2,setosa

5.1,3.7,1.5,0.4,setosa

4.6,3.6,1,0.2,setosa

5.1,3.3,1.7,0.5,setosa

4.8,3.4,1.9,0.2,setosa

5,3,1.6,0.2,setosa

5,3.4,1.6,0.4,setosa

5.2,3.5,1.5,0.2,setosa

5.2,3.4,1.4,0.2,setosa

4.7,3.2,1.6,0.2,setosa

4.8,3.1,1.6,0.2,setosa

5.4,3.4,1.5,0.4,setosa

5.2,4.1,1.5,0.1,setosa

5.5,4.2,1.4,0.2,setosa

4.9,3.1,1.5,0.2,setosa

5,3.2,1.2,0.2,setosa

5.5,3.5,1.3,0.2,setosa

4.9,3.6,1.4,0.1,setosa

4.4,3,1.3,0.2,setosa

5.1,3.4,1.5,0.2,setosa

5,3.5,1.3
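
The output above is double-spaced because each line read from the file still ends with a newline, and print adds another one. A sketch of the "remove the header" and "traverse" steps with the newline stripped follows. To stay self-contained it writes a tiny temporary file instead of reading iris.csv; the names `sample`, `sample_path`, and `rows` are assumptions.

```python
import os
import tempfile

# Write a tiny sample file so the sketch does not depend on iris.csv.
sample = ('SepalLength,SepalWidth,PetalLength,PetalWidth,Species\n'
          '5.1,3.5,1.4,0.2,setosa\n'
          '4.9,3,1.4,0.2,setosa\n')
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False) as tmp:
    tmp.write(sample)
    sample_path = tmp.name

rows = []
with open(sample_path) as fp:        # the file is closed when the block ends
    header = fp.readline().strip()   # read (and so skip) the header line
    for line in fp:
        row = line.strip()           # drop the trailing newline
        rows.append(row)
        print(row)

os.remove(sample_path)               # clean up the temporary file
```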

# Maps (Dictionaries)


Create a map that stores data for each species of Iris.

Traverse a map.

Understand keys and values in maps.


A dictionary is a set of key-value pairs.


In [13]:
iris = {}
iris['setosa'] = [(1.5,2.5,3,2), (2.5,4.5,3.1,2)]
iris['versicolor'] = [(5,2,3.0,2.1), (2,4,3.1,2)]

In [14]:
type(iris)

dict

In [15]:
iris.keys()

dict_keys(['setosa', 'versicolor'])

In [16]:
iris.values()

dict_values([[(1.5, 2.5, 3, 2), (2.5, 4.5, 3.1, 2)], [(5, 2, 3.0, 2.1), (2, 4, 3.1, 2)]])

Dictionary keys are unique.  Values do not have to be unique.

Values are retrieved using keys.

In [17]:
iris['setosa']

[(1.5, 2.5, 3, 2), (2.5, 4.5, 3.1, 2)]
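
Traversing a map could look like the sketch below, which reuses the small `iris` dictionary from above. The `counts` dictionary is introduced here only for illustration.

```python
iris = {}
iris['setosa'] = [(1.5, 2.5, 3, 2), (2.5, 4.5, 3.1, 2)]
iris['versicolor'] = [(5, 2, 3.0, 2.1), (2, 4, 3.1, 2)]

# items() yields (key, value) pairs.
counts = {}
for species, points in iris.items():
    counts[species] = len(points)
    print(species, len(points))

# Looking up a missing key with [] raises KeyError; get() returns a default instead.
print(iris.get('virginica', []))
```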

# Practice

Task: get data from iris.csv and store the data in a dictionary. Data for each species is stored under a different key.

What are the steps?
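+ Open the file.
+ Read the header line.
+ Go through each remaining line in the file:
    + Remove the newline character from the current line.
    + Extract information from the line:
        + Split the line at the commas.
        + Take the last item as the species name.
        + Convert the remaining items to numbers.
    + Append the numbers to the list stored under that species name.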

In [42]:
data_src = r'../../Datasets/iris.csv'
data = {}
with open(data_src) as fp:
    header = fp.readline().strip()
#     print('Header:', header)
    for line in fp:
        items = line.strip().split(',')
        species_name = items.pop()
        items = [ float(x) for x in items ]  
        if species_name in data:
            data[ species_name ].append(items)
        else:
            data[ species_name ] = [items]
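
As an aside, the if/else at the end of the loop can be condensed with `dict.setdefault`. The sketch below applies it to three in-memory lines rather than the file; the names `lines`, `grouped`, and `values` are assumptions.

```python
lines = ['5.1,3.5,1.4,0.2,setosa\n',
         '7.0,3.2,4.7,1.4,versicolor\n',
         '4.9,3,1.4,0.2,setosa\n']

grouped = {}
for line in lines:
    items = line.strip().split(',')
    species_name = items.pop()
    values = [float(x) for x in items]
    # setdefault returns the existing list, or stores and returns a new empty one.
    grouped.setdefault(species_name, []).append(values)

print(grouped)
```

Both versions build the same dictionary; the explicit if/else shows the two cases more plainly, which may be preferable when first learning dictionaries.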



In [43]:
data.keys()

dict_keys(['setosa', 'versicolor', 'virginica'])

In [44]:
len(data['virginica'])

50

In [45]:
data['virginica']

[[6.3, 3.3, 6.0, 2.5],
 [5.8, 2.7, 5.1, 1.9],
 [7.1, 3.0, 5.9, 2.1],
 [6.3, 2.9, 5.6, 1.8],
 [6.5, 3.0, 5.8, 2.2],
 [7.6, 3.0, 6.6, 2.1],
 [4.9, 2.5, 4.5, 1.7],
 [7.3, 2.9, 6.3, 1.8],
 [6.7, 2.5, 5.8, 1.8],
 [7.2, 3.6, 6.1, 2.5],
 [6.5, 3.2, 5.1, 2.0],
 [6.4, 2.7, 5.3, 1.9],
 [6.8, 3.0, 5.5, 2.1],
 [5.7, 2.5, 5.0, 2.0],
 [5.8, 2.8, 5.1, 2.4],
 [6.4, 3.2, 5.3, 2.3],
 [6.5, 3.0, 5.5, 1.8],
 [7.7, 3.8, 6.7, 2.2],
 [7.7, 2.6, 6.9, 2.3],
 [6.0, 2.2, 5.0, 1.5],
 [6.9, 3.2, 5.7, 2.3],
 [5.6, 2.8, 4.9, 2.0],
 [7.7, 2.8, 6.7, 2.0],
 [6.3, 2.7, 4.9, 1.8],
 [6.7, 3.3, 5.7, 2.1],
 [7.2, 3.2, 6.0, 1.8],
 [6.2, 2.8, 4.8, 1.8],
 [6.1, 3.0, 4.9, 1.8],
 [6.4, 2.8, 5.6, 2.1],
 [7.2, 3.0, 5.8, 1.6],
 [7.4, 2.8, 6.1, 1.9],
 [7.9, 3.8, 6.4, 2.0],
 [6.4, 2.8, 5.6, 2.2],
 [6.3, 2.8, 5.1, 1.5],
 [6.1, 2.6, 5.6, 1.4],
 [7.7, 3.0, 6.1, 2.3],
 [6.3, 3.4, 5.6, 2.4],
 [6.4, 3.1, 5.5, 1.8],
 [6.0, 3.0, 4.8, 1.8],
 [6.9, 3.1, 5.4, 2.1],
 [6.7, 3.1, 5.6, 2.4],
 [6.9, 3.1, 5.1, 2.3],
 [5.8, 2.7, 5.1, 1.9],
 [6.8, 3.2,