# Introduction to Python Data Analytics
# Part 0. Python Basics

Author: Kang P. Lee <br>
References: 
- Python Programming by en.wikibooks.org (https://en.wikibooks.org/wiki/Python_Programming)
- Data Wrangling with Python by Katharine Jarmul, Jacqueline Kazil (http://shop.oreilly.com/product/0636920032861.do)
- The Python Tutorial by Python Software Foundation (https://docs.python.org/3/tutorial/)
- The Python Standard Library by Python Software Foundation (https://docs.python.org/3/library/)

## ▪ Running a Cell

In [1]:
print("Hello, world!")

Hello, world!


## ▪ Numbers

- Integers
- Floating-point numbers
- Complex numbers

Refer to https://docs.python.org/3/library/stdtypes.html#numeric-types-int-float-complex

In [2]:
x = 5
print(x, type(x))

5 <class 'int'>


In [3]:
x = 3.141592
print(x, type(x))

3.141592 <class 'float'>


In [4]:
x = 5 + 2j
print(x, type(x))

(5+2j) <class 'complex'>


You don't have to declare the type of a variable; Python is dynamically typed, so the type is determined at runtime by the value assigned to the variable.
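
As a small illustration of dynamic typing, the sketch below rebinds a single name to values of different types. The names `value`, `type_1`, `type_2`, and `type_3` are chosen only for this example.

```python
# The same name can be bound to objects of different types over time.
value = 5
type_1 = type(value)      # int

value = "five"
type_2 = type(value)      # str

value = 5.0
type_3 = type(value)      # float

print(type_1, type_2, type_3)
```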

## ▪ Strings

Refer to https://docs.python.org/3/library/stdtypes.html#text-sequence-type-str

In [5]:
a = "hello"
print(a, type(a))

hello <class 'str'>


String literals can be enclosed in matching single quotes (') or double quotes ("); either is fine. If the string contains a single quote, enclose the string in double quotes so that the single quote does not need a backslash escape.

In [6]:
a = "I'm a boy"
a

"I'm a boy"

In [7]:
a = 'I'm a boy'
a

SyntaxError: invalid syntax (<ipython-input-7-0132d64ba264>, line 1)

Similarly, if the string contains double quotes, enclose the string in single quotes so that the double quotes do not need a backslash escape.

In [8]:
a = 'She said, "How are you?"'
a

'She said, "How are you?"'

In [9]:
a = "She said, "How are you?""
a

SyntaxError: invalid syntax (<ipython-input-9-10ecee2842e4>, line 1)
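
The two failing cells can also be repaired with the backslash escape character mentioned earlier. The following is a minimal sketch; the triple-quoted form is an extra option not covered in this section, and the variable names are only illustrative.

```python
# Escape the quote character that matches the enclosing quotes.
a = 'I\'m a boy'
b = "She said, \"How are you?\""

# Triple quotes allow both kinds of quotes inside without escaping.
c = """She said, "I'm fine." """

print(a)
print(b)
print(c)
```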

In [10]:
b = "1"
print(b, type(b))

1 <class 'str'>


In [11]:
1 == "1"

False

### String Additions and Multiplications

In [12]:
a = "hello"
b = "world"
a + b

'helloworld'

String addition is string concatenation.

In [13]:
a * 3

'hellohellohello'

In [14]:
a * b

TypeError: can't multiply sequence by non-int of type 'str'

### Containment

In [15]:
a = "hello"
b = "hell"
print(b in a)
print(a in b)

True
False


The 'in' operator returns True if the first operand is contained in the second.
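
A few more containment checks may help here. This sketch assumes nothing beyond the built-in `in` and `not in` operators; the result names are made up for the example.

```python
a = "hello"

found_middle = "ell" in a        # a substring can sit anywhere in the string
found_upper = "Hell" in a        # the check is case-sensitive
missing = "z" not in a           # 'not in' is the negated form
found_empty = "" in a            # the empty string is contained in every string

print(found_middle, found_upper, missing, found_empty)
```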

### Indexing and Slicing

A Python string is a sequence, which means it can be indexed and sliced.

In [16]:
a = "Data_Science_Institute!"
print(a)

Data_Science_Institute!


In [17]:
index = 0
for letter in a:
    print(index, "\t", letter)
    index += 1

0 	 D
1 	 a
2 	 t
3 	 a
4 	 _
5 	 S
6 	 c
7 	 i
8 	 e
9 	 n
10 	 c
11 	 e
12 	 _
13 	 I
14 	 n
15 	 s
16 	 t
17 	 i
18 	 t
19 	 u
20 	 t
21 	 e
22 	 !


In [18]:
print(a[22])

!


In [19]:
print(a[-1])

!


Python also supports indexing from the end of a sequence, using negative numbers: -1 refers to the last item.

In [20]:
index = 0
for letter in a:
    print(index, "\t", index-len(a), "\t", letter)
    index += 1

0 	 -23 	 D
1 	 -22 	 a
2 	 -21 	 t
3 	 -20 	 a
4 	 -19 	 _
5 	 -18 	 S
6 	 -17 	 c
7 	 -16 	 i
8 	 -15 	 e
9 	 -14 	 n
10 	 -13 	 c
11 	 -12 	 e
12 	 -11 	 _
13 	 -10 	 I
14 	 -9 	 n
15 	 -8 	 s
16 	 -7 	 t
17 	 -6 	 i
18 	 -5 	 t
19 	 -4 	 u
20 	 -3 	 t
21 	 -2 	 e
22 	 -1 	 !


In [21]:
print(a[0:4])

Data


Note that s[i:j] gives a string that starts with s[i] and ends with s[j-1], not s[j].
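
One consequence of the half-open range is that a slice s[i:j] contains j - i characters (for indices within the string), and that s[:i] + s[i:] always rebuilds the original. A short sketch, with `s`, `i`, `j`, and `piece` chosen for illustration:

```python
s = "Data_Science_Institute!"
i, j = 5, 12

piece = s[i:j]                     # characters at indices 5 through 11
print(piece)                       # Science
print(len(piece) == j - i)         # True
print(s[:i] + s[i:] == s)          # True
```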

In [22]:
print(a[:4])

Data


You can omit the starting index if the slice starts from the beginning.

In [23]:
print(a[4:])

_Science_Institute!


You can omit the ending index if the slice runs to the end.

In [24]:
print(a[:])

Data_Science_Institute!


You can omit both the starting and ending indices if the slice starts from the beginning and runs to the end.

In [25]:
print(a[-10:])

Institute!


In [26]:
a[4] = "*"

TypeError: 'str' object does not support item assignment

Strings are immutable, which means the content of a string cannot be changed after it is created.

In [27]:
print(a)

Data_Science_Institute!


Note that the original string a has not changed at all. Indexing and slicing a string returns a new string and leaves the original unchanged.

In [28]:
a = a[:4]
print(a)

Data


Don't forget to re-assign the new copy to the original variable, if that is what you intend. 
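
Because item assignment is not allowed, a common workaround for "changing" one character is to build a new string from slices. This is one possible approach, not the only one; the name `b` is just for the example.

```python
a = "Data_Science_Institute!"

# Build a new string from the parts before and after index 4.
b = a[:4] + "*" + a[5:]

print(b)      # Data*Science_Institute!
print(a)      # Data_Science_Institute!  (the original is untouched)
```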

### String Methods

In [29]:
a = "DaTa ScIeNcE InStItUtE!"
print(a.upper())          # Convert the string to upper case.

DATA SCIENCE INSTITUTE!


In [30]:
print(a.lower())          # Convert the string to lower case.

data science institute!


In [31]:
print(a.count("S"))       # Count the number of the specified substrings in the string.

2


All strings and string methods in Python are case-sensitive.
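
If a case-insensitive count is what you need, one common approach is to normalize the case first. The sketch below counts the letter "t" three ways; the variable names are illustrative.

```python
a = "DaTa ScIeNcE InStItUtE!"

lower_t = a.count("t")             # lowercase "t" only
upper_t = a.count("T")             # uppercase "T" only
any_t = a.lower().count("t")       # convert to lower case, then count

print(lower_t, upper_t, any_t)     # 3 1 4
```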

In [32]:
a = "\tData Science Institute!\nWelcome!"     # \t: tab, \n: new line
print(a)

	Data Science Institute!
Welcome!


In [33]:
a = "\tData Science Institute    \n"
print(a.strip())                              # Remove the leading and trailing whitespace in the string.

Data Science Institute


In [34]:
print(a.lstrip())                             # Remove the leading whitespace in the string.

Data Science Institute    



In [36]:
print(a.rstrip())                             # Remove the trailing whitespace in the string.

	Data Science Institute


In [37]:
a = "Data Science Institute;"
print(a.rstrip(";"))                          # Strip the given characters instead of whitespace.

Data Science Institute


In [38]:
seq = ["a", "b", "c", "d", "e"]
print("+".join(seq))                          # Join together the given sequence with the string as a separator.

a+b+c+d+e


In [40]:
a = "Data Science Institute!"
print(a.find("a"))                            # Return the index of the first found occurrence of the given substring.

1


In [41]:
a = "Data Science Institute!"
print(a.find("z"))

-1


In [42]:
a = "Data Science Institute!"
print(a.index("a"))

1


In [43]:
a = "Data Science Institute!"
print(a.index("z"))

ValueError: substring not found

In [44]:
a = "Data Science Institute!"
print(a.replace(" ", "_"))

Data_Science_Institute!


In [45]:
a = "Data Science Institute!"
print(a.split())                              # Splits the string and returns a list of words in the string.

['Data', 'Science', 'Institute!']


In [46]:
a = "Data_Science_Institute!"
print(a.split("_"))                           # It can take a separator argument.

['Data', 'Science', 'Institute!']
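
The `split` and `join` methods are roughly inverse operations. The sketch below shows a round trip, plus one difference between `split()` with no argument and `split(" ")` that can be surprising. The variable names are chosen for the example.

```python
a = "Data Science Institute!"

words = a.split()                  # split on runs of whitespace
joined = "_".join(words)           # put the words back together with "_"
print(words)
print(joined)                      # Data_Science_Institute!
print(joined.split("_") == words)  # True

spaced = "Data  Science"           # note the two spaces
print(spaced.split())              # ['Data', 'Science']
print(spaced.split(" "))           # ['Data', '', 'Science']
```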


## ▪ Lists

Refer to https://docs.python.org/3/library/stdtypes.html#lists

### List Creation

In [47]:
l1 = []
print(l1, type(l1))

[] <class 'list'>


In [48]:
l1 = list()
print(l1, type(l1))

[] <class 'list'>


In [49]:
l2 = [1, 2, 3]
l2

[1, 2, 3]

In [50]:
l3 = ["a", "b", "c"]
l3

['a', 'b', 'c']

In [51]:
l4 = [1, 2, 3, "a", "b", "c"]
l4

[1, 2, 3, 'a', 'b', 'c']

A list is a very general structure, and list elements don't have to be of the same type.

### Length of a List

In [53]:
len(l4)

6

In [54]:
l4.len()

AttributeError: 'list' object has no attribute 'len'

The 'len' function is a built-in function of Python that returns the number of items in a list, regardless of the types of its elements. It is often mistaken for a list method, but lists have no 'len' method.
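
The same built-in works on other containers as well, not only on lists. A brief sketch with made-up values:

```python
n_chars = len("hello")              # characters in a string
n_items = len([1, 2, 3])            # items in a list
n_keys = len({"a": 1, "b": 2})      # keys in a dictionary
n_unique = len({1, 2, 2, 3})        # distinct elements in a set

print(n_chars, n_items, n_keys, n_unique)   # 5 3 2 3
```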

### List Creation Shortcuts

In [55]:
[0, 1, 2] * 5

[0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2]

### Combining Lists

In [57]:
l1 = [1, 2, 3]
l2 = ["a", "b", "c"]
l1 + l2

[1, 2, 3, 'a', 'b', 'c']

In [58]:
l1.extend(l2)
l1

[1, 2, 3, 'a', 'b', 'c']

Note that a + b returns a new list, while a.extend(b) modifies a in place by appending the items of b.
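
One way to see the difference is to keep a second name bound to the same list and check what it shows afterwards. This is a small sketch; `alias` and `c` are names introduced only for the example.

```python
a = [1, 2, 3]
b = ["a", "b", "c"]
alias = a                 # a second name for the same list object

c = a + b                 # builds a new list; a is left alone
print(c is a)             # False
print(alias)              # [1, 2, 3]

a.extend(b)               # changes the existing list in place
print(alias is a)         # True
print(alias)              # [1, 2, 3, 'a', 'b', 'c']
```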

In [59]:
l = [1, 2, 3]
l.append(4)
l

[1, 2, 3, 4]

In [60]:
l.append([5, 6])
l

[1, 2, 3, 4, [5, 6]]

Note that [5, 6] is added as a single element (a nested list); its items are not merged into the list. The 'append' method always adds exactly one element to the end of a list.
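
Comparing `append` and `extend` with the same argument makes the distinction concrete. The list names below are illustrative.

```python
l1 = [1, 2, 3, 4]
l1.append([5, 6])         # adds one element, which happens to be a list

l2 = [1, 2, 3, 4]
l2.extend([5, 6])         # adds each item of the argument

print(l1, len(l1))        # [1, 2, 3, 4, [5, 6]] 5
print(l2, len(l2))        # [1, 2, 3, 4, 5, 6] 6
```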

### Indexing and Slicing of Lists

In [61]:
l = [1, 2, 3, "a", "b", "c"]
l[-2:]

['b', 'c']

Indexing and slicing work the same way for lists as for strings, except that lists are mutable, which means we can assign new values to the items in a list.

In [62]:
l[-1] = "z"
l

[1, 2, 3, 'a', 'b', 'z']

### Sorting Lists

In [63]:
l = [1, 6, 3, 4, 2, 5]
l.sort()                    # Sort the list in ascending order.
l

[1, 2, 3, 4, 5, 6]

In [64]:
l = [1, 6, 3, 4, 2, 5]
l.sort(reverse=True)        # Sort the list in descending order.
l

[6, 5, 4, 3, 2, 1]

In [65]:
l = ["a", "c", "b", 3, 1, 2]
l.sort()
l

TypeError: '<' not supported between instances of 'int' and 'str'

If you try to sort a list whose elements are of different types that cannot be compared with each other, Python raises a TypeError.

In [66]:
l = ["a", "c", "b", "3", "1", "2"]  # Make the types of all elements string.
l.sort()
l

['1', '2', '3', 'a', 'b', 'c']
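
An alternative to converting the elements by hand is to leave them as they are and compare them by their string form through the `key` parameter of `sorted()`. This is only a sketch of one option; note that comparing as text can order numbers in unexpected ways.

```python
l = ["a", "c", "b", 3, 1, 2]

# Compare elements by str(element); the elements themselves keep their types.
by_text = sorted(l, key=str)
print(by_text)                     # [1, 2, 3, 'a', 'b', 'c']

# Caveat: as text, "10" comes before "2".
print(sorted([10, 2], key=str))    # [10, 2]
```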

In [67]:
l = ["a", "c", "b", "3", "1", "2"]
sorted(l)

['1', '2', '3', 'a', 'b', 'c']

Python also has a built-in function sorted(), which sorts the same way as the sort() method except that it returns a new sorted list and leaves the original unchanged.
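
A frequent mistake is to write `l = l.sort()`, which leaves `l` set to None because `sort()` works in place. The sketch below contrasts the two; the names `new_list`, `before`, and `result` are illustrative.

```python
l = [3, 1, 2]

new_list = sorted(l)      # returns a new list
before = list(l)          # snapshot: l has not been changed by sorted()
print(new_list, before)   # [1, 2, 3] [3, 1, 2]

result = l.sort()         # sorts in place and returns None
print(result, l)          # None [1, 2, 3]
```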

### Iteration

In [68]:
l = ["a", "b", "c", "d", "e"]
for item in l:
    print(item)               # Instead of print(), you can do whatever you want with each item.

a
b
c
d
e


In [70]:
for i in range(0, len(l)):    # range(m, n) yields the integers from m up to, but not including, n
    print(l[i])

a
b
c
d
e
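
When both the index and the item are needed, the built-in `enumerate()` is a common alternative to indexing with `range()`. A short sketch, with `pairs` introduced only to collect the results:

```python
l = ["a", "b", "c", "d", "e"]

print(list(range(0, len(l))))      # [0, 1, 2, 3, 4]

pairs = []
for i, item in enumerate(l):       # yields (index, item) pairs
    pairs.append((i, item))
print(pairs)
```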


### Removing

In [71]:
l = [1, 2, 3, 4, 5]
l.pop()           # Remove (and return) the last item.
l

[1, 2, 3, 4]

In [72]:
l.pop(0)          # Remove the first item.
l

[2, 3, 4]

In [73]:
l.remove(3)       # Remove the first item whose value is 3 (not the item at index 3).
l

[2, 4]

### Aggregates

In [74]:
l = [1, 2, 3, 4, 5]
min(l)

1

In [75]:
max(l)

5

In [76]:
sum(l)

15

In [77]:
avg = sum(l) / len(l)
avg

3.0

### Containment

In [78]:
l = [1, 2, 3, 4, 5]
3 in l

True

In [79]:
"a" in l

False

### List Comprehensions

In [80]:
l1 = [1, 2, 3]

l2 = []                  # Create a new list l2 by multiplying the elements of l1 by 10.
for item in l1:
    l2.append(item * 10)
l2

[10, 20, 30]

In [81]:
l2 = [item * 10 for item in l1]
l2

[10, 20, 30]

With a list comprehension, you describe how each element of the new list is computed instead of building the list step by step.
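
A comprehension can also filter while it builds the list, by adding an `if` clause at the end. This goes slightly beyond the cells above and is shown only as a sketch; `evens_times_ten` is an illustrative name.

```python
l1 = [1, 2, 3, 4, 5, 6]

# Keep only the even items, then multiply each by 10.
evens_times_ten = [item * 10 for item in l1 if item % 2 == 0]
print(evens_times_ten)    # [20, 40, 60]
```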

In [82]:
l1 = ["1", "2", "3"]
l2 = ["a", "b", "c"]

l3 = []                  # Create a new list l3 by concatenating two elements from l1 & l2.
for item1 in l1:         
    for item2 in l2:
        l3.append(item1 + item2)
l3

['1a', '1b', '1c', '2a', '2b', '2c', '3a', '3b', '3c']

In [83]:
l3 = [item1 + item2 for item1 in l1 for item2 in l2]
l3

['1a', '1b', '1c', '2a', '2b', '2c', '3a', '3b', '3c']

List comprehensions are more concise than the equivalent for loops, and they are often faster as well, although the exact speedup depends on the workload and the Python version.
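
One way to check the timing on your own machine is the standard `timeit` module. The sketch below is a rough measurement, not a benchmark; the function names, the list size, and the repeat count are arbitrary choices, and the numbers will differ from run to run.

```python
import timeit

def with_loop(n):
    """Build the list with an explicit for loop and append()."""
    result = []
    for i in range(n):
        result.append(i * 10)
    return result

def with_comprehension(n):
    """Build the same list with a list comprehension."""
    return [i * 10 for i in range(n)]

n = 10_000
loop_time = timeit.timeit(lambda: with_loop(n), number=100)
comp_time = timeit.timeit(lambda: with_comprehension(n), number=100)

print(f"for loop:      {loop_time:.4f} s")
print(f"comprehension: {comp_time:.4f} s")
```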

## ▪ Dictionaries

Refer to https://docs.python.org/3/library/stdtypes.html#mapping-types-dict

In [84]:
d = {}
print(d, type(d))

{} <class 'dict'>


In [85]:
d = dict()
print(d, type(d))

{} <class 'dict'>


In [86]:
buildings = {"CPHB": "College of Public Health Building", "UCC": "University Capitol Center"}
buildings

{'CPHB': 'College of Public Health Building',
 'UCC': 'University Capitol Center'}

In [87]:
buildings["CPHB"]

'College of Public Health Building'

Note that a value is looked up by putting its key in square brackets, not parentheses. A dictionary is not a function, so it is not called with arguments.

In [88]:
buildings["IMU"]

KeyError: 'IMU'
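
If a missing key should not raise an error, the `get` method returns a default value instead. A minimal sketch; the fallback text "unknown building" is made up for the example.

```python
buildings = {"CPHB": "College of Public Health Building",
             "UCC": "University Capitol Center"}

missing = buildings.get("IMU")                          # None when the key is absent
fallback = buildings.get("IMU", "unknown building")     # or a default of your choice
present = buildings.get("UCC")

print(missing)
print(fallback)
print(present)
```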

In [89]:
buildings.keys()

dict_keys(['CPHB', 'UCC'])

In [90]:
buildings.values()

dict_values(['College of Public Health Building', 'University Capitol Center'])

When designing a dictionary, think about which should be the key and which should be the value. It depends on the purpose of the dictionary.
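
To illustrate the choice of key, the sketch below iterates over key-value pairs with `items()` and then builds the reverse mapping for lookups by building name. The dictionary comprehension and the name `codes_by_name` are additions for this example only.

```python
buildings = {"CPHB": "College of Public Health Building",
             "UCC": "University Capitol Center"}

# Look at each (key, value) pair.
for code, name in buildings.items():
    print(code, "->", name)

# Swap keys and values when lookups by name are needed instead.
codes_by_name = {name: code for code, name in buildings.items()}
print(codes_by_name["University Capitol Center"])       # UCC
```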

In [91]:
buildings["IMU"] = "Iowa Memorial Union"
buildings

{'CPHB': 'College of Public Health Building',
 'IMU': 'Iowa Memorial Union',
 'UCC': 'University Capitol Center'}

In [92]:
"IMU" in buildings

True

In [93]:
"PBB" in buildings

False

In [94]:
len(buildings)

3

## ▪ Sets

Refer to https://docs.python.org/3/library/stdtypes.html#set-types-set-frozenset

In [95]:
s = set()
print(s, type(s))

set() <class 'set'>


In [97]:
s = {"cat", "dog", "bird"}
print(s, type(s))

{'dog', 'bird', 'cat'} <class 'set'>


In [98]:
s.add("fish")
s

{'bird', 'cat', 'dog', 'fish'}

In [99]:
s.add("fish")
s

{'bird', 'cat', 'dog', 'fish'}

Sets do not allow duplicate values.
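
This property makes a set a convenient way to remove duplicates from a list. A short sketch with made-up data; `sorted()` is used only to print the elements in a predictable order, since sets are unordered.

```python
l = ["cat", "dog", "cat", "bird", "dog"]

unique = set(l)                    # duplicates collapse into one element each
print(len(l), len(unique))         # 5 3
print(sorted(unique))              # ['bird', 'cat', 'dog']
```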

In [100]:
s.remove("cat")
s

{'bird', 'dog', 'fish'}

In [101]:
s.update(["elephant", "horse", "whale"])
s

{'bird', 'dog', 'elephant', 'fish', 'horse', 'whale'}

The 'add' method adds a single element to a set, while the 'update' method adds a group of elements.
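
One detail worth knowing is that `update` iterates over its argument, so passing a plain string adds its individual characters. The sketch below contrasts the two methods; the set names and the word "ox" are illustrative.

```python
s1 = {"bird"}
s1.add("ox")              # adds the whole string as one element

s2 = {"bird"}
s2.update("ox")           # iterates over the string: adds "o" and "x"

s3 = {"bird"}
s3.update(["ox"])         # wrap it in a list to add the whole string

print(sorted(s1))         # ['bird', 'ox']
print(sorted(s2))         # ['bird', 'o', 'x']
print(sorted(s3))         # ['bird', 'ox']
```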

In [102]:
"dog" in s

True

In [103]:
"cow" in s

False

In [104]:
for item in s:
    print(item)

fish
bird
horse
dog
elephant
whale


### Set Operations - Union

In [105]:
s1 = {1, 2, 3, 4, 5}
s2 = {1, 3, 5, 7, 9}

In [107]:
s1 | s2                         # vertical bar

{1, 2, 3, 4, 5, 7, 9}

In [108]:
s1.union(s2)

{1, 2, 3, 4, 5, 7, 9}

### Set Operations - Intersection

In [109]:
s1 & s2                         # ampersand

{1, 3, 5}

In [110]:
s1.intersection(s2)

{1, 3, 5}

### Set Operations - Difference

In [111]:
s1 - s2

{2, 4}

In [112]:
s1.difference(s2)

{2, 4}

### Set Operations - Symmetric Difference

In [113]:
s1 ^ s2

{2, 4, 7, 9}

In [114]:
s1.symmetric_difference(s2)

{2, 4, 7, 9}

### Set Operations on Multiple Sets

In [115]:
s1 = set([1, 2, 3, 4, 5])
s2 = set([1, 3, 5, 7, 9])
s3 = set([2, 4, 6, 8, 10])

In [116]:
s1 | s2 | s3

{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}

In [117]:
set.union(s1, s2, s3)

{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}

## ▪ Operators

In [119]:
(1 + 2) * 3

9

Python supports the usual arithmetic operators (+, -, *, /, //, %, **) and follows the standard order of operations; parentheses override it.

### Powers

In [120]:
2 ** 10      # 2 to the power of 10

1024

### Division

In [121]:
5 / 2    # true division

2.5

In [122]:
5 // 2   # floor division

2

In [123]:
5 % 2    # remainder (modulo)

1
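
Floor division and remainder are tied together: for integers, (a // b) * b + a % b equals a. The sketch below shows this with the built-in `divmod()` and also how `//` rounds toward negative infinity for negative operands. The numbers are arbitrary examples.

```python
q, r = divmod(17, 5)      # quotient and remainder in one call
print(q, r)               # 3 2
print(q * 5 + r == 17)    # True

print(-5 // 2)            # -3  (floors toward negative infinity)
print(-5 % 2)             # 1   (so that -3 * 2 + 1 == -5)
print(int(-5 / 2))        # -2  (int() truncates toward zero instead)
```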

### Type Conversion

In [124]:
a = 1.0
print(a, type(a))

1.0 <class 'float'>


In [125]:
b = int(a)
print(b, type(b))

1 <class 'int'>


In [126]:
c = str(b)
print(c, type(c))

1 <class 'str'>


In [127]:
d = float(c)
print(d, type(d))

1.0 <class 'float'>
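
Conversions can lose information or fail, so a couple of edge cases are worth seeing. This is a sketch with arbitrary values; `caught` is a flag introduced only to record that the error occurred.

```python
print(int(3.99))          # 3   (int() truncates, it does not round)
print(int(-3.99))         # -3
print(round(3.99))        # 4
print(float("2.5"))       # 2.5

# A string holding a decimal number cannot be converted to int directly.
caught = False
try:
    int("2.5")
except ValueError as error:
    caught = True
    print("ValueError:", error)

print(int(float("2.5")))  # 2   (convert to float first, then to int)
```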


### Negation

In [128]:
x = 1
-x

-1

### Comparisons

Refer to https://docs.python.org/3/library/stdtypes.html#comparisons

In [129]:
1 == 1

True

Do not confuse the '==' (equality) operator with the '=' (assignment) operator.

In [130]:
1 < 2

True

In [131]:
1 >= 2

False
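
Two more comparison forms described in the documentation linked above are the inequality operator `!=` and chained comparisons. A brief sketch; the value of `x` is arbitrary.

```python
x = 5

in_range = 1 < x < 10     # chained: same as (1 < x) and (x < 10)
different = x != 5        # inequality
same_value = x == 5.0     # int and float compare by numeric value

print(in_range, different, same_value)    # True False True
```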

### Augmented Assignment

In [132]:
x = 2
x += 1    # the same as x = x + 1
x

3

Python has no increment operator such as x++; use x += 1 instead.

In [133]:
x = 2
x -= 1    # the same as x = x - 1
x

1

In [134]:
x = 3
x *= 2    # the same as x = x * 2
x

6

In [135]:
x = 4
x /= 2    # the same as x = x / 2
x

2.0

In [136]:
x = 4
x **= 2    # the same as x = x ** 2
x

16

### Boolean Operations

Refer to https://docs.python.org/3/library/stdtypes.html#boolean-operations-and-or-not

In [137]:
p = True
q = False

In [138]:
p and q

False

In [139]:
p or q

True

In [140]:
not p

False
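
The full truth table for the three operators can be generated with two nested loops. The sketch also shows short-circuit evaluation: the right-hand operand of `and` is not evaluated when the left-hand one is already False. The names `rows` and `safe` are illustrative.

```python
rows = []
for p in (True, False):
    for q in (True, False):
        rows.append((p, q, p and q, p or q, not p))

# Columns: p, q, p and q, p or q, not p
for row in rows:
    print(row)

# Short-circuit: 1 / 0 is never evaluated, so no ZeroDivisionError occurs.
safe = False and (1 / 0 > 0)
print(safe)               # False
```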

## ▪ Modules

In [141]:
math.sqrt(9)

NameError: name 'math' is not defined

In [142]:
import math
math.sqrt(9)

3.0

In [143]:
import numpy as np
import pandas as pd

In [144]:
from sklearn import svm 
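
The three import forms used above differ only in the name that ends up available in your code. The sketch below repeats them with the standard `math` module so that it runs without any third-party packages; the alias `m` is chosen only for the example.

```python
import math               # use as math.sqrt(...)
import math as m          # same module under a shorter alias
from math import sqrt     # import one name directly

print(math.sqrt(9))       # 3.0
print(m.sqrt(9))          # 3.0
print(sqrt(9))            # 3.0
print(m is math)          # True: an alias refers to the same module object
```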