## Basics of Python

Python is a high-level, dynamically typed multiparadigm programming language. Python code is often said to be almost like pseudocode, since it allows you to express very powerful ideas in very few lines of code while being very readable. 

# Getting Started!

#### Variables 

In [1]:
s = "Python for data science"
s
type(s)   # the notebook displays the value of the last expression in a cell, so print() is not needed

str

In [None]:
print(type(s))   # print() shows the same type in its string form

<class 'str'>


In [2]:
s = 10  # a variable can be reassigned to a value of another type
type(s)

int

In [4]:
x = 2
y = 5
xy = 'Hey'

In [5]:
print(x, y, xy)  # printing multiple variables

2 5 Hey


#### Arithmetic Operators

| Symbol | Task |
|---|---|
| `+` | addition |
| `-` | subtraction |
| `*` | multiplication |
| `/` | division |
| `%` | modulo |
| `//` | integer (floor) division |
| `**` | power |

In [None]:
print(5/2)

2.5


In [None]:
print(5//2)

2


In [6]:
print(5**2)

25
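
The table also lists `%`, which the cells above do not exercise. The following is a small sketch of how `//` and `%` relate to each other, including the case of a negative operand. The values of `a` and `b` are chosen only for illustration.

```python
# Illustrative values only; a and b do not come from the cells above.
a, b = 17, 5

quotient = a // b    # floor division
remainder = a % b    # modulo

print(quotient, remainder)            # 3 2
print(quotient * b + remainder == a)  # True: the two results recombine to a

# With a negative operand, // rounds down (toward negative infinity)
# and % takes the sign of the divisor.
print(-17 // 5)  # -4
print(-17 % 5)   # 3

# divmod returns both results at once as a tuple
print(divmod(a, b))  # (3, 2)
```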


#### Some of the built-in functions

In [8]:
print(int("10"))
print(int("10110", 2))  # second parameter is the base
print(int(10.1))

10
22
10


In [None]:
print(float("25.6"))
print(float(11))

25.6
11.0


In [9]:
print(chr(65))  # converts a code point (here ASCII 65) to its character
print(ord("A"))  # the inverse: converts a character to its code point

A
65


In [10]:
print(round(5.6231))  # to the nearest integer
print(round(5.6231, 3))  # specified precision

6
5.623


In [11]:
print(pow(2,3))  # base, power
print(pow(3,2,2))  # base, power, modulo

8
1


In [12]:
input()  # waits till input is provided, then returns the input as a string

6


'6'

In [13]:
num = input("please enter an integer: ")
print(type(num))  # always a str

num = int(num)
print("after casting")
print(type(num))

please enter an integer: 16
<class 'str'>
after casting
<class 'int'>
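
Casting with `int()` raises a `ValueError` when the text is not a valid integer. One possible way to guard against that is sketched below; `parse_int` is a hypothetical helper name, and it takes the text as a parameter so the example runs without waiting for keyboard input.

```python
def parse_int(text):
    """Return text converted to int, or None if it is not a valid integer."""
    try:
        return int(text)
    except ValueError:
        return None

print(parse_int("16"))    # 16
print(parse_int(" 42 "))  # 42, surrounding whitespace is accepted by int()
print(parse_int("16.5"))  # None, int() does not parse decimal strings
print(parse_int("abc"))   # None
```

In a notebook, the helper could be combined with the cell above as `num = parse_int(input("please enter an integer: "))`.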


#### Precision


In [14]:
val = 3.121312312312

In [16]:
print("%.4f" % val)

3.1213


In [None]:
print("{:.4f}".format(val))

3.1213


In [None]:
print("{:.3f}, {:f}, {:.1f}".format(val, 10, 23.2425))

3.121, 10.000000, 23.2


In [19]:
print("{1:.3f}, {0:f}, {2:.1f}".format(val, 10, 23.2425))

10.000, 3.121312, 23.2
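
The same format specifiers can also be used in f-strings (available from Python 3.6). This is a short sketch that mirrors the cells above; the width and alignment examples at the end are additions for illustration.

```python
val = 3.121312312312

# the part after the colon is the same specifier used with format()
print(f"{val:.4f}")                         # 3.1213
print(f"{val:.3f}, {10:f}, {23.2425:.1f}")  # 3.121, 10.000000, 23.2

# a width can be combined with the precision; numbers are right-aligned by default
print(f"[{val:10.2f}]")   # [      3.12]
print(f"[{val:<10.2f}]")  # [3.12      ]
```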


#### Data Structures

**Lists**

In [21]:
lst = [1,2,3,4]  # a list with four elements

In [22]:
print(type(lst))

<class 'list'>


In [None]:
lst_str = ["cs", "210"]
lst_mixed = ["cs", 210]  # can store different data types

**List indexing**

In [27]:
lst = [1, 2, 3, 4, 5]  # valid indices run from 0 to length - 1

print(len(lst))  # length of the list
print(lst[0])  # first element
print(lst[len(lst)-1])  # last element; lst[len(lst)] would raise an IndexError

5
1
5


In [24]:
len(lst)

5

In [None]:
lst[-1]

5

Python also supports negative indices, which count from the end of the list: `lst[-1]` is the last element.

![alt](https://qph.fs.quoracdn.net/main-qimg-a380b1bc159589df5e0b9842e5b56b6d)
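
As a small sketch of the idea, a negative index `-k` refers to the same element as index `len(lst) - k`. The variable `k` is introduced here only for illustration.

```python
lst = [1, 2, 3, 4, 5]

print(lst[-1])          # 5, the last element
print(lst[-2])          # 4, the second to last element
print(lst[-len(lst)])   # 1, the same element as lst[0]

# lst[-k] and lst[len(lst) - k] are the same element
k = 2
print(lst[-k] == lst[len(lst) - k])  # True

# negative indices also work in slices
print(lst[-3:])   # [3, 4, 5], the last three elements
print(lst[:-1])   # [1, 2, 3, 4], everything except the last element
```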

**Nested Lists**

In [28]:
lst = [ [1, 2, 3], [4, 5, 6] ]
print(lst[0][2])  # lst[first row][third column]

3


**List Slicing**

In [30]:
lst = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

print(lst[1:5])  # lst[start_index:end_index], end_index is not inclusive
print(lst[1:8:2])  # lst[start_index:end_index:stride]

print(lst[::-1])  # reversing a list

[2, 3, 4, 5]
[2, 4, 6, 8]
[10, 9, 8, 7, 6, 5, 4, 3, 2, 1]


**List Methods**

In [31]:
lst = []

lst.append(1)  # appends a new element to the end of the list
print(lst)

lst.append([1,4])
print(lst)

lst.append('two')
print(lst)

[1]
[1, [1, 4]]
[1, [1, 4], 'two']


In [32]:
lst = [1, 2, 3, 4, 5]

lst.remove(4)  # removes the first occurrence of the given value
print(lst)

lst.pop()  # removes the last element
print(lst)

lst.pop(0)  # removes the element at the given index
print(lst)

[1, 2, 3, 5]
[1, 2, 3]
[2, 3]


In [33]:
lst = [1, 2, 3, 4, 5]

print(min(lst))  # min of list
print(max(lst))  # max of list
print(sum(lst))  # sum of list

print(lst.index(4))  # returns the index of the given element

1
5
15
3


In [34]:
lst = [1, 2, 3, 4, 5]

print(8 in lst)  # the in operator checks whether an element is in the list
print(10 not in lst)

False
True


In [35]:
lst = [4, 5, 1, 2, 3]

lst.sort()
print(lst)

lst.sort(reverse=True)
print(lst)

[1, 2, 3, 4, 5]
[5, 4, 3, 2, 1]


In [None]:
# must be careful when assigning lists
a = [1, 2, 3]

b = a  # at this point, a and b refer to the same list object in memory

b.append(10)  # the effects of this operation will also be visible on a

print(a)

c = [1, 2, 3]

b = c.copy()  # copy() returns a new list with the same values, so b and c are independent

b.append(10)

print(c)

[1, 2, 3, 10]
[1, 2, 3]
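
The difference between sharing and copying can be checked with the `is` operator. The sketch below also shows that `copy()` is shallow, so nested lists stay shared unless `copy.deepcopy` from the standard library is used. The names `nested`, `shallow` and `deep` are illustrative.

```python
import copy

a = [1, 2, 3]
b = a
print(b is a)   # True: both names refer to one list object

c = a.copy()
print(c is a)   # False: c is a separate list
print(c == a)   # True: it holds equal values

# copy() is shallow: the inner lists are still shared
nested = [[1, 2], [3, 4]]
shallow = nested.copy()
deep = copy.deepcopy(nested)

nested[0].append(99)
print(shallow[0])  # [1, 2, 99], the inner list is shared with nested
print(deep[0])     # [1, 2], deepcopy duplicated the inner lists as well
```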


**Tuples**

In [36]:
# very similar to list, but elements in a tuple can't be modified (immutable)
tpl = tuple([1, "a", 23.56])

tpl[0] = 10  # gives an error

TypeError: 'tuple' object does not support item assignment

In [None]:
tpl = tuple([1, "a", 23.56])

print(tpl[2])  # again, we have indices
print(tpl[0:2])  # slicing

# most of the time, tuples are used to group computation results

23.56
(1, 'a')
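
To illustrate the remark about grouping computation results, here is a minimal sketch of a function that returns two values as one tuple. The function name `min_max` is hypothetical.

```python
def min_max(values):
    """Return the smallest and largest value as a (min, max) tuple."""
    return min(values), max(values)  # the comma creates a tuple

result = min_max([4, 9, 1, 7])
print(result)        # (1, 9)
print(type(result))  # <class 'tuple'>

# the tuple can be assigned to separate names in one step
lowest, highest = min_max([4, 9, 1, 7])
print(lowest, highest)  # 1 9
```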


**Sets**

In [None]:
# can only contain unique values
set1 = set()
print(type(set1))

<class 'set'>


In [None]:
set1 = set([1, 1, 2, 3, 5, 5])
print(set1)

{1, 2, 3, 5}


In [37]:
set1 = set([1, 1, 2, 3, 5, 5])
set1.add(9)

print(set1)

{1, 2, 3, 5, 9}


In [None]:
# useful for getting unique values from a list
lst = [1, 1, 2, 3, 5, 5]

print(set(lst))

{1, 2, 3, 5}


In [None]:
set1 = set([1,2,3])
set2 = set([2,3,4,5])

print(set1.union(set2))
print(set1.intersection(set2))
print(set2.difference(set1))

set1.clear()
print(set1)

{1, 2, 3, 4, 5}
{2, 3}
{4, 5}
set()
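
The set methods above also have operator forms. The following sketch repeats the same operations with operators and a set literal; the symmetric difference and the membership test are additions for illustration.

```python
set1 = {1, 2, 3}       # set literal, equivalent to set([1, 2, 3])
set2 = {2, 3, 4, 5}    # note: {} alone creates an empty dict, not a set

print(set1 | set2)   # {1, 2, 3, 4, 5}, same as set1.union(set2)
print(set1 & set2)   # {2, 3}, same as set1.intersection(set2)
print(set2 - set1)   # {4, 5}, same as set2.difference(set1)
print(set1 ^ set2)   # {1, 4, 5}, elements that are in exactly one of the sets

print(3 in set1)     # True, membership tests work as with lists
```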


**Dictionaries**

In [None]:
# each entry consists of a key, value pair
# to reach a particular value in the dictionary
# we look it up by its associated key
courses = {}  # empty dictionary

print(len(courses))

courses["cs210"] = 200
courses["cs201"] = 300

print(len(courses))

print(courses["cs210"])

print(courses.keys())
print(courses.values())
print(courses.items())  # returns keys and values as tuples

0
2
200
dict_keys(['cs210', 'cs201'])
dict_values([200, 300])
dict_items([('cs210', 200), ('cs201', 300)])
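
A few more everyday dictionary operations are sketched below, reusing the `courses` example. The key `"cs999"` is a made-up key that is deliberately missing.

```python
courses = {"cs210": 200, "cs201": 300}

# courses["cs999"] would raise a KeyError; get() returns a default instead
print(courses.get("cs999"))     # None
print(courses.get("cs999", 0))  # 0

print("cs210" in courses)       # True, membership tests look at the keys

courses["cs210"] += 5           # update an existing value

# items() pairs can be unpacked directly in a for loop
for name, count in courses.items():
    print(name, count)
# cs210 205
# cs201 300

del courses["cs201"]            # remove an entry
print(courses)                  # {'cs210': 205}
```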


**String Methods**

In [38]:
word = "CS210 - Introduction to Data Science"

# indexing and slicing
print(word[0])
print(word[2:6])


# word[2] = "a"  # yields an error, strings are immutable

print(word.find("a"))  # returns the first index of occurrence
print(word.split(" "))  # returns a list in which separated substrings exist
print(word.upper())
print(word.replace("a", "e"))

word = "  \t \n CS210 - Introduction to Data Science \n\n"

print(word.strip())  # removes leading and trailing whitespace
print(word.strip("\n"))  # removes only leading and trailing newline characters
print(word.strip(" \t\n"))  # removes leading and trailing spaces, tabs and newlines

C
210 
25
['CS210', '-', 'Introduction', 'to', 'Data', 'Science']
CS210 - INTRODUCTION TO DATA SCIENCE
CS210 - Introduction to Dete Science
CS210 - Introduction to Data Science
  	 
 CS210 - Introduction to Data Science 
CS210 - Introduction to Data Science
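
The counterpart of `split` is `join`, which glues a list of strings back together. The sketch below assumes the same `word` as above; `messy` is an illustrative string.

```python
word = "CS210 - Introduction to Data Science"

parts = word.split(" ")
print("_".join(parts))          # CS210_-_Introduction_to_Data_Science
print(" ".join(parts) == word)  # True: split and join undo each other here

# split() with no argument splits on any run of whitespace
messy = "  CS210 \t Data   Science \n"
print(messy.split())            # ['CS210', 'Data', 'Science']
```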


### Control Flow Statements

**If statements**

In [39]:
if True:
    print("ok")

ok


In [None]:
if not 10 > 3:
    print("ok")
else:
    print("not ok")

not ok


In [None]:
if 0:  # 0 is treated as False and nonzero numbers as True, as in most languages
    print("0")
elif 10:
    print("10")
else:
    print("...")

10


**Loops**

In [None]:
n = 5
for i in range(n):  # generates a sequence from 0 to n-1
    print(i)

0
1
2
3
4


In [None]:
n = 10
for i in range(2, n, 3):  # start value, end value, stride
    print(i)

2
5
8


In [None]:
lst = [1, 2, 3]
for i in lst:  # list traversing
    print(i)

1
2
3


In [None]:
lst = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
for i in lst:
    print(i)

[1, 2, 3]
[4, 5, 6]
[7, 8, 9]


In [40]:
i = 0
while i < 3:
    print(i)
    i += 1

0
1
2


**List Comprehension**

In [41]:
lst = [i for i in range(5)]

print(lst)

[0, 1, 2, 3, 4]


In [42]:
lst = [i for i in range(10) if i % 2 == 0]

print(lst)

[0, 2, 4, 6, 8]


### File Operations

Please add the shared files to your drive, so that the code below can be executed.

In [None]:
# first, you need to permit colab 
# to access your drive folder
from google.colab import drive
drive.mount("./drive", force_remount=True)

Mounted at ./drive


In [None]:
# this is the directory where the shared files are located
path = "/content/"
filename = "courses.txt"

file_obj = open(path + filename, "r")  # r -> reading, w -> writing, a -> appending

contents = file_obj.read()
print(contents)

file_obj.close()

cs210 200
cs201 300
cs300 80
cs301 60
cs404 50


In [None]:
with open('/content/courses.txt') as f:
  print(f.readlines())

['cs210 200\n', 'cs201 300\n', 'cs300 80\n', 'cs301 60\n', 'cs404 50']


In [None]:
# instead of explicitly closing files, we can use with statement
path = "/content/"
filename = "courses.txt"

with open(path + filename, "r") as f:
    contents = f.read()
    print(contents)
    
# when the scope of with ends, the file is closed

cs210 200
cs201 300
cs300 80
cs301 60
cs404 50


In [None]:
path = "/content/"
filename = "courses.txt"
course_dict = {}  # at the end, it will store courses as keys and num of students as their values

with open(path + filename, "r") as f:
    lines = f.readlines()  # returns a list of lines from the file
    for line in lines:
        line = line.strip("\n")  # remove the new line character
        info = line.split()
        
        course_name = info[0]
        num_students = info[1]
        
        course_dict[course_name] = num_students
        
print(course_dict)
sorted(course_dict.items())

{'cs210': '200', 'cs201': '300', 'cs300': '80', 'cs301': '60', 'cs404': '50'}


[('cs201', '300'),
 ('cs210', '200'),
 ('cs300', '80'),
 ('cs301', '60'),
 ('cs404', '50')]
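
Note that the values above are strings, because `split` returns text. A possible variant that stores the counts as integers is sketched below. To keep it runnable outside Colab, it writes a shortened stand-in for `courses.txt` to a temporary file; `sample`, `course_counts` and `by_size` are illustrative names.

```python
import os
import tempfile

# Write a small stand-in for courses.txt so the sketch runs anywhere.
sample = "cs210 200\ncs201 300\ncs300 80\n"
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(sample)
    tmp_path = tmp.name

course_counts = {}
with open(tmp_path, "r") as f:
    for line in f:                    # a file object yields its lines one by one
        name, count = line.split()    # split() also discards the trailing newline
        course_counts[name] = int(count)

os.remove(tmp_path)  # clean up the temporary file

print(course_counts)  # {'cs210': 200, 'cs201': 300, 'cs300': 80}

# with integer values, sorting by enrollment is numeric rather than alphabetical
by_size = sorted(course_counts.items(), key=lambda pair: pair[1], reverse=True)
print(by_size)  # [('cs201', 300), ('cs210', 200), ('cs300', 80)]
```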

In [None]:
import json  # functions from the library are then called with the json. prefix
#from json import *  # no prefix needed, but imported names may overwrite existing ones
#from json import load  # imports only the load function

from pprint import pprint

path = "/content/"
filename = "quiz.json"

with open(path + filename, "r") as f:
    # the file is open at this point, but its contents have not been read yet
    data = json.load(f)

pprint(data)

{'quiz': {'maths': {'q1': {'answer': '12',
                           'options': ['10', '11', '12', '13'],
                           'question': '5 + 7 = ?'},
                    'q2': {'answer': '4',
                           'options': ['1', '2', '3', '4'],
                           'question': '12 - 8 = ?'}},
          'sport': {'q1': {'answer': 'Huston Rocket',
                           'options': ['New York Bulls',
                                       'Los Angeles Kings',
                                       'Golden State Warriros',
                                       'Huston Rocket'],
                           'question': 'Which one is correct team name in '
                                       'NBA?'}}}}
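
The parsed result is made of ordinary dictionaries and lists, so it can be navigated by chaining keys and indices. The sketch below uses `json.loads` on a shortened copy of the data above, given as a string (`raw`), so that it does not depend on `quiz.json` being available.

```python
import json

# A shortened copy of the structure printed above, as a JSON string.
raw = """
{"quiz": {"maths": {"q1": {"question": "5 + 7 = ?",
                           "options": ["10", "11", "12", "13"],
                           "answer": "12"}}}}
"""

data = json.loads(raw)  # loads() parses a string; load() reads from a file object

q1 = data["quiz"]["maths"]["q1"]      # chain keys to walk down the nesting
print(q1["question"])                 # 5 + 7 = ?
print(q1["options"][2])               # 12
print(q1["answer"] in q1["options"])  # True

# dumps() converts a Python object back to a JSON string
print(json.dumps(q1["options"]))      # ["10", "11", "12", "13"]
```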


### Functions

Functions help break our program into smaller and modular chunks.  
As our program grows larger and larger, functions make it more organized and manageable.

You may declare functions with the following syntax.

``` py
def func_name(parameters):
    body
```

In [43]:
def print_hello():
    """
    just a silly function without any parameter
    """
    print("hello")

print_hello  # evaluates to the function object itself; nothing is called

print_hello()  # calling the function

hello


In [44]:
def even_or_odd_P(x):
    """
    Prints whether x is even or odd 
    """
    if (x % 2 == 0): 
        print("even")
    else: 
        print("odd")

even_or_odd_P(7)

odd


In [None]:
# If we assign the result to a variable,
# it will hold None since the function does not return a value
result = even_or_odd_P(17)
# will print None
print(result)

odd
None


In [45]:
def even_or_odd_R(x):
    """
    Returns whether x is even or odd 
    """
    if (x % 2 == 0): 
        return "even"
    else: 
        return "odd"

# now, we can see the assigned value
result = even_or_odd_R(10)
print(result)

even


# Some Notes on Python

Now, we are going to go through a couple of concepts that will be quite useful throughout the semester, especially when dealing with data. The concepts are listed below.

- Zip Function for Parallel Iteration
- List Comprehension
- Map-Filter-Reduce
- Lambda Expressions

## Zip Function

In some cases, there is a need to iterate over a set of paired objects. One may use a shared index variable for this task.

In [None]:
fruits = ["banana", "apple", "orange"]
prices = [60, 70, 80]

for i in range(len(fruits)):
  print(fruits[i], prices[i])

banana 60
apple 70
orange 80


Instead of managing a shared index, you can use the `zip` function. It takes the iterable objects to be paired as its arguments and returns a zip object.

In [None]:
fruits = ["banana", "apple", "orange"]
prices = [60, 70, 80]

zip(fruits, prices)

<zip at 0x7fa152c84a80>

You may cast this zip object into other data types or iterate over it in a for loop.

In [None]:
list(zip(fruits, prices))

[('banana', 60), ('apple', 70), ('orange', 80)]

In [None]:
for i in zip(fruits, prices):
  print(i)

('banana', 60)
('apple', 70)
('orange', 80)


### Unpacking a tuple object

In [None]:
tuple_obj = ("item", 10)
item, price = tuple_obj

print(item, price)

item 10


As you can see, each iteration of the for loop yields a tuple object. So, with the help of tuple unpacking, we could have written the loop as:

In [None]:
for fruit, price in zip(fruits, prices):
  print(fruit, price)

banana 60
apple 70
orange 80


You may zip as many iterable objects as you like.

In [None]:
dates = ["12/05/2020", "08/11/2019", "01/01/1999"]

for i in zip(fruits, prices, dates):
  print(i)

('banana', 60, '12/05/2020')
('apple', 70, '08/11/2019')
('orange', 80, '01/01/1999')
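
Two details are worth a quick sketch: `zip` stops at the shortest input, and its result can be cast to a dictionary or unzipped again with the `*` operator. The names `fruit_price`, `pairs`, `names` and `costs` are illustrative.

```python
fruits = ["banana", "apple", "orange"]
prices = [60, 70]  # deliberately one element short

# zip stops as soon as the shortest iterable is exhausted
print(list(zip(fruits, prices)))  # [('banana', 60), ('apple', 70)]

prices = [60, 70, 80]
fruit_price = dict(zip(fruits, prices))  # pairs become key-value entries
print(fruit_price)  # {'banana': 60, 'apple': 70, 'orange': 80}

# zip(*pairs) reverses the pairing
pairs = list(zip(fruits, prices))
names, costs = zip(*pairs)
print(names)  # ('banana', 'apple', 'orange')
print(costs)  # (60, 70, 80)
```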


## List Comprehension

Python is renowned as a very readable programming language, and most of its statements are akin to plain English. List comprehension is a distinctive feature of Python that lets you condense a loop into a single line.

Most programs use loops

- To initialize an object (list, dict, etc.)
- As a control mechanism
- And so on

In [None]:
foo = []
for i in range(10):  # values between 0-9
  foo.append(i)

for i in foo:
  if i % 2 == 0:  # prints even numbers
    print(i)

0
2
4
6
8


With the help of list comprehension, the statements above can be written more concisely.

In [None]:
foo = [i for i in range(10)]

evens = [i for i in foo if i % 2 == 0]
evens

[0, 2, 4, 6, 8]

A list comprehension consists of three parts: an `expression`, a loop variable (`member`) and an `iterable`.

``` py
new_list = [expression for member in iterable]
```

The statement above is equivalent to:

```py
new_list = []
for member in iterable:
  new_list.append(expression)
```

Here `expression` can be any Python expression, from a function call to simple arithmetic, and it usually involves `member`.

Additionally, you may include a condition as well. For instance, in the code cell above where only the even numbers were extracted, we added an `if` clause at the end of the comprehension.

``` py
evens = [i for i in foo if i % 2 == 0]
```

This is again equivalent to:

``` py
evens = []
for i in foo:
  if i % 2 == 0:
    evens.append(i)
```

Selection can be quite easy with the help of list comprehension.

Here, we have a list of dictionaries, each describing an item and its attributes. If we are only interested in the `price` attribute, we can extract it with a list comprehension.

In [None]:
list_of_dicts = [
                 {"date": "12.07.2020", "price": 5.2, "product_name": "apple"},
                 {"date": "13.07.2020", "price": 4.7, "product_name": "kiwi"},
                 {"date": "14.07.2020", "price": 3.1, "product_name": "grape"},
                 {"date": "14.07.2020", "price": 8.2, "product_name": "orange"}
]

price = [item["price"] for item in list_of_dicts]
price

[5.2, 4.7, 3.1, 8.2]

You may apply nested loops as well.

In [None]:
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

flattened = [i for row in matrix for i in row]
print(flattened)

# equivalent to
flattened = []
for row in matrix:
  for i in row:
    flattened.append(i)
print(flattened)

[1, 2, 3, 4, 5, 6, 7, 8, 9]
[1, 2, 3, 4, 5, 6, 7, 8, 9]


And finally, you may create other collection types as well. All you have to do is change the enclosing brackets and the expression to match the target type; for a dictionary, use curly braces and a `key: value` expression.

In [None]:
import random

fruits = ["banana", "apple", "orange"]
# creating fruit-price pairs
fruit_price = {fruit: random.randint(50, 100) for fruit in fruits}
fruit_price

{'banana': 60, 'apple': 100, 'orange': 70}
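
Besides dictionaries, the same pattern gives sets and generator expressions. The sketch below uses a made-up `words` list; note that parentheses do not create a tuple comprehension but a lazily evaluated generator.

```python
words = ["banana", "apple", "orange", "avocado"]

# curly braces without key: value give a set comprehension
first_letters = {w[0] for w in words}
print(sorted(first_letters))  # ['a', 'b', 'o']

# curly braces with key: value give a dict comprehension
lengths = {w: len(w) for w in words}
print(lengths)  # {'banana': 6, 'apple': 5, 'orange': 6, 'avocado': 7}

# parentheses give a generator expression, which is evaluated lazily
total = sum(len(w) for w in words)
print(total)  # 24

# to get a tuple, pass a generator expression to tuple()
print(tuple(len(w) for w in words))  # (6, 5, 6, 7)
```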

## Map-Filter-Reduce

Let's say you have a function that needs to be applied to each entry in an iterable object. This function may transform the entries, select a portion of them based on a condition, or accumulate them into a single value.

`map`, `filter` and `reduce` help us implement such mechanisms without explicit for loops.

### Map

`map` applies a function to each element of an iterable object and returns a map object that can be cast into various types.

``` py
def function_to_be_applied(element):
  # does something with the element
  # and returns a value
  return ...

map(function_to_be_applied, iterable_object)
```

In [46]:
def add_two(num):
  return num + 2

nums = [1, 2, 3]
result = []
for i in nums:
  result.append(add_two(i))
print(result)

# equivalent to
list(map(add_two, nums))

[3, 4, 5]


[3, 4, 5]
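
A map object is evaluated lazily and can be consumed only once, which is why it is usually cast right away. The sketch below shows this, along with `map` taking more than one iterable; the helper `add` is an illustrative name.

```python
def add_two(num):
    return num + 2

nums = [1, 2, 3]
mapped = map(add_two, nums)  # nothing is computed yet

print(list(mapped))  # [3, 4, 5]
print(list(mapped))  # [], the map object is exhausted after one pass

# a map object can be cast into other types as well
print(tuple(map(add_two, nums)))  # (3, 4, 5)

# with two iterables, the function receives one element from each
def add(x, y):
    return x + y

print(list(map(add, [1, 2, 3], [10, 20, 30])))  # [11, 22, 33]
```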

### Filter

`filter` conditionally selects entries of an iterable object: it keeps the entries for which the given function returns `True`. It returns a filter object that can be cast into various types.

``` py
def function_to_be_applied(element):
  # does something with the element
  # and, at the end, must return a boolean value
  return ...  # True keeps the element, False drops it

filter(function_to_be_applied, iterable_object)
```

In [None]:
def smaller_than_10(n):
  return n < 10

list(filter(smaller_than_10, [4, 9, 10, 15, 21]))

[4, 9]

### Reduce

And finally, `reduce` takes a function and an iterable object as its inputs and produces a single value as follows:

1. First, the function is executed with the first two entries of the iterable object and the result is returned.

2. The function is then executed again with the result obtained in step 1 and the next entry of the iterable object. This is repeated, each time with the latest result, until there are no more entries left.

``` py
from functools import reduce

def function_to_be_applied(element1, element2):
  # does something with the elements
  # and, at the end, must return a value to be used in the next iteration
  return ...

reduce(function_to_be_applied, iterable_object)
```

In [47]:
from functools import reduce

def summation(x, y):
  print(x, y)
  return x + y

result = reduce(summation, [1, 2, 3, 4])
print("result", result)

1 2
3 3
6 4
result 10


Moreover, you can also start the execution with a user-defined initial value.

In [None]:
def summation(x, y):
  print(x, y)
  return x + y

result = reduce(summation, [1, 2, 3, 4], 10)  # summation will begin with 10
print("result", result)

10 1
11 2
13 3
16 4
result 20
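
To make the two steps concrete, here is a hand-written sketch of the same mechanism. `my_reduce` and `_MISSING` are hypothetical names used for illustration; this is not how `functools.reduce` is implemented internally, only a loop that behaves the same way for these inputs.

```python
from functools import reduce

_MISSING = object()  # sentinel meaning "no initial value was given"

def my_reduce(function, iterable, initial=_MISSING):
    """Illustrative re-implementation of reduce using an explicit loop."""
    iterator = iter(iterable)
    if initial is _MISSING:
        try:
            accumulated = next(iterator)  # start from the first entry
        except StopIteration:
            raise TypeError("empty iterable with no initial value") from None
    else:
        accumulated = initial             # start from the given value
    for element in iterator:
        # feed the latest result and the next entry back into the function
        accumulated = function(accumulated, element)
    return accumulated

def summation(x, y):
    return x + y

print(my_reduce(summation, [1, 2, 3, 4]))      # 10
print(my_reduce(summation, [1, 2, 3, 4], 10))  # 20
print(my_reduce(summation, [1, 2, 3, 4]) == reduce(summation, [1, 2, 3, 4]))  # True
```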


In [None]:
# a list of dicts, each holding an invitation and its response status
# we are going to create another dict that stores the count of each status
guestlist = [
             {"guest_id": "a520", "status": "pending"},
             {"guest_id": "b125", "status": "attending"},
             {"guest_id": "c886", "status": "declined"},
             {"guest_id": "d336", "status": "attending"},
]

# our goal is to create a dict object with statuses and associated counts
# {'attending': 2, 'declined': 1, 'pending': 1}

def counts(status_counts, invitation):
  if invitation["status"] in status_counts:
    status_counts[invitation["status"]] += 1
  else:
    status_counts[invitation["status"]] = 1
  return status_counts

reduce(counts, guestlist, {})

{'pending': 1, 'attending': 2, 'declined': 1}
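
For counting in particular, the standard library offers `collections.Counter` as an alternative to a hand-written reducer. This is a different technique from `reduce`, shown only for comparison on the same guest list.

```python
from collections import Counter

guestlist = [
    {"guest_id": "a520", "status": "pending"},
    {"guest_id": "b125", "status": "attending"},
    {"guest_id": "c886", "status": "declined"},
    {"guest_id": "d336", "status": "attending"},
]

# Counter tallies how often each value occurs
status_counts = Counter(guest["status"] for guest in guestlist)

print(dict(status_counts))         # {'pending': 1, 'attending': 2, 'declined': 1}
print(status_counts["attending"])  # 2
print(status_counts["unknown"])    # 0, missing keys count as zero
```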

Keep in mind that you can use the output of a function as the input of another.

In [None]:
# find the guests older than 20 and print their names capitalized
guestlist = [
             {"guest_id": "a520", "name": "john", "status": "pending", "age": 19},
             {"guest_id": "b125", "name": "karen", "status": "attending", "age": 25},
             {"guest_id": "c886", "name": "tom", "status": "declined", "age": 30},
             {"guest_id": "d336", "name": "jen", "status": "attending", "age": 40}
]

def age_filter(guest): 
  return guest["age"] > 20

def capitalize(guest):
  return guest["name"].capitalize()

list(map(capitalize, filter(age_filter, guestlist)))

['Karen', 'Tom', 'Jen']

## Lambda Expressions

In the examples above, we needed to create very small functions for very simple tasks. To create a function, you need to include the required keywords such as `def` and `return`, along with a function name.

In such cases, writing a full function definition takes more effort than the task warrants. Instead, you can use an `anonymous function` to get the job done.

- An anonymous function is not associated with a name.
- It can have any number of arguments but only a single expression, whose value is returned directly.
- Its definition starts with the `lambda` keyword.

``` py
lambda argument(s): expression
```

In [48]:
add_2 = lambda x: x + 2

add_2(9)

11
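
A common place for a lambda is the `key` argument of `sorted`, where a one-off function decides what to sort by. The sketch below uses a made-up `guests` list of `(name, age)` tuples.

```python
guests = [("john", 19), ("karen", 25), ("tom", 30), ("jen", 40)]

# sort by the second field (age), oldest first
by_age_desc = sorted(guests, key=lambda guest: guest[1], reverse=True)
print(by_age_desc)  # [('jen', 40), ('tom', 30), ('karen', 25), ('john', 19)]

# sort by the first field (name)
by_name = sorted(guests, key=lambda guest: guest[0])
print(by_name)  # [('jen', 40), ('john', 19), ('karen', 25), ('tom', 30)]

# a lambda may take more than one argument
multiply = lambda x, y: x * y
print(multiply(3, 4))  # 12
```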

Lambda expressions can be used in the previous examples to shorten the code.

In [None]:
guestlist = [
             {"guest_id": "a520", "name": "john", "status": "pending", "age": 20},
             {"guest_id": "b125", "name": "karen", "status": "attending", "age": 25},
             {"guest_id": "c886", "name": "tom", "status": "declined", "age": 30},
             {"guest_id": "d336", "name": "jen", "status": "attending", "age": 40}
]

list(map(lambda guest: guest["name"].capitalize(), filter(lambda guest: guest["age"] > 20, guestlist)))

['Karen', 'Tom', 'Jen']

## Choice between List Comprehension and Map-Filter-Reduce

It's a design choice. Almost all of the `map` or `filter` statements can be rewritten with list comprehension.

In [None]:
nums = list(range(10))

print(list(map(lambda x: x**2, filter(lambda x: x % 2 == 0, nums))))

print([x**2 for x in nums if x % 2 == 0])

[0, 4, 16, 36, 64]
[0, 4, 16, 36, 64]


- In most cases, list comprehension is more concise and readable.
- Depending on the expression, there are differences in terms of the execution speed.
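
One way to check the speed difference for a particular expression is the standard `timeit` module. The sketch below is only a rough measurement under assumed settings (a list of 1000 numbers, 1000 repetitions); the timings depend on the machine and the Python version, so no expected numbers are given.

```python
import timeit

nums = list(range(1000))

def with_map_filter():
    return list(map(lambda x: x**2, filter(lambda x: x % 2 == 0, nums)))

def with_comprehension():
    return [x**2 for x in nums if x % 2 == 0]

# both variants must produce the same result before timing them
assert with_map_filter() == with_comprehension()

# timeit calls each function `number` times and returns the total seconds
t_map = timeit.timeit(with_map_filter, number=1000)
t_comp = timeit.timeit(with_comprehension, number=1000)

print(f"map/filter:    {t_map:.4f} s")
print(f"comprehension: {t_comp:.4f} s")
```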

Lambda expressions evaluate a single expression without binding it to a name, which can hurt readability. Do not hesitate to write a regular named function with appropriate comments when the logic calls for it.