Some Python Basics (IPython Notebook)
=================================

Basics
------

Comments in Python:

In [1]:
# you can mix text and code in one place and
# run code from a Web browser

All you need to know about Python is here:

You don't need to declare the type of a variable:

In [2]:
a = 11

In [3]:
print(a)
print(type(a) is bool)
type(a)

11
False


int
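Because a name is not tied to a fixed type, the same variable can later refer to a value of another type. A minimal sketch (the name `value` is made up for illustration); `isinstance()` is usually the more convenient way to test a type:

```python
# a name can be rebound to objects of different types
value = 11
first_type = type(value).__name__    # name of the type before rebinding

value = "eleven"
second_type = type(value).__name__   # name of the type after rebinding

print(first_type, second_type)

# isinstance() checks whether an object is of a given type
print(isinstance(value, str))
```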

You can assign several variables at once:

In [4]:
a, b = 1, 2
a, b

(1, 2)

In [5]:
b, a = a, b
a, b

(2, 1)

There is no "begin-end"! Indentation defines blocks. Here is a simple if statement:

In [6]:
if a > b:
    print("A is greater than B")
    print("x")
else:
    print("B is greater than A")

A is greater than B
x
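Note that the else branch above also runs when both values are equal. A small sketch with elif handles that case separately (the helper name `compare` is hypothetical and only used here):

```python
def compare(a, b):
    """Describe how a relates to b; each block is defined purely by indentation."""
    if a > b:
        return "A is greater than B"
    elif a < b:
        return "B is greater than A"
    else:
        return "A and B are equal"

print(compare(2, 1))
print(compare(1, 2))
print(compare(1, 1))
```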


Types
-----

In [7]:
# Integer
a = 1
print(a)

# Float
b = 1.0
print(b)

# String
c = "Hello world"
print(c)

# Unicode (in Python 3 every str is Unicode, so the u prefix is optional)
d = u"Привет, мир!"
print(d)

# List (array)
e = [1, 2, 3]
print(e[2]) # 3

# Tuple (immutable sequence)
f = (1, 2, 3)
print(f[0]) # 1

# Set (duplicates are dropped)
g = {1, 1, 1, 2}
print(g)

# Dictionary (hash table, hash map)
h = {1: 'One', 2: 'Two', 3: 'Three'}
print(h[1]) # 'One'

1
1.0
Hello world
Привет, мир!
3
1
{1, 2}
One
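The practical difference between these container types is mostly about mutability. The following sketch uses made-up variable names and shows that a list can be changed in place, a tuple cannot, and a dictionary grows when you assign to a new key:

```python
numbers_list = [1, 2, 3]
numbers_list[0] = 10            # lists are mutable
print(numbers_list)

numbers_tuple = (1, 2, 3)
try:
    numbers_tuple[0] = 10       # tuples are immutable, so this raises TypeError
except TypeError as error:
    tuple_error = str(error)
    print("TypeError:", tuple_error)

names = {1: 'One', 2: 'Two'}
names[3] = 'Three'              # assigning to a new key adds an entry
print(names)
print(names.get(4, 'unknown'))  # .get() returns a default instead of raising KeyError
```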


Loops
-----

### for

In [8]:
for i in range(10):
    print(i)

0
1
2
3
4
5
6
7
8
9
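range() also accepts a start value and a step. A short sketch (the variable names are only for illustration):

```python
from_two = list(range(2, 10))        # start at 2, stop before 10
every_third = list(range(0, 10, 3))  # step of 3
countdown = list(range(10, 0, -2))   # negative step counts down

print(from_two)
print(every_third)
print(countdown)
```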


### while

In [9]:
i = 0
while i < 10:
    print(i)
    i += 1

0
1
2
3
4
5
6
7
8
9


### List and Enumerate

In [10]:
items = ['apple', 'banana', 'strawberry', 'watermelon']
# append an element
items.append('blackberry')
# remove the last element
items.pop()
# insert an element at the 2nd position (index 1)
items.insert(1, 'blackberry')
# length of the list
print('Length of the list:', len(items),'\nElements:')
for item in items:
    print('- ', item)

Length of the list: 5 
Elements:
-  apple
-  blackberry
-  banana
-  strawberry
-  watermelon


In [11]:
for i, item in enumerate(items):
    print(i, item)

0 apple
1 blackberry
2 banana
3 strawberry
4 watermelon
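Two related helpers are worth knowing: enumerate() can start counting at any number, and zip() walks over several lists in parallel. A small sketch; the `prices` list contains made-up numbers:

```python
fruits = ['apple', 'banana', 'watermelon']
prices = [1.2, 0.5, 3.0]   # made-up values, only for illustration

# enumerate(..., start=1) produces human-friendly numbering
numbered = [f"{position}. {fruit}" for position, fruit in enumerate(fruits, start=1)]
print(numbered)

# zip() pairs up the elements of both lists
pairs = list(zip(fruits, prices))
for fruit, price in pairs:
    print(fruit, price)
```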


Python code style
=================

PEP 8 (a Python Enhancement Proposal) is the official style guide for Python code. Let's look at some of its recommendations:

Naming
------

In [12]:
# Variable name
my_variable = 1

# Function and method names
def my_function():
    pass

# Constants
MY_CONSTANT = 1

# Class name
class MyClass(object):
    # 'protected' (internal) variable - use one underscore before a name
    _my_variable = 1

    # 'private' variable - use two underscores before a name (triggers name mangling)
    __my_variable = 1

    # magic methods
    def __init__(self):
        self._another_my_variable = 1
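Neither underscore style really hides anything. A single leading underscore is only a convention for "internal use", while two leading underscores make Python store the attribute under a mangled name. A sketch with a hypothetical class `Account`:

```python
class Account:
    def __init__(self):
        self._balance = 1      # single underscore: internal by convention only
        self.__pin = 1234      # double underscore: stored as _Account__pin

account = Account()
print(account._balance)             # still accessible from outside
print(hasattr(account, '__pin'))    # the unmangled name does not exist
print(account._Account__pin)        # the mangled name does
```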

String Quotes
-------------

PEP 8 quote:
> In Python, single-quoted strings and double-quoted strings are the same. PEP 8 does not make a recommendation for this. Pick a rule and stick to it. When a string contains single or double quote characters, however, use the other one to avoid backslashes in the string. It improves readability.

> For triple-quoted strings, always use double quote characters to be consistent with the docstring convention in PEP 257.

My rule for single-quoted and double-quoted strings is:
1. Use single quotes for keywords;
2. Use double quotes for user text;
3. Use triple double quotes for all multiline strings and docstrings.

In [13]:
'string'

"another string"

"""Multiline
string"""

'''
Another
multiline
string
'''

'\nAnother\nmultiline\nstring\n'
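The advice about avoiding backslashes is easiest to see side by side. In this sketch (the example strings are made up) both spellings produce the same string, but the second one is easier to read:

```python
with_backslash = 'It\'s a kind of magic'
without_backslash = "It's a kind of magic"
print(with_backslash == without_backslash)

# the same idea the other way round: single quotes around text containing double quotes
quoted = 'She said "hello"'
print(quoted)
```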

Some tricks
-------------

Summing all elements of a list is straightforward:

In [14]:
sum([1,2,3,4,5])

15

However, there is no built-in function for multiplication:

In [15]:
mult([1,2,3,4,5])

NameError: name 'mult' is not defined

So we have to write our own solution. Let's start with a straightforward one:

In [16]:
def mult(array):
    result = 1
    for item in array:
        result *= item
    return result

In [17]:
mult([1,2,3,4,5])

120
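The loop can also be replaced by tools from the standard library. This sketch shows two alternatives: functools.reduce with operator.mul, and math.prod, which is available from Python 3.8 on. The variable names are chosen for illustration:

```python
from functools import reduce
import math
import operator

numbers = [1, 2, 3, 4, 5]

# reduce() applies operator.mul cumulatively; 1 is the start value (also used for an empty list)
product_reduce = reduce(operator.mul, numbers, 1)

# math.prod() is the multiplicative counterpart of sum() (Python 3.8+)
product_prod = math.prod(numbers)

print(product_reduce, product_prod)
```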

In [19]:
import numpy as np
import pandas as pd
import time

# load data from a csv file using pandas (pd)
start_time = time.time()
df = pd.read_csv('01_sample_data_movies.csv', encoding='utf8')
df.head()

Unnamed: 0,imdbID,title,year,rating,runtime,genre,released,director,writer,cast,...,imdbRating,imdbVotes,poster,plot,fullplot,language,country,awards,lastupdated,type
0,1,Carmencita,1894,NOT RATED,1 min,"Documentary, Short",,William K.L. Dickson,,Carmencita,...,5.9,1032.0,http://ia.media-imdb.com/images/M/MV5BMjAzNDEw...,Performing on what looks like a small wooden s...,Performing on what looks like a small wooden s...,,USA,,2015-08-26 00:03:45.040000000,movie
1,5,Blacksmith Scene,1893,UNRATED,1 min,Short,1893-05-09,William K.L. Dickson,,"Charles Kayser, John Ott",...,6.2,1189.0,,Three men hammer on an anvil and pass a bottle...,A stationary camera looks at a large anvil wit...,,USA,1 win.,2015-08-26 00:03:50.133000000,movie
2,3,Pauvre Pierrot,1892,,4 min,"Animation, Comedy, Short",1892-10-28,Émile Reynaud,,,...,6.7,566.0,,"One night, Arlequin come to see his lover Colo...","One night, Arlequin come to see his lover Colo...",,France,,2015-08-12 00:06:02.720000000,movie
3,8,Edison Kinetoscopic Record of a Sneeze,1894,,1 min,"Documentary, Short",1894-01-09,William K.L. Dickson,,Fred Ott,...,5.9,988.0,,A man (Thomas Edison's assistant) takes a pinc...,A man (Edison's assistant) takes a pinch of sn...,,USA,,2015-08-10 00:21:07.127000000,movie
4,10,Employees Leaving the Lumière Factory,1895,,1 min,"Documentary, Short",1895-03-22,Louis Lumière,,,...,6.9,3469.0,,A man opens the big gates to the Lumière facto...,A man opens the big gates to the Lumière facto...,,France,,2015-08-26 00:03:56.603000000,movie


In [20]:
# some information on the DataFrame (columns, non-null counts, dtypes, memory usage)
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1123 entries, 0 to 1122
Data columns (total 21 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   imdbID       1123 non-null   int64  
 1   title        1123 non-null   object 
 2   year         1123 non-null   int64  
 3   rating       628 non-null    object 
 4   runtime      1112 non-null   object 
 5   genre        1118 non-null   object 
 6   released     1080 non-null   object 
 7   director     1121 non-null   object 
 8   writer       1011 non-null   object 
 9   cast         1089 non-null   object 
 10  metacritic   4 non-null      float64
 11  imdbRating   1120 non-null   float64
 12  imdbVotes    1120 non-null   float64
 13  poster       808 non-null    object 
 14  plot         1054 non-null   object 
 15  fullplot     1038 non-null   object 
 16  language     925 non-null    object 
 17  country      1122 non-null   object 
 18  awards       208 non-null    object 
 19  lastup

In [21]:
# descriptive statistics of the numeric columns, including the five-number summary (min, quartiles, max)
df.describe()

Unnamed: 0,imdbID,year,metacritic,imdbRating,imdbVotes
count,1123.0,1123.0,4.0,1120.0,1120.0
mean,17168.562778,1925.779163,91.5,6.910893,2126.257143
std,7060.945774,8.014872,4.50925,0.733135,7452.461114
min,1.0,1892.0,88.0,4.0,6.0
25%,13137.0,1922.0,88.75,6.5,271.75
50%,19646.0,1929.0,90.0,7.0,594.5
75%,22789.5,1932.0,92.75,7.4,1273.75
max,24989.0,1935.0,98.0,8.6,99845.0
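To see what the min, 25%, 50%, 75% and max rows mean, here is a sketch that computes them with the standard library only, using the five imdbRating values visible in the df.head() output above. The 'inclusive' method interpolates linearly between data points, which should correspond to what describe() does by default, but treat that equivalence as an assumption:

```python
import statistics

# imdbRating values of the five rows shown by df.head() above
ratings = sorted([5.9, 6.2, 6.7, 5.9, 6.9])

# n=4 splits the data into quartiles and returns the three cut points
q1, median, q3 = statistics.quantiles(ratings, n=4, method='inclusive')

summary = {'min': ratings[0], '25%': q1, '50%': median, '75%': q3, 'max': ratings[-1]}
print(summary)
```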


Very important references
=====================

* PEP 8 - Style Guide for Python Code: https://www.python.org/dev/peps/pep-0008/
* Python 3 Documentation: https://docs.python.org/3/

##### Exercise: Take the following list and write a program that prints out all the elements of the list that are smaller than 5.

In [22]:
a = [1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]

for item in a:
    if item < 5:
        print(item)

1
1
2
3
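The same filter can be written as a list comprehension, which builds a new list instead of printing each element. A minimal sketch (the name `smaller_than_five` is made up):

```python
a = [1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]

# keep only the elements that satisfy the condition
smaller_than_five = [item for item in a if item < 5]
print(smaller_than_five)
```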


##### Exercise: Take the following two lists and write a program that returns a list that contains only the elements that are common between the lists (without duplicates). Make sure your program works on two lists of different sizes. Moreover, try to find a 1-line-solution (using sets).

In [26]:
import random
a = random.sample(range(1, 100), 10)
b = random.sample(range(1, 100), 12)

c = set(a).intersection(set(b))
print(a, b, c)

[52, 23, 81, 2, 74, 54, 21, 76, 45, 6] [7, 28, 92, 66, 96, 72, 61, 32, 2, 21, 38, 45] {2, 21, 45}
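Since the exercise asks for a list, the set can be converted back in the same line. This sketch reuses the two lists from the output above instead of random ones, so the result is reproducible; `&` is the operator form of intersection():

```python
a = [52, 23, 81, 2, 74, 54, 21, 76, 45, 6]
b = [7, 28, 92, 66, 96, 72, 61, 32, 2, 21, 38, 45]

# set intersection removes duplicates; sorted() turns the result into an ordered list
common = sorted(set(a) & set(b))
print(common)
```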


##### Exercise: Write one line of Python that takes the following list a and makes a new list that has only the even elements of this list in it.

In [29]:
a = [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
b = [x for x in a if x % 2 == 0]
print(b)

[4, 16, 36, 64, 100]
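A slice such as a[1::2] selects every second position, while a comprehension with x % 2 == 0 selects even values. For this particular list both give the same result, but that is a coincidence of how the list is ordered. A sketch with a made-up second list `shuffled` shows the difference:

```python
a = [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
by_position = a[1::2]                       # elements at indices 1, 3, 5, ...
by_value = [x for x in a if x % 2 == 0]     # elements that are even numbers
print(by_position == by_value)

shuffled = [4, 1, 16, 9]                    # made-up list where the two approaches differ
shuffled_by_position = shuffled[1::2]
shuffled_by_value = [x for x in shuffled if x % 2 == 0]
print(shuffled_by_position, shuffled_by_value)
```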


##### Exercise: Implement a function that takes as input three variables, and returns the largest of the three. Do this without using the Python max() function!

In [48]:
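def max_function(x, y, z):
    # compare the values pairwise instead of calling max()
    largest = x
    if y > largest:
        largest = y
    if z > largest:
        largest = z
    return largest

a = random.sample(range(1, 100), 3)
print(a)
largest = max_function(*a)
print(largest)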

[45, 70, 72]
72


##### Exercise: Write a Python program to concatenate the following dictionaries to create a new one.

In [51]:
dic1={1:10, 2:20}
dic2={3:30, 4:40}
dic3={5:50, 6:60}

dic4 = {**dic1, **dic2, **dic3}
print(dic4)

{1: 10, 2: 20, 3: 30, 4: 40, 5: 50, 6: 60}
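There are other ways to merge dictionaries. This sketch shows the `|` operator (Python 3.9 or newer) and a loop with update(); in all variants, a later dictionary wins when a key appears more than once. The names `merged_union`, `merged_update` and `overwritten` are made up:

```python
dic1 = {1: 10, 2: 20}
dic2 = {3: 30, 4: 40}
dic3 = {5: 50, 6: 60}

# union operator, requires Python 3.9+
merged_union = dic1 | dic2 | dic3

# update() works on older versions and modifies the target in place
merged_update = {}
for d in (dic1, dic2, dic3):
    merged_update.update(d)

print(merged_union == merged_update)

# on duplicate keys the value from the later dictionary is kept
overwritten = {**{1: 'old'}, **{1: 'new'}}
print(overwritten)
```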


##### Exercise: Given an integer n, write a program to generate a dictionary that contains (i, i*i) for every integer i between 1 and n (both included). Remark: User input can be captured using the command input().

In [57]:
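print("Enter a number:")
n = int(input())
# one entry per integer from 1 to n (both included);
# for example, entering 7 gives the output below
dic = {i: i*i for i in range(1, n + 1)}
print(dic)

Enter a number:
{1: 1, 2: 4, 3: 9, 4: 16, 5: 25, 6: 36, 7: 49}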


##### Exercise: Generate a random number between 1 and 25 (including 1 and 25). Ask the user to guess the number, then tell them whether they guessed too low, too high, or exactly right. Remark: Import and use the random library.

In [63]:
# generate a random number
import random
number = random.randint(1,25)

success = False
x = 0
while not success:
    print("Guess:")
    x = int(input())
    if x > number:
        print("Too high!")
    elif x < number:
        print("Too low!")
    else:
        print("Exactly right")
        success = True

Guess:
Too low!
Guess:
Too low!
Guess:
Too low!
Guess:
Too low!
Guess:
Too low!
Guess:


ValueError: invalid literal for int() with base 10: ''
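The ValueError above appears because int() cannot convert an empty string. One possible way to make the input more robust is to catch the exception and ask again. The helper `read_guess` below is hypothetical; it takes the reading function as a parameter so the sketch can be run without typing anything:

```python
def read_guess(read=input):
    """Keep asking until the entered text can be converted to an integer."""
    while True:
        text = read()
        try:
            return int(text)
        except ValueError:
            print("Please enter a whole number.")

# simulate a user who first presses Enter, then types 'abc', then '12'
answers = iter(['', 'abc', '12'])
guess = read_guess(lambda: next(answers))
print(guess)
```

In the guessing game, `x = int(input())` could then be replaced by `x = read_guess()`.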