# Python introduction

## Python

Python is a widely used, interpreted, object-oriented, high-level programming language with dynamic semantics, used for general-purpose programming. It was created by Guido van Rossum and first released on February 20, 1991. It is named after the comedy sketch series Monty Python's Flying Circus.

* **open source** - the source code is available to everyone, and potentially anyone can contribute to its development. The source code for Python and its modules can be found on GitHub
* **interpreted** - code does not need to be compiled into machine code before it runs
* **high-level** - easy to write and read because it is closer to human language. Thanks to the interpreter, the same code can run on any machine that has a Python interpreter installed
* **general purpose** - can be used to build software for various purposes
* **object-oriented** - focuses on "objects" that can be manipulated by calling their methods or changing their attributes. For example, an object can be a list of numbers; to add a new number to the list, the ```append()``` method can be used
* **

#### Syntax

* Python has a simple syntax similar to the English language
* Python keywords are case-sensitive
* Python uses new lines to complete a command, unlike other languages, which often use semicolons or parentheses.
* Python relies on indentation to define the scope of if statements, loops, functions and classes, while other languages often use curly braces {}
    
    **Java**
    ```java
    if (condition) {
    // block of code to be executed if the condition is true
    }
    ```
    **Python**
    ```python
    if condition:
        # indent the block (PEP8 recommends 4 spaces per level)
        # block of code to be executed if the condition is true
    ``` 
* **
    
#### Popular libraries

Python has a modest set of built-in functionality, but it is powerful thanks to a wide range of libraries (modules).


* **NumPy** - mathematical functions, ranges and lists, n-dimensional arrays and matrices
* **Matplotlib** - static, animated and interactive visualizations and graphs
* **Pandas** - data analysis and manipulation
* **Scikit-learn** - machine learning
* **SciPy** - functions and data manipulation for science and engineering
* **Django** - web development
* **Beautiful Soup** - web scraping, pulling data out of HTML and XML files
* **TensorFlow** - large-scale neural networks
* **

#### PEP8 - style guide for Python code

Provides guidelines and best practices on how to write Python code. https://www.python.org/dev/peps/pep-0008/

* **indentation** - use 4 spaces per indentation level instead of tabs.
* **maximum line length** - limit all lines to a maximum of 79 characters
* **line break before binary operator** - when an expression, for example a sum of many variables, does not fit on a single line, break the line before the operator so that the operator (+) starts the new line
* **blank lines** - surround top-level function and class definitions with two blank lines. Method definitions inside a class are surrounded by a single blank line.
* **imports** - imports of different libraries should always be on separate lines
* **naming conventions** - commonly seen naming styles are ```b (single lowercase letter), B (single uppercase letter), lowercase, lower_case_with_underscores, UPPERCASE, UPPER_CASE_WITH_UNDERSCORES, CamelCase, mixedCase```. Class names should normally use the CapWords (also called CamelCase) convention.
* **avoid variable names** - never use the characters 'l' (lowercase letter el), 'O' (uppercase letter oh), or 'I' (uppercase letter eye) as single character variable names.
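
Several of the guidelines above can be seen together in a short sketch. The names `VAT_RATE`, `PriceCalculator` and `gross_price` are made up for illustration and do not come from PEP8 itself.

```python
# Constant: UPPER_CASE_WITH_UNDERSCORES
VAT_RATE = 0.2


class PriceCalculator:
    """Class name in CapWords; two blank lines around the definition."""

    def gross_price(self, net_price, shipping_fee, discount):
        # 4 spaces per indentation level; the long expression is
        # broken before each binary operator.
        return (net_price
                + net_price * VAT_RATE
                + shipping_fee
                - discount)


calculator = PriceCalculator()
total = calculator.gross_price(100.0, 5.0, 10.0)
print(total)
```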
* **

#### Assign variables

* a value can be assigned to a variable using ```=```, as in ```x = 3```
* variables do not need to be declared with any particular type, and can even change type after they have been set
* the data type of a value can be converted with casting, as in ```x = str(3)```, which converts the integer 3 to the string '3'
* we can get the data type of a variable with ```type(x)```
* variable names are case-sensitive, i.e. you can have variables *Var1* and *var1* at the same time and assign different values to them
* to output a variable, use the print function, as in ```print(x)```
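
The points above can be tried out in a few lines. This is only a minimal sketch, and the variable names are illustrative.

```python
x = 3                 # assignment: x now refers to an integer
print(type(x))

x = 'three'           # the same name can later refer to a different type
print(type(x))

y = str(3)            # casting: the integer 3 becomes the string '3'
z = int('42')         # casting in the other direction

Var1 = 'upper'        # names are case-sensitive,
var1 = 'lower'        # so these are two separate variables
print(y, z, Var1, var1)
```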


In [1]:
%%html
<!-- make tables left align -->
<style>
table {float:left}
</style>

In [None]:
#Import modules we will use later
import pandas as pd
import numpy as np
from numpy.random import randn

## 1. Data types

* **Numeric** - represent numbers: Integer (whole number), Float (decimal number)
* **Strings** - text. Strings are written within single or double quotes
* **Booleans** - True or False
* **Lists** - store a collection of data in one variable. List items are indexed and can be retrieved or deleted individually
* **Dictionaries** - a special collection which holds data in key/value pairs
* **Tuples** - very similar to lists, but immutable, which means we cannot change their items. However, tuples can contain mutable data types such as lists
* **Sets** - unordered collections of data without duplicates

### Numeric

In [None]:
myFloat = 1.0
type(myFloat)

In [None]:
myInteger = 1
type(myInteger)

### Strings

* a string represents text
* can be written between either single or double quotation marks
* a multiline string can be written with three single quotes ''' or three double quotes """
* a string is basically a sequence of characters. Each character can be accessed by its index

In [None]:
singleQuote = 'Hello world!'
doubleQuote = "Python is cool"
combinedQuotes = "Let's do some coding"

multiLines = """Hello world!
Python is cool
Let's do some coding
"""

In [None]:
print(type(singleQuote))
print(type(doubleQuote))
print(type(combinedQuotes))
print(type(multiLines))

| Text           | H   | e   | l   | l  | o  |    | w  | o  | r  | l  | d  | !  |
|----------------|-----|-----|-----|----|----|----|----|----|----|----|----|----|
| Index          | 0   | 1   | 2   | 3  | 4  | 5  | 6  | 7  | 8  | 9  | 10 | 11 |
| Negative index | -12 | -11 | -10 | -9 | -8 | -7 | -6 | -5 | -4 | -3 | -2 | -1 |

In [None]:
#Get the last character
singleQuote[-1]

In [None]:
#Slicing - last index is not included
singleQuote[0:7]

In [None]:
#Slice from the start - last index is not included
singleQuote[:5]

In [None]:
#Slice to the end
singleQuote[6:]

In [None]:
#Slice with negative index - last index is not included
singleQuote[-6:-1]

### Booleans

* can be True or False
* the result of a comparison or a logical expression is a boolean

In [None]:
type(True)

In [None]:
type(False)

In [None]:
print(10 > 9)
print(10 == 9)
print(10 < 9)

### Lists

* [  ] brackets
* indexed - first item has index [0], last item has index [-1]
* ordered - items are kept in a specific order
* mutable - can change, add or remove items
* allow duplicates

In [None]:
listNumbers = [1001,1002,1003]
listStrings = ['HM', '1K', 'Ex']
listBooleans = [True, True, False]
listCombined = [1001, 'HM', True]
listNested = [[1001,1002,1003],['HM', '1K', 'Ex'],[True, True, False]]

In [None]:
type(listNumbers), type(listStrings), type(listBooleans), type(listCombined), type(listNested)

In [None]:
#Access list item
listNested[1][0]

In [None]:
#Access the last item with index -1
listStrings[-1]

In [None]:
#Append new list item - it will automatically add new item at the end
print('Before: ', listNumbers)
listNumbers.append(4001)
print('After: ', listNumbers)

In [None]:
#Change list item
print('Before: ', listStrings)
listStrings[1] = 'HM'
print('After: ', listStrings)

In [None]:
#Remove (pop) the item at a specific index; insert() adds an item at a specific index
print('Before: ', listNumbers)
listNumbers.pop(-1)
print('After: ', listNumbers)

### Dictionaries

* { } curly braces
* used to store a collection of values in **key:value** pairs
* mutable - can change, add or remove items
* duplicate keys are not allowed

In [None]:
countryCodes = {'Czech':'CZ', 'Slovakia':'SL', 'Hungary':'HU'}
type(countryCodes)

In [None]:
#We can print all keys and all values separately
print(countryCodes.keys())
print(countryCodes.values())

In [None]:
#Add new dictionary item - assigning to a new key adds the key/value pair
print('Before: ', countryCodes)
countryCodes['Poland'] = 'PL'
print('After: ', countryCodes)

In [None]:
#Change dictionary item
print('Before: ', countryCodes)
countryCodes['Slovakia'] = 'SK'
print('After: ', countryCodes)

In [None]:
#Remove dictionary item
print('Before: ', countryCodes)
countryCodes.pop('Poland')
print('After: ', countryCodes)
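
Values in a dictionary are looked up by key. A minimal sketch follows, reusing the country codes from the cells above; `czechCode` and `polandCode` are illustrative names.

```python
countryCodes = {'Czech': 'CZ', 'Slovakia': 'SK', 'Hungary': 'HU'}

# Look up a value by its key
czechCode = countryCodes['Czech']

# get() returns a default instead of raising KeyError for a missing key
polandCode = countryCodes.get('Poland', 'unknown')

# Loop over the key/value pairs
for country, code in countryCodes.items():
    print(country, '->', code)
```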

### Sets

* { } curly braces
* used to store a collection of values
* unordered - items cannot be accessed by index, only tested for by value
* mutable - items can be added or removed, but an individual item cannot be changed in place
* no duplicates are allowed

In [None]:
mySet = {1001, 1001, 1001, 1002, 1002, 1003, 'HM'}
type(mySet)

In [None]:
mySet

In [None]:
mySet.add(1004)
mySet.remove("HM") #remove() raises a KeyError if the item does not exist; discard() does NOT raise an error
mySet

In [None]:
mySet[1] #Sets are unordered and do not support indexing, so this raises a TypeError
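
The difference between `remove()` and `discard()`, and testing for an item by value, can be sketched as follows. The set `storeFormats` and the flag names are made up for this example.

```python
storeFormats = {'HM', '1K', 'Ex'}

# Membership test by value
hasHM = 'HM' in storeFormats

# discard() silently ignores a missing item
storeFormats.discard('SM')

# remove() raises KeyError for a missing item
try:
    storeFormats.remove('SM')
    removeRaised = False
except KeyError:
    removeRaised = True

print(hasHM, removeRaised, len(storeFormats))
```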

### Tuples

* ( ) parentheses
* used to store collection of values
* ordered - item can be accessed by index, first[0], last[-1]
* immutable - cannot change individual item
* allow duplicates
* tuple unpacking - when assigning tuple elements into separate variables

In [None]:
myTuple = ("Ultra fresh", "Food grocery", "Non food grocery", "Ultra fresh")

In [None]:
myTuple[2]

In [None]:
#Packing tuples
theTuple = (10,20,30)

#Unpacking tuples
x, y, z = theTuple

print(x)
print(y)
print(z)
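
Immutability applies to the tuple itself, not to mutable objects stored inside it. A small sketch, with `storeTuple` as an illustrative name:

```python
storeTuple = (1001, ['HM', '1K'])

# Reassigning an item of a tuple is not allowed
try:
    storeTuple[0] = 1002
    assignmentFailed = False
except TypeError:
    assignmentFailed = True

# The list stored inside the tuple is still mutable
storeTuple[1].append('Ex')

print(assignmentFailed)
print(storeTuple)
```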

## 2. Operators

* Arithmetic operators
* Assignment operators
* Comparison operators
* Logical operators
* Identity operators
* Membership operators


### Arithmetic operators

Arithmetic operators are used with numeric values to perform common mathematical operations

| Operator |      Name      | Example |
|:--------:|:--------------:|:-------:|
|     +    | Addition       |  x + y  |
|     -    | Subtraction    |  x - y  |
|     *    | Multiplication |  x * y  |
|     /    | Division       |  x / y  |
|     %    | Modulus        |  x % y  |
|    **    | Exponentiation |  x ** y |
|    //    | Floor division |  x // y |

In [None]:
10 + 20

In [None]:
#returns the remainder of a division 
18 % 4 

In [None]:
2 ** 8

In [None]:
# 9/2=4.5 rounded down to 4
9 // 2
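
The difference between `/`, `//` and `%` is easiest to see side by side. The built-in `divmod()` is added here as a convenience; it is not used elsewhere in this notebook.

```python
print(9 / 2)     # true division always returns a float: 4.5
print(9 // 2)    # floor division rounds down: 4
print(-9 // 2)   # rounding goes towards negative infinity: -5
print(9 % 2)     # remainder of the division: 1

# divmod() returns the floor quotient and the remainder together
quotient, remainder = divmod(18, 4)
print(quotient, remainder)   # 4 2
```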

### Assignment operators

Assignment operators are used to assign values to variables. Combining assignment with an arithmetic operator can make your code shorter. These operators are also useful in loops, for example when we need to increase the value of a counter variable by 1.

| Operator | Example |   Same As  |
|:--------:|:-------:|:----------:|
|     =    |  x = 5  |    x = 5   |
|    +=    |  x += 3 |  x = x + 3 |
|    -=    |  x -= 3 |  x = x - 3 |
|    *=    |  x *= 3 |  x = x * 3 |
|    /=    |  x /= 3 |  x = x / 3 |
|    %=    |  x %= 3 |  x = x % 3 |
|    //=   | x //= 3 | x = x // 3 |
|    **=   | x **= 3 | x = x ** 3 |

In [None]:
x = 5
print(x)

In [None]:
x *= 3
print(x)

### Comparison Operators

Comparison operators are used to compare two values or variables.

| Operator |           Name           | Example |
|:--------:|:------------------------:|:-------:|
|    ==    |           Equal          |  x == y |
|    !=    |         Not equal        |  x != y |
|     >    |       Greater than       |  x > y  |
|     <    |         Less than        |  x < y  |
|    >=    | Greater than or equal to |  x >= y |
|    <=    |   Less than or equal to  |  x <= y |

In [None]:
x = 10
y = 20

In [None]:
x == y

In [None]:
x != y

### Logical operators

Logical operators are used to combine conditional statements.

| Operator |                       Description                       |        Example        |
|:--------:|:-------------------------------------------------------:|:---------------------:|
|   and    |         Returns True if both statements are true        |   x < 5 and  x < 10   |
|    or    |      Returns True if one of the statements is true      |     x < 5 or x < 4    |
|    not   | Reverse the result, returns False if the result is True | not(x < 5 and x < 10) |


In [None]:
x = 10
y = 20

In [None]:
x < y and y % 2 == 0 # x is smaller than y AND y is even

In [None]:
x == 10 or y == 10 # either x or y is 10

In [None]:
not(True)

In [None]:
not(x < y and y % 2 == 0) # reverses the result: True becomes False

### Identity operators

Identity operators are used to compare objects: not whether they are equal, but whether they are actually the same object stored at the same memory location.

| Operator |                       Description                      |   Example  |
|:--------:|:------------------------------------------------------:|:----------:|
|    is    |   Returns True if both variables are the same object   |   x is y   |
|  is not  | Returns True if both variables are not the same object | x is not y |

In [None]:
x = [10, 20, 30]
y = [10, 20, 30]
z = x

In [None]:
x == y

In [None]:
x is y

In [None]:
z is x #z was assigned from x, so both names refer to the same object
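
One practical consequence of two names referring to the same object is that a change made through one name is visible through the other. A minimal sketch, where `c` is an extra variable introduced for illustration:

```python
x = [10, 20, 30]
z = x            # z is another name for the same list object
c = x.copy()     # c is a new list with equal contents

z.append(40)     # changing the list through z ...

print(x)         # ... is visible through x: [10, 20, 30, 40]
print(c)         # the copy is unaffected: [10, 20, 30]
print(z is x, c is x)
```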

### Membership operators

Membership operators are used to test whether a value is present in a sequence or another container.


| Operator |                                     Description                                    |   Example  |
|:--------:|:----------------------------------------------------------------------------------:|:----------:|
|    in    |            Returns True if the specified value is present in the object            |   x in y   |
|  not in  |          Returns True if the specified value is not present in the object          | x not in y |

In [None]:
x = ["Budaörs", "Zlaté piesky", "Skalka", "Kapelanka" ]

In [None]:
"Skalka" in x

In [None]:
"Skalka" not in x

## 3. Conditionals, loops and range

* If, elif, else
* For and while loops
* Range

### If ...elif... else

* the ```if``` keyword tells the program to run this part when a certain condition is true
* the ```elif``` keyword tells the program "if the previous conditions were not true, then try this condition"
* the ```else``` keyword tells the program "if none of the previous conditions were true, then run this"

In [None]:
if 1 < 2:
    print('2 is bigger than 1')

In [None]:
if 1 < 2:
    print('first')
else:
    print('last')

In [None]:
if 1 > 2:
    print('first')
else:
    print('last')

In [None]:
if 1 == 2:
    print('first')
elif 3 == 3:
    print('middle')
else:
    print('Last')

In [None]:
x = 10
y = 20
z = 30

if x < y:
    if y < z:
        print("x is smaller than y, and y is smaller than z")     
    else:
        print("x is smaller than y")

### For loop

* used to iterate over a sequence (list, tuple, dictionary, set or string)
* we can use the *break* statement if we want to finish the loop early
* we can use the *continue* statement if we want to skip the rest of the current iteration and move on to the next item

In [None]:
myStores = ['HM Nitra', 'HM Kosice', 'HM Petrzalka', 'SM Vrable', 'SM Krupina', 'SM Svidnik']

In [None]:
for store in myStores:
    print(store)

In [None]:
for store in myStores:
    if store[0:2] == 'HM':
        print(store)
    else:
        break

In [None]:
for store in myStores:
    if store[0:2] == 'HM':
        continue
    else:
        print(store)

In [None]:
for character in "Hello":
    print(character)

### While loop

* the loop keeps executing the code inside it while a certain condition is true
* for this purpose we use a loop control variable, which is usually a counter of iterations
* at the end of each iteration we increment or decrement the counter based on our needs



In [None]:
i = 1 #loop control variable

while i < 6:
    print(i**i)
    i += 1 #add 1 to loop control variable

### Range

* the range() function is used to create a sequence of numbers
* it takes at least one argument, *stop*; ```range(5)``` produces the five numbers 0 to 4
* we can define *start, stop, step size*; the stop value itself is not included
* range is very useful with a for loop when we want to tell the program the number of iterations

In [None]:
print(range(5))
print(range(0,5))
print(range(0,5,1))

In [None]:
for x in range(0,5,1):
    print(x)

In [None]:
list(range(5))
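
The start, stop and step arguments can be combined in different ways. A short sketch follows; the list names are illustrative.

```python
evenNumbers = list(range(2, 11, 2))    # start=2, stop=11 (not included), step=2
countdown = list(range(5, 0, -1))      # a negative step counts down

print(evenNumbers)   # [2, 4, 6, 8, 10]
print(countdown)     # [5, 4, 3, 2, 1]

# range(len(...)) produces the valid indexes of a list
myStores = ['HM Nitra', 'HM Kosice', 'SM Vrable']
for i in range(len(myStores)):
    print(i, myStores[i])
```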

## 4. Classes, functions and lambda

### Functions

* a function is a block of code that can be called anywhere in your program
* in Python we use the **def** keyword to define a function
* you can pass data, known as arguments, into a function through its parameters
* a function can also **return** a value

In [None]:
#Function without argument and return
def myFunc():
    print("Printed from a function")

In [None]:
myFunc()

In [None]:
def myPower(number,power=2):
    """Document your function thoroughly!
    
    This is a very complicated function which raises your number to the power you want.
    Takes two arguments: number and power.
    
    The power argument is 2 by default.
    """
    return number ** power

In [None]:
print(myPower.__doc__)

In [None]:
defaultResult = myPower(10)
result = myPower(2,8)
print(defaultResult)
print(result)
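
Arguments can be passed by position or by name. The sketch below repeats a shortened version of `myPower` so that it runs on its own; `positional` and `keyword` are illustrative variable names.

```python
def myPower(number, power=2):
    """Return number raised to power (power defaults to 2)."""
    return number ** power


# Positional arguments are matched by position
positional = myPower(2, 8)

# Keyword arguments are matched by name, so their order is free
keyword = myPower(power=8, number=2)

print(positional, keyword)   # 256 256
```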

### Lambda function

* often called an anonymous function
* can take any number of arguments, but can only have one expression
* we can assign a lambda to a variable and call it later
* used alongside other functions like ```filter()``` and ```map()```, and with the Pandas ```apply()``` method

In [None]:
myFunc = lambda x,y,z: x + y + z

In [None]:
myFunc

In [None]:
myFunc(2,3,5)
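
Lambdas are most useful when passed directly to another function. Besides `filter()` and `map()`, the built-in `sorted()` accepts a function through its `key` parameter. A small sketch, with store names borrowed from this notebook:

```python
myStores = ['HM Nitra', 'SM Vrable', 'HM Kosice', 'SM Krupina']

# Sort by the town name, i.e. the text after the 3-character format prefix
sortedStores = sorted(myStores, key=lambda store: store[3:])
print(sortedStores)
```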

### Class

* Python is object-oriented, hence most of the time we are manipulating some sort of object, often one that represents a real-life object
* classes are the 'blueprint' for creating objects
* classes can have methods (actions) and attributes (data)
* in practice a class usually has an ```__init__()``` method, where we define the object's properties and any operations to run when an object is created from the class (i.e. when the class is called)
* when creating a function (method) inside a class we use the parameter ```self```, which refers to the current object (instance) of the class

In [None]:
class StackArray:
    """This class represent stack object. Stacks are data structures
    which store data in LIFO (last in, first out) order
    """
    

    def __init__(self):
        self._data = []
        print("initialised")
        
    def __len__(self):
        return len(self._data)
    
    def isempty(self):
        return len(self._data) == 0
    
    def push(self, e):
        self._data.append(e)
        
    def pop(self):
        if self.isempty():
            print('Your stack is empty')
            return
        return self._data.pop()
    
    def top(self):
        if self.isempty():
            print('Stack is empty')
            return
        return self._data[-1]
    
    def values(self):
        return self._data

In [None]:
myClass = StackArray()

In [None]:
myClass.values()

In [None]:
myClass.push(100)
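
To show how `__init__()` and `self` give every object its own data, here is a minimal sketch with a hypothetical `Store` class. The class, its attributes and its methods are invented for this example and are not used elsewhere in the notebook.

```python
class Store:
    """Hypothetical example class holding the sales of one store."""

    def __init__(self, storeID, name):
        # Attributes (data) are set on the instance through self
        self.storeID = storeID
        self.name = name
        self.sales = []

    def addSale(self, amount):
        # Methods (actions) also receive the instance as self
        self.sales.append(amount)

    def totalSales(self):
        return sum(self.sales)


nitra = Store(1001, 'Nitra')
kosice = Store(1002, 'Kosice')

nitra.addSale(10)
nitra.addSale(15)

print(nitra.name, nitra.totalSales())     # Nitra 25
print(kosice.name, kosice.totalSales())   # Kosice 0
```

Each object keeps its own `sales` list, so adding sales to `nitra` leaves `kosice` unchanged.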

## 5. Map and filter

### Map

Applies a function to all items in an iterable, such as a list

In [None]:
myStores = ['HM Nitra', 'HM Kosice', 'HM Petrzalka', 'SM Vrable', 'SM Krupina', 'SM Svidnik']

In [None]:
storeNames = list(map(lambda store: store[3:],myStores))
print(storeNames)

### Filter

Filters the items of an iterable, such as a list, keeping those for which the function returns True

In [None]:
myStores = ['HM Nitra', 'HM Kosice', 'HM Petrzalka', 'SM Vrable', 'SM Krupina', 'SM Svidnik']

In [None]:
myFunc = lambda store: store[0:2] == 'SM'

In [None]:
supermarkets = list(filter(myFunc, myStores))
print(supermarkets)
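
The same results can also be written with list comprehensions, which many Python programmers find easier to read. The sketch below assumes the same `myStores` list as above.

```python
myStores = ['HM Nitra', 'HM Kosice', 'HM Petrzalka', 'SM Vrable', 'SM Krupina', 'SM Svidnik']

# Equivalent of map(): transform every item
storeNames = [store[3:] for store in myStores]

# Equivalent of filter(): keep only the items that pass a test
supermarkets = [store for store in myStores if store[0:2] == 'SM']

print(storeNames)
print(supermarkets)
```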

## 6. Pandas

* Python library for data analysis
* The name "Pandas" refers to both "Panel Data" and "Python Data Analysis"; the library was created by Wes McKinney in 2008
* uses fast and efficient *Series* and *DataFrame* objects for data manipulation
* very good alternative to MS Excel and VBA
* before using Pandas features we have to import the module ```import pandas as pd``` by convention


### Series

* we can consider a Pandas Series as a one-dimensional array or a data column in a table
* capable of holding any data type such as integer, float, string or even object
* the index can be defined by the user; labels do not have to be unique, but unique labels keep lookups unambiguous

In [None]:
storeID = [1001,1002,1003]
storeName = ['Nitra', 'Kosice', 'Banska Bystrica']

In [None]:
ser = pd.Series(storeName)
type(ser)

In [None]:
ser

In [None]:
ser[1]

In [None]:
#Series with custom index
pd.Series(storeName,index=storeID)
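
With a custom index, items can be selected either by label or by position. A minimal sketch, where `byLabel` and `byPosition` are illustrative names:

```python
import pandas as pd

storeID = [1001, 1002, 1003]
storeName = ['Nitra', 'Kosice', 'Banska Bystrica']
ser = pd.Series(storeName, index=storeID)

byLabel = ser.loc[1002]      # select by index label
byPosition = ser.iloc[0]     # select by integer position

print(byLabel, byPosition)
```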

### Data frame

* two-dimensional data with rows and columns
* we can consider each column as a Pandas Series
* you can think of it as an SQL table or a spreadsheet data representation
* can be created using various inputs like *lists, dictionary, Series* or even from *another data frame*


In [None]:
storeID = [1001,1002,1003]
storeName = ['Nitra', 'Kosice', 'Banska Bystrica']
storeFormat = ['HM', 'HM', 'HM']
df = pd.DataFrame(data=list(zip(storeID, storeName, storeFormat)),columns=['store_id', 'store_name', 'store_format'])
df

#### Selecting and indexing

In [None]:
df = pd.DataFrame(randn(5,4),index='A B C D E'.split(),columns='W X Y Z'.split())
df

In [None]:
#Data type of a single column
type(df['W'])

In [None]:
#Selecting individual column
df['W']

In [None]:
#Alternatively you can select using loc[row label, column label]
df.loc[:,'W']

In [None]:
df.loc['A',:]

In [None]:
#Or using iloc[row index, column index]
df.iloc[:,0]

In [None]:
#Select columns by passing a list
df[['W','X']]

In [None]:
#Select columns with iloc
df.iloc[:,0:2]

In [None]:
#Create a new column
df['new'] =  df['W'] * 10
df

In [None]:
#Remove a column
df.drop('new', axis=1, inplace=True)
df

In [None]:
df.reset_index(drop=True, inplace=True)
df

In [None]:
dates = pd.date_range(start="01/01/2022", periods=5).strftime('%Y-%m-%d')
df.set_index(dates,inplace=True)

In [None]:
df.loc['2022-01-01']

#### Conditional selections

In [None]:
df>0

In [None]:
#Keep only the values greater than 0; all other values are displayed as NaN
df[df>0]

In [None]:
#Select only the rows where the value in column W is less than 0
df[df['W']<0]
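
Several conditions can be combined in one selection. The sketch below uses a small fixed data frame instead of random numbers so that the result is predictable; the values are made up.

```python
import pandas as pd

df = pd.DataFrame(
    {'W': [1.5, -0.3, 0.8, -1.2], 'X': [-0.7, 0.4, -0.1, 0.9]},
    index=['A', 'B', 'C', 'D'],
)

# Combine conditions with & (and) or | (or); each condition needs parentheses
selected = df[(df['W'] > 0) & (df['X'] < 0)]
print(selected)
```

The Python keywords `and` and `or` do not work element by element on Series, which is why `&` and `|` are used here.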

#### Loading data from file into DataFrame and exploring

* Pandas can read various data sources such as text files, JSON, Excel, SQL tables and many more
* When defining the file path you can use an absolute or a relative path.
* use ```pwd``` (print working directory) to find out the current working directory and ```cd``` (change directory) to change it if we want to work with a relative path

In [None]:
pwd

In [None]:
forecastsDF = pd.read_csv('forecasts.txt', sep='\t')
storeDetailsDF = pd.read_excel('store_details.xlsx')

In [None]:
#First five rows of the DataFrame
forecastsDF.head()

In [None]:
#Last three rows of the DataFrame
storeDetailsDF.tail(3)

In [None]:
#List out indexes
storeDetailsDF.index

In [None]:
#List out all columns
storeDetailsDF.columns

In [None]:
#Shape of our data frame
forecastsDF.shape

In [None]:
#Total number of elements in the data frame (rows x columns)
forecastsDF.size

In [None]:
#Summary about the columns and data types
forecastsDF.info()

In [None]:
#Get basic statistics on numeric columns
forecastsDF.iloc[:,9:11].describe()

In [None]:
#Get basic statistics on categorical columns
forecastsDF.iloc[:,[2,6]].describe()

In [None]:
#Unique values from a series
forecastsDF['format_name'].unique()

In [None]:
#Value counts of a series
forecastsDF['format_name'].value_counts()

In [None]:
#Get the number of missing values in each column
storeDetailsDF.isnull().sum()

In [None]:
#Get exact items where the location is null
storeDetailsDF[storeDetailsDF['location'].isnull()]

#### Manipulating data in data frame

In [None]:
#Cast to different datatype
print("Before: ", forecastsDF['tpnb'].dtypes)
forecastsDF['tpnb'] = forecastsDF['tpnb'].astype('str')
print("After: ", forecastsDF['tpnb'].dtypes)

In [None]:
#Sorting values - note that the records keep their original index values, which can be reset using the reset_index() method
forecastsDF.sort_values(['offer_number', 'store'], ascending=[False, True]).head()

In [None]:
#Use apply/map to create new column
def productDivision(x):
    divisionMapping = {'A':1, 'B':2, 'C':3}
    return divisionMapping.get(x[0])

forecastsDF['division'] = forecastsDF['sg_cd'].apply(productDivision)
forecastsDF['store_type'] = forecastsDF['format_name'].apply(lambda x: 'large' if x == 'HM' else 'small')
forecastsDF.head()

#### Group by

In [None]:
#Group by and sum some of the fields
forecastsDF.groupby('tpnb')[['forecast', 'capacity']].sum()

In [None]:
#Get the TPNB with the highest national forecast
forecastsDF.groupby('tpnb')['forecast'].sum().sort_values(ascending=False).head(1)

In [None]:
#Pivot table: number of stores, total forecast and maximum forecast per offer_number and tpnb
pd.pivot_table(forecastsDF, values=['store','forecast'], index=['offer_number', 'tpnb'], aggfunc={'store':len, 'forecast':['sum', 'max']})
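
Because `forecasts.txt` is not included here, the following sketch shows the same kind of aggregation on a small made-up data frame. The values, the variable `forecasts` and the output column names are assumptions for illustration only.

```python
import pandas as pd

forecasts = pd.DataFrame({
    'tpnb': ['100', '100', '200', '200', '200'],
    'store': [1001, 1002, 1001, 1002, 1003],
    'forecast': [10, 20, 5, 15, 25],
})

# Named aggregation: new_column=(source_column, aggregation)
summary = forecasts.groupby('tpnb').agg(
    stores=('store', 'count'),
    total_forecast=('forecast', 'sum'),
    max_forecast=('forecast', 'max'),
)
print(summary)
```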

#### Merging and concatenation

* ```pandas.merge()``` works similarly to joins in SQL
* ```pandas.concat()``` with ```axis=0``` works like UNION ALL in SQL (rows are stacked and duplicates are kept)

In [None]:
storeID = [1001,1002,1003,1004,1005,4001]
sales = np.random.randint(5,25,6)
storeName = ['Nitra', 'Kosice', 'Banska bystrica', 'Petrzalka', 'Trnava','Vrable']

leftTable = pd.DataFrame(list(zip(storeID,sales)), columns=['store_id', 'adj_sales'])
rightTable = pd.DataFrame(list(zip(storeID,storeName)), columns=['store_id', 'store_name'])

In [None]:
rightTable

In [None]:
leftTable

In [None]:
pd.merge(leftTable, rightTable, how='inner', left_on='store_id', right_on='store_id')[['store_id', 'store_name','adj_sales']]

In [None]:
salesDF = pd.merge(leftTable, rightTable, how='left', left_on='store_id', right_on='store_id')[['store_id', 'store_name','adj_sales']]
salesDF
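
In the cells above both tables contain the same store IDs, so the inner and the left join return the same rows. The difference shows up when the keys do not fully match, as in this sketch with made-up tables `salesTable` and `namesTable`:

```python
import pandas as pd

salesTable = pd.DataFrame({'store_id': [1001, 1002, 4001], 'adj_sales': [12, 7, 20]})
namesTable = pd.DataFrame({'store_id': [1001, 1002, 1003],
                           'store_name': ['Nitra', 'Kosice', 'Banska Bystrica']})

# inner: only the keys present in both tables
innerJoin = pd.merge(salesTable, namesTable, how='inner', on='store_id')

# left: all keys from the left table, missing names become NaN
leftJoin = pd.merge(salesTable, namesTable, how='left', on='store_id')

print(innerJoin)
print(leftJoin)
```

When the key column has the same name in both tables, `on='store_id'` is a shorter way of writing `left_on='store_id', right_on='store_id'`.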

In [None]:
df1 = pd.DataFrame(
     {
         "A": ["A0", "A1", "A2", "A3"],
         "B": ["B0", "B1", "B2", "B3"],
         "C": ["C0", "C1", "C2", "C3"],
         "D": ["D0", "D1", "D2", "D3"],
     },
 )

df2 = pd.DataFrame(
    {
        "A": ["A4", "A5", "A6", "A7"],
        "B": ["B4", "B5", "B6", "B7"],
        "C": ["C4", "C5", "C6", "C7"],
        "D": ["D4", "D5", "D6", "D7"],
    },
    
)

frames = [df1, df2]

pd.concat(frames,axis=0)
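
Note that the concatenated frame keeps the original index values of its parts, so index labels repeat. A short sketch of how `ignore_index=True` builds a fresh index instead, using smaller made-up frames:

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1'], 'B': ['B0', 'B1']})
df2 = pd.DataFrame({'A': ['A2', 'A3'], 'B': ['B2', 'B3']})

keptIndex = pd.concat([df1, df2], axis=0)
newIndex = pd.concat([df1, df2], axis=0, ignore_index=True)

print(list(keptIndex.index))   # [0, 1, 0, 1]
print(list(newIndex.index))    # [0, 1, 2, 3]
```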

#### Export results

With Pandas we can also export our results into the most commonly used data formats such as csv, Excel, JSON and many more.

In [None]:
salesDF.to_csv('exported_sales.csv', sep='|')
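
By default `to_csv()` also writes the row index as the first column. The sketch below writes to an in-memory buffer instead of a file, so that it can be run without touching the disk, and reads the data back with the same separator. The small `salesDF` here is made up for the example.

```python
import io

import pandas as pd

salesDF = pd.DataFrame({'store_id': [1001, 1002], 'adj_sales': [12, 7]})

buffer = io.StringIO()
salesDF.to_csv(buffer, sep='|', index=False)   # index=False leaves out the row index
print(buffer.getvalue())

# Read the exported text back using the same separator
buffer.seek(0)
restored = pd.read_csv(buffer, sep='|')
print(restored)
```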