# Before getting started

### Jupyter Notebook Cheat Sheet

https://www.edureka.co/blog/wp-content/uploads/2018/10/Jupyter_Notebook_CheatSheet_Edureka.pdf

Some handy shortcuts:
- Enter: enter edit mode
- Ctrl + Enter: run the selected cells
- Shift + Enter: run the selected cells and select the cell below
- Alt + Enter: run the selected cells and insert a new cell below
- Esc, then B: insert a cell below
- Esc, then A: insert a cell above
- Esc, then D, D: delete the selected cell
- Esc, then I, I: interrupt the kernel

### Jupyter Notebook extensions for increased productivity

In [None]:
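# the "jupyter contrib" command is provided by the jupyter_contrib_nbextensions package (classic Notebook interface)
!pip install jupyter_contrib_nbextensions
!jupyter contrib nbextension install --user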

### Magic commands - special commands in Jupyter Notebook

In [None]:
# show all magic commands
%lsmagic

In [None]:
# print the current working directory
cwd = %pwd
print(cwd)
# show the contents of the current directory
%ls
# list all variables
%who
# delete all variables and names defined in the current namespace (-f skips the confirmation prompt)
%reset -f
# time a single statement
%time sum(range(1000000))
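`%time` (and its sibling `%timeit`) only exist in IPython and Jupyter. Outside a notebook, a rough plain-Python counterpart is the standard-library `time.perf_counter` and `timeit` modules. This is a sketch of the idea, not how the magics are implemented internally:

```python
import time
import timeit

# Time one execution of a statement, similar in spirit to %time
start = time.perf_counter()
total = sum(range(1_000_000))
elapsed = time.perf_counter() - start
print(f"sum = {total}, took {elapsed:.4f} s")

# Repeat a small statement many times and average, similar in spirit to %timeit
per_run = timeit.timeit("sum(range(1000))", number=1000) / 1000
print(f"about {per_run * 1e6:.1f} microseconds per run")
```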

### Jupyter Notebook themes

In [None]:
!pip install --upgrade jupyterthemes

In [None]:
# list the available themes
!jt -l

In [None]:
#https://github.com/dunovank/jupyter-themes
!jt -t monokai -f anka -fs 13 -nf ptsans -nfs 11 -ofs 13 -tfs 13 -N -kl -cursw 5 -cursc r -cellw 100% -T

In [None]:
# reset to the default Jupyter theme
!jt -r

### Method documentation - shift-tab

In [None]:
import pandas as pd

In [None]:
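# place the cursor inside the parentheses of read_csv and press Shift + Tab to show its documentation
df = pd.read_csv("./bikes.csv")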
df

In [None]:
df = pd.read_csv("./bikes.csv", sep=';')
df

# IDS-Instruction 2: Crash Course on Python

## 0. Start with Python
Python lets you write very concise code.

Java "Hello World!":

```
public class HelloWorld
{
  public static void main(String[] args)
  {
    System.out.println("Hello World!");
  }
}
```

In [None]:
# Python Hello World !
print("Hello World!")

## 1. Python objects, basic types, and variables

Everything in Python is an **object**.
Every object in Python has a type.
Some of the basic types include:

    int (integer; a whole number with no decimal place)
        10
        -3
    float (floating-point number; a number with a decimal point)
        7.41
        -0.006
    str (string; a sequence of characters enclosed in quotes)
        'this is a string using single quotes'
        "this is a string using double quotes"
    bool (boolean; a binary value that is either true or false)
        True
        False
    NoneType (a special type representing the absence of a value)
        None

In Python, a **variable** is a name you specify in your code that refers to a particular object (an object instance, i.e., a value).


<font color='red'>
Attention! Names for variables can only contain:</font>

a. Letters

b. Underscores (_)

c. Digits (no spaces, dashes, or other characters)

<font color='green'>Variable names must start with a letter or underscore.</font>

Python data types are either **mutable** (the object can be changed in place) or **immutable** (it cannot); this affects what happens when an object is bound to several names, passed to a function as an argument, or modified.
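A small sketch of what this means in practice. The names `a`, `b`, `s`, `t`, and `add_item` are illustrative only: a list (mutable) shared by two names is changed for both, while "changing" a string (immutable) creates a new object.

```python
# Lists are mutable: two names can refer to the SAME list object
a = [1, 2, 3]
b = a            # b is another name for the same list, not a copy
b.append(4)      # modifying the list through b ...
print(a)         # ... is visible through a: [1, 2, 3, 4]

# Strings are immutable: "modifying" one creates a new object
s = "abc"
t = s
t += "d"         # t now refers to a new string; s is unchanged
print(s, t)      # abc abcd

# A function receiving a list can mutate the caller's object
def add_item(items):
    items.append("new")

add_item(a)
print(a)         # [1, 2, 3, 4, 'new']
```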

In [None]:
# Variable declaration
name = "Bob"
Age = 54
has_W2 = True
print(name, Age, has_W2)

In [None]:
# Variable names start with a letter or an underscore
_filed = False

In [None]:
_filed

In [None]:
# Variable names are case-sensitive: these are eight different variables
age = 1
Age = 2
aGe = 3
AGE = 4
a_g_e = 5
_age = 6
age_ = 7
_AGE_ = 8
print(age, Age, aGe, AGE, a_g_e, _age, age_, _AGE_)

In [None]:
age = 80

In [None]:
age

In [None]:
# Reserved words (keywords) cannot be used as variable names: this cell raises a SyntaxError
for = 4
print(for)
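To check a name before using it, the standard-library `keyword` module lists the reserved words of the running Python version, and `str.isidentifier()` checks the naming rules above (note that it also returns True for keywords, so both checks are useful). The candidate names below are just examples:

```python
import keyword

# keyword.kwlist holds all reserved words of the running Python version
print(len(keyword.kwlist), "reserved words, e.g.", keyword.kwlist[:5])

# Check some candidate variable names
for name in ["for", "four", "class", "klass"]:
    print(name, "is reserved" if keyword.iskeyword(name) else "is a valid name")

# isidentifier() checks the naming rules: letters, digits, underscore, no leading digit
print("2cool".isidentifier(), "_cool2".isidentifier())  # False True
```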

In [None]:
x = 1
y = 1
z = 1

In [None]:
# Multiple Assignment
x = y = z = 1 
x,y,z = 1,2,"abcd"
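Multiple assignment unpacks the values on the right into the names on the left, which also gives a common idiom for swapping two variables. A short sketch; the number of names and values must match, otherwise Python raises a `ValueError`:

```python
# Unpack three values into three names
x, y, z = 1, 2, "abcd"

# Swap two variables without a temporary variable
x, y = y, x
print(x, y)  # 2 1

# Mismatched counts raise a ValueError
try:
    a, b = 1, 2, 3
except ValueError as err:
    print("ValueError:", err)
```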

In [None]:
# bool is a subclass of int: False == 0 and True == 1
False==0

In [None]:
# Operator example: is this expression True? (+ is evaluated before !=)
True != 3 + 1

In [None]:
# Operator example: + concatenates strings
simple_string1 = 'an example'
simple_string2 = "oranges "
simple_string3 = simple_string1 + ' of using the + operator: ' + simple_string2
print(simple_string3)

## 2. Iteration and Condition 

In [None]:
# Prints out the numbers 0,1,2,3,4
for x in range(0,5,1):
    print(x)

In [None]:
# Prints out 3,4,5
for x in range(3,6):
    print(x)

In [None]:
# Prints out 3,5,7
for x in range(3, 8, 2):
    print(x)

In [None]:
count = 0
while True:
    print(count)
    count += 1
    if count >= 5:
        break

In [None]:
# Condition and iteration example: continue skips the even numbers, so this prints 1, 3, 5, 7, 9
for x in range(10):
    if x % 2 == 0:
        continue
    print(x)

In [None]:
# Learn if and for with an example:
# Python program to check whether the input number is prime or not

num = 23

# take input from the user
#num = int(input("Enter a number: "))

isPrime = True

# if the input number is less than
# or equal to 1, it is not prime
if num <= 1:
    isPrime = False
else:
    # check for factors
    for i in range(2, num):
        if (num % i) == 0:
            isPrime = False
            print(i, "times", num//i, "is", num)
            break

if isPrime:
    print(num, "is a prime number")
else:
    print(num, "is not a prime number")
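Python loops also accept an `else` branch, which runs only if the loop finished without hitting `break`. The sketch below uses it in place of the `isPrime` flag set up before the loop; it is an alternative style, not a change to the program above (`is_prime` is an illustrative name):

```python
num = 23

for i in range(2, num):
    if num % i == 0:
        print(i, "times", num // i, "is", num)
        is_prime = False
        break
else:
    # Runs only when the loop did not break: no divisor was found
    is_prime = num > 1  # 0, 1 and negative numbers are not prime

print(num, "is a prime number" if is_prime else "is not a prime number")
```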

## 3. Python Basic Data Structures
    
    Note: mutable objects can be modified after creation and immutable objects cannot.

**Goal**: group other objects together

The basic container types include:

    str (string: immutable; indexed by integers)
        Items are stored in the order they were added
        
    list (list: mutable; indexed by integers)
        Items are stored in the order they were added
        [3, 5, 6, 3, 'dog', 'cat', False]
        
    tuple (tuple: immutable; indexed by integers)
        Items are stored in the order they were added
        (3, 5, 6, 3, 'dog', 'cat', False)
        
    set (set: mutable; not indexed at all)
        Items are NOT stored in the order they were added; their order depends on the items' hash values
        Can only contain hashable (e.g. immutable) objects
        Does NOT contain duplicate objects
        {3, 5, 6, 3, 'dog', 'cat', False}
    
    dict (dictionary: mutable; key-value pairs are indexed by hashable (e.g. immutable) keys)
        Since Python 3.7, items are stored in the order they were added
        {'name': 'Jane', 'age': 23, 'fav_foods': ['pizza', 'fruit', 'fish']}

Defining *lists*, *tuples*, or *sets*: 

    enclose the items in [] (list), () (tuple), or {} (set)

    separate items with commas (,)

Defining *dicts*: 

    separate keys and values with colons (:)
    
    separate pairs with commas (,)
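Two details of these literals often trip people up: `{}` creates an empty dict (an empty set needs `set()`), and a one-item tuple needs a trailing comma. A quick sketch with illustrative variable names:

```python
empty_list = []
empty_tuple = ()
empty_dict = {}        # {} creates an empty DICT, not an empty set
empty_set = set()      # use set() for an empty set

single_tuple = (5,)    # a one-item tuple needs a trailing comma
not_a_tuple = (5)      # just the int 5 in parentheses

print(type(empty_dict), type(empty_set))
print(type(single_tuple), type(not_a_tuple))
```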

**Strings, lists, and tuples are all sequence types that can use the +, *, +=, and *= operators.**

In [None]:
str10 = "HELLO"
str10

In [None]:
# Hello, this is Germany! Turning HELLO into HALLO fails: strings are immutable, so this raises a TypeError
str10[1] = 'A'

In [None]:
# Instead, build a new string from pieces and re-assign the name
str10 = "J." + str10[3:]
str10

In [None]:
# Assign some containers to different variables
list1 = [3, 5, 6, 3, 'dog', 'cat', False]
tuple1 = (3, 5, 6, 3, 'dog', 'cat', False)
set1 = {3, 5, 6, 3, 'dog', 'cat', False}
dict1 = {'name': 'Conor McGregor', 
         'age': 34, 
         'born': 'Dublin',
         'division': ['Lightweight', 'Featherweight', 'Welterweight'],
         'MMA_records': {'win':22, 'losses':6},
         'fav_foods': ['Pizza kebab', 'KiWi', 'fish']}

In [None]:
# Items in the list object are stored in the order they were added
list1

In [None]:
# Items in the tuple object are stored in the order they were added
tuple1

In [None]:
# Items in the set object are not stored in the order they were added
# Also, notice that the value 3 only appears once in this set object
set1

In [None]:
# Since Python 3.7, items in the dict object are stored in the order they were added
dict1

In [None]:
# Add and re-assign
# list1 = [3, 5, 6, 3, 'dog', 'cat', False]
# list1 = list1 + [5, 'grapes']
list1 += [5, 'grapes']
list1

In [None]:
# Add and re-assign: tuples are immutable, so += builds a new tuple and rebinds tuple1
tuple1 += (5, 'grapes')
tuple1

In [None]:
# Multiply
[1, 2, 3, 4] * 2

In [None]:
# Multiply
(1, 2, 3, 4) * 3

### Accessing data in containers
Use subscript notation (square brackets) to access data at an index for: 
-  strings
-  lists
-  tuples
-  dicts (where the "index" is a key)



    Note: sets are not indexed, so we cannot use subscript notation to access data elements.
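Subscripts also accept slices of the form `seq[start:stop:step]`, where `start` is included and `stop` is excluded. A sketch on an illustrative list:

```python
letters = ['a', 'b', 'c', 'd', 'e', 'f']

print(letters[1:4])    # ['b', 'c', 'd']
print(letters[-2:])    # last two items: ['e', 'f']
print(letters[::2])    # every second item: ['a', 'c', 'e']
print(letters[::-1])   # reversed copy: ['f', 'e', 'd', 'c', 'b', 'a']

# An index past the end raises IndexError, but a slice is clipped silently
print(letters[2:100])  # ['c', 'd', 'e', 'f']
```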



In [None]:
# Access the first item in a sequence
# list1 = [3, 5, 6, 3, 'dog', 'cat', False, 5, 'grapes']
list1[0]

In [None]:
# Access the last item in a sequence
# tuple1 = (3, 5, 6, 3, 'dog', 'cat', False, 5, 'grapes')
tuple1[-1]

In [None]:
# Access a range of items in a sequence
# simple_string1 = 'an example'
simple_string1[3:8]

In [None]:
# Access a range of items in a sequence
# tuple1 = (3, 5, 6, 3, 'dog', 'cat', False, 5, 'grapes')
tuple1[:-3]

In [None]:
# Access a range of items in a sequence
# list1 = [3, 5, 6, 3, 'dog', 'cat', False, 5, 'grapes']
list1[4:]

In [None]:
dict1

In [None]:
# Access an item in a dictionary
dict1['name']

In [None]:
# Access an element of a sequence in a dictionary
dict1['fav_foods'][2]

### List comprehension
List comprehension is a concise way to build a new list by iterating through the elements of an existing list (or any other iterable).

Normally, you iterate through the elements of a list with a loop. A list comprehension lets you transform lists very concisely to obtain new lists, and it is usually faster than an equivalent `for` loop that calls `.append()`. The syntax is:

`[func(element) for element in list]`

A list comprehension can also filter the elements with a condition:

`[func(element) for element in list if {condition on element}]`

In [None]:
list2 = [1,2,3,4,5,6]

# Basic loop
listdouble = []
for i in range(0, len(list2)):
    listdouble.append(list2[i]*2)
print(listdouble)

In [None]:
# Slightly better way for looping
listdouble = []
for item in list2:
    listdouble.append(item*2)
print(listdouble)

In [None]:
# The best way: list comprehension
listdouble = [item*2 for item in list2]
print(listdouble)

In [None]:
# Conditions in list comprehension: let's take just the odd numbers
doubleodds = [item*2 for item in list2 if item%2==1]
print(doubleodds)
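The same syntax works with curly braces to build sets and dicts, and an `if ... else` expression can go before the `for` to transform (rather than filter) each element. A sketch on the same `list2`:

```python
list2 = [1, 2, 3, 4, 5, 6]

# Set comprehension: the distinct remainders modulo 3
remainders = {item % 3 for item in list2}
print(remainders)  # {0, 1, 2}

# Dict comprehension: map each number to its square
squares = {item: item**2 for item in list2}
print(squares)     # {1: 1, 2: 4, 3: 9, 4: 16, 5: 25, 6: 36}

# Conditional expression: if/else goes BEFORE the for and keeps every element
labels = ["even" if item % 2 == 0 else "odd" for item in list2]
print(labels)
```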

## 4. Python Functions

### 4.1. Built-in functions

-  A function is a Python object that you can "call" to perform an action or compute and return another object
-  Some functions allow you to pass arguments inside the parentheses (separating multiple arguments with commas)
-  Inside the function, these arguments are treated like variables

A small sample of useful Python built-in functions: 

    type(obj) to determine the type of an object
    
    len(container) to determine how many items are in a container
    
    callable(obj) to determine if an object is callable
    
    sorted(container) to return a new list from a container, with the items sorted
    
    sum(container) to compute the sum of a container of numbers
    
    min(container) to determine the smallest item in a container
    
    max(container) to determine the largest item in a container
    
    abs(number) to determine the absolute value of a number
    
    repr(obj) to return a string representation of an object
    
    dict(keyword=arguments) to create a dictionary from keyword arguments
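Several of these functions are not used in the cells that follow, so here is a quick sketch of them on an illustrative list of numbers:

```python
numbers = [4, -7, 2.5, 10]

print(callable(len), callable(numbers))  # True False: functions are callable, lists are not
print(sum(numbers))                       # 9.5
print(min(numbers), max(numbers))         # -7 10
print(abs(-7))                            # 7
print(sorted(numbers))                    # [-7, 2.5, 4, 10]
print(dict(name='Jane', age=23))          # {'name': 'Jane', 'age': 23}
```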

In [None]:
# Use the type() function to determine the type of an object
type(simple_string1)

In [None]:
dict1

In [None]:
# Use the len() function to determine how many items are in a container
len(dict1)

In [None]:
dict(name='Conor McGregor',
     age=34,
     born='Dublin',
     division=['Lightweight', 'Featherweight', 'Welterweight'],
     MMA_records={'win': 22, 'losses': 6},
     fav_foods=['Pizza kebab', 'KiWi', 'fish'])

In [None]:
simple_string2

In [None]:
# Use the len() function to determine how many items are in a container
len(simple_string2)

In [None]:
# Use the sorted() function to return a new list from a container, with the items sorted
# - notice that capitalized strings come first
sorted(['dogs', 'cats', 'zebras', 'Chicago', 'California', 'ants', 'mice'])
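Uppercase letters come first because strings are compared by Unicode code point. `sorted()` accepts a `key` function that is applied to each item before comparing, so `key=str.lower` gives a case-insensitive order, and `reverse=True` sorts in descending order. A sketch on the same words:

```python
words = ['dogs', 'cats', 'zebras', 'Chicago', 'California', 'ants', 'mice']

# Default: compares code points, so 'C...' sorts before 'a...'
print(sorted(words))

# Case-insensitive: compare lowercased copies (the items themselves are unchanged)
print(sorted(words, key=str.lower))

# Descending order; sorted() never modifies the original list
print(sorted(words, key=str.lower, reverse=True))
```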

In [None]:
# Use the repr() function to return a string representation of an object
i1 = 123
repr(i1)

### 4.2 User-defined functions

A function is a block of code that only runs when it is called.

A function can return something as a result.

To call a function, use the function name followed by parentheses.


```

def functionname(parameters):
    "function_docstring"
    # do something
    
    return result
   
```
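A concrete instance of this template, as a sketch: `greet` is a hypothetical function with a docstring, a default parameter value, and a return value. A function without a `return` statement returns `None`.

```python
def greet(name, greeting="Hello"):
    """Return a greeting for name; greeting defaults to "Hello"."""
    message = f"{greeting}, {name}!"
    return message

print(greet("Ada"))                    # positional argument, default greeting
print(greet("Ada", greeting="Hallo"))  # keyword argument overrides the default
print(greet.__doc__)                   # the docstring is stored on the function

def no_result():
    pass

print(no_result())  # None
```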

In [None]:
def helloWorld(language='english'):
    if language == 'english':
        print('Hello World!')
    elif language == 'german':
        print("Hallo Welt!")
    elif language == 'dutch':
        print("Hallo Wereld")
    elif language == 'korean':
        print("안녕하세요 세계")
    else: 
        print('Hello World!')
    
helloWorld(language='korean')

In [None]:
# define a function that takes a number and returns True if it is a prime number, False otherwise
def checkIsPrime(num):
    
    isPrime = True
    # if input number is less than
    # or equal to 1, it is not prime
    if num <= 1:
        isPrime = False
    else:
        # check for factors
        for i in range(2, num):
            if (num % i) == 0:
                isPrime = False
                break

    return isPrime

In [None]:
# define a function that takes a number (num) and returns a list of all prime numbers <= num
# hint: use list comprehension and the function you just defined
def getAllSmallerPrime(num):
    
    allSmallerPrime = [n for n in range(num+1) if checkIsPrime(n)]
    return allSmallerPrime
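`checkIsPrime` tests every candidate divisor up to `num - 1`. A common speed-up (a variant, not the exercise's required solution) stops at the integer square root: if `num = a * b` with `a <= b`, then `a <= sqrt(num)`. The function names below are hypothetical, and `math.isqrt` needs Python 3.8 or newer:

```python
import math

def check_is_prime_fast(num):
    """Trial division up to the integer square root of num."""
    if num <= 1:
        return False
    # Any factorization num = a * b has a factor a <= isqrt(num)
    for i in range(2, math.isqrt(num) + 1):
        if num % i == 0:
            return False
    return True

def get_all_smaller_primes_fast(num):
    """All primes <= num, using the same list-comprehension idea as getAllSmallerPrime."""
    return [n for n in range(num + 1) if check_is_prime_fast(n)]

print(get_all_smaller_primes_fast(30))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```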

## 5. Object Oriented Programming
### Python object attributes (methods and properties)

Different types of objects in Python have different attributes.

To access an attribute of an object, use a dot: `object.attribute`.
#### Method

<font color= "green">
    When an attribute of an object is a callable, that attribute is called a method</font>

It works like a function, except that it is bound to a particular object, which it can access and act on.

#### Property
<font color = "green">When an attribute of an object is not a callable, that attribute is called a property</font>
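It is just a piece of data about the object, which is itself another object. (Such attributes are usually called *data attributes*; do not confuse them with Python's built-in `property` decorator.)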

The built-in *dir()* function can be used to return a list of an object's attributes.
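Combining `dir()` with `callable()` separates the two kinds of attributes. A sketch: names starting with an underscore are special (dunder) attributes and are filtered out; a string's public attributes are all methods, while a complex number also has data attributes. The variable names are illustrative:

```python
text = 'tHis is a sTriNg'

# Public attribute names (dir() returns them sorted alphabetically)
public = [name for name in dir(text) if not name.startswith('_')]

# Callable attributes are methods
methods = [name for name in public if callable(getattr(text, name))]
print(len(methods), methods[:5])

# A complex number also has non-callable data attributes
z = 3 + 4j
z_data = [n for n in dir(z) if not n.startswith('_') and not callable(getattr(z, n))]
print(z_data)          # ['imag', 'real']
print(z.conjugate())   # a method call: (3-4j)
```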


##### Sample methods for string objects

    .capitalize() to return a capitalized version of the string (only first char uppercase)
    
    .upper() to return an uppercase version of the string (all chars uppercase)
    
    .lower() to return a lowercase version of the string (all chars lowercase)
    
    .count(substring) to return the number of occurrences of the substring in the string
    
    .startswith(substring) to determine if the string starts with the substring
    
    .endswith(substring) to determine if the string ends with the substring
    
    .replace(old, new) to return a copy of the string with occurrences of "old" replaced by "new"


In [None]:
# Assign a string to a variable

test_string = 'tHis is a sTriNg'

In [None]:
test_string

In [None]:
# Return a capitalized version of the string
test_string.capitalize()

In [None]:
# Return a lowercase version of the string
test_string.lower()

In [None]:
# Return an uppercase version of the string
test_string.upper()

In [None]:
test_string

In [None]:
# determine if the string ends with the substring
test_string.endswith('x')

In [None]:
# Count the number of occurrences of a substring in the string (the search is case-sensitive)
test_string.count('I')

In [None]:
# Count the number of occurrences of a substring, starting the search at index 7
test_string.count('I', 7)

In [None]:
# What would be the output of the code below?
test_string.upper().replace('sTriNg', 'New String')

In [None]:
# True or False?
callable(test_string.capitalize())

##### Sample methods on list objects

    .append(item) to add a single item to the list
    
    .extend([item1, item2, ...]) to add multiple items to the list
    
    .insert(index, item) to insert a single item before the given index
    
    .remove(item) to remove a single item from the list
    
    .pop() to remove and return the item at the end of the list
    
    .pop(index) to remove and return an item at an index
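The cells below use `.append()`, `.insert()`, and `.pop()`. A sketch of the remaining methods, `.extend()`, `.remove()`, and `.pop(index)`, on an illustrative list:

```python
colors = ["Red", "Blue", "Green", "Black"]

colors.extend(["Pink", "Gray"])  # add several items at the end
colors.remove("Blue")            # remove the FIRST matching item (ValueError if absent)
second = colors.pop(1)           # remove and return the item at index 1
last = colors.pop()              # remove and return the last item

print(second, last)  # Green Gray
print(colors)        # ['Red', 'Black', 'Pink']
```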



In [None]:
# Add a single item to the list
color_list=["Red", "Blue", "Green", "Black"]
color_list.append("Yellow")
print(color_list)

In [None]:
# Insert a single value at a specific position 
color_list.insert(2, "White")
print(color_list)

In [None]:
color_list.pop()

In [None]:
color_list[-1]

### Class

A class is a blueprint for creating objects.
For example, a class for RWTH employees could look like this:

In [None]:
class RWTHemployee:
    #class attribute
    affiliation = "RWTH"
    
    def __init__(self,name,group,task,quote):
        #instance attribute
        self.name=name
        self.group=group
        self.task=task
        self.quote=quote
        
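    @staticmethod
    def introduce2():
        # static method: takes no self, so it can be called as RWTHemployee.introduce2() or wil.introduce2()
        print("I am a human")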
        
    def introduce(self):
        print("I am working at {}.".format(self.group))
    
    def work(self):
        print("I love {}!".format(self.task))
        
    def give_quote(self):
        print('"{}", {}'.format(self.quote,self.name))

In [None]:
wil = RWTHemployee("Prof. Wil van der Aalst",
                   "PADS", 
                   "Process Mining", 
                   "This Petri net theory is so beautiful that makes you cry.")

In [None]:
ramsay = RWTHemployee("Gordon Ramsay",
                      "Mensa Vita", 
                      "Cooking",
                      "This pizza is so disgusting, if you take it to Italy you’ll get arrested.")

In [None]:
conor = RWTHemployee("Conor McGregor",
                     "Informatikzentrum", 
                     "securing the Informatikzentrum",
                     "When you sign to fight me, it’s a celebration. You ring back home, you ring your wife – baby, we’ve done it. We’re rich, baby. Conor McGregor made us rich.")

In [None]:
# access the class attributes
print("Wil is affiliated with {}".format(wil.__class__.affiliation))
print("Ramsay is also affiliated with {}".format(ramsay.__class__.affiliation))

# access the instance attributes
print("{} works at {}.".format(wil.name, wil.group))
print("{} works at {}.".format(ramsay.name, ramsay.group))
print("{} works at {}.".format(conor.name, conor.group))

In [None]:
# call some methods
conor.work()

In [None]:
# call some methods
ramsay.give_quote()
wil.give_quote()
conor.give_quote()
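A class attribute such as `affiliation` is shared by all instances and can also be read directly as `wil.affiliation`, without going through `__class__`. The sketch below uses a hypothetical, stripped-down `Employee` class to show that changing the class attribute affects every instance, while assigning through one instance creates an instance attribute that hides the class attribute for that object only:

```python
class Employee:
    affiliation = "RWTH"          # class attribute: shared by all instances

    def __init__(self, name):
        self.name = name          # instance attribute: one per object

a = Employee("A")
b = Employee("B")

Employee.affiliation = "RWTH Aachen"   # change the class attribute ...
print(a.affiliation, b.affiliation)    # ... every instance sees it

a.affiliation = "Guest"                # creates an instance attribute on a only
print(a.affiliation, b.affiliation)    # Guest RWTH Aachen
```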

## 6. Python Packages

### Math Package

For a data scientist, some notions of geometry never hurt. Let's refresh some of the basics.

For a fancy clustering algorithm, you want to find the circumference $C$ and area $A$ of a circle. When the radius of the circle is $r$, you can calculate $C$ and $A$ as:

$C = 2\pi r$

$A = \pi r^2$

To use the constant pi, you will need the math package. A variable r is already coded in the script. Fill in the code to calculate C and A and see how the print() functions create some nice printouts.

**Now Your Turn!**

**Instruction**
-  import the math package. Now you can access the constant pi with math.pi.
-  Calculate the circumference of the circle and store it in C.
-  Calculate the area of the circle and store it in A. math.pow(base, exp) could be useful.

In [None]:
import math

In [None]:
# your answer 


# Definition of radius
r = 0.43

# Calculate C
C = 2 * math.pi * r


# Calculate A
A = math.pi * (r**2)

# Build printout
print("Circumference: " + str(C))
print("Area: " + str(A))
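The instructions suggest `math.pow(base, exp)`, while the answer uses the `**` operator; both give the same area up to floating-point rounding, so comparing them should use `math.isclose` rather than `==`. An f-string is an alternative to `str()` concatenation for the printout. A short sketch (`A_pow` and `A_op` are illustrative names):

```python
import math

r = 0.43
A_pow = math.pi * math.pow(r, 2)   # math.pow always returns a float
A_op = math.pi * r ** 2            # the ** operator

print(math.isclose(A_pow, A_op))   # compare floats with a tolerance
print(f"Area: {A_op:.4f}")         # format to 4 decimal places
```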

### Numpy Package

**NumPy is a powerful package for data science.**

A list *sportclass* has already been defined in the Python script, representing the height of some students in centimeters.

Can you add some code here and there to create a Numpy array from it?

**Now Your Turn!**

**Instruction**
-  Import the numpy package as np, so that you can refer to numpy with np.
-  Use np.array() to create a Numpy array from sportclass. Name this array np_sportclass.
-  Print out the type of np_sportclass to check that you got it right.


In [None]:
# Import the numpy package as np
import numpy as np

In [None]:
# Answer 

# Create list sportclass
sportclass = [180, 215, 210, 210, 188, 176, 209, 200]

# Create a Numpy array from sportclass: np_sportclass
np_sportclass = np.array(sportclass)

# Print out type of np_sportclass
print(type(np_sportclass))
print(np_sportclass)
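What makes the array more useful than the list is that arithmetic and comparisons apply element-wise, and a boolean array can select elements (boolean indexing). A sketch on the same heights; `np_sportclass_m` and `tall` are illustrative names:

```python
import numpy as np

sportclass = [180, 215, 210, 210, 188, 176, 209, 200]
np_sportclass = np.array(sportclass)

# Element-wise arithmetic: centimeters to meters
np_sportclass_m = np_sportclass * 0.01
print(np_sportclass_m)

# Element-wise comparison gives a boolean array ...
tall = np_sportclass > 200
# ... which selects the matching elements
print(np_sportclass[tall])   # [215 210 210 209]

# With a plain list the same selection needs a comprehension
print([h for h in sportclass if h > 200])
```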

Let's try to create a 2D Numpy array from a small list of lists.

In this exercise, sportclass is a list of lists. 

The main list contains 4 elements, one per student. Each of these elements is a list containing the height and the weight of that student, in this order. sportclass is already coded for you in the script.

**Now Your Turn!**

**Instruction**

-  Use np.array() to create a 2D Numpy array from sportclass. Name it np_sportclass.
-  Print out the type of np_sportclass.
-  Print out the shape attribute of np_sportclass. Use np_sportclass.shape.


In [None]:
# Create sportclass, a list of lists
sportclass = [[180, 78.4],
              [215, 102.7],
              [210, 98.5],
              [188, 75.2]]

# Import numpy
import numpy as np

# Create a 2D Numpy array from sportclass: np_sportclass
np_sportclass = np.array(sportclass)

# Print out the type of np_sportclass
print(type(np_sportclass))


# Print out the shape of np_sportclass
print(np_sportclass.shape)
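A 2D array is indexed as `[row, column]`, and `:` selects all rows, so whole columns can be combined element-wise. The sketch computes a body-mass index for every student at once, assuming heights in centimeters (as in the first exercise) and weights in kilograms (the weight unit is an assumption, since the text does not state it):

```python
import numpy as np

np_sportclass = np.array([[180, 78.4],
                          [215, 102.7],
                          [210, 98.5],
                          [188, 75.2]])

# All rows of column 0 (height) and column 1 (weight)
heights_m = np_sportclass[:, 0] / 100   # cm -> m
weights = np_sportclass[:, 1]

# BMI = weight / height^2, computed for every student in one expression
bmi = weights / heights_m ** 2
print(np.round(bmi, 1))

print(np_sportclass[1])             # the second student's row
print(np_sportclass.mean(axis=0))   # column means: average height and weight
```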


# Pandas: acquiring and exploring a CSV file
Pandas is a Python package that offers many functionalities to explore and edit tabular data. Its core data structure, the DataFrame, handles large datasets efficiently, and pandas is a de facto standard library for data preprocessing.

Let's acquire data from a CSV file using the read_csv function. By default, it assumes that the fields are comma-separated.

### bike dataset

This CSV contains some cyclist data from Montréal. It's a list of how many people were on 7 different bike paths in Montreal each day.

In [None]:
import pandas as pd
dataf = pd.read_csv('bikes.csv')
print(dataf[:5])

The data is a mess because this dataset is non-standard: fields are separated by ';' instead of commas, and dates are written day first. Let's acquire it again, specifying the separator ';' and the date format.

In [None]:
dataf = pd.read_csv('bikes.csv', sep=';', parse_dates=['Date'], dayfirst=True, index_col='Date')
print(dataf[:5])
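Since bikes.csv itself is not reproduced here, the sketch below uses a hypothetical two-row, in-memory excerpt in the same format (semicolon separator, day-first dates; the counts are made up) to show what each argument does. With default settings each line becomes a single field; with `dayfirst=True`, "01/02/2012" is read as 1 February:

```python
import io

import pandas as pd

# Hypothetical excerpt, values invented for illustration
text = "Date;Rachel1\n01/02/2012;10\n02/02/2012;20\n"

# Default: comma separator, so each line is one single column
print(pd.read_csv(io.StringIO(text)).shape)   # (2, 1)

dataf_demo = pd.read_csv(io.StringIO(text), sep=';', parse_dates=['Date'],
                         dayfirst=True, index_col='Date')
print(dataf_demo)
print(dataf_demo.index[0])   # 2012-02-01 00:00:00
```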

From a dataframe you can select columns:

In [None]:
print(dataf['Rachel1'])

How can we plot the data? If we import pyplot from the matplotlib package, it is extremely easy: just add .plot() at the end.

In [None]:
import matplotlib.pyplot as plt
dataf['Rachel1'].plot()
plt.show()

The plot command has a host of options. For example, you can specify the dimensions (width, height in inches) of the resulting plot with figsize.

In [None]:
dataf['Rachel1'].plot(figsize=(15, 15))

It also supports many other kinds of plots. Here is an example of a bar chart of the first ten days of the year.

In [None]:
dataf['Rachel1'][:10].plot(kind='bar', figsize=(15, 15))

### Forbes billionaires dataset
Let's look at another CSV file, listing the Forbes billionaires of 2022.

In [None]:
df = pd.read_csv('2022_forbes_billionaires.csv')
df.head()

In [None]:
# drop the leftover index column that was saved into the CSV file
df.drop(columns='Unnamed: 0', inplace=True)
df.head()

In [None]:
# Plot a histogram to see the distribution of age
df.hist(column='age',figsize=(10, 6))

In [None]:
# Find the billionaires younger than 30
df[df.age < 30]

In [None]:
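# Find the billionaires in the technology industry
# str.strip() removes surrounding whitespace, since the industry labels in this file may end with a space ('Technology ')
df[df.industry.str.strip() == 'Technology']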

In [None]:
# Find the top 20 countries with the most billionaires 
top20country=df.groupby('country').size().sort_values(ascending=False)[:20]
top20country
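`groupby(...).size().sort_values(ascending=False)` counts the rows per group. The pandas `Series.value_counts()` method gives the same counts, already sorted in descending order. A sketch on a hypothetical mini DataFrame (not the Forbes data):

```python
import pandas as pd

# Hypothetical stand-in for the country column
demo = pd.DataFrame({"country": ["A", "B", "A", "C", "A", "B"]})

by_groupby = demo.groupby("country").size().sort_values(ascending=False)
by_value_counts = demo["country"].value_counts()   # descending by default

print(by_groupby.to_dict())        # {'A': 3, 'B': 2, 'C': 1}
print(by_value_counts.head(2))     # top 2, like [:20] above
```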

In [None]:
# Plot a bar chart for country
top20country.plot(kind='bar',figsize=(10, 6))

In [None]:
# Count the billionaires per industry, to plot as a pie chart
industryCount = df.groupby('industry').size().sort_values(ascending=False)
industryCount

In [None]:
industryCount.plot(kind='pie',figsize=(10, 10))