# Day 1 class material: Functions, modules, exceptions, web scraping

- Programming in Python for Business and Life Science Analytics (MGT001437, English)
- School of Management & School of Life Sciences, <span style = "color: blue">Technical University of Munich</span> 

Today we are going to learn __functions__, __modules__, __exception handling__, __web scraping__, the __OS module__ and __file/folder handling__.

### Functions
- create your own functions
- __def__ stands for "definition"
- Variables defined inside a function are __local__ to that function by default, even if a variable with the same name exists outside it

In [None]:
# function initialization
def function_name(function_arguments):
    pass    # indented statement block

In [1]:
# example 1: user-defined functions
# Defining a function in maths, e.g. f(x) = 5^x 
def f_justprint(x):
    print(f"5 to the power of {x} is {5**x}.")

In [2]:
f_justprint(4)

5 to the power of 4 is 625.


A function must be defined before it is called.  
- Functions can take zero arguments,
- or more than one argument (arguments can be of any type), as the next two examples show.

In [6]:
# zero arguments
def printHelloWorld():
    print("Hello World!")

printHelloWorld()

Hello World!


In [None]:
# multiple arguments
test = 4 # test is a global variable

def print_sum_twice(x, y): # x and y are parameters; they are local to the function
    print(x + y)
    print(x + y)
    print(test)

print_sum_twice(2,4)  

6
6
4


- Python treats a variable as local if it is assigned inside the function, unless declared otherwise.
- Reason: Global variables are generally bad practice and should be avoided. In most cases where you are tempted to use a global variable, it is better to utilize a parameter for getting a value into a function or return a value to get it out.
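
The advice above can be sketched as follows: instead of changing a global variable inside the function, pass the value in as a parameter and hand the result back with return. The names `add_one` and `counter` are hypothetical and chosen only for illustration.

```python
# Hypothetical example: avoid a global by using a parameter and a return value
def add_one(value):
    # value is local to add_one; the caller's variable is not touched
    return value + 1

counter = 4
counter = add_one(counter)  # the caller decides what to do with the result
print(counter)
```

The function no longer depends on anything outside its own parameter list, which makes it easier to test and reuse.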


In [None]:
# global and local comparison

test = 4 #global variable

def sum(var):
    var += 1
    test = 7 # local variable
    print(f"is function, var {var}")
    print(f"is function, test {test}")
    
sum(5)
"""
There is an assignment to test, so this is a local variable in that block. 
The test variable outside the block is a global variable.
"""

print(test) # global test


is function, var 6
is function, test 7
4


In [10]:
"""
A variable (test in this case) can't be both local and global inside of a function.
"""

test = 4

# raises an UnboundLocalError: test is assigned below, so Python treats it as local everywhere in the function
def sum(var):
    print(f"is function, test {test}")
    var += 1
    test = 7 # local variable
    print(f"is function, var {var}")
    print(f"is function, test {test}")
    
sum(5)

UnboundLocalError: local variable 'test' referenced before assignment

Use the keyword __global__ to tell Python that a name inside the function refers to the global variable.

In [None]:
# declare test as global to avoid the UnboundLocalError

test = 4


def sum(var):
    global test
    print(f"is function, test {test}")
    var += 1
    test = 7 # assigns to the global variable test
    print(f"is function, var {var}")
    print(f"is function, test {test}")

    
sum(5)

- The __return__ statement allows your function to return a value (otherwise it returns the special value None)
- Once a function returns a value, it __stops executing immediately__


In [None]:
# comparison

def f_justprint(x):
    print(f"5 to the power of {x} is {5**x}.")

f_justprint(2)

print(f_justprint(2))    # Output: None

def f_withReturn(x):
    return 5**x

y = f_withReturn(2)
print(y)    # Output: 25

In [None]:
# Once a value is returned, the function stops executing immediately
# (note: this definition shadows Python's built-in min)
def min(x,y):
    if x<= y:
        return x
    else:
        return y

print(min(4,9))

z = min(3,1)
print(z)

4
1


- Although created differently from normal variables, functions are like any other kind of value.
- They can be __assigned__ and __re-assigned__ to variables.


In [14]:
def multiply(x,y):
    return x*y

x = 2
y = 3
product = multiply
print(product(x,y))
print(multiply(x,y))

6
6


- Functions can also be used as __arguments__ of other functions (Functional Programming)


In [18]:
def add(x,y):
    return x + y

def do_twice(func, x, y):
    return func(func(x,y),func(x,y))

a = 2
b = 3

print(do_twice(add, a, b))

10
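
Built-in functions accept functions as arguments in the same way. A minimal sketch, assuming a hypothetical list `words` and a hypothetical helper `shout`: `sorted` takes a `key` function, and `map` applies a function to every element.

```python
words = ["banana", "fig", "cherry"]  # hypothetical sample data

# len is passed as an argument (no parentheses) and used to compare the items
by_length = sorted(words, key=len)
print(by_length)

def shout(text):
    return text.upper() + "!"

# map calls shout once per element; list() collects the results
shouted = list(map(shout, words))
print(shouted)
```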


In [None]:
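# a parameter can have a default value (here an empty string)
def find_dependencies(project_dir=''):
    pass    # body not given in the class material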

__Recursion__ in Python refers to the process where a function calls itself directly or indirectly during its execution. This technique is often used to solve problems that can _be broken down into smaller, similar problems_. Recursive functions typically include a __base__ case that terminates the recursion and a __recursive__ case that applies the same procedure to a sub-problem.  
- Recursive Case: This is the part of the function where it calls itself to work on a smaller portion of the problem.
- Base Case: This is the condition under which the recursion ends. Without a base case, a recursive function would keep calling itself until Python raises a RecursionError (its guard against a stack overflow).  

__Use cases__: backtracking algorithms for constraint satisfaction problems such as __combinatorial optimization__ and decision making; tree traversal, where each recursive call processes one node of the tree and then calls itself on the children of that node.  
__Generally speaking__, recursive functions are powerful but need to be used wisely to avoid performance issues and a RecursionError if the __recursion depth__ becomes too large.
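
The tree-traversal idea can be sketched with a nested list, where every inner list plays the role of a sub-tree. The function name `nested_sum` and the sample data are hypothetical. The last line prints the recursion limit that Python enforces (commonly 1000 by default in CPython).

```python
import sys

def nested_sum(items):
    """Add up all numbers in a list that may contain further lists."""
    total = 0
    for item in items:
        if isinstance(item, list):
            # Recursive case: the element is itself a list (a sub-tree)
            total += nested_sum(item)
        else:
            # Base case: a plain number needs no further calls
            total += item
    return total

tree = [1, [2, 3], [4, [5, 6]]]  # hypothetical nested data
print(nested_sum(tree))

# Python stops runaway recursion with a RecursionError once this depth is reached
print(sys.getrecursionlimit())
```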

In [None]:
# Recursive case: n! = n × (n−1)! = n × (n−1) × (n−2) × … × 1 # factorial of n
# Base case: 0! = 1 (by definition)



def factorial(n):
    # Base case: if n is 0, return 1
    if n == 0:
        return 1
    # Recursive case: n times the factorial of n-1
    else:
        return n * factorial(n-1)

print(factorial(5)) 

### Modules
Modules are pieces of code (.py files consisting of functions and values) that someone else has written to do a common task, e.g. mathematical operations.
- Add __import module_name__ at the top of your code to import the whole module
- Add __from module_name import var__ at the top of your code to import only the name var
- Import modules or objects under a different name using the __as__ keyword
- Use __module_name.var__ to access functions and values with the name var in the module
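
The import styles listed above can be tried with the standard `math` module. This is only a small sketch; the alias `m` is an arbitrary choice.

```python
import math                  # import the whole module
from math import sqrt        # import a single name
import math as m             # import under a different name

print(math.pi)               # module_name.var
print(sqrt(16))              # an imported name is used directly
print(m.floor(2.7))          # the alias works like the original module name
```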

```python
# Path handling
import os
from pathlib import Path
import shutil
import zipfile
# Data handling
import pandas as pd
import numpy as np
import xarray as xr
import json
import string
from random import randint # work with random numbers
import time
import re
# API
import requests
# Image handling
import cv2
from PIL import Image, ImageDraw, ImageFont
# Visualization
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
# Machine learning
import sklearn
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import RandomForestRegressor
import xgboost as xgb
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
# Deep learning
import tensorflow as tf
import torch
from tensorflow import keras
from ultralytics import YOLO
# Web development
from flask import Flask, render_template, request, redirect, send_file, url_for, Response
```


### Exception handling
- Exceptions occur when something goes wrong due to incorrect code or input
- Without exception-handling, your program will terminate immediately in case of an exception
- To handle exceptions, use __try__ and __except__ statements: the try block contains code that may __throw an exception__, and the except block __defines what to do if a particular exception is thrown__


In [10]:
# Basic Syntax
try:
    # Code block where exception can occur
    result = 10 / 0
except ZeroDivisionError:
    # Code block that handles the exception
    print("You can't divide by zero!")
    

You can't divide by zero!


In [None]:
# Multiple Exceptions
try:
    x = int(input("Enter a number: "))
    result = 10 / x
except ZeroDivisionError:
    print("You can't divide by zero!")
except ValueError:
    # Handles value errors (e.g., if input is not an integer)
    print("Invalid input. Please enter a valid integer.")


In [None]:
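# All Exceptions
try:
    result = '10' + 2
except Exception as error:
    # General handler: catches any ordinary exception (here a TypeError)
    # A bare "except:" would also swallow KeyboardInterrupt and SystemExit
    print("An error occurred:", error)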

__else Block__: This runs if the code in the try block did not raise an exception.  
__finally Block__: This runs no matter what, and is often used to perform clean-up actions.

In [19]:
# else and finally Clauses
try:
    print("Trying to open the file...")
    file = open('file.txt', 'r')
    data = file.read()
except FileNotFoundError:
    print("File not found.")
else:
    print("File opened successfully.")
    file.close()
finally:
    print("This will run regardless of what happens.")


Trying to open the file...
File not found.
This will run regardless of what happens.
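
A common alternative to closing the file by hand is the `with` statement, which closes the file automatically even if reading fails. A minimal sketch, assuming a hypothetical helper `read_text` and a temporary file created only for the demonstration:

```python
import os
import tempfile

def read_text(path):
    """Return the file content, or None if the file does not exist."""
    try:
        # with closes the file automatically when the block ends
        with open(path, "r") as file:
            return file.read()
    except FileNotFoundError:
        print(f"File not found: {path}")
        return None

# Create a throwaway file so the example runs anywhere
with tempfile.TemporaryDirectory() as folder:
    existing = os.path.join(folder, "file.txt")
    with open(existing, "w") as file:
        file.write("hello")

    print(read_text(existing))
    print(read_text(os.path.join(folder, "missing.txt")))
```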


In [2]:
# integrate with function

def get_integer_from_user():
    while True:
        try:
            return int(input("Please enter a number: "))
        except ValueError:
            print("That was not a valid number. Please try again.")
        except KeyboardInterrupt:
            print("\nNo input taken. Exiting.")
            break
        finally:
            print("Attempt to input a number done.")


get_integer_from_user()

Attempt to input a number done.


4

### Web scraping
- We'll take a look at two useful packages for web scraping in Python: __beautifulsoup__ and __requests__.
- The steps: import these packages, get all of the HTML from our website, and make sure it is in a usable state.

- Import libraries  
__bs4__ is the module name of Beautiful Soup.  
If you haven't installed these two packages, use the conda install command we mentioned last week (or pip, as in the next cell).

In [24]:
# pip install bs4
# pip install requests

Collecting bs4
  Downloading bs4-0.0.2-py2.py3-none-any.whl (1.2 kB)
Installing collected packages: bs4
Successfully installed bs4-0.0.2
Note: you may need to restart the kernel to use updated packages.


In [12]:
from bs4 import BeautifulSoup
# BeautifulSoup parses messy HTML or XML into a structured, searchable object
import requests

- Specify the URL (where you're taking the HTML from)

HTML is the standard language used to create and design web pages. It uses tags (elements enclosed in angle brackets) to structure content. Most tags have an opening tag like __\<tag\>__ and a closing tag like __\</tag\>__, and they can contain text, other tags, or both.

In [13]:
# example: https://docs.dnb.com/partner/en-US/iso_country_codes
# assign it to a variable:
url = "https://docs.dnb.com/partner/en-US/iso_country_codes"

In [14]:
# send a "get" request to that url
# this returns a Response object: a static snapshot of the webpage
requests.get(url)

<Response [200]>

Possible responses:  
- __200__: OK – the request was successful, and the server returned the expected data.
- __204__: No Content – the request was successful, but the response has no body.
- __400__: Bad Request – the server could not understand the request.
- __403__: Forbidden – the request was valid, but the server is refusing action.
- __404__: Not Found – the requested resource was not found on the server.
- __500__: Internal Server Error – a generic error message when the server fails.
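
The meaning of a status code can be looked up without any network access through the standard library's `http.HTTPStatus`. The helpers `describe_status` and `is_success` are hypothetical names used only in this sketch.

```python
from http import HTTPStatus

def describe_status(code):
    """Return a short description such as '404 Not Found'."""
    status = HTTPStatus(code)
    return f"{status.value} {status.phrase}"

def is_success(code):
    # 2xx codes mean the request succeeded
    return 200 <= code < 300

for code in (200, 204, 400, 403, 404, 500):
    print(describe_status(code), "| success:", is_success(code))
```

With requests, the same number is available as `response.status_code`.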

In [16]:
# store the response in a variable
response = requests.get(url)
response.raise_for_status()  # raises an HTTPError for 4xx/5xx responses

# BeautifulSoup takes two arguments:
# what to parse (the HTML text of the page) and which parser to use
# assign the result to a variable
soup = BeautifulSoup(response.text, "html.parser")

In [17]:
soup

<!DOCTYPE html>
... (output truncated)

In [6]:
# check the type of the parsed object
type(soup)

# soup.prettify() returns the HTML as an indented string that is easier to read
# print(soup.prettify())

bs4.BeautifulSoup

The __\<div\>__ tag is a block-level element used for __grouping HTML elements__. It’s like a __box__ that can contain any other elements. It’s incredibly common and useful for structuring a web page.  
- __\<table\>__ is the container for the __table__ elements.
- __\<tr\>__ stands for __table row__.
- __\<td\>__ stands for __table data__ and represents a __cell__ in the table.
- __\<p\>__ stands for paragraph.

```html
<div class="content">
  <table>
    <tr>
      <td>Name</td>
      <td>Age</td>
    </tr>
    <tr>
      <td>John Doe</td>
      <td>30</td>
    </tr>
  </table>
</div>
```
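
To see how the nesting of `<table>`, `<tr>` and `<td>` turns into rows and cells, here is a sketch that uses only the standard library's `html.parser`, so it runs without installing anything. The class name `TableCellParser` is hypothetical. BeautifulSoup's find and find_all achieve the same with less code.

```python
from html.parser import HTMLParser

class TableCellParser(HTMLParser):
    """Collect the text of every <td>, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []        # one list of cell texts per <tr>
        self.in_cell = False  # True while we are between <td> and </td>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td":
            self.in_cell = True
            self.rows[-1].append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_cell = False

    def handle_data(self, data):
        # Only text inside a cell is kept; whitespace between tags is ignored
        if self.in_cell:
            self.rows[-1][-1] += data.strip()

html_text = """
<div class="content">
  <table>
    <tr><td>Name</td><td>Age</td></tr>
    <tr><td>John Doe</td><td>30</td></tr>
  </table>
</div>
"""

parser = TableCellParser()
parser.feed(html_text)
print(parser.rows)
```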

In [19]:
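# find returns the first matching tag, find_all returns a list of all matches
soup.find_all("table")
# results can be narrowed down by attributes such as "class"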


[<table class="TableStyle-dnb_tablestyle_2"><tbody><tr><td>WorldBase Country Name</td><td>WorldBase
     3 digit</td><td>WorldBase
     2 char</td><td>ISO
  Country Name</td><td>ISO
     3 digit</td><td>ISO
     3 char</td><td><span>ISO
     2 char</span></td><td>Comments</td></tr><tr><td>Aruba</td><td>034</td><td>AA</td><td>Aruba</td><td>533</td><td>ABW</td><td>AW</td><td> </td></tr><tr><td>Afghanistan</td><td>005</td><td>AF</td><td>Afghanistan</td><td>004</td><td>AFG</td><td>AF</td><td> </td></tr><tr><td>Angola</td><td>025</td><td>AO</td><td>Angola</td><td>024</td><td>AGO</td><td>AO</td><td> </td></tr><tr><td>Anguilla</td><td>027</td><td>AL</td><td>Anguilla</td><td>660</td><td>AIA</td><td>AI</td><td> </td></tr><tr><td>Albania</td><td>009</td><td>AB</td><td>Albania</td><td>008</td><td>ALB</td><td>AL</td><td> </td></tr><tr><td>Andorra</td><td>021</td><td>AN</td><td>Andorra</td><td>020</td><td>AND</td><td>AD</td><td> </td></tr><tr><td>Netherlands
  Antilles</td><td>525</td><td>NA</td><t

In [20]:
table = soup.find("table")

rows = table.find_all("tr")

In [12]:
[td.text.strip() for td in rows[0].find_all('td')]

['WorldBase Country Name',
 'WorldBase\n    3 digit',
 'WorldBase\n    2 char',
 'ISO\n Country Name',
 'ISO\n    3 digit',
 'ISO\n    3 char',
 'ISO\n    2 char',
 'Comments']

In [24]:
data = []
for row in rows:
    cells = row.find_all("td")
    cells = [td.text.strip() for td in cells]
    data.append(cells)

In [25]:
import pandas as pd

df = pd.DataFrame(data) # dataframe
df

Unnamed: 0,0,1,2,3,4,5,6,7
0,WorldBase Country Name,WorldBase\n 3 digit,WorldBase\n 2 char,ISO\n Country Name,ISO\n 3 digit,ISO\n 3 char,ISO\n 2 char,Comments
1,Aruba,034,AA,Aruba,533,ABW,AW,
2,Afghanistan,005,AF,Afghanistan,004,AFG,AF,
3,Angola,025,AO,Angola,024,AGO,AO,
4,Anguilla,027,AL,Anguilla,660,AIA,AI,
...,...,...,...,...,...,...,...,...
255,Yemen South,857,YM,Yemen,887,YEM,YE,
256,Yugoslavia,861,YU,Yugoslavia,891,YUG,YU,
257,South Africa,685,SA,South Africa,710,ZAF,ZA,
258,Zambia,869,ZA,Zambia,894,ZMB,ZM,


In [56]:
header = df.iloc[0]
content = df.iloc[1:]

In [57]:
content.columns = header
content

Unnamed: 0,WorldBase Country Name,WorldBase\n 3 digit,WorldBase\n 2 char,ISO\n Country Name,ISO\n 3 digit,ISO\n 3 char,ISO\n 2 char,Comments
1,Aruba,034,AA,Aruba,533,ABW,AW,
2,Afghanistan,005,AF,Afghanistan,004,AFG,AF,
3,Angola,025,AO,Angola,024,AGO,AO,
4,Anguilla,027,AL,Anguilla,660,AIA,AI,
5,Albania,009,AB,Albania,008,ALB,AL,
...,...,...,...,...,...,...,...,...
255,Yemen South,857,YM,Yemen,887,YEM,YE,
256,Yugoslavia,861,YU,Yugoslavia,891,YUG,YU,
257,South Africa,685,SA,South Africa,710,ZAF,ZA,
258,Zambia,869,ZA,Zambia,894,ZMB,ZM,


In [60]:
content.to_csv("isowithindex.csv", index = False)

In [62]:
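indexdf = pd.read_csv("isowithindex.csv")
# columns 1 and 2 hold the WorldBase 3-digit and 2-char codes
# (the codes were read as numbers, so leading zeros such as 034 are lost)
new = indexdf.iloc[:, [1, 2]]
new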

Unnamed: 0,WorldBase\n 3 digit,WorldBase\n 2 char
0,34.0,AA
1,5.0,AF
2,25.0,AO
3,27.0,AL
4,9.0,AB
...,...,...
254,857.0,YM
255,861.0,YU
256,685.0,SA
257,869.0,ZA


### OS module
The os module in Python is a standard utility module that provides a portable way of using operating system dependent functionality. It includes a wide range of functions to interact with the underlying operating system in several ways, like __manipulating file system paths__, __executing shell commands__, and __getting or setting the process environment__.
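
Reading and setting the process environment can be sketched as follows. The variable name `COURSE_DEMO` is hypothetical, and `HOME` may not be set on every system, which is why `.get` with a fallback is used.

```python
import os

# os.environ behaves like a dictionary of environment variables
home = os.environ.get("HOME", "not set")  # .get avoids a KeyError if the variable is missing
print("HOME:", home)

# Set a variable for this process (and any child processes it starts)
os.environ["COURSE_DEMO"] = "1"           # hypothetical variable name
print(os.environ["COURSE_DEMO"])

# os.sep is the path separator of the current operating system
print("Path separator:", os.sep)
```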

- Get Current Working Directory

In [27]:
import os
current_directory = os.getcwd()

print("Current Directory:", current_directory)

Current Directory: /Users/faye/Desktop/Winter_W2


- Change Directory

In [32]:
os.chdir('/Users/faye/Desktop/IMAGE')
print("Directory changed to:", os.getcwd())

Directory changed to: /Users/faye/Desktop/IMAGE


- List Files and Directories

In [64]:
files_and_dirs = os.listdir('.')
print("Files and directories in current directory:", files_and_dirs)

Files and directories in current directory: ['iso.csv', 'Lecture3_pre.pptx', 'gwasprocessing.ipynb', '.DS_Store', 'Lecture3.pptx', 'W3.ipynb', 'isowithindex.csv', '~$time schedule.xlsx', 'GWAS', 'Reading_before_exercises.pdf', 'MGT001437_Project', '~$Lecture3_pre.pptx', 'W3exercise_studentversion.ipynb', 'W3_demonstration.ipynb', 'W3exercise.ipynb', 'group.numbers', 'time schedule.xlsx']


- Make New Directory

In [37]:
os.mkdir('new_folder')


- Rename Files or Directories

In [65]:
import os
os.rename('iso.csv', 'sss.csv')

- Join Paths

In [None]:
full_path = os.path.join('directory', 'myfile.txt')

# os.path.join inserts the separator of the current operating system:
# macOS/Linux: /
# Windows: \
              
print("Full Path:", full_path)

- Split Path

In [None]:
path, filename = os.path.split('/my/directory/myfile.txt')
print("Path:", path, "Filename:", filename)

- Check if File Exists

In [None]:
if os.path.exists('/path/to/file'):
    print("File exists.")
else:
    print("File does not exist.")


- Get File Size

In [None]:
size = os.path.getsize('/path/to/file')
print("File size:", size, "bytes")
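
The individual calls above can be combined into one short walk-through. This sketch works inside a temporary directory so that it does not touch any real files; all folder and file names are hypothetical.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    # Make a new directory and build a file path inside it
    folder = os.path.join(base, "new_folder")
    os.mkdir(folder)
    file_path = os.path.join(folder, "myfile.txt")

    # Create the file, then check existence and size
    with open(file_path, "w") as file:
        file.write("hello")
    print(os.path.exists(file_path))
    print(os.path.getsize(file_path), "bytes")

    # Rename the file and list the folder content
    renamed_path = os.path.join(folder, "renamed.txt")
    os.rename(file_path, renamed_path)
    listing = os.listdir(folder)
    print(listing)

    # Split the path back into directory and file name
    directory, filename = os.path.split(renamed_path)
    print(filename)
```

The temporary directory and everything in it are deleted automatically when the `with` block ends.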