# Python Essentials
## DAT540 Introduction to Data Science
## University of Stavanger

#### Antorweep Chakravorty (antorweep.chakravorty@uis.no)

## Functions
- Method of code organization
- Groups statements that need to be run more than once, so that the same code can be reused instead of repeated
- Makes code more readable by giving a name to a group of Python statements
- Functions are declared with the *def* keyword
- Functions may have arguments that refer to the objects on which the computation is performed
- Function arguments may be optional with default values; in such cases, an argument assumes its default value if it is not provided when the function is called
- A function can return its result with the *return* keyword
- Multiple *return* statements may be present in a function, selected through control flow statements
- If Python reaches the end of the function without encountering a *return* statement, *None* is returned automatically
```python
def percentage_change(x, y=10):
    return (x - y) / x
a = 10
b = 20
result1 = percentage_change(a, b)
result2 = percentage_change(a)
result3 = percentage_change(a, y=b)
```
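
The points about multiple *return* statements and the implicit *None* can be sketched as follows. The function names `sign` and `log_value` are hypothetical and serve only as an illustration.

```python
def sign(x):
    # Several return statements, selected by control flow
    if x > 0:
        return 'positive'
    elif x < 0:
        return 'negative'
    return 'zero'


def log_value(x):
    # No return statement: Python returns None automatically
    message = 'value is {0}'.format(x)


print(sign(-3))
print(log_value(5))
```

Only one of the *return* statements in `sign` is ever reached per call, and `log_value` evaluates to *None* because its body ends without a *return*.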

 - **Namespaces, Scope, and Local Functions**
  - Functions can access variables in two different scopes: *global* and *local*
  - A variable scope is also called a *namespace*
  - Any variables that are assigned within a function are, by default, assigned to the local namespace
  - The local namespace is created when the function is called and is destroyed after the function finishes executing
  - Global objects can be accessed from within a local scope, but cannot be assigned to directly

In [None]:
def anyFName(b): 
    #...
    #  
    return (True, b)



anyFName(100)

In [None]:
a = []
def xyz():
  a = [] # Local
  a.append(1)
xyz() # Call function
print(a)  

In [None]:
a = []
def xyz():
  a.append(1) # Global
xyz() # Call function
print(a)

```python
a = 1
def x():
    a += 2
    print(a)
x()
print(a)
```
**Class Exercise**
- What happens in this case?

In [None]:
# ...

  - In order to assign to a variable outside the scope of the function, the *global* keyword needs to be used

In [None]:
a = 10
def xyz():
  global a
  a += 1 # Global
xyz() # Call function
print(a)

 - **Returning multiple values**
  ```python
  def f(returnType = ()):
    a, b, c = 1, 2, 3
    if isinstance(returnType, list):
      return [a, b, c]
    elif isinstance(returnType, set):
      return {a, b, c}
    elif isinstance(returnType, dict):
      return {'a': a, 'b': b, 'c': c}
    
    return a, b, c # Return as a tuple
  result1 = f()
  result2 = f([])
  ```
**Class Exercise**
- What happens in this case?
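
As a minimal sketch of how several returned values might be consumed by the caller, the example below assumes a hypothetical function `min_max_mean` that is not part of the notes above.

```python
def min_max_mean(values):
    # The comma-separated values are packed into a single tuple
    return min(values), max(values), sum(values) / len(values)


result = min_max_mean([2, 4, 6])         # one tuple object
lo, hi, mean = min_max_mean([2, 4, 6])   # unpacked into three names
print(type(result), lo, hi, mean)
```

The function still returns a single object (a tuple); unpacking on the left-hand side of the assignment is what makes it look like several return values.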

 - **Functions as Objects**
  - E.g.: Data Cleaning using a sequence of transformations through functions
  - Suppose we have a string and we want to prepare it for analysis. In order to do so, we need to perform a set of operations to ensure that each word is properly tokenized: stripping whitespace, removing punctuation symbols and standardizing on proper capitalization.
   - The *re* standard library module for regular expressions can be used

In [None]:
string = '''Hello WOrld, this is an examplE# string.    WE might have 
!Some  special$ charecter?? associated to some of the words here!!'''

In [None]:
# Checkout > https://www.debuggex.com/cheatsheet/regex/python

import re
def tokenize_string(string):
  tokens = string.split(' ')  
  tokens = list(filter(None, tokens)) # Remove empty strings  
  result = []
  for token in tokens:
    token = token.strip()
    token = re.sub('[!#?$.]', '', token) 
    # convert the 1st character in each word to Uppercase and remaining characters to lowercase 
    token = token.title() 
    result.append(token)
  return result

In [None]:
tokens = tokenize_string(string)
print(tokens)

  - Alternatively, we might want to create a list of operations and apply them sequentially to each string

In [None]:
import re
def remove_punctuation(value):
  return re.sub('[!#?$.]', '', value) 

def tokenize_string_with_function_objects(string):
  tokenize_ops = [str.strip, remove_punctuation, str.title]
  
  tokens = string.split(' ')  
  tokens = list(filter(None, tokens)) # Remove empty strings  
  result = []
  for token in tokens:
    for function in tokenize_ops:
      token = function(token) # Apply each function in tokenize_ops to the token, in order
    result.append(token)
  return result

In [None]:
# Alternatively, we can also use comprehensions.
pass
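
One possible comprehension-based version is sketched below. The helper `apply_ops` and the function name `tokenize_string_with_comprehension` are hypothetical, and the sketch uses `split()` without arguments, which splits on any whitespace (including newlines) and therefore behaves slightly differently from `split(' ')` followed by filtering.

```python
import re


def remove_punctuation(value):
    return re.sub('[!#?$.]', '', value)


def apply_ops(token, ops):
    # Apply each function in ops to the token, in order
    for function in ops:
        token = function(token)
    return token


def tokenize_string_with_comprehension(string):
    tokenize_ops = [str.strip, remove_punctuation, str.title]
    # split() without arguments never produces empty strings
    return [apply_ops(token, tokenize_ops) for token in string.split()]


print(tokenize_string_with_comprehension('Hello   WOrld, this is an examplE# string.'))
```

The comma after `World` survives because it is not in the character class passed to `re.sub`, just as in the loop-based versions.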


 - **Lambda Functions**
  - Python supports so-called anonymous or lambda functions, which are a way of writing functions consisting of a single expression whose result is the return value
  - Lambda functions are defined with the *lambda* keyword
  - They are convenient to use as arguments to transformation functions while analyzing or cleaning data sets
  
```python
def short_function(x):
    return x * 2

equiv_anon = lambda x: x * 2
```



  - Examples:
    - Transforming a list of integers
    

In [None]:
def apply_to_list(some_list, f):
  return [f(x) for x in some_list]

ints = [4, 0, 1, 5, 6]
apply_to_list(ints, lambda x: x * 2)

    - Transforming a list of strings

In [None]:
strings = ['foo', 'card', 'bar', 'aaaa', 'abab']
# We pass a lambda function as the key argument of the list's sort method
strings.sort(key=lambda x: len(set(x)))
strings

  - What do we get if we use this statement instead

```python
strings.sort(key=len)
```

In [None]:
strings.sort(key=len)
strings
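
The two sort keys can be compared side by side with `sorted`, which returns a new list instead of sorting in place. This is a sketch that starts from the original, unsorted list each time, so the order of ties may differ from the in-place cells above, which sort an already reordered list.

```python
strings = ['foo', 'card', 'bar', 'aaaa', 'abab']

# sorted() returns a new list and leaves the original untouched
by_distinct = sorted(strings, key=lambda x: len(set(x)))  # number of distinct letters
by_length = sorted(strings, key=len)                       # total number of letters

print(by_distinct)  # ['aaaa', 'foo', 'abab', 'bar', 'card']
print(by_length)    # ['foo', 'bar', 'card', 'aaaa', 'abab']
```

Python's sort is stable, so elements with equal keys keep their relative input order.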

 - **Generators**
  - Python has a consistent way to iterate over sequences, such as objects in a list or lines in a file
  - This is accomplished by means of the *iterator protocol*, a generic way to make objects iterable
  - An iterator is any object that will yield objects to the Python interpreter when used in a context like a *for loop*
  - Most functions expecting a list or list-like object, such as *min*, *max*, *sum*, *list*, and *tuple*, will also accept any iterable object

In [None]:
a = [1, 2, 3, 4, 5]
a_iterator = iter(a)
print(type(a_iterator))
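
What a *for loop* does with the iterator protocol can be sketched by hand with `iter` and `next`. This is an approximation for illustration, not the interpreter's actual implementation.

```python
a = [1, 2, 3]
a_iterator = iter(a)      # ask the list for an iterator

collected = []
while True:
    try:
        item = next(a_iterator)   # request the next element
    except StopIteration:         # raised when the iterator is exhausted
        break
    collected.append(item)

print(collected)
```

A *for loop* hides these steps: it calls `iter` once, calls `next` repeatedly, and stops quietly on `StopIteration`.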

 - A *generator* is a concise way to construct a new iterable object
 - Generators return a sequence of multiple results lazily, pausing after each one until the next one is requested
 - The *yield* keyword is used instead of return in a function to create a generator
 - When a generator function is called, no code is executed immediately
 - Generator code is executed only when elements are requested from it

In [None]:
def squares(n=10):
    print('Generating squares from 1 to {0}'.format(n ** 2))
    for i in range(1, n + 1):
        yield i ** 2

In [None]:
gen = squares()

In [None]:
next(gen)

In [None]:
list(gen) # we can also use a for loop to go through the generator list

  - **Generator expressions** are even more concise ways to create generators

In [None]:
gen = (x ** 2 for x in range(100))
gen

  - Generator expressions can be used instead of list comprehensions as function arguments in many cases

In [None]:
# Using list comprehensions
l_comp = [x ** 2 for x in range(100)]
# Using generator
l_gen = (x ** 2 for x in range(100))
print(type(l_comp), type(l_gen))
print(l_comp[:10])
print(l_gen[:10]) # TypeError: 'generator' object is not subscriptable
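
As a sketch of generator expressions used as function arguments, the example below passes them to `sum` and `dict`, and uses `itertools.islice` as one way to take the first elements of a generator, since slicing is not available.

```python
from itertools import islice

# The generator expression is consumed directly by sum(); no list is built
total = sum(x ** 2 for x in range(100))

# dict() accepts a generator of (key, value) pairs
squares_by_n = dict((i, i ** 2) for i in range(5))

# Generators cannot be sliced, but islice() takes the first n elements
l_gen = (x ** 2 for x in range(100))
first_ten = list(islice(l_gen, 10))

print(total)
print(squares_by_n)
print(first_ten)
```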

  - Generator vs Comprehensions

In [None]:
a_list = [x for x in range(5)] 
a_gen = (x for x in range(5)) 
print(a_list)
print(a_gen)
print('printing gen')
while True:
  print(next(a_gen)) # Raises StopIteration once the generator is exhausted

  - Generators are better than comprehensions in some cases, but not all
  - Generators are preferable when memory is limited, since elements are produced one at a time instead of being stored
  - Generators don't support indexing or slicing; it is therefore not possible to access an element directly, as in a[0]
  - Generators can't be concatenated with lists directly
  - Generators are better suited if we are interested in iterating only once
  - Comprehensions, in contrast, are better suited if we want to store and reuse the generated results
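
The trade-offs listed above can be sketched as follows. The size `n` is an arbitrary choice for illustration, and `sys.getsizeof` reports only the size of the container object itself, so the comparison is indicative rather than exact.

```python
import sys

n = 100_000
as_list = [x for x in range(n)]
as_gen = (x for x in range(n))

# The list stores every element; the generator only stores its current state
print(sys.getsizeof(as_list) > sys.getsizeof(as_gen))

first_pass = sum(as_gen)    # consumes the generator
second_pass = sum(as_gen)   # nothing left: the generator is exhausted
print(first_pass, second_pass)

# A generator can still be combined with a list by materializing it first
combined = [0] + list(x for x in range(1, 4))
print(combined)
```

The second call to `sum` returns 0 because a generator can be iterated only once, whereas the list could be summed repeatedly.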

 - The **itertools** module of the standard library has a collection of generators for many common data algorithms
  - Selected itertools functions

  <img src='./images/itertools.png' width=450>

In [None]:
# groupby - takes any sequence and a function, grouping consecutive elements 
#  in the sequence by the return value of the function
import itertools as it
first_letter = lambda x: x[0]
all_names = ['Alan', 'Adam', 'Wes', 'Will', 'Albert', 'Steven']
all_names.sort() # sorts the list in place. Returns None
gby_names = it.groupby(all_names, first_letter)# a generator
for letter, names in gby_names:
    print(letter, list(names))
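
Because `groupby` only groups *consecutive* elements, the sort step matters. The sketch below uses a shortened, hypothetical list of names to show what happens without it.

```python
import itertools as it

first_letter = lambda x: x[0]
names = ['Alan', 'Wes', 'Adam', 'Will']

# Without sorting, 'Alan' and 'Adam' are not adjacent, so 'A' appears twice
unsorted_groups = [(k, list(g)) for k, g in it.groupby(names, first_letter)]

# Sorting by the same key first yields one group per letter
sorted_groups = [(k, list(g))
                 for k, g in it.groupby(sorted(names, key=first_letter), first_letter)]

print(unsorted_groups)
print(sorted_groups)
```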

In [None]:
# combinations tool returns the **r** length subsequences of elements from the input iterable.
#  Combinations are emitted in lexicographic sorted order. So, if the input iterable is sorted, 
#  the combination tuples will be produced in sorted order.
import itertools as it
A = [1, 2, 3, 4]
combinations = it.combinations(A, 2)# a generator
print(list(combinations))

In [None]:
# permutations tool returns successive **r** length permutations of elements in an iterable.
#  If **r** is not specified or is None, then **r** defaults to the length of the iterable, 
#  and all possible full length permutations are generated.
#  Permutations are printed in a lexicographic sorted order. So, if the input iterable is sorted, 
#  the permutation tuples will be produced in a sorted order.
import itertools as it
A = [1, 2, 3, 4]
permutations = it.permutations(A, 2)# a generator
print(list(permutations))

In [None]:
# product tool computes the cartesian product of input iterables. 
#  It is equivalent to nested for-loops. 
#  For example, product(A, B) returns the same as ((x,y) for x in A for y in B).
import itertools as it
A = [1, 2]
B = [3, 4]
cartesian_product = it.product(A, B)# a generator
print(list(cartesian_product))

 - **Errors and Exception Handling**
  - Handling exceptions gracefully is an important part of building any robust program
  - The majority of standard data processing functions and modules are designed to work only with certain kinds of input

In [None]:
print(float('1.234'))
print(float('xyz'))
print('abc')

  - In order to fail gracefully or to perform a corrective action, we may enclose a statement within a *try/except* block

In [None]:
def convert(x):
  try:
    print(float(x))
  except ValueError:
    print('Check value')
  except:
    print('Check error')
  print('done')
  print('-' * 100)
  
  
convert('1.2d34')

- What happens when we execute this 
```python
print(convert([1,2,3]))
```

In [None]:
print(convert([1,2,3]))

In [None]:
# In order to handle this exception, we need to catch TypeErrors as well
def convert(x):
  try:
    return float(x)
  except (ValueError, TypeError):
    return 'Check value or type'
  
print(convert([1,2,3]))

  - We can enclose the exception types as a tuple to handle multiple exceptions
  - However, a choice should be made about which exceptions to catch. In the above case, a TypeError might indicate a legitimate bug in the program, as the input was neither a numeric nor a string value
  - In order to catch all types of exceptions, we use the *except* keyword without any exception type

```python
try:
  #Do something
  pass
except:
  # Do something
  pass
```
  - In certain cases we might not want to suppress an exception, but would still like some code to be executed regardless of whether the code in the *try* block succeeds or not. We can do this using *finally*
  - Similarly, we can have code that executes only if the *try* block succeeds using *else*
  
```python
f = open(path, 'w')
try:
  write_to_file(f)
except:
  print('write fail')
else:
  print('write success')
finally:
  f.close()
```
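
A runnable variant of the pattern above is sketched below. The function `write_report` and the temporary file are hypothetical stand-ins for `write_to_file` and `path`, and the sketch catches only `TypeError` instead of using a bare *except*.

```python
import os
import tempfile


def write_report(path, text):
    events = []
    f = open(path, 'w')
    try:
        f.write(text)            # raises TypeError if text is not a str
    except TypeError:
        events.append('write fail')
    else:
        events.append('write success')   # runs only if try succeeded
    finally:
        f.close()                # always runs
        events.append('closed')
    return events


# Hypothetical temporary file, used only for this illustration
tmp_path = os.path.join(tempfile.mkdtemp(), 'report.txt')
print(write_report(tmp_path, 'hello'))
print(write_report(tmp_path, 12345))
```

In both calls the *finally* branch runs last, so the file is closed whether or not the write succeeded.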

  - The amount of error stack information displayed can be controlled by using the **%xmode** magic command
  - The default mode is *Context*; *%xmode Plain* shows less detail (the same as the standard Python interpreter), and *%xmode Verbose* shows even more details

In [None]:
%xmode Plain
def convert(x):
  try:
    return float(x)
  except ValueError:
    return 'Check value'
  
print(convert([1,2,3]))

In [None]:
%xmode Verbose
def convert(x):
  try:
    return float(x)
  except ValueError:
    return 'check Value'
  
print(convert([1,2,3]))

## Files and OS
 - Files can be opened in Python for either reading or writing, using their relative or absolute path
```python
path = './data/file1.txt'
f = open(path) # read-only
```
- By default, Python opens files in read-only mode
- The file object *f* (in this case) can be iterated over like a generator to go through the lines:
```python
for line in f:
        pass
```
- Lines read by the Python interpreter are separated based on the end-of-line (EOL) marker (\\n)
- Any file that is opened for either a read or a write operation must be *explicitly* closed, so that its resources are released back to the OS
```python
f.close()
```
- Alternatively, we can use the *with* statement to open the file, perform operations on it and have it closed, without needing to close it explicitly
- Once the *with* block has finished executing, the file is closed automatically
```python
with open(path) as f:
        lines = [x.rstrip() for x in f]
```
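
That the file is closed on leaving the *with* block can be checked through the `closed` attribute of the file object. The sketch below writes a small hypothetical temporary file first, so that it does not depend on `./data/file1.txt` being present.

```python
import os
import tempfile

# Hypothetical temporary file so the sketch does not depend on ./data
path = os.path.join(tempfile.mkdtemp(), 'file1.txt')
with open(path, 'w') as f:
    f.write('first line\nsecond line\n')

with open(path) as f:
    lines = [x.rstrip() for x in f]   # strip the trailing EOL marker
    print(f.closed)                   # still open inside the block

print(f.closed)                       # closed once the block is left
print(lines)
```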

- Reading files in python can also be performed using methods such as *read*, *seek* and *tell*
 - *read*: returns a certain number of characters from the file and advances the file handle's position by the number of bytes read
 - *tell*: method gives the current position of the file handle
 - *seek*: changes the file position to the indicated byte in the file
- Writing files in python can be done by opening the file in write '*w*' mode and using the *write* or *writelines* methods.
- The *write* method writes a single string to the file, whereas the *writelines* method writes a sequence of strings to the file

```python
with open(path) as src, open('tmp.txt', 'w') as f:
    # Copy only non-blank lines (a blank line is just '\n', so its length is 1)
    f.writelines(x for x in src if len(x) > 1)
```
- Files can also be opened in binary mode for both read and write operations by using 'rb' or 'wb' when opening them

<img src='./images/fileop_modes.png' >

In [None]:
strings = ['Reading files in python can also be performed using methods such as read, seek and tell\n\n\n',
         '\tread: returns a certain number of characters from the file and advances the file\n\t\t handle\'s position by the number of bytes read\n\n',
         '\ttell: method gives the current position of the file handle\n\n',
         '\tseek: changes the file position to the indicated byte in the file\n']

with open('./data/tmp.txt', 'w') as f:    
    f.writelines(strings)

In [None]:
!cat ./data/tmp.txt

In [None]:
with open('./data/tmp.txt', 'r') as f:
  lines = f.readlines()

r_strings = ''.join(lines)
print(r_strings)
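
The *read*, *tell* and *seek* methods described earlier can be sketched on a small file. The temporary file and its ten ASCII characters are assumptions made only for this illustration.

```python
import os
import tempfile

# Hypothetical temporary file containing ten ASCII characters
path = os.path.join(tempfile.mkdtemp(), 'tmp.txt')
with open(path, 'w') as f:
    f.write('0123456789')

with open(path) as f:
    first = f.read(4)      # read four characters
    position = f.tell()    # current position of the file handle
    f.seek(0)              # jump back to the beginning of the file
    again = f.read(2)

print(first, position, again)
```

After `seek(0)` the next *read* starts from the beginning again, which is why `again` repeats the first characters of the file.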

- Important python file methods or attributes

<img src='./images/file_methods.png' >

Sometimes, your Jupyter notebook and data files might not be in the same folder. In such cases, you can either use the absolute path, which is the complete path to the data file, or change your current working directory.

In order to check the current working directory, we use the **os** module
```python
import os
```

From the **os** module, we use the function **os.getcwd()** to get the current working directory

In [None]:
import os
print(os.getcwd())

We can change the current working directory by providing the new path as a string argument to the **os.chdir('...')** function

In [None]:
os.chdir('/')
print(os.getcwd())
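
Since changing the working directory affects every relative path used afterwards, it can be useful to remember the original directory and switch back. The sketch below assumes a hypothetical temporary folder standing in for the data folder.

```python
import os
import tempfile

original_dir = os.getcwd()            # remember where we started
data_dir = tempfile.mkdtemp()         # hypothetical data folder

os.chdir(data_dir)
inside = os.getcwd()

os.chdir(original_dir)                # switch back when done

# Alternative: build an absolute path instead of changing directory
absolute = os.path.abspath(os.path.join('data', 'file1.txt'))

print(os.path.samefile(inside, data_dir))
print(os.getcwd() == original_dir)
print(os.path.isabs(absolute))
```

`os.path.abspath` resolves a relative path against the current working directory, so the file can be opened without calling `os.chdir` at all.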