# <center><font color='#E20174'>**Introduction To Python at Magyar Telekom**</font></center>
### <center>Autumn, 2021</center>
### <center>Asztalos Áron, Duronelly Péter, Forgách Márton, Neszmélyi Zsolt</center>
# <center>Class 2</center>

## Class Topics
- Lambda functions
- Dictionaries
- I/O
- Working with JSON
- Datetime
- Modules
- Exception handling

## Lambda functions

A lambda function is a small anonymous function. A lambda function can take any number of arguments, but can only have one expression. It is created using the `lambda` keyword.

In [1]:
square = lambda x: x ** 2

In [2]:
square(2)

4
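As a quick sketch of the "any number of arguments, one expression" rule, the snippet below defines three lambdas. The names `add`, `greet` and `constant` are made up for this illustration.

```python
# A lambda can take several arguments...
add = lambda a, b: a + b

# ...default values work as in a normal function...
greet = lambda name, greeting="Hello": greeting + ", " + name + "!"

# ...and it can even take no arguments at all.
constant = lambda: 42

print(add(2, 3))
print(greet("Anna"))
print(constant())
```

In every case the body is a single expression whose value is returned automatically; statements such as `return` or `for` loops are not allowed inside a lambda.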

We use lambdas to keep code short when a function is needed only once, for example as a temporary helper. The same can be achieved with a normal function definition:

In [3]:
def square_def(x): 
    return x ** 2

In [4]:
square_def(2)

4
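A typical "used only once" situation is passing a small function to another function. The sketch below uses the built-ins `sorted()` and `map()`; the `employees` list is invented sample data.

```python
# Invented sample data: (name, age) pairs.
employees = [("Anna", 34), ("Bence", 28), ("Csilla", 41)]

# The lambda tells sorted() which part of each tuple to compare.
by_age = sorted(employees, key=lambda person: person[1])

# map() applies the lambda to every element; list() collects the results.
names_upper = list(map(lambda person: person[0].upper(), employees))

print(by_age)
print(names_upper)
```

Defining a named function with `def` for the sort key would work just as well; the lambda only saves a few lines.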

You can combine lambda functions with list comprehensions.

In [5]:
ls_numbers = list(range(10))

In [6]:
ls_numbers

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Let's square all the values from the list and add 1 to each element

In [7]:
f = lambda x: x**2 + 1
[f(x) for x in ls_numbers]

[1, 2, 5, 10, 17, 26, 37, 50, 65, 82]

Let's square and add one to each even number in the list

In [8]:
[f(x) for x in ls_numbers if x%2 == 0 ]

[1, 5, 17, 37, 65]

Square and add one to each even number in the list but return the odd numbers without transformation

In [9]:
[f(x) if x%2 == 0 else x for x in ls_numbers]

[1, 1, 5, 3, 17, 5, 37, 7, 65, 9]

### Exercise - lambda functions

Create a function that changes every element of the list that is divisible by 3 or by 7 to "boom", given the following list.

In [10]:
boom_list = list(range(100))

<details><summary>Click here for the solution</summary>

```python
boom_37 = lambda x: "boom"
[boom_37(x) if (x%3 == 0 or x%7 == 0) else x for x in boom_list]
```

</details>

## Dictionaries

Dictionaries are similar to lists, except that each element is a key-value pair. Dictionaries are written with curly brackets. A dictionary is a collection that is changeable and does not allow duplicate keys. The syntax for dictionaries is `{key1 : value1, key2 : value2, ...}`:

Compared to lists: in a list the values are collected linearly, so their order is fixed and important (because we refer to them by position). In a dictionary we do not refer to elements by position; instead each value has a label (key, lookup tag) that we use to refer to it.

key + value = item (returned as a tuple)

A dictionary can work like a statistical variable: often the values belonging to the keys are frequencies (value = n, key = response option; for example key = male, value = 542). This is possible because different keys may hold the same value.

Because values are looked up by key rather than by position, access is very fast. This relies on 'hashing' (see Wikipedia).

In [11]:
counts = {"bananas" : 1,
          "oranges" : 2,
          "apples" : 3,}

print(type(counts))
print(counts)

<class 'dict'>
{'bananas': 1, 'oranges': 2, 'apples': 3}


Strings, numbers, and tuples work as keys, and any type can be a value. Other types may or may not work as keys: a key must be hashable, which immutable types such as strings and tuples of immutable items are, while lists are not. Looking up a key that is not in the dict raises a `KeyError`. Use `in` to check whether the key is in the dict, or use `dict.get(key)`, which returns the value, or `None` if the key is not present; `dict.get(key, default)` lets you specify what to return in the not-found case.
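A minimal sketch of which types work as keys. The `grid` dictionary and its contents are made up for illustration.

```python
# Tuples are immutable, so they can be used as keys.
grid = {(0, 0): "origin", (1, 2): "point A"}
print(grid[(1, 2)])

# A missing key with get(): returns the given default instead of raising.
print(grid.get((5, 5), "unknown"))

# Lists are mutable and therefore not hashable: using one as a key fails.
error_message = None
try:
    bad = {[1, 2]: "a list as key"}
except TypeError as e:
    error_message = str(e)
    print("Lists cannot be keys:", error_message)
```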

In [12]:
{1: 'a', 2: [2,3]}

{1: 'a', 2: [2, 3]}

Select a dictionary item by key using square brackets.

In [13]:
counts['bananas']

1

Alternatively

In [14]:
counts.get('bananas')

1

To access dictionary elements:

In [15]:
for item in counts.items():
    print(item)

('bananas', 1)
('oranges', 2)
('apples', 3)


In [16]:
for key, value in counts.items():
    print('The number of', key, 'is', value,'.')

The number of bananas is 1 .
The number of oranges is 2 .
The number of apples is 3 .


In [17]:
for key in counts.keys():
    print(key)

bananas
oranges
apples


In [None]:
for key in counts.keys():
    print(key, counts[key])

In [None]:
for quantity in counts.values(): # You can name the loop variables anything you like.
    print(quantity)

To avoid key errors, you can simply check with an ``if`` that the key is present in the dictionary: 

In [None]:
if 'bananas' in counts:
    print(counts['bananas'])

In [None]:
if 'strawberries' in counts:
    print(counts['strawberries'])
else:
    print('No strawberries in the basket.')

What if we are referring to a **missing key**? It depends on how we are calling that key.

In [None]:
print(counts.get('strawberries'))

In [None]:
print(counts['strawberries'])

Define a **default value** for missing keys.

In [None]:
for key in ['bananas', 'oranges', 'strawberries', 'apples']:
    print(key, counts.get(key, 0))
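A common use of the default value is counting: `get(key, 0)` returns 0 the first time a key is seen, so no `if` check is needed. This is a small sketch with an invented `basket` list.

```python
# Invented sample data.
basket = ["apples", "bananas", "apples", "oranges", "apples"]

fruit_counts = {}
for fruit in basket:
    # Current count (0 if the fruit has not been seen yet) plus one.
    fruit_counts[fruit] = fruit_counts.get(fruit, 0) + 1

print(fruit_counts)
```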

As of Python 3.7, dictionaries are *ordered*: they keep the order in which the keys were inserted. In Python 3.6 and earlier, this order was not guaranteed. If you want to go through a dictionary sorted by key, use the built-in `sorted()` function.

In [None]:
for key in sorted(counts.keys()):
    print(key, counts[key])
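Sorting by value instead of by key is possible too. The sketch below assumes the same fruit counts as above (redefined here so the snippet runs on its own) and uses a lambda as the sort key.

```python
counts = {"bananas": 1, "oranges": 2, "apples": 3}

# items() yields (key, value) tuples; item[1] is the value.
pairs_by_value = sorted(counts.items(), key=lambda item: item[1], reverse=True)

# dict() keeps the insertion order, so the result stays sorted (Python 3.7+).
sorted_by_value = dict(pairs_by_value)

for key, value in sorted_by_value.items():
    print(key, value)
```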

<br> 

## I/O: Reading from and writing to files

### Reading

First you need to **open** the file. 

In [1]:
f = open('data/example.txt')
print(f)

<_io.TextIOWrapper name='data/example.txt' mode='r' encoding='cp1250'>


In [2]:
f.read()

"Hi, my name is Jim and I am from ÄŚeskĂ˝ Krumlov\nWhat's you name?"

Let's fix the encoding issues...

You can also pass **encoding information** to the `open()` function. These are the encodings you should know:
- **'utf-8'**: The most common encoding designed for backward compatibility with ASCII. UTF-8 is by far the most common encoding for the World Wide Web, accounting for over 97% of all web pages, and up to 100% for some languages, as of 2021.
- **'cp1250'** or **'Windows-1250'**: It is used under Microsoft Windows to represent texts in Central European and Eastern European languages that use Latin script, such as Polish, Czech, Slovak, Hungarian, Slovene, Serbo-Croatian (Latin script), Romanian (before 1993 spelling reform) and Albanian. It may also be used with the German language; German-language texts encoded with Windows-1250 and Windows-1252 are identical.
- **'iso8859_2'**: Part of the ISO/IEC 8859 series of ASCII-based standard character encodings, first edition published in 1987. It is informally referred to as "Latin-2". It is generally intended for Central or "Eastern European" languages that are written in the Latin script.

You can read about the encoding 'codecs' [here](https://docs.python.org/3/library/codecs.html).
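To see where garbled characters like the ones in the first read come from, here is a small sketch that works on strings only (no file needed): text is encoded as UTF-8 bytes and then decoded with the wrong codec.

```python
original = "Český Krumlov"

# Encoding turns text into bytes; "Č" and "ý" take two bytes each in UTF-8.
raw_bytes = original.encode("utf-8")
print(len(original), len(raw_bytes))

# Decoding the same bytes with the wrong codec produces garbled text.
garbled = raw_bytes.decode("cp1250")
print(garbled)

# Reversing the mistake recovers the original text.
repaired = garbled.encode("cp1250").decode("utf-8")
print(repaired)
```

The bytes never change; only the interpretation does. That is why passing the correct `encoding` to `open()` fixes the output.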

In [3]:
f = open('data/example.txt', encoding="utf-8")
f.read()

"Hi, my name is Jim and I am from Český Krumlov\nWhat's you name?"

You also need to **close** the file when you are done; otherwise it keeps using system resources, and other programs may be unable to access it.

In [4]:
f.close()

In [5]:
f = open('data/example.txt', encoding="utf-8")

f.readline() # This command only reads one line

'Hi, my name is Jim and I am from Český Krumlov\n'

In [6]:
f.readline()

"What's you name?"

In [7]:
f.close()

The best way to close a file is by using the `with` statement. This ensures that the file is closed when the block inside the with statement is exited. We don't need to explicitly call the `close()` method. It is done internally.

In [8]:
with open('data/example.txt', encoding="utf-8") as f:
    for line in f:                # remember to indent! 
        print(line)

Hi, my name is Jim and I am from Český Krumlov

What's you name?
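The following sketch checks that `with` really closes the file, even when an error occurs inside the block. It writes a throwaway file into a temporary directory so that it does not depend on the course's `data` folder; the file name `demo.txt` is arbitrary.

```python
import os
import tempfile

# Create a throwaway file so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("first line\nsecond line\n")

print(f.closed)  # the file was closed when the with block ended

# Even if an exception is raised inside the block, the file gets closed.
try:
    with open(path, encoding="utf-8") as g:
        first_line = g.readline()
        raise ValueError("something went wrong while processing")
except ValueError:
    pass

closed_after_error = g.closed
print(closed_after_error)
```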


### Writing

In [26]:
write_text = open('message.txt', 'w')

In [27]:
write_text.write('Hello Monthy! \nThis is my second Python class')

45

In [28]:
write_text.close()

Four ways to open a file:
- "r" - Read - Default value. Opens a file for reading, raises an error if the file does not exist
- "a" - Append - Opens a file for appending, creates the file if it does not exist
- "w" - Write - Opens a file for writing, creates the file if it does not exist and overwrites its content if it does
- "x" - Create - Creates the specified file, raises an error if the file already exists
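A short sketch of how the four modes behave, using a temporary file (the name `modes.txt` is arbitrary) so nothing in the working directory is touched.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "modes.txt")

with open(path, "w", encoding="utf-8") as f:  # "w" creates the file
    f.write("one\n")

with open(path, "w", encoding="utf-8") as f:  # "w" again discards the old content
    f.write("two\n")

with open(path, "a", encoding="utf-8") as f:  # "a" adds to the end
    f.write("three\n")

with open(path, "r", encoding="utf-8") as f:  # "r" reads the result
    content = f.read()
print(content)

# "x" refuses to touch a file that already exists.
x_failed = False
try:
    open(path, "x")
except FileExistsError:
    x_failed = True
print(x_failed)
```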

In [9]:
fruits=dict()
f = open('data/fruits.txt')
f.readline() # This reads only one line, the first one with X and Y. We are often not interested in the headline of a file
for line in f:
    mykey,myvalue=line.strip().split('\t') # strip() removes whitespace
    fruits[mykey]=myvalue
f.close() # Remember to close your file!!
print(fruits)

{'bananas': '3', 'apples': '5', 'orange': 'none'}


**Task: write code that asks the user which file to read. If the given file cannot be opened, tell the user to give a name that exists and can be opened, and that they can choose from these: example.txt, fruits.txt, message.txt. Keep asking until the user chooses one of these three.**

<details><summary>Click here for the solution</summary>

```python
behivando = input('Which file should we read? ')

# Keep asking until one of the three allowed names is given.
while behivando not in ("example.txt", "fruits.txt", "message.txt"):
    print('Please give a file name that exists and can be opened! You can choose from: example.txt, fruits.txt, message.txt.')
    behivando = input('Which file should we read? ')

# Open the chosen file; the with statement closes it again afterwards.
with open('data/' + behivando, 'r', encoding="utf-8") as f:
    print("Great job!")
    print(f)
```

</details>

## JSON

**JSON** is a syntax for storing and exchanging data. JSON is text, written with **JavaScript object notation**.

In [13]:
import json

In [10]:
JSON_string = '{"my_beer" : "Stella", "my_car" : "Alfa Romeo", "my_food": "hamburgers"}'

In [11]:
type(JSON_string)

str

In [14]:
my_dict = json.loads(JSON_string)

In [15]:
print(type(my_dict))
print(my_dict)

<class 'dict'>
{'my_beer': 'Stella', 'my_car': 'Alfa Romeo', 'my_food': 'hamburgers'}


In [16]:
for k, v in my_dict.items():
    print(k, ':', v)

my_beer : Stella
my_car : Alfa Romeo
my_food : hamburgers


Use JSON for config files.

In [17]:
with open('data/config.json') as cf:
    configurations = json.load(cf)

In [18]:
configurations

{'Output base directories': {'local_path': '/data/user/local_data',
  'hdfs_path': '/user/user/hdfs_data'},
 'Output options': {'is_local_csv_out': False,
  'is_local_parquet_out': False,
  'is_hdfs_parquet_out': True},
 'Database settings': {'database': 'impala', 'use_password': True}}

In [19]:
type(configurations)

dict

In [20]:
for key in configurations.keys():
    print(key, configurations[key])

Output base directories {'local_path': '/data/user/local_data', 'hdfs_path': '/user/user/hdfs_data'}
Output options {'is_local_csv_out': False, 'is_local_parquet_out': False, 'is_hdfs_parquet_out': True}
Database settings {'database': 'impala', 'use_password': True}


In [21]:
configurations['Output options']

{'is_local_csv_out': False,
 'is_local_parquet_out': False,
 'is_hdfs_parquet_out': True}

In [22]:
configurations['Database settings']

{'database': 'impala', 'use_password': True}

In [24]:
configurations['Database settings']['use_password']

True

In [23]:
if configurations['Database settings']['use_password']:
    print('Password needed!')

Password needed!


Dump dictionary as text.

In [25]:
type(json.dumps(configurations))

str

In [26]:
json.dumps(configurations)

'{"Output base directories": {"local_path": "/data/user/local_data", "hdfs_path": "/user/user/hdfs_data"}, "Output options": {"is_local_csv_out": false, "is_local_parquet_out": false, "is_hdfs_parquet_out": true}, "Database settings": {"database": "impala", "use_password": true}}'
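For config files a readable layout is usually nicer than one long line. The sketch below uses the `indent` parameter of `json.dumps()` and writes a file with `json.dump()`; the `settings` dictionary is a shortened, made-up version of the configuration above, and the file goes to a temporary directory.

```python
import json
import os
import tempfile

# Made-up, shortened configuration for the example.
settings = {"Database settings": {"database": "impala", "use_password": True}}

# indent=2 produces a multi-line, human-readable string.
pretty = json.dumps(settings, indent=2)
print(pretty)

# json.dump (without "s") writes straight to an open file.
path = os.path.join(tempfile.mkdtemp(), "config_copy.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump(settings, f, indent=2)

# Reading it back gives an equal dictionary.
with open(path, encoding="utf-8") as f:
    restored = json.load(f)
print(restored == settings)
```

Note how Python's `True` becomes JSON's `true` on the way out and turns back into `True` when loaded.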

## Datetime

Dates are not a built-in data type in Python, but the standard library's `datetime` module provides date and time functionality.

In [None]:
import datetime

In [None]:
D1 = datetime.date(1986, 4, 21)
T1 = datetime.time(12,0,0) # noon
DT = datetime.datetime(1986, 4, 21, 12, 15, 0)

# Typically you want to work with datetime because you can
# omit the time values and then it defaults to midnight.
D = datetime.datetime(1986,4,21)

In [None]:
print('D1:', D1)
print('T1:', T1)
print('DT:', DT)
print('D:', D)

Once you have a `datetime` object you can do fancy things with it:

In [None]:
print("The year was %d and the day was %d." % (D.year, D.day))

print("The day of the week was %d." % (D.weekday()))
print("(Monday = 0, ..., Sunday = 6.)")

In [None]:
D.utctimetuple()

In [None]:
Dnow = datetime.datetime.now()
print(Dnow)

We can format the time with, for example, `strftime()` (all format codes are listed [here](https://docs.python.org/2/library/datetime.html#strftime-strptime-behavior)):

In [None]:
#Dnow.strftime("%I:%M%p") 
Dnow.strftime("Date: %Y-%m-%d time: %H:%M") 

Arithmetic works on datetime objects.

In [None]:
dt = Dnow - D
print(type(dt))
print("There are %i days between then and now." % dt.days)

`timedelta` encodes time intervals. This allows us to do more operations:

In [None]:
interval = datetime.timedelta(days=100,hours=12) # 100.5 days

soon = datetime.datetime.now() + interval # addition!

interval_days =interval.days #if you want also the .5, need to hack a bit and do interval.total_seconds()/3600.0/24

print ("In %0.1f days it will be %s." % (interval_days, soon))
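The `.days` attribute holds only the whole days of an interval. A small sketch of two ways to get the fractional value as well:

```python
import datetime

interval = datetime.timedelta(days=100, hours=12)

# .days drops the 12 hours.
whole_days = interval.days

# total_seconds() covers the full interval; divide by seconds per day.
exact_days = interval.total_seconds() / (24 * 3600)

# Dividing one timedelta by another gives a plain number, too.
exact_days_alt = interval / datetime.timedelta(days=1)

print(whole_days, exact_days, exact_days_alt)
```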


In [None]:
ts1 = "2012-04-26"
ts2 = "January 5, 1978"

We can now parse a string into a time, given a second string that describes the time format. This uses a function called `strptime` (read it as **str**ing **p**arse **time**).

Here we go.

In [None]:
d1 = datetime.datetime.strptime( ts1, "%Y-%m-%d" )
print(d1)
print(d1 + datetime.timedelta(days=-7))

The string `"%Y-%m-%d"` encodes the timestamp format we were looking for. A four-digit year (`%Y`), a dash (`-`), a two-digit month number (`%m`), another dash, and then a day number (`%d`).

Now ts2 incorporates the name of a month, so that format string is a little different (`%B` means the full month name).

In [None]:
d2 = datetime.datetime.strptime( ts2, "%B %d, %Y" )
print(d2)
print(d2 - datetime.timedelta(days=-7))

There are many ways to build a format string. It is best to look them up in the documentation: http://docs.python.org/2/library/datetime.html#strftime-strptime-behavior

The counterpart of `strptime` is `strftime` (string format time), which does the opposite: it takes a `date` or `datetime` and returns it as a string in the given format.

In [None]:
s_before = "Jan 19, '89"
d = datetime.datetime.strptime(s_before, "%b %d, '%y") # parsing the input string in its specific format
s_after  = d.strftime("%Y-%m-%d") # writing the time in another format
print (s_before, "--->", s_after)

Datetime is extremely useful, because different data sources encode times in different ways. Some formats are easy for humans to read, but I like the `%Y-%m-%d %H:%M:%S` format (ISO 8601 style) because it _sorts nicely_: the alphabetical order of the strings is also their chronological order.
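A small sketch of what "sorts nicely" means in practice. The three dates are invented; the day-first format is included only for contrast.

```python
import datetime

# Invented sample dates, deliberately out of order.
moments = [
    datetime.datetime(2021, 9, 28, 14, 30, 0),
    datetime.datetime(2020, 12, 30, 8, 5, 0),
    datetime.datetime(2021, 1, 15, 23, 59, 59),
]
chronological = sorted(moments)

year_first = [m.strftime("%Y-%m-%d %H:%M:%S") for m in moments]
day_first = [m.strftime("%d/%m/%Y") for m in moments]

# Sorting the year-first strings as plain text matches the real time order...
year_first_ok = sorted(year_first) == [
    m.strftime("%Y-%m-%d %H:%M:%S") for m in chronological
]

# ...while the day-first strings end up in the wrong order.
day_first_ok = sorted(day_first) == [m.strftime("%d/%m/%Y") for m in chronological]

print(year_first_ok, day_first_ok)
```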

### Wrangling with timezones

In [None]:
D = datetime.datetime(2021,9,28,1,0,15)
D

Date and time objects may be categorized as “**aware**” or “**naive**” depending on whether or not they include timezone information. Our **D** variable is *naive*. 

In [None]:
print(D.tzinfo)
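To make the aware/naive distinction concrete, here is a sketch that uses only the standard library's `datetime.timezone` rather than `pytz`. The fixed offset of +02:00 is an assumption standing in for Budapest summer time; a fixed offset knows nothing about daylight saving changes.

```python
import datetime

naive = datetime.datetime(2021, 9, 28, 1, 0, 15)
aware = datetime.datetime(2021, 9, 28, 1, 0, 15, tzinfo=datetime.timezone.utc)

print(naive.tzinfo)  # no timezone attached
print(aware.tzinfo)

# Naive and aware datetimes cannot be ordered against each other.
comparison_error = None
try:
    naive < aware
except TypeError as e:
    comparison_error = str(e)
print(comparison_error)

# Converting an aware datetime changes the clock reading, not the moment.
plus_two = datetime.timezone(datetime.timedelta(hours=2))  # assumed fixed offset
shifted = aware.astimezone(plus_two)
print(shifted.hour, shifted == aware)
```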

In [None]:
# pytz is a third-party timezone package (install it with pip if needed)
import pytz

In [None]:
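# localize() attaches the timezone; astimezone() on a naive datetime would assume the system's local time
D_Bp = pytz.timezone("Europe/Budapest").localize(D)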

In [None]:
print(D_Bp.tzinfo)

In [None]:
D_Bp.hour

Convert to other timezones.

In [None]:
D_Bp.astimezone(pytz.timezone("Europe/London"))

In [None]:
D_Bp.astimezone(pytz.timezone("Europe/London")).hour

Everything about datetime is [here](https://docs.python.org/3/library/datetime.html).

## Modules

Reusing code is standard practice in programming. Writing functions and classes with a well-defined purpose and reusing them, instead of writing them all over again for new programs, makes life easier, debugging simpler, and your code more readable.

**Modules** are essentially code libraries. A Python module is defined in a Python file (with the file ending `.py`), and it can be made accessible to other Python modules and programs using the `import` statement.

In [27]:
import my_print_module

In [28]:
help(my_print_module.print_text)

Help on function print_text in module my_print_module:

print_text(text)
    Prints text to the console.
    
    Parameters
    ----------
    text: string
        The text to print.



In [29]:
dc_to_print = {'A': 23, 'B': 'WewishYouaMerryChristmas'}

In [30]:
my_print_module.print_anything(dc_to_print)

{'A': 23, 'B': 'WewishYouaMerryChristmas'}


There are multiple ways to import modules.

In [31]:
import my_calculator_module as mc

In [32]:
mc.add_two_numbers(14, 30)

44
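The import styles can be compared side by side with a standard-library module. This sketch uses `math` instead of the course's own modules so that it runs anywhere; the alias names `m` and `square_root` are arbitrary.

```python
import math                           # use as math.sqrt(...)
import math as m                      # same module under a shorter alias
from math import sqrt                 # import a single name directly
from math import sqrt as square_root  # single name with an alias

print(math.sqrt(16))
print(m.sqrt(16))
print(sqrt(16))
print(square_root(16))
```

All four calls reach the same function. Importing the module (optionally with an alias) keeps it obvious where a name comes from, which helps in longer programs.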

## Exception Handling (Try Except)

`Exceptions` signal errors in the code. Handling them lets you write constructs so that your program falls back to an alternative path if an error interrupts the normal run of your code.

The `try` block lets you test a block of code for errors. <br>
The `except` block lets you handle the error.<br>
The `else` block is to be executed if no errors were raised.<br>
The `finally` block lets you execute code, regardless of the result of the try- and except blocks.<br>
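The four blocks can be seen working together in a small sketch. `safe_divide` is a hypothetical helper written for this illustration; it records which blocks ran.

```python
def safe_divide(a, b):
    """Hypothetical helper: divide a by b and record which blocks ran."""
    steps = []
    try:
        result = a / b
    except ZeroDivisionError:
        # Runs only if the division failed.
        steps.append("except")
        result = None
    else:
        # Runs only if no error was raised in the try block.
        steps.append("else")
    finally:
        # Runs in every case.
        steps.append("finally")
    return result, steps

print(safe_divide(6, 3))
print(safe_divide(1, 0))
```

Catching a specific exception type such as `ZeroDivisionError` is usually preferable to a bare `except`, because unexpected errors then still surface.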

In [None]:
try:
    print("test")
    # generate an error: the variable test is not defined
    print(test)
except:
    print("Caught an exception")

To get information about the error, we can access the `Exception` class instance that describes the exception by using for example:

    except Exception as e:

In [None]:
try:
    print("test")
    # generate an error: the variable test is not defined
    print(test)
except Exception as e:
    print("The problem with our code is the following: " + str(e))

In [None]:
mc.add_two_numbers(3, 'b')

In [None]:
try:
    mc.add_two_numbers(3, 'b')
except Exception as e:
    print('We ran into this error: ' + str(e))

And what happens here? 

In [None]:
try:
    mc.devide_two_numbers(3, 'b') # This function already handles the error inside!
except Exception as e:
    print('We ran into this error: ' + str(e))
else:
    print('Everything went fine.')
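The source of `my_calculator_module` is not shown here, so the following is only a hypothetical sketch of what "handles the error inside" could look like. `divide_handled` is an invented stand-in, not the module's actual function.

```python
def divide_handled(a, b):
    """Hypothetical stand-in for a function that catches its own errors."""
    try:
        return a / b
    except TypeError as e:
        # The error is dealt with here and never reaches the caller.
        print("Handled inside the function:", e)
        return None

try:
    result = divide_handled(3, 'b')
except Exception as e:
    outcome = "except"
    print('We ran into this error: ' + str(e))
else:
    outcome = "else"
    print('Everything went fine.')
```

Because the exception is caught inside the function, the caller's `except` block is skipped and the `else` block runs, even though the division did not succeed. Under this assumption, the caller would have to check the return value to notice the problem.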