# Homework 8

## Overview
* Absolute paths
* Relative paths
* File I/O
* Import
* PIP: Installing Python modules
* CSV (Comma Separated Values): reading and writing with Python's standard library
* Pandas

## Absolute paths

### Helpful resources
https://docs.python.org/3/library/os.path.html  

An _absolute path_ is the complete path to a file, starting at the root of the file system (e.g. `C:/` on Windows or `/` on macOS and Linux). Absolute paths usually only work on your own system, since other users have directory trees different from yours. It is recommended to **avoid absolute paths whenever possible!** Also, hardcoding paths (absolute or relative) tends to produce code that is hard to port or test.

In [1]:
# Execute to create a python file for demonstration
with open('demofile.py', 'w') as datafile:
    datafile.write("print('Hello, World!')")

In [3]:
absolute_path = 'C:/Users/denis/work/courses/focsp/focsp/homeworks/demofile.py'

with open(absolute_path, 'r') as text_file:
    data = text_file.read()
print(data)
eval(data)

print('Hello, World!')
Hello, World!


So reading this file via its absolute path works well, but only on my PC (it won't work on yours)!
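If you really need an absolute path, it is usually better to compute it at runtime than to hardcode it. A minimal sketch using `os.path` from the standard library, assuming it runs in the folder where `demofile.py` was created above (the printed paths differ on every machine):

```python
import os

relative_path = 'demofile.py'

# abspath() joins the current working directory with the relative path
absolute_path = os.path.abspath(relative_path)

print(os.getcwd())                   # current working directory (machine-specific)
print(absolute_path)                 # <current working directory>/demofile.py
print(os.path.isabs(relative_path))  # False
print(os.path.isabs(absolute_path))  # True
```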

## Relative Paths
Relative paths are interpreted starting from the current working directory (in Jupyter, usually the folder your notebook is in). Since they don't depend on where that folder lives on a particular machine, working with relative paths will save you a lot of nerves and time ;)  
_Hint:_ you can still go 'up' the directory tree by prefixing the path with `../` once for each level you want to go up.
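To see what a path with `..` actually points to, you can let `os.path` resolve it. A small sketch; the folder names `homeworks` and `data` and the file `songs.csv` are hypothetical and don't need to exist, because these functions only work on path strings:

```python
import os

# Hypothetical target: 'songs.csv' inside a folder 'data' that sits
# next to the current working directory (one level up, then down)
relative_path = os.path.join('..', 'data', 'songs.csv')

# normpath() cleans up a path string without touching the disk:
# 'homeworks/../data/songs.csv' collapses to 'data/songs.csv'
print(os.path.normpath(os.path.join('homeworks', '..', 'data', 'songs.csv')))

# abspath() shows where a relative path actually points to
print(os.path.abspath('..'))           # parent of the current working directory
print(os.path.abspath(relative_path))  # <parent>/data/songs.csv
```

On Windows, the printed paths use `\` instead of `/` as separator; `os.path.join` picks the right one for the current system.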

In [4]:
relative_path = 'demofile.py' # this file is in our current working directory

with open(relative_path, 'r') as text_file:
    data = text_file.read()
print(data)
eval(data)

print('Hello, World!')
Hello, World!


### Common Mistakes
One common mistake is trying to access a file via a path that does not exist.

In [5]:
relative_path = 'folder/demofile.py'  # there is no subfolder 'folder' in our working directory
with open(relative_path, 'r') as text_file:
    data = text_file.read()
print(data)
eval(data)

FileNotFoundError: [Errno 2] No such file or directory: 'folder/demofile.py'

Path does not exist!
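To avoid this error, you can check whether the path exists before opening it, or catch the `FileNotFoundError`. A minimal sketch; `read_text_or_none` is a hypothetical helper name:

```python
import os

missing_path = 'folder/demofile.py'

# Option 1: check first ("look before you leap")
print(os.path.exists(missing_path))  # False, unless you created that folder


# Option 2: just try to open the file and handle the error
def read_text_or_none(path):
    """Return the file's content, or None if the file does not exist."""
    try:
        with open(path, 'r') as text_file:
            return text_file.read()
    except FileNotFoundError:
        print(f'Path does not exist: {path}')
        return None


print(read_text_or_none(missing_path))  # prints the message, then None
```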

## File I/O

### Helpful resources
https://docs.python.org/3/tutorial/inputoutput.html#reading-and-writing-files  
https://docs.python.org/3/library/functions.html#open

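Python's built-in function `open()` opens a file and returns a file object, which you can use to read from or write to the file. The second argument, the *mode*, determines what you can do with it:

* `'r'` - Open for reading (the default; raises `FileNotFoundError` if the file doesn't exist)
* `'w'` - Open for writing (creates the file, or deletes its previous content if it already exists)
* `'a'` - Open for appending (creates the file, or writes to the end of its existing content)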

Mode | Methods
-------- | --------
`'r'` | `read()`, `readline()`, `readlines()`
`'w'` | `write()`, `writelines()`
`'a'` | `write()`, `writelines()`


In [6]:
# This creates a file with some content for the examples below
with open('shoppinglist.txt', 'w') as datafile:
    datafile.write('noodles\nbread\nmilk\ncheese\napples\n')

Let's see what happens if we want to add 'rice' with `'w'` mode:

In [7]:
with open('shoppinglist.txt', 'w') as datafile:
    datafile.write('rice\n')

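That didn't work out well: our old list got deleted, and the file now only contains `rice`. Let's restore the original list first; then `'a'` will do a better job:

In [8]:
# restore the original shopping list
with open('shoppinglist.txt', 'w') as datafile: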
    datafile.write('noodles\nbread\nmilk\ncheese\napples\n')

In [9]:
with open('shoppinglist.txt', 'a') as datafile:
    datafile.write('rice\n')

### Reading the entire content of a file
https://docs.python.org/3/tutorial/inputoutput.html#methods-of-file-objects

The `read(size)` method reads at most `size` characters (in text mode) and returns them as a string. `size` is optional: if it is omitted or negative, the method reads and returns the entire content of the file at once.
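A short sketch of `read(size)`. It writes its own small file `read_demo.txt` (a hypothetical name), so our shopping list stays untouched. Each call continues where the previous one stopped, and at the end of the file `read()` returns an empty string:

```python
# Create a small demo file (hypothetical file name)
with open('read_demo.txt', 'w') as demo_file:
    demo_file.write('noodles\nbread\n')

with open('read_demo.txt', 'r') as demo_file:
    first_part = demo_file.read(7)  # at most 7 characters
    rest = demo_file.read()         # everything that is left
    at_end = demo_file.read()       # nothing left to read

print(repr(first_part))  # 'noodles'
print(repr(rest))        # '\nbread\n'
print(repr(at_end))      # ''
```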

In [10]:
# read the whole file
with open('shoppinglist.txt', 'r') as shoppinglist_file:
    content = shoppinglist_file.read()

print(content)

noodles
bread
milk
cheese
apples
rice



In [11]:
# content  -> this is a string with newline characters ('\n')
print(type(content))
print(repr(content)) # printable representation of the given object

<class 'str'>
'noodles\nbread\nmilk\ncheese\napples\nrice\n'


As you can see, `read()` returns a single string in which each line break is represented by `\n`. Alternatively, you can read a file line by line by iterating over the file object. In that case, each line is a separate string that still ends with `\n`.

In [12]:
# Reading a file line by line (i.e. iterate over the file):
with open('shoppinglist.txt', 'r') as shoppinglist_file:
    for line in shoppinglist_file:
        print(type(line))
        print(repr(line))
        print(line)

<class 'str'>
'noodles\n'
noodles

<class 'str'>
'bread\n'
bread

<class 'str'>
'milk\n'
milk

<class 'str'>
'cheese\n'
cheese

<class 'str'>
'apples\n'
apples

<class 'str'>
'rice\n'
rice
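Because every line still ends with `\n`, `print(line)` adds a second line break, which explains the empty lines in the output above. A common pattern is to strip the newline while reading. A minimal sketch with its own demo file `lines_demo.txt` (a hypothetical name):

```python
# Create a small demo file (hypothetical file name)
with open('lines_demo.txt', 'w') as demo_file:
    demo_file.write('noodles\nbread\nmilk\n')

items = []
with open('lines_demo.txt', 'r') as demo_file:
    for line in demo_file:
        # rstrip('\n') removes only the trailing newline character
        items.append(line.rstrip('\n'))

print(items)  # ['noodles', 'bread', 'milk']

# Alternative: read everything and split at the line breaks
with open('lines_demo.txt', 'r') as demo_file:
    print(demo_file.read().splitlines())  # ['noodles', 'bread', 'milk']
```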



The method `readlines()` reads all lines of a file at once and returns them as a list of strings. Each string keeps its newline character (`\n`) at the end (only the last line may lack it, if the file doesn't end with a newline).

In [13]:
# reading with readlines()
with open('shoppinglist.txt', 'r') as shoppinglist_file:
    content = shoppinglist_file.readlines()
    print(content)
    print('Length of content is: ', len(content))

['noodles\n', 'bread\n', 'milk\n', 'cheese\n', 'apples\n', 'rice\n']
Length of content is:  6


In [14]:
print(type(content[0]))

<class 'str'>


`f.write(string)` writes the content of a string to a file and returns the number of characters written. Unlike `print()`, it does not add a newline, so include `\n` yourself where needed.

In [15]:
# writing to a file ('a' appends, so running this cell again adds the ten lines once more)
with open('some_new_file.txt', 'a') as some_file:
    for i in range(10):
        line = f'Line number {i+1}\n'
        some_file.write(line)
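`writelines()` from the table above writes every string of an iterable, but, like `write()`, it does not add newlines for you. A short sketch with a hypothetical file name `writelines_demo.txt`:

```python
items = ['noodles', 'bread', 'milk']

with open('writelines_demo.txt', 'w') as demo_file:
    # add the '\n' ourselves, otherwise everything ends up on one line
    demo_file.writelines(item + '\n' for item in items)

with open('writelines_demo.txt', 'r') as demo_file:
    content = demo_file.read()

print(repr(content))  # 'noodles\nbread\nmilk\n'
```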

### Common Mistakes
One common mistake is opening a file for reading but trying to write to the file instead.

In [16]:
with open('shoppinglist.txt', 'r') as shoppinglist_file:
    shoppinglist_file.write('rice\n')

UnsupportedOperation: not writable

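Another common mistake is opening a file for writing but trying to read from it. Careful: opening a file in `'w'` mode empties it immediately, so the cell below also deletes the content of our shopping list!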

In [17]:
with open('shoppinglist.txt', 'w') as shoppinglist_file:
    print(shoppinglist_file.readlines())

UnsupportedOperation: not readable

## Import
Import/Modules:
+ https://docs.python.org/3/tutorial/modules.html
+ https://realpython.com/python-modules-packages/#python-modules-overview<br>

Standard Library: https://docs.python.org/3/library/index.html

Imagine you are working on a script which includes a variety of functions to solve common tasks, for instance functions to perform mathematical matrix operations or functions to visualise huge amounts of data. Wouldn't it be convenient if we could use those functions from another Python script? Well... since we are too lazy to rewrite everything... yeah, it would!

Modules - it's your time to shine🌞<br>
Modules are nothing more than <ins>.py-files</ins> containing different kinds of components, e.g. functions, classes or variables, which can be made available in any other Python script using the `import` statement. And yes, you can import your own Python scripts as well! Besides that, Python comes with an extensive collection of modules, known as the <ins>Standard Library</ins>. You can also use 3rd-party packages (numpy, pandas, scipy, matplotlib, ...), but you will have to install them first.
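Importing your own script works the same way, as long as Python can find the file: it searches the folders listed in `sys.path`, which in Jupyter usually includes the notebook's folder. A self-contained sketch that writes a tiny hypothetical module `my_helpers.py` (with a hypothetical function `greet`) into a temporary folder and adds that folder to `sys.path`; normally you would just place the file next to your notebook and skip that step:

```python
import importlib
import os
import sys
import tempfile

# Write a tiny, hypothetical module file my_helpers.py into a new temporary folder
module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, 'my_helpers.py'), 'w') as module_file:
    module_file.write("def greet(name):\n    return f'Hello, {name}!'\n")

# Python searches the folders in sys.path for modules to import
sys.path.insert(0, module_dir)
importlib.invalidate_caches()  # make sure the freshly written file is found

import my_helpers  # the module name is the file name without '.py'

print(my_helpers.greet('World'))  # Hello, World!
```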

### How to `import` a whole module

**Python-Syntax:** 
```python
import module_name
```

It doesn't get easier than that: the import statement is followed by the name of the module. Now you can use all functions from the `module_name` module by prefixing their name with its <ins>namespace</ins> `module_name.` Usually, all import statements are placed at the top of the script to keep the code tidy and clear.

**Note:** `random` is part of the python standard-library


In [1]:
import random

In [2]:
random.randint(20,30)  # returns a random integer N with 20 <= N <= 30 (both ends included)

28

In [3]:
random.choice("ABCDE") # returns a random element of a non-empty sequence

'D'

But what happens if we import a module inside a function? The module name becomes part of the function's local namespace, so it is not available outside this particular function.

**Note:** Most Python style guides (e.g. PEP 8) recommend placing imports at the top of the script, since importing somewhere else (like inside a function) rarely brings any benefit.

In [4]:
def generate_some_numbers(amount):
    import random
    numbers = []
    for n in range(amount):
        numbers.append(random.randint(0, 100))
    return numbers
print(generate_some_numbers(10))

random.randint(100,150)  # NameError if random were only imported inside the function; here it works because of the global import in In [1]

[22, 91, 8, 47, 14, 80, 54, 35, 22, 91]


102

### How to `import` specific contents `from` a module

**Python-Syntax:**
```python
from module_name import content_name1, content_name2, ...
``` 

Instead of importing a whole module, we can import specific content, e.g. only the functions we really need. This allows us to use these functions without the namespace prefix. Keep in mind that multiple names are separated by commas (`,`).

In [5]:
from statistics import mean, median

In [6]:
list_of_random_numbers = [random.randint(0,100) for _ in range(111)]
print(mean(list_of_random_numbers))
print(median(list_of_random_numbers))

52.24324324324324
58


**Pitfall:**
```python
from statistics import mean
from numpy import mean
```
Always keep an eye on which names you import from different modules. In this example, two imported functions have the same name (a name collision). Python simply rebinds the name, so `mean` refers to the function imported last, here the `mean` function of the numpy module. <ins>The last import always wins!</ins>
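The same thing can happen with the standard library alone. In this sketch, `math.sqrt` and `cmath.sqrt` share a name, and whichever is imported last is the one that `sqrt` refers to. Keeping the module names as prefixes avoids the ambiguity:

```python
import cmath
import math

from math import sqrt
from cmath import sqrt  # same name: this import replaces math's sqrt

print(sqrt(-1))        # 1j -> it is cmath.sqrt, which handles negative numbers

# Unambiguous: keep the module names as namespace prefixes
print(cmath.sqrt(-1))  # 1j
try:
    math.sqrt(-1)      # math.sqrt only accepts non-negative numbers
except ValueError:
    print('math.sqrt(-1) raised a ValueError')
```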



### How to `import` a module `as` you like

**Python-Syntax:** 
```python
import module_name as new_module_name_in_namespace
from module_name import component as new_component_name_in_namespace
```

Modules and packages can be renamed on import to keep code more succinct. Most widely-used packages have an established abbreviation. Stick to it to make your code readable for others!

In [7]:
import numpy as np # np is the established shorthand for numpy

Now we can use `np` as the namespace prefix for `numpy` components.

In [9]:
a = np.array([1,2,3,4,5])
b = np.array([6,7,8,9,10])
a + b # numpy array operations work elementwise

array([ 7,  9, 11, 13, 15])

### How to `import *` all names from a module
It is possible to import all (public) names from a given module at once. Then no namespace prefix is necessary for any of them. **Usually this is a bad idea, as it leads to poorly readable code and name conflicts.** So use explicit imports and namespace prefixes as shown above!

In [10]:
import random # let's import random, just like before
random # -> shows the module and the file it was loaded from

<module 'random' from 'C:\\Users\\denis\\AppData\\Local\\Programs\\Python\\Python310\\lib\\random.py'>

In [11]:
from numpy import * # please don't do this
random # -> this is not the standard-library random anymore: it was replaced by numpy's random submodule

<module 'numpy.random' from 'C:\\Users\\denis\\AppData\\Local\\Programs\\Python\\Python310\\lib\\site-packages\\numpy\\random\\__init__.py'>

In [12]:
sum([1, 2, 3, 4, 5]) # this is not the builtin sum-function, but numpy's

15

## PIP: Installing Python Modules
https://docs.python.org/3/installing/index.html

While the Python standard library is *extensive*, there is an even bigger collection of 3rd-party packages available. `pip` is one of several tools to install such additional Python packages. It comes with the default Python installation, so you can use it right away. Note that pip is a package itself, but it is usually used as a command line tool, not within a `.py` script or `.ipynb` notebook. You most likely already used pip when installing JupyterLab. If you are using macOS or Linux, replace `pip` with `pip3` in the following commands.

To install a package with pip, simply open up a terminal and enter
> `pip install <package name>`

e.g.
> `pip install numpy`.

Multiple packages can be installed at once by just chaining names (separated with spaces) like this:
> `pip install numpy matplotlib pandas scipy`

Depending on your computer's settings and environment setup, you might run into permission issues. Installing new packages just for the current user (instead of system-wide) usually fixes those:
> `pip install --user numpy`

Just like python itself, its packages exist in different versions, having different features and dependencies. This often leads to issues when trying to run code on another computer. Best practice would be to create a *virtual environment* ([Docs](https://packaging.python.org/tutorials/installing-packages/#creating-virtual-environments)) for each project, using *requirements files* ([Docs](https://pip.pypa.io/en/latest/user_guide/#requirements-files)) - or using Docker containers. You are welcome to do so, but for now it's sufficient to just install packages globally ;)
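To check from inside Python whether a package is already installed (for example before running `pip install`), the standard library's `importlib` can help. A minimal sketch; `describe_package` is a hypothetical helper name, and keep in mind that the import name of a package can differ from its pip name (e.g. `pip install scikit-learn`, but `import sklearn`):

```python
import importlib.util
from importlib import metadata


def describe_package(name):
    """Report whether a module can be imported and, if known, its version."""
    if importlib.util.find_spec(name) is None:
        return f"{name} is not installed (try: pip install {name})"
    try:
        return f"{name} {metadata.version(name)} is installed"
    except metadata.PackageNotFoundError:
        # e.g. standard-library modules have no pip version information
        return f"{name} is available (no version information)"


print(describe_package('csv'))
print(describe_package('numpy'))
```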


<center><img src="https://hackernoon.com/hn-images/1*ookfwogTLx_1qhHaiFJoJw.png" width="400" /></center>

## CSV (Comma Separated Values) - reading and writing with Python's standard library

https://docs.python.org/3/library/csv.html  

If you are working with large amounts of data, you will soon come across so-called `.csv` files. In those files, the values in each row are separated by commas (`,`), hence the abbreviation CSV (comma separated values). You can find them in many places: online datasets, exports from Excel or Google Sheets, exports from SQL databases, ...<br>
Luckily, Python's standard library offers a `csv` module to read and write tabular data in CSV format.

**File-Structure:**

```
column1,column2,column3
value1,value2,value3
value1,value2,value3
value1,value2,value3
```

1.) The **first row** of a csv-file usually contains the column **headers**<br>
2.) All **following rows** contain the **values**

We can import the csv-module in the same fashion as we learned before:

In [13]:
import csv

### CSV - Read from a given csv-file

Let's have a look at a csv-file containing the most streamed songs of the year 2021, using `csv.reader`:

In [17]:
with open("./most_streamed_songs.csv", 'r', newline='') as csv_file:  # newline='' is recommended by the csv docs
    streamed_songs = csv.reader(csv_file, delimiter=',')
    
    header = next(streamed_songs)
    print(f"Header of the CSV: {header}")

    first_data_row = next(streamed_songs)
    print(f"First data row: {first_data_row}")
    
    # alternatively (iterate through whole csv-file and store the data in a list of lists):
    # data = []
    # for row in streamed_songs:
    #     print(row)
    #     data.append(row)
    
print(streamed_songs)
# print(data)

Header of the CSV: ['songs', 'artist', 'album', 'streams_in_millions', 'publish_year']
First data row: ['Shape of You', 'Ed Sheeran', 'Divide', '2.96', '2017']
<_csv.reader object at 0x0000012BF0D0A6E0>


Well done: each row of the file is returned as a list of strings, which can be used for further tasks.<br>
In some cases it might be handy to read the data as dictionaries using `csv.DictReader`. It returns one dictionary per row, with the column names from the header row as keys and the row's values as values.

In [24]:
with open("./most_streamed_songs.csv", 'r', newline='') as csv_file:
    streamed_songs = csv.DictReader(csv_file, delimiter=',')
    
    for row in streamed_songs:
        print(row)

{'songs': 'Shape of You', 'artist': 'Ed Sheeran', 'album': 'Divide', 'streams_in_millions': '2.96', 'publish_year': '2017'}
{'songs': 'Blinding Lights', 'artist': 'The Weeknd', 'album': 'After Hours', 'streams_in_millions': '2.613', 'publish_year': '2019'}
{'songs': 'Dance Monkey', 'artist': 'Tones and I', 'album': 'The Kids Are Coming', 'streams_in_millions': '2.395', 'publish_year': '2019'}
{'songs': 'Rockstar', 'artist': 'Post Malone', 'album': 'Beerbongs & Bentleys', 'streams_in_millions': '2.291', 'publish_year': '2017'}
{'songs': 'One Dance', 'artist': 'Drake', 'album': 'Views', 'streams_in_millions': '2.16', 'publish_year': '2016'}
{'songs': 'Someone You Loved', 'artist': 'Lewis Capaldi', 'album': 'Breach', 'streams_in_millions': '2.143', 'publish_year': '2018'}
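Note that the csv module returns every value as a string, e.g. `'2.96'` and `'2017'` above, so you have to convert numbers yourself before calculating with them. A sketch that uses a few rows from the output above via `io.StringIO` (an in-memory "file"), so it runs without the csv file:

```python
import csv
import io

# A few rows from the output above, as an in-memory "file"
csv_text = """songs,artist,album,streams_in_millions,publish_year
Shape of You,Ed Sheeran,Divide,2.96,2017
Blinding Lights,The Weeknd,After Hours,2.613,2019
Dance Monkey,Tones and I,The Kids Are Coming,2.395,2019
"""

songs = []
for row in csv.DictReader(io.StringIO(csv_text)):
    # convert the numeric columns from str to float/int
    row['streams_in_millions'] = float(row['streams_in_millions'])
    row['publish_year'] = int(row['publish_year'])
    songs.append(row)

total_streams = sum(song['streams_in_millions'] for song in songs)
print(round(total_streams, 3))  # 7.968

songs_2019 = [song['songs'] for song in songs if song['publish_year'] == 2019]
print(songs_2019)  # ['Blinding Lights', 'Dance Monkey']
```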


### CSV - Writing into a (given) csv-file

But the magic✨ doesn't stop at reading csv-files: it is also possible to write or append to csv-files. We simply use `csv.writer` combined with its `.writerow()` method, which takes a list (containing the new data) as its single argument. Make sure to choose the right mode in the `with open(...)` statement:

* `a` for appending
* `w` for writing (clears file before writing-procedure)

If the file you try to access doesn't exist, python will create a new file with the given file name.


In [21]:
new_data_row = ["Someone You Loved","Lewis Capaldi","Breach",2.143,2018]
headers = ["songs","artist","album","streams_in_millions","publish_year"]  # must match the file's header row

# newline='' is recommended by the csv docs, so the writer controls the line endings
with open("./most_streamed_songs.csv", 'a', newline='') as csv_file: 
    streamed_songs = csv.writer(csv_file, delimiter=',')
    # streamed_songs.writerow(headers)  # only needed for a new, empty file
    streamed_songs.writerow(new_data_row)

It also works with dictionaries using `csv.DictWriter`. Instead of a list, its `.writerow()` method takes a dictionary with the column headers as keys and the new data as values. The `fieldnames` argument defines the order of the columns.

In [23]:
new_data_row = {"songs":"Someone You Loved","artist":"Lewis Capaldi","album":"Breach","streams_in_millions":2.143,"publish_year":2018}

with open("./most_streamed_songs.csv", 'a', newline='') as csv_file:
    streamed_songs = csv.DictWriter(csv_file, fieldnames = headers)
    # streamed_songs.writeheader()  # only needed for a new, empty file
    streamed_songs.writerow(new_data_row)
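One reason to use the `csv` module instead of joining strings with commas yourself: it automatically quotes values that contain the delimiter, so they survive a round trip. A sketch that writes to an in-memory buffer (`io.StringIO`) instead of a file, using `writerows()` to write several rows at once (the example rows are made up for illustration):

```python
import csv
import io

rows = [
    ['text', 'number'],
    ['Hello, World!', 1],  # contains the delimiter ','
    ['plain', 2],
]

buffer = io.StringIO(newline='')
writer = csv.writer(buffer)
writer.writerows(rows)  # writes several rows at once

print(repr(buffer.getvalue()))
# 'text,number\r\n"Hello, World!",1\r\nplain,2\r\n'

# Reading it back restores the original values (as strings)
buffer.seek(0)
print(list(csv.reader(buffer)))
# [['text', 'number'], ['Hello, World!', '1'], ['plain', '2']]
```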

## Pandas

Pandas is a Python package, which provides data structures for working with tabular, labeled data (i.e., data in a table with rows and columns). It is a good tool for real-world data analysis in Python.
Get more information about pandas:
* https://www.youtube.com/watch?v=dqT-UlYlg1s
* https://pandas.pydata.org/docs/  

### Install and Import

In order to get access to the Pandas module, we'll need to install it with `pip install pandas` (in the command line/terminal).  
By convention, the module is imported at the top of a file under the alias `pd`.

In [25]:
import pandas as pd

### Pandas Data structures

Pandas has its own data structures: `Series` and `DataFrame`.

`pd.Series`: a one-dimensional array with axis labels  
e.g. `x = pd.Series([6, 3, 4, 6], index=['a', 'b', 'c', 'd'])`  


`pd.DataFrame`: a two-dimensional table whose columns can contain different data types: strings, ints, floats, tuples, ...  
It has **rows** and **columns**: every column has a name/header, and every row has an index label (by default the integers 0, 1, 2, ...). Because the columns are named, DataFrames are very convenient for working with lots of data spread over many different columns.
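A small sketch of both structures: the `pd.Series` example from above, and a tiny DataFrame built from two songs of this notebook:

```python
import pandas as pd

x = pd.Series([6, 3, 4, 6], index=['a', 'b', 'c', 'd'])
print(x['b'])   # 3 -> access by index label
print(x.sum())  # 19

songs = pd.DataFrame({
    'songs': ['Shape of You', 'Dance Monkey'],
    'publish_year': [2017, 2019],
})
print(songs.columns.tolist())  # ['songs', 'publish_year']
print(songs.index.tolist())    # [0, 1] -> default integer index
print(songs.dtypes)            # one data type per column
```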

### CSV reading and writing with Pandas
The most important methods are `.to_csv` and `pd.read_csv`. But what are the methods doing?

`.to_csv`: saves a `pd.DataFrame` (or a `pd.Series`) as a .csv file

`pd.read_csv`: loads a .csv file into a `pd.DataFrame`

In [26]:
most_streamed_songs = pd.read_csv('most_streamed_songs.csv') #load data to Pandas DataFrame
# do something with the DataFrame
most_streamed_songs.to_csv('./updated_most_streamed_songs.csv', index = False) # save data to a CSV; index=False: don't write the row index as an extra column
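Why `index = False`? By default, `.to_csv()` also writes the row index as an extra, unnamed first column, and reading such a file back produces a column called `Unnamed: 0`. A sketch that uses `io.StringIO` instead of a real file (`.to_csv()` without a path returns the CSV text as a string):

```python
import io

import pandas as pd

df = pd.DataFrame({'songs': ['Shape of You', 'Dance Monkey'],
                   'publish_year': [2017, 2019]})

with_index = pd.read_csv(io.StringIO(df.to_csv()))
without_index = pd.read_csv(io.StringIO(df.to_csv(index=False)))

print(with_index.columns.tolist())     # ['Unnamed: 0', 'songs', 'publish_year']
print(without_index.columns.tolist())  # ['songs', 'publish_year']
```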

### Create a DataFrame

Let's try to pass a dictionary to `pd.DataFrame()`:

In [27]:
most_streamed_songs = {"songs": ["Shape of You","Blinding Lights", "Dance Monkey", "Rockstar", "One Dance"],
                       "artist": ["Ed Sheeran","The Weeknd", "Tones and I", "Post Malone", "Drake"],
                       "album": ["Divide","After Hours","The Kids Are Coming", "Beerbongs & Bentleys", "Views"],
                       "streams_in_millions": [2.960, 2.613, 2.395, 2.291, 2.160],
                       "publish_year": [2017, 2019, 2019, 2017, 2016]}

most_streamed_songs = pd.DataFrame(most_streamed_songs)
display(most_streamed_songs) #this works only in jupyter

|   | songs | artist | album | streams_in_millions | publish_year |
|---|---|---|---|---|---|
| 0 | Shape of You | Ed Sheeran | Divide | 2.96 | 2017 |
| 1 | Blinding Lights | The Weeknd | After Hours | 2.613 | 2019 |
| 2 | Dance Monkey | Tones and I | The Kids Are Coming | 2.395 | 2019 |
| 3 | Rockstar | Post Malone | Beerbongs & Bentleys | 2.291 | 2017 |
| 4 | One Dance | Drake | Views | 2.16 | 2016 |


It is also possible to pass a list of lists (one inner list per row) and use the keyword argument *columns* to pass a list of column names:

In [28]:
most_streamed_songs = [["Shape of You", "Ed Sheeran", "Divide", 2.960, 2017],
                       ["Blinding Lights","The Weeknd","After Hours",2.613,2019],
                       ["Dance Monkey", "Tones and I", "The Kids Are Coming", 2.395, 2019],
                       ["Rockstar", "Post Malone", "Beerbongs & Bentleys", 2.291, 2017],
                       ["One Dance", "Drake", "Views", 2.160, 2016]]
                       
most_streamed_songs = pd.DataFrame(most_streamed_songs, columns =  ["songs", "artist", "album", "streams_in_millions", "publish_year"])
display(most_streamed_songs)  #this works only in jupyter

|   | songs | artist | album | streams_in_millions | publish_year |
|---|---|---|---|---|---|
| 0 | Shape of You | Ed Sheeran | Divide | 2.96 | 2017 |
| 1 | Blinding Lights | The Weeknd | After Hours | 2.613 | 2019 |
| 2 | Dance Monkey | Tones and I | The Kids Are Coming | 2.395 | 2019 |
| 3 | Rockstar | Post Malone | Beerbongs & Bentleys | 2.291 | 2017 |
| 4 | One Dance | Drake | Views | 2.16 | 2016 |


### Working with DataFrames

Now, we want to work with a DataFrame. Therefore, let us load some data from a csv file:

In [34]:
data = pd.read_csv('COVID-19_AUT_2022_01_01_TO_2022_11_21.csv')

### Inspect DataFrame
Now we have a DataFrame and want to have a look at it. For a small DataFrame, we can simply use `print(DataFrame)`, or `display(DataFrame)` in Jupyter.  
For larger DataFrames, pandas offers some helpful methods:

`data.head()` - shows the first 5 rows  
`data.info()` - prints a summary of the DataFrame including the index dtype and columns, non-null values and memory usage  
`data.tail()` - shows the last 5 rows  

In [35]:
display(data.head()) # display just works in Jupyter
data.info()          # info() prints its summary itself (and returns None, so no display() needed)
display(data.tail()) # display just works in Jupyter

|   | id | iso_code | location | date | total_cases | new_cases | total_deaths | new_deaths | population |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 13734 | AUT | Austria | 2022-01-01 | 1274628.0 | 3817.0 | 16845.0 | 30.0 | 8939617.0 |
| 1 | 13735 | AUT | Austria | 2022-01-02 | 1278327.0 | 3699.0 | 16859.0 | 14.0 | 8939617.0 |
| 2 | 13736 | AUT | Austria | 2022-01-03 | 1281328.0 | 3001.0 | 16867.0 | 8.0 | 8939617.0 |
| 3 | 13737 | AUT | Austria | 2022-01-04 | 1284811.0 | 3483.0 | 16879.0 | 12.0 | 8939617.0 |
| 4 | 13738 | AUT | Austria | 2022-01-05 | 1290817.0 | 6006.0 | 16900.0 | 21.0 | 8939617.0 |


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 324 entries, 0 to 323
Data columns (total 9 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   id            324 non-null    int64  
 1   iso_code      324 non-null    object 
 2   location      324 non-null    object 
 3   date          324 non-null    object 
 4   total_cases   324 non-null    float64
 5   new_cases     323 non-null    float64
 6   total_deaths  324 non-null    float64
 7   new_deaths    320 non-null    float64
 8   population    324 non-null    float64
dtypes: float64(5), int64(1), object(3)
memory usage: 22.9+ KB


|   | id | iso_code | location | date | total_cases | new_cases | total_deaths | new_deaths | population |
|---|---|---|---|---|---|---|---|---|---|
| 319 | 14053 | AUT | Austria | 2022-11-16 | 5502186.0 | 5041.0 | 21116.0 | 11.0 | 8939617.0 |
| 320 | 14054 | AUT | Austria | 2022-11-17 | 5506697.0 | 4511.0 | 21124.0 | 8.0 | 8939617.0 |
| 321 | 14055 | AUT | Austria | 2022-11-18 | 5510919.0 | 4222.0 | 21134.0 | 10.0 | 8939617.0 |
| 322 | 14056 | AUT | Austria | 2022-11-19 | 5514701.0 | 3782.0 | 21143.0 | 9.0 | 8939617.0 |
| 323 | 14057 | AUT | Austria | 2022-11-20 | 5517893.0 | 3192.0 | 21144.0 | 1.0 | 8939617.0 |


By default, `.head()` and `.tail()` show 5 rows, but you can easily change that by passing a different number:

In [36]:
display(data.head(3)) # display just works in Jupyter
display(data.tail(3))# display just works in Jupyter

|   | id | iso_code | location | date | total_cases | new_cases | total_deaths | new_deaths | population |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 13734 | AUT | Austria | 2022-01-01 | 1274628.0 | 3817.0 | 16845.0 | 30.0 | 8939617.0 |
| 1 | 13735 | AUT | Austria | 2022-01-02 | 1278327.0 | 3699.0 | 16859.0 | 14.0 | 8939617.0 |
| 2 | 13736 | AUT | Austria | 2022-01-03 | 1281328.0 | 3001.0 | 16867.0 | 8.0 | 8939617.0 |


|   | id | iso_code | location | date | total_cases | new_cases | total_deaths | new_deaths | population |
|---|---|---|---|---|---|---|---|---|---|
| 321 | 14055 | AUT | Austria | 2022-11-18 | 5510919.0 | 4222.0 | 21134.0 | 10.0 | 8939617.0 |
| 322 | 14056 | AUT | Austria | 2022-11-19 | 5514701.0 | 3782.0 | 21143.0 | 9.0 | 8939617.0 |
| 323 | 14057 | AUT | Austria | 2022-11-20 | 5517893.0 | 3192.0 | 21144.0 | 1.0 | 8939617.0 |


### Pandas get rows and columns
Sometimes, you just want a few rows or columns of your DataFrame. There are different ways to get them in pandas. These are the ones we recommend:

##### `.loc` method

**Python-Syntax:**

```python
df.loc[row_start:row_end, columns]
``` 

`.loc` accesses rows and columns by their labels: inside the square brackets, first specify the rows you want by their index labels, then the columns by their names. To select all rows, just use `:`.

**Attention:** Unlike regular Python slicing, both the start and the end label are included!

In [37]:
display(data.loc[1:4, "total_cases"], "\n")
print(data.loc[:, "total_cases"])

1    1278327.0
2    1281328.0
3    1284811.0
4    1290817.0
Name: total_cases, dtype: float64

'\n'

0      1274628.0
1      1278327.0
2      1281328.0
3      1284811.0
4      1290817.0
         ...    
319    5502186.0
320    5506697.0
321    5510919.0
322    5514701.0
323    5517893.0
Name: total_cases, Length: 324, dtype: float64
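Because `.loc` works with index labels, its slices include the end label, while `.iloc` (covered further down) works with integer positions and excludes the end, like normal Python slicing. A small sketch with hypothetical letter labels as index, to make the difference visible:

```python
import pandas as pd

df = pd.DataFrame({'new_cases': [3817.0, 3699.0, 3001.0, 3483.0]},
                  index=['a', 'b', 'c', 'd'])

# label-based: 'c' is included
print(df.loc['a':'c', 'new_cases'].tolist())  # [3817.0, 3699.0, 3001.0]

# position-based: position 2 is excluded
print(df.iloc[0:2, 0].tolist())               # [3817.0, 3699.0]
```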


##### `df[]` method

**Python-Syntax:**

```python
df[column]
``` 
or if you want to have more than one column

```python
df[[column, column]]
``` 

You can access columns of a DataFrame like values in a dictionary: just write the column name in square brackets. If you want to access more than one column, put the names in a list (hence the double brackets).

In [38]:
print(data["total_cases"])
print(data[["total_cases", "new_cases"]])

0      1274628.0
1      1278327.0
2      1281328.0
3      1284811.0
4      1290817.0
         ...    
319    5502186.0
320    5506697.0
321    5510919.0
322    5514701.0
323    5517893.0
Name: total_cases, Length: 324, dtype: float64
     total_cases  new_cases
0      1274628.0     3817.0
1      1278327.0     3699.0
2      1281328.0     3001.0
3      1284811.0     3483.0
4      1290817.0     6006.0
..           ...        ...
319    5502186.0     5041.0
320    5506697.0     4511.0
321    5510919.0     4222.0
322    5514701.0     3782.0
323    5517893.0     3192.0

[324 rows x 2 columns]


##### `df.column` method

**Python-Syntax:**

```python
df.column
``` 

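You can also access a column as an attribute, using a dot. This only works if the column name is a valid Python identifier (e.g. no spaces) and doesn't clash with an existing DataFrame attribute or method (a column called `count` would return the method `df.count` instead).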

In [39]:
print(data.total_cases)

0      1274628.0
1      1278327.0
2      1281328.0
3      1284811.0
4      1290817.0
         ...    
319    5502186.0
320    5506697.0
321    5510919.0
322    5514701.0
323    5517893.0
Name: total_cases, Length: 324, dtype: float64


In [40]:
# all three ways return the same column (a Series):
# print(data["total_cases"])
# print(data.total_cases)
print(data.loc[:, "total_cases"])

0      1274628.0
1      1278327.0
2      1281328.0
3      1284811.0
4      1290817.0
         ...    
319    5502186.0
320    5506697.0
321    5510919.0
322    5514701.0
323    5517893.0
Name: total_cases, Length: 324, dtype: float64


### Pandas get rows by position
To get rows by their integer position (starting at 0), use the `df.iloc` method.

##### **For one row:**

**Python-Syntax:**
```python
data.iloc[<row_index>]
``` 

Be aware that a single row is returned as a `Series` (the column names become its index), so it is displayed like this:

In [41]:
display(data.iloc[2])

id                   13736
iso_code               AUT
location           Austria
date            2022-01-03
total_cases      1281328.0
new_cases           3001.0
total_deaths       16867.0
new_deaths             8.0
population       8939617.0
Name: 2, dtype: object

##### **For more than one row:**

**Python-Syntax:**
```python
data.iloc[[<row_index>, <row_index>]]
```

In [42]:
display(data.iloc[[2, 4]])

|   | id | iso_code | location | date | total_cases | new_cases | total_deaths | new_deaths | population |
|---|---|---|---|---|---|---|---|---|---|
| 2 | 13736 | AUT | Austria | 2022-01-03 | 1281328.0 | 3001.0 | 16867.0 | 8.0 | 8939617.0 |
| 4 | 13738 | AUT | Austria | 2022-01-05 | 1290817.0 | 6006.0 | 16900.0 | 21.0 | 8939617.0 |


Passing a list with just one index returns a one-row DataFrame instead, which is displayed as a table:

In [43]:
display(data.iloc[[2]])

|   | id | iso_code | location | date | total_cases | new_cases | total_deaths | new_deaths | population |
|---|---|---|---|---|---|---|---|---|---|
| 2 | 13736 | AUT | Austria | 2022-01-03 | 1281328.0 | 3001.0 | 16867.0 | 8.0 | 8939617.0 |


##### **For a range of rows:**

**Python-Syntax:**
```python
data.iloc[<start_row_index> : <end_row_index>]
```
`<start_row_index>` is inclusive, `<end_row_index>` is exclusive.

In [None]:
display(data.iloc[0:10])

### Row selection with conditions
You can select multiple rows by using logical operations. For example, get all rows which fulfill all of the criteria 1 to n:

**Python-Syntax:**
```python
data.loc[(<criterion_1>) & (<criterion_2>) & ... & (<criterion_n>)]
```
Or get all rows which fulfill at least one of the criteria 1 to n:

**Python-Syntax:**
```python
data.loc[(<criterion_1>) | (<criterion_2>) | ... | (<criterion_n>)]
```

You can also combine different logical operations, and negate a condition with `~`. Put every single condition in parentheses: `&`, `|` and `~` bind more tightly than comparisons such as `>` or `==`.
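A small sketch of `&`, `|` and `~` on a tiny DataFrame; the four value pairs are taken from rows of the COVID data shown in this notebook:

```python
import pandas as pd

df = pd.DataFrame({'new_cases': [3817.0, 10082.0, 42322.0, 3192.0],
                   'new_deaths': [30.0, 12.0, 41.0, 1.0]})

# AND: both conditions must be True
both = df.loc[(df.new_cases > 5000) & (df.new_deaths > 20)]
# OR: at least one condition must be True
either = df.loc[(df.new_cases > 5000) | (df.new_deaths > 20)]
# NOT: ~ inverts a condition
not_high = df.loc[~(df.new_cases > 5000)]

print(both.index.tolist())      # [2]
print(either.index.tolist())    # [0, 1, 2]
print(not_high.index.tolist())  # [0, 3]
```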

In [54]:
#Get all rows with date "2022-01-06"
display(data.loc[data.date == "2022-01-06"])

#Get the entries with new_cases > 1000 and new_deaths > 40
data.loc[(data.new_cases > 1000) & (data.new_deaths > 40)]

|   | id | iso_code | location | date | total_cases | new_cases | total_deaths | new_deaths | population |
|---|---|---|---|---|---|---|---|---|---|
| 5 | 13739 | AUT | Austria | 2022-01-06 | 1300899.0 | 10082.0 | 16912.0 | 12.0 | 8939617.0 |


|   | id | iso_code | location | date | total_cases | new_cases | total_deaths | new_deaths | population |
|---|---|---|---|---|---|---|---|---|---|
| 78 | 13812 | AUT | Austria | 2022-03-20 | 3462110.0 | 42322.0 | 18525.0 | 41.0 | 8939617.0 |
| 83 | 13817 | AUT | Austria | 2022-03-25 | 3664438.0 | 41281.0 | 18718.0 | 49.0 | 8939617.0 |
| 84 | 13818 | AUT | Austria | 2022-03-26 | 3701790.0 | 37352.0 | 18759.0 | 41.0 | 8939617.0 |
| 85 | 13819 | AUT | Austria | 2022-03-27 | 3734990.0 | 33200.0 | 18805.0 | 46.0 | 8939617.0 |
| 90 | 13824 | AUT | Austria | 2022-04-01 | 3868236.0 | 26188.0 | 19001.0 | 49.0 | 8939617.0 |
| 91 | 13825 | AUT | Austria | 2022-04-02 | 3890531.0 | 22295.0 | 19045.0 | 44.0 | 8939617.0 |
| 92 | 13826 | AUT | Austria | 2022-04-03 | 3909523.0 | 18992.0 | 19095.0 | 50.0 | 8939617.0 |
| 213 | 13947 | AUT | Austria | 2022-08-02 | 4799326.0 | 25283.0 | 20366.0 | 49.0 | 8939617.0 |


### Combine row and column slicing
With the `df.loc[]` method, you can combine row and column slicing.   
So here, we want to get the `total_cases`, `new_cases`, `total_deaths` and `new_deaths` columns for all days with more than 1000 new cases and more than 40 new deaths:

In [55]:
data.loc[(data.new_cases > 1000) & (data.new_deaths > 40), ["total_cases", "new_cases", "total_deaths", "new_deaths"]]

|   | total_cases | new_cases | total_deaths | new_deaths |
|---|---|---|---|---|
| 78 | 3462110.0 | 42322.0 | 18525.0 | 41.0 |
| 83 | 3664438.0 | 41281.0 | 18718.0 | 49.0 |
| 84 | 3701790.0 | 37352.0 | 18759.0 | 41.0 |
| 85 | 3734990.0 | 33200.0 | 18805.0 | 46.0 |
| 90 | 3868236.0 | 26188.0 | 19001.0 | 49.0 |
| 91 | 3890531.0 | 22295.0 | 19045.0 | 44.0 |
| 92 | 3909523.0 | 18992.0 | 19095.0 | 50.0 |
| 213 | 4799326.0 | 25283.0 | 20366.0 | 49.0 |


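### Compute the rolling window 
What is a rolling window? If you want to compute, for each row, a value over the current and the 2 previous rows (here: the last 3 days), e.g. their mean or maximum, you need a rolling window, created with the method `rolling()`. With the parameter `min_periods`, you can set how many values must be in the window to compute a result; e.g. `min_periods=2` already gives a result when only 2 of the 3 values are available (as in the second row below), while rows with fewer values get `NaN`.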

**Python-Syntax:**
```python
data.rolling(<window_size>).function() 
```

In [57]:
# Compute the max new cases of the last 3 days
data.loc[:, "new_cases_max_3d"] = data.loc[:, "new_cases"].rolling(3, min_periods=2).max()
display(data)

| |id|iso_code|location|date|total_cases|new_cases|total_deaths|new_deaths|population|new_cases_max_3d|
|---|---|---|---|---|---|---|---|---|---|---|
|0|13734|AUT|Austria|2022-01-01|1274628.0|3817.0|16845.0|30.0|8939617.0|NaN|
|1|13735|AUT|Austria|2022-01-02|1278327.0|3699.0|16859.0|14.0|8939617.0|3817.0|
|2|13736|AUT|Austria|2022-01-03|1281328.0|3001.0|16867.0|8.0|8939617.0|3817.0|
|3|13737|AUT|Austria|2022-01-04|1284811.0|3483.0|16879.0|12.0|8939617.0|3699.0|
|4|13738|AUT|Austria|2022-01-05|1290817.0|6006.0|16900.0|21.0|8939617.0|6006.0|
|...|...|...|...|...|...|...|...|...|...|...|
|319|14053|AUT|Austria|2022-11-16|5502186.0|5041.0|21116.0|11.0|8939617.0|5041.0|
|320|14054|AUT|Austria|2022-11-17|5506697.0|4511.0|21124.0|8.0|8939617.0|5041.0|
|321|14055|AUT|Austria|2022-11-18|5510919.0|4222.0|21134.0|10.0|8939617.0|5041.0|
|322|14056|AUT|Austria|2022-11-19|5514701.0|3782.0|21143.0|9.0|8939617.0|4511.0|
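To make the semantics of `rolling(3, min_periods=2).max()` concrete, the following pure-Python sketch reimplements the same calculation for a plain list. It only illustrates the idea and is not how pandas works internally; `rolling_max` is a hypothetical helper, and the input values are the first five `new_cases` values from the output above.

```python
import math


def rolling_max(values, window, min_periods):
    """Pure-Python sketch of Series.rolling(window, min_periods=...).max()."""
    result = []
    for i in range(len(values)):
        # The window covers the current row and up to window - 1 previous rows
        start = max(0, i - window + 1)
        # Only non-missing values count towards min_periods
        in_window = [v for v in values[start:i + 1] if not math.isnan(v)]
        if len(in_window) >= min_periods:
            result.append(max(in_window))
        else:
            # Too few values in the window: the result is NaN
            result.append(math.nan)
    return result


new_cases = [3817.0, 3699.0, 3001.0, 3483.0, 6006.0]
print(rolling_max(new_cases, window=3, min_periods=2))
# [nan, 3817.0, 3817.0, 3699.0, 6006.0]
```

The printed values match the `new_cases_max_3d` column for rows 0 to 4: the first row has only one value in its window, which is fewer than `min_periods=2`, so it becomes `NaN`.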


## Programming examples

### Programming example 1

Next year, a big wedding is planned. You want to invite all your family members and friends :) Later you realize that a wedding is pretty expensive, so you decide to delete all your aunties & uncles from the list.

**Task:** 
Open the file `guestlist.txt` and read all the entries into a list. Write a function `wedding_cost_reducer()` that takes the guest list, creates a new guest list with every second entry deleted, and writes this new list into the file `guestlist_short.txt`.

In [22]:
# execute to create guestlist:
with open('guestlist.txt', 'w', encoding='utf8') as datafile:
    datafile.write('mom\ndad\nmother-in-law\nfather-in-law\nuncle Nepomuk\naunt Margarethe\ngrandma\ngrandpa\ngreat-aunt Henriette\nSebastian\n')
    

# Your code goes here

### Programming example 2

**Even and odd numbers**

You are not allowed to import anything. Make sure you test your functions. The tests are also graded, so don’t delete or comment them out.
1. Write a function `is_even`:

This function should check if the given number is even or not. The function should return `True` if the number is even and `False` if not.

For example, if `number` is 5, `False` should be returned.

* Parameters:

| Name | Type | Description |
|------|------|-------------|
|number|int   |Number which should be checked if even or odd|

 
* Return:

|Type|Description|
|----|-----------|
|bool|`True` if the number is even, `False` if not|


2. Write a function `check_numbers`:

This function receives a file path to a text file with comma-separated numbers. The function should read this file and use the function `is_even` to check, for every number in the file, whether it is even or odd. To store the result, the function should create a dictionary with two keys, "even" and "odd", whose values are both lists. If the checked number is even, it should be added to the list of the key "even"; otherwise, it should be added to the list of the key "odd". Make sure the elements in the lists are integers. At the end, the function should print and return the dictionary.

**Hint:** use `.split(<delimiter>)` to split up a string by a `<delimiter>`.

* Parameters:

|Name|Type|Description|
|---|---|----|
|file_path|str|Path to the file|

* Return:

|Type|Description|
|---|----|
|`dict[str, list[int]]`|Dictionary with even and odd numbers|
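As a small illustration of the `.split()` hint, the sketch below processes a single hypothetical line of comma-separated numbers (the exact layout of `numbers.txt` is an assumption here). Note that `split()` always returns strings, so each part still has to be converted with `int()`. No imports are needed.

```python
line = "0,1,2,3\n"  # hypothetical line, as it might be read from a file

# Remove the trailing newline first, then split at every comma
parts = line.strip().split(",")
print(parts)  # ['0', '1', '2', '3']

# split() returns strings, so convert each part to an int
numbers = [int(part) for part in parts]
print(numbers)  # [0, 1, 2, 3]

# Careful: a trailing delimiter produces an empty string,
# which int() cannot convert
print("1,2,".split(","))  # ['1', '2', '']
```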
    
3. Write a function `write_numbers`:
    
This function receives a file path and the dictionary of even and odd numbers returned by `check_numbers`. Write a CSV file to that path, using "even" and "odd" as column names and the numbers from the two lists as data.
    
* Parameters:
|Name|Type|Description|
|---|---|----|
|file_path|str|Path to the file|
|`dict[str, list[int]]`|dict|Dictionary with even and odd numbers|


**Test your functions:**
If you use the provided file `numbers.txt`, your dictionary output should look like this:
```
{'odd': [1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77, 79, 81, 83, 85, 87, 89, 91, 93, 95, 97, 99], 
'even': [0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98]}
```

**As always, you need to test your implementation - this will also help with debugging! And it also gets graded.**

In [1]:
# your code goes here

### Programming example 3

**DictReader**

You are not allowed to import anything. Make sure you test your functions. The tests are also graded, so don’t delete or comment them out.
1. Write a function `create_dict`:

This function should create a dictionary whose keys are given by the header list and whose values are given by the values list: the first element of the header list is paired with the first element of the values list, the second with the second, and so on. Return the dictionary.
For example, when the following headers and values are given:

`header = ["header1", "header2", "header3"]`

`values = ["value1" , "value2" , "value3" ]`

the returned dictionary should look like this:

`{"header1":"value1", "header2":"value2", "header3":"value3"}`

* Parameters:
|Name|Type|Description|
|---|---|---|
|headers|`list[str]`|List of headers|
|values|`list[str]`|List of values|

* Return:
|Type|Description|
|---|---|
|`dict[str, str]`|Created dictionary from headers and values|

2. Write a function `dict_reader`:

This function receives a file path to a CSV file and a delimiter. The function should read the file line by line. The first line is the header; the other lines contain the values for the header fields. Split every line by the given delimiter and store the result as a list. For every data line, call the function `create_dict` with the header (list of strings) and the data line (list of strings), and append the returned dictionary to a list. At the end, return the list. The result should be the same as if you used `csv.DictReader` from the `csv` module and collected its rows into a list.

**Hint:** use `.split(<delimiter>)` to split up a string by a `<delimiter>`. A delimiter can also be a newline (`\n`).

* Parameters:

|Name|Type|Description|
|---|---|---|
|file_path|`str`|Path to the file|
|delimiter|`str`|Delimiter by which each line should be split|
    

**Test your functions:**
If you use the provided file `data.csv`, your output (a list of dictionaries) should look like this:
```
[{'date': '2020-12-09', 'location': 'Austria', 'new_cases': '2932'},
 {'date': '2020-12-08', 'location': 'Austria', 'new_cases': '2377'},
 {'date': '2020-12-08', 'location': 'Austria', 'new_cases': '2263'}]

```
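Your own `dict_reader` must not import anything, but outside the graded solution you can use `csv.DictReader` to produce a reference result to compare against. The sketch below assumes that `data.csv` has the header `date,location,new_cases` and exactly the three rows shown in the expected output above; it reads an in-memory string via `io.StringIO` instead of the real file.

```python
import csv
import io

# In-memory stand-in for data.csv (its exact contents are an assumption,
# reconstructed from the expected output above)
csv_text = (
    "date,location,new_cases\n"
    "2020-12-09,Austria,2932\n"
    "2020-12-08,Austria,2377\n"
    "2020-12-08,Austria,2263\n"
)

# DictReader uses the first row as keys; every value stays a string
rows = list(csv.DictReader(io.StringIO(csv_text), delimiter=","))
print(rows)
```

On Python 3.8 and later, `csv.DictReader` yields plain dictionaries, so `rows` can be compared directly with the list returned by your `dict_reader`.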

**As always, you need to test your implementation - this will also help with debugging! And it also gets graded.**

In [2]:
# your code goes here

### Programming example 4

The file `spotify_weekly_charts_cw45.csv` contains about 200 different songs from different artists. It is your job to analyse this file using the `csv` module. You can decide whether to use `csv.reader` or `csv.DictReader`.

Write a function called `find_artist_songs_in_charts`, which takes a single positional argument `artist: str` and neither returns nor prints anything. This function should find all songs of the given artist and save them in a new CSV file whose file name includes the name of the artist. Do not forget to import the `csv` module.

If everything works as intended, the content of your new file should look like this (using `Billie Eilish` as the artist):
```
Position,Track Name,Artist,Streams
23,Happier Than Ever,Billie Eilish,15725534
64,Bored,Billie Eilish,8284534
74,Happier Than Ever - Edit,Billie Eilish,7634386
80,lovely (with Khalid),Billie Eilish,7455574
108,bad guy,Billie Eilish,6309259
```
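The writing side of this task can be sketched with `csv.DictWriter`. The rows below are hypothetical placeholders (not taken from the Spotify file), and the column names are copied from the expected output above; whether the original file has additional columns is an assumption you should check. `io.StringIO` stands in for a real file, which you would open with `open(file_name, "w", newline="")` as the `csv` documentation recommends.

```python
import csv
import io

# Hypothetical rows; in the task they would come from reading the charts file
rows = [
    {"Position": "1", "Track Name": "Song A", "Artist": "Artist X", "Streams": "100"},
    {"Position": "2", "Track Name": "Song B", "Artist": "Artist Y", "Streams": "90"},
    {"Position": "3", "Track Name": "Song C", "Artist": "Artist X", "Streams": "80"},
]

# Keep only the rows of one artist
selected = [row for row in rows if row["Artist"] == "Artist X"]

# Write a header line followed by the selected rows
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["Position", "Track Name", "Artist", "Streams"])
writer.writeheader()
writer.writerows(selected)
print(buffer.getvalue())
```

If the dictionaries contain keys that are not listed in `fieldnames`, `DictWriter` raises a `ValueError`; passing `extrasaction="ignore"` drops those extra keys instead.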

In [26]:
# your code goes here

### Bonus programming example 1 (for extra 2 points)

Solve the task from the previous example using the `pandas` module. Additionally, print the total and average number of streams, i.e., the sum and the average of the streams of all songs by this artist.

In [27]:
# your code goes here

### Bonus programming example 2 (for extra 3 points)

Using the `pandas` module, read the COVID dataset. Compute the 7-day rolling mean of new cases. Plot the rolling mean for the months January-March, April-June, July-September and October-November (4 plots altogether). Compute the new cases as a percentage of the total population. Compute the new deaths for each month. Plot the new deaths for each month (1 plot).

In [27]:
# your code goes here