# Python Code Templates

This document is a **collection of assorted Python code snippets** that may come in handy in my work as a data scientist. I hope you find what you are looking for =)

## Import some Packages // Modules, or Sub-Modules

In [1]:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.dates import date2num
import pandas as pd
from scipy import stats
from datetime import datetime

## Need help & documentation
---

In [None]:
help()


Welcome to Python 3.8's help utility!

If this is your first time using Python, you should definitely check out
the tutorial on the Internet at https://docs.python.org/3.8/tutorial/.

Enter the name of any module, keyword, or topic to get help on writing
Python programs and using Python modules.  To quit this help utility and
return to the interpreter, just type "quit".

To get a list of available modules, keywords, symbols, or topics, type
"modules", "keywords", "symbols", or "topics".  Each module also comes
with a one-line summary of what it does; to list the modules whose name
or summary contain a given string such as "spam", type "modules spam".



## Read data // Load data
---

In order to **load datasets**, you will need the library `pandas`. 

To load data from Excel sheets:


In [None]:
epex_df = pd.read_excel("./Data_V2/Preis_aktuell_Spot_EEX_CH-19-ver-mac.xlsx", 
                        header=[1], 
                        sheet_name='Prices', # If the Excel file contains more than one sheet, you need to specify
                        # which sheet you want to load
                        engine='openpyxl') # This argument is (sometimes) needed when loading an Excel file on 
                                           # a Windows computer; otherwise pandas may raise an error!
epex_df # display the dataset

## Filtering Data // Selection of Columns
---

"Filtering" a dataset can be done by means of `column selection`. Note that there are often several ways to select, for example, a particular column.

<mark>Also, note that the selection of the whole column will print out <u>all</u> the values // entries within a column</mark>.

**Option 1: using `dot-notation`:**

In [None]:
desc = reviews.description # CAUTION: this kind of selection does not work if the column name contains spaces!!

**Option 2: using the `indexing operator [...]`**

In [None]:
desc = reviews['description'] # This selects the column 'description' in the dataset 'reviews'.
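To make the two options concrete, here is a minimal sketch. The `reviews` dataframe below is a hypothetical toy stand-in (its values and the 'taster name' column are assumptions for illustration, not the real dataset). It suggests that both notations return the same column, and that only the indexing operator can reach a column whose name contains a space.

```python
import pandas as pd

# Hypothetical toy stand-in for the `reviews` dataset (made-up values).
reviews = pd.DataFrame({
    "country": ["Italy", "Portugal", "US"],
    "description": ["Aromas of fruit", "Ripe and smooth", "Tart and snappy"],
    "taster name": ["Kerin", "Roger", "Paul"],  # column name contains a space
})

desc_dot = reviews.description      # Option 1: dot notation
desc_idx = reviews["description"]   # Option 2: indexing operator

# Both options return the same pandas Series.
same_result = desc_dot.equals(desc_idx)

# A column name with a space can only be reached with the indexing operator;
# writing `reviews.taster name` would be a SyntaxError.
tasters = reviews["taster name"]
```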

### Filtering for a particular `entry` WITHIN a specific column?
---

**Option 1**: Chain a second `indexing operator` onto the column selection (= see 'Option 2' above). Contrary to what one might expect, this is not limited to the `indexing operator`: the same chaining also works after the `dot-notation` (e.g. `reviews.description[0]`), because both return the same column.

In [None]:
first_description = reviews['description'][0]
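A small sketch of the chained selection, again on a hypothetical toy `reviews` dataframe with made-up values. The first step selects the column, the second step looks up the row labelled `0` within it. In this sketch the same chained lookup also succeeds after dot notation.

```python
import pandas as pd

# Hypothetical toy stand-in for the `reviews` dataset (made-up values).
reviews = pd.DataFrame({
    "country": ["Italy", "Portugal", "US"],
    "description": ["Aromas of fruit", "Ripe and smooth", "Tart and snappy"],
})

# Step 1 selects the column (a Series); step 2 looks up the row labelled 0.
first_via_index = reviews["description"][0]

# The same chaining after dot notation yields the same entry.
first_via_dot = reviews.description[0]
```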

**Option 2**: As an alternative to the 'indexing operator', we can use the special 'accessor operators'
that come with the `pandas` package: 

- `iloc` (= based on the **position** of the rows & columns within the dataset), and
- `loc` (= based on the **index labels** of the rows & columns in the dataset).
    
**Important to note BEFORE running `iloc`**: we need to know the POSITION of the column 
'description' within the dataset --> use the `reviews.head()` command:
- Conclusion: the 'description' column is the 2nd column in the dataset! --> now we can use iloc!
- Note: Oftentimes, the index // row labels of a dataset have - **by default** - no real <u>name</u>. Instead, pandas automatically assigns **labels that correspond to the position**, counting from 0 up to the last row.

*<mark>Hence - since the (default) label of a row == position of that row within the dataset - it is often the case that `loc` and `iloc` take the <u>same</u> index value for the rows</mark>*!

In [None]:
# use of iloc (--> remember: we need the POSITION of the rows & columns that we are interested in within the dataframe)
first_description = reviews.iloc[0,1] # Output == selects the 2nd column (= 1). Within this second column, you select 
# the 1st element (= 0)
    # IMPORTANT: order of the inputs for 'iloc' == 1) 'rows', then 2) 'columns'

**Option 3**: use `loc` as the 'accessor operator', which is based on the labels of the row & column indices.

In [None]:
first_description = reviews.loc[0, 'description'] # IMPORTANT: order of the inputs == 1) 'rows', then 
# 2) 'columns' --> selects the column "description"

# Important to note BEFORE running 'loc': we need to know the NAMES of the column & row indices 
# --> use the reviews.head() command:
    # Conclusion: the row labels (= names of the index of the rows) are simply an ascending range starting 
    # from 0,1,2,3,... (= default values in a dataframe) AND the column labels (= names of the index of the 
    # columns) are the names of the columns themselves, e.g. 'country', 'description', 'designation' etc..
    # --> now we can use loc!
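The difference between `iloc` and `loc` only becomes visible once the row labels no longer match the row positions. The following is a minimal sketch on a hypothetical toy `reviews` dataframe. The values and the row labels 'a', 'b', 'c' are assumptions chosen purely for illustration.

```python
import pandas as pd

# Hypothetical toy stand-in for the `reviews` dataset (made-up values).
reviews = pd.DataFrame({
    "country": ["Italy", "Portugal", "US"],
    "description": ["Aromas of fruit", "Ripe and smooth", "Tart and snappy"],
})

# Default index: the row labels 0, 1, 2 coincide with the row positions.
by_position = reviews.iloc[0, 1]           # row position 0, column position 1
by_label = reviews.loc[0, "description"]   # row label 0, column label 'description'

# Re-label the rows (hypothetical labels): labels and positions now differ.
relabelled = reviews.copy()
relabelled.index = ["a", "b", "c"]

first_by_position = relabelled.iloc[0, 1]          # still works: position-based
first_by_label = relabelled.loc["a", "description"]  # needs the new label 'a'
```

With the re-labelled rows, `relabelled.loc[0, 'description']` would raise a `KeyError`, because no row carries the label `0` any more.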

## Converting a Column into a Date-Column
---

**If you work with time series**, you will often need to convert your date column - which is frequently stored as strings - into an actual `datetime` column!

In [None]:
# convert 'Delivery day' to a datetime column
epex_df['Delivery day'] = pd.to_datetime(epex_df['Delivery day'], format = '%Y%m%d')
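To see what the conversion does, here is a minimal sketch on a hypothetical toy version of `epex_df`. The dates and the 'Price' column are made-up assumptions, not the real EPEX data. It assumes the dates are stored as strings such as '20190101', which is what the format `'%Y%m%d'` describes.

```python
import pandas as pd

# Hypothetical toy stand-in for `epex_df` (made-up values).
epex_df = pd.DataFrame({
    "Delivery day": ["20190101", "20190102", "20190103"],
    "Price": [45.1, 47.3, 44.8],
})

# Parse the strings as year-month-day without separators.
epex_df["Delivery day"] = pd.to_datetime(epex_df["Delivery day"], format="%Y%m%d")

# Check that the column now holds real datetime values.
is_datetime = pd.api.types.is_datetime64_any_dtype(epex_df["Delivery day"])

# Datetime columns expose the `.dt` accessor, e.g. for the weekday name.
first_weekday = epex_df["Delivery day"].dt.day_name()[0]
```

Once the column is a datetime column, date-based operations such as sorting by date or extracting the weekday via `.dt` become available.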