<br>

<img src="./image/Logo/logo_elia_group.png" width = 200>

<br>

# Cheat Sheet
<br>

Well done! You have successfully completed your Python training. This cheat sheet supports your next steps in working with data. It collects useful tips and tricks for Python, its most important libraries and key syntax. Have fun! 🙌

<img src="./image/cheat_sheet_icon.png" width = 100>

<div class="alert alert-block alert-success">

**Variables and Data Types**
        
</div>

<table>
<tr>
    <td><b> Data Types </b></td>
    <td><b> Type Conversion </b></td>
    <td><b> Example </b></td>
</tr>
<tr>
<td> String </td>
<td> str() </td>
<td> 'Data Science' </td>
</tr>
<tr>
<td> Integer </td>
<td> int() </td>
<td> 11, 2, 9 </td>
</tr>
<tr>
<td> Float </td>
<td> float() </td>
<td> 11.0, 2.0, 9.0 </td>
</tr>
<tr>
<td> Boolean </td>
<td> bool() </td>
<td> True, False </td>
</tr>
<tr>
<td> List </td>
<td> list() </td>
<td> my_list = ["a", "b", 2.0, 3] &#128161; index starts at 0 </td>
</tr>
<tr>
<td> Dictionary </td>
<td> dict() </td>
<td> {key1: value1, key2: value2} </td>
</tr>
</table>
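
A short sketch of the conversion functions from the table above. The values and the variable names (`my_list`, `my_dict`, etc.) are arbitrary examples:

```python
# Converting between the basic data types from the table
as_int = int("11")        # string -> integer: 11
as_float = float(2)       # integer -> float: 2.0
as_str = str(9.0)         # float -> string: '9.0'
as_bool = bool(0)         # 0 is False, any other number is True

my_list = list("abc")     # string -> list of characters: ['a', 'b', 'c']
print(my_list[0])         # 'a': indexing starts at 0

# dict() builds a dictionary from keyword arguments (or key-value pairs)
my_dict = dict(key1="value1", key2=2)
print(my_dict["key2"])    # 2
```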

<div class="alert alert-block alert-success">

**Working with Data Types**
        
</div>

**For loops**: iterate through the items of a list, or through the rows of a data table. <br>
- Example: <br>
```
numbers = [1, 5, 8, 12, 6]
for number in numbers:
    print(number)
```
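
A slightly longer sketch: the first loop keeps a running total while it walks through a list, the second loops over the rows of a small DataFrame with `iterrows()`. The column names `region` and `load_mw` and their values are made up for illustration:

```python
import pandas as pd

numbers = [1, 5, 8, 12, 6]

# Accumulate a running total while looping over the list
total = 0
for number in numbers:
    total += number
print(total)  # 32

# Loop over the rows of a small, hypothetical DataFrame
df = pd.DataFrame({"region": ["North", "South"], "load_mw": [120.5, 98.0]})
for index, row in df.iterrows():
    # iterrows() yields (index, row) pairs; each row is a pandas Series
    print(index, row["region"], row["load_mw"])
```

Looping row by row is fine for small tables; for large ones, the vectorized tools further down are usually much faster.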





**Conditional** statements work like the IF function in Microsoft Excel. With **elif** you can chain as many additional conditions as you need. <br>

- Example: 
<br> 
```
x = 12 
if x < 10:
    print('X is less than 10')
elif x == 10:
    print('X is equal to 10')
else:
    print('X is greater than 10')
```
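
To reuse a condition on many values, you can wrap the `if`/`elif`/`else` chain in a function and call it inside a loop. A minimal sketch; the function name `classify` is only an illustration:

```python
def classify(x):
    """Return a label describing x relative to 10."""
    if x < 10:
        return "less than 10"
    elif x == 10:
        return "equal to 10"
    else:
        return "greater than 10"

# Only the first branch whose condition is True is executed
for value in [3, 10, 12]:
    print(value, classify(value))
```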

<div class="alert alert-block alert-success">

**Libraries and Importing Files**
        
</div>

Import the most important libraries with their usual aliases:
- `import pandas as pd`
- `import numpy as np`

You can easily import **text files** and **CSVs**:
- to read a .txt file, pass its path to `open()`: `open("./folder_name/file_name.txt", 'r')` (`'r'` opens the file for reading)
- to import CSV files, you need to import pandas first: `import pandas as pd`
    - then you can use the `read_csv()` function: `pd.read_csv("./folder_name/file_name.csv")`
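
A runnable sketch of both imports. So that it works anywhere, it first writes a small text file and a small CSV into a temporary folder; the file names and contents are made up and stand in for your own `./folder_name/...` paths:

```python
import os
import tempfile

import pandas as pd

# Create a temporary folder with a small text file and CSV (hypothetical data)
folder = tempfile.mkdtemp()
txt_path = os.path.join(folder, "notes.txt")
csv_path = os.path.join(folder, "load.csv")

with open(txt_path, "w") as f:
    f.write("first line\nsecond line\n")
with open(csv_path, "w") as f:
    f.write("region,load_mw\nNorth,120.5\nSouth,98.0\n")

# Read the text file; the `with` block closes the file automatically
with open(txt_path, "r") as f:
    lines = f.read().splitlines()

# Read the CSV into a DataFrame
df = pd.read_csv(csv_path)
print(lines)
print(df)
```

Using `with open(...)` instead of a bare `open(...)` is a common choice because the file is closed even if an error occurs while reading.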

<div class="alert alert-block alert-success">

**Functions**
        
</div>

The syntax to define a function is the following: 
<br>
```
def function_name(input_values):
    computation
    return function_output
```
&#128161; Remember that a variable with **local scope** exists only **inside** the function in which it is defined.
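
A concrete instance of the syntax above, which also shows local scope. The function `to_megawatts` is a hypothetical example:

```python
def to_megawatts(kilowatts):
    # `factor` is a local variable: it only exists while the function runs
    factor = 1000
    return kilowatts / factor

result = to_megawatts(2500)
print(result)  # 2.5

# Outside the function, `factor` does not exist, so using it raises a NameError
try:
    factor
except NameError:
    print("factor is not defined outside the function")
```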

<div class="alert alert-block alert-success">

**DataFrames: First steps**
        
</div>

To work with DataFrames you need to `import pandas as pd`. <br>
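You can create pandas DataFrames from Python lists or dictionaries: <br>
- From a list: `df = pd.DataFrame(list_name, columns = ["column_1", "column_2"])`
- From a dictionary (each key becomes a column): `df = pd.DataFrame({"column_1": [...], "column_2": [...]})`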

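Take a look at your data with: 
- `df.head()`: the first rows (5 by default)
- `df.tail()`: the last rows (5 by default)
- `df.shape`: the number of (rows, columns)
- `df.dtypes`: the data type of each column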

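Select a column using: 
- `df["column_name"]` or `df.column_name` (the dot notation only works if the column name has no spaces or special characters and does not clash with a DataFrame method)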
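
A small sketch that builds the same DataFrame from a list and from a dictionary, inspects it and selects a column. The column names and values are made up:

```python
import pandas as pd

# From a list of rows (each inner list is one row)
rows = [["North", 120.5], ["South", 98.0], ["East", 105.2]]
df = pd.DataFrame(rows, columns=["region", "load_mw"])

# From a dictionary (each key becomes a column)
df_from_dict = pd.DataFrame({"region": ["North", "South", "East"],
                             "load_mw": [120.5, 98.0, 105.2]})

print(df.head())   # first rows (5 by default)
print(df.shape)    # (3, 2): 3 rows, 2 columns
print(df.dtypes)   # region holds text, load_mw is float64

# Two ways to select the same column
print(df["load_mw"])
print(df.load_mw)
```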

<div class="alert alert-block alert-success">

**DataFrames: Precise Selection by Row or Column**
        
</div>

- `df.loc` selects rows (or columns) by their **labels**
- `df.iloc` selects rows (or columns) by their integer **position**, starting at 0

<br>

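&#128161; When using `loc` and `iloc`, the **row always comes first** and the **column second!**

`[start_row : end_row, start_column : end_column]`

&#128161; With `iloc` the end position is **excluded** (as in normal Python slicing); with `loc` the end label is **included**.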
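
A sketch comparing the two on a small DataFrame with text labels as its index (made-up values). Both selections return the same three rows and two columns, because `loc` includes the end label while `iloc` excludes the end position:

```python
import pandas as pd

df = pd.DataFrame(
    {"load_mw": [120.5, 98.0, 105.2, 110.0], "price": [55, 48, 60, 52]},
    index=["a", "b", "c", "d"],
)

# loc: label-based, the end label "c" is INCLUDED
by_label = df.loc["a":"c", "load_mw":"price"]

# iloc: position-based, the end position 3 is EXCLUDED
by_position = df.iloc[0:3, 0:2]

print(by_label)
print(by_position)
```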

<div class="alert alert-block alert-success">

**Data Cleaning & Manipulation**
        
</div>

**Missing Values**
1. Identifying missing values: <br>
    - `df.isnull().any().any()` (is anything missing at all?), `df.isnull().any()` (per column) and `df.isnull().sum()` (number of missing values per column)

2. Drop missing values: <br>
    - `df.dropna(subset=['column_name'])` drops the rows where `column_name` is missing

3. Replace missing values, e.g. with `'unknown'`: <br>
    - `df.fillna('unknown')`
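
A sketch of the three steps on a tiny made-up DataFrame. Note that `dropna()` and `fillna()` return new DataFrames and leave `df` unchanged. Here `fillna()` receives a dictionary, which fills only the listed columns; `df.fillna('unknown')` would fill every column:

```python
import numpy as np
import pandas as pd

# Small made-up DataFrame with one missing value per column
df = pd.DataFrame({"region": ["North", None, "East"],
                   "load_mw": [120.5, 98.0, np.nan]})

print(df.isnull().any().any())  # True: at least one value is missing
print(df.isnull().any())        # True/False per column
print(df.isnull().sum())        # number of missing values per column

# Drop the rows where load_mw is missing
dropped = df.dropna(subset=["load_mw"])

# Replace missing values in the region column with "unknown"
filled = df.fillna({"region": "unknown"})
```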

**Quick Statistics**

- `df["column_name"].value_counts()`: how often each value occurs
- `df["column_name"].unique()`: the distinct values
- `df["column_name"].describe()`: summary statistics (count, mean, std, min, quartiles, max)
- `df["column_name"].min()`
- `df["column_name"].max()`
- `df["column_name"].mean()`
- `df["column_name"].median()`
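
The same functions on a small made-up DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"region": ["North", "South", "North", "East"],
                   "load_mw": [120.0, 98.0, 110.0, 104.0]})

print(df["region"].value_counts())  # North appears twice, South and East once
print(df["region"].unique())        # distinct values, in order of appearance
print(df["load_mw"].describe())     # count, mean, std, min, quartiles, max
print(df["load_mw"].min(), df["load_mw"].max())        # 98.0 120.0
print(df["load_mw"].mean(), df["load_mw"].median())    # 108.0 107.0
```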

**Filtering and Groupby**
- `df[df["column_name"] == "value_name"]`
- combine multiple filters with `|` (or) and `&` (and), wrapping each condition in parentheses: `df[(condition_1) | (condition_2)]`
- &#128161; `groupby()` only returns a result once you add an aggregator
- `df.groupby("column_name").aggregator()` 
    - example: `energy_flow.groupby("Control area").mean()` (since pandas 2.0, add `numeric_only=True` if the DataFrame also contains text columns)
    - aggregators: `sum(), mean(), max(), min()`
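
A sketch of filtering and grouping on a small made-up DataFrame. Selecting the `load_mw` column before `.mean()` avoids averaging the text column:

```python
import pandas as pd

df = pd.DataFrame({"region": ["North", "South", "North", "East"],
                   "load_mw": [120.0, 98.0, 110.0, 104.0]})

# Single condition
north = df[df["region"] == "North"]

# Several conditions: each one in parentheses, combined with | (or) or & (and)
north_or_east = df[(df["region"] == "North") | (df["region"] == "East")]
big_north = df[(df["region"] == "North") & (df["load_mw"] > 115)]

# groupby needs an aggregator to produce a result
mean_by_region = df.groupby("region")["load_mw"].mean()
print(mean_by_region)  # East 104.0, North 115.0, South 98.0
```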

<div class="alert alert-block alert-success">

**Advanced Manipulation**
        
</div>

**Apply Functions with apply()**
- You can use `apply()` to apply a function to a pandas Series or DataFrame
    - Syntax: 
        - `series_name.apply(function_name)`
        - `df_name.apply(function_name, axis = 1)`
    - On a DataFrame, you choose the direction with `axis`: 
        - `axis = 0` (default): the function receives one column at a time
        - `axis = 1`: the function receives one row at a time
<br>
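
A sketch of the three variants on a small made-up DataFrame. The helper `add_vat` and its 21% rate are hypothetical and only serve as an example function:

```python
import pandas as pd

df = pd.DataFrame({"load_mw": [120.0, 98.0], "price": [50.0, 40.0]})

def add_vat(price):
    # Hypothetical helper: adds 21% VAT to a single value
    return price * 1.21

# Series.apply: the function is called once per value
df["price_vat"] = df["price"].apply(add_vat)

# DataFrame.apply with axis=1: the function receives one row at a time
df["cost"] = df.apply(lambda row: row["load_mw"] * row["price"], axis=1)

# DataFrame.apply with axis=0 (default): the function receives one column at a time
column_max = df[["load_mw", "price"]].apply(lambda column: column.max())
print(df)
print(column_max)
```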

**Vectorization**

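- `import numpy as np`
- vectorized operations work on whole columns at once, which is usually much faster than looping row by row (e.g. with `apply(axis = 1)`)
    - `np.where()` works like the IF function in Excel
    - Syntax: `np.where(condition, value_if_true, value_if_false)`
        - `condition`: a boolean array or Series
        - each value can be a Series, an array, the result of a function, or a scalar
    - `np.select()` handles multiple conditions
    - Syntax: 
        - `conditions = [condition1, condition2, ...]`
        - `choices = [value1, value2, ...]` (one choice per condition)
        - `df["new_column"] = np.select(conditions, choices, default="NA")`
        - the first condition that is True determines the value; rows that match no condition get `default`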
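
A sketch of both functions on a made-up load column. The thresholds (90, 100, 120 MW) and labels are arbitrary:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"load_mw": [80.0, 105.0, 130.0]})

# np.where: one condition, two outcomes (like Excel's IF)
df["above_100"] = np.where(df["load_mw"] > 100, "yes", "no")

# np.select: several conditions, checked in order; the first True one wins
conditions = [
    df["load_mw"] < 90,
    df["load_mw"] < 120,
]
choices = ["low", "medium"]
df["level"] = np.select(conditions, choices, default="high")
print(df)
```

Because the conditions are checked in order, 80 MW gets "low" even though it also satisfies `< 120`.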
           

<div class="alert alert-block alert-success">

**Datetime**
        
</div>

**DateTimeIndex** <br>
Turn a datetime column into a DatetimeIndex while loading your data:
1. Set `parse_dates = True`.
2. Set `index_col = 0` (the first column becomes the index).
- Example: 
    - `pd.read_csv("./data/energy/elia_load_2019_01_15.csv", sep = ";", parse_dates = True, index_col = 0)`

**pd.to_datetime()**
- keeps the datetime values as a regular column but converts them to datetime objects
- Syntax: 
    - `df_name["Datetime"] = pd.to_datetime(df_name["Datetime"])`
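
A sketch of both approaches. Instead of the real file from the example, it reads a tiny in-memory CSV through `io.StringIO`; the column names and values are made up:

```python
import io

import pandas as pd

# Small in-memory CSV standing in for a real file (hypothetical data)
csv_text = "Datetime;Load\n2019-01-15 00:00;9500\n2019-01-15 00:15;9420\n"

# Option 1: parse the first column and use it as a DatetimeIndex
df_indexed = pd.read_csv(io.StringIO(csv_text), sep=";", parse_dates=True, index_col=0)

# Option 2: keep Datetime as a column and convert it afterwards
df_column = pd.read_csv(io.StringIO(csv_text), sep=";")
df_column["Datetime"] = pd.to_datetime(df_column["Datetime"])

print(df_indexed.index)
print(df_column.dtypes)
```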

<div class="alert alert-block alert-success">

**Data Visualization**
        
</div>

**Quick Visualization**
- `df.plot(x = "column_1", y = "column_2")`
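
A minimal sketch with made-up hourly values. `df.plot()` uses matplotlib under the hood and returns a matplotlib Axes. The `matplotlib.use("Agg")` line only selects a non-interactive backend so the code also runs outside a notebook; in Jupyter you can drop it:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed in a notebook
import pandas as pd

df = pd.DataFrame({"hour": [0, 1, 2, 3], "load_mw": [95.0, 90.0, 88.0, 92.0]})

# Line plot of load_mw against hour; returns the matplotlib Axes
ax = df.plot(x="hour", y="load_mw", title="Load per hour")
```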

**Matplotlib**
- `import matplotlib.pyplot as plt`
- Typical pattern: create a figure and its axes with `fig, ax = plt.subplots()`, then draw on `ax`
- See [documentation](https://matplotlib.org/stable/tutorials/introductory/usage.html#sphx-glr-tutorials-introductory-usage-py)
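
A minimal `fig, ax = plt.subplots()` sketch with made-up values. As above, `matplotlib.use("Agg")` is only there so the code runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed in a notebook
import matplotlib.pyplot as plt

hours = [0, 1, 2, 3]
load_mw = [95.0, 90.0, 88.0, 92.0]

# Create the figure and one Axes, then customise the Axes
fig, ax = plt.subplots(figsize=(6, 3))
ax.plot(hours, load_mw, marker="o", label="load")
ax.set_xlabel("Hour")
ax.set_ylabel("Load (MW)")
ax.set_title("Load per hour")
ax.legend()
# plt.show()  # displays the figure when running interactively
```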

**Seaborn**
- `import matplotlib.pyplot as plt` and `import seaborn as sns`
- See [documentation](https://seaborn.pydata.org/introduction.html)
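
A minimal sketch, assuming seaborn is installed, with made-up values. Seaborn draws onto a matplotlib Axes, so the usual matplotlib customisation still works:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed in a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

df = pd.DataFrame({"region": ["North", "South", "East"],
                   "load_mw": [120.0, 98.0, 104.0]})

fig, ax = plt.subplots()
# Bar chart drawn on the matplotlib Axes created above
sns.barplot(data=df, x="region", y="load_mw", ax=ax)
ax.set_title("Load per region")
```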

**Plotly Express**
- Interactive plots by default
- `import pandas as pd` and `import plotly.express as px`
- See [documentation](https://plotly.com/python-api-reference/plotly.express.html)
- Syntax: 
    - `fig = px.bar(data_frame = df_name, x = "column_1", y = "column_2", title = "plot_title")`
    - `fig.show()` displays the interactive plot
    - `print(fig)` prints the figure's data and layout as text