# Basic Libraries
```python
import numpy as np 
```
* N-dimensional **arrays** (e.g. 2D matrices)
* handling *NaN* values
* mathematical operations (sum, average)
* find key features (unique, count)
* other array functions (transpose, reverse, flattening)
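A minimal sketch of the NumPy operations listed above, run on a small hypothetical 2D array that contains one missing value (`np.nan`):

```python
import numpy as np

# Hypothetical 2x3 matrix with one missing value
a = np.array([[1.0, 2.0, np.nan],
              [4.0, 2.0, 6.0]])

# NaN handling: np.sum propagates NaN, while np.nansum / np.nanmean skip it
print(np.sum(a))       # nan
print(np.nansum(a))    # 15.0
print(np.nanmean(a))   # 3.0

# Key features: unique values and how often each occurs (NaN filtered out first)
values, counts = np.unique(a[~np.isnan(a)], return_counts=True)
print(values, counts)

# Other array functions
print(a.T.shape)            # transpose -> (3, 2)
print(np.flip(a, axis=1))   # reverse the order within each row
print(a.flatten())          # 1D copy of all 6 elements
```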

```python
import pandas as pd
```
* handling **DataFrame** objects (tables, similar to an Excel sheet)
* reading data from CSV and Excel files (read_csv, read_excel)
* data analysis
* handling different data types

```python
from matplotlib import pyplot as plt
```
* plotting data (Seaborn is built on top of Matplotlib)
* figure canvas for displaying plots
* adjustable axes, labels and titles
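A minimal sketch of a labelled Matplotlib plot, assuming hypothetical `x` / `y` values; it shows the canvas size, axis labels, title and axis range being set explicitly:

```python
from matplotlib import pyplot as plt

# Hypothetical data points
x = [1, 2, 3, 4]
y = [10, 20, 15, 25]

fig, ax = plt.subplots(figsize=(6, 4))  # canvas size in inches (width, height)
ax.plot(x, y, marker="o")
ax.set_xlabel("Level")                  # adjustable axis labels
ax.set_ylabel("HP")
ax.set_title("HP by level")
ax.set_xlim(0, 5)                       # adjustable axis range
plt.show()
```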

```python
import seaborn as sb
sb.set() # set the default Seaborn style for graphics
```


<span style="color:White; font-size:4em;">Jupyter Shortcuts</span>

$\color{orange}{{\rm General}}$  
- Ctrl-Shift-P - List of All Commands

---

$\color{blue}{{\rm Cell~Options}}$  
- M - change Cell to Markdown
- Y - change Cell to Code (Python)
- Enter - Edit Selected Cell (Edit Mode)
- Ctrl-Shift-Minus (-) - Split Cell at the Cursor
- Shift-Arrow - Select Multiple Cells (for reorganizing Notebook)
- Shift-M - Merge Selected Cells

$\color{green}{{\rm Code~Shortcuts}}$  
- Shift-Enter - Run Cell 
- Alt-Enter - Run Cell and Insert Cell Below
- Esc - enter Command Mode (navigate Cells with Arrow Keys)
- b - Insert Cell Below
- a - Insert Cell Above
- z - Undo Cell Deletion
- dd - Delete Selected Cell
- o - Toggle Output
- f - Find and Replace (variable names)
- Alt-Select - Multicursor
- Ctrl-/ - Comment Line

---

$\color{red}{{\rm Module}}$  
- Shift-Tab (press repeatedly to expand) - view Docstring
- find out how a function works

    `?module_name.function_name`
    

- install a module into the environment of the running kernel

    `%pip install module_name`
 

- View the output of every expression in a Cell (not just the last one)
```python
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"
```
- View Magic Functions
```python 
%lsmagic
```

$\color{teal}{{\rm View~Objects}}$  
- `%who DataFrame` - list all variables of type DataFrame
- `%%bash` - run the Cell as shell commands (e.g. list files in the directory)
```python
%%bash
ls *.ipynb
```

$\color{pink}{{\rm Plotting}}$  
- `plotting_function;` - a trailing semicolon suppresses the text representation of the returned object; the plot itself is still shown

---

$\color{grey}{{\rm Others}}$  
- write Formulas using LaTeX, e.g. Bayes' theorem $P(A \mid B) = \frac{P(B \mid A)P(A)}{P(B)}$
- [Lesson NTULearn](https://ntulearn.ntu.edu.sg/webapps/collab-ultra/tool/collabultra?course_id=_2608548_1&mode=view)
- [Tutorials NTULearn](https://ntulearn.ntu.edu.sg/webapps/blackboard/content/listContent.jsp?course_id=_2608548_1&content_id=_2797669_1)
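The Bayes' theorem formula in the list above can be checked numerically. A minimal sketch: the probabilities are hypothetical (a test with 1% prevalence, 90% sensitivity and a 5% false-positive rate), and $P(B)$ is expanded with the law of total probability; `bayes` is an illustrative helper, not a library function.

```python
def bayes(p_b_given_a, p_a, p_b_given_not_a):
    """Return P(A | B) via Bayes' theorem (hypothetical helper).

    P(B) is computed with the law of total probability:
    P(B) = P(B | A) P(A) + P(B | not A) P(not A)
    """
    p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)
    return p_b_given_a * p_a / p_b

# Hypothetical numbers: P(B|A) = 0.9, P(A) = 0.01, P(B|not A) = 0.05
print(round(bayes(0.9, 0.01, 0.05), 4))  # 0.1538
```

Even with a 90% sensitive test, a positive result here means only about a 15% chance of A, because A is rare.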

# DataFrame functions

Slice DataFrame by Column

`df["Column"]`

Slice DataFrame by Row (integer position)

`df.iloc[row]`
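A minimal sketch of column and row selection on a small hypothetical DataFrame; a single label or position returns a Series, while a list of columns or a slice of rows returns a DataFrame:

```python
import pandas as pd

# Hypothetical DataFrame for illustration
df = pd.DataFrame({"Name": ["Bulbasaur", "Charmander", "Squirtle"],
                   "HP": [45, 39, 44]})

col = df["HP"]              # one column -> Series
row = df.iloc[1]            # second row by integer position -> Series indexed by column names
cols = df[["Name", "HP"]]   # list of columns -> DataFrame
rows = df.iloc[0:2]         # row slice (end excluded) -> DataFrame with rows 0 and 1
print(row["Name"])          # Charmander
```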

read CSV file (`header=None` if the file has no header row)

`df = pd.read_csv(filename, header=None)`
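A minimal sketch of reading a headerless CSV with `pd.read_csv`; an in-memory `io.StringIO` with hypothetical contents stands in for a real file on disk:

```python
import io
import pandas as pd

# Stand-in for a headerless CSV file (hypothetical contents)
csv_text = "Bulbasaur,45,49\nCharmander,39,52\n"

# header=None: the first line is data, and columns are numbered 0, 1, 2, ...
df = pd.read_csv(io.StringIO(csv_text), header=None)
print(df.shape)  # (2, 3)

# names= supplies the column labels yourself
df_named = pd.read_csv(io.StringIO(csv_text), header=None,
                       names=["Name", "HP", "Attack"])
print(df_named["HP"].tolist())  # [45, 39]
```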

read Excel file

```
df = pd.read_excel(filename,
                   sheet_name = 'Sheet1',
                   header = None,
                   engine = 'openpyxl')  # .xlsx files; use engine = 'xlrd' for legacy .xls files
```

DataFrame functions

```
type(df)      --> check variable type
df.shape      --> size of DataFrame (rows, cols)
df.head()     --> first 5 rows
df.info()     --> columns, non-null counts and their types
df.dtypes     --> simplified view of columns and their types
df.describe() --> summary statistics of numeric columns
```
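A minimal sketch of these inspection functions on a small hypothetical DataFrame with one text column and two integer columns:

```python
import pandas as pd

# Hypothetical DataFrame with mixed column types
df = pd.DataFrame({"Name": ["Bulbasaur", "Charmander", "Squirtle"],
                   "HP": [45, 39, 44],
                   "Attack": [49, 52, 48]})

print(type(df))       # <class 'pandas.core.frame.DataFrame'>
print(df.shape)       # (3, 3)
print(df.head())      # up to the first 5 rows (all 3 here)
df.info()             # prints columns, non-null counts and dtypes
print(df.dtypes)      # one dtype per column
print(df.describe())  # count, mean, std, min, quartiles, max of the numeric columns
```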

Plot functions

```
plt.figure(figsize=(24, 4)) --> set canvas size (width, height in inches)

sb.boxplot(data = hp, orient = "h")    --> best view for central tendency and spread
sb.histplot(data = hp)                 --> best view for distribution / frequency
sb.kdeplot(data = hp)                  --> smoothed estimate of the distribution (kernel density)
sb.histplot(data = hp, kde = True)     --> histogram with KDE curve overlaid
sb.violinplot(data = hp, orient = "h") --> boxplot combined with a mirrored KDE

```
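A box plot summarises quartiles, and the same numbers can be computed directly. A minimal sketch with a hypothetical `hp` Series; it assumes the common 1.5 × IQR whisker rule (the default `whis=1.5` in Seaborn and Matplotlib box plots) for flagging outliers:

```python
import pandas as pd

# Hypothetical HP values standing in for the `hp` data plotted above
hp = pd.Series([45, 39, 44, 60, 80, 250], name="HP")

q1, median, q3 = hp.quantile([0.25, 0.5, 0.75])
iqr = q3 - q1  # length of the box

# Points more than 1.5 * IQR beyond the box are drawn as individual outliers
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = hp[(hp < lower) | (hp > upper)]
print(q1, median, q3, iqr)  # 44.25 52.5 75.0 30.75
print(outliers.tolist())    # [250]
```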

Multiple Plots (grid of subplots)


```
f, axes = plt.subplots(2, 3, figsize=(24, 12))
sb.boxplot(data = hp, orient = "h", ax = axes[0,0])
sb.histplot(data = hp, ax = axes[0,1])
sb.violinplot(data = hp, orient = "h", ax = axes[0,2])
sb.boxplot(data = attack, orient = "h", ax = axes[1,0])
sb.histplot(data = attack, ax = axes[1,1])
sb.violinplot(data = attack, orient = "h", ax = axes[1,2])
```
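`plt.subplots(2, 3, ...)` returns the figure and a 2D NumPy array of axes, indexed as `axes[row, col]`. A minimal sketch showing that layout, with hypothetical panel titles assigned in row-major order:

```python
from matplotlib import pyplot as plt

# 2 rows x 3 columns of axes on a 24 x 12 inch canvas
f, axes = plt.subplots(2, 3, figsize=(24, 12))
print(axes.shape)  # (2, 3)

# Hypothetical labelling: axes.flat walks the grid row by row
for i, ax in enumerate(axes.flat):
    ax.set_title(f"panel {i}")
print(axes[1, 0].get_title())  # panel 3
plt.show()
```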

Joining DataFrames

Create a joint DataFrame by concatenating the two Series variables
```
jointDF = pd.concat([attack, hp], axis = 1).reindex(attack.index)
```
Draw jointplot of the two variables in the joined dataframe (plotting one column vs another column)
```
sb.jointplot(data = jointDF, x = "Attack", y = "HP", height = 12)
```
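A minimal sketch of the concatenation step above, with hypothetical `attack` and `hp` Series that share the same index; with `axis = 1` each Series becomes a column named after the Series:

```python
import pandas as pd

# Hypothetical Series sharing the same index (one row per Pokemon)
attack = pd.Series([49, 52, 48], name="Attack")
hp = pd.Series([45, 39, 44], name="HP")

# axis=1 places the Series side by side, aligned on their index
jointDF = pd.concat([attack, hp], axis=1).reindex(attack.index)
print(jointDF.columns.tolist())  # ['Attack', 'HP']
print(jointDF.shape)             # (3, 2)
```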

Correlation

Numerical (Pearson correlation by default)
```
jointDF.corr()
```
Visual
```
sb.heatmap(jointDF.corr(), vmin = -1, vmax = 1, annot = True, fmt=".2f")
```
- `annot = True` writes the correlation value in each cell
- `fmt = ".2f"` formats those values to 2 decimal places
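`DataFrame.corr()` uses the Pearson coefficient by default. A minimal sketch that computes it by hand for two hypothetical columns and compares the result with `corr()`:

```python
import numpy as np
import pandas as pd

# Hypothetical joint DataFrame
jointDF = pd.DataFrame({"Attack": [49, 52, 48, 65, 80],
                        "HP": [45, 39, 44, 60, 80]})

x, y = jointDF["Attack"], jointDF["HP"]

# Pearson r: sum of products of deviations, divided by the
# square root of the product of the sums of squared deviations
dx, dy = x - x.mean(), y - y.mean()
r = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

print(round(r, 4))
print(round(jointDF.corr().loc["Attack", "HP"], 4))  # same value
```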

Multivariate Statistics
```
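numDF = pd.DataFrame(pkmndata[["HP", "Attack", "Defense", "Sp. Atk", "Sp. Def", "Speed"]])

# one row of plots per variable, three plot types per row
f, axes = plt.subplots(len(numDF.columns), 3, figsize=(24, 36))

# iterating over a DataFrame yields its column names
for count, var in enumerate(numDF):
    sb.boxplot(data = numDF[var], orient = "h", ax = axes[count,0])
    sb.histplot(data = numDF[var], ax = axes[count,1])
    sb.violinplot(data = numDF[var], orient = "h", ax = axes[count,2])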
```

pairplot

Draw every pair of variables against one another (scatter plots off the diagonal, distribution of each variable on the diagonal)

```
sb.pairplot(data = numDF)
```
