In [None]:
---
title: "Basic Modules"
execute:
    echo: true
    eval: true
---

# Numpy and Pandas {.unnumbered}

## Numpy {.unnumbered}
- Numpy is the fundamental library for numerical computing in Python. 
- Python's built-in data types, such as lists, are comparatively slow and memory-inefficient for numerical data analysis. 
- The core of Numpy is implemented mainly in compiled C and C++ code. 
- This allows Numpy to be much faster than plain Python for numerical work. 
- Numpy introduces a new data type, the Numpy array (`numpy.ndarray`). 
- Numpy arrays are multidimensional arrays whose elements share one data type, which makes numerical operations on them much faster than on Python lists. 
- The library also includes many mathematical functions and methods for linear algebra. 

More information can be found on the [Numpy website](https://numpy.org/).
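
As a minimal sketch of the points above (the variable names are illustrative and not part of the original text), arithmetic on a Numpy array is applied element-wise without an explicit loop, and `np.sqrt` and `np.linalg.solve` stand in for the mathematical and linear algebra functions mentioned:

```python
import numpy as np

values = [1.0, 2.0, 3.0, 4.0]

# Plain Python: every element is handled explicitly in a loop
scaled_list = [2.0 * v for v in values]

# Numpy: the operation is applied to the whole array at once
values_arr = np.array(values)
scaled_arr = 2.0 * values_arr

# A mathematical function and a linear algebra routine
roots = np.sqrt(values_arr)             # element-wise square root
A = np.array([[2.0, 0.0], [0.0, 4.0]])  # small example system, chosen for illustration
b = np.array([2.0, 8.0])
x = np.linalg.solve(A, b)               # solves A @ x = b

print(scaled_list)
print(scaled_arr)
print(roots)
print(x)
```

The array version avoids the Python-level loop, which is where most of the speed advantage on large data comes from.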

In [None]:
# Load the required libraries
import numpy as np

### Numpy Arrays {.unnumbered}

An array can be thought of as a multidimensional list whose elements all have the same data type. For example, a matrix is a 2D array.

In [None]:
mat = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(mat)

[[1 2 3]
 [4 5 6]
 [7 8 9]]
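
To make the "2D" part concrete, the following sketch inspects the same matrix through the standard array attributes `ndim`, `shape`, and `dtype`. The exact integer type reported by `dtype` depends on the platform, so no output is shown for it here.

```python
import numpy as np

mat = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

print(mat.ndim)   # number of dimensions (axes) of the array
print(mat.shape)  # size along each axis: (rows, columns)
print(mat.dtype)  # common data type of all elements (a platform-dependent integer type)
```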


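Individual elements of an array are accessed by their indices, which start at 0. For a 2D array the indices are given as `[row, column]`, and a colon (`:`) selects all elements along that axis.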

In [None]:
print(mat[0, 0])  # element at row 0, column 0 -> 1

1


In [None]:
print(mat[1, 2])  # element at row 1, column 2 -> 6

6


In [None]:
print(mat[:, 0])  # all elements in column 0 -> [1 4 7]

[1 4 7]


In [None]:
print(mat[1, :])  # all elements in row 1 -> [4 5 6]

[4 5 6]
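
Beyond single rows and columns, the same bracket syntax supports slices, negative indices, and boolean masks. The following is a small sketch on the same matrix; the names `sub`, `last_row`, and `large` are chosen here for illustration.

```python
import numpy as np

mat = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

sub = mat[0:2, 1:3]    # rows 0-1 and columns 1-2 (the stop index is excluded)
last_row = mat[-1, :]  # negative indices count from the end
large = mat[mat > 5]   # boolean mask: the elements greater than 5, returned as a 1D array

print(sub)
print(last_row)
print(large)
```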


In [None]:
empty_mat = np.empty((3, 3), dtype=float)  # a 3x3 array of uninitialized memory; the values are arbitrary
print(empty_mat)

[[4.9e-324 9.9e-324 1.5e-323]
 [2.0e-323 2.5e-323 3.0e-323]
 [3.5e-323 4.0e-323 4.4e-323]]


In [None]:
ones_mat = np.ones((3, 3))  # a 3x3 matrix filled with ones (float by default)
print(ones_mat)

[[1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]]


In [None]:
zeros_mat = np.zeros((3, 3))  # a 3x3 matrix filled with zeros (float by default)
print(zeros_mat)

[[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]
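
The creation functions above take a shape tuple and, optionally, a `dtype`. As a short sketch of related constructors (`np.full` and `np.eye` are not used in the original text and are shown here only for comparison):

```python
import numpy as np

int_zeros = np.zeros((2, 2), dtype=int)  # zeros stored as integers instead of floats
sevens = np.full((2, 3), 7.0)            # a 2x3 array filled with a chosen constant
identity = np.eye(3)                     # a 3x3 identity matrix

print(int_zeros)
print(sevens)
print(identity)
```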


In [None]:
arr = np.arange(1, 10, 2)  # values from 1 up to (but not including) 10 with a step of 2
print(arr)

[1 3 5 7 9]


In [None]:
arr2 = np.linspace(0, 100, 5)  # 5 evenly spaced values from 0 to 100 (both endpoints included)
print(arr2)

[  0.  25.  50.  75. 100.]
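
The main difference between the two functions is how the stop value is treated: `np.arange` takes a step size and excludes the stop value, while `np.linspace` takes a number of points and includes the stop value unless `endpoint=False` is passed. A minimal sketch with illustrative variable names:

```python
import numpy as np

by_step = np.arange(0, 1, 0.25)                     # step of 0.25; the stop value 1 is excluded
by_count = np.linspace(0, 1, 5)                     # 5 points; the stop value 1 is included
no_endpoint = np.linspace(0, 1, 4, endpoint=False)  # 4 points; the stop value 1 is excluded

print(by_step)
print(by_count)
print(no_endpoint)
```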


In [None]:
rand_mat = np.random.rand(3, 3)  # a 3x3 matrix of uniform random numbers in [0, 1); values differ on every run
print(rand_mat)

[[0.2256729  0.02910694 0.54348537]
 [0.37709634 0.44347666 0.83187513]
 [0.96959175 0.36089956 0.10453707]]


## Pandas {.unnumbered}

- Pandas is a library for data manipulation and analysis. 
- It is built on top of Numpy. 
- With Pandas you work with dataframes rather than with arrays as in Numpy. 
- Dataframes are two-dimensional labeled data structures with columns of potentially different types.
- A dataframe resembles a table in a database or a spreadsheet, and Pandas provides many methods for manipulating it. 
- You can select subsets of the data and filter, sort, group, merge, or join it. 
- You can also analyze the data statistically, export it to different file formats, and plot it with the help of `matplotlib`.

More information can be found on the [Pandas website](https://pandas.pydata.org/).
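
The connection to Numpy can be seen directly: the values of a Pandas column can be obtained as a Numpy array with `to_numpy()`. A minimal sketch, using a small illustrative `Series`:

```python
import numpy as np
import pandas as pd

weights = pd.Series([16, 18, 44], name="molar_weight")  # a single labeled column
weights_arr = weights.to_numpy()                         # the underlying values as a Numpy array

print(type(weights_arr))
print(weights_arr)
```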

In [2]:
import pandas as pd

### Pandas DataFrames {.unnumbered}

Pandas DataFrames are two-dimensional labeled data structures, similar to a table, whose columns can have different data types.

In [4]:
data = pd.DataFrame({"gas": ["CH4", "H2O", "CO2"],
                     "molar_weight": [16, 18, 44]})  # create a dataframe from a dictionary
print(data)

   gas  molar_weight
0  CH4            16
1  H2O            18
2  CO2            44
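
Some of the operations listed above (selecting, filtering, sorting, and simple statistics) can be sketched on this small table. The block rebuilds the dataframe so that it runs on its own. The column name `molar_weight` and the threshold of 17 are assumptions made for this illustration.

```python
import pandas as pd

data = pd.DataFrame({"gas": ["CH4", "H2O", "CO2"],
                     "molar_weight": [16, 18, 44]})

print(data.dtypes)  # each column has its own data type

weights = data["molar_weight"]                               # select one column (a Series)
heavy = data[data["molar_weight"] > 17]                      # filter rows with a boolean condition
ordered = data.sort_values("molar_weight", ascending=False)  # sort rows by a column
mean_weight = data["molar_weight"].mean()                    # a simple statistic

print(heavy)
print(ordered)
print(mean_weight)
```

Each of these operations returns a new object and leaves `data` itself unchanged.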
