# Jupyter vs Streamlit

Jupyter notebooks are great for exploring a dataset. Let's do that here: we have stored the Titanic dataset in the subfolder "files". It was downloaded from [Kaggle](https://www.kaggle.com/datasets/vinicius150987/titanic3). We'll install the libraries, import the dataset and show some graphs.

## Installing libraries

We need pandas, matplotlib and numpy. If you're working with Excel files (not CSVs) you'll also need openpyxl, and if you want really pretty graphs, look at seaborn. (The idea is that Python is used as much as it is because of its libraries.)

If you installed the requirements.txt you don't need to run the following commands. They won't be included in the next notebooks, but it's nice to include them once, if only to show the "!" syntax that runs a command in the shell instead of in Python. An alternative is "%", which runs a Jupyter magic command; "%pip", for example, installs into the environment of the running kernel. [It's a long story.](https://dnmtechs.com/difference-between-and-in-jupyter-notebooks-in-python-3/)

In [None]:
# !pip install pandas matplotlib numpy openpyxl
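
If you are not sure whether the requirements are already installed, a quick check from Python can help. This is a minimal sketch that only uses the standard library; the package list is copied from the pip command above.

```python
import importlib.util

# Packages this notebook relies on (names taken from the pip command above).
required = ["pandas", "matplotlib", "numpy", "openpyxl"]

# find_spec returns None when a top-level package cannot be found.
missing = [name for name in required if importlib.util.find_spec(name) is None]

if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are installed.")
```

If the list of missing packages is not empty, uncomment the pip command above and run it.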

## Importing

Importing data files with pandas is quite easy, really. Here the dataset is read from an Excel file, which is why openpyxl is needed.

In [None]:
import pandas as pd

df = pd.read_excel('../../files/titanic3.xlsx', engine='openpyxl')
df.head()
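
If your copy of the dataset is a CSV rather than an Excel file, "pd.read_csv" is the counterpart of "pd.read_excel". The sketch below reads a few made-up rows from an in-memory string so it runs without any file; the rows and the name "df_csv" are assumptions for illustration, not real passenger records.

```python
import io

import pandas as pd

# Made-up rows for illustration only; these are not real passenger records.
csv_text = """pclass,age,fare,embarked
1,29.0,211.34,S
2,35.5,26.00,C
3,,7.75,Q
"""

# read_csv accepts a path or any file-like object; StringIO stands in for a file.
df_csv = pd.read_csv(io.StringIO(csv_text))

print(df_csv.head())
# The empty age field is read as a missing value (NaN).
print(df_csv["age"].isna().sum(), "missing age")
```

With a real file you would pass its path instead of the StringIO object.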


## Some graphs

Some quick and easy graphs to get a feel of the dataset. We're not cleaning it up just yet.

First plot the age of the passengers vs their fare and color the dots based on the class they were in. So you'll need:

* A scatter plot
* 'age' on the X-axis
* 'fare' on the Y-axis
* 'pclass' for the color (c)

We'll give you the code for this graph:

In [None]:
df.plot.scatter(x='age', y='fare', c='pclass', colormap='viridis', s=50, alpha=0.5)

This way of creating a graph, starting from the dataframe and then choosing plot, etc., comes directly from pandas. But pandas is mainly known for its dataframe. The graphs are built on matplotlib (which also needs to be installed). For reasons that will become clear later on, we'll now add some code to the graph we made before.

Mainly, we create a figure and axes first by using "plt.subplots()". This "ax" is then passed to the plot (which is the same code as before but with the "ax" parameter added). Then we can customize the graph (add a title and labels) and finally show it using "plt.show()".

In [None]:
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
df.plot.scatter(x='age', y='fare', c='pclass', colormap='viridis', s=50, alpha=0.5, ax=ax)
ax.set_title('Titanic passengers')
ax.set_xlabel('Age')
ax.set_ylabel('Fare')
# show the plot
plt.show()


Make a histogram of all ages on the Titanic. Keep using the long syntax (with matplotlib) we used earlier.

In [None]:
# Up to you!
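
If you get stuck, here is one possible shape of the long syntax for a histogram. It is a sketch on a small made-up dataframe ("toy" is a hypothetical name, and the ages are invented), so you still have to apply it to the real dataset yourself.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up ages for illustration; in the notebook you would use df['age'].
toy = pd.DataFrame({"age": [2, 18, 22, 22, 30, 35, 41, 47, 58, 71, None]})

fig, ax = plt.subplots()
# Missing ages are left out of the histogram; bins=8 is an arbitrary choice.
toy["age"].plot.hist(bins=8, ax=ax)
ax.set_title("Age distribution (toy data)")
ax.set_xlabel("Age")
ax.set_ylabel("Number of passengers")
plt.show()
```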



Show a boxplot of the ages, divided by the class they were in.

In [None]:
# Up to you!
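
As a hint, pandas has a "boxplot" method on the dataframe that accepts a "by" parameter for grouping. The sketch below uses invented ages and classes in a hypothetical "toy" dataframe; treat it as one possible approach, not the only one.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up values for illustration; the real dataframe has the same column names.
toy = pd.DataFrame({
    "pclass": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "age": [38, 54, 61, 27, 34, 45, 4, 19, 26],
})

fig, ax = plt.subplots()
# by='pclass' draws one box per class on the axes we pass in.
toy.boxplot(column="age", by="pclass", ax=ax)
# pandas adds its own super-title for grouped boxplots; clear it.
fig.suptitle("")
ax.set_title("Age by class (toy data)")
ax.set_xlabel("Class")
ax.set_ylabel("Age")
plt.show()
```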



## Streamlit

Jupyter notebooks are nice, but not very interactive. If you want to change a graph you have to edit the code and rerun the cell, and anyone who wants to view (and rerun) the notebook needs to install VS Code, Python, the libraries, ...

Streamlit solves all this, but the caveat is that it requires Python scripts, not Jupyter notebooks.

We'll write the code here first and put it in a Streamlit file later on.

### Filtered dataframe

Filter the dataframe based on two ages: the minimum and maximum age. Enter these ages as variables for now.

In [None]:
min_age = 70
max_age = 80

# filter the dataframe
df_filtered = df[(df.age >= min_age) & (df.age <= max_age)]
df_filtered
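
The same range filter can also be written with "Series.between", which includes both endpoints by default. A small sketch on made-up ages (the "toy" dataframe is hypothetical) that checks both versions select the same rows:

```python
import pandas as pd

# Made-up ages for illustration, including one missing value.
toy = pd.DataFrame({"age": [4, 30, 70, 74, 80, 81, None]})
min_age = 70
max_age = 80

# Comparison-based mask, as in the cell above.
mask_compare = (toy["age"] >= min_age) & (toy["age"] <= max_age)
# Series.between includes both endpoints by default.
mask_between = toy["age"].between(min_age, max_age)

print(toy[mask_between])
print("Same rows selected:", mask_compare.equals(mask_between))
```

In both versions rows with a missing age are left out, because a comparison with a missing value evaluates to False.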

### Histogram

Create a histogram of the ages for all people between two pre-set ages. Enter these ages as variables again.

In [None]:
# Up to you!



### Boxplot by embarked

The Titanic made 3 stops to pick up passengers: **S**outhampton, **C**herbourg and **Q**ueenstown. Show the boxplot of ages based on class, filtered on the place of embarkation.

In [None]:
# Up to you!



### Widgets

Note how we always made you define a variable before filtering the dataframe? These will become [widgets](https://docs.streamlit.io/library/api-reference/widgets) in the streamlit-app.

* Age: [slider](https://docs.streamlit.io/library/api-reference/widgets/st.slider)
* Embarked: [selectbox](https://docs.streamlit.io/library/api-reference/widgets/st.selectbox)

And the graphs? They can be plotted using "st.pyplot(fig)" with fig being the fig-variable we've been creating.

The code for this can be found in the file "3.3 - First streamlit.py".

It's a first try and could use some updates. The selection boxes and results are all stacked below each other, which makes for a page that is not very user-friendly. But as a proof of concept (PoC) it's fine!