# Data Science in VS Code tutorial

Let's walk through the complete [Data Science in VS Code tutorial](https://code.visualstudio.com/docs/datascience/data-science-tutorial) using this Jupyter Notebook to copy over the code, run it, and capture our observations for each step in markdown cells. Along the way, we'll explore how the Data Science Profile we set up (with the Data Wrangler extension) can make our data science workflow more efficient.

For convenience, the dataset used in this tutorial (`titanic3.csv`) has already been downloaded to the repo under the `1-data/` folder. All you need to do is update the code that loads it so that it references the right location.

---

## 1. Set up a data science environment

The [instructions](https://code.visualstudio.com/docs/datascience/data-science-tutorial#_set-up-a-data-science-environment) require you to set up the environment manually. However, _this_ dev container is already configured with all the packages and libraries you need (listed in `requirements.txt`), specifically the following:
 - pandas
 - jupyter
 - seaborn
 - scikit-learn
 - keras
 - tensorflow

If you want to add libraries for your own exploration, update the `requirements.txt` file, save it, and rebuild the dev container for the changes to take effect.
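As an optional sanity check, a cell along these lines can confirm that the packages listed above are visible to the current kernel. This is a minimal sketch, not part of the original tutorial: the `missing_modules` helper is a hypothetical name, and the import names (for example `sklearn` for scikit-learn) are assumed to match what `requirements.txt` installs.

```python
from importlib.util import find_spec

# Import names for the packages listed above (assumption: scikit-learn
# is imported as "sklearn"; the others use their package name).
REQUIRED_MODULES = ["pandas", "jupyter", "seaborn", "sklearn", "keras", "tensorflow"]


def missing_modules(module_names):
    """Return the module names that cannot be located in the current environment."""
    # find_spec only looks the module up; it does not import it.
    return [name for name in module_names if find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Not found:", ", ".join(missing))
else:
    print("All listed packages were found.")
```

If anything is reported as not found, rebuilding the dev container after checking `requirements.txt` is the first thing to try.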

Once you open this notebook in your dev container (running in GitHub Codespaces or Docker Desktop), you should:
 - Click `Select Kernel` (top right) to open a drop-down of options
 - Select `Python Environments` to see the list of available environments
 - Select `Python 3.10.13`, or whichever entry is marked as recommended in the list
 - Check that the `Select Kernel` box now shows this runtime

You are now ready to run the code in this notebook.
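To double-check which interpreter the selected kernel is actually using, a short cell like the one below can help. It is a sketch using only the standard library; the variable name `kernel_version` is just illustrative, and the version you see may differ from the `3.10.13` mentioned above.

```python
import sys

# Path to the Python interpreter backing the selected kernel.
print("Interpreter:", sys.executable)

# Version in major.minor.micro form, comparable to the label in the kernel picker.
kernel_version = ".".join(str(part) for part in sys.version_info[:3])
print("Python version:", kernel_version)
```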

---

## 2. Prepare the data

The [instructions](https://code.visualstudio.com/docs/datascience/data-science-tutorial#_prepare-the-data) start by loading the data from a downloaded CSV file. Let's modify that to use the location in this repo and run just the first cell. This also validates that our Jupyter Notebook environment is working correctly.
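Relative paths resolve against the kernel's working directory, so if the load fails, it can help to check where the path actually points. The following is a hedged sketch rather than part of the tutorial: `resolve_data_file` is a hypothetical helper, and it assumes the notebook sits one level below the repo root, next to the `1-data/` folder.

```python
from pathlib import Path


def resolve_data_file(relative_path, base_dir=None):
    """Resolve relative_path against base_dir (default: current working directory).

    Raises FileNotFoundError showing the fully resolved path if the file is missing.
    """
    base = Path(base_dir) if base_dir is not None else Path.cwd()
    candidate = (base / relative_path).resolve()
    if not candidate.is_file():
        raise FileNotFoundError(f"Expected dataset at {candidate}")
    return candidate


try:
    csv_path = resolve_data_file("../1-data/titanic3.csv")
    print("Found dataset:", csv_path)
except FileNotFoundError as err:
    # The message shows the absolute path that was tried.
    print(err)
```

When the file is found, the returned path can be passed directly to `pd.read_csv`.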

```python
# 2.1 | Load the data into a pandas DataFrame
import pandas as pd
import numpy as np

# titanic3.csv lives under the repo's 1-data/ folder
df = pd.read_csv('./../1-data/titanic3.csv')

# 2.2 | Have pandas show the first few rows of the data
df.head()
```

   pclass  survived                                             name     sex  \
0       1         1                    Allen, Miss. Elisabeth Walton  female   
1       1         1                   Allison, Master. Hudson Trevor    male   
2       1         0                     Allison, Miss. Helen Loraine  female   
3       1         0             Allison, Mr. Hudson Joshua Creighton    male   
4       1         0  Allison, Mrs. Hudson J C (Bessie Waldo Daniels)  female   

     age  sibsp  parch  ticket      fare    cabin embarked boat   body  \
0  29.00      0      0   24160  211.3375       B5        S    2    NaN   
1   0.92      1      2  113781  151.5500  C22 C26        S   11    NaN   
2   2.00      1      2  113781  151.5500  C22 C26        S  NaN    NaN   
3  30.00      1      2  113781  151.5500  C22 C26        S  NaN  135.0   
4  25.00      1      2  113781  151.5500  C22 C26        S  NaN    NaN   

                         home.dest  
0                     St Louis, MO  
