# T4SG Data Science Starter Project - Jupyter Notebook Setup



### [Table of contents](#top)
- **1** [Checking Your Python Setup](#part1)
- **2** [Virtual Environments & Package Managers](#part2)
- **3** [Jupyter in VSCode](#part3)

## Step 1: Check Python Setup <a id="part1"></a>

First, check that Python 3.8 or higher is installed on your machine.

You can do this in the terminal by running:

```sh
which python
```

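You'll be shown a path such as `/usr/bin/python` (on Windows Command Prompt, the equivalent command is `where python`). If a path is found, you can invoke `python` and ask for its version. On some systems, including recent versions of macOS, the command is `python3` rather than `python`.

```sh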
python --version
```

You can also run shell commands from within a Jupyter notebook by prefixing them with `!`:

```python
# !command runs `command` in a system shell started by the notebook's kernel
!python --version
```

```
Python 3.12.5
```
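The same checks can be done in plain Python. Inside a notebook this is often more reliable than `!python --version`, because it reports on the interpreter actually running the notebook rather than whichever `python` the shell finds first on its `PATH`. A minimal standard-library sketch; the helper name `python_version_ok` is our own:

```python
import sys

def python_version_ok(minimum=(3, 8)):
    """Return True if the running interpreter is at least version `minimum`."""
    return sys.version_info[:2] >= tuple(minimum)

# Full path of the interpreter running this code (the analogue of `which python`)
print(sys.executable)
# Version number only, e.g. "3.12.5" (the analogue of `python --version`)
print(sys.version.split()[0])
print("Meets the 3.8+ requirement:", python_version_ok())
```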


## Step 2: Set Up the T4SG Virtual Environment <a id="part2"></a>

A [virtual environment](https://docs.python.org/3/library/venv.html#venv-def) is an isolated directory structure containing a specific Python version and a set of additional packages. This isolation allows for project-specific dependencies without affecting the system-wide Python installation.
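To make "isolated directory structure" concrete, the sketch below uses the standard-library `venv` module to create a throwaway environment in a temporary folder, then checks for the environment's marker file (`pyvenv.cfg`) and its own copy of the interpreter. The helper names are illustrative, and `with_pip=False` is chosen only to keep the demo fast and offline; this is not how the T4SG environment itself is created.

```python
import sys
import tempfile
import venv
from pathlib import Path

def make_demo_env(parent: Path) -> Path:
    """Create a bare virtual environment (no pip) under `parent` and return its path."""
    env_dir = parent / "demo-env"
    venv.create(env_dir, with_pip=False)  # skip pip so no packages are installed
    return env_dir

def env_python(env_dir: Path) -> Path:
    """Path of the environment's own interpreter (the layout differs on Windows)."""
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"

def inspect_demo_env():
    """Create a throwaway env and report what it contains; it is deleted afterwards."""
    with tempfile.TemporaryDirectory() as tmp:
        env_dir = make_demo_env(Path(tmp))
        return (env_dir / "pyvenv.cfg").is_file(), env_python(env_dir).exists()

has_cfg, has_python = inspect_demo_env()
print("pyvenv.cfg present:", has_cfg)
print("environment has its own interpreter:", has_python)
```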

Typically, Python distributions come with:
* [venv](https://docs.python.org/3/library/venv.html#module-venv) - a tool for creating and managing virtual environments
* [pip](https://docs.python.org/3/installing/index.html#installing-index) - a package installer for Python, used to download and manage third-party packages (primarily from PyPI)

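Package managers such as Conda and Micromamba extend the concept of a virtual environment beyond Python: their environments can also hold packages from other languages (such as R) and compiled system libraries. See the README for more details on package managers and virtual environments.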

This folder contains a `t4sg.yml` file, in the same directory as this notebook, that lists the packages we'll use for this project. We provide a baseline set of packages, but your SSWE/PM may change it depending on your project's specific functionality.
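Before creating the environment, you can confirm that `t4sg.yml` is where you expect and glance at what it lists. A minimal standard-library sketch; it assumes the notebook's working directory is the notebook's own folder (the usual default in VSCode and Jupyter), and `find_env_file` is a hypothetical helper name:

```python
from pathlib import Path

def find_env_file(name="t4sg.yml", folder=None):
    """Return the path to `name` inside `folder` (default: current directory), or None."""
    folder = Path.cwd() if folder is None else Path(folder)
    candidate = folder / name
    return candidate if candidate.is_file() else None

env_file = find_env_file()
if env_file is None:
    print("t4sg.yml not found in", Path.cwd())
else:
    # Print the spec as-is; a conda-style file typically lists name, channels, dependencies
    print(env_file.read_text())
```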

To have micromamba create a new environment from our YAML specification, run the following in the terminal from the folder containing `t4sg.yml`:
```sh
micromamba create -f t4sg.yml
```
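If the command above fails with a "command not found" error, micromamba is either not installed or not on your `PATH`. The sketch below checks this with `shutil.which`, the standard library's equivalent of the shell's `which`; the helper name is illustrative. Note that a notebook kernel may see a different `PATH` than your terminal, so running this with `python` from the same terminal gives the most faithful answer.

```python
import shutil

def find_tool(name):
    """Return the full path of executable `name` if it is on PATH, else None."""
    return shutil.which(name)

for tool in ("micromamba", "python"):
    location = find_tool(tool)
    print(f"{tool}: {location or 'not found on PATH'}")
```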

Once all the packages have been installed, we can activate our new environment:
```sh
micromamba activate t4sg
```

Your shell prompt should now show the name of the active environment. To turn the environment off, run `micromamba deactivate`.

## Step 3: Set Up Jupyter Notebook <a id="part3"></a>

### Setting up Jupyter in VSCode

This starter project is a [Jupyter Notebook](https://jupyter.org/), a document that combines formatted text with runnable code. We recommend using Visual Studio Code (VSCode) as your development environment. To use Jupyter Notebooks with VSCode:
 1. Install the "Jupyter" extension in VSCode:
    - Open VSCode
    - Go to the Extensions view (Ctrl+Shift+X or Cmd+Shift+X on macOS)
    - Search for "Jupyter"
    - Install the official Jupyter extension by Microsoft
 
 2. Open or create a Jupyter Notebook (.ipynb file) in VSCode

### Selecting the Correct Kernel

To run your notebook, you must select the correct kernel, i.e. the Python environment that executes the notebook's code. To select the kernel in VSCode:

1. Open your Jupyter Notebook in VSCode
2. Look for the "Select Kernel" button in the top-right corner of the notebook
3. Click on it and choose the kernel that corresponds to your 't4sg' environment
4. If you don't see your 't4sg' environment, you may need to restart VSCode or manually add the kernel path

If you launch VSCode from a terminal (for example with `code .`), activate your `t4sg` environment in that terminal first so that VSCode inherits it and offers it as a kernel option.

### Manually Adding a Kernel

If you don't see your virtual environment in the kernel selection menu, you may need to manually add it. Here's how you can find and add your virtual environment to VSCode:
1. First, activate your virtual environment in the terminal:
   ```sh
   micromamba activate t4sg
   ```
2. With the environment activated, run this command to get the path to your Python interpreter:
   ```sh
   which python
   ```
   This will output a path like `/path/to/your/micromamba/envs/t4sg/bin/python`.

3. In VSCode, open the Command Palette (Cmd+Shift+P on macOS or Ctrl+Shift+P on Windows/Linux), type "Python: Select Interpreter", and choose that command. Click "Enter interpreter path..." at the bottom of the list and paste the path you got in step 2. VSCode should now recognize it as a valid interpreter and add it to your list of available kernels.

4. Open your Jupyter notebook again and try selecting the kernel. You should now see your `t4sg` environment listed. If you still don't see it, you may need to restart VSCode for the changes to take effect.
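Whichever way you selected the kernel, you can confirm the notebook really runs in the `t4sg` environment by looking at `sys.prefix`, which for a micromamba environment points at the environment's folder (for example `.../envs/t4sg`). A minimal sketch; the folder-name comparison assumes the environment kept the name `t4sg` from `t4sg.yml`, and `running_in_env` is a hypothetical helper:

```python
import sys
from pathlib import Path

def running_in_env(env_name="t4sg", prefix=None):
    """True if the interpreter's prefix folder (default: sys.prefix) is named `env_name`."""
    prefix = Path(sys.prefix if prefix is None else prefix)
    return prefix.name == env_name

print("Interpreter:", sys.executable)
print("Environment prefix:", sys.prefix)
print("Running in t4sg:", running_in_env())
```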

After selecting the `t4sg` kernel, you should be able to run a code cell and use all the packages listed in `t4sg.yml`.
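To check which packages the kernel can actually find, and which versions, you can query installed package metadata with the standard-library `importlib.metadata` module (Python 3.8+). The list below is our own choice, based on the libraries used later in this notebook; your `t4sg.yml` may differ. Note that it uses distribution names as published on PyPI, which can differ from import names (`scikit-learn` is imported as `sklearn`).

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names (not necessarily import names) to look up
LIBRARIES = ["numpy", "pandas", "scipy", "scikit-learn", "matplotlib", "seaborn"]

def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found

for name, ver in installed_versions(LIBRARIES).items():
    print(f"{name:<13} {ver or 'NOT INSTALLED'}")
```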

### Running Jupyter Notebooks

To run a Jupyter Notebook cell in VSCode, click inside the cell you want to run, then do one of the following:
- Use the keyboard shortcut:
  - Mac: Shift + Enter (or Ctrl + Enter)
  - Windows/Linux: Shift + Enter
- Click the "Run Cell" play button that appears to the left of the cell.

To run every cell in the notebook from top to bottom, use the "Run All" button in the toolbar at the top of the notebook.

When you run a code cell, VSCode sends the code to the Jupyter kernel (here, the `t4sg` environment we set up). The kernel executes it and sends the results back to VSCode, which displays them below the cell: anything the code prints, followed by the value of the cell's last line if that line is an expression. Markdown cells are not sent to the kernel; running one simply renders its formatted text.
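For instance, running the cell below (the variable name is arbitrary) first shows the printed line and then displays `42` as the cell's result, because the last line is a bare expression. Run as a plain Python script, only the printed line would appear. Ending the last line with a semicolon (`answer;`) suppresses the displayed value.

```python
answer = 21 * 2
print("This line is printed")  # printed output appears first
answer  # bare expression on the last line: Jupyter displays its value, 42
```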

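You can run cells in any order. Variables and functions are defined when their cell runs, so they are available to any cell you run afterward, regardless of where that cell sits in the notebook. This allows for an interactive and iterative coding experience. Try running the cell below to import some key packages that you may use (they should already be installed if they are listed in `t4sg.yml`). The primary libraries we will be using are: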
- numpy: Provides a fast numerical array structure and helper functions.
- pandas: Provides a DataFrame structure to store data in memory and work with it easily and efficiently.
- scipy: Scientific computing routines; the cell below imports its `stats` module for statistical functions.
- scikit-learn: The standard machine learning library for Python.
- matplotlib: Basic plotting library in Python; many other Python plotting libraries, including seaborn, are built on top of it.
- seaborn: Statistical plotting library built on matplotlib, with a higher-level interface.


```python
# The "import ... as ..." constructs below create aliases (shortcuts) for the
# package names, so we can call functions such as plt.plot() instead of
# matplotlib.pyplot.plot()
import numpy as np
from scipy import stats
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
```
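Once the imports succeed, a quick smoke test confirms the aliases work together. The sketch below builds synthetic data with a fixed random seed (the data and column names are made up for illustration), stores it in a pandas DataFrame, and fits a straight line with `scipy.stats.linregress`; the fitted slope should come out close to the true value of 2.

```python
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(seed=0)          # reproducible random numbers
x = rng.normal(size=100)
y = 2 * x + rng.normal(scale=0.5, size=100)  # true slope 2 plus Gaussian noise
df = pd.DataFrame({"x": x, "y": y})

fit = stats.linregress(df["x"], df["y"])     # ordinary least-squares line
print(df.describe())
print(f"estimated slope: {fit.slope:.2f}")
```

To exercise the plotting aliases as well, `sns.scatterplot(data=df, x="x", y="y")` followed by `plt.show()` draws the same data.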