# Class 0: Configuring your computer <a class="anchor" id="Config"></a>

<hr/>

In this lesson, we will set up a Python computing environment for scientific computing. 

To set up Python for scientific computing on your own machine, we will be downloading and installing a Python distribution called [Anaconda](https://www.anaconda.com), with its associated package manager, `conda`. Anaconda contains binaries of many of the scientific packages we will need during this class, so we avoid setting up new environments with package-by-package installations.

In addition, we will install Git and GitHub Desktop, which will help you download all the class material (which can also be accessed on Moodle).

It would be great if you could go over the steps described below before the class. We will go over this again during the first class, so do not worry if you do not get everything working beforehand. However, we suggest doing at least the following installations in advance, as they may take some time:
- Mac users: [XCode](https://developer.apple.com/xcode/)
- Windows users: either Firefox or Chrome, and [Git](https://gitforwindows.org)
- All users: [Anaconda](https://www.anaconda.com/distribution/) (more instructions below) and a GitHub account ([http://github.com/](http://github.com/))

## Contents

- [Class 0: Configuring your computer](#Config)  
  - [Why install on my own machine](#Why-Install)
  - [macOS users: Install XCode](#Install-XCode)
  - [Windows users: Install Git and Chrome or Firefox](#Install-Git)
  - [Uninstalling Anaconda](#Uninstall-Anaconda)
  - [Downloading and Installing Anaconda](#Download-Anaconda)
  - [Installing node.js](#Install-NodeJS)
  - [Launching Jupyter Notebook and JupyterLab](#Launching-Notebook)
      - [Launching JupyterLab from the command line](#Launching-Lab-FromCommand)
  - [The conda package manager](#Conda-Package)
  - [Installations](#Install-Packages)
  - [Setting Up Git](#Setting-Up-Git)
  - [Checking your distribution](#Checking-Distribution)

## Why install on my own machine? <a class="anchor" id="Why-Install"></a>

For this class we'll be working with [Google Colab](http://colab.research.google.com/), a free notebook environment that does not require any installation and uses Google servers to execute the code.

However, working locally is the standard when you start a new project, especially in industry, where you may face constraints such as intellectual property. It is thus useful to learn how to set up your own machine and use tools like GitHub (also the standard). Working locally is also advantageous when you need to install custom software, or software that you write yourself.

Before we get rolling with the Anaconda distribution on your own machine, we have some considerations and installations to get out of the way first.

## macOS users: Install XCode <a class="anchor" id="Install-XCode"></a>

If you are using macOS, you should install [XCode](https://developer.apple.com/xcode/). This takes up about 35 GB on your hard drive and it takes a long time to install.

After installing it, you need to open the program. Be sure to do that, for example by clicking on the XCode icon in your Applications folder. Upon opening XCode, it may perform more installations. You can let it go ahead and do this, and then close XCode.

## Windows users: Install Git and Chrome or Firefox <a class="anchor" id="Install-Git"></a>

To work on notebooks, we will use [JupyterLab](https://jupyterlab.readthedocs.io/en/stable/) and the [Jupyter Notebook App](https://jupyter-notebook-beginner-guide.readthedocs.io/en/latest/what_is_jupyter.html). They are browser-based, and Chrome, Firefox, and Safari are supported. Therefore, if you are a Windows user, you need to be sure to have either Chrome or Firefox installed.

On Macs, Git is installed along with XCode. Windows users need to install Git separately, by following the instructions [here](https://gitforwindows.org).

## Uninstalling Anaconda <a class="anchor" id="Uninstall-Anaconda"></a>

If you have previously installed Anaconda with a version of Python older than 3.9, you need to uninstall it, removing it completely from your computer. To find out which version of Python you have, open a terminal (on Mac, the Terminal program; on Windows, search for `Anaconda Prompt` and click the first result) and type `python --version`, then `conda list python -f`.
You can find instructions on how to uninstall Anaconda in the [official uninstallation documentation](https://docs.anaconda.com/anaconda/install/uninstall/).
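
The same check can also be done from within Python. The following is a minimal sketch, not part of Anaconda: the helper name `python_is_recent` is hypothetical, and the threshold is the 3.9 mentioned above.

```python
import sys

MIN_VERSION = (3, 9)  # threshold taken from the text above


def python_is_recent(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


# Report the version of the interpreter running this code
print("Python", sys.version.split()[0])
print("Recent enough:", python_is_recent())
```

If this reports that your Python is not recent enough, the existing Anaconda installation is a candidate for removal.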

## Downloading and installing Anaconda <a class="anchor" id="Download-Anaconda"></a>

To download and install Anaconda: 

1. Go to the [Anaconda homepage](https://www.anaconda.com/download) and download the graphical installer.
2. Install Anaconda with Python 3.9.
3. Follow the on-screen instructions for installation. Make sure that during the installation [Anaconda is added to your environment/path](https://docs.anaconda.com/anaconda/user-guide/tasks/integration/python-path/):
  - For Mac OS / Linux: this should happen by default.
  - For Windows users, we recommend installing for “just me” instead of “all users” (disregard the “not recommended” warning from Anaconda).

That's it, now you should have a functioning Python distribution.

## Installing node.js <a class="anchor" id="Install-NodeJS"></a>

node.js is a platform that enables you to run JavaScript outside of the browser. We will not use it directly, but it needs to be installed for some of the more sophisticated JupyterLab functionality. Install node.js by following the instructions [here](https://nodejs.dev).
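
To see whether the tools installed so far can be found on your path, you could run a short check like the one below. This is only a sketch: the command names in `TOOLS` are assumptions about how each tool is invoked, and `find_tools` is a hypothetical helper, not part of any of these tools.

```python
import shutil

# Command names assumed for the tools installed in this lesson
TOOLS = ["conda", "jupyter", "git", "node"]


def find_tools(names):
    """Map each command name to its full path, or None if it is not on the path."""
    return {name: shutil.which(name) for name in names}


for name, path in find_tools(TOOLS).items():
    print(f"{name:8s} {path if path else 'NOT FOUND'}")
```

Run it from the same kind of terminal you plan to use for the class (on Windows, for example, the Anaconda Prompt), since the path can differ between terminals.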

## Launching JupyterLab and Jupyter Notebook <a class="anchor" id="Launching-Notebook"></a>


In the **Anaconda Navigator**, you will see icons to launch JupyterLab and Jupyter Notebook (or Jupyter Notebook App).
JupyterLab is the web-based interactive development environment for notebooks, code, and data. The Jupyter Notebook App is the original web application for creating and sharing notebooks (we'll go over them again later in the class).

If you're using macOS, Anaconda Navigator will be available in your `Applications` menu. If you are using Windows, you can launch Anaconda Navigator from the Start menu.

To launch **JupyterLab**, click the "launch" icon for JupyterLab in the Anaconda Navigator. When you do that, a new browser window or tab will open with JupyterLab running.

To launch **Jupyter Notebook**, click the "launch" icon for Notebook in the Anaconda Navigator. When you do that, a new browser window or tab will open with the home page of Jupyter Notebook.

### Launching JupyterLab from the command line <a class="anchor" id="Launching-Lab-FromCommand"></a>

You can also launch **JupyterLab** from the command line. If you are on a Mac, open the `Terminal` program. You can do this by hitting `Command + space bar` and searching for "terminal." On Windows, you should launch PowerShell. You can do this by hitting `Windows + R` and typing `powershell` in the text box.

Once you have a terminal or PowerShell window open, you will have a prompt, which is a message from the terminal saying that it is awaiting your input. At the prompt, type

    jupyter lab
    
and you will have an instance of JupyterLab running in your browser. If you want to specify the browser, you can, for example, type

    jupyter lab --browser=firefox
    
on the command line.

It is up to you if you want to launch JupyterLab from the Anaconda Navigator or command line.

## The conda package manager <a class="anchor" id="Conda-Package"></a>

conda is a package manager for keeping all of your packages up-to-date. We will primarily be using conda to install and update packages.

conda works from the command line.  Now that you know how to get a command line prompt, you can start using conda. The first thing we'll do is update conda itself. Enter the following on the command line.

    conda update conda

You can press `y` to continue. You should do this once more, again entering

    conda update conda
    
on the command line.

Next, we will update the packages that came with the Anaconda distribution. To do this, enter the following on the command line:

    conda update --all

If anything is out of date, you will be prompted to perform the updates, and press `y` to continue. (If everything is up to date, you will just see a list of all the installed packages.)
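
If you would like to inspect the installed package versions from Python rather than reading the terminal output, one possible sketch is shown below. It assumes `conda` is on your path and relies on `conda list --json`, which prints the package list as JSON; the function names are hypothetical.

```python
import json
import subprocess


def parse_conda_list(json_text):
    """Turn the JSON output of `conda list --json` into {name: version}."""
    return {pkg["name"]: pkg["version"] for pkg in json.loads(json_text)}


def installed_packages():
    """Run conda and return its package list (requires conda on the path)."""
    result = subprocess.run(
        ["conda", "list", "--json"], capture_output=True, text=True, check=True
    )
    return parse_conda_list(result.stdout)


if __name__ == "__main__":
    try:
        packages = installed_packages()
        print("numpy:", packages.get("numpy", "not installed"))
    except (FileNotFoundError, subprocess.CalledProcessError) as err:
        print("Could not run conda:", err)
```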

## Installations <a class="anchor" id="Install-Packages"></a>

There are some additional installations of Python packages we will need for this class. Many of these packages are available through conda. First, we need to install `jupyter_bokeh`, which allows Bokeh plots to be displayed within Jupyter notebooks. Do the following on the command line.

    conda install -c bokeh jupyter_bokeh
    
Now, we can proceed with the rest of our installations.

    conda install colorcet holoviews hvplot param datashader pyserial altair hypothesis netcdf4
        
There are a few other packages from [pip](https://realpython.com/what-is-pip/) - the standard package manager for Python - we will need, so we can go ahead and install those now.

    pip install iqplot watermark blackcellmagic jupyterlab-spellchecker

You should close your JupyterLab session and terminate Anaconda Navigator after you have completed these installations. Then relaunch Anaconda Navigator and launch a fresh JupyterLab instance.
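
To confirm that the installations above succeeded, you could check that each package can be found by Python. The sketch below is an illustration under assumptions: the import names in `MODULES` are assumed (some differ from the install name, for example `pyserial` is imported as `serial` and `netcdf4` as `netCDF4`), the notebook extensions `blackcellmagic` and `jupyterlab-spellchecker` are left out, and `missing_modules` is a hypothetical helper.

```python
import importlib.util

# Assumed import names for the packages installed above
MODULES = [
    "bokeh", "jupyter_bokeh", "colorcet", "holoviews", "hvplot", "param",
    "datashader", "serial", "altair", "hypothesis", "netCDF4",
    "iqplot", "watermark",
]


def missing_modules(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(MODULES)
print("Missing:", ", ".join(missing) if missing else "none")
```

Any name listed as missing can be installed again with the corresponding `conda install` or `pip install` command.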

## Setting up Git <a class="anchor" id="Setting-Up-Git"></a>

We will use Git to share files, data, and code we will use in the class. In case you prefer using simple downloads, the materials will also be available on Moodle.

### Set up a GitHub account

Go to [http://github.com/](http://github.com/) to get an account. You should register with your academic email address so you get free private repositories as academics. You should also think carefully about picking your user name. There is a good chance other people in your professional life will see this.

### Forking the class repository

Let's say you want to do some work on a project with code stored in a **repository**, but you are not an active collaborator. For example, there could be a useful package a lab at another university put on GitHub to analyse specific image data that is useful for your research. You want to do something *almost* exactly like the package does, but need to make some small modifications yourself. You want to clone the repository and add a couple functions and maybe modify one or two they already have, leaving much of the rest of the repository untouched. Of course, you also want to update your local copy of all that untouched (but still used) code when the maintainers update it.

This is exactly what you want to do here with the material for this class. We have a repository that has some code and data, but you want to write your Python code right in that repository. If we update the data sets, you want to be able to pull in our changes, but still have your code in place.

There is a nice way to do this called **forking**. To fork a repository on GitHub, simply go on the GitHub repository of this class at [this link](https://github.com/edoardochiarotti/class_datascience) (be sure you are logged in as yourself when you do this), which should look like this:

![Screen%20Shot%202022-09-17%20at%2019.32.32.png](attachment:Screen%20Shot%202022-09-17%20at%2019.32.32.png)

The fork button is in the upper right. Click the button, and you now have a **fork** of the class repository on your GitHub account.

### Cloning your fork to your local machine

Now you can clone your fork of the repository to your local machine. We will keep all of your material under version control in a directory called `git` in your home directory. 
    
To find the forked repository, navigate your browser to the forked copy of the `class_datascience` repository on your account (this is where clicking the "Fork" button took you in your browser). The browser URL will be `https://github.com/YOURUSERNAME/class_datascience`, and the top left of the website will say "`YOURUSERNAME/class_datascience` forked from `edoardochiarotti/class_datascience`", like below for user `echiarotti-dataclass`.

![Screen%20Shot%202022-09-17%20at%2019.44.01.png](attachment:Screen%20Shot%202022-09-17%20at%2019.44.01.png)

Now let's clone *your forked repository* (not the original `class_datascience` repository). Open GitHub Desktop and click on `Current Repository` in the upper left corner and then `Add -> Clone Repository`, as shown here:

![Screen%20Shot%202022-09-17%20at%2019.51.34-2.png](attachment:Screen%20Shot%202022-09-17%20at%2019.51.34-2.png)

Look in `Your Repositories`, find `YOURUSERNAME/class_datascience`, and set a local path such as `Users/YOURUSER/Documents/GitHub/class_datascience`, as here:

![Screen%20Shot%202022-09-17%20at%2019.54.27-2.png](attachment:Screen%20Shot%202022-09-17%20at%2019.54.27-2.png)

A window will now appear asking "How are you planning to use this fork". Select "For my own purposes" (as you do not plan to contribute to the material of this class ...yet).

You now have a local copy of your own fork of the class repository. You can add files and edit it. When you commit and push, it will all be on your account, and the original repository will not see the changes. (Don't worry if you do not understand all of this now; we will cover it all in subsequent lessons.)

### Syncing your forked repository to the upstream repository

As I mentioned before, you want to be able to sync your local repository with the original `class_datascience` repository so you can retrieve any updates in it. 

To recap, there are 3 repositories you are currently dealing with:
1. The original repository on GitHub, which we call **upstream repository**.
2. Your fork of the upstream repository on GitHub, which we call **forked repository**
3. The cloned repository of the forked repository, which we call **local repository**

GitHub Desktop should have automatically assigned the upstream repository (the original of the class) as a remote repository for your local repository. This means that GitHub Desktop is now following the upstream repository for you and tracking any changes in it (for example, when we add a new notebook). When such changes happen, the "History" section of the local repository in GitHub Desktop will ask you if you want to merge them with your local repository and commit them to your forked repository, as shown here:

![Screen%20Shot%202022-09-17%20at%2020.18.46-3.png](attachment:Screen%20Shot%202022-09-17%20at%2020.18.46-3.png)

"Behind" means that there are changes in the upstream repository (for example new code for week 2) that you do not have in your local repository. When you click "create a merge commit", the new notebooks will automatically be added to your local repository. You can then send these changes from your local repository to your forked repository on GitHub by clicking "Push".
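
If you are curious which remote repositories your local repository knows about, you can list them with the Git command `git remote -v`. The sketch below wraps that command in Python and parses its output; the function names are hypothetical, and it assumes `git` is on your path. For a fork cloned as described above, you would typically expect a remote called `origin` pointing at your fork and one called `upstream` pointing at the class repository, though the names on your machine may differ.

```python
import subprocess


def parse_remotes(text):
    """Parse the output of `git remote -v` into {remote_name: url}."""
    remotes = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            # Each remote appears twice (fetch and push); keep one entry per name
            remotes[parts[0]] = parts[1]
    return remotes


def list_remotes(repo_path="."):
    """Run `git remote -v` inside repo_path and return the parsed remotes."""
    result = subprocess.run(
        ["git", "-C", repo_path, "remote", "-v"],
        capture_output=True, text=True, check=True,
    )
    return parse_remotes(result.stdout)


if __name__ == "__main__":
    try:
        for name, url in list_remotes().items():
            print(f"{name}: {url}")
    except (FileNotFoundError, subprocess.CalledProcessError) as err:
        print("Could not list remotes:", err)
```

Run it from inside your local `class_datascience` folder, or pass the folder's path to `list_remotes`.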

If you want to work in this local repository by editing and adding files, we suggest you create copies of the notebooks, for example `nameofnotebook_YOURINITIALS.ipynb`, and work on those. That way, there will be no conflicts when you update the folder as described above. If instead you edit and save the class notebooks directly, and we later change them in the original repository, updating the folder will produce conflicts, and GitHub Desktop will tell you so. We will see how to handle this in the next lesson. If you prefer to work in another folder on your computer, just save these copies in that folder and work there.

We are doing all of this to get you familiar with GitHub. Of course, you do not have to use it, and you can just download the notebooks in the standard way. To do that, there are 2 options:
1. Go to the [class GitHub page](https://github.com/edoardochiarotti/class_datascience), and click Code -> Download ZIP (you must download the full folder; it is not possible to download single notebooks)

![Screen%20Shot%202022-09-19%20at%2011.23.01.png](attachment:Screen%20Shot%202022-09-19%20at%2011.23.01.png)

2. Go on Moodle and download the material from there (we'll upload everything on Moodle too)

## Checking your distribution <a class="anchor" id="Checking-Distribution"></a>

We'll now run a quick test to make sure things are working properly. We will make a quick plot that requires some of the scientific libraries we will use in the class.

Go to Anaconda Navigator and launch Jupyter Notebook, and click New to open a notebook as shown here:

![Screen%20Shot%202022-09-17%20at%2020.43.57.png](attachment:Screen%20Shot%202022-09-17%20at%2020.43.57.png)

In the first cell (the box next to the `[ ]:` prompt), paste the code below. To run the code, press `Shift+Enter` while the cursor is active inside the cell. You should see a plot that looks like the one below. If you do, you have a functioning Python environment for scientific computing!

    import numpy as np
    import bokeh.io
    import bokeh.models
    import bokeh.plotting

    # Display Bokeh plots inline in the notebook
    bokeh.io.output_notebook()

    # Generate plotting values
    t = np.linspace(0, 2 * np.pi, 200)
    x = 16 * np.sin(t) ** 3
    y = 13 * np.cos(t) - 5 * np.cos(2 * t) - 2 * np.cos(3 * t) - np.cos(4 * t)

    # Draw the curve and add a centered text label
    p = bokeh.plotting.figure(height=250, width=275)
    p.line(x, y, color="red", line_width=3)
    text = bokeh.models.Label(x=0, y=0, text="Data Science Class", text_align="center")
    p.add_layout(text)

    bokeh.io.show(p)
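
If the plot does not appear, it can help to see which versions of the main packages are installed. The following is a small sketch using only the Python standard library; the package names listed are assumptions about what is relevant here, and `package_versions` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version


def package_versions(names):
    """Map each distribution name to its installed version, if any."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = "not installed"
    return found


for name, ver in package_versions(["numpy", "bokeh", "jupyterlab"]).items():
    print(f"{name}: {ver}")
```

Including this output when you ask for help makes it easier to diagnose installation problems.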