## Introduction to data analysis
### What is Anaconda?
- A program for managing (installing, upgrading or uninstalling) packages and environments for use with Python
- A Python package is a collection of modules, where each module consists of a set of class and function definitions
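The package/module/class relationship described above can be seen in the standard library. The sketch below uses the built-in `json` package purely as an illustration; any installed package could be inspected the same way.

```python
import json
import pkgutil
from json.decoder import JSONDecoder

# `json` is a package: a directory of modules, exposed through __path__.
submodules = sorted(m.name for m in pkgutil.iter_modules(json.__path__))
print(submodules)

# Each module defines classes and functions; the JSONDecoder class,
# for example, is defined in the json.decoder module.
print(JSONDecoder.__module__)
```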

**Anaconda Distribution**
- Anaconda is a software distribution that includes the following:

1. Anaconda Navigator - GUI for launching installed applications such as Jupyter Notebook or VS Code
- Open Anaconda Navigator with the command `anaconda-navigator`

2. Managing Packages using either `pip` or `conda`
- Both are Python package managers. The packages that `conda` provides through the Anaconda distribution focus on data science, whereas `pip` is for general use

3. Environments
- A Python environment comprises a Python interpreter, Python packages and utility scripts such as `pip`
- Creating a conda environment:
    
    `conda create -n my_env numpy`

- Alongside managing packages conda is also a virtual environment manager, similar to `virtualenv` and `pyenv`

### Installing Anaconda
- Listing packages: `conda list`
- Applications installed with Anaconda:
    * Anaconda Navigator - GUI for managing your environments and packages
    * conda - command-line utility
    * python - the latest version of Python gets installed as an individual package
    * A collection of apps such as Spyder, an IDE geared towards scientific development
- Upgrading:
    
    ```
    conda upgrade conda
    conda upgrade --all
    ```
## Managing Packages
- Install packages:

    ```
    conda install PACKAGE_NAME
    conda install numpy scipy pandas
    conda install numpy=1.10
    ```
- Remove or update packages:

    ```
    # remove a package
    conda remove PACKAGE_NAME
    # update every package in the current environment
    conda update --all
    ```
- Search for a package to install. If you don't know the exact name of the package you're looking for, you can try searching with:
    
    ```
    conda search '*SEARCH_TERM*'
    conda search '*beautifulsoup*'
    ```
- The shell might expand the wildcard itself; to prevent this, wrap the search string in single or double quotes
- The second command returns a list of the available Beautiful Soup packages with their exact package names

## Managing Environments
- To create an environment:

    ```
    conda create -n env_name [python=X.X] [LIST_OF_PACKAGES]
    conda create -n my_env python=3.7 numpy keras
    ```

**Activating, listing and deactivating an environment**

```
conda activate my_env
conda list
conda deactivate
```
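
After activating an environment, it can be useful to confirm from inside Python which environment the interpreter belongs to. The following is a minimal sketch; it assumes that `conda activate` sets the `CONDA_DEFAULT_ENV` environment variable, and the helper name `describe_environment` is made up for this illustration.

```python
import os
import sys

def describe_environment():
    """Collect basic facts about the interpreter that is currently running."""
    return {
        # Name of the active conda environment; None if conda did not set it.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
        # Root directory of the environment the interpreter lives in.
        "prefix": sys.prefix,
        "python_version": "{}.{}".format(*sys.version_info[:2]),
    }

info = describe_environment()
for key, value in info.items():
    print(f"{key}: {value}")
```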

## Saving and loading environments
- A useful feature is sharing environments so that others can install all the packages used in your code with the correct versions
- To print all package names, including the Python version, present in the current environment:

    `conda env export`

- The output shows the environment name and all of its dependencies
- You can save this information to a YAML file, `environment.yaml`, and later share it with others, for example in a GitHub repository

    ```
    conda env export > environment.yaml
    ```
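
To give a rough idea of what such a file contains, the sketch below builds the text of a small environment file by hand. The environment name, channel and package list are hypothetical; a real `conda env export` lists exact versions and builds and typically includes further fields.

```python
# Hypothetical environment contents, chosen only for illustration.
env_name = "my_env"
channels = ["defaults"]
dependencies = ["python=3.7", "numpy", "pandas"]

# Assemble the YAML text line by line: a name, a list of channels
# and a list of dependencies.
lines = [f"name: {env_name}", "channels:"]
lines += [f"  - {channel}" for channel in channels]
lines.append("dependencies:")
lines += [f"  - {dep}" for dep in dependencies]
environment_yaml = "\n".join(lines) + "\n"

print(environment_yaml)
```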

**Create an environment from an environment file**
- Include the environment file in your repository so that others can recreate the environment from it:

    `conda env create -f environment.yaml`

**Listing environments**

```
conda env list
conda info --envs
```

**List the packages inside an environment**

```
# environment is not activated
conda list -n env_name
# environment is activated
conda list
# check whether a specific package, say scipy, is installed in an environment
conda list -n env_name scipy
```
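
The same check can be made from inside Python once an environment is active. This is a minimal sketch using the standard-library `importlib.metadata` module (Python 3.8+); the helper name `installed_version` is an assumption of this example, and it only sees Python packages, not every package conda manages.

```python
from importlib import metadata

def installed_version(package_name):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None

for name in ("scipy", "numpy"):
    version = installed_version(name)
    print(f"{name}: {version if version else 'not installed'}")
```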

**Removing an environment**

`conda env remove -n env_name`

## Summary and Best Practices
- Use separate environments for different Python versions

    ```
    conda create -n py34_env python=3.4
    conda create -n py37_env python=3.7
    ```

- Conda is a package manager, Anaconda is a distribution
- A software distribution is a pre-built and pre-configured collection of packages that can be installed and used on a system
- A package manager is a tool that automates the process of installing, updating and removing packages
- Conda with its `conda install`, `conda update`, `conda remove` falls squarely under the second definition: it is a package manager
- Conda is distinct from Anaconda and Miniconda, just as Python itself is, and if you wish it can be installed without ever touching Anaconda or Miniconda
- Conda is not only a Python package manager; it is designed to manage packages and dependencies within any software stack. In this sense it is less like pip and more like a cross-platform version of `apt` or `yum`
- Pip is a general-purpose manager for Python packages; conda is a language-agnostic, cross-platform environment manager. Pip installs Python packages within any environment; conda installs any package within conda environments
- You can install some conda packages within a virtualenv, but it is better to use conda's own environment manager: it is fully compatible with pip and has several advantages over virtualenv
- **Virtualenv/venv** are utilities that allow users to create isolated Python environments that work with `pip`
- Using conda within a virtualenv (not recommended):
    ```
    virtualenv test_conda
    source test_conda/bin/activate
    pip install conda
    conda install numpy
    ```

# Lesson 2: Jupyter Notebooks
1. What are Jupyter notebooks?

- Notebooks are an amazing tool for data analysis where text, math equations, code and visualizations all sit in one document in your browser
- Notebooks are rendered automatically on GitHub

**Literate Programming**
- Notebooks are also a form of literate programming. With literate programming, the documentation is written as a narrative alongside the code instead of sitting off on its own
- As Donald Knuth put it: "Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to human beings what we want a computer to do"

**How Notebooks Work**
- The central point is the notebook server. You connect to the server through your browser and the notebook is rendered as a web app
- Code you write in the web app is sent through the server to the kernel. The kernel runs the code and sends the output back to the server, then the output is rendered back in the browser
- When you save the notebook, it is written to the server as a JSON file with a `.ipynb` file extension
- The great part of this architecture is that the kernel doesn't need to run Python. Since the notebook and the kernel are separate, code in any language can be sent between them
- IPython notebooks were renamed because notebooks became language agnostic. The new name **Jupyter** comes from the combination of **J**ulia, **Py**thon and **R**
- Another benefit is that the server can be run anywhere and accessed via the internet
- Typically you'll be running the server on your own machine where all your data and notebook files are stored. But you could also set up a server on a remote machine or cloud instance like Amazon's EC2. Then you can access the notebooks in your browser from anywhere in the world
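
Since a saved notebook is a JSON file, its structure can be sketched with nothing but the standard `json` module. The snippet below builds a tiny in-memory notebook with one Markdown cell and one code cell; the cell contents are made up, and the field layout reflects my understanding of version 4 of the notebook format rather than a complete specification.

```python
import json

# A minimal notebook: top-level format fields plus a list of cells.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# A title\n"]},
        {
            "cell_type": "code",
            "execution_count": None,  # the cell has not been run yet
            "metadata": {},
            "outputs": [],
            "source": ["print('hello')\n"],
        },
    ],
}

# Saving a notebook amounts to serializing this structure to a .ipynb file.
serialized = json.dumps(notebook, indent=1)
restored = json.loads(serialized)
cell_types = [cell["cell_type"] for cell in restored["cells"]]
print(cell_types)
```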

2. Installing Jupyter Notebook

- Jupyter Notebook is installed automatically with the Anaconda distribution. You'll be able to use notebooks from the default environment
- To start the notebook server, run the following command:
 
    `jupyter notebook`
   
3. Launching the Notebook Server
- Navigate to the directory where you'd like to create notebook files (`.ipynb`) and enter the command `jupyter notebook` to launch the server

**Create a New Notebook**
- On the right, click "New" to create a new notebook, text file, folder or terminal
- The list under "Notebooks" shows the kernels you have installed
- I'm running the server in a Python 3 environment, so I have a Python 3 kernel available
- [Installing kernels](https://ipython.readthedocs.io/en/latest/install/kernel_install.html)

**Jupyter Notebook Server Tabs**
- The tabs at the top show "Files", "Running" and "Clusters"
- "Files" shows all the files in the current directory
- The "Running" tab lists all the currently running notebooks
- "Clusters" was previously where you'd create multiple kernels for use in parallel computing. That has now been taken over by ipyparallel, so there isn't much to be done there

**Notebook Conda Package**
- You should consider installing the Notebook Conda package:
    `conda install nb_conda`
- After successful installation of the `nb_conda` package, if you run the notebook server from a conda environment you'll have access to the "Conda" tab
- Here you can manage your environments from within Jupyter. You can create new environments, install packages, update packages, export environments and much more
- You'll also be able to access any of your conda environments when choosing a kernel
- To register a newly created environment as a kernel:
    
    ```
    conda activate data_analysis
    python -m ipykernel install --user --name data_analysis --display-name "Python (data_analysis)"
    ```

**Shutting down jupyter**
- You can shut down individual notebooks by marking the checkbox next to the notebook on the server home and clicking "Shutdown"
- Make sure you've saved your work before you do this, though
- You can shut down the entire server by pressing Control + C twice in the terminal

4. Notebook Interface
- You can create a new notebook by clicking "New" and then choosing a kernel such as Python 3. This creates a new notebook in a new browser tab, named `Untitled.ipynb`
- You'll see a little box outlined in green. This is a cell. Cells are where you run and write your code. You can also change it to render Markdown
- In the toolbar click "Code" to change it to Markdown and back
- The little play button runs the cell and the up and down arrows move cells up and down

**Command palette**
- The little keyboard is the command palette. It brings up a panel with a search bar where you can search various commands
- This is helpful for speeding up your workflow, as you don't need to search around in the menus with your mouse. For instance, if you want to merge two cells, type in "merge..."

**More things**
- In the "File" menu you can download the notebook in multiple formats
- You'll often want to download it as an HTML file to share with others who aren't using jupyter
- The Markdown and reST formats are great for using notebooks in blogs or documentation

5. Code Cells
- Most work in notebooks is done in code cells. This is where you write your code and it gets executed. Any code executed in one cell is available in all other cells
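
The shared state between cells can be pictured as follows. The two "cells" are shown in a single block, separated by comments, so the sketch also runs as a plain script; the variable and function names are made up for illustration.

```python
# Cell 1: define a variable and a function.
prices = [4.0, 6.0, 8.0]

def average(values):
    """Return the arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

# Cell 2: names defined in cell 1 are still available here,
# because both cells run in the same kernel.
mean_price = average(prices)
print(mean_price)
```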

6. Markdown Cells
- As with code you press *Shift + Enter* or *Control + Enter* to run the markdown cell

__Math expressions__
- You can create math expressions in Markdown cells using _LaTeX_ symbols
- Notebooks use MathJax to render the LaTeX symbols as math symbols
- To start math mode, wrap the LaTeX in dollar signs
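
As a small illustration, the content of a Markdown cell might look like this. The formulas are arbitrary examples; single dollar signs give inline math, and double dollar signs put the expression on its own line.

```latex
Inline math: $y = \frac{a}{b + c}$

Display math on its own line:
$$
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i
$$
```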

7. Keyboard Shortcuts

8. Magic Keywords
- Magic keywords are special commands you can run in cells that let you control the notebook itself or perform system calls such as changing directories
- For example you can set up matplotlib to work interactively with `%matplotlib`
- Magic commands are preceded with one or two percent signs (% or %%) for line magics and cell magics respectively
- Line magics apply only to the line the magic command is written on while cell magics apply to the whole cell
- Magic keywords are specific to the normal Python kernel. If you are using other kernels, these most likely won't work

**Timing code**
- At some point, you'll probably spend some effort optimizing code to run faster. Timing how quickly your code runs is essential for this optimization
- You can use the `timeit` magic command to time how long it takes for a function to run
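
Inside a notebook you would write something like `%timeit fibonacci(20)` in a cell. Since magic commands are not valid in a plain Python script, the sketch below takes a comparable measurement with the standard-library `timeit` module; the `fibonacci` function and the number of runs are arbitrary choices for this example.

```python
import timeit

def fibonacci(n):
    """Return the n-th Fibonacci number iteratively."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

# Run the call many times and report the average time per call,
# similar in spirit to what the %timeit magic reports.
runs = 10_000
total_seconds = timeit.timeit(lambda: fibonacci(20), number=runs)
print(f"{total_seconds / runs * 1e6:.2f} microseconds per call")
```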

**Embedding visualizations in notebooks**
- As mentioned before notebooks embed images along with text and code. This is most useful when you are using matplotlib or other plotting packages to create visualizations
- You can use `%matplotlib` to set up matplotlib for interactive use in the notebook
- By default, figures will render in their own window. However, you can pass arguments to the command to select a specific "backend", the software that renders the image
- To render figures directly in the notebook, you should use the inline backend with the command `%matplotlib inline`

**Debugging in the Notebook**
- With the Python kernel, you can turn on the interactive debugger using the magic command `%pdb`
- When you cause an error, you'll be able to inspect the variables in the current namespace
- To quit the debugger simply enter `q`

9. Converting Notebooks
- Notebooks are just big JSON files with the extension `.ipynb`
- Since notebooks are JSON, it is simple to convert them to other formats. Jupyter comes with a utility called `nbconvert` for converting to HTML, Markdown, slideshows, etc.

    ```
    # install with either pip or conda
    pip install nbconvert
    conda install nbconvert
    # general form, followed by a concrete example
    jupyter nbconvert --to FORMAT mynotebook.ipynb
    jupyter nbconvert --to html introduction.ipynb
    ```

- If you wish to install a package in a conda environment that is not available in the Anaconda distribution, such as the Airbase package, use `pip install airbase` instead of `conda install airbase`
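
To make the "notebooks are just JSON" point from this section concrete, here is a rough sketch of the kind of work a converter does: it reads the notebook JSON and joins the code cells into a single script. This is only an illustration using the standard library, not how `nbconvert` itself is implemented; the inline notebook and the function name `notebook_to_script` are made up.

```python
import json

# A tiny hypothetical notebook, written inline so the sketch is self-contained.
notebook_json = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Intro\n"]},
        {"cell_type": "code", "execution_count": None, "metadata": {},
         "outputs": [], "source": ["x = 1\n", "print(x)\n"]},
    ],
})

def notebook_to_script(raw_json):
    """Join the source of every code cell into one Python script string."""
    notebook = json.loads(raw_json)
    chunks = []
    for cell in notebook["cells"]:
        if cell["cell_type"] == "code":
            # "source" may be a list of lines or a single string.
            source = cell["source"]
            chunks.append(source if isinstance(source, str) else "".join(source))
    return "\n".join(chunks)

script = notebook_to_script(notebook_json)
print(script)
```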

10. Creating a Slideshow
- The slideshows are created in notebooks like normal, but you'll need to designate which cells are slides and the type of slide the cell will be
- In the menu bar, click **View > Cell Toolbar > Slideshow** to bring up the slide cell menu on each cell
- This will show a menu dropdown on each cell that lets you choose how the cell shows up in the slideshow:
- **Slides** are full slides that you move through left to right
- **Sub-slides** show up in the slideshow by pressing up or down
- **Fragments** are hidden at first, then appear with a button press
- You can skip cells in the slideshow with **Skip** and **Notes** leaves the cell as speaker notes
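
The choice made in the dropdown is saved with the notebook. As far as I understand the format, it is stored in each cell's metadata under a `slideshow` entry with a `slide_type` value; treat that layout, the sample cells and the helper `set_slide_type` below as assumptions of this sketch rather than a documented API.

```python
# Hypothetical cells, shaped like the entries of a notebook's "cells" list.
cells = [
    {"cell_type": "markdown", "metadata": {}, "source": ["# Title\n"]},
    {"cell_type": "markdown", "metadata": {}, "source": ["A detail\n"]},
    {"cell_type": "code", "metadata": {}, "source": ["1 + 1\n"]},
]

def set_slide_type(cell, slide_type):
    """Record in the cell metadata how the cell should appear in the slideshow."""
    allowed = {"slide", "subslide", "fragment", "skip", "notes"}
    if slide_type not in allowed:
        raise ValueError(f"unknown slide type: {slide_type}")
    cell["metadata"].setdefault("slideshow", {})["slide_type"] = slide_type

# First cell starts a slide, second appears as a fragment, third is skipped.
for cell, slide_type in zip(cells, ["slide", "fragment", "skip"]):
    set_slide_type(cell, slide_type)

print([cell["metadata"]["slideshow"]["slide_type"] for cell in cells])
```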

**Running the slideshow**
- To create the slideshow from the notebook file, you'll need to use `nbconvert`:

    ```
    jupyter nbconvert introduction.ipynb --to slides
    ```

- This just converts the notebook to the necessary files for the slideshow, but you need to serve it with an HTTP server to actually see the presentation
- To convert it and immediately serve it:

    ```
    jupyter nbconvert introduction.ipynb --to slides --post serve
    ```

- This will open up the slideshow in your browser so you can present it
