# Intro to Data Analysis in Python

##### Navigation Links

- [Software setup instructions](https://jenfly.github.io/datajam-python/SETUP)
- Lessons:
  - **0 - Intro to Jupyter**
  - [1 - Intro to Pandas](1-pandas.ipynb)
  - [2 - Intro to Data Visualization](2-dataviz.ipynb)
- Solutions to exercises
- You can download all the workshop materials as a [zipped folder](https://github.com/jenfly/datajam-python/archive/master.zip) or clone/fork the [GitHub repo](https://github.com/jenfly/datajam-python)
- Need a refresher on Python basics? Check out [these short lessons](https://www.kaggle.com/learn/python) and/or [this cheat sheet](https://www.pythoncheatsheet.org/#Python-Basics)
- Additional resources

## Schedule

- 5:30 Intro to Jupyter (45 min)
- 6:15 Break (15 min)
- 6:30 Intro to Pandas (1 hour 15 min)
- 7:45 Break (15 min)
- 8:00 Intro to Data Visualization (1 hour)

# Lesson 0: Intro to Jupyter

## What is Jupyter?

- For this workshop, we will be using Python via [Jupyter](https://jupyter.org/index.html)

- You can think of Python like a car’s engine, while Jupyter is like a car’s dashboard

  - Python is the programming language that runs computations
  - Jupyter is an integrated development environment (IDE) that provides a convenient interface to Python, with extra features and tools

![engine](img/python_jupyter.png)

## Jupyter Notebooks

- Code, plots, formatted text, equations, etc. in a single document
- Run Python code interactively
- Also supports R, Julia, Perl, and over 100 other languages (and counting!)

[Example Notebook](example-notebook.ipynb)

- Notebooks are great for exploration and for documenting your workflow
- Many options for sharing notebooks in human readable format:
  - Share online with [nbviewer.jupyter.org](http://nbviewer.jupyter.org/)
  - If you use GitHub, notebooks you upload are automatically rendered on the site
  - Convert to HTML, PDF, etc. with [nbconvert](https://nbconvert.readthedocs.io/en/latest/)

## Getting Started

Let's open the notebook dashboard and create our first Jupyter notebook! Two options:

- Working online on Syzygy
- Working locally on your computer

### What if I don’t like where my current working directory is?

![working_directory](img/working_directory.png)

- Navigating the file system from the notebook dashboard
- Creating new directories (folders)

### Classic Jupyter Notebook vs. JupyterLab

- We'll be using the classic Jupyter notebook app in this workshop
- There is also a newer app called JupyterLab, which has additional handy features
- [This tutorial](https://nbviewer.jupyter.org/github/jenfly/jupyter-quickstart/blob/master/quickstart.ipynb) may be a helpful reference as you get acquainted with Jupyter notebooks and provides a short intro to JupyterLab

### Create a New Notebook

- Create a new untitled notebook
  - Note the .ipynb extension: it comes from "IPython notebook" (IPython stands for interactive Python), the project's name before it was renamed Jupyter to reflect its multi-language support
  - Rename the notebook to "workshop.ipynb"
- Notebooks auto-save periodically, or you can manually save
- You can open a previously saved notebook by clicking on it in the dashboard

## Working with Notebooks

A notebook consists of a series of "cells":
- **Code cells**: execute snippets of code and display the output
- **Markdown cells**: formatted text, equations, images, and more

By default, a new cell is always a code cell.
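Under the hood, a `.ipynb` file is a JSON document whose `cells` list records each cell's type and source. The sketch below parses a small, hand-written notebook in what we assume is the nbformat version 4 layout. `summarize_cells` is a hypothetical helper written for illustration; for real work, Jupyter's own `nbformat` package is the supported way to read and write notebooks.

```python
import json

# A tiny notebook written by hand, assuming the nbformat 4 JSON layout
notebook_json = """
{
  "cells": [
    {"cell_type": "markdown", "metadata": {}, "source": ["# My first notebook"]},
    {"cell_type": "code", "execution_count": null, "metadata": {},
     "outputs": [], "source": ["2 + 2"]}
  ],
  "metadata": {},
  "nbformat": 4,
  "nbformat_minor": 5
}
"""

def summarize_cells(raw):
    """Hypothetical helper: return (cell_type, source text) for every cell."""
    nb = json.loads(raw)
    # Each cell's source is stored as a list of lines, so join them back together
    return [(cell["cell_type"], "".join(cell["source"])) for cell in nb["cells"]]

for cell_type, source in summarize_cells(notebook_json):
    print(f"{cell_type:>8}: {source}")
```

To peek inside your own notebook, you could pass it the file contents, e.g. `summarize_cells(open('workshop.ipynb').read())`.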

## Code Cells

To run a code cell, click in it and press `Shift-Enter` or click the Run button on the toolbar.

```python
print('Hello world!')
```

```
Hello world!
```

```python
2 + 2
```

```
4
```

```python
today = 'Friday'
```
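Notice that `2 + 2` displayed its value while `today = 'Friday'` displayed nothing. By default, a notebook shows the value of a cell's last line only when that line is an expression, and all cells share one namespace, so `today` stays defined for later cells. Below is a simplified sketch of that behaviour using the standard `ast` module; `run_cell` is a hypothetical helper, not Jupyter's actual implementation, and it ignores details such as a trailing `;` suppressing output.

```python
import ast

def run_cell(source, namespace):
    """Hypothetical helper: run one 'cell' of code in `namespace` and return
    the value a notebook would display, or None if nothing would be shown."""
    tree = ast.parse(source)
    if tree.body and isinstance(tree.body[-1], ast.Expr):
        # Run every statement except the last one...
        setup = ast.Module(body=tree.body[:-1], type_ignores=[])
        exec(compile(setup, "<cell>", "exec"), namespace)
        # ...then evaluate the final expression so its value can be displayed
        final = ast.Expression(body=tree.body[-1].value)
        return eval(compile(final, "<cell>", "eval"), namespace)
    # The last line is a statement (e.g. an assignment), so nothing is displayed
    exec(compile(tree, "<cell>", "exec"), namespace)
    return None

kernel_namespace = {}  # shared by all cells, like a single running kernel
print(run_cell("2 + 2", kernel_namespace))
print(run_cell("today = 'Friday'", kernel_namespace))
print(run_cell("today", kernel_namespace))
```

Because `kernel_namespace` is reused, the third call can still see `today`, just as a later notebook cell can.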

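Some handy features:

- Auto-complete: start typing a name and press `Tab` to see matching variables, functions, and methods
- Viewing documentation: with the cursor inside a function's parentheses, press `Shift-Tab`, or run a cell containing the name followed by `?` (e.g. `print?`)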
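The information behind these pop-ups is also available in plain Python. A rough equivalent, assuming you just want to list what you can call on a value and read its docstring, uses the built-in `dir()` and the `__doc__` attribute (`help(str.upper)` prints a longer, formatted version of the same documentation):

```python
today = 'Friday'

# Public methods available on a string, roughly what Tab completion offers after `today.`
methods = [name for name in dir(today) if not name.startswith('_')]
print(methods)

# The docstring for one method, similar to what the documentation pop-up shows
doc = today.upper.__doc__
print(doc)
```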

## Markdown Cells

In Markdown cells, you can write plain text or add formatting and other elements with [Markdown](https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet). These include headers, **bold text**, *italic text*, hyperlinks, equations $A=\pi r^2$, inline code `print('Hello world!')`, bulleted lists, and more.

- To create a Markdown cell, select an empty cell and change the cell type from "Code" to "Markdown" in the dropdown menu on the toolbar
- To run a Markdown cell, press `Shift-Enter` or the Run button on the toolbar
- To edit a Markdown cell, you need to double-click inside it

## Other Notebook Basics

- Organizing cells: insert, delete, cut/copy/paste, move up/down, split, merge
- Running all cells or selected cell(s)
- Restarting and interrupting the kernel
- Caveat: Notebooks are nonlinear and running cells out of order can sometimes lead to unexpected results
  - It's good practice to periodically restart the kernel and run all cells, making sure that everything works as expected when you run the whole notebook from top to bottom
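- Closing vs. shutting down a notebook: closing the browser tab leaves the notebook's kernel process running in the background, while shutting the notebook down stops the kernel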
- Re-opening a notebook after shutdown
  - The saved output from the previous kernel session is still displayed, but variables defined in that session no longer exist, so re-run the cells you need
- Clear output of all cells or selected cell(s)
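To see why running cells out of order can cause surprises, the sketch below simulates cells as source strings executed in one shared namespace, standing in for the kernel. The cell contents and the `run_cells` helper are hypothetical, made up for illustration.

```python
# Two hypothetical cells, stored as source strings
cell_1 = "price = 10"
cell_2 = "price = price * 2"

def run_cells(cells):
    """Run cells in the given order in one fresh namespace (like a freshly
    restarted kernel) and return the final value of `price`."""
    namespace = {}
    for source in cells:
        exec(source, namespace)
    return namespace["price"]

# Top to bottom, as after "Restart & Run All"
print(run_cells([cell_1, cell_2]))

# cell_2 accidentally run twice: the result silently changes
print(run_cells([cell_1, cell_2, cell_2]))

# cell_2 run before cell_1: fails because `price` is not defined yet
try:
    run_cells([cell_2, cell_1])
except NameError as err:
    print("NameError:", err)
```

Restarting the kernel corresponds to starting again with an empty namespace, which is why "restart and run all" is a reliable check that the notebook works top to bottom.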

### Interactivity vs. Automation

For a great example of how an interactive workflow in Jupyter notebook can progress into automation with libraries/scripts, check out Jake VanderPlas' blog post [Reproducible Data Analysis in Jupyter](https://jakevdp.github.io/blog/2017/03/03/reproducible-data-analysis-in-jupyter/).

## Python Data Science Ecosystem

The Python libraries for data science are developed and maintained by external "3rd party" development teams
- Python core + 3rd party libraries = **ecosystem** 
- To install and manage 3rd party libraries, you need to use a package manager such as `conda` (which comes with Anaconda/Miniconda) or `pip`
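Before starting, it can help to confirm which libraries are installed in your environment and at what version. A minimal sketch using only the standard library's `importlib.metadata` (Python 3.8+); `installed_version` is a hypothetical helper, and it looks up installed distribution names (e.g. `pandas`), which usually, but not always, match the name you import.

```python
from importlib.metadata import PackageNotFoundError, version

def installed_version(package):
    """Hypothetical helper: return the installed version of `package`, or None if missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None

# The libraries used later in this workshop
for package in ["pandas", "seaborn", "plotly"]:
    found = installed_version(package)
    print(f"{package}: {found or 'not installed'}")
```

If something is missing, install it with your package manager, e.g. `conda install pandas`.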

Some of the libraries in the Python data science ecosystem:

![ecosystem_big](img/ecosystem_big.png)

From [The Unexpected Effectiveness of Python in Science](https://speakerdeck.com/jakevdp/the-unexpected-effectiveness-of-python-in-science) (Jake VanderPlas)

In this workshop, we'll be using `pandas` to work with tabular data and will give a brief introduction to data visualization with the `seaborn` and `plotly` libraries.

---

Go to: [next lesson](1-pandas.ipynb)