# Introduction

**Prerequisites**

- Good attitude  
- Good work ethic  


**Outcomes**

- Understand what a programming language is  
- Know why we chose Python  
- Know what the Jupyter notebook is  
- Know Jupyter notebook basics: cell modes, editing/evaluating cells  

## Welcome

Welcome to the start of your path to learning how to work with data in the
Python programming language!

A programming language is, loosely speaking, a structured subset of natural
language (words) and special characters (e.g. `,` or `{`) that allows humans
to describe operations they would like their computer to perform on their behalf.

It is then the job of the programming language to translate these words and
symbols into instructions the computer can execute.
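
As a small illustration of this idea, here is a minimal sketch in Python. The variable names and numbers are made up for the example.

```python
# Words and symbols describing an operation: add up a list of numbers.
prices = [2, 3, 5]  # hypothetical example data

# Python translates this line into instructions the computer executes.
total = sum(prices)

# Show the result.
print(total)
```

You do not need to understand every detail yet; the point is that a few readable words and symbols are enough to tell the computer what to do.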

### Why Python?

Among the hundreds of programming languages, we chose to teach you Python for the following reasons:

- Easy to learn and use (relative to other programming languages)  
- Designed with readability in mind  
- Excellent tools for handling data efficiently and succinctly  
- Cemented as the world’s [third most popular](https://www.zdnet.com/article/programming-language-of-the-year-python-is-standout-in-latest-rankings/)
  programming language, the most popular scripting language, and an increasingly common standard for
  [data analysis in industry](https://medium.com/@data_driven/python-vs-r-for-data-science-and-the-winner-is-3ebb1a968197)  
- General purpose: Initially you will learn Python for data analysis, but it
  can also be used for websites, database management, web scraping, financial
  modeling, data visualization, etc. In particular, it is the world’s best language for
  [gluing](https://en.wikipedia.org/wiki/Glue_code) those different pieces together.  
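
To give a feel for what gluing pieces together can look like, here is a minimal sketch that uses only modules shipped with Python. The data, the variable names, and the choice of CSV input and JSON output are assumptions made for illustration.

```python
import csv
import io
import json
import statistics

# Hypothetical data, standing in for a file exported from a spreadsheet.
raw_csv = """city,temperature
Vancouver,12
Toronto,8
Montreal,7
"""

# Piece 1: read the CSV text into a list of dictionaries.
rows = list(csv.DictReader(io.StringIO(raw_csv)))

# Piece 2: compute a summary statistic from one column.
temperatures = [float(row["temperature"]) for row in rows]
summary = {
    "cities": len(rows),
    "mean_temperature": statistics.mean(temperatures),
}

# Piece 3: hand the result off as JSON, a format many web tools accept.
report = json.dumps(summary)
print(report)
```

Each piece here is handled by a different module, and a few lines of Python pass the data from one to the next.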


The general-purpose nature of Python comes at a cost: it is often called “the second best language
for everything”. But the flip side of that argument is that it is a great language to have in your
toolbox to solve all sorts of problems and patch them together. Hence, a versatile “second-best”
language is typically the best one to learn first.

Some other languages to consider:

- R has a spectacular ecosystem of statistical packages, and is defensible as a choice for pure
  data science. However, it is a difficult general-purpose language to use and ill-suited to
  the versatile glue code in which Python excels. Nevertheless, it can be a useful second language
  to learn for projects that are entirely statistical.  
- Matlab has much more natural notation for writing linear algebra heavy code. However, it is:
  (a) expensive; (b) poor at dealing with data analysis; (c) grossly inferior to Python as a
  language; and (d) being left behind as the Python and Julia ecosystems expand to more packages.  
- Julia is in part a far better version of Matlab, and it can be as fast as Fortran or C. However,
  it has a young and immature environment and is currently more appropriate for academics and
  scientific computing specialists.  


Another consideration with programming languages is runtime performance, where both Python and R can
be slow for general-purpose code. For the most part this will not be an issue for doing data
science and machine learning, as most data science packages in Python (and R) call out to
high-performance code written in other languages in the background. If you are writing more
traditional scientific/technical computing code in Python, there are
[things that can help](http://numba.pydata.org/) make Python faster in limited cases, but other
languages like Julia and Matlab can become more appealing.
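
A rough way to see this difference for yourself is sketched below. It compares a loop written in pure Python with the built-in `sum`, which runs in compiled code in the standard Python interpreter. The function name `loop_sum` and the data are made up for the example, and the timings will vary from machine to machine.

```python
import timeit

numbers = list(range(100_000))  # hypothetical example data


def loop_sum(values):
    """Add up values one at a time in pure Python."""
    total = 0
    for value in values:
        total += value
    return total


# Both approaches give the same answer.
assert loop_sum(numbers) == sum(numbers)

# Time 20 repetitions of each approach.
loop_seconds = timeit.timeit(lambda: loop_sum(numbers), number=20)
builtin_seconds = timeit.timeit(lambda: sum(numbers), number=20)

print(f"pure Python loop: {loop_seconds:.4f} seconds")
print(f"built-in sum:     {builtin_seconds:.4f} seconds")
```

The built-in version is usually noticeably faster, which is the same reason data science packages that delegate heavy work to compiled code tend to perform well.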

### Why Open Source?

Software development has changed radically in the last decade, increasingly becoming a process of
stitching together both established high-quality libraries and state-of-the-art research projects.

A major disadvantage of Matlab, Stata, and other proprietary languages is that they are not
open source and are unable to work within this new paradigm.

Forgetting the cost for a moment, the benefits of using an open-source language are pragmatic rather
than ideological:

- Open-source languages make it easier for everyone in the world to write and share packages
  because the code is accessible and available  
- With the right kinds of open-source licenses, academics, businesses, and hobbyists all have
  incentives to contribute  
- Because open-source languages are managed on publicly accessible sites (e.g. GitHub), it is
  easier to build a community and collaborate  
- Package management systems (i.e. a way to find, download, install, and upgrade packages) in
  open-source languages can be very open and accessible since they don’t need to deal with
  proprietary software licenses  


Taking Matlab as an example: it has no package management system at all, and due to the license and
language limitations it is unlikely to ever catch up.

### Installation

If you are accessing these materials via a Jupyter(Hub) server, you can skip
this section.

If you are not viewing them on an already established Jupyter server, please
follow the [local installation instructions](local_install.ipynb) and come back
here when you are finished.

### Jupyter notebook basics

There are a few things that we should know about Jupyter notebooks up front:

1. Notebooks are made up of cells  
1. Cells have inputs and outputs  
1. We use cells of two main types:  
   1. Markdown cells
      - Contain [markdown](https://github.com/adam-p/markdown-here/wiki/Markdown-Here-Cheatsheet) text
      - Output is rendered in place of the input when the cell is executed  
   1. Code cells
      - Contain Python (or other language) code
      - Inputs have an `In [ ]:` to the left
      - When executed, output is placed below the input with `Out [ ]:` to the left  

```python
1 + 1
```
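
To make the link between input and output concrete, here is a minimal sketch of what a code cell might contain. The variable name `result` is chosen only for illustration.

```python
# Store the value of an expression under a name.
result = 1 + 1

# print displays a value from anywhere in a cell, and also works
# when the same code is run outside of a notebook.
print(result)

# In a notebook, ending a cell with a bare expression displays its
# value as the cell's output.
result
```

Evaluating a cell like this one in a notebook shows the printed value first, followed by the value of the final expression as the cell's output.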

#### Editing cells

The selected cell can be in one of two modes:

1. Command mode: This mode is for making high-level changes to the notebook
   itself, for example changing the order of cells or creating a new cell.  
   - You know you’re in command mode when there is a blue sidebar on the left of
     the cell  
   - Pressing keys tells Jupyter notebook to run commands. For example, `a`
     adds a new cell above the current cell, `b` adds one below the current
     cell, and `dd` deletes the current cell  
   - The up arrow (or `k`) changes the selected cell to the cell above the
     current one, and the down arrow (or `j`) changes to the cell below  
1. Edit mode: This mode is for editing the content inside of cells.  
   - When in edit mode, the selected cell displays a green sidebar on the left  
   - You can edit the content of the cell  


Some useful commands:

- To go from command mode to edit mode press enter or double click the mouse  
- Go from edit mode to command mode by pressing escape  
- You can evaluate a cell by pressing shift+enter (meaning shift and enter at
  the same time)  


<blockquote>

**Check for understanding**

In the *code* cell below (notice the `In [ ]:` to the left), type a quote (`"`), your name, then another quote (`"`), and evaluate the cell.

</blockquote>

```python
# code here!
```


#### Getting help

For more help with Jupyter notebook, see the `Help` menu at the top.

<blockquote>

**Check for understanding**

Experiment with the different options in the help menu. You might try items
like “User Interface Tour” and “Keyboard Shortcuts”.


</blockquote>