# IPython notebooks

Spring 2017 - Prof. Foster Provost

Teaching Assistant: Maria L Zamora Maass

***

## Python

Python is a programming language that has been growing in popularity in recent years. There are many reasons for this, but it mostly comes down to two things: Python is easy to learn and use, and it has a very active community that continually develops and improves amazing extensions to the language!

Python has become one of the most frequently used languages in data science because it can be applied almost immediately to a large number of data science problems. When stakeholders at companies of various sizes and in different industries are asked which language they would like incoming data scientists to know, almost all of them agree that Python is the best choice. If you are going to learn one language (something everyone should do!), Python would be a great choice.

Many tools and packages have been built on top of Python, including IPython, pandas, NumPy, Matplotlib, and others that we will be using during this course. For more info, please visit https://www.python.org/doc/


## Jupyter IPython notebooks

One extremely useful tool is Jupyter, which incorporates "IPython notebooks".

"The IPython Notebook is now known as the Jupyter Notebook. It is an interactive computational environment, in which you can combine code execution, rich text, mathematics, plots and rich media. It is a web application that allows you to create and share documents that contain live code, equations, visualizations and explanatory text. Uses include: data cleaning and transformation, numerical simulation, statistical modeling, machine learning and much more."

- Language: The Jupyter Notebook has support for over 40 programming languages, including those popular in Data Science such as Python, R, Julia and Scala.

- Sharing: Notebooks can be shared with others using email, Dropbox, GitHub and the Jupyter Notebook Viewer.

- Widgets (apps): Code can produce rich output such as images, videos, LaTeX, and JavaScript. Interactive widgets can be used to manipulate and visualize data in real time.


IPython notebooks are made up of cells. There are two basic types of cells in an IPython notebook: **text cells** for comments and **code cells** for commands. You can edit a cell by double-clicking on it. To return it to display mode (that is, to run the cell), press the "Play" (▶) button; you can stop a running cell with the "square" (◼︎) button. All the operations on cells can be found in the toolbar. Try it now! Double-click this cell, then click the play button above. (Mine looks more like this: ▶| ) We'll explain what you just saw in a minute.


For more details on the Jupyter Notebook, please see the Jupyter website http://jupyter.org/


Steps to open a new notebook (you can open as many as you want!):


![NewNotebook](images/new_notebook.png)



[Note: Jupyter now uses Python 3 rather than Python 2.]

This is how a new IPython notebook looks:




![NewNotebook](images/notebook.png)





## Text files and scripts in Jupyter

In Jupyter, we can also create new text files or scripts.

A script is a program (often a small one) that can be run as a "command". The program code is just text defining commands in a certain programming language (e.g., Python). Scripts often cobble other commands together into something useful that can be run again and again.

We can tell the language of a script from its file extension. For example, a file called "script_example.py" contains Python commands, and a file called "script_example.R" contains R commands.

You would create a script by opening a text file with an appropriate name and extension, typing or pasting in some Python commands, and then saving it.
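
For instance, a tiny `script_example.py` might contain nothing more than a few commands that run from top to bottom. The data and variable names below are made up purely for illustration; they are not part of the course files.

```python
# script_example.py: a minimal, hypothetical script.
# Every line below runs, top to bottom, each time the file is executed.

numbers = [3, 1, 4, 1, 5]          # some example data
total = sum(numbers)               # add the numbers up
average = total / len(numbers)     # compute their mean

print("Total:", total)             # prints: Total: 14
print("Average:", average)         # prints: Average: 2.8
```

After saving the file, you could run it from a terminal with `python script_example.py`, or from a notebook cell in the same folder with the IPython magic `%run script_example.py`.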

Steps to open a new text file:

![NewText](images/new_text.png)

***

This is how it looks:

![Text](images/text.png)

***

Now we can change the language and write some examples of Python commands. 

We should also change the file's extension to .py so that we can run it later.

![Language](images/selectlanguage.png)
![Language](images/script.png)


Why do we need scripts? Because we often want to create code that we will use frequently. Instead of writing all that code repeatedly in many notebooks, we can create a script and just call it. You will see examples of how to do this later!
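
A minimal sketch of that idea, assuming a hypothetical file named `helpers.py` saved in the same folder as your notebooks. The function name `summarize` is our own invention, not a library function. The `if __name__ == "__main__":` block lets the same file work both as a runnable script and as a module that other code can import.

```python
# helpers.py: a hypothetical file of reusable code.

def summarize(values):
    """Return the count, total and mean of a non-empty list of numbers."""
    if not values:
        raise ValueError("summarize() needs at least one value")
    total = sum(values)
    return {"count": len(values), "total": total, "mean": total / len(values)}


# This block runs only when the file is executed directly
# (e.g. `python helpers.py`), not when it is imported.
if __name__ == "__main__":
    print(summarize([2, 4, 6]))
```

In a notebook in that folder, you could then write `from helpers import summarize` once and call `summarize([2, 4, 6])` in any cell, instead of copying the function into every notebook.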


## Text cells in a Jupyter notebook

To write text in a cell, we select the cell and use the toolbar to change its type from "Code" to "Markdown" (**Markdown** is a formatting language for making text look fancier). Double-click on this cell to see the Markdown source, then run the cell to see the formatted version.

Now you can write and format text:

- Hash (number) signs \# at the start of a line for titles and headings.
- Simple \*asterisks\* or \_underscores\_ to emphasize things: _example_.
- A backslash (\\) to keep special characters from acting special (as in the preceding lines).
- Double **asterisks** to make things bold.
- Square brackets [ ] for links and images.
- Also, HTML code is allowed. Some resources can be found in [HTML w3schools](http://www.w3schools.com/html/html_examples.asp) <p style="color:red;">This is text formatted with HTML.</p>

- And you can write math with $\LaTeX$ (LaTeX is a typesetting system for producing scientific documents, https://www.latex-project.org/). To use LaTeX in Jupyter, wrap the LaTeX code in dollar signs: $x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$. If you don't know how to write a symbol, you can go to [Detexify](http://detexify.kirelabs.org/classify.html).
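
The quadratic formula typeset in the last item can also be evaluated in a code cell. A minimal sketch, assuming real coefficients with $a \neq 0$ and a non-negative discriminant $b^2 - 4ac$; the function name `quadratic_roots` is our own.

```python
import math

def quadratic_roots(a, b, c):
    """Return both real roots of a*x**2 + b*x + c = 0 via the quadratic formula."""
    if a == 0:
        raise ValueError("a must be non-zero for a quadratic equation")
    discriminant = b**2 - 4*a*c
    if discriminant < 0:
        raise ValueError("no real roots (negative discriminant)")
    root = math.sqrt(discriminant)
    # The two signs of the "±" in the formula give the two roots.
    return (-b + root) / (2*a), (-b - root) / (2*a)

print(quadratic_roots(1, -3, 2))  # x**2 - 3x + 2 = 0  → (2.0, 1.0)
```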





If you are ever stuck because you don't know how to format some text, just Google **"Markdown syntax"**.

## Code cells

Now, let's take a look at a code cell (this is the default type of a Jupyter cell). In a code cell, we can type any Python command and then click "Play". When we play a cell, the code in it is executed and the cell displays the result of running that code (maybe an error!). Code cells are always marked with an "`In [ ]:`" prompt.

We build our IPython notebooks with successive code cells, often interspersed with notes, images, and so on. Because of this, it is important to think of your notebook as a whole, and not just as individual cells.

Importantly, some of the results of running a cell are "remembered" by subsequent cells, as long as the session keeps running. Usually the session keeps running as long as you don't stop your instance and you keep the notebook window open. Sometimes when you get errors running your cells, it's because something from a prior cell was forgotten for some reason. Under the Kernel dropdown menu above, you'll see options for clearing your outputs, rerunning all the cells, etc.

For example, after the following cell runs, the notebook will remember that the **_VARIABLE_** called "x" is the sum of two given numbers. Once you run that cell, variable "x" will be available for use in the cells below. Please select it and press the ▶ button to run it. Then select the following cell and run it.

In [1]:

```python
x = 5 + 51
print("The value of the 'x' variable is " + str(x) + ".")
```

The value of the 'x' variable is 56.

In [2]:

```python
print("The value of the 'x' variable is *still* " + str(x) + " down here.")
```

The value of the 'x' variable is *still* 56 down here.


Go back to those two cells, change something (for example, the numbers being added), and then rerun them.

_Note_: Instead of having to manually go and click the "Play" button every time, you can also run a cell with your keyboard (*much easier!*). Just press **Ctrl + Enter** or **Shift + Enter**. Experiment with both of those to see what the difference is.

Again, since it's important: IPython notebooks flow from **top to bottom**. This means that if a cell relies on a variable or function that was created earlier in the notebook, you must run that earlier cell to make the information available to later cells (_we cannot just use "x" in another cell unless we have first run the cell that defines it_)! Try restarting the kernel and clearing all outputs (from the Kernel menu above), and then run just that last cell. You should get an error saying that Python doesn't know what "x" is.
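
In plain Python terms, that error is a `NameError`. The sketch below triggers one on purpose by using a name that was never assigned (`undefined_value`, a placeholder we made up) and catches it with `try`/`except` so the cell keeps running. In your restarted notebook, the uncaught error would instead read `NameError: name 'x' is not defined`.

```python
try:
    print(undefined_value)  # this name was never assigned, so Python raises NameError
except NameError as err:
    message = str(err)      # the text of the error
    print("Python says:", message)  # Python says: name 'undefined_value' is not defined
```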

_Note_: The number in the "In [#]:" statement will always increase by one for every time you run a code cell -- even if it's an error and even if you rerun the same cell.