# getting started with jupyter

Welcome to COMP 205: Principles of Data Science in Python. 

If you can see this page, you have successfully installed or accessed Jupyter notebooks. You need to take a few steps in order to participate in this course, and this page will lead you through them.

First, run the following cell to install the software this course uses on your computer. To do this, press the run button to the right of the cell. You only need to run this cell once. You do not need to run it at all if you are viewing this page on my servers (http://comp205.eecs.tufts.edu), where it will print an error. If you run it there anyway, don't worry: it won't do anything harmful.

In [1]:
!pip install datascience
!pip install okpy
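
If you want to confirm that the install worked, a minimal sketch like the one below may help. It assumes that `okpy` is imported under the module name `client` (as in the login cell later on this page) and that `datascience` is imported under its own name. The helper `missing_packages` is a hypothetical name of my own, not part of either package.

```python
import importlib.util

# Maps the pip name to the module name used in an import statement.
# Assumption: okpy is imported as `client`, as in the login cell on this page.
REQUIRED_MODULES = {"datascience": "datascience", "okpy": "client"}


def missing_packages(required=REQUIRED_MODULES):
    """Return the pip names whose import module cannot be found."""
    return [
        pip_name
        for pip_name, module_name in required.items()
        if importlib.util.find_spec(module_name) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    print("Missing packages:", missing if missing else "none")
```

An empty result only means that modules with those names can be found; another package that happens to be called `client` would also satisfy the check.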



# getting familiar with Jupyter notebooks

The cell you are reading is a "Markdown cell". This contains text in a format like that of a word processor. The source for this is a simple language called "Markdown". It is written in text form, and is formatted by Jupyter notebooks into HTML that your browser can display. To see the source code for this message, double-click on this cell. 

# the Markdown language
A lot of data science requires writing in English rather than in computer code. In Jupyter Notebooks, this is done using the Markdown language. The language is fairly simple: 

```
# starts a heading
## starts a subheading
* starts a bulleted list. 
1. (or any number) starts a numbered list. 
```

The above is typeset as follows: 
# starts a heading
## starts a subheading
* starts a bulleted list
1. (or any number) starts a numbered list. 

For more details on Markdown, please read [the Markdown documentation](https://jupyter-notebook.readthedocs.io/en/stable/examples/Notebook/Working%20With%20Markdown%20Cells.html).
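
To make the four rules above concrete, here is a small Python sketch that labels a line of text according to how it starts. It is a deliberate simplification for illustration, not how Jupyter actually parses Markdown, and the names `RULES` and `classify` are my own.

```python
import re

# One pattern per construct listed above; real Markdown has many more rules.
RULES = [
    ("subheading", re.compile(r"^## ")),
    ("heading", re.compile(r"^# ")),
    ("bulleted list item", re.compile(r"^\* ")),
    ("numbered list item", re.compile(r"^\d+\. ")),
]


def classify(line):
    """Return a label for the Markdown construct that starts this line."""
    for label, pattern in RULES:
        if pattern.match(line):
            return label
    return "plain text"
```

For example, `classify("3. third step")` gives `"numbered list item"`, while a line with no special prefix is labeled `"plain text"`.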

Meanwhile, I will typeset the rest of the course instructions in Markdown, and you can double-click any cell to see how I did it. Just click "Run" to have it appear normally again.

# general structure of this course

This course is expressed as a series of Jupyter notebooks illustrating basic concepts and testing your understanding. These pages have supporting files, and each page is in its own directory. You can download the zip files for each page from our Canvas site. Several mini-lectures illustrate concepts interactively, but the "textbook" for the course is these pages, and you need to know how to access them.

Each notebook has supporting files, so you need to put each notebook and its files into a separate directory/folder. If you unzip two notebooks to the same directory, neither will run properly. If that happens, no permanent harm has been done; just unzip each one to a separate directory next time.
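
If you prefer to unzip from Python rather than by hand, the following sketch shows one way to give each download its own directory. It assumes the directory should be named after the zip file; the function name `unzip_to_own_directory` and the file names in the usage note are hypothetical.

```python
import zipfile
from pathlib import Path


def unzip_to_own_directory(zip_path, dest_root):
    """Extract zip_path into a new directory named after the zip file."""
    zip_path = Path(zip_path)
    target = Path(dest_root) / zip_path.stem
    if target.exists():
        # Refuse to mix two notebooks' files in one directory.
        raise FileExistsError(f"{target} already exists; choose another location")
    target.mkdir(parents=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    return target
```

Calling `unzip_to_own_directory("lesson1.zip", "comp205")` would create `comp205/lesson1`. If the zip file already contains a top-level folder, you will get one extra level of nesting, which is harmless.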

The wonderful thing about a Jupyter notebook is that one can write a "live" set of course notes that interacts with you. I'll ask you to try specific things out periodically, and ask you questions to test your understanding. Thus, reading is an interactive -- rather than a passive -- experience. 

You are free to add cells to these notebooks yourself in order to take your own personal notes on the experience. Thus, you personalize the experience and put things into your own words. 

Each notebook will also link to external reading that you need to do in order to complete each exercise.

# how to work on assignments
The course materials are designed so that you can work on them in several modes: 

1. With your own copy of Jupyter notebooks, installed as the anaconda3 package on a Windows, Mac, or Linux computer. This is by far the most convenient option. 
2. On a special server reserved for this course: https://comp205.eecs.tufts.edu . This gives you the ability to work from less powerful devices, including Chromebooks, and from machines where you don't have the privilege to install software, e.g., on your work computers. 

You are free to switch back and forth between these modes. To switch, you need to move the whole notebook directory (not just the .ipynb file) to the other environment. 
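
One way to move a whole notebook directory is to pack it into a single zip file first. The sketch below uses Python's standard `shutil.make_archive`; the function name `pack_notebook_directory` and the directory names in the usage note are hypothetical.

```python
import shutil
from pathlib import Path


def pack_notebook_directory(notebook_dir, archive_base):
    """Zip notebook_dir (the .ipynb file plus its supporting files)."""
    notebook_dir = Path(notebook_dir).resolve()
    if not notebook_dir.is_dir():
        raise NotADirectoryError(str(notebook_dir))
    archive_path = shutil.make_archive(
        str(archive_base),                  # output path without the ".zip" suffix
        "zip",
        root_dir=str(notebook_dir.parent),  # paths in the zip are relative to this
        base_dir=notebook_dir.name,         # keeps the directory name in the zip
    )
    return Path(archive_path)
```

For instance, `pack_notebook_directory("lesson1", "lesson1_backup")` would write `lesson1_backup.zip`, which you can then transfer to the other environment and unzip there.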

Meanwhile, if you are reading this from the comp205 environment and want to set up your own environment, please [click here for instructions](https://docs.anaconda.com/anaconda/install/).

# using comp205.eecs.tufts.edu
Access to https://comp205.eecs.tufts.edu is controlled via the Halligan Hall computer accounts system, which is independent of but coupled with the Tufts University accounts system. The typical student in this course will receive a Universal Tufts Login Name (UTLN) before the course starts, and this will also be your login name on our systems. To enable your account on our systems, you must use a web page that relies on your Tufts account: https://www.eecs.tufts.edu/~accounts . Please click on this link and enable your eecs.tufts.edu account by specifying a *different* password than the one for your Tufts email. 

# anaconda
This course is based upon the anaconda library for data science and machine learning. This is a set of software that one can call from a Jupyter cell. To use it, you need to install it on your own machine. If you cannot do so, because you do not own the machine, or the machine has too small a disk, or it is an unsupported machine (e.g., a Chromebook or even a Raspberry Pi!), or installation is prohibited by company policy, then feel free to use our https://comp205.eecs.tufts.edu environment instead. 

When you are installing anaconda, make sure to install the Python 3 version rather than Python 2. All exercises in this course are written in Python 3. 
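
If you are unsure which version you ended up with, a quick check like this sketch can tell you. It relies only on Python's standard `sys` module; the helper name `is_python3` is my own.

```python
import sys


def is_python3(version_info=sys.version_info):
    """Return True when the major version number is 3."""
    return version_info[0] == 3


if __name__ == "__main__":
    # sys.version begins with the version number, e.g. "3.x.y ..."
    print("Running Python", sys.version.split()[0])
    print("Python 3?", is_python3())
```

Run it in a notebook cell; if the second line does not say `True`, the notebook is not using a Python 3 interpreter.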

There are two additions that we will call upon in the exercises that are not part of anaconda:
* `okpy`: an interactive grading program that allows you to check your understanding of a concept in real time. 
* `datascience`: a set of user-friendly data science routines based upon anaconda. 

You installed these in the first code cell above.

# Python 3 versus Python 2 versus IPython
In this course, we will work in IPython, which is short for "interactive Python". This is a version of the Python interpreter suited to Jupyter notebook cells. Our particular IPython interpreter understands Python 3 syntax. Python 2, an older version of Python, reached its official end of life in January 2020 but still appears in older code; we will not use it in this course. The differences are subtle, and beyond the scope of this course. 

Language in the Jupyter notebooks documentation can be confusing, so an initial note on nomenclature might help. You will often find that the word *kernel* is used to describe an interactive language (such as IPython) that is interpreted in a Notebook. There are several "kernels", including Python 3, Python 2, R, MATLAB, etc. In this course, we will be using the IPython kernel based upon Python 3. Thus: 
* The word *kernel* refers to the language a notebook is interpreting. 
* The *IPython kernel* supports either Python 3 or 2. 
* We are using the Python 3 version. 

Hopefully this note will make the documentation easier to read. 
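
To see which shell your code is running in, you can ask IPython directly. This is a minimal sketch: `get_ipython` is a real IPython function that returns `None` when no interactive shell is active, while the wrapper name `current_shell_name` is hypothetical.

```python
def current_shell_name():
    """Return the class name of the active IPython shell, or None outside IPython."""
    try:
        from IPython import get_ipython
    except ImportError:
        # IPython is not installed, so we cannot be inside an IPython kernel.
        return None
    shell = get_ipython()
    return None if shell is None else type(shell).__name__


if __name__ == "__main__":
    print("IPython shell:", current_shell_name())
```

Inside a Jupyter notebook this typically reports `ZMQInteractiveShell`; in a plain `python` session it reports `None`.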

# Logging into okpy

The code below will log you into okpy for the first time. 

It will direct you to the okpy.org site for your secret key, which you must type into the notebook cell that appears. 

In [3]:
# Don't change this cell; just run it. 
from client.api.notebook import Notebook
ok = Notebook('Getting started.ok')
ok.auth(inline=True, force=True)

Assignment: Getting started
OK, version v1.14.15

Successfully logged in as alva.couch@gmail.com


# If you've already logged in from this instance of Jupyter

Then you don't need the "force" option, and you can run the cell below instead. 

In [7]:
# Don't change this cell; just run it. 
from client.api.notebook import Notebook
ok = Notebook('Getting started.ok')
ok.auth(inline=True)

Assignment: Getting started
OK, version v1.14.15

Successfully logged in as alva.couch@gmail.com


# how to use OK

### Periodically, I'll embed a cell into the notebook that tells you whether you are correct. 

Run it to figure out whether your answer is reasonable. 

For example, write the code that sets the variable x to 'hello' in the cell below. 

In [4]:
x = 'hello'
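
A check of this kind can be pictured as a comparison between your variable and an expected value. The sketch below is only an illustration under that assumption: it uses Python's standard `doctest` module rather than okpy's own machinery, and the name `check_answer` is hypothetical.

```python
import doctest


def check_answer(namespace):
    """Return True when x in the given namespace displays as 'hello'."""
    check = """
    >>> x
    'hello'
    """
    parser = doctest.DocTestParser()
    # Copy the namespace so that running the check cannot modify the caller's variables.
    test = parser.get_doctest(check, dict(namespace), "q1_1_sketch", None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    # Discard the failure report; we only want the pass/fail counts.
    results = runner.run(test, out=lambda text: None)
    return results.failed == 0
```

With this sketch, `check_answer({"x": "hello"})` passes, while a different value for `x`, or no `x` at all, fails.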

### now run the cell below to figure out whether it's correct or not. 

In [5]:
_ = ok.grade('q1_1')

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Running tests

---------------------------------------------------------------------
Test summary
    Passed: 1
    Failed: 0
[ooooooooook] 100.0% passed



# When you're done with a notebook

When you have finished a notebook, report this to me by running the cell below. 

Running that cell submits your solution for the notebook to me. 

In [6]:
_ = ok.submit()

Saving notebook... Saved 'Getting started.ipynb'.
Submit... 100% complete
Submission successful for user: alva.couch@gmail.com
URL: https://okpy.org/cal/COMP205/su19/gettingstarted/submissions/r8n4Vp

