# 1. Basics on Jupyter Project
## 1.1. What is Jupyter Notebook?

### - Interactive computing environment
### - Web-based interface with kernels to run code in several languages... Shhhhht! Not only Python!
### - Documents, called notebooks, with many types of content, distributed in cells.
### - Useful for prototyping, recording coding sessions, and exporting to different formats
### - It was formerly called IPython Notebook, and was renamed to Jupyter

## 1.2. Starting the server

### - To run locally:
#### 1. Open a terminal
#### 2. Create and go to the directory where you want to store your files: `mkdir jupyter-intro && cd jupyter-intro`
#### 3. Type `jupyter notebook` and press the return key

## 1.3. Jupyter Notebook's UI

### 1.3.1. Notebook Dashboard

#### 1. Create a new notebook
#### 2. Restart and shut down notebooks

### 1.3.2. Notebook's UI

#### - Menu
#### - Toolbar
#### - Navigation
#### - Cell modes

## 1.4. Markdown language

### - In command mode, press `m` to switch a cell to Markdown mode
### - Help menu contains a link to a reference
### - https://daringfireball.net/projects/markdown/
### - Equations with LaTeX using MathJax $x^2$

### - Syntax highlighting through GFM (GitHub Flavored Markdown)

```python
print(a)
```

### - Syntax highlighting for other languages

```ruby
def my_function(a)
  puts a
end
```

### - References to local files:
```html
<img src="./pybcn1.png" />
```

<img src="./pybcn1.png" />

## 1.5. Writing and running code

### - Get a new cell, write `a = 10` and press Shift+Return
### - In the new cell, write `print(a)` and press Ctrl+Return
### - Execution of certain operations might take a while ...

In [None]:
import time
time.sleep(15)

### - Internal documentation

In [None]:
a = 'This is a string'

In [None]:
a.split?
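
The `?` suffix is IPython syntax, not plain Python. As a rough sketch of where that information comes from, the standard library's `inspect.getdoc` returns the docstring that `a.split?` includes in its output, and it works in any Python interpreter. The variable name `doc` is only illustrative.

```python
import inspect

a = 'This is a string'

# Fetch the docstring of the bound method, as shown by `a.split?` in IPython.
doc = inspect.getdoc(a.split)
print(doc)

# help(a.split) prints a similar description in a plain interpreter.
```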

### - Magics: %%writefile, %load, %timeit, %%bash, ...

In [1]:
%%bash
rm -f test.txt

In [1]:
%%writefile test.txt
This is a test

Writing test.txt


In [None]:
%load test.txt

In [2]:
import time

%timeit time.sleep(0.5)

1 loop, best of 3: 504 ms per loop
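
Outside a notebook, a similar measurement can be taken with the standard library's `timeit` module. This is a minimal sketch, not what `%timeit` does internally: the 0.05 second sleep (shorter than the one above, to keep the example quick) and the names `timings` and `best` are assumptions made for illustration.

```python
import time
import timeit

# Run the statement once per measurement, three measurements in total.
timings = timeit.repeat(lambda: time.sleep(0.05), number=1, repeat=3)

# As in the output above, report the best (smallest) measurement.
best = min(timings)
print('best of %d: %.0f ms per loop' % (len(timings), best * 1000))
```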


In [3]:
%%bash
ls -la

total 10072
drwxr-xr-x  13 ifosch  staff      442 Sep 17 02:48 .
drwxr-xr-x  35 ifosch  staff     1190 Sep 16 11:34 ..
drwxr-xr-x  13 ifosch  staff      442 Sep 17 02:40 .git
-rw-r--r--   1 ifosch  staff       20 Sep 17 02:06 .gitignore
drwxr-xr-x   5 ifosch  staff      170 Sep 17 02:05 .ipynb_checkpoints
-rw-r--r--   1 ifosch  staff     1331 Sep 17 02:05 0. Introduction to Jupyter.ipynb
-rw-r--r--   1 ifosch  staff    14621 Sep 17 02:47 1. Basics on Jupyter Project.ipynb
-rw-r--r--   1 ifosch  staff  3156720 Sep 17 02:39 2. Example using matplotlib and numpy.ipynb
-rw-r--r--@  1 ifosch  staff   140542 Sep 16 23:41 HISTORY.gz
-rw-r--r--@  1 ifosch  staff   290649 Sep 17 02:18 dessert.png
-rw-r--r--   1 ifosch  staff    19348 Sep 16 23:34 pybcn1.png
-rw-r--r--@  1 ifosch  staff  1515444 Sep 17 02:06 stained_glass_barcelona.png
-rw-r--r--   1 ifosch  staff       14 Sep 17 02:48 test.txt


## 1.6. Kernels and output

### - The kernel keeps state between cell executions
### - Sometimes you'll need to restart it

### - Output, standard output and standard error...

In [4]:
a = 'Hola'
a

'Hola'

In [5]:
from __future__ import print_function
import sys

print(a)
print(a, file=sys.stderr)

Hola


Hola
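
The two `Hola` lines above come from two different streams, which the notebook usually renders differently. A small sketch, assuming Python 3.5 or later, makes the separation visible by capturing each stream in its own in-memory buffer; the names `out` and `err` are illustrative.

```python
import contextlib
import io
import sys

a = 'Hola'

# Temporarily swap both streams for in-memory buffers.
out, err = io.StringIO(), io.StringIO()
with contextlib.redirect_stdout(out), contextlib.redirect_stderr(err):
    print(a)                   # goes to standard output
    print(a, file=sys.stderr)  # goes to standard error

print('stdout captured: %r' % out.getvalue())
print('stderr captured: %r' % err.getvalue())
```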


### - Output is asynchronous...

In [None]:
for i in range(8):
    print(i)
    time.sleep(0.5)

### - Output is automatically scrolled when it is too long

In [6]:
import random

for i in range(1000):
    print('%s = %s' % (i, random.random()))

0 = 0.00525869111317
1 = 0.226685059155
2 = 0.769818992195
3 = 0.0336839950994
4 = 0.728270506417
5 = 0.24873579083
6 = 0.186263013385
7 = 0.286379526453
8 = 0.526148951997
9 = 0.499438275288
10 = 0.409407614763
11 = 0.403655987897
12 = 0.101549000429
13 = 0.0605253675324
14 = 0.78231862958
15 = 0.621111762279
16 = 0.432158911099
17 = 0.887386396646
18 = 0.358575573002
19 = 0.540345337826
20 = 0.374936522994
21 = 0.628473218335
22 = 0.136587260654
23 = 0.519680100408
24 = 0.234163723558
25 = 0.983523172117
26 = 0.981338197756
27 = 0.229826983852
28 = 0.332948370215
29 = 0.496202123027
30 = 0.96919244235
31 = 0.224426506891
32 = 0.607131471875
33 = 0.977331235075
34 = 0.43139625334
35 = 0.919027156444
36 = 0.940637811239
37 = 0.261814868026
38 = 0.624069398888
39 = 0.921308047741
40 = 0.925554805237
41 = 0.151998461246
42 = 0.343812016805
43 = 0.454472145131
44 = 0.914010726041
45 = 0.640235303356
46 = 0.0956633819874
47 = 0.395155323803
48 = 0.34646609866
49 = 0.795347426379
50 = 0.077

## 1.7. Converting your notebooks

### - Usage of nbconvert
### - Notebooks are rendered by GitHub
### - NBViewer
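
nbconvert is normally driven from the command line as `jupyter nbconvert --to FORMAT NOTEBOOK`. The sketch below wraps that call from Python, assuming Jupyter and nbconvert are installed and on the `PATH`; `nbconvert_command` and `convert` are hypothetical helper names, not part of nbconvert.

```python
import subprocess

def nbconvert_command(notebook, to='html'):
    """Build the command line for converting one notebook (hypothetical helper)."""
    return ['jupyter', 'nbconvert', '--to', to, notebook]

def convert(notebook, to='html'):
    """Run the conversion; requires Jupyter and nbconvert to be installed."""
    subprocess.check_call(nbconvert_command(notebook, to))

cmd = nbconvert_command('1. Basics on Jupyter Project.ipynb')
print(cmd)
# To actually convert, call: convert('1. Basics on Jupyter Project.ipynb')
```

Other values for `--to`, such as `markdown` or `script`, select different output formats.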

## 1.8. Guided practice: Counting words

This practice uses Python standard library modules to read a text from a gzipped file, count the frequency of every word in it, and print the number of distinct words, the 20 least frequent words, and the 20 most frequent words.

Let's start by creating a function to read a gzipped file:

In [7]:
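import gzip

def read_file(filename):
    """Return the whole content of a gzipped file."""
    # The with block closes the file once it has been read
    with gzip.open(filename) as f:
        return f.read()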

Now, check this function works appropriately with the example `HISTORY.gz` file:

In [8]:
text = read_file('HISTORY.gz')
text[:100]

'Python History\n--------------\n\nThis file contains the release messages for previous Python releases.'
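
The output above appears to come from Python 2, where reading the file yields a plain string. On Python 3 the same call returns `bytes`, so a text-mode variant may be more convenient. This is a minimal sketch: `read_text_file` is a hypothetical name, the UTF-8 encoding is an assumption about the file, and the round trip uses a small temporary file rather than the real `HISTORY.gz`.

```python
import gzip
import os
import tempfile

def read_text_file(filename, encoding='utf-8'):
    """Read a gzipped file and return its content as text (Python 3)."""
    # 'rt' decodes the bytes; without it, Python 3 returns a bytes object.
    with gzip.open(filename, 'rt', encoding=encoding) as f:
        return f.read()

# Round trip on a small temporary file (not the real HISTORY.gz).
path = os.path.join(tempfile.mkdtemp(), 'sample.gz')
with gzip.open(path, 'wt', encoding='utf-8') as f:
    f.write('Python History\n')

sample = read_text_file(path)
print(repr(sample))
```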

Once the content of the file is stored in a variable, the easiest way to count its words is to keep a separate counter for each word found in the string.
A dictionary is the most appropriate structure for these counters: the keys are the words and the values are their counts, so new words can be added as they appear and any word's counter is easy to look up.
There are many ways of doing this with other modules, but in this case we can build a small and simple function:

In [9]:
def word_freq(text):
    """Return a dictionary of word frequencies for the given text."""

    freqs = {}
    for word in text.split():
        freqs[word] = freqs.get(word, 0) + 1
    return freqs

In [10]:
freqs = word_freq(text[:100])

In [11]:
freqs

{'--------------': 1,
 'History': 1,
 'Python': 2,
 'This': 1,
 'contains': 1,
 'file': 1,
 'for': 1,
 'messages': 1,
 'previous': 1,
 'release': 1,
 'releases.': 1,
 'the': 1}
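
One of the other modules that can do this is `collections`, from the standard library. As a sketch of that alternative, `Counter` performs the same counting in a single call; `word_freq_counter` and `sample_freqs` are illustrative names, and the sample sentence is made up for the example.

```python
from collections import Counter

def word_freq_counter(text):
    """Return word frequencies using collections.Counter (alternative sketch)."""
    # Counter tallies every item of the iterable it receives.
    return Counter(text.split())

sample_freqs = word_freq_counter('Python History for previous Python releases.')
print(sample_freqs['Python'])
```

A `Counter` is a dictionary subclass, so it can be passed to code that expects the plain dictionary built by `word_freq`.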

Next, let's create a function that formats a list of value/key pairs nicely for printing:

In [12]:
def get_vk(lst):
    """Get a string from a list of value/key pairs nicely formatted in key/value order."""

    # Find the longest key: remember, the list has value/key pairs, so the key
    # is element [1], not [0]
    longest_key = max([len(word) for count, word in lst])
    # Make a format string out of it
    fmt = '%'+str(longest_key)+'s -> %s'
    # Create the string
    output = ""
    for v,k in lst:
        output = "%s%s\n" % (output, fmt % (k,v))
    return output.rstrip()

This function can be used to print a list like this one:

In [13]:
print(get_vk([(1, "the"), (1, "word"), (1, "is"), (1, "in"), (2, "Python")]))

   the -> 1
  word -> 1
    is -> 1
    in -> 1
Python -> 2
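
`get_vk` builds its result by repeated string concatenation. A common alternative, sketched here under the hypothetical name `format_pairs`, right-aligns each key with `str.rjust` and joins the lines in one step; it is meant as an illustration of the same formatting, not a replacement for the function above.

```python
def format_pairs(pairs):
    """Format (value, key) pairs as right-aligned 'key -> value' lines."""
    # The widest key decides the column width.
    width = max(len(key) for value, key in pairs)
    return '\n'.join('%s -> %s' % (key.rjust(width), value)
                     for value, key in pairs)

formatted = format_pairs([(1, 'the'), (2, 'Python')])
print(formatted)
```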


The last function in this practice returns a summary of the word frequencies in the text. The `Number of words` it reports is computed as `len(freqs)`, so it counts distinct words:

In [14]:
def freq_summ(freqs, n=10):
    """Get a simple summary of a word frequencies dictionary.

    Inputs:
      - freqs: a dictionary of word frequencies.

    Optional inputs:
      - n: the number of least and most frequent words to show."""

    words, counts = freqs.keys(), freqs.values()
    # Sort by count
    items = list(zip(counts,words))
    items.sort()

    output = "Number of words: %s\n\n" % len(freqs)
    output = "%s%d least frequent words:\n\n" % (output, n)
    output = "%s%s\n\n" % (output, get_vk(items[:n]))
    output = "%s%d most frequent words:\n\n" % (output, n)
    output = "%s%s\n\n" % (output, get_vk(list(reversed(items[-n:]))))
    return output
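
The sort in `freq_summ` works because Python compares tuples element by element, so `(count, word)` pairs are ordered by count first and by word second. A small sketch with made-up pairs shows the effect; the names `least` and `most` are illustrative.

```python
# (count, word) pairs, as produced by zip(counts, words)
items = [(2, 'Python'), (1, 'the'), (1, 'History')]

# Tuples are compared element by element: count first, then word.
items.sort()
print(items)

least = items[:2]                  # lowest counts first
most = list(reversed(items[-2:]))  # highest counts first
print(least, most)
```

Ties on the count are broken by the word itself, which is why capitalized words come before lowercase ones among the words with the same count.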

So, using this function on the first 100 characters of the text:

In [15]:
print(freq_summ(word_freq(text[:100]), 20))

Number of words: 12

20 least frequent words:

-------------- -> 1
       History -> 1
          This -> 1
      contains -> 1
          file -> 1
           for -> 1
      messages -> 1
      previous -> 1
       release -> 1
     releases. -> 1
           the -> 1
        Python -> 2

20 most frequent words:

        Python -> 2
           the -> 1
     releases. -> 1
       release -> 1
      previous -> 1
      messages -> 1
           for -> 1
          file -> 1
      contains -> 1
          This -> 1
       History -> 1
-------------- -> 1




Now these functions can be combined into a short program that processes the whole text:

In [16]:
text = read_file('HISTORY.gz')
freqs = word_freq(text)
print(freq_summ(freqs, 5))

Number of words: 12253

5 least frequent words:

      !) -> 1
    ""), -> 1
    ""). -> 1
"#define -> 1
     "%% -> 1

5 most frequent words:

the -> 2461
  - -> 1624
 to -> 1521
  a -> 1294
and -> 1062
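
Tokens such as `!)` and `-` show up as words because `str.split` only separates on whitespace. One possible refinement, sketched here as an assumption rather than as part of the original practice, lowercases the text and keeps only runs of letters and apostrophes; `word_freq_normalized` is a hypothetical name and the sample sentence is made up.

```python
import re

def word_freq_normalized(text):
    """Count words after lowercasing and dropping punctuation (sketch)."""
    freqs = {}
    # Keep runs of letters and apostrophes; everything else separates words.
    for word in re.findall(r"[a-z']+", text.lower()):
        freqs[word] = freqs.get(word, 0) + 1
    return freqs

normalized = word_freq_normalized('The release - the "History" file!)')
print(normalized)
```

The result is still a plain dictionary, so it can be passed to `freq_summ` unchanged.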


