# 1. Basics on Jupyter Project
## 1.1. What is Jupyter Notebook?

### - Interactive computing environment
### - Web-based interface with modules (kernels) to run code in several languages... Shhhhht! Not only Python!
### - Documents, called notebooks, hold many types of content, organized in cells
### - Useful for prototyping, recording coding sessions, and exporting to different formats
### - The notebook started as part of IPython; its language-agnostic parts were later spun off as Project Jupyter

## 1.2. Starting the server

### - To run locally:
#### 1. Open a terminal
#### 2. Create and enter the directory where you want to store your files: `mkdir jupyter-intro && cd jupyter-intro`
#### 3. Type `jupyter notebook` and press the return key

## 1.3. Jupyter Notebook's UI

### 1.3.1. Notebook Dashboard

#### 1. Create a new notebook
#### 2. Restarting and shutting down notebooks

### 1.3.2. Notebook's UI

#### - Menu
#### - Toolbar
#### - Navigation
#### - Cell modes

## 1.4. Markdown language

### - Press `m` in command mode to switch a cell to Markdown
### - Help menu contains a link to a reference
### - https://daringfireball.net/projects/markdown/
### - Equations with LaTeX using MathJax $x^2$

### - Syntax highlighting through GFM (GitHub Flavored Markdown)

```python
print(a)
```

### - Syntax highlighting for other languages

```ruby
def my_function(a)
  puts a
end
```

### - References to local files:
```html
<img src="./pybcn1.png" />
```

<img src="./pybcn1.png" />

## 1.5. Writing and running code

### - Get a new cell, write `a = 10` and press Shift+Return
### - In the new cell, write `print(a)` and press Ctrl+Return
### - Execution of certain operations might take a while ...

In [None]:
import time
time.sleep(15)

### - Internal documentation

In [None]:
a = 'This is a string'

In [None]:
a.split?
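
The `?` suffix only works inside IPython. A rough plain-Python counterpart, sketched here on the assumption that only the docstring is needed, is `inspect.getdoc` (the built-in `help` is another option):

```python
import inspect

a = 'This is a string'

# inspect.getdoc returns the cleaned-up docstring of the object,
# which is part of what `a.split?` displays in IPython.
doc = inspect.getdoc(a.split)
print(doc)

# The method itself splits on whitespace by default.
words = a.split()
print(words)
```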

### - Magics: `%%writefile`, `%load`, `%timeit`, `%%bash`, ...

In [1]:
%%writefile test.txt
This is a test

Overwriting test.txt


In [None]:
%load test.txt

In [2]:
import time

%timeit time.sleep(0.5)

1 loop, best of 3: 501 ms per loop
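
`%timeit` relies on the standard library `timeit` module. As a rough sketch of the same idea in plain Python (the 0.05 s sleep and the variable names are illustrative choices, and unlike IPython this does not pick the number of loops automatically):

```python
import time
import timeit

# Run the statement once per measurement and repeat the measurement 3 times.
timings = timeit.repeat(lambda: time.sleep(0.05), number=1, repeat=3)

# Like %timeit, report the best (smallest) measurement.
best = min(timings)
print('1 loop, best of 3: %.0f ms per loop' % (best * 1000))
```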


In [3]:
%%bash
ls -la

total 512
drwxr-xr-x  11 ifosch  staff     374 Sep 17 01:53 .
drwxr-xr-x  35 ifosch  staff    1190 Sep 16 11:34 ..
drwxr-xr-x   9 ifosch  staff     306 Sep 16 23:41 .git
drwxr-xr-x   6 ifosch  staff     204 Sep 16 12:43 .ipynb_checkpoints
-rw-r--r--   1 ifosch  staff    1620 Sep 16 12:54 0. Introduction to Jupyter.ipynb
-rw-r--r--   1 ifosch  staff   79908 Sep 17 01:53 1. Basics on Jupyter Project.ipynb
-rw-r--r--   1 ifosch  staff     625 Sep 16 12:18 2. Basics on Numpy.ipynb
-rw-r--r--   1 ifosch  staff     593 Sep 16 12:43 3. Basics on matplotlib.ipynb
-rw-r--r--@  1 ifosch  staff  140542 Sep 16 23:41 HISTORY.gz
-rw-r--r--   1 ifosch  staff   19348 Sep 16 23:34 pybcn1.png
-rw-r--r--   1 ifosch  staff      14 Sep 17 01:53 test.txt


## 1.6. Kernels and output

### - Kernel keeps state
### - Sometimes you'll need to restart it
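
As a toy model of what "keeps state" means (this is only an illustration with a plain dictionary, not how a kernel is implemented), each cell can be pictured as code executed against one shared namespace:

```python
# A shared dictionary stands in for the kernel's namespace (toy model only).
kernel_namespace = {}

# "Cell 1" defines a variable...
exec("a = 10", kernel_namespace)
# ..."cell 2" can still see it, because the namespace is kept between cells.
exec("b = a + 1", kernel_namespace)
b_value = kernel_namespace["b"]
print(b_value)

# A "restart" throws the namespace away, so earlier names are gone.
kernel_namespace = {}
restart_error = None
try:
    exec("print(a)", kernel_namespace)
except NameError as error:
    restart_error = error
    print("After restart:", error)
```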

### - Output, standard output and standard error...

In [4]:
a = 'Hola'
a

'Hola'

In [5]:
from __future__ import print_function
import sys

print(a)
print(a, file=sys.stderr)

Hola


Hola
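
The two `Hola` lines look alike, but they travel on different streams. A small sketch, assuming Python 3 (`contextlib.redirect_stderr` is not available in Python 2), captures each stream separately to make that visible:

```python
import contextlib
import io
import sys

captured_out = io.StringIO()
captured_err = io.StringIO()

# Temporarily replace sys.stdout and sys.stderr with in-memory buffers.
with contextlib.redirect_stdout(captured_out), contextlib.redirect_stderr(captured_err):
    print('Hola')                   # goes to standard output
    print('Hola', file=sys.stderr)  # goes to standard error

print('stdout got: %r' % captured_out.getvalue())
print('stderr got: %r' % captured_err.getvalue())
```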


### - Output is asynchronous...

In [None]:
for i in range(8):
    print(i)
    time.sleep(0.5)

### - Output is automatically scrolled when it is too long

In [6]:
import random

for i in range(1000):
    print('%s = %s' % (i, random.random()))

0 = 0.668431080405
1 = 0.570782007611
2 = 0.736154548472
3 = 0.157302601261
4 = 0.448638797687
5 = 0.935349320599
6 = 0.353606191376
7 = 0.620164561024
8 = 0.315575489789
9 = 0.915416567158
10 = 0.45950755502
11 = 0.586024866682
12 = 0.589838106771
13 = 0.269764883086
14 = 0.353708321121
15 = 0.348719294201
16 = 0.448674497787
17 = 0.424434248325
18 = 0.106308520519
19 = 0.826301576133
20 = 0.79137518191
21 = 0.395349508426
22 = 0.629688435421
23 = 0.68387609127
24 = 0.445255779927
25 = 0.48039499882
26 = 0.42342280817
27 = 0.444104199125
28 = 0.657965272321
29 = 0.810529976905
30 = 0.542279420888
31 = 0.164731203217
32 = 0.128639898939
33 = 0.370018876574
34 = 0.990564548963
35 = 0.530573184369
36 = 0.170501669344
37 = 0.909635535275
38 = 0.880600238639
39 = 0.174176818846
40 = 0.719877695547
41 = 0.365327557917
42 = 0.0053575513212
43 = 0.24618592012
44 = 0.844917247507
45 = 0.113320865152
46 = 0.0730676307071
47 = 0.0180026871372
48 = 0.908629775161
49 = 0.140901597165
50 = 0.696371

## 1.7. Converting your notebooks

### - Usage of nbconvert
### - Notebooks are rendered by GitHub
### - NBViewer
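
A notebook file is a JSON document, which is what makes conversion tools possible. As a very rough sketch of the idea behind `jupyter nbconvert --to script` (this is not nbconvert's actual implementation; the sample notebook and the `code_cells_to_script` helper are made up for illustration):

```python
import json

# A minimal, hand-written notebook in the nbformat 4 layout (illustrative only).
notebook_json = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 0,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title"]},
        {"cell_type": "code", "metadata": {}, "execution_count": None,
         "outputs": [], "source": ["a = 10\n", "print(a)"]},
    ],
})


def code_cells_to_script(raw_notebook):
    """Join the source of every code cell into one script (hypothetical helper)."""
    notebook = json.loads(raw_notebook)
    sources = []
    for cell in notebook["cells"]:
        if cell["cell_type"] == "code":
            # "source" may be a list of lines or a single string.
            source = cell["source"]
            sources.append(source if isinstance(source, str) else "".join(source))
    return "\n\n".join(sources)


script = code_cells_to_script(notebook_json)
print(script)
```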

## 1.8. Guided practice: Counting words

This practice uses standard library Python modules to read a text from a gzipped file, count the frequency of every word in the text, and print the number of distinct words, the 20 least frequent words, and the 20 most frequent words.

Let's start by creating a function to read a gzipped file:

In [7]:
import gzip

def read_file(filename):
    # Close the file as soon as its content has been read
    with gzip.open(filename) as f:
        return f.read()

Now, check that this function works with the example `HISTORY.gz` file:

In [8]:
text = read_file('HISTORY.gz')
text[:100]

'Python History\n--------------\n\nThis file contains the release messages for previous Python releases.'
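
The output above appears to come from Python 2, where `read()` returns a `str`. Under Python 3, `gzip.open` defaults to binary mode and returns `bytes`, so a text-mode variant may be more convenient. A sketch, assuming the file is UTF-8 encoded (`read_text_file` and the temporary sample file are illustrative, not part of the practice):

```python
import gzip
import os
import tempfile


def read_text_file(filename, encoding='utf-8'):
    """Read a gzipped file and return its content as text (Python 3)."""
    # 'rt' asks gzip to decode the bytes using the given encoding.
    with gzip.open(filename, 'rt', encoding=encoding) as handle:
        return handle.read()


# Build a tiny gzipped file so the example does not depend on HISTORY.gz.
path = os.path.join(tempfile.mkdtemp(), 'sample.gz')
with gzip.open(path, 'wt', encoding='utf-8') as handle:
    handle.write('Python History\n')

content = read_text_file(path)
print(repr(content))
```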

Once the content of the file is stored in a variable, the easiest way to count its words is to keep a separate counter for each word found in the string.
A dictionary is the most appropriate structure for these counters: the keys are the words and the values are their counts, so entries can be created on the fly and looked up easily.
There are many ways of doing this with other modules but, in this case, a small and simple function is enough:

In [9]:
def word_freq(text):
    """Return a dictionary of word frequencies for the given text."""

    freqs = {}
    for word in text.split():
        freqs[word] = freqs.get(word, 0) + 1
    return freqs

In [10]:
freqs = word_freq(text[:100])

In [11]:
freqs

{'--------------': 1,
 'History': 1,
 'Python': 2,
 'This': 1,
 'contains': 1,
 'file': 1,
 'for': 1,
 'messages': 1,
 'previous': 1,
 'release': 1,
 'releases.': 1,
 'the': 1}
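
For comparison, one of the "other modules" routes is `collections.Counter` from the standard library. This is only an alternative sketch, not the approach used in the rest of the practice; `sample` reproduces the first 100 characters shown earlier:

```python
from collections import Counter

sample = ('Python History\n--------------\n\n'
          'This file contains the release messages for previous Python releases.')

# Counter is a dict subclass that does the counting for us.
counts = Counter(sample.split())

print(len(counts))            # number of distinct words
print(counts['Python'])       # count for a single word
print(counts.most_common(1))  # most frequent word and its count
```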

Next, let's create a function that formats a list of value/key pairs for printing:

In [12]:
def get_vk(lst):
    """Get a string from a list of value/key pairs nicely formatted in key/value order."""

    # Find the longest key: remember, the list has value/key pairs, so the key
    # is element [1], not [0]
    longest_key = max([len(word) for count, word in lst])
    # Make a format string out of it
    fmt = '%'+str(longest_key)+'s -> %s'
    # Create the string
    output = ""
    for v,k in lst:
        output = "%s%s\n" % (output, fmt % (k,v))
    return output.rstrip()

This function can be used to print a list like this one:

In [13]:
print(get_vk([(1, "the"), (1, "word"), (1, "is"), (1, "in"), (2, "Python")]))

   the -> 1
  word -> 1
    is -> 1
    in -> 1
Python -> 2
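
The same alignment can also be obtained with `str.format` and a nested width field instead of building a `%` format string. A sketch, where `format_pairs` is a hypothetical name rather than a replacement used elsewhere in this practice:

```python
def format_pairs(pairs):
    """Format value/key pairs as right-aligned 'key -> value' lines."""
    # The key is the second element of each pair.
    width = max(len(word) for count, word in pairs)
    # '{:>{w}}' right-aligns the word in a field that is `w` characters wide.
    lines = ['{:>{w}} -> {}'.format(word, count, w=width) for count, word in pairs]
    return '\n'.join(lines)


formatted = format_pairs([(1, "the"), (1, "word"), (1, "is"), (1, "in"), (2, "Python")])
print(formatted)
```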


The last function in this practice returns a summary of the word frequencies in the text:

In [14]:
def freq_summ(freqs, n=10):
    """Get a simple summary of a word frequencies dictionary.

    Inputs:
      - freqs: a dictionary of word frequencies.

    Optional inputs:
      - n: the number of least and most frequent words to show."""

    words, counts = freqs.keys(), freqs.values()
    # Sort by count
    items = list(zip(counts, words))
    items.sort()

    output = "Number of words: %s\n\n" % len(freqs)
    output = "%s%d least frequent words:\n\n" % (output, n)
    output = "%s%s\n\n" % (output, get_vk(items[:n]))
    output = "%s%d most frequent words:\n\n" % (output, n)
    output = "%s%s\n\n" % (output, get_vk(items[-n:]))
    return output

So, using this function on the first 100 characters of the text:

In [15]:
print(freq_summ(word_freq(text[:100]), 20))

Number of words: 12

20 least frequent words:

-------------- -> 1
       History -> 1
          This -> 1
      contains -> 1
          file -> 1
           for -> 1
      messages -> 1
      previous -> 1
       release -> 1
     releases. -> 1
           the -> 1
        Python -> 2

20 most frequent words:

-------------- -> 1
       History -> 1
          This -> 1
      contains -> 1
          file -> 1
           for -> 1
      messages -> 1
      previous -> 1
       release -> 1
     releases. -> 1
           the -> 1
        Python -> 2




Now, these functions can be combined into a short program that processes the whole text:

In [16]:
text = read_file('HISTORY.gz')
freqs = word_freq(text)
print(freq_summ(freqs, 20))

Number of words: 12253

20 least frequent words:

               !) -> 1
             ""), -> 1
             ""). -> 1
         "#define -> 1
              "%% -> 1
            "%%". -> 1
             "%d" -> 1
             "%x" -> 1
       "'single'" -> 1
 "(?<!abc)(def)". -> 1
         "(None)" -> 1
     "(built-in)" -> 1
    "*noconfig*", -> 1
       "*shared*" -> 1
              "+" -> 1
              "," -> 1
              "-" -> 1
"--with-pymalloc" -> 1
           "-l_r" -> 1
             "-x" -> 1

20 most frequent words:

   are -> 314
    an -> 319
  with -> 320
module -> 354
    it -> 365
    by -> 380
   new -> 382
     * -> 393
  that -> 452
   The -> 581
   now -> 600
   for -> 762
    is -> 926
    of -> 930
    in -> 935
   and -> 1062
     a -> 1294
    to -> 1521
     - -> 1624
   the -> 2461
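
The lists above count `The` and `the` separately and treat `-` and `*` as words, because `word_freq` splits on whitespace only. As an optional refinement (a sketch with a hypothetical `normalized_word_freq` helper; the regular expression is one possible definition of a "word", not the only reasonable one):

```python
import re
from collections import Counter


def normalized_word_freq(text):
    """Count words after lowercasing and dropping punctuation-only tokens."""
    # Keep runs of letters, digits and apostrophes; everything else separates words.
    words = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(words)


sample = 'The module - the new module * is new.'
normalized = normalized_word_freq(sample)
print(sorted(normalized.items()))
```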


