# Week 9 Problem 2

If you are not using the `Assignments` tab on the course JupyterHub server to read this notebook, read [Activating the assignments tab](https://github.com/UI-DataScience/info490-fa16/blob/master/Week2/assignments/README.md).

A few things you should keep in mind when working on assignments:

1. Make sure you fill in any place that says `YOUR CODE HERE`. Do **not** write your answer anywhere other than where it says `YOUR CODE HERE`. Anything you write elsewhere will be removed or overwritten by the autograder.

2. Before you submit your assignment, make sure everything runs as expected. In the menubar, select _Kernel_, then restart the kernel and run all cells (_Restart & Run all_).

3. Do not change the title (i.e. file name) of this notebook.

4. Make sure that you save your work (in the menubar, select _File_ → _Save and Checkpoint_).

5. You are allowed to submit an assignment multiple times, but only the most recent submission will be graded.

In [1]:
from nose.tools import assert_equal
import os
import json
import requests

# Problem 1. IPython notebook as JSON.

IPython/Jupyter notebooks are actually simple JSON documents. You may have noticed this if you are a Mac user, as OS X tries to append .json to the file name when you download an .ipynb file to your hard disk. This is because all information in the notebook is stored as JSON text.

See [The Jupyter Notebook Format](https://ipython.org/ipython-doc/3/notebook/nbformat.html) for details.

In this problem, we will use the `json` module to parse a notebook file and extract some information about the notebook. We will use `intro2ipy.ipynb` from Week 2 as an example.
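To make the "notebooks are JSON" point concrete, here is a minimal sketch that parses a tiny hand-written notebook-like string with `json.loads`. The string `tiny_notebook` is a hypothetical stand-in written for illustration; it is not the contents of `intro2ipy.ipynb`, and a real notebook will contain many more fields.

```python
import json

# A hand-written, minimal notebook-like document (hypothetical example,
# not the contents of intro2ipy.ipynb).
tiny_notebook = '''
{
  "metadata": {"language_info": {"name": "python", "version": "3.4.0"}},
  "nbformat": 4,
  "nbformat_minor": 0,
  "cells": [
    {"cell_type": "markdown", "metadata": {}, "source": "# Title"},
    {"cell_type": "code", "metadata": {}, "source": "print(1)",
     "execution_count": null, "outputs": []}
  ]
}
'''

# json.loads parses a JSON string; json.load does the same for a file object.
parsed = json.loads(tiny_notebook)

print(type(parsed).__name__)             # dict
print(type(parsed['cells']).__name__)    # list
print(parsed['cells'][0]['cell_type'])   # markdown
```

JSON objects become Python dictionaries and JSON arrays become lists, so once the file is parsed, everything in the notebook can be reached with ordinary indexing.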

In [2]:
r = requests.get('https://raw.githubusercontent.com/UI-DataScience/info490-fa15/master/Week2/notebooks/intro2ipy.ipynb')

with open('intro2ipy.ipynb', 'w') as f:
    f.write(r.text)

## Function: get_keys()
- Write a function named get_keys() that takes a file name (str) and returns the keys (list of strings) of the dictionary at the [top level](https://ipython.org/ipython-doc/3/notebook/nbformat.html#top-level-structure).

In other words, the function opens the JSON file, reads the file as a Python dictionary, and returns the keys of that dictionary. The dictionary you get from reading the JSON file is nested (there's a dictionary inside a dictionary inside a dictionary), and you should return the keys of the outermost (the top-level) dictionary.
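The difference between top-level and nested keys can be sketched with a small in-memory document. The names `doc`, `outer1`, `outer2`, and `inner` below are made up for illustration and do not come from any real notebook.

```python
import json

# Hypothetical nested document used only for illustration.
doc = json.loads('{"outer1": {"inner": {"innermost": 1}}, "outer2": 2}')

# Iterating over (or sorting) a dictionary only visits its own keys,
# never the keys of dictionaries nested inside it.
top_level = sorted(doc)
one_level_down = sorted(doc['outer1'])

print(top_level)        # ['outer1', 'outer2']
print(one_level_down)   # ['inner']
```

Sorting is used here only to make the printed order predictable; the order of keys in a parsed dictionary is not something the assignment relies on.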

In [3]:
def get_keys(filename):
    '''
    Takes the file name (str) of a JSON text file and returns the keys of the top-level dictionary.
    
    Parameters
    ----------
    filename (str): a JSON file.
    
    Returns
    -------
    A list of strings.
    '''
    # YOUR CODE HERE
    # Open the file and parse its contents as JSON
    with open(filename, 'r') as jsonfile:
        data = json.load(jsonfile)
    # Iterating over a dictionary yields its top-level keys
    return list(data)

In [4]:
print(get_keys('intro2ipy.ipynb'))

['nbformat_minor', 'metadata', 'cells', 'nbformat']


In [5]:
test1 = get_keys('intro2ipy.ipynb')
answer1 = ['cells', 'nbformat_minor', 'metadata', 'nbformat']

assert_equal(len(test1), len(answer1))
assert_equal(set(test1), set(answer1))

test2 = {
    'A': 1,
    'B': {'C': 2, 'D': 3},
    'C': {
        'E': {'F': 4},
        'G': {'H': 5, 'I': 6}
    }
}

answer2 = ['A', 'B', 'C']

with open('test.json', 'w') as f:
    json.dump(test2, f)

assert_equal(len(get_keys('test.json')), len(answer2))
assert_equal(set(get_keys('test.json')), set(answer2))

os.remove('test.json')

## Function: get_version()
Version information is always important. For example, the course docker image runs Python 3.4, while the latest version of Python 3 is version 3.5, so new features in Python 3.5 may or may not work in 3.4. Or, if you try to run a Python 3 notebook using a Python 2 kernel, it will throw errors. Furthermore, IPython notebooks themselves have different format versions. We are using "nbformat" 4, and if you try to open this notebook in an older version of IPython that only recognizes nbformat 3 or lower, it won't run.

- Write a function named get_version() that takes a file name (str) and returns a tuple of ("the programming language of the kernel", "the version of the language", nbformat). The data type of this tuple is (str, str, int).
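The three values live at different depths of the parsed dictionary: the language name and version sit under `metadata` → `language_info`, while `nbformat` is a top-level key. The sketch below shows one way to read them defensively, assuming a notebook may lack the `language_info` block. The helper `read_version_info` and the sample dictionaries are hypothetical; the direct indexing used in the assignment is sufficient for the test files.

```python
def read_version_info(notebook):
    """Hypothetical helper: takes an already-parsed notebook dictionary."""
    # dict.get returns the default instead of raising KeyError
    # when a key is missing.
    language_info = notebook.get('metadata', {}).get('language_info', {})
    return (
        language_info.get('name'),
        language_info.get('version'),
        notebook.get('nbformat'),
    )

# Hypothetical sample data, not read from a real file.
complete = {
    'metadata': {'language_info': {'name': 'python', 'version': '3.4.0'}},
    'nbformat': 4,
}
incomplete = {'metadata': {}, 'nbformat': 4}

print(read_version_info(complete))     # ('python', '3.4.0', 4)
print(read_version_info(incomplete))   # (None, None, 4)
```

Chained indexing such as `data['metadata']['language_info']['name']` raises `KeyError` on the second sample, which may be preferable when a missing field should be treated as an error.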

In [6]:
def get_version(filename):
    '''
    Takes a file name (str) of a JSON file.
    Returns a tuple of ("the programming language of the kernel", "the version of the language", nbformat).
    
    Parameters
    ----------
    filename (str): a JSON file.
    
    Returns
    -------
    A tuple of (str, str, int)
    '''
    # YOUR CODE HERE
    # Open the file
    with open(filename, 'r') as jsonfile:
        # Read the data
        data = json.load(jsonfile)
        # Get the programming language of the kernel
        name = data['metadata']['language_info']['name']
        # Get the version of the language
        version = data['metadata']['language_info']['version']
        # Get nbformat
        nbformat = data['nbformat']
    
    return name, version, nbformat

In [7]:
print(get_version('intro2ipy.ipynb'))

('python', '3.4.0', 4)


In [8]:
test1 = get_version('intro2ipy.ipynb')
answer1 = ('python', '3.4.0', 4)

assert_equal(test1, answer1)

test2 = {
  "metadata" : {
    "signature": "hex-digest", # used for authenticating unsafe outputs on load
    "kernel_info": {
        # if kernel_info is defined, its name field is required.
        "name" : "the name of the kernel"
    },
    "language_info": {
        # if language_info is defined, its name field is required.
        "name" : "the programming language of the kernel",
        "version": "the version of the language",
        "codemirror_mode": "The name of the codemirror mode to use [optional]"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0,
  "cells" : [
      # list of cell dictionaries, see below
  ],
}

answer2 = ("the programming language of the kernel", "the version of the language", 4)

with open('test.json', 'w') as f:
    json.dump(test2, f)
    
assert_equal(get_version('test.json'), answer2)

os.remove('test.json')

## Function: count_code_cells()
- Write a function named count_code_cells() that takes a filename (str) and a cell type (str), and returns the number of cells of that type (int).
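Counting cells of one type amounts to tallying the `cell_type` field across the `cells` list. As an alternative to an explicit loop with a counter variable, the sketch below uses `collections.Counter` from the standard library on a hypothetical, hand-written list of cells.

```python
from collections import Counter

# Hypothetical list of cells, shaped like the "cells" entry of a notebook.
cells = [
    {'cell_type': 'code'},
    {'cell_type': 'markdown'},
    {'cell_type': 'markdown'},
]

# Tally every cell type in a single pass.
counts = Counter(cell['cell_type'] for cell in cells)

print(counts['markdown'])   # 2
print(counts['code'])       # 1
print(counts['raw'])        # 0
```

A `Counter` returns 0 for keys it has never seen, so asking for a type that does not occur needs no special handling.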

In [9]:
def count_code_cells(filename, cell_type):
    '''
    Takes a filename and a cell type, and returns the number of cells of that type.
    
    Parameters
    ----------
    filename (str): a JSON file.
    cell_type (str): "code", "markdown", etc.
    
    Returns
    -------
    An int.
    '''
    # YOUR CODE HERE
    count = 0
    # Open the file
    with open(filename, 'r') as jsonfile:
        # Read the data
        data = json.load(jsonfile)
        # Get cell
        for cell in data['cells']:
            # Count the specific cell_type
            if cell["cell_type"] == cell_type:
                count += 1

    return count

In [10]:
n_code = count_code_cells('intro2ipy.ipynb', 'code')
n_markdown = count_code_cells('intro2ipy.ipynb', 'markdown')

print('There are {} code cells and {} markdown cells.'.format(n_code, n_markdown))

assert_equal(count_code_cells('intro2ipy.ipynb', 'code'), 5)
assert_equal(count_code_cells('intro2ipy.ipynb', 'markdown'), 16)

test = {
  "cells" : [
    {
      "cell_type" : "type1",
      "metadata" : {},
      "source" : "single string or [list, of, strings]",
      },
    {
      "cell_type" : "type1",
      "metadata" : {},
      "source" : "single string or [list, of, strings]",
      },
    {
      "cell_type" : "type2",
      "metadata" : {},
      "source" : "single string or [list, of, strings]",
      }
  ],
} 

with open('test.json', 'w') as f:
    json.dump(test, f)

assert_equal(count_code_cells('test.json', 'type1'), 2)
assert_equal(count_code_cells('test.json', 'type2'), 1)
assert_equal(count_code_cells('test.json', 'type3'), 0)

os.remove('test.json')

There are 5 code cells and 16 markdown cells.
