## Assignment 2: CLI, Openpyxl and modules

This assignment has you work with files, the command line, and objects.

### How to hand in
The assignment is expected to be published on GitHub, but the actual hand-in should be a MyBinder link. See the notebook `12-Assignments` if you don't know what that means.
  


## Part 1: Download script

Write a program `download_script.py` that downloads a set of files from the internet. The files to download are given as command-line arguments to your program, as illustrated below:

  ```bash
  $ python download_script.py http://www.gutenberg.org/files/2701/2701-0.txt http://www.gutenberg.org/cache/epub/27525/pg27525.txt
  Downloading file to ./2701-0.txt
  Downloading file to ./pg27525.txt
  ```

Reuse your `webget` module from the exercises in 07-Functions and Modules.

In [None]:
import urllib.request as req
from urllib.parse import urlparse


def download(urls):
    """Download every URL in `urls` into the current directory."""
    for url in urls:
        # use the last path segment of the URL as the local file name
        to = urlparse(url).path.split("/")[-1]
        # announce the target before the (possibly slow) download starts
        print("Downloading file to ./", to, sep="")
        req.urlretrieve(url, to)

In [None]:
import webget
import sys

urls = sys.argv[1:]
webget.download(urls)

In [5]:
!python download_script.py http://www.gutenberg.org/files/2701/2701-0.txt http://www.gutenberg.org/cache/epub/27525/pg27525.txt

Downloading file to ./2701-0.txt
Downloading file to ./pg27525.txt
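
The file name used by `download` comes from the last path segment of each URL. That step can be checked without network access. The following is a minimal sketch: `filename_from_url` and its `index.html` fallback are hypothetical names chosen for illustration and are not part of the `webget` module shown here.

```python
from urllib.parse import urlparse


def filename_from_url(url, default="index.html"):
    """Return the last path segment of `url`, or `default` if there is none."""
    # urlparse separates the path from the query string and fragment,
    # so "?x=1" or "#top" never end up in the file name.
    path = urlparse(url).path
    name = path.rsplit("/", 1)[-1]
    # A URL ending in "/" (or with no path at all) has an empty last segment.
    return name or default


if __name__ == "__main__":
    for url in (
        "http://www.gutenberg.org/files/2701/2701-0.txt",
        "http://www.gutenberg.org/cache/epub/27525/pg27525.txt",
    ):
        print("Downloading file to ./" + filename_from_url(url))
```

Keeping the name logic in its own function means a test can cover awkward URLs (trailing slash, query string) without downloading anything.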


## Part 2: Reading .xlsx files


Write a program that converts the Excel spreadsheet `./iris_data.xlsx` into a CSV file with the same data. 
 * Start by writing a unit test against which you implement your solution. You are welcome to use a framework for this, but it can also simply be a function that you call with an expected outcome.
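
One way to follow the test-first suggestion is a plain check function that is called with an expected outcome. The sketch below uses only the standard library and a few illustrative sample rows; `write_rows_as_csv` and `check_csv_matches` are hypothetical helper names. In an actual solution the rows would come from the worksheet (for example via openpyxl's `iter_rows(values_only=True)`) rather than from a literal list.

```python
import csv
import os
import tempfile


def write_rows_as_csv(rows, path):
    """Write an iterable of row tuples to `path` as comma-separated values."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="") as csv_file:
        csv.writer(csv_file).writerows(rows)


def check_csv_matches(rows, path):
    """Return True if the CSV at `path` holds exactly `rows` (compared as strings)."""
    with open(path, newline="") as csv_file:
        written = list(csv.reader(csv_file))
    # csv.reader yields strings, so the expected values are converted too.
    expected = [[str(value) for value in row] for row in rows]
    return written == expected


# Illustrative sample data standing in for the spreadsheet contents.
sample_rows = [
    ("Sepal length", "Species"),
    (5.1, "I. setosa"),
    (7.0, "I. versicolor"),
]

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, "output.csv")
    write_rows_as_csv(sample_rows, out_path)
    result = check_csv_matches(sample_rows, out_path)

assert result, "CSV contents differ from the source rows"
```

Comparing as strings keeps the check simple; a stricter test could convert the numeric columns back before comparing.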


In [None]:
import csv
import openpyxl

filename = 'iris_data.xlsx'
# created from: https://en.wikipedia.org/wiki/Iris_flower_data_set#Data_set
sheet_name = "Fisher's Iris Data"


def csv_from_excel():
    # openpyxl reads .xlsx directly; xlrd 2.0 and later only reads the old .xls format
    wb = openpyxl.load_workbook(filename)
    sh = wb[sheet_name]
    # newline='' lets the csv module handle line endings itself;
    # the default comma delimiter makes the output a real CSV file
    with open('output.csv', 'w', newline='') as csv_file:
        wr = csv.writer(csv_file)
        for row in sh.iter_rows(values_only=True):
            wr.writerow(row)

    print("excel to csv has been completed for " + filename)


def test_excel_file():
    # dumps every cell to the console for a visual check
    wb = openpyxl.load_workbook(filename)
    sheet = wb[sheet_name]
    for rowOfCellObjects in sheet['A1':'E151']:
        for cellObj in rowOfCellObjects:
            print(cellObj.coordinate, cellObj.value)
        print('---------')


# runs the csv_from_excel function:
csv_from_excel()
#test_excel_file() #run test to spam data into console

In [6]:
!python excel_to_csv.py

excel to csv has been completed for iris_data.xlsx


## Part 3: Creating a module with data

Write a function that reads the `befkbhalderstatkode.csv` file from this url: `'http://data.kk.dk/dataset/76ecf368-bf2d-46a2-bcf8-adaf37662528/resource/9286af17-f74e-46c9-a428-9fb707542189/download/befkbhalderstatkode.csv'`

The function should return the following `STATISTICS` dictionary:

  ```python

  STATISTICS = {
      2015: {
          1: {
              0: {
                 5100: 614,
                 5104: 2,
                 5106: 1,
                 ...
              },
              1: {
                  5100: 485,
                  5110: 1,
                  5115: 1,
                  ...
              },
              2: {
                  ...
              },
              ...
          },
          2: {
              ...
          },
          3: {
              ...
          },
          ...
      },
      2014: {
          ...
      },
      ...
  }
  ```
  To be sure that the code is complete and correct, start by writing a **unit test** that iterates over the CSV data and checks that the corresponding data exists in the dictionary. Here is an example:
  
  ```python
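  import csv
  import kkdata

  f = './befkbhalderstatkode.csv'

  with open(f, newline='') as csv_file:
      reader = csv.reader(csv_file)
      header_row = next(reader)  # skip the column names
      for row in reader:
          # the CSV yields strings, while the dictionary uses integers
          row = [int(value) for value in row]
          assert kkdata.STATISTICS[row[0]][row[1]][row[2]][row[3]] == row[4]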
  ```
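
The same check can be tried out without the downloaded file. The following is a minimal sketch on a few in-memory rows: `build_statistics` is a hypothetical helper, the header names in the sample are placeholders rather than the real column names of the data set, and the numbers are sample values borrowed from the example dictionary above.

```python
import csv
import io


def build_statistics(rows):
    """Nest rows of five integers as {k1: {k2: {k3: {k4: value}}}}."""
    statistics = {}
    for k1, k2, k3, k4, value in rows:
        # setdefault creates each missing level on the way down.
        statistics.setdefault(k1, {}).setdefault(k2, {}).setdefault(k3, {})[k4] = value
    return statistics


# Placeholder header plus three sample rows.
SAMPLE_CSV = """col_a,col_b,col_c,col_d,col_e
2015,1,0,5100,614
2015,1,0,5104,2
2015,1,1,5100,485
"""

reader = csv.reader(io.StringIO(SAMPLE_CSV))
next(reader)  # skip the header row
rows = [[int(field) for field in row] for row in reader]

STATISTICS = build_statistics(rows)

# Every CSV row must be found again in the nested dictionary.
for k1, k2, k3, k4, value in rows:
    assert STATISTICS[k1][k2][k3][k4] == value
```

Because the loop walks the CSV rows rather than the dictionary, it catches rows that were dropped or overwritten while the dictionary was built.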

In [None]:
import webget
import pprint

urllist = ["http://data.kk.dk/dataset/76ecf368-bf2d-46a2-bcf8-adaf37662528/resource/9286af17-f74e-46c9-a428-9fb707542189/download/befkbhalderstatkode.csv"]
webget.download(urllist)


def read_statistics(path):
    """Read the CSV file at `path` and return the nested STATISTICS dictionary."""
    statistics = {}
    with open(path) as file_object:
        # skip the first line (the headers)
        next(file_object)
        for line in file_object:
            # every column holds an integer; the first four are the nested keys
            k1, k2, k3, k4, value = (int(field) for field in line.strip().split(","))
            # setdefault creates each missing level before the value is stored
            statistics.setdefault(k1, {}).setdefault(k2, {}).setdefault(k3, {})[k4] = value
    return statistics


STATISTICS = read_statistics('befkbhalderstatkode.csv')
pprint.pprint(STATISTICS[2015][1])

In [8]:
!python kbh.py

Downloading file to ./befkbhalderstatkode.csv
{0: {5100: 614,
     5104: 2,
     5106: 1,
     5110: 1,
     5120: 4,
     5126: 1,
     5130: 5,
     5140: 3,
     5150: 5,
     5154: 1,
     5164: 3,
     5170: 3,
     5180: 3,
     5228: 1,
     5306: 2,
     5390: 1,
     5448: 1,
     5464: 1,
     5472: 1,
     5502: 1,
     5704: 1,
     5752: 1},
 1: {5100: 540,
     5104: 3,
     5106: 2,
     5110: 3,
     5120: 4,
     5130: 3,
     5140: 1,
     5142: 1,
     5150: 2,
     5154: 1,
     5156: 3,
     5164: 2,
     5170: 4,
     5180: 3,
     5390: 3,
     5432: 1,
     5448: 2,
     5462: 1,
     5472: 1,
     5502: 3,
     5700: 1,
     5704: 1,
     5750: 1,
     5756: 1},
 2: {5100: 485,
     5110: 1,
     5115: 1,
     5120: 3,
     5130: 4,
     5140: 1,
     5142: 1,
     5150: 3,
     5156: 1,
     5164: 3,
     5170: 4,
     5180: 5,
     5288: 1,
     5306: 1,
     5314: 2,
     5318: 1,
     5390: 5,
     5432: 1,
     5611: 1,
     5700: 1},
 3: {5100: 469,
     ...
      5134: 1,
      5140: 2,
      5142: 1,
      5150: 11,
      5154: 4,
      5156: 1,
      5158: 3,
      5160: 1,
      5162: 1,
      5164: 7,
      5170: 9,
      5174: 1,
      5180: 12,
      5182: 1,
      5228: 1,
      5306: 1,
      5314: 2,
      5326: 1,
      5328: 1,
      5356: 1,
      5390: 10,
      5392: 2,
      5432: 2,
      5436: 1,
      5438: 2,
      5442: 3,
      5444: 4,
      5456: 1,
      5458: 2,
      5474: 1,
      5484: 1,
      5492: 1,
      5502: 2,
      5700: 1,
      5752: 1,
      5778: 1},
 43: {5100: 605,
      5104: 2,
      5106: 2,
      5110: 8,
      5120: 11,
      5128: 1,
      5130: 4,
      5134: 1,
      5140: 1,
      5142: 2,
      5150: 8,
      5154: 6,
      5156: 1,
      5158: 6,
      5160: 3,
      5164: 5,
      5170: 12,
      5172: 1,
      5180: 12,
      5262: 1,
      5306: 2,
      5314: 2,
      5324: 2,
      5352: 1,
      5354: 2,
      5390: 8,
      5432: 1,
      5438: 1,
      5444: 1,
      5448: 1,
 