## Assignment 2: CLI, Openpyxl and modules
## Group: Good Awareness

This assignment will make you work with files, the command line, and objects.

### How to hand in
The assignment is expected to be published on GitHub, but the actual hand-in should be a link to a MyBinder. See the notebook `12-Assignments` if you don't know what that means.
  


## Part 1: Download script

Write a program `download_script.py`, which downloads a set of files from the internet. The files to download are given as arguments to your program on the command-line as illustrated in the following:

  ```bash
  $ python download_script.py http://www.gutenberg.org/files/2701/2701-0.txt http://www.gutenberg.org/cache/epub/27525/pg27525.txt
  Downloading file to ./2701-0.txt
  Downloading file to ./pg27525.txt
  ```

Reuse your `webget` module from the exercises in 07-Functions and Modules.

In [None]:
# webget.py
import urllib.request as req
from urllib.parse import urlparse


def download(urls):
    """Download each URL in urls to the current directory.

    The local file name is the last segment of the URL path.
    """
    for url in urls:
        # using urlparse to split the url into components and take the path
        path = urlparse(url).path

        # the last segment of the path is the name of the file to be saved
        to = path.split("/")[-1]

        # announces the target, then saves the file
        print("Downloading file to ./", to, sep="")
        req.urlretrieve(url, to)



In [None]:
# download_script.py
import sys

import webget

# every command-line argument after the script name is a url to download
urls = sys.argv[1:]
webget.download(urls)


In [None]:
!python download_script.py http://www.gutenberg.org/files/2701/2701-0.txt http://www.gutenberg.org/cache/epub/27525/pg27525.txt
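
The local file name in the output above is the last segment of the URL path. That step can be pulled out into its own helper so it can be tested without network access. The following is a minimal sketch: the name `filename_from_url` and the fallback name `download.bin` are hypothetical choices and not part of the assignment or of `webget`.

```python
import posixpath
from urllib.parse import urlparse


def filename_from_url(url, default="download.bin"):
    """Return the last segment of the URL path, or default if it is empty.

    Both the helper name and the default value are illustrative assumptions.
    """
    # URL paths always use "/", so posixpath is used instead of os.path
    name = posixpath.basename(urlparse(url).path)
    return name or default


# the two URLs from the example command line
assert filename_from_url("http://www.gutenberg.org/files/2701/2701-0.txt") == "2701-0.txt"
assert filename_from_url("http://www.gutenberg.org/cache/epub/27525/pg27525.txt") == "pg27525.txt"
```

A URL whose path ends in `/` has no last segment, which is the case the fallback covers.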

## Part 2: Reading .xlsx files


Write a program that converts the Excel spreadsheet `./iris_data.xlsx` into a CSV file with the same data. 
 * Start by writing a unit test against which you implement your solution. You are welcome to use a framework for this, but it can also simply be a function that you call with an expected outcome.
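
One possible way to follow the test-first advice is to keep the CSV-writing step in a function that takes plain rows, so the test needs neither the spreadsheet nor `openpyxl`. The sketch below works under that assumption. The function name `rows_to_csv` and the sample rows are hypothetical; the rows only mimic the shape of the data and are not taken from `iris_data.xlsx`.

```python
import csv
import os
import tempfile


def rows_to_csv(rows, path):
    """Write an iterable of rows to path as comma-separated values."""
    # newline="" lets the csv module control the line endings
    with open(path, "w", newline="") as csv_file:
        csv.writer(csv_file).writerows(rows)


def test_rows_to_csv():
    # made-up sample rows, used only to exercise the function
    rows = [["sepal_length", "species"], [5.1, "setosa"], [7.0, "versicolor"]]
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "out.csv")
        rows_to_csv(rows, path)
        with open(path, newline="") as csv_file:
            written = list(csv.reader(csv_file))
    # csv.reader returns every field as a string
    assert written == [[str(value) for value in row] for row in rows]


test_rows_to_csv()
```

In the real program the rows would come from the worksheet, for example from `iter_rows(values_only=True)` on an `openpyxl` worksheet.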


In [None]:
# excel_to_csv.py
import csv

import openpyxl

# created from: https://en.wikipedia.org/wiki/Iris_flower_data_set#Data_set
filename = 'iris_data.xlsx'
sheet_name = "Fisher's Iris Data"


def csv_from_excel():
    # openpyxl reads .xlsx files; xlrd 2.0 and later only supports .xls
    wb = openpyxl.load_workbook(filename)

    # retrieves the sheet to be worked on
    sh = wb[sheet_name]

    # creates the output file; newline='' lets the csv module control line endings
    with open('output.csv', 'w', newline='') as csv_file:
        # creates writer-object; fields are separated by tabs
        wr = csv.writer(csv_file, delimiter='\t', quotechar='|')

        # writes every worksheet row as one line of cell values
        for row in sh.iter_rows(values_only=True):
            wr.writerow(row)

    print("excel to csv has been completed for " + filename)


# prints every cell in A1:E151 to the console for a manual check
def test_excel_file():
    wb = openpyxl.load_workbook(filename)
    sheet = wb[sheet_name]
    for rowOfCellObjects in sheet['A1':'E151']:
        for cellObj in rowOfCellObjects:
            print(cellObj.coordinate, cellObj.value)
        print('---------')


if __name__ == "__main__":
    # runs the csv_from_excel function:
    csv_from_excel()
    # to run the manual check, call test_excel_file() here

In [None]:
!python excel_to_csv.py

## Part 3: Creating a module with data

Write a function that reads the `befkbhalderstatkode.csv` file from this url: `'http://data.kk.dk/dataset/76ecf368-bf2d-46a2-bcf8-adaf37662528/resource/9286af17-f74e-46c9-a428-9fb707542189/download/befkbhalderstatkode.csv'`

The function should return the following `STATISTICS` dictionary:

  ```python

  STATISTICS = {
      2015: {
          1: {
              0: {
                 5100: 614,
                 5104: 2,
                 5106: 1,
                 ...
              },
              1: {
                  5100: 485,
                  5110: 1,
                  5115: 1,
                  ...
              },
              2: {
                  ...
              },
              ...
          },
          2: {
              ...
          },
          3: {
              ...
          },
          ...
      },
      2014: {
          ...
      },
      ...
  }
  ```
  To be sure that the code is complete and correct, start by writing a **unit test** that iterates over the CSV data and checks that the corresponding data exists in the dictionary. Here is an example:
  
  ```python
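  import csv

  import kkdata

  f = './befkbhalderstatkode.csv'

  # csv.reader needs an open file object, not a file name
  with open(f, newline='') as csv_file:
      reader = csv.reader(csv_file)
      header_row = next(reader)
      for row in reader:
          # the CSV fields are strings, the dictionary keys and values are ints
          row = [int(field) for field in row]
          assert kkdata.STATISTICS[row[0]][row[1]][row[2]][row[3]] == row[4]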
  ```
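
The nested structure can also be built by a function that accepts any iterable of CSV lines, so that it can be checked against a small in-memory sample instead of the downloaded file. The following is a sketch, assuming five integer columns per row. The name `build_statistics` is hypothetical, the header names in the sample are placeholders, and the data rows reuse values from the `STATISTICS` example.

```python
import csv
import io


def build_statistics(lines):
    """Build a four-level nested dict from rows of five integer columns."""
    reader = csv.reader(lines)
    next(reader)  # skip the header row
    statistics = {}
    for row in reader:
        k1, k2, k3, k4, value = (int(field) for field in row)
        # setdefault creates each missing level on first use
        level = statistics.setdefault(k1, {}).setdefault(k2, {}).setdefault(k3, {})
        level[k4] = value
    return statistics


# placeholder header; data rows reuse values from the STATISTICS example
sample = io.StringIO(
    "c1,c2,c3,c4,c5\n"
    "2015,1,0,5100,614\n"
    "2015,1,0,5104,2\n"
    "2015,1,1,5100,485\n"
)
stats = build_statistics(sample)
assert stats[2015][1][0][5104] == 2
```

Because an open file is also an iterable of lines, the same function could be called with the result of `open('befkbhalderstatkode.csv', newline='')`.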

In [None]:
# kbh.py
import pprint

import webget

urllist = ["http://data.kk.dk/dataset/76ecf368-bf2d-46a2-bcf8-adaf37662528/resource/9286af17-f74e-46c9-a428-9fb707542189/download/befkbhalderstatkode.csv"]
webget.download(urllist)

STATISTICS = {}

with open('befkbhalderstatkode.csv') as file_object:
    # skipping the first line (the headers)
    next(file_object)

    for line in file_object:
        # converting one line of text to a list of ints
        item = [int(field) for field in line.strip().split(",")]

        # creating the nested dictionaries on first use, then storing the value
        level = STATISTICS.setdefault(item[0], {}).setdefault(item[1], {}).setdefault(item[2], {})
        level[item[3]] = item[4]

# printing only when run as a script, not when imported by the test
if __name__ == "__main__":
    pprint.pprint(STATISTICS[2015][1])


In [None]:
# testingkbh.py
import kbh

filename = './befkbhalderstatkode.csv'

# testing kbh.py
with open(filename) as file:
    # making a list of strings with one entry per line of the file
    linesarray = [line.strip() for line in file]

# the line at index 3 is expected to be the row for 2015, 1, 0, 5106;
# its last field is the value the dictionary should hold
expected = int(linesarray[3].split(",")[-1])
assert kbh.STATISTICS[2015][1][0][5106] == expected
print("Test complete.")

In [None]:
!python kbh.py

In [None]:
!python testingkbh.py