# <center>Week 1 Assignment</center>
In this week's FTE, we examined CSV and JSON file formats. In the process, we wrote code to manually convert a specific CSV file to a specific JSON file.

In [1]:
import csv
import json

# Read CSV file. This wouldn't work well for very large files
with open('data/scientists.csv') as f:
    reader = csv.DictReader(f)
    rows = list(reader)
    
# Write JSON file to disk
with open('data/scientists.json', 'w') as f:
    json.dump(rows, f)

FileNotFoundError: [Errno 2] No such file or directory: 'data/scientists.csv'

## Part 1

The comment above the CSV section makes an assumption and says it wouldn't work for large files. Use the following articles to understand the terms **eager evaluation** and **lazy evaluation**:
* https://en.wikipedia.org/wiki/Eager_evaluation
* https://en.wikipedia.org/wiki/Lazy_evaluation

Now answer the following questions:

1. In light of the two definitions above, what is the assumption made by that comment?
2. Explain why that assumption is right or wrong.
3. Is it safe to use the code above on large files?

Eager evaluation fully computes values before they are required. It is also called greedy or strict evaluation: an expression is evaluated as soon as it is bound to a variable, whether or not the result is ever used. Imperative languages are generally eagerly evaluated.

**Pros**
* More Control
* More Transparent

**Cons**
* More responsibility for the developer
* Potential overhead: more memory intensive

Lazy evaluation only computes a value when it is needed. The assumption made by that comment is that the entire file will be read directly into memory and not used immediately.

Advantages of lazy evaluation:
* It allows the language runtime to discard sub-expressions that are not directly linked to the final result of the expression
* It allows faster computations, since only the needed data is accessed
* It allows the programmer to access components of data structures out of order after initializing them
* Great for data that is not frequently accessed
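To make the difference concrete, here is a minimal Python sketch. The `squares` generator and the `computed` list are hypothetical names used only for illustration; the list records which values have actually been calculated so far.

```python
computed = []  # records which inputs have actually been processed


def squares(limit):
    """Yield squares one at a time (lazy): nothing runs until a value is requested."""
    for n in range(limit):
        computed.append(n)
        yield n * n


# Eager: the list comprehension computes every value up front.
eager = [n * n for n in range(5)]

# Lazy: creating the generator computes nothing yet.
lazy = squares(5)
before = list(computed)  # still empty at this point

first = next(lazy)       # computes only the first square
after = list(computed)   # only n = 0 has been processed
```

The eager list holds all five results in memory at once, while the generator has done just one step of work after the first `next` call.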

Whether evaluation is eager or lazy really depends on the language you are working in.


1. In light of the two definitions above, what is the assumption made by that comment?
* The assumption is that the entire file is read into memory at once (eagerly), so a large file will run into memory issues. A better approach is to read the file in chunks, or to process the data while reading it in.
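One way to check where the eager step happens is to look at the reader on its own. The sketch below uses an in-memory `io.StringIO` buffer with made-up sample rows as a stand-in for the CSV file. It suggests that `csv.DictReader` hands out rows one at a time, and that it is the `list(...)` call that pulls everything into memory.

```python
import csv
import io

# Hypothetical in-memory stand-in for a CSV file on disk.
sample = io.StringIO(
    "name,field\nCurie,physics\nDarwin,biology\nTuring,computing\n"
)

reader = csv.DictReader(sample)

# Lazy: ask for one row, and only the header and that row are parsed.
first_row = next(reader)

# Eager: list() drains the rest of the reader into memory in one go.
remaining = list(reader)
```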


3. Is it safe to use the code above on large files?
    * It really depends on the system you are working on and how large the file is. There are better ways to get a file from CSV to JSON, such as calling json.dump while reading the file in (processing the data as it is read).
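The "process the data while reading it in" idea could look something like the sketch below. The function name `stream_csv_to_json` is hypothetical, and this is only one possible approach: it writes the JSON array brackets and separators by hand so that only one row is held in memory at a time.

```python
import csv
import json


def stream_csv_to_json(csv_path, json_path):
    """Convert a CSV file to a JSON array one row at a time (hypothetical helper)."""
    with open(csv_path, newline='') as src, open(json_path, 'w') as dst:
        dst.write('[')
        for index, row in enumerate(csv.DictReader(src)):
            if index:
                dst.write(',\n')  # separator goes before every row except the first
            json.dump(row, dst)   # serialize just this one row
        dst.write(']')
```

The trade-off is that the output is assembled piece by piece, so the code itself is responsible for keeping the result valid JSON.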

## Part 2

Programmers hate code written for a specific case: "I don't care if it can solve one special case, I want it to solve *all* cases." This generalization process is called **"abstraction"**.

1. Generalize the CSV->JSON code above into a function that can work for any CSV file and any JSON file (within reason).

Being able to convert in only one direction is only helpful half the time. Specious math aside, 

2. Write a generalized JSON->CSV converter function.

3. Use the functions to do a "round-trip" (CSV->JSON->CSV or JSON->CSV->JSON) on the Consumer Complaint Database data found at https://catalog.data.gov/dataset/consumer-complaint-database#topic=consumer_navigation

When you are done with the two functions, the original and the round-trip files should be reasonably identical.

**Hint:** Mac and Linux have a command line tool called 'diff' that will show differences between two files. Windows users can use the 'fc' command on the command line. See this answer on StackOverflow for alternatives: https://stackoverflow.com/questions/6877238/what-is-the-windows-equivalent-of-the-diff-command
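If a command line tool is not convenient, a similar comparison can be sketched in Python with the standard library's `difflib` module. The helper name `show_diff` is hypothetical. Note that it reads both files fully into memory, so it is only a reasonable fit for files that are not too large.

```python
import difflib


def show_diff(path_a, path_b):
    """Return unified-diff lines between two text files (hypothetical helper)."""
    with open(path_a) as a, open(path_b) as b:
        return list(difflib.unified_diff(
            a.readlines(), b.readlines(), fromfile=path_a, tofile=path_b))
```

An empty list means the two files have identical lines. Otherwise, `print(''.join(show_diff(first_path, second_path)))` shows the differences in the same style as `diff -u`.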

**Also: No using libraries like Pandas that will automatically do the conversion!!** The purpose of this exercise is for you to get a fairly deep understanding of the two formats and using a converter will not fulfill that purpose. 

I might have gone overboard on this section of the assignment.

In [3]:

import csv 
import json
import re

def jsonToCsv(jsonFile, output):
    '''Convert a JSON file holding a list of flat objects into a CSV file.'''
    with open(jsonFile) as inputFile:  # open json file
        data = json.load(inputFile)  # load json content
    # newline='' lets the csv module control line endings (avoids blank rows on Windows)
    with open(output, 'w', newline='') as outputFile:
        writer = csv.writer(outputFile)  # create a csv writer
        writer.writerow(data[0].keys())  # header row, taken from the first object
        for row in data:
            writer.writerow(row.values())  # values row
        
def csvToJson(csvFile, output):
    '''Write each CSV row to the output file as a JSON object.

    The objects are written back to back with no separators, so the result
    is not valid JSON until jsonRegex adds the brackets and commas.
    '''
    with open(csvFile, newline='') as inputFile, open(output, 'w') as outputFile:
        reader = csv.reader(inputFile)  # handles quoted fields that contain commas
        header = next(reader)  # first row holds the column names
        for data_line in reader:
            data_dict = dict(zip(header, data_line))
            json.dump(data_dict, outputFile)
        
def jsonRegex(infile, output):
    '''
    jsonRegex(infile, output) expects a file produced by the csvToJson function.
    It writes a [ at the beginning of the output, inserts a comma between
    adjacent objects (}{ becomes },{), and writes the closing bracket ] at the end.
    If the input ends with a newline, the ] lands on its own line; moving it to
    the end of the last entry is still in progress.
    Note: the replacement also changes any field value that contains }{.
    '''
    with open(infile, 'r') as inputfile, open(output, 'w') as outputfile:
        outputfile.write("[")
        for x in inputfile:
            outputfile.write(x.replace("}{", "},{"))
        outputfile.write("]")

In [4]:
jsonToCsv("scientists.json","json2csv.tst")
csvToJson("json2csv.tst","csv2json.tst")
jsonRegex("csv2json.tst", "csv2json2.tst")
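One thing to keep in mind when checking a round trip: CSV stores every value as text, so numbers in the original JSON come back as strings. The sketch below illustrates this with made-up sample data and an in-memory buffer instead of real files. It uses `csv.DictWriter` and `csv.DictReader` purely for brevity and is not the conversion code above.

```python
import csv
import io

# Hypothetical sample data standing in for the contents of a JSON file.
original = [{"name": "Curie", "born": 1867}, {"name": "Turing", "born": 1912}]

# JSON -> CSV, written to an in-memory buffer instead of a file.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(original[0].keys()))
writer.writeheader()
writer.writerows(original)

# CSV -> rows ready to be dumped as JSON again.
buffer.seek(0)
round_trip = list(csv.DictReader(buffer))

# The integers came back as strings, so a strict comparison fails...
same_structure = round_trip == original
# ...but the rows match once the original values are converted to text.
as_text = [{key: str(value) for key, value in row.items()} for row in original]
same_as_text = round_trip == as_text
```

This is one reason the round-trip files can only be expected to be "reasonably identical" rather than byte-for-byte equal.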