# Ingesting Delimited Files in Python
## Introduction
Now that you have worked through the CSV intro lab, it is time to try 
 your hand at parsing another file.  In this lab you are to open and parse 
 the zillow.csv data file.
 
Each line in the zillow.csv file contains the following 7 attributes:
```
Index
Living Space (sq ft)
Beds
Baths
Zip
Year
List Price ($)
```
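As a quick illustration of that layout, the sketch below parses an in-memory sample with `csv.reader`. The header uses the column names found in the file's first row (`Index,LivingSpace,...`), while the data row values are invented for illustration and are not taken from zillow.csv.

```python
import csv
import io

# Hypothetical sample data: the header matches the lab's column names,
# but the values in the data row are invented for illustration.
sample = io.StringIO(
    "Index,LivingSpace,Beds,Baths,Zip,Year,ListPrice\n"
    "1,2222,3,3.5,32312,1981,250000\n"
)

reader = csv.reader(sample)
header = next(reader)      # first row: the attribute names
first_row = next(reader)   # second row: one property

print(len(header))   # 7
print(first_row)     # every field arrives as a string
```

Note that every field, including the numeric ones, is read as a string and must be converted before doing arithmetic or comparisons.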
You should model your approach on the solutions from our intro to delimited data lab.  When 
 that is not enough, consult the csv module documentation to determine an appropriate 
 course of action: https://docs.python.org/3/library/csv.html
 
### Tasks
1. Use the csv module to ingest the data
2. How many properties have more than 3 baths?  What are 
 the Index values for those properties?
3. How many houses built prior to 1990 have living space >= 3000 
 square feet?
4. Ingest the CSV file using a Python dictionary approach
5. Add an attribute to the dictionary named Living Space (sq m) 
 that is the conversion of square feet to square meters (to convert 
 from square feet to square meters, just multiply by 0.0929).
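The conversion in task 5 can be sketched as a small helper. The function name `sq_ft_to_sq_m`, the constant name, and the sample row are assumptions made for illustration; only the 0.0929 factor and the attribute name come from the task description.

```python
SQ_M_PER_SQ_FT = 0.0929  # conversion factor given in the task description


def sq_ft_to_sq_m(square_feet):
    """Convert an area in square feet to square meters."""
    return square_feet * SQ_M_PER_SQ_FT


# Hypothetical row dictionary; the values are invented for illustration.
row = {"Index": "1", "LivingSpace": "3000"}

# Values read from a CSV file are strings, so convert before multiplying.
row["Living Space (sq m)"] = sq_ft_to_sq_m(float(row["LivingSpace"]))
print(row["Living Space (sq m)"])
```

Because the result is a float, the printed value may show small floating-point rounding rather than an exact decimal.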
 

In [1]:
import csv
'''
Reading from a CSV file is done using the reader object. 
The CSV file is opened as a text file with Python’s built-in open() function, 
which returns a file object. This is then passed to the reader, which does 
the heavy lifting.
'''
# 'with' uses a context manager whose job is to make sure that 
# resources created in support of opening and reading a file are
# properly disposed of when the context is exited.  This may be the result of 
# an error or normal (non-error) processing.

# Like most other things in Python, the with statement is actually very simple, once you 
# understand the problem it’s trying to solve. Consider this piece of pseudocode:
'''
    set things up
    try:
        do something that might fail
    finally:
        tear things down
'''
# “set things up” could be opening a file (as we are doing here), or acquiring some 
# sort of external resource, and “tear things down” would then be closing the file, 
# or releasing or removing the resource. The try-finally construct guarantees that the 
# “tear things down” part is always executed, even if the code in the 'try' block 
# that does the work doesn’t finish for any reason. 
# 'with' is syntactic sugar, as they say, that essentially implements the above pseudocode.
# See https://www.python.org/dev/peps/pep-0343/ for gory details.
#
# read the csv file using a Python row approach
with open('zillow.csv', mode='r', newline='') as csv_file:
    csv_reader = csv.reader(csv_file)

    # Notice, each row is a list.
    # The header row is presented as a "row" by the csv_reader iterator.
    #
    # 1st row contains the following attribute names:
    # Index,LivingSpace,Beds,Baths,Zip,Year,ListPrice
    # 
    gt_3_baths = []
    lg_pre_1990 = 0

    # Anyone working with Python will need to understand
    # iterators and iterables.  If you don't, you should read the following:
    # https://treyhunner.com/2016/12/python-iterator-protocol-how-for-loops-work/
    # I can ask an iterator for its next value using next().
    # Since this is my first request, the 'next' value is the first row: the header row.

    hdr = next(csv_reader)

    # Unpack the column positions in the row list into one variable per attribute.
    # range() returns an object that produces a sequence of integers from start (inclusive)
    # to stop (exclusive) by step.  range(i, j) produces i, i+1, i+2, ..., j-1.

    # Note: this assumes I know the number and order of the columns - a reasonable assumption
    Index,LivingSpace,Beds,Baths,Zip,Year,ListPrice = range(len(hdr))

    # when complete Index=0, LivingSpace=1, etc.

    for row in csv_reader:
        # Recall, everything returned in a row is a string data type.
        # Often we will need to convert a column from a string to a number type.

        # Use a generator expression to pick out the columns of interest and
        # convert them to the appropriate number type (int or float).  Assign the resulting numbers
        # to a tuple of variables with the appropriate names.  The values come out in column order.

        # Index and Year are integers
        Index_v,Year_v = (int(v) for i,v in enumerate(row) if i in [Index,Year])

        # LivingSpace and Baths are floating point
        LivingSpace_v,Baths_v = (float(v) for i,v in enumerate(row) if i in [LivingSpace,Baths])

        if Baths_v > 3:
            gt_3_baths.append(row[Index]) 
            
        if Year_v < 1990 and LivingSpace_v >= 3000:
            lg_pre_1990 += 1

        # Add the sq meter requirement
        row.append(LivingSpace_v * .0929)

print(f'There are {len(gt_3_baths)} houses with more than 3 baths.')

# conditionally print the list of houses
if len(gt_3_baths):
    print(f'Those {len(gt_3_baths)} houses are located at indices {gt_3_baths}.')

print(f'There are {lg_pre_1990} houses built prior to 1990 that have 3000 or more square feet of living space.')

There are 8 houses with more than 3 baths.
Those 8 houses are located at indices ['1', '3', '5', '9', '12', '13', '16', '19'].
There are 3 houses built prior to 1990 that have 3000 or more square feet of living space.
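
The comments in the cell above describe `with` as sugar for a set-up / try / finally / tear-down pattern. The sketch below puts the two forms side by side on a throwaway temporary file (the file contents are invented for illustration). It treats the two forms as roughly equivalent for plain file handling; PEP 343 describes the full protocol, which also passes exception details to the context manager.

```python
import os
import tempfile

# Create a small throwaway file so the sketch is self-contained.
fd, path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "w", newline="") as tmp:
    tmp.write("Index,Baths\n1,3.5\n")

# Manual version: set things up, try the work, always tear things down.
manual = open(path, mode="r", newline="")
try:
    manual_text = manual.read()
finally:
    manual.close()

# 'with' version: the file is closed automatically when the block exits.
with open(path, mode="r", newline="") as managed:
    managed_text = managed.read()

print(manual.closed, managed.closed)  # True True
os.remove(path)
```

Both file objects end up closed, but the `with` form is shorter and harder to get wrong.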


# Alternative Solution Using Dictionary Approach

In [None]:
import csv

# read the csv file using a Python dictionary approach (DictReader)
with open('zillow.csv', mode='r', newline='') as csv_file:
    csv_reader = csv.DictReader(csv_file)

    # Notice, each row is now a dictionary - not a list.
    # The header row is not presented as a "row" by the csv_reader iterator.
    # Rather, the header row is consumed to define
    # the keys for each row-level dictionary.
    #
    # 1st row contains the following attribute names:
    # Index,LivingSpace,Beds,Baths,Zip,Year,ListPrice
    # The attr names will be the keys in each row dict.
    gt_3_baths = []
    lg_pre_1990 = 0
    for row_dict in csv_reader:
        # Here's a neat trick... the items() method returns a view of 
        # the dict key,value pairs for the current row, which is an iterable.
        # The values imported into the dict will all be of type string.
        # To use these values in computations they must be converted to numbers.

        # Use a list comprehension to pick out the keys of interest and 
        # convert them to the appropriate number type (int or float).  Assign the resulting numbers
        # to a tuple of variables with the appropriate names. 
        #  
        Index,Year = [int(v) for k,v in row_dict.items() if k in ['Index','Year']]
        LivingSpace,Baths = [float(v) for k,v in row_dict.items() if k in ['LivingSpace','Baths']]

        if Baths > 3:
            gt_3_baths.append(Index) 
            
        if Year < 1990 and LivingSpace >= 3000:
            lg_pre_1990 += 1

        row_dict['LivingSpace_sqm'] = LivingSpace * .0929


print(f'There are {len(gt_3_baths)} houses with more than 3 baths.')

# conditionally print the list of houses
if len(gt_3_baths) > 0:
    print(f'Those {len(gt_3_baths)} houses are located at indices {gt_3_baths}.')

print(f'There are {lg_pre_1990} houses built prior to 1990 that have 3000 or more square feet of living space.')
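
Since each row is a dictionary, the values can also be looked up directly by column name, which avoids relying on the order in which `items()` yields the keys. The following is a minimal sketch of that variation, not a replacement for the solution above. It runs on an in-memory sample whose header matches the lab's column names; the data rows are invented for illustration, and the `rows` list is a hypothetical addition used to keep the updated dictionaries.

```python
import csv
import io

# Hypothetical sample rows; only the header names come from the lab.
sample = io.StringIO(
    "Index,LivingSpace,Beds,Baths,Zip,Year,ListPrice\n"
    "1,3200,4,3.5,32312,1985,350000\n"
    "2,1500,3,2,32309,2001,180000\n"
)

gt_3_baths = []
lg_pre_1990 = 0
rows = []  # keep each updated row dictionary for later use

for row_dict in csv.DictReader(sample):
    # Look up each value by its column name, then convert it from a string.
    index = int(row_dict["Index"])
    year = int(row_dict["Year"])
    living_space = float(row_dict["LivingSpace"])
    baths = float(row_dict["Baths"])

    if baths > 3:
        gt_3_baths.append(index)

    if year < 1990 and living_space >= 3000:
        lg_pre_1990 += 1

    # Add the square-meter attribute using the factor from the task list.
    row_dict["LivingSpace_sqm"] = living_space * 0.0929
    rows.append(row_dict)

print(gt_3_baths, lg_pre_1990)  # [1] 1
```

Direct lookup also raises a `KeyError` right away if a column name is misspelled, whereas filtering `items()` would silently produce too few values to unpack.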