## Class 2 Python for Data Science
### Python Dictionary
### List Comprehension
### Reading CSV Files and Fixing Data Errors

One of Python's built-in data types is the dictionary, which maps keys to values: each key is unique and is associated with exactly one value.

"Like lists dictionaries can easily be changed, can be shrunk and grown ad libitum at run time. They shrink and grow without the necessity of making copies. Dictionaries can be contained in lists and vice versa. But what's the difference between lists and dictionaries? Lists are ordered sets of objects, whereas dictionaries are <b>unordered sets.</b> But the main difference is that items in dictionaries are accessed via keys and not via their position."

Note: since Python 3.7, dictionaries keep their items in insertion order, so "unordered" no longer holds for current Python. The main difference still stands: dictionary items are looked up by key, not by position.

<br>
A pair of braces creates an empty dictionary: {}. Placing a comma-separated list of key:value pairs within the braces adds initial key: value pairs to the dictionary; this is also the way dictionaries are written on output.

In [274]:
dict1 = {"fruit" : [75,"orange"], "vegetable":"onion, mushroom, lettuce"}
dict1

{'fruit': [75, 'orange'], 'vegetable': 'onion, mushroom, lettuce'}

### i. Keys

Get the keys from "dict1"

In [275]:
dict1.keys()

dict_keys(['fruit', 'vegetable'])

### Indexing With Keys?

What happens if you try to run "<b>dict1[0]</b>"? Why?
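A short sketch of the answer: "dict1[0]" asks for the key 0, which dict1 does not have, so Python raises a KeyError instead of returning the first item. Square brackets on a dictionary always mean "look up this key". The "numbered" dictionary below is a made-up example whose keys happen to be integers, so there "[0]" works, but it is still a key lookup, not a position.

```python
dict1 = {"fruit": [75, "orange"], "vegetable": "onion, mushroom, lettuce"}

try:
    print(dict1[0])  # 0 is not one of dict1's keys
except KeyError as err:
    print("KeyError:", err)  # prints: KeyError: 0

# Hypothetical dictionary with integer keys: [0] is a key lookup, not a position
numbered = {0: "zero", 1: "one"}
print(numbered[0])  # prints: zero
```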


In [276]:
dict1["fruit"]

[75, 'orange']

OR

In [277]:
dict1.get("fruit")

[75, 'orange']
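The two lookups differ only when the key is missing: square brackets raise a KeyError, while get() returns None or a default you supply as the second argument. A quick sketch ("meat" is simply a key that is not in dict1):

```python
dict1 = {"fruit": [75, "orange"], "vegetable": "onion, mushroom, lettuce"}

print(dict1.get("fruit"))            # prints: [75, 'orange'] (same as dict1["fruit"])
print(dict1.get("meat"))             # prints: None, no error for a missing key
print(dict1.get("meat", "no meat"))  # prints: no meat, the supplied default
print("meat" in dict1)               # prints: False, the usual way to test for a key
```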

### ii. Values

Get the values from "dict1"

In [278]:
dict1.values()

dict_values([[75, 'orange'], 'onion, mushroom, lettuce'])

### Indexing With Values?
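A little more complicated: dictionaries are searched by key, so finding the key that holds a given value means checking every (key, value) pair.

In [279]:
V = [75, 'orange']  # the value currently stored under "fruit"
K = None            # stays None if no value matches

for key, value in dict1.items():
    if value == V:
        K = key
print(K)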

fruit
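The loop above keeps only the last matching key, yet several keys may share the same value. A list comprehension that collects every match avoids that. When all values are unique and hashable (strings or numbers, not lists), the dictionary can instead be inverted once and then indexed by value. A sketch; the function name keys_for_value and the capitals data are made up for illustration:

```python
def keys_for_value(d, target):
    """Return every key in d whose value equals target."""
    return [key for key, value in d.items() if value == target]

dict1 = {"fruit": [75, "orange"], "vegetable": "onion, mushroom, lettuce"}
print(keys_for_value(dict1, [75, "orange"]))  # prints: ['fruit']
print(keys_for_value(dict1, "banana"))         # prints: [] (no match)

# Inverting only works when values are unique and hashable (a list cannot be a key)
capitals = {"Calif": "Sacramento", "Texas": "Austin"}  # hypothetical data
state_of = {city: state for state, city in capitals.items()}
print(state_of["Austin"])  # prints: Texas
```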


### iii. Length of Dictionary

len(dict1) returns the number of stored entries, i.e. the number of (key, value) pairs.

In [280]:
len(dict1)

2

### iv. Remove key and value

In [281]:
del dict1["vegetable"]
print(dict1)

{'fruit': [75, 'orange']}
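del raises a KeyError if the key is absent. The pop() method also removes a key, but it returns the removed value and accepts a default for keys that may already be gone. A sketch starting from the original two-entry dict1:

```python
dict1 = {"fruit": [75, "orange"], "vegetable": "onion, mushroom, lettuce"}

removed = dict1.pop("vegetable")     # removes the pair and returns its value
print(removed)                       # prints: onion, mushroom, lettuce
print(dict1.pop("vegetable", None))  # prints: None, key already gone so the default is returned
print(dict1)                         # prints: {'fruit': [75, 'orange']}
```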


### v. Add a new key-value pair

In [282]:
dict1["new"] = 0
print(dict1)

{'fruit': [75, 'orange'], 'new': 0}


### vi. Concatenating Dictionaries
<i>*Note: keys must be unique. If a key already exists in dict1, update() overwrites its value with the one from dict2.</i>

In [283]:
dict1 = {"fruit" : "orange, watermelon, grape", "vegetable":"onion, mushroom, lettuce"}
dict2 = {"fruit1": [5,6,7]}
dict1.update(dict2)
dict1

{'fruit': 'orange, watermelon, grape',
 'vegetable': 'onion, mushroom, lettuce',
 'fruit1': [5, 6, 7]}
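update() changes dict1 in place, and when both dictionaries share a key the value from dict2 wins. To build a merged copy while leaving both originals untouched, the ** unpacking syntax works on Python 3.5+ (and dict1 | dict2 does the same on Python 3.9+). A sketch, with a dict2 that deliberately repeats the key "fruit":

```python
dict1 = {"fruit": "orange, watermelon, grape", "vegetable": "onion, mushroom, lettuce"}
dict2 = {"fruit": "apple", "fruit1": [5, 6, 7]}  # "fruit" appears in both

merged = {**dict1, **dict2}  # new dict; for shared keys the later dict (dict2) wins
print(merged["fruit"])       # prints: apple
print(len(merged))           # prints: 3
print(dict1["fruit"])        # prints: orange, watermelon, grape (dict1 is unchanged)
```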

### <font color = "coral">Exercise 1: Create a new dictionary</font>
<font color = "coral">Your keys should be "Country","State","City","ZipCode"

Fill in the values according to the keys.

In [1]:
#Your code here
my_dict = {"Country": "USA", "State": "Calif", "City": "Sunnyvale", "ZipCode": "94087"}
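Another way to build the same dictionary, handy when the keys and values already sit in two parallel lists, is to pair them with zip() and pass the pairs to dict(). A short sketch:

```python
keys = ["Country", "State", "City", "ZipCode"]
values = ["USA", "Calif", "Sunnyvale", "94087"]

my_dict = dict(zip(keys, values))  # pairs keys[i] with values[i]
print(my_dict)  # prints: {'Country': 'USA', 'State': 'Calif', 'City': 'Sunnyvale', 'ZipCode': '94087'}
```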


## Multi-dimensional Lists (Lists of Lists)

In [285]:
a = [[0,  1, 2, 3, 4, 5],
     [10,11,12,13,14,15],
     [20,21,22,23,24,25],
     [30,31,32,33,34,35],
     [40,41,42,43,44,45],
     [50,51,52,53,54,55]]

In [286]:
a[0]

[0, 1, 2, 3, 4, 5]

In [287]:
a[4:6]

[[40, 41, 42, 43, 44, 45], [50, 51, 52, 53, 54, 55]]

In [288]:
a[5][5]

55
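Indexing a nested list picks rows; to pick a column, loop over the rows and take the same position from each. A sketch that rebuilds the same 6 x 6 list with a nested list comprehension (covered in the next section), using the fact that a[r][c] equals 10*r + c in this example:

```python
a = [[10 * r + c for c in range(6)] for r in range(6)]  # same values as the list above

print(a[5][5])                       # prints: 55, row 5 then column 5
column_2 = [row[2] for row in a]     # third element of every row
print(column_2)                      # prints: [2, 12, 22, 32, 42, 52]
print([row[1:3] for row in a[4:6]])  # prints: [[41, 42], [51, 52]], a 2 x 2 sub-block
```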

### List Comprehensions

In [289]:
[x**2 for x in range(15)]

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144, 169, 196]

The list comprehension above is equivalent to this loop:

In [290]:
original = list(range(15))

squares = []

for x in original:
    squares.append(x**2)
    
squares

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144, 169, 196]

#### What is happening in this loop?

In [291]:
new = []
for x in squares:
    if x < 100:
        new.append(x**2)
new

[0, 1, 16, 81, 256, 625, 1296, 2401, 4096, 6561]

In [292]:
new = [i**2 for i in squares if i < 100]
new 

[0, 1, 16, 81, 256, 625, 1296, 2401, 4096, 6561]
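Where the "if" goes matters. A trailing "if" filters items out, as above; a conditional expression (value_if_true if condition else value_if_false) placed before the "for" keeps every item but transforms some of them. A sketch using the same squares list:

```python
squares = [x**2 for x in range(15)]

small = [s for s in squares if s < 100]            # filter: drops squares >= 100
capped = [s if s < 100 else 100 for s in squares]  # transform: replaces them with 100
print(small)   # prints: [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
print(capped)  # prints: [0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 100, 100, 100, 100]
```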

### <font color = "coral">Exercise 2:
<font color = "coral">
Turn this for loop into a list comprehension (it should fit on one line).

In [293]:
mystery = []
for i in range(1000):
    if i%5 == 0:
        mystery.append(i)
print(mystery)

[0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 105, 110, 115, 120, 125, 130, 135, 140, 145, 150, 155, 160, 165, 170, 175, 180, 185, 190, 195, 200, 205, 210, 215, 220, 225, 230, 235, 240, 245, 250, 255, 260, 265, 270, 275, 280, 285, 290, 295, 300, 305, 310, 315, 320, 325, 330, 335, 340, 345, 350, 355, 360, 365, 370, 375, 380, 385, 390, 395, 400, 405, 410, 415, 420, 425, 430, 435, 440, 445, 450, 455, 460, 465, 470, 475, 480, 485, 490, 495, 500, 505, 510, 515, 520, 525, 530, 535, 540, 545, 550, 555, 560, 565, 570, 575, 580, 585, 590, 595, 600, 605, 610, 615, 620, 625, 630, 635, 640, 645, 650, 655, 660, 665, 670, 675, 680, 685, 690, 695, 700, 705, 710, 715, 720, 725, 730, 735, 740, 745, 750, 755, 760, 765, 770, 775, 780, 785, 790, 795, 800, 805, 810, 815, 820, 825, 830, 835, 840, 845, 850, 855, 860, 865, 870, 875, 880, 885, 890, 895, 900, 905, 910, 915, 920, 925, 930, 935, 940, 945, 950, 955, 960, 965, 970, 975, 980, 985, 990, 995]


In [2]:
#Your code here
mystery = [x for x in range(1000) if x % 5 == 0]
print(mystery)

[0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 105, 110, 115, 120, 125, 130, 135, 140, 145, 150, 155, 160, 165, 170, 175, 180, 185, 190, 195, 200, 205, 210, 215, 220, 225, 230, 235, 240, 245, 250, 255, 260, 265, 270, 275, 280, 285, 290, 295, 300, 305, 310, 315, 320, 325, 330, 335, 340, 345, 350, 355, 360, 365, 370, 375, 380, 385, 390, 395, 400, 405, 410, 415, 420, 425, 430, 435, 440, 445, 450, 455, 460, 465, 470, 475, 480, 485, 490, 495, 500, 505, 510, 515, 520, 525, 530, 535, 540, 545, 550, 555, 560, 565, 570, 575, 580, 585, 590, 595, 600, 605, 610, 615, 620, 625, 630, 635, 640, 645, 650, 655, 660, 665, 670, 675, 680, 685, 690, 695, 700, 705, 710, 715, 720, 725, 730, 735, 740, 745, 750, 755, 760, 765, 770, 775, 780, 785, 790, 795, 800, 805, 810, 815, 820, 825, 830, 835, 840, 845, 850, 855, 860, 865, 870, 875, 880, 885, 890, 895, 900, 905, 910, 915, 920, 925, 930, 935, 940, 945, 950, 955, 960, 965, 970, 975, 980, 985, 990, 995]
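For this particular filter there is an even shorter option: range() accepts a step argument, so the multiples of 5 can be generated directly instead of testing every number. A quick check that both approaches agree:

```python
by_filter = [x for x in range(1000) if x % 5 == 0]
by_step = list(range(0, 1000, 5))  # start at 0, stop before 1000, step by 5

print(by_filter == by_step)  # prints: True
print(len(by_step))          # prints: 200
```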


 <h1> <b><font color = coral>&#9998; <font color = coral>EXERCISE 3:</h1></font>

<font color = "coral">
Now that you know about operators, data types, and loops, write code that removes all the unwanted items from the list below.

<b>1) Create a loop where you get rid of all the odd numbers.
<br><br>
2) Put all the numbers in order from smallest to largest.<br><br>
3) Once you have a list of ordered even numbers, convert all these integers into strings.<br><br>
4) Finally, print the number strings as a single comma-separated string.</b>



In [3]:
lst = [4,6,3,2,6,8,9,7,23,4,465,7,6,8,454,5,876,567,54,76,34,55,
       33,7653,234234,7857,23432,4353,4,345,4667,23235,1212,221,
       335,2323,21,45,76,5432,54645645,123212245346342,67,34563,2]

#Your code here
no_odd_num = [x for x in lst if x%2 == 0]
print("1:",no_odd_num)

no_odd_num.sort()
print("2:",no_odd_num)

# Convert each integer to its string form
y = [str(n) for n in no_odd_num]

print("3:",y)

x = ','.join(y)
print("4:",x)
print(type(x))


1: [4, 6, 2, 6, 8, 4, 6, 8, 454, 876, 54, 76, 34, 234234, 23432, 4, 1212, 76, 5432, 123212245346342, 2]
2: [2, 2, 4, 4, 4, 6, 6, 6, 8, 8, 34, 54, 76, 76, 454, 876, 1212, 5432, 23432, 234234, 123212245346342]
3: ['2', '2', '4', '4', '4', '6', '6', '6', '8', '8', '34', '54', '76', '76', '454', '876', '1212', '5432', '23432', '234234', '123212245346342']
4: 2,2,4,4,4,6,6,6,8,8,34,54,76,76,454,876,1212,5432,23432,234234,123212245346342
<class 'str'>
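The four steps can also be chained into a compact form: a generator expression keeps the even numbers, sorted() returns a new ordered list (unlike .sort(), which sorts in place and returns None), and ",".join() needs strings, hence str(x). A sketch using the same lst:

```python
lst = [4,6,3,2,6,8,9,7,23,4,465,7,6,8,454,5,876,567,54,76,34,55,
       33,7653,234234,7857,23432,4353,4,345,4667,23235,1212,221,
       335,2323,21,45,76,5432,54645645,123212245346342,67,34563,2]

evens = sorted(x for x in lst if x % 2 == 0)  # steps 1 and 2
result = ",".join(str(x) for x in evens)      # steps 3 and 4
print(result)
```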


 <h1> <b><font color = coral>&#9998; <font color = coral>EXERCISE 4:</h1></font>

<font color = "coral">If we list all the natural numbers below 10 that are multiples of 3 or 5, we get 3, 5, 6 and 9. The sum of these multiples is 23.

Find the sum of all the multiples of 3 or 5 below 10,000.

In [4]:
#Your answer here
l=[x for x in range(1,10000) if x%3 ==0 or x%5 ==0]
print(sum(l))

23331668
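The answer can be checked without any loop. The multiples of k below a limit are k, 2k, ..., nk with n = (limit - 1) // k, and their sum is k * n * (n + 1) / 2. Adding the sums for 3 and 5 counts the multiples of 15 twice, so by inclusion-exclusion they are subtracted once. A sketch (the helper names are made up):

```python
def sum_of_multiples_below(k, limit):
    """Sum of k, 2k, 3k, ... strictly below limit, via the arithmetic-series formula."""
    n = (limit - 1) // k         # how many multiples of k are below limit
    return k * n * (n + 1) // 2  # k * (1 + 2 + ... + n)

def sum_multiples_3_or_5(limit):
    # Inclusion-exclusion: multiples of 15 appear in both sums, so remove them once
    return (sum_of_multiples_below(3, limit)
            + sum_of_multiples_below(5, limit)
            - sum_of_multiples_below(15, limit))

print(sum_multiples_3_or_5(10))     # prints: 23
print(sum_multiples_3_or_5(10000))  # prints: 23331668
```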


 <h1> <b><font color = coral>&#9998; <font color = coral>EXERCISE 5:</h1></font>

<font color = "coral">
Calculate all square numbers (1,4,9,16,...) below 1,000. What's their sum?

In [5]:
#Your answer here
# Keep only the squares that are themselves below 1,000: 1, 4, 9, ..., 961
s = [x**2 for x in range(1, 1000) if x**2 < 1000]
print(sum(s))

10416
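The sum 1² + 2² + ... + n² has the closed form n(n + 1)(2n + 1) / 6, which is handy for checking a loop. The exercise can be read two ways: squares that are themselves below 1,000 stop at 31² = 961 (n = 31), while squaring every integer below 1,000 gives n = 999. A sketch computing both (math.isqrt needs Python 3.8+):

```python
from math import isqrt

def sum_of_squares(n):
    """1**2 + 2**2 + ... + n**2 via the closed form n(n+1)(2n+1)/6."""
    return n * (n + 1) * (2 * n + 1) // 6

n = isqrt(1000 - 1)          # largest n with n*n < 1000
print(n)                     # prints: 31
print(sum_of_squares(n))     # prints: 10416, squares below 1,000
print(sum_of_squares(999))   # prints: 332833500, squares of 1..999
```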


 <h1> <b><font color = coral>&#9998; <font color = coral>EXERCISE 6:</h1></font>

<font color = "coral">
Write a function to calculate the mean (average) of "lst". Do not use the built-in "mean" functions that Python offers.

In [6]:
lst = [4,6,3,2,6,8,9,7,23,4,465,7,6,8,454,5,876,567,54,76,34,55]
           
#Your answer here
def avg_fun(values):
    # Mean = sum of the values divided by how many values there are
    return sum(values) / len(values)

f = avg_fun(lst)
print(f)


121.77272727272727


 <h1> <b><font color = coral>&#9998; <font color = coral>EXERCISE 7:</h1></font>

<font color = "coral">
Write a function to calculate the median of "lst". Do not use the built-in "median" functions that Python offers.

In [7]:
lst = [4,6,3,2,6,8,9,7,23,4,465,7,6,8,454,5,876,567,54,76,34,55,
       33,7653,234234,7857,23432,4353,4,345,4667,23235,1212,221,
       335,2323,21,45,76,5432,54645645,123212245346342,67,34563,2]

#Your answer here
def my_median(values):
    s = sorted(values)  # sorted copy; the caller's list is left unchanged
    n = len(s)
    mid = n // 2
    if n % 2 == 0:
        # Even count: average the two middle values
        return (s[mid - 1] + s[mid]) / 2
    # Odd count: the single middle value
    return s[mid]

m = my_median(lst)
print(m)



67
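Since odd and even lengths take different branches, it is worth testing a median function on small lists of both kinds. The exercise rules out statistics.median for the solution, but it is fine as a reference when checking. A sketch, with the function taking the list as a parameter and sorting a copy:

```python
import statistics

def my_median(values):
    s = sorted(values)  # sorted copy, the input list is not modified
    n = len(s)
    mid = n // 2
    if n % 2 == 0:
        return (s[mid - 1] + s[mid]) / 2  # even length: mean of the two middle values
    return s[mid]                         # odd length: the middle value

for sample in ([3, 1, 2], [4, 1, 3, 2], [7], [5, 5, 1, 9]):
    assert my_median(sample) == statistics.median(sample), sample
print("all checks passed")
```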


 <h1> <b><font color = coral>&#9998; <font color = coral>EXERCISE 8:</h1></font>

<font color = "coral">Write a function to calculate the mode of "lst". Do not use the built-in "mode" functions that Python offers.

In [8]:
lst = [4,6,3,2,6,8,9,7,23,4,465,7,6,8,454,5,876,567,54,76,34,55,
       33,7653,234234,7857,23432,4353,4,345,4667,23235,1212,221,
       335,2323,21,45,76,5432,54645645,123212245346342,67,34563,2]

#Your answer here
def my_mode(values):
    # Count how many times each value occurs (value -> count)
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1

    highest = max(counts.values())
    # Every value that reaches the highest count; more than one means a tie
    modes = [k for k, c in counts.items() if c == highest]

    if len(modes) > 1:
        print("No unique mode")  # two or more equally common values
    else:
        print("mode is", modes[0])
    return modes

modes = my_mode(lst)  # assigned so the notebook does not also echo the list

No unique mode


# Reading CSV file -- bayarea_home_prices data

In [301]:
"""
Dataset description
1) HomeID = Home ID number
2) HomeAge = Age of home in years
3) HomeSqft = Square footage of home
4) LotSize = LotSize
5) BedRooms = Num bedrooms as per county data
6) HighSchoolAPI = API for nearest high school
7) ProxFwy = Distance in miles to Freeway
8) CarGarage = Number of cars in garage; 0 = no garage
9) ZipCode = Postal zip code for the home
10)HomePriceK = Home price in $K (Target)
-------------------------------------------
9 X Variables; 1 Y variable (Target)
Data Points = 100

Data errors:
1) Some ZipCode values start with the digit 8; it should be 9
2) Some HighSchoolAPI scores have only two digits; the trailing 0 is missing
3) Some CarGarage values were entered as the letter "l"; they should be the integer 1
"""


In [302]:
## Reading csv files
def read_file(filename):
    data_array = []
    with open(filename, "r") as file_open:  # the file is closed automatically
        for line in file_open:
            if "HomeID" in line:  # skip the header row
                continue
            line_no_newline = line.rstrip()  # strip trailing whitespace, including the newline
            line_split = line_no_newline.split(",")
            data_array.append(line_split)
    return data_array

In [303]:
housing_data = read_file("bayarea_home_prices.csv")
print(housing_data[0:6])

[['1', '24', '1757', '6056', '2', '899', '3', '3', '94085', '894'], ['2', '10', '1563', '6085', '2', '959', '4', '3', '94085', '861'], ['3', '14', '1344', '6089', '2', '865', '4', '3', '94085', '831'], ['4', '14', '1215', '6129', '3', '959', '4', '2', '94085', '809'], ['5', '24', '1866', '6141', '3', '877', '4', '1', '94085', '890'], ['6', '18', '1589', '6148', '2', '920', '3', '0', '84085', '867']]
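Python's standard csv module can do the splitting as well, and unlike split(",") it also copes with quoted fields that contain commas. A sketch under two assumptions: the header is the first line of the file, and its exact wording (reconstructed below from the dataset description) is hypothetical. The function read_rows takes any iterable of lines, so an in-memory io.StringIO stands in for the file here:

```python
import csv
import io

def read_rows(lines):
    """Parse CSV lines into lists of strings, skipping the first (header) row."""
    reader = csv.reader(lines)
    next(reader, None)  # skip the header; assumed to be the first line
    return [row for row in reader]

# Hypothetical header line plus the first data row shown above
sample = io.StringIO(
    "HomeID,HomeAge,HomeSqft,LotSize,BedRooms,HighSchoolAPI,ProxFwy,CarGarage,ZipCode,HomePriceK\n"
    "1,24,1757,6056,2,899,3,3,94085,894\n"
)
rows = read_rows(sample)
print(rows)  # prints: [['1', '24', '1757', '6056', '2', '899', '3', '3', '94085', '894']]

# With the real file (the csv docs recommend opening it with newline=""):
# with open("bayarea_home_prices.csv", newline="") as f:
#     housing_data = read_rows(f)
```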


In [304]:
len_housing_data = len(housing_data)
print(len_housing_data)

100


In [305]:
list_HomeAge = []
# for all rows, extract only column 1
for k in range(0,len_housing_data):
    list_HomeAge.append(housing_data[k][1])    

In [306]:
print(list_HomeAge) 
# they are still strings, cannot do numerical calculations with strings 

['24', '10', '14', '14', '24', '18', '13', '19', '17', '24', '12', '22', '15', '25', '10', '20', '23', '16', '10', '13', '17', '10', '15', '10', '21', '12', '13', '10', '17', '24', '10', '18', '11', '19', '12', '14', '13', '22', '22', '15', '23', '21', '17', '11', '15', '11', '21', '22', '12', '19', '19', '25', '23', '12', '10', '11', '11', '19', '22', '19', '13', '19', '25', '12', '14', '25', '24', '12', '21', '16', '19', '24', '25', '17', '14', '12', '17', '25', '17', '11', '18', '19', '24', '25', '22', '19', '18', '22', '21', '14', '16', '18', '25', '21', '13', '11', '10', '21', '19', '11']
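To calculate with a column, convert each string with int() while extracting it; a list comprehension does both in one line. A sketch on the first six rows exactly as printed earlier (the full column works the same way):

```python
# The first six rows exactly as printed above (every field is still a string)
rows = [['1', '24', '1757', '6056', '2', '899', '3', '3', '94085', '894'],
        ['2', '10', '1563', '6085', '2', '959', '4', '3', '94085', '861'],
        ['3', '14', '1344', '6089', '2', '865', '4', '3', '94085', '831'],
        ['4', '14', '1215', '6129', '3', '959', '4', '2', '94085', '809'],
        ['5', '24', '1866', '6141', '3', '877', '4', '1', '94085', '890'],
        ['6', '18', '1589', '6148', '2', '920', '3', '0', '84085', '867']]

home_age = [int(row[1]) for row in rows]        # column 1 = HomeAge, converted to int
print(home_age)                                 # prints: [24, 10, 14, 14, 24, 18]
print(round(sum(home_age) / len(home_age), 2))  # prints: 17.33
```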


In [307]:
# How to convert zipcodes from text to numbers
for k in range(0,len_housing_data):
    housing_data[k][8] = int(housing_data[k][8])  # convert to integer data type and over-write

In [308]:
print(housing_data[0:5]) # ZipCode values now print without quotes: they are integers, not strings

[['1', '24', '1757', '6056', '2', '899', '3', '3', 94085, '894'], ['2', '10', '1563', '6085', '2', '959', '4', '3', 94085, '861'], ['3', '14', '1344', '6089', '2', '865', '4', '3', 94085, '831'], ['4', '14', '1215', '6129', '3', '959', '4', '2', 94085, '809'], ['5', '24', '1866', '6141', '3', '877', '4', '1', 94085, '890']]


In [1]:
## Reading csv files, how to fix errors in data, replace 84085 with 94085
def read_file_housing(filename):
    data_array = []
    with open(filename, "r") as file_open:
        for line in file_open:
            if "HomeID" in line:  # skip the header row
                continue
            line_no_newline = line.rstrip()
            # Text replacement on the whole line fixes the bad zip code
            # (this assumes "84085" never appears in any other field)
            line2 = line_no_newline.replace("84085", "94085")
            line_split = line2.split(",")
            data_array.append(line_split)
    return data_array



In [310]:
housing_data2 = read_file_housing("bayarea_home_prices.csv")
print(housing_data2[0:6])

[['1', '24', '1757', '6056', '2', '899', '3', '3', '94085', '894'], ['2', '10', '1563', '6085', '2', '959', '4', '3', '94085', '861'], ['3', '14', '1344', '6089', '2', '865', '4', '3', '94085', '831'], ['4', '14', '1215', '6129', '3', '959', '4', '2', '94085', '809'], ['5', '24', '1866', '6141', '3', '877', '4', '1', '94085', '890'], ['6', '18', '1589', '6148', '2', '920', '3', '0', '94085', '867']]


In [311]:
len_housing_data2 = len(housing_data2)
print(len_housing_data2)

100


In [312]:
list_ZipCode2 = []
# for all rows, extract all zipcodes
for k in range(0,len_housing_data2):
    list_ZipCode2.append(int(housing_data2[k][8]))  

In [313]:
print(list_ZipCode2) # Converted to numbers

[94085, 94085, 94085, 94085, 94085, 94085, 94085, 94085, 94085, 94085, 94085, 94085, 94085, 94085, 94085, 95051, 94085, 94085, 95051, 94085, 94085, 94085, 94085, 95051, 94085, 94085, 94085, 95051, 94085, 95051, 95051, 95051, 95051, 95051, 95051, 95051, 85051, 95051, 95051, 95051, 94087, 94087, 95051, 95051, 95051, 94087, 95051, 94087, 94087, 95051, 95051, 95051, 85051, 94087, 95051, 94087, 94087, 94087, 95051, 94087, 94087, 94087, 94087, 94087, 94087, 95014, 94087, 94087, 94087, 94087, 95014, 94087, 95014, 95014, 84087, 84087, 95014, 94087, 94087, 94087, 95014, 95014, 95014, 95014, 85014, 95014, 95014, 95014, 85014, 95014, 95014, 95014, 95014, 95014, 95014, 95014, 95014, 95014, 95014, 95014]
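The list still contains 85051, 84087 and 85014. Since this class covers dictionaries, one tidy option is a lookup table from bad code to corrected code, applied only to the ZipCode column (index 8). Unlike .replace() on the whole line, this cannot change another field that happens to contain the same digits. A sketch; ZIP_FIXES and fix_zip are hypothetical names, and the mapping is the set of corrections listed in Exercise 9:

```python
# Hypothetical lookup table: bad zip code -> corrected zip code (first digit 8 -> 9)
ZIP_FIXES = {"84085": "94085", "84087": "94087", "85014": "95014", "85051": "95051"}

def fix_zip(row, zip_col=8):
    """Return a copy of row with only its ZipCode field corrected."""
    fixed = list(row)
    # .get() returns the code unchanged when it is not in the table
    fixed[zip_col] = ZIP_FIXES.get(fixed[zip_col], fixed[zip_col])
    return fixed

row6 = ['6', '18', '1589', '6148', '2', '920', '3', '0', '84085', '867']  # row 6 as printed earlier
print(fix_zip(row6))  # prints: ['6', '18', '1589', '6148', '2', '920', '3', '0', '94085', '867']
print(fix_zip(['1', '24', '1757', '6056', '2', '899', '3', '3', '94085', '894'])[8])  # prints: 94085
```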


<h1> <b><font color = coral>&#9998; <font color = coral>EXERCISE 9:</h1></font>

In [19]:
"""
The above example shows correcting 84085 -> 94085
Perform other zip code corrections: 
84087 -> 94087,
85014 -> 95014,
85051 -> 95051
Create a table for zip code distribution after corrections:
After:Zipcode,House_Count
94085,25
94087,25
95051,25
95014,25
"""
#Your answer here
from collections import Counter

def read_house_data(fname):
    new_array = []
    with open(fname, "r") as file_open:
        for line in file_open:
            if "HomeID" in line:  # skip the header row
                continue
            line_no_newline = line.rstrip()  # strip trailing whitespace and the newline
            line2 = line_no_newline.replace("84085", "94085")
            line_split = line2.split(",")
            new_array.append(line_split)

    # Fix the remaining zip codes whose first digit is 8 instead of 9
    for row in new_array:
        if row[8] == "84087":
            row[8] = "94087"
        elif row[8] == "85051":
            row[8] = "95051"
        elif row[8] == "85014":
            row[8] = "95014"

    return new_array

data_of_houses = read_house_data("bayarea_home_prices.csv")
zipcode = [int(row[8]) for row in data_of_houses]
counter = Counter(zipcode)
print(counter)

Counter({94085: 25, 95051: 25, 94087: 25, 95014: 25})
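The docstring asks for a "Zipcode,House_Count" table rather than the raw Counter. A Counter behaves like a dictionary, so its items can be sorted and formatted line by line. A sketch; zip_table is a made-up helper name, and the counts are the ones printed above:

```python
from collections import Counter

def zip_table(counts):
    """Return the CSV-style lines 'ZipCode,House_Count', sorted by zip code."""
    lines = ["ZipCode,House_Count"]
    for zip_code, count in sorted(counts.items()):
        lines.append(f"{zip_code},{count}")
    return lines

counter = Counter({94085: 25, 95051: 25, 94087: 25, 95014: 25})  # the counts printed above
print("\n".join(zip_table(counter)))
```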


<h1> <b><font color = coral>&#9998; <font color = coral>EXERCISE 10:</h1></font>

In [27]:
"""
Modify function read_file_housing to multiply incorrect SchoolAPI by 10.
Assume API value to be incorrect if it is a two digit number.
Calculate average School API by zipcode. Print the following:
Average_SchoolAPI,Cnt_of_homes,ZipCode
xyz,mn,abc
"""


# Your answer here
from statistics import mean

def read_file_housing(filename):
    data_array = []
    with open(filename, "r") as file_open:
        for line in file_open:
            if "HomeID" in line:  # skip the header row
                continue
            line_no_newline = line.rstrip()
            line2 = line_no_newline.replace("84085", "94085")
            line_split = line2.split(",")
            data_array.append(line_split)

    for row in data_array:
        # Fix the remaining zip codes whose first digit is 8 instead of 9
        if row[8] == "84087":
            row[8] = "94087"
        elif row[8] == "85051":
            row[8] = "95051"
        elif row[8] == "85014":
            row[8] = "95014"
        # A two-digit HighSchoolAPI is missing its trailing 0; keep the field a string like the others
        if int(row[5]) < 100:
            row[5] = str(int(row[5]) * 10)

    return data_array

homes_data = read_file_housing("bayarea_home_prices.csv")
zip_codes = ["94085", "94087", "95051", "95014"]
api_by_zip = {z: [] for z in zip_codes}  # zip code -> list of API scores
for row in homes_data:
    api_by_zip[row[8]].append(int(row[5]))
for z in zip_codes:
    print("Avg API in", z, "zip code is:", mean(api_by_zip[z]), "no of homes:", len(api_by_zip[z]))


Avg API in 94085 zip code is: 907 no of homes: 25
Avg API in 94087 zip code is: 899.24 no of homes: 25
Avg API in 95051 zip code is: 916.68 no of homes: 25
Avg API in 95014 zip code is: 894.8 no of homes: 25
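The docstring asks for lines in the form "Average_SchoolAPI,Cnt_of_homes,ZipCode". A sketch that groups scores by zip code with dict.setdefault() and formats those lines; api_table is a made-up name, the first two rows are from the data printed earlier, and the third row is hypothetical, included only to exercise the two-digit API fix:

```python
def api_table(rows, api_col=5, zip_col=8):
    """Group HighSchoolAPI by ZipCode; return 'Average_SchoolAPI,Cnt_of_homes,ZipCode' lines."""
    api_by_zip = {}
    for row in rows:
        api = int(row[api_col])
        if api < 100:  # a two-digit score is missing its trailing 0
            api *= 10
        api_by_zip.setdefault(row[zip_col], []).append(api)
    lines = ["Average_SchoolAPI,Cnt_of_homes,ZipCode"]
    for zip_code in sorted(api_by_zip):
        scores = api_by_zip[zip_code]
        lines.append(f"{round(sum(scores) / len(scores), 2)},{len(scores)},{zip_code}")
    return lines

rows = [
    ['1', '24', '1757', '6056', '2', '899', '3', '3', '94085', '894'],
    ['2', '10', '1563', '6085', '2', '959', '4', '3', '94085', '861'],
    ['90', '15', '1400', '6500', '3', '92', '2', '1', '94087', '1100'],  # hypothetical row
]
for line in api_table(rows):
    print(line)
```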


<h1> <b><font color = coral>&#9998; <font color = coral>EXERCISE 11:</h1></font>

In [22]:
"""
Modify function read_file_housing to replace CarGarage value of 'l' with integer 1
Calculate and print the following:
Car_Garage,Cnt_of_homes
0,m
1,n
2,o
3,p
"""
from collections import Counter

def read_file_housing(filename):
    data_array = []
    with open(filename, "r") as file_open:
        for line in file_open:
            if "HomeID" in line:  # skip the header row
                continue
            line_no_newline = line.rstrip()
            line2 = line_no_newline.replace("84085", "94085")
            line_split = line2.split(",")
            data_array.append(line_split)
    for row in data_array:
        if row[7] == "l":  # letter "l" typed instead of the digit 1
            row[7] = "1"   # stored as a string, like every other field
    return data_array

house_list = read_file_housing("bayarea_home_prices.csv")
car_garage_list = [row[7] for row in house_list]
count_garage = Counter(car_garage_list)
print(count_garage)

Counter({'3': 32, '0': 31, '2': 19, '1': 18})
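The Counter lists garages by frequency, while the docstring asks for rows ordered 0, 1, 2, 3. Because the keys are strings, sorting with key=int orders them numerically (plain string sorting would put "10" before "2" if such values existed). A sketch; garage_table is a made-up name, and the counts are the ones printed above:

```python
from collections import Counter

def garage_table(counts):
    """Return 'Car_Garage,Cnt_of_homes' lines sorted by number of cars."""
    lines = ["Car_Garage,Cnt_of_homes"]
    for cars, n in sorted(counts.items(), key=lambda item: int(item[0])):  # numeric order
        lines.append(f"{cars},{n}")
    return lines

count_garage = Counter({'3': 32, '0': 31, '2': 19, '1': 18})  # counts printed above
print("\n".join(garage_table(count_garage)))
```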


<h1> <b><font color = coral>&#9998; <font color = coral>EXERCISE 12:</h1></font>

In [24]:
"""
Find the average price of a home in each of the four zip code areas.
Zipcode,Avg_Price,Cnt_of_homes
"""
from statistics import mean

def read_house_data(fname):
    new_array = []
    with open(fname, "r") as file_open:
        for line in file_open:
            if "HomeID" in line:  # skip the header row
                continue
            line_no_newline = line.rstrip()  # strip trailing whitespace and the newline
            line2 = line_no_newline.replace("84085", "94085")
            line_split = line2.split(",")
            new_array.append(line_split)

    # Fix the remaining zip codes whose first digit is 8 instead of 9
    for row in new_array:
        if row[8] == "84087":
            row[8] = "94087"
        elif row[8] == "85051":
            row[8] = "95051"
        elif row[8] == "85014":
            row[8] = "95014"

    return new_array

data_of_houses = read_house_data("bayarea_home_prices.csv")

zip_codes = ["94085", "94087", "95051", "95014"]
prices_by_zip = {z: [] for z in zip_codes}  # zip code -> list of home prices in $K
for row in data_of_houses:
    prices_by_zip[row[8]].append(int(row[9]))
for z in zip_codes:
    print("Avg Home price in", z, "zip code is:", mean(prices_by_zip[z]), "no of homes:", len(prices_by_zip[z]))




Avg Home price in 94085 zip code is: 885.96 no of homes: 25
Avg Home price in 94087 zip code is: 1151.48 no of homes: 25
Avg Home price in 95051 zip code is: 1023.2 no of homes: 25
Avg Home price in 95014 zip code is: 1263.32 no of homes: 25


<h1> <b><font color = coral>&#9998; <font color = coral>EXERCISE 13:</h1></font>

In [26]:
"""
Find the average price of a home in Sunnyvale (94087 and 94085).
Print the output as follows:
The average house price in Sunnyvale based on xx homes is $yyy (thousands).
"""
from statistics import mean

def read_house_data(fname):
    new_array = []
    with open(fname, "r") as file_open:
        for line in file_open:
            if "HomeID" in line:  # skip the header row
                continue
            line_no_newline = line.rstrip()  # strip trailing whitespace and the newline
            line2 = line_no_newline.replace("84085", "94085")
            line_split = line2.split(",")
            new_array.append(line_split)

    # Fix the remaining zip codes whose first digit is 8 instead of 9
    for row in new_array:
        if row[8] == "84087":
            row[8] = "94087"
        elif row[8] == "85051":
            row[8] = "95051"
        elif row[8] == "85014":
            row[8] = "95014"

    return new_array

data_of_houses = read_house_data("bayarea_home_prices.csv")

# Sunnyvale covers zip codes 94085 and 94087
sunnyvale_homes_prices = [int(row[9]) for row in data_of_houses if row[8] in ("94085", "94087")]
avg_price = mean(sunnyvale_homes_prices)
print(f"The average house price in Sunnyvale based on {len(sunnyvale_homes_prices)} homes is ${avg_price} (thousands).")


The average house price in Sunnyvale based on 50 homes is $1018.72 (thousands).
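Because both Sunnyvale zip codes hold 25 homes each, this result equals the plain average of the two Exercise 12 averages. In general a combined mean must weight each group by its size: rebuild each group's total (average times count), add the totals, and divide by the total count. A small arithmetic check using the numbers printed in Exercise 12:

```python
# Per-zip results printed in Exercise 12: zip code -> (average price in $K, number of homes)
zip_stats = {"94085": (885.96, 25), "94087": (1151.48, 25)}

# Combined mean = weighted mean: total of all prices divided by the total number of homes
total_price = sum(avg * n for avg, n in zip_stats.values())
total_homes = sum(n for _, n in zip_stats.values())
print(total_homes, round(total_price / total_homes, 2))  # prints: 50 1018.72
```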
