  <h3>MPG Datafile Information</h3>
    <p>Let's import our datafile <strong>mpg.csv</strong>, which contains fuel economy data for 234 cars.</p>
    <ul>
        <li><strong>mpg</strong>: miles per gallon</li>
        <li><strong>class</strong>: car classification</li>
        <li><strong>cty</strong>: city mpg</li>
        <li><strong>cyl</strong>: # of cylinders</li>
        <li><strong>displ</strong>: engine displacement in liters</li>
        <li><strong>drv</strong>: f = front-wheel drive, r = rear-wheel drive, 4 = 4wd</li>
        <li><strong>fl</strong>: fuel (e = ethanol E85, d = diesel, r = regular, p = premium, c = CNG)</li>
        <li><strong>hwy</strong>: highway mpg</li>
        <li><strong>manufacturer</strong>: automobile manufacturer</li>
        <li><strong>model</strong>: model of car</li>
        <li><strong>trans</strong>: type of transmission</li>
        <li><strong>year</strong>: model year</li>
    </ul>

In [2]:
import csv

# Set the precision to 2 decimal places
%precision 2

with open("datasets/mpg.csv") as csvfile:
    mpg = list(csv.DictReader(csvfile))

mpg[:4]


[{'': '1',
  'manufacturer': 'audi',
  'model': 'a4',
  'displ': '1.8',
  'year': '1999',
  'cyl': '4',
  'trans': 'auto(l5)',
  'drv': 'f',
  'cty': '18',
  'hwy': '29',
  'fl': 'p',
  'class': 'compact'},
 {'': '2',
  'manufacturer': 'audi',
  'model': 'a4',
  'displ': '1.8',
  'year': '1999',
  'cyl': '4',
  'trans': 'manual(m5)',
  'drv': 'f',
  'cty': '21',
  'hwy': '29',
  'fl': 'p',
  'class': 'compact'},
 {'': '3',
  'manufacturer': 'audi',
  'model': 'a4',
  'displ': '2',
  'year': '2008',
  'cyl': '4',
  'trans': 'manual(m6)',
  'drv': 'f',
  'cty': '20',
  'hwy': '31',
  'fl': 'p',
  'class': 'compact'},
 {'': '4',
  'manufacturer': 'audi',
  'model': 'a4',
  'displ': '2',
  'year': '2008',
  'cyl': '4',
  'trans': 'auto(av)',
  'drv': 'f',
  'cty': '21',
  'hwy': '30',
  'fl': 'p',
  'class': 'compact'}]

`csv.DictReader` has read in each row of our CSV file as a dictionary. `len` shows that our list is made up of 234 dictionaries.

In [3]:
len(mpg)

234

`keys` gives us the column names of our csv.

In [7]:
mpg[0].keys()

dict_keys(['', 'manufacturer', 'model', 'displ', 'year', 'cyl', 'trans', 'drv', 'cty', 'hwy', 'fl', 'class'])
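The first key is an empty string. Judging by the values `'1'`, `'2'`, ... in the output above, it most likely comes from an unnamed row-number column in the CSV (this is an assumption, not something the file documents). A minimal sketch of dropping that column while reading, using a small inline sample in place of the real file; `sample_csv` and `rows` are illustrative names:

```python
import csv
import io

# Inline sample standing in for datasets/mpg.csv (two rows, a few columns only).
sample_csv = io.StringIO(
    '"","manufacturer","model","cty","hwy"\n'
    '"1","audi","a4","18","29"\n'
    '"2","audi","a4","21","29"\n'
)

# Keep every field except the unnamed '' column.
rows = [
    {key: value for key, value in row.items() if key != ""}
    for row in csv.DictReader(sample_csv)
]

print(list(rows[0].keys()))
```

The same dictionary comprehension could be applied to the full `mpg` list if the row-number column is not needed.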

To find the average city fuel economy (`cty`) across all cars, sum the values and divide by the number of cars. All values in the dictionaries are strings, so we need to convert them to float first.


In [8]:
sum(float(d['cty']) for d in mpg) / len(mpg)

16.86
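Since the same sum-and-divide pattern is reused for other columns, it can be wrapped in a small helper. The sketch below is one possible way to do that with `statistics.fmean` (Python 3.8+); `column_mean` is a hypothetical name, and the sample holds only the first four rows shown earlier, so its averages differ from the full-dataset figures.

```python
from statistics import fmean

# First four rows from the output above, trimmed to the two mpg fields.
sample = [
    {"cty": "18", "hwy": "29"},
    {"cty": "21", "hwy": "29"},
    {"cty": "20", "hwy": "31"},
    {"cty": "21", "hwy": "30"},
]

def column_mean(rows, field):
    """Average one column, converting its string values to float."""
    return fmean(float(row[field]) for row in rows)

avg_cty = column_mean(sample, "cty")  # mean of 18, 21, 20, 21
avg_hwy = column_mean(sample, "hwy")  # mean of 29, 29, 31, 30
```

With the full list, `column_mean(mpg, 'cty')` would compute the same value as the expression in the cell above.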

Similarly, this is how to find the average highway fuel economy (`hwy`) across all cars.

In [10]:
sum(float(d['hwy']) for d in mpg) / len(mpg)

23.44

Use `set` to get only the unique values for the number of cylinders in the cars in our dataset.

In [12]:
cylinders = set(d['cyl'] for d in mpg)
cylinders

{'4', '5', '6', '8'}
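Note that a set has no guaranteed order and that these values are still strings. Sorting strings works here only because every value is a single digit. A short sketch of sorting numerically instead; the `wider` set is a hypothetical example, not data from this file:

```python
# Same values as the set shown above; still strings at this point.
cylinders = {"4", "5", "6", "8"}

# Convert to int before sorting so the order is numeric.
cylinder_counts = sorted(int(c) for c in cylinders)

# Hypothetical values with two digits, to show why the conversion matters.
wider = {"8", "10", "12"}
as_strings = sorted(wider)           # compared character by character
as_numbers = sorted(wider, key=int)  # compared by numeric value
```

Sorting as strings puts `'10'` and `'12'` before `'8'`, while `key=int` gives the expected numeric order.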

Here's a more complex example where we group the cars by number of cylinders and find the average cty mpg for each group.

In [14]:
CtyMpgByCyl = []

for c in cylinders:  # iterate over all the cylinder levels
    summpg = 0
    cyltypecount = 0
    for d in mpg:  # iterate over all dictionaries
        if d['cyl'] == c:  # if the cylinder count matches,
            summpg += float(d['cty'])  # add the cty mpg
            cyltypecount += 1  # increment the count
    CtyMpgByCyl.append((c, summpg / cyltypecount))  # append the tuple ('cylinder', 'avg mpg')

CtyMpgByCyl.sort(key=lambda x: x[0])
CtyMpgByCyl

[('4', 21.01), ('5', 20.50), ('6', 16.22), ('8', 12.57)]

Use `set` to return the unique values for the class types in our dataset.

In [15]:
vehicleclass = set(d['class'] for d in mpg)  # what are the class types
vehicleclass

{'2seater', 'compact', 'midsize', 'minivan', 'pickup', 'subcompact', 'suv'}

And here's an example of how to find the average hwy mpg for each class of vehicle in our dataset.

In [16]:
HwyMpgByClass = []

for t in vehicleclass:  # iterate over all the vehicle classes
    summpg = 0
    vclasscount = 0
    for d in mpg:  # iterate over all dictionaries
        if d['class'] == t:  # if the vehicle class matches,
            summpg += float(d['hwy'])  # add the hwy mpg
            vclasscount += 1  # increment the count
    HwyMpgByClass.append((t, summpg / vclasscount))  # append the tuple ('class', 'avg mpg')

HwyMpgByClass.sort(key=lambda x: x[1])
HwyMpgByClass

[('pickup', 16.88),
 ('suv', 18.13),
 ('minivan', 22.36),
 ('2seater', 24.80),
 ('midsize', 27.29),
 ('subcompact', 28.14),
 ('compact', 28.30)]
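Both grouped averages above follow the same pattern, and the nested loops scan the whole list once per group. As a possible alternative, the sketch below collects totals and counts in a single pass with `collections.defaultdict`. The helper name `average_by` is hypothetical, and the sample rows are made-up illustrative values, not rows from mpg.csv.

```python
from collections import defaultdict

def average_by(rows, group_field, value_field):
    """Return (group, mean) tuples of value_field, grouped by group_field."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:  # single pass over the data
        totals[row[group_field]] += float(row[value_field])
        counts[row[group_field]] += 1
    return [(group, totals[group] / counts[group]) for group in totals]

# Illustrative rows only (not taken from the dataset).
sample = [
    {"class": "compact", "cyl": "4", "hwy": "29"},
    {"class": "compact", "cyl": "4", "hwy": "31"},
    {"class": "suv", "cyl": "8", "hwy": "20"},
    {"class": "suv", "cyl": "6", "hwy": "16"},
]

# Sort by the average, lowest first, as in the cell above.
by_class = sorted(average_by(sample, "class", "hwy"), key=lambda pair: pair[1])
```

Applied to the full list, `average_by(mpg, 'class', 'hwy')` and `average_by(mpg, 'cyl', 'cty')` would cover both examples with one function.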