<a href="https://colab.research.google.com/github/luis-telesforo/Cleaning-Data/blob/main/reading_csv.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Reading a CSV file with the `csv` module
We have a CSV file with MPG (miles per gallon) records. We use it to compute the mean city MPG, both overall and grouped by different variables.

The data comes from the MPG records file (mpg.csv) provided by the University of Michigan (U-M).

In [1]:
import csv

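# Display floats with 2 decimal places (IPython magic; affects display only, not the stored values):
%precision 2

# newline="" is what the csv module documentation recommends when opening CSV files
with open("/content/drive/MyDrive/bases de datos michigan/mpg.csv", newline="") as csvfile:
  mpg = list(csv.DictReader(csvfile))  # one dict per data row, keyed by the header line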
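To see what `csv.DictReader` produces without access to the Drive file, here is a minimal sketch that parses an in-memory stand-in via `io.StringIO`. The stand-in text is an assumption: it rebuilds the header and first record from the values displayed for `mpg[0]` in this notebook, not a verbatim copy of mpg.csv. Each row becomes a dictionary keyed by the header line, and every value is read as a string.

```python
import csv
import io

# Assumed in-memory stand-in for mpg.csv: header plus the first record as
# displayed for mpg[0]. The first header field is empty (unnamed index column).
sample = (
    ",manufacturer,model,displ,year,cyl,trans,drv,cty,hwy,fl,class\n"
    "1,audi,a4,1.8,1999,4,auto(l5),f,18,29,p,compact\n"
)

rows = list(csv.DictReader(io.StringIO(sample)))

print(len(rows))             # one dict per data row
print(rows[0]["model"])      # values are looked up by column name
print(type(rows[0]["cty"]))  # every value is read as a string
```

Because numbers arrive as strings, any arithmetic on columns such as `cty` needs an explicit conversion like `float(...)`.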

The list **mpg** contains one dictionary for each of the 234 cars listed in the file. Every dictionary has the same set of keys (taken from the header line of the CSV). In other words, each key *k* is a variable whose values are **mpg[n][k]** for $0\leq n\leq 233$.
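The claim that every record shares the same keys can be checked directly. A minimal sketch, using a small hypothetical sample (column subset and values made up for illustration) in place of the real file:

```python
import csv
import io

# Hypothetical two-row sample; only the structure matters here.
sample = (
    ",manufacturer,model,cty,hwy\n"
    "1,audi,a4,18,29\n"
    "2,audi,a4,21,29\n"
)
mpg = list(csv.DictReader(io.StringIO(sample)))

# DictReader builds every row's keys from the header line, so (as long as no
# row has more fields than the header) all rows share the same keys.
same_keys = all(d.keys() == mpg[0].keys() for d in mpg)
print(same_keys)

# A key k is then a variable whose values are mpg[n][k] for 0 <= n < len(mpg).
cty_values = [mpg[n]["cty"] for n in range(len(mpg))]
print(cty_values)
```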

Here is an example of a dictionary in **mpg**:

In [6]:
mpg[0]

OrderedDict([('', '1'),
             ('manufacturer', 'audi'),
             ('model', 'a4'),
             ('displ', '1.8'),
             ('year', '1999'),
             ('cyl', '4'),
             ('trans', 'auto(l5)'),
             ('drv', 'f'),
             ('cty', '18'),
             ('hwy', '29'),
             ('fl', 'p'),
             ('class', 'compact')])

So our variables are the following keys (the empty key `''` corresponds to the unnamed row-index column of the file):

In [7]:
mpg[0].keys()

odict_keys(['', 'manufacturer', 'model', 'displ', 'year', 'cyl', 'trans', 'drv', 'cty', 'hwy', 'fl', 'class'])

We calculate the mean city MPG (that is, the miles per gallon a car achieves in city driving) across all cars in our records. We use **float** because `csv.DictReader` reads every value as a string.

In [None]:
sum(float(d["cty"]) for d in mpg)/len(mpg)

16.86
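The same mean can also be computed with `statistics.mean` from the standard library, which gives the same result as `sum(...)/len(...)` here. A minimal sketch on a few hypothetical rows standing in for the `"cty"` column:

```python
from statistics import mean

# Hypothetical stand-in rows; in the notebook these come from csv.DictReader.
mpg = [{"cty": "18"}, {"cty": "21"}, {"cty": "20"}, {"cty": "21"}]

# Convert each string to float before averaging.
cty = [float(d["cty"]) for d in mpg]

manual = sum(cty) / len(cty)
via_stats = mean(cty)

print(manual, via_stats)
```

One practical difference: on an empty list, `statistics.mean` raises `StatisticsError`, while `sum(...)/len(...)` raises `ZeroDivisionError`.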

We can extract all distinct values of the variable *cyl* and analyze our data according to this qualitative variable, which lets us compare fuel economy based on the number of cylinders a car has.

In [9]:
cylinders = set(d["cyl"] for d in mpg)  # distinct cylinder counts (as strings)

CtyMpgCyl = []  # list of tuples (c, mean city MPG for cars with c cylinders)
for c in cylinders:

  sumMpg = 0  # running total of city MPG for cars with c cylinders
  cars_with_c_cyl = 0  # number of cars with c cylinders
  for d in mpg:

    if d["cyl"] == c:  # check whether car d has c cylinders

      sumMpg += float(d["cty"])
      cars_with_c_cyl += 1

  CtyMpgCyl.append((c, sumMpg / cars_with_c_cyl))

CtyMpgCyl.sort(key=lambda x: x[0])  # sort by cylinder count (string order works for single digits)
CtyMpgCyl

[('4', 21.01), ('5', 20.50), ('6', 16.22), ('8', 12.57)]
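The loop above scans the whole list once for every distinct cylinder count. A swapped-in alternative (not the notebook's original approach) accumulates sums and counts in a single pass with `collections.defaultdict`. A minimal sketch on hypothetical rows:

```python
from collections import defaultdict

# Hypothetical rows standing in for the notebook's mpg list.
mpg = [
    {"cyl": "4", "cty": "18"},
    {"cyl": "4", "cty": "21"},
    {"cyl": "6", "cty": "16"},
    {"cyl": "8", "cty": "12"},
    {"cyl": "8", "cty": "14"},
]

totals = defaultdict(float)  # cyl -> sum of city MPG
counts = defaultdict(int)    # cyl -> number of cars

# Single pass over the data.
for d in mpg:
    totals[d["cyl"]] += float(d["cty"])
    counts[d["cyl"]] += 1

# Mean city MPG per cylinder count, sorted by cylinder count.
cty_mpg_cyl = sorted((c, totals[c] / counts[c]) for c in totals)
print(cty_mpg_cyl)
```

Sorting the string keys works here because every cylinder count is a single digit; with values such as `"10"`, sorting with `key=lambda x: int(x[0])` would keep numeric order.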

We conclude that cars with $4$ cylinders have the best mean city MPG (21.01), while cars with $8$ cylinders have the worst (12.57).

We analyze another variable, *class*, with the same algorithm:

In [8]:
car_class = set(d["class"] for d in mpg)  # distinct vehicle classes

CtyMpgClass = []  # list of tuples (c, mean city MPG for cars of class c)
for c in car_class:

  sumMpg = 0  # running total of city MPG for cars of class c
  cars_in_c = 0  # number of cars of class c
  for d in mpg:

    if d["class"] == c:  # check whether car d belongs to class c

      sumMpg += float(d["cty"])
      cars_in_c += 1

  CtyMpgClass.append((c, sumMpg / cars_in_c))

CtyMpgClass.sort(key=lambda x: x[1])  # sort by mean city MPG, lowest first
CtyMpgClass

[('pickup', 13.00),
 ('suv', 13.50),
 ('2seater', 15.40),
 ('minivan', 15.82),
 ('midsize', 18.76),
 ('compact', 20.13),
 ('subcompact', 20.37)]
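Since the *cyl* and *class* analyses differ only in the grouping key and the sort order, the algorithm can be wrapped in a single function. `mean_by` below is a hypothetical helper (not part of the original notebook), shown as a minimal sketch on made-up rows; it sorts by mean, like the *class* analysis.

```python
from collections import defaultdict


def mean_by(rows, group_key, value_key="cty"):
    """Hypothetical helper: return (group, mean of value_key) tuples sorted by mean.

    rows are dicts of strings, as produced by csv.DictReader.
    """
    totals = defaultdict(float)  # group -> sum of values
    counts = defaultdict(int)    # group -> number of rows
    for d in rows:
        totals[d[group_key]] += float(d[value_key])
        counts[d[group_key]] += 1
    return sorted(((g, totals[g] / counts[g]) for g in totals), key=lambda x: x[1])


# Made-up rows for illustration.
rows = [
    {"class": "suv", "cty": "13", "hwy": "17"},
    {"class": "suv", "cty": "15", "hwy": "19"},
    {"class": "compact", "cty": "20", "hwy": "29"},
]

print(mean_by(rows, "class"))         # mean city MPG per class
print(mean_by(rows, "class", "hwy"))  # mean highway MPG per class
```

Passing a different `value_key` (for example `"hwy"`) reuses the same grouping logic for highway MPG.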