# What Is a CSV File?


A CSV (comma-separated values) file is a plain text file that uses a specific structure to arrange tabular data. Because it’s plain text, it can contain only text data, that is, printable ASCII or Unicode characters.

The structure of a CSV file is given away by its name. Normally, CSV files use a comma to separate each specific data value. Here’s what that structure looks like:

column 1 name,column 2 name,column 3 name
first row data 1,first row data 2,first row data 3
second row data 1,second row data 2,second row data 3


Notice how each piece of data is separated by a comma. Normally, the first line is a header that names each data column. Every line after that is actual data, and the number of lines is limited only by file size constraints.

In general, the separator character is called a delimiter, and the comma is not the only one used. Other popular delimiters include the tab (\t), colon (:) and semi-colon (;) characters. Properly parsing a CSV file requires us to know which delimiter is being used.
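When the delimiter is not known in advance, one option is the standard library's `csv.Sniffer`, which guesses it from a sample of the text. The sketch below uses an invented, semicolon-separated sample held in memory (it is not part of any dataset used here). The sniffer is a heuristic, so its guess should be checked against the data.

```python
import csv
import io

# Hypothetical sample that uses ';' instead of ',' as the delimiter.
sample = "name;city;year\nAda;London;1815\nAlan;London;1912\n"

# Ask the sniffer to choose among a few common delimiters.
dialect = csv.Sniffer().sniff(sample, delimiters=";,\t:")
print(repr(dialect.delimiter))

# Parse the same text with the detected dialect.
rows = list(csv.reader(io.StringIO(sample), dialect))
print(rows[0])
```

If the delimiter is already known, passing it directly through the `delimiter` parameter is simpler and more predictable than sniffing.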


# Where Do CSV Files Come From?

CSV files are normally created by programs that handle large amounts of data. They are a convenient way to export data from spreadsheets and databases as well as import or use it in other programs. For example, you might export the results of a data mining program to a CSV file and then import that into a spreadsheet to analyze the data, generate graphs for a presentation, or prepare a report for publication.

CSV files are very easy to work with programmatically. Any language that supports text file input and string manipulation (like Python) can work with CSV files directly.
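As a rough illustration of working with CSV text using only string manipulation, the sketch below splits an invented in-memory sample on commas. This naive approach breaks as soon as a field itself contains a comma or a line break, which is the case a dedicated CSV parser handles.

```python
# Hypothetical CSV text; simple fields with no embedded commas or quotes.
text = "name,course\nJohn Smith,Accounting\nErica Meyers,IT\n"

# Split the text into lines, then split each line on the comma.
rows = [line.split(",") for line in text.splitlines()]
header, data = rows[0], rows[1:]

print(header)
print(data)
```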



# Reading and Writing CSV files


## Reading CSV Files With csv

Reading from a CSV file is done using the reader object. The CSV file is opened as a text file with Python’s built-in open() function, which returns a file object. This is then passed to the reader, which does the heavy lifting.


CSV (Comma Separated Values) format is the most common import and export format for spreadsheets and databases. 


The csv module implements classes to read and write tabular data in CSV format. It allows programmers to say, “write this data in the format preferred by Excel,” or “read data from this file which was generated by Excel,” without knowing the precise details of the CSV format used by Excel. Programmers can also describe the CSV formats understood by other applications or define their own special-purpose CSV formats.

**reader and writer objects**
The csv module’s reader and writer objects read and write sequences. 

**DictReader and DictWriter classes**
Programmers can also read and write data in dictionary form using the DictReader and DictWriter classes.



For more information: 
https://docs.python.org/3/library/csv.html

https://docs.python.org/3.1/library/csv.html



Let's import datafile mpg.csv, which contains fuel economy data for 234 cars.

* mpg : miles per gallon
* class : car classification
* cty : city mpg
* cyl : # of cylinders
* displ : engine displacement in liters
* drv : f = front-wheel drive, r = rear wheel drive, 4 = 4wd
* fl : fuel (e = ethanol E85, d = diesel, r = regular, p = premium, c = CNG)
* hwy : highway mpg
* manufacturer : automobile manufacturer
* model : model of car
* trans : type of transmission
* year : model year

In [0]:
from google.colab import drive
drive.mount('/content/drive')

In [0]:
csv_file_path = '/content/drive/My Drive/Goal - Data Science/python/Section:4-File IO/mpg.csv'

In [0]:
import csv
%precision 2


`%precision 2` is an IPython magic. It controls how floating-point results are displayed in cell output, here with two decimal places.

In [0]:
# newline='' lets the csv module handle line endings itself, as the csv docs recommend
with open(csv_file_path, newline='') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    
    #print(csv_reader)
    #print ( list(csv_reader) )

    for line_count, row in enumerate(csv_reader):
        if line_count == 0:
            print(f'Column names are {", ".join(row)}')
        else:
            print(f'\t{row[0]}.   {row[2]} - {row[4]} of {row[1]} has {row[5]} cylinders.')

    # enumerate() counts from 0, so the number of lines read is line_count + 1
    print(f'Processed {line_count + 1} lines.')
    

Each row returned by the reader is a list of strings: the fields obtained by splitting the line on the delimiter. The first row contains the column names, so the loop prints it separately from the data rows.
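Because every field arrives as a string, numeric columns have to be converted explicitly before doing arithmetic. A minimal sketch, using an invented two-row sample rather than the real mpg.csv, that also shows `next()` as another way to take the header row off the reader:

```python
import csv
import io

# Hypothetical sample with a header and two data rows.
sample = "model,year,cyl\na4,1999,4\nmustang,2008,8\n"

reader = csv.reader(io.StringIO(sample))
header = next(reader)  # consume the header row before looping over data

# Each remaining row is a list of str; convert the year column to int.
years = [int(row[1]) for row in reader]

print(header)
print(years)
```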

## Reading CSV Files Into a Dictionary With csv

Rather than dealing with a list of strings, you can read CSV data directly into dictionaries, one per row, keyed by column name.


In [0]:
with open(csv_file_path) as csv_file:
    csv_reader = csv.DictReader(csv_file, delimiter=',')
    print(list(csv_reader))
    


In [0]:
with open(csv_file_path) as csv_file:
    csv_reader_list =  list( csv.DictReader(csv_file, delimiter=',') )

    for line_count, row in enumerate(csv_reader_list):
        if line_count == 0:
            print(f'Column names are {", ".join(row)}')
        print(f'\t{row[""]}.   {row["model"]} - {row["year"]} of {row["manufacturer"]} has {row["cyl"]} cylinders.')

    print(f'Processed {line_count+1} lines.')


## Optional Python CSV reader Parameters

The reader object can handle different styles of CSV files by specifying additional parameters, some of which are shown below:

*   **delimiter** specifies the character used to separate each field. The default is the comma (',').

*   **quotechar** specifies the character used to surround fields that contain the delimiter character. The default is a double quote ('"').

*   **escapechar** specifies the character used to escape the delimiter character, in case quotes aren’t used. The default is no escape character.
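The parameters above can be tried out on small in-memory strings. In this sketch the data, the `|` delimiter, and the single-quote quotechar are all invented for illustration:

```python
import csv
import io

# '|' separates fields; single quotes wrap the field that contains a '|'.
quoted = "name|note\nAda|'likes | pipes'\n"
rows_quoted = list(csv.reader(io.StringIO(quoted), delimiter="|", quotechar="'"))

# Same data without quotes: a backslash escapes the embedded delimiter.
escaped = "name|note\nAda|likes \\| pipes\n"
rows_escaped = list(csv.reader(io.StringIO(escaped), delimiter="|", escapechar="\\"))

print(rows_quoted[1])
print(rows_escaped[1])
```

Both readers should recover the same two fields for the data row, with the embedded `|` kept as part of the second field.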


## Writing CSV Files With csv

You can also write to a CSV file using a writer object and its .writerow() method.

In [0]:
# newline='' prevents extra blank lines between rows on some platforms
with open('employee_file.csv', mode='w', newline='') as csv_file:
    csv_writer = csv.writer(csv_file, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)

    csv_writer.writerow(['Name', 'Course', 'Month'])
    csv_writer.writerow(['John Smith', 'Accounting', 'November'])
    csv_writer.writerow(['Erica Meyers', 'IT', 'March'])
    csv_writer.writerow(['Kristin', 'Data Science', 'May'])


The quotechar optional parameter tells the writer which character to use to quote fields when writing. Whether quoting is used or not, however, is determined by the quoting optional parameter:

*   If quoting is set to csv.QUOTE_MINIMAL, then .writerow() will quote fields only if they contain special characters such as the delimiter, the quotechar, or a line break. This is the default case.

*   If quoting is set to csv.QUOTE_ALL, then .writerow() will quote all fields.

*   If quoting is set to csv.QUOTE_NONNUMERIC, then .writerow() will quote all non-numeric fields and write numbers unquoted. (When reading with this setting, the reader converts all unquoted fields to the float data type.)

*   If quoting is set to csv.QUOTE_NONE, then .writerow() will never quote fields and will escape delimiters instead. In this case, you must provide a value for the escapechar optional parameter if the data contains any special characters; otherwise the writer raises an error.
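To compare the quoting modes side by side, the sketch below writes one invented record to an in-memory buffer under each setting. The `render` helper is a hypothetical convenience function defined here, not part of the csv module.

```python
import csv
import io

# Hypothetical record: text containing a comma, a number, and plain text.
row = ["Smith, John", 42, "IT"]

def render(**fmtparams):
    """Write `row` with the given csv.writer options and return the line."""
    buffer = io.StringIO()
    csv.writer(buffer, **fmtparams).writerow(row)
    return buffer.getvalue().rstrip("\r\n")

minimal = render(quoting=csv.QUOTE_MINIMAL)
quote_all = render(quoting=csv.QUOTE_ALL)
nonnumeric = render(quoting=csv.QUOTE_NONNUMERIC)
no_quotes = render(quoting=csv.QUOTE_NONE, escapechar="\\")

for line in (minimal, quote_all, nonnumeric, no_quotes):
    print(line)
```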


## Writing CSV File From a Dictionary With csv

Since you can read your data into a dictionary, it’s only fair that you should be able to write it out from a dictionary as well.


In [0]:
# newline='' prevents extra blank lines between rows on some platforms
with open('employee_file2.csv', mode='w', newline='') as csv_file:
    fieldnames = ['Name', 'Course', 'Month']

    csv_writer = csv.DictWriter(csv_file, fieldnames=fieldnames)

    csv_writer.writeheader()
    csv_writer.writerow({'Name': 'John Smith', 'Course': 'Accounting', 'Month': 'November'})
    csv_writer.writerow({'Name': 'Erica Meyers', 'Course': 'IT', 'Month': 'March'})
    csv_writer.writerow({'Name': 'Kristin', 'Course': 'Data Science', 'Month': 'May'})


The fieldnames parameter is required when writing a dictionary. This makes sense when you think about it: without a list of field names, the DictWriter can’t know which keys to use to retrieve values from your dictionaries, or in which order to write them.
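The field names also decide what happens when a dictionary does not match them exactly. As a small sketch with invented records: a missing key is filled with the `restval` value, and by default a key that is not in the field names raises a `ValueError`.

```python
import csv
import io

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["Name", "Course", "Month"], restval="n/a")
writer.writeheader()

# This record has no 'Month' key, so restval is written in its place.
writer.writerow({"Name": "Kristin", "Course": "Data Science"})
output = buffer.getvalue()
print(output)

# 'Age' is not one of the field names; the default extrasaction='raise' rejects it.
try:
    writer.writerow({"Name": "John Smith", "Age": 30})
except ValueError as exc:
    error = str(exc)
    print("Rejected:", error)
```

Passing `extrasaction='ignore'` to `DictWriter` would silently drop unexpected keys instead of raising.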

# Operations on the List of Rows Read From the CSV File

In [0]:
mpg = csv_reader_list  # shorter name for the list of row dictionaries, used in the cells below
len(mpg)

**Using Slicing:** Get the first three dictionaries in our list.

In [0]:
csv_reader_list[:3]

`keys()` returns the column names of our CSV file.

In [0]:
csv_reader_list[0].keys()

## Find the average cty fuel economy across all cars

All values in the dictionaries are strings, so we need to convert to float.

In [0]:
avg_cty = sum(float(d['cty']) for d in mpg) / len(mpg)
print(avg_cty)

## Find the average hwy fuel economy across all cars

In [0]:
print(mpg)

In [0]:
#list(map(lambda d: d['hwy'] , mpg))
list(map(lambda d: float(d['hwy']) , mpg))


In [0]:
print("Maximum Value")
print( max(list(map(lambda d: float(d['hwy']) , mpg))) )

print("Minimum Value")
print( min(list(map(lambda d: float(d['hwy']) , mpg))) )


In [0]:
# Method 1
print("Sum using lambda")
print( sum(list(map(lambda d: float(d['hwy']) , mpg))) )

# Method 2
print("Sum using a generator expression")
#total_sum =  sum( d['hwy'] for d in mpg )
total_sum =  sum(float(d['hwy']) for d in mpg)
print(total_sum)

In [0]:
avg_hwy = sum(float(d['hwy']) for d in mpg) / len(mpg)
print(avg_hwy)
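The same average can also be computed with `statistics.mean` from the standard library, which avoids dividing by the length by hand. The rows below are invented stand-ins for the dictionaries read from the file.

```python
from statistics import mean

# Hypothetical rows shaped like the DictReader output (values are strings).
rows = [{"hwy": "29"}, {"hwy": "26"}, {"hwy": "20"}]

# Convert each value to float, then let mean() do the sum and division.
avg = mean(float(d["hwy"]) for d in rows)
print(avg)
```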

Use `set` to return the unique values for the number of cylinders the cars in our dataset have.

In [0]:
cylinders = set(d['cyl'] for d in mpg)
cylinders

### Grouping the cars by number of cylinders, and finding the average cty mpg for each group.


This more involved example groups the cars by number of cylinders and computes the average cty mpg for each group.




In [0]:
CtyMpgByCyl = []

for c in cylinders: # iterate over all the cylinder levels
    summpg = 0
    cyltypecount = 0

    for d in mpg: # iterate over all dictionaries
        if d['cyl'] == c: # if the cylinder level type matches,
            summpg += float(d['cty']) # add the cty mpg
            cyltypecount += 1 # increment the count

    CtyMpgByCyl.append((c, summpg / cyltypecount)) # append the tuple ('cylinder', 'avg mpg')

print("Before Sorting")
print(CtyMpgByCyl)

print("After Sorting")
CtyMpgByCyl.sort(key=lambda x: x[0])

print(CtyMpgByCyl)
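The nested loops above scan the whole list once per cylinder value. One alternative, sketched here on a few invented rows, is to accumulate a running sum and count per group in a single pass with `collections.defaultdict`:

```python
from collections import defaultdict

# Hypothetical rows shaped like the DictReader output (values are strings).
rows = [
    {"cyl": "4", "cty": "18"},
    {"cyl": "4", "cty": "22"},
    {"cyl": "6", "cty": "16"},
]

# Map each cylinder value to [running sum of cty, number of cars].
totals = defaultdict(lambda: [0.0, 0])
for d in rows:
    totals[d["cyl"]][0] += float(d["cty"])
    totals[d["cyl"]][1] += 1

# Build (cylinders, average cty) tuples, sorted by cylinder value.
averages = sorted((cyl, total / count) for cyl, (total, count) in totals.items())
print(averages)
```

The result has the same shape as the sorted list of tuples built by the loop version, so either approach can feed the same follow-up code.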

Use `set` to return the unique values for the class types in our dataset.

In [0]:
vehicleclass = set(d['class'] for d in mpg) # what are the class types
vehicleclass

### Find the average hwy mpg for each class of vehicle in our dataset.

In [0]:
HwyMpgByClass = []

for t in vehicleclass: # iterate over all the vehicle classes
    summpg = 0
    vclasscount = 0
    for d in mpg: # iterate over all dictionaries
        if d['class'] == t: # if the vehicle class matches,
            summpg += float(d['hwy']) # add the hwy mpg
            vclasscount += 1 # increment the count
    HwyMpgByClass.append((t, summpg / vclasscount)) # append the tuple ('class', 'avg mpg')

HwyMpgByClass.sort(key=lambda x: x[1])
HwyMpgByClass