# CSV Files in Python

## A. J. Zerouali (21/07/27)

This is Lecture 130 of Section 17, in Pierian Data's Python Bootcamp. The library used here is Python's built-in CSV library (it's just called **csv**), which does the raw basics: opening, reading, converting to lists, writing and saving.


In [1]:
import csv

There are several other Python libraries for CSV processing:

* The most relevant one for data science is Pandas, which reads CSVs into DataFrames and also contains some functions for data visualization.

* Openpyxl, which works with Excel workbooks (.xlsx) and is compatible with the more sophisticated functionalities of MS Excel. A CSV is plain text, so with Python's csv and Pandas only raw data can be stored in it, whereas Openpyxl allows us to do more.

* There's also Google's library to work with Google Spreadsheets, which is not limited to Python.


## Opening CSVs

* Opening a CSV file is done with Python's usual **open(*file_path_name*)**. To avoid encoding errors, we specify the encoding with the **encoding = 'utf-8'** argument. Note that a CSV saved with a different encoding (common for non-English data) may require another value.

In [24]:
data_clients_csv = open('example_data_clients.csv', encoding = 'utf-8')

* To convert the file contents into a Python CSV object, we first use **csv.reader(*file_contents*, delimiter = ',')**. The delimiter parameter (',' by default) lets us read the data properly when the file uses another separator, e.g. ';' or '\t' (tabs).

* **Important:** The function **csv.reader()** creates a **reader object that reads rows from the file on demand, so it only works while the file is open**.

* The rows are converted into a list of lists by calling the **list()** function on the reader:

In [25]:
data_clients = csv.reader(data_clients_csv, delimiter = ',')
type(data_clients)

_csv.reader

In [26]:
data_clients_lst = list(data_clients)

# Print the header and the first five data rows
for row in data_clients_lst[:6]:
    print(row)

['id', 'first_name', 'last_name', 'email', 'gender', 'ip_address', 'city']
['1', 'Joseph', 'Zaniolini', 'jzaniolini0@simplemachines.org', 'Male', '163.168.68.132', 'Pedro Leopoldo']
['2', 'Freida', 'Drillingcourt', 'fdrillingcourt1@umich.edu', 'Female', '97.212.102.79', 'Buri']
['3', 'Nanni', 'Herity', 'nherity2@statcounter.com', 'Female', '145.151.178.98', 'Claver']
['4', 'Orazio', 'Frayling', 'ofrayling3@economist.com', 'Male', '25.199.143.143', 'Kungur']
['5', 'Julianne', 'Murrison', 'jmurrison4@cbslocal.com', 'Female', '10.186.243.144', 'Sainte-Luce-sur-Loire']
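
To illustrate the delimiter parameter, here is a minimal sketch using made-up sample text instead of the lecture's file. `io.StringIO` stands in for an opened file, and the variable names are chosen only for this example.

```python
import csv
import io

# Made-up sample data; io.StringIO behaves like an opened text file.
semicolon_text = 'id;first_name;city\n1;Joseph;Pedro Leopoldo\n'
tab_text = 'id\tfirst_name\tcity\n1\tJoseph\tPedro Leopoldo\n'

# Tell the reader which separator each source uses.
semicolon_rows = list(csv.reader(io.StringIO(semicolon_text), delimiter=';'))
tab_rows = list(csv.reader(io.StringIO(tab_text), delimiter='\t'))

# With the wrong delimiter, each line comes back as a single field.
wrong_rows = list(csv.reader(io.StringIO(semicolon_text), delimiter=','))

print(semicolon_rows[1])  # ['1', 'Joseph', 'Pedro Leopoldo']
print(tab_rows[1])        # ['1', 'Joseph', 'Pedro Leopoldo']
print(wrong_rows[1])      # ['1;Joseph;Pedro Leopoldo']
```

A row that comes back as one long string is usually a sign that the delimiter does not match the file.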


* In this file, the first line contains the labels of the columns (separated by the delimiter, like every other row). This is a common convention, but not every CSV has a header row.

In [27]:
data_clients_lst[0]

['id', 'first_name', 'last_name', 'email', 'gender', 'ip_address', 'city']
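
When a file does have a header row, the csv module's **csv.DictReader** can use it as keys, so each row becomes a dictionary. This is not covered in the lecture. The sketch below uses made-up sample text in place of the real file.

```python
import csv
import io

# Made-up sample data with a header row; io.StringIO stands in for a file.
sample_text = 'id,first_name,city\n1,Joseph,Pedro Leopoldo\n2,Freida,Buri\n'

# DictReader takes the first row as field names by default.
dict_reader = csv.DictReader(io.StringIO(sample_text))
records = list(dict_reader)

print(dict_reader.fieldnames)  # ['id', 'first_name', 'city']
print(records[0]['city'])      # Pedro Leopoldo
```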

In [28]:
data_clients_csv.close()
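
An alternative to calling **.close()** by hand is a `with` block, which closes the file automatically. The sketch below creates a small temporary file so that it can run on its own (the path and contents are made up). It also shows that a reader stops working once its file is closed.

```python
import csv
import os
import tempfile

# Hypothetical sample file, created only so the sketch is self-contained.
sample_path = os.path.join(tempfile.mkdtemp(), 'sample_clients.csv')
with open(sample_path, mode='w', encoding='utf-8', newline='') as sample_file:
    sample_file.write('id,first_name\n1,Joseph\n2,Freida\n')

# The file is closed when the with block ends, even if an error occurs.
with open(sample_path, encoding='utf-8', newline='') as sample_file:
    reader = csv.reader(sample_file)
    header = next(reader)  # read the first row while the file is open

# The file is closed here, so the reader cannot fetch further rows.
try:
    next(reader)
    read_after_close = True
except ValueError:
    read_after_close = False

print(header)            # ['id', 'first_name']
print(read_after_close)  # False
```

This is why the rows are converted with **list()** before the file is closed.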

* Once we get the list (of lists) of rows in the CSV file, we can essentially manipulate the data with the usual Python functions. For example, let's make a new list with the full names and cities of the clients:

In [31]:
data_clients_names = [['Full Name', 'City']]

# Skip the header row; columns 1, 2 and 6 are first_name, last_name and city
for row in data_clients_lst[1:]:
    data_clients_names.append([row[1] + ' ' + row[2], row[6]])

In [33]:
for row in data_clients_names[0:6]:
    print(row)


['Full Name', 'City']
['Joseph Zaniolini', 'Pedro Leopoldo']
['Freida Drillingcourt', 'Buri']
['Nanni Herity', 'Claver']
['Orazio Frayling', 'Kungur']
['Julianne Murrison', 'Sainte-Luce-sur-Loire']
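
The column positions 1, 2 and 6 above are hard-coded. As a possible variation, they can be looked up from the header row, which keeps working if the columns are reordered. The sketch uses two rows copied from the output above as stand-in data, and the variable names are chosen only for this example.

```python
# Stand-in for data_clients_lst: the header plus two rows from the file.
sample_rows = [
    ['id', 'first_name', 'last_name', 'email', 'gender', 'ip_address', 'city'],
    ['1', 'Joseph', 'Zaniolini', 'jzaniolini0@simplemachines.org', 'Male',
     '163.168.68.132', 'Pedro Leopoldo'],
    ['2', 'Freida', 'Drillingcourt', 'fdrillingcourt1@umich.edu', 'Female',
     '97.212.102.79', 'Buri'],
]

# Split off the header and find each column's position by its label.
header, *body = sample_rows
first = header.index('first_name')
last = header.index('last_name')
city = header.index('city')

names_and_cities = [['Full Name', 'City']]
names_and_cities += [[f'{row[first]} {row[last]}', row[city]] for row in body]

for row in names_and_cities:
    print(row)
```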


## Writing CSVs

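* Writing a CSV file uses the **csv.writer()** function. First we open a new file in write ('w') or append ('a') mode, enter the appropriate encoding, and pass **newline = ''** so that Python does not translate line endings and the csv module writes them itself. Without it, extra blank lines can appear between rows on Windows (see the help on open() and the csv module documentation).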

In [41]:
# Open/create the file in which we'll save the data
file_clients_names = open('example_data_clients_names_cities.csv', mode = 'w', encoding='utf-8', newline = '')
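
To see why **newline = ''** matters, the sketch below writes one row into an in-memory buffer and inspects the raw text. By default the writer ends each row with '\r\n' itself, so any further newline translation by the file object would add a second carriage return. The buffer (`io.StringIO`) is only a stand-in for a real file.

```python
import csv
import io

# In-memory stand-in for a file opened with newline=''.
buffer = io.StringIO(newline='')
csv.writer(buffer).writerow(['Full Name', 'City'])

# The writer has already terminated the row with '\r\n'.
raw_text = buffer.getvalue()
print(repr(raw_text))  # 'Full Name,City\r\n'
```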


* Next, create a writer object. The first argument is the file (variable) in which we'll write the rows, and we also specify the delimiter. 

In [42]:
csv_writer = csv.writer(file_clients_names, delimiter = ',')

* The actual writing is done with **.writerow(*row_list*)** for a single row, or **.writerows(*list_rows*)** for a list of rows:

In [43]:
csv_writer.writerows(data_clients_names)

In [44]:
file_clients_names.close()
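
As a quick check of the whole writing procedure, here is a minimal round-trip sketch. It writes a header with **.writerow()**, the remaining rows with **.writerows()**, and reads the file back. The temporary path and the two sample rows are assumptions made so the example runs on its own.

```python
import csv
import os
import tempfile

# Sample rows and a temporary output path, used only for this sketch.
rows_to_save = [['Joseph Zaniolini', 'Pedro Leopoldo'],
                ['Freida Drillingcourt', 'Buri']]
out_path = os.path.join(tempfile.mkdtemp(), 'names_cities.csv')

# Write one header row, then all the data rows at once.
with open(out_path, mode='w', encoding='utf-8', newline='') as out_file:
    writer = csv.writer(out_file, delimiter=',')
    writer.writerow(['Full Name', 'City'])
    writer.writerows(rows_to_save)

# Read the file back to confirm that the contents survived unchanged.
with open(out_path, encoding='utf-8', newline='') as in_file:
    rows_read = list(csv.reader(in_file))

print(rows_read[0])    # ['Full Name', 'City']
print(len(rows_read))  # 3
```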