# CSV files

Here is a fragment of a CSV file, opened in Excel:

<img src='./startup_funding.png' alt='CSV file opened in Excel' style='height:350px'>

Working with CSV files is fairly simple in Python, using the `csv` module from the standard library: https://docs.python.org/3/library/csv.html

In [1]:
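import csv

# newline='' lets the csv module handle line endings itself
with open('startup_funding.csv', newline='') as rawdata:
    # DictReader takes the first row as field names and yields one dict per row
    reader = csv.DictReader(rawdata, delimiter=',', quotechar='"')
    for row in reader:
        print('{0} {1} {2}'.format(row['Date'], row['StartupName'], row['AmountInUSD']))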

01/08/2017 TouchKin 1,300,000
02/08/2017 Ethinos 
02/08/2017 Leverage Edu 
02/08/2017 Zepo 500,000
02/08/2017 Click2Clinic 850,000
01/07/2017 Billion Loans 1,000,000
03/07/2017 Ecolibriumenergy 2,600,000
04/07/2017 Droom 20,000,000
05/07/2017 Jumbotail 8,500,000
05/07/2017 Moglix 12,000,000
05/07/2017 Timesaverz 1,000,000
06/07/2017 Minjar 
06/07/2017 MyCity4kids 
07/07/2017 Clip App 1,000,000
07/07/2017 Upwardly.in 
10/07/2017 Autorox.co 3,000,000
11/07/2017 Fabogo 2,250,000
11/07/2017 Flickstree 464,000
11/07/2017 Design Cafe 
12/07/2017 Innoviti 18,500,000
12/07/2017 VDeliver 
12/07/2017 Bottr.me 
12/07/2017 Arcatron 
14/07/2017 QwikSpec 540,000
14/07/2017 Chumbak 1,700,000
17/07/2017 Increff 2,000,000
17/07/2017 Vayana 4,000,000
18/07/2017 MObiquest 
18/07/2017 Ambee 
18/07/2017 Ideal Insurance 
18/07/2017 Hypernova Interactive 
19/07/2017 Rentomojo 10,000,000
19/07/2017 AirCTO 
19/07/2017 Playablo 600,000
20/07/2017 Trupay 700,000
21/07/2017 Brick2Wall 200,000
21/07/2017 FableStre

In [2]:
with open('startup_funding.csv', newline='') as rawdata:
    reader = csv.DictReader(rawdata, delimiter=',', quotechar='"')
    # Advance the reader by hand; next() raises StopIteration once the rows run out
    row = next(reader)
    while row:
        print('{0} {1} {2}'.format(row['Date'], row['StartupName'], row['AmountInUSD']))
        row = next(reader)

01/08/2017 TouchKin 1,300,000
02/08/2017 Ethinos 
02/08/2017 Leverage Edu 
02/08/2017 Zepo 500,000
02/08/2017 Click2Clinic 850,000
01/07/2017 Billion Loans 1,000,000
03/07/2017 Ecolibriumenergy 2,600,000
04/07/2017 Droom 20,000,000
05/07/2017 Jumbotail 8,500,000
05/07/2017 Moglix 12,000,000
05/07/2017 Timesaverz 1,000,000
06/07/2017 Minjar 
06/07/2017 MyCity4kids 
07/07/2017 Clip App 1,000,000
07/07/2017 Upwardly.in 
10/07/2017 Autorox.co 3,000,000
11/07/2017 Fabogo 2,250,000
11/07/2017 Flickstree 464,000
11/07/2017 Design Cafe 
12/07/2017 Innoviti 18,500,000
12/07/2017 VDeliver 
12/07/2017 Bottr.me 
12/07/2017 Arcatron 
14/07/2017 QwikSpec 540,000
14/07/2017 Chumbak 1,700,000
17/07/2017 Increff 2,000,000
17/07/2017 Vayana 4,000,000
18/07/2017 MObiquest 
18/07/2017 Ambee 
18/07/2017 Ideal Insurance 
18/07/2017 Hypernova Interactive 
19/07/2017 Rentomojo 10,000,000
19/07/2017 AirCTO 
19/07/2017 Playablo 600,000
20/07/2017 Trupay 700,000
21/07/2017 Brick2Wall 200,000
21/07/2017 FableStre

StopIteration: 
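
The `StopIteration` above is raised because `next(reader)` has no more rows to return, so the `while row:` test never gets to see a falsy value. The following is a minimal sketch of one way to end the loop cleanly, by passing a default of `None` to the built-in `next()`. The in-memory `SAMPLE` data and the `read_rows` helper are hypothetical and only keep the example self-contained; they are not part of the original notebook.

```python
import csv
import io

# Hypothetical sample standing in for startup_funding.csv
SAMPLE = (
    "Date,StartupName,AmountInUSD\n"
    '01/08/2017,TouchKin,"1,300,000"\n'
    "02/08/2017,Ethinos,\n"
)

def read_rows(stream):
    """Collect rows by calling next() by hand, without ending in an exception."""
    reader = csv.DictReader(stream, delimiter=',', quotechar='"')
    rows = []
    # next(reader, None) returns None when the reader is exhausted
    # instead of raising StopIteration
    row = next(reader, None)
    while row is not None:
        rows.append(row)
        row = next(reader, None)
    return rows

for row in read_rows(io.StringIO(SAMPLE)):
    print('{0} {1} {2}'.format(row['Date'], row['StartupName'], row['AmountInUSD']))
```

In everyday code the `for row in reader:` form from the first cell is usually the simpler choice, since the `for` statement handles the end of the iteration on its own.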

## Data transformation

The values read from a CSV file are always strings...

In [3]:
with open('startup_funding.csv', newline='') as rawdata:
    reader = csv.DictReader(rawdata, delimiter=',', quotechar='"')
    # Inspect only the first data row: every value comes back as str
    row = next(reader)
    for k, v in row.items():
        print(k, v, type(v))

SNo 0 <class 'str'>
Date 01/08/2017 <class 'str'>
StartupName TouchKin <class 'str'>
IndustryVertical Technology <class 'str'>
SubVertical Predictive Care Platform <class 'str'>
CityLocation Bangalore <class 'str'>
InvestorsName Kae Capital <class 'str'>
InvestmentType Private Equity <class 'str'>
AmountInUSD 1,300,000 <class 'str'>
Remarks  <class 'str'>


... but they can be converted to the types you need with the appropriate libraries:

In [4]:
import datetime # https://docs.python.org/3/library/datetime.html
with open('startup_funding.csv', newline='') as rawdata:
    reader = csv.DictReader(rawdata, delimiter=',', quotechar='"')
    for row in reader:
        sno = int(row['SNo'])
        # Amounts use thousands separators ('1,300,000'); empty cells become None
        if row['AmountInUSD'] != '':
            amount = int(row['AmountInUSD'].replace(',', ''))
        else:
            amount = None
        # Dates are expected as day/month/year; any other layout raises ValueError
        date = datetime.datetime.strptime(row['Date'], '%d/%m/%Y')
        print(sno, amount, date)

0 1300000 2017-08-01 00:00:00
1 None 2017-08-02 00:00:00
2 None 2017-08-02 00:00:00
3 500000 2017-08-02 00:00:00
4 850000 2017-08-02 00:00:00
5 1000000 2017-07-01 00:00:00
6 2600000 2017-07-03 00:00:00
7 20000000 2017-07-04 00:00:00
8 8500000 2017-07-05 00:00:00
9 12000000 2017-07-05 00:00:00
10 1000000 2017-07-05 00:00:00
11 None 2017-07-06 00:00:00
12 None 2017-07-06 00:00:00
13 1000000 2017-07-07 00:00:00
14 None 2017-07-07 00:00:00
15 3000000 2017-07-10 00:00:00
16 2250000 2017-07-11 00:00:00
17 464000 2017-07-11 00:00:00
18 None 2017-07-11 00:00:00
19 18500000 2017-07-12 00:00:00
20 None 2017-07-12 00:00:00
21 None 2017-07-12 00:00:00
22 None 2017-07-12 00:00:00
23 540000 2017-07-14 00:00:00
24 1700000 2017-07-14 00:00:00
25 2000000 2017-07-17 00:00:00
26 4000000 2017-07-17 00:00:00
27 None 2017-07-18 00:00:00
28 None 2017-07-18 00:00:00
29 None 2017-07-18 00:00:00
30 None 2017-07-18 00:00:00
31 10000000 2017-07-19 00:00:00
32 None 2017-07-19 00:00:00
33 600000 2017-07-19 00:00:00

ValueError: time data '12/05.2015' does not match format '%d/%m/%Y'
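
The `ValueError` above comes from a row whose date, `12/05.2015`, has a dot where the format string expects a slash. The following is a minimal sketch of one way to keep processing such rows, assuming it is acceptable to try a short list of formats and fall back to `None` when none of them match. The names `DATE_FORMATS`, `parse_date` and `parse_amount` are hypothetical, and the second format is a guess based only on the failing value shown in the error message.

```python
import datetime

# Candidate formats, tried in order.
# Assumption: the second one covers malformed values like '12/05.2015'.
DATE_FORMATS = ('%d/%m/%Y', '%d/%m.%Y')

def parse_date(text):
    """Return a datetime for the first matching format, or None if none match."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.datetime.strptime(text, fmt)
        except ValueError:
            continue
    return None

def parse_amount(text):
    """Convert '1,300,000' to 1300000; an empty cell becomes None."""
    cleaned = text.replace(',', '')
    return int(cleaned) if cleaned else None

print(parse_date('01/08/2017'))
print(parse_date('12/05.2015'))
print(parse_date('not a date'))
print(parse_amount('1,300,000'))
print(parse_amount(''))
```

Returning `None` keeps the loop running, at the cost of hiding bad input; depending on the use case, it may be preferable to log or collect the rows that fail to parse so they can be reviewed later.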