## Programming with Test-Driven Development

## Doctest
* Doctest automatically runs the tests written in the code's documentation. In other words, it lets you verify that the code still works correctly after changes are introduced.

In [1]:
%clear

"""
This is an example using `doctest`. Lines that start
with `>>>` are interpreted as Python code. `doctest`
extracts those lines, runs them, and compares the output with
the expected result shown below each one.

In this example, the function suma(a, b) must return a + b.
Note that the expected result of the first call is correct,
while that of the second call is incorrect.

>>> suma(1, 1)
2

>>> suma(1, 1)
3

"""

def suma(a, b):
    return a + b

if __name__ == "__main__":
    import doctest
    doctest.testmod()


**********************************************************************
File "__main__", line 15, in __main__
Failed example:
    suma(1, 1)
Expected:
    3
Got:
    2
**********************************************************************
1 items had failures:
   1 of   2 in __main__
***Test Failed*** 1 failures.
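
The failure reported above is intentional. In day-to-day test-driven work, the examples usually live in the docstring of the function they describe, and the result of the run is checked programmatically. The following is a minimal sketch of that workflow. The helper `run_docstring` is a hypothetical name introduced here for illustration; it relies only on `doctest.DocTestParser` and `doctest.DocTestRunner` from the standard library.

```python
import doctest


def suma(a, b):
    """Return a + b.

    >>> suma(1, 1)
    2
    >>> suma(2, 3)
    5
    """
    return a + b


def run_docstring(func):
    """Run the examples in func's docstring and return (failed, attempted)."""
    parser = doctest.DocTestParser()
    # Build a DocTest object from the docstring; the function itself is the
    # only name made available to the examples.
    test = parser.get_doctest(
        func.__doc__, {func.__name__: func}, func.__name__, None, 0
    )
    runner = doctest.DocTestRunner(verbose=False)
    # Discard the textual report; only the counts are needed here.
    results = runner.run(test, out=lambda message: None)
    return results.failed, results.attempted


failed, attempted = run_docstring(suma)
print(failed, attempted)  # 0 2
```

Returning the counts instead of printing a report makes it easy to fail a build script when `failed` is greater than zero.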


In [2]:
%clear
"""
This is an example using `doctest`. Lines that start
with `>>>` are interpreted as Python code. `doctest`
extracts those lines, runs them, and compares the output with
the expected result shown below each one.

In this example, the function suma(a, b) must return a + b.
Note that the expected result of the first call is correct,
while that of the second call is incorrect.

>>> suma(1, 1)
2

>>> suma(1, 1)
3

"""
def suma(a, b):
    ## new code >>>
    import pdb; pdb.set_trace()
    ## <<<
    return a + b

if __name__ == "__main__":
    import doctest
    doctest.testmod()


> <ipython-input-2-386f7a9e4583>(23)suma()
     21     import pdb; pdb.set_trace()
     22     ## <<<
---> 23     return a + b
     24
     25 if __name__ == "__main__":

--KeyboardInterrupt--

KeyboardInterrupt: Interrupted by user
**********************************************************************
File "__main__", line 15, in __main__
Failed example:
    suma(1, 1)
Expected:
    3
Got:
    2
**********************************************************************
1 items had failures:
   1 of   2 in __main__
***Test Failed*** 1 failures.


## Building a Data Pipeline in Python

In [7]:
# ## Create a directory for the example
# !mkdir directory

In [4]:
%%writefile directory/file1.csv
a,1,12
b,2,13,
c,,"14"
0,\N, "0"
a, \N, "0"
e,,
0,
\N,,
,,
a,0,0
\n,\n,\n


Writing directory/file1.csv


In [5]:
%%writefile directory/file2.csv
b,1,12
a,2,13
d,,14
0,\N,15
\n
\n

e,,
,,,
K,3,\n

Writing directory/file2.csv


In [6]:
%%writefile directory/file3.csv
b;1;12
'a';4;13
a;3;13

'c';3;17
'a';'\n';'15'
E;2;0

Writing directory/file3.csv


## Reading the files

In [9]:
import glob

filenames = glob.glob("directory/*.csv")
filenames

['directory\\file1.csv', 'directory\\file2.csv', 'directory\\file3.csv']
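
`glob.glob` returns its matches in arbitrary order, and the paths use the separator of the operating system (backslashes in the output above). When the order of the files matters for reproducibility, the list can be sorted explicitly. The sketch below uses `pathlib` and a temporary directory; `list_csv_files` is a hypothetical helper name, not part of the original notebook.

```python
import tempfile
from pathlib import Path


def list_csv_files(directory):
    """Return the CSV files in `directory`, sorted by name, as POSIX-style strings."""
    return [path.as_posix() for path in sorted(Path(directory).glob("*.csv"))]


with tempfile.TemporaryDirectory() as tmp:
    # Create the files out of order, plus one file that must be ignored.
    for name in ["file2.csv", "file1.csv", "notes.txt"]:
        (Path(tmp) / name).write_text("a,1,12\n")
    names = [Path(p).name for p in list_csv_files(tmp)]

print(names)  # ['file1.csv', 'file2.csv']
```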

In [10]:
## Create a variable to store the contents
## of the files

text = []

for filename in filenames:
    with open(filename, "rt") as f:
        text += f.readlines()
        

text

['a,1,12\n',
 'b,2,13,\n',
 'c,,"14"\n',
 '0,\\N, "0"\n',
 'a, \\N, "0"\n',
 'e,,\n',
 '0,\n',
 '\\N,,\n',
 ',,\n',
 'a,0,0\n',
 '\\n,\\n,\\n\n',
 'b,1,12\n',
 'a,2,13\n',
 'd,,14\n',
 '0,\\N,15\n',
 '\\n\n',
 '\\n\n',
 '\n',
 'e,,\n',
 ',,,\n',
 'K,3,\\n\n',
 'b;1;12\n',
 "'a';4;13\n",
 'a;3;13\n',
 '\n',
 "'c';3;17\n",
 "'a';'\\n';'15'\n",
 'E;2;0\n']

In [12]:
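## Remove the trailing newline from each line.
## Caution: line[:-1] is not idempotent. The output below is missing one extra
## character per line (e.g. 'a,1,1' instead of 'a,1,12'), which is consistent
## with this cell having been run twice (the counter jumps from 10 to 12).
## line.rstrip("\n") would be safe to re-run.
text = [line[:-1] for line in text]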
text

['a,1,1',
 'b,2,13',
 'c,,"14',
 '0,\\N, "0',
 'a, \\N, "0',
 'e,',
 '0',
 '\\N,',
 ',',
 'a,0,',
 '\\n,\\n,\\',
 'b,1,1',
 'a,2,1',
 'd,,1',
 '0,\\N,1',
 '\\',
 '\\',
 '',
 'e,',
 ',,',
 'K,3,\\',
 'b;1;1',
 "'a';4;1",
 'a;3;1',
 '',
 "'c';3;1",
 "'a';'\\n';'15",
 'E;2;']
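
Slicing with `[:-1]` removes the last character whatever it is, so re-running the cell keeps eating data. A minimal sketch of an alternative that can be run any number of times is shown below; `strip_newlines` is a hypothetical helper name and the sample lines are taken from the files above.

```python
def strip_newlines(lines):
    """Remove trailing newline characters from each line, and nothing else."""
    return [line.rstrip("\n") for line in lines]


raw = ["a,1,12\n", "b,2,13,\n", "\n", "E;2;0\n"]

once = strip_newlines(raw)
twice = strip_newlines(once)  # applying it again changes nothing

print(once)           # ['a,1,12', 'b,2,13,', '', 'E;2;0']
print(once == twice)  # True
```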

### Every record must have 3 columns

In [14]:
csv = [line.split(",") for line in text]

for line in csv:
    if len(line) != 3:
        print(line)

['e', '']
['0']
['\\N', '']
['', '']
['\\']
['\\']
['']
['e', '']
['b;1;1']
["'a';4;1"]
['a;3;1']
['']
["'c';3;1"]
["'a';'\\n';'15"]
['E;2;']


In [15]:
## Some files are delimited by ";"
## Replace ";" with "," and check again

text = [line.replace(";", ",") for line in text]

csv = [line.split(",") for line in text]

for line in csv:
    if len(line) != 3:
        print(line)

['e', '']
['0']
['\\N', '']
['', '']
['\\']
['\\']
['']
['e', '']
['']
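
Printing every offending record works for a small sample; for larger inputs a tally of field counts gives a quicker overview. The sketch below combines the delimiter normalization with such a tally. `split_fields` and its parameters are hypothetical names, and the plain `str.replace` approach assumes that no field legitimately contains a delimiter character.

```python
from collections import Counter


def split_fields(line, delimiters=(";",), target=","):
    """Replace each alternative delimiter with `target`, then split on `target`."""
    for delimiter in delimiters:
        line = line.replace(delimiter, target)
    return line.split(target)


lines = ["a,1,12", "b;1;12", "e,,", "0,", ""]
records = [split_fields(line) for line in lines]

# Number of records for each field count.
counts = Counter(len(record) for record in records)
print(counts)
```

Here three records have three fields, one has two, and the empty line yields a single empty field.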


In [17]:
## There are empty lines

text = [line for line in text if line != ""]
csv = [line.split(",") for line in text]
for line in csv:
    if len(line) != 3:
        print(line)
        
        

['e', '']
['0']
['\\N', '']
['', '']
['\\']
['\\']
['e', '']


In [18]:
## Drop the lines that do not have three fields
csv = [line.split(",") for line in text]
csv = [line for line in csv if len(line) == 3]
csv

[['a', '1', '1'],
 ['b', '2', '13'],
 ['c', '', '"14'],
 ['0', '\\N', ' "0'],
 ['a', ' \\N', ' "0'],
 ['a', '0', ''],
 ['\\n', '\\n', '\\'],
 ['b', '1', '1'],
 ['a', '2', '1'],
 ['d', '', '1'],
 ['0', '\\N', '1'],
 ['', '', ''],
 ['K', '3', '\\'],
 ['b', '1', '1'],
 ["'a'", '4', '1'],
 ['a', '3', '1'],
 ["'c'", '3', '1'],
 ["'a'", "'\\n'", "'15"],
 ['E', '2', '']]
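
Dropping records silently makes it hard to audit what was lost. One possible variation, sketched below, separates the records into those that are kept and those that are rejected so the latter can be logged or reviewed. `partition_by_width` is a hypothetical helper name.

```python
def partition_by_width(records, width=3):
    """Split records into (kept, rejected) according to their number of fields."""
    kept, rejected = [], []
    for record in records:
        if len(record) == width:
            kept.append(record)
        else:
            rejected.append(record)
    return kept, rejected


records = [["a", "1", "12"], ["e", ""], ["0"], ["b", "2", "13", ""]]
kept, rejected = partition_by_width(records)

print(kept)      # [['a', '1', '12']]
print(rejected)  # [['e', ''], ['0'], ['b', '2', '13', '']]
```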

### Valid values for column 1 are {a, b, c, d, e}, in lowercase



In [19]:
## Inspect the values in column 1 that are not valid
## to decide what action to take

for line in csv:
    if line[0] not in ["a", "b", "c", "d", "e", "\\N"]:
        print(line)

['0', '\\N', ' "0']
['\\n', '\\n', '\\']
['0', '\\N', '1']
['', '', '']
['K', '3', '\\']
["'a'", '4', '1']
["'c'", '3', '1']
["'a'", "'\\n'", "'15"]
['E', '2', '']


In [21]:
## Some letters are uppercase

csv = [[line[0].lower()] + line[1:] for line in csv]

for line in csv:
    if line[0] not in ["a", "b", "c", "d", "e", "\\N"]:
        print(line)

['0', '\\N', ' "0']
['\\n', '\\n', '\\']
['0', '\\N', '1']
['', '', '']
['k', '3', '\\']
["'a'", '4', '1']
["'c'", '3', '1']
["'a'", "'\\n'", "'15"]


In [22]:
## Decision: replace the invalid values
## in column 1 with \N

csv = [["\\N"] + line[1:] if line[0] not in ["a", "b", "c", "d", "e"] else line for line in csv]


## Check again; nothing is printed when every value in column 1 is valid
for line in csv:
    if line[0] not in ["a", "b", "c", "d", "e", "\\N"]:
        print(line)


In [31]:
csv

[['a', '1', '1'],
 ['b', '2', '13'],
 ['c', '', '"14'],
 ['\\N', '\\N', ' "0'],
 ['a', ' \\N', ' "0'],
 ['a', '0', ''],
 ['\\N', '\\n', '\\'],
 ['b', '1', '1'],
 ['a', '2', '1'],
 ['d', '', '1'],
 ['\\N', '\\N', '1'],
 ['\\N', '', ''],
 ['\\N', '3', '\\'],
 ['b', '1', '1'],
 ['\\N', '4', '1'],
 ['a', '3', '1'],
 ['\\N', '3', '1'],
 ['\\N', "'\\n'", "'15"],
 ['e', '2', '']]
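
The column 1 rule can also be written as a single function with the valid values held in one constant, which avoids repeating the list in every check. Repeated literal lists are easy to get wrong: in Python, adjacent string literals such as `"e" "\\N"` are concatenated into one string. This is a sketch that mirrors the decisions taken above (lowercase first, then replace anything invalid with `\N`); `VALID_KEYS`, `NULL`, and `normalize_key` are names introduced here for illustration.

```python
VALID_KEYS = frozenset({"a", "b", "c", "d", "e"})
NULL = "\\N"  # the two-character null marker: backslash followed by N


def normalize_key(value):
    """Lowercase the value and replace it with NULL when it is not a valid key."""
    value = value.lower()
    return value if value in VALID_KEYS else NULL


samples = ["a", "E", "K", "0", "", "'a'", "\\N"]
print([normalize_key(value) for value in samples])
# ['a', 'e', '\\N', '\\N', '\\N', '\\N', '\\N']
```

Quoted values such as `'a'` are treated as invalid here, as in the cells above; stripping the quotes first would be a different design decision.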

In [35]:
## Rule 2: null values are written as "\N"
## It is simpler to process each line as a string

text = [",".join(line) for line in csv]
text = [line.replace("\\n", "\\N") for line in text]
csv = [line.split(",") for line in text]
csv


[['a', '1', '1'],
 ['b', '2', '13'],
 ['c', '', '"14'],
 ['\\N', '\\N', ' "0'],
 ['a', ' \\N', ' "0'],
 ['a', '0', ''],
 ['\\N', '\\N', '\\'],
 ['b', '1', '1'],
 ['a', '2', '1'],
 ['d', '', '1'],
 ['\\N', '\\N', '1'],
 ['\\N', '', ''],
 ['\\N', '3', '\\'],
 ['b', '1', '1'],
 ['\\N', '4', '1'],
 ['a', '3', '1'],
 ['\\N', '3', '1'],
 ['\\N', "'\\N'", "'15"],
 ['e', '2', '']]

In [36]:
## Empty strings ("") are replaced with "\N"
csv = [["\\N" if field == "" else field for field in line] for line in csv]
csv

[['a', '1', '1'],
 ['b', '2', '13'],
 ['c', '\\N', '"14'],
 ['\\N', '\\N', ' "0'],
 ['a', ' \\N', ' "0'],
 ['a', '0', '\\N'],
 ['\\N', '\\N', '\\'],
 ['b', '1', '1'],
 ['a', '2', '1'],
 ['d', '\\N', '1'],
 ['\\N', '\\N', '1'],
 ['\\N', '\\N', '\\N'],
 ['\\N', '3', '\\'],
 ['b', '1', '1'],
 ['\\N', '4', '1'],
 ['a', '3', '1'],
 ['\\N', '3', '1'],
 ['\\N', "'\\N'", "'15"],
 ['e', '2', '\\N']]
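
The null rule can also be applied field by field, without the round trip of joining and splitting each line. The following is a sketch under stated assumptions: `NULL_TOKENS` and `normalize_null` are hypothetical names, and surrounding whitespace is ignored when matching, which goes one step further than the cells above (they leave a value such as `' \N'` untouched).

```python
NULL = "\\N"
# Spellings treated as null; this list is an assumption based on the sample files.
NULL_TOKENS = {"", "\\n", "\\N", "'\\n'", "'\\N'"}


def normalize_null(field):
    """Return NULL for any recognized null spelling; otherwise return the field unchanged."""
    return NULL if field.strip() in NULL_TOKENS else field


record = ["a", " \\N", "", "'\\n'", "13"]
print([normalize_null(field) for field in record])
# ['a', '\\N', '\\N', '\\N', '13']
```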

### Values in column 2 are integers or nulls

In [39]:
## Print the records with problems
[line for line in csv if not line[1].isdigit() and line[1] != "\\N"]

[['a', ' \\N', ' "0'], ['\\N', "'\\N'", "'15"]]

In [40]:
## Some nulls are wrapped in single quotes

text = [",".join(line) for line in csv]
text = [line.replace("'\\N'", "\\N") for line in text]
csv = [line.split(",") for line in text]
csv

[['a', '1', '1'],
 ['b', '2', '13'],
 ['c', '\\N', '"14'],
 ['\\N', '\\N', ' "0'],
 ['a', ' \\N', ' "0'],
 ['a', '0', '\\N'],
 ['\\N', '\\N', '\\'],
 ['b', '1', '1'],
 ['a', '2', '1'],
 ['d', '\\N', '1'],
 ['\\N', '\\N', '1'],
 ['\\N', '\\N', '\\N'],
 ['\\N', '3', '\\'],
 ['b', '1', '1'],
 ['\\N', '4', '1'],
 ['a', '3', '1'],
 ['\\N', '3', '1'],
 ['\\N', '\\N', "'15"],
 ['e', '2', '\\N']]

### Values in column 3 are integers or nulls

In [41]:
## Print the records with problems
[line for line in csv if not line[2].isdigit() and line[2] != "\\N"]

[['c', '\\N', '"14'],
 ['\\N', '\\N', ' "0'],
 ['a', ' \\N', ' "0'],
 ['\\N', '\\N', '\\'],
 ['\\N', '3', '\\'],
 ['\\N', '\\N', "'15"]]

In [43]:
## Some values are wrapped in single quotes

csv = [line[:2] + [line[2][1:-1]] if (line[2][0] == "'" and line[2] != "\\N") else line for line in csv]
csv

[['a', '1', '1'],
 ['b', '2', '13'],
 ['c', '\\N', '"14'],
 ['\\N', '\\N', ' "0'],
 ['a', ' \\N', ' "0'],
 ['a', '0', '\\N'],
 ['\\N', '\\N', '\\'],
 ['b', '1', '1'],
 ['a', '2', '1'],
 ['d', '\\N', '1'],
 ['\\N', '\\N', '1'],
 ['\\N', '\\N', '\\N'],
 ['\\N', '3', '\\'],
 ['b', '1', '1'],
 ['\\N', '4', '1'],
 ['a', '3', '1'],
 ['\\N', '3', '1'],
 ['\\N', '\\N', '1'],
 ['e', '2', '\\N']]
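
Several records still carry stray spaces or double quotes in the numeric columns. One possible way to enforce the "integer or null" rule for columns 2 and 3 in a single step is sketched below. `normalize_int` is a hypothetical helper, and turning every non-integer value into `\N` is an assumption: the notebook does not state what should happen to such values. Like `str.isdigit` in the checks above, it does not accept negative numbers.

```python
NULL = "\\N"


def normalize_int(field):
    """Strip spaces and quotes; return the digits if any remain, else NULL."""
    cleaned = field.strip().strip("'\"").strip()
    # isascii() rules out non-ASCII digit characters that isdigit() accepts.
    if cleaned.isascii() and cleaned.isdigit():
        return cleaned
    return NULL


samples = ["12", ' "0"', "'15'", '"14', "\\N", "", "\\"]
print([normalize_int(field) for field in samples])
# ['12', '0', '15', '14', '\\N', '\\N', '\\N']
```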

## Writing the output file

In [44]:
text = [",".join(line) for line in csv]
text


['a,1,1',
 'b,2,13',
 'c,\\N,"14',
 '\\N,\\N, "0',
 'a, \\N, "0',
 'a,0,\\N',
 '\\N,\\N,\\',
 'b,1,1',
 'a,2,1',
 'd,\\N,1',
 '\\N,\\N,1',
 '\\N,\\N,\\N',
 '\\N,3,\\',
 'b,1,1',
 '\\N,4,1',
 'a,3,1',
 '\\N,3,1',
 '\\N,\\N,1',
 'e,2,\\N']

In [45]:
text = "\n".join(text)

In [47]:
with open("summary.csv", "w") as f:
    f.write(text)
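
The cell above writes the text without a final newline and with the platform's default encoding and line endings. A sketch of a more explicit variant follows; `write_records` is a hypothetical helper, and the UTF-8 encoding and `\n` line endings are assumptions chosen so the file is identical on every platform.

```python
import tempfile
from pathlib import Path


def write_records(path, records):
    """Write one comma-separated line per record, each ending with a newline."""
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        for record in records:
            f.write(",".join(record) + "\n")


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "summary.csv"
    write_records(out, [["a", "1", "12"], ["\\N", "3", "\\N"]])
    content = out.read_text(encoding="utf-8")

print(content, end="")
```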

In [50]:
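## `cat` is a Unix command; on Windows (where this notebook was run,
## as the error below shows) the equivalent is `!type summary.csv`
!cat summary.csv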

'cat' is not recognized as an internal or external command,
operable program or batch file.


In [51]:
## Clean up the working directory
!rm -rf directory
!rm summary.csv

'rm' is not recognized as an internal or external command,
operable program or batch file.
'rm' is not recognized as an internal or external command,
operable program or batch file.
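
Shell commands such as `cat` and `rm` are not available in the Windows command prompt, which is why the last two cells fail. The same steps can be done in Python so they behave the same on every platform. This is a minimal sketch that works inside a temporary directory; `show_and_clean` is a hypothetical helper name.

```python
import shutil
import tempfile
from pathlib import Path


def show_and_clean(workdir, summary):
    """Print the summary file, then delete it and the whole `workdir` tree."""
    summary = Path(summary)
    text = summary.read_text(encoding="utf-8")
    print(text, end="")
    summary.unlink()
    shutil.rmtree(workdir, ignore_errors=True)
    return text


# Reproduce the notebook layout inside a throwaway base directory.
base = Path(tempfile.mkdtemp())
workdir = base / "directory"
workdir.mkdir()
(workdir / "file1.csv").write_text("a,1,12\n", encoding="utf-8")
summary = base / "summary.csv"
summary.write_text("a,1,12\n", encoding="utf-8")

text = show_and_clean(workdir, summary)
leftovers = list(base.iterdir())  # nothing should remain
shutil.rmtree(base)
```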
