# Reading files with Pandas

Besides entering data by hand, we can also read it from a file. In fact, Pandas can read many file formats (CSV, JSON, Excel, SQL, Parquet, and others), each through its own ``pd.read_*`` function. Let's see how this works.

```python
import pandas as pd
```

## Reading a CSV file

To read a CSV file, we use the function ``pd.read_csv`` and pass it the path to the file.

```python
df_books = pd.read_csv(
    'datasets/bestsellers-with-categories_e591527f-ae45-4fa5-b0d1-d50142128fa6.csv')
df_books
```

|     | Name | Author | User Rating | Reviews | Price | Year | Genre |
|-----|------|--------|-------------|---------|-------|------|-------|
| 0 | 10-Day Green Smoothie Cleanse | JJ Smith | 4.7 | 17350 | 8 | 2016 | Non Fiction |
| 1 | 11/22/63: A Novel | Stephen King | 4.6 | 2052 | 22 | 2011 | Fiction |
| 2 | 12 Rules for Life: An Antidote to Chaos | Jordan B. Peterson | 4.7 | 18979 | 15 | 2018 | Non Fiction |
| 3 | 1984 (Signet Classics) | George Orwell | 4.7 | 21424 | 6 | 2017 | Fiction |
| 4 | 5,000 Awesome Facts (About Everything!) (Natio... | National Geographic Kids | 4.8 | 7665 | 12 | 2019 | Non Fiction |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 545 | Wrecking Ball (Diary of a Wimpy Kid Book 14) | Jeff Kinney | 4.9 | 9413 | 8 | 2019 | Fiction |
| 546 | You Are a Badass: How to Stop Doubting Your Gr... | Jen Sincero | 4.7 | 14331 | 8 | 2016 | Non Fiction |
| 547 | You Are a Badass: How to Stop Doubting Your Gr... | Jen Sincero | 4.7 | 14331 | 8 | 2017 | Non Fiction |
| 548 | You Are a Badass: How to Stop Doubting Your Gr... | Jen Sincero | 4.7 | 14331 | 8 | 2018 | Non Fiction |

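To experiment with ``pd.read_csv`` without the dataset on disk, the sketch below reads a few rows copied from the output above out of an in-memory buffer. It assumes ``io.StringIO`` as a stand-in for the file path, since ``read_csv`` accepts file-like objects as well as paths.

```python
import io

import pandas as pd

# A few rows copied from the bestsellers output above;
# io.StringIO stands in for the CSV file on disk.
csv_text = """Name,Author,User Rating,Reviews,Price,Year,Genre
10-Day Green Smoothie Cleanse,JJ Smith,4.7,17350,8,2016,Non Fiction
11/22/63: A Novel,Stephen King,4.6,2052,22,2011,Fiction
1984 (Signet Classics),George Orwell,4.7,21424,6,2017,Fiction
"""

df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)   # (rows, columns)
print(df.dtypes)  # column types inferred from the text
print(df.head())
```

The first line of the text becomes the column names, and numeric columns such as ``Reviews`` and ``User Rating`` are parsed as numbers automatically.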

``pd.read_csv`` accepts many more parameters that control how the file is parsed. Let's look at a few.

### Parameters for reading a CSV

#### - Separator (``sep=','``)

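CSV files are not always comma-separated (``','``); some use semicolons (``';'``) or tabs (``'\t'``), for example. In that case, we can tell pandas which separator to use when reading the data. Our file does use commas, so passing ``sep=','`` here only makes the default explicit and the result is unchanged.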

```python
pd.read_csv(
    'datasets/bestsellers-with-categories_e591527f-ae45-4fa5-b0d1-d50142128fa6.csv', sep=',')
```

|     | Name | Author | User Rating | Reviews | Price | Year | Genre |
|-----|------|--------|-------------|---------|-------|------|-------|
| 0 | 10-Day Green Smoothie Cleanse | JJ Smith | 4.7 | 17350 | 8 | 2016 | Non Fiction |
| 1 | 11/22/63: A Novel | Stephen King | 4.6 | 2052 | 22 | 2011 | Fiction |
| 2 | 12 Rules for Life: An Antidote to Chaos | Jordan B. Peterson | 4.7 | 18979 | 15 | 2018 | Non Fiction |
| 3 | 1984 (Signet Classics) | George Orwell | 4.7 | 21424 | 6 | 2017 | Fiction |
| 4 | 5,000 Awesome Facts (About Everything!) (Natio... | National Geographic Kids | 4.8 | 7665 | 12 | 2019 | Non Fiction |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 545 | Wrecking Ball (Diary of a Wimpy Kid Book 14) | Jeff Kinney | 4.9 | 9413 | 8 | 2019 | Fiction |
| 546 | You Are a Badass: How to Stop Doubting Your Gr... | Jen Sincero | 4.7 | 14331 | 8 | 2016 | Non Fiction |
| 547 | You Are a Badass: How to Stop Doubting Your Gr... | Jen Sincero | 4.7 | 14331 | 8 | 2017 | Non Fiction |
| 548 | You Are a Badass: How to Stop Doubting Your Gr... | Jen Sincero | 4.7 | 14331 | 8 | 2018 | Non Fiction |

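To see where ``sep`` actually matters, here is a small sketch with hypothetical semicolon-separated data (not taken from the dataset), read from an in-memory ``io.StringIO`` buffer. Reading it with the default comma separator leaves each line in a single column; passing ``sep=';'`` splits the fields correctly.

```python
import io

import pandas as pd

# Hypothetical semicolon-separated data; only the separator differs
# from a standard CSV.
csv_text = """Name;Author;Price
1984 (Signet Classics);George Orwell;6
11/22/63: A Novel;Stephen King;22
"""

# Default separator is ',': each line ends up in one single column.
wrong = pd.read_csv(io.StringIO(csv_text))
print(wrong.shape)  # (2, 1)

# Telling pandas the real separator splits the fields correctly.
right = pd.read_csv(io.StringIO(csv_text), sep=';')
print(right.shape)  # (2, 3)
```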

#### - Header (``header=0``)

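The ``header`` parameter indicates which row holds the column names. By default (``header='infer'``), pandas uses the first row (``0``) unless ``names`` is given, but if the file is structured differently we can change it. For example, ``header=1`` takes the second row as the header. The row above it is discarded, so here the original column names are lost and the first book becomes the header.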

```python
pd.read_csv(
    'datasets/bestsellers-with-categories_e591527f-ae45-4fa5-b0d1-d50142128fa6.csv', header=1)
```

|     | 10-Day Green Smoothie Cleanse | JJ Smith | 4.7 | 17350 | 8 | 2016 | Non Fiction |
|-----|-------------------------------|----------|-----|-------|---|------|-------------|
| 0 | 11/22/63: A Novel | Stephen King | 4.6 | 2052 | 22 | 2011 | Fiction |
| 1 | 12 Rules for Life: An Antidote to Chaos | Jordan B. Peterson | 4.7 | 18979 | 15 | 2018 | Non Fiction |
| 2 | 1984 (Signet Classics) | George Orwell | 4.7 | 21424 | 6 | 2017 | Fiction |
| 3 | 5,000 Awesome Facts (About Everything!) (Natio... | National Geographic Kids | 4.8 | 7665 | 12 | 2019 | Non Fiction |
| 4 | A Dance with Dragons (A Song of Ice and Fire) | George R. R. Martin | 4.4 | 12643 | 11 | 2011 | Fiction |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 544 | Wrecking Ball (Diary of a Wimpy Kid Book 14) | Jeff Kinney | 4.9 | 9413 | 8 | 2019 | Fiction |
| 545 | You Are a Badass: How to Stop Doubting Your Gr... | Jen Sincero | 4.7 | 14331 | 8 | 2016 | Non Fiction |
| 546 | You Are a Badass: How to Stop Doubting Your Gr... | Jen Sincero | 4.7 | 14331 | 8 | 2017 | Non Fiction |
| 547 | You Are a Badass: How to Stop Doubting Your Gr... | Jen Sincero | 4.7 | 14331 | 8 | 2018 | Non Fiction |

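A case where ``header=1`` is genuinely useful is a file whose first line is not column names at all, such as a title line added by an export tool. The sketch below assumes such a hypothetical file (the title text is invented for illustration) and reads it from an ``io.StringIO`` buffer.

```python
import io

import pandas as pd

# Hypothetical export whose first line is a title, not column names.
csv_text = """Bestsellers export (hypothetical title line)
Name,Author,Year
1984 (Signet Classics),George Orwell,2017
11/22/63: A Novel,Stephen King,2011
"""

# header=1: use the second line as column names; the line above it is dropped.
df = pd.read_csv(io.StringIO(csv_text), header=1)
print(df)
```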

Passing ``header=0`` explicitly gives the same result as the default: the first row is used as the header.

```python
pd.read_csv(
    'datasets/bestsellers-with-categories_e591527f-ae45-4fa5-b0d1-d50142128fa6.csv', header=0)
```

|     | Name | Author | User Rating | Reviews | Price | Year | Genre |
|-----|------|--------|-------------|---------|-------|------|-------|
| 0 | 10-Day Green Smoothie Cleanse | JJ Smith | 4.7 | 17350 | 8 | 2016 | Non Fiction |
| 1 | 11/22/63: A Novel | Stephen King | 4.6 | 2052 | 22 | 2011 | Fiction |
| 2 | 12 Rules for Life: An Antidote to Chaos | Jordan B. Peterson | 4.7 | 18979 | 15 | 2018 | Non Fiction |
| 3 | 1984 (Signet Classics) | George Orwell | 4.7 | 21424 | 6 | 2017 | Fiction |
| 4 | 5,000 Awesome Facts (About Everything!) (Natio... | National Geographic Kids | 4.8 | 7665 | 12 | 2019 | Non Fiction |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 545 | Wrecking Ball (Diary of a Wimpy Kid Book 14) | Jeff Kinney | 4.9 | 9413 | 8 | 2019 | Fiction |
| 546 | You Are a Badass: How to Stop Doubting Your Gr... | Jen Sincero | 4.7 | 14331 | 8 | 2016 | Non Fiction |
| 547 | You Are a Badass: How to Stop Doubting Your Gr... | Jen Sincero | 4.7 | 14331 | 8 | 2017 | Non Fiction |
| 548 | You Are a Badass: How to Stop Doubting Your Gr... | Jen Sincero | 4.7 | 14331 | 8 | 2018 | Non Fiction |

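One way to confirm that the default and ``header=0`` behave the same is to read the same text both ways and compare the results with ``DataFrame.equals``. This sketch uses a few hypothetical rows in an ``io.StringIO`` buffer rather than the dataset file.

```python
import io

import pandas as pd

csv_text = """Name,Author,Year
1984 (Signet Classics),George Orwell,2017
11/22/63: A Novel,Stephen King,2011
"""

# Without names=, the default header='infer' behaves like header=0.
df_default = pd.read_csv(io.StringIO(csv_text))
df_header0 = pd.read_csv(io.StringIO(csv_text), header=0)

print(df_default.equals(df_header0))  # True
```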

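If the file has no header row, or we don't want the first row treated as one, we can pass ``header=None``. Pandas then labels the columns with integers starting at ``0`` and reads the first row (here, the original column names) as ordinary data.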

```python
pd.read_csv(
    'datasets/bestsellers-with-categories_e591527f-ae45-4fa5-b0d1-d50142128fa6.csv', header=None)
```

|     | 0 | 1 | 2 | 3 | 4 | 5 | 6 |
|-----|---|---|---|---|---|---|---|
| 0 | Name | Author | User Rating | Reviews | Price | Year | Genre |
| 1 | 10-Day Green Smoothie Cleanse | JJ Smith | 4.7 | 17350 | 8 | 2016 | Non Fiction |
| 2 | 11/22/63: A Novel | Stephen King | 4.6 | 2052 | 22 | 2011 | Fiction |
| 3 | 12 Rules for Life: An Antidote to Chaos | Jordan B. Peterson | 4.7 | 18979 | 15 | 2018 | Non Fiction |
| 4 | 1984 (Signet Classics) | George Orwell | 4.7 | 21424 | 6 | 2017 | Fiction |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 546 | Wrecking Ball (Diary of a Wimpy Kid Book 14) | Jeff Kinney | 4.9 | 9413 | 8 | 2019 | Fiction |
| 547 | You Are a Badass: How to Stop Doubting Your Gr... | Jen Sincero | 4.7 | 14331 | 8 | 2016 | Non Fiction |
| 548 | You Are a Badass: How to Stop Doubting Your Gr... | Jen Sincero | 4.7 | 14331 | 8 | 2017 | Non Fiction |
| 549 | You Are a Badass: How to Stop Doubting Your Gr... | Jen Sincero | 4.7 | 14331 | 8 | 2018 | Non Fiction |

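The typical use of ``header=None`` is a file that really has no header line. In this sketch (hypothetical rows, read from an ``io.StringIO`` buffer), the default would consume the first book as column names, while ``header=None`` keeps every line as data.

```python
import io

import pandas as pd

# Hypothetical file with data rows only, no header line.
csv_text = """1984 (Signet Classics),George Orwell,2017
11/22/63: A Novel,Stephen King,2011
"""

# Default: the first book is consumed as column names.
df_infer = pd.read_csv(io.StringIO(csv_text))
print(len(df_infer))          # 1

# header=None: every line is data and columns are labeled 0, 1, 2.
df_none = pd.read_csv(io.StringIO(csv_text), header=None)
print(list(df_none.columns))  # [0, 1, 2]
print(len(df_none))           # 2
```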

#### - Names (``names=[]``)

With ``names=[]`` we can give the columns our own names instead of the ones in the CSV header. When the file already has a header row, we also pass ``header=0`` so that row is replaced; otherwise pandas treats it as an ordinary data row.

```python
pd.read_csv(
    'datasets/bestsellers-with-categories_e591527f-ae45-4fa5-b0d1-d50142128fa6.csv',
    header=0,
    names=['Nombre', 'Autor', 'Calificación del usuario', 'Vistas', 'Precio', 'Año', 'Género'])
```

|     | Nombre | Autor | Calificación del usuario | Vistas | Precio | Año | Género |
|-----|--------|-------|--------------------------|--------|--------|-----|--------|
| 0 | 10-Day Green Smoothie Cleanse | JJ Smith | 4.7 | 17350 | 8 | 2016 | Non Fiction |
| 1 | 11/22/63: A Novel | Stephen King | 4.6 | 2052 | 22 | 2011 | Fiction |
| 2 | 12 Rules for Life: An Antidote to Chaos | Jordan B. Peterson | 4.7 | 18979 | 15 | 2018 | Non Fiction |
| 3 | 1984 (Signet Classics) | George Orwell | 4.7 | 21424 | 6 | 2017 | Fiction |
| 4 | 5,000 Awesome Facts (About Everything!) (Natio... | National Geographic Kids | 4.8 | 7665 | 12 | 2019 | Non Fiction |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 545 | Wrecking Ball (Diary of a Wimpy Kid Book 14) | Jeff Kinney | 4.9 | 9413 | 8 | 2019 | Fiction |
| 546 | You Are a Badass: How to Stop Doubting Your Gr... | Jen Sincero | 4.7 | 14331 | 8 | 2016 | Non Fiction |
| 547 | You Are a Badass: How to Stop Doubting Your Gr... | Jen Sincero | 4.7 | 14331 | 8 | 2017 | Non Fiction |
| 548 | You Are a Badass: How to Stop Doubting Your Gr... | Jen Sincero | 4.7 | 14331 | 8 | 2018 | Non Fiction |

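The interaction between ``names`` and ``header`` is easy to check on a tiny example. This sketch uses hypothetical rows in an ``io.StringIO`` buffer and reuses three of the custom names from the cell above: with ``header=0`` the file's header line is replaced, while ``names`` alone makes pandas read that line as data.

```python
import io

import pandas as pd

csv_text = """Name,Author,Year
1984 (Signet Classics),George Orwell,2017
11/22/63: A Novel,Stephen King,2011
"""
custom_names = ['Nombre', 'Autor', 'Año']

# header=0 + names: the file's header line is replaced by our names.
replaced = pd.read_csv(io.StringIO(csv_text), header=0, names=custom_names)
print(len(replaced))                  # 2

# names alone implies header=None: the header line becomes a data row.
kept_as_data = pd.read_csv(io.StringIO(csv_text), names=custom_names)
print(len(kept_as_data))              # 3
print(kept_as_data.iloc[0].tolist())  # ['Name', 'Author', 'Year']
```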

## Reading JSON files

JSON has a more flexible structure than CSV. When a file is a flat list of records, we can load it into a table much like a CSV, but that is not always possible (for example, with deeply nested data). Let's start with a simple example.

```python
df_json_books = pd.read_json('datasets/Books.json')
df_json_books
```

|     | id | url | title | upc | product_type | price_excl_tax | price_incl_tax | tax | price | availability | num_reviews | stars | category | description |
|-----|----|-----|-------|-----|--------------|----------------|----------------|-----|-------|--------------|-------------|-------|----------|-------------|
| 0 | 1 | https://books.toscrape.com/catalogue/a-light-i... | It's hard to imagine a world without A Light i... | a897fe39b1053632 | books | 52 | 52 | 0 | 52 | 22 | 0 | 3 | poetry | ("It's hard to imagine a world without A Light... |
| 1 | 2 | https://books.toscrape.com/catalogue/scott-pil... | Scott Pilgrim's life is totally sweet. He's 23... | 3b1c02bac2a429e6 | books | 52 | 52 | 0 | 52 | 19 | 0 | 5 | sequential art | ('Scott Pilgrim\'s life is totally sweet. He\'... |
| 2 | 3 | https://books.toscrape.com/catalogue/set-me-fr... | Aaron Ledbetter’s future had been planned out ... | ce6396b0f23f6ecc | books | 17 | 17 | 0 | 17 | 19 | 0 | 5 | young adult | ('Aaron Ledbetter’s future had been planned ou... |
| 3 | 4 | https://books.toscrape.com/catalogue/sapiens-a... | From a renowned historian comes a groundbreaki... | 4165285e1663650f | books | 54 | 54 | 0 | 54 | 20 | 0 | 5 | history | ('From a renowned historian comes a groundbrea... |
| 4 | 5 | https://books.toscrape.com/catalogue/shakespea... | This book is an important and complete collect... | 30a7f60cd76ca58c | books | 21 | 21 | 0 | 21 | 19 | 0 | 4 | poetry | ('This book is an important and complete colle... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 993 | 994 | https://books.toscrape.com/catalogue/1st-to-di... | James Patterson, bestselling author of the Ale... | f684a82adc49f011 | books | 54 | 54 | 0 | 54 | 1 | 0 | 1 | mystery | ("James Patterson, bestselling author of the A... |
| 994 | 995 | https://books.toscrape.com/catalogue/choosing-... | To the dismay of religious leaders, study afte... | a812f6969ddf3e39 | books | 28 | 28 | 0 | 28 | 1 | 0 | 4 | religion | ('To the dismay of religious leaders, study af... |
| 995 | 996 | https://books.toscrape.com/catalogue/a-spys-de... | In England’s Regency era, manners and elegance... | 19fec36a1dfb4c16 | books | 17 | 17 | 0 | 17 | 1 | 0 | 5 | historical fiction | ('In England’s Regency era, manners and elegan... |
| 996 | 997 | https://books.toscrape.com/catalogue/frankenst... | Mary Shelley began writing Frankenstein when s... | a492f49a3e2b6a71 | books | 38 | 38 | 0 | 38 | 1 | 0 | 2 | default | ("Mary Shelley began writing Frankenstein when... |

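The easy case for ``pd.read_json`` is a JSON array of flat objects, where each object becomes a row and each key a column. The sketch below assumes a small hypothetical records-style JSON (invented titles, not from ``Books.json``) and passes ``orient='records'`` to state that layout explicitly. The string is wrapped in ``io.StringIO`` because recent pandas versions (2.1 and later) deprecate passing literal JSON text directly to ``read_json``.

```python
import io

import pandas as pd

# Hypothetical records-style JSON: a list of flat objects, one per book.
json_text = """[
  {"id": 1, "title": "Sample title A", "price": 52, "category": "poetry"},
  {"id": 2, "title": "Sample title B", "price": 17, "category": "sequential art"}
]"""

# Each object becomes a row; each key becomes a column.
df = pd.read_json(io.StringIO(json_text), orient='records')
print(df)
```

When the objects contain nested objects or lists, the result is no longer a clean table, which is the "not always possible" case mentioned above.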