## Introduction

In this tutorial we will learn how to create our own data, as well as how to read and work with data that already exists.

## Getting started

To use **pandas**, you typically start with the following code:

In [1]:
import numpy as np
import pandas as pd

## Creating data in pandas

Pandas provides two fundamental data types: **DataFrame** and **Series**.

### Pandas DataFrame

A **DataFrame** represents a table. We can think of it as a two-dimensional array of individual entries, each with a certain value. Each entry is identified by a position that corresponds to a row (or record) and a column.

The following code illustrates how to create and display a simple **DataFrame**:

In [2]:
df = pd.DataFrame({'Yes': [50, 21], 'No': [131, 2]})
df

Unnamed: 0,Yes,No
0,50,131
1,21,2


The values in a **DataFrame** are not limited to integers or to a single data type. For example, here is a **DataFrame** whose values include both strings and integers:

In [3]:
prods = pd.DataFrame({'Bob': ['I liked it.', 'It was awful.'], 
                      'Sue': ['Pretty good.', 'Bland.'],
                      'Price': [100, 200]})
prods

Unnamed: 0,Bob,Sue,Price
0,I liked it.,Pretty good.,100
1,It was awful.,Bland.,200
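
To check which type pandas assigned to each column, one option is the `dtypes` attribute. The following is a minimal sketch, assuming a small hypothetical DataFrame named `mixed` (not part of the tutorial's data) with one text column and one integer column:

```python
import pandas as pd

# Hypothetical example data: one column of strings, one of integers.
mixed = pd.DataFrame({'Review': ['I liked it.', 'It was awful.'],
                      'Price': [100, 200]})

# dtypes reports one data type per column.
print(mixed.dtypes)

# Because 'Price' holds integers, numeric operations such as sum() apply to it.
total = mixed['Price'].sum()
print(total)
```

If the prices had been written as quoted strings such as `'100'`, pandas would treat that column as text rather than numbers.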


The dictionary-of-lists constructor assigns values to the columns from the corresponding keys, but by default it assigns a numeric sequence starting from 0 (as with lists or arrays) for the row labels, or index. Sometimes this is OK, but in many cases we will want to define these labels ourselves.

The list of row labels used in a **DataFrame** is called the **Index**, and we can assign specific values to it using the **index** parameter in the constructor:

In [6]:
pd.DataFrame({'Bob': ['I liked it.', 'It was awful.'], 
              'Sue': ['Pretty good.', 'Bland.'],
              'Price': [100, 200]},
             index=['Product A', 'Product B'])

Unnamed: 0,Bob,Sue,Price
Product A,I liked it.,Pretty good.,100
Product B,It was awful.,Bland.,200
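
One practical benefit of custom row labels is that rows can be looked up by label. The sketch below reuses the toy review data and uses the `.loc` accessor, which this tutorial has not introduced, so treat it as an illustrative preview; the variable names `reviews` and `row` are hypothetical:

```python
import pandas as pd

# Toy review data with custom row labels supplied through `index`.
reviews = pd.DataFrame({'Bob': ['I liked it.', 'It was awful.'],
                        'Sue': ['Pretty good.', 'Bland.']},
                       index=['Product A', 'Product B'])

# The row labels are stored in the DataFrame's index.
print(reviews.index.tolist())

# .loc selects a row by its label; the result is a Series keyed by column name.
row = reviews.loc['Product B']
print(row['Sue'])
```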


### Pandas Series

A Series, by contrast, is a sequence of data values. If a DataFrame is a table, a Series is a list. And in fact you can create one with nothing more than a list:

In [None]:
pd.Series([1, 2, 3, 4, 5])

In essence, a **Series** is a single column of a **DataFrame**, so you can assign index values to the **Series** the same way as before, using an **index** parameter. However, a **Series** does not have a column name; it only has one overall **name** attribute:

In [7]:
pd.Series([30, 35, 40], index=['2015 Sales', '2016 Sales', '2017 Sales'], name='Product A')

2015 Sales    30
2016 Sales    35
2017 Sales    40
Name: Product A, dtype: int64

The Series and the DataFrame are intimately related; it's helpful to think of a **DataFrame** as actually being just a bunch of **Series** "glued together" as columns. 
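
The "glued together" picture can be sketched with `pd.concat`, which is one of several ways to combine Series into a DataFrame. The values for `sales_b` below are hypothetical and exist only for illustration:

```python
import pandas as pd

years = ['2015 Sales', '2016 Sales', '2017 Sales']
sales_a = pd.Series([30, 35, 40], index=years, name='Product A')
sales_b = pd.Series([21, 34, 11], index=years, name='Product B')  # hypothetical values

# Concatenating along axis=1 places each Series side by side as a column;
# each Series' name becomes the column label.
sales = pd.concat([sales_a, sales_b], axis=1)
print(sales)

# Selecting a single column gives back a Series.
col = sales['Product A']
print(type(col).__name__)
```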

## Reading data files

Being able to create a **DataFrame** or **Series** by hand is handy; but most of the time, we won't actually be creating our own data by hand. Instead, we'll be working with data that already exists.

Data can be stored in any of a number of different forms and formats, but by far the most basic and common of these is the humble CSV file. A CSV file is a table of values separated by commas or another "separator" character (hence the name: "Comma-Separated Values", or CSV), and it looks like this:

```
Product A,Product B,Product C
30,21,9
35,34,1
41,11,11
```
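
To see how pandas interprets CSV text like this without creating a file, one option is to wrap the text in `io.StringIO`, since `pd.read_csv()` accepts file-like objects as well as paths. This is a small sketch; the names `csv_text` and `products` are illustrative:

```python
import io
import pandas as pd

# The sample table as CSV text: a header line followed by one row per line.
csv_text = """Product A,Product B,Product C
30,21,9
35,34,1
41,11,11
"""

# StringIO makes the string behave like an open file for read_csv.
products = pd.read_csv(io.StringIO(csv_text))
print(products)
print(products.shape)
```

The first line becomes the column labels, and each following line becomes one row.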

Let's now set aside our toy datasets and see what a real dataset looks like when we read it into a **DataFrame**. We'll use the `pd.read_csv()` function to read the data into a **DataFrame**, and we can use the `shape` attribute to check how large the resulting **DataFrame** is:

In [11]:
wine_reviews = pd.read_csv("../data/winemag-data-130k-v2.csv")
wine_reviews.shape

(65499, 14)

Then we can examine the contents of the resultant **DataFrame** using the `head()` method, which by default grabs the first five rows:

In [12]:
wine_reviews.head()

Unnamed: 0.1,Unnamed: 0,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery
0,0,Italy,"Aromas include tropical fruit, broom, brimston...",Vulkà Bianco,87,,Sicily & Sardinia,Etna,,Kerin O’Keefe,@kerinokeefe,Nicosia 2013 Vulkà Bianco (Etna),White Blend,Nicosia
1,1,Portugal,"This is ripe and fruity, a wine that is smooth...",Avidagos,87,15.0,Douro,,,Roger Voss,@vossroger,Quinta dos Avidagos 2011 Avidagos Red (Douro),Portuguese Red,Quinta dos Avidagos
2,2,US,"Tart and snappy, the flavors of lime flesh and...",,87,14.0,Oregon,Willamette Valley,Willamette Valley,Paul Gregutt,@paulgwine,Rainstorm 2013 Pinot Gris (Willamette Valley),Pinot Gris,Rainstorm
3,3,US,"Pineapple rind, lemon pith and orange blossom ...",Reserve Late Harvest,87,13.0,Michigan,Lake Michigan Shore,,Alexander Peartree,,St. Julian 2013 Reserve Late Harvest Riesling ...,Riesling,St. Julian
4,4,US,"Much like the regular bottling from 2012, this...",Vintner's Reserve Wild Child Block,87,65.0,Oregon,Willamette Valley,Willamette Valley,Paul Gregutt,@paulgwine,Sweet Cheeks 2012 Vintner's Reserve Wild Child...,Pinot Noir,Sweet Cheeks


The `pd.read_csv()` function is very flexible, with over 30 optional parameters you can specify.
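
As an illustration of a few of those parameters, the sketch below uses `sep`, `usecols`, and `nrows` on a small in-memory sample. The data is hypothetical and semicolon-separated on purpose; it stands in for a file on disk:

```python
import io
import pandas as pd

# Hypothetical semicolon-separated data (the first row has no price).
raw = """country;points;price
Italy;87;
Portugal;87;15.0
US;87;14.0
"""

subset = pd.read_csv(
    io.StringIO(raw),
    sep=';',                       # field separator other than a comma
    usecols=['country', 'price'],  # load only these columns
    nrows=2,                       # read only the first two data rows
)
print(subset)
```

The empty price field in the first row is read as a missing value (`NaN`).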

For example, you can see that the CSV file has a built-in index, which pandas did not pick up on automatically. To make pandas use that column for the index (instead of creating a new one from scratch), we can specify an `index_col`.

In [15]:
wine_reviews = pd.read_csv("../data/winemag-data-130k-v2.csv", index_col=0)
wine_reviews.head()

Unnamed: 0,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery
0,Italy,"Aromas include tropical fruit, broom, brimston...",Vulkà Bianco,87,,Sicily & Sardinia,Etna,,Kerin O’Keefe,@kerinokeefe,Nicosia 2013 Vulkà Bianco (Etna),White Blend,Nicosia
1,Portugal,"This is ripe and fruity, a wine that is smooth...",Avidagos,87,15.0,Douro,,,Roger Voss,@vossroger,Quinta dos Avidagos 2011 Avidagos Red (Douro),Portuguese Red,Quinta dos Avidagos
2,US,"Tart and snappy, the flavors of lime flesh and...",,87,14.0,Oregon,Willamette Valley,Willamette Valley,Paul Gregutt,@paulgwine,Rainstorm 2013 Pinot Gris (Willamette Valley),Pinot Gris,Rainstorm
3,US,"Pineapple rind, lemon pith and orange blossom ...",Reserve Late Harvest,87,13.0,Michigan,Lake Michigan Shore,,Alexander Peartree,,St. Julian 2013 Reserve Late Harvest Riesling ...,Riesling,St. Julian
4,US,"Much like the regular bottling from 2012, this...",Vintner's Reserve Wild Child Block,87,65.0,Oregon,Willamette Valley,Willamette Valley,Paul Gregutt,@paulgwine,Sweet Cheeks 2012 Vintner's Reserve Wild Child...,Pinot Noir,Sweet Cheeks
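
The effect of `index_col` can be reproduced on a tiny in-memory sample. The CSV text below is hypothetical; its first column has an empty header, which is how a saved index typically appears in a file:

```python
import io
import pandas as pd

# Hypothetical CSV whose first column is an unnamed, previously saved index.
raw = """,country,points
0,Italy,87
1,Portugal,87
2,US,87
"""

# Without index_col, the unnamed column is kept as ordinary data.
default = pd.read_csv(io.StringIO(raw))
# With index_col=0, the first column becomes the row index instead.
indexed = pd.read_csv(io.StringIO(raw), index_col=0)

print(default.columns.tolist())
print(indexed.columns.tolist())
```

In the first case pandas gives the unnamed column the placeholder label `Unnamed: 0`; in the second, that column no longer appears among the data columns.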
