# Pandas

From http://pandas.pydata.org/pandas-docs/stable/

pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real world data analysis in Python. Additionally, it has the broader goal of becoming the most powerful and flexible open source data analysis / manipulation tool available in any language. It is already well on its way toward this goal.

See also:

https://github.com/restrepo/PythonTipsAndTricks

A practical book on what Pandas can do is:

[__Python for Data Analysis__](https://drive.google.com/open?id=0BxoOXsn2EUNIWExXbVc4SDN0YTQ)<br/>
Data Wrangling with Pandas, NumPy, and IPython<br/>
_By William McKinney_


Another book, based on Pandas, is:
![image.png](https://covers.oreillystatic.com/images/0636920030515/cat.gif) [Introduction to Machine Learning with Python](https://drive.google.com/open?id=0BxoOXsn2EUNISGhrdEZ3S29fS3M)<br/>
A Guide for Data Scientists<br/>
_By Sarah Guido, Andreas Müller_

`Pandas` can be used much like the `R` programming language, which is built on similar data structures; it can also replace graphical interfaces such as Excel for working with spreadsheets.

## Standard import

In [14]:
import pandas as pd

## Data structures

`Pandas` provides two new data structures:
* `Series`, which are similar to dictionaries
* `DataFrame`, which are similar to arrays. The rows of a two-dimensional `DataFrame` are `Series` that share the same keys. A spreadsheet is an example of a `DataFrame`. The columns of a two-dimensional `DataFrame` are `numpy` `arrays` with an assigned key
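The relationship between the two structures can be checked on a small table. The sketch below uses made-up data (the second person is hypothetical) and only shows that both a row and a column come back as a `Series`, with the column backed by a `numpy` array.

```python
import pandas as pd

# Hypothetical toy data, used only for illustration
df = pd.DataFrame({'Name': ['Juan Valdez', 'Ana Pérez'], 'Age': [23, 31]})

row = df.loc[0]    # a row is a Series whose keys are the column names
col = df['Age']    # a column is a Series whose keys are the row labels

print(list(row.index))    # keys of the row
print(col.to_numpy())     # numpy array holding the column values
```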

### `Series`

In [25]:
s=pd.Series({'Name':'Juan Valdez','Nacionality':'Colombia','Age':23})
s

Age                     23
Nacionality       Colombia
Name           Juan Valdez
dtype: object

A `Series` can be used like a dictionary:

In [26]:
s['Name']

'Juan Valdez'

but also like a namespace, through attribute access!

In [27]:
s.Name

'Juan Valdez'
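Attribute access only works when the key does not collide with an existing attribute or method of `Series`. As a minimal sketch, assuming a `Series` with a lowercase `'name'` key (not part of the example above): `s.name` returns the name of the `Series` itself, so the dictionary style is the safer choice.

```python
import pandas as pd

# Hypothetical Series whose key 'name' clashes with the Series.name attribute
s = pd.Series({'name': 'Juan Valdez', 'Age': 23})

by_key = s['name']     # dictionary-style access always reaches the stored value
by_attr = s.name       # the name of the Series itself (unset here), not the stored value
age = s.Age            # no clash, so attribute access works

print(by_key, by_attr, age)
```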

### `DataFrame`

#### Initialization from a `Series`
We start with an empty `DataFrame`:

In [28]:
df=pd.DataFrame()

We can append a `Series` as a row of the `DataFrame`. Because the `Series` has no name, this only works if we pass the option `ignore_index=True`:

In [29]:
df=df.append(s,ignore_index=True)
df

Unnamed: 0,Age,Nacionality,Name
0,23.0,Colombia,Juan Valdez
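`DataFrame.append` was deprecated in pandas 1.4 and removed in pandas 2.0, so the cell above needs an older release. A possible equivalent for newer versions, offered as a sketch rather than as the notebook's own method, converts the `Series` to a one-row `DataFrame` and uses `pd.concat`:

```python
import pandas as pd

s = pd.Series({'Name': 'Juan Valdez', 'Nacionality': 'Colombia', 'Age': 23})
df = pd.DataFrame()

# to_frame() gives a one-column DataFrame; .T transposes it into a single row
df = pd.concat([df, s.to_frame().T], ignore_index=True)

# The mixed-type Series yields object columns, so cast 'Age' explicitly
df['Age'] = df['Age'].astype(int)
print(df)
```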


We can fix the data type of the `'Age'` column:

In [33]:
df['Age']=df.Age.astype(int)
df

Unnamed: 0,Age,Nacionality,Name
0,23,Colombia,Juan Valdez


To add a second row, we build another `Series`:

In [None]:
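# Build a new Series interactively, prompting for each field
s=pd.Series(dtype=object)  # explicit dtype avoids the empty-Series dtype warning
for k in ['Name','Nacionality','Age','Company']:
    var=input('{}:\n'.format(k))  # input() always returns a string
    s[k]=var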

#### Exercises
* Display the resulting `Series` on the screen:

* Append it to the previous `DataFrame`:

* Save the `Pandas` `DataFrame` as an Excel file

* Load the `DataFrame` from the saved file

## Loading data from the cloud

In [7]:
%%writefile drive.cfg
[FILES]
CIB_Wos.xlsx                                = 0BxoOXsn2EUNIRjJkQ1VEamdJXzA

Writing drive.cfg


We follow the conventions of https://github.com/kennethreitz/python-guide

In [8]:
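import os
import sys
# '__file__' is quoted on purpose: a notebook does not define __file__, so
# dirname() returns '' and the path resolves to ../input relative to the
# current working directory
sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname('__file__'), '../input')))
# google_drive_tools is a local helper module, expected to be found in ../input
from google_drive_tools import *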

In [9]:
df=read_drive_excel('CIB_Wos.xlsx')
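The implementation of `read_drive_excel` is not shown here. As a rough sketch of the lookup step it presumably performs, the code below reads the `[FILES]` section of `drive.cfg` with the standard `configparser` module and maps the file name to its Google Drive ID. The helper names `drive_file_id` and `drive_download_url`, and the download URL pattern, are assumptions made for illustration and are not taken from `google_drive_tools`.

```python
import configparser

# Same content as the drive.cfg file written above
CFG_TEXT = """
[FILES]
CIB_Wos.xlsx                                = 0BxoOXsn2EUNIRjJkQ1VEamdJXzA
"""

def drive_file_id(cfg_text, filename, section='FILES'):
    """Return the Drive ID stored for `filename` (hypothetical helper)."""
    parser = configparser.ConfigParser()
    parser.optionxform = str          # keep keys case-sensitive (default lowercases them)
    parser.read_string(cfg_text)
    return parser[section][filename]  # surrounding whitespace is already stripped

def drive_download_url(file_id):
    """Build a direct-download URL (assumed pattern, not confirmed by the source)."""
    return 'https://drive.google.com/uc?export=download&id={}'.format(file_id)

file_id = drive_file_id(CFG_TEXT, 'CIB_Wos.xlsx')
url = drive_download_url(file_id)
print(url)
```

Under these assumptions, the downloaded bytes could then be handed to `pd.read_excel` to obtain the `DataFrame`.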

Check the size of the `DataFrame`:

In [11]:
df.shape

(415, 58)
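The tuple returned by `shape` is `(number of rows, number of columns)`, so the table above has 415 records and 58 fields. A small sketch on a toy table (the values are placeholders, not the real file):

```python
import pandas as pd

# Toy table with 3 rows and 2 columns, for illustration only
toy = pd.DataFrame({'AU': ['Agudelo, CA', 'Munoz, C', 'Tobon, AM'],
                    'DT': ['Article'] * 3})

n_rows, n_cols = toy.shape   # unpack (rows, columns)
print(n_rows, n_cols)
print(len(toy))              # len() gives the number of rows only
```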

In [13]:
df.sample()

Unnamed: 0,AB,AF,AU,BP,C1,CR,DE,DI,DT,EI,...,SU,CA,MA,PN,BE,BN,D2,SE,SP,HO
32,Background: The implications of the Cryptococc...,"Andres Agudelo, Carlos\nMunoz, Carolina\nRamir...","Agudelo, CA\nMunoz, C\nRamirez, A\nTobon, AM\n...",214\n,"[Andres Agudelo, Carlos; Munoz, Carolina; Rami...","Aller AI, 2000, ANTIMICROB AGENTS CH, V44, P15...",AIDS; Cryptococcus neoformans; Fluconazole res...,10.1016/j.riam.2014.07.006\n,Article\n,,...,,,,,,,,,,


In [12]:
df=df.fillna('')

Unnamed: 0,AB,AF,AU,BP,C1,CR,DE,DI,DT,EI,...,SU,CA,MA,PN,BE,BN,D2,SE,SP,HO
0,Dimorphic human pathogenic fungi interact with...,"Tamayo, Diana\nMunoz, Jose F.\nAlmeida, Agosti...","Tamayo, D\nMunoz, JF\nAlmeida, AJ\nPuerta, JD\...",22\n,"[Tamayo, Diana; Munoz, Jose F.; Almeida, Agost...","Almeida AJ, 2007, FUNGAL GENET BIOL, V44, P138...",Paracoccidioides spp.; Dimorphic fungal pathog...,10.1016/j.fgb.2017.01.005\n,Article\n,1096-0937\n,...,,,,,,,,,,
1,Background: Onychomycosis is a highly prevalen...,"Velasquez-Agudelo, Veronica\nAntonio Cardona-A...","Velasquez-Agudelo, V\nCardona-Arias, JA\n",,"[Velasquez-Agudelo, Veronica] Biol Res Corp, M...","Abraira V., 2006, SEMERGEN, V32, P24\nAlkhayat...",Onychomycosis; Diagnosis; Validation studies; ...,10.1186/s12879-017-2258-3\n,Article\n,,...,,,,,,,,,,
2,Introduction: Lymphadenopathy is a frequent cl...,"Rodriguez-Vega, Federico\nBotero, Miguel\nAlbe...","Rodriguez-Vega, F\nBotero, M\nCortes, JA\nTobo...",79\n,"[Rodriguez-Vega, Federico; Alberto Cortes, Jor...","Arango M, 2011, BIOMEDICA, V31, P344, DOI 10.1...",Lymphatic diseases; HIV; biopsy; lymph node; o...,10.7705/biomedica.v37i1.3293\n,Article\n,,...,,,,,,,,,,
3,"In November 2015, a large mine-tailing dam own...","Garcia, Leticia Couto\nRibeiro, Danilo Bandini...","Garcia, LC\nRibeiro, DB\nRoque, FD\nOchoa-Quin...",5\n,"[Garcia, Leticia Couto; Ribeiro, Danilo Bandin...","Azam S., 2010, GEOTECH N, V28, P50\nBai YL, 20...",biodiversity losses; compensation; environment...,10.1002/eap.1461\n,Article\n,1939-5582\n,...,,,,,,,,,,
4,Tuberculous lymphadenitis is the most common e...,"Carlos Catano, Juan\nRobledo, Jaime\n","Catano, JC\nRobledo, J\n",,"[Carlos Catano, Juan] Univ Antioquia, Sch Med,...","Abdissa K, 2015, TROP MED INT HEALTH, V20, P15...",,10.1128/microbiolspec.TNMI7-0008-2016\n,Article\n,2165-0497\n,...,,,,,,,,,,
5,Background: Colombia currently does not have a...,"Caceres, Diego H.\nDavid Zapata, Juan\nGranada...","Caceres, DH\nZapata, JD\nGranada, SD\nCano, LE...",230\n,"[Caceres, Diego H.; David Zapata, Juan; Cano, ...","Ananda-Rajah MR, 2012, CURR OPIN INFECT DIS, V...",Antifungal agents; Posaconazole; High performa...,10.1016/j.riam.2015.09.002\n,Article\n,,...,,,,,,,,,,
6,,"Roque, Fabio O.\nOchoa-Quintero, Jose\nRibeiro...","Roque, FO\nOchoa-Quintero, J\nRibeiro, DB\nSug...",1131\n,"[Roque, Fabio O.; Ochoa-Quintero, Jose; Ribeir...","Alho CJR, 2011, BRAZ J BIOL, V71, P327, DOI 10...",,10.1111/cobi.12713\n,Article\n,1523-1739\n,...,,,,,,,,,,
7,Histoplasmosis is an importantmycosis in the A...,"Lopez, Luisa F.\nValencia, Yorlady\nTobon, Ang...","Lopez, LF\nValencia, Y\nTobon, AM\nVelasquez, ...",677\n,"[Lopez, Luisa F.; Valencia, Yorlady; Tobon, An...","Adderson EE, 2004, J PEDIATR-US, V144, P100, D...",Histoplasmosis; Children; Diagnosis\n,10.1093/mmy/myw020\n,Article\n,1460-2709\n,...,,,,,,,,,,
8,Chronic stages of paracoccidioidomycosis (PCM)...,"David Puerta-Arias, Juan\nAndrea Pino-Tamayo, ...","Puerta-Arias, JD\nPino-Tamayo, PA\nArango, JC\...",,"[David Puerta-Arias, Juan; Andrea Pino-Tamayo,...","Abadie V, 2005, BLOOD, V106, P1843, DOI 10.118...",,10.1371/journal.pone.0163985\n,Article\n,,...,,,,,,,,,,
9,The Paracoccidioides genus includes two specie...,"Munoz, Jose F.\nFarrer, Rhys A.\nDesjardins, C...","Munoz, JF\nFarrer, RA\nDesjardins, CA\nGallo, ...",,"[Munoz, Jose F.; Gallo, Juan E.; Misas, Elizab...","Almeida AJ, 2007, FUNGAL GENET BIOL, V44, P25,...",Paracoccidioides; evolution; genetic recombina...,10.1128/mSphere.00213-16\n,Article\n,,...,,,,,,,,,,
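The effect of `fillna('')` is easier to see on a small table. The sketch below uses a made-up two-row `DataFrame` with missing values; every `NaN` is replaced by an empty string, which is convenient for text columns but means that a numeric column receiving `''` no longer holds only numbers.

```python
import numpy as np
import pandas as pd

# Toy data with missing entries, for illustration only
toy = pd.DataFrame({'AU': ['Agudelo, CA', np.nan],
                    'BP': [214.0, np.nan]})

filled = toy.fillna('')              # returns a new DataFrame; toy is unchanged

print(toy.isna().sum().sum())        # missing values before
print(filled.isna().sum().sum())     # missing values after
```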
