## How to manage data with the pandas library

### Reading information from a CSV file

In this example I want to show you how to use the ***pandas library*** to read data from an external resource such as a **CSV** file.
We are going to read a file with information about different apartment offers in the city of Cali (name, area, address, price, etc.).


In [57]:
#Import pandas library
import pandas as pd

#Define a variable to save our data and read the file using the read_csv function
original_data = pd.read_csv("./01_apartments_information.csv")

After reading the file, we can use the ***shape*** attribute to find out how many rows and columns our data has.

In [58]:
original_data.shape

(1755, 11)
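
The same two steps can be tried without the apartments file. The following is a minimal sketch that reads CSV text from memory. The **sample_csv** text is an assumption made for illustration: three rows and four columns copied by hand from the apartments data, not the full file.

```python
import io
import pandas as pd

# Assumption: a tiny stand-in for the apartments file (3 rows, 4 columns).
sample_csv = """Project,Zone,Bedrooms,Area
Ed. Rincón del Bosque,Norte,1,20.0
Vertice Suites - Apartaestudios,Sur,1,28.0
Trikala,Sur,2,29.0
"""

# read_csv accepts a file path or a file-like object such as StringIO.
sample_data = pd.read_csv(io.StringIO(sample_csv))

# shape is a tuple: (number of rows, number of columns).
n_rows, n_cols = sample_data.shape
print(n_rows, n_cols)
```

Because **shape** is a plain tuple, we can unpack it into two variables as shown.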

The output shows that our file contains **1755** records (rows) and **11** columns. Now we can examine the content of our DataFrame using the **head()** method, which shows the first five rows by default. We can also pass the number of rows that we want to see as a parameter: ***head(# rows)***

In [59]:
original_data.head()

| | Project | Seller | Address | Type | Zone | Bedrooms | # Gar. | Area | Sub Zone | Floor | Price 2013 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Ed. Rincón del Bosque | Inv. CIMAR | AV. 10 N # 51N-36 | Apto. | Norte | 1 | 0 | 20.0 | La Flora | 101 | 28000000 |
| 1 | Vertice Suites - Apartaestudios | C Y C Urbanizadores | Cr. 83E Cl. 48 Bis | Apto. | Sur | 1 | 0 | 28.0 | Caney | 303 | 42500000 |
| 2 | Trikala | Const. Sintagma | Cr. 49 # 14C-15 | Apto. | Sur | 2 | 2 a1 | 29.0 | Ingenio | 301 | 41300000 |
| 3 | Edificio Camp. Towers | Hérnandez Bohmer | Cr. 100 Cl. 11A | Apto. | Sur | 1 | 0 | 29.0 | Ciudad Jardín | 7 | 53753490 |
| 4 | Vertice Suites - Apartaestudios | C Y C Urbanizadores | Cr. 83E Cl. 48 Bis | Apto. | Sur | 1 | 0 | 29.0 | Caney | 309 | 48500000 |
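
To see how the row-count parameter of **head()** behaves, here is a small sketch. The **sample_data** DataFrame is an assumption: a hand-built stand-in with four rows taken from the output above, so the code runs without the CSV file.

```python
import pandas as pd

# Assumption: a small hand-built DataFrame standing in for original_data.
sample_data = pd.DataFrame({
    "Project": ["Ed. Rincón del Bosque", "Vertice Suites - Apartaestudios",
                "Trikala", "Edificio Camp. Towers"],
    "Area": [20.0, 28.0, 29.0, 29.0],
})

first_two = sample_data.head(2)  # only the first two rows
all_rows = sample_data.head()    # default is 5 rows, but only 4 exist
print(first_two)
```

When the DataFrame has fewer rows than requested, **head()** simply returns all of them.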


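As we can see in the output, pandas added an index (the leftmost column, numbered from 0) that is not part of the file. The **index_col** parameter of **read_csv** controls where the index comes from. Passing **index_col = False**, as below, tells pandas not to use any column of the file as the index, so the automatic numbering is kept. Passing a column position or name instead (for example **index_col = 0**) would make that column of the file the index: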

In [60]:
original_data = pd.read_csv("./01_apartments_information.csv", index_col = False)
original_data.head()

| | Project | Seller | Address | Type | Zone | Bedrooms | # Gar. | Area | Sub Zone | Floor | Price 2013 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Ed. Rincón del Bosque | Inv. CIMAR | AV. 10 N # 51N-36 | Apto. | Norte | 1 | 0 | 20.0 | La Flora | 101 | 28000000 |
| 1 | Vertice Suites - Apartaestudios | C Y C Urbanizadores | Cr. 83E Cl. 48 Bis | Apto. | Sur | 1 | 0 | 28.0 | Caney | 303 | 42500000 |
| 2 | Trikala | Const. Sintagma | Cr. 49 # 14C-15 | Apto. | Sur | 2 | 2 a1 | 29.0 | Ingenio | 301 | 41300000 |
| 3 | Edificio Camp. Towers | Hérnandez Bohmer | Cr. 100 Cl. 11A | Apto. | Sur | 1 | 0 | 29.0 | Ciudad Jardín | 7 | 53753490 |
| 4 | Vertice Suites - Apartaestudios | C Y C Urbanizadores | Cr. 83E Cl. 48 Bis | Apto. | Sur | 1 | 0 | 29.0 | Caney | 309 | 48500000 |
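
The effect of **index_col** is easier to see on a tiny file. The sketch below compares three calls to **read_csv**. The CSV text and its **Id** column are hypothetical, invented only for this illustration; they are not part of the apartments file.

```python
import io
import pandas as pd

# Assumption: CSV text whose first column ("Id") holds row identifiers.
sample_csv = """Id,Project,Zone
10,Trikala,Sur
20,Atelier Edificio,Oeste
"""

# Default: pandas builds its own integer index and keeps Id as a column.
default_index = pd.read_csv(io.StringIO(sample_csv))

# index_col=0: the first column of the file becomes the index.
id_index = pd.read_csv(io.StringIO(sample_csv), index_col=0)

# index_col=False: no column of the file is used as the index.
no_index_col = pd.read_csv(io.StringIO(sample_csv), index_col=False)

print(list(default_index.index), list(id_index.index))
```

For a well-formed file like this one, **index_col = False** gives the same result as the default call.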


### Selecting data

Our data is now stored in a DataFrame. To operate on it we can use the tools that the pandas library provides. For example, we can access an attribute (a column) using the following syntax:

In [61]:
original_data["Project"]

0                 Ed. Rincón del Bosque
1       Vertice Suites - Apartaestudios
2                               Trikala
3                 Edificio Camp. Towers
4       Vertice Suites - Apartaestudios
                     ...               
1750                   Atelier Edificio
1751              Riberas del Aguacatal
1752            Rincon del Camp. Ed. IV
1753                 Arboleda Reservado
1754                Parque de Normandia
Name: Project, Length: 1755, dtype: object
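
The bracket syntax also accepts a list of column names. The following sketch shows both forms on a hand-built **sample_data** DataFrame, which is an assumption standing in for the real data.

```python
import pandas as pd

# Assumption: a small hand-built DataFrame standing in for original_data.
sample_data = pd.DataFrame({
    "Project": ["Ed. Rincón del Bosque", "Trikala"],
    "Zone": ["Norte", "Sur"],
    "Bedrooms": [1, 2],
})

projects = sample_data["Project"]          # one name gives a Series
subset = sample_data[["Project", "Zone"]]  # a list of names gives a DataFrame
print(type(projects).__name__, type(subset).__name__)
```

A single name returns one column as a Series, while a list of names returns a smaller DataFrame.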

Selecting a column this way is an easy way to read information. However, pandas provides two indexers that we can use for more advanced selection: ***loc*** and ***iloc***.

### Index-based selection

pandas supports two selection paradigms. The first is selecting data based on its numerical position in the data (index-based selection). We can use the **iloc** indexer to work in this way.

For instance, to select the first row of data in our dataset we can do the following:

In [62]:
original_data.iloc[0]

Project       Ed. Rincón del Bosque
Seller                   Inv. CIMAR
Address           AV. 10 N # 51N-36
Type                          Apto.
Zone                          Norte
Bedrooms                          1
# Gar.                            0
Area                             20
Sub Zone                   La Flora
Floor                           101
Price 2013               28,000,000
Name: 0, dtype: object
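
A row selected with **iloc** is returned as a Series whose labels are the column names. Here is a short sketch of what we can do with it, using a hand-built **sample_data** DataFrame as an assumed stand-in for the real data.

```python
import pandas as pd

# Assumption: a small hand-built DataFrame standing in for original_data.
sample_data = pd.DataFrame({
    "Project": ["Ed. Rincón del Bosque", "Trikala"],
    "Zone": ["Norte", "Sur"],
    "Bedrooms": [1, 2],
})

first_row = sample_data.iloc[0]        # Series labeled by the column names
zone = first_row["Zone"]               # read one field of that row by name
single_value = sample_data.iloc[0, 2]  # row 0, column 2: a single value
print(zone, single_value)
```

Giving **iloc** both a row position and a column position returns one cell instead of a whole row.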

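With **iloc** the row selector comes first and the column selector second. To get a column, we use **:** to select every row, followed by the position of the column: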

In [63]:
original_data.iloc[:, 0]

0                 Ed. Rincón del Bosque
1       Vertice Suites - Apartaestudios
2                               Trikala
3                 Edificio Camp. Towers
4       Vertice Suites - Apartaestudios
                     ...               
1750                   Atelier Edificio
1751              Riberas del Aguacatal
1752            Rincon del Camp. Ed. IV
1753                 Arboleda Reservado
1754                Parque de Normandia
Name: Project, Length: 1755, dtype: object
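
Selecting a column by position and selecting it by name should give the same result. The sketch below checks this on a hand-built **sample_data** DataFrame, which is an assumption standing in for the real data.

```python
import pandas as pd

# Assumption: a small hand-built DataFrame standing in for original_data.
sample_data = pd.DataFrame({
    "Project": ["Ed. Rincón del Bosque", "Trikala"],
    "Zone": ["Norte", "Sur"],
})

first_col = sample_data.iloc[:, 0]  # every row, column at position 0
by_name = sample_data["Project"]    # the same column, selected by name

# Series.equals compares both the values and the index labels.
same = first_col.equals(by_name)
print(same)
```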

We can also combine row and column selectors. For example, to select the ***Project*** name for the first, second and third rows we can give a range of positions. The range **:3** starts at position 0 and stops before position 3:

In [64]:
original_data.iloc[:3,0]

0              Ed. Rincón del Bosque
1    Vertice Suites - Apartaestudios
2                            Trikala
Name: Project, dtype: object
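
Ranges work for columns as well as for rows, and in both cases the end position is excluded. Here is a minimal sketch on a hand-built **sample_data** DataFrame, assumed as a stand-in for the real data.

```python
import pandas as pd

# Assumption: a small hand-built DataFrame standing in for original_data.
sample_data = pd.DataFrame({
    "Project": ["Ed. Rincón del Bosque", "Vertice Suites - Apartaestudios",
                "Trikala", "Edificio Camp. Towers"],
    "Zone": ["Norte", "Sur", "Sur", "Sur"],
    "Bedrooms": [1, 1, 2, 1],
})

first_three = sample_data.iloc[:3, 0]  # rows 0, 1, 2 of column 0
block = sample_data.iloc[1:3, 0:2]     # rows 1 and 2, columns 0 and 1
print(block)
```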

We can also pass a list of positions to indicate the rows that we want to retrieve:

In [65]:
original_data.iloc[[0,1,2], 0]

0              Ed. Rincón del Bosque
1    Vertice Suites - Apartaestudios
2                            Trikala
Name: Project, dtype: object
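
Unlike a range, a list does not need to be contiguous or sorted, and it can be used for the columns too. The sketch below uses a hand-built **sample_data** DataFrame, assumed as a stand-in for the real data.

```python
import pandas as pd

# Assumption: a small hand-built DataFrame standing in for original_data.
sample_data = pd.DataFrame({
    "Project": ["Ed. Rincón del Bosque", "Vertice Suites - Apartaestudios",
                "Trikala", "Edificio Camp. Towers"],
    "Zone": ["Norte", "Sur", "Sur", "Sur"],
    "Bedrooms": [1, 1, 2, 1],
})

# Rows 3 and 0 (in that order) and columns 0 and 2.
picked = sample_data.iloc[[3, 0], [0, 2]]
print(picked)
```

The result keeps the rows in the order given in the list.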

We can also use negative values to retrieve information counting from the end of the dataset. For example, if we want to select the last six projects and their information we can do the following:

In [66]:
original_data.iloc[-6:]

| | Project | Seller | Address | Type | Zone | Bedrooms | # Gar. | Area | Sub Zone | Floor | Price 2013 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1749 | Arboleda Reservado | Skema | Cr. 3 Oeste # 7-87 | Apto. | Oeste | 4 | 4 | 460.0 | Santa Teresita | 401 | 1736000000 |
| 1750 | Atelier Edificio | JM Inm. | Av. 10A Nte. # 7-64 | Apto. | Oeste | 4 | 4 | 468.0 | Sta Monica Nte | Ph8 | 1032330000 |
| 1751 | Riberas del Aguacatal | Socorro Escobar | Av. Aguacatal Cl. 1 Oeste | Apto. | Oeste | 4 | 2 | 514.0 | Aguacatal | 6 | 1040000000 |
| 1752 | Rincon del Camp. Ed. IV | Const. Melendez | Cr. 98 Cl. 5 | Apto. | Sur | 4 | 3 | 517.0 | Ciudad Jardín | Dp1302 | 1872996000 |
| 1753 | Arboleda Reservado | Skema | Cr. 3 Oeste # 7-87 | Apto. | Oeste | 4 | 5 | 575.0 | Santa Teresita | 102 | 2300000000 |
| 1754 | Parque de Normandia | Socorro Escobar | Av. 6 Oeste # 5-170 | Apto. | Oeste | 4 | 4 | 646.0 | Aguacatal | 1 | 1500000000 |
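
Negative positions count backwards from the last row, so **-1** is the last row and **-2:** means the last two rows. Here is a minimal sketch on a hand-built **sample_data** DataFrame, assumed as a stand-in for the real data.

```python
import pandas as pd

# Assumption: a small hand-built DataFrame standing in for original_data.
sample_data = pd.DataFrame({
    "Project": ["Atelier Edificio", "Riberas del Aguacatal",
                "Arboleda Reservado", "Parque de Normandia"],
    "Bedrooms": [4, 4, 4, 4],
})

last_row = sample_data.iloc[-1]   # the final row, as a Series
last_two = sample_data.iloc[-2:]  # the final two rows, as a DataFrame
print(last_row["Project"])
```

The selected rows keep their original index labels, which is why the output above starts at 1749 and not at 0.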
