# Pandas Introduction

Pandas is one of the most widely used Python libraries in data science. 
It provides high-performance, easy-to-use data structures and data analysis tools. 
Its central object is the DataFrame, an in-memory 2D table 
that can be compared to a spreadsheet with column names and row labels.
Building on this 2D table, pandas offers many additional features, 
such as creating pivot tables, computing columns from other columns, and plotting graphs.

In [1]:
import pandas as pd

## 1 Pandas Series

In [2]:
data = pd.Series([0.25, 0.5, 0.75, 0.1])
data

0    0.25
1    0.50
2    0.75
3    0.10
dtype: float64

In [3]:
data.values

array([ 0.25,  0.5 ,  0.75,  0.1 ])

In [4]:
# The index is an array-like object of type pd.Index
data.index

RangeIndex(start=0, stop=4, step=1)
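To make the "array-like" description concrete, here is a minimal sketch of a few things an index can and cannot do. The variable names `idx` and `immutable` are chosen for illustration only.

```python
import pandas as pd

data = pd.Series([0.25, 0.5, 0.75, 0.1])
idx = data.index

# An Index supports array-style access: length, positional lookup, slicing.
print(len(idx), idx[1], list(idx[1:3]))

# Unlike a NumPy array, an Index cannot be modified in place.
immutable = False
try:
    idx[0] = 10
except TypeError as exc:
    immutable = True
    print("Index is immutable:", exc)
```

Because the index cannot change underneath it, the same index object can safely be shared between several Series or DataFrames.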

In [5]:
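print(data[1])    # single element, looked up by its index label
print(data[1:3])  # integer slice, taken by position; the stop is excluded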

0.5
1    0.50
2    0.75
dtype: float64
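With a default integer index, labels and positions coincide, so it is easy to lose track of which one a lookup uses. As a small sketch, the `.loc` (label) and `.iloc` (position) accessors make the choice explicit; the variable names below are illustrative.

```python
import pandas as pd

data = pd.Series([0.25, 0.5, 0.75, 0.1])

# .iloc always selects by integer position, .loc always selects by label.
by_position = data.iloc[1]   # second element
by_label = data.loc[1]       # element whose label is 1 (the same row here)

# Positional slices exclude the stop; label slices include it.
pos_slice = data.iloc[1:3]   # positions 1 and 2
label_slice = data.loc[1:3]  # labels 1, 2 and 3

print(by_position, by_label)
print(len(pos_slice), len(label_slice))
```

The two slices differ in length because a label slice includes its end label.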


In [6]:
data = pd.Series([0.25, 0.5, 0.75, 1.0], index=['a', 'b', 'c', 'd'])
print(data)
print(data['b'])

a    0.25
b    0.50
c    0.75
d    1.00
dtype: float64
0.5


In this way, you can think of a Pandas Series as a specialized Python dictionary.
A dictionary is a structure that maps arbitrary keys to a set of arbitrary
values, and a Series is a structure that maps typed keys to a set of typed values. This
typing is important: because all values share a single dtype, a Pandas Series is
much more efficient than a Python dictionary for certain operations, such as element-wise arithmetic.
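The following sketch contrasts the two structures on the same data. The dictionary `prices` and the other variable names are made up for illustration.

```python
import pandas as pd

prices = {"a": 0.25, "b": 0.5, "c": 0.75, "d": 1.0}
series = pd.Series(prices)

# With a dict, arithmetic needs an explicit loop over the items.
doubled_dict = {key: value * 2 for key, value in prices.items()}

# With a Series, the same operation is vectorized over the typed values,
# and the labels are carried along automatically.
doubled_series = series * 2

print(series.dtype)            # one dtype shared by all values
print(doubled_series["c"])     # dictionary-style lookup still works
print(series[series > 0.4])    # boolean mask, with no dict equivalent
```

Both approaches give the same numbers; the Series version avoids the Python-level loop and also supports operations such as boolean masking.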

In [10]:
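city_population = {"Antwerpen": 525935, "Gent": 262219, "Brugge": 118325, "Mechelen": 86616, "Aalst": 86445, "Kortrijk": 76735, "Oostende": 71494, "Genk": 66227}
# Recent pandas versions keep the dict's insertion order, so sort by label
# to get the alphabetical order shown in the output below.
population = pd.Series(city_population).sort_index()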
print(population)
print(population['Brugge'])

Aalst         86445
Antwerpen    525935
Brugge       118325
Genk          66227
Gent         262219
Kortrijk      76735
Mechelen      86616
Oostende      71494
dtype: int64
118325


In [11]:
# Label-based slice: unlike a positional slice, the end label is included
population['Gent':'Oostende']

Gent        262219
Kortrijk     76735
Mechelen     86616
Oostende     71494
dtype: int64
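A label slice follows the order of the rows in the index, not alphabetical order, so the result depends on whether the index is sorted. Here is a minimal sketch on a four-city subset of the data above. The names `unsorted` and `ordered` are illustrative, and `.loc` is used to make the label-based lookup explicit.

```python
import pandas as pd

# Build the Series with an explicit, deliberately unsorted index.
unsorted = pd.Series(
    [525935, 262219, 118325, 71494],
    index=["Antwerpen", "Gent", "Brugge", "Oostende"],
)
ordered = unsorted.sort_index()

# On the sorted index the slice covers an alphabetical range,
# and both end labels are included.
print(ordered.loc["Brugge":"Gent"])

# On the unsorted index the slice returns every row located between
# the two labels, in storage order.
print(unsorted.loc["Gent":"Oostende"])
```

Sorting the index first with `sort_index()` keeps label slices predictable.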

## 2 Pandas DataFrame

In [12]:
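city_area = {"Antwerpen": 204.51, "Gent": 156.18, "Brugge": 138.40, "Mechelen": 33.71, "Aalst": 78.12, "Kortrijk": 80.02, "Oostende": 37.72, "Genk": 87.85}
print(city_area)
# Sort by label so that the Series prints in the alphabetical order shown below
area = pd.Series(city_area).sort_index()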
print(area)

{'Antwerpen': 204.51, 'Gent': 156.18, 'Brugge': 138.4, 'Mechelen': 33.71, 'Aalst': 78.12, 'Kortrijk': 80.02, 'Oostende': 37.72, 'Genk': 87.85}
Aalst         78.12
Antwerpen    204.51
Brugge       138.40
Genk          87.85
Gent         156.18
Kortrijk      80.02
Mechelen      33.71
Oostende      37.72
dtype: float64


In [13]:
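# Each Series becomes one column; rows are matched on the shared index labels.
# Listing 'area' first gives the column order shown below on recent pandas versions.
cities = pd.DataFrame({'area': area, 'population': population})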
print(cities)

             area  population
Aalst       78.12       86445
Antwerpen  204.51      525935
Brugge     138.40      118325
Genk        87.85       66227
Gent       156.18      262219
Kortrijk    80.02       76735
Mechelen    33.71       86616
Oostende    37.72       71494
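The two Series above happen to share exactly the same labels. As a sketch of what happens when they do not, the example below uses made-up subsets of the data: rows are matched by label, and missing entries are filled with `NaN`.

```python
import pandas as pd

# Hypothetical subsets: 'Genk' has no area, 'Aalst' has no population.
population = pd.Series({"Gent": 262219, "Brugge": 118325, "Genk": 66227})
area = pd.Series({"Brugge": 138.40, "Gent": 156.18, "Aalst": 78.12})

# The constructor aligns the Series on the union of their index labels.
cities = pd.DataFrame({"area": area, "population": population})
print(cities)

# Rows with a missing value can be dropped if only complete records are wanted.
complete = cities.dropna()
print(complete)
```

Note that a column containing `NaN` is stored as floating point, so the population values print as floats here.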


In [14]:
print(cities.index)

Index(['Aalst', 'Antwerpen', 'Brugge', 'Genk', 'Gent', 'Kortrijk', 'Mechelen',
       'Oostende'],
      dtype='object')


In [15]:
cities.columns

Index(['area', 'population'], dtype='object')

In [16]:
print(cities['area'])

Aalst         78.12
Antwerpen    204.51
Brugge       138.40
Genk          87.85
Gent         156.18
Kortrijk      80.02
Mechelen      33.71
Oostende      37.72
Name: area, dtype: float64
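Since each column is a Series, columns can be combined with ordinary arithmetic to compute new ones. The sketch below uses three of the cities and adds a column named `density`, a name chosen here for illustration. The source does not state the unit of `area`, so the result is simply inhabitants per unit of area.

```python
import pandas as pd

cities = pd.DataFrame(
    {"area": [204.51, 156.18, 138.40], "population": [525935, 262219, 118325]},
    index=["Antwerpen", "Gent", "Brugge"],
)

# Element-wise division of two columns; the result is aligned on the row labels.
cities["density"] = cities["population"] / cities["area"]

# idxmax returns the row label of the largest value.
densest = cities["density"].idxmax()

print(cities.sort_values("density", ascending=False))
print(densest)
```

Assigning to a new column name adds the column in place; the existing columns are left unchanged.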
