In [1]:
import pandas as pd
import numpy as np

This is Chapter 13, "Introducing Pandas Objects".

## The Pandas Series Object

In [2]:
data = pd.Series([0.25, 0.5, 0.75, 1.0])

In [3]:
data

0    0.25
1    0.50
2    0.75
3    1.00
dtype: float64

In [4]:
data.index, list(data.index)

(RangeIndex(start=0, stop=4, step=1), [0, 1, 2, 3])

In [5]:
data.values, type(data.values)

(array([0.25, 0.5 , 0.75, 1.  ]), numpy.ndarray)

In [6]:
data[0]

np.float64(0.25)

In [7]:
data[0:2]

0    0.25
1    0.50
dtype: float64

### Series as Generalized NumPy Array

In [8]:
data = pd.Series([0.25, 0.5, 0.75, 1.0], index=['a', 'b', 'c', 'd'])

In [9]:
data

a    0.25
b    0.50
c    0.75
d    1.00
dtype: float64

In [10]:
data['a']

np.float64(0.25)

### Series as Specialized Dictionary

In [11]:
population_dict = {'California': 39538223, 'Texas': 29145505,
                   'Florida': 21538187, 'New York': 20201249,
                   'Pennsylvania': 13002700}

In [12]:
population = pd.Series(population_dict)

In [13]:
population

California      39538223
Texas           29145505
Florida         21538187
New York        20201249
Pennsylvania    13002700
dtype: int64

In [14]:
population['California']

np.int64(39538223)

In [15]:
population['California':'Florida']

California    39538223
Texas         29145505
Florida       21538187
dtype: int64

## The Pandas DataFrame Object

### DataFrame as Generalized NumPy Array

In [16]:
population

California      39538223
Texas           29145505
Florida         21538187
New York        20201249
Pennsylvania    13002700
dtype: int64

In [17]:
area_dict = {'California': 423967, 'Texas': 695662, 'Florida': 170312,
             'New York': 141297, 'Pennsylvania': 119280}

In [18]:
area = pd.Series(area_dict)

In [19]:
states = pd.DataFrame({'population': population,
                       'area': area})

In [20]:
states

              population    area
California      39538223  423967
Texas           29145505  695662
Florida         21538187  170312
New York        20201249  141297
Pennsylvania    13002700  119280


In [21]:
states.index

Index(['California', 'Texas', 'Florida', 'New York', 'Pennsylvania'], dtype='object')

In [22]:
states.columns

Index(['population', 'area'], dtype='object')

### DataFrame as Specialized Dictionary

In [23]:
states['area']

California      423967
Texas           695662
Florida         170312
New York        141297
Pennsylvania    119280
Name: area, dtype: int64

### Constructing DataFrame Objects

A Pandas DataFrame can be constructed in a variety of ways:
- From a single Series object
- From a list of dicts
- From a dictionary of Series objects
- From a two-dimensional NumPy array
- From a NumPy structured array
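The construction routes listed above can be sketched as follows. This is a minimal illustration rather than an exhaustive reference: the small `population` and `area` Series reuse two of the values from earlier in this chapter, while the names `rows`, `grid`, and `records` and their contents are made up for the example. For the single-Series case the sketch uses `Series.to_frame`, which is one of several ways to get a one-column DataFrame.

```python
import numpy as np
import pandas as pd

population = pd.Series({'California': 39538223, 'Texas': 29145505})
area = pd.Series({'California': 423967, 'Texas': 695662})

# From a single Series: the result is a one-column DataFrame
df_series = population.to_frame(name='population')

# From a list of dicts: each dict becomes a row, keys become columns
rows = [{'a': i, 'b': 2 * i} for i in range(3)]  # hypothetical data
df_rows = pd.DataFrame(rows)

# From a dictionary of Series: keys become column names,
# and the Series are aligned on their shared index
df_states = pd.DataFrame({'population': population, 'area': area})

# From a two-dimensional NumPy array, with explicit labels
grid = np.arange(6).reshape(3, 2)  # hypothetical data
df_array = pd.DataFrame(grid, columns=['foo', 'bar'], index=['a', 'b', 'c'])

# From a NumPy structured array: field names become column names
records = np.zeros(3, dtype=[('A', 'i8'), ('B', 'f8')])  # hypothetical data
df_struct = pd.DataFrame(records)

for df in (df_series, df_rows, df_states, df_array, df_struct):
    print(df, end='\n\n')
```

In each case the row and column labels are either taken from the input (dict keys, Series index, structured-array field names) or fall back to a default integer index when none are given.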

## The Pandas Index Object

### Index as Immutable Array

As we can see, Pandas integrates closely with built-in Python types. If we create an index from a `range`, we get a `RangeIndex`; if we create it from a list, we get a regular `Index`.

In [24]:
ind = pd.Index(range(5))

In [25]:
ind

RangeIndex(start=0, stop=5, step=1)

In [26]:
ind = pd.Index(list(range(5)))

In [27]:
ind

Index([0, 1, 2, 3, 4], dtype='int64')

An `Index` also has a lot in common with a NumPy array: it exposes (almost) the same attributes, and it supports standard indexing and slicing notation to retrieve elements. Unlike a NumPy array, however, an `Index` is immutable, so its elements cannot be modified in place.

In [28]:
ind.size, ind.shape, ind.ndim, ind.dtype

(5, (5,), 1, dtype('int64'))

In [29]:
ind[0]

np.int64(0)

In [30]:
ind[2:]

Index([2, 3, 4], dtype='int64')

In [31]:
try:
    ind[0] = 10
except TypeError:
    print("Cannot mutate an index object...")

Cannot mutate an index object...
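Because an index cannot be changed in place, any "modification" has to produce a new `Index` object. The sketch below illustrates this with the `insert`, `delete`, and `append` methods; the variable names are chosen only for this example.

```python
import pandas as pd

ind = pd.Index([0, 1, 2, 3, 4])

# Each call returns a new Index; `ind` itself is never modified
with_extra = ind.insert(0, 10)            # place the value 10 at position 0
without_first = ind.delete(0)             # drop the element at position 0
extended = ind.append(pd.Index([5, 6]))   # concatenate another Index

print(with_extra)
print(without_first)
print(extended)
print(ind)  # the original index is unchanged
```

This immutability is what makes it safe to share one index between several Series and DataFrames without side effects.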


### Index as Ordered Set

An index facilitates all sorts of set operations between DataFrames (relational operations, much like those in databases). Here, too, the `Index` object follows many of the conventions of Python's built-in `set`. Since I already understand set operations, this should help when working with both databases and Pandas DataFrames.

In [32]:
indA = pd.Index([1, 3, 5, 7, 9])

In [33]:
indA

Index([1, 3, 5, 7, 9], dtype='int64')

In [34]:
indB = pd.Index([2, 3, 5, 7, 11])

In [35]:
indB

Index([2, 3, 5, 7, 11], dtype='int64')

In [36]:
indA.intersection(indB)

Index([3, 5, 7], dtype='int64')

In [37]:
indA.union(indB)

Index([1, 2, 3, 5, 7, 9, 11], dtype='int64')

In [38]:
indA.symmetric_difference(indB)

Index([1, 2, 9, 11], dtype='int64')
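To see why these set operations matter in practice, here is a minimal sketch of how they show up when combining labeled data. The two Series `a` and `b` and their values are hypothetical. Arithmetic between Series aligns on the union of the two indices, leaving missing values where a label appears on only one side, while an explicit `intersection` restricts the computation to the shared labels.

```python
import pandas as pd

a = pd.Series([1, 3, 5], index=['x', 'y', 'z'])      # hypothetical data
b = pd.Series([10, 20, 30], index=['y', 'z', 'w'])   # hypothetical data

# Addition aligns on the union of the indices;
# labels present in only one Series yield NaN
total = a + b
print(total)

# Restrict both Series to the labels they have in common
common = a.index.intersection(b.index)
total_common = a[common] + b[common]
print(total_common)

# Labels that appear in `a` but not in `b`
only_in_a = a.index.difference(b.index)
print(only_in_a)
```

This is the same idea as an outer join versus an inner join in a relational database, expressed through the index rather than through a join key.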