# Introducing Pandas Objects
## Pandas Series Objects

A Pandas Series is a one-dimensional array of indexed data. It can be created from a
list or array as follows:

In [2]:
import numpy as np 
import pandas as pd 
# pandas Series Object
data_1 = pd.Series([0.1, 0.2, 0.3, 0.4, 0.5])
data_1

0    0.1
1    0.2
2    0.3
3    0.4
4    0.5
dtype: float64

As we see in the preceding output, the Series wraps both a sequence of values and a
sequence of indices, which we can access with the values and index attributes. The
values are simply a familiar NumPy array:

In [3]:
# accessing the values
data_1.values

array([0.1, 0.2, 0.3, 0.4, 0.5])

The index is an array-like object of type pd.Index:

In [4]:
# Checking Index
data_1.index

RangeIndex(start=0, stop=5, step=1)

As with a NumPy array, data can be accessed by the associated index via the familiar
Python square-bracket notation:

In [5]:
# accessing data using the associated index
print(data_1[1])
print(data_1[1:3])

0.2
1    0.2
2    0.3
dtype: float64
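
As noted at the start of this section, a Series can be created from an array as well as from a list. The following is a minimal sketch of that case; the names `arr` and `series_from_array` are illustrative and do not come from the cells above.

```python
import numpy as np
import pandas as pd

# Hypothetical names, used only for illustration.
arr = np.array([0.1, 0.2, 0.3])
series_from_array = pd.Series(arr)

# The values come back as a NumPy array.
print(series_from_array.to_numpy())

# With no index given, Pandas supplies a default RangeIndex.
print(series_from_array.index)
```

As with the list-based example, the index defaults to the integers 0 through n-1 when none is supplied.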


## Series as generalized NumPy array
From what we have seen so far, the Series object may look basically interchangeable
with a one-dimensional NumPy array. The essential difference is the presence
of the index: while the NumPy array has an <i>implicitly defined</i> integer index used
to access the values, the Pandas Series has an <i>explicitly defined</i> index associated with
the values.

This <i>explicit index definition</i> gives the Series object additional capabilities. For
example, the index need not be an integer, but can consist of values of any desired
type. If we wish, we can use strings as an index:

In [6]:
# using strings as an index
data_2 = pd.Series([0.1, 0.2, 0.3, 0.4],
                  index = ['a', 'b', 'c', 'd'])
print(data_2)
# And the item access works as expected:
print(data_2['b'])

a    0.1
b    0.2
c    0.3
d    0.4
dtype: float64
0.2


We can even use noncontiguous or nonsequential indices:

In [7]:
# using noncontiguous or nonsequential indices
data_3 = pd.Series([0.1, 0.2, 0.3, 0.4],
                  index = [2, 5, 3, 7])
print(data_3)
print(data_3[5])

2    0.1
5    0.2
3    0.3
7    0.4
dtype: float64
0.2
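
With an explicit integer index, the number inside the square brackets is a label rather than a position. The sketch below rebuilds the same data under the illustrative name `s` and shows that a lookup for a label that is not in the index fails, even though the Series has a first element.

```python
import pandas as pd

# Same data as data_3 above; the name `s` is illustrative.
s = pd.Series([0.1, 0.2, 0.3, 0.4], index=[2, 5, 3, 7])

# 5 is a label here, not a position: it selects the second element.
print(s[5])

# There is no label 0, so the lookup raises KeyError.
try:
    s[0]
    found = True
except KeyError:
    found = False
print(found)
```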


## Series as specialized dictionary
In this way, you can think of a Pandas Series a bit like a specialization of a Python
dictionary. 

A dictionary is a structure that maps arbitrary keys to a set of arbitrary values, and a Series is a structure that maps typed keys to a set of typed values. This typing is important: just as the type-specific compiled code behind a NumPy array
makes it more efficient than a Python list for certain operations, the type information
of a Pandas Series makes it much more efficient than Python dictionaries for certain
operations.
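
One visible consequence of this typing is that both the index and the values carry a dtype, and arithmetic applies to all values at once. The following is a small sketch; the name `typed` and the sample values are made up for illustration, and no timing is measured here.

```python
import numpy as np
import pandas as pd

# Illustrative data: integer keys, float values.
typed = pd.Series({1: 0.5, 2: 1.5, 3: 2.5})

# Both sides of the mapping have a single dtype.
print(typed.index.dtype)
print(typed.dtype)

# Arithmetic is applied to every value without an explicit loop.
doubled = typed * 2
print(doubled.sum())
```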

We can make the Series-as-dictionary analogy even clearer by constructing a
Series object directly from a Python dictionary:

In [8]:
product_rev_dict = {'Bananas': 4000000, 'Onions': 3000000, 'Tomatoes': 3500000, 'Maize Flour': 6300000}
# constructing a Series object directly from a Python dictionary
product_rev = pd.Series(product_rev_dict)
print(type(product_rev)) # Series data type
product_rev

<class 'pandas.core.series.Series'>


Bananas        4000000
Onions         3000000
Tomatoes       3500000
Maize Flour    6300000
dtype: int64

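By default, a Series will be created where the index is drawn from the dictionary keys,
in the order in which they were inserted, as the preceding output shows (older versions
of Pandas sorted the keys instead). From here, typical dictionary-style item access can
be performed: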

In [9]:
# accessing items in the Series
product_rev['Bananas']

4000000
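
If a different order or only some of the keys are wanted, the `index` argument of `pd.Series` can be passed alongside the dictionary. The sketch below uses a shortened, illustrative copy of the revenue dictionary (named `rev`) rather than the one defined above.

```python
import pandas as pd

# Illustrative subset of the revenue data used above.
rev = {'Bananas': 4000000, 'Onions': 3000000, 'Tomatoes': 3500000}

# Only the listed keys are kept, in the order given.
subset = pd.Series(rev, index=['Tomatoes', 'Bananas'])
print(subset)
```

Here 'Onions' is dropped because it is not listed in `index`.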

Unlike a dictionary, though, the Series also supports array-style operations such as
slicing:

In [10]:
# Series supports array-style operations such as:
# slicing
print(product_rev[1:4])
print(product_rev['Bananas':'Tomatoes'])

Onions         3000000
Tomatoes       3500000
Maize Flour    6300000
dtype: int64
Bananas     4000000
Onions      3000000
Tomatoes    3500000
dtype: int64
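
The two slices above follow different endpoint rules: a slice by integer position excludes the stop position, while a slice by label includes the stop label. The following sketch, with illustrative names `rev`, `by_position`, and `by_label`, makes the difference visible by counting the selected rows.

```python
import pandas as pd

# Illustrative copy of the revenue data used above.
rev = pd.Series({'Bananas': 4000000, 'Onions': 3000000,
                 'Tomatoes': 3500000, 'Maize Flour': 6300000})

# Positional slice: positions 0 and 1 only, position 2 is excluded.
by_position = rev[0:2]

# Label slice: 'Tomatoes' itself is included.
by_label = rev['Bananas':'Tomatoes']

print(len(by_position), len(by_label))
```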


## Data Indexing and Selection
### Data Selection in Series

#### Series as dictionary
Like a dictionary, the Series object provides a mapping from a collection of keys to a
collection of values:

In [14]:
data = pd.Series([0.1, 0.2, 0.3, 0.4, 0.5], 
                    index=['a', 'b', 'c', 'd', 'e'])
print(data)
print(data['b'])

a    0.1
b    0.2
c    0.3
d    0.4
e    0.5
dtype: float64
0.2


We can also use dictionary-like Python expressions and methods to examine the
keys/indices and values:

In [21]:
print('a' in data)
print('Keys:', data.keys())
print('Items:', list(data.items()))

True
Keys: Index(['a', 'b', 'c', 'd', 'e'], dtype='object')
Items: [('a', 0.1), ('b', 0.2), ('c', 0.3), ('d', 0.4), ('e', 0.5)]
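
Another dictionary-style method is `get`, which returns a default instead of raising an error when the key is missing. The sketch below uses an illustrative Series named `s` and an arbitrary default of `-1.0`.

```python
import pandas as pd

# Illustrative data, not the `data` Series defined above.
s = pd.Series([0.1, 0.2, 0.3], index=['a', 'b', 'c'])

# An existing key returns its value.
print(s.get('b'))

# A missing key returns the supplied default rather than raising KeyError.
print(s.get('z', -1.0))
```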


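Series objects can even be modified with a dictionary-like syntax. Just as you can
update a dictionary by assigning to a key, you can update a Series by assigning to an
index value. Here the label 'e' already exists, so its value is overwritten; assigning to
a label that is not yet in the index would extend the Series instead: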

In [22]:
data['e'] = 0.999999
data

a    0.100000
b    0.200000
c    0.300000
d    0.400000
e    0.999999
dtype: float64

This easy mutability of the objects is a convenient feature: under the hood, Pandas is
making decisions about memory layout and data copying that might need to take
place; the user generally does not need to worry about these issues.
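
Assignment by label covers both updating and extending. The sketch below shows the two cases side by side on a small illustrative Series named `s`.

```python
import pandas as pd

# Illustrative data, not the `data` Series defined above.
s = pd.Series([0.1, 0.2], index=['a', 'b'])

# 'b' is already in the index, so its value is overwritten.
s['b'] = 0.25

# 'c' is not in the index, so a new entry is appended.
s['c'] = 0.3

print(s)
```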

#### Series as one-dimensional array
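A Series builds on this dictionary-like interface and provides array-style item selection
via the same basic mechanisms as NumPy arrays: slices, masking, and fancy indexing.
Note that slicing by explicit index includes the final label, while slicing by implicit
integer index excludes the final position. Examples of slicing and masking are as follows: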

In [27]:
# slicing by explicit index
print(data['a': 'c'])
# slicing by implicit integer index
print(data[0:3])
# masking
print(data[(data > 0.3) & (data < 0.8)])

a    0.1
b    0.2
c    0.3
dtype: float64
a    0.1
b    0.2
c    0.3
dtype: float64
d    0.4
dtype: float64
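
Fancy indexing, the third mechanism mentioned above, selects several entries at once by passing a list of labels. The following is a minimal sketch on an illustrative Series named `s`.

```python
import pandas as pd

# Illustrative data with string labels.
s = pd.Series([0.1, 0.2, 0.3, 0.4, 0.5], index=['a', 'b', 'c', 'd', 'e'])

# A list of labels inside the brackets returns those entries, in list order.
picked = s[['a', 'e']]
print(picked)
```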


#### Indexers: loc and iloc
These slicing and indexing conventions can be a source of confusion. For example, if
your Series has an explicit integer index, an indexing operation such as data[1] will
use the explicit indices, while a slicing operation like data[1:3] will use the implicit
Python-style index.

In [29]:
data = pd.Series(['a', 'b', 'c'], index = [1, 3, 5])
data

1    a
3    b
5    c
dtype: object

In [33]:
# explicit index when indexing
print(data[1])
# implicit index when slicing
print(data[1:3])

a
3    b
5    c
dtype: object


Because of this potential confusion in the case of integer indexes, Pandas provides
some special indexer attributes that explicitly expose certain indexing schemes. These are not functional methods, but attributes that expose a particular slicing interface to
the data in the Series.

First, the loc attribute allows indexing and slicing that always references the explicit
index:

In [39]:
# loc attribute
print(data.loc[1])
print(data.loc[1:3])

a
1    a
3    b
dtype: object


The iloc attribute allows indexing and slicing that always references the implicit
Python-style index:

In [41]:
# iloc attribute
print(data.iloc[1])
print(data.iloc[1:3])

b
3    b
5    c
dtype: object
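
The two indexers can be compared directly on the same data. The sketch below rebuilds the Series under the illustrative name `s` and shows that `loc` treats numbers as labels and includes the stop label, while `iloc` treats them as positions and excludes the stop position.

```python
import pandas as pd

# Same data as above; the name `s` is illustrative.
s = pd.Series(['a', 'b', 'c'], index=[1, 3, 5])

# Label 3 versus position 2.
print(s.loc[3])   # 'b'
print(s.iloc[2])  # 'c'

# Label slice 1..3 includes label 3; position slice 1..3 excludes position 3.
print(s.loc[1:3].tolist())   # ['a', 'b']
print(s.iloc[1:3].tolist())  # ['b', 'c']
```

Using `loc` or `iloc` explicitly, rather than bare square brackets, keeps the intent clear when the index is made of integers.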
