![](https://upload.wikimedia.org/wikipedia/commons/thumb/e/ed/Pandas_logo.svg/1200px-Pandas_logo.svg.png)
# Introducing Pandas Objects
- **`Series`**
- **`DataFrame`**
- **`Index`**

In [1]:
import numpy as np
import pandas as pd
pd.__version__

'1.0.1'

## The Pandas Series Object: one-dimensional array of indexed data
![](https://storage.googleapis.com/lds-media/images/series-and-dataframe.width-1200.png)

In [2]:
data = pd.Series([0.25, 0.5, 0.75, 1.0])
data

0    0.25
1    0.50
2    0.75
3    1.00
dtype: float64

### `values` & `index`

**``Series``** wraps both a sequence of values and a sequence of indices, which we can access with the **``values``** and **``index``** attributes.  
The **``values``** are simply a familiar NumPy array:

In [3]:
data.values

array([0.25, 0.5 , 0.75, 1.  ])

The **``index``** is an array-like object of type **``pd.Index``**:

In [4]:
data.index

RangeIndex(start=0, stop=4, step=1)
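As a quick check of the two attributes, the following minimal sketch confirms the types involved. The variable names `values_is_array` and `index_is_index` are illustrative only and are not part of the original notebook.

```python
import numpy as np
import pandas as pd

data = pd.Series([0.25, 0.5, 0.75, 1.0])

# `values` exposes the underlying data as a NumPy array
values_is_array = isinstance(data.values, np.ndarray)

# the default index is a RangeIndex, which is a subclass of pd.Index
index_is_index = isinstance(data.index, pd.Index)

print(values_is_array, index_is_index)
```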

### Indexing

In [5]:
data[1]

0.5

In [6]:
data[1:3]

1    0.50
2    0.75
dtype: float64

### ``Series`` as generalized NumPy array

The essential difference between a **``Series``** and a NumPy **``array``** is the presence of the index:
- The NumPy array has an ***implicitly defined* integer index** used to access the values.
- The Pandas ``Series`` has an ***explicitly defined* index** associated with the values.

This explicit index definition gives the ``Series`` object additional capabilities. The index need not be an integer, but can consist of values of any desired type.
For example, if we wish, we can **use strings as an index**:

In [7]:
data = pd.Series([0.25, 0.5, 0.75, 1.0],
                 index=['a', 'b', 'c', 'd'])
data

a    0.25
b    0.50
c    0.75
d    1.00
dtype: float64

And the item access works as expected:

In [8]:
data['b']

0.5

We can even use **non-contiguous** or **non-sequential** indices:

In [9]:
data = pd.Series([0.25, 0.5, 0.75, 1.0],
                 index=[2, 5, 3, 7])
data

2    0.25
5    0.50
3    0.75
7    1.00
dtype: float64

In [10]:
data[5]

0.5
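With an integer index like this one, `data[5]` looks up the *label* 5, not the sixth position. The sketch below makes the distinction explicit with the `loc` (label-based) and `iloc` (position-based) accessors. Both are standard pandas attributes, but they are not covered in this section, so treat this as a supplementary illustration.

```python
import pandas as pd

data = pd.Series([0.25, 0.5, 0.75, 1.0],
                 index=[2, 5, 3, 7])

# label-based lookup: the value stored under the label 5
by_label = data.loc[5]

# position-based lookup: the second element, regardless of its label
by_position = data.iloc[1]

# the third element by position carries the label 3
third = data.iloc[2]

print(by_label, by_position, third)
```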

![](https://ekababisong.org/assets/seminar_IEEE/pandas-DataStructure.png)

### Constructing Series objects

We've already seen a few ways of constructing a Pandas ``Series`` from scratch; all of them are some version of the following:

```python
>>> pd.Series(data, index=index)
```

where ``index`` is an optional argument, and ``data`` can be one of many entities.

For example, ``data`` can be a list or NumPy array, in which case ``index`` defaults to an integer sequence:

In [11]:
pd.Series([2, 4, 6])

0    2
1    4
2    6
dtype: int64

``data`` can be a scalar, which is repeated to fill the specified index:

In [12]:
pd.Series(5, index=[100, 200, 300])

100    5
200    5
300    5
dtype: int64

``data`` can be a dictionary, in which case ``index`` defaults to the dictionary keys, in the order they were inserted (note that they are not sorted):

In [13]:
pd.Series({2:'a', 1:'b', 3:'c'})

2    a
1    b
3    c
dtype: object
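The output above keeps the keys in the order they were written (2, 1, 3). If sorted labels are wanted, one option is to call `sort_index()` on the result. A minimal sketch, where the names `s` and `sorted_s` are illustrative:

```python
import pandas as pd

s = pd.Series({2: 'a', 1: 'b', 3: 'c'})

# sort_index() returns a new Series ordered by label; s itself is unchanged
sorted_s = s.sort_index()

print(list(s.index))
print(list(sorted_s.index))
```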

In each case, the index can be explicitly set if a different result is preferred:

In [14]:
pd.Series({2:'a', 1:'b', 3:'c'}, index=[3, 2])

3    c
2    a
dtype: object

Notice that in this case, the ``Series`` is populated only with the explicitly identified keys.
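The converse also holds: a label requested in ``index`` that is absent from the dictionary is kept, with a missing value (`NaN`) in its place. A short sketch, where the extra label `4` is chosen purely for illustration:

```python
import pandas as pd

# label 4 is requested but is not a key of the dictionary
s = pd.Series({2: 'a', 1: 'b', 3: 'c'}, index=[3, 2, 4])

# isna() flags the entries that could not be filled from the dictionary
missing = s.isna()

print(s)
print(missing)
```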

## The Pandas DataFrame Object
![](https://www3.ntu.edu.sg/home/ehchua/programming/webprogramming/images/Python_Pandas_DataFrame.png)

### DataFrame as a generalized NumPy array
#### You can think of a ``DataFrame`` as a sequence of aligned ``Series`` objects.
- Here, by "aligned" we mean that they share the same index.  
![](https://miro.medium.com/max/3452/1*6p6nF4_5XpHgcrYRrLYVAw.png)
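To make the "aligned ``Series``" picture concrete, here is a minimal sketch that builds a ``DataFrame`` from two ``Series`` sharing one index. The column names `first` and `second` and the values are placeholders invented for this example, not data from the original notebook.

```python
import pandas as pd

labels = ['a', 'b', 'c']
first = pd.Series([1, 2, 3], index=labels)
second = pd.Series([10.0, 20.0, 30.0], index=labels)

# each dictionary key becomes a column; rows are matched up by index label
df = pd.DataFrame({'first': first, 'second': second})

print(df)

# selecting a column gives back a Series carrying the shared index
print(df['second'])
```

Each column keeps its own dtype, while all columns share the same row index.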

## The Pandas Index Object
### Both the ``Series`` and ``DataFrame`` objects contain an explicit `index` that lets you reference and modify data.
This **``Index``** object can be thought of either:
- as an ***immutable array*** or 
- as an ***ordered set*** (technically a multiset, since ``Index`` objects may contain repeated values)

In [15]:
ind = pd.Index([2, 3, 5, 7, 11])
ind

Int64Index([2, 3, 5, 7, 11], dtype='int64')

### Index as immutable array

The ``Index`` in many ways operates like an array.
For example, we can use standard Python indexing notation to retrieve values or slices:

In [16]:
ind[1]

3

In [17]:
ind[::2]

Int64Index([2, 5, 11], dtype='int64')

``Index`` objects also have many of the attributes familiar from NumPy arrays:

In [18]:
print(ind.size, ind.shape, ind.ndim, ind.dtype)

5 (5,) 1 int64


One difference between ``Index`` objects and NumPy arrays is that indices are **immutable** – that is, they **cannot be modified via the normal means**:

In [19]:
ind[1] = 0

TypeError: Index does not support mutable operations

This immutability makes it safer to share indices between multiple ``DataFrame``s and arrays, without the potential for side effects from inadvertent index modification.
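A small sketch of that guarantee: the assignment is rejected with a `TypeError`, and the index is left exactly as it was. The flag name `raised` is illustrative only.

```python
import pandas as pd

ind = pd.Index([2, 3, 5, 7, 11])

try:
    ind[1] = 0          # item assignment is not supported on an Index
    raised = False
except TypeError:
    raised = True

# the index contents are unchanged after the failed assignment
print(raised, list(ind))
```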

### Index as ordered set

Pandas objects are designed to facilitate operations such as joins across datasets, which depend on many aspects of set arithmetic.
The ``Index`` object follows many of the conventions used by Python's built-in ``set`` data structure, so that unions, intersections, differences, and other combinations can be computed in a familiar way:

In [20]:
indA = pd.Index([1, 3, 5, 7, 9])
indB = pd.Index([2, 3, 5, 7, 11])

In [21]:
indA & indB  # intersection

Int64Index([3, 5, 7], dtype='int64')

In [22]:
indA | indB  # union

Int64Index([1, 2, 3, 5, 7, 9, 11], dtype='int64')

In [23]:
indA ^ indB  # symmetric difference

Int64Index([1, 2, 9, 11], dtype='int64')

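These operations may also be accessed via object methods, for example ``indA.intersection(indB)``. The operator forms shown above reflect the pandas version used in this notebook; from pandas 2.0 onward, `&`, `|`, and `^` on an ``Index`` act element-wise rather than as set operations, so the named methods are the more portable choice.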
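The method forms of the three set operations can be sketched as follows, using the same `indA` and `indB` as above. The result names (`common`, `combined`, `exclusive`) are illustrative.

```python
import pandas as pd

indA = pd.Index([1, 3, 5, 7, 9])
indB = pd.Index([2, 3, 5, 7, 11])

common = indA.intersection(indB)             # labels present in both
combined = indA.union(indB)                  # labels present in either
exclusive = indA.symmetric_difference(indB)  # labels present in exactly one

print(list(common))
print(list(combined))
print(list(exclusive))
```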

# RECAP
![](https://media.geeksforgeeks.org/wp-content/uploads/20200225170506/pandas-series.png)
![](https://media.geeksforgeeks.org/wp-content/uploads/finallpandas.png)
![](https://www.cdn.geeksforgeeks.org/wp-content/uploads/creating_dataframe1.png)