# Series

The first main data type we will learn about for pandas is the Series data type. 


A Series is a one-dimensional labeled array capable of holding data of any type (integers, strings, floats, Python objects, etc.). The axis labels are collectively called the index.

A Series is very similar to a NumPy array (in fact, it is built on top of the NumPy array object). What differentiates a Series from a NumPy array is that a Series can have axis labels, meaning it can be indexed by a label instead of just a numeric position. It also doesn't need to hold numeric data; it can hold any arbitrary Python object.


A pandas Series can be created using the following constructor (simplified signature):

    pandas.Series(data, index, dtype, copy)

    data  : the values; can take various forms such as an ndarray, list, dict, or scalar constant

    index : labels for the values; must be hashable and the same length as data (duplicate labels are allowed). Defaults to np.arange(n) if no index is passed.

    dtype : the data type. If None, it is inferred from data

    copy  : whether to copy the input data. Default False
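To see these parameters in action, here is a small sketch (the values and labels are arbitrary, chosen only for illustration). It forces a `dtype`, shows the default index pandas generates when none is passed, and confirms that repeated labels are accepted.

```python
import pandas as pd

# data as a list, explicit labels via index, and a forced dtype
s = pd.Series([1, 2, 3], index=['x', 'y', 'z'], dtype='float64')
print(s)

# With no index, pandas generates a default RangeIndex 0..n-1
t = pd.Series([1, 2, 3])
print(t.index)  # RangeIndex(start=0, stop=3, step=1)

# Duplicate labels are allowed; selecting a repeated label returns a Series
dup = pd.Series([10, 20, 30], index=['a', 'a', 'b'])
print(dup['a'])
```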

Let's explore this concept through some examples:

In [0]:
import numpy as np
import pandas as pd

### Creating a Series

You can convert a list, NumPy array, or dictionary to a Series:

In [83]:
s = pd.Series(dtype='float64')  # an empty Series; passing dtype avoids a version-dependent default
print(s)

Series([], dtype: float64)




In [0]:
labels = ['a','b','c']
my_list = [10,20,30]


**Using Lists**

In [85]:
pd.Series(data=my_list)

0    10
1    20
2    30
dtype: int64

We did not pass any index, so by default it assigned the indexes ranging from 0 to len(data)-1, i.e., 0 to 2.


As we see in the preceding output, the Series wraps both a sequence of values and a sequence of indices, which we can access with the values and index attributes. The values are simply a familiar NumPy array:

In [86]:
s = pd.Series(data=my_list)

s.values

array([10, 20, 30])

In [87]:
s.index

RangeIndex(start=0, stop=3, step=1)
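`values` is the attribute used above; recent pandas documentation recommends `to_numpy()` when you specifically need a NumPy array (the method has been available since pandas 0.24). A minimal sketch comparing the two on a label-indexed Series:

```python
import numpy as np
import pandas as pd

s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

vals = s.values      # the attribute shown above
arr = s.to_numpy()   # explicit conversion to a NumPy array
idx = s.index        # an Index object holding the labels

print(type(vals), type(arr))
print(idx)
```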

In [91]:
pd.Series(data=my_list,index=labels)

a    10
b    20
c    30
dtype: int64

In [92]:
pd.Series(my_list,labels)

a    10
b    20
c    30
dtype: int64

Here we passed the index values (by keyword in the first call, positionally in the second), so the output shows our custom labels instead of the default 0, 1, 2.

We can even use noncontiguous or nonsequential indices:

In [94]:
pd.Series(my_list, index=[2, 5, 3])

2    10
5    20
3    30
dtype: int64

**NumPy Arrays**

In [96]:
arr = np.array([10,20,30])
pd.Series(arr)

0    10
1    20
2    30
dtype: int64

In [97]:
pd.Series(arr,labels)

a    10
b    20
c    30
dtype: int64

**Dictionary**


A dict can be passed as input. If no index is specified, the dictionary keys are used to construct the index, in the dict's insertion order (pandas 0.23+ on Python 3.6+; older versions sorted the keys). If an index is passed, the values in data corresponding to the labels in the index are pulled out, and any label not found in the dict gets NaN.

In [100]:
d = {'a':10,'b':20,'c':30}
pd.Series(d)

a    10
b    20
c    30
dtype: int64
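The sketch below illustrates both dictionary behaviours with an illustrative dict whose keys are deliberately out of alphabetical order. It assumes pandas 0.23 or later on Python 3.6 or later, where insertion order is preserved.

```python
import pandas as pd

d = {'b': 20, 'a': 10, 'c': 30}

# Without an index, labels follow the dict's insertion order (not sorted)
s1 = pd.Series(d)
print(s1.index.tolist())   # ['b', 'a', 'c']

# With an index, values are pulled out by label; a label missing from
# the dict becomes NaN, which upcasts the integers to float64
s2 = pd.Series(d, index=['c', 'a', 'z'])
print(s2)
```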

In [48]:
s = pd.Series(d)
s.keys()

Index(['a', 'b', 'c'], dtype='object')

In [101]:
list(s.items())

[('a', 10), ('b', 20), ('c', 30)]

**Scalar**

If data is a scalar value, an index must be provided. The value will be repeated to match the length of the index.

In [102]:
pd.Series(5, index=[0, 1, 2, 3])


0    5
1    5
2    5
3    5
dtype: int64

### Data in a Series

A pandas Series can hold a variety of object types:

In [106]:
pd.Series(data= labels)

0    a
1    b
2    c
dtype: object

In [107]:
# Even functions (although you are unlikely to use this)
pd.Series([sum,print,len])

0      <built-in function sum>
1    <built-in function print>
2      <built-in function len>
dtype: object
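How pandas infers `dtype` when none is given can be seen with a few mixed inputs. This is a small illustrative sketch, not an exhaustive list of the inference rules:

```python
import pandas as pd

ints = pd.Series([1, 2, 3])          # all integers -> an integer dtype (typically int64)
mixed_num = pd.Series([1, 2.5, 3])   # ints and floats -> upcast to float64
mixed = pd.Series([1, 'two', 3.0])   # no common numeric type -> object

print(ints.dtype, mixed_num.dtype, mixed.dtype)
```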

# Data Indexing and Selection

### Accessing Data from a Series by Position

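Data in a Series can be accessed by position, much as in an ndarray. For a single element, `iloc` makes the positional lookup explicit; plain `s[0]` on a non-integer index falls back to position, but that fallback is deprecated in recent pandas versions.

In [108]:

s = pd.Series([1,2,3,4,5],index = ['a','b','c','d','e'])
print(s)
#retrieve the first element (by position)
print(s.iloc[0])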

a    1
b    2
c    3
d    4
e    5
dtype: int64
1


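Slicing works as it does for Python lists. With `:` placed before a position, all items up to but not including that position are returned, so `s[:3]` retrieves the first three elements. With `:` placed after a position, all items from that position onwards are returned, so `s[-3:]` retrieves the last three. With two positions (`s[start:stop]`), items from `start` up to but not including `stop` are returned.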

In [109]:
#retrieve the first three elements
print (s[:3])

a    1
b    2
c    3
dtype: int64


In [110]:
#retrieve the last three elements
print (s[-3:])

c    3
d    4
e    5
dtype: int64
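Because these slices follow Python's `start:stop:step` rules, a step value also works. A short sketch using the same five-element Series:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

print(s[1:4])    # positions 1, 2, 3 -> labels b, c, d (stop excluded)
print(s[::2])    # every second element -> labels a, c, e
print(s[::-1])   # reversed -> labels e, d, c, b, a
```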


### Retrieve Data Using Label (Index)

A Series is like a fixed-size dict in that you can get and set values by index label.

In [111]:
#retrieve a single element
print (s['a'])

1


Retrieve multiple elements using a list of index label values.

In [112]:
#retrieve multiple elements
print (s[['a','c','d']])

a    1
c    3
d    4
dtype: int64


If a label is not contained, an exception is raised.

In [113]:
#retrieve a label that does not exist
print(s['f'])

KeyError: 'f'
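Since a Series behaves like a dict keyed by its labels, dict-style idioms can avoid the exception. A sketch using the same `s`; `Series.get` returns `None` (or a supplied default) for a missing label, and `in` checks labels rather than values:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 'c', 'd', 'e'])

# `in` tests the index labels, like dict keys
print('a' in s)        # True
print(1 in s)          # False: 1 is a value, not a label

# .get() returns a default instead of raising KeyError
print(s.get('f'))      # None
print(s.get('f', 0))   # 0

# or catch the KeyError explicitly
try:
    s['f']
except KeyError as err:
    print('missing label:', err)
```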

# Indexers: loc and iloc


These slicing and indexing conventions can be a source of confusion. For example, if your Series has an explicit integer index, an indexing operation such as `s[1]` will use the explicit labels, while a slicing operation like `s[1:3]` will use the implicit Python-style positional index.

In [114]:
s = pd.Series(['a', 'b', 'c'], index=[1, 3, 5])
s

1    a
3    b
5    c
dtype: object

In [115]:
# explicit index when indexing
s[1]

'a'

In [116]:
# implicit index when slicing
s[1:3]

3    b
5    c
dtype: object

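For example, we might expect `s[1:4]` to slice by label and return:

    1    a
    3    b
    dtype: object

The actual result of `s[1:4]` is:

    3    b
    5    c
    dtype: object

This is because slicing uses the implicit positional index rather than our labels:

| Position (implicit) | Label (explicit) | Value |
|---|---|---|
| 0 | 1 | a |
| 1 | 3 | b |
| 2 | 5 | c |

Positions 1 up to (but not including) 4 cover only positions 1 and 2, which hold labels 3 and 5.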


Because of this potential confusion with integer indexes, pandas provides special indexer attributes that explicitly expose a particular indexing scheme. These are not methods but attributes that expose a particular slicing interface to the data in the Series.

### loc attribute

First, the `loc` attribute allows indexing and slicing that **always references the explicit index**:


In [117]:
s

1    a
3    b
5    c
dtype: object

In [118]:
s.loc[1]

'a'

In [119]:
s.loc[3]

'b'

In [120]:
s.loc[1:3]

1    a
3    b
dtype: object
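Note that, unlike Python slicing, `loc` slices include the stop label. On a sorted index the stop label does not even need to exist, which gives the label-based result one might have expected from `s[1:4]`. A minimal sketch:

```python
import pandas as pd

s = pd.Series(['a', 'b', 'c'], index=[1, 3, 5])

# Label-based slicing includes the stop label when it exists
print(s.loc[1:3])   # labels 1 and 3

# On a sorted index the stop label need not exist
print(s.loc[1:4])   # labels 1 and 3
```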

### iloc attribute


The iloc attribute allows indexing and slicing that **always references the implicit Python-style index**:

In [121]:
s

1    a
3    b
5    c
dtype: object

In [122]:
s.iloc[1]

'b'

In [123]:
s.iloc[2]

'c'

In [79]:
s.iloc[1:3]

3    b
5    c
dtype: object
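The two indexers also fail differently when asked for something that is not there: `loc` raises `KeyError` for an unknown label, and `iloc` raises `IndexError` for an out-of-range position. A small sketch with the same integer-labeled Series:

```python
import pandas as pd

s = pd.Series(['a', 'b', 'c'], index=[1, 3, 5])

# loc looks up labels: 0 is not a label in this index
try:
    s.loc[0]
except KeyError:
    print('loc: no label 0')

# iloc looks up positions: position 0 holds the first value
print(s.iloc[0])     # a

# iloc slices exclude the stop position, like Python lists
print(s.iloc[0:2])   # labels 1 and 3

# a position past the end raises IndexError
try:
    s.iloc[5]
except IndexError:
    print('iloc: no position 5')
```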

Older versions of pandas also offered a third indexer, `ix`, a hybrid of the two that, for Series objects, was equivalent to standard `[]`-based indexing. It was deprecated in pandas 0.20 and removed in pandas 1.0, so use `loc` or `iloc` instead.


### “Explicit is better than implicit”
One guiding principle of Python code is that “explicit is better than implicit.” The explicit nature of `loc` and `iloc` makes them very useful for maintaining clean and readable code; especially with integer indexes, I recommend using them both to make code easier to read and understand, and to prevent subtle bugs caused by the mixed indexing/slicing convention.

## Operations on a Series

The key to using a Series is understanding its index. pandas uses the index labels (names or numbers) for fast lookups, much like a hash table or dictionary.

Let's see some examples of how to grab information from a Series. Let us create two Series, ser1 and ser2:

In [0]:
ser1 = pd.Series([1,2,3,4],index = ['USA', 'Germany','USSR', 'Japan'])                                   

In [124]:
ser1

USA        1
Germany    2
USSR       3
Japan      4
dtype: int64

In [0]:
ser2 = pd.Series([1,2,5,4],index = ['USA', 'Germany','Italy', 'Japan'])                                   

In [126]:
ser2

USA        1
Germany    2
Italy      5
Japan      4
dtype: int64
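Looking up values by label works the same way on these Series. A minimal sketch (the `intersection` call is just one way to list the labels the two Series share):

```python
import pandas as pd

ser1 = pd.Series([1, 2, 3, 4], index=['USA', 'Germany', 'USSR', 'Japan'])
ser2 = pd.Series([1, 2, 5, 4], index=['USA', 'Germany', 'Italy', 'Japan'])

# Look up a single value by its label
print(ser1['USA'])        # 1
print(ser2.loc['Italy'])  # 5

# Labels present in both Series
shared = ser1.index.intersection(ser2.index)
print(shared.tolist())
```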

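Operations are aligned on the index: values with matching labels are combined, while labels that appear in only one of the two Series produce NaN (which is why the result is upcast to float64):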

In [128]:
ser1 + ser2

Germany    4.0
Italy      NaN
Japan      8.0
USA        2.0
USSR       NaN
dtype: float64
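If NaN is not wanted for labels missing from one side, the method form `Series.add` accepts a `fill_value` that stands in for the missing entry. A sketch, assuming that treating a missing country as 0 is acceptable for your data:

```python
import pandas as pd

ser1 = pd.Series([1, 2, 3, 4], index=['USA', 'Germany', 'USSR', 'Japan'])
ser2 = pd.Series([1, 2, 5, 4], index=['USA', 'Germany', 'Italy', 'Japan'])

# The + operator leaves NaN where a label is missing from either side
total = ser1 + ser2

# Series.add with fill_value treats a missing label as 0 instead
filled = ser1.add(ser2, fill_value=0)
print(filled)
```

The same `fill_value` parameter exists on the other arithmetic methods such as `sub`, `mul`, and `div`.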