# Pandas

Pandas is a software library written for the Python programming language for data manipulation and analysis. In particular, it offers data structures and operations for manipulating numerical tables and time series. These capabilities are built on top of the NumPy library, which pandas requires as a dependency.

In this notebook we'll try out various pandas methods and learn more about pandas along the way.

## Installation

Please follow the [installation guide](https://pandas.pydata.org/pandas-docs/stable/install.html). All the necessary steps are described there.

## Importing Pandas

Once pandas is installed, we can import it in our file:

In [1]:
import numpy as np
import pandas as pd

## Series

A Series is similar to a one-dimensional NumPy array. The main difference is that a Series carries axis labels, so an element can be accessed by its label as well as by its integer position.
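
As a minimal sketch of the two access styles just described (the variable names here are illustrative and not part of the original notebook), the same element can be reached by label or by position:

```python
import pandas as pd

# Illustrative Series with string labels (names are assumptions for this sketch)
s = pd.Series([5, 10, 20], index=['label1', 'label2', 'label3'])

by_label = s['label2']       # label-based access with plain brackets
by_loc = s.loc['label2']     # explicit label-based access
by_position = s.iloc[1]      # explicit position-based access (0-indexed)

print(by_label, by_loc, by_position)
```

Using `.loc` and `.iloc` makes the intent explicit, which helps when the labels themselves are integers.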

### Creating Series

There are various ways to create Series. Some of them are listed below.

1. **Using Python List**

In [2]:
seriesLabel = ['label1', 'label2', 'label3']
exampleList = [5, 10, 20]

In [3]:
pd.Series(exampleList)

0     5
1    10
2    20
dtype: int64

In [4]:
pd.Series(exampleList, seriesLabel)

label1     5
label2    10
label3    20
dtype: int64

2. **Using Numpy Arrays**

In [5]:
exampleNumpyArray = np.array([6, 12, 18])

In [6]:
pd.Series(exampleNumpyArray)

0     6
1    12
2    18
dtype: int32

In [7]:
pd.Series(exampleNumpyArray, seriesLabel)

label1     6
label2    12
label3    18
dtype: int32

3. **Using Dictionary**

In [8]:
exampleDictionary = { 'label4': 7, 'label5': 14, 'label6': 21 }

In [9]:
# The dictionary keys become the index, so no index argument is needed
pd.Series(exampleDictionary)

label4     7
label5    14
label6    21
dtype: int64

In [10]:
# Index labels that are not keys of the dictionary produce NaN
pd.Series(exampleDictionary, seriesLabel)

label1   NaN
label2   NaN
label3   NaN
dtype: float64
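
The last result follows from how pandas matches the requested index against the dictionary keys. A small sketch, assuming a hypothetical index that overlaps the keys only partially, shows both cases side by side:

```python
import pandas as pd

example_dict = {'label4': 7, 'label5': 14, 'label6': 21}

# 'label4' is a key in the dictionary, so its value is kept;
# 'label1' is not a key, so its entry is filled with NaN
partial = pd.Series(example_dict, index=['label4', 'label1'])

print(partial)
print(partial.dtype)
```

Because NaN is a floating-point value, the integers are converted to floats in the resulting Series.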

### Data and Index Parameter in Series

1. **Data**

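A Series can hold many kinds of data, including arbitrary Python objects such as functions. Mixed or non-numeric data like this is stored with dtype `object`.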

In [11]:
def sampleFunc1():
    pass

def sampleFunc2():
    pass

def sampleFunc3():
    pass

pd.Series(data=[sampleFunc1, sampleFunc2, sampleFunc3])

0    <function sampleFunc1 at 0x0000022170DA07B8>
1    <function sampleFunc2 at 0x0000022170DA0620>
2    <function sampleFunc3 at 0x0000022170DA08C8>
dtype: object

In [12]:
pd.Series(['a', 2, 'hey'])

0      a
1      2
2    hey
dtype: object

2. **Index**

`index` is the second parameter of `pd.Series`. It supplies the labels for the values in the series.

In [13]:
pd.Series(data=[sampleFunc1, sampleFunc2, sampleFunc3], index=['a', 'b', 'c'])

a    <function sampleFunc1 at 0x0000022170DA07B8>
b    <function sampleFunc2 at 0x0000022170DA0620>
c    <function sampleFunc3 at 0x0000022170DA08C8>
dtype: object

In [14]:
pd.Series(['a', 2, 'hey'], ['label', 2, 'key'])

label      a
2          2
key      hey
dtype: object
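
To illustrate how the `data` argument affects the resulting dtype, here is a small sketch (the variable names are illustrative) comparing homogeneous and mixed inputs:

```python
import pandas as pd

ints = pd.Series([1, 2, 3])          # all integers
floats = pd.Series([1, 2.5, 3])      # one float promotes the whole Series to float
mixed = pd.Series(['a', 2, 'hey'])   # mixed types fall back to generic Python objects

print(ints.dtype, floats.dtype, mixed.dtype)
```

Numeric dtypes allow fast vectorised operations, whereas `object` Series store ordinary Python objects and are generally slower to work with.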

## DataFrames

DataFrames are like spreadsheets or SQL tables: two-dimensional, labelled structures whose columns can hold different types. The DataFrame is the most commonly used pandas object.

### Creating a DataFrame

pd.DataFrame( *data*, *index*, *columns* )

*data* -> content of the cells<br>
*index* -> labels for rows<br>
*columns* -> labels for columns

Returns a two-dimensional, size-mutable, potentially heterogeneous tabular data structure, i.e. a DataFrame.

In [15]:
pd.DataFrame(data = np.random.randint(1,51, (4,3)), index = ['row1', 'row2', 'row3', 'row4'], columns = ['col1', 'col2', 'col3'])

|      | col1 | col2 | col3 |
|------|------|------|------|
| row1 | 22   | 9    | 12   |
| row2 | 47   | 1    | 24   |
| row3 | 48   | 16   | 34   |
| row4 | 29   | 46   | 33   |
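
Since `np.random.randint` produces different values on each run, a reproducible variant can be easier to follow. The sketch below uses fixed, made-up values to show how `data`, `index` and `columns` map onto the result:

```python
import pandas as pd

# Fixed values instead of random ones, so the result is the same on every run
df = pd.DataFrame(
    data=[[1, 2, 3], [4, 5, 6]],
    index=['row1', 'row2'],
    columns=['col1', 'col2', 'col3'],
)

print(df.shape)          # (number of rows, number of columns)
print(list(df.index))    # row labels
print(list(df.columns))  # column labels
```

Each inner list in `data` becomes one row, so the number of inner lists must match the length of `index` and the length of each inner list must match `columns`.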


### Selection and Indexing

In [16]:
dataFrame = pd.DataFrame(data = np.random.randint(1,51, (4,3)), index = ['row1', 'row2', 'row3', 'row4'], columns = ['col1', 'col2', 'col3'])
dataFrame

|      | col1 | col2 | col3 |
|------|------|------|------|
| row1 | 4    | 36   | 17   |
| row2 | 49   | 10   | 38   |
| row3 | 25   | 31   | 11   |
| row4 | 48   | 27   | 50   |


**Selection of a single column**

In [17]:
dataFrame['col1']

row1     4
row2    49
row3    25
row4    48
Name: col1, dtype: int32
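
The output above has a `Name` and a `dtype` line because selecting a single column returns a Series. A short sketch, using a small hand-written frame rather than the random one, makes this explicit:

```python
import pandas as pd

# Small illustrative frame with fixed values (an assumption for this sketch)
df = pd.DataFrame(
    data=[[4, 36, 17], [49, 10, 38]],
    index=['row1', 'row2'],
    columns=['col1', 'col2', 'col3'],
)

col = df['col1']

print(type(col).__name__)   # the selected column is a Series
print(col.name)             # the column label is kept as the Series name
print(list(col.index))      # the row labels carry over as the Series index
```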

**Selecting multiple columns**
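
The original notebook does not yet include an example here. As a hedged sketch, multiple columns are commonly selected by passing a list of column labels inside the brackets; the frame below uses fixed, made-up values:

```python
import pandas as pd

# Small illustrative frame with fixed values (an assumption for this sketch)
df = pd.DataFrame(
    data=[[4, 36, 17], [49, 10, 38]],
    index=['row1', 'row2'],
    columns=['col1', 'col2', 'col3'],
)

# Note the double brackets: the inner pair is a Python list of labels
subset = df[['col1', 'col3']]

print(type(subset).__name__)
print(list(subset.columns))
```

Unlike single-column selection, the result is a DataFrame, and its columns appear in the order given in the list.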

**Note**: This notebook is not complete; more content will be added soon.