![](https://upload.wikimedia.org/wikipedia/commons/thumb/e/ed/Pandas_logo.svg/2560px-Pandas_logo.svg.png)

# pandas Tutorial


## Content:
1. [Introduction to pandas](#introduction_to_pandas)
2. [The Basics of pandas](#basics_of_pandas)
3. [pandas Series](#pandas_series)
    * [Creating pandas series](#creating_series)
    * [Accessing pandas series](#accessing_series)
4. [pandas Dataframes](#pandas_dataframes)
    * [Creating pandas dataframes](#creating_dataframes)
    * [Accessing pandas dataframes](#accessing_dataframes)
    

<a id="introduction_to_pandas"></a> <br>
# Introduction to Pandas
Pandas is an open-source, BSD-licensed library built on top of NumPy that provides high-performance, easy-to-use data structures and data analysis tools for Python.

Pandas is one of the most commonly used tools in data science and machine learning, where it is used mainly for data cleaning and analysis. Real-world data is often messy, and Pandas is well suited to handling it.

### Features of Pandas
* Provides tools for loading data from different file formats into in-memory data objects.
* Represents the data in tabular form.
* Label-based slicing, indexing, and subsetting can be performed on large datasets.
* Merges and joins two datasets easily.
* Pivots and reshapes datasets.
* Easy handling of missing data (represented as NaN) in both floating point and non-floating point data.
* Size mutability: columns can be added to and deleted from DataFrame and higher-dimensional objects.
* Provides time-series functionality.
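
A few of the features above can be sketched on toy data. The tables, column names, and values below are invented purely for illustration; this is a minimal sketch rather than a recommended workflow.

```python
import numpy as np
import pandas as pd

# Hypothetical toy tables, invented for illustration only.
sales = pd.DataFrame({"city": ["Oslo", "Rome", "Oslo"],
                      "year": [2020, 2020, 2021],
                      "units": [10.0, np.nan, 7.0]})
regions = pd.DataFrame({"city": ["Oslo", "Rome"],
                        "region": ["north", "south"]})

# Missing data: NaN values can be counted and filled.
n_missing = int(sales["units"].isna().sum())
filled = sales.fillna({"units": 0.0})

# Merge: join the two tables on their shared "city" column.
merged = filled.merge(regions, on="city", how="left")

# Pivot: reshape so that each city is a row and each year a column.
wide = filled.pivot(index="city", columns="year", values="units")

# Size mutability: add a column, then delete it again.
merged["double_units"] = merged["units"] * 2
del merged["double_units"]

print(merged)
print(wide)
```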

<a id="basics_of_pandas"></a> <br>
# The Basics of Pandas

The Pandas package can be imported as below:

In [1]:
import pandas as pd   # import Pandas
import numpy as np    # import NumPy

***NumPy*** and ***Pandas*** go hand-in-hand, as much of pandas is built on NumPy. It is therefore convenient to import NumPy under the ***np*** namespace. Likewise, pandas is imported and referenced as ***pd***.
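
As a small illustration of how closely the two libraries work together, the sketch below wraps a NumPy array in a Series and converts it back. The variable names are arbitrary.

```python
import numpy as np
import pandas as pd

arr = np.array([1.5, 2.5, 3.5])

# Wrap a NumPy array in a pandas Series.
s = pd.Series(arr)

# Convert the Series back into a NumPy array.
back = s.to_numpy()

print(type(back), back.dtype)
```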

<a id="primary_pandas_objects"></a> <br>
### Primary pandas objects

The Pandas framework provides two primary objects:
* ***Series***
* ***DataFrame***


<a id="pandas_series"></a> <br>
# pandas Series
The base data structure of pandas is the Series object, which is designed to operate
similarly to a NumPy array but also adds index capabilities.

Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating point numbers, Python objects, etc.). The axis labels are collectively referred to as the index.
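
The two parts of a Series, its labels and its values, can be inspected separately. The following is a minimal sketch with made-up labels and values.

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=["x", "y", "z"])

# The axis labels live in the index, the data in the values.
labels = list(s.index)
values = s.to_numpy()

# A Series can also hold mixed Python objects; its dtype is then object.
mixed = pd.Series([1, "two", 3.0])

print(labels, values, mixed.dtype)
```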

<a id="creating_series"></a> <br>
### Creating pandas Series
The basic method to create a Series is to call:

> ***s = pd.Series(data, index=index)***

Here, *data* can be many different things:
* an ndarray
* a Python dict
* a scalar value (like 5)

#### From ndarray

In [2]:
# creating series from a Python list
s = pd.Series([1, 2, 3, 4])
print(s)

0    1
1    2
2    3
3    4
dtype: int64


If data is an ndarray, index must be the same length as data. If no index is passed, one will be created having values [0, ..., len(data) - 1].
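
Both rules can be checked directly. The sketch below shows the default index and what happens when the index length does not match the data; the variable names are arbitrary.

```python
import numpy as np
import pandas as pd

data = np.arange(3)

# No index passed: labels default to 0 .. len(data) - 1.
default = pd.Series(data)
default_labels = list(default.index)

# Index of the wrong length: pandas raises a ValueError.
try:
    pd.Series(data, index=["a", "b"])
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(default_labels)
print(error_message)
```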

In [3]:
# creating series from ndarray of random numbers
s = pd.Series(np.random.randn(5))
print(s)

0   -1.295687
1   -1.013091
2   -2.137665
3    0.115342
4    1.850596
dtype: float64


In [4]:
# generating series from ndarray of random numbers with string indices
s = pd.Series(np.random.randn(5), index=["a", "b", "c", "d", "e"])
print(s)

a   -0.543201
b    0.808219
c   -0.046680
d   -0.600156
e   -1.039131
dtype: float64


#### From Dictionary

In [5]:
# creating series from a dictionary
d = {"b": 1, "a": 0, "c": 2}
s = pd.Series(d)
print(s)

b    1
a    0
c    2
dtype: int64


In [6]:
d = {"a": 0.0, "b": 1.0, "c": 2.0}
s = pd.Series(d, index=["b", "c", "d", "a"])
print(s)

b    1.0
c    2.0
d    NaN
a    0.0
dtype: float64


Here the label *d* is assigned *NaN* because the dictionary has no key *d*.
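
Missing entries like this one can be detected or dropped afterwards. A minimal sketch, reusing the same dictionary:

```python
import pandas as pd

d = {"a": 0.0, "b": 1.0, "c": 2.0}
s = pd.Series(d, index=["b", "c", "d", "a"])

# Boolean mask marking the labels whose value is NaN.
missing = s.isna()

# A copy of the Series without the NaN entries.
present = s.dropna()

print(missing)
print(present)
```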

#### From Scalar Values

In [7]:
s = pd.Series(5.0, index=["a", "b", "c", "d", "e"])
print(s)

a    5.0
b    5.0
c    5.0
d    5.0
e    5.0
dtype: float64


<a id="accessing_series"></a> <br>
### Accessing pandas Series
A Series acts very similarly to an ndarray and is a valid argument to most NumPy functions.

In [8]:
s = pd.Series(np.random.randn(7), index=["a", "b", "c", "d", "e", "f", "g"])
#s = pd.Series(np.random.randn(10))             # Integer index
print("Series s:\n", s)
# positional access; plain s[0] on a labeled index is deprecated in recent pandas
print("\ns.iloc[0]: ", s.iloc[0])

Series s:
 a    0.005184
b   -1.013098
c    0.276929
d   -0.882244
e   -0.700142
f   -0.742797
g   -1.231968
dtype: float64

s.iloc[0]:  0.005183696866060968
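
To illustrate the point about NumPy functions, the sketch below passes a Series to `np.sqrt` and `np.sum`. The values are chosen only so the results are easy to read.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 4.0, 9.0], index=["a", "b", "c"])

# Element-wise NumPy functions return a Series that keeps the index.
roots = np.sqrt(s)

# Reductions return a scalar.
total = np.sum(s)

print(roots)
print(total)
```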


However, operations such as slicing will also slice the index.

In [9]:
print("s[:3]: ", s[:3])

s[:3]:  a    0.005184
b   -1.013098
c    0.276929
dtype: float64


In [10]:
print("s.iloc[[6, 4, 1]]: \n", s.iloc[[6, 4, 1]])

s.iloc[[6, 4, 1]]: 
 g   -1.231968
e   -0.700142
b   -1.013098
dtype: float64
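
When a Series has string labels, positional and label-based access can be kept apart explicitly with `.iloc` and `.loc`. The following is a minimal sketch with made-up values; note that label slices include the end label, while positional slices exclude the end position.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40], index=["a", "b", "c", "d"])

first = s.iloc[0]              # by position
by_label = s.loc["c"]          # by label

pos_slice = s.iloc[:2]         # positions 0 and 1 (end excluded)
label_slice = s.loc["a":"c"]   # labels a, b, c (end included)

picked = s.iloc[[3, 1]]        # several positions, in the given order

print(first, by_label)
print(pos_slice)
print(label_slice)
print(picked)
```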
