<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#What-is-Pandas?" data-toc-modified-id="What-is-Pandas?-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>What is Pandas?</a></span></li><li><span><a href="#Pandas-Series" data-toc-modified-id="Pandas-Series-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Pandas Series</a></span></li><li><span><a href="#Pandas-DataFrame" data-toc-modified-id="Pandas-DataFrame-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Pandas DataFrame</a></span></li><li><span><a href="#Advantages-of-Pandas" data-toc-modified-id="Advantages-of-Pandas-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Advantages of Pandas</a></span></li><li><span><a href="#Creating-Series" data-toc-modified-id="Creating-Series-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>Creating Series</a></span></li><li><span><a href="#Creating-DataFrame" data-toc-modified-id="Creating-DataFrame-6"><span class="toc-item-num">6&nbsp;&nbsp;</span>Creating DataFrame</a></span><ul class="toc-item"><li><span><a href="#About-the-Author" data-toc-modified-id="About-the-Author-6.1"><span class="toc-item-num">6.1&nbsp;&nbsp;</span>About the Author</a></span></li></ul></li></ul></div>

## What is Pandas?
The Pandas library is built on NumPy and provides easy-to-use data structures and data analysis tools for the Python programming language.

## Pandas Series 
A **one-dimensional** labeled array capable of holding any data type.

## Pandas DataFrame 
A **two-dimensional** labeled data structure with columns of potentially different types.
![Pandas](../img/pandas.png)
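
The two definitions above can be sketched in a few lines. This is only an illustration: the variable names and sample values below are made up and are not part of the original notebook.

```python
import pandas as pd

# A Series: one-dimensional, with one label per value (hypothetical data)
prices = pd.Series([1.5, 12.0, 3.25], index=["pen", "book", "ruler"])

# A DataFrame: two-dimensional, and each column may hold a different type
stock = pd.DataFrame({
    "item": ["pen", "book", "ruler"],    # text column
    "price": [1.5, 12.0, 3.25],          # float column
    "in_stock": [True, False, True],     # boolean column
})

print(prices.ndim, stock.ndim)   # number of dimensions of each structure
print(stock.dtypes)              # one dtype per column
```

Each column of a DataFrame can be pulled out as a Series, so the two structures are closely related.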

## Advantages of Pandas 
- Clear, tabular representation of data
- Concise code: common tasks take only a few lines
- An extensive set of features
- Efficient handling of large data
- Flexible, customizable ways to reshape and filter data
- Built for Python, on top of NumPy

In [3]:
# Conventional way to import pandas
import pandas as pd 

In [4]:
# Check pandas version
pd.__version__

'0.25.1'

In [5]:
# Show versions of pandas and its dependencies
pd.show_versions()


INSTALLED VERSIONS
------------------
commit           : None
python           : 3.7.4.final.0
python-bits      : 64
OS               : Linux
OS-release       : 5.3.0-26-generic
machine          : x86_64
processor        : x86_64
byteorder        : little
LC_ALL           : None
LANG             : en_US.UTF-8
LOCALE           : en_US.UTF-8

pandas           : 0.25.1
numpy            : 1.17.2
pytz             : 2019.3
dateutil         : 2.8.0
pip              : 19.2.3
setuptools       : 41.4.0
Cython           : 0.29.13
pytest           : 5.2.1
hypothesis       : None
sphinx           : 2.2.0
blosc            : None
feather          : None
xlsxwriter       : 1.2.1
lxml.etree       : 4.4.1
html5lib         : 1.0.1
pymysql          : None
psycopg2         : None
jinja2           : 2.10.3
IPython          : 7.8.0
pandas_datareader: None
bs4              : 4.8.0
bottleneck       : 1.2.1
fastparquet      : None
gcsfs            : None
lxml.etree       : 4.4.1
matplotlib       : 3.1.1
numexp

## Creating Series

In [6]:
# Create Series 
s1 = pd.Series([3, 6, 9, 12])
s1

0     3
1     6
2     9
3    12
dtype: int64

In [7]:
# Check type 
type(s1)

pandas.core.series.Series

In [8]:
# To see values 
s1.values

array([ 3,  6,  9, 12])

In [9]:
# To see index/keys 
s1.index

RangeIndex(start=0, stop=4, step=1)

In [11]:
# Creating labeled series 
s2 = pd.Series([200000, 300000, 4000000, 500000], index=['A', 'B', 'C', 'D'])

In [12]:
s2

A     200000
B     300000
C    4000000
D     500000
dtype: int64

In [13]:
s2.values

array([ 200000,  300000, 4000000,  500000])

In [14]:
s2.index

Index(['A', 'B', 'C', 'D'], dtype='object')

In [21]:
# Indexing
s2['A']

200000

In [15]:
# Boolean indexing
s2[s2 > 700000]

C    4000000
dtype: int64
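
As a small sketch building on the cells above, the same Series can also be accessed explicitly by label or by position, and several conditions can be combined in one boolean index. The variable names `first_by_label`, `first_by_position`, and `mid_range` are introduced here for illustration only.

```python
import pandas as pd

# Same labeled Series as in the cells above
s2 = pd.Series([200000, 300000, 4000000, 500000], index=['A', 'B', 'C', 'D'])

# Label-based and position-based access
first_by_label = s2.loc['A']      # value stored under label 'A'
first_by_position = s2.iloc[0]    # value at position 0

# Combine conditions with & (and) or | (or); each condition needs parentheses
mid_range = s2[(s2 > 250000) & (s2 < 1000000)]

print(first_by_label, first_by_position)
print(mid_range)
```

The parentheses matter because `&` binds more tightly than the comparison operators in Python.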

## Creating DataFrame 

In [4]:
# Create a DataFrame 
data = {'Country': ['Belgium', 'India', 'Brazil'],
        'Capital': ['Brussels', 'New Delhi', 'Brasília'],
        'Population': [11190846, 1303171035, 207847528]
}

df = pd.DataFrame(data, columns=["Country", "Capital", "Population"])

In [5]:
df

   Country    Capital  Population
0  Belgium   Brussels    11190846
1    India  New Delhi  1303171035
2   Brazil   Brasília   207847528


In [6]:
# Check type 
type(df)

pandas.core.frame.DataFrame

In [7]:
# Indexing
df["Country"]

0    Belgium
1      India
2     Brazil
Name: Country, dtype: object

In [8]:
# or, equivalently, attribute access
df.Country

0    Belgium
1      India
2     Brazil
Name: Country, dtype: object
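
Column selection extends naturally to more than one column, and rows can be selected by position. The following is a minimal sketch reusing the notebook's `data`; the names `one_column`, `two_columns`, and `second_row` are assumptions made for this example.

```python
import pandas as pd

data = {'Country': ['Belgium', 'India', 'Brazil'],
        'Capital': ['Brussels', 'New Delhi', 'Brasília'],
        'Population': [11190846, 1303171035, 207847528]}
df = pd.DataFrame(data, columns=["Country", "Capital", "Population"])

# A single label returns a Series; a list of labels returns a DataFrame
one_column = df["Country"]
two_columns = df[["Country", "Population"]]

# Rows are selected by position with .iloc (or by index label with .loc)
second_row = df.iloc[1]

print(type(one_column).__name__, type(two_columns).__name__)
print(second_row["Capital"])
```

Bracket access works for any column name, whereas attribute access such as `df.Country` only works when the name is a valid Python identifier and does not clash with an existing DataFrame attribute or method.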

In [10]:
# Boolean mask: the comparison returns a Series of True/False values
df["Population"] > 40000000

0    False
1     True
2     True
Name: Population, dtype: bool

In [22]:
df["Country"] == "Belgium"

0     True
1    False
2    False
Name: Country, dtype: bool

In [23]:
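# String comparison is exact: the stored value is "Brasília" (with an accent),
# so the unaccented "Brasilia" matches no row
df["Capital"] == "Brasilia"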

0    False
1    False
2    False
Name: Capital, dtype: bool
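
The comparisons above only produce boolean Series. Passing such a mask back into `df[...]` is what actually filters the rows. A minimal sketch, assuming the same `data` as earlier; the names `large`, `exact`, and `ascii_only` are introduced here for illustration.

```python
import pandas as pd

data = {'Country': ['Belgium', 'India', 'Brazil'],
        'Capital': ['Brussels', 'New Delhi', 'Brasília'],
        'Population': [11190846, 1303171035, 207847528]}
df = pd.DataFrame(data, columns=["Country", "Capital", "Population"])

# Passing a boolean mask to df[...] keeps only the rows where it is True
large = df[df["Population"] > 40000000]

# String equality is exact: the accented spelling matches, plain ASCII does not
exact = df[df["Capital"] == "Brasília"]
ascii_only = df[df["Capital"] == "Brasilia"]

print(large)
print(len(exact), len(ascii_only))
```

The filtered result keeps the original index labels, so the rows for India and Brazil still carry the labels 1 and 2.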

<h3>About the Author</h3>
This repo was created by <a href="https://www.linkedin.com/in/jubayer28/" target="_blank">Jubayer Hossain</a> <br>
<a href="https://www.linkedin.com/in/jubayer28/" target="_blank">Jubayer Hossain</a> is a student of Microbiology at Jagannath University and the founder of <a href="https://github.com/hdro" target="_blank">Health Data Research Organization</a>. He is also a team member of a bioinformatics research group known as Bio-Bio-1. 

<a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>.