# Pandas
[Source#1](https://www.tutorialspoint.com/python_pandas/)

Pandas is an open-source Python library that provides high-performance data manipulation and analysis tools built on its powerful data structures. The name pandas is derived from "panel data", an econometrics term for multidimensional data.

In 2008, developer Wes McKinney started developing pandas when he needed a high-performance, flexible tool for data analysis.

## Data Structure
- Series - 1 Dimension -- Series is a one-dimensional labeled array
- DataFrame - 2 Dimensions -- DataFrame is a two-dimensional labeled structure with heterogeneous data
- Panel - 3 Dimensions -- Panel is a three-dimensional data structure with heterogeneous data (deprecated in pandas 0.20 and removed in 0.25)
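
As a quick illustration of the first two structures, the sketch below builds a small Series and DataFrame and checks their dimensionality with `ndim`. The sample values and column names are made up for this example.

```python
import pandas as pd

# A Series holds one labeled column of values
s = pd.Series([10, 20, 30])

# A DataFrame holds several columns, each of which may have its own type
df = pd.DataFrame({"name": ["x", "y"], "value": [1.5, 2.5]})

print(s.ndim)    # number of dimensions of the Series
print(df.ndim)   # number of dimensions of the DataFrame
print(df.dtypes) # one dtype per column
```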

### Series Data Structure

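##### Create an Empty Series

In [1]:
import pandas as pd
# Pass the dtype explicitly: the default dtype of an empty Series differs across pandas versions
s = pd.Series(dtype='float64')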
print(s)

Series([], dtype: float64)


##### Create a Series from ndarray

In [2]:
import numpy as np

data  = np.array(['a','b','c','d','e'])
s     = pd.Series(data)
print(s)

0    a
1    b
2    c
3    d
4    e
dtype: object


In [3]:
s_idx = pd.Series(data, index=[100,101,102,103,104])
print(s_idx)

100    a
101    b
102    c
103    d
104    e
dtype: object


##### Create a Series from dict

In [4]:
data = {'a' : 0., 'b' : 1., 'c' : 2.}
s = pd.Series(data)
print(s)

a    0.0
b    1.0
c    2.0
dtype: float64


In [5]:
data = {'a' : 0., 'b' : 1., 'c' : 2.}
s = pd.Series(data,index=['b','c','d','a'])
print(s)

b    1.0
c    2.0
d    NaN
a    0.0
dtype: float64
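
The `NaN` for label `d` above comes from label alignment: values are matched to the requested index by key, and a label with no matching key is left missing. A minimal sketch of locating and handling such entries follows; the fill value `-1.0` is an arbitrary choice for illustration.

```python
import pandas as pd

data = {'a': 0., 'b': 1., 'c': 2.}
s = pd.Series(data, index=['b', 'c', 'd', 'a'])

# Labels that had no matching key in the dict
missing = s[s.isna()].index.tolist()
print(missing)

# Two common ways to deal with them
print(s.dropna())      # drop the missing entries
print(s.fillna(-1.0))  # or replace them with a placeholder value
```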


##### Create a Series from Scalar

In [6]:
s = pd.Series(5, index=['a',2,'name',4,5])
print(s)

a       5
2       5
name    5
4       5
5       5
dtype: int64


##### Accessing Data from a Series by Label and Position

In [7]:
s = pd.Series([1,2,3,4,5], index=['a','b',3,4,'idx'])
print(s)

a      1
b      2
3      3
4      4
idx    5
dtype: int64


In [8]:
s['idx']

5

In [9]:
s[2]

3

In [10]:
s[1:3]

b    2
3    3
dtype: int64

In [11]:
s[:3]

a    1
b    2
3    3
dtype: int64

In [12]:
s[-3:]

3      3
4      4
idx    5
dtype: int64

In [13]:
s[['a', 'b']]

a    1
b    2
dtype: int64
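
Plain `[]` indexing can be ambiguous when the index itself contains integers, as it does here. The sketch below repeats the lookups with the explicit accessors `.loc` (by label) and `.iloc` (by position), which state the intent directly.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=['a', 'b', 3, 4, 'idx'])

print(s.loc['idx'])       # label 'idx'
print(s.iloc[2])          # third element by position
print(s.loc[4])           # the element whose label is 4
print(s.iloc[4])          # the element at position 4 (the last one)
print(s.iloc[1:3])        # positions 1 and 2
print(s.loc[['a', 'b']])  # a list of labels
```

Note that `s.loc[4]` and `s.iloc[4]` return different elements, which is exactly the distinction that plain `s[4]` hides.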

### DataFrame Data Structure
A DataFrame is a two-dimensional data structure with the following properties:

- Columns can be of different types
- Size is mutable
- Labeled axes (rows and columns)
- Arithmetic operations can be performed on rows and columns

![img](https://www.tutorialspoint.com/python_pandas/images/structure_table.jpg)
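
The properties listed above can be seen in a short sketch. The column name `AgeNextYear` is hypothetical and only serves the illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Melody', 'Mahdi', 'Maryam'], 'Age': [4, 34, 34]})

# Columns can have different types
print(df.dtypes)

# Size is mutable, and arithmetic works on whole columns:
# add a new (hypothetical) column computed from an existing one
df['AgeNextYear'] = df['Age'] + 1

# Both axes are labeled
print(df.index.tolist())
print(df.columns.tolist())
print(df)
```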

##### Create Empty DataFrame

In [14]:
df = pd.DataFrame()
print(df)

Empty DataFrame
Columns: []
Index: []


##### Create DataFrame from List

In [15]:
data = [1,2,3,4,5]
df = pd.DataFrame(data)
print(df)

   0
0  1
1  2
2  3
3  4
4  5


In [16]:
data = [['Melody', 4], ['Mahdi', 34], ['Maryam', 34]]
df = pd.DataFrame(data)
print(df)

        0   1
0  Melody   4
1   Mahdi  34
2  Maryam  34


In [17]:
data = [['Melody', 4],['Mahdi', 34], ['Maryam', 34]]
df = pd.DataFrame(data, columns=['Name', 'Age'])
print(df)

     Name  Age
0  Melody    4
1   Mahdi   34
2  Maryam   34


In [18]:
data = [['Alex', 10], [1,2,3,4,5]]
df = pd.DataFrame(data)
print(df)

      0   1    2    3    4
0  Alex  10  NaN  NaN  NaN
1     1   2  3.0  4.0  5.0


##### Create a DataFrame from Dict of ndarray / List

In [19]:
data = {'name': ['Melody', 'Mahdi', 'Maryam'], 'Age': [4,34,34]}
df = pd.DataFrame(data)
print(df)

   Age    name
0    4  Melody
1   34   Mahdi
2   34  Maryam


In [20]:
df = pd.DataFrame(data, index=['rank1', 'rank2','rank3'])
print(df)

       Age    name
rank1    4  Melody
rank2   34   Mahdi
rank3   34  Maryam


In [21]:
data = [{'a': 1, 'b': 2},{'a': 5, 'b': 10, 'c': 20}]

#With two column indices, values same as dictionary keys
df1 = pd.DataFrame(data, index=['first', 'second'], columns=['a', 'b'])

#With two column indices with one index with other name
df2 = pd.DataFrame(data, index=['first', 'second'], columns=['a', 'b1'])
print(df1)
print(df2)

        a   b
first   1   2
second  5  10
        a  b1
first   1 NaN
second  5 NaN


# PANEL and ...

# Basic Functionality

In [36]:
s = pd.Series(np.random.randn(10))
print(s)
print(s.axes)

a = s.axes

0    1.049670
1   -0.558312
2   -0.454707
3   -1.038599
4   -2.224802
5   -0.479216
6    1.396512
7    0.207327
8    0.032524
9   -1.042468
dtype: float64
[RangeIndex(start=0, stop=10, step=1)]


In [37]:
s.empty

False

In [38]:
s.ndim

1

In [39]:
s.size

10

In [44]:
s.values

array([ 1.04966988, -0.5583117 , -0.45470681, -1.03859875, -2.22480199,
       -0.4792162 ,  1.39651225,  0.20732675,  0.03252358, -1.04246834])

In [46]:
print(s.values[2])
print(s[2])

-0.454706813211
-0.454706813211


In [51]:
s.head(2)

0    1.049670
1   -0.558312
dtype: float64

In [52]:
s.tail(2)

8    0.032524
9   -1.042468
dtype: float64

In [53]:
#Create a Dictionary of series
d = {'Name':pd.Series(['Tom','James','Ricky','Vin','Steve','Smith','Jack']),
   'Age':pd.Series([25,26,25,23,30,29,23]),
   'Rating':pd.Series([4.23,3.24,3.98,2.56,3.20,4.6,3.8])}

#Create a DataFrame
df = pd.DataFrame(d)
print("Our data series is:")
print(df)

Our data series is:
   Age   Name  Rating
0   25    Tom    4.23
1   26  James    3.24
2   25  Ricky    3.98
3   23    Vin    2.56
4   30  Steve    3.20
5   29  Smith    4.60
6   23   Jack    3.80


In [55]:
df.transpose()

           0      1      2     3      4      5     6
Age       25     26     25    23     30     29    23
Name     Tom  James  Ricky   Vin  Steve  Smith  Jack
Rating  4.23   3.24   3.98  2.56    3.2    4.6   3.8


In [56]:
df.axes

[RangeIndex(start=0, stop=7, step=1),
 Index([u'Age', u'Name', u'Rating'], dtype='object')]

In [61]:
df.dtypes

Age         int64
Name       object
Rating    float64
dtype: object

In [62]:
df.empty

False

In [63]:
df.ndim

2

In [65]:
df.shape

(7, 3)

In [68]:
df.size

21

In [69]:
df.values

array([[25, 'Tom', 4.23],
       [26, 'James', 3.24],
       [25, 'Ricky', 3.98],
       [23, 'Vin', 2.56],
       [30, 'Steve', 3.2],
       [29, 'Smith', 4.6],
       [23, 'Jack', 3.8]], dtype=object)

In [71]:
df.head()

   Age   Name  Rating
0   25    Tom    4.23
1   26  James    3.24
2   25  Ricky    3.98
3   23    Vin    2.56
4   30  Steve    3.20


In [73]:
df.tail(3)

   Age   Name  Rating
4   30  Steve     3.2
5   29  Smith     4.6
6   23   Jack     3.8


# Descriptive Statistics

In [77]:
print(df)

df.sum()

   Age   Name  Rating
0   25    Tom    4.23
1   26  James    3.24
2   25  Ricky    3.98
3   23    Vin    2.56
4   30  Steve    3.20
5   29  Smith    4.60
6   23   Jack    3.80


Age                                  181
Name      TomJamesRickyVinSteveSmithJack
Rating                             25.61
dtype: object

In [76]:
# Row-wise sum (axis=1) over the numeric columns only
df.sum(axis=1, numeric_only=True)

0    29.23
1    29.24
2    28.98
3    25.56
4    33.20
5    33.60
6    26.80
dtype: float64

In [78]:
df.mean(numeric_only=True)  # skip the non-numeric Name column

Age       25.857143
Rating     3.658571
dtype: float64

In [79]:
df.std(numeric_only=True)  # skip the non-numeric Name column

Age       2.734262
Rating    0.698628
dtype: float64

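Commonly used descriptive statistics functions:

- `count()` -- number of non-null observations
- `sum()` -- sum of values
- `mean()` -- mean of values
- `median()` -- median of values
- `mode()` -- mode of values
- `std()` -- standard deviation of values
- `min()` -- minimum value
- `max()` -- maximum value
- `abs()` -- absolute value
- `prod()` -- product of values
- `cumsum()` -- cumulative sum
- `cumprod()` -- cumulative product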
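
A few of these functions, applied to the numeric columns of the example data, might look like the sketch below. It rebuilds only the `Age` and `Rating` columns so that it runs on its own.

```python
import pandas as pd

df = pd.DataFrame({'Age': [25, 26, 25, 23, 30, 29, 23],
                   'Rating': [4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8]})

print(df.count())    # non-null observations per column
print(df.median())   # median per column
print(df.min())      # minimum per column
print(df.max())      # maximum per column

# mode() returns every value tied for most frequent
print(df['Age'].mode().tolist())

# cumsum() returns the running total
print(df['Age'].cumsum().tolist())
```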

In [80]:
df.cumsum()

   Age                            Name  Rating
0   25                             Tom    4.23
1   51                        TomJames    7.47
2   76                   TomJamesRicky   11.45
3   99                TomJamesRickyVin   14.01
4  129           TomJamesRickyVinSteve   17.21
5  158      TomJamesRickyVinSteveSmith   21.81
6  181  TomJamesRickyVinSteveSmithJack   25.61


In [81]:
df.describe()

             Age    Rating
count        7.0       7.0
mean   25.857143  3.658571
std     2.734262  0.698628
min         23.0      2.56
25%         24.0      3.22
50%         25.0       3.8
75%         27.5     4.105
max         30.0       4.6


In [83]:
df.describe(include=['object'])

         Name
count       7
unique      7
top     Ricky
freq        1


In [86]:
df.describe(include='all')

              Age   Name    Rating
count         7.0      7       7.0
unique        NaN      7       NaN
top           NaN  Ricky       NaN
freq          NaN      1       NaN
mean    25.857143    NaN  3.658571
std      2.734262    NaN  0.698628
min          23.0    NaN      2.56
25%          24.0    NaN      3.22
50%          25.0    NaN       3.8
75%          27.5    NaN     4.105


In [90]:
df = pd.DataFrame(np.random.rand(10,4),columns=['a','b','c','d'])
#df.plot.bar()