# Pandas

Pandas represents a table with the ``DataFrame`` data type, which can be viewed as an enhanced NumPy array (usually 2D).<br>
<br>
Each column of a ``DataFrame`` has an index label (a string, an integer, or another object).  <br>
Each row of a ``DataFrame`` has an index label (a string, an integer, or another object).  <br>
<br>
Pandas has another data type, ``Series``, which is an enhanced version of a 1D NumPy array. <br>
Each element in a ``Series`` has an index label (a string, an integer, or another object).

In [1]:
import numpy as np
import pandas as pd

## Series Data Object in Pandas

In [2]:
A = pd.Series([0.1, 0.2, 0.3, 0.4])
A

0    0.1
1    0.2
2    0.3
3    0.4
dtype: float64

In [3]:
type(A)

pandas.core.series.Series

We can get an element of the ``Series A``  using its index

In [4]:
A[0]

0.1

We can get a sub-series by slicing a ``Series`` with element indexes

In [5]:
S = A[0:2] # type(S) is pandas.core.series.Series
S

0    0.1
1    0.2
dtype: float64

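The sub-series `S` and the series `A` share data in the pandas version used here, so writing to `S` also changes `A`. With pandas Copy-on-Write enabled, `A` would stay unchanged.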

In [6]:
S[0]=100
S

0    100.0
1      0.2
dtype: float64

In [7]:
A

0    100.0
1      0.2
2      0.3
3      0.4
dtype: float64
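
If the sub-series should not share data with the original, one option is to take an explicit copy. The following is a minimal sketch with the same values as above; the names `original` and `independent` are illustrative and not part of the notebook.

```python
import pandas as pd

original = pd.Series([0.1, 0.2, 0.3, 0.4])

# .copy() gives the sub-series its own data
independent = original.iloc[0:2].copy()
independent.iloc[0] = 100

print(independent.iloc[0])  # 100.0
print(original.iloc[0])     # 0.1, the original is untouched
```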

We can convert a ``Series`` into a 1D NumPy array using the property ``values``

In [8]:
B = A.values # type(B) is numpy.ndarray
B

array([100. ,   0.2,   0.3,   0.4])
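
The method ``to_numpy()`` is an alternative to the ``values`` property. A short sketch, assuming a fresh ``Series`` named `ser` (illustrative) rather than the modified `A` above:

```python
import numpy as np
import pandas as pd

ser = pd.Series([0.1, 0.2, 0.3, 0.4])
arr = ser.to_numpy()  # a 1D numpy.ndarray

print(type(arr), arr.shape)
print(np.array_equal(arr, ser.values))  # True
```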

### ``Series`` is similar to Python Dictionary and NumPy array

Each element in a ``Series`` has an index (usually a string index or an integer index). <br>
We can access an element using the string index, similar to a Python dictionary.  <br>
We can access an element using the integer index, similar to a NumPy array.

In [9]:
ser = pd.Series([0.1, 0.2, 0.3, 0.4], index=['a', 'b', 'c', 'd'])
ser

a    0.1
b    0.2
c    0.3
d    0.4
dtype: float64

In [10]:
# Get an element using the string-index
ser['b']

0.2
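
The comparison with dictionaries and arrays can be made concrete. This is a small sketch under the assumption that the same `ser` is rebuilt from scratch; the variable names on the left are illustrative.

```python
import pandas as pd

ser = pd.Series([0.1, 0.2, 0.3, 0.4], index=['a', 'b', 'c', 'd'])

# Dictionary-like: membership tests and lookups work on the index labels
has_b = 'b' in ser                   # True
missing = ser.get('z', default=-1)   # -1, because 'z' is not a label
as_dict = ser.to_dict()              # {'a': 0.1, 'b': 0.2, 'c': 0.3, 'd': 0.4}

# Array-like: vectorized arithmetic and positional access
doubled = ser * 2
second = ser.iloc[1]                 # 0.2
```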

A ``Series`` has an attribute ``index``, which is an array-like object

In [11]:
ser.index

Index(['a', 'b', 'c', 'd'], dtype='object')

In [12]:
type(ser.index)

pandas.core.indexes.base.Index

In [13]:
ser.index[0]

'a'

In [14]:
ser = pd.Series([0.1, 0.2, 0.3, 0.4]) # we do not specify index here
ser
# it is the same as
#ser = pd.Series([0.1, 0.2, 0.3, 0.4], index = [0, 1, 2, 3]) 
# indexes are contiguous from 0

0    0.1
1    0.2
2    0.3
3    0.4
dtype: float64

In [15]:
ser[0:3]

0    0.1
1    0.2
2    0.3
dtype: float64

We can use non-contiguous indexes in a ``Series``

In [16]:
ser = pd.Series([0.1, 0.2, 0.3, 0.4], index=[-1, 100, 2, 3])
ser

-1      0.1
 100    0.2
 2      0.3
 3      0.4
dtype: float64

In [17]:
ser[-1] # it is not the last element

0.1

In [18]:
ser[100]

0.2

In [19]:
ser[-1:101] # a slice here is positional, not label-based: it starts at the last element

3    0.4
dtype: float64

In [20]:
ser[0:3] #get a sub-series

-1      0.1
 100    0.2
 2      0.3
dtype: float64
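
The surprising results above come from `[]` mixing label-based and position-based access. One way to make the intent explicit is to use ``.loc`` for labels and ``.iloc`` for positions. A minimal sketch on the same data (the result names are illustrative):

```python
import pandas as pd

ser = pd.Series([0.1, 0.2, 0.3, 0.4], index=[-1, 100, 2, 3])

by_label = ser.loc[-1]         # 0.1: the element whose label is -1
by_position = ser.iloc[-1]     # 0.4: the last element
label_slice = ser.loc[-1:100]  # labels -1 and 100; the end label is included
pos_slice = ser.iloc[0:3]      # first three elements; the end position is excluded
```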

### Create a Series from a Python Dictionary

In [21]:
patient_info = {'Age': 20,
                'Blood_Type': 'O',
                'sex': 'M',
                'Address': 'Base0, Mars',
                'Phone': '001001001',
                'Diagnosis': 'bone fracture in foot'}
patient_info

{'Age': 20,
 'Blood_Type': 'O',
 'sex': 'M',
 'Address': 'Base0, Mars',
 'Phone': '001001001',
 'Diagnosis': 'bone fracture in foot'}

In [22]:
patient_info = pd.Series(patient_info)
print(patient_info)
print('type(patient_info) is', type(patient_info))

Age                              20
Blood_Type                        O
sex                               M
Address                 Base0, Mars
Phone                     001001001
Diagnosis     bone fracture in foot
dtype: object
type(patient_info) is <class 'pandas.core.series.Series'>


In [23]:
patient_info.iloc[4] # using the integer position; plain patient_info[4] is deprecated in recent pandas versions

'001001001'

In [24]:
patient_info['Phone']  # using the string index

'001001001'

In [25]:
patient_info[0:4] # patient_info[4] is not included

Age                    20
Blood_Type              O
sex                     M
Address       Base0, Mars
dtype: object

``Series`` supports slicing using strings as the start index and the end index

In [26]:
patient_info['Age':'Phone']
# the end label 'Phone' is included,
# unlike the integer slice above, which excludes the end position

Age                    20
Blood_Type              O
sex                     M
Address       Base0, Mars
Phone           001001001
dtype: object

### Delete an element from a Series

In [27]:
patient_info.drop('Age') # returns a new Series; patient_info itself is not changed
patient_info

Age                              20
Blood_Type                        O
sex                               M
Address                 Base0, Mars
Phone                     001001001
Diagnosis     bone fracture in foot
dtype: object

In [28]:
patient_info.drop('Age', inplace=True)
patient_info

Blood_Type                        O
sex                               M
Address                 Base0, Mars
Phone                     001001001
Diagnosis     bone fracture in foot
dtype: object
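
Instead of ``inplace=True``, the result of ``drop`` can be assigned to a new name. A short sketch, using a reduced, illustrative version of the patient record (`info`, `without_age`, and `only_sex` are not from the notebook):

```python
import pandas as pd

info = pd.Series({'Age': 20, 'Blood_Type': 'O', 'sex': 'M'})

# drop returns a new Series, so keep the result under a name
without_age = info.drop('Age')
# a list of labels removes several elements at once
only_sex = info.drop(['Age', 'Blood_Type'])

print(list(without_age.index))  # ['Blood_Type', 'sex']
print(list(only_sex.index))     # ['sex']
print(list(info.index))         # ['Age', 'Blood_Type', 'sex'], unchanged
```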

### Convert a Series to a NumPy Array using Series.values

In [29]:
patient_info.values

array(['O', 'M', 'Base0, Mars', '001001001', 'bone fracture in foot'],
      dtype=object)

## DataFrame in Pandas 
A ``DataFrame`` represents a table. <br>
The values of the table form a matrix (2D NumPy array). <br>
Each row of the table has an index (e.g. a string index or an integer index). <br>
Each column of the table has an index (e.g. a string index or an integer index). <br>

In [30]:
Matrix = [[1, 2],
          [3, 4],
          [5, 6]]

In [31]:
df = pd.DataFrame(Matrix, 
                  columns=['ColumnA', 'ColumnB'],
                  index=['RowA', 'RowB', 'RowC'])
# columns ~ column indexes
# index ~ row indexes
df

|      | ColumnA | ColumnB |
|------|---------|---------|
| RowA | 1       | 2       |
| RowB | 3       | 4       |
| RowC | 5       | 6       |


In [32]:
type(df)

pandas.core.frame.DataFrame

In [33]:
df.columns # column indexes (column names)

Index(['ColumnA', 'ColumnB'], dtype='object')

In [34]:
df.index  # row indexes (row names)

Index(['RowA', 'RowB', 'RowC'], dtype='object')

In [35]:
# get the first column using the string index
df['ColumnA']

RowA    1
RowB    3
RowC    5
Name: ColumnA, dtype: int64

Selecting a column with single square brackets returns a pandas ``Series``

In [36]:
type(df['ColumnA'])

pandas.core.series.Series

To get a column as a ``DataFrame``, use double square brackets

In [37]:
df_a=df[['ColumnA']]
df_a

|      | ColumnA |
|------|---------|
| RowA | 1       |
| RowB | 3       |
| RowC | 5       |


In [38]:
type(df_a)

pandas.core.frame.DataFrame

In [39]:
# try to get the first row using the string index
#df['RowA'] # raises KeyError: df[...] looks up column labels, not row labels

Get the first row using ``df.iloc`` and an integer index

In [40]:
df

|      | ColumnA | ColumnB |
|------|---------|---------|
| RowA | 1       | 2       |
| RowB | 3       | 4       |
| RowC | 5       | 6       |


In [41]:
df.iloc[0,:]

ColumnA    1
ColumnB    2
Name: RowA, dtype: int64

In [42]:
type(df.iloc[0,:])

pandas.core.series.Series

Get the first column using ``df.iloc`` and an integer index

In [43]:
df.iloc[:,0]

RowA    1
RowB    3
RowC    5
Name: ColumnA, dtype: int64

In [44]:
type(df.iloc[:,0])

pandas.core.series.Series

Get one element of the ``DataFrame`` using ``df.iloc`` with integer indexes

In [45]:
df.iloc[0,1]

2
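
``df.loc`` is the label-based counterpart of ``df.iloc``, so rows and elements can also be selected by their string indexes. A minimal sketch on the same table (the names `row`, `col`, and `value` are illustrative):

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4], [5, 6]],
                  columns=['ColumnA', 'ColumnB'],
                  index=['RowA', 'RowB', 'RowC'])

row = df.loc['RowA']               # first row as a Series
col = df.loc[:, 'ColumnA']         # first column as a Series
value = df.loc['RowA', 'ColumnB']  # single element: 2
```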

In [46]:
df

|      | ColumnA | ColumnB |
|------|---------|---------|
| RowA | 1       | 2       |
| RowB | 3       | 4       |
| RowC | 5       | 6       |


In [47]:
df.iloc[0,1]=100
df

|      | ColumnA | ColumnB |
|------|---------|---------|
| RowA | 1       | 100     |
| RowB | 3       | 4       |
| RowC | 5       | 6       |


In [48]:
df.iloc[:,0]=[10 ,2 ,1]
df

|      | ColumnA | ColumnB |
|------|---------|---------|
| RowA | 10      | 100     |
| RowB | 2       | 4       |
| RowC | 1       | 6       |


In [49]:
df.iloc[0,:]=[0, 1]
df

|      | ColumnA | ColumnB |
|------|---------|---------|
| RowA | 0       | 1       |
| RowB | 2       | 4       |
| RowC | 1       | 6       |


### Remove a column/row from a DataFrame

In [50]:
Matrix = [[1, 2],
          [3, 4],
          [5, 6]]
df = pd.DataFrame(Matrix, 
                  columns=['ColumnA', 'ColumnB'],
                  index=['RowA', 'RowB', 'RowC'])
# columns ~ column indexes
# index ~ row indexes
df

|      | ColumnA | ColumnB |
|------|---------|---------|
| RowA | 1       | 2       |
| RowB | 3       | 4       |
| RowC | 5       | 6       |


In [51]:
# removing ColumnB is the same as selecting only ColumnA
df[['ColumnA']]

|      | ColumnA |
|------|---------|
| RowA | 1       |
| RowB | 3       |
| RowC | 5       |


In [52]:
df_new = df.drop('ColumnB', axis=1)
df_new

|      | ColumnA |
|------|---------|
| RowA | 1       |
| RowB | 3       |
| RowC | 5       |


In [53]:
df_new = df.drop(['ColumnA', 'ColumnB'], axis=1)
df_new

RowA
RowB
RowC


In [54]:
df

|      | ColumnA | ColumnB |
|------|---------|---------|
| RowA | 1       | 2       |
| RowB | 3       | 4       |
| RowC | 5       | 6       |


In [55]:
df_new = df.drop('RowB', axis=0)
df_new

|      | ColumnA | ColumnB |
|------|---------|---------|
| RowA | 1       | 2       |
| RowC | 5       | 6       |


In [56]:
df_new = df.drop(['RowA','RowB'], axis=0)
df_new

|      | ColumnA | ColumnB |
|------|---------|---------|
| RowC | 5       | 6       |


In [57]:
# each df.drop call above returned a new DataFrame, so df is not changed
df

|      | ColumnA | ColumnB |
|------|---------|---------|
| RowA | 1       | 2       |
| RowB | 3       | 4       |
| RowC | 5       | 6       |


In [58]:
df_new=df.copy()
df_new

|      | ColumnA | ColumnB |
|------|---------|---------|
| RowA | 1       | 2       |
| RowB | 3       | 4       |
| RowC | 5       | 6       |


In [59]:
#set inplace to True, df_new will be modified
df_new.drop(["ColumnB"], axis=1, inplace=True) 
df_new

|      | ColumnA |
|------|---------|
| RowA | 1       |
| RowB | 3       |
| RowC | 5       |
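
``drop`` also accepts the keywords ``columns=`` and ``index=``, which some readers find clearer than ``axis=1`` and ``axis=0``. A brief sketch on the same table; `no_b` and `no_rowb` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4], [5, 6]],
                  columns=['ColumnA', 'ColumnB'],
                  index=['RowA', 'RowB', 'RowC'])

no_b = df.drop(columns=['ColumnB'])  # same result as df.drop('ColumnB', axis=1)
no_rowb = df.drop(index=['RowB'])    # same result as df.drop('RowB', axis=0)

print(list(no_b.columns))   # ['ColumnA']
print(list(no_rowb.index))  # ['RowA', 'RowC']
```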


### Add a new column to a DataFrame

In [60]:
df

|      | ColumnA | ColumnB |
|------|---------|---------|
| RowA | 1       | 2       |
| RowB | 3       | 4       |
| RowC | 5       | 6       |


In [61]:
df['ColumnC']=[7,8,9]
df

|      | ColumnA | ColumnB | ColumnC |
|------|---------|---------|---------|
| RowA | 1       | 2       | 7       |
| RowB | 3       | 4       | 8       |
| RowC | 5       | 6       | 9       |
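
A new column can also be computed from existing columns, and ``assign`` adds a column without modifying the original table. A minimal sketch; the column name `Total` and the variable `df_with_c` are hypothetical and only used for illustration.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4], [5, 6]],
                  columns=['ColumnA', 'ColumnB'],
                  index=['RowA', 'RowB', 'RowC'])

# element-wise sum of two columns, stored as a new column in df
df['Total'] = df['ColumnA'] + df['ColumnB']

# assign returns a new DataFrame; df itself does not get ColumnC
df_with_c = df.assign(ColumnC=[7, 8, 9])

print(list(df['Total']))          # [3, 7, 11]
print('ColumnC' in df.columns)    # False
```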


### Add a new row to a DataFrame

In [62]:
df.index

Index(['RowA', 'RowB', 'RowC'], dtype='object')

In [63]:
len(df.index)

3

In [64]:
df.loc[len(df.index)] = [-1, -2, -3] # the new row gets the integer label 3
df

|      | ColumnA | ColumnB | ColumnC |
|------|---------|---------|---------|
| RowA | 1       | 2       | 7       |
| RowB | 3       | 4       | 8       |
| RowC | 5       | 6       | 9       |
| 3    | -1      | -2      | -3      |


In [65]:
df.index

Index(['RowA', 'RowB', 'RowC', 3], dtype='object')

In [66]:
#df.index[-1]="RowD"  # raises TypeError: an Index is immutable
df.index=['RowA', 'RowB', 'RowC', "RowD"]
df

|      | ColumnA | ColumnB | ColumnC |
|------|---------|---------|---------|
| RowA | 1       | 2       | 7       |
| RowB | 3       | 4       | 8       |
| RowC | 5       | 6       | 9       |
| RowD | -1      | -2      | -3      |


In [67]:
df.index

Index(['RowA', 'RowB', 'RowC', 'RowD'], dtype='object')
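
The new row can also receive its string label in one step, by assigning to ``df.loc`` with a label that does not exist yet. A short sketch that rebuilds the three-row table first:

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 7], [3, 4, 8], [5, 6, 9]],
                  columns=['ColumnA', 'ColumnB', 'ColumnC'],
                  index=['RowA', 'RowB', 'RowC'])

# 'RowD' is not in the index yet, so this appends a new row with that label
df.loc['RowD'] = [-1, -2, -3]

print(list(df.index))  # ['RowA', 'RowB', 'RowC', 'RowD']
```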

In [68]:
#change column names
df.columns=["ColA", "ColB", "ColC"]
df

|      | ColA | ColB | ColC |
|------|------|------|------|
| RowA | 1    | 2    | 7    |
| RowB | 3    | 4    | 8    |
| RowC | 5    | 6    | 9    |
| RowD | -1   | -2   | -3   |


In [69]:
df.columns

Index(['ColA', 'ColB', 'ColC'], dtype='object')
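
When only some labels need to change, ``rename`` with a mapping avoids retyping the full list of names. A minimal sketch on a small illustrative table (`renamed` is a hypothetical name); ``rename`` returns a new ``DataFrame`` and leaves `df` as it was.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]],
                  columns=['ColumnA', 'ColumnB'],
                  index=['RowA', 3])

# map old labels to new labels; labels not in the mapping are kept
renamed = df.rename(index={3: 'RowB'}, columns={'ColumnA': 'ColA'})

print(list(renamed.index))    # ['RowA', 'RowB']
print(list(renamed.columns))  # ['ColA', 'ColumnB']
```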

### Convert a DataFrame to a NumPy Array using ``DataFrame.values``

In [70]:
A = df.values
A

array([[ 1,  2,  7],
       [ 3,  4,  8],
       [ 5,  6,  9],
       [-1, -2, -3]], dtype=int64)

In [71]:
type(A)

numpy.ndarray

In [72]:
A.shape

(4, 3)
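
The dtype of the resulting array depends on the column types, because a NumPy array holds a single dtype. A small sketch with two illustrative tables (`numeric` and `mixed` are not from the notebook), using ``to_numpy()``, which behaves like ``values`` here:

```python
import pandas as pd

numeric = pd.DataFrame({'a': [1, 2], 'b': [0.5, 1.5]})
mixed = pd.DataFrame({'a': [1, 2], 'b': ['x', 'y']})

numeric_dtype = numeric.to_numpy().dtype  # float64: integers are upcast to match the floats
mixed_dtype = mixed.to_numpy().dtype      # object: numbers and strings have no common numeric type

print(numeric_dtype, mixed_dtype)
```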

### Combine two DataFrames into one DataFrame if they have the same columns

In [73]:
df1=df.copy()
df1

|      | ColA | ColB | ColC |
|------|------|------|------|
| RowA | 1    | 2    | 7    |
| RowB | 3    | 4    | 8    |
| RowC | 5    | 6    | 9    |
| RowD | -1   | -2   | -3   |


In [74]:
df2=df.copy()
df2.index=["RowE", "RowF", "RowG", "RowH"]
df2

|      | ColA | ColB | ColC |
|------|------|------|------|
| RowE | 1    | 2    | 7    |
| RowF | 3    | 4    | 8    |
| RowG | 5    | 6    | 9    |
| RowH | -1   | -2   | -3   |


In [75]:
df12=pd.concat([df1, df2], axis=0)
df12

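|      | ColA | ColB | ColC |
|------|------|------|------|
| RowA | 1    | 2    | 7    |
| RowB | 3    | 4    | 8    |
| RowC | 5    | 6    | 9    |
| RowD | -1   | -2   | -3   |
| RowE | 1    | 2    | 7    |
| RowF | 3    | 4    | 8    |
| RowG | 5    | 6    | 9    |
| RowH | -1   | -2   | -3   |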
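
Two variations on ``pd.concat`` are worth knowing. The sketch below uses small illustrative tables (`df1`, `df2`, `df3`, `stacked`, and `ragged` are defined here, not taken from the cells above) to show what happens when row labels repeat and when the columns do not match.

```python
import pandas as pd

df1 = pd.DataFrame({'ColA': [1, 3], 'ColB': [2, 4]}, index=['RowA', 'RowB'])
df2 = pd.DataFrame({'ColA': [5, -1], 'ColB': [6, -2]}, index=['RowA', 'RowB'])

# The row labels repeat here; ignore_index=True replaces them with 0, 1, 2, ...
stacked = pd.concat([df1, df2], axis=0, ignore_index=True)

# If a column is missing from one input, concat fills the gap with NaN
df3 = pd.DataFrame({'ColA': [9]}, index=['RowX'])
ragged = pd.concat([df1, df3], axis=0)

print(list(stacked.index))                   # [0, 1, 2, 3]
print(pd.isna(ragged.loc['RowX', 'ColB']))   # True
```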
