# Pandas

Pandas represents a table with the ``DataFrame`` data type, which can be viewed as an enhanced NumPy array (usually 2D).<br>
<br>
Each column of a DataFrame has an index (a string, an integer, or another object).<br>
Each row of a DataFrame has an index (a string, an integer, or another object).<br>
<br>
Pandas has another data type, ``Series``, which is an enhanced 1D NumPy array.<br>
Each element in a ``Series`` has an index (a string, an integer, or another object).

In [4]:
import numpy as np
import pandas as pd

## Series Data Object in Pandas

In [2]:
data = pd.Series([0.1, 0.2, 0.3, 0.4])
data

0    0.1
1    0.2
2    0.3
3    0.4
dtype: float64

In [3]:
type(data)

pandas.core.series.Series

We can get an element of a ``Series``  using its index

In [4]:
data[0]

0.1

We can get a sub-series by slicing a ``Series`` with element indexes

In [5]:
a = data[0:2] # type(a) is pandas.core.series.Series
a

0    0.1
1    0.2
dtype: float64

We can convert a ``Series`` into a 1D NumPy array using the attribute ``.values`` (it is an attribute, so no parentheses are needed)

In [6]:
a = data.values # type(a) is numpy.ndarray
a

array([0.1, 0.2, 0.3, 0.4])
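As a hedged aside, recent pandas versions also provide the method ``Series.to_numpy()``, which returns the same kind of 1D NumPy array as ``.values``. A minimal sketch (the names ``arr_values`` and ``arr_method`` are illustrative and not part of the notebook above):

```python
import numpy as np
import pandas as pd

data = pd.Series([0.1, 0.2, 0.3, 0.4])

arr_values = data.values      # attribute access, no parentheses
arr_method = data.to_numpy()  # method call, available in pandas 0.24 and later

print(type(arr_values))                        # both results are numpy.ndarray
print(np.array_equal(arr_values, arr_method))  # element-wise comparison of the two arrays
```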

### ``Series`` is similar to Python Dictionary and NumPy array

Each element in a ``Series`` has an index (usually a string-index or an integer-index) <br>
We can access an element using the integer-index: similar to a NumPy array <br>
We can access an element using the string-index: similar to a Python dictionary

In [7]:
ser = pd.Series([0.1, 0.2, 0.3, 0.4], index=['a', 'b', 'c', 'd'])
ser

a    0.1
b    0.2
c    0.3
d    0.4
dtype: float64

In [8]:
# Get an element using the string-index
ser['b']

0.2

A ``Series`` has an attribute ``index``, which is an array-like object

In [9]:
ser.index

Index(['a', 'b', 'c', 'd'], dtype='object')

In [10]:
type(ser.index)

pandas.core.indexes.base.Index

In [11]:
ser.index[0]

'a'

We can use non-contiguous indexes in a ``Series``

In [12]:
ser = pd.Series([0.1, 0.2, 0.3, 0.4], index=[-1, 100, 2, 3])
ser

-1      0.1
 100    0.2
 2      0.3
 3      0.4
dtype: float64

In [13]:
ser[-1] # -1 is treated as a label here, so this is not the last element

0.1

In [14]:
ser[-1:101] # slicing with [] is position-based here, so only the last element is returned; do not use this notation to get a sub-series

3    0.4
dtype: float64
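One way to avoid this ambiguity, sketched below, is to state the intent explicitly: ``.loc`` always works with labels and ``.iloc`` always works with positions. The variable names are illustrative only.

```python
import pandas as pd

ser = pd.Series([0.1, 0.2, 0.3, 0.4], index=[-1, 100, 2, 3])

by_label = ser.loc[-1]       # label -1, which is the first element
by_position = ser.iloc[-1]   # position -1, which is the last element

label_slice = ser.loc[-1:2]  # from label -1 through label 2, end label included
pos_slice = ser.iloc[0:2]    # positions 0 and 1, end position excluded

print(by_label, by_position)
print(label_slice)
print(pos_slice)
```

Note that a label slice on a non-sorted index such as this one only works when both end labels actually exist in the index.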

In [15]:
ser1 = pd.Series([0.1, 0.2, 0.3, 0.4]) # we do not specify index here
# it is the same as
ser2 = pd.Series([0.1, 0.2, 0.3, 0.4], index = [0, 1, 2, 3]) # indexes are contiguous from 0

In [16]:
ser1

0    0.1
1    0.2
2    0.3
3    0.4
dtype: float64

In [17]:
ser2

0    0.1
1    0.2
2    0.3
3    0.4
dtype: float64
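A small sketch that checks this equivalence: when no index is given, pandas creates a default ``RangeIndex`` whose labels are 0, 1, 2, ... and these match the explicit integer labels of ``ser2``.

```python
import pandas as pd

ser1 = pd.Series([0.1, 0.2, 0.3, 0.4])
ser2 = pd.Series([0.1, 0.2, 0.3, 0.4], index=[0, 1, 2, 3])

print(ser1.index)                        # the default index object
print(list(ser1.index))                  # its labels as a plain Python list
print((ser1.index == ser2.index).all())  # label-by-label comparison of both indexes
```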

### Create a Series from a Python Dictionary

In [18]:
patient_info = {'Age': 20,
                'Blood_Type': 'O',
                'sex': 'M',
                'Address': 'Base0, Mars',
                'Phone': '001001001',
                'Diagnosis': 'bone fracture in foot'}
patient_info = pd.Series(patient_info)
print(patient_info)
print('type(patient_info) is', type(patient_info))

Age                              20
Blood_Type                        O
sex                               M
Address                 Base0, Mars
Phone                     001001001
Diagnosis     bone fracture in foot
dtype: object
type(patient_info) is <class 'pandas.core.series.Series'>


In [19]:
patient_info[4] # using the integer position; newer pandas versions deprecate this form in favor of patient_info.iloc[4]

'001001001'

In [20]:
patient_info['Phone']  # using the string index

'001001001'

In [21]:
patient_info[0:4] # patient_info[4] is not included

Age                    20
Blood_Type              O
sex                     M
Address       Base0, Mars
dtype: object

``Series`` supports slicing using strings as the start index and the end index

In [22]:
patient_info['Age':'Phone']
# 'Phone' is included: a label-based slice includes its end label,
# unlike the integer-position slice above, which excludes its end position

Age                    20
Blood_Type              O
sex                     M
Address       Base0, Mars
Phone           001001001
dtype: object
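The difference between the two slicing styles can be made explicit with ``.iloc`` (positions) and ``.loc`` (labels). This is a minimal sketch on a shortened copy of the patient record; the name ``info`` is illustrative.

```python
import pandas as pd

info = pd.Series({'Age': 20, 'Blood_Type': 'O', 'sex': 'M',
                  'Address': 'Base0, Mars', 'Phone': '001001001'})

by_position = info.iloc[0:4]        # positions 0 to 3, so 'Phone' is excluded
by_label = info.loc['Age':'Phone']  # labels 'Age' to 'Phone', so 'Phone' is included

print(len(by_position), len(by_label))
```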

### Convert a Series to a NumPy Array using Series.values

In [23]:
patient_info.values

array([20, 'O', 'M', 'Base0, Mars', '001001001', 'bone fracture in foot'],
      dtype=object)

## DataFrame Object in Pandas 
A DataFrame represents a table <br>
The values of the table form a matrix (2D NumPy array) <br>
Each row of the table has an index (e.g. a string-index or an integer-index) <br>
Each column of the table has an index (e.g. a string-index or an integer-index) <br>

In [5]:
Matrix = [[1, 2],
          [3, 4],
          [5, 6]]

In [6]:
df = pd.DataFrame(Matrix, 
                  columns=['ColumnA', 'ColumnB'],
                  index=['RowA', 'RowB', 'RowC'])
# columns ~ column indexes
# index ~ row indexes
df

      ColumnA  ColumnB
RowA        1        2
RowB        3        4
RowC        5        6


In [26]:
type(df)


pandas.core.frame.DataFrame

In [27]:
df.columns # column indexes

Index(['ColumnA', 'ColumnB'], dtype='object')

In [28]:
df.index  # row indexes

Index(['RowA', 'RowB', 'RowC'], dtype='object')

In [29]:
# get the first column using the string index
df['ColumnA']

RowA    1
RowB    3
RowC    5
Name: ColumnA, dtype: int64

In [30]:
# try to get the first row using the string index
df['RowA'] # this does not work

KeyError: 'RowA'
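The ``KeyError`` occurs because ``df[...]`` with a single label looks for a column. To select a row by its string index, ``df.loc`` can be used instead, as in this minimal sketch (variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4], [5, 6]],
                  columns=['ColumnA', 'ColumnB'],
                  index=['RowA', 'RowB', 'RowC'])

row_a = df.loc['RowA']              # one row, returned as a Series
rows_ab = df.loc[['RowA', 'RowB']]  # several rows, returned as a DataFrame
cell = df.loc['RowA', 'ColumnB']    # one element, by row label and column label

print(row_a)
print(rows_ab)
print(cell)
```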

Get the first row using ``df.iloc`` and integer-index

In [26]:
df.iloc[[0], :] # a list of positions returns a DataFrame

      ColumnA  ColumnB
RowA        1        2


In [28]:
type(df.iloc[0,:])

pandas.core.series.Series

Get the first column using ``df.iloc`` and integer-index

In [24]:
df.iloc[:,0]

RowA    1
RowB    3
RowC    5
Name: ColumnA, dtype: int64

In [14]:
type(df.iloc[:,0])

pandas.core.series.Series

Get one element in the DataFrame using ``df.iloc`` with integer-indexes

In [15]:
df.iloc[0,1]

2
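For single elements, pandas also offers the scalar accessors ``df.iat`` (positions) and ``df.at`` (labels). The sketch below shows them next to an ``iloc`` slice; it is an illustration under the same ``df`` as above, not a required style.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4], [5, 6]],
                  columns=['ColumnA', 'ColumnB'],
                  index=['RowA', 'RowB', 'RowC'])

by_pos = df.iat[0, 1]                # row position 0, column position 1
by_label = df.at['RowA', 'ColumnB']  # same element, addressed by labels
first_two = df.iloc[0:2, :]          # rows at positions 0 and 1, all columns

print(by_pos, by_label)
print(first_two)
```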

In [16]:
df

      ColumnA  ColumnB
RowA        1        2
RowB        3        4
RowC        5        6


### Remove a column/row from a DataFrame

In [24]:
Matrix = [[1, 2],
          [3, 4],
          [5, 6]]
df = pd.DataFrame(Matrix, 
                  columns=['ColumnA', 'ColumnB'],
                  index=['RowA', 'RowB', 'RowC'])
# columns ~ column indexes
# index ~ row indexes
df

      ColumnA  ColumnB
RowA        1        2
RowB        3        4
RowC        5        6


In [18]:
df[['ColumnA']] # keep only ColumnA; a list of column labels returns a DataFrame

      ColumnA
RowA        1
RowB        3
RowC        5


In [25]:
df_new = df.drop('RowA', axis=0)
df_new

      ColumnA  ColumnB
RowB        3        4
RowC        5        6


In [20]:
df_new = df.drop(['RowA','RowB'], axis=0)
df_new

      ColumnA  ColumnB
RowC        5        6


In [26]:
df

      ColumnA  ColumnB
RowA        1        2
RowB        3        4
RowC        5        6


In [22]:
df_new = df.drop('ColumnA', axis=1)
df_new

      ColumnB
RowA        2
RowB        4
RowC        6


In [27]:
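# df_new = df.drop(['ColumnA', 'ColumnB'], axis=1)  # would drop both columns
# The line above is commented out, so df_new still holds the result of the most recently executed drop
df_new

      ColumnA  ColumnB
RowB        3        4
RowC        5        6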
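``drop`` also accepts the keyword arguments ``index=`` and ``columns=``, which some readers find clearer than ``axis=0`` / ``axis=1``. A minimal sketch, which also confirms that ``drop`` returns a new DataFrame and leaves ``df`` unchanged (variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4], [5, 6]],
                  columns=['ColumnA', 'ColumnB'],
                  index=['RowA', 'RowB', 'RowC'])

no_row = df.drop(index='RowA')       # same as df.drop('RowA', axis=0)
no_col = df.drop(columns='ColumnA')  # same as df.drop('ColumnA', axis=1)
no_both = df.drop(index=['RowA', 'RowB'], columns=['ColumnA'])

print(no_both)
print(df.shape)  # the original DataFrame keeps all rows and columns
```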


### Convert a DataFrame to a NumPy Array using ``DataFrame.values``

In [28]:
A = df.values
A

array([[1, 2],
       [3, 4],
       [5, 6]])

In [29]:
type(A)

numpy.ndarray

In [30]:
A.shape

(3, 2)
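A NumPy array has a single dtype, so the result of the conversion depends on the column types. The sketch below uses two small hypothetical DataFrames (``numeric`` and ``mixed``, not part of the notebook above) and the method ``DataFrame.to_numpy()``, which behaves like ``.values`` here.

```python
import numpy as np
import pandas as pd

numeric = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})     # all columns are integers
mixed = pd.DataFrame({'a': [1, 2], 'b': ['x', 'y']})   # integers and strings

numeric_arr = numeric.to_numpy()
mixed_arr = mixed.to_numpy()

print(numeric_arr.dtype)  # an integer dtype shared by all elements
print(mixed_arr.dtype)    # falls back to the generic object dtype
```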

In [31]:
import numpy as np
import pandas as pd

In [32]:
json_data = pd.read_json('sample-json-file.json', orient="index")
json_data

                                                         0
Address  {'Permanent address': 'USA', 'current Address'...
Boolean                                               True
Mobile                                            12345678
Name                                                  Test
Pets                                            [Dog, cat]


In [33]:
json_data = pd.read_json('sample-json-file-2.json', orient="index")
json_data

Unnamed: 0,0,1,2,3,4
users,"{'userId': 1, 'firstName': 'Krish', 'lastName'...","{'userId': 2, 'firstName': 'racks', 'lastName'...","{'userId': 3, 'firstName': 'denial', 'lastName...","{'userId': 4, 'firstName': 'devid', 'lastName'...","{'userId': 5, 'firstName': 'jone', 'lastName':..."


In [34]:
# We can get a column of a DataFrame (as a ``Series``) using its column index
json_data[0]

users    {'userId': 1, 'firstName': 'Krish', 'lastName'...
Name: 0, dtype: object

In [35]:
# Slicing a DataFrame with [0:2] selects rows by position
a = json_data[0:2] # type(a) is pandas.core.frame.DataFrame

In [36]:
# We can convert a DataFrame into a 2D NumPy array using the attribute ``.values``
a = json_data.values # type(a) is numpy.ndarray
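In the outputs above, each cell still holds a whole Python dictionary. One possible way to flatten such nested records into ordinary columns is ``pd.json_normalize``. The contents of the JSON files are not shown in this notebook, so the ``records`` list below is hypothetical sample data (the ``contact`` field in particular is invented for illustration).

```python
import pandas as pd

# Hypothetical records shaped like a list of user dictionaries
records = [
    {'userId': 1, 'firstName': 'Krish', 'contact': {'city': 'A'}},
    {'userId': 2, 'firstName': 'racks', 'contact': {'city': 'B'}},
]

# Each dictionary becomes a row; nested keys are joined with a dot
flat = pd.json_normalize(records)

print(flat.columns.tolist())
print(flat)
```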

In [None]:
# More info on reading other file formats with Pandas:
# https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html