# Pandas

pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with
“relational” or “labeled” data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real-world data analysis in Python.


## Pandas Data Structures

- Series 
- DataFrame
- Panel (deprecated and later removed from pandas; not covered here)

### Series
A Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating point numbers,
Python objects, etc.). The axis labels are collectively referred to as the index. The basic way to create a Series is
to call `s = pd.Series(data, index=index)`.

A Series can be created from various inputs, such as:

- List
- Dict
- Constant (scalar)


In [3]:
import pandas as pd
data = ['Anil','Basanti','Charan','Dolly']
s=pd.Series(data)
print(s)

0       Anil
1    Basanti
2     Charan
3      Dolly
dtype: object


In [4]:
s=pd.Series(data, index=['Row1','Row2','Row3','Row4'])
s

Row1       Anil
Row2    Basanti
Row3     Charan
Row4      Dolly
dtype: object

In [5]:
data = {
    'Row1':'Anil',
    'Row2':'Basanti',
    'Row3':'Charan',
    'Row4':'Dolly'
}
s=pd.Series(data)
s

Row1       Anil
Row2    Basanti
Row3     Charan
Row4      Dolly
dtype: object

In [6]:
s=pd.Series(data, index=['Row3','Row2','Row4','Row1'])
s

Row3     Charan
Row2    Basanti
Row4      Dolly
Row1       Anil
dtype: object

In [8]:
s=pd.Series(5, index=['Row1','Row2','Row3','Row4','Row5'])
s

Row1    5
Row2    5
Row3    5
Row4    5
Row5    5
dtype: int64
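
When `data` is a dict and an explicit `index` is passed, pandas looks up each index label among the dict keys, and a label with no matching key gets a missing value. The sketch below illustrates this under the assumption of a made-up label `Row9` that is not present in the dict.

```python
import pandas as pd

data = {'Row1': 'Anil', 'Row2': 'Basanti'}

# 'Row9' is a hypothetical label that is not a key of `data`,
# so the value stored under it is missing (NaN).
s = pd.Series(data, index=['Row2', 'Row9', 'Row1'])

print(s)
print(s.isna().tolist())
```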

#### Accessing series data using position

The cells below use the constant-valued Series from the previous example, re-created here:

In [8]:
s=pd.Series(5, index=['Row1','Row2','Row3','Row4','Row5'])
s

Row1    5
Row2    5
Row3    5
Row4    5
Row5    5
dtype: int64


In [9]:
s[0]

5

In [10]:
s[4]

5

In [11]:
s[[1,4]]

Row2    5
Row5    5
dtype: int64

In [12]:
s[0:3]

Row1    5
Row2    5
Row3    5
dtype: int64
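
Some newer pandas releases deprecate plain `s[0]` for positional access when the index holds labels rather than integers. A minimal sketch of the same lookups written with `.iloc`, which always selects by position, could look like this (the variable names are illustrative only):

```python
import pandas as pd

s = pd.Series(5, index=['Row1', 'Row2', 'Row3', 'Row4', 'Row5'])

first = s.iloc[0]          # single position -> scalar value
picked = s.iloc[[1, 4]]    # list of positions -> Series
head3 = s.iloc[0:3]        # slice; the end position is excluded

print(first)
print(picked)
print(head3)
```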

#### Accessing series data using label

In [15]:
data = ['Anil','Basanti','Charan','Dolly']
s=pd.Series(data,index=['Row1','Row2','Row3','Row4'])
print(s)

Row1       Anil
Row2    Basanti
Row3     Charan
Row4      Dolly
dtype: object


In [16]:
s['Row2']

'Basanti'

In [17]:
s[['Row1','Row3']]

Row1      Anil
Row3    Charan
dtype: object
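
The label-based lookups above can also be written with `.loc`, which makes the intent explicit. The sketch below additionally shows a label slice (the end label is included) and `Series.get` with a default for a label that may be absent; `Row9` is a made-up label used only for illustration.

```python
import pandas as pd

s = pd.Series(['Anil', 'Basanti', 'Charan', 'Dolly'],
              index=['Row1', 'Row2', 'Row3', 'Row4'])

one = s.loc['Row2']                    # single label -> scalar
sliced = s.loc['Row1':'Row3']          # label slice includes 'Row3'
missing = s.get('Row9', 'not found')   # hypothetical label, default returned

print(one)
print(sliced)
print(missing)
```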


### What is DataFrame?

  A DataFrame is a two-dimensional data structure: data is aligned in a tabular fashion in rows and columns. A pandas DataFrame consists of three principal components: the data, the rows, and the columns.

### Features of DataFrame
  - Columns can be of different types
  - Size-mutable
  - Labeled axes (rows and columns)
  - Arithmetic operations can be performed on rows and columns
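
The features listed above can be illustrated with a small sketch. The column names `Bonus` and `Total` are made up for this example and do not appear elsewhere in this notebook.

```python
import pandas as pd

df = pd.DataFrame({'StudentName': ['Anil', 'Basanti'], 'Marks': [98, 80]})

# Columns of different types: text and integers side by side.
print(df.dtypes)

# Size-mutable: a new column can be added to an existing frame.
df['Bonus'] = [2, 5]

# Arithmetic on columns is element-wise and aligned on the row labels.
df['Total'] = df['Marks'] + df['Bonus']

# Labeled axes: both rows and columns carry labels.
print(df.index.tolist(), df.columns.tolist())
print(df)
```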
  
  
#### Creating a Pandas DataFrame

  
  A pandas DataFrame can be created from various types of data, such as:
  
  - Lists
  - List of lists
  - Dictionary
  - List of dictionaries
  - Dict of Series
  - CSV file
  - Excel file
  
##### Creating a DataFrame using a list

  A DataFrame can be created from a single list or a list of lists.

In [18]:
import pandas as pd

lst = ['Anil','Basanti','Charan','Dolly']

df = pd.DataFrame(lst)
print(df)

         0
0     Anil
1  Basanti
2   Charan
3    Dolly


In [19]:
import pandas as pd

lst = ['Anil','Basanti','Charan','Dolly']

df = pd.DataFrame(lst, columns=['StudentName'])
print(df)

  StudentName
0        Anil
1     Basanti
2      Charan
3       Dolly


In [39]:
import pandas as pd

lst = ['Anil','Basanti','Charan','Dolly']

df = pd.DataFrame(lst, columns=['StudentName'], index=['Row1','Row2','Row3','Row4'])
print(df)
df.index

     StudentName
Row1        Anil
Row2     Basanti
Row3      Charan
Row4       Dolly


Index(['Row1', 'Row2', 'Row3', 'Row4'], dtype='object')

##### Creating a DataFrame using a list of lists

In [23]:
import pandas as pd

lst = [['Anil', 98],['Basanti',80],['Charan',99],['Dolly',100]]

df = pd.DataFrame(lst, columns=['StudentName','Marks'])
print(df)

  StudentName  Marks
0        Anil     98
1     Basanti     80
2      Charan     99
3       Dolly    100


##### Creating a DataFrame from a dict of ndarrays / lists
  All the ndarrays must be of the same length. If an index is passed, its length must equal the length of the arrays.
If no index is passed, the index defaults to range(n), where n is the array length.

In [24]:
import pandas as pd
data = {
        'studentName':['Anil','Basanti','Charan','Dolly'],
        'Marks':[98,80,90,100]
       }
df = pd.DataFrame(data)
df

|   | studentName | Marks |
|---|---|---|
| 0 | Anil | 98 |
| 1 | Basanti | 80 |
| 2 | Charan | 90 |
| 3 | Dolly | 100 |
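
The length rule for dict-of-array input can be sketched as follows. Passing an index of matching length labels the rows; passing one of a different length is expected to raise a `ValueError`. The row labels are illustrative.

```python
import numpy as np
import pandas as pd

data = {
    'studentName': np.array(['Anil', 'Basanti', 'Charan', 'Dolly']),
    'Marks': np.array([98, 80, 90, 100]),
}

# Index length (4) matches the array length (4): rows get these labels.
df = pd.DataFrame(data, index=['Row1', 'Row2', 'Row3', 'Row4'])
print(df)

# Index length (2) does not match the array length (4).
mismatch_error = None
try:
    pd.DataFrame(data, index=['Row1', 'Row2'])
except ValueError as err:
    mismatch_error = err
print(type(mismatch_error).__name__)
```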


##### Creating a DataFrame using a list of dictionaries

In [25]:
import pandas as pd

lst = [
        {'Marks':98,'StudentName':'Anil'},
        {'StudentName':'Basanti','Marks':80},
        {'Marks':90,'StudentName':'Charan'},
        {'Marks':100,'StudentName':'Dolly'}
      ]

df = pd.DataFrame(lst)
print(df)

   Marks StudentName
0     98        Anil
1     80     Basanti
2     90      Charan
3    100       Dolly


##### Creating a DataFrame using a dict of Series

In [26]:
import pandas as pd

data = {
    'StudentName':pd.Series(['Anil','Basanti','Charan','Dolly']),
    'Marks':pd.Series([98,80,90,100])
       }
df=pd.DataFrame(data)
df


|   | StudentName | Marks |
|---|---|---|
| 0 | Anil | 98 |
| 1 | Basanti | 80 |
| 2 | Charan | 90 |
| 3 | Dolly | 100 |
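
Unlike plain lists, the Series in a dict do not have to share the same length: pandas aligns them on their index labels and fills the gaps with missing values. A minimal sketch, using illustrative row labels:

```python
import pandas as pd

data = {
    'StudentName': pd.Series(['Anil', 'Basanti', 'Charan'],
                             index=['Row1', 'Row2', 'Row3']),
    # No entry for 'Row3' here, so that cell ends up missing (NaN).
    'Marks': pd.Series([98, 80], index=['Row1', 'Row2']),
}

df = pd.DataFrame(data)
print(df)
```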


##### Creating a DataFrame from a CSV file

In [27]:
import pandas as pd
df = pd.read_csv(r"nba.csv")
print(df)

                        Name                    Team  Number Position   Age  \
0              Avery Bradley          Boston Celtics     0.0       PG  25.0   
1                Jae Crowder          Boston Celtics    99.0       SF  25.0   
2               John Holland          Boston Celtics    30.0       SG  27.0   
3                R.J. Hunter          Boston Celtics    28.0       SG  22.0   
4              Jonas Jerebko          Boston Celtics     8.0       PF  29.0   
5               Amir Johnson          Boston Celtics    90.0       PF  29.0   
6              Jordan Mickey          Boston Celtics    55.0       PF  21.0   
7               Kelly Olynyk          Boston Celtics    41.0        C  25.0   
8               Terry Rozier          Boston Celtics    12.0       PG  22.0   
9               Marcus Smart          Boston Celtics    36.0       PG  22.0   
10           Jared Sullinger          Boston Celtics     7.0        C  24.0   
11             Isaiah Thomas          Boston Celtics
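
If `nba.csv` is not at hand, `read_csv` can be tried on in-memory text instead, since it accepts any file-like object. The sketch below uses `io.StringIO` and a two-row excerpt with only three of the columns shown above.

```python
import io
import pandas as pd

# A tiny CSV excerpt held in a string instead of a file on disk.
csv_text = """Name,Team,Number
Avery Bradley,Boston Celtics,0
Jae Crowder,Boston Celtics,99
"""

# The first line of the text is used as the header row.
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```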

##### Creating a DataFrame from an Excel file

In [None]:
pip install xlrd

In [31]:
import pandas as pd

df = pd.read_excel(r"Data - Single Worksheet.xlsx")
print(df)

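ImportError: Install xlrd >= 1.0.0 for Excel support

This error means the Excel reader dependency was not available to the running kernel when the cell executed; install it and restart the kernel before retrying. Recent pandas versions read `.xlsx` files with the `openpyxl` engine, so that package may be the one to install instead.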

## Basic Functionality of DataFrame

#### head()

In [33]:
df = pd.read_csv(r"nba.csv")
df.head()

|   | Name | Team | Number | Position | Age | Height | Weight | College | Salary |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Avery Bradley | Boston Celtics | 0.0 | PG | 25.0 | 6-2 | 180.0 | Texas | 7730337.0 |
| 1 | Jae Crowder | Boston Celtics | 99.0 | SF | 25.0 | 6-6 | 235.0 | Marquette | 6796117.0 |
| 2 | John Holland | Boston Celtics | 30.0 | SG | 27.0 | 6-5 | 205.0 | Boston University | NaN |
| 3 | R.J. Hunter | Boston Celtics | 28.0 | SG | 22.0 | 6-5 | 185.0 | Georgia State | 1148640.0 |
| 4 | Jonas Jerebko | Boston Celtics | 8.0 | PF | 29.0 | 6-10 | 231.0 | NaN | 5000000.0 |


In [34]:
df.head(10)

|   | Name | Team | Number | Position | Age | Height | Weight | College | Salary |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Avery Bradley | Boston Celtics | 0.0 | PG | 25.0 | 6-2 | 180.0 | Texas | 7730337.0 |
| 1 | Jae Crowder | Boston Celtics | 99.0 | SF | 25.0 | 6-6 | 235.0 | Marquette | 6796117.0 |
| 2 | John Holland | Boston Celtics | 30.0 | SG | 27.0 | 6-5 | 205.0 | Boston University | NaN |
| 3 | R.J. Hunter | Boston Celtics | 28.0 | SG | 22.0 | 6-5 | 185.0 | Georgia State | 1148640.0 |
| 4 | Jonas Jerebko | Boston Celtics | 8.0 | PF | 29.0 | 6-10 | 231.0 | NaN | 5000000.0 |
| 5 | Amir Johnson | Boston Celtics | 90.0 | PF | 29.0 | 6-9 | 240.0 | NaN | 12000000.0 |
| 6 | Jordan Mickey | Boston Celtics | 55.0 | PF | 21.0 | 6-8 | 235.0 | LSU | 1170960.0 |
| 7 | Kelly Olynyk | Boston Celtics | 41.0 | C | 25.0 | 7-0 | 238.0 | Gonzaga | 2165160.0 |
| 8 | Terry Rozier | Boston Celtics | 12.0 | PG | 22.0 | 6-2 | 190.0 | Louisville | 1824360.0 |
| 9 | Marcus Smart | Boston Celtics | 36.0 | PG | 22.0 | 6-4 | 220.0 | Oklahoma State | 3431040.0 |


#### tail()

In [35]:
df.tail()

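|   | Name | Team | Number | Position | Age | Height | Weight | College | Salary |
|---|---|---|---|---|---|---|---|---|---|
| 453 | Shelvin Mack | Utah Jazz | 8.0 | PG | 26.0 | 6-3 | 203.0 | Butler | 2433333.0 |
| 454 | Raul Neto | Utah Jazz | 25.0 | PG | 24.0 | 6-1 | 179.0 | NaN | 900000.0 |
| 455 | Tibor Pleiss | Utah Jazz | 21.0 | C | 26.0 | 7-3 | 256.0 | NaN | 2900000.0 |
| 456 | Jeff Withey | Utah Jazz | 24.0 | C | 26.0 | 7-0 | 231.0 | Kansas | 947276.0 |
| 457 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |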
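
Both `head()` and `tail()` take an optional row count and default to 5 rows. A minimal sketch on a small stand-in frame (the column name `n` is made up for illustration):

```python
import pandas as pd

# Ten rows holding the integers 0..9.
df = pd.DataFrame({'n': range(10)})

first_three = df.head(3)   # rows at positions 0, 1, 2
last_two = df.tail(2)      # rows at positions 8, 9
default_head = df.head()   # no argument: the first 5 rows

print(first_three)
print(last_two)
print(len(default_head))
```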


In [40]:
import pandas as pd

lst = [['Anil', 98],['Basanti',80],['Charan',99],['Dolly',100]]

df = pd.DataFrame(lst, columns=['StudentName','Marks'], index=['Row1','Row2','Row3','Row4'])
df

|   | StudentName | Marks |
|---|---|---|
| Row1 | Anil | 98 |
| Row2 | Basanti | 80 |
| Row3 | Charan | 99 |
| Row4 | Dolly | 100 |


#### DataFrame Transpose

In [41]:
df.T

|   | Row1 | Row2 | Row3 | Row4 |
|---|---|---|---|---|
| StudentName | Anil | Basanti | Charan | Dolly |
| Marks | 98 | 80 | 99 | 100 |
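
One side effect of transposing a frame with mixed column types is worth noting: each transposed column now holds both a name and a number, so it falls back to the generic `object` dtype. A short sketch on a two-row version of the frame above:

```python
import pandas as pd

df = pd.DataFrame([['Anil', 98], ['Basanti', 80]],
                  columns=['StudentName', 'Marks'],
                  index=['Row1', 'Row2'])

t = df.T  # rows become columns and columns become rows

print(t)
print(t.dtypes)  # each column mixes text and integers
```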


#### index

In [42]:
df.index

Index(['Row1', 'Row2', 'Row3', 'Row4'], dtype='object')

#### columns

In [43]:
df.columns

Index(['StudentName', 'Marks'], dtype='object')

#### axes

In [44]:
df.axes

[Index(['Row1', 'Row2', 'Row3', 'Row4'], dtype='object'),
 Index(['StudentName', 'Marks'], dtype='object')]

#### values

In [45]:
df.values

array([['Anil', 98],
       ['Basanti', 80],
       ['Charan', 99],
       ['Dolly', 100]], dtype=object)

#### shape

In [46]:
df.shape

(4, 2)

#### dtypes

In [47]:
df.dtypes

StudentName    object
Marks           int64
dtype: object

#### info

In [48]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 4 entries, Row1 to Row4
Data columns (total 2 columns):
StudentName    4 non-null object
Marks          4 non-null int64
dtypes: int64(1), object(1)
memory usage: 256.0+ bytes


## Select One Column from a `DataFrame`

In [60]:
import pandas as pd
data=[['Anil',90,'A','Very Good'],['Basanti',89,'A-','Good'],['Charan',70,'B-','Average'],['Dolly',40,'D+','Below_Avereage'],
      ['Emily',68,'B-','Average'],['Fani',99,'A+','Awesome'],['goutham',29,'F','Bad'],['Hannah',55,'C+','Below_Average']]
df=pd.DataFrame(data,columns=['StudentName','Marks','Grade','Remarks'],index=['R1','R2','R3','R4','R5','R6','R7','R8'])
df

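|   | StudentName | Marks | Grade | Remarks |
|---|---|---|---|---|
| R1 | Anil | 90 | A | Very Good |
| R2 | Basanti | 89 | A- | Good |
| R3 | Charan | 70 | B- | Average |
| R4 | Dolly | 40 | D+ | Below_Avereage |
| R5 | Emily | 68 | B- | Average |
| R6 | Fani | 99 | A+ | Awesome |
| R7 | goutham | 29 | F | Bad |
| R8 | Hannah | 55 | C+ | Below_Average |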


In [50]:
df['StudentName']

R1       Anil
R2    Basanti
R3     Charan
R4      Dolly
R5      Emily
R6       Fani
R7    goutham
R8     Hannah
Name: StudentName, dtype: object

In [51]:
df['Grade']

R1     A
R2    A-
R3    B-
R4    D+
R5    B-
R6    A+
R7     F
R8    C+
Name: Grade, dtype: object

### Selecting multiple columns of a DataFrame

In [52]:
df[['StudentName','Grade']]

|   | StudentName | Grade |
|---|---|---|
| R1 | Anil | A |
| R2 | Basanti | A- |
| R3 | Charan | B- |
| R4 | Dolly | D+ |
| R5 | Emily | B- |
| R6 | Fani | A+ |
| R7 | goutham | F |
| R8 | Hannah | C+ |


In [53]:
lst_colmns = ['Marks','Remarks']
df[lst_colmns]

|   | Marks | Remarks |
|---|---|---|
| R1 | 90 | Very Good |
| R2 | 89 | Good |
| R3 | 70 | Average |
| R4 | 40 | Below_Avereage |
| R5 | 68 | Average |
| R6 | 99 | Awesome |
| R7 | 29 | Bad |
| R8 | 55 | Below_Average |


In [54]:
df[df.columns[0:3]]

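|   | StudentName | Marks | Grade |
|---|---|---|---|
| R1 | Anil | 90 | A |
| R2 | Basanti | 89 | A- |
| R3 | Charan | 70 | B- |
| R4 | Dolly | 40 | D+ |
| R5 | Emily | 68 | B- |
| R6 | Fani | 99 | A+ |
| R7 | goutham | 29 | F |
| R8 | Hannah | 55 | C+ |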
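
The bracket style determines the type of the result: a single label returns a Series, while a list of labels returns a DataFrame, even when the list holds only one name. A minimal sketch on a two-row version of the frame above:

```python
import pandas as pd

df = pd.DataFrame([['Anil', 90, 'A'], ['Basanti', 89, 'A-']],
                  columns=['StudentName', 'Marks', 'Grade'],
                  index=['R1', 'R2'])

one_col = df['Marks']             # single label -> Series
as_frame = df[['Marks']]          # list with one label -> DataFrame
first_two = df[df.columns[0:2]]   # slice of the column index -> DataFrame

print(type(one_col).__name__)
print(type(as_frame).__name__)
print(first_two.columns.tolist())
```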


### Selecting rows in a DataFrame

#### .loc[]

In [64]:
df.loc['R1']

StudentName         Anil
Marks                 90
Grade                  A
Remarks        Very Good
Name: R1, dtype: object

In [65]:
df.loc[['R1','R5','R7']]

|   | StudentName | Marks | Grade | Remarks |
|---|---|---|---|---|
| R1 | Anil | 90 | A | Very Good |
| R5 | Emily | 68 | B- | Average |
| R7 | goutham | 29 | F | Bad |


In [66]:
df.loc['R1':'R5']

|   | StudentName | Marks | Grade | Remarks |
|---|---|---|---|---|
| R1 | Anil | 90 | A | Very Good |
| R2 | Basanti | 89 | A- | Good |
| R3 | Charan | 70 | B- | Average |
| R4 | Dolly | 40 | D+ | Below_Avereage |
| R5 | Emily | 68 | B- | Average |


In [67]:
df.loc['R1','StudentName']

'Anil'

In [72]:
df.loc['R1':'R3', 'Marks':'Remarks']

|   | Marks | Grade | Remarks |
|---|---|---|---|
| R1 | 90 | A | Very Good |
| R2 | 89 | A- | Good |
| R3 | 70 | B- | Average |


#### .iloc[]

In [73]:
df.iloc[0]

StudentName         Anil
Marks                 90
Grade                  A
Remarks        Very Good
Name: R1, dtype: object

In [74]:
df.iloc[[3,7]]

|   | StudentName | Marks | Grade | Remarks |
|---|---|---|---|---|
| R4 | Dolly | 40 | D+ | Below_Avereage |
| R8 | Hannah | 55 | C+ | Below_Average |


In [78]:
df.iloc[[3,7],[0,2]]

|   | StudentName | Grade |
|---|---|---|
| R4 | Dolly | D+ |
| R8 | Hannah | C+ |


In [81]:
df.iloc[0,0]

'Anil'
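
A practical difference between the two indexers is how slices treat the end point: `.loc` slices by label and includes the end label, while `.iloc` slices by position and excludes the end position. A minimal sketch on a four-row frame with only the `Marks` column:

```python
import pandas as pd

df = pd.DataFrame({'Marks': [90, 89, 70, 40]},
                  index=['R1', 'R2', 'R3', 'R4'])

by_label = df.loc['R1':'R3']   # R1, R2 and R3: end label included
by_position = df.iloc[0:2]     # positions 0 and 1 only: end excluded

print(by_label)
print(by_position)
```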