# Pandas

pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with
“relational” or “labeled” data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real-world data analysis in Python.

##### Here are just a few of the things that pandas does well:

• Easy handling of missing data (represented as NaN) in floating point as well as non-floating point data
• Size mutability: columns can be inserted and deleted from DataFrame and higher dimensional objects
• Intelligent label-based slicing, fancy indexing, and subsetting of large data sets
• Intuitive merging and joining of data sets
• Flexible reshaping and pivoting of data sets


## Pandas Data Structures

- Series 
- DataFrame
- Panel (deprecated in pandas 0.20 and removed in pandas 0.25; not covered here)

### Series
Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating point numbers,
Python objects, etc.). The axis labels are collectively referred to as the index. The basic method to create a Series is
to call:
>>> s = pd.Series(data, index=index)

A Series can be created from various inputs, such as:

- Array
- Dict
- Constant (scalar)
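The cells that follow cover list and dict input; a constant is not demonstrated, so here is a minimal sketch of all three input kinds side by side. The variable names and values are illustrative only.

```python
import numpy as np
import pandas as pd

# From a NumPy array: a default integer index 0..n-1 is assigned.
from_array = pd.Series(np.array([10, 20, 30]))

# From a dict: the keys become the index labels.
from_dict = pd.Series({'a': 10, 'b': 20, 'c': 30})

# From a constant (scalar): the value is repeated for every index label,
# so an index is needed to tell pandas how long the Series should be.
from_constant = pd.Series(5, index=['a', 'b', 'c'])

print(from_array)
print(from_dict)
print(from_constant)
```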


In [2]:
import pandas as pd
data = ['Anil','Basanti','Charan','Dolly']
s=pd.Series(data)
print(s)

0       Anil
1    Basanti
2     Charan
3      Dolly
dtype: object


In [3]:
s=pd.Series(data, index=['Row1','Row2','Row3','Row4'])
s

Row1       Anil
Row2    Basanti
Row3     Charan
Row4      Dolly
dtype: object

In [4]:
data = {
    'Row1':'Anil',
    'Row2':'Basanti',
    'Row3':'Charan',
    'Row4':'Dolly'
}
s=pd.Series(data)
s

Row1       Anil
Row2    Basanti
Row3     Charan
Row4      Dolly
dtype: object

In [4]:
s=pd.Series(data, index=['Row3','Row2','Row4','Row1'])
s

Row3     Charan
Row2    Basanti
Row4      Dolly
Row1       Anil
dtype: object

In [5]:
s=pd.Series(data, index=['Row3','Row2','Row4','Row1','Row5'])
s

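Row3     Charan
Row2    Basanti
Row4      Dolly
Row1       Anil
Row5        NaN
dtype: object

Because data is a dict here, values are matched to the index by label, and a label with no matching key ('Row5') is filled with NaN. If data were a list instead, values would be assigned by position and a five-label index for four values would raise an error such as: ValueError: Length of passed values is 4, index implies 5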
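A minimal sketch contrasting the two behaviours, assuming the same four names as above. The names `from_dict` and `list_error` are introduced only for this illustration, and the exact wording of the error message differs between pandas versions.

```python
import pandas as pd

data = {'Row1': 'Anil', 'Row2': 'Basanti', 'Row3': 'Charan', 'Row4': 'Dolly'}
labels = ['Row3', 'Row2', 'Row4', 'Row1', 'Row5']

# Dict input: values are looked up by label, so the absent 'Row5' becomes NaN.
from_dict = pd.Series(data, index=labels)
print(from_dict)

# List input: values are assigned by position, so the lengths must match.
list_error = None
try:
    pd.Series(list(data.values()), index=labels)
except ValueError as exc:
    list_error = exc
    print('ValueError:', exc)
```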

#### Accessing series data using position

In [6]:
s[0]

'Charan'

In [8]:
s[4]

nan

In [10]:
s[[1,4]]

Row2    Basanti
Row5        NaN
dtype: object

In [7]:
s[0:3]

Row3     Charan
Row2    Basanti
Row4      Dolly
dtype: object

In [8]:
s[5]

IndexError: index out of bounds
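Integer keys such as `s[0]` on a label-indexed Series depend on a positional fallback that recent pandas releases warn about. A sketch of the same lookups using the explicit positional indexer `.iloc`, on a small Series built just for this example:

```python
import pandas as pd

s = pd.Series(['Anil', 'Basanti', 'Charan', 'Dolly'],
              index=['Row1', 'Row2', 'Row3', 'Row4'])

first = s.iloc[0]          # single position -> scalar value
first_three = s.iloc[0:3]  # slice: the end position is excluded
picked = s.iloc[[1, 3]]    # list of positions -> Series

out_of_bounds = False
try:
    s.iloc[5]              # there is no sixth element
except IndexError as exc:
    out_of_bounds = True
    print('IndexError:', exc)
```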

#### Accessing series data using label

In [19]:
s['Row2']

'Basanti'

In [20]:
s[['Row1','Row3']]

Row1      Anil
Row3    Charan
dtype: object
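The same label lookups can be written with the explicit label indexer `.loc`. The sketch below also shows two related calls: a label slice, which includes both endpoints, and `get`, which returns a default instead of raising when a label is missing. The label 'Row9' is a made-up missing label.

```python
import pandas as pd

s = pd.Series(['Anil', 'Basanti', 'Charan', 'Dolly'],
              index=['Row1', 'Row2', 'Row3', 'Row4'])

one = s.loc['Row2']                  # single label -> scalar value
several = s.loc[['Row1', 'Row3']]    # list of labels -> Series
label_slice = s.loc['Row2':'Row3']   # label slice: both endpoints included

has_row9 = 'Row9' in s               # membership test checks the index labels
fallback = s.get('Row9', 'missing')  # default returned instead of a KeyError
```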


### What is a DataFrame?

  A DataFrame is a two-dimensional data structure: data is aligned in a tabular fashion in rows and columns. A pandas DataFrame consists of three principal components: the data, the rows, and the columns.

### Features of DataFrame
  - Columns can be of different types
  - Size-mutable
  - Labeled axes (rows and columns)
  - Arithmetic operations can be performed on rows and columns
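A short sketch of the arithmetic and size-mutability features listed above. The subject columns 'Maths' and 'Physics' and their values are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Hypothetical subject columns, used only for illustration.
df = pd.DataFrame({'Maths': [98, 80, 99], 'Physics': [90, 85, 70]},
                  index=['Anil', 'Basanti', 'Charan'])

scaled = df * 0.5                          # element-wise on the whole frame
column_totals = df.sum(axis=0)             # one value per column
df['Total'] = df['Maths'] + df['Physics']  # per-row sum stored as a new column

print(df)
```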
  
  
#### Creating a Pandas DataFrame

  
  A pandas DataFrame can be created from various types of data, such as:
  
  - Lists
  - List of Lists
  - Dictionary
  - List of Dictionaries
  - Dict of Series
  - CSV File
  - Excel File
  
##### Creating a DataFrame using a list

  A DataFrame can be created using a single list or a list of lists.

In [2]:
import pandas as pd

lst = ['Anil','Basanti','Charan','Dolly']

df = pd.DataFrame(lst)
print(df)

         0
0     Anil
1  Basanti
2   Charan
3    Dolly


In [3]:
import pandas as pd

lst = ['Anil','Basanti','Charan','Dolly']

df = pd.DataFrame(lst, columns=['StudentName'])
print(df)

  StudentName
0        Anil
1     Basanti
2      Charan
3       Dolly


In [4]:
import pandas as pd

lst = ['Anil','Basanti','Charan','Dolly']

df = pd.DataFrame(lst, columns=['StudentName'], index=['Row1','Row2','Row3','Row4'])
print(df)

     StudentName
Row1        Anil
Row2     Basanti
Row3      Charan
Row4       Dolly


### Creating DataFrame using list of lists

In [9]:
import pandas as pd

lst = [['Anil', 98],['Basanti',80],['Charan',99],['Dolly',100]]

df = pd.DataFrame(lst, columns=['StudentName','Marks'])
print(df)

  StudentName  Marks
0        Anil     98
1     Basanti     80
2      Charan     99
3       Dolly    100


### Create a DataFrame from Dict of ndarrays / Lists
  All the ndarrays must be of the same length. If an index is passed, its length must equal the length of the arrays.
If no index is passed, the index defaults to range(n), where n is the array length.

In [26]:
import pandas as pd
data = {
        'studentName':['Anil','Basanti','Charan','Dolly'],
        'Marks':[98,80,90,100]
       }
df = pd.DataFrame(data)
df

  studentName  Marks
0        Anil     98
1     Basanti     80
2      Charan     90
3       Dolly    100
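The length rules stated above can be checked with a small sketch. The row labels and the deliberately mismatched dict are illustrative, and the error text varies by pandas version.

```python
import pandas as pd

data = {
    'studentName': ['Anil', 'Basanti', 'Charan', 'Dolly'],
    'Marks': [98, 80, 90, 100],
}

# An explicit index must have the same length as the lists (4 here).
df = pd.DataFrame(data, index=['Row1', 'Row2', 'Row3', 'Row4'])
print(df)

# Lists of different lengths are rejected.
length_error = None
try:
    pd.DataFrame({'studentName': ['Anil', 'Basanti'], 'Marks': [98]})
except ValueError as exc:
    length_error = exc
    print('ValueError:', exc)
```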


### Creating DataFrame using a List of Dictionaries

In [29]:
import pandas as pd

lst = [
        {'Marks':98,'StudentName':'Anil'},
        {'StudentName':'Basanti','Marks':80},
        {'Marks':90,'StudentName':'Charan'},
        {'Marks':100,'StudentName':'Dolly'}
      ]

df = pd.DataFrame(lst)
print(df)

   Marks StudentName
0     98        Anil
1     80     Basanti
2     90      Charan
3    100       Dolly


### Creating DataFrame using Dict of Series

In [13]:
import pandas as pd

data = {
    'StudentName':pd.Series(['Anil','Basanti','Charan','Dolly']),
    'Marks':pd.Series([98,80,90,100])
       }
df=pd.DataFrame(data)
df


  StudentName  Marks
0        Anil     98
1     Basanti     80
2      Charan     90
3       Dolly    100
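Unlike plain lists, the Series in a dict do not need to share the same labels: pandas aligns them on the union of their indexes and fills the gaps with NaN. A minimal sketch, with row labels chosen only for this example:

```python
import pandas as pd

names = pd.Series(['Anil', 'Basanti', 'Charan'], index=['Row1', 'Row2', 'Row3'])
marks = pd.Series([98, 80, 100], index=['Row1', 'Row2', 'Row4'])

# Rows are aligned by label; 'Row3' has no mark and 'Row4' has no name.
df = pd.DataFrame({'StudentName': names, 'Marks': marks})
print(df)
```

Because the Marks column now holds a missing value, it is stored as floating point rather than as integers.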


### Creating a DataFrame from a CSV file

In [19]:
import pandas as pd
df = pd.read_csv(r"C:\Users\maddir\Desktop\Python_Training\Pandas\nba.csv")
print(df)

                        Name                    Team  Number Position   Age  \
0              Avery Bradley          Boston Celtics     0.0       PG  25.0   
1                Jae Crowder          Boston Celtics    99.0       SF  25.0   
2               John Holland          Boston Celtics    30.0       SG  27.0   
3                R.J. Hunter          Boston Celtics    28.0       SG  22.0   
4              Jonas Jerebko          Boston Celtics     8.0       PF  29.0   
5               Amir Johnson          Boston Celtics    90.0       PF  29.0   
6              Jordan Mickey          Boston Celtics    55.0       PF  21.0   
7               Kelly Olynyk          Boston Celtics    41.0        C  25.0   
8               Terry Rozier          Boston Celtics    12.0       PG  22.0   
9               Marcus Smart          Boston Celtics    36.0       PG  22.0   
10           Jared Sullinger          Boston Celtics     7.0        C  24.0   
11             Isaiah Thomas          Boston Celtics
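The path above is specific to one machine. To try `read_csv` without the file, the sketch below reads from an in-memory string, since `read_csv` accepts any file-like object. The three rows and the column subset are copied from the output shown in this document, and `csv_text` is a name introduced here.

```python
import io

import pandas as pd

# A few rows in CSV form; John Holland's salary is blank in the data.
csv_text = """Name,Team,Salary
Avery Bradley,Boston Celtics,7730337
Jae Crowder,Boston Celtics,6796117
John Holland,Boston Celtics,
"""

df = pd.read_csv(io.StringIO(csv_text))
print(df)
print(df.dtypes)  # Salary is read as float because of the blank field
```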

### Creating a DataFrame from an Excel file

In [21]:
import pandas as pd

df = pd.read_excel(r'C:\Users\maddir\Desktop\Python_Training\Pandas\Data - Single Worksheet.xlsx')
print(df)

  First Name Last Name           City Gender
0    Brandon     James          Miami      M
1       Sean   Hawkins         Denver      M
2       Judy       Day    Los Angeles      F
3     Ashley      Ruiz  San Francisco      F
4  Stephanie     Gomez       Portland      F


## Basic Functionality of DataFrame

#### head()

In [23]:
nba = pd.read_csv(r"C:\Users\maddir\Desktop\Python_Training\Pandas\nba.csv")
nba.head()

           Name           Team  Number Position   Age Height  Weight            College     Salary
0 Avery Bradley Boston Celtics     0.0       PG  25.0    6-2   180.0              Texas  7730337.0
1   Jae Crowder Boston Celtics    99.0       SF  25.0    6-6   235.0          Marquette  6796117.0
2  John Holland Boston Celtics    30.0       SG  27.0    6-5   205.0  Boston University        NaN
3   R.J. Hunter Boston Celtics    28.0       SG  22.0    6-5   185.0      Georgia State  1148640.0
4 Jonas Jerebko Boston Celtics     8.0       PF  29.0   6-10   231.0                NaN  5000000.0


In [24]:
nba.head(1)

           Name           Team  Number Position   Age Height  Weight College     Salary
0 Avery Bradley Boston Celtics     0.0       PG  25.0    6-2   180.0   Texas  7730337.0


#### tail()

In [25]:
nba.tail()

             Name       Team  Number Position   Age Height  Weight College     Salary
453  Shelvin Mack  Utah Jazz     8.0       PG  26.0    6-3   203.0  Butler  2433333.0
454     Raul Neto  Utah Jazz    25.0       PG  24.0    6-1   179.0     NaN   900000.0
455  Tibor Pleiss  Utah Jazz    21.0        C  26.0    7-3   256.0     NaN  2900000.0
456   Jeff Withey  Utah Jazz    24.0        C  26.0    7-0   231.0  Kansas   947276.0
457           NaN        NaN     NaN      NaN   NaN    NaN     NaN     NaN        NaN
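Both `head()` and `tail()` return five rows by default and accept a row count. The last row of the NBA data above is entirely empty; one way to discard such rows is `dropna(how='all')`. The sketch below uses a small stand-in frame rather than the real file.

```python
import numpy as np
import pandas as pd

# Stand-in for the end of the NBA data: the last row is completely empty.
df = pd.DataFrame({
    'Name': ['Shelvin Mack', 'Raul Neto', 'Tibor Pleiss', np.nan],
    'Number': [8.0, 25.0, 21.0, np.nan],
})

first_two = df.head(2)          # first 2 rows
last_two = df.tail(2)           # last 2 rows, including the empty one
cleaned = df.dropna(how='all')  # drop rows in which every value is missing

print(cleaned.tail(2))
```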


In [28]:
import pandas as pd

lst = [['Anil', 98],['Basanti',80],['Charan',99],['Dolly',100]]

df = pd.DataFrame(lst, columns=['StudentName','Marks'])
df

  StudentName  Marks
0        Anil     98
1     Basanti     80
2      Charan     99
3       Dolly    100


#### DataFrame Transpose

In [37]:
df.T

               0       1      2     3
StudentName Anil Basanti Charan Dolly
Marks         98      80     99   100


#### index

In [29]:
df.index

RangeIndex(start=0, stop=4, step=1)

#### columns

In [36]:
df.columns

Index(['StudentName', 'Marks'], dtype='object')

#### axes

In [37]:
df.axes

[RangeIndex(start=0, stop=4, step=1),
 Index(['StudentName', 'Marks'], dtype='object')]

#### values

In [30]:
df.values

array([['Anil', 98],
       ['Basanti', 80],
       ['Charan', 99],
       ['Dolly', 100]], dtype=object)

#### shape

In [31]:
df.shape

(4, 2)

#### dtypes

In [32]:
df.dtypes

StudentName    object
Marks           int64
dtype: object

#### info

In [36]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 2 columns):
StudentName    4 non-null object
Marks          4 non-null int64
dtypes: int64(1), object(1)
memory usage: 88.0+ bytes


## Select One Column from a `DataFrame`

In [40]:
import pandas as pd
data=[['Anil',90,'A','Very Good'],['Basanti',89,'A-','Good'],['Charan',70,'B-','Average'],['Dolly',40,'D+','Below_Avereage'],
      ['Emily',68,'B-','Average'],['Fani',99,'A+','Awesome'],['goutham',29,'F','Bad'],['Hannah',55,'C+','Below_Average']]
df=pd.DataFrame(data,columns=['StudentName','Marks','Grade','Remarks'])
df

  StudentName  Marks Grade        Remarks
0        Anil     90     A      Very Good
1     Basanti     89    A-           Good
2      Charan     70    B-        Average
3       Dolly     40    D+ Below_Avereage
4       Emily     68    B-        Average
5        Fani     99    A+        Awesome
6     goutham     29     F            Bad
7      Hannah     55    C+  Below_Average


In [42]:
df['StudentName']

0       Anil
1    Basanti
2     Charan
3      Dolly
4      Emily
5       Fani
6    goutham
7     Hannah
Name: StudentName, dtype: object

In [43]:
df['Grade']

0     A
1    A-
2    B-
3    D+
4    B-
5    A+
6     F
7    C+
Name: Grade, dtype: object

### Selecting multiple columns of a DataFrame

In [52]:
df[['StudentName','Grade']]

  StudentName Grade
0        Anil     A
1     Basanti    A-
2      Charan    B-
3       Dolly    D+
4       Emily    B-
5        Fani    A+
6     goutham     F
7      Hannah    C+
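The bracket style determines the type of the result: a single column name returns a Series, while a list of names returns a DataFrame, even when the list holds only one name. A minimal sketch on a two-row frame built for this example:

```python
import pandas as pd

df = pd.DataFrame([['Anil', 90, 'A'], ['Basanti', 89, 'A-']],
                  columns=['StudentName', 'Marks', 'Grade'])

as_series = df['Grade']    # single label -> Series
as_frame = df[['Grade']]   # list with one label -> one-column DataFrame

print(type(as_series).__name__, as_series.shape)
print(type(as_frame).__name__, as_frame.shape)
```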


In [63]:
lst_colmns = ['Marks','Remarks']
df[lst_colmns]

   Marks        Remarks
0     90      Very Good
1     89           Good
2     70        Average
3     40 Below_Avereage
4     68        Average
5     99        Awesome
6     29            Bad
7     55  Below_Average


In [41]:
df[df.columns[0:3]]

  StudentName  Marks Grade
0        Anil     90     A
1     Basanti     89    A-
2      Charan     70    B-
3       Dolly     40    D+
4       Emily     68    B-
5        Fani     99    A+
6     goutham     29     F
7      Hannah     55    C+


In [55]:
df[df.columns[0:2]]

  StudentName  Marks
0        Anil     90
1     Basanti     89
2      Charan     70
3       Dolly     40
4       Emily     68
5        Fani     99
6     goutham     29
7      Hannah     55


### Selecting rows in a DataFrame

In [56]:
df[df.columns[0:2]].head(3)

  StudentName  Marks
0        Anil     90
1     Basanti     89
2      Charan     70


In [57]:
df[df.columns[0:2]].tail(3)

  StudentName  Marks
5        Fani     99
6     goutham     29
7      Hannah     55


In [58]:
df[df.columns[0:2]][3:5]

  StudentName  Marks
3       Dolly     40
4       Emily     68


In [62]:
df[2:7]

  StudentName  Marks Grade        Remarks
2      Charan     70    B-        Average
3       Dolly     40    D+ Below_Avereage
4       Emily     68    B-        Average
5        Fani     99    A+        Awesome
6     goutham     29     F            Bad


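#### .loc[]

.loc selects rows and columns by label and is used with square brackets, not parentheses. A label slice such as 0:5 includes both endpoints. The outputs below come from a version of df that also has Branch, University, City and Sex columns; the cell that added them is not shown.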

In [32]:
df.loc[0:5]

  StudentName  Marks Grade        Remarks Branch University      City Sex
0        Anil     90     A      Very Good    ECE       IIPL Hyderabad   M
1     Basanti     89    A-           Good    CSE       IIPL Hyderabad   F
2      Charan     70    B-        Average   MECH       IIPL Hyderabad   M
3       Dolly     40    D+ Below_Avereage  CIVIL       IIPL Hyderabad   F
4       Emily     68    B-        Average    EEE       IIPL Hyderabad   F
5        Fani     99    A+        Awesome     IT       IIPL Hyderabad   M


In [31]:
df.loc[[3,7]]

  StudentName  Marks Grade        Remarks Branch University      City Sex
3       Dolly     40    D+ Below_Avereage  CIVIL       IIPL Hyderabad   F
7      Hannah     55    C+  Below_Average    ECE       IIPL Hyderabad   F


In [33]:
df.loc[[3,7],['StudentName','City']]

  StudentName      City
3       Dolly Hyderabad
7      Hannah Hyderabad


In [34]:
df.loc[[3],['Branch']]

  Branch
3  CIVIL
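A sketch of how label-based `.loc` differs from position-based `.iloc`, using a small frame built for this example. With the default RangeIndex the labels and positions are the same numbers, which makes the difference in slice endpoints easy to see.

```python
import pandas as pd

df = pd.DataFrame({
    'StudentName': ['Anil', 'Basanti', 'Charan', 'Dolly'],
    'Marks': [90, 89, 70, 40],
})

by_label = df.loc[0:2]      # labels 0, 1 and 2: the end label is included
by_position = df.iloc[0:2]  # positions 0 and 1: the end position is excluded

cell = df.loc[3, 'StudentName']         # scalar row and column -> single value
small = df.loc[[3], ['StudentName']]    # lists -> 1x1 DataFrame

print(len(by_label), len(by_position))
```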
