# Pandas DataFrame
pandas provides data structures and data manipulation tools designed to make data cleaning
and analysis fast and easy in Python.

## Pandas Data Structures

- Series
- DataFrame
- Panel (deprecated in pandas 0.20 and removed in 0.25)

### Series
Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating point numbers,
Python objects, etc.). The axis labels are collectively referred to as the index. The basic method to create a Series is
to call:
>>> s = pd.Series(data, index=index)


In [1]:
import pandas as pd
data = ['Anil','Basanti','Charan','Dolly']
s=pd.Series(data)
print(s)

0       Anil
1    Basanti
2     Charan
3      Dolly
dtype: object
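
The general form `pd.Series(data, index=index)` also accepts explicit labels in place of the default 0..n-1 positions. A minimal sketch, where the marks values are assumed purely for illustration:

```python
import pandas as pd

# Illustrative data: marks labeled by student name instead of 0..n-1
marks = pd.Series([98, 80, 99, 100],
                  index=['Anil', 'Basanti', 'Charan', 'Dolly'])

# Label-based lookup of a single value
anil_mark = marks['Anil']

# Boolean filtering keeps the labels of the matching entries
top_marks = marks[marks > 90]
print(top_marks)
```

With a labeled index, lookups read like dictionary access while the Series still supports vectorized comparisons.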



### What is DataFrame?

  A DataFrame is a two-dimensional data structure in which data is arranged in a tabular fashion in rows and columns. A pandas DataFrame consists of three principal components: the data, the rows, and the columns.

### Features of DataFrame
  - Columns can hold different data types
  - Size-mutable: rows and columns can be added or removed
  - Labeled axes (rows and columns)
  - Arithmetic operations can be performed on rows and columns
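
The features listed above can be seen on a very small frame. This is only a sketch: the `Bonus` and `Total` columns are hypothetical names introduced for the example.

```python
import pandas as pd

df = pd.DataFrame({'StudentName': ['Anil', 'Basanti'], 'Marks': [98, 80]})

# Columns of different types: text and integers side by side
column_types = df.dtypes

# Size-mutable: a new column can be added after creation
df['Bonus'] = 2

# Arithmetic between columns is applied row by row
df['Total'] = df['Marks'] + df['Bonus']

# Labeled axes: row labels first, then column labels
row_labels, column_labels = df.axes
print(df)
```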
  
  
#### Creating a Pandas DataFrame

  
  A pandas DataFrame can be created from various kinds of data, such as:
  
  - Lists
  - List of Lists
  - Dictionary
  - List of Dictionaries
  - Dict of Series
  - CSV File
  - Excel File
  - SQL Database
  
##### Creating a DataFrame using a List

  A DataFrame can be created from a single list or from a list of lists.

In [2]:
import pandas as pd
 
# list of strings
lst = ['Anil','Basanti','Charan','Dolly']
 
# Calling DataFrame constructor on list
df = pd.DataFrame(lst)
print(df)

         0
0     Anil
1  Basanti
2   Charan
3    Dolly


In [3]:
import pandas as pd

lst = ['Anil','Basanti','Charan','Dolly']

df = pd.DataFrame(lst, columns=['StudentName'])
print(df)

  StudentName
0        Anil
1     Basanti
2      Charan
3       Dolly


In [4]:
import pandas as pd

lst = ['Anil','Basanti','Charan','Dolly']

df = pd.DataFrame(lst, columns=['StudentName'], index=['Row1','Row2','Row3','Row4'])
print(df)

     StudentName
Row1        Anil
Row2     Basanti
Row3      Charan
Row4       Dolly


### Creating DataFrame using list of lists

In [6]:
import pandas as pd

lst = [['Anil', 98],['Basanti',80],['Charan',99],['Dolly',100]]

df = pd.DataFrame(lst, columns=['StudentName','Marks'])
print(df)

  StudentName  Marks
0        Anil     98
1     Basanti     80
2      Charan     99
3       Dolly    100


### Create a DataFrame from Dict of ndarrays / Lists
  All the ndarrays (or lists) must be of the same length. If an index is passed, its length must equal the length of the arrays.
If no index is passed, the index defaults to range(n), where n is the array length.

In [14]:
import pandas as pd
data = {
        'studentName':['Anil','Basanti','Charan','Dolly'],
        'Marks':[98,80,90,100]
       }
df = pd.DataFrame(data)
df

Unnamed: 0,studentName,Marks
0,Anil,98
1,Basanti,80
2,Charan,90
3,Dolly,100
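
The length rule for the index can be checked directly. The sketch below passes a matching index first and then a deliberately short one; the `Row1`..`Row4` labels are reused from the earlier list example.

```python
import pandas as pd

data = {
    'studentName': ['Anil', 'Basanti', 'Charan', 'Dolly'],
    'Marks': [98, 80, 90, 100],
}

# Index length matches the array length, so construction succeeds
df = pd.DataFrame(data, index=['Row1', 'Row2', 'Row3', 'Row4'])

# Index length (2) differs from the array length (4), so pandas refuses
try:
    pd.DataFrame(data, index=['Row1', 'Row2'])
    length_mismatch_rejected = False
except ValueError as exc:
    length_mismatch_rejected = True
    print(exc)
```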


### Creating DataFrame using List of Dictionaries

In [12]:
import pandas as pd

lst = [
        {'StudentName':'Anil','Marks':98},
        {'StudentName':'Basanti','Marks':80},
        {'StudentName':'Charan','Marks':90},
        {'StudentName':'Dolly','Marks':100}
      ]

df = pd.DataFrame(lst)
print(df)

   Marks StudentName
0     98        Anil
1     80     Basanti
2     90      Charan
3    100       Dolly
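
The dictionaries in the list do not need identical keys. As a small sketch with assumed data, a key missing from one dictionary simply becomes a missing value (NaN) in that row:

```python
import pandas as pd

rows = [
    {'StudentName': 'Anil', 'Marks': 98},
    {'StudentName': 'Basanti'},            # no 'Marks' key in this record
]

df = pd.DataFrame(rows)

# The column set is the union of all keys; absent entries are NaN
marks_missing = df['Marks'].isna().tolist()
print(df)
```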


### Creating DataFrame using Dict of Series

In [13]:
import pandas as pd

data = {
    'StudentName':pd.Series(['Anil','Basanti','Charan','Dolly']),
    'Marks':pd.Series([98,80,90,100])
       }
df=pd.DataFrame(data)
df


Unnamed: 0,StudentName,Marks
0,Anil,98
1,Basanti,80
2,Charan,90
3,Dolly,100
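
When the Series in the dictionary carry different indexes, pandas aligns them on their labels instead of their positions. A minimal sketch, with the `a`/`b`/`c` labels assumed for illustration:

```python
import pandas as pd

data = {
    'StudentName': pd.Series(['Anil', 'Basanti', 'Charan'], index=['a', 'b', 'c']),
    'Marks': pd.Series([98, 80], index=['a', 'b']),   # no entry for label 'c'
}

df = pd.DataFrame(data)

# The resulting index is the union of both indexes; 'c' has no mark, so it is NaN
print(df)
```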


### Creating a DataFrame from CSV file

In [1]:
import pandas as pd
df = pd.read_csv("nba.csv")
print(df)

                        Name                    Team  Number Position   Age  \
0              Avery Bradley          Boston Celtics     0.0       PG  25.0   
1                Jae Crowder          Boston Celtics    99.0       SF  25.0   
2               John Holland          Boston Celtics    30.0       SG  27.0   
3                R.J. Hunter          Boston Celtics    28.0       SG  22.0   
4              Jonas Jerebko          Boston Celtics     8.0       PF  29.0   
5               Amir Johnson          Boston Celtics    90.0       PF  29.0   
6              Jordan Mickey          Boston Celtics    55.0       PF  21.0   
7               Kelly Olynyk          Boston Celtics    41.0        C  25.0   
8               Terry Rozier          Boston Celtics    12.0       PG  22.0   
9               Marcus Smart          Boston Celtics    36.0       PG  22.0   
10           Jared Sullinger          Boston Celtics     7.0        C  24.0   
11             Isaiah Thomas          Boston Celtics
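
`nba.csv` is not bundled with this text, so the sketch below feeds `pd.read_csv` an in-memory string containing three rows copied from the outputs shown in this document. `index_col` and `usecols` are standard `read_csv` parameters; which columns to pick is an assumption made for the example.

```python
import io
import pandas as pd

# Three rows copied from the nba.csv output shown in this document
csv_text = """Name,Team,Number,Position,Age,Height,Weight,College,Salary
Avery Bradley,Boston Celtics,0.0,PG,25.0,6-2,180.0,Texas,7730337.0
Jae Crowder,Boston Celtics,99.0,SF,25.0,6-6,235.0,Marquette,6796117.0
John Holland,Boston Celtics,30.0,SG,27.0,6-5,205.0,Boston University,
"""

# read_csv accepts a file path or any file-like object
sample = pd.read_csv(io.StringIO(csv_text))

# Load only some columns and use one of them as the row index
by_name = pd.read_csv(io.StringIO(csv_text),
                      usecols=['Name', 'Team', 'Salary'],
                      index_col='Name')
print(by_name)
```

The empty `Salary` field in the last row is read as NaN, which is why numeric columns with gaps come back as floats.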

In [4]:
pip install xlrd

Note: you may need to restart the kernel to use updated packages.


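### Creating a DataFrame from an Excel file

`pd.read_excel` relies on a separate engine package. Recent pandas versions read `.xlsx` files with `openpyxl`, while `xlrd` 2.0 and later supports only the legacy `.xls` format.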

In [5]:
import pandas as pd

df = pd.read_excel("Data - Single Worksheet.xlsx")
print(df)

   FirstName LastName           City Gender
0    Brandon    James          Miami      M
1       Sean  Hawkins         Denver      M
2       Judy      Day    Los Angeles      F
3     Ashley     Ruiz  San Francisco      F
4  Stephanie    Gomez       Portland      F


## Attributes of DataFrame

#### head()

In [23]:
nba = pd.read_csv("nba.csv")
nba.head()

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary
0,Avery Bradley,Boston Celtics,0.0,PG,25.0,6-2,180.0,Texas,7730337.0
1,Jae Crowder,Boston Celtics,99.0,SF,25.0,6-6,235.0,Marquette,6796117.0
2,John Holland,Boston Celtics,30.0,SG,27.0,6-5,205.0,Boston University,
3,R.J. Hunter,Boston Celtics,28.0,SG,22.0,6-5,185.0,Georgia State,1148640.0
4,Jonas Jerebko,Boston Celtics,8.0,PF,29.0,6-10,231.0,,5000000.0


In [24]:
nba.head(1)

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary
0,Avery Bradley,Boston Celtics,0.0,PG,25.0,6-2,180.0,Texas,7730337.0


#### tail()

In [25]:
nba.tail()

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary
453,Shelvin Mack,Utah Jazz,8.0,PG,26.0,6-3,203.0,Butler,2433333.0
454,Raul Neto,Utah Jazz,25.0,PG,24.0,6-1,179.0,,900000.0
455,Tibor Pleiss,Utah Jazz,21.0,C,26.0,7-3,256.0,,2900000.0
456,Jeff Withey,Utah Jazz,24.0,C,26.0,7-0,231.0,Kansas,947276.0
457,,,,,,,,,


In [28]:
import pandas as pd

lst = [['Anil', 98],['Basanti',80],['Charan',99],['Dolly',100]]

df = pd.DataFrame(lst, columns=['StudentName','Marks'])
df

Unnamed: 0,StudentName,Marks
0,Anil,98
1,Basanti,80
2,Charan,99
3,Dolly,100


#### DataFrame Transpose

In [37]:
df.T

Unnamed: 0,0,1,2,3
StudentName,Anil,Basanti,Charan,Dolly
Marks,98,80,99,100


#### index

In [29]:
df.index

RangeIndex(start=0, stop=4, step=1)

#### values

In [30]:
df.values

array([['Anil', 98],
       ['Basanti', 80],
       ['Charan', 99],
       ['Dolly', 100]], dtype=object)

#### shape

In [31]:
df.shape

(4, 2)

#### dtypes

In [32]:
df.dtypes

StudentName    object
Marks           int64
dtype: object

#### columns

In [34]:
df.columns

Index(['StudentName', 'Marks'], dtype='object')

#### axes

In [35]:
df.axes

[RangeIndex(start=0, stop=4, step=1),
 Index(['StudentName', 'Marks'], dtype='object')]

#### info()

In [36]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 2 columns):
StudentName    4 non-null object
Marks          4 non-null int64
dtypes: int64(1), object(1)
memory usage: 88.0+ bytes


## Select One Column from a `DataFrame`

In [40]:
import pandas as pd
data=[['Anil',90,'A','Very Good'],['Basanti',89,'A-','Good'],['Charan',70,'B-','Average'],['Dolly',40,'D+','Below_Avereage'],
      ['Emily',68,'B-','Average'],['Fani',99,'A+','Awesome'],['goutham',29,'F','Bad'],['Hannah',55,'C+','Below_Average']]
df=pd.DataFrame(data,columns=['StudentName','Marks','Grade','Remarks'])
df

Unnamed: 0,StudentName,Marks,Grade,Remarks
0,Anil,90,A,Very Good
1,Basanti,89,A-,Good
2,Charan,70,B-,Average
3,Dolly,40,D+,Below_Avereage
4,Emily,68,B-,Average
5,Fani,99,A+,Awesome
6,goutham,29,F,Bad
7,Hannah,55,C+,Below_Average


In [42]:
df['StudentName']

0       Anil
1    Basanti
2     Charan
3      Dolly
4      Emily
5       Fani
6    goutham
7     Hannah
Name: StudentName, dtype: object

In [43]:
df['Grade']

0     A
1    A-
2    B-
3    D+
4    B-
5    A+
6     F
7    C+
Name: Grade, dtype: object

### Selecting multiple columns of a DataFrame

In [52]:
df[['StudentName','Grade']]

Unnamed: 0,StudentName,Grade
0,Anil,A
1,Basanti,A-
2,Charan,B-
3,Dolly,D+
4,Emily,B-
5,Fani,A+
6,goutham,F
7,Hannah,C+


In [63]:
lst_colmns = ['Marks','Remarks']
df[lst_colmns]

Unnamed: 0,Marks,Remarks
0,90,Very Good
1,89,Good
2,70,Average
3,40,Below_Avereage
4,68,Average
5,99,Awesome
6,29,Bad
7,55,Below_Average


In [50]:
df[df.columns[0:3]]

Unnamed: 0,StudentName,Marks,Grade
0,Anil,90,A
1,Basanti,89,A-
2,Charan,70,B-
3,Dolly,40,D+
4,Emily,68,B-
5,Fani,99,A+
6,goutham,29,F
7,Hannah,55,C+


In [55]:
df[df.columns[0:2]]

Unnamed: 0,StudentName,Marks
0,Anil,90
1,Basanti,89
2,Charan,70
3,Dolly,40
4,Emily,68
5,Fani,99
6,goutham,29
7,Hannah,55


### Selecting rows in a DataFrame

In [56]:
df[df.columns[0:2]].head(3)

Unnamed: 0,StudentName,Marks
0,Anil,90
1,Basanti,89
2,Charan,70


In [57]:
df[df.columns[0:2]].tail(3)

Unnamed: 0,StudentName,Marks
5,Fani,99
6,goutham,29
7,Hannah,55


In [58]:
df[df.columns[0:2]][3:5]

Unnamed: 0,StudentName,Marks
3,Dolly,40
4,Emily,68


In [62]:
df[2:7]

Unnamed: 0,StudentName,Marks,Grade,Remarks
2,Charan,70,B-,Average
3,Dolly,40,D+,Below_Avereage
4,Emily,68,B-,Average
5,Fani,99,A+,Awesome
6,goutham,29,F,Bad


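#### .loc[]

`.loc` is a label-based indexer used with square brackets, and a label slice such as `0:5` includes both endpoints. The outputs below come from a version of the student DataFrame that also has `Branch`, `University`, `City` and `Sex` columns.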

In [32]:
df.loc[0:5]

Unnamed: 0,StudentName,Marks,Grade,Remarks,Branch,University,City,Sex
0,Anil,90,A,Very Good,ECE,IIPL,Hyderabad,M
1,Basanti,89,A-,Good,CSE,IIPL,Hyderabad,F
2,Charan,70,B-,Average,MECH,IIPL,Hyderabad,M
3,Dolly,40,D+,Below_Avereage,CIVIL,IIPL,Hyderabad,F
4,Emily,68,B-,Average,EEE,IIPL,Hyderabad,F
5,Fani,99,A+,Awesome,IT,IIPL,Hyderabad,M


In [31]:
df.loc[[3,7]]

Unnamed: 0,StudentName,Marks,Grade,Remarks,Branch,University,City,Sex
3,Dolly,40,D+,Below_Avereage,CIVIL,IIPL,Hyderabad,F
7,Hannah,55,C+,Below_Average,ECE,IIPL,Hyderabad,F


In [33]:
df.loc[[3,7],['StudentName','City']]

Unnamed: 0,StudentName,City
3,Dolly,Hyderabad
7,Hannah,Hyderabad


In [34]:
df.loc[[3],['Branch']]

Unnamed: 0,Branch
3,CIVIL


### Add New Column to `DataFrame`

In [7]:
import pandas as pd
nba = pd.read_csv("nba.csv")
nba.head(3)

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary
0,Avery Bradley,Boston Celtics,0.0,PG,25.0,6-2,180.0,Texas,7730337.0
1,Jae Crowder,Boston Celtics,99.0,SF,25.0,6-6,235.0,Marquette,6796117.0
2,John Holland,Boston Celtics,30.0,SG,27.0,6-5,205.0,Boston University,


In [8]:
nba["Sport"] = "Basketball"
nba.head(3)

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary,Sport
0,Avery Bradley,Boston Celtics,0.0,PG,25.0,6-2,180.0,Texas,7730337.0,Basketball
1,Jae Crowder,Boston Celtics,99.0,SF,25.0,6-6,235.0,Marquette,6796117.0,Basketball
2,John Holland,Boston Celtics,30.0,SG,27.0,6-5,205.0,Boston University,,Basketball


In [9]:
nba["League"] = "National Basketball Association"
nba.head(3)

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary,Sport,League
0,Avery Bradley,Boston Celtics,0.0,PG,25.0,6-2,180.0,Texas,7730337.0,Basketball,National Basketball Association
1,Jae Crowder,Boston Celtics,99.0,SF,25.0,6-6,235.0,Marquette,6796117.0,Basketball,National Basketball Association
2,John Holland,Boston Celtics,30.0,SG,27.0,6-5,205.0,Boston University,,Basketball,National Basketball Association


In [10]:
nba = pd.read_csv("nba.csv")
nba.head(3)

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary
0,Avery Bradley,Boston Celtics,0.0,PG,25.0,6-2,180.0,Texas,7730337.0
1,Jae Crowder,Boston Celtics,99.0,SF,25.0,6-6,235.0,Marquette,6796117.0
2,John Holland,Boston Celtics,30.0,SG,27.0,6-5,205.0,Boston University,


In [11]:
nba.insert(3, column = "Sport", value = "Basketball")
nba.head(3)

Unnamed: 0,Name,Team,Number,Sport,Position,Age,Height,Weight,College,Salary
0,Avery Bradley,Boston Celtics,0.0,Basketball,PG,25.0,6-2,180.0,Texas,7730337.0
1,Jae Crowder,Boston Celtics,99.0,Basketball,SF,25.0,6-6,235.0,Marquette,6796117.0
2,John Holland,Boston Celtics,30.0,Basketball,SG,27.0,6-5,205.0,Boston University,


In [12]:
nba.insert(7, column = "League", value = "National Basketball Association")
nba.head(3)

Unnamed: 0,Name,Team,Number,Sport,Position,Age,Height,League,Weight,College,Salary
0,Avery Bradley,Boston Celtics,0.0,Basketball,PG,25.0,6-2,National Basketball Association,180.0,Texas,7730337.0
1,Jae Crowder,Boston Celtics,99.0,Basketball,SF,25.0,6-6,National Basketball Association,235.0,Marquette,6796117.0
2,John Holland,Boston Celtics,30.0,Basketball,SG,27.0,6-5,National Basketball Association,205.0,Boston University,
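
The two approaches above differ only in where the new column lands. A small in-memory sketch follows; the two-row frame is assumed data, and the final step shows that `insert` refuses a column name that already exists.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Avery Bradley', 'Jae Crowder'],
    'Team': ['Boston Celtics', 'Boston Celtics'],
})

# Bracket assignment appends the column at the end
df['Sport'] = 'Basketball'

# insert() places the column at a given position (0-based)
df.insert(1, column='League', value='National Basketball Association')

# Inserting an existing column name raises ValueError by default
try:
    df.insert(0, column='Sport', value='Basketball')
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True

print(df.columns.tolist())
```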


## Methods of DataFrame

### Arithmetic Operations

In [39]:
import pandas as pd
nba = pd.read_csv("nba.csv")
nba.head(3)

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary
0,Avery Bradley,Boston Celtics,0,PG,25,2-Jun,180,Texas,7730337.0
1,Jae Crowder,Boston Celtics,99,SF,25,6-Jun,235,Marquette,6796117.0
2,John Holland,Boston Celtics,30,SG,27,5-Jun,205,Boston University,


In [42]:
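# The method form and the operator form are equivalent; only assigned results are stored
nba["Age"] = nba["Age"].add(5)   # stored: every age increased by 5
nba["Age"] + 5                   # computed but not stored

nba["Salary"].sub(5000000)       # same result as the operator form below
nba["Salary"] - 5000000

nba["Weight"].mul(0.453592)      # pounds to kilograms, not stored
nba["Weight in Kilograms"] = nba["Weight"] * 0.453592   # stored as a new column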

In [43]:
nba.head(3)

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary,Weight in Kilograms
0,Avery Bradley,Boston Celtics,0,PG,30,2-Jun,180,Texas,7730337.0,81.64656
1,Jae Crowder,Boston Celtics,99,SF,30,6-Jun,235,Marquette,6796117.0,106.59412
2,John Holland,Boston Celtics,30,SG,32,5-Jun,205,Boston University,,92.98636


In [119]:
nba["Salary"].div(1000000)
nba["Salary in Millions"] = nba["Salary"] / 1000000

In [120]:
nba.head(3)

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary,Weight in Kilograms,Salary in Millions
0,Avery Bradley,Boston Celtics,0.0,PG,25.0,6-2,180.0,Texas,7730337.0,81.64656,7.730337
1,Jae Crowder,Boston Celtics,99.0,SF,25.0,6-6,235.0,Marquette,6796117.0,106.59412,6.796117
2,John Holland,Boston Celtics,30.0,SG,27.0,6-5,205.0,Boston University,,92.98636,
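
The blank `Salary in Millions` for John Holland above comes from a missing salary: arithmetic on NaN yields NaN. The sketch below, using three salaries from the output, shows this and the `fill_value` argument that the method forms accept; treating a missing salary as 0 is an assumption made for the example.

```python
import numpy as np
import pandas as pd

salary = pd.Series([7730337.0, 6796117.0, np.nan])

# Operator form: the missing value propagates
in_millions = salary / 1_000_000

# Method form gives the same result
same_result = salary.div(1_000_000).equals(in_millions)

# fill_value substitutes missing entries before dividing
filled = salary.div(1_000_000, fill_value=0)
print(filled)
```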


### A Review of the `.value_counts()` Method

In [1]:
import pandas as pd
nba = pd.read_csv("nba.csv")
nba.head(3)

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary
0,Avery Bradley,Boston Celtics,0.0,PG,25.0,6-2,180.0,Texas,7730337.0
1,Jae Crowder,Boston Celtics,99.0,SF,25.0,6-6,235.0,Marquette,6796117.0
2,John Holland,Boston Celtics,30.0,SG,27.0,6-5,205.0,Boston University,


In [2]:
nba["Team"].value_counts()
nba["Position"].value_counts().head(1)
nba["Weight"].value_counts().tail()
nba["Salary"].value_counts()

947276.0      31
845059.0      18
525093.0      13
981348.0       6
1100602.0      5
16407500.0     5
5000000.0      5
12000000.0     5
8000000.0      5
4000000.0      5
3000000.0      4
7000000.0      4
2814000.0      4
1000000.0      4
19689000.0     4
200600.0       3
8500000.0      3
2500000.0      3
1015421.0      3
2854940.0      3
13500000.0     3
5543725.0      3
2288205.0      2
1270964.0      2
2900000.0      2
1007026.0      2
111444.0       2
13000000.0     2
1500000.0      2
1842000.0      2
              ..
2239800.0      1
1474440.0      1
19688000.0     1
7900000.0      1
2008748.0      1
13800000.0     1
2841960.0      1
1404600.0      1
1584480.0      1
273038.0       1
9213483.0      1
3272091.0      1
3075880.0      1
2250000.0      1
4626960.0      1
1304520.0      1
12100000.0     1
7500000.0      1
295327.0       1
2836186.0      1
6486486.0      1
5016000.0      1
3333333.0      1
1824360.0      1
8042895.0      1
1242720.0      1
2489530.0      1
5103120.0     
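
`value_counts()` sorts by frequency and skips missing values by default. The sketch below uses a short, made-up list of positions to show the `normalize` and `dropna` parameters:

```python
import pandas as pd

positions = pd.Series(['PG', 'SF', 'SG', 'SG', 'PG', 'SG', None])

# Counts in descending order; the missing entry is ignored
counts = positions.value_counts()

# Relative frequencies instead of raw counts
shares = positions.value_counts(normalize=True)

# Include missing values as their own entry
with_missing = positions.value_counts(dropna=False)
print(counts)
```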

### Drop Rows/Columns with Null Values

#### dropna()

##### Parameters

- `axis`: {0 or 'index', 1 or 'columns'}, default 0. Determines whether rows or columns that contain missing values are removed.
  - 0 or 'index': drop rows that contain missing values.
  - 1 or 'columns': drop columns that contain missing values.
- `how`: {'any', 'all'}, default 'any'. Determines whether a row or column is removed when it has at least one NA value or only when all of its values are NA.
  - 'any': drop the row or column if any NA values are present.
  - 'all': drop the row or column only if all of its values are NA.
- `thresh`: int, optional. The minimum number of non-NA values a row or column must have to be kept. Any row or column with fewer non-NA values than `thresh` is removed. When `thresh=None`, this filter is ignored.
- `subset`: array-like, optional. Labels along the other axis to consider; for example, when dropping rows this is the list of columns to check.
- `inplace`: bool, default False. If True, perform the operation in place and return None.

In [44]:
nba = pd.read_csv("nba.csv")
nba.tail(3)

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary
454,Raul Neto,Utah Jazz,25,PG,24,1-Jun,179,,
455,Tibor Pleiss,Utah Jazz,21,C,26,3-Jul,256,,2900000.0
456,Jeff Withey,Utah Jazz,24,C,26,Jul-00,231,Kansas,947276.0


In [49]:
nba.dropna(how = "all", inplace = True)

In [50]:
nba.tail(3)

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary
454,Raul Neto,Utah Jazz,25,PG,24,1-Jun,179,,
455,Tibor Pleiss,Utah Jazz,21,C,26,3-Jul,256,,2900000.0
456,Jeff Withey,Utah Jazz,24,C,26,Jul-00,231,Kansas,947276.0


In [45]:
# Returns a new DataFrame; nba itself is unchanged because inplace defaults to False
nba.dropna(subset = ["Salary", "College"])
nba.tail(3)

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary
454,Raul Neto,Utah Jazz,25,PG,24,1-Jun,179,,
455,Tibor Pleiss,Utah Jazz,21,C,26,3-Jul,256,,2900000.0
456,Jeff Withey,Utah Jazz,24,C,26,Jul-00,231,Kansas,947276.0
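
To see how `how`, `subset` and `thresh` differ, the sketch below applies each to a four-row frame whose last row is entirely empty. The frame is assumed data modeled on the tail of the NBA output.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Raul Neto', 'Tibor Pleiss', 'Jeff Withey', None],
    'College': [None, None, 'Kansas', None],
    'Salary': [900000.0, 2900000.0, 947276.0, None],
})

# Drop only rows in which every value is missing
no_empty_rows = df.dropna(how='all')

# Default how='any': keep only rows with no missing value at all
complete_rows = df.dropna()

# Only look at the Salary column when deciding
with_salary = df.dropna(subset=['Salary'])

# Keep rows that have at least two non-missing values
at_least_two = df.dropna(thresh=2)
```

None of these calls modifies `df`; each returns a new DataFrame.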


### Fill in Null Values with the `.fillna()` Method

This method replaces missing (NaN) values with a value of our choice. It returns a new object unless `inplace = True` is passed.

In [2]:
import pandas as pd
nba = pd.read_csv("nba.csv")
nba.head(3)

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary
0,Avery Bradley,Boston Celtics,0,PG,25,2-Jun,180,Texas,7730337.0
1,Jae Crowder,Boston Celtics,99,SF,25,6-Jun,235,Marquette,6796117.0
2,John Holland,Boston Celtics,30,SG,27,5-Jun,205,Boston University,


In [3]:
# Returns a filled copy; without assignment or inplace = True, nba is unchanged
nba.fillna(0)
nba.head(3)

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary
0,Avery Bradley,Boston Celtics,0,PG,25,2-Jun,180,Texas,7730337.0
1,Jae Crowder,Boston Celtics,99,SF,25,6-Jun,235,Marquette,6796117.0
2,John Holland,Boston Celtics,30,SG,27,5-Jun,205,Boston University,


In [6]:
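# Assign the result back; calling fillna(..., inplace = True) on a selected column is deprecated in recent pandas
nba["Salary"] = nba["Salary"].fillna(0)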
nba.head()

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary
0,Avery Bradley,Boston Celtics,0,PG,25,2-Jun,180,Texas,7730337.0
1,Jae Crowder,Boston Celtics,99,SF,25,6-Jun,235,Marquette,6796117.0
2,John Holland,Boston Celtics,30,SG,27,5-Jun,205,Boston University,0.0
3,R.J. Hunter,Boston Celtics,28,SG,22,5-Jun,185,Georgia State,1148640.0
4,Jonas Jerebko,Boston Celtics,8,PF,29,10-Jun,231,,5000000.0


In [7]:
# Assign the result back rather than filling the selected column in place
nba["College"] = nba["College"].fillna("No College")
nba.head()

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary
0,Avery Bradley,Boston Celtics,0,PG,25,2-Jun,180,Texas,7730337.0
1,Jae Crowder,Boston Celtics,99,SF,25,6-Jun,235,Marquette,6796117.0
2,John Holland,Boston Celtics,30,SG,27,5-Jun,205,Boston University,0.0
3,R.J. Hunter,Boston Celtics,28,SG,22,5-Jun,185,Georgia State,1148640.0
4,Jonas Jerebko,Boston Celtics,8,PF,29,10-Jun,231,No College,5000000.0
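
Several columns can also be filled in one call by passing a dictionary that maps column names to replacement values. A minimal sketch on assumed two-row data:

```python
import pandas as pd

df = pd.DataFrame({
    'College': ['Texas', None],
    'Salary': [7730337.0, None],
})

# Each column gets its own replacement value; df itself is not modified
filled = df.fillna({'College': 'No College', 'Salary': 0})
print(filled)
```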


### The `.astype()` Method

This method casts a pandas object to a specified dtype. It can also convert any suitable existing column to the categorical type.

In [7]:
import pandas as pd
nba = pd.read_csv("nba.csv").dropna(how = "all")
nba["Salary"] = nba["Salary"].fillna(0)
nba["College"] = nba["College"].fillna("None")
nba.head(6)

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary
0,Avery Bradley,Boston Celtics,0.0,PG,25.0,6-2,180.0,Texas,7730337.0
1,Jae Crowder,Boston Celtics,99.0,SF,25.0,6-6,235.0,Marquette,6796117.0
2,John Holland,Boston Celtics,30.0,SG,27.0,6-5,205.0,Boston University,0.0
3,R.J. Hunter,Boston Celtics,28.0,SG,22.0,6-5,185.0,Georgia State,1148640.0
4,Jonas Jerebko,Boston Celtics,8.0,PF,29.0,6-10,231.0,,5000000.0
5,Amir Johnson,Boston Celtics,90.0,PF,29.0,6-9,240.0,,12000000.0


In [8]:
nba.dtypes
nba.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 457 entries, 0 to 456
Data columns (total 9 columns):
Name        457 non-null object
Team        457 non-null object
Number      457 non-null float64
Position    457 non-null object
Age         457 non-null float64
Height      457 non-null object
Weight      457 non-null float64
College     457 non-null object
Salary      457 non-null float64
dtypes: float64(4), object(5)
memory usage: 35.7+ KB


In [9]:
nba["Salary"] = nba["Salary"].astype("int")

In [13]:
nba.dtypes

Name        object
Team        object
Number       int64
Position    object
Age          int64
Height      object
Weight       int64
College     object
Salary       int32
dtype: object

In [11]:
nba.head(3)

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary
0,Avery Bradley,Boston Celtics,0,PG,25,2-Jun,180,Texas,7730337
1,Jae Crowder,Boston Celtics,99,SF,25,6-Jun,235,Marquette,6796117
2,John Holland,Boston Celtics,30,SG,27,5-Jun,205,Boston University,0


In [10]:
nba["Number"] = nba["Number"].astype("int")
nba["Age"] = nba["Age"].astype("int")
nba.dtypes

Name         object
Team         object
Number        int64
Position     object
Age           int64
Height       object
Weight      float64
College      object
Salary        int64
dtype: object

In [16]:
# Returns a float copy; without assignment the Age column keeps its dtype
nba["Age"].astype("float")
nba.dtypes

Name        object
Team        object
Number       int32
Position    object
Age          int32
Height      object
Weight       int64
College     object
Salary       int32
dtype: object

In [21]:
nba["Position"] = nba["Position"].astype("category")
nba["Team"] = nba["Team"].astype("category")
nba.dtypes

Name          object
Team        category
Number         int32
Position    category
Age            int32
Height        object
Weight         int64
College       object
Salary         int32
dtype: object
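
The earlier `fillna` calls matter because a float column that still contains NaN cannot be cast to a plain integer dtype. The sketch below shows the failure and two ways around it; the nullable `"Int64"` dtype is a pandas extension type and is offered here as an alternative, not as something the original notebook uses.

```python
import numpy as np
import pandas as pd

numbers = pd.Series([0.0, 99.0, np.nan])

# Casting NaN to a NumPy integer dtype raises an error
try:
    numbers.astype('int')
    cast_failed = False
except ValueError:
    cast_failed = True

# Option 1: fill the gaps first, then cast
filled_ints = numbers.fillna(0).astype('int')

# Option 2: use the nullable integer dtype, which keeps the gap as <NA>
nullable_ints = numbers.astype('Int64')
```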

### nunique() method

`nunique()` counts distinct values. Called on a Series it returns a single integer; called on a DataFrame it returns a Series with one count per column (`axis=0`, the default) or per row (`axis=1`). Missing values are excluded unless `dropna=False` is passed.

In [4]:
nba["Position"].nunique()

5

In [11]:
nba.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 457 entries, 0 to 456
Data columns (total 9 columns):
Name        457 non-null object
Team        457 non-null object
Number      457 non-null int64
Position    457 non-null object
Age         457 non-null int64
Height      457 non-null object
Weight      457 non-null float64
College     457 non-null object
Salary      457 non-null int64
dtypes: float64(1), int64(3), object(5)
memory usage: 35.7+ KB


In [12]:
nba["Position"] = nba["Position"].astype("category")

In [13]:
nba["Team"] = nba["Team"].astype("category")

In [14]:
nba.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 457 entries, 0 to 456
Data columns (total 9 columns):
Name        457 non-null object
Team        457 non-null category
Number      457 non-null int64
Position    457 non-null category
Age         457 non-null int64
Height      457 non-null object
Weight      457 non-null float64
College     457 non-null object
Salary      457 non-null int64
dtypes: category(2), float64(1), int64(3), object(3)
memory usage: 31.1+ KB
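
The drop in memory usage between the two `info()` outputs above is what a low `nunique()` count predicts: a column with few distinct values stores compactly as a category. A rough sketch with synthetic data (five position codes repeated 200 times); exact byte counts depend on the pandas version, so none are shown.

```python
import pandas as pd

positions = pd.Series(['PG', 'SF', 'SG', 'PF', 'C'] * 200)

# Few distinct values relative to the length suggests a categorical column
distinct = positions.nunique()

as_category = positions.astype('category')

# deep=True also counts the memory held by the strings themselves
string_bytes = positions.memory_usage(deep=True)
category_bytes = as_category.memory_usage(deep=True)
print(distinct, string_bytes, category_bytes)
```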


### Sort a `DataFrame` with the `.sort_values()` Method

Parameters

- `by`: str or list of str. Name or list of names to sort by.
  - If `axis` is 0 or 'index', `by` may contain index levels and/or column labels.
  - If `axis` is 1 or 'columns', `by` may contain column levels and/or index labels.
- `axis`: {0 or 'index', 1 or 'columns'}, default 0. Axis to be sorted.
- `ascending`: bool or list of bool, default True. Sort ascending vs. descending. Specify a list for multiple sort orders; a list of bools must match the length of `by`.
- `inplace`: bool, default False. If True, perform the operation in place.
- `kind`: {'quicksort', 'mergesort', 'heapsort'}, default 'quicksort'. Choice of sorting algorithm.
- `na_position`: {'first', 'last'}, default 'last'. 'first' puts NaNs at the beginning; 'last' puts NaNs at the end.
- `ignore_index`: bool, default False. If True, the resulting axis is labeled 0, 1, ..., n - 1.

In [1]:
import pandas as pd
nba = pd.read_csv("nba.csv")
nba.head(3)

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary
0,Avery Bradley,Boston Celtics,0.0,PG,25.0,6-2,180.0,Texas,7730337.0
1,Jae Crowder,Boston Celtics,99.0,SF,25.0,6-6,235.0,Marquette,6796117.0
2,John Holland,Boston Celtics,30.0,SG,27.0,6-5,205.0,Boston University,


In [15]:
nba.sort_values("Salary", ascending = False, na_position = "first").tail()

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary
175,Jordan McRae,Cleveland Cavaliers,12.0,SG,25.0,6-5,179.0,Tennessee,111196.0
135,Alan Williams,Phoenix Suns,15.0,C,23.0,6-8,260.0,UC Santa Barbara,83397.0
291,Orlando Johnson,New Orleans Pelicans,0.0,SG,27.0,6-5,220.0,UC Santa Barbara,55722.0
130,Phil Pressey,Phoenix Suns,25.0,PG,25.0,5-11,175.0,Missouri,55722.0
32,Thanasis Antetokounmpo,New York Knicks,43.0,SF,23.0,6-7,205.0,,30888.0


In [3]:
nba.sort_values(["Team", "Name"], ascending = [True, False], inplace = True)
nba.head(3)

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary
322,Walter Tavares,Atlanta Hawks,22.0,C,24.0,7-3,260.0,,1000000.0
310,Tim Hardaway Jr.,Atlanta Hawks,10.0,SG,24.0,6-6,205.0,Michigan,1304520.0
321,Tiago Splitter,Atlanta Hawks,11.0,C,31.0,6-11,245.0,,9756250.0
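
The parameters above can be tried on a frame small enough to check by eye. The five rows below are taken from the NBA outputs in this document; the rest is a sketch.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Avery Bradley', 'Jae Crowder', 'John Holland',
             'Walter Tavares', 'Tiago Splitter'],
    'Team': ['Boston Celtics', 'Boston Celtics', 'Boston Celtics',
             'Atlanta Hawks', 'Atlanta Hawks'],
    'Salary': [7730337.0, 6796117.0, np.nan, 1000000.0, 9756250.0],
})

# Highest salary first, missing salaries at the top, index renumbered 0..n-1
by_salary = df.sort_values('Salary', ascending=False,
                           na_position='first', ignore_index=True)

# Team A to Z, and within each team Name Z to A
by_team_name = df.sort_values(['Team', 'Name'], ascending=[True, False])
```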


### Sort `DataFrame` with the `.sort_index()` Method

In [4]:
nba = pd.read_csv("nba.csv")
nba.head(3)

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary
0,Avery Bradley,Boston Celtics,0.0,PG,25.0,6-2,180.0,Texas,7730337.0
1,Jae Crowder,Boston Celtics,99.0,SF,25.0,6-6,235.0,Marquette,6796117.0
2,John Holland,Boston Celtics,30.0,SG,27.0,6-5,205.0,Boston University,


In [8]:
nba.sort_values(["Number", "Salary", "Name"], inplace = True)
nba.tail(3)

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary
68,Lucas Nogueira,Toronto Raptors,92.0,C,23.0,7-0,220.0,,1842000.0
1,Jae Crowder,Boston Celtics,99.0,SF,25.0,6-6,235.0,Marquette,6796117.0
457,,,,,,,,,


In [9]:
nba.sort_index(ascending = False, inplace = True)
nba.head(3)

Unnamed: 0,Name,Team,Number,Position,Age,Height,Weight,College,Salary
457,,,,,,,,,
456,Jeff Withey,Utah Jazz,24.0,C,26.0,7-0,231.0,Kansas,947276.0
455,Tibor Pleiss,Utah Jazz,21.0,C,26.0,7-3,256.0,,2900000.0
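
Because `sort_values` keeps the original row labels, `sort_index` can restore the original order afterwards. A minimal sketch on three rows of assumed data:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Avery Bradley', 'Jae Crowder', 'John Holland'],
    'Number': [0, 99, 30],
})

# Sorting by value shuffles the rows but keeps their labels
by_number = df.sort_values('Number', ascending=False)

# Sorting by label puts the rows back in their original order
restored = by_number.sort_index()

# Labels in descending order
reversed_rows = df.sort_index(ascending=False)
```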


### Filter a `DataFrame`

In [1]:
import pandas as pd

In [2]:
df = pd.read_csv("employees.csv", parse_dates = ["Start Date", "Last Login Time"])
df["Senior Management"] = df["Senior Management"].astype("bool")
df["Gender"] = df["Gender"].astype("category")
df.head(3)

Unnamed: 0,First Name,Gender,Start Date,Last Login Time,Salary,Bonus %,Senior Management,Team
0,Douglas,Male,1993-08-06,2020-03-25 12:42:00,97308,6.945,True,Marketing
1,Thomas,Male,1996-03-31,2020-03-25 06:53:00,61933,4.17,True,
2,Maria,Female,1993-04-23,2020-03-25 11:17:00,130590,11.858,False,Finance


In [3]:
mask1 = df["Gender"] == "Male"
mask2 = df["Team"] == "Marketing"

df[mask1 & mask2].head(5)

Unnamed: 0,First Name,Gender,Start Date,Last Login Time,Salary,Bonus %,Senior Management,Team
0,Douglas,Male,1993-08-06,2020-03-25 12:42:00,97308,6.945,True,Marketing
21,Matthew,Male,1995-09-05,2020-03-25 02:12:00,100612,13.645,False,Marketing
26,Craig,Male,2000-02-27,2020-03-25 07:45:00,37598,7.757,True,Marketing
74,Thomas,Male,1995-06-04,2020-03-25 14:24:00,62096,17.029,False,Marketing
77,Charles,Male,2004-09-14,2020-03-25 20:13:00,107391,1.26,True,Marketing
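
Boolean masks combine with `&` (and), `|` (or) and `~` (not). When a condition is written inline, it needs its own parentheses because these operators bind more tightly than comparisons. The sketch below uses four rows modeled on `employees.csv`:

```python
import pandas as pd

df = pd.DataFrame({
    'First Name': ['Douglas', 'Thomas', 'Maria', 'Matthew'],
    'Gender': ['Male', 'Male', 'Female', 'Male'],
    'Team': ['Marketing', None, 'Finance', 'Marketing'],
    'Salary': [97308, 61933, 130590, 100612],
})

is_male = df['Gender'] == 'Male'
in_marketing = df['Team'] == 'Marketing'   # a missing team compares as False

both = df[is_male & in_marketing]
either = df[is_male | in_marketing]
not_marketing = df[~in_marketing]

# Inline conditions: each comparison wrapped in parentheses
high_paid_men = df[(df['Gender'] == 'Male') & (df['Salary'] > 100000)]
```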


### The `.isin()` , `.isnull()` , `.notnull()` and `.between()` Methods

In [4]:
filter1 = df["Gender"].isin(["Female"]) 
filter2 = df["Team"].isin(["Engineering", "Distribution", "Finance" ])
df[filter1 & filter2].head(5)

Unnamed: 0,First Name,Gender,Start Date,Last Login Time,Salary,Bonus %,Senior Management,Team
2,Maria,Female,1993-04-23,2020-03-25 11:17:00,130590,11.858,False,Finance
7,,Female,2015-07-20,2020-03-25 10:43:00,45906,11.598,True,Finance
8,Angela,Female,2005-11-22,2020-03-25 06:29:00,95570,18.523,True,Engineering
14,Kimberly,Female,1999-01-14,2020-03-25 07:13:00,41426,14.543,True,Finance
30,Christina,Female,2002-08-06,2020-03-25 13:19:00,118780,9.096,True,Engineering


In [5]:
mask = df["Team"].isnull()

df[mask]

Unnamed: 0,First Name,Gender,Start Date,Last Login Time,Salary,Bonus %,Senior Management,Team
1,Thomas,Male,1996-03-31,2020-03-25 06:53:00,61933,4.17,True,
10,Louise,Female,1980-08-12,2020-03-25 09:01:00,63241,15.132,True,
23,,Male,2012-06-14,2020-03-25 16:19:00,125792,5.042,True,
32,,Male,1998-08-21,2020-03-25 14:27:00,122340,6.417,True,
91,James,,2005-01-26,2020-03-25 23:00:00,128771,8.309,False,
109,Christopher,Male,2000-04-22,2020-03-25 10:15:00,37919,11.449,False,
139,,Female,1990-10-03,2020-03-25 01:08:00,132373,10.527,True,
199,Jonathan,Male,2009-07-17,2020-03-25 08:15:00,130581,16.736,True,
258,Michael,Male,2002-01-24,2020-03-25 03:04:00,43586,12.659,False,
290,Jeremy,Male,1988-06-14,2020-03-25 18:20:00,129460,13.657,True,


In [6]:
condition = df["Gender"].notnull()

df[condition]

Unnamed: 0,First Name,Gender,Start Date,Last Login Time,Salary,Bonus %,Senior Management,Team
0,Douglas,Male,1993-08-06,2020-03-25 12:42:00,97308,6.945,True,Marketing
1,Thomas,Male,1996-03-31,2020-03-25 06:53:00,61933,4.170,True,
2,Maria,Female,1993-04-23,2020-03-25 11:17:00,130590,11.858,False,Finance
3,Jerry,Male,2005-03-04,2020-03-25 13:00:00,138705,9.340,True,Finance
4,Larry,Male,1998-01-24,2020-03-25 16:47:00,101004,1.389,True,Client Services
5,Dennis,Male,1987-04-18,2020-03-25 01:35:00,115163,10.125,False,Legal
6,Ruby,Female,1987-08-17,2020-03-25 16:20:00,65476,10.012,True,Product
7,,Female,2015-07-20,2020-03-25 10:43:00,45906,11.598,True,Finance
8,Angela,Female,2005-11-22,2020-03-25 06:29:00,95570,18.523,True,Engineering
9,Frances,Female,2002-08-08,2020-03-25 06:51:00,139852,7.524,True,Business Development


In [7]:
mask1 = df["Salary"].between(60000, 70000)
df[mask1]

Unnamed: 0,First Name,Gender,Start Date,Last Login Time,Salary,Bonus %,Senior Management,Team
1,Thomas,Male,1996-03-31,2020-03-25 06:53:00,61933,4.170,True,
6,Ruby,Female,1987-08-17,2020-03-25 16:20:00,65476,10.012,True,Product
10,Louise,Female,1980-08-12,2020-03-25 09:01:00,63241,15.132,True,
20,Lois,,1995-04-22,2020-03-25 19:18:00,64714,4.934,True,Legal
41,Christine,,2015-06-28,2020-03-25 01:08:00,66582,11.308,True,Business Development
47,Kathy,Female,2005-06-22,2020-03-25 04:51:00,66820,9.000,True,Client Services
57,Henry,Male,1996-06-26,2020-03-25 01:44:00,64715,15.107,True,Human Resources
59,Irene,Female,1997-05-07,2020-03-25 09:32:00,66851,11.279,False,Engineering
65,Steve,Male,2009-11-11,2020-03-25 23:44:00,61310,12.428,True,Distribution
74,Thomas,Male,1995-06-04,2020-03-25 14:24:00,62096,17.029,False,Marketing
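
Two details of these methods are worth checking on small data: `.between()` includes both endpoints by default, and `.isnull()` and `.notnull()` are exact complements. The values below are made up for the check.

```python
import pandas as pd

salary = pd.Series([60000, 61933, 70000, 70001])

# Both 60000 and 70000 count as inside the range
in_range = salary.between(60000, 70000)
manual = (salary >= 60000) & (salary <= 70000)

team = pd.Series(['Finance', None, 'Legal', 'Engineering'])

# True where the value is one of the listed teams
selected = team.isin(['Engineering', 'Finance'])

# Missing-value masks
missing = team.isnull()
present = team.notnull()
```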


### The `.drop_duplicates()`, `.unique()` and `.nunique()` Methods

In [1]:
import pandas as pd
df = pd.read_csv("employees.csv", parse_dates = ["Start Date", "Last Login Time"])
df["Senior Management"] = df["Senior Management"].astype("bool")
df["Gender"] = df["Gender"].astype("category")
df.sort_values("First Name", inplace = True)
df.head(3)

Unnamed: 0,First Name,Gender,Start Date,Last Login Time,Salary,Bonus %,Senior Management,Team
101,Aaron,Male,2012-02-17,2020-03-26 10:20:00,61602,11.849,True,Marketing
327,Aaron,Male,1994-01-29,2020-03-26 18:48:00,58755,5.097,True,Marketing
440,Aaron,Male,1990-07-22,2020-03-26 14:53:00,52119,11.343,True,Client Services


In [39]:
df.drop_duplicates(subset = ["First Name"], keep = False)

Unnamed: 0,First Name,Gender,Start Date,Last Login Time,Salary,Bonus %,Senior Management,Team
8,Angela,Female,2005-11-22,2020-02-13 06:29:00,95570,18.523,True,Engineering
688,Brian,Male,2007-04-07,2020-02-13 22:47:00,93901,17.821,True,Legal
190,Carol,Female,1996-03-19,2020-02-13 03:39:00,57783,9.129,False,Finance
887,David,Male,2009-12-05,2020-02-13 08:48:00,92242,15.407,False,Legal
5,Dennis,Male,1987-04-18,2020-02-13 01:35:00,115163,10.125,False,Legal
495,Eugene,Male,1984-05-24,2020-02-13 10:54:00,81077,2.117,False,Sales
33,Jean,Female,1993-12-18,2020-02-13 09:07:00,119082,16.18,False,Business Development
832,Keith,Male,2003-02-12,2020-02-13 15:02:00,120672,19.467,False,Legal
291,Tammy,Female,1984-11-11,2020-02-13 10:30:00,132839,17.463,True,Client Services


In [41]:
df["Gender"].unique()

[Male, NaN, Female]
Categories (2, object): [Male, Female]

In [3]:
len(df["Team"].unique())

11

In [6]:
df["Team"].nunique(dropna = False)

11
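
The `keep` argument decides which of the duplicated rows survive, and `dropna` decides whether missing values count as a distinct value. A minimal sketch on made-up data (the `names` frame is an assumption for illustration only):

```python
import pandas as pd

# Hypothetical toy data with repeated first names and one missing team.
names = pd.DataFrame({
    "First Name": ["Aaron", "Aaron", "Angela", "Brian", "Brian"],
    "Team": ["Marketing", None, "Engineering", "Legal", "Legal"],
})

first_kept = names.drop_duplicates(subset=["First Name"], keep="first")  # keep first occurrence
last_kept = names.drop_duplicates(subset=["First Name"], keep="last")    # keep last occurrence
only_unique = names.drop_duplicates(subset=["First Name"], keep=False)   # drop every duplicated name

teams_without_nan = names["Team"].nunique()               # missing values ignored
teams_with_nan = names["Team"].nunique(dropna=False)      # missing value counted once

print(only_unique)
print(teams_without_nan, teams_with_nan)
```

`len(names["Team"].unique())` matches `nunique(dropna=False)` because `.unique()` keeps the missing value in its result.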

### The `.set_index()` and `.reset_index()` Methods

In [20]:
import pandas as pd
bond = pd.read_csv("jamesbond.csv")
bond.head(3)

Unnamed: 0,Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
0,Dr. No,1962,Sean Connery,Terence Young,448.8,7.0,0.6
1,From Russia with Love,1963,Sean Connery,Terence Young,543.8,12.6,1.6
2,Goldfinger,1964,Sean Connery,Guy Hamilton,820.4,18.6,3.2


In [14]:
bond.set_index("Film", inplace = True)
bond.head(3)

Unnamed: 0_level_0,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Film,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Dr. No,1962,Sean Connery,Terence Young,448.8,7.0,0.6
From Russia with Love,1963,Sean Connery,Terence Young,543.8,12.6,1.6
Goldfinger,1964,Sean Connery,Guy Hamilton,820.4,18.6,3.2


In [15]:
bond.reset_index(drop = False, inplace = True)
bond.head(3)

Unnamed: 0,Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
0,Dr. No,1962,Sean Connery,Terence Young,448.8,7.0,0.6
1,From Russia with Love,1963,Sean Connery,Terence Young,543.8,12.6,1.6
2,Goldfinger,1964,Sean Connery,Guy Hamilton,820.4,18.6,3.2


In [21]:
bond.set_index("Film", inplace = True)
bond.head(3)

Unnamed: 0_level_0,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Film,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Dr. No,1962,Sean Connery,Terence Young,448.8,7.0,0.6
From Russia with Love,1963,Sean Connery,Terence Young,543.8,12.6,1.6
Goldfinger,1964,Sean Connery,Guy Hamilton,820.4,18.6,3.2


In [22]:
bond.set_index("Year", inplace = True)
bond.head(3)

Unnamed: 0_level_0,Actor,Director,Box Office,Budget,Bond Actor Salary
Year,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
1962,Sean Connery,Terence Young,448.8,7.0,0.6
1963,Sean Connery,Terence Young,543.8,12.6,1.6
1964,Sean Connery,Guy Hamilton,820.4,18.6,3.2


In [None]:
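# Reload first: the previous cell replaced the "Film" index with "Year", which discarded "Film".
bond = pd.read_csv("jamesbond.csv")
bond.set_index("Film", inplace = True)
# Move "Film" back into a regular column before choosing a new index, so it is not lost.
bond.reset_index(inplace = True)
bond.set_index("Year", inplace = True)
bond.head(3)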
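
Calling `.set_index()` on a frame that already has a named index replaces that index, so the old labels disappear unless `.reset_index()` is called first. The sketch below shows both outcomes on a small hypothetical frame (`films` is illustrative data, not a read of `jamesbond.csv`):

```python
import pandas as pd

# Hypothetical two-row stand-in for the James Bond data.
films = pd.DataFrame({
    "Film": ["Dr. No", "Goldfinger"],
    "Year": [1962, 1964],
    "Actor": ["Sean Connery", "Sean Connery"],
})

by_film = films.set_index("Film")

lost = by_film.set_index("Year")                  # "Film" is discarded
kept = by_film.reset_index().set_index("Year")    # "Film" survives as a column
dropped = by_film.reset_index(drop=True)          # drop=True throws the old index away

print(lost.columns.tolist())
print(kept.columns.tolist())
print(dropped.columns.tolist())
```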

### Retrieve Rows by Index Label with `.loc[]`

In [2]:
import pandas as pd
bond = pd.read_csv("jamesbond.csv", index_col = "Film")
bond.sort_index(inplace = True)
bond.head(3)

Unnamed: 0_level_0,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Film,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
A View to a Kill,1985,Roger Moore,John Glen,275.2,54.5,9.1
Casino Royale,2006,Daniel Craig,Martin Campbell,581.5,145.3,3.3
Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,


In [3]:
bond.loc["Goldfinger"]
bond.loc["GoldenEye"]
# bond.loc["Sacred Bond"]  # raises KeyError: "Sacred Bond" is not in the index
bond.loc["Casino Royale"]

Unnamed: 0_level_0,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Film,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Casino Royale,2006,Daniel Craig,Martin Campbell,581.5,145.3,3.3
Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,


In [4]:
bond.loc["Diamonds Are Forever" : "Moonraker"]
bond.loc[: "On Her Majesty's Secret Service"]

Unnamed: 0_level_0,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Film,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
A View to a Kill,1985,Roger Moore,John Glen,275.2,54.5,9.1
Casino Royale,2006,Daniel Craig,Martin Campbell,581.5,145.3,3.3
Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,
Diamonds Are Forever,1971,Sean Connery,Guy Hamilton,442.5,34.7,5.8
Die Another Day,2002,Pierce Brosnan,Lee Tamahori,465.4,154.2,17.9
Dr. No,1962,Sean Connery,Terence Young,448.8,7.0,0.6
For Your Eyes Only,1981,Roger Moore,John Glen,449.4,60.2,
From Russia with Love,1963,Sean Connery,Terence Young,543.8,12.6,1.6
GoldenEye,1995,Pierce Brosnan,Martin Campbell,518.5,76.9,5.1
Goldfinger,1964,Sean Connery,Guy Hamilton,820.4,18.6,3.2


In [5]:
bond.loc[["Octopussy", "Moonraker"]]

Unnamed: 0_level_0,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Film,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Octopussy,1983,Roger Moore,John Glen,373.8,53.9,7.8
Moonraker,1979,Roger Moore,Lewis Gilbert,535.0,91.5,


In [6]:
# Older pandas only warned here; pandas 1.0 and later raise KeyError because "Gold Bond" is missing.
bond.loc[["For Your Eyes Only", "Live and Let Die", "Gold Bond"]]

Passing list-likes to .loc or [] with any missing label will raise
KeyError in the future, you can use .reindex() as an alternative.

See the documentation here:
https://pandas.pydata.org/pandas-docs/stable/indexing.html#deprecate-loc-reindex-listlike


Unnamed: 0_level_0,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Film,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
For Your Eyes Only,1981.0,Roger Moore,John Glen,449.4,60.2,
Live and Let Die,1973.0,Roger Moore,Guy Hamilton,460.3,30.8,
Gold Bond,,,,,,


In [7]:
"Gold Bond" in bond.index

False
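
Since a list containing a missing label is an error for `.loc[]` in pandas 1.0 and later, it helps to decide up front what should happen to unknown labels. The sketch below shows two options on a hypothetical frame with unique index labels (`films` and `wanted` are illustrative names): keep only the labels that exist, or use `.reindex()` to get `NaN` rows for the missing ones. Note that `.reindex()` needs unique index labels, so it would not work directly on an index that repeats "Casino Royale".

```python
import pandas as pd

# Hypothetical frame with a unique "Film" index.
films = pd.DataFrame(
    {"Year": [1995, 1964, 1979]},
    index=pd.Index(["GoldenEye", "Goldfinger", "Moonraker"], name="Film"),
)

wanted = ["Goldfinger", "Gold Bond"]   # "Gold Bond" is a made-up, missing label

# Label slices with .loc include BOTH endpoints.
both_ends = films.loc["GoldenEye":"Goldfinger"]

# Option 1: silently skip labels that are not in the index.
present = films.loc[films.index.intersection(wanted)]

# Option 2: keep every requested label and fill missing rows with NaN.
padded = films.reindex(wanted)

print(both_ends)
print(present)
print(padded)
```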

### Retrieve Row(s) by Index Position with `.iloc[]`

In [8]:
bond = pd.read_csv("jamesbond.csv")
bond.head(3)

Unnamed: 0,Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
0,Dr. No,1962,Sean Connery,Terence Young,448.8,7.0,0.6
1,From Russia with Love,1963,Sean Connery,Terence Young,543.8,12.6,1.6
2,Goldfinger,1964,Sean Connery,Guy Hamilton,820.4,18.6,3.2


In [9]:
bond.loc[15]
bond.iloc[15]

bond.iloc[[15, 20]]
bond.iloc[:4]
bond.iloc[4:8]
bond.iloc[20:]

Unnamed: 0,Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
20,The World Is Not Enough,1999,Pierce Brosnan,Michael Apted,439.5,158.3,13.5
21,Die Another Day,2002,Pierce Brosnan,Lee Tamahori,465.4,154.2,17.9
22,Casino Royale,2006,Daniel Craig,Martin Campbell,581.5,145.3,3.3
23,Quantum of Solace,2008,Daniel Craig,Marc Forster,514.2,181.4,8.1
24,Skyfall,2012,Daniel Craig,Sam Mendes,943.5,170.2,14.5
25,Spectre,2015,Daniel Craig,Sam Mendes,726.7,206.3,


In [10]:
bond = pd.read_csv("jamesbond.csv", index_col = "Film")
bond.sort_index(inplace = True)
bond.head(3)

Unnamed: 0_level_0,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Film,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
A View to a Kill,1985,Roger Moore,John Glen,275.2,54.5,9.1
Casino Royale,2006,Daniel Craig,Martin Campbell,581.5,145.3,3.3
Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,


In [11]:
bond.iloc[[5, 10, 15, 20]]

Unnamed: 0_level_0,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Film,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Dr. No,1962,Sean Connery,Terence Young,448.8,7.0,0.6
Licence to Kill,1989,Timothy Dalton,John Glen,250.9,56.7,7.9
On Her Majesty's Secret Service,1969,George Lazenby,Peter R. Hunt,291.5,37.3,0.6
The Man with the Golden Gun,1974,Roger Moore,Guy Hamilton,334.0,27.7,
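
`.iloc[]` ignores the labels entirely and counts rows from zero, which is why it works the same with a numeric index and with a `Film` index. A minimal sketch, assuming a four-row hypothetical frame (`films`):

```python
import pandas as pd

# Hypothetical four-row stand-in, indexed by film title.
films = pd.DataFrame(
    {"Year": [1962, 1963, 1964, 1965], "Budget": [7.0, 12.6, 18.6, 41.9]},
    index=["Dr. No", "From Russia with Love", "Goldfinger", "Thunderball"],
)

one_row = films.iloc[2]        # a single position returns a Series
picked = films.iloc[[0, 3]]    # a list of positions returns a DataFrame
first_two = films.iloc[:2]     # positions 0 and 1; the stop position is excluded
last_row = films.iloc[-1]      # negative positions count from the end

print(one_row.name, last_row.name)
print(picked)
print(first_two)
```

Unlike `.loc[]` label slices, `.iloc[]` slices follow normal Python rules and exclude the stop position.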


### Retrieve Rows with the `.ix[]` Method

In [17]:
bond = pd.read_csv("jamesbond.csv", index_col = "Film")
bond.sort_index(inplace = True)
bond.head(10)

Unnamed: 0_level_0,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Film,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
A View to a Kill,1985,Roger Moore,John Glen,275.2,54.5,9.1
Casino Royale,2006,Daniel Craig,Martin Campbell,581.5,145.3,3.3
Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,
Diamonds Are Forever,1971,Sean Connery,Guy Hamilton,442.5,34.7,5.8
Die Another Day,2002,Pierce Brosnan,Lee Tamahori,465.4,154.2,17.9
Dr. No,1962,Sean Connery,Terence Young,448.8,7.0,0.6
For Your Eyes Only,1981,Roger Moore,John Glen,449.4,60.2,
From Russia with Love,1963,Sean Connery,Terence Young,543.8,12.6,1.6
GoldenEye,1995,Pierce Brosnan,Martin Campbell,518.5,76.9,5.1
Goldfinger,1964,Sean Connery,Guy Hamilton,820.4,18.6,3.2


In [21]:
bond.ix["GoldenEye"]
bond.ix[["GoldenEye", "Diamonds Are Forever", "Casino Royale"]]
bond.ix["A View to a Kill" : "Diamonds Are Forever"]
# bond.ix["Sacred Bond"]

#"Spectre" in bond.index
#"Sacred Bond" in bond.index

.ix is deprecated. Please use
.loc for label based indexing or
.iloc for positional indexing

See the documentation here:
http://pandas.pydata.org/pandas-docs/stable/indexing.html#ix-indexer-is-deprecated


Unnamed: 0_level_0,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Film,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
A View to a Kill,1985,Roger Moore,John Glen,275.2,54.5,9.1
Casino Royale,2006,Daniel Craig,Martin Campbell,581.5,145.3,3.3
Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,
Diamonds Are Forever,1971,Sean Connery,Guy Hamilton,442.5,34.7,5.8


In [22]:
bond = pd.read_csv("jamesbond.csv", index_col = "Film")
bond.sort_index(inplace = True)
bond.head(3)

Unnamed: 0_level_0,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Film,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
A View to a Kill,1985,Roger Moore,John Glen,275.2,54.5,9.1
Casino Royale,2006,Daniel Craig,Martin Campbell,581.5,145.3,3.3
Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,


In [23]:
bond.loc["Moonraker", ["Actor", "Budget", "Year"]]

bond.iloc[14, [5, 3, 2]]

bond.ix[20, "Budget"]
bond.ix["The Man with the Golden Gun", :4]
bond.ix[5, 3]

.ix is deprecated. Please use
.loc for label based indexing or
.iloc for positional indexing

See the documentation here:
http://pandas.pydata.org/pandas-docs/stable/indexing.html#ix-indexer-is-deprecated


448.8
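
`.ix[]` was removed in pandas 1.0, so the cells in this section only run on older versions. The sketch below shows one way to express the same lookups with `.loc[]` and `.iloc[]`, including the mixed "row position, column label" case. The `films` frame is hypothetical illustration data.

```python
import pandas as pd

# Hypothetical three-row stand-in, sorted by film title.
films = pd.DataFrame(
    {"Year": [1985, 1971, 1962], "Box Office": [275.2, 442.5, 448.8]},
    index=pd.Index(["A View to a Kill", "Diamonds Are Forever", "Dr. No"], name="Film"),
)

# Labels on both axes: use .loc
by_label = films.loc["Dr. No", "Box Office"]

# Positions on both axes: use .iloc
by_position = films.iloc[2, 1]

# Mixed access: translate one side so both are labels or both are positions.
mixed_as_labels = films.loc[films.index[2], "Box Office"]
mixed_as_positions = films.iloc[2, films.columns.get_loc("Box Office")]

print(by_label, by_position, mixed_as_labels, mixed_as_positions)
```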

### Set New Values for a Specific Cell or Row

In [16]:
bond = pd.read_csv("jamesbond.csv", index_col = "Film")
bond.sort_index(inplace = True)
bond.head(3)

Unnamed: 0_level_0,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Film,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
A View to a Kill,1985,Roger Moore,John Glen,275.2,54.5,9.1
Casino Royale,2006,Daniel Craig,Martin Campbell,581.5,145.3,3.3
Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,


In [17]:
bond.loc["A View to a Kill"]


Year                        1985
Actor                Roger Moore
Director               John Glen
Box Office                 275.2
Budget                      54.5
Bond Actor Salary            9.1
Name: A View to a Kill, dtype: object

In [19]:
bond.loc["A View to a Kill", "Actor"] = "Sir Sean Connery"
bond.head(3)


Unnamed: 0_level_0,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Film,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
A View to a Kill,1985,Sir Sean Connery,John Glen,275.2,54.5,9.1
Casino Royale,2006,Daniel Craig,Martin Campbell,581.5,145.3,3.3
Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,


In [20]:
bond.loc["A View to a Kill", ["Box Office", "Budget", "Bond Actor Salary"]] = [448800000, 7000000, 600000]


In [21]:
bond.loc["A View to a Kill", "Budget"]


7000000.0
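
Assignment through an indexer follows the same row and column addressing as retrieval. The sketch below, on hypothetical data, sets a single cell, several cells of one row, and a single cell with the scalar accessor `.at[]`. The new values are made up for illustration.

```python
import pandas as pd

# Hypothetical two-row stand-in for the James Bond data.
films = pd.DataFrame(
    {
        "Actor": ["Roger Moore", "Sean Connery"],
        "Box Office": [275.2, 448.8],
        "Budget": [54.5, 7.0],
    },
    index=["A View to a Kill", "Dr. No"],
)

films.loc["Dr. No", "Actor"] = "Sir Sean Connery"               # one cell
films.loc["Dr. No", ["Box Office", "Budget"]] = [450.0, 7.5]    # several cells in one row
films.at["A View to a Kill", "Budget"] = 55.0                   # .at[] reads or writes exactly one cell

print(films)
```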

### Set Multiple Values in `DataFrame`

In [13]:
bond = pd.read_csv("jamesbond.csv", index_col = "Film")
bond.sort_index(inplace = True)
bond.head(3)

Unnamed: 0_level_0,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Film,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
A View to a Kill,1985,Roger Moore,John Glen,275.2,54.5,9.1
Casino Royale,2006,Daniel Craig,Martin Campbell,581.5,145.3,3.3
Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,


In [30]:
mask = bond["Actor"] == "Sean Connery"

In [34]:
bond.loc[mask, "Actor"] = "Sir Sean Connery"
bond.head(5)


Unnamed: 0_level_0,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Film,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
A View to a Kill,1985,Roger Moore,John Glen,275.2,54.5,9.1
Casino Royale,2006,Daniel Craig,Martin Campbell,581.5,145.3,3.3
Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,
Diamonds Are Forever,1971,Sir Sean Connery,Guy Hamilton,442.5,34.7,5.8
Die Another Day,2002,Pierce Brosnan,Lee Tamahori,465.4,154.2,17.9
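
Passing a boolean mask as the row selector and a column label as the column selector updates every matching row in one statement. A minimal sketch on hypothetical data (`films` and `is_connery` are illustrative names):

```python
import pandas as pd

# Hypothetical three-row stand-in for the James Bond data.
films = pd.DataFrame(
    {
        "Actor": ["Sean Connery", "Roger Moore", "Sean Connery"],
        "Year": [1962, 1973, 1964],
    },
    index=["Dr. No", "Live and Let Die", "Goldfinger"],
)

is_connery = films["Actor"] == "Sean Connery"

# Select rows with the mask and the column by label in a single .loc call.
films.loc[is_connery, "Actor"] = "Sir Sean Connery"

print(films)
```

Doing the selection in one `.loc[]` call matters: the chained form `films[is_connery]["Actor"] = ...` assigns to an intermediate object and may leave `films` unchanged.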


### Rename Index Labels or Columns in a `DataFrame`

In [40]:
import pandas as pd
bond = pd.read_csv("jamesbond.csv", index_col = "Film")
bond.sort_index(inplace = True)
bond.head(3)

Unnamed: 0_level_0,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Film,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
A View to a Kill,1985,Roger Moore,John Glen,275.2,54.5,9.1
Casino Royale,2006,Daniel Craig,Martin Campbell,581.5,145.3,3.3
Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,


In [41]:
bond.rename(columns = {"Year" : "Release Date", "Box Office" : "Revenue"}, inplace = True)
bond.head(3)

Unnamed: 0_level_0,Release Date,Actor,Director,Revenue,Budget,Bond Actor Salary
Film,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
A View to a Kill,1985,Roger Moore,John Glen,275.2,54.5,9.1
Casino Royale,2006,Daniel Craig,Martin Campbell,581.5,145.3,3.3
Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,


In [42]:
bond.rename(index = {"Dr. No" : "Doctor No", 
                     "GoldenEye" : "Golden Eye",
                    "The World Is Not Enough" : "Best Bond Movie Ever"}, inplace = True)
bond.head(3)

Unnamed: 0_level_0,Release Date,Actor,Director,Revenue,Budget,Bond Actor Salary
Film,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
A View to a Kill,1985,Roger Moore,John Glen,275.2,54.5,9.1
Casino Royale,2006,Daniel Craig,Martin Campbell,581.5,145.3,3.3
Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,
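
The renamed index labels above fall outside the first three rows, so `head(3)` cannot confirm them. The sketch below checks the result directly on a small hypothetical frame. It also shows that `.rename()` returns a new object when `inplace` is not set, skips mapping keys that do not exist, and accepts a function instead of a dictionary.

```python
import pandas as pd

# Hypothetical two-row stand-in for the James Bond data.
films = pd.DataFrame(
    {"Year": [1962, 1995], "Box Office": [448.8, 518.5]},
    index=["Dr. No", "GoldenEye"],
)

renamed = films.rename(
    columns={"Year": "Release Date", "Box Office": "Revenue"},
    # "No Such Film" is not in the index; by default unknown keys are ignored.
    index={"Dr. No": "Doctor No", "No Such Film": "Ignored"},
)

# A function is applied to every label on the chosen axis.
snake_case = films.rename(columns=lambda name: name.lower().replace(" ", "_"))

print(renamed)
print(snake_case.columns.tolist())
print(films.columns.tolist())   # the original frame is untouched
```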


### Delete Rows or Columns from a `DataFrame`

In [23]:
import pandas as pd
bond = pd.read_csv("jamesbond.csv", index_col = "Film")
bond.sort_index(inplace = True)
bond.head(3)

Unnamed: 0_level_0,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Film,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
A View to a Kill,1985,Roger Moore,John Glen,275.2,54.5,9.1
Casino Royale,2006,Daniel Craig,Martin Campbell,581.5,145.3,3.3
Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,


In [16]:
bond.drop("Casino Royale", inplace = True)
bond.head(10)

Unnamed: 0_level_0,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Film,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
A View to a Kill,1985,Roger Moore,John Glen,275.2,54.5,9.1
Diamonds Are Forever,1971,Sean Connery,Guy Hamilton,442.5,34.7,5.8
Die Another Day,2002,Pierce Brosnan,Lee Tamahori,465.4,154.2,17.9
Dr. No,1962,Sean Connery,Terence Young,448.8,7.0,0.6
For Your Eyes Only,1981,Roger Moore,John Glen,449.4,60.2,
From Russia with Love,1963,Sean Connery,Terence Young,543.8,12.6,1.6
GoldenEye,1995,Pierce Brosnan,Martin Campbell,518.5,76.9,5.1
Goldfinger,1964,Sean Connery,Guy Hamilton,820.4,18.6,3.2
Licence to Kill,1989,Timothy Dalton,John Glen,250.9,56.7,7.9
Live and Let Die,1973,Roger Moore,Guy Hamilton,460.3,30.8,


In [17]:
bond.drop(labels = ["Box Office", "Bond Actor Salary", "Actor"], axis = "columns", inplace = True)
bond.head(3)

Unnamed: 0_level_0,Year,Director,Budget
Film,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
A View to a Kill,1985,John Glen,54.5
Diamonds Are Forever,1971,Guy Hamilton,34.7
Die Another Day,2002,Lee Tamahori,154.2


In [19]:
# Run this on a freshly loaded bond: the previous cell already dropped "Actor".
actor = bond.pop("Actor")

In [20]:
actor

Film
A View to a Kill                      Roger Moore
Casino Royale                        Daniel Craig
Casino Royale                         David Niven
Diamonds Are Forever                 Sean Connery
Die Another Day                    Pierce Brosnan
Dr. No                               Sean Connery
For Your Eyes Only                    Roger Moore
From Russia with Love                Sean Connery
GoldenEye                          Pierce Brosnan
Goldfinger                           Sean Connery
Licence to Kill                    Timothy Dalton
Live and Let Die                      Roger Moore
Moonraker                             Roger Moore
Never Say Never Again                Sean Connery
Octopussy                             Roger Moore
On Her Majesty's Secret Service    George Lazenby
Quantum of Solace                    Daniel Craig
Skyfall                              Daniel Craig
Spectre                              Daniel Craig
The Living Daylights               Timothy Da

In [24]:
del bond["Director"]
bond.head(3)

Unnamed: 0_level_0,Year,Actor,Box Office,Budget,Bond Actor Salary
Film,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
A View to a Kill,1985,Roger Moore,275.2,54.5,9.1
Casino Royale,2006,Daniel Craig,581.5,145.3,3.3
Casino Royale,1967,David Niven,315.0,85.0,
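
The three removal tools differ in what they return and whether they change the frame in place. A short sketch under the assumption of a hypothetical three-row frame (`films`):

```python
import pandas as pd

# Hypothetical stand-in with a duplicated index label.
films = pd.DataFrame(
    {
        "Year": [1985, 2006, 1967],
        "Actor": ["Roger Moore", "Daniel Craig", "David Niven"],
        "Director": ["John Glen", "Martin Campbell", "Ken Hughes"],
    },
    index=["A View to a Kill", "Casino Royale", "Casino Royale"],
)

no_casino = films.drop("Casino Royale")    # returns a copy; removes every row with that label
no_year = films.drop(columns=["Year"])     # returns a copy without the column

actor = films.pop("Actor")                 # removes the column in place and returns it as a Series
del films["Director"]                      # removes the column in place and returns nothing

print(no_casino.index.tolist())
print(actor.tolist())
print(films.columns.tolist())
```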


### The `.nsmallest()` and `.nlargest()` Methods

In [25]:
bond = pd.read_csv("jamesbond.csv", index_col = "Film")
bond.sort_index(inplace = True)
bond.head(3)

Unnamed: 0_level_0,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Film,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
A View to a Kill,1985,Roger Moore,John Glen,275.2,54.5,9.1
Casino Royale,2006,Daniel Craig,Martin Campbell,581.5,145.3,3.3
Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,


In [26]:
bond.sort_values("Box Office", ascending = False).head(3)

Unnamed: 0_level_0,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Film,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Skyfall,2012,Daniel Craig,Sam Mendes,943.5,170.2,14.5
Thunderball,1965,Sean Connery,Terence Young,848.1,41.9,4.7
Goldfinger,1964,Sean Connery,Guy Hamilton,820.4,18.6,3.2


In [27]:
bond.nlargest(3, columns = "Box Office")

Unnamed: 0_level_0,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Film,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Skyfall,2012,Daniel Craig,Sam Mendes,943.5,170.2,14.5
Thunderball,1965,Sean Connery,Terence Young,848.1,41.9,4.7
Goldfinger,1964,Sean Connery,Guy Hamilton,820.4,18.6,3.2


In [28]:
bond.nsmallest(n = 2, columns = "Box Office")

Unnamed: 0_level_0,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Film,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Licence to Kill,1989,Timothy Dalton,John Glen,250.9,56.7,7.9
A View to a Kill,1985,Roger Moore,John Glen,275.2,54.5,9.1
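
`.nlargest()` and `.nsmallest()` give the same rows as sorting and taking the head, without sorting the whole frame first. The sketch below checks that equivalence on hypothetical data and shows the Series form, which takes no `columns` argument.

```python
import pandas as pd

# Hypothetical four-row stand-in for the James Bond data.
films = pd.DataFrame(
    {"Box Office": [943.5, 848.1, 820.4, 250.9]},
    index=["Skyfall", "Thunderball", "Goldfinger", "Licence to Kill"],
)

top_two = films.nlargest(2, columns="Box Office")
sorted_top_two = films.sort_values("Box Office", ascending=False).head(2)
bottom_one = films.nsmallest(1, columns="Box Office")

# On a Series the column is implicit.
series_top_two = films["Box Office"].nlargest(2)

print(top_two.equals(sorted_top_two))
print(bottom_one)
print(series_top_two)
```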


### Filtering with the `.where()` Method

In [1]:
import pandas as pd
bond = pd.read_csv("jamesbond.csv", index_col = "Film")
bond.sort_index(inplace = True)
bond.head(3)

Unnamed: 0_level_0,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Film,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
A View to a Kill,1985,Roger Moore,John Glen,275.2,54.5,9.1
Casino Royale,2006,Daniel Craig,Martin Campbell,581.5,145.3,3.3
Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,


In [2]:
mask = bond["Actor"] == "Sean Connery"
bond[mask]

Unnamed: 0_level_0,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Film,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Diamonds Are Forever,1971,Sean Connery,Guy Hamilton,442.5,34.7,5.8
Dr. No,1962,Sean Connery,Terence Young,448.8,7.0,0.6
From Russia with Love,1963,Sean Connery,Terence Young,543.8,12.6,1.6
Goldfinger,1964,Sean Connery,Guy Hamilton,820.4,18.6,3.2
Never Say Never Again,1983,Sean Connery,Irvin Kershner,380.0,86.0,
Thunderball,1965,Sean Connery,Terence Young,848.1,41.9,4.7
You Only Live Twice,1967,Sean Connery,Lewis Gilbert,514.2,59.9,4.4


In [3]:
bond.where(mask)

Unnamed: 0_level_0,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Film,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
A View to a Kill,,,,,,
Casino Royale,,,,,,
Casino Royale,,,,,,
Diamonds Are Forever,1971.0,Sean Connery,Guy Hamilton,442.5,34.7,5.8
Die Another Day,,,,,,
Dr. No,1962.0,Sean Connery,Terence Young,448.8,7.0,0.6
For Your Eyes Only,,,,,,
From Russia with Love,1963.0,Sean Connery,Terence Young,543.8,12.6,1.6
GoldenEye,,,,,,
Goldfinger,1964.0,Sean Connery,Guy Hamilton,820.4,18.6,3.2
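
The outputs above show the key difference: `bond[mask]` removes the rows that fail the condition, while `bond.where(mask)` keeps the original shape and fills those rows with `NaN` (which is also why `Year` is displayed as a float). A minimal sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical three-row stand-in for the James Bond data.
films = pd.DataFrame(
    {
        "Year": [1985, 1971, 1962],
        "Actor": ["Roger Moore", "Sean Connery", "Sean Connery"],
    },
    index=["A View to a Kill", "Diamonds Are Forever", "Dr. No"],
)
mask = films["Actor"] == "Sean Connery"

filtered = films[mask]                   # non-matching rows are dropped
masked = films.where(mask)               # same shape; non-matching rows become NaN
recovered = masked.dropna(how="all")     # dropping all-NaN rows gives the same rows as filtering

print(filtered.shape, masked.shape)
print(masked)
print(recovered.index.tolist())
```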


### The `.query()` Method

In [4]:
bond = pd.read_csv("jamesbond.csv", index_col = "Film")
bond.sort_index(inplace = True)
bond.head(3)

Unnamed: 0_level_0,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Film,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
A View to a Kill,1985,Roger Moore,John Glen,275.2,54.5,9.1
Casino Royale,2006,Daniel Craig,Martin Campbell,581.5,145.3,3.3
Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,


In [5]:
# query() is easiest with column names that contain no spaces, so replace spaces with underscores.
bond.columns = [column_name.replace(" ", "_") for column_name in bond.columns]
bond.head(1)

Unnamed: 0_level_0,Year,Actor,Director,Box_Office,Budget,Bond_Actor_Salary
Film,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
A View to a Kill,1985,Roger Moore,John Glen,275.2,54.5,9.1


In [6]:
bond.query('Actor == "Sean Connery"')
bond.query("Actor != 'Roger Moore'")

Unnamed: 0_level_0,Year,Actor,Director,Box_Office,Budget,Bond_Actor_Salary
Film,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Casino Royale,2006,Daniel Craig,Martin Campbell,581.5,145.3,3.3
Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,
Diamonds Are Forever,1971,Sean Connery,Guy Hamilton,442.5,34.7,5.8
Die Another Day,2002,Pierce Brosnan,Lee Tamahori,465.4,154.2,17.9
Dr. No,1962,Sean Connery,Terence Young,448.8,7.0,0.6
From Russia with Love,1963,Sean Connery,Terence Young,543.8,12.6,1.6
GoldenEye,1995,Pierce Brosnan,Martin Campbell,518.5,76.9,5.1
Goldfinger,1964,Sean Connery,Guy Hamilton,820.4,18.6,3.2
Licence to Kill,1989,Timothy Dalton,John Glen,250.9,56.7,7.9
Never Say Never Again,1983,Sean Connery,Irvin Kershner,380.0,86.0,


In [7]:
bond.query("Actor == 'Roger Moore' or Director == 'John Glen'")

Unnamed: 0_level_0,Year,Actor,Director,Box_Office,Budget,Bond_Actor_Salary
Film,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
A View to a Kill,1985,Roger Moore,John Glen,275.2,54.5,9.1
For Your Eyes Only,1981,Roger Moore,John Glen,449.4,60.2,
Licence to Kill,1989,Timothy Dalton,John Glen,250.9,56.7,7.9
Live and Let Die,1973,Roger Moore,Guy Hamilton,460.3,30.8,
Moonraker,1979,Roger Moore,Lewis Gilbert,535.0,91.5,
Octopussy,1983,Roger Moore,John Glen,373.8,53.9,7.8
The Living Daylights,1987,Timothy Dalton,John Glen,313.5,68.8,5.2
The Man with the Golden Gun,1974,Roger Moore,Guy Hamilton,334.0,27.7,
The Spy Who Loved Me,1977,Roger Moore,Lewis Gilbert,533.0,45.1,


In [8]:
bond.query("Actor not in ['Sean Connery', 'Roger Moore']")

Unnamed: 0_level_0,Year,Actor,Director,Box_Office,Budget,Bond_Actor_Salary
Film,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Casino Royale,2006,Daniel Craig,Martin Campbell,581.5,145.3,3.3
Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,
Die Another Day,2002,Pierce Brosnan,Lee Tamahori,465.4,154.2,17.9
GoldenEye,1995,Pierce Brosnan,Martin Campbell,518.5,76.9,5.1
Licence to Kill,1989,Timothy Dalton,John Glen,250.9,56.7,7.9
On Her Majesty's Secret Service,1969,George Lazenby,Peter R. Hunt,291.5,37.3,0.6
Quantum of Solace,2008,Daniel Craig,Marc Forster,514.2,181.4,8.1
Skyfall,2012,Daniel Craig,Sam Mendes,943.5,170.2,14.5
Spectre,2015,Daniel Craig,Sam Mendes,726.7,206.3,
The Living Daylights,1987,Timothy Dalton,John Glen,313.5,68.8,5.2
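
Renaming columns is one way to make them usable in `.query()`. As an alternative sketch, pandas 0.25 and later also accept backtick-quoted column names, and local Python variables can be referenced with `@`. The frame and the `threshold` variable below are hypothetical.

```python
import pandas as pd

# Hypothetical three-row stand-in; note the space in "Box Office".
films = pd.DataFrame(
    {
        "Actor": ["Sean Connery", "Roger Moore", "Daniel Craig"],
        "Box Office": [448.8, 535.0, 943.5],
    },
    index=["Dr. No", "Moonraker", "Skyfall"],
)

threshold = 500   # hypothetical cut-off, referenced inside the query with @

hits = films.query("`Box Office` > @threshold")
hits_without_moore = films.query("`Box Office` > @threshold and Actor != 'Roger Moore'")

print(hits)
print(hits_without_moore)
```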


### `.apply()` Method on Single Columns

In [9]:
bond = pd.read_csv("jamesbond.csv", index_col = "Film")
bond.sort_index(inplace = True)
bond.head(3)

Unnamed: 0_level_0,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Film,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
A View to a Kill,1985,Roger Moore,John Glen,275.2,54.5,9.1
Casino Royale,2006,Daniel Craig,Martin Campbell,581.5,145.3,3.3
Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,


In [10]:
def convert_to_string_and_add_millions(number):
    return str(number) + " MILLIONS!"

In [11]:
columns = ["Box Office", "Budget", "Bond Actor Salary"]
for col in columns:
    bond[col] = bond[col].apply(convert_to_string_and_add_millions)

In [12]:
bond.head(3)

Unnamed: 0_level_0,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Film,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
A View to a Kill,1985,Roger Moore,John Glen,275.2 MILLIONS!,54.5 MILLIONS!,9.1 MILLIONS!
Casino Royale,2006,Daniel Craig,Martin Campbell,581.5 MILLIONS!,145.3 MILLIONS!,3.3 MILLIONS!
Casino Royale,1967,David Niven,Ken Hughes,315.0 MILLIONS!,85.0 MILLIONS!,nan MILLIONS!
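
The last row above shows `nan MILLIONS!` because the function is called on missing values too. One possible variation, sketched on a hypothetical Series, is to check for missing values inside the function. The `label_millions` helper and the "unknown" placeholder are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical salaries in millions, with one missing value.
salary = pd.Series(
    [9.1, 3.3, np.nan],
    index=["A View to a Kill", "Casino Royale", "Casino Royale"],
)

def label_millions(number):
    """Format a value in millions, leaving a readable marker for missing data."""
    if pd.isna(number):
        return "unknown"
    return f"{number} million"

labelled = salary.apply(label_millions)   # the function is called once per element
print(labelled)
```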


### The `.copy()` Method

In [4]:
bond = pd.read_csv("jamesbond.csv", index_col = "Film")
bond.sort_index(inplace = True)
bond.head(3)

Unnamed: 0_level_0,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Film,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
A View to a Kill,1985,Roger Moore,John Glen,275.2,54.5,9.1
Casino Royale,2006,Daniel Craig,Martin Campbell,581.5,145.3,3.3
Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,


In [5]:
directors = bond["Director"]
directors.head(3)

Film
A View to a Kill          John Glen
Casino Royale       Martin Campbell
Casino Royale            Ken Hughes
Name: Director, dtype: object

In [6]:
directors["A View to a Kill"] = "Mister John Glen"

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy


In [7]:
directors.head(3)

Film
A View to a Kill    Mister John Glen
Casino Royale        Martin Campbell
Casino Royale             Ken Hughes
Name: Director, dtype: object

In [8]:
bond.head(3)

Unnamed: 0_level_0,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Film,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
A View to a Kill,1985,Roger Moore,Mister John Glen,275.2,54.5,9.1
Casino Royale,2006,Daniel Craig,Martin Campbell,581.5,145.3,3.3
Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,


In [9]:
directors = bond["Director"].copy()
directors.head(3)

Film
A View to a Kill    Mister John Glen
Casino Royale        Martin Campbell
Casino Royale             Ken Hughes
Name: Director, dtype: object

In [10]:
directors["A View to a Kill"] = "Mister John Glen"

In [11]:
directors.head(3)

Film
A View to a Kill    Mister John Glen
Casino Royale        Martin Campbell
Casino Royale             Ken Hughes
Name: Director, dtype: object

In [12]:
bond.head(3)

Unnamed: 0_level_0,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Film,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
A View to a Kill,1985,Roger Moore,Mister John Glen,275.2,54.5,9.1
Casino Royale,2006,Daniel Craig,Martin Campbell,581.5,145.3,3.3
Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,
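
In the cells above the copy is assigned the same value that `bond` already holds, so the outputs cannot show that the copy is independent. The sketch below starts from unmodified hypothetical data to make the effect visible. Whether a column selected *without* `.copy()` writes through to the parent frame depends on the pandas version and its Copy-on-Write setting, so only the `.copy()` behaviour is demonstrated here.

```python
import pandas as pd

# Hypothetical two-row stand-in for the James Bond data.
films = pd.DataFrame(
    {"Director": ["John Glen", "Martin Campbell"]},
    index=["A View to a Kill", "Casino Royale"],
)

directors = films["Director"].copy()                 # independent Series
directors["A View to a Kill"] = "Mister John Glen"   # changes only the copy

print(directors["A View to a Kill"])
print(films.loc["A View to a Kill", "Director"])     # original value is still there
```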


## Working with Text Data

In [13]:
import pandas as pd
data=[[' Anil',90,'A','very good'],['Basanti ',89,'A-','Good'],[' Charan ',70,'B-','Average'],['Dolly',40,'D+','Below Avereage'],
      ['Emily',68,'B-','Average'],['Anil',99,'A+','Awesome'],['Charan',29,'F','Bad'],['Charan',55,'C+','Below Average']]
df=pd.DataFrame(data,columns=['StudentName','Marks','Grade','Remarks'])
df

Unnamed: 0,StudentName,Marks,Grade,Remarks
0,Anil,90,A,very good
1,Basanti,89,A-,Good
2,Charan,70,B-,Average
3,Dolly,40,D+,Below Avereage
4,Emily,68,B-,Average
5,Anil,99,A+,Awesome
6,Charan,29,F,Bad
7,Charan,55,C+,Below Average


In [2]:
df['StudentName'].nunique()

5

In [3]:
df['StudentName'].count()

8

### The `.str.lower()`, `.str.upper()`, `.str.title()` and `.str.len()` Methods

In [4]:
df['StudentName'].str.lower()

0       anil
1    basanti
2     charan
3      dolly
4      emily
5       anil
6     charan
7     charan
Name: StudentName, dtype: object

In [5]:
df['StudentName'].str.upper()

0       ANIL
1    BASANTI
2     CHARAN
3      DOLLY
4      EMILY
5       ANIL
6     CHARAN
7     CHARAN
Name: StudentName, dtype: object

In [8]:
df['Remarks'].str.title()

0         Very Good
1              Good
2           Average
3    Below Avereage
4           Average
5           Awesome
6               Bad
7     Below Average
Name: Remarks, dtype: object

In [9]:
df['StudentName'].str.len()

0    4
1    7
2    6
3    5
4    5
5    4
6    6
7    6
Name: StudentName, dtype: int64
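
The `.str` accessor applies a string method to every element and passes missing values through. The sketch below uses a hypothetical Series with stray whitespace and one missing name to show that whitespace counts toward `.str.len()`, which is why names are usually stripped before being counted or compared.

```python
import pandas as pd

# Hypothetical names: leading space, trailing space, and a missing value.
names = pd.Series([" Anil", "Basanti ", None])

lowered = names.str.lower()     # missing values stay missing
lengths = names.str.len()       # whitespace is counted like any other character
stripped_lengths = names.str.strip().str.len()

print(lowered)
print(lengths)
print(stripped_lengths)
```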

### The `.str.replace()`, `.str.strip()`, `.str.lstrip()`, `.str.rstrip()` and `.str.split()` Methods

In [11]:
df["Remarks"].str.replace(" ","_")

0         very_good
1              Good
2           Average
3    Below_Avereage
4           Average
5           Awesome
6               Bad
7     Below_Average
Name: Remarks, dtype: object

In [14]:
df["StudentName"].str.strip()

0       Anil
1    Basanti
2     Charan
3      Dolly
4      Emily
5       Anil
6     Charan
7     Charan
Name: StudentName, dtype: object

In [15]:
df["Remarks"].str.split(" ").str.get(0)

0       very
1       Good
2    Average
3      Below
4    Average
5    Awesome
6        Bad
7      Below
Name: Remarks, dtype: object
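
The heading lists `.str.lstrip()` and `.str.rstrip()`, which the cells above do not exercise. The sketch below covers them on hypothetical data, along with two options that are often useful: `regex=False` to treat the `.str.replace()` pattern as a literal string, and `expand=True` to have `.str.split()` return a DataFrame with one column per piece.

```python
import pandas as pd

# Hypothetical values with stray whitespace and multi-word remarks.
names = pd.Series([" Anil", "Basanti ", " Charan "])
remarks = pd.Series(["very good", "Below Average", "Bad"])

both_sides = names.str.strip()     # remove whitespace on both ends
left_only = names.str.lstrip()     # remove leading whitespace only
right_only = names.str.rstrip()    # remove trailing whitespace only

underscored = remarks.str.replace(" ", "_", regex=False)   # literal replacement
first_word = remarks.str.split(" ").str.get(0)             # first piece of each list
parts = remarks.str.split(" ", expand=True)                # one column per piece

print(both_sides.tolist())
print(underscored.tolist())
print(parts)
```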