## Remove Duplicates

### duplicated()
1. This is a record-level method: it returns a boolean Series with one value per row, `True` where the row duplicates an earlier row.
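Because `duplicated()` yields one boolean per row, summing it counts the duplicate rows. A minimal sketch on a small made-up frame `df` (not from the original notes):

```python
import pandas as pd

# Made-up frame: row 2 repeats row 0 exactly
df = pd.DataFrame({'k1': ['a', 'b', 'a'], 'k2': [1, 2, 1]})

# Default keep='first': only the later copy is flagged
mask = df.duplicated()
print(mask.tolist())       # [False, False, True]
print(int(mask.sum()))     # 1 duplicated row

# keep=False flags every row that has an identical twin
all_copies = df.duplicated(keep=False)
print(all_copies.tolist())  # [True, False, True]
```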

### drop_duplicates()
1. This method returns a DataFrame containing only the rows for which `duplicated()` is False.<br>
2. This method does not modify the original DataFrame unless you pass the [inplace=True] argument. <br>
3. To detect and drop duplicates based on a particular column only, pass the column name (or a list of names) as the first argument, `subset`: data.drop_duplicates(['k1'])<br>
4. By default, drop_duplicates keeps the first occurrence. <br>
    a. To keep the last occurrence instead, use the [keep='last'] argument. <br>
    b. [keep=False] treats every copy as a duplicate, so all rows that have a duplicate are dropped.<br>
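The first two points can be checked directly: selecting rows with the inverted `duplicated()` mask gives the same frame as `drop_duplicates()`, and the original frame keeps all seven rows unless `inplace=True` is used. A sketch on the same `data` as the cells below (`deduped` is just an illustrative name):

```python
import pandas as pd

data = pd.DataFrame({'k1': ['one', 'two'] * 3 + ['two'],
                     'k2': [1, 1, 2, 3, 3, 4, 4]})

# drop_duplicates() keeps exactly the rows where duplicated() is False
via_method = data.drop_duplicates()
via_mask = data[~data.duplicated()]
print(via_method.equals(via_mask))  # True

# Without inplace=True the original frame is untouched
print(len(data))                    # 7

# With inplace=True the frame itself is modified and None is returned
deduped = data.copy()
deduped.drop_duplicates(inplace=True)
print(len(deduped))                 # 6
```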

In [1]:
```python
import pandas as pd
import numpy as np
```

In [2]:
```python
data = pd.DataFrame({'k1': ['one', 'two'] * 3 + ['two'], 'k2': [1, 1, 2, 3, 3, 4, 4]})
data
```

|   | k1  | k2 |
|---|-----|----|
| 0 | one | 1  |
| 1 | two | 1  |
| 2 | one | 2  |
| 3 | two | 3  |
| 4 | one | 3  |
| 5 | two | 4  |
| 6 | two | 4  |


In [3]:
```python
data.duplicated()
```

```
0    False
1    False
2    False
3    False
4    False
5    False
6     True
dtype: bool
```

In [4]:
```python
data.drop_duplicates()
```

|   | k1  | k2 |
|---|-----|----|
| 0 | one | 1  |
| 1 | two | 1  |
| 2 | one | 2  |
| 3 | two | 3  |
| 4 | one | 3  |
| 5 | two | 4  |


In [6]:
```python
data.drop_duplicates(['k1'])
```

|   | k1  | k2 |
|---|-----|----|
| 0 | one | 1  |
| 1 | two | 1  |


In [9]:
```python
data.drop_duplicates(['k1'], keep='last')
```

|   | k1  | k2 |
|---|-----|----|
| 4 | one | 3  |
| 6 | two | 4  |


In [10]:
```python
data.drop_duplicates(['k1'], keep=False)
```

|   | k1 | k2 |
|---|----|----|
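The result above is empty because every value of `k1` occurs more than once. With `keep=False`, a row survives only if it has no duplicate at all; when both columns are checked, only rows 5 and 6 match each other, so only those two are removed. A small sketch on the same data:

```python
import pandas as pd

data = pd.DataFrame({'k1': ['one', 'two'] * 3 + ['two'],
                     'k2': [1, 1, 2, 3, 3, 4, 4]})

# Rows 5 and 6 are the only pair sharing both k1 and k2
result = data.drop_duplicates(['k1', 'k2'], keep=False)
print(result.index.tolist())  # [0, 1, 2, 3, 4]
```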


## Transforming Data Using a Function or Mapping

1. map() is a Series method. Older pandas versions have no DataFrame.map (use applymap for element-wise work on a DataFrame there); since pandas 2.1, DataFrame.map exists as the element-wise replacement for applymap. <br>
2. map() accepts a function, a dict-like object, or a Series.<br>
3. When the argument is a dictionary, values in the Series that are not keys of the dictionary are converted to NaN. <br>
4. When the argument is a function, use the [na_action='ignore'] argument to skip missing values: NaN entries are passed through as NaN instead of being sent to the function.
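The three argument types and `na_action` can be compared on one small Series. A minimal sketch; the Series `s` and the lookup values are made up for illustration:

```python
import numpy as np
import pandas as pd

s = pd.Series(['cat', 'dog', np.nan, 'rabbit'])

# dict: 'rabbit' is not a key, so it becomes NaN (the NaN input stays NaN too)
by_dict = s.map({'cat': 'kitten', 'dog': 'puppy'})
print(by_dict.tolist())

# Series: each value of s is looked up in the index of `lookup`
lookup = pd.Series({'cat': 1, 'dog': 2, 'rabbit': 3})
by_series = s.map(lookup)
print(by_series.tolist())

# function: without na_action='ignore', str.upper would receive the float NaN
# and raise TypeError; with it, NaN is passed through untouched
by_func = s.map(str.upper, na_action='ignore')
print(by_func.tolist())
```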

In [13]:
```python
data = pd.DataFrame({'food': ['bacon', 'pulled pork', 'bacon',
                              'Pastrami', 'corned beef', 'bacon',
                              'pastrami', 'honey ham', 'nova lox'],
                     'ounces': [4, 3, 12, 6, 7.5, 8, 3, 5, 6]})
data
```

|   | food        | ounces |
|---|-------------|--------|
| 0 | bacon       | 4.0    |
| 1 | pulled pork | 3.0    |
| 2 | bacon       | 12.0   |
| 3 | Pastrami    | 6.0    |
| 4 | corned beef | 7.5    |
| 5 | bacon       | 8.0    |
| 6 | pastrami    | 3.0    |
| 7 | honey ham   | 5.0    |
| 8 | nova lox    | 6.0    |


In [16]:
```python
# To apply a string method to every element of a Series, go through the .str accessor
lowercased = data['food'].str.lower()
meat_to_animal = {'bacon': 'pig', 'pulled pork': 'pork', 'pastrami': 'cow',
                  'honey ham': 'pig', 'nova lox': 'salmon'}
data['animal'] = lowercased.map(meat_to_animal)
data
```

|   | food        | ounces | animal |
|---|-------------|--------|--------|
| 0 | bacon       | 4.0    | pig    |
| 1 | pulled pork | 3.0    | pork   |
| 2 | bacon       | 12.0   | pig    |
| 3 | Pastrami    | 6.0    | cow    |
| 4 | corned beef | 7.5    | NaN    |
| 5 | bacon       | 8.0    | pig    |
| 6 | pastrami    | 3.0    | cow    |
| 7 | honey ham   | 5.0    | pig    |
| 8 | nova lox    | 6.0    | salmon |
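Row 4 comes back as NaN because 'corned beef' has no entry in `meat_to_animal`. Two ways to supply a default instead are sketched below on a trimmed copy of the data, using the hypothetical label `'unknown'`: fill the NaN after mapping, or pass a dict subclass that defines `__missing__` (such as `collections.defaultdict`), whose default pandas uses in place of NaN.

```python
from collections import defaultdict

import pandas as pd

# Trimmed copy of the data; 'unknown' is a made-up default label
food = pd.Series(['bacon', 'Pastrami', 'corned beef'])
meat_to_animal = {'bacon': 'pig', 'pastrami': 'cow'}
lowercased = food.str.lower()

# Option 1: let unmatched keys become NaN, then fill them afterwards
filled = lowercased.map(meat_to_animal).fillna('unknown')
print(filled.tolist())        # ['pig', 'cow', 'unknown']

# Option 2: defaultdict defines __missing__, so its default replaces NaN
with_default = lowercased.map(defaultdict(lambda: 'unknown', meat_to_animal))
print(with_default.tolist())  # ['pig', 'cow', 'unknown']
```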


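In [19]:
```python
# Raises KeyError: 'corned beef' is not a key of meat_to_animal.
# na_action='ignore' only skips NaN inputs; it does not handle missing dictionary keys.
data['food'].map(lambda x: meat_to_animal[x.lower()], na_action='ignore')
```

```
KeyError: 'corned beef'
```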
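The lambda in the cell above indexes the dictionary directly, so any food without an entry raises. A minimal fix, assuming unmatched foods should simply come back as missing values, swaps the bracket lookup for `dict.get` (shown on a trimmed copy of the data):

```python
import pandas as pd

# Trimmed copy of the notebook data
data = pd.DataFrame({'food': ['bacon', 'Pastrami', 'corned beef']})
meat_to_animal = {'bacon': 'pig', 'pastrami': 'cow'}

# dict.get returns None for an unknown key instead of raising KeyError
animal = data['food'].map(lambda x: meat_to_animal.get(x.lower()))
print(animal.tolist())
```

pandas treats the `None` returned for 'corned beef' as a missing value, so `animal.isna()` flags that row just as the dictionary version of map() does.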

## Replacing Values