## Drop Columns from Dataframe

Let us get the list of columns that are not required from the column mapping and drop them from the Dataframe.
* For each source column, we have defined the target attribute details.
* **is_required** is one of the target attribute details. If **is_required** is false, we want to discard that field before writing to the target table.
* We need to develop the logic to get the list of columns where **is_required** is false.
* We can then pass that list to the `drop` function of the Dataframe.
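
The steps above can be sketched end to end as follows. This is a minimal sketch, assuming a small in-memory Dataframe in place of the CSV file: the `sample` name and its rows are hypothetical, while the column names and the mapping layout come from this section.

```python
import json
import pandas as pd

# Hypothetical sample rows standing in for the customers file
sample = pd.DataFrame({
    'customer_first_name': ['Ann', 'Bob'],
    'customer_email': ['ann@example.com', 'bob@example.com'],
    'product_name': ['Widget', 'Gadget'],
})

column_mapping = json.loads('''{
    "customer_first_name": {"target_field_name": "FirstName", "is_required": true},
    "customer_email": {"target_field_name": "Email", "is_required": true},
    "product_name": {"is_required": false}
}''')

# Collect the source columns whose target details mark them as not required
not_required = [
    name for name, details in column_mapping.items()
    if not details['is_required']
]

# drop returns a new Dataframe; sample itself is left unchanged
trimmed = sample.drop(columns=not_required)
print(list(trimmed.columns))
```

The list comprehension is one possible alternative to `filter` with a `lambda`; unpacking each item into `name` and `details` avoids indexing with `col[1]`.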

In [5]:
import pandas as pd
customers = pd.read_csv('/data/ecomm/customers/part-00000')

In [6]:
column_mapping_str = '''{
    "customer_first_name": {"target_field_name": "FirstName", "is_required": true},
    "customer_last_name": {"target_field_name": "LastName", "is_required": true},
    "customer_email": {"target_field_name": "Email", "is_required": true},
    "product_name": {"is_required": false},
    "product_subscription": {"is_required": false}
}'''

In [7]:
import json
column_mapping = json.loads(column_mapping_str)

In [8]:
# Returns a view of the dict's (key, value) pairs as tuples
column_mapping.items()

dict_items([('customer_first_name', {'target_field_name': 'FirstName', 'is_required': True}), ('customer_last_name', {'target_field_name': 'LastName', 'is_required': True}), ('customer_email', {'target_field_name': 'Email', 'is_required': True}), ('product_name', {'is_required': False}), ('product_subscription', {'is_required': False})])

In [9]:
# Convert the view to a list and get the first tuple
list(column_mapping.items())[0]

('customer_first_name',
 {'target_field_name': 'FirstName', 'is_required': True})

In [10]:
# Assigning first tuple to a variable
col = list(column_mapping.items())[0]

In [11]:
# Getting second element from the tuple
# This will return target attribute details
# It is of type dict
col[1]

{'target_field_name': 'FirstName', 'is_required': True}

In [32]:
# Getting the value of is_required from the dict
col[1]['is_required']

True

In [12]:
# Same process for 5th element in the list of tuples
list(column_mapping.items())[4] # Picking 5th element in the list

('product_subscription', {'is_required': False})

In [78]:
col = list(column_mapping.items())[4]

In [79]:
col[1]

{'is_required': False}

In [80]:
col[1]['is_required']

False

In [42]:
not col[1]['is_required']

True

In [13]:
# Returns list of items where is_required is false
list(filter(lambda col: not col[1]['is_required'], column_mapping.items()))

[('product_name', {'is_required': False}),
 ('product_subscription', {'is_required': False})]

In [14]:
# Convert to a dict
dict(list(filter(lambda col: not col[1]['is_required'], column_mapping.items())))

{'product_name': {'is_required': False},
 'product_subscription': {'is_required': False}}

In [15]:
# Get the names of the not required fields (returned as a dict_keys view)
dict(list(filter(lambda col: not col[1]['is_required'], column_mapping.items()))).keys()

dict_keys(['product_name', 'product_subscription'])

In [18]:
# Assign the names of the not required fields to a variable
columns_to_be_dropped = dict(list(filter(lambda col: not col[1]['is_required'], column_mapping.items()))).keys()
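
Note that `keys()` returns a `dict_keys` view rather than a list. The view works with `drop` in this section, but it stays tied to the dict it came from. The sketch below illustrates the difference on a shortened copy of the mapping; the names `not_required`, `keys_view` and `keys_list` are introduced here only for illustration.

```python
column_mapping = {
    'customer_email': {'target_field_name': 'Email', 'is_required': True},
    'product_name': {'is_required': False},
    'product_subscription': {'is_required': False},
}

# dict() accepts the filter object directly, so the inner list() is optional
not_required = dict(
    filter(lambda col: not col[1]['is_required'], column_mapping.items())
)

keys_view = not_required.keys()   # live view tied to the dict
keys_list = list(not_required)    # independent snapshot of the keys

# Removing an entry from the dict changes the view but not the list
del not_required['product_name']

print(list(keys_view))
print(keys_list)
```

If the names need to outlive later changes to the dict, wrapping the expression in `list(...)` is the safer choice.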

* You can use `drop` on the Dataframe to drop the columns. Pass the column names using the `columns` keyword argument.
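
As a hedged illustration of the `columns` keyword, the sketch below compares it with the `labels` plus `axis=1` form and shows the `errors` parameter. The two-row `df` is hypothetical and used only for this example.

```python
import pandas as pd

# Hypothetical two-row Dataframe used only for illustration
df = pd.DataFrame({
    'customer_email': ['ann@example.com', 'bob@example.com'],
    'product_name': ['Widget', 'Gadget'],
})

# Keyword form used in this section
by_keyword = df.drop(columns=['product_name'])

# Equivalent form: labels plus axis=1 (axis=0, the default, targets rows)
by_axis = df.drop(['product_name'], axis=1)

# errors='ignore' skips names that are not present instead of raising KeyError
tolerant = df.drop(
    columns=['product_name', 'product_subscription'],
    errors='ignore',
)

print(by_keyword.equals(by_axis))
print(list(tolerant.columns))
```

`errors='ignore'` may be useful when the mapping lists columns that a particular file does not contain; with the default `errors='raise'`, a missing name raises `KeyError`.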

In [19]:
customers.drop?

Signature:
customers.drop(
    labels=None,
    axis=0,
    index=None,
    columns=None,
    level=None,
    inplace=False,
    errors='raise',
)
Docstring:
Drop specified labels from rows or columns.

Remove rows or columns by specifying label names and corresponding
axis, or by specifying directly index or column names. When using a
multi-index, labels on different levels can be removed by specifying
the level.

Parameters
----------

In [20]:
# Returns a new Dataframe without the not required fields
customers.drop(columns=columns_to_be_dropped)

Unnamed: 0,customer_first_name,customer_last_name,customer_email
0,Cassaundra,Collinson,ccollinson0@alibaba.com
1,Rozamond,Oene,roene1@technorati.com
2,Gus,Hawick,ghawick2@dagondesign.com
3,Delano,Ashbey,dashbey3@purevolume.com
4,Fara,Simondson,fsimondson4@umn.edu
5,Myrilla,Gates,mgates5@sina.com.cn
6,Arabela,Tweedlie,atweedlie6@comcast.net
7,Loise,Schindler,lschindler7@discovery.com
8,Storm,McBrearty,smcbrearty8@ovh.net
9,Westley,Matityahu,wmatityahu9@altervista.org
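
The `Unnamed: 0` column in the output is not part of the column mapping, so the drop leaves it in place. pandas assigns that name when the first header cell of a CSV file is empty, which often means a saved row index. A minimal sketch, assuming that is the case for this file (the inline CSV text below is hypothetical), reads that column as the index instead:

```python
import io
import pandas as pd

# Hypothetical CSV text whose first header cell is empty
csv_text = ',customer_first_name,product_name\n0,Ann,Widget\n1,Bob,Gadget\n'

# Default read: the unnamed first column becomes a regular column
default_read = pd.read_csv(io.StringIO(csv_text))
print(list(default_read.columns))

# index_col=0 treats the first column as the row index
indexed_read = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(list(indexed_read.columns))
```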
