# Pandas - Resource for All Essential Methods and Techniques

*This notebook is for anyone who works with, or wishes to learn, **Pandas**. It covers most of the essential methods and techniques you need to clean and manipulate data consistently.*

## Table of Contents:
<ul>
    <li><a href='#intro'>Introduction</a></li>
    <li><a href='#dfc'>DataFrame Creation</a></li>
    <li><a href='#drdg'>Creating a Dataset from a Random Data Generator</a></li>
    <li><a href='#method_info'>Methods to Obtain Information from Data</a></li>
    <li><a href='#method_index'>Indexing Methods</a></li>
    <li><a href='#method_filt'>Filtering Methods</a></li>
    <li><a href='#method_add'>Methods for Adding/Removing Rows & Columns</a></li>
    <li><a href='#method_upd'>Methods for Updating Rows & Columns</a></li>
    <li><a href='#method_grp'>Grouping Methods</a></li>
</ul>

<a id='intro'></a>
## Introduction

>***Pandas*** *is a powerful, fast, and flexible Python library for Data Preparation, Cleaning and Manipulation. For more information on* ***Pandas***, *visit https://pandas.pydata.org/docs/user_guide/index.html*

This Jupyter notebook serves as a reference for people who are new to **Pandas** and want a single resource covering the most frequently used **Pandas** methods and how to use them.

This notebook is also a resource for individuals to practice getting familiar with **Pandas** without having to deal with large datasets.

If you have decided to learn **Pandas**, I assume you already have basic knowledge of Python. If not, I suggest you become familiar with the language before attempting to learn **Pandas**.

Throughout this notebook you will find me saying 'methods' a lot. Strictly speaking, a method is a function that belongs to an object (such as a DataFrame), and some of the items shown here, such as **df.shape**, are attributes rather than methods. For simplicity, I refer to all of these **Pandas** built-ins as 'methods'.
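To make the terminology concrete, here is a minimal sketch of the three kinds of calls you will meet in this notebook. The `demo` DataFrame is made up purely for this illustration.

```python
import pandas as pd

# A tiny, made-up DataFrame used only for this illustration.
demo = pd.DataFrame({"x": [1, 2, 3]})

n_rows = len(demo)          # built-in Python function, applied to the object
first_two = demo.head(2)    # DataFrame method, called with parentheses
dims = demo.shape           # DataFrame attribute, accessed without parentheses

print(n_rows, dims)         # 3 (3, 1)
```

Calling `demo.shape()` with parentheses would raise a `TypeError`, because a tuple is not callable.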

<br>

#### Importing Packages & Libraries

In [1]:
import numpy as np
import pandas as pd
import random

This notebook covers 2 common ways to create a DataFrame in **Pandas**:
<ol>
    <li>Create a DataFrame from Dictionary</li>
    <li>Import .csv file into DataFrame</li>
</ol>

<a id='dfc'></a>
## DataFrame Creation

### 1. Create a DataFrame from Dictionary

a) You can construct a dictionary directly inside the **pd.DataFrame( )** constructor call.

In [2]:
df = pd.DataFrame(
    {
        "A": 1.0,
        "B": pd.Timestamp("20220920"),
        "C": pd.Series(1, index=list(range(4)), dtype="float32"),
        "D": np.array([3] * 4, dtype="int32"),
        "E": pd.Categorical(["test", "train", "test", "train"]),
        "F": "foo",
})

df

Unnamed: 0,A,B,C,D,E,F
0,1.0,2022-09-20,1.0,3,test,foo
1,1.0,2022-09-20,1.0,3,train,foo
2,1.0,2022-09-20,1.0,3,test,foo
3,1.0,2022-09-20,1.0,3,train,foo


<br>

b) You can define a dictionary first, then pass it as an argument to **pd.DataFrame( )**.

In [3]:
df_dict = {
        "A": 1.0,
        "B": pd.Timestamp("20220920"),
        "C": pd.Series(1, index=list(range(4)), dtype="float32"),
        "D": np.array([3] * 4, dtype="int32"),
        "E": pd.Categorical(["test", "train", "test", "train"]),
        "F": "foo",
} 

df = pd.DataFrame(df_dict)

df

Unnamed: 0,A,B,C,D,E,F
0,1.0,2022-09-20,1.0,3,test,foo
1,1.0,2022-09-20,1.0,3,train,foo
2,1.0,2022-09-20,1.0,3,test,foo
3,1.0,2022-09-20,1.0,3,train,foo


<br>

### 2. Create a DataFrame by importing .csv file

You can create a DataFrame with the **pd.read_csv( )** function by passing the path of the .csv file as an argument.

In [4]:
df = pd.read_csv('./sample_df.csv')
df

Unnamed: 0,A,B,C,D,E,F
0,1.0,2022-09-20,1.0,3,test,foo
1,1.0,2022-09-20,1.0,3,train,foo
2,1.0,2022-09-20,1.0,3,test,foo
3,1.0,2022-09-20,1.0,3,train,foo


***In real-life situations you will most likely use 'pd.read_csv( )' more often, because in most cases the data you are going to analyse or manipulate will be a .csv file exported from a database or a .csv file you found on the web.***
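If you want to try **pd.read_csv( )** without a file on disk, the sketch below reads CSV text from memory instead. The CSV content is hypothetical and only loosely mirrors the sample table above; `parse_dates` is an optional argument that I use here to turn column `B` into real dates.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a file on disk.
csv_text = """A,B,E
1.0,2022-09-20,test
1.0,2022-09-20,train
"""

# pd.read_csv accepts a file path or any file-like object.
# parse_dates converts column B from text to a datetime dtype.
sample = pd.read_csv(io.StringIO(csv_text), parse_dates=["B"])

print(sample.dtypes)
```

Without `parse_dates`, column `B` would be loaded as plain text.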
<br>
<br>
<hr>
<br>

> The dataset we are going to use will be a generic dataset initiated from a dictionary using a <a href='#rdg'>**random data generator**</a>.

<a id='drdg'></a>
## Creating a Dataset from a Random Data Generator

Below is a basic dictionary generator built to help you initiate a dataset with random values without much work.

This can be helpful when you are learning Pandas and can't find a simple dataset to work with. Using this random data generator, you can build a simple dataset.

The code below generates a dictionary with random values that can be transformed into a DataFrame using **Pandas**.

The function writes the generated data to **df.csv**, which we will import into Pandas using **pd.read_csv( )**. Its return value is the number of rows written.

> **YOU DON'T NEED TO RUN THE CODE!!!**<br>
<br>
*the **df.csv** file that we will use to import the data into Pandas is already included in the directory that contains this Jupyter notebook.* 

However, if you wish to run the code, note that this is a random data generator, which means that every time you run it you will get a new dataset. So if you would like to follow along with matching data, I suggest you don't run the code.

***Otherwise, feel free to run the code and use it to your liking.***
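If you do run the generator and still want repeatable results, one option is to seed Python's `random` module first. This is a small sketch of the idea, not part of the generator itself; `draw_ages` is a hypothetical helper and `42` is an arbitrary seed.

```python
import random

def draw_ages(n):
    # Hypothetical helper: same kind of call the generator uses for 'age'.
    return [random.randint(25, 45) for _ in range(n)]

random.seed(42)          # arbitrary seed chosen for this example
first_run = draw_ages(5)

random.seed(42)          # resetting the seed replays the same sequence
second_run = draw_ages(5)

print(first_run == second_run)   # True
```

Note that a seeded run is repeatable on your machine, but it will still not reproduce the **df.csv** shipped with this notebook, which was generated without a known seed.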

<a id='rdg'></a>
### Random Data Generator

The purpose of this function is to help you save time building datasets
when learning Pandas.
<br>
<br>
> Feel free to change any values to your liking (add or remove names, countries, 
or dictionary keys, etc.).

In [103]:
# !!!!! FUNCTION IS DISABLED, CHECK BOTTOM OF CELL TO ENABLE !!!!!!


def build_dict(num=100):
    first_name = ['Adam', 'Aaron', 'Andrew', 'Benjamin', 'Bill', 'Conrad', 'Carl',
                  'Daniel','David', 'Erik', 'Edwin', 'Evan', 'Frank', 'Frederick',
                  'Gabriel', 'Ian', 'Jack', 'John', 'Jason', 'Mark', 'Martin',
                  'Magnus', 'Neil', 'Oscar', 'Oswald', 'Patrick', 'Peter', 'Quentin', 
                  'Russel', 'Ron', 'Steven', 'Stanislas', 'Tyler', 'Victor', 'William', 'Xavier', 
                  'Amin', 'Salim']
    
    last_name = ['Morten', 'Atkinson', 'Manning', 'Stevenson', 'Eriksen', 'Robinson', 'Perreira',
                 'Flick', 'Tanner', 'Fraser', 'Lehmann', 'Hansen', 'Hassan', 'Magnusson', 'Hvar', 
                 'Kimmel', 'Ronson', 'Hamilton', 'Thomas', 'Richards', 'Terry', 'Gerard',
                 'Hank', 'Williamson', 'Roberts', 'Smith', 'Towns', 'Phillips', 'Woodburn', 'Patten', 
                 'Fernandez', 'Williams', 'Bulsic', 'Kramer', 'Mendez', 'Albert', 'Samir', 'Aygun']
    
    countries = ['USA', 'UK', 'Scotland', 'Canada', 'Australia', 'New Zealand']
    
    languages = ['html; CSS; Javascript', 'Python', 'Java', 'Dart', 'PHP', 'Ruby', 'R']
    
    names_dict = {'first_name': [],
             'last_name': [],
             'email': [],
             'age': [],
             'country': [],
             'salary': [],
             'languages': [],     
            }

    for i in range(num):
        firstname = random.choice(first_name)
        lastname = random.choice(last_name)
        email = '{}{}{}@gmail.com'.format(firstname[0].lower(), lastname.lower(), random.randint(10, 99))
        age = random.randint(25, 45)
        country = random.choice(countries)
        salary = random.randint(40, 120) * 1000
        language = random.choice(languages)
        
        # Skip this draw if the same first/last name pair is already in the dataset.
        # Skipped draws are not replaced, so the result can have fewer than 'num' rows.
        if (firstname, lastname) in zip(names_dict['first_name'], names_dict['last_name']):
            continue
        
        else:
            names_dict['first_name'] += [firstname]
            names_dict['last_name'] += [lastname]
            names_dict['email'] += [email]
            names_dict['age'] += [age]
            names_dict['country'] += [country]
            names_dict['salary'] += [salary]
            names_dict['languages'] += [language]
            
        
    
    df = pd.DataFrame(names_dict)
    df.to_csv('./df.csv', index=False)
    
    return len(names_dict['first_name'])


# to control the number of values that will exist in the dictionary, you can pass 
# the number you want as an arg when calling the 'build_dict()' function,
# as demonstrated below. (default = 100)


# !!!!!! TO ENABLE THIS FUNCTION, REMOVE '#' BELOW & RUN THE CELL !!!!!!

#build_dict(100)

**N.B.**: **build_dict( )** returns the number of rows it generated, which it obtains by running Python's ***len( )*** function on the generated dictionary's list of first names. This number can be smaller than the number we passed as an argument to the <a href='#rdg'>random data generator's</a> **build_dict( )** function.
<br>
<br>
This is the effect of the **if** conditional statement in the for loop inside **build_dict( )**, which skips duplicate names, as explained <a href='#rdg'>*above*</a>.
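The effect is easier to see with deliberately tiny name pools. The sketch below is a simplified stand-in for the generator, not the generator itself: it tracks full (first, last) pairs in a set, and the names, the seed, and the `requested` count are assumptions made for this example.

```python
import random

random.seed(0)  # arbitrary seed so the run is repeatable

# Deliberately tiny, hypothetical name pools so repeats are very likely.
first_names = ["Adam", "Bill", "Carl"]
last_names = ["Smith", "Hank"]

requested = 20
seen = set()
for _ in range(requested):
    pair = (random.choice(first_names), random.choice(last_names))
    seen.add(pair)   # a set silently ignores repeated pairs

# Only 3 * 2 = 6 distinct pairs exist, so len(seen) can never exceed 6.
print(requested, len(seen))
```

With the much larger name lists in **build_dict( )**, repeats are rarer, but the same mechanism explains why you can ask for 100 rows and get fewer.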

### Important Read

*After running the random data generator we can see that it generated a new **'df.csv'** file. This is the file we will be using from now on, and we will import it into **Pandas** in the next section.*
<br>

> *If you did not run the code, you can use the **'df.csv'** file located in the directory that contains this Jupyter notebook.*



### Creating DataFrame

In [31]:
df = pd.read_csv('./df.csv')
df

Unnamed: 0,first_name,last_name,email,age,country,salary,languages
0,Salim,Albert,salbert29@gmail.com,38,Scotland,114000,Java
1,Xavier,Ronson,xronson67@gmail.com,37,Australia,48000,Ruby
2,Frank,Hank,fhank55@gmail.com,27,UK,43000,Python
3,Xavier,Williams,xwilliams75@gmail.com,44,USA,77000,Ruby
4,Patrick,Smith,psmith73@gmail.com,35,Scotland,60000,R
5,Gabriel,Lehmann,glehmann43@gmail.com,40,Canada,51000,html; CSS; Javascript
6,Gabriel,Mendez,gmendez93@gmail.com,35,Australia,49000,Ruby
7,Russel,Magnusson,rmagnusson59@gmail.com,34,Canada,50000,Ruby
8,Erik,Albert,ealbert85@gmail.com,43,Scotland,101000,R
9,John,Atkinson,jatkinson43@gmail.com,38,Scotland,53000,Java


<hr>
<br>


<a id='method_info'></a>
### Methods to Obtain Information from Data

> Now that the data is loaded, I am going to go through the most commonly used methods in **Pandas** and explain the purpose of each one.
<br>
<br>
**You can find all the methods at** https://pandas.pydata.org/docs/reference/frame.html

In [32]:
df.info()     # Returns a concise summary of the DataFrame

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 49 entries, 0 to 48
Data columns (total 7 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   first_name  49 non-null     object
 1   last_name   49 non-null     object
 2   email       49 non-null     object
 3   age         49 non-null     int64 
 4   country     49 non-null     object
 5   salary      49 non-null     int64 
 6   languages   49 non-null     object
dtypes: int64(2), object(5)
memory usage: 2.8+ KB


In [42]:
df.head()     # Returns the first 5 rows (You can pass the number of rows you want as arg)

Unnamed: 0,first_name,last_name,email,age,country,salary,languages
0,Salim,Albert,salbert29@gmail.com,38,Scotland,114000,Java
1,Xavier,Ronson,xronson67@gmail.com,37,Australia,48000,Ruby
2,Frank,Hank,fhank55@gmail.com,27,UK,43000,Python
3,Xavier,Williams,xwilliams75@gmail.com,44,USA,77000,Ruby
4,Patrick,Smith,psmith73@gmail.com,35,Scotland,60000,R


In [43]:
df.tail()     # Returns the last 5 rows (You can pass the number of rows you want as arg)

Unnamed: 0,first_name,last_name,email,age,country,salary,languages
44,Victor,Ronson,vronson36@gmail.com,32,USA,102000,PHP
45,Edwin,Phillips,ephillips71@gmail.com,27,Scotland,97000,Dart
46,Neil,Manning,nmanning78@gmail.com,38,Canada,49000,Java
47,Steven,Kimmel,skimmel64@gmail.com,32,Scotland,91000,Python
48,Frank,Fernandez,ffernandez63@gmail.com,40,New Zealand,70000,html; CSS; Javascript


In [153]:
df.max()     # Returns maximum values for columns in DataFrame
# (this cell was run after row 3 was updated further down, so 'age' shows 70 here)

first_name                   Xavier
last_name                  Woodburn
email           xtanner62@gmail.com
age                              70
country                         USA
salary                       115000
languages     html; CSS; Javascript
dtype: object

In [158]:
df.min()     # Returns minimum values for columns in DataFrame

first_name                  Aaron
last_name                  Albert
email         abulsic65@gmail.com
age                            25
country                 Australia
salary                      40000
languages                    Dart
dtype: object

In [44]:
df.shape     # Returns a tuple representing the dimensions of the DataFrame

(49, 7)

In [45]:
df.dtypes     # Returns data types of columns in DataFrame

first_name    object
last_name     object
email         object
age            int64
country       object
salary         int64
languages     object
dtype: object

In [46]:
df.describe()     # Returns descriptive statistics on DataFrame

Unnamed: 0,age,salary
count,49.0,49.0
mean,34.653061,74183.673469
std,5.19034,23997.980357
min,25.0,40000.0
25%,30.0,51000.0
50%,35.0,70000.0
75%,38.0,97000.0
max,45.0,115000.0


In [77]:
df.duplicated().sum()     # Returns the number of duplicate rows

0

In [76]:
df.isnull().sum()     # Returns the number of missing values in each column

first_name    0
last_name     0
email         0
age           0
country       0
salary        0
languages     0
dtype: int64

> *These are the most common methods you will use to obtain preliminary information about the DataFrame.*
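Because **df.csv** has no duplicates or missing values, the last two checks above only ever show zeros. Here is a small sketch on a made-up DataFrame (the `toy` data is an assumption for illustration) where they have something to report.

```python
import numpy as np
import pandas as pd

# Made-up data: one repeated row and one missing age.
toy = pd.DataFrame({
    "name": ["Ann", "Bob", "Bob", "Cy"],
    "age": [30, 41, 41, np.nan],
})

print(toy.shape)               # (4, 2)
print(toy.duplicated().sum())  # 1  (the second 'Bob' row)
print(toy.isnull().sum())      # name: 0, age: 1
```

`duplicated()` marks only the later copies of a row as duplicates, which is why two identical rows count as 1.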

<br>
<br>
<hr>

<a id='method_index'></a>
### Indexing Methods

> In this section, I will demonstrate how to index specific columns from the DataFrame and the methods that can be applied to those columns.

#### 1. Indexing Columns

In [148]:
df['email']     # Returns values for selected column

0         salbert29@gmail.com
1         xronson67@gmail.com
2           fhank55@gmail.com
3         mtomassen@gmail.com
4          psmith73@gmail.com
5        glehmann43@gmail.com
6         gmendez93@gmail.com
7      rmagnusson59@gmail.com
8         ealbert85@gmail.com
9       jatkinson43@gmail.com
10        pkimmel29@gmail.com
11         mtowns19@gmail.com
12         qflick57@gmail.com
13         cterry70@gmail.com
14        afraser13@gmail.com
15        jfraser66@gmail.com
16        dkramer30@gmail.com
17       eroberts52@gmail.com
18      watkinson80@gmail.com
19      awilliams14@gmail.com
20        ffraser90@gmail.com
21        abulsic65@gmail.com
22      iperreira50@gmail.com
23        chansen94@gmail.com
24        ethomas35@gmail.com
25     bstevenson24@gmail.com
26        xtanner62@gmail.com
27      jrichards32@gmail.com
28       tmanning62@gmail.com
29        pmendez35@gmail.com
30          dhvar58@gmail.com
31      bwilliams90@gmail.com
32         asamir33@gmail.com
33    ewil

In [160]:
df[['last_name', 'email', 'salary']]

# Returns a DataFrame from selected columns. Note that when accessing multiple columns
# you have to pass column names as a list.

Unnamed: 0,last_name,email,salary
0,Albert,salbert29@gmail.com,114000
1,Ronson,xronson67@gmail.com,48000
2,Hank,fhank55@gmail.com,43000
3,Tomassen,mtomassen@gmail.com,77000
4,Smith,psmith73@gmail.com,60000
5,Lehmann,glehmann43@gmail.com,51000
6,Mendez,gmendez93@gmail.com,49000
7,Magnusson,rmagnusson59@gmail.com,50000
8,Albert,ealbert85@gmail.com,101000
9,Atkinson,jatkinson43@gmail.com,53000


> *Indexing allows you to run some of the general methods discussed earlier on the specified column(s) you've indexed.*

In [142]:
df['age'].max()     # Returns maximum value for selected column

70

In [143]:
df['salary'].min()     # Returns minimum value for selected column

40000

In [144]:
df['languages'].value_counts()

# Returns Series with count of unique values in specified column

Python                   8
R                        8
PHP                      8
Java                     7
Ruby                     7
html; CSS; Javascript    6
Dart                     5
Name: languages, dtype: int64

<br>

#### 2. Indexing Rows

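> To access values for a specific row, we use the **.iloc** and **.loc** indexers.
<br>
<br>
**.iloc** and **.loc** look similar, but **.iloc** selects by *integer position*, while **.loc** selects by *index label*. In this DataFrame the index labels are the default integers 0, 1, 2, and so on, so both accept the same numbers here. **.loc** also accepts column names and boolean filters.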
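One way to see how the two indexers differ is to use an index that is not the default 0, 1, 2. The sketch below uses a made-up DataFrame with text labels; the names and salaries are assumptions for illustration only.

```python
import pandas as pd

# Made-up data with text labels as the index instead of 0, 1, 2.
toy = pd.DataFrame(
    {"salary": [50000, 60000, 70000]},
    index=["ann", "bob", "cy"],
)

by_position = toy.iloc[0, 0]           # first row, first column, by position
by_label = toy.loc["ann", "salary"]    # row labelled 'ann', column 'salary'

pos_slice = toy.iloc[0:1]              # position 0 only (end is excluded)
lab_slice = toy.loc["ann":"bob"]       # 'ann' through 'bob' (end is included)

print(by_position, by_label)           # 50000 50000
print(len(pos_slice), len(lab_slice))  # 1 2
```

The slicing difference also applies to integer labels, which is why `df.loc[0:8]` returns nine rows while `df.iloc[0:8]` returns eight.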

In [206]:
df.iloc[38]     # Returns values for selected row index

first_name                   Xavier
last_name                  Hamilton
email         xhamilton75@gmail.com
age                            41.0
country                         USA
salary                      48000.0
languages                       PHP
Name: 38, dtype: object

In [90]:
df.iloc[0, 1]     # Returns value for selected row & specified column 

'Albert'

In [99]:
df.iloc[[0, 23]]     # Returns DataFrame with values for selected rows

Unnamed: 0,first_name,last_name,email,age,country,salary,languages
0,Salim,Albert,salbert29@gmail.com,38,Scotland,114000,Java
23,Carl,Hansen,chansen94@gmail.com,35,USA,45000,Java


In [92]:
df.iloc[[0, 11], 2]     # Returns values for selected rows in a specific column

0     salbert29@gmail.com
11     mtowns19@gmail.com
Name: email, dtype: object

In [100]:
df.loc[[0, 32]]     # Returns DataFrame values for selected rows

Unnamed: 0,first_name,last_name,email,age,country,salary,languages
0,Salim,Albert,salbert29@gmail.com,38,Scotland,114000,Java
32,Andrew,Samir,asamir33@gmail.com,45,Australia,98000,R


In [96]:
df.loc[0, 'last_name']     # Returns value for selected row & specified column

'Albert'

In [102]:
df.loc[0:8, ['email', 'last_name']]

# Returns DataFrame with values for index range & specified columns
# (with .loc, the end of the range, 8, is included)

Unnamed: 0,email,last_name
0,salbert29@gmail.com,Albert
1,xronson67@gmail.com,Ronson
2,fhank55@gmail.com,Hank
3,xwilliams75@gmail.com,Williams
4,psmith73@gmail.com,Smith
5,glehmann43@gmail.com,Lehmann
6,gmendez93@gmail.com,Mendez
7,rmagnusson59@gmail.com,Magnusson
8,ealbert85@gmail.com,Albert


> Those are the most common methods for indexing rows and columns in **Pandas**.

<a id='method_filt'></a>
### Filtering Methods

> In this part, I will demonstrate how to create filters to help you pick specific values from the DataFrame.

In [207]:
df[df['last_name'] == 'Manning']

# Returns all rows where 'last_name' is 'Manning'

Unnamed: 0,first_name,last_name,email,age,country,salary,languages
28,Tyler,Manning,tmanning62@gmail.com,29.0,UK,113000.0,R
46,Neil,Manning,nmanning78@gmail.com,38.0,Canada,49000.0,Java


In [118]:
df[df['salary'] > 100000]

# Returns all rows where 'salary' is greater than 100,000

Unnamed: 0,first_name,last_name,email,age,country,salary,languages
0,Salim,Albert,salbert29@gmail.com,38,Scotland,114000,Java
8,Erik,Albert,ealbert85@gmail.com,43,Scotland,101000,R
10,Patrick,Kimmel,pkimmel29@gmail.com,39,Canada,104000,html; CSS; Javascript
14,Amin,Fraser,afraser13@gmail.com,26,UK,115000,R
21,Amin,Bulsic,abulsic65@gmail.com,30,UK,110000,Java
27,Jason,Richards,jrichards32@gmail.com,34,Scotland,107000,Java
28,Tyler,Manning,tmanning62@gmail.com,29,UK,113000,R
43,Oscar,Hvar,ohvar35@gmail.com,30,USA,112000,Python
44,Victor,Ronson,vronson36@gmail.com,32,USA,102000,PHP


In [119]:
df[df['age'] <= 27]

# Returns all rows where 'age' is less than or equal to 27

Unnamed: 0,first_name,last_name,email,age,country,salary,languages
2,Frank,Hank,fhank55@gmail.com,27,UK,43000,Python
14,Amin,Fraser,afraser13@gmail.com,26,UK,115000,R
29,Peter,Mendez,pmendez35@gmail.com,25,UK,84000,Ruby
41,Ron,Gerard,rgerard20@gmail.com,25,Australia,63000,Python
45,Edwin,Phillips,ephillips71@gmail.com,27,Scotland,97000,Dart


> The comparison inside the brackets produces a boolean Series (True or False for each row), and the DataFrame returns only the rows where it is True.
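It can help to look at the boolean values on their own before using them as a filter. This is a minimal sketch on made-up data; the `toy` DataFrame and the `mask` name are assumptions for illustration.

```python
import pandas as pd

# Made-up data for illustration.
toy = pd.DataFrame({"name": ["Ann", "Bob", "Cy"], "age": [25, 31, 27]})

mask = toy["age"] <= 27     # boolean Series, one value per row
young = toy[mask]           # keeps only the rows where mask is True

print(mask.tolist())            # [True, False, True]
print(young["name"].tolist())   # ['Ann', 'Cy']
```

Writing `toy[toy["age"] <= 27]` does exactly the same thing in one step.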

Conditions can also be combined: **&** stands for 'and', and **|** stands for 'or'. Each condition must be wrapped in its own parentheses, as demonstrated below.

In [138]:
df[(df['last_name'] == 'Williams') & (df['first_name'] == 'Benjamin')]

# Returns rows where last name is Williams AND first name is Benjamin

Unnamed: 0,first_name,last_name,email,age,country,salary,languages
31,Benjamin,Williams,bwilliams90@gmail.com,37,Scotland,43000,PHP


In [139]:
df[(df['last_name'] == 'Williams') | (df['first_name'] == 'Benjamin')]

# Returns rows where last name is Williams OR first name is Benjamin 

Unnamed: 0,first_name,last_name,email,age,country,salary,languages
19,Andrew,Williams,awilliams14@gmail.com,40,Scotland,58000,Dart
31,Benjamin,Williams,bwilliams90@gmail.com,37,Scotland,43000,PHP


> The same result can be achieved using **.loc** method, as demonstrated below.

In [140]:
df.loc[(df['last_name'] == 'Williams') & (df['first_name'] == 'Benjamin')]

# Returns rows where last name is Williams AND first name is Benjamin

Unnamed: 0,first_name,last_name,email,age,country,salary,languages
31,Benjamin,Williams,bwilliams90@gmail.com,37,Scotland,43000,PHP


In [141]:
df.loc[(df['last_name'] == 'Williams') | (df['first_name'] == 'Benjamin')]

# Returns rows where last name is Williams OR first name is Benjamin

Unnamed: 0,first_name,last_name,email,age,country,salary,languages
19,Andrew,Williams,awilliams14@gmail.com,40,Scotland,58000,Dart
31,Benjamin,Williams,bwilliams90@gmail.com,37,Scotland,43000,PHP
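The parentheses in the filters above matter because **&** and **|** bind more tightly than comparison operators like **==**. A way to sidestep this, sketched below on made-up data, is to store each condition in its own variable first. The `~` operator (negation) and the column argument to **.loc** are additions of mine that the cells above do not show.

```python
import pandas as pd

# Made-up data for illustration.
toy = pd.DataFrame({
    "first_name": ["Ann", "Bob", "Ann"],
    "last_name": ["Lee", "Lee", "Kim"],
    "salary": [50000, 60000, 70000],
})

is_ann = toy["first_name"] == "Ann"
is_lee = toy["last_name"] == "Lee"

both = toy[is_ann & is_lee]           # rows matching both conditions
either = toy[is_ann | is_lee]         # rows matching at least one condition
neither = toy[~(is_ann | is_lee)]     # ~ flips True and False

# .loc can filter rows and pick a column in one step.
ann_salaries = toy.loc[is_ann, "salary"]

print(len(both), len(either), len(neither))   # 1 3 0
print(ann_salaries.tolist())                  # [50000, 70000]
```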



<br>

<a id='method_add'></a>
### Methods for Adding/Removing Rows and Columns

#### 1. Methods for Adding/Removing Rows

>If you wish to add a row to a DataFrame, you can use **pd.concat( )**. Older tutorials use the **df.append( )** method, which was deprecated in pandas 1.4 and removed in pandas 2.0.

In [204]:
df = pd.concat([df, pd.DataFrame([{
        'first_name': 'Karim',
        'last_name': 'Carlson',
}])], ignore_index=True)

# You can add a row by wrapping a dictionary in a one-row DataFrame, where dictionary keys
# and values correspond to column names and the values you want to add.
# Assign the result of 'pd.concat()' back to the 'df' variable.

# Pass 'ignore_index=True' so the new row gets the next index label (49) instead of 0.

In [205]:
df.tail()

Unnamed: 0,first_name,last_name,email,age,country,salary,languages
45,Edwin,Phillips,ephillips71@gmail.com,27.0,Scotland,97000.0,Dart
46,Neil,Manning,nmanning78@gmail.com,38.0,Canada,49000.0,Java
47,Steven,Kimmel,skimmel64@gmail.com,32.0,Scotland,91000.0,Python
48,Frank,Fernandez,ffernandez63@gmail.com,40.0,New Zealand,70000.0,html; CSS; Javascript
49,Karim,Carlson,,,,,


> Now we can confirm that the DataFrame has been updated. We can see that the other columns show 'NaN' values, because we did not pass values for those columns. 
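You may also notice that `age` and `salary` are now displayed as `27.0` and `97000.0`. A default integer column cannot hold missing values, so Pandas converts it to floats when a row without those values is added. The sketch below shows this on made-up data, using **pd.concat( )**, which works on current Pandas versions.

```python
import pandas as pd

# Made-up data for illustration.
toy = pd.DataFrame({"first_name": ["Ann", "Bob"], "age": [30, 41]})
print(toy["age"].dtype)            # int64

new_row = pd.DataFrame([{"first_name": "Karim"}])   # no 'age' supplied
toy = pd.concat([toy, new_row], ignore_index=True)

print(toy["age"].dtype)            # float64
print(toy["age"].isnull().sum())   # 1
```

The column stays `float64` even after the incomplete row is dropped again, which is why the later outputs in this notebook keep the `.0` suffix.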

To remove the row that has been created we will use **df.drop( )** method.

In [208]:
df.drop(49, inplace=True)

Now we check to see if the row has actually been removed

In [209]:
df.tail()

Unnamed: 0,first_name,last_name,email,age,country,salary,languages
44,Victor,Ronson,vronson36@gmail.com,32.0,USA,102000.0,PHP
45,Edwin,Phillips,ephillips71@gmail.com,27.0,Scotland,97000.0,Dart
46,Neil,Manning,nmanning78@gmail.com,38.0,Canada,49000.0,Java
47,Steven,Kimmel,skimmel64@gmail.com,32.0,Scotland,91000.0,Python
48,Frank,Fernandez,ffernandez63@gmail.com,40.0,New Zealand,70000.0,html; CSS; Javascript


>*We have successfully removed the new row that has been created.*

<br>

#### 2. Methods for Adding/Removing Columns

In [223]:
df.head(2)

Unnamed: 0,first_name,last_name,email,age,country,salary,languages,monthly_salary
0,Salim,Albert,salbert29@gmail.com,38.0,Scotland,114000.0,Java,9500
1,Xavier,Ronson,xronson67@gmail.com,37.0,Australia,48000.0,Ruby,4000


*We add a new column by using the new column name as an index of the DataFrame and assigning an operation to it, as demonstrated below.*

In [226]:
df['monthly_salary'] = (df['salary'] / 12)
df.head()

Unnamed: 0,first_name,last_name,email,age,country,salary,languages,monthly_salary
0,Salim,Albert,salbert29@gmail.com,38.0,Scotland,114000.0,Java,9500.0
1,Xavier,Ronson,xronson67@gmail.com,37.0,Australia,48000.0,Ruby,4000.0
2,Frank,Hank,fhank55@gmail.com,27.0,UK,43000.0,Python,3583.333333
3,Marek,Tomassen,mtomassen@gmail.com,70.0,USA,77000.0,Ruby,6416.666667
4,Patrick,Smith,psmith73@gmail.com,35.0,Scotland,60000.0,R,5000.0


<br>

*To remove the column we just created, we can use **df.drop( )** method.*

In [232]:
df.drop(columns='monthly_salary', inplace=True)
df.head()

Unnamed: 0,first_name,last_name,email,age,country,salary,languages
0,Salim,Albert,salbert29@gmail.com,38.0,Scotland,114000.0,Java
1,Xavier,Ronson,xronson67@gmail.com,37.0,Australia,48000.0,Ruby
2,Frank,Hank,fhank55@gmail.com,27.0,UK,43000.0,Python
3,Marek,Tomassen,mtomassen@gmail.com,70.0,USA,77000.0,Ruby
4,Patrick,Smith,psmith73@gmail.com,35.0,Scotland,60000.0,R


>*We have successfully removed the column 'monthly_salary'*
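Both steps above change `df` itself. If you would rather keep the original DataFrame untouched, a possible alternative is **df.assign( )** together with **df.drop( )** without `inplace=True`, since both return a new DataFrame. This is a sketch on made-up data, not a replacement for the cells above.

```python
import pandas as pd

# Made-up data for illustration.
toy = pd.DataFrame({"name": ["Ann", "Bob"], "salary": [60000, 48000]})

# assign() returns a copy with the extra column; 'toy' is not modified.
with_monthly = toy.assign(monthly_salary=toy["salary"] / 12)

# drop() without inplace=True also returns a new DataFrame.
without_monthly = with_monthly.drop(columns="monthly_salary")

print(list(toy.columns))            # ['name', 'salary']
print(list(with_monthly.columns))   # ['name', 'salary', 'monthly_salary']
```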

<br>

<a id='method_upd'></a>
### Methods for Updating Rows & Columns

In [125]:
df.columns     # Returns an Index object with the column names

Index(['first_name', 'last_name', 'email', 'age', 'country', 'salary',
       'languages'],
      dtype='object')

> Below are 3 different ways to rename columns

In [129]:
df.columns = [x.upper() for x in df.columns]

# Applies .upper() on all columns 

df.columns

Index(['FIRST_NAME', 'LAST_NAME', 'EMAIL', 'AGE', 'COUNTRY', 'SALARY',
       'LANGUAGES'],
      dtype='object')

In [130]:
df.rename(columns={'FIRST_NAME': 'First_Name'}, inplace=True)

# This is used to rename only the values you pass in {'Column_Name': 'New_Column_Name'} 

df.columns

Index(['First_Name', 'LAST_NAME', 'EMAIL', 'AGE', 'COUNTRY', 'SALARY',
       'LANGUAGES'],
      dtype='object')

In [131]:
df.rename(columns=lambda x: x.lower(), inplace=True)

# Rename columns using lambda function

df.columns

Index(['first_name', 'last_name', 'email', 'age', 'country', 'salary',
       'languages'],
      dtype='object')

> *You can use **.loc** method to re-assign values within the specified row, as demonstrated below.*

In [146]:
df.loc[3, ['first_name', 'last_name', 'email', 'age']] = ['Marek', 'Tomassen', 'mtomassen@gmail.com', 70]

# Specify the row and columns where you want to change values using .loc
# by passing the row index and a list of the column names you wish to re-assign.

df.loc[3]

first_name                  Marek
last_name                Tomassen
email         mtomassen@gmail.com
age                            70
country                       USA
salary                      77000
languages                    Ruby
Name: 3, dtype: object

In [147]:
df.head(4)

Unnamed: 0,first_name,last_name,email,age,country,salary,languages
0,Salim,Albert,salbert29@gmail.com,38,Scotland,114000,Java
1,Xavier,Ronson,xronson67@gmail.com,37,Australia,48000,Ruby
2,Frank,Hank,fhank55@gmail.com,27,UK,43000,Python
3,Marek,Tomassen,mtomassen@gmail.com,70,USA,77000,Ruby


>As you can see, row index 3 values have changed.

*These are the methods for updating rows and columns in **Pandas**.*
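**.loc** also accepts a filter in place of a row index, so you can update every row that matches a condition in one statement. The sketch below uses made-up data, and the replacement value is chosen only for illustration.

```python
import pandas as pd

# Made-up data for illustration.
toy = pd.DataFrame({
    "name": ["Ann", "Bob", "Cy"],
    "country": ["UK", "Scotland", "USA"],
})

# Rows where the condition is True get the new value in the 'country' column.
toy.loc[toy["country"] == "Scotland", "country"] = "UK"

print(toy["country"].tolist())   # ['UK', 'UK', 'USA']
```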

<br>

<a id='method_grp'></a>
### Grouping Methods