## Pandas Continued

Let's first learn how to update column names.

### Columns

In [None]:
import pandas as pd

data = {
    "first":["John","Mark","David","Peter","James"],
    "last":["Reilly","Boyle","Smith","Doe","Bond"],
    "email":["John.Reilly@strath.ac.uk","Mark.Boyle@MB.com","DavidSmith2020@Smith.co.uk", "PeterDoe@PeterDoe.com","JamesBond007@MI6.gov.uk"],
    "id":[1,2,3,4,5]
}

In [None]:
df = pd.DataFrame(data)
df

In [None]:
df.columns

Let's take a look at how to rename all the columns at once, and then we will cover how to rename only certain columns.

In [None]:
df.columns = ["first name", "last name", "email address", "ID"]
df

We can also use a list comprehension to do things such as converting the column names to uppercase or lowercase, or capitalising them.

In [None]:
df.columns = [x.capitalize() for x in df.columns]
df

In [None]:
df.columns = [x.upper() for x in df.columns]
df

In [None]:
df.columns = [x.lower() for x in df.columns]
df

We can replace characters in the column names using the str accessor, which provides a replace method.

In [None]:
df.columns = df.columns.str.replace(" ", "_")
df
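
The str accessor on the column index offers more than replace, and its methods can be chained. As a minimal sketch, assuming a small stand-in DataFrame called `demo` (not the tutorial's `df`), the lower-casing and the space replacement can be done in one step:

```python
import pandas as pd

# A small stand-in frame; the column names are illustrative only.
demo = pd.DataFrame({"First Name": ["John"], "Last Name": ["Reilly"]})

# Chain string methods on the column index:
# lower-case every name, then swap spaces for underscores.
demo.columns = demo.columns.str.lower().str.replace(" ", "_")

print(list(demo.columns))
```

This does the same job as the list comprehension followed by replace, just without the explicit loop.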

If we don't want to replace all the column names, we should use the rename method provided by pandas, as it allows us to rename anything from a single column up to all of them.

In [None]:
# rename returns a new DataFrame; df itself is not changed
df.rename(columns={"first_name":"first","last_name":"last"})

In [None]:
df

In [None]:
# inplace=True modifies df directly instead of returning a copy
df.rename(columns={"first_name":"first","last_name":"last"}, inplace=True)
df
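
The cells above show that rename only changes `df` when `inplace=True` is passed. A common alternative is to assign the result back to a variable. The sketch below uses a hypothetical `demo` frame to show that, and also that rename accepts a function in place of a dictionary:

```python
import pandas as pd

# Illustrative data, not the tutorial's df.
demo = pd.DataFrame({"first_name": ["John"], "last_name": ["Reilly"], "id": [1]})

# Without inplace=True, rename returns a new DataFrame and leaves demo untouched.
renamed = demo.rename(columns={"first_name": "first"})

# rename also accepts a function, which is applied to every column name.
shouted = demo.rename(columns=str.upper)

print(list(demo.columns))     # original names
print(list(renamed.columns))  # only first_name changed
print(list(shouted.columns))  # every name upper-cased
```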

Now that we know how to change column names, let's see how to update rows.

### Rows

We can replace the data in any row using the loc indexer we have used in previous tutorials.

In [None]:
df.loc[1]

In [None]:
# id is an integer column, so pass an integer rather than the string "5"
df.loc[1] = ["Mark", "Hamilton", "Mark.Hamilton@MH.com", 5]
df

In [None]:
# This raises a ValueError: only 3 values are given for 4 columns
df.loc[1] = ["Mark", "Hamilton", "Mark.Hamilton@MH.com"]

As you can see, the data in row 1 has been updated, but doing it this way is long and cumbersome, as you need to pass data for all columns. Passing fewer values than there are columns raises an error.

We can pass a list of column names into loc, as shown before, which means we only need to pass the data we want to change.

In [None]:
df.loc[1, ["first","email_address"]] = ["Marcus", "Marcus.Hamilton@MH.com"]
df
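
The row selector in loc does not have to be a label. As a hedged sketch on a hypothetical `demo` frame, a boolean condition can be used instead, so that every row matching the condition is updated:

```python
import pandas as pd

# Illustrative data, not the tutorial's df.
demo = pd.DataFrame({
    "first": ["John", "Mark", "David"],
    "last": ["Reilly", "Boyle", "Smith"],
    "id": [1, 2, 3],
})

# Build a boolean mask, then update one column on the matching rows only.
mask = demo["last"] == "Boyle"
demo.loc[mask, "first"] = "Marcus"

print(demo)
```

This is useful when you know something about the row you want to change but not its index label.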

To update a single column on a row, simply drop the list and pass just the column name as a string.

In [None]:
df.loc[1,"id"] = 1234
df

If you want to read or update a single value (one column on one row), you can use the at accessor, which takes the same row and column labels as loc but only handles a single value and is typically faster for that case.

In [None]:
df.at[1,"id"] = 2
df
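
The cell above only writes with at. A minimal sketch on a hypothetical `demo` frame shows that it reads a single value in the same way, and that loc returns the same result for the same pair of labels:

```python
import pandas as pd

# Illustrative data, not the tutorial's df.
demo = pd.DataFrame({"first": ["John", "Mark"], "id": [1, 2]})

# Read one cell with at: row label 1, column "id".
value = demo.at[1, "id"]

# Write one cell with at, then read it back with loc.
demo.at[1, "first"] = "Marcus"
same = demo.loc[1, "first"]

print(value, same)
```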

If we want to change multiple rows at once, for example to make all the email addresses lowercase, we would do the following.

In [None]:
df["email_address"] = df["email_address"].str.lower()
df

Another way to work on every row of a single column is the apply method, which calls a function on each value in the column.

In [None]:
df["email_address"].apply(len)

What has this returned? It has returned the length of each of the email addresses in the email_address column.

You can pass any function to the apply method. Let's change the id numbers from integers to strings.

In [None]:
def int_to_str(num):
    return str(num)

In [None]:
df["id"] = df["id"].apply(int_to_str)
df
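
The helper function above is written out for illustration. As a sketch of two shorter alternatives, assuming a stand-in Series called `ids`, the built-in `str` can be passed to apply directly, or the column can be cast with astype:

```python
import pandas as pd

# Illustrative stand-in for the id column.
ids = pd.Series([1, 2, 3, 4, 5], name="id")

# The built-in str works as the function, so no wrapper is needed.
via_apply = ids.apply(str)

# astype casts the whole column in one call.
via_astype = ids.astype(str)

print(via_apply.tolist())
print(via_astype.tolist())
```

For a plain type conversion like this, astype is usually the more direct choice. apply is more useful when the function does something a cast cannot.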

What happens if we use apply on the entire dataframe with the len function?

In [None]:
df.apply(len)

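It has returned the length of each column, that is the number of rows it contains, which we know in this case is 5. This is because apply on a dataframe passes each whole column to the function rather than each individual value.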
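
To make the column-by-column behaviour concrete, the sketch below (on a hypothetical `demo` frame) also passes `axis=1`, which hands each row to the function instead of each column:

```python
import pandas as pd

# Illustrative data: 2 rows, 3 columns.
demo = pd.DataFrame({
    "first": ["John", "Mark"],
    "last": ["Reilly", "Boyle"],
    "id": ["1", "2"],
})

# Default (axis=0): len receives each column, so it counts rows.
per_column = demo.apply(len)

# axis=1: len receives each row, so it counts columns.
per_row = demo.apply(len, axis=1)

print(per_column)
print(per_row)
```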

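To get the same result on the entire dataframe as we did on the email column, we need to use the applymap method, which applies a function to every individual value. Note that from pandas 2.1 onwards applymap is deprecated in favour of the equivalent DataFrame.map.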

Be aware that we had to convert id from an integer to a str first, as len cannot be called on an integer.

In [None]:
df.applymap(len)

As you can see, applymap has applied the len function to every value in the entire dataframe.
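
Since pandas 2.1 renamed applymap to DataFrame.map, code that has to run on both older and newer versions could pick whichever method is available. This is only a sketch of one way to do that, using a hypothetical `demo` frame and a hypothetical `elementwise` variable:

```python
import pandas as pd

# Illustrative data, not the tutorial's df.
demo = pd.DataFrame({"first": ["John", "Mark"], "last": ["Reilly", "Boyle"]})

# DataFrame.map exists from pandas 2.1; older versions only have applymap.
elementwise = demo.map if hasattr(demo, "map") else demo.applymap

# Either way, len is called on every individual value.
lengths = elementwise(len)

print(lengths)
```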

In [None]:
df

In [None]:
df.applymap(str.upper)

If we want to change only certain values in a column, we should use the replace method. It takes a dictionary of old values to new values and leaves everything else untouched.

In [None]:
df["first"] = df["first"].replace({"John":"Jack", "David":"Davy"})
df
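
The fact that replace leaves unlisted values alone is easiest to see next to Series.map, a related method not used in this tutorial, which turns any value missing from the dictionary into NaN. A small sketch with a stand-in Series called `names`:

```python
import pandas as pd

# Illustrative stand-in for the first-name column.
names = pd.Series(["John", "Mark", "David"])
mapping = {"John": "Jack", "David": "Davy"}

# replace: values not in the mapping ("Mark") are kept as they are.
replaced = names.replace(mapping)

# map: values not in the mapping become NaN.
mapped = names.map(mapping)

print(replaced.tolist())
print(mapped.tolist())
```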

Today we have learned how to use the DataFrame constructor to load in data from a dictionary, how to update column names and how to update rows of data.