<a href="https://colab.research.google.com/github/sosapaul54/2025HUDS/blob/main/2_Working_with_dataframes_(1).ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# <center>  Working With Dataframes </center>
<div>
<img src="https://pandas.pydata.org/static/img/pandas.svg" width="600"/>
</div>

In the last lecture, we were introduced to the `pandas` library, which gives us the ability to work with two new data types: dataframes and series. In this lecture, we will learn more about how to use and work with dataframes.

We'll start with some ways to handle dataframes, including basic selection, deletion, and renaming of columns. Then, we'll look at subsetting data using various selection methods. Finally, we'll cover some dataframe methods that are useful for data organization, manipulation, and analysis.

## Handling Dataframes

To learn how to use dataframes to their fullest capacity, we first need to know how to handle them. First, let's import pandas and numpy.

In [None]:
import pandas as pd
import numpy as np

In [None]:
# Mount Google Drive so the notebook can read the dataset
from google.colab import drive
drive.mount('/content/drive', force_remount=True)

# Import the libraries
import numpy as np                  # Scientific Computing
import pandas as pd                 # Data Analysis
import matplotlib.pyplot as plt     # Plotting
import seaborn as sns               # Statistical Data Visualization

# Let's make sure pandas displays all the rows and columns of the dataframe
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)

# Force pandas to display full numbers instead of scientific notation
# pd.options.display.float_format = '{:.0f}'.format

# Suppress warnings
import warnings
warnings.filterwarnings('ignore')

# Read the dataset; pd.read_csv() already returns a dataframe
players = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/active_players_2.csv')

Mounted at /content/drive


This is a dataset of active NBA players. At a quick glance, we can see the type of data collected: names, teams, positions, ages, heights, weights, colleges, and salaries.

In getting acquainted with our data, we may want to quickly know how many entries there are and how many attributes are measured per entry:

In [None]:
print(players.shape)

(558, 9)


Recall that dataframes are made of concatenated series in the form of columns. To extract one column from a dataframe, we write the name of the dataframe and then enclose the name of the column (as a string) in single square brackets or double square brackets:

In [None]:
players['Team']

0              Boston Celtics
1              Boston Celtics
2              Boston Celtics
3              Boston Celtics
4              Boston Celtics
5              Boston Celtics
6              Boston Celtics
7              Boston Celtics
8              Boston Celtics
9              Boston Celtics
10             Boston Celtics
11             Boston Celtics
12             Boston Celtics
13             Boston Celtics
14             Boston Celtics
15             Boston Celtics
16             Boston Celtics
17             Boston Celtics
18             Boston Celtics
19              Brooklyn Nets
20              Brooklyn Nets
21              Brooklyn Nets
22              Brooklyn Nets
23              Brooklyn Nets
24              Brooklyn Nets
25              Brooklyn Nets
26              Brooklyn Nets
27              Brooklyn Nets
28              Brooklyn Nets
29              Brooklyn Nets
30              Brooklyn Nets
31              Brooklyn Nets
32              Brooklyn Nets
33        

In [None]:
players[['Team']]

Unnamed: 0,Team
0,Boston Celtics
1,Boston Celtics
2,Boston Celtics
3,Boston Celtics
4,Boston Celtics
5,Boston Celtics
6,Boston Celtics
7,Boston Celtics
8,Boston Celtics
9,Boston Celtics


**What do you notice about the difference in extracting a column with single square brackets versus double square brackets?**

Notice that using single square brackets returns a series, while using double square brackets returns a one-column dataframe. This is important, as it dictates how the output can be used, as well as which functions and methods can be applied to it.

For example, if we wanted to get the item at index 544 of the `Salary` column, we could use `players['Salary'][544]` with a series, but not with a dataframe:

In [None]:
players['Salary'][544]

12420000.0

However, if you want to select multiple columns at once, you must pass a list of column names inside the brackets, which returns a dataframe rather than a series:

In [None]:
players[['Name', 'Age']]

Unnamed: 0,Name,Age
0,Juhann Begarin,19
1,Jaylen Brown,24
2,Kris Dunn,27
3,Carsen Edwards,23
4,Tacko Fall,25
...,...,...
553,Juwan Morgan,24
554,Royce O'Neale,28
555,Olumiye Oni,24
556,Eric Paschall,24


In [None]:
players.Name

0        Juhann Begarin
1          Jaylen Brown
2             Kris Dunn
3        Carsen Edwards
4            Tacko Fall
             ...       
553        Juwan Morgan
554       Royce O'Neale
555         Olumiye Oni
556       Eric Paschall
557    Hassan Whiteside
Name: Name, Length: 558, dtype: object
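
To make the difference between these selection styles concrete, here is a minimal sketch that checks the type each one returns. It uses a small made-up table; the names `demo`, `as_series`, `one_col`, `sub`, and `by_attr` are illustrative and not part of the lecture data.

```python
import pandas as pd

# Made-up three-row table standing in for the players data
demo = pd.DataFrame({
    'Name': ['A', 'B', 'C'],
    'Team': ['X', 'X', 'Y'],
    'Age': [19, 24, 27],
})

as_series = demo['Team']        # single brackets -> Series
one_col = demo[['Team']]        # double brackets -> one-column DataFrame
sub = demo[['Name', 'Age']]     # list of names -> multi-column DataFrame
by_attr = demo.Team             # attribute access -> same Series as demo['Team']

print(type(as_series).__name__)   # Series
print(type(one_col).__name__)     # DataFrame
print(sub.shape)                  # (3, 2)
print(by_attr.equals(as_series))  # True
```

Attribute access such as `demo.Team` only works when the column name is a valid Python identifier and does not clash with an existing dataframe attribute, so a column with a space in its name has to be selected with brackets.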

It is also useful to rename columns in a dataframe using the `.rename()` method. Calling this method on a dataframe, you can easily rename a column by passing a dictionary with the keys being the current column name and the values being the desired column name:

In [None]:
players = players.rename(columns={'Name': 'New Name', 'Age': 'Age Today'})
players.head(3)

New columns can be added to dataframes as well. If you wish to add a column based on other columns that already exist in the dataframe, you can perform operations on those columns and assign the result to a new column name.

In [None]:
players['Height inch2'] = players['Height_i'] * 12
players.head(3)
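
One caveat about multiplying `Height_i` by 12: in the outputs shown in this lecture, a height of 5' 11" is stored in `Height_i` as 5.11, so the product is 61.32 rather than 71 inches. The sketch below shows one possible way to compute inches from the text column instead. It assumes every height looks like `6' 5"`, and the names `demo`, `naive_inches`, and `true_inches` are hypothetical.

```python
import pandas as pd

# Made-up rows that mimic the Height / Height_i columns
demo = pd.DataFrame({
    'Height': ["6' 5\"", "5' 11\"", "5' 10\""],
    'Height_i': [6.5, 5.11, 5.1],
})

# Naive approach: treats 5.11 as if it meant 5.11 feet
demo['naive_inches'] = demo['Height_i'] * 12

# Pull the feet and inches out of the text, then combine them
parts = demo['Height'].str.extract(r"(\d+)'\s*(\d+)").astype(int)
demo['true_inches'] = parts[0] * 12 + parts[1]

print(demo['true_inches'].tolist())   # [77, 71, 70]
```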

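If desired, you can also create a placeholder column at a specific position with the `.insert()` method. Here, a column named `Empty`, filled with a single space, is inserted at position 3: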

In [None]:
players.insert(3,'Empty', " ")
players.head(3)

Unnamed: 0,New Name,Team,Position,Empty,Age Today,Height,Height_i,Weight,College,Salary,Weight and Age,Height inch,Height inch2
0,Juhann Begarin,Boston Celtics,SG,,19,"6' 5""",6.5,185,,,204,"6' 5""6' 5""6' 5""6' 5""6' 5""6' 5""6' 5""6' 5""6' 5""6...",78.0
1,Jaylen Brown,Boston Celtics,SG,,24,"6' 6""",6.6,223,California,26758928.0,247,"6' 6""6' 6""6' 6""6' 6""6' 6""6' 6""6' 6""6' 6""6' 6""6...",79.2
2,Kris Dunn,Boston Celtics,PG,,27,"6' 3""",6.3,205,Providence,5005350.0,232,"6' 3""6' 3""6' 3""6' 3""6' 3""6' 3""6' 3""6' 3""6' 3""6...",75.6


There may be times when you want to delete columns. This can be accomplished using the `.drop()` method. When you call this method on a dataframe, you can pass a single column name or a list of column names to the `columns` parameter. Setting `inplace=True` modifies the dataframe directly instead of returning a new one.

In [None]:
players.drop(columns=['Empty'], inplace=True)
players.head(3)

Unnamed: 0,New Name,Team,Position,Age Today,Height,Height_i,Weight,College,Salary,Weight and Age,Height inch,Height inch2
0,Juhann Begarin,Boston Celtics,SG,19,"6' 5""",6.5,185,,,204,"6' 5""6' 5""6' 5""6' 5""6' 5""6' 5""6' 5""6' 5""6' 5""6...",78.0
1,Jaylen Brown,Boston Celtics,SG,24,"6' 6""",6.6,223,California,26758928.0,247,"6' 6""6' 6""6' 6""6' 6""6' 6""6' 6""6' 6""6' 6""6' 6""6...",79.2
2,Kris Dunn,Boston Celtics,PG,27,"6' 3""",6.3,205,Providence,5005350.0,232,"6' 3""6' 3""6' 3""6' 3""6' 3""6' 3""6' 3""6' 3""6' 3""6...",75.6


Notice that we used two ways of updating dataframes with these methods. One way was to call the method on the dataframe and reassign its output to the name of the dataframe (as we did with the `.rename()` method). The other way was to set the `inplace` parameter to `True` (as we did with the `.drop()` method).

Both of these ways are fine, but be mindful that not all methods and functions have an `inplace` parameter. Therefore, it's always a good idea to refer to the documentation of a function or method to understand how it can be used.
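
The two update styles can be compared side by side in a minimal sketch. The table `demo` and the variables `renamed` and `result` are made up for illustration.

```python
import pandas as pd

demo = pd.DataFrame({'Name': ['A', 'B'], 'Age': [19, 24], 'Empty': [' ', ' ']})

# Style 1: the method returns a new dataframe, which we keep under a name
renamed = demo.rename(columns={'Name': 'New Name'})
print(list(demo.columns))      # original is unchanged: ['Name', 'Age', 'Empty']
print(list(renamed.columns))   # ['New Name', 'Age', 'Empty']

# Style 2: inplace=True modifies the object itself and returns None
result = demo.drop(columns=['Empty'], inplace=True)
print(result)                  # None
print(list(demo.columns))      # ['Name', 'Age']
```

Because an in-place call returns `None`, combining the two styles (for example `demo = demo.drop(columns=['Empty'], inplace=True)`) would overwrite the dataframe with `None`.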

## Subsetting data in dataframes

Being able to parse and subset data intentionally is important for downstream data analysis. When consecutive rows of data need to be extracted, dataframes can be sliced much like lists and arrays. To extract the first 50 rows of `players`, we could slice the dataframe as follows:

In [None]:
top50 = players[0:50]
#print('Number of rows in top50:', len(top50))
#top50.head(3)

top50.shape
top50.columns

Index(['New Name', 'Team', 'Position', 'Empty', 'Age Today', 'Height',
       'Height_i', 'Weight', 'College', 'Salary', 'Weight and Age',
       'Height inch', 'Height inch2'],
      dtype='object')

Likewise, we can extract rows at regular intervals by adding a step value after a second colon:

In [None]:
even_rows = players[0::2]
print('Selecting every other row gives us a dataframe of', len(even_rows), 'rows.')

even_rows.head(3)

Selecting every other row gives us a dataframe of 279 rows.


Unnamed: 0,New Name,Team,Position,Empty,Age Today,Height,Height_i,Weight,College,Salary,Weight and Age,Height inch,Height inch2
0,Juhann Begarin,Boston Celtics,SG,,19,"6' 5""",6.5,185,,,204,"6' 5""6' 5""6' 5""6' 5""6' 5""6' 5""6' 5""6' 5""6' 5""6...",78.0
2,Kris Dunn,Boston Celtics,PG,,27,"6' 3""",6.3,205,Providence,5005350.0,232,"6' 3""6' 3""6' 3""6' 3""6' 3""6' 3""6' 3""6' 3""6' 3""6...",75.6
4,Tacko Fall,Boston Celtics,C,,25,"7' 5""",7.5,311,UCF,,336,"7' 5""7' 5""7' 5""7' 5""7' 5""7' 5""7' 5""7' 5""7' 5""7...",90.0
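
As a quick reference for row slicing, the sketch below uses a made-up ten-row dataframe (`demo` and the other variable names are illustrative) to show a start/stop slice, a stepped slice, and a negative start.

```python
import pandas as pd

demo = pd.DataFrame({'value': range(10)})   # rows labelled 0..9

first_three = demo[0:3]    # rows 0, 1, 2 (the stop position is excluded)
every_other = demo[0::2]   # rows 0, 2, 4, 6, 8
last_two = demo[-2:]       # rows 8, 9

print(len(first_three), len(every_other), len(last_two))   # 3 5 2
print(list(every_other.index))                             # [0, 2, 4, 6, 8]
```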


To extract all rows of a specific column, we can use the `.iloc[]` indexer, which selects rows and columns by integer position. To select all rows of the `Age Today` column (which has a positional index of 3), we can type the following code:

In [None]:
players.iloc[:, 3]

0      19
1      24
2      27
3      23
4      25
       ..
553    24
554    28
555    24
556    24
557    32
Name: Age Today, Length: 558, dtype: int64

Conversely, we can select the data corresponding to all columns of the fourth row (which has an index of 3).

In [None]:
players.iloc[3, :]

New Name                                             Carsen Edwards
Team                                                 Boston Celtics
Position                                                         PG
Age Today                                                        23
Height                                                       5' 11"
Height_i                                                       5.11
Weight                                                          200
College                                                      Purdue
Salary                                                    1782621.0
PS and Team                                      PG, Boston Celtics
Weight and Age                                                  223
Height inch       5' 11"5' 11"5' 11"5' 11"5' 11"5' 11"5' 11"5' 1...
Height inch2                                                  61.32
Empty                                                              
Name: 3, dtype: object

Using `.iloc[]` we can specifically select cells of a dataframe by inputting its row and column index values:

In [None]:
players.iloc[234, 9]

'PG, Miami Heat'

Similar to slicing, we can also select specific indexes or a range of consecutive indexes:

In [None]:
players.iloc[0:500, [3,6,12]]

Unnamed: 0,Age Today,Weight,Height inch2
0,19,185,78.00
1,24,223,79.20
2,27,205,75.60
3,23,200,61.32
4,25,311,90.00
...,...,...,...
495,23,213,76.80
496,27,218,80.40
497,22,264,82.80
498,25,193,76.80
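
The four `.iloc[]` patterns used above (a whole column, a whole row, a single cell, and a block) can be tried on a small made-up table. This is a sketch only; `demo` and the variable names are illustrative.

```python
import pandas as pd

demo = pd.DataFrame({
    'Name': ['A', 'B', 'C', 'D'],
    'Age': [19, 24, 27, 23],
    'Weight': [185, 223, 205, 200],
})

col = demo.iloc[:, 1]            # all rows, second column -> Series
row = demo.iloc[3, :]            # fourth row, all columns -> Series
cell = demo.iloc[2, 1]           # single value at row position 2, column position 1
block = demo.iloc[0:2, [0, 2]]   # row positions 0-1, column positions 0 and 2

print(cell)                  # 27
print(block.shape)           # (2, 2)
print(list(block.columns))   # ['Name', 'Weight']
```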


Another way to extract data from dataframes is the `.loc[]` indexer, which selects data by labels or boolean arrays rather than by integer position. You can pass a list of row labels together with the name of a column to extract the data of interest, as shown below:

In [None]:
players.loc[[10,15,67], "College"]

10        Vanderbilt
15    Oklahoma State
67              Duke
Name: College, dtype: object
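
The difference between label-based and position-based selection is easiest to see when the row labels do not match the row positions. The following sketch assumes a made-up table `demo` whose index was chosen by hand for that purpose.

```python
import pandas as pd

demo = pd.DataFrame(
    {'College': ['UCF', 'Duke', 'Yale', 'Utah']},
    index=[3, 10, 15, 67],   # row labels deliberately differ from positions
)

print(demo.loc[[10, 67], 'College'].tolist())   # ['Duke', 'Utah'] selected by label
print(demo.iloc[[1, 3], 0].tolist())            # ['Duke', 'Utah'] selected by position
print(demo.loc[3, 'College'])                   # UCF  (label 3 is the first row)
print(demo.iloc[3, 0])                          # Utah (position 3 is the last row)

# Label slices include both endpoints, positional slices exclude the stop
print(len(demo.loc[10:67]))   # 3
print(len(demo.iloc[1:3]))    # 2
```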

The `.loc[]` indexer is a powerful selection tool that can also extract data based on conditions. Here, we select all rows where the `Team` column equals `Denver Nuggets`:

In [None]:
DN = players.loc[(players['Team'] == 'Denver Nuggets')]
DN.head(3)

Unnamed: 0,New Name,Team,Position,Empty,Age Today,Height,Height_i,Weight,College,Salary,Weight and Age,Height inch,Height inch2
469,Will Barton,Denver Nuggets,SF,,30,"6' 6""",6.6,181,Memphis,15625000.0,211,"6' 6""6' 6""6' 6""6' 6""6' 6""6' 6""6' 6""6' 6""6' 6""6...",79.2
470,Bol Bol,Denver Nuggets,C,,21,"7' 2""",7.2,220,Oregon,2161152.0,241,"7' 2""7' 2""7' 2""7' 2""7' 2""7' 2""7' 2""7' 2""7' 2""7...",86.4
471,Facundo Campazzo,Denver Nuggets,PG,,30,"5' 10""",5.1,195,,3200000.0,225,"5' 10""5' 10""5' 10""5' 10""5' 10""5' 10""5' 10""5' 1...",61.2


We can also combine conditions using the element-wise operators `|` (or) and `&` (and). Each condition must be wrapped in its own parentheses. We can select rows that meet at least one of multiple conditions by using the `|` operator:

In [None]:
champs = players.loc[(players['Team'] == 'Denver Nuggets') | (players['Salary'] < 1000000)]
champs.head(3)

champs.shape

(23, 13)

If we want rows that meet <u>all</u> of our conditions of interest, we can use the `&` operator:

In [None]:
champs = players.loc[(players['Team'] == 'Denver Nuggets') & (players['Salary'] > 1000000)]
champs.head(3)

Unnamed: 0,New Name,Team,Position,Age Today,Height,Height_i,Weight,College,Salary,PS and Team,Weight and Age,Height inch,Height inch2,Empty
469,Will Barton,Denver Nuggets,SF,30,"6' 6""",6.6,181,Memphis,15625000.0,"SF, Denver Nuggets",211,"6' 6""6' 6""6' 6""6' 6""6' 6""6' 6""6' 6""6' 6""6' 6""6...",79.2,
470,Bol Bol,Denver Nuggets,C,21,"7' 2""",7.2,220,Oregon,2161152.0,"C, Denver Nuggets",241,"7' 2""7' 2""7' 2""7' 2""7' 2""7' 2""7' 2""7' 2""7' 2""7...",86.4,
471,Facundo Campazzo,Denver Nuggets,PG,30,"5' 10""",5.1,195,,3200000.0,"PG, Denver Nuggets",225,"5' 10""5' 10""5' 10""5' 10""5' 10""5' 10""5' 10""5' 1...",61.2,
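
Here is a minimal sketch of how `|`, `&`, and the negation operator `~` behave, including what happens when a value is missing. The table `demo` and the mask names are made up for illustration.

```python
import pandas as pd
import numpy as np

demo = pd.DataFrame({
    'Team': ['Denver Nuggets', 'Denver Nuggets', 'Utah Jazz', 'Utah Jazz'],
    'Salary': [15_000_000.0, np.nan, 900_000.0, 8_800_000.0],
})

# Naming each boolean mask keeps compound conditions readable
is_denver = demo['Team'] == 'Denver Nuggets'
low_paid = demo['Salary'] < 1_000_000
high_paid = demo['Salary'] > 1_000_000

either = demo.loc[is_denver | low_paid]   # at least one condition holds
both = demo.loc[is_denver & high_paid]    # NaN > 1_000_000 evaluates to False
not_denver = demo.loc[~is_denver]         # ~ negates a mask

print(list(either.index), list(both.index), list(not_denver.index))
# [0, 1, 2] [0] [2, 3]
```

When conditions are written inline, the parentheses matter because `|` and `&` bind more tightly than comparisons such as `==` and `<`. Python's `and` and `or` keywords cannot be used here, since they raise a `ValueError` on a series.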


As shown before, we don't have to return all columns of the dataframe when selecting based on a condition. We can return only the columns of interest by passing their names as a list in the second position of the `.loc[]` indexer:

In [None]:
champs = players.loc[(players['Team'] == 'Denver Nuggets') & (players['Salary'] > 1000000), ['New Name', 'Position', 'Salary']]
champs

Unnamed: 0,New Name,Position,Salary
469,Will Barton,SF,15625000.0
470,Bol Bol,C,2161152.0
471,Facundo Campazzo,PG,3200000.0
472,Vlatko Cancar,SF,1782621.0
473,PJ Dozier,SG,1910860.0
474,Aaron Gordon,PF,16409091.0
475,JaMychal Green,PF,8200000.0
476,Jeff Green,PF,4500000.0
479,Nah'Shon Hyland,PG,2096880.0
480,Nikola Jokic,C,31579390.0


Because `.loc[]` accepts a row label and a column label together, it also allows us to select specific cells of interest. By doing so, we can read or change individual values in our dataframe as needed:

In [None]:
players.loc[0, ['College']]

College    Geogetown
Name: 0, dtype: object
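
Changing a value works by placing the same `.loc[]` expression on the left-hand side of an assignment. The sketch below assumes a made-up two-row table; the values assigned are placeholders.

```python
import pandas as pd

demo = pd.DataFrame({'Name': ['A', 'B'], 'College': [None, 'Purdue']})

# Overwrite one cell: row label 0, column label 'College'
demo.loc[0, 'College'] = 'Georgetown'
print(demo.loc[0, 'College'])          # Georgetown

# Update one column for every row that matches a condition
demo.loc[demo['College'] == 'Purdue', 'College'] = 'Purdue University'
print(demo['College'].tolist())        # ['Georgetown', 'Purdue University']

# A scalar column label returns the value; a list returns a one-item Series
print(type(demo.loc[0, ['College']]).__name__)   # Series
```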

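Finally, we can use `.loc[]` to add rows. Assigning to a row label that does not exist yet appends a new row with that label. If we want the new row to sort between two existing index values, we can choose a label in between them (e.g. a fractional value such as 558.5) and add data using a dictionary. The keys of the dictionary correspond to column names and the values to the data added in those columns; columns that are not listed are left empty. Note that the row is appended at the end of the dataframe and that the index labels become floats, as the output below shows: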

In [None]:
players.loc[558.5] = {'New Name': 'Amy Quarkume', 'College':'Howard','Position':'PG','Age Today':38}
players.tail()

Unnamed: 0,New Name,Team,Position,Empty,Age Today,Height,Height_i,Weight,College,Salary,Weight and Age,Height inch,Height inch2
554.0,Royce O'Neale,Utah Jazz,PF,,28,"6' 4""",6.4,226.0,Baylor,8800000.0,254.0,"6' 4""6' 4""6' 4""6' 4""6' 4""6' 4""6' 4""6' 4""6' 4""6...",76.8
555.0,Olumiye Oni,Utah Jazz,SG,,24,"6' 5""",6.5,206.0,Yale,1782621.0,230.0,"6' 5""6' 5""6' 5""6' 5""6' 5""6' 5""6' 5""6' 5""6' 5""6...",78.0
556.0,Eric Paschall,Utah Jazz,F,,24,"6' 6""",6.6,255.0,Villanova,1782621.0,279.0,"6' 6""6' 6""6' 6""6' 6""6' 6""6' 6""6' 6""6' 6""6' 6""6...",79.2
557.0,Hassan Whiteside,Utah Jazz,C,,32,"7' 0""",7.0,265.0,Marshall,1669178.0,297.0,"7' 0""7' 0""7' 0""7' 0""7' 0""7' 0""7' 0""7' 0""7' 0""7...",84.0
558.5,Amy Quarkume,,PG,,38,,,,Howard,,,,
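
To actually move a row added this way into position, one option is to sort by the index afterwards and then renumber the rows. The sketch below is an illustration on a made-up table, with the row supplied as a list that has one value per column.

```python
import pandas as pd

demo = pd.DataFrame({'Name': ['A', 'B', 'C'], 'Age': [19, 24, 27]})

# Assigning to a label that does not exist yet appends a row at the end
demo.loc[0.5] = ['New', 38]
print(list(demo.index))          # [0.0, 1.0, 2.0, 0.5]

# Sort by label to move the row into place, then renumber from 0
demo = demo.sort_index().reset_index(drop=True)
print(demo['Name'].tolist())     # ['A', 'New', 'B', 'C']
```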


## Organizing, manipulating, and analyzing data in dataframes

There are many functions and methods that can assist with organizing, manipulating, and analyzing data in dataframes. We will discuss several important ones to remember.

The first method we will discuss is the `.sort_values()` method. This is a useful method for looking at trends or sequential values in a dataset. The `.sort_values()` method has an `ascending` parameter, which is set to `True` by default. It also has an `inplace` parameter, which is set to `False` by default:



In [None]:
players.sort_values('Team', inplace=True)
players.head(3)

Unnamed: 0,New Name,Team,Position,Empty,Age Today,Height,Height_i,Weight,College,Salary,Weight and Age,Height inch,Height inch2
190.0,Bogdan Bogdanovic,Atlanta Hawks,SG,,29,"6' 6""",6.6,220.0,,18000000.0,249.0,"6' 6""6' 6""6' 6""6' 6""6' 6""6' 6""6' 6""6' 6""6' 6""6...",79.2
206.0,Trae Young,Atlanta Hawks,PG,,22,"6' 1""",6.1,180.0,Oklahoma,8326471.0,202.0,"6' 1""6' 1""6' 1""6' 1""6' 1""6' 1""6' 1""6' 1""6' 1""6...",73.2
205.0,Delon Wright,Atlanta Hawks,SG,,29,"6' 5""",6.5,185.0,Utah,8526316.0,214.0,"6' 5""6' 5""6' 5""6' 5""6' 5""6' 5""6' 5""6' 5""6' 5""6...",78.0
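
Sorting can also use several columns at once, each with its own direction. The sketch below assumes a made-up table; `demo`, `ordered`, and `labels` are illustrative names.

```python
import pandas as pd

demo = pd.DataFrame({
    'Team': ['Utah Jazz', 'Atlanta Hawks', 'Utah Jazz', 'Atlanta Hawks'],
    'Salary': [8_800_000, 18_000_000, 1_700_000, 8_300_000],
})

# Sort by team A-Z, then by salary from highest to lowest within each team
ordered = demo.sort_values(['Team', 'Salary'], ascending=[True, False])
print(ordered['Salary'].tolist())   # [18000000, 8300000, 8800000, 1700000]

# The original row labels travel with the rows
labels = list(ordered.index)
print(labels)                       # [1, 3, 0, 2]

# Renumber the rows from 0 if the old labels are no longer needed
ordered = ordered.reset_index(drop=True)
```

This is why the index values in the sorted `players` output are no longer in ascending order: each row keeps the label it had before sorting.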


Another useful method is the `.value_counts()` method. This method can be used on a series to count the number of occurrences of each unique value in the series. Let's use this method to determine how many players are on each team:

In [None]:
players['Team'].value_counts()

Brooklyn Nets             25
Memphis Grizzlies         22
Orlando Magic             21
Charlotte Hornets         21
Oklahoma City Thunder     21
New York Knicks           21
Houston Rockets           21
Milwaukee Bucks           20
Philadelphia Sixers       20
San Antonio Spurs         19
Golden State Warriors     19
Boston Celtics            19
Portland Trail Blazers    18
Washington Wizards        18
Sacremento Kings          18
Los Angeles Clippers      18
Detroit Pistons           18
Dallas Mavericks          18
Utah Jazz                 18
Indiana Pacers            18
Phoenix Suns              17
Toronto Raptors           17
Atlanta Hawks             17
New Orleans Pelicans      17
Denver Nuggets            17
Chicago Bulls             17
Miami Heat                17
Los Angeles Lakers        16
Minnesota Timberwolves    15
Cleveland Cavaliers       15
Name: Team, dtype: int64

There are a number of statistical and summative methods that can be used on dataframes as well. For example, the `.min()` and `.max()` methods can be used on a dataframe or a column within a dataframe to determine the minimum and maximum values, respectively. The `.describe()` method returns multiple useful summary statistics:

In [None]:
players.describe()

Unnamed: 0,Age Today,Height_i,Weight,Salary,Weight and Age,Height inch2
count,559.0,558.0,558.0,445.0,558.0,558.0
mean,25.565295,6.492151,216.163082,8813696.0,241.706093,77.905806
std,4.346829,0.335012,24.573787,9886777.0,25.496076,4.020145
min,18.0,5.1,160.0,925258.0,180.0,61.2
25%,22.0,6.2,199.25,1802057.0,223.0,74.4
50%,25.0,6.5,215.0,4447896.0,239.0,78.0
75%,28.0,6.7,233.0,12000000.0,258.0,80.4
max,41.0,7.5,311.0,45780970.0,336.0,90.0


In [None]:
players['Age Today'].max()

41

In [None]:
players['Age Today'].min()

18

In [None]:
players['Team'].describe()

count               558
unique               30
top       Brooklyn Nets
freq                 25
Name: Team, dtype: object

Here's a short list of methods that can be used to determine summative information and/or statistics:

`.min()` - determines the minimum value in a dataframe/series

`.max()` - determines the maximum value in a dataframe/series

`.mean()` - determines the mean value of a dataframe/series

`.median()` - determines the median value in a dataframe/series

`.sum()` - determines the sum of the values in a dataframe/series

`.mode()` - determines the mode of a dataframe/series

`.nlargest()` - determines the largest *n* items in a series. (Default = 5)

`.nsmallest()` - determines the smallest *n* items in a series. (Default = 5)

`.count()` - counts all non-null items in a dataframe/series

`.value_counts()` - counts the number of times an item occurs in a dataframe/series

`.nunique()` - counts the number of unique items in a dataframe/series

`.unique()` - lists all of the unique items in a dataframe/series

`.describe()` - generates descriptive statistics of a dataframe column, including those that summarize the central tendency, dispersion and shape of a dataset’s distribution, excluding null values (NaN).
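
Several of the methods listed above can be tried together on a small example. This sketch uses a made-up table `demo` that includes two missing salaries, to show that `NaN` values are skipped.

```python
import pandas as pd
import numpy as np

demo = pd.DataFrame({
    'Team': ['X', 'X', 'Y', 'Y', 'Y'],
    'Age': [19, 24, 27, 24, 32],
    'Salary': [1.0, np.nan, 5.0, 2.0, np.nan],   # NaN marks a missing salary
})

print(demo['Age'].min(), demo['Age'].max())    # 19 32
print(demo['Age'].median())                    # 24.0
print(demo['Age'].mode().tolist())             # [24]
print(demo['Age'].nlargest(2).tolist())        # [32, 27]
print(demo['Salary'].count())                  # 3 (NaN is not counted)
print(demo['Salary'].sum())                    # 8.0 (NaN is skipped)
print(demo['Team'].nunique())                  # 2
print(demo['Team'].unique().tolist())          # ['X', 'Y']
print(demo['Team'].value_counts().to_dict())   # {'Y': 3, 'X': 2}
```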

Sometimes when working with data, especially data generated by human input, we encounter errors within the data. One useful tool for finding mistakes is the `.duplicated()` method. This method returns a boolean series that marks each row repeating an earlier row; using it as a filter shows the duplicated rows, like so:

In [None]:
players[players.duplicated()]

When the boolean series returned by `.duplicated()` is used to filter the dataframe, as above, we get back a dataframe containing only the duplicated rows. We can easily get rid of these duplicated rows by calling `.drop_duplicates()` on the dataframe:

In [None]:
print(players.shape)
players = players.drop_duplicates()
print(players.shape)

Comparing the two shapes printed by this cell tells us how many rows were removed from `players`. The difference in row counts equals the number of rows that `.duplicated()` flagged as duplicates.
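
The relationship between `.duplicated()` and `.drop_duplicates()` can be checked on a small made-up table. In this sketch, `demo`, `mask`, and `deduped` are illustrative names.

```python
import pandas as pd

demo = pd.DataFrame({
    'Name': ['A', 'B', 'A', 'C', 'B'],
    'Team': ['X', 'Y', 'X', 'Y', 'Y'],
})

mask = demo.duplicated()      # True for rows that repeat an earlier row
print(mask.tolist())          # [False, False, True, False, True]
print(mask.sum())             # 2

deduped = demo.drop_duplicates()
print(demo.shape, deduped.shape)   # (5, 2) (3, 2)

# keep=False flags every copy, which helps when inspecting the duplicates
print(demo[demo.duplicated(keep=False)].index.tolist())   # [0, 1, 2, 4]
```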

We can also correct values observed in dataframes. Say, for example, that a team name in our dataframe needs to be changed. We can first extract all the rows pertaining to that team, here the Boston Celtics:

In [None]:
players.loc[(players['Team']) == 'Boston Celtics'].head(3)

Unnamed: 0,New Name,Team,Position,Empty,Age Today,Height,Height_i,Weight,College,Salary,Weight and Age,Height inch,Height inch2
18.0,Robert Williams III,Boston Celtics,C,,23,"6' 8""",6.8,237.0,Texas A&M,3661976.0,260.0,"6' 8""6' 8""6' 8""6' 8""6' 8""6' 8""6' 8""6' 8""6' 8""6...",81.6
0.0,Juhann Begarin,Boston Celtics,SG,,19,"6' 5""",6.5,185.0,,,204.0,"6' 5""6' 5""6' 5""6' 5""6' 5""6' 5""6' 5""6' 5""6' 5""6...",78.0
5.0,Bruno Fernando,Boston Celtics,F,,23,"6' 9""",6.9,240.0,Maryland,1782621.0,263.0,"6' 9""6' 9""6' 9""6' 9""6' 9""6' 9""6' 9""6' 9""6' 9""6...",82.8


Say we want to store this team under the abbreviation `BC` instead. We can change the value in all of these cells by using the `.replace()` method. We pass the current value to the `to_replace` parameter and the new value to the `value` parameter, then assign the result back to the column. (Calling `.replace()` with `inplace=True` on a selected column is not reliable in recent versions of pandas, because the selected column may behave as a copy.)

If we try to extract rows with the original value, we get an empty dataframe returned to us:

In [None]:
players['Team'] = players['Team'].replace(to_replace='Boston Celtics', value='BC')

players.loc[(players['Team']) == 'Boston Celtics'].head(3)

Unnamed: 0,New Name,Team,Position,Empty,Age Today,Height,Height_i,Weight,College,Salary,Weight and Age,Height inch,Height inch2


If we use the new value, we can see that the team name has been updated:

In [None]:
players.loc[(players['Team']) == 'BC'].head(3)
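
The same replacement can be verified on a small made-up column. The sketch also shows that `.replace()` accepts a dictionary when several values need to change at once; the abbreviations used are placeholders.

```python
import pandas as pd

demo = pd.DataFrame({'Team': ['Boston Celtics', 'Utah Jazz', 'Boston Celtics']})

# Replace one value and assign the result back to the column
demo['Team'] = demo['Team'].replace(to_replace='Boston Celtics', value='BC')
print((demo['Team'] == 'Boston Celtics').sum())   # 0
print((demo['Team'] == 'BC').sum())               # 2

# A dictionary replaces several values in one call
demo['Team'] = demo['Team'].replace({'BC': 'BOS', 'Utah Jazz': 'UTA'})
print(demo['Team'].tolist())                      # ['BOS', 'UTA', 'BOS']
```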

We can also apply string methods to columns in a dataframe and update the column through variable assignment:

In [None]:
players['College'] = players['College'].str.upper()
players.head(3)

Unnamed: 0,New Name,Team,Position,Empty,Age Today,Height,Height_i,Weight,College,Salary,Weight and Age,Height inch,Height inch2
190.0,Bogdan Bogdanovic,Atlanta Hawks,SG,,29,"6' 6""",6.6,220.0,,18000000.0,249.0,"6' 6""6' 6""6' 6""6' 6""6' 6""6' 6""6' 6""6' 6""6' 6""6...",79.2
206.0,Trae Young,Atlanta Hawks,PG,,22,"6' 1""",6.1,180.0,OKLAHOMA,8326471.0,202.0,"6' 1""6' 1""6' 1""6' 1""6' 1""6' 1""6' 1""6' 1""6' 1""6...",73.2
205.0,Delon Wright,Atlanta Hawks,SG,,29,"6' 5""",6.5,185.0,UTAH,8526316.0,214.0,"6' 5""6' 5""6' 5""6' 5""6' 5""6' 5""6' 5""6' 5""6' 5""6...",78.0
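
String methods live under the `.str` accessor and can be chained. They also pass missing values through untouched, which is why players without a college still show an empty cell above. A minimal sketch on a made-up column:

```python
import pandas as pd
import numpy as np

demo = pd.DataFrame({'College': ['  Oklahoma', 'utah ', np.nan]})

# Strip surrounding whitespace, then convert to upper case
cleaned = demo['College'].str.strip().str.upper()
print(cleaned.tolist())   # ['OKLAHOMA', 'UTAH', nan]

# Case-insensitive search; na=False treats missing values as "no match"
has_utah = demo['College'].str.contains('utah', case=False, na=False)
print(has_utah.tolist())  # [False, True, False]
```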


Another useful method is the `.map()` method. Called on a series, `.map()` replaces each value with a corresponding value looked up in a dictionary or another series, or computed by a function. Using this method, we can map the values in the `Shipping Method` column to their corresponding string values like so: