# Introduction to `pandas`

<table>
    <tr>
        <td><img src="https://mddean.people.wm.edu/MBA/images/panda2.jpg" alt="Panda in a tree" height=480></td>
        <td><img src="https://mddean.people.wm.edu/MBA/images/panda3.jpg" alt="Panda being cute"></td>
    </tr>
</table>


One of the early criticisms of Python from many people in the data science arena was its lack of useful data structures for performing data analysis tasks. This criticism stemmed in part from comparisons between the R language and Python. R has a built-in *data frame* object that greatly simplifies many data analysis tasks. This deficiency was addressed in 2008 by Wes McKinney with the creation of [pandas][1] (the name was originally an abbreviation of "panel data"), and the module continues to be improved. To quote the pandas documentation:

>Python has long been great for data munging and preparation, but less
>so for data analysis and modeling. pandas helps fill this gap, enabling
>you to carry out your entire data analysis workflow in Python without
>having to switch to a more domain specific language like R.

The pandas module introduces data structures such as the `Series` and `DataFrame` that build on top of existing tools like `NumPy` to speed up data analysis tasks. (`NumPy` stands for "numerical Python"; the module provides support for many useful numerical operations that we will explore in a future module.) Older releases also included a `Panel` type, which has since been removed. The pandas module also provides efficient mechanisms for moving data between in-memory representations and different data formats, including comma-separated values (`.csv`) and text files, JSON files, SQL databases, HDF5 format files, and even Excel spreadsheets. Finally, the pandas module provides support for dealing with missing or incomplete data and for aggregating or grouping data.

-----
[1]: http://pandas.pydata.org

### Importing pandas

To use the pandas package, we import it using the de facto standard `pd` alias.

In [None]:
import pandas as pd

<hr style="border:1px solid gray">

## Basic Data Structures

There are three main data types in pandas: `Series`, `Index`, and `DataFrame`. We will focus on `Series` and `DataFrame` objects. They have similar functionality. In fact, a `DataFrame` contains one or more `Series` objects. Each data column (which is a `Series`) in a `DataFrame` has a single data type (e.g., int32, float64, object). Older tutorials also mention the types `TimeSeries` and `Panel`; both have been removed from current pandas releases, so `DataFrame` and `Series` are the types we will use.


### Operations with `Series`

Let's create a `Series` from a list and try a few basic operations.

In [None]:
# Create ice cream flavors list
ice_cream = ['chocolate', 'strawberry', 'vanilla', 'rum raisin', 
             'chocolate', 'vanilla', 'vanilla', 'strawberry', 
             'rum raisin', 'chocolate', 'strawberry', 'cotton candy', 
             'chocolate', 'vanilla', 'rum raisin', 'vanilla', 
             'vanilla', 'strawberry', 'chocolate', 'vanilla', 
             'chocolate', 'vanilla', 'strawberry', 'vanilla', 
             'chocolate', 'chocolate', 'purple cow', 'chocolate', 
             'rum raisin', 'vanilla', 'chocolate', 'bubble gum', 'vanilla']

In [None]:
# Create a new series
flavors = pd.Series(ice_cream)
print(flavors)

In [None]:
# Look at just the top using .head()
print(flavors.head())

In [None]:
# By default head gives top 5
# You can specify n - the number to show
print(flavors.head(3))

In [None]:
# You can also see the bottom using .tail()
print(flavors.tail(3))

In [None]:
# Sometimes we want to "sample" from the series 
print(flavors.sample(5))

### Counting Categorical Series

Suppose our ice cream flavors were the results of a survey filled out by your fellow classmates. It would be nice to know which flavors were the most popular, least popular, etc. You could sum up the responses for each flavor using a `for` loop. Fortunately, pandas provides a much easier way to arrive at our summary by using the `value_counts()` method. Let's try it.

In [None]:
# Find out the popularity of each flavor
print(flavors.value_counts())

In [None]:
# What type does value_counts() return?
print(type(flavors.value_counts()))

Because the returned object is a `Series`, you can use the `.index` and `.values` attributes just like you would on any other `Series` object.
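
As a rough sketch of what `value_counts()` saves us from writing, the block below tallies a short, made-up list of flavors with a `for` loop and then compares the result to the pandas version. The names `sample_flavors`, `tally`, and `counts` are illustrative only and are not part of the survey data above.

```python
import pandas as pd

# Hypothetical mini-survey (not the full ice_cream list above)
sample_flavors = ['vanilla', 'chocolate', 'vanilla',
                  'strawberry', 'vanilla', 'chocolate']

# Manual tally using a for loop and a dictionary
tally = {}
for flavor in sample_flavors:
    tally[flavor] = tally.get(flavor, 0) + 1

# The same tally with pandas; the result is a Series
# sorted by count in descending order
counts = pd.Series(sample_flavors).value_counts()

print(tally)
print(counts.index.tolist())   # the flavors (index labels)
print(counts.values.tolist())  # the number of responses for each flavor
```

The flavors end up in the `.index` of the result and the counts in its `.values`, which is why both attributes are useful on the object that `value_counts()` returns.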

There is also a way to easily find only the unique values for a categorical `Series` like our `flavors` object: use the `unique()` method.

In [None]:
# Find only the unique flavors
print(flavors.unique())

<hr style="border:1px solid gray">

<font color='red' size = '5'> Student Exercise </font>

In the code cell below, you have been given a `list` representing the responses of your third-grade niece's survey to her classmates. She asked them, "What is your favorite color?" 

1. Run the code cell that contains the variable named `colors`.
2. Create a `Series` object named `color_series`.
3. For each color, how many students responded it was their favorite? Which one is the most popular? Least popular?
4. How many different, unique colors were given in the responses?

-----

In [None]:
# 1. Run the code cell that contains the variable named `colors`.
colors = ['red', 'orange', 'yellow', 'green', 'pink', 'purple',
          'blue', 'indigo', 'blue', 'red', 'green', 'violet', 
          'purple', 'red', 'blue', 'blue', 'yellow', 'green',
          'blue', 'red', 'purple']

In [None]:
# 2. Create a `Series` object named `color_series`.


In [None]:
# 3. For each color, how many students responded it was their
# favorite? Which one is the most popular? Least popular?


In [None]:
# 4. How many different, unique colors were given in the responses?


<hr style="border:1px solid gray">

## Numerical `Series`

I want to create some random numerical data to see how a `Series` containing numerical data is different from a `Series` with categorical or string data, as we had above. We'll use the `numpy` package to generate the random numbers. (We will discuss the `numpy` package in a later module.)

In [None]:
# import the numpy package using np alias
import numpy as np

In [None]:
# Set the seed so that I can replicate the random numbers
np.random.seed(42)

# List comprehension to generate random floating point numbers
# rounded to one decimal place
temps = [float(f'{np.random.randint(45, 67) + np.random.random():.1f}') for _ in range(50)]
print(temps)

In [None]:
# Create a series from float list temps
tempF = pd.Series(temps)
print(tempF)

-----

### Accessing `Series` Data

Data rows can be accessed either by their label in the index or by their position in the `Series`. The `.loc` indexer finds rows based on their label. The `.iloc` indexer finds rows based on their position; that is, the order in which the rows appear in the `Series`. One way to remember the difference between the two is that the `i` in `iloc` stands for the **integer** position to look up. When we created our `tempF` `Series`, we did not specify an index. Therefore, it has a `RangeIndex` starting at 0 and incrementing by 1 for each subsequent row. For a `Series` created this way, looking up a single row with `.loc` or `.iloc` gives the same result, because each label equals its position. This is the case for our `tempF` object.

In [None]:
# Get the element using the labeled index of 0
print(tempF.loc[0])

In [None]:
# Get the element using the INTEGER position of 0
print(tempF.iloc[0])

But now let's change the indices to something else.

In [None]:
# Change index to start at 50
tempF.index = range(50, 100, 1)
print(tempF.index)

The statement below will result in a `KeyError` because our index labels have changed.

In [None]:
tempF.loc[0]

To get the first element of the `Series`, we need to use the newly labeled index of 50.

Using `.iloc[0]`, on the other hand, will still give us the first row of data in the `Series`.

In [None]:
# Try getting the first element with the new label
print(tempF.loc[50])

In [None]:
# Using the integer position we can still use 0 for first element
print(tempF.iloc[0])
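
The difference between labels and positions is perhaps easiest to see when the labels are not integers at all. The following is a minimal sketch using a made-up `Series` named `week` (a hypothetical name, with invented temperatures) whose index holds weekday abbreviations.

```python
import pandas as pd

# Hypothetical daily temperatures labeled by weekday
week = pd.Series([61.2, 58.7, 64.0], index=['Mon', 'Tue', 'Wed'])

# Look up a row by its index label
by_label = week.loc['Tue']

# Look up the same row by its integer position (second row)
by_position = week.iloc[1]

print(by_label, by_position)
```

Both lookups return the same value here; `.loc` uses the label `'Tue'`, while `.iloc` only cares that the row is second in the `Series`.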

-----

### `Series` Methods for Numerical Data

When we have a numerical data type in a `Series`, we will often want to find some summary statistics to get an idea of the data we are dealing with. Let's try a few of the methods we have available to us.

In [None]:
# We add up the values with sum()
print(f'tempF.sum():     {tempF.sum()}')

# We can find the average with mean()
print(f'tempF.mean():    {tempF.mean()}')

# We can find the median with median()
print(f'tempF.median():  {tempF.median()}')

# We can find the standard deviation with std()
print(f'tempF.std():     {tempF.std()}')

# We can find the product with product()
print(f'tempF.product(): {tempF.product()}')

Those methods are all nice, but there is a single method we can use on a numerical `Series` that provides several of the most common summary statistics: `describe()`.

In [None]:
# Get the summary statistics
print(tempF.describe())

### Sorting

We will also want to sort a `Series` based on the numerical values. As expected, you can sort in either ascending or descending order.

In [None]:
# First look at the head()
print(tempF.head())

In [None]:
# Try sorting to see what happens
tempF.sort_values()
print(tempF.head())

Well, that did **not** work the way we had hoped. What happened? The `sort_values()` method returns a new `Series` object and leaves the original unchanged. We can store the result in a new variable and see if that behaves the way we had hoped.

In [None]:
sorted_temps = tempF.sort_values()
print(sorted_temps.head())

What if we want to keep it in the same variable? We can sort "inplace".

In [None]:
# See original
print(tempF.head())

In [None]:
# Sort in place
tempF.sort_values(inplace=True)
print(tempF.head())

Now, if we want to sort in descending order, we have to add the argument `ascending=False`.

In [None]:
tempF.sort_values(inplace=True, ascending=False)
print(tempF.head())

What if you want to get back to the original sort order? Well, if you created the `Series` object with a `RangeIndex`, you can use that to get back to the original order by calling `sort_index()`.

In [None]:
# Re-sort using the index
tempF.sort_index(inplace=True)
print(tempF.head())

----

### Concatenating One Series to Another

We can use the `pd.concat()` function to concatenate multiple `Series` objects. As we saw with sorting, the combined `Series` is returned as a new object; it is lost unless you assign it back to the original variable or save it in a new one.

In [None]:
# Look at original size and index
print(f'tempF.size:  {tempF.size}')
print(f'tempF.index: {tempF.index}')

# Create a new single element Series
temp_series = pd.Series([0.0])

# append it to tempF and see if it "stuck"
pd.concat([tempF, temp_series])

print('\nAFTER CONCATENATING:')
print(f'tempF.size:  {tempF.size}')
print(f'tempF.index: {tempF.index}')

This confirms that it did **not** save the result of the concatenation back to the original `Series` object. Let's try again.

In [None]:
# append it to tempF and save it back to tempF
tempF = pd.concat([tempF, temp_series])

print('AFTER CONCATENATING:')
print(f'tempF.size:  {tempF.size}')
print(f'tempF.index: {tempF.index}')

Now, we have our combined `Series`. Notice that the index it gave the new element was `0`. If we want to get rid of that new value, then we can use the `.drop()` method. The `.drop()` method deletes a row in a `Series` based on the row label (i.e., index). Let's try it.

In [None]:
tempF.drop(labels=0, inplace=True)
print('AFTER DROPPING:')
print(f'tempF.size:  {tempF.size}')
print(f'tempF.index: {tempF.index}')
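
If the original labels do not matter, one alternative to dropping the stray label afterward is to let `pd.concat()` renumber the rows. The sketch below uses two small made-up `Series` (`first` and `extra` are hypothetical names) rather than `tempF`, and relies on the `ignore_index=True` argument of `pd.concat()`. Note that this discards **all** of the old labels, not just the new one.

```python
import pandas as pd

# Hypothetical Series with labels starting at 50, plus a one-element Series
first = pd.Series([70.1, 68.4, 72.9], index=range(50, 53))
extra = pd.Series([0.0])

# Default behavior: the original labels are kept,
# so the new row keeps its label of 0
kept = pd.concat([first, extra])

# ignore_index=True discards the old labels and renumbers from 0
renumbered = pd.concat([first, extra], ignore_index=True)

print(kept.index.tolist())
print(renumbered.index.tolist())
```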

<hr style="border:1px solid gray">

<font color='red' size = '5'> Student Exercise </font>

In the code cell below, you have been given a `list` of the US presidents' heights in centimeters. Complete the following tasks:

1. Run the code cell that contains the variable named `prez_heights`.
2. Create a `Series` object named `prez_series` from the given list. Print out its type.
3. Print the first 7 elements of `prez_series`.
4. Print the last 4 elements of `prez_series`.
5. Print a sample of 5 elements of `prez_series`.
6. Print the summary statistics for `prez_series`.

-----

In [None]:
# 1. Run the code cell that contains the variable named `prez_heights`.
prez_heights = [193, 192, 191, 189, 188, 188, 188, 188, 188, 187, 
               185, 185, 185, 183, 183, 183, 183, 183, 183, 182, 
               182, 182, 182, 182, 180, 180, 179, 178, 178, 178, 
               178, 177, 175, 175, 174, 173, 173, 173, 173, 171, 
               170, 170, 168, 168, 163]

In [None]:
# 2. Create a `Series` object named `prez_series` from 
# the given list. Print out its type.


In [None]:
# 3. Print the first 7 elements of `prez_series`.


In [None]:
# 4. Print the last 4 elements of `prez_series`.


In [None]:
# 5. Print a sample of 5 elements of `prez_series`.


In [None]:
# 6. Print the summary statistics for `prez_series`.


<hr style="border:1px solid gray">

## Understanding `DataFrame`s

The `Series` class can be thought of as a single column of a spreadsheet where all the data is the same type. The `DataFrame` class builds on the `Series` class by having many columns, each with their own data type. You can think of this as representing the entire spreadsheet. So, a `DataFrame` is simply a two-dimensional object where each column is a `Series`. Thus, all of the `Series` properties and methods we worked with earlier can be applied to individual `DataFrame` columns. 
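
To make the spreadsheet analogy concrete, here is a minimal sketch that builds a `DataFrame` from one list per column and then checks that a single column is a `Series`. The data and the names `people`, `names`, `ages`, and `heights_cm` are invented for illustration.

```python
import pandas as pd

# Hypothetical data: one list per column
names = ['Ann', 'Ben', 'Cal']
ages = [34, 28, 45]
heights_cm = [170.2, 181.5, 165.0]

# Each dictionary key becomes a column name
people = pd.DataFrame({'Name': names, 'Age': ages, 'HeightCm': heights_cm})

print(people)
print(type(people['Age']))   # a single column is a Series
print(people.dtypes)         # each column has its own data type
print(people['Age'].mean())  # Series methods work on a column
```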

### Commonly Used Attributes

Here are a few of the commonly used attributes on a `DataFrame` object.

| Attribute | Returns |
|:----------|:--------|
|`dtypes` | The data type of each column |
|`shape` | A `tuple` containing the number of rows and the number of columns |
|`index` | The `Index` object (row labels) of the `DataFrame` |
|`columns` | The names of the columns |
|`values` | The data in the `DataFrame` object as a `numpy.ndarray` |
|`empty` | `True` if the `DataFrame` object has no data, otherwise `False` |
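
The attributes in the table can be tried out on any small `DataFrame`. The one below, `inventory`, is a made-up example with invented values; it is only a sketch for exploring what each attribute returns.

```python
import pandas as pd

# Hypothetical inventory data
inventory = pd.DataFrame({'Item': ['pen', 'notebook', 'stapler'],
                          'Quantity': [120, 45, 8],
                          'Price': [1.25, 3.50, 12.99]})

print(inventory.dtypes)    # data type of each column
print(inventory.shape)     # (3, 3): three rows, three columns
print(inventory.index)     # the row labels
print(inventory.columns)   # the column names
print(inventory.values)    # the underlying data
print(inventory.empty)     # False, because there is data

# A DataFrame created with no data is empty
print(pd.DataFrame().empty)
```

Notice that these are attributes, not methods, so they are written without parentheses.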

<hr style="border:1px solid gray">

## Manipulating `DataFrame`s

One way to create a DataFrame is from a list for each column. This is only one of several ways, and I encourage you to explore other ways to use the constructor of the `DataFrame` class to create objects.

Once you have a DataFrame in memory, you can begin to manipulate it and perform some analysis of the data. You can extract an entire column by using the square bracket notation: `data_frame_name['column_name']`. You can also use the dot notation to retrieve an entire column, provided the column name is a valid Python identifier (for example, it contains no spaces) and does not clash with an existing `DataFrame` attribute or method: `data_frame_name.column_name`.

Getting rows of a `DataFrame` is similar to retrieving a row in a `Series`. You can either use `.loc[label]` or `.iloc[integer position]`.
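
The following sketch pulls columns and rows out of a small, invented `DataFrame` named `pets` (a hypothetical name). One column name deliberately contains a space to show a case where only the square bracket notation applies.

```python
import pandas as pd

# Hypothetical data with a custom integer index
pets = pd.DataFrame({'Name': ['Rex', 'Tom', 'Tweety'],
                     'Species': ['dog', 'cat', 'bird'],
                     'Weight Kg': [30.0, 4.5, 0.1]},
                    index=[101, 102, 103])

# Two ways to get the same column
species_bracket = pets['Species']
species_dot = pets.Species

# A column name containing a space requires square brackets
weights = pets['Weight Kg']

# Two ways to get the same row: by label and by integer position
row_by_label = pets.loc[102]
row_by_position = pets.iloc[1]

print(species_bracket)
print(row_by_label)
```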

### Attributes and Methods of a `DataFrame`

We've already seen some of the commonly used attributes for getting information about a `DataFrame`. One detail that you should be aware of is that the type of the returned object from calling `values` is a `numpy.ndarray` - something we will see more of later.

### Sorting a `DataFrame`

We can sort a `DataFrame` by a specified column using the `sort_values()` method and passing the name of the column to the `by` argument. By default, the method returns a new, sorted `DataFrame` and leaves the original unchanged. If you want the original object itself to be sorted, you can use the `inplace=True` argument.
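
Here is a short sketch of both styles of sorting, using a made-up `DataFrame` named `scores` (hypothetical names and values).

```python
import pandas as pd

# Hypothetical test scores
scores = pd.DataFrame({'Student': ['Ava', 'Ben', 'Cho', 'Dev'],
                       'Score': [88, 95, 72, 91]})

# Returns a new DataFrame sorted from highest to lowest score
by_score = scores.sort_values(by='Score', ascending=False)
print(by_score['Student'].tolist())

# The original is still in its original order at this point
print(scores['Student'].tolist())

# Sort the original object itself (ascending is the default)
scores.sort_values(by='Score', inplace=True)
print(scores['Student'].tolist())
```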

### Creating New Columns in a `DataFrame`

There are many times that you will want to create a new column within an existing `DataFrame`, often based on the current columns. In machine learning parlance, this is often referred to as "feature engineering". To keep things very, very simple, we will simply convert one of the columns from centimeters to inches. Hopefully it is obvious that this would not really help us if we were trying to improve some of our machine learning models, since it is simply scaling the variable rather than exploring a relationship among different variables. 

Let's convert the `PetalLengthCm` column to inches and name it `PetalLengthIn`. 

The statement below might be a little confusing at first glance. It seems we are dividing a `DataFrame` column, which is a `Series`, by a constant. Will this work even though it appears we have mixed types of a `Series` and a `float`? Indeed, it will work. The division is interpreted element-wise, so each element of the `Series` is divided by our conversion constant of 2.54.

```python
# Assumes the iris data has already been loaded into a DataFrame named iris
# conversion constant
cm_per_inch = 2.54
iris['PetalLengthIn'] = iris['PetalLengthCm'] / cm_per_inch
iris
```

### Grouping

As expert users of Excel, you understand PivotTables and really want to use them in Python. We'll look at this in more detail later, but here is a small taste of creating one. We are interested in looking at the average values for each numerical column for each species. That is, we want to "group by" species and see the average for all the columns.

```python
# Group by species and average the numerical columns (similar to a PivotTable)
iris.groupby('Species').mean()
```

Wow! Super easy! You can also use other statistical functions such as median.
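
If you do not have the iris data loaded, the same two ideas (an element-wise calculation to create a column, followed by a "group by") can be sketched on a tiny stand-in. The `flowers` `DataFrame` below is hypothetical and its values are made up; only the column names mirror the ones used above.

```python
import pandas as pd

# Hypothetical miniature stand-in for the iris data (values are made up)
flowers = pd.DataFrame({
    'Species': ['setosa', 'setosa', 'virginica', 'virginica'],
    'PetalLengthCm': [1.4, 1.6, 5.08, 6.0],
})

# Element-wise division: every row is divided by the constant
cm_per_inch = 2.54
flowers['PetalLengthIn'] = flowers['PetalLengthCm'] / cm_per_inch

# One row per species, with the mean (or median) of each numerical column
means = flowers.groupby('Species').mean()
medians = flowers.groupby('Species').median()

print(flowers)
print(means)
print(medians)
```

The result of the `groupby` is itself a `DataFrame` whose index holds the species names.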

<hr style="border:1px solid gray">

## Reading and Manipulating `.csv` Files

Commonly, the data we are interested in resides in external files. We have already seen ways to read data from files using built-in "base" Python functions. Fortunately, `pandas` provides an efficient and easy alternative for reading data from files into a `DataFrame`. If your data is in a text file, such as a `.csv` file, you can use the `pd.read_csv()` function. The nice thing about `.csv` files is that they are plain text and can easily be transferred and read on any operating system. If instead your data is stored in Microsoft Excel files, `pandas` has functions to read data from that format too. Let's start with `.csv` files.

In [None]:
# Use pandas to read states.csv into a DataFrame
states = pd.read_csv('./data/states.csv')
states

In [None]:
# Get a sample of the states DataFrame
states.sample()

In [None]:
# That only returned a single row
# Let's get 5 rows
states.sample(5)

In [None]:
# Sort by population descending
states.sort_values(by='Population', ascending=False)

In [None]:
# What if we wanted only the top five most populous states?
# We could sort and then take the .head(5) as one approach
top5 = states.sort_values(by='Population', ascending=False).head(5)
top5

In [None]:
# What if we wanted the top five largest by land area?
top5_land = states.sort_values(by='SquareMiles', ascending=False).head(5)
top5_land

In [None]:
# Try .describe() to see what it does for us
states.describe()

In [None]:
# What if we wanted to add up the columns?
# Now just call the .sum() on the DataFrame 
states.sum()

In [None]:
# What if we don't want State?
# We could specify the columns we want as a list
states[['Population','ElectoralVotes','HighwayMiles','SquareMiles']].sum()

If your list of columns is large, listing them all out is neither fun nor practical. If you know the name of the column you do not want to include, you can instead specify to **exclude** it. We do so next.

In [None]:
# Exclude the column named 'State'
states.loc[:, states.columns != 'State'].sum()
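
When the goal is simply "add up the numerical columns", there are a couple of other options that avoid naming columns at all. The sketch below compares them on a made-up `DataFrame` named `regions` (hypothetical, with invented values) instead of the `states` data. It uses the `select_dtypes()` method and the `numeric_only` argument of `sum()`.

```python
import pandas as pd

# Hypothetical stand-in for the states data (values are made up)
regions = pd.DataFrame({'State': ['A', 'B', 'C'],
                        'Population': [100, 250, 75],
                        'SquareMiles': [10.5, 20.0, 5.5]})

# Approach used above: exclude one column by name
mask_totals = regions.loc[:, regions.columns != 'State'].sum()

# Keep only the numerical columns, whatever they are called
numeric_totals = regions.select_dtypes(include='number').sum()

# Ask sum() itself to skip non-numerical columns
flag_totals = regions.sum(numeric_only=True)

print(mask_totals)
print(numeric_totals)
print(flag_totals)
```

All three approaches give the same totals for this data. Excluding by name is handy when you know exactly which column to leave out, while the other two adapt automatically if more text columns are present.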

<hr style="border:1px solid gray">

## Writing `.csv` Files

Once you have manipulated the data, you may wish to save the results in a file. For example, earlier we found the top five states based on their land area. Suppose we wanted to save those results to a file as a `.csv` file. That is straightforward to do with `pandas` by using the `.to_csv()` method. Let's try it.

In [None]:
# Assuming top5_land is still in memory
# Print it out to see it
top5_land

In [None]:
# Now write it to a file named `top5_land.csv`
top5_land.to_csv('./data/top5_land.csv')

After opening up the file (either with Excel or in the Jupyter interface), you should notice something. The first column does not have a header. If you look closer, you will notice that this is the index from the original `DataFrame`. In many cases, we may not want the index included when we export the data to a `.csv` file. By default the `.to_csv()` method exports the index. You can turn it off by using the argument `index=False`. Let's try that and see if it works.

In [None]:
# Write it out again, without the index
top5_land.to_csv('./data/top5_land.csv', index=False)
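
To see the effect of `index=False` without opening a file, you can ask `.to_csv()` for the text it would write. When no file name is given, the method returns the `.csv` contents as a string. The sketch below uses a small, made-up `DataFrame` named `areas` (hypothetical values) and reads the text back in with `pd.read_csv()` through an in-memory `io.StringIO` buffer.

```python
import io
import pandas as pd

# Hypothetical data with a non-default index, like a sorted subset would have
areas = pd.DataFrame({'State': ['X', 'Y'], 'SquareMiles': [500, 300]},
                     index=[7, 3])

# With no file name, to_csv() returns the text instead of writing a file
with_index = areas.to_csv()
without_index = areas.to_csv(index=False)

print(with_index)      # first column is the index, with an empty header
print(without_index)   # only the data columns

# Read the text back in; a fresh RangeIndex is created
round_trip = pd.read_csv(io.StringIO(without_index))
print(round_trip)
```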

<hr style="border:1px solid gray">

## Reading Simple `.xlsx` Files

With `pandas`, we can also easily read data from Microsoft Excel files. We will start by looking at a simple Excel file that only contains a single worksheet. We will be using the file `presidents.xlsx`, which lists each US president along with their number, name, and height in centimeters. We use the `pd.read_excel()` function to get data out of an Excel file. Notice that when we read the file in this time, we specify the index to use: the column `PresidentNum`.

In [None]:
# Read the contents of the file presidents.xlsx into presidents
presidents = pd.read_excel('./data/presidents.xlsx', 
                           index_col='PresidentNum')
presidents

In [None]:
# Thomas Jefferson was the 3rd US president
# Print out his row of data using index label
presidents.loc[3]

In [None]:
# Print out TJ's row of data using position
# REMEMBER counting starts at 0
presidents.iloc[2]

In [None]:
# Who were the six shortest presidents?
presidents.sort_values(by='HeightCm').head(6)

In [None]:
# Did any president serve non-consecutive terms?
presidents.Name.value_counts()

In [None]:
# What were the presidential numbers that Grover Cleveland served?
presidents[presidents.Name == 'Grover Cleveland']

<hr style="border:1px solid gray">

<font color='red' size = '5'> Student Exercise </font>

You have been given an Excel file with the list of US presidents and their heights. Complete the following tasks in the code cells below:

1. Read the data from Excel file into a `DataFrame` called `prez`, being sure to use the `PresidentNum` as the index. What is its shape?
2. Sample 5 rows from `prez` to see what the data looks like.
3. Print out the summary statistics for the numerical columns in `prez`.
4. Print out the summary statistics for **ALL** columns in `prez`.
5. Create a new column that converts `HeightCm` into inches. Print out the `prez` to verify that it worked.

-----

In [None]:
# 1. Read in the Excel file 'presidents.xlsx' into a DataFrame called prez,
# being sure to use the `PresidentNum` as the index.

# What is its shape?


In [None]:
# 2. Sample 5 rows from `prez` to see what the data looks like.


In [None]:
# 3. Print out the summary statistics for the numerical columns in `prez`.


In [None]:
# 4. Print out the summary statistics for **ALL** columns in `prez`.


In [None]:
# 5. Create a new column that converts `HeightCm` into inches.
# Print out the `prez` to verify that it worked.


<hr style="border:1px solid gray">

## Reading `.xlsx` Files with Multiple Worksheets

In many situations, you will need to read data from multiple worksheets in an Excel workbook. If you know the names of the worksheets, you can specify the ones you want to read in with the `sheet_name` argument. You can also specify the position of the sheets with integers. If you want to read all the sheets, you can specify `sheet_name=None`. The result will be a dictionary. 

We have a spreadsheet containing information about our historical sales in the workbook named `storeSales.xlsx`. Let's try reading in different worksheets. (Note: This file is "large", so some of the reading may take a few seconds.)

In [None]:
first_sheet = pd.read_excel('./data/storeSales.xlsx')
first_sheet

In [None]:
# What does the 3rd sheet look like?
# Remember indexing starts at 0
third_sheet = pd.read_excel('./data/storeSales.xlsx', sheet_name=2)
third_sheet

In [None]:
# Now let's try reading in all the sheets
# NOTE: This may take a minute or so ... be patient
store_sales = pd.read_excel('./data/storeSales.xlsx', sheet_name=None)

In [None]:
# How many sheets total?
print(f'There are {len(store_sales)} worksheets')

In [None]:
# What are their names?
print(store_sales.keys())

In [None]:
# What does the sheet 'promotions' look like?
store_sales['promotions']

In [None]:
# Look at the data types for promotions
store_sales['promotions'].dtypes

<hr style="border:1px solid gray">

## Writing `.xlsx` Files

When you only need to write data to an Excel file with a **single** sheet, the process is straightforward. You simply call the `.to_excel('desired_file_name.xlsx')` method on the `DataFrame`. That is, it is only necessary to specify a target file name.

To write to multiple sheets, it is necessary to create an `ExcelWriter` object with a target file name and then specify a sheet in the file to write to. By specifying unique `sheet_name` arguments, multiple sheets may be written. Note that, in its default write mode, creating an `ExcelWriter` object with a file name that already exists will erase the contents of the existing file.

Let's try it.

In [None]:
# Find the 6 shortest US presidents and write to `short_prez.xlsx`
# Assuming that presidents is still in memory here
short_prez = presidents.sort_values(by='HeightCm').head(6)
short_prez.to_excel('./data/short_prez.xlsx')
print('Should have saved these to short_prez.xlsx')
short_prez

Let's open up that file and look at it. You should notice that it saved the index column and the names of the columns by default. This is the result we are probably hoping for in this case. If you do not want the index saved, similar to what we did earlier for `.csv` files, then you can specify `index=False`.

----

Now let's try writing multiple worksheets to a new Excel file. Let's take the first 10 rows of each worksheet from our `store_sales` data and write it back out to a new Excel file named `first_ten.xlsx`. Each worksheet should use the same name as in the original data file.

In [None]:
# Let's first just loop over store_sales, making sure we can get
# the sheet name and the first 10 rows of data
for k,v in store_sales.items():
    print(k)
    print(v.head(10))

In [None]:
# Use an `ExcelWriter` object
with pd.ExcelWriter('./data/first_ten.xlsx') as writer:
    # Loop over the dictionary. k = sheet_name, v.head(10) is DataFrame
    for k,v in store_sales.items():
        print(f'Writing to sheet {k}')
        v.head(10).to_excel(writer, sheet_name=k)

In [None]:
# After opening the resulting file, we see that it included
# the index in each sheet, which we really don't want
# Try again ...
with pd.ExcelWriter('./data/first_ten.xlsx') as writer:
    # Loop over the dictionary. k = sheet_name, v.head(10) is DataFrame
    for k,v in store_sales.items():
        print(f'Writing to sheet {k}')
        v.head(10).to_excel(writer, sheet_name=k, index=False)

<hr style="border:1px solid gray">

<font color='red' size = '5'> Student Exercise </font>

Complete the following tasks in the code cells below:

1. Using the `DataFrame` called `presidents`, find the 7 tallest US presidents, storing the result in a variable named `tall_prez`.
2. Export `tall_prez` to the file `tall_prez.xlsx`. Verify it worked.
3. Using the `store_sales` data, sample seven rows from the worksheets 'customers' and 'transactions' and write the results to a new Excel file named `random_seven.xlsx`. Each worksheet should use the same name as the worksheet the data came from.


-----

In [None]:
# 1. Using the `DataFrame` called `presidents`, find the 7 tallest 
# US presidents, storing the result in a variable named `tall_prez`.


In [None]:
# 2. Export `tall_prez` to the file `tall_prez.xlsx`. Verify it worked.


In [None]:
# 3. Using the `store_sales` data, sample seven rows from the worksheets 
# 'customers' and 'transactions' and write the results to a new Excel file
# named `random_seven.xlsx`. Each worksheet should use the same name as 
# the worksheet the data came from.


print('Do not forget to verify it worked!')

![Panda waving good-bye](https://mddean.people.wm.edu/MBA/images/panda4.jpg)

### Additional Resources

The following links point you to additional resources that you might find helpful in learning this material.

1. The official API reference for [`pandas.Series`][1].
2. The official API reference for [`pandas.DataFrame`][2].
3. The [user guide][3] for `pandas`.

-----

[1]: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.html
[2]: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html
[3]: https://pandas.pydata.org/pandas-docs/stable/user_guide/index.html


**&copy; 2022 - Present: Matthew D. Dean, Ph.D.   
Clinical Associate Professor of Business Analytics at William \& Mary.**