# What is Pandas?
NumPy is fast and organized. In some ways, it can be used like a table, since we can access individual rows and columns of a NumPy matrix. However, because a NumPy array holds a single data type, the column names can't be part of the matrix without making it difficult to work with a column of purely numerical data. If a text header sat at the top of a column, taking the mean of that column would fail, because text cannot be included in the calculation.
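As a rough illustration of that limitation, the sketch below places a header in the same array as the numbers and then shows how a DataFrame keeps the name separate from the values. The sample values are made up for this example.

```python
import numpy as np
import pandas as pd

# Putting a text header in the same array forces every value to become text.
with_header = np.array([["age"], [47], [35]])
print(with_header.dtype.kind)  # "U" is NumPy's code for Unicode text

# A DataFrame stores the column name separately, so the values stay numeric.
df = pd.DataFrame({"age": [47, 35]})
print(df["age"].mean())
```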

Pandas is a Python library built on top of NumPy that is useful for loading, cleaning, and analyzing two-dimensional (table-like) data. Data scientists often use it to import data from a particular format (.csv, .xlsx, .txt), explore it, and transform it, turning raw data into information.
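A minimal sketch of the import step, assuming the data arrives as CSV text. An in-memory string stands in for a file here so the example runs anywhere; with a real file you would pass its path to `pd.read_csv()` instead.

```python
import io
import pandas as pd

# Hypothetical CSV content; normally this would live in a .csv file.
csv_text = "firstName,lastName,age\nMichael,Scott,47\nDwight,Schrute,35\n"

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)   # (number of rows, number of columns)
print(df.dtypes)  # the data type pandas inferred for each column
```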

Pandas is fast, powerful, and easy to use. It facilitates data imports and converts tables into DataFrames, which come pre-equipped with methods for summarizing, cleaning, and exporting data. Pandas DataFrames are also compatible with other scientific libraries, such as the visualization tool matplotlib.

The Pandas module of this course is the largest and most intense one. However, don't be afraid! Pandas is easy to grasp if you've worked with SQL and Python before. Below is a summary of what you will learn in this module:

## Using Pandas with Jupyter Lab
Pandas and Jupyter Lab are like peanut butter and jelly; they were made to be used together! Pandas data is contained in organized structures called DataFrames, which are printed as plain text in a regular Python console but rendered as HTML tables in Jupyter Lab. This means that you can view Pandas data as a formatted table, hover over individual rows, and scroll vertically and horizontally to see more rows and columns. Pandas also has options that allow you to customize how data is displayed in your Jupyter Lab notebooks.
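The sketch below hints at the difference between the two kinds of output and shows two of the display options. The option values (20 rows, 10 columns) are arbitrary choices for illustration, and `to_html()` is used only to show that a DataFrame can produce HTML markup.

```python
import pandas as pd

df = pd.DataFrame({"firstName": ["Michael", "Dwight"], "age": [47, 35]})

# In a plain console, printing a DataFrame produces aligned text.
print(df)

# A DataFrame can also produce HTML table markup, similar to what a notebook displays.
html = df.to_html()

# Display options control how much of a large DataFrame is shown.
pd.set_option("display.max_rows", 20)
pd.set_option("display.max_columns", 10)
```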

## DataFrames
Pandas organizes information into data structures called DataFrames, which are table-like in their representation and thus easy to reason about. Conceptually, a DataFrame is like a dictionary of Series objects, where each Series represents a column and holds an array of values (one value for each row of data). The key of each item in the dictionary is the column name, and the value of the item is that column's array of data.

```python
import pandas as pd

# On the outside, a DataFrame looks like a table. On the inside, it looks something like this:
df = pd.DataFrame(
    {
        "firstName": ["Michael", "Dwight", "Jim", "Pam"],
        "lastName": ["Scott", "Schrute", "Halpert", "Beesley"],
        "age": [47, 35, 31, 29]
    }
)
```

DataFrames and Series objects are the fundamental building blocks of Pandas. They come with many useful built-in methods, including `.describe()` and `.info()` for obtaining quick overviews of the data.
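To make the relationship between a DataFrame and its Series concrete, here is a small sketch that pulls out one column and calls the two overview methods. It rebuilds a trimmed version of the sample data so that it runs on its own.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "firstName": ["Michael", "Dwight", "Jim", "Pam"],
        "age": [47, 35, 31, 29],
    }
)

# Selecting one column returns a Series.
ages = df["age"]
print(type(ages).__name__)

# The underlying values are available as a NumPy array.
print(ages.to_numpy())

# info() prints column names, non-null counts, and data types.
df.info()

# describe() returns summary statistics for the numeric columns.
summary = df.describe()
print(summary)
```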

## Indexes
Indexes in Pandas build on the way indexing worked in NumPy, meaning that each row and each column can still be accessed by its numerical position. However, Pandas DataFrames add functionality to these indexes, allowing us to access data not only by its location but also by its label (i.e., column names or row names). This makes data access easier, since memorizing positions is unnecessary. Pandas also allows the user to manually add, remove, and rename indexes to accommodate organization and performance needs.
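A short sketch of position-based versus label-based access. Using `firstName` as the row index is just one possible choice for this sample data.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "firstName": ["Michael", "Dwight", "Jim", "Pam"],
        "age": [47, 35, 31, 29],
    }
)

# Access by position: second row, second column.
age_by_position = df.iloc[1, 1]

# Use a column as the row index, then access by label.
by_name = df.set_index("firstName")
age_by_label = by_name.loc["Dwight", "age"]

# Index labels can be renamed, and the index can be turned back into a column.
renamed = by_name.rename(index={"Pam": "Pamela"})
restored = by_name.reset_index()
```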

## Filtering
Much in the same way that we saw with NumPy arrays, Pandas DataFrames can be filtered to include or exclude rows that meet specified conditions. Pandas also comes pre-equipped with functions that process entire Series objects (arrays of data) using vectorized operations, meaning that filtering is performed more quickly and easily than it would be by iterating in a for-loop.
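Filtering can be sketched with a boolean mask, as below. The age threshold and the excluded last name are arbitrary values chosen for the example.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "firstName": ["Michael", "Dwight", "Jim", "Pam"],
        "lastName": ["Scott", "Schrute", "Halpert", "Beesley"],
        "age": [47, 35, 31, 29],
    }
)

# Comparing a whole column at once produces a Series of True/False values.
mask = df["age"] > 30

# Indexing with the mask keeps only the rows where it is True.
over_30 = df[mask]

# Conditions are combined with & (and) or | (or), each wrapped in parentheses.
combined = df[(df["age"] > 30) & (df["lastName"] != "Scott")]
```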

## Updating rows and columns
Pandas uses vectorized operations to update an entire column or set of rows at once, rather than iterating through each value. With the `.apply()` and `.map()` methods, Pandas also allows the user to apply their own custom functions to datasets to clean and modify them.
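The sketch below contrasts a vectorized update with `.map()` and `.apply()`. The new column names (`ageNextYear`, `firstInitial`, `fullName`) are invented for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "firstName": ["Michael", "Dwight"],
        "lastName": ["Scott", "Schrute"],
        "age": [47, 35],
    }
)

# Vectorized update: the addition is applied to every row at once.
df["ageNextYear"] = df["age"] + 1

# map() runs a custom function on each value of a single Series.
df["firstInitial"] = df["firstName"].map(lambda name: name[0])

# apply() with axis=1 runs a custom function on each row of the DataFrame.
df["fullName"] = df.apply(
    lambda row: f"{row['firstName']} {row['lastName']}", axis=1
)
```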

## Adding and removing rows and columns
As might be expected, Pandas allows the user to add and remove rows and columns from DataFrames. Removing data is as easy as filtering the DataFrame and then saving the new structure to the original variable name. Adding new rows and columns can be accomplished with `pd.concat()`, `.join()`, and `.merge()`. Older tutorials use `.append()` for adding rows, but that method was removed in pandas 2.0 in favor of `pd.concat()`. These functions play a role similar to a SQL INSERT (adding rows) or JOIN (adding columns from another table).
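A minimal sketch of adding and removing data follows. It uses `pd.concat()` to add rows, since `DataFrame.append()` was removed in pandas 2.0. The `dept` and `floor` columns and their values are hypothetical sample data.

```python
import pandas as pd

employees = pd.DataFrame(
    {"firstName": ["Michael", "Dwight"], "dept": ["Management", "Sales"]}
)
new_rows = pd.DataFrame({"firstName": ["Jim"], "dept": ["Sales"]})

# Add rows by stacking two DataFrames (comparable to a SQL INSERT).
all_employees = pd.concat([employees, new_rows], ignore_index=True)

# Add columns from another table by matching on a key (comparable to a SQL JOIN).
floors = pd.DataFrame({"dept": ["Management", "Sales"], "floor": [2, 1]})
merged = all_employees.merge(floors, on="dept", how="left")

# Remove a column with drop(), or remove rows by filtering.
without_floor = merged.drop(columns=["floor"])
only_sales = merged[merged["dept"] == "Sales"]
```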

## Sorting data
Just like in SQL, Pandas DataFrames can be sorted by one or more columns. This can be useful for data exploration and data cleaning.
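Sorting can be sketched with `.sort_values()`, roughly the counterpart of a SQL ORDER BY. The `dept` column is hypothetical sample data.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "firstName": ["Michael", "Dwight", "Jim", "Pam"],
        "dept": ["Management", "Sales", "Sales", "Reception"],
        "age": [47, 35, 31, 29],
    }
)

# Sort by a single column (ascending by default).
by_age = df.sort_values("age")

# Sort by several columns, with a separate direction for each one.
by_dept_then_age = df.sort_values(["dept", "age"], ascending=[True, False])
```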

## Grouping and aggregating data
Pandas DataFrames can be grouped and aggregated similarly to SQL. Grouping and aggregating creates a new DataFrame that reduces many rows to just a few and makes it easy to compare groups within a dimension. Aggregating is also useful for plotting, especially when the plotting package does not compute counts or averages for you.
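Here is a small sketch of grouping and aggregating, comparable to a SQL GROUP BY. The `dept` column and the output column names (`headcount`, `avgAge`) are made up for this example.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "firstName": ["Michael", "Dwight", "Jim", "Pam"],
        "dept": ["Management", "Sales", "Sales", "Reception"],
        "age": [47, 35, 31, 29],
    }
)

# One output row per department, with a count and an average.
summary = df.groupby("dept").agg(
    headcount=("age", "count"),
    avgAge=("age", "mean"),
)
print(summary)

# reset_index() turns the group labels back into a regular column.
flat = summary.reset_index()
```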

## Cleaning data
One of the most important functions of the Pandas library is its ability to facilitate data cleaning. By combining filters with updates to specific rows of the DataFrame, we can easily change the format of values so that the data meets the needs of our analysis.
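The sketch below shows one possible cleaning pass on deliberately messy sample data. Filling the missing age with the median is just an illustrative choice, not a recommendation for every dataset.

```python
import pandas as pd

# Hypothetical messy data: inconsistent spacing and case, plus a non-numeric age.
df = pd.DataFrame(
    {
        "lastName": [" scott", "SCHRUTE ", "Halpert"],
        "age": ["47", "35", "unknown"],
    }
)

# Normalize the text: trim whitespace and use title case.
df["lastName"] = df["lastName"].str.strip().str.title()

# Convert text to numbers; values that cannot be parsed become NaN.
df["age"] = pd.to_numeric(df["age"], errors="coerce")

# Combine a filter with an update: fill only the rows where age is missing.
df.loc[df["age"].isna(), "age"] = df["age"].median()
```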

## Date formatting
Pandas comes with built-in date functions. Just like in SQL, these make cleaning and exploring data easier when the dataset involves dates.
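As a brief sketch of the date tools, the example below converts text to datetimes and extracts parts of each date through the `.dt` accessor. The `hired` column and its dates are invented sample data.

```python
import pandas as pd

df = pd.DataFrame({"hired": ["2005-03-24", "2006-09-21", "2009-01-15"]})

# Convert text into real datetime values.
df["hired"] = pd.to_datetime(df["hired"])

# The .dt accessor exposes date parts for the whole column at once.
df["hireYear"] = df["hired"].dt.year
df["hireMonth"] = df["hired"].dt.month_name()

# Datetime columns can be compared and filtered like numbers.
hired_after_2006 = df[df["hired"] >= "2006-01-01"]
```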

## Exporting data
At any time, a Pandas DataFrame can be exported to a CSV or Excel file, and many other file types are available as well. Exporting is done with one of the many methods attached to the DataFrame, including `.to_csv()` and `.to_excel()`.
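A minimal export sketch follows. Calling `.to_csv()` without a path returns the CSV text as a string, which keeps the example free of files. The file names in the comments are hypothetical, and `.to_excel()` additionally needs an Excel engine such as openpyxl to be installed.

```python
import io
import pandas as pd

df = pd.DataFrame({"firstName": ["Michael", "Dwight"], "age": [47, 35]})

# With no path, to_csv() returns the CSV content as a string.
# index=False leaves the row index out of the output.
csv_text = df.to_csv(index=False)
print(csv_text)

# Reading the text back in gives an equivalent DataFrame.
round_trip = pd.read_csv(io.StringIO(csv_text))

# To write files instead, pass a path (hypothetical file names):
# df.to_csv("employees.csv", index=False)
# df.to_excel("employees.xlsx", index=False)
```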

Please note that this course will not be able to teach you every single Pandas function and possible parameter. If you have specific questions about how to use Pandas for your project, do a Google search or look up a function using the [Pandas documentation](https://pandas.pydata.org/docs/).